Introduction to XPath 1.0

XPath 1.0 is used in the Switch scripting API for querying the contents of XML documents in the XML and JDF data models and in the XML module.

Note:

The current version of Switch does not support XPath 2.0.

Refer to the XPath 1.0 specification, the XML 1.0 specification, the XML namespaces specification, and widely available literature for full details on XPath 1.0.

The remainder of this topic provides a brief introduction to a very small subset of XPath 1.0.

Expressions and location paths

XPath models an XML document as a tree of nodes. There are different types of nodes, including element nodes, attribute nodes and text nodes. XPath defines a way to address specific nodes in the node tree, to compute a string-value for each type of node and to perform more general calculations with these values.

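The primary construct in XPath is the expression. An expression is evaluated to yield an object, which has one of the following four basic types:

node-set (an unordered collection of nodes without duplicates)
boolean (true or false)
number (a floating-point number)
string (a sequence of characters)
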
One important kind of expression is a location path. A location path selects a set of nodes. The result of evaluating an expression that is a location path is the node-set containing the nodes selected by the location path. Location paths can recursively contain expressions that are used to filter sets of nodes.

Location paths

A location path is a sequence of one or more location steps separated by slashes. The steps are evaluated in turn, from left to right: each step selects a set of nodes relative to a context node, and every node selected by one step serves as a context node for the next step. The initial context node is determined by the location path's context (for the data model queries in Switch this is always the document's root node; in the XML module it is the node being queried). A slash in front of a location path makes the location path absolute: the leading "/" refers to the root node of the document, regardless of the context node.
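
To make the step-by-step evaluation concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree module. It is not Switch's query API: the evaluate_steps helper, the sample document and the stand-in root element are hypothetical, and only plain child-axis name tests are handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document used only for illustration.
DOC = """<doc>
  <chapter><section>1.1</section><section>1.2</section></chapter>
  <chapter><section>2.1</section></chapter>
</doc>"""


def evaluate_steps(context_nodes, steps):
    """Evaluate simple child-axis location steps from left to right.

    Every node selected by one step becomes a context node for the next
    step; the results are merged into one list without duplicates,
    since an XPath node-set never contains the same node twice.
    """
    nodes = list(context_nodes)
    for step in steps:
        next_nodes = []
        for node in nodes:
            for child in node.findall(step):
                if not any(child is seen for seen in next_nodes):
                    next_nodes.append(child)
        nodes = next_nodes
    return nodes


# ElementTree has no separate document node, so wrap the document element in a
# stand-in element that plays the role of XPath's root node ("/").
root_node = ET.Element("#root")
root_node.append(ET.fromstring(DOC))

# Equivalent of the absolute location path /doc/chapter/section
sections = evaluate_steps([root_node], ["doc", "chapter", "section"])
print([s.text for s in sections])  # ['1.1', '1.2', '2.1']
```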

Here are some examples of location paths:

para

selects the para element children of the context node

*

selects all element children of the context node

text()

selects all text node children of the context node

@name

selects the name attribute of the context node

@*

selects all the attributes of the context node

para[1]

selects the first para child of the context node

para[last()]

selects the last para child of the context node

*/para

selects all para grandchildren of the context node

/doc/chapter[5]/section[2]

selects the second section element in the fifth chapter element in the doc document element

chapter//para

selects the para element descendants of the chapter element children of the context node

//para

selects all the para descendants of the document root and thus selects all para elements in the same document as the context node

//olist/item

selects all the item elements in the same document as the context node that have an olist parent

.

selects the context node

.//para

selects the para element descendants of the context node

..

selects the parent of the context node

../@lang

selects the lang attribute of the parent of the context node

para[@type="warning"]

selects all para child elements of the context node that have a type attribute with value warning

para[@type="warning"][5]

selects the fifth para child element of the context node that has a type attribute with value warning

para[5][@type="warning"]

selects the fifth para child element of the context node if that child has a type attribute with value warning

chapter[title="Introduction"]

selects the chapter child elements of the context node that have one or more title child elements with string-value equal to Introduction

//field[name="JobID"]/value

selects all value elements in the document that have a parent field element and a sibling name element with string-value equal to JobID

chapter[title]

selects the chapter child elements of the context node that have one or more title child elements

employee[@secretary and @assistant]

selects all the employee child elements of the context node that have both a secretary attribute and an assistant attribute
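
Python's standard xml.etree.ElementTree module understands a limited subset of XPath location paths, which is enough to try several of the examples above outside Switch. This is a sketch under that assumption, not Switch's query API: ElementTree cannot select text() nodes or attributes with @name, does not implement XPath functions, and paths that start with // must be written relative to an element (as .//para). Combined predicates such as para[@type="warning"][5] are best checked with a full XPath 1.0 engine. The sample document and the texts helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; its top-level element acts as the context node.
context = ET.fromstring("""<context>
  <para type="note">A</para>
  <para type="warning">B</para>
  <para>C</para>
  <chapter><title>Introduction</title><para>D</para></chapter>
  <chapter><section><para>E</para></section></chapter>
  <field><name>JobID</name><value>123</value></field>
</context>""")


def texts(elements):
    """Return the text of each selected element, in document order."""
    return [e.text for e in elements]


print(texts(context.findall("para")))                   # ['A', 'B', 'C']
print(texts(context.findall("para[1]")))                # ['A']  (positions are 1-based)
print(texts(context.findall("para[last()]")))           # ['C']
print(texts(context.findall("para[@type='warning']")))  # ['B']
print(texts(context.findall("chapter//para")))          # ['D', 'E']
print(texts(context.findall(".//para")))                # ['A', 'B', 'C', 'D', 'E']
print(len(context.findall("chapter[title='Introduction']")))  # 1
print(len(context.findall("chapter[title]")))           # 1
print(texts(context.findall(".//field[name='JobID']/value")))  # ['123']
```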

Expressions

The location paths listed in the previous section contain a few examples of simple expressions used to filter nodes (such as @type="warning", which keeps only the elements that have a particular attribute value). Expressions can also stand on their own, and can be used to express calculations based on the result of a location path. XPath provides a limited set of functions for use in expressions.

Here are some examples of expressions:

para

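evaluates to the set of para child elements of the context node; when converted to a string, it yields the string-value (text contents) of the first of these elements in document order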

@type

evaluates to the type attribute of the context node; when converted to a string, it yields the value of that attribute

(2 + 3) * 4

evaluates to the number 20

number(@type) > 10

evaluates to true if the value of the type attribute represents a number greater than 10, and to false otherwise (including when the attribute is missing or its value is not a number)

count(//field)

evaluates to the number of field elements in the document, regardless of their position in the node tree

count(/doc/chapter[5]/section)

evaluates to the number of section elements in the fifth chapter element in the doc document element

string-length(normalize-space(para))

evaluates to the number of characters in the text contents of the first para child element, after removing leading and trailing white space and replacing each sequence of white space characters with a single space
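
ElementTree's limited XPath support does not include functions such as count(), number() or normalize-space(), so the sketch below approximates the expressions above with plain Python. The helpers string_value, normalize_space and xpath_number are hypothetical stand-ins written for this illustration; they are not part of Switch or of any XPath library, and they differ from XPath 1.0 in the edge cases noted in their comments.

```python
import math
import xml.etree.ElementTree as ET


def string_value(elem):
    """XPath string-value of an element: all descendant text, concatenated."""
    return "".join(elem.itertext())


def normalize_space(s):
    """Approximate XPath normalize-space(): trim, then collapse white space runs.

    XPath 1.0 only treats space, tab, CR and LF as white space, whereas
    str.split() without arguments also splits on other Unicode white space.
    """
    return " ".join(s.split())


def xpath_number(s):
    """Approximate XPath number(): NaN if s is missing or not numeric.

    float() accepts some spellings (such as '1e3' or 'inf') that XPath 1.0
    does not, so this is only a rough stand-in.
    """
    try:
        return float(s)
    except (TypeError, ValueError):
        return math.nan


# Hypothetical document; <doc> acts as the context node.
doc = ET.fromstring("""<doc type="42">
  <para>  Hello,
     world  </para>
  <chapter><field/><field/></chapter>
  <field/>
</doc>""")

print((2 + 3) * 4)                                           # 20
print(xpath_number(doc.get("type")) > 10)                    # True: number(@type) > 10
print(xpath_number(doc.get("missing")) > 10)                 # False: NaN compares false
print(sum(1 for _ in doc.iter("field")))                     # 3: count(//field)
print(len(normalize_space(string_value(doc.find("para")))))  # 12
```

Note that doc.find("para") returns the first para child, matching the XPath rule that a node-set converts to the string-value of its first node in document order.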

Namespaces

The examples above assume that the XML document being queried does not use namespaces. In practice, however, many XML documents do use namespaces to avoid conflicting element and attribute names when information of different types or origins is mixed.

A namespace is identified by a Uniform Resource Identifier (URI), which resembles a web page address but is in fact just a unique string (most namespace URIs do not point to an actual web page). Rather than repeating this URI each time, element and attribute names use a namespace prefix (a shorthand) to refer to a namespace. The mapping between namespace prefixes and namespace URIs is defined by namespace declarations in the XML file.

For example:

xml:lang

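is the name of the standard XML attribute for indicating the natural language of an element's content; the xml prefix is predefined and always bound to the namespace URI http://www.w3.org/XML/1998/namespace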

jdf:node

is the name of a node element in the JDF specification, assuming that the jdf prefix is mapped to the JDF namespace URI
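
A quick way to see that a prefix is only a shorthand is to look at how a namespace-aware parser stores names. In this Python sketch (standard library xml.etree.ElementTree; the namespace URI urn:example:jdf is hypothetical and is not the real JDF namespace URI), each prefixed name is replaced by the namespace URI in braces followed by the local name. Two documents that bind different prefixes to the same URI therefore yield identical names, and xml:lang resolves to the predefined XML namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; any unique string would do.
JDF_URI = "urn:example:jdf"

doc_a = ET.fromstring(f'<jdf:node xmlns:jdf="{JDF_URI}" xml:lang="en"/>')
doc_b = ET.fromstring(f'<j:node xmlns:j="{JDF_URI}"/>')

# The parser stores "{namespace URI}local-name"; the prefix itself is gone.
print(doc_a.tag)               # {urn:example:jdf}node
print(doc_a.tag == doc_b.tag)  # True: same URI, different prefixes
# The xml prefix is predefined, so no declaration is needed for xml:lang.
print(list(doc_a.attrib))      # ['{http://www.w3.org/XML/1998/namespace}lang']
```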

Namespaces and XPath

If the XML file being queried uses namespaces, the XPath expressions and location paths must use namespace prefixes as well. The mapping between those prefixes and the corresponding namespace URIs must be passed to the query function separately, because XPath itself has no syntax for declaring it.

Note that namespace URIs must match between the XPath expression and the XML file; the namespace prefixes may differ.
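
The prefix-to-URI mapping can be tried out with Python's xml.etree.ElementTree, whose findall() method accepts such a mapping as its namespaces argument. This only illustrates the principle; it is not Switch's query function, and the document and URIs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the file binds the prefix "a" to the namespace URI.
doc = ET.fromstring("""<a:catalog xmlns:a="urn:example:books">
  <a:book/><a:book/>
</a:catalog>""")

# The query uses its own prefix "b"; only the namespace URI has to match.
print(len(doc.findall("b:book", {"b": "urn:example:books"})))  # 2

# Reusing the file's prefix does not help if it maps to a different URI.
print(len(doc.findall("a:book", {"a": "urn:example:other"})))  # 0
```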

XPath 1.0 does not have the concept of a default namespace: an unprefixed name in an expression matches only elements that are in no namespace. If the XML file being queried declares a default namespace, the XPath expression must therefore use a namespace prefix to refer to elements in that namespace. The default_switch_ns prefix can be used to refer to elements in the default namespace.
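
The same behavior can be reproduced with Python's xml.etree.ElementTree. In this hypothetical sketch the prefix d stands in for what default_switch_ns does in Switch; the name d and the namespace URI are arbitrary choices and are not part of any Switch API.

```python
import xml.etree.ElementTree as ET

# Hypothetical document that declares a default namespace (no prefix).
doc = ET.fromstring("""<catalog xmlns="urn:example:books">
  <book/><book/>
</catalog>""")

# An unprefixed name matches only elements in no namespace, so nothing is found.
print(len(doc.findall("book")))                                # 0

# Binding some prefix ("d" here) to the default namespace URI works.
print(len(doc.findall("d:book", {"d": "urn:example:books"})))  # 2
```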