Streaming Transformations for XML (STX) Version 0.90

1 Introduction

This document defines the syntax and semantics of the STX transformation language. Transformation rules in STX are expressed as well-formed XML documents. These documents, called stylesheets, may include both elements that are defined by STX (STX instructions) and other elements (literals). STX-defined elements are distinguished by belonging to a specific XML namespace, which is referred to in this specification as the STX namespace. This document uses a prefix of 'stx' as a shortcut for referring to elements from the STX namespace.

An STX transformation describes rules for transforming a source event stream into a result event stream. The transformation has a streaming character: it does not require building an in-memory tree representation of the source document. Result events are generated as soon as the corresponding source events arrive and are processed.

The transformation is achieved by associating events with templates. A template pattern is matched against events and their context. The best matching template is then instantiated to create part of the result stream. The template is always instantiated with respect to a current context, the information maintained during the transformation (see 2.1 Context). In constructing the result stream, events from the source stream can be filtered and arbitrary events can be added. Events can also be reordered using a working storage.

Formally, the syntax of STX is similar to the syntax of [XSLT]. STX also employs a compact expression language embedded in certain attributes. This expression language, called STXPath, is at first sight syntactically similar to [XPath]. This should allow XSLT users to learn most of the STX syntax easily.

2 Processing Model

An STX processor transforms a source XML document according to rules given in an STX stylesheet and generates a result XML document.
The source document is supplied in the form of a stream of [SAX2] events. This stream is referred to as the source stream.
No tree representation of the source document is constructed. However, when processing each event, a limited amount of contextual information is available from the system.
Consecutive character() SAX2 events are treated as a single character() event. Similarly, consecutive comment() events are treated as a single comment() event.
The stylesheet is a well-formed XML document that may be precompiled to some kind of executable representation that can be reused to perform multiple transformations.
The output of the transformation consists of a sequence of SAX2 events. This sequence of events is referred to as the result stream.
Each incoming event causes invocation of a rule within the stylesheet by means of a match pattern.
The actions such a rule may perform include emitting SAX events to the result stream, saving working data in a working storage, accessing data written to working storage by previously executed rules, and invoking other rules.
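The merging of consecutive character() and comment() events can be pictured with a minimal Python sketch. The (kind, payload) event tuples and the coalesce function are hypothetical illustrations, not a SAX2 API, and concatenating the payloads of merged comments is an assumption, since the text above does not say how their content is combined.

```python
from typing import List, Tuple

# Hypothetical event model: (kind, payload), e.g. ("characters", "Hel").
Event = Tuple[str, str]

# Event kinds whose consecutive occurrences are treated as a single event.
MERGEABLE = {"characters", "comment"}


def coalesce(events: List[Event]) -> List[Event]:
    """Merge each run of consecutive events of the same mergeable kind."""
    result: List[Event] = []
    for kind, payload in events:
        if result and kind in MERGEABLE and result[-1][0] == kind:
            # Same kind as the previous event: extend its payload (assumption).
            result[-1] = (kind, result[-1][1] + payload)
        else:
            result.append((kind, payload))
    return result


stream = [("startElement", "p"), ("characters", "Hel"),
          ("characters", "lo"), ("endElement", "p")]
print(coalesce(stream))
# [('startElement', 'p'), ('characters', 'Hello'), ('endElement', 'p')]
```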

2.1 Context

Contextual information is available at every moment of processing. It includes the data arriving with the current event and other data related to the state of processing. The contextual information available at a particular moment of processing is called the current context. It consists of the following parts:

current node data - The node which is the subject of the current event is called the current node. It is always given and there is no way to change the current node using stylesheet rules. The information available for the current node depends on the type of node; see [SAX2] for details. For example, qualified name, local name, prefix, namespace URI and attributes (qualified name, local name, prefix, namespace URI, and value for each) are available for elements.
ancestor stack - In addition to the current node, all its ancestor nodes (with all properties) are stored in a storage called an ancestor stack.
next node data - The processing of the current node is delayed so that the next node data is available during the processing. The lookahead information is used to access the first text child of an element, provided it is the very first child of the element.
position within siblings - Information about the position relative to other siblings is kept. The position is available for the current node and all its ancestors.
A position number is available for all node kind tests such as node(), text(), cdata(), processing-instruction(), comment(). For elements, the position is available for all qualified names and for names containing the * wildcard: pre:lname, lname, pre:*, *:lname, *. For processing instructions, the position is also available for each target.
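One way to keep the position information described above is a set of counters per parent, one for each node test a child can satisfy. The following Python sketch assumes that name tests are compared by prefix rather than by namespace URI, and it only covers elements and text nodes; the SiblingCounter class and element_tests function are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional


def element_tests(prefix: Optional[str], local: str) -> List[str]:
    """Node tests an element child satisfies (prefix-based simplification)."""
    tests = ["node()", "*", f"*:{local}"]
    if prefix:
        tests += [f"{prefix}:{local}", f"{prefix}:*"]
    else:
        tests.append(local)
    return tests


class SiblingCounter:
    """Counts the children of one parent separately for each node test."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def _visit(self, tests: List[str]) -> Dict[str, int]:
        positions = {}
        for test in tests:
            self.counts[test] += 1          # one more sibling satisfies this test
            positions[test] = self.counts[test]
        return positions

    def element(self, prefix: Optional[str], local: str) -> Dict[str, int]:
        return self._visit(element_tests(prefix, local))

    def text(self) -> Dict[str, int]:
        return self._visit(["node()", "text()"])


siblings = SiblingCounter()
siblings.element(None, "a")            # <a/>
siblings.text()                        # text node
siblings.element("x", "b")             # <x:b/>
last = siblings.element(None, "a")     # <a/> again
print(last["a"], last["*"], last["node()"])   # 2 3 4
```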

2.2 Precedence Categories

Each incoming event can invoke a rule within the stylesheet by means of precedence categories and a match pattern (see 2.3 Match Patterns). The template used to process the current node is called the current template. Templates can be separated into groups (see 3.3 Grouping of Templates). Top-level templates are considered to be members of the default, top-level group. The group containing the current template is referred to as the current group.

Templates are associated with the precedence categories according to their visibility. The visibility is defined using the 'visibility' attribute for each template (see 4.3 Templates).

There are two precedence categories (listed with decreasing precedence):

templates from the same group and global or public templates (visibility='global'|'public') from children groups
global templates (visibility='global')

The first precedence category is searched for the best matching template by means of a match pattern (see 2.3 Match Patterns). If there is no matching template in this precedence category, the second category is searched.
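The two-step search can be sketched in Python as follows. Templates are modeled by a hypothetical Template class whose matches callable stands in for a real match pattern, groups are identified by name with a parent_of map, and "children groups" is read as the direct child groups of the current group, which is an interpretation. Within a category, the highest priority wins and ties go to the template occurring last in the stylesheet (see 2.3 Match Patterns).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(eq=False)                     # identity comparison between templates
class Template:
    name: str                            # label for illustration only
    group: str                           # group that contains the template
    visibility: str                      # "private" | "public" | "global"
    priority: float
    matches: Callable[[str], bool]       # hypothetical stand-in for a pattern


def best(candidates: List[Template], event: str,
         stylesheet: List[Template]) -> Optional[Template]:
    matching = [t for t in candidates if t.matches(event)]
    if not matching:
        return None
    # Highest priority wins; ties go to the template occurring last.
    return max(matching, key=lambda t: (t.priority, stylesheet.index(t)))


def select_template(stylesheet: List[Template], parent_of: Dict[str, Optional[str]],
                    current_group: str, event: str) -> Optional[Template]:
    children = {g for g, p in parent_of.items() if p == current_group}
    # Category 1: same group, plus public/global templates of child groups.
    category1 = [t for t in stylesheet
                 if t.group == current_group
                 or (t.group in children and t.visibility in ("public", "global"))]
    # Category 2: global templates from anywhere.
    category2 = [t for t in stylesheet if t.visibility == "global"]
    found = best(category1, event, stylesheet)
    return found if found is not None else best(category2, event, stylesheet)


parent_of = {"default": None, "g1": "default", "g2": "g1"}
stylesheet = [
    Template("t1", "default", "private", 0, lambda e: e == "a"),
    Template("t2", "g1", "public", 0, lambda e: e == "b"),
    Template("t3", "g2", "global", 0, lambda e: e == "c"),
    Template("t4", "g1", "private", 0, lambda e: e == "d"),
]
print(select_template(stylesheet, parent_of, "default", "c").name)  # t3
```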

2.3 Match Patterns

The match pattern specifies a set of conditions on the current context. If the current context satisfies the conditions, the current event matches the pattern; otherwise it does not. The syntax for patterns is a subset of the pattern syntax for XSLT (see [XSLT], 5.2). In particular, patterns take the form of location paths that meet certain restrictions.

Here are some examples of patterns:

item - matches any 'item' element from the namespace used for unprefixed STXPath path patterns (defined with 'default-stxpath-namespace' option, no namespace by default)
list/item - matches any 'item' element with 'list' parent, where both elements are from the namespace used for unprefixed STXPath path patterns
chapter//list/item - matches any 'item' element with 'list' parent and 'chapter' ancestor, where all three elements are from the namespace used for unprefixed STXPath path patterns
/root/list/* - matches any element with 'list' parent and 'root' grandparent, where 'root' is the document element and both 'root' and 'list' elements are from the namespace used for unprefixed STXPath path patterns
pre:list[@id=5]/pre:item - matches any 'item' element with 'list' parent having 'id' attribute with value of 5, where both elements are from the namespace which is bound to 'pre' prefix in stylesheet for this rule
*[position()=1] - matches any element that is the first element child of its parent
node() - matches any node
text() - matches any text node (including CDATA text node)
cdata() - matches any CDATA text node
processing-instruction() - matches any processing instruction

A match pattern is a set of location path patterns separated with |. A location path pattern is a location path whose steps all use only the child and descendant axes. Patterns may use the / operator as well as the // operator. Only abbreviated syntax is allowed. At most one predicate is allowed in each step. Predicate expressions are STXPath expressions (see 6 Expressions).

Predicate expressions are evaluated and the result is converted to a boolean. If the result is a number, it is converted to true if the number is equal to the context position and to false otherwise. Thus a location path p[3] is equivalent to p[position()=3]. If the result is a string, it is converted to true if and only if its length is non-zero. If the result is a node-set, it is converted to true if and only if it is not empty.
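The conversion of a predicate result to a boolean can be written as a small Python function. Mapping STXPath strings, numbers, booleans and node-sets onto Python str, int/float, bool and list is an assumption of this sketch; the function name predicate_holds is hypothetical.

```python
from typing import Any


def predicate_holds(result: Any, context_position: int) -> bool:
    """Convert a predicate result to a boolean as described above."""
    if isinstance(result, bool):            # check bool first: bool is a subclass of int
        return result
    if isinstance(result, (int, float)):    # number: compare with the context position
        return result == context_position
    if isinstance(result, str):             # string: true iff non-empty
        return len(result) > 0
    if isinstance(result, list):            # node-set modeled as a list: true iff non-empty
        return len(result) > 0
    raise TypeError(f"unsupported predicate result: {result!r}")


# p[3] behaves like p[position()=3]:
print([predicate_holds(3, pos) for pos in (1, 2, 3)])  # [False, False, True]
```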

If there is no matching template available, a default rule is applied. One of three default rules can be used: 'ignore' (to ignore all non-matching events), 'copy' (to copy all non-matching events unmodified), and 'text' (to copy only the text content of non-matching events). The default rule can be set from the stylesheet (see 4.1 Transformation Options). This makes it easy to copy a document with only a few changes, or to select just a few items from a document. By default, all non-matching events are ignored.

It is possible that the current context matches more than one rule within a precedence category. The template rule to be used is then determined by the same rules as in XSLT (see [XSLT], 5.5). All rules have a computed priority value, which can be overridden with a 'priority' attribute value (see 4.3 Templates).

If the pattern contains multiple alternatives separated with |, then it is treated equivalently to a set of template rules, one for each alternative.
If the pattern has the form of a qualified name, or has the form processing-instruction(target) or cdata(), then the priority is 0.
If the pattern has the form pre:* or *:lname, then the priority is -0.25.
If the pattern consists of just a node test other than cdata(), then the priority is -0.5.
Otherwise, the priority is 0.5.

The rule with the highest priority is used. If there is more than one matching template rule with the highest priority, an STX processor must choose the rule that occurs last in the stylesheet.
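The default priority rules can be sketched as a Python function over a single pattern alternative. The regular expressions below are a simplification (for example, the pattern is split at every |, even one inside a string literal in a predicate) and are not taken from a reference implementation; default_priority and priorities are hypothetical names.

```python
import re
from typing import List, Tuple

NCNAME = r"[A-Za-z_][\w.\-]*"
QNAME = rf"(?:{NCNAME}:)?{NCNAME}"
PI_WITH_TARGET = rf"processing-instruction\(\s*(?:'[^']*'|\"[^\"]*\"|{NCNAME})\s*\)"
NODE_TESTS = {"*", "node()", "text()", "comment()", "processing-instruction()"}


def default_priority(alternative: str) -> float:
    alt = alternative.strip()
    if re.fullmatch(QNAME, alt) or re.fullmatch(PI_WITH_TARGET, alt) or alt == "cdata()":
        return 0.0
    if re.fullmatch(rf"{NCNAME}:\*|\*:{NCNAME}", alt):
        return -0.25
    if alt in NODE_TESTS:                  # a node test other than cdata()
        return -0.5
    return 0.5


def priorities(pattern: str) -> List[Tuple[str, float]]:
    """Each |-separated alternative acts as its own template rule."""
    return [(alt.strip(), default_priority(alt)) for alt in pattern.split("|")]


print(priorities("item | pre:* | list/item | node()"))
# [('item', 0.0), ('pre:*', -0.25), ('list/item', 0.5), ('node()', -0.5)]
```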

3 Stylesheet Structure

3.1 STX Namespace

The STX namespace has the URI http://stx.sourceforge.net/2002/ns.

3.2 Transform Element

stx:transform

<!-- Category: root -->
<stx:transform
  version = number>
  <!-- Content: top-level-elements -->
</stx:transform>

The root element of a stylesheet must be stx:transform.

The version attribute contains a version number to distinguish language versions; this attribute is mandatory and its value must be '1.0' for this version of the language.

The stx:transform element can contain the following children from the STX namespace. These elements are called top-level elements:

stx:options
stx:include
stx:variable
stx:namespace-alias
stx:group
stx:template
stx:procedure

All top-level elements except stx:options may occur more than once.

stx:options and stx:namespace-alias elements are allowed as top-level elements only.

3.3 Grouping of Templates

Templates can be organized into groups using stx:group element. Groups of templates then play a role in template matching (precedence categories are defined in terms of groups) and determine the scoping of variables.

Each stylesheet has a virtual default group that is considered to be the parent of top-level groups. Explicit groups are not mandatory; many transformations can be done without grouping templates. On the other hand, separating templates into groups makes it possible to define more precise transformation rules and to run complex transformations more safely, especially on well-known, regular input data.

stx:group

<!-- Category: top-level or group -->
<stx:group>
<!-- Content: group-elements -->
</stx:group>

This element must be a child of either stx:transform or stx:group element.

3.4 Stylesheet Inclusion

An STX stylesheet may include another STX stylesheet using the stx:include element.

stx:include

<!-- Category: top-level or group -->
<stx:include
  href = uri-reference/>

This element must be either a top-level element or a child of the stx:group element. The stx:include element is replaced with the content of the stx:transform element of the included stylesheet, with two exceptions: stx:namespace-alias of the included stylesheet is always inserted as a top-level element (even when including into a group), and stx:options of the included stylesheet is ignored. Top-level variables and top-level templates from the included stylesheet are treated as group variables and templates when including into a group. There is no difference between templates from the main stylesheet and included templates in terms of matching precedence.

4 Generating Output

STX templates are invoked sequentially as events arrive, rather than being called from other templates. A pair of start and end events matches a single template, which is split into two parts: the first part is executed when the starting event appears and the second part when the ending event appears. The two parts are separated by the stx:process-children element.
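The split of a template into a start part and an end part can be modeled with a stack of pending end parts. In this Python sketch a template is a hypothetical pair of callables standing for the instructions before and after stx:process-children; it assumes every element has a matching template that contains stx:process-children, and it ignores all other kinds of events.

```python
from typing import Callable, Dict, List, Tuple

Part = Callable[[List[str]], None]       # appends result events to the output list
TemplateParts = Tuple[Part, Part]        # (before process-children, after it)


def run(events: List[Tuple[str, str]], templates: Dict[str, TemplateParts]) -> List[str]:
    out: List[str] = []
    pending: List[Part] = []             # end parts waiting for their end-element
    for kind, name in events:
        if kind == "start":
            before, after = templates[name]
            before(out)                  # executed when the start event appears
            pending.append(after)
        elif kind == "end":
            pending.pop()(out)           # executed when the matching end event appears
    return out


templates = {
    "doc": (lambda o: o.append("<html>"), lambda o: o.append("</html>")),
    "para": (lambda o: o.append("<p>"), lambda o: o.append("</p>")),
}
events = [("start", "doc"), ("start", "para"), ("end", "para"), ("end", "doc")]
print(run(events, templates))   # ['<html>', '<p>', '</p>', '</html>']
```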

4.1 Transformation Options

stx:options

<!-- Category: top-level -->
<stx:options
  no-match-events = "ignore"|"copy"|"text"
  recognize-cdata = "yes"|"no"
  default-stxpath-namespace = uri-reference
  strip-space = "yes"|"no"
  output-encoding = string/>

Global properties of a transformation can be specified using the stx:options element.

no-match-events - This attribute specifies the default rule for events for which no matching template is found. These events are either ignored ("ignore", the default) or copied to the output without modification ("copy"). For "text", only text nodes are copied to the output.
recognize-cdata - This attribute specifies whether CDATA boundaries are recognized during the transformation. If so, the node kind test cdata() can be used in STXPath expressions. Otherwise (recognize-cdata="no"), the cdata() kind test never matches in STXPath expressions. The default value is "yes".
default-stxpath-namespace - This optional attribute specifies a namespace used for unprefixed STXPath paths and patterns. No namespace is used by default.
strip-space - This optional attribute specifies whether whitespace text nodes are stripped from the input data stream. Whitespace text nodes are text nodes containing nothing but the following characters: #x20, #x9, #xD or #xA. The default value is "no".
output-encoding - This optional attribute specifies the preferred output encoding of the resulting byte stream. The value of this attribute should be treated case-insensitively; the value must contain only printable ASCII characters (#x21 - #x7E); the value should either be a charset registered with the Internet Assigned Numbers Authority [IANA] or "#input".
If the value of this attribute is "#input" or the attribute is not present, the output encoding should be the same as the input encoding. In the event an STX processor is not able to detect the input encoding, UTF-8 must be used as the output encoding. A compliant STX processor is not required to support any particular encoding other than UTF-8.
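Two of these options reduce to simple checks. The following Python sketch shows a whitespace-only test for strip-space and one possible resolution of output-encoding; the function names are hypothetical, and representing an undetectable input encoding as None is an assumption.

```python
from typing import Optional

XML_WHITESPACE = {"\x20", "\x09", "\x0d", "\x0a"}   # #x20, #x9, #xD, #xA


def is_whitespace_text(text: str) -> bool:
    """True if a text node would be stripped when strip-space="yes"."""
    return all(ch in XML_WHITESPACE for ch in text)


def resolve_output_encoding(option: Optional[str], input_encoding: Optional[str]) -> str:
    """Pick the output encoding; input_encoding is None when it cannot be detected."""
    if option is None or option.lower() == "#input":
        return input_encoding or "UTF-8"
    if not option or not all(0x21 <= ord(ch) <= 0x7E for ch in option):
        raise ValueError("output-encoding must contain printable ASCII (#x21-#x7E) only")
    return option


print(is_whitespace_text(" \n\t"), is_whitespace_text("\u00a0"))  # True False
print(resolve_output_encoding("#INPUT", None))                    # UTF-8
```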

4.2 Namespace Aliasing

stx:namespace-alias

<!-- Category: top-level -->
<stx:namespace-alias
  source-prefix = ncname|"#default"
  result-prefix = ncname|"#default"/>

Namespaces from the input stream can be mapped to other namespaces in the result stream using the stx:namespace-alias element. Both attributes are mandatory and can contain either a prefix bound to the namespace to be used or the "#default" keyword for the default namespace.

4.3 Templates

stx:template

<!-- Category: top-level or group -->
<stx:template
  match = pattern
  priority = number
  visibility = "private"|"public"|"global"
  recursion-entry-point = "yes"|"no"
  mode = qname>
<!-- Content: template -->
</stx:template>

Rules to process input events are written in templates. The stx:template element must be a child of either the stx:transform or the stx:group element. Templates match events by means of precedence categories and the pattern in the match attribute. The optional priority attribute can contain a priority value used for matching (see 2.3 Match Patterns).

The visibility attribute specifies whether the template is visible from the current group (and thus can match the next event). Private templates are visible in their group only, public templates are visible from parent groups, and global templates are visible from any group. The default value is "private".

The mode attribute restricts the template to matching only nodes that are processed under the same mode (within stx:process-children, stx:process-attributes or stx:process-self with the same value of the mode attribute).

The recursion-entry-point attribute specifies whether the template creates new instances of group variables. The default value is "no". A new set of group variables is created for each instantiated template with recursion-entry-point="yes". These variables shadow their former values and exist as long as the template is being processed.

The content of templates may include both STX instructions and literal elements. Literal elements are simply copied to the output.

4.4 Procedures

stx:procedure

<!-- Category: top-level or group -->
<stx:procedure
  visibility = "private"|"public"|"global"
  recursion-entry-point = "yes"|"no"
  name = qname>
<!-- Content: template -->
</stx:procedure>

Procedures are sub-templates that can be called by name (with the stx:call-procedure element). The visibility and recursion-entry-point attributes have the same meaning as for templates: only visible procedures can be called by name, and recursion-entry-point must be set to "yes" to create new copies of group variables. It is a static error if a stylesheet contains more than one visible procedure with the same name.

The content of procedures may be the same as the content of templates.

stx:call-procedure

<!-- Category: template -->
<stx:call-procedure
  name = qname>
<!-- Content: stx:with-param* -->
</stx:call-procedure>

The stx:call-procedure element makes it possible to invoke procedures by their names. The name attribute is mandatory.

4.5 Parameters

Values can be passed to procedures as parameters. A parameter behaves in the same way as a local variable; thus it is only visible within the procedure it is passed to. There are two elements available to work with parameters:

stx:with-param

<!-- Category: call-procedure -->
<stx:with-param
  name = qname
  select = expression>
<!-- Content: text template -->
</stx:with-param>

Parameters are passed to procedures using the stx:with-param element. The required name attribute specifies the name of the parameter. The value of the parameter is the result returned by an expression located either in the select attribute or in the content of this element. stx:with-param is allowed as a child of stx:call-procedure only.

stx:param

<!-- Category: template -->
<stx:param
  name = qname
  select = expression>
<!-- Content: text template -->
</stx:param>

The stx:param element is allowed in procedures only (it must be a child of stx:procedure). The required name attribute specifies the name of the parameter. The select attribute or the content of this element specifies a default value, which is used when there is no value specified using the select attribute or the content of the appropriate stx:with-param element.

4.6 Copying the Current Node

stx:copy

<!-- Category: template -->
<stx:copy
  attributes = pattern>
<!-- Content: template -->
</stx:copy>

The stx:copy element is used to copy the current node to the output. The optional attributes attribute contains a match pattern. The attributes of the current node that match the pattern are copied to the output.

Thus, attributes="@*" copies all attributes, attributes="@foo|@bar" copies only the foo and bar attributes, attributes="@*[not(name()='foo')]" copies all attributes except foo, and attributes="none" doesn't copy any attributes. The default is to copy all attributes.

4.7 Processing Nested Events

stx:process-children

<!-- Category: template -->
<stx:process-children
  mode = qname/>

This element splits a template into two parts, processed on the starting and ending SAX2 events of a pair (start-element, end-element). A template matching an element event must contain at most one stx:process-children element. Moreover, a template must not contain both stx:process-children and stx:process-self; an error must be reported otherwise. This element must always be empty.

Note:

If a template doesn't contain any stx:process-children instruction, the children of the matched element are not processed at all. The default rule (stx:options no-match-events="ignore"|"copy"|"text") applies only to nodes that are to be processed but for which no matching template is found.

The optional mode attribute restricts the templates that may match child events to those with the same mode (the same value of the mode attribute).

4.8 Processing Attributes

stx:process-attributes

<!-- Category: template -->
<stx:process-attributes
  mode = qname/>

This instruction is used to apply templates to attribute children of an element node. The stx:process-attributes element must always be empty.

The optional mode attribute restricts the templates that may match attribute children to those with the same mode (the same value of the mode attribute).

4.9 Running Overridden Templates

stx:process-self

<!-- Category: template -->
<stx:process-self
  mode = qname/>

This instruction is used to process the current node using the template that would have been chosen if the current template were not present in the stylesheet. A template must contain at most one stx:process-self element. Moreover, a template must not contain both stx:process-children and stx:process-self; an error must be reported otherwise. The stx:process-self element must always be empty.

The optional mode attribute restricts the matching templates to those with the same mode (the same value of the mode attribute).

4.10 Outputting Strings

stx:value-of

<!-- Category: template -->
<stx:value-of
  select = string-expression/>

This instruction emits characters to the result stream. The mandatory select attribute contains an STXPath expression evaluating to a string. This element must always be empty.

stx:text

<!-- Category: template -->
<stx:text>
<!-- Content: #PCDATA -->
</stx:text>

This instruction emits literal character data to the result stream. The content is neither normalized nor stripped, even if it contains only whitespace characters. Results of consecutive stx:value-of and stx:text instructions are joined so that they emit a single character() event.

stx:cdata

<!-- Category: template -->
<stx:cdata>
<!-- Content: #PCDATA -->
</stx:cdata>

This instruction emits literal data as a CDATA section to the result stream. The content is neither normalized nor stripped, even if it contains only whitespace characters.

4.11 Outputting Elements and Attributes

stx:element

<!-- Category: template -->
<stx:element
  name = {qname}
  namespace = {uri-reference}>
<!-- Content: template -->
</stx:element>

This instruction is used to generate an element. It has the same meaning as in [XSLT].

stx:element-start

<!-- Category: template -->
<stx:element-start
  name = {qname}
  namespace = {uri-reference}/>

stx:element-end

<!-- Category: template -->
<stx:element-end
  name = {qname}
  namespace = {uri-reference}/>

Separate instructions are available to output an element start tag and an element end tag. The name attribute is required for both instructions. Both elements must be empty.

A compliant STX processor is required to produce well-formed XML output. An attempt to create an end tag without a matching start tag must be reported as an error by the STX processor.

stx:attribute

<!-- Category: template -->
<stx:attribute
  name = {qname}
  namespace = {uri-reference}
  select = string-expression>
<!-- Content: text template -->
</stx:attribute>

This instruction is used to generate an attribute. It has the same meaning as in [XSLT]. stx:attribute must follow an element-starting instruction (stx:element, stx:element-start, stx:copy, or a literal element) and no other output-generating instructions are allowed between the element-starting instruction and stx:attribute.
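The well-formedness requirement for stx:element-start and stx:element-end, and the placement rule for stx:attribute, can be enforced with a stack of open elements. This Python sketch is hypothetical: ResultWriter is not an API from this specification, literal elements and stx:copy are assumed to go through element_start, and the final check for unclosed elements is an extrapolation from the requirement to produce well-formed output.

```python
from typing import List, Tuple


class ResultWriter:
    def __init__(self) -> None:
        self.open: List[str] = []                 # names of currently open elements
        self.events: List[Tuple[str, str]] = []

    def element_start(self, name: str) -> None:
        self.open.append(name)
        self.events.append(("start", name))

    def attribute(self, name: str, value: str) -> None:
        # Only allowed right after an element start (or after another attribute).
        if not self.events or self.events[-1][0] not in ("start", "attribute"):
            raise ValueError(f"attribute '{name}' does not follow an element start")
        self.events.append(("attribute", f"{name}={value}"))

    def text(self, data: str) -> None:
        self.events.append(("text", data))

    def element_end(self, name: str) -> None:
        if not self.open or self.open[-1] != name:
            raise ValueError(f"end tag '{name}' has no matching start tag")
        self.open.pop()
        self.events.append(("end", name))

    def finish(self) -> None:
        if self.open:
            raise ValueError(f"unclosed elements: {self.open}")


w = ResultWriter()
w.element_start("list")
w.attribute("id", "5")
w.text("items")
w.element_end("list")
w.finish()
print(w.events)
# [('start', 'list'), ('attribute', 'id=5'), ('text', 'items'), ('end', 'list')]
```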

4.12 Outputting Other Nodes

stx:processing-instruction

<!-- Category: template -->
<stx:processing-instruction
  name = ncname>
<!-- Content: text template -->
</stx:processing-instruction>

This instruction is used to generate a processing instruction. The mandatory name attribute is an attribute value template.

stx:comment

<!-- Category: template -->
<stx:comment>
<!-- Content: text template -->
</stx:comment>

This instruction is used to generate a comment. It has the same meaning as in [XSLT].

4.13 Conditions

stx:if

<!-- Category: template -->
<stx:if
  test = boolean-expression>
<!-- Content: template -->
</stx:if>

The mandatory test attribute contains an STXPath expression evaluating to boolean. The content template is instantiated if and only if the test attribute has evaluated to true.

stx:else

<!-- Category: template -->
<stx:else>
<!-- Content: template -->
</stx:else>

This instruction must immediately follow stx:if; otherwise an error must be reported. The content template is instantiated if and only if the test attribute of the preceding stx:if instruction has evaluated to false.

stx:choose

<!-- Category: template -->
<stx:choose>
  <stx:when
    test = boolean-expression>
  <!-- Content: template -->
  </stx:when>+
  <stx:otherwise>
  <!-- Content: template -->
  </stx:otherwise>?
</stx:choose>

This instruction has the same meaning as in [XSLT].

4.14 Loops

stx:for-each

<!-- Category: template -->
<stx:for-each
  select = expression>
<!-- Content: template -->
</stx:for-each>

The stx:for-each instruction contains a template that is instantiated for each member of the sequence specified in the select attribute.

5 Data Types

5.1 Atomic Types

There are four atomic data types in STX:

string
number
boolean
node

There are seven types of node recognized in STXPath. For each type of node, this section specifies how its string-value is determined. Since descendants are not available at the time of processing, the string-value is not defined for some types of nodes.

root nodes - there is no string value defined for root nodes, an error is reported
element nodes - there is no string value defined for element nodes, an error is reported
attribute nodes - the string-value of an attribute is the normalized value of this attribute
text nodes - the string-value of a text node is the character data of this node
cdata nodes - the string-value of a cdata node is the character data of this node
processing instruction nodes - the string-value of a processing instruction node is the part of the processing instruction following the target and any whitespace not including the terminating ?>
comment nodes - the string-value of a comment is the content of this comment not including the opening <!-- and the closing -->

5.2 Sequences

STXPath expressions (see 6 Expressions) always return a sequence. A sequence is an ordered collection of zero or more items. Unlike common lists, sequences are "flat"; sequences may not contain other sequences. Sequences may contain duplicate items. An item must be of one of the atomic types: string, number, boolean, or node.

A sequence with zero items is called an empty sequence. A sequence with exactly one item is called a singleton sequence. There is no distinction between an item and a singleton sequence containing this item; an item is equivalent to a singleton sequence containing this item and vice versa. A sequence has no identity. Equality comparison of sequences is performed only by comparing items of the sequences.
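The flat nature of sequences can be illustrated with a small Python helper that splices nested sequences into their parent. Modeling sequences as Python lists is an assumption of this sketch, and the sequence function is hypothetical.

```python
from typing import Any, List


def sequence(*parts: Any) -> List[Any]:
    """Build a flat sequence: nested sequences are spliced in, never nested."""
    result: List[Any] = []
    for part in parts:
        if isinstance(part, list):
            result.extend(sequence(*part))  # flatten recursively
        else:
            result.append(part)             # an item acts as a singleton sequence
    return result


print(sequence(1, [2, [3, "a"]], []))   # [1, 2, 3, 'a']
print(sequence(7) == sequence([7]))     # True: an item equals its singleton sequence
print(sequence(1, 1))                   # [1, 1]: duplicates are allowed
```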

5.3 Type Conversions

Certain operators, functions, and syntactic constructs expect a value of a particular type to be supplied: this type is referred to as a required type. In such a case, a general sequence is converted to the required type according to the conversion rules below.

The empty sequence is converted to required types as defined in the following table:

required type	empty sequence
boolean	`false`
string	empty string
number	`NaN`
node	ERROR

A singleton sequence is converted to a required type according to the type of the only member of the sequence:

member type	to boolean	to string	to number	to node
boolean	-	`false` is converted to 'false', `true` is converted to 'true'	`false` is converted to 0, `true` is converted to 1	ERROR
string	'false', '0', and the empty string are converted to `false`; other strings are converted to `true`	-	a string that consists of optional whitespace followed by an optional minus sign followed by a numeric literal (see 6.2 Literals) followed by optional whitespace is converted to the number that is nearest to the mathematical value represented by the string; any other string is converted to `NaN`	ERROR
number	0, +0, -0, and `NaN` are converted to `false`; other numbers are converted to `true`	`NaN` is converted to 'NaN', +0 and -0 are converted to '0', positive infinity is converted to 'Infinity', negative infinity is converted to '-Infinity'. Other numbers are represented in decimal form as a numeric literal (see 6.2 Literals) with no leading zeros (apart possibly from the one required digit immediately before the decimal point), preceded by a minus sign (-) if the number is negative.	-	ERROR
node	a node is converted to `true`	a node is converted to its string-value (see 5.1 Atomic Types)	a node is converted to its string-value (see 5.1 Atomic Types); then the rules for converting strings to numbers are applied	-

A sequence containing more than one item is converted according to its very first item; all other items are ignored. The same conversion rules as for singleton sequences are applied (see the table above).
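The conversion rules above can be collected into a Python sketch. Sequences are modeled as lists; strings, numbers and booleans as Python str, float and bool; and nodes as a hypothetical Node class that only carries its string-value. The numeric-literal syntax of 6.2 is assumed to match XPath 1.0 numbers (digits with an optional fraction, or a leading decimal point).

```python
import math
import re
from dataclasses import dataclass
from decimal import Decimal
from typing import List, Union


@dataclass
class Node:
    string_value: str            # hypothetical: the string-value defined in 5.1


Item = Union[bool, float, str, Node]
WS = " \t\r\n"
NUMERIC = re.compile(r"[ \t\r\n]*-?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)[ \t\r\n]*")


def to_boolean(seq: List[Item]) -> bool:
    if not seq:
        return False
    item = seq[0]                              # only the first item counts
    if isinstance(item, bool):
        return item
    if isinstance(item, (int, float)):
        return not (item == 0 or math.isnan(item))
    if isinstance(item, str):
        return item not in ("false", "0", "")
    return True                                # a node converts to true


def number_to_string(x: float) -> str:
    if math.isnan(x):
        return "NaN"
    if math.isinf(x):
        return "Infinity" if x > 0 else "-Infinity"
    if x == int(x):
        return str(int(x))                     # covers +0 and -0 as "0"
    return format(Decimal(repr(x)), "f")       # decimal form, no exponent


def to_string(seq: List[Item]) -> str:
    if not seq:
        return ""
    item = seq[0]
    if isinstance(item, bool):
        return "true" if item else "false"
    if isinstance(item, (int, float)):
        return number_to_string(float(item))
    if isinstance(item, Node):
        return item.string_value
    return item


def to_number(seq: List[Item]) -> float:
    if not seq:
        return math.nan
    item = seq[0]
    if isinstance(item, bool):
        return 1.0 if item else 0.0
    if isinstance(item, (int, float)):
        return float(item)
    text = item.string_value if isinstance(item, Node) else item
    return float(text.strip(WS)) if NUMERIC.fullmatch(text) else math.nan


def to_node(seq: List[Item]) -> Node:
    if not seq or not isinstance(seq[0], Node):
        raise TypeError("only a node converts to a node")
    return seq[0]


print(to_string([0.5, "ignored"]), to_number([" -12.5 "]), to_boolean(["0"]))
# 0.5 -12.5 False
```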

5.4 Tree Fragments

See Issue 1.

6 Expressions

STX uses an expression language of its own called STXPath. At first sight, STXPath is very similar to [XPath]. Syntactically, STXPath is close to a subset of [XPath2]. However, since STX has a different notion of context, the meaning of some expressions may be different in STXPath and in XPath. Consider the following example:

In XPath, the expression /node1/node2 returns a node-set containing all node2 elements whose parent node1 is the document element. In STXPath, by contrast, the same expression returns at most one node from this node-set: the one which is an ancestor of the current node.
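The difference can be sketched by evaluating an absolute path directly against the ancestor stack. In this hypothetical Python model the stack holds element names from the document element down to the current node, only plain name steps and * are supported, and a path as long as the stack is assumed to select the current node itself.

```python
from typing import List


def eval_absolute_path(path: str, ancestor_stack: List[str]) -> List[int]:
    """Return [depth] of the single node the path selects, or [] if none."""
    steps = path.strip("/").split("/")
    if len(steps) > len(ancestor_stack):
        return []                      # the path goes deeper than the current node
    for step, name in zip(steps, ancestor_stack):
        if step != "*" and step != name:
            return []                  # this ancestor does not match the step
    return [len(steps) - 1]            # depth 0 is the document element


stack = ["node1", "node2", "item"]     # current node: item
print(eval_absolute_path("/node1/node2", stack))   # [1]
print(eval_absolute_path("/node1/other", stack))   # []
```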

Expressions are used in STX as match patterns, to specify conditions for processing the current node in different ways, to generate text to be inserted into the output stream, or to access data from the ancestor stack.

Each expression has its static context - the information that is available during static analysis of the expression, prior to its evaluation. The static context includes in-scope namespaces, default namespace for element names, and in-scope variables. The information that is available at the time when the expression is evaluated is the current context as defined in 2.1 Context.

Basic primitives of STXPath include:

variables (6.1 Variables)
literals (6.2 Literals)
parenthesized expressions (6.3 Parenthesized Expressions)
functions (6.4 Functions)

Expressions evaluate to string, number, boolean, or node-set. See the standalone STXPath grammar BNF definition for details.

6.1 Variables

STX variables are scoped statically according to the literal structure of stylesheets. Grouping of templates makes it possible to share variables other than global ones.

There are two types of variables:

group variables - declared by an stx:variable element that is a child of either stx:transform or stx:group. Top-level variables are considered to be members of the top-most default group that exists for each stylesheet.
local variables - Declared within templates.

A group variable is visible for the group where the variable is declared, for all descendant groups and for all templates belonging to these groups. A local variable is visible for all following siblings of the variable declaration and their descendants. Group variables may be shadowed (another variable with the same name is visible) by descendant group variables and by local variables. It is an error to redeclare a variable with the same name in the same group or template.

Variables always contain a sequence. The STX instructions stx:variable and stx:assign evaluate an expression and store its value in a variable.

Because variables are re-assignable, each variable must be declared with the stx:variable element before it is used (assigned or referenced). Group variables are initialized statically while the stylesheet is parsed; only static context information is available during this initialization. Local variables are initialized at run time. A variable declared with no value is initialized with the empty sequence.

stx:variable

<!-- Category: top-level or group or template -->
<stx:variable
  name = qname
  select = expression
  keep-value = "yes"|"no">
<!-- Content: text template -->
</stx:variable>

This instruction declares and initializes a variable. The mandatory name attribute contains the name of the variable. The expression in the select attribute is evaluated, and the variable is initialized with its result. The select attribute is optional; if it is missing, the variable is initialized with the string resulting from the content of the stx:variable element. If the content is empty (the stx:variable element has no children), the variable is initialized with the empty sequence.

The optional keep-value attribute specifies whether a new copy of the variable created by recursion is initialized with the value of the shadowed variable (yes) or not (no). The default value is no. If there is no shadowed variable yet, the keep-value attribute is ignored.

stx:assign

<!-- Category: top-level or group or template -->
<stx:assign
  name = qname
  select = expression>
<!-- Content: text template -->
</stx:assign>

This instruction assigns a new value to a previously declared variable. The mandatory name attribute contains the name of the variable. The expression in the select attribute is evaluated, and its result is assigned to the variable. If select is missing, the string resulting from the content of the stx:assign element is assigned to the variable. If the content is empty, the empty sequence is assigned to the variable.
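Both stx:variable and stx:assign pick the stored value by the same three-way rule. The sketch below restates that rule; initial_value is a hypothetical helper, the caller is assumed to have already evaluated select and the element content, and preferring select when both are present is an assumption, since the text does not cover that case.

```python
from typing import Optional


def initial_value(select_result: Optional[list], content: Optional[str]) -> list:
    """Value stored by stx:variable or stx:assign (sketch).

    select_result: the evaluated select attribute, or None if it is absent.
    content: the string produced by the element's content, or None if the
             element has no children.
    """
    if select_result is not None:
        return select_result      # assumption: select wins if both are given
    if content is None:
        return []                 # no select, no children: the empty sequence
    return [content]              # the content string as a singleton sequence
```

Note that a select expression which evaluates to the empty sequence still counts as present: initial_value([], "x") returns [].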

6.2 Literals

A literal is a direct syntactic representation of an atomic value. STXPath supports two kinds of literals: string literals and numeric literals.

The value of a string literal is a singleton sequence containing an item whose atomic type is string and whose value is the string denoted by the characters between the delimiting quotation marks.

StringLiteral      ::= (["][^"]*["]) | (['][^']*['])

The value of a numeric literal is a singleton sequence containing an item whose type is number and whose value is obtained by parsing the numeric literal according to the rules for string-to-number conversion (see 5.3 Type Conversions).

NumericLiteral ::= IntegerLiteral | DecimalLiteral | DoubleLiteral

IntegerLiteral ::= Digits
DecimalLiteral ::= ('.' Digits) | (Digits '.' [0-9]*)
DoubleLiteral  ::= (('.' Digits) | (Digits ('.' [0-9]*)?))([e]|[E])([+]|[-])? Digits
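The literal productions translate directly into regular expressions, which makes it easy to check which production a token belongs to. This is a sketch only: Digits is not defined in this section, so it is assumed here to be [0-9]+ as in XPath 2.0, and classify is a hypothetical helper. Because fullmatch is used and the productions are disjoint (only DoubleLiteral contains an exponent), the order of the checks does not matter.

```python
import re

DIGITS = r"[0-9]+"  # assumption: Digits ::= [0-9]+, as in XPath 2.0
INTEGER = DIGITS
DECIMAL = rf"(?:\.{DIGITS}|{DIGITS}\.[0-9]*)"
DOUBLE = rf"(?:\.{DIGITS}|{DIGITS}(?:\.[0-9]*)?)[eE][+-]?{DIGITS}"
STRING = r"(?:\"[^\"]*\"|'[^']*')"


def classify(token: str) -> str:
    """Name the literal production that the whole token matches (sketch)."""
    for kind, pattern in (("DoubleLiteral", DOUBLE),
                          ("DecimalLiteral", DECIMAL),
                          ("IntegerLiteral", INTEGER),
                          ("StringLiteral", STRING)):
        if re.fullmatch(pattern, token):
            return kind
    return "no match"
```

For example, "3." is a DecimalLiteral, "1.5E-3" a DoubleLiteral, and "1e" matches no production because the exponent needs at least one digit.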

6.3 Parenthesized Expressions

Parentheses may be used to enforce a particular evaluation order in expressions that contain multiple operators.

Parentheses are also used as delimiters in constructing a sequence, as described in 6.6 Sequence Expressions.

6.4 Functions

A function call consists of a function name followed by a parenthesized list of zero or more expressions. The expressions inside the parentheses provide the arguments of the function call. The number of arguments must be equal to the number of function parameters; otherwise a static error is raised.

A function call is evaluated as follows:

Each argument expression is evaluated, producing an argument value (sequence).
If the corresponding function parameter has a required type, the argument value is converted to this type.
The function is executed using the converted argument values. The result is a value of the function's declared return type.

The following STXPath functions are categorized by the required types of their primary arguments:

6.4.1 Sequence Functions

Function: boolean empty(sequence)

The empty() function returns true if the argument is the empty sequence; otherwise it returns false.

Function: item item-at(sequence, number)

The item-at() function returns the item from the first argument sequence at the position given by the second argument. The index number is rounded to the nearest integer if necessary. If the sequence is the empty sequence, this function returns the empty sequence. If the value of index is greater than the number of items in the sequence, or is less than or equal to zero, then the function reports an error.

Function: sequence sublist(sequence, number, number?)

The sublist() function returns the contiguous sequence of items from the first argument (source sequence) beginning at the position specified by the second argument (index) and continuing for the number of items indicated by the third argument (length). If length is not specified, then the sublist identifies items to the end of the source sequence. The index and length numbers are rounded to the nearest integers if necessary. If the source sequence is the empty sequence, this function returns the empty sequence. If the value of index is greater than the number of items in the sequence, or is less than or equal to zero, then the function reports an error. The length can be greater than the number of items in the source sequence following the beginning position, in which case the sublist identifies items to the end of the source sequence.

Function: number count(sequence)

The count() function returns the number of items in the sequence.
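The index rules of item-at() and sublist() are easy to get wrong at the boundaries, so here is a minimal model of the four sequence functions. It is a sketch under stated assumptions: sequences are Python lists, STXError is a hypothetical stand-in for "the function reports an error", an item is returned as a singleton list (as in the XPath 2.0 data model), rounding is assumed to follow round() in 6.4.5, and a non-positive length is assumed to yield the empty sequence, which the text does not specify.

```python
import math
from typing import Optional


class STXError(Exception):
    """Hypothetical stand-in for 'the function reports an error'."""


def _round(x: float) -> int:
    # Nearest integer, ties toward the greater one (assumed rule, see round()).
    f = math.floor(x)
    return f + 1 if x - f >= 0.5 else f


def empty(seq: list) -> bool:
    return len(seq) == 0


def count(seq: list) -> int:
    return len(seq)


def _check_index(seq: list, index: float) -> int:
    i = _round(index)
    if i <= 0 or i > len(seq):
        raise STXError(f"index {i} outside 1..{len(seq)}")
    return i


def item_at(seq: list, index: float) -> list:
    if not seq:
        return []                      # empty sequence in, empty sequence out
    i = _check_index(seq, index)
    return [seq[i - 1]]                # positions are 1-based


def sublist(seq: list, index: float, length: Optional[float] = None) -> list:
    if not seq:
        return []
    start = _check_index(seq, index) - 1
    if length is None:
        return seq[start:]             # to the end of the source sequence
    # A length running past the end is allowed; slicing simply stops there.
    return seq[start:start + max(0, _round(length))]
```

For instance, sublist([1, 2, 3, 4], 2, 10) yields [2, 3, 4], whereas item_at([1], 0) raises an error because index 0 is out of range.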

6.4.2 Node Functions

Function: string name(node)

The name() function returns a string containing a qualified name representing the expanded-name of the node in the argument.

Function: string namespace(node)

The namespace() function returns the namespace URI of the expanded-name of the node in the argument.

Function: string local-name(node)

The local-name() function returns the local part of the expanded-name of the node in the argument.

Function: string prefix(node)

The prefix() function returns the prefix of the expanded-name of the node in the argument.

Function: number position()

The position() function returns a number equal to the position of the current node relative to its siblings; see 2.1 Context for details of position() semantics.

Function: node get-node(number)

The get-node() function returns the node at the level of the ancestor stack given by the argument. The level number is rounded to the nearest integer if necessary. For example, get-node(0) returns the root node of the document, get-node(1) returns the document element, and get-node(level()) returns the current node. If there is no node at the requested level in the ancestor stack, the function returns the empty sequence.
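The ancestor stack can be pictured as a list whose index is the level: the document root at 0, the document element at 1, and the current node on top. The sketch below models get-node() together with level() from 6.4.6 under that assumption; nodes are represented only by hypothetical names, and the rounding rule is assumed to match round() in 6.4.5.

```python
import math
from typing import List


def _round(x: float) -> int:
    # Nearest integer, ties toward the greater one (assumed to match round()).
    f = math.floor(x)
    return f + 1 if x - f >= 0.5 else f


def level(stack: List[str]) -> int:
    # The root sits at index 0, so level(/) is 0 and the current node is on top.
    return len(stack) - 1


def get_node(stack: List[str], lvl: float) -> List[str]:
    n = _round(lvl)
    if 0 <= n < len(stack):
        return [stack[n]]
    return []                         # no node at that level: the empty sequence


# Hypothetical ancestor stack while the processor is at /doc/chapter/para.
stack = ["/", "doc", "chapter", "para"]
```

With this stack, get_node(stack, level(stack)) returns ["para"], the current node, and get_node(stack, 9) returns the empty sequence.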

Function: boolean has-child-nodes()

The has-child-nodes() function returns true if and only if the current node is the document node or an element node and has child nodes (it is not empty). It returns false otherwise.

6.4.3 Boolean Functions

Function: boolean true()

The true() function always returns true.

Function: boolean false()

The false() function always returns false.

Function: boolean not(sequence)

The not() function reduces its parameter to an effective boolean value using the same rules that are used for the operands of logical expressions (see 6.9 Logical Expressions). It then returns true if the effective boolean value of its parameter is false, and false if the effective boolean value of its parameter is true.

6.4.4 String Functions

Function: boolean starts-with(string, string)

The starts-with() function returns true if the first argument string starts with the second argument string, otherwise it returns false. If the value of any argument is the empty sequence, the function returns the empty sequence.

Function: boolean contains(string, string)

The contains() function returns true if the first argument string contains the second argument string; otherwise it returns false. If the value of any argument is the empty sequence, the function returns the empty sequence.

Function: string substring(string, number, number?)

The substring() function returns the part of the first argument string that starts at the offset given by the second argument and contains the number of characters given by the third argument, or all characters from the offset to the end of the string if the third argument is omitted. The offset and length are rounded to the nearest integer if necessary. The offset of the first character is 1. If the value of any argument is the empty sequence, the function returns the empty sequence.
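Rounding, 1-based offsets, and empty-sequence propagation interact in substring(), so a small model helps. This is a sketch, not a reference implementation: arguments are modeled as lists ([] is the empty sequence, None an omitted third argument), rounding is assumed to follow round() in 6.4.5, and offsets below 1 are assumed to behave as in XPath 1.0, where positions before the first character simply contribute nothing.

```python
import math
from typing import Optional


def _round(x: float) -> int:
    # Nearest integer, ties toward the greater one (assumed rule, see round()).
    f = math.floor(x)
    return f + 1 if x - f >= 0.5 else f


def substring(s: list, offset: list, length: Optional[list] = None) -> list:
    """Arguments are sequences: [] is the empty sequence, None an omitted argument."""
    if not s or not offset or length == []:
        return []                               # an empty-sequence argument
    text, start = s[0], _round(offset[0])
    first = max(start, 1)                       # assumption: positions below 1 are skipped
    if length is None:
        return [text[first - 1:]]               # to the end of the string
    end = start + _round(length[0])             # first position after the result
    return [text[first - 1:max(end - 1, first - 1)]]
```

For example, an offset of 1.5 and a length of 2.6 round to 2 and 3, so substring(["12345"], [1.5], [2.6]) gives ["234"].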

Function: string substring-before(string, string)

The substring-before() function returns the part of the first argument string from the beginning of the string up to (but not including) the first occurrence of the second argument string. The empty string is returned if the first argument string does not contain the second argument string. If the value of any argument is the empty sequence, the function returns the empty sequence.

Function: string substring-after(string, string)

The substring-after() function returns the part of the first argument string from the end of the first occurrence of the second argument string to the end of the (first) string. The empty string is returned if the first argument string does not contain the second argument string. If the value of any argument is the empty sequence, the function returns the empty sequence.

Function: number string-length(string)

The string-length() function returns the number of characters in a string. If the value of the argument is the empty sequence, the function returns the empty sequence.

Function: string normalize-space(string)

The normalize-space() function returns the argument string with leading and trailing whitespace stripped and each sequence of consecutive whitespace characters replaced with a single space. If the value of the argument is the empty sequence, the function returns the empty sequence.

Function: string translate(string, string, string)

The translate() function returns the first argument string with occurrences of characters in the second argument string replaced by the corresponding characters from the third argument string. If there is a character in the second argument string with no character at a corresponding position in the third argument string (because the second argument string is longer than the third argument string), then occurrences of that character in the first argument string are removed. If a character occurs more than once in the second argument string, then the first occurrence determines the replacement character. If the third argument string is longer than the second argument string, then excess characters are ignored. If the value of any argument is the empty sequence, the function returns the empty sequence.
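The four translate() rules (replace, delete, first occurrence wins, excess replacement characters ignored) can be expressed as a lookup table built once from the second and third arguments. The sketch below does this explicitly rather than through Python's str.maketrans, so that each rule is visible; None stands in for the empty sequence, and translate is used here only as an illustrative Python name.

```python
from typing import Dict, Optional


def translate(s: Optional[str], src: Optional[str], dst: Optional[str]) -> Optional[str]:
    """None stands in for the empty sequence (sketch)."""
    if s is None or src is None or dst is None:
        return None
    mapping: Dict[str, Optional[str]] = {}
    for i, ch in enumerate(src):
        if ch in mapping:
            continue                       # the first occurrence decides
        # No partner in dst means the character is deleted (None);
        # characters of dst beyond len(src) are never looked at.
        mapping[ch] = dst[i] if i < len(dst) else None
    out = []
    for ch in s:
        if ch not in mapping:
            out.append(ch)                 # untouched character
        elif mapping[ch] is not None:
            out.append(mapping[ch])        # replaced character
        # otherwise the character is removed
    return "".join(out)
```

For example, translate("--aaa--", "abc-", "ABC") returns "AAA": a maps to A, and - has no partner in the third argument, so it is removed.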

Function: string concat(string, string?)

The concat() function returns the concatenation of its arguments. If the value of any argument is the empty sequence, the function returns the empty sequence.

Function: string replace(string, string, string, string?)

The replace() function returns the first argument string with the parts that match the regular expression given in the second argument string replaced with the third argument string. The regular expression semantics defined in XML Schema Part 2: Datatypes ([XSD2]), Appendix F, are used.

The fourth optional argument is a string consisting of character flags to be used by the match. If a character is present then that flag is true. The flags are:

g - global replace
All occurrences of the regular expression in the string are replaced. If this character is not present, then only the first occurrence of the regular expression is replaced.
i - case insensitive
The regular expression is treated as case insensitive. If this character is not present, then the regular expression is case sensitive.

If the value of any argument is the empty sequence, the function returns the empty sequence.
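The g and i flags map naturally onto a single regex substitution call. The sketch below uses Python's re module as a stand-in: its syntax is not identical to the XML Schema regular expressions required by [XSD2] Appendix F, so it only illustrates the flag handling. stx_replace is a hypothetical name, None stands in for the empty sequence, and the replacement is inserted literally (via a function), on the reading that the third argument is plain text rather than a template with group references.

```python
import re
from typing import Optional


def stx_replace(s: Optional[str], pattern: Optional[str],
                replacement: Optional[str], flags: Optional[str] = "") -> Optional[str]:
    """Sketch of replace(); Python re stands in for XSD regular expressions."""
    if s is None or pattern is None or replacement is None or flags is None:
        return None                           # an empty-sequence argument
    re_flags = re.IGNORECASE if "i" in flags else 0
    count = 0 if "g" in flags else 1          # re.sub: count=0 replaces all matches
    # A function replacement keeps backslashes in the third argument literal.
    return re.sub(pattern, lambda m: replacement, s, count=count, flags=re_flags)
```

Without g, stx_replace("a-b-c", "-", "+") changes only the first hyphen, giving "a+b-c"; with "g" it gives "a+b+c".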

Function: sequence match(string, string, string?)

The match() function locates the substring of the first argument string that is matched by the regular expression given in the second argument string. If no substring of the first string matches the regular expression, the empty sequence is returned. Otherwise, a sequence of two integers is returned: the first integer is the position of the start of the matching substring and the second integer is its length. The regular expression semantics defined in XML Schema Part 2: Datatypes ([XSD2]), Appendix F, are used.

The third optional argument is a string consisting of character flags to be used by the match. If a character is present then that flag is true. The flags are:

g - global match
All occurrences of the regular expression in the string are matched. If this character is not present, then only the first occurrence of the regular expression is matched.
i - case insensitive
The regular expression is treated as case insensitive. If this character is not present, then the regular expression is case sensitive.

If the value of any argument is the empty sequence, the function returns the empty sequence.
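The shape of the match() result, a start position and a length, can be sketched with Python's re.search as a stand-in for XSD regular expressions. Assumptions to note: stx_match is a hypothetical name, positions are assumed to be 1-based like substring() offsets, only the first match is reported because the text does not say how several matches would be returned under the g flag, and every empty result, including one caused by an empty-sequence argument, is the empty list.

```python
import re
from typing import List, Optional


def stx_match(s: Optional[str], pattern: Optional[str], flags: str = "") -> List[int]:
    """Sketch of match(): [start, length] of the first match, or []."""
    if s is None or pattern is None:
        return []                           # an empty-sequence argument
    m = re.search(pattern, s, re.IGNORECASE if "i" in flags else 0)
    if m is None:
        return []                           # no match: the empty sequence
    # Assumption: 1-based start position, like substring() offsets.
    return [m.start() + 1, m.end() - m.start()]
```

Under these assumptions, stx_match("hello world", "o w") returns [5, 3].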

6.4.5 Numerical Functions

Function: number floor(number)

The floor() function returns the largest number that is not greater than the argument and that is an integer. If the value of the argument is the empty sequence, the function returns the empty sequence.

Function: number ceiling(number)

The ceiling() function returns the smallest number that is not less than the argument and that is an integer. If the value of the argument is the empty sequence, the function returns the empty sequence.

Function: number round(number)

The round() function returns the number that is closest to the argument and that is an integer. If there are two such numbers, then the greater one is returned. If the argument is NaN, then NaN is returned. If the value of the argument is the empty sequence, the function returns the empty sequence.

Function: number sum(sequence)

The sum() function converts each item in the argument sequence to a number and returns the sum of the results. If the value of the argument is the empty sequence, the function returns the empty sequence.
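The tie-breaking rule of round() differs from the round-half-to-even behavior of some host languages, including Python's built-in round(), where round(2.5) is 2. The sketch below follows the rule stated above and also models sum(). stx_round and stx_sum are hypothetical names, and to_number is only a rough stand-in for 5.3 Type Conversions that maps unparseable values to NaN.

```python
import math
from typing import List


def stx_round(x: float) -> float:
    if math.isnan(x) or math.isinf(x):
        return x                                # NaN (and infinities) pass through
    f = math.floor(x)
    return float(f + 1 if x - f >= 0.5 else f)  # ties go to the greater integer


def to_number(item) -> float:
    # Rough stand-in for 5.3 Type Conversions: unparseable values become NaN.
    try:
        return float(item)
    except (TypeError, ValueError):
        return math.nan


def stx_sum(seq: List) -> List[float]:
    if not seq:
        return []                               # empty sequence in, empty sequence out
    return [sum(to_number(item) for item in seq)]
```

Note that "the greater one" means -2.5 rounds to -2, not -3.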

6.4.6 Other Functions

Function: string string(sequence)

The string() function returns the result of converting the argument to a string. See 5.3 Type Conversions for details.

Function: number number(sequence)

The number() function returns the result of converting the argument to a number.

Function: boolean boolean(sequence)

The boolean() function returns the result of converting the argument to a boolean.

Function: number level(node?)

The level() function returns the level of the argument node in the ancestor stack. level() and level(.) return the level of the current node. level(/) returns 0. If the value of the argument is the empty sequence, the function returns the empty sequence.

6.5 Data Accessors

The only data available when processing the current node are the data related to the current node, the data related to the next node, and the data related to nodes in the ancestor stack. Location paths called data accessors are used to access this data. Axes in data accessors are limited to:

parent and ancestor axes in relative location paths
child and descendant axes (abbreviated syntax only) in absolute location paths
attribute axis (abbreviated syntax only)
text() node test (child axis) for the current node

Predicates are not allowed in data accessors.

A data accessor always returns a sequence (often a singleton). These sequences are very limited: they can contain only nodes stored in the ancestor stack (the current node and its attributes, and ancestor elements and their attributes) and the next node (only if it is a text node, accessed with text()). The resulting sequences can be either passed to functions operating on sequences or converted to string, number, or boolean.

Here are some examples of data accessors:

. - returns the current node
text() - returns the first text child of the current node provided it is the very first child of the current node
parent::* - returns the parent node of the current node
ancestor::* - returns a sequence whose items are all ancestors of the current node
@foo - returns the foo attribute of the current node
ancestor::*/@bar - returns a sequence of bar attributes of ancestors of the current node
/aaa/bbb - returns the bbb element from the ancestor stack that is a child of the aaa element, which is the root element of the ancestor stack (and hence the root element of the input document)
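Because an absolute data accessor can only walk down the ancestor stack, it yields at most one node per path, unlike the XPath node-set discussed at the start of this section. The sketch below resolves an abbreviated absolute path against a stack of element names. absolute_path is a hypothetical helper, nodes are represented only by their names, and * is treated as a name wildcard.

```python
from typing import List


def absolute_path(stack: List[str], steps: List[str]) -> List[str]:
    """Resolve an abbreviated absolute path such as /aaa/bbb (sketch).

    stack[0] is the document root, stack[1] the document element, and so on.
    """
    for depth, name in enumerate(steps, start=1):
        if depth >= len(stack) or (name != "*" and stack[depth] != name):
            return []                   # the path leaves the ancestor stack
    return [stack[len(steps)]]          # at most one node can match


stack = ["/", "aaa", "bbb", "ccc"]      # hypothetical: current node is ccc
```

With this stack, /aaa/bbb resolves to ["bbb"], while /aaa/xxx and any path deeper than the current node resolve to the empty sequence.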

6.6 Sequence Expressions

STXPath supports operators to construct and combine sequences. One way to construct a sequence is using a parenthesized expression (6.3 Parenthesized Expressions), which is zero or more expressions separated with the comma operator and delimited with parentheses. The parenthesized expression is evaluated by evaluating each of its constituent expressions and concatenating the resulting sequences, in order, into a single result sequence.

Here are some examples of expressions that construct sequences:

This expression is a sequence of five integers:

(10, 1, 2, 3, 4)

This expression constructs one sequence from the sequences 10, (1, 2), the empty sequence (), and (3, 4):

(10, (1, 2), (), (3, 4))

It evaluates to the sequence (10, 1, 2, 3, 4).
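Since sequences never nest, the comma operator only has to splice each operand's items into the result in order. A minimal sketch, with sequences modeled as Python lists and seq as a hypothetical name for the comma operator:

```python
def seq(*parts) -> list:
    """Comma operator: concatenate each part's sequence, in order (sketch)."""
    result = []
    for part in parts:
        if isinstance(part, list):
            result.extend(part)   # a sequence contributes its (already flat) items
        else:
            result.append(part)   # a single item is a singleton sequence
    return result
```

No recursion is needed: every nested operand is itself built by seq, so it is already flat, and seq(10, seq(1, 2), seq(), seq(3, 4)) gives [10, 1, 2, 3, 4].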

6.7 Arithmetic Expressions

STXPath provides arithmetic operators for addition, subtraction, multiplication, division, and modulus, in their usual binary and unary forms. The binary subtraction operator must be preceded by white space in order to distinguish it from a hyphen, which is a valid name character.

An arithmetic expression is evaluated by applying the following rules:

If either operand is the empty sequence, the result of the operation is the empty sequence.
Operands other than empty sequences are converted to numbers (see 5.3 Type Conversions) before the expression is evaluated. If the conversion fails (returns NaN), an error is reported.
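The two evaluation rules above can be sketched for a single binary operator. In this sketch, arith and STXError are hypothetical names, None stands in for the empty sequence, operands are assumed to be single values, and to_number is only a rough stand-in for 5.3 Type Conversions. Division by zero is not modeled: Python raises an exception there instead of producing an IEEE 754 infinity.

```python
import math
import operator
from typing import Callable, Optional


class STXError(Exception):
    """Hypothetical stand-in for 'an error is reported'."""


def to_number(value) -> float:
    # Rough stand-in for 5.3 Type Conversions: unparseable values become NaN.
    try:
        return float(value)
    except (TypeError, ValueError):
        return math.nan


def arith(op: Callable[[float, float], float], a, b) -> Optional[float]:
    """None stands in for the empty sequence."""
    if a is None or b is None:
        return None                  # an empty operand gives an empty result
    x, y = to_number(a), to_number(b)
    if math.isnan(x) or math.isnan(y):
        raise STXError(f"cannot convert {a!r} or {b!r} to a number")
    return op(x, y)
```

For example, arith(operator.add, "2", 3) converts the string operand and returns 5.0, while arith(operator.add, "abc", 1) reports an error.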

6.8 Comparison Expressions

Comparison expressions allow two values to be compared. STXPath provides the following general comparison operators: =, !=, <, <=, >, >=. The result of a comparison is always true or false (a singleton sequence containing one boolean item).

CompOp ::= '=' | '!=' | '<' | '<=' | '>' | '>='

The result of a comparison of sequences is defined by applying the following rules, in order:

If either operand is the empty sequence, the result is false.
The comparison A operator B is true for sequences A and B if the comparison a operator b is true for some item a in A and some item b in B. Otherwise, A operator B is false.

The result of a comparison of items is defined by applying the following rules. The rules defined in 5.3 Type Conversions apply for conversions:

If both items to be compared are nodes, then the comparison will be true if and only if the result of performing the comparison on the string-values of the two nodes is true.
If one item to be compared is a node and the other is a number, then the comparison will be true if and only if the result of performing the comparison on the number and on the result of converting the string-value of that node to a number is true.
If one item to be compared is a node and the other is a string, then the comparison will be true if and only if the result of performing the comparison on the string-value of the node and the other string is true.
If one item to be compared is a node and the other is a boolean, then the comparison will be true if and only if the result of performing the comparison on true (the boolean value of a node) and the boolean is true.
When neither item to be compared is a node and the operator is = or !=, then the items are compared by converting them to a common type as follows and then comparing them. If at least one item to be compared is a boolean, then each item to be compared is converted to a boolean. Otherwise, if at least one item to be compared is a number, then each item to be compared is converted to a number. Otherwise, both items to be compared are converted to strings.
When neither item to be compared is a node and the operator is <=, <, >= or >, then the items are compared by converting both items to numbers and comparing the numbers.
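The sequence rules and item rules combine into an existential comparison. The sketch below implements them under these assumptions: Node is a hypothetical class carrying only a string-value, sequences are Python lists, and to_number and to_bool are rough stand-ins for 5.3 Type Conversions modeled on XPath 1.0. Each node is first replaced by the value it is compared as, after which the non-node rules apply.

```python
import math
import operator
from dataclasses import dataclass


@dataclass
class Node:
    string_value: str  # hypothetical node type: only its string-value matters here


OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def to_number(v) -> float:
    # Rough stand-in for 5.3 Type Conversions: unparseable values become NaN.
    try:
        return float(v)
    except (TypeError, ValueError):
        return math.nan


def to_bool(v) -> bool:
    if isinstance(v, bool):
        return v
    if isinstance(v, (int, float)):
        return v != 0 and not math.isnan(v)
    return len(v) > 0


def _as_value(node: Node, other):
    """What a node is compared as, given the item on the other side."""
    if isinstance(other, bool):      # check bool first: bool is a subclass of int
        return True                  # a node compares as true against a boolean
    if isinstance(other, (int, float)):
        return to_number(node.string_value)
    return node.string_value         # against a string or another node


def compare_items(a, b, op: str) -> bool:
    if isinstance(a, Node):
        a = _as_value(a, b)
    if isinstance(b, Node):
        b = _as_value(b, a)
    f = OPS[op]
    if op in ("=", "!="):
        if isinstance(a, bool) or isinstance(b, bool):
            return f(to_bool(a), to_bool(b))
        if isinstance(a, (int, float)) or isinstance(b, (int, float)):
            return f(to_number(a), to_number(b))
        return f(str(a), str(b))
    return f(to_number(a), to_number(b))  # <, <=, >, >= always compare numbers


def compare(A: list, B: list, op: str) -> bool:
    # Existential rule: true if some a in A and some b in B satisfy the comparison.
    # An empty operand has no items, so the result is false.
    return any(compare_items(a, b, op) for a in A for b in B)
```

Two consequences are worth noting: ("10") < ("9") is false, because relational operators compare numbers rather than strings, and with an empty operand even != is false.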

6.9 Logical Expressions

STXPath provides two common logical operators: and and or. The value of a logical expression is always one of the boolean values true or false (a singleton sequence containing a boolean item).

Logical expressions are evaluated by reducing each operand to an effective boolean value, applying the following rules in order:

If the operand is the empty sequence, its effective boolean value is false.
If the operand is a singleton sequence containing a boolean item, the item serves as the effective boolean value.
If the operand is a sequence that contains at least one node, its effective boolean value is true.
In any other case, operands are converted to boolean (see 5.3 Type Conversions) to get effective boolean values.

An AND expression returns true if the effective boolean values of both of its operands are true; otherwise it returns false.

An OR expression returns false if the effective boolean values of both of its operands are false; otherwise it returns true.

In addition to logical expressions, STXPath provides the function not() (see 6.4.3 Boolean Functions), which takes a general sequence as its parameter and returns a boolean value.
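The effective-boolean-value rules, and the and, or, and not() operations built on them, can be sketched as follows. Assumptions: Node is a hypothetical class standing for any node, _to_bool is a rough stand-in for 5.3 Type Conversions modeled on XPath 1.0 boolean(), and because 5.3 is not reproduced in this section, the conversion of sequences with several non-node items is deliberately left unmodeled rather than guessed.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    string_value: str  # hypothetical node type


def _to_bool(item) -> bool:
    # Rough stand-in for 5.3 Type Conversions (XPath 1.0 boolean() rules).
    if isinstance(item, (int, float)):       # also covers bool, a subclass of int
        return item != 0 and not math.isnan(item)
    return len(item) > 0


def ebv(seq: list) -> bool:
    """Effective boolean value of a sequence, following the rules in order."""
    if not seq:
        return False                         # rule 1: the empty sequence
    if len(seq) == 1 and isinstance(seq[0], bool):
        return seq[0]                        # rule 2: a single boolean item
    if any(isinstance(item, Node) for item in seq):
        return True                          # rule 3: at least one node
    if len(seq) == 1:
        return _to_bool(seq[0])              # rule 4: convert the single item
    raise NotImplementedError("5.3 conversion of longer sequences is not modeled")


def stx_and(a: list, b: list) -> bool:
    return ebv(a) and ebv(b)


def stx_or(a: list, b: list) -> bool:
    return ebv(a) or ebv(b)


def stx_not(a: list) -> bool:
    return not ebv(a)
```

Note that a node with an empty string-value is still true by rule 3, while the string "0" is true only because it is non-empty.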

7 Extensions

STX will define extension modules for interacting with other XML and non-XML technologies. This document describes the core STX language. Extensions may include the following modules:

STX-Script
STX-XSLT
STX-XPath

Working Draft 0.08 - 20 September 2002
