Streaming Transformations for XML (STX) Version 1.0

1 Introduction

This document defines the syntax and semantics of the STX transformation language. Transformation rules in STX are expressed as well-formed XML documents. These documents, called stylesheets, may include both elements that are defined by STX (STX declarations and instructions) and other elements (literals). STX-defined elements are identified by a specific XML namespace, which is referred to in this specification as the STX namespace. This document uses the 'stx' prefix as a shortcut for referring to elements from the STX namespace.

An STX transformation describes rules for transforming one or more source event streams into one or more result event streams. The transformation has a streaming character; this means that it does not need to build a tree representing the source documents in memory. Result events are generated as soon as source events appear and are processed.

The transformation is achieved by associating events with templates. A template pattern is matched against events and their context. The best matching template is then instantiated to create a part of the result stream. A template is always instantiated with respect to the current context, a set of additional information maintained during the transformation. In constructing the result stream, events from the source stream can be filtered and arbitrary events can be added. Events can also be reordered using working storage.

On the surface, the syntax of STX is similar to the syntax of [XSLT]. STX also employs a compact expression language embedded in certain attributes. This expression language, called STXPath, is syntactically similar to [XPath]. This should allow XSLT users to easily adapt to STX syntax.

2 Concepts

The software responsible for running an STX transformation is referred to as an STX processor. An STX processor transforms one or more source XML documents according to rules given in an STX stylesheet and generates one or more result XML documents.

The source documents are supplied in the form of streams of [SAX2] events. These streams are referred to as the source streams. The stream whose events are currently processed is referred to as the current source stream. The current source stream at the time when the transformation is initiated is referred to as the principal source stream.

A possibly empty set of external values for stylesheet parameters is supplied. These values are available for use within expressions in the stylesheet.

No tree representation of the source document is constructed. However, when processing each event, a limited amount of contextual information is available from the system.

Data arriving with an event can form one or more objects called nodes. Pair events for the document and elements form one node only; all node data is passed with the starting event. The attribute data passed with a startElement() event forms separate nodes.

Consecutive characters() and ignorableWhitespace() events are combined into a single text node.

The stylesheet is a well-formed XML document that may be precompiled to some kind of executable representation that can be reused to perform multiple transformations. The stylesheet can consist of several stylesheet modules contained in different files. One of these modules is the principal stylesheet module. The complete stylesheet is assembled by finding the stylesheet modules referenced directly or indirectly from the principal stylesheet module using the stx:include declaration.

The output of the transformation consists of one or more sequences of SAX2 events. These sequences of events are referred to as result streams. The stream to which events are currently emitted is referred to as the current result stream. The current result stream at the time when the transformation is initiated is referred to as the principal result stream.

Each incoming event can cause an invocation of one or more rules within the stylesheet by means of a match pattern. The actions such a rule may perform include emitting SAX2 events to result streams, saving working data to working storage, accessing data written to working storage by previously executed rules, and invoking other rules.

Note:

The source and result streams are abstract constructs that function as input or output channels for STX transformations. Each source or result stream is identified by a URI. This URI must not be confused with the URI of a physical document that may be parsed to generate the source stream, or with the URI of a document to which the result stream may be serialized. Stream URIs are passed to a resolver that maps abstract streams to physical resources.

2.1 Initiating a Transformation

This document does not specify interfaces for initiating an STX transformation. Instead, these interfaces are implementation dependent. This section describes the minimum amount of information that must be supplied to execute a transformation:

An identification of the stylesheet module that is to act as the principal stylesheet module for the transformation.
A possibly empty set of values for stylesheet parameters (name-value pairs). External parameter values are matched against global stylesheet parameters.
An identification of the stream that is to act as the principal source stream.
An identification of the stream that is to act as the principal result stream.

2.2 Nodes

The data arriving with an event forms zero or more entities called nodes. Pair events refer to a single node whose data is passed with the starting event. The attribute data arriving with a startElement() event forms separate attribute nodes. Consecutive events of the same type (characters() or ignorableWhitespace()) are aggregated and treated as a single event, and thus form a single node only.

There are seven types of nodes recognized in STX:

root node - Passed with a startDocument() event; this node has no properties.
element node - Passed with a startElement() event. The node properties consist of the element related data (local name, prefix, qualified name, namespace URI).
attribute node - Passed with a startElement() event. The node properties consist of the data related to a particular attribute (local name, prefix, qualified name, namespace URI, value).
text node - Passed with a characters() or ignorableWhitespace() event. The node properties consist of character data.
CDATA node - Passed with a characters() or ignorableWhitespace() event within startCDATA() and endCDATA() lexical events. The node properties consist of character data.
processing instruction node - Passed with a processingInstruction() event. The node properties consist of target and character data.
comment node - Passed with a comment() event. The node properties consist of character data.

2.3 Context

There is contextual information available at each point during processing. It includes the data arriving with the current event and other data related to the state of processing. The contextual information at any particular instant during processing is called the current context. The context information consists of the following parts:

current node data - The node which is the subject of the current event is called the current node. The information available for the current node depends on the node type; see [SAX2] definition for details. For example, qualified name, local name, prefix, namespace URI, attributes (qualified name, local name, prefix, namespace URI, and value for each), and the string-value are available for elements.
ancestor stack - For the current node, all ancestor nodes with all properties are stored in the ancestor stack.
position within siblings - Information about the position relative to other siblings is kept. The position is available for the current node and all its ancestors.
A position number is available for all node kind tests such as node(), text(), cdata(), processing-instruction(), comment(). For elements, the position is available for all qualified names and for names containing the * wildcard: pre:lname, lname, pre:*, *:lname, *. For processing instructions, the position is also available for each target. The position of attribute nodes is undefined.

2.4 Precedence Categories

Each incoming event can invoke a template within the stylesheet by means of precedence categories and a match pattern (see 2.6 Match Patterns). The template that is used to process the current node is called the current template. Templates can be separated into groups (see 3.3 Grouping of Templates). Top-level templates are considered to be members of the default group. The group containing the current template is referred to as the current group.

Templates are classed into the precedence categories according to their visibility from a base group. The base group can be either the current group or the group explicitly specified in the group attribute of process-xxx statements. The visibility is defined using the visibility and public attributes for each template (see 4.2 Templates).

There are three precedence categories (listed with decreasing precedence):

templates from the base group and public templates (public="yes") from child groups
group and global templates (visibility="group"|"global") from all ancestor groups
all global templates (visibility="global")

The first precedence category is searched for the best matching template by means of a match pattern (see 2.6 Match Patterns). If there is no matching template in the first precedence category, the second category is searched. If neither the first nor the second category contains a matching template, the third category is searched.
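
The following stylesheet is a minimal sketch of how the three categories relate to a base group. The group names and element names (chapters, notes, shared, para, and so on) are hypothetical, and the comments assume that "chapters" is the base group.

```xml
<stx:transform xmlns:stx="http://stx.sourceforge.net/2002/ns" version="1.0">

  <!-- Category 3: a global template in an unrelated group -->
  <stx:group name="shared">
    <stx:template match="para" visibility="global">
      <p><stx:process-children/></p>
    </stx:template>
  </stx:group>

  <!-- Category 2: a group-visible template in the default group,
       which is an ancestor of "chapters" -->
  <stx:template match="title" visibility="group">
    <h1><stx:process-children/></h1>
  </stx:template>

  <stx:group name="chapters">
    <!-- Category 1: a template of the base group itself -->
    <stx:template match="chapter">
      <div><stx:process-children/></div>
    </stx:template>

    <stx:group name="notes">
      <!-- Category 1: a public template from a child group of the base group -->
      <stx:template match="note" public="yes">
        <aside><stx:process-children/></aside>
      </stx:template>
    </stx:group>
  </stx:group>

</stx:transform>
```

With "chapters" as the base group, a 'para' element would reach the template in "shared" only if neither the first nor the second category offered a match.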

2.5 Expressions

STX uses an expression language called STXPath that is defined later in this specification (see 6 STXPath).

Expressions are used in STX in match patterns, to specify conditions for different ways of processing the current node, to generate text to be inserted into the output stream, or to access data from the ancestor stack.

An STXPath expression may occur as the value of certain attributes on STX elements, and also within curly braces in attribute value templates (see 2.7 Attribute Value Templates). The context within a stylesheet where an STXPath expression appears may specify the required data type, that is, the type of value that the expression is expected to return.

It is a static non-recoverable error if the value of an expression attribute does not match the STXPath production expressions. It is a dynamic non-recoverable error if any STXPath expression is evaluated and raises a dynamic error, or if it raises an error when converting to the required data type.

The attribute default-stxpath-namespace of the stx:transform element (see 3.2 Transform Element) may be used to define the namespace that will be used for an unprefixed name used as a node name test within a step of an STXPath expression or pattern. The value of the attribute is the namespace URI to be used.

This default namespace URI applies only to elements; it does not apply to attributes. In the absence of this attribute, an unqualified node name test matches an element whose namespace URI is null: the default namespace (as defined by an xmlns="some-uri" declaration) is not used.
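
As an illustration, the sketch below sets default-stxpath-namespace on a stylesheet that also declares a default namespace for its literal elements. Both namespace URIs and all element names are hypothetical.

```xml
<stx:transform xmlns:stx="http://stx.sourceforge.net/2002/ns"
               xmlns="http://example.org/out"
               version="1.0"
               default-stxpath-namespace="http://example.org/doc">

  <!-- The unprefixed name test "item" matches 'item' elements in
       http://example.org/doc. The xmlns declaration above plays no
       part in matching. -->
  <stx:template match="item">
    <entry><stx:process-children/></entry>
  </stx:template>

  <!-- The attribute name test is unaffected: @id refers to an 'id'
       attribute in no namespace -->
  <stx:template match="list[@id=5]">
    <entries><stx:process-children/></entries>
  </stx:template>

</stx:transform>
```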

2.6 Match Patterns

A match pattern specifies a set of conditions on the current context. If the current context satisfies the conditions, the current node matches the pattern; if the current context does not satisfy the conditions, the current node does not match the pattern. The syntax for STX patterns is a subset of the syntax for STXPath expressions. In particular, patterns take the form of location paths that meet certain restrictions.

Here are some examples of patterns:

item - matches any 'item' element from the namespace used for unprefixed STXPath path patterns (defined with 'default-stxpath-namespace' attribute, no namespace by default)
list/item - matches any 'item' element with a 'list' parent, where both elements are from the namespace used for unprefixed STXPath path patterns
chapter//list/item - matches any 'item' element with a 'list' parent and a 'chapter' ancestor, where all three elements are from the namespace used for unprefixed STXPath path patterns
/root/list/* - matches any element with a 'list' parent and a 'root' grandparent which is the document element, where both 'root' and 'list' elements are from the namespace used for unprefixed STXPath path patterns
pre:list[@id=5]/pre:item - matches any 'item' element with a 'list' parent having an 'id' attribute with a value of 5, where both elements are from the namespace which is bound to the 'pre' prefix in the stylesheet for this rule
*[sf:position()=1] - matches any element that is the first element child of its parent
node() - matches any child node
text() - matches any text node (including CDATA text node)
cdata() - matches any CDATA text node
processing-instruction() - matches any processing instruction

A match pattern is a set of location path patterns separated with |. A location path pattern is a location path whose steps all use only the child, descendant, and attribute axes. Patterns may use the / operator as well as the // operator. Only abbreviated syntax is allowed. Up to one predicate is allowed in each step. Predicate expressions are STXPath expressions (see 2.5 Expressions).

Predicate expressions are evaluated using the current context. If the result is a number, it is converted to true if the number is equal to the context position and to false otherwise. Thus a location path p[3] is equivalent to p[sf:position()=3]. Otherwise the result is converted to a boolean using the type conversion rules described in 5.3 Type Conversions. If the result of evaluating and converting the predicate expression is false, the current template doesn't match the current node.
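
A short sketch of both kinds of predicate follows. The element names are hypothetical; the 'sf' prefix is assumed to be bound to the STX function namespace given in 3.1 STX Namespace.

```xml
<stx:transform xmlns:stx="http://stx.sourceforge.net/2002/ns"
               xmlns:sf="http://stx.sourceforge.net/2003/functions"
               version="1.0">

  <!-- Numeric predicate: true only if the context position equals 3 -->
  <stx:template match="item[3]">
    <third-item/>
  </stx:template>

  <!-- The same condition for 'entry' elements, written out explicitly -->
  <stx:template match="entry[sf:position()=3]">
    <third-entry/>
  </stx:template>

  <!-- Non-numeric predicate: the comparison already yields a boolean -->
  <stx:template match="section[@id=5]">
    <selected-section/>
  </stx:template>

</stx:transform>
```

The two numeric forms are applied to different element names here so that no node matches two templates of equal priority.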

If there is no matching template available, a default rule is applied. One of three default rules, specified in the pass-through attribute of stx:transform or stx:group, can be used: "none" (to skip the current node), "all" (to pass through the current node), and "text" (to pass through the current node only if it is a text node). The default rule can be set for the stylesheet (see 3.2 Transform Element) or for a group (see 3.3 Grouping of Templates). This feature makes it easy to copy documents with only a few changes, or to select just a few items from a document. The default behavior on the stylesheet level is to ignore all non-matching events (value "none"). Groups inherit the pass-through behavior from their parent group when it is not specified explicitly.
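
For example, a stylesheet that copies a document while changing only a few elements might look like the sketch below. The element names 'secret' and 'b' are hypothetical, and the first template relies on children being processed only where stx:process-children appears.

```xml
<stx:transform xmlns:stx="http://stx.sourceforge.net/2002/ns"
               version="1.0" pass-through="all">

  <!-- Nodes without a matching template are passed through unchanged -->

  <!-- Emits nothing and does not process the children of 'secret' -->
  <stx:template match="secret"/>

  <!-- Renames 'b' elements to 'strong' and keeps their content -->
  <stx:template match="b">
    <strong><stx:process-children/></strong>
  </stx:template>

</stx:transform>
```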

It is possible that the current context matches more than one rule within a precedence category. In that case, the template rule to be used is determined according to the same rules as in XSLT (see [XSLT], 5.5). All rules have a computed priority value. The computed priority can be overridden with a 'priority' attribute value (see 4.2 Templates). The computed priority is determined as follows:

If the pattern contains multiple alternatives separated with |, then it is treated equivalently to a set of template rules, one for each alternative.
If the pattern has the form of a qualified name or has the form either of processing-instruction(target) or cdata(), then the priority is 0.
If the pattern has the form pre:* or *:lname, then the priority is -0.25.
If the pattern consists of just a node test other than cdata(), then the priority is -0.5.
Otherwise, the priority is 0.5.

The rule with the highest priority is used. If there is more than one matching template rule with the highest priority, an STX processor must report a recoverable error. The processor is allowed to recover from this error by selecting the rule that occurs last in the stylesheet.
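
The sketch below annotates a few patterns with the priority that the rules above would assign. The 'pre' prefix, its namespace URI, and the element names are hypothetical.

```xml
<stx:transform xmlns:stx="http://stx.sourceforge.net/2002/ns"
               xmlns:pre="http://example.org/ns"
               version="1.0">

  <!-- Just a node test: computed priority -0.5 -->
  <stx:template match="node()">
    <any-node/>
  </stx:template>

  <!-- Form pre:* : computed priority -0.25 -->
  <stx:template match="pre:*">
    <any-pre-element/>
  </stx:template>

  <!-- Qualified name: computed priority 0 -->
  <stx:template match="pre:item">
    <item-by-name/>
  </stx:template>

  <!-- Any other form, here a path of two steps: computed priority 0.5 -->
  <stx:template match="pre:list/pre:item">
    <item-in-list/>
  </stx:template>

  <!-- Explicit priority replaces the computed value 0 -->
  <stx:template match="pre:item" priority="1">
    <item-preferred/>
  </stx:template>

</stx:transform>
```

A 'pre:item' element inside a 'pre:list' matches all five patterns; the last template is chosen because its explicit priority of 1 is the highest.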

2.7 Attribute Value Templates

In an attribute that is designated as an attribute value template an STXPath expression can be used by surrounding the expression with curly braces ({}).

An attribute value template consists of an alternating sequence of fixed parts and variable parts. A variable part consists of an STXPath expression enclosed in curly braces ({}). A fixed part may contain any characters, except that a left curly brace must be written as {{ and a right curly brace must be written as }}.

The result of evaluating an attribute value template is obtained by concatenating the expansions of the fixed and variable parts. The expansion of a fixed part is obtained by replacing any double curly braces ({{ or }}) by the corresponding single curly brace. The expansion of a variable part is obtained by evaluating the enclosed STXPath expression and converting the resulting value to a string.

If a left curly brace appears in an attribute value template without a matching right curly brace, or if a right curly brace occurs in an attribute value template outside an expression without being followed by a second right curly brace, a processor must signal a non-recoverable error.
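
A minimal sketch of an attribute value template follows. It assumes that attributes of literal elements are designated as attribute value templates, as in XSLT; the element and attribute names are hypothetical.

```xml
<stx:transform xmlns:stx="http://stx.sourceforge.net/2002/ns" version="1.0">

  <stx:template match="item">
    <!-- href: fixed part "#item-" followed by the variable part {@id}.
         label: doubled braces only, so the expansion is the fixed text {item}.
         For a source element <item id="7"> the result attributes would be
         href="#item-7" and label="{item}". -->
    <entry href="#item-{@id}" label="{{item}}">
      <stx:process-children/>
    </entry>
  </stx:template>

</stx:transform>
```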

2.8 Whitespace Stripping

The source streams and stylesheet modules may contain whitespace nodes (text nodes consisting solely of whitespace characters: #x20, #x9, #xD or #xA). Such whitespace nodes may be removed according to the following rules. This process is referred to as whitespace stripping.

Whitespace nodes are stripped from source streams if the strip-space attribute of stx:transform (see 3.2 Transform Element) or stx:group (see 3.3 Grouping of Templates) is set to "yes". Otherwise they are preserved and treated as any other text nodes.

For stylesheets, whitespace text nodes are preserved only if an ancestor element of this text node has an xml:space attribute with a value of "preserve", and no closer ancestor element has xml:space with a value of "default". All other whitespace nodes are removed from the stylesheet.

The STX elements stx:text and stx:cdata have a default xml:space attribute with a value of "preserve" which may be overridden in the stylesheet. xml:space attributes on literal result elements will not be stripped from these elements.
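
The effect of these rules on a stylesheet can be sketched as follows. The element names are hypothetical.

```xml
<stx:transform xmlns:stx="http://stx.sourceforge.net/2002/ns"
               version="1.0" strip-space="yes">

  <!-- strip-space="yes" removes whitespace nodes from the source stream -->

  <stx:template match="name">
    <!-- The line breaks and indentation between the elements in this
         template are whitespace nodes of the stylesheet and are removed -->
    <first/>
    <!-- Preserved: stx:text defaults to xml:space="preserve" -->
    <stx:text> </stx:text>
    <second/>
    <!-- Preserved: the whitespace around 'inner' has an ancestor
         with xml:space="preserve" -->
    <block xml:space="preserve">  <inner/>  </block>
  </stx:template>

</stx:transform>
```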

2.9 Errors

All errors that can occur during an STX transformation belong to one of the following categories:

warnings - The processor may issue a warning; the transformation must not be stopped.
recoverable errors - The processor may either issue an error and stop the transformation, or it can recover from the error in the way defined in this specification.
non-recoverable (fatal) errors - The processor must exit the transformation and issue an error message.

This specification doesn't define how to issue a warning or an error. Implementations are free to use standard output, standard error output, or any convenient error handler.

3 Stylesheet Structure

3.1 STX Namespace

The STX namespace has the URI http://stx.sourceforge.net/2002/ns.

The STX function namespace has the URI http://stx.sourceforge.net/2003/functions.

These two namespaces are recognized as reserved namespaces in STX stylesheets, and may be used only for purposes specified in this document.

3.2 Transform Element

stx:transform

<!-- Category: root -->
<stx:transform
  version = number
  pass-through = "none"|"all"|"text"
  recognize-cdata = "yes"|"no"
  default-stxpath-namespace = uri-reference
  strip-space = "yes"|"no"
  output-encoding = string>
  <!-- Content: top-level-elements -->
</stx:transform>

Stylesheets are required to use the root element stx:transform.

The version attribute contains a version number to distinguish language versions; this attribute is mandatory and its value must be "1.0" for this version of STX.

The other attributes make it possible to set global properties of the transformation. Some of these properties (pass-through, recognize-cdata, strip-space) can also be set on the group level.

pass-through - This optional attribute specifies the default rule for events for which no matching template is found. These events are either ignored ("none", default) or passed to the output without modification ("all"). For "text", only text nodes are passed through to the output.
recognize-cdata - This optional attribute specifies whether CDATA boundaries are recognized during the transformation. If so, every CDATA section forms a single node and the node kind test cdata() can be used in STXPath patterns. Otherwise (recognize-cdata="no"), CDATA boundaries are ignored and all consecutive character data forms a single text node; thus the cdata() kind test never matches in STXPath patterns. The default value is "yes".
default-stxpath-namespace - This optional attribute specifies a namespace used for unprefixed name tests in STXPath expressions and patterns. See 2.5 Expressions for details. No namespace is used by default.
strip-space - This optional attribute specifies whether whitespace text nodes are stripped from source data streams. See 2.8 Whitespace Stripping for details. The default value is "no".
output-encoding - This optional attribute specifies the preferred output encoding of the resulting byte stream. The value of this attribute should be treated case-insensitively; the value must contain only printable ASCII characters (#x21 - #x7E); the value must be a charset registered with the Internet Assigned Numbers Authority (see [IANA Character Sets]).
If the attribute is not present, the output encoding is UTF-8. A compliant STX processor is not required to support any particular encoding other than UTF-8.

The stx:transform element can contain the following children from the STX namespace. These elements are called top-level elements:

stx:include
stx:variable
stx:param
stx:buffer
stx:namespace-alias
stx:group
stx:template
stx:procedure

All top-level elements may occur multiple times. The stx:namespace-alias element is allowed only as a top-level element.

3.3 Grouping of Templates

Templates can be organized into groups using the stx:group element. Groups of templates play a role in template matching (precedence categories are defined for groups) and determine the scoping of variables.

Each stylesheet has a virtual default group (represented by stx:transform) that is considered to be the parent of top-level groups. Explicit groups are not mandatory; many transformations can be done without grouping templates. On the other hand, separating templates into groups makes it possible to define more precise transformation rules and to run complex transformations more safely, especially on well-known, regular input data.

stx:group

<!-- Category: top-level or group -->
<stx:group
  name = qname
  pass-through = "none"|"all"|"text"|"inherit"
  recognize-cdata = "yes"|"no"|"inherit"
  strip-space = "yes"|"no"|"inherit">
  <!-- Content: group-elements -->
</stx:group>

This element must be a child of either the stx:transform or the stx:group element. The optional name attribute contains a qualified name that must be unique in the stylesheet. The name can be referenced by the group attribute of any of the stx:process-children, stx:process-attributes, stx:process-self, stx:process-siblings, stx:process-document or stx:process-buffer instructions. In this case, the referenced group is used instead of the current group for matching. It is not possible to reference the default group.

It is a non-recoverable error if a stylesheet contains more than one group of the same name.

The attributes pass-through, recognize-cdata, and strip-space are optional, and they set transformation properties specific for this group. Their meaning is exactly the same as the meaning of the global properties of the same name specified on the stx:transform element (see 3.2 Transform Element), with the exception that for all of these attributes the additional value "inherit" may be used. This value is also the default value and specifies that the value of the same property of the nearest ancestor group should be used. In other words, the value of the property stems from the nearest ancestor group that has the corresponding attribute set to a value distinct from "inherit", or from the default value of the stx:transform element, if no such attribute was specified. At each point of the processing, the properties of the base group apply.

Note:

The last sentence means that, for a visible template from a different group, the properties of its parent group take effect only once this template is instantiated, and thus not during the matching process.
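
The inheritance of group properties can be sketched as follows. The group and element names are hypothetical.

```xml
<stx:transform xmlns:stx="http://stx.sourceforge.net/2002/ns"
               version="1.0" pass-through="all" strip-space="yes">

  <!-- No properties set: pass-through="all" and strip-space="yes" come
       from stx:transform; recognize-cdata is the stx:transform default "yes" -->
  <stx:group name="body">
    <stx:template match="para">
      <p><stx:process-children/></p>
    </stx:template>

    <!-- Overrides pass-through; strip-space="yes" is still inherited -->
    <stx:group name="index" pass-through="none">

      <!-- Explicit "inherit": pass-through="none" comes from "index",
           the nearest ancestor group with a value other than "inherit" -->
      <stx:group name="entries" pass-through="inherit">
        <stx:template match="entry">
          <li><stx:process-children/></li>
        </stx:template>
      </stx:group>

    </stx:group>
  </stx:group>

</stx:transform>
```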

3.4 Stylesheet Inclusion

An STX stylesheet may include another STX stylesheet using the stx:include element.

stx:include

<!-- Category: top-level or group -->
<stx:include
  href = uri-reference/>

This declaration is used to insert additional stylesheet modules into the principal stylesheet module. Circular inclusion is prohibited.

This element must be top-level or a child of the stx:group element. stx:include is replaced with the stx:transform element of the included stylesheet whereupon the included stx:transform then becomes an stx:group element. There are two exceptions: top-level stx:namespace-alias and stx:param (stylesheet parameters) from the included stylesheet are always inserted as top-level elements. The resulting stylesheet must meet the criteria for being a valid STX stylesheet (for example concerning unique groups and parameters).

The rules for the attributes of the included stx:transform element are as follows:

The version and output-encoding attributes won't affect the including stylesheet. However, the included stylesheet must be valid; that is, its version attribute must be "1.0".
The default-stxpath-namespace attribute will be used for the included stylesheet. The default-stxpath-namespace attribute of the stx:transform element of the principal stylesheet module never affects included stylesheet modules.
The pass-through, recognize-cdata, and strip-space attributes become attributes of the new stx:group element. A missing attribute becomes an attribute on the stx:group element with the default value defined for stx:transform, i.e. it can never have the value "inherit".

There is no difference between templates from the principal stylesheet module and included templates in terms of matching precedence.
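
The following sketch shows a principal stylesheet module that includes another module. The file name common.stx and the element names are hypothetical; the comment describes how the rules above would apply to one possible included module.

```xml
<stx:transform xmlns:stx="http://stx.sourceforge.net/2002/ns"
               version="1.0" pass-through="all">

  <stx:include href="common.stx"/>
  <!-- If the root element of common.stx is
         <stx:transform version="1.0" strip-space="yes">
       then the declaration above behaves like a group with
         pass-through="none" recognize-cdata="yes" strip-space="yes"
       because missing attributes take the stx:transform defaults and
       never the value "inherit". Top-level stx:param and
       stx:namespace-alias elements of common.stx become top-level
       elements of the complete stylesheet. -->

  <stx:template match="title">
    <h1><stx:process-children/></h1>
  </stx:template>

</stx:transform>
```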

4 Generating Output

STX templates are called sequentially for each incoming node rather than from other templates. Pair events for the document and elements match only one template, which is broken into two parts; the first part is executed when the start event appears and the second one at the end event. The two parts are separated by the stx:process-children element.
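
A minimal sketch of such a two-part template follows. The element names are hypothetical.

```xml
<stx:transform xmlns:stx="http://stx.sourceforge.net/2002/ns" version="1.0">

  <stx:template match="chapter">
    <section>
      <!-- First part: executed when the start event of 'chapter' appears -->
      <stx:comment>chapter starts</stx:comment>

      <!-- The children of 'chapter' are processed at this point -->
      <stx:process-children/>

      <!-- Second part: executed when the end event of 'chapter' appears -->
      <stx:comment>chapter ends</stx:comment>
    </section>
  </stx:template>

</stx:transform>
```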

4.1 Namespace Aliasing

stx:namespace-alias

<!-- Category: top-level -->
<stx:namespace-alias
  source-prefix = ncname|"#default"
  result-prefix = ncname|"#default"/>

Namespaces from source streams can be mapped to different namespaces in result streams using the stx:namespace-alias element. Both attributes are mandatory and can contain either a prefix bound to the namespace to be used or the "#default" keyword for the default namespace.

4.2 Templates

stx:template

<!-- Category: top-level or group -->
<stx:template
  match = pattern
  priority = number
  visibility = "local"|"group"|"global"
  public = "yes"|"no"
  new-scope = "yes"|"no">
  <!-- Content: template -->
</stx:template>

Rules to process input events are written in templates. The stx:template element must be a child of either the stx:transform or the stx:group element. Templates are matched against events by means of precedence categories and a pattern in the mandatory match attribute. The optional priority attribute can contain an explicit priority value used for matching (see 2.6 Match Patterns).

Two optional attributes, visibility and public, control whether the template is visible from other groups (and thus can match the next event) or not. See 2.4 Precedence Categories for the meaning of the two attributes. The default value of the visibility attribute is "local". The default value of the public attribute is "yes" for top-level templates and "no" for group templates. Whether a top-level template is public or not is important only when the stylesheet is included into another stylesheet, because every top-level template then becomes a group template (see 3.4 Stylesheet Inclusion).

The optional new-scope attribute specifies whether the template creates new instances of group variables. The default value is "no". A new set of group variables is created for each instantiated template with new-scope="yes". These variables shadow their former values and exist as long as the template is being processed.

The content of templates may include both STX instructions and declarations, and literal elements. Literal elements are simply copied to the output.

A text template is defined as the content of some elements (stx:attribute, stx:variable, stx:param, stx:assign, stx:with-param, stx:cdata, stx:processing-instruction, stx:comment, stx:message). This is a part of template that generates nothing but character events to the current output stream. An STX processor is required to issue a run-time recoverable error if another type of event is emitted. The processor is allowed to recover from this error by ignoring the non-character event.

4.3 Procedures

stx:procedure

<!-- Category: top-level or group -->
<stx:procedure
  visibility = "local"|"group"|"global"
  public = "yes"|"no"
  new-scope = "yes"|"no"
  name = qname>
  <!-- Content: template -->
</stx:procedure>

Procedures are sub-templates that can be called by name (with the stx:call-procedure instruction). The optional visibility, public, and new-scope attributes have the same meaning and default values as for templates. Only visible procedures can be called by name; new-scope must be set to "yes" to create new copies of group variables. It is a static non-recoverable error if a stylesheet contains more than one visible procedure with the same name within the same precedence category.

The content of procedures may be the same as the content of templates.

stx:call-procedure

<!-- Category: template -->
<stx:call-procedure
  name = qname
  group = qname>
  <!-- Content: stx:with-param* -->
</stx:call-procedure>

The stx:call-procedure element makes it possible to invoke a procedure by its name. The name attribute is mandatory. The optional group attribute makes it possible to use the specified group instead of the current group as the base group for calling the procedure.

The target procedure is determined according to the precedence categories described in 2.4 Precedence Categories. If the first category doesn't contain a procedure with the requested name, the second category is searched. If neither the first nor the second category contains such a procedure, the third category is searched. It is a static non-recoverable error if none of the three precedence categories contains the requested procedure.
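
The sketch below calls one procedure from the current group and one from an explicitly named base group. All procedure, group, and element names are hypothetical.

```xml
<stx:transform xmlns:stx="http://stx.sourceforge.net/2002/ns" version="1.0">

  <!-- A procedure in the default group -->
  <stx:procedure name="separator">
    <hr/>
  </stx:procedure>

  <stx:group name="layout">
    <!-- A procedure that belongs to the group "layout" -->
    <stx:procedure name="rule">
      <hr class="layout"/>
    </stx:procedure>
  </stx:group>

  <stx:template match="chapter">
    <!-- Base group is the current (default) group -->
    <stx:call-procedure name="separator"/>
    <div><stx:process-children/></div>
  </stx:template>

  <stx:template match="appendix">
    <!-- Base group is "layout", so its own procedures are in the first category -->
    <stx:call-procedure name="rule" group="layout"/>
    <div><stx:process-children/></div>
  </stx:template>

</stx:transform>
```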

4.4 Parameters

Values can be passed to stylesheets or to their templates and procedures as parameters. Parameters are variables (see 6.1 Variables) with the additional property that their value can be set by the caller of the stylesheet, the template, or the procedure. Stylesheet parameters behave in the same way as variables of the default group. Template/procedure parameters behave in the same way as local variables; thus they are only visible within the template or procedure they are passed to. There are two elements available to work with parameters:

stx:with-param

<!-- Category: process-xxx, call-procedure -->
<stx:with-param
  name = qname
  select = expression>
  <!-- Content: text template -->
</stx:with-param>

Parameters are passed to templates or procedures using the stx:with-param element. The required name attribute specifies the name of the parameter. The value of the parameter is either the result returned by the expression located in the optional select attribute or the content of this element if the select attribute is missing. If neither the select attribute nor the content is present the parameter value is the empty string.

The stx:with-param instruction is allowed as a child of the elements stx:process-children, stx:process-attributes, stx:process-self, stx:process-siblings, stx:process-document, stx:process-buffer, or stx:call-procedure, and must not have any of these elements in its content.

stx:param

<!-- Category: top-level or template -->
<stx:param
  name = qname
  select = expression
  required = "yes" | "no">
  <!-- Content: text template -->
</stx:param>

The stx:param element is allowed as a top-level element (indicating a stylesheet parameter as a child of stx:transform) and in templates or procedures (as a child of stx:template or stx:procedure). The required name attribute specifies the name of the parameter. The optional select attribute or the content of this element specifies a default value, which is both evaluated and used only when there is no value specified using the select attribute or the content of the appropriate stx:with-param element. Should both the select attribute and the content be missing, the parameter defaults to the empty string.

Stylesheet parameters are statically initialized while parsing the stylesheet; only the static context information is available during the initialization. Template/procedure parameters are initialized at run-time. Since there is no current source stream available during the static initialization, it is a recoverable error if a stylesheet (top-level) parameter has an stx:process-children, stx:process-attributes, stx:process-self, or stx:process-siblings instruction in its content. A processor may recover from this error by ignoring such an instruction.

The optional required attribute may be used to indicate that a parameter is mandatory. The default value is "no", indicating that the parameter is optional. If the value of the required attribute is "yes", the stx:param element must be empty, and must have no select attribute. It is a dynamic non-recoverable error if the caller doesn't supply a value with stx:with-param for a required parameter.
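The interplay of stx:with-param and stx:param can be sketched as follows. The element names (list, entry), the parameter names, the version value, and the namespace URI are assumptions chosen for illustration.

```xml
<stx:transform version="1.0"
               xmlns:stx="http://stx.sourceforge.net/2002/ns">
  <!-- Assumption: namespace URI and version value are not stated in this section -->

  <stx:template match="list">
    <ul>
      <!-- Pass a value to whichever template handles the children -->
      <stx:process-children>
        <stx:with-param name="css-class" select="'item'"/>
      </stx:process-children>
    </ul>
  </stx:template>

  <stx:template match="entry">
    <!-- Required: must be empty and must have no select attribute -->
    <stx:param name="css-class" required="yes"/>
    <!-- Optional: the default is evaluated only if the caller passes no value -->
    <stx:param name="bullet" select="'*'"/>
    <li>
      <stx:attribute name="class" select="$css-class"/>
      <stx:value-of select="$bullet"/>
      <stx:process-children/>
    </li>
  </stx:template>

</stx:transform>
```

If an entry element were processed by a caller that does not supply css-class, the required="yes" declaration would make this a dynamic non-recoverable error.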

4.5 Copying the Current Node

stx:copy

<!-- Category: template -->
<stx:copy
  attributes = pattern>
  <!-- Content: template -->
</stx:copy>

The stx:copy element is used to copy the current node to the output. The optional attributes attribute contains a pattern. Those attributes of the current node that match the pattern are copied to the output. If the attributes attribute isn't present, no attributes are copied with the current node.

Thus, attributes="@*" copies all attributes, attributes="@foo|@bar" copies the foo and bar attributes only, attributes="@*[not(name()='foo')]" copies all but the foo attribute, and attributes="@*[false()]" doesn't copy any attributes, just as if the attributes attribute were missing altogether.

If the stx:copy instruction applies to a node other than an element, the attributes attribute is ignored.
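A minimal sketch of two typical uses of stx:copy follows. The element names para and img, the version value, and the namespace URI are assumptions for this example.

```xml
<stx:transform version="1.0"
               xmlns:stx="http://stx.sourceforge.net/2002/ns">
  <!-- Assumption: namespace URI and version value are not stated in this section -->

  <!-- Copy para with all of its attributes, then continue with its children -->
  <stx:template match="para">
    <stx:copy attributes="@*">
      <stx:process-children/>
    </stx:copy>
  </stx:template>

  <!-- Copy img but keep only the src and alt attributes; children are skipped -->
  <stx:template match="img">
    <stx:copy attributes="@src|@alt"/>
  </stx:template>

</stx:transform>
```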

4.6 Processing Nested Events

stx:process-children

<!-- Category: template -->
<stx:process-children
  group = qname>
  <!-- Content: stx:with-param* -->
</stx:process-children>

The instruction stx:process-children suspends the processing of the current template and processes the children of the current node. Using SAX2 terms: this instruction splits a template into two parts such that a SAX2 startElement event causes the execution of the first part and the corresponding SAX2 endElement event causes the execution of the second part.

At most one stx:process-children instruction may be executed during the processing of a template. Moreover, it is a non-recoverable error if stx:process-children is encountered after an stx:process-self instruction or an stx:process-siblings instruction.

Note:

If a template doesn't contain any stx:process-children instruction, the children of the current element will be skipped. The default rule applies only to nodes that are processed and for which no matching template is found.

Note:

If the current node is neither an element node nor the document root then the stx:process-children instruction simply does nothing.

The optional group attribute makes it possible to use the specified group instead of the current group as the base for matching (see 2.4 Precedence Categories). It is a recoverable error if the group of the specified name is not available. An STX processor can recover from this error by using the current group.
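The split of a template into a start part and an end part can be pictured with the following sketch. The element name chapter, the literal result elements, the version value, and the namespace URI are assumptions for illustration.

```xml
<stx:transform version="1.0"
               xmlns:stx="http://stx.sourceforge.net/2002/ns">
  <!-- Assumption: namespace URI and version value are not stated in this section -->

  <stx:template match="chapter">
    <!-- First part: instantiated when the start of chapter is seen -->
    <div class="chapter">
      <stx:process-children/>
      <!-- Second part: instantiated when the end of chapter is seen -->
      <p>end of chapter</p>
    </div>
  </stx:template>

</stx:transform>
```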

4.7 Processing Attributes

stx:process-attributes

<!-- Category: template -->
<stx:process-attributes
  group = qname>
  <!-- Content: stx:with-param* -->
</stx:process-attributes>

This instruction is used to apply templates to attributes of an element node.
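For example, the sketch below turns every attribute of a record element into a child element. The element name record, the use of name() in an attribute value template, the version value, and the namespace URI are assumptions; attributes with a namespace prefix would need extra handling that is omitted here.

```xml
<stx:transform version="1.0"
               xmlns:stx="http://stx.sourceforge.net/2002/ns">
  <!-- Assumption: namespace URI and version value are not stated in this section -->

  <stx:template match="record">
    <record>
      <!-- Apply templates to the attributes of the current element -->
      <stx:process-attributes/>
      <stx:process-children/>
    </record>
  </stx:template>

  <!-- Each attribute becomes an element of the same name holding its value -->
  <stx:template match="@*">
    <stx:element name="{name()}">
      <stx:value-of select="."/>
    </stx:element>
  </stx:template>

</stx:transform>
```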

4.8 Processing Siblings

stx:process-siblings

<!-- Category: template -->
<stx:process-siblings
  while = pattern
  until = pattern
  group = qname>
  <!-- Content: stx:with-param* -->
</stx:process-siblings>

The stx:process-siblings instruction suspends the processing of the current template and processes the following siblings of the current node. The processing can be terminated by one of the while or until conditions, or by the end of the parent element or of the current buffer (see stx:process-buffer).

Note:

If the current node is an attribute node or the document root node the stx:process-siblings instruction does nothing.

The optional while attribute contains a pattern. The next siblings are processed as long as they match the specified pattern. The first non-matching node stops the processing; this node is not processed by this stx:process-siblings instruction. The while attribute defaults to node().

The optional until attribute contains a pattern. The next siblings are processed until a node matching the pattern is encountered; this node is not processed by this stx:process-siblings instruction. The until attribute defaults to node()[false()].

If both while and until attributes have been specified then both conditions have to be met. For example <stx:process-siblings while="foo" until="foo"/> doesn't process any siblings. Variable bindings used within the patterns will be interpreted with regard to the current context. That means changed group variables affect the evaluation, whereas new instances of group variables or local variables are not visible.

Note:

Whitespace text nodes not stripped from the document must be considered in the patterns, particularly when using the while attribute. A typical attribute specification would be while="foo | text()" which processes all following foo elements and potential text nodes between these foo elements.

An stx:process-siblings instruction encountered during the processing of siblings does not affect the while and until conditions of the previous stx:process-siblings. In other words: nested stx:process-siblings instructions process at most the siblings chosen in the preceding stx:process-siblings. That means stx:process-siblings also returns if there are no more siblings in the input available or a preceding stx:process-siblings terminates.

Though multiple stx:process-siblings instructions may appear within the same template, it is a non-recoverable error if an stx:process-children or stx:process-self instruction is encountered after stx:process-siblings.
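A common use of stx:process-siblings is grouping adjacent elements. The sketch below wraps a run of neighbouring item elements in one list element. The element names, the group name copy, the version value, and the namespace URI are assumptions; a separate group is used so that the following item elements are not matched by the wrapping template again.

```xml
<stx:transform version="1.0"
               xmlns:stx="http://stx.sourceforge.net/2002/ns">
  <!-- Assumption: namespace URI and version value are not stated in this section -->

  <!-- Matches the first item of a run and opens the wrapper -->
  <stx:template match="item">
    <list>
      <!-- Handle the current item with the template from the copy group -->
      <stx:process-self group="copy"/>
      <!-- Continue with following items and any text nodes between them -->
      <stx:process-siblings while="item | text()" group="copy"/>
    </list>
  </stx:template>

  <stx:group name="copy">
    <stx:template match="item">
      <stx:copy attributes="@*">
        <stx:process-children/>
      </stx:copy>
    </stx:template>
  </stx:group>

</stx:transform>
```

The order matters: stx:process-self appears before stx:process-siblings, because the reverse order is a non-recoverable error.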

4.9 Running Overridden Templates

stx:process-self

<!-- Category: template -->
<stx:process-self
  group = qname>
  <!-- Content: stx:with-param* -->
</stx:process-self>

This instruction is used to process the current node using the template that would have been chosen if the current template wasn't present in the stylesheet. The current template won't be instantiated again for this node, even in a chain of calls to stx:process-self. At most one stx:process-self instruction may be executed during the processing of a template. Moreover, it is a non-recoverable error if an stx:process-self instruction is encountered after an stx:process-children or an stx:process-siblings instruction in a template.

Note:

If no group attribute has been specified then the current group will be used for choosing the next best matching template. This is also true if the current group has been automatically entered via a public template.
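The following sketch shows a specialised template that adds a wrapper and then delegates to the more general template. It assumes that the template with the predicate is preferred over the plain para template, as it would be under XSLT-like default priorities; the element names, the version value, and the namespace URI are also assumptions.

```xml
<stx:transform version="1.0"
               xmlns:stx="http://stx.sourceforge.net/2002/ns">
  <!-- Assumption: namespace URI and version value are not stated in this section -->

  <!-- General rule for paragraphs -->
  <stx:template match="para">
    <p><stx:process-children/></p>
  </stx:template>

  <!-- Special case: wrap the result of the general rule -->
  <stx:template match="para[@role='note']">
    <div class="note">
      <!-- Runs the template that would match if this one did not exist -->
      <stx:process-self/>
    </div>
  </stx:template>

</stx:transform>
```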

4.10 Processing Text

stx:process-text

<!-- Category: template -->
<stx:process-text
  select = expression>
  <!-- Content: stx:pattern+ -->
</stx:process-text>

stx:pattern

<stx:pattern
  regexp = expression
  case = "sensitive"|"insensitive">
  <!-- Content: template -->
</stx:pattern>

This instruction processes a string in a similar way as stx:template processes nodes. The mandatory select attribute of stx:process-text selects a string to process by evaluating the expression and converting it to a string. The mandatory regexp attribute of stx:pattern takes a regular expression by evaluating the expression in the regexp attribute and converting it to a string, which describes a substring to look for. The optional case attribute determines whether the regular expression is case-sensitive (value "sensitive") or not (value "insensitive"). The default is "sensitive".

The stx:process-text instruction looks for the pattern among the regexp attributes of all stx:pattern elements that matches first in the string selected by the select attribute. The substring before the matched substring will be output, and the matched substring itself will be replaced by the contents of the stx:pattern element. Afterwards this stx:process-text instruction will continue by processing the substring after the matched substring. If no pattern matches then the remaining string will be emitted as a text node to the result stream. A pattern must match at least one character.

If two or more patterns match at the same position, the pattern that matches the longest character sequence is used. If two or more patterns still meet this condition, the first one is used.
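The matching rules can be illustrated with a small sketch. The regular expressions are plain literal words so that no particular regular expression dialect has to be assumed; the replacement markup, the version value, and the namespace URI are assumptions.

```xml
<stx:transform version="1.0"
               xmlns:stx="http://stx.sourceforge.net/2002/ns">
  <!-- Assumption: namespace URI and version value are not stated in this section -->

  <stx:template match="text()">
    <stx:process-text select=".">
      <!-- regexp is an expression, hence the quoted string literal -->
      <stx:pattern regexp="'XML'">
        <abbr>XML</abbr>
      </stx:pattern>
      <!-- Matches XMLNS, xmlns, Xmlns, and so on -->
      <stx:pattern regexp="'XMLNS'" case="insensitive">
        <code>xmlns</code>
      </stx:pattern>
    </stx:process-text>
  </stx:template>

</stx:transform>
```

For a text node containing XMLNS, both patterns match at the same position; the second one matches the longer character sequence and is therefore used. Text that matches neither pattern is emitted unchanged.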

4.11 Outputting Strings

stx:value-of

<!-- Category: template -->
<stx:value-of
  select = expression/>

This instruction emits characters to the result stream. The mandatory select attribute contains an STXPath expression which is evaluated and converted to a string. This element is always empty.

stx:text

<!-- Category: template -->
<stx:text
  markup = "error"|"ignore"|"serialize">
  <!-- Content: template -->
</stx:text>

This instruction emits literal character data to the result stream.

The optional markup attribute determines how non-text nodes in the content of stx:text should be handled: "error" causes the processor to raise a run-time recoverable error for such nodes, "ignore" ignores any markup by emitting only the string value of the contents to the result stream, and "serialize" emits any markup serialized as text. The default value is "error". The processor may recover from an error raised under markup="error" by ignoring the attempt to emit the offending node.

Note:

The string created by markup="serialize" may vary in different STX implementations, because some of the lexical representation is not relevant for the information coded in XML. For example every STX implementation may choose its own order for serializing attributes.

The stx:text element has an implicit xml:space attribute with the default value "preserve". Thus the content is normally neither normalized nor stripped should it contain whitespace characters only.

stx:cdata

<!-- Category: template -->
<stx:cdata>
  <!-- Content: text template -->
</stx:cdata>

This instruction emits literal data as a CDATA section to the result stream.

The stx:cdata element has an implicit xml:space attribute with the default value "preserve". Thus the content is normally neither normalized nor stripped should it contain whitespace characters only.

4.12 Outputting Elements and Attributes

stx:element

<!-- Category: template -->
<stx:element
  name = {qname}
  namespace = {uri-reference}>
  <!-- Content: template -->
</stx:element>

This instruction is used to generate an element. It has the same meaning as in [XSLT].

stx:start-element

<!-- Category: template -->
<stx:start-element
  name = {qname}
  namespace = {uri-reference}/>

stx:end-element

<!-- Category: template -->
<stx:end-element
  name = {qname}
  namespace = {uri-reference}/>

There are separate instructions available to output an element start tag and an element end tag. The name attribute is required for both instructions. Both elements must be empty.

A compliant STX processor is required to produce well-formed XML output. An attempt to create an end-tag without a matching start-tag must be reported as a non-recoverable error by the STX processor.
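Separate start and end instructions are useful when an element has to be opened in one template and closed in another. The sketch below splits the content of a book element into page elements at each pagebreak marker. The element names, the version value, and the namespace URI are assumptions, and the sketch only stays balanced if pagebreak is a direct child of book.

```xml
<stx:transform version="1.0"
               xmlns:stx="http://stx.sourceforge.net/2002/ns">
  <!-- Assumption: namespace URI and version value are not stated in this section -->

  <stx:template match="book">
    <pages>
      <!-- Open the first page before any child is processed -->
      <stx:start-element name="page"/>
      <stx:process-children/>
      <!-- Close the last page once all children are done -->
      <stx:end-element name="page"/>
    </pages>
  </stx:template>

  <!-- A marker closes the current page and opens the next one -->
  <stx:template match="pagebreak">
    <stx:end-element name="page"/>
    <stx:start-element name="page"/>
  </stx:template>

</stx:transform>
```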

stx:attribute

<!-- Category: template -->
<stx:attribute
  name = {qname}
  namespace = {uri-reference}
  select = expression>
  <!-- Content: text template -->
</stx:attribute>

This instruction is used to generate an attribute. It has the same meaning as in [XSLT]. Alternatively, the value of the generated attribute may be specified in the optional select attribute. It is a recoverable error if this instruction has a select attribute and is not empty. A processor can recover from this error by ignoring the content of stx:attribute.

stx:attribute must follow an element-starting instruction (stx:element, stx:start-element, stx:copy, or a literal element) and no other output-generating instructions are allowed between the element-starting instruction and stx:attribute. It is a recoverable error if there is no immediate element-starting instruction before. A processor can recover from this error by ignoring the stx:attribute instruction.

4.13 Outputting Other Nodes

stx:processing-instruction

<!-- Category: template -->
<stx:processing-instruction
  name = {ncname}>
  <!-- Content: text template -->
</stx:processing-instruction>

This instruction is used to generate a processing instruction. It has the same meaning as in [XSLT].

stx:comment

<!-- Category: template -->
<stx:comment>
  <!-- Content: text template -->
</stx:comment>

This instruction is used to generate a comment. It has the same meaning as in [XSLT].

4.14 Conditions

stx:if

<!-- Category: template -->
<stx:if
  test = expression>
  <!-- Content: template -->
</stx:if>

The mandatory test attribute contains an STXPath expression evaluating to boolean. The content template is instantiated if and only if the test attribute has evaluated to true.

stx:else

<!-- Category: template -->
<stx:else>
  <!-- Content: template -->
</stx:else>

This instruction must follow immediately after stx:if; a non-recoverable error must be reported otherwise. The content template is instantiated if and only if the test attribute of the preceding stx:if instruction has evaluated to false.
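A minimal sketch of an stx:if and stx:else pair follows. The element name price, the attribute name currency, the fallback text, the version value, and the namespace URI are assumptions.

```xml
<stx:transform version="1.0"
               xmlns:stx="http://stx.sourceforge.net/2002/ns">
  <!-- Assumption: namespace URI and version value are not stated in this section -->

  <stx:template match="price">
    <currency>
      <!-- An existing attribute node converts to true, a missing one to false -->
      <stx:if test="@currency">
        <stx:value-of select="@currency"/>
      </stx:if>
      <!-- Must directly follow stx:if; instantiated when the test was false -->
      <stx:else>
        <stx:text>EUR</stx:text>
      </stx:else>
    </currency>
  </stx:template>

</stx:transform>
```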

stx:choose

<!-- Category: template -->
<stx:choose>
  <stx:when
    test = expression>
    <!-- Content: template -->
  </stx:when>+
  <stx:otherwise>
    <!-- Content: template -->
  </stx:otherwise>?
</stx:choose>

This instruction has the same meaning as in [XSLT].

4.15 Loops

stx:for-each-item

<!-- Category: template -->
<stx:for-each-item
  name = qname
  select = expression>
  <!-- Content: template -->
</stx:for-each-item>

The stx:for-each-item instruction contains a template that is instantiated for each item of the sequence specified by the mandatory select attribute.

The mandatory name attribute specifies the name of a local variable that is declared automatically for each item and contains the current item.

Neither the current node (accessed with .) nor sf:position() change inside stx:for-each-item.
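As a sketch, the loop below instantiates its content once per item of a literal sequence. The element names, the variable name c, the version value, and the namespace URI are assumptions.

```xml
<stx:transform version="1.0"
               xmlns:stx="http://stx.sourceforge.net/2002/ns">
  <!-- Assumption: namespace URI and version value are not stated in this section -->

  <stx:template match="palette">
    <colors>
      <!-- $c is declared automatically and holds the current item -->
      <stx:for-each-item name="c" select="('red', 'green', 'blue')">
        <color><stx:value-of select="$c"/></color>
      </stx:for-each-item>
    </colors>
  </stx:template>

</stx:transform>
```

Inside the loop, the current node is still the palette element; only the variable c changes from one iteration to the next.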

stx:while

<!-- Category: template -->
<stx:while
  test = expression>
  <!-- Content: template -->
</stx:while>

The mandatory test attribute contains an STXPath expression evaluating to boolean. The content of the stx:while element is instantiated repeatedly as long as the test attribute evaluates to true.
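Since the test is re-evaluated before each iteration, the loop body normally changes a variable with stx:assign. A minimal sketch, with element names, the variable name n, the version value, and the namespace URI as assumptions:

```xml
<stx:transform version="1.0"
               xmlns:stx="http://stx.sourceforge.net/2002/ns">
  <!-- Assumption: namespace URI and version value are not stated in this section -->

  <stx:template match="countdown">
    <!-- Local variable, visible to the following siblings and their descendants -->
    <stx:variable name="n" select="3"/>
    <ticks>
      <stx:while test="$n &gt; 0">
        <tick><stx:value-of select="$n"/></tick>
        <!-- Without this assignment the loop would never terminate -->
        <stx:assign name="n" select="$n - 1"/>
      </stx:while>
    </ticks>
  </stx:template>

</stx:transform>
```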

4.16 Multiple Input Documents

stx:process-document

<!-- Category: template -->
<stx:process-document
  href = expression
  base = {uri-reference}|"#input"|"#stylesheet"
  group = qname>
  <!-- Content: stx:with-param* -->
</stx:process-document>

A stylesheet can process further source streams in addition to the one supplied when the transformation is invoked (the principal source stream). The current source stream can be changed with the stx:process-document instruction. When this instruction is instantiated, the expression in the mandatory href attribute is evaluated, each item in the resulting sequence is converted in turn to a string (a URI), and its value is used to identify and process a new current source stream. Then, the execution of the template containing the stx:process-document instruction continues with the original source stream.

If a URI is a relative URI then the base URI will be derived from the type of the item in the sequence that represents this URI. In case this item is a node then its base URI will be used, otherwise the base URI of the stylesheet will be used. Alternatively, the optional base attribute can be used to specify explicitly which base URI should be used. Its value must be either an absolute URI, the string "#input" in which case the base URI of the current input stream will be used, or the string "#stylesheet" in which case the base URI of the principal stylesheet will be used.

Note:

When processing a new document, the ancestor stack of the original document is not available for matching and navigation. Each new document has an ancestor stack of its own.
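The two ways of choosing a base URI can be sketched as follows. The element names include and include-shared, the attribute name href on those elements, the version value, and the namespace URI are assumptions.

```xml
<stx:transform version="1.0"
               xmlns:stx="http://stx.sourceforge.net/2002/ns">
  <!-- Assumption: namespace URI and version value are not stated in this section -->

  <!-- @href is a node, so a relative URI is resolved against that node's base URI -->
  <stx:template match="include">
    <stx:process-document href="@href"/>
  </stx:template>

  <!-- Here a relative URI is resolved against the principal stylesheet instead -->
  <stx:template match="include-shared">
    <stx:process-document href="@href" base="#stylesheet"/>
  </stx:template>

</stx:transform>
```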

4.17 Multiple Output Documents

stx:result-document

<!-- Category: template -->
<stx:result-document
  href = expression
  encoding = string>
  <!-- Content: template -->
</stx:result-document>

A stylesheet can produce further result streams in addition to the principal result stream. The current result stream can be changed with the stx:result-document instruction. Events generated as the result of executing instructions contained within the stx:result-document element are emitted to a new current result stream identified by the URI that results from evaluating the expression in the required href attribute and converting its value to a string. Then, the instructions following the end of the stx:result-document element continue to emit events into the original result stream.

The optional encoding attribute can be used to specify a preferred output encoding for the new result stream. If this attribute is not present the encoding of the principal result stream will be used.
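As a sketch, the template below writes each chapter element to a result stream of its own. The element name chapter, the attribute name file that is assumed to carry the target URI, the encoding value, the version value, and the namespace URI are all assumptions.

```xml
<stx:transform version="1.0"
               xmlns:stx="http://stx.sourceforge.net/2002/ns">
  <!-- Assumption: namespace URI and version value are not stated in this section -->

  <stx:template match="chapter">
    <!-- The file attribute of the source element supplies the target URI -->
    <stx:result-document href="@file" encoding="UTF-8">
      <stx:copy>
        <stx:process-children/>
      </stx:copy>
    </stx:result-document>
    <!-- Output produced from here on goes to the original result stream again -->
  </stx:template>

</stx:transform>
```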

4.18 Buffers

A sequence of events can be stored in an object called a buffer. The stored events can be emitted and processed later, in the same way as events emitted from a source stream. The events are emitted from a buffer in the same order in which they were stored. In other words, buffers are temporary stores of the 'first in, first out' type. The events stored in a buffer must represent a well-formed external general parsed entity (the restriction on a single root node is relaxed).

There are two types of buffers:

group buffers - stx:buffer is a child of either stx:transform or stx:group. Top-level buffers are considered members of the top-most default group that exists for each stylesheet.
local buffers - Declared within templates.

A buffer must be declared before it can be used. The same rules as for variables (see 6.1 Variables) apply to the visibility of buffers, their shadowing, and the creation of new instances for new-scope templates (see 4.2 Templates).

stx:buffer

<!-- Category: top-level, group or template -->
<stx:buffer
  name = qname>
  <!-- Content: template -->
</stx:buffer>

The stx:buffer element declares a buffer. The mandatory name attribute contains a qualified name identifying the declared buffer. The buffer is initialized with events generated as a result of the evaluation of the content of the stx:buffer. If the content is empty (stx:buffer element has no children) the buffer is empty.

For group buffers, the content of the stx:buffer element is evaluated statically. It is a recoverable error if the element stx:buffer declaring a group buffer contains an stx:process-children, stx:process-self, stx:process-siblings, stx:process-attributes, stx:process-document, stx:process-buffer, or stx:call-procedure instruction in its content. A processor may recover from this error by ignoring such an instruction.

stx:result-buffer

<!-- Category: template -->
<stx:result-buffer
  name = qname
  clear = "yes"|"no">
  <!-- Content: template -->
</stx:result-buffer>

The stx:result-buffer instruction directs events emitted by its content into the buffer specified with the mandatory name attribute rather than to the current result stream. The buffer must be declared with stx:buffer before it can be employed in stx:result-buffer.

If the buffer specified with the name attribute already contains a sequence of events, the new sequence of events is, by default, appended after the last event of the previously stored sequence. If the stx:result-buffer element has the optional clear attribute with the value "yes", the previously stored events are removed from the buffer before the new sequence of events is stored. The clear attribute defaults to "no".

Note:

To clear a buffer without storing a new sequence of events, use the stx:result-buffer instruction with no content: <stx:result-buffer name="my-buffer" clear="yes"/>

The events stored in a buffer become available to a subsequent stx:process-buffer only after the stx:result-buffer instruction has terminated. Until then, the previous contents remain accessible. Thus, to process a buffer and store the result in the same buffer again, use <stx:result-buffer name="b" clear="yes"> <stx:process-buffer name="b"/> </stx:result-buffer>

It is a non-recoverable error if this instruction is executed for a buffer that already acts as a (current or suspended) result buffer.

stx:process-buffer

<!-- Category: template -->
<stx:process-buffer
  name = qname
  group = qname>
  <!-- Content: stx:with-param* -->
</stx:process-buffer>

The stx:process-buffer instruction emits the events currently stored in the buffer specified by the mandatory name attribute to the STX processor. The events are processed in the same way as events supplied by source streams. When the very last event from the buffer is processed, the processing in the current template continues with an instruction, declaration or literal next to the stx:process-buffer instruction.

Note:

Changes to the contents of a buffer that is currently processed won't affect this processing. The stx:process-buffer instruction creates an internal copy of the contained events and emits them afterwards.

The processing of events from a buffer doesn't mean the emptying of this buffer. Once a sequence of events is stored in the buffer, it can be processed repeatedly.

Note:

A buffer is not treated as a new document, but rather as if events emitted from the buffer originate from the current source stream. The ancestor stack of the current source stream remains available for matching and navigation when processing nodes from the buffer.
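The buffer instructions are typically combined to reorder events. The sketch below collects all note elements of an article in a group buffer and emits them at the end of the article. The buffer name notes, the group name emit, the element names, the version value, and the namespace URI are assumptions. A separate group is used when replaying the buffer, because the replayed note events would otherwise match the collecting template again.

```xml
<stx:transform version="1.0"
               xmlns:stx="http://stx.sourceforge.net/2002/ns">
  <!-- Assumption: namespace URI and version value are not stated in this section -->

  <!-- Group buffer of the default group, initially empty -->
  <stx:buffer name="notes"/>

  <stx:template match="article">
    <article>
      <stx:process-children/>
      <!-- Replay the collected events using the templates of the emit group -->
      <stx:process-buffer name="notes" group="emit"/>
    </article>
  </stx:template>

  <!-- Redirect each note into the buffer instead of the result stream -->
  <stx:template match="note">
    <stx:result-buffer name="notes">
      <stx:copy attributes="@*">
        <stx:process-children/>
      </stx:copy>
    </stx:result-buffer>
  </stx:template>

  <stx:group name="emit">
    <stx:template match="note">
      <stx:copy attributes="@*">
        <stx:process-children/>
      </stx:copy>
    </stx:template>
  </stx:group>

</stx:transform>
```

What happens to the children of note that have no matching template depends on the default rule in effect for the stylesheet.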

4.19 Messages

stx:message

<!-- Category: template -->
<stx:message>
  <!-- Content: template -->
</stx:message>

The stx:message instruction generates a separate result stream whose handling is implementation dependent. It can be directed to a log, to a special message resolver, etc. However, all instructions in the content of the stx:message element must be processed even if the message stream is ignored.

5 Data Types

5.1 Atomic Types

There are four atomic data types in STX:

string
number
boolean
node

There are seven types of node recognized in STXPath (see 2.2 Nodes). For every type of node, there is a way of determining the string-value. Since descendants are not available at the time of processing, the string-values of some types of nodes differ from XPath string-values.

root nodes - there is no string value defined for root nodes, a recoverable error is reported. An STX processor is allowed to recover from this error by returning the empty string.
element nodes - if the very first child of an element happens to be a text node, the string-value of the element is the string-value of this text node. Otherwise, the string-value of the element is the empty string.
attribute nodes - the string-value of an attribute is the normalized value of this attribute
text nodes - the string-value of a text node is the character data of this node
cdata nodes - the string-value of a cdata node is the character data of this node
processing instruction nodes - the string-value of a processing instruction node is the part of the processing instruction following the target and any whitespace not including the terminating ?>
comment nodes - the string-value of a comment is the content of this comment not including the opening <!-- or the closing -->

5.2 Sequences

STXPath expressions (see 6 STXPath) always return a sequence. A sequence is an ordered collection of zero or more items. Unlike common lists, sequences are "flat"; sequences may not contain other sequences. Sequences may contain duplicate items. An item must be of one of the atomic types: string, number, boolean, or node.

A sequence with zero items is called an empty sequence. A sequence with exactly one item is called a singleton sequence. There is no distinction between an item and a singleton sequence containing this item; an item is equivalent to a singleton sequence containing this item and vice versa. A sequence has no identity. Equality comparison of sequences is performed only by comparing items of the sequences.

5.3 Type Conversions

Certain operators, functions, and syntactic constructs expect a value of a particular type to be supplied: this type is referred to as a required type. In such an event, a general sequence is converted to the required type according to the conversion rules.

The empty sequence is converted to required types as defined in the following table:

required type	result
boolean	`false`
string	empty string
number	`NaN`
node	NON-RECOVERABLE ERROR

A singleton sequence is converted to a required type according to the type of the only item in the sequence. An attempt to convert boolean, string, or number to node causes a non-recoverable error.

item type	boolean required	string required	number required
boolean		`false` is converted to 'false', `true` is converted to 'true'	`false` is converted to 0, `true` is converted to 1
string	the empty string is converted to `false`, other strings are converted to `true`		a string that consists of optional whitespace followed by an optional minus sign followed by a numeric literal (see 6.2 Literals) followed by whitespace is converted to the number that is nearest to the mathematical value represented by the string; any other string is converted to `NaN`.
number	0, +0, -0, `NaN` are converted to `false`, other numbers are converted to `true`	`NaN` is converted to 'NaN', +0 and -0 are converted to '0', positive infinity is converted to 'Infinity', negative infinity is converted to '-Infinity'. Other numbers are represented in decimal form as numeric literal (see 6.2 Literals) with no leading zeros (apart possibly from the one required digit immediately before the decimal point), preceded by a minus sign (-) if the number is negative.
node	a node is converted to `true`	a node is converted to its string value (see 2.5 Expressions)	a node is converted to its string value (see 2.5 Expressions); then the rules to convert strings to numbers are applied to convert the string value to a number

A sequence containing more than one item is converted according to its very first item; all other items are ignored. The same conversion rules as for singleton sequences are applied (see the table above).
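Some consequences of the conversion rules are easy to overlook. The sketch below applies them in stx:if tests, where boolean is the required type. The element names, the attribute name status, the version value, and the namespace URI are assumptions.

```xml
<stx:transform version="1.0"
               xmlns:stx="http://stx.sourceforge.net/2002/ns">
  <!-- Assumption: namespace URI and version value are not stated in this section -->

  <stx:template match="item">
    <result>
      <!-- Empty sequence to boolean: false when item has no status attribute -->
      <stx:if test="@status"><has-status/></stx:if>

      <!-- String to boolean: any non-empty string is true, even 'false' -->
      <stx:if test="'false'"><always/></stx:if>

      <!-- Number to boolean: 0 converts to false, so this is never instantiated -->
      <stx:if test="0"><never/></stx:if>

      <!-- Several items: only the first item, the empty string, counts; false -->
      <stx:if test="('', 'x')"><never/></stx:if>
    </result>
  </stx:template>

</stx:transform>
```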

6 STXPath

STXPath is an expression language for STX that looks very similar to [XPath] at first sight. Syntactically, STXPath is close to a subset of [XPath2]. However, since STX has a different notion of context, the meaning of some expressions may be different in STXPath and in XPath. Consider the following example:

In XPath, the expression /node1/node2 returns a node-set containing all node2 elements whose parent node1 is the document element. In STXPath, by contrast, the same expression returns only a single node from this node-set: the one that is an ancestor of the current node.

Each expression has its static context - the information that is available during static analysis of the expression, prior to its evaluation. The static context includes in-scope namespaces, default namespace for element names, default function namespace, and in-scope variables. The information that is available at the time when the expression is evaluated is the current context as defined in 2.3 Context.

Basic primitives of STXPath include:

variables (6.1 Variables)
literals (6.2 Literals)
parenthesized expressions (6.3 Parenthesized Expressions)
functions (6.4 Functions)

Expressions always evaluate to a sequence. See the EBNF production for expression in C STXPath Grammar for the details.

6.1 Variables

STX variables are scoped statically according to the literal structure of stylesheets. The grouping of templates makes it possible to share variables that are not global.

There are two types of variables:

group variables - stx:variable is a child of either stx:transform or stx:group. Top-level variables are considered to be members of the top-most default group that exists for each stylesheet.
local variables - Declared within templates.

A group variable is visible for the group where the variable is declared, for all descendant groups and for all templates belonging to these groups. A local variable is visible for all following siblings of the variable declaration and their descendants. Group variables may be shadowed (another variable with the same name is visible) by descendant group variables and by local variables. It is a non-recoverable error to redeclare a variable with the same name in the same group or template.

Variables always contain a sequence. STX instructions stx:variable and stx:assign are used to evaluate an expression and store its value into a variable.

Since variables are re-assignable, each variable must be declared using the stx:variable element before it is used (assigned or referenced). Group variables are statically initialized while parsing the stylesheet; only the static context information is available during the initialization of group variables. Local variables are initialized at run-time. A variable declared with no value is initialized with the singleton sequence containing the empty string.

stx:variable

<!-- Category: top-level or group or template -->
<stx:variable
  name = qname
  select = expression
  keep-value = "yes"|"no">
  <!-- Content: text template -->
</stx:variable>

This instruction is used to declare and initialize a variable. The mandatory name attribute contains the name of the variable. An expression in the select attribute is evaluated and the variable is initialized with its result. The select attribute is optional; a variable is initialized with the string resulting from the content of the stx:variable element if the select is missing. If the content is empty (stx:variable element has no children) the variable is initialized with the empty string.

Note:

Thus, <stx:variable name="var"/> is equal to <stx:variable name="var" select="''"/>.

It is a recoverable error if the element stx:variable declaring a group variable contains an stx:process-children, stx:process-self, stx:process-siblings, stx:process-attributes, stx:process-document, stx:process-buffer, or stx:call-procedure instruction in its content. A processor may recover from this error by ignoring such an instruction.

The optional keep-value attribute specifies whether a new instance of the variable created by instantiating a template having its new-scope attribute set to "yes" is initialized with the value of the shadowed variable (yes) or not (no). This attribute is allowed only for group variables. The default value is no. If there is no shadowed variable yet, the keep-value attribute is ignored.

stx:assign

<!-- Category: top-level or group or template -->
<stx:assign
  name = qname
  select = expression>
  <!-- Content: text template -->
</stx:assign>

This instruction is used to assign a new value to a previously declared variable. The mandatory name attribute contains the name of the variable. The expression in the optional select attribute is evaluated and its result is assigned to the variable. The string resulting from the content of the stx:assign element is assigned to the variable if the select is missing. If the content is empty, the empty string is assigned to the variable.

Note:

Thus, <stx:assign name="var"/> is equivalent to <stx:assign name="var" select="''"/>.

6.2 Literals

A literal is a direct syntactic representation of an atomic value. STXPath supports two kinds of literals: string literals and numeric literals.

The value of a string literal is a singleton sequence containing an item whose atomic type is string and whose value is the string denoted by the characters between the delimiting quotation marks.

  StringLiteral    ::=    (["][^"]*["]) | (['][^']*['])

The value of a numeric literal is a singleton sequence containing an item whose type is number and whose value is obtained by parsing the numeric literal according to the rules for string to numbers conversion (see 5.3 Type Conversions).

  NumericLiteral    ::=    IntegerLiteral | DecimalLiteral | DoubleLiteral
  IntegerLiteral    ::=    Digits
  DecimalLiteral    ::=    ('.' Digits) | (Digits '.' [0-9]*)
  DoubleLiteral    ::=    (('.' Digits) | (Digits ('.' [0-9]*)?))([e]|[E])([+]|[-])? Digits
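The literal productions above can be exercised with a small, non-normative Python sketch. It assumes that the exponent of a DoubleLiteral carries an optional single sign (+ or -) and that Digits stands for one or more decimal digits; the names classify_literal and the pattern constants are hypothetical and are not part of STX.

```python
import re

# Regular expressions transcribed from the productions (assumed reading).
DIGITS = r"[0-9]+"
INTEGER = DIGITS
DECIMAL = rf"(?:\.{DIGITS}|{DIGITS}\.[0-9]*)"
DOUBLE = rf"(?:\.{DIGITS}|{DIGITS}(?:\.[0-9]*)?)[eE][+-]?{DIGITS}"
STRING = r"""(?:"[^"]*"|'[^']*')"""


def classify_literal(text):
    """Return the kind of literal that matches the whole text, or None."""
    kinds = (
        ("integer", INTEGER),
        ("decimal", DECIMAL),
        ("double", DOUBLE),
        ("string", STRING),
    )
    for kind, pattern in kinds:
        if re.fullmatch(pattern, text):
            return kind
    return None
```

Under these assumptions, a string literal cannot contain its own delimiting quotation mark, so 'a'b' is not a single literal.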

6.3 Parenthesized Expressions

Parentheses may be used to enforce a particular evaluation order in expressions that contain multiple operators.

Parentheses are also used as delimiters in constructing a sequence, as described in 6.6 Sequence Expressions.

6.4 Functions

A function call consists of a function qualified name followed by a parenthesized list of zero or more expressions. The expressions inside the parentheses provide the arguments of the function call. The number of arguments must be equal to the number of function parameters; otherwise a static non-recoverable error is raised.

A function call is evaluated as follows:

Each argument expression is evaluated, producing an argument value (sequence).
If the corresponding function parameter has a required type, the argument value is converted to this type.
The function is executed using the converted argument values. The result is a value of the function's declared return type.
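The evaluation steps above can be sketched in Python as follows. This is an illustrative model only: the registry FUNCTIONS, the exception StaticError, and the use of Python's str as a stand-in for the conversion rules of 5.3 Type Conversions are all assumptions made for the example.

```python
class StaticError(Exception):
    """Stands in for a static non-recoverable error."""


# Hypothetical registry: function name -> (parameter converters, implementation).
FUNCTIONS = {
    "string-length": ((str,), lambda s: float(len(s))),
    "starts-with": ((str, str), lambda s, prefix: s.startswith(prefix)),
}


def call_function(name, arg_exprs, evaluate):
    """Check the arity, evaluate and convert the arguments, then run the function."""
    converters, implementation = FUNCTIONS[name]
    # The argument count is checked before any argument is evaluated.
    if len(arg_exprs) != len(converters):
        raise StaticError(f"{name} expects {len(converters)} argument(s)")
    values = [evaluate(expr) for expr in arg_exprs]               # step 1
    converted = [conv(v) for conv, v in zip(converters, values)]  # step 2
    return implementation(*converted)                             # step 3
```

Here evaluate is whatever callable the caller uses to evaluate an argument expression; in a test it can simply be the identity function.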

STXPath function names are contained in the reserved namespace http://stx.sourceforge.net/2003/functions. The sf: prefix is used to refer to this namespace in this document. The default function namespace is assigned to this reserved namespace in STX. Thus, the functions namespace does not need to be declared in STX stylesheets and STXPath functions can be invoked without any namespace prefix.

Some STXPath functions have the same definitions as their counterparts (functions with the same local name) in XPath 2.0. These functions are not re-defined in this section. Instead, original definitions in [Functions and Operators] are referenced. Other STXPath functions are either different from their XPath 2.0 counterparts or have no such counterparts; these functions are defined in this section.

All errors raised when evaluating STXPath functions are non-recoverable errors (see 2.9 Errors).

6.4.1 Sequence Functions

sf:empty(sequence) as boolean

Indicates whether or not the provided sequence is empty.

See the definition in [Functions and Operators].

sf:exists(sequence) as boolean

Indicates whether or not the provided sequence is not empty.

See the definition in [Functions and Operators].

sf:item-at(sequence, number) as item

Returns the item at given index.

See the definition in [Functions and Operators].

sf:index-of(sequence, item) as number

Returns a sequence of integer numbers, each of which is the index of a member of the specified sequence that is equal to the item that is the value of the second argument.

See the definition in [Functions and Operators].

sf:subsequence(sequence, number, number?) as sequence

Returns the subsequence of a given sequence identified by location.

See the definition in [Functions and Operators].

sf:insert(sequence, number, sequence) as sequence

Inserts an item or sequence of items into a specified position of a sequence.

See the definition in [Functions and Operators].

sf:remove(sequence, number) as sequence

Removes an item from a specified position of a sequence.

See the definition in [Functions and Operators].
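As a non-normative illustration, the following Python sketch models some of the sequence functions on tuples. It assumes the 1-based positions used by XPath 2.0 [Functions and Operators] and integer position arguments; the treatment of an out-of-range index in item_at is an assumption of this sketch, and the authoritative definitions remain those referenced above.

```python
def item_at(seq, index):
    """Item at a 1-based index (raising on a bad index is an assumption)."""
    if not 1 <= index <= len(seq):
        raise IndexError("index out of bounds")
    return seq[index - 1]


def index_of(seq, item):
    """1-based positions of all items equal to the given item."""
    return tuple(i for i, value in enumerate(seq, start=1) if value == item)


def subsequence(seq, start, length=None):
    """Items at positions p with start <= p < start + length."""
    first = max(start, 1)
    end = len(seq) + 1 if length is None else start + length  # exclusive
    return tuple(seq[first - 1:max(end - 1, first - 1)])


def insert(seq, position, inserts):
    """Insert items before the given position, clamped to the sequence bounds."""
    p = min(max(position, 1), len(seq) + 1)
    return tuple(seq[:p - 1]) + tuple(inserts) + tuple(seq[p - 1:])


def remove(seq, position):
    """Drop the item at the position; other positions leave seq unchanged."""
    if not 1 <= position <= len(seq):
        return tuple(seq)
    return tuple(seq[:position - 1]) + tuple(seq[position:])
```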

6.4.2 Node Functions

sf:name(node) as string

Returns the name of the current node or the specified node.

See the definition in [Functions and Operators].

sf:namespace-uri(node) as string

Returns the namespace URI for the QName of the argument node or the current node if the argument is omitted.

See the definition in [Functions and Operators].

sf:local-name(node) as string

Returns the local name of the current node or the specified node.

See the definition in [Functions and Operators].

sf:position() as number

The sf:position function normally returns a number equal to the position of the current node among its siblings; see 2.3 Context for the details of sf:position() semantics.

sf:has-child-nodes() as boolean

The sf:has-child-nodes function returns true if and only if the current node is the document node or an element node and has child nodes (it is not empty). It returns false otherwise.

sf:node-kind(node) as string

The sf:node-kind function returns a string value representing the node's kind: either "document", "element", "attribute", "text", "cdata", "processing-instruction", or "comment".

sf:get-in-scope-namespaces(node) as sequence

Returns the names of the in-scope namespaces for the given element.

See the definition in [Functions and Operators].

sf:get-namespace-uri-for-prefix(node, string) as string

Returns the namespace URI of one of the in-scope namespaces for the given element, identified by its namespace prefix.

See the definition in [Functions and Operators].

sf:lang(string) as boolean

Returns true or false depending on whether the language of the current node, as defined using the xml:lang attribute, is the same as, or a sub-language of, the language specified by the argument.

See the definition in [Functions and Operators].

6.4.3 Boolean Functions

sf:true() as boolean

Returns the boolean value TRUE.

See the definition in [Functions and Operators].

sf:false() as boolean

Returns the boolean value FALSE.

See the definition in [Functions and Operators].

sf:not(sequence) as boolean

Inverts the boolean value of the argument.

See the definition in [Functions and Operators].

6.4.4 String Functions

sf:concat(string?, ... ) as string

Concatenates two or more character strings.

See the definition in [Functions and Operators].

sf:string-join(string?, string?) as string

Accepts a sequence of strings and returns the strings concatenated together with an optional separator.

See the definition in [Functions and Operators].

sf:starts-with(string, string) as boolean

Indicates whether the value of one string begins with the characters of the value of another string.

See the definition in [Functions and Operators].

sf:ends-with(string, string) as boolean

Indicates whether the value of one string ends with the characters of the value of another string.

See the definition in [Functions and Operators].

sf:contains(string, string) as boolean

Indicates whether the value of one string contains the characters of the value of another string.

See the definition in [Functions and Operators].

sf:substring(string, number, number?) as string

Returns a string located at a specified place in the value of a string.

See the definition in [Functions and Operators].

sf:substring-before(string, string) as string

Returns the characters of one string that precede in that string the characters in the value of another string.

See the definition in [Functions and Operators].

sf:substring-after(string, string) as string

Returns the characters of one string that follow in that string the characters in the value of another string.

See the definition in [Functions and Operators].

sf:string-length(string) as number

Returns the length of the argument.

See the definition in [Functions and Operators].

sf:normalize-space(string) as string

Returns the whitespace-normalized value of the argument.

See the definition in [Functions and Operators].

sf:normalize-unicode(string, string?) as string

Returns the normalized value of the first argument in the normalization form specified by the second argument.

See the definition in [Functions and Operators].

sf:upper-case(string) as string

Returns the upper-cased value of the argument.

See the definition in [Functions and Operators].

sf:lower-case(string) as string

Returns the lower-cased value of the argument.

See the definition in [Functions and Operators].

sf:translate(string, string, string) as string

Returns the first argument string with occurrences of characters in the second argument replaced by the character at the corresponding position in the third string.

See the definition in [Functions and Operators].

sf:string-pad(string, number) as string

Returns a string composed of as many copies of its first argument as specified in its second argument.

See the definition in [Functions and Operators].

sf:matches(string, string, string?) as boolean

Returns a boolean value that indicates whether the value of the first argument is matched by the regular expression that is the value of the second argument.

See the definition in [Functions and Operators].

sf:replace(string, string, string, string?) as string

Returns the value of the first argument with every substring matched by the regular expression that is the value of the second argument replaced by the replacement string that is the value of the third argument.

See the definition in [Functions and Operators].

sf:tokenize(string, string, string?) as sequence

Returns a sequence of zero or more strings whose values are substrings of the value of the first argument separated by substrings that match the regular expression that is the value of the second argument.

See the definition in [Functions and Operators].

sf:escape-uri(string, boolean) as string

Returns the string representing a URI value with certain characters escaped.

See the definition in [Functions and Operators].
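A few of the string functions behave differently from their closest Python equivalents. The following non-normative sketch models sf:substring-before, sf:substring-after, sf:translate, and sf:string-pad, assuming the behaviour of the corresponding XPath functions (in particular for an empty second argument and for characters without a replacement); the Python function names are hypothetical.

```python
def substring_before(s, sep):
    """Part of s before the first occurrence of sep, or "" if sep is absent."""
    if sep == "":
        return ""
    head, found, _ = s.partition(sep)
    return head if found else ""


def substring_after(s, sep):
    """Part of s after the first occurrence of sep, or "" if sep is absent."""
    if sep == "":
        return s
    _, found, tail = s.partition(sep)
    return tail if found else ""


def translate(s, source, target):
    """Replace characters of source by those at the same position in target.

    Characters of source with no counterpart in target are removed; the
    first occurrence of a character in source wins.
    """
    table = {}
    for i, ch in enumerate(source):
        if ord(ch) not in table:
            table[ord(ch)] = target[i] if i < len(target) else None
    return s.translate(table)


def string_pad(s, count):
    """Concatenate count copies of s."""
    return s * count
```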

6.4.5 Numerical Functions

sf:floor(number) as number

Returns the largest integer less than or equal to the argument.

See the definition in [Functions and Operators].

sf:ceiling(number) as number

Returns the smallest integer greater than or equal to the argument.

See the definition in [Functions and Operators].

sf:round(number) as number

Rounds to the nearest integer.

See the definition in [Functions and Operators].
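The rounding rule deserves care in implementations. The following non-normative Python sketch assumes the XPath rule that a value exactly halfway between two integers is rounded towards positive infinity, which differs from Python's built-in round(); NaN and infinite arguments are not handled, and the function names are hypothetical.

```python
import math


def sf_floor(x):
    """Largest integer less than or equal to x."""
    return float(math.floor(x))


def sf_ceiling(x):
    """Smallest integer greater than or equal to x."""
    return float(math.ceil(x))


def sf_round(x):
    """Nearest integer; ties go towards positive infinity (assumed rule)."""
    return float(math.floor(x + 0.5))
```

For comparison, Python's round(2.5) yields 2, whereas sf_round(2.5) in this sketch yields 3.0.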

6.4.6 Aggregate Functions

sf:count(sequence) as number

Returns the number of items in the sequence.

See the definition in [Functions and Operators].

sf:sum(sequence) as number

The sf:sum function returns the sum, for each item in the argument sequence, of the result of converting the item to a number. If the value of the argument is the empty sequence, the function returns the empty sequence. If an item can't be converted to a number, then an error is raised.

sf:avg(sequence) as number

The sf:avg function returns the average of all items in the argument sequence converted to numbers. If the argument sequence is the empty sequence, the empty sequence is returned. If an item can't be converted to a number, then an error is raised.

sf:max(sequence) as number

The sf:max function converts all items of the argument sequence to numbers and returns the item whose value is greater than or equal to the value of every other item in the argument sequence. If there are two or more such items, then the specific item whose value is returned is implementation-dependent. If the argument sequence is the empty sequence, the empty sequence is returned. If an item can't be converted to a number, then an error is raised.

sf:min(sequence) as number

The sf:min function converts all items of the argument sequence to numbers and returns the item whose value is less than or equal to the value of every other item in the argument sequence. If there are two or more such items, then the specific item whose value is returned is implementation-dependent. If the argument sequence is the empty sequence, the empty sequence is returned. If an item can't be converted to a number, then an error is raised.
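The aggregate functions described above share the same shape: the empty sequence yields the empty sequence, and every item is first converted to a number. The following non-normative Python sketch models this, representing the empty sequence as an empty tuple and using Python's float() as a stand-in for the conversion of 5.3 Type Conversions (an assumption); a failed conversion surfaces as a ValueError or TypeError.

```python
import math


def _numbers(seq):
    """Convert every item to a number; float() raises if it cannot."""
    return [float(item) for item in seq]


def sf_sum(seq):
    return math.fsum(_numbers(seq)) if seq else ()


def sf_avg(seq):
    return math.fsum(_numbers(seq)) / len(seq) if seq else ()


def sf_max(seq):
    return max(_numbers(seq)) if seq else ()


def sf_min(seq):
    return min(_numbers(seq)) if seq else ()
```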

6.4.7 Conversion Functions

sf:string(sequence) as string

The sf:string function returns the result of converting the argument to a string. See 5.3 Type Conversions for details.

sf:number(sequence) as number

The sf:number function returns the result of converting the argument to a number. See 5.3 Type Conversions for details.

sf:boolean(sequence) as boolean

The sf:boolean function returns the result of converting the argument to a boolean. See 5.3 Type Conversions for details.

6.5 Data Accessors

The only data available when processing the current node is the data related to the current node itself, the data related to nodes on the ancestor stack, and data stored in variables. Location paths called data accessors are special expressions used to access this data.

A location path always operates on the ancestor stack and evaluates to a sequence of nodes from this stack. A path beginning with / or // is called an absolute location path and starts at the document root; any other path is called a relative location path and starts at the current node, which is always the topmost node of the ancestor stack.

A location path consists of a series of one or more steps, separated by "/" or "//". This sequence of steps is evaluated from left to right. A path S1/S2 is evaluated as follows: S1 evaluates to a sequence of nodes from the ancestor stack. Each of these nodes acts as a base node for the following step S2.

If the step S2 is a NodeNameTest or a KindTest then the result is the node on the ancestor stack following the base node, provided it matches this step. In other words: such a step selects the child of the base node (or the empty sequence).
If the step S2 is ".." then the result is the node on the ancestor stack preceding the base node, or in other words: the parent node, provided the base node is not the root node (otherwise the empty sequence).

A path S1//S2 is evaluated by evaluating the subexpression /S2 on a sequence of base nodes formed by concatenating the nodes that result from evaluating S1 with all of their descendant nodes.

The result of evaluating the path is the concatenation of all resulting nodes into a sequence, sorted in document order without duplicate nodes.

Besides location paths, variables and function calls may also evaluate to a sequence of nodes. Such an expression is called a NodeAccessor.

A node accessor may be optionally followed by a last step which accesses the attributes for each of the nodes selected by the node accessor.

Note:

Compared to full XPath, the location paths in STXPath allow only abbreviated axes. Moreover, variables and function calls cannot be used as the first step of a location path unless the next step accesses only attributes. A location path can only select nodes from the ancestor stack.

Predicates are not allowed in data accessors.

The sub-expression "." returns the current node (the topmost node of the ancestor stack). It cannot be used as a step within paths.

Here are some examples of data accessors:

.. - returns the parent node of the current node
//foo - returns a sequence whose items are all foo elements on the ancestor stack
@foo - returns the foo attribute of the current node
../../@bar - returns the bar attribute of the grandparent of the current node
/aaa/bbb - returns a bbb element from the ancestor stack which is a child of aaa element which is the root element of the ancestor stack (and hence the root element of the input document)
/*//node() - returns all nodes from the ancestor stack except for the first
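The evaluation of location paths on the ancestor stack can be modelled with the following non-normative Python sketch. It assumes that the stack is a list whose first entry stands for the document node and whose last entry is the current node, with every other entry being an element name; attributes, namespace prefixes, and the distinction between * and node() are not modelled. The function name evaluate_path is hypothetical.

```python
import re


def evaluate_path(path, stack):
    """Evaluate a location path against the ancestor stack.

    stack[0] is the document node and stack[-1] is the current node.
    Returns the selected nodes in document order without duplicates.
    """
    tokens = re.findall(r"//|/|[^/]+", path)
    if tokens[0] in ("/", "//"):
        base = {0}                      # absolute path: start at the root
    else:
        base = {len(stack) - 1}         # relative path: start at the current node
        tokens.insert(0, "/")
    for separator, step in zip(tokens[0::2], tokens[1::2]):
        if separator == "//":
            # Add all descendants of each base node that are on the stack.
            base = {j for i in base for j in range(i, len(stack))}
        selected = set()
        for i in base:
            if step == "..":
                if i > 0:
                    selected.add(i - 1)     # parent of the base node
            elif i + 1 < len(stack) and step in ("*", "node()", stack[i + 1]):
                selected.add(i + 1)         # child of the base node
        base = selected
    return [stack[i] for i in sorted(base)]
```

With the stack ["#document", "aaa", "bbb", "foo"], this sketch evaluates .. to the bbb element, //foo to the foo element, and /aaa/bbb to the bbb element.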

6.6 Sequence Expressions

STXPath supports operators to construct and combine sequences. One way to construct a sequence is using a parenthesized expression (6.3 Parenthesized Expressions), which consists of zero or more expressions separated with the comma operator and delimited with parentheses. The parenthesized expression is evaluated by evaluating each of its constituent expressions and concatenating the resulting sequences, in order, into a single result sequence.

Here are some examples of expressions that construct sequences:

This expression is a sequence of five integers:

(10, 1, 2, 3, 4)

This expression constructs one sequence from the sequences 10, (1, 2), the empty sequence (), and (3, 4):

(10, (1, 2), (), (3, 4))

It evaluates to the sequence (10, 1, 2, 3, 4).
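The concatenation rule means that sequences never nest. A minimal Python sketch of this behaviour, with sequences modelled as tuples and the hypothetical helper make_sequence standing in for the parenthesized expression, is:

```python
def make_sequence(*values):
    """Concatenate the given values, in order, into one flat sequence.

    Each value is either an atomic item or an already flat tuple.
    """
    result = []
    for value in values:
        if isinstance(value, tuple):
            result.extend(value)    # a sequence contributes its items
        else:
            result.append(value)    # an atomic value contributes itself
    return tuple(result)


flat = make_sequence(10, make_sequence(1, 2), make_sequence(), make_sequence(3, 4))
```

Here flat is the tuple (10, 1, 2, 3, 4), mirroring the second example above.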

6.7 Arithmetic Expressions

STXPath provides arithmetic operators for addition, subtraction, multiplication, division, and modulus, in their usual binary and unary forms. The binary subtraction operator must be preceded by whitespace in order to distinguish it from a hyphen, which is a valid name character.

An arithmetic expression is evaluated by applying the following rules:

If either operand is the empty sequence, the result of the operation is the empty sequence.
Operands other than empty sequences are converted (5.3 Type Conversions) to numbers before the expression is evaluated. If the conversion of an operand fails (yields NaN), the result of the operation is NaN.
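The two rules above can be sketched in Python as follows. The sketch is non-normative: an operand is modelled either as the empty tuple (the empty sequence) or as a single atomic value, Python's float() stands in for the conversion of 5.3 Type Conversions, and only addition, subtraction, and multiplication are shown. The names arithmetic and to_number are hypothetical.

```python
import math
import operator

OPERATORS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def to_number(value):
    """Convert an atomic value to a number; a failed conversion yields NaN."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return math.nan


def arithmetic(op, left, right):
    """Apply a binary arithmetic operator following the two rules."""
    if left == () or right == ():
        return ()                       # rule 1: empty operand, empty result
    return OPERATORS[op](to_number(left), to_number(right))  # rule 2
```

Because NaN propagates through floating-point arithmetic, an operand that cannot be converted makes the whole result NaN.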

6.8 Comparison Expressions

Comparison expressions allow two values to be compared. STXPath provides the following general comparison operators: =, !=, <, <=, >, >=. The result of a comparison is always true or false (a singleton sequence containing one boolean item).

  CompOp    ::=    '=' | '!=' | '<' | '<=' | '>' | '>='

The result of a comparison of sequences is defined by applying the following rules, in order:

If either operand is the empty sequence, the result is false.
The comparison A operator B is true for sequences A and B if the comparison a operator b is true for some item a in A and some item b in B. Otherwise, A operator B is false.

The result of a comparison of items is defined by applying the following rules. The rules defined in 5.3 Type Conversions apply for conversions:

If both items to be compared are nodes, then the comparison will be true if and only if the result of performing the comparison on the string-values of the two nodes is true.
If one item to be compared is a node and the other is a number, then the comparison will be true if and only if the result of performing the comparison on the number and on the result of converting the string-value of that node to a number is true.
If one item to be compared is a node and the other is a string, then the comparison will be true if and only if the result of performing the comparison on the string-value of the node and the other string is true.
If one item to be compared is a node and the other is a boolean, then the comparison will be true if and only if the result of performing the comparison of true and the boolean value is true.
When neither item to be compared is a node and the operator is = or !=, then the items are compared by converting them to a common type as follows and then comparing them. If at least one item to be compared is a boolean, then each item to be compared is converted to a boolean. Otherwise, if at least one item to be compared is a number, then each item to be compared is converted to a number. Otherwise, both items to be compared are converted to strings.
When neither item to be compared is a node and the operator is <=, <, >= or >, then the items are compared by converting both items to numbers and comparing the numbers.
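The comparison rules of this section can be modelled with the following non-normative Python sketch. Sequences are tuples and a node is represented by a hypothetical Node class carrying its string-value. The helpers to_number and to_boolean only approximate 5.3 Type Conversions (a string that is not a number becomes NaN; a non-zero number or a non-empty string becomes true) and are assumptions of this example.

```python
import math
import operator

OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}


class Node:
    def __init__(self, string_value):
        self.string_value = string_value


def to_number(x):
    try:
        return float(x)
    except (TypeError, ValueError):
        return math.nan


def to_boolean(x):
    if isinstance(x, bool):
        return x
    if isinstance(x, (int, float)):
        return x != 0 and not math.isnan(x)
    return len(x) > 0


def node_as(node, other):
    """Replace a node by the value it is compared as against a non-node item."""
    if isinstance(other, bool):
        return True
    if isinstance(other, (int, float)):
        return to_number(node.string_value)
    return node.string_value


def compare_items(op, a, b):
    if isinstance(a, Node) and isinstance(b, Node):
        return compare_items(op, a.string_value, b.string_value)
    if isinstance(a, Node):
        return compare_items(op, node_as(a, b), b)
    if isinstance(b, Node):
        return compare_items(op, a, node_as(b, a))
    if op in ("=", "!="):
        if isinstance(a, bool) or isinstance(b, bool):
            a, b = to_boolean(a), to_boolean(b)
        elif isinstance(a, (int, float)) or isinstance(b, (int, float)):
            a, b = to_number(a), to_number(b)
    else:
        a, b = to_number(a), to_number(b)
    return OPS[op](a, b)


def general_compare(op, seq_a, seq_b):
    """True if the comparison holds for some pair of items; false if either is empty."""
    return any(compare_items(op, a, b) for a in seq_a for b in seq_b)
```

One consequence of the existential rule is that both (1, 2) = (2, 3) and (1, 2) != (1, 2) are true.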

6.9 Logical Expressions

STXPath provides two common logical operators: and and or. The value of a logical expression is always one of the boolean values true or false (a singleton sequence containing a boolean item).

A logical expression is evaluated by reducing each of its operands to an effective boolean value by applying the following rules, in order:

If the operand is the empty sequence, its effective boolean value is false.
If the operand is a singleton sequence containing a boolean item, the item serves as the effective boolean value.
If the operand is a sequence that contains at least one node, its effective boolean value is true.
In any other case, operands are converted to boolean (see 5.3 Type Conversions) to get effective boolean values.

An AND expression returns true if the effective boolean values of both of its operands are true; otherwise it returns false.

An OR expression returns false if the effective boolean values of both of its operands are false; otherwise it returns true.
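The reduction to an effective boolean value and the two operators can be sketched in Python as follows. The sketch is non-normative: sequences are tuples, Node is a hypothetical placeholder class for nodes, and the final rule is only approximated for a single number or string (non-zero and non-NaN, or non-empty, is true), which is an assumption about 5.3 Type Conversions. Other cases are deliberately left unimplemented.

```python
import math


class Node:
    """Placeholder for a node item."""


def effective_boolean_value(seq):
    if len(seq) == 0:
        return False                                    # rule 1
    if len(seq) == 1 and isinstance(seq[0], bool):
        return seq[0]                                   # rule 2
    if any(isinstance(item, Node) for item in seq):
        return True                                     # rule 3
    if len(seq) == 1:                                   # rule 4 (assumed conversion)
        item = seq[0]
        if isinstance(item, (int, float)):
            return item != 0 and not math.isnan(item)
        if isinstance(item, str):
            return len(item) > 0
    raise NotImplementedError("conversion is defined in 5.3 Type Conversions")


def logical_and(left, right):
    return effective_boolean_value(left) and effective_boolean_value(right)


def logical_or(left, right):
    return effective_boolean_value(left) or effective_boolean_value(right)
```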

In addition to logical expressions, STXPath provides a function named not() that takes a general sequence as its parameter and returns a boolean value (see 6.4.3 Boolean Functions).

7 Extensions

STX will define extension modules to interact with other XML and non-XML technologies. This document describes only the core STX language. Possible extensions include the following modules:

STX-Script
STX-XSLT
STX-XPath


[1]    pattern    ::=    PathPattern ('|' PathPattern)?
[2]    expression    ::=    Expr

[3]    PathPattern    ::=    AbsolutePattern | RelativePattern
[4]    AbsolutePattern    ::=    '/' RelativePattern?
[5]    RelativePattern    ::=    Step (('/' RelativePattern) | ('//' RelativePattern))?
[6]    Step    ::=    NodeTest Predicate?
[7]    NodeTest    ::=    NameTest | KindTest
[8]    Predicate    ::=    '[' Expr ']'
[9]    NameTest    ::=    NodeNameTest | AttributeNameTest
[10]    NodeNameTest    ::=    QName | NCName ':' '*' | '*' | '*' ':' NCName
[11]    AttributeNameTest    ::=    '@' QName | '@' NCName ':' '*' | '@' '*' | '@' '*' ':' NCName
[12]    KindTest    ::=    AnyKindTest | CommentTest | ProcessingInstructionTest | TextTest | CDATATest
[13]    AnyKindTest    ::=    'node()'
[14]    CommentTest    ::=    'comment()'
[15]    ProcessingInstructionTest    ::=    'processing-instruction(' StringLiteral? ')'
[16]    TextTest    ::=    'text()'
[17]    CDATATest    ::=    'cdata()'

[18]    Expr    ::=    OrExpr
[19]    OrExpr    ::=    AndExpr | OrExpr 'or' AndExpr
[20]    AndExpr    ::=    GeneralComp | AndExpr 'and' GeneralComp
[21]    GeneralComp    ::=    AdditiveExpr | GeneralComp CompOp AdditiveExpr
[22]    AdditiveExpr    ::=    MultiplicativeExpr | AdditiveExpr ('+' | '-') MultiplicativeExpr
[23]    MultiplicativeExpr    ::=    UnaryExpr | MultiplicativeExpr ('*' | 'div' | 'mod') UnaryExpr
[24]    UnaryExpr    ::=    ('-' | '+')? BasicExpr
[25]    BasicExpr    ::=    DataAccessor | ParenthesizedExpr | Literal | '.'
[26]    ParenthesizedExpr    ::=    '(' ExprSequence? ')'
[27]    ExprSequence    ::=    Expr (',' Expr)*
[28]    Literal    ::=    NumericLiteral | StringLiteral

[29]    DataAccessor    ::=    NodeAccessor | NodeAccessor '/' AttributeNameTest | AttributeNameTest
[30]    NodeAccessor    ::=    PathAccessor | Variable | FunctionCall
[31]    FunctionCall    ::=    QName '(' ExprSequence? ')'
[32]    PathAccessor    ::=    ('/' | '//')? RelativeAccessor
[33]    RelativeAccessor    ::=    RelativeAccessor ('/' | '//') AccessorStep | AccessorStep
[34]    AccessorStep    ::=    NodeNameTest | KindTest | '..'

[35]    CompOp    ::=    '=' | '!=' | '<' | '<=' | '>' | '>='
[36]    NumericLiteral    ::=    IntegerLiteral | DecimalLiteral | DoubleLiteral
[37]    IntegerLiteral    ::=    Digits
[38]    DecimalLiteral    ::=    ('.' Digits) | (Digits '.' [0-9]*)
[39]    DoubleLiteral    ::=    (('.' Digits) | (Digits ('.' [0-9]*)?))([e]|[E])([+]|[-])? Digits
[40]    StringLiteral    ::=    (["][^"]*["]) | (['][^']*['])
[41]    Variable    ::=    '$' QName
[42]    Digits    ::=    [0-9]+

Streaming Transformations for XML (STX) Version 1.0

Working Draft 5 May 2003
