STX

Streaming Transformations for XML (STX)
Version 1.0

Working Draft 14 January 2003

This version:
http://stx.sourceforge.net/documents/spec-stx-20030114.html
Latest version:
http://stx.sourceforge.net/documents/
Authors and Contributors:
Petr Cimprich <petr at NO-SPAM.gingerall.cz>
Oliver Becker <obecker at NO-SPAM.informatik.hu-berlin.de>
Christian Nentwich <c.nentwich at NO-SPAM.cs.ucl.ac.uk>
Honza Jiroušek <honza.jirousek at NO-SPAM.ecn.cz>
Michael Kay <michael.h.kay at NO-SPAM.ntlworld.com>
Tom Kaiser <tom at NO-SPAM.gingerall.cz>
Paul Brown <prb at NO-SPAM.fivesight.com>
Manos Batsis <mbatsis at NO-SPAM.humanmarkup.org>
Pavel Hlavnička <pavel at NO-SPAM.gingerall.cz>
Niko Matsakis <niko at NO-SPAM.alum.mit.edu>
Cyrus Dolph <cyrus at NO-SPAM.datapower.com>
Copyright © 2002 authors and contributors. All rights reserved.

Abstract

STX is an XML-based language for transforming XML documents into other XML documents without building a tree in memory. An STX processor transforms one or more source streams of SAX2 events according to rules given in an XML document called an STX stylesheet and generates one or more result SAX2 streams. Each incoming event invokes one or more rules, which can, for example, emit events to the result stream or access working storage.

Status of this Document

This document is a working draft of the STX transformation language specification.

Table of Contents

1 Introduction
2 Concepts
    2.1 Initiating a Transformation
    2.2 Nodes
    2.3 Context
    2.4 Precedence Categories
    2.5 Match Patterns
    2.6 Errors
3 Stylesheet Structure
    3.1 STX Namespace
    3.2 Transform Element
    3.3 Grouping of Templates
    3.4 Stylesheet Inclusion
4 Generating Output
    4.1 Transformation Options
    4.2 Namespace Aliasing
    4.3 Templates
    4.4 Procedures
    4.5 Parameters
    4.6 Copying the Current Node
    4.7 Processing Nested Events
    4.8 Processing Attributes
    4.9 Processing Siblings
    4.10 Running Overridden Templates
    4.11 Processing Text
    4.12 Outputting Strings
    4.13 Outputting Elements and Attributes
    4.14 Outputting Other Nodes
    4.15 Conditions
    4.16 Loops
    4.17 Multiple Input Documents
    4.18 Multiple Output Documents
    4.19 Buffers
    4.20 Messages
5 Data Types
    5.1 Atomic Types
    5.2 Sequences
    5.3 Type Conversions
    5.4 Tree Fragments
6 Expressions
    6.1 Variables
    6.2 Literals
    6.3 Parenthesized Expressions
    6.4 Functions
        6.4.1 Sequence Functions
        6.4.2 Node Functions
        6.4.3 Boolean Functions
        6.4.4 String Functions
        6.4.5 Numerical Functions
        6.4.6 Other Functions
    6.5 Data Accessors
    6.6 Sequence Expressions
    6.7 Arithmetic Expressions
    6.8 Comparison Expressions
    6.9 Logical Expressions
7 Extensions

Appendices

A References
B Element Syntax Summary
C STXPath Grammar
D Acknowledgments (Non-Normative)
E Draft Change History since WD 1 November (Non-Normative)


1 Introduction

This document defines the syntax and semantics of the STX transformation language. Transformation rules in STX are expressed as well-formed XML documents. These documents, called stylesheets, may include both elements that are defined by STX (STX declarations and instructions) and other elements (literals). STX-defined elements are identified by a specific XML namespace, which is referred to in this specification as the STX namespace. This document uses a prefix of 'stx' as a shortcut for referring to elements from the STX namespace.

An STX transformation describes rules for transforming one or more source event streams into one or more result event streams. The transformation has a streaming character; this means that it does not need to build a tree representing the source documents in memory. Result events are generated as soon as source events appear and are processed.

The transformation is achieved by associating events with templates. A template pattern is matched against events and their context. The best matching template is then instantiated to create a part of the result stream. A template is always instantiated with respect to a current context, a set of additional information maintained during the transformation. In constructing the result stream, events from the source stream can be filtered and arbitrary events can be added. Events can also be reordered using working storage.

On the surface, the syntax of STX is similar to the syntax of [XSLT]. STX also employs a compact expression language embedded in certain attributes. This expression language, called STXPath, is syntactically similar to [XPath]. This should allow XSLT users to easily adapt to STX syntax.

2 Concepts

The software responsible for running an STX transformation is referred to as an STX processor. An STX processor transforms one or more source XML documents according to rules given in an STX stylesheet and generates one or more result XML documents.

The source documents are supplied in the form of streams of [SAX2] events. These streams are referred to as the source streams. The stream whose events are currently processed is referred to as the current source stream. The current source stream at the time when the transformation is initiated is referred to as the principal source stream.

A possibly empty set of external values for stylesheet parameters is supplied. These values are available for use within expressions in the stylesheet.

No tree representation of the source document is constructed. However, when processing each event, a limited amount of contextual information is available from the system.

Data arriving with an event can form one or more objects called nodes. Pair events for the document and elements form one node only; all node data is passed with the starting event. The data of attributes passed with a startElement() event forms separate nodes.

Consecutive characters() and ignorableWhitespace() events are combined into a single text node.

The stylesheet is a well-formed XML document that may be precompiled to some kind of executable representation that can be reused to perform multiple transformations. The stylesheet can consist of several stylesheet modules contained in different files. One of these modules is the principal stylesheet module. The complete stylesheet is assembled by finding the stylesheet modules referenced directly or indirectly from the principal stylesheet module using the stx:include declaration.

The output of the transformation consists of one or more sequences of SAX2 events. These sequences of events are referred to as result streams. The stream to which events are currently emitted is referred to as the current result stream. The current result stream at the time when the transformation is initiated is referred to as the principal result stream.

Each incoming event can cause an invocation of one or more rules within the stylesheet by means of a match pattern. The actions such a rule may perform include emitting SAX2 events to result streams, saving working data to working storage, accessing data written to working storage by previously executed rules, and invoking other rules.

Note:

The source and result streams are abstract constructs that function as input or output channels for STX transformations. Each source or result stream is identified with a URI. This URI must not be confused with the URI of a physical document that may be parsed to generate the source stream or of a document to which the result stream may be serialized. Instead, the stream is associated with a resolver (typically, a SAX2 driver for source streams and a SAX2 handler for result streams) that maps the abstract stream to a particular physical resource.

2.1 Initiating a Transformation

This document does not specify interfaces for initiating an STX transformation. Instead, these interfaces are implementation-dependent. This section describes the minimum amount of information that must be supplied to execute a transformation:

  • An identification of the stylesheet module that is to act as the principal stylesheet module for the transformation.
  • A set of values for stylesheet parameters (name-value pairs). External parameter values are matched against global stylesheet parameters.
  • A set of stream definitions (URI-resolver pairs) that are to act as source and result streams.
  • An identification of the stream that is to act as the principal source stream.
  • An identification of the stream that is to act as the principal result stream.

Note:

Some portions of this information can be passed explicitly through an implementation interface while other portions can be built-in for particular implementations. For example, an implementation can have standard resolvers for certain URI schemes (file, http). Thus, streams identified with these URIs may not require explicit definitions.

2.2 Nodes

The data arriving with an event forms zero or more entities called nodes. Pair events refer to a single node whose data is passed with the starting event. The attribute data and namespace data arriving with a startElement() event form separate attribute and namespace nodes. Consecutive events of the same type (characters(), ignorableWhitespace()) are aggregated and treated as a single event, and thus form a single node only.

There are eight types of nodes recognized in STX:

  • root node - Passed with a startDocument() event; this node has no properties.

  • element node - Passed with a startElement() event. The node properties consist of the element related data (local name, prefix, qualified name, namespace URI).

  • attribute node - Passed with a startElement() event. The node properties consist of the data related to a particular attribute (local name, prefix, qualified name, namespace URI, value).

  • text node - Passed with a characters() or ignorableWhitespace() event. The node properties consist of character data.

  • CDATA node - Passed with a characters() or ignorableWhitespace() event within startCDATA() and endCDATA() lexical events. The node properties consist of character data.

  • processing instruction node - Passed with a processingInstruction() event. The node properties consist of target and character data.

  • comment node - Passed with a comment() event. The node properties consist of character data.

  • namespace node - Passed with a startElement() event. An element has a namespace node for each namespace prefix that is in scope for this element. The namespace node properties consist of prefix and NS URI.

2.3 Context

There is contextual information available at each point during processing. It includes the data arriving with the current event and other data related to the state of processing. The contextual information at any particular instant during processing is called the current context. The context information consists of the following parts:

  • current node data - The node which is the subject of the current event is called the current node. It is always given and there is no way to change the current node using stylesheet rules. The information available for the current node depends on the node type; see [SAX2] definition for details. For example, qualified name, local name, prefix, namespace URI and attributes (qualified name, local name, prefix, namespace URI, and value for each) are available for elements.

  • ancestor stack - For the current node, all ancestor nodes with all properties are stored in the ancestor stack.

  • next node data - The processing of the current node is delayed so that the next node data is available. The lookahead information can be used to access the first text child of an element, provided it is the very first child of the element.

  • position within siblings - Information about the position relative to other siblings is kept. The position is available for the current node and all its ancestors.

    A position number is available for all node kind tests such as node(), text(), cdata(), processing-instruction(), comment(). For elements, the position is available for all qualified names or names containing the * shortcut: pre:lname, lname, pre:*, *:lname, *. For processing instructions, the position is also available for each target. The position of attribute nodes is undefined.

2.4 Precedence Categories

Each incoming event can invoke a template within the stylesheet by means of precedence categories and a match pattern (see 2.5 Match Patterns). The template that is used to process the current node is called the current template. Templates can be separated into groups (see 3.3 Grouping of Templates). Top-level templates are considered to be members of the default group. The group containing the current template is referred to as the current group.

Templates are associated with the precedence categories according to their visibility from the current group or other explicitly specified group. The visibility is defined using the visibility attribute for each template (see 4.3 Templates).

There are two precedence categories (listed with decreasing precedence):

  1. templates from the same group and global or public templates (visibility='global'|'public') from child groups

  2. global templates (visibility='global')

The first precedence category is searched for the best matching template by means of a match pattern (see 2.5 Match Patterns). If there is no matching template in this precedence category, the second category is searched.
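The two categories can be illustrated with a small sketch. The element names (doc, table, cell, note) and the group name "tables" are hypothetical and chosen only for illustration.

```xml
<?xml version="1.0"?>
<stx:transform xmlns:stx="http://stx.sourceforge.net/2002/ns"
               version="1.0">

  <!-- Member of the default group -->
  <stx:template match="doc">
    <stx:process-children/>
  </stx:template>

  <stx:group name="tables">

    <!-- private (default): visible only inside the "tables" group -->
    <stx:template match="cell">
      <td><stx:process-children/></td>
    </stx:template>

    <!-- public: also visible from the parent (here, the default) group -->
    <stx:template match="table" visibility="public">
      <table><stx:process-children/></table>
    </stx:template>

    <!-- global: visible from any group -->
    <stx:template match="note" visibility="global">
      <p><stx:process-children/></p>
    </stx:template>

  </stx:group>

</stx:transform>
```

Under the rules above, while the default group is the current group, the first category would contain the doc template together with the public table and global note templates of the child group; the private cell template is not a candidate. Once the table template has been instantiated, "tables" is the current group and the cell template falls into the first category as well.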

2.5 Match Patterns

The match pattern specifies a set of conditions on the current context. If the current context satisfies the conditions, the current node matches the pattern; otherwise, the current node does not match the pattern. The syntax for patterns is a subset of the pattern syntax for XSLT (see [XSLT], 5.2). In particular, patterns take the form of location paths that meet certain restrictions.

Here are some examples of patterns:

  • item - matches any 'item' element from the namespace used for unprefixed STXPath path patterns (defined with 'default-stxpath-namespace' option, no namespace by default)

  • list/item - matches any 'item' element with a 'list' parent, where both elements are from the namespace used for unprefixed STXPath path patterns

  • chapter//list/item - matches any 'item' element with a 'list' parent and a 'chapter' ancestor, where all three elements are from the namespace used for unprefixed STXPath path patterns

  • /root/list/* - matches any element with a 'list' parent and a 'root' grandparent which is the document element, where both 'root' and 'list' elements are from the namespace used for unprefixed STXPath path patterns

  • pre:list[@id=5]/pre:item - matches any 'item' element with a 'list' parent having an 'id' attribute with a value of 5, where both elements are from the namespace which is bound to the 'pre' prefix in the stylesheet for this rule

  • *[position()=1] - matches any element that is the first element child of its parent

  • node() - matches any child node

  • text() - matches any text node (including CDATA text node)

  • cdata() - matches any CDATA text node

  • processing-instruction() - matches any processing instruction

A match pattern is a set of location path patterns separated with |. A location path pattern is a location path whose steps all use only the child, descendant, and attribute axes. Patterns may use the / operator as well as the // operator. Only abbreviated syntax is allowed. Up to one predicate is allowed in each step. Predicate expressions are STXPath expressions (see 6 Expressions).

Predicate expressions are evaluated using the current context. If the result is a number, the result will be converted to true if the number is equal to the context position and will be converted to false otherwise. Thus a location path p[3] is equivalent to p[position()=3]. Otherwise the result will be converted to a boolean using the type conversion rules described in 5.3 Type Conversions. If the result of evaluating and converting the predicate expression is false, the current template doesn't match the current node.

If there is no matching template available, a default rule is applied. One of three default rules, specified in the pass-through attribute of stx:options, can be used: 'none' (to skip the current node), 'all' (to pass through the current node), and 'text' (to pass through the current node only if it is a text node). The default rule can be set from the stylesheet (see 4.1 Transformation Options). This feature makes it easy to copy documents with only a few changes or to select just a few items from a document. The default behavior is to ignore all non-matching events (value 'none').

It is possible that the current context matches more than one rule within a precedence category. The template rule to be used is then determined according to the same rules as in XSLT (see [XSLT], 5.5). All rules have a computed priority value. The computed priority can be overridden with a 'priority' attribute value (see 4.3 Templates).

  1. If the pattern contains multiple alternatives separated with |, then it is treated equivalently to a set of template rules, one for each alternative.

  2. If the pattern has the form of a qualified name or has the form either of processing-instruction(target) or cdata(), then the priority is 0.

  3. If the pattern has the form pre:* or *:lname, then the priority is -0.25.

  4. If the pattern consists of just a node test other than cdata(), then the priority is -0.5.

  5. Otherwise, the priority is 0.5.

The rule with the highest priority is used. If there is more than one matching template rule with the highest priority, an STX processor must choose the rule that occurs last in the stylesheet.
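As a sketch of how the computed priorities interact, consider the following templates, all in the default group. The element names and the namespace bound to the 'pre' prefix are hypothetical.

```xml
<?xml version="1.0"?>
<stx:transform xmlns:stx="http://stx.sourceforge.net/2002/ns"
               xmlns:pre="http://example.org/pre"
               version="1.0">

  <!-- Just a node test: computed priority -0.5 -->
  <stx:template match="*">
    <other><stx:process-children/></other>
  </stx:template>

  <!-- Form pre:*: computed priority -0.25 -->
  <stx:template match="pre:*">
    <prefixed><stx:process-children/></prefixed>
  </stx:template>

  <!-- Qualified name: computed priority 0 -->
  <stx:template match="item">
    <plain-item><stx:process-children/></plain-item>
  </stx:template>

  <!-- Any other pattern: computed priority 0.5 -->
  <stx:template match="list/item">
    <list-item><stx:process-children/></list-item>
  </stx:template>

</stx:transform>
```

An 'item' element in no namespace whose parent is a 'list' element matches the first, third, and fourth templates; the fourth is chosen because 0.5 is the highest priority. An 'item' element with any other parent matches only the first and third templates, so the third is chosen.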

2.6 Errors

All errors that can occur during an STX transformation belong to one of the following categories:

  • warnings - The processor may issue a warning; the transformation must not be stopped.
  • recoverable errors - The processor may either issue an error and stop the transformation, or it can recover from the error in the way defined in this specification.
  • non-recoverable (fatal) errors - The processor must exit the transformation and issue an error message.

This specification doesn't define how to issue a warning or an error. Implementations are free to use either standard output or standard error output, or any convenient handler.

3 Stylesheet Structure

3.1 STX Namespace

The STX namespace has the URI http://stx.sourceforge.net/2002/ns.

3.2 Transform Element

stx:transform
<!-- Category: root -->
<stx:transform
  version = number>
  <!-- Content: top-level-elements -->
</stx:transform>

Stylesheets are required to use the root element stx:transform.

The version attribute contains a version number to distinguish language versions; this attribute is mandatory and its value must be '1.0' for this version of the language.

The stx:transform element can contain the following children from the STX namespace. These elements are called top-level elements:

  • stx:options
  • stx:include
  • stx:variable
  • stx:param
  • stx:buffer
  • stx:namespace-alias
  • stx:group
  • stx:template
  • stx:procedure

All top-level elements except stx:options may occur multiple times.

stx:options and stx:namespace-alias elements are allowed as top-level elements only.
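A minimal stylesheet skeleton consistent with the rules in this section might look as follows. The matched element name 'doc' and the literal element 'result' are hypothetical.

```xml
<?xml version="1.0"?>
<stx:transform xmlns:stx="http://stx.sourceforge.net/2002/ns"
               version="1.0">

  <!-- At most one stx:options element is allowed -->
  <stx:options pass-through="none"/>

  <!-- A top-level template; it belongs to the default group -->
  <stx:template match="doc">
    <result>
      <stx:process-children/>
    </result>
  </stx:template>

</stx:transform>
```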

3.3 Grouping of Templates

Templates can be organized into groups using the stx:group element. Groups of templates play a role in template matching (precedence categories are defined in terms of groups) and determine the scoping of variables.

Each stylesheet has a virtual default group (represented by stx:transform) that is considered to be the parent of top-level groups. Explicit groups are not mandatory; many transformations can be done without grouping templates. On the other hand, separating templates into groups makes it possible to define more precise transformation rules and to run complex transformations more safely, especially on well-known, regular input data.

stx:group
<!-- Category: top-level or group -->
<stx:group
  name = qname>
<!-- Content: group-elements -->
</stx:group>

This element must be a child of either the stx:transform or the stx:group element. The optional name attribute contains a qualified name that must be unique in the stylesheet. The name can be referenced by the group attribute of any of the stx:process-children, stx:process-attributes, stx:process-self, stx:process-siblings, stx:process-document, or stx:process-buffer instructions. In this case, the referenced group is used instead of the current group for matching. It is not possible to reference the default group.

It is a recoverable error if a stylesheet contains more than one group with the same name. The processor can recover from this error by choosing the group which is the last in the document order.

3.4 Stylesheet Inclusion

An STX stylesheet may include another STX stylesheet using the stx:include element.

stx:include
<!-- Category: top-level or group -->
<stx:include
  href = uri-reference/>

This declaration is used to insert additional stylesheet modules into the principal stylesheet module. Circular inclusion is prohibited.

This element must be top-level or a child of the stx:group element. stx:include is replaced with the content of the stx:transform element of the included stylesheet with three exceptions: stx:namespace-alias and stx:param (stylesheet parameters) in the included stylesheet are always inserted as top-level elements (even when including to a group) and stx:options of the included stylesheet is ignored. Top-level variables and top-level templates from the included stylesheet are treated as group variables and templates when including into a group. There is no difference between templates from the principal stylesheet and included templates in terms of matching precedence.
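For illustration, the sketch below includes one module at the top level and another into a group. The file names common.stx and tables.stx and the group name are hypothetical.

```xml
<?xml version="1.0"?>
<stx:transform xmlns:stx="http://stx.sourceforge.net/2002/ns"
               version="1.0">

  <!-- Top-level inclusion: the included templates join the default group -->
  <stx:include href="common.stx"/>

  <stx:group name="tables">
    <!-- Inclusion into a group: top-level variables and templates of
         tables.stx are treated as members of the "tables" group -->
    <stx:include href="tables.stx"/>
  </stx:group>

</stx:transform>
```

In both cases any stx:options element in the included module is ignored, and its stx:param and stx:namespace-alias elements are inserted at the top level.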

4 Generating Output

STX templates are called sequentially rather than from other templates. Pair events for the document and elements match only one template, which is broken into two parts; the first part is executed when the start event appears and the second one at the end event. The two parts are separated by the stx:process-children element.

4.1 Transformation Options

stx:options
<!-- Category: top-level -->
<stx:options
  pass-through = "none"|"all"|"text"
  recognize-cdata = "yes"|"no"
  default-stxpath-namespace = uri-reference
  strip-space = "yes"|"no"
  output-encoding = string/>

Global properties of a transformation can be specified using the stx:options element.

  • pass-through - This optional attribute specifies the default rule for events for which no matching template is found. These events are either ignored ("none", default) or passed to the output without modification ("all"). For "text", only text nodes are passed through to the output.

  • recognize-cdata - This optional attribute specifies whether CDATA boundaries are recognized during the transformation. If so, every CDATA section forms a single node and the node kind test cdata() can be used in STXPath patterns. Otherwise (recognize-cdata="no"), CDATA boundaries are ignored and all consecutive character data forms a single text node; thus the cdata() kind test never matches in STXPath patterns. The default value is "yes".

  • default-stxpath-namespace - This optional attribute specifies a namespace used for unprefixed STXPath paths and patterns. No namespace is used by default.

  • strip-space - This optional attribute specifies whether whitespace text nodes are stripped from the input data stream. Whitespace text nodes are text nodes containing nothing but the following characters: #x20, #x9, #xD or #xA. The default value is "no".

  • output-encoding - This optional attribute specifies the preferred output encoding of the resulting byte stream. The value of this attribute should be treated case-insensitively; the value must contain only printable ASCII characters (#x21 - #x7E); the value must be a charset registered with the Internet Assigned Numbers Authority [IANA].

    If the attribute is not present, the output encoding is UTF-8. A compliant STX processor is not required to support any particular encoding other than UTF-8.
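The following sketch shows how these options might be combined to copy a document while dropping one kind of element. The element name 'secret' is hypothetical.

```xml
<?xml version="1.0"?>
<stx:transform xmlns:stx="http://stx.sourceforge.net/2002/ns"
               version="1.0">

  <!-- Pass unmatched nodes through unchanged, strip whitespace-only
       text nodes, and request UTF-8 output -->
  <stx:options pass-through="all"
               strip-space="yes"
               output-encoding="UTF-8"/>

  <!-- An empty template: 'secret' elements produce no output, and
       because there is no stx:process-children their content is skipped -->
  <stx:template match="secret"/>

</stx:transform>
```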

4.2 Namespace Aliasing

stx:namespace-alias
<!-- Category: top-level -->
<stx:namespace-alias
  source-prefix = ncname|"#default"
  result-prefix = ncname|"#default"/>

Namespaces from the input stream can be mapped to other namespaces in the result stream using the stx:namespace-alias element. Both attributes are mandatory and can contain either a prefix bound to the namespace to be used or the "#default" keyword for the default namespace.
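A declaration might be written as in the sketch below. The prefixes 'old' and 'new' and their namespace URIs are hypothetical; both prefixes are bound in the stylesheet so that they can be used in the attributes.

```xml
<?xml version="1.0"?>
<stx:transform xmlns:stx="http://stx.sourceforge.net/2002/ns"
               xmlns:old="http://example.org/old"
               xmlns:new="http://example.org/new"
               version="1.0">

  <!-- Both attributes are mandatory -->
  <stx:namespace-alias source-prefix="old" result-prefix="new"/>

  <!-- "#default" stands for the default namespace -->
  <stx:namespace-alias source-prefix="#default" result-prefix="new"/>

</stx:transform>
```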

4.3 Templates

stx:template
<!-- Category: top-level or group -->
<stx:template
  match = pattern
  priority = number
  visibility = "private"|"public"|"global"
  new-scope = "yes"|"no">
<!-- Content: template -->
</stx:template>

Rules to process input events are written in templates. The stx:template element must be a child of either the stx:transform or the stx:group element. Templates are matched to events by means of precedence categories and the pattern in the mandatory match attribute. The optional priority attribute can contain a priority value used for matching (see 2.5 Match Patterns).

The optional visibility attribute specifies whether the template is visible from other groups (and thus can match to the next event). Private templates are visible in their group only, public templates are visible from parent groups, and global templates are visible from any group. The default value is "private".

The optional new-scope attribute specifies whether the template creates new instances of group variables. The default value is "no". A new set of group variables is created for each instantiated template with new-scope="yes". These variables shadow their former values and exist as long as the template is being processed.

The content of templates may include both STX instructions and literal elements. Literal elements are simply copied to the output.

A text template is defined as the content of some elements (stx:attribute, stx:variable, stx:param, stx:assign, stx:with-param, stx:cdata, stx:processing-instruction, stx:comment, stx:message). This is a part of a template that generates nothing but character events to the current output stream. An STX processor is required to issue a run-time recoverable error if another type of event is emitted. The processor is allowed to recover from this error by ignoring the non-character event.

4.4 Procedures

stx:procedure
<!-- Category: top-level or group -->
<stx:procedure
  visibility = "private"|"public"|"global"
  new-scope = "yes"|"no"
  name = qname>
<!-- Content: template -->
</stx:procedure>

Procedures are sub-templates that can be called by name (with the stx:call-procedure element). The optional visibility and new-scope attributes have the same meaning as for templates. Only visible procedures can be called by name; new-scope must be set to "yes" to create new copies of group variables. It is a static non-recoverable error if a stylesheet contains more than one visible procedure with the same name.

The content of procedures may be the same as the content of templates.

stx:call-procedure
<!-- Category: template -->
<stx:call-procedure
  name = qname
  group = qname>
<!-- Content: stx:with-param* -->
</stx:call-procedure>

The stx:call-procedure element makes it possible to invoke procedures by their names. The name attribute is mandatory. The optional group attribute makes it possible to use the specified group instead of the current group to call a procedure from.
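A sketch of a procedure and a call to it follows. The procedure name 'separator' and the element names are hypothetical.

```xml
<?xml version="1.0"?>
<stx:transform xmlns:stx="http://stx.sourceforge.net/2002/ns"
               version="1.0">

  <!-- A procedure in the default group; its content is a template -->
  <stx:procedure name="separator">
    <hr/>
  </stx:procedure>

  <!-- The template is in the same group, so the private procedure
       is visible and can be called by name -->
  <stx:template match="section">
    <stx:call-procedure name="separator"/>
    <div>
      <stx:process-children/>
    </div>
  </stx:template>

</stx:transform>
```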

4.5 Parameters

Values can be passed to stylesheets or to their templates and procedures as parameters. Stylesheet parameters behave in the same way as variables of the default group. Template/procedure parameters behave in the same way as local variables; thus they are only visible within the template or procedure they are passed to. There are two elements available to work with parameters:

stx:with-param
<!-- Category: process-xxx, call-procedure -->
<stx:with-param
  name = qname
  select = expression>
<!-- Content: text template -->
</stx:with-param>

Parameters are passed to templates or procedures using the stx:with-param element. The required name attribute specifies the name of the parameter. The value of the parameter is either the result returned by an expression located in the optional select attribute or the content of this element. The stx:with-param instruction is allowed as a child of the elements stx:process-children, stx:process-attributes, stx:process-self, stx:process-siblings, stx:process-document, stx:process-buffer, or stx:call-procedure, and must not have any of these elements in its content.

stx:param
<!-- Category: top-level or template -->
<stx:param
  name = qname
  select = expression
  required = "yes" | "no">
<!-- Content: text template -->
</stx:param>

The stx:param element is allowed as a top-level element (indicating a stylesheet parameter as a child of stx:transform) and in templates or procedures (as a child of stx:template or stx:procedure). The required name attribute specifies the name of the parameter. The optional select attribute or the content of this element specifies a default value, which is used when there is no value specified using the select attribute or the content of the appropriate stx:with-param element.

Stylesheet parameters are statically initialized while parsing the stylesheet; only the static context information is available during the initialization. Template/procedure parameters are initialized at run-time. Since there is no current source stream available during the static initialization, it is a recoverable error if a stylesheet (top-level) parameter has an stx:process-children, stx:process-attributes, stx:process-self, or stx:process-siblings instruction in its content. A processor may recover from this error by ignoring such an instruction.

The optional required attribute may be used to indicate that a parameter is mandatory. The default value is "no", indicating that the parameter is optional. If the value of the required attribute is "yes", the stx:param element must be empty, and must have no select attribute. It is a dynamic non-recoverable error if the caller doesn't supply a value with stx:with-param for a required parameter.
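The sketch below declares a stylesheet parameter and two procedure parameters and passes a value with stx:with-param. All parameter, procedure, and element names are hypothetical, and the literals in the select attributes assume that STXPath string and number literals are written as in XPath.

```xml
<?xml version="1.0"?>
<stx:transform xmlns:stx="http://stx.sourceforge.net/2002/ns"
               version="1.0">

  <!-- Stylesheet parameter with a default value given by select -->
  <stx:param name="title" select="'Untitled'"/>

  <stx:procedure name="heading">
    <!-- Required: must be empty and must have no select attribute -->
    <stx:param name="level" required="yes"/>
    <!-- Optional: the default comes from the element content -->
    <stx:param name="label">Section</stx:param>
    <!-- 'level' and 'label' behave as local variables from here on -->
    <heading/>
  </stx:procedure>

  <stx:template match="section">
    <stx:call-procedure name="heading">
      <!-- Supplies 'level'; 'label' falls back to its default -->
      <stx:with-param name="level" select="2"/>
    </stx:call-procedure>
    <stx:process-children/>
  </stx:template>

</stx:transform>
```

Omitting the stx:with-param for 'level' in this sketch would be a dynamic non-recoverable error, since that parameter is declared with required="yes".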

4.6 Copying the Current Node

stx:copy
<!-- Category: template -->
<stx:copy
  attributes = pattern>
<!-- Content: template -->
</stx:copy>

The stx:copy element is used to copy the current node to the output. The optional attributes attribute contains a pattern. The attributes of the current node that match the pattern are copied to the output. If the attributes attribute isn't present no attributes are copied with the current node.

Thus, attributes="@*" copies all attributes, attributes="@foo|@bar" copies the foo and bar attributes only, attributes="@*[not(name()='foo')]" copies all but the foo attribute, and attributes="@*[false()]" doesn't copy any attributes, just as if the attributes attribute were absent.

If the stx:copy instruction applies to a node other than an element, the attributes attribute is ignored.
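As an illustration, the following sketch copies 'para' elements with a restricted set of attributes. The element and attribute names are hypothetical.

```xml
<?xml version="1.0"?>
<stx:transform xmlns:stx="http://stx.sourceforge.net/2002/ns"
               version="1.0">

  <!-- Text nodes without a matching template are passed through -->
  <stx:options pass-through="text"/>

  <!-- Copy each 'para' element, keeping only its id and class attributes -->
  <stx:template match="para">
    <stx:copy attributes="@id|@class">
      <stx:process-children/>
    </stx:copy>
  </stx:template>

</stx:transform>
```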

4.7 Processing Nested Events

stx:process-children
<!-- Category: template -->
<stx:process-children
  group = qname>
<!-- Content: stx:with-param* -->
</stx:process-children>

The instruction stx:process-children suspends the processing of the current template by processing the children of the current node. Using SAX2 terms: this instruction splits a template into two parts such that a SAX2 startElement event causes the execution of the first part and the corresponding SAX2 endElement event causes the execution of the second part.

At most one stx:process-children instruction may be executed during the processing of a template. Moreover, it is a non-recoverable error if stx:process-children is encountered after an stx:process-self or an stx:process-siblings instruction.

Note:

If a template doesn't contain any stx:process-children instruction, the children of the matched element will be skipped. The default rule (<stx:options pass-through = "none"|"all"|"text">) applies only to nodes that are processed and for which no matching template has been found.

Note:

If the current node is neither an element node nor the document root then stx:process-children simply does nothing.

The optional group attribute makes it possible to use the specified group instead of the current group as the base for matching (see 2.4 Precedence Categories). It is a recoverable error if the group of the specified name is not available. An STX processor can recover from this error by using the current group.
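The split into two parts can be sketched as follows. The element names section and div are hypothetical, and the stx:transform wrapper (version attribute and namespace URI) is assumed from section 3.

```xml
<stx:transform xmlns:stx="http://stx.sourceforge.net/2002/ns" version="1.0">
  <stx:template match="section">
    <!-- First part: executed on the startElement event of "section";
         the start tag of the literal div element is emitted here -->
    <div>
      <stx:process-children/>
      <!-- Second part: executed on the corresponding endElement event;
           the end tag of div is emitted here -->
    </div>
  </stx:template>
</stx:transform>
```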

4.8 Processing Attributes

stx:process-attributes
<!-- Category: template -->
<stx:process-attributes
  group = qname>
<!-- Content: stx:with-param* -->
</stx:process-attributes>

This instruction is used to apply templates to the attributes of an element node.

The optional group attribute makes it possible to use the specified group instead of the current group as the base for matching (see 2.4 Precedence Categories). It is a recoverable error if the group of the specified name is not available. An STX processor can recover from this error by using the current group.
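A minimal sketch of attribute processing, assuming a hypothetical record element whose attributes should become child elements. The stx:transform wrapper (version attribute and namespace URI) is assumed from section 3.

```xml
<stx:transform xmlns:stx="http://stx.sourceforge.net/2002/ns" version="1.0">
  <stx:template match="record">
    <record>
      <!-- Apply templates to each attribute of the current element -->
      <stx:process-attributes/>
      <stx:process-children/>
    </record>
  </stx:template>

  <!-- Invoked once per attribute by stx:process-attributes above -->
  <stx:template match="@*">
    <stx:element name="{name(.)}">
      <stx:value-of select="."/>
    </stx:element>
  </stx:template>
</stx:transform>
```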

4.9 Processing Siblings

stx:process-siblings
<!-- Category: template -->
<stx:process-siblings
  while = pattern
  until = pattern
  group = qname>
<!-- Content: stx:with-param* -->
</stx:process-siblings>

The instruction stx:process-siblings suspends the processing of the current template and processes the following siblings of the context node.

Note:

If the context node is an attribute node or the document root stx:process-siblings does nothing.

The optional while attribute takes a pattern and causes the processing of the siblings as long as they match the specified pattern. The first non-matching node will stop this stx:process-siblings. The while attribute defaults to node().

The optional until attribute takes a pattern and causes the processing of all following siblings until a node matching the pattern is encountered. This node won't be processed by this stx:process-siblings. The until attribute defaults to node()[false()].

If both while and until attributes have been specified then both conditions have to be met. For example <stx:process-siblings while="foo" until="foo"/> doesn't process any siblings. Variable bindings used within the patterns will be interpreted with regard to the current context. That means changed group variables affect the evaluation, whereas new instances of group variables or local variables are not visible.

Note:

Whitespace text nodes not stripped from the document must be considered in the patterns, particularly when using the while attribute. A typical attribute specification would be while="foo | text()" which processes all following foo elements and potential text nodes between these foo elements.

The optional group attribute makes it possible to use the specified group instead of the current group as the base for matching (see 2.4 Precedence Categories). It is a recoverable error if the group of the specified name is not available. An STX processor can recover from this error by using the current group.

An stx:process-siblings instruction encountered during the processing of the siblings of a node does not affect the while and until conditions of the previous stx:process-siblings. In other words: nested stx:process-siblings instructions process at most the siblings chosen in the preceding stx:process-siblings. That means stx:process-siblings also returns if there are no more siblings in the input available or a preceding stx:process-siblings terminates.

Though multiple stx:process-siblings instructions may appear within the same template, it is a non-recoverable error if an stx:process-children or stx:process-self instruction is encountered after stx:process-siblings.
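The following sketch shows one possible use of the until attribute: wrapping each heading and the siblings that follow it into a section. The element names h1, p, section, title and para are hypothetical, the use of "." to obtain the string-value of the current element relies on the rule in 5.1 Atomic Types, and the stx:transform wrapper (version attribute and namespace URI) is assumed from section 3.

```xml
<stx:transform xmlns:stx="http://stx.sourceforge.net/2002/ns" version="1.0">
  <stx:template match="h1">
    <section>
      <title><stx:value-of select="."/></title>
      <!-- Process the following siblings, stopping before the next h1.
           That h1 is not consumed here; it is processed afterwards in the
           normal way and starts a section of its own. -->
      <stx:process-siblings until="h1"/>
    </section>
  </stx:template>

  <stx:template match="p">
    <para><stx:value-of select="."/></para>
  </stx:template>
</stx:transform>
```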

4.10 Running Overridden Templates

stx:process-self
<!-- Category: template -->
<stx:process-self
  group = qname>
<!-- Content: stx:with-param* -->
</stx:process-self>

This instruction is used to process the current node using the template that would have been chosen if the current template weren't present in the stylesheet. At most one stx:process-self instruction may be executed during the processing of a template. Moreover, it is a non-recoverable error if an stx:process-self instruction is encountered after an stx:process-children or an stx:process-siblings instruction in a template.

The optional group attribute makes it possible to use the specified group instead of the current group as the base for matching (see 2.4 Precedence Categories). It is a recoverable error if the group of the specified name is not available. An STX processor can recover from this error by using the current group.

Note:

Specifying a different group results in choosing the best matching template in this group, whereas specifying the same group chooses the next best matching template. The latter is also different from specifying no group attribute at all in case the base group for matching is the parent group of the current group.

4.11 Processing Text

stx:replace
<!-- Category: template -->
<stx:replace
  select = expression>
  <!-- Content: stx:pattern+ -->
</stx:replace>
stx:pattern
<stx:pattern
  value = expression
  case = "sensitive"|"insensitive">
  <!-- Content: template -->
</stx:pattern>

This instruction processes a string in much the same way as stx:template processes nodes. The mandatory select attribute of stx:replace selects the string to process by evaluating the expression and converting the result to a string. The mandatory value attribute of stx:pattern supplies a regular expression, obtained by evaluating the expression in the value attribute and converting the result to a string; this regular expression describes a substring to look for. The optional case attribute determines whether the regular expression is case-sensitive (value "sensitive") or not (value "insensitive"). The default is "sensitive".

The stx:replace instruction scans the string selected by the select attribute and, among the patterns given by the value attributes of all its stx:pattern elements, picks the one that matches first, that is, at the earliest position in the string. The substring before the matched substring is output, and the matched substring itself is replaced by the contents of the stx:pattern element. Afterwards the stx:replace instruction continues by processing the substring after the matched substring. If no pattern matches, the remaining string is emitted as a text node to the result stream. A pattern must match at least one character.

If two or more patterns match at the same position, the pattern that matches the longest character sequence is used. If two or more patterns still meet this condition, the first one is used.
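A small sketch of text replacement. The patterns are plain alphabetic strings, chosen because this section does not define the regular expression syntax; the element name acronym is hypothetical, and the stx:transform wrapper (version attribute and namespace URI) is assumed from section 3.

```xml
<stx:transform xmlns:stx="http://stx.sourceforge.net/2002/ns" version="1.0">
  <stx:template match="text()">
    <!-- "." selects the current text node, converted to a string -->
    <stx:replace select=".">
      <!-- The value attribute is an expression, hence the inner quotes -->
      <stx:pattern value="'colour'" case="insensitive">color</stx:pattern>
      <!-- The replacement may contain markup as well as text -->
      <stx:pattern value="'STX'"><acronym>STX</acronym></stx:pattern>
    </stx:replace>
  </stx:template>
</stx:transform>
```

Text between the matches, and any remainder after the last match, is passed to the result stream unchanged.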

4.12 Outputting Strings

stx:value-of
<!-- Category: template -->
<stx:value-of
  select = expression/>

This instruction emits characters to the result stream. The mandatory select attribute contains an STXPath expression which is evaluated and converted to a string. This element is always empty.

stx:text
<!-- Category: template -->
<stx:text
  markup = "error"|"ignore"|"serialize">
<!-- Content: template -->
</stx:text>

This instruction emits literal character data to the result stream. The content is neither normalized nor stripped should it contain whitespace characters only.

The optional markup attribute determines how non-text nodes in the content of stx:text are handled: "error" causes the processor to raise a run-time recoverable error for such nodes, "ignore" ignores any markup by emitting only the string value of the contents to the result stream, and "serialize" emits any markup serialized as text. The default value is "error". The processor may recover from the error raised under markup="error" by ignoring the attempt to output such a node.

Note:

The string created by markup="serialize" may vary in different STX implementations, because some of the lexical representation is not relevant for the information coded in XML. For example every STX implementation may choose its own order for serializing attributes.

stx:cdata
<!-- Category: template -->
<stx:cdata>
<!-- Content: text template -->
</stx:cdata>

This instruction emits literal data as a CDATA section to the result stream. The content is neither normalized nor stripped should it contain whitespace characters only.

4.13 Outputting Elements and Attributes

stx:element
<!-- Category: template -->
<stx:element
  name = {qname}
  namespace = {uri-reference}>
<!-- Content: template -->
</stx:element>

This instruction is used to generate an element. It has the same meaning as in [XSLT].

stx:start-element
<!-- Category: template -->
<stx:start-element
  name = {qname}
  namespace = {uri-reference}/>
stx:end-element
<!-- Category: template -->
<stx:end-element
  name = {qname}
  namespace = {uri-reference}/>

There are separate instructions available to output an element start tag and an element end tag. The name attribute is required for both instructions. Both elements must be empty.

A compliant STX processor is required to produce well-formed XML output. An attempt to create an end-tag without a matching start-tag must be reported as a non-recoverable error by the STX processor.
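Separate start and end instructions are useful when the element boundaries in the output do not coincide with template boundaries. The sketch below, with hypothetical element names book, page and page-break, closes the current page and opens a new one at every page-break marker; the stx:transform wrapper (version attribute and namespace URI) is assumed from section 3.

```xml
<stx:transform xmlns:stx="http://stx.sourceforge.net/2002/ns" version="1.0">
  <stx:template match="book">
    <!-- Open the first page before any child is processed -->
    <stx:start-element name="page"/>
    <stx:process-children/>
    <!-- Close the last page, keeping the output well-formed -->
    <stx:end-element name="page"/>
  </stx:template>

  <stx:template match="page-break">
    <!-- End the page opened earlier, then start the next one -->
    <stx:end-element name="page"/>
    <stx:start-element name="page"/>
  </stx:template>
</stx:transform>
```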

stx:attribute
<!-- Category: template -->
<stx:attribute
  name = {qname}
  namespace = {uri-reference}
  select = expression>
<!-- Content: text template -->
</stx:attribute>

This instruction is used to generate an attribute. It has the same meaning as in [XSLT]. Alternatively, the value of the generated attribute may be specified in the optional select attribute. It is a recoverable error if this instruction has a select attribute and is not empty. A processor can recover from this error by ignoring the content of stx:attribute.

stx:attribute must follow an element-starting instruction (stx:element, stx:start-element, stx:copy, or a literal element) and no other output-generating instructions are allowed between the element-starting instruction and stx:attribute. It is a recoverable error if there is no immediate element-starting instruction before. A processor can recover from this error by ignoring the stx:attribute instruction.
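For example, an attribute can be added directly after the start of a copied element. The names img, src and alt and the attribute value are hypothetical; the stx:transform wrapper (version attribute and namespace URI) is assumed from section 3.

```xml
<stx:transform xmlns:stx="http://stx.sourceforge.net/2002/ns" version="1.0">
  <stx:template match="img">
    <stx:copy attributes="@src">
      <!-- Must come before any other output-generating instruction;
           select is an expression, hence the quoted string literal -->
      <stx:attribute name="alt" select="'(image)'"/>
    </stx:copy>
  </stx:template>
</stx:transform>
```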

4.14 Outputting Other Nodes

stx:processing-instruction
<!-- Category: template -->
<stx:processing-instruction
  name = {ncname}>
<!-- Content: text template -->
</stx:processing-instruction>

This instruction is used to generate a processing instruction. It has the same meaning as in [XSLT].

stx:comment
<!-- Category: template -->
<stx:comment>
<!-- Content: text template -->
</stx:comment>

This instruction is used to generate a comment. It has the same meaning as in [XSLT].

4.15 Conditions

stx:if
<!-- Category: template -->
<stx:if
  test = expression>
<!-- Content: template -->
</stx:if>

The mandatory test attribute contains an STXPath expression evaluating to boolean. The content template is instantiated if and only if the test attribute has evaluated to true.

stx:else
<!-- Category: template -->
<stx:else>
<!-- Content: template -->
</stx:else>

This instruction must follow immediately after stx:if; a non-recoverable error must be reported otherwise. The content template is instantiated if and only if the test attribute of the preceding stx:if instruction has evaluated to false.
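A sketch of an stx:if and stx:else pair. The names price, currency, euro and other are hypothetical, the = comparison is assumed to be available in STXPath as in XPath, and the stx:transform wrapper (version attribute and namespace URI) is assumed from section 3.

```xml
<stx:transform xmlns:stx="http://stx.sourceforge.net/2002/ns" version="1.0">
  <stx:template match="price">
    <stx:if test="@currency = 'EUR'">
      <euro><stx:value-of select="."/></euro>
    </stx:if>
    <!-- stx:else must be the instruction immediately after stx:if -->
    <stx:else>
      <other><stx:value-of select="."/></other>
    </stx:else>
  </stx:template>
</stx:transform>
```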

stx:choose
<!-- Category: template -->
<stx:choose>
  <stx:when
    test = expression>
  <!-- Content: template -->
  </stx:when>+
  <stx:otherwise>
  <!-- Content: template -->
  </stx:otherwise>?
</stx:choose>

This instruction has the same meaning as in [XSLT].

4.16 Loops

stx:for-each
<!-- Category: template -->
<stx:for-each
  select = expression>
<!-- Content: template -->
</stx:for-each>

The stx:for-each instruction contains a template that is instantiated for each item of the sequence specified in the select attribute.

4.17 Multiple Input Documents

stx:process-document
<!-- Category: template -->
<stx:process-document
  href = expression
  base = {uri-reference}|"#input"|"#stylesheet"
  group = qname>
<!-- Content: stx:with-param* -->
</stx:process-document>

A stylesheet can process further source streams in addition to the one supplied when the transformation is invoked (the principal source stream). The current source stream can be changed with the stx:process-document instruction. When this instruction is instantiated, the expression in the mandatory href attribute is evaluated, each item in the resulting sequence is converted in turn to a string (a URI), and its value is used to identify and process a new current source stream. Then the execution of the template containing the stx:process-document instruction continues with the original source stream.

If a URI is a relative URI, the base URI is derived from the type of the item in the sequence that represents this URI. If this item is a node, its base URI is used; otherwise the base URI of the stylesheet is used. Alternatively, the optional base attribute can be used to specify explicitly which base URI should be used. Its value must be either an absolute URI, the string "#input", in which case the base URI of the current input stream is used, or the string "#stylesheet", in which case the base URI of the principal stylesheet is used.

The optional group attribute makes it possible to use the specified group instead of the current group as the base for matching (see 2.4 Precedence Categories). It is a recoverable error if the group of the specified name is not available. An STX processor can recover from this error by using the current group.

Note:

When processing a new document, the ancestor stack of the original document is not available for matching and navigation. Each new document has an ancestor stack of its own.
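A minimal inclusion sketch, assuming a hypothetical include element whose href attribute names the document to be pulled in. The stx:transform wrapper (version attribute and namespace URI) is assumed from section 3.

```xml
<stx:transform xmlns:stx="http://stx.sourceforge.net/2002/ns" version="1.0">
  <stx:template match="include">
    <!-- The attribute node is converted to a string and used as a URI.
         base="#input" resolves a relative URI against the base URI of
         the current input stream. -->
    <stx:process-document href="@href" base="#input"/>
  </stx:template>
</stx:transform>
```

The events of the included document are matched against the templates of the current group, and processing of the including document resumes once the included one has ended.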

4.18 Multiple Output Documents

stx:result-document
<!-- Category: template -->
<stx:result-document
  href = expression>
<!-- Content: template -->
</stx:result-document>

A stylesheet can produce further result streams in addition to the principal result stream. The current result stream can be changed with the stx:result-document instruction. Events generated by executing the instructions contained within the stx:result-document element are emitted to a new current result stream, identified by the URI obtained by evaluating the expression in the href attribute and converting its value to a string. Afterwards, the instructions following the end of the stx:result-document element again emit events into the original result stream.
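As a sketch, each chapter of a document could be written to a result stream of its own. The element name chapter and its file attribute are hypothetical; the stx:transform wrapper (version attribute and namespace URI) is assumed from section 3.

```xml
<stx:transform xmlns:stx="http://stx.sourceforge.net/2002/ns" version="1.0">
  <stx:template match="chapter">
    <!-- The value of the file attribute identifies the new result stream -->
    <stx:result-document href="@file">
      <stx:copy attributes="@*">
        <stx:process-children/>
      </stx:copy>
    </stx:result-document>
    <!-- Output generated from here on goes to the original result stream -->
  </stx:template>
</stx:transform>
```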

4.19 Buffers

A sequence of events can be stored in an object called a buffer. The stored events can be emitted and processed later, in the same way as events emitted from a source stream. The events are emitted from a buffer in the same order as they were stored. In other words, buffers are temporary stores of the 'first in, first out' type. The events stored in a buffer must represent a well-formed external general parsed entity (the restriction on a single root node is relaxed).

A buffer must be declared before it can be used. The same rules as for group variables (see 6.1 Variables) apply to the visibility of buffers, their shadowing, and the creation of new instances for new-scope templates (see 4.3 Templates).

stx:buffer
<!-- Category: top-level or group -->
<stx:buffer
  name = qname/>

The stx:buffer declaration must be either a top-level element or a child of the stx:group element. The mandatory name attribute contains a qualified name identifying the declared buffer.

stx:result-buffer
<!-- Category: template -->
<stx:result-buffer
  name = qname
  clear = "yes"|"no">
<!-- Content: template -->
</stx:result-buffer>

The stx:result-buffer instruction directs events emitted by its content into the buffer specified with the mandatory name attribute rather than to the current result stream. The buffer must be declared with stx:buffer before it can be employed in stx:result-buffer.

If the buffer specified with the name attribute already contains a sequence of events, the new sequence of events is by default appended after the last event of the previously stored sequence. If the stx:result-buffer element has the optional clear attribute with the value "yes", the previously stored events are removed from the buffer before the new sequence of events is stored. The clear attribute defaults to "no".

Note:

To clear a buffer without storing a new sequence of events, use the stx:result-buffer instruction with no content: <stx:result-buffer name="my-buffer" clear="yes"/>

stx:process-buffer
<!-- Category: template -->
<stx:process-buffer
  name = qname
  group = qname>
<!-- Content: stx:with-param* -->
</stx:process-buffer>

The stx:process-buffer instruction emits events stored in the buffer specified by the mandatory name attribute. The events are processed in the same way as events supplied by source streams. When the very last event from the buffer is processed, the processing in the current template continues with an instruction, declaration or literal next to the stx:process-buffer instruction.

The optional group attribute makes it possible to use the specified group instead of the current group as the base for matching (see 2.4 Precedence Categories). It is a recoverable error if the group of the specified name is not available. An STX processor can recover from this error by using the current group.

The processing of events from a buffer doesn't mean the emptying of this buffer. Once a sequence of events is stored in the buffer, it can be processed repeatedly.

Note:

A buffer is not treated as a new document, but rather as if events emitted from the buffer originate from the current source stream. The ancestor stack of the current source stream remains available for matching and navigation when processing nodes from the buffer.
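The three buffer elements can be combined as in the following sketch, which collects notes while a report is read and emits them in an appendix at the end. All element names and the group name flush are hypothetical; the name attribute of stx:group and the stx:transform wrapper (version attribute and namespace URI) are assumptions based on section 3.

```xml
<stx:transform xmlns:stx="http://stx.sourceforge.net/2002/ns" version="1.0">
  <!-- Declaration: required before the buffer can be used -->
  <stx:buffer name="notes"/>

  <stx:template match="note">
    <!-- Append one note element to the buffer instead of the result stream -->
    <stx:result-buffer name="notes">
      <note><stx:value-of select="."/></note>
    </stx:result-buffer>
  </stx:template>

  <stx:template match="report">
    <report>
      <stx:process-children/>
      <appendix>
        <!-- Replay the stored events; a separate group keeps the template
             above from storing them into the buffer again -->
        <stx:process-buffer name="notes" group="flush"/>
      </appendix>
    </report>
  </stx:template>

  <stx:group name="flush">
    <stx:template match="note">
      <li><stx:value-of select="."/></li>
    </stx:template>
  </stx:group>
</stx:transform>
```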

4.20 Messages

stx:message
<!-- Category: template -->
<stx:message>
<!-- Content: text template -->
</stx:message>

The stx:message instruction generates a separate result stream whose handling is implementation dependent. It can be directed to a log, to a special message resolver, etc. However, all instructions in the content of the stx:message element must be processed even if the message stream is ignored.

5 Data Types

5.1 Atomic Types

There are four atomic data types in STX:

  • string

  • number

  • boolean

  • node

There are eight types of node recognized in STXPath (see 2.2 Nodes). For every type of node, there is a way of determining a string-value. Since descendants are not available at the time of processing, string-values for some types of nodes are different from XPath string-values.

  • root nodes - there is no string value defined for root nodes, a recoverable error is reported. An STX processor is allowed to recover from this error by returning the empty string.

  • element nodes - if the very first child of an element happens to be a text node, the string-value of the element is the string-value of this text node. Otherwise, the string-value of the element is the empty string.

  • attribute nodes - the string-value of an attribute is the normalized value of this attribute

  • text nodes - the string-value of a text node is the character data of this node

  • cdata nodes - the string-value of a cdata node is the character data of this node

  • processing instruction nodes - the string-value of a processing instruction node is the part of the processing instruction following the target and any whitespace not including the terminating ?>

  • comment nodes - the string-value of a comment is the content of this comment not including the opening <!-- or the closing -->

  • namespace nodes - the string-value of a namespace node is the namespace URI

5.2 Sequences

STXPath expressions (see 6 Expressions) always return a sequence. A sequence is an ordered collection of zero or more items. Unlike common lists, sequences are "flat"; sequences may not contain other sequences. Sequences may contain duplicate items. An item must be of one of the atomic types: string, number, boolean, or node.

A sequence with zero items is called an empty sequence. A sequence with exactly one item is called a singleton sequence. There is no distinction between an item and a singleton sequence containing this item; an item is equivalent to a singleton sequence containing this item and vice versa. A sequence has no identity. Equality comparison of sequences is performed only by comparing items of the sequences.

5.3 Type Conversions

Certain operators, functions, and syntactic constructs expect a value of a particular type to be supplied: this type is referred to as a required type. In such an event, a general sequence is converted to the required type according to the conversion rules.

The empty sequence is converted to required types as defined in the following table:

required type    empty sequence
boolean          false
string           empty string
number           NaN
node             FATAL ERROR

A singleton sequence is converted to a required type according to the type of the only item in the sequence:

The conversions are listed below by the type of the item; each entry gives the result for one required type. A dash (-) means that the item already has the required type.

  • boolean item

      required type boolean: -

      required type string: false is converted to 'false', true is converted to 'true'

      required type number: false is converted to 0, true is converted to 1

      required type node: FATAL ERROR

  • string item

      required type boolean: 'false', '0', and the empty string are converted to false, other strings are converted to true

      required type string: -

      required type number: a string that consists of optional whitespace followed by an optional minus sign followed by a numeric literal (see 6.2 Literals) followed by whitespace is converted to the number that is nearest to the mathematical value represented by the string; any other string is converted to NaN

      required type node: FATAL ERROR

  • number item

      required type boolean: 0, +0, -0, and NaN are converted to false, other numbers are converted to true

      required type string: NaN is converted to 'NaN', +0 and -0 are converted to '0', positive infinity is converted to 'Infinity', negative infinity is converted to '-Infinity'. Other numbers are represented in decimal form as a numeric literal (see 6.2 Literals) with no leading zeros (apart possibly from the one required digit immediately before the decimal point), preceded by a minus sign (-) if the number is negative

      required type number: -

      required type node: FATAL ERROR

  • node item

      required type boolean: a node is converted to true

      required type string: a node is converted to its string value (see 6 Expressions)

      required type number: a node is converted to its string value (see 6 Expressions); then the rules to convert strings to numbers are applied to convert the string value to a number

      required type node: -

A sequence containing more than one item is converted according to its very first item; all other items are ignored. The same conversion rules as for singleton sequences are applied (see the table above).
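Two of these conversions are illustrated by the sketch below, in which the test attribute of stx:if requires a boolean. The names flag, enabled and never are hypothetical; the stx:transform wrapper (version attribute and namespace URI) is assumed from section 3.

```xml
<stx:transform xmlns:stx="http://stx.sourceforge.net/2002/ns" version="1.0">
  <stx:template match="flag">
    <!-- Node to boolean: true whenever the attribute exists, even for
         enabled="no"; if it is absent, the expression yields the empty
         sequence, which is converted to false -->
    <stx:if test="@enabled">
      <enabled/>
    </stx:if>
    <!-- String to boolean: the string '0' is converted to false, so this
         content is never instantiated -->
    <stx:if test="'0'">
      <never/>
    </stx:if>
  </stx:template>
</stx:transform>
```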

5.4 Tree Fragments

See Issue 1.

6 Expressions

STX uses an expression language of its own called STXPath. At first sight, STXPath is very similar to [XPath]. Syntactically, STXPath is close to a subset of [XPath2]. However, since STX has a different notion of context, the meaning of some expressions may be different in STXPath and in XPath. Consider the following example:

In XPath, the expression /node1/node2 returns a node-set containing all node2 elements whose parent node1 is the document element. In STXPath, by contrast, the same expression returns only a single node from this node-set: the one which is an ancestor of the current node.

Expressions are used in STX in predicates of match patterns, to specify conditions for different ways of processing of the current node, to generate text to be inserted to the output stream, or to access data from the ancestor stack.

Each expression has its static context - the information that is available during static analysis of the expression, prior to its evaluation. The static context includes in-scope namespaces, default namespace for element names, and in-scope variables. The information that is available at the time when the expression is evaluated is the current context as defined in 2.3 Context.

Basic primitives of STXPath include:

Expressions always evaluate to a sequence. See the EBNF production for expression in C STXPath Grammar for the details.

6.1 Variables

STX variables are scoped statically according to the literal structure of stylesheets. The grouping of templates is used to make the sharing of other than global variables possible.

There are two types of variables:

  • group variables - Declared with an stx:variable element that is a child of either stx:transform or stx:group. Top-level variables are considered to be members of the top-most default group that exists for each stylesheet.

  • local variables - Declared within templates.

A group variable is visible for the group where the variable is declared, for all descendant groups and for all templates belonging to these groups. A local variable is visible for all following siblings of the variable declaration and their descendants. Group variables may be shadowed (another variable with the same name is visible) by descendant group variables and by local variables. It is a non-recoverable error to redeclare a variable with the same name in the same group or template.

Variables always contain a sequence. STX instructions stx:variable and stx:assign are used to evaluate an expression and store its value into a variable.

Since variables are re-assignable, each variable must be declared using the stx:variable element before it is used (assigned or referenced). Group variables are statically initialized while parsing the stylesheet; only the static context information is available during the initialization. Local variables are initialized at run-time. A variable declared with no value is initialized with the empty sequence.

stx:variable
<!-- Category: top-level or group or template -->
<stx:variable
  name = qname
  select = expression
  keep-value = "yes"|"no">
<!-- Content: text template -->
</stx:variable>

This instruction is used to declare and initialize a variable. The mandatory name attribute contains the name of the variable. An expression in the select attribute is evaluated and the variable is initialized with its result. The select attribute is optional; a variable is initialized with the string resulting from the content of the stx:variable element if the select is missing. If the content is empty (stx:variable element has no children) the variable is initialized with the empty sequence. It is a recoverable error if the element stx:variable declaring a group variable contains an stx:process-children, stx:process-self, stx:process-siblings, or stx:process-attributes instruction in its content. A processor may recover from this error by ignoring such an instruction.

The optional keep-value attribute specifies whether a new instance of the variable created by instantiating a template having its new-scope attribute set to "yes" is initialized with the value of the shadowed variable (yes) or not (no). This attribute is allowed only for group variables. The default value is no. If there is no shadowed variable yet, the keep-value attribute is ignored.

stx:assign
<!-- Category: top-level or group or template -->
<stx:assign
  name = qname
  select = expression>
<!-- Content: text template -->
</stx:assign>

This instruction is used to assign a new value to a previously declared variable. The mandatory name attribute contains the name of the variable. The expression in the optional select attribute is evaluated and its result is assigned to the variable. If the select attribute is missing, the string resulting from the content of the stx:assign element is assigned to the variable. If the content is empty, the empty sequence is assigned to the variable.
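A typical use of a re-assignable group variable is a running counter, sketched below. The names count, item and n are hypothetical, the + operator is assumed to be available in STXPath as in XPath, and the stx:transform wrapper (version attribute and namespace URI) is assumed from section 3.

```xml
<stx:transform xmlns:stx="http://stx.sourceforge.net/2002/ns" version="1.0">
  <!-- Group variable, statically initialized with the number 0 -->
  <stx:variable name="count" select="0"/>

  <stx:template match="item">
    <!-- Re-assign the variable each time an item is encountered -->
    <stx:assign name="count" select="$count + 1"/>
    <stx:copy>
      <!-- The number is converted to a string for the attribute value -->
      <stx:attribute name="n" select="$count"/>
      <stx:process-children/>
    </stx:copy>
  </stx:template>
</stx:transform>
```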

6.2 Literals

A literal is a direct syntactic representation of an atomic value. STXPath supports two kinds of literals: string literals and numeric literals.

The value of a string literal is a singleton sequence containing an item whose atomic type is string and whose value is the string denoted by the characters between the delimiting quotation marks.

  
StringLiteral   ::=   (["][^"]*["]) | (['][^']*['])

The value of a numeric literal is a singleton sequence containing an item whose type is number and whose value is obtained by parsing the numeric literal according to the rules for string to numbers conversion (see 5.3 Type Conversions).

  
NumericLiteral   ::=   IntegerLiteral | DecimalLiteral | DoubleLiteral
IntegerLiteral   ::=   Digits
DecimalLiteral   ::=   ('.' Digits) | (Digits '.' [0-9]*)
DoubleLiteral   ::=   (('.' Digits) | (Digits ('.' [0-9]*)?))([e]|[E])([+]|[-])? Digits

6.3 Parenthesized Expressions

Parentheses may be used to enforce a particular evaluation order in expressions that contain multiple operators.

Parentheses are also used as delimiters in constructing a sequence, as described in 6.6 Sequence Expressions.

6.4 Functions

A function call consists of a function name followed by a parenthesized list of zero or more expressions. The expressions inside the parentheses provide the arguments of the function call. The number of arguments must be equal to the number of function parameters; otherwise a static non-recoverable error is raised.

A function call is evaluated as follows:

  1. Each argument expression is evaluated, producing an argument value (sequence).

  2. If the corresponding function parameter has a required type, the argument value is converted to this type.

  3. The function is executed using the converted argument values. The result is a value of the function's declared return type.

The following list of STXPath functions is categorized by the required types of the primary arguments:

6.4.1 Sequence Functions

Function: boolean empty(sequence)

The empty() function returns true if the argument is the empty sequence; otherwise it returns false.

Function: item item-at(sequence, number)

The item-at() function returns the item from the first argument sequence at the position given by the second argument. The index number is rounded to the nearest integer if necessary. If the sequence is the empty sequence, this function returns the empty sequence. If the value of index is greater than the number of items in the sequence, or is less than or equal to zero, then the function reports a non-recoverable error.

Function: sequence sublist(sequence, number, number?)

The sublist() function returns the contiguous sequence of items from the first argument (source sequence) beginning at the position specified by the second argument (index) and continuing for the number of items indicated by the third argument (length). If length is not specified, then the sublist identifies items to the end of the source sequence. The index and length numbers are rounded to the nearest integers if necessary. If the source sequence is the empty sequence, this function returns the empty sequence. If the value of index is greater than the number of items in the sequence, or is less than or equal to zero, then the function reports a non-recoverable error. The length can be greater than the number of items in the source sequence following the beginning position, in which case the sublist identifies items to the end of the source sequence.

Function: number count(sequence)

The count() function returns the number of items in the sequence.
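The sequence functions might be used as in the following sketch. The comma-separated sequence constructor is assumed from 6.6 Sequence Expressions, the names days, report, second and size are hypothetical, and the stx:transform wrapper (version attribute and namespace URI) is assumed from section 3.

```xml
<stx:transform xmlns:stx="http://stx.sourceforge.net/2002/ns" version="1.0">
  <stx:variable name="days" select="('Mon', 'Tue', 'Wed', 'Thu')"/>

  <stx:template match="report">
    <!-- Positions start at 1, so this selects the item 'Tue' -->
    <second><stx:value-of select="item-at($days, 2)"/></second>
    <!-- sublist($days, 2, 2) is the two-item sequence ('Tue', 'Wed'),
         so count() returns 2 -->
    <size><stx:value-of select="count(sublist($days, 2, 2))"/></size>
  </stx:template>
</stx:transform>
```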

6.4.2 Node Functions

Function: string name(node)

The name function returns a string containing a qualified name representing the expanded-name of the node in the argument. For nodes with no name defined (root, text, CDATA text, comment), this function returns the empty string. For processing-instructions, this function returns their target.

Function: string namespace-uri(node)

The namespace-uri function returns the namespace URI of the expanded-name of the node in the argument. For nodes with no namespace defined (root, text, CDATA text, processing instruction, comment, namespace), this function returns the empty string.

Function: string local-name(node)

The local-name() function returns the local part of the expanded-name of the node in the argument. For nodes with no local name defined (root, text, CDATA text, comment), this function returns the empty string. For processing-instructions, this function returns their target.

Function: string prefix(node)

The prefix() function returns the prefix of the expanded-name of the node in the argument. For nodes with no prefix defined (root, text, CDATA text, processing instruction, comment, namespace), this function returns the empty string.

Function: number position()

The position() function returns a number equal to the position of the current node relative to its siblings. See 2.3 Context for details of position() semantics.

Function: node get-node(number)

The get-node() function returns the node which is in the ancestor stack at the level given by the argument. The level number is rounded to the nearest integer if necessary. For example, get-node(0) returns the root of the document, get-node(1) returns the document element. get-node(level()) returns the current node. If there is no node at the requested level in the ancestor stack, the function returns the empty sequence.
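The relationship between get-node() and level() can be pictured with a small Python sketch. It is illustrative only: the ancestor stack is modelled as a plain list whose index 0 holds the root, nodes are represented by strings, and the empty sequence by an empty list. The function names and the rounding rule for ties are assumptions of the sketch.

```python
import math


def level(stack):
    # The current node sits on top of the stack; the root is at level 0.
    return len(stack) - 1


def get_node(stack, n):
    i = math.floor(n + 0.5)  # round the level to the nearest integer
    if 0 <= i < len(stack):
        return stack[i]
    return []  # no node at the requested level: the empty sequence


# Hypothetical stack while processing /doc/chapter/para
example_stack = ["/", "doc", "chapter", "para"]
```

With this stack, get_node(example_stack, level(example_stack)) yields "para", the current node.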

Function: boolean has-child-nodes()

The has-child-nodes() function returns true if and only if the current node is the document node or an element node and has child nodes (it is not empty). It returns false otherwise.

6.4.3 Boolean Functions

Function: boolean true()

The true() function always returns true.

Function: boolean false()

The false() function always returns false.

Function: boolean not(sequence)

The not() function reduces its parameter to an effective boolean value using the same rules that are used for the operands of logical expressions (see 6.9 Logical Expressions). It then returns true if the effective boolean value of its parameter is false, and false if the effective boolean value of its parameter is true.

6.4.4 String Functions

Function: boolean starts-with(string, string)

The starts-with() function returns true if the first argument string starts with the second argument string, otherwise it returns false. If the value of any argument is the empty sequence, the function returns the empty sequence.

Function: boolean contains(string, string)

The contains() function returns true if the first argument string contains the second argument string, otherwise it returns false. If the value of any argument is the empty sequence, the function returns the empty sequence.

Function: string substring(string, number, number?)

The substring() function returns the part of the first argument string that starts at the offset given by the second argument and has the length given by the third argument. If the third argument is omitted, it returns all characters from the offset to the end of the string. The offset and length numbers are rounded to the nearest integer if necessary. The offset of the first character is 1. If the value of any argument is the empty sequence, the function returns the empty sequence.

Function: string substring-before(string, string)

The substring-before() function returns the part of the first argument string from the beginning of the string up to (but not including) the first occurrence of the second argument string. The empty string is returned if the first argument string does not contain the second argument string. If the value of any argument is the empty sequence, the function returns the empty sequence.

Function: string substring-after(string, string)

The substring-after() function returns the part of the first argument string from the end of the first occurrence of the second argument string to the end of the (first) string. The empty string is returned if the first argument string does not contain the second argument string. If the value of any argument is the empty sequence, the function returns the empty sequence.
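The three substring functions can be modelled in Python as shown here. This is a sketch under stated assumptions rather than a normative definition: the empty-sequence case is not modelled, ties in rounding go to the greater integer, and offsets that fall outside the string are handled as in XPath 1.0 (only positions that actually exist contribute characters), which the text above does not spell out.

```python
import math


def _round(n):
    # Nearest integer; ties are assumed to go to the greater integer.
    return math.floor(n + 0.5)


def substring(s, offset, length=None):
    start = _round(offset)
    end = len(s) + 1 if length is None else start + _round(length)
    # Keep the characters whose 1-based position p satisfies start <= p < end.
    return "".join(ch for p, ch in enumerate(s, start=1) if start <= p < end)


def substring_before(s, t):
    i = s.find(t)
    return s[:i] if i >= 0 else ""  # no occurrence: the empty string


def substring_after(s, t):
    i = s.find(t)
    return s[i + len(t):] if i >= 0 else ""
```

For example, substring("12345", 2, 3) yields "234" in this model, and substring_after("1999/04/01", "/") yields "04/01".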

Function: number string-length(string)

The string-length() function returns the number of characters in a string. If the value of the argument is the empty sequence, the function returns the empty sequence.

Function: string normalize-space(string)

The normalize-space() function returns the argument string after leading and trailing whitespace is stripped and each run of consecutive whitespace characters is replaced with a single space. If the value of the argument is the empty sequence, the function returns the empty sequence.

Function: string translate(string, string, string)

The translate() function returns the first argument string with occurrences of characters in the second argument string replaced by the corresponding characters from the third argument string. If there is a character in the second argument string with no character at a corresponding position in the third argument string (because the second argument string is longer than the third argument string), then occurrences of that character in the first argument string are removed. If a character occurs more than once in the second argument string, then the first occurrence determines the replacement character. If the third argument string is longer than the second argument string, then excess characters are ignored. If the value of any argument is the empty sequence, the function returns the empty sequence.
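The mapping rules of translate() are easier to follow in code. The following Python sketch implements the rules as stated above; the function name and the use of None to mark a character for removal are choices of the sketch, and the empty-sequence case is not modelled.

```python
def translate(s, source_chars, target_chars):
    mapping = {}
    for i, ch in enumerate(source_chars):
        if ch in mapping:
            continue  # the first occurrence determines the replacement
        # No counterpart in the third argument: mark the character for removal.
        mapping[ch] = target_chars[i] if i < len(target_chars) else None
    out = []
    for ch in s:
        if ch not in mapping:
            out.append(ch)  # characters not listed are copied unchanged
        elif mapping[ch] is not None:
            out.append(mapping[ch])
    return "".join(out)
```

Excess characters in the third argument are never looked up, so they are ignored without special handling.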

Function: string concat(string, string?)

The concat() function returns the concatenation of its arguments. If the value of any argument is the empty sequence, the function returns the empty sequence.

Function: string replace(string, string, string, string?)

The replace() function returns the first argument string with parts that match a regular expression given in the second argument string replaced with the third argument string. The regular expression semantics defined in XML Schema Part 2: Datatypes ([XSD2]), Appendix F are used.

The fourth optional argument is a string consisting of character flags to be used by the match. If a character is present then that flag is true. The flags are:

  • g - global replace

    All occurrences of the regular expression in the string are replaced. If this character is not present, then only the first occurrence of the regular expression is replaced.

  • i - case insensitive

    The regular expression is treated as case insensitive. If this character is not present, then the regular expression is case sensitive.

If the value of any argument is the empty sequence, the function returns the empty sequence.
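The effect of the g and i flags on replace() can be approximated with Python's re module. Note the assumptions: Python's regular expression dialect differs from the XML Schema dialect required here, so the sketch only illustrates the flag handling; the replacement string is inserted literally; and the empty-sequence case is not modelled.

```python
import re


def replace(s, pattern, replacement, flags=""):
    re_flags = re.IGNORECASE if "i" in flags else 0
    # For re.sub, count=0 means "replace every match"; without the g flag
    # only the first match is replaced.
    count = 0 if "g" in flags else 1
    # A callable is used so that the replacement text is taken literally
    # (no backslash or group-reference processing).
    return re.sub(pattern, lambda m: replacement, s, count=count, flags=re_flags)
```

For example, replace("a-b-c", "-", "+") yields "a+b-c", while adding the g flag yields "a+b+c".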

Function: sequence match(string, string, string?)

The match() function locates the part of the first argument string that is matched by the regular expression given in the second argument string. If there is no substring of the first string that matches the regular expression, the empty sequence is returned. Otherwise, a sequence of two integers is returned: the first integer is the position of the start of the substring and the second integer is the length of the substring that matches. The regular expression semantics defined in XML Schema Part 2: Datatypes ([XSD2]), Appendix F are used.

The third optional argument is a string consisting of character flags to be used by the match. If a character is present then that flag is true. The flags are:

  • g - global replace

    All occurrences of the regular expression in the string are replaced. If this character is not present, then only the first occurrence of the regular expression is replaced.

  • i - case insensitive

    The regular expression is treated as case insensitive. If this character is not present, then the regular expression is case sensitive.

If the value of any argument is the empty sequence, the function returns the empty sequence.

6.4.5 Numerical Functions

Function: number floor(number)

The floor() function returns the largest number that is not greater than the argument and that is an integer. If the value of the argument is the empty sequence, the function returns the empty sequence.

Function: number ceiling(number)

The ceiling() function returns the smallest number that is not less than the argument and that is an integer. If the value of the argument is the empty sequence, the function returns the empty sequence.

Function: number round(number)

The round() function returns the number that is closest to the argument and that is an integer. If there are two such numbers, then the greater one is returned. If the argument is NaN, then NaN is returned. If the value of the argument is the empty sequence, the function returns the empty sequence.
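The tie-breaking rule of round() differs from the rounding built into some programming languages. The following Python sketch shows one way to model it; the function name is hypothetical, returning infinities unchanged is an assumption the text does not cover, and the floor(x + 0.5) formula is adequate for illustration only.

```python
import math


def stx_round(x):
    if math.isnan(x) or math.isinf(x):
        return x  # NaN stays NaN; infinities are passed through (assumption)
    # Ties go to the greater integer: 2.5 becomes 3, -2.5 becomes -2.
    return float(math.floor(x + 0.5))
```

By contrast, Python's own round() resolves ties to the even integer, so round(2.5) is 2 there.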

Function: number sum(sequence)

The sum() function returns the sum, for each item in the argument sequence, of the result of converting the item to a number. If the value of the argument is the empty sequence, the function returns the empty sequence.

6.4.6 Other Functions

Function: string string(sequence)

The string() function returns the result of converting the argument to a string. See 5.3 Type Conversions for details.

Function: number number(sequence)

The number() function returns the result of converting the argument to a number.

Function: boolean boolean(sequence)

The boolean() function returns the result of converting the argument to a boolean.

Function: number level(node?)

The level() function returns the level of the argument node in the ancestor stack. level() and level(.) return the level of the current node. level(/) returns 0. If the value of the argument is the empty sequence, the function returns the empty sequence.

6.5 Data Accessors

The only data available when processing the current node is the data related to the current node, the data related to the next node, and the data related to nodes in the ancestor stack. Location paths called data accessors are used to access this data. Axes in data accessors are limited to:

  • parent and ancestor axes in relative location paths

  • child and descendant axes (abbreviated syntax only) in absolute location paths

  • attribute axis (abbreviated syntax only)

  • text() node test (child axis) for the current node

Predicates are not allowed in data accessors.

A data accessor always returns a sequence (often a singleton one). These sequences are very limited; they can contain nothing but nodes stored in the ancestor stack (the current node and its attributes, ancestor elements and their attributes) and the next node (only if the next node happens to be a text node, accessed with text()). Resulting sequences can be either passed to functions operating on sequences or converted to string, number or boolean.

Here are some examples of data accessors:

  • . - returns the current node

  • text() - returns the first text child of the current node provided it is the very first child of the current node. Otherwise, it returns the empty sequence.

  • parent::* - returns the parent node of the current node

  • ancestor::* - returns a sequence whose items are all ancestors of the current node

  • @foo - returns the foo attribute of the current node

  • ancestor::*/@bar - returns a sequence of bar attributes of ancestors of the current node

  • /aaa/bbb - returns the bbb element from the ancestor stack that is a child of the aaa element, where aaa is the root element of the ancestor stack (and hence the root element of the input document)
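Several of the accessors listed above can be pictured as simple operations on the ancestor stack. The Python sketch below is illustrative only: Element is a hypothetical record type, the stack holds elements only (the root node is left out), and the order in which ancestors are returned is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    name: str
    attrs: dict = field(default_factory=dict)


def current(stack):  # .
    return stack[-1]


def parent(stack):  # parent::*
    return stack[-2:-1]  # empty when the current node is the root element


def ancestors(stack):  # ancestor::*
    return stack[:-1]


def attribute(elements, name):  # @name applied to each element in turn
    return [e.attrs[name] for e in elements if name in e.attrs]


# Hypothetical stack while processing /aaa/bbb/ccc
example_stack = [Element("aaa", {"bar": "1"}), Element("bbb"),
                 Element("ccc", {"foo": "x", "bar": "2"})]
```

Here attribute(ancestors(example_stack), "bar") corresponds to ancestor::*/@bar and finds only the bar attribute of aaa, since bbb has no such attribute and ccc is the current node, not an ancestor.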

6.6 Sequence Expressions

STXPath supports operators to construct and combine sequences. One way to construct a sequence is using a parenthesized expression (6.3 Parenthesized Expressions), which consists of zero or more expressions separated with the comma operator and delimited with parentheses. The parenthesized expression is evaluated by evaluating each of its constituent expressions and concatenating the resulting sequences, in order, into a single result sequence.

Here are some examples of expressions that construct sequences:

This expression is a sequence of five integers:

(10, 1, 2, 3, 4)

This expression constructs one sequence from the sequences 10, (1, 2), the empty sequence (), and (3, 4):

(10, (1, 2), (), (3, 4))

It evaluates to the sequence (10, 1, 2, 3, 4).
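The flattening behaviour shown in this example can be modelled with a short Python sketch, in which nested tuples or lists stand for parenthesized expressions. The function name sequence is an assumption of the sketch.

```python
def sequence(*exprs):
    result = []
    for e in exprs:
        if isinstance(e, (list, tuple)):
            result.extend(sequence(*e))  # a nested sequence is spliced in, in order
        else:
            result.append(e)  # a single item is appended as is
    return result
```

Because nested sequences are always spliced in, the result never contains a sequence as an item, and an empty sequence contributes nothing.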

6.7 Arithmetic Expressions

STXPath provides arithmetic operators for addition, subtraction, multiplication, division, and modulus, in their usual binary and unary forms. The binary subtraction operator must be preceded by white space in order to distinguish it from a hyphen, which is a valid name character.

An arithmetic expression is evaluated by applying the following rules:

  • If either operand is the empty sequence, the result of the operation is the empty sequence.

  • Operands other than empty sequences are converted (5.3 Type Conversions) to numbers before the expression is evaluated. If the conversion fails (returns NaN), a non-recoverable error is reported.
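The two rules above can be sketched in Python as follows. This is a simplified model: None stands for the empty sequence, operands are single items, only addition, subtraction and multiplication are covered, NonRecoverableError is a hypothetical exception name, and Python's float() is used as an approximation of the conversion rules in 5.3 Type Conversions.

```python
import math
import operator


class NonRecoverableError(Exception):
    """Hypothetical stand-in for the non-recoverable error named in the text."""


OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def to_number(v):
    if isinstance(v, str):
        try:
            return float(v)  # approximation of the string-to-number rules
        except ValueError:
            return math.nan
    return float(v)


def arithmetic(a, op, b):
    if a is None or b is None:
        return None  # an empty operand gives an empty result
    x, y = to_number(a), to_number(b)
    if math.isnan(x) or math.isnan(y):
        raise NonRecoverableError("operand is not a number")
    return OPS[op](x, y)
```

For example, arithmetic("3", "+", 4) yields 7.0, while arithmetic("abc", "+", 1) raises the error.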

6.8 Comparison Expressions

Comparison expressions allow two values to be compared. STXPath provides the following general comparison operators: =, !=, <, <=, >, >=. The result of a comparison is always true or false (a singleton sequence containing one boolean item).

  
CompOp   ::=   '=' | '!=' | '<' | '<=' | '>' | '>='

The result of a comparison of sequences is defined by applying the following rules, in order:

  1. If either operand is the empty sequence, the result is false.

  2. The comparison A operator B is true for sequences A and B if the comparison a operator b is true for some item a in A and some item b in B. Otherwise, A operator B is false.

The result of a comparison of items is defined by applying the following rules. The rules defined in 5.3 Type Conversions apply for conversions:

  • If both items to be compared are nodes, then the comparison will be true if and only if the result of performing the comparison on the string-values of the two nodes is true.

  • If one item to be compared is a node and the other is a number, then the comparison will be true if and only if the result of performing the comparison on the number and on the result of converting the string-value of that node to a number is true.

  • If one item to be compared is a node and the other is a string, then the comparison will be true if and only if the result of performing the comparison on the string-value of the node and the other string is true.

  • If one item to be compared is a node and the other is a boolean, then the comparison will be true if and only if the result of performing the comparison of true and the boolean value is true.

  • When neither item to be compared is a node and the operator is = or !=, then the items are compared by converting them to a common type as follows and then comparing them. If at least one item to be compared is a boolean, then each item to be compared is converted to a boolean. Otherwise, if at least one item to be compared is a number, then each item to be compared is converted to a number. Otherwise, both items to be compared are converted to strings.

  • When neither item to be compared is a node and the operator is <=, <, >= or >, then the items are compared by converting both items to numbers and comparing the numbers.
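The comparison rules above can be combined into one Python sketch. It is a model under stated assumptions, not a reference implementation: sequences are Python lists, Node is a hypothetical class carrying only a string-value, and the conversions to number and boolean approximate 5.3 Type Conversions with Python's float() and simple emptiness tests.

```python
import operator
from dataclasses import dataclass


@dataclass
class Node:
    string_value: str


OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def to_number(v):
    if isinstance(v, str):
        try:
            return float(v)
        except ValueError:
            return float("nan")
    return float(v)  # booleans become 1.0 or 0.0


def to_boolean(v):
    if isinstance(v, str):
        return len(v) > 0
    return v == v and v != 0  # NaN and zero are false


def compare_items(a, op, b):
    a_node, b_node = isinstance(a, Node), isinstance(b, Node)
    if a_node or b_node:
        other = b if a_node else a
        if isinstance(other, bool):
            # node against boolean: the node side counts as true
            a, b = (True if a_node else a), (True if b_node else b)
        else:
            # otherwise each node is replaced by its string-value
            a = a.string_value if a_node else a
            b = b.string_value if b_node else b
    if op in ("=", "!="):
        if isinstance(a, bool) or isinstance(b, bool):
            a, b = to_boolean(a), to_boolean(b)
        elif isinstance(a, (int, float)) or isinstance(b, (int, float)):
            a, b = to_number(a), to_number(b)
        # otherwise both items are already strings
    else:
        a, b = to_number(a), to_number(b)
    return OPS[op](a, b)


def general_compare(A, op, B):
    if not A or not B:
        return False  # rule 1: an empty operand gives false
    # rule 2: true if some pair of items satisfies the comparison
    return any(compare_items(a, op, b) for a in A for b in B)
```

One consequence of the pairwise rule is that both general_compare([1, 2], "=", [2]) and general_compare([1, 2], "!=", [2]) are true in this model.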

6.9 Logical Expressions

STXPath provides two common logical operators: and and or. The value of a logical expression is always one of the boolean values true or false (a singleton sequence containing a boolean item).

A logical expression is evaluated by reducing each of its operands to an effective boolean value by applying the following rules, in order:

  1. If the operand is the empty sequence, its effective boolean value is false.

  2. If the operand is a singleton sequence containing a boolean item, the item serves as the effective boolean value.

  3. If the operand is a sequence that contains at least one node, its effective boolean value is true.

  4. In any other case, operands are converted to boolean (see 5.3 Type Conversions) to get effective boolean values.

An AND expression returns true if the effective boolean values of both of its operands are true; otherwise it returns false.

An OR expression returns false if the effective boolean values of both of its operands are false; otherwise it returns true.

In addition to logical expressions, STXPath provides a function named not() that takes a general sequence as parameter and returns a boolean value.
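The reduction to an effective boolean value, and the operators built on it, can be sketched in Python. The sketch makes several assumptions: sequences are Python lists, Node is a hypothetical placeholder class, and rule 4 is modelled only for a singleton string or number, because the general conversion is defined in 5.3 Type Conversions rather than here.

```python
class Node:
    """Hypothetical placeholder for any node item."""


def effective_boolean_value(seq):
    if len(seq) == 0:
        return False  # rule 1: the empty sequence
    if len(seq) == 1 and isinstance(seq[0], bool):
        return seq[0]  # rule 2: a singleton boolean
    if any(isinstance(item, Node) for item in seq):
        return True  # rule 3: at least one node
    if len(seq) == 1:  # rule 4, modelled for singletons only
        item = seq[0]
        if isinstance(item, str):
            return len(item) > 0
        return item == item and item != 0  # NaN and zero are false
    raise NotImplementedError("conversion of longer sequences is defined in 5.3")


def logical_and(a, b):
    return effective_boolean_value(a) and effective_boolean_value(b)


def logical_or(a, b):
    return effective_boolean_value(a) or effective_boolean_value(b)


def stx_not(a):
    return not effective_boolean_value(a)
```

For example, logical_or([], ["x"]) is true in this model, and stx_not([]) is true because the empty sequence reduces to false.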

7 Extensions

STX will define extension modules to interact with other XML and non-XML technologies. What this document describes is the core STX language. Extensions can possibly include the following modules:

A References

XSLT
World Wide Web Consortium. XSLT 1.0. W3C Recommendation. See http://www.w3.org/TR/xslt
XPath
World Wide Web Consortium. XPath 1.0. W3C Recommendation. See http://www.w3c.org/TR/xpath
XPath2
World Wide Web Consortium. XPath 2.0. W3C Working Draft. See http://www.w3.org/TR/xpath20
SAX2
SAX 2.0, the Simple API for XML. See http://www.saxproject.org
IANA
IANA Character Sets assignment. See http://www.iana.org/assignments/character-sets
XSD2
World Wide Web Consortium. XML Schema Part 2: Datatypes. W3C Recommendation. See http://www.w3.org/TR/xmlschema-2
XML Names
World Wide Web Consortium. Namespaces in XML. W3C Recommendation. See http://www.w3.org/TR/REC-xml-names/

B Element Syntax Summary

Plain list only so far:

	stx:transform
	stx:options
	stx:include
	stx:namespace-alias
	stx:template
	stx:procedure
	stx:group
	stx:call-procedure
	stx:copy
	stx:process-children
	stx:process-attributes
	stx:process-siblings
	stx:process-self
	stx:value-of
	stx:text
	stx:cdata
	stx:element
	stx:start-element
	stx:end-element
	stx:processing-instruction
	stx:comment
	stx:attribute
	stx:if
	stx:else
	stx:choose
	stx:when
	stx:otherwise
	stx:variable
	stx:assign
	stx:with-param
	stx:param
	stx:for-each
	stx:process-document
	stx:result-document
	stx:buffer
	stx:process-buffer
	stx:result-buffer
	stx:replace
	stx:pattern
	

C STXPath Grammar

The following is a complete grammar for STXPath in EBNF notation.

Main Constructs
[1]   pattern   ::=   PathPattern ('|' PathPattern)?
[2]   expression   ::=   Expr
Match Patterns
[3]   PathPattern   ::=   AbsolutePattern | RelativePattern
[4]   AbsolutePattern   ::=   '/' RelativePattern?
[5]   RelativePattern   ::=   Step (('/' RelativePattern) | ('//' RelativePattern))?
[6]   Step   ::=   NodeTest Predicate?
[7]   NodeTest   ::=   NameTest | KindTest
[8]   Predicate   ::=   '[' Expr ']'
[9]   NameTest   ::=   NodeNameTest | AttributeNameTest
[10]   NodeNameTest   ::=   QName | NCName ':' '*' | '*' | '*' ':' NCName
[11]   AttributeNameTest   ::=   '@' QName | '@' NCName ':' '*' | '@' '*' | '@' '*' ':' NCName
[12]   KindTest   ::=   AnyKindTest | CommentTest | ProcessingInstructionTest | TextTest | CDATATest
[13]   AnyKindTest   ::=   'node()'
[14]   CommentTest   ::=   'comment()'
[15]   ProcessingInstructionTest   ::=   'processing-instruction(' StringLiteral? ')'
[16]   TextTest   ::=   'text()'
[17]   CDATATest   ::=   'cdata()'
Expressions
[18]   Expr   ::=   OrExpr
[19]   OrExpr   ::=   AndExpr | OrExpr 'or' AndExpr
[20]   AndExpr   ::=   GeneralComp | AndExpr 'and' GeneralComp
[21]   GeneralComp   ::=   AdditiveExpr | GeneralComp CompOp AdditiveExpr
[22]   AdditiveExpr   ::=   MultiplicativeExpr | AdditiveExpr ('+' | '-') MultiplicativeExpr
[23]   MultiplicativeExpr   ::=   UnaryExpr | MultiplicativeExpr ('*' | 'div' | 'mod') UnaryExpr
[24]   UnaryExpr   ::=   ('-' | '+')? BasicExpr
[25]   BasicExpr   ::=   DataAccessor | ParenthesizedExpr | Literal
[26]   ParenthesizedExpr   ::=   '(' ExprSequence? ')'
[27]   ExprSequence   ::=   Expr (',' Expr)*
[28]   Literal   ::=   NumericLiteral | StringLiteral
Data Accessors
[29]   DataAccessor   ::=   NodeAccessor | NodeAccessor '/' PropertyAccessor | PropertyAccessor
[30]   NodeAccessor   ::=   PathAccessor | Variable | FunctionCall
[31]   FunctionCall   ::=   QName '(' ExprSequence? ')'
[32]   PathAccessor   ::=   ('/' | '//')? RelativeAccessor
[33]   RelativeAccessor   ::=   RelativeAccessor ('/' | '//') AccessorStep | AccessorStep
[34]   AccessorStep   ::=   AccessorAxis? NodeNameTest | '.' | '..'
[35]   PropertyAccessor   ::=   TextTest | AttributeNameTest | NamespaceNameTest
[36]   AccessorAxis   ::=   'parent::' | 'ancestor::'
[37]   NamespaceNameTest   ::=   'namespace::' NCName | 'namespace::' '*'
Syntactic Constructs
[38]   CompOp   ::=   '=' | '!=' | '<' | '<=' | '>' | '>='
[39]   NumericLiteral   ::=   IntegerLiteral | DecimalLiteral | DoubleLiteral
[40]   IntegerLiteral   ::=   Digits
[41]   DecimalLiteral   ::=   ('.' Digits) | (Digits '.' [0-9]*)
[42]   DoubleLiteral   ::=   (('.' Digits) | (Digits ('.' [0-9]*)?))([e]|[E])([+]|[-])? Digits
[43]   StringLiteral   ::=   (["][^"]*["]) | (['][^']*['])
[44]   Variable   ::=   '$' QName
[45]   Digits   ::=   [0-9]+

In addition, the following non-terminals are defined in [XML Names]:

NCName
QName

D Acknowledgments (Non-Normative)

The following people contributed to this specification by sending valuable comments to the stx@gingerall.cz mailing list:

Barrie Slaymaker
Miguel Branco
Eric van der Vlist
Richard R. McKinley
Jan Poslušný
Gunnlaugur Thor Briem
Robert Koberg
Michael Brennan

E Draft Change History since WD 1 November (Non-Normative)

2002-12-08 : CN : Added STXPath grammar section. Changed grammar examples to prodrecap elements.
2002-12-10 : PC : 'mode' replaced with 'group' attribute in stx:process-xxx. 'name' attribute added to stx:group. stx:with-param allowed in stx:process-xxx. stx:param allowed in templates
2002-12-11 : OB : Added stx:process-siblings, rephrased description of stx:process-children and stx:process-self
2002-12-16 : PC : Removed joining of output text events. 'group' attribute added to stx:procedure. Changed definition of text template; TTs are run-time checkable now. Added stx:message.
2002-12-19 : OB : Added 'markup' attribute to stx:text. Incorporated section 'Processing Text' including stx:replace and stx:pattern.
2002-12-25 : OB : Added 'required' attribute to stx:param; Forbid stx:process-... in stx:with-param, stylesheet parameters, and group variables.
2002-12-30 : OB : Added 'base' attribute to stx:process-document; Allow attribute value templates (AVT) for stx:processing-instruction/@name, stx:process-document/@href, and stx:result-document/@href
2003-01-06 : PC : TT definition moved to Templates. Added Errors section, all errors categorized. Minor changes in 'Parameters'.
2003-01-08 : PC : Namespace nodes added. Naming changes: start/end-element, @new-scope, @pass-through. Added notes about stacks for process-buffer/document. Conflicts of group names classified as recoverable errors. Missing group name classified as recoverable error.
2003-01-09 : OB : Revised STXPath grammar; added links from within the specification to non-terminals. Changed type of stx:process-document/@href and stx:result-document/@href to expression. Clarified base URI for stx:process-document.
2003-01-10 : PC : Element string-value changed to text().