This document is an editors' copy that has no official standing.
This document is a working draft of the STX transformation language specification.
1 Introduction
2 Processing Model
2.1 Context
2.2 Precedence Categories
2.3 Match Patterns
3 Stylesheet Structure
3.1 STX Namespace
3.2 Transform Element
3.3 Grouping of Templates
3.4 Stylesheet Inclusion
4 Generating Output
4.1 Transformation Options
4.2 Namespace Aliasing
4.3 Templates
4.4 Procedures
4.5 Parameters
4.6 Copying the Current Node
4.7 Processing Nested Events
4.8 Processing Attributes
4.9 Running Overridden Templates
4.10 Outputting Strings
4.11 Outputting Elements and Attributes
4.12 Outputting Other Nodes
4.13 Conditions
4.14 Loops
5 Data Types
5.1 Atomic Types
5.2 Sequences
5.3 Type Conversions
5.4 Tree Fragments
6 Expressions
6.1 Variables
6.2 Literals
6.3 Parenthesized Expressions
6.4 Functions
6.4.1 Sequence Functions
6.4.2 Node Functions
6.4.3 Boolean Functions
6.4.4 String Functions
6.4.5 Numerical Functions
6.4.6 Other Functions
6.5 Data Accessors
6.6 Sequence Expressions
6.7 Arithmetic Expressions
6.8 Comparison Expressions
6.9 Logical Expressions
7 Extensions
A References
B Element Syntax Summary
C Acknowledgments
D Issues
This document defines the syntax and semantics of the STX transformation language. Transformation rules in STX are expressed as well-formed XML documents. These documents, called stylesheets, may include both elements that are defined by STX (STX instructions) and other elements (literals). STX-defined elements are distinguished by belonging to a specific XML namespace, which is referred to in this specification as the STX namespace. This document uses a prefix of 'stx' as a shortcut for referring to elements from the STX namespace.
An STX transformation describes rules for transforming a source event stream into a result event stream. The transformation is streaming: it does not require building a tree representation of the source document in memory. Result events are generated as soon as individual source events arrive and are processed.
The transformation is achieved by associating events with templates. A template pattern is matched against events and their context. The best matching template is then instantiated to create part of the result stream. The template is always instantiated with respect to a current context, an amount of information maintained during the transformation. In constructing the result stream, events from the source stream can be filtered and arbitrary events can be added. Events can also be reordered using a working storage.
The syntax of STX is formally similar to the syntax of [XSLT]. STX also employs a compact expression language embedded in certain attributes. This expression language, called STXPath, is syntactically similar to [XPath] at first sight. This should allow XSLT users to learn most of the STX syntax easily.
An STX processor transforms a source XML document according to rules given in an STX stylesheet and generates a result XML document.
The source document is supplied in the form of a stream of [SAX2] events. This stream is referred to as the source stream.
No tree representation of the source document is constructed. However, when processing each event, a limited amount of contextual information is available from the system.
Consecutive character() SAX2 events are treated as a single character() event. Similarly, consecutive comment() events are treated as a single comment() event.
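The merging of consecutive events described above can be sketched as follows. This is only an illustration, not part of the language definition: the (kind, data) tuple shape, the kind names "characters" and "comment", and the function name coalesce are assumptions made for the example, and joining the data by plain concatenation is likewise an assumption.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical, simplified event shape: (kind, data).
Event = Tuple[str, str]

# Kinds whose consecutive occurrences are merged into one event.
MERGEABLE = {"characters", "comment"}


def coalesce(events: Iterable[Event]) -> Iterator[Event]:
    """Merge runs of consecutive mergeable events of the same kind."""
    pending = None  # the event that has been read but not yet emitted
    for kind, data in events:
        if pending is not None and kind == pending[0] and kind in MERGEABLE:
            # Same mergeable kind as the buffered event: extend it.
            pending = (kind, pending[1] + data)
            continue
        if pending is not None:
            yield pending
        pending = (kind, data)
    if pending is not None:
        yield pending
```

Only one event is buffered at a time, so the sketch keeps the streaming character of the processing model.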
The stylesheet is a well-formed XML document that may be precompiled to some kind of executable representation that can be reused to perform multiple transformations.
The output of the transformation consists of a sequence of SAX2 events. This sequence of events is referred to as the result stream.
Each incoming event causes invocation of a rule within the stylesheet by means of a match pattern.
The actions such a rule may perform include emitting SAX events to the result stream, saving working data in a working storage, accessing data written to working storage by previously executed rules, and invoking other rules.
Contextual information is available at every moment of the processing. It includes the data arriving with the current event and other data related to the state of processing. The contextual information available at a particular moment of the processing is called the current context. The context information consists of the following parts:
current node data - The node which is the subject of the current event is called the current node. It is always given and there is no way to change the current node using stylesheet rules. The information available for the current node depends on the type of node; see the [SAX2] definition for details. For example, qualified name, local name, prefix, namespace URI and attributes (qualified name, local name, prefix, namespace URI, and value for each) are available for elements.
ancestor stack - In addition to the current node, all its ancestor nodes (with all properties) are kept in a store called the ancestor stack.
next node data - The processing of the current node is delayed so that the next node data is available during the processing. The lookahead information is used to access the first text child of an element, provided it is the very first child of the element.
position within siblings - Information about the position relative to other siblings is kept. The position is available for the current node and all its ancestors.
A position number is available for all node kind tests such as node(), text(), cdata(), processing-instruction(), comment(). For elements, the position is available for all qualified names or names containing the * shortcut: pre:lname, lname, pre:*, *:lname, *. For processing instructions, the position is also available for each target.
Each incoming event can invoke a rule within the stylesheet by means of precedence categories and a match pattern (see 2.3 Match Patterns). The template that is used to process the current node is called the current template. Templates can be separated into groups (see 3.3 Grouping of Templates). Top-level templates are considered to be members of the default, top-level group. The group containing the current template is referred to as the current group.
Templates are associated with the precedence categories according to their visibility. The visibility is defined using the 'visibility' attribute for each template (see 4.3 Templates).
There are two precedence categories (listed with decreasing precedence):
templates from the same group and global or public templates (visibility='global'|'public') from child groups
global templates (visibility='global')
The first precedence category is searched for the best matching template by means of a match pattern (see 2.3 Match Patterns). If there is no matching template in this precedence category, the second category is searched.
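The two-step search can be sketched as follows. This is a minimal illustration under stated assumptions: the Group and Template classes, the matches callable standing in for a real match pattern, and the function name candidates are all hypothetical, and "child groups" is read as direct children of the current group.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Group:
    name: str
    parent: Optional["Group"] = None  # None for the default, top-level group


@dataclass
class Template:
    name: str
    group: Group
    visibility: str                 # "private", "public" or "global"
    matches: Callable[[str], bool]  # stand-in for a real match pattern


def candidates(templates: List[Template], current_group: Group, event: str) -> List[Template]:
    """Return the matching templates of the first non-empty precedence category."""
    # Category 1: the current group, plus public/global templates of its child groups.
    first = [
        t for t in templates
        if t.matches(event) and (
            t.group is current_group
            or (t.group.parent is current_group
                and t.visibility in ("public", "global"))
        )
    ]
    if first:
        return first
    # Category 2: global templates from anywhere in the stylesheet.
    return [t for t in templates if t.matches(event) and t.visibility == "global"]
```

Choosing a single template among several candidates in the same category is a separate step based on priorities.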
The match pattern specifies a set of conditions on the current context. If the current context satisfies the conditions, the current event matches the pattern; otherwise the current event does not match the pattern. The syntax for patterns is a subset of the pattern syntax for XSLT (see [XSLT], 5.2). In particular, patterns are in the form of location paths that meet certain restrictions.
Here are some examples of patterns:
item - matches any 'item' element from the namespace used for unprefixed STXPath path patterns (defined with the 'default-stxpath-namespace' option, no namespace by default)
list/item - matches any 'item' element with a 'list' parent, where both elements are from the namespace used for unprefixed STXPath path patterns
chapter//list/item - matches any 'item' element with a 'list' parent and a 'chapter' ancestor, where all three elements are from the namespace used for unprefixed STXPath path patterns
/root/list/* - matches any element with a 'list' parent and a 'root' grandparent which is the document element, where both the 'root' and 'list' elements are from the namespace used for unprefixed STXPath path patterns
pre:list[@id=5]/pre:item - matches any 'item' element with a 'list' parent having an 'id' attribute with value of 5, where both elements are from the namespace which is bound to the 'pre' prefix in the stylesheet for this rule
*[position()=1] - matches any element that is the first element child of its parent
node() - matches any node
text() - matches any text node (including CDATA text nodes)
cdata() - matches any CDATA text node
processing-instruction() - matches any processing instruction
A match pattern is a set of location path patterns separated with '|'. A location path pattern is a location path whose steps all use only the child and descendant axes. Patterns may use the / operator as well as the // operator. Only abbreviated syntax is allowed. Up to one predicate is allowed in each step. Predicate expressions are STXPath expressions (see 6 Expressions).
Predicate expressions are evaluated and the result is converted to a boolean. If the result is a number, it is converted to true if the number is equal to the context position and to false otherwise. Thus a location path p[3] is equivalent to p[position()=3]. If the result is a string, it is converted to true if and only if its length is non-zero. If the result is a node-set, it is converted to true if and only if it is not empty.
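The predicate conversion rules can be sketched as a small function. This is an illustration only: the name predicate_holds is hypothetical, and modelling a node-set as a Python list is an assumption of the sketch.

```python
def predicate_holds(result, context_position: int) -> bool:
    """Convert the result of a predicate expression to a boolean."""
    if isinstance(result, bool):
        # Already a boolean (checked first, since bool is a subclass of int).
        return result
    if isinstance(result, (int, float)):
        # A number is true only if it equals the context position: p[3].
        return result == context_position
    if isinstance(result, str):
        # A string is true if and only if its length is non-zero.
        return len(result) > 0
    # Anything else is treated as a node-set (here: a list of nodes).
    return len(result) > 0
```

With this reading, predicate_holds(3, 3) corresponds to p[3] matching the third p sibling.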
If no matching template is available, a default rule is applied. One of three default rules can be used: 'ignore' (to ignore all no-match events), 'copy' (to copy all no-match events unmodified), and 'text' (to copy all text children of non-matching events). The default rule can be set from the stylesheet (see 4.1 Transformation Options). This feature makes it easy to copy a document with only a few changes, or to select just a few items from a document. The default behavior is to ignore all non-matching events.
It is possible that the current context matches more than one rule within a precedence category. The template rule to be used is then determined according to the same rules as in XSLT (see [XSLT], 5.5). All rules have a computed priority value. The computed priority can be overridden with a 'priority' attribute value (see 4.3 Templates).
If the pattern contains multiple alternatives separated with '|', then it is treated equivalently to a set of template rules, one for each alternative.
If the pattern has the form of a qualified name or has the form either of processing-instruction(target) or cdata(), then the priority is 0.
If the pattern has the form pre:* or *:lname, then the priority is -0.25.
If the pattern consists of just a node test other than cdata(), then the priority is -0.5.
Otherwise, the priority is 0.5.
The rule with the highest priority is used. If there is more than one matching template rule with the highest priority, an STX processor must choose the rule that occurs last in the stylesheet.
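The priority rules above can be sketched as follows. This is a rough illustration, not a conforming implementation: the regular expressions only approximate XML names, attribute patterns are not treated specially, splitting a pattern into its '|' alternatives is assumed to have happened already, and the names default_priority and pick are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Simplified approximation of an XML NCName (assumption of this sketch).
NCNAME = r"[A-Za-z_][\w.\-]*"
QNAME = re.compile(rf"(?:{NCNAME}:)?{NCNAME}")
PI_WITH_TARGET = re.compile(r"processing-instruction\(\s*[^\s()]+\s*\)")
PARTIAL_WILDCARD = re.compile(rf"(?:{NCNAME}:\*|\*:{NCNAME})")
NODE_TESTS = {"*", "node()", "text()", "comment()", "processing-instruction()"}


def default_priority(alternative: str) -> float:
    """Computed priority of a single pattern alternative (no '|')."""
    p = alternative.strip()
    if QNAME.fullmatch(p) or PI_WITH_TARGET.fullmatch(p) or p == "cdata()":
        return 0.0
    if PARTIAL_WILDCARD.fullmatch(p):
        return -0.25
    if p in NODE_TESTS:
        return -0.5
    return 0.5


def pick(rules: List[Tuple[str, Optional[float]]]) -> int:
    """Index of the winning rule; rules are (pattern, explicit priority) in stylesheet order."""
    if not rules:
        raise ValueError("no matching rules")
    best_priority, best_index = None, -1
    for index, (pattern, explicit) in enumerate(rules):
        priority = explicit if explicit is not None else default_priority(pattern)
        # '>=' lets a later rule win a tie, i.e. the last one in the stylesheet.
        if best_priority is None or priority >= best_priority:
            best_priority, best_index = priority, index
    return best_index
```

The list passed to pick is assumed to contain only rules that already matched the current context within one precedence category.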
<!-- Category: root --> <stx:transform version = number> <!-- Content: top-level-elements --> </stx:transform>
The root element of a stylesheet must be stx:transform.
The version attribute contains a version number to distinguish language versions; this attribute is mandatory and its value must be '1.0' for this version of the language.
The stx:transform element can contain the following children from the STX namespace. These elements are called top-level elements:
All top-level elements, with the sole exception of stx:options, can be present in multiple instances. The stx:options and stx:namespace-alias elements are allowed as top-level elements only.
Templates can be organized into groups using the stx:group element. Groups of templates then play a role in template matching (precedence categories are defined in terms of groups) and determine the scoping of variables.
Each stylesheet has a virtual default group that is considered to be the parent of top-level groups. Explicit groups are not mandatory; many transformations can be done without grouping templates. On the other hand, separating templates into groups makes it possible to define more precise transformation rules and to run complex transformations more safely, especially on well-known, regular input data.
stx:group
<!-- Category: top-level or group --> <stx:group> <!-- Content: group-elements --> </stx:group>
This element must be a child of either the stx:transform or the stx:group element.
An STX stylesheet may include another STX stylesheet using the stx:include element.
<!-- Category: top-level or group --> <stx:include href = uri-reference/>
This element must be either a top-level element or a child of the stx:group element. The stx:include element is replaced with the content of the stx:transform element of the included stylesheet, with two exceptions: stx:namespace-alias of the included stylesheet is always inserted as a top-level element (even when including into a group), and stx:options of the included stylesheet is ignored. Top-level variables and top-level templates from the included stylesheet are treated as group variables and templates when including into a group. There is no difference between templates from the main stylesheet and included templates in terms of matching precedence.
STX templates are called sequentially rather than from other templates. Paired events match one template only, which is separated into two parts: the first one is executed when the starting event appears and the second one applies to the ending event. The two parts are separated by the stx:process-children element.
<!-- Category: top-level --> <stx:options no-match-events = "ignore"|"copy"|"text" recognize-cdata = "yes"|"no" default-stxpath-namespace = uri-reference strip-space = "yes"|"no" output-encoding = string/>
Global properties of a transformation can be specified using the stx:options element.
no-match-events - This attribute specifies the default rule for events for which no matching template is found. These events are either ignored ("ignore", the default) or copied to the output without modification ("copy"). For "text", only text nodes are copied to the output.
recognize-cdata - This attribute specifies whether CDATA boundaries are recognized during the transformation. If so, the node kind test cdata() can be used in STXPath expressions. Otherwise (recognize-cdata="no"), the cdata() kind test never matches in STXPath expressions. The default value is "yes".
default-stxpath-namespace - This optional attribute specifies a namespace used for unprefixed STXPath paths and patterns. No namespace is used by default.
strip-space - This optional attribute specifies whether whitespace text nodes are stripped from the input data stream. Whitespace text nodes are text nodes containing nothing but the following characters: #x20, #x9, #xD or #xA. The default value is "no".
output-encoding - This optional attribute specifies the preferred output encoding of the resulting byte stream. The value of this attribute should be treated case-insensitively; the value must contain only printable ASCII characters (#x21 - #x7E); the value should either be a charset registered with the Internet Assigned Numbers Authority [IANA] or "#input".
If the value of this attribute is "#input" or the attribute is not present, the output encoding should be the same as the input encoding. In the event an STX processor is not able to detect the input encoding, UTF-8 must be used as the output encoding. A compliant STX processor is not required to support any particular encoding other than UTF-8.
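Two of the options above lend themselves to a short sketch: the whitespace test behind strip-space and the choice of the output encoding. The function names is_whitespace_only and resolve_output_encoding are hypothetical, and rejecting an empty attribute value is an assumption of this sketch rather than something stated in the text.

```python
from typing import Optional

# The four characters that make up a whitespace text node: #x20, #x9, #xD, #xA.
XML_WHITESPACE = set("\x20\x09\x0d\x0a")


def is_whitespace_only(text: str) -> bool:
    """True if the text contains nothing but the four whitespace characters."""
    return all(ch in XML_WHITESPACE for ch in text)


def resolve_output_encoding(requested: Optional[str], detected_input: Optional[str]) -> str:
    """Pick the output encoding from the attribute value and the detected input encoding."""
    if requested is not None:
        # Assumption: an empty value is rejected together with non-printable characters.
        if not requested or any(not (0x21 <= ord(ch) <= 0x7E) for ch in requested):
            raise ValueError("output-encoding must contain only printable ASCII characters")
        # The value is compared case-insensitively.
        if requested.lower() != "#input":
            return requested
    # "#input" or no attribute: reuse the input encoding, falling back to UTF-8.
    return detected_input if detected_input else "UTF-8"
```

The sketch does not check whether the requested charset is registered with IANA or supported by the processor.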
<!-- Category: top-level --> <stx:namespace-alias source-prefix = ncname|"#default" result-prefix = ncname|"#default"/>
Namespaces from the input stream can be mapped to other namespaces in the result stream using the stx:namespace-alias element. Both attributes are mandatory and can contain either a prefix bound to the namespace to be used or the "#default" keyword for the default namespace.
<!-- Category: top-level or group --> <stx:template match = pattern priority = number visibility = "private"|"public"|"global" recursion-entry-point = "yes"|"no" mode = qname> <!-- Content: template --> </stx:template>
Rules to process input events are written in templates. The stx:template element must be a child of either the stx:transform or the stx:group element. Templates are matched against events by means of precedence categories and the pattern in the match attribute. The optional priority attribute can contain a priority value used for matching (see 2.3 Match Patterns).
The visibility attribute specifies whether the template is visible from the current group (and thus can match the next event). Private templates are visible in their group only, public templates are visible from parent groups, and global templates are visible from any group. The default value is "private".
The mode attribute restricts the template to nodes received under the same mode (within process-children, process-attributes or process-self with the same value of the mode attribute).
The recursion-entry-point attribute specifies whether the template creates new instances of group variables. The default value is "no". A new set of group variables is created for each instantiated template with recursion-entry-point="yes". These variables shadow their former values and exist as long as the template is being processed.
The content of templates may include both STX instructions and literal elements. Literal elements are simply copied to the output.
<!-- Category: top-level or group --> <stx:procedure visibility = "private"|"public"|"global" recursion-entry-point = "yes"|"no" name = qname> <!-- Content: template --> </stx:procedure>
Procedures are sub-templates that can be called by name (with the stx:call-procedure element). The visibility and recursion-entry-point attributes have the same meaning as for templates. Only visible procedures can be called by name; recursion-entry-point must be set to "yes" to create new copies of group variables. It is a static error if a stylesheet contains more than one visible procedure with the same name.
The content of procedures may be the same as the content of templates.
stx:call-procedure
<!-- Category: top-level or group --> <stx:call-procedure name = qname> <!-- Content: stx:with-param* --> </stx:call-procedure>
The stx:call-procedure element makes it possible to invoke procedures by their names. The name attribute is mandatory.
Values can be passed to procedures as parameters. A parameter behaves in the same way as a local variable; thus it is only visible within the procedure it is passed to. There are two elements available to work with parameters:
stx:with-param
<!-- Category: call-procedure --> <stx:with-param name = qname select = expression> <!-- Content: text template --> </stx:with-param>
Parameters are passed to procedures using the stx:with-param element. The required name attribute specifies the name of the parameter. The value of the parameter is the result returned by an expression located either in the select attribute or in the content of this element. stx:with-param is allowed as a child of stx:call-procedure only.
<!-- Category: template --> <stx:param name = qname select = expression> <!-- Content: text template --> </stx:param>
The stx:param element is allowed in procedures only (it must be a child of stx:procedure). The required name attribute specifies the name of the parameter. The select attribute or the content of this element specifies a default value, which is used when there is no value specified using the select attribute or the content of the appropriate stx:with-param element.
<!-- Category: template --> <stx:copy attributes = pattern> <!-- Content: template --> </stx:copy>
The stx:copy element is used to copy the current node to the output. The optional attributes attribute contains a match pattern. The attributes of the current node that match the pattern are copied to the output.
Thus, attributes="@*" copies all attributes, attributes="@foo|@bar" copies the foo and bar attributes only, attributes="@*[not(name()='foo')]" copies all attributes but foo, and attributes="none" doesn't copy any attributes. The default is to copy all attributes.
<!-- Category: template --> <stx:process-children mode = qname/>
This element splits a template into two parts processed by the starting and ending events of a SAX2 event pair (start-element, end-element). There must be at most one stx:process-children element in a template matching an element event. Moreover, a template can contain only one of the stx:process-children and stx:process-self instructions at the same time; otherwise an error must be reported. This element must always be empty.
Note: If a template doesn't contain any stx:process-children instruction, the children of the current element are not processed at all. The default rule (<stx:options no-match-events = "copy|ignore">) applies only to nodes that are to be processed but for which no matching template is found.
The optional mode attribute restricts the templates that can match child events to those with the same mode (the same value of the mode attribute).
<!-- Category: template --> <stx:process-attributes mode = qname/>
This instruction is used to apply templates to attribute children of an element node. The stx:process-attributes element must always be empty.
The optional mode attribute restricts the templates that can match attribute children to those with the same mode (the same value of the mode attribute).
<!-- Category: template --> <stx:process-self mode = qname/>
This instruction is used to process the current node using the template that would have been chosen if the current template weren't present in the stylesheet. There must be at most one stx:process-self element in a template. Moreover, a template can contain only one of the stx:process-children and stx:process-self instructions at the same time; otherwise an error must be reported. The stx:process-self element must always be empty.
The optional mode attribute restricts matching templates to those with the same mode (the same value of the mode attribute).
<!-- Category: template --> <stx:value-of select = string-expression/>
This instruction emits characters to the result stream. The mandatory select attribute contains an STXPath expression evaluating to a string. This element is always empty.
<!-- Category: template --> <stx:text> <!-- Content: #PCDATA --> </stx:text>
This instruction emits literal character data to the result stream. The content is neither normalized nor stripped, even if it contains whitespace characters only. Results of consecutive stx:value-of and stx:text instructions are joined so that they emit a single character() event.
<!-- Category: template --> <stx:cdata> <!-- Content: #PCDATA --> </stx:cdata>
This instruction emits literal data as a CDATA section to the result stream. The content is neither normalized nor stripped, even if it contains whitespace characters only.
<!-- Category: template --> <stx:element name = {qname} namespace = {uri-reference}> <!-- Content: template --> </stx:element>
This instruction is used to generate an element. It has the same meaning as in [XSLT].
stx:element-start
<!-- Category: template --> <stx:element-start name = {qname} namespace = {uri-reference}/>
<!-- Category: template --> <stx:element-end name = {qname} namespace = {uri-reference}/>
There are separate instructions available to output an element start tag and an element end tag. The name attribute is required for both instructions. Both elements must be empty.
A compliant STX processor is required to produce well-formed XML output. An attempt to create an end-tag without a matching start-tag must be reported as an error by the STX processor.
stx:attribute
<!-- Category: template --> <stx:attribute name = {qname} namespace = {uri-reference} select = string-expression> <!-- Content: text template --> </stx:attribute>
This instruction is used to generate an attribute. It has the same meaning as in [XSLT]. stx:attribute must follow an element-starting instruction (stx:element, stx:element-start, stx:copy, or a literal element), and no other output-generating instructions are allowed between the element-starting instruction and stx:attribute.
<!-- Category: template --> <stx:processing-instruction name = ncname> <!-- Content: text template --> </stx:processing-instruction>
This instruction is used to generate a processing instruction. The mandatory name attribute is an attribute value template.
<!-- Category: template --> <stx:comment> <!-- Content: text template --> </stx:comment>
This instruction is used to generate a comment. It has the same meaning as in [XSLT].
<!-- Category: template --> <stx:if test = boolean-expression> <!-- Content: template --> </stx:if>
The mandatory test attribute contains an STXPath expression evaluating to a boolean. The content template is instantiated if and only if the test attribute has evaluated to true.
<!-- Category: template --> <stx:else> <!-- Content: template --> </stx:else>
This instruction must follow immediately after stx:if; an error must be reported otherwise. The content template is instantiated if and only if the test attribute of the preceding stx:if instruction has evaluated to false.
<!-- Category: template --> <stx:choose> <stx:when test = boolean-expression> <!-- Content: template --> </stx:when>+ <stx:otherwise> <!-- Content: template --> </stx:otherwise>? </stx:choose>
This instruction has the same meaning as in [XSLT].
There are four atomic data types in STX:
string
number
boolean
node
There are seven types of node recognized in STXPath. For every type of node, there is a way of determining a string-value. Since descendants are not available at the time of processing, the string-value is not defined for some types of nodes.
root nodes - there is no string value defined for root nodes, an error is reported
element nodes - there is no string value defined for element nodes, an error is reported
attribute nodes - the string-value of an attribute is the normalized value of this attribute
text nodes - the string-value of a text node is the character data of this node
cdata nodes - the string-value of a cdata node is the character data of this node
processing instruction nodes - the string-value of a processing instruction node is the part of the processing instruction following the target and any whitespace, not including the terminating ?>
comment nodes - the string-value of a comment is the content of this comment, not including the opening <!-- or the closing -->
STXPath expressions (see 6 Expressions) always return a sequence. A sequence is an ordered collection of zero or more items. Unlike common lists, sequences are "flat"; sequences may not contain other sequences. Sequences may contain duplicate items. An item must be of one of the atomic types: string, number, boolean, or node.
A sequence with zero items is called an empty sequence. A sequence with exactly one item is called a singleton sequence. There is no distinction between an item and a singleton sequence containing this item; an item is equivalent to a singleton sequence containing this item and vice versa. A sequence has no identity. Equality comparison of sequences is performed only by comparing items of the sequences.
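The "flat" nature of sequences can be illustrated with a small sketch. Representing a sequence as a Python list is an assumption of this example, and make_sequence is a hypothetical helper, not an STXPath function.

```python
def make_sequence(*parts):
    """Build a flat sequence: nested sequences are spliced in, never nested."""
    flat = []
    for part in parts:
        if isinstance(part, (list, tuple)):
            # A sequence inside a sequence contributes its items, one by one.
            flat.extend(make_sequence(*part))
        else:
            flat.append(part)
    return flat
```

Under this model an item and the singleton sequence holding it build the same value, duplicates are preserved, and equality is plain item-by-item comparison of the resulting lists.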
Certain operators, functions, and syntactic constructs expect a value of a particular type to be supplied: this type is referred to as a required type. In such an event, a general sequence is converted to the required type according to the conversion rules.
The empty sequence is converted to required types as defined in the following table:
required type | empty sequence |
---|---|
boolean | false |
string | empty string |
number | NaN |
node | ERROR |
A singleton sequence is converted to a required type according to the type of the only member of the sequence:
member type | converted to boolean | converted to string | converted to number | converted to node |
---|---|---|---|---|
boolean | - | false is converted to 'false', true is converted to 'true' | false is converted to 0, true is converted to 1 | ERROR |
string | 'false', '0', empty string are converted to false, other strings are converted to true | - | a string that consists of optional whitespace followed by an optional minus sign followed by a numeric literal (see 6.2 Literals) followed by whitespace is converted to the number that is nearest to the mathematical value represented by the string; any other string is converted to NaN. | ERROR |
number | 0, +0, -0, NaN are converted to false, other numbers are converted to true | NaN is converted to 'NaN', +0 and -0 are converted to '0', positive infinity is converted to 'Infinity', negative infinity is converted to '-Infinity'. Other numbers are represented in decimal form as numeric literal (see 6.2 Literals) with no leading zeros (apart possibly from the one required digit immediately before the decimal point), preceded by a minus sign (-) if the number is negative. | - | ERROR |
node | a node is converted to true | a node is converted to its string value (see 6 Expressions) | a node is converted to its string value (see 6 Expressions); then the rules to convert strings to numbers are applied to convert the string value to a number | - |
A sequence containing more than one item is converted according to its very first item; all other items are ignored. The same conversion rules as for singleton sequences are applied (see the table above).
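The conversions to boolean and to number can be sketched as follows. This is an illustration under stated assumptions: sequences are modelled as Python lists, the Node class with a string_value field is hypothetical, the numeric literal syntax is assumed to be digits with an optional fractional part (the definition in 6.2 Literals is authoritative), and trailing whitespace is treated as optional.

```python
import math
import re
from dataclasses import dataclass


@dataclass
class Node:
    string_value: str  # hypothetical stand-in for a node's string-value


# Optional whitespace, optional minus sign, assumed numeric literal, optional whitespace.
NUMBER = re.compile(r"[ \t\r\n]*-?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)[ \t\r\n]*")


def string_to_number(text: str) -> float:
    """Parse a string as a number, yielding NaN for anything else."""
    return float(text) if NUMBER.fullmatch(text) else math.nan


def to_boolean(seq: list) -> bool:
    if not seq:
        return False                      # empty sequence
    item = seq[0]                         # only the first item counts
    if isinstance(item, bool):
        return item
    if isinstance(item, str):
        return item not in ("false", "0", "")
    if isinstance(item, (int, float)):
        return not (item == 0 or math.isnan(item))
    return True                           # a node is converted to true


def to_number(seq: list) -> float:
    if not seq:
        return math.nan                   # empty sequence
    item = seq[0]
    if isinstance(item, bool):
        return 1.0 if item else 0.0
    if isinstance(item, str):
        return string_to_number(item)
    if isinstance(item, (int, float)):
        return float(item)
    return string_to_number(item.string_value)  # node: via its string value
```

Conversion to string and the error cases for the node type are left out to keep the sketch short.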
STX uses an expression language of its own called STXPath. At first sight, STXPath is very similar to [XPath]. Syntactically, STXPath is close to a subset of [XPath2]. However, since STX has a different notion of context, the meaning of some expressions may be different in STXPath and in XPath. Consider the following example:
In XPath, the expression /node1/node2 returns a node-set containing all node2 elements whose parent node1 is the document element. In STXPath, on the contrary, the same expression returns only a single node from this node-set: the one which is an ancestor of the current node.
Expressions are used in STX as match patterns, to specify conditions for different ways of processing of the current node, to generate text to be inserted to the output stream, or to access data from the ancestor stack.
Each expression has its static context - the information that is available during static analysis of the expression, prior to its evaluation. The static context includes in-scope namespaces, default namespace for element names, and in-scope variables. The information that is available at the time when the expression is evaluated is the current context as defined in 2.1 Context.
Basic primitives of STXPath include:
variables (6.1 Variables)
literals (6.2 Literals)
parenthesized expressions (6.3 Parenthesized Expressions)
functions (6.4 Functions)
Expressions evaluate to string, number, boolean, or node-set. See the standalone STXPath grammar BNF definition for details.
STX variables are scoped statically according to the literal structure of stylesheets. The grouping of templates makes it possible to share variables other than global ones.
There are two types of variables:
group variables - stx:variable is a child of either stx:transform or stx:group. Top-level variables are considered to be members of the top-most default group that exists for each stylesheet.
local variables - Declared within templates.
A group variable is visible for the group where the variable is declared, for all descendant groups and for all templates belonging to these groups. A local variable is visible for all following siblings of the variable declaration and their descendants. Group variables may be shadowed (another variable with the same name is visible) by descendant group variables and by local variables. It is an error to redeclare a variable with the same name in the same group or template.
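The visibility and shadowing rules can be sketched with a chain of scopes. The Scope class and its method names are hypothetical; the sketch models only nesting (group, descendant group, template) and does not model the rule that a local variable is visible to following siblings only.

```python
class Scope:
    """One group or template; 'parent' is the enclosing group, if any."""

    def __init__(self, parent=None):
        self.parent = parent
        self.values = {}

    def declare(self, name, value=()):
        # Redeclaring a name in the same group or template is an error.
        if name in self.values:
            raise ValueError(f"variable {name!r} already declared in this scope")
        self.values[name] = value

    def lookup(self, name):
        # The innermost declaration shadows declarations in enclosing scopes.
        scope = self
        while scope is not None:
            if name in scope.values:
                return scope.values[name]
            scope = scope.parent
        raise NameError(f"variable {name!r} is not declared")
```

A template scope created with a group scope as its parent sees the group's variables until it declares one of the same name.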
Variables always contain a sequence. The STX instructions stx:variable and stx:assign are used to evaluate an expression and store its value in a variable.
Since variables are re-assignable, each variable must be declared using the stx:variable element before it is used (assigned or referenced). Group variables are statically initialized while parsing the stylesheet; only the static context information is available during the initialization. Local variables are initialized at run time. A variable declared with no value is initialized with the empty sequence.
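The visibility, shadowing, and declare-before-use rules above can be modelled as a chain of scopes. The following Python sketch is illustrative only: the `Scope` class and its method names are hypothetical and not part of STX, and the "following siblings" rule for local variables is not modelled.

```python
class Scope:
    """One group or template scope; `parent` is the enclosing group (hypothetical model)."""

    def __init__(self, parent=None):
        self.parent = parent
        self.variables = {}

    def declare(self, name, value=()):
        # Redeclaring a name in the same group or template is an error.
        if name in self.variables:
            raise ValueError(f"variable '{name}' already declared in this scope")
        # A variable declared with no value holds the empty sequence.
        self.variables[name] = tuple(value)

    def _find(self, name):
        # Walk outwards; the innermost declaration shadows outer ones.
        scope = self
        while scope is not None:
            if name in scope.variables:
                return scope
            scope = scope.parent
        raise KeyError(f"variable '{name}' is not declared")

    def lookup(self, name):
        return self._find(name).variables[name]

    def assign(self, name, value):
        # Assignment requires a prior declaration somewhere in the chain.
        self._find(name).variables[name] = tuple(value)


top = Scope()                      # top-most default group
top.declare("count", (0,))
group = Scope(parent=top)          # a nested group
group.declare("count", (10,))      # shadows the top-level variable
template = Scope(parent=group)     # a template inside that group
template.assign("count", (11,))    # updates the group's variable
```

Here the assignment made from the template changes the variable declared in the nested group, while the shadowed top-level variable keeps its value.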
<!-- Category: top-level or group or template -->
<stx:variable
  name = qname
  select = expression
  keep-value = "yes"|"no">
  <!-- Content: text template -->
</stx:variable>
This instruction is used to declare and initialize a variable. The mandatory name attribute contains the name of the variable. An expression in the select attribute is evaluated and the variable is initialized with its result. The select attribute is optional; if it is missing, the variable is initialized with the string resulting from the content of the stx:variable element. If the content is empty (the stx:variable element has no children), the variable is initialized with the empty sequence.
The optional keep-value attribute specifies whether a new copy of the variable created by recursion is initialized with the value of the shadowed variable (yes) or not (no). The default value is no. If there is no shadowed variable yet, the keep-value attribute is ignored.
<!-- Category: top-level or group or template -->
<stx:assign
  name = qname
  select = expression>
  <!-- Content: text template -->
</stx:assign>
This instruction is used to assign a new value to a previously declared variable. The mandatory name attribute contains the name of the variable. An expression in the select attribute is evaluated and its result is assigned to the variable. If the select attribute is missing, the string resulting from the content of the stx:assign element is assigned to the variable. If the content is empty, the empty sequence is assigned to the variable.
A literal is a direct syntactic representation of an atomic value. STXPath supports two kinds of literals: string literals and numeric literals.
The value of a string literal is a singleton sequence containing an item whose atomic type is string and whose value is the string denoted by the characters between the delimiting quotation marks.
StringLiteral ::= (["][^"]*["]) | (['][^']*['])
The value of a numeric literal is a singleton sequence containing an item whose type is number and whose value is obtained by parsing the numeric literal according to the rules for string to numbers conversion (see 5.3 Type Conversions).
NumericLiteral ::= IntegerLiteral | DecimalLiteral | DoubleLiteral
IntegerLiteral ::= Digits
DecimalLiteral ::= ('.' Digits) | (Digits '.' [0-9]*)
DoubleLiteral ::= (('.' Digits) | (Digits ('.' [0-9]*)?))([e]|[E])([+]|[-])? Digits
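As a rough check of the numeric literal grammar, its productions can be transcribed into regular expressions. The Python sketch below assumes that Digits stands for one or more decimal digits (the production is not shown here); the function name `is_numeric_literal` is hypothetical.

```python
import re

DIGITS = r"[0-9]+"  # assumption: Digits is one or more decimal digits
INTEGER = DIGITS
DECIMAL = rf"(?:\.{DIGITS}|{DIGITS}\.[0-9]*)"
DOUBLE = rf"(?:\.{DIGITS}|{DIGITS}(?:\.[0-9]*)?)[eE][+-]?{DIGITS}"
NUMERIC = re.compile(rf"(?:{DOUBLE}|{DECIMAL}|{INTEGER})")


def is_numeric_literal(token):
    """Return True if the whole token matches the NumericLiteral production."""
    return NUMERIC.fullmatch(token) is not None
```

Under this reading, `42`, `.5`, `1.` and `1.5e-3` are numeric literals, while `.`, `e5` and `-1` are not (a leading minus is a unary operator, not part of the literal).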
Parentheses may be used to enforce a particular evaluation order in expressions that contain multiple operators.
Parentheses are also used as delimiters in constructing a sequence, as described in 6.6 Sequence Expressions.
A function call consists of a function name followed by a parenthesized list of zero or more expressions. The expressions inside the parentheses provide the arguments of the function call. The number of arguments must be equal to the number of function parameters; otherwise a static error is raised.
A function call is evaluated as follows:
Each argument expression is evaluated, producing an argument value (sequence).
If the corresponding function parameter has a required type, the argument value is converted to this type.
The function is executed using the converted argument values. The result is a value of the function's declared return type.
The following list of STXPath functions is categorized by required types of primary arguments:
The empty() function returns true if the argument is the empty sequence; otherwise it returns false.
The item-at() function returns the item from the first argument sequence at the position given by the second argument. The index number is rounded to the nearest integer if necessary. If the sequence is the empty sequence, this function returns the empty sequence. If the value of index is greater than the number of items in the sequence, or is less than or equal to zero, then the function reports an error.
The sublist() function returns the contiguous sequence of items from the first argument (source sequence) beginning at the position specified by the second argument (index) and continuing for the number of items indicated by the third argument (length). If length is not specified, then the sublist identifies items to the end of the source sequence. The index and length numbers are rounded to the nearest integers if necessary. If the source sequence is the empty sequence, this function returns the empty sequence. If the value of index is greater than the number of items in the sequence, or is less than or equal to zero, then the function reports an error. The length can be greater than the number of items in the source sequence following the beginning position, in which case the sublist identifies items to the end of the source sequence.
The count() function returns the number of items in the sequence.
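The behaviour of item-at() and sublist() described above can be sketched in Python, with tuples standing in for sequences. This is an illustration under stated assumptions, not a reference implementation: ties in "rounded to the nearest integer" are assumed to go upwards (as for round()), and a negative length is assumed to select nothing.

```python
import math


def to_index(value):
    # Round to the nearest integer; ties go up (assumption, mirrors round()).
    return math.floor(value + 0.5)


def item_at(seq, index):
    if not seq:
        return ()                      # empty source yields the empty sequence
    i = to_index(index)
    if i <= 0 or i > len(seq):
        raise IndexError("item-at(): index out of range")
    return (seq[i - 1],)               # positions are 1-based


def sublist(seq, index, length=None):
    if not seq:
        return ()
    start = to_index(index)
    if start <= 0 or start > len(seq):
        raise IndexError("sublist(): index out of range")
    if length is None:
        return tuple(seq[start - 1:])  # no length: run to the end
    n = max(to_index(length), 0)       # assumption: negative length selects nothing
    return tuple(seq[start - 1:start - 1 + n])
```

For example, `sublist((1, 2, 3), 2, 10)` yields `(2, 3)`: an over-long length is not an error, whereas an out-of-range index is.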
The name() function returns a string containing a qualified name representing the expanded-name of the node in the argument.
The namespace() function returns the namespace URI of the expanded-name of the node in the argument.
The local-name() function returns the local part of the expanded-name of the node in the argument.
The prefix() function returns the prefix of the expanded-name of the node in the argument.
The position() function returns a number equal to the position of the current node relative to other siblings; see 2.1 Context for details of position() semantics.
The get-node() function returns the node which is in the ancestor stack at the level given by the argument. The level number is rounded to the nearest integer if necessary. For example, get-node(0) returns the root of the document, get-node(1) returns the document element, and get-node(level()) returns the current node. If there is no node at the requested level in the ancestor stack, the function returns the empty sequence.
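A minimal Python sketch of get-node(), assuming the ancestor stack is held as a list whose index is the level (root at 0, current node last). The list of strings is a hypothetical stand-in for real nodes, and ties in rounding are assumed to go upwards.

```python
import math


def get_node(ancestor_stack, level):
    """Return a singleton tuple with the node at `level`, or () if there is none."""
    i = math.floor(level + 0.5)  # round to the nearest integer (ties up: assumption)
    if 0 <= i < len(ancestor_stack):
        return (ancestor_stack[i],)
    return ()


# Root, document element, an intermediate element, and the current node.
stack = ["/", "doc", "chapter", "para"]
current_level = len(stack) - 1  # plays the role of level()
```

With this stack, level 0 gives the root, level 1 the document element, and `current_level` the current node; any other level gives the empty sequence.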
The has-child-nodes() function returns true if and only if the current node is the document node or an element node and has child nodes (it is not empty). It returns false otherwise.
The true() function always returns true.
The false() function always returns false.
The not() function reduces its parameter to an effective boolean value using the same rules that are used for the operands of logical expressions (see 6.9 Logical Expressions). It then returns true if the effective boolean value of its parameter is false, and false if the effective boolean value of its parameter is true.
The starts-with() function returns true if the first argument string starts with the second argument string, otherwise it returns false. If the value of any argument is the empty sequence, the function returns the empty sequence.
The contains() function returns true if the first argument string contains the second argument string; otherwise it returns false. If the value of any argument is the empty sequence, the function returns the empty sequence.
The substring() function returns the part of the first argument string that starts at the offset given by the second argument and has the length given by the third argument. If the third argument is omitted, it returns all characters from the offset to the end of the string. The offset and length numbers are rounded to the nearest integer if necessary. The offset of the first character is 1. If the value of any argument is the empty sequence, the function returns the empty sequence.
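The 1-based offset and the rounding rule of substring() are easy to get wrong, so a short Python sketch may help. It uses `()` for the empty sequence and assumes that ties round upwards and that offsets below 1 are handled as in XPath 1.0 (positions before the first character simply select nothing); neither assumption is stated in the text above.

```python
import math

EMPTY = ()  # stand-in for the empty sequence


def round_half_up(x):
    return math.floor(x + 0.5)


def substring(s, offset, length=None):
    if s == EMPTY or offset == EMPTY or length == EMPTY:
        return EMPTY
    start = round_half_up(offset)
    if length is None:
        return s[max(start, 1) - 1:]
    end = start + round_half_up(length)  # first position that is NOT included
    return s[max(start, 1) - 1:max(end, 1) - 1]
```

For instance, `substring("12345", 2, 3)` gives `"234"` and `substring("12345", 1.5, 2.6)` gives the same result, since both arguments are rounded first.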
The substring-before() function returns the part of the first argument string from the beginning of the string up to (but not including) the first occurrence of the second argument string. The empty string is returned if the first argument string does not contain the second argument string. If the value of any argument is the empty sequence, the function returns the empty sequence.
The substring-after() function returns the part of the first argument string from the end of the first occurrence of the second argument string to the end of the (first) string. The empty string is returned if the first argument string does not contain the second argument string. If the value of any argument is the empty sequence, the function returns the empty sequence.
The string-length() function returns the number of characters in a string. If the value of the argument is the empty sequence, the function returns the empty sequence.
The normalize-space() function returns the argument string after leading and trailing whitespace is stripped and each run of consecutive whitespace characters is replaced with a single space. If the value of the argument is the empty sequence, the function returns the empty sequence.
The translate() function returns the first argument string with occurrences of characters in the second argument string replaced by the corresponding characters from the third argument string. If there is a character in the second argument string with no character at a corresponding position in the third argument string (because the second argument string is longer than the third argument string), then occurrences of that character in the first argument string are removed. If a character occurs more than once in the second argument string, then the first occurrence determines the replacement character. If the third argument string is longer than the second argument string, then excess characters are ignored. If the value of any argument is the empty sequence, the function returns the empty sequence.
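The three special cases of translate() (removal, duplicate source characters, excess target characters) can be illustrated with a small Python sketch. `()` stands in for the empty sequence; the function is a model of the prose above, not part of any STX implementation.

```python
EMPTY = ()  # stand-in for the empty sequence


def translate(s, source, target):
    if EMPTY in (s, source, target):
        return EMPTY
    mapping = {}
    for i, ch in enumerate(source):
        if ch in mapping:
            continue  # the first occurrence determines the replacement
        # None marks a character that has no counterpart and is removed.
        mapping[ch] = target[i] if i < len(target) else None
    out = []
    for ch in s:
        if ch not in mapping:
            out.append(ch)            # untouched character
        elif mapping[ch] is not None:
            out.append(mapping[ch])   # replaced character
    return "".join(out)
```

For example, `translate("--aaa--", "abc-", "ABC")` removes the hyphens and returns `"AAA"`, because `-` has no corresponding character in the third argument.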
The concat() function returns the concatenation of its arguments. If the value of any argument is the empty sequence, the function returns the empty sequence.
The replace() function returns the first argument string with parts that match a regular expression given in the second argument string replaced with the third argument string. The regular expression semantics as defined in XML Schema Part 2: Datatypes ([XSD2]), Appendix F is used.
The fourth optional argument is a string consisting of character flags to be used by the match. If a character is present then that flag is true. The flags are:
g - global replace
All occurrences of the regular expression in the string are replaced. If this character is not present, then only the first occurrence of the regular expression is replaced.
i - case insensitive
The regular expression is treated as case insensitive. If this character is not present, then the regular expression is case sensitive.
If the value of any argument is the empty sequence, the function returns the empty sequence.
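The effect of the g and i flags can be sketched with Python's `re` module. This is only an approximation: Python regular expressions are not the XML Schema dialect that replace() is specified to use, and the replacement string is treated here as literal text (an assumption). `()` stands in for the empty sequence.

```python
import re

EMPTY = ()  # stand-in for the empty sequence


def replace(s, pattern, replacement, flags=""):
    if EMPTY in (s, pattern, replacement, flags):
        return EMPTY
    re_flags = re.IGNORECASE if "i" in flags else 0
    count = 0 if "g" in flags else 1  # in re.sub, count=0 means "replace all"
    # A function replacement avoids backslash processing of `replacement`.
    return re.sub(pattern, lambda m: replacement, s, count=count, flags=re_flags)
```

Without flags only the first match is replaced, so `replace("banana", "an", "AN")` gives `"bANana"`, while adding `"g"` gives `"bANANa"`.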
The match() function returns a list of integers that identify the offset of the location within the value of the first argument string that is matched by the regular expression that is the value of the second argument string. If there is no substring of the first string that matches the regular expression, the empty sequence is returned. Otherwise, a sequence of two integers is returned: the first integer is the position of the start of the substring and the second integer is the length of the substring that matches. The regular expression semantics as defined in XML Schema Part 2: Datatypes ([XSD2]), Appendix F is used.
The third optional argument is a string consisting of character flags to be used by the match. If a character is present then that flag is true. The flags are:
g - global replace
All occurrences of the regular expression in the string are replaced. If this character is not present, then only the first occurrence of the regular expression is replaced.
i - case insensitive
The regular expression is treated as case insensitive. If this character is not present, then the regular expression is case sensitive.
If the value of any argument is the empty sequence, the function returns the empty sequence.
The floor() function returns the largest number that is not greater than the argument and that is an integer. If the value of the argument is the empty sequence, the function returns the empty sequence.
The ceiling() function returns the smallest number that is not less than the argument and that is an integer. If the value of the argument is the empty sequence, the function returns the empty sequence.
The round() function returns the number that is closest to the argument and that is an integer. If there are two such numbers, then the greater one is returned. If the argument is NaN, then NaN is returned. If the value of the argument is the empty sequence, the function returns the empty sequence.
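The tie-breaking rule of round() differs from the default rounding in some languages (Python's built-in `round(2.5)` returns 2, for example). A minimal Python sketch of the rule as described, with `()` standing in for the empty sequence; passing infinities through unchanged is an assumption, since the text does not mention them.

```python
import math

EMPTY = ()  # stand-in for the empty sequence


def stx_round(x):
    if x == EMPTY:
        return EMPTY
    if math.isnan(x) or math.isinf(x):
        return x  # NaN stays NaN; infinities pass through (assumption)
    # Of two equally close integers, the greater one is chosen.
    return math.floor(x + 0.5)
```

So 2.5 rounds to 3 and -2.5 rounds to -2, in both cases the greater of the two nearest integers.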
The sum() function returns the sum, for each item in the argument sequence, of the result of converting the item to a number. If the value of the argument is the empty sequence, the function returns the empty sequence.
The string() function returns the result of converting the argument to a string. See 5.3 Type Conversions for details.
The number() function returns the result of converting the argument to a number.
The boolean() function returns the result of converting the argument to a boolean.
The level() function returns the level of the argument node in the ancestor stack. level() and level(.) return the level of the current node. level(/) returns 0. If the value of the argument is the empty sequence, the function returns the empty sequence.
The only data available when processing the current node is the data related to the current node, the data related to the next node, and the data related to nodes in the ancestor stack. Location paths called data accessors are used to access this data. Axes in data accessors are limited to:
parent and ancestor axes in relative location paths
child and descendant axes (abbreviated syntax only) in absolute location paths
attribute axis (abbreviated syntax only)
text() node test (child axis) for the current node
Predicates are not allowed in data accessors.
A data accessor always returns a sequence (often a singleton one). These sequences are very limited; they can contain nothing but nodes stored in the ancestor stack (the current node and its attributes, ancestor elements and their attributes) and the next node (only if the next node happens to be a text node, accessed with text()). Resulting sequences can be either passed to functions operating on sequences or converted to string, number or boolean.
Here are some examples of data accessors:
. - returns the current node
text() - returns the first text child of the current node, provided it is the very first child of the current node
parent::* - returns the parent node of the current node
ancestor::* - returns a sequence whose items are all ancestors of the current node
@foo - returns the foo attribute of the current node
ancestor::*/@bar - returns a sequence of bar attributes of ancestors of the current node
/aaa/bbb - returns a bbb element from the ancestor stack which is a child of the aaa element, which is the root element of the ancestor stack (and hence the root element of the input document)
STXPath supports operators to construct and combine sequences. One way to construct a sequence is using a parenthesized expression (6.3 Parenthesized Expressions), which is zero or more expressions separated with the comma operator and delimited with parentheses. The parenthesized expression is evaluated by evaluating each of its constituent expressions and concatenating the resulting sequences, in order, into a single result sequence.
Here are some examples of expressions that construct sequences:
This expression is a sequence of five integers:
(10, 1, 2, 3, 4)
This expression constructs one sequence from the sequences 10, (1, 2), the empty sequence (), and (3, 4):
(10, (1, 2), (), (3, 4))
It evaluates to the sequence (10, 1, 2, 3, 4).
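The flattening behaviour of the comma operator can be modelled in a few lines of Python, using tuples for sequences. The helper name `make_sequence` is hypothetical; it assumes each operand has already been evaluated to either a single item or a flat tuple.

```python
def make_sequence(*operands):
    """Concatenate evaluated operands, in order, into one flat sequence."""
    result = []
    for operand in operands:
        if isinstance(operand, tuple):
            result.extend(operand)   # a sequence contributes its items
        else:
            result.append(operand)   # a single item contributes itself
    return tuple(result)


nested = make_sequence(10, make_sequence(1, 2), make_sequence(), make_sequence(3, 4))
```

Because inner expressions are evaluated first, sequences never nest: `nested` is the flat tuple `(10, 1, 2, 3, 4)`, and the empty sequence contributes nothing.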
STXPath provides arithmetic operators for addition, subtraction, multiplication, division, and modulus, in their usual binary and unary forms. The binary subtraction operator must be preceded by white space in order to distinguish it from a hyphen, which is a valid name character.
An arithmetic expression is evaluated by applying the following rules:
If either operand is the empty sequence, the result of the operation is the empty sequence.
Operands other than empty sequences are converted (5.3 Type Conversions) to numbers before the expression is evaluated. If the conversion fails (returns NaN), an error is reported.
Comparison expressions allow two values to be compared. STXPath provides the following general comparison operators: =, !=, <, <=, >, >=. The result of a comparison is always true or false (a singleton sequence containing one boolean item).
CompOp ::= '=' | '!=' | '<' | '<=' | '>' | '>='
The result of a comparison of sequences is defined by applying the following rules, in order:
If either operand is the empty sequence, the result is false.
The comparison A operator B is true for sequences A and B if the comparison a operator b is true for some item a in A and some item b in B.
Otherwise, A operator B is false.
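The sequence-level rules above are existential: one matching pair of items is enough. The Python sketch below models only that level and takes the item comparison as a parameter, since the item rules depend on the type conversions of section 5.3. The function name is hypothetical.

```python
import operator


def compare_sequences(a, b, item_compare):
    """General comparison of two sequences (tuples) with a given item-level test."""
    if not a or not b:
        return False  # either operand empty: the result is false
    # True as soon as some pair (x, y) satisfies the item comparison.
    return any(item_compare(x, y) for x in a for y in b)


equal = compare_sequences((1, 2), (2, 3), operator.eq)
not_equal = compare_sequences((1, 2), (2, 3), operator.ne)
```

A consequence worth noting is that `=` and `!=` are not opposites for sequences: in the example both `equal` and `not_equal` are true, and for an empty operand both would be false.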
The result of a comparison of items is defined by applying the following rules. The rules defined in 5.3 Type Conversions apply for conversions:
If both items to be compared are nodes, then the comparison will be true if and only if the result of performing the comparison on the string-values of the two nodes is true.
If one item to be compared is a node and the other is a number, then the comparison will be true if and only if the result of performing the comparison on the number and on the result of converting the string-value of that node to a number is true.
If one item to be compared is a node and the other is a string, then the comparison will be true if and only if the result of performing the comparison on the string-value of the node and the other string is true.
If one item to be compared is a node and the other is a boolean, then the comparison will be true if and only if the result of performing the comparison of true and the boolean value is true.
When neither item to be compared is a node and the operator is = or !=, then the items are compared by converting them to a common type as follows and then comparing them. If at least one item to be compared is a boolean, then each item to be compared is converted to a boolean. Otherwise, if at least one item to be compared is a number, then each item to be compared is converted to a number. Otherwise, both items to be compared are converted to strings.
When neither item to be compared is a node and the operator is <=, <, >= or >, then the items are compared by converting both items to numbers and comparing the numbers.
STXPath provides two common logical operators: and and or. The value of a logical expression is always one of the boolean values true or false (a singleton sequence containing a boolean item).
A logical expression is evaluated by reducing each of its operands to an effective boolean value by applying the following rules, in order:
If the operand is the empty sequence, its effective boolean value is false.
If the operand is a singleton sequence containing a boolean item, the item serves as the effective boolean value.
If the operand is a sequence that contains at least one node, its effective boolean value is true.
In any other case, operands are converted to boolean (see 5.3 Type Conversions) to get effective boolean values.
An AND expression returns true if the effective boolean values of both of its operands are true; otherwise it returns false.
An OR expression returns false if the effective boolean values of both of its operands are false; otherwise it returns true.
In addition to logical expressions, STXPath provides a function named not() that takes a general sequence as parameter and returns a boolean value.
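The ordered rules for the effective boolean value, together with and, or and not(), can be sketched in Python as follows. The `Node` class is a hypothetical stand-in for node items, and the final conversion step is passed in as `to_boolean` because the conversion rules of 5.3 Type Conversions are outside this sketch; the `demo_to_boolean` helper is a placeholder used only for the example.

```python
class Node:
    """Hypothetical stand-in for a node item."""

    def __init__(self, name):
        self.name = name


def effective_boolean_value(seq, to_boolean):
    if len(seq) == 0:
        return False                                  # rule 1: empty sequence
    if len(seq) == 1 and isinstance(seq[0], bool):
        return seq[0]                                 # rule 2: singleton boolean
    if any(isinstance(item, Node) for item in seq):
        return True                                   # rule 3: contains a node
    return to_boolean(seq)                            # rule 4: type conversion


def stx_and(a, b, to_boolean):
    return effective_boolean_value(a, to_boolean) and effective_boolean_value(b, to_boolean)


def stx_or(a, b, to_boolean):
    return effective_boolean_value(a, to_boolean) or effective_boolean_value(b, to_boolean)


def stx_not(a, to_boolean):
    return not effective_boolean_value(a, to_boolean)


def demo_to_boolean(seq):
    # Placeholder conversion for the example only; not the rules of 5.3.
    return bool(seq[0])
```

Note that the order matters: a sequence such as `(Node("a"), False)` is true under rule 3, even though it contains a false boolean item.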
Plain list only so far:
stx:transform stx:options stx:include stx:namespace-alias stx:template stx:procedure stx:group stx:call-procedure stx:copy stx:process-children stx:process-attributes stx:process-self stx:value-of stx:text stx:cdata stx:element stx:element-start stx:element-end stx:processing-instruction stx:comment stx:attribute stx:if stx:else stx:choose stx:when stx:otherwise stx:variable stx:assign stx:with-param stx:param stx:for-each
A text template is defined as the content of certain elements (stx:attribute, stx:variable, stx:param, stx:with-param, stx:assign, stx:processing-instruction, stx:comment). A text template is a restricted form of a template that can contain only the following elements:
stx:text stx:cdata stx:value-of stx:if stx:else stx:choose and literal text
The following have contributed to authoring this specification:
Petr Cimprich
Christian Nentwich
Oliver Becker
Honza Jiroušek
Michael Kay
Tom Kaiser
Pavel Hlavnička
Niko Matsakis
Cyrus Dolph
Manos Batsis
Barrie Slaymaker
Jan Poslušný
Michael Brennan
Tree Fragments
Immutable, opaque tree fragments should be added. We need to define instructions to catch events and store them in a tree, and to send the content of the tree to the output.
Resolution:
None recorded.
Modes
The mode attribute is currently defined for some elements (stx:template, stx:process-*). It seems this is not necessary, as modes don't play a relevant role in a one-pass language. Moreover, matching constraints via modes are orthogonal to another mechanism that limits the number of matching templates: the grouping of templates.
Resolution:
None recorded.
Multiple Inputs and Outputs
STX needs a way to allow multiple input/output streams. There is an idea to use the instructions stx:process-document and stx:result-document with an href attribute. This attribute would contain a URI identifying input and output channels.
I/O channels are abstractions; each of them has a URI and a resolver (SAX driver for input channels, SAX handler for output channels). There is always one primary input channel and one primary output channel defined for a transformation, and any number of secondary channels.
Resolution:
None recorded.
Current node and for-each
The mechanism of the for-each loop needs to be clarified in STX. As in XSLT 2.0, it iterates through a sequence of items. If the sequence contains nodes, there may be problems with using the current item as the current node, since the context information for the current item (that is, a node) may no longer be available.
Resolution:
None recorded.