TclXML Parser

TclXML is an open-source XML parser written in Tcl. It ships with Altair Accelerator, so no third-party installation is needed to use it. For detailed information, see http://tclxml.sourceforge.net/tclxml.html.

The goal of the TclXML package is to provide an API for Tcl scripts that allows "Plug-and-Play" parser implementations; that is, an application can switch between parser implementations without any change to the application code.

The TclXML package provides a streaming, or "event-based", interface to an XML document. An application using TclXML creates a parser "object", sets a number of callback scripts and then instructs the parser to parse an XML document. The parser scans the XML document's text and as it finds certain constructs, such as the start/end of elements, character data, and so on, it invokes the appropriate callback script.
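
As a minimal sketch of that flow, the following script creates a parser, registers three callbacks, and parses a small in-memory document. The callback names (onStart, onEnd, onText), the events list, and the sample document are illustrative assumptions, not part of TclXML.

```tcl
package require xml

# Collected events, in the order the parser reports them
set events {}

# Called for each element start tag; attlist is a name/value list
proc onStart {name attlist args} {
    lappend ::events [list start $name $attlist]
}

# Called for each element end tag
proc onEnd {name args} {
    lappend ::events [list end $name]
}

# Called for each span of character data
proc onText {data args} {
    lappend ::events [list text $data]
}

# Create the parser object and attach the callbacks
set p [::xml::parser \
    -elementstartcommand onStart \
    -elementendcommand onEnd \
    -characterdatacommand onText]

# Parse a small document held in a string (hypothetical sample data)
$p parse {<job id="42"><name>build</name></job>}

foreach e $events {
    puts $e
}
```

Each callback only records what it was given, which keeps the event order easy to inspect before writing real handlers.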

Parser Object (::xml::parser)

The ::xml::parser command creates an XML parser object. The return value of the command is the name of the newly created parser.

The parser scans an XML document's syntactical structure, evaluating callback scripts for each feature found. At the very least the parser will normalise the document and check the document for well-formedness. If the document is not well-formed then the -errorcommand option will be evaluated. Some parser classes may perform additional functions, such as validation. Additional features provided by the various parser classes are described in the section Parser Classes.
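
The sketch below shows one way to observe well-formedness errors. The handler name onError, the problems list, and the malformed sample input are assumptions made for illustration. Whether parsing stops or continues after a custom -errorcommand script returns depends on the parser class in use, so the parse call is wrapped in catch to cover both outcomes.

```tcl
package require xml

# Hypothetical handler: record the error instead of aborting immediately
set problems {}
proc onError {code msg args} {
    lappend ::problems [list $code $msg]
}

set p [::xml::parser -errorcommand onError]

# Sample input with mismatched tags: <b> is never closed
set rc [catch {$p parse {<a><b></a>}} result]

puts "parse return code: $rc"
puts "errors reported:   [llength $problems]"
foreach item $problems {
    puts "  [lindex $item 0]: [lindex $item 1]"
}
```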

Parsing is performed synchronously: the parse command blocks until the entire document has been parsed. An application callback may terminate parsing; see the section Callback Return Codes. Incremental parsing is also supported through the -final configuration option.
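
Incremental parsing can be sketched as follows, assuming the document arrives in two fragments. The fragment contents and the onStart callback are hypothetical. The -final option is set to false while more data is expected and to true before the last fragment is passed in.

```tcl
package require xml

# Names of the elements seen so far
set names {}
proc onStart {name attlist args} {
    lappend ::names $name
}

# -final 0 tells the parser that more data will follow
set p [::xml::parser -elementstartcommand onStart -final 0]

# First fragment: the document is not yet complete
$p parse {<log><entry>first</entry>}

# Mark the next fragment as the last one, then parse it
$p configure -final 1
$p parse {<entry>second</entry></log>}

puts $names
```

Splitting the input at tag boundaries, as done here, keeps the sketch simple; real input may be split at arbitrary positions.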

The following table lists all options of the parser object. For a complete explanation of the script arguments, see the online documentation at http://tclxml.sourceforge.net/tclxml.html.

Option | Script Args | Description
-attlistdeclcommand <script> | name attrname type default value | Specifies the prefix of a Tcl command to be evaluated whenever an attribute list declaration is encountered in the DTD subset of an XML document.
-baseurl URI | N/A | Specifies the base URI for resolving relative URIs that may be used in the XML document to refer to external entities.
-characterdatacommand <script> | data | Specifies the prefix of a Tcl command to be evaluated whenever character data is encountered in the XML document being parsed.
-commentcommand <script> | data | Specifies the prefix of a Tcl command to be evaluated whenever a comment is encountered in the XML document being parsed.
-defaultcommand <script> | data | Specifies the prefix of a Tcl command to be evaluated when no other callback has been defined for a document feature which has been encountered.
-defaultexpandinternalentities <boolean> | N/A | Specifies whether entities declared in the internal DTD subset are expanded with their replacement text. If entities are not expanded then the entity references will be reported with no expansion.
-doctypecommand <script> | name public system dtd | Specifies the prefix of a Tcl command to be evaluated when the document type declaration is encountered.
-elementdeclcommand <script> | name model | Specifies the prefix of a Tcl command to be evaluated when an element markup declaration is encountered.
-elementendcommand <script> | name args | Specifies the prefix of a Tcl command to be evaluated when an element end tag is encountered.
-elementstartcommand <script> | name attlist args | Specifies the prefix of a Tcl command to be evaluated when an element start tag is encountered.
-endcdatasectioncommand <script> | -NONE- | Specifies the prefix of a Tcl command to be evaluated when the end of a CDATA section is encountered. The command is evaluated with no further arguments.
-enddoctypedeclcommand <script> | -NONE- | Specifies the prefix of a Tcl command to be evaluated when the end of the document type declaration is encountered. The command is evaluated with no further arguments.
-entitydeclcommand <script> | name args | Specifies the prefix of a Tcl command to be evaluated when an entity declaration is encountered.
-entityreferencecommand <script> | name | Specifies the prefix of a Tcl command to be evaluated when an entity reference is encountered.
-errorcommand <script> | errorcode errormsg | Specifies the prefix of a Tcl command to be evaluated when a fatal error is detected. The error may be due to the XML document not being well-formed. In the case of a validating parser class, the error may also be due to the XML document not obeying validity constraints. By default, a callback script is provided which causes an error return code, but an application may supply a script which attempts to continue parsing.
-externalentitycommand <script> | name baseuri uri id | Specifies the prefix of a Tcl command to be evaluated to resolve an external entity reference. If the parser has been configured to validate the XML document, a default script is supplied that resolves the URI given as the system identifier of the external entity and recursively parses the entity's data. If the parser has been configured as a non-validating parser, then by default external entities are not resolved. This option can be used to override the default behaviour.
-final <boolean> | N/A | Specifies whether the XML document being parsed is complete. If the document is to be incrementally parsed then this option is set to false, and when the last fragment of the document is parsed it is set to true.
-ignorewhitespace <boolean> | N/A | If this option is set to true then spans of character data in the XML document which are composed only of white-space (CR, LF, space, tab) will not be reported to the application. In other words, the data passed to every invocation of the -characterdatacommand script will contain at least one non-white-space character.
-notationdeclcommand <script> | name uri | Specifies the prefix of a Tcl command to be evaluated when a notation declaration is encountered.
-notstandalonecommand <script> | -NONE- | Specifies the prefix of a Tcl command to be evaluated when the parser determines that the XML document being parsed is not a standalone document.
-paramentityparsing <script> | -NONE- | Controls whether external parameter entities are parsed.
-parameterentitydeclcommand <script> | name args | Specifies the prefix of a Tcl command to be evaluated when a parameter entity declaration is encountered.
-parser <name> | N/A | The name of the parser class to instantiate for this parser object. This option may only be specified when the parser instance is created.
-processinginstructioncommand <script> | target data | Specifies the prefix of a Tcl command to be evaluated when a processing instruction is encountered.
-reportempty <boolean> | N/A | If this option is enabled then when an element is encountered that uses the special empty element syntax, the additional arguments -empty 1 are appended to the -elementstartcommand and -elementendcommand callbacks.
-startcdatasectioncommand <script> | -NONE- | Specifies the prefix of a Tcl command to be evaluated when the start of a CDATA section is encountered. No arguments are appended to the script.
-startdoctypedeclcommand <script> | -NONE- | Specifies the prefix of a Tcl command to be evaluated at the start of a document type declaration. No arguments are appended to the script.
-unknownencodingcommand <script> | -NONE- | Specifies the prefix of a Tcl command to be evaluated when a character is encountered with an unknown encoding. This option has not been implemented.
-unparsedentitydeclcommand <script> | system public notation | Specifies the prefix of a Tcl command to be evaluated when a declaration is encountered for an unparsed entity.
-validate <boolean> | N/A | Enables validation of the XML document to be parsed. Any changes to this option are ignored after an XML document has started to be parsed, but the option may be changed after a reset.
-warningcommand <script> | warningcode warningmsg | Specifies the prefix of a Tcl command to be evaluated when a warning condition is detected. A warning condition is where the XML document has not been authored correctly, but is still well-formed and may be valid. For example, the special empty element syntax may be used for an element which has not been declared to have empty content. By default, a callback script is provided which silently ignores the warning.
-xmldeclcommand <script> | version encoding standalone | Specifies the prefix of a Tcl command to be evaluated when the XML declaration is encountered.
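
Options can be given when the parser is created or changed later with the parser object's configure method. The sketch below registers a comment callback at creation time and adds a processing instruction callback afterwards. The callback names, the seen list, and the sample document are assumptions for illustration only.

```tcl
package require xml

# Items reported by the two callbacks
set seen {}

# -commentcommand passes the comment text as data
proc onComment {data args} {
    lappend ::seen [list comment [string trim $data]]
}

# -processinginstructioncommand passes the PI target and its data
proc onPI {target data args} {
    lappend ::seen [list pi $target $data]
}

# One option set at creation time ...
set p [::xml::parser -commentcommand onComment]

# ... and further options set afterwards with configure
$p configure -processinginstructioncommand onPI -ignorewhitespace 1

$p parse {<doc><!-- nightly run --><?scheduler priority=high?><task/></doc>}

foreach item $seen {
    puts $item
}
```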

Application Example

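The following is a crude example application that pretty-prints an XML tree. Note that singleton (empty-element) tags such as <br/> are not printed correctly, because the parser reports each one as a start tag followed by an end tag.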
# Import 'xml' Tcl package
package require xml

set indent 0 ; # Global indentation variable, for pretty printing

# Declare various callback procedures which will be used by parser
# to crudely pretty print an XML tree.

# Define callback for when an XML element start tag is encountered
proc startCmd {name attlist args} {
    global indent
    # Build the padding separately so that a '%' in the XML text
    # is never interpreted as a format specifier
    set pad [string repeat "    " $indent]
    puts "$pad<$name $attlist $args>"
    incr indent
}

# Define callback for when an XML element end tag is encountered
proc endCmd {name args} {
    global indent
    incr indent -1
    set pad [string repeat "    " $indent]
    puts "$pad</$name $args>"
}

# Define callback for when an XML element's character data is encountered
proc cdataCmd {data args} {
    global indent
    set pad [string repeat "    " $indent]
    puts "$pad$data $args"
}

# Create XML parser object with certain callbacks
set parser [::xml::parser \
    -ignorewhitespace 1 \
    -elementstartcommand startCmd \
    -elementendcommand endCmd \
    -characterdatacommand cdataCmd]

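# Open XML file for parsing or read from stdin
# (shift, VovFatalError and VovMessage are assumed to be provided by the
# Altair Accelerator Tcl environment; they are not plain Tcl commands)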
if {$argc} {
    set file [shift]
    if {![file exists $file]} {
        VovFatalError "The specified file '$file' does not exist"
    }
    set fp [open $file "r"]
    $parser parse [read $fp]
    close $fp
} else {
    VovMessage "Waiting for STDIN"
    $parser parse [read stdin]
}
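
The limitation with singleton elements can be addressed with the -reportempty option. The following variant is a sketch under a few assumptions: the proc names (emit, isEmpty, startTag, endTag, textCmd), the lines list, and the sample document are hypothetical, and the extra callback arguments are assumed to arrive as option/value pairs. If the parser class in use does not honor -reportempty, the output falls back to separate start and end tags.

```tcl
package require xml

set depth 0    ;# Current nesting level
set lines {}   ;# Pretty-printed output, one entry per line

# Append one line of output, indented by the current depth
proc emit {text} {
    lappend ::lines "[string repeat "    " $::depth]$text"
}

# True when the callback arguments contain "-empty 1"
proc isEmpty {opts} {
    expr {[llength $opts] % 2 == 0
          && [dict exists $opts -empty]
          && [dict get $opts -empty]}
}

proc startTag {name attlist args} {
    if {[isEmpty $args]} {
        emit "<$name/>"        ;# Print the singleton on one line
    } else {
        emit "<$name>"
        incr ::depth
    }
}

proc endTag {name args} {
    if {[isEmpty $args]} {
        return                 ;# Already printed by startTag
    }
    incr ::depth -1
    emit "</$name>"
}

proc textCmd {data args} {
    emit $data
}

set p [::xml::parser \
    -ignorewhitespace 1 \
    -reportempty 1 \
    -elementstartcommand startTag \
    -elementendcommand endTag \
    -characterdatacommand textCmd]

$p parse {<doc><br/><p>hi</p></doc>}
puts [join $lines \n]
```

Collecting the lines in a list rather than printing them directly makes the result easy to inspect or post-process.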