Version 0.9
Syntax Reference for the SDD Language
SDD MetaData
This appears on the first line of an SDD. It defines the version of the SDD Language being used and the document type/format that the SDD will be used to validate.
Comments
Comments are used to add explanatory notes to the SDD. They may be associated with the SDD, a tag element, or a block element. They play no role during validation. The text of comments is enclosed in <! and !>.
Tag Element
Tag elements define the names of tags, which are document-type specific, and may or may not have a close tag associated with them. Tags that do not have an associated close tag are referred to as singletons. Tags with an associated close tag may contain other elements (tags, blocks, and ors). Users may also describe the types of data that an element can contain and any attributes associated with it. Only tag elements are allowed to contain data and attribute descriptions. Tag elements require a frequency specification and may be associated with a namespace. When a tag element appears multiple times within an SDD, it is fully defined on its first appearance; subsequent references specify only the element's placement and frequency and omit the remainder of its definition.
Block Element
Block elements wrap other elements (tags, blocks, and ors) and are usually used to define a repeating sequence of elements. Like tags, block elements require a frequency specification and may be associated with a namespace. Restrictions:
Or Element
Or elements are used to define alternatives. Only one of the alternatives per element instance can be applied during parsing. An element whose content is defined using ors must contain at least two or elements. Restrictions:
Frequency
Frequency defines how many times a tag or block element may occur within a specific context. Users may specify that an element occurs an absolute number of times, zero or more times, one or more times, or zero or one time. In addition, users can use the % operator to specify that an element can appear any number of times and that its position relative to other elements does not matter. An example of the % operator's use would be a paragraph in which tags denoting titles and cross-references are interspersed throughout the text in no particular order. Users may also specify a range. For example, the frequency 2+5 means that the element must appear at least twice but no more than five times. The % operator can also be specified as a range. For example, 2%5 means that the element must appear at least twice but no more than five times and that its position relative to other elements is irrelevant.
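The two range forms described above can be illustrated with a small Python sketch. It is not part of the SDD tooling: the names Frequency, parse_range, and count_allowed are hypothetical, and only the 2+5 and 2%5 forms are handled because the notation for the other frequencies is not shown in this section.

```python
import re
from typing import NamedTuple


class Frequency(NamedTuple):
    minimum: int
    maximum: int
    ordered: bool  # False when the % operator is used (position is irrelevant)


# Matches the two range forms from the text: "2+5" and "2%5".
_RANGE = re.compile(r"(\d+)([+%])(\d+)")


def parse_range(spec: str) -> Frequency:
    """Parse a range frequency such as '2+5' or '2%5'."""
    match = _RANGE.fullmatch(spec.strip())
    if match is None:
        raise ValueError(f"not a range frequency: {spec!r}")
    low, operator, high = int(match.group(1)), match.group(2), int(match.group(3))
    if low > high:
        raise ValueError(f"minimum exceeds maximum in {spec!r}")
    return Frequency(low, high, ordered=(operator == "+"))


def count_allowed(freq: Frequency, count: int) -> bool:
    """Check whether an observed number of occurrences satisfies the range."""
    return freq.minimum <= count <= freq.maximum
```

Under these assumptions, parse_range("2%5") yields a minimum of 2, a maximum of 5, and ordered set to False, so a validator would count occurrences without checking their position.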
Namespaces
A namespace is a label that is associated with an element, either a tag or a block. For blocks, the association allows the SDD author to reuse a block element without having to redefine it each time. When applied to tags, namespaces provide a means to redefine a tag's definition, including its internal structure, content constraints, and attributes. A specification, for example, might contain two different types of para elements. One type, to be used within the body of the document, might allow charts, images, and lists to be embedded in the text. Another type would allow text only and be used in the document's abstract. This type of restrictive definition can be applied without changing the markup of the source document and facilitates customizing a generic document structure to the particular needs of a specific project.
Defining Attributes
Some data formats, such as SGML and XML, allow markup to have attributes. Attributes can be required or optional. Their specifications consist of an attribute's name along with its values, which are specified using datatypes and/or regular expressions. The language constructs and syntax used to constrain data associated with an element also apply to data associated with an attribute. Both tag and singleton tag elements may contain attributes. Each attribute specification is demarcated with square brackets. The # symbol denotes that an attribute is required, meaning that the tag will not be considered valid if the attribute is missing. The loose or strict keywords follow but are optional. Next is the attribute's name, followed by an equal sign, which precedes a regular expression and/or a list of datatypes.
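As a rough illustration of the ordering described for attribute specifications (brackets, optional #, optional loose or strict, name, equal sign, constraint), the Python sketch below splits one specification into its parts. The exact spacing rules are not given in this section, so the sample strings and the names AttributeSpec and parse_attribute are assumptions made for the example only.

```python
import re
from typing import NamedTuple, Optional


class AttributeSpec(NamedTuple):
    required: bool        # True when the specification starts with '#'
    mode: Optional[str]   # 'loose', 'strict', or None when omitted
    name: str
    constraint: str       # raw rgx and/or datatype text, left unparsed


# Assumed layout: [ # (loose|strict) name = constraint ]
_ATTRIBUTE = re.compile(
    r"\[\s*(?P<required>#)?\s*"
    r"(?:(?P<mode>loose|strict)\s+)?"
    r"(?P<name>[A-Za-z_][\w:.-]*)\s*=\s*"
    r"(?P<constraint>.+)\]",
    re.DOTALL,
)


def parse_attribute(spec: str) -> AttributeSpec:
    """Split one bracketed attribute specification into its parts."""
    match = _ATTRIBUTE.fullmatch(spec.strip())
    if match is None:
        raise ValueError(f"not an attribute specification: {spec!r}")
    return AttributeSpec(
        required=match.group("required") is not None,
        mode=match.group("mode"),
        name=match.group("name"),
        constraint=match.group("constraint").strip(),
    )
```

With this layout, a hypothetical specification such as [#strict id=rgx="[a-z]+"] would be read as a required attribute named id that is validated with its regular expression.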
Defining Data Constraints
Data content may be constrained using datatypes or regular expressions. Datatypes allow the user to pick individual characters or classes of characters to which the data contained within a tag element must conform. The frequency or sequence of the characters in the tag's content does not matter when using a datatype constraint. Datatype specifications are denoted between curly braces and consist of a datatype keyword or one or more characters. For example, the datatype specification {LOWERCASE}{-_.} allows for lowercase letters along with hyphens, underscores, or periods. If the validator encountered any other type of character while processing this tag's content, an error would be reported. To facilitate the shortest datatype specifications possible, users may use the "!" operator to signify that a datatype is not allowed. For example, the specification {PCDATA}!{DIGIT} allows for all characters except digits. The characters allowed in a datatype specification and the meaning of some of the datatype keywords are document-type specific. For example, the characters "<" and ">" are not legal content within an SGML/XML document.
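The character-by-character nature of datatype checking can be sketched in Python as follows. This is an illustration rather than the validator's implementation: the function names are hypothetical, and the keyword meanings in KEYWORDS are assumptions (in particular, PCDATA is treated as "any character", although keyword meanings are document-type specific).

```python
import re

# Assumed keyword meanings; the real ones depend on the document type.
KEYWORDS = {
    "LOWERCASE": str.islower,
    "DIGIT": str.isdigit,
    "PCDATA": lambda ch: True,  # assumption: any character
}

# One token is an optional "!" followed by a braced keyword or character list.
_TOKEN = re.compile(r"(!?)\{([^{}]+)\}")


def parse_datatypes(spec):
    """Return (allowed, denied) lists of single-character predicates."""
    allowed, denied = [], []
    for negated, body in _TOKEN.findall(spec):
        test = KEYWORDS.get(body)
        if test is None:
            # Not a keyword: treat the braces as a list of literal characters.
            test = frozenset(body).__contains__
        (denied if negated else allowed).append(test)
    return allowed, denied


def first_invalid_char(spec, content):
    """Return the first character the datatypes reject, or None if all pass."""
    allowed, denied = parse_datatypes(spec)
    for ch in content:
        if any(test(ch) for test in denied) or not any(test(ch) for test in allowed):
            return ch
    return None
```

Because each character is tested on its own, neither the order nor the number of characters affects the result, which matches the description of datatype constraints above.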
A tag's content may also be constrained using regular expressions. These are specified using the rgx keyword as follows: rgx="regular expression". A tag may have only one regular expression, and the entire content of the tag must match this expression. The table below provides some examples of regular expressions. To read more about regular expressions, see OROMatcher's User Guide or Perl Documentation on Regular Expressions.
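The requirement that the entire content match the expression corresponds to a full match rather than a search. A minimal Python sketch follows; it uses Python's re module, whose syntax agrees with Perl-style expressions for simple patterns like this one but is not guaranteed to be identical to what the validator accepts. The function name and the date pattern are illustrative assumptions.

```python
import re


def content_matches(pattern: str, content: str) -> bool:
    """Return True only if the whole content matches the pattern."""
    return re.fullmatch(pattern, content) is not None


# Hypothetical pattern for a date such as 2001-03-15.
DATE_PATTERN = r"\d{4}-\d{2}-\d{2}"
```

Here content_matches(DATE_PATTERN, "2001-03-15") succeeds, while "2001-03-15 draft" fails because the trailing text is not covered by the expression.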
The three keywords optional, loose, and strict also affect the validation of a tag's content. The optional keyword instructs the validator that the tag's content is optional and that it should not issue an error if the tag has no content. Loose means that data validation should be carried out using the least restrictive method available. Strict means that the most restrictive method should be used. The application-level property (the macro level, accessed through Validation | Configuration) controls how data validation will be handled for all the elements in an SDD, i.e., whether the validator should use datatypes (loose) or regular expressions (strict) when validating content. If the SDD provides only one type of specification, either datatype or regular expression, then that type will be used when validating regardless of the application-level property. If the SDD provides both types, then the validator will use the type specified through the application-level property. The loose and strict keywords act as local overrides for this application-level property, instructing the validator to use the specified datatype, in the case of loose, or regular expression, in the case of strict, when validating a tag's content. The table below summarizes the application's behavior (the order in which it will select the different types of validation) in various circumstances. The ultimate default, if neither datatypes nor an rgx has been specified, is an empty datatype.
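The selection rules in the paragraph above can be summarized in a short Python sketch. The function and parameter names are hypothetical. One case is an assumption: when a loose or strict keyword names a method for which the SDD provides no specification, the sketch falls back to the one specification that exists, by analogy with the rule stated for the application-level property.

```python
from typing import Optional


def choose_method(has_datatype: bool, has_rgx: bool,
                  app_mode: str, keyword: Optional[str] = None) -> str:
    """Return 'datatype', 'rgx', or 'empty-datatype' for one tag's content.

    app_mode is the application-level property ('loose' or 'strict');
    keyword is the tag's local override, if any.
    """
    if app_mode not in ("loose", "strict"):
        raise ValueError(f"unknown application-level mode: {app_mode!r}")
    if not has_datatype and not has_rgx:
        return "empty-datatype"  # the ultimate default
    if has_datatype != has_rgx:
        # Only one specification exists, so it is used regardless of mode.
        return "datatype" if has_datatype else "rgx"
    # Both exist: a local loose/strict keyword overrides the application mode.
    mode = keyword if keyword in ("loose", "strict") else app_mode
    return "datatype" if mode == "loose" else "rgx"
```

For example, choose_method(True, True, "strict", "loose") selects the datatype, since the local keyword overrides the application-level property.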
Character Entities
In progress.
Sample SDD
In progress.