Relax NG Compact Syntax: no to operator precedence, yes to annotations!

James Clark is a fucking genius! He’s the guy who wrote the Expat XML parser, works on Relax NG, and does tons of other important stuff. Relax NG is an ingeniously designed, elegant XML schema language based on regular expressions, which also has a compact, convenient non-xml syntax.

I totally respect the way he throws down the gauntlet on operator precedence (take that you Perl and C++ weenies!):

There is no notion of operator precedence. It is an error for patterns to combine the |, &, , and - operators without using parentheses to make the grouping explicit. For example, foo | bar, baz is not allowed; instead, either (foo | bar), baz or foo | (bar, baz) must be used. A similar restriction applies to name classes and the use of the | and - operators. These restrictions are not expressed in the above EBNF but they are made explicit in the BNF in Section 1.

You can translate back and forth between Relax NG's XML and compact syntaxes with full fidelity, without losing any important information. Relax NG supports annotating the grammar with standard and custom namespaces, so you can add standard extensions and extra user defined meta-data to the grammar. That's useful for many applications like user interface generators, programming tools, editors, compilers, data binding, serialization, documentation, etc.

Here's an interesting example of a complex Relax NG application: OpenLaszlo is an XML/JavaScript based programming language, which the Laszlo compiler translates into SWF files for the Flash player. The Laszlo compiler and programming tools use this lzx.rnc Relax NG schema for the OpenLaszlo XML language. This schema contains annotations used by the Laslzo compiler to define the syntax and semantics of the XML based programming language.

The schema starts out by defining a few namespaces:

default namespace = "http://www.laszlosystems.com/2003/05/lzx"
namespace rng = "http://relaxng.org/ns/structure/1.0"
namespace a = "http://relaxng.org/ns/compatibility/annotations/1.0"
datatypes xsd = "http://www.w3.org/2001/XMLSchema-datatypes"
namespace lza = "http://www.laszlosystems.com/annotations/1.0"

The a: namespace defines some standard annotations like a:defaultValue, and the lza: namespace defines some custom annotations private to the Laszlo compiler like lza:visibility and lza:modifiers. Thanks to the ability to annotate the grammar, much of the syntax and semantics of the Laszlo programming language are defined directly in the Relax NG schema in the compact syntax, so any other tool can read the exact same definition the compiler is using!

To show how truly simple and elegant it is, here is the snake eating its tail: The Relax NG XML syntax, written in the Relax NG compact syntax:

# RELAX NG XML syntax specified in compact syntax.

default namespace rng = "http://relaxng.org/ns/structure/1.0"
namespace local = ""
datatypes xsd = "http://www.w3.org/2001/XMLSchema-datatypes"

start = pattern

pattern =
  element element { (nameQName | nameClass), (common & pattern+) }
  | element attribute { (nameQName | nameClass), (common & pattern?) }
  | element group|interleave|choice|optional
            |zeroOrMore|oneOrMore|list|mixed { common & pattern+ }
  | element ref|parentRef { nameNCName, common }
  | element empty|notAllowed|text { common }
  | element data { type, param*, (common & exceptPattern?) }
  | element value { commonAttributes, type?, xsd:string }
  | element externalRef { href, common }
  | element grammar { common & grammarContent* }

param = element param { commonAttributes, nameNCName, xsd:string }

exceptPattern = element except { common & pattern+ }

grammarContent = 
  definition
  | element div { common & grammarContent* }
  | element include { href, (common & includeContent*) }

includeContent =
  definition
  | element div { common & includeContent* }

definition =
  element start { combine?, (common & pattern+) }
  | element define { nameNCName, combine?, (common & pattern+) }

combine = attribute combine { "choice" | "interleave" }

nameClass = 
  element name { commonAttributes, xsd:QName }
  | element anyName { common & exceptNameClass? }
  | element nsName { common & exceptNameClass? }
  | element choice { common & nameClass+ }

exceptNameClass = element except { common & nameClass+ }

nameQName = attribute name { xsd:QName }
nameNCName = attribute name { xsd:NCName }
href = attribute href { xsd:anyURI }
type = attribute type { xsd:NCName }

common = commonAttributes, foreignElement*

commonAttributes = 
  attribute ns { xsd:string }?,
  attribute datatypeLibrary { xsd:anyURI }?,
  foreignAttribute*

foreignElement = element * - rng:* { (anyAttribute | text | anyElement)* }
foreignAttribute = attribute * - (rng:*|local:*) { text }
anyElement = element * { (anyAttribute | text | anyElement)* }
anyAttribute = attribute * { text }