Common Lexical Specification

This Common Lexical Specification defines grammar rules that are reused in multiple language specifications on this website. This document consists of four sections: Section 1 defines how programs are encoded at the byte level. Section 2 provides an introduction to grammars. Section 3 provides the full profile lexical grammar. Section 4 provides information on profiles.

1. Unicode

A program is a sequence of Unicode code points encoded into a sequence of bytes using a Unicode encoding. In this version, only UTF-8 without a byte order mark (BOM) is supported, and only UTF-8 sequences of length 1 are permitted; that is, a program consists solely of the code points U+0000 to U+007F. The Unicode encoding of a particular program must be determined by consumers of this specification.
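The restriction to single-byte UTF-8 sequences can be checked while decoding. The following Python sketch is illustrative only; the function name decode_program is hypothetical and not part of this specification. It rejects a byte order mark and any byte above 0x7F, and otherwise maps each byte directly to its code point.

```python
from typing import List

UTF8_BOM = b"\xef\xbb\xbf"


def decode_program(data: bytes) -> List[int]:
    """Decode UTF-8 without BOM, restricted to single-byte sequences,
    into a list of Unicode code points (hypothetical helper)."""
    if data.startswith(UTF8_BOM):
        raise ValueError("a byte order mark is not permitted")
    code_points = []
    for offset, byte in enumerate(data):
        # A single-byte UTF-8 sequence has its high bit cleared (0xxxxxxx).
        if byte > 0x7F:
            raise ValueError(f"multi-byte UTF-8 sequence at offset {offset}")
        code_points.append(byte)
    return code_points
```

Because every permitted sequence has length 1, each byte value equals its code point; for example, decode_program(b"x: 1") returns [0x78, 0x3A, 0x20, 0x31].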

2. Grammars

This section describes context-free grammars used in this specification to define the lexical and syntactical structure of a language.

2.1 Context-free grammars

A context-free grammar consists of a number of productions. Each production has an abstract symbol called a non-terminal as its left-hand side, and a sequence of one or more non-terminal and terminal symbols as its right-hand side. For each grammar, the terminal symbols are drawn from a specified alphabet.

Starting from a sequence consisting of a single distinguished non-terminal, called the goal symbol, a given context-free grammar specifies a language, namely, the set of possible sequences of terminal symbols that can result from repeatedly replacing any non-terminal in the sequence with the right-hand side of a production for which that non-terminal is the left-hand side.
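The definition of the language of a grammar can be made concrete with a small enumerator. In the following Python sketch, which is illustrative only, a grammar is a hypothetical dictionary mapping each non-terminal to its right-hand sides, with an empty tuple standing for an empty right-hand side. The sketch always replaces the leftmost non-terminal; for context-free grammars this yields the same set of sentences as replacing non-terminals in any other order. It assumes that sentential forms cannot grow indefinitely without gaining terminals (for example, there is no production such as S : S S); otherwise the search may not terminate.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical representation: non-terminal -> list of right-hand sides.
Grammar = Dict[str, List[Tuple[str, ...]]]


def enumerate_language(grammar: Grammar, goal: str, max_len: int) -> Set[str]:
    """Collect every sentence of at most max_len terminals derivable from goal.

    Symbols that are keys of `grammar` are non-terminals; all others are
    terminals. An empty tuple is an empty (epsilon) right-hand side.
    """
    sentences: Set[str] = set()
    seen = {(goal,)}
    queue = deque([(goal,)])
    while queue:
        form = queue.popleft()
        # Locate the leftmost non-terminal of the sentential form.
        index = next((i for i, s in enumerate(form) if s in grammar), None)
        if index is None:
            # Only terminals remain: the form is a sentence of the language.
            sentences.add("".join(form))
            continue
        for rhs in grammar[form[index]]:
            new_form = form[:index] + rhs + form[index + 1:]
            terminals = sum(1 for s in new_form if s not in grammar)
            # Prune forms that already contain too many terminals.
            if terminals <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return sentences
```

For example, with the productions S : #61 S #62 and S : ε, enumerate_language({"S": [("a", "S", "b"), ()]}, "S", 4) returns {"", "ab", "aabb"}.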

2.3 Lexical grammars

The lexical grammar uses the Unicode code points produced by the Unicode decoding phase as its terminal symbols. The names of its non-terminals start with the prefix "Lexical." (for example, Lexical.Word). It defines a set of productions, starting from the goal symbol Lexical.Word, that describe how sequences of code points are translated into words.

2.4 Syntactical grammars

The syntactical grammar for the Data Definition Language uses the words of the lexical grammar as its terminal symbols. The names of its non-terminals start with the prefix "Syntactical.". It defines a set of productions, starting from the goal symbol sentence, that describe how sequences of words are translated into a sentence.

2.5 Grammar notation

Productions are written in a fixed-width font.

A production is defined by its left-hand side, followed by a colon, followed by its right-hand side. The left-hand side is the name of the non-terminal defined by the production. A non-terminal may be defined by multiple alternative productions. The right-hand side of a production consists of any sequence of terminals and non-terminals. In certain cases, the right-hand side is replaced by a comment describing it. Such a comment is opened by /* and closed by */.

The following production denotes the non-terminal for a digit as used in the definitions of numerals:

Lexical.Digit : /* A single Unicode symbol from the code point range U+0030 to U+0039 */

A terminal is a sequence of Unicode symbols. A Unicode symbol is denoted by a number sign # followed by a hexadecimal number denoting its code point; for example, #2b denotes the code point U+002B.

The following productions denote the non-terminal for a sign as used in the definitions of numerals:

/* #2b is also known as "PLUS SIGN" */
Lexical.PlusSign : #2b
/* #2d is also known as "MINUS SIGN" */
Lexical.MinusSign : #2d
Lexical.Sign : Lexical.PlusSign
Lexical.Sign : Lexical.MinusSign
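To make the #-notation concrete, the following Python sketch converts a terminal written as a space-separated sequence of #-prefixed hexadecimal code points into the string it denotes. The helper terminal_to_text is hypothetical and only illustrates the notation.

```python
def terminal_to_text(terminal: str) -> str:
    """Convert notation such as "#74 #72 #75 #65" into the text it denotes."""
    symbols = terminal.split()
    if not symbols or not all(s.startswith("#") and len(s) > 1 for s in symbols):
        raise ValueError(f"malformed terminal: {terminal!r}")
    # Each symbol is "#" followed by a hexadecimal code point.
    return "".join(chr(int(s[1:], 16)) for s in symbols)
```

For example, terminal_to_text("#2b") returns "+", and terminal_to_text("#74 #72 #75 #65") returns "true".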

The syntax {x} on the right-hand side of a production denotes zero or more occurrences of x.

The following production defines a possibly empty sequence of digits as used in the definitions of numerals:

Lexical.ZeroOrMoreDigits : {Lexical.Digit}

The syntax [x] on the right-hand side of a production denotes zero or one occurrence of x.

The following production denotes a possible definition of an integer numeral. It consists of an optional sign, followed by a digit, followed by zero or more digits (as defined in the previous example):

Lexical.Integer : [Lexical.Sign] Lexical.Digit Lexical.ZeroOrMoreDigits
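Each piece of notation maps directly onto a control-flow construct: an optional [x] becomes a single conditional check, and a repetition {x} becomes a loop. The following Python sketch, with the hypothetical function name match_integer, recognizes Lexical.Integer in this way and returns the position just past the match, or -1 if there is none.

```python
def match_integer(text: str, pos: int = 0) -> int:
    """Return the end position of a Lexical.Integer starting at pos, or -1."""
    # [Lexical.Sign]: zero or one sign.
    if pos < len(text) and text[pos] in "+-":
        pos += 1
    # Lexical.Digit: exactly one digit is required.
    if pos >= len(text) or not ("0" <= text[pos] <= "9"):
        return -1
    pos += 1
    # Lexical.ZeroOrMoreDigits, that is {Lexical.Digit}: zero or more digits.
    while pos < len(text) and "0" <= text[pos] <= "9":
        pos += 1
    return pos
```

For example, match_integer("-42,") returns 3 and leaves the comma unconsumed, while match_integer("+") returns -1 because a sign alone is not an integer.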

The empty string is denoted by ε.

The following productions denote a possibly empty list of integers (with Lexical.Integer as defined in the preceding example). Note that this list may include a trailing comma; hence the {x} operator cannot be used here.

Syntactical.IntegerList : Lexical.Integer Syntactical.IntegerListRest
Syntactical.IntegerList : ε
Syntactical.IntegerListRest : Lexical.Comma Lexical.Integer Syntactical.IntegerListRest
Syntactical.IntegerListRest : Lexical.Comma
Syntactical.IntegerListRest : ε
/* #2c is also known as "COMMA" */
Lexical.Comma : #2c
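A recursive-descent reading of these productions is sketched below in Python. It assumes, for illustration, that the input has already been split into words represented as plain strings, with "," standing for Lexical.Comma; the names parse_integer_list and is_integer_word are hypothetical. The tail recursion of Syntactical.IntegerListRest is unrolled into a loop.

```python
from typing import List, Optional


def is_integer_word(word: str) -> bool:
    """Check a word against [Lexical.Sign] Lexical.Digit {Lexical.Digit}."""
    digits = word[1:] if word[:1] in ("+", "-") else word
    return digits != "" and all("0" <= c <= "9" for c in digits)


def parse_integer_list(words: List[str]) -> Optional[List[int]]:
    """Parse Syntactical.IntegerList; return the values, or None on failure."""
    # Syntactical.IntegerList : ε
    if not words:
        return []
    # Syntactical.IntegerList : Lexical.Integer Syntactical.IntegerListRest
    if not is_integer_word(words[0]):
        return None
    values = [int(words[0])]
    pos = 1
    while pos < len(words):
        # Both non-empty IntegerListRest productions start with a comma.
        if words[pos] != ",":
            return None
        pos += 1
        if pos == len(words):
            # Syntactical.IntegerListRest : Lexical.Comma (trailing comma)
            return values
        if not is_integer_word(words[pos]):
            return None
        values.append(int(words[pos]))
        pos += 1
    # Syntactical.IntegerListRest : ε
    return values
```

For example, parse_integer_list(["1", ",", "2", ","]) returns [1, 2], while parse_integer_list(["1", ",", ","]) returns None because two commas may not follow each other.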

3. Full Profile Lexical Specification

The lexical grammar describes the translation of Unicode code points into words. The goal non-terminal of the lexical grammar is the Lexical.Word symbol.

3.1 word

The word Lexical.Word is defined by

Lexical.Word : Lexical.Period
Lexical.Word : Lexical.Semicolon
Lexical.Word : Lexical.Boolean
Lexical.Word : Lexical.Number
Lexical.Word : Lexical.String
Lexical.Word : Lexical.Void
Lexical.Word : Lexical.Name
Lexical.Word : Lexical.LeftCurlyBracket
Lexical.Word : Lexical.RightCurlyBracket
Lexical.Word : Lexical.LeftSquareBracket
Lexical.Word : Lexical.RightSquareBracket
Lexical.Word : Lexical.Comma
Lexical.Word : Lexical.Colon
/* whitespace, newline, and comment words are not considered by the syntactical grammar */
Lexical.Word : Lexical.Whitespace
Lexical.Word : Lexical.Newline
Lexical.Word : Lexical.Comment

3.2 whitespace

The word Lexical.Whitespace is defined by

/* #9 is also known as "CHARACTER TABULATION" */
Lexical.Whitespace : #9
/* #20 is also known as "SPACE" */
Lexical.Whitespace : #20

3.3 line terminator

The word Lexical.LineTerminator is defined by

/* #a is also known as "LINE FEED (LF)" */
/* #d is also known as "CARRIAGE RETURN (CR)" */
Lexical.LineTerminator : #a {#d}
Lexical.LineTerminator : #d {#a}

3.4 comments

A language using the Common Lexical Specification may use both single-line comments and multi-line comments. A Lexical.Comment is either a Lexical.SingleLineComment or a Lexical.MultiLineComment. Lexical.Comment is defined by

Lexical.Comment : Lexical.SingleLineComment
Lexical.Comment : Lexical.MultiLineComment

A Lexical.SingleLineComment starts with two solidi and extends to the end of the line. Lexical.SingleLineComment is defined by

/* #2f is also known as "SOLIDUS" */
Lexical.SingleLineComment : #2f #2f /* any sequence of characters except for Lexical.LineTerminator */

The Lexical.LineTerminator is not considered as part of the comment text.

A Lexical.MultiLineComment is opened by a solidus and an asterisk and closed by an asterisk and a solidus. Lexical.MultiLineComment is defined by

/* #2f is also known as "SOLIDUS" */
/* #2a is also known as "ASTERISK" */
Lexical.MultiLineComment : #2f #2a /* any sequence of characters except for #2a #2f */ #2a #2f

The #2f #2a and #2a #2f sequences are not considered as part of the comment text.

This implies:

#2f #2f has no special meaning inside either kind of comment.
#2f #2a and #2a #2f have no special meaning in single-line comments.
Multi-line comments do not nest.
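The following Python sketch, using the hypothetical name match_comment, shows how a scanner might apply these rules. A single-line comment stops before the line terminator, and a multi-line comment ends at the first #2a #2f, which is why multi-line comments do not nest.

```python
def match_comment(text: str, pos: int) -> int:
    """Return the end of a Lexical.Comment starting at pos, or -1 if none.

    For a single-line comment the end excludes the line terminator; for a
    multi-line comment it includes the closing #2a #2f.
    """
    if text.startswith("//", pos):
        end = pos + 2
        # Stop before #a or #d: the line terminator is not part of the comment.
        while end < len(text) and text[end] not in "\n\r":
            end += 1
        return end
    if text.startswith("/*", pos):
        # Non-nesting: the first "*/" after the opening "/*" closes the comment.
        close = text.find("*/", pos + 2)
        if close == -1:
            raise ValueError(f"unterminated multi-line comment at {pos}")
        return close + 2
    return -1
```

For "/* a /* b */ c */", the comment ends after the first "*/" at position 12, and the remaining " c */" is left for further scanning.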

3.5 parentheses

The words Lexical.LeftParenthesis and Lexical.RightParenthesis, respectively, are defined by

/* #28 is also known as "LEFT PARENTHESIS" */
Lexical.LeftParenthesis : #28
/* #29 is also known as "RIGHT PARENTHESIS" */
Lexical.RightParenthesis : #29

3.6 curly brackets

The words Lexical.LeftCurlyBracket and Lexical.RightCurlyBracket, respectively, are defined by

/* #7b is also known as "LEFT CURLY BRACKET" */
Lexical.LeftCurlyBracket : #7b
/* #7d is also known as "RIGHT CURLY BRACKET" */
Lexical.RightCurlyBracket : #7d

3.7 colon

The word Lexical.Colon is defined by

/* #3a is also known as "COLON" */
Lexical.Colon : #3a

3.8 square brackets

The words Lexical.LeftSquareBracket and Lexical.RightSquareBracket, respectively, are defined by

/* #5b is also known as "LEFT SQUARE BRACKET" */
Lexical.LeftSquareBracket : #5b
/* #5d is also known as "RIGHT SQUARE BRACKET" */
Lexical.RightSquareBracket : #5d

3.9 comma

The word Lexical.Comma is defined by

/* #2c is also known as "COMMA" */
Lexical.Comma : #2c

3.10 name

The word Lexical.Name is defined by

Lexical.Name : {Lexical.Underscore} Lexical.Alphabetic {Lexical.NameSuffixCharacter}
/* #41 is also known as "LATIN CAPITAL LETTER A" */
/* #5a is also known as "LATIN CAPITAL LETTER Z" */
/* #61 is also known as "LATIN SMALL LETTER A" */
/* #7a is also known as "LATIN SMALL LETTER Z" */
Lexical.NameSuffixCharacter : /* The Unicode characters from #41 to #5a and from #61 to #7a. */
/* #30 is also known as "DIGIT ZERO" */
/* #39 is also known as "DIGIT NINE" */
Lexical.NameSuffixCharacter : /* The Unicode characters from #30 to #39. */
/* #5f is also known as "LOW LINE" */
Lexical.NameSuffixCharacter : #5f
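Lexical.Underscore and Lexical.Alphabetic are not defined in this section. The following Python sketch assumes that Lexical.Underscore is #5f and that Lexical.Alphabetic is one of the letters #41 to #5a or #61 to #7a, and expresses Lexical.Name as a regular expression; match_name is a hypothetical name.

```python
import re

# Assumption: Lexical.Underscore is #5f and Lexical.Alphabetic is a Latin
# letter (#41-#5a, #61-#7a); neither is defined in this section.
NAME = re.compile(r"_*[A-Za-z][A-Za-z0-9_]*")


def match_name(text: str, pos: int = 0) -> int:
    """Return the end of the longest Lexical.Name starting at pos, or -1."""
    m = NAME.match(text, pos)
    return m.end() if m else -1
```

For example, match_name("__x1_y z") returns 6, while match_name("_1") returns -1 because at least one letter must precede any digit.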

3.10 number literal

The word Lexical.Number is defined by

Lexical.Number : Lexical.IntegerNumber
Lexical.Number : Lexical.RealNumber
Lexical.IntegerNumber : [Lexical.Sign] Lexical.Digit {Lexical.Digit}
Lexical.RealNumber : [Lexical.Sign] Lexical.Period Lexical.Digit {Lexical.Digit} [Lexical.Exponent]
Lexical.RealNumber : [Lexical.Sign] Lexical.Digit {Lexical.Digit} [Lexical.Period {Lexical.Digit}] [Lexical.Exponent]
Lexical.Exponent : Lexical.ExponentPrefix [Lexical.Sign] Lexical.Digit {Lexical.Digit}
/* #2b is also known as "PLUS SIGN" */
Lexical.Sign : #2b
/* #2d is also known as "MINUS SIGN" */
Lexical.Sign : #2d
/* #65 is also known as "LATIN SMALL LETTER E" */
Lexical.ExponentPrefix : #65
/* #45 is also known as "LATIN CAPITAL LETTER E" */
Lexical.ExponentPrefix : #45
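The number productions can be transcribed into a single regular expression. The following Python sketch uses [0-9] rather than \d, because \d in Python also matches non-ASCII decimal digits. A literal such as 12 matches both Lexical.IntegerNumber and the second Lexical.RealNumber production; the sketch assumes that a literal without a period and without an exponent is an integer. The name match_number is hypothetical.

```python
import re
from typing import Optional, Tuple

NUMBER = re.compile(
    r"[+-]?"                              # [Lexical.Sign]
    r"(?:\.[0-9]+|[0-9]+(?:\.[0-9]*)?)"   # digits, with or without a period
    r"(?:[eE][+-]?[0-9]+)?"               # [Lexical.Exponent]
)


def match_number(text: str, pos: int = 0) -> Optional[Tuple[str, int]]:
    """Return (kind, end) for the longest Lexical.Number at pos, or None."""
    m = NUMBER.match(text, pos)
    if m is None:
        return None
    lexeme = m.group()
    # Assumption: without a period or an exponent the literal is an IntegerNumber.
    kind = "real" if any(c in ".eE" for c in lexeme) else "integer"
    return kind, m.end()
```

For example, match_number("1e") returns ("integer", 1): an exponent prefix not followed by a digit is not part of the number.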

3.11 string literal

The word Lexical.String is defined by

Lexical.String : Lexical.SingleQuotedString
Lexical.String : Lexical.DoubleQuotedString
Lexical.DoubleQuotedString : Lexical.DoubleQuote {Lexical.DoubleQuotedStringCharacter} Lexical.DoubleQuote
Lexical.DoubleQuotedStringCharacter : /* any character except for Lexical.Newline, Lexical.DoubleQuote, and the characters from #0 to #1f */
Lexical.DoubleQuotedStringCharacter : Lexical.EscapeSequence
Lexical.DoubleQuotedStringCharacter : #5c Lexical.DoubleQuote
/* #22 is also known as "QUOTATION MARK" */
Lexical.DoubleQuote : #22
Lexical.SingleQuotedString : Lexical.SingleQuote {Lexical.SingleQuotedStringCharacter} Lexical.SingleQuote
Lexical.SingleQuotedStringCharacter : /* any character except for Lexical.Newline, Lexical.SingleQuote, and the characters from #0 to #1f */
Lexical.SingleQuotedStringCharacter : Lexical.EscapeSequence
Lexical.SingleQuotedStringCharacter : #5c Lexical.SingleQuote
/* #27 is also known as "APOSTROPHE" */
Lexical.SingleQuote : #27
/* #5c is also known as "REVERSE SOLIDUS", #75 is also known as "LATIN SMALL LETTER U" */
Lexical.EscapeSequence : #5c #75 Lexical.HexadecimalDigit Lexical.HexadecimalDigit Lexical.HexadecimalDigit Lexical.HexadecimalDigit
/* #5c is also known as "REVERSE SOLIDUS" */
Lexical.EscapeSequence : #5c #5c
/* #62 is also known as "LATIN SMALL LETTER B" */
Lexical.EscapeSequence : #5c #62
/* #66 is also known as "LATIN SMALL LETTER F" */
Lexical.EscapeSequence : #5c #66
/* #6e is also known as "LATIN SMALL LETTER N" */
Lexical.EscapeSequence : #5c #6e
/* #72 is also known as "LATIN SMALL LETTER R" */
Lexical.EscapeSequence : #5c #72
/* #74 is also known as "LATIN SMALL LETTER T" */
Lexical.EscapeSequence : #5c #74
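The following Python sketch scans a string literal and decodes its escape sequences. The grammar does not state what the escapes denote; the sketch assumes the conventional meanings (backspace, form feed, line feed, carriage return, tab, and a code point given by four hexadecimal digits) for the letters named in the comments (b, f, n, r, t, u). Because the character productions do not exclude #5c, a backslash that does not begin an escape sequence is kept as an ordinary character. The names scan_string and scan_escape are hypothetical.

```python
from typing import List, Optional, Tuple

# Assumed meanings of the single-letter escapes; the grammar only names the letters.
SIMPLE_ESCAPES = {"\\": "\\", "b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t"}
HEX_DIGITS = "0123456789abcdefABCDEF"


def scan_escape(text: str, i: int, quote: str) -> Optional[Tuple[str, int]]:
    """Return (decoded character, length) for an escape sequence at i, or None."""
    if text[i] != "\\" or i + 1 >= len(text):
        return None
    nxt = text[i + 1]
    if nxt in SIMPLE_ESCAPES:
        return SIMPLE_ESCAPES[nxt], 2
    if nxt == quote:
        # #5c followed by the enclosing quote character.
        return quote, 2
    digits = text[i + 2:i + 6]
    if nxt == "u" and len(digits) == 4 and all(d in HEX_DIGITS for d in digits):
        return chr(int(digits, 16)), 6
    return None


def scan_string(text: str, pos: int) -> Tuple[str, int]:
    """Scan a Lexical.String at pos; return (decoded value, end position)."""
    quote = text[pos]
    if quote not in "\"'":
        raise ValueError(f"no string literal at {pos}")
    value: List[str] = []
    i = pos + 1
    while i < len(text):
        c = text[i]
        if c == quote:
            return "".join(value), i + 1
        escape = scan_escape(text, i, quote)
        if escape is not None:
            decoded, length = escape
            value.append(decoded)
            i += length
        elif ord(c) <= 0x1F:
            # Control characters, including #a and #d, are not permitted.
            raise ValueError(f"invalid character at {i}")
        else:
            value.append(c)
            i += 1
    raise ValueError(f"unterminated string literal at {pos}")
```

In a double-quoted string, the sequence #5c #27 is not an escape sequence under these productions, so the sketch keeps both characters.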

3.12 boolean literal

The word Lexical.Boolean is defined by

Lexical.Boolean : Lexical.True
Lexical.Boolean : Lexical.False
Lexical.True : #74 #72 #75 #65
Lexical.False : #66 #61 #6c #73 #65

Remark: The word Lexical.Boolean is a so-called keyword. It takes priority over Lexical.Name; that is, true and false are recognized as Lexical.Boolean rather than as Lexical.Name.

3.13 void literal

The word Lexical.Void is defined by

Lexical.Void : #76 #6f #69 #64

Remark: The word Lexical.Void is a so-called keyword. It takes priority over Lexical.Name; that is, void is recognized as Lexical.Void rather than as Lexical.Name.
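One way to give keywords priority over names is to scan the longest Lexical.Name first and then check whether the whole lexeme is a keyword. The following Python sketch does this. It assumes longest-match scanning, so trueish is a Lexical.Name rather than true followed by ish, and it assumes that Lexical.Underscore is #5f and Lexical.Alphabetic is a Latin letter. The name classify_word is hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumption: the same name characters as in section 3.10.
NAME = re.compile(r"_*[A-Za-z][A-Za-z0-9_]*")
# Keywords take priority over Lexical.Name.
KEYWORDS = {"true": "Lexical.Boolean", "false": "Lexical.Boolean", "void": "Lexical.Void"}


def classify_word(text: str, pos: int = 0) -> Optional[Tuple[str, str, int]]:
    """Return (kind, lexeme, end) for the name-like word at pos, or None."""
    m = NAME.match(text, pos)
    if m is None:
        return None
    lexeme = m.group()
    # A keyword only wins when it spans the entire scanned lexeme.
    return KEYWORDS.get(lexeme, "Lexical.Name"), lexeme, m.end()
```

Checking the complete lexeme against the keyword table avoids splitting names such as voidable or false_flag into a keyword and a trailing name.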

3.14 decimal digit

The word Lexical.DecimalDigit is defined by

Lexical.DecimalDigit : /* A single Unicode character from the code point range U+0030 to U+0039. */

3.15 hexadecimal digit

The word Lexical.HexadecimalDigit is defined by

Lexical.HexadecimalDigit : /* A single Unicode character from the code point ranges U+0030 to U+0039, U+0061 to U+0066, and U+0041 to U+0046. */

3.16 alphanumeric

The word Lexical.Alphanumeric is reserved for future use.

3.17 period

The word Lexical.Period is defined by

/* #2e is also known as "FULL STOP" */
Lexical.Period : #2e