
Common Lexical Translations

The Common Lexical Translations specification defines a lexical translation; that is, it defines translation rules for how an input sequence of Unicode code points is translated into an output sequence of transformed and classified sequences of Unicode code points called words. The rules provided by the Common Lexical Translations specification are reused by other specifications within Michael Heilmann’s Arcadia.

1. Introduction


The rules laid out here are described in terms of the concepts and notations of the Context-Free Grammars specification (see https://michaelheilmann.com/specifications/context-free-grammars for more information).

A grammar incorporating these rules must have the set of all Unicode code points as its input alphabet. Furthermore, the grammar must ensure that ambiguities are resolved.

2. Standard Profile

The Standard Profile contains all rules defined in this specification. Other profiles may be added in future versions. Language designers are encouraged to create their own profiles.

2.1 word

The word Lexical.Word is defined by

Lexical.Word : Lexical.Period
Lexical.Word : Lexical.Semicolon
Lexical.Word : Lexical.Boolean
Lexical.Word : Lexical.Number
Lexical.Word : Lexical.String
Lexical.Word : Lexical.Void
Lexical.Word : Lexical.Name
Lexical.Word : Lexical.LeftCurlyBracket
Lexical.Word : Lexical.RightCurlyBracket
Lexical.Word : Lexical.LeftSquareBracket
Lexical.Word : Lexical.RightSquareBracket
Lexical.Word : Lexical.Comma
Lexical.Word : Lexical.Colon
Lexical.Word : Lexical.Whitespace
Lexical.Word : Lexical.Newline
Lexical.Word : Lexical.Comment

2.2 whitespace

The word Lexical.Whitespace is defined by

/* #9 is also known as "CHARACTER TABULATION" */
Lexical.Whitespace : #9
/* #20 is also known as "SPACE" */
Lexical.Whitespace : #20

2.3 line terminator

The word Lexical.LineTerminator is defined by

/* #a is also known as "LINE FEED (LF)" */
/* #d is also known as "CARRIAGE RETURN (CR)" */
Lexical.LineTerminator : #a {#d}
Lexical.LineTerminator : #d {#a}
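
The two rules above can be sketched as a small matcher. This is only an illustration under a stated assumption: curly brackets are read as "zero or more repetitions", which is how they are used in the other rules of this specification. The function name match_line_terminator is hypothetical and not part of the specification.

```python
LF = 0x0A  # LINE FEED (LF)
CR = 0x0D  # CARRIAGE RETURN (CR)


def match_line_terminator(code_points, start):
    """Return the index just past a line terminator at `start`, or None.

    Assumption: `{x}` means zero or more repetitions of x, so the rules
    read "LF followed by any number of CR" and "CR followed by any
    number of LF".
    """
    if start >= len(code_points):
        return None
    first = code_points[start]
    if first == LF:
        follower = CR
    elif first == CR:
        follower = LF
    else:
        return None
    end = start + 1
    # Consume the optional run of the opposite control character.
    while end < len(code_points) and code_points[end] == follower:
        end += 1
    return end
```

The function works on a list of integer code points, which matches the requirement that the input alphabet is the set of all Unicode code points.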

2.4 comments

A language using the Common Lexical Translations specification may use both single-line comments and multi-line comments. A Lexical.Comment is either a Lexical.SingleLineComment or a Lexical.MultiLineComment. Lexical.Comment is defined by

Lexical.Comment : Lexical.SingleLineComment
Lexical.Comment : Lexical.MultiLineComment

A Lexical.SingleLineComment starts with two solidi and extends to the end of the line. Lexical.SingleLineComment is defined by

/* #2f is also known as "SOLIDUS" */
Lexical.SingleLineComment :
#2f #2f
/* any sequence of characters except for Lexical.LineTerminator */

The Lexical.LineTerminator is not considered as part of the comment text.

A Lexical.MultiLineComment is opened by a solidus and an asterisk and closed by an asterisk and a solidus. Lexical.MultiLineComment is defined by

/* #2f is also known as SOLIDUS */
/* #2a is also known as ASTERISK */
Lexical.MultiLineComment :
#2f #2a
/* any sequence of characters except for #2a #2f */
#2a #2f

The #2f #2a and #2a #2f sequences are not considered as part of the comment text.
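
A minimal sketch of a comment scanner that follows the two comment rules may help to make the delimiter handling concrete. The function name scan_comment, the returned tuple layout, and the choice to raise an error on an unterminated multi-line comment are assumptions made for this illustration; the specification does not prescribe them.

```python
SOLIDUS = 0x2F
ASTERISK = 0x2A
LINE_TERMINATORS = (0x0A, 0x0D)  # LINE FEED (LF), CARRIAGE RETURN (CR)


def scan_comment(code_points, start):
    """Return (kind, text, end) for a comment at `start`, or None.

    `text` is the comment text without its delimiters and `end` is the
    index just past the comment. A single-line comment stops before the
    line terminator, which stays in the input for the caller.
    """
    n = len(code_points)
    if start + 1 >= n or code_points[start] != SOLIDUS:
        return None
    second = code_points[start + 1]
    if second == SOLIDUS:
        end = start + 2
        while end < n and code_points[end] not in LINE_TERMINATORS:
            end += 1
        return ("single-line", code_points[start + 2:end], end)
    if second == ASTERISK:
        i = start + 2
        while i + 1 < n:
            if code_points[i] == ASTERISK and code_points[i + 1] == SOLIDUS:
                return ("multi-line", code_points[start + 2:i], i + 2)
            i += 1
        # Assumption: an unterminated multi-line comment is an error.
        raise ValueError("unterminated multi-line comment")
    return None
```

Because the scan for the closing delimiter starts after the opening #2f #2a, the input #2f #2a #2f is not accepted as a complete comment.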

This implies:

2.5 parentheses

The words Lexical.LeftParenthesis and Lexical.RightParenthesis, respectively, are defined by

/* #28 is also known as "LEFT PARENTHESIS" */
Lexical.LeftParenthesis : #28
/* #29 is also known as "RIGHT PARENTHESIS" */
Lexical.RightParenthesis : #29

2.6 curly brackets

The words Lexical.LeftCurlyBracket and Lexical.RightCurlyBracket, respectively, are defined by

/* #7b is also known as "LEFT CURLY BRACKET" */
Lexical.LeftCurlyBracket : #7b
/* #7d is also known as "RIGHT CURLY BRACKET" */
Lexical.RightCurlyBracket : #7d

2.7 colon

The word Lexical.Colon is defined by

/* #3a is also known as "COLON" */
Lexical.Colon : #3a

2.8 square brackets

The words Lexical.LeftSquareBracket and Lexical.RightSquareBracket, respectively, are defined by

/* #5b is also known as "LEFT SQUARE BRACKET" */
Lexical.LeftSquareBracket : #5b
/* #5d is also known as "RIGHT SQUARE BRACKET" */
Lexical.RightSquareBracket : #5d

2.9 comma

The word Lexical.Comma is defined by

/* #2c is also known as "COMMA" */
Lexical.Comma : #2c
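
The words of sections 2.5 through 2.9 each consist of exactly one code point, so an implementation could collect them in a lookup table. The following is a sketch only; the names SINGLE_CODE_POINT_WORDS and classify_single are hypothetical.

```python
# Code point -> word name, taken from the rules of sections 2.5 to 2.9.
SINGLE_CODE_POINT_WORDS = {
    0x28: "Lexical.LeftParenthesis",
    0x29: "Lexical.RightParenthesis",
    0x7B: "Lexical.LeftCurlyBracket",
    0x7D: "Lexical.RightCurlyBracket",
    0x3A: "Lexical.Colon",
    0x5B: "Lexical.LeftSquareBracket",
    0x5D: "Lexical.RightSquareBracket",
    0x2C: "Lexical.Comma",
}


def classify_single(code_point):
    """Return the word name for a single-code-point word, or None."""
    return SINGLE_CODE_POINT_WORDS.get(code_point)
```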

2.10 name

The word Lexical.Name is defined by

Lexical.Name : {Lexical.Underscore} Lexical.Alphabetic {Lexical.NameSuffixCharacter}

/* #41 is also known as "LATIN CAPITAL LETTER A" */
/* #5a is also known as "LATIN CAPITAL LETTER Z" */
/* #61 is also known as "LATIN SMALL LETTER A" */
/* #7a is also known as "LATIN SMALL LETTER Z" */
Lexical.NameSuffixCharacter : /* The Unicode characters from #41 to #5a and from #61 to #7a. */

/* #30 is also known as "DIGIT ZERO" */
/* #39 is also known as "DIGIT NINE" */
Lexical.NameSuffixCharacter : /* The Unicode characters from #30 to #39. */

/* #5f is also known as "LOW LINE" */
Lexical.NameSuffixCharacter : #5f
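
The name rule can be sketched as a matcher over a list of code points. Two assumptions are made here because this section does not define the symbols: Lexical.Underscore is taken to be #5f, and Lexical.Alphabetic is taken to be the letters from #41 to #5a and from #61 to #7a. All function names are hypothetical.

```python
LOW_LINE = 0x5F


def is_alphabetic(cp):
    # Assumption: Lexical.Alphabetic covers A-Z (#41-#5a) and a-z (#61-#7a).
    return 0x41 <= cp <= 0x5A or 0x61 <= cp <= 0x7A


def is_name_suffix_character(cp):
    # Letters, digits (#30-#39), or LOW LINE (#5f).
    return is_alphabetic(cp) or 0x30 <= cp <= 0x39 or cp == LOW_LINE


def match_name(code_points, start):
    """Return the index just past a Lexical.Name at `start`, or None."""
    n = len(code_points)
    i = start
    # {Lexical.Underscore}: zero or more leading underscores.
    while i < n and code_points[i] == LOW_LINE:
        i += 1
    # Lexical.Alphabetic: exactly one letter is required.
    if i >= n or not is_alphabetic(code_points[i]):
        return None
    i += 1
    # {Lexical.NameSuffixCharacter}: zero or more suffix characters.
    while i < n and is_name_suffix_character(code_points[i]):
        i += 1
    return i
```

Under these assumptions a name such as `__foo_1` is accepted, while `_1` and `9a` are rejected because no letter follows the optional underscores.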

2.10 number literal

The word Lexical.Number is defined by

Lexical.Number : Lexical.IntegerNumber
Lexical.Number : Lexical.RealNumber
Lexical.IntegerNumber : [Lexical.Sign] Lexical.Digit {Lexical.Digit}
Lexical.RealNumber : [Lexical.Sign] Lexical.Period Lexical.Digit {Lexical.Digit} [Lexical.Exponent]
Lexical.RealNumber : [Lexical.Sign] Lexical.Digit {Lexical.Digit} [Lexical.Period {Lexical.Digit}] [Lexical.Exponent]
Lexical.Exponent : Lexical.ExponentPrefix [Lexical.Sign] Lexical.Digit {Lexical.Digit}

/* #2b is also known as "PLUS SIGN" */
Lexical.Sign : #2b
/* #2d is also known as "HYPHEN-MINUS" */
Lexical.Sign : #2d
/* #65 is also known as "LATIN SMALL LETTER E" */
Lexical.ExponentPrefix : #65
/* #45 is also known as "LATIN CAPITAL LETTER E" */
Lexical.ExponentPrefix : #45
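
The number rules translate fairly directly into regular expressions. The sketch below assumes that Lexical.Digit covers #30 to #39 and that Lexical.Period is #2e, neither of which is defined in this section. It also assumes one possible way to resolve the overlap between the rules: a text that fits Lexical.IntegerNumber is classified as an integer before the real number rules are tried. The name classify_number is hypothetical.

```python
import re

SIGN = r"[+-]?"                 # [Lexical.Sign]
EXPONENT = r"[eE][+-]?[0-9]+"   # Lexical.Exponent

# Lexical.IntegerNumber : [Sign] Digit {Digit}
INTEGER = re.compile(SIGN + r"[0-9]+")

# Lexical.RealNumber, both alternatives:
#   [Sign] Period Digit {Digit} [Exponent]
#   [Sign] Digit {Digit} [Period {Digit}] [Exponent]
REAL = re.compile(
    SIGN + r"(?:\.[0-9]+|[0-9]+(?:\.[0-9]*)?)" + r"(?:" + EXPONENT + r")?"
)


def classify_number(text):
    """Return the word name for a number literal, or None."""
    # Assumption: integers take precedence over real numbers.
    if INTEGER.fullmatch(text):
        return "Lexical.IntegerNumber"
    if REAL.fullmatch(text):
        return "Lexical.RealNumber"
    return None
```

Without the precedence assumption, a text such as `42` would fit both Lexical.IntegerNumber and the second Lexical.RealNumber rule, which is the kind of ambiguity a grammar incorporating these rules has to resolve.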

2.11 string literal

The word Lexical.String is defined by

Lexical.String : Lexical.SingleQuotedString
Lexical.String : Lexical.DoubleQuotedString

Lexical.DoubleQuotedString : Lexical.DoubleQuote {Lexical.DoubleQuotedStringCharacter} Lexical.DoubleQuote
Lexical.DoubleQuotedStringCharacter : /* any character except for Lexical.Newline, Lexical.DoubleQuote, and the characters from #0 to #1f */
Lexical.DoubleQuotedStringCharacter : Lexical.EscapeSequence
Lexical.DoubleQuotedStringCharacter : #5c Lexical.DoubleQuote
/* #22 is also known as "QUOTATION MARK" */
Lexical.DoubleQuote : #22

Lexical.SingleQuotedString : Lexical.SingleQuote {Lexical.SingleQuotedStringCharacter} Lexical.SingleQuote
Lexical.SingleQuotedStringCharacter : /* any character except for Lexical.Newline, Lexical.SingleQuote, and the characters from #0 to #1f */
Lexical.SingleQuotedStringCharacter : Lexical.EscapeSequence
Lexical.SingleQuotedStringCharacter : #5c Lexical.SingleQuote
/* #27 is also known as "APOSTROPHE" */
Lexical.SingleQuote : #27

/* #5c is also known as "REVERSE SOLIDUS", #75 is also known as "LATIN SMALL LETTER U" */
Lexical.EscapeSequence : #5c #75 Lexical.HexadecimalDigit Lexical.HexadecimalDigit Lexical.HexadecimalDigit Lexical.HexadecimalDigit
/* #5c is also known as "REVERSE SOLIDUS" */
Lexical.EscapeSequence : #5c #5c
/* #62 is also known as "LATIN SMALL LETTER B" */
Lexical.EscapeSequence : #5c #62
/* #66 is also known as "LATIN SMALL LETTER F" */
Lexical.EscapeSequence : #5c #66
/* #6e is also known as "LATIN SMALL LETTER N" */
Lexical.EscapeSequence : #5c #6e
/* #72 is also known as "LATIN SMALL LETTER R" */
Lexical.EscapeSequence : #5c #72
/* #74 is also known as "LATIN SMALL LETTER T" */
Lexical.EscapeSequence : #5c #74
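
The escape sequence rules can be sketched as a matcher. The sketch uses the code points that correspond to the character names given in the comments (LATIN SMALL LETTER B is #62 and LATIN SMALL LETTER T is #74) and assumes that Lexical.HexadecimalDigit, which is not defined in this section, covers 0-9, a-f, and A-F. The name match_escape_sequence is hypothetical.

```python
REVERSE_SOLIDUS = 0x5C
SMALL_U = 0x75  # LATIN SMALL LETTER U

# Second code point of the two-code-point escapes: \ b f n r t
SIMPLE_ESCAPES = {0x5C, 0x62, 0x66, 0x6E, 0x72, 0x74}

# Assumption: Lexical.HexadecimalDigit is 0-9, a-f, A-F.
HEX_DIGITS = set(b"0123456789abcdefABCDEF")


def match_escape_sequence(code_points, start):
    """Return the index just past an escape sequence at `start`, or None."""
    n = len(code_points)
    if start + 1 >= n or code_points[start] != REVERSE_SOLIDUS:
        return None
    selector = code_points[start + 1]
    if selector in SIMPLE_ESCAPES:
        return start + 2
    if selector == SMALL_U:
        digits = code_points[start + 2:start + 6]
        # Exactly four hexadecimal digits must follow the 'u'.
        if len(digits) == 4 and all(d in HEX_DIGITS for d in digits):
            return start + 6
    return None
```

Escaped quotation marks are not covered here because the grammar handles them in the Lexical.DoubleQuotedStringCharacter and Lexical.SingleQuotedStringCharacter rules rather than in Lexical.EscapeSequence.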

In the lexical translation, several transformations are performed upon the word: