SoFunction
Updated on 2025-05-12

Examples of EBNF introduction in rust

In the rust reference manual, there are a lot of similarities:

syntax
MacroInvocation :
	SimplePath ! DelimTokenTree
DelimTokenTree :
	  ( TokenTree* )
	| [ TokenTree* ]
	| { TokenTree* }
TokenTree :
	   Tokenexclude Delimiter(delimiters) | DelimTokenTree
MacroInvocationSemi :
	  SimplePath ! ( TokenTree* ) ;
	| SimplePath ! [ TokenTree* ] ;
	| SimplePath ! { TokenTree* }

Such an abstract thing (Macros - Rust Reference Manual). This kind of thing with a poor reading experience is obviously something created by the academic community. This article talks about how to read such things.

1. What is EBNF?

In computer science, when we describe the structure of a language (such as a programming language, data format, or configuration file), we need an accurate, unambiguous way.Extended Backus-Naur Form (EBNF)It is such a meta-language (language that describes other languages) markup method. It clearly expresses the grammar of a language through a series of strictly defined rules.

EBNF originated from and expandedBackus-Naur Form, BNF. BNF was originally designed by John Backus and Peter Naur to describe the syntax of the ALGOL 60 programming language. EBNF adds some convenient symbols to the BNF, making the syntax description more concise and easy to read.

2. Core concepts

Before learning EBNF, we need to understand several basic concepts:

  • **Rule/Production: The core of **EBNF is rules. Each rule defines how a specific syntax structure is composed of smaller parts. **Non-terminal Symbol: ** represents a syntactic concept or a structure that can be further decomposed. It is usually the name of other rules, which means "a structure that conforms to the definition of a certain rule can be placed here."
  • In EBNF, non-terminants are usually represented by descriptive names, e.g.expression, Statement, number. In some traditional BNFs, nonterminators are angle bracketed< >Surrounded, like<expression>, but in modern EBNF, it is more common to use names directly.
  • **Terminal Symbol: ** represents the smallest, non-redividable lexical unit or literal value in the language. They are specific characters or strings to which the syntax structure is finally implemented. For example, keywords in programming languages ​​(if, while), operator (+, -, *, /), numeric literals (123, 3.14), string literal ("hello") etc. are all terminal symbols. In EBNF, terminators are usually quoted"or'Surround, or write it out directly (if there is no ambiguity).

3. Detailed explanation of EBNF syntax symbols

EBNF organizes rules, non-terminals and terminators through some special symbols:

  • ::=or=(defined as):
    • This is the core operator defined by the rule, read as "defined as" or "consisted of...". It links the non-terminal character on the left (the structure to be defined) with the specific composition of the structure on the right.
    • Example:Number ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
    • This rule defines what a "number" can be.
  • |(or /select):
    • A vertical line indicates "or", which provides multiple possible options to the right of the rule definition. quilt|The separated items are mutually exclusive options.
    • Example:Boolean value::= "true" | "false"The "boolean" can be "true" or "false".
  • Parallel (sequentially connected):
    • When there are multiple symbols (terminal or non-terminal) arranged in sequence to the right of the rule definition, they indicate that these parts must appear in the given order.
    • Example:Assignment statement::= identifier "=" expressionAn "assignment statement" consists of an "identifier", followed by an equal sign terminator, followed by an "expression".
  • ()(Group):

Parentheses are used to enclose a set of symbols to form a logical unit. This allows repeat, optional operators, etc. to be applied to the entire group.

Example:Function call::= Function name "(" (parameter list)? ")"here(parameter list)?Enclosed in brackets to indicate the optional parameter list.

  • ?(Optional / Zero or once):
    • A question mark indicates that the previous symbol or grouping that is immediately adjacent is optional, that is, it can occur zero or once.
    • Example:Integer ::= ("+" | "-")? Sequence of numbersAn "integral" can have an optional sign followed by a "sequence of numbers".
  • *(Repeat zero or multiple times):
  • An asterisk indicates that the previous symbol or grouping that is immediately adjacent can occur zero, once or more times.
  • Example:Identifier ::= letters (letters | numbers)*An "identifier" begins with a "letter" and can be followed by zero or more "letters" or "numbers".
  • +(Repeat once or more):
    • The plus sign indicates that the previous symbol or grouping that is immediately adjacent to must appear at least once or multiple times.
    • Example:Number sequence ::= Number+A "number sequence" consists of at least one "number".
  • [...](Optional grouping - another common representation):
    • Some EBNF variants use square brackets[]to represent an optional part, equivalent to(...)?
    • Example:Parameter list ::= "[" Parameter ("," Parameter)* "]"(Assumed here[and]is the delimiter for the optional parameter list) or:Integer ::= ["+" | "-"] Sequence of numbers(Equivalent to the above("+" | "-")?)
  • {...}(Repeated grouping - another common representation):
    • Some EBNF variants use curly braces{}to represent a part that can be repeated zero or multiple times, equivalent to(...)*
    • Example:Comment ::= "/*" {arbitrary character} "*/"A "comment" by/*Start with zero or more "arbitrary characters" in the middle and*/Finish.
  • Representation of terminator:
    • As mentioned earlier, terminators are usually surrounded by quotes, e.g."if", '+'
    • If the terminator itself is a sequence of characters that does not cause ambiguity (for example, does not contain EBNF special symbols), it can sometimes be written directly.
  • Notes:
    • Different EBNF tools or specifications may have different annotation methods, common ones include(* This is a comment *)Or similar to C language//. This tutorial focuses on the grammar rules themselves.

4. How to read the EBNF rules

  • Find the rule definition:Each rule usually starts with a non-terminal character followed by::=or=
  • From left to right analysis:Read the symbol sequence to the right of the definition.
  • Understand the choice:meet|When knowing this represents one of several options.
  • Note the order:Symbols that are parallel means they must appear in order.
  • Identify duplicates and optional:pay attention to*, +, ?(or[], {}) means.
  • Processing grouping: ()(and sometimes[]or{}) will treat a part of the symbol as a whole.
  • recursion:The right side of the rule can contain the rule itself or other non-terminal characters, and this recursion is the key to defining complex structures such as nested expressions.
  • Distinguish between terminators and non-terminators:Terminators are the "atoms" of language, and non-terminators are "concepts" that need to be further expanded.

5. Example

Let's look at some examples of using EBNF definition:

Example 1: Simple Email Address

EBNF

Email address ::= username "@" domain name
username   ::= character+
domain name     ::= (子domain name ".")**domain name
子domain name   ::= character+
*domain name ::= character+
character     ::= letter | number | "_" | "-"
letter     ::= "a" | ... | "z" | "A" | ... | "Z" // Omit all lettersnumber     ::= "0" | ... | "9"             // 省略所有number
  • This example defines the structure of the "email address".
  • usernameandcharacterUsed+Indicates at least one.
  • domain nameIn-house(Subdomain ".")*Indicates that there can be zero or more subdomains separated by dots.

Example 2: Arithmetic Expressions (simplified version)

EBNF

expression ::= item (("+" | "-") item)*
item     ::= factor (("*" | "/") factor)*
factor   ::= number | "(" Expression ")"
number   ::= ("0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9")+
  • This example shows how to define an arithmetic expression containing operator priority and parentheses.
  • expressionanditemThe definition is recursive (factorIncluded inexpression), which allows nesting.
  • (("+" | "-") item)*Indicates that there can be zero or more "items" that are connected by plus or minus signs.

Example 3: A simple list structure

EBNF

List ::= "[" [element ("," element)*] "]"
element ::= Identifier | number
Identifier ::= letter (letter | number)*
// letter和number的定义同上
  • ListSurrounded by square brackets.
  • [Element ("," element)*]Indicates that the contents in square brackets are optional (because the whole is[]Package, here[]It is an optional symbol for EBNF, not a delimiter for lists—for clarity, we can also write it as(Element ("," element)*)?)。
  • If the list is not empty, there is at least one "element", and there may be zero or more "elements" separated by commas.

6. Advantages and limitations

  • advantage:
    • Accuracy:Ability to describe complex syntactic structures without ambiguity.
    • readability:Compared to other formal methods, EBNF is relatively easy to read and understand (especially with extended symbols).
    • standardization:Although there are some variations, the core concepts are common.
    • Tool support:There are many tools (such as parser generator ANTLR, YACC/Bison) that can automatically generate parsed code using EBNF or similar syntax definitions directly.
  • Limitations:
    • Context-independent:EBNF mainly describes context-free grammar. It usually does not directly deal with semantic rules that depend on the context (e.g. variables must be declared before use, type matching, etc.). These usually require additional semantic analysis to handle.
    • Ambiguity:Although the goal is unambiguous, it is possible to write an ambiguous EBNF rule (i.e., an input string can have multiple ways of parsing). Disambiguation sometimes requires rewriting rules or specific policies that depend on parser (such as operator priority and conjunction declarations).
    • Redundancy and complexity:Once there are too many contents defined in it, it is difficult to read.

This is the end of this article about EBNF introduction in rust. For more information about EBNF introduction in rust, please search for my previous articles or continue browsing the related articles below. I hope everyone will support me in the future!