Andy
2023-04-25 13:51:55 UTC
I am writing C grammar.
Grammar speed may be down caused by deep expression chain.
For example, simple "n=0" has 20 levels:
assignmentExpression
conditionalExpression
logicalOrExpression
logicalAndExpression
inclusiveOrExpression
exclusiveOrExpression
andExpression
eqaulityExpression
relationalExpression
shiftExpression
additiveExpression
multiplicativeExpression
castExpression
unaryExpression
postfixExpression
postfixExpressionLeft
atom
literal
IntegerConstant
DeciamlConstant
In other hand, Rust grammar for Antlr4
https://github.com/antlr/grammars-v4/blob/master/rust/RustParser.g4
has precedence definition:
expression
: outerAttribute+ expression # AttributedExpression // technical, remove left recursive
| literalExpression # LiteralExpression_
| pathExpression # PathExpression_
| expression '.' pathExprSegment '(' callParams? ')' # MethodCallExpression // 8.2.10
| expression '.' identifier # FieldExpression // 8.2.11
| expression '.' tupleIndex # TupleIndexingExpression // 8.2.7
| expression '.' 'await' # AwaitExpression // 8.2.18
..............................
If I do similar in C grammar:
expr
: atom # atom_
| '__extension__'? '(' compoundStatement ')' # groupedExpression
| expr '(' argumentExpressionList? ')' # postfixExpression
| expr '.' Identifier # postfixExpression
| expr '->' Identifier # postfixExpression
| expr '[' expr ']' # postfixExpression
parsing time increased by orders of magnitude: short .c files was processed longer a long file I didn't wait for it to finish (probably increasing nonlinear)
My ask:
form expr: expr operator expr
is always ambiguous and waste time or I something bad have written?
How is possible to reduce depth of chain?
Parsing time with Antlr is much longer than parsing + semantic actions with GCC. Probably big part of this time is long depth of expressions.
[If your parser is taking a lot of time, something is wrong. In the
compilers I know, the pareer is always a tiny part of the work, which
is dominated by the lexer which has to look at each character in the
input, and optimizers that have to make multiple passes over the whole
intermediate representation. -John]
Grammar speed may be down caused by deep expression chain.
For example, simple "n=0" has 20 levels:
assignmentExpression
conditionalExpression
logicalOrExpression
logicalAndExpression
inclusiveOrExpression
exclusiveOrExpression
andExpression
eqaulityExpression
relationalExpression
shiftExpression
additiveExpression
multiplicativeExpression
castExpression
unaryExpression
postfixExpression
postfixExpressionLeft
atom
literal
IntegerConstant
DeciamlConstant
In other hand, Rust grammar for Antlr4
https://github.com/antlr/grammars-v4/blob/master/rust/RustParser.g4
has precedence definition:
expression
: outerAttribute+ expression # AttributedExpression // technical, remove left recursive
| literalExpression # LiteralExpression_
| pathExpression # PathExpression_
| expression '.' pathExprSegment '(' callParams? ')' # MethodCallExpression // 8.2.10
| expression '.' identifier # FieldExpression // 8.2.11
| expression '.' tupleIndex # TupleIndexingExpression // 8.2.7
| expression '.' 'await' # AwaitExpression // 8.2.18
..............................
If I do similar in C grammar:
expr
: atom # atom_
| '__extension__'? '(' compoundStatement ')' # groupedExpression
| expr '(' argumentExpressionList? ')' # postfixExpression
| expr '.' Identifier # postfixExpression
| expr '->' Identifier # postfixExpression
| expr '[' expr ']' # postfixExpression
parsing time increased by orders of magnitude: short .c files was processed longer a long file I didn't wait for it to finish (probably increasing nonlinear)
My ask:
form expr: expr operator expr
is always ambiguous and waste time or I something bad have written?
How is possible to reduce depth of chain?
Parsing time with Antlr is much longer than parsing + semantic actions with GCC. Probably big part of this time is long depth of expressions.
[If your parser is taking a lot of time, something is wrong. In the
compilers I know, the pareer is always a tiny part of the work, which
is dominated by the lexer which has to look at each character in the
input, and optimizers that have to make multiple passes over the whole
intermediate representation. -John]