Java_clean_changes
Andreas Dangel edited this page Jan 23, 2023 · 34 revisions
This is a work-in-progress, detailed document about individual AST changes.
-
- Import and package declarations
- Modifier lists
- Flattened body declarations
- Module declarations
- todo: new node for anonymous class
-
- TODO:
- statements are flattened (no BlockStatement, Statement nodes)
- new node for ForeachStatement
- New nodes for ExpressionStatement, LocalClassStatement
- Improve try-with-resources grammar
- TODO:
-
- TODO:
- Literals
- Method calls, constructor call, array allocation
- Field access, array access, variable access
- this/super expression
- TypeExpression
- ...
- Binary operators are left-recursive
- Merge unary expressions
- TODO:
- What: Annotations are consolidated into a single node. SingleMemberAnnotation, NormalAnnotation and MarkerAnnotation are removed in favour of Annotation. The Name node is removed and replaced by a ClassOrInterfaceType.
- Why: Those node types encoded a purely syntactic distinction, which gave semantically equivalent annotations different possible representations. For example, `@A` and `@A()` are semantically equivalent, yet they were parsed as MarkerAnnotation and NormalAnnotation respectively. Similarly, `@A("")` and `@A(value="")` were parsed as SingleMemberAnnotation and NormalAnnotation respectively. This also makes parsing much simpler. The nested ClassOrInterfaceType is used to share the disambiguation logic.
- #2282 [java] Use single node for annotations
Code | Old AST | New AST |
---|---|---|
@A |
+ Annotation
  + MarkerAnnotation
    + Name "A" |
+ Annotation
  + ClassOrInterfaceType "A" |
@A() |
+ Annotation
  + NormalAnnotation
    + Name "A" |
+ Annotation "A"
  + ClassOrInterfaceType "A"
  + AnnotationMemberList |
@A(value="v") |
+ Annotation
  + NormalAnnotation
    + Name "A"
    + MemberValuePairs
      + MemberValuePair "value"
        + MemberValue
          + PrimaryExpression
            + PrimaryPrefix
              + Literal '"v"' |
+ Annotation "A"
  + ClassOrInterfaceType "A"
  + AnnotationMemberList
    + MemberValuePair "value" [@Shorthand=false()]
      + StringLiteral '"v"' |
@A("v") |
+ Annotation
  + SingleMemberAnnotation
    + Name "A"
    + MemberValue
      + PrimaryExpression
        + PrimaryPrefix
          + Literal '"v"' |
+ Annotation "A"
  + ClassOrInterfaceType "A"
  + AnnotationMemberList
    + MemberValuePair "value" [@Shorthand=true()]
      + StringLiteral '"v"' |
@A(value="v", on=true) |
+ Annotation
  + NormalAnnotation
    + Name "A"
    + MemberValuePairs
      + MemberValuePair "value"
        + MemberValue
          + PrimaryExpression
            + PrimaryPrefix
              + Literal '"v"'
      + MemberValuePair "on"
        + MemberValue
          + PrimaryExpression
            + PrimaryPrefix
              + Literal
                + BooleanLiteral [@True=true()] |
+ Annotation "A"
  + ClassOrInterfaceType "A"
  + AnnotationMemberList
    + MemberValuePair "value" [@Shorthand=false()]
      + StringLiteral '"v"'
    + MemberValuePair "on"
      + BooleanLiteral [@True=true()] |
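The effect of the consolidation can be sketched with a small stand-alone model. The classes below (`AnnotationNode`, `MemberPair`, `AnnotationDemo`) are hypothetical stand-ins invented for illustration and are not the PMD API. They only show that the shorthand and explicit forms end up with the same shape and differ in a single flag.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a member-value pair; not the PMD node class.
final class MemberPair {
    final String name;
    final String value;
    final boolean shorthand; // true for @A("v"), false for @A(value="v")

    MemberPair(String name, String value, boolean shorthand) {
        this.name = name;
        this.value = value;
        this.shorthand = shorthand;
    }
}

// Hypothetical stand-in for the single, unified annotation node.
final class AnnotationNode {
    final String typeName;
    final List<MemberPair> members = new ArrayList<>();

    AnnotationNode(String typeName) {
        this.typeName = typeName;
    }

    // Looks up a member by name, regardless of how it was written in source.
    String memberValue(String name) {
        for (MemberPair pair : members) {
            if (pair.name.equals(name)) {
                return pair.value;
            }
        }
        return null;
    }
}

final class AnnotationDemo {
    // Models @A("v")
    static AnnotationNode shorthandForm() {
        AnnotationNode a = new AnnotationNode("A");
        a.members.add(new MemberPair("value", "\"v\"", true));
        return a;
    }

    // Models @A(value="v")
    static AnnotationNode explicitForm() {
        AnnotationNode a = new AnnotationNode("A");
        a.members.add(new MemberPair("value", "\"v\"", false));
        return a;
    }
}
```

With one node type, a lookup such as `memberValue("value")` does not need to know which of the former three syntactic variants was used.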
- What: Annotations are now nested within the node to which they apply. For example, if a method is annotated, the Annotation node is now a child of a ModifierList inside the MethodDeclaration.
- Why: This fixes many inconsistencies: sometimes the annotations were inside the node, and sometimes they were somewhere in the parent, with no real structure.
- #1875 [java] Move annotations inside the node they apply to
Code | Old AST | New AST |
---|---|---|
Method @A
public void set(int x) { } |
|
|
Top-level type declaration @A class C {} |
|
|
Cast expression (@A T.@B S) expr |
N/A (Parse error) |
|
Cast expression with intersection (@A T & S) expr |
|
Notice |
Constructor call new @A T() |
|
|
Array allocation new @A int[0] |
|
|
Array type @A int @B[] |
N/A (parse error) |
Notice |
Type parameters <@A T, @B S extends @C Object> |
|
|
Enum constants enum {
@A E1, @B E2
} |
|
|
- What: The Type and ReferenceType nodes are turned into interfaces (ASTType and ASTReferenceType), implemented by concrete syntax nodes. See their javadoc for exactly which nodes implement them.
- Why:
- Some syntactic contexts only allow reference types, while others allow any kind of type. If you want to match all types of a program, then matching Type would be the intuitive solution. But in 6.0.x that would not have sufficed, since in some contexts no Type node was pushed, only a ReferenceType.
- Regardless of the original syntactic context, any reference type is a type, and searching for ASTType should yield all the types in the tree.
- Using interfaces makes it possible to abstract behaviour and to provide a nicer and safer API.
Code | Old AST | New AST |
---|---|---|
// in the context of a variable declaration
List<String> strs; |
|
ClassOrInterfaceType implements ASTReferenceType, which implements ASTType. |
- What: Additional nodes ArrayType, ArrayTypeDim, ArrayTypeDims, ArrayAllocation.
- Why: Support annotated array types (#997 Java8 parsing corner case with annotated array types)
Examples:
Code | Old AST | New AST |
---|---|---|
String[][] myArray; |
|
|
String @Annotation1[] @Annotation2[] myArray; |
n/a (parse error) |
|
new int[2][]
new @Bar int[3][2]
new Foo[] { f, g } |
|
|
- What: ClassOrInterfaceType is now left-recursive, and encloses its qualifying type.
- Why: To preserve the position of annotations and type arguments
Code | Old AST | New AST |
---|---|---|
Map.Entry<K,V> |
|
|
First<K>.Second.Third<V> |
|
|
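A minimal sketch of what "encloses its qualifying type" means, using a hypothetical `QualifiedType` class that is not part of PMD: each segment holds its qualifier as a child, so type arguments (and, in the real AST, annotations) stay attached to the segment they were written on.

```java
// Hypothetical model of a left-recursive qualified type; not the PMD node class.
final class QualifiedType {
    final QualifiedType qualifier; // enclosing type, or null for the leftmost segment
    final String simpleName;
    final String typeArgs;         // e.g. "<K>", or "" when the segment has none

    QualifiedType(QualifiedType qualifier, String simpleName, String typeArgs) {
        this.qualifier = qualifier;
        this.simpleName = simpleName;
        this.typeArgs = typeArgs;
    }

    // Rebuilds the source text by walking the qualifier chain from the left.
    String render() {
        String prefix = qualifier == null ? "" : qualifier.render() + ".";
        return prefix + simpleName + typeArgs;
    }

    // Models First<K>.Second.Third<V>: the outermost node is Third.
    static QualifiedType example() {
        QualifiedType first = new QualifiedType(null, "First", "<K>");
        QualifiedType second = new QualifiedType(first, "Second", "");
        return new QualifiedType(second, "Third", "<V>");
    }
}
```

In this model the root node is the rightmost segment (`Third<V>`), and `<K>` remains on `First` rather than being flattened into a single name image.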
- What:
- TypeArgument is removed. Instead, the TypeArguments node directly contains a sequence of Type nodes. To support this, the new node type WildcardType captures the syntax previously parsed as a TypeArgument.
- The WildcardBounds node is removed. Instead, the bound is a direct child of the WildcardType.
- Why: Because wildcard types are types in their own right, and having a node to represent them skims several levels of nesting off.
Code | Old AST | New AST |
---|---|---|
Entry<String, ? extends Node> |
|
|
List<?> |
|
|
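The same modelling choice exists in the JDK's reflection API, which may help as an analogy: `java.lang.reflect.WildcardType` is itself a `Type`, and its bounds hang directly off it. The sketch below uses only `java.lang.reflect` (not PMD); the class `WildcardDemo` and its field are made up for the example.

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.lang.reflect.WildcardType;
import java.util.List;

final class WildcardDemo {
    // A field whose single type argument is a bounded wildcard.
    static List<? extends Number> numbers;

    // Returns the first type argument of the field's declared generic type.
    static Type firstTypeArgument() {
        try {
            Field field = WildcardDemo.class.getDeclaredField("numbers");
            ParameterizedType listType = (ParameterizedType) field.getGenericType();
            return listType.getActualTypeArguments()[0];
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the upper bound, which is reachable directly from the wildcard.
    static Type upperBound() {
        WildcardType wildcard = (WildcardType) firstTypeArgument();
        return wildcard.getUpperBounds()[0];
    }
}
```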
- What: Remove the Name node in imports and package declaration nodes.
- Why: Name is a TypeNode, but it's equivalent to AmbiguousName in that it describes nothing about what it represents. The name in an import may represent a method name, a type name, a field name... It's too ambiguous to treat in the parser and could just be the image of the import, or package, or module.
- #1888 [java] Remove Name nodes in Import- and PackageDeclaration
Code | Old AST | New AST |
---|---|---|
import java.util.ArrayList;
import static java.util.Comparator.reverseOrder;
import java.util.*; |
|
|
package com.example.tool; |
|
|
- What: AccessNode is now based on a node: ModifierList. That node represents modifiers occurring before a declaration. It provides a flexible API to query modifiers, both explicit and implicit. All declaration nodes now have such a modifier list, even if it's implicit (no explicit modifiers).
- Why: AccessNode gave a lot of irrelevant methods to its subtypes. For example, `ASTFieldDeclaration::isSynchronized` makes no sense. Now these irrelevant methods no longer clutter the API. The API of ModifierList is both more general and more flexible.
- See #2259 [java] Rework AccessNode
Code | Old AST | New AST |
---|---|---|
Method @A
public void set(final int x, int y) { } |
|
|
Top-level type declaration public @A class C {} |
|
|
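The distinction between explicit and implicit modifiers can be sketched as follows. The names `Mod` and `ModifierListSketch` are hypothetical and chosen for illustration only; they do not reproduce the actual ModifierList API. The example models a method declared in an interface without any written modifiers, which is implicitly public and abstract.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical modifier enum, reduced to a few constants.
enum Mod { PUBLIC, PRIVATE, STATIC, FINAL, ABSTRACT, SYNCHRONIZED }

// Hypothetical model of a modifier list that is present on every declaration.
final class ModifierListSketch {
    private final Set<Mod> explicit = EnumSet.noneOf(Mod.class);
    private final Set<Mod> implicit = EnumSet.noneOf(Mod.class);

    ModifierListSketch(Set<Mod> explicit, Set<Mod> implicit) {
        this.explicit.addAll(explicit);
        this.implicit.addAll(implicit);
    }

    // Modifiers that were actually written in the source.
    Set<Mod> explicitModifiers() {
        return Collections.unmodifiableSet(explicit);
    }

    // Written modifiers plus those implied by the declaration context.
    Set<Mod> effectiveModifiers() {
        Set<Mod> all = EnumSet.noneOf(Mod.class);
        all.addAll(explicit);
        all.addAll(implicit);
        return all;
    }

    // Models `void run();` declared inside an interface.
    static ModifierListSketch interfaceMethod() {
        return new ModifierListSketch(
                EnumSet.noneOf(Mod.class),
                EnumSet.of(Mod.PUBLIC, Mod.ABSTRACT));
    }
}
```

A single query object like this replaces per-node boolean methods, so a field declaration never has to expose a meaningless "is synchronized" accessor.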
- What: Removes ClassOrInterfaceBodyDeclaration, TypeDeclaration, and AnnotationTypeMemberDeclaration. These were unnecessary since annotations are now nested (see Annotation nesting above).
- Why: This flattens the tree, makes it less verbose and simpler.
- #2300 [java] Flatten body declarations
Code | Old AST | New AST |
---|---|---|
public class Flat {
private int f;
} |
|
|
public @interface FlatAnnotation {
String value() default "";
} |
|
|
- What: Removes the generic Name node and uses ClassOrInterfaceType instead where appropriate. Also uses specific node types for the different directives (requires, exports, uses, provides).
- Why: Simplify queries, support type resolution
- #3890 [java] Improve module grammar
Code | Old AST | New AST |
---|---|---|
open module com.example.foo {
requires com.example.foo.http;
requires java.logging;
requires transitive com.example.foo.network;
exports com.example.foo.bar;
exports com.example.foo.internal to com.example.foo.probe;
uses com.example.foo.spi.Intf;
provides com.example.foo.spi.Intf with com.example.foo.Impl;
} |
|
|
- What: Simplify and align the grammar used for method and constructor declarations. The methods in an annotation type are now also method declarations.
- Why: The method declaration had a nested node "MethodDeclarator", which was not available for constructor declarations. This made it difficult to write rules that concern both methods and constructors without explicitly differentiating between the two.
- #2034 [java] Align method and constructor declaration grammar
Code | Old AST | New AST |
---|---|---|
public class Sample {
public Sample(int arg) throws Exception {
super();
greet(arg);
}
public void greet(int arg) throws Exception {
System.out.println("Hello");
}
} |
|
|
public @interface MyAnnotation {
int value() default 1;
} |
|
|
- What: Use FormalParameter only for method and constructor declarations. Lambdas use LambdaParameter, and catch clauses use CatchParameter.
- Why: FormalParameter's API is different from the other ones.
- FormalParameter must mention a type node.
- LambdaParameter can be inferred
- CatchParameter cannot be varargs
- CatchParameter can have multiple exception types (a UnionType now)
Code | Old AST | New AST |
---|---|---|
try {
} catch (@A IOException | IllegalArgumentException e) {
} |
|
|
(a, b) -> {}
c -> {}
(@A var d) -> {}
(@A int e) -> {} |
|
|
- What: A separate node type ReceiverParameter is introduced to differentiate it from formal parameters.
- Why: A receiver parameter is not a formal parameter, even though it looks like one: it doesn't declare a variable, and it doesn't affect the arity of the method or constructor. It's so rarely used that giving it its own node avoids matching it by mistake and simplifies the API and grammar of the ubiquitous FormalParameter and VariableDeclaratorId.
- #1980 [java] Separate receiver parameter from formal parameter
Code | Old AST | New AST |
---|---|---|
(@A Foo this, Foo other) |
|
|
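That a receiver parameter does not affect arity can be checked with plain Java reflection. This is a small illustration using `java.lang.reflect` only; the class `ReceiverDemo` is made up for the example.

```java
import java.lang.reflect.Method;

final class ReceiverDemo {
    // `this` is a receiver parameter; only `other` is a formal parameter.
    boolean sameAs(ReceiverDemo this, ReceiverDemo other) {
        return this == other;
    }

    // Number of formal parameters that reflection reports for sameAs.
    static int declaredParameterCount() {
        try {
            // The lookup signature lists a single parameter type.
            Method m = ReceiverDemo.class.getDeclaredMethod("sameAs", ReceiverDemo.class);
            return m.getParameterCount();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Callers also pass a single argument, as in `a.sameAs(b)`, which is why treating the receiver as just another FormalParameter would be misleading for rules that count parameters.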
- What: parse the varargs ellipsis as an ArrayType
- Why: this improves regularity of the grammar, and allows type annotations to be added to the ellipsis
Code | Old AST | New AST |
---|---|---|
(int... is) |
|
|
(int @A ... is) |
n/a (parse error) |
|
(int[]... is) |
|
|
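Treating the ellipsis as an array type matches how the language itself sees a varargs parameter. The following check uses only `java.lang.reflect`; `VarargsDemo` is a made-up class for illustration.

```java
import java.lang.reflect.Method;

final class VarargsDemo {
    // Declared with an ellipsis; the parameter's type is int[].
    static int sum(int... is) {
        int total = 0;
        for (int i : is) {
            total += i;
        }
        return total;
    }

    // Looks the method up by its array-typed signature.
    static Method sumMethod() {
        try {
            return VarargsDemo.class.getDeclaredMethod("sum", int[].class);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The method is found under the signature `sum(int[])`, and `isVarArgs()` records separately that the last array dimension was written as `...`.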
- What: Add a VoidType node to replace ResultType.
- Why: This means we don't need the ResultType wrapper when the method is not void, and the result type node is never null.
- #2715 [java] Add void type node to replace ResultType
Code | Old AST | New AST |
---|---|---|
void foo(); |
|
|
int foo(); |
|
|
- What: The AST representation of a try-with-resources statement has been simplified. It now uses LocalVariableDeclaration, unless the concise try-with-resources form is used.
- Why: Simpler integration of try-with-resources into the symbol table and type resolution.
- #1897 [java] Improve try-with-resources grammar
Code | Old AST | New AST |
---|---|---|
try (InputStream in = new FileInputStream(); OutputStream out = new FileOutputStream();) { } |
|
|
InputStream in = new FileInputStream();
try (in) {} |
|
|
- What: Merge AST nodes for postfix and prefix expressions into the single UnaryExpression node. The merged nodes are:
- PreIncrementExpression
- PreDecrementExpression
- UnaryExpression
- UnaryExpressionNotPlusMinus
- Why: Those nodes were asymmetric and inconsistently nested within UnaryExpression. By definition they are all unary, so using a single node is appropriate.
- #1890 [java] Merge different increment/decrement expressions
- #2155 [java] Merge prefix/postfix expressions into one node
Code | Old AST | New AST |
---|---|---|
++a;
--b;
c++;
d--; |
|
|
~a
+a |
|
|
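One way to picture the merged node is a single class parameterised by an operator that knows whether it is written before or after its operand. The enum `UnaryOperator` and class `UnarySketch` below are hypothetical and only illustrate the idea; they are not the PMD types.

```java
// Hypothetical operator enum covering the expressions from the table above.
enum UnaryOperator {
    PRE_INCREMENT("++", true),
    PRE_DECREMENT("--", true),
    POST_INCREMENT("++", false),
    POST_DECREMENT("--", false),
    COMPLEMENT("~", true),
    UNARY_PLUS("+", true);

    final String token;
    final boolean prefix; // true when the token precedes the operand

    UnaryOperator(String token, boolean prefix) {
        this.token = token;
        this.prefix = prefix;
    }
}

// Hypothetical single node type for every unary expression.
final class UnarySketch {
    final UnaryOperator operator;
    final String operand;

    UnarySketch(UnaryOperator operator, String operand) {
        this.operator = operator;
        this.operand = operand;
    }

    // Rebuilds the source text from the operator and its position.
    String render() {
        return operator.prefix ? operator.token + operand : operand + operator.token;
    }
}
```

A rule that cares about all increments can then test the operator instead of matching several node types at different nesting depths.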
- What: For each operator, there were separate AST nodes (like AdditiveExpression, AndExpression, ...). These are now unified into an InfixExpression, which gives access to the operator via `getOperator()` and to the operands (`getLhs()`, `getRhs()`). Additionally, the resulting AST is no longer flat, but a more structured tree.
- Why: Having different AST node types doesn't add any information that the operator doesn't already provide. Because expressions are now parsed left-recursively, the new structure makes the AST more JLS-like, which makes things easier for the type mapping algorithms. It also records which operands are used with which operator. This information was lost in PMD 6 when more than two operands were used and the tree was flattened.
- #1979 [java] Make binary operators left-recursive
Code | Old AST | New AST |
---|---|---|
int i = 1 * 2 * 3 % 4; |
|
|
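The left-recursive shape for `1 * 2 * 3 % 4` can be sketched with a tiny stand-alone model. `Expr`, `Num`, `Infix` and `InfixDemo` are hypothetical names for illustration, not PMD classes. Since `*` and `%` have the same precedence and associate to the left, folding from the left gives every node exactly two operands.

```java
// Hypothetical expression model; not the PMD node classes.
interface Expr {
    String render();
}

final class Num implements Expr {
    final int value;

    Num(int value) {
        this.value = value;
    }

    public String render() {
        return String.valueOf(value);
    }
}

final class Infix implements Expr {
    final Expr lhs;
    final String operator;
    final Expr rhs;

    Infix(Expr lhs, String operator, Expr rhs) {
        this.lhs = lhs;
        this.operator = operator;
        this.rhs = rhs;
    }

    // Parenthesises every node so the tree structure is visible in the text.
    public String render() {
        return "(" + lhs.render() + " " + operator + " " + rhs.render() + ")";
    }
}

final class InfixDemo {
    // Folds operands of equal precedence from the left.
    static Expr foldLeft(int[] operands, String[] operators) {
        Expr result = new Num(operands[0]);
        for (int i = 0; i < operators.length; i++) {
            result = new Infix(result, operators[i], new Num(operands[i + 1]));
        }
        return result;
    }

    // Models 1 * 2 * 3 % 4
    static Expr example() {
        return foldLeft(new int[] {1, 2, 3, 4}, new String[] {"*", "*", "%"});
    }
}
```

Rendering the example yields `(((1 * 2) * 3) % 4)`: the root is the `%` node, and each operator is paired with exactly the operands it applies to.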