Java LST examples
When building recipes, it's important to understand how OpenRewrite Lossless Semantic Trees (LSTs) correspond to code. You couldn't, for example, properly rename a variable with a recipe unless you knew that J.Identifier is the class used to represent a variable.
To help you get started on working with Java LSTs and OpenRewrite, this guide will:
- Explain how LSTs work at a high level
- Provide a high-level diagram that shows how LSTs relate to each other
- Provide a sample chunk of code and discuss how that code relates to different types of LSTs
- Teach you how to learn more about LSTs yourself
High-level LST explanation
In order to programmatically modify code without risking the introduction of syntactic or semantic errors, you must use a data structure that can accurately and comprehensively represent said code. OpenRewrite uses Lossless Semantic Trees (LSTs) for this purpose. Like other tree data structures, more complex LSTs are recursively composed of other, simpler LSTs.
For instance, a ClassDeclaration is an LST that defines a class. A typical class declaration will be composed of elements such as fields, methods, constructors, and inner classes. Each of those elements is itself an LST. So the term "LST" may refer to an entire, complete Java file or just one piece of it.
It's important to note that it is possible to manipulate LSTs to create code that will not compile. While OpenRewrite provides some safeguards against grammatically invalid transformations in its type system (such as not allowing import statements to be replaced with a method declaration), it is still possible to write code that is valid according to the Java grammar without being a valid, compilable program.
For example, nothing prevents you from modifying an LST so that a variable is used before it is defined. It is the responsibility of recipe authors to consider language semantics and the full range of possibilities when making changes. In accordance with the principle of Doing No Harm, always err on the side of leaving code untouched rather than making a risky change.
LST diagram
This diagram demonstrates how a simple Java class is represented as an LST. Note the hierarchical structure where LSTs are composed out of other LSTs.
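As a rough textual companion to the diagram, the composition of LSTs can be sketched in plain Java. This is a toy model, not OpenRewrite's API: the Node record and the LstOutlineSketch class are hypothetical names introduced only for illustration, and the outline is deliberately simplified (it omits imports, annotations, and most leaf elements).

```java
import java.util.List;

// Toy model only: Node is a hypothetical stand-in, not an OpenRewrite class.
class LstOutlineSketch {

    /** A tree element identified by the name of the LST type it stands for. */
    record Node(String kind, List<Node> children) {
        static Node of(String kind, Node... children) {
            return new Node(kind, List.of(children));
        }
    }

    /** Counts this element plus every element nested inside it. */
    static int count(Node node) {
        int total = 1;
        for (Node child : node.children()) {
            total += count(child);
        }
        return total;
    }

    /** Appends an indented outline, one element per line, two spaces per level. */
    static void render(Node node, int depth, StringBuilder out) {
        out.append("  ".repeat(depth)).append(node.kind()).append('\n');
        for (Node child : node.children()) {
            render(child, depth + 1, out);
        }
    }

    /** A simplified outline of a class with one field and one method. */
    static Node sample() {
        return Node.of("CompilationUnit",
            Node.of("ClassDeclaration",
                Node.of("Block",
                    Node.of("VariableDeclarations"),
                    Node.of("MethodDeclaration",
                        Node.of("Block")))));
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder();
        render(sample(), 0, out);
        System.out.print(out);
    }
}
```

Every element in the outline is itself a tree, which is why the same recursive method can be applied at the root or at any element below it.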
Java LST types
Below is a simple Java class whose entire purpose is to demonstrate different types of LSTs. Each of the following sections will highlight different parts of this code to demonstrate which chunks correspond to which LST. This listing of LST types is not exhaustive but should give you a good sense of the most common types.
package org.openrewrite;

import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

@AnAnnotation
public class A {

    List<Integer> a = new ArrayList<>();

    int foo() {
        int a = 1 + 2, b = 3, c;
        this.a = this.a.stream()
            .map(it -> it + 1)
            .collect(Collectors.toList());
        return a;
    }
}

@interface AnAnnotation {}
Binary
A Binary is an Expression with a left and right side, separated by an operator. Examples of operators include +, -, ||, &&, and more.
Block
A Block is a pair of curly braces and the Statements contained within. Blocks can be nested inside of each other.
ClassDeclaration
A ClassDeclaration contains all of the code for any Java class. Please note that a ClassDeclaration can be nested inside of another class such as with:
public class A {
    // ...
    private class B {
        // ...
    }
}
CompilationUnit
A CompilationUnit is the root of the Java LST. In order for an LST to represent valid Java code, all other elements must be contained inside of this.
Expression
An Expression is anything that returns a value. MethodInvocation, Identifier, and Binary are all examples of expressions. Please note that some LSTs such as MethodInvocation are both a Statement and an Expression.
In the below code, only some of the expressions are highlighted as expressions can often have many expressions inside of them and it would be too difficult to read if all of them were highlighted. For instance, import java.util.ArrayList is many expressions (java, util, ArrayList, java.util, and java.util.ArrayList).
FieldAccess
A FieldAccess is any fully qualified name. Often, these appear in package or import statements, but they can also appear in code as something like: this.foo.
Identifier
An Identifier is any name in the code (class names, variable names, method names, etc).
You can use J.Identifier.getFieldType() to tell what class the identifier is a field on. If null is returned, the identifier is not a field.
MethodDeclaration
A MethodDeclaration is the annotations, modifiers, return type, name, argument list, and body which together define a method on a class.
MethodInvocation
A MethodInvocation consists of a select expression, any defined type parameters, the method name, and its arguments. Method invocations have a somewhat surprising structure where the highest-level LST element consists of the select expression (everything to the left of the last dot) and the name on the right. Let's use the below code as an example to clarify this further.
- In the above code, the "highest-level" MethodInvocation (this.a.stream().map(it -> it + 1).collect(Collectors.toList())) would have these components:
    - Select expression: this.a.stream().map(it -> it + 1)
    - Name: collect
    - A single argument: Collectors.toList()
- The argument passed into the above MethodInvocation (Collectors.toList()) is itself a MethodInvocation that would have these components:
    - Select expression: Collectors
    - Name: toList
    - No arguments
- The select expression of the highest-level MethodInvocation (this.a.stream().map(it -> it + 1)) is also a MethodInvocation and it would have these components:
    - Select expression: this.a.stream()
    - Name: map
    - A single argument which is the lambda expression: it -> it + 1
- The select expression of the above MethodInvocation (this.a.stream()) is also a MethodInvocation that has these components:
    - Select expression: this.a
    - Name: stream
    - No arguments
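The nesting described above can be sketched with a toy model in plain Java. The Invocation and Source records below are hypothetical stand-ins that only mirror the select/name/arguments shape; they are not OpenRewrite's J.MethodInvocation, and non-invocation expressions are kept as plain source text for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these types are hypothetical, not OpenRewrite classes.
class SelectChainSketch {

    /** Anything that can act as a select or an argument in this sketch. */
    interface Expr {}

    /** Stand-in for any non-invocation expression, kept as source text. */
    record Source(String text) implements Expr {}

    /** Simplified method invocation: select expression, name, and arguments. */
    record Invocation(Expr select, String name, List<Expr> arguments) implements Expr {}

    /** Follows the select chain inward and collects the method names along the way. */
    static List<String> namesOutermostFirst(Invocation outermost) {
        List<String> names = new ArrayList<>();
        Expr current = outermost;
        while (current instanceof Invocation) {
            Invocation invocation = (Invocation) current;
            names.add(invocation.name());
            current = invocation.select();
        }
        return names;
    }

    /** Builds this.a.stream().map(it -> it + 1).collect(Collectors.toList()). */
    static Invocation sample() {
        Invocation stream = new Invocation(new Source("this.a"), "stream", List.of());
        Invocation map = new Invocation(stream, "map", List.of(new Source("it -> it + 1")));
        Invocation toList = new Invocation(new Source("Collectors"), "toList", List.of());
        return new Invocation(map, "collect", List.of(toList));
    }

    public static void main(String[] args) {
        System.out.println(namesOutermostFirst(sample())); // prints [collect, map, stream]
    }
}
```

The last call in the source text (collect) ends up as the outermost element, and the first call (stream) is the most deeply nested one. Collectors.toList() does not appear in the chain because it is an argument, not a select expression.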
NewClass
A NewClass represents the creation of an object via one of its constructors and the new keyword.
Statement
A Statement is anything that appears on its own line within a block. Statement elements are usually terminated with a semicolon. if, while, try, Block, return, and MethodInvocation are all examples of statements. Please note that some LST elements such as MethodInvocation are both Statements and Expressions.
In the below code, only some of the statements are highlighted as statements will often have many sub-statements and the diagram would become too difficult to read. For instance, List<Integer> a = new ArrayList<>() is a statement as well as new ArrayList<>().
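One way to picture an element being both a Statement and an Expression is as a type that implements two interfaces. The sketch below is a toy model in plain Java; the interfaces and records are hypothetical stand-ins named after the LST types in this guide, not OpenRewrite's own classes.

```java
// Toy model only: these types are hypothetical, not OpenRewrite classes.
class StatementExpressionSketch {

    interface Expression {}

    interface Statement {}

    /** Produces a value, but cannot stand alone as a statement in this sketch. */
    record Binary(String source) implements Expression {}

    /** Stands alone in a block, but does not produce a value. */
    record Return(Expression value) implements Statement {}

    /** Can stand alone (foo();) or produce a value (int x = foo();). */
    record MethodInvocation(String source) implements Statement, Expression {}

    /** Describes which of the two roles an element can play. */
    static String roles(Object element) {
        boolean isStatement = element instanceof Statement;
        boolean isExpression = element instanceof Expression;
        if (isStatement && isExpression) {
            return "statement and expression";
        }
        if (isStatement) {
            return "statement";
        }
        return isExpression ? "expression" : "neither";
    }

    public static void main(String[] args) {
        System.out.println(roles(new MethodInvocation("foo()")));
        System.out.println(roles(new Binary("1 + 2")));
        System.out.println(roles(new Return(new Binary("1 + 2"))));
    }
}
```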
VariableDeclarations
A VariableDeclarations contains the declaration of one or more variables of the same type, with or without initializing expressions for each variable.
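To make this concrete, the line int a = 1 + 2, b = 3, c; from the sample class is a single declaration with one shared type and three variables, the last of which has no initializer. The sketch below models that in plain Java. The Declarations and NamedVariable records are hypothetical and simplified (initializers are kept as source text); they are not OpenRewrite's J.VariableDeclarations.

```java
import java.util.List;
import java.util.stream.Collectors;

// Toy model only: these records are hypothetical, not OpenRewrite classes.
class VariableDeclarationsSketch {

    /** One declared variable; initializer is null when none is given. */
    record NamedVariable(String name, String initializer) {}

    /** One shared type followed by one or more variables. */
    record Declarations(String type, List<NamedVariable> variables) {

        /** Prints the declaration back as Java source text. */
        String print() {
            return type + " " + variables.stream()
                .map(v -> v.initializer() == null ? v.name() : v.name() + " = " + v.initializer())
                .collect(Collectors.joining(", ")) + ";";
        }
    }

    /** Models: int a = 1 + 2, b = 3, c; */
    static Declarations sample() {
        return new Declarations("int", List.of(
            new NamedVariable("a", "1 + 2"),
            new NamedVariable("b", "3"),
            new NamedVariable("c", null)));
    }

    public static void main(String[] args) {
        System.out.println(sample().print()); // prints int a = 1 + 2, b = 3, c;
    }
}
```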
Using the debugger to detect LSTs
If you want an easier and more visual way to examine LSTs, check out the TreeVisitingPrinter guide.
If you find yourself still unsure what makes up a particular LST or if you want to traverse the LST yourself, you can use the Java debugger to help you.
Let's use the sample code from above. You can make a simple recipe that doesn't do much aside from visit a CompilationUnit: gist. You can then make a test that checks that the code hasn't changed: gist.
Once you have that recipe and test class created, there are two main places where you'll want to add breakpoints:
- Inside of the visitCompilationUnit method in your recipe class OR
- Inside of the JavaVisitor.java class itself.
If you add a breakpoint in the visitCompilationUnit method, you'll find that the entire LST is defined in a variable called cu. In there, you can see things like classes, which is an ArrayList of ClassDeclaration elements. You could then expand the classes element and find a body that contains statements that contain VariableDeclarations and MethodDeclaration elements and so on.
If you add a breakpoint in the JavaVisitor.java class instead, you can pick which LST type you want to explore. For instance, if you wanted to see what a ClassDeclaration LST is in your code, you could add a breakpoint in the visitClassDeclaration method. You'll now find that the LST begins with a ClassDeclaration instead of CompilationUnit. You can step through the tree in the same way as before and you'll find everything else is the same. The benefit of this approach is that you can continue to resume the program and it'll stop at every point in the LST that a ClassDeclaration is visited.