The Lossless Semantic Tree: a compiler-accurate model of your code

Every codebase

Code isn’t text. Your tools read it that way.

Moderne sees tens of millions of repositories across its customers. The hard part was never finding a match. It’s finding every real one and changing it safely, everywhere at once.

Grep, the IDE, and an agent reading files all treat code as characters on a page. So every answer comes back approximate, and every tool re-derives the same understanding from scratch, file by file, run after run.

The compiler already knows far more. It knows the real type behind every symbol, what each call resolves to, and how your code connects to its dependencies. Capture that knowledge once and code stops being text. It becomes data you can query. Below, we descend from all of those repositories to a single line to show what that takes.

Just text

Characters on a page know nothing.

The cheapest way to read code is as text, the way grep and ripgrep do. It runs fast and stays completely blind to what any of it means.

Search for a class name and you’ll hit comments, strings, and unrelated variables, while missing every place the same thing arrives under an alias, behind a factory, or through a parent class.

Text has no notion of a type, where a symbol is declared, or what a call resolves to. Across tens of millions of repositories, “probably these, give or take” is not an answer you can act on.
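To make the failure modes concrete, here is a minimal sketch of a grep-style search in Java. The class name PaymentClient, the ClientFactory helper, and the four source lines are hypothetical, invented only for illustration.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical demo: a grep-style, whole-word text match over a few source lines.
class TextSearchDemo {

    // Matches the class name as a whole word, roughly what `grep -w` would do.
    static final Pattern NAME = Pattern.compile("\\bPaymentClient\\b");

    static boolean textMatches(String line) {
        return NAME.matcher(line).find();
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "// TODO: retire PaymentClient next quarter",      // comment: matches, but is not a use
            "log.info(\"PaymentClient is deprecated\");",      // string literal: matches, not a use
            "return new PaymentClient(config);",               // real use: matches
            "return ClientFactory.create(config);"             // real use behind a factory: missed
        );
        for (String line : lines) {
            System.out.println(textMatches(line) + "  " + line);
        }
    }
}
```

Two of the three hits are noise, and the one real use hidden behind a factory never shows up. The search has no way to tell these cases apart, because it only ever sees characters.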

The syntax tree

A parse tree sees shape. Not meaning.

Parse that text into a tree of nested nodes (methods, ifs, calls) and you have an AST. It helps with highlighting and formatting, but it still doesn’t understand the code.

An AST knows this node is a constructor call. It does not know which class is being constructed, what that class extends, or whether the interface it ultimately implements governs a security check. There’s no symbol table, no resolved types, and no links across files, and the formatting is thrown away.

This is the trap: people see a syntax tree and call it “context.” It’s structure without meaning, and you can’t safely rewrite structure. To know what code does, you have to resolve it the way the compiler did when it built it.
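A small Java sketch of why shape is not enough. Two JDK classes share the simple name Date, so an unresolved tree for new Date(0L) cannot say which one is being constructed. The class and method names below are hypothetical, and both constructors are written fully qualified only so they fit in one file.

```java
// In real code each of these would read `new Date(0L)` under a different import.
// A syntax tree records the name "Date"; only type resolution says which class it is.
class SimpleNameDemo {

    // What `new Date(0L)` means under `import java.util.Date;`
    static Object underUtilImport() {
        return new java.util.Date(0L);
    }

    // What the very same text means under `import java.sql.Date;`
    static Object underSqlImport() {
        return new java.sql.Date(0L);
    }

    public static void main(String[] args) {
        Object a = underUtilImport();
        Object b = underSqlImport();
        // Same simple name, different fully qualified types.
        System.out.println(a.getClass().getSimpleName() + " is " + a.getClass().getName());
        System.out.println(b.getClass().getSimpleName() + " is " + b.getClass().getName());
    }
}
```

The constructor-call node looks identical in both cases. The difference lives in the imports and the classpath, which is exactly the information a bare parse tree does not carry.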

The Lossless Semantic Tree

Everything the compiler knows.

The LST is your code as structured data, not text and not an AST. It’s a compiler-accurate model in which every symbol is resolved to the exact type it binds to.

Each identifier ties back to its real declaration. Each call binds to the exact method, generics and overloads included. Each type resolves through the full inheritance graph, across your files, out into your dependencies, and down into the language runtime itself, whether that’s the JDK, the .NET base class library, or Python’s standard library. The model captures the types, the dependencies, and the relationships between them.
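Overloads are a good example of what "binds to the exact method" means. In the sketch below (all names hypothetical), two call sites have the same shape, yet the Java compiler binds them to different methods based on the declared type of the argument. That binding is the kind of fact a type-attributed model records and a text or syntax match cannot see.

```java
class OverloadDemo {

    static String describe(Object value) { return "describe(Object)"; }

    static String describe(String value) { return "describe(String)"; }

    // Both calls pass the same object at runtime. The overload is chosen at
    // compile time from the declared type of the variable, not from the value.
    static String[] bindings() {
        String text = "hello";
        Object sameObject = text;
        return new String[] { describe(text), describe(sameObject) };
    }

    public static void main(String[] args) {
        for (String binding : bindings()) {
            System.out.println(binding);
        }
    }
}
```

A search for calls to describe finds both. Only resolved types can tell you which of the two methods each call actually reaches.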

It also stays lossless. Formatting, comments, and structure are preserved, so an automated edit reads like a person wrote it. Built once and reused across the org, it becomes a queryable dataset the whole estate can be searched and rewritten against. Compiler-grade meaning with file-perfect fidelity is what makes change safe at scale.

One file

One line opens the door.

Down at the level of a single file sits one ordinary-looking line: new AllowAllHostnameVerifier(). That one constructor call quietly switches off TLS hostname verification.

With that verifier in place, the application will accept a certificate issued for any host, which is exactly the gap a man-in-the-middle attack needs. It’s a real, legitimate class from Apache HttpClient, often dropped in to clear a certificate error during testing and never taken back out.

It compiles. It passes review. It looks deliberate. Nothing on the surface tells you this line is the open door, which is exactly why finding it across an entire estate takes more than a search for its name.
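To see what "switches off" amounts to, here is a stand-in written against the JDK's javax.net.ssl.HostnameVerifier interface. It is a sketch of the effect described above, not the Apache HttpClient source, and the class name and hostnames are made up for illustration.

```java
import javax.net.ssl.HostnameVerifier;

class AllowAllSketch {

    // Stand-in with the same effect as an allow-all verifier: approve every host.
    static final HostnameVerifier ALLOW_ALL = (hostname, session) -> true;

    public static void main(String[] args) {
        // A real TLS stack passes the live SSLSession. Passing null works here
        // only because this verifier never reads it.
        System.out.println(ALLOW_ALL.verify("bank.example", null));
        System.out.println(ALLOW_ALL.verify("attacker.example", null));
    }
}
```

Both hosts are accepted, because the verifier never compares the hostname to anything. The check still runs; it just cannot fail.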

The LST follows the type

Four types deep to prove it.

The LST didn’t match a name. It followed the type. Proving this one line disables the check meant resolving four layers of inheritance, from the class in your code down to the runtime interface it ultimately implements.

AllowAllHostnameVerifier extends AbstractVerifier, which implements X509HostnameVerifier, which extends javax.net.ssl.HostnameVerifier, the JDK interface whose verify() is the check being neutralized.

That chain runs through your code, the library, and the JDK. Because the match is on meaning rather than text, one recipe finds this exposure in every variant across every repository, and the fix lands the same way every time.
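The chain can be sketched in plain Java. The nested types below are hypothetical stand-ins that mirror the names above (the real ones ship in Apache HttpClient), while javax.net.ssl.HostnameVerifier is the real JDK interface. The walk uses runtime reflection purely for illustration; a compiler-accurate model would resolve the same relationships ahead of time, from source and classpath.

```java
import java.util.ArrayList;
import java.util.List;
import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.SSLSession;

class TypeChainDemo {

    // Hypothetical stand-ins mirroring the library types named in the text.
    interface X509HostnameVerifier extends HostnameVerifier {}

    static abstract class AbstractVerifier implements X509HostnameVerifier {}

    static class AllowAllHostnameVerifier extends AbstractVerifier {
        @Override
        public boolean verify(String hostname, SSLSession session) {
            return true;
        }
    }

    // Depth-first search for a path from start to target through the
    // superclass and the directly implemented or extended interfaces.
    static List<String> pathTo(Class<?> start, Class<?> target) {
        if (start == null) {
            return null;
        }
        if (start == target) {
            List<String> path = new ArrayList<>();
            path.add(start.getSimpleName());
            return path;
        }
        List<Class<?>> parents = new ArrayList<>();
        if (start.getSuperclass() != null) {
            parents.add(start.getSuperclass());
        }
        parents.addAll(List.of(start.getInterfaces()));
        for (Class<?> parent : parents) {
            List<String> rest = pathTo(parent, target);
            if (rest != null) {
                rest.add(0, start.getSimpleName());
                return rest;
            }
        }
        return null; // no route from start to target
    }

    public static void main(String[] args) {
        List<String> chain = pathTo(AllowAllHostnameVerifier.class, HostnameVerifier.class);
        System.out.println(String.join(" -> ", chain));
    }
}
```

A search keyed on the JDK interface rather than on one class name would also catch any other type that reaches HostnameVerifier through a different route, which is the sense in which the match is on meaning rather than text.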

One recipe runs everywhere

- return new AllowAllHostnameVerifier();
+ return new DefaultHostnameVerifier();

The edit is precise and deterministic. A human reviews it once, then it lands across every repository that carries it.

Tens of millions of repositories
1 repository
1 file: TlsConfig.java
1 line: the open door


Below the syntax tree

Meaning isn’t in the file.

A parser sees one file in isolation. To know what a single token really is, the LST resolves it the way the compiler would, against the dependencies on the classpath, the compiler itself, the language version, and the build that wired them all together.

An abstract syntax tree sees only the first of these inputs, the source in isolation. The LST resolves it against the other four (the dependencies, the compiler, the language version, and the build) into one provable fact. That is the difference between a parse tree and a model you can act on.

A foundation, not a parser

Don’t rebuild the foundation.
Build on it.

Getting one type right means modeling a language, its compiler, its build tools, and the way real projects wire dependencies together. Then you do it again for the next language, and keep every one of them correct as they all keep changing.

Tens of billions of lines parsed, resolved, and serialized: that is the experience, accumulated over years, that hardens a model this deep.

One model, across the estate

Java
JavaScript
TypeScript
C#
Python
Go
Kotlin
Groovy
Scala
COBOL
XML
YAML

Each with its own compiler, build tools, and dependency resolution, modeled once so you don’t have to.

The foundation layer

Find it once.
Fix it everywhere.

Semantic code search and deterministic, multi-repo refactoring, all built on the LST.


The Lossless Semantic Tree (LST) is a compiler-accurate, format-preserving model of source code, built on the open-source OpenRewrite engine. It adds full type attribution (symbol resolution, generics, inheritance, and transitive dependencies) while preserving formatting and comments, so automated semantic code search and code refactoring stay accurate and reviewable across thousands of repositories and many languages (Java, JavaScript, TypeScript, C#, Python, Go, Kotlin, Groovy, Scala, COBOL, XML, YAML).

Frequently asked questions

What is a Lossless Semantic Tree (LST)?

A Lossless Semantic Tree (LST) is a compiler-accurate code model that adds rich type and semantic attribution while preserving formatting, comments, and style. This fidelity enables precise code search and safe, automated refactoring across large, multi-repository codebases.

How is the LST different from an Abstract Syntax Tree (AST)?

Unlike a traditional AST, which strips out formatting, comments, and type information, the LST preserves every detail while adding rich semantic context. That full fidelity makes both search and transformation accurate, and ensures results are idiomatically consistent with how developers actually write code.

What is semantic code search?

Semantic code search goes beyond text queries. It lets developers search by code meaning (method calls, types, or API usage) rather than just strings. With the LST, semantic searches are precise and reliable across entire codebases.

How does the LST improve automated refactoring?

By combining semantic understanding with full code fidelity, the LST enables automated refactorings that are accurate, safe, and developer-friendly, producing clean, minimal diffs that teams trust and merge.

Can the LST handle multiple languages?

Yes. While the LST originated with Java, it also supports JavaScript, TypeScript, C#, Python, Go, Kotlin, Groovy, Scala, COBOL, and configuration and build formats like XML and YAML. It’s built for polyglot enterprise environments where modernization spans many languages.

Can the LST be used with AI-driven code analysis?

Yes. The LST gives AI models a rich, structured view of code with far deeper context than plain text or a traditional AST. That improves semantic search, code summarization, and issue detection, making AI more useful and reliable in large-scale code analysis.

Why does the LST preserve formatting?

When automated changes are idiomatically consistent with the original source, developers are much more likely to accept them. If comments vanish, developers lose trust in automation; if formatting changes unexpectedly, pull requests become noisy and unreviewable. The LST preserves it all (indentation, spacing, imports, and comments), so diffs stay clean and developers can focus on the change itself.
