Safe, efficient ways to use AI for mass-scale automated refactoring and analysis

Justine Gehring | April 23, 2024

At Moderne, we’re solving the hard problem of understanding and transforming large codebases at mass scale. Whether it’s migrating frameworks, modernizing for the cloud, remediating security vulnerabilities, or performing impact analyses, these are tasks that typically scale beyond individual developers working in single repositories. 

The Moderne Platform leverages the lossless semantic tree (LST), a full-fidelity representation of your code, and OpenRewrite rules-based recipes to automate code refactoring and analysis across your codebase. And now, we’re integrating the ‘out-of-the-box’ thinking that artificial intelligence (AI) can provide alongside those rules-based recipes to do even more, safely and efficiently. Moderne's integration of AI for large-scale refactoring of your existing codebase complements AI-assisted code authorship tools, like Copilot or Codeium, which operate within a single repository.

We are currently implementing AI in three ways within the platform: 

  • Human-computer interaction (HCI) looks at how AI can help users get the most out of a platform, computer, or program. HCI is there to improve the user experience. 
  • “A computer gets AI” is our way of describing a program, such as a recipe, that leads and decides when to use AI. The rules-based recipe asks the AI to assist with simple operations and supplies the specific context for each one. 
  • “AI gets a computer” is when the AI gets to use tools such as a checker, a calculator, or, in our case, recipes. It’s a nod to the article “ChatGPT gets a computer.” Chatbots such as ChatGPT get to use tools, for example a calculator or the ability to run Python code snippets. 

In this article, we’ll take a walk through the fertile ground of AI applied to mass-scale automated refactoring and analysis. Also, check out our community office hours where we discuss the different AI integrations at Moderne.

Human-computer interaction (HCI) with AI

The objective of Human-Computer Interaction (HCI) is to enhance usability, which encompasses the development of an intuitive platform as well as features that assist users in maximizing their interaction with an interface. AI-based interfaces mark a significant evolution in how humans interact with computers, standing as the third generation of human-computer interfaces. 

The first generation, the command-line interface (CLI), is challenging to use because it requires users to know specific commands. The second generation, the graphical user interface (GUI), simplifies interaction through visual menus, though it still requires users to learn how to navigate the system. The latest generation, AI-based interfaces, enables users to interact with computers in natural language, which makes them easier still to use. 

Moderne’s AI-based interface for searching recipes is not a chatbot, but rather an efficient semantic search that allows users to quickly find the recipe they are looking for. Our first-generation search was Lucene-based and required exact-match keywords, which often prevented users from finding the desired recipe. For example, a search for “Find method invocations” wouldn't return the recipe titled “Find method usages,” despite the terms “invocations” and “usages” being synonymous in this context. 

To address this, we developed an AI-powered search engine that, rather than relying on keyword matching, uses word representations and conducts searches based on concepts. This improvement is made possible through the use of embeddings, which are vector representations of concepts that allow for searching based on the distance between vectors. 

Our solution incorporates two embedding models: the first conducts a preliminary sweep of all recipes, while the second, more sophisticated but slower model, meticulously refines the results of the initial pass. You can read more about the full AI search pipeline in this blog.
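To make the two-pass idea concrete, here is a minimal sketch in Python. It is not Moderne's implementation: the "embedding models" are hypothetical stand-ins that count hand-assigned concepts (a coarse 3-dimensional one for the first pass, a finer 8-dimensional one for the second), and every recipe title other than “Find method usages” is only illustrative. What it shows is the shape of the pipeline: a cheap model ranks everything, and a costlier model reranks only the shortlist.

```python
import math

# Hypothetical concept vocabulary: synonyms share a dimension, which is what
# lets "invocations" match "usages" even though the keywords differ.
CONCEPTS = {
    "find": 0, "search": 0,
    "method": 1,
    "invocations": 2, "usages": 2,
    "remove": 3, "unused": 4, "imports": 5, "types": 6, "declarations": 7,
}

def embed(text, dims):
    """Toy embedding: count the concepts in the text, keeping `dims` dimensions."""
    vec = [0.0] * dims
    for word in text.lower().split():
        idx = CONCEPTS.get(word)
        if idx is not None and idx < dims:
            vec[idx] += 1.0
    return vec

def fast_embed(text):
    return embed(text, 3)   # stand-in for the quick first-pass model

def slow_embed(text):
    return embed(text, 8)   # stand-in for the slower, more precise model

def cosine(a, b):
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def two_stage_search(query, recipes, shortlist=3, top_k=1):
    # Stage 1: score every recipe with the fast model and keep a shortlist.
    q_fast = fast_embed(query)
    coarse = sorted(recipes, key=lambda r: cosine(q_fast, fast_embed(r)),
                    reverse=True)[:shortlist]
    # Stage 2: rerank only the shortlist with the slow model.
    q_slow = slow_embed(query)
    return sorted(coarse, key=lambda r: cosine(q_slow, slow_embed(r)),
                  reverse=True)[:top_k]

RECIPES = [
    "Find method usages",
    "Find method declarations",
    "Find types",
    "Remove unused imports",
]

results = two_stage_search("Find method invocations", RECIPES)
print(results)
```

The query shares no exact keyword with the winning title beyond “Find method,” yet the recipe ranks first because both phrases land on the same concept dimension.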

A computer gets AI: Recipes calling AI models

OpenRewrite recipes are an excellent way to use AI models, because the recipe can steer the model: it decides when the model is called and supplies the specific context the model needs.

Figure 1. Recipes calling AI models

One example of a recipe that uses AI produces a clustering visualization showing the similarity between method declarations in a codebase. The recipe first uses LSTs to find all method declarations in the codebase. It then calls an embedding model to get a vector representation of each method declaration. Clustering these vector representations offers a bird’s-eye view of the codebase. 

We apply KMeans clustering to the method declarations. This machine learning technique assigns each vector to the cluster whose center is closest to it, and each cluster is shown in its own color. We then project the high-dimensional vectors into a 2D space using UMAP, as seen in Figure 2. You can hover over the dots to reveal the method name.
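The assign-then-update loop behind k-means can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the recipe's code: the method names and their 2D vectors are made up (real embeddings have many more dimensions and would need a projection such as UMAP before plotting), and the first k vectors serve as starting centers so the result is deterministic.

```python
import math

def kmeans(vectors, k, iterations=10):
    """Plain k-means: returns (labels, centroids)."""
    # Deterministic start for this sketch: the first k vectors are the centers.
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins the cluster with the nearest center.
        labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

# Hypothetical method declarations with hand-made 2D "embeddings".
METHOD_VECTORS = {
    "getName": [0.9, 0.1],
    "getAge": [0.8, 0.2],
    "saveUser": [0.1, 0.9],
    "deleteUser": [0.2, 0.8],
}

labels, centroids = kmeans(list(METHOD_VECTORS.values()), k=2)
for name, label in zip(METHOD_VECTORS, labels):
    print(f"{name}: cluster {label}")
```

In this toy data the two getters end up in one cluster and the two persistence methods in the other, which is the kind of grouping the colors in the visualization convey.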

Figure 2. Clustering visualization built by AI

Another recipe that uses AI is “Fix mis-encoded French comments.” Mis-encoded characters are more than just a readability issue. They can cause the Javadoc compiler itself to fail, which means consumers of that code do not have ready access to documentation on the APIs they are using. We developed a recipe that uses a Python sidecar, which hosts the AI model, to predict a fix for those mis-encoded characters. Read more about it in our blog.
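To see why a fix has to be predicted rather than simply decoded, consider one way such damage can arise. The sketch below assumes the text was written as ISO-8859-1 and later read as UTF-8; the article does not say this is the cause in practice. Each accented letter becomes the replacement character U+FFFD, so the original letter is gone. The `predict_fix` function is a hypothetical word-list lookup standing in for the AI model, only to show where a prediction plugs in.

```python
REPLACEMENT = "\ufffd"  # what a UTF-8 decoder emits for bytes it cannot decode

def misdecode(text):
    """Simulate the damage: bytes written as ISO-8859-1, read back as UTF-8."""
    return text.encode("iso-8859-1").decode("utf-8", errors="replace")

# Hypothetical stand-in for the model: a tiny list of known French words.
KNOWN_WORDS = ["réponse", "paramètre"]

def predict_fix(word):
    """Return the unique known word that matches `word` when every U+FFFD is
    treated as a wildcard; leave the word unchanged if there is no unique match."""
    candidates = [
        known for known in KNOWN_WORDS
        if len(known) == len(word)
        and all(d == REPLACEMENT or d == o for d, o in zip(word, known))
    ]
    return candidates[0] if len(candidates) == 1 else word

damaged = misdecode("Retourne la réponse")
repaired = " ".join(predict_fix(w) for w in damaged.split(" "))
print(repr(damaged))
print(repaired)
```

A lookup table like this cannot cope with unseen words or ambiguous matches, which is the gap a trained model is meant to fill.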

AI gets a computer: AI using rules-based programs 

“AI gets a computer” refers to a chatbot that has access to tools. Tools make the chatbot behave more like an agent, can limit hallucinations, and generally improve the quality of its output significantly. 

Figure 3. When AI uses recipes as 'tools' to improve response quality

The LST functions as an effective retrieval engine, which can be used for retrieval augmented generation (RAG). Operationally, an embedding model called from within a recipe can use the LST to sample the code, gathering semantically diverse blocks to feed the LLM.
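One simple way to gather semantically diverse blocks is greedy farthest-point sampling over the embeddings, sketched below in Python. The article does not say which sampling strategy is used, so treat this as one plausible option; the method names and 2D vectors are invented for illustration.

```python
import math

def sample_diverse(embeddings, n):
    """Greedy farthest-point sampling: start from the first item, then keep
    adding the item whose nearest already-chosen neighbor is farthest away."""
    names = list(embeddings)
    chosen = [names[0]]
    while len(chosen) < min(n, len(names)):
        remaining = [m for m in names if m not in chosen]
        best = max(
            remaining,
            key=lambda m: min(
                math.dist(embeddings[m], embeddings[c]) for c in chosen
            ),
        )
        chosen.append(best)
    return chosen

# Hypothetical method declarations with hand-made 2D "embeddings".
EMBEDDINGS = {
    "getName": [0.9, 0.1],
    "getAge": [0.8, 0.2],
    "saveUser": [0.1, 0.9],
    "parseDate": [0.5, 0.5],
}

sample = sample_diverse(EMBEDDINGS, 3)
print(sample)
```

Here `getAge` is skipped because it sits right next to `getName`, so the three samples passed to the LLM cover three different regions of the codebase instead of near-duplicates.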

With this capability, we developed a recommendations tool designed to diagnose issues in the code and recommend recipe fixes specific to a codebase. It has three main stages: 

  1. The recommendation tool uses a recipe, which in turn uses an embedding model to sample a highly diverse set of the method declarations present in the codebase. For each sampled method declaration, a generative model (i.e., an LLM) makes tailored modernization recommendations.
  2. Using these recommendations, our in-house recipe search finds the recipes that would perform the required modernization, fix, or migration. This step is crucial for limiting the hallucinations a model might produce, since we match each recommendation to a safe and tested OpenRewrite recipe. 
  3. We then apply the recipe, and only recommend to the user recipes that produce changes. 
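The control flow of these three stages can be sketched as follows. Everything the function calls is a hypothetical stub passed in as a parameter: `recommend` stands in for the LLM, `search_recipe` for the recipe search, and `apply_recipe` for a recipe run that reports whether it changed anything. The recipe identifiers are made up as well. The point is the two filters: a suggestion with no matching recipe is dropped, and so is a recipe that produces no changes.

```python
def recommend_recipes(sampled_methods, recommend, search_recipe, apply_recipe):
    """Return recipes worth showing the user, in discovery order."""
    kept = []
    for method in sampled_methods:
        # Stage 1: free-text suggestions from the generative model.
        for suggestion in recommend(method):
            # Stage 2: map the suggestion to a tested recipe, or discard it.
            recipe = search_recipe(suggestion)
            if recipe is None or recipe in kept:
                continue
            # Stage 3: keep the recipe only if running it produces changes.
            if apply_recipe(recipe):
                kept.append(recipe)
    return kept

# Hypothetical stubs, for illustration only.
SUGGESTIONS = {
    "void log()": ["upgrade logging", "rewrite in quantum style"],
    "int add()": ["use diamond operator"],
}
CATALOG = {
    "upgrade logging": "example.UpgradeLogging",
    "use diamond operator": "example.UseDiamondOperator",
}
PRODUCES_CHANGES = {
    "example.UpgradeLogging": True,
    "example.UseDiamondOperator": False,
}

recommended = recommend_recipes(
    sampled_methods=["void log()", "int add()"],
    recommend=lambda method: SUGGESTIONS.get(method, []),
    search_recipe=CATALOG.get,
    apply_recipe=lambda recipe: PRODUCES_CHANGES[recipe],
)
print(recommended)
```

The invented suggestion “rewrite in quantum style” never reaches the user because no recipe matches it, and the diamond-operator recipe is filtered out because it changes nothing in this imaginary codebase.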

The whole pipeline for recommendations is illustrated in Figure 4. You can read more on our blog, where we explain how we came to build this pipeline.

Figure 4. AI-based recipe recommendations tool

Trusting AI for mass-scale refactoring

Figure 5 summarizes the three types of AI integration we currently offer at Moderne. 

We believe that AI and rules-based recipes can be used together, leveraging the strengths of both. And the secure, self-contained Moderne Platform is an ideal environment for AI to work safely across your entire codebase. Finally, a way to use AI at scale that you can trust.

Figure 5. Summary of AI integrations in the Moderne Platform

Contact us to learn more about how you can apply mass-scale automated refactoring to your codebase—and leverage the power of AI models in the process.