Large language models (LLMs) have enabled AI tools that help you write more code faster, but as these tools take on increasingly complex tasks, their limitations become apparent. Challenges such as understanding the nuances of programming languages, resolving complex dependencies, and adapting to codebase-specific context can lead to lower-quality code and create bottlenecks down the line.
Qodo, a member of the NVIDIA Inception program, is a multi-agent code integrity platform that enhances and automates software quality workflows with AI-powered agents for code writing, testing, and review.
A core principle of Qodo’s vision is the belief that AI can only drive meaningful improvements in software integrity if it operates with deep contextual awareness. Code is not written in isolation—it exists within complex architectures, evolving dependencies, and specific coding standards. For AI to effectively assist developers, it must understand not just the syntax but the intent, patterns, and broader structure of the codebase.
Qodo achieves this by building its AI agents on a foundation of advanced retrieval-augmented generation (RAG), indexing, and analysis, all powered by a state-of-the-art (SOTA) code embedding model. This specialized code embedding model, trained on NVIDIA DGX, enables AI to understand and analyze code more effectively and to retrieve highly relevant context, ensuring that LLMs can generate accurate code suggestions, reliable tests, and insightful code reviews.
The need for a code-specific pipeline
Large, complex codebases change constantly, and indexing for context is an ongoing process.
Qodo built a robust pipeline for continuously maintaining a fresh index to ensure that code and test generation is always based on the most current state of the repository. This pipeline includes retrieving files from a codebase, chunking retrieved files into segments, and adding natural language descriptions to embeddings to make it easier for the AI to understand the context.
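To make the flow concrete, here is a minimal sketch of such a continuous ingestion loop. All of the objects (`repo`, `vector_db`, `embed_model`, `describe_model`) are hypothetical stand-ins rather than Qodo's actual components, and the `chunk_source_file` helper is sketched in the next section:

```python
# A minimal sketch of a continuous indexing loop (hypothetical names,
# not Qodo's actual implementation).
import time

def index_repository(repo, vector_db, embed_model, describe_model):
    for path in repo.changed_files_since_last_index():
        source = repo.read(path)
        for i, chunk in enumerate(chunk_source_file(path, source)):
            # Attach a natural-language description so plain-English queries
            # can match code semantics, not just surface tokens.
            description = describe_model.summarize(chunk)
            vector = embed_model.encode(f"{description}\n{chunk}")
            vector_db.upsert(id=f"{path}#{i}", vector=vector,
                             metadata={"path": path, "code": chunk})

def run_forever(repo, vector_db, embed_model, describe_model, interval_s=300):
    while True:
        index_repository(repo, vector_db, embed_model, describe_model)
        time.sleep(interval_s)  # re-index periodically to keep the index fresh
```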
One challenge with code-specific RAG pipelines is chunking large code files into meaningful segments. Chunking is relatively simple for natural language text—paragraphs and sentences provide obvious boundary points for creating semantically meaningful segments.
However, naive chunking methods struggle with accurately delineating meaningful segments of code, leading to issues with boundary definition and the inclusion of irrelevant or incomplete information. Providing invalid or incomplete code segments to an LLM can actually hurt performance and increase hallucinations, rather than helping.
Qodo implements chunking using language-specific static analysis to recursively divide nodes into smaller chunks, then retroactively re-adds any critical context that was removed. This method produces chunks that respect the code structure, keeping related elements together.
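The following sketch illustrates the idea using Python's standard `ast` module for Python source only; a real multi-language pipeline would typically use a parser such as tree-sitter, and this sketch omits the retroactive step of re-adding removed context (for example, an enclosing class signature):

```python
# Structure-aware chunking sketch: split at AST node boundaries and recurse
# into oversized nodes instead of cutting at arbitrary line counts.
import ast

MAX_CHUNK_LINES = 60  # assumed size budget per chunk

def chunk_source_file(path: str, source: str) -> list[str]:
    lines = source.splitlines()
    tree = ast.parse(source, filename=path)
    chunks: list[str] = []

    def emit(node: ast.stmt) -> None:
        start, end = node.lineno - 1, node.end_lineno
        if end - start <= MAX_CHUNK_LINES or not hasattr(node, "body"):
            chunks.append("\n".join(lines[start:end]))
        else:
            # Recurse into an oversized node (e.g., a large class) so splits
            # land on child boundaries and related elements stay together.
            for child in node.body:
                emit(child)

    for top_level in tree.body:
        emit(top_level)
    return chunks
```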
Another key challenge is embedding. With many existing embedding models, it’s difficult to accurately retrieve relevant code examples based on natural language queries. Many general-purpose embedding models, such as E5, focus on language patterns rather than code-specific elements such as syntax, variable dependencies, control flow, and API usage. This leads to irrelevant or imprecise search results and code retrieval, but relevancy and precision are critical for enabling AI coding agents.
Figure 1. Qodo’s code-specific ingest pipeline
Embedding model for code
In retrieval-augmented generation (RAG) systems, embedding models play a crucial role by transforming text into high-dimensional vectors that capture semantic meaning. These embeddings are stored in a vector database and support efficient similarity searches, enabling the system to retrieve the most relevant information from a knowledge base when responding to user queries.
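A toy illustration of that retrieval step, with any embedding model standing in as the encoder: documents are ranked by the cosine similarity between their vectors and the query vector.

```python
# Rank document embeddings by cosine similarity to a query embedding.
import numpy as np

def cosine_top_k(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 5):
    # Normalize so the dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    top = np.argsort(-scores)[:k]  # indices of the k most similar documents
    return [(int(i), float(scores[i])) for i in top]
```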
Figure 2. General model of the embedding process used for similarity matching
For code-specific tasks, using an embedding model trained on both programming languages and software documentation is particularly strategic. Such a model can better understand the nuances of code syntax, function names, and technical terminology, leading to more accurate retrieval of relevant code snippets or documentation.
This specialized embedding model can significantly enhance the performance of RAG systems in software development contexts, helping to improve code completion, bug detection, and the generation of technical documentation.
Figure 3. Qodo pipeline from repository to generated dataset
Compared to LLMs, embedding models are significantly smaller, and thus can be more efficiently distributed across multiple GPUs. This enables better utilization of hardware resources and potentially faster training times. As such, they are more amenable to data-parallel distributed training, where the entire model is replicated on each GPU worker, and batches of data are split among multiple GPUs.
Qodo trained their embedding model on an NVIDIA DGX node with eight A100 80GB GPUs. Training at bfloat16 numeric precision enabled them to use large micro-batch sizes of 256, accelerating convergence and reducing training time. This is important for embedding models trained with a contrastive loss, especially when relying on in-batch negatives.
A larger batch size enables the model to sample a more diverse set of negative examples, which is essential for effective learning. That diversity helps the model better distinguish between similar and dissimilar instances, leading to improved representation quality.
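The following is a sketch of a contrastive (InfoNCE) loss with in-batch negatives, the standard formulation for this kind of training; it is not Qodo's exact training code, and the temperature value is an assumption:

```python
# InfoNCE loss with in-batch negatives for (query, code) embedding pairs.
import torch
import torch.nn.functional as F

def info_nce_loss(query_emb: torch.Tensor, pos_emb: torch.Tensor,
                  temperature: float = 0.05) -> torch.Tensor:
    # query_emb, pos_emb: (batch, dim) embeddings of matching pairs.
    q = F.normalize(query_emb, dim=-1)
    p = F.normalize(pos_emb, dim=-1)
    logits = q @ p.T / temperature  # (batch, batch) similarity matrix
    # The diagonal holds positives; every other column in a row is an
    # in-batch negative, so a batch of 256 yields 255 negatives per query.
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, labels)
```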
Qodo fine-tuned two embedding models, Qodo-Embed-1-1.5B and Qodo-Embed-1-7B, based on Qwen, an open-source LLM developed by Alibaba Cloud and designed to perform a wide range of AI tasks. Both models achieved SOTA accuracy, leading the Hugging Face MTEB::CoIR leaderboard in their respective size categories (Figure 4).
NDCG (normalized discounted cumulative gain) is the metric used to assess the quality of information retrieval.
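For reference, NDCG is simple to compute: the discounted gain of the ranking actually produced, divided by the gain of the ideal ordering, so a perfect ranking scores 1.0.

```python
# NDCG: discounted gain of the ranked results divided by the ideal ordering's gain.
import math

def ndcg(relevances: list[float], k: int | None = None) -> float:
    rels = relevances[:k] if k else relevances
    dcg = sum(r / math.log2(i + 2) for i, r in enumerate(rels))
    ideal = sorted(relevances, reverse=True)[:len(rels)]
    idcg = sum(r / math.log2(i + 2) for i, r in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

print(ndcg([3, 2, 1]))  # 1.0: most relevant result ranked first
print(ndcg([1, 2, 3]))  # ~0.79: relevant results ranked too low
```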
Figure 4. Qodo embedding model comparison
Case study: Internal code search
A recent collaboration between NVIDIA and Qodo shows the value of Qodo’s solution through a real-world use case. The work focused on enhancing the accuracy of one of NVIDIA’s internal RAG solutions (Genie) for searching private code repositories. The end goal was to perform LLM-based queries on NVIDIA’s internal code repositories to generate accurate and precise responses.
To achieve this goal, we substituted existing industry-standard components in the Genie project pipeline with Qodo’s specialized alternatives, improving the system’s ability to mine NVIDIA’s internal code repositories and yielding superior results.
The following Qodo components were integrated into the pipeline (a wiring sketch follows the list):
- Code indexer for GitLab and GitHub
- Code RAG Retriever
- Embedding model (Qodo-Embed-1-7B)
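A hypothetical wiring of these swapped-in components: the indexer keeps the vector store fresh, the embedding model encodes the developer's question, and the retriever supplies code context for the LLM. The names below are illustrative, not Qodo's actual API.

```python
# Query-time path of the code-specific RAG pipeline (illustrative names).
def answer_code_question(question: str, embed_model, retriever, llm) -> str:
    query_vec = embed_model.encode(question)          # Qodo-Embed-1-7B encoder
    hits = retriever.search(query_vec, top_k=8)       # code RAG retriever
    context = "\n\n".join(h["code"] for h in hits)
    prompt = (
        "Answer using only this code context:\n"
        f"{context}\n\nQuestion: {question}"
    )
    return llm.generate(prompt)                        # hypothetical LLM client
```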
As discussed earlier, one of the challenges of building a code-specific RAG solution is chunking. Large code files should be split at natural stopping points to ensure that text chunks are optimally sized for processing and storage. Otherwise, retrieval fails when critical code sections fall outside the chunk's context.
Figure 5. Code-specific RAG pipeline used for a case study
The final pipeline was integrated into NVIDIA’s internal Slack system, allowing expert C++ developers to ask detailed technical questions based on repositories of interest and receive robust responses.
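A hypothetical Slack entry point for such a setup, using the slack_bolt library; the tokens are placeholders, the `answer_code_question` call is the sketch from the previous section, and this is not NVIDIA's actual bot code:

```python
# Forward Slack mentions to the RAG pipeline and reply in-thread.
from slack_bolt import App
from slack_bolt.adapter.socket_mode import SocketModeHandler

app = App(token="xoxb-...")  # bot token placeholder

@app.event("app_mention")
def handle_mention(event, say):
    question = event["text"]
    # embed_model, retriever, and llm come from the pipeline sketched earlier.
    answer = answer_code_question(question, embed_model, retriever, llm)
    say(answer, thread_ts=event.get("ts"))  # reply in the question's thread

SocketModeHandler(app, "xapp-...").start()  # app-level token placeholder
```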
Figures 6 and 7 show example output from each pipeline: the original and the one built in collaboration with Qodo, respectively. The highlighted rectangle in Figure 6 shows that the original pipeline couldn't respond with the specific data points requested.
Figure 6. Example output of the NVIDIA Genie code-specific RAG system using Slack
Figure 7 shows a far more detailed result.
Figure 7. Example output of the Qodo code-specific RAG system using Slack
For testing, we used the following common public graphics SDKs:
We used Ragas to generate synthetic questions based on these datasets and compared the pipelines' responses for correctness and technical detail. Each of the three row pairs in Table 1 corresponds to one of the SDK repositories from the list. Each column represents a breakdown of questions, and the value in each cell is the number of correct responses based on faithfulness and answer relevancy. The final cell in each row shows the total number of correct responses.
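For reference, an evaluation of this shape can be set up with Ragas roughly as follows; the metric names match Ragas' public API (as of its 0.1-era releases), while the dataset contents here are placeholder strings:

```python
# Score pipeline responses on faithfulness and answer relevancy with Ragas.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

eval_data = Dataset.from_dict({
    "question": ["How does the SDK create a swap chain?"],  # synthetic question
    "answer": ["..."],        # pipeline's response
    "contexts": [["..."]],    # retrieved code chunks
    "ground_truth": ["..."],  # reference answer
})

scores = evaluate(eval_data, metrics=[faithfulness, answer_relevancy])
print(scores)
```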
Table 1. Comparison of internal RAG (NVIDIA Genie) to the Qodo-based RAG pipeline
Conclusion
You can experiment with Qodo’s embedding models, Qodo-Embed-1-1.5B and Qodo-Embed-1-7B, on Hugging Face.
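As a quick start, here is a minimal sketch using the sentence-transformers library; the Hugging Face repo id is assumed from the model name:

```python
# Compare a natural-language query against a code snippet with Qodo-Embed.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("Qodo/Qodo-Embed-1-1.5B")  # assumed HF repo id
query = "function that searches a sorted list"
code = "def binary_search(arr, target):\n    ..."
q_emb, c_emb = model.encode([query, code])
print(util.cos_sim(q_emb, c_emb))  # higher score means more relevant
```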
Startups that want to accelerate their work should explore the free benefits available through the NVIDIA Inception program.
For more information, see the following resources: