Large language models (LLMs) have enabled AI tools that help you write more code faster, but as these tools take on increasingly complex tasks, their limitations become apparent. Challenges such as understanding the nuances of programming languages, resolving complex dependencies, and adapting to codebase-specific context can lead to lower-quality code and create bottlenecks down the line.
Qodo, a member of the NVIDIA Inception program, is a multi-agent code integrity platform that enhances and automates software quality workflows with AI-powered agents for code writing, testing, and review.
A core principle of Qodo’s vision is the belief that AI can only drive meaningful improvements in software integrity if it operates with deep contextual awareness. Code is not written in isolation—it exists within complex architectures, evolving dependencies, and specific coding standards. For AI to effectively assist developers, it must understand not just the syntax but the intent, patterns, and broader structure of the codebase.
Qodo achieves this by building its AI agents on a foundation of advanced retrieval-augmented generation (RAG), indexing, and analysis, all powered by a state-of-the-art (SOTA) code embedding model. This specialized code embedding model, trained on NVIDIA DGX, enables AI to understand and analyze code more effectively and to retrieve highly relevant context, ensuring that LLMs can generate accurate code suggestions, reliable tests, and insightful code reviews.
The need for a code-specific pipeline
Large, complex codebases change constantly, and indexing for context is an ongoing process.
Qodo built a robust pipeline for continuously maintaining a fresh index to ensure that code and test generation is always based on the most current state of the repository. This pipeline includes retrieving files from a codebase, chunking retrieved files into segments, and adding natural language descriptions to embeddings to make it easier for the AI to understand the context.
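To make the flow concrete, here is a minimal sketch of such a continuous ingestion loop. All of the objects (`repo`, `vector_db`, `embed_model`, `describe_model`) are hypothetical stand-ins rather than Qodo's actual components, and the `chunk_source_file` helper is sketched in the next section:

```python
# A minimal sketch of a continuous indexing loop (hypothetical names,
# not Qodo's actual implementation).
import time

def index_repository(repo, vector_db, embed_model, describe_model):
    for path in repo.changed_files_since_last_index():
        source = repo.read(path)
        for i, chunk in enumerate(chunk_source_file(path, source)):
            # Attach a natural-language description so plain-English queries
            # can match code semantics, not just surface tokens.
            description = describe_model.summarize(chunk)
            vector = embed_model.encode(f"{description}\n{chunk}")
            vector_db.upsert(id=f"{path}#{i}", vector=vector,
                             metadata={"path": path, "code": chunk})

def run_forever(repo, vector_db, embed_model, describe_model, interval_s=300):
    while True:
        index_repository(repo, vector_db, embed_model, describe_model)
        time.sleep(interval_s)  # re-index periodically to keep the index fresh
```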
One challenge with code-specific RAG pipelines is chunking large code files into meaningful segments. Chunking is relatively simple for natural language text—paragraphs and sentences provide obvious boundary points for creating semantically meaningful segments.
However, naive chunking methods struggle with accurately delineating meaningful segments of code, leading to issues with boundary definition and the inclusion of irrelevant or incomplete information. Providing invalid or incomplete code segments to an LLM can actually hurt performance and increase hallucinations, rather than helping.
Qodo implements chunking using language-specific static analysis to recursively divide nodes into smaller chunks, then retroactively re-adds any critical context that was removed. This method produces chunks that respect the code structure, keeping related elements together.
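The following sketch illustrates the idea using Python's standard `ast` module for Python source only; a real multi-language pipeline would typically use a parser such as tree-sitter, and this sketch omits the retroactive step of re-adding removed context (for example, an enclosing class signature):

```python
# Structure-aware chunking sketch: split at AST node boundaries and recurse
# into oversized nodes instead of cutting at arbitrary line counts.
import ast

MAX_CHUNK_LINES = 60  # assumed size budget per chunk

def chunk_source_file(path: str, source: str) -> list[str]:
    lines = source.splitlines()
    tree = ast.parse(source, filename=path)
    chunks: list[str] = []

    def emit(node: ast.stmt) -> None:
        start, end = node.lineno - 1, node.end_lineno
        if end - start <= MAX_CHUNK_LINES or not hasattr(node, "body"):
            chunks.append("\n".join(lines[start:end]))
        else:
            # Recurse into an oversized node (e.g., a large class) so splits
            # land on child boundaries and related elements stay together.
            for child in node.body:
                emit(child)

    for top_level in tree.body:
        emit(top_level)
    return chunks
```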
Another key challenge is embedding. With many existing embedding models, it’s difficult to accurately retrieve relevant code examples based on natural language queries. Many general-purpose embedding models, such as E5, focus on language patterns rather than code-specific elements such as syntax, variable dependencies, control flow, and API usage. This leads to irrelevant or imprecise search results and code retrieval, but relevancy and precision are critical for enabling AI coding agents.
Figure 1. Qodo’s code-specific ingest pipeline
Embedding model for code
In retrieval-augmented generation (RAG) systems, embedding models play a crucial role by transforming text into high-dimensional vectors that capture semantic meaning. These embeddings are stored in a vector database and support efficient similarity searches, enabling the system to retrieve the most relevant information from a knowledge base when responding to user queries.
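A toy illustration of that retrieval step, with any embedding model standing in as the encoder: documents are ranked by the cosine similarity between their vectors and the query vector.

```python
# Rank document embeddings by cosine similarity to a query embedding.
import numpy as np

def cosine_top_k(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 5):
    # Normalize so the dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    top = np.argsort(-scores)[:k]  # indices of the k most similar documents
    return [(int(i), float(scores[i])) for i in top]
```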
Figure 2. General model of the embedding process used for similarity matching
For code-specific tasks, using an embedding model trained on both programming languages and software documentation is particularly strategic. Such a model can better understand the nuances of code syntax, function names, and technical terminology, leading to more accurate retrieval of relevant code snippets or documentation.
This specialized embedding model can significantly enhance the performance of RAG systems in software development contexts, helping to improve code completion, bug detection, and the generation of technical documentation.
Figure 3. Qodo pipeline from repository to generated dataset
Compared to LLMs, embedding models are significantly smaller, and thus can be more efficiently distributed across multiple GPUs. This enables better utilization of hardware resources and potentially faster training times. As such, they are more amenable to data-parallel distributed training, where the entire model is replicated on each GPU worker, and batches of data are split among multiple GPUs.
Qodo trained their embedding model on an NVIDIA DGX node with eight A100 80GB GPUs. Training at bfloat16 numeric precision enabled them to use large micro-batch sizes of 256, accelerating convergence and reducing training time. This is important for embedding models trained with a contrastive loss, especially when relying on in-batch negatives.
A larger batch size enables the model to sample a more diverse set of negative examples, which is essential for effective learning. That diversity helps the model better distinguish between similar and dissimilar instances, leading to improved representation quality.
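The following is a sketch of a contrastive (InfoNCE) loss with in-batch negatives, the standard formulation for this kind of training; it is not Qodo's exact training code, and the temperature value is an assumption:

```python
# InfoNCE loss with in-batch negatives for (query, code) embedding pairs.
import torch
import torch.nn.functional as F

def info_nce_loss(query_emb: torch.Tensor, pos_emb: torch.Tensor,
                  temperature: float = 0.05) -> torch.Tensor:
    # query_emb, pos_emb: (batch, dim) embeddings of matching pairs.
    q = F.normalize(query_emb, dim=-1)
    p = F.normalize(pos_emb, dim=-1)
    logits = q @ p.T / temperature  # (batch, batch) similarity matrix
    # The diagonal holds positives; every other column in a row is an
    # in-batch negative, so a batch of 256 yields 255 negatives per query.
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, labels)
```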
Qodo fine-tuned two embedding models, Qodo-Embed-1-1.5B and Qodo-Embed-1-7B, based on Qwen, an open-source LLM developed by Alibaba Cloud and designed to perform a wide range of AI tasks. Both models achieved SOTA accuracy, leading the Hugging Face MTEB::CoIR leaderboard in their respective size categories (Figure 4).
NDCG (normalized discounted cumulative gain) is the metric used to assess the quality of information retrieval.
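For reference, NDCG is simple to compute: the discounted gain of the ranking actually produced, divided by the gain of the ideal ordering, so a perfect ranking scores 1.0.

```python
# NDCG: discounted gain of the ranked results divided by the ideal ordering's gain.
import math

def ndcg(relevances: list[float], k: int | None = None) -> float:
    rels = relevances[:k] if k else relevances
    dcg = sum(r / math.log2(i + 2) for i, r in enumerate(rels))
    ideal = sorted(relevances, reverse=True)[:len(rels)]
    idcg = sum(r / math.log2(i + 2) for i, r in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

print(ndcg([3, 2, 1]))  # 1.0: most relevant result ranked first
print(ndcg([1, 2, 3]))  # ~0.79: relevant results ranked too low
```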
Figure 4. Qodo embedding model comparison
Case study: Internal code search
A recent collaboration between NVIDIA and Qodo shows the value of Qodo’s solution through a real-world use case. The work focused on enhancing the accuracy of one of NVIDIA’s internal RAG solutions (Genie) for searching private code repositories. The end goal was to perform LLM-based queries on NVIDIA’s internal code repositories to generate accurate and precise responses.
To achieve this goal, we substituted existing industry-standard components in the Genie project pipeline with Qodo’s specialized alternatives, improving the system’s ability to mine NVIDIA’s internal code repositories and yielding superior results.
The following Qodo components were integrated into the pipeline (a wiring sketch follows the list):
- Code indexer for GitLab and GitHub
- Code RAG Retriever
- Embedding model (Qodo-Embed-1-7B)
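A hypothetical wiring of these swapped-in components: the indexer keeps the vector store fresh, the embedding model encodes the developer's question, and the retriever supplies code context for the LLM. The names below are illustrative, not Qodo's actual API.

```python
# Query-time path of the code-specific RAG pipeline (illustrative names).
def answer_code_question(question: str, embed_model, retriever, llm) -> str:
    query_vec = embed_model.encode(question)          # Qodo-Embed-1-7B encoder
    hits = retriever.search(query_vec, top_k=8)       # code RAG retriever
    context = "\n\n".join(h["code"] for h in hits)
    prompt = (
        "Answer using only this code context:\n"
        f"{context}\n\nQuestion: {question}"
    )
    return llm.generate(prompt)                        # hypothetical LLM client
```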
As discussed earlier, one of the challenges of building a code-specific RAG solution is chunking. Large code files should be split at natural stopping points to ensure that text chunks are optimally sized for processing and storage. Otherwise, retrieval fails when critical code sections fall outside the chunk's context.
Figure 5. Code-specific RAG pipeline used for a case study
The final pipeline was integrated into NVIDIA’s internal Slack system, allowing expert C++ developers to ask detailed technical questions based on repositories of interest and receive robust responses.
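A hypothetical Slack entry point for such a setup, using the slack_bolt library; the tokens are placeholders, the `answer_code_question` call is the sketch from the previous section, and this is not NVIDIA's actual bot code:

```python
# Forward Slack mentions to the RAG pipeline and reply in-thread.
from slack_bolt import App
from slack_bolt.adapter.socket_mode import SocketModeHandler

app = App(token="xoxb-...")  # bot token placeholder

@app.event("app_mention")
def handle_mention(event, say):
    question = event["text"]
    # embed_model, retriever, and llm come from the pipeline sketched earlier.
    answer = answer_code_question(question, embed_model, retriever, llm)
    say(answer, thread_ts=event.get("ts"))  # reply in the question's thread

SocketModeHandler(app, "xapp-...").start()  # app-level token placeholder
```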
Figures 6 and 7 show example output from each pipeline: the original and the one built in collaboration with Qodo, respectively. The highlighted rectangle in Figure 6 shows that the original pipeline couldn't respond with the specific data points requested.
Figure 6. Example output of the NVIDIA Genie code-specific RAG system using Slack
Figure 7 shows a far more detailed result.
Figure 7. Example output of the Qodo code-specific RAG system using Slack
For testing, we used the following common public graphics SDKs:
We used Ragas to generate synthetic questions based on these datasets and compared the pipelines' responses for correctness and technical detail. Each of the three row pairs in Table 1 corresponds to one of the SDK repositories from the list. Each column represents a breakdown of questions, and the value in each cell is the number of correct responses based on faithfulness and answer relevancy. The final cell in each row shows the total number of correct responses.
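For reference, an evaluation of this shape can be set up with Ragas roughly as follows; the metric names match Ragas' public API (as of its 0.1-era releases), while the dataset contents here are placeholder strings:

```python
# Score pipeline responses on faithfulness and answer relevancy with Ragas.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

eval_data = Dataset.from_dict({
    "question": ["How does the SDK create a swap chain?"],  # synthetic question
    "answer": ["..."],        # pipeline's response
    "contexts": [["..."]],    # retrieved code chunks
    "ground_truth": ["..."],  # reference answer
})

scores = evaluate(eval_data, metrics=[faithfulness, answer_relevancy])
print(scores)
```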
Table 1. Comparison of internal RAG (NVIDIA Genie) to the Qodo-based RAG pipeline
Conclusion
You can experiment with Qodo’s embedding models, Qodo-Embed-1-1.5B and Qodo-Embed-1-7B, on Hugging Face.
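As a quick start, here is a minimal sketch using the sentence-transformers library; the Hugging Face repo id is assumed from the model name:

```python
# Compare a natural-language query against a code snippet with Qodo-Embed.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("Qodo/Qodo-Embed-1-1.5B")  # assumed HF repo id
query = "function that searches a sorted list"
code = "def binary_search(arr, target):\n    ..."
q_emb, c_emb = model.encode([query, code])
print(util.cos_sim(q_emb, c_emb))  # higher score means more relevant
```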
Startups that want to accelerate their work should explore the free benefits available through the NVIDIA Inception program.
For more information, see the following resources: