Accelerating Scientific Literature Reviews with NVIDIA NIM Microservices for LLMs

A Systematic Review of Large Language Models for Processing Papers

A well-crafted systematic review is often the initial step for researchers exploring a scientific field. For scientists new to the field, it provides a structured overview of the domain. For experts, it refines their understanding and sparks new ideas. In 2024 alone, 218,650 review articles were indexed in the Web of Science database, highlighting the importance of these resources in research.

Testing the Potential of LLMs for Processing Papers

As a research group specializing in physiological ecology within the ARC Special Research Initiative Securing Antarctica’s Environmental Future (SAEF), we embarked on writing a review of the literature on the global responses of non-vascular plants, such as mosses and lichens, to wind. However, we quickly faced a challenge: many relevant articles on wind-plant interactions failed to explicitly mention these keywords in their titles or abstracts, which are typically used as primary filters during literature screening. A comprehensive analysis of the topic required manually reading the full text of each article, a highly time-consuming process.

We decided to explore the potential of using LLMs to extract content specifically related to wind-plant interactions from the articles. To achieve this, we implemented a simple Q&A application based on the Llama 3.1 8B Instruct NIM microservice (Figure 1). This enabled us to build an initial prototype quickly.
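The core of such a Q&A step can be sketched in a few lines. The following is a minimal illustration, not our production code: it assumes the NIM microservice is running locally on its default port and exposes the standard OpenAI-compatible chat endpoint; the endpoint URL, file name, and prompt wording are placeholders.

```python
# Minimal sketch of one extraction query against a locally deployed
# Llama 3.1 8B Instruct NIM microservice, using only the standard library.
# URL, model name, and prompts are illustrative assumptions.
import json
import urllib.request

NIM_URL = "http://localhost:8000/v1/chat/completions"  # default local NIM port (assumption)
MODEL = "meta/llama-3.1-8b-instruct"

def build_request(article_text: str, question: str) -> dict:
    """Wrap an article and a research question into a chat-completion payload."""
    return {
        "model": MODEL,
        "temperature": 0.0,  # deterministic extraction rather than creative text
        "messages": [
            {"role": "system",
             "content": ("You extract information from scientific articles. "
                         "Answer only from the provided text.")},
            {"role": "user",
             "content": f"Article:\n{article_text}\n\nQuestion: {question}"},
        ],
    }

def ask(article_text: str, question: str) -> str:
    """Send one extraction question about one article to the local LLM."""
    req = urllib.request.Request(
        NIM_URL,
        data=json.dumps(build_request(article_text, question)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(ask(open("article.txt").read(),
              "How does wind affect the moss species studied?"))
```

Because NIM exposes an OpenAI-compatible API, the same sketch works unchanged whether the model runs locally or is swapped for another endpoint.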

Best-Performing Model

To determine the best-performing model, we tested a range of instruction-based and general-purpose LLMs from the NVIDIA API Catalog on a set of randomly selected articles. Each model was assessed for its accuracy and comprehensiveness in information extraction. Ultimately, we determined that Llama 3.1 8B Instruct was the most suitable for our needs.
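A comparison like this amounts to running the same extraction question through each candidate model and judging the answers side by side. The sketch below shows the shape of that loop; the candidate list is illustrative, and `query()` is a stand-in for the actual API Catalog call.

```python
# Hedged sketch of the model comparison: the same article and question go
# to every candidate, and the answers are collected for manual review.
# Model identifiers are illustrative; query() is a stand-in for a real call.
CANDIDATE_MODELS = [
    "meta/llama-3.1-8b-instruct",
    "meta/llama-3.1-70b-instruct",
    "mistralai/mistral-7b-instruct-v0.3",
]

def query(model: str, article_text: str, question: str) -> str:
    """Stand-in for one chat-completion call against the NVIDIA API Catalog."""
    return f"[{model}] answer"

def compare_models(article_text: str, question: str) -> dict:
    """Return each candidate model's answer so accuracy and
    comprehensiveness can be judged side by side."""
    return {m: query(m, article_text, question) for m in CANDIDATE_MODELS}

results = compare_models("full text of a sampled article",
                         "How is wind discussed in this study?")
```

Keeping the evaluation on a fixed set of randomly sampled articles, as described above, makes the per-model answers directly comparable.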

Processing Speed

We developed a Q&A module using Streamlit to answer user-defined, research-specific questions. To further improve processing speed, we implemented parallel processing of the prompts sent to the LLM engine and used KV caching, which accelerated computation by a factor of 6 when using 16 threads.
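The parallel dispatch itself is straightforward: because each prompt is an I/O-bound HTTP request, a thread pool keeps the GPU-backed server saturated while threads wait on responses. The sketch below illustrates the pattern under that assumption; `ask()` is a stub standing in for the real LLM request.

```python
# Sketch of parallel prompt dispatch with a thread pool.
# ask() is a stand-in for the real request to the LLM engine.
from concurrent.futures import ThreadPoolExecutor

def ask(article_text: str, question: str) -> str:
    """Stand-in for one LLM call; replace with the actual NIM request."""
    return f"answer about: {question}"

def answer_all(articles: list, question: str, threads: int = 16) -> list:
    """Query the LLM for every article concurrently.

    The requests are I/O-bound, so 16 threads keep the server busy
    while each thread waits on its HTTP response.
    """
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() preserves input order, so answers line up with articles
        return list(pool.map(lambda text: ask(text, question), articles))

answers = answer_all(["paper one text", "paper two text"],
                     "Does wind affect growth?")
```

KV caching complements this on the server side: the key/value tensors for the shared prompt prefix are computed once and reused across requests, rather than recomputed per prompt.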

Results

Thanks to these improvements, we significantly reduced the time required to extract information from our database of papers, with a total speedup of 25.25x compared to our initial implementation. Processing the entirety of our database now takes less than 30 minutes using two A100 80-GB GPUs and 16 threads. Compared to the traditional approach of manually reading and analyzing an entire article, which typically takes about one hour, this optimized workflow achieved a time savings of over 99% (Figure 3).

Future Directions

We’re currently refining our workflow to further accelerate processing. We’re also improving our user interface to provide easy access to more locally deployed LLMs and to make the tool more accessible to other researchers (Figure 4). We plan to implement the NVIDIA AI Blueprint for multimodal PDF data extraction to identify the most relevant articles for each research question and interact with those papers.

Summary

Our work at the Generative AI Codefest demonstrated the transformative potential of AI in accelerating systematic literature reviews. With NVIDIA NIM, we quickly moved from an idea to a working solution that significantly improves the process of information extraction from scientific papers. This experience highlights how AI can streamline research workflows, enabling faster and more comprehensive insights. LLMs have the potential to facilitate interdisciplinary research, empowering scientists to explore complex, multi-domain research fields more effectively.

FAQs

Q: What is the advantage of using LLMs for processing papers?
A: LLMs can significantly reduce the time required to extract information from a large database of papers, enabling faster and more comprehensive insights.

Q: How did you improve the processing speed?
A: We implemented parallel processing of the prompts sent to the LLM engine and used KV caching, which accelerated computation by a factor of 6 when using 16 threads.

Q: What is the future direction of this research?
A: We plan to refine our workflow to further accelerate the processing, improve our user interface, and implement the NVIDIA AI Blueprint for multimodal PDF data extraction.
