A Leap Forward in Sequence Modeling and Design from Molecular to Genome-Scale
The first Evo model from November 2024 represented a groundbreaking milestone in genomic research, introducing a foundation model capable of analyzing and generating biological sequences across DNA, RNA, and proteins.
Evo is known for its ability to operate across scales, from molecular to genomic, using a unified approach. Trained on 2.7M prokaryotic and phage genomes encompassing 300B nucleotide tokens, Evo delivered single-nucleotide resolution across many biological evolution and function tasks.
The core of Evo’s success is its innovative StripedHyena architecture (Figure 1), a hybrid model that includes 29 Hyena layers. Hyena is a type of deep learning layer designed to handle long sequences of information without relying on the attention mechanisms common to Transformer architectures. Instead, it uses a combination of convolutional filters and gates. This design addresses the long-sequence limitations of traditional Transformer models, enabling Evo to efficiently handle contexts of up to 131,072 tokens.
Figure 1. Evo and Evo 2 AI model architecture
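To make the idea of "convolutional filters and gates" more concrete, here is a minimal sketch of a single gated causal convolution over a one-dimensional signal. It is an illustration only, not the StripedHyena implementation: the function names (`causal_conv`, `gated_conv_layer`), the sigmoid gate, the scalar weights, and the short decaying filter are all assumptions chosen for readability. Long-convolution layers in practice use learned filters over many channels and much faster convolution routines.

```python
import math


def causal_conv(signal, kernel):
    """Causal 1D convolution: out[t] = sum over k of kernel[k] * signal[t - k]."""
    out = []
    for t in range(len(signal)):
        acc = 0.0
        # Only positions at or before t contribute, so the layer never sees the future.
        for k in range(min(len(kernel), t + 1)):
            acc += kernel[k] * signal[t - k]
        out.append(acc)
    return out


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_conv_layer(x, kernel, gate_weight=1.0, value_weight=1.0):
    """Hypothetical gated convolution: mix the sequence with a filter, then gate it.

    The scalar weights stand in for learned projections (an assumption).
    """
    values = [value_weight * xi for xi in x]         # "value" branch
    gates = [sigmoid(gate_weight * xi) for xi in x]  # input-dependent gate
    mixed = causal_conv(values, kernel)              # sequence mixing step
    return [g * m for g, m in zip(gates, mixed)]     # elementwise gating


# Toy input and an exponentially decaying filter (both made up for illustration).
kernel = [0.5 ** k for k in range(4)]
x = [1.0, 0.0, 0.0, 0.0, 2.0]
y = gated_conv_layer(x, kernel)
print([round(v, 4) for v in y])
```

Because the convolution is causal, changing a later input leaves every earlier output unchanged, which is the property a next-token sequence model needs.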
Evo’s predictive capabilities set new standards for biological modeling. It achieved competitive performance in several zero-shot tasks, including predicting the fitness effects of mutations on proteins, non-coding RNAs, and regulatory DNA, providing invaluable insights for synthetic biology and precision medicine.
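One common way to score mutations zero-shot with a sequence language model is to compare the model's log-likelihood of the mutated sequence with that of the wild-type sequence. The sketch below illustrates that idea with a toy first-order Markov model standing in for a genomic language model. Everything here is an assumption for illustration: the training sequences, the helper names (`train_markov`, `log_likelihood`, `mutation_score`), and the scoring recipe itself, which this post does not spell out for Evo.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def train_markov(sequences):
    """Count nucleotide transitions for a first-order Markov model (toy stand-in)."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def log_likelihood(seq, counts):
    """Sum of log P(next | previous), using add-one smoothing over the alphabet."""
    total = 0.0
    for prev, nxt in zip(seq, seq[1:]):
        row = counts[prev]
        denom = sum(row.values()) + len(ALPHABET)
        total += math.log((row[nxt] + 1) / denom)
    return total


def mutation_score(wild_type, position, new_base, counts):
    """Log-likelihood of the mutant minus that of the wild type.

    Negative scores mean the toy model finds the mutant less plausible.
    """
    mutant = wild_type[:position] + new_base + wild_type[position + 1:]
    return log_likelihood(mutant, counts) - log_likelihood(wild_type, counts)


# Made-up training data and wild-type sequence, for illustration only.
counts = train_markov(["ACGTACGTACGT", "ACGTACGT"])
wild_type = "ACGTACGT"
for base in ALPHABET:
    print(base, round(mutation_score(wild_type, 2, base, counts), 3))
```

No labeled fitness data is used at any point, which is what makes the approach zero-shot: the score comes entirely from what the model learned about sequence plausibility during training.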
Evo also demonstrated remarkable generative capabilities, designing functional CRISPR-Cas systems and transposons. These outputs were validated experimentally, proving that Evo could predict and design novel biological systems with real-world utility.
Evo represents a notable advancement in integrating multimodal and multiscale biological understanding into a single model. Its ability to generate genome-scale sequences and predict gene essentiality across entire genomes marked a leap forward in our capacity to analyze and engineer life.
Learning the Language of Life Across Evolution
Evo 2 is the next generation of this line of research in genomic modeling, building on the success of Evo with expanded data, enhanced architecture, and superior performance.
For more information about the API output for various prompts, see the NVIDIA BioNeMo Framework documentation.
Evo 2 and the Future of AI in Biology
AI is poised to rapidly transform biological research, enabling breakthroughs previously thought to be decades away. Evo 2 represents a significant leap forward in this evolution, introducing a genomic foundation model capable of analyzing and generating DNA, RNA, and protein sequences at unprecedented scales.
While Evo excelled in predicting mutation effects and gene expression in prokaryotes, the capabilities of Evo 2 are much broader, with enhanced cross-species generalization, making it a valuable tool for studying eukaryotic biology, human diseases, and evolutionary relationships.
Evo 2’s ability to perform zero-shot tasks, from identifying genes that drive cancer risk to designing complex biomolecular systems, showcases its versatility. Modeling long-context dependencies enables it to uncover patterns across genomes, providing multimodal and multiscale insights that are pivotal for advancements in precision medicine, agriculture, and synthetic biology.
As the field moves forward, models like Evo 2 set the stage for a future where AI not only deciphers life’s complexity but also helps design useful new biological systems. These advancements align with broader trends in AI-driven science, where foundation models are tailored to domain-specific challenges, unlocking previously unattainable capabilities. Evo 2’s contributions signal a future where AI becomes an indispensable partner in decoding, designing, and reshaping the living world.
Acknowledgments
We’d like to thank the following contributors to the described research for their help with the ideation, writing, and figure design for this post:
- Garyk Brixi, genetics Ph.D. student at Stanford
- Jerome Ku, machine learning engineer working with the Arc Institute
- Michael Poli, founding scientist at Liquid AI and computer science Ph.D. student at Stanford
- Greg Brockman, co-founder and president of OpenAI
- Eric Nguyen, bioengineering Ph.D. student at Stanford
- Brandon Yang, co-founder of Cartesia AI and computer science Ph.D. student at Stanford (on leave)
- Dave Burke, chief technology officer at the Arc Institute
- Hani Goodarzi, core investigator at the Arc Institute and associate professor of biophysics and biochemistry at the University of California, San Francisco
- Patrick Hsu, co-founder of the Arc Institute, assistant professor of bioengineering, and Deb Faculty Fellow at the University of California, Berkeley
- Brian Hie, assistant professor of chemical engineering at Stanford University, Dieter Schwarz Foundation Stanford Data Science Faculty Fellow, innovation investigator at the Arc Institute, and leader at the Laboratory of Evolutionary Design at Stanford
FAQs
Q: What is Evo 2?
A: Evo 2 is a next-generation genomic foundation model that can analyze and generate DNA, RNA, and protein sequences at unprecedented scales.
Q: What are the key features of Evo 2?
A: Key features of Evo 2 include enhanced cross-species generalization, the ability to model long-context dependencies, and improved performance over the original Evo.
Q: What are the potential applications of Evo 2?
A: Evo 2 can be used for studying eukaryotic biology, human diseases, and evolutionary relationships, as well as for precision medicine, agriculture, and synthetic biology.
Q: How does Evo 2 differ from other AI models?
A: Evo 2 is designed to handle long sequences of information and can integrate multimodal and multiscale biological understanding, making it a powerful tool for biological research.

