Accelerating AlphaFold2 with GPU-Accelerated MMseqs2

The Ability to Compare Sequences of Multiple Related Proteins

The ability to compare the sequences of multiple related proteins is a foundational task for many life science researchers. This is often done in the form of a multiple sequence alignment (MSA), and the evolutionary information retrieved from these alignments can yield insights into protein structure, function, and evolutionary history.

Overcoming Computationally Expensive MSA with NVIDIA CUDA

Traditional MSA tools rely on CPU-based implementations, which, while effective at sequential processing, can’t match GPU parallel processing capabilities.

The joint research team that developed MMseqs2-GPU was led by researchers at Seoul National University, Johannes Gutenberg University Mainz, and NVIDIA. Inspired by their previous work on CUDASW++4.0, they approached the problem by developing a novel, gapless prefiltering algorithm tailored to NVIDIA CUDA that enables efficient, high-sensitivity sequence comparisons at unparalleled speeds.

This GPU-accelerated prefilter replaces k-mer prefiltering in MMseqs2 with a gapless scoring approach. K-mer searches simplify comparisons by working on a coarse representation of the sequences; the gapless prefilter instead analyzes the full sequences directly. It employs a modified version of the classic Smith-Waterman-Gotoh algorithm that considers only diagonal dependencies, so no gaps are introduced into the alignment. Because each comparison is independent, the process runs efficiently across thousands of GPU cores.
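To make the idea of "diagonal dependencies only" concrete, here is a minimal sketch in plain Python. It is not the MMseqs2-GPU implementation and it runs on the CPU: the function names `gapless_score` and `prefilter` are hypothetical, and the simple match/mismatch scores are an assumption standing in for the substitution matrices a real protein search would use. Each cell depends only on its upper-left neighbor, which is what removes gaps from the alignment.

```python
def gapless_score(query, target, match=2, mismatch=-1):
    """Best local alignment score without gaps (illustrative scoring only)."""
    best = 0
    prev = [0] * (len(target) + 1)  # scores for the previous query residue
    for q in query:
        curr = [0] * (len(target) + 1)
        for j, t in enumerate(target, start=1):
            s = match if q == t else mismatch
            # Only the diagonal neighbor prev[j - 1] is consulted; there are
            # no horizontal or vertical moves, so no gaps can be opened.
            curr[j] = max(0, prev[j - 1] + s)
            if curr[j] > best:
                best = curr[j]
        prev = curr
    return best


def prefilter(query, targets, min_score):
    """Keep targets whose gapless score reaches min_score, best first."""
    hits = [(gapless_score(query, t), t) for t in targets]
    return sorted((h for h in hits if h[0] >= min_score), reverse=True)


if __name__ == "__main__":
    # Toy sequences, made up for illustration.
    for score, seq in prefilter("ACDE", ["ACDE", "WWWW", "ACDF"], min_score=5):
        print(score, seq)
```

Every query-target pair is scored independently of the others, which is the property that lets a GPU version spread the work across many cores; the candidates that survive this filter would then go on to the more expensive gapped alignment.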

MMseqs2-GPU Accelerates Protein Structure Prediction

The success of MMseqs2-GPU is rooted in redesigning gapless prefiltering and gapped alignment algorithms, leveraging CUDA to deliver rapid, affordable, and scalable sequence alignment that meets today’s bioinformatics research demands.

Because MMseqs2 is already integrated into many GPU-based computational pipelines, including structure prediction with ColabFold, users can expect an easy-to-swap-in performance boost:

Speed improvement

ColabFold using MMseqs2-GPU is 22x faster than AlphaFold2 with JackHMMER and HHblits for protein folding (Figure 4). In practice, this means that instead of waiting about 40 minutes to predict a protein structure using HHblits, JackHMMER, and AlphaFold2, you can get the same prediction in about one and a half minutes using ColabFold and MMseqs2-GPU.

Accelerated MMseqs2 Means Faster Discoveries

Looking ahead, the joint research team is focused on further refining the algorithms and the MMseqs2 integration, and on expanding its applications to protein clustering and cascaded database searches. The availability of MMseqs2-GPU means faster inputs to protein structure prediction, which can accelerate drug discovery, as we’ve illustrated here, along with a host of other applications (Figure 2).

Conclusion

The MMseqs2-GPU library is a significant advancement in the field of bioinformatics, enabling researchers to accelerate protein structure prediction and other applications. By leveraging the power of NVIDIA CUDA, MMseqs2-GPU provides a scalable and efficient solution for sequence alignment, making it an invaluable tool for the scientific community.

FAQs

Q: What is MMseqs2-GPU?
A: MMseqs2-GPU is a GPU-accelerated library for evolutionary information retrieval, designed to accelerate protein sequence alignment and protein structure prediction.

Q: What are the benefits of MMseqs2-GPU?
A: MMseqs2-GPU provides a significant speedup over traditional CPU-based implementations, enabling researchers to accelerate protein structure prediction and other applications, while maintaining comparable accuracy and sensitivity.

Q: How does MMseqs2-GPU work?
A: MMseqs2-GPU uses a novel, gapless prefiltering algorithm tailored to NVIDIA CUDA, which enables efficient, high-sensitivity sequence comparisons at unparalleled speeds.

Q: Is MMseqs2-GPU open source?
A: Yes, MMseqs2-GPU is open source and available online, providing an invaluable resource for researchers globally.

Q: Can I use MMseqs2-GPU for my research?
A: Yes, MMseqs2-GPU is designed to be used in a variety of research applications, including protein structure prediction, protein clustering, and cascaded database searches.
