DeepSeek-R1: A State-of-the-Art Reasoning Model for Agentic AI Inference
DeepSeek-R1: A Perfect Example of Test-Time Scaling
DeepSeek-R1 is an open model with state-of-the-art reasoning capabilities. Instead of producing a direct response, reasoning models like DeepSeek-R1 perform multiple inference passes over a query, applying chain-of-thought, consensus, and search techniques to arrive at the best answer.
The Importance of Test-Time Scaling
Performing this sequence of inference passes, using reasoning to arrive at the best answer, is known as test-time scaling. DeepSeek-R1 is a prime example of this scaling law, demonstrating why accelerated computing is critical for the demands of agentic AI inference.
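To make the idea concrete, the sketch below shows one common test-time scaling strategy, self-consistency: sample several independent chain-of-thought completions for the same query and take a majority vote over the final answers. The generate function here is a hypothetical stand-in for any reasoning-model call, with a toy stub so the example runs; the voting logic is the point, not DeepSeek-R1's specific method.

```python
import random
from collections import Counter

def generate(prompt: str, temperature: float = 0.7) -> str:
    """Hypothetical stand-in for a reasoning-model call.

    A real implementation would invoke DeepSeek-R1 and extract the final
    answer from its chain-of-thought output; here we simulate a noisy
    model that is right most of the time.
    """
    return random.choices(["42", "41", "43"], weights=[0.6, 0.2, 0.2])[0]

def self_consistency(prompt: str, num_samples: int = 8) -> str:
    """Sample several independent reasoning chains and majority-vote.

    More samples means more output tokens and more test-time compute,
    but typically a higher-quality final answer: test-time scaling.
    """
    answers = [generate(prompt) for _ in range(num_samples)]
    winner, _count = Counter(answers).most_common(1)[0]
    return winner

print(self_consistency("What is 6 * 7?"))
```

Raising num_samples spends more test-time compute for (typically) better answers, which is exactly the quality-versus-compute tradeoff discussed next.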
Model Quality and Test-Time Compute
As a model is allowed to iteratively “think” through a problem, it generates more output tokens and longer generation cycles, and model quality continues to scale with that additional compute. Significant test-time compute is therefore critical to enable both real-time inference and higher-quality responses from reasoning models like DeepSeek-R1, requiring larger inference deployments.
High-Quality Inference and Real-Time Performance
DeepSeek-R1 delivers leading accuracy on tasks that demand logical inference, reasoning, math, coding, and language understanding, while also delivering high inference efficiency.
Introducing the DeepSeek-R1 NIM Microservice
To help developers securely experiment with these capabilities and build their own specialized agents, the 671-billion-parameter DeepSeek-R1 model is now available as an NVIDIA NIM microservice on build.nvidia.com. The DeepSeek-R1 NIM microservice can deliver up to 3,872 tokens per second on a single NVIDIA HGX H200 system.
Getting Started with the DeepSeek-R1 NIM Microservice
Developers can test and experiment with the hosted application programming interface (API) today; the model is expected to be available soon as a downloadable NIM microservice, part of the NVIDIA AI Enterprise software platform. The DeepSeek-R1 NIM microservice simplifies deployments with support for industry-standard APIs. Enterprises can maximize security and data privacy by running the NIM microservice on their preferred accelerated computing infrastructure.
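As a rough illustration, a NIM endpoint that follows the industry-standard OpenAI-compatible chat API can be called as sketched below. The base URL, model identifier, and NVIDIA_API_KEY environment variable are assumptions for the hosted preview on build.nvidia.com; verify the exact values against the service's documentation.

```python
import os
from openai import OpenAI  # pip install openai

# Assumed endpoint and model ID for the hosted preview on build.nvidia.com;
# confirm both against the official documentation before use.
client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],  # hypothetical env var holding your key
)

completion = client.chat.completions.create(
    model="deepseek-ai/deepseek-r1",
    messages=[{"role": "user", "content": "Which is larger, 9.11 or 9.8?"}],
    temperature=0.6,
    max_tokens=4096,  # reasoning models emit long chains of thought before answering
)
print(completion.choices[0].message.content)
```

Because the interface is OpenAI-compatible, the same client code works whether the microservice is hosted on build.nvidia.com or deployed on an enterprise's own accelerated infrastructure, with only the base URL changing.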
Creating Customized DeepSeek-R1 NIM Microservices
Using NVIDIA AI Foundry with NVIDIA NeMo software, enterprises will also be able to create customized DeepSeek-R1 NIM microservices for specialized AI agents.
A Closer Look at the DeepSeek-R1 Architecture
DeepSeek-R1 is a large mixture-of-experts (MoE) model with an impressive 671 billion parameters and support for a long input context of 128,000 tokens. The model uses an exceptionally high expert count: each layer contains 256 experts, and each token is routed to eight separate experts in parallel for evaluation.
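The routing step can be sketched as follows. This is a minimal, generic top-k MoE router in Python with NumPy, not DeepSeek's actual implementation; the real model's gating details (shared experts, bias terms, load balancing) differ, and the dimensions below are toy values.

```python
import numpy as np

NUM_EXPERTS = 256  # experts per layer in DeepSeek-R1
TOP_K = 8          # experts each token is routed to

def route_token(hidden: np.ndarray, gate_weights: np.ndarray):
    """Pick the top-k experts for one token.

    hidden:       (d_model,) token representation
    gate_weights: (d_model, NUM_EXPERTS) router projection
    Returns the chosen expert indices and their normalized weights.
    Generic top-k gating sketch, not DeepSeek's exact mechanism.
    """
    logits = hidden @ gate_weights                      # (NUM_EXPERTS,)
    top_idx = np.argpartition(logits, -TOP_K)[-TOP_K:]  # unordered top-k indices
    scores = np.exp(logits[top_idx] - logits[top_idx].max())
    weights = scores / scores.sum()                     # softmax over selected experts
    return top_idx, weights

# Example: route one random token through a random gate (d_model = 16 for the toy).
rng = np.random.default_rng(0)
idx, w = route_token(rng.standard_normal(16), rng.standard_normal((16, NUM_EXPERTS)))
print(idx, w.round(3))
```

Because only 8 of 256 experts run per token, only a small fraction of the 671 billion parameters is active for any given token, which is what makes a model this large practical to serve.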
Frequently Asked Questions
Q: What is the main goal of the DeepSeek-R1 model?
A: The main goal of the DeepSeek-R1 model is to provide state-of-the-art reasoning capabilities for agentic AI inference.
Q: What is test-time scaling?
A: Test-time scaling refers to performing multiple inference passes over a query, using reasoning to arrive at the best answer, which requires additional compute at inference time.
Q: What is the importance of test-time compute for reasoning models like DeepSeek-R1?
A: Test-time compute is critical to enable both real-time inference and higher-quality responses from reasoning models like DeepSeek-R1, requiring larger inference deployments.
Q: How can I get started with the DeepSeek-R1 NIM microservice?
A: Developers can experience the DeepSeek-R1 NIM microservice and experiment with its API on build.nvidia.com; the model is expected to be available soon as a downloadable NIM microservice, part of the NVIDIA AI Enterprise software platform.