Deep Cogito Releases Novel Open Large Language Models with General Superintelligence Claims
Deep Cogito has released several open large language models (LLMs) that it says outperform similarly sized competitors and represent a step towards achieving general superintelligence.
Overview of Deep Cogito’s Open LLMs
The San Francisco-based company, which states its mission is “building general superintelligence,” has launched preview versions of LLMs in 3B, 8B, 14B, 32B, and 70B parameter sizes. Deep Cogito asserts that “each model outperforms the best available open models of the same size, including counterparts from LLaMA, DeepSeek, and Qwen, across most standard benchmarks.”
Iterated Distillation and Amplification (IDA)
Central to this release is a novel training methodology called Iterated Distillation and Amplification (IDA). Deep Cogito describes IDA as “a scalable and efficient alignment strategy for general superintelligence using iterative self-improvement”. This technique aims to overcome the inherent limitations of current LLM training paradigms, where model intelligence is often capped by the capabilities of larger “overseer” models or human curators.
The IDA process involves two key steps iterated repeatedly:
- Amplification: Using more computation to enable the model to derive better solutions or capabilities, akin to advanced reasoning techniques.
- Distillation: Internalising these amplified capabilities back into the model’s parameters.
Deep Cogito says this creates a “positive feedback loop” where model intelligence scales more directly with computational resources and the efficiency of the IDA process, rather than being strictly bounded by overseer intelligence.
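In concrete terms, the loop alternates a compute-heavy improvement step with a fine-tuning step. The Python-style sketch below illustrates the shape of such a loop; the function signatures and structure are illustrative assumptions for exposition only, not Deep Cogito’s actual training code.

```python
# Illustrative sketch of an Iterated Distillation and Amplification (IDA) loop.
# The names and structure are assumptions for exposition; they do not describe
# Deep Cogito's actual implementation.
from typing import Callable, List


def ida_loop(
    model,                     # current LLM checkpoint
    prompts: List[str],        # training queries
    amplify: Callable,         # spends extra compute (longer reasoning chains,
                               # search, self-critique) to derive better answers
    distil: Callable,          # fine-tunes the model on those amplified answers
    iterations: int,
):
    for _ in range(iterations):
        # Amplification: use more test-time computation to obtain solutions
        # better than the model's direct answers.
        amplified_answers = [amplify(model, p) for p in prompts]

        # Distillation: internalise those capabilities back into the weights,
        # so the next round starts from a stronger baseline -- the claimed
        # "positive feedback loop".
        model = distil(model, prompts, amplified_answers)
    return model
```

The key claim is that after distillation the model produces the amplified-quality answers directly, without the extra compute at inference time, so each iteration raises the baseline for the next amplification step.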
Capabilities and Performance of Deep Cogito Models
The newly released Cogito models – based on Llama and Qwen checkpoints – are optimised for coding, function calling, and agentic use cases. A key feature is their dual functionality: “Each model can answer directly (standard LLM), or self-reflect before answering (like reasoning models),” similar to the hybrid reasoning seen in models like Claude 3.7 Sonnet. However, Deep Cogito notes they “have not optimised for very long reasoning chains,” citing user preference for faster answers and the efficiency of distilling shorter chains.
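As a rough illustration, the snippet below shows how such a dual-mode model might be called with Hugging Face transformers; the model identifier and the system-prompt toggle string are assumptions for illustration and may not match the released checkpoints.

```python
# Hedged sketch: switching between direct and reasoning modes via the system
# prompt. The model ID and the exact toggle string are assumptions and may
# differ from the published model cards.
from transformers import pipeline

MODEL_ID = "deepcogito/cogito-v1-preview-llama-3B"  # assumed Hugging Face ID
generator = pipeline("text-generation", model=MODEL_ID, device_map="auto")

question = [{"role": "user", "content": "What is 17 * 24?"}]

# Direct mode: behaves like a standard instruction-tuned LLM.
direct = generator(question, max_new_tokens=256)

# Reasoning mode: a system prompt asks the model to self-reflect before
# answering (the toggle string here is an assumption for illustration).
reasoning = generator(
    [{"role": "system", "content": "Enable deep thinking subroutine."}] + question,
    max_new_tokens=1024,
)

print(direct[0]["generated_text"][-1]["content"])
print(reasoning[0]["generated_text"][-1]["content"])
```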
Benchmark Results
Deep Cogito provides extensive benchmark results comparing the Cogito models against size-equivalent state-of-the-art open models in both direct (standard) and reasoning modes. Across various benchmarks (MMLU, MMLU-Pro, ARC, GSM8K, MATH, etc.) and model sizes (3B, 8B, 14B, 32B, 70B), the Cogito models generally show significant performance gains over counterparts like Llama 3.1/3.2/3.3 and Qwen 2.5, particularly in reasoning mode.
Conclusion
This release is labelled a preview, with Deep Cogito stating they are “still in the early stages of this scaling curve”. They plan to release improved checkpoints for the current sizes and introduce larger mixture-of-experts (MoE) models (109B, 400B, 671B) “in the coming weeks / months”. All future models will also be open-source.
FAQs
Q: What is Iterated Distillation and Amplification (IDA)?
A: IDA is a novel training methodology that alternates amplification (using extra computation to derive better solutions) with distillation (internalising those capabilities into the model’s parameters), using iterative self-improvement to overcome the limitations of current LLM training paradigms.
Q: What are the benefits of IDA?
A: IDA creates a “positive feedback loop” where model intelligence scales more directly with computational resources and the efficiency of the IDA process.
Q: What are the Cogito models optimised for?
A: The Cogito models are optimised for coding, function calling, and agentic use cases.
Q: What are the benchmark results?
A: The Cogito models generally show significant performance gains over counterparts like Llama 3.1/3.2/3.3 and Qwen 2.5, particularly in reasoning mode.
Q: What is the next step for Deep Cogito?
A: Deep Cogito plans to release improved checkpoints for the current sizes and introduce larger MoE models (109B, 400B, 671B) “in the coming weeks / months”.