Deep Cogito Releases Novel Open Large Language Models with General Superintelligence Claims
Deep Cogito has released several open large language models (LLMs) that it says outperform similarly sized competitors and represent a step towards achieving general superintelligence.
Overview of Deep Cogito’s Open LLMs
The San Francisco-based company, which states its mission is “building general superintelligence,” has launched preview versions of LLMs in 3B, 8B, 14B, 32B, and 70B parameter sizes. Deep Cogito asserts that “each model outperforms the best available open models of the same size, including counterparts from LLaMA, DeepSeek, and Qwen, across most standard benchmarks.”
Iterated Distillation and Amplification (IDA)
Central to this release is a novel training methodology called Iterated Distillation and Amplification (IDA). Deep Cogito describes IDA as “a scalable and efficient alignment strategy for general superintelligence using iterative self-improvement”. This technique aims to overcome the inherent limitations of current LLM training paradigms, where model intelligence is often capped by the capabilities of larger “overseer” models or human curators.
The IDA process involves two key steps iterated repeatedly:
- Amplification: Using more computation to enable the model to derive better solutions or capabilities, akin to advanced reasoning techniques.
- Distillation: Internalising these amplified capabilities back into the model’s parameters.
Deep Cogito says this creates a “positive feedback loop” where model intelligence scales more directly with computational resources and the efficiency of the IDA process, rather than being strictly bounded by overseer intelligence.
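In concrete terms, the loop alternates a compute-heavy improvement step with a fine-tuning step. The Python-style sketch below illustrates the shape of such a loop; the function signatures and structure are illustrative assumptions for exposition only, not Deep Cogito’s actual training code.

```python
# Illustrative sketch of an Iterated Distillation and Amplification (IDA) loop.
# The names and structure are assumptions for exposition; they do not describe
# Deep Cogito's actual implementation.
from typing import Callable, List


def ida_loop(
    model,                     # current LLM checkpoint
    prompts: List[str],        # training queries
    amplify: Callable,         # spends extra compute (longer reasoning chains,
                               # search, self-critique) to derive better answers
    distil: Callable,          # fine-tunes the model on those amplified answers
    iterations: int,
):
    for _ in range(iterations):
        # Amplification: use more test-time computation to obtain solutions
        # better than the model's direct answers.
        amplified_answers = [amplify(model, p) for p in prompts]

        # Distillation: internalise those capabilities back into the weights,
        # so the next round starts from a stronger baseline -- the claimed
        # "positive feedback loop".
        model = distil(model, prompts, amplified_answers)
    return model
```

The key claim is that after distillation the model produces the amplified-quality answers directly, without the extra compute at inference time, so each iteration raises the baseline for the next amplification step.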
Capabilities and Performance of Deep Cogito Models
The newly released Cogito models – based on Llama and Qwen checkpoints – are optimised for coding, function calling, and agentic use cases. A key feature is their dual functionality: “Each model can answer directly (standard LLM), or self-reflect before answering (like reasoning models),” similar to the hybrid reasoning seen in models like Claude 3.7 Sonnet. However, Deep Cogito notes they “have not optimised for very long reasoning chains,” citing user preference for faster answers and the efficiency of distilling shorter chains.
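As a rough illustration, the snippet below shows how such a dual-mode model might be called with Hugging Face transformers; the model identifier and the system-prompt toggle string are assumptions for illustration and may not match the released checkpoints.

```python
# Hedged sketch: switching between direct and reasoning modes via the system
# prompt. The model ID and the exact toggle string are assumptions and may
# differ from the published model cards.
from transformers import pipeline

MODEL_ID = "deepcogito/cogito-v1-preview-llama-3B"  # assumed Hugging Face ID
generator = pipeline("text-generation", model=MODEL_ID, device_map="auto")

question = [{"role": "user", "content": "What is 17 * 24?"}]

# Direct mode: behaves like a standard instruction-tuned LLM.
direct = generator(question, max_new_tokens=256)

# Reasoning mode: a system prompt asks the model to self-reflect before
# answering (the toggle string here is an assumption for illustration).
reasoning = generator(
    [{"role": "system", "content": "Enable deep thinking subroutine."}] + question,
    max_new_tokens=1024,
)

print(direct[0]["generated_text"][-1]["content"])
print(reasoning[0]["generated_text"][-1]["content"])
```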
Benchmark Results
Deep Cogito provides extensive benchmark results comparing the Cogito models against size-equivalent state-of-the-art open models in both direct (standard) and reasoning modes. Across various benchmarks (MMLU, MMLU-Pro, ARC, GSM8K, MATH, etc.) and model sizes (3B, 8B, 14B, 32B, 70B), the Cogito models generally show significant performance gains over counterparts like Llama 3.1/3.2/3.3 and Qwen 2.5, particularly in reasoning mode.
Conclusion
This release is labelled a preview, with Deep Cogito stating they are “still in the early stages of this scaling curve”. They plan to release improved checkpoints for the current sizes and introduce larger mixture-of-experts (MoE) models (109B, 400B, 671B) “in the coming weeks / months”. All future models will also be open-source.
FAQs
Q: What is Iterated Distillation and Amplification (IDA)?
A: IDA is a novel training methodology that alternates amplification (using extra computation to derive better solutions) with distillation (internalising those capabilities into the model’s parameters), using iterative self-improvement to overcome the limitations of current LLM training paradigms.
Q: What are the benefits of IDA?
A: IDA creates a “positive feedback loop” where model intelligence scales more directly with computational resources and the efficiency of the IDA process.
Q: What are the Cogito models optimised for?
A: The Cogito models are optimised for coding, function calling, and agentic use cases.
Q: What are the benchmark results?
A: The Cogito models generally show significant performance gains over counterparts like Llama 3.1/3.2/3.3 and Qwen 2.5, particularly in reasoning mode.
Q: What is the next step for Deep Cogito?
A: Deep Cogito plans to release improved checkpoints for the current sizes and introduce larger MoE models (109B, 400B, 671B) “in the coming weeks / months”.