AI Engineer
December 13, 2025

Minimax M2 – Olive Song, MiniMax

The AI race often fixates on benchmark scores and parameter counts. But what if the real win isn't just a bigger model, but a smarter, more practical one built for actual developers? Olive Song from Minimax introduces M2, an open-weight model with 10 billion active parameters, designed to excel in coding and agentic tasks by prioritizing real-world utility over synthetic metrics.

Expert Feedback Fuels Practical AI

  • "But then numbers don't tell everything because sometimes you get those super high number models you plug into them into your environment and they suck, right? So we really care about the dynamics in the community..."
  • Beyond Benchmarks: Minimax acknowledges that high benchmark scores often fail to translate into functional utility for developers. Their focus shifts to community adoption and real-world performance.
  • Developer-Driven Training: Minimax integrates in-house "expert developers" directly into the model's training loop. These experts provide precise feedback, acting as a reward model to shape M2's behavior for bug fixing, refactoring, and general coding efficiency. Think of it like a chef tasting a dish and giving precise feedback to the cook, rather than just relying on a recipe's ingredient list.
  • Full-Stack Proficiency: M2 is trained on real internet data and scaled environments, making it proficient across various programming languages and full-stack development tasks.

Interleaved Thinking for Dynamic Environments

  • "Instead of just stopping after one round of tool calling, it actually thinks again and reacts to the environments to see if the information is enough for it to get what it wants. So basically we call the interleaved thinking because it interleaves thinking with tool calling..."
  • Iterative Reasoning: M2 employs "interleaved thinking," an iterative process where the model repeatedly thinks, calls tools, processes responses, and re-evaluates its approach, in contrast to single-pass execution (see the sketch after this list). This is like a detective gathering clues, forming a hypothesis, checking it against new evidence, and refining their approach, instead of making a single guess.
  • Adapting to Noise: This iterative loop allows M2 to adapt to noisy, dynamic environments—handling unexpected tool errors, suboptimal outcomes, or changing external information (e.g., stock market perturbations).
  • Long-Horizon Automation: The model can automate complex, multi-step workflows requiring sustained interaction with tools like Gmail, Notion, or terminals, with minimal human intervention.
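
For intuition, here is a minimal sketch of what such an interleaved thinking loop could look like, assuming a hypothetical call_model / run_tool interface rather than MiniMax's actual API: the model thinks, optionally calls a tool, folds the (possibly noisy) result back into its context, and repeats until it decides it has enough to answer.

```python
# Minimal sketch of an interleaved thinking loop; call_model and run_tool are
# hypothetical placeholders, not MiniMax's actual API.
from dataclasses import dataclass


@dataclass
class Turn:
    thought: str               # the model's reasoning for this step
    tool_call: dict | None     # e.g. {"name": "search", "args": {...}}, or None when finished
    answer: str | None = None  # final answer once the model judges it has enough information


def call_model(history: list[dict]) -> Turn:
    """Placeholder for a model call that returns reasoning plus an optional tool call."""
    raise NotImplementedError


def run_tool(tool_call: dict) -> str:
    """Placeholder for executing a tool (search, terminal, Gmail, Notion, ...)."""
    raise NotImplementedError


def interleaved_agent(task: str, max_steps: int = 20) -> str | None:
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        turn = call_model(history)                 # think over everything seen so far
        history.append({"role": "assistant", "content": turn.thought})
        if turn.tool_call is None:                 # enough information: stop and answer
            return turn.answer
        try:
            observation = run_tool(turn.tool_call)  # act: call the chosen tool
        except Exception as err:                    # noisy environments: tools can fail
            observation = f"tool error: {err}"
        history.append({"role": "tool", "content": observation})  # re-evaluate on the next pass
    return None  # ran out of steps without a final answer
```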

Generalization and Multi-Agent Scalability

  • "We conclude that agent generalization is adaptation to perturbations across the model's entire operational space... We designed and maintained perturbation pipelines of our data so that our model can actually generalize to a lot of agent scaffolds."
  • Robust Generalization: M2 achieves robust generalization by training with "perturbation pipelines" across its entire operational space, including tool information, system prompts, user prompts, and environment responses. Imagine training a pilot not just in perfect weather, but in simulations with sudden wind shifts, instrument failures, and unexpected air traffic, so they can adapt to any real-world scenario.
  • Cost-Effective Parallelism: With only 10 billion active parameters, M2 is small and efficient. This cost-effectiveness enables deploying multiple M2 instances in parallel for complex tasks like research, analysis, and report generation, acting as a team of specialized agents.

Key Takeaways:

  • Strategic Implication: The future of AI agents hinges on practical utility and adaptive reasoning, not just raw scale. Models that integrate expert feedback and iterative thinking will outperform those focused solely on benchmarks.
  • Builder/Investor Note: Builders should prioritize robust generalization through diverse training perturbations. Investors should seek models that demonstrate real-world adoption and cost-effective scalability for multi-agent architectures.
  • The So What?: The next 6-12 months will see a shift towards smaller, highly specialized, and deeply integrated AI models that function as reliable co-workers, driving efficiency in developer workflows and complex agentic tasks.

Podcast Link: https://www.youtube.com/watch?v=lY1iFbDPRlw

Minimax M2 challenges conventional AI benchmarks, demonstrating superior real-world performance in coding and agentic tasks through a novel blend of expert human feedback and iterative reasoning.

Minimax M2: A New Standard for Open-Weight AI

  • M2 consistently ranks at the top of intelligence and agent benchmarks among open-source models.
  • Beyond benchmark scores, Minimax emphasizes community adoption, noting M2 achieved the most downloads and climbed into the top three by token usage on OpenRouter in its first week.
  • Minimax operates as a global company developing both foundation models and applications, fostering direct feedback from in-house developers.
  • This integrated approach ensures models meet practical developer needs.

Olive Song argues, "Numbers don't tell everything because sometimes you get those super high number models, you plug them into your environment and they suck, right?"

Expert-Driven Coding Performance

  • M2 is trained on real internet data and scaled environments, letting the model optimize against verifiable coding goals during reinforcement learning.
  • The model is full-stack and multilingual, leading in real-world language use cases.
  • Minimax utilizes in-house expert developers as reward models, providing precise feedback on model behaviors, bug fixing, and repo refactoring.
  • These experts identify reliable and trusted model behaviors, directly influencing M2's development cycle.

Olive Song states, "We actually scale something called expert developers as reward models."
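
The exact reward formulation isn't disclosed in the talk, but as a rough illustration, a coding reward could blend a verifiable signal (test pass rate) with a rubric score from an expert-developer review. The function name, weights, and rubric criteria below are assumptions, not MiniMax's recipe.

```python
# Hypothetical reward blending a verifiable signal (unit-test pass rate) with an
# expert-developer rubric score; weights and criteria are illustrative, not MiniMax's recipe.

def coding_reward(tests_passed: int, tests_total: int,
                  expert_scores: dict[str, float],
                  w_tests: float = 0.7, w_expert: float = 0.3) -> float:
    """expert_scores maps review criteria (e.g. 'readability', 'minimal diff') to 0-1 scores
    assigned by an in-house expert reviewer."""
    verifiable = tests_passed / max(tests_total, 1)
    expert = sum(expert_scores.values()) / max(len(expert_scores), 1)
    return w_tests * verifiable + w_expert * expert


# Example: a patch that passes 18/20 tests and receives a favorable expert review.
reward = coding_reward(18, 20, {"readability": 0.9, "minimal_diff": 0.8, "tests_added": 1.0})
print(round(reward, 3))  # 0.7 * 0.9 + 0.3 * 0.9 = 0.9
```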

Mastering Long-Horizon Tasks with Interleaved Thinking

  • Traditional reasoning models execute a single sequence of thinking, tool calls, and final output, failing in noisy, real-world scenarios with errors or unexpected results.
  • M2 employs interleaved thinking (a process where an AI model repeatedly thinks, acts, and re-evaluates its actions based on environmental feedback, mimicking human problem-solving).
  • This iterative approach allows M2 to adapt to environment noise, recover from suboptimal decisions, and choose alternative tools or actions.
  • M2 automates complex workflows across multiple applications like Gmail, Notion, and terminals with minimal human intervention.

Olive Song explains, "Instead of just stopping after one round of tool calling, it actually thinks again and reacts to the environments."

Robust Generalization via Data Perturbation

  • Initial attempts focused on tool scaling, which proved insufficient for generalization when environments or agent scaffolds changed.
  • M2's operational space includes tool information, system prompts, user prompts, chat templates, environments, and tool responses.
  • Minimax designed and maintains perturbation pipelines within its data to train M2 for robust generalization across various agent scaffolds.

Olive Song asserts, "We conclude that it's adaptation to perturbations across the model's entire operational space."
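
The perturbation pipelines are described only at a high level; the sketch below shows one plausible shape for such a pipeline, randomly perturbing tool information, system prompts, and tool responses before a training episode. The field names and perturbation choices are assumptions, not MiniMax's published pipeline.

```python
# One plausible shape for a perturbation pipeline over an agent training sample; the field
# names and perturbation choices are assumptions, not MiniMax's published pipeline.
import random


def perturb_sample(sample: dict) -> dict:
    """Randomly perturb components of the model's operational space before a training episode."""
    s = dict(sample)

    # Tool information: shuffle the tool list so the model does not memorize ordering.
    tools = list(s.get("tools", []))
    random.shuffle(tools)
    s["tools"] = tools

    # System prompt: swap in an equivalent paraphrase so no single scaffold wording is baked in.
    variants = s.get("system_prompt_variants") or [s.get("system_prompt", "")]
    s["system_prompt"] = random.choice(variants)

    # Tool / environment responses: occasionally inject errors or truncation so the model
    # learns to recover rather than assume clean responses.
    perturbed = []
    for resp in s.get("tool_responses", []):
        roll = random.random()
        if roll < 0.05:
            perturbed.append("error: tool timed out")
        elif roll < 0.10:
            perturbed.append(resp[: len(resp) // 2] + " ...[truncated]")
        else:
            perturbed.append(resp)
    s["tool_responses"] = perturbed

    return s
```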

Cost-Effective Multi-Agent Scalability

  • Minimax demonstrates M2 powering an agent application where multiple copies perform research, analysis, and report generation concurrently.
  • The model's efficiency supports long-running agentic tasks and those requiring parallel processing.

Olive Song notes, "Because it is so small and so cost effective it can really support those long run agentic tasks and tasks that maybe require some kind of parallelism."
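
The agent demo isn't shown in code, but a generic sketch of the idea, fanning several M2-backed instances out over subtasks and stitching their outputs together, might look like this (run_agent is a placeholder, not an actual MiniMax client).

```python
# Generic sketch of fanning several agent instances out over subtasks in parallel;
# run_agent is a placeholder for a call to an M2-backed agent, not an actual MiniMax client.
from concurrent.futures import ThreadPoolExecutor


def run_agent(role: str, task: str) -> str:
    """Placeholder: send `task` to an agent instance configured for `role`, return its output."""
    raise NotImplementedError


def research_report(topic: str) -> str:
    subtasks = {
        "researcher": f"Collect recent sources and key facts about {topic}.",
        "analyst": f"Identify trends and open questions about {topic}.",
        "fact_checker": f"List claims about {topic} that need independent verification.",
    }
    # A small, cheap model makes it affordable to run one instance per subtask concurrently.
    with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
        futures = {role: pool.submit(run_agent, role, task) for role, task in subtasks.items()}
        sections = {role: f.result() for role, f in futures.items()}
    # A final instance stitches the parallel outputs into one report.
    return run_agent("writer", "Write a report from these sections:\n" + str(sections))
```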

The Future: M2.1, M3, and Community Collaboration

  • Future iterations will target enhanced coding, memory management, proactive AI for workplace applications, and vertical expertise.
  • Minimax plans to integrate its advanced audio and video generation models (like Hailuo) into future M2 versions.
  • The company emphasizes community feedback as essential for collaborative model development.

Olive Song states, "We really need feedback from the community if possible because we want to build this together."

Investor & Researcher Alpha

  • Capital Movement: Investment is shifting towards AI models demonstrating proven real-world utility and robust generalization in dynamic agentic workflows, rather than solely benchmark performance. Capital will favor platforms integrating expert human feedback loops.
  • New Bottleneck: The critical bottleneck is no longer just raw compute or parameter count, but the AI's ability to perform iterative, adaptive reasoning in noisy environments and generalize across diverse operational perturbations.
  • Research Direction Obsolete: Research focused solely on static, single-pass reasoning models or basic tool scaling for complex agentic tasks is becoming obsolete. The emphasis must shift to dynamic, human-like iterative problem-solving and robust environmental adaptation.

Strategic Conclusion

Minimax M2 establishes a new paradigm for practical, agentic AI by prioritizing real-world utility, expert feedback, and adaptive reasoning over raw scale. The industry must now focus on developing AI agents that learn and adapt continuously within dynamic, human-centric workflows.
