- MiMo-V2-Pro focuses on agentic AI, enabling multi-step tasks, coding, and workflow automation.
- Xiaomi claims strong benchmark performance, with coding results surpassing Claude Sonnet 4.6 and approaching Claude Opus 4.6.
- MiMo-V2-Omni and MiMo-V2-TTS expand capabilities into multimodal understanding and expressive speech generation.
Xiaomi has launched MiMo-V2-Pro, its latest flagship AI model, designed to handle complex, real-world tasks rather than just generate text. The company describes it as the “brain” behind next-generation AI agents: systems that can plan, execute, and complete multi-step workflows with minimal human input.
A strong showing in global benchmarks
MiMo-V2-Pro is already making its mark in global rankings. According to the Artificial Analysis Intelligence Index, a recognized benchmark for evaluating AI systems:
- It ranks 8th globally
- It is the second-highest ranked Chinese large language model

The model also performs well in agent-focused benchmarks like ClawEval and PinchBench, where it shows strong capabilities in reasoning, planning, and tool usage.
Built on scale and efficiency
One of MiMo-V2-Pro's defining features is its scale. Xiaomi has substantially expanded both the model size and the computational capacity compared with earlier versions.
- Total parameters exceed 1 trillion, with 42 billion active at runtime
- Supports a context window of up to 1 million tokens
- Uses an upgraded Hybrid Attention mechanism, improving efficiency while handling larger workloads
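A split between total and active parameters like this is characteristic of sparse mixture-of-experts (MoE) designs, where a router sends each token to only a few expert sub-networks, so most weights sit idle on any given step. Xiaomi's routing scheme is not described here, so the following is only a generic, minimal sketch of top-k routing. The expert count, `k`, the scalar "tokens" and the helper names (`top_k_route`, `moe_forward`) are illustrative assumptions, not MiMo-V2-Pro's configuration.

```python
import math
from typing import Callable, List, Tuple

Expert = Callable[[float], float]  # toy expert: maps a scalar "token" to a scalar


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits: List[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their weights."""
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_forward(token: float, experts: List[Expert], router_logits: List[float], k: int) -> float:
    """Weighted sum over the selected experts only; unselected experts never run."""
    return sum(w * experts[i](token) for i, w in top_k_route(router_logits, k))


# Active share implied by the figures above (an upper bound, since the
# total is stated as exceeding 1 trillion parameters).
ACTIVE_FRACTION = 42e9 / 1e12  # 0.042, i.e. about 4.2%
```

In this toy version only `k` expert functions are ever called per token, which is the property that lets a very large model keep its per-token compute close to that of a much smaller dense one.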
To further boost performance, Xiaomi has added a Multi-Token Prediction (MTP) layer. It allows the model to generate responses faster without sacrificing quality.
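One common way an MTP layer speeds up generation is self-speculative decoding: the extra head drafts several upcoming tokens cheaply, the main model checks them, and every draft it agrees with is kept. Under greedy verification the output is identical to ordinary decoding, which is how speed can improve without a quality loss. The article does not say that MiMo-V2-Pro uses exactly this scheme, so treat the following as a minimal sketch under that assumption; `draft`, `verify`, `generate` and the toy integer "models" are hypothetical stand-ins.

```python
from typing import Callable, List

# Toy "model": maps a token context to its greedy next token.
NextToken = Callable[[List[int]], int]


def draft(context: List[int], draft_model: NextToken, k: int) -> List[int]:
    """Propose k tokens cheaply (stand-in for the MTP head's guesses)."""
    ctx, proposals = list(context), []
    for _ in range(k):
        tok = draft_model(ctx)
        proposals.append(tok)
        ctx.append(tok)
    return proposals


def verify(context: List[int], proposals: List[int], main_model: NextToken) -> List[int]:
    """Keep the longest prefix of proposals the main model agrees with, then
    add one token from the main model. A real system scores all proposals in
    one batched forward pass; this loop checks them one by one for clarity."""
    ctx, accepted = list(context), []
    for tok in proposals:
        expected = main_model(ctx)
        accepted.append(expected)  # equals tok whenever the draft was right
        if tok != expected:
            return accepted  # first mismatch: keep the correction and stop
        ctx.append(tok)
    accepted.append(main_model(ctx))  # every draft accepted: one bonus token
    return accepted


def generate(context: List[int], main_model: NextToken, draft_model: NextToken,
             n_tokens: int, k: int = 3) -> List[int]:
    out = list(context)
    while len(out) < len(context) + n_tokens:
        out.extend(verify(out, draft(out, draft_model, k), main_model))
    return out[: len(context) + n_tokens]
```

The better the drafts, the more tokens each verification step accepts; a poor draft costs a little wasted work but never changes the result.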
Designed for agentic AI
Unlike traditional AI models focused on chat or content creation, MiMo-V2-Pro is built specifically for agentic tasks: situations in which the AI must take the initiative and carry an objective through to completion.
This includes:
- Executing multi-step workflows
- Interacting with tools and APIs
- Assisting in software development
- Automating complex digital processes
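At its core, the tool-using behavior described above is a loop in which the model picks an action, a runtime executes it, and the result is fed back until the model decides it is done. The following is a minimal, framework-agnostic sketch of that loop; `Action`, `run_agent`, the scripted policy and the `add` tool are hypothetical illustrations, not Xiaomi's API or that of any framework named in this article.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Action:
    tool: Optional[str] = None  # tool to call; None means "done"
    argument: str = ""
    final_answer: Optional[str] = None


Policy = Callable[[List[str]], Action]  # stand-in for the model
Tool = Callable[[str], str]


def run_agent(policy: Policy, tools: Dict[str, Tool], task: str,
              max_steps: int = 5) -> Tuple[Optional[str], List[str]]:
    history = [f"task: {task}"]
    for _ in range(max_steps):
        action = policy(history)  # the model decides the next step
        if action.tool is None:
            return action.final_answer, history
        if action.tool not in tools:
            history.append(f"error: unknown tool {action.tool!r}")
            continue
        observation = tools[action.tool](action.argument)  # the runtime executes the call
        history.append(f"{action.tool}({action.argument}) -> {observation}")
    return None, history  # step budget exhausted without an answer


# Hypothetical usage: a scripted "model" that calls one tool, then answers.
def scripted_policy(history: List[str]) -> Action:
    if len(history) == 1:
        return Action(tool="add", argument="2,3")
    return Action(final_answer=history[-1].split("-> ")[-1])


tools = {"add": lambda s: str(sum(int(x) for x in s.split(",")))}
answer, trace = run_agent(scripted_policy, tools, "add 2 and 3")
```

Here `answer` is `"5"`. In a real agent, the scripted policy would be replaced by a call to the model, and the step budget guards against loops that never converge.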
The model has been trained using a combination of supervised fine-tuning and feedback-based learning, with a strong focus on real-world usability rather than just benchmark performance.
Strong performance in coding tasks
MiMo-V2-Pro shows particularly strong capabilities in programming and software engineering.
According to Xiaomi:
- It outperforms Claude Sonnet 4.6 in coding-related tasks
- Its performance comes close to Claude Opus 4.6 in areas like system design and problem-solving
During early testing on the OpenRouter platform under the codename Hunter Alpha, coding tools accounted for the largest share of the model's usage, which suggests that developers found it practical and reliable in real workflows.

Integration with developer ecosystems
To encourage adoption, Xiaomi is integrating MiMo-V2-Pro with several popular agent frameworks, including:
- OpenClaw
- OpenCode
- KiloCode
- Blackbox
- Cline

The company is also offering one week of free API access, giving developers a chance to test its capabilities in real-world applications.
Pricing overview
MiMo-V2-Pro is now publicly available via API, with usage-based pricing that is tiered by context length (all prices in USD per 1 million tokens):
| Model | Input Cost | Output Cost | Cache Read | Cache Write |
|---|---|---|---|---|
| MiMo-V2-Pro (≤256K) | $1 | $3 | $0.20 | $0 |
| MiMo-V2-Pro (256K–1M) | $2 | $6 | $0.40 | $0 |
| Claude Sonnet 4.6 | $3 | $15 | $0.30 | $3.75 |
| Claude Opus 4.6 | $5 | $25 | $0.50 | $6.25 |
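To make the table concrete, here is a small cost estimator. It is a sketch under several assumptions: prices are USD per 1 million tokens, the MiMo-V2-Pro tier is chosen by the request's context length with "256K" read as 256,000 tokens, and cached tokens are billed at the cache rates instead of the input rate. Only the rates listed in the table are modeled; any other pricing rules a provider applies are ignored. `Rates`, `mimo_rates` and `request_cost` are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    input: float        # USD per 1M uncached input tokens
    output: float       # USD per 1M output tokens
    cache_read: float   # USD per 1M tokens read from cache
    cache_write: float  # USD per 1M tokens written to cache


# Rates copied from the table above.
MIMO_V2_PRO_SHORT = Rates(1.00, 3.00, 0.20, 0.00)  # context <= 256K
MIMO_V2_PRO_LONG = Rates(2.00, 6.00, 0.40, 0.00)   # context 256K–1M
CLAUDE_SONNET_4_6 = Rates(3.00, 15.00, 0.30, 3.75)
CLAUDE_OPUS_4_6 = Rates(5.00, 25.00, 0.50, 6.25)

TIER_BOUNDARY = 256_000  # assumption: "256K" means 256,000 tokens


def mimo_rates(context_tokens: int) -> Rates:
    """Select the MiMo-V2-Pro price tier from the request's context length."""
    return MIMO_V2_PRO_SHORT if context_tokens <= TIER_BOUNDARY else MIMO_V2_PRO_LONG


def request_cost(rates: Rates, input_tokens: int, output_tokens: int,
                 cache_read_tokens: int = 0, cache_write_tokens: int = 0) -> float:
    """Cost in USD of one request; input_tokens counts uncached input only."""
    return (input_tokens * rates.input
            + output_tokens * rates.output
            + cache_read_tokens * rates.cache_read
            + cache_write_tokens * rates.cache_write) / 1_000_000
```

Under these assumptions, a request with 100,000 uncached input tokens and 10,000 output tokens costs about $0.13 on MiMo-V2-Pro, compared with $0.45 at the Claude Sonnet 4.6 rates in the table.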
Beyond text with MiMo-V2-Omni
Alongside MiMo-V2-Pro, Xiaomi has released MiMo-V2-Omni, a multimodal AI model capable of understanding text, images, audio, and video together.
Unlike systems where these capabilities are added separately, MiMo-V2-Omni processes them as a single unified stream, resulting in more natural and context-aware reasoning.
Key features include:
- Advanced audio understanding, including multi-speaker analysis
- Strong visual reasoning and chart interpretation
- Video comprehension with the ability to anticipate what happens next
Based on internal evaluations, Xiaomi says the model performs competitively against leading systems such as Gemini 3 Pro and GPT-5.2.

MiMo-V2-TTS gives AI a more human voice
Xiaomi also unveiled MiMo-V2-TTS, a text-to-speech system developed to make AI communication feel more natural and expressive.
Key features include:
- Emotion-aware voice generation
- Flexible style control using natural language prompts
- Support for dialects and character voices
- Ability to generate both speech and singing
The model is trained on over 100 million hours of speech data, allowing it to produce realistic tone, rhythm, and emotional nuance.
Important note: While some performance claims are based on internal testing, early benchmarks and developer feedback suggest that MiMo-V2-Pro could become a serious contender in the rapidly evolving AI landscape.

