MiniMax released MiniMax M3 on June 1, 2026. Based on the official introduction, M3 has a clear position: it targets coding, Agent, and long-context tasks, while also adding native multimodal capabilities.
The most interesting part of this release is not a single benchmark score, but the fact that MiniMax puts three capability groups into one model:
- coding and Agent task capability;
- up to a
1M tokenscontext window; - native multimodality, with image and video input support;
- planned open weights for later private deployment and fine-tuning.
If you are watching the progress of Chinese models in coding assistants, automated workflows, long-document processing, and multimodal understanding, M3 is worth a separate look.
M3’s Core Positioning
MiniMax describes M3 as a frontier model for coding and Agent tasks, with 1M context and native multimodality.
These keywords map to several real usage pain points:
- coding tasks are not just function completion; they also require reading projects, editing files, running tools, and fixing errors;
- Agent tasks produce large amounts of tool-call records, logs, and intermediate results;
- long documents, long videos, and full codebases all need larger context windows;
- charts, screenshots, formulas, and video frames cannot be understood through plain text alone.
So M3 feels more like a model prepared for long-chain tasks, rather than one aimed only at ordinary chat or short text generation.
The 1M Context Comes From MSA
M3 uses MiniMax’s self-developed MSA, short for MiniMax Sparse Attention. In the official explanation, MSA is designed to address the rapid growth in computational complexity that traditional full attention faces under long contexts.
Put simply, full attention becomes expensive quickly as context length grows. MSA uses sparse attention and a hardware-friendly KV block access pattern, making it easier for the model to scale in long-context scenarios.
MiniMax says the M3 API supports up to 1M tokens of context, with a guaranteed minimum of 512K tokens. This matters for several task types:
- reading a complete project or large module;
- processing long research reports, contracts, logs, and knowledge-base materials;
- preserving tool-call history during multi-round Agent execution;
- analyzing long videos or multimodal materials.
That said, long context does not mean every task should fill the entire window. In real use, retrieval, chunking, caching, and task decomposition still matter. A 1M context window is more like an upper bound for complex tasks, not a replacement for engineering design.
Coding and Agent Are the Focus
In the official report, M3 is shown with results across several coding and Agent benchmarks:
| Benchmark | Official score |
|---|---|
| SWE-Bench Pro | 59.0% |
| Terminal-Bench 2.1 | 66.0% |
| SWE-fficiency | 34.8% |
| KernelBench Hard | 28.8% |
| MCP Atlas | 74.2% |
These numbers are useful references, but I would not judge the model only by leaderboard scores. The more important point is that MiniMax puts M3’s training and evaluation focus closer to real collaborative Agent scenarios.
Real coding work is not “generate a function from one sentence.” It usually includes:
- repeatedly clarifying requirements;
- reading existing code;
- making a change plan;
- running commands and tests;
- continuing to fix issues based on errors;
- preserving decision context across multiple rounds.
This is also why M3 and MiniMax Code were released together. Model capability is only the base layer. Whether it can finish engineering tasks also depends on the outer Agent harness, tool calls, context management, and verification flow.
Long-Horizon Tasks Shown by MiniMax
MiniMax lists several cases that are closer to real work in its report.
The first is paper reproduction. MiniMax asked M3 to independently reproduce an ICLR 2025 Outstanding Paper. M3 ran continuously for nearly 12 hours, produced 18 commits and 23 experimental figures, and completed the core experiment reproduction.
The point of this case is not that M3 can write a paper summary. It used several capabilities at the same time:
- multimodal understanding for curves, formulas, and charts in the paper;
- long context to place the paper, code, and experiment logs in the same task chain;
- coding and Agent capability for continuous running, experimentation, verification, and correction.
The second case is CUDA kernel optimization. MiniMax asked M3 to start from a Triton skeleton that could not run directly and optimize an FP8 GEMM kernel on NVIDIA Hopper GPUs. In about 24 hours, M3 completed 147 benchmark submissions and 1,959 tool calls, raising hardware peak utilization from 7.6% to 71.3%, equivalent to a 9.4x speedup.
This case shows M3’s emphasis on long-horizon autonomous iteration. Ordinary code-generation models often stop after the first few failed rounds, while Agent-style models need to keep adjusting direction based on feedback.
The third case is letting M3 train models autonomously. In PostTrainBench, MiniMax gave M3 four pretrain-only base models and asked it to complete data synthesis, training, evaluation, and iteration within 12 hours. M3 finally scored 0.37, below Opus 4.7 and GPT-5.5, but clearly ahead of other models.
These cases all come from MiniMax’s own tests, so they should not be treated as independent third-party evaluation results. But they do show M3’s product direction: putting the model into long-running, verifiable task loops with feedback.
Why Native Multimodality Matters
M3 is not simply a text model with visual ability bolted on afterward. MiniMax says it was trained with mixed modalities from the early stage and that the data pipeline was rebuilt to scale training data to the 100T+ level.
For developers, multimodality mainly matters in scenarios such as:
- reading screenshots, charts, formulas, and design drafts;
- analyzing PDFs, papers, reports, and experiment figures;
- understanding visual changes in long videos;
- recognizing UI elements in desktop automation tasks.
MiniMax Code also productizes this direction. According to MiniMax, MiniMax Code can combine M3’s multimodal capability with computer use, such as batch-entering information across applications based on spreadsheet content.
MiniMax Code and Agent Team
Alongside the M3 release, MiniMax Code was also updated. MiniMax positions it as an Agent product better suited for M3, designed to unlock M3’s long-context, coding, Agent, and multimodal capabilities.
MiniMax Code’s Agent Team can split large tasks into multi-stage, concurrent, dynamically adjustable workflows, and use a Producer + Verifier-style adversarial loop to keep producing, reflecting, and correcting.
This direction belongs to the same broad category as Claude Code, Codex CLI, opencode, and similar tools: the model does not only answer questions, but enters a local or cloud development environment, reads files, edits files, runs commands, and then continues based on the results.
MiniMax emphasizes:
- M3’s 1M long context;
- multimodality and computer use;
- long-running autonomous execution by Agent Team;
- large usage quotas under Token Plan.
Token Plan and API
MiniMax also updated its Token Plan. The official three tiers are:
| Plan | Monthly fee | Monthly M3 quota |
|---|---|---|
| Plus | $20/month |
about 1.7B tokens |
| Max | $50/month |
about 5.1B tokens |
| Ultra | $120/month |
about 9.8B tokens |
These quotas look very aggressive and are suitable for high-frequency coding assistants, batch processing, long-document processing, and multimodal tasks. But whether they are truly cost-effective still depends on availability by region, concurrency limits, speed, stability, context pricing, and task success rate.
On the API side, M3 is already available. Several details are worth noting:
- inputs of
<=512K tokensare billed at the standard rate; - inputs above
512K tokensenter the higher long-context pricing tier; - thinking can be enabled or disabled;
- thinking enabled is better for complex reasoning, Agent tasks, and long-horizon collaboration;
- thinking disabled responds faster and suits chat and code completion;
standardandpriorityservice tiers are supported, with priority intended for higher concurrency and more stable latency.
The model name in the official example is:
|
|
The example endpoint is:
|
|
If you want to integrate M3 into existing coding tools, first confirm three things: OpenAI-compatible support, streaming output support, and tool-call format.
Open Weights Are Worth Watching, but Need to Land
MiniMax says M3 will open-source its weights on Hugging Face and GitHub, supporting private cluster deployment and fine-tuning. This is important.
If the weights are truly released and inference-framework support goes smoothly, M3 may enter several enterprise scenarios:
- private codebase assistants;
- internal knowledge-base and document analysis;
- highly sensitive data scenarios;
- government, enterprise, and local deployment environments;
- low-cost batch Agent workflows.
But concrete details still need to land, including:
- weight size and license;
- quantization options;
- support in vLLM, SGLang, llama.cpp, and other frameworks;
- VRAM requirements;
- real cost of multimodality and long context in local deployment;
- whether full training or fine-tuning toolchains will be released.
So it is worth watching now, but it is too early to treat “open weights” as already production-ready.
Who Should Try It First
M3 is better suited for these users to try first:
- developers who often use AI coding agents;
- teams that want to replace part of their Claude, GPT, or Gemini coding workload with a Chinese model;
- people with long-document, long-codebase, or long-log analysis needs;
- developers building automation workflows, MCP, or agent harnesses;
- users who need large token quotas for batch processing;
- teams with long-term needs for local deployment and open weights.
If you only need ordinary chat, short text rewriting, or simple Q&A, M3 may not be the first model you need to try. Its focus is clearly on heavier Agent and engineering tasks.
My Take
The most interesting part of the MiniMax M3 release is its route: instead of only comparing with general chat models, it directly packages coding, Agent, long context, and multimodality into a model aimed at engineering workflows.
That direction makes sense. The future competition in AI programming tools will not only be about whether a model can write a piece of code, but whether it can keep planning, executing, verifying, and correcting itself in long-running tasks while controlling context cost.
Still, whether M3 can enter a main workflow depends on more practical questions:
- whether the API is stable;
- whether long-context pricing is controllable;
- whether the MiniMax Code toolchain is mature;
- whether OpenAI-compatible and mainstream agent-tool integration is smooth;
- whether open weights land on time;
- whether third-party evaluation and real project experience support the official claims.
If these areas perform well later, M3 will become one of the most worth-watching Chinese coding Agent models.