What's New
Changelog and release history for TachiBot MCP. Every release brings new models, tools, and orchestration capabilities.
v2.19.1 (March 21, 2026)
MiniMax M2.7 — Self-Evolving AI
- MiniMax M2.5 → M2.7 — 2,300B MoE (100B active), 200K context. #1 on Artificial Analysis Intelligence Index
- SWE-Pro 56.22% — matches GPT-5.3-Codex. Multi-SWE-Bench 52.7% (#1, beats Opus 4.6 and GPT-5.4)
- Same pricing — $0.30/$1.20 per M tokens. Massive quality leap at zero extra cost
v2.18.0 (March 21, 2026): Major Release
AI That Proves Its Work
Stop babysitting LLMs. Deploy a pipeline that reads actual files, cross-examines across five models, and demands passing tests before moving forward.
- Absolute goal alignment — define success criteria once. The engine verifies every step against your exact goals — drift gets caught at step 1, not step 50
- No blind spots reach production — 5-model rotation cross-examines code: Gemini deduces, Grok detects drift, GPT validates strategy, Qwen cross-checks, Kimi decomposes
- Hard evidence, not hallucinated progress — checkpoints demand raw git diffs, passing test results, and modified file lists. Zero reliance on paraphrased summaries
- Never hit a dead end — structured amendment protocol detects drift, proposes revisions with evidence and impact analysis. You approve before it pivots
- 39 tools operate in reality — every analysis tool reads actual source code from disk via the files parameter. Models judge implementations, not stories about them
- Your project gets smarter every run — post-completion reflexion saves architectural lessons to your devlog. Knowledge compounds across sessions
v2.17.2 (March 21, 2026)
Files Parameter Rollout + Smart File Reader
- files on 8 more tools — grok_architect, grok_brainstorm, openai_explain, openai_search, kimi_code, kimi_long_context, gemini_judge, gemini_brainstorm
- Directory expansion — pass src/tools/ to read all code files in a directory
- Smart char budget — multi-file reads distribute tokens across files to prevent context overflow
- 23 of 37 tools now support the files parameter
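The smart char budget above can be sketched in a few lines — a minimal Python illustration of spreading a fixed character budget across several files. The function name, smallest-first ordering, and rollover rule are assumptions for illustration, not TachiBot's shipped code:

```python
from pathlib import Path

def read_with_budget(paths, total_chars=100_000):
    """Split a character budget across files, letting unused share
    from small files roll over to larger ones.
    Illustrative sketch only -- not TachiBot's actual implementation."""
    files = [Path(p) for p in paths]
    remaining = total_chars
    out = {}
    # Read smallest files first, so their leftover budget benefits big files
    ordered = sorted(files, key=lambda f: f.stat().st_size)
    for i, f in enumerate(ordered):
        share = remaining // (len(ordered) - i)  # even split of what's left
        text = f.read_text(errors="replace")[:share]
        out[f.name] = text
        remaining -= len(text)
    return out
```

Reading smallest-first means a tiny config file never wastes its share, while a large source file can absorb whatever budget remains.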
v2.17.1 (March 21, 2026)
Smart Task Decomposition
- kimi_decompose readability overhaul — output now uses OVERVIEW / STRUCTURE / DETAILS / RISKS sections
- Smart decomposition — infers context, constraints, risks, and measurable criteria automatically
- Reasoning leak fixed — strips Kimi K2.5 chain-of-thought from output
- Tuned for format adherence — temp 0.3, 4500 tokens, 360s timeout
v2.17.0 (March 21, 2026)
GPT-5.4-mini + Model Cleanup
- GPT-5.4-mini — new fast coding model (400k context, $0.75/$4.50 per 1M tokens, SWE-Bench 54.4%)
- Code tasks upgraded — openai_code_review now uses gpt-5.4-mini — 94% of flagship quality, 70% cheaper
- GPT-5.3 series retired — gpt-5.3-codex and gpt-5.3 removed; coding capabilities absorbed into gpt-5.4
- Simplified lineup — gpt-5.4 (flagship), gpt-5.4-mini (coding/fast), gpt-5.4-pro (expert)
v2.16.1 (March 6, 2026)
Gemini 3.1 Pro Migration
- Gemini 3.1 Pro — migrated from gemini-3-pro-preview to gemini-3.1-pro-preview before the March 9 retirement
- 1M context window — enhanced reasoning capabilities with Gemini 3.1 Pro
- Stale entries removed — cleaned up old display names and pricing for the retired model
v2.16.0 (March 6, 2026)
GPT-5.4 Upgrade + Brainstorm Fix
- GPT-5.4 default — most capable model (Mar 2026), $2.50/$15 per 1M tokens
- GPT-5.4-pro — expert model with higher compute ($30/$180 per 1M tokens)
- GPT-5.3-codex — new agentic coding model for code review tasks
- Gemini 3.1 Flash-Lite — added as fastest/cheapest option in 3.1 series
- openai_brainstorm fixed — eliminated fragile duplicate API function; now uses shared retry/fallback logic
- Token limits bumped — GPT-5.4 reasoning tokens eat into the output limit; all OpenAI tools now have higher defaults
v2.15.6 (February 26, 2026)
Full Audit: 6 More Fixes + Cost Optimization
- Required-enum anti-pattern fixed on 6 tools — usage_stats, openrouter_multi, gemini_judge, planner_maker, planner_runner, create_workflow
- gemini_judge — had zero required params; perspectives is now required as the primary content param
- perplexity_reason downgraded — sonar-pro ($3/$15 per M) → sonar-reasoning ($1/$5 per M), 3x cheaper
- perplexity_research removed — sonar-deep-research ($5/$25 per M) was burning $12 in 3 days
- All 51 tools audited — zero remaining required-enum violations
v2.15.5 (February 26, 2026)
Tool Parameter Fixes + Gemini Stability
- Parameter validation fixed on qwen_coder, kimi_code, minimax_code — AI clients were misusing the required enum task param; query added as the required primary param, task now optional with defaults
- kimi_long_context — task enum now optional (default: analyze)
- Gemini 3.1 → 3.0 rollback — reverted to stable gemini-3-pro-preview (3.1 had timeout/503 issues)
- Gemini timeout 30s → 90s — Pro models need longer than Flash
v2.15.0 (February 12, 2026)
31 Prompt Techniques + /blueprint Skill + MiniMax M2.5
- 9 new prompt techniques — reflexion (Shinn 2023), react (Yao 2022), scot (Li 2025, +13.79% HumanEval), pre_mortem, rubber_duck, test_driven, pre_post, bdd_spec, least_to_most. Total: 31 techniques
- /blueprint skill — multi-model council → bite-sized TDD implementation plans. 7-step pipeline: Grok search → Qwen+Kimi analysis → GPT pre-mortem → Gemini final TDD output
- MiniMax M2.5 — SWE-Bench 80.2% (was 72.5%). Embedded SCoT, reflexion, rubber_duck techniques. Per-task temperatures
- Planner → writing-plans bridge — planner_maker now outputs bite-sized TDD steps (exact files, test-first, commit points)
- Enhanced skills — /breakdown uses least_to_most + pre_mortem, /judge adds pre-mortem to critique, /decompose adds contracts, /prompt auto-recommends from 30 intents
- 51 tools across 7 providers, 9 skills for Claude Code
v2.14.7 (February 5, 2026)
Gemini Judge + Multi-Model Jury
- gemini_judge — science-backed LLM-as-a-Judge evaluation (arXiv:2411.15594). 4 modes: synthesize, evaluate, rank, resolve
- jury — multi-model jury panel. Configurable jurors (grok, openai, qwen, kimi, perplexity, minimax) run in parallel; Gemini synthesizes the verdict. Based on "Replacing Judges with Juries" (Cohere, arXiv:2404.18796)
- Perplexity fix — sonar-pro model ID corrected (was using lightweight sonar by mistake)
- perplexity_research — removed in v2.15.6 (cost too high)
- 51 tools across 7 providers in the full profile
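The jury pattern — parallel jurors plus a single synthesizer — can be sketched in a few lines of Python. The juror stubs and the majority rule below are placeholders for real model calls, not TachiBot's actual code:

```python
from concurrent.futures import ThreadPoolExecutor

def run_jury(question, jurors, synthesize):
    """Fan the question out to all jurors in parallel, then hand every
    opinion to a single synthesizer (Gemini, per the changelog entry)."""
    with ThreadPoolExecutor(max_workers=len(jurors)) as pool:
        opinions = list(pool.map(lambda juror: juror(question), jurors))
    return synthesize(opinions)

# Placeholder jurors -- real ones would call grok, qwen, kimi, etc.
def make_juror(name, verdict):
    return lambda question: {"juror": name, "verdict": verdict}

def majority(opinions):
    """Simplest possible synthesizer: pick the most common verdict."""
    verdicts = [o["verdict"] for o in opinions]
    return max(set(verdicts), key=verdicts.count)
```

A majority vote is the crudest possible verdict rule; the real tool presumably prompts Gemini with all juror outputs and lets it write a reasoned synthesis instead.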
v2.14.6 (February 5, 2026)
Qwen3-Coder-Next
- qwen_coder upgraded — Qwen3-Coder-Next (80B/3B MoE, 262K context, SWE-Bench >70%)
- 3x cheaper — $0.07/$0.30 per M tokens (was $0.22/$0.88)
- 2x context — 262K tokens (was 131K)
- Auto-fallback — Falls back to legacy 480B coder on provider failure
v2.14.5 (February 2, 2026)
Claude Code Integration
- Tool annotations — All 51 tools now have MCP-standard annotations for better discovery
- Token overhead reduced — Stripped ANSI formatting, clean plain text output
- 25K character safety net — Smart truncation prevents Claude Code context overflow
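A truncation safety net like the one above can be approximated as follows — clip at a line boundary and append a marker so the model sees that output was cut. The 25K limit comes from the changelog; the marker text and line-boundary rule are assumptions:

```python
SAFETY_NET = 25_000  # character ceiling, per the changelog entry

def truncate_output(text, limit=SAFETY_NET, marker="\n...[truncated]"):
    """Clip tool output to the limit, backing up to the last full line
    so the marker lands on clean text. Sketch only, not the shipped logic."""
    if len(text) <= limit:
        return text
    clipped = text[: limit - len(marker)]
    clipped = clipped.rsplit("\n", 1)[0]  # drop the partial trailing line
    return clipped + marker
```

Cutting mid-line risks splitting a token or a code statement in half, which is why the sketch backs up to the previous newline before appending the marker.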
v2.10 (January 28, 2026)
Multi-Model Planner
- Multi-model council creates verified implementation plans
- Model roles — Grok searches ground truth, Qwen analyzes feasibility, GPT-5.2 critiques, Gemini scores quality
- New models — Kimi K2.5 (multimodal + agent swarm), MiniMax M2.5 (SWE-Bench 80.2%)
- New tools — qwen_reason, minimax_code, minimax_agent, gemini_search
- Smart routing — tools route based on availability, cost, and quality
v2.8 (January 20, 2026)
Prompt Techniques
- FocusExecutionService for clean mode orchestration
- 22 research-backed techniques — first_principles, tree_of_thoughts, council_of_experts, and more
- Preview before execute — see enhanced prompts before running them
- Heartbeat support for long-running operations
v2.7.9 (January 2, 2026)
Search Grounding
- qwen_algo — O(1)-first algorithm analysis with Qwen3-235B-Thinking (235B MoE, LiveCodeBench 91.4)
- gemini_search — Google Search grounding with dynamic retrieval
- Format utilities for consistent output across tools
v2.3 (December 28, 2025)
Enhanced Thinking
- nextThought with finalJudge — Auto-call judge model when session completes
- Context aliases — use "none", "recent", "all" instead of magic numbers
- Context distillation — compress 8000+ tokens to ~500 (16x savings)
- usage_stats tool for tracking tool usage and costs
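The context aliases might resolve along these lines; the window size for "recent" is an assumption, since the changelog does not specify it:

```python
RECENT_WINDOW = 3  # assumed -- the changelog does not give the real window

def resolve_context(alias, history):
    """Map a context alias onto a slice of prior thoughts,
    replacing the magic numbers the changelog mentions."""
    if alias == "none":
        return []
    if alias == "recent":
        return history[-RECENT_WINDOW:]
    if alias == "all":
        return list(history)
    raise ValueError(f"unknown context alias: {alias}")
```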
v2.1 (November 25, 2025)
Gateway Mode
- OpenRouter Gateway — One API key for all models
- Unified billing through OpenRouter
v2.0 (October 15, 2025)
Major Rewrite
- Multi-model orchestration rebuilt from scratch
- Tool profiles for context control
- YAML workflow engine with variable interpolation
- 6 AI providers, 31+ tools (now 51 tools across 7 providers)
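Variable interpolation in a YAML workflow engine typically substitutes placeholders after the file is parsed. A minimal sketch, assuming a {{var}} placeholder syntax — the actual TachiBot syntax is not documented here:

```python
import re

def interpolate(template, variables):
    """Replace {{name}} placeholders with values from a dict;
    unknown names are left untouched. Placeholder syntax is assumed."""
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: str(variables.get(m.group(1), m.group(0))),
        template,
    )
```

Leaving unknown placeholders intact (rather than erroring) lets a workflow be interpolated in stages, with later steps filling in variables produced by earlier ones.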