What's New
Changelog and release history for TachiBot MCP. Every release brings new models, tools, and orchestration capabilities.
v2.19.1 (March 21, 2026)
MiniMax M2.7 — Self-Evolving AI
- MiniMax M2.5 → M2.7 — 2,300B MoE (100B active), 200K context. #1 on Artificial Analysis Intelligence Index
- SWE-Pro 56.22% — matches GPT-5.3-Codex. Multi-SWE-Bench 52.7% (#1, beats Opus 4.6 and GPT-5.4)
- Same pricing — $0.30/$1.20 per M tokens. Massive quality leap at zero extra cost
v2.18.0 (March 21, 2026): Major Release
AI That Proves Its Work
Stop babysitting LLMs. Deploy a pipeline that reads actual files, cross-examines across five models, and demands passing tests before moving forward.
- Absolute goal alignment — define success criteria once. The engine verifies every step against your exact goals — drift gets caught at step 1, not step 50
- No blind spots reach production — 5-model rotation cross-examines code: Gemini deduces, Grok detects drift, GPT validates strategy, Qwen cross-checks, Kimi decomposes
- Hard evidence, not hallucinated progress — checkpoints demand raw git diffs, passing test results, and modified file lists. Zero reliance on paraphrased summaries
- Never hit a dead end — structured amendment protocol detects drift, proposes revisions with evidence and impact analysis. You approve before it pivots
- 39 tools operate in reality — every analysis tool reads actual source code from disk via the files parameter. Models judge implementations, not stories about them
- Your project gets smarter every run — post-completion reflexion saves architectural lessons to your devlog. Knowledge compounds across sessions
v2.17.2 (March 21, 2026)
Files Parameter Rollout + Smart File Reader
- files on 8 more tools — grok_architect, grok_brainstorm, openai_explain, openai_search, kimi_code, kimi_long_context, gemini_judge, gemini_brainstorm
- Directory expansion — pass src/tools/ to read all code files in a directory
- Smart char budget — multi-file reads distribute tokens across files to prevent context overflow
- 23 of 37 tools now support the files parameter
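The smart char budget above can be sketched in a few lines — a minimal Python illustration of spreading a fixed character budget across several files. The function name, smallest-first ordering, and rollover rule are assumptions for illustration, not TachiBot's shipped code:

```python
from pathlib import Path

def read_with_budget(paths, total_chars=100_000):
    """Split a character budget across files, letting unused share
    from small files roll over to larger ones.
    Illustrative sketch only -- not TachiBot's actual implementation."""
    files = [Path(p) for p in paths]
    remaining = total_chars
    out = {}
    # Read smallest files first, so their leftover budget benefits big files
    ordered = sorted(files, key=lambda f: f.stat().st_size)
    for i, f in enumerate(ordered):
        share = remaining // (len(ordered) - i)  # even split of what's left
        text = f.read_text(errors="replace")[:share]
        out[f.name] = text
        remaining -= len(text)
    return out
```

Reading smallest-first means a tiny config file never wastes its share, while a large source file can absorb whatever budget remains.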
v2.17.1 (March 21, 2026)
Smart Task Decomposition
- kimi_decompose readability overhaul — output now uses OVERVIEW / STRUCTURE / DETAILS / RISKS sections
- Smart decomposition — infers context, constraints, risks, and measurable criteria automatically
- Reasoning leak fixed — strips Kimi K2.5 chain-of-thought from output
- Tuned for format adherence — temp 0.3, 4500 tokens, 360s timeout
v2.17.0 (March 21, 2026)
GPT-5.4-mini + Model Cleanup
- GPT-5.4-mini — new fast coding model (400k context, $0.75/$4.50 per 1M tokens, SWE-Bench 54.4%)
- Code tasks upgraded — openai_code_review now uses gpt-5.4-mini — 94% of flagship quality, 70% cheaper
- GPT-5.3 series retired — gpt-5.3-codex and gpt-5.3 removed; coding capabilities absorbed into gpt-5.4
- Simplified lineup — gpt-5.4 (flagship), gpt-5.4-mini (coding/fast), gpt-5.4-pro (expert)
v2.16.1 (March 6, 2026)
Gemini 3.1 Pro Migration
- Gemini 3.1 Pro — migrated from gemini-3-pro-preview to gemini-3.1-pro-preview before the March 9 retirement
- 1M context window — enhanced reasoning capabilities with Gemini 3.1 Pro
- Stale entries removed — cleaned up old display names and pricing for the retired model
v2.16.0 (March 6, 2026)
GPT-5.4 Upgrade + Brainstorm Fix
- GPT-5.4 default — most capable model (Mar 2026), $2.50/$15 per 1M tokens
- GPT-5.4-pro — expert model with higher compute ($30/$180 per 1M tokens)
- GPT-5.3-codex — new agentic coding model for code review tasks
- Gemini 3.1 Flash-Lite — added as fastest/cheapest option in 3.1 series
- openai_brainstorm fixed — eliminated fragile duplicate API function; now uses shared retry/fallback logic
- Token limits bumped — GPT-5.4 reasoning tokens eat into the output limit; all OpenAI tools now have higher defaults
v2.15.6 (February 26, 2026)
Full Audit: 6 More Fixes + Cost Optimization
- Required-enum anti-pattern fixed on 6 tools — usage_stats, openrouter_multi, gemini_judge, planner_maker, planner_runner, create_workflow
- gemini_judge — had zero required params; perspectives is now required as the primary content param
- perplexity_reason downgraded — sonar-pro ($3/$15 per M) → sonar-reasoning ($1/$5 per M), 3x cheaper
- perplexity_research removed — sonar-deep-research ($5/$25 per M) was burning $12 in 3 days
- All 51 tools audited — zero remaining required-enum violations
v2.15.5 (February 26, 2026)
Tool Parameter Fixes + Gemini Stability
- Parameter validation fixed on qwen_coder, kimi_code, minimax_code — AI clients were misusing the required enum task param; query added as the required primary param, task now optional with defaults
- kimi_long_context — task enum now optional (default: analyze)
- Gemini 3.1 → 3.0 rollback — reverted to stable gemini-3-pro-preview (3.1 had timeout/503 issues)
- Gemini timeout 30s → 90s — Pro models need longer than Flash
v2.15.0 (February 12, 2026)
31 Prompt Techniques + /blueprint Skill + MiniMax M2.5
- 9 new prompt techniques — reflexion (Shinn 2023), react (Yao 2022), scot (Li 2025, +13.79% HumanEval), pre_mortem, rubber_duck, test_driven, pre_post, bdd_spec, least_to_most. Total: 31 techniques
- /blueprint skill — multi-model council → bite-sized TDD implementation plans. 7-step pipeline: Grok search → Qwen+Kimi analysis → GPT pre-mortem → Gemini final TDD output
- MiniMax M2.5 — SWE-Bench 80.2% (was 72.5%). Embedded SCoT, reflexion, rubber_duck techniques. Per-task temperatures
- Planner → writing-plans bridge — planner_maker now outputs bite-sized TDD steps (exact files, test-first, commit points)
- Enhanced skills — /breakdown uses least_to_most + pre_mortem, /judge adds pre-mortem to critique, /decompose adds contracts, /prompt auto-recommends from 30 intents
- 51 tools across 7 providers, 9 skills for Claude Code
v2.14.7 (February 5, 2026)
Gemini Judge + Multi-Model Jury
- gemini_judge — science-backed LLM-as-a-Judge evaluation (arXiv:2411.15594). 4 modes: synthesize, evaluate, rank, resolve
- jury — multi-model jury panel. Configurable jurors (grok, openai, qwen, kimi, perplexity, minimax) run in parallel; Gemini synthesizes the verdict. Based on "Replacing Judges with Juries" (Cohere, arXiv:2404.18796)
- Perplexity fix — sonar-pro model ID corrected (was using lightweight sonar by mistake)
- perplexity_research — removed in v2.15.6 (cost too high)
- 51 tools across 7 providers in the full profile
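The jury pattern — parallel jurors plus a single synthesizer — can be sketched in a few lines of Python. The juror stubs and the majority rule below are placeholders for real model calls, not TachiBot's actual code:

```python
from concurrent.futures import ThreadPoolExecutor

def run_jury(question, jurors, synthesize):
    """Fan the question out to all jurors in parallel, then hand every
    opinion to a single synthesizer (Gemini, per the changelog entry)."""
    with ThreadPoolExecutor(max_workers=len(jurors)) as pool:
        opinions = list(pool.map(lambda juror: juror(question), jurors))
    return synthesize(opinions)

# Placeholder jurors -- real ones would call grok, qwen, kimi, etc.
def make_juror(name, verdict):
    return lambda question: {"juror": name, "verdict": verdict}

def majority(opinions):
    """Simplest possible synthesizer: pick the most common verdict."""
    verdicts = [o["verdict"] for o in opinions]
    return max(set(verdicts), key=verdicts.count)
```

A majority vote is the crudest possible verdict rule; the real tool presumably prompts Gemini with all juror outputs and lets it write a reasoned synthesis instead.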
v2.14.6 (February 5, 2026)
Qwen3-Coder-Next
- qwen_coder upgraded — Qwen3-Coder-Next (80B/3B MoE, 262K context, SWE-Bench >70%)
- 3x cheaper — $0.07/$0.30 per M tokens (was $0.22/$0.88)
- 2x context — 262K tokens (was 131K)
- Auto-fallback — Falls back to legacy 480B coder on provider failure
v2.14.5 (February 2, 2026)
Claude Code Integration
- Tool annotations — All 51 tools now have MCP-standard annotations for better discovery
- Token overhead reduced — Stripped ANSI formatting, clean plain text output
- 25K character safety net — Smart truncation prevents Claude Code context overflow
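A truncation safety net like the one above can be approximated as follows — clip at a line boundary and append a marker so the model sees that output was cut. The 25K limit comes from the changelog; the marker text and line-boundary rule are assumptions:

```python
SAFETY_NET = 25_000  # character ceiling, per the changelog entry

def truncate_output(text, limit=SAFETY_NET, marker="\n...[truncated]"):
    """Clip tool output to the limit, backing up to the last full line
    so the marker lands on clean text. Sketch only, not the shipped logic."""
    if len(text) <= limit:
        return text
    clipped = text[: limit - len(marker)]
    clipped = clipped.rsplit("\n", 1)[0]  # drop the partial trailing line
    return clipped + marker
```

Cutting mid-line risks splitting a token or a code statement in half, which is why the sketch backs up to the previous newline before appending the marker.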
v2.10 (January 28, 2026)
Multi-Model Planner
- Multi-model council creates verified implementation plans
- Model roles — Grok searches ground truth, Qwen analyzes feasibility, GPT-5.2 critiques, Gemini scores quality
- New models — Kimi K2.5 (multimodal + agent swarm), MiniMax M2.5 (SWE-Bench 80.2%)
- New tools — qwen_reason, minimax_code, minimax_agent, gemini_search
- Smart routing — tools route based on availability, cost, and quality
v2.8 (January 20, 2026)
Prompt Techniques
- FocusExecutionService for clean mode orchestration
- 22 research-backed techniques — first_principles, tree_of_thoughts, council_of_experts, and more
- Preview before execute — see enhanced prompts before running them
- Heartbeat support for long-running operations
v2.7.9 (January 2, 2026)
Search Grounding
- qwen_algo — O(1)-first algorithm analysis with Qwen3-235B-Thinking (235B MoE, LiveCodeBench 91.4)
- gemini_search — Google Search grounding with dynamic retrieval
- Format utilities for consistent output across tools
v2.3 (December 28, 2025)
Enhanced Thinking
- nextThought with finalJudge — Auto-call judge model when session completes
- Context aliases — use "none", "recent", "all" instead of magic numbers
- Context distillation — compress 8000+ tokens to ~500 (16x savings)
- usage_stats tool for tracking tool usage and costs
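The context aliases might resolve along these lines; the window size for "recent" is an assumption, since the changelog does not specify it:

```python
RECENT_WINDOW = 3  # assumed -- the changelog does not give the real window

def resolve_context(alias, history):
    """Map a context alias onto a slice of prior thoughts,
    replacing the magic numbers the changelog mentions."""
    if alias == "none":
        return []
    if alias == "recent":
        return history[-RECENT_WINDOW:]
    if alias == "all":
        return list(history)
    raise ValueError(f"unknown context alias: {alias}")
```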
v2.1 (November 25, 2025)
Gateway Mode
- OpenRouter Gateway — One API key for all models
- Unified billing through OpenRouter
v2.0 (October 15, 2025)
Major Rewrite
- Multi-model orchestration rebuilt from scratch
- Tool profiles for context control
- YAML workflow engine with variable interpolation
- 6 AI providers, 31+ tools (now 51 tools across 7 providers)
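Variable interpolation in a YAML workflow engine typically substitutes placeholders after the file is parsed. A minimal sketch, assuming a {{var}} placeholder syntax — the actual TachiBot syntax is not documented here:

```python
import re

def interpolate(template, variables):
    """Replace {{name}} placeholders with values from a dict;
    unknown names are left untouched. Placeholder syntax is assumed."""
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: str(variables.get(m.group(1), m.group(0))),
        template,
    )
```

Leaving unknown placeholders intact (rather than erroring) lets a workflow be interpolated in stages, with later steps filling in variables produced by earlier ones.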