Open-Source Multi-Model Orchestration

Reasoning in Steps, Not Guesses.

TachiBot runs your prompt through structured reasoning tools — council, jury, planner, step-by-step — so models debate, verify, and reason deeper before they answer. One answer you can trust.

Gateway: One OpenRouter key

BYOB: Your own provider keys

Perplexity: Always needs its own key

Get Started → · Read Docs

"Have Grok check Twitter for that error message"

"Ask Perplexity what changed in React 19 this week"

"Get Gemini to brainstorm, then have Kimi K3 and GPT-5.6 both analyze it"

Stop Trusting. Start Verifying.

Same question. Different approach. Better answer.

"What breaking changes are in React 19?"

One Model

Generic advice, might miss recent updates

No sources or documentation links

Could confuse React 18 vs 19 features

No way to verify accuracy

Unverified, possibly outdated

TachiBot

1. Run in Parallel

OpenAI · Google · Perplexity · OpenRouter

2. Cross-Verify

Models challenge each other, 3-200 rounds

3. Fact-Check

Perplexity · Grok

Live sources with recency filters

Verified, with official sources

Get Started →

Building Blocks for Thinking

Mix and Match Your Reasoning Pipeline

Mix and match tools, prompt techniques, and workflows to build your reasoning pipeline.

New

Multi-Model Planner

Create bulletproof implementation plans. A council of 6 AI models — Grok searches for ground truth, Qwen analyzes code, Kimi K3 reasons step-by-step and decomposes into dependency-ordered subtasks, GPT-5.6 critiques gaps, Qwen 235B drafts the synthesis, and Gemini judges the final plan with quality scores and verification checkpoints.

planner_maker + planner_runner

New

Multi-Model Jury

Run your question through a lab-diverse panel of AI jurors in parallel. Grok reasons from first principles, DeepSeek brings frontier math & logic, Kimi decomposes step-by-step, GPT evaluates tradeoffs — with 12 jurors to choose from, including a free offline local juror. Gemini Judge synthesizes a unified verdict using science-backed LLM-as-a-Judge methodology.

jury + gemini_judge

12 AI Providers

GPT-5.6, Gemini, Grok, Perplexity, Kimi K3, Qwen, MiniMax, DeepSeek V4, GLM-5.2, StepFun, ERNIE — plus free local models via Ollama or LM Studio. Gateway mode with one OpenRouter key, or BYOB with your own provider keys.

>70% Qwen Coder-Next · SWE
98% Qwen 235B · HMMT
2.8T Kimi K3 · open MoE
62.1 GLM-5.2 · SWE Pro
−54% GPT-5.6 sol · tokens
1M Gemini 3.6 Flash · context

gateway + BYOB + local

New

31 Prompt Techniques

Research-backed patterns including reflexion, SCoT, ReAct, pre_mortem, and 27 more. Embedded in tool prompts for automatic application. Preview before executing.

preview → execute
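The preview → execute split can be pictured as two functions that share one template lookup. This is a minimal sketch, not TachiBot's implementation: only the technique names `pre_mortem` and `reflexion` come from this page, while the template wording, the function names, and the stand-in model are invented for illustration.

```python
from typing import Callable

# Hypothetical templates: the technique names appear on this page,
# but the template wording below is invented for illustration.
TECHNIQUES = {
    "pre_mortem": "Assume this plan has already failed. List the likely causes, then answer:\n{query}",
    "reflexion": "Answer, critique your own answer, then give a revised answer:\n{query}",
}


def preview(technique: str, query: str) -> str:
    """Return the enhanced prompt without calling any model."""
    if technique not in TECHNIQUES:
        raise ValueError(f"unknown technique: {technique}")
    return TECHNIQUES[technique].format(query=query)


def execute(technique: str, query: str, model: Callable[[str], str]) -> str:
    """Build the same enhanced prompt, then send it to the model."""
    return model(preview(technique, query))


def echo_model(prompt: str) -> str:
    """Stand-in for a real model call; it only reports the prompt size."""
    return f"model received {len(prompt)} characters"
```

Because `execute` goes through `preview`, what you inspect is exactly what gets sent.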

YAML Workflows

Chain unlimited steps. Variable interpolation, step dependencies, auto-distillation, comparison tables, and optional AI judge. PingPong debates, research pipelines, code reviews.

workflow + planner
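Step dependencies and variable interpolation can be sketched in a few lines. The workflow below is written as the dict a YAML file might parse to; the field names (`tool`, `input`, `needs`) and the `${name}` syntax are assumptions for illustration, not TachiBot's actual schema, and `fake_tool` stands in for a real tool call.

```python
from graphlib import TopologicalSorter
from string import Template
from typing import Callable

# Hypothetical workflow shape; tool names are from this page, field names are not.
WORKFLOW = {
    "research": {"tool": "perplexity_ask", "input": "Find sources on ${topic}", "needs": []},
    "critique": {"tool": "grok_reason", "input": "Challenge this: ${research}", "needs": ["research"]},
    "verdict": {
        "tool": "gemini_judge",
        "input": "Merge: ${research} | ${critique}",
        "needs": ["research", "critique"],
    },
}


def run_workflow(steps: dict, variables: dict, call_tool: Callable[[str, str], str]) -> dict:
    """Run steps in dependency order; each output becomes a variable for later steps."""
    graph = {name: step["needs"] for name, step in steps.items()}
    context = dict(variables)
    for name in TopologicalSorter(graph).static_order():  # raises CycleError on a cycle
        step = steps[name]
        prompt = Template(step["input"]).substitute(context)  # raises KeyError if a variable is missing
        context[name] = call_tool(step["tool"], prompt)
    return context


def fake_tool(tool: str, prompt: str) -> str:
    """Stand-in for a real tool call."""
    return f"[{tool}] {prompt}"
```

Calling `run_workflow(WORKFLOW, {"topic": "React 19"}, fake_tool)` runs `research` first, then `critique`, then `verdict`, with each later prompt embedding the earlier outputs.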

Multi-model deliberation

The Council & Jury

One question fans out to a panel of independent models in parallel. Their perspectives are aggregated, then a Gemini judge synthesizes a single verdict — the “replace judges with juries” pattern, made real.

FLOW · jury → gemini_judge

QUERY → jurors in parallel: grok (first-principles), deepseek (frontier math), kimi (step-by-step), openai (tradeoffs) → AGGREGATE (perspectives) → gemini (presiding judge) → VERDICT
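The fan-out, aggregate, judge flow maps naturally onto `asyncio.gather`. The sketch below uses stub jurors and a stub judge in place of real provider calls; the function names are hypothetical, and dropping failed jurors before judging is an assumption about how a panel could tolerate outages, not a description of TachiBot's behavior.

```python
import asyncio
from typing import Awaitable, Callable

Juror = Callable[[str], Awaitable[str]]
Judge = Callable[[str, list[str]], Awaitable[str]]


def make_stub_juror(name: str, angle: str) -> Juror:
    """Build a stand-in juror; a real one would call a provider API."""
    async def juror(question: str) -> str:
        await asyncio.sleep(0)  # yield control, as a network call would
        return f"{name} ({angle}): view on '{question}'"
    return juror


async def stub_judge(question: str, perspectives: list[str]) -> str:
    """Stand-in judge; a real one would ask a model to synthesize."""
    return f"Verdict on '{question}' from {len(perspectives)} perspectives"


async def run_jury(question: str, jurors: list[Juror], judge: Judge) -> str:
    # Fan out: all jurors answer the same question concurrently.
    results = await asyncio.gather(*(j(question) for j in jurors), return_exceptions=True)
    # Aggregate: keep successful answers so one outage does not sink the panel.
    perspectives = [r for r in results if isinstance(r, str)]
    if not perspectives:
        raise RuntimeError("no juror produced an answer")
    # Judge: a single pass turns the perspectives into one verdict.
    return await judge(question, perspectives)


PANEL = [
    make_stub_juror("grok", "first-principles"),
    make_stub_juror("deepseek", "frontier math"),
    make_stub_juror("kimi", "step-by-step"),
    make_stub_juror("openai", "tradeoffs"),
]
```

Run it with `asyncio.run(run_jury("Which cache strategy?", PANEL, stub_judge))`.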

Where each model is strongest

researched via grok_search + perplexity_ask · Jun 2026

Grok 4.5 · juror

Opus-class flagship — real-time info, configurable reasoning effort, lowest hallucination rate.

DeepSeek V4 Pro · juror

Open-weight frontier reasoning — top AIME & CodeElo math/CP scores.

Kimi K3 · juror

2.8T open-weight MoE — the largest open model shipped. Long-horizon agentic coding, 1M context, natively multimodal.

GPT-5.6 sol · juror

Agentic workflows & rock-solid tool-use reliability. 54% more token-efficient than 5.5.

Gemini 3.1 Pro · judge

Long-context multimodal reasoning & science — the presiding judge.

GLM-5.2

Long-horizon coding — SWE-Bench Pro 62.1, Terminal-Bench 81.0; agentic planning juror.

StepFun 3.7 Flash

Near-frontier AIME/SWE-Verified reasoning at flash-tier cost.

ERNIE 4.5 VL

Broad world knowledge & human-preference/arena strength.

Gemini 3.6 Flash

High-speed frontier agentic coding & grounded search, 1M context.

Qwen3-235B-Thinking

Heavy mathematical & formal reasoning in thinking mode (HMMT 98%).

MiniMax M3

1M-token context + long-horizon agents, cost-efficient.

Local models

Free offline juror via Ollama / LM Studio — zero token cost, fully private. Runs any local model, incl. Nous Hermes builds. (The Hermes agent itself is model-agnostic — 300+ backends from GPT and Claude to self-hosted.)

Judge

gemini_judge

One model · one pass

A single evaluator. Give it a set of perspectives you already have and it scores, ranks, resolves conflicts, or merges them into one answer. It is the synthesis primitive — the last step the jury and council both lean on.

Jury

jury

Many models · one round

Many jurors answer in parallel, then a judge synthesizes. One tool call, fast, and the default entry point for any “which is right / which is best?” question. In short: N jurors + 1 judge.

Council

/council

Many models · many rounds

The heavyweight. Research → adversarial debate → sequential reasoning → final synthesis, iterated over multiple rounds. For architecture calls and deep trade-offs where one round isn't enough.
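The council's multi-round shape can be sketched as a loop over phases followed by a synthesis step. This is a sketch under assumptions: the phase names mirror the description above, but the function names, the notes-passing convention, and the choice to synthesize once after the last round are all hypothetical, and the phases are stubs rather than model calls.

```python
from typing import Callable

# Each phase takes the question plus the notes so far and returns updated notes.
Phase = Callable[[str, str], str]

PHASE_ORDER = ("research", "debate", "reasoning")


def run_council(question: str, phases: dict[str, Phase],
                synthesize: Callable[[str, str], str], rounds: int = 3) -> str:
    """Iterate the phases for a fixed number of rounds, then synthesize once."""
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    notes = ""
    for _ in range(rounds):
        for name in PHASE_ORDER:
            # Later phases and later rounds see everything produced so far.
            notes = phases[name](question, notes)
    return synthesize(question, notes)


def make_stub_phase(name: str) -> Phase:
    """Stand-in phase that just records that it ran."""
    def phase(question: str, notes: str) -> str:
        return notes + f"{name};"
    return phase


def stub_synthesize(question: str, notes: str) -> str:
    """Stand-in synthesis; a real one would ask a model for the final answer."""
    return f"{question} -> {notes.count(';')} notes"
```

The contrast with the jury is the outer loop: a jury is one fan-out and one judgment, while this shape feeds each round's output into the next.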

The arsenal

Every tool, filterable

Filter the full toolset by capability or provider — or search by name. Each tool follows the strict design rule: one required string content param, all enums optional with defaults.
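The design rule (one required string parameter, every enum optional with a default) can be expressed as a small validator. This is an illustrative sketch only: reading the rule as a parameter literally named `content` is a guess, and the `effort` enum and its values are invented, not taken from TachiBot's tool schemas.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToolSpec:
    """A tool with one required string param and optional enum params."""
    name: str
    required: str  # name of the single required string parameter
    enums: dict = field(default_factory=dict)  # param -> (allowed values, default)

    def resolve(self, args: dict) -> dict:
        """Validate args and fill in enum defaults."""
        value = args.get(self.required)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"{self.name}: '{self.required}' must be a non-empty string")
        unknown = set(args) - {self.required} - set(self.enums)
        if unknown:
            raise ValueError(f"{self.name}: unknown params {sorted(unknown)}")
        resolved = {self.required: value}
        for param, (allowed, default) in self.enums.items():
            choice = args.get(param, default)
            if choice not in allowed:
                raise ValueError(f"{self.name}: {param} must be one of {allowed}")
            resolved[param] = choice
        return resolved


# Hypothetical spec: the tool name is from this page, the enum is invented.
GROK_REASON = ToolSpec(
    name="grok_reason",
    required="content",
    enums={"effort": (("low", "medium", "high"), "medium")},
)
```

With this shape a caller can always get a valid call from a single string, for example `GROK_REASON.resolve({"content": "why?"})`, and only reaches for the enums when it needs to.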

Capability

Provider

Showing 65 tools

grok_reason · Grok (xAI)
Deep first-principles reasoning with optional heavy mode.
Reasoning · grok-4.5

openai_reason · OpenAI
Mathematical & high-effort reasoning with GPT-5.6 sol at high reasoning effort.
Reasoning · gpt-5.6-sol

qwen_reason · Qwen
Heavy mathematical reasoning (235B thinking).
Reasoning · qwen3-coder-next

qwq_reason · Qwen
Multi-perspective deliberation: optimist, pessimist, expert, contrarian.
Reasoning · qwen3-coder-next

kimi_thinking · Kimi
Multimodal reasoning with a 100-subagent swarm.
Reasoning · kimi-k3

perplexity_reason · Perplexity
Reasoning grounded in live web search.
Reasoning · sonar

deepseek_reason · DeepSeek
Open-weight frontier reasoning: top AIME/GPQA math & logic chains.
Reasoning · deepseek-v4-pro

glm_reason · GLM (Z.ai)
Agentic reasoning & tool-use planning (SWE-Bench Pro leader).
Reasoning · glm-5.2

stepfun_reason · StepFun
Efficient deep reasoning: high AIME/SWE-Verified at flash-tier cost.
Reasoning · step-3.7-flash

ernie_reason · ERNIE (Baidu)
Broad-knowledge reasoning with human-preference/arena strength.
Reasoning · ernie-4.5-vl

local_query · Local
Query a local model via Ollama/LM Studio/llama.cpp/vLLM: free, offline, private.
Reasoning · ollama · lm studio

nextThought · TachiBot
Sequential thinking chain with optional per-step model execution.
Reasoning · orchestration

think · TachiBot
Lightweight internal reasoning log.
Reasoning · orchestration

qwen_coder · Qwen
Agentic code generation. SWE-bench >70%.
Coding · qwen3-coder-next

qwen_algo · Qwen
Algorithm analysis: complexity & optimization tiers.
Coding · qwen3-coder-next

deepseek_algo · DeepSeek
Strongest algorithmic review: correctness, Big-O, edge cases, data structures.
Coding · deepseek-v4-pro

qwen_competitive · Qwen
Competitive-programming problem solving.
Coding · qwen3-coder-next

kimi_code · Kimi
SWE-focused code generation & bug-fixing leader.
Coding · kimi-k3

minimax_code · MiniMax
Single-pass code operations, cost-efficient (1M context).
Coding · minimax-m3

grok_code · Grok (xAI)
Code analysis & optimization.
Coding · grok-4.5

grok_debug · Grok (xAI)
Specialized debugging assistance.
Coding · grok-4.5

debug_triage · Grok (xAI)
Ranked root-cause hypotheses with likelihoods, a discriminating check per hypothesis, and the minimal fix for the leader.
Coding · grok-4.5

openai_code_review · OpenAI
Comprehensive code review with focus areas.
Coding · gpt-5.6-sol

testgen · Qwen
Generate runnable tests via Qwen3-Coder-Next, with edge cases enumerated first.
Coding · qwen3-coder-next

diff_review · TachiBot
Multi-model diff-aware review: Kimi K3 + DeepSeek + GPT-5.6 scoped to changed lines, Gemini-judged verdict.
Coding · orchestration

perplexity_ask · Perplexity
Web search with current information (Sonar).
Research · sonar

grok_search · Grok (xAI)
Live web search with Grok 4.5.
Research · grok-4.5

grok_search_lite · Grok (xAI)
10x-cheaper live search on grok-4-1-fast, for high-volume lookups and jury fan-outs.
Research · grok-4-1-fast

openai_search · OpenAI
Real-time web search via GPT-5.6.
Research · gpt-5.6-sol

gemini_search · Gemini
Web search with Google Search grounding.
Research · gemini-3.6-flash

gemini_analyze_code · Gemini
Code quality, security & performance analysis.
Analysis · gemini-3.1-pro · 3.6-flash

gemini_analyze_text · Gemini
Rhetorical analysis: bias, fallacies, persuasion.
Analysis · gemini-3.1-pro · 3.6-flash

kimi_long_context · Kimi
Long-context analysis across a 1M window.
Analysis · kimi-k3

openai_explain · OpenAI
Clear explanations for complex topics, any level.
Analysis · gpt-5.6-sol

security_review · DeepSeek
OWASP/CWE security audit via DeepSeek V4 Pro: taint/data-flow analysis, severity per finding, concrete fixes.
Analysis · deepseek-v4-pro

grok_brainstorm · Grok (xAI)
Contrarian first-principles idea generation.
Brainstorm · grok-4.5

openai_brainstorm · OpenAI
Find the 3rd, 4th & 5th alternative approaches.
Brainstorm · gpt-5.6-sol

gemini_brainstorm · Gemini
Convergent synthesis that clusters ideas into themes.
Brainstorm · gemini-3.1-pro · 3.6-flash

jury · TachiBot
Multi-model jury panel → Gemini judge synthesis.
Council · orchestration

gemini_judge · Gemini
Multi-perspective evaluation & verdict synthesis.
Council · gemini-3.1-pro · 3.6-flash

planner_maker · TachiBot
Council-based plan creation via coordinator pattern.
Planning · orchestration

planner_runner · TachiBot
Execute plans against goal-oriented verification gates.
Planning · orchestration

list_plans · TachiBot
List recently created plans.
Planning · orchestration

kimi_decompose · Kimi
Structured task decomposition via agent swarm.
Planning · kimi-k3

minimax_agent · MiniMax
Multi-step task decomposition & execution.
Planning · minimax-m3

grok_architect · Grok (xAI)
System architecture & design via 4–16 agent swarm.
Planning · grok-4.5

plan_critique · TachiBot
Adversarial multi-model plan red-team: pre-mortem, hidden assumptions, ranked risks, Gemini-judged verdict.
Planning · orchestration

spec_writer · OpenAI
Loose request → reviewable spec via GPT-5.6 sol: user stories, Given/When/Then, out-of-scope, open questions. Sign off before planning.
Planning · gpt-5.6-sol

workflow · TachiBot
Execute YAML-based multi-step workflows.
Workflow · orchestration

workflow_start · TachiBot
Begin a workflow session with variables.
Workflow · orchestration

continue_workflow · TachiBot
Resume an interrupted workflow session.
Workflow · orchestration

list_workflows · TachiBot
Discover available workflows.
Workflow · orchestration

create_workflow · TachiBot
Create a new YAML workflow definition.
Workflow · orchestration

workflow_status · TachiBot
Check progress of a running workflow session.
Workflow · orchestration

validate_workflow · TachiBot
Validate workflow YAML: syntax, tool registry, dependency graph.
Workflow · orchestration

validate_workflow_file · TachiBot
Validate a workflow file on disk before running.
Workflow · orchestration

visualize_workflow · TachiBot
Render a workflow's step graph for inspection.
Workflow · orchestration

list_prompt_techniques · TachiBot
Discover techniques: ~9 core by default, all=true for the full 31.
Prompt Eng. · orchestration

preview_prompt_technique · TachiBot
Preview an enhanced prompt before executing, or recommend techniques with technique="auto".
Prompt Eng. · orchestration

execute_prompt_technique · TachiBot
Run a query with a chosen technique applied.
Prompt Eng. · orchestration

refine_prompt · OpenAI
Opt-in prompt improver: raw query → goal-first brief + what-changed + open questions. Never executes.
Prompt Eng. · gpt-5.6-luna

tachi · TachiBot
Smart auto-router: research, solve, verify, create, judge.
Routing · orchestration

focus · TachiBot
Mode-based multi-model reasoning (deep, debate, analyze).
Routing · orchestration

usage_stats · TachiBot
View or reset tool usage & cost statistics.
Routing · orchestration

doctor · TachiBot
Setup diagnostic: detected API keys, visible vs hidden tools (and why), active profile, suggested first step. Zero-cost.
Routing · orchestration

The roster

Twelve providers, one interface

Every model is wired through a single MCP surface with automatic fallbacks. Tools register only when their API key is present — including free offline local models.
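Key-gated registration can be pictured as a filter over a tool table. The sketch below is an assumption-heavy illustration: `OPENROUTER_API_KEY` and `PERPLEXITY_API_KEY` appear in the config on this page, but the other environment variable names, the table layout, and the function name are hypothetical. It also encodes the page's statement that Perplexity always needs its own key, even in gateway mode.

```python
# Hypothetical mapping from provider to the env var holding its key.
# Only the OpenRouter and Perplexity variable names come from this page.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "grok": "XAI_API_KEY",
    "perplexity": "PERPLEXITY_API_KEY",
}
GATEWAY_KEY = "OPENROUTER_API_KEY"
NEEDS_OWN_KEY = {"perplexity"}  # not reachable through the gateway key

# A small slice of the toolset: tool name -> provider.
TOOLS = {
    "openai_reason": "openai",
    "grok_search": "grok",
    "perplexity_ask": "perplexity",
}


def visible_tools(env: dict) -> list[str]:
    """Return the tools whose provider is reachable with the keys in env."""
    has_gateway = bool(env.get(GATEWAY_KEY))
    visible = []
    for tool, provider in TOOLS.items():
        has_own_key = bool(env.get(PROVIDER_KEYS[provider]))
        via_gateway = has_gateway and provider not in NEEDS_OWN_KEY
        if has_own_key or via_gateway:
            visible.append(tool)
    return sorted(visible)
```

In practice you would pass `dict(os.environ)`; taking the environment as an argument keeps the function easy to test.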

OpenAI

7 tools

gpt-5.6-sol · 1.05M ctx

Agentic reasoning, code review & explanations. Three tiers — sol flagship, terra for code, luna for fast explains.

Gemini

5 tools

gemini-3.1-pro · 3.6-flash · 1M ctx

Multi-perspective analysis, brainstorming & the presiding jury judge on 3.1 Pro; search runs the agentic 3.6 Flash.

Grok (xAI)

8 tools

grok-4.5 · 500K ctx

First-principles reasoning, architecture swarms & contrarian brainstorming. Opus-class flagship with configurable reasoning effort.

Kimi

4 tools

kimi-k3 · 1M ctx

Largest open-weight model shipped — 2.8T MoE, natively multimodal. Task decomposition, long-horizon agentic coding & 1M-context analysis.

Qwen

6 tools

qwen3-coder-next · 262K ctx

Algorithm analysis, agentic code generation & heavy mathematical reasoning. SWE >70%.

MiniMax

2 tools

minimax-m3 · 1M ctx

Cost-efficient coding & long-horizon agentic execution. MSA sparse attention, multimodal, 1M context.

Perplexity

2 tools

sonar · live web

Live web search & research grounded in current sources. Cost-optimized on Sonar.

DeepSeek

3 tools

deepseek-v4-pro · open wts

Open-weight frontier reasoning — top AIME/CodeElo math & competitive programming. Default juror & strongest algorithmic reviewer.

GLM (Z.ai)

1 tool

glm-5.2 · 1M ctx

Long-horizon coding flagship from Zhipu — SWE-Bench Pro 62.1, Terminal-Bench 81.0, usable 1M context.

StepFun

1 tool

step-3.7-flash · 196B

Efficient deep reasoning — near-frontier AIME/SWE-Verified scores at flash-tier cost.

ERNIE (Baidu)

1 tool

ernie-4.5-vl · 424B MoE

Broad world knowledge & human-preference strength. Multimodal 424B/47B-active MoE.

Local

1 tool

ollama · lm studio · offline

Any OpenAI-compatible local server — Ollama, LM Studio, llama.cpp, vLLM. Zero-cost, private, offline jurors.

See how it works — Routing, Planner & Internals →

Get Started in Minutes

Add to Claude Code, Claude Desktop, or any MCP client

1. Setup wizard (recommended)

npx -y -p tachibot-mcp tachibot init

Detects your keys and clients, prints the exact config · never writes or echoes keys

2. Claude Code (one command)

claude mcp add tachibot -- npx -y -p tachibot-mcp tachibot

Verify with /mcp · pass API keys with --env flags

3. Claude Desktop (install & configure)

npm install -g tachibot-mcp

{
  "mcpServers": {
    "tachibot": {
      "command": "tachibot",
      "env": {
        "OPENROUTER_API_KEY": "sk-or-xxx",
        "PERPLEXITY_API_KEY": "pplx-xxx"
      }
    }
  }
}

~/.config/Claude/claude_desktop_config.json · or the one-click .mcpb from the latest GitHub release · add only the keys you have

4. Start using

/tachi

See your available tools, skills, and configured keys

Documentation · GitHub

Early signal

This MCP server by @bypaweldev is really interesting! It allows your agent to send a prompt to a bunch of LLMs in parallel and process results. Fascinating.

Kent C. Dodds

@kentcdodds

Read on X

Open source

Built in public. Actively maintained.

TachiBot is open source and actively maintained — new releases ship regularly. If it helps your workflow, a star helps others find it.

Star on GitHub · Report a bug or request a feature