Back to blog 2026-06-11

The Brain Talks to Everything Now

Four days ago I shipped the third milestone of The Brain — webhook triggers with HMAC auth, file watchers in their own container, the {trigger.X} placeholder family for inbound payloads. That was M3. The Brain had the four classical trigger types: manual, scheduled, webhook, file.

Today M4 is done. The Brain now talks to other tools — natively, over MCP — and the LLM step picks its own model per call.

Why this matters

M1 was the runner. M2 made the runner work unattended. M3 made the runner reactive. M4 makes the runner ecosystem-aware.

Before M4, The Brain was a workflow orchestrator that knew how to do three things on its own: run shell commands, call a local LLM through a fixed configured endpoint, and call Memory Vault over its REST API. That was the entire step-type surface. Useful, but every integration with anything new required writing a custom adapter.

After M4, The Brain can call any MCP server as a workflow step. Memory Vault's MCP server, GitHub's MCP server, Sentry's MCP server, your own. The stdio transport is the v1.0 commitment; the workflow file says "spawn this MCP server, call this tool, here are the arguments" and The Brain handles the lifecycle.

The LLM step also got per-step overrides. Before M4, every workflow used one configured model server at one URL. Now each step can name its own provider URL, its own model, its own API key, its own timeout, its own max tokens. Mix a fast local model and a slow careful one in the same workflow. Point at LM Studio for some steps and a cloud endpoint for others.

What M4 ships

Per-step LLM overrides.

LLMStep(
    name="fast_summary",
    prompt="Two sentences: {previous.recall}",
    model="mistralai/ministral-3-3b",
    timeout_seconds=60,
    max_tokens=400,
)

LLMStep(
    name="careful_analysis",
    prompt="Detailed breakdown of: {fast_summary}",
    provider_url="http://other-host:1234/v1",
    api_key="sk-...",
    model="anthropic/claude-3-5-sonnet",
    timeout_seconds=600,
    max_tokens=4000,
)

Each field overrides the corresponding LLM_BASE_URL / LLM_API_KEY / LLM_MODEL env var when set; falls back to the env var when left as None. Tested against LM Studio only — other OpenAI-compatible providers (Ollama, vLLM, llama.cpp server, OpenAI proper) may work via the same wire format but are not promised in v1.0.

MCP tool calling as a step type.

McpToolStep(
    name="recall",
    server_command="python -m memory_vault.mcp",
    tool="recall",
    args={"query": "{previous.search_term}", "limit": 10},
    timeout_seconds=30,
)

The Brain spawns the MCP server as a subprocess for that one step, runs the MCP initialize handshake, calls one tool, and kills the subprocess. Fresh start every time. No shared state. If the MCP server crashes mid-call, only that step fails — the next step gets a brand-new subprocess.

The server_command and string values in args accept {previous.X} and {trigger.X} placeholders the same way ShellStep.command does. The tool name and args keys are never substituted — those are protocol-level identifiers, not user data. Non-string args values (ints, bools, nested dicts) pass through unchanged.

stdio transport only in v1.0 (HTTP transport is a future consideration). initialize + tools/call only — no tools/list, no resources, no prompts, no server-initiated notifications.

The derive-your-own-image pattern.

The stock mihaibuilds/the-brain image bundles zero MCP servers. The Brain is a workflow orchestrator; MCP servers are independent products. Coupling them would force users into installing things they don't need — anyone who only wants shell + LLM + webhook workflows should pay zero MCP cost.

If your workflow's server_command calls an MCP server, install that server in a derived image:

FROM mihaibuilds/the-brain:latest

# install whichever MCP server(s) your workflows call —
# each server's install instructions live in its own repo
RUN <install-command-per-the-server-s-readme>

This keeps each ecosystem product independent. The Brain stands alone. Memory Vault stands alone. You compose them by deriving your own image with the combination you want.

A worked example. examples/brain-with-mv-mcp/ ships a complete composition: Dockerfile, docker-compose.yml, a verify workflow, and a runbook. It's the canonical reference for the pattern.

How it works under the hood

Three locked invariants worth naming directly.

Per-step spawn lifecycle. Every McpToolStep spawns its MCP server subprocess at step start, runs the MCP handshake, invokes one tools/call, and kills the subprocess at step end. No shared client between steps. No pooling. No long-running MCP server process. Trade-off: ~200-500ms startup cost per MCP step for a server like MV's that loads sentence-transformers + spaCy + a pgvector connection on every spawn. Acceptable for v1.0's self-hoster audience; per-run pooling is a future consideration if real latency complaints surface.

Substitution boundaries are sharp. The _resolve_step function in the runner extends to handle dict-typed args on McpToolStep. It iterates dict values, substitutes string-typed values via {previous.X} and {trigger.X} resolvers, leaves non-strings and keys untouched. The tool name is never substituted. Nested-dict args (args={"filter": {"query": "{previous.X}"}}) are not recursively substituted — consistent with the {trigger.body.foo} no-nesting rule from M3. If a workflow needs nested-dict semantics, it builds the string at a previous step.

isError: true becomes step failure. When an MCP server returns a successful JSON-RPC response that contains isError: true in its result, The Brain treats that as step failure. The workflow halts the same way it would on a non-zero shell exit code. The first text content block in the response becomes the step's error message. This makes MCP errors flow through the same workflow-halt semantics as every other failure path — workflow authors don't have to check isError in every downstream step.

The moment for the ecosystem

I want to call this out separately because it matters more than either feature in isolation.

Memory Vault went live two months ago. The Brain has been under construction since May. I've been calling them "the ecosystem" the whole time, but they were two completely separate projects living in two completely separate repositories. They had never actually worked together end-to-end.

For M4's verify pass, I built a derived image: The Brain plus Memory Vault, both installed, separate Postgres instances (Brain's tables + MV's pgvector tables), three containers in one Docker network. The verify workflow asks Memory Vault — over MCP — for memories matching a query. Memory Vault searches its pgvector index and returns chunks with similarity scores. The Brain pipes those chunks into a local LLM step. The LLM writes a short digest. A shell step saves the digest to a file.

It worked. Real database, real hybrid search, real LLM call, real file written.

I ran it twice — once with Ministral-3B-Instruct loaded in LM Studio (about 4 seconds end-to-end) and once with Qwen3.5-9B — a reasoning-style model with much larger token budgets and patience (about 2 minutes 13 seconds). Same workflow file. The only difference was three fields on the LLM step: model, timeout_seconds, max_tokens.

Both summaries were real. The fast model wrote a tight two-sentence digest. The reasoning model produced a longer, more comprehensive summary that captured more of the original context — at thirty times the wall-clock cost.

This is the first time The Brain and Memory Vault have actually composed in production shape. It's the moment where "the ecosystem" stops being a word on a roadmap and starts being a system that exists. Worth a quiet pause.

What M4 does not do, on purpose

The LLM step does not drive tool calling. LLMStep is chat-completion only — it produces text. If a workflow wants "LLM picks an MCP tool to call," it wires it explicitly: an LLMStep produces a tool name, {previous.X} substitution puts that name into the next McpToolStep.tool field. But the tool field is locked NOT-substituted — meaning the workflow author has to chain through the args instead, or wire each candidate tool as a separate branch. The workflow file is the orchestrator. The LLM transforms text. It does not decide. This is by design.

No tools/list discovery. Workflow authors are expected to know the tool name and the args shape in advance, the same way they know what shell commands they're calling. If you want introspection, build it in a separate workflow step.

MCP HTTP transport is not in v1.0. Stdio only. HTTP transport is the streamable-HTTP MCP variant, which is a more recent addition to the spec and brings its own auth surface (Bearer / mTLS / OAuth). For v1.0, stdio is the deeper, more universal transport. Future versions of The Brain may add HTTP.

The stock image bundles zero MCP servers. Per the ecosystem rule. Derive-your-own-image is the documented path.

Per-run MCP server pooling is not implemented. Per-step spawn is the lifecycle. Two McpToolStep calls to the same server in one workflow run produce two distinct subprocess PIDs. The cold-start cost on each call is real; for v1.0, the isolation guarantee is worth it.

No custom LLM auth schemes. Bearer-only when an api_key is set, no header when it isn't. If your provider needs something else, you bake that into your derived image.

Reasoning models need bigger budgets. This isn't a v1.0 restriction; it's a model-behavior reality. Reasoning-style LLMs (qwen 3.x+, o1-style, R1-style, QwQ) consume token budget on internal reasoning before producing visible content. If you point a per-step LLM call at a reasoning model with default budgets (60 second timeout, no max_tokens), you may get back empty content. The fix is bigger budgets — timeout_seconds=600 and max_tokens=8000+ is a reasonable starting point. Instruct models (Ministral, Mistral Instruct, Llama Instruct) don't have this behavior.

What's next

Milestone 5 is the v1.0 launch milestone. It's not new features — it's continuous integration, a security audit, full docs, the public README polish, and the launch ritual. After M5 ships, The Brain is publicly v1.0 — open-source, MIT, single-tenant, self-hosted, the same shape Memory Vault took at its own v1.0.

There's no M5 dev-log post on this blog. The next post on The Brain will be the v1.0 launch post itself.

Try it

git clone https://github.com/MihaiBuilds/the-brain
cd the-brain
THE_BRAIN_API_TOKEN=any-value docker compose up -d

# call any MCP server from a workflow (build your own derived image first
# with the MCP server installed — see examples/brain-with-mv-mcp/)
docker compose exec brain brain run examples/mcp_recall_memory.py

# or use per-step LLM overrides without any MCP setup
docker compose exec brain brain run examples/daily_digest.py

The repo has the full README, the derive-pattern example with a complete runbook for composing The Brain with Memory Vault, and reference workflows for both LLMStep and McpToolStep.

Follow along

Twitter / X: @mihaibuilds
Blog: mihaibuilds.com
GitHub: github.com/MihaiBuilds/the-brain

Watch the repo to follow along as each milestone ships.