baro

LLM providers

baro runs on either Claude (through the Claude Code CLI) or GPT (through Mozaik's native OpenAI runner). Same DAG, same orchestration, two completely different transports underneath.

baro 0.40 ships with two end-to-end LLM paths. The --llm flag picks which provider every agent in the run — Architect, Planner, Critic, Surgeon, Story Agents — talks to.

baro --llm claude "Your goal"      # default; routes every phase through Claude Code
baro --llm openai "Your goal"      # routes every phase through Mozaik-native OpenAI

The DAG, the event bus, the participant set, the prompts — all identical. The only thing that moves is the provider on the other side of every call.

What the two flags actually do

--llm claude (the default) does not call Anthropic's API directly. It shells out to the Claude Code CLI (claude --print in headless mode) for every phase. Auth comes from your active Claude Code session. Today that means every baro run bills against your Claude Max subscription, same as your interactive Claude Code sessions.

--llm openai calls the OpenAI Responses API directly through Mozaik's native runner — no subprocess, no CLI in between. Auth comes from an OPENAI_API_KEY environment variable. Every token is billed per-call.

This is the most-misread distinction in baro. Same orchestration on top, two completely different economic and latency profiles underneath.

Picking a provider

Pick thisWhen
--llm claudeYou have a Claude Max subscription and want to amortise it. A multi-hour parallel baro run costs you nothing on top of the $200/mo you already pay.
--llm openaiYou want raw wall-clock speed. Mozaik-native calls return faster per request than Claude Code subprocess calls. Roughly $1–2 per mid-sized run with cache hits.

We ran a side-by-side benchmark on the Mozaik repo (add an Anthropic provider, mid-sized feature). GPT-5.5 finished in 2:16, Claude Opus 4.7 in 4:01. Identical DAG parallel speedup (1.18× vs 1.20×) — the whole gap is per-call latency, partly the model and partly the Claude Code subprocess overhead. See the JigJoy engineering blog for the full write-up.

Claude Max subscription change — June 15

Anthropic announced that on June 15, claude --print (the headless mode baro uses for --llm claude) stops billing against Claude Max and starts requiring its own API key. After the cutover, the cost case for the Claude path collapses — it just becomes "Anthropic API calls with extra subprocess overhead." A Mozaik-native Anthropic runtime is on the baro roadmap to address this.

Model defaults per phase

Every phase has a routed default, picked to match the workload of that phase. Reasoning-heavy phases get the flagship model; the per-turn verdict phase gets the cheap one.

Phase--llm claude--llm openai
Architectopusgpt-5.5
Planneropusgpt-5.5
Story Agentopusgpt-5.5
Critichaikugpt-5.4-mini
Surgeonopusgpt-5.5

Critic stays on the cheap tier because it runs once per agent per turn (highest-volume call in a run) and its verdict is a structured PASS/FAIL — flagship reasoning doesn't move that needle.

Overriding per phase

Each phase has its own --*-model flag if you want to override the routed default without touching the rest:

baro --llm openai \
  --architect-model gpt-5.5 \
  --planner-model gpt-5.4 \
  --story-model gpt-5.5 \
  --critic-model gpt-5.4-nano \
  "Your goal"

Available OpenAI models (Mozaik 3.9): gpt-5.5, gpt-5.4, gpt-5.4-mini, gpt-5.4-nano. Available Claude models (via Claude Code): opus, sonnet, haiku.

If you want one model pinned across every phase, use --model instead — it overrides every per-phase default:

baro --llm openai --model gpt-5.4 "Your goal"     # whole run on 5.4
baro --llm claude --model sonnet "Your goal"      # whole run on Sonnet

To opt out of routing entirely (no per-phase defaults, no per-phase overrides — every phase falls back to the SDK's own default), pass --no-model-routing.

Setting up --llm openai

If your shell already has OPENAI_API_KEY exported, baro picks it up automatically:

export OPENAI_API_KEY=sk-...
baro --llm openai "Your goal"

If you don't have it in your environment, baro detours through an interactive API-key entry screen before planning starts. Whatever you type stays in baro's memory for the duration of the run — it is not persisted to disk and is not logged.

baro --doctor does not yet check OPENAI_API_KEY directly; it only verifies the Claude Code path. If you're on the OpenAI path and the run fails before planning starts with OPENAI_API_KEY is not set, that's the obvious fix.

What stays the same on both paths

  • The Architect → Planner → Story → Critic → Surgeon → Finalizer pipeline is identical.
  • The DAG shape is decided by the Planner; both providers produce DAGs of the same structural quality on the same goal (we've measured this).
  • The Critic loop runs on both — agent output gets per-turn graded against the story's acceptance criteria regardless of provider.
  • The Finalizer composes the same PR body shape regardless of provider.
  • --quick, --parallel, --timeout, --resume, --intra-level-delay, --no-* toggles all work identically on both paths.

The orchestration is what baro built. The model is replaceable.

On this page