Cost & budget
How lupe accounts token usage in USD, caps spend with maxCostUsd, and lets you correct pricing per model.
lupe runs on your own model tokens, so every review has a real dollar cost. To keep that cost visible and bounded, lupe tracks token usage per model, converts it to USD, prints the total in the review summary, and can fail a run outright if the projected spend exceeds a cap you set.
Cost in the summary footer
After a review, lupe appends a small footer to the summary showing the usage and estimated cost for the run:
๐ธ 128,540 in ยท 3,102 out ยท 96,204 cached ยท ~$0.4187That line reports input tokens, output tokens, cached (cache-read) tokens, and the total estimated cost in USD (four decimal places). Cost is computed per model from token usage, then summed across every model and chunk the run touched.
Built-in pricing table
lupe ships a vendored pricing table in USD per 1M tokens. Model ids are matched by substring, so id variants (dates, suffixes) resolve to the right family. The Anthropic rows are verified against published pricing; every non-Anthropic row is a rough placeholder you should override per deployment.
| Model family (match) | Input | Output | Cache read | Cache write | Status |
|---|---|---|---|---|---|
opus | 5 | 25 | 0.5 | 6.25 | verified |
sonnet | 3 | 15 | 0.3 | 3.75 | verified |
haiku | 1 | 5 | 0.1 | 1.25 | verified |
fable / mythos | 10 | 50 | 1.0 | 12.5 | verified |
gpt-5-mini / gpt-4o-mini | 0.15 | 0.6 | 0.075 | 0.15 | placeholder |
gpt-5 / gpt-4.1 / gpt-4o | 2.5 | 10 | 1.25 | 2.5 | placeholder |
gemini โฆ flash | 0.15 | 0.6 | 0.0375 | 0.15 | placeholder |
gemini โฆ pro | 1.25 | 10 | 0.31 | 1.25 | placeholder |
A model id that matches no row resolves to a zero price marked "unknown" โ which the cost cap treats as a hard failure rather than as "free" (see below).
Prompt caching
On Anthropic models, lupe reuses a frozen, cacheable system prefix across chunks of a large review. The first chunk writes that prefix to the cache (a cache-write); later chunks read it back (a cache-read) instead of paying full input price for the same bytes. That makes repeated-input cost much cheaper on multi-chunk runs.
The pricing table reflects the 5-minute ephemeral cache rates lupe uses: cache rows default to cacheRead = 0.1 ร input and cacheWrite = 1.25 ร input. The same defaults apply to any prices you supply that omit the cache fields (see below).
Capping spend with maxCostUsd
Set maxCostUsd in your config to put a hard USD ceiling on a single review run. lupe enforces it at two points, both of which stop the run rather than silently overspend:
- Pre-flight estimate โ before any model call. lupe projects the run's cost from the diff size and the resolved model price. If the projection exceeds
maxCostUsd, or if the model has no known price, the run fails immediately without contacting the model. The estimate is deliberately biased to over-predict so the cap fails safe. - Post-priming breaker โ after the first chunk. On a multi-chunk review, lupe reviews one cache-priming chunk first, measures its real cost, and re-projects the full run from that measurement. If the projection would push the run over the cap, it aborts before fanning out the remaining chunks.
Both checks fail closed on an unpriced model: if lupe cannot price the model, it will not run under a cost cap. The error tells you to set modelPrices.
# .lupe.yaml
maxCostUsd: 2.50maxCostUsd, you must supply a price via modelPrices โ otherwise the fail-closed cap will refuse to run.Correcting prices with modelPrices
The modelPrices map lets you override or add prices per exact model id. This is essential for bring-your-own endpoints and non-Anthropic models, where the built-in placeholders are wrong and would make cost accounting โ and the cost cap โ inaccurate.
Each entry is keyed by the exact model id and takes prices in USD per 1M tokens. input and output are required; cacheRead and cacheWrite are optional and default to input ร 0.1 and input ร 1.25 respectively.
# .lupe.yaml
modelPrices:
gpt-5:
input: 1.25
output: 10
cacheRead: 0.125
cacheWrite: 1.5625
my-self-hosted-model:
input: 0.8
output: 2.4
# cacheRead / cacheWrite omitted โ derived from inputAn override always wins over the built-in table, and providing one flips a previously "unknown" model to "known" so the cost cap can enforce a budget on it. See providers and models for how model ids are chosen per task.
Local backends are free
The opt-in local backends (claude-cli and codex-cli) drive your own already-authenticated CLI. They are unmetered โ lupe does not see token counts or per-call pricing for them โ so they always report a cost of $0.0000. maxCostUsd has nothing to gate on these backends.