All features

Everything you need to master
AI costs.

From a single-line proxy swap to enterprise governance and distributed tracing — ModelSpend grows with your team.

Core routing

OpenAI-compatible proxy

Point your existing OpenAI or Anthropic SDK at api.modelspend.best/proxy/v1. Every existing SDK call — streaming, tools, vision, function calling — works identically. The model parameter becomes a routing hint.

Streaming SSE with OpenAI delta format
Function calling and tool use pass-through
Cost metadata in x-modelspend-* headers
Model-to-tier hint table (50+ model patterns)

 # Environment variable only
OPENAI_BASE_URL=https://api.modelspend.best/proxy/v1
OPENAI_API_KEY=msp_live_... # Then use OpenAI SDK normally from openai import OpenAI
client = OpenAI() # reads env vars 

New · v3.1

Evaluation framework

Before you switch routing configurations, prove quality is maintained. Upload a dataset of representative prompts, run them against multiple models simultaneously, and score outputs with LLM-as-judge scoring.

Scores are tracked over time so you can see quality trends as providers update their models.

CSV or API dataset upload (up to 1,000 items)
Run against up to 6 models in parallel
LLM-as-judge scoring (0.0–1.0) with reasoning
Exact match mode for deterministic tasks
Per-model quality × cost × latency comparison
Link eval runs to prompt versions

Sample eval result

gpt-4o-mini

91 $0.0003

claude-haiku

88 $0.0004

gemini-flash

83 $0.0002

llama-4-scout

79 $0.0001

New · v3.1

customer-support · v1.4.0

1.4.0 production Today

1.3.0 archived 3 days ago

1.2.1 archived 1 week ago

1.2.0-draft draft Now editing

Prompt registry

System prompts are code. Treat them like it. Semantic versioning, diff views, a staging workflow, and rollback — the same controls you have on your application code.

Semantic versioning (major.minor.patch)
draft → staging → production promotion
Line-level diff between any two versions
One-click rollback to any previous version
Link to eval runs for quality validation
Token count tracking per version

New · v3.1

OpenTelemetry traces

Every execute call emits a distributed trace with child spans for each stage of the pipeline. Export to your existing observability stack via OTLP. Debug exactly why a specific request was expensive, slow, or blocked.

Root span per request + child spans per stage
Attributes: cost, tokens, tier, provider, model
OTLP HTTP export to any collector
Native integrations: Jaeger, Tempo, Datadog, New Relic, Honeycomb
30-day rolling retention in ModelSpend
In-dashboard trace viewer with waterfall

modelspend.execute 1247ms

modelspend.governance 3ms

modelspend.dlp.scan 8ms

modelspend.routing.decision 12ms

modelspend.budget.check 5ms

modelspend.bridge.execute 1219ms

Ready to reduce your AI bill?

Start for free — setup takes 4 minutes. No infrastructure changes.

Start free → View pricing

Everything you need to master AI costs.

OpenAI-compatible proxy

Evaluation framework

Prompt registry

OpenTelemetry traces

Ready to reduce your AI bill?

Everything you need to master
AI costs.