Skip to main content
All features

Everything you need to master
AI costs.

From a single-line proxy swap to enterprise governance and distributed tracing — ModelSpend grows with your team.

Core routing

OpenAI-compatible proxy

Point your existing OpenAI or Anthropic SDK at api.modelspend.best/proxy/v1. Every existing SDK call — streaming, tools, vision, function calling — works identically. The model parameter becomes a routing hint.

  • Streaming SSE with OpenAI delta format
  • Function calling and tool use pass-through
  • Cost metadata in x-modelspend-* headers
  • Model-to-tier hint table (50+ model patterns)
# Environment variable only OPENAI_BASE_URL=https://api.modelspend.best/proxy/v1 OPENAI_API_KEY=msp_live_... # Then use OpenAI SDK normally from openai import OpenAI client = OpenAI() # reads env vars
New · v3.1

Evaluation framework

Before you switch routing configurations, prove quality is maintained. Upload a dataset of representative prompts, run them against multiple models simultaneously, and score outputs with LLM-as-judge scoring.

Scores are tracked over time so you can see quality trends as providers update their models.

  • CSV or API dataset upload (up to 1,000 items)
  • Run against up to 6 models in parallel
  • LLM-as-judge scoring (0.0–1.0) with reasoning
  • Exact match mode for deterministic tasks
  • Per-model quality × cost × latency comparison
  • Link eval runs to prompt versions
Sample eval result
gpt-4o-mini
91 $0.0003
claude-haiku
88 $0.0004
gemini-flash
83 $0.0002
llama-4-scout
79 $0.0001
New · v3.1
customer-support · v1.4.0
1.4.0 production Today
1.3.0 archived 3 days ago
1.2.1 archived 1 week ago
1.2.0-draft draft Now editing

Prompt registry

System prompts are code. Treat them like it. Semantic versioning, diff views, a staging workflow, and rollback — the same controls you have on your application code.

  • Semantic versioning (major.minor.patch)
  • draft → staging → production promotion
  • Line-level diff between any two versions
  • One-click rollback to any previous version
  • Link to eval runs for quality validation
  • Token count tracking per version
New · v3.1

OpenTelemetry traces

Every execute call emits a distributed trace with child spans for each stage of the pipeline. Export to your existing observability stack via OTLP. Debug exactly why a specific request was expensive, slow, or blocked.

  • Root span per request + child spans per stage
  • Attributes: cost, tokens, tier, provider, model
  • OTLP HTTP export to any collector
  • Native integrations: Jaeger, Tempo, Datadog, New Relic, Honeycomb
  • 30-day rolling retention in ModelSpend
  • In-dashboard trace viewer with waterfall
modelspend.execute 1247ms
modelspend.governance 3ms
modelspend.dlp.scan 8ms
modelspend.routing.decision 12ms
modelspend.budget.check 5ms
modelspend.bridge.execute 1219ms

Ready to reduce your AI bill?

Start for free — setup takes 4 minutes. No infrastructure changes.

Founding Beta: Limited Access
Help shape the future of AI spend control.
ends 29 August 2026
Spots are limited.
Secure your early access.
Request Access →