SHIPPED CODE, NOT MARKETING COPY

The technical moat behind ModelSpend

Every capability listed on this page links to a source file. No unsupported percentage claims. No fabricated benchmarks. This is what ships in the product you sign up for.

Deterministic routing signals · Circuit-breaker failover trail · Scenario planner · OTLP + Datadog + SIEM · SCIM lifecycle audit
Routing

Complexity-aware routing that catches what simple heuristics miss

Most AI proxies route by model name or keyword count. ModelSpend runs a deterministic 17-rule signal detector before every route decision. It catches prompt classes that look simple on the surface but require deep, consistent reasoning: counterfactual physics, acrostic output constraints, multi-agent hidden-state tracking, recursive belief structures, and reverse-chronology dependencies.

When advanced reasoning signals fire, the router escalates to a higher-capability model tier — automatically, with a named rationale returned in the API response and emitted to your observability stack.

  • 17 weighted detection rules across 9 complexity categories
  • Additive scoring: a score threshold of 3 keeps a single matched term from triggering escalation
  • Every decision returns an explainable rationale array (e.g. "counterfactual_physics", "acrostic_constraint")
  • Simple prompts stay on cheap models — no blanket over-routing
  • Benchmark-aware: quality scores feed routing when eval data exists
// Routing decision response includes rationale
{
  "tier": "deep_reasoning",
  "selected_model": "claude-3-5-sonnet-20241022",
  "confidence": 0.94,
  "risk_signals": {
    "hasAdvancedReasoningSignals": true,
    "advancedReasoningDetails": [
      "counterfactual_physics",
      "causal_consistency_constraint"
    ]
  }
}
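
The additive scoring described above can be sketched roughly as follows. This is a minimal Python illustration, not ModelSpend's detector: the two rules, their regex patterns, their weights, and the `>=` comparison are assumptions made for the example. Only the threshold of 3 and the rationale names come from this page.

```python
import re
from dataclasses import dataclass

ESCALATION_THRESHOLD = 3  # from the page; the >= comparison is an assumption


@dataclass(frozen=True)
class Rule:
    name: str     # rationale label returned to the caller
    pattern: str  # hypothetical regex, for illustration only
    weight: int   # hypothetical weight


# Two made-up rules standing in for the 17 real ones.
RULES = [
    Rule("counterfactual_physics", r"\bif gravity (were|was)\b", 2),
    Rule("causal_consistency_constraint", r"\bremain(s)? consistent\b", 1),
]


def score_prompt(prompt: str, rules=RULES):
    """Sum the weights of all matching rules and collect their names."""
    matched = [r for r in rules if re.search(r.pattern, prompt, re.IGNORECASE)]
    total = sum(r.weight for r in matched)
    return {
        "score": total,
        "hasAdvancedReasoningSignals": total >= ESCALATION_THRESHOLD,
        "advancedReasoningDetails": [r.name for r in matched],
    }
```

With these illustrative weights, a prompt that matches only one rule scores below the threshold and stays on the cheaper tier, while a prompt that matches both reaches 3 and escalates with both rationale names attached.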
Resilience

Provider circuit breakers with an auditable failover trail

Provider outages are a when, not an if. ModelSpend keeps a circuit breaker per provider in memory and persists its state to your database, with automatic CLOSED → OPEN → HALF_OPEN transitions. When a provider's circuit opens, the execution bridge selects the next-cheapest compatible provider from your routing profile without dropping the request.

Every failover is recorded in a per-tenant audit table with from/to provider, model, failure class, and timestamp. SSE events and Datadog log events fire on open and recovery.

  • Per-provider circuit state persisted to v3_provider_circuit_breakers
  • Automatic incident creation when circuit opens; auto-resolves on recovery
  • Failover decision audit trail in v3_failover_decisions (tenant-scoped)
  • Health history samples stored with failure classification
  • GET /v1/sla/failover-audit and /v1/sla/provider-history for operator review
// Failover audit record — stored per request
{
  "from_provider": "openai",
  "to_provider": "groq",
  "model_id": "gpt-4o",
  "failure_class": "server_error",
  "reason": "circuit_open",
  "decided_at": "2026-06-02T14:23:11Z"
}
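
The CLOSED → OPEN → HALF_OPEN cycle and the cheapest-available fallback can be sketched as below. This is a generic circuit-breaker pattern in Python, not ModelSpend's execution bridge: the failure threshold, the cooldown, the cost table, and the function names are hypothetical, and model compatibility checks are left out.

```python
from dataclasses import dataclass


@dataclass
class CircuitBreaker:
    failure_threshold: int = 3      # hypothetical: failures before opening
    cooldown_seconds: float = 30.0  # hypothetical: wait before retrying
    state: str = "CLOSED"
    failures: int = 0
    opened_at: float = 0.0

    def allow_request(self, now: float) -> bool:
        # After the cooldown, an OPEN circuit moves to HALF_OPEN
        # and admits trial requests again.
        if self.state == "OPEN" and now - self.opened_at >= self.cooldown_seconds:
            self.state = "HALF_OPEN"
        return self.state != "OPEN"

    def record_success(self) -> None:
        self.state, self.failures = "CLOSED", 0

    def record_failure(self, now: float) -> None:
        self.failures += 1
        # A failed trial in HALF_OPEN reopens the circuit immediately.
        if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
            self.state, self.opened_at = "OPEN", now


def pick_provider(costs: dict, breakers: dict, now: float):
    """Return the cheapest provider whose circuit currently admits requests."""
    for name in sorted(costs, key=costs.get):
        if breakers[name].allow_request(now):
            return name
    return None
```

Passing `now` explicitly (for example from `time.monotonic()`) keeps the state machine deterministic and easy to test.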
Forecasting

30/60/90-day spend projections with a scenario planner

ModelSpend does not just report what you spent; it projects where you are going. Forecasts derive directly from your usage ledger. Confidence bands widen as variance in your data increases, so the output is honest about uncertainty rather than falsely precise.

The scenario planner lets you model specific changes: shift 20% of deep-tier traffic to a cheaper profile, increase cache hit rate by 15%, adopt a new model mix. Each scenario returns an adjusted projection and a mechanism-level savings estimate.

  • 30/60/90-day cost projection from actual usage data
  • Confidence bands based on observed variance
  • Scenario planner: model mix, cache hit rate, routing-shift, usage growth inputs
  • Savings attribution by mechanism: routing recommendations and compression today, with caching attribution planned
  • Dashboard widgets + CSV/JSON export
// Scenario planner request
POST /v1/forecast/scenario
{
  "horizon_days": 90,
  "routing_shift_deep_pct": 20,
  "cache_hit_rate_delta": 0.15
}

// Returns:
{
  "base_daily_cost_usd": 142.80,
  "adjusted_daily_cost_usd": 109.40,
  "savings_vs_base_usd": 3006.00,
  "routing_savings_ratio_used": 0.234
}
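
One way to picture variance-driven confidence bands and a scenario adjustment is the sketch below. It is a textbook simplification in Python, not ModelSpend's forecasting model: it assumes independent daily costs (so the spread of the total grows with the square root of the horizon), a normal-approximation `z` value, and a single hypothetical combined savings ratio. The function names are invented for the example.

```python
import math
import statistics


def project_spend(daily_costs, horizon_days, z=1.96):
    """Project total spend with a band that widens as daily variance grows.

    Assumes independent daily costs, so the standard deviation of the
    total scales with sqrt(horizon_days).
    """
    mean = statistics.fmean(daily_costs)
    stdev = statistics.stdev(daily_costs)  # sample standard deviation
    expected = mean * horizon_days
    half_width = z * stdev * math.sqrt(horizon_days)
    return {
        "expected_usd": round(expected, 2),
        "low_usd": round(max(0.0, expected - half_width), 2),
        "high_usd": round(expected + half_width, 2),
    }


def apply_scenario(base_daily_cost, savings_ratio, horizon_days):
    """Scale the base daily cost by a hypothetical combined savings ratio."""
    adjusted = base_daily_cost * (1 - savings_ratio)
    return {
        "adjusted_daily_cost_usd": round(adjusted, 2),
        "savings_vs_base_usd": round((base_daily_cost - adjusted) * horizon_days, 2),
    }
```

Two ledgers with the same average daily cost produce the same expected total, but the noisier ledger yields a wider band, which is the behavior the confidence bands above describe.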
Observability

Enterprise observability that speaks your stack's language

ModelSpend emits structured telemetry from every routing decision, provider health change, budget threshold breach, and audit anomaly. The Datadog adapter posts structured log events to the Datadog log intake API, env-gated so it is a no-op when DD_API_KEY is absent. A separate OTLP-compatible trace exporter stores spans in otel_traces (30-day rolling retention) and forwards to any OTLP-capable collector — Jaeger, Grafana, Datadog APM, or your own endpoint.

The webhook delivery system supports any HTTP sink, with per-tenant event subscriptions, retry/backoff, and a dead-letter store. SIEM export covers CEF for Splunk/QRadar and JSONL for Elastic/OpenSearch.

  • Datadog: structured log events for routing decisions, provider health, budget events, audit anomalies
  • OTel traces: span-per-pipeline-stage, OTLP JSON export to any collector
  • Webhook delivery: any HTTP sink, per-event-type subscriptions, retry backoff
  • SIEM: CEF (Splunk/QRadar) and JSONL (Elastic/OpenSearch) with configurable retention
  • All emissions are fire-and-forget — telemetry never blocks request latency
# Env-gated Datadog integration — no code changes needed
DD_API_KEY=your-key
DD_SITE=datadoghq.com        # or datadoghq.eu, etc.
# DD_METRICS_ENABLED=false   # explicit kill-switch

# OTLP trace export
OTLP_ENDPOINT=https://your-collector.internal/v1/traces
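
The env-gated, fire-and-forget behavior can be approximated with a background queue, as in this Python sketch. It is an illustration of the pattern only: the `TelemetryEmitter` class, the injected `send` callable, the queue size, and the way `DD_METRICS_ENABLED` is parsed are all assumptions, and no real Datadog client call is shown.

```python
import os
import queue
import threading


class TelemetryEmitter:
    """Queue events for a background sender so callers never wait on I/O."""

    def __init__(self, send, env=os.environ, max_queue=1000):
        # Enabled only when a key is present and the kill-switch is not "false".
        # How the real adapter parses these variables is an assumption here.
        has_key = bool(env.get("DD_API_KEY"))
        switched_on = env.get("DD_METRICS_ENABLED", "true").lower() != "false"
        self.enabled = has_key and switched_on
        self._send = send  # hypothetical transport callable
        self._queue = queue.Queue(maxsize=max_queue)
        if self.enabled:
            threading.Thread(target=self._drain, daemon=True).start()

    def emit(self, event: dict) -> bool:
        """Return immediately; drop the event if disabled or the queue is full."""
        if not self.enabled:
            return False
        try:
            self._queue.put_nowait(event)
            return True
        except queue.Full:
            return False

    def _drain(self):
        while True:
            event = self._queue.get()
            try:
                self._send(event)
            except Exception:
                pass  # telemetry failures must not surface to callers
            finally:
                self._queue.task_done()

    def flush(self):
        """Block until queued events are sent (for tests and shutdown only)."""
        self._queue.join()
```

Because `emit` only enqueues, a slow or failing sink costs the request path nothing beyond a non-blocking queue insert, at the price of dropping events when the queue is full.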
Identity

Enterprise identity with a full lifecycle audit trail

ModelSpend's SCIM 2.0 implementation covers the full user lifecycle — provision, update, deactivate, reactivate, and group membership — and writes every operation to a per-tenant audit log. Role assignment is governed exclusively by admin-configured SSO role mappings: SCIM PATCH operations cannot escalate privileges.

The SSO diagnostics endpoint returns a structured health check: enabled connections, role mapping counts, domain-hint gaps, and never-used connections — all in one API call, with no secrets in the response.

  • SCIM 2.0 Users + Groups with full lifecycle audit log (scim_deprovision_log)
  • SAML/OIDC role mappings: exact, prefix, suffix, contains match types
  • Role escalation prevention: PATCH body role fields silently ignored
  • GET /v1/sso/diagnostics: health check with structured warnings
  • GET /v1/sso/evidence: exportable SSO + SCIM config for enterprise review
// SSO diagnostics — structured warnings, no secrets
GET /v1/sso/diagnostics

{
  "healthy": false,
  "warnings": [
    "no_role_mappings:c-1: Enabled connection \"okta-prod\" has no role mappings"
  ],
  "scim_token_count": 1,
  "connections": [{ "connection_id": "c-1", "is_enabled": true, ... }]
}
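
The four role-mapping match types and the rule that SCIM PATCH bodies cannot set roles might look like the following in outline. This Python sketch is hypothetical throughout: the `RoleMapping` shape, the role names, and the attribute names treated as role fields (`role`, `roles`) are assumptions, not ModelSpend's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleMapping:
    match_type: str  # "exact", "prefix", "suffix", or "contains"
    pattern: str     # value compared against an IdP group name
    role: str        # hypothetical role name


MATCHERS = {
    "exact": lambda group, p: group == p,
    "prefix": lambda group, p: group.startswith(p),
    "suffix": lambda group, p: group.endswith(p),
    "contains": lambda group, p: p in group,
}

ROLE_FIELDS = {"role", "roles"}  # assumed names of role-bearing attributes


def resolve_roles(idp_groups, mappings):
    """Return the set of roles granted by admin-configured mappings."""
    return {
        m.role
        for m in mappings
        for g in idp_groups
        if MATCHERS[m.match_type](g, m.pattern)
    }


def sanitize_scim_patch(operations):
    """Strip role fields from PATCH operations; keep everything else."""
    cleaned = []
    for op in operations:
        path = op.get("path")
        if path is not None:
            if path.split(".")[0].split("[")[0].lower() in ROLE_FIELDS:
                continue  # silently ignore role-targeting operations
            cleaned.append(op)
        elif isinstance(op.get("value"), dict):
            value = {k: v for k, v in op["value"].items()
                     if k.lower() not in ROLE_FIELDS}
            cleaned.append({**op, "value": value})
        else:
            cleaned.append(op)
    return cleaned
```

Keeping role resolution in one function driven only by admin-configured mappings, and filtering role fields before a PATCH is applied, means provisioning traffic has no path to change a user's privileges.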

Claims provenance

Every capability listed above is implemented and testable in the source repository. Numeric figures (thresholds, counts) reference module constants or migration schemas, not aspirational targets. Savings estimates are user-specific — use the ROI calculator with your own usage figures, or ask for a live simulator against your actual workload.


Built for teams that need proof, not promises.

Every enterprise evaluation starts with a technical review. We'll walk through the codebase, answer hard questions, and run a POC against your actual workload.
