Complexity-aware routing that catches what simple heuristics miss
Most AI proxies route by model name or keyword count. ModelSpend runs a deterministic 17-rule signal detector before every routing decision. It catches prompt classes that look simple on the surface but require deep, consistent reasoning: counterfactual physics, acrostic output constraints, multi-agent hidden-state tracking, recursive belief structures, and reverse-chronology dependencies.
When advanced reasoning signals fire, the router escalates to a higher-capability model tier — automatically, with a named rationale returned in the API response and emitted to your observability stack.
- ✦ 17 weighted detection rules across 9 complexity categories
- ✦ Additive scoring — threshold 3 prevents false positives from single terms
- ✦ Every decision returns an explainable rationale array (e.g. "counterfactual_physics", "acrostic_constraint")
- ✦ Simple prompts stay on cheap models — no blanket over-routing
- ✦ Benchmark-aware: quality scores feed routing when eval data exists
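The additive scoring described above can be sketched as follows. This is a minimal illustration, not ModelSpend's rule set: the rule names, regex patterns, and weights are hypothetical, and treating the threshold as "score >= 3" is an assumption. No single illustrative rule weighs 3 or more, so one matched term alone cannot trigger escalation.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str            # rationale label returned to the caller
    pattern: re.Pattern  # compiled detector for this signal
    weight: int          # contribution to the additive score


# Hypothetical rules and weights, for illustration only.
RULES = [
    Rule("counterfactual_framing", re.compile(r"\b(what if|suppose|imagine)\b", re.I), 2),
    Rule("physics_terms", re.compile(r"\b(gravity|friction|momentum)\b", re.I), 1),
    Rule("acrostic_constraint", re.compile(r"\bacrostic\b", re.I), 2),
    Rule("per_line_letter_constraint", re.compile(r"\bfirst letter of each\b", re.I), 1),
]

THRESHOLD = 3  # threshold named in the text; the ">=" comparison is an assumption


def score_prompt(prompt: str) -> dict:
    """Sum the weights of every rule that fires and compare against the threshold."""
    fired = [rule for rule in RULES if rule.pattern.search(prompt)]
    score = sum(rule.weight for rule in fired)
    escalate = score >= THRESHOLD
    return {
        "score": score,
        "escalate": escalate,
        # Only escalated prompts carry a rationale in this sketch.
        "rationale": [rule.name for rule in fired] if escalate else [],
    }
```

In this sketch, "Write an acrostic poem" scores 2 and stays on the cheap tier, while a prompt combining a counterfactual framing with a physics term reaches 3 and escalates with both rule names as its rationale.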
// Routing decision response includes rationale
{
"tier": "deep_reasoning",
"selected_model": "claude-3-5-sonnet-20241022",
"confidence": 0.94,
"risk_signals": {
"hasAdvancedReasoningSignals": true,
"advancedReasoningDetails": [
"counterfactual_physics",
"causal_consistency_constraint"
]
}
}
Provider circuit breakers with an auditable failover trail
Provider outages are a matter of when, not if. ModelSpend keeps a circuit breaker per provider in memory and persists its state to your database, with automatic CLOSED → OPEN → HALF_OPEN transitions. When a provider's circuit opens, the execution bridge selects the next-cheapest compatible provider from your routing profile without dropping the request.
Every failover is recorded in a per-tenant audit table with from/to provider, model, failure class, and timestamp. SSE events and Datadog log events fire on open and recovery.
- ✦ Per-provider circuit state persisted to v3_provider_circuit_breakers
- ✦ Automatic incident creation when circuit opens; auto-resolves on recovery
- ✦ Failover decision audit trail in v3_failover_decisions (tenant-scoped)
- ✦ Health history samples stored with failure classification
- ✦ GET /v1/sla/failover-audit and /v1/sla/provider-history for operator review
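The CLOSED → OPEN → HALF_OPEN cycle and cost-ordered failover can be sketched like this. It is a generic circuit-breaker illustration under stated assumptions: the failure threshold, cooldown, class names, and the `pick_provider` helper are hypothetical, and database persistence and audit writes are left out.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "CLOSED"
    OPEN = "OPEN"
    HALF_OPEN = "HALF_OPEN"


class CircuitBreaker:
    """Per-provider breaker; thresholds here are illustrative defaults."""

    def __init__(self, failure_threshold=5, cooldown_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.state = State.CLOSED
        self.failures = 0
        self.opened_at = None

    def allow_request(self) -> bool:
        # After the cooldown, an OPEN circuit lets a trial request through.
        if self.state is State.OPEN and self.clock() - self.opened_at >= self.cooldown_s:
            self.state = State.HALF_OPEN
        return self.state is not State.OPEN

    def record_success(self) -> None:
        self.failures = 0
        self.state = State.CLOSED

    def record_failure(self) -> None:
        self.failures += 1
        # A failed trial request, or too many failures, (re)opens the circuit.
        if self.state is State.HALF_OPEN or self.failures >= self.failure_threshold:
            self.state = State.OPEN
            self.opened_at = self.clock()


def pick_provider(candidates, breakers):
    """Return the cheapest provider whose breaker allows traffic, else None.

    candidates: list of (provider_name, cost) pairs assumed to be compatible.
    """
    for name, _cost in sorted(candidates, key=lambda c: c[1]):
        if breakers[name].allow_request():
            return name
    return None
```

Injecting the clock keeps the state machine testable without real waiting; a production breaker would also need locking if shared across threads.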
// Failover audit record — stored per request
{
"from_provider": "openai",
"to_provider": "groq",
"model_id": "gpt-4o",
"failure_class": "server_error",
"reason": "circuit_open",
"decided_at": "2026-06-02T14:23:11Z"
}
30/60/90-day spend projections with a scenario planner
ModelSpend does not just report what you spent; it projects where you are going. Forecasts derive directly from your usage ledger. Confidence bands widen as variance in your data increases, so the output is honest about uncertainty rather than falsely precise.
The scenario planner lets you model specific changes: shift 20% of deep-tier traffic to a cheaper profile, increase cache hit rate by 15%, adopt a new model mix. Each scenario returns an adjusted projection and a mechanism-level savings estimate.
- ✦ 30/60/90-day cost projection from actual usage data
- ✦ Confidence bands based on observed variance
- ✦ Scenario planner: model mix, cache hit rate, routing-shift, usage growth inputs
- ✦ Savings attribution by mechanism: routing recommendations and compression (caching attribution is planned)
- ✦ Dashboard widgets + CSV/JSON export
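One way to picture a projection whose band widens with variance is the sketch below. It assumes a simple model that is not confirmed as ModelSpend's method: the expected cost is the mean daily cost times the horizon, and the band is a normal-approximation interval that treats days as independent. The function name and output keys are hypothetical.

```python
import math
import statistics


def project_spend(daily_costs, horizon_days, z=1.96):
    """Project total spend over horizon_days from a non-empty list of daily costs.

    Assumes independent days, so the band half-width grows with the sample
    standard deviation and with the square root of the horizon.
    """
    mean = statistics.fmean(daily_costs)
    stdev = statistics.stdev(daily_costs) if len(daily_costs) > 1 else 0.0
    expected = mean * horizon_days
    half_width = z * stdev * math.sqrt(horizon_days)
    return {
        "expected_usd": expected,
        "low_usd": max(0.0, expected - half_width),  # spend cannot go negative
        "high_usd": expected + half_width,
    }
```

With perfectly steady usage the band collapses to a single value; the noisier the ledger, the wider the band at every horizon.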
// Scenario planner request
POST /v1/forecast/scenario
{
"horizon_days": 90,
"routing_shift_deep_pct": 20,
"cache_hit_rate_delta": 0.15
}
// Returns:
{
"base_daily_cost_usd": 142.80,
"adjusted_daily_cost_usd": 109.40,
"savings_vs_base_usd": 3003.60,
"routing_savings_ratio_used": 0.234
}
Enterprise observability that speaks your stack's language
ModelSpend emits structured telemetry from every routing decision, provider health change, budget threshold breach, and audit anomaly. The Datadog adapter posts structured log events to the Datadog log intake API, env-gated so it is a no-op when DD_API_KEY is absent. A separate OTLP-compatible trace exporter stores spans in otel_traces (30-day rolling retention) and forwards to any OTLP-capable collector — Jaeger, Grafana, Datadog APM, or your own endpoint.
The webhook delivery system supports any HTTP sink, with per-tenant event subscriptions, retry/backoff, and a dead-letter store. SIEM export covers CEF for Splunk/QRadar and JSONL for Elastic/OpenSearch.
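Retry with backoff and a dead-letter store can be sketched as follows. The function name, the `post` callable's contract, the attempt count, and the doubling delay schedule are all assumptions for illustration, not ModelSpend's delivery parameters.

```python
import time


def deliver_with_retry(post, event, max_attempts=4, base_delay_s=0.5,
                       sleep=time.sleep, dead_letters=None):
    """Try to deliver an event; park it in dead_letters after the last failure.

    post(event) is a hypothetical callable returning True on a 2xx response.
    """
    for attempt in range(max_attempts):
        try:
            if post(event):
                return True
        except Exception:
            pass  # treat transport errors like a failed delivery
        if attempt < max_attempts - 1:
            # Exponential backoff: 0.5s, 1s, 2s, ... between attempts.
            sleep(base_delay_s * 2 ** attempt)
    if dead_letters is not None:
        dead_letters.append(event)
    return False
```

Passing `sleep` as a parameter lets tests record the delays instead of waiting; adding random jitter to each delay is a common refinement.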
- ✦ Datadog: structured log events for routing decisions, provider health, budget events, audit anomalies
- ✦ OTel traces: span-per-pipeline-stage, OTLP JSON export to any collector
- ✦ Webhook delivery: any HTTP sink, per-event-type subscriptions, retry backoff
- ✦ SIEM: CEF (Splunk/QRadar) and JSONL (Elastic/OpenSearch) with configurable retention
- ✦ All emissions are fire-and-forget — telemetry never blocks request latency
# Env-gated Datadog integration: no code changes needed
DD_API_KEY=your-key
DD_SITE=datadoghq.com  # or datadoghq.eu, etc.
# DD_METRICS_ENABLED=false  # explicit kill-switch
# OTLP trace export
OTLP_ENDPOINT=https://your-collector.internal/v1/traces
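An env-gated, fire-and-forget emitter might look like the sketch below. The `TelemetryEmitter` class, its queue size, and the injected `send` callable are hypothetical; only the DD_API_KEY and DD_METRICS_ENABLED variable names come from the configuration above, and the real adapter's transport is not shown.

```python
import os
import queue
import threading


class TelemetryEmitter:
    """No-op without DD_API_KEY; otherwise hands events to a background thread."""

    def __init__(self, send, env=None, max_pending=1000):
        env = os.environ if env is None else env
        kill_switch = env.get("DD_METRICS_ENABLED", "true").lower() == "false"
        self.enabled = bool(env.get("DD_API_KEY")) and not kill_switch
        self._send = send  # hypothetical callable that ships one event
        self._queue = queue.Queue(maxsize=max_pending)
        if self.enabled:
            threading.Thread(target=self._drain, daemon=True).start()

    def emit(self, event) -> bool:
        """Enqueue without blocking; returns False if disabled or the queue is full."""
        if not self.enabled:
            return False
        try:
            self._queue.put_nowait(event)
            return True
        except queue.Full:
            return False  # drop the event rather than stall the request path

    def _drain(self):
        while True:
            event = self._queue.get()
            try:
                self._send(event)
            except Exception:
                pass  # telemetry failures never propagate to callers
            finally:
                self._queue.task_done()

    def flush(self):
        """Block until queued events are processed (useful in tests and shutdown)."""
        self._queue.join()
```

The bounded queue is the design choice that keeps telemetry off the request path: under backpressure the emitter drops events instead of blocking.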
Enterprise identity with a full lifecycle audit trail
ModelSpend's SCIM 2.0 implementation covers the full user lifecycle — provision, update, deactivate, reactivate, and group membership — and writes every operation to a per-tenant audit log. Role assignment is governed exclusively by admin-configured SSO role mappings: SCIM PATCH operations cannot escalate privileges.
The SSO diagnostics endpoint returns a structured health check: enabled connections, role mapping counts, domain-hint gaps, and never-used connections — all in one API call, with no secrets in the response.
- ✦ SCIM 2.0 Users + Groups with full lifecycle audit log (scim_deprovision_log)
- ✦ SAML/OIDC role mappings: exact, prefix, suffix, contains match types
- ✦ Role escalation prevention: PATCH body role fields silently ignored
- ✦ GET /v1/sso/diagnostics: health check with structured warnings
- ✦ GET /v1/sso/evidence: exportable SSO + SCIM config for enterprise review
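The four match types and the PATCH role-field guard can be illustrated with the sketch below. The mapping dictionary shape, the first-match-wins order, and the `ROLE_FIELDS` names are assumptions; the `path`/`value` operation layout follows the SCIM 2.0 PatchOp message format.

```python
ROLE_FIELDS = {"role", "roles"}  # hypothetical attribute names treated as role-bearing


def matches(match_type: str, pattern: str, group: str) -> bool:
    """Apply one of the four mapping match types to an IdP group name."""
    if match_type == "exact":
        return group == pattern
    if match_type == "prefix":
        return group.startswith(pattern)
    if match_type == "suffix":
        return group.endswith(pattern)
    if match_type == "contains":
        return pattern in group
    raise ValueError(f"unknown match type: {match_type}")


def resolve_role(mappings, groups, default=None):
    """Return the role of the first mapping that matches any group (order is an assumption)."""
    for mapping in mappings:
        if any(matches(mapping["match_type"], mapping["pattern"], g) for g in groups):
            return mapping["role"]
    return default


def strip_role_changes(operations):
    """Drop or trim SCIM PATCH operations so they cannot touch role fields."""
    cleaned = []
    for op in operations:
        path = (op.get("path") or "").lower()
        # Path-based operation aimed at a role attribute: ignore it entirely.
        if path.split(".")[0].split("[")[0] in ROLE_FIELDS:
            continue
        value = op.get("value")
        # Path-less operation: remove role keys from the attribute dictionary.
        if not path and isinstance(value, dict):
            value = {k: v for k, v in value.items() if k.lower() not in ROLE_FIELDS}
            if not value:
                continue
            op = {**op, "value": value}
        cleaned.append(op)
    return cleaned
```

Because roles are resolved only through `resolve_role` from admin-configured mappings, a provisioning client that sends role attributes in a PATCH body sees them ignored while the rest of the update still applies.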
// SSO diagnostics — structured warnings, no secrets
GET /v1/sso/diagnostics
{
"healthy": false,
"warnings": [
"no_role_mappings:c-1: Enabled connection \"okta-prod\" has no role mappings"
],
"scim_token_count": 1,
"connections": [{ "connection_id": "c-1", "is_enabled": true, ... }]
}
Claims provenance
Every capability listed above is implemented and testable in the source repository. Numeric figures (thresholds, counts) reference module constants or migration schemas, not aspirational targets. Savings estimates are user-specific — use the ROI calculator with your own usage figures, or ask for a live simulator against your actual workload.