Routing Strategies

Waterfall isn't just a model API — it's an intelligent router. Instead of picking a model yourself, you pick a strategy, and Waterfall handles the rest: fallbacks, rate limits, cost optimisation, and capability matching.

15 strategies live, 0 coming soon

How to select a strategy

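// Pass routing_strategy in extra_body.
// `openai` is an OpenAI SDK client you have already configured.
const response = await openai.chat.completions.create({
  model: "waterfall",
  messages: [{ role: "user", content: "Hello!" }], // placeholder; use your own messages
  extra_body: { routing_strategy: "free_smart" },
});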
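
Because the strategy is only a string, a typo would not surface until request time. As a minimal sketch, the helper below checks a name against the 15 identifiers listed on this page before building the request parameters. `STRATEGIES` and `buildRequest` are illustrative names, not part of any SDK, and how Waterfall itself responds to an unknown strategy is not covered here.

```javascript
// The 15 strategy identifiers listed on this page.
const STRATEGIES = new Set([
  "free_smart", "cheap_smart", "privacy_smart", "tool_calling",
  "orchestrator", "reasoning", "speed_first", "context_max",
  "smart_video", "smart_legal", "smart_health", "smart_image",
  "smart_transcribe", "smart_voice", "smart_multilingual_voice",
]);

// Hypothetical helper: returns parameters shaped like the example above,
// or throws early if the strategy name is not one of the listed identifiers.
function buildRequest(strategy, messages) {
  if (!STRATEGIES.has(strategy)) {
    throw new Error(`Unknown routing strategy: ${strategy}`);
  }
  return {
    model: "waterfall",
    messages,
    extra_body: { routing_strategy: strategy },
  };
}

const params = buildRequest("reasoning", [
  { role: "user", content: "Why is my binary search off by one?" },
]);
console.log(params.extra_body.routing_strategy); // prints "reasoning"
```

Switching strategies then means changing one argument; the rest of the request stays the same.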

Live Strategies

Free Smart
Enterprise-quality output, zero spend.
free_smart (Most Popular)

Routes exclusively through our curated pool of free-tier models — OpenRouter :free endpoints and NVIDIA NIM community models. The pool is re-validated hourly so you never accidentally hit a model that switched to paid.

Cascade

  • DeepSeek R1 (free)
  • Llama 4 Maverick (free)
  • Gemini 2.5 Pro (free)
  • NVIDIA Kimi K2 (free)
  • 70+ more free models

Best for

  • Prototyping and learning
  • High-volume agent loops
  • Hermes sub-tasks
  • Any zero-budget use case
Cheap Smart
Premium quality at sub-cent prices.
cheap_smart (New)

Cascades through paid models priced under $0.50 per 1M tokens, such as DeepSeek V3, Gemini Flash, and GPT-4o Mini. It offers a slightly higher quality ceiling than free_smart for a small per-token cost.

Cascade

  • DeepSeek V3 ($0.14/1M)
  • Gemini 2.0 Flash ($0.15/1M)
  • GPT-4o Mini ($0.15/1M)
  • Llama 4 Scout via Groq

Best for

  • Production apps with a budget
  • When free_smart is rate-limited
  • High-quality tool calling at low cost
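
One way to act on "when free_smart is rate-limited" is to escalate between strategies on the client. The sketch below tries each strategy in order and moves on only when a request is rate-limited. It assumes rate-limit errors carry `status === 429`, which is an assumption about how the error surfaces rather than documented Waterfall behaviour; `withStrategyFallback` and `fakeSend` are hypothetical names.

```javascript
// Try each strategy in order, moving on only when the request is rate-limited.
// `send` is a caller-supplied async function: (strategy) => response.
// Assumption: rate-limit errors carry `status === 429`.
async function withStrategyFallback(send, strategies) {
  let lastError;
  for (const strategy of strategies) {
    try {
      return { strategy, response: await send(strategy) };
    } catch (err) {
      if (err && err.status === 429) {
        lastError = err; // rate-limited: try the next strategy
        continue;
      }
      throw err; // other failures are not fixed by switching strategy
    }
  }
  throw lastError ?? new Error("No strategies were provided");
}

// Stand-in for a real API call: free_smart is "rate-limited", cheap_smart succeeds.
async function fakeSend(strategy) {
  if (strategy === "free_smart") {
    throw Object.assign(new Error("Too Many Requests"), { status: 429 });
  }
  return `answered via ${strategy}`;
}

withStrategyFallback(fakeSend, ["free_smart", "cheap_smart"])
  .then((result) => console.log(result.strategy)); // prints "cheap_smart"
```

In real use, `send` would wrap the `chat.completions.create` call and pass the given strategy as `routing_strategy`.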
Privacy Smart
ZDR-first routing for privacy-sensitive work.
privacy_smart

Routes to Zero Data Retention (ZDR) OpenRouter models and direct-provider routes where available. ZDR means prompt and response content should not be stored after processing. It is not the same as HIPAA compliance, a BAA, or a full legal-data compliance program.

Cascade

  • NVIDIA NIM (direct — no intermediary)
  • OpenRouter ZDR models (348+ providers)
  • Claude Sonnet 4 (ZDR)
  • GPT-5 (ZDR)
  • Gemini 3.1 Pro (ZDR)

Best for

  • Privacy-sensitive app traffic
  • Internal drafts and notes
  • Internal enterprise tools
  • Avoiding normal provider logging where possible
Tool Calling
Reliable structured output for agentic workflows.
tool_calling

Prioritises models with consistently correct JSON/tool-call outputs. Falls back through a validated chain so your agents never get malformed function calls.

Cascade

  • Claude Sonnet 4
  • GPT-4.1
  • Llama 4 Maverick (free)
  • Elephant Alpha (free)
  • NVIDIA GLM-4.7 (free)

Best for

  • Claude Code / Hermes backends
  • Multi-step agent pipelines
  • Structured data extraction
  • API orchestration
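
Even with a validated chain, agent code usually still parses tool-call arguments defensively. In the OpenAI chat-completions format, each entry in `message.tool_calls` carries `function.arguments` as a JSON string, so a small guard like the sketch below can turn a malformed call into a recoverable error instead of a crash. `parseToolCalls` is a hypothetical helper and the sample message is made up for illustration.

```javascript
// Parse tool calls from an OpenAI-style assistant message.
// Returns one entry per call, with either parsed `args` or an `error` string.
function parseToolCalls(message) {
  return (message.tool_calls ?? []).map((call) => {
    const name = call.function?.name;
    try {
      const args = JSON.parse(call.function?.arguments);
      if (args === null || typeof args !== "object" || Array.isArray(args)) {
        return { id: call.id, name, error: "arguments must be a JSON object" };
      }
      return { id: call.id, name, args };
    } catch (err) {
      return { id: call.id, name, error: `invalid arguments: ${err.message}` };
    }
  });
}

// Illustrative message: the second call has truncated JSON arguments.
const sampleMessage = {
  role: "assistant",
  tool_calls: [
    { id: "call_1", type: "function",
      function: { name: "get_weather", arguments: '{"city":"Oslo"}' } },
    { id: "call_2", type: "function",
      function: { name: "get_weather", arguments: '{"city": ' } },
  ],
};

const parsed = parseToolCalls(sampleMessage);
console.log(parsed[0].args.city); // prints "Oslo"
console.log("error" in parsed[1]); // prints true
```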
Orchestrator
Planner-grade judgment for complex multi-step tasks.
orchestrator

Targets the highest-capability reasoning models with large context windows. Designed as a fallback for Hermes and similar planner agents when their primary model is rate-limited.

Cascade

  • Qwen 3.6 Plus (free)
  • Kimi K2.5
  • Gemini 2.5 Pro
  • NVIDIA DeepSeek V3.2 (free)

Best for

  • Planner fallback when primary is down
  • Code review and synthesis
  • High-stakes agent judgment
  • Long-horizon reasoning
Reasoning
Chain-of-thought depth for hard problems.
reasoning

Routes to models with explicit extended thinking, such as DeepSeek R1, QwQ, and o3-mini. These models "think out loud" before answering, which makes them slower but considerably more accurate on math, science, and debugging.

Cascade

  • DeepSeek R1 0528 (free)
  • QwQ 32B (free)
  • NVIDIA Kimi K2 Thinking (free)
  • o1 / o3-mini

Best for

  • Complex debugging
  • Math and science problems
  • Legal and medical analysis
  • Code generation from scratch
Speed First
Sub-200ms first token for real-time UIs.
speed_first

Optimises for time-to-first-token above all else. Routes to Groq and Cerebras inference clusters, which run models at 150–500 tok/s.

Cascade

  • Groq Llama 4 Scout (~500 tok/s)
  • Groq Kimi K2 (~400 tok/s)
  • Cerebras Llama 3.1 (~150 tok/s)
  • Gemini Flash fallback

Best for

  • Streaming chat interfaces
  • Autocomplete and copilots
  • Real-time voice pipelines
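
To check time-to-first-token in your own application, you can time a streaming response. The sketch below works on any async iterable of OpenAI-style streaming chunks, where text arrives in `choices[0].delta.content`. `measureFirstToken` and `fakeStream` are hypothetical names, and the fake stream stands in for a real request so the example runs offline.

```javascript
// Measure time-to-first-token (TTFT) for a streaming response.
// `startStream` is a caller-supplied function that returns (or resolves to)
// an async iterable of OpenAI-style chunks. The timer starts before the
// stream is opened, so request latency is included.
async function measureFirstToken(startStream) {
  const start = performance.now();
  const stream = await startStream();
  let firstTokenMs = null;
  let text = "";
  for await (const chunk of stream) {
    const piece = chunk.choices?.[0]?.delta?.content ?? "";
    if (piece && firstTokenMs === null) {
      firstTokenMs = performance.now() - start; // first non-empty content
    }
    text += piece;
  }
  return { firstTokenMs, text };
}

// Stand-in stream so the sketch runs without a network call.
async function* fakeStream() {
  yield { choices: [{ delta: { role: "assistant" } }] }; // no content yet
  yield { choices: [{ delta: { content: "Hel" } }] };
  yield { choices: [{ delta: { content: "lo" } }] };
}

measureFirstToken(() => fakeStream()).then(({ firstTokenMs, text }) => {
  console.log(text, `first token after ${firstTokenMs.toFixed(1)} ms`);
});
```

With a real client, `startStream` would call `chat.completions.create` with `stream: true` and `routing_strategy: "speed_first"`.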
Context Max
Up to 1M tokens — entire codebases in one prompt.
context_max

Routes exclusively to models with the largest context windows. Perfect for RAG pipelines, full-document ingestion, and long-running agent conversations.

Cascade

  • Gemini 2.5 Pro (1M ctx)
  • Llama 4 Maverick (1M ctx, free)
  • Claude 3.5 (200K ctx)
  • Kimi K2 (128K ctx)

Best for

  • Full codebase analysis
  • Document Q&A
  • Long agent conversations
  • RAG with large corpora
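
Before sending a very large prompt, it can help to estimate which models in the cascade could hold it at all. The sketch below reads the listed windows as round decimal token counts and uses a rough four-characters-per-token estimate. Both are assumptions, and the helper names are hypothetical; a real tokenizer will give different counts.

```javascript
// Context windows as listed in the cascade above, read as round decimal
// token counts (an assumption; exact limits may differ).
const CONTEXT_WINDOWS = [
  { model: "Gemini 2.5 Pro", tokens: 1_000_000 },
  { model: "Llama 4 Maverick", tokens: 1_000_000 },
  { model: "Claude 3.5", tokens: 200_000 },
  { model: "Kimi K2", tokens: 128_000 },
];

// Very rough estimate: about 4 characters per token for English text.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Models whose window holds the prompt plus room reserved for the reply.
function modelsThatFit(promptTokens, reservedForReply = 4_000) {
  return CONTEXT_WINDOWS
    .filter((m) => promptTokens + reservedForReply <= m.tokens)
    .map((m) => m.model);
}

const estimated = estimateTokens("x".repeat(600_000)); // 150,000 tokens
console.log(modelsThatFit(estimated));
// prints [ 'Gemini 2.5 Pro', 'Llama 4 Maverick', 'Claude 3.5' ]
```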
Smart Video
Multimodal intelligence for video + images.
smart_video

Routes to vision-capable models cheapest-first. NVIDIA-hosted Gemma 3n and Llama 4 handle images for free; for actual video frames the router cascades through Gemini and Nova video endpoints.

Cascade

  • Gemma 3n E4B (NVIDIA, vision, free)
  • Llama 4 Maverick (NVIDIA, 1M ctx, free)
  • Nova Lite Video ($0.06/1M)
  • Gemini 2.5 Flash Lite Video ($0.10/1M)
  • Gemini 2.5 Pro (fallback)

Best for

  • Screenshot-to-code
  • Video content analysis
  • Visual QA and captioning
  • Diagram and chart understanding
Smart Legal
Precise legal routing with honest privacy limits.
smart_legal

Routes legal queries through privacy-aware, high-context models. This is useful for legal drafting and analysis, but it is not a privilege guarantee. Law firms should use covered direct-provider routes with approved contracts and subprocessors for client-confidential data.

Cascade

  • Mistral Large 3 675B (NVIDIA, free)
  • Kimi K2 Thinking (NVIDIA, free)
  • DeepSeek V3.2 (NVIDIA, free)
  • Claude Sonnet 4 (complex)
  • Claude Opus 4 (expert)

Best for

  • Contract review and redlining
  • Statute and case law research
  • Compliance document generation
  • Legal memo drafting
Smart Health
Health-focused routing without fake HIPAA claims.
smart_health

Routes health and clinical queries through careful models with privacy-aware defaults. This is not HIPAA compliance by itself. PHI should only use routes covered by a signed BAA and the right provider configuration.

Cascade

  • Mistral Large 3 675B (NVIDIA, free)
  • Kimi K2 Thinking (NVIDIA, free)
  • DeepSeek V3.2 (NVIDIA, free)
  • Claude Sonnet 4 (complex)
  • Claude Opus 4 (expert/critical)

Best for

  • Clinical documentation (SOAP notes)
  • Medical coding and billing
  • Patient communication drafts
  • Literature summarisation
Smart Image
Image generation via best-in-class multimodal models.
smart_image (New)

Routes image generation requests through chat-capable image models cheapest-first. Gemini Flash Image is tried first for speed and cost; GPT-5 Image Mini and GPT-5 Image are used for the highest fidelity. All models support natural-language prompts and iterative refinement.

Cascade

  • Gemini 2.5 Flash Image ($0.30/1M)
  • GPT-5 Image Mini ($2.50/1M)
  • GPT-5 Image ($10/1M)
  • Gemini 3 Pro Image Preview (fallback)

Best for

  • UI mockups and wireframes
  • Product photography concepts
  • Marketing asset generation
  • Creative concept art
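
"Cheapest-first" ordering can be pictured as a sort by listed price, with unpriced fallbacks placed last. The sketch below illustrates that idea using the prices from the cascade above, taken to be USD per 1M tokens. It is not Waterfall's actual routing code, and `cheapestFirst` is a hypothetical name.

```javascript
// Prices as listed in the cascade above (assumed to be USD per 1M tokens).
// The fallback model has no listed price, so it is recorded as null.
const IMAGE_MODELS = [
  { model: "GPT-5 Image", pricePerMillion: 10 },
  { model: "Gemini 3 Pro Image Preview", pricePerMillion: null },
  { model: "Gemini 2.5 Flash Image", pricePerMillion: 0.3 },
  { model: "GPT-5 Image Mini", pricePerMillion: 2.5 },
];

// Return a new array sorted cheapest-first; unpriced models go last.
function cheapestFirst(models) {
  return [...models].sort((a, b) => {
    const pa = a.pricePerMillion ?? Infinity;
    const pb = b.pricePerMillion ?? Infinity;
    if (pa === pb) return 0;
    return pa < pb ? -1 : 1;
  });
}

console.log(cheapestFirst(IMAGE_MODELS).map((m) => m.model));
// prints [ 'Gemini 2.5 Flash Image', 'GPT-5 Image Mini', 'GPT-5 Image',
//          'Gemini 3 Pro Image Preview' ]
```

The result matches the order of the cascade listed above.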
Smart Transcribe
Audio → text with speaker diarisation and translation.
smart_transcribe (New)

Routes audio transcription and speech understanding through Voxtral (fast, cheap) and Gemini audio models. Supports long-form audio, multi-speaker diarisation, and real-time streaming transcription.

Cascade

  • Voxtral Small 24B ($0.10/1M)
  • Gemini 2.0 Flash ($0.10/1M)
  • Gemini 2.5 Flash ($0.30/1M)
  • GPT-4o Audio ($2.50/1M)

Best for

  • Meeting and call transcripts
  • Podcast summarisation
  • Voice memo analysis
  • Real-time captioning
Smart Voice
Voice input + output for conversational agents.
smart_voice (New)

Routes voice-agent requests through models that natively support audio input and output. Gemini Flash handles low-latency dialogue; GPT-audio and GPT-4o-audio provide premium, natural-sounding voice synthesis and understanding.

Cascade

  • Gemini 2.0 Flash ($0.10/1M)
  • GPT-audio Mini ($0.60/1M)
  • Gemini 2.5 Flash ($0.30/1M)
  • GPT-audio / GPT-4o-audio ($2.50/1M)

Best for

  • Conversational AI assistants
  • Voice-enabled customer support
  • Real-time dialogue agents
  • Speech synthesis pipelines
Smart Multilingual Voice
140+ language voice and text understanding.
smart_multilingual_voice (New)

Routes multilingual voice and text requests through Gemma 4 (free, 140+ languages) and Gemini Flash. Optimised for cross-lingual transcription, translation, and global voice-agent deployment at minimal cost.

Cascade

  • Gemma 4 26B A4B (free, 140+ languages)
  • Gemma 4 31B (free, 140+ languages)
  • Gemini 2.0 Flash ($0.10/1M)
  • Gemini 2.5 Flash ($0.30/1M)
  • Qwen 3.6 Max Preview (fallback)

Best for

  • Global support bots
  • Multilingual voice agents
  • Cross-lingual transcription
  • Language tutoring apps

Start routing smarter

All strategies are available through the same OpenAI-compatible API. No SDK changes needed.