Models

Chat, Tool Use, Reasoning, Code

Free, 131K ctx

CoBuddy is a code generation model from Baidu, optimized for coding tasks and AI Agent workflows. It features high inference throughput and low end-to-end latency, with native support for tool...

cognitivecomputations/dolphin-mistral-24b-venice-edition-free

cognitivecomputations

Free, 33K ctx

Venice Uncensored Dolphin Mistral 24B Venice Edition is a fine-tuned variant of Mistral-Small-24B-Instruct-2501, developed by dphn.ai in collaboration with Venice.ai. This model is designed as an “uncensored” instruct-tuned LLM, preserving...

Free Models Router

openrouter

liquid/lfm-2.5-1.2b-thinking-free

liquid

Free, 33K ctx

LFM2.5-1.2B-Thinking is a lightweight reasoning-focused model optimized for agentic tasks, data extraction, and RAG—while still running comfortably on edge devices. It supports long context (up to 32K tokens) and is...

Chat, Reasoning, Tool Use

NVIDIA Build: active-speaker-detection

nvidia

active-speaker-detection from NVIDIA Build model catalog.

Video, Audio, Vision

NVIDIA Build: bevformer

nvidia

bevformer from NVIDIA Build model catalog.

NVIDIA Build: cosmos-transfer1-7b

nvidia

cosmos-transfer1-7b from NVIDIA Build model catalog.

NVIDIA Build: cosmos-transfer2.5-2b

nvidia

cosmos-transfer2.5-2b from NVIDIA Build model catalog.

NVIDIA Build: deepseek-v4-flash

deepseek-ai

deepseek-v4-flash from NVIDIA Build model catalog.

NVIDIA Build: deepseek-v4-pro

deepseek-ai

deepseek-v4-pro from NVIDIA Build model catalog.

NVIDIA Build: flux-2-klein-4b

black-forest-labs

flux-2-klein-4b from NVIDIA Build model catalog.

Image

NVIDIA Build: gemma-4-31b-it

google

gemma-4-31b-it from NVIDIA Build model catalog.

Chat, Vision, Tool Use

NVIDIA Build: gliner-pii

nvidia

gliner-pii from NVIDIA Build model catalog.

NVIDIA Build: glm-5.1

z-ai

glm-5.1 from NVIDIA Build model catalog.

NVIDIA Build: ising-calibration-1-35b-a3b

nvidia

ising-calibration-1-35b-a3b from NVIDIA Build model catalog.

NVIDIA Build: kimi-k2.6

moonshotai

Chat, Vision, Tool Use, Reasoning

kimi-k2.6 from NVIDIA Build model catalog.

NVIDIA Build: lipsync

nvidia

lipsync from NVIDIA Build model catalog.

Video, Audio

smart-video, smart-voice

NVIDIA Build: llama-3-2-nemoretriever-300m-embed-v1

nvidia

llama-3-2-nemoretriever-300m-embed-v1 from NVIDIA Build model catalog.

smart-embedding

NVIDIA Build: llama-3.1-nemotron-safety-guard-8b-v3

nvidia

llama-3.1-nemotron-safety-guard-8b-v3 from NVIDIA Build model catalog.

NVIDIA Build: llama-nemotron-rerank-1b-v2

nvidia

llama-nemotron-rerank-1b-v2 from NVIDIA Build model catalog.

smart-rerank

NVIDIA Build: llama-nemotron-rerank-vl-1b-v2

nvidia

llama-nemotron-rerank-vl-1b-v2 from NVIDIA Build model catalog.

Vision

smart-rerank

NVIDIA Build: minimax-m2.7

minimaxai

minimax-m2.7 from NVIDIA Build model catalog.

NVIDIA Build: mistral-medium-3.5-128b

mistralai

mistral-medium-3.5-128b from NVIDIA Build model catalog.

Chat, Vision, Tool Use

NVIDIA Build: mistral-small-4-119b-2603

mistralai

mistral-small-4-119b-2603 from NVIDIA Build model catalog.

Chat, Vision, Tool Use

NVIDIA Build: nemotron-3-content-safety

nvidia

nemotron-3-content-safety from NVIDIA Build model catalog.

Reasoning

NVIDIA Build: nemotron-3-nano-omni-30b-a3b-reasoning

nvidia

nemotron-3-nano-omni-30b-a3b-reasoning from NVIDIA Build model catalog.

Chat, Vision, Audio, Video

NVIDIA Build: nemotron-3-super-120b-a12b

nvidia

nemotron-3-super-120b-a12b from NVIDIA Build model catalog.

NVIDIA Build: nemotron-asr-streaming

nvidia

smart-transcribe, smart-voice

nemotron-asr-streaming from NVIDIA Build model catalog.

Audio

NVIDIA Build: nemotron-content-safety-reasoning-4b

nvidia

nemotron-content-safety-reasoning-4b from NVIDIA Build model catalog.

NVIDIA Build: nemotron-ocr-v1

nvidia

nemotron-ocr-v1 from NVIDIA Build model catalog.

Vision

NVIDIA Build: nemotron-tts

nvidia

smart-voice, smart-multilingual-voice

nemotron-tts from NVIDIA Build model catalog.

Audio

NVIDIA Build: nemotron-tts-multilingual

nvidia

smart-voice, smart-multilingual-voice

nemotron-tts-multilingual from NVIDIA Build model catalog.

Audio

NVIDIA Build: nemotron-voicechat

nvidia

nemotron-voicechat from NVIDIA Build model catalog.

Chat, Audio

privacy, smart-voice

NVIDIA Build: nv-embedcode-7b-v1

nvidia

nv-embedcode-7b-v1 from NVIDIA Build model catalog.

Code

smart-embedding

NVIDIA Build: phi-4-multimodal-instruct

nvidia

phi-4-multimodal-instruct from NVIDIA Build model catalog.

Chat, Vision, Audio

privacy, smart-image

NVIDIA Build: qwen-image

qwen

qwen-image from NVIDIA Build model catalog.

Image

NVIDIA Build: qwen-image-edit

qwen

qwen-image-edit from NVIDIA Build model catalog.

Image

NVIDIA Build: relighting

nvidia

relighting from NVIDIA Build model catalog.

Image, Vision

NVIDIA Build: riva-translate-4b-instruct-v1_1

nvidia

riva-translate-4b-instruct-v1_1 from NVIDIA Build model catalog.

Chat, Audio

privacy, smart-voice

NVIDIA Build: riva-translate-4b-instruct-v1-1

nvidia

riva-translate-4b-instruct-v1-1 from NVIDIA Build model catalog.

Chat, Audio

privacy, smart-voice

NVIDIA Build: sparsedrive

nvidia

sparsedrive from NVIDIA Build model catalog.

NVIDIA Build: streampetr

nvidia

streampetr from NVIDIA Build model catalog.

NVIDIA Build: synthetic-video-detector

nvidia

synthetic-video-detector from NVIDIA Build model catalog.

NVIDIA Build: usdcode

nvidia

usdcode from NVIDIA Build model catalog.

Code

smart-code

NVIDIA: Nemotron 3 Nano Omni (free)

nvidia

Free, 256K ctx

NVIDIA Nemotron™ 3 Nano Omni is a 30B-A3B open multimodal model designed to function as a perception and context sub-agent in enterprise agent systems. It accepts text, image, video, and...

Chat, Vision, Video, Audio

nvidia/nemotron-nano-12b-v2-vl-free

nvidia

NVIDIA Nemotron Nano 2 VL is a 12-billion-parameter open multimodal reasoning model designed for video understanding and document intelligence. It introduces a hybrid Transformer-Mamba architecture, combining transformer-level accuracy with Mamba’s...

openrouter/free

openrouter

Free, 200K ctx

The simplest way to get free inference. openrouter/free is a router that selects free models at random from the models available on OpenRouter. The router smartly filters for models that...
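The random selection described above can be sketched as follows. This is a minimal illustration, not OpenRouter's implementation: the catalog entries, the `free` and `tags` fields, and the function name `pick_free_model` are all hypothetical, and because the listing elides what the router actually filters on, the tag-based filter here is only an assumed stand-in.

```python
import random

# Hypothetical catalog; the ids, fields, and tags are invented for illustration.
CATALOG = [
    {"id": "example/free-chat", "free": True, "tags": {"Chat"}},
    {"id": "example/free-tools", "free": True, "tags": {"Chat", "Tool Use"}},
    {"id": "example/paid-tools", "free": False, "tags": {"Chat", "Tool Use"}},
]


def pick_free_model(catalog, required_tags=frozenset(), rng=random):
    """Pick one free model at random among those carrying every required tag.

    The tag filter is an assumption; the real router's criteria are not
    stated in the listing.
    """
    required = frozenset(required_tags)
    candidates = [m for m in catalog if m["free"] and required <= m["tags"]]
    if not candidates:
        raise LookupError("no free model satisfies the requested tags")
    return rng.choice(candidates)["id"]


if __name__ == "__main__":
    # Any free model is eligible when no tags are required.
    print(pick_free_model(CATALOG))
    # Only one free model in the hypothetical catalog supports tool use.
    print(pick_free_model(CATALOG, {"Tool Use"}))
```

Passing a seeded `random.Random` instance as `rng` makes the choice reproducible, which is convenient when testing code that sits behind a random router.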

Owl Alpha

openrouter

Free, 1M ctx

Owl Alpha is a high-performance foundation model designed for agentic workloads. It natively supports tool use and long-context tasks, with strong performance in code generation, automated workflows, and complex instruction execution....

Chat, Code, Tool Use

Anthropic Claude Sonnet Latest

anthropic

arcee-ai

baidu

$0.68 / $2.81 per 1M, 66K ctx

Qianfan-OCR-Fast is a domain-specific multimodal large model purpose-built for OCR. By leveraging specialized OCR training data while preserving versatile multimodal intelligence, it provides a powerful performance upgrade over Qianfan-OCR.

orchestrator, smart-image

Body Builder (beta)

openrouter

$-1000000.0000 / $-1000000.0000 per 1M, 128K ctx

Transform your natural language requests into structured OpenRouter API request objects. Describe what you want to accomplish with AI models, and Body Builder will construct the appropriate API calls. Example:...

ByteDance Seed: Seed 1.6

bytedance-seed

$0.25 / $2.00 per 1M, 262K ctx

Seed 1.6 is a general-purpose model released by the ByteDance Seed team. It incorporates multimodal capabilities and adaptive deep thinking with a 256K context window.
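As a rough illustration of how the listed prices translate into a per-request cost, the sketch below assumes that the two figures in a line such as `$0.25 / $2.00 per 1M` are the input and output prices per one million tokens; the listing itself does not label them. The function name `estimate_cost` and the token counts are hypothetical.

```python
from decimal import Decimal


def estimate_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Estimate the dollar cost of one request.

    Prices are given per one million tokens (assumed reading of "per 1M")
    and passed as strings so Decimal can represent them exactly.
    """
    per_million = Decimal(1_000_000)
    total = (
        Decimal(input_tokens) * Decimal(input_price_per_m)
        + Decimal(output_tokens) * Decimal(output_price_per_m)
    )
    return total / per_million


if __name__ == "__main__":
    # Hypothetical request: 120,000 input tokens and 8,000 output tokens
    # at the Seed 1.6 prices listed above.
    cost = estimate_cost(120_000, 8_000, "0.25", "2.00")
    print(f"${cost:.4f}")  # prints "$0.0460"
```

`Decimal` is used instead of floats so that small per-token prices do not pick up binary rounding error when summed over many requests.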

ByteDance Seed: Seed 1.6 Flash

bytedance-seed

$0.07 / $0.30 per 1M, 262K ctx

Seed 1.6 Flash is an ultra-fast multimodal deep thinking model by ByteDance Seed, supporting both text and visual understanding. It features a 256k context window and can generate outputs of...

ByteDance Seed: Seed-2.0-Lite

bytedance-seed

deepseek

$0.11 / $0.22 per 1M, 1M ctx

DeepSeek V4 Flash is an efficiency-optimized Mixture-of-Experts model from DeepSeek with 284B total parameters and 13B activated parameters, supporting a 1M-token context window. It is designed for fast inference and...

DeepSeek: DeepSeek V4 Flash (free)

deepseek

$0.11 / Free per 1M, 1M ctx

tool-calling, orchestrator

DeepSeek: DeepSeek V4 Pro

deepseek

$0.43 / $0.87 per 1M, 1M ctx

DeepSeek V4 Pro is a large-scale Mixture-of-Experts model from DeepSeek with 1.6T total parameters and 49B activated parameters, supporting a 1M-token context window. It is designed for advanced reasoning, coding,...

DeepSeek: R1 0528

deepseek

$0.50 / $2.15 per 1M, 164K ctx

May 28th update to the [original DeepSeek R1](/deepseek/deepseek-r1). Performance is on par with [OpenAI o1](/openai/o1), but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active...

DeepSeek: R1 Distill Llama 70B

deepseek

$0.70 / $0.80 per 1M, 131K ctx

DeepSeek R1 Distill Llama 70B is a distilled large language model based on [Llama-3.3-70B-Instruct](/meta-llama/llama-3.3-70b-instruct), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). The model combines advanced distillation techniques to achieve high performance across...

z-ai

Free / Free per 1M, 203K ctx

Compared with GLM-4.5, this generation brings several key improvements: Longer context window: The context window has been expanded from 128K to 200K tokens, enabling the model to handle more complex...

google

Gemini Flash 2.0 offers a significantly faster time to first token (TTFT) compared to [Gemini Flash 1.5](/google/gemini-flash-1.5), while maintaining quality on par with larger models like [Gemini Pro 1.5](/google/gemini-pro-1.5). It...

Google: Gemini 2.0 Flash Lite

google

$0.07 / $0.30 per 1M, 1M ctx

Gemini 2.0 Flash Lite offers a significantly faster time to first token (TTFT) compared to [Gemini Flash 1.5](/google/gemini-flash-1.5), while maintaining quality on par with larger models like [Gemini Pro 1.5](/google/gemini-pro-1.5),...

Google: Gemini 2.5 Flash

google

$0.30 / $2.50 per 1M, 1M ctx

Gemini 2.5 Flash is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in "thinking" capabilities, enabling it to provide responses with greater...

Chat, Code, Reasoning

smart-transcribe, smart-voice

Google: Gemini 2.5 Flash Lite

google

Gemini 2.5 Flash-Lite is a lightweight reasoning model in the Gemini 2.5 family, optimized for ultra-low latency and cost efficiency. It offers improved throughput, faster token generation, and better performance...

Google: Gemini 2.5 Flash Lite Preview 09-2025

google

Google: Gemini 2.5 Pro Preview 05-06

google

$1.25 / $10.00 per 1M, 1M ctx

Chat, Code, Reasoning

Google: Gemini 2.5 Pro Preview 06-05

google

$1.25 / $10.00 per 1M, 1M ctx

Chat, Code, Reasoning

Google: Gemini 3 Flash Preview

google

$0.50 / $3.00 per 1M, 1M ctx

Gemini 3 Flash Preview is a high-speed, high-value thinking model designed for agentic workflows, multi-turn chat, and coding assistance. It delivers near-Pro-level reasoning and tool...

Chat, Code, Tool Use, Reasoning

Google: Gemini 3.1 Flash Lite

google

$0.25 / $1.50 per 1M, 1M ctx

Gemini 3.1 Flash Lite is Google’s GA high-efficiency multimodal model optimized for low-latency, high-volume workloads. It supports text, image, video, audio, and PDF inputs, and is designed for lightweight agentic...

Chat, Vision, Video, Audio

Google: Gemini 3.1 Flash Lite Preview

google

$0.25 / $1.50 per 1M, 1M ctx

Gemini 3.1 Flash Lite Preview is Google's high-efficiency model optimized for high-volume use cases. It outperforms Gemini 2.5 Flash Lite on overall quality and approaches Gemini 2.5 Flash performance across...

Google: Gemini 3.1 Pro Preview

google

$2.00 / $12.00 per 1M, 1M ctx

Gemini 3.1 Pro Preview is Google’s frontier reasoning model, delivering enhanced software engineering performance, improved agentic reliability, and more efficient token usage across complex workflows. Building on the multimodal foundation...

Chat, Vision, Tool Use, Reasoning

Google: Gemini 3.1 Pro Preview Custom Tools

google

$2.00 / $12.00 per 1M, 1M ctx

Gemini 3.1 Pro Preview Custom Tools is a variant of Gemini 3.1 Pro that improves tool selection behavior by preventing overuse of a general bash tool when more efficient third-party...

Google: Gemini 3.5 Flash

google

$1.50 / $9.00 per 1M, 1M ctx

Gemini 3.5 Flash is Google's high-efficiency multimodal model, bringing near-Pro level coding and reasoning at Flash-tier cost and speed. It is highly optimized for coding proficiency and parallel agentic execution...

Chat, Vision, Code, Tool Use

Google: Gemma 2 27B

google

$0.65 / $0.65 per 1M, 8K ctx

Gemma 2 27B by Google is an open model built from the same research and technology used to create the [Gemini models](/models?q=gemini). Gemma models are well-suited for a variety of...

Google: Gemma 3 12B

google

$0.04 / $0.13 per 1M, 131K ctx

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...

Google: Gemma 3 27B

google

$0.08 / $0.16 per 1M, 131K ctx

Google: Gemma 3 4B

google

$0.04 / $0.08 per 1M, 131K ctx

Google: Gemma 3n 4B

google

$0.06 / $0.12 per 1M, 33K ctx

Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource devices, such as phones, laptops, and tablets. It supports multimodal inputs—including text, visual data, and audio—enabling diverse tasks...

Chat, Vision, Audio

Google: Gemma 4 26B A4B

google

$0.06 / $0.33 per 1M, 262K ctx

Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind. Despite 25.2B total parameters, only 3.8B activate per token during inference — delivering near-31B quality at...

Google: Gemma 4 26B A4B (free)

google

$0.06 / Free per 1M, 262K ctx

Google: Gemma 4 31B

google

$0.12 / $0.37 per 1M, 262K ctx

Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output. Features a 256K token context window, configurable thinking/reasoning mode, native function...

Google: Gemma 4 31B (free)

google

$0.12 / Free per 1M, 262K ctx

Google: Nano Banana (Gemini 2.5 Flash Image)

google

$0.30 / $2.50 per 1M, 33K ctx

Gemini 2.5 Flash Image, a.k.a. "Nano Banana," is now generally available. It is a state-of-the-art image generation model with contextual understanding. It is capable of image generation,...

Google: Nano Banana 2 (Gemini 3.1 Flash Image Preview)

google

$0.50 / $3.00 per 1M, 131K ctx

Gemini 3.1 Flash Image Preview, a.k.a. "Nano Banana 2," is Google’s latest state-of-the-art image generation and editing model, delivering Pro-level visual quality at Flash speed. It combines...

Google: Nano Banana Pro (Gemini 3 Pro Image Preview)

google

$2.00 / $12.00 per 1M, 66K ctx

Nano Banana Pro is Google’s most advanced image-generation and editing model, built on Gemini 3 Pro. It extends the original Nano Banana with significantly improved multimodal reasoning, real-world grounding, and...

gpt-4.1

openai

Free / Free per 1M, 1M ctx

GPT-4.1 is a flagship large language model optimized for advanced instruction following, real-world software engineering, and long-context reasoning. It supports a 1 million token context window and outperforms GPT-4o and...

gpt-4.1-mini

openai

Free / Free per 1M, 1M ctx

GPT-4.1 Mini is a mid-sized model delivering performance competitive with GPT-4o at substantially lower latency and cost. It retains a 1 million token context window and scores 45.1% on hard...

gpt-4o

openai

Free / Free per 1M, 128K ctx

GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as...

smart-video

gpt-4o-mini

openai

Free / Free per 1M, 128K ctx

GPT-4o mini is OpenAI's newest model after [GPT-4 Omni](/models/openai/gpt-4o), supporting both text and image inputs with text outputs. As their most advanced small model, it is many multiples more affordable...

gpt-5

openai

Free / Free per 1M, 400K ctx

GPT-5 is OpenAI’s most advanced model, offering major improvements in reasoning, code quality, and user experience. It is optimized for complex tasks that require step-by-step reasoning, instruction following, and accuracy...

IBM: Granite 4.0 Micro

ibm-granite

$0.02 / $0.11 per 1M, 131K ctx

Granite-4.0-H-Micro is a 3B-parameter model from the Granite 4 family of models. These models are the latest in a series of models released by IBM. They are fine-tuned for long...

IBM: Granite 4.1 8B

ibm-granite

$0.05 / $0.10 per 1M, 131K ctx

Granite 4.1 8B is a dense, decoder-only 8-billion-parameter language model from IBM, part of the Granite 4.1 family. It supports a 131K-token context window and is designed for enterprise tasks...

inflection

$2.50 / $10.00 per 1M, 8K ctx

Inflection 3 Pi powers Inflection's [Pi](https://pi.ai) chatbot, including backstory, emotional intelligence, productivity, and safety. It has access to recent news, and excels in scenarios like customer support and roleplay. Pi...

Inflection: Inflection 3 Productivity

inflection

$2.50 / $10.00 per 1M, 8K ctx

Inflection 3 Productivity is optimized for following instructions. It is better for tasks requiring JSON output or precise adherence to provided guidelines. It has access to recent news. For emotional...

kimi-k2

moonshotai

Free / Free per 1M, 131K ctx

Kimi K2 Instruct is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It is optimized for...

kimi-k2.5

moonshotai

minimax

Free / Free per 1M, 1M ctx

MiniMax-01 combines MiniMax-Text-01 for text generation and MiniMax-VL-01 for image understanding. It has 456 billion parameters, with 45.9 billion parameters activated per inference, and can handle a context...

orchestrator

minimax-m2.5

minimax

mistralai

Mistral: Mixtral 8x22B Instruct

mistralai

$2.00 / $6.00 per 1M, 66K ctx

Mistral's official instruct fine-tuned version of [Mixtral 8x22B](/models/mistralai/mixtral-8x22b). It uses 39B active parameters out of 141B, offering unparalleled cost efficiency for its size. Its strengths include: - strong math, coding,...

Chat, Code

Mistral: Pixtral Large 2411

mistralai

$2.00 / $6.00 per 1M, 131K ctx

Pixtral Large is a 124B parameter, open-weight, multimodal model built on top of [Mistral Large 2](/mistralai/mistral-large-2411). The model is able to understand documents, charts and natural images. The model is...

moonshotai

$0.60 / $2.50 per 1M, 262K ctx

Kimi K2 0905 is the September update of [Kimi K2 0711](moonshotai/kimi-k2). It is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32...

MoonshotAI: Kimi K2 Thinking

moonshotai

$0.60 / $2.50 per 1M, 262K ctx

Kimi K2 Thinking is Moonshot AI’s most advanced open reasoning model to date, extending the K2 series into agentic, long-horizon reasoning. Built on the trillion-parameter Mixture-of-Experts (MoE) architecture introduced in...

MoonshotAI: Kimi K2.6

moonshotai

nousresearch

$1.00 / Free per 1M, 131K ctx

Chat, Reasoning, Tool Use

NVIDIA: Llama 3.3 Nemotron Super 49B V1.5

nvidia

$0.10 / $0.40 per 1M, 131K ctx

Llama-3.3-Nemotron-Super-49B-v1.5 is a 49B-parameter, English-centric reasoning/chat model derived from Meta’s Llama-3.3-70B-Instruct with a 128K context. It’s post-trained for agentic workflows (RAG, tool calling) via SFT across math, code, science, and...

Chat, Code, Tool Use, Reasoning

NVIDIA: Nemotron 3 Nano 30B A3B

nvidia

$0.05 / $0.20 per 1M, 262K ctx

NVIDIA Nemotron 3 Nano 30B A3B is a small MoE language model with high compute efficiency and accuracy, intended for developers building specialized agentic AI systems. The model is fully...

NVIDIA: Nemotron 3 Super

nvidia

$0.09 / $0.45 per 1M, 1M ctx

NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameters for maximum compute efficiency and accuracy in complex multi-agent applications. Built on a hybrid Mamba-Transformer...

NVIDIA: Nemotron 3 Super (free)

nvidia

$0.09 / Free per 1M, 1M ctx

NVIDIA: Nemotron Nano 9B V2

nvidia

$0.04 / $0.16 per 1M, 131K ctx

NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and...

nvidia/nemotron-3-nano-30b-a3b-free

nvidia

$0.05 / Free per 1M, 256K ctx

NVIDIA Nemotron 3 Nano 30B A3B is a small MoE language model with high compute efficiency and accuracy, intended for developers building specialized agentic AI systems. The model is fully...

nvidia/nemotron-nano-9b-v2-free

nvidia

$0.04 / Free per 1M, 128K ctx

o1

openai

Free / Free per 1M, 200K ctx

The latest and strongest model family from OpenAI, o1 is designed to spend more time thinking before responding. The o1 model series is trained with large-scale reinforcement learning to reason...

openai

For tasks that demand low latency, GPT‑4.1 nano is the fastest and cheapest model in the GPT-4.1 series. It delivers exceptional performance at a small size with its 1 million...

openai

$0.15 / $0.60 per 1M, 128K ctx

OpenAI: GPT-4o-mini Search Preview

openai

$0.15 / $0.60 per 1M, 128K ctx

GPT-4o mini Search Preview is a specialized model for web search in Chat Completions. It is trained to understand and execute web search queries.

OpenAI: GPT-5 Chat

openai

$1.25 / $10.00 per 1M, 128K ctx

GPT-5 Chat is designed for advanced, natural, multimodal, and context-aware conversations for enterprise applications.

openai

$0.04 / Free per 1M, 131K ctx

Chat, Reasoning, Tool Use

qwen

$0.30 / $1.80 per 1M, 1M ctx

Qwen3.5 Plus (April 2026) is a large-scale multimodal language model from Alibaba. It accepts text, image, and video input and produces text output, with a 1M token context window. This...

Qwen: Qwen3.5-122B-A10B

qwen

$0.26 / $2.08 per 1M, 262K ctx

The Qwen3.5 122B-A10B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. In terms of...

Qwen: Qwen3.5-27B

qwen

$0.20 / $1.56 per 1M, 262K ctx

The Qwen3.5 27B native vision-language Dense model incorporates a linear attention mechanism, delivering fast response times while balancing inference speed and performance. Its overall capabilities are comparable to those of...

Qwen: Qwen3.5-35B-A3B

qwen

$0.14 / $1.00 per 1M, 262K ctx

The Qwen3.5 Series 35B-A3B is a native vision-language model designed with a hybrid architecture that integrates linear attention mechanisms and a sparse mixture-of-experts model, achieving higher inference efficiency. Its overall...

Qwen: Qwen3.5-9B

qwen

$0.04 / $0.15 per 1M, 262K ctx

Qwen3.5-9B is a multimodal foundation model from the Qwen3.5 family, designed to deliver strong reasoning, coding, and visual understanding in an efficient 9B-parameter architecture. It uses a unified vision-language design...

qwen

$0.33 / $1.95 per 1M, 1M ctx

Qwen 3.6 Plus builds on a hybrid architecture that combines efficient linear attention with sparse mixture-of-experts routing, enabling strong scalability and high-performance inference. Compared to the 3.5 series, it delivers...

qwen/qwen3-coder-free

qwen

$0.22 / Free per 1M, 1M ctx

Chat, Code, Reasoning, Tool Use

stepfun

$0.10 / $0.30 per 1M, 262K ctx

Step 3.5 Flash is StepFun's most capable open-source foundation model. Built on a sparse Mixture of Experts (MoE) architecture, it selectively activates only 11B of its 196B parameters per token....

Switchpoint Router

switchpoint

$0.85 / $3.40 per 1M, 131K ctx

Switchpoint AI's router instantly analyzes your request and directs it to the optimal AI from an ever-evolving library. As the world of LLMs advances, our router gets smarter, ensuring you...

Tencent: Hunyuan A13B Instruct

tencent

$0.14 / $0.57 per 1M, 131K ctx

Hunyuan-A13B is a 13B active parameter Mixture-of-Experts (MoE) language model developed by Tencent, with a total parameter count of 80B and support for reasoning via Chain-of-Thought. It offers competitive benchmark...

Tencent: Hy3 preview

tencent

Xiaomi: MiMo-V2-Flash

xiaomi

$0.10 / $0.30 per 1M, 262K ctx

MiMo-V2-Flash is an open-source foundation language model developed by Xiaomi. It is a Mixture-of-Experts model with 309B total parameters and 15B active parameters, adopting hybrid attention architecture. MiMo-V2-Flash supports a...

z-ai

$0.13 / Free per 1M, 131K ctx

GLM-4.5-Air is the lightweight variant of our latest flagship model family, also purpose-built for agent-centric applications. Like GLM-4.5, it adopts the Mixture-of-Experts (MoE) architecture but with a more compact parameter...

Z.ai: GLM 4 32B

z-ai

$0.10 / $0.10 per 1M, 128K ctx

GLM 4 32B is a cost-effective foundation language model. It can efficiently perform complex tasks and has significantly enhanced capabilities in tool use, online search, and code-related intelligent tasks. It...

Chat, Code, Tool Use

Z.ai: GLM 4.5

z-ai

$0.60 / $2.20 per 1M, 131K ctx

GLM-4.5 is our latest flagship foundation model, purpose-built for agent-based applications. It leverages a Mixture-of-Experts (MoE) architecture and supports a context length of up to 128k tokens. GLM-4.5 delivers significantly...

z-ai

$1.20 / $4.00 per 1M, 203K ctx

GLM-5 Turbo is a new model from Z.ai designed for fast inference and strong performance in agent-driven environments such as OpenClaw scenarios. It is deeply optimized for real-world agent workflows...

Z.ai: GLM 5.1

z-ai

$0.98 / $3.08 per 1M, 203K ctx

Chat, Code

Z.ai: GLM 5V Turbo

z-ai

$1.20 / $4.00 per 1M, 203K ctx

GLM-5V-Turbo is Z.ai’s first native multimodal agent foundation model, built for vision-based coding and agent-driven tasks. It natively handles image, video, and text inputs, excels at long-horizon planning, complex coding,...

Chat, Vision, Code, Tool Use