AI Providers

This page documents the [ai.providers], [ai.routing], and [ai.fallback] sections of an instance’s config.toml. All three are optional — an instance with no [ai.*] section uses a sensible default stack (DeepSeek for chat, Gemini for vision, DeepSeek Reasoner for hard questions, no CENSORED cascade).

Quick reference

# instances/<your-bot>/config.toml

[ai.providers.<name>]
url = "https://..."           # required: full chat-completions endpoint
model = "..."                 # required: model identifier
api_key_env = "..."           # required: name of env var holding the bearer token
max_tokens = 8192             # required: per-provider hard cap on response tokens
timeout_secs = 30             # optional, default 30
supports_vision = false       # optional, default false
supports_tools = true         # optional, default true
is_reasoner = false           # optional, default false
spec = "openai"               # optional, default "openai"

[ai.routing]
chat = "<provider-name>"      # required if section present
vision = "<provider-name>"    # optional — if omitted, image messages fall through to chat
reasoner = "<provider-name>"  # optional — if omitted, classifier step is skipped

[ai.fallback]
on_censored = ["<name>", "..."]  # CENSORED-cascade chain (optional)

Default registry

When [ai.providers] is absent, the bot ships these four definitions:

| Name | URL | Model | Env var | max_tokens | timeout | vision | tools | reasoner |
|---|---|---|---|---|---|---|---|---|
| deepseek_chat | https://api.deepseek.com/chat/completions | deepseek-v4-flash | DEEPSEEK_API_KEY | 8192 | 30 | no | yes | no |
| deepseek_reasoner | https://api.deepseek.com/chat/completions | deepseek-v4-pro | DEEPSEEK_API_KEY | 65536 | 300 | no | no | yes |
| gemini_flash | https://generativelanguage.googleapis.com/v1beta/openai/chat/completions | gemini-3-flash-preview | GEMINI_API_KEY | 16384 | 30 | yes | yes | no |
| grok | https://api.x.ai/v1/chat/completions | grok-3 | GROK_API_KEY | 16384 | 30 | no | yes | no |

A provider whose api_key_env resolves to an unset/empty env var is “unavailable” — defined but not usable. The bot starts and runs without it; AI features that depend on it are silently disabled (with a warning at startup if anything references it).

Default routing

When [ai.routing] is absent:

chat = "deepseek_chat"
vision = "gemini_flash"
reasoner = "deepseek_reasoner"

This is exactly the routing behaviour of every release before 0.15.0. Existing instances pick it up automatically with no config changes.

Routing degradation rules

When [ai.routing] IS present, only the keys you write take effect. There’s no field-merge with the defaults above. Specifically:

| Role | If set | If omitted |
|---|---|---|
| chat | Must resolve to a configured provider; otherwise the bot panics at startup | Required — bot panics at startup |
| vision | Must resolve | Image-bearing requests fall through to chat with a warning log |
| reasoner | Must resolve | Classifier step is skipped; every request goes to chat |

This lets you write a one-model config:

[ai.providers.my_local]
url = "http://localhost:11434/v1/chat/completions"
model = "llama3.1:70b"
api_key_env = "LOCAL_LLM_KEY"
max_tokens = 8192

[ai.routing]
chat = "my_local"
# vision and reasoner omitted → graceful degrade

Disabling V4-Pro flagship

deepseek_reasoner defaults to DeepSeek V4-Pro (the 1.6T-parameter flagship). V4-Pro output costs roughly 12× V4-Flash output per token, so high-volume reasoner traffic adds up quickly. The existing routing system already provides the off-switch — no per-feature boolean is needed.

To skip V4-Pro entirely without redefining a provider, point the reasoner role at the cheaper V4-Flash:

[ai.routing]
reasoner = "deepseek_chat"

To disable the reasoner role altogether — the bot will never invoke a reasoner provider, and every chat goes through the chat role — set [ai.routing] and omit reasoner:

[ai.routing]
chat = "deepseek_chat"
vision = "gemini_flash"
# reasoner intentionally omitted — graceful degrade

Either pattern leaves V4-Pro unconfigured by routing and unbilled by DeepSeek.

Provider definitions

Each [ai.providers.<name>] block is independent. The <name> is your handle for the provider — used in [ai.routing] lookups, in [ai.fallback] on_censored lists, and in log lines.

Required fields

  • url — full HTTP(S) endpoint (the chat-completions URL, including any version path).
  • model — model identifier the provider expects in the request body.
  • api_key_env — name of an environment variable. The bot reads std::env::var(api_key_env) at startup; an unset/empty value marks the provider unavailable.
  • max_tokens — per-provider hard cap on response tokens. The orchestration layer asks for whatever budget it wants and clamps to this cap.
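The api_key_env availability rule can be sketched like this (illustrative only — the function name and shape are assumptions, not the bot's actual code):

```rust
// Illustrative sketch, not the bot's real implementation: a provider is
// "available" only when its api_key_env variable is set and non-empty.
fn provider_available(api_key_env: &str) -> bool {
    match std::env::var(api_key_env) {
        Ok(v) => !v.trim().is_empty(),
        Err(_) => false,
    }
}

fn main() {
    std::env::set_var("DEMO_KEY", "sk-123");
    std::env::remove_var("DEMO_MISSING_KEY");
    println!("{}", provider_available("DEMO_KEY"));         // true
    println!("{}", provider_available("DEMO_MISSING_KEY")); // false
}
```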

Optional fields

  • timeout_secs (default 30) — HTTP timeout for chat-completions calls. DeepSeek Reasoner needs 300 (5 minutes); fast chat models are fine with the default.
  • supports_vision (default false) — whether the model accepts image content parts. Today only the Gemini default registry entry sets this to true.
  • supports_tools (default true) — whether the model accepts a tools array. DeepSeek Reasoner is the standout exception (set to false).
  • is_reasoner (default false) — flags a slow reasoning model. The orchestration layer uses this signal alongside the longer timeout_secs budget you should also set.
  • spec (default "openai") — request/response shape. "openai" (default) and "anthropic" (added in 0.16.0) are supported. See Anthropic spec for details.

Provider name rules

  • Non-empty after .trim()
  • No internal whitespace characters
  • TOML’s bare-key rules already constrain [ai.providers.<name>] syntax to safe characters (alphanumeric, underscore, hyphen, dot)
  • A user-defined name that matches a default-registry name (e.g. gemini_flash) fully replaces the default — no field-level merge
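The first two rules amount to a check like the following (a hedged sketch — the function name is illustrative, not from the codebase; TOML's bare-key grammar enforces the rest):

```rust
// Illustrative check for the name rules above: non-empty after trim,
// and no internal whitespace characters.
fn valid_provider_name(name: &str) -> bool {
    let trimmed = name.trim();
    !trimmed.is_empty() && !trimmed.chars().any(char::is_whitespace)
}

fn main() {
    println!("{}", valid_provider_name("gemini_flash")); // true
    println!("{}", valid_provider_name("   "));          // false
    println!("{}", valid_provider_name("my model"));     // false
}
```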

Backward-compatible alias names

For instance configs that pin model-string aliases instead of canonical provider names, the bot recognises a small set of short aliases at lookup time. They were introduced in two waves:

| Alias | Resolves to | Added in |
|---|---|---|
| gemini | gemini_flash | 0.14.0 |
| deepseek | deepseek_chat | 0.14.0 |
| deepseek-chat | deepseek_chat | 0.14.0 |
| deepseek-v4 | deepseek_chat | 0.18.0 |
| deepseek-v4-flash | deepseek_chat | 0.18.0 |
| deepseek-v4-pro | deepseek_reasoner | 0.18.0 |
| deepseek-reasoner | deepseek_reasoner | 0.18.0 |

These aliases work in [ai.fallback] on_censored and in any other place the bot looks up a provider by name at request time. They are not accepted by [ai.routing] startup validation — using [ai.routing] chat = "gemini" panics at startup with the canonical name (gemini_flash) in the error message.

The 0.14.0 aliases exist so a [ai.fallback] on_censored = ["grok", "gemini"] line copied from a 0.14.0 example doesn’t silently produce an empty cascade. The 0.18.0 aliases preserve [ai.fallback] configs that named DeepSeek’s deepseek-reasoner (retiring 2026-07-24) and add forward-compatible spellings for the explicit V4 model names.

New configs should use the canonical names (deepseek_chat, deepseek_reasoner, gemini_flash, grok).
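For example, a cascade carried over from a 0.14.0 config keeps working through the alias table, though canonical spellings are the preferred form going forward:

```toml
# Works: aliases are resolved in on_censored at request time.
[ai.fallback]
on_censored = ["grok", "gemini"]   # "gemini" resolves to gemini_flash

# Preferred for new configs: canonical names.
# on_censored = ["grok", "gemini_flash"]
```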

Anthropic spec

As of 0.16.0, spec = "anthropic" enables native Anthropic /v1/messages routing, letting you call Claude directly rather than through an OpenAI-compat proxy — native routing preserves Claude’s structured tool use, vision content parts, and prompt caching (future work).

An Anthropic provider definition looks like this:

[ai.providers.claude]
spec = "anthropic"
url = "https://api.anthropic.com/v1/messages"
model = "claude-opus-4-7"
api_key_env = "ANTHROPIC_API_KEY"
max_tokens = 8192
supports_vision = true
supports_tools = true

# Anthropic's auth is x-api-key with no scheme prefix.
auth_header = "x-api-key"
auth_scheme = ""

# Anthropic's required version header.
headers = { "anthropic-version" = "2023-06-01" }

New fields (also available to OpenAI providers)

  • headers — HashMap<String, String> of extra HTTP headers. Default empty. Values must be printable ASCII. Use inline-table syntax (headers = { "x" = "y" }) for one or two headers, or a sub-table ([ai.providers.claude.headers]) for longer lists.
  • auth_header — name of the auth header. Default "Authorization". Must be non-empty.
  • auth_scheme — prefix prepended to the API key. Default "Bearer " (with trailing space). Use "" for Anthropic.

These fields are respected by both the OpenAI and Anthropic paths — you can use them on any provider that needs custom auth or headers (e.g. a self-hosted endpoint requiring a custom x-internal-auth header).
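As a sketch of that self-hosted case — the endpoint, model name, header values, and env var below are placeholders, not real defaults:

```toml
[ai.providers.internal]
url = "https://llm.internal.example/v1/chat/completions"
model = "internal-model"
api_key_env = "INTERNAL_LLM_KEY"
max_tokens = 8192

# Send the key in a custom header with no scheme prefix,
# plus one extra static header on every request.
auth_header = "x-internal-auth"
auth_scheme = ""
headers = { "x-client" = "discord-bot" }
```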

Translation — what the bot handles automatically

When you route to an Anthropic provider, the bot translates every shape difference transparently. The internal tool definitions, system prompt, and message history are all built in OpenAI shape (the bot’s internal canonical form) and translated to Anthropic’s wire shape on each request. You never need to write Anthropic-specific prompt logic.

Translation covers:

  • System prompt → top-level system field on the request body (not a role: "system" message in the array)
  • Image content parts → base64 source blocks with correct media_type
  • Tool definitions → flat {name, description, input_schema} shape
  • Tool call responses → tool_use content blocks are extracted into the same flat ToolCall { id, name, arguments } shape the bot uses internally
  • Tool result messages → wrapped in user-content tool_result blocks
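The first translation step can be sketched like this (deliberately simplified shapes — the bot's real message types and function names differ):

```rust
// Simplified sketch of the system-prompt translation: OpenAI-shape history
// carries the system prompt as a message in the array, while Anthropic's
// /v1/messages API expects it as a top-level `system` field instead.
fn split_system(messages: Vec<(String, String)>) -> (Option<String>, Vec<(String, String)>) {
    let mut system = None;
    let mut rest = Vec::with_capacity(messages.len());
    for (role, content) in messages {
        if role == "system" && system.is_none() {
            system = Some(content); // lift the first system message out
        } else {
            rest.push((role, content));
        }
    }
    (system, rest)
}

fn main() {
    let history = vec![
        ("system".to_string(), "You are helpful.".to_string()),
        ("user".to_string(), "hi".to_string()),
    ];
    let (system, rest) = split_system(history);
    println!("{:?} {}", system, rest.len());
}
```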

What works with Claude today

  • Text chat (any routing role)
  • Vision (when supports_vision = true)
  • Tool use (when supports_tools = true)
  • Multi-round search via the CENSORED cascade / [ai.fallback] on_censored = ["claude", ...] as a post-DeepSeek-refusal fallback
  • Mixed setups: DeepSeek primary + Claude as cascade member, or Claude primary + DeepSeek as reasoner, etc.

What’s not yet available

  • Streaming responses
  • Anthropic’s cache_control ephemeral blocks for prompt caching
  • Structured “thinking” / extended reasoning outputs (Claude’s reasoning models)

These are tracked as future enhancements; today’s integration gives feature parity with DeepSeek/Gemini for the bot’s standard workflow.

Validation behaviour

Performed once at startup, before the bot connects to Discord:

| Case | Behaviour |
|---|---|
| [ai.routing] chat unset or points at unknown provider name | Panic |
| [ai.routing] vision / reasoner set to unknown provider name | Panic |
| [ai.routing] section present without chat | Panic |
| Provider name contains whitespace | Panic |
| Provider with spec = "anthropic" | Fully supported as of 0.16.0 — see Anthropic spec |
| Provider’s api_key_env resolves to unset env var | Provider marked unavailable |
| Routing or fallback references an unavailable provider | Warn at startup |
| [ai.routing] vision references provider with supports_vision = false | Warn at startup |
| [ai.routing] reasoner references provider with is_reasoner = false | Warn at startup |
| [ai.fallback] on_censored references defined-but-unavailable provider | Warn at startup; cascade_for skips it at request time |
| [ai.fallback] on_censored references completely unknown name | Warn at request time only (via cascade_for); silently produces empty cascade entry. Common when migrating from 0.14.0 configs that used aliases — see Backward-compatible alias names above |
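The request-time skip in the last two rows can be sketched as follows (a simplification: the signature is assumed, and the real cascade_for also resolves aliases and logs warnings):

```rust
// Hedged sketch of cascade_for's skip behaviour: entries in on_censored
// that are unavailable or unknown are simply dropped when the cascade
// is assembled at request time.
fn cascade_for<'a>(on_censored: &[&'a str], available: &[&str]) -> Vec<&'a str> {
    on_censored
        .iter()
        .copied()
        .filter(|name| available.contains(name))
        .collect()
}

fn main() {
    let available = ["deepseek_chat", "grok"];
    // "gemini_flash" is defined but its env var is unset; "typo" is unknown.
    let chain = cascade_for(&["grok", "gemini_flash", "typo"], &available);
    println!("{:?}", chain); // ["grok"]
}
```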

Worked examples

One-model setup (local Ollama)

[ai.providers.local]
url = "http://localhost:11434/v1/chat/completions"
model = "llama3.1:70b"
api_key_env = "OLLAMA_API_KEY"   # any non-empty value works for Ollama
max_tokens = 8192

[ai.routing]
chat = "local"

Vision and reasoner gracefully degrade. Image messages will be handed to the local model anyway; classifier is skipped so every prompt goes straight to chat.

Three providers + cascade

[ai.providers.openai_gpt]
url = "https://api.openai.com/v1/chat/completions"
model = "gpt-4o"
api_key_env = "OPENAI_API_KEY"
max_tokens = 16384
supports_vision = true

[ai.routing]
chat = "openai_gpt"
vision = "openai_gpt"            # reuse same provider for vision
reasoner = "deepseek_reasoner"   # keep the default reasoner

[ai.fallback]
on_censored = ["grok", "gemini_flash"]

Override a default

To use gemini_flash but with a different model:

[ai.providers.gemini_flash]
url = "https://generativelanguage.googleapis.com/v1beta/openai/chat/completions"
model = "gemini-2.5-pro"           # not the default flash; check ai.google.dev for newer Pro models
api_key_env = "GEMINI_API_KEY"
max_tokens = 32768
supports_vision = true

The user definition fully replaces the default gemini_flash entry. No [ai.routing] change needed if you’re keeping the same role assignment.

See also