# AI Pipeline
This page walks through the path from a Discord @mention to the reply
the user sees. It’s the most elaborate event path in the bot: history
building, personality injection, provider routing, tool-use loops,
response sanitising, and Discord-friendly chunking all happen before the
reply is sent. A reader who understands this page can confidently
extend the AI in new directions — adding tools, tweaking the system
prompt, swapping providers — without breaking the rest.
The core file is `src/ai/chat.rs`. Everything under `src/ai/` is either called from there or defines data it consumes. For how users actually interact with this from Discord, see AI Chat.
## Sequence
```mermaid
sequenceDiagram
    participant U as User
    participant E as events::handle_message
    participant H as build_message_history
    participant R as model router
    participant API as DeepSeek / Gemini
    participant T as tool executor
    participant S as sanitize + split
    participant D as Discord
    U->>E: @bot what's the weather in tokyo?
    E->>E: activation check (mention / reply + AI key)
    E->>H: fetch 100 recent messages
    H->>H: filter by age, started_at, bad-msg
    H->>E: system prompt + history + current msg
    E->>R: classify message: reasoning or chat?
    R->>API: chat completion with tools
    API-->>R: content + tool_calls (web_search, play_song, ...)
    loop up to 3 rounds
        R->>T: run search tool calls
        T-->>R: search results
        R->>API: re-call with results
    end
    R->>S: final text response
    S->>S: sanitize, split at 2000 chars
    S->>D: send one or more reply messages
    T->>D: execute action tools (play_song, tempban, ...)
```
Not shown: vision routing (images go directly to Gemini 3 Flash via the OpenAI-compatible endpoint before any of this), the moderation confirmation button flow, and the typing indicator that fires every 8 seconds while the pipeline is running.
## Activation
Before any of the pipeline runs, the bot has to decide “is this for me?” That check lives in `handle_message` in `src/events/mod.rs` and is deliberately narrow. Three conditions must all hold before `handle_mention` is called:
- The message is from a non-bot author in a guild channel.
- The message either contains a direct mention of the bot user ID, or is a Discord reply to a message the bot itself sent.
- At least one of `DEEPSEEK_API_KEY` or `GEMINI_API_KEY` is set on `Data::config`.
There is no keyword trigger, no prefix alternative, no owner override.
Everything flows through @mention or reply. This keeps the activation
surface small, which matters for a bot that can issue tempbans and spend
real money on API calls.
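Condensed, the gate looks roughly like this (the function and flag names here are illustrative; the real check is inlined in `handle_message`):

```rust
/// Hypothetical condensed form of the activation check in
/// src/events/mod.rs; the real logic is inlined in handle_message.
fn should_invoke_ai(
    author_is_bot: bool,
    in_guild: bool,
    mentions_bot: bool,
    is_reply_to_bot: bool,
    has_deepseek_key: bool,
    has_gemini_key: bool,
) -> bool {
    !author_is_bot
        && in_guild
        && (mentions_bot || is_reply_to_bot) // the only two triggers
        && (has_deepseek_key || has_gemini_key) // at least one provider key
}
```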
A fourth short-circuit happens inside `handle_mention`: the per-user AI rate limiter (`RateLimiters::ai`) is a sliding window of 10 requests per 60 seconds. Users who exceed it get a “Slow down — try again in Ns.” reply, and the pipeline exits before any API call.
## History building
The AI’s memory is whatever messages the bot can reconstruct from the channel’s recent history. There is no vector store, no long-term memory, no per-user state. When the pipeline starts, `build_message_history` fetches the last 100 messages before the current one via `channel.messages(...).before(message.id).limit(100)` and walks them in reverse-chronological order.
From that window, it keeps the most recent 10 relevant messages, where “relevant” means all of the following (a sketch of this filter follows the list):
- Posted after `Data::started_at`. Bot messages from a previous process instance are filtered out, because they might be tied to state this process no longer has.
- Posted within the last 30 minutes. Anything older is stale context — the AI would start mixing up a question from an hour ago with the current one.
- Either from the bot (assistant role) or from a human who was directly talking to the bot (mentioning it or replying to a bot message). Messages that are just general channel chatter don’t get included — the bot is not trying to maintain a running summary of the whole channel.
- Not a known bad assistant message. Leaked “I’m Claude” identities, memory denials, broken tool replies, and error strings are pattern-matched via `BAD_ASSISTANT_PATTERNS` and skipped — and any user message those bad assistant messages were replying to is skipped too, on the theory that if the AI blew up on that question once, feeding it back in will make it blow up again.
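A minimal sketch of those four rules, assuming a simplified message type (field names are illustrative; the real loop in `build_message_history` also drops the user message paired with a bad assistant reply, which a per-message predicate alone can’t express):

```rust
use std::time::{Duration, SystemTime};

/// Simplified stand-in for a fetched Discord message.
struct HistoryMsg {
    timestamp: SystemTime,
    from_bot: bool,
    addressed_to_bot: bool, // mentions the bot, or replies to a bot message
    content: String,
}

fn is_relevant(msg: &HistoryMsg, started_at: SystemTime, bad_patterns: &[&str]) -> bool {
    // Rule 1: only messages from this process instance.
    let after_restart = msg.timestamp >= started_at;
    // Rule 2: only the last 30 minutes.
    let fresh = msg
        .timestamp
        .elapsed()
        .map(|age| age < Duration::from_secs(30 * 60))
        .unwrap_or(false);
    // Rule 3: bot messages, or humans talking directly to the bot.
    let on_topic = msg.from_bot || msg.addressed_to_bot;
    // Rule 4: skip known-bad assistant output.
    let not_bad = !(msg.from_bot && bad_patterns.iter().any(|p| msg.content.contains(p)));
    after_restart && fresh && on_topic && not_bad
}
```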
Bot messages whose content is empty but that carry embeds (Now Playing, Added to Queue, confirmation prompts, etc.) get their embeds summarised into a compact `[Already completed action] [title: description]` string. This is what keeps the AI from replaying the same music request every time someone @mentions it — the embed history tells the model “you already did this, move on.”
After the loop, the builder pushes a synthetic system message:

> Everything above is conversation history for context only. You have already responded to all of it. Do NOT act on any previous requests again. The NEXT message is the current request — respond ONLY to it.
This separator is a belt-and-braces guard against a real failure mode: the AI picking up an earlier question and answering it instead of the new one. Finally, the current message is appended as a user-role message, prefixed with the author’s display name.
If the current message is a Discord reply to a non-bot message, the builder fetches that referenced message, truncates it to 300 characters, and prepends a `[Replying to name: "..."]` marker to the current message. If the referenced message has image attachments, they’re collected for vision routing.
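A sketch of the marker construction (the helper name is hypothetical; the real code is inline in `build_message_history`). Note the char-based truncation, matching the splitter’s UTF-8 discipline later in the pipeline:

```rust
/// Hypothetical helper; the real code is inline in build_message_history.
fn reply_context_prefix(author_name: &str, referenced_content: &str) -> String {
    // Take at most 300 characters, never slicing a multi-byte char in half.
    let snippet: String = referenced_content.chars().take(300).collect();
    format!("[Replying to {author_name}: \"{snippet}\"] ")
}
```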
## Personality and system prompt
The personality file loaded from `CONFIG_DIR/personality.txt` is inserted verbatim into the system prompt by `get_system_prompt`. Around it, the function hard-codes:
- The current date (so the model doesn’t guess),
- The current bot version (pulled from `CARGO_PKG_VERSION`),
- A block explaining the music tools and the rules for using them (most importantly: only act on explicit current requests, never replay old ones),
- A block explaining the web search tool and the rule that the model can search up to three times per turn,
- A block explaining moderation tools and that the system does its own permission checking,
- A block covering markdown capabilities, the user-name prefix convention, and how to handle mentions in message text,
- A security block instructing the model to refuse prompt-injection attempts and to treat role markers in user text as data, not instructions.
The personality file never sees this hard-coded framing: the bot operator writes only their instance’s voice, and the bot fills in the mechanics. See Personality Files for how to write the free-form half.
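The assembly order, roughly (block contents elided; this is a sketch, not the actual `get_system_prompt` body — only the `CARGO_PKG_VERSION` detail is confirmed by the prose above):

```rust
/// Sketch of the prompt assembly; the framing text and ordering are
/// assumptions, the version macro is the documented detail.
fn get_system_prompt(personality: &str, today: &str) -> String {
    let version = env!("CARGO_PKG_VERSION"); // baked in at compile time
    format!(
        "Today's date: {today}. Bot version: {version}.\n\
         {personality}\n\
         [music tool rules]\n[web search rules]\n[moderation rules]\n\
         [markdown + mention conventions]\n[prompt-injection security block]"
    )
}
```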
## Provider selection
The bot speaks two providers, both through an OpenAI-compatible chat-completions API:
- DeepSeek at `https://api.deepseek.com/chat/completions`. `deepseek-v4-flash` (DeepSeek V4) is the default for text; `deepseek-v4-pro` is the flagship used for questions the router classifies as needing deeper thinking.
- Gemini at `https://generativelanguage.googleapis.com/v1beta/openai/chat/completions`. `gemini-3-flash-preview` handles image vision, because DeepSeek’s chat model is text-only.
Routing happens in two places. First, vision routing: if the message (or the message it replies to) has image attachments and a Gemini key is configured, the pipeline preprocesses each image (resize to 1024x1024 max, re-encode as JPEG, base64 in a data URI) and sends the history as a multimodal completion to Gemini. If Gemini fails, the pipeline strips the images and falls through to the text path.
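The preprocessing step, sketched with the `image` and `base64` crates (the bot’s actual resize filter and JPEG quality aren’t documented here, so treat those as assumptions):

```rust
use base64::Engine;
use image::ImageFormat;
use std::io::Cursor;

/// Sketch of the image preprocessing described above; filter choice and
/// encoding details are assumptions, only the 1024px bound, JPEG, and
/// data-URI shape come from the prose.
fn to_data_uri(bytes: &[u8]) -> anyhow::Result<String> {
    let img = image::load_from_memory(bytes)?;
    // Bound the longest side at 1024px, preserving aspect ratio.
    let img = img.resize(1024, 1024, image::imageops::FilterType::Triangle);
    let mut jpeg = Vec::new();
    img.write_to(&mut Cursor::new(&mut jpeg), ImageFormat::Jpeg)?;
    let b64 = base64::engine::general_purpose::STANDARD.encode(&jpeg);
    Ok(format!("data:image/jpeg;base64,{b64}"))
}
```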
Second, reasoning routing: for text requests, `classify_message` sends the user’s most recent message to DeepSeek V4 with a one-shot “yes/no — does this need deep reasoning?” prompt. If the classifier says yes, the pipeline switches the active endpoint to `deepseek-v4-pro`.
Because the reasoner role can’t use tools, the pipeline first runs a pre-flight loop on `deepseek-v4-flash` that’s allowed to call `web_search` up to `MAX_SEARCH_ROUNDS` times (currently 3), collects the results, and injects them into the V4-Pro conversation as extra system context before asking V4-Pro the real question.
If the classifier itself fails (network error, timeout), the pipeline defaults to `deepseek-v4-flash` without reasoning — “failing toward the cheap path” is the preferred failure mode.
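The decision collapses to a match in which both the “no” verdict and every error take the same branch (illustrative helper; the real decision is inline in the pipeline):

```rust
/// Illustrative routing helper mirroring the fallback rule above.
fn pick_model<E>(classifier_verdict: Result<bool, E>) -> &'static str {
    match classifier_verdict {
        Ok(true) => "deepseek-v4-pro", // classifier: needs deep reasoning
        // "no", network error, or timeout: fail toward the cheap path.
        _ => "deepseek-v4-flash",
    }
}
```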
## Tool use loop
Once an endpoint and model are picked, `call_api` posts the history with the full tool definitions attached (except for the reasoner role, which gets no tools). The response contains `content` (the assistant message), `tool_calls` (any function calls the model wants to invoke), or both.
Tools come in two flavours:
- Search tools — just `web_search` today. The model asks for a search, the bot runs it, the result goes back to the model as a `role: "tool"` message, and the model gets another turn to decide whether to search again or answer. Up to `MAX_SEARCH_ROUNDS` rounds (currently 3), after which the pipeline forces a final answer with tools disabled. The same `MAX_SEARCH_ROUNDS` constant in `src/ai/chat.rs` is interpolated into the system prompt and drives both the V4-Flash chat loop and the V4-Pro pre-flight loop, so the prompt and the code can never disagree about the limit.
- Action tools — everything that changes state: `play_song`, `skip`, `stop`, `pause`, `resume`, `show_queue`, `now_playing`, `shuffle`, `set_loop`, `remove_from_queue`, `tempban`, `unban`, `nuke`, `stock_buy`, `stock_sell`, `stock_price`, `stock_portfolio`, `stock_leaderboard`, `connections_start`, `wordle_start`.
Search tools are executed inside the loop because their results feed back into the model. Action tools are executed after the text response is posted: the bot sends the model’s witty reply, then runs the actions. This preserves the personality when actions have their own output (skip messages, Now Playing embeds, etc.) and keeps the user experience close to “bot says something, then does the thing.”
Action tools are dispatched from a single for loop in `handle_mention` that checks each call against `is_moderation_tool`, `is_stock_tool`, `is_connections_tool`, and `is_wordle_tool`, and falls through to `execute_music_tool` for the rest (see the sketch below). Moderation tools go through an extra step: a Discord confirmation embed with Approve/Cancel buttons, handled by `request_confirmation`. That function pre-checks the requesting user’s guild permissions (computed from role permissions, because `Message::member.permissions` is often `None` for fetched messages), posts the confirmation embed, waits up to 30 seconds for the original author to click, and returns the approval status. Only then does the moderation action run. Other action tools run without confirmation — permissions are enforced inside each tool by the DJ mode check or by Discord’s own permissions.
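Condensed, the dispatch shape looks like this (the predicate names come from the prose above; the bodies are stubs, and the real loop is async and threads Discord context through each executor):

```rust
struct ToolCall {
    name: String,
}

fn is_moderation_tool(name: &str) -> bool {
    matches!(name, "tempban" | "unban" | "nuke")
}
fn is_stock_tool(name: &str) -> bool {
    name.starts_with("stock_")
}
fn is_connections_tool(name: &str) -> bool {
    name == "connections_start"
}
fn is_wordle_tool(name: &str) -> bool {
    name == "wordle_start"
}

fn dispatch(calls: &[ToolCall]) {
    for call in calls {
        if is_moderation_tool(&call.name) {
            // Gated by request_confirmation's Approve/Cancel embed first.
            println!("moderation tool: {}", call.name);
        } else if is_stock_tool(&call.name) {
            println!("stock tool: {}", call.name);
        } else if is_connections_tool(&call.name) || is_wordle_tool(&call.name) {
            println!("game tool: {}", call.name);
        } else {
            // Everything else falls through to execute_music_tool.
            println!("music tool: {}", call.name);
        }
    }
}
```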
## DSML: tool calls in prose
DeepSeek V4 sometimes emits tool calls as structured text inside the `content` field instead of the proper OpenAI-style `tool_calls` array. The bot handles this by parsing a custom “DSML” (Discord Structured Message Language) block out of the content — fullwidth pipe characters wrapping `<|DSML|invoke name="...">...</|DSML|/invoke>` — in `parse_dsml`. Any DSML tool calls found are appended to the real tool call list, and the content is cleaned up before being shown to the user. This is a resilience hack for model quirks; the primary path is still proper function calling.
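As a very rough illustration of the recovery step (the real `parse_dsml` handles the fullwidth-pipe delimiters and the argument bodies, which this regex deliberately glosses over):

```rust
use regex::Regex;

/// Pull invoke names out of a content string; a loose approximation of
/// parse_dsml, for illustration only.
fn extract_invoke_names(content: &str) -> Vec<String> {
    let re = Regex::new(r#"invoke name="([^"]+)""#).unwrap();
    re.captures_iter(content).map(|c| c[1].to_string()).collect()
}
```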
## Response sanitising
Two kinds of cleaning happen to AI-adjacent text.
**Input sanitising** is applied to every piece of user text that gets added to the history. Implemented in `sanitize_content`, it rewrites `system:`, `assistant:`, and `user:` role markers into bracketed forms (`[system]:`), strips DeepSeek’s internal `<|...|>` tokens, and strips Llama-style `[INST]` and `<SYS>` markers. The point is not to block every conceivable prompt injection — that’s impossible — but to make it harder to slip a realistic-looking “new system prompt” into the model’s conversation by typing one into Discord.
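An approximation of those rewrites (the real `sanitize_content` presumably anchors its patterns more carefully; this naive version would also rewrite mid-sentence occurrences):

```rust
use regex::Regex;

/// Loose approximation of sanitize_content, for illustration.
fn sanitize_content(text: &str) -> String {
    let mut out = text.to_string();
    // Defuse role markers so they read as data, not chat-format control lines.
    for role in ["system", "assistant", "user"] {
        out = out.replace(&format!("{role}:"), &format!("[{role}]:"));
    }
    // Strip DeepSeek internal <|...|> tokens.
    out = Regex::new(r"<\|[^|]*\|>")
        .unwrap()
        .replace_all(&out, "")
        .into_owned();
    // Strip Llama-style instruction markers.
    for marker in ["[INST]", "[/INST]", "<SYS>", "</SYS>"] {
        out = out.replace(marker, "");
    }
    out
}
```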
**Output filtering** happens at history-build time. Past bot messages that match known failure patterns (“I’m Claude”, “I don’t have access to our previous”, “created by Anthropic”, “Failed to join”, etc.) are skipped when reconstructing the history, and so are any user messages those bad bot messages were replying to. This is a self-healing mechanism: if the model goes off the rails once, the next turn won’t see the broken exchange and is less likely to repeat it.
## Response splitting
Discord messages max out at 2000 characters. The splitter in `src/ai/split.rs` takes a raw response string and returns a `Vec<String>` of chunks, each under the limit. Simple cases (a response under 2000 chars) return a single-element vec. For long responses, the splitter walks forward looking for the best break point:
- If the current chunk ends inside a fenced code block (an odd number of `` ``` `` markers), the splitter finds the opening fence and either splits just before it (when the fence doesn’t sit right at the top of the chunk) or closes it with `` ``` `` and re-opens it with `` ```lang `` in the next chunk, preserving syntax highlighting.
- Otherwise, it prefers breaking on `\n\n`, then `\n`, then `". "`, in that order. The split point has to be at least 200 bytes into the chunk to avoid pathological tiny slices.
All slicing is done at char boundaries (UTF-8 safety), not byte boundaries, so multi-byte characters don’t get cut in half.
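A minimal sketch of that preference order with char-boundary safety (byte lengths stand in for Discord’s character limit here, and the code-fence handling is omitted; the real splitter lives in `src/ai/split.rs`):

```rust
/// Simplified splitter sketch: preference order \n\n, \n, ". ",
/// with a minimum break offset and UTF-8-safe slicing.
fn split_message(text: &str, limit: usize) -> Vec<String> {
    const MIN_BREAK: usize = 200; // avoid pathological tiny slices
    let mut chunks = Vec::new();
    let mut rest = text;
    while rest.len() > limit {
        // Largest prefix within the limit that ends on a char boundary.
        let mut end = limit;
        while !rest.is_char_boundary(end) {
            end -= 1;
        }
        let window = &rest[..end];
        // Prefer paragraph breaks, then line breaks, then sentence ends.
        let cut = ["\n\n", "\n", ". "]
            .into_iter()
            .find_map(|sep| {
                window
                    .rfind(sep)
                    .filter(|&i| i >= MIN_BREAK)
                    .map(|i| i + sep.len())
            })
            .unwrap_or(end);
        chunks.push(rest[..cut].to_string());
        rest = &rest[cut..];
    }
    chunks.push(rest.to_string());
    chunks
}
```

Called with a limit of 2000, this yields the chunk list the reply loop sends one message at a time.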
## Rate limiting
Rate limiting for the AI path is a per-user sliding window configured in `src/util/ratelimit.rs`: 10 requests per 60 seconds, shared across every AI interaction. It’s enforced in `handle_mention` before any API call. The other limiters — music, moderation, stocks, and welcome — are enforced on their respective paths; see the Concurrency Model rate-limiter section for the full table and the periodic bucket-cleanup task that keeps the limiter maps from growing without bound.
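The window itself is simple; a self-contained sketch (the real limiter in `src/util/ratelimit.rs` is keyed per user inside a concurrent map, which this omits):

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Illustrative sliding-window limiter: 10 requests per 60 seconds.
struct SlidingWindow {
    hits: VecDeque<Instant>,
    max: usize,
    window: Duration,
}

impl SlidingWindow {
    fn new() -> Self {
        Self { hits: VecDeque::new(), max: 10, window: Duration::from_secs(60) }
    }

    /// Ok(()) if the request is allowed; Err(wait) with the time until the
    /// oldest hit ages out of the window.
    fn check(&mut self, now: Instant) -> Result<(), Duration> {
        // Evict hits that have aged out of the window.
        while let Some(&front) = self.hits.front() {
            if now.duration_since(front) >= self.window {
                self.hits.pop_front();
            } else {
                break;
            }
        }
        if self.hits.len() < self.max {
            self.hits.push_back(now);
            Ok(())
        } else {
            Err(self.window - now.duration_since(self.hits[0]))
        }
    }
}
```

The `Err(wait)` duration is what would feed the “try again in Ns” reply.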
Rate limiting at the API layer (DeepSeek / Gemini quotas) is the provider’s responsibility; the bot doesn’t pre-check quotas and relies on the API’s own error responses.
## Error handling inside the pipeline
Each layer has its own fallback:
- Classifier fails → default to the `deepseek-v4-flash` (non-reasoner) path.
- Vision API fails → strip images and fall through to text.
- Text API fails → reply with “Something went wrong talking to the AI. Try again in a sec.” and log the upstream error with `tracing`.
- “Content Exists Risk” censored response from DeepSeek → reply with a sarcastic “my overlords at DeepSeek won’t let me talk about that.”
- Search tool fails → inject “Search failed.” as the tool result and let the model continue with whatever it has.
- Tool call with bad arguments → the tool executors generally `unwrap_or(...)` past missing fields rather than erroring, because the user has already waited for the model and a silent no-op is better than a red error string.
- Tool dispatch / DB / HTTP failures inside a tool → all user-facing replies in `handle_mention`’s tool loop use the same generic, sanitised wording as `BotError::user_message()` (“Something went wrong talking to the database. Please try again later.”, etc.). Operators still see the full upstream error via `tracing::error!` with the failing tool name and guild ID, but raw `sqlx`/`reqwest`/`serde_json` strings never reach Discord. See Error Handling for the mapping table.
The typing indicator is re-triggered every 8 seconds by a spawned background task, so users see the bot “thinking” for the whole duration of a slow tool-use loop. That task is aborted on every exit path (`typing_handle.abort()`) to keep it from leaking past the end of the conversation.
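A sketch of that keep-alive, assuming serenity’s `ChannelId::broadcast_typing` and a tokio runtime (the real task lives in `handle_mention`):

```rust
use serenity::http::Http;
use serenity::model::id::ChannelId;
use std::sync::Arc;
use std::time::Duration;
use tokio::task::JoinHandle;

/// Spawn a loop that refreshes the typing indicator every 8 seconds.
fn spawn_typing(http: Arc<Http>, channel_id: ChannelId) -> JoinHandle<()> {
    tokio::spawn(async move {
        loop {
            // Discord's typing indicator expires after ~10 s; refreshing
            // every 8 s keeps it from visibly dropping mid-pipeline.
            let _ = channel_id.broadcast_typing(&http).await;
            tokio::time::sleep(Duration::from_secs(8)).await;
        }
    })
}
```

The returned `JoinHandle` is what every exit path aborts.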
## Known issues
- **Context bleed.** The 10-message / 30-minute window is a compromise: shorter would drop useful context; longer pulls in stale questions the AI wants to answer. Users occasionally see the AI start answering a question from 20 minutes ago when they @mention it with something new. The self-healing filter helps but doesn’t eliminate it.
- **Permission checks on `message.member`.** The confirmation flow has to recompute permissions from the guild’s role table because `message.member.permissions` is often `None` for messages fetched via the API. This is a serenity quirk, not a design choice.
- **`rmcp` session auth.** See MCP Gateway Routing for the current state of MCP authentication.
## Cross-links
- AI Chat — user-facing feature description.
- Personality Files — how to write `personality.txt`.
- Concurrency Model — the rate limiter and `DashMap` patterns the pipeline leans on.
- MCP Tool Catalog — the separate tool surface exposed via MCP, not via the AI pipeline.