Error Handling
This page describes how errors flow through the bot: where they start,
where they get turned into user-visible messages, and where they’re
logged and swallowed. The design is deliberately minimal — one error
type, one on_error hook, and a handful of rules about who panics and
who doesn’t.
The BotError type
Every fallible function in this codebase returns Result<T, BotError>,
where BotError is a plain hand-written enum defined in
src/error.rs:
#[derive(Debug)]
pub enum BotError {
    Serenity(serenity::Error),
    Sqlx(sqlx::Error),
    Reqwest(reqwest::Error),
    SerdeJson(serde_json::Error),
    Other(String),
}
Each of the first four variants wraps one upstream error type. The
fifth variant, Other, is an escape hatch for ad-hoc string errors that
don’t correspond to a specific upstream:
BotError::Other("Not in a guild".into()) is a common pattern when a
command can’t proceed because of a missing argument or a precondition
failure.
Conversions live right next to the definition as From impls:
impl From<serenity::Error> for BotError { ... }
impl From<sqlx::Error> for BotError { ... }
impl From<reqwest::Error> for BotError { ... }
impl From<serde_json::Error> for BotError { ... }
impl From<String> for BotError { ... }
These From impls are the reason command handlers can use ?
everywhere. let expires_at = create_tempban(...).await?; turns a
sqlx::Error into a BotError::Sqlx and bubbles up, without the
handler having to know what create_tempban can fail with. The enum
implements std::error::Error and Display, so errors also format
sensibly when logged.
There’s no thiserror, no anyhow, no derived From. The enum is
small enough that hand-writing the impls is cleaner than pulling in a
macro dependency, and the explicitness makes it obvious what kinds of
errors the bot actually handles.
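To make the hand-written approach concrete, here is a minimal sketch of what one such conversion and its use with ? might look like. It is not the contents of src/error.rs: std::num::ParseIntError stands in for an upstream type such as sqlx::Error so the example builds with the standard library alone, and the Parse variant, the Display wording, the source() impl, and the parse_duration_secs helper are all assumptions made for illustration.

```rust
use std::fmt;
use std::num::ParseIntError;

// Stand-in for the real enum: `ParseIntError` plays the role of an upstream
// error such as `sqlx::Error`, so the sketch compiles with std only.
#[derive(Debug)]
pub enum BotError {
    Parse(ParseIntError), // hypothetical variant, not in the real enum
    Other(String),
}

// One hand-written conversion per upstream type.
impl From<ParseIntError> for BotError {
    fn from(e: ParseIntError) -> Self {
        BotError::Parse(e)
    }
}

impl From<String> for BotError {
    fn from(s: String) -> Self {
        BotError::Other(s)
    }
}

// Verbose, operator-facing form (the wording here is assumed).
impl fmt::Display for BotError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            BotError::Parse(e) => write!(f, "Parse error: {e}"),
            BotError::Other(s) => write!(f, "{s}"),
        }
    }
}

impl std::error::Error for BotError {
    // Exposes the wrapped upstream error to callers that walk the chain.
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            BotError::Parse(e) => Some(e),
            BotError::Other(_) => None,
        }
    }
}

// Hypothetical helper: `?` calls `BotError::from` on the `ParseIntError`,
// so the caller never has to name the upstream error type.
pub fn parse_duration_secs(raw: &str) -> Result<u64, BotError> {
    let secs: u64 = raw.trim().parse()?;
    if secs == 0 {
        // `From<String>` covers the ad-hoc precondition failure.
        return Err(String::from("Duration must be positive").into());
    }
    Ok(secs)
}
```

The point of the sketch is the last function: neither failure path mentions a BotError variant by name, because both conversions go through From.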
Where errors are surfaced
Command errors. Poise ties every handler’s return value to its
framework error hook. A command returning Err(BotError::Sqlx(...))
or Err(BotError::Other("...".into())) raises a
FrameworkError::Command, which the hook turns into a user-facing
reply and a tracing log. That hook lives in main.rs and now uses
BotError::user_message() to keep operator-only details out of
chat:
on_error: |error| Box::pin(async move {
    match error {
        poise::FrameworkError::Command { error, ctx, .. } => {
            // Full error (including upstream sqlx/reqwest text) goes
            // to logs only.
            tracing::error!("Command error: {error}");
            // Sanitised, per-variant copy goes to the user.
            let _ = ctx.say(error.user_message()).await;
        }
        other => {
            tracing::error!("Framework error: {other}");
        }
    }
})
The split between Display and user_message() is deliberate:
- Display still produces the verbose form ("Database error: <sqlx message>", "HTTP error: <reqwest message>", …) and is what gets logged. It carries every byte of upstream context an operator might want to grep for.
- user_message() returns a fixed, generic per-variant string (“Something went wrong talking to the database. Please try again later.”, “Couldn’t reach an external service. Please try again.”, …), except for Other(s), which is treated as already-curated copy and passed through verbatim. That last case is what lets short messages like "Not in a guild" and validation errors still surface naturally.
The user sees the short, friendly form. The operator sees the full upstream chain in the logs and can correlate by timestamp.
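The Display / user_message() split can be sketched roughly as follows. This is an illustration under stated assumptions, not the real implementation: the variants carry plain String payloads instead of the actual serenity/sqlx/reqwest/serde_json errors so the example builds with std only, the String return type is a guess, and the copy for the Serenity and SerdeJson variants (both the log prefix and the user-facing text) is invented. Only the database and HTTP strings and the pass-through of Other come from this page.

```rust
use std::fmt;

// Stand-in enum: the real variants wrap upstream error types. Plain strings
// are used here so the sketch has no external dependencies.
#[derive(Debug)]
pub enum BotError {
    Serenity(String),
    Sqlx(String),
    Reqwest(String),
    SerdeJson(String),
    Other(String),
}

// Verbose form: this is what ends up in the logs.
impl fmt::Display for BotError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            BotError::Serenity(e) => write!(f, "Discord error: {e}"), // assumed prefix
            BotError::Sqlx(e) => write!(f, "Database error: {e}"),
            BotError::Reqwest(e) => write!(f, "HTTP error: {e}"),
            BotError::SerdeJson(e) => write!(f, "JSON error: {e}"), // assumed prefix
            BotError::Other(s) => write!(f, "{s}"),
        }
    }
}

impl BotError {
    // Sanitised form: fixed copy per variant, never the upstream text.
    pub fn user_message(&self) -> String {
        match self {
            BotError::Sqlx(_) => {
                "Something went wrong talking to the database. Please try again later."
                    .to_string()
            }
            BotError::Reqwest(_) => {
                "Couldn't reach an external service. Please try again.".to_string()
            }
            // Assumed copy: the page does not quote these two variants.
            BotError::Serenity(_) | BotError::SerdeJson(_) => {
                "Something went wrong. Please try again later.".to_string()
            }
            // Already-curated text is passed through verbatim.
            BotError::Other(s) => s.clone(),
        }
    }
}
```

With this shape, a BotError::Sqlx built from a message that names a table formats that name through Display but never through user_message(), while BotError::Other("Not in a guild".into()) reads the same in both.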
Other FrameworkError variants (permission denied, argument parse
failure, missing subcommand) are logged but not replied to. Poise’s
default behaviour is to post a short notice for some of these; the
current hook deliberately stays silent, because these failures are
rare and usually only interesting to operators.
Event handler errors. Event handlers (message handler, voice
state, component interactions, member join) don’t return Result in
the usual sense. The top-level event_handler function returns
Result<(), BotError>, but it never actually returns Err — every
sub-handler uses let _ = ... to swallow individual errors and
continues. This is because an event handler has no good place to post
an error: the “user” who triggered the event might be a raw gateway
event like a voice state update, not a chat message, so there’s
nothing to reply to.
Instead, event handlers emit tracing::error! or tracing::warn!
at the site of the failure. For example, the auto-role promotion
path inside handle_message spawns a task that logs via
tracing::warn!("Auto-role promotion failed for {}: {}", ...). The
user sees nothing; the operator sees the error in the logs.
Background task errors. main.rs spawns several long-running
tasks (rate-limiter cleanup, tempban unban sweep, auto-role
time-based check, donator sync). Each iteration body runs inside the
run_supervised helper, which wraps it in
AssertUnwindSafe(...).catch_unwind(). A panic inside one iteration
is caught and logged via tracing::error! with the task name and
panic payload, and the outer loop continues to the next iteration —
“a background task should never take the bot down” is now enforced
by the wrapper, not just by convention. Recoverable errors inside the
body still use the same tracing::warn! / tracing::error! pattern
and continue. See
Concurrency Model: background task supervision
for the JoinSet plumbing and graceful-shutdown story.
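The per-iteration panic isolation described above can be illustrated with a synchronous analogue. The real run_supervised wraps a future and uses the async catch_unwind combinator; the sketch below swaps in std::panic::catch_unwind on a plain closure so it runs without Tokio or futures. The names run_supervised_sync, panic_message, and demo_loop are hypothetical, as is the "tempban-sweep" task label, and the returned Err string stands in for the tracing::error! call.

```rust
use std::any::Any;
use std::panic::{catch_unwind, AssertUnwindSafe};

// Best-effort extraction of a readable message from a panic payload.
fn panic_message(payload: &(dyn Any + Send)) -> String {
    if let Some(s) = payload.downcast_ref::<&str>() {
        (*s).to_string()
    } else if let Some(s) = payload.downcast_ref::<String>() {
        s.clone()
    } else {
        "non-string panic payload".to_string()
    }
}

// Synchronous stand-in for `run_supervised`: runs one iteration body and
// reports a panic instead of letting it unwind into the caller's loop.
// The `Err` string is where the real helper would call `tracing::error!`.
pub fn run_supervised_sync<F: FnOnce()>(task: &str, body: F) -> Result<(), String> {
    catch_unwind(AssertUnwindSafe(body))
        .map_err(|payload| format!("{task} panicked: {}", panic_message(&*payload)))
}

// Hypothetical loop: iteration 1 panics, the other iterations still run.
pub fn demo_loop() -> (u32, Vec<String>) {
    let mut completed = 0;
    let mut logged = Vec::new();
    for i in 0..3 {
        let result = run_supervised_sync("tempban-sweep", || {
            if i == 1 {
                panic!("boom on iteration {i}");
            }
            completed += 1;
        });
        if let Err(msg) = result {
            logged.push(msg); // stand-in for the log line
        }
    }
    (completed, logged)
}
```

AssertUnwindSafe is needed because the closure captures mutable state; it is a promise that the state is still usable after a panic, which holds here because an iteration either completes its increment or does nothing.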
The AI pipeline. handle_mention doesn’t return a Result at
all. It uses pattern matching and explicit return statements to
exit on failure paths, and posts its own user-visible messages for
things like “Something went wrong talking to the AI.” This is by
design: the pipeline has too many recoverable states (classifier
failure, vision failure, censored response, search failure) for the
? operator to express naturally, so it handles each one explicitly.
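A rough sketch of that control-flow style, with every name and fallback rule invented for illustration: StageResults bundles pre-computed stage outcomes so no network call is needed, replies stands in for message.reply(...), and eprintln! stands in for tracing. Which stages the real pipeline treats as recoverable, and how, is not specified on this page; the sketch only shows why per-stage match arms express this better than ?.

```rust
// Hypothetical inputs: each field is the already-computed outcome of one
// stage, so only the control flow is on display.
pub struct StageResults {
    pub needs_search: Result<bool, String>, // classifier
    pub search: Result<String, String>,     // web search
    pub completion: Result<String, String>, // model response
}

// Sketch of the control flow only; note there is no `Result` return type.
pub fn handle_mention_sketch(stages: StageResults, replies: &mut Vec<String>) {
    // Assumed rule: a classifier failure is recoverable, fall back to "no search".
    let needs_search = match stages.needs_search {
        Ok(flag) => flag,
        Err(e) => {
            eprintln!("classifier failed: {e}");
            false
        }
    };

    // Assumed rule: a search failure is recoverable, answer without context.
    let context = if needs_search {
        match stages.search {
            Ok(hits) => Some(hits),
            Err(e) => {
                eprintln!("search failed: {e}");
                None
            }
        }
    } else {
        None
    };

    // A failed completion is terminal: tell the user and exit explicitly.
    let answer = match stages.completion {
        Ok(text) => text,
        Err(e) => {
            eprintln!("completion failed: {e}");
            replies.push("Something went wrong talking to the AI.".to_string());
            return;
        }
    };

    match context {
        Some(ctx) => replies.push(format!("{answer} (sources: {ctx})")),
        None => replies.push(answer),
    }
}
```

With ?, the first Err would abort the whole function; here two of the three failures degrade the answer instead of cancelling it, and only the last one produces the generic user-facing message.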
The tool dispatch paths used to compose user-facing replies as
message.reply(format!("Database error: {e}")) (and similar for
reqwest/serde_json/MCP errors), which leaked the same internal
detail the command path now hides. Those reply sites have all been
swept to use the same generic copy as BotError::user_message(),
with the full upstream error logged via tracing::error! carrying
the failing tool name and guild ID for operator diagnostics.
Debug vs production
What gets logged and what gets shown to users is split on purpose:
- Logs (tracing::error!, tracing::warn!): the full Display form of BotError, including every byte of upstream context. With the default tracing_subscriber::fmt::init() setup in main.rs these are written to stdout (the default writer for tracing_subscriber::fmt), where whatever log aggregator the operator has set up can collect them. Set RUST_LOG to raise or lower the level.
- User messages (ctx.say(...), message.reply(...)): the output of BotError::user_message(). A failed DB query becomes "Something went wrong talking to the database. Please try again later."; a flaky upstream API becomes "Couldn't reach an external service. Please try again.". The user knows something is broken and that retrying is reasonable, but no SQL fragment, hostname, or serde path leaks to chat.
The split matters because upstream error messages can contain
information that shouldn’t be in Discord — file paths, internal
hostnames, table names, partial stack traces from dependencies. The
old format!("Error: {e}") path leaked all of that whenever a
handler bubbled up a BotError::Sqlx or BotError::Reqwest; the
user_message() mapping closes that gap. The same sweep removed the
BotError::Other(format!("...{e}")) pattern from the AI tool
dispatch paths, so a failing DeepSeek call can’t smuggle a raw HTTP
body into chat by way of Other.
Panics
Panics are reserved for one specific case: startup config validation.
The get_env_or_throw helper in
src/config.rs
panics if a required environment variable is missing or contains a
placeholder value:
fn get_env_or_throw(key: &str) -> String {
    let val = env::var(key).unwrap_or_else(|_| panic!("{key} must be set in .env"));
    if val.starts_with("your-") {
        panic!("{key} has placeholder value — set it in .env");
    }
    val
}
This is used for DISCORD_TOKEN, CLIENT_ID, and GUILD_ID — the
three variables without which the bot literally cannot connect. A
missing value there is a deployment error that the operator needs to
see immediately, in the clearest possible way, before the process
starts doing real work. The panic message ends up in the process
output and the operator fixes it.
The database_url, MCP bind config, and AI API keys do not panic.
They fall back to defaults or stay unset, and the features that need
them either disable themselves or warn at first use.
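For contrast with get_env_or_throw, a non-panicking lookup could look roughly like the sketch below. The helper names optional_env and env_or_default are hypothetical and not taken from src/config.rs, and treating an empty value as unset is an assumption.

```rust
use std::env;

// Hypothetical helper: returns the variable if it is set and non-blank.
// Treating a blank value as "unset" is an assumption of this sketch.
pub fn optional_env(key: &str) -> Option<String> {
    env::var(key).ok().filter(|v| !v.trim().is_empty())
}

// Hypothetical helper: falls back to a default instead of panicking.
pub fn env_or_default(key: &str, default: &str) -> String {
    optional_env(key).unwrap_or_else(|| default.to_string())
}
```

A feature that needs an optional key can then match on optional_env(...) and disable itself or warn on None, which keeps the panic path reserved for the three required variables.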
Optional config is never a panic. When config.toml has a
feature enabled but its corresponding [feature_name] section is
missing, main.rs warns through tracing::warn!(...) and skips the
feature. For example:
let auto_role_config = if instance_cfg.features.auto_role {
    match &instance_cfg.auto_role {
        Some(cfg) => { /* log, enable */ Some(cfg.clone()) }
        None => {
            tracing::warn!("Auto-role feature enabled but [auto_role] config section missing");
            None
        }
    }
} else {
    None
};
The same pattern repeats for the minecraft donator-sync config, the chargeback config, the join-role config, and the welcome prompt file. Missing optional config is always a warning plus a disabled feature, never a crash. Operators can ship a bot with half its features half-configured and it’ll still start — the log just tells them what they missed.
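Runtime panics elsewhere (inside a command handler, an event handler, or a background task) are considered bugs. If one happens inside a spawned task, Tokio catches the panic at the task boundary instead of letting it take down the runtime, Rust’s default panic hook prints the message to stderr, and the rest of the runtime keeps going. Background task iterations are additionally covered by run_supervised, which logs the panic through tracing::error!. The user whose command triggered the panic sees nothing, which is unpleasant but better than the process exiting.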
Cross-links
- Data Flow — the shape of the call chain that produces these errors in the first place.
- Debugging — how to read the logs and track a failure back to its source.
- Environment Variables — the required variables whose absence produces a startup panic.