Designing for AI Uncertainty
AI systems are probabilistic, not deterministic. This guide covers how to design interfaces that honestly communicate uncertainty, set correct expectations, and maintain user trust when the model is wrong.
What is it?
AI systems produce probabilistic outputs — they generate likely answers, not certain ones. Unlike traditional software where every output is deterministic, AI outputs vary in accuracy, confidence, and consistency. Designing for AI uncertainty means building interfaces that communicate this honestly, help users calibrate their trust appropriately, and gracefully handle cases where the model is wrong or unsure.
Why it matters
Failing to communicate AI uncertainty causes two critical problems: over-trust (users rely on incorrect outputs without verification) and under-trust (users distrust a system that is actually reliable). Both destroy the value of the product. Research from Google Research and Microsoft shows that users who are shown calibrated confidence signals make better decisions with AI than users shown outputs without any confidence information.
Best Practices
- Communicate what the AI is and is not capable of. Set expectation boundaries clearly during onboarding and at the point of output.
- Use confidence signals where outputs vary meaningfully in accuracy. Distinguish between high-confidence and low-confidence results visually.
- Design clear fallback paths for low-confidence scenarios: "I'm not sure — here are some alternatives" or "Let me connect you with a human."
- Never present AI output as fact without appropriate hedging language: "Based on available information," "This may not be complete," "Results may vary."
- Give users the ability to verify, correct, or provide feedback on AI outputs. Editable outputs are far safer than read-only outputs.
- Show the reasoning or sources when possible (explainability). Users who understand why an output was generated are better equipped to evaluate it.
- Design for the error recovery path first, not as an afterthought. What does the user do when the AI is wrong?
- Avoid AI completeness theater — don't show a confident, authoritative output for tasks where the model's accuracy is genuinely limited.
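One way to picture the confidence-signal and fallback practices above is a small routing function that decides how an answer is shown. This is only a sketch: the thresholds, the `Presentation` type, and the `present` function are assumptions made for illustration, and it presumes the model exposes a reasonably calibrated confidence score between 0 and 1.

```python
from dataclasses import dataclass

# Hypothetical thresholds; tune them against measured accuracy for your own model.
HIGH_CONFIDENCE = 0.85
LOW_CONFIDENCE = 0.50


@dataclass
class Presentation:
    tier: str                # "high", "medium", or "fallback"
    text: str                # what the interface shows
    show_alternatives: bool  # whether to render the alternatives list
    offer_human: bool        # whether to offer a handoff to a person


def present(answer: str, confidence: float, alternatives: list[str]) -> Presentation:
    """Choose how to display an answer given a confidence score in [0, 1]."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")

    if confidence >= HIGH_CONFIDENCE:
        # High confidence: show the answer plainly (it should still be editable).
        return Presentation("high", answer, False, False)

    if confidence >= LOW_CONFIDENCE:
        # Medium confidence: hedge the wording and surface alternatives if any exist.
        hedged = f"Based on available information: {answer} (this may not be complete)"
        return Presentation("medium", hedged, bool(alternatives), False)

    # Low confidence: do not assert the answer; fall back to options and a human.
    if alternatives:
        text = "I'm not sure. Here are some alternatives: " + "; ".join(alternatives)
    else:
        text = "I'm not sure. Let me connect you with a human."
    return Presentation("fallback", text, bool(alternatives), True)
```

Keeping the hedging in the medium tier only, rather than on every output, is a deliberate choice in this sketch: it reserves caveats for the cases where they carry information.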
Common Mistakes
- Presenting AI-generated text with the same visual authority as factual, verified content.
- Hiding or downplaying error rates. Users who don't know failure rates cannot appropriately calibrate trust.
- Using confident, declarative language for speculative or low-confidence outputs: "Your report shows..." when accuracy is ~60%.
- No mechanism for users to correct or report wrong outputs — silent failures erode trust invisibly.
- Over-caveating every output until the product becomes unusable — excessive disclaimers cause users to ignore them entirely.
- Assuming users will always evaluate AI outputs critically — most users accept what a confident-looking interface presents.
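Two of the mistakes above, hidden error rates and missing correction mechanisms, can be addressed with the same piece of plumbing. The following is a minimal sketch, assuming a hypothetical `FeedbackLog` class and three illustrative outcome labels; it is not taken from any particular product.

```python
from collections import Counter
from typing import Optional

# Hypothetical outcome labels for this sketch.
OUTCOMES = {"accepted", "corrected", "reported"}


class FeedbackLog:
    """Collects user verdicts on AI outputs so error rates can be shown, not hidden."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def record(self, outcome: str) -> None:
        """Store one user verdict on one AI output."""
        if outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {outcome!r}")
        self._counts[outcome] += 1

    def observed_error_rate(self) -> Optional[float]:
        """Share of rated outputs that users corrected or reported as wrong."""
        total = sum(self._counts.values())
        if total == 0:
            return None  # no evidence yet; do not display a made-up number
        wrong = self._counts["corrected"] + self._counts["reported"]
        return wrong / total

    def disclosure(self) -> str:
        """Plain-language accuracy note suitable for showing next to outputs."""
        rate = self.observed_error_rate()
        if rate is None:
            return "Not enough feedback yet to estimate accuracy."
        return f"Users corrected or reported about {rate:.0%} of rated outputs."
```

A rate built from user feedback likely understates the true error rate, since errors that nobody notices are never reported, so it is best treated as a floor rather than a measurement.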
Research & Theory
Calibration and Trust in AI (Yin et al., 2019)
Research showing that users who are shown AI confidence scores make more accurate decisions than users who see outputs without confidence information — but only when the confidence scores are well-calibrated.
Why it's relevant
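Showing confidence only helps if the confidence is accurate. Poorly calibrated confidence signals (for example, always showing 90%+ regardless of actual accuracy) can be worse than no confidence signal at all, because they invite trust the model has not earned.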
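Whether a confidence signal is well calibrated can be checked before it is shown to users. One common measure is expected calibration error (ECE), which compares stated confidence with observed accuracy inside confidence bins. The sketch below is a plain-Python illustration; the function name, the equal-width binning, and the default of 10 bins are assumptions for this example, not something the cited research prescribes.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between stated confidence and observed accuracy.

    confidences: floats in [0, 1], one per prediction.
    correct: booleans, True where the prediction turned out to be right.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("at least one prediction is required")

    # Group predictions into equal-width confidence bins.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[index].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(conf for conf, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        # Each bin contributes its gap, weighted by its share of all predictions.
        ece += (len(bucket) / total) * abs(avg_conf - accuracy)
    return ece
```

For a model that always reports 0.9 but is right 6 times out of 10, this returns about 0.3; a model that reports 0.6 for the same results scores about 0. A large value is a sign that the confidence display needs recalibration before it goes in front of users.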
Automation Bias (Parasuraman & Manzey, 2010)
Users systematically over-trust automated systems, especially when the system appears authoritative. They verify AI outputs less often than they should.
Why it's relevant
The default human behavior when using AI is to trust it too much. Design for this by building in verification steps, uncertainty signals, and editable outputs.
Human-AI Interaction Guidelines (Microsoft, 2019)
A set of 18 guidelines for human-AI interaction design, including: make clear what the system can and cannot do, make clear why the system did what it did, support efficient correction.
Why it's relevant
One of the most comprehensive published frameworks for AI UX design. The guidelines apply directly to most AI products and map closely onto the practices in this guide.
Real-World Examples
GitHub Copilot
Code is presented as a suggestion to accept or reject, not as verified-correct code. Multiple alternatives are offered (via cycling). The expectation that the developer will evaluate each suggestion is built into the workflow.
Google Search Generative Experience
AI-generated summaries are visually distinct from web results and include citations. Users can expand to see sources. Hedging language is used for uncertain topics.
Notion AI
AI-generated content is inserted into a document in a clearly styled "draft" state. Users must explicitly accept, edit, or discard the output, so nothing is silently injected into permanent content.
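The accept, edit, or discard pattern can be modeled as a small state machine in which AI output cannot reach permanent content without an explicit user action. This is an illustrative sketch only; `AIDraft`, `DraftState`, and `commit` are hypothetical names and do not describe how Notion or any other product is implemented.

```python
from dataclasses import dataclass
from enum import Enum


class DraftState(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    DISCARDED = "discarded"


@dataclass
class AIDraft:
    text: str
    state: DraftState = DraftState.PENDING
    edited: bool = False  # lets the UI distinguish raw from user-corrected output

    def _require_pending(self) -> None:
        if self.state is not DraftState.PENDING:
            raise RuntimeError(f"draft already {self.state.value}")

    def edit(self, new_text: str) -> None:
        """Let the user correct the output before deciding what to do with it."""
        self._require_pending()
        self.text = new_text
        self.edited = True

    def accept(self) -> None:
        self._require_pending()
        self.state = DraftState.ACCEPTED

    def discard(self) -> None:
        self._require_pending()
        self.state = DraftState.DISCARDED


def commit(document: list[str], draft: AIDraft) -> None:
    """Append draft text to the document only after explicit acceptance."""
    if draft.state is not DraftState.ACCEPTED:
        raise PermissionError("AI draft must be explicitly accepted before it is saved")
    document.append(draft.text)
```

Calling `commit` on a pending or discarded draft raises an error, which makes silent injection a bug that surfaces immediately instead of a behavior that slips through.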