AI Product Design | Rookie Tutorial
\\n\\nIf you design AI features like ordinary software features, the result will likely disappoint users.
\\n\\n- \\n
- Ordinary software is deterministic—you click save, it saves, and it's the same every time. \\n
- AI products are probabilistic—you ask it to write copy, it does well this time, but may not next time. \\n
When ordinary software makes an error, it shows an error message. When AI products make an error, they may confidently talk nonsense, making you think they're right.
\\n\\nThis fundamental difference determines that AI product design requires a completely new approach.
\\n\\n\\n\\n\\nTraditional product design pursues zero defects, where users get the same result every time. AI product design must accept probabilistic output and help users understand and deal with the uncertainty of results.
\\n
\\n\\n
AI Product Thinking
\\n\\nTo design good AI products, you must first understand the uniqueness of AI and the psychological changes it brings to users.
\\n\\nProbabilistic Output
\\n\\nThe output of large models is essentially probability sampling. The same question may yield different answers each time—sometimes good, sometimes bad, sometimes mediocre. This is not a bug; it's a characteristic of AI.
\\n\\nFor designers, this means several key principles:
\\n\\n| Design Principle | \\nSpecific Approach | \\nWhy It Matters | \\n
|---|---|---|
| Provide retry mechanism | \\nLet users "generate again" | \\nGives users a second chance when dissatisfied | \\n
| Show multiple options | \\nGenerate 3-5 versions simultaneously for selection | \\nIncreases probability of users finding satisfactory results | \\n
| Allow editing and modification | \\nAI-generated content should be easy to edit | \\nUsers can fix imperfect parts | \\n
| Set quality expectations | \\nTell users "results may need adjustment" | \\nAvoids excessively high expectations | \\n
Excellent AI products don't try to hide this uncertainty; they turn it into an advantage.
\\n\\nUser Psychology: Expectation Management
\\n\\nUsers' expectations of AI often swing between two extremes: either too high or completely dismissive.
\\n\\nPeople using ChatGPT for the first time often exclaim how amazing it is, thinking AI can do anything.
\\n\\nWhen AI makes a silly mistake, they may immediately conclude AI is nothing special.
\\n\\nThe product designer's task is to guide users' expectations to a reasonable range.
\\n\\nHow?
\\n\\n- \\n
- First, clearly tell users what AI can and cannot do. \\n
- Second, when AI makes mistakes, don't try to hide them—face them honestly. \\n
- Third, give users control—let them adjust, edit, and veto AI output. \\n
\\n\\n\\nGood AI products make users feel: AI is my assistant, not my boss.
\\n
Failure Modes of AI Products
\\n\\nAI products fail differently from traditional software.
\\n\\nTraditional software either works or doesn't—it crashes, shows errors, or features stop working.
\\n\\nAI product failures are more subtle:
\\n\\n| Failure Mode | \\nManifestation | \\nResponse Strategy | \\n
|---|---|---|
| Hallucination | \\nFabricating non-existent facts, citations, data | \\nAdd fact-checking, provide source annotations, allow user verification | \\n
| Alignment failure | \\nAnswer doesn't match user intent | \\nProvide clarifying questions, let users confirm intent, multi-turn dialogue optimization | \\n
| Unstable output quality | \\nSometimes good, sometimes poor | \\nProvide multiple options, allow retries, let users rate and give feedback | \\n
| Overconfidence | \\nConfidently stating wrong information | \\nAdd confidence display, use more cautious phrasing, encourage questioning | \\n
| Context loss | \\nForgetting key information from earlier conversation | \\nShow context summary, allow referencing history, provide conversation memory management | \\n
Understanding these failure modes is the first step to designing good AI products.
\\n\\n\\n\\n
AI Feature Design Principles
\\n\\nThree core principles: progressive disclosure, user control, and transparent communication.
\\n\\nProgressive Disclosure
\\n\\nDon't dump all features on users at once.
\\n\\nThe ideal flow is: give users a simple starting point first, then gradually show more options as needed.
\\n\\nFor example, when users write emails:
\\n\\n- \\n
- Step 1: Enter a topic or keywords, AI generates a draft. \\n
- Step 2: After seeing the draft, users can adjust tone, length, and style. \\n
- Step 3: If needed, further modify specific paragraphs or have AI provide several different versions. \\n
The benefits of this design:
\\n\\nBeginners won't be intimidated by complex options, while experts can find sufficient control.
\\n\\n| Stage | \\nUser Sees | \\nUser Action | \\n
|---|---|---|
| Initial interface | \\nSimple input box | \\nEnter basic requirements | \\n
| After generation | \\nResult + basic adjustment options | \\nSelect "more formal," "more concise," etc. | \\n
| Expanded advanced options | \\nDetailed parameter controls | \\nAdjust temperature, role, format, etc. | \\n
Controllability: Let Users Adjust AI Output
\\n\\nUsers need to feel that they are the final decision-makers.
\\n\\nProvide control knobs, not "take it or leave it" one-time results.
\\n\\nCommon control dimensions:
\\n\\n| Control Dimension | \\nTypical Options | \\nApplicable Scenarios | \\n
|---|---|---|
| Style/Tone | \\nFormal, casual, humorous, professional | \\nWriting, emails, copy | \\n
| Length | \\nBrief, medium, detailed, long | \\nSummaries, articles, reports | \\n
| Complexity | \\nSimple and easy to understand, medium, professional depth | \\nExplanations, tutorials, technical documents | \\n
| Creativity | \\nConservative, balanced, boldly innovative | \\nBrainstorming, creative writing | \\n
| Format | \\nList, paragraph, table, outline | \\nNotes, planning, documents | \\n
More advanced control: let users directly edit AI prompts, or save their own commonly used prompt templates.
\\n\\nTransparency: Inform Users This is AI-Generated
\\n\\nLetting users know they are interacting with AI is not only an ethical issue but also a product experience issue.
\\n\\nIf users think it's written by a human, they will feel deceived when they discover it's AI.
\\n\\nIf it's stated from the beginning that it's AI-generated, users will be more forgiving and more willing to participate in improvement.
\\n\\nSeveral approaches to transparency:
\\n\\n- \\n
- First, clear identification—use visual elements to distinguish AI content from user content. \\n
- Second, show the process—let users see how AI generates results (such as showing thinking process, retrieved sources). \\n
- Third, explain limitations—tell users what types of errors AI might make and how to identify them. \\n
\\n\\n\\nTransparency builds trust. When users know this is AI, they will use it the right way—as a reference rather than blindly trusting it.
\\n
\\n\\n
User Experience Design
\\n\\nAI products have several key experience touchpoints: loading states, error handling, and feedback mechanisms.
\\n\\nLoading State Design: Streaming Output
\\n\\nLarge models take time to generate content. Making users wait 10 seconds doing nothing creates a poor experience.
\\n\\nThe best approach is streaming output—content appears on screen word by word.
\\n\\nWhy is this good?
\\n\\n- \\n
- First, users feel the system is working, not frozen. \\n
- Second, users can start reading early, and halfway through know if it's the right direction. \\n
- Third, if the direction is wrong, they can interrupt at any time without waiting for full generation. \\n
Besides streaming output, you can also:
\\n\\n- \\n
- Show progress indicators—"Thinking," "Retrieving," "Generating." \\n
- Give estimated time—"Approximately 10 seconds needed." \\n
- Provide a cancel button—let users stop at any time. \\n
Error Handling and Degradation Plans
\\n\\nAI will definitely make mistakes. Good product design makes errors less frightening.
\\n\\nWhen AI output is obviously problematic:
\\n\\n- \\n
- First, give users a simple "dissatisfied" or "retry" button. \\n
- Second, provide alternatives—"How about trying this angle?" \\n
- Third, allow users to easily roll back to the previous state. \\n
More serious errors (such as the model not working at all):
\\n\\nHave a degradation plan. For example, when AI is unavailable, provide a template library for users to manually select from.
\\n\\n| Error Severity | \\nUser Manifestation | \\nProduct Response | \\n
|---|---|---|
| Minor issue | \\nResult has some flaws but is usable | \\nProvide editing functionality for users to fine-tune | \\n
| Moderate issue | \\nResult is wrong, needs regeneration | \\nProvide retry button, or suggest adjusting input | \\n
| Severe issue | \\nAI completely not working | \\nProvide degradation plan, such as template library | \\n
Feedback Mechanism
\\n\\nUser feedback is a valuable resource for improving AI products.
\\n\\nBut feedback functionality shouldn't be too complex, or users won't use it.
\\n\\nSimple and effective feedback design:
\\n\\n- \\n
- First, thumbs up/thumbs down—one-click expression of satisfaction or dissatisfaction. \\n
- Second, brief multiple choice—"What's wrong?" (Too long, too short, off-topic, factual error...). \\n
- Third, optional text box—let users explain problems in detail, but not required. \\n
- Fourth, tell users what feedback is for—"Your feedback will help us improve." \\n
After feedback is collected, it's best to give users confirmation on the interface—"Thank you for your feedback, we received it."
\\n\\n\\n\\n
Prompt Productization
\\n\\nGood prompts are the core competitive advantage of AI products. Turn prompts from "magic" into manageable product features.
\\n\\nEncapsulating Prompts as Product Features
\\n\\nOrdinary users don't need to know what a "system prompt" is.
\\n\\nThey just need to know: click this button, and get a draft of a formal email.
\\n\\nSo, the designer's job is to package complex prompts into simple feature buttons.
\\n\\nFor example:
\\n\\nThe original prompt might be very long:
\\n\\nyouis/area/anprofessionalemailwritingassistant。pleasehelpuserwritea/anformal、polite、conciseofbusinessemail。requirement:1. Start with an appropriate salutation; 2. Express the body clearly; 3. End with a polite closing. The tone should be professional but not rigid, friendly but not overly casual.\\n\\nAfter productization, users see only:
\\n\\n- \\n
- A button—"Write Business Email." \\n
- A few simple options—"Formal/Neutral/Friendly," "Brief/Detailed." \\n
This is the core of prompt productization: keep complexity for yourself, keep simplicity for users.
\\n\\nSystem Prompt Version Management
\\n\\nPrompts aren't done once written; they need continuous iteration.
\\n\\nYou may find: the new version of the prompt works better for scenario A but worse for scenario B.
\\n\\nSo, prompts need version management like code.
\\n\\nKey practices:
\\n\\n- \\n
- First, give each version a number or name—"v1.0," "v1.1," "Experimental-MoreFriendly." \\n
- Second, record changes for each version—"what was changed, why it was changed, expected effect." \\n
- Third, can run multiple versions simultaneously for A/B testing. \\n
- Fourth, can quickly roll back to previous versions. \\n
\\n\\n\\nWhen iterating prompts, always retain the ability to roll back. New versions may bring unexpected problems.
\\n
A/B Testing Prompts
\\n\\nWhich is better, version A or version B of the prompt? Don't guess—let data speak.
\\n\\nA/B testing approach:
\\n\\n- \\n
- Some users use version A, some use version B. \\n
- See which version has higher user satisfaction, lower retry rate, and better completion rate. \\n
Below is a simple Python script demonstrating how to do A/B test analysis for prompts:
\\n\\nExample
\\n\\n# ============================================\\n# Prompt A/B Test Analysis Script\\n# Used to compare effects of two prompt versions\\n# ============================================\\n\\nfrom dataclasses import dataclass\\nfrom typing import List, Dict, Optional\\nimport statistics\\n\\n@dataclass\\nclass TestResult:\\n """Data structure for single test result"""\\n prompt_version: str # "A" or "B"\\n user_satisfaction: int # User satisfaction 1-5\\n retry_count: int # User retry count\\n task_completed: bool # Whether task completed\\n time_spent_seconds: int # Time spent\\n test_case_id: str # Test case identifier (e.g., different query types)\\n\\ndef analyze_ab_test(results: List) -> Dict:\\n """Analyze A/B test results, return comparison statistics"""\\n # Group by version\\n group_a = [r for r in results if r.prompt_version == "A"]\\n group_b = [r for r in results if r.prompt_version == "B"]\\n \\n if not group_a or not group_b:\\n return {"error": "Need test data for at least two versions"}\\n \\n def calc_stats(group: List) -> Dict:\\n """Calculate statistics for single group"""\\n satisfaction_scores = [r.user_satisfaction for r in group]\\n return {\\n "sample_size": len(group),\\n "avg_satisfaction": statistics.mean(satisfaction_scores),\\n "median_satisfaction": statistics.median(satisfaction_scores),\\n "avg_retry": statistics.mean([r.retry_count for r in group]),\\n "completion_rate": sum(1 for r in group if r.task_completed) / len(group),\\n "avg_time": statistics.mean([r.time_spent_seconds for r in group]),\\n }\\n \\n stats_a = calc_stats(group_a)\\n stats_b = calc_stats(group_b)\\n \\n # Compare key metrics\\n comparison = {\\n "satisfaction_diff": stats_b - stats_a,\\n "retry_diff": stats_b - stats_a,\\n "completion_diff": stats_b - stats_a,\\n }\\n \\n # Can also break down by test case (e.g., different types of queries)\\n by_test_case = {}\\n test_case_ids = set(r.test_case_id for r in results)\\n \\n for case_id in test_case_ids:\\n case_results = [r for r in results if r.test_case_id == case_id]\\n case_a = [r for r in case_results if r.prompt_version == "A"]\\n case_b = [r for r in case_results if r.prompt_version == "B"]\\n \\n if case_a and case_b:\\n by_test_case = {\\n "a_score": statistics.mean(r.user_satisfaction for r in case_a),\\n "b_score": statistics.mean(r.user_satisfaction for r in case_b),\\n }\\n \\n return {\\n "version_a": stats_a,\\n "version_b": stats_b,\\n "comparison": comparison,\\n "by_test_case": by_test_case,\\n "winner": _determine_winner(stats_a, stats_b),\\n }\\n\\ndef _determine_winner(stats_a: Dict, stats_b: Dict) -> Optional:\\n """Determine which version is better based on statistics"""\\n # Comprehensive consideration of satisfaction, completion rate, retry count\\n a_score = (\\n stats_a * 0.5 +\\n stats_a * 10 * 0.3 +\\n (5 - stats_a) * 0.2\\n )\\n b_score = (\\n stats_b * 0.5 +\\n stats_b * 10 * 0.3 +\\n (5 - stats_b) * 0.2\\n )\\n \\n # If difference is too small, uncertain\\n if abs(a_score - b_score) < 0.1:\\n return "tie" # Tie\\n \\n return "A" if a_score > b_score else "B"\\n\\ndef print_report(analysis: Dict) -> None:\\n """Print readable A/B test report"""\\n print("=" * 60)\\n print("TUTORIAL Prompt A/B Test Report")\\n print("=" * 60)\\n \\n if "error" in analysis:\\n print(f"Error: {analysis['error']}")\\n return\\n \\n print(f"\\\\n Version A (Baseline):")\\n print(f" Sample size: {analysis['version_a']['sample_size']}")\\n print(f" Average satisfaction: {analysis['version_a']['avg_satisfaction']:.2f}/5")\\n print(f" Completion rate: {analysis['version_a']['completion_rate']:.1%}")\\n print(f" Average retry count: {analysis['version_a']['avg_retry']:.2f}")\\n \\n print(f"\\\\n Version B (New Solution):")\\n print(f" Sample size: {analysis['version_b']['sample_size']}")\\n print(f" Average satisfaction: {analysis['version_b']['avg_satisfaction']:.2f}/5")\\n print(f" Completion rate: {analysis['version_b']['completion_rate']:.1%}")\\n print(f" Average retry count: {analysis['version_b']['avg_retry']:.2f}")\\n \\n print(f"\\\\n Comparison Results:")\\n diff = analysis\\n print(f" Satisfaction change: {diff['satisfaction_diff']:+.2f}")\\n print(f" Completion rate change: {diff['completion_diff']:+.1%}")\\n print(f" Retry count change: {diff['retry_diff']:+.2f}")\\n \\n winner = analysis\\n if winner == "tie":\\n print("\\\\n Conclusion: Both versions perform similarly, need more data or adjustment")\\n else:\\n print(f"\\\\n Conclusion: Version {winner} performs better")
YouTip