This guide walks you through the complete process of building a production-ready AI voice agent from scratch, following best practices learned from building hundreds of voice agents for real clients.
This is not a beginner tutorial. We assume you’re familiar with voice AI platforms like Vapi or Retell AI. If you’re brand new, check out their beginner tutorials first.

The Reality of Voice Agent Development

Here’s what typically happens:
  1. You build a voice agent
  2. You call it 5-10 times
  3. It sounds great! Everything works perfectly
  4. You deploy to production
  5. It fails 40-50% of the time with real users
Why? Because manual testing can’t simulate:
  • Different accents and speaking styles
  • Frustrated or stressed users
  • Edge cases and unexpected responses
  • The scale of hundreds or thousands of calls
This guide shows you how to build agents that actually work in production, not just in testing.

Overview: Building an Agent Start-to-Finish

We’ll build a complete inbound voice agent for a real estate company that can handle:
  • Property inquiries (buying, selling, renting)
  • Lead qualification
  • Information capture (name, address, timeline)
  • Conditional logic based on user responses
Then we’ll stress-test it with Relyable to find and fix issues before real customers encounter them.

Step 1: Plan Your Agent with a Call Flow

Before writing a single line of prompt, map out your agent’s conversation flow visually. This makes prompt engineering dramatically easier.

Create a Call Flow Diagram

Use a tool like Whimsical, Miro, or any diagramming tool to map out:
  1. First Message - What the agent says when it answers
  2. Pathways - Different conversation flows based on user intent
  3. Questions - Information to capture at each step
  4. Conditions - Branching logic (if yes → do this, if no → do that)
  5. End States - How conversations conclude

Example: Real Estate Agent Flow

Inbound Call Started

"Hi, this is Emily. I'm an AI agent from Inflate Real Estate. How can I help you today?"

   ┌─────────────┼─────────────┐
   ↓             ↓             ↓
BUYING        SELLING       RENTING
Buying Pathway:
  1. “Great! Which property were you interested in?”
  2. “Could you spell out your full name?”
  3. “Do you have another property to sell first?”
    • If YES → “Would you like our assistance selling it?”
      • If YES → “What’s your current address?”
    • If NO → “Would you like to book a walkthrough?”
Selling Pathway:
  1. “Great! Could you provide your address?”
  2. “Just to confirm, your address is [repeat address]. Is that correct?”
  3. “Could you spell out your full name?”
  4. “What’s your reason for selling?”
  5. “Have you done any repairs recently?”
  6. “What’s your timeline for getting this sold?”
Renting Pathway:
  1. “Great! Which property were you interested in?”
  2. “Could you spell out your full name?”
  3. “What’s your timeline for moving in?”
  4. “Would you like to book a walkthrough?”
The more detailed your call flow, the easier prompt engineering becomes. Spend 20-30 minutes on this before touching your prompt.
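One way to keep the diagram honest is to also write it down as data. The sketch below encodes the buying pathway from the example as a small Python structure. The names (`Step`, `BUYING`, `next_step`, `missing_targets`) are illustrative assumptions, not part of Vapi, Retell, or Relyable.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Step:
    """One node in the call flow: a question plus where each answer leads."""
    question: str
    # Maps a normalized answer ("yes", "no", or "*" for any answer) to the
    # id of the next step. A value of None marks an end state.
    branches: Dict[str, Optional[str]] = field(default_factory=dict)


# Hypothetical encoding of the buying pathway from the example flow.
BUYING: Dict[str, Step] = {
    "property": Step("Which property were you interested in?", {"*": "name"}),
    "name": Step("Could you spell out your full name?", {"*": "has_property"}),
    "has_property": Step(
        "Do you have another property to sell first?",
        {"yes": "wants_help", "no": "walkthrough"},
    ),
    "wants_help": Step(
        "Would you like our assistance selling it?",
        # The example flow does not say what happens on "no";
        # it is treated as an end state here (assumption).
        {"yes": "address", "no": None},
    ),
    "address": Step("What's your current address?", {"*": None}),
    "walkthrough": Step("Would you like to book a walkthrough?", {"*": None}),
}


def next_step(flow: Dict[str, Step], current: str, answer: str) -> Optional[str]:
    """Return the id of the next step, or None at an end state."""
    branches = flow[current].branches
    key = answer.strip().lower()
    if key in branches:
        return branches[key]
    if "*" in branches:
        return branches["*"]
    raise ValueError(f"No branch for answer {answer!r} at step {current!r}")


def missing_targets(flow: Dict[str, Step]) -> List[str]:
    """List branch targets that point at steps the flow never defines."""
    targets = {t for step in flow.values() for t in step.branches.values()}
    return sorted(t for t in targets if t is not None and t not in flow)
```

Writing the flow this way tends to surface gaps early: here, the "no" answer to the selling-assistance question had no destination in the diagram.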

Step 2: Configure Your Voice Agent Settings

Before building your prompt, set up your agent configuration in Vapi or Retell.

Model Selection

For Vapi:
  • GPT-4o - Reliable, fast, good for most use cases (recommended for starting)
  • GPT-4o-mini - Cheaper, slightly less capable
  • Claude 3.5 Sonnet - Alternative, good reasoning
  • GPT-5 (new) - Test carefully before using in production
Start with GPT-4o until you’ve tested your agent thoroughly. Don’t use brand-new models in production right away.

Voice Selection

Choosing the right voice is critical.
Vapi Voices:
  • Pre-tuned for phone calls
  • Sound realistic and not glitchy
  • Limited selection (~10-15 voices)
  • Recommended for production - they’re tested and reliable
ElevenLabs Voices:
  • Larger selection
  • Can sound more natural
  • Warning: Some voices sound great but glitch after hundreds of calls
  • Test extensively if using ElevenLabs
Never deploy a voice to production without testing it on at least 50-100 calls. Some voices that sound incredible will screech or glitch under load.

Transcriber Settings

For most agents, the defaults work well:
  • Deepgram for Vapi (recommended)
  • Start with default settings
  • Only adjust after you’ve tested and identified specific transcription issues

Step 3: Structure Your Prompt

A well-structured prompt is critical for reliable agent performance.
# ROLE

You are **Emily**, a friendly AI agent from **Inflate Real Estate Services**. You help answer inbound phone calls, assisting clients with selling, buying, or renting properties.

# TASK

Follow the pathways below based on what the caller needs. Follow each question step-by-step in order.

## Buying

This is the pathway to follow when someone is interested in buying one of our properties.

1. Great! Which of our properties were you interested in?
2. Thank you. Could you please spell out your full name for us?
3. Once their name has been captured, repeat it back to them letter by letter.
4. Do you have another property you need to sell first?

If the user responds yes to question 4, please ask: `No worries. Would you like our assistance in selling this property?`

If the user responds no to question 4, please ask: `No worries. Would you like to book a walkthrough?`

If they said yes to wanting help selling:
5. Great! Could you please provide us with your current address?
6. Thanks. Would you like to book a walkthrough for the new property?

## Selling

This is the pathway to follow when someone is interested in selling their property.

1. Great! Could you please provide us with your address?
2. Thank you. Just to confirm, your address is `[read out the address captured]`. Is that correct?
3. Thank you. Could you please spell out your full name for us?
4. Once their name has been captured, repeat it back to them letter by letter.
5. What is your reason for selling?
6. Have you done any repairs recently?
7. What is your timeline for getting this place sold?

## Renting

This is the pathway to follow when someone is interested in renting one of our properties.

1. Great! Which of our properties were you interested in?
2. Thank you. Could you please spell out your full name for us?
3. Once their name has been captured, repeat it back to them letter by letter.
4. What is your timeline for moving in?
5. No worries. Would you like to book a walkthrough?

# NOTES

## Address Pronunciation

When reading out an address, split it into clear words:
- "123B" should be read as "one two three B"
- "57A Woodson Blvd" should be read as "five seven A, Woodson Boulevard"
- Never say "one hundred twenty-three", always "one two three"
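The same pronunciation rule can be expressed in code, for example to generate the expected read-back when writing test cases. This is a minimal sketch: `speak_address`, the abbreviation table, and the choice of "zero" for 0 are assumptions for illustration.

```python
import re

DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}

# Small illustrative table; extend it for your market.
# Note that "St" can also mean "Saint", which this sketch does not handle.
ABBREVIATIONS = {
    "blvd": "Boulevard", "st": "Street", "ave": "Avenue",
    "rd": "Road", "dr": "Drive",
}


def speak_address(address: str) -> str:
    """Spell house numbers digit by digit and expand street abbreviations."""
    tokens = address.split()
    parts = []
    for i, token in enumerate(tokens):
        match = re.fullmatch(r"(\d+)([A-Za-z]?)", token)
        if match:
            digits, suffix = match.groups()
            words = [DIGIT_WORDS[d] for d in digits]
            if suffix:
                words.append(suffix.upper())
            spoken = " ".join(words)
            # Add a short pause after the number when a street name follows.
            if i < len(tokens) - 1:
                spoken += ","
            parts.append(spoken)
        else:
            key = token.rstrip(".").lower()
            parts.append(ABBREVIATIONS.get(key, token))
    return " ".join(parts)


print(speak_address("57A Woodson Blvd"))  # five seven A, Woodson Boulevard
```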

Key Prompt Engineering Principles

Use Markdown Formatting: Language models see a lot of markdown during training, so clear markdown structure helps the model tell sections, steps, and exact phrases apart:
  • Use # Headers for major sections
  • Use ## Subheaders for pathways
  • Use **Bold** for emphasis on key terms
  • Use numbered lists for sequential steps
  • Use backticks for exact phrases to say
Be Explicit About Order: Number your questions explicitly (1, 2, 3…) and tell the agent to follow them step-by-step. Don’t assume the AI will figure out the order.
Use Clear Conditionals: Instead of “If they say yes, help them”, use “If the user responds yes to question 4, please ask: Would you like our assistance?” Reference specific question numbers to avoid ambiguity.
Confirm Critical Information: Always repeat back names (spell back letter by letter), addresses (read back with proper pronunciation), email addresses, and any data going into a CRM. This dramatically improves accuracy.
Define Personality in the Role: The ROLE section shapes how the agent speaks. Adding “friendly” or “professional” or “empathetic” actually changes behavior significantly.
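If you maintain several pathways, generating the numbered sections from a list keeps the numbering and conditionals consistent. The helpers below are a hypothetical sketch (`render_pathway` and `conditional` are made-up names); they only build text and do not call any platform API.

```python
from typing import List


def render_pathway(title: str, description: str, steps: List[str]) -> str:
    """Render one pathway as a markdown section with explicitly numbered steps."""
    lines = [f"## {title}", "", description, ""]
    lines += [f"{number}. {step}" for number, step in enumerate(steps, start=1)]
    return "\n".join(lines)


def conditional(question_number: int, answer: str, phrase: str) -> str:
    """Build a conditional that references a specific question number."""
    return (
        f"If the user responds {answer} to question {question_number}, "
        f"please ask: `{phrase}`"
    )


section = render_pathway(
    "Renting",
    "This is the pathway to follow when someone is interested in renting "
    "one of our properties.",
    [
        "Great! Which of our properties were you interested in?",
        "Thank you. Could you please spell out your full name for us?",
        "Once their name has been captured, repeat it back to them letter by letter.",
        "What is your timeline for moving in?",
        "No worries. Would you like to book a walkthrough?",
    ],
)
print(section)
```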

Step 4: Test Manually First

Before running automated tests, do a quick manual test:
  1. Publish Your Agent: Save and publish your agent in Vapi/Retell.
  2. Call It Yourself: Call your agent and go through one pathway completely. Check:
    • Does it read the first message correctly?
    • Does it follow the question order?
    • Does the voice sound good?
    • Is the response latency acceptable?
  3. Note Issues: Write down anything that doesn’t work. Don’t try to fix everything yet - just get a baseline.
If your agent completely fails manual testing, fix the obvious issues before automated testing. Automated testing is for finding subtle edge cases, not broken core functionality.

Step 5: Import to Relyable

Follow the Quick Start guide to:
  1. Create a Relyable account
  2. Import your agent
  3. Generate test cases
  4. Create personas

Step 6: Generate Comprehensive Test Cases

Test cases evaluate specific behaviors. Aim for 15-25 test cases covering:

Must-Have Test Cases

Identity and Branding:
  • Agent introduces with correct name and company
  • Agent mentions it’s an AI system
  • Agent maintains consistent identity
Conversation Flow:
  • Agent asks questions in the correct order
  • Agent doesn’t skip required questions
  • Agent handles conditional logic correctly
Data Capture:
  • Agent captures required information (name, address, email)
  • Agent confirms captured information back to user
  • Agent spells back name letter by letter
  • Agent reads addresses with proper pronunciation
Pathway Handling:
  • Agent correctly identifies buying vs selling vs renting intent
  • Agent follows the correct pathway for each intent
  • Agent handles pathway changes (e.g., buyer also needs to sell)
Edge Case Handling:
  • Agent handles objections (“Why do you need my name?”)
  • Agent handles unclear responses
  • Agent doesn’t give up on difficult customers prematurely

Setting Test Case Priorities

| Priority | When to Use | Impact on Score |
|----------|-------------|-----------------|
| Critical | Must work 100% of the time (emergency routing, compliance) | Huge impact |
| High | Core functionality (capturing lead info, following flow) | Large impact |
| Medium | Important but not critical (tone, using caller’s name) | Medium impact |
| Low | Nice-to-have (specific phrases, minor details) | Small impact |
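To build intuition for how priorities shape a score, here is a rough sketch of a priority-weighted pass rate. This guide does not state how Relyable actually weights priorities, so the weights (8, 4, 2, 1) and the names `CaseResult` and `weighted_score` are placeholders, not the real formula.

```python
from dataclasses import dataclass
from typing import List

# Placeholder weights: assumptions for illustration only.
PRIORITY_WEIGHTS = {"critical": 8, "high": 4, "medium": 2, "low": 1}


@dataclass
class CaseResult:
    name: str
    priority: str  # one of the keys in PRIORITY_WEIGHTS
    passed: bool


def weighted_score(results: List[CaseResult]) -> float:
    """Return a 0-100 score where higher-priority cases count for more."""
    total = sum(PRIORITY_WEIGHTS[r.priority] for r in results)
    if total == 0:
        return 0.0
    earned = sum(PRIORITY_WEIGHTS[r.priority] for r in results if r.passed)
    return 100.0 * earned / total


results = [
    CaseResult("Mentions it is an AI", "critical", True),
    CaseResult("Asks questions in order", "high", False),
    CaseResult("Uses caller's name", "medium", True),
    CaseResult("Uses exact greeting phrase", "low", True),
]
print(round(weighted_score(results), 1))  # 73.3
```

With these placeholder weights, a single failed High case costs more than failing every Medium and Low case combined, which is the behavior the priority table describes qualitatively.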

Step 7: Create Diverse Personas

Don’t just create “normal” callers. Create personas that stress-test your agent:

Example Personas to Create

The Frustrated Elderly Customer: Stanley Miller, 79-year-old retired mechanic. Frustrated with life, no patience, speaks in short clipped sentences. Gruff and exasperated. Reluctantly calling because his kids are forcing him to sell his house of 50 years.
The Fast-Talking Young Professional: Sarah Chen, 28-year-old tech professional. Speaks very quickly, interrupts frequently, expects immediate answers. Impatient with any delays. Used to chatbots and expects perfect performance.
The Non-Native Speaker: Raj Patel, 45-year-old engineer. Strong Indian accent, speaks slowly and carefully, sometimes struggles with pronunciation. Very polite but needs information repeated sometimes.
The Skeptical Customer: Mike Johnson, 52-year-old business owner. Doesn’t trust AI systems, questions everything, tests whether it’s really AI or human. Asks unexpected questions to trip up the system.
The Confused Caller: Linda Martinez, 67-year-old retiree. Not tech-savvy, easily confused, needs things explained multiple times. Forgets what was just discussed. Very polite but difficult to keep on track.
Create 5-7 diverse personas representing your actual customer base plus some worst-case scenarios. This coverage helps you find issues across user types.
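Crossing personas with caller intents is a quick way to see how much ground your scenarios need to cover. The sketch below uses the example personas; the `Persona` class and `build_scenarios` function are hypothetical and only produce scenario descriptions as text.

```python
from dataclasses import dataclass
from itertools import product
from typing import List


@dataclass(frozen=True)
class Persona:
    name: str
    age: int
    traits: str


PERSONAS = [
    Persona("Stanley Miller", 79, "gruff, impatient, short clipped sentences"),
    Persona("Sarah Chen", 28, "speaks very quickly, interrupts frequently"),
    Persona("Raj Patel", 45, "strong accent, polite, needs information repeated"),
    Persona("Mike Johnson", 52, "skeptical of AI, asks unexpected questions"),
    Persona("Linda Martinez", 67, "easily confused, forgets what was discussed"),
]

INTENTS = ["buying", "selling", "renting"]


def build_scenarios(personas: List[Persona], intents: List[str]) -> List[str]:
    """Describe every persona/intent combination as a one-line scenario."""
    return [
        f"{p.name} ({p.age}, {p.traits}) calls about {intent} a property."
        for p, intent in product(personas, intents)
    ]


scenarios = build_scenarios(PERSONAS, INTENTS)
print(len(scenarios))  # 15
```

Five personas and three intents already give 15 combinations, so you will usually pick a subset for a first run and expand from there.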

Step 8: Run Automated Tests at Scale

Now run your tests:
  1. Start with 5-10 Scenarios: Create 5-10 different scenarios using your personas:
    • 2-3 buying scenarios
    • 2-3 selling scenarios
    • 2-3 renting scenarios
    • 1-2 complex scenarios (buyer who also needs to sell)
  2. Run All at Once: Select all scenarios and run them together. This typically takes 10-20 minutes for 5-10 calls.
  3. Wait for Results: Grab coffee. Relyable will call your agent with each scenario and evaluate against all test cases.

Step 9: Analyze Results and Find Issues

Understanding Your Score

After tests complete, you’ll see an overall score:
  • 90%+ → Excellent, production-ready
  • 70-89% → Good, acceptable for production
  • 50-69% → Needs work before production
  • Below 50% → Significant issues, not ready
Reality Check: In the video demonstration, an agent that worked perfectly in manual testing scored 51-61% in automated testing. This is normal and exactly why you need automated testing!
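If you track scores outside the dashboard (in a spreadsheet export or a CI job, say), the bands above map to a small lookup. `score_band` is a hypothetical helper; treating each lower bound as inclusive is an assumption.

```python
def score_band(score: float) -> str:
    """Map an overall score (0-100) to the readiness bands listed above."""
    if score >= 90:
        return "Excellent, production-ready"
    if score >= 70:
        return "Good, acceptable for production"
    if score >= 50:
        return "Needs work before production"
    return "Significant issues, not ready"


print(score_band(51))  # Needs work before production
```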

Reviewing Failed Test Cases

Click on failed test cases to see:
  1. Which calls failed - Sometimes a test case fails on 2/5 calls, not all
  2. Why it failed - AI explanation of what went wrong
  3. The exact conversation - Full transcript and audio
  4. Suggestions - How to fix it in your prompt
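Because a test case can fail on some calls and pass on others, it helps to look at pass rates per test case rather than a single pass/fail flag. The sketch below assumes you have results as `(call_id, test_case, passed)` tuples; that shape and the function names are assumptions, not a Relyable export format.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def pass_rates(results: Iterable[Tuple[str, str, bool]]) -> Dict[str, float]:
    """Compute the fraction of calls on which each test case passed."""
    passed: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for _call_id, case, ok in results:
        total[case] += 1
        passed[case] += int(ok)
    return {case: passed[case] / total[case] for case in total}


def inconsistent_cases(rates: Dict[str, float]) -> List[str]:
    """Test cases that pass on some calls and fail on others."""
    return sorted(case for case, rate in rates.items() if 0 < rate < 1)


results = [
    ("call-1", "Spells back name", True),
    ("call-2", "Spells back name", False),
    ("call-3", "Spells back name", True),
    ("call-4", "Spells back name", False),
    ("call-5", "Spells back name", True),
    ("call-1", "Introduces itself", True),
    ("call-2", "Introduces itself", True),
]
rates = pass_rates(results)
print(inconsistent_cases(rates))  # ['Spells back name']
```

Inconsistent cases are often the most useful ones to read transcripts for, since the prompt works in some conversations and not in others.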

Common Failure Patterns

Skipping Questions:
  • Symptom: Agent jumps to question 5 without asking questions 3 and 4
  • Fix: Add explicit numbering and: “Follow each question step-by-step in order. Do not skip any questions.”
Not Confirming Information:
  • Symptom: Agent captures name but doesn’t spell it back
  • Fix: Make confirmation explicit: “3. Once their name has been captured, repeat it back to them letter by letter.”
Giving Up on Difficult Customers:
  • Symptom: When customer pushes back (“Why do you need that?”), agent says “No worries, have a great day” and ends the call
  • Fix: Add handling: “If the caller objects or questions why you need information, briefly explain the reason and ask again gently. Do not end the call unless they explicitly want to hang up.”
Breaking Conditional Logic:
  • Symptom: Agent asks “Would you like help selling?” even when user said NO to having a property to sell
  • Fix: Be more explicit: “If the user responds NO to question 4, skip to question 6 directly.”
Poor Address Pronunciation:
  • Symptom: Agent says “one hundred twenty-three” instead of “one two three” for address 123
  • Fix: Add pronunciation guide in a NOTES section showing how to split numbers and addresses.
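The "skipping questions" pattern is simple enough to spot-check yourself on a transcript. This sketch uses plain substring matching on the agent's turns, which is a crude stand-in for the AI evaluation described above; `first_skipped_question` and the key phrases are illustrative assumptions.

```python
from typing import List, Optional


def first_skipped_question(agent_turns: List[str], expected: List[str]) -> Optional[str]:
    """Return the first expected phrase the agent jumped over, or None.

    `expected` holds a short key phrase for each question, in the required order.
    A call that simply ends early is not reported as a skip.
    """
    next_index = 0
    for turn in agent_turns:
        text = turn.lower()
        for i in range(next_index, len(expected)):
            if expected[i].lower() in text:
                if i > next_index:
                    # A later question was asked before the one that was due.
                    return expected[next_index]
                next_index = i + 1
                break
    return None


expected = [
    "which of our properties",
    "spell out your full name",
    "another property you need to sell",
]
turns = [
    "Great! Which of our properties were you interested in?",
    "Do you have another property you need to sell first?",
]
print(first_skipped_question(turns, expected))  # spell out your full name
```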

Step 10: Iterate and Improve

The key to production-ready agents is iteration:
  1. Fix 2-3 Issues at a Time: Don’t try to fix everything. Pick the 2-3 most critical failures and fix those in your prompt.
  2. Update in Vapi/Retell: Make your prompt changes in your voice platform, not in Relyable. Relyable is read-only.
  3. Sync to Relyable: Click “Sync Prompt” in Relyable to pull the updated prompt.
  4. Test Again: Run the same scenarios again. Did your score improve? Did the specific failures get fixed?
  5. Repeat Until 70%+: Keep iterating. Most production-ready agents take 5-10 iterations to get from 50% to 75%+.

Tracking Improvements

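| Iteration | Score | Changes Made |
|-----------|-------|--------------|
| 1 | 51% | Baseline |
| 2 | 58% | Added step-by-step ordering |
| 3 | 63% | Added name confirmation |
| 4 | 69% | Fixed conditional logic |
| 5 | 74% | Added objection handling |
| 6 | 78% | Production Ready |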
Document what changes you make each iteration. This helps you understand what actually improves performance vs. what doesn’t matter.
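A plain list is enough to keep this log and see what each change bought you. The sketch below reuses the figures from the example table; `score_deltas` is a hypothetical helper.

```python
from typing import List, Tuple

# (iteration, score in percent, changes made), copied from the example table.
ITERATIONS: List[Tuple[int, int, str]] = [
    (1, 51, "Baseline"),
    (2, 58, "Added step-by-step ordering"),
    (3, 63, "Added name confirmation"),
    (4, 69, "Fixed conditional logic"),
    (5, 74, "Added objection handling"),
    (6, 78, "Production Ready"),
]


def score_deltas(log: List[Tuple[int, int, str]]) -> List[Tuple[str, int]]:
    """Pair each iteration's change note with its gain over the previous run."""
    return [
        (later[2], later[1] - earlier[1])
        for earlier, later in zip(log, log[1:])
    ]


for change, delta in score_deltas(ITERATIONS):
    print(f"{delta:+d} points: {change}")
```

Keep in mind that scores vary between runs, so a small gain from a single run may be noise rather than a real improvement.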

Step 11: Enable Live Monitoring

Once you reach 70%+ consistently:
  1. Enable Call Monitoring: In Relyable, go to Agent Settings → Enable Call Monitoring.
  2. Deploy to Production: Your agent is now ready for real customers.
  3. Monitor Performance: Every production call is evaluated against your test cases. You’ll see:
    • Real-time scores
    • Which test cases are failing in production
    • Trends over time (is quality improving or degrading?)
  4. Set Up Alerts: Get notified when:
    • Score drops below 70%
    • Critical test cases fail
    • Specific issues occur multiple times
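The three alert conditions can be sketched as a single check over recent call evaluations. This is not Relyable's alerting API: `CallEvaluation`, `alerts_for`, averaging the score over the batch, and treating "multiple times" as two or more are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class CallEvaluation:
    score: float  # 0-100 score for one production call
    failed_critical: List[str] = field(default_factory=list)
    issues: List[str] = field(default_factory=list)


def alerts_for(
    evaluations: List[CallEvaluation],
    score_threshold: float = 70.0,
    repeat_threshold: int = 2,
) -> List[str]:
    """Return alert messages for a batch of recent call evaluations."""
    alerts: List[str] = []
    if not evaluations:
        return alerts
    average = sum(e.score for e in evaluations) / len(evaluations)
    if average < score_threshold:
        alerts.append(
            f"Average score {average:.1f}% is below {score_threshold:.0f}%"
        )
    critical = sorted({name for e in evaluations for name in e.failed_critical})
    for name in critical:
        alerts.append(f"Critical test case failed: {name}")
    counts = Counter(issue for e in evaluations for issue in e.issues)
    for issue, count in sorted(counts.items()):
        if count >= repeat_threshold:
            alerts.append(f"Issue occurred {count} times: {issue}")
    return alerts


recent = [
    CallEvaluation(65, ["AI disclosure"], ["skipped question"]),
    CallEvaluation(70, [], ["skipped question"]),
]
for message in alerts_for(recent):
    print(message)
```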

Prompt Engineering Best Practices

Use Markdown Effectively

Good Example:
# ROLE
You are **Emily**, a friendly AI agent from **Inflate Real Estate**.

## Buying Pathway
1. Ask which property they're interested in
2. Capture their full name
Bad Example:
You are Emily from Inflate Real Estate. When someone wants to buy ask them which property and get their name.

Preview Your Prompt

Use Markdown Live Preview to see how your prompt renders. The model reads the raw text rather than the rendered page, but the preview makes broken headings, lists, and emphasis easy to spot.

Be Explicit About Everything

Don’t assume the AI will “figure it out.” Be explicit about:
  • Question order
  • What to say exactly (use backticks)
  • When to say it
  • What NOT to do

Test One Change at a Time

If you change 5 things and the score improves, you don’t know which change helped. Keep each iteration small: 1-2 changes is ideal, and no more than the 2-3 fixes suggested in Step 10.

Use Real Call Examples

When you find a failure in testing, paste the transcript into your prompt as an example:
## Handling Objections

If a caller asks "Why do you need my name?", respond with:
"I just want to make sure we have the right information to help you. It ensures our team can follow up properly."

Example:
Caller: "Why do you need my name?"
Agent: "I just want to make sure we have the right information to help you. It ensures our team can follow up properly. Could you spell out your full name for us?"

Advanced: Data Capture and CRM Integration

Once your agent is reliable, integrate it with your CRM:

Capture Data with Functions

In Vapi, set up functions to capture:
  • Name
  • Phone number
  • Email
  • Address
  • Other lead information

Confirm Before Sending

Always confirm information before sending to CRM:
5. Repeat back: "Just to confirm, your email is `[email]`. Is that correct?"
6. If yes, save to CRM. If no, ask them to repeat it.

Test the Integration

Run automated tests with the CRM integration enabled. Verify:
  • Data is captured correctly
  • Data is confirmed before sending
  • Invalid data doesn’t get sent
  • The agent handles CRM errors gracefully
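The "confirm before sending" rule and the checks above can be expressed as a small gate in whatever code receives the captured data. This is a sketch under stated assumptions: `Lead`, `submit_lead`, and the `send_to_crm` callable are hypothetical, and the email pattern is a loose sanity check, not full validation.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict

# Loose sanity check only: something@something.tld with no spaces.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


@dataclass
class Lead:
    name: str
    email: str
    confirmed: bool = False  # True once the caller said "yes, that's correct"


def submit_lead(lead: Lead, send_to_crm: Callable[[Dict[str, str]], None]) -> str:
    """Send a lead only if it looks valid and the caller confirmed it."""
    if not EMAIL_PATTERN.match(lead.email):
        return "invalid"
    if not lead.confirmed:
        return "needs_confirmation"
    try:
        send_to_crm({"name": lead.name, "email": lead.email})
    except Exception:
        # Let the agent apologize and continue instead of crashing the call.
        return "crm_error"
    return "sent"


sent = []
status = submit_lead(Lead("Ann Lee", "ann@example.com", confirmed=True), sent.append)
print(status)  # sent
```

Returning a status string rather than raising lets the calling code decide what the agent should say next, for example asking the caller to repeat an email that failed the check.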

Performance Optimization

Reducing Latency

If your agent is too slow:
  1. Use a faster model - Try GPT-4o-mini instead of GPT-4o
  2. Shorten your prompt - Remove unnecessary details
  3. Reduce tool calls - Fewer function calls = faster responses
  4. Use Vapi’s latest features - They constantly optimize latency

Improving Speech Quality

If speech sounds robotic or glitchy:
  1. Switch voices - Some voices are more reliable
  2. Adjust WPM (words per minute) - 150-180 is natural
  3. Add punctuation to responses - Helps with cadence
  4. Use SSML tags - For pauses and emphasis (platform-dependent)
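To check pace against the 150-180 WPM range, you can estimate words per minute from an agent turn's text and its audio duration. The helper names are hypothetical, and counting whitespace-separated words is an approximation.

```python
def words_per_minute(text: str, duration_seconds: float) -> float:
    """Estimate speaking pace from a transcript snippet and its duration."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return len(text.split()) * 60.0 / duration_seconds


def is_natural_pace(wpm: float, low: float = 150.0, high: float = 180.0) -> bool:
    """Check a pace against the 150-180 WPM range suggested above."""
    return low <= wpm <= high


# 40 words spoken in 15 seconds is 160 words per minute.
sample = " ".join(["word"] * 40)
print(words_per_minute(sample, 15))  # 160.0
```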

Common Pitfalls

Testing Only Happy Paths: Don’t just test scenarios where everything goes perfectly. Test difficult customers, people who object or refuse to give information, people who go off-script, and multiple intents in one call.
Assuming Manual Testing Is Enough: You CANNOT catch all issues with manual testing. You’ll miss rare edge cases, issues with specific accents, problems at scale, and inconsistencies (works 80% of the time).
Over-Engineering Too Early: Start simple. Get the basic flow working well before adding complex branching, advanced features, or CRM integrations. A simple agent that works is better than a complex one that’s unreliable.
Ignoring Low-Priority Test Cases: Just because something is “low priority” doesn’t mean ignore it. If low-priority test cases fail 100% of the time, fix them. They’re still part of the user experience.
Not Documenting Changes: Keep notes on what you change and why. This helps you understand what works, train team members, debug issues later, and build knowledge for future agents.

Checklist: Is My Agent Production-Ready?

Functionality Requirements

  • Agent introduces itself correctly 100% of the time
  • Agent follows conversation flow step-by-step
  • Agent captures all required information
  • Agent confirms critical data (name, email, address)
  • Agent handles conditional logic correctly
  • Agent doesn’t skip questions
  • Agent doesn’t give up on difficult customers

Testing Requirements

  • Agent scores 70%+ on automated tests consistently
  • Tested with at least 5 different personas
  • Tested with 20+ different scenarios
  • All Critical test cases pass 100%
  • All High test cases pass 90%+
  • Edge cases are handled gracefully

Quality Standards

  • Voice sounds natural and doesn’t glitch
  • Latency is under 1 second average
  • Words per minute is 150-180 (natural pace)
  • Agent sounds friendly and professional
  • No awkward pauses or weird inflections

Production Monitoring

  • Live monitoring is enabled in Relyable
  • Alerts are set up for critical failures
  • Team knows how to check performance
  • Process for responding to issues is defined
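The measurable items in this checklist can be turned into an automated gate. The sketch below covers only the numeric thresholds listed above; `AgentMetrics` and `readiness_gaps` are hypothetical names, and qualitative items (such as whether the voice sounds friendly) still need a human check.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AgentMetrics:
    score: float                    # overall automated test score, 0-100
    personas_tested: int
    scenarios_tested: int
    critical_pass_rate: float       # 0.0-1.0
    high_pass_rate: float           # 0.0-1.0
    average_latency_seconds: float
    words_per_minute: float
    monitoring_enabled: bool


def readiness_gaps(m: AgentMetrics) -> List[str]:
    """Return the checklist thresholds that are not yet met."""
    gaps: List[str] = []
    if m.score < 70:
        gaps.append("Automated test score is below 70%")
    if m.personas_tested < 5:
        gaps.append("Fewer than 5 personas tested")
    if m.scenarios_tested < 20:
        gaps.append("Fewer than 20 scenarios tested")
    if m.critical_pass_rate < 1.0:
        gaps.append("Not all Critical test cases pass")
    if m.high_pass_rate < 0.9:
        gaps.append("High test cases pass less than 90% of the time")
    if m.average_latency_seconds >= 1.0:
        gaps.append("Average latency is not under 1 second")
    if not 150 <= m.words_per_minute <= 180:
        gaps.append("Words per minute is outside 150-180")
    if not m.monitoring_enabled:
        gaps.append("Live monitoring is not enabled")
    return gaps


metrics = AgentMetrics(78, 5, 10, 1.0, 0.92, 1.4, 165, True)
print(readiness_gaps(metrics))
```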
