Voice AI Assistant - Sarah
A production-grade, AI-powered voice agent for healthcare organizations that handles patient phone calls autonomously. Built on the LiveKit Agents framework with real-time SIP telephony integration, it achieves sub-second response times and runs on HIPAA-eligible infrastructure.
Key Achievements
Sub-1 second response time: Reduced response latency from 4-6 seconds with the earlier custom solution to under 1 second using LiveKit's streaming pipeline with Silero VAD and ML-based turn detection.
200+ calls handled daily: Production deployment at Piedmont Urgent Care handling appointment inquiries, walk-in availability, clinic hours, and location information.
$50K annual savings: Replaced the need for 2 FTE receptionists while providing 24/7 availability.
85%+ retrieval accuracy: RAG-powered knowledge base using FAISS vector database for accurate, context-aware responses.
HIPAA-eligible infrastructure: Built on LiveKit Cloud, which is SOC2 and GDPR compliant and HIPAA-eligible with a BAA available, with encryption for connections, media streams, and data at rest.
Technical Architecture
The system uses an industry-standard voice pipeline that replaces custom WebSocket handling with production-ready components:
Voice Pipeline: LiveKit Agents 1.3 - handles real-time audio routing, participant management, and telephony bridging.
Speech-to-Text: Deepgram Nova-2 with streaming transcription, interim results, and 25ms endpointing for natural conversation flow.
LLM: Azure OpenAI GPT-4o-mini for reasoning and response generation with RAG context injection.
Text-to-Speech: ElevenLabs (Jessica voice) with eleven_turbo_v2_5 model for human-like voice synthesis.
Voice Activity Detection: Silero ML-based VAD for accurate speech detection without processing silence.
Turn Detection: LiveKit Multilingual Transformer model that predicts when the caller has finished speaking.
Knowledge Base: FAISS vector database with Azure OpenAI embeddings for SOP document retrieval.
Telephony: Twilio SIP trunk integration with direct routing to LiveKit Cloud.
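The knowledge-base lookup can be sketched in a few lines. This is a minimal illustration and not the project's code: a pure-Python cosine-similarity search stands in for FAISS, and a toy bag-of-words embed function stands in for Azure OpenAI embeddings. The names embed, cosine, top_k, build_prompt, and the sample SOP snippets are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical SOP snippets; the real knowledge base holds clinic documents.
DOCS = [
    "The clinic is open from 8 am to 8 pm every day.",
    "Walk-in patients are welcome and no appointment is required.",
    "We are located on Main Street next to the pharmacy.",
]

def embed(text: str) -> Counter:
    """Toy bag-of-words embedding (assumption: stands in for a real embedding model)."""
    words = (w.strip(".,?!").lower() for w in text.split())
    return Counter(w for w in words if w)

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query (brute-force search)."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Inject the retrieved snippets into the LLM prompt as context."""
    context = "\n".join(f"- {d}" for d in top_k(query, docs))
    return f"Answer using only this clinic information:\n{context}\n\nCaller: {query}"
```

Because the prompt is rebuilt from whatever documents are indexed, swapping the clinic documents changes the answers without touching the code.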
System Flow
When a patient calls the clinic phone number:
1. Call Routing: Twilio receives the call and routes it via SIP trunk to LiveKit Cloud.
2. Agent Dispatch: LiveKit automatically dispatches the Sarah agent to a new room for the caller.
3. Greeting: The agent generates its greeting from the knowledge base rather than from a hardcoded script.
4. Conversation: Streaming STT captures speech, the LLM generates a response with RAG context, and streaming TTS speaks the response.
5. Turn Detection: ML model detects when caller finishes speaking, enabling natural conversation flow.
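The conversation step can be sketched as a chain of generators, so that speech synthesis can begin on the first complete sentence instead of waiting for the full LLM reply. Every stage here is a stub: fake_llm_stream, sentences, fake_tts, and handle_turn are hypothetical stand-ins for the streaming services, not LiveKit APIs, and the canned reply is invented for illustration.

```python
from typing import Iterable, Iterator

def fake_llm_stream(prompt: str) -> Iterator[str]:
    """Stub LLM: yields a canned reply token by token (assumption, no real model)."""
    reply = "We are open until 8 pm. Walk-ins are welcome."
    for token in reply.split(" "):
        yield token + " "

def sentences(tokens: Iterable[str]) -> Iterator[str]:
    """Group streamed tokens into sentences so TTS can start on the first one."""
    buffer = ""
    for token in tokens:
        buffer += token
        if buffer.rstrip().endswith((".", "?", "!")):
            yield buffer.strip()
            buffer = ""
    if buffer.strip():
        # Flush any trailing text that did not end with punctuation.
        yield buffer.strip()

def fake_tts(sentence: str) -> bytes:
    """Stub TTS: returns placeholder bytes instead of audio."""
    return sentence.encode("utf-8")

def handle_turn(transcript: str, context: str) -> list[bytes]:
    """Run one caller turn: prompt -> streamed tokens -> sentences -> audio chunks."""
    prompt = f"{context}\nCaller: {transcript}"
    # A list is returned for demonstration; a live pipeline would play each chunk as it is produced.
    return [fake_tts(s) for s in sentences(fake_llm_stream(prompt))]
```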
Key Features
Fully Autonomous: Zero hardcoded responses - all conversation logic handled by LLM with knowledge base context.
Swap and Adapt: Change clinic documents and Sarah automatically learns new information. No code changes required.
Professional Boundaries: Redirects inappropriate conversations back to clinic-relevant topics.
Garbled Speech Handling: Detects nonsensical transcriptions and politely asks caller to repeat.
URL Cleanup: Automatically removes URLs from voice output for natural speech.
Noise Cancellation: BVC telephony filter for clean audio on phone calls.
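Two of the features above, URL cleanup and garbled speech handling, lend themselves to a short sketch. The source does not describe how either is implemented, so the regular expression, the vowel-ratio heuristic, its 0.2 threshold, and the function names strip_urls and looks_garbled are all assumptions made for illustration.

```python
import re

# Matches http(s):// or www. links, leaving trailing sentence punctuation in place.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S*[^\s.,!?)]", re.IGNORECASE)

def strip_urls(text: str) -> str:
    """Remove URLs and tidy the whitespace left behind, for cleaner TTS input."""
    without = URL_PATTERN.sub("", text)
    without = re.sub(r"\s+([.,!?])", r"\1", without)  # no space before punctuation
    return re.sub(r"\s{2,}", " ", without).strip()

def looks_garbled(transcript: str, min_vowel_ratio: float = 0.2) -> bool:
    """Crude heuristic (assumption): no letters or too few vowels suggests a bad transcription."""
    letters = [c for c in transcript.lower() if c.isalpha()]
    if not letters:
        return True
    vowels = sum(c in "aeiou" for c in letters)
    return vowels / len(letters) < min_vowel_ratio
```

For example, strip_urls("Visit our website https://example.com for details.") returns "Visit our website for details.", and a transcript flagged by looks_garbled would prompt the agent to ask the caller to repeat. A production system would more likely rely on STT confidence scores or the LLM itself than on a vowel count.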
Technology Stack
Voice Framework: LiveKit Agents 1.3.12
STT: Deepgram Nova-2
LLM: Azure OpenAI GPT-4o-mini
TTS: ElevenLabs eleven_turbo_v2_5
VAD: Silero (ML-based)
Turn Detection: LiveKit Multilingual Model
Vector Database: FAISS
Embeddings: Azure OpenAI text-embedding-ada-002
Telephony: Twilio Elastic SIP Trunking
Cloud: LiveKit Cloud (WebRTC infrastructure)
Performance Metrics
STT Latency: ~400ms (streaming)
LLM Response: ~500ms
TTS Latency: Streaming (starts immediately)
End-to-End: Sub-1 second perceived response
VAD: Real-time ML-based detection
Turn Detection: Transformer-based, 85% accuracy
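The sub-1 second figure can be checked against the component numbers with simple arithmetic. This sketch assumes the stages run in series and that streaming TTS adds no delay before the first audio, which matches "starts immediately" but is a simplification; the names below are hypothetical.

```python
# Figures from the metrics above, in milliseconds.
STT_MS = 400
LLM_MS = 500
# Assumption: 0 ms models "starts immediately"; a real TTS adds some time to first audio.
TTS_FIRST_AUDIO_MS = 0

def perceived_latency_ms(stt: int, llm: int, tts_first_audio: int) -> int:
    """Time from end of caller speech to first audible response, stages in series."""
    return stt + llm + tts_first_audio

total = perceived_latency_ms(STT_MS, LLM_MS, TTS_FIRST_AUDIO_MS)
headroom = 1000 - total  # budget left before the 1 second target is exceeded
print(f"perceived latency: {total} ms, headroom: {headroom} ms")
```

Under these assumptions the budget leaves about 100 ms for TTS start-up and network overhead before the response would exceed one second.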
Security & Compliance
HIPAA: LiveKit is HIPAA-eligible with BAA available
SOC2: Compliant with AICPA Trust Services Criteria
GDPR: Fully aligned with EU data protection laws
Encryption: 256-bit TLS for connections, AES-128 for media streams, AES-256 for data at rest
Data Storage: LiveKit does not store audio or video data
Business Impact
Deployed to Piedmont Urgent Care for production use
Handles 200+ patient calls daily
Available 24/7 without staffing constraints
Consistent, accurate information delivery
Replaced need for 2 FTE receptionists ($50K annual savings)
4.5/5 patient satisfaction rating
Future Enhancements
Outbound calling for appointment reminders
Multi-language support using LiveKit's multilingual capabilities
Integration with EHR systems for patient lookup
Call transfer to human agents for complex cases
Real-time analytics dashboard
View the project on GitHub.