Voice & Turn-Taking Settings
These settings control how aggressively the agent detects user speech and decides when to respond.
Interruptions
- allow_interruptions: Allows the user to interrupt the agent mid-speech.
- min_interruption_duration: Minimum duration of detected user speech to count as an interruption.
Defaults:
- allow_interruptions=True
- min_interruption_duration=0.08
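To make the interaction between the two interruption settings concrete, here is a minimal sketch of how such a gate could work. The function name is_interruption is hypothetical, and the sketch assumes that min_interruption_duration is measured in seconds and that speech lasting exactly the minimum counts as an interruption; the actual implementation may differ on both points.

```python
def is_interruption(speech_duration: float,
                    allow_interruptions: bool = True,
                    min_interruption_duration: float = 0.08) -> bool:
    """Decide whether detected user speech should interrupt the agent.

    Hypothetical helper for illustration only. Durations are assumed
    to be in seconds.
    """
    # With interruptions disabled, user speech never stops the agent.
    if not allow_interruptions:
        return False
    # Very short bursts (coughs, clicks) fall below the minimum and are ignored.
    return speech_duration >= min_interruption_duration


print(is_interruption(0.05))                             # below the default minimum
print(is_interruption(0.20))                             # long enough to interrupt
print(is_interruption(0.20, allow_interruptions=False))  # interruptions disabled
```

Raising min_interruption_duration makes the gate stricter, so brief noises are less likely to stop the agent mid-sentence.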
Turn-taking / endpointing
These parameters control Voice Activity Detection (VAD) and determine when the agent decides the user has finished speaking:
- min_silence_duration: How long (in seconds) the user must be silent before the system considers they might be done speaking.
- activation_threshold: Confidence level (0.0-1.0) required to detect speech. Higher values reduce false positives from background noise.
- prefix_padding_duration: Amount of audio (in seconds) to capture before detected speech starts. Helps catch the beginning of words.
- min_endpointing_delay: Minimum time (in seconds) to wait before ending the user's turn.
- max_endpointing_delay: Maximum time (in seconds) to wait before forcefully ending the turn.
Defaults:
- min_silence_duration=2.0
- activation_threshold=0.4
- prefix_padding_duration=0.5
- min_endpointing_delay=0.45
- max_endpointing_delay=3.0
These defaults are optimized to handle natural conversation pauses and prevent the agent from cutting off users who are thinking or formulating their thoughts.
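As an illustration, the turn-taking parameters and their defaults could be grouped into a single validated settings object. This is a sketch only: the class name TurnTakingSettings and the validation rules (threshold within 0.0-1.0, non-negative durations, minimum delay not exceeding maximum delay) are assumptions derived from the descriptions above, not part of the product's API.

```python
from dataclasses import dataclass


@dataclass
class TurnTakingSettings:
    """Hypothetical container for the VAD / endpointing settings."""

    min_silence_duration: float = 2.0     # seconds
    activation_threshold: float = 0.4     # confidence, 0.0-1.0
    prefix_padding_duration: float = 0.5  # seconds
    min_endpointing_delay: float = 0.45   # seconds
    max_endpointing_delay: float = 3.0    # seconds

    def __post_init__(self) -> None:
        # The threshold is described as a confidence level between 0.0 and 1.0.
        if not 0.0 <= self.activation_threshold <= 1.0:
            raise ValueError("activation_threshold must be between 0.0 and 1.0")
        # All remaining fields are durations, so negative values make no sense.
        for name in ("min_silence_duration", "prefix_padding_duration",
                     "min_endpointing_delay", "max_endpointing_delay"):
            if getattr(self, name) < 0:
                raise ValueError(f"{name} must be non-negative")
        # Assumed rule: the minimum wait cannot exceed the maximum wait.
        if self.min_endpointing_delay > self.max_endpointing_delay:
            raise ValueError(
                "min_endpointing_delay must not exceed max_endpointing_delay")


settings = TurnTakingSettings(activation_threshold=0.65)
print(settings)
```

Validating at construction time surfaces a mistyped value (for example, a threshold of 65 instead of 0.65) before it reaches a live call.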
Practical guidance
- If the agent responds too quickly: Increase min_silence_duration or max_endpointing_delay.
- If the agent cuts off users mid-thought: Increase min_endpointing_delay and max_endpointing_delay.
- If users experience silence detection issues: Check whether the stream is closing prematurely by increasing max_endpointing_delay to 4.0-5.0 seconds.
- If the agent is interrupted too easily: Set allow_interruptions to False or increase min_interruption_duration.
- If speech is missed at the beginning: Increase prefix_padding_duration.
- For noisy environments: Increase activation_threshold to 0.65-0.7 to reduce false speech detection.
- For users who speak slowly or thoughtfully: Increase both min_silence_duration (to 1.0-1.2) and max_endpointing_delay (to 4.0-5.0).
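One way to apply this guidance is to keep the defaults in one place and layer small, named overrides on top. The sketch below assumes the settings are passed around as a plain dictionary; the preset names and the apply_presets helper are hypothetical, and the override values are simply picked from the ranges suggested above.

```python
# Defaults as listed in this section.
DEFAULTS = {
    "allow_interruptions": True,
    "min_interruption_duration": 0.08,
    "min_silence_duration": 2.0,
    "activation_threshold": 0.4,
    "prefix_padding_duration": 0.5,
    "min_endpointing_delay": 0.45,
    "max_endpointing_delay": 3.0,
}

# Hypothetical preset names; values come from the ranges in the guidance.
PRESETS = {
    "noisy_environment": {"activation_threshold": 0.7},
    "premature_stream_close": {"max_endpointing_delay": 4.0},
}


def apply_presets(*names: str) -> dict:
    """Return a copy of DEFAULTS with the named presets applied in order."""
    settings = dict(DEFAULTS)  # copy so the defaults are never mutated
    for name in names:
        settings.update(PRESETS[name])
    return settings


tuned = apply_presets("noisy_environment", "premature_stream_close")
print(tuned["activation_threshold"], tuned["max_endpointing_delay"])
```

Changing one parameter at a time and re-testing makes it easier to tell which adjustment actually fixed the symptom.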