Krisp launches VIVA 2.0, introducing voice infrastructure for real-world AI agents

India, May 7: Krisp today launched Krisp VIVA 2.0, the voice AI infrastructure layer for voice agents, IVRs, and conversational AI. The release introduces a new generation of small, real-time models that lower word error rate (WER), predict when users finish speaking, classify interruptions, and read perceptual signals like synthetic speech, gender, and accent, setting a new benchmark for how voice agents handle audio in production.

Voice agent usage grew 9x in 2025, yet most voice agents still fail in the same predictable ways the moment they leave a demo room. Background voices and noises push speech-to-text word error rates from 5% to over 30%. Voice activity detection misfires on background voices; bots can ignore real interruptions or hallucinate them. And on telephony, the agent’s own voice can loop back through the mic and trigger self-interruption.
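For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the STT output, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition; the function name and the sample sentences are made up for this example and do not come from Krisp.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Illustrative transcripts: one substitution ("my" -> "the") and one deletion ("today").
example_wer = word_error_rate(
    "please cancel my order today",
    "please cancel the order",
)
```

Two errors against a five-word reference give a WER of 40%, which shows how a few misheard words in a noisy call can move the figure sharply.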

Voice AI systems today are built on speech-to-text (STT), large language models (LLMs), and text-to-speech (TTS). What’s been missing is a layer to handle real-world audio and conversational dynamics before those systems engage.

VIVA fills that gap to ensure AI agents function in messy, real-world environments.

Krisp’s VIVA SDK runs server-side directly in each customer’s audio pipeline before STT, improving reliability across the entire stack.
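The placement described here, audio cleanup ahead of STT, can be sketched as a chain of stages. Everything in this example is hypothetical: `noise_gate` is a toy stand-in for a cleanup stage and `fake_stt` is a placeholder for a transcription call. Neither reflects the VIVA SDK's actual interface, which the announcement does not describe.

```python
from typing import Callable, List

Frame = List[float]               # one chunk of mono samples in [-1.0, 1.0]
Stage = Callable[[Frame], Frame]  # any pre-STT processing step


def noise_gate(threshold: float) -> Stage:
    """Toy cleanup stage (hypothetical): zero out samples quieter than threshold."""
    def stage(frame: Frame) -> Frame:
        return [s if abs(s) >= threshold else 0.0 for s in frame]
    return stage


def fake_stt(frame: Frame) -> str:
    """Placeholder for a real STT call: reports how many samples survived."""
    voiced = sum(1 for s in frame if s != 0.0)
    return f"{voiced} voiced samples"


def run_pipeline(frame: Frame, pre_stt: List[Stage], stt: Callable[[Frame], str]) -> str:
    """Apply every pre-STT stage in order, then hand the result to STT."""
    for stage in pre_stt:
        frame = stage(frame)
    return stt(frame)


result = run_pipeline([0.5, 0.01, -0.4, 0.02], [noise_gate(0.1)], fake_stt)
```

The point of the sketch is only the ordering: whatever the cleanup stage does, STT sees its output rather than the raw microphone signal.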

What’s new in VIVA 2.0:

Turn Prediction v3: A new multilingual model that predicts end-of-turn from audio alone, with no transcription needed. It reacts quickly to real turn-ends while holding through mid-sentence pauses, giving low-latency responses without the agent cutting users off. It is small enough to run on standard CPUs, or locally on-device for robotics and conversational toys.
Interrupt Prediction v1: A first-of-its-kind audio-only classifier that predicts when a user intends to interrupt the agent (start-of-turn prediction). It distinguishes intent-to-take-the-floor from backchannel speech like “yes” or “mhm.” This differs from end-of-turn prediction, which detects when the user has finished speaking. Patent filed.
Signal Detectors: A new category of real-time audio models that give voice AI the perceptual cues humans use without thinking. Three models launching with VIVA 2.0:

TTS Detector: Detects synthetic speech in real time. Use case: an outbound voice AI agent calls a number and recognizes when an inbound voice AI agent or IVR picks up.
Accent Detector: Identifies the speaker’s accent so audio can be routed to the STT model best tuned for it, lifting transcription quality.
Gender Detector: Identifies speaker gender to enable personalized responses.

Voice Isolation v3: The world’s most widely used voice isolation model has been upgraded to deliver measurable improvements in downstream WER.

All models run on standard server CPUs, operate on audio input alone with no transcription required, and are bundled into existing VIVA pricing at no additional charge.
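How an agent might act on end-of-turn and interrupt predictions can be sketched as a small decision function. This is an assumption-heavy illustration: the score names, the 0-to-1 probability scale, and the thresholds are invented for the example, and the announcement does not say what the models actually output.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    KEEP_LISTENING = "keep_listening"  # user is mid-turn or pausing
    RESPOND = "respond"                # user has finished; agent may speak
    KEEP_SPEAKING = "keep_speaking"    # backchannel ("mhm"); do not stop
    STOP_SPEAKING = "stop_speaking"    # user wants the floor; yield


@dataclass
class Scores:
    end_of_turn: float  # hypothetical probability the user has finished speaking
    interrupt: float    # hypothetical probability the user intends to take the floor


def decide(agent_speaking: bool, scores: Scores,
           eot_threshold: float = 0.7,
           interrupt_threshold: float = 0.6) -> Action:
    """Pick the agent's next action from the two (hypothetical) scores."""
    if agent_speaking:
        # While the agent talks, only the interrupt score matters.
        if scores.interrupt >= interrupt_threshold:
            return Action.STOP_SPEAKING
        return Action.KEEP_SPEAKING
    # While the user talks, only the end-of-turn score matters.
    if scores.end_of_turn >= eot_threshold:
        return Action.RESPOND
    return Action.KEEP_LISTENING


# A mid-sentence pause: low end-of-turn score, so the agent waits.
pause_action = decide(agent_speaking=False, scores=Scores(end_of_turn=0.2, interrupt=0.0))
# A backchannel "mhm" while the agent talks: low interrupt score, so it continues.
backchannel_action = decide(agent_speaking=True, scores=Scores(end_of_turn=0.0, interrupt=0.1))
```

Keeping the two signals separate mirrors the distinction drawn in the list above: one answers "has the user finished?", the other answers "does the user want to cut in?".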

Krisp has spent over eight years solving real-world voice in production, first for human-to-human conversations and now for human-to-AI. That experience gives VIVA the depth of training data and field-tested reliability nothing else in the market can match.

The Krisp VIVA SDK processes more than 12 billion minutes of voice AI agent traffic a year and is embedded in over 130 voice AI products, including Daily, Vapi, LiveKit, Ultravox, Telnyx, the world’s leading AI labs, and the largest enterprise contact centers.

Platforms running VIVA report:

3.5x improvement in turn-taking accuracy
50% fewer dropped calls
30% higher customer satisfaction

“At scale, the biggest challenge in voice AI isn’t the model. It’s the quality of the signal going into it,” said David Casem, CEO of Telnyx. “Krisp addresses that at the source, which improves everything downstream from transcription to response.”

“Voice is becoming the primary interface between humans and AI,” said Robert Schoenfield, EVP of Licensing and Partnerships at Krisp. “Those conversations don’t happen in clean environments. They happen in the real world, shaped by noise and subtle human cues. VIVA brings that layer into the system, so voice agents can operate the way people actually speak.”

VIVA 2.0 is available now, and Krisp will showcase it live at Twilio Signal 2026, May 6-7 in San Francisco.

By team
