ElevenLabs Gives IBM's AI Agents a Voice — in 70 Languages
IBM watsonx Orchestrate integrates ElevenLabs TTS and STT, bringing 10,000+ voices across 70 languages to enterprise AI agents — with PCI compliance and HIPAA-ready data handling built in.
Most enterprise voice bots still sound like they’re reading a script through a tin can. Rigid call flows, robotic intonation, the unmistakable pause while the system figures out what you said — it’s the reason people mash zero the moment they hear an automated greeting. ElevenLabs and IBM think they can fix that, and as of March 25, they’re trying.
The two companies announced an integration that pipes ElevenLabs’ Text to Speech and Speech to Text directly into IBM watsonx Orchestrate, the agentic AI orchestration platform that banks, insurers, healthcare providers, and government agencies use to automate workflows. The pitch: AI agents that don’t just process language but actually sound human doing it.
What’s Actually in the Box
Developers building agents on watsonx Orchestrate now get access to ElevenLabs’ full library — 10,000-plus voices spanning 70 languages with regional accents. The TTS and STT steps attach directly to orchestration pipelines, which means no separate API integration, no duct-taping a third-party voice service onto your agent stack.
That matters more than it might sound. Enterprise voice deployments are notoriously fragile: every additional API call adds latency, and every separate vendor widens the compliance surface. By making ElevenLabs native to the platform, IBM is cutting out the middleware layer that typically turns a demo-ready voice agent into a deployment nightmare.
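To see the middleware hop the native integration removes, here is a minimal sketch of what a bolted-on voice layer looks like: a separate HTTP call to ElevenLabs' public TTS endpoint for every utterance the agent speaks. The endpoint path follows ElevenLabs' REST API; the payload shape, voice ID, and helper names are illustrative assumptions, not IBM's or ElevenLabs' actual integration code.

```python
import json
import urllib.request

# Public ElevenLabs TTS endpoint; the voice_id and payload shape here are illustrative.
ELEVENLABS_TTS_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"

def build_tts_request(text: str, voice_id: str, api_key: str) -> dict:
    """Assemble the standalone HTTP call a bolted-on integration would make
    for every utterance the agent speaks."""
    return {
        "url": ELEVENLABS_TTS_URL.format(voice_id=voice_id),
        "headers": {
            "xi-api-key": api_key,              # a second vendor secret to rotate and audit
            "Content-Type": "application/json",
        },
        "body": json.dumps({"text": text}).encode(),
    }

def speak(text: str, voice_id: str, api_key: str) -> bytes:
    """One extra network round trip per utterance -- the middleware hop
    a native orchestration step avoids."""
    req = build_tts_request(text, voice_id, api_key)
    http_req = urllib.request.Request(req["url"], data=req["body"], headers=req["headers"])
    with urllib.request.urlopen(http_req) as resp:
        return resp.read()  # raw audio bytes
```

Every line of that sketch is something a native platform step absorbs: one fewer credential to manage, one fewer network hop per spoken sentence, one fewer vendor in the audit scope.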
The compliance angle is the real selling point for the target audience. PCI compliance for payment processing, Zero Retention Mode designed for HIPAA-compliant data handling, and configurable data residency — these aren’t features consumer voice products worry about. They’re table stakes for any organization that handles credit card numbers over the phone or discusses patient information with a virtual agent.
Why IBM Needs This
IBM has been on a watsonx Orchestrate expansion tear. In the last two months alone, they’ve shipped AI features for the Masters Tournament fan experience, partnered with Deepgram for voice capabilities, launched a new FlashSystem portfolio with agentic AI services, and opened an AI impact accelerator. The ElevenLabs deal continues a clear pattern: make watsonx the connective tissue for enterprise AI, regardless of whose models or tools sit at the edges.
The Deepgram partnership from February is the most obvious comparison. Both deals add voice to watsonx Orchestrate — but ElevenLabs brings something Deepgram doesn’t: the most recognizable name in AI voice. ElevenLabs’ Eleven v3 engine set the bar for emotionally expressive synthetic speech, and their enterprise agent deployments have already handled over 33 million conversations this year across 2 million customer-created agents.
Who This Is For
Government agencies jump out as the clearest use case. A federal or state service desk that needs to support 70 languages isn’t going to hire bilingual call center staff for every one of them. An AI agent that handles tier-one inquiries about healthcare enrollment, tax questions, or civic services — in Tagalog, Haitian Creole, or Urdu with regional inflection — is the kind of thing that justifies an IBM contract.
Banks and insurance companies are the other obvious market. PCI compliance means the voice agent can handle payment card data during a call without violating card network rules. Zero Retention Mode means conversation audio can be processed and discarded without hitting a HIPAA-regulated data store. These aren’t hypothetical use cases — they’re the exact scenarios where enterprises currently default to human agents because the compliance risk of automation is too high.
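The process-and-discard pattern Zero Retention Mode enables can be sketched in a few lines. The `enable_logging` request flag is an assumption based on ElevenLabs' public API (its request-level retention opt-out); the `handle_call_turn` helper and its callbacks are hypothetical, standing in for whatever STT and agent-logic steps a watsonx Orchestrate pipeline would wire together.

```python
from urllib.parse import urlencode

def zero_retention_url(base_url: str) -> str:
    """Append the request-level retention opt-out (enable_logging=false,
    assumed per ElevenLabs' public API) so the provider keeps no copy."""
    return f"{base_url}?{urlencode({'enable_logging': 'false'})}"

def handle_call_turn(audio_in: bytes, transcribe, respond) -> str:
    """Process one voice turn, then drop the audio: the raw recording never
    reaches a persistent store on the caller's side either."""
    transcript = transcribe(audio_in)   # STT step (made via a zero-retention request)
    reply = respond(transcript)         # agent logic only ever sees text
    del audio_in                        # release the raw audio reference immediately
    return reply
```

The design point is that compliance lives at two layers: the provider discards the audio because of the request flag, and the pipeline discards it because no step after transcription ever holds the bytes.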
The Bigger Voice-Agent Picture
World’s AgentKit proved there’s demand for trustworthy AI agents in commerce. Anthropic’s Claude Marketplace showed that developers want clean, low-friction agent deployment. The ElevenLabs-IBM deal attacks a different part of the stack — the literal voice of the agent — but it points at the same trend: 2026 is the year AI agents stop being text-only experiments and start handling real customer interactions with real money on the line.
ElevenLabs co-founder Mati Staniszewski put it bluntly: they’re helping organizations replace robotic interactions with agents people actually want to talk to. Whether watsonx Orchestrate can deliver on that promise at enterprise scale — thousands of concurrent calls, dozens of languages, millisecond latency — is the engineering question that matters. The 70-language, 10,000-voice spec sheet is impressive. The proof will be in the hold music it replaces.