ElevenLabs Gives IBM's AI Agents a Voice — in 70 Languages
IBM watsonx Orchestrate integrates ElevenLabs TTS and STT, bringing 10,000+ voices across 70 languages to enterprise AI agents — with PCI compliance and HIPAA-ready data handling built in.
51 articles
Shopify flipped Agentic Storefronts on by default for every eligible merchant. Your products now surface in ChatGPT, Google AI Mode, and Microsoft Copilot — no app install, no setup, no opt-in required.
Cursor's new coding agent costs one-tenth the price of frontier models and scores 61.7% on Terminal-Bench. Then developers found the model ID — and it pointed straight to Moonshot AI's Kimi K2.5.
Microsoft's second-gen image model jumps from top-10 to #3 on the Arena.ai leaderboard. Photorealism, in-image text, and rich scene generation — rolling out to Copilot and Bing now.
MiniMax's M2.7 participated in its own training loop — running 100+ autonomous iteration rounds on its own scaffold, achieving a 30% performance boost. SWE-Pro hits 56.22%.
Xiaomi drops three frontier models at once — an omni-modal agent, a 1T-parameter coding model that topped OpenRouter under a fake name, and a TTS model that can sing. All at a fraction of Claude's price.
Stitch graduates from Google Labs with a complete redesign: infinite canvas, voice design agent, instant prototypes, and DESIGN.md — an agent-friendly file for sharing design systems.
Cowork's new Dispatch feature lets you send Claude a task from your phone and pick up the result on your desktop. One persistent thread, full access to your local files.
Google's Gemini API now lets developers mix function calling with built-in tools like Search and Maps in a single call. Context circulation and spend caps also ship.
Baidu is integrating OpenClaw's agentic AI framework into Xiaodu smart speakers, moving toward multi-step voice-controlled task execution as its core business erodes.
World's AgentKit lets AI agents carry cryptographic proof of the human behind them, using World ID and zero-knowledge proofs. Built for agentic commerce at scale.
Nvidia is forming an 8-lab coalition to build open frontier models on DGX Cloud. Mira Murati's Thinking Machines Lab is a founding member.
Alibaba's Wukong platform coordinates multiple AI agents for complex business workflows. Available in closed beta via DingTalk and a desktop app, powered by Qwen.
Google's Gemini feature that connects to your Gmail, Photos, Calendar, and Drive is dropping the paywall for all US users. Five prompts a day, 32K context.
GPT-5.4 mini hits 54.4% on SWE-Bench at $0.75/M tokens. Nano goes cheaper still. Both unlock agentic tool calling at the small-model tier.
Seven chips in production, a purpose-built agentic CPU, 256-LPU inference racks, and $1 trillion in orders. Nvidia's GTC 2026 keynote rewrites the datacenter playbook.
Zhipu AI's closed-source GLM-5-Turbo targets the OpenClaw agent ecosystem with 744B parameters, 200K context, and pricing that undercuts Claude by 5x. Shares jumped 16%.
Microsoft launches Copilot Health, connecting wearables, health records from 50,000+ providers, and lab results into a single AI-powered health companion — with a medical superintelligence roadmap behind it.
Anthropic ships inline visualizations for Claude — interactive charts, diagrams, and visual explanations rendered directly in chat using HTML and SVG. Available to all users, including free tier.
Google Maps gets its biggest upgrade in a decade: Ask Maps turns location search into a Gemini-powered conversation, while Immersive Navigation rebuilds turn-by-turn directions in 3D.
Upload a PNG, get back separated layers with live editable text. Canva's Magic Layers decomposes flat images into full design projects — available now on Free, Pro, and Enterprise.
Perplexity launches Personal Computer — an always-on AI agent running on a dedicated Mac mini. $200/month for Max subscribers, with 10,000 monthly credits and remote control from any device.
Nvidia's Nemotron 3 Super is a 120B hybrid Mamba-Transformer MoE model that activates only 12B parameters per token. Open weights, 1M context, 5x faster than its predecessor.
Cloudflare's new /crawl endpoint turns any website into Claude-ready markdown with one API call. Here's how to set it up, from API tokens to RAG pipelines.
Paper launches its desktop app with MCP server support, turning a web-based design tool into something AI agents can actually manipulate. Free tier includes 100 MCP tool calls per week.
Google rolls out deep Gemini integration across Workspace — Docs drafts from your Drive history, Sheets that build themselves from a description, Slides that match your deck's theme, and cross-app search in Drive.
Nvidia unveils NemoClaw ahead of GTC 2026 — an open-source framework for deploying autonomous AI agents that can plan, reason, and execute multi-step tasks inside corporate systems. Built on Nemotron models and NIM.
Anthropic launches Code Review for Claude Code — a multi-agent system that analyzes pull requests for bugs, logic errors, and security vulnerabilities. Detects issues in 84% of large PRs with under 1% false positives.
Google launches Gemini Embedding 2 — the first multimodal embedding model in the Gemini API, mapping text, images, video, audio, and documents into a single vector space. Available now in public preview.
Andrej Karpathy open-sources autoresearch — a 630-line repo where an AI agent autonomously runs LLM training experiments, modifies its own code, and accumulates improvements overnight on a single GPU.
Apple releases Pico-Banana-400K, a 400,000-image open dataset for text-guided image editing — built from real photos using Google's Nano Banana, scored by Gemini, and available on GitHub under CC BY-NC-ND 4.0.
Qualcomm's first post-Arduino-acquisition board pairs a 40 TOPS Dragonwing IQ8 NPU with an STM32H5 microcontroller for real-time actuation. It costs under $300 and ships in Q2 2026.
Luma AI releases Uni-1, a unified model that merges visual understanding and image generation in one decoder-only transformer — outperforming Nano Banana 2 and GPT Image 1.5 on key benchmarks.
Microsoft brings Anthropic's Claude Cowork technology into M365 Copilot, launches Agent 365 at $15/user, and bundles everything into a $99 E7 Frontier Suite — while SaaS stocks reel from a $285 billion selloff.
Anthropic launches Claude Marketplace, an enterprise store for third-party AI software from Snowflake, Harvey, Replit, and others — with zero commission and the option to pay using committed API spend.
Helios is a 14B-parameter open-weight video generation model that hits 19.5 FPS on a single H100 GPU, generates minute-long videos, and supports text, image, and video inputs — all under Apache 2.0.
OpenAI launches Codex Security in research preview — an autonomous agent that discovers, validates, and patches code vulnerabilities. It already found 14 CVEs in OpenSSH, GnuTLS, and Chromium.
OpenAI releases GPT-5.4 in standard, Thinking, and Pro variants — its most capable model yet, with native computer-use mode, 1 million token context, 33% fewer errors, and direct Excel and Google Sheets integration.
Google open-sources gws, a Rust-built CLI that gives developers and AI agents unified access to every Workspace API — Drive, Gmail, Calendar, Sheets, Docs, and more — from a single command line.
Google's NotebookLM launches Cinematic Video Overviews — a feature that transforms uploaded documents into bespoke, animated video summaries using a combination of Gemini 3, Nano Banana Pro, and Veo 3.
Google rolls out Canvas in AI Mode to all US users — turning Search into a creative workspace where you can write documents, generate code, and build interactive apps directly from a search query.
OpenAI open-sources Symphony, a system that monitors project boards like Linear, spawns coding agents to implement tasks, and submits pull requests with CI verification and video walkthroughs.
Apple announces MacBook Pro with M5 Pro and M5 Max — built on a new Fusion Architecture that combines two 3nm dies into one SoC, delivering up to 4x AI performance and the world's fastest CPU core.
Google releases Gemini 3.1 Flash Lite — a model built for speed and cost efficiency. It's 2.5x faster than Gemini 2.5 Flash, costs a fraction of Pro, and still handles 1M tokens of context.
OpenAI rolls out GPT-5.3 Instant — a major update to ChatGPT's default model that addresses user complaints about overly cautious, preachy responses while improving factual accuracy and web search integration.
Google releases Nano Banana 2 — technically Gemini 3.1 Flash Image — bringing 4K resolution, character consistency, and real-time web grounding to AI image generation across the Gemini app, Google Search, and the developer API.
Google DeepMind launches Gemini 3.1 Pro with a new three-tier thinking system, 1M token context window, and benchmark scores that leave the competition behind — including a 77.1% on ARC-AGI-2.
Google DeepMind releases Lyria 3, its most advanced AI music model yet. Available inside the Gemini app, it generates 30-second tracks with vocals, lyrics, and instruments from a simple text prompt.
Alibaba's Qwen team releases Qwen 3.5 — a 397-billion-parameter mixture-of-experts model under Apache 2.0 that activates only 17B parameters per token, supports 201 languages, and claims competitive performance against GPT-5.2 and Claude Opus 4.5.
ByteDance launches Seedance 2.0, an AI video generator that produces cinematic clips with synchronized audio, multi-shot sequences, and realistic physics — all from a single text prompt.
Kuaishou launches Kling 3.0 with native 4K at 60fps, multi-shot storyboarding with up to 6 camera cuts, character consistency across generations, and synchronized audio output.