Alibaba Releases Qwen 3.5: A 397B-Parameter Open-Source Model That Punches Above Its Weight
Alibaba's Qwen team releases Qwen 3.5 — a 397-billion-parameter mixture-of-experts model under Apache 2.0 that activates only 17B parameters per token, supports 201 languages, and claims competitive performance against GPT-5.2 and Claude Opus 4.5.
Alibaba's Qwen team has released Qwen 3.5, a new family of large language models headlined by a 397-billion-parameter flagship that activates only 17 billion parameters per forward pass. Released on February 16, 2026, under the Apache 2.0 license, the model represents one of the most capable open-weight LLMs available today — and its benchmark results put it in direct competition with proprietary models from OpenAI, Anthropic, and Google that cost significantly more to run.
The release includes the flagship Qwen3.5-397B-A17B, medium-sized variants (122B-A10B and 27B) optimized for agentic workloads, and, as of March 2, a family of small models (ranging from 0.8B to 9B parameters) designed for on-device deployment.
Architecture and Design Philosophy
Qwen 3.5 is built on a sparse Mixture-of-Experts (MoE) architecture — a design pattern that has become the dominant approach for training models that are large in total parameter count but efficient at inference time. The flagship 397B-A17B model routes each token through a subset of expert networks, activating roughly 17 billion parameters out of 397 billion total. This means the model stores a vast amount of knowledge across its parameter space but incurs computational costs closer to those of a 17B-parameter dense model during actual generation.
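The sparse-routing idea can be illustrated with a toy example. This is not Qwen's actual router — the expert count, gating function, and top-k value below are invented for clarity — but it shows the core mechanism: every token is scored against all experts, yet only the top-k expert networks actually execute.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route one token's hidden state x through its top-k experts.

    gate_w:  (d, n_experts) router weights
    experts: list of (w_in, w_out) weight pairs, one MLP per expert
    """
    logits = x @ gate_w                     # router score per expert
    top_k = np.argsort(logits)[-k:]         # indices of the k best experts
    weights = np.exp(logits[top_k])
    weights /= weights.sum()                # softmax over the selected experts only
    out = np.zeros_like(x)
    for w, idx in zip(weights, top_k):
        w_in, w_out = experts[idx]
        out += w * (np.maximum(x @ w_in, 0) @ w_out)  # weighted sum of expert MLPs
    return out

rng = np.random.default_rng(0)
d, d_ff, n_experts = 8, 16, 4               # toy sizes, not Qwen 3.5's dimensions
x = rng.standard_normal(d)
gate_w = rng.standard_normal((d, n_experts))
experts = [(rng.standard_normal((d, d_ff)), rng.standard_normal((d_ff, d)))
           for _ in range(n_experts)]
y = moe_forward(x, gate_w, experts, k=2)
print(y.shape)  # (8,) — only 2 of the 4 experts ran for this token
```

The memory-versus-compute trade is visible even in the toy: all four experts' weights must be stored, but each token pays the FLOP cost of only two.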
The architecture incorporates multi-token prediction and an attention mechanism inherited from Qwen3-Next, specifically designed to reduce memory pressure at very long context lengths. The open-weight version supports a 256K-token context window, while the hosted Qwen3.5-Plus variant extends this to 1 million tokens — a context length that enables processing of entire codebases, lengthy legal documents, or multi-hour transcripts in a single pass.
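One reason very long contexts strain memory is the key-value cache, which grows linearly with sequence length. A back-of-envelope estimate — using purely hypothetical layer counts and head dimensions, since Qwen 3.5's configuration is not published in these terms — shows why attention designs that shrink the KV footprint matter at 256K and 1M tokens:

```python
def kv_cache_gib(context_len, n_layers, n_kv_heads, head_dim, bytes_per=2):
    """Memory for the K and V caches across all layers, in GiB (fp16 by default)."""
    return 2 * context_len * n_layers * n_kv_heads * head_dim * bytes_per / 2**30

# Hypothetical transformer config, for illustration only:
cfg = dict(n_layers=60, n_kv_heads=8, head_dim=128)

print(f"256K context: {kv_cache_gib(256_000, **cfg):.1f} GiB of KV cache")
print(f"1M   context: {kv_cache_gib(1_000_000, **cfg):.1f} GiB of KV cache")
```

Under these made-up numbers the cache alone runs to tens of GiB at 256K tokens and roughly four times that at 1M — which is why reduced-memory attention is a prerequisite for the hosted 1M-token variant rather than an optimization.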
The model's vocabulary has been expanded to 250,000 tokens, up from 150,000 in prior Qwen generations. Language support now covers 201 languages and dialects, a substantial increase from the 119 languages supported by Qwen 3. This breadth of multilingual capability is particularly relevant for Alibaba's global cloud customer base, where models must perform well across diverse linguistic contexts without requiring per-language fine-tuning.
Benchmark Performance
The headline numbers are striking. According to Alibaba's published evaluations, Qwen3.5-397B-A17B outperforms Qwen3-Max — a model with over a trillion parameters — across multiple reasoning and coding benchmarks. This is an unusual result: a model with roughly a third of the total parameters beating its much larger predecessor, suggesting that architectural improvements and training efficiency gains are delivering more value than raw parameter count.
On external benchmarks, the model claims competitive results against GPT-5.2, Claude Opus 4.5, and Gemini 3 Pro. The SWE-bench Verified score of 76.4 percent places it in the same tier as frontier proprietary models for real-world software engineering tasks, while a LiveCodeBench v6 score of 83.6 represents near-human performance on competitive programming problems.
Perhaps more significant than the raw benchmark numbers is the efficiency story. At 256K context lengths, Qwen 3.5 decodes 19 times faster than Qwen3-Max and 7.2 times faster than Qwen 3's 235B-A22B model. Alibaba claims the model is 60 percent cheaper to run than its predecessor and can handle eight times the concurrent load — metrics that matter enormously for production deployments where cost per token and throughput are the primary constraints.
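Taken at face value, the claimed ratios translate into concrete serving numbers. In the sketch below, only the speedup and cost-reduction figures come from the announcement; the baseline price and latency are hypothetical placeholders, not published rates.

```python
# Claimed ratios from the announcement:
decode_speedup_vs_max = 19      # at 256K context, vs Qwen3-Max
cost_reduction = 0.60           # "60 percent cheaper to run"

# Hypothetical baseline numbers, for illustration only:
baseline_cost_per_m_tokens = 10.0   # $/1M output tokens (placeholder, not a real price)
baseline_latency_s = 100            # decode time for a long answer at 256K context

qwen35_cost = baseline_cost_per_m_tokens * (1 - cost_reduction)
qwen35_latency = baseline_latency_s / decode_speedup_vs_max

print(f"Cost:    ${qwen35_cost:.2f} per 1M tokens (vs ${baseline_cost_per_m_tokens:.2f})")
print(f"Latency: {qwen35_latency:.1f} s (vs {baseline_latency_s} s)")
```

The point of the exercise: at long context, the decode speedup dominates — a 19x latency reduction changes which interactive workloads are feasible at all, while the cost reduction compounds on top of it.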
Native Multimodal Fusion
Unlike models that bolt vision capabilities onto a language backbone as a post-training step, Qwen 3.5 fuses text, image, and video tokens from the very first pretraining stage through what Alibaba calls "early fusion." The practical implication is that the model does not treat visual understanding as a separate skill grafted onto language capability; instead, cross-modal reasoning is baked into the model's fundamental representations.
This approach enables capabilities like generating code from a screenshot of a UI mockup, answering questions that require jointly reasoning over text and images, or analyzing video content with temporal coherence. Whether early fusion produces meaningfully better multimodal performance than the adapter-based approaches used by competitors is an empirical question that independent benchmarks will need to validate, but the architectural choice is technically sound and aligns with the direction that several leading research groups have been pursuing.
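In pseudocode terms, early fusion amounts to placing text, image, and video tokens in one shared stream from the first pretraining step, rather than routing images through a separately trained adapter. A minimal sketch of the data layout — the token structure here is invented for illustration and does not reflect Qwen's actual tokenization:

```python
from dataclasses import dataclass

@dataclass
class Token:
    modality: str    # "text", "image", or "video"
    payload: object  # token id for text; patch embedding for image/video

def build_fused_sequence(segments):
    """Flatten (modality, items) segments into the single interleaved
    stream an early-fusion model trains on end to end."""
    seq = []
    for modality, items in segments:
        seq.extend(Token(modality, item) for item in items)
    return seq

# A caption, an image as four patch embeddings, then more text:
seq = build_fused_sequence([
    ("text",  [101, 2023, 2003]),
    ("image", ["patch0", "patch1", "patch2", "patch3"]),
    ("text",  [1037, 7953, 102]),
])
print(len(seq), {t.modality for t in seq})  # 10 tokens spanning both modalities
```

Because attention operates over the whole interleaved sequence, text tokens can attend directly to image patches during pretraining — the architectural property that distinguishes early fusion from bolting a vision encoder onto a finished language model.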
The Full Model Family
Qwen 3.5 is not a single model but a graduated family designed to cover the full spectrum of deployment scenarios. The flagship 397B-A17B targets cloud-based inference for the most demanding tasks. The medium-tier models — the 122B-A10B and 27B variants — are explicitly optimized for agentic applications, scenarios where a model must plan, reason, and execute multi-step workflows with tool use and environment interaction.
The small model series, released on March 2, includes variants from 0.8B to 9B parameters designed for on-device deployment. These models target smartphones, edge devices, and latency-sensitive applications where sending requests to a cloud endpoint is either too slow or impractical. The 9B variant, in particular, represents an interesting option for developers who need a capable model that can run on consumer GPUs.
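Whether a model fits on a consumer GPU comes down to simple arithmetic on weight memory. A rough estimate for the small-model sizes mentioned above — weights only, ignoring KV cache and activations, with quantization levels chosen for illustration:

```python
def weight_mem_gib(n_params_billion, bits):
    """Approximate memory for model weights alone, in GiB."""
    return n_params_billion * 1e9 * bits / 8 / 2**30

for n in (0.8, 4, 9):
    fp16 = weight_mem_gib(n, 16)
    int4 = weight_mem_gib(n, 4)
    print(f"{n:>4}B params: {fp16:5.1f} GiB at fp16, {int4:4.1f} GiB at int4")
```

At 4-bit quantization, the 9B variant's weights come to roughly 4 GiB, comfortably within the 8–12 GB of VRAM on common consumer cards — consistent with the claim that it is a practical option for local deployment.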
Open-Source Strategy and Competitive Implications
The Apache 2.0 license on the 397B open-weight model is significant. It allows commercial use without restriction, meaning any organization can download, fine-tune, and deploy the model without licensing fees or usage-based charges to Alibaba. The hosted Qwen3.5-Plus variant, available through Alibaba Cloud, offers the extended 1M-token context window and managed infrastructure for organizations that prefer not to handle their own deployment.
This dual distribution strategy — open weights for those who want control, hosted API for those who want convenience — mirrors the approach that Meta has taken with Llama and that Mistral has pursued with its model family. For the broader ecosystem, a model that approaches GPT-5.2-class performance while being freely available under a permissive license has significant implications for competition, pricing, and the accessibility of frontier AI capabilities.
Alibaba has also unified its AI model branding under the Qwen name, signaling that the company views its model family as a core strategic asset rather than a research side project. As the Chinese AI landscape becomes increasingly competitive — with DeepSeek, Baidu, and others releasing their own frontier models — Qwen 3.5 represents Alibaba's bid to remain at the forefront of both open-source AI development and commercial cloud AI services.