Huawei Ascend 950PR: China's 1.56 Petaflop Answer to Nvidia

Huawei's new AI chip delivers 2.8x the FP4 performance of Nvidia's H20 with 112GB of in-house HBM. ByteDance has already committed $5.6 billion in orders.

ByteDance just committed $5.6 billion to buy a chip that most of the Western AI industry has never heard of.

The Huawei Ascend 950PR delivers 1.56 petaflops of AI compute with 112GB of memory, 1.4TB/s of memory bandwidth, and a 600W TDP. Huawei claims 2.8x the FP4 performance of Nvidia’s H20, the chip Nvidia designed specifically for the Chinese market under US export controls. The 950PR is also the only product in China that supports FP4 low-precision inference.
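A few ratios follow directly from those headline numbers. The sketch below is only arithmetic on the figures quoted above, which are vendor claims rather than independent measurements. It assumes decimal units (1GB = 10^9 bytes, 1TB/s = 10^12 bytes per second), and the function and key names are illustrative, not part of any Huawei tooling.

```python
# Headline figures as quoted in the article (vendor claims).
PEAK_FLOPS = 1.56e15             # 1.56 petaflops of AI compute
MEMORY_BYTES = 112e9             # 112 GB, assuming decimal gigabytes
BANDWIDTH_BYTES_PER_S = 1.4e12   # 1.4 TB/s, assuming decimal terabytes
TDP_WATTS = 600
CLAIMED_SPEEDUP_VS_H20 = 2.8     # Huawei's claimed FP4 advantage


def derived_metrics():
    """Return ratios implied by the quoted specs (peak values only)."""
    return {
        # H20 throughput implied by taking the 2.8x claim at face value.
        "implied_h20_pflops": PEAK_FLOPS / CLAIMED_SPEEDUP_VS_H20 / 1e15,
        # Peak operations available per byte moved from memory.
        "flops_per_byte": PEAK_FLOPS / BANDWIDTH_BYTES_PER_S,
        # Time to read the entire memory once at full bandwidth.
        "full_memory_sweep_ms": MEMORY_BYTES / BANDWIDTH_BYTES_PER_S * 1e3,
        # Peak compute per watt of rated TDP.
        "tflops_per_watt": PEAK_FLOPS / TDP_WATTS / 1e12,
    }


if __name__ == "__main__":
    for name, value in derived_metrics().items():
        print(f"{name}: {value:.2f}")
```

Taken at face value, the 2.8x claim implies an H20 baseline of roughly 0.56 petaflops, and the chip can sweep its full 112GB in about 80 milliseconds. These are peak figures; sustained performance on real workloads will be lower.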

The Specs

The 950PR is built for prefill and recommendation workloads, the inference tasks that need large, fast memory rather than raw sequential compute. It ships inside the Atlas 350 accelerator module and uses Huawei’s in-house high-bandwidth memory, branded HiBL 1.0. That last detail matters: with memory developed in-house, Huawei is presenting the 950PR as the product of a domestic supply chain, from logic to memory.

Additional performance claims include a 4x improvement in memory access efficiency for smaller operators and a 60% boost in multimodal throughput over the previous Ascend generation.

Huawei plans to ship 750,000 units in 2026. Between ByteDance’s order and reported demand from Alibaba, a significant portion of that production is already spoken for.
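To put the shipment target in perspective, the per-chip figures can be multiplied out across the planned volume. This is a back-of-the-envelope sketch, assuming all 750,000 units ship and run at the quoted peak. It counts chip TDP only, ignoring hosts, networking, and cooling, and uses decimal units. The function name is illustrative.

```python
# Figures quoted in the article (vendor claims and stated plans).
PLANNED_UNITS = 750_000
PEAK_FLOPS_PER_CHIP = 1.56e15    # 1.56 petaflops
MEMORY_BYTES_PER_CHIP = 112e9    # 112 GB, assuming decimal gigabytes
TDP_WATTS_PER_CHIP = 600


def fleet_totals(units=PLANNED_UNITS):
    """Aggregate peak figures for a given number of chips."""
    return {
        # Total peak compute, in exaflops (1e18 operations per second).
        "peak_exaflops": units * PEAK_FLOPS_PER_CHIP / 1e18,
        # Total accelerator memory, in petabytes (1e15 bytes).
        "memory_petabytes": units * MEMORY_BYTES_PER_CHIP / 1e15,
        # Chip power alone at rated TDP, in megawatts.
        "chip_power_megawatts": units * TDP_WATTS_PER_CHIP / 1e6,
    }


if __name__ == "__main__":
    for name, value in fleet_totals().items():
        print(f"{name}: {value:,.0f}")
```

On these assumptions, a full year of shipments would represent about 1,170 exaflops of peak compute, 84 petabytes of accelerator memory, and 450 megawatts of chip power before any datacenter overhead.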

The Nvidia Problem

US export controls have turned China’s AI chip market into a parallel ecosystem. Nvidia’s H20 was the compliant option — a deliberately limited chip that stayed under the export threshold. The 950PR’s 2.8x FP4 advantage over the H20 suggests Huawei has closed the gap faster than Washington anticipated.

Measured against Nvidia’s unrestricted hardware, the picture changes. Against the B200, Nvidia’s current flagship and a chip Chinese companies can’t buy, the 950PR is less competitive on absolute throughput. But Chinese AI labs don’t need to beat the B200. They need to beat the chips they can actually get, and the H20 is the benchmark that matters inside China.

Nvidia’s Vera Rubin architecture, announced at GTC 2026, pushes the frontier further, and it raises the stakes of the export control debate along with it. Every generation of restricted Nvidia hardware that Chinese labs can’t access makes domestic alternatives like the 950PR more attractive, even if the absolute performance gap remains.

Who’s Buying

ByteDance’s $5.6 billion commitment is the headline order, but the customer base is broader. Alibaba has secured allocations, and the Chinese government’s push for semiconductor self-sufficiency provides both funding and regulatory tailwind.

The 950PR isn’t trying to be the world’s best AI chip. It’s trying to be the best AI chip available in China. At a claimed 2.8x the FP4 performance of the only Nvidia product Chinese companies can legally buy, that’s a competitive position that grows stronger with every new round of export restrictions.