Graphcore unveiled its third-generation intelligence processing unit (IPU), the first processor to be built using 3D wafer-on-wafer (WoW) technology.
Codenamed the Bow IPU, Graphcore’s new AI processor achieves up to 40% higher performance and 16% better power efficiency than the previous (non-WoW, but otherwise identical) product, launched in 2020.
“Wafer-on-wafer technology sets a direction in terms of where Graphcore is heading,” said Graphcore CEO Nigel Toon. “We’ve been working very closely with TSMC on this technology, developing this over the last two years. We’ve been in extensive production qualification over the last year with very detailed testing for reliability, and we’re now at the stage where this technology is ready for full volume production.”
Graphcore plans to sharply improve its price/performance by offering the new parts at the same price as the old ones. Customers can swap over to Bow IPUs without making any software changes, the company said.
Graphcore also announced that it will use future generations of WoW IPU to build a product it calls the Good Computer, an ultra-intelligence AI supercomputer product capable of 10 ExaFLOPS, in response to customer demand.
Graphcore bills the new Bow IPU as “the highest performance production AI processor in the world today.” Each Bow IPU chip offers 350 TeraFLOPS of mixed-precision AI compute. The processor has the same 1472 independent processor cores and the same 900MB in-processor SRAM as the previous generation Colossus Mk2 IPU chip, but it runs around 40% faster than its predecessor – 1.85 GHz instead of 1.325 GHz – hence the up to 40% performance improvement.
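The link between the clock figures and the headline number is simple arithmetic. The short sketch below checks it using only the two frequencies quoted above; the function and constant names are illustrative, and it assumes performance scales directly with clock speed, which real workloads will not always do.

```python
# Clock frequencies quoted in the article, in GHz.
MK2_CLOCK_GHZ = 1.325   # Colossus Mk2 IPU
BOW_CLOCK_GHZ = 1.85    # Bow IPU


def clock_uplift(old_ghz: float, new_ghz: float) -> float:
    """Return the fractional increase in clock speed (0.25 means 25%)."""
    return new_ghz / old_ghz - 1.0


# Assumption: with an identical core count and memory, peak throughput
# tracks the clock, so this is an upper bound on the speedup.
uplift = clock_uplift(MK2_CLOCK_GHZ, BOW_CLOCK_GHZ)
print(f"Clock uplift: {uplift:.1%}")
```

The result is just under 40%, which is consistent with Graphcore describing the gain as "up to" 40%.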
Graphcore said its customers are seeing up to a 40% improvement in time to train across a range of models. Figures published by Graphcore show speedups of between 1.29x and 1.39x across a range of workloads including image classification (including vision transformers), object detection, text to image, graph networks, natural language processing, and speech recognition.
Power efficiency (performance per watt) also improved by 9% to 16% across a smaller range of workloads, according to Graphcore’s figures.
Intended to power AI training and inference at large scale, Graphcore’s Bow IPUs can be combined into very large multi-chip systems. A 256-chip BowPod-256 system offers 89 PetaFLOPS, while a 1024-chip BowPod-1024 offers 350 PetaFLOPS.
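As a rough check on these system-level figures, the sketch below multiplies the per-chip number by the chip count. It assumes perfectly linear scaling of peak compute, which is a simplification; the function name is illustrative and not part of any Graphcore tool.

```python
# Per-chip mixed-precision figure quoted in the article, in TeraFLOPS.
CHIP_TFLOPS = 350


def pod_petaflops(num_chips: int, chip_tflops: float = CHIP_TFLOPS) -> float:
    """Aggregate peak compute in PetaFLOPS, assuming linear scaling."""
    return num_chips * chip_tflops / 1000.0


for chips in (256, 1024):
    print(f"{chips:>5} chips: {pod_petaflops(chips):.1f} PetaFLOPS")
```

With the article's figures this gives 89.6 PetaFLOPS for 256 chips and 358.4 PetaFLOPS for 1,024 chips, so the quoted 89 and 350 appear to be rounded-down values.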
Graphcore is TSMC’s lead customer for the foundry’s wafer-on-wafer (WoW) technology.
WoW chips feature two wafers bonded together: a wafer of processor die, and a wafer of power delivery die. The power delivery wafer contains deep trench capacitors, similar to those used to store information in DRAM, used as a charge reservoir and connected to the transistors on the processor die at very low impedance.
“This allows the transistors to operate much more quickly at good power efficiency, so the net effect on the Bow IPU processor is to increase its clock speed,” said Graphcore CTO Simon Knowles. The speedup comes even though Bow uses the same processor design and the same process technology (TSMC 7nm) for the processor die.
WoW depends on two key technologies.
Hybrid bonding allows two wafers to be bonded together, metal sides together, without any interstitial bumps.
“It’s like a kind of cold weld,” said Knowles. “The advantage of doing it this way is an extremely high density of interconnect between the wafers.”
The other key technology is a new type of through-silicon via (TSV) called a back-side TSV (BTSV) which allows connection to layers inside the wafer “sandwich.”
WoW is distinct from chip-on-wafer technologies used in the industry to mount memory die on top of processor die; Knowles said the differences result in finer connection pitch for WoW, though he did not reveal what the pitch is. Knowles ascribed the finer pitch to the ease of aligning two complete wafers rather than two individual die, and the ability to use an ion etch process for BTSVs due to the power delivery wafer being extremely thin. The thin wafer, “thinner than cling film,” is so thin that it’s transparent and floppy, so bonding to the thicker wafer before thinning allows the thicker wafer to act as a mechanical support during subsequent process steps. This wouldn’t be possible with individual die, he said.
“We’ve been working with TSMC as their vanguard customer in this technology for about two years now,” said Knowles. “An enormous amount of work has been done to make this a production technology, and I’m sure any of our rivals who are starting today will take a good long time to get to where we are.”
The new Bow IPU processor is “100% software compatible” with existing customer code, since the behavior of the processor is identical to the previous generation (it just runs faster and more efficiently). Bow is supported by Graphcore’s Poplar low-level compiler and SDK, which is compatible with many higher-level frameworks, including PyTorch, TensorFlow, Keras, Lightning, Halo, PaddlePaddle and more.
As with previous generations of IPU, Graphcore’s Bow IPU will be offered as a 4-IPU, 1.4 PetaFLOPS, 1U server blade. Graphcore has relied on price-performance metrics in its previous product pitches, rather than chip-to-chip comparisons, and this time is no different. The company compares its BowPod-16 (16 IPU chips, 5.6 PetaFLOPS, $149,995) to an Nvidia DGX-A100 (8 GPU chips, 5 PetaFLOPS, $299,000) which is a similar physical size. Graphcore claims a TCO advantage based on this comparison.
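The price-performance comparison can be reduced to list price per peak PetaFLOPS, as in the sketch below. It uses only the figures quoted above; the `System` class is illustrative. This is not Graphcore's TCO methodology, which is not described here, and it assumes the two vendors' peak figures are comparable, which may not hold if they are quoted at different precisions.

```python
from dataclasses import dataclass


@dataclass
class System:
    """A system as described in the article: peak compute and list price."""
    name: str
    petaflops: float
    price_usd: float

    def usd_per_petaflops(self) -> float:
        """List price divided by peak compute."""
        return self.price_usd / self.petaflops


# Figures as quoted in the article.
bow_pod_16 = System("BowPod-16", petaflops=5.6, price_usd=149_995)
dgx_a100 = System("DGX-A100", petaflops=5.0, price_usd=299_000)

for system in (bow_pod_16, dgx_a100):
    print(f"{system.name}: ${system.usd_per_petaflops():,.0f} per PetaFLOPS")

# How many times more the DGX-A100 costs per peak PetaFLOPS.
ratio = dgx_a100.usd_per_petaflops() / bow_pod_16.usd_per_petaflops()
print(f"Ratio: {ratio:.2f}x")
```

On these numbers the BowPod-16 comes out at a little over twice the peak compute per dollar. Power, cooling, software effort, and achieved (rather than peak) throughput would all shift a real TCO calculation.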
The Bow IPU machines and BowPod systems are being offered at the same price as their previous-gen equivalents, despite rising wafer costs, the use of twice as many wafers, and more complex packaging processes. How is the company able to do this?
“It really comes down to the economics of manufacturing scale,” Toon said, adding that extensive production qualification processes take into account improved learning on the redundancy Graphcore builds into its chips. “All of this combines to allow us to be able to deliver this and more advanced technology at the same cost,” he said.
Toon said Graphcore may choose to reduce the cost of previous-gen systems going forward.
Graphcore also announced a roadmap product which it calls the Good Computer, after 1960s computer science pioneer Jack Good. Good proposed an “ultra-intelligent” machine, that is, a computer with more intelligence than a human brain. Graphcore’s Good Computer will use scale to achieve this. Future-gen IPUs, which Graphcore anticipates will use WoW processes to stack processor die, will be used to build a 10-ExaFLOPS system with 8192 Graphcore chips. It will support models with up to 500 trillion parameters.
“We’ll exceed the parametric capacity of the human brain, and thereby we hope, represent a big step on the path to the discovery of ultra-intelligence,” Knowles said.
Knowles was clear that the Good Computer is not intended to be a one-off; it will be a commercial product retailing for around $120 million, and is a direct response to what customers are asking for today.
“Now that we have the technology working after two years of close development with TSMC, we’re ready for this step… And it will be a very potent step,” he said.
The Good Computer should be on the market by 2024.
Bow IPUs are already in use at the US Department of Energy’s Pacific Northwest National Laboratory for applications including cybersecurity and computational chemistry, as well as with a handful of other customers.
US cloud service provider Cirrascale is making BowPod systems available today as part of its Graphcloud IPU bare metal service, while G-Core Labs in Europe will launch cloud instances in Q2 2022. Kingsoft Cloud will launch an IPU service in China, and NHN is in the process of building out an IPU cloud offering in Korea.
Bow IPU chips are in volume production and systems are shipping to customers now.