Intel's AI accelerator NNP-T with PCIe 4.0 and TSMC technology

At the Hot Chips conference (HC31), Intel promises 119 trillion tensor operations per second (TOPS) for its upcoming AI accelerator Spring Crest, aka NNP-T1000 aka NNP-L1000. Intel wants to beat the Nvidia Tesla V100, for which Nvidia specifies a maximum of 120 TOPS. According to Intel, however, the NNP-T1000 should be able to put much more of its theoretical AI performance to use in practice. And thanks to a special interface with four fast ports, numerous servers, each with several NNP-Ts, can be tightly networked to further increase performance when training AI algorithms.

Intel is also developing a dedicated accelerator called NNP-I1000, aka Spring Hill, for running already trained AI algorithms (inferencing). Intel intends to manufacture this chip in its own 10 nm process.

NNP stands for Nervana Neural Processor: the design comes from the company Nervana, which Intel acquired in 2016. The NNP-T is therefore designed not for Intel's own manufacturing technology but for TSMC's 16-nanometer FinFET process. This probably also contributes to the fact that the NNP-T already supports PCI Express 4.0, while Intel's 14 nm chips are still stuck with PCIe 3.0.

Intel's "Spring Crest" aka NNP-T1000 has 27 billion transistors and requires 150 to 250 watts of "typical power".
(Image: Intel)

The NNP-T is due to come on the market in 2019; Intel had already announced it in 2018, but back then under the name NNP-L1000. The chip packs 27 billion transistors into 680 square millimeters. It sits on a 1200-square-millimeter interposer together with four stacks of ECC-protected HBM memory holding 8 GB each. Overall, the data transfer rate to the local RAM exceeds 1.2 TB/s.
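The memory and die figures lend themselves to some quick back-of-the-envelope arithmetic. The following Python sketch only multiplies and divides numbers quoted in the article; the even split of bandwidth across the four stacks is an assumption, the conversion uses decimal units (1 TB = 1000 GB), and because Intel states "more than 1.2 TB/s", the per-stack value is a lower bound, not a spec. The function names are illustrative.

```python
# Back-of-the-envelope figures for Spring Crest (NNP-T1000),
# using only the numbers quoted in the article.
HBM_STACKS = 4
GB_PER_STACK = 8
MIN_TOTAL_BANDWIDTH_TB_S = 1.2  # "more than 1.2 TB/s", so a lower bound
TRANSISTORS = 27e9
DIE_AREA_MM2 = 680


def hbm_capacity_gb(stacks: int, gb_per_stack: int) -> int:
    """Total local HBM capacity in GB."""
    return stacks * gb_per_stack


def min_bandwidth_per_stack_gb_s(total_tb_s: float, stacks: int) -> float:
    """Lower bound per stack in GB/s, assuming an even split across stacks."""
    return total_tb_s * 1000 / stacks


def density_million_per_mm2(transistors: float, area_mm2: float) -> float:
    """Average transistor density over the whole die, in millions per mm^2."""
    return transistors / area_mm2 / 1e6


if __name__ == "__main__":
    print(f"HBM capacity: {hbm_capacity_gb(HBM_STACKS, GB_PER_STACK)} GB")
    per_stack = min_bandwidth_per_stack_gb_s(MIN_TOTAL_BANDWIDTH_TB_S, HBM_STACKS)
    print(f"Per stack: >= {per_stack:.0f} GB/s")
    density = density_million_per_mm2(TRANSISTORS, DIE_AREA_MM2)
    print(f"Density: ~{density:.1f} million transistors/mm^2")
```

Run as a script, it reports 32 GB of HBM, at least 300 GB/s per stack and roughly 39.7 million transistors per square millimeter on average.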

According to Intel, power consumption is "typically 150 to 250 watts". Via 64 fast serial lanes (112 Gbit/s each) spread across four ports, up to 1024 servers equipped with NNP-T1000 cards can be linked without additional switches.
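To put the interconnect figures in perspective, the sketch below computes the raw signaling capacity per port and per chip. It assumes the 64 lanes are divided evenly among the four ports and counts only the raw 112 Gbit/s line rate; encoding and protocol overhead, and whether the rate applies per direction, are not given in the article and are ignored here.

```python
# Raw signaling capacity of the NNP-T1000 chip-to-chip links,
# based on the article's figures: 64 serial lanes at 112 Gbit/s, four ports.
LANES = 64
GBIT_PER_LANE = 112
PORTS = 4


def lanes_per_port(lanes: int, ports: int) -> int:
    # Assumption: the lanes are divided evenly among the ports.
    return lanes // ports


def raw_gbit_s(lanes: int, gbit_per_lane: int) -> int:
    # Raw line rate only; encoding and protocol overhead are ignored.
    return lanes * gbit_per_lane


if __name__ == "__main__":
    per_port = lanes_per_port(LANES, PORTS)
    print(f"{per_port} lanes per port, {raw_gbit_s(per_port, GBIT_PER_LANE)} Gbit/s per port")
    total = raw_gbit_s(LANES, GBIT_PER_LANE)
    print(f"{total} Gbit/s in total, about {total / 8:.0f} GB/s")
```

Under these assumptions, each port carries 16 lanes (1792 Gbit/s), and the chip as a whole offers 7168 Gbit/s, or about 896 GB/s, of raw link capacity.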

An NNP-T has 24 Tensor Processing Clusters (TPCs), each containing two 32×32 multiplier arrays for the BFloat16 data format. Future Xeon processors of the Cooper Lake generation will also handle BFloat16. Intel did not reveal when exactly the NNP-T will reach the market, but announced further benchmark results before the end of the year.
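BFloat16 keeps the sign bit and the 8-bit exponent of IEEE 754 single precision (float32) but only 7 of its 23 mantissa bits, so it covers the same numeric range as float32 at lower precision. A minimal Python sketch of the conversion follows, assuming round-to-nearest-even; the rounding mode the NNP-T hardware actually uses is not stated in the article, and the function names are illustrative. NaN inputs are mapped to a single canonical pattern, and the input is first rounded to float32 by `struct`.

```python
import math
import struct


def float32_bits(x: float) -> int:
    """IEEE 754 single-precision bit pattern of x (x must fit in float32)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def to_bfloat16_bits(x: float) -> int:
    """Round x to bfloat16 (round-to-nearest-even) and return the 16-bit pattern."""
    if math.isnan(x):
        return 0x7FC0  # canonical quiet NaN
    bits = float32_bits(x)
    # Add 0x7FFF plus the lowest kept bit, so exact ties round to an even result.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def from_bfloat16_bits(b: int) -> float:
    """Widen a bfloat16 bit pattern back to float32 by appending 16 zero bits."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


if __name__ == "__main__":
    b = to_bfloat16_bits(math.pi)
    print(hex(b), from_bfloat16_bits(b))  # 0x4049 3.140625
```

The example shows the precision trade-off: pi survives only to about three significant digits, while the exponent range stays that of float32.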

Spring Hill aka NNP-I1000 plays in a completely different performance class. It contains more Intel technology, specifically the current 10 nm manufacturing process and two of the new Sunny Cove CPU cores that also power Ice Lake processors. The actual deep learning algorithms for a convolutional neural network (CNN), however, run on 12 Inference Compute Engines (ICEs). Each of these in turn has a so-called Deep Learning Compute Grid and a VLIW vector unit based on the Cadence Tensilica Vision P6 (VP6) DSP.

For the popular ResNet-50 benchmark, Intel quotes 3600 inferences per second at 10 watts consumed by the chip itself (that is, excluding its local DDR4 SDRAM) and derives from this an efficiency of 4.8 TOPS/watt.
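The quoted figures can be related with simple division and multiplication. The sketch below computes ResNet-50 throughput per watt and the total TOPS implied by 4.8 TOPS/watt, under the assumption (not confirmed in the article) that the efficiency figure applies at the same 10-watt operating point.

```python
# Efficiency arithmetic for Spring Hill (NNP-I1000) using the article's numbers.
RESNET50_INFERENCES_PER_S = 3600
CHIP_POWER_W = 10  # chip only, excluding its local DDR4 SDRAM
TOPS_PER_W = 4.8


def inferences_per_s_per_w(inferences_per_s: float, power_w: float) -> float:
    return inferences_per_s / power_w


def implied_tops(tops_per_w: float, power_w: float) -> float:
    # Assumption: the 4.8 TOPS/W figure applies at the same 10 W operating point.
    return tops_per_w * power_w


if __name__ == "__main__":
    rate = inferences_per_s_per_w(RESNET50_INFERENCES_PER_S, CHIP_POWER_W)
    print(f"{rate:.0f} inferences/s per watt")
    print(f"about {implied_tops(TOPS_PER_W, CHIP_POWER_W):.0f} TOPS at {CHIP_POWER_W} W")
```

This yields 360 ResNet-50 inferences per second per watt and, under the stated assumption, roughly 48 TOPS for the chip.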
