From Wikipedia, the free encyclopedia

A deep learning processor (DLP), or a deep learning accelerator, is an electronic circuit designed for deep learning algorithms, usually with separate data memory and dedicated instruction set architecture. Deep learning processors range from mobile devices, such as neural processing units (NPUs) in Huawei cellphones, to cloud computing servers such as tensor processing units (TPU) in the Google Cloud Platform.

The goal of DLPs is to provide higher efficiency and performance for deep learning algorithms than general central processing unit (CPUs) and graphics processing units (GPUs) would. Most DLPs employ a large number of computing components to leverage high data-level parallelism, a relatively larger on-chip buffer/memory to leverage the data reuse patterns, and limited data-width operators for error-resilience of deep learning. Deep learning processors differ from AI accelerators in that they are specialized for running learning algorithms, while AI accelerators are typically more specialized for inference. However, the two terms (DLP vs AI accelerator) are not used rigorously and there is often overlap between the two.

History