Two months ago, four years after Intel acquired the former FPGA giant Altera, Intel launched a new generation of FPGA products, Agilex, which was developed “fully with its own capabilities”. Unlike the Stratix, Arria, Cyclone, MAX and other product series that Altera launched earlier, Agilex is a brand-new FPGA family that “reflects all the technical resources related to Intel that you can imagine”, and Intel has placed higher expectations on it.
The “related technical resources” mentioned here are essentially the “six technical pillars” that Intel proposed at its “Architecture Day” at the end of 2018: process and packaging, architecture, memory and storage, interconnect, security, and software. Intel stated at the time that it would apply the six pillars across its entire engineering organization as soon as possible and implement them in products and technology plans that had been or would be launched. Yet in less than half a year, Agilex FPGA has become the best carrier for the “six technical pillars”, which is evidence of Intel’s strong system R&D and integration capabilities.
A glimpse of the whole picture
The name Agilex combines the words “agile” and “flexible”, and these two characteristics are also the two core strengths of modern FPGA technology. Intel promised in 2015 that it would provide heterogeneous architectures tailored to different customer needs, including discrete CPU+FPGA, in-package integrated CPU+FPGA, and FPGAs that integrate Intel CPU, FPGA and ARM at the die level.
The reason is obvious. Integration not only reduces latency and improves efficiency and performance per watt, but also unifies the tool flow between the processor and the FPGA and provides broader architectural support for different performance requirements. Four years later, Agilex FPGA uses a heterogeneous architecture to integrate different process technologies and different logic units, achieving breakthroughs in flexibility and customization.
According to Intel’s February benchmarks, Agilex achieved a 40 percent improvement over Stratix 10 in maximum clock frequency (Fmax) while reducing overall power consumption by up to 40 percent. In addition, Agilex delivers up to 40 TFLOPS of DSP performance (FP16 configuration) and 92 TOPS of DSP performance (INT8 configuration).
Frankly speaking, Agilex FPGAs could not reach these performance figures with the heterogeneous architecture alone. So what little-known “black technologies” are hidden inside Agilex FPGAs?
10nm process and advanced 3D packaging
For a semiconductor giant like Intel that offers “end-to-end” solutions, advanced semiconductor process and packaging technology is the foundation and key to building leading products. At Architecture Day and the subsequent CES 2019 exhibition, Intel successively demonstrated 10-nanometer products spanning cloud to edge, including the “Ice Lake” PC processor, the “Lakefield” client platform, the “Snow Ridge” network system-on-chip, and the “Ice Lake” Intel Xeon Scalable processor. It also showed “Foveros” 3D packaging technology, which the industry regards as another “milestone” breakthrough after the Embedded Multi-die Interconnect Bridge (EMIB) packaging technology launched in 2018.
To ensure consistent performance, the FPGA logic fabric die at the core of Agilex FPGA devices is also built on Intel’s 10-nanometer process, one of the most advanced FinFET process technologies in the world. Agilex also incorporates Intel’s proprietary 3D heterogeneous System-in-Package (SiP) technology based on the Embedded Multi-die Interconnect Bridge (EMIB), which provides a high-performance, low-cost way to integrate other dies into the same package as the FPGA logic fabric die.
2nd Generation Intel HyperFlex Architecture and Chiplets Architecture
The logic fabric die of Agilex FPGA adopts the second-generation Intel HyperFlex architecture. Like the first-generation architecture, it places additional registers, called Hyper-Registers, throughout the core fabric. The second-generation architecture also improves overall fabric performance while minimizing power consumption; one of the most notable improvements is the addition of a high-speed bypass to the Hyper-Registers.
Chiplets, on the other hand, are physical IP modules that can be combined with other chiplets through package-level integration and standardized interfaces. With a mix-and-match approach to chiplets, the number of transceivers is no longer limited by a fixed number of channels. Designers who want more or fewer transceiver channels can simply add or remove transceiver chiplets, without re-laying out the chip to integrate a different number of channels. With this approach, Intel raised the speed of a single transceiver channel from 58 Gbps to 112 Gbps.
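The mix-and-match idea can be illustrated with a small sketch. The per-channel rates of 58 Gbps and 112 Gbps come from the text above; the `TransceiverChiplet` class, the figure of four channels per chiplet, and the package compositions are hypothetical values chosen only for illustration and do not describe any actual Agilex device.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransceiverChiplet:
    """Hypothetical model of one transceiver chiplet (illustrative only)."""
    channels: int            # assumed channel count per chiplet
    gbps_per_channel: float  # raw line rate of each channel


def aggregate_bandwidth_gbps(chiplets):
    """Sum the raw line rate over every channel of every chiplet in a package."""
    return sum(c.channels * c.gbps_per_channel for c in chiplets)


# Per-channel rates are from the article; 4 channels per chiplet is an assumption.
tile_58g = TransceiverChiplet(channels=4, gbps_per_channel=58.0)
tile_112g = TransceiverChiplet(channels=4, gbps_per_channel=112.0)

# Changing the transceiver count means changing the list, not the fabric die.
package_a = [tile_58g] * 2
package_b = [tile_112g] * 3

print(aggregate_bandwidth_gbps(package_a))  # 464.0
print(aggregate_bandwidth_gbps(package_b))  # 1344.0
print(round(112.0 / 58.0, 2))               # 1.93, per-channel speed-up
```

The point of the sketch is that the transceiver budget becomes a property of the package composition rather than of the logic die itself.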
High-performance processor interface
Serving as a hardware accelerator for CPUs in the data center, speeding up applications such as deep learning model training, financial computing, and network function offloading, is currently a main application scenario for FPGAs. One of the core problems to solve in this field is cache coherency. In other words, the memory interconnect protocol between the CPU and the hardware accelerator must be clearly defined.
In March of this year, Intel announced that it would join forces with Microsoft, Alibaba, Cisco, Dell EMC, Facebook, Google, Hewlett Packard Enterprise (HPE), and Huawei to launch a new interconnect standard named Compute Express Link (CXL). It targets Internet data centers, communication infrastructure, cloud computing and cloud services, which also makes it an important stage for FPGAs to show their strengths.
To ensure high-performance inline processing and processor workload acceleration, Intel Agilex FPGAs support the latest generation of high-performance processor interfaces, including PCIe Gen 5 and CXL, and will be the first FPGAs to offer a cache- and memory-coherent interconnect to Intel Xeon Scalable processors.
Advanced memory hierarchy
Agilex FPGAs support several levels of memory resources, including embedded memory, in-package memory, and off-chip memory accessed through dedicated interfaces. The first level of the hierarchy is embedded on-chip memory, including MLABs, block RAMs, and eSRAMs, each of which offers a different capacity to meet different processing needs. In addition, Intel uses SiP technology to integrate high-bandwidth memory (HBM) directly into Agilex FPGA devices, which helps reduce board size and cost, simplifies the design, and lowers power requirements.
Another important point to watch is that the Agilex platform also integrates eASIC technology. This chip customization technology enables migration from FPGA to structured ASIC. In other words, users can take advantage of the custom logic continuum and reusable IP that eASIC provides to optimize flexibly throughout the product life cycle and move quickly from FPGA to ASIC.
For every order of magnitude of performance that a new hardware architecture delivers, software can deliver a further two orders of magnitude. On the new generation of Agilex FPGAs, the supporting Quartus Prime software can shorten compilation time for hardware developers by 30% and improve memory utilization by 15%. The new generation of Agilex FPGAs is also covered by the OneAPI architecture.
The “OneAPI” software programming framework, which will launch in the fourth quarter of this year, gives software developers a single-source heterogeneous programming environment. It supports common performance library APIs and software development tools such as Intel VTune and Advisor, and can map software to the hardware that accelerates it most effectively. It aims to simplify programming across computing engines including FPGAs, CPUs, GPUs, AI and other accelerators, reduce development complexity across architectures and workloads, and accelerate large-scale deployment of the six technology pillars.
Embracing the Era of Diversified Computing
Let’s step outside the FPGA circle for a moment and ask why Intel proposed the “six technology pillars”.
Some people say that these “six technical pillars” are a solid line of defense that Intel has built against companies such as NVIDIA, AMD, and Xilinx. But whatever they are called, in Intel’s view these six technical pillars are interrelated and tightly coupled, and they will not only bring exponential innovation but also serve as Intel’s main driving force for the next ten or even fifty years.
According to data released by Intel, transistor density on its 10nm process has reached 100.8 MTr/mm², about 2.7 times that of the previous-generation 14nm process. That is to say, in the three years or so from 2015 to 2018, Intel achieved a 2.7-fold increase in transistor density. At the same time, Intel is actively researching frontier projects such as nanowire transistors, III-V material (such as gallium arsenide and indium phosphide) transistors, 3D stacking of silicon wafers, high-density memory and interconnects, extreme ultraviolet (EUV) lithography, spintronics, and neuromorphic computing.
Developing sophisticated semiconductor manufacturing technologies and platforms, producing the best chips in the world, and continuously advancing manufacturing and packaging processes are of course Intel’s mission, but they are not the whole story.
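The density figures above imply two further numbers that are easy to check. The short calculation below is only a back-of-the-envelope sketch: the inputs of 100.8 MTr/mm² and 2.7x are taken from the text, while the assumption that the gain compounds smoothly over exactly three years is a simplification made here for illustration.

```python
def implied_previous_density(current_density, ratio):
    """Density of the previous node implied by the current density and the gain."""
    return current_density / ratio


def annualized_growth(ratio, years):
    """Per-year growth factor, assuming the gain compounds smoothly (a simplification)."""
    return ratio ** (1 / years)


density_10nm = 100.8  # MTr/mm^2, from the article
gain = 2.7            # 10nm vs 14nm, from the article

print(round(implied_previous_density(density_10nm, gain), 1))  # 37.3 MTr/mm^2
print(round(annualized_growth(gain, 3), 2))                    # 1.39 per year
```

In other words, a 2.7-fold gain over three years corresponds to roughly 39 percent more transistors per unit area each year under that smooth-growth assumption.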
We are now gradually moving into a data-centric era. It is expected that by 2020, an ordinary user will generate 1.5 GB of data per day, a smart hospital 3 TB per day, and an autonomous vehicle 4 TB per day, while a connected aircraft and a smart factory will generate 40 TB and 1 PB per day, respectively!
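To put these daily volumes on a common scale, the sketch below converts each figure to bytes and expresses it as a multiple of one ordinary user. The volumes are the ones quoted above; the use of decimal units (1 GB = 10^9 bytes) and the dictionary labels are assumptions made for this illustration.

```python
# Decimal (SI) units are assumed: 1 GB = 10**9 bytes, and so on.
UNITS = {"GB": 10**9, "TB": 10**12, "PB": 10**15}

# Daily data volumes quoted in the article, as (value, unit) pairs.
DAILY_VOLUME = {
    "ordinary user": (1.5, "GB"),
    "smart hospital": (3, "TB"),
    "autonomous vehicle": (4, "TB"),
    "connected aircraft": (40, "TB"),
    "smart factory": (1, "PB"),
}


def to_bytes(value, unit):
    """Convert a (value, unit) pair to bytes."""
    return value * UNITS[unit]


def user_equivalents(source):
    """How many ordinary users generate as much data per day as this source."""
    return to_bytes(*DAILY_VOLUME[source]) / to_bytes(*DAILY_VOLUME["ordinary user"])


for name in DAILY_VOLUME:
    print(f"{name}: {round(user_equivalents(name))} user-equivalents per day")
```

On these figures a single smart factory produces as much data each day as roughly 667,000 ordinary users.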
Alongside the explosive growth in data volume, data types have also undergone revolutionary changes. Emerging applications such as artificial intelligence, 5G, autonomous driving, cloud computing, and the Internet of Things have brought more diversified computing needs. For example, in embedded applications and edge devices, users need to extract data including images, video and visual information in real time; in communication infrastructure, users need high-bandwidth converged processing capabilities; and in the cloud, enterprises need to manage, organize and process the explosion of data efficiently.
That is to say, when we look at data architecture from a higher vantage point, we clearly see that in this era of rapid evolution and exponential expansion of computing architecture driven by massive data, no single technology can fully satisfy the computing needs of future consumers or enterprise customers, and those needs cannot be met by a scalar architecture alone. What they need is a diverse mix of architectures working together, such as the scalar, vector, matrix, and spatial architectures found in CPU, GPU, AI, and FPGA products.
At the same time, with the increasing demand for collection, analysis, and decision-making based on highly dynamic, unstructured natural data, the demands on computing go beyond classical CPU and GPU architectures. While leading processes and CPUs remain critical, fully capitalizing on the opportunities presented by the data explosion requires rapid innovation across a range of foundational building blocks, including process and packaging, architecture, memory and storage, interconnect, security, and software. The generation, types, and processing requirements of data must all be studied. This differs from earlier general-purpose data processing, and simply emphasizing the computing power of one particular processor is no longer feasible.
Intel hopes to lead the era of “hyper-heterogeneous computing” through the six technology pillars. That is, it provides a diverse combination of scalar, vector, matrix, and spatial computing architectures, designed with advanced process technology, supported by disruptive memory hierarchies, integrated into systems through advanced packaging, deployed at ultra-large scale using light-speed interconnects, and offered with a unified software development interface and security functions.
Take Sunny Cove, Intel’s next-generation CPU microarchitecture shown at CES 2019, as an example. It includes new features that accelerate specialized computing tasks such as artificial intelligence and encryption, and it is designed to improve performance per clock and reduce power consumption for general-purpose computing tasks. Ice Lake, the 10-nanometer PC processor that is heading into mass production, tightly integrates the Sunny Cove microarchitecture, an AI acceleration instruction set, and Intel’s 11th-generation integrated graphics.
Why combine process, packaging and architecture design? Because through hyper-heterogeneous computing, Intel can integrate different architectures, different processes, 3D packaging, interconnects, and OneAPI to achieve product diversity most effectively, improve product stability, and quickly meet customers’ customization and time-to-market needs.
In the process of transforming into a data company, Intel defines itself as an end-to-end solution provider, meaning that its product line covers the cloud, network transmission and terminals. The core of this is large-scale data processing in the cloud, and the end-to-end layout allows Intel to grasp “when the data comes, what kind of data, and how to process it.”
To increase its ability to process new kinds of data, accelerate the pace of technological development, and drive computing beyond PCs and servers, Intel has spent the past six years not only working on specialized architectures that can accelerate classic computing platforms, but also increasing its investment and R&D in artificial intelligence (AI) and neuromorphic computing. Loihi, the first self-learning neuromorphic test chip to complete manufacturing and packaging, the delivered 49-qubit superconducting quantum test chip, and the spin qubit manufacturing flow developed on a 300mm wafer process are all regarded by the industry as Intel’s early positioning for future computing, aimed at reshaping the computing landscape.
With the integrated application and further development of artificial intelligence, the Internet of Things, sensors and other technologies, more and more unattended machines and application scenarios are becoming possible. “Autonomous” is also replacing “intelligent” as the new trend driving the next round of innovation and development. Against this backdrop, Intel has strategically moved beyond the low-level competition of simply comparing technology and computing power and taken a higher starting point, organically combining AI accelerators, communication systems, high-speed storage and other components to redefine how products are developed and designed. Agilex FPGA is one of the best proofs of this. We also look forward to seeing more products based on the six technical pillars, leading the industry to better address the challenges of diverse computing needs.