Many of the world’s top “architects” are people you have probably never heard of, and the structures they design are ones you have probably never seen, such as the intricate systems inside chips. The raw material for chips is derived from sand, yet chips have become indispensable to modern life. Whenever you use a cell phone or a computer, or send and receive messages over the Internet, you are benefiting from the work of these architects.
The FPGA is one such chip. Since its birth in the 1980s, the FPGA has evolved from a simple programmable gate array into a complex system-on-chip with a large amount of programmable logic. Beyond the hardware itself, FPGA development tools and application scenarios have also advanced and expanded considerably, and the FPGA’s importance within the semiconductor industry keeps growing. This evolution is inseparable from the continuous inventions of these “architects”.
A few years ago, these top FPGA architects selected the 25 most influential research results in the FPGA field from the 20 years beginning in the 1990s. Through these results, we can understand how FPGAs reached their current state and get a sense of where FPGA technology is heading.
The 25 results are grouped by research area into categories such as architecture, EDA tools, circuits, and applications. Each result is recommended by a leading scholar in the field. I will introduce these results, which changed the course of FPGA development, over several articles. This article covers system architecture.
01
Unifying FPGAs and SIMD Arrays
One-sentence summary: A pioneering work on using FPGAs as parallel computing accelerators
English name: Unifying FPGAs and SIMD Arrays
By Michael Bolotski, André DeHon, Thomas F. Knight, Jr.
Published: 1994
Presenter: Jonathan Rose (University of Toronto)
André DeHon (currently a professor at the University of Pennsylvania)
This work revisits the FPGA as a computing “medium” from a conceptual point of view and relates it to the single-instruction, multiple-data (SIMD) approach to parallel acceleration of conventional computing. It is the first to show that the two computing styles, FPGA and SIMD, can be viewed as points on a continuum, and in that sense it unifies them.
This work proposes a hybrid architecture called the “Dynamically Programmable Gate Array (DPGA)”. In this architecture, the configuration bits for logic and routing reside in specially designed local memory cells, and the active configuration can change rapidly over time. A central context identifier decides which configuration is loaded from local memory, as shown in the figure below.
This approach makes the DPGA architecture somewhat SIMD-like. Specifically, if the contents of the local memories are identical, every processing unit executes the same “instruction”; if the contents differ, each processing unit operates independently. This allows the DPGA to process data both in parallel and serially.
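The context mechanism described above can be sketched in a few lines of Python. This is only an illustrative model, not the DPGA hardware: the names `ProcessingElement` and `run_array` are hypothetical, and each "configuration" is modelled as a plain function rather than as LUT and routing bits.

```python
from typing import Callable, List

# Assumption: a "configuration" is modelled as a function on one integer value.
Config = Callable[[int], int]


class ProcessingElement:
    """One array element with a small local memory of configurations (contexts)."""

    def __init__(self, contexts: List[Config]):
        self.contexts = contexts

    def step(self, context_id: int, value: int) -> int:
        # The globally broadcast context id selects the active local configuration.
        return self.contexts[context_id](value)


def run_array(elements: List[ProcessingElement],
              context_ids: List[int],
              inputs: List[int]) -> List[int]:
    """Broadcast each context id in turn; every element applies its own local entry."""
    values = list(inputs)
    for cid in context_ids:
        values = [pe.step(cid, v) for pe, v in zip(elements, values)]
    return values


def inc(x: int) -> int:
    return x + 1


def dbl(x: int) -> int:
    return x * 2


def neg(x: int) -> int:
    return -x


# Identical local memories: all elements execute the same "instruction" (SIMD-like).
simd_like = [ProcessingElement([inc, dbl]) for _ in range(3)]

# Different local memories: the same context id triggers different behaviour per element.
mixed = [
    ProcessingElement([inc, dbl]),
    ProcessingElement([dbl, neg]),
    ProcessingElement([neg, inc]),
]

print(run_array(simd_like, [0, 1], [1, 2, 3]))  # [4, 6, 8]
print(run_array(mixed, [0, 1], [1, 2, 3]))      # [4, -4, -2]
```

With identical memories the array behaves like a SIMD machine stepping through a two-instruction program; with different memories the same broadcast sequence drives three unrelated computations.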
In addition, this work provides an in-depth analysis of the costs and benefits of this new computing architecture.
This work is the culmination of a series of influential papers on the DPGA, and one of the first to explore multi-context FPGA programming. Although this programmable architecture has not become mainstream in industry, it inspired much high-quality follow-up work and laid a solid theoretical foundation for later researchers.
02
A High-Speed, Hierarchical Synchronous Reconfigurable Array
One-sentence summary: A pioneering exploration of high-performance, high-clock-frequency FPGA architecture design and timing optimization algorithms
English name: HSRA: High-Speed, Hierarchical Synchronous Reconfigurable Array
By William Tsu, Kip Macy, Atul Joshi, Randy Huang, Tony Tung, Omid Rowhani, Varghese George, John Wawrzynek, André DeHon
Published: 1999
Presenter: Carl Ebeling (University of Washington)
This work focuses on answering the question: Is it possible to design an FPGA fabric that can compete with the clock frequency of a processor or ASIC?
Typically, the clock frequency of an FPGA is 5 to 10 times lower than that of a CPU or ASIC, mainly because of the logic delay and interconnect delay inside the FPGA. This work aims to bring FPGA performance to a new level by combining innovations in FPGA architecture and CAD tools.
Overall, the approach is to design the architecture around a specific target clock frequency, which is the reverse of traditional FPGA design practice. This lets the designer define precisely how many logic levels, how many interconnect segments, and how much distance fit within one clock cycle, resulting in a highly pipelined structure in which even the programmable interconnect is pipelined.
The most novel aspect of HSRA is its tree-like hierarchical interconnect structure, as shown in the figure below. This structure allows connections to be made point to point, so the distance and delay between any two points are known in advance. With this information, many placement and routing problems can be addressed from a timing perspective.
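To see why a tree interconnect makes distance predictable, consider a minimal sketch in which logic blocks are the leaves of a complete tree and a connection climbs to the lowest common ancestor and back down. The function name `tree_hops`, the leaf numbering, and the default arity of 2 are assumptions for illustration; the text above does not give HSRA's actual tree parameters.

```python
def tree_hops(a: int, b: int, arity: int = 2) -> int:
    """Number of tree edges on the path between leaf a and leaf b.

    Assumes leaves are numbered 0..N-1 at the same depth of a complete tree.
    """
    hops = 0
    while a != b:
        # Move both endpoints one level up; each level adds one edge per side.
        a //= arity
        b //= arity
        hops += 2
    return hops


print(tree_hops(0, 1))  # 2: siblings meet at their shared parent
print(tree_hops(0, 7))  # 6: opposite halves of an 8-leaf binary tree
```

Because the hop count depends only on the two leaf positions, a placer could, under this model, estimate the delay of a connection before routing it.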
On the other hand, not every design can be deeply pipelined to fit the HSRA architecture. To address this, the work makes creative use of a method called C-slowing, which introduces additional parallelism into the circuit to compensate for the delays that arise when a design contains long feedback loops. C-slowing has gradually become one of the mainstream retiming techniques.
In summary, this work opened up a new direction in FPGA architecture: designing the architecture for timing and high performance. The HSRA architecture itself differs so much from traditional FPGAs that it did not go far on the road to commercialization, but many of the ideas and methods in this work have had a profound impact on the evolution of modern FPGA architectures.
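A small simulation may make C-slowing concrete. The sketch below models a running-sum circuit whose single feedback register is replaced by C registers, so that C independent data streams can share the loop, one per cycle. The function name `c_slowed_accumulate` and the use of an accumulator as the example circuit are assumptions for illustration, not details from the HSRA paper.

```python
from collections import deque
from typing import List


def c_slowed_accumulate(streams: List[List[int]]) -> List[List[int]]:
    """Run C equal-length streams through one accumulator with C loop registers."""
    c = len(streams)
    length = len(streams[0])
    # The C registers in the feedback path, modelled as a shift register.
    loop = deque([0] * c)
    outputs: List[List[int]] = [[] for _ in range(c)]
    for cycle in range(c * length):
        s = cycle % c                # stream whose sample is on the input this cycle
        x = streams[s][cycle // c]
        state = loop.popleft()       # entered the loop C cycles ago, so it belongs to stream s
        new_state = state + x
        loop.append(new_state)
        outputs[s].append(new_state)
    return outputs


print(c_slowed_accumulate([[1, 2, 3], [10, 20, 30]]))
# [[1, 3, 6], [10, 30, 60]]
```

Each stream sees an ordinary running sum, yet the loop now contains C registers that retiming can spread through the logic, which is what permits a shorter clock period.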
03
Dynamic Power Consumption in the Virtex-II FPGA Family
One-sentence summary: A pioneering work on dynamic power analysis, modeling, and optimization for modern FPGAs
English name: Dynamic Power Consumption in Virtex-II FPGA Family
Authors: Li Shang, Alireza Kaviani, Kusuma Bathala
Published: 2002
Presenter: Russ Tessier (University of Massachusetts)
Prior to this work, few studies had specifically addressed power consumption in FPGAs. This result was therefore an important first step toward understanding and optimizing FPGA power consumption.
The industry had long assumed that interconnect is the main source of dynamic power consumption in FPGAs, and this work confirms that assumption experimentally. Its analysis examines how the different structures in an FPGA contribute to dynamic power, providing a theoretical basis for the power-aware CAD algorithms that appeared later. Because it combines simulation with physical measurement, the power breakdown it reports is highly credible, as shown in the figure below.
The industry’s interest in FPGA power optimization began about a decade ago, when power had just become another major optimization target after area and delay. This work not only provides a breakdown of dynamic power consumption in FPGAs, but also gave detailed methodological support to power analysis and optimization algorithms in the decade that followed.
This work is also a typical example of close collaboration between industry and academia. Xilinx provided models and datasets for the FPGA devices, along with advanced dynamic power analysis methods and techniques. Because the academic community was already familiar with the Virtex-II FPGA architecture, the FPGA vendor did not need to disclose additional confidential information, which makes the methodology used in this work highly general.
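For orientation, the standard first-order model of CMOS switching power is shown below. It is given here as general background and is not necessarily the exact model used in the paper; the symbols are the usual ones: $\alpha_i$ is the switching activity of node $i$, $C_i$ its capacitance, $V_{dd}$ the supply voltage, and $f$ the clock frequency.

```latex
P_{\mathrm{dyn}} \;\approx\; \sum_{i} \alpha_i \, C_i \, V_{dd}^{2} \, f
```

Under this model, a resource class contributes in proportion to the capacitance it switches, which is why long programmable routing wires and their switches are natural suspects for a large share of dynamic power.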
04
Stratix FPGA Routing and Logic Architecture
One-sentence summary: The cornerstone of five generations of the Stratix core architecture
English name: The Stratix Routing and Logic Architecture
By David Lewis, Vaughn Betz, David Jefferson, Andy Lee, Chris Lane, Paul Leventis, Sandy Marquardt, Cameron McClintock, Bruce Pedersen, Giles Powell, Srinivas Reddy, Chris Wysocki, Richard Cliff, Jonathan Rose
Published: 2003
Presenter: Herman Schmit (Carnegie Mellon University)
Vaughn Betz (currently professor at the University of Toronto)
Over many years, a team at the University of Toronto led by Professor Jonathan Rose built a suite of FPGA design tools called VPR for designing and exploring simplified FPGA system architectures and microarchitectures. VPR contains many of the algorithms and flows of FPGA back-end design, including logic packing, placement, and routing, so many FPGA architecture questions can be analyzed quantitatively with it. This has also made the University of Toronto one of the most important academic centers for FPGA research in the world.
In 1998, Professor Jonathan Rose founded a start-up called Right Track CAD, whose main purpose was to commercialize VPR. At the same time, Altera was working hard to improve its FPGA architecture in response to competition from Xilinx’s successful Virtex family. In 2000, Altera acquired Right Track and developed the Altera FPGA Modeling Toolkit to optimize its first-generation Stratix FPGA architecture.
This paper presents the technical details of the Stratix architecture, as shown in the figure below. More importantly, it systematically describes the process the architects followed when designing Stratix. The work demonstrates that the quantitative analysis method used by VPR applies equally well to real-world performance and design metrics, such as FPGA physical area and critical path delay. These methods and tools have been used in the design of at least five generations of Stratix FPGAs. The work also demonstrates the close connection and cooperation between academic research and industrial technology development.
05
Quantifying the Gap between FPGAs and ASICs
One-sentence summary: The benchmark for FPGA benchmarking
English name: Measuring the Gap between FPGAs and ASICs
By Ian Kuon, Jonathan Rose
Published: 2006
Presenter: Herman Schmit (Carnegie Mellon University)
Jonathan Rose (currently professor at the University of Toronto)
This work has been cited more than 1400 times since publication. One of its major contributions is to quantify the cost of programmability. It shows that a circuit implemented in FPGA core logic occupies about 40 times the area of a standard-cell ASIC implementation. This gap is one of the main drivers of all efforts to improve the FPGA architecture.
Prior to this work, most comparisons between FPGAs and ASICs were based on small circuits, and tended to compare FPGAs with mask-programmable gate arrays. In that setting, the FPGA needs only about 10 times the area. By 2006, however, ASIC CAD tools had come a long way, and ASIC designs based on synthesizable standard cells had become a common choice in the industry.
Objectively speaking, the 40x figure needs qualification: it considers only the FPGA core area, and it applies to circuits whose logic and arithmetic units are implemented without the help of hardened multipliers. The work divides its benchmark circuits into four categories according to whether a circuit contains arithmetic operations, memory cells, structured logic, and registers, as shown in the figure below. For circuits that include logic and arithmetic units, if the FPGA architecture includes hardened multipliers, the FPGA area is about 28 times that of an ASIC.
A more important contribution of this work is to reveal the correlation between the architectural features of FPGAs (such as hardened memory blocks and DSPs) and the benchmark results. The work analyzes in depth how logic structures hardened in the FPGA affect performance and cost, which has had a direct and profound impact on the architecture of modern FPGAs. Deciding which IP or logic circuits to implement as hard blocks has become a key question in modern FPGA development, just as important as traditional architecture questions such as LUT size and routing topology.
In academia, benchmarking efforts like this are always controversial, because they either use different metrics when comparing or abstract away the criteria for comparison, so the results do not extend or generalize well. This work, however, sets an example for this type of study: it shows how to make comparisons objectively and how to describe the specifics of a comparison in detail, so that researchers can draw their own conclusions from the results and apply the same approach to future research.
Epilogue
These five important works on FPGA system architecture laid the foundation of modern commercial FPGAs, such as Xilinx’s Virtex and Altera’s Stratix. Some of them opened an important direction for the use of FPGAs as parallel hardware accelerators, and others established norms and standards for benchmarking FPGA architectures. More importantly, the methodology, the way of thinking, the combination of foresight and practicality, and the rigorous academic attitude shown in these works set the highest standard for subsequent academic and industrial research.