Roel Meeuws
Quantitative Hardware Prediction
Modeling for Hardware/Software
Co-design
DISSERTATION
for the purpose of obtaining the degree of doctor at the Technische Universiteit Delft,
by authority of the Rector Magnificus prof. ir. K.Ch.A.M. Luyben, chairman of the Board for Doctorates,
to be defended publicly
on Wednesday July at : hours
by
Roeland Jan MEEUWS
Engineer in Computer Engineering, born in Rotterdam, the Netherlands
Composition of the doctoral committee:
Rector Magnificus, chairman
Prof. dr. K.L.M. Bertels, Technische Universiteit Delft, promotor
Prof. dr. ir. H.J. Sips, Technische Universiteit Delft, NL
Prof. dr. O. Nieto-Taladriz, Universidad Politecnica de Madrid
Prof. Dr.-Ing. Michael Hübner, Ruhr-Universität Bochum
Prof. dr. J. Champeau, École Nationale Supérieure de Techniques Avancées
Dr. A. Pimentel, Universiteit van Amsterdam
Dr. E.A. Cator, Technische Universiteit Delft
Prof. dr. ir. G.-J. Houben, Technische Universiteit Delft, reserve member
ISBN ----
Cover image: a Quipu, an ancient Incan recording device that consisted of colored threads of wool with a series of knots applied to them, thus signifying different quantities and values.
Copyright © R.J. Meeuws
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without permission of the author.
I dedicate this book to my dear parents
and my beloved Marina.
Blessed is the man
who does not walk in the counsel of the wicked or stand in the way of sinners
or sit in the seat of mockers.
But his delight is in the law of the LORD, and on his law he meditates day and night. He is like a tree planted by streams of water, which yields its fruit in season
and whose leaf does not wither. Whatever he does prospers.
— Psalm 1:1-3
Abstract
Hardware estimation is an important factor in Hardware/Software Co-design. In this dissertation, we present the Quipu Modeling Approach, a high-level quantitative prediction model for HW/SW Partitioning using statistical methods. Our approach uses linear regression between software complexity metrics and hardware characteristics. The resulting prediction models provide essential information for such Co-design tasks as identifying resource intensive parts of the application, helping to evaluate different mapping options, and guiding code modifications.
We show that prediction models can be generated for different High-Level Synthesis tools, reconfigurable devices, hardware measures, and application domains. To this end, we present a detailed investigation of several Quipu prediction models targeting each of these different dimensions. In addition, an extensive description is given of the targeting of the Quipu Modeling Approach to a new tool and platform within a few days. We evaluate the quality of our models by carefully investigating the error behavior, which ranges from .%, for a domain-specific model targeting slices, to .%, for a domain-agnostic model targeting the number of controller states.
As a demonstration of the practical use of Quipu Prediction models, we present a case study of two applications. These applications were analyzed and partitioned for the Molen Machine Organization. We show how Quipu prediction models play an important role in evaluating area constraints and performing Design Space Exploration. The two applications had an increased performance of % and %.
Acknowledgements
There comes an end to everything, as is the way of things. This is just as true of Ph.D. research as of anything else. It is up to the Ph.D. candidate to end it well. I am very thankful that I have been able to do just that. Although the book in your hands is the proof that my work has been fruitful in the last few years, I cannot and will not claim to have achieved this on my own. Many excellent people have contributed to this work or supported me during the last few years and I would like to try and thank every one of them. Here goes.
Let’s start by thanking my promotor and mentor, Koen Bertels. Koen, together with Stamatis Vassiliadis, you challenged me to start this undertaking and you kept on challenging me during the last few years. Stamatis’s death was a big shock to all of us. May he rest in peace. Regardless, you have supported me even during those days. I value the freedom that you entrusted to me and I hope that I have been able to keep that trust. It has always been a pleasure to discuss and envision new ideas, but also to talk about politics, religion, and so on. I am especially thankful for the dedicated time you took to read my thesis in order for me to schedule the defense before my wedding.
Of course, I also thank Carlo Galuzzi for all the proofreading of my papers and my thesis in the past years. What would Ph.D. students do without people like you that give valuable comments on your writing and practical tips to navigate the research arena? Thank you in particular for pushing me to write a journal article before writing my thesis. Writing my thesis became so much simpler because of that!
A very special thanks goes out to the Quorum: Kamana, Faisal, and Arash. I am so thankful that God in His grace put me in an office with you guys for the last few years! I thank you three for all the discussions, the dinners, the laughs, the prayers… I am thankful to have gained such friends as you. Kamana, thank you for your trust, friendship, and honesty. Faisal, thank you for your friendship, laughs, and the parathas of your wife! Arash, thank you for your friendship, discussions on music, and our fruitful cooperation! God bless you guys!
I would also like to thank Koen, Carlo, the Quorum, and Imran Ashraf for proofreading my thesis. Imran, although you are not officially part of the Quorum, I do also value your friendship and our joint research. You still have some years ahead of you, but I hope Arash and I have been able to assist you in kick-starting your research. Also, I must not forget Vlad Mihai Sima. Vlad, I cannot count the times that I popped into your
was not for your help and your DWARV compiler, I would not have been able to finish this work!
Good research also requires good technical and administrative support. In that respect, I would like to thank several excellent people. First, Lidwina, thank you for always guiding me through the jungle of financial, administrative, and other paperwork. I’ve also enjoyed our chats on other subjects. Second, Bert, your flexibility in providing any resources that were necessary for continuing my research has been essential. Third, Erik and Eef, you guys have been a great support. Whether it was installing an extra software package, putting up with me when I crashed the high-performance machines again, or just having a chat, I could count on you guys!
There are many more people that deserve special mention, but I hope they can forgive me for being only human. I do want to thank the following colleagues as well, in no particular order: Stephan Wong, Mojtaba Sabeghi, Razvan Nane, Chunyang Gou, Sorin Cotofana, Radu Ştefan, Ghazaleh Nazarian, Catalin Ciobanu, Maghyar Shahsavari, Laiq Hasan, Seyab Khan, Omar Esli Jimenez Villareal, Bogdan Spinean, Demid Borodin, Christos Strydis, Arnaldo Azevedo, Yi Lu, Thomas Marconi, Hans van Someren, Georgi Gaydadjiev, Georgi Kuzmanov, Elena Moscu Panainte, Cor Meenderinck, Ozana Dragomir-Azevedo, Lotfi Mhamdi, and the many others that I have failed to mention here!
I would also like to thank my friends here in Holland that had to miss my presence on many an occasion. I am sorry for not always being there, but grateful for your continued friendship! I thank, in particular, Johan Kok, Jeroen Brosky, David Speters, Hans Linker, Anoeska de Bonte, Carine van der Ham, Martijn Nijhoff, Gerard Aalbers, my fellow bible study group members, the members of the “Christelijk Gemengd Koor Delfshaven”, the members of the “Spangen Gospel Choir”, and many more.
Special thanks and appreciation go to my wife-to-be. Marina, you are the light of my life and I am so very grateful for every second with you. Even more, I am thankful for the fact that you pushed me to finish this book as soon as possible.
Of all people, my greatest thanks goes out to my loving parents. Mom, Dad, I love you and am forever thankful for all the time, energy, love, guidance, and what have you more, that you have always given me. You’re the best!
Then there is nothing else to do, but thank Him who is the creator of the heavens and the earth. To God be all the glory, praise, and thanks, forever and ever, in Jesus’ Name, amen.
Roel Meeuws
Delft, The Netherlands, April
Table of contents
Abstract
Acknowledgements
Table of contents
List of figures
List of tables
List of listings
List of Acronyms
Terminology
Introduction
  Problem Overview
  The Quipu Modeling Approach
  Research Challenges
  Dissertation Contributions
  Dissertation Organization
Hardware Estimation for Reconfigurable Platforms
  Heterogeneous Reconfigurable Architectures
    Types of Processing Elements
    Reconfigurable Computing
  Hardware/Software Co-Design
  Hardware Estimation
    Evaluation Criteria
    Logic Resource Estimation
    Interconnect Estimation
    Validation in Hardware Estimation
  Project Context
    Molen Abstraction Layer
    Delft Workbench
  Research Demarcation
  Summary
Measuring Software Complexity for Hardware Estimation
  Software Measurement
  Classifying Software Metrics
    Entities and Attributes
    Scales of Measurement
    Static and Dynamic Metrics
    Levels of design
  Implementation Issues
  Software Complexity Metrics
    Halstead’s Software Science Metrics
    Average Information Content Classification
    Scope Number and Scope Ratio
    McCabe’s Cyclomatic Complexity
    Nesting Depth
    Piwowarski’s complexity
    Gong and Schmidt Complexity
    Loop Complexity
    (Modified) Basili-Hutchens Complexity
    Tai’s DU(G) Metric
    Oviedo’s Data Complexity
    Elshoff’s Dataflow Complexity
    (Source) Lines of Code
    Prather’s Testing Metric
    NPATH
    Operators, blocks, and variables
    Caveats
  Summary
Statistical and Quantitative Prediction Modeling
  Linear Model Definition
    Least Squares Regression
  Linear Regression Assumptions
    Normality
    Homoscedasticity
    Independence
    Linearity
  Modeling Issues
    Collinearity
    Non-linearity
    Sparse data
    Outliers
  Regression Techniques in the Quipu Modeling Approach
    Box-Cox power transform
    Generalized Linear Model
    Principal Component Regression
    Partial Least Squares Regression
    Variable Selection
    Artificial Neural Networks for Regression
    Multi-Level Models
  Model Evaluation
    Summarizing the prediction error
    Visualization of the errors
    Cross-validation
  Summary
The Quipu Modeling Approach
  Modeling Methodology
  Kernel Library
    Semi-Automatic Modeling Process
    Other tools
  Targeting the Quipu Modeling Approach to LegUp/Synopsys
  Summary
Validation of the Quipu Prediction Models
  Criteria of Evaluation
  Experimental Setup
  Speed of Quipu Prediction Models
  Domain-agnostic modeling
    HLS tools
    Hardware Characteristics
  Domain-specific modeling
  Comparison with other approaches
    Logic Utilization Estimation Approaches
    Interconnect Estimation Approaches
  Analysis of Quipu Prediction Models
  Evolution of Quipu Model Quality
  Summary
Quipu Prediction Models in Practice
  Case Descriptions
    Q² Profiling Framework
    Objective
    Applications
    Target Platform
  Q² Analysis and Partitioning
    Canny Edge Detection
    Mixed Excitation Linear Prediction Vocoder
  Discussion
  Summary
Conclusions
  Summary
  Main Contributions
  Research Opportunities
A Implementation details
Bibliography
List of Publications
Samenvatting en Stellingen (summary and propositions, in Dutch)
Curriculum Vitae
List of figures
Chapter
In the early stages of HW/SW Co-design, the developer faces many different issues.
An overview of the Quipu Modeling Approach.
An illustration of the utilization of a Quipu Prediction Model.
An outline of the different chapters, challenges, and contributions in this dissertation.
Chapter
An overview of the basic Von-Neumann architecture.
Flexibility versus performance for different computing paradigms.
The layout of a typical island-style FPGA with Configurable Logic Blocks, Programmable Interconnect Points, and switch boxes (s).
A basic Logic Element with a K-input Look-Up Table, a flip-flop, and an output multiplexer.
An overview of the Molen Platform with an indication of the flow of instructions through the platform.
Overview of the Delft Workbench tool chain.
Chapter
An overview of the different classifications with regard to Software Complexity Metrics, that can be found in Section ..
A visualization of the quantitative relation between hardware and software, as captured by a certain model f(x).
Examples of different graphs related to assumptions in regression models.
A graphical representation of a covariance matrix.
Examples of different distributions from the exponential family.
Screeplot of the Principal Component Analysis of Software Complexity Metrics.
An example of a single neuron and a feed-forward Artificial Neural Network with a -- topology.
The visualization of the error distribution and error trend with respect to the observed values.
An example of K-fold cross-validation.
Chapter
A detailed overview of the Quipu Modeling Approach with its different tools and components.
Model quality given different zero-thresholds.
An overview of different tool and platform combinations targeted by the Quipu Modeling Approach.
Chapter
An overview of the prediction performance of the total area for different Quipu Prediction Models generated for four different High-Level Synthesis tools.
An overview of the prediction performance of the area for different Quipu Prediction Models for four different application domains.
The evolution of the Quipu Modeling Approach and its predecessors.
Chapter
An overview of the Q² Profiling Framework.
The results of the different steps in the Canny Edge Detection algorithm as applied to the commonly used Lena-picture.
The Pareto-front of the different mapping options for the Canny Edge Detection application.
The Annotated Quantitative Data Usage Graph for the Canny Edge Detection application modified for hardware mapping.
The Pareto-front for the different mapping options for the Mixed Excitation Linear Prediction application.
The Annotated Quantitative Data Usage graphs for the two merged versions of the Mixed Excitation Linear Prediction application.
List of tables
Chapter
An overview of the main hardware estimation approaches relevant to the work in this thesis.
An overview of the main existing interconnect estimation approaches related to the work in this thesis.
Chapter
Overview of the software complexity metrics employed by the preliminary version of the Quipu Modeling Approach (Quipuα).
Overview of the software complexity metrics currently employed by the current Quipu modeling approach (Quipuβ).
Detailed expressions for the NPATH complexity metric for different statements in the ANSI-C language.
Chapter
Summary of the different statistical assumptions, issues, and utilized regression techniques, that are employed within the Quipu Modeling Approach.
Chapter
The number of kernels and the performance in generating synthesizable HDL for four C-to-HDL compilers.
The measurement speed of the Quipu Metrication Tool for different levels of optimizations.
Overview of the model performance of several Quipu prediction models targeting different combinations of tools and platforms for hardware resource measures.
Overview of the model performance of several Quipu prediction models targeting the Delft Workbench Automated Reconfigurable VHDL generator (DWARV)/Xilinx combination for different hardware measures.
Overview of the model performance of several Quipu prediction models targeting the DWARV/Xilinx combination for different application domains.
Overview of the performance and validation quality of the main existing logic estimation approaches.
Overview of the performance and validation quality of the main existing interconnect estimation approaches.
Summary of the independent variables of the Quipu interconnect prediction model for the number of nets.
Chapter
The area predictions and theoretical speedups for the kernels in Canny Edge Detection (CED).
Overview of the area predictions and theoretical speedups for the merged kernel in the CED application for the subsequent optimizations that were performed.
The area predictions and theoretical speedups for the kernels in Mixed Excitation Linear Prediction (MELP).
Results of the analysis of the merging candidates and final merged kernels and the actual synthesis results.
Summary of the results of the Q² Profiling Framework and the partitioning based on those results.
Appendix A
An overview of the different applications in the Quipu Kernel library.
List of listings
Chapter
An example of a script in the kernel library with a hook for a particular HLS tool.
Pseudo code of the Quipu modeling script implemented in the R statistical computing environment.
Appendix A
Invocation of the Xilinx ISE . Synthesis Tool (XST).
Execution of the complete Xilinx ISE . synthesis toolchain.
TCL script used to invoke Catapult-C.
The invocation of the LegUp C-to-Verilog compiler in the LLVM compiler framework.
The Synopsys invocation in case of LegUp-generated Verilog.
Wrapper script for the SystemRacer HLS tool.
The configuration file that was used in the Altera Quartus . synthesis toolchain.
Execution of the complete Altera Quartus . synthesis toolchain.
List of Acronyms
AIC Akaike’s Information Criterion
AICC Average Information Content Classification
ANN Artificial Neural Network
API Application Programming Interface
ALM Adaptive Logic Module
ANSI-C American National Standards Institute standard for the C programming language
ASAP As Soon As Possible Scheduling
ASIC Application Specific Integrated Circuit
ASIP Application Specific Instruction-set Processor
AST Abstract Syntax Tree
BIC Bayes’ Information Criterion
BFGS Broyden-Fletcher-Goldfarb-Shanno, an ANN training algorithm
BRAM Block RAM, a local block of RAM on a Virtex FPGA
CCU Custom Computing Unit
CDFG Control- and Data Flow Graph
CED Canny Edge Detection
CFG Control Flow Graph
CLB Configurable Logic Block
COCOMO COnstructive COst MOdel
CPU Central Processing Unit
DAG Directed Acyclic Graph
DSE Design Space Exploration
DFG Data Flow Graph
DSP Digital Signal Processor or Digital Signal Processing
DWARV Delft Workbench Automated Reconfigurable VHDL generator
DWB Delft Workbench
FFT Fast Fourier Transform
FIR Finite Impulse Response
FPGA Field Programmable Gate Array
FSM Finite State Machine
GCLP Global Criticality/Local Phase driven algorithm
GLM Generalized Linear Model
GPP General Purpose Processor
GPU Graphical Processing Unit
H-CDFG Hierarchical Control- and Data Flow Graph
HDL Hardware Description Language
HLL High-Level Language
HLS High-Level Synthesis
IDE Integrated Development Environment
ILP Instruction-Level Parallelism
IP Intellectual Property
IR Intermediate Representation
LAB Logic Array Block
LE Logic Element
LLVM Low-Level Virtual Machine, a research compiler infrastructure
LLOC Logical Lines Of Code
LOC Lines Of Code
LOOCV Leave-One-Out Cross-Validation
LPC Linear Predictor Coder, a type of voice coder
LR Linear Regression
LogR Logistic Regression
LUT Look-Up Table
M Memory Access Intensity Profiler
MAL Molen Abstraction Layer
MAPE Mean Absolute Percentage Error
MELP Mixed Excitation Linear Prediction
MIBS Mapping and Implementation Bin Selection
MPSoC Multi-Processor System on Chip
MSE Mean Squared Error
MAC Multiply Accumulate instruction
NFR Non-Functional Requirement
NPATH Number of Static Acyclic Paths
OLSR Ordinary Least Squares Regression
OS Operating System
PC Principal Component
PCA Principal Component Analysis
PCM Pulse-code Modulation, a method to encode digitally sampled analog signals
πISA Polymorphic Instruction Set Architecture
PCR Principal Component Regression
PE Processing Element
PFU Programmable Functional Unit
PGM Portable GrayMap, an image file format defined by the Netpbm project
PIP Programmable Interconnect Point
PLSR Partial Least Squares Regression
QDU Quantitative Data Usage Graph
Q-Q Plot Quantile-Quantile Plot
ROCCC Riverside Optimizing Compiler for Configurable Computing
RP Reconfigurable Processor
ρμ-code configuration microcode, or Reconfigurable Micro-code, used in the Molen Machine Organization
RMSE Rooted Mean Squared Error
RMSE% Rooted Mean Squared Error as a Percentage of the mean of the independent variable
RTL Register Transfer Level
SA-C Single Assignment C
SCM Software Complexity Metric
SIMD Single Instruction Multiple Data
SLOC Source Lines Of Code
SoC System on Chip
SR Stepwise Regression
TCL Tool Command Language
TLM Transaction Level Modeling
TTA Transport Triggered Architectures
VHDL VHSIC Hardware Description Language (VHSIC stands for Very-High-Speed Integrated Circuit)
Vocoder Voice Coder (a Coder/Decoder for transmission of voice signals)
XDL Xilinx Design Language
Terminology
In this dissertation, several words that are ambiguous in the English language are utilized. In the following, we list the definitions that we utilize for some important words.
Quipu Modeling Approach A modeling approach that generates hardware prediction models targeting the early stages of HW/SW Co-design. The aim of this approach is to capture the relation between hardware and software characteristics as a quantitative regression model, called a Quipu Prediction Model. The Quipu Modeling Approach targets hardware prediction for parts of an application, called kernels. This modeling approach is the main contribution of this thesis. For more details see Section ..
Quipu Prediction Model A quantitative hardware prediction model generated by the Quipu Modeling Approach. A Quipu Prediction model is specifically generated for a particular combination of a toolchain, a platform, and a hardware characteristic. For more details see Section ..
Kernel A consecutive code segment in the context of a larger application, which performs a set of operations. A kernel can either be a function or a loop nest. In this dissertation, the terms function and kernel are used interchangeably.
Function A kernel that is an independent unit with respect to the rest of the source code. A function can be executed as a whole by calling the function with a set of parameters. Other words for function are: (sub)routine, procedure, or method.
Error A quantitative measure of the discrepancy between observed values and predicted values. The error can be given in the units of the measurements or as a percentage. As there are multiple ways to indicate the error, we specify which specific error metric is used, as much as possible. Some error metrics in this thesis are Rooted Mean Squared Error (RMSE), Rooted Mean Squared Error as a Percentage of the mean of the independent variable (RMSE%), Mean Absolute Percentage Error (MAPE), and the percentage error.
Model An exactly specified quantitative relation between two domains that can be measured. In most of the cases, it will denote a (linear) regression model quantifying the relation between software complexity metrics and hardware measures.
input variables, and explanatory variables.
Dependent variable The variable that a particular model aims to predict. Again, many equivalent terms exist, such as: response variable, output parameter, and output variable.
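As an illustration of the error metrics named under Error, the following minimal Python sketch computes RMSE, RMSE%, and MAPE for a handful of values. The function names and the data are hypothetical and not taken from this dissertation. The sketch also assumes that RMSE% is normalized by the mean of the observed values; the terminology above speaks of the mean of the independent variable, so this normalization is an assumption rather than the definitive formula.

```python
import math


def rmse(observed, predicted):
    """Square root of the mean squared difference between observations and predictions."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_percent(observed, predicted):
    """RMSE as a percentage of the mean observed value (assumed normalization)."""
    mean_observed = sum(observed) / len(observed)
    return 100.0 * rmse(observed, predicted) / mean_observed


def mape(observed, predicted):
    """Mean absolute percentage error; every observed value must be non-zero."""
    ratios = [abs((o - p) / o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical data: measured versus predicted slice counts for three kernels.
observed = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]

print(rmse(observed, predicted))
print(rmse_percent(observed, predicted))
print(mape(observed, predicted))
```

RMSE is expressed in the units of the measurement, whereas RMSE% and MAPE are relative measures, which makes them easier to compare across hardware characteristics with different scales.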
CHAPTER
Introduction
“Computers are useless. They can only give you answers.”
— Pablo Picasso
The pervasive use of embedded systems throughout the entire range of devices in our homes, at our work, and even in our pockets, has opened up a new world of possible applications that support and enrich our daily lives. This new territory has awakened an ever growing demand for computing performance and low power consumption. Mobile users, for example, demand high performance to see, to listen, to create, to share, and to experience high quality multimedia content without the need to charge their device every few hours. As battery technology does not scale as fast as computing technology, the challenge remains to reduce the power consumption, while still providing increased processing performance. At first glance, these two demands seem contradictory, but are they?
In the past, industry was able to address this challenge by continuing technology scaling. Every new technology node provided increased computing power and consumed less energy, compared to the previous node. As Moore’s law predicted, the number of transistors doubled every months for many years and computing performance scaled along with it, as did the reduction of power consumption. However, in the last decade, wire delays, leakage current, and the memory bottleneck have ended this scaling contest. In addition, there are economic reasons why continued exponential growth of computing power is unrealistic, as indicated by []. While Moore’s law will still apply to transistor scaling for a few years to come, industry has had to find alternative ways to provide additional computing performance at low power budgets. It is interesting to study exactly what kind of alternatives are available and what difficulties these alternatives present. We investigate this in the following section.
Problem Overview
We established that there is an end to the “easy” scaling of computing performance and suppression of power consumption. Industry is looking for alternatives to achieve these goals. Indeed, there has been a steady increase in the use of more parallel and heterogeneous architectures. These architectures incorporate multiple and different Processing Elements (PEs), such as Graphical Processing Units (GPUs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), and Digital Signal Processors (DSPs), which can accelerate the computationally intensive parts of an application or more efficiently execute power-hungry parts. A broad spectrum of such architectures already exists in the industry today, such as the hArtes platform [], IBM’s Cell Broadband Engine [], or nVidia’s Tegra mobile platform []. By mapping computationally intensive parts on the specialized PEs in these architectures, it becomes possible to achieve significant performance improvements, while keeping power consumption in check. One reason for this is that specialized components have advantageous characteristics in certain types of computation that let them efficiently execute tasks of those types. Additionally, the parallel use of multiple PEs allows the simultaneous execution of multiple tasks and makes the performance benefits of parallel algorithms available.
An interesting development with regard to these heterogeneous architectures has been the growing popularity of reconfigurable architectures. The reconfigurable nature of these architectures provides the necessary flexibility to accommodate changing application requirements, while, at the same time, allowing for substantial application speedups, often at low power costs. This flexibility becomes even more visible when the reconfigurable fabric is integrated in a microprocessor architecture, so as to easily incorporate and execute specialized accelerators. Recent examples of such tightly coupled reconfigurable PEs are Xilinx’ Zynq [] and Altera’s Cyclone V [], which both integrate a dual-core ARM Cortex-A processor with FPGA technology, providing performance, flexibility, and low power consumption.
These architectures provide increased potential for computing performance and reduced power consumption, while retaining a much needed level of flexibility. However, they pose some challenges for engineers who want to harness this potential. For instance, effective use of heterogeneous architectures requires engineers to combine knowledge from both software and hardware engineering. Software engineering skills are essential to create and maintain increasingly complex applications, whereas accelerating computationally intensive parts of an application on reconfigurable fabrics requires hardware engineering skills. As engineers that have both sets of skills are scarce, companies struggle to effectively use heterogeneous and reconfigurable platforms. This problem is being addressed in several ways. Some focus on educating engineers in both disciplines to better cooperate in a co-design setting. Others move to specific co-design languages, such as SystemC [], to tackle this problem. Nevertheless, a knowledge gap between hardware designers and software developers remains. In addition, companies often have large legacy code bases in High-Level Languages (HLLs). For these reasons, there is a clear demand for comprehensive tool support to bridge the gap that engineers are facing and to retarget existing code to heterogeneous and reconfigurable platforms.
Part of that demand is being addressed by High-Level Synthesis (HLS) tools, such as Catapult C [], ROCCC¹ [], or DWARV² []. These tools automatically generate Hardware Description Language (HDL) descriptions from HLL descriptions, enabling software developers to map parts of their application to the reconfigurable fabric. In this way, there is no need for detailed hardware design expertise and the design time can be reduced significantly. Notwithstanding, developers have to address many other issues in a short time frame in order to meet the constant time-to-market pressure. Some of these issues include the identification of resource intensive parts of the application, the evaluation of different architectures and mapping options, and the estimation of project costs. These issues are more apparent in the early design stages, when a huge design space still needs to be explored.
Figure .: In the early stages of HW/SW Co-design, the developer faces many different issues. There is a clear demand for information to guide the development process in the right direction.
In these early stages, there is a need to quickly evaluate many design alternatives, so as to significantly reduce the design space in a short amount of time. This early Design Space Exploration (DSE) needs to take into account many different hardware aspects, such as performance improvement, resource constraints, power consumption, run-time reconfiguration, and communication time. On the one hand, DSE requires large quantities of information about these and other hardware characteristics, but, on the other hand, there is very little time to gather this information. As there are many design alternatives, it is not an option to spend more than a few moments in evaluating each alternative. Therefore, generating hardware for every part of the application and every possible PE is not feasible. By doing that, DSE would become exceedingly time-consuming, taking up to several months. Even more, in the early design phases, the application may still change many times and this process would have to be repeated. The crux of the matter, therefore, is: how do we obtain the required information in a timely fashion?
One way to address this lack of information is the use of high-level hardware prediction models. Instead of the time-consuming process of actually implementing different design alternatives, the necessary information to evaluate each design is provided by prediction models. It is essential for these prediction models to be fast and accurate. Fast, because many alternatives need to be evaluated in a short amount of time. Accurate, because the evaluation of the predictions should yield appropriate design decisions. Consequently, it becomes possible to quickly identify resource-intensive parts of an application, to evaluate the effect of changes on the cost of the final design, or to select the right PEs for each application part. As a result, hours or even days can be saved per design iteration.
¹ Riverside Optimizing Compiler for Configurable Computing
² Delft Workbench Automated Reconfigurable VHDL generator
Figure .: An overview of the Quipu Modeling Approach. A library of software kernels ① provides the necessary data to generate prediction models. The C-code of the kernels is analyzed ② to obtain a set of metrics ③ (e.g. operators, loops). These kernels are also synthesized using HLS tools ④ to obtain hardware characteristics ⑤. Statistical modeling ⑥ captures the relation between these measures and generates a Quipu Prediction Model ⑦.
. The Quipu Modeling Approach
In this dissertation, we propose the Quipu Modeling Approach that generates hardware prediction models targeting the early stages of HW/SW Co-design. The aim of this approach is to capture the relation between hardware and software characteristics as a quantitative regression model, called a Quipu Prediction Model. The Quipu Modeling Approach targets hardware prediction for parts of an application, called kernels. Kernels are consecutive code segments in the context of a larger application, which perform a set of operations. A kernel can be a function or a loop nest, but, with respect to the Quipu Modeling Approach, kernels are assumed to be functions.
An overview of the Quipu Modeling Approach is given in Figure .. The Quipu Modeling Approach utilizes a dataset of software and hardware characteristics generated from a common base of kernels, in order to empirically determine a quantitative relation between these datasets. First, the C-code of each kernel in the library is analyzed to obtain a set of metrics that characterize the complexity of the software. In addition, the hardware of each kernel is generated using an HLS tool. By using statistical modeling techniques, the relation between the datasets of software and hardware characteristics is determined. This relation is then expressed as a Quipu Prediction Model. The Quipu Modeling Approach generates prediction models based on statistical regression from datasets generated for specific situations. Therefore, prediction models can be generated to target different hardware characteristics, different HLS tools, different platforms, and different application domains.
Figure .: An illustration of the utilization of a Quipu Prediction Model. An application ① that contains several kernels is considered for partitioning. The C-code of the kernels is analyzed ② to obtain a set of metrics ③ (e.g. operators, loops). These metrics are used as input to a Quipu Prediction Model ④. This model produces predictions ⑤ for certain hardware characteristics. These predictions can be used, for example, during hardware/software partitioning and mapping ⑥.
An illustration of the use of a Quipu Prediction Model is given in Figure .. In this figure, we see how the C-code of an application that contains several kernels is analyzed to obtain a set of metrics that characterize the software complexity. A Quipu Prediction Model uses these metrics as input and makes hardware predictions, which can be used, for example, during the partitioning and mapping of an application. Two questions arise when designing such an approach: (a) how can software characteristics be quantified and determined, and (b) how can the quantitative relation between software and hardware characteristics be determined?
• How do we measure software complexity?
The discipline of Software Measurement provides the concept of Software Complexity Metrics (SCMs) as a way to quantify software characteristics. The Quipu Modeling Approach is based on the assumption that the metrics that describe the complexity of HLL software descriptions are correlated with the hardware characteristics of the designs generated from these descriptions. We provide a detailed investigation into the subject of SCMs in Chapter 3.
• How do we quantify the relation between hardware and software?
In statistics, we find a wide range of regression techniques that estimate the relation between two sets of data. In this thesis, we aim to use such techniques to estimate the relation between software and hardware characteristics. The Quipu Modeling Approach utilizes several regression techniques, such as Partial Least Squares Regression (PLSR), the Generalized Linear Model (GLM), and Artificial Neural Networks (ANNs), to achieve this. The regression techniques used in the Quipu Modeling Approach are discussed in Chapter 4.
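The flow described above (kernel metrics in, hardware characteristic out, regression in between) can be illustrated with a minimal sketch. It is not the Quipu implementation: it substitutes plain ordinary least squares for the PLSR, GLM, and ANN techniques named in the text, and the metric choices (operator count, loop count), the area values, and the function names are all assumptions made up for illustration.

```python
import numpy as np

# Hypothetical kernel library: one row per kernel, one column per software
# complexity metric (illustrative choices: operator count, loop count).
metrics = np.array([
    [10.0, 1.0],
    [25.0, 2.0],
    [40.0, 2.0],
    [55.0, 4.0],
    [80.0, 5.0],
])

# Hypothetical hardware characteristic per kernel, standing in for a value
# that would normally come from HLS and synthesis. Synthetic data only:
# area = 50 + 12 * operators + 30 * loops.
area = 50.0 + 12.0 * metrics[:, 0] + 30.0 * metrics[:, 1]


def fit_linear_model(x, y):
    """Fit y ~ b0 + b1*x1 + b2*x2 + ... by ordinary least squares."""
    design = np.column_stack([np.ones(len(x)), x])  # prepend intercept column
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs


def predict(coeffs, x):
    """Evaluate the fitted equation for one or more metric vectors."""
    x = np.atleast_2d(x)
    design = np.column_stack([np.ones(len(x)), x])
    return design @ coeffs


coeffs = fit_linear_model(metrics, area)

# Predict the characteristic for a kernel that is not in the library.
new_kernel = np.array([30.0, 3.0])
predicted_area = predict(coeffs, new_kernel)[0]
print(coeffs, predicted_area)
```

Once fitted, a prediction is a single dot product, which is why evaluating such a model is cheap compared to generating and synthesizing hardware.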
. Research Challenges
The problem landscape of hardware estimation for reconfigurable platforms has many facets. In this thesis, not all aspects of this research domain can be addressed. For that reason, we focus on the following challenges, which have been highlighted earlier in this chapter.
Challenge 1: How can the many design alternatives in the huge design space for heterogeneous reconfigurable platforms be evaluated in a sufficiently short period of time?
The huge design space inherent to heterogeneous reconfigurable architectures during the early design stages poses real challenges to application developers, who need to meet time-to-market and budget constraints. It is therefore essential to reduce the design space as early as possible by pruning away any infeasible design alternatives. In addition to reducing the design space, it is also important to evaluate the design alternatives in a short period of time, especially in the highly iterative early design stages, when the application design can change often. Early hardware resource estimation addresses both these issues. First, it helps to reduce the design space, as functions that do not meet the resource constraints can be omitted from further evaluation. Second, it can provide the hardware characteristics needed for the evaluation of design alternatives without the need for time-consuming hardware generation and synthesis. The Quipu Modeling Approach aims to provide such fast estimation models. Because these Quipu Prediction Models are simple equations generated using linear regression, the time required to perform hardware prediction is in the order of a few milliseconds per prediction. Additionally, the use of SCMs allows predictions to be made from HLL source code without the need for further hardware generation.
The SCMs are discussed in Chapter 3 and the statistical modeling techniques used by the Quipu Modeling Approach are discussed in Chapter 4.
Challenge 2: How can hardware characteristics be transparently predicted for an arbitrary set of HLS tools, while retaining the comparability of these predictions?
Since the introduction of HLS ten years ago, a broad spectrum of HLL-to-HDL tools has become available on the market. As it is important for companies to quickly reduce the design space, fast estimation approaches for each of those tools, as well as for different target platforms, are essential. However, not all tools and platforms offer viable estimation models. Moreover, the prediction models that do exist offer different levels of granularity, accuracy, and restriction. Therefore, it is essential to provide a way of generating prediction models for different HLS tools and target platforms that exhibit the same level of granularity, that have comparable error characteristics, and that assume the same restrictions. On the one hand, such an approach can generate models for tools and platforms for which no estimation approach exists. On the other hand, it makes predictions for different HLS tools and target platforms more comparable. One of the main features of the Quipu Modeling Approach is its ability to generate models for different HLS tools and heterogeneous platforms. Regardless of the tool or platform, Quipu Prediction Models are generated with the same level of granularity, with comparable levels of accuracy, and, where necessary, with the same restrictions.
The methodology for generating new models using the Quipu Modeling Approach is presented in Chapter 5. The evaluation of the accuracy of Quipu Prediction Models can be found in Chapter 6.
Challenge 3: How can we validate the accuracy of a hardware estimation approach in a realistic, representative, and quantitative manner?
In our evaluation of hardware estimation approaches, it has become apparent that there is a consistent lack of validation material. Almost all authors validate their approach using less than validation points, which poses serious questions about the applicability of these approaches in other contexts. First, it is important to validate an estimation approach with a substantial set of kernels so as to provide the necessary confidence about the reported accuracy and prevent over-optimistic results. Secondly, the validation of hardware estimation in a certain context requires the use of a collection of realistic functions that are representative of all the functions in that particular context. At the basis of the Quipu Modeling Approach is a kernel library with kernels that are used for validation purposes. These kernels come from different real applications from a wide range of application domains. Furthermore, the statistical modeling performed by the Quipu Modeling Approach makes it possible to make statements about the confidence for each prediction.
The kernel library utilized by the Quipu Modeling Approach is presented in Chapter . The evaluation of the accuracy of Quipu Prediction Models can be found in Chapter 6.
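The pruning step described under the first challenge, where functions that do not meet the resource constraints are omitted from further evaluation, can be sketched as follows. This is only an illustration under stated assumptions: the `LinearModel` class, the `prune_kernels` helper, the coefficients, the kernel names, and the area budget are all hypothetical and do not come from the Quipu tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearModel:
    """A prediction model expressed as a simple linear equation (hypothetical)."""
    intercept: float
    weights: dict  # metric name -> regression coefficient

    def predict(self, metrics):
        # Evaluate intercept + sum(coefficient * metric value).
        return self.intercept + sum(
            weight * metrics[name] for name, weight in self.weights.items()
        )


def prune_kernels(kernels, model, area_budget):
    """Keep only the kernels whose predicted area fits within the budget."""
    feasible = {}
    for name, metrics in kernels.items():
        estimate = model.predict(metrics)
        if estimate <= area_budget:
            feasible[name] = estimate
    return feasible


# Illustrative coefficients and kernel metrics, not measured values.
model = LinearModel(intercept=50.0, weights={"operators": 12.0, "loops": 30.0})
kernels = {
    "fir": {"operators": 20, "loops": 1},
    "dct": {"operators": 90, "loops": 4},
    "sobel": {"operators": 45, "loops": 2},
}

feasible = prune_kernels(kernels, model, area_budget=1000.0)
print(feasible)
```

In this made-up example the `dct` kernel is dropped because its predicted area exceeds the budget, so it would never need to be passed through an HLS tool during early exploration.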
. Dissertation Contributions
The focus of this dissertation is on early high-level quantitative prediction modeling for HW/SW Co-design. In this area of research, we have made the following contributions.
Contribution 1: A High-Level Quantitative Modeling Approach for early HW/SW Co-design based on Statistical Methods that effectively captures the correlation between hardware and software.
We present the Quipu Modeling Approach, which provides estimation models targeting the early stages of HW/SW Co-design. These estimation models are based on metrics determined from HLL descriptions that characterize the complexity of the software at hand. A set of regression techniques is utilized to describe how these metrics are related to the different hardware measures. Among those techniques are least squares regression, neural networks, and principal component regression. In this context, we show how Quipu Prediction Models achieve prediction speeds of up to . predictions per second, while retaining a high degree of accuracy.
Contribution 2: A set of Quipu Prediction Models for various HLS tools, various target platforms, various hardware measures, and various application domains, demonstrating the generality of the Quipu Modeling Approach.
The Quipu Modeling Approach can be applied to different HLS tools, different target platforms, different hardware measures, and different application domains. We demonstrate how Quipu Prediction Models can be generated in a new context by providing a comprehensive example run of the Quipu Modeling Methodology. Furthermore, we provide the results for a set of fully operational Quipu Prediction Models for four separate combinations of an HLS tool and a target platform, four different application domains, and twelve different hardware measures.
Contribution 3: Two case studies, where the Quipu Modeling Approach is utilized to analyze and to partition an application.
In order to evaluate the practical use of the Quipu Prediction Models, we present two case studies in the analysis and partitioning of an application. The first application is a well-known edge-detection algorithm from the domain of image processing. The second application is an advanced voice codec featuring good voice quality even at extremely low bit rates. The Quipu Prediction Models are used to evaluate the area constraints of the target platform during partitioning. Furthermore, we show how the area predictions made by our Quipu Prediction Models, together with execution time profiles, can be used to perform a preliminary DSE.
Contribution 4: An elaborate statistical validation of the predictive quality of the Quipu Prediction Models.
In our work, we validate the Quipu Prediction Models using a library of application kernels from different applications from a wide range of application domains, contrary to existing approaches. In addition, we evaluate the Quipu Prediction Models by detailed analyses of their error behavior, in contrast with almost all other approaches, which characterize their results with a single percentage error. Additionally, this dissertation argues that more elaborate investigations of predictive quality are necessary in the field of hardware estimation.
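To make the contrast between a single percentage error and a more detailed view of error behavior concrete, the sketch below summarizes the distribution of absolute percentage errors over a set of kernels. It is one possible way to look at error behavior, not the evaluation procedure used in this dissertation; the function name `error_summary` and all numbers are invented for illustration.

```python
import numpy as np


def error_summary(actual, predicted):
    """Summarize the distribution of absolute percentage errors."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    pct_error = 100.0 * np.abs(predicted - actual) / np.abs(actual)
    return {
        "mean": float(pct_error.mean()),
        "median": float(np.median(pct_error)),
        "p90": float(np.percentile(pct_error, 90)),
        "max": float(pct_error.max()),
    }


# Made-up measured and predicted values for five kernels.
actual = [100.0, 200.0, 400.0, 800.0, 1000.0]
predicted = [110.0, 190.0, 400.0, 1200.0, 1000.0]

summary = error_summary(actual, predicted)
print(summary)
```

With these made-up values the mean error is 13% while the median is 5% and the worst case is 50%, which shows how one averaged figure can hide a single badly predicted kernel.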
. Dissertation Organization
The remainder of this dissertation is organized in several chapters. First, we survey the field of hardware estimation for reconfigurable platforms in Chapter 2. Then, we investigate how software complexity can be measured with respect to hardware prediction, in Chapter 3. After that, in Chapter 4, we discuss in detail the different modeling techniques used in the work presented in this thesis. Subsequently, we present the Quipu Modeling Approach in Chapter 5. The Quipu Modeling Approach is then evaluated in Chapters 6 and 7. Finally, we conclude this dissertation in Chapter 8.
[Figure: Chapters 2 to 8 and Appendix A (The Quipu Modeling Approach: Implementation Details), shown alongside Challenges 1 to 3 and Contributions 1 to 4.]
Figure .: An outline of the different chapters, challenges, and contributions in this dissertation. Chapter 3 and Chapter 4 have the same color as they both address the theory at the basis of our approach. Chapter 6 and Chapter 7 have the same color as they both address the validation of Quipu Prediction Models. Appendix A gives additional information with regard to Chapter 5.
A visual outline of this dissertation is given in Figure .. In this figure, the relation between the different chapters, research challenges, and contributions has been depicted. In the following, we present a brief summary of each chapter.
Chapter 2: Hardware Estimation for Reconfigurable Platforms
In Chapter 2, we investigate the problem of hardware estimation for reconfigurable platforms. The domain of heterogeneous reconfigurable architectures poses new challenges in the development of applications that effectively use the different types of PEs and adapt to changing run-time conditions. The huge design space that developers are confronted with, as they target heterogeneous PEs, demands fast and early estimation approaches that guide application development, prune the design space, and drive partitioning and mapping. We investigate these issues and identify the role of the Quipu Modeling Approach in the problem landscape. Several research challenges are identified and research assumptions are formulated.
Chapter 3: Measuring Software Complexity for Hardware Estimation
Chapter 3 presents an overview of the problem of measuring software complexity as a basis for hardware prediction. The estimation of hardware characteristics based on Linear Regression (LR) targeting HLS tools requires measurement data of the HLL descriptions at hand. The discipline of Software Measurement provides the necessary tools to acquire these measurements. Focusing on Software Complexity Metrics (SCMs) at the function and kernel level, the Quipu Modeling Approach features a wide range of relevant metrics. These metrics may be determined at different times during the compilation process to account for HLS optimizations.
Chapter 4: Statistical and Quantitative Prediction Modeling
At the basis of the modeling approach presented in this thesis are several statistical modeling techniques. Chapter 4 presents the necessary theory to understand these techniques. A definition of a statistical model is presented and an explanation of simple linear regression is given. Such regression models are based on a set of assumptions, which do not hold true in all cases. To handle those cases, a specialized set of regression techniques is utilized, such as the Box-Cox power transform, Artificial Neural Networks (ANNs), and Partial Least Squares Regression (PLSR). As it is important to evaluate the generated models in a comparable and robust manner, the evaluation criteria used in this dissertation are discussed.
Chapter 5: The Quipu Modeling Approach
The theory about measuring software complexity and statistical quantitative modeling, discussed in Chapters 3 and 4, has been implemented in the Quipu Modeling Approach. In Chapter 5, an overview of this modeling approach is presented. The Quipu Modeling Approach consists of several essential components, which are described in detail. As an example of the generality of the Quipu Modeling Approach, the chapter contains a comprehensive description of the generation of a Quipu Prediction Model for a new tool and a new platform.
Chapter 6: Validation of the Quipu Prediction Models
Chapter 6 contains the detailed evaluation of the Quipu Prediction Models that are generated by the Quipu Modeling Approach presented in Chapter 5. This approach generates hardware prediction models for heterogeneous platforms targeting early DSE. In this chapter, we evaluate the various qualities of the generated prediction models. We investigate the accuracy, the generality, the specificity, and the speed of Quipu Prediction Models. To this end, models for different hardware measures, different HLS tools, and different application domains are presented. Additionally, we compare the Quipu Modeling Approach against other prominent approaches in the domain of Hardware Estimation. Finally, we investigate the evolution of the Quipu Modeling Approach with respect to the quality of the models.
Chapter 7: Quipu Prediction Models in Practice
The Quipu Prediction Models can be used effectively in real scenarios. In Chapter 7, two scenarios, where Quipu Prediction Models play an important role, are described: a well-known edge detector and a high-grade voice codec. These applications are analyzed using profiling information and hardware estimates provided by Quipu Prediction Models. The applications are partitioned on a heterogeneous platform utilizing the profiling information and area estimates obtained during application analysis. Several benefits of Quipu Prediction Models are discerned.
Chapter 8: Conclusions
In Chapter 8, a summary of the work in this dissertation is presented. Several conclusions with respect to the contributions anticipated in the introduction are drawn. Subsequently, the chapter lists several open issues and opportunities for future research.
It is our sincere hope that the reader will find this dissertation an interesting contribution in the fields of hardware estimation and reconfigurable computing. Although the work that we present makes heavy use of statistics — a subject many readers might not be too familiar with — it is our belief that the presentation of the necessary theory and background will help the interested reader to grasp the contents of our work.
CHAPTER 2
Hardware Estimation for Reconfigurable Platforms
In this chapter, we investigate the problem of hardware estimation for reconfigurable platforms. The domain of heterogeneous reconfigurable architectures poses new challenges in the development of applications that effectively use the different types of processing elements and adapt to changing run-time conditions. The huge design space that developers are confronted with as they target heterogeneous processing elements demands fast and early estimation approaches that guide application development, prune the design space, and drive partitioning and mapping. We investigate these issues and identify the role of our Quipu Modeling Approach in the problem landscape.
The rapid advancement of software development in the last few decades was enabled to a great extent by the Von-Neumann Machine (or Stored-program machine) concept []. This machine concept introduces a clear separation between the program and data storage on the one hand, and the program execution on the other. As can be seen in Figure ., the Central Processing Unit (CPU) executes a program that consists of instructions by fetching them from memory. For each instruction, the necessary operands are also fetched from memory before its execution. Afterwards, the results of the instruction are written back to the memory. Inherently, this process results in a sequential flow of data units and control directives between the memory and the processing unit.
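The fetch, execute, and write-back cycle described above can be made concrete with a toy stored-program machine. The sketch below is purely illustrative: the four-instruction set, the memory layout, and the `run` function are assumptions invented for this example and do not model any real processor.

```python
def run(memory):
    """Run a toy stored-program machine; return (final memory, memory accesses)."""
    memory = list(memory)  # work on a copy so the caller's program is untouched
    pc = 0        # program counter
    acc = 0       # single accumulator register
    accesses = 0  # every instruction or data transfer over the memory bus
    while True:
        opcode, operand = memory[pc]  # instruction fetch
        accesses += 1
        pc += 1
        if opcode == "LOAD":
            acc = memory[operand]     # operand fetch
            accesses += 1
        elif opcode == "ADD":
            acc += memory[operand]    # operand fetch
            accesses += 1
        elif opcode == "STORE":
            memory[operand] = acc     # result written back to memory
            accesses += 1
        elif opcode == "HALT":
            return memory, accesses
        else:
            raise ValueError(f"unknown opcode: {opcode}")


# Program and data share one memory: cells 0-3 hold instructions,
# cells 5-7 hold data (cell 4 is unused).
program = [
    ("LOAD", 5),
    ("ADD", 6),
    ("STORE", 7),
    ("HALT", 0),
    None,
    2,
    3,
    0,
]

final_memory, accesses = run(program)
print(final_memory[7], accesses)
```

Even this tiny program needs seven trips to memory to add two numbers, which hints at why the traffic between memory and processing unit becomes a limiting factor.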
The Stored-program machine abstraction provided a flexible model for programming computers, because the computer program was not hardwired and different programs could be executed on the same machine by providing punch cards or other media. In addition, the limited set of instructions provided a simple programming interface, greatly reducing the time to develop a new program. In this way, it paved the way for the rapid advancement of software development in the following decades.
The Von-Neumann machine has been very successful, due in no small part to the broad tool and hardware support for the paradigm throughout the software development process. Furthermore, the miniaturization of electronics has provided regular speed improvements (Moore’s Law) [], satisfying the ever-increasing need for more computation power. However, as Backus [] pointed out, the concept showed an inherent bottleneck, which he called the Von-Neumann bottleneck. Because the processing unit and the memory in a Von-Neumann machine are separate, instructions and data have to be moved continually. The sequential nature of this architecture limits the speed one can achieve by exploiting more parallelism, because with increasing levels of parallelism, the need for additional memory bandwidth also increases. However, as the limits of miniaturization become clear, exploiting parallelism is exactly what is needed. Systems already use multiple processors and multiple cores per processor to provide more computing power, and the memory bottleneck is becoming harder to solve. If we hope to adequately address this problem, we need to look beyond Von Neumann’s concept of a machine to other models of computation.
Figure .: An overview of the basic Von-Neumann architecture, with control and data flowing from the memory to the CPU and vice-versa. The communication caused by this setup can limit the speed at which the CPU can operate.
. Heterogeneous Reconfigurable Architectures
In recent years, the continuing applicability of Moore’s Law has come into question. First, wire delays become an increasing problem at higher speeds, and second, the manufacturing of transistors smaller than a few atoms seems unlikely. Furthermore, a growing demand for mobile technology and other systems with limited power supplies has made the use of fast Von-Neumann processors in such systems difficult if not impossible. To cope with this problem, manufacturers increasingly use specialized accelerators to speed up expensive algorithms in applications, such as media encoding and signal processing. These accelerators provide additional processing performance and power efficiency for specific types of tasks. In addition, specialized memory hierarchies that provide the necessary information to each heterogeneous Processing Element (PE) reduce the impact of the memory bottleneck inherent to Von-Neumann architectures. Both of these effects are important for mobile and embedded systems. Moreover, custom hardware can also help circumvent the need for difficult and expensive technology scaling, by providing increased performance compared to Von-Neumann architectures.
Figure .: Flexibility versus performance for different computing paradigms (Von-Neumann, reconfigurable, ASICs, and heterogeneous). Heterogeneous computing makes use of different types of computing elements that support any of these paradigms. As such, it overlaps with the other paradigms.
Machines that break Von-Neumann’s model have already been in use for some time. For example, Application Specific Integrated Circuits (ASICs) are able to use the parallelism inherent to the problem at hand and to combine processing and storage into their datapath, effectively eliminating the constant flow of directives and data. Special languages, tools, and design methodologies have been developed to make the implementation of ASICs possible. Despite the additional performance they provide, ASICs do not provide the programmability and flexibility of the Stored-program machine. At the same time, due to the high development costs, the utilization of ASICs has been limited to high volume production.
.. Types of Processing Elements
Apart from ASICs, heterogeneous computing platforms can contain many other types of PEs. Some of the more prominent are listed in the following.