Quantitative hardware prediction modeling for hardware/software co-design


Roel Meeuws


Quantitative Hardware Prediction Modeling for Hardware/Software Co-design

DISSERTATION

for the purpose of obtaining the degree of doctor at the Technische Universiteit Delft,

by the authority of the Rector Magnificus prof. ir. K.Ch.A.M. Luyben, chairman of the Board for Doctorates,

to be defended publicly

on Wednesday  July  at : hours

by

Roeland Jan MEEUWS

Engineer (ingenieur) in Computer Engineering, born in Rotterdam, the Netherlands


Composition of the doctoral committee:

Rector Magnificus, chairman
Prof. dr. K.L.M. Bertels, Technische Universiteit Delft, promotor
Prof. dr. ir. H.J. Sips, Technische Universiteit Delft, NL
Prof. dr. O. Nieto-Taladriz, Universidad Politecnica de Madrid
Prof. Dr.-Ing. Michael Hübner, Ruhr-Universität Bochum
Prof. dr. J. Champeau, École Nationale Supérieure de Techniques Avancées
Dr. A. Pimentel, Universiteit van Amsterdam
Dr. E.A. Cator, Technische Universiteit Delft
Prof. dr. ir. G.-J. Houben, Technische Universiteit Delft, reserve member

ISBN ----

Cover image: a ipu, an ancient Incan recording device, that consisted of colored threads of wool with a series of knots applied to them, thus signifying diﬀerent quantities and values.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmied, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without permission of the author.


I dedicate this book to my dear parents

and my beloved Marina.

Blessed is the man
who does not walk in the counsel of the wicked
or stand in the way of sinners
or sit in the seat of mockers.
But his delight is in the law of the LORD,
and on his law he meditates day and night.
He is like a tree planted by streams of water,
which yields its fruit in season
and whose leaf does not wither.
Whatever he does prospers.

— Psalm 1:1-3


Abstract

Hardware estimation is an important factor in Hardware/Software Co-design. In this dissertation, we present the Quipu Modeling Approach, a high-level quantitative prediction model for HW/SW Partitioning using statistical methods. Our approach uses linear regression between software complexity metrics and hardware characteristics. The resulting prediction models provide essential information for Co-design tasks such as identifying resource-intensive parts of the application, evaluating different mapping options, and guiding code modifications.

We show that prediction models can be generated for different High-Level Synthesis tools, reconfigurable devices, hardware measures, and application domains. To this end, we present a detailed investigation of several Quipu prediction models targeting each of these dimensions. In addition, we describe in detail how the Quipu Modeling Approach was targeted to a new tool and platform within a few days. We evaluate the quality of our models by carefully investigating their error behavior, which ranges from .%, for a domain-specific model targeting slices, to .%, for a domain-agnostic model targeting the number of controller states.

As a demonstration of the practical use of Quipu prediction models, we present a case study of two applications. These applications were analyzed and partitioned for the Molen Machine Organization. We show how Quipu prediction models play an important role in evaluating area constraints and performing Design Space Exploration. The two applications had an increased performance of % and %.


Acknowledgements

There comes an end to everything, as is the way of things. This is just as true of Ph.D. research as of anything else. It is up to the Ph.D. candidate to end it well. I am very thankful that I have been able to do just that. Although the book in your hands is the proof that my work has been fruitful in the last few years, I cannot and will not claim to have achieved this on my own. Many excellent people have contributed to this work or supported me during the last few years and I would like to try and thank every one of them. Here goes.

Let’s start by thanking my promotor and mentor, Koen Bertels. Koen, together with Stamatis Vassiliadis, you challenged me to start this undertaking and you kept on challenging me during the last few years. Stamatis’ death was a big shock to all of us. May he rest in peace. Regardless, you have supported me even during those days. I value the freedom that you entrusted to me and I hope that I have been able to keep that trust. It has always been a pleasure to discuss and envision new ideas, but also to talk about politics, religion, and so on. I am especially thankful for the dedicated time you took to read my thesis in order for me to schedule the defense before my wedding.

Of course, I also thank Carlo Galuzzi for all the proofreading of my papers and my thesis in the past years. What would Ph.D. students do without people like you who give valuable comments on their writing and practical tips to navigate the research arena? Thank you in particular for pushing me to write a journal paper before writing my thesis. Writing my thesis became so much simpler because of that!

A very special thanks goes out to the Quorum: Kamana, Faisal, and Arash. I am so thankful that God in His grace put me in an office with you guys for the last few years! I thank you three for all the discussions, the dinners, the laughs, the prayers… I am thankful to have gained such friends as you. Kamana, thank you for your trust, friendship, and honesty. Faisal, thank you for your friendship, laughs, and the parathas of your wife! Arash, thank you for your friendship, discussions on music, and our fruitful cooperation! God bless you guys!

I would also like to thank Koen, Carlo, the Quorum, and Imran Ashraf for proofreading my thesis. Imran, although you are not officially part of the Quorum, I do also value your friendship and our joint research. You still have some years ahead of you, but I hope Arash and I have been able to assist you in kick-starting your research. Also, I must not forget Vlad Mihai Sima. Vlad, I cannot count the times that I popped into your


was not for your help and your DWARV compiler, I would not have been able to finish this work!

Good research also requires good technical and administrative support. In that respect, I would like to thank several excellent people. First, Lidwina, thank you for always guiding me through the jungle of financial, administrative, and other paperwork. I’ve also enjoyed our chats on other subjects. Second, Bert, your flexibility in providing any resources that were necessary for continuing my research has been essential. Third, Erik and Eef, you guys have been a great support. Whether it was installing an extra software package, putting up with me when I crashed the high-performance machines again, or just having a chat, I could count on you guys!

There are many more people that deserve special mention, but I hope they can forgive me for being only human. I do want to thank the following colleagues as well, in no particular order: Stephan Wong, Mojtaba Sabeghi, Razvan Nane, Chunyang Gou, Sorin Cotofana, Radu Ştefan, Ghazaleh Nazarian, Catalin Ciobanu, Maghyar Shahsavari, Laiq Hasan, Seyab Khan, Omar Esli Jimenez Villareal, Bogdan Spinean, Demid Borodin, Christos Strydis, Arnaldo Azevedo, Yi Lu, Thomas Marconi, Hans van Someren, Georgi Gaydadjiev, Georgi Kuzmanov, Elena Moscu Panainte, Cor Meenderinck, Ozana Dragomir-Azevedo, Lotfi Mhamdi, and the many others that I have failed to mention here!

I would also like to thank my friends here in Holland who had to miss my presence on many an occasion. I am sorry for not always being there, but grateful for your continued friendship! I thank, in particular, Johan Kok, Jeroen Brosky, David Speters, Hans Linker, Anoeska de Bonte, Carine van der Ham, Martijn Nijhoff, Gerard Aalbers, my fellow bible study group members, the members of the “Christelijk Gemengd Koor Delfshaven”, the members of the “Spangen Gospel Choir”, and many more.

Special thanks and appreciation go to my wife-to-be. Marina, you are the light of my life and I am so very grateful for every second with you. Even more, I am thankful for the fact that you pushed me to finish this book as soon as possible.

Of all people, my greatest thanks go out to my loving parents. Mom, Dad, I love you and am forever thankful for all the time, energy, love, guidance, and so much more that you have always given me. You’re the best!

Then there is nothing else to do but thank Him who is the creator of the heavens and the earth. To God be all the glory, praise, and thanks, forever and ever, in Jesus’ name, amen.

Roel Meeuws

Delft, The Netherlands, April


Table of contents

Abstract
Acknowledgements
Table of contents
List of figures
List of tables
List of listings
List of Acronyms
Terminology

Introduction
    Problem Overview
    The Quipu Modeling Approach
    Research Challenges
    Dissertation Contributions
    Dissertation Organization

Hardware Estimation for Reconfigurable Platforms
    Heterogeneous Reconfigurable Architectures
        Types of Processing Elements
        Reconfigurable Computing
    Hardware/Software Co-Design
    Hardware Estimation
        Evaluation Criteria
        Logic Resource Estimation
        Interconnect Estimation
        Validation in Hardware Estimation
    Project Context
        Molen Abstraction Layer
        Delft Workbench
    Research Demarcation
    Summary

Measuring Software Complexity for Hardware Estimation
    Software Measurement
    Classifying Software Metrics
        Entities and Attributes
        Scales of Measurement
        Static and Dynamic Metrics
        Levels of design
    Implementation Issues
    Software Complexity Metrics
        Halstead’s Software Science Metrics
        Average Information Content Classification
        Scope Number and Scope Ratio
        McCabe’s Cyclomatic Complexity
        Nesting Depth
        Piwowarski’s complexity
        Gong and Schmidt Complexity
        Loop Complexity
        (Modified) Basili-Hutchens Complexity
        Tai’s DU(G) Metric
        Oviedo’s Data Complexity
        Elshoff’s Dataflow Complexity
        (Source) Lines of Code
        Prather’s Testing Metric
        NPATH
        Operators, blocks, and variables
        Caveats
    Summary

Statistical and Quantitative Prediction Modeling
    Linear Model Definition
        Least Squares Regression
    Linear Regression Assumptions
        Normality
        Homoscedasticity
        Independence
        Linearity
    Modeling Issues
        Collinearity
        Non-linearity
        Sparse data
        Outliers
    Regression Techniques in the Quipu Modeling Approach
        Box-Cox power transform
        Generalized Linear Model
        Principal Component Regression
        Partial Least Squares Regression
        Variable Selection
        Artificial Neural Networks for Regression
        Multi-Level Models
    Model Evaluation
        Summarizing the prediction error
        Visualization of the errors
        Cross-validation
    Summary

The Quipu Modeling Approach
    Modeling Methodology
    Kernel Library
        Semi-Automatic Modeling Process
        Other tools
    Targeting the Quipu Modeling Approach to LegUp/Synopsys
    Summary

Validation of the Quipu Prediction Models
    Criteria of Evaluation
    Experimental Setup
    Speed of Quipu Prediction Models
    Domain-agnostic modeling
        HLS tools
        Hardware Characteristics
    Domain-specific modeling
    Comparison with other approaches
        Logic Utilization Estimation Approaches
        Interconnect Estimation Approaches
    Analysis of Quipu Prediction Models
    Evolution of Quipu Model Quality
    Summary

Quipu Prediction Models in Practice
    Case Descriptions
        Q² Profiling Framework
        Objective
        Applications
        Target Platform
    Q² Analysis and Partitioning
        Canny Edge Detection
        Mixed Excitation Linear Prediction Vocoder
    Discussion
    Summary

Conclusions
    Summary
    Main Contributions
    Research Opportunities

A Implementation details
Bibliography
List of Publications
Samenvatting en Stellingen (Summary and Propositions)
Curriculum Vitae


List of figures

Chapter 

In the early stages of HW/SW Co-design, the developer faces many different issues.
An overview of the Quipu Modeling Approach.
An illustration of the utilization of a Quipu Prediction Model.
An outline of the different chapters, challenges, and contributions in this dissertation.

Chapter 

An overview of the basic Von Neumann architecture.
Flexibility versus performance for different computing paradigms.
The layout of a typical island-style FPGA with Configurable Logic Blocks, Programmable Interconnect Points, and switch boxes (s).
A basic Logic Element with a K-input Look-Up Table, a flip-flop, and an output multiplexer.
An overview of the Molen Platform with an indication of the flow of instructions through the platform.
Overview of the Delft Workbench tool chain.

Chapter 

An overview of the different classifications with regard to Software Complexity Metrics that can be found in Section ..
A visualization of the quantitative relation between hardware and software, as captured by a certain model f(x).
Examples of different graphs related to assumptions in regression models.
A graphical representation of a covariance matrix.
Examples of different distributions from the exponential family.
Scree plot of the Principal Component Analysis of Software Complexity Metrics.
An example of a single neuron and a feed-forward Artificial Neural Network with a -- topology.
The visualization of the error distribution and error trend with respect to the observed values.
An example of K-fold cross-validation.

Chapter 

A detailed overview of the Quipu Modeling Approach with its different tools and components.
Model quality given different zero-thresholds.
An overview of different tool and platform combinations targeted by the Quipu Modeling Approach.

Chapter 

An overview of the prediction performance of the total area for different Quipu Prediction Models generated for four different High-Level Synthesis tools.
An overview of the prediction performance of the area for different Quipu Prediction Models for four different application domains.
The evolution of the Quipu Modeling Approach and its predecessors.

Chapter 

An overview of the Q² Profiling Framework.
The results of the different steps in the Canny Edge Detection algorithm as applied to the commonly used Lena picture.
The Pareto front of the different mapping options for the Canny Edge Detection application.
The Annotated Quantitative Data Usage Graph for the Canny Edge Detection application modified for hardware mapping.
The Pareto front for the different mapping options for the Mixed Excitation Linear Prediction application.
The Annotated Quantitative Data Usage graphs for the two merged versions of the Mixed Excitation Linear Prediction application.


List of tables

Chapter 

An overview of the main hardware estimation approaches relevant to the work in this thesis.
An overview of the main existing interconnect estimation approaches related to the work in this thesis.

Chapter 

Overview of the software complexity metrics employed by the preliminary version of the Quipu Modeling Approach (Quipuα).
Overview of the software complexity metrics employed by the current Quipu Modeling Approach (Quipuβ).
Detailed expressions for the NPATH complexity metric for different statements in the ANSI-C language.

Chapter 

Summary of the different statistical assumptions, issues, and utilized regression techniques that are employed within the Quipu Modeling Approach.

Chapter 

The number of kernels and the performance in generating synthesizable HDL for four C-to-HDL compilers.
The measurement speed of the Quipu Metrication Tool for different levels of optimizations.
Overview of the model performance of several Quipu prediction models targeting different combinations of tools and platforms for hardware resource measures.
Overview of the model performance of several Quipu prediction models targeting the Delft Workbench Automated Reconfigurable VHDL generator (DWARV)/Xilinx combination for different hardware measures.
Overview of the model performance of several Quipu prediction models targeting the DWARV/Xilinx combination for different application domains.
Overview of the performance and validation quality of the main existing logic estimation approaches.
Overview of the performance and validation quality of the main existing interconnect estimation approaches.
Summary of the independent variables of the Quipu interconnect prediction model for the number of nets.

Chapter 

The area predictions and theoretical speedups for the kernels in Canny Edge Detection (CED).
Overview of the area predictions and theoretical speedups for the merged kernel in the CED application for the subsequent optimizations that were performed.
The area predictions and theoretical speedups for the kernels in Mixed Excitation Linear Prediction (MELP).
Results of the analysis of the merging candidates and final merged kernels and the actual synthesis results.
Summary of the results of the Q² Profiling Framework and the partitioning based on those results.

Appendix A

A. An overview of the  diﬀerent applications in the ipu Kernel library. 

(23)

List of listings

Chapter 

An example of a script in the kernel library with a hook for a particular HLS tool.
Pseudo code of the Quipu modeling script implemented in the R statistical computing environment.

Appendix A

A. Invocation of the Xilinx ISE . Synthesis Tool (XST). . .  A. Execution of the complete Xilinx ISE . synthesis toolchain. . .  A. TCL script used to invoke Catapult-C. . .  A. e invocation of the LegUp C-to-Verilog compiler in the LLVM compiler

framework. . .  A. e Synopsys invocation in case of LegUp-generated Verilog. . .  A. Wrapper script for the SystemRacer HLS tool. . .  A. e conﬁguration ﬁle that was used in the Altera artus . synthesis

toolchain . . .  A. Execution of the complete Altera artus . synthesis toolchain. . . . 

(24)

(25)

List of Acronyms

AIC Akaike’s Information Criterion
AICC Average Information Content Classification
ANN Artificial Neural Network
API Application Programming Interface
ALM Adaptive Logic Module
ANSI-C American National Standards Institute standard for the C programming language
ASAP As Soon As Possible Scheduling
ASIC Application Specific Integrated Circuit
ASIP Application Specific Instruction-set Processor
AST Abstract Syntax Tree
BIC Bayes’ Information Criterion
BFGS Broyden-Fletcher-Goldfarb-Shanno, an ANN training algorithm.
BRAM Block RAM, a local block of RAM on a Virtex FPGA
CCU Custom Computing Unit
CDFG Control- and Data Flow Graph
CED Canny Edge Detection
CFG Control Flow Graph
CLB Configurable Logic Block
COCOMO COnstructive COst MOdel
CPU Central Processing Unit
DAG Directed Acyclic Graph
DSE Design Space Exploration
DFG Data Flow Graph
DSP Digital Signal Processor or Digital Signal Processing
DWARV Delft Workbench Automated Reconfigurable VHDL generator
DWB Delft Workbench

FFT Fast Fourier Transform
FIR Finite Impulse Response
FPGA Field Programmable Gate Array
FSM Finite State Machine
GCLP Global Criticality/Local Phase driven algorithm
GLM Generalized Linear Model
GPP General Purpose Processor
GPU Graphical Processing Unit
H-CDFG Hierarchical Control- and Data Flow Graph
HDL Hardware Description Language
HLL High-Level Language
HLS High-Level Synthesis
IDE Integrated Development Environment
ILP Instruction-Level Parallelism
IP Intellectual Property
IR Intermediate Representation
LAB Logic Array Block
LE Logic Element
LLVM Low-Level Virtual Machine, a research compiler infrastructure.
LLOC Logical Lines Of Code
LOC Lines Of Code
LOOCV Leave-One-Out Cross-Validation
LPC Linear Predictor Coder, a type of voice coder.
LR Linear Regression
LogR Logistic Regression
LUT Look-Up Table
M Memory Access Intensity Profiler
MAL Molen Abstraction Layer
MAPE Mean Absolute Percentage Error
MELP Mixed Excitation Linear Prediction
MIBS Mapping and Implementation Bin Selection
MPSoC Multi-Processor System on Chip
MSE Mean Squared Error
MAC Multiply Accumulate instruction

NFR Non-Functional Requirement
NPATH Number of Static Acyclic Paths
OLSR Ordinary Least Squares Regression
OS Operating System
PC Principal Component
PCA Principal Component Analysis
PCM Pulse-code Modulation, a method to encode digitally sampled analog signals.
πISA Polymorphic Instruction Set Architecture
PCR Principal Component Regression
PE Processing Element
PFU Programmable Functional Unit
PGM Portable GrayMap, an image file format defined by the Netpbm project.
PIP Programmable Interconnect Point
PLSR Partial Least Squares Regression
QDU Quantitative Data Usage Graph
Q-Q Plot Quantile-Quantile Plot
ROCCC Riverside Optimizing Compiler for Configurable Computing
RP Reconfigurable Processor
ρμ-code configuration microcode, or Reconfigurable Micro-code, used in the Molen Machine Organization.
RMSE Rooted Mean Squared Error
RMSE% Rooted Mean Squared Error as a Percentage of the mean of the independent variable
RTL Register Transfer Level
SA-C Single Assignment C
SCM Software Complexity Metric
SIMD Single Instruction Multiple Data
SLOC Source Lines Of Code
SoC System on Chip
SR Stepwise Regression
TCL Tool Command Language
TLM Transaction Level Modeling
TTA Transport Triggered Architectures
VHDL VHSIC Hardware Description Language (VHSIC stands for Very-High-Speed Integrated Circuit)
Vocoder Voice Coder (a Coder/Decoder for transmission of voice signals)
XDL Xilinx Design Language

Terminology

In this dissertation, several words that are ambiguous in the English language are utilized. In the following, we list the definitions that we use for some important words.

Quipu Modeling Approach: A modeling approach that generates hardware prediction models targeting the early stages of HW/SW Co-design. The aim of this approach is to capture the relation between hardware and software characteristics as a quantitative regression model, called a Quipu Prediction Model. The Quipu Modeling Approach targets hardware prediction for parts of an application, called kernels. This modeling approach is the main contribution of this thesis. For more details see Section ..

Quipu Prediction Model: A quantitative hardware prediction model generated by the Quipu Modeling Approach. A Quipu Prediction Model is generated specifically for a particular combination of a toolchain, a platform, and a hardware characteristic. For more details see Section ..

Kernel: A consecutive code segment in the context of a larger application, which performs a set of operations. A kernel can either be a function or a loop nest. In this dissertation, the terms function and kernel are used interchangeably.

Function: A kernel that is an independent unit with respect to the rest of the source code. A function can be executed as a whole by calling the function with a set of parameters. Other words for function are: (sub)routine, procedure, or method.

Error: A quantitative measure of the discrepancy between observed values and predicted values. The error can be given in the units of the measurements or as a percentage. As there are multiple ways to indicate the error, we specify, wherever possible, which error metric is used. Some error metrics in this thesis are Rooted Mean Squared Error (RMSE), Rooted Mean Squared Error as a Percentage of the mean of the independent variable (RMSE%), Mean Absolute Percentage Error (MAPE), and the percentage error.

Model: An exactly specified quantitative relation between two domains that can be measured. In most cases, it denotes a (linear) regression model quantifying the relation between software complexity metrics and hardware measures.

input variables, and explanatory variables.

Dependent variable: The variable that a particular model aims to predict. Again, many equivalent terms exist, such as: response variable, output parameter, and output variable.
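To make some of the error metrics named in the Error entry concrete, the following is a minimal Python sketch of RMSE, MAPE, and the per-observation percentage error. It is an illustration only, not the evaluation code used in this dissertation: the function names, the sign convention (predicted minus observed), and the three example values are assumptions made for this sketch. RMSE% is left out because its normalization is not spelled out here.

```python
import math


def percentage_errors(observed, predicted):
    """Per-observation percentage error; sign convention (predicted - observed) is an assumption."""
    return [100.0 * (p - o) / o for o, p in zip(observed, predicted)]


def rmse(observed, predicted):
    """Root of the mean of the squared differences, in the units of the measurements."""
    squared = [(p - o) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(observed, predicted):
    """Mean of the absolute percentage errors."""
    errors = percentage_errors(observed, predicted)
    return sum(abs(e) for e in errors) / len(errors)


# Hypothetical values chosen only to exercise the functions; not measured data.
observed_example = [100.0, 200.0, 400.0]
predicted_example = [110.0, 180.0, 400.0]

print("percentage errors:", percentage_errors(observed_example, predicted_example))
print("RMSE:", rmse(observed_example, predicted_example))
print("MAPE (%):", mape(observed_example, predicted_example))
```

RMSE is expressed in the units of the measurement, so it is only comparable between models that predict the same hardware measure, whereas MAPE is unit-free but undefined when an observed value is zero.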

CHAPTER 1

Introduction

“Computers are useless. They can only give you answers.”

— Pablo Picasso

The pervasive use of embedded systems throughout the entire range of devices in our homes, at our work, and even in our pockets has opened up a new world of possible applications that support and enrich our daily lives. This new territory has awakened an ever-growing demand for computing performance and low power consumption. Mobile users, for example, demand high performance to see, to listen, to create, to share, and to experience high-quality multimedia content without the need to charge their device every few hours. As battery technology does not scale as fast as computing technology, the challenge remains to reduce the power consumption, while still providing increased processing performance. At first glance, these two demands seem contradictory, but are they?

In the past, industry was able to address this challenge by continuing technology scaling. Every new technology node provided increased computing power and consumed less energy, compared to the previous node. As Moore’s law predicted, the number of transistors doubled every  months for many years and computing performance scaled along with it, as did the reduction of power consumption. However, in the last decade, wire delays, leakage current, and the memory bottleneck have ended this scaling contest. In addition, there are economic reasons why continued exponential growth of computing power is unrealistic, as indicated by []. While Moore’s law will still apply to transistor scaling for a few years to come, industry has had to find alternative ways to provide additional computing performance at low power budgets. It is interesting to study exactly what kind of alternatives are available and what difficulties these alternatives present. We investigate this in the following section.


Problem Overview

We established that there is an end to the “easy” scaling of computing performance and suppression of power consumption. Industry is looking for alternatives to achieve these goals. Indeed, there has been a steady increase in the use of more parallel and heterogeneous architectures. These architectures incorporate multiple and different Processing Elements (PEs), such as Graphical Processing Units (GPUs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), and Digital Signal Processors (DSPs), which can accelerate the computationally intensive parts of an application or more efficiently execute power-hungry parts. A broad spectrum of such architectures already exists in the industry today, such as the hArtes platform [], IBM’s Cell Broadband Engine [], or nVidia’s Tegra mobile platform []. By mapping computationally intensive parts on the specialized PEs in these architectures, it becomes possible to achieve significant performance improvements, while keeping power consumption in check. One reason for this is that specialized components have advantageous characteristics in certain types of computation that let them efficiently execute tasks of those types. Additionally, the parallel use of multiple PEs allows the simultaneous execution of multiple tasks and makes the performance benefits of parallel algorithms available.

An interesting development with regard to these heterogeneous architectures has been the growing popularity of reconfigurable architectures. The reconfigurable nature of these architectures provides the necessary flexibility to accommodate changing application requirements, while, at the same time, allowing for substantial application speedups, often at low power costs. This flexibility becomes even more visible when the reconfigurable fabric is integrated in a microprocessor architecture, so as to easily incorporate and execute specialized accelerators. Recent examples of such tightly coupled reconfigurable PEs are Xilinx’ Zynq [] and Altera’s Cyclone V [], which both integrate a dual-core ARM Cortex-A processor with FPGA technology, providing performance, flexibility, and low power.

These architectures provide increased potential for computing performance and reduced power consumption, while retaining a much needed level of flexibility. However, harnessing this potential poses challenges for engineers. For instance, effective use of heterogeneous architectures requires engineers to combine knowledge from both software and hardware engineering. Software engineering skills are essential to create and maintain increasingly complex applications, whereas accelerating computationally intensive parts of an application on reconfigurable fabrics requires hardware engineering skills. As engineers that have both sets of skills are scarce, companies struggle to effectively use heterogeneous and reconfigurable platforms. This problem is being addressed in several ways. Some focus on educating engineers in both disciplines to better cooperate in a co-design setting. Others move to specific co-design languages, such as SystemC [], to tackle this problem. Nevertheless, a knowledge gap remains between hardware designers and software developers. In addition, companies often have large legacy code bases in High-Level Languages (HLLs). For these reasons, there is a clear demand for comprehensive tool support to bridge the gap that engineers are facing and to retarget existing code to heterogeneous and reconfigurable platforms.

Part of that demand is being addressed by High-Level Synthesis (HLS) tools, such as Catapult C [], ROCCC¹ [], or DWARV² []. These tools automatically generate Hardware Description Language (HDL) descriptions from HLL descriptions, enabling software developers to map parts of their application to the reconfigurable fabric. In this way, there is no need for detailed hardware design expertise and the design time can be reduced significantly. Notwithstanding, developers have to address many other issues in a short time frame in order to meet the constant time-to-market pressure. Some of these issues include the identification of resource intensive parts of the application, the evaluation of different architectures and mapping options, and the estimation of project costs. These issues are more apparent in the early design stages, when a huge design space still needs to be explored.

Figure .: In the early stages of HW/SW Co-design, the developer faces many different issues. There is a clear demand for information to guide the development process in the right direction. [Figure labels: Developer, Application; GPGPU? FPGA? SIMD? DSP? ASIC? COTS?; Area, Speedup, Power, Time-to-Market, Costs; Partitioning, Profiling, Optimization, HLS, Simulation.]

In these early stages, there is a need to quickly evaluate many design alternatives, so as to significantly reduce the design space in a short amount of time. This early Design Space Exploration (DSE) needs to take into account many different hardware aspects, such as performance improvement, resource constraints, power consumption, run-time reconfiguration, and communication time. On the one hand, DSE requires large quantities of information about these and other hardware characteristics, but, on the other hand, there is very little time to gather this information. As there are many design alternatives, it is not an option to spend more than a few moments in evaluating each alternative. Therefore, generating hardware for every part of the application and every possible PE is not feasible: DSE would become exceedingly time-consuming, taking up to several months. Moreover, in the early design phases, the application may still change many times and this process would have to be repeated. The crux of the matter, therefore, is: how do we obtain the required information in a timely fashion?

One way to address this lack of information is the use of high-level hardware prediction models. Instead of the time-consuming process of actually implementing different design alternatives, the necessary information to evaluate each design is provided by prediction models. It is essential for these prediction models to be fast and accurate: fast, because many alternatives need to be evaluated in a short amount of time; accurate, because the evaluation of the predictions should yield appropriate design decisions. Consequently, it becomes possible to quickly identify resource intensive parts of an application, to evaluate the effect of changes on the cost of the final design, or to select the right PEs for each application part. As a result, hours or even days can be saved per design iteration.

¹ Riverside Optimizing Compiler for Configurable Computing
² Delft Workbench Automated Reconfigurable VHDL generator

Figure .: An overview of the Quipu Modeling Approach. A library of software kernels ① provides the necessary data to generate prediction models. The C-code of the kernels is analyzed ② to obtain a set of metrics ③ (e.g. operators, loops). These kernels are also synthesized using HLS tools ④ to obtain hardware characteristics ⑤. Statistical modeling ⑥ captures the relation between these measures and generates a Quipu Prediction Model ⑦.

. The Quipu Modeling Approach

In this dissertation, we propose the Quipu Modeling Approach, which generates hardware prediction models targeting the early stages of HW/SW Co-design. The aim of this approach is to capture the relation between hardware and software characteristics as a quantitative regression model, called a Quipu Prediction Model. The Quipu Modeling Approach targets hardware prediction for parts of an application, called kernels. Kernels are consecutive code segments in the context of a larger application, which perform a set of operations. A kernel can be a function or a loop nest, but, with respect to the Quipu Modeling Approach, kernels are assumed to be functions.

An overview of the Quipu Modeling Approach is given in Figure .. The Quipu Modeling Approach utilizes a dataset of software and hardware characteristics generated from a common base of kernels, in order to empirically determine a quantitative relation between these datasets. First, the C-code of each kernel in the library is analyzed to obtain a set of metrics that characterize the complexity of the software. In addition, the hardware of each kernel is generated using an HLS tool. By using statistical modeling techniques, the relation between the datasets of software and hardware characteristics is determined. This relation is then expressed as a Quipu Prediction Model. Because the Quipu Modeling Approach generates prediction models by statistical regression on datasets generated for specific situations, prediction models can be generated to target different hardware characteristics, different HLS tools, different platforms, and different application domains.

Figure .: An illustration of the utilization of a Quipu Prediction Model. An application ① that contains several kernels is considered for partitioning. The C-code of the kernels is analyzed ② to obtain a set of metrics ③ (e.g. operators, loops). These metrics are used as input to a Quipu Prediction Model ④. This model produces predictions ⑤ for certain hardware characteristics. These predictions can be used, for example, during hardware/software partitioning and mapping ⑥.

An illustration of the use of a Quipu Prediction Model is given in Figure .. In this figure, we see how the C-code of an application that contains several kernels is analyzed to obtain a set of metrics that characterize the software complexity. A Quipu Prediction Model uses these metrics as input and makes hardware predictions, which can be used, for example, during the partitioning and mapping of an application. Two questions that arise while designing such an approach are: (a) how can software characteristics be quantified and determined, and (b) how can the quantitative relation between software and hardware characteristics be determined?

• How do we measure software complexity?

The discipline of Software Measurement provides the concept of Software Complexity Metrics (SCMs) as a way to quantify software characteristics. The Quipu Modeling Approach is based on the assumption that the metrics describing the complexity of HLL software descriptions are correlated with the hardware characteristics of the designs generated from these descriptions. We provide a detailed investigation into the subject of SCMs in Chapter 3.

• How do we quantify the relation between hardware and software?

In statistics, we find a wide range of regression techniques that estimate the relation between two sets of data. In this thesis, we aim to use such techniques to estimate the relation between software and hardware characteristics. The Quipu Modeling Approach utilizes several regression techniques, such as Partial Least Squares Regression (PLSR), the Generalized Linear Model (GLM), and Artificial Neural Networks (ANNs), to achieve this. The regression techniques used in the Quipu Modeling Approach are discussed in Chapter 4.
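To make the two steps above concrete, the following minimal sketch fits an ordinary least-squares line from a single software metric to a single hardware characteristic and then uses it for prediction. It is an illustration only: the kernel library values, the choice of one metric (operator count), and the names `fit_line` and `build_model` are assumptions, and the techniques named above (PLSR, GLM, ANNs) are considerably richer than this one-variable fit.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical kernel library: (operator count, slices reported after synthesis).
KERNEL_LIBRARY = [(10, 130), (25, 280), (40, 430), (80, 830)]


def build_model(library):
    """Return a function that predicts the hardware measure from the metric."""
    metrics = [metric for metric, _ in library]
    measures = [measure for _, measure in library]
    intercept, slope = fit_line(metrics, measures)
    return lambda metric: intercept + slope * metric


model = build_model(KERNEL_LIBRARY)
print(model(50))  # predicted slices for an unseen kernel with 50 operators
```

Once the model is fitted, a prediction is a single multiply-add, which is why this kind of model can be evaluated for many design alternatives without running synthesis.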

. Research Challenges

The problem landscape of hardware estimation for reconfigurable platforms has many facets. In this thesis, not all aspects of this research domain can be addressed. For that reason, we focus on the following challenges, which have been highlighted earlier in this chapter.

Challenge 1: How can the many design alternatives in the huge design space for heterogeneous reconfigurable platforms be evaluated in a sufficiently short period of time?

The huge design space inherent to heterogeneous reconfigurable architectures during the early design stages poses real challenges to application developers, who need to meet time-to-market and budget constraints. To this purpose, it is essential to reduce the design space as early as possible by pruning away any infeasible design alternatives. In addition to reducing the design space, it is also important to evaluate the design alternatives in a short period of time, especially in the highly iterative early design stages, when the application design can change often. Early hardware resource estimation addresses both these issues. First, it helps to reduce the design space, as functions that do not meet the resource constraints can be omitted from further evaluation. Second, it can provide the hardware characteristics needed for the evaluation of design alternatives without the need for time-consuming hardware generation and synthesis. The Quipu Modeling Approach aims to provide such fast estimation models. Because these Quipu Prediction Models are simple equations generated using linear regression, the time required to perform hardware prediction is in the order of a few milliseconds per prediction. Additionally, the use of SCMs allows predictions to be made from HLL source code without the need for further hardware generation.

The SCMs are discussed in Chapter 3 and the statistical modeling techniques used by the Quipu Modeling Approach are discussed in Chapter 4.
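The pruning step described in this challenge can be sketched as follows. The sketch assumes a linear area model with made-up coefficients and made-up metric names; none of these values come from an actual Quipu Prediction Model. It only shows why evaluating such a model is cheap enough to filter kernels before any hardware is generated.

```python
# Hypothetical coefficients of a linear area model (slices per metric unit).
COEFFS = {"operators": 4.0, "loops": 25.0, "memory_accesses": 6.0}
INTERCEPT = 50.0


def predict_area(metrics):
    """Evaluate the linear model: intercept + sum(coefficient * metric value)."""
    return INTERCEPT + sum(COEFFS[name] * value for name, value in metrics.items())


def prune(kernels, area_budget):
    """Keep only the kernels whose predicted area fits within the budget."""
    predictions = {name: predict_area(m) for name, m in kernels.items()}
    return {name: area for name, area in predictions.items() if area <= area_budget}


# Hypothetical kernels described by their software metrics.
kernels = {
    "fir": {"operators": 30, "loops": 2, "memory_accesses": 10},
    "dct": {"operators": 200, "loops": 4, "memory_accesses": 64},
}

feasible = prune(kernels, area_budget=1000)
print(feasible)  # kernels that remain candidates for hardware mapping
```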

Challenge 2: How can hardware characteristics be transparently predicted for an arbitrary set of HLS tools, while retaining the comparability of these predictions?

Since the introduction of HLS ten years ago, a broad spectrum of HLL-to-HDL tools has become available on the market. As it is important for companies to quickly reduce the design space, fast estimation approaches for each of those tools, as well as for different target platforms, are essential. However, not all tools and platforms offer viable estimation models. Moreover, the prediction models that do exist offer different levels of granularity, accuracy, and restriction. Therefore, it is essential to provide a way of generating prediction models for different HLS tools and target platforms that exhibit the same level of granularity, that have comparable error characteristics, and that assume the same restrictions. On the one hand, such an approach can generate models for tools and platforms for which no estimation approach exists. On the other hand, it makes predictions for different HLS tools and target platforms more comparable. One of the main features of the Quipu Modeling Approach is its ability to generate models for different HLS tools and heterogeneous platforms. Regardless of the tool or platform, Quipu Prediction Models are generated with the same level of granularity, with comparable levels of accuracy, and, where necessary, with the same restrictions.

The methodology for generating new models using the Quipu Modeling Approach is presented in Chapter 5. The evaluation of the accuracy of Quipu Prediction Models can be found in Chapter 6.

Challenge 3: How can we validate the accuracy of a hardware estimation approach in a realistic, representative, and quantitative manner?

In our evaluation of hardware estimation approaches, it has become apparent that there is a consistent lack of validation material. Almost all authors validate their approach using fewer than  validation points, which raises serious questions about the applicability of these approaches in other contexts. First, it is important to validate an estimation approach with a substantial set of kernels, so as to provide the necessary confidence about the reported accuracy and to prevent over-optimistic results. Second, the validation of hardware estimation in a certain context requires the use of a collection of realistic functions that are representative of all the functions in that particular context. At the basis of the Quipu Modeling Approach is a kernel library with  kernels that are used for validation purposes. These kernels come from  different real applications from a wide range of application domains. Furthermore, the statistical modeling performed by the Quipu Modeling Approach makes it possible to make statements about the confidence for each prediction.

The kernel library utilized by the Quipu Modeling Approach is presented in Chapter 5. The evaluation of the accuracy of Quipu Prediction Models can be found in Chapter 6.

. Dissertation Contributions

The focus of this dissertation is on early high-level quantitative prediction modeling for HW/SW Co-design. In this area of research, we have made the following contributions.

Contribution 1: A High-Level Quantitative Modeling Approach for early HW/SW Co-design based on Statistical Methods that effectively captures the correlation between hardware and software.

We present the Quipu Modeling Approach, which provides estimation models targeting the early stages of HW/SW Co-design. These estimation models are based on metrics determined from HLL descriptions that characterize the complexity of the software at hand. A set of regression techniques is utilized to describe how these metrics are related to the different hardware measures. Among those techniques are least squares regression, neural networks, and principal component regression. In this context, we show how Quipu Prediction Models achieve prediction speeds of up to . predictions per second, while retaining a high degree of accuracy.

Contribution 2: A set of Quipu Prediction Models for various HLS tools, various target platforms, various hardware measures, and various application domains, demonstrating the generality of the Quipu Modeling Approach.

The Quipu Modeling Approach can be applied to different HLS tools, different target platforms, different hardware measures, and different application domains. We demonstrate how Quipu Prediction Models can be generated in a new context by providing a comprehensive example run of the Quipu Modeling Methodology. Furthermore, we provide the results for a set of fully operational Quipu Prediction Models for four separate combinations of an HLS tool and a target platform, four different application domains, and twelve different hardware measures.

Contribution 3: Two case studies, where the Quipu Modeling Approach is utilized to analyze and to partition an application.

In order to evaluate the practical use of the Quipu Prediction Models, we present two case studies in the analysis and partitioning of an application. The first application is a well-known edge-detection algorithm from the domain of image processing. The second application is an advanced voice codec featuring good voice quality even at extremely low bit rates. The Quipu Prediction Models are used to evaluate the area constraints of the target platform during partitioning. Furthermore, we show how the area predictions made by our Quipu Prediction Models, together with execution time profiles, can be used to perform a preliminary DSE.
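As a rough illustration of how area predictions and execution time profiles can be combined, the sketch below applies a simple greedy heuristic: kernels with the highest profiled time share per predicted slice are moved to hardware first, as long as the area budget allows. The heuristic, the kernel names, and all numbers are assumptions made for this example; the case studies themselves may use a different partitioning procedure.

```python
def partition(kernels, area_budget):
    """Greedy HW/SW partitioning by profiled time share per predicted area."""
    ranked = sorted(
        kernels,
        key=lambda k: k["time_share"] / k["predicted_area"],
        reverse=True,
    )
    hardware, used_area = [], 0
    for kernel in ranked:
        if used_area + kernel["predicted_area"] <= area_budget:
            hardware.append(kernel["name"])
            used_area += kernel["predicted_area"]
    software = [k["name"] for k in kernels if k["name"] not in hardware]
    return hardware, software, used_area


# Hypothetical profile (fraction of run time) and predicted area (slices).
kernels = [
    {"name": "sobel_x", "time_share": 0.40, "predicted_area": 600},
    {"name": "sobel_y", "time_share": 0.35, "predicted_area": 600},
    {"name": "threshold", "time_share": 0.05, "predicted_area": 100},
    {"name": "file_io", "time_share": 0.20, "predicted_area": 2000},
]

hardware, software, used_area = partition(kernels, area_budget=1500)
print(hardware, software, used_area)
```

A greedy ratio rule is not optimal in general, but it is enough to show that a preliminary exploration needs only the two inputs named above: a profile and an area prediction per kernel.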

Contribution 4: An elaborate statistical validation of the predictive quality of the Quipu Prediction Models.

In our work, we validate the Quipu Prediction Models using a library of  application kernels from  different applications from a wide range of application domains, contrary to existing approaches. In addition, we evaluate the Quipu Prediction Models by detailed analyses of their error behavior, in contrast with almost all other approaches, which characterize their results with a single percentage error. Additionally, this dissertation argues that more elaborate investigations of predictive quality are necessary in the field of hardware estimation.
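The difference between a single percentage error and a description of error behavior can be illustrated with a small sketch. It computes the relative error per kernel and reports several summary statistics instead of only the mean. The function name `error_summary`, the choice of statistics, and the sample values are assumptions for illustration and do not reproduce the evaluation criteria used in this dissertation.

```python
from statistics import mean, median, quantiles


def error_summary(predicted, measured):
    """Summarize the distribution of relative prediction errors."""
    errors = sorted(abs(p - m) / m for p, m in zip(predicted, measured))
    q1, _, q3 = quantiles(errors, n=4)  # quartile cut points
    return {
        "mean": mean(errors),
        "median": median(errors),
        "q1": q1,
        "q3": q3,
        "max": errors[-1],
    }


# Hypothetical predictions and measurements for four kernels.
summary = error_summary(predicted=[110, 90, 100, 150], measured=[100, 100, 100, 100])
print(summary)
```

In this made-up sample the mean error is pulled up by one poorly predicted kernel, while the median shows that a typical prediction is much closer. A single percentage would hide that distinction.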

. Dissertation Organization

The remainder of this dissertation is organized in several chapters. First, we survey the field of hardware estimation for reconfigurable platforms in Chapter 2. Then, we investigate how software complexity can be measured with respect to hardware prediction in Chapter 3. After that, in Chapter 4, we discuss in detail the different modeling techniques used in the work presented in this thesis. Subsequently, we present the Quipu Modeling Approach in Chapter 5. The Quipu Modeling Approach is then evaluated in Chapters 6 and 7. Finally, we conclude this dissertation in Chapter 8.

Figure .: An outline of the different chapters, challenges, and contributions in this dissertation. Chapter 3 and Chapter 4 have the same color as they both address the theory at the basis of our approach. Chapter 6 and Chapter 7 have the same color as they both address the validation of Quipu Prediction Models. Appendix A gives additional information with regard to Chapter 5.

[Figure content: the three research challenges and four contributions listed above, linked to the following chapters.]
Chapter 2: Hardware Estimation for Reconfigurable Platforms
Chapter 3: Measuring Software Complexity for Hardware Estimation
Chapter 4: Statistical and Quantitative Prediction Modeling
Chapter 5: The Quipu Modeling Approach
Chapter 6: Validation of the Quipu Prediction Models
Chapter 7: Quipu Prediction Models in Practice
Chapter 8: Conclusions
Appendix A: The Quipu Modeling Approach: Implementation Details

A visual outline of this dissertation is given in Figure .. In this figure, the relation between the different chapters, research challenges, and contributions has been depicted. In the following, we present a brief summary of each chapter.

Chapter 2: Hardware Estimation for Reconfigurable Platforms

In Chapter 2, we investigate the problem of hardware estimation for reconfigurable platforms. The domain of heterogeneous reconfigurable architectures poses new challenges in the development of applications that effectively use the different types of PEs and adapt to changing run-time conditions. The huge design space that developers are confronted with, as they target heterogeneous PEs, demands fast and early estimation approaches that guide application development, prune the design space, and drive partitioning and mapping. We investigate these issues and identify the role of the Quipu Modeling Approach in the problem landscape. Several research challenges are identified and research assumptions are formulated.

Chapter 3: Measuring Software Complexity for Hardware Estimation

Chapter 3 presents an overview of the problem of measuring software complexity as a basis for hardware prediction. The estimation of hardware characteristics based on Linear Regression (LR) targeting HLS tools requires measurement data of the HLL descriptions at hand. The discipline of Software Measurement provides the necessary tools to acquire these measurements. Focusing on Software Complexity Metrics (SCMs) at the function and kernel level, the Quipu Modeling Approach features a wide range of relevant metrics. These metrics may be determined at different times during the compilation process to account for HLS optimizations.
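To give an impression of what a function-level metric looks like, the sketch below counts a few crude complexity indicators directly in C source text. This is a deliberately naive illustration under strong assumptions: the regular expressions, the metric names, and the example kernel are invented here, the star count cannot tell a multiplication from a pointer dereference, and metrics taken during compilation would work on the compiler's internal representation rather than on raw text.

```python
import re

# Hypothetical kernel used only for this example.
C_KERNEL = """
int dot(const int a[], const int b[], int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) {   // accumulate products
        sum += a[i] * b[i];
    }
    return sum;
}
"""


def simple_metrics(c_source):
    """Count a few crude complexity indicators in C source text."""
    # Strip block and line comments so that their words are not counted.
    code = re.sub(r"/\*.*?\*/|//[^\n]*", "", c_source, flags=re.S)
    return {
        "loops": len(re.findall(r"\b(?:for|while)\b", code)),
        "branches": len(re.findall(r"\bif\b", code)),
        # Indexed accesses such as a[i]; empty brackets in declarations are skipped.
        "array_accesses": len(re.findall(r"\w\[\s*\w", code)),
        # Every '*' is counted, so pointer-free code is assumed.
        "multiplications": len(re.findall(r"\*", code)),
    }


metrics = simple_metrics(C_KERNEL)
print(metrics)
```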

Chapter 4: Statistical and Quantitative Prediction Modeling

At the basis of the modeling approach presented in this thesis are several statistical modeling techniques. Chapter 4 presents the necessary theory to understand these techniques. A definition of a statistical model is presented and an explanation of simple linear regression is given. Such regression models are based on a set of assumptions, which do not hold true in all cases. For the cases where they do not hold, a set of specialized regression techniques is utilized, such as the Box-Cox power transform, Artificial Neural Networks (ANNs), and Partial Least Squares Regression (PLSR). As it is important to evaluate the generated models in a comparable and robust manner, the evaluation criteria used in this dissertation are discussed.
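As a brief illustration of one of the techniques named above, the sketch below implements the standard one-parameter Box-Cox power transform, which maps a positive response y to (y^λ - 1)/λ for λ ≠ 0 and to ln(y) for λ = 0, together with its inverse. The function names are chosen for this example, and how λ is selected in the Quipu Modeling Approach is not shown here.

```python
import math


def box_cox(y, lam):
    """Box-Cox power transform of a positive value y with parameter lam."""
    if y <= 0:
        raise ValueError("the Box-Cox transform requires y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam


def inverse_box_cox(z, lam):
    """Map a transformed value back to the original scale."""
    if lam == 0:
        return math.exp(z)
    return (lam * z + 1) ** (1 / lam)


# With lam = 0.5 the transform compresses large values, e.g. large area counts.
z = box_cox(9.0, 0.5)
print(z, inverse_box_cox(z, 0.5))
```

A regression can then be fitted on the transformed response, and predictions are mapped back to the original scale with the inverse.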

Chapter 5: The Quipu Modeling Approach

The theory about measuring software complexity and statistical quantitative modeling, discussed in Chapters 3 and 4, has been implemented in the Quipu Modeling Approach. In Chapter 5, an overview of this modeling approach is presented. The Quipu Modeling Approach consists of several essential components, which are described in detail. As an example of the generality of the Quipu Modeling Approach, the chapter contains a comprehensive description of the generation of a Quipu Prediction Model for a new tool and a new platform.

Chapter 6: Validation of the Quipu Prediction Models

Chapter 6 contains the detailed evaluation of the Quipu Prediction Models that are generated by the Quipu Modeling Approach presented in Chapter 5. This approach generates hardware prediction models for heterogeneous platforms targeting early DSE. In this chapter, we evaluate the various qualities of the generated prediction models. We investigate the accuracy, the generality, the specificity, and the speed of Quipu Prediction Models. To this purpose, models for different hardware measures, different HLS tools, and different application domains are presented. Additionally, we compare the Quipu Modeling Approach against other prominent approaches in the domain of Hardware Estimation. Finally, we investigate the evolution of the Quipu Modeling Approach with respect to the quality of the models.


Chapter 7: Quipu Prediction Models in Practice

The Quipu Prediction Models can be used effectively in real scenarios. In Chapter 7, two scenarios, where Quipu Prediction Models play an important role, are described: a well-known edge detector and a high-grade voice codec. These applications are analyzed using profiling information and hardware estimates provided by Quipu Prediction Models. The applications are partitioned on a heterogeneous platform utilizing the profiling information and area estimates obtained during application analysis. Several benefits of Quipu Prediction Models are discerned.

Chapter 8: Conclusions

In Chapter 8, a summary of the work in this dissertation is presented. Several conclusions with respect to the contributions anticipated in the introduction are drawn. Subsequently, the chapter lists several open issues and opportunities for future research.

It is our sincere hope that the reader will find this dissertation an interesting contribution in the fields of hardware estimation and reconfigurable computing. Although the work that we present makes heavy use of statistics — a subject many readers might not be too familiar with — it is our belief that the presentation of the necessary theory and background will help the interested reader to grasp the contents of our work.

CHAPTER 2

Hardware Estimation for Reconfigurable Platforms

In this chapter, we investigate the problem of hardware estimation for reconfigurable platforms. The domain of heterogeneous reconfigurable architectures poses new challenges in the development of applications that effectively use the different types of processing elements and adapt to changing run-time conditions. The huge design space that developers are confronted with as they target heterogeneous processing elements demands fast and early estimation approaches that guide application development, prune the design space, and drive partitioning and mapping. We investigate these issues and identify the role of our Quipu Modeling Approach in the problem landscape.

The rapid advancement of software development in the last few decades was enabled to a great extent by the Von-Neumann Machine (or Stored-program machine) concept []. This machine concept introduces a clear separation between the program and data storage on the one hand, and the program execution on the other. As can be seen in Figure ., the Central Processing Unit (CPU) executes a program that consists of instructions by fetching them from memory. For each instruction, the necessary operands are also fetched from memory before its execution. Afterwards, the results of the instruction are written back to the memory. Inherently, this process results in a sequential flow of data units and control directives between the memory and the processing unit.
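The fetch, execute, and write-back cycle described above can be sketched with a toy accumulator machine that counts every memory access. The instruction set (`LOAD`, `ADD`, `STORE`, `HALT`) and the separate `program` and `data` lists are simplifying assumptions for this example; they do not model any particular processor.

```python
def run(program, data):
    """Execute a tiny accumulator program and count memory accesses."""
    accumulator, program_counter, accesses = 0, 0, 0
    while True:
        opcode, address = program[program_counter]  # instruction fetch
        accesses += 1
        program_counter += 1
        if opcode == "LOAD":
            accumulator = data[address]  # operand fetch
            accesses += 1
        elif opcode == "ADD":
            accumulator += data[address]  # operand fetch
            accesses += 1
        elif opcode == "STORE":
            data[address] = accumulator  # result written back
            accesses += 1
        elif opcode == "HALT":
            return data, accesses
        else:
            raise ValueError(f"unknown opcode: {opcode}")


# Compute data[2] = data[0] + data[1].
program = [("LOAD", 0), ("ADD", 1), ("STORE", 2), ("HALT", None)]
memory, accesses = run(program, [2, 3, 0])
print(memory, accesses)
```

Even this single addition needs one memory access per instruction plus one per operand or result, all of them strictly in sequence.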

The Stored-program machine abstraction provided a flexible model for programming computers, because the computer program was not hardwired and different programs could be executed on the same machine by providing punch cards or other media. In addition, the limited set of instructions provided a simple programming interface, greatly reducing the time to develop a new program. In this way, it enabled the rapid advancement of software development in the following decades.

The Von-Neumann machine has been very successful, due in no small part to the broad tool and hardware support for the paradigm throughout the software development process. Furthermore, the miniaturization of electronics has provided regular speed improvements (Moore's Law) [], satisfying the ever-increasing need for more computation power. However, as Backus [] pointed out, the concept showed an inherent bottleneck, which he called the Von-Neumann bottleneck. Because the processing unit and the memory in a Von-Neumann machine are separate, instructions and data have to be moved continually. The sequential nature of this architecture limits the speed one can achieve by exploiting more parallelism, because with increasing levels of parallelism, the need for additional memory bandwidth also increases. However, as the limits of miniaturization become clear, exploiting parallelism is exactly what is needed. Systems are already using multiple processors and multiple cores per processor to provide more computing power, and the memory bottleneck is becoming harder to solve. If we hope to adequately address this problem, we need to look beyond Von Neumann's concept of a machine to other models of computation.

Figure .: An overview of the basic Von-Neumann architecture, with control and data flowing from the memory to the CPU and vice-versa. The communication caused by this setup can limit the speed at which the CPU can operate. [Figure labels: CPU (Control Unit, Arithmetic Logic Unit (ALU)); Bus; Memory (Program Instructions, Data).]

. Heterogeneous Reconﬁgurable Architectures

In recent years, the continuing applicability of Moore's Law has come into question. First, wire delays become an increasing problem at higher speeds, and second, the manufacturing of transistors smaller than a few atoms seems unlikely. Furthermore, a growing demand for mobile technology and other systems with limited power supplies has made the use of fast Von-Neumann processors in such systems difficult if not impossible. To cope with this problem, manufacturers increasingly use specialized accelerators to speed up expensive algorithms in applications, such as media encoding and signal processing. These accelerators provide additional processing performance and power efficiency for specific types of tasks. In addition, specialized memory hierarchies that provide the necessary information to each heterogeneous Processing Element (PE) reduce the impact of the memory bottleneck inherent to Von-Neumann architectures. Both of these effects are important for mobile and embedded systems. Notwithstanding, custom hardware can also help circumvent the need for difficult and expensive technology scaling, by providing increased performance compared to Von-Neumann architectures.

Figure .: Flexibility versus performance for different computing paradigms. Heterogeneous computing makes use of different types of computing elements that support any of these paradigms. As such, it overlaps with the other paradigms. [Figure axes: Performance, Flexibility. Regions: Von-Neumann, Reconfigurable, ASICs, Heterogeneous.]

Machines that break Von-Neumann's model have been in use for some time. For example, Application Specific Integrated Circuits (ASICs) are able to use the parallelism inherent to the problem at hand and to combine processing and storage into their data-path, effectively eliminating the constant flow of directives and data. Special languages, tools, and design methodologies have been developed to make the implementation of ASICs possible. Despite the additional performance they provide, ASICs do not offer the programmability and flexibility of the Stored-program machine. In addition, due to their high development costs, the utilization of ASICs has been limited to high volume production.

.. Types of Processing Elements

Apart from ASICs, heterogeneous computing platforms can contain many other types of PEs. Some of the more prominent are listed in the following.