Transistor-Level Statistical Timing Analysis: Solving Random Differential Equations Directly

DISSERTATION

for the degree of doctor

at Delft University of Technology (Technische Universiteit Delft),

by authority of the Rector Magnificus Prof. ir. K.C.A.M. Luyben,

chairman of the Board for Doctorates,

to be defended in public on Tuesday 2 April 2013 at 12:30

by

Qin Tang

Master of Science in Microelectronics and Solid State Electronics

Southeast University, Nanjing, China

Promotor: Prof. dr. ir. Edoardo Charbon

Copromotor: Dr. ir. Nick van der Meijs

Composition of the doctoral committee:

Rector Magnificus, chairman
Prof. dr. ir. Edoardo Charbon, Technische Universiteit Delft, promotor
Dr. ir. Nick van der Meijs, Technische Universiteit Delft, copromotor
Prof. dr. David T. Blaauw, University of Michigan Engineering
Prof. dr. Wil Schilders, Technische Universiteit Eindhoven
Prof. dr. John Long, Technische Universiteit Delft
Dr. Davide Pandini, STMicroelectronics
Dr. ir. Michel Berkelaar, Technische Universiteit Delft
Prof. dr. ir. Alle-Jan van der Veen, Technische Universiteit Delft, reserve member

Copyright © 2013 by Qin Tang

All rights reserved. No part of the material protected by this copyright notice may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording or by any information storage and retrieval system, without the prior permission of the author.

ISBN 978-94-6186-134-4

To my parents and sister
To Dajie and Kaesy
In memory of my grandparents

Acknowledgements

Good advice is beyond all price.

Over the years I have been so lucky to encounter many people who have given me their precious time, professional suggestions, personal help and companionship. I could not have come this far without their generous encouragement and support.

First of all, I would like to thank all committee members for their efforts in reviewing my thesis and for their valuable comments and suggestions, and my promotor, Prof. Edoardo Charbon, for his broad technical knowledge, encouragement and conscientious review of the manuscript of this thesis.

I also would like to express my gratitude to my supervisor, Nick van der Meijs, for the chance to be a Ph.D. student in this group. During my Ph.D. study, he has given me patient instructions and scientific discussions on my research. “You are your own boss.”, he told me in the beginning, and gradually I realize how much I’ve got from his instruction: I am gradually becoming an independent researcher. As a Ph.D. student, I have learned not only the theoretical and technical knowledge and insight of my research field, but also the problem finding and solving abilities, cooperation and communication skills, and student teaching and guiding techniques. I am also grateful for his attitude in life, which helped me go through my stressful and depressed period in 2010.

I would also like to thank my project leader, Michel Berkelaar. Although he works only two days a week at Delft University of Technology, he always gives priority to my research and is involved in it on a week-to-week basis. During the weekly discussions, he provides answers or suggestions for my questions and confusions, making the next steps clearer. He also efficiently reviews all my papers and thesis drafts and quickly gives me feedback to polish them. When I have problems in my life, he never hesitates to give valuable advice and help.

I am also thankful to my colleague Amir Zjajo, with whom I shared an office for almost four years. Discussions with him improved my research abilities and inspired our cooperation on some research topics. During my Ph.D. study, my scientific writing skills have improved a lot, for which I need to thank not only Nick van der Meijs and Michel Berkelaar, but also Amir Zjajo.

I also would like to express my respect to Prof. Alle-Jan van der Veen for interviewing me and giving me the chance to study in this big family, Circuits and Systems Group. My gratitude also goes to Minaksie Ramsoekh, Rosario Salazar Lozano and Laura Bruns for their secretarial support and personal help on child-care, and Antoon Frehe for his eﬃcient and eﬀective IT support and patient explanations.

I am also grateful to and proud of the following master students: Ashish Nigam, Javier Rodriguez Rodriguez de Guzman and Xinyue Zheng, who worked on my project and helped me to verify some ideas. They all worked hard and had cheerful achievements. I am also thankful for all other members in this group, Alexander, Chockalingam, Prof. Geert Leus, Gerard, Hadi, Matthew, Mohammad, Wim, Mu, Rene van Leuken, Rob, Shingo, Simon, Summet, Seyran, Sundeep, Tao, Venkataraman, Venkat, Yu, Yuki, YiYin, ZiJian, etc. All the people in this group create a pleasant research atmosphere, which I feel lucky to work in.

A life without a friend is death. Thank you, all my Chinese friends here, for giving me a special and memorable wedding in the Netherlands, for so much fun, encouragement and support, and for giving my daughter the love of uncles and aunts. I also need to thank all the parents in TU Vista from different places in the world; the shared parenting experience and the laughter when our kids play together are always in my mind.

Last but not least, I want to express my deepest gratitude to my parents and my sister Wen Tang for their endless love, trust, encouragement and support, which I can never repay, and to my parents-in-law for their support. I also want to give my special thanks to my husband Dajie Liu, with whom I am blessed to stay together in love till the end. To my beloved angel, my daughter Kaesy Liu, thank you for bringing so many joyful moments and happiness to our life. Mama loves you forever.

Qin Tang

Contents

Acknowledgements
List of Acronyms
List of Symbols

1 Introduction
1.1 Timing Analysis
1.1.1 Gate Timing Models
1.2 Statistical Timing Analysis
1.2.1 PVT Variations
1.2.2 Statistical Timing Analysis Methodologies
1.2.3 Statistical Gate Timing Models
1.3 Motivations and Contributions
1.4 Thesis Organization

2 Background
2.1 Timing Analysis Categorizations
2.1.1 Deterministic Timing Analysis
2.1.2 Statistical Static Timing Analysis
2.2 Transistor-level Timing Analysis
2.2.1 Efficiency Improvement
2.2.2 Challenges of Transistor-Level Statistical Timing Analysis
2.3 Transient Analysis
2.3.1 Transient Analysis Flow
2.3.2 Numerical Integration Methods
2.3.3 Nonlinear Algebraic Equations
2.3.4 Linear Algebraic Equations
2.4 Random Differential Equations
2.4.1 Random Variables and Stochastic Processes
2.4.2 Random Differential Equations
2.5 Conclusion

3 Simplified Transistor Model Based Timing Analysis
3.1 Introduction
3.2 The Proposed Transistor-Level Gate Models
3.2.1 LUT-Based Simplified Transistor Model (STM)
3.2.2 Simplified Transistor Model with Statistical Extension
3.2.3 STM Characterization and Gate Model Construction
3.3 Waveform Evaluation Methodology
3.3.1 Initialization
3.3.2 The Main Loop
3.4 Experimental Setup and Results
3.4.1 Experimental Setup
3.4.2 Experimental Results

4 RESTA: RDE-Based Statistical Timing Analysis
4.1 Introduction
4.2 Process Variation Classification and Modeling
4.2.1 Process Variation Classification
4.2.2 Process Variation Modeling
4.3 Statistical Timing Analysis Development
4.4 RDE-Based Statistical Timing Analysis
4.4.1 RDE-Based Statistical Solver
4.4.2 Analysis Flow
4.4.3 Correlations of Variational Waveforms
4.4.4 Statistical Delay Calculation
4.4.5 Complexity Analysis
4.5 Piecewise Linear RDE-based Statistical Timing Analysis
4.5.1 Theoretical Basis
4.5.2 Piecewise Linear RDE-based Statistical Solver
4.6 Experimental Results
4.6.1 Statistical Delay Calculation Considering MISS
4.6.2 Statistical Delay Calculation for Sequential Circuits
4.6.3 Statistical Timing Analysis
4.6.4 Runtime

5 Statistical Interconnect Delay Calculation Considering Crosstalk Effects
5.1 Crosstalk Effects on Interconnect Delay
5.2 Crosstalk Noise Filtering and Consideration
5.3 Problem Description
5.4 SK-Induced Interconnect Delay
5.4.1 Piecewise Linear Delay Change Curve Model (PLDM)
5.4.2 PLDM-based Interconnect Delay Calculation
5.4.3 Complexity Analysis
5.5 PV-Induced Interconnect Delay
5.6 Experimental Results
5.6.1 Experimental Setup
5.6.2 SK-induced Statistical Interconnect Delay Calculation
5.6.3 PV-induced Statistical Interconnect Delay Calculation
5.6.4 Combined SK- and PV-induced Statistical Interconnect Delay Calculation

6 Conclusions and Future Work
6.1 Conclusions
6.2 Possible Extensions and Future Work

A 3D Triangulation-Based Interpolation
B STM Accuracy Analysis
C Interconnect Delay Calculation Data

Bibliography
Summary
Samenvatting

List of Acronyms

IC Integrated Circuit
EDA Electronic Design Automation
CLB Combinational Logic Block
DTA Deterministic Timing Analysis
GTM Gate Timing Model
DSM Deep Sub-Micron
MISS Multiple Input Simultaneous Switching
NLDM Non-Linear Delay Model
LUT Look-Up Table
CCS Composite Current Source
SIS Single Input Switching
σTA Statistical Timing Analysis
ViVo Voltage-in, Voltage-out
SGTM Statistical Gate Timing Model
PVT Process, supply Voltage and Temperature
PCA Principal Component Analysis
ICA Independent Component Analysis
RRR Reduced Rank Regression
MC Monte Carlo
CALHS Criticality Aware Latin Hypercube Sampling
SH-QMC Stratification+Hybrid Quasi-MC
SSTA Statistical Static Timing Analysis
RESTA Random Differential Equation based Statistical Timing Analysis
STM Simplified Transistor Model

RDE Random Differential Equation
PWL Piece-Wise Linear
STA Static Timing Analysis
CCC Channel-Connected Component
MNA Modified Nodal Analysis
ODE Ordinary Differential Equation
DAE Differential Algebraic Equation
NR Newton-Raphson
DC Direct Current
FE Forward Euler
BE Backward Euler
BDF2 2nd-order Backward Differentiation Formula
TR Trapezoidal Rule
LTE Local Truncation Error
PLTE Principal Local Truncation Error
LAE Linear Algebraic Equation
NBTI Negative Bias Temperature Instability
pdf Probability Density Function
TB Triangularization-Based
2D Two-Dimensional
KCL Kirchhoff's Current Law
KVL Kirchhoff's Voltage Law
SC Simplified Chord
CMI Compiled Model Interface
CMP Chemical Mechanical Polishing
VMR Variation to Mean Ratio
cdf Cumulative Distribution Function
R Resistance
PV Process Variation
SK Input Skew
DCC Delay Change Curve
PLDM Piecewise Linear DCC Model

CSM Current Source Model

MOS Metal Oxide Semiconductor

SPECS Simulation Program for Electronic Circuits and Systems

ACES Adaptively Controlled Explicit Simulation


List of Symbols

ξ process variation vector
ρ correlation coefficient
µ mean value
σ standard deviation value
ς spacing between two adjacent wires
$\Gamma_{\xi\xi}$ covariance matrix of process variations
$\Gamma_{xx}$ covariance matrix of state variables
$C_{eff}$ effective capacitance
h time step
$i_{ds}$ statistical transistor drain-source current
$J_{NR}$ Jacobian matrix obtained by the Newton-Raphson method
$J_{SC}$ Jacobian matrix obtained by the Simplified Chord method
$L_{eff}$ effective channel length
$L_{nom}$ nominal value of the effective channel length $L_{eff}$
M the number of sub-spaces
N the number of input skew samples for PLDM characterization
$N_n$ the number of nodes
$N_{transistor}$ the number of transistors
$N_{cp}$ the number of critical paths
$N_v$ the number of voltages in a circuit equation
$N_p$ the number of process variations
p process parameter value
$p_s$ nominal value of the process parameter
$p_\xi$ random process parameter with process variations
x state variable vector
$x_s$ nominal value of x
$\dot{x}$ time derivative of x
$x_0$ initial value of x at the starting time point of simulations
$R_{driver}$ output resistance of a driver
s input skew value
$S_{in}$ input slew
$SK_e$ the earliest SK distribution
$SK_l$ the latest SK distribution
$t_k$ wire thickness
$t_n$ the n-th time point
$T_r$ input transition time
$TA_a$ aggressor input arrival time distribution
$TA_v$ victim input arrival time distribution
v(t) internal and output voltage vector
$V_{DD}$ supply voltage
$V_{th}$ threshold voltage
w wire width

Chapter 1: Introduction

1.1 Timing Analysis

As CMOS technology continues to shrink minimum feature sizes, and consequently to increase the number of transistors on a chip as Moore's law [1] predicts, modern digital Integrated Circuits (ICs) have become highly complicated, containing billions of devices. Technology scaling also brings fabrication challenges, since many process parameter values, such as transistor length and doping concentrations, are difficult to control exactly. As a consequence, designers tend to rely on Electronic Design Automation (EDA) tools to create, verify and optimize their designs, to gain productivity, and to keep up with industry development.

A digital IC design must operate safely at the specified clock frequency without any timing violations, which is typically checked by timing analysis tools [2]. As shown in Fig. 1.1, a digital IC typically consists of storage elements, such as latches or flip-flops, and Combinational Logic Blocks (CLBs). Both the maximum and minimum delays of CLBs must meet certain timing constraints, since the maximum delay (worst-case delay) limits the operating frequency of the whole circuit while the minimum delay (best-case delay) may result in a functional failure. Therefore, during timing analysis, the CLBs are typically separated from storage elements, and the timing analysis of the CLBs is performed independently. In order to verify the timing of CLBs, timing analysis tools are widely used by digital IC designers. Conventionally, during each timing analysis run, all process parameters, temperature and supply voltage are assumed to have fixed values. In this thesis, we call this Deterministic Timing Analysis (DTA).

Figure 1.1: A typical digital IC, consisting of storage elements and Combinational Logic Blocks (CLBs).

As shown in Fig. 1.2, a CLB consists of a large number of combinational gates and interconnects. During DTA, gate delay calculation and interconnect delay calculation are typically performed separately, using corresponding simplified Gate Timing Models (GTMs) and interconnect models, as illustrated in Fig. 1.2. Based on the gate and interconnect delays, the earliest and latest arrival times of each node are calculated and propagated through the timing paths. The interconnect models are mainly linear reduced RC systems, and the interconnect delays are calculated by using moment-matching techniques, such as asymptotic waveform evaluation, without considering crosstalk effects¹ [3]. In contrast, logic gates exhibit highly nonlinear behavior, and this behavior should be captured in GTMs for DTA methods.

Therefore, the GTMs are crucial for the accuracy-efficiency trade-off of DTA methods. With technology shrinking and the advent of more full-custom design practices, previously ignored circuit effects, such as resistive interconnect load, Multiple Input Simultaneous Switching (MISS) and arbitrary input signal waveforms, become more critical than ever. How to account for these circuit effects in timing analysis is still challenging for Deep Sub-Micron (DSM) and nanometer technologies. Consequently, more accurate GTMs and more advanced timing analysis approaches are required, especially for high-performance designs [4–13].
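The propagation of earliest and latest arrival times described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation used in any particular tool: the edge format, the function name `propagate_arrival_times` and the delay values are hypothetical, the timing graph is assumed to be acyclic, and gate and interconnect delays are assumed to be already available as fixed (minimum, maximum) pairs.

```python
from collections import defaultdict, deque


def propagate_arrival_times(edges):
    """Propagate earliest/latest arrival times through a timing graph.

    edges: iterable of (src, dst, min_delay, max_delay) tuples (hypothetical
    format). The graph is assumed to be acyclic, as for a CLB. Source nodes
    (nodes without incoming edges) are assumed to arrive at time 0.
    """
    fanout = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for src, dst, dmin, dmax in edges:
        fanout[src].append((dst, dmin, dmax))
        indegree[dst] += 1
        nodes.update((src, dst))

    sources = [n for n in nodes if indegree[n] == 0]
    earliest = {n: 0.0 for n in sources}
    latest = {n: 0.0 for n in sources}

    # Visit nodes in topological order so that every fan-in is final
    # before a node's own arrival times are propagated further.
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for dst, dmin, dmax in fanout[node]:
            e = earliest[node] + dmin
            l = latest[node] + dmax
            earliest[dst] = min(earliest.get(dst, e), e)
            latest[dst] = max(latest.get(dst, l), l)
            indegree[dst] -= 1
            if indegree[dst] == 0:
                queue.append(dst)
    return earliest, latest


# Hypothetical four-edge example (arbitrary time units).
edges = [
    ("a", "n1", 1.0, 2.0),
    ("b", "n1", 2.0, 3.0),
    ("n1", "out", 1.0, 1.0),
    ("b", "out", 5.0, 6.0),
]
earliest, latest = propagate_arrival_times(edges)
```

The earliest arrival time takes the minimum over all fan-in paths and the latest takes the maximum, which corresponds to the best-case and worst-case delays discussed earlier.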

Figure 1.2: An example of the timing analysis procedure of a CLB.

¹ Due to parasitic coupling capacitances between wires, the switching characteristics of a net are affected by simultaneous switching of other wires that are in close physical proximity. This effect is called the crosstalk effect.

1.1.1 Gate Timing Models

As the basis of DTA, GTMs have been developed for accurate timing verification. A long-time industry standard is the Non-Linear Delay Model (NLDM) [14], which represents gate delay and output slew in 2D Look-Up Tables (LUTs) as a function of input slew ($S_{in}$) and effective load capacitance ($C_{eff}$), as shown in Fig. 1.3a. Since delay calculation using NLDM only involves linear interpolation from LUTs based on $S_{in}$ and $C_{eff}$, NLDM-based DTA can be performed very fast. However, this simplistic approach suffers from timing inaccuracy due to the following main intrinsic accuracy limitations:

LIM1 Incomplete Consideration of Physical Effects.

Physical effects, such as Miller effects, high interconnect resistance and noise propagation, are not considered, which results in inaccuracies in DTA [15].

LIM2 Incompatibility with Arbitrary Waveform Shapes.

Since only the 50% crossing time and input slew are used for delay calculation², a simple saturated ramp assumption is implied for the waveform modeling. However, the signal shape deviates from the saturated ramp for many reasons, such as crosstalk noise, large parasitic interconnect resistance and MISS [16]. Consequently, a simple ramp can no longer represent the input signals for accurate timing analysis.

LIM3 Simple Capacitive Load Modeling.

Ignoring the parasitic resistance and coupling effects of interconnects in gate delay calculation can lead to large errors, risking the fidelity of timing verification [7, 17]. Additionally, NLDM fails to work with a multi-port coupled interconnect load, since the load is modeled only as a single capacitance $C_{eff}$.

LIM4 Incomplete Consideration of Electrical Eﬀects.

NLDM-based DTA is unable to capture electrical effects, such as MISS and internal charge effects [7] for high-stack and complicated cells. The Single Input Switching (SIS) assumption is applied, which considers only one input switching while setting other input voltages to fixed values. This leads to neglect of these electrical effects. Not modeling MISS for timing can result in as much as 100% error in gate delay and slew calculation [5].

Consequently, the accuracy of NLDM is not suﬃcient for timing veriﬁcation of DSM and nanometer designs.
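The NLDM look-up described at the start of this section amounts to interpolating a 2D table indexed by input slew and effective load capacitance. The following Python sketch illustrates this with bilinear interpolation. It is an assumption-laden illustration: the function name `nldm_lookup`, the table values and the units are hypothetical, and points outside the table are simply extrapolated linearly from the nearest cell.

```python
from bisect import bisect_right


def nldm_lookup(slews, loads, table, s_in, c_eff):
    """Bilinear interpolation in an NLDM-style 2D look-up table.

    slews, loads: strictly increasing index axes (input slew, load cap).
    table[i][j]: characterized delay (or output slew) at slews[i], loads[j].
    Points outside the table are extrapolated linearly (an assumption).
    """
    def bracket(axis, x):
        # Index of the lower grid point and the normalized position in the cell.
        i = bisect_right(axis, x) - 1
        i = max(0, min(i, len(axis) - 2))
        t = (x - axis[i]) / (axis[i + 1] - axis[i])
        return i, t

    i, ts = bracket(slews, s_in)
    j, tc = bracket(loads, c_eff)
    # Interpolate along the load axis first, then along the slew axis.
    low = table[i][j] + tc * (table[i][j + 1] - table[i][j])
    high = table[i + 1][j] + tc * (table[i + 1][j + 1] - table[i + 1][j])
    return low + ts * (high - low)


# Hypothetical 2x2 delay table: slews in ps, loads in fF, delays in ps.
slews = [10.0, 50.0]
loads = [1.0, 5.0]
delay_table = [[20.0, 40.0],
               [30.0, 50.0]]
delay = nldm_lookup(slews, loads, delay_table, s_in=30.0, c_eff=3.0)
```

Because the whole gate is reduced to two scalar inputs, nothing in such a look-up can represent a non-ramp waveform, a resistive load or a second switching input, which is exactly what limitations LIM2 to LIM4 describe.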

In order to include the physical effects, $S_{in}$- and $C_{eff}$-dependent Current Source Models (CSMs), such as the Composite Current Source (CCS) model from Synopsys [15], were proposed and are currently supported by many vendors. $S_{in}$- and $C_{eff}$-dependent CSMs, illustrated in Fig. 1.3b, model every gate as one or more current sources and capacitors, the values of which are stored in LUTs indexed by $S_{in}$ and $C_{eff}$. As a consequence, although some physical effects are considered, the $S_{in}$- and $C_{eff}$-dependent CSMs still have limitations LIM2 to LIM4, due to the linear waveform model for $S_{in}$ and the simple $C_{eff}$ load model, like NLDM.

Figure 1.3: The gate timing models are improved for higher accuracy. Panels: (a) NLDM (delay/slew LUT indexed by $S_{in}$ and $C_{eff}$); (b) $S_{in}$- and $C_{eff}$-dependent CSM; (c) voltage-dependent CSM; (d) multi-port CSM; transistor-level gate models.

² Input slew is defined as the time difference between two crossing times (typically the 30% and 70%, or the 10% and 90% crossing times).

To account for arbitrary waveforms and loading, voltage-based CSMs [3, 17–20] model the current and capacitance as a function of input and output voltages rather than $S_{in}$ and $C_{eff}$, as shown in Fig. 1.3c. This voltage-based representation addresses the limitations LIM1 to LIM3 and enables waveform-evaluation timing analysis, realizing up to 30% error reduction in DSM circuits [11]. However, most voltage-based CSMs are defined with only one input and one output. Hence, the limitation LIM4 cannot be handled, which may result in as much as 100% error in DTA [5].
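To make the idea of waveform evaluation with a voltage-based CSM concrete, the sketch below integrates the output node equation of a single-input, single-output gate and then measures the 50% delay and the 30%-70% output slew. Everything specific here is an assumption for illustration: the analytic `i_out` function is a toy stand-in for a characterized $I_o(V_i, V_o)$ table, the output capacitance is taken as constant, the component values are hypothetical, and forward Euler is used only for brevity.

```python
VDD = 1.0        # supply voltage in V (hypothetical)
G = 1e-4         # toy drive strength in A/V (hypothetical)
C_OUT = 1e-14    # constant output plus load capacitance in F (hypothetical)


def i_out(vi, vo):
    """Toy stand-in for the Io(Vi, Vo) table of an inverter-like gate.

    Positive values discharge the output node, negative values charge it.
    """
    return G * (vi * vo - (VDD - vi) * (VDD - vo)) / VDD


def ramp(t, t_start, t_rise):
    """Saturated rising ramp from 0 to VDD."""
    x = (t - t_start) / t_rise
    return VDD * min(1.0, max(0.0, x))


def simulate(t_end, h, t_start=1e-10, t_rise=1e-10):
    """Forward Euler integration of C_OUT * dVo/dt = -Io(Vi, Vo)."""
    times, vin, vout = [0.0], [ramp(0.0, t_start, t_rise)], [VDD]
    for k in range(1, int(round(t_end / h)) + 1):
        vo = vout[-1] - h * i_out(vin[-1], vout[-1]) / C_OUT
        t = k * h
        times.append(t)
        vin.append(ramp(t, t_start, t_rise))
        vout.append(vo)
    return times, vin, vout


def crossing_time(times, wave, level):
    """First time the waveform crosses the level (linear interpolation)."""
    for k in range(1, len(wave)):
        a, b = wave[k - 1] - level, wave[k] - level
        if a == 0.0:
            return times[k - 1]
        if a * b < 0.0:
            return times[k - 1] + (times[k] - times[k - 1]) * a / (a - b)
    return None


times, vin, vout = simulate(t_end=1e-9, h=1e-12)
delay = crossing_time(times, vout, 0.5 * VDD) - crossing_time(times, vin, 0.5 * VDD)
slew = crossing_time(times, vout, 0.3 * VDD) - crossing_time(times, vout, 0.7 * VDD)
```

Since the model is driven by the instantaneous input voltage rather than by a slew number, any input waveform can be substituted for `ramp` without recharacterizing the gate.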

The limitation LIM4 (electrical effect issue) is taken into account in multiple-port CSMs [5–8], as illustrated in Fig. 1.3d. The multiple-port CSMs model all input and output ports, and some or all internal nodes in a gate, by voltage-based CSMs. It is clear from Fig. 1.3 that the CSMs are becoming more and more complicated for higher accuracy. The gate timing models, from NLDM to the multiple-port CSMs, attempt to optimize the models to maintain acceptable accuracy for all types of gates. Nevertheless, the fact that these are black-box models, in which the internal structure of the gates is hidden, is the root of all the accuracy limitations.

All four accuracy limitations can be addressed by transistor-level gate models [9–13, 21–24], which replace every transistor in a gate by CSM-like models. Compared to multiple-port CSMs, transistor-level gate models are more general and accurate for timing, noise and power analysis. They are also practical for multi-million gate DTA runs [9–13, 21–24]. Additionally, transistor-level gate models require significantly shorter characterization time [13].

In conclusion, the demand for high accuracy has driven the development of accurate GTMs and timing analysis methods. Since most CSMs [3, 17–20] and transistor-level gate models [9–13, 21–24] have model elements dependent on Voltage-in, Voltage-out (ViVo), they are called ViVo-gate models in this thesis.

1.2 Statistical Timing Analysis

The deterministic approach, DTA, has been commonly used for many different CMOS technologies. However, as feature sizes aggressively scale down to just a few nanometers, the ratios of the Process, supply Voltage and Temperature (PVT) fluctuations to their corresponding nominal values are increasing. The increasing ratios lead to statistical variability in timing performance, which cannot be calculated by a single DTA run. As a consequence, different Statistical Gate Timing Models (SGTMs) have been developed and a number of Statistical Timing Analysis (σTA) methods have been proposed to estimate the timing variability, as shown in Fig. 1.4.

Figure 1.4: Due to the process variations, the SGTMs and the corresponding statistical timing analysis methods are required for timing verification. (The figure relates DTA and its GTM to σTA and its SGTM under PVT variations.)

1.2.1 PVT Variations

PVT variations refer to process variations, supply voltage variations and temperature variations. The process variations, defined as the deviations from the desired values of process parameters, result from a wide range of factors during the manufacturing processes. Examples are lithography-induced transistor length and width variations, and threshold voltage variations induced by discrete doping fluctuations. The supply voltage variations are mainly caused by fluctuations of the supply voltage drop in power distribution networks. A decrease in supply voltage and/or an increase in ground voltage lowers the local power supply of a gate, which reduces driving strength and thus increases gate delay. Additionally, power supply offsets between gates in one path introduce voltage offsets that impact gate delay. Temperature varies due to many factors, such as circuit structure and operating activity. An increased temperature causes performance degradation in both devices and interconnects. As technology continues to scale down, the impact of PVT variations on timing performance is increasing. If the impact of PVT variations is not taken into consideration before fabrication, the failure rate of chips may be quite high, leading to excessive production cost. Hence, the impact of PVT variations should be accurately analyzed so that designers can optimize their designs before fabrication.

The first step in considering the PVT variations is to understand their sources and impacts. The PVT variations can be categorized into two main types: global (inter-die) variations and local (intra-die) variations. Global variations refer to the parameter variations that have the same value on the entire chip. Without proper consideration, global variations between parameters within a single die can lead to either performance degradation or failing hardware. Local variations cause device parameters to vary across different locations within a single die. The spatial dependence of the local variations results in non-identical behavior among duplicate structures, due to the differences between individual devices and wires. Additionally, since each device requires separate random variables to represent the variations in the die, local variations lead to a huge dimensionality in variation modeling. For efficient timing analysis including the impact of local variations, dimension-reduction methods, such as Karhunen-Loève expansion [25], Principal Component Analysis (PCA), Independent Component Analysis (ICA) [26, 27] or Reduced Rank Regression (RRR) [28], can be used to significantly reduce the number of random variables needed to represent the local variations.

The next step is to model the PVT variations for σTA methods. The supply voltage drop results from parasitic resistances in power supply wires and is thus layout related. However, the layout is not available until placement and routing are complete. Therefore, the supply voltage variations cannot be accurately modeled before layout design. Additionally, the supply voltage and temperature variations are time-dependent, since they depend on the workload of the design. As a consequence, the supply voltage and temperature variations are generally not treated statistically. Typically, the supply voltage and temperature are assumed to have worst-case values during timing analysis [29].

In contrast, the process parameters have fixed values after fabrication, which may differ from their nominal values and cannot be exactly predicted during timing verification. Consequently, the process variations are statistically distributed and independent of time during the timing verification phase. Although some process variations become time-dependent due to effects such as Negative Bias Temperature Instability (NBTI) [30], the impact can be verified by using aging analysis [31] or checked under high temperature. Therefore, process variations are generally modeled as random variables with known distributions or statistical moments, and in this thesis, we focus on the process variations in σTA.

Due to process variations, nano-scale circuit designs are subject to significant performance fluctuations. The process variations must be accounted for in the early design phase to ensure an acceptable yield once the chips are manufactured. As a result, σTA methods that include the impact of process variations have recently been an active research topic.

1.2.2 Statistical Timing Analysis Methodologies

Corner-Based Timing Analysis

By setting process parameter values to their practical maximum or minimum limits, we obtain different parameter conditions, which are called corners. In corner-based timing analysis, the worst/best-case delays are calculated by performing DTA at multiple corners. Corner-based timing analysis is considered a safe method to estimate the impact of process variations, and thus is still a standard technique widely employed in the industry. However, it overestimates the circuit delay, since it is unlikely for all process parameters to have their extreme values simultaneously. Consequently, the “safe” approach may lead to circuit over-design or circuit modification to compensate for the unrealistic extreme-case performance, wasting human and computation resources. Additionally, the number of corners required by the corner-based method is exponential in the number of process variations. Furthermore, the corner-based method does not provide statistical distributions, since the DTA at every corner generates deterministic extreme-case results.
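The exponential growth in the number of corners is easy to see in a small sketch. The Python code below enumerates every min/max combination of a set of process parameters and evaluates a delay function at each one. The parameter names, limits and the linear `toy_delay` function are hypothetical placeholders; in a real flow each evaluation would be a full DTA run.

```python
from itertools import product


def toy_delay(p):
    """Hypothetical linear delay model (ps) standing in for one DTA run."""
    return (100.0
            + 2.0 * (p["l_eff"] - 45.0)
            + 150.0 * (p["v_th"] - 0.30)
            + 20.0 * (p["t_ox"] - 1.2))


def corner_analysis(delay_fn, limits):
    """Evaluate delay_fn at all 2**Np corners of the parameter limits.

    limits: dict mapping a parameter name to its (min, max) pair.
    Returns (best, worst, number_of_corners), where best and worst are
    (delay, parameter_dict) tuples.
    """
    names = sorted(limits)
    results = []
    for corner in product(*(limits[name] for name in names)):
        params = dict(zip(names, corner))
        results.append((delay_fn(params), params))
    best = min(results, key=lambda r: r[0])
    worst = max(results, key=lambda r: r[0])
    return best, worst, len(results)


# Hypothetical limits for three process parameters.
limits = {"l_eff": (40.5, 49.5), "v_th": (0.27, 0.33), "t_ox": (1.1, 1.3)}
best, worst, n_corners = corner_analysis(toy_delay, limits)
```

With three parameters there are already eight corners, and each additional parameter doubles the count. The result is also only a pair of extreme values, with no information about how likely they are.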

Monte Carlo-Based Timing Analysis

The most accurate available statistical approach to predict the effects of process variations is Monte Carlo (MC) analysis. At each MC iteration, each process parameter is sampled from its distribution, and DTA or transient analysis is then performed with the sampled values to obtain the output delay. This procedure is repeated for thousands of trials, and the delay distribution is then calculated from the collection of output delays. The output delay distribution can be predicted with sufficient accuracy if the number of trials is large enough. In order to make the MC method practical for large digital IC designs, variance reduction techniques, such as importance sampling and control variates [32], mixture importance sampling [33], the Criticality Aware Latin Hypercube Sampling (CALHS) approach [34], and the Stratification+Hybrid Quasi-MC (SH-QMC) method [35], were proposed to reduce the number of trials and improve the efficiency. Nevertheless, the efficiency of the MC method is still insufficient for use in a design flow, and thus other σTA methods with significantly faster speed and similar accuracy are desired.
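The basic MC loop described above can be sketched as follows. This is a plain (no variance reduction) illustration under assumptions: the parameters are taken to be independent and Gaussian, and `toy_delay` is a hypothetical linear stand-in for the DTA or transient analysis that a real flow would run in every trial.

```python
import random
import statistics


def toy_delay(p):
    """Hypothetical linear delay model (ps) standing in for one analysis run."""
    return 100.0 + 2.0 * (p["l_eff"] - 45.0) + 150.0 * (p["v_th"] - 0.30)


def monte_carlo_delay(delay_fn, distributions, n_trials, seed=0):
    """Plain Monte Carlo estimate of the delay mean and standard deviation.

    distributions: dict mapping a parameter name to (mean, sigma); the
    parameters are assumed independent and Gaussian for this sketch.
    """
    rng = random.Random(seed)
    delays = []
    for _ in range(n_trials):
        params = {name: rng.gauss(mu, sigma)
                  for name, (mu, sigma) in distributions.items()}
        delays.append(delay_fn(params))
    return statistics.mean(delays), statistics.stdev(delays), delays


# Hypothetical parameter distributions.
distributions = {"l_eff": (45.0, 1.5), "v_th": (0.30, 0.01)}
mc_mean, mc_std, samples = monte_carlo_delay(toy_delay, distributions, 20000)
```

The cost is one full analysis per trial, which is why the number of trials dominates the runtime and why the variance reduction techniques cited above aim to lower it.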


Statistical Static Timing Analysis

Statistical Static Timing Analysis (SSTA) has attracted a lot of attention, since it aims at evaluating not only the nominal-case timing performance but also its statistical variations due to process variations. During SSTA, each process variation is considered as a random variable with known distribution or statistical moments. In contrast to corner-based and MC-based timing analysis, the gate delays in most SSTA methods are characterized as a linear or quadratic function of process variations, and the delay distributions or statistical moments are calculated. Based on the characterized function coefficients, the statistical delay calculation only involves interpolation and closed-form function evaluation. By using the statistical minimum or maximum operation, the statistical extreme-case delay is computed and propagated.

In order to reduce the computational complexity and improve the speed of SSTA methods, some sacrifices are made. For instance, the process variations are often assumed to be independent and Gaussian-distributed, the non-Gaussian distributions resulting from statistical maximum or minimum operations are approximated by Gaussian distributions, and only a single input is considered. In order to improve accuracy while retaining speed, researchers have recently been trying to remove these simplifications.
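One commonly used way to approximate the maximum of two Gaussian arrival times by another Gaussian is moment matching with Clark's formulas. The text above does not name a specific technique, so the sketch below should be read as one illustrative option rather than the method of any cited work; the function name `clark_max` and its interface are hypothetical.

```python
import math


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def clark_max(mu1, sigma1, mu2, sigma2, rho=0.0):
    """Gaussian approximation of max(X1, X2) by matching two moments.

    X1 ~ N(mu1, sigma1^2), X2 ~ N(mu2, sigma2^2), correlation rho.
    Returns (mean, standard deviation) of the matched Gaussian.
    """
    theta_sq = sigma1 ** 2 + sigma2 ** 2 - 2.0 * rho * sigma1 * sigma2
    theta = math.sqrt(max(theta_sq, 0.0))
    if theta == 0.0:
        # X1 - X2 is deterministic, so the larger mean always wins.
        return (mu1, sigma1) if mu1 >= mu2 else (mu2, sigma2)
    alpha = (mu1 - mu2) / theta
    p, q, phi = normal_cdf(alpha), normal_cdf(-alpha), normal_pdf(alpha)
    first = mu1 * p + mu2 * q + theta * phi
    second = ((mu1 ** 2 + sigma1 ** 2) * p
              + (mu2 ** 2 + sigma2 ** 2) * q
              + (mu1 + mu2) * theta * phi)
    return first, math.sqrt(max(second - first ** 2, 0.0))


max_mean, max_std = clark_max(0.0, 1.0, 0.0, 1.0)
```

The first two moments are exact for Gaussian inputs, but the true distribution of the maximum is skewed, so replacing it by a Gaussian is precisely the kind of accuracy sacrifice mentioned above.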

For supply voltage and temperature variations, corner-based timing analysis is still a standard technique to estimate their impact on timing performance. However, for process variations, SSTA is preferred. The statistical gate delay calculation in SSTA is performed based on SGTMs rather than the GTMs used for DTA. Therefore, the SGTM is key for the accuracy-efficiency trade-off of SSTA.

1.2.3 Statistical Gate Timing Models

In recent years, in order to consider the unavoidable random process variations, the deterministic GTMs introduced in Section 1.1.1 have been extended to SGTMs for σTA as shown in Fig. 1.4. One type of SGTMs is referred to as function-based SGTM, which is the basis of many published SSTA methods. The function-based SGTM represents the gate delay as a linear or non-linear function of process variations as:

$$D = D_0 + S_1^T \xi \quad \text{or} \quad D = D_0 + S_1^T \xi + \xi^T S_2 \xi \qquad (1.1)$$

where $D$ denotes the statistical gate delay with nominal value $D_0$. $S_1$ and $S_2$ are the first-order sensitivity vector and second-order sensitivity matrix, respectively. $\xi$ represents the process variation vector. It is interesting to note that (1.1) can be regarded as a statistical extension to NLDM. The coefficients $S_1$ and $S_2$ are characterized and stored in LUTs indexed by $S_{in}$ and $C_{eff}$, demanding much longer characterization time than NLDM. Additionally, most SSTA methods interpolate the coefficients $S_1$ (and $S_2$) based on the nominal value of $S_{in}$ and $C_{eff}$


well due to the process variations. Not considering the statistical nature of $S_{in}$ and $C_{eff}$ can result in 30% delay errors and even worse for bigger circuits [17]. Also, like NLDM, the function-based SGTMs cannot account for resistive interconnect loads and nonlinear input waveforms. Furthermore, the function-based delay representation is entirely based on non-physical or empirical models, which is a major source of inaccuracy [18]. In contrast, the current sources and capacitors in ViVo-gate models have a well-defined physical relationship with node voltages and physical parameters, especially in transistor-level gate models.
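To make the function-based model (1.1) concrete, the following sketch evaluates the linear and quadratic forms for one sample of the process variation vector and computes the first two delay moments in closed form. The closed-form moments assume that the entries of the variation vector are independent standard Gaussian variables and that the second-order matrix is symmetric; the numerical coefficients are invented for illustration and are not characterized values.

```python
import math


def delay_linear(d0, s1, xi):
    """D = D0 + S1^T xi."""
    return d0 + sum(a * x for a, x in zip(s1, xi))


def delay_quadratic(d0, s1, s2, xi):
    """D = D0 + S1^T xi + xi^T S2 xi."""
    n = len(xi)
    quad = sum(xi[i] * s2[i][j] * xi[j] for i in range(n) for j in range(n))
    return delay_linear(d0, s1, xi) + quad


def quadratic_moments(d0, s1, s2):
    """Mean and standard deviation of the quadratic model.

    Assumes xi ~ N(0, I) and a symmetric s2, in which case
    E[D] = D0 + tr(S2) and Var[D] = S1^T S1 + 2 tr(S2 S2).
    """
    n = len(s1)
    trace = sum(s2[i][i] for i in range(n))
    trace_sq = sum(s2[i][j] * s2[j][i] for i in range(n) for j in range(n))
    mean = d0 + trace
    var = sum(a * a for a in s1) + 2.0 * trace_sq
    return mean, math.sqrt(var)


# Illustrative (made-up) coefficients for two process parameters.
D0 = 100.0
S1 = [8.0, 5.0]
S2 = [[0.6, 0.1], [0.1, 0.2]]

print(delay_quadratic(D0, S1, S2, [1.0, -1.0]))
print(quadratic_moments(D0, S1, S2))
```

Setting `s2` to a zero matrix recovers the linear model, whose delay is exactly Gaussian under the same assumptions; the quadratic term is what makes the delay distribution non-Gaussian.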

To improve accuracy, the voltage-based CSMs have also been extended for statistical delay calculation [17–20]. In [17], the current source values and capacitances in CSMs are modeled as a quadratic Hermite function of process variations. Several crossing times are characterized based on this quadratic CSM, from which other crossing time distributions are calculated by process variation sampling and linear interpolation. In [18], the variational voltages and all elements in a CSM are modeled as a stochastic first-order expression in terms of process variations. Then the output voltage is treated as a Markovian process for delay distribution calculation. Similarly, a CSM with a parametric nonlinear voltage-dependent current source and parametric capacitances is used in [19] and [20]. The voltage in [19] is represented as a time-domain statistical variable, and time-domain integration is performed taking into account input voltage waveform variations. The gate output voltage distribution in [20] is obtained by MC sampling and stored for various voltage levels by developing regression-based models. However, these methods have only been verified on a few simple single gates with single input switching, and signal correlations and sequential cells are not considered.

Both the statistical extension of NLDM and voltage-based CSMs are based on simplified black-box models, whose errors affect the subsequent statistical inference. Considering the accuracy dependence of the statistical delay calculation on the SGTMs, the statistical results based on these black-box models may be inaccurate and lead to erroneous conclusions.

1.3 Motivations and Contributions

The objective of our project is to develop a novel statistical timing analysis method and algorithm for accurate statistical delay calculation. In this thesis, our goal is to capture process variations and many circuit effects in statistical gate delay and interconnect delay calculation with high accuracy and efficiency.

The accuracy requirement of DTA has driven the development of GTMs from NLDM to ViVo-gate models as shown in Fig. 1.5. However, this gate model improvement does not help SSTA a lot. Most of the literature tries to extend linear function models to non-linear function models, by using different statistical minimum and maximum calculation methods and assumptions for statistical delay calculation and propagation.

Figure 1.5: The development of deterministic and statistical gate timing models. (Diagram labels, in the direction of higher accuracy. GTMs for DTA: NLDM, Sin&Ceff-dependent CSMs, voltage-dependent CSMs, multi-port CSMs, transistor-level gate models. SGTMs for SSTA: function-based model, followed by "? How?" for the ViVo-gate models.)

Nevertheless, how to extend the more accurate ViVo-gate models, especially the accurate transistor-level gate models, for statistical timing analysis is still challenging. This extension involves the following problems:

1. Which ViVo-gate model to choose?

2. What kind of method can be used for statistical timing analysis based on the model we choose?

3. How to consider crosstalk effects for statistical interconnect delay calculation?

To gain high accuracy and to be able to see the important circuit effects, we propose a novel non-MC Random Differential Equation based Statistical Timing Analysis (RESTA) method, which provides both statistical delay and variational waveform information. The problems mentioned above are solved by the following approaches:

1. Choose transistor-level gate models. As introduced in Section 1.1.1 and Section 1.2.3, NLDM and simple CSMs inject errors into delay calculation due to their over-simplified black-box models. Since GTMs are the bases of gate-based timing analysis, transistor-level gate models are chosen in our algorithm to reduce the errors from the GTMs. In order to improve the efficiency of transistor-level timing analysis, a Simplified Transistor Model (STM) and waveform evaluation method are proposed (Chapter 3).

2. Use the non-MC RESTA method. Instead of simulating many times by using corner-based or MC-based timing analysis, the RESTA method solves the statistical circuit equation directly for efficient timing analysis. The variational waveforms and statistical delay moments are calculated accurately (Chapter 4).


3. Employ a Piecewise Linear Delay change curve Model (PLDM). Based on the PLDM, the statistical interconnect delay due to crosstalk effects is calculated analytically. Additionally, the PLDM-based interconnect delay calculation method can handle both Gaussian and non-Gaussian input skew variations (Chapter 5).

The proposed statistical timing analysis also has the following features:

• A Random Differential Equation (RDE) based solver is proposed in RESTA to solve random circuit equations based on first-order Taylor expansion. If the process variation range needs to be enlarged, the Piece-Wise Linear (PWL) RDE-based solver in RESTA can be employed. This method partitions the process variation space into several sub-spaces where the linearity assumption has good accuracy and then the RDE-based solver is called.

• By using the RDE-based solver, the results are obtained by simulating only once, which is much more efficient than MC-based and many-corner-based timing analysis.

• The nominal voltages are calculated efficiently by using the Simplified Chord method, which avoids Jacobian calculation and update at every iteration.

• In the proposed RESTA method, all input signals are considered and calculated together, thus fundamentally addressing MISS in statistical timing analysis.

• The correlations among signals and between signals and delays are preserved during statistical delay calculation.

• Due to crosstalk eﬀects, input skew becomes a key factor for interconnect delay calculation. By using the proposed PLDM-based delay calculation method, both Gaussian and non-Gaussian input skew distributions can be handled and the statistical interconnect delay can be calculated analytically with closed form expressions.

1.4 Thesis Organization

The thesis structure and the relationships among the six chapters are shown in Fig. 1.6. The contents of each chapter are summarized below.

Chapter 1 gives a brief introduction to timing analysis and our motivations. Before explaining our methods, the categorizations of both Deterministic Timing Analysis (DTA) and Statistical Static Timing Analysis (SSTA) are studied at the beginning of Chapter 2. Then, a literature review of transistor-level timing analysis is given. The literature review indicates a tendency to move to transistor-level timing analysis. It also shows the research efforts for accuracy improvement of both DTA and SSTA through transistor-level approaches. After the literature review, Chapter 2 also introduces a typical circuit simulation flow and the solution of RDEs. Chapter 2 provides the background for a better understanding of later chapters in order to make the thesis more complete.

Figure 1.6: The outline of the thesis. (Diagram labels. Motivation statement: demands for more accurate statistical timing analysis methods, Chapter 1. Supplement: background, Chapter 2. Proposed deterministic solution: STM-based timing analysis, Chapter 3. Proposed statistical solution: RDE-based statistical timing analysis, Chapter 4, and statistical interconnect delay calculation considering crosstalk effects, Chapter 5. Conclusion, Chapter 6.)

Chapter 3 proposes the deterministic timing analysis method used in RESTA for nominal output calculation. This method uses our Simplified Transistor Model (STM), which simplifies the transistor models while still maintaining the ability to consider important circuit effects. STM-based gate modeling significantly reduces characterization time. An efficient waveform evaluation algorithm for arbitrary input and output waveform shapes is also presented. The experimental results in both combinational and sequential circuits demonstrate the high accuracy and efficiency of the proposed STM-based timing analysis approach.

Chapter 4 presents the core of the RESTA method. Due to process variations, the output voltages become variational, and we represent them as a combination of nominal voltages and a variational part. The nominal voltage can be calculated by using a deterministic STA method, such as the method introduced in Chapter 3. The voltage variations are efficiently computed by using our RDE-based statistical solver, which solves random circuit equations. For large process variations, high accuracy is maintained by using our PWL-RDE based statistical solver. Based on the variational voltages, statistical delay is calculated through closed-form expressions. RESTA provides statistical delay moments with only a small efficiency penalty relative to the DTA approaches, especially for large circuits.

The importance of considering crosstalk effects in statistical timing analysis is discussed in Chapter 5. After a study of the impact of crosstalk effects, input skew is found to be a key factor. We propose a PLDM model to represent the dependence of the interconnect delay on the input skew due to crosstalk effects. Based on the PLDM, a closed-form statistical interconnect delay calculation method is proposed, which can handle both Gaussian and non-Gaussian input skew distributions. By taking into account both statistical input skew variations and process variations in closed-form expressions, the proposed method shows high accuracy and efficiency.

In Chapter 6, we give a comprehensive conclusion for this dissertation. The possible applications and eﬃciency improvement techniques of RESTA are also discussed. Additionally, several interesting research topics are outlined for future work.


Chapter 2

Background

This chapter first studies the categorizations of both Deterministic Timing Analysis (DTA) and Statistical Static Timing Analysis (SSTA), and reviews the existing transistor-level timing analysis methods. Next, it introduces a typical transient analysis flow in timing analysis algorithms which use ViVo-based gate models, including transistor-level gate models. Finally, this chapter formulates some concepts and theorems on solving differential equations with random variables. These topics are important for a complete understanding of the later chapters.

2.1 Timing Analysis Categorizations

2.1.1 Deterministic Timing Analysis

As introduced in Chapter 1, DTA tools are widely used to verify the timing performance of digital circuits when the process variations are negligible or have extreme-case values. According to diﬀerent Gate Timing Models (GTMs), the DTA methods can be divided into diﬀerent types, such as NLDM-based timing analysis, CSM-based timing analysis and transistor-level timing analysis. From the input vector perspective, DTA can be categorized into dynamic timing analysis and Static Timing Analysis (STA).

Dynamic timing analysis verifies both the functionality and the timing of a digital design. In order to find the worst-case delay, many combinations of possible input transitions and input arrival times of each gate are required in dynamic timing analysis methods. For each multi-input gate, the number of possible input vectors grows exponentially with the number of input pins of the gate. Hence, efficiency is an obstacle for the application of dynamic timing analysis to large digital circuits.

In contrast, STA approximates the worst-case and best-case delays of all paths, irrespective of the data values being applied at input pins. Therefore, STA is called static and it is highly efficient in propagating the delay of each signal forward through Combinational Logic Blocks (CLBs) based on pre-characterized GTMs. A CLB consists of logic gates and interconnects. In STA, the timing information contained in a CLB is modeled in a timing graph, as shown in Fig. 2.1. A timing graph has nodes and edges. Each node (g1-g8 in Fig. 2.1) represents an input or an output of a gate or a primary input or output of the CLB. For convenience, often a source node and a sink node are added to the timing graph, as start and end points of all timing paths. The source node has outgoing edges to all nodes representing primary inputs, and the sink node has incoming edges from all nodes representing primary outputs. Each edge represents a timing arc of an input-output pair of a gate or an interconnect. According to different traversal algorithms in the timing graph, most STA methods can be divided into two major categories: block-based STA and path-based STA.

Figure 2.1: (a) Block-based technique; (b) Path-based technique. (Both panels show a timing graph with nodes g1-g8 between a source and a sink node; panel (a) groups the nodes into levels 1 to 3, and panel (b) marks a path.)

Block-based STA works through a levelized timing graph in a breadth-first fashion as shown in Fig. 2.1a. The CLB under analysis is partitioned by its logic levels and processed following the level order. When a node in the timing graph has multiple input edges, the maximum or minimum operation is performed to choose the worst-case or best-case arrival time for propagation. The block-based algorithm has a computational complexity that is linear in the number of nodes. However, the runtime of block-based STA algorithms differs when using different GTMs. When using simple GTMs, such as NLDM, block-based STA is more efficient than when using more complicated GTMs, such as the ViVo-gate models introduced in Section 1.1.1.
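The block-based traversal can be sketched as a single pass over the timing graph in topological order, taking the maximum over incoming edges at every node. This is a minimal illustration under simplifying assumptions: every timing arc carries one fixed delay (no slew or load dependence), only latest arrival times are propagated, and the small graph and its delay values are invented for the example.

```python
from collections import defaultdict, deque


def block_based_sta(edges, source="source"):
    """Propagate latest arrival times through a timing graph.

    edges: list of (from_node, to_node, delay) timing arcs of a DAG.
    Returns a dict mapping each node to its worst-case arrival time.
    """
    successors = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for u, v, delay in edges:
        successors[u].append((v, delay))
        indegree[v] += 1
        nodes.update((u, v))

    arrival = {node: float("-inf") for node in nodes}
    arrival[source] = 0.0

    # Breadth-first processing: a node is handled once all its inputs are known.
    ready = deque(node for node in nodes if indegree[node] == 0)
    while ready:
        u = ready.popleft()
        for v, delay in successors[u]:
            # Maximum operation at nodes with multiple input edges.
            arrival[v] = max(arrival[v], arrival[u] + delay)
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    return arrival


# Invented example graph with made-up arc delays.
example_edges = [
    ("source", "g1", 0.0), ("source", "g2", 0.0),
    ("g1", "g3", 2.0), ("g2", "g3", 3.0), ("g2", "g4", 1.0),
    ("g3", "g5", 1.5), ("g4", "g5", 4.0), ("g5", "sink", 0.0),
]
print(block_based_sta(example_edges)["sink"])
```

Each edge is visited exactly once, which is where the linear complexity comes from; replacing `max` with `min` gives the best-case arrival times.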

In contrast, path-based STA calculates the delay of individual paths in a circuit, as shown in Fig. 2.1b. In path-based STA, all topological paths are identified for timing analysis and the maximum operation is only necessary for path delays, rather than all gate delays. In practice, the top $N_{cp}$ critical paths are selected for path-based STA to estimate the worst-case path delay, where $N_{cp}$ is a constant chosen to cover a sufficient spread of critical paths and is typically much smaller than the number of all topological paths. As mentioned above, block-based STA is fast when using simple GTMs, such as NLDM. Therefore, path-based STA typically uses this kind of fast block-based STA to pick up the top $N_{cp}$ critical paths. After obtaining the $N_{cp}$ critical paths, more accurate GTMs, such as the transistor-level gate models, can be used to estimate an upper bound of path delays with higher accuracy. Since path-based STA provides a good accuracy-efficiency trade-off, we use it for deterministic timing analysis in this thesis.

2.1.2 Statistical Static Timing Analysis

When process variations are considered, SSTA can be applied to calculate statistical delays and verify the timing of digital circuits. Similar to STA, SSTA also calculates and propagates the worst-case and best-case delays, independent of the input vectors. However, in contrast to STA, the delays and arrival times in SSTA are statistical distributions, rather than deterministic values. In SSTA, the gate delay is typically modeled as a linear or quadratic function of process variations as introduced in Section 1.2.3.

SSTA methods can also be divided into two categories: block-based SSTA and path-based SSTA. Block-based SSTA propagates statistical delays and arrival times based on level orders of a timing graph. When a node in the timing graph has multiple input edges, statistical maximum or minimum operations are required. Since maximum and minimum operations are highly nonlinear, the statistical maximum and minimum calculation poses challenges for accurate statistical delay calculation. As a consequence, a Gaussian approximation is typically used for the statistical maximum operation at the cost of accuracy [36, 37]. In path-based SSTA, $N_{cp}$ critical paths can also be quickly selected by using block-based STA. Then the $N_{cp}$ critical paths are analyzed by using accurate approaches considering process variations and circuit effects. As done in [38–42], we also use the path-based approach in the experiments of the proposed statistical timing analysis method in this thesis.
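One widely used way to realize the Gaussian approximation of the statistical maximum is moment matching with Clark's formulas: the exact mean and variance of the maximum of two jointly Gaussian arrival times are computed, and the result is then treated as Gaussian again. The sketch below illustrates that idea only; it is not claimed to be the specific procedure of [36, 37], and the function name is an assumption.

```python
import math


def _pdf(x):
    """Standard normal probability density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def clark_max(mu1, sigma1, mu2, sigma2, rho=0.0):
    """Mean and sigma of max(X1, X2) for jointly Gaussian X1, X2.

    rho is the correlation coefficient between X1 and X2. The returned
    moments are exact; treating the maximum as Gaussian is the approximation.
    """
    theta_sq = sigma1 * sigma1 + sigma2 * sigma2 - 2.0 * rho * sigma1 * sigma2
    theta = math.sqrt(max(theta_sq, 0.0))
    if theta == 0.0:
        # X1 - X2 is deterministic, so the maximum is simply the larger one.
        return (mu1, sigma1) if mu1 >= mu2 else (mu2, sigma2)
    alpha = (mu1 - mu2) / theta
    first = mu1 * _cdf(alpha) + mu2 * _cdf(-alpha) + theta * _pdf(alpha)
    second = ((mu1 * mu1 + sigma1 * sigma1) * _cdf(alpha)
              + (mu2 * mu2 + sigma2 * sigma2) * _cdf(-alpha)
              + (mu1 + mu2) * theta * _pdf(alpha))
    return first, math.sqrt(max(second - first * first, 0.0))


print(clark_max(0.0, 1.0, 0.0, 1.0))
print(clark_max(10.0, 1.0, 0.0, 1.0))
```

When one arrival time clearly dominates, the maximum is almost exactly Gaussian and the approximation costs little. The accuracy loss mentioned above appears when the two inputs have similar means, because the true distribution of the maximum is then noticeably skewed.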

2.2 Transistor-level Timing Analysis

2.2.1 Efficiency Improvement

The accuracy and efficiency of both block-based and path-based STA depend on the GTMs used in STA algorithms. By using simple GTMs, STA can quickly calculate the worst-case path delay and estimate critical paths to shorten the digital IC design cycle time. However, the over-simplified GTMs and the extreme-case delay propagation schemes result in inaccurate timing verification for DSM and nanometer technologies. In order to consider circuit effects, such as nonlinear realistic waveforms, resistive interconnect load, Multiple Input Simultaneous Switching (MISS) scenarios, internal charge effects and memory elements, STA using accurate transistor-level gate models, called transistor-level timing analysis, has attracted a lot of attention in recent years.

Similar to all other ViVo-gate model based STA methods, voltage calculation in transistor-level timing analysis requires timing simulation. Unlike conventional simulation, which uses complex analytical transistor models (e.g. BSIM4 [43]) and sophisticated simulation algorithms (e.g. the algorithms used in Spectre and Spice¹ [44, 45]), timing simulation provides efficient time-domain analysis for large digital circuits by using fast simulation methods and/or simplified transistor models.

Transistor-level timing analysis typically offers a few orders of magnitude in runtime speedup with an accuracy within 5% for large digital circuit analysis compared to Spice-like simulations. The runtime bottleneck of the transistor-level timing analysis is typically overcome by the following three approaches: 1) fast simulation methods; 2) simplified transistor models; 3) the combination of fast simulation methods and simplified transistor models.

Fast Simulation Methods

• Event-Driven Simulation. The efficiency of some methods comes from the exploitation of temporal and spatial latency that allow circuit partitioning and event-driven simulation. An event-driven method discretises the voltage axis such that all the voltage values are multiples of a fixed value, then partially activates the circuit operations and evaluates the time-domain responses incrementally in time. Although still using complex analytical transistor models, the runtime does not depend on the number of nodes in the circuit, but on the level of activity during the simulation. An event-driven approach is used together with waveform relaxation in [46] to further accelerate simulation [46–50].

• Integration Techniques. Some methods employ adaptively controlled explicit integration techniques in conjunction with piecewise linear models for higher efficiency. These methods explicitly control the time step to select the minimum number of time steps for waveform computation [50].

• Piecewise Waveform Model. In order to avoid the time-consuming brute-force solution of differential equations, piecewise quadratic waveform matching is used in [12]. However, the piecewise quadratic model can be expensive to evaluate.

• Worst-case Delay Calculation. Traditionally, the gate delay can be found by simulating the gate with a set of input vectors. However, this method requires multiple simulations even for small circuits. A worst-case delay configuration method is proposed in [23], which uses a single simulation to find the worst-case delay by carefully choosing the input excitations and the initial conditions of the internal nodes. A method to find the worst-case configuration and waveform was proposed in [11] for efficient transistor-level timing analysis.

¹ Spice refers to Simulation Program with Integrated Circuit Emphasis, which is a general-purpose and open source analog electronic circuit simulator. Spectre is a commercial Spice-like circuit simulator from Cadence Design Systems, Inc.

• Channel-Connected Component (CCC) based method. A circuit at the transistor level is partitioned into groups of CCCs, which are clusters of transistors connected by the source-drain terminals of transistors. The CCCs are simulated individually [23, 51]. A CCC constitutes a delay element whose behavior affects the overall timing analysis. Together with worst-case delay calculation and a novel global caching scheme, CCC simulation based STA is able to process large circuits with the accuracy and speed required by high performance designs. However, it is difficult to take MISS into consideration.

• Successive Chord method. The successive chord method linearizes nonlinear devices with constant first-order terms, making the Jacobian matrix constant over all iterations. Since the Jacobian matrix is not re-calculated at every iteration, transistor-level timing analysis speed is significantly improved [10, 12, 22, 52].

• Multi-Threaded Algorithm. By using a multi-threaded timing traversal approach and proper utilization of available computer resources, it becomes practical to use transistor-level timing analysis in multi-million gate STA runs [13].
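The idea behind the successive chord method listed above can be illustrated on a single nonlinear node equation, where the Jacobian reduces to a scalar slope. The sketch compares a Newton iteration, which re-evaluates the derivative at every step, with a chord iteration that keeps one constant slope. The node equation, its coefficients and the chosen slope are assumptions made for the example and do not come from the cited works.

```python
def solve_newton(f, dfdx, x0, tol=1e-12, max_iter=100):
    """Newton-Raphson: the slope is re-evaluated at every iteration."""
    x = x0
    for iteration in range(1, max_iter + 1):
        step = f(x) / dfdx(x)
        x -= step
        if abs(step) < tol:
            return x, iteration
    raise RuntimeError("Newton iteration did not converge")


def solve_chord(f, slope, x0, tol=1e-12, max_iter=1000):
    """Chord iteration: one constant slope is reused for all iterations."""
    x = x0
    for iteration in range(1, max_iter + 1):
        step = f(x) / slope
        x -= step
        if abs(step) < tol:
            return x, iteration
    raise RuntimeError("chord iteration did not converge")


# Hypothetical node equation: resistor current plus a quadratic device current.
def node_current(v, vdd=1.0, r=1.0, k=1.0):
    return (v - vdd) / r + k * v * v


def node_conductance(v, r=1.0, k=1.0):
    return 1.0 / r + 2.0 * k * v


# Constant slope chosen as an upper bound of the conductance on [0, vdd].
chord_slope = node_conductance(1.0)

print(solve_newton(node_current, node_conductance, 0.0))
print(solve_chord(node_current, chord_slope, 0.0))
```

The chord iteration needs more iterations than Newton because it converges only linearly. In a circuit with many nodes, however, a constant Jacobian matrix can be factorized once and reused, so each iteration is much cheaper, which is where the speedup comes from.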

Simplified Transistor Models

• Piecewise Linear Models. The I-V characteristics of a device can be approximated by a piecewise constant function, which forces branch voltages to be piecewise linear in time and branch currents to be piecewise constant in time [49]. Nonlinear device characteristics are modeled in terms of piecewise linear representations in [50], allowing efficient simulation for MOS circuits [49, 50].

• Closed-Form Simplified Transistor Expressions. A fast-to-evaluate and accurate simplified transistor model was proposed in [21] for timing analysis, which models every transistor by one current source and five capacitors. The currents and capacitances are represented as a linear function of terminal voltages and some parameters like channel length. In [53], region-wise quadratic I-V models and linear resistance and capacitance models are used for fast timing analysis. A five-point based I-V model and a BSIM4-based charge model are used for transistor modeling in [54]. Although the closed-form expressions achieve fast timing analysis, the simplistic expressions cannot account for many secondary effects and thus are inaccurate for DSM and nanometer designs.


• LUT-based Simplified Transistor Models. Recently, LUT-based transistor models have been used for timing analysis, achieving a good combination of accuracy and efficiency. In those models, every transistor is represented by a current source and five capacitors. Either the current and capacitance values are all stored in LUTs, or the current values are stored in a LUT and the capacitance values are given by closed-form expressions [10, 12, 13, 22, 52, 55].

The Combination of Fast Simulation Methods and Simplified Transistor Models

The simulation acceleration methods and simplified transistor models mentioned above can be combined together for better performance of timing analysis. Based on detailed analytical transistor models, the ELogic simulator improves efficiency by using an event-driven mechanism available in a relaxation-based electrical simulator [46]. The simulators SPECS [49] and ACES [50] employ a piecewise linear model approximation and an event-driven approach. SPECS also uses a variable-accuracy circuit simulation algorithm that uses LUT-based transistor models. A CCC-based simulation method with a worst-case delay calculation algorithm is proposed in [23, 51] for deterministic timing analysis. Recently, LUT-based transistor models have been used with a successive chord method in [10, 12, 22, 52] and with a multi-threaded algorithm in [13]. These methods show the accuracy and efficiency of transistor-level timing analysis for large digital circuits.

2.2.2 Challenges of Transistor-Level Statistical Timing Analysis

Transistor-level timing analysis has been developed for large digital designs because of its high accuracy. Despite the efficiency reported for large circuits, how to use it for statistical timing analysis is still challenging. One particular reason is that it is difficult to add the impact of process variations to the transistor models. Another reason is that the voltage, rather than the delay, is calculated. Hence, how to extend the transistor-level gate models for statistical timing analysis, how to efficiently calculate the output variational voltage caused by process variations, and how to extract statistical delay information from the variational voltages are open questions.

When using expression-based transistor models that consider process parameters of interest, the Monte Carlo (MC) method can be directly applied to obtain accurate delay distributions. For LUT-based transistor models, the process variations must be included into the model in order to perform MC-based timing analysis. As introduced in Section 1.2.2, MC can provide accurate delay distributions if the GTMs are accurate enough, and thus the MC results can be regarded as a reference for comparison. However, its low runtime efficiency still prevents its use for timing estimation of large digital IC designs. Therefore, it is desirable to have a more efficient statistical timing analysis method with similar accuracy as MC-based approaches [9, 13, 56, 57]. In this thesis, a statistical gate modeling methodology and a non-MC statistical timing analysis approach are presented.


By using statistical ViVo-gate models, including statistical transistor-level gate models [9, 13, 17–20, 56, 57], the path-based SSTA approach can be used for a good accuracy-efficiency trade-off. A path-based approach usually first performs block-based STA as a first-hand diagnosis tool to identify a set of critical paths. Then, an accurate statistical analysis, such as the transistor-level statistical timing analysis, can be applied for this limited set of critical paths to account for the process variations. It can reasonably be assumed that the smaller set of critical paths determines the timing performance of the entire circuit design under process variations.

2.3 Transient Analysis

As mentioned in Section 2.2, by using ViVo-gate models, such as transistor-level gate models, the waveforms need to be evaluated via timing simulation, which typically has a transient analysis ﬂow. In this section, we summarize and brieﬂy introduce the relevant knowledge of transient analysis. More detailed information can be found in [58, 59].

2.3.1 Transient Analysis Flow

The behavior of a circuit is captured in a set of equations that are formulated by combining the element equations and Kirchhoff's Current and Voltage Laws (KCL and KVL). In most modern circuit simulators, the Modified Nodal Analysis (MNA) method is implemented as a systematic and automatic approach for formulating the circuit equation. In general, for the transient analysis flow shown in Fig. 2.2 [58], MNA leads to a nonlinear Ordinary Differential Equation (ODE) or a Differential Algebraic Equation (DAE) system, which is transformed by means of multi-step numerical integration methods into a nonlinear algebraic system. At each integration step, a Newton-Raphson (NR)-like method is used to solve this nonlinear algebraic system by linearizing this system around an initial value point. Typically, a circuit equation in MNA form can be expressed compactly as

$$F(\dot{x}, x, t, p_s) = 0, \qquad x(t_0) = x_0 \qquad (2.1)$$

where $x$ denotes the state variable vector including branch currents and node voltages, $\dot{x}$ is its time-domain derivative and $p_s$ represents a nominal value vector of process parameters. The initial value of the state variables, denoted as $x_0$, needs to be calculated before the transient analysis. Generally, $x_0$ is obtained via the Direct Current (DC) analysis [58].
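The flow just described (discretize in time, then solve the resulting nonlinear algebraic system with an NR-like iteration at every time point) can be sketched for a scalar state as follows. This is a deliberately reduced illustration: it uses Backward Euler with a fixed time step as the integration method, a scalar equation instead of an MNA system, and a hypothetical one-node circuit whose element values are invented for the example.

```python
def transient_backward_euler(F, dF_dx, dF_dxdot, x0, t_end, h,
                             tol=1e-12, max_newton=50):
    """Solve F(xdot, x, t) = 0 with x(0) = x0 on [0, t_end].

    At each time point the derivative is replaced by (x_n - x_prev) / h,
    and the resulting algebraic equation H(x_n) = 0 is solved by Newton.
    """
    n_steps = round(t_end / h)
    times, states = [0.0], [x0]
    for n in range(1, n_steps + 1):
        t_n = n * h
        x_prev = states[-1]
        x_n = x_prev  # initial guess: value at the previous time point
        for _ in range(max_newton):
            xdot = (x_n - x_prev) / h
            residual = F(xdot, x_n, t_n)
            # dH/dx_n = dF/dx + (1/h) * dF/dxdot (scalar form of A * dx = b)
            jacobian = dF_dx(xdot, x_n, t_n) + dF_dxdot(xdot, x_n, t_n) / h
            delta = -residual / jacobian
            x_n += delta
            if abs(delta) < tol:
                break
        else:
            raise RuntimeError(f"Newton did not converge at t = {t_n}")
        times.append(t_n)
        states.append(x_n)
    return times, states


# Hypothetical node: capacitor C, resistor R to a 1 V source, quadratic load.
C, R, K, VIN = 1.0, 1.0, 1.0, 1.0


def F(xdot, v, t):
    return C * xdot + (v - VIN) / R + K * v * v


def dF_dx(xdot, v, t):
    return 1.0 / R + 2.0 * K * v


def dF_dxdot(xdot, v, t):
    return C


times, volts = transient_backward_euler(F, dF_dx, dF_dxdot, 0.0, 10.0, 0.01)
print(volts[-1])
```

In a real simulator the state is a vector, the scalar division becomes the solution of a linear system, and the time step is adapted to the local truncation error.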

2.3.2 Numerical Integration Methods

In order to manage the differential system (2.1), in general, we must resort to numerical integration methods to convert it into an algebraic system. This requires an algebraic approximation of the time derivative $\dot{x}$ in (2.1).

Figure 2.2: The overall circuit transient analysis flow. (Flowchart: netlist parsing; nonlinear ODE/DAE $F(\dot{x}, x, t, p_s) = 0$; numerical integration, Section 2.3.2; nonlinear algebraic equations $H(x) = 0$; NR-like method, Section 2.3.3; linear algebraic equations $A\,\delta x = b$; solve, Section 2.3.4; "Converged?" and "End time?" decisions looping back until the end.)

The numerical integration methods replace a continuous time interval $[t_0, t_{end}]$ by a discrete time point set $t_n$: $t_0 < t_1 < t_2 < \cdots < t_{n-1} < t_n < \cdots < t_{end}$. Here $h_n = t_n - t_{n-1}$, $n = 1, 2, \cdots$ is called the time step for system calculation at $t_n$. If the time step is uniform at every time point, $h$ is simply used to represent the time step. At each discretized time $t_n$, the time derivative $\dot{x}$ is replaced by a finite difference, making use of the previously found values $x_{n-1}, x_{n-2}, \cdots$ to calculate $x_n$ as an approximation of the exact value $x(t_n)$. Four simple and basic numerical integration methods are Forward Euler (FE), Backward Euler (BE), 2nd-order Backward Differentiation Formula (BDF2) and Trapezoidal Rule (TR), which can be formulated as

$$\text{FE:} \quad \dot{x}_n = \frac{1}{h_n}(x_{n+1} - x_n) \qquad (2.2)$$

$$\text{BE:} \quad \dot{x}_n = \frac{1}{h_n}(x_n - x_{n-1}) \qquad (2.3)$$

$$\text{BDF2:} \quad \dot{x}_n = \frac{1}{2h_n}(3x_n - 4x_{n-1} + x_{n-2}) \qquad (2.4)$$

$$\text{TR:} \quad \dot{x}_n = \frac{2}{h_n}(x_n - x_{n-1}) - \dot{x}_{n-1} \qquad (2.5)$$

By using these numerical integration methods, the differential equation (2.1) is converted into an algebraic equation. For example, by approximating $\dot{x}$ by BE, we obtain an implicit nonlinear algebraic system from (2.1) at $t_n$ as

$$F\left(\frac{x_n - x_{n-1}}{h_n}, x_n, t_n, p_s\right) = 0, \qquad x(t_{n-1}) = x_{n-1} \qquad (2.6)$$

The numerical integration methods can also be interpreted as a mapping from the dynamic elements to linear models. The linearized model of a dynamic element in circuit simulation is called a companion model. Consider a linear capacitor at time $t_n$, whose I-V relationship is $i_n = C\dot{v}_n$, where $i_n$ and $v_n$ are the approximation of $i(t_n)$ and $v(t_n)$ respectively, and $\dot{v}_n$ is the time derivative of $v_n$. By using TR, $i_n$ can be expressed as $i_n = G_{eq}v_n - I_{eq}$, where $G_{eq}$ and $I_{eq}$ are shown in Fig. 2.3. As a result, the capacitor is linearized to a companion model consisting of a conductance ($G_{eq}$) and a current source ($I_{eq}$) as illustrated in Fig. 2.3.

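[Figure 2.3 shows the capacitor $C$ replaced by a conductance $G_{eq} = 2C/h$ in parallel with a current source $I_{eq} = i_{n-1} + (2C/h)\,v_{n-1}$.]

Figure 2.3: The companion model of a linear capacitor by using the Trapezoidal Rule method. $G_{eq}$ and $I_{eq}$ are the conductance and current value of the companion model.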
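The companion model can be exercised on a small example. The sketch below assumes a hypothetical series RC circuit (a constant source `v_src` driving the capacitor through a resistor `r`), which is not a circuit from this thesis. At each time point the capacitor is replaced by $G_{eq}$ and $I_{eq}$, and the resulting linear nodal equation is solved for the capacitor voltage.

```python
import math


def simulate_rc_trapezoidal(v_src, r, c, h, steps, v0=0.0):
    """Transient of a hypothetical series RC circuit using the TR companion model.

    The capacitor is modelled as i_n = g_eq * v_n - i_eq with
    g_eq = 2C/h and i_eq = i_{n-1} + (2C/h) * v_{n-1}.
    """
    g_eq = 2.0 * c / h
    v_prev = v0
    i_prev = (v_src - v0) / r  # capacitor current consistent with KCL at t = 0
    voltages = [v_prev]
    for _ in range(steps):
        i_eq = i_prev + g_eq * v_prev
        # KCL at the capacitor node: (v_src - v_n) / r = g_eq * v_n - i_eq
        v_n = (v_src / r + i_eq) / (1.0 / r + g_eq)
        i_n = g_eq * v_n - i_eq
        voltages.append(v_n)
        v_prev, i_prev = v_n, i_n
    return voltages


def rc_exact(v_src, r, c, t):
    """Analytical step response of the same circuit for v(0) = 0."""
    return v_src * (1.0 - math.exp(-t / (r * c)))
```

For instance, `simulate_rc_trapezoidal(1.0, 1e3, 1e-6, 1e-5, 100)[-1]` can be compared with `rc_exact(1.0, 1e3, 1e-6, 1e-3)`. Only a linear equation is solved per time point, which is the practical benefit of the companion model for a linear element.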

It is clear from (2.2)-(2.5) that the calculation of $x_n$ by using FE is based only on the previously known value $x_{n-1}$, thus it is an explicit method. In contrast, solving for $x_n$ by using the other three methods depends on $x_n$ itself, and hence they are implicit methods. Since BDF2 requires two previously calculated values, it is a two-step method, with the rest being one-step methods.
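The distinction can be made concrete on the linear test equation $\dot{x} = -ax$, which is an assumption introduced here for illustration. For this equation the implicit updates of BE, BDF2 and TR can be solved for $x_n$ in closed form, whereas FE needs no solve at all. The sketch below also starts BDF2 with one BE step, since BDF2 needs two previous values; that start-up choice is likewise an assumption.

```python
import math


def integrate(method, a, x0, t_end, steps):
    """Integrate x_dot = -a * x from 0 to t_end with a uniform step h."""
    h = t_end / steps
    if method == "FE":
        x = x0
        for _ in range(steps):
            x = x + h * (-a * x)  # explicit: uses only the known value
        return x
    if method == "BE":
        x = x0
        for _ in range(steps):
            x = x / (1.0 + a * h)  # (x_n - x_prev) / h = -a * x_n
        return x
    if method == "TR":
        x = x0
        for _ in range(steps):
            # (2/h) * (x_n - x_prev) - x_dot_prev = -a * x_n, x_dot_prev = -a * x_prev
            x = x * (2.0 - a * h) / (2.0 + a * h)
        return x
    if method == "BDF2":
        x_prev2 = x0
        x_prev1 = x0 / (1.0 + a * h)  # one BE step supplies the second start value
        for _ in range(steps - 1):
            # (3 x_n - 4 x_prev1 + x_prev2) / (2h) = -a * x_n
            x_n = (4.0 * x_prev1 - x_prev2) / (3.0 + 2.0 * a * h)
            x_prev2, x_prev1 = x_prev1, x_n
        return x_prev1
    raise ValueError("unknown method: " + method)


def errors_at(a, t_end, steps):
    """Absolute error of each method at t_end against exp(-a * t_end) for x0 = 1."""
    exact = math.exp(-a * t_end)
    return {m: abs(integrate(m, a, 1.0, t_end, steps) - exact)
            for m in ("FE", "BE", "BDF2", "TR")}
```

Running `errors_at(1.0, 1.0, 200)` shows the first-order methods (FE, BE) with visibly larger errors than the second-order ones (BDF2, TR) for this test problem.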

The main quality metrics of numerical integration methods are accuracy, computational efficiency and stability. FE, BE, BDF2 and TR use only one or two previously calculated values to approximate the derivative, hence they are simpler to implement and faster to evaluate than higher-order numerical methods. In general, the accuracy is estimated using the Local Truncation Error (LTE) order, which is defined as follows:

Definition Let $x_n$ be the value returned by a numerical integration method when we assume $x_{n-j} = x(t_{n-j})$ for $j = 1, 2, \cdots, n$. The Local Truncation Error (LTE) is defined as $\tau_n(h) = x(t_n) - x_n$, which can be expressed as

$\tau_n(h) = \mathrm{PLTE} + O(h^{p+2})$

where $h$ is the time step, $p$ denotes the order of the numerical integration method, $O(\cdot)$ is the big O notation and PLTE represents the Principal Local Truncation Error. The PLTE can be written as

$\mathrm{PLTE} = C_{p+1}\, h^{p+1}\, x^{(p+1)}(t_{n-1})$

where $C_i$ ($i = 0, \cdots, p+1$) are constants with the values $C_0 = C_1 = \cdots = C_p = 0$
