
(1)

Analysis and modeling of Computational Performance

Sequential optimization

(2)

Software optimization

Software optimization can have several goals:

minimization of execution time

• the only one we are interested in here, further called simply optimization

minimization of memory footprint

other requirements, often depending on the particular software type or domain of application

Optimization can be performed by different means at different stages of software development

by properly choosing algorithms and data structures while designing codes

• depends on the domain of application

by proper implementation at the stage of source code creation

• the main concern today is exploitation of parallel capabilities

• even scalable software should have high single node performance

by using an optimizing compiler

by the use of hardware designed for performance

(3)

Software optimization

Software optimization is often blamed for being an obstacle to proper code development

Donald Knuth: "Premature optimization is the root of all evil"

• but the full quote includes: "The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; ..."

Performance optimization has to be done for code that already works

• however, in order to give optimization a chance to improve performance, the code has to be designed from the beginning with future performance optimization in mind

An often employed strategy:

• predict the places most important from the performance point of view

• separate the related code, create working version of the program

• perform optimization, by removing "bottlenecks"

a bottleneck is a place that causes performance degradation for a particular code, or even for a particular case of input data

(4)

Software optimization

Predicting the places most important from the performance point of view can be based on an analysis of the number of instructions and memory accesses performed in a given part of the code

the parts of the code with the highest percentage of expected execution time are called "hot spots"

optimizing "hot spots" may be the most effective way for performance improvement

"hot spots" often become performance "bottlenecks"

• it is also possible that a bottleneck appears in a place where relatively few operations are performed but these operations are (or become in certain circumstances) extremely slow

e.g. swapping or other secondary storage (hard disk or SSD) access, slow network connection, etc.

we will be mainly concerned with "hot spot" optimization, but will keep in mind that code profiling and bottleneck discovery should be the first step in optimizing a particular code (a small timing sketch follows below)
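Bottleneck discovery is usually done with a profiler (e.g. gprof or perf); as a minimal hand-rolled alternative, a suspected hot spot can be timed directly. The sketch below assumes a POSIX system (clock_gettime) and uses a made-up compute() function standing in for the candidate hot spot:

#include <stdio.h>
#include <time.h>

/* hypothetical candidate hot spot: a simple summation stands in for real work */
static double compute(int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += (double)i * (double)i;
    return s;
}

int main(void)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    double s = compute(100000000);   /* the code fragment under suspicion */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double seconds = (t1.tv_sec - t0.tv_sec) + 1.0e-9 * (t1.tv_nsec - t0.tv_nsec);
    printf("compute() = %f, took %f s\n", s, seconds);
    return 0;
}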

(5)

Software optimization

The optimization should concern parts of the code most important from the performance point of view

"hot spots" can be identified through algorithm and source code analysis

"bottlenecks" can be found by profiling

After separating the performance-critical code, different actions can be taken:

a proper high performance library can be found that provides functions necessary for code implementation

• e.g. many linear algebra packages, with LAPACK being a prominent example, are successfully used in numerous programs (a short example follows at the end of this slide)

• using libraries creates dependencies that may become problematic during code evolution

optimization can be performed for the code

• the optimization usually depends on target execution environment and hardware, creating less portable code
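For illustration only: a dot product can be delegated to an optimized BLAS routine instead of a hand-written loop. The minimal sketch below assumes a CBLAS header and library (e.g. OpenBLAS or ATLAS) are installed; cblas_ddot is a standard CBLAS routine:

#include <stdio.h>
#include <cblas.h>   /* assumes a CBLAS implementation is available */

int main(void)
{
    double x[4] = {1.0, 2.0, 3.0, 4.0};
    double y[4] = {1.0, 1.0, 1.0, 1.0};

    /* dot product computed by the (typically highly optimized) library routine */
    double dot = cblas_ddot(4, x, 1, y, 1);

    printf("dot = %f\n", dot);   /* expected: 10.0 */
    return 0;
}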

(6)

Software optimization

How to optimize a part of the code:

use optimizing compiler

perform manual optimization

• contemporary optimizing compilers are doing their job very well

• it is difficult to obtain the same effect by changing the source code as by using an optimizing compiler

without optimization options compilers often produce unnecessarily slow code (e.g. for debugging purposes)

• the best way to optimize manually is to apply specific techniques that help compilers to produce more effective code (an example follows at the end of this slide)

these techniques reduce the number of operations, make effective use of the different instruction pipelines, remove dependencies, choose proper functions and instructions, vectorize code, and make optimal use of the memory hierarchy

use a different programming language, designed for performance

• as a last resort, employ assembly language
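A minimal sketch of one such dependence-removing technique: the C99 restrict qualifier tells the compiler that pointer arguments do not alias, which often allows it to vectorize the loop (the axpy function below is illustrative):

/* Without restrict the compiler must assume that x and y may overlap,
   which inhibits vectorization; with restrict it may generate SIMD code. */
void axpy(int n, double a, const double * restrict x, double * restrict y)
{
    for (int i = 0; i < n; i++)
        y[i] += a * x[i];   /* the compiler is free to vectorize this loop */
}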

(7)

Classical software optimization

Classical optimization concerns mainly single-node performance and aims primarily at:

reducing the number of performed operations

proper utilization of vector capabilities of the hardware

proper utilization of memory hierarchy

removing dependencies between instructions

Classical optimization techniques can be applied manually

most of the techniques are also utilized by the compilers

it is important not to inhibit compiler optimizations by manual source code changes

• unfortunately, it is a common case that manually optimized code performs worse than before optimization due to improper interaction with an optimizing compiler

Classical optimization can speed up program execution dozens of times in certain situations

(8)

Classical optimization techniques

General techniques for variables and expressions:

– constant folding

• instead of: for(i=...) r[i] = 2*PI*r[i];

• use: const double TWO_PI = 2*PI; for(i=...) r[i] = TWO_PI*r[i];

– copy propagation

• instead of: y = x; ...; z = f(y); // read-after-write

• use: y = x; ...; z = f(x); // no dependence

– strength reduction

• instead of: y = pow(x,4);

• use: temp = x*x; y = temp*temp;

– common subexpression elimination

• instead of: a = b * c + g; d = b * c * e;

• use: temp = b*c; a = temp + g; d = temp * e;
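A minimal compilable sketch (function names are illustrative) combining the last two transformations; an optimizing compiler typically performs both automatically:

#include <math.h>

/* straightforward version: pow() call and the repeated subexpression b*c */
double f_naive(double b, double c, double e, double g, double x)
{
    double y = pow(x, 4);
    double a = b * c + g;
    double d = b * c * e;
    return y + a + d;
}

/* manually optimized version: strength reduction and common subexpression elimination */
double f_opt(double b, double c, double e, double g, double x)
{
    double x2 = x * x;    /* strength reduction: pow(x,4) -> (x*x)*(x*x) */
    double y  = x2 * x2;
    double bc = b * c;    /* common subexpression computed once */
    double a  = bc + g;
    double d  = bc * e;
    return y + a + d;
}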

(9)

Classical optimization techniques

Loop oriented techniques

– induction variable simplification (IVS)
– loop invariant code motion (LICM)

before:

for(i=0; i<N; i++){
  for(j=0; j<N; j++){
    sum += a[i*n+j];
  }
}

after LICM:

for(i=0; i<N; i++){
  int in = i*n;
  for(j=0; j<N; j++){
    sum += a[in+j];
  }
}

after LICM+IVS:

for(i=0; i<N; i++){
  int in = i*n;
  for(j=0; j<N; j++){
    sum += a[in];
    in++;
  }
}

(10)

Classical optimization techniques

Loop oriented techniques

– loop unrolling

• instead of:

dot = 0.0;
for(i=0; i<N; i++) {
  dot += X[i]*X[i];
}

• use (a separate loop with the remaining N%4 iterations must also be added; see the sketch below):

dot = 0.0;
for(i=0; i<N-3; i+=4) {
  dot += X[i]*X[i] + X[i+1]*X[i+1] + X[i+2]*X[i+2] + X[i+3]*X[i+3];
}
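A minimal sketch of the complete pattern, including the cleanup loop for the remaining N % 4 elements (wrapped in an illustrative function):

double dot_unrolled(const double *X, int N)
{
    double dot = 0.0;
    int i;
    /* unrolled main loop: processes elements in groups of four */
    for (i = 0; i + 3 < N; i += 4) {
        dot += X[i]*X[i] + X[i+1]*X[i+1] + X[i+2]*X[i+2] + X[i+3]*X[i+3];
    }
    /* cleanup loop: handles the remaining N % 4 elements */
    for (; i < N; i++) {
        dot += X[i]*X[i];
    }
    return dot;
}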

(11)

Classical optimization techniques

Loop oriented techniques

– loop fusion (e.g. to reduce the number of memory accesses)

before:

for(k=0; k<16; k++){
  a_tab[k] += 2*c_tab;
  b_tab[k] += 2*d_tab;
}
for(k=0; k<16; k++){
  a_tab[k] += d_tab;
  b_tab[k] += c_tab;
}

after:

for(k=0; k<16; k++){
  a_tab[k] += 2*c_tab + d_tab;
  b_tab[k] += 2*d_tab + c_tab;
}

(12)

Classical optimization techniques

Loop oriented techniques

– loop fission (e.g. to reduce register pressure)

before:

for(i=0; i<1000000; i++){
  for(k=0; k<16; k++){
    a_tab[k] += 1.0;
    b_tab[k] += 1.0;
    c_tab[k] += 1.0;
    d_tab[k] += 1.0;
  }
}

after:

for(i=0; i<1000000; i++){
  for(k=0; k<16; k++){
    a_tab[k] += 1.0;
    b_tab[k] += 1.0;
  }
}
for(i=0; i<1000000; i++){
  for(k=0; k<16; k++){
    c_tab[k] += 1.0;
    d_tab[k] += 1.0;
  }
}

(13)

Classical optimization techniques

Loop oriented techniques

– loop interchange (e.g. to correct memory access pattern)

• before:

for(i=0; i<N; i++){
  for(j=0; j<N; j++){
    sum += a[i+j*n]; // not optimal memory access, stride n
  }
}

• after:

for(j=0; j<N; j++){
  for(i=0; i<N; i++){
    sum += a[i+j*n]; // optimal memory access, stride 1
  }
}

(14)

Classical optimization techniques

Loop oriented techniques

– register blocking

• before:

for(i = 0; i < n; i++){
  for(j = 0; j < n; j++){
    sum += a[i*n+j] * x[j];
  }
}

• after (reduced number of memory accesses for x; n assumed even):

for(i = 0; i < n; i += 2){
  for(j = 0; j < n; j += 2){
    t0 = x[j];
    t1 = x[j+1];
    sum += a[i*n+j]     * t0 + a[i*n+j+1]     * t1;
    sum += a[(i+1)*n+j] * t0 + a[(i+1)*n+j+1] * t1;
  }
}

(15)

Classical optimization techniques

Other techniques

– dead code removal
– tail-recursion elimination
– inlining
– software prefetching
– software pipelining

software prefetching and pipelining example:

before:

for(i = 0; i < n; i++){
  fetch( a[i] );
  process( a[i] );
}

after:

fetch( a[0] );
for(i = 0; i < n-1; i++){
  fetch( a[i+1] );
  process( a[i] );
}
process( a[n-1] );
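A more concrete, hedged variant of the same idea using the __builtin_prefetch intrinsic available in GCC and Clang (the function, the loop body, and the prefetch distance DIST are illustrative):

/* __builtin_prefetch is only a hint to the hardware; DIST is a tuning parameter. */
#define DIST 8

double sum_of_squares(const double *a, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        if (i + DIST < n)
            __builtin_prefetch(&a[i + DIST]);  /* request a future element early */
        sum += a[i] * a[i];                    /* the actual processing of a[i] */
    }
    return sum;
}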
