
(1)

The marvel garden; inside the Java Virtual Machine

Dawid Weiss

(4)

Talk Outline

Philosophical note about performance.

Data types, with consequences.

Memory allocation, garbage collection.

HotSpot optimizations.

This talk is an abbreviated version of several presentations given at the Poznań Java Users Group (JUG) and at GeeCon’2010.

(5)

Divide-and-conquer style algorithm

for (Example e : examples) {
    if (e.hasQuiz()) e.showQuiz(); else e.showCode();
    e.explain();
    e.deriveConclusions();
}

(6)

Philosophical note about performance.

(7)

Is Java faster than C/C++?

The short answer is: it depends.

(8)

It’s usually hard to make a fast program run faster.

It’s easy to make a slow program run even slower.

It’s easy to make fast …

(11)

Sanity check

Algorithms, algorithms, algorithms.

Proper data structures.

(long time nothing)

(14)

Data types, with consequences.

(15)

Hybrid data types in Java

primitive types (non-objects),

object types, …

(16)

Example 1.

declaration                        size in memory (approx.)

byte b = 1                         1
Byte b = 1                         ref + 8 + 1
double d = 1                       8
Double d = 1                       ref + 8 + 8
char c = 'c'                       2
byte[] b = new byte[100]           12 + 100
Byte[] b = new Byte[100]           12 + ref × 100 + objects
boolean[] b = new boolean[100]     ?
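To see the last rows of the table for yourself, a crude measurement (my sketch, not from the slides; the class name and element count are arbitrary) compares the heap consumed by a primitive array with its boxed counterpart:

// Rough, illustrative measurement of boxing overhead; results vary with the
// VM, pointer size and whatever else the heap is doing at the time.
public class BoxingFootprint {
    static Object[] sink = new Object[2]; // keeps both arrays strongly reachable

    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        final int n = 100000;

        long before = usedHeap();
        double[] primitives = new double[n];
        sink[0] = primitives;
        long afterPrimitives = usedHeap();

        Double[] boxed = new Double[n];
        for (int i = 0; i < n; i++) {
            boxed[i] = Double.valueOf(i); // one extra heap object per element
        }
        sink[1] = boxed;
        long afterBoxed = usedHeap();

        System.out.println("double[" + n + "]: ~" + (afterPrimitives - before) + " bytes");
        System.out.println("Double[" + n + "]: ~" + (afterBoxed - afterPrimitives) + " bytes");
    }
}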

(25)

String s = "c"

?

/** The value is used for character storage. */
private final char value[];

/** The offset is the first index of the storage that is used. */
private final int offset;

/** The count is the number of characters in the String. */
private final int count;

/** Cache the hash code for the string */
private int hash; // Default to 0
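As a rough estimate (mine, not a figure from the slides), on a 32-bit HotSpot of that era String s = "c" costs about:

String object:  8 (header) + 4 (value ref) + 4 (offset) + 4 (count) + 4 (hash) = 24 bytes
char[1] array: 12 (header + length) + 2 (data), padded to 16 bytes

so roughly 40 bytes in total, versus 2 bytes for a bare char.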

(28)

Does it make Java inefficient?

No. It’s the price of comfort (or laziness).

(30)

Example 2.

public void testSum1() {
    int sum = 0;
    for (int i = 0; i < COUNT; i++)
        sum += sum1(i, i);
    result = sum;
}

public void testSum2() {
    int sum = 0;
    for (int i = 0; i < COUNT; i++)
        sum += sum2(i, i);
    result = sum;
}

where the bodies of sum1 and sum2 sum their arguments and return the result, and COUNT is significantly large…

(32)

VM                 sum1    sum2
sun-1.6.0-20       0.04    2.62
sun-1.6.0-16       0.04    3.20
sun-1.5.0-18       0.04    3.29
ibm-1.6.2          0.08    6.28
jrockit-27.5.0     0.18    0.16
harmony-r917296    0.17    0.35

(40)

With two more columns for the array-based variants defined on slide (42), sum3 and sum4:

VM                 sum1    sum2    sum3    sum4
sun-1.6.0-20       0.04    2.62    1.05    3.76
sun-1.6.0-16       0.04    3.20    1.39    4.99
sun-1.5.0-18       0.04    3.29    1.46    5.20
ibm-1.6.2          0.08    6.28    0.16    14.64
jrockit-27.5.0     0.18    0.16    1.16    3.18
harmony-r917296    0.17    0.35    9.18    22.49

(41)

int sum1(int a, int b) {
    return a + b;
}

Integer sum2(Integer a, Integer b) {
    return a + b;
}

// what sum2 really compiles to:
Integer sum2(Integer a, Integer b) {
    return Integer.valueOf(
        a.intValue() + b.intValue());
}

(42)

int sum3(int[] args) {
    int sum = 0;
    for (int i = 0; i < args.length; i++)
        sum += args[i];
    return sum;
}

Integer sum4(Integer[] args) {
    int sum = 0;
    for (int i = 0; i < args.length; i++) {
        sum += args[i];
    }
    return sum;
}
// ...

(43)

Conclusions

Syntactic sugar may be costly.

Primitive types are fast on most VMs.

(44)

Hints

If you have a great deal of computations and use collections (lists, maps), take a peek at:

fastutil
http://fastutil.dsi.unimi.it/

Apache Mahout Collections
http://lucene.apache.org/mahout/

HPPC
http://labs.carrotsearch.com/hppc.html

PCJ is unmaintained and buggy, avoid.
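To see why such libraries exist (an illustration of mine using only the JDK), compare a boxed map with plain primitive arrays for an int-to-int table; the boxed version allocates an Integer for nearly every key and value, plus an entry object per mapping:

import java.util.HashMap;
import java.util.Map;

// Rough comparison of a boxed JDK map vs. parallel primitive arrays; the
// timings are only indicative and depend heavily on the VM and heap size.
public class PrimitiveVsBoxed {
    public static void main(String[] args) {
        final int n = 200000;

        long t0 = System.nanoTime();
        Map<Integer, Integer> boxed = new HashMap<Integer, Integer>();
        for (int i = 0; i < n; i++) {
            boxed.put(i, i); // autoboxes key and value, allocates an entry
        }
        long t1 = System.nanoTime();

        int[] keys = new int[n];
        int[] values = new int[n];
        for (int i = 0; i < n; i++) {
            keys[i] = i;   // no boxing, no per-element objects
            values[i] = i;
        }
        long t2 = System.nanoTime();

        System.out.println("HashMap<Integer,Integer>: " + (t1 - t0) / 1000000 + " ms");
        System.out.println("two int[] arrays:         " + (t2 - t1) / 1000000 + " ms");
    }
}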

(46)

Example 3.

public static void main(String[] args) throws Exception {
    final long start = Runtime.getRuntime().freeMemory();
    final byte[][] arrays = new byte[100][];
    for (int i = 0; i < arrays.length; i++) {
        arrays[i] = new byte[100];
        long current = Runtime.getRuntime().freeMemory();
        System.out.println(start + " " + current);
        Thread.sleep(1000);
    }
    ...

(47)

1 61934120 61934120
2 61934120 61934120
3 61934120 61934120
4 61934120 61934120
5 61934120 61934120
6 61934120 61934120
7 61934120 61934120
...

(48)

Conclusions

Memory allocation is very different from C/C++.

There is no single “memory allocation scheme”.

(50)

Garbage detection

direct garbage identification (e.g., reference counting),

live object marking.

(51)

Mark-Sweep

(52)

Mark-Sweep – Marking

(53)

Mark-Sweep – Sweeping
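These slides illustrate the collectors with diagrams only. As a rough mental model (my own sketch, not how HotSpot is implemented), mark-sweep reduces to two passes over the object graph:

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy mark-sweep over an explicit object graph; a real collector works on the
// heap and on root sets (stacks, registers, statics), not on a List of nodes.
class ToyMarkSweep {
    static class Node {
        boolean marked;
        List<Node> references = new ArrayList<Node>();
    }

    // Mark: flood-fill everything reachable from the roots.
    static void mark(List<Node> roots) {
        Deque<Node> pending = new ArrayDeque<Node>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (!n.marked) {
                n.marked = true;
                pending.addAll(n.references);
            }
        }
    }

    // Sweep: anything left unmarked is garbage; survivors are unmarked again
    // so the next cycle starts clean. No compaction, hence fragmentation.
    static List<Node> sweep(List<Node> heap) {
        List<Node> survivors = new ArrayList<Node>();
        for (Node n : heap) {
            if (n.marked) {
                n.marked = false;
                survivors.add(n);
            } // unmarked nodes are simply dropped ("freed")
        }
        return survivors;
    }
}

Mark-compact adds a pass that slides the survivors together; a copying collector instead evacuates them into To-Space and flips the two spaces, as the following slides show.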

(54)

Mark-Compact

(55)

Mark-Compact – Marking

(56)

Mark-Compact – Compacting

(57)

Copying

(diagram: From-Space in use, To-Space unused)

(58)

Copying – Evacuation

(diagram: live objects reachable from Root are copied from From-Space into To-Space)

(59)

Copying – Flip

(diagram: From-Space and To-Space swap roles; Root, unused region and FreePtr marked)

(60)

Typically

Most objects die quickly.

Many objects never die.

(61)

Generational Garbage Collection

(diagram: allocation into the Young Generation, objects later promoted to the Old Generation)

(62)

Generational GC Benefits

For most Java applications, Generational GC is hard to beat.

> Many have tried, including Sun.

(63)

The Heap In The HotSpot JVM

Young Generation

Old Generation

(64)

HotSpot Young Generation

(diagram: the Young Generation consists of Eden plus two Survivor Spaces, “From” and “To”; one survivor space is unused; the Old Generation sits alongside)

(65)

Before Young Collection

(diagram: the same layout, shown just before a young collection)

(66)

TLABs

Thread-Local Allocation Buffers

Each application thread gets a TLAB to allocate into
> TLABs allocated in the Eden
> Bump-a-pointer allocation; fast
> No synchronization (thread “owns” TLAB for allocation)

Only synchronization when getting a new TLAB
> Bump-a-pointer to allocate TLAB too; also fast

Allocation code inlined

(67)

Attaching to process ID 9134, please wait...
Debugger attached successfully.
Server compiler detected. JVM version is 14.0-b16

using thread-local object allocation.
Parallel GC with 2 thread(s)

Heap Configuration:
   MinHeapFreeRatio = 40
   MaxHeapFreeRatio = 70
   MaxHeapSize      = 1040187392 (992.0MB)
   NewSize          = 2686976 (2.5625MB)
   MaxNewSize       = 17592186044415 MB
   OldSize          = 5439488 (5.1875MB)
   NewRatio         = 2
   SurvivorRatio    = 8
   PermSize         = 21757952 (20.75MB)
   MaxPermSize      = 88080384 (84.0MB)
...

(68)

PS Young Generation
Eden Space:
   capacity = 105512960 (100.625MB)
   used     = 55139968 (52.5855712890625MB)
   free     = 50372992 (48.0394287109375MB)
   52.258952833850934% used
From Space:
   capacity = 2424832 (2.3125MB)
   used     = 0 (0.0MB)
   free     = 2424832 (2.3125MB)
   0.0% used
To Space:
   capacity = 2359296 (2.25MB)
   used     = 0 (0.0MB)
   free     = 2359296 (2.25MB)
   0.0% used
PS Old Generation
   capacity = 437125120 (416.875MB)
   used     = 308442480 (294.15367126464844MB)
   free     = 128682640 (122.72132873535156MB)
   70.56160030336395% used
PS Perm Generation
   capacity = 21757952 (20.75MB)
   used     = 2519424 (2.4027099609375MB)
   free     = 19238528 (18.3472900390625MB)
   11.579325112951807% used
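The two listings above look like output from the JDK’s jmap heap inspector; assuming that is the tool used, the invocation would be roughly (9134 being the process id shown above):

jmap -heap 9134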

(69)

Back to Example 3:

0 61934120 61934120
1 61934120 61934120
2 61934120 61934120
3 61934120 61934120
...

What if we disable TLABs?

java -XX:-UseTLAB com.dawidweiss.examples.Example07

0 61974064 61973112
1 61974064 61971256
2 61974064 61970648
3 61974064 61970040
4 61974064 61969432
...

(72)

GC “smackdown”

ParallelOldGC, SerialGC, G1GC, ParallelGC, ConcMarkSweepGC.

64-bit JVM (SUN, 1.6.0_14-b08).

OpenSuSE, 64-bit, 4GB RAM.

(74)

(plots: GC time, t [s], vs. uptime for the Loops benchmark, one panel per collector: logs/log-loop-UseConcMarkSweepGC.log, logs/log-loop-UseG1GC.log, logs/log-loop-UseParallelGC.log, logs/log-loop-UseParallelOldGC.log, logs/log-loop-UseSerialGC.log)

(75)

(plots: GC time, t [s], vs. uptime for the Random benchmark, one panel per collector: logs/log-random-UseConcMarkSweepGC.log, logs/log-random-UseG1GC.log, logs/log-random-UseParallelGC.log, logs/log-random-UseParallelOldGC.log, logs/log-random-UseSerialGC.log)

(76)

Hints

Memory allocation is usually very fast and optimized.

Don’t play with various garbage collectors, it’s not worth it; instead…

Allocate memory statically if high performance is needed, and…

Dump and inspect GC activity with -verbose:gc and other options.
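A concrete starting point (the exact flag combination is my suggestion; all of these switches exist in HotSpot of that era, and MyApplication stands in for your own main class):

java -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log MyApplication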

(82)

Example 4

(86)

But it’s the same program and the same VM!

(88)

public static void startThread() {
    new Thread() {
        public void run() {
            try {
                sleep(2000);
            } catch (Exception e) { /* ignore */ }
            System.out.println("Marking loop exit.");
            ready = true;
        }
    }.start();
}

public static void main(String[] args) {
    startThread();
    System.out.println("Entering the loop...");
    while (!ready) {
        // Do nothing.
    }
    System.out.println("Done, I left the loop!");
}

(89)

Imperative languages have become a way to express declarative…

(90)

while (!ready) {
    // Do nothing.
}

?

boolean r = ready;
while (!r) {
    // Do nothing.
}
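The slides leave the fix implicit; for this particular example the usual remedy (my addition) is to declare the flag volatile, which forbids hoisting the read out of the loop:

// every iteration now re-reads the field and sees the writer thread's update
private static volatile boolean ready;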

(94)

C1: fast, not (much) optimization

C2: slow(er) than C1, a lot of JMM-allowed optimizations

What about the javac -O option?

/* -O is a no-op, accepted for backward compatibility. */

(97)

There are hundreds of JVM tuning/diagnostic switches.
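To browse them yourself (my suggestion; the flag is available in newer HotSpot builds), the VM can dump every flag together with its final value:

java -XX:+PrintFlagsFinal -version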

(99)

Conclusions

Bytecode is far from what is executed.

A lot is going on under the (VM) hood.

Bad code may work, but will eventually crash.

HotSpot-level optimizations are good.

(103)

Example 5

public void testSum1() {
    int sum = 0;
    for (int i = 0; i < COUNT; i++)
        sum += sum1(i, i);
    result = sum;
}

public void testSum1_2() {
    int sum = 0;
    for (int i = 0; i < COUNT; i++)
        sum += sum1(i, i);
    // note: the sum is never stored in result
}

(104)

VM                 sum1    sum1_2
sun-1.6.0-20       0.04    0.00
sun-1.6.0-16       0.04    0.00
sun-1.5.0-18       0.04    0.00
ibm-1.6.2          0.08    0.01
jrockit-27.5.0     0.17    0.08
harmony-r917296    0.17    0.11

(108)

java -server -XX:+PrintOptoAssembly -XX:+PrintCompilation ...

- method holder: ’com/dawidweiss/geecon2010/Example03’
- access: 0xc1000001 public
- name: ’testSum1_2’
...
010   pushq  rbp
      subq   rsp, #16   # Create frame
      nop               # nop for patch_verified_entry
016   addq   rsp, 16    # Destroy frame
      popq   rbp
      testl  rax, [rip + #offset_to_poll_page]  # Safepoint: poll for GC
021   ret

(110)

Conclusions

HotSpot is smart and effective at removing dead code.
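This is also why the benchmarks in the surrounding examples assign their result to a guard field. A minimal version of that pattern (my sketch; making the field volatile is my choice, the slides only show a plain assignment):

// Publishing the result keeps HotSpot from proving the loop's work is unused.
public class GuardedLoop {
    public volatile int guard;

    public void testSum() {
        int sum = 0;
        for (int i = 0; i < 1000000; i++) {
            sum += sum1(i, i);
        }
        guard = sum; // remove this line and the whole loop may be eliminated
    }

    private int sum1(int a, int b) {
        return a + b;
    }
}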

(111)

Example 6

@Test
public void testAdd1() {
    int sum = 0;
    for (int i = 0; i < COUNT; i++) {
        sum += add1(i);
    }
    guard = sum;
}

public int add1(int i) {
    return i + 1;
}

(112)

testAdd1

-XX:+Inlining -XX:+PrintInlining    0.04
-XX:-Inlining                       0.45

(114)

Most Java calls are…

(115)

HotSpot adjusts to megamorphic calls automatically.

(116)

Example 7

abstract class Superclass {
    abstract int call();
}

class Sub1 extends Superclass { int call() { return 1; } }
class Sub2 extends Superclass { int call() { return 2; } }
class Sub3 extends Superclass { int call() { return 3; } }

Superclass[] mixed = initWithRandomInstances(10000);
Superclass[] solid = initWithSub1Instances(10000);

@Test
public void testMonomorphic() {
    int sum = 0;
    int m = solid.length;
    for (int i = 0; i < COUNT; i++)
        sum += solid[i % m].call();
    guard = sum;
}

@Test
public void testMegamorphic() {
    int sum = 0;
    int m = mixed.length;
    for (int i = 0; i < COUNT; i++)
        sum += mixed[i % m].call();
    guard = sum;
}

(117)

VM                 testMonomorphic    testMegamorphic
sun-1.6.0-20       0.19               0.32
sun-1.6.0-16       0.19               0.34
sun-1.5.0-18       0.18               0.34
ibm-1.6.2          0.20               0.30
jrockit-27.5.0     0.22               0.29
harmony-r917296    0.27               0.32

(118)

Example 8

@Test
public void testBitCount1() {
    int sum = 0;
    for (int i = 0; i < COUNT; i++)
        sum += Integer.bitCount(i);
    guard = sum;
}

@Test
public void testBitCount2() {
    int sum = 0;
    for (int i = 0; i < COUNT; i++)
        sum += bitCount(i);
    guard = sum;
}

/* Copied from {@link Integer#bitCount} */
static int bitCount(int i) {
    // HD, Figure 5-2
    i = i - ((i >>> 1) & 0x55555555);
    i = (i & 0x33333333) + ((i >>> 2) & 0x33333333);
    i = (i + (i >>> 4)) & 0x0f0f0f0f;
    i = i + (i >>> 8);
    i = i + (i >>> 16);
    return i & 0x3f;
}

(119)

VM               testBitCount1    testBitCount2
sun-1.6.0-20     0.43             0.43
sun-1.7.0-b80    0.43             0.43

(averages in sec., 10 measured rounds, 5 warmup, 64-bit Ubuntu, dual-core AMD Athlon 5200).

VM               testBitCount1    testBitCount2
sun-1.6.0-20     0.08             0.33
sun-1.7.0-b83    0.07             0.32

(121)

... -XX:+PrintInlining ...

Inlining intrinsic _bitCount_i at bci:9 in ..Example06::testBitCount1
Inlining intrinsic _bitCount_i at bci:9 in ..Example06::testBitCount1
Inlining intrinsic _bitCount_i at bci:9 in ..Example06::testBitCount1
Example06.testBitCount1: [measured 10 out of 15 rounds]
round: 0.07 [+- 0.00], round.gc: 0.00 [+- 0.00]
...
@ 9 com.dawidweiss.geecon2010.Example06::bitCount inline (hot)
@ 9 com.dawidweiss.geecon2010.Example06::bitCount inline (hot)
@ 9 com.dawidweiss.geecon2010.Example06::bitCount inline (hot)
Example06.testBitCount2: [measured 10 out of 15 rounds]

(123)

... -XX:+PrintOptoAssembly ...

{method}
- klass: {other class}
- method holder: com/dawidweiss/geecon2010/Example06
- name: testBitCount1
...
0c2   B13: #  B12 B14 <- B8 B12  Loop: B13-B12 inner stride: ...
0c2   movl    R10, RDX           # spill
...
0e1   movl    [rsp + #40], R11   # spill
0e6   popcnt  R8, R8
...
0f5   addl    R9, #7             # int
0f9   popcnt  R11, R11
0fe   popcnt  RCX, R9

(128)

Java can be very fast

(129)

Constant progress in JITs

(130)

More conclusions

Benchmarks must be statistically sound.
> averages, variance, min, max, warm-up phase

Account for HotSpot optimisations.

Account for hardware differences.
> test-on-target

Best: use domain data and real scenarios.

Inspect suspicious output with a debug JVM.
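The figures quoted throughout (“10 measured rounds, 5 warmup”) imply a harness roughly like the sketch below; this is my simplification, not the tool that produced the slides’ numbers:

// Minimal warm-up-then-measure loop; a serious harness also isolates GC time,
// randomizes ordering, and reports variance rather than just the mean.
public class TinyBench {
    public static double averageSeconds(Runnable benchmark, int warmupRounds, int measuredRounds) {
        for (int i = 0; i < warmupRounds; i++) {
            benchmark.run(); // let the JIT compile and stabilize first
        }
        long totalNanos = 0;
        for (int i = 0; i < measuredRounds; i++) {
            long start = System.nanoTime();
            benchmark.run();
            totalNanos += System.nanoTime() - start;
        }
        return (totalNanos / (double) measuredRounds) / 1e9;
    }
}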

(131)

Performance checklist

(sanity check)

Algorithms, algorithms, algorithms.

Proper data structures.

Spurious GC activity.

Memory barriers in tight loops.

CPU cache utilization.

(132)

Self-study

Overhead of volatile (memory barriers) and monitors (locks).

Lock contention and dealing with it.

Lockless, safe data structures.

Code reordering and potential breaking…
(133)

Cytaty

Powiązane dokumenty

Nowak w dalszej części swojej refleksji nad doświadczeniem Lasek podkreśla uniwersalność tego środowiska, mogącego inspirować inne środowiska, zwłaszcza te,

Niestety, poziom mo- nitorowania zagrożeń związanych z tym czynnikiem jest gorszy; opiera się ono głównie na danych subiektywnych (obserwacje i audyty oraz raporty załóg)..

It was originally on the sixth of June, but it seems that Sarah’s mother won’t leave hospital until the tenth of June, so we really didn’t have any choice..

Przy jego opracowaniu NRA kierowała się zadaniami adwo­ katury wynikającymi z ustawy-Prawo o adwokaturze, wytycznymi Rady Państwa określonymi przy rozpatrywaniu

Pojawił się też generał Fedon Gizikis, który jako pierwszy od czternastu miesięcy zwrócił się do Alekosa per „pan” [U 99] i któremu – jako „dżentelmenowi, osobie

Zaproponowany, nowy instrument stabilizacji dochodów (IST) rodzi jednak pewne obawy. Mianowicie podstawą jego wdrożenia i realizacji jest ewidencja dochodów

Ex- plosive mixtures of dust and air may form during transport (e.g. in bucket elevators) and during the storage of raw mate- rials such as cereals, sugar and flour. An explosion

`Sustainable housing' is defined as housing with a minimum of negative environmental impacts in terms of climate change (greenhouse effect); the quality of air, water, and soil;