
(1)

The marvel garden; inside the Java Virtual Machine

Dawid Weiss

(4)

Talk Outline

Philosophical note about performance.

Data types, with consequences.

Memory allocation, garbage collection.

HotSpot optimizations.

This talk is an abbreviated version of several presentations given at the Poznań Java Users Group (JUG) and at GeeCon’2010.

(5)

Divide-and-conquer style algorithm

for (Example e : examples) {
    if (e.hasQuiz()) e.showQuiz(); else e.showCode();
    e.explain();
    e.deriveConclusions();
}

(6)

Philosophical note about performance.

(7)

Is Java faster than C/C++?

The short answer is: it depends.

(8)

It’s usually hard to make a fast program run faster.

It’s easy to make a slow program run even slower.

It’s easy to make fast …

(11)

Sanity check

Algorithms, algorithms, algorithms.

Proper data structures.

(long time nothing)

(14)

Data types, with consequences.

(15)

Hybrid data types in Java

primitive types (non-objects),

object types, …

(16)

Example 1.

declaration                        size in memory (approx.)

byte b = 1                         1
Byte b = 1                         ref + 8 + 1
double d = 1                       8
Double d = 1                       ref + 8 + 8
char c = 'c'                       2
byte[] b = new byte[100]           12 + 100
Byte[] b = new Byte[100]           12 + ref × 100 + objects
boolean[] b = new boolean[100]     ?
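To see the last rows of the table for yourself, a crude measurement (my sketch, not from the slides; the class name and element count are arbitrary) compares the heap consumed by a primitive array with its boxed counterpart:

// Rough, illustrative measurement of boxing overhead; results vary with the
// VM, pointer size and whatever else the heap is doing at the time.
public class BoxingFootprint {
    static Object[] sink = new Object[2]; // keeps both arrays strongly reachable

    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        final int n = 100000;

        long before = usedHeap();
        double[] primitives = new double[n];
        sink[0] = primitives;
        long afterPrimitives = usedHeap();

        Double[] boxed = new Double[n];
        for (int i = 0; i < n; i++) {
            boxed[i] = Double.valueOf(i); // one extra heap object per element
        }
        sink[1] = boxed;
        long afterBoxed = usedHeap();

        System.out.println("double[" + n + "]: ~" + (afterPrimitives - before) + " bytes");
        System.out.println("Double[" + n + "]: ~" + (afterBoxed - afterPrimitives) + " bytes");
    }
}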

(25)

String s = "c"

?

/** The value is used for character storage. */
private final char value[];

/** The offset is the first index of the storage that is used. */
private final int offset;

/** The count is the number of characters in the String. */
private final int count;

/** Cache the hash code for the string */
private int hash; // Default to 0
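As a rough estimate (mine, not a figure from the slides), on a 32-bit HotSpot of that era String s = "c" costs about:

String object:  8 (header) + 4 (value ref) + 4 (offset) + 4 (count) + 4 (hash) = 24 bytes
char[1] array: 12 (header + length) + 2 (data), padded to 16 bytes

so roughly 40 bytes in total, versus 2 bytes for a bare char.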

(28)

Does it make Java inefficient?

No. It’s the price of comfort (or laziness).

(30)

Example 2.

public void testSum1() {
    int sum = 0;
    for (int i = 0; i < COUNT; i++)
        sum += sum1(i, i);
    result = sum;
}

public void testSum2() {
    int sum = 0;
    for (int i = 0; i < COUNT; i++)
        sum += sum2(i, i);
    result = sum;
}

where the bodies of sum1 and sum2 sum their arguments and return the result, and COUNT is significantly large…

(32)

VM                 sum1    sum2
sun-1.6.0-20       0.04    2.62
sun-1.6.0-16       0.04    3.20
sun-1.5.0-18       0.04    3.29
ibm-1.6.2          0.08    6.28
jrockit-27.5.0     0.18    0.16
harmony-r917296    0.17    0.35

(40)

With two more columns for the array-based variants defined on slide (42), sum3 and sum4:

VM                 sum1    sum2    sum3    sum4
sun-1.6.0-20       0.04    2.62    1.05    3.76
sun-1.6.0-16       0.04    3.20    1.39    4.99
sun-1.5.0-18       0.04    3.29    1.46    5.20
ibm-1.6.2          0.08    6.28    0.16    14.64
jrockit-27.5.0     0.18    0.16    1.16    3.18
harmony-r917296    0.17    0.35    9.18    22.49

(41)

int sum1(int a, int b) {
    return a + b;
}

Integer sum2(Integer a, Integer b) {
    return a + b;
}

// what sum2 really compiles to:
Integer sum2(Integer a, Integer b) {
    return Integer.valueOf(
        a.intValue() + b.intValue());
}

(42)

int sum3(int[] args) {
    int sum = 0;
    for (int i = 0; i < args.length; i++)
        sum += args[i];
    return sum;
}

Integer sum4(Integer[] args) {
    int sum = 0;
    for (int i = 0; i < args.length; i++) {
        sum += args[i];
    }
    return sum;
}
// ...

(43)

Conclusions

Syntactic sugar may be costly.

Primitive types are fast on most VMs.

(44)

Hints

If you have a great deal of computations and use collections (lists, maps), take a peek at:

fastutil
http://fastutil.dsi.unimi.it/

Apache Mahout Collections
http://lucene.apache.org/mahout/

HPPC
http://labs.carrotsearch.com/hppc.html

PCJ is unmaintained and buggy, avoid.
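To see why such libraries exist (an illustration of mine using only the JDK), compare a boxed map with plain primitive arrays for an int-to-int table; the boxed version allocates an Integer for nearly every key and value, plus an entry object per mapping:

import java.util.HashMap;
import java.util.Map;

// Rough comparison of a boxed JDK map vs. parallel primitive arrays; the
// timings are only indicative and depend heavily on the VM and heap size.
public class PrimitiveVsBoxed {
    public static void main(String[] args) {
        final int n = 200000;

        long t0 = System.nanoTime();
        Map<Integer, Integer> boxed = new HashMap<Integer, Integer>();
        for (int i = 0; i < n; i++) {
            boxed.put(i, i); // autoboxes key and value, allocates an entry
        }
        long t1 = System.nanoTime();

        int[] keys = new int[n];
        int[] values = new int[n];
        for (int i = 0; i < n; i++) {
            keys[i] = i;   // no boxing, no per-element objects
            values[i] = i;
        }
        long t2 = System.nanoTime();

        System.out.println("HashMap<Integer,Integer>: " + (t1 - t0) / 1000000 + " ms");
        System.out.println("two int[] arrays:         " + (t2 - t1) / 1000000 + " ms");
    }
}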

(46)

Example 3.

public static void main(String[] args) throws Exception {
    final long start = Runtime.getRuntime().freeMemory();
    final byte[][] arrays = new byte[100][];
    for (int i = 0; i < arrays.length; i++) {
        arrays[i] = new byte[100];
        long current = Runtime.getRuntime().freeMemory();
        System.out.println(start + " " + current);
        Thread.sleep(1000);
    }
    ...

(47)

1 61934120 61934120
2 61934120 61934120
3 61934120 61934120
4 61934120 61934120
5 61934120 61934120
6 61934120 61934120
7 61934120 61934120
...

(48)

Conclusions

Memory allocation is very different from C/C++.

There is no single “memory allocation scheme”.

(50)

Garbage detection

direct garbage identification (e.g., reference counting),

live object marking.

(51)

Mark-Sweep

(52)

Mark-Sweep – Marking

(53)

Mark-Sweep – Sweeping
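These slides illustrate the collectors with diagrams only. As a rough mental model (my own sketch, not how HotSpot is implemented), mark-sweep reduces to two passes over the object graph:

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy mark-sweep over an explicit object graph; a real collector works on the
// heap and on root sets (stacks, registers, statics), not on a List of nodes.
class ToyMarkSweep {
    static class Node {
        boolean marked;
        List<Node> references = new ArrayList<Node>();
    }

    // Mark: flood-fill everything reachable from the roots.
    static void mark(List<Node> roots) {
        Deque<Node> pending = new ArrayDeque<Node>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (!n.marked) {
                n.marked = true;
                pending.addAll(n.references);
            }
        }
    }

    // Sweep: anything left unmarked is garbage; survivors are unmarked again
    // so the next cycle starts clean. No compaction, hence fragmentation.
    static List<Node> sweep(List<Node> heap) {
        List<Node> survivors = new ArrayList<Node>();
        for (Node n : heap) {
            if (n.marked) {
                n.marked = false;
                survivors.add(n);
            } // unmarked nodes are simply dropped ("freed")
        }
        return survivors;
    }
}

Mark-compact adds a pass that slides the survivors together; a copying collector instead evacuates them into To-Space and flips the two spaces, as the following slides show.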

(54)

Mark-Compact

(55)

Mark-Compact – Marking

(56)

Mark-Compact – Compacting

(57)

Copying

(diagram: From-Space in use, To-Space unused)

(58)

Copying – Evacuation

(diagram: live objects reachable from Root are copied from From-Space into To-Space)

(59)

Copying – Flip

(diagram: From-Space and To-Space swap roles; Root, unused region and FreePtr marked)

(60)

Typically

Most objects die quickly.

Many objects never die.

(61)

Generational Garbage Collection

(diagram: allocation into the Young Generation, objects later promoted to the Old Generation)

(62)

Generational GC Benefits

For most Java applications, Generational GC is hard to beat.

> Many have tried, including Sun.

(63)

The Heap In The HotSpot JVM

Young Generation

Old Generation

(64)

HotSpot Young Generation

(diagram: the Young Generation consists of Eden plus two Survivor Spaces, “From” and “To”; one survivor space is unused; the Old Generation sits alongside)

(65)

Before Young Collection

(diagram: the same layout, shown just before a young collection)

(66)

TLABs

Thread-Local Allocation Buffers

Each application thread gets a TLAB to allocate into
> TLABs allocated in the Eden
> Bump-a-pointer allocation; fast
> No synchronization (thread “owns” TLAB for allocation)

Only synchronization when getting a new TLAB
> Bump-a-pointer to allocate TLAB too; also fast

Allocation code inlined

(67)

Attaching to process ID 9134, please wait...
Debugger attached successfully.
Server compiler detected. JVM version is 14.0-b16

using thread-local object allocation.
Parallel GC with 2 thread(s)

Heap Configuration:
   MinHeapFreeRatio = 40
   MaxHeapFreeRatio = 70
   MaxHeapSize      = 1040187392 (992.0MB)
   NewSize          = 2686976 (2.5625MB)
   MaxNewSize       = 17592186044415 MB
   OldSize          = 5439488 (5.1875MB)
   NewRatio         = 2
   SurvivorRatio    = 8
   PermSize         = 21757952 (20.75MB)
   MaxPermSize      = 88080384 (84.0MB)
...

(68)

PS Young Generation
Eden Space:
   capacity = 105512960 (100.625MB)
   used     = 55139968 (52.5855712890625MB)
   free     = 50372992 (48.0394287109375MB)
   52.258952833850934% used
From Space:
   capacity = 2424832 (2.3125MB)
   used     = 0 (0.0MB)
   free     = 2424832 (2.3125MB)
   0.0% used
To Space:
   capacity = 2359296 (2.25MB)
   used     = 0 (0.0MB)
   free     = 2359296 (2.25MB)
   0.0% used
PS Old Generation
   capacity = 437125120 (416.875MB)
   used     = 308442480 (294.15367126464844MB)
   free     = 128682640 (122.72132873535156MB)
   70.56160030336395% used
PS Perm Generation
   capacity = 21757952 (20.75MB)
   used     = 2519424 (2.4027099609375MB)
   free     = 19238528 (18.3472900390625MB)
   11.579325112951807% used
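The two listings above look like output from the JDK’s jmap heap inspector; assuming that is the tool used, the invocation would be roughly (9134 being the process id shown above):

jmap -heap 9134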

(69)

Back to Example 3:

0 61934120 61934120
1 61934120 61934120
2 61934120 61934120
3 61934120 61934120
...

What if we disable TLABs?

java -XX:-UseTLAB com.dawidweiss.examples.Example07

0 61974064 61973112
1 61974064 61971256
2 61974064 61970648
3 61974064 61970040
4 61974064 61969432
...

(72)

GC “smackdown”

ParallelOldGC, SerialGC, G1GC, ParallelGC, ConcMarkSweepGC.

64-bit JVM (SUN, 1.6.0_14-b08).

OpenSuSE, 64-bit, 4GB RAM.

(74)

(plots: GC time, t [s], vs. uptime for the Loops benchmark, one panel per collector: logs/log-loop-UseConcMarkSweepGC.log, logs/log-loop-UseG1GC.log, logs/log-loop-UseParallelGC.log, logs/log-loop-UseParallelOldGC.log, logs/log-loop-UseSerialGC.log)

(75)

(plots: GC time, t [s], vs. uptime for the Random benchmark, one panel per collector: logs/log-random-UseConcMarkSweepGC.log, logs/log-random-UseG1GC.log, logs/log-random-UseParallelGC.log, logs/log-random-UseParallelOldGC.log, logs/log-random-UseSerialGC.log)

(76)

Hints

Memory allocation is usually very fast and optimized.

Don’t play with various garbage collectors, it’s not worth it; instead…

Allocate memory statically if high performance is needed, and…

Dump and inspect GC activity with -verbose:gc and other options.
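A concrete starting point (the exact flag combination is my suggestion; all of these switches exist in HotSpot of that era, and MyApplication stands in for your own main class):

java -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log MyApplication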

(82)

Example 4

(86)

But it’s the same program and the same VM!

(88)

public static void startThread() {
    new Thread() {
        public void run() {
            try {
                sleep(2000);
            } catch (Exception e) { /* ignore */ }
            System.out.println("Marking loop exit.");
            ready = true;
        }
    }.start();
}

public static void main(String[] args) {
    startThread();
    System.out.println("Entering the loop...");
    while (!ready) {
        // Do nothing.
    }
    System.out.println("Done, I left the loop!");
}

(89)

Imperative languages have become a way to express declarative…

(90)

while (!ready) {
    // Do nothing.
}

?

boolean r = ready;
while (!r) {
    // Do nothing.
}
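The slides leave the fix implicit; for this particular example the usual remedy (my addition) is to declare the flag volatile, which forbids hoisting the read out of the loop:

// every iteration now re-reads the field and sees the writer thread's update
private static volatile boolean ready;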

(94)

C1: fast, not (much) optimization

C2: slow(er) than C1, a lot of JMM-allowed optimizations

What about the javac -O option?

/* -O is a no-op, accepted for backward compatibility. */

(97)

There are hundreds of JVM tuning/diagnostic switches.
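To browse them yourself (my suggestion; the flag is available in newer HotSpot builds), the VM can dump every flag together with its final value:

java -XX:+PrintFlagsFinal -version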

(99)

Conclusions

Bytecode is far from what is executed.

A lot is going on under the (VM) hood.

Bad code may work, but will eventually crash.

HotSpot-level optimizations are good.

(103)

Example 5

public void testSum1() {
    int sum = 0;
    for (int i = 0; i < COUNT; i++)
        sum += sum1(i, i);
    result = sum;
}

public void testSum1_2() {
    int sum = 0;
    for (int i = 0; i < COUNT; i++)
        sum += sum1(i, i);
    // note: the sum is never stored in result
}

(104)

VM                 sum1    sum1_2
sun-1.6.0-20       0.04    0.00
sun-1.6.0-16       0.04    0.00
sun-1.5.0-18       0.04    0.00
ibm-1.6.2          0.08    0.01
jrockit-27.5.0     0.17    0.08
harmony-r917296    0.17    0.11

(108)

java -server -XX:+PrintOptoAssembly -XX:+PrintCompilation ...

- method holder: ’com/dawidweiss/geecon2010/Example03’
- access: 0xc1000001 public
- name: ’testSum1_2’
...
010   pushq  rbp
      subq   rsp, #16   # Create frame
      nop               # nop for patch_verified_entry
016   addq   rsp, 16    # Destroy frame
      popq   rbp
      testl  rax, [rip + #offset_to_poll_page]  # Safepoint: poll for GC
021   ret

(110)

Conclusions

HotSpot is smart and effective at removing dead code.
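This is also why the benchmarks in the surrounding examples assign their result to a guard field. A minimal version of that pattern (my sketch; making the field volatile is my choice, the slides only show a plain assignment):

// Publishing the result keeps HotSpot from proving the loop's work is unused.
public class GuardedLoop {
    public volatile int guard;

    public void testSum() {
        int sum = 0;
        for (int i = 0; i < 1000000; i++) {
            sum += sum1(i, i);
        }
        guard = sum; // remove this line and the whole loop may be eliminated
    }

    private int sum1(int a, int b) {
        return a + b;
    }
}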

(111)

Example 6

@Test
public void testAdd1() {
    int sum = 0;
    for (int i = 0; i < COUNT; i++) {
        sum += add1(i);
    }
    guard = sum;
}

public int add1(int i) {
    return i + 1;
}

(112)

testAdd1

-XX:+Inlining -XX:+PrintInlining    0.04
-XX:-Inlining                       0.45

(114)

Most Java calls are…

(115)

HotSpot adjusts to megamorphic calls automatically.

(116)

Example 7

abstract class Superclass {
    abstract int call();
}

class Sub1 extends Superclass { int call() { return 1; } }
class Sub2 extends Superclass { int call() { return 2; } }
class Sub3 extends Superclass { int call() { return 3; } }

Superclass[] mixed = initWithRandomInstances(10000);
Superclass[] solid = initWithSub1Instances(10000);

@Test
public void testMonomorphic() {
    int sum = 0;
    int m = solid.length;
    for (int i = 0; i < COUNT; i++)
        sum += solid[i % m].call();
    guard = sum;
}

@Test
public void testMegamorphic() {
    int sum = 0;
    int m = mixed.length;
    for (int i = 0; i < COUNT; i++)
        sum += mixed[i % m].call();
    guard = sum;
}

(117)

VM                 testMonomorphic    testMegamorphic
sun-1.6.0-20       0.19               0.32
sun-1.6.0-16       0.19               0.34
sun-1.5.0-18       0.18               0.34
ibm-1.6.2          0.20               0.30
jrockit-27.5.0     0.22               0.29
harmony-r917296    0.27               0.32

(118)

Example 8

@Test
public void testBitCount1() {
    int sum = 0;
    for (int i = 0; i < COUNT; i++)
        sum += Integer.bitCount(i);
    guard = sum;
}

@Test
public void testBitCount2() {
    int sum = 0;
    for (int i = 0; i < COUNT; i++)
        sum += bitCount(i);
    guard = sum;
}

/* Copied from {@link Integer#bitCount} */
static int bitCount(int i) {
    // HD, Figure 5-2
    i = i - ((i >>> 1) & 0x55555555);
    i = (i & 0x33333333) + ((i >>> 2) & 0x33333333);
    i = (i + (i >>> 4)) & 0x0f0f0f0f;
    i = i + (i >>> 8);
    i = i + (i >>> 16);
    return i & 0x3f;
}

(119)

VM               testBitCount1    testBitCount2
sun-1.6.0-20     0.43             0.43
sun-1.7.0-b80    0.43             0.43

(averages in sec., 10 measured rounds, 5 warmup, 64-bit Ubuntu, dual-core AMD Athlon 5200).

VM               testBitCount1    testBitCount2
sun-1.6.0-20     0.08             0.33
sun-1.7.0-b83    0.07             0.32

(121)

... -XX:+PrintInlining ...

Inlining intrinsic _bitCount_i at bci:9 in ..Example06::testBitCount1
Inlining intrinsic _bitCount_i at bci:9 in ..Example06::testBitCount1
Inlining intrinsic _bitCount_i at bci:9 in ..Example06::testBitCount1
Example06.testBitCount1: [measured 10 out of 15 rounds]
round: 0.07 [+- 0.00], round.gc: 0.00 [+- 0.00]
...
@ 9 com.dawidweiss.geecon2010.Example06::bitCount inline (hot)
@ 9 com.dawidweiss.geecon2010.Example06::bitCount inline (hot)
@ 9 com.dawidweiss.geecon2010.Example06::bitCount inline (hot)
Example06.testBitCount2: [measured 10 out of 15 rounds]

(123)

... -XX:+PrintOptoAssembly ...

{method}
- klass: {other class}
- method holder: com/dawidweiss/geecon2010/Example06
- name: testBitCount1
...
0c2   B13: #  B12 B14 <- B8 B12  Loop: B13-B12 inner stride: ...
0c2   movl    R10, RDX           # spill
...
0e1   movl    [rsp + #40], R11   # spill
0e6   popcnt  R8, R8
...
0f5   addl    R9, #7             # int
0f9   popcnt  R11, R11
0fe   popcnt  RCX, R9

(128)

Java can be very fast

(129)

Constant progress in JITs

(130)

More conclusions

Benchmarks must be statistically sound.
> averages, variance, min, max, warm-up phase

Account for HotSpot optimisations.

Account for hardware differences.
> test-on-target

Best: use domain data and real scenarios.

Inspect suspicious output with a debug JVM.
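The figures quoted throughout (“10 measured rounds, 5 warmup”) imply a harness roughly like the sketch below; this is my simplification, not the tool that produced the slides’ numbers:

// Minimal warm-up-then-measure loop; a serious harness also isolates GC time,
// randomizes ordering, and reports variance rather than just the mean.
public class TinyBench {
    public static double averageSeconds(Runnable benchmark, int warmupRounds, int measuredRounds) {
        for (int i = 0; i < warmupRounds; i++) {
            benchmark.run(); // let the JIT compile and stabilize first
        }
        long totalNanos = 0;
        for (int i = 0; i < measuredRounds; i++) {
            long start = System.nanoTime();
            benchmark.run();
            totalNanos += System.nanoTime() - start;
        }
        return (totalNanos / (double) measuredRounds) / 1e9;
    }
}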

(131)

Performance checklist

(sanity check)

Algorithms, algorithms, algorithms.

Proper data structures.

Spurious GC activity.

Memory barriers in tight loops.

CPU cache utilization.

(132)

Self-study

Overhead of volatile (memory barriers) and monitors (locks).

Lock contention and dealing with it.

Lockless, safe data structures.

Code reordering and potential breaking…
(133)

Cytaty

Powiązane dokumenty

Nowak w dalszej części swojej refleksji nad doświadczeniem Lasek podkreśla uniwersalność tego środowiska, mogącego inspirować inne środowiska, zwłaszcza te,

Niestety, poziom mo- nitorowania zagrożeń związanych z tym czynnikiem jest gorszy; opiera się ono głównie na danych subiektywnych (obserwacje i audyty oraz raporty załóg)..

It was originally on the sixth of June, but it seems that Sarah’s mother won’t leave hospital until the tenth of June, so we really didn’t have any choice..

Przy jego opracowaniu NRA kierowała się zadaniami adwo­ katury wynikającymi z ustawy-Prawo o adwokaturze, wytycznymi Rady Państwa określonymi przy rozpatrywaniu

Pojawił się też generał Fedon Gizikis, który jako pierwszy od czternastu miesięcy zwrócił się do Alekosa per „pan” [U 99] i któremu – jako „dżentelmenowi, osobie

Zaproponowany, nowy instrument stabilizacji dochodów (IST) rodzi jednak pewne obawy. Mianowicie podstawą jego wdrożenia i realizacji jest ewidencja dochodów

Ex- plosive mixtures of dust and air may form during transport (e.g. in bucket elevators) and during the storage of raw mate- rials such as cereals, sugar and flour. An explosion

`Sustainable housing' is defined as housing with a minimum of negative environmental impacts in terms of climate change (greenhouse effect); the quality of air, water, and soil;