The marvel garden: inside the Java Virtual Machine
Dawid Weiss
Talk Outline
• Philosophical note about performance.
• Data types, with consequences.
• Memory allocation, garbage collection.
• HotSpot optimizations.

This talk is an abbreviated version of several presentations given at the Poznań Java Users Group (JUG) and at GeeCon’2010.
Divide-and-conquer style algorithm:

for (Example e : examples) {
    if (e.hasQuiz()) e.showQuiz(); else e.showCode();
    e.explain();
    e.deriveConclusions();
}
Philosophical note about performance.

Is Java faster than C/C++?
The short answer is: it depends.

It’s usually hard to make a fast program run faster.
It’s easy to make a slow program run even slower.
It’s easy to make a fast program run slower.
Sanity check
• Algorithms, algorithms, algorithms.
• Proper data structures.
• (and then, for a long time, nothing)
Data types, with consequences.

Hybrid data types in Java:
• primitive types (non-objects),
• object types.
Example 1.

declaration                      size in-memory (approx.)
byte b = 1                       1
Byte b = 1                       ref + 8 + 1
double d = 1                     8
Double d = 1                     ref + 8 + 8
char c = ’c’                     2
byte[] b = new byte[100]         12 + 100
Byte[] b = new Byte[100]         12 + ref × 100 + objects
boolean[] b = new boolean[100]   12 + 100
String s = "c"                   ?
The fields of java.lang.String (sources of that era):

/** The value is used for character storage. */
private final char value[];
/** The offset is the first index of the storage that is used. */
private final int offset;
/** The count is the number of characters in the String. */
private final int count;
/** Cache the hash code for the string */
private int hash; // Default to 0
Does it make Java inefficient? No. It’s the price of comfort (or laziness).
Example 2.

public void testSum1() {
    int sum = 0;
    for (int i = 0; i < COUNT; i++) sum += sum1(i, i);
    result = sum;
}

public void testSum2() {
    int sum = 0;
    for (int i = 0; i < COUNT; i++) sum += sum2(i, i);
    result = sum;
}

where the body of sum1 and sum2 sums the arguments and returns the result, and COUNT is significantly large. . .
VM               sum1  sum2
sun-1.6.0-20     0.04  2.62
sun-1.6.0-16     0.04  3.20
sun-1.5.0-18     0.04  3.29
ibm-1.6.2        0.08  6.28
jrockit-27.5.0   0.18  0.16
harmony-r917296  0.17  0.35
VM               sum1  sum2  sum3   sum4
sun-1.6.0-20     0.04  2.62  1.05   3.76
sun-1.6.0-16     0.04  3.20  1.39   4.99
sun-1.5.0-18     0.04  3.29  1.46   5.20
ibm-1.6.2        0.08  6.28  0.16  14.64
jrockit-27.5.0   0.18  0.16  1.16   3.18
harmony-r917296  0.17  0.35  9.18  22.49
int sum1(int a, int b) {
    return a + b;
}

Integer sum2(Integer a, Integer b) {
    return a + b;
}
        ↓
Integer sum2(Integer a, Integer b) {
    return Integer.valueOf(a.intValue() + b.intValue());
}

int sum3(int[] args) {
    int sum = 0;
    for (int i = 0; i < args.length; i++) sum += args[i];
    return sum;
}

Integer sum4(Integer[] args) {
    int sum = 0;
    for (int i = 0; i < args.length; i++) {
        sum += args[i];
    }
    return sum;
}
        ↓
Integer sum4(Integer[] args) {
    // ...
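The desugared form above is why the boxed variant is slow: every call may allocate an Integer. A minimal self-contained sketch of the difference (the class name BoxingDemo and the COUNT value are mine, not from the slides; method names mirror the slides):

```java
public class BoxingDemo {
    // Primitive version: pure arithmetic, no allocation.
    static int sum1(int a, int b) { return a + b; }

    // Boxed version: each call unboxes both arguments and may
    // allocate a new Integer for the result (Integer.valueOf).
    static Integer sum2(Integer a, Integer b) { return a + b; }

    public static void main(String[] args) {
        final int COUNT = 1_000_000;
        int s1 = 0, s2 = 0;
        long t0 = System.nanoTime();
        for (int i = 0; i < COUNT; i++) s1 += sum1(i, i);
        long t1 = System.nanoTime();
        for (int i = 0; i < COUNT; i++) s2 += sum2(i, i);
        long t2 = System.nanoTime();
        if (s1 != s2) throw new AssertionError("results differ");
        System.out.println("sum1: " + (t1 - t0) / 1e6
                + " ms, sum2: " + (t2 - t1) / 1e6 + " ms");
    }
}
```

The exact ratio depends on the VM, as the table above shows; the point is only that the two loops compute the same value by very different means.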
Conclusions
• Syntactic sugar may be costly.
• Primitive types are fast on most VMs.
Hints

If you have a great deal of computations and use collections (lists, maps), take a peek at:
• fastutil: http://fastutil.dsi.unimi.it/
• Apache Mahout Collections: http://lucene.apache.org/mahout/
• HPPC: http://labs.carrotsearch.com/hppc.html
• PCJ is unmaintained and buggy, avoid.
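What these libraries buy you is the absence of boxed keys and values. The effect is visible even with plain JDK types; a sketch (the histogram task and class name are my own example, not from the slides):

```java
import java.util.HashMap;
import java.util.Map;

public class PrimitiveCountDemo {
    // Boxed: every put/get boxes the int key and value,
    // and each entry is a separate heap object.
    static Map<Integer, Integer> countBoxed(int[] data) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int v : data) counts.merge(v, 1, Integer::sum);
        return counts;
    }

    // Primitive: one int[] of counters, no boxing, no per-entry objects.
    static int[] countPrimitive(int[] data, int maxValue) {
        int[] counts = new int[maxValue + 1];
        for (int v : data) counts[v]++;
        return counts;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 2, 3, 3, 3};
        Map<Integer, Integer> boxed = countBoxed(data);
        int[] prim = countPrimitive(data, 3);
        for (int v = 1; v <= 3; v++)
            if (boxed.get(v) != prim[v]) throw new AssertionError("mismatch at " + v);
        System.out.println("counts agree");
    }
}
```

The primitive-collection libraries above give you the map API with the memory profile of the array version.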
Example 3.

public static void main(String[] args) throws Exception {
    final long start = Runtime.getRuntime().freeMemory();
    final byte[][] arrays = new byte[100][];
    for (int i = 0; i < arrays.length; i++) {
        arrays[i] = new byte[100];
        long current = Runtime.getRuntime().freeMemory();
        System.out.println(i + " " + start + " " + current);
        Thread.sleep(1000);
    }
    ...

1 61934120 61934120
2 61934120 61934120
3 61934120 61934120
4 61934120 61934120
5 61934120 61934120
6 61934120 61934120
7 61934120 61934120
...
Conclusions
• Memory allocation is very different from C/C++.
• There is no single “memory allocation scheme”.

Garbage detection
• direct garbage identification (e.g., reference counting),
• live object marking.
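The second strategy, live object marking, can be illustrated with a toy graph traversal (didactic only; real collectors walk raw heap memory, and the class names here are mine):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MarkDemo {
    static class Node {
        final List<Node> refs = new ArrayList<>();
        boolean marked;
    }

    // Mark phase: everything reachable from the root set is live;
    // whatever stays unmarked is garbage.
    static void mark(List<Node> roots) {
        Set<Node> seen = new HashSet<>();
        List<Node> stack = new ArrayList<>(roots);
        while (!stack.isEmpty()) {
            Node n = stack.remove(stack.size() - 1);
            if (seen.add(n)) {
                n.marked = true;
                stack.addAll(n.refs);
            }
        }
    }

    public static void main(String[] args) {
        Node a = new Node(), b = new Node(), garbage = new Node();
        a.refs.add(b);              // a -> b: both reachable from the root
        mark(List.of(a));           // root set: {a}
        System.out.println(a.marked + " " + b.marked + " " + garbage.marked);
        // prints: true true false
    }
}
```

Mark-Sweep, Mark-Compact, and Copying (all below) differ only in what happens after this marking step.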
Mark-Sweep: marking, then sweeping. [diagrams]

Mark-Compact: marking, then compacting. [diagrams]

Copying: live objects are evacuated from From-Space to To-Space, then the spaces flip; new allocation proceeds from FreePtr in the unused space. [diagrams]
Typically
• Most objects die quickly.
• Many objects never die.

Generational Garbage Collection: the heap is split into a Young Generation and an Old Generation; allocation happens in the young generation. [diagram]

Generational GC Benefits
• For most Java applications, Generational GC is hard to beat.
  > Many have tried,
  > including Sun.
The Heap In The HotSpot JVM: Young Generation + Old Generation. [diagram]

HotSpot Young Generation: Eden plus two Survivor Spaces (From and To; one of them unused at any time). [diagram]

Before Young Collection. [diagram]
TLABs
• Thread-Local Allocation Buffers.
• Each application thread gets a TLAB to allocate into.
  > TLABs are allocated in the Eden.
  > Bump-a-pointer allocation; fast.
  > No synchronization (thread “owns” TLAB for allocation).
• Synchronization is needed only when getting a new TLAB.
  > Bump-a-pointer to allocate the TLAB too; also fast.
• Allocation code is inlined.
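The bump-a-pointer scheme these bullets describe can be simulated in a few lines (purely illustrative; real TLABs live inside Eden and are managed by the VM, and all names here are mine):

```java
public class BumpAllocDemo {
    // A TLAB is just a contiguous chunk of memory plus a cursor ("top").
    static final byte[] tlab = new byte[1024];
    static int top = 0;

    // Allocation = one range check + one pointer bump. No locking is
    // needed because each thread owns its own TLAB.
    static int allocate(int size) {
        if (top + size > tlab.length) return -1; // would need a new TLAB
        int addr = top;
        top += size;
        return addr;
    }

    public static void main(String[] args) {
        int a = allocate(16);
        int b = allocate(24);
        System.out.println(a + " " + b + " top=" + top);
        // prints: 0 16 top=40
    }
}
```

This is why Example 3 below sees freeMemory() unchanged: small allocations are served from an already-reserved TLAB, not from the shared heap.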
Attaching to process ID 9134, please wait...
Debugger attached successfully.
Server compiler detected. JVM version is 14.0-b16

using thread-local object allocation.
Parallel GC with 2 thread(s)

Heap Configuration:
   MinHeapFreeRatio = 40
   MaxHeapFreeRatio = 70
   MaxHeapSize      = 1040187392 (992.0MB)
   NewSize          = 2686976 (2.5625MB)
   MaxNewSize       = 17592186044415 MB
   OldSize          = 5439488 (5.1875MB)
   NewRatio         = 2
   SurvivorRatio    = 8
   PermSize         = 21757952 (20.75MB)
   MaxPermSize      = 88080384 (84.0MB)
   ...

PS Young Generation
Eden Space:
   capacity = 105512960 (100.625MB)
   used     = 55139968 (52.5855712890625MB)
   free     = 50372992 (48.0394287109375MB)
   52.258952833850934% used
From Space:
   capacity = 2424832 (2.3125MB)
   used     = 0 (0.0MB)
   free     = 2424832 (2.3125MB)
   0.0% used
To Space:
   capacity = 2359296 (2.25MB)
   used     = 0 (0.0MB)
   free     = 2359296 (2.25MB)
   0.0% used
PS Old Generation
   capacity = 437125120 (416.875MB)
   used     = 308442480 (294.15367126464844MB)
   free     = 128682640 (122.72132873535156MB)
   70.56160030336395% used
PS Perm Generation
   capacity = 21757952 (20.75MB)
   used     = 2519424 (2.4027099609375MB)
   free     = 19238528 (18.3472900390625MB)
   11.579325112951807% used
Back to Example 3:

0 61934120 61934120
1 61934120 61934120
2 61934120 61934120
3 61934120 61934120
...

What if we disable TLABs?

java -XX:-UseTLAB com.dawidweiss.examples.Example07

0 61974064 61973112
1 61974064 61971256
2 61974064 61970648
3 61974064 61970040
4 61974064 61969432
...
GC “smackdown”
• ParallelOldGC, SerialGC, G1GC, ParallelGC, ConcMarkSweepGC.
• 64-bit JVM (SUN, 1.6.0_14-b08).
• OpenSuSE, 64-bit, 4GB RAM.

[Plots: GC time vs. uptime for the “loops” and “random” workloads, one panel per collector (ConcMarkSweepGC, G1GC, ParallelGC, ParallelOldGC, SerialGC).]
Hints
• Memory allocation is usually very fast and optimized.
• Don’t play with various garbage collectors, it’s not worth it; instead. . .
• Allocate memory statically if high performance is needed, and. . .
• Dump and inspect GC activity with -verbose:gc and other options.
Example 4

But it’s the same program and the same VM!

public static void startThread() {
    new Thread() {
        public void run() {
            try {
                sleep(2000);
            } catch (Exception e) { /* ignore */ }
            System.out.println("Marking loop exit.");
            ready = true;
        }
    }.start();
}

public static void main(String[] args) {
    startThread();
    System.out.println("Entering the loop...");
    while (!ready) {
        // Do nothing.
    }
    System.out.println("Done, I left the loop!");
}
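Without synchronization the VM is allowed to hoist the read of ready out of the loop, so the program may spin forever. A conventional fix is to make the flag volatile; a sketch (the class name is mine, and the sleep is shortened from the slides' 2 seconds):

```java
public class VisibleFlag {
    // volatile forbids caching the field in a register and hoisting the
    // read out of the loop: the reader must eventually see the write.
    static volatile boolean ready = false;

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> {
            try { Thread.sleep(100); } catch (InterruptedException e) { /* ignore */ }
            ready = true;
        });
        writer.start();
        while (!ready) {
            // Spin; guaranteed to terminate because ready is volatile.
        }
        writer.join();
        System.out.println("Done, I left the loop!");
    }
}
```

With a plain (non-volatile) field, whether the loop exits depends on which compiler (C1 or C2, below) happens to compile it.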
Imperative languages have become a way to express declarative intent.

while (!ready) { /* Do nothing. */ }
        ≡ ?
boolean r = ready;
while (!r) { /* Do nothing. */ }
C1:
• fast
• not (much) optimization
C2:
• slow(er) than C1
• a lot of JMM-allowed optimizations

What about the javac -O option?

/* -O is a no-op, accepted for backward compatibility. */
There are hundreds of JVM
tuning/diagnostic switches.
Conclusions
• Bytecode is far from what is executed.
• A lot is going on under the (VM) hood.
• Bad code may work, but will eventually crash.
• HotSpot-level optimizations are good.
Example 5

public void testSum1() {
    int sum = 0;
    for (int i = 0; i < COUNT; i++) sum += sum1(i, i);
    result = sum;
}

public void testSum1_2() {
    int sum = 0;
    for (int i = 0; i < COUNT; i++) sum += sum1(i, i);
}
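When the computed sum is never published, the JIT may prove the whole loop dead and remove it. Benchmarks guard against this by writing the result somewhere observable; a minimal sketch of the pattern (class and field names are mine):

```java
public class DceGuard {
    // A volatile "sink": storing here makes the loop's result observable,
    // so the JIT cannot eliminate the computation as dead code.
    static volatile int sink;

    static int sum1(int a, int b) { return a + b; }

    static void benchmark(int count) {
        int sum = 0;
        for (int i = 0; i < count; i++) sum += sum1(i, i);
        sink = sum; // without this store, the whole loop may be removed
    }

    public static void main(String[] args) {
        benchmark(1000);
        System.out.println("sink=" + sink);
        // prints: sink=999000
    }
}
```

This is exactly the role the result field plays in testSum1 above; testSum1_2 omits it, hence the near-zero timings in the table.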
VM               sum1  sum1_2
sun-1.6.0-20     0.04  0.00
sun-1.6.0-16     0.04  0.00
sun-1.5.0-18     0.04  0.00
ibm-1.6.2        0.08  0.01
jrockit-27.5.0   0.17  0.08
harmony-r917296  0.17  0.11
java -server -XX:+PrintOptoAssembly -XX:+PrintCompilation ...

{method}
 - method holder: ’com/dawidweiss/geecon2010/Example03’
 - access: 0xc1000001 public
 - name: ’testSum1_2’
...
010   pushq  rbp
      subq   rsp, #16    # Create frame
      nop                # nop for patch_verified_entry
016   addq   rsp, 16     # Destroy frame
      popq   rbp
      testl  rax, [rip + #offset_to_poll_page]   # Safepoint: poll for GC
021   ret
Conclusions
• HotSpot is smart and effective at removing dead code.
Example 6

@Test
public void testAdd1() {
    int sum = 0;
    for (int i = 0; i < COUNT; i++) {
        sum += add1(i);
    }
    guard = sum;
}

public int add1(int i) {
    return i + 1;
}
                                  testAdd1
-XX:+Inlining -XX:+PrintInlining  0.04
-XX:-Inlining                     0.45
Most Java calls are virtual. HotSpot adjusts to megamorphic calls automatically.
Example 7

abstract class Superclass {
    abstract int call();
}
class Sub1 extends Superclass { int call() { return 1; } }
class Sub2 extends Superclass { int call() { return 2; } }
class Sub3 extends Superclass { int call() { return 3; } }

Superclass[] mixed = initWithRandomInstances(10000);
Superclass[] solid = initWithSub1Instances(10000);

@Test
public void testMonomorphic() {
    int sum = 0;
    int m = solid.length;
    for (int i = 0; i < COUNT; i++) sum += solid[i % m].call();
    guard = sum;
}

@Test
public void testMegamorphic() {
    int sum = 0;
    int m = mixed.length;
    for (int i = 0; i < COUNT; i++) sum += mixed[i % m].call();
    guard = sum;
}
VM               testMonomorphic  testMegamorphic
sun-1.6.0-20     0.19             0.32
sun-1.6.0-16     0.19             0.34
sun-1.5.0-18     0.18             0.34
ibm-1.6.2        0.20             0.30
jrockit-27.5.0   0.22             0.29
harmony-r917296  0.27             0.32
Example 8

@Test
public void testBitCount1() {
    int sum = 0;
    for (int i = 0; i < COUNT; i++) sum += Integer.bitCount(i);
    guard = sum;
}

@Test
public void testBitCount2() {
    int sum = 0;
    for (int i = 0; i < COUNT; i++) sum += bitCount(i);
    guard = sum;
}

/* Copied from {@link Integer#bitCount} */
static int bitCount(int i) {
    // HD, Figure 5-2
    i = i - ((i >>> 1) & 0x55555555);
    i = (i & 0x33333333) + ((i >>> 2) & 0x33333333);
    i = (i + (i >>> 4)) & 0x0f0f0f0f;
    i = i + (i >>> 8);
    i = i + (i >>> 16);
    return i & 0x3f;
}
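As a sanity check, the copied method can be verified against the JDK's own Integer.bitCount (a sketch; the class name is mine):

```java
public class BitCountCheck {
    // Same algorithm as Integer.bitCount (Hacker's Delight, Figure 5-2).
    static int bitCount(int i) {
        i = i - ((i >>> 1) & 0x55555555);
        i = (i & 0x33333333) + ((i >>> 2) & 0x33333333);
        i = (i + (i >>> 4)) & 0x0f0f0f0f;
        i = i + (i >>> 8);
        i = i + (i >>> 16);
        return i & 0x3f;
    }

    public static void main(String[] args) {
        // Exhaustive over a small range, including negatives.
        for (int i = -1000; i <= 1000; i++)
            if (bitCount(i) != Integer.bitCount(i))
                throw new AssertionError("mismatch at " + i);
        System.out.println("bitCount matches Integer.bitCount");
    }
}
```

The two methods are bytecode-for-bytecode the same computation; the timing difference in the tables below comes entirely from the intrinsic substitution of the JDK method.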
VM             testBitCount1  testBitCount2
sun-1.6.0-20   0.43           0.43
sun-1.7.0-b80  0.43           0.43

(averages in sec., 10 measured rounds, 5 warmup, 64-bit Ubuntu, dual-core AMD Athlon 5200).

VM             testBitCount1  testBitCount2
sun-1.6.0-20   0.08           0.33
sun-1.7.0-b83  0.07           0.32
... -XX:+PrintInlining ...

Inlining intrinsic _bitCount_i at bci:9 in ..Example06::testBitCount1
Inlining intrinsic _bitCount_i at bci:9 in ..Example06::testBitCount1
Inlining intrinsic _bitCount_i at bci:9 in ..Example06::testBitCount1
Example06.testBitCount1: [measured 10 out of 15 rounds]
round: 0.07 [+- 0.00], round.gc: 0.00 [+- 0.00]
...
@ 9 com.dawidweiss.geecon2010.Example06::bitCount inline (hot)
@ 9 com.dawidweiss.geecon2010.Example06::bitCount inline (hot)
@ 9 com.dawidweiss.geecon2010.Example06::bitCount inline (hot)
Example06.testBitCount2: [measured 10 out of 15 rounds]
...
... -XX:+PrintOptoAssembly ...

{method}
 - klass: {other class}
 - method holder: com/dawidweiss/geecon2010/Example06
 - name: testBitCount1
...
0c2 B13:  # B12 B14 <- B8 B12  Loop: B13-B12 inner stride: ...
0c2   movl   R10, RDX          # spill
...
0e1   movl   [rsp + #40], R11  # spill
0e6   popcnt R8, R8
...
0f5   addl   R9, #7            # int
0f9   popcnt R11, R11
0fe   popcnt RCX, R9