Another, harder-to-explain example
This program was sent to me by Ilya Kriveshko and shows a somewhat more complicated
performance pattern. Here is his explanation:
I saw your page on double performance problems in HotSpot runtime,
and (in disbelief) tried to run some more tests. To my amazement, I found
that this may not even be an 8-byte boundary alignment problem. I decided
not to use recursion the way you did in your Fibonacci calculation (to take
that out of the equation), but use simple iteration instead. This would ensure
that in the same test batch the same function would be called with the same
stack frame alignment.
I use iteration to run several tests to ensure that separate test batches
are run with sliding stack alignment. (Pardon my wording - I'm sure you'll
understand what I mean when you look at the code.)
What I ran into showed me that the alignment problem may not be as simple
as your conclusion tells. It appears that several (i.e. more than two) different
levels of performance may be achieved depending on alignment on single, double,
triple, etc... word boundary. Please, compile and run the attached program
(based on yours, but uses iteration and computes a nonsense number) and look
at the pattern of varying execution times. It appears that every 2nd test
takes 2-3 times longer. But on top of that, every 16th test takes 5-6 times
longer. ... Actually, I just pulled the test data into a spreadsheet and
made a barchart out of it. You can see for yourself. There is a visible
harmonic pattern to it.
Also, running my program with -server also exhibits a similar pattern, although
the peaks are not nearly as pronounced. Overall my test program runs significantly
faster with -server. I attached the chart for the -server run as well.
I just thought you might like to know this, since you were interested in
the problem in the first place.
Also, I was hoping that you could put my test program on your website and
add a comment on Sun's site referring to it, since I don't have web space
readily available to me.
Thanks,
--
Ilya A. Kriveshko
And here is the program, DoubleTest.java, and charts of the results for the
client JVM and the server JVM.
A possible explanation?
My best guess as to why the results show such complicated patterns is that
the penalty for a misaligned double varies depending on whether the double
also crosses other boundaries, such as a cache-line boundary or a page
boundary. That would account for the several distinct levels of slowdown,
but it doesn't explain why the server JVM shows variations when other tests
indicated that it didn't suffer from the double misalignment problem. I
suspect it would take a detailed understanding of HotSpot and x86 behaviour
to fully explain these results. In my experiments, inserting unused extra
local int variables stabilized the performance variations somewhat, but not
fully. Perhaps one of the HotSpot engineers will eventually enlighten us.
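
To make the guess above concrete, here is a minimal sketch of the boundary arithmetic. The 64-byte cache line and 4096-byte page sizes are assumptions (common x86 values, not measured on the test machine), the addresses are made up for illustration, and the class and method names are hypothetical. Nothing here inspects real JVM stack addresses; it only shows how an 8-byte double placed on 4-byte steps can fall into several different categories.

```java
class BoundaryCheck {
    // Assumed sizes; not taken from the test machine or from DoubleTest.java.
    static final long DOUBLE_SIZE = 8;
    static final long CACHE_LINE = 64;
    static final long PAGE = 4096;

    // True if a value of 'size' bytes starting at 'address' spans a multiple of 'boundary'.
    static boolean crosses(long address, long size, long boundary) {
        return address / boundary != (address + size - 1) / boundary;
    }

    // Classify a hypothetical address for an 8-byte double, worst case first.
    // A page crossing is also a cache-line crossing, since PAGE is a multiple of CACHE_LINE.
    static String classify(long address) {
        if (crosses(address, DOUBLE_SIZE, PAGE)) return "crosses page";
        if (crosses(address, DOUBLE_SIZE, CACHE_LINE)) return "crosses cache line";
        if (address % DOUBLE_SIZE != 0) return "misaligned only";
        return "aligned";
    }

    public static void main(String[] args) {
        // Slide the address in 4-byte steps across a page boundary.
        for (long address = 4076; address <= 4100; address += 4) {
            System.out.println(address + ": " + classify(address));
        }
    }
}
```

If the penalty really does differ between these categories, a stack position that slides by one word per test would produce slowdowns with more than one period, which is at least consistent with the charts. That remains a guess rather than a measurement.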