Introduction
Garbage collection (GC) is an
integral part of the Java Virtual Machine (JVM) as it collects unused Java heap
memory so that the application can continue allocating new objects. The
effectiveness and performance of the GC play an important role in application
performance and determinism. The IBM JVM provided with IBM WebSphere
Application Server V8 (on supported platforms) provides four different GC
policy algorithms:
·
-Xgcpolicy:optthruput
·
-Xgcpolicy:optavgpause
·
-Xgcpolicy:gencon
·
-Xgcpolicy:balanced
Each of these algorithms provides
different performance and deterministic qualities. In addition, the default
policy in WebSphere Application Server V8 has changed from -Xgcpolicy:optthruput to the -Xgcpolicy:gencon policy. Let’s
take a look at each of these policies and see what this change in the default
policy means.
The garbage
collector
Different
applications naturally have different memory usage patterns. A computationally
intensive number crunching workload will not use the Java heap in the same way
as a highly transactional customer-facing interface. To optimally handle these
different sorts of workloads, different garbage collection strategies are
required. The IBM JVM supports several garbage collection policies to enable
you to choose the strategy that best fits your application
The parallel
mark-sweep-compact collector: optthruput
The simplest
possible garbage collection technique is to continue allocating until free
memory has been exhausted, then stop the application and process the entire
heap. While this results in a very efficient garbage collector, it means that
the user program must be able to tolerate the pauses introduced by the
collector. Workloads that are only concerned about overall throughput might
benefit from this strategy.
The optthruput policy (-Xgcpolicy:optthruput) implements this
strategy (Figure 1). This collector uses a parallel mark-sweep algorithm. In
a nutshell, this means that the collector first walks through the set of
reachable objects, marking them as live data. A second pass then sweeps away
the unmarked objects, leaving behind free memory than can be used for new
allocations. The majority of this work can be done in parallel, so the
collector uses additional threads (up to the number of CPUs by default) to get
the job done faster, reducing the time the application remains paused.
Figure 1. Application and collector CPU
usage: optthruput
The problem with a mark-sweep
algorithm is that it can lead to fragmentation (Figure 2).
There might be lots of free memory, but if it is in small slices interspersed
with live objects then no individual piece might be large enough to satisfy a
particular allocation.
The solution to this is compaction. In theory, the
compactor slides all the live objects together to one end of the heap, leaving
a single contiguous block of free space. This is an expensive operation because
every live object might be moved, and every pointer to a moved object must be
updated to the new location. As a result, compaction is generally only done
when it appears to be necessary. Compaction can also be done in parallel, but
it results in a less efficient packing of the live objects -- instead of a
single block of free space, several smaller ones might be created.
Figure 2. Heap fragmentation
The
concurrent collector: optavgpause
For applications
that are willing to trade some overall throughput for shorter pauses, a
different policy is available. The optavgpause policy (-Xgcpolicy:optavgpause) attempts to do as
much GC work as possible before stopping the application, leading to shorter
pauses (Figure 3). The same mark-sweep-compact collector is used, but much of
the mark and sweep phases can be done as the application runs. Based on the
program's allocation rate, the system attempts to predict when the next garbage
collection will be required. When this threshold approaches, aconcurrent GC begins. As
application threads allocate objects, they will occasionally be asked to do a
small amount of GC work before their allocation is fulfilled. The more
allocations a thread does, the more it will be asked to help out. Meanwhile,
one or more background GC threads will use idle cycles to get additional work
done. Once all the concurrent work is done, or if free memory is exhausted
ahead of schedule, the application is halted and the collection is completed.
This pause is generally short, unless a compaction is required. Because
compaction requires moving and updating live objects, it cannot be done
concurrently.
Figure 3. Application and collector CPU
usage: optavgpause
The
generational collection: gencon
It has long been
observed that the majority of objects created are only used for a short period
of time. This is the result of both programming techniques and the type of
application. Many common Java idioms create helper objects that are quickly
discarded; for exampleStringBuffer/StringBuilder objects, or
Iterator objects. These are allocated to accomplish a specific task, and are
rarely needed afterwards. On a larger scale, applications that are
transactional in nature also tend to create groups of objects that are used and
discarded together. Once a reply to a database query has been returned, then
the reply, the intermediate state, and the query itself are no longer needed.
This observation lead to the
development of generational garbage
collectors. The idea is to divide the heap up into different areas, and collect
these areas at different rates. New objects are allocated out of one such area,
called the nursery (or newspace). Since most
objects in this area will become garbage quickly, collecting it offers the best
chance to recover memory. Once an object has survived for a while, it is moved
into a different area, called tenure (or oldspace). These objects
are less likely to become garbage, so the collector examines them much less
frequently. For the right sort of workload the result is collections that are
faster and more efficient since less memory is examined, and a higher
percentage of examined objects are reclaimed. Faster collections mean shorter
pauses, and thus better application responsiveness.
IBM's gencon policy (-Xgcpolicy:gencon) offers a
generational GC ("gen-") on top of the concurrent one described above
("-con"). The tenure space is collected as described above, while the
nursery space uses a copying collector. This algorithm works
by further subdividing the nursery area into allocate and survivor spaces
(Figure 4). New objects are placed in allocate space until its free space has
been exhausted. The application is then halted, and any live objects in
allocate are copied into survivor. The two spaces then swap roles; that is,
survivor becomes allocate, and the application is resumed. If an object has
survived for a number of these copies, it is moved into the tenure area
instead.
Figure 4. Gencon in action
In theory, this means that half of
the nursery space (that is, the survivor space) is unused at any point in time.
In practice, the amount of memory set aside for survivor space is adjusted on
the fly depending on the percentage of objects that survive each collection. If
the majority of new objects are getting collected -- which is the expected case
-- then the dividing line between allocate and survivor is tilted, increasing the
amount of data that can be allocated before a collection is required.
This style of collector has an
important benefit: by moving live objects with each collection, the nursery
area is implicitly compacted with each collect. This results in the largest
possible block of free space, but also tends to move objects that are closely
related (for example, a String and its char[] data) into
nearby memory locations. This improves the performance characteristics of the
system memory cache, and, in turn, the application itself.
The cost of a nursery garbage
collection is relative to the amount of data that survives (Figure 5). Because
the expectation is that most objects will be garbage, a nursery collection
generally results in a very brief pause. While the majority of objects should
be collected quickly, some will not. This means that over time, the tenure area
will fill up with long-lived objects and a garbage collection of the entire
heap will be required. Most of the techniques described above for the
concurrent collector still apply here. Marking of the tenure area will run
concurrently as required while allocations and collections happen in the
nursery. Sweeping of the tenure area is not done concurrently under gencon, but
as part of the main tenure collection.
Figure 5. Application and collector CPU
usage: gencon
The
region-based collector: balanced
A new garbage
collection policy has been added in WebSphere Application Server V8. This
policy, called balanced (-Xgcpolicy:balanced), expands on the
notion of having different areas of the heap. It divides the heap into a large
number of regions, which can be dealt with individually. The details of
region-based garbage collection in general, and the balanced policy in
particular, will be discussed in Part 2.
Tuning heap
settings for non-generational collectors
How to monitor and
analyze the garbage collector
Monitoring garbage
collector and Java heap usage can be achieved using one of two mechanisms:
·
The Health Center live monitoring tool provides very low performance
overhead monitoring and analysis of garbage collection and other data,
including method execution and lock contention.
·
The -verbose:gc command line
option produces output to the native_stderr.log file that can be loaded into
the Garbage Collection and Memory Visualizer tool. (It is also possible to
direct -verbose:gcoutput to its own
file using the -Xverbosegclogoption.)
See Resources for details.
The first step to tuning the heap
size for any application is to run your application using the default heap
settings, which will enable you to gauge the out-of-the-box performance. At
this point, if the heap is consistently less than 40% free or the GC pauses are
taking more than 10% of the total run time, you should consider increasing the
heap size. The minimum and maximum heap sizes can be modified by specifying -Xms<value> and -Xmx<value>respectively.
The time spent in GC pauses for the
mark and sweep phases of a garbage collection are based on the number of live
objects on the heap. As you increase the heap size on a consistent workload,
the mark and sweep phases will continue to take approximately the same length
of time. So, by increasing the heap size the interval between GC pauses will
increase, which will give the application more time to execute.
If the GC is performing compaction
phases due to fragmentation issues, increasing the heap size might help
alleviate the long pauses due to compaction. Compaction phases tend to
significantly increase GC pause times, so if they are a regular occurrence
tuning the heap settings can improve the application performance.
Fixed vs
variable sized heap
Using a variable
size heap enables the GC to use only the OS resources required by the
application for the heap. As the application heap requirements change, the GC
can respond by expanding or contracting the heap. The GC can only contract
contiguous blocks of memory from the end of the heap, so a compaction might be
required to contract the heap. The actual contraction and expansion phases are
very quick and will not noticeably increase the GC pause times. By setting the
maximum heap size larger than required for normal operations, the application
will be able to handle extra workloads by expanding the heap.
Applications with consistent heap
requirements might see GC pause time improvements by using a fixed heap size.
Tuning
generational GC
How to set command
line options in the admin console
Java command line
options can be set in the WebSphere Application Server administrative console
using the generic JVM arguments option in the Java Virtual Machine panel of the
process definition. To find the Java Virtual Machine panel:
1.
Navigate to the admin console and select Servers
> Server Types > WebSphere application serversfrom the left
panel.
2.
Select your application server from the main panel.
3.
Expand the Java and Process Management option
to the right of the main panel and select Process definition.
4.
Select the Java Virtual Machine option on
the right.
5.
The Generic JVM arguments text box displays near the bottom of the main
panel.
When the options are added, you will need to save
and synchronize the changes before restarting the application server to put the
changes in effect.
When tuning for generational garbage
collection, the simplest approach is to treat the nursery space as a new Java
heap area, in addition to the Java heap area used in the non-generational case.
The Java heap for the non-generational case therefore becomes the tenured heap.
This approach is conservative: the
expectation is that the occupancy of the tenured heap will drop as a result of
introducing the nursery, but it provides a safe starting point, especially in
the case of a migration from a non-generational policy. When the occupancy of
the tenure heap after global (full) collections can be monitored, the size can
then be adjusted as describedearlier:
·
-Xmn<size> sets the
initial and maximum size of the nursery, effectively setting both -Xmnsand -Xmnx.
·
-Xmns<size> sets the
initial size of the nursery to the specified value.
·
-Xmnx<size> sets the
maximum size of the nursery to the specified value.
The size of the nursery heap should
be fixed, and as a result only one of these options, -Xmnis required.
Therefore, you only need to understand how to correctly size the nursery heap.
Sizing the
nursery heap
To correctly size
the nursery, you first need to consider the mechanism that nursery collections
use and the secondary characteristics that occur as a result:
·
Nursery collections work by copying data from
allocate to survivor spaces. Copying data is a relatively
expensive, time consuming task. As a result, the duration of a nursery collect
is dominated by the amount of data being copied. That isn't to say that there
is no effect from the number of objects being copied and the size of the
nursery itself, but that these are relatively minor in comparison to the cost
of copying the actual data. As a result, the duration of the nursery collect is
proportional to the amount of data being copied.
·
Only a finite and fixed amount of data is
"live" in any given collection. Once an application
has completed startup and fully populated its caches, and so on, the amount of
"live" data that needs to be copied in the nursery heap is fixed by
the amount of work that is being done at that point in time. In a system that
processes transactions, the amount of live data that needs to be copied will be
equivalent to one set of live transactions. For example, if you have configured
your application server with 50 WebContainer threads enabling 50 concurrent
transactions to occur, then the amount of live data will be that associated
with those 50 transactions.
This means that the duration of the
nursery collect is set by the size of the data associated with the number of
concurrent transactions occurring at the time of the collect, and not the size
of the nursery. This also means that as the size of the nursery is made larger,
there is an increase in time between nursery collects, without an increase in
the duration of the collect. In effect, as the nursery is made larger, the
overall time spent in garbage collection drops.
Figure 6 shows that if the size of
the nursery is less than the live data associated with one set of transactions,
and thereby the time between nursery collects is less than one transaction,
then data has to be copied multiple times.
Figure 6. Average number of times data
is copied versus time between nursery collects
As the nursery size is expanded and
the time between nursery collects increases, you do less copying on average,
and the overhead of garbage collection drops.
Limitations
on the nursery heap size
There are no direct
limitations on the size of the nursery heap that are imposed by the IBM garbage
collector or JVM; in fact, there are cases where the nursery is being set to
sizes in the order of 10s and even 100s of gigabytes. There are, however,
limitations imposed by the operating system that the Java process has to adhere
to in terms of virtual memory and process address space, as well as the
availability of sufficient physical memory (RAM). The operating system
restrictions for each platform for a 32bit process are shown in Figure 7.
Figure 7. 32bit address spaces by
operating system
The restrictions on 64bit process are
much, much larger. With addressable memory in the range of hundreds to billions
of gigabytes, it is the available physical memory (RAM) limitation that becomes
much more important.
Putting the
two together
As discussed above,
the simplest approach is to treat the nursery as an additional memory space.
However, both the nursery and tenured heaps are in fact allocated as a single
continuous segment of memory, the size of which is controlled by the -Xmx setting. If
only the -Xmx setting is used, 25% of the -Xmx value is used
for the maximum nursery size, and the size of the nursery is permitted to grow
and shrink within that 25%. This gives the layout for the Java heap shown in
Figure 8.
Figure 8. Default heap layout
You should, however, fix the nursery
size at a large value to minimise time spent in garbage collection, and enable
the tenured heap to resize itself according to occupancy to build in
resilience. Therefore, the preferred layout for the Java heap is as shown in
Figure 9.
Figure 9. Recommended heap layout
In order to achieve this layout, the
individual values for the minimum and maximum heap size for both the nursery
and tenured spaces should be set, with minimum and maximum nursery size
settings equal to each other, and the minimum and maximum tenured space sizes
set to different values.
For example, if you want to have a
256MB nursery heap size, and a tenured heap between 756MB and 1024MB, the
values would be:
-Xmns256M
-Xmnx256M
-Xmos756M
-Xmox1024M
-Xmnx256M
-Xmos756M
-Xmox1024M
Migrating to
generational
Since the default
GC policy has changed from optthruput to gencon in WebSphere Application Server
V8, previously chosen tuning parameters might need to be adjusted. The primary
issue is changing the heap sizes to compensate for the nursery. A program that previously
ran fine under optthruput with 1G of heap (i.e. -Xmx1G) might be unhappy
running with only 768M of tenure space and 256M of nursery space. The
techniques described above will help to choose new heap parameters.
There are other less obvious situations
where gencon may display different behaviour.
Because classes are generally
long-lived objects, they are allocated directly into tenure space. As a result,
class unloading can only be done as part of a tenure collection. If an
application relies heavily on short-lived class loaders, and nursery
collections can keep up with any other allocated objects, then tenure
collections might not happen very frequently. This means that the number of
classes and class loaders will continue increasing, which can increase the
pressure on native memory and lead to very long tenure collections when they do
happen, because there is so much class unloading work to be done.
If this issue becomes a problem,
there are two solutions. The first is to encourage additional tenure
collections in the presence of large amounts of class loaders. The command line
option -Xgc:classUnloadingKickoffThreshold=<number> tells the
system that a concurrent tenure collection be started every time <number> new class
loaders have been created. So, for example, specifying -Xgc:classUnloadingKickoffThreshold=100 will start a
concurrent tenure collect whenever a nursery collect notices that 100 new class
loaders have been created since the last tenure collection. The second solution
is to change to one of the other GC policies.
A similar issue can arise with
reference objects (for example, subclasses of java.lang.ref.Reference) and objects with finalize()methods. If one of
these objects survives long enough to be moved into tenure space before
becoming unreachable, it could be a long time before a tenure collection runs
and "realizes" that the object is dead. This can become a problem if
these objects are holding on to large or scarce native resources. We've dubbed
this an "iceberg" object: it takes up a small amount of Java heap,
but below the surface lurks a large native resource invisible to the garbage
collector. As with real icebergs, the best tactic is to steer clear of the
problem wherever possible. Even with one of the other GC policies, there is no
guarantee that a finalizable object will be detected as unreachable and have
its finalizer run in a timely fashion. If scarce resources are being managed,
manually releasing them wherever possible is always the best strategy.
Changing
policies
The default policy
should provide adequate performance for most workloads, but it may not be the
ideal choice for a particular application.
An application that operates like a
"batch job" sets up its initial state and loads the data on which it
will operate. Most of these objects will live for the duration of the job, with
only a few additional objects being created as the job runs. This sort of
workload fits the optthruput model since the expectation is that there will be
very little garbage until the task is complete. In a very similar case, jobs
that complete very quickly or allocate very few objects might be able to run
without requiring a garbage collection with an appropriate sized heap. In these
cases, the minimal overhead of the optthruput collector makes a good choice.
By contrast, a transactional
application is continually creating and discarding groups of objects. In this
context, the term "transaction" can be very literal -- consider a
database update or an e-commerce purchase -- or used in the far broader sense
of a discrete unit of work. To take some examples, serving a web page can be
considered a transaction. The client submits a URL, the server computes the
content of the page and sends it to the client. Once the client has received
the page, the server can discard the computed data.
To stretch the definition a little
further, consider a standard user interface. The user clicks on the Save button
and the system opens a file dialog so the user can navigate the filesystem and
choose a location for the document. Once the user has dismissed the dialog, all
of that intermediate state becomes unnecessary. Even some batch jobs are really
transactional under the surface. A task that is creating thumbnails for a
collection of large image files might appear to be a single large batch job,
but internally it is processing images separately, each one forming a
transaction of sorts. For any of these kinds of workloads, the gencon model
should provide benefits.
The optavgpause model sits somewhere
in the middle. Applications that have a lot of long-lived data that changes
slowly as the program runs might fit this model. This is an unusual pattern for
a workload; generally, long lived data is either almost never changed or
changed frequently. That said, a system with a slowly evolving data set that
does not create many intermediate objects mighty benefit from this policy.
Programs that are unable to run efficiently under gencon due to one of the
problems discussed above might benefit from the concurrent nature of
optavgpause.
Conclusion
This article
presented a brief description of the garbage collections strategies available
in the Java Virtual Machine in WebSphere Application Server V8. While the
default setting should perform well in most cases, some tuning might be
required to get the best performance. By matching the GC policy used to the
type of workload and choosing suitable heap parameters, the impact of garbage
collection on the application can be reduced.
Part 2 of this series will introduce
a new region-based garbage collection strategy, called balanced, which is designed
to improve scalability when deploying on large 64bit multi-core systems. The
article will cover the motivation behind the new technology and the performance
improvements it provides, as well as hints and tips on tuning this new option.
Resources
Learn
·
Wikipedia: Garbage collection definition
No comments:
Post a Comment