Java implements automatic memory management through garbage collection (GC), which greatly reduces the burden on application developers and avoids many memory-leak risks.
If you have used a language that requires manual memory management, such as C++, you will appreciate the convenience GC brings and how much it lowers the barrier to using the language.
However, while enjoying the convenience of automatic memory management, we also have to pay attention to the drawbacks it brings. The most criticized aspect of Java garbage collectors is probably STW pauses, but there are others as well. In this article, we will go through several major shortcomings of GC.
1. STW (stop-the-world)
During a collection cycle, the garbage collector requires all application threads to pause, so that application code cannot invalidate the view of the heap that the GC threads are working from.
STW makes all business threads stop and wait while the GC does its marking. Even relatively advanced garbage collectors such as ZGC and C4 cannot completely avoid STW during root scanning and a few other phases. This reduces overall throughput, because time spent in garbage collection is time not spent doing business work, and the pauses also increase latency and hurt response times.
If your application is latency-sensitive, check whether your JDK supports one of the newer low-latency collectors such as ZGC; if your application only cares about throughput, choose Parallel GC. Although that collector has been around for a long time, it still holds an advantage over the other collectors in raw throughput.
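For instance, the collector is chosen with a JVM flag at startup; a minimal sketch (these are real HotSpot flags, but the heap sizes and jar name are placeholders, and JDK 11-14 additionally require -XX:+UnlockExperimentalVMOptions for ZGC):

```
# Latency-sensitive service: ZGC (low-latency collector, production-ready since JDK 15)
java -XX:+UseZGC -Xms4g -Xmx4g -jar app.jar

# Throughput-oriented batch job: Parallel GC
java -XX:+UseParallelGC -Xms4g -Xmx4g -jar app.jar
```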
In addition, the virtual machine is a whole system and GC is only one part of it; it does not run in isolation but interacts with the stacks, the JIT compiler, the threading subsystem, and so on. Safepoint checks and the write barriers inserted into application code also directly affect program efficiency.
2. Extra memory overhead / low memory utilization
The most direct waste of space is the To Survivor region. Most collectors today are generational, dividing the heap into a young generation and an old generation, with the young generation further split into Eden, From Survivor and To Survivor regions. The copying algorithm used by the young generation requires the To Survivor region to be kept empty as the destination of the next copy, and it is usually about 1/10 of the entire young generation.
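For reference, the Eden-to-survivor proportion is controlled by the SurvivorRatio flag; a sketch with HotSpot's usual default of 8 (the young-generation size and jar name are placeholders):

```
# Eden : From : To = 8 : 1 : 1, so each survivor region is 1/10 of the young generation
java -XX:+UseParallelGC -Xmn1g -XX:SurvivorRatio=8 -jar app.jar
```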
The old generation's space utilization is not high either: part of it must always be reserved as promotion-guarantee space so that young-generation GCs can complete safely.
To be able to collect the young generation on its own, the old generation has to be treated as a source of roots and scanned. To speed up that scan, auxiliary data structures such as card tables and block-offset tables are needed, and these all cost space. For example, each card in the card table typically covers 512 bytes of heap, so a 2 GB old generation needs roughly a 4 MB card table; G1's remembered sets need even more memory to record cross-region references.
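That figure is easy to verify, assuming the usual layout of one card-table byte per 512-byte card:

```java
public class CardTableSize {
    public static void main(String[] args) {
        long oldGenBytes = 2L * 1024 * 1024 * 1024;       // 2 GB old generation
        long bytesPerCard = 512;                          // heap bytes covered by one card
        long cardTableBytes = oldGenBytes / bytesPerCard; // one byte of card table per card
        System.out.println(cardTableBytes / (1024 * 1024) + " MB"); // prints: 4 MB
    }
}
```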
Heap memory is usually requested from the operating system with mmap(). Linux, however, allocates lazily: mmap() only reserves virtual address space, and physical pages are not actually allocated until an address is first touched. Current JDKs let you either specify the heap size or leave it for the GC to adjust automatically, but in practice most people still set explicit heap size parameters, and may even configure options such as AlwaysPreTouch so that all physical memory is committed up front, avoiding the cost of demand paging while the program is running.
Whether the heap size is set manually or adjusted automatically, once physical memory has been committed it is generally not returned. Imagine committing a large amount of physical memory during a traffic peak: when traffic drops, memory utilization can become very low. Alibaba's JDK developed a feature for returning physical memory, and starting from JDK 13, ZGC added an uncommit feature (Uncommit Unused Memory) that returns unused heap memory to the operating system, which is very useful in containerized environments. These measures all help improve memory utilization.
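A rough sketch of how the ZGC uncommit feature is used (real flags on JDK 13+, but the sizes here are placeholders; note that uncommit only has an effect when -Xms is smaller than -Xmx):

```
# Heap can grow to 8 GB under load; memory unused for 120 seconds is returned to the OS
java -XX:+UseZGC -Xms2g -Xmx8g -XX:ZUncommitDelay=120 -jar app.jar

# Returning memory can be disabled entirely with -XX:-ZUncommit
```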
3. The timing of GC is unpredictable
Because we cannot know when GC will run, we also cannot know when a given Java object will be reclaimed; in other words, the lifetime (survival time) of a Java object is indeterminate. Collections do not happen at a fixed frequency either, which leads to floating garbage: objects that should already have been reclaimed keep occupying space, and not being able to release them promptly further hurts space utilization.
Here we discuss how this uncertainty in object lifetimes makes Java's finalize mechanism essentially useless.
In C++, you can confine an object's lifetime to a block, as follows:
```cpp
class ResourceMark {
public:
    ResourceMark() {
        // Acquire the resource in the constructor, e.g. a mutex lock
    }
    ~ResourceMark() {
        // Release the resource in the destructor
    }
};

// Use ResourceMark to manage a resource within a block
{
    ResourceMark mark;  // acquire the resource
    // ...
    // mark's lifetime ends here; its destructor runs automatically and releases the resource
}
```
In the HotSpot virtual machine there are many classes whose names end in Mark, such as ResourceMark and HandleMark, and most of them are used in exactly this way.
Java's finalize() mechanism tries to provide a similar kind of automatic resource management: a class can override the finalize() method to release its resources (much like a C++ destructor), and when the object is collected, finalize() is called automatically.
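A minimal sketch of such a class (the native handle here is hypothetical and only stands in for a real resource):

```java
public class LegacyResource {
    private long nativeHandle = openNativeHandle(); // hypothetical native resource

    @Override
    protected void finalize() throws Throwable {
        try {
            closeNativeHandle(nativeHandle); // released whenever GC gets around to finalizing us
        } finally {
            super.finalize();
        }
    }

    private static long openNativeHandle() { return 42L; }       // stand-in for real acquisition
    private static void closeNativeHandle(long handle) { /* stand-in for real release */ }
}
```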
In HotSpot, during the reachability analysis of a GC, if an object is finalizable (its class overrides finalize()) and it is otherwise unreachable, it is added to a ReferenceQueue. During VM initialization, a daemon thread of type FinalizerThread (named "Finalizer") is started; it continuously drains objects from that ReferenceQueue and executes their finalize() methods. After finalize() has run, the object merely loses its association with the Finalizer; it is not reclaimed immediately, but has to wait for the next GC. Each object's finalize() method is executed at most once and is never run again.
The problem is that finalize() depends entirely on GC actually running, and since the timing of GC is uncertain, when finalize() will be called to release the resources it holds is also uncertain. Suppose the resource to be reclaimed is a file handle: if finalize() does not run for a long time, this is effectively a resource leak, and the handles may be exhausted well before the objects are collected. So finalize() cannot safely implement automatic resource management.
finalize() has been deprecated in later JDK versions, and the Java team explicitly recommends avoiding it (see JEP 421 for details).
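The replacement recommended there is deterministic release via try-with-resources (with java.lang.ref.Cleaner only as a last-resort safety net); a minimal sketch, again with a made-up resource:

```java
public class ManagedResource implements AutoCloseable {
    private boolean open = true;

    public void use() {
        if (!open) throw new IllegalStateException("already closed");
        // ... work with the resource ...
    }

    @Override
    public void close() {   // runs deterministically, not whenever GC decides to
        open = false;
    }

    public static void main(String[] args) {
        try (ManagedResource r = new ManagedResource()) {
            r.use();
        } // close() is guaranteed to have run by this point
    }
}
```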
Not being able to predict when GC will occur can also lead to other unexpected behavior. For example, the CMS collector can fall back to a Full GC, which collects inefficiently and has a long STW pause; if a large number of HTTP requests arrive at that moment, many of them may time out.
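When diagnosing this kind of incident, it helps to line request timeouts up against GC pauses. With the unified logging available since JDK 9, something like the following records every pause with timestamps (the file name and decorator list are just one reasonable choice):

```
java -Xlog:gc*:file=gc.log:time,uptime,level,tags -jar app.jar
```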
4. GC moves objects
GC may move Java objects to different locations, so objects must not be manipulated while a collection is in progress, and every reference that points to a moved object has to be updated afterwards.
4.1. Critical sections
I wrote an article before titled "During GC, there are actually still user threads running". While a GC is in progress, threads executing native code keep running, but such a thread may hold indirect references to Java objects, and any operation on those objects has to go through the JNI API.
The way to manipulate arrays through the JNI API is GetXXXArrayElements paired with ReleaseXXXArrayElements. These calls are expensive, though: because GC may move the array in memory, it is not safe to hand the raw Java-heap address to the user, so GetXXXArrayElements returns a copy of the array and ReleaseXXXArrayElements copies that buffer back into the real array on the Java heap.
For example:
```cpp
JNIEXPORT void JNICALL Java_cn_hotspotvm_TestArray_mul(
        JNIEnv *env, jclass klass, jfloatArray mat1, jfloatArray mat2) {
    jboolean isCopyA, isCopyB;
    float *A = env->GetFloatArrayElements(mat1, &isCopyA);
    float *B = env->GetFloatArrayElements(mat2, &isCopyB);
    mult_SSE(A, B);
    // The third argument 0 means: copy the (possibly modified) data back to the
    // Java array and free the local buffer
    env->ReleaseFloatArrayElements(mat1, A, 0);
    // JNI_ABORT: do not copy the data back to the Java array, just free the buffer
    // (suitable for read-only access)
    env->ReleaseFloatArrayElements(mat2, B, JNI_ABORT);
}
```
In practice, the call to GetFloatArrayElements() here returns a copy of the array.
To improve performance, we can use a critical section in which GC is not allowed to run, so that no copy of the array is needed, as follows:
```cpp
JNIEXPORT void JNICALL Java_cn_hotspotvm_TestArray_mul(
        JNIEnv *env, jclass klass, jfloatArray mat1, jfloatArray mat2) {
    jboolean isCopyA, isCopyB;
    float *A = static_cast<float*>(env->GetPrimitiveArrayCritical(mat1, &isCopyA));
    float *B = static_cast<float*>(env->GetPrimitiveArrayCritical(mat2, &isCopyB));
    mult_SSE(A, B);
    env->ReleasePrimitiveArrayCritical(mat1, A, 0);
    env->ReleasePrimitiveArrayCritical(mat2, B, JNI_ABORT);
}
```
Just replace GetFloatArrayElements and ReleaseFloatArrayElements with GetPrimitiveArrayCritical and ReleasePrimitiveArrayCritical. The critical-array API exists precisely to avoid the array copy: the region between GetPrimitiveArrayCritical and ReleasePrimitiveArrayCritical is a critical section in which GC is blocked, so the real array data can be exposed to the user directly. HotSpot goes a step further with Critical Native functions:
```cpp
JNIEXPORT void JNICALL JavaCritical_cn_hotspotvm_TestArray_mul(
        jint length1, jfloat* mat1, jint length2, jfloat* mat2) {
    mult_SSE(mat1, mat2);
}
```
A Critical Native is a special kind of JNI function: the entire function body is a critical section (and it also skips some non-critical safety checks), trading some of the JVM's overall robustness for maximum performance. It was originally designed for the JRE's cryptography modules; since most encryption algorithms work block by block, which means small arrays are passed across JNI very frequently, Critical Natives were designed specifically to optimize passing arrays.
Compared with an ordinary JNI call, a JavaCritical function further reduces call overhead, because it skips some of the "extra" checks and runs entirely inside a critical section in which the JVM is forbidden from collecting garbage.
4.2. Off-heap memory
Many communication frameworks, such as Netty, allocate a region of off-heap memory to improve efficiency. During network and disk I/O, data that lives on the Java heap ends up being copied off-heap before it is sent. The reason is that when the operating system writes a memory region to disk or to the network, it requires that region to stay put, whereas GC compacts the heap and may change the data's address. So the data is first copied into off-heap memory (which GC does not touch), and that address is handed to the operating system.
Source code location: openjdk/jdk/src/share/classes/sun/nio/ch/IOUtil.java

```java
static int read(FileDescriptor fd, ByteBuffer dst, long position,
                NativeDispatcher nd) throws IOException {
    if (dst.isReadOnly())
        throw new IllegalArgumentException("Read-only buffer");
    // If the destination is already a DirectBuffer (off-heap), read into it directly
    if (dst instanceof DirectBuffer)
        return readIntoNativeBuffer(fd, dst, position, nd);

    // Otherwise borrow a temporary off-heap DirectBuffer
    ByteBuffer bb = Util.getTemporaryDirectBuffer(dst.remaining());
    try {
        // Read into the off-heap buffer, then copy its contents into the heap buffer
        int n = readIntoNativeBuffer(fd, bb, position, nd);
        bb.flip();
        if (n > 0)
            dst.put(bb);
        return n;
    } finally {
        Util.offerFirstTemporaryDirectBuffer(bb);
    }
}
```
Java provides DirectByteBuffer for this. When a DirectByteBuffer is created, it allocates a block of memory outside the Java heap through Unsafe's native allocation (ultimately malloc), and then operates on that memory through Unsafe's native methods. For memory that is manipulated frequently but only needed temporarily, it is recommended to use off-heap memory and pool it as a buffer pool, so the same memory is reused over and over.
For example:
```java
try (FileChannel channel = FileChannel.open(Paths.get("/tmp/data"),   // placeholder file name
                                            StandardOpenOption.READ)) {
    // Direct (off-heap) buffer
    ByteBuffer buffer = ByteBuffer.allocateDirect(1024);
    while (channel.read(buffer) > 0) {
        buffer.flip();
        // process the data ...
        buffer.clear();
    }
} catch (IOException e) {
    e.printStackTrace();
}
```
Calling FileChannel's open() method returns a FileChannelImpl instance, and that instance's read() method ends up calling the IOUtil.read() method shown above.
This concludes our look at the main shortcomings of Java virtual machine GC.