.NET For Java Developers Migrating To C#

1.4 Memory Management

One of the key features of modern execution environments such as the JVM and the CLR is automatic memory management through garbage collection. The CLR provides a garbage collector (GC), similar to that of the JVM, which runs in the background and reclaims memory by destroying objects that are no longer in use. The runtime uses metadata to locate and load classes, lay out instances in memory, resolve method invocations, generate native code, enforce security, and set runtime context boundaries. The runtime automatically handles object layout and manages references to objects, releasing them when they are no longer being used. Objects whose lifetimes are managed in this way are called managed data. Automatic memory management eliminates the most common cause of memory leaks, namely forgetting to free an object, although an application can still waste memory by holding references to objects it no longer needs. Let's briefly examine how garbage collection works in the CLR and the JVM.

1.4.1 Garbage Collection in the CLR

The CLR allocates new objects on the managed heap, which is a contiguous block of address space. The managed heap also has a pointer, P, which initially points to the base address of the heap. When the new operator is called by the application to create a new object, the CLR determines whether the new object fits into the address space. The CLR adds the size of the object (in bytes) to the pointer P. If, after the addition, P points outside the address space, then the object cannot be allocated immediately and the heap will have to be scanned and cleared of the garbage. When space is then created on the managed heap, the new object is allocated and the address is returned. The process of identifying and removing the garbage on the heap is garbage collection, and several algorithms are available to conduct this process.
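The pointer arithmetic described above can be sketched in a few lines of Java. This is a minimal illustration, not CLR code: the class name BumpHeap, the use of integer offsets as addresses, and the -1 return value that signals "collect first" are all assumptions made for the example.

```java
// Hypothetical model of pointer-bump allocation; not a CLR API.
final class BumpHeap {
    private final int capacity; // size of the contiguous address space, in bytes
    private int p = 0;          // the pointer P: offset of the next free byte

    BumpHeap(int capacity) {
        this.capacity = capacity;
    }

    /** Returns the address of the new object, or -1 if garbage must be collected first. */
    int allocate(int sizeInBytes) {
        if (sizeInBytes <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        if (sizeInBytes > capacity - p) {
            return -1; // P + size would point outside the address space
        }
        int address = p;  // the object starts where P currently points
        p += sizeInBytes; // advance P past the new object
        return address;
    }

    int nextFree() {
        return p;
    }
}
```

Because an allocation is only a bounds check and an addition, it is very cheap as long as the heap has room; the cost is deferred to the collection that runs when the check fails.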

What Is Garbage?

Every application has a set of roots. Roots identify storage locations that either refer to objects on the managed heap or are set to null. For example, all the global and static object pointers in an application are considered part of the application's roots, as are any local variable or parameter object pointers on a thread's stack. Finally, any CPU registers containing pointers to objects in the managed heap are also considered part of the application's roots. The list of active roots is maintained by the JIT compiler and the CLR, and this list is accessible to the garbage collector's algorithm. Thus, garbage is defined as those objects on the heap that cannot be reached from this list of active roots.

How Does Garbage Collection Work?

When the garbage collector starts running, it assumes that all objects in the heap are garbage. In other words, it assumes that none of the application's roots refer to any objects in the heap. The GC starts walking the roots and building a graph of all objects reachable from the roots. For example, the GC may locate a global variable that points to an object in the heap. The GC continues to walk through all reachable objects recursively. When this part of the graph is complete, the GC checks the next root and walks the objects again.

As the garbage collector walks from object to object, if it attempts to add an object to the graph and discovers that the object has already been added, the GC stops walking that path. This sequence serves two purposes. First, it helps performance significantly because the GC walks any set of objects only once. Second, it prevents infinite loops should you have any circular linked lists of objects.
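The root walk can be illustrated with a small Java sketch. The HeapNode and RootWalker names are hypothetical, and the sketch uses an explicit stack instead of recursion, which is an implementation choice of this example rather than something the text specifies. The identity-based set plays the role of the GC's graph: an object that is already in it is not walked again.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical heap object: a name plus outgoing references.
final class HeapNode {
    final String name;
    final List<HeapNode> refs = new ArrayList<>();

    HeapNode(String name) {
        this.name = name;
    }
}

final class RootWalker {
    /** Builds the set of all objects reachable from the given roots. */
    static Set<HeapNode> reachable(List<HeapNode> roots) {
        // Compare by identity, as a collector would compare by address.
        Set<HeapNode> graph = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<HeapNode> pending = new ArrayDeque<>();
        for (HeapNode root : roots) {
            if (root != null) {
                pending.push(root); // a root may be set to null
            }
            while (!pending.isEmpty()) {
                HeapNode current = pending.pop();
                if (!graph.add(current)) {
                    continue; // already in the graph: stop walking this path
                }
                for (HeapNode ref : current.refs) {
                    if (ref != null) {
                        pending.push(ref);
                    }
                }
            }
        }
        return graph;
    }
}
```

Anything allocated but absent from the returned set is garbage. The early exit on an already-added object is what keeps a circular list from looping forever.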

When all the roots have been checked, the GC's graph contains the set of all objects that are somehow reachable from the application's roots; any objects that are not in the graph are not accessible by the application and are therefore considered garbage. The GC now walks the heap linearly, looking for contiguous blocks of garbage objects. When it finds such a block, it shifts the surviving objects down in memory, removing all gaps in the heap. Of course, moving the objects in memory invalidates all pointers to the objects. This means that the GC must modify the application's roots so that the pointers point to the objects' new locations. In addition, if any object contains a pointer to another object, the GC is also responsible for correcting these pointers.
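The compaction step, in which survivors slide toward the base of the heap and every pointer to them is corrected, can be sketched as follows. The model is deliberately simplified and entirely hypothetical: addresses are plain integers, -1 stands for null, and each object already carries a flag saying whether it was found reachable. The sketch also assumes that roots and surviving objects point only to surviving objects, which follows from how reachability is defined.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of sliding compaction; addresses are ints, -1 means null.
final class SlidingCompactor {
    static final class Obj {
        int address;          // current start address on the heap
        final int size;       // size in bytes
        final boolean live;   // true if the root walk found it reachable
        final int[] pointers; // addresses of the objects this one refers to

        Obj(int address, int size, boolean live, int... pointers) {
            this.address = address;
            this.size = size;
            this.live = live;
            this.pointers = pointers;
        }
    }

    /** Compacts a heap given in address order; returns the new allocation pointer. */
    static int compact(List<Obj> heap, int[] roots) {
        // Pass 1: walk the heap linearly and assign each survivor its new address.
        Map<Integer, Integer> forwarding = new HashMap<>();
        int next = 0;
        for (Obj o : heap) {
            if (o.live) {
                forwarding.put(o.address, next);
                next += o.size;
            }
        }
        // Pass 2: drop the garbage, move the survivors, and correct their pointers.
        heap.removeIf(o -> !o.live);
        for (Obj o : heap) {
            o.address = forwarding.get(o.address);
            for (int i = 0; i < o.pointers.length; i++) {
                o.pointers[i] = relocate(o.pointers[i], forwarding);
            }
        }
        // Pass 3: correct the application's roots.
        for (int i = 0; i < roots.length; i++) {
            roots[i] = relocate(roots[i], forwarding);
        }
        return next;
    }

    private static int relocate(int address, Map<Integer, Integer> forwarding) {
        return address < 0 ? -1 : forwarding.get(address);
    }
}
```

The value returned by compact is where the allocation pointer would sit after the collection, so all free space ends up in one contiguous run at the top of the heap.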

Garbage Collection Performance

The performance of garbage collection depends on how big the managed heap is, how much of it is garbage, and whether any of the objects in the managed heap define finalizers. Just as a Java object can define a finalize method, a CLR object can define a Finalize method (written in C# with destructor syntax) as a last effort to release resources. If this method is defined, it gets called before the object's memory is reclaimed.

Calling the finalizer on every reclaimable object that defines one slows down garbage collection. Therefore, as will be explained in later chapters, finalizers should not be used for routine resource management.

Another performance bottleneck is large objects on the heap. Imagine the structure of your C: drive stored in memory as a tree. If an application holds a reference to this tree throughout the course of its execution, the tree remains on the heap and is never garbage-collected. For such large objects, it is best if the object stays accessible to the application while also remaining eligible for collection, something that is implemented via weak references. This mechanism depends on the timing of access and collection. If only weak references to an object exist when the GC runs, the object is collected; when the application later attempts to access the object, the access fails and the application must rebuild the object. To use a weakly referenced object, the application must first obtain a strong reference to it. If the application obtains this strong reference before the GC collects the object, the GC can't collect the object because a strong reference to it now exists.
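On the Java side, this pattern can be sketched with java.lang.ref.WeakReference. The WeaklyCached class, its builder parameter, and the builds counter are hypothetical names introduced for illustration, and the sketch assumes the builder never returns null. The CLR offers a comparable WeakReference type, which is not shown here.

```java
import java.lang.ref.WeakReference;
import java.util.function.Supplier;

// Hypothetical helper: holds an expensive object weakly and rebuilds it on demand.
final class WeaklyCached<T> {
    private final Supplier<T> builder; // knows how to (re)create the large object
    private WeakReference<T> weak = new WeakReference<>(null);
    private int builds = 0;

    WeaklyCached(Supplier<T> builder) {
        this.builder = builder;
    }

    /** Returns a strong reference, rebuilding the object if the GC has collected it. */
    T get() {
        T strong = weak.get(); // null if the object was collected (or never built)
        if (strong == null) {
            strong = builder.get();
            builds++;
            weak = new WeakReference<>(strong);
        }
        return strong; // while the caller holds this, the object cannot be collected
    }

    int builds() {
        return builds;
    }
}
```

A caller that keeps the result of get() in a local variable holds a strong reference for as long as that variable is in use. Once it lets go, only the weak reference remains and the collector is free to reclaim the object; whether and when it does so is up to the collector.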

The GC can also perform better if it is generational and compacts only portions of the heap. A generational GC assumes three things: that newer objects have shorter lifetimes, that older objects have longer lifetimes, and that newer objects have stronger relationships and are frequently accessed at about the same time. When initialized, the managed heap consists only of generation 0 objects. When a garbage collection occurs, the generation 0 objects that survive it become generation 1 objects, and the objects subsequently allocated in the freed space form the new generation 0. This cycle of promoting surviving objects to later generations means that the GC needs to scan only the youngest generation to reclaim space. If the GC cannot reclaim enough memory by examining generation 0 objects, it then examines the older generation. Therefore, the GC usually ends up scanning only part of the heap.
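The promotion cycle can be illustrated with a toy two-generation heap in Java. Everything here is a simplifying assumption: the GenerationalHeap name, the use of lists as generations, and a caller-supplied predicate that stands in for the real root walk. The sketch also ignores references from older objects to younger ones, which a real collector has to track.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical two-generation heap; reachability is supplied by the caller.
final class GenerationalHeap<T> {
    final List<T> gen0 = new ArrayList<>(); // newly allocated objects
    final List<T> gen1 = new ArrayList<>(); // survivors of an earlier collection
    int objectsScanned = 0;                 // how much work the collector has done

    void allocate(T obj) {
        gen0.add(obj);
    }

    /** Collects generation 0 only; generation 1 is not examined. */
    void collectGen0(Predicate<T> isReachable) {
        for (T obj : gen0) {
            objectsScanned++;
            if (isReachable.test(obj)) {
                gen1.add(obj); // survivor is promoted to generation 1
            }
        }
        gen0.clear(); // the freed space holds the next generation 0
    }
}
```

Tracking objectsScanned makes the benefit visible: an object promoted to generation 1 is not examined again by later generation 0 collections.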

Note that GC algorithms are not set in stone. The garbage collector is one of the parts of the CLR most likely to be tweaked from release to release in order to maximize performance.

1.4.2 Garbage Collection in the JVM

The HotSpot JVM employs different algorithms for the young and old objects on its managed heap. For the younger-generation objects it uses a copying collector, and for the older generations it uses a mark-compact collector.

Younger objects tend to be short-lived and closely related to each other, and, as explained earlier, this holds for objects on the Java heap just as it does for objects on the CLR heap. The copying collector is suitable for the small, frequent collections that are typical of younger objects. The older objects need to be reclaimed only when reclaiming younger objects doesn't free up enough memory. For them, the mark-compact collector is used because it handles large amounts of data efficiently, although it is slower than the copying collector.
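To show what a copying collector does in general terms, here is a minimal semi-space sketch in Java. It is not HotSpot's implementation: the Cell and CopyingCollector names are hypothetical, a list stands in for the destination space, and Java objects stand in for raw memory. Reachable objects are copied into a fresh space, a forwarding map ensures each object is copied once, and everything left behind is garbage.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical semi-space copying collector over a toy object graph.
final class CopyingCollector {
    static final class Cell {
        final String label;
        final List<Cell> refs = new ArrayList<>();

        Cell(String label) {
            this.label = label;
        }
    }

    /** Copies everything reachable from the roots; the roots are updated in place. */
    static List<Cell> collect(List<Cell> roots) {
        List<Cell> toSpace = new ArrayList<>();
        Map<Cell, Cell> forwarding = new IdentityHashMap<>(); // old object -> its copy
        for (int i = 0; i < roots.size(); i++) {
            roots.set(i, copy(roots.get(i), toSpace, forwarding));
        }
        // Scan the copies in order; toSpace grows as new objects are discovered.
        for (int scan = 0; scan < toSpace.size(); scan++) {
            Cell copied = toSpace.get(scan);
            for (int i = 0; i < copied.refs.size(); i++) {
                copied.refs.set(i, copy(copied.refs.get(i), toSpace, forwarding));
            }
        }
        return toSpace;
    }

    private static Cell copy(Cell old, List<Cell> toSpace, Map<Cell, Cell> forwarding) {
        if (old == null) {
            return null;
        }
        Cell existing = forwarding.get(old);
        if (existing != null) {
            return existing; // already copied: reuse the forwarding address
        }
        Cell fresh = new Cell(old.label);
        fresh.refs.addAll(old.refs); // still point at old objects; fixed during the scan
        forwarding.put(old, fresh);
        toSpace.add(fresh);
        return fresh;
    }
}
```

The work done is proportional to the number of surviving objects rather than to the size of the space being collected, which is one reason this style of collector suits a generation in which most objects die young.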

The HotSpot GC also provides the incremental collector, which is useful for high-availability applications and those that need to execute without user-noticeable pauses.

The HotSpot JVM has evolved significantly compared with the first iteration of the JDK 1.1 garbage collector. Similarly, the CLR garbage collector is certain to be fine-tuned in future versions. Interestingly, even though the JVM and the CLR might use different garbage collection algorithms, they share very similar concepts when it comes to allocating objects on the heap and defining what constitutes garbage. Both have a managed heap, both divide that heap into generations, and both identify garbage based on reachability from the root set. Additionally, both GCs support weak references.
