Shifting battlegrounds

It is no secret, no mystery: I just love garbage collectors. They have saved me from my own failings over and over again. They are the single most potent contribution to the quality of the software I develop in this ever more complex and more dynamic world. What the heck: I even included a full section on garbage collectors in “The Rise and Fall of Software Recipes”!

When interviewing potential new hires, I ask them for the biggest difference between C++ and Java. When their answer includes a reference to garbage collectors, memory management or any other thing amounting to the same, I know they’ve actually used both languages seriously enough to appreciate the difference these features make. On the other hand, if they come up with some claim about C++ being compiled while Java is “interpreted”, not only is it essentially untrue and irrelevant, but it is a sign of a more academic, less hands-on exposure to these languages.

I have had numerous passionate debates about garbage collectors, trying to dispel what I considered myths and misconceptions. Oddly enough, while garbage collectors have not changed that much over the years, these debates have.

Twenty-five years ago, I had to make the case for abstraction: software developers could not be trusted with keeping track of pointer liveness on their own. It was just too tricky and cumbersome a task to expect a human never to get it wrong. Abstracting away memory management allowed us to focus more on the problem than on technicalities and, contrary to intuition, the result would often be faster than a carefully hand-crafted implementation.

C++ was the rage back then, and the opposing party would generally claim that by redefining the right set of exotic operations, such as the copy constructor, the assignment operator and a few others, one could keep track of the live objects, maintain reference counters, and deallocate objects safely and apparently automatically (I write “apparently”, since this was still happening under the developer’s ultimate responsibility).

I was not impressed: the complexity of this C++ machinery was daunting, and reference counting came with flaws of its own: a performance hit, cyclic object structures that are never reclaimed, etc. (I don’t resent reference counting per se: it can be a valid way of implementing a garbage collector. I just don’t want to be responsible for keeping track of these reference counts myself.)
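To make the cyclic-structure flaw concrete, here is a minimal sketch of hand-rolled reference counting. It is written in Java only to keep a single language throughout this post; the `Node` type and the `retain`, `release` and `link` helpers are hypothetical names made up for the illustration, not any real library's API. A simple chain is reclaimed as expected, while two nodes pointing at each other keep each other's counter above zero forever.

```java
public class RefCountSketch {
    /** Toy object with a hand-maintained reference counter (hypothetical type). */
    static final class Node {
        final String name;
        int refCount = 0;      // number of counted references to this node
        boolean freed = false; // stands in for "memory returned to the allocator"
        Node next;             // single outgoing, counted reference

        Node(String name) { this.name = name; }
    }

    /** Takes a counted reference to n. */
    static Node retain(Node n) {
        n.refCount++;
        return n;
    }

    /** Drops a counted reference; at zero, "frees" n and releases what it points to. */
    static void release(Node n) {
        if (--n.refCount == 0) {
            n.freed = true;
            if (n.next != null) {
                Node child = n.next;
                n.next = null;
                release(child);
            }
        }
    }

    /** Makes 'from' point at 'to' through a counted reference. */
    static void link(Node from, Node to) {
        from.next = retain(to);
    }

    static int countUnfreed(Node... nodes) {
        int unfreed = 0;
        for (Node node : nodes) {
            if (!node.freed) unfreed++;
        }
        return unfreed;
    }

    /** a -> b, then the outside references are dropped: both nodes are freed. */
    static int leakedInChain() {
        Node a = retain(new Node("a"));
        Node b = retain(new Node("b"));
        link(a, b);
        release(b);
        release(a);
        return countUnfreed(a, b);
    }

    /** a -> b -> a, then the outside references are dropped: neither counter reaches zero. */
    static int leakedInCycle() {
        Node a = retain(new Node("a"));
        Node b = retain(new Node("b"));
        link(a, b);
        link(b, a);
        release(a);
        release(b);
        return countUnfreed(a, b);
    }

    public static void main(String[] args) {
        System.out.println("chain, nodes never freed: " + leakedInChain()); // 0
        System.out.println("cycle, nodes never freed: " + leakedInCycle()); // 2
    }
}
```

A tracing collector does not have this problem, since it starts from the roots and never reaches the orphaned cycle; with plain counters, somebody has to remember to break the cycle by hand.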

But that was then.

Things have evolved. C++ is not dead, but it is no longer the default language for most applications, even for reasonably performance-critical systems where Java and C# have gained prominence. Garbage collectors are now accepted fixtures, and I seldom have to explain why they are important, useful, indispensable even.

This war has been won. What used to be an iconoclastic opinion of mine has become as mainstream as can be (not that I’m deluded enough to think that I have any credit to take for it).

I should feel vindicated, but I don’t. Garbage collectors have merely moved from one misconception to another.

I no longer have to convince people of the virtues of garbage collectors, but I must now emphasize that despite their simplicity from a developer’s point of view, they remain complicated beasts that can have dreadful effects on performance if used carelessly. Allocating an excessive number of objects makes the garbage collector run more often. Reclaiming dynamically allocated memory, even if automatic and transparent to the developer (that’s the whole idea of abstraction), requires sophisticated mechanisms, whether one uses precise (or exact), conservative, generational, mark-and-sweep, copying or compacting garbage collectors.

Dynamic allocation is made available by means of simple operators (“new”), but not all operators are born equal, and allocating an object is significantly more costly than adding two numbers or using any other similarly hardware-supported capability. Turning the creation of an object into a single operation in a virtual machine cannot hide the fact that there is much more happening, and the amount of aggressive optimization effort invested in dynamic allocation schemes and garbage collectors is a testimony to the importance of the issue.
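A small sketch of how innocently allocations can creep in: the two methods below compute the same sum, but the first one uses a boxed `Long` accumulator, so each `+=` unboxes, adds, and boxes the result again, which may allocate a fresh object per iteration. The class and method names are mine, and the timings printed by `main` are only indicative: the JIT compiler may remove some of these allocations, and small values come from the `Long` cache.

```java
public class AllocationSketch {
    /** Sums 0..n-1 with a boxed accumulator: every += may allocate a new Long. */
    static long sumBoxed(int n) {
        Long total = 0L;
        for (long i = 0; i < n; i++) {
            total += i; // unbox, add, box again
        }
        return total;
    }

    /** Same sum with a primitive accumulator: no allocation at all. */
    static long sumPrimitive(int n) {
        long total = 0L;
        for (long i = 0; i < n; i++) {
            total += i;
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 10_000_000;

        long t0 = System.nanoTime();
        long boxed = sumBoxed(n);
        long t1 = System.nanoTime();
        long primitive = sumPrimitive(n);
        long t2 = System.nanoTime();

        System.out.println("same result: " + (boxed == primitive));
        System.out.println("boxed:     " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("primitive: " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

Both versions look like "adding two numbers" in the source code; only one of them keeps the allocator and the garbage collector busy.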

Ever-increasing processing power will save you only to a degree. First, the amount of memory made available to processes has grown as well, increasing the amount of work required by a garbage collection cycle almost linearly. Even more of an issue, garbage collectors do not always (rhetorical precaution, as I have not run a comprehensive study comparing all available garbage collectors) behave well in multi-threaded environments. More specifically, they will often (see precaution mentioned above) block all threads but one when actually reclaiming memory.

This is why heavily multi-threaded servers written in garbage-collected languages must be spread over multiple processes to segregate the working memory into distinct address spaces and, consequently, allow for parallel garbage collection. Quite some plumbing, if you ask me, to use a feature that is supposed to provide abstraction from such implementation details.
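Whether a given server actually suffers from collection pauses is something one can measure before reaching for that kind of plumbing. The sketch below relies on the standard `java.lang.management` API to read the collection counters of the running JVM before and after a deliberately wasteful workload. The `churn` method is a hypothetical stand-in for real work, and the reported time is the accumulated collection time, which matches the pause time only to the extent that the collector in use stops the application threads.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcStatsSketch {
    /** Collections reported so far by all collectors of this JVM. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0L, gc.getCollectionCount()); // -1 means "undefined"
        }
        return total;
    }

    /** Approximate accumulated collection time, in milliseconds. */
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0L, gc.getCollectionTime()); // -1 means "undefined"
        }
        return total;
    }

    /** Hypothetical workload: allocates many short-lived 1 KB buffers. */
    static long churn(int iterations) {
        long checksum = 0;
        for (int i = 0; i < iterations; i++) {
            byte[] scratch = new byte[1024];
            scratch[0] = (byte) i;
            checksum += scratch[0];
        }
        return checksum;
    }

    public static void main(String[] args) {
        long countBefore = totalCollections();
        long millisBefore = totalCollectionMillis();

        long checksum = churn(20_000_000);

        System.out.println("checksum: " + checksum);
        System.out.println("collections during workload: "
                + (totalCollections() - countBefore));
        System.out.println("collection time during workload: "
                + (totalCollectionMillis() - millisBefore) + " ms");
    }
}
```

The numbers depend entirely on the JVM, the collector and the heap settings, which is rather the point: they have to be looked at, not assumed.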

To make a long story short, as useful as abstraction can be, memory management is one area where one must understand what happens under the hood and act accordingly, an area where blissful ignorance can be toxic.

And I have moved from the position where I defended the level of abstraction provided by garbage collectors, to one where I must warn against the false level of comfort one gets when forgetting what is actually being abstracted away.

Life is full of surprises.

26-09-2017
