Until recently, I was living in a world where calling the System.exit() method was an inelegant but 100% effective, bulletproof and ultimate way to shut down a multi-threaded Java application. Over the years, in many Java projects, I've seen a quite similar approach to shutting down the application. First, we try to shut down gracefully by finishing or interrupting all the threads, which is, by the way, not a trivial task in a large application; then, after a timeout is reached, we give up and call the System.exit() method, because we want to be sure that the JVM is stopped.
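A minimal sketch of that pattern, assuming a plain ExecutorService and a made-up 30-second timeout (none of this is code from the application described below), might look like this:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// A sketch of the "graceful first, System.exit() as the last resort" pattern.
public class GracefulShutdownSketch {

    private final ExecutorService workers = Executors.newFixedThreadPool(4);

    public void shutdown() {
        workers.shutdown(); // stop accepting new tasks, let running ones finish
        try {
            if (!workers.awaitTermination(30, TimeUnit.SECONDS)) {
                workers.shutdownNow(); // timeout reached: interrupt the stragglers
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // We give up on elegance and make sure the JVM is really stopped.
        System.exit(0);
    }
}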
The exact same approach was applied in the application I was recently refactoring. I had been doing a massive redesign: I touched about three hundred classes and changed many of them drastically. A day or two after one of the huge merges, a bug was discovered: shutting down the application took ages.
I checked the logs and compared them with the source code. The whole flow was pretty straightforward, and I was sure System.exit() had been called:
2019-07-25T08:09:56,840+0000 [Thread-17] INFO TerminatingServiceImpl: Shutting down the engine..
2019-07-25T08:09:56,840+0000 [Thread-17] INFO HealthCheckServiceImpl: Shutting down health check service.
2019-07-25T08:09:56,840+0000 [Thread-17] INFO WatchdogPingServiceImpl: Shutting down watchdog ping service.
2019-07-25T08:09:56,856+0000 [pool-17-thread-1] INFO BrowserMonitoringModule: Shutting down BrowserMonitoringModule issued.
2019-07-25T08:09:56,856+0000 [pool-17-thread-1] INFO TestFetcher: FetchExecutor was interrupted.
2019-07-25T08:09:56,856+0000 [Thread-3] INFO PoolManager: Queue.take Interrupted.
2019-07-25T08:09:56,856+0000 [pool-17-thread-1] INFO AwsS3CloudFileManager: shutting down AWS S3 uploader
2019-07-25T08:09:56,856+0000 [Thread-17] INFO TerminatingServiceImpl: Shutting down Jetty..
2019-07-25T08:09:56,856+0000 [Thread-17] INFO TerminatingServiceImpl: Calling system exiting service..
2019-07-25T08:09:56,856+0000 [Thread-17] INFO SystemExitingServiceImpl: Shutting down common services..
2019-07-25T08:09:56,856+0000 [Thread-17] INFO SystemExitingServiceImpl: Shutting down beacon availability service..
2019-07-25T08:09:56,856+0000 [Thread-17] INFO SystemExitingServiceImpl: Shutting down http client pooling factory..
2019-07-25T08:09:56,856+0000 [Thread-17] INFO SystemExitingServiceImpl: Shutting down chrome dump dir watcher..
2019-07-25T08:09:56,856+0000 [Thread-17] INFO SystemExitingServiceImpl: Calling system exit with code 23..
It didn't cause the JVM to stop. I could see that after the call there were still lines being logged by one of the periodic workers. The app kept running for ten more minutes, not doing much, until it was finally killed by an outside wrapper process (thank god):
2019-07-25T08:09:56,871+0000 [StatusService-Thread-1] WARN ApacheHttpRequestRetryHandler: Retrying request after exception Socket operation on nonsocket: configureBlocking, retry count: 1
2019-07-25T08:10:06,590+0000 [pool-3-thread-1] INFO PropertiesRestLoader: Loading properties from REST source.
2019-07-25T08:10:06,590+0000 [pool-3-thread-1] WARN ApacheHttpRequestRetryHandler: Retrying request after exception Software caused connection abort: recv failed, retry count: 1
2019-07-25T08:10:10,684+0000 [pool-3-thread-1] ERROR PropertiesRestLoader: Failed to get configuration from cluster due to error: Connect to localhost:9999 [localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1] failed: Connection refused: connect
(...)
2019-07-25T08:10:14,779+0000 [pool-3-thread-1] ERROR PropertiesRestLoader: Failed to get configuration from cluster due to error: Connect to localhost:9999 [localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1] failed: Connection refused: connect
(...)
2019-07-25T08:16:10,185+0000 [pool-3-thread-1] ERROR PropertiesRestLoader: Failed to get configuration from cluster due to error: Connect to localhost:9999 [localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1] failed: Connection refused: connect
I started to google for possible reasons why a System.exit() call might not be doing its job. I didn't find the answer, which later turned out to be the reason I decided to start a blog and write an article about it.
I had to take a closer look at the process of shutting down the application. The trouble is, the application is sometimes deployed as a Windows service and sometimes as a Linux systemd service, so there are a few paths used in production to shut down the application:
- Shutting down the Windows service
- Killing the Linux process
- Using a REST endpoint
- Letting the application shut itself down when its self-diagnostics fail
Because of the different paths, and the fact that some very important cleanup needs to be done before the shutdown, we had introduced a terminating service that could be used by a variety of components. During the recent refactoring, I spread the usage of that service all over the code: I removed all the similar ad-hoc code and replaced it with a call to the terminating service.
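For the first two paths the terminating service is reached through a JVM shutdown hook (as it later turned out, this detail matters). A hypothetical sketch of such wiring, with names of my own invention rather than the application's, could look like this:

// Hypothetical wiring sketch: the service-stop and kill paths reach the
// terminating service through a JVM shutdown hook. TerminatingService and
// the hook's thread name are my own inventions, not the application's code.
interface TerminatingService {
    void terminate();
}

public class ShutdownHookRegistration {

    public static void register(TerminatingService terminatingService) {
        // Runs when the JVM begins its shutdown sequence, e.g. on
        // SIGTERM (Linux) or a Windows service stop/restart.
        Runtime.getRuntime().addShutdownHook(new Thread(
                terminatingService::terminate,
                "terminating-service-hook"));
    }
}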
The algorithm there was pretty simple:
public void terminate() {
    log.info("Shutting down the engine..");
    doSomethingReallyImportant();
    log.info("Shutting down common services..");
    shutdownCommonServices();
    log.info("Calling system exit with code {}..", WrapperCodes.RESTART_CODE.getExitCode());
    System.exit(WrapperCodes.RESTART_CODE.getExitCode());
}
I had seen all three log lines in the log file, and I couldn't understand why the JVM was still running after the System.exit() call.
I came up with an answer when I tried to reproduce the issue. I realized that in this case the person who had found the bug had been restarting the Windows service, which caused a shutdown hook to be called, which in turn called the terminating service. It turns out that System.exit() actually delegates the call to Runtime.exit(), which first runs all the shutdown hooks (and the finalizers, if finalization-on-exit is enabled) and only then terminates the JVM, unlike its sibling, the Runtime.halt() method, which just forcibly stops the JVM.
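The difference between the two is easy to see in a toy program (my own illustration, not code from the application):

// Toy illustration of System.exit() vs Runtime.halt(); not application code.
public class ExitVsHalt {

    public static void main(String[] args) {
        Runtime.getRuntime().addShutdownHook(
                new Thread(() -> System.out.println("shutdown hook ran")));

        System.exit(0);                  // runs the hook first: "shutdown hook ran" is printed
        // Runtime.getRuntime().halt(0); // would stop the JVM immediately, skipping the hook
    }
}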
Although I knew that was it, I still didn't understand why, instead of recursive shutdown hook calls, I could see only a single System.exit() call in the logs. It all became clear when I checked the code of java.lang.ApplicationShutdownHooks#runHooks. Adding a shutdown hook really means adding a java.lang.Thread instance that is started when the hooks are run, and the start method in the Thread class is marked as synchronized. That's why there was only a single entry in the log file. Now everything was clear. The lesson learned here: never call the System.exit() method from a shutdown hook.
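One way to apply that lesson, sketched here under my own assumptions rather than taken from the application, is to run the cleanup on every path but guard the explicit exit so it never happens inside the JVM's shutdown sequence:

import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical fix sketch: the cleanup runs on every shutdown path, but
// System.exit() is only ever called outside a shutdown hook. All names
// here are my own, not the application's.
public class SafeTerminatingService {

    private final AtomicBoolean cleanedUp = new AtomicBoolean(false);

    public SafeTerminatingService() {
        // The hook performs cleanup only; the JVM is already going down,
        // so calling System.exit() from here would hang the shutdown,
        // as described above.
        Runtime.getRuntime().addShutdownHook(
                new Thread(this::cleanup, "terminating-service-hook"));
    }

    // Called from the REST endpoint or the self-diagnostics path.
    public void terminateAndExit(int exitCode) {
        cleanup();
        System.exit(exitCode); // safe here: we are not inside a shutdown hook
    }

    private void cleanup() {
        if (!cleanedUp.compareAndSet(false, true)) {
            return; // cleanup already performed on another path
        }
        // doSomethingReallyImportant();
        // shutdownCommonServices();
    }
}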