The right medicine: Tomcat shutdown process analysis and thread processing method

In day-to-day work we often run into seemingly inexplicable errors that occur because threads created by the application do not stop in time when Tomcat shuts down. This article walks through the Tomcat shutdown process to explain the causes of these errors and proposes two feasible solutions.

  Tomcat shutdown process analysis

A Tomcat process is essentially a JVM process. Its internal structure, from top to bottom, consists of Server, Service, Connector | Engine, Host, Context.

In the implementation, Engine and Host are merely abstractions; the core functionality is implemented in the Context. There can be only one Server at the top level. A Server can contain multiple Services, and a Service can contain multiple Connectors and one Container. Container is an abstraction over Engine, Host, and Context. Loosely speaking, one Context corresponds to one webapp.

When Tomcat starts, the main work of the main thread is summarized as follows:

Build the container hierarchy, from the top-level Server down through Service, Connector, and so on (including the Contexts), by parsing the configuration file (server.xml by default).
Call Catalina's start method, which in turn calls the Server's start method. This call causes the entire container hierarchy to start.
Components such as Server, Service, Connector, and Context all implement the Lifecycle interface, and they form a strict top-down tree. Tomcat manages every container in the tree solely through the lifecycle of the root node (the Server).

Block in the await() method. await() waits for a network connection. When a user connects to the configured port and sends the specified string (usually 'SHUTDOWN'), await() returns and the main thread continues.
Execute the stop() method. Starting from the Server, stop() calls the stop methods of all the containers beneath it. When stop() finishes, the main thread exits. If nothing goes wrong, the Tomcat container terminates at this point.
It is worth noting that, from the layer below the Service downward, stop() is executed asynchronously: each container hands the stop of its children to an executor and waits for them to complete.

The children stopped this way normally form the layered Engine-Host-Context structure, which means that the Context's stop() method is called last. The stopInternal method of the Context calls these three methods:

filterStop();
listenerStop();
((Lifecycle) loader).stop();
(Note: only the calls relevant to the process analyzed here are listed; the others are omitted.)

filterStop cleans up the filters registered in web.xml. listenerStop calls the contextDestroyed method of each Listener registered in web.xml (if several Listeners are registered, they are called in the reverse of their registration order). The loader here is backed by WebappClassLoader, whose stop method performs the important operations: trying to stop threads, cleaning up referenced resources, and unloading classes.
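
The stop order described so far can be modeled in a few lines. The following is a minimal sketch and not Tomcat's code: StopOrderModel, stopContext, and stopChildren are hypothetical names, and the log strings merely stand in for the real filterStop(), listenerStop(), and loader stop calls. It assumes a parent that submits the stop of each child to a thread pool and then waits for the results.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy model of the stop order; all names here are hypothetical, not Tomcat's.
class StopOrderModel {
    // Thread-safe log of the steps, in the order they happened.
    final List<String> log = Collections.synchronizedList(new ArrayList<String>());

    // Models one Context: filters, then listeners in reverse order, then the loader.
    void stopContext(String name, List<String> listeners) {
        log.add(name + ":filterStop");
        for (int i = listeners.size() - 1; i >= 0; i--) {
            log.add(name + ":contextDestroyed:" + listeners.get(i));
        }
        log.add(name + ":loaderStop");
    }

    // Models a parent container: each child is stopped on a pool thread
    // and the parent blocks until every child has finished.
    void stopChildren(List<String> contexts, List<String> listeners) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Future<?>> results = new ArrayList<Future<?>>();
            for (String context : contexts) {
                results.add(pool.submit(() -> stopContext(context, listeners)));
            }
            for (Future<?> result : results) {
                try {
                    result.get();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt visible
                    return;
                } catch (ExecutionException e) {
                    log.add("stop failed: " + e.getCause());
                }
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

Within one Context the order is fixed, while the relative order between sibling Contexts is not, because they are stopped on different pool threads.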

If we use Spring Web, the Listener typically registered in web.xml is org.springframework.web.context.ContextLoaderListener. Its code makes it easy to see that the Spring framework initializes its beans in the Listener's contextInitialized method and cleans them up in the contextDestroyed method.

public class ContextLoaderListener extends ContextLoader implements ServletContextListener {

    public ContextLoaderListener() {
    }

    public ContextLoaderListener(WebApplicationContext context) {
        super(context);
    }

    // Called when the webapp starts: creates the Spring context and its beans
    public void contextInitialized(ServletContextEvent event) {
        this.initWebApplicationContext(event.getServletContext());
    }

    // Called when the webapp stops: closes the Spring context and cleans up
    public void contextDestroyed(ServletContextEvent event) {
        this.closeWebApplicationContext(event.getServletContext());
        ContextCleanupListener.cleanupAttributes(event.getServletContext());
    }
}

One point is important here: the attempt to stop our threads happens in the loader, and the loader's stop method runs after listenerStop. So even if the loader does manage to terminate the user-started threads, those threads may still use the Spring framework before they terminate, and by then Spring has already been closed by the Listener. Moreover, during the loader's thread cleanup, user-started threads are forcibly terminated (via Thread.stop()) only if the clearReferencesStopThreads parameter is configured. In most cases this parameter is left unset in order to protect data integrity. In other words, threads (including Executors) that the user starts inside the webapp are not terminated when the container exits.

We know that there are two main ways for the JVM to exit on its own:

System.exit() is called
All non-daemon threads have exited
Tomcat does not call System.exit() when stop finishes, so if the user has started a non-daemon thread and does not shut it down together with the container, the Tomcat process will not end on its own. We set this problem aside for the moment and first look at the exceptions encountered during shutdown.
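
As a small diagnostic aid for the second exit condition, the sketch below lists the live non-daemon threads that would keep a JVM from exiting. It is only an illustration built on the standard Thread API; the class and method names (NonDaemonThreads, blockingJvmExit) are hypothetical, and the caller's own thread is deliberately left out of the result.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports threads that keep the JVM alive.
class NonDaemonThreads {
    // Returns the names of live non-daemon threads other than the calling thread.
    static List<String> blockingJvmExit() {
        List<String> names = new ArrayList<String>();
        for (Thread thread : Thread.getAllStackTraces().keySet()) {
            boolean isCaller = thread == Thread.currentThread();
            if (thread.isAlive() && !thread.isDaemon() && !isCaller) {
                names.add(thread.getName());
            }
        }
        return names;
    }
}
```

Calling blockingJvmExit() at the end of a shutdown hook or a Listener and logging the result is one way to find out which user threads were left running.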

Exception analysis during Tomcat shutdown

IllegalStateException

In a webapp that uses the Spring framework, there is a serious synchronization problem at Tomcat exit between the closing of the Spring framework and the end of the user threads. In the window after Spring has been closed and before the user threads end, many unexpected problems can occur, the most common being IllegalStateException. Typical code that hits this exception looks like this:
public void run() {
    while (!isInterrupted()) {
        try {
            Thread.sleep(1000);
            // Fails with IllegalStateException once the Spring context is closed
            GQBean bean = SpringContextHolder.getBean(GQBean.class);
            /* do something with bean… */
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

This type of error is very common and easy to reproduce, so it needs no further discussion.

ClassNotFoundException/NullPointerException

This kind of error is less common and more tedious to analyze.

The earlier analysis established two things:

User-created threads do not stop when the container is destroyed.
The ClassLoader unloads the classes it has loaded while the container is stopping.
From these it is easy to conclude that the errors are caused by threads that did not end in time.

Once the ClassLoader has been unloaded, a user thread that tries to load a class gets a ClassNotFoundException or NoClassDefFoundError.
While the ClassLoader is being unloaded, Tomcat does not strictly synchronize the stopping container, so an attempt to load a class may throw a NullPointerException, for the following reason:
// Part of the class-loading code; may be executed in a user thread
protected ResourceEntry findResourceInternal(…) {
    if (!started) return null;
    synchronized (jarFiles) {
        if (openJARs()) {
            for (int i = 0; i < jarFiles.length; i++) {
                jarEntry = jarFiles[i].getJarEntry(path);
                if (jarEntry != null) {
                    try {
                        entry.manifest = jarFiles[i].getManifest();
                    } catch (IOException ioe) {
                        // Ignore
                    }
                    break;
                }
                /* other statements */
            }
        }
    }
}

As the code shows, access to jarFiles is carefully synchronized. The same care is taken everywhere jarFiles is used, except in stop():

// loader.stop() is executed in the thread that stops the container
public void stop() throws LifecycleException {
    /* other statements */
    length = jarFiles.length;
    for (int i = 0; i < length; i++) {
        try {
            if (jarFiles[i] != null) {
                jarFiles[i].close();
            }
        } catch (IOException e) {
            // Ignore
        }
        // Cleared without holding the lock used by findResourceInternal
        jarFiles[i] = null;
    }
    /* other statements */
}

Compare the two pieces of code. Suppose the user thread has entered the synchronized block (which also refreshes that thread's view of shared memory). Either started has already become false, so openJARs() skips refreshing jarFiles, or jarFiles[0] has not yet been set to null, so openJARs() sees nothing to reopen. If stop() then executes jarFiles[0] = null just after openJARs() returns, the following jarFiles[0].getJarEntry(path) call throws a NullPointerException.
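
The interleaving can be replayed deterministically on a single thread. The sketch below is a toy model, not Tomcat's code: JarTableModel and its methods are hypothetical, a plain Object stands in for a JarFile, and openJars() is assumed to report success even when it reopens nothing.

```java
// Toy model of the race; names are hypothetical and Object stands in for JarFile.
class JarTableModel {
    private final Object[] jarFiles = { new Object() };
    private volatile boolean started = true;

    // Shaped like openJARs(): reopens slot 0 only while started, then reports success.
    boolean openJars() {
        if (started && jarFiles[0] == null) {
            jarFiles[0] = new Object();
        }
        return true;
    }

    // Shaped like stop(): clears the slot without taking the readers' lock.
    void stop() {
        started = false;
        jarFiles[0] = null;
    }

    // Shaped like the lookup that follows a successful openJars().
    String lookup() {
        return jarFiles[0].toString(); // throws NullPointerException if the slot was cleared
    }

    // Replays the unlucky interleaving step by step; returns true if the NPE occurred.
    static boolean replayRace() {
        JarTableModel model = new JarTableModel();
        boolean opened = model.openJars(); // user thread: the check passes
        model.stop();                      // stop thread: runs in the gap
        try {
            if (opened) {
                model.lookup();            // user thread: dereferences the cleared slot
            }
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }
}
```

In the real container the two halves run on different threads, so the failure appears only when the timing happens to line up this way.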

What makes this exception hard to understand is why a loadClass operation is triggered at all, especially when the code creates no new objects. In fact, loadClass is often triggered by an initialization check on a class. (Note: this means initialization of the class, not of an instance of the class. The two are different.)

The following situations trigger the initialization check of a class:

The first instance of the class is created in the current thread
A static method of the class is called for the first time in the current thread
A static member of the class is used for the first time in the current thread
A static member of the class is assigned for the first time in the current thread
(Note: if the class has already been initialized, the check returns immediately; otherwise the class is initialized.)

When one of these situations occurs in a thread, the initialization check is triggered (at most once per thread). Before the check can run, the class itself must be obtained, and that requires a call to loadClass.

In general, code with the following pattern easily triggers the exceptions above:

try {
    /* do something */
} catch (Exception e) {
    // ExceptionUtil has never been used in the current thread before
    String trace = ExceptionUtil.getExceptionTrace(e);
    // or this: ExceptionTracer has never appeared in the current thread before
    System.out.println(new ExceptionTracer(e));
    // or any other statement that triggers a call to loadClass
    /* do other things */
}
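
The effect of first use can be observed directly. The following sketch is only an illustration on the plain JDK: InitLog, TraceHelper, and FirstUseDemo are hypothetical names, with TraceHelper playing the role of a utility class that is touched for the first time inside a catch block. Its static initializer records when class initialization actually happens.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical log that records class initialization events.
class InitLog {
    static final List<String> events = new ArrayList<String>();
}

// Hypothetical helper that is only ever used on the error path.
class TraceHelper {
    static {
        // Runs once, when the class is initialized on first use.
        InitLog.events.add("TraceHelper initialized");
    }

    static String describe(Throwable t) {
        return t.getClass().getSimpleName() + ": " + t.getMessage();
    }
}

class FirstUseDemo {
    // Runs the work; TraceHelper is touched only if the work throws.
    static String handle(Runnable work) {
        try {
            work.run();
            return "ok";
        } catch (RuntimeException e) {
            return TraceHelper.describe(e); // first static call triggers initialization
        }
    }
}
```

As long as the work succeeds, the helper is never initialized. The first failure is what pulls it in, and during shutdown that is exactly the moment when the class loader may already be stopping.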

Some suggested solutions
According to the above analysis, the main cause of the exception is that the thread did not terminate in time. So the key to the solution is how to gracefully terminate the user-initiated thread before the container terminates.

Create your own Listener as the notifier for terminating threads

According to the analysis, the threads that users create themselves mainly come in four forms:

Thread
Executors
Timer
Scheduler
So the most straightforward idea is to create a management module for these components, in two steps:

Step 1: create a Listener-based management module and hand the instances of the four types above over to it.
Step 2: when the Listener is notified that Tomcat is shutting down, it calls the termination method of each instance it manages. For example, call interrupt() on a Thread, and shutdown() or shutdownNow() on an ExecutorService (depending on the chosen policy).
Note that user-created Threads must respond to interruption: once isInterrupted() returns true or an InterruptedException is caught, the thread should exit. In fact, creating a thread that does not respond to interruption is a very poor design.
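
A minimal sketch of a thread that responds to interruption is shown below. PollingWorker is a hypothetical class and the 10 ms pause is arbitrary. The point is the handling of InterruptedException: sleep() clears the interrupt flag when it throws, so a catch block that merely logs the exception would let the loop run forever.

```java
// Hypothetical worker that exits promptly when interrupted.
class PollingWorker extends Thread {
    private final Runnable unitOfWork;

    PollingWorker(Runnable unitOfWork) {
        super("polling-worker");
        this.unitOfWork = unitOfWork;
    }

    @Override
    public void run() {
        while (!isInterrupted()) {
            try {
                Thread.sleep(10);
                unitOfWork.run();
            } catch (InterruptedException e) {
                // sleep() clears the interrupt flag when it throws,
                // so set it again and let the loop condition end the thread.
                Thread.currentThread().interrupt();
            } catch (RuntimeException e) {
                e.printStackTrace(); // a failed unit of work does not end the loop
            }
        }
    }
}
```

Catching InterruptedException separately from other exceptions is the design choice that matters here. A single catch (Exception e) block would swallow the interrupt along with everything else.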

The advantage of creating your own Listener is that you can deliberately block the destruction process when the event arrives and buy the user threads some time to clean up. Because Spring has not been destroyed yet, the program state is still fully normal. (Given the reverse calling order described earlier, this requires registering the Listener after Spring's ContextLoaderListener in web.xml.)

The downside is that it is invasive to the code and depends on the user's coding discipline.
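
The management module from the two steps above might look like the following sketch. It is an assumption-laden illustration rather than a finished design: ManagedThreads, register, and stopAll are hypothetical names, only Thread and ExecutorService are covered, and the wiring into a ServletContextListener (calling stopAll from contextDestroyed) is left out so that the block runs on the JDK alone.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical registry of user-created threads and executors.
class ManagedThreads {
    private final List<Thread> threads = new CopyOnWriteArrayList<Thread>();
    private final List<ExecutorService> executors = new CopyOnWriteArrayList<ExecutorService>();

    void register(Thread thread) {
        threads.add(thread);
    }

    void register(ExecutorService executor) {
        executors.add(executor);
    }

    // Interrupts everything, then waits until the shared deadline.
    // Returns true only if every thread and executor stopped in time.
    boolean stopAll(long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        for (Thread thread : threads) {
            thread.interrupt();
        }
        for (ExecutorService executor : executors) {
            executor.shutdownNow();
        }
        boolean allStopped = true;
        try {
            for (Thread thread : threads) {
                long leftMillis = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (leftMillis > 0) {
                    thread.join(leftMillis); // join(0) would wait forever, hence the guard
                }
                if (thread.isAlive()) {
                    allStopped = false;
                }
            }
            for (ExecutorService executor : executors) {
                long leftNanos = Math.max(0L, deadline - System.nanoTime());
                if (!executor.awaitTermination(leftNanos, TimeUnit.NANOSECONDS)) {
                    allStopped = false;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            allStopped = false;
        }
        return allStopped;
    }
}
```

A single deadline shared by all waits keeps the total blocking time bounded no matter how many threads are registered.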

Use Spring's TaskExecutor

To support managing your own threads inside a webapp, Spring provides the TaskExecutor family of tools. ThreadPoolTaskExecutor is very similar to the ThreadPoolExecutor introduced in Java 5, except that its lifecycle is managed by Spring. When the Spring framework is stopped, the executor is stopped as well, and the user threads receive an interrupt. Spring also provides ThreadPoolTaskScheduler (built on the JDK's ScheduledThreadPoolExecutor), which can be used for scheduled tasks or for running your own threads. Spring's support for thread management is very rich; see:

https://docs.spring.io/spring/docs/current/spring-framework-reference/integration.html#scheduling

The advantage of using the Spring framework is that it is less intrusive and depends less on the user's code.

The disadvantage is that the Spring framework does not guarantee the order of thread interruption and bean destruction. That is, if a thread catches an InterruptedException and then calls getBean through Spring, it can still trigger an IllegalStateException. The user also still has to check the thread's interrupt status, or be blocked in an interruptible call such as sleep, otherwise the thread will not terminate.

Other reminders

With either solution above, whether by blocking the main thread's stop operation in the Listener or by delaying the response to the interrupt under the Spring framework, a thread can buy itself some time to keep working. But this time is not unlimited. The stop section of catalina.sh shows why (simplified excerpt):

# Tomcat shutdown script excerpt

# First, a normal stop
eval "\"$_RUNJAVA\"" $LOGGING_MANAGER $JAVA_OPTS \
  -Djava.endorsed.dirs="\"$JAVA_ENDORSED_DIRS\"" -classpath "\"$CLASSPATH\"" \
  -Dcatalina.base="\"$CATALINA_BASE\"" \
  -Dcatalina.home="\"$CATALINA_HOME\"" \
  -Djava.io.tmpdir="\"$CATALINA_TMPDIR\"" \
  org.apache.catalina.startup.Bootstrap "$@" stop

# If the stop command fails, use kill -15
if [ $? != 0 ]; then
  kill -15 `cat "$CATALINA_PID"` >/dev/null 2>&1
fi

# Set the waiting time
SLEEP=5
FORCE=0

# If -force is among the arguments, force the stop
if [ "$1" = "-force" ]; then
  shift
  FORCE=1
fi

while [ $SLEEP -gt 0 ]; do
  sleep 1
  SLEEP=`expr $SLEEP - 1`
done

# If forced termination was requested, use kill -9
if [ $FORCE -eq 1 ]; then
  PID=`cat "$CATALINA_PID"`
  kill -9 $PID
fi

As the stop script shows, if forced termination is configured (our servers configure it by default), the time you can block the process to finish your own work is only 5 seconds. That window also has to cover other threads that are still working, as well as the delay between the moment termination is requested and the moment a thread notices it (for example, the time until its next call to isInterrupted). Taking this into account, the maximum blocking time should be even shorter.

From the analysis above, if the service runs important, time-consuming tasks whose consistency must be guaranteed, the best approach is to record the current execution progress within those precious 5 seconds of blocking, then detect the last recorded progress when the service restarts and resume from there.

It is recommended that the execution granularity of each task (the interval between two isInterrupted checks) be kept well within the maximum blocking time, to leave enough time for recording progress after termination is requested.
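
The idea of fine-grained tasks with recorded progress can be sketched as follows. This is an illustration under stated assumptions, not a complete recovery mechanism: ResumableJob is a hypothetical class, the work is modeled as numbered items, and the checkpoint is an in-memory AtomicInteger where a real service would write to a file or a database.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical job that works in small units and records its progress.
class ResumableJob {
    // Processes items from the checkpoint up to total, one at a time.
    // Returns true if all items are done, false if it stopped on an interrupt.
    static boolean run(int total, AtomicInteger checkpoint, Runnable processItem) {
        while (checkpoint.get() < total) {
            if (Thread.currentThread().isInterrupted()) {
                // Progress up to here is already recorded; stop without losing work.
                return false;
            }
            processItem.run();
            checkpoint.incrementAndGet(); // record progress after each finished item
        }
        return true;
    }
}
```

Because the interrupt is checked between items, the longest a shutdown has to wait is the duration of one item, which is the granularity the recommendation above refers to. After a restart, calling run with the saved checkpoint continues where the previous run stopped.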

Reference material

Tomcat source code 7.0.69
Tomcat start and stop service principle http://blog.csdn.net/beliefer/article/details/51585006
Tomcat Lifecycle Management http://blog.csdn.net/beliefer/article/details/51473807
JVMs and kill signals http://journal.thobe.org/2013/02/jvms-and-kill-signals.html
Task Execution and Scheduling https://docs.spring.io/spring/docs/current/spring-framework-reference/integration.html#scheduling
The Art of Java Concurrent Programming