Wednesday, 8 February 2012

Spring Integration Gateways - Null Handling & Timeouts


Spring Integration (SI) Gateways

Spring Integration Gateways (<int:gateway>) provide a semantically rich interface to message sub-systems. Gateways are specified using namespace constructs that reference a specific Java interface (the service-interface attribute), which is backed by an object dynamically implemented at run-time by the Spring Integration framework. Furthermore, these Java interfaces can, if you so wish, be defined entirely independently of any Spring artefacts, in both code and configuration.
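To make the idea of an interface "dynamically implemented at run-time" concrete, here is a minimal sketch using only the JDK's java.lang.reflect.Proxy. It is an illustration of the general technique, not the framework's own code. The enroll method and the GatewayProxySketch class are assumptions made for the example; the article does not show the methods of EnrollmentServiceGateway.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

// Hypothetical service interface: plain Java with no Spring imports.
// The method name and signature are assumptions for illustration.
interface EnrollmentServiceGateway {
    String enroll(String request);
}

class GatewayProxySketch {

    // Creates a run-time implementation of the interface. The handler here
    // simply echoes the request; a real gateway would turn the call into a
    // message and send it into the flow.
    static EnrollmentServiceGateway createGateway() {
        InvocationHandler handler = (proxy, method, args) -> {
            if ("enroll".equals(method.getName())) {
                return "reply to " + args[0];
            }
            throw new UnsupportedOperationException(method.getName());
        };
        return (EnrollmentServiceGateway) Proxy.newProxyInstance(
                EnrollmentServiceGateway.class.getClassLoader(),
                new Class<?>[] {EnrollmentServiceGateway.class},
                handler);
    }

    public static void main(String[] args) {
        // prints "reply to student-42"
        System.out.println(createGateway().enroll("student-42"));
    }
}
```

Callers depend only on the interface, which is why the interface itself can stay free of Spring types.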

One of the primary advantages of using the SI gateway as an interface to message sub-systems is that it automatically brings the benefit of rich gateway configuration, with defaults that can be customised. One such configuration attribute deserves further scrutiny and discussion, primarily because it's easy to misunderstand and misconfigure: default-reply-timeout.

Primary Motivator for Gateway Analysis


During recent consulting engagements, I've encountered a number of deployments that use Spring Integration Gateway specifications that may, in some circumstances, lead to production operational instability. This has often been in high-pressure environments or those where technology support is not backed by adequate training, testing, review or technology mentoring.

How do gateways behave in Spring Integration (R2.0.5)


One of the key sections, regarding gateways, in the Spring Integration manual clearly explains gateway semantics. Below is a 2-dimensional table of possible non-standard gateway returns for each of the scenarios that the SI Manual (r2.0.5) refers to.

Gateway Non-standard Responses

Runtime Events              | default-reply-timeout=x    | default-reply-timeout=x    | default-reply-timeout=null | default-reply-timeout=null
                            | Single-threaded            | Multi-threaded             | Single-threaded            | Multi-threaded
----------------------------+----------------------------+----------------------------+----------------------------+---------------------------
1. Long Running Process     | Thread Parked              | null returned              | Thread Parked              | Thread Parked
2. Null Returned Downstream | null returned              | null returned              | Thread Parked              | Thread Parked
3. void method Downstream   | null returned              | null returned              | Thread Parked              | Thread Parked
4. Runtime Exception        | Error handler invoked or   | Error handler invoked or   | Error handler invoked or   | Error handler invoked or
                            | exception thrown.          | exception thrown.          | exception thrown.          | exception thrown.

The key parts of this table are the conditions that lead to invoking threads being parked (red in the original table), nulls being returned (orange) and exceptions (green). Each outcome is the product of three contributors: configuration that is under the developer's control, deployed code that is under the developer's control, and runtime conditions that usually are not.

The column headings in the table above combine two aspects of a flow's configuration. The first, default-reply-timeout, is set by the SI configurer and is the amount of time that a client call is willing to wait for a response from the gateway. The second is the threading model: synchronous flows are represented as Single-threaded, asynchronous flows as Multi-threaded.

A synchronous, or single-threaded flow, is one such as the following:

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:int="http://www.springframework.org/schema/integration"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
       http://www.springframework.org/schema/beans/spring-beans.xsd
       http://www.springframework.org/schema/integration
       http://www.springframework.org/schema/integration/spring-integration-2.1.xsd">

    <import resource="common-beans.xml"/>

    <int:gateway id="enrollmentServiceGateway"
            service-interface="com.l8mdv.sample.EnrollmentServiceGateway"
            default-request-channel="gateway-request-channel"
            default-reply-channel="gateway-response-channel"
            default-reply-timeout="6000"/>

    <int:service-activator id="sleeperService" 
                           input-channel="gateway-request-channel"
                           output-channel="gateway-response-channel" 
                           ref="sleeper"/>

</beans>

The implicit input channel (gateway-request-channel) has no associated dispatcher configured, so the flow runs on the invoker's thread. An asynchronous, or multi-threaded flow, is one such as the following:

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:int="http://www.springframework.org/schema/integration"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
       http://www.springframework.org/schema/beans/spring-beans.xsd
       http://www.springframework.org/schema/integration
       http://www.springframework.org/schema/integration/spring-integration-2.1.xsd">

    <import resource="common-beans.xml"/>

    <int:gateway id="enrollmentServiceGateway"
                 service-interface="com.l8mdv.sample.EnrollmentServiceGateway"
                 default-request-channel="gateway-request-channel"
                 default-reply-channel="gateway-response-channel"
                 default-reply-timeout="6000"/>

    <int:channel id="gateway-request-channel">
        <int:dispatcher task-executor="taskExecutor"/>
    </int:channel>
    <int:service-activator id="sleeperService" 
                           input-channel="gateway-request-channel"
                           output-channel="gateway-response-channel" 
                           ref="sleeper"/>

</beans>

The explicit input channel has a dispatcher configured with a task executor ("taskExecutor"). This task executor specifies a thread pool that supplies threads for execution, and its configuration as above marks a thread boundary. Note: this is not the only way of making channels asynchronous.
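The difference between the two configurations comes down to which thread runs the downstream handler. The following sketch models that with plain JDK classes rather than Spring Integration channels; directDispatch, executorDispatch and the class name are hypothetical names chosen for the illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

class ThreadBoundarySketch {

    // Models a channel without a dispatcher executor: the handler runs on
    // the caller's thread, so there is no thread boundary.
    static String directDispatch(Supplier<String> handler) {
        return handler.get();
    }

    // Models a channel whose dispatcher has a task executor: the handler
    // runs on a pooled thread, which marks a thread boundary.
    static String executorDispatch(Supplier<String> handler) {
        ExecutorService taskExecutor = Executors.newSingleThreadExecutor();
        try {
            Callable<String> task = handler::get;
            return taskExecutor.submit(task).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            taskExecutor.shutdown();
        }
    }

    public static void main(String[] args) {
        Supplier<String> handler = () -> Thread.currentThread().getName();
        System.out.println("direct:   " + directDispatch(handler));
        System.out.println("executor: " + executorDispatch(handler));
    }
}
```

Running it shows the caller's own thread name for the direct case and a pool thread name for the executor case.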

The other configuration attribute referenced is default-reply-timeout, which is set on the gateway namespace configuration as in the example above. Note that both of these runtime aspects are set by the configurer during SI flow design and implementation. They are entirely under developer control.

The 'Runtime Events' column lists the gateway-relevant runtime events that have to be considered during gateway configuration; these are not under developer control. Trigger conditions for these events are not as unusual as one might hope.

1. Long Running Processes


It's not uncommon for thread pools to become exhausted because all pooled threads are waiting for an external resource accessed through a socket. This may be a long-running database query, a firewall keeping a connection open despite the server terminating, and so on. There is significant potential for these types of trigger. Some long-running processes terminate naturally; others never complete, and an application restart is required.
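Pool exhaustion of this kind is easy to reproduce in miniature. The sketch below uses a fixed JDK thread pool and a latch that stands in for the unresponsive external resource; the class and method names are hypothetical, and the pool sizes and timings are arbitrary.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class PoolExhaustionSketch {

    // Returns true when a newly submitted task could not complete within
    // waitMillis because the pooled threads were all blocked.
    static boolean newcomerStarved(int poolSize, int blockedTasks, long waitMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        CountDownLatch externalResource = new CountDownLatch(1);
        try {
            for (int i = 0; i < blockedTasks; i++) {
                pool.submit(() -> {
                    externalResource.await(); // simulated read that never answers
                    return null;
                });
            }
            Future<String> newcomer = pool.submit(() -> "done");
            try {
                newcomer.get(waitMillis, TimeUnit.MILLISECONDS);
                return false;
            } catch (TimeoutException e) {
                return true;
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException(e);
            }
        } finally {
            externalResource.countDown(); // release the blocked tasks
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("all threads blocked: " + newcomerStarved(2, 2, 200));
        System.out.println("one thread free:     " + newcomerStarved(2, 1, 2000));
    }
}
```

With every pooled thread blocked, the new task sits in the queue; with one thread free, it completes almost immediately.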

2. Null returned downstream


A null may be returned from a downstream SI construct such as a Transformer, Service Activator or Gateway. A Gateway may return null in some circumstances such as following a gateway timeout event.

3. Void method downstream

Any custom code invoked during an SI flow may use a void method signature. This can also be caused by configuration in circumstances where flows are determined dynamically at runtime.

4. Runtime Exception

RuntimeExceptions can be triggered during normal operation and are generally handled by catching them at the gateway or allowing them to propagate through. They are coloured green in the table above because they are generally much easier to handle than timeouts.

Gateway Timeout Handling Strategies

There are four possible outcomes from invoking a gateway with a request message, each the result of specific runtime events: a) an ordinary message response, b) an exception message, c) a null or d) no response.

Ordinary business responses and exceptions are straightforward to understand and will not be covered further in this article. The two significant outcomes that will be explored further are nulls and no response, along with strategies for dealing with them.
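As an illustration of the four outcomes from the invoker's point of view, the sketch below runs a gateway call behind a caller-side guard timeout and classifies the result. The guard is an assumption added for the example so that the "no response" case can be observed without hanging; it is not part of the gateway configuration discussed here, and all names are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

class GatewayOutcomeSketch {

    enum Outcome { REPLY, EXCEPTION, NULL, NO_RESPONSE }

    // gatewayCall stands in for a call on the gateway's service interface.
    static Outcome classify(Supplier<String> gatewayCall, long guardMillis) {
        ExecutorService guard = Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable, "gateway-guard");
            thread.setDaemon(true); // never keeps the JVM alive
            return thread;
        });
        Callable<String> task = gatewayCall::get;
        Future<String> pending = guard.submit(task);
        try {
            String reply = pending.get(guardMillis, TimeUnit.MILLISECONDS);
            return reply == null ? Outcome.NULL : Outcome.REPLY;
        } catch (ExecutionException e) {
            return Outcome.EXCEPTION;      // the call threw
        } catch (TimeoutException e) {
            return Outcome.NO_RESPONSE;    // the call never came back
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Outcome.NO_RESPONSE;
        } finally {
            guard.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(() -> "enrolled", 1000));
        System.out.println(classify(() -> null, 1000));
    }
}
```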

Generally speaking, long running processes either terminate or they do not. Long running processes that terminate may eventually return a message through the invoked gateway, or the gateway may time out first, depending on timeout configuration, in which case a null may be returned. The severity of this as a problem depends on throughput volume, the length of the long running process and system resources (thread-pool size).

Configuration exists for default-reply-timeout 

In the case where a long running process event is underway and a default-reply-timeout has been set, as long as the long running process completes before the default-reply-timeout expires, there is no problem to deal with. However, if the long running process does not complete before that timeout expires, one of three outcomes will apply.

Firstly, if the long running process terminates subsequent to the reply timeout expiry, the gateway will have already returned null to the invoker so the null response needs handling by the invoker. The thread handling the long-running process will be returned to the pool.

Secondly, if the long running process does not terminate and a reply timeout has been set, the gateway will return null to the gateway invoker but the thread executing the long-running process will not get returned to the pool.

Thirdly, and most significantly, if a default-reply-timeout has been configured but the long running process is running on the same thread as the invoker, i.e. synchronous channels supply messages to that process, the thread will not return while the process is running and the default-reply-timeout has no effect.

Assuming the most common processing scenario, a long running process completes either before or after the reply timeout expiry. When a null is returned by the gateway, the invoker is forced to deal with a null response. It's often unacceptable to force gateway consumers to deal with null responses, and it's unnecessary: with a little additional configuration, this can be avoided.
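The interaction between the reply timeout and the threading model can be modelled with a blocking queue standing in for the reply channel. This is a simplified model built from JDK classes under stated assumptions, not the framework's implementation: passing a null executor represents a synchronous flow, and sendAndReceive, slow and the class name are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

class ReplyTimeoutSketch {

    // executor == null models a synchronous (single-threaded) flow.
    static String sendAndReceive(String request, Function<String, String> handler,
                                 Executor executor, long replyTimeoutMillis) {
        BlockingQueue<String> replies = new ArrayBlockingQueue<>(1);
        Runnable flow = () -> {
            String reply = handler.apply(request);
            if (reply != null) {
                replies.offer(reply); // a null or void result produces no reply
            }
        };
        if (executor == null) {
            flow.run(); // the caller does the work before the timed wait starts
        } else {
            executor.execute(flow);
        }
        try {
            // Timed wait for the reply; null when the timeout expires first.
            return replies.poll(replyTimeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    // Simulated long running process (300 ms).
    static String slow(String request) {
        try {
            Thread.sleep(300);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "enrolled " + request;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            System.out.println("async, 50 ms timeout: " + sendAndReceive("a", ReplyTimeoutSketch::slow, pool, 50));
            System.out.println("sync,  50 ms timeout: " + sendAndReceive("a", ReplyTimeoutSketch::slow, null, 50));
        } finally {
            pool.shutdown();
        }
    }
}
```

In the asynchronous case the caller gets null after 50 ms while the pooled thread carries on working. In the synchronous case the caller is held for the full 300 ms and then receives the reply, so the timeout never had a chance to apply.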

Absent Configuration for default-reply-timeout


The most significant danger exists around gateways that have no default-reply-timeout configuration set.

A long running process or a null returned from downstream will mean that the invoking thread is parked. This is true for both synchronous and asynchronous flows and may ultimately force an application restart, because the invoker thread pool will be steadily depleted if this continues to occur.
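A parked invoker can be observed directly with a small model. Here an untimed take() on a JDK blocking queue stands in for a gateway waiting for a reply with no timeout configured; the class and method names are hypothetical, and the thread is interrupted at the end only so that the sketch itself can finish.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.locks.LockSupport;

class ParkedInvokerSketch {

    // Starts an "invoker" that waits for a reply without a timeout and
    // reports the thread state seen within observeMillis.
    static Thread.State observeInvoker(long observeMillis) {
        BlockingQueue<String> replies = new LinkedBlockingQueue<>();
        Thread invoker = new Thread(() -> {
            try {
                replies.take(); // untimed wait for a reply that never arrives
            } catch (InterruptedException e) {
                // reached only when the sketch interrupts the thread below
            }
        }, "gateway-invoker");
        invoker.setDaemon(true);
        invoker.start();

        long deadline = System.nanoTime() + observeMillis * 1_000_000L;
        Thread.State state = invoker.getState();
        while (state != Thread.State.WAITING && System.nanoTime() < deadline) {
            LockSupport.parkNanos(10_000_000L); // re-check every 10 ms
            state = invoker.getState();
        }
        invoker.interrupt(); // clean up so the sketch terminates
        return state;
    }

    public static void main(String[] args) {
        System.out.println("invoker state: " + observeInvoker(2000));
    }
}
```

In a deployed application nothing interrupts the invoker, which is why each such call permanently removes a thread from the pool.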

Spring Integration Timeout Handling Design Strategies


Spring Integration configuration designers who set default-reply-timeout on their gateways and are comfortable with gateway invokers dealing with null responses and exceptions need read no further. However, if you wish to provide clients of your gateway with a more predictable response, a couple of strategies exist for handling null responses from gateways so that invokers are protected from having to deal with them.

Firstly, the simplest solution is to wrap the gateway with a service activator. The gateway must have the default-reply-timeout attribute value set in order to avoid unnecessary parking of threads. To limit the consequences of long-running threads, it's also prudent to use a dispatcher soon after entry to the gateway, which introduces a thread boundary.

Whilst this is a valid technical approach, the impact is that we have forced a different entry point to our message sub-system. Entry is now via a Service Activator rather than a Gateway. A side effect of this change is that the testing entry point changes: integration tests that would normally reference a gateway to send a message now have to locate the backing implementation for the Service Activator, which is not ideal.

An alternative approach toward solving this problem would be to configure two gateways with a Service Activator between them. Only one of the gateways would be exposed to invokers, the outer one. Both Gateways would reference the same service interface. The outer gateway specification would not specify the default-reply-timeout but would specify the input and output channels in the same way that a single gateway would. The Service Activator between the Gateways would handle null gateway responses and possibly any exceptions if preferred to the gateway error handler approach.

An example is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:int="http://www.springframework.org/schema/integration"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
       http://www.springframework.org/schema/beans/spring-beans.xsd
       http://www.springframework.org/schema/integration
       http://www...org/schema/integration/spring-integration-2.1.xsd">

    <import resource="common-beans.xml"/>

    <int:gateway id="enrollmentServiceGateway"
                 service-interface="com.l8mdv.sample.EnrollmentServiceGateway"
                 default-request-channel="gateway-request-channel"/>

    <!-- This service activator has the adapted gateway injected as a bean and
    calls the adapted gateway from within the service method directly. -->
    <int:service-activator id="enrollmentServiceGatewayHandler"
                           input-channel="gateway-request-channel">
        <bean class="com.l8mdv.sample.EnrollmentServiceGatewayHandler">
            <constructor-arg name="enrollmentServiceAdaptedGateway"
                             ref="enrollmentServiceAdaptedGateway"/>
        </bean>
    </int:service-activator>

    <int:gateway id="enrollmentServiceAdaptedGateway"
                 service-interface="com.l8mdv.sample.EnrollmentServiceGateway"
                 default-request-channel="adapted-gateway-request-channel"
                 default-reply-timeout="6000"/>

    <int:channel id="adapted-gateway-request-channel" 
                 datatype="java.lang.String">
        <int:dispatcher task-executor="taskExecutor"/>
    </int:channel>

</beans>

The Service Activator bean (enrollmentServiceGatewayHandler) deals with both null and exception responses from the adapted gateway (enrollmentServiceAdaptedGateway). Where either occurs, it generates a business response detailing the error.
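The handler class itself is not shown in this article, so the following is only a sketch of what it might look like. The enroll method, the String request and reply types, and the error text are assumptions; only the class name and the constructor argument name come from the configuration above. In a real project both types would be public and live in their own files.

```java
// Hypothetical service interface; the article does not show its methods.
interface EnrollmentServiceGateway {
    String enroll(String request);
}

// Sketch of the bean wired into the service activator: it calls the adapted
// gateway and shields the outer gateway's invokers from nulls and exceptions.
class EnrollmentServiceGatewayHandler {

    private final EnrollmentServiceGateway enrollmentServiceAdaptedGateway;

    EnrollmentServiceGatewayHandler(EnrollmentServiceGateway enrollmentServiceAdaptedGateway) {
        this.enrollmentServiceAdaptedGateway = enrollmentServiceAdaptedGateway;
    }

    // Always returns a non-null business response and never throws.
    String enroll(String request) {
        try {
            String reply = enrollmentServiceAdaptedGateway.enroll(request);
            if (reply == null) {
                // The adapted gateway timed out or nothing was returned downstream.
                return "ERROR: no reply received from enrollment service";
            }
            return reply;
        } catch (RuntimeException e) {
            return "ERROR: " + e.getMessage();
        }
    }
}
```

Because the handler always returns a value, the outer gateway always has a reply to hand back to its invoker.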

Spring Integration R2.1 Changes
async-executor on gateway spec

1 comment:

  1. Hello,
    very nice article.
    I am trying to use timeout behaviour in case db table is locked for write.
    I have dispatcher channel configured in gateway request channel and get the following behaviour:
    1. gateway returns null to the caller after timeout
    2. I'd expect the threads in dispatcher channel terminate (get returned to the pool) after timeout (10s) but that's not the case. They are waiting for the lock for 1 min and then resume (after db table is unlocked) resulting in completed transaction.
    3. My intention was to use timeout as a trigger for retry. This way I can't rely on that since although null is returned to the caller the service activator invoked by dispatcher thread just resumes its execution.

    Any hint?
    Thanks,
    Milan
