Wednesday, August 19, 2009

Stateful vs Stateless

In this entry I will cautiously venture into the thorny territory of stateless-vs-stateful. For quite some time I have been frying my brain in attempts to introduce some structure into this space. The goal is to define patterns and practices about working with state under OSGi. Generally under OSGi bundles communicate with services so I will do my musings from this perspective.

Hotswapping and the Service Use Spike

"Why not simply use state as we like?" you might ask. As is common with OSGi the short answer is "Because of dynamics!". The longer answer is that under OSGi a bundle can be updated at runtime and this will kill any state accumulated inside the bundle. Ideally we want client bundles to perceive this event as a hotswap of all services the updated bundle exports. I will define a service as being capable of hotswap as

The ability for callers to swap the live service object and experience a timing hiccup as the only side effect.

Using services under OSGi always requires us to go through the same steps:

  1. get
    Service object is obtained.
  2. use
    Service object is called one or more times.
  3. release
    Service object is released

To reflect the peak of activity at the use step I decided to call this sequence the service use spike. The spike "rises" in the get step and "falls" in the release step. You can imagine these spikes drawn on a time axis as the pulse of some odd creature. Obviously a service can be hotswapped safely between the use spikes. The event of a bundle update can be represented as a vertical line cutting through the time axis. Because an update can happen at any time inevitably it will cut through some use spikes. The type of service determines how the bundles who own the interrupted spikes deal with the failure.

Stateless Services

These are the "classic" services. The result of call to a stateles service depends only on the passed parameters - i.e. is a direct function of the parameters. Note that there is no problem for the results to include side effects including database modifications. The single and rather big constraint that makes a service stateles is...

A service qualifies as stateles if it does not store unrecoverable data in the objects comprising it's implementation.

Here unrecoverable basically means data kept in memory rather than externalized into some data store. I.e. once the service is unregistered the data is lost to all clients. This definition permits stateles services to build up state as long they remain indistinguishable from a service that calculates everything at every method call. I.e. we permit intermediate computation caching to improve performance.

Logically the use spike of stateles services spans a single method call. Therefore to retry a use spike we need only the information about this call. In most cases this contextual information is available on the stack of the current thread. E.g. the thread is at a point where it tries to call out to the service and discovers it is not available. It is than free to retry the call against an alternative service or if none are present wait until it appears. This type of behavior is even supported by the OSGi standard service tracking utility.

For performance reasons the use spike of stateles services can be extended to span multiple method calls. This can be done by serving the get step from a cache that in turn obtains the services from the dynamic environment and holds on to them for a while. The release step is than postponed and the use spike is artificially extended. The net result - we do much less get and release than use and performance goes up.

Stateless services are used simultaneously by multiple clients and virtually always need to deal with issues of concurrent access.

Stateful Services

A result from a call to a stateful service depends on previous calls. Here two subsequent calls with identical parameters can yield different results. You can think of these as services with history where every call mutates some internal state and thus influences the outcome of subsequent calls. Because the correctness of our program depends on the accumulated history we care dearly to call the exact same object every time. This means that hotswapping stateful services is in the general case impossible.

Obviously the use spike of stateful services spans many calls, because the user must hold on to a service object as state is accumulated with every subsequent call. If the spike is interrupted at the N-th call to do a retry we need to playback all (N-1) calls leading up to the failure. It is impractical or downright impossible to keep such a call log. For this reason we can not hotswap stateful services. I.e. an update to a stateful service can not go unnoticed. We treat the disappearing of a stateful service as a catastrophic event. I.e. we throw an exception, unroll the stack up to a fault barrier, cleanup any private state we have associated with the stateful service, do contingency reactions.

Stateless services can serve only one client at a time and often are accessed sequentially by that client. So we can expect less concurrency issues here.

Because each client needs it's own copy of the service we need a factory mechanism via, which clients can produce new instances. The default OSGi factory mechanism is too restrictive. It caches service instances on per-bundle basis and does not allow parameters to be passed at construction time. For this reason often stateful services come with a supporting stateles factory service. This service has one or more factory methods that clients call to obtain the "real" stateful service objects. This approach has the drawback that the OSGi container no longer has visibility over these secondary services. As a consequence we count on the clients to not forget to call a close()/dispose() method on the stateful object when they are done with it. OSGi definitely needs an improved factory mechanism if it is to support stateful services as equal citizens. It must support construction parameters and relinquish the caching policy into the hands of the exporting bundle. This is a theme for a future post.

Once we have state we must worry about mutating it in a correct sequence. The beginning and the end of this sequence become particularly interesting when part of the state is stored into an external service. Under OSGi dynamics we can not know when the external service will come and go. So how can we keep our half of the state in lockstep with the dynamic half? This is how we arrive at the notion of a lifecycle and to the problem of how and when to synchronize the lifecycles of dependent "lumps of state". In short:

As soon as we add even one bit of mutable state (e.g. an availability flag) to our service, we also need to add lifecycle to manage that bit.

A service interface plus a lifecycle are one common definition of component. And here you have it! Managing stateful services leads to the need of a strong component model. What are the further repercussions of this I will analyze in future posts.

Conclusion

State is a fact of life in a general purpose programming setting. It is simply not convenient or practical to always externalize the state out of our service so we make it stateles. One example is a bundle that implements a network stack. The clients of this stack need to use Connection objects directly. It would be an atrocity to force them to hold on to a ConnectinData object and pass it to a stateles ConnectionService every time they want to read or write.

Here in lies the dilemma of state: an explicit separation between code (locked in stateles services) and data (locked in DTOs) permits a scalable, composable, hotswappable design. All nice features of a functional-like programming style. At the same time this flies in the face of conventional OO design. This stark switch in style between design in the small (inside the bundle) and design in the large (composing bundles into systems) can be a source of more problems than just going ahead and using state where it makes sense.

In this post I tried to identify the problems caused by state. The idea was to invent some reference framework to think about this stuff so I can get to real solutions in later posts. If OSGi is to become the universal JVM middleware it aspires to be we need clear practices and frameworks to deal with state. Even if the ultimate answer turns out to be "Just don't use stateful services!" the exploration leading up to that is worth it.