Friday, August 14, 2009

Classload acrobatics: code generation under OSGi

In a previous blog I mentioned that the hardest problem we face when porting existing Java infrastructure to OSGi has to do with class loading. This blog is dedicated to the AOP wrappers, ORM mappers and similar code generation engines that face the harshest issues in this area. I will gradually introduce the main problem, present the best current solution, and develop a tiny bit of code that implements it. This blog comes with a working demo project that contains not only the code presented here, but also two ASM-based code generators you can play with.

Classload site conversion

Usually porting a Java framework to OSGi requires it to be refactored to the extender pattern. This pattern allows the framework to delegate all class loading to OSGi and at the same time retain control over the lifecycle of application code. The goal of the conversion is to replace things like

Class appClass = Class.forName("com.acme.devices.SinisterEngine");
...
ClassLoader appLoader = ...
Class appClass = appLoader.loadClass("com.acme.devices.SinisterEngine");

with

Bundle appBundle = ...
Class appClass = appBundle.loadClass("com.acme.devices.SinisterEngine");

Although we must do a non-trivial piece of work to get OSGi to load the application code for us, we at least have a nice and correct way to get things working. And work they will, even better than before, because now the user can add and remove applications just by installing and uninstalling bundles in the OSGi container. The user can also break up an application into as many bundles as they wish, share libraries between applications, and enjoy all that sweet modular stuff.

Adapter ClassLoader

Sometimes the code we convert has externalized its class loading policy. This means the classes and methods of the framework take explicit ClassLoader parameters, allowing us to dictate where they load application code from. In this case the conversion to OSGi can become a mere question of adapting a Bundle object to the ClassLoader API. This is done by what I call an adapter ClassLoader.

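import org.osgi.framework.Bundle;

public class BundleClassLoader extends ClassLoader {
  private final Bundle delegate;

  public BundleClassLoader(Bundle delegate) {
    this.delegate = delegate;
  }

  /*
   * Override the two-argument variant: it is the one invoked both by
   * loadClass(String) and by any child loader that uses this adapter as its parent.
   */
  @Override
  protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
    Class<?> type = delegate.loadClass(name);
    if (resolve) {
      resolveClass(type);
    }
    return type;
  }
}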

Now we can pass this adapter to the framework code. We can also add bundle tracking code to create the adapters as new bundles come and go. I.e. we are able to adapt a Java framework to OSGi "externally" avoiding the exhausting browsing through the codebase and the conversion of each individual classload site. Here is a highly schematic sample of some code that converts a framework to use OSGi class loading:

...
Bundle app = ...
BundleClassLoader appLoader = new BundleClassLoader(app);

DeviceSimulationFramework simfw = ...
simfw.simulate("com.acme.devices.SinisterEngine", appLoader);
...

Bridge ClassLoader

The coolest Java frameworks do fancy classworking on client code at runtime. The goal usually is to dynamically build classes out of stuff living in the application class space. Some examples are service proxies, AOP wrappers, and ORM mappers. Let's call these generated classes enhancements. Usually the enhancement implements some application-visible interface or extends an application-visible class. Sometimes additional interfaces and their implementations are mixed in as well.

Enhancements augment application code. I.e. the generated objects are meant to be called directly by the application. For example a service proxy is passed to business code to free it from the need to track a dynamic service. Similarly a wrapper that adds some AOP feature is passed to application code in place of the original object.

Enhancements start life as byte[] blocks produced by your favorite class engineering library (ASM, BCEL, CGLIB, ...). Once we have generated our class we must turn the raw bytes into a Class object. I.e. we must make some ClassLoader call its defineClass() method on our bytes. We have three separate problems to solve:

  • Class space completeness
    First we must determine the class space into which we can define our enhancements. It must "see" enough classes to allow the enhancements to be fully linked.
  • Visibility
    ClassLoader.defineClass() is a protected method. We must find a good way to call it.
  • Class space consistency
    Enhancements mix classes from the extender and the application bundles in a way that is "invisible" to the OSGi container. As a result the enhancements can potentially be exposed to incompatible versions of the same class.

Class space completeness

Enhancements are backed by code private to the Java framework that generates them. Therefore the extender should introduce the new class into its own class space. On the other hand the enhancements implement interfaces or extend classes visible in the application class space. Therefore we should define the enhancement class there. Bummer!

Because there is no class space that sees all classes we require we have no other option but to make a new class space. A class space equals a ClassLoader instance so our first job is to maintain one dedicated ClassLoader on top of every application bundle. These are called bridge ClassLoaders, because they merge two class loaders by chaining them like so:

public class BridgeClassLoader extends ClassLoader {
  private final ClassLoader secondary;

  public BridgeClassLoader(ClassLoader primary, ClassLoader secondary) {
    super(primary);
    this.secondary = secondary;
  }

  @Override
  protected Class<?> findClass(String name) throws ClassNotFoundException {
    return secondary.loadClass(name);
  }
}

Now we can use the BundleClassLoader developed earlier:

  /* Application space */
  Bundle app = ...
  ClassLoader appSpace = new BundleClassLoader(app);

  /*
   * Extender space
   *
   * We assume this code is executed in a non-static method inside the extender
   */
  ClassLoader extSpace = getClass().getClassLoader();

  /* Bridge */
  ClassLoader bridge = new BridgeClassLoader(appSpace, extSpace);

This loader will serve requests first from the application space, and if that fails try the extender space. Notice that we still let OSGi do lots of heavy lifting for us. When we delegate to either class space we are in fact delegating to an OSGi-backed ClassLoader. I.e. the primary and secondary loaders can delegate to other bundle loaders in accordance with the import/export metadata of their respective bundles.
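The search order can be tried out without an OSGi container. The following is a minimal sketch in plain Java: DemoBridge has the same shape as the bridge above, while RecordingSpace and BridgeOrderDemo are hypothetical stand-ins I made up to play the role of the two class spaces and to log each request they receive.

```java
import java.util.ArrayList;
import java.util.List;

/** Same shape as the bridge above; the secondary loader is stored explicitly. */
class DemoBridge extends ClassLoader {
  private final ClassLoader secondary;

  DemoBridge(ClassLoader primary, ClassLoader secondary) {
    super(primary);               // the inherited loadClass() asks the parent first
    this.secondary = secondary;
  }

  @Override
  protected Class<?> findClass(String name) throws ClassNotFoundException {
    return secondary.loadClass(name);   // reached only after the primary failed
  }
}

/** Hypothetical stand-in for a class space: logs each request, then delegates to its parent. */
class RecordingSpace extends ClassLoader {
  private final String label;
  private final List<String> log;

  RecordingSpace(String label, ClassLoader parent, List<String> log) {
    super(parent);
    this.label = label;
    this.log = log;
  }

  @Override
  protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
    log.add(label + ":" + name);
    return super.loadClass(name, resolve);
  }
}

class BridgeOrderDemo {
  /** Loads one class from each space through the bridge and returns the request log. */
  static List<String> run() {
    List<String> log = new ArrayList<>();
    // Plays the application space: sees only what the bootstrap loader provides.
    ClassLoader primary = new RecordingSpace("primary", null, log);
    // Plays the extender space: additionally sees the classes of this demo.
    ClassLoader secondary =
        new RecordingSpace("secondary", BridgeOrderDemo.class.getClassLoader(), log);
    ClassLoader bridge = new DemoBridge(primary, secondary);
    try {
      bridge.loadClass("java.util.ArrayList");
      bridge.loadClass(BridgeOrderDemo.class.getName());
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
    return log;
  }

  public static void main(String[] args) {
    run().forEach(System.out::println);
  }
}
```

The log shows a single request for java.util.ArrayList, answered by the primary, followed by two requests for the demo class: one that the primary cannot satisfy and one that the secondary can.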

At this point we might be pleased with ourselves - I was for quite some time. The bitter truth however is that the extender and application class spaces combined may not be enough. Everything hinges on the particular way the JVM links classes (also known as resolving classes).

In brief
JVM resolution works at a fine-grained, sub-class level.

In detail
When the JVM links a class it does not need the complete descriptions of all classes referenced from the linked class. It only needs information about the individual methods, fields and types that are really used by the linked class. What our intuition sees as a monolithic whole is, to the JVM, a class name, plus a superclass, plus a set of implemented interfaces, plus a set of method signatures, plus a set of field signatures. All these symbols are resolved independently and lazily. For example, to link a method call the class space of the caller needs to supply Class objects only for the target class and for all types used in the method signature. Definitions for the numerous other things that the target class may contain are not needed, and the ClassLoader of the calling class will never receive a request for them.

Formally
Class TA from class space SpaceA must be represented by the same Class object in class space SpaceB if and only if:

  • There exists a class TB from SpaceB that refers to TA from its symbol table (also known as the constant pool).
  • The OSGi container has chosen SpaceA as the provider of class TA for SpaceB.

By example
Imagine we have a bundle BndA that exports a class A. Class A has 3 methods distributed between 3 interfaces: IX.methodX(String), IY.methodY(String), IZ.methodZ(String). Imagine further we have a bundle BndB that has a class B. Somewhere in class B there is a reference A a = ... and a method call a.methodY("hello!"). To get class B to resolve we need to introduce into the class space of BndB class A, and class String. That's all! We don't need to import IX or IZ. We don't need to import even IY because class B does not use it - it uses only A. On the other hand when the exporting bundle BndA resolves class A it must supply IX, IY, IZ because they are directly referenced as interfaces implemented by class A. Finally even BndA does not have to supply any of the super-interfaces of IX, IY, IZ because they are not directly referenced from A.

Now let's imagine we want to present to class B an enhanced version of class A. The enhancement needs to extend class A and override some or all of its methods. Because of that the enhancement needs to see the classes used in the signatures of all overridden methods. To supply all required classes BndB must contain code that calls each method we mean to override. Otherwise it will have no reason to import the required classes. It is very likely however that BndB calls only a few of A's methods. Therefore BndB likely does not see enough classes to support the enhancement. The complete set can only be supplied by BndA. Double Bummer!

Turns out that we must bridge not the extender and application spaces but the extender space and the space of the enhanced class. I.e. rather than "bridge per application space" we must shift to "bridge per enhanced space". The application really requires us to bridge the class space of some third-party class it can see because its bundle imports it. How do we make that transitive leap from the application space to the space the application uses? Simple! Every Class object can tell us which class space it is fully defined in. For example, all we need to do to get the defining class loader of A is to call A.class.getClassLoader(). In many cases however we have a String name rather than a Class object, so how do we get A.class to begin with? Simple again! We can ask the application bundle to give us the exact Class object it sees under the name "A". This is a critical step because we need the enhanced and original classes to be interchangeable within the application. Out of potentially many available versions of class A we need to pick the class space of the one used by the application. Here is a schematic of how an extender can maintain a cache of class loader bridges:

...
/* Ask the app to resolve the target class */
Bundle app = ...
Class target = app.loadClass("com.acme.devices.SinisterEngine");

/* Get the defining class loader of the target */
ClassLoader targetSpace = target.getClassLoader();

/* Get the bridge for the class space of the target */
BridgeClassLoaderCache cache = ...
ClassLoader bridge = cache.resolveBridge(targetSpace);

where the bridge cache would look something like

import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

public class BridgeClassLoaderCache {
  private final ClassLoader primary;
  private final Map<ClassLoader, WeakReference<ClassLoader>> cache;

  public BridgeClassLoaderCache(ClassLoader primary) {
    this.primary = primary;
    this.cache = new WeakHashMap<ClassLoader, WeakReference<ClassLoader>>();
  }

  public synchronized ClassLoader resolveBridge(ClassLoader secondary) {
    ClassLoader bridge = null;

    WeakReference<ClassLoader> ref = cache.get(secondary);
    if (ref != null) {
      bridge = ref.get();
    }

    if (bridge == null) {
      bridge = new BridgeClassLoader(primary, secondary);
      cache.put(secondary, new WeakReference<ClassLoader>(bridge));
    }

    return bridge;
  }
}

To prevent memory leaks due to ClassLoader retention I had to use both weak keys and weak values. The goal is to not retain the class space of an uninstalled bundle in memory. I had to use weak values because the value of each map entry strongly references the key, thus negating its weakness. This is the standard advice prescribed by the WeakHashMap javadoc. By using a weak cache I avoid the need to track a whole lot of bundles and react eagerly to their lifecycles.
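One detail of the lookup that feeds this cache is worth trying in isolation: the loader we ask for a class (the initiating loader) is often not the loader that defined it, and it is the defining loader that becomes the cache key. Below is a minimal sketch in plain Java, with an anonymous child loader standing in for the application bundle; DefiningLoaderDemo and definingLoaderOf are names made up for this illustration.

```java
final class DefiningLoaderDemo {
  /** Asks the initiating loader for a class and reports the loader that actually defined it. */
  static ClassLoader definingLoaderOf(ClassLoader initiating, String className) {
    try {
      return initiating.loadClass(className).getClassLoader();
    } catch (ClassNotFoundException e) {
      throw new IllegalArgumentException(className + " is not visible from " + initiating, e);
    }
  }

  public static void main(String[] args) {
    ClassLoader own = DefiningLoaderDemo.class.getClassLoader();

    // A child loader that defines nothing itself: it only initiates loading.
    ClassLoader initiating = new ClassLoader(own) { };

    ClassLoader defining = definingLoaderOf(initiating, DefiningLoaderDemo.class.getName());
    System.out.println(defining == own);         // true: the class was defined by our own loader
    System.out.println(defining == initiating);  // false: the child merely delegated
  }
}
```

Keep in mind that getClassLoader() is documented to possibly return null for classes defined by the bootstrap loader. WeakHashMap accepts a null key and a ClassLoader accepts a null parent, so the cache above should tolerate that case, but I have not tested it inside an OSGi container.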

Visibility

Okay, we finally have our exotic bridge class space. Now how do we define our enhancements in it? The problem, as I mentioned, is that defineClass() is a protected method of BridgeClassLoader. We could override it with a public method, but that would be rude and we would have to code our own checks to see if the requested enhancement has already been defined. Normally defineClass() is called from findClass() when it determines it can supply the requested class from a binary source. The only information findClass() can rely on to make this decision is the name of the class. So our BridgeClassLoader must think to itself:

This is a request for "A$Enhanced" so I must call the enhancement generator for class "A"! Then I call defineClass() on the produced byte[]. Then I return the new Class object.

There are two remarkable things about that statement.

  • We introduced a text protocol for the names of enhancement classes.
    We can pass to our ClassLoader a single item of data - a String for the name of the requested class. At the same time we need to pass two items of data - the name of the original class and a flag marking it as a subject of enhancement. We pack these two items into a single string of the form [name of target class]"$Enhanced". Now findClass() can look for the enhancement marker $Enhanced and, when it is present, extract the name of the target class. In this way we also introduce a convention for the names of our enhancements. Whenever we see a class name ending in $Enhanced in a stack trace we know this is a dynamically generated class. To mitigate the risk of name clashes with normal classes we make the enhancement marker as exotic as Java allows (e.g. $__service_proxy__).
  • Enhancements are generated on demand.
    We will never try to generate an enhancement twice. The loadClass() method we inherited will first call findLoadedClass(), if that fails it will call parent.loadClass(), and only if that fails it will call findClass(). The fact that we use a strict protocol for the names guarantees findLoadedClass() will work the second time we get a request to enhance the same class. Couple this with the caching of bridge ClassLoaders and we get a pretty efficient solution where at no point we bridge the same bundle space twice or generate redundant enhancement classes.
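The text protocol from the first point can be sketched in a few lines. This is only an illustration under my own assumptions: the two methods mirror the Namer interface that appears in the final listing of this post, while SuffixNamer is a hypothetical implementation that is not part of the demo project.

```java
/** The naming protocol as a pair of inverse operations. */
interface Namer {
  /** Map a target class name to an enhancement class name. */
  String map(String targetClassName);

  /** Try to extract a target class name or return null. */
  String unmap(String className);
}

/** Hypothetical implementation: the target name followed by a fixed marker. */
final class SuffixNamer implements Namer {
  private final String marker;

  SuffixNamer(String marker) {
    if (marker == null || marker.isEmpty()) {
      throw new IllegalArgumentException("The enhancement marker must not be empty");
    }
    this.marker = marker;
  }

  @Override
  public String map(String targetClassName) {
    return targetClassName + marker;
  }

  @Override
  public String unmap(String className) {
    // Only names that carry the marker and still have a target part are enhancements.
    if (className.length() > marker.length() && className.endsWith(marker)) {
      return className.substring(0, className.length() - marker.length());
    }
    return null;
  }
}
```

With the marker "$__service_proxy__", map("com.acme.devices.SinisterEngine") yields "com.acme.devices.SinisterEngine$__service_proxy__", unmap() of that name returns the original, and unmap() of a plain class name returns null so that findClass() can move on.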

Here we must also mention the option to call defineClass() through reflection. This approach is used by cglib. I suppose this is a viable option when we want the user to pass us a ready-to-use ClassLoader. By using reflection we avoid the need to create yet another loader on top of that just so we can access its defineClass() method.

Class space consistency

At the end of the day what we have done is merge two class spaces that were not explicitly connected through the OSGi modular layer. We also introduced a search order between those spaces, similar to the search order of the evil Java class path. I.e. we have potentially eroded the class space consistency of the OSGi container. Here is a scenario of how bad things can happen:

  1. Extender uses package com.acme.devices and requires exactly version 1.0
  2. Application uses package com.acme.devices and requires exactly version 2.0.
  3. Class A refers directly to com.acme.devices.SinisterDevice.
  4. Class A$Enhanced refers directly to com.acme.devices.SinisterDevice from its internal implementation.
  5. Because we search the application space first, A$Enhanced will be linked against com.acme.devices.SinisterDevice version 2.0, while its internal code was compiled against com.acme.devices.SinisterDevice version 1.0.

As a result the application will see mysterious LinkageErrors and/or ClassCastExceptions. Triple Bummer!

Alas there does not yet exist an automated way to handle this problem. We must simply make sure the enhancement code refers directly only to "very private" implementation classes that are not likely to be used by anyone else. We can even build private adapters for any external APIs we might want to use and then refer to those from the enhancement code. Once we have a well defined implementation subspace we can use that knowledge to limit the class leakage. We now delegate to the extender only requests for the special subset of private implementation classes. This also limits the search order problem, allowing us to switch between application-first and extender-first search. One good policy to keep things under control is to have a dedicated package for all enhancement implementations. Then we only check for classes whose name begins with that package. Finally we sometimes need to judiciously relax this isolation policy for certain singleton packages like org.osgi.framework. I.e. we can feel pretty safe to compile our enhancement code directly against org.osgi.framework because at runtime everyone in the OSGi container will see the same org.osgi.framework - it is supplied by the OSGi core.
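The package-based policy can be sketched as a simple name filter. This is a minimal illustration under my own assumptions: IsolationPolicy and the package com.example.enhancer.impl are hypothetical names, and the check treats sub-packages of a listed package as part of it, which may or may not suit a particular extender.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical filter deciding which class names may be served from the extender space. */
final class IsolationPolicy {
  private final String implPackage;
  private final List<String> sharedPackages;

  /**
   * @param implPackage    dedicated package holding the enhancement implementation classes
   * @param sharedPackages singleton packages that everyone in the container sees identically
   */
  IsolationPolicy(String implPackage, String... sharedPackages) {
    this.implPackage = implPackage;
    this.sharedPackages = Arrays.asList(sharedPackages);
  }

  /** Test if a class name belongs to the private implementation subspace. */
  boolean isInternal(String className) {
    if (inPackage(className, implPackage)) {
      return true;
    }
    for (String shared : sharedPackages) {
      if (inPackage(className, shared)) {
        return true;
      }
    }
    return false;
  }

  /** The trailing dot keeps "impl" from also matching e.g. "implementation". */
  private static boolean inPackage(String className, String packageName) {
    return className.startsWith(packageName + ".");
  }
}
```

A policy built as new IsolationPolicy("com.example.enhancer.impl", "org.osgi.framework") accepts com.example.enhancer.impl.ServiceHolder and org.osgi.framework.BundleContext, and rejects com.acme.devices.SinisterDevice, which therefore has to come from the class space of the enhanced class.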

Putting it all together

Everything from this class loading saga can be distilled in the following ~100 lines of code.

import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

public class Enhancer {
  private final ClassLoader privateSpace;
  private final Namer namer;
  private final Generator generator;
  private final Map<ClassLoader, WeakReference<ClassLoader>> cache;

  public Enhancer(ClassLoader privateSpace, Namer namer, Generator generator) {
    this.privateSpace = privateSpace;
    this.namer = namer;
    this.generator = generator;
    this.cache = new WeakHashMap<ClassLoader, WeakReference<ClassLoader>>();
  }

  @SuppressWarnings("unchecked")
  public <T> Class<T> enhance(Class<T> target) throws ClassNotFoundException {
    ClassLoader context = resolveBridge(target.getClassLoader());
    String name = namer.map(target.getName());
    return (Class<T>) context.loadClass(name);
  }

  private synchronized ClassLoader resolveBridge(ClassLoader targetSpace) {
    ClassLoader bridge = null;

    WeakReference<ClassLoader> ref = cache.get(targetSpace);
    if (ref != null) {
      bridge = ref.get();
    }

    if (bridge == null) {
      bridge = makeBridge(targetSpace);
      cache.put(targetSpace, new WeakReference<ClassLoader>(bridge));
    }

    return bridge;
  }

  private ClassLoader makeBridge(ClassLoader targetSpace) {
    /* Use the target space as a parent to be searched first */ 
    return new ClassLoader(targetSpace) {
      @Override
      protected Class<?> findClass(String name) throws ClassNotFoundException {
        /* Is this used privately by the enhancements? */
        if (generator.isInternal(name)) {
          return privateSpace.loadClass(name);
        }

        /* Is this a request for enhancement? */
        String unpacked = namer.unmap(name);
        if (unpacked != null) {
          byte[] raw = generator.generate(unpacked, name, this);
          return defineClass(name, raw, 0, raw.length);
        }

        /* Ask someone else */
        throw new ClassNotFoundException(name);
      }
    };
  }
}

public interface Namer {
  /** Map a target class name to an enhancement class name. */
  String map(String targetClassName);

  /** Try to extract a target class name or return null. */
  String unmap(String className);
}

public interface Generator {
  /** Test if this is a private implementation class. */
  boolean isInternal(String className);

  /** Generate enhancement bytes */
  byte[] generate(String inputClassName, String outputClassName, ClassLoader context);
}

Enhancer captures only the bridging pattern. I have externalized the code generation logic into a pluggable Generator. The generator receives a context ClassLoader from which it can pull classes and use reflection on them to drive the code generation. The text protocol for the enhancement class names is also pluggable via the Namer interface. Here is a final schematic of how such an enhancement framework can be used:

...
/* Setup the Enhancer on top of the current class space */
ClassLoader privateSpace = getClass().getClassLoader();
Namer namer = ...;
Generator generator = ...;
Enhancer enhancer = new Enhancer(privateSpace, namer, generator);
...

/* Enhance some class the app sees */
Bundle app = ...
Class target = app.loadClass("com.acme.devices.SinisterEngine");
Class<SinisterDevice> enhanced = enhancer.enhance(target);
...

The Enhancer framework presented above is more than pseudocode. In fact, while researching this blog I really built it and tested it with two separate code generators running side by side in the same OSGi container. The result was too much fun to keep to myself so I put it up on Google Code for everyone to play with:

Enhancer

Those interested in the class generation process itself can examine the two demo ASM-based generators. Those who read the InfoQ article on service dynamics may notice that the proxy generator uses as private implementation the ServiceHolder code presented there. I try to put my code where my mouth is.

Conclusion

The classload acrobatics presented here are used in a number of infrastructural frameworks under OSGi. Classload bridging is used by Guice, Peaberry and Spring Dynamic Modules to get their AOP wrappers and service proxies to work. I have seen a classload adapter used on EclipseLink JPA to get it working on a non-Equinox OSGi container. When I hear the Spring guys say they did serious work on Tomcat to adapt it to OSGi I imagine they had to do classload site conversion or a more serious refactor to externalize Tomcat's servlet class loading altogether.

Acknowledgements

Many of the lessons in this blog were extracted from the excellent code Stuart McCulloch wrote for Guice and Peaberry. For examples of industrial strength classload bridging look here and here. There you will see how to handle some additional aspects like security, the system class loader, better lazy caching, and concurrency. Thank you Stuart!

I am also indebted to this article by Peter Kriens. Before I read it I thought I had OSGi class loading figured out. I hope my meticulous explanations of JVM linking will be a useful contribution to Peter's work. Thank you Peter!
