I guess I should've covered why I named this blog the way I did in the intro post a few months back, but better late than never ;)
One of the things I like most about Coherence is how simple it makes interop between various platforms. While the cluster-side code has to be written in Java, writing client applications is equally simple whether you write them in Java, C# or C++. There are no web services or other heavyweight technologies to make your life miserable -- you simply code your application against the appropriate client library, and the client library takes care of data marshaling and the low-level communication details between the client and the cluster. Best of all, the API is the same one you would use within the cluster and, apart from minor platform idiosyncrasies, it is equivalent across the supported platforms.
There are two underlying technologies that make this possible. The first one is Coherence*Extend, a low-level, TCP/IP-based messaging protocol that is used for communication between the client and the cluster. While it is a truly amazing piece of software in its own right, Coherence*Extend by itself wouldn't help much if it weren't for the second piece of the puzzle -- Portable Object Format, or POF.
Portable Object Format
POF is a platform-independent serialization format that allows you to encode equivalent Java, .NET and C++ objects into an identical sequence of bytes. In other words, for classes with identical data members, the bytes on the wire will be exactly the same, regardless of the platform.
For example, let's assume that we have the following Java class:
    public class Person {
        private Long m_id;
        private String m_name;
        public Long getId() { return m_id; }
        public void setId(Long id) { m_id = id; }
        public String getName() { return m_name; }
        public void setName(String name) { m_name = name; }
    }
And an equivalent class (in terms of data members) written in C#:
    using System;

    public class Person
    {
        private Int64 m_id;
        private String m_name;
        public Int64 Id
        {
            get { return m_id; }
            set { m_id = value; }
        }
        public String Name
        {
            get { return m_name; }
            set { m_name = value; }
        }
    }
The POF serialized form of instances of these two classes holding the same values would be identical.
Implementing POF Serialization
In order to make your objects portable, you need to implement serialization code by hand. While this definitely isn't the most exciting code you'll ever write, it is quite simple and typically doesn't take more than a minute or two per class.
There are two ways to implement POF serialization. The first one is to implement the PortableObject interface (IPortableObject in .NET):
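    import java.io.IOException;
    import com.tangosol.io.pof.PofReader;
    import com.tangosol.io.pof.PofWriter;
    import com.tangosol.io.pof.PortableObject;

    public class Person implements PortableObject {
        private Long m_id;
        private String m_name;
        ...
        // Read each property using the same index that writeExternal uses.
        public void readExternal(PofReader reader) throws IOException {
            m_id = reader.readLong(0);
            m_name = reader.readString(1);
        }
        // Write each property under a fixed integer index instead of a name.
        public void writeExternal(PofWriter writer) throws IOException {
            // m_id is unboxed to a primitive long here, so it must not be null.
            writer.writeLong(0, m_id);
            writer.writeString(1, m_name);
        }
    }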
As you can see, making a class portable is not rocket science -- you simply implement the PortableObject interface and use the appropriate PofReader and PofWriter methods to read and write class members to a POF stream.
However, what if we don't have the source code for the class, or are not allowed to modify it? We can still make it portable using the second approach, which is to implement an external serializer. This is exactly what we will do for our .NET class:
    using System;
    using Tangosol.IO.Pof;

    public class PersonSerializer : IPofSerializer
    {
        // Write each property under the same index the Java class uses.
        public void Serialize(IPofWriter writer, Object obj)
        {
            Person p = (Person) obj;
            writer.WriteInt64(0, p.Id);
            writer.WriteString(1, p.Name);
            writer.WriteRemainder(null);
        }
        // Read the properties back in index order and rebuild the object.
        public Object Deserialize(IPofReader reader)
        {
            Person p = new Person();
            p.Id = reader.ReadInt64(0);
            p.Name = reader.ReadString(1);
            reader.ReadRemainder();
            return p;
        }
    }
As you can see, writing an external serializer is not much more complex either (ignore the calls to the ReadRemainder and WriteRemainder methods for now -- they have to do with object evolvability, which is a subject that warrants its own post).
POF Context
The POF serializer does not encode the class name into the POF stream -- after all, doing so would defeat its purpose, as class names are platform-specific. Instead, it encodes an integer type identifier, and leaves it up to the user to ensure that type identifiers map to the appropriate classes on each platform.
The mapping is achieved using one of the PofContext (IPofContext in .NET) implementations. SimplePofContext allows you to map types programmatically, and is very useful for unit testing. However, in real applications you will likely want to externalize type mappings into a file, which is where ConfigurablePofContext comes in.
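To make the type-identifier idea concrete, here is a toy sketch in plain Java. It assumes nothing from Coherence: ToyPerson, ToyTypeRegistry and the byte layout are hypothetical names invented for illustration, and the layout is not the real POF wire format. It only shows the principle, which is that the stream carries an integer type id and property indexes, and a registry on each side maps that id back to a local class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical value class used only in this sketch.
class ToyPerson {
    final long id;
    final String name;

    ToyPerson(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Toy stand-in for the role a POF context plays; NOT the Coherence API.
class ToyTypeRegistry {
    // Platform-specific class <-> platform-neutral integer type id.
    private final Map<Class<?>, Integer> idsByClass = new HashMap<>();
    private final Map<Integer, Class<?>> classesById = new HashMap<>();

    void register(int typeId, Class<?> clazz) {
        idsByClass.put(clazz, typeId);
        classesById.put(typeId, clazz);
    }

    // Writes the type id, then each property as (index, value).
    // No class name or property name ever reaches the stream.
    byte[] serialize(ToyPerson p) {
        Integer typeId = idsByClass.get(p.getClass());
        if (typeId == null) {
            throw new IllegalArgumentException("unregistered type: " + p.getClass());
        }
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(typeId);
            out.writeByte(0);
            out.writeLong(p.id);
            out.writeByte(1);
            out.writeUTF(p.name);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Looks up the local class for the type id, then reads properties by index.
    Object deserialize(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int typeId = in.readInt();
            if (classesById.get(typeId) != ToyPerson.class) {
                throw new IllegalArgumentException("unknown type id: " + typeId);
            }
            in.readByte();                 // property index 0
            long id = in.readLong();
            in.readByte();                 // property index 1
            String name = in.readUTF();
            return new ToyPerson(id, name);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Registering ToyPerson under id 1000 and round-tripping `new ToyPerson(42L, "Ann")` produces 19 bytes in this toy layout, none of which spell out a class or property name. A peer on another platform would only need to agree on the id and the indexes.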
The ConfigurablePofContext allows you to specify mappings in an XML file. For example, the configuration file for the Java POF serializer that can serialize our Person class would look like this:
    <!DOCTYPE pof-config SYSTEM "pof-config.dtd">
    <pof-config>
        <user-type-list>
            <user-type>
                <type-id>1000</type-id>
                <class-name>example.Person</class-name>
            </user-type>
        </user-type-list>
    </pof-config>
On the .NET side the configuration is very similar. The only real differences are that an XML schema is used instead of a DTD and that, because our class doesn't implement the IPortableObject interface, we also need to configure the external serializer explicitly:
    <pof-config xmlns="http://schemas.tangosol.com/pof">
        <user-type-list>
            <user-type>
                <type-id>1000</type-id>
                <class-name>Example.Person, MyAssembly</class-name>
                <serializer>
                    <class-name>Example.PersonSerializer, MyAssembly</class-name>
                </serializer>
            </user-type>
        </user-type-list>
    </pof-config>
Now that both the Java and .NET serializers have been configured, you can serialize a Person instance on one platform and deserialize it on the other.
Conclusion
In addition to portability and platform independence, there is much more to like about POF.
For one, it is an extremely compact format. Instead of verbose class names, it uses integer identifiers to represent types, and as you have probably noticed in the code examples, the same is true for class attributes, where integer indexes are used instead of property names. It is quite common to achieve a 3-5x size reduction of serialized data compared to standard Java or .NET binary serialization, which in the context of Coherence means 3-5x less network traffic and, more importantly, 3-5x less RAM needed in the cluster (or 3-5x the data cached in the same amount of RAM, depending on how you look at it).
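A rough way to see where the savings come from is to compare standard Java serialization with a hand-rolled, index-based encoding of the same object. The sketch below uses only the JDK. SerializablePerson, SizeComparison and the toy layout are assumptions made up for this comparison, and the toy layout is not real POF, so the numbers it prints should not be read as a POF benchmark.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical class for this comparison; not part of Coherence.
class SerializablePerson implements Serializable {
    private static final long serialVersionUID = 1L;
    final long id;
    final String name;

    SerializablePerson(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

class SizeComparison {
    // Standard Java serialization: the stream describes the class,
    // including its name and the names and types of its fields.
    static byte[] javaSerialized(SerializablePerson p) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(p);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Toy index-based encoding (NOT real POF): an integer type id,
    // then each property as a one-byte index followed by its value.
    static byte[] indexEncoded(SerializablePerson p, int typeId) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(typeId);
            out.writeByte(0);
            out.writeLong(p.id);
            out.writeByte(1);
            out.writeUTF(p.name);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SerializablePerson p = new SerializablePerson(42L, "Ann");
        System.out.println("Java serialization: " + javaSerialized(p).length + " bytes");
        System.out.println("Index-based toy:    " + indexEncoded(p, 1000).length + " bytes");
    }
}
```

For an object this small, the class descriptor dominates the Java-serialized form, so the gap looks larger than it would for objects carrying a lot of data. The ratio you see in practice depends on the shape of your objects.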
The second benefit is the raw serialization speed. POF serialization is consistently 10-12 times faster than Java or .NET serialization. While this is pretty much irrelevant for individual cache puts and gets, due to network access overhead, it can significantly improve the performance of queries and aggregations against the cache that require objects to be deserialized.
Finally, as of Coherence 3.5, binary POF values can be manipulated directly using the PofValue interface and related classes, which gives you one more way to avoid excessive serialization and deserialization of objects in a partitioned cache. The widely discussed PofExtractor and PofUpdater are built on top of this functionality, but I personally find the underlying PofValue API much more interesting, as it opens up a world of possibilities when it comes to direct binary manipulation of cached objects.
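The general idea of working on the binary form can be sketched without Coherence at all. The example below is a toy under stated assumptions: ToyBinaryPerson and its fixed layout are hypothetical, and this is neither the PofValue API nor the real POF encoding. It only illustrates reading and updating a single property in place, without ever constructing a Person object.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Toy binary layout (hypothetical, big-endian):
//   int typeId | byte 0 | long id | byte 1 | short nameLength | UTF-8 name bytes
class ToyBinaryPerson {
    private static final int ID_OFFSET = 4 + 1;                   // after typeId and index 0
    private static final int NAME_LEN_OFFSET = ID_OFFSET + 8 + 1; // after id and index 1
    private static final int NAME_OFFSET = NAME_LEN_OFFSET + 2;

    // Builds the binary form; names longer than 65535 bytes are not supported.
    static byte[] encode(int typeId, long id, String name) {
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(NAME_OFFSET + utf8.length);
        buf.putInt(typeId)
           .put((byte) 0).putLong(id)
           .put((byte) 1).putShort((short) utf8.length).put(utf8);
        return buf.array();
    }

    // Reads only the id straight from the bytes.
    static long extractId(byte[] binary) {
        return ByteBuffer.wrap(binary).getLong(ID_OFFSET);
    }

    // Reads only the name straight from the bytes.
    static String extractName(byte[] binary) {
        int length = ByteBuffer.wrap(binary).getShort(NAME_LEN_OFFSET) & 0xFFFF;
        return new String(binary, NAME_OFFSET, length, StandardCharsets.UTF_8);
    }

    // Overwrites the id in place; every other byte is left untouched.
    static void updateId(byte[] binary, long newId) {
        ByteBuffer.wrap(binary).putLong(ID_OFFSET, newId);
    }
}
```

The fixed offsets work only because this toy layout puts the fixed-width property first. A real format with variable-length values has to parse its way to the property it wants, and hiding that navigation is exactly the kind of work you would want a library to do for you.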
In the next post, I will show one such example by implementing an equivalent of a VersionedPut entry processor that does not need to deserialize either the old or the new value.
Until then, take a look at the POF specification and the API documentation for the POF reflection package.