Canonical Message Data Format

Problem: When you build a set of systems that communicate via messaging (typically using some kind of MessagingOrientedMiddleware), one of the biggest problems is that the parties at both ends of each conversation must understand the format of the data passed between them. The MOMs themselves have nothing to say about this; they will pass any old data, whether binary or text. The question is, how do you make sure that the parties understand each other?

What people generally start off with when they begin building systems with MOMs are ConversationalDataModels, which means that they create new data structures for each party-to-party conversation. Often these data structures are based on those in use in one or both of the parties to the conversation, or might even *be* the exact data structures in use (for instance, the binary representation of a COBOL record structure).

However, while that answers the question for one conversation, the same question crops up again on the next conversation, and the next, and the next.

Solution: So, what people normally arrive at is the idea of a single, common data model in a single, common format. Nowadays, this is often XML, but key-value pairs are also popular and useful. In this way all messages have the same data format, and each system translates from the well-known canonical model to its own internal format on message receipt, and from its internal format back to the canonical model before it sends a message.
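
A minimal sketch of that translation step, assuming two hypothetical systems (a billing sender and a shipping receiver) and JSON key-value pairs as the canonical format. The field names `CUST_NO`, `AMT_CENTS`, `customer_id`, `amount_cents`, `customerRef` and `invoiceTotal` are invented for illustration; none of them come from a real system.

```python
import json


def billing_to_canonical(record):
    """Outbound: translate billing's internal record into the canonical message."""
    canonical = {
        "customer_id": str(record["CUST_NO"]),   # canonical IDs are strings (assumption)
        "amount_cents": record["AMT_CENTS"],     # canonical money is integer cents (assumption)
    }
    return json.dumps(canonical, sort_keys=True)


def canonical_to_shipping(message):
    """Inbound: translate the canonical message into shipping's internal structure."""
    canonical = json.loads(message)
    return {
        "customerRef": canonical["customer_id"],
        "invoiceTotal": canonical["amount_cents"] / 100,
    }


# Neither system knows the other's internal layout; both know only the canonical one.
message = billing_to_canonical({"CUST_NO": 42, "AMT_CENTS": 1250})
shipping_record = canonical_to_shipping(message)
```

With N systems this needs one pair of translators per system, rather than one data structure per pair of systems as with ConversationalDataModels.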


(See FacadeAtTheDistributionBoundary for a relevant discussion)

(See CommandMessagePattern for implementation architecture)


This problem occurs whenever programs communicate by passing typed data as raw bits, for example through pipes, TCP sockets, or UDP datagrams. Any programs communicating over a network need to agree on a protocol by which they communicate. A CanonicalMessageDataFormat as described above is actually made up of two layered protocols: (a) what data is sent between the parties and (b) how that data is represented as bits. These two protocol layers are termed the application layer and presentation layer in the OSI reference model.

There are a large number of presentation layer protocols, such as XDR, ASN.1, CDR, etc. As stated above, you can also use XML or name/value pairs as the basis for a presentation layer protocol, but you still need to strictly define the textual representation of the data types you want to send across the wire.
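
As a rough illustration of what "strictly define the textual representation" can mean for name/value pairs, here is a sketch with one fixed encoding rule per data type. The rules themselves (lowercase `true`/`false`, ISO 8601 dates, plain decimal notation, one `name=value` pair per line, names sorted) are assumptions chosen for the example, not part of any standard presentation layer protocol.

```python
from datetime import date
from decimal import Decimal


def encode_value(value):
    """Encode one value using a single agreed textual form per type."""
    if isinstance(value, bool):        # must come before int: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, Decimal):
        return format(value, "f")      # plain decimal notation, never exponent form
    if isinstance(value, date):
        return value.isoformat()       # ISO 8601, e.g. 2001-08-19
    if isinstance(value, str):
        if "\n" in value:
            raise ValueError("newlines are not representable in this sketch")
        return value
    raise TypeError(f"no agreed representation for {type(value).__name__}")


def encode_message(fields):
    """Encode a dict as sorted name=value lines."""
    return "\n".join(f"{name}={encode_value(fields[name])}" for name in sorted(fields))


wire_text = encode_message(
    {"paid": True, "amount": Decimal("12.50"), "due": date(2001, 8, 19), "qty": 3}
)
```

Without rules like these, one party might send `True`, another `1`, and a third `yes` for the same boolean, and every receiver would have to guess.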

That's true, but even with a common presentation layer protocol you have to define common semantics too. For instance, you have to get all parties to agree on the definitions of all of your terms...

Exactly, this is the application layer protocol part of the reference model.

You might have to refresh my memory here -- as I remember the OSI model it sort of waved its hands at the application layer protocol part (with the exception of X.400). Was there anything more of interest there?


Proxies can be used to separate the application layer from the presentation layer. This allows you to change the CanonicalMessageDataFormat without changing code that sends messages. It also helps simplify the system by cleanly separating concerns.
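
One way this might look, as a sketch only: the calling code talks to a proxy in application-level terms, and the proxy delegates the bit-level representation to a pluggable encoder. The names `OrderServiceProxy`, `place_order`, `encode_json` and `encode_pairs` are hypothetical, and a plain list stands in for the real message transport.

```python
import json


def encode_json(fields):
    """Presentation layer, variant 1: JSON text as UTF-8 bytes."""
    return json.dumps(fields, sort_keys=True).encode("utf-8")


def encode_pairs(fields):
    """Presentation layer, variant 2: sorted name=value lines as UTF-8 bytes."""
    return "\n".join(f"{k}={fields[k]}" for k in sorted(fields)).encode("utf-8")


class OrderServiceProxy:
    """Application layer: callers see place_order(), never the wire format."""

    def __init__(self, send, encode):
        self._send = send      # callable that hands bytes to the transport
        self._encode = encode  # callable that turns a dict into bytes

    def place_order(self, customer_id, amount_cents):
        fields = {
            "type": "place_order",
            "customer_id": customer_id,
            "amount_cents": amount_cents,
        }
        self._send(self._encode(fields))


sent = []  # stand-in for a message queue

proxy = OrderServiceProxy(sent.append, encode_json)
proxy.place_order("42", 1250)

# Changing the format means swapping the encoder; the place_order call is unchanged.
proxy = OrderServiceProxy(sent.append, encode_pairs)
proxy.place_order("42", 1250)
```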


(last edited August 19, 2001)