SpaceBasedArchitecture (SBA) is an approach to GridComputing that co-locates the business logic, data, and messaging capabilities required by one instance of an end-to-end business use case in a single processing node. This makes it straightforward to address NonFunctionalRequirements (NFRs) such as near-linear scalability, high availability, low latency, and high throughput.
Scalability is achieved by running more processing nodes. Object affinity, the co-location of related objects into the same processing node, makes it possible for many systems to scale near linearly over a broad range of hardware resources.
- It seems this would imply duplicating the business data across different processing nodes... is the price being paid in consistency and having a complete view of the business data?
- A particular data item (actually a Java object) lives in one and only one primary node in the most common topology. It is possible to have every node contain the same data, but that is expensive to keep in sync and limits the amount of information to the size of one node. The full replication topology is typically only used for reference data.
- Ah, so this architecture would simply be deemed inappropriate for cases where the business logic itself has properties such that access to the full data set is required.
- ItDepends. ;-) I use a distributed MapReduce? pattern to run algorithms against data sets that are too large to fit into the memory of any one machine but that would run too slowly if I had to go to disk. The results depend on the full data set, but I don't need to have it all in one place at one time.
- Yeah,distributed MapReduce? visitors would work if the algorithm itself can be parallelized and really does 'reduce' (independent of item ordering) rather than needing to collect. That isn't a large subset of algorithms, though.
- Do you have an example in mind? I've found it works more often than I would have predicted. There is another subset of problems that falls to a double MapReduce?, where the results of one MapReduce? are used in the map task of the second.
- Topographical sorts (e.g. for scheduling) is one example. MapReduce? will help you with cluster-based queries (e.g. find data objects closest to this pattern) but won't help much for queries of a more general sort (e.g. find all pairs of data sharing a value in this particular pattern). I guess it really depends on the domain how often you encounter such things.
- That's a good example, although not one in my domain. The distributed space might still be valuable if the amount of data is too large for one machine. Another approach I'm considering for a problem with manageable amounts of data but lots of processing required is to fully replicate the data set and parallelize the processing. This will let me try different algorithms and, maybe, do some competitive selection.
The performance benefits of SBA stem from the fact that all processing takes place in memory and, whenever possible, at in process speeds. Any disk-based persistence is done asynchronously to keep those inherently slow operations out of the critical path of the business transaction.
High availability (resiliency) is achieved similarly to scalability. Each processing node has at least one synchronized backup that takes over when a failure is detected. The SBA is monitored by redundant services that detect the failover and start a new backup to provide self-healing.
ServiceOrientedArchitecture (SOA) can be implemented on SBA by distributing stateless services over the grid and using the distributed space for state management.
Darn, and here I thought this page would be about Googie! (http://www.spaceagecity.com/googie/)
One advantage of SBA that doesn't get enough mention is the simplicity of the programming model. The application logic is independent of the topology of the distributed space. The same code can run on one machine or a thousand.
AugustZeroEight
See also GridComputing, JiniTechnology, GigaSpaces