Discussion of implementation strategies for an e-stores application such as Paul Graham's Lisp system.
Moved from PaulGraham:
Paul Graham seems to think that databases are of little use because they were not helpful in his successful e-stores project. I agree that databases may not shine in such an environment because each store is mostly independent of the others. Saving strings of Lisp lists is quite sufficient in such an environment. There is no need for significant or complex cross-store queries, etc. I think this has misled Paul about the uses and possibilities of databases in other business environments, where such independence is less common.
Using EssExpressions as a replacement for a database strikes me as the same sort of folly as using ExtensibleMarkupLanguage (more specifically, files written in the aforementioned formats) as a database, a practice for which some XML proponents are roundly abused in XmlIsaPoorCopyOfEssExpressions.
Both "replacing the database with S-Expressions" and "... with XML" betray a fundamental misunderstanding of what a database is for; it's not just a data store.
You do not need a database in every application. Get over it.
For performance reasons, this may be one case where the advantages of databases would not be readily realized. Each individual customer (store owner account) is probably not going to need different complex, changing queries. A simple spreadsheet-like dump is probably sufficient. If they grow too large for click-and-play e-stores, then they would probably just create a new website of their own or use more sophisticated store management software from higher-end vendors like IBM. I don't think Yahoo intends their individual e-store accounts to scale all the way up to Walmart-like size on a regular basis; most of them are indeed niche markets with rarely more than 100 products. It would be an unacceptable performance and complexity penalty to cater to the big stores, which are less than one percent of their market. I am generally pro-database because of the flexibility and growth potential they offer, but in this specific case, Paul seems to be right. --top
Perhaps. I don't know enough about the e-stores application to make a fully informed comment on whether or not this was a good architectural decision. If the application never requires concurrent access to the data store, the application takes care of data integrity, and good backup policies are implemented and executed, this might well be a reasonable decision.
Perhaps in the future Paul may have to worry about targeting bigger stores, but worrying about that early on is too much against YagNi and FutureDiscounting in my opinion. It is like worrying about where you are going to park your yacht before you can even afford your first employee.
Here is one of the larger Yahoo Store customers: http://store.yahoo.com/nasa/ As you can see, they probably have around 300 or so items. I have not seen any that have a thousand or more items.
I have encountered a similar need pattern in other projects (which I can't give specifics on yet, so will use the store example, but perhaps a more traditional store). There may be a lot of "local" activity, but also shared activity. For performance reasons some partitioning may be in order. Thus, it makes sense to partition by location. For the sake of discussion, let's assume that one set of entities tends to be location-specific and another set shared.
For example, the store product catalog may be central. However, individual store sales could perhaps be stored locally. Each store or store region would have its own Sales table, but these tables would all have the same name. Thus, the app developer does not have to worry about which store they are programming for. If the tables are later combined into one central Sales table, nothing would change in the app code.
In theory one can partition the database into the shared portion and the local portion to help distribute the workload. Ideally it would be transparent to the application developer so that the app could be later centralized if needed without code rework. They may not have to know or care if a table is physically local or at a central place. I can see some complications in generating unique record IDs, however, if there are no decent domain keys. ID generation may involve algorithms that put location-specific info into the keys. For example, each location may be assigned a range of ID values and new IDs are generated within that range per location. Thus, there would never be an ID collision. The ID generation could be set up as an INSERT trigger so that app developers don't have to worry or care about how IDs are generated (which is generally the case with INSERT anyhow for non-domain keys).
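The range trick can be sketched in a few lines of Python. This is only an illustration under assumptions: the class name LocationIdAllocator, the location_index numbering, and the range size of one million are all hypothetical, and a real system would do this inside the database (for example in an INSERT trigger) with a persistent counter rather than in application memory.

```python
from dataclasses import dataclass, field


@dataclass
class LocationIdAllocator:
    """Hands out IDs from a fixed range owned by a single location.

    All names and sizes here are hypothetical, chosen for illustration.
    """
    location_index: int          # assumed: 0 for the first store, 1 for the next, ...
    range_size: int = 1_000_000  # assumed: number of IDs reserved per location
    _issued: int = field(default=0, init=False)

    def next_id(self) -> int:
        # Refuse to spill over into the neighbouring location's range.
        if self._issued >= self.range_size:
            raise RuntimeError("ID range exhausted for this location")
        new_id = self.location_index * self.range_size + self._issued
        self._issued += 1
        return new_id


# Two stores generate IDs independently, with no coordination between them.
store_a = LocationIdAllocator(location_index=0)
store_b = LocationIdAllocator(location_index=1)

ids_a = [store_a.next_id() for _ in range(3)]
ids_b = [store_b.next_id() for _ in range(3)]

# The ranges never overlap, so the two lists share no IDs.
print(set(ids_a).isdisjoint(ids_b))
```

The obvious cost is that the range size has to be guessed up front; a location that outgrows its range needs a second range assigned to it.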
If the central office wants to do a query that searches or sums across multiple locations, then the database could internally do a UNION of the per-location tables, such as the Sales table. I believe the BigIron RDBMS have this capability, but I have not implemented it myself. I may have used it without ever knowing it, which is how it should be to an app developer. Any war stories out there?
Again, from the perspective of the database user (SQL), the virtual table looks just like a single table. Nothing should change from an SQL perspective if the tables are later merged and the processing centralized. The primary keys should not collide due to the range trick above.
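Here is a rough sketch of the "internal union" idea using Python's sqlite3 module. It is a stand-in, not how a BigIron RDBMS would actually do it: both "locations" live in one in-memory SQLite database, and the table names sales_store_a, sales_store_b, and sales_merged, the columns, and the sample rows are all made up for the example. The point is only that the same SQL text runs unchanged before and after the tables are merged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two per-location tables (hypothetical names) plus a view that unions them.
# The IDs come from disjoint ranges, as in the range trick described earlier.
conn.executescript("""
    CREATE TABLE sales_store_a (id INTEGER PRIMARY KEY, product TEXT, amount INTEGER);
    CREATE TABLE sales_store_b (id INTEGER PRIMARY KEY, product TEXT, amount INTEGER);
    INSERT INTO sales_store_a VALUES (1, 'mug', 500), (2, 'poster', 1200);
    INSERT INTO sales_store_b VALUES (1000001, 'mug', 500), (1000002, 'patch', 300);
    CREATE VIEW Sales AS
        SELECT id, product, amount FROM sales_store_a
        UNION ALL
        SELECT id, product, amount FROM sales_store_b;
""")

# The central office's query only ever mentions "Sales".
QUERY = "SELECT COUNT(*), SUM(amount) FROM Sales"
partitioned_result = conn.execute(QUERY).fetchone()

# Later the DBA re-centralizes: the view is replaced by one real table.
# The PRIMARY KEY constraint would reject the copy if any IDs collided.
conn.executescript("""
    CREATE TABLE sales_merged (id INTEGER PRIMARY KEY, product TEXT, amount INTEGER);
    INSERT INTO sales_merged SELECT id, product, amount FROM Sales;
    DROP VIEW Sales;
    ALTER TABLE sales_merged RENAME TO Sales;
""")
centralized_result = conn.execute(QUERY).fetchone()

# Same query text, same answer, whether Sales is a view or a table.
print(partitioned_result == centralized_result)
```

In a genuinely distributed setup the two branches of the view would point at tables on different machines, which is where the performance questions come in.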
The only hint that something is different from SQL's view in a distributed setting would be if someone tried to query another store's Sales table and got no results. I suppose the "internal union" trick could be applied for each location, but that may result in performance problems as every Sales table query may search every store's instance of the Sales table. We assume that only the central office would want this feature turned on anyhow. There is little reason for store A's manager to snoop into store B's sales records. Special user settings may perhaps allow the consolidated ("internal union") view from stores in special cases.
It might be a lot of DBA work to keep all the virtual views and partitioned tables straight, but ideally the app developer and SQL users don't have to worry about it and don't have to change their app if the partitioning changes or is re-centralized.
See also: DistributedComputing, BlubParadox, IfFooIsSoGreatHowComeYouAreNotRich, RocketAnalogyProblem