Pass Identifier As String

I propose, as a rule of thumb, that ID "numbers" usually be passed between functions and modules as strings. The main reason is that if the ID type later changes from number to string in the database or primary source, you won't have to visit every reference and change the parameter type. (Dynamically typed languages may not have this problem, because values carry their own type and parameters are not declared with one.)

At the final "save" point the IDs will probably have to be converted back to numbers, because some dialects of SQL cannot automatically convert a quoted number into an actual number. But all the in-between routines are best written to handle strings.
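
A minimal Python sketch of this rule, under stated assumptions: the `customers` table, its integer key, and all function names are hypothetical, and the standard-library `sqlite3` module is only a stand-in for whatever database is actually in use. The ID travels as a string through the in-between routine and becomes a number only at the save point.

```python
import sqlite3


def to_db_key(customer_id: str) -> int:
    """Convert a string ID to the integer that the (assumed) schema expects."""
    if not (customer_id.isascii() and customer_id.isdigit()):
        raise ValueError(f"ID {customer_id!r} is not numeric; the key column is an integer")
    return int(customer_id)


def save_name(conn: sqlite3.Connection, customer_id: str, name: str) -> int:
    """Save point: the only place where the ID is turned into a number."""
    cur = conn.execute(
        "UPDATE customers SET name = ? WHERE id = ?",
        (name, to_db_key(customer_id)),
    )
    return cur.rowcount  # number of rows updated


def rename_customer(conn: sqlite3.Connection, customer_id: str, new_name: str) -> int:
    """In-between routine: accepts and forwards the ID as a string, untouched."""
    return save_name(conn, customer_id, new_name.strip())


# Hypothetical table with a numeric key, as in the situation described above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customers (id, name) VALUES (123, 'Old Name')")

updated = rename_customer(conn, "123", "  New Name ")
```

If the key column later becomes a string, only `to_db_key` and the save point would need to change; `rename_customer` and its callers keep their signatures.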

One rarely does "math" on ID numbers other than equality comparisons, so one doesn't have to worry about operator type issues.

An example of where the ID changes type is when two databases or companies merge and two ID spaces must be combined. If you stick to numbers, you have to re-sequence the IDs and update all the references, which is a royal PITA. An alternative is to change the ID to a string and add a prefix, giving "A123" and "B123", where "A" marks records from org "A". It's still a lot of work, but often the least of evils compared with numerical re-sequencing.
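
The prefix approach can be sketched in a few lines of Python. The sample IDs and the helper name `prefix_ids` are made up for illustration; the returned mapping from old to new ID is what one would use to update references in other tables.

```python
def prefix_ids(ids: list[str], org: str) -> dict[str, str]:
    """Map each old ID to a new string ID carrying the org prefix."""
    return {old_id: f"{org}{old_id}" for old_id in ids}


# Both orgs used "123", so the raw ID spaces collide.
org_a_ids = ["123", "124"]
org_b_ids = ["123", "200"]

remap_a = prefix_ids(org_a_ids, "A")  # old ID -> new ID, for fixing references
remap_b = prefix_ids(org_b_ids, "B")

merged = list(remap_a.values()) + list(remap_b.values())
```

Code that already passes IDs around as strings needs no signature changes for this; only the stored values and the references to them are rewritten.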

If we want to keep numbers, one can also add a large value to one company's IDs, such that "123" becomes "1000123" for org "B". However, in some cases the other side may already be using strings as IDs. With older companies you often notice that customer account numbers have letters in them; a long history of mergers and re-orgs may account for that.
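
A sketch of the numeric-offset variant, assuming non-negative integer IDs. The offset of 1,000,000 matches the "1000123" example; the helper name and the sample IDs are hypothetical. The check guards against picking an offset that is too small to keep the two ID spaces apart.

```python
ORG_B_OFFSET = 1_000_000  # assumption: larger than every ID already used by org "A"


def offset_ids(ids: list[int], offset: int, existing: list[int]) -> list[int]:
    """Shift IDs by a fixed offset, refusing offsets that could collide."""
    if offset <= max(existing, default=0):
        raise ValueError("offset must exceed the largest existing ID")
    return [record_id + offset for record_id in ids]


org_a_ids = [123, 124]
org_b_ids = [123, 200]

shifted_b = offset_ids(org_b_ids, ORG_B_OFFSET, existing=org_a_ids)
merged = org_a_ids + shifted_b
```

This keeps the numeric column type, but it stops working as soon as one side's IDs contain letters.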

--top

This seems generally reasonable to me, except to say that if you try to convert them 'back to number' for the DB, you'll have an issue where the ID has non-numerics. Why not leave them as strings? One place where IDs might be compared is for sharding. Even there, though, I think your suggestion doesn't cause any issues. --dab

I was assuming that the developer doesn't have a lot of control over the database structures. Plus, actual numbers in databases are generally more compact storage-wise and allow using the built-in auto-numbering. It's after growth, merging, etc. that the flexibility of strings comes into play.

Okay, good point about existing DBs. On storage, I try to live by DiskSpaceIsCheap as much as my clients will let me. --dab

Back-up costs/effort and DB performance tend to be a factor with bulkier keys. I'm not saying it's necessarily a big deal, only that there is more to it than just measured disk space.

I agree. I've made the same argument myself (note my inconsistency): the cost of disk is more than just the media cost. Also, the disk used for an Oracle DB, for example, is not typically commodity hardware. Not that Oracle is required, but it is widely used. --dab


I HaveThisPattern. I find it especially useful during integration efforts, i.e. where there is no centralized DB but rather a diverse bunch of protocols that each have their own ID model. It's best if most of the middle is done with strings, to avoid managing ID conversion maps. I'd actually go a step further and suggest using UniformResourceIdentifiers. Rather than trying to compact them, why not "org.A.group/foo/bar/..."? Global uniqueness is quite feasible (and can be secured, using an HMAC). For protocols, I might use: "ros:..." or "jaus:..." or similar. The extra data can be useful for conversion routines, and can reduce the use of state.
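
One possible reading of the HMAC suggestion, sketched in Python with the standard-library `hmac` and `hashlib` modules. The `#tag` suffix layout, the key handling, and all function names are assumptions made for illustration, not an established convention.

```python
import hashlib
import hmac

SECRET_KEY = b"example-key-not-for-production"  # assumption: a shared secret


def make_id(namespace: str, local_id: str) -> str:
    """Build a URI-like ID such as 'org.A.group/foo/bar'."""
    return f"{namespace}/{local_id}"


def sign_id(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Append an HMAC-SHA256 tag so the issuer can later recognize its own IDs."""
    tag = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{identifier}#{tag}"


def verify_id(signed: str, key: bytes = SECRET_KEY) -> bool:
    """Check that the tag after the last '#' matches the identifier before it."""
    identifier, sep, tag = signed.rpartition("#")
    if not sep:
        return False
    expected = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return hmac.compare_digest(tag.encode("utf-8"), expected.encode("utf-8"))


def namespace_of(identifier: str) -> str:
    """Recover the originating namespace without any lookup table."""
    return identifier.split("/", 1)[0]


signed = sign_id(make_id("org.A.group", "foo/bar"))
```

Because the namespace is part of the ID itself, a conversion routine can call `namespace_of` to decide which system an ID came from, which is the "reduce use of state" point made above.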


CategoryBusinessDomain


Last edited March 22, 2013