File Typing System

Different systems and programs have different ways of identifying the 'type' of a file. Examples include:


The type information extracted clearly varies a lot. Which means using the extracted information can be a pain. The MIME type system was supposed to help with this, providing a common level of metadata external to files themselves. By supplying MIME metadata along with an attached file, mail systems (originally) were supposed to be spared the pain of implementing any or all of the above systems. However, MIME can be fairly useless - since few software companies bother to register their file types with IANA, most of the time data will be described as text/plain, text/html, application/xml, or binary/octet-stream.



RE: "With object persistence, the "type" of a file may be the "class" of the first or "root" instance in the file."

[Feel free to delete this one if it doesn't count, but it seems to me that...] The ultimate 'file' typing system would be an object store of persistent objects. Here each object has a full type hierarchy and can be accessed as either the base type or any super types.

Many object persistence systems prefix instance data with some kind of class identifier. The first or "root" class of a file could reasonably be considered the file's type.

[RefactorMe when this point decided]

[It is worth noting that in some OS, eg PalmOs and ErosOs, the notion of a filesystem doesn't exist, replaced with a checkpointed view of objects in memory. However the 'file/object typing' issue still comes into play if the system is dynamically typed]

This last bullet raises several good points. We should start thinking beyond the traditional file system paradigm, which holds us back in the days of separation of data and code. At the very least, we should be thinking in terms of objects, where the raw data is encapsulated, and we invoke methods to access, modify, navigate, and interpret the data. I prefer to think even higher, in terms of services. My Palm doesn't just "hold my data;" it provides storage and management services for my personal information. Note that it doesn't need a file system to do this; I suppose there is an implied concept of "types" of data, but it handles all of that transparently to me -- if I do everything through its user interface, it takes care of keeping appointment data associated with the calendar function, and separate from phone numbers...


I propose that the name of this page be modified to DataTypingSystem?.

[I - the original author - disagree. This page describes file type determination and it is useful to keep it that way. Start a new discussion on DataTypingSystem? and refer back to here if you want to extend it out. I guess that answers JeffGrigg's question... You might want to look around first, there's a lot of relevant info on the wiki about type inference in compilers etc - which is irrelevant to files]

OTOH, BenefitsOfDynamicTyping is all relevant. Magic numbers are StaticTyping since they can't be easily changed by the owners of those objects, whereas FilenameExtensions are DynamicTyping.

In DynamicTyping, type is an attribute of the value/object, while in StaticTyping type is an attribute of the variable/container, and must be made manifest externally to the object itself. Magic numbers therefore are DynamicTyping while FilenameExtensions are StaticTyping. Consider "foo.jpeg" and "JpegFile? foo". Extensions are though WeaklyTyped, in that you can "cast" foo.jpeg to foo.exe and do interesting things. ;)


CategoryOperatingSystem CategoryInformationManagement


EditText of this page (last edited August 25, 2007) or FindPage with title or text search