Grep Vs Database

I've encountered multiple times situations where developers suggest using "grep" instead of a NimbleDatabase. I don't think grep is a sufficient replacement.

If you use Grep as your database tool, you risk parsing errors. Databases give clean boundaries between the "atoms", while grep has to keep finding these boundaries over and over, and may slip if your regex's are not perfect or there is an error in the text format. Further, more developers know RDMBS than know grep. Grep requires both parsing knowledge and CollectionOrientedProgramming knowledge for the parsed atoms. A database generally requires only the second (usually SQL in practice).

Further, you don't have indexing ability. Indexes are one of the best performance-oriented features of a database. It's very tedious to do large joins without them.

More clarification is filed... I assume what you refer to is the practice of using flat text files (or a collection thereof) to store data, and "grep" to perform queries on that data--and using this as a substitute for a RDBMS of sort (even a NimbleDatabase). Am I correct?

Depends on the data. I think we're all in agreement that doing this sort of thing for arbitrary business data, especially things that change a lot, is an AntiPattern--even worse than the deplorable XmlDatabase. Grep can function as a poor man's SELECT. It doesn't suffice for projections and joins, and the results returned by grep are not sets (duplicates may be present). Other tools in the Unix toolkit can be used to eliminate duplicates and transform data records, but still--this is not a good general purpose solution.

On the other hand, there are some applications a) where the limited power afforded by grep et al is sufficient, b) the dataset is small and infrequently modified (and when it is modified, editing it by hand is OK), c) a text-based representation on the disk (which can be edited by hand) is essential, and d) even use of a NimbleDatabase would be overkill. Not to mention e) data that is provided to you in flat text form. And of course, don't forget greps ability to analyze arbitrary text, which is not replaced by any sort of database.

In other words, DoTheSimplestThingThatCouldPossiblyWork

Note that databases don't have to be complex. Thus, "simple" alone may not be enough to exclude them. Company tool policy limits are a far more likely reason. If the company does not like X, then they don't like X and we must move on. If a shop already has nimble database tools laying around, then THAT is the "simplest thing". Thus, I'm pulling an EverythingIsRelative here.

We're talking about working with text. Not text representations of data, but actual semantic text. Databases are a lousy tool for this and the ones which do have tools for it do it by building on the functionality present in tools like grep.

I assumed a context where Grep was being used on "structured" data. However, text mixed in with structured adds another dimension to the comparison. A "grep for databases" would be nice rather than catering to just files.

There is/was a tool that I used with a database language, which tool was called PhdBase? (it was an extension to Foxpro). I could search the memo fields in tables with blinding speed and get the kind of results you might get from a SuperGrep? -- instead of the standard grep line-oriented expressions, I could use document-oriented expressions, including context keywords like "near" and "within 5 of" and wildcarding of words and strings, including noise-word suppression.

You could say it was a field-specific "Google" search.

Fabulous tool. I could search the customer service database for that weird guy who claimed his Laborador Retriever was responsible and that, besides, he's already spoken to his teenaged son and it wouldn't happen again (yes, that really happened), and have all of his records in a browse in typically a second or two.

I don't suppose anyone here knows whatever became of the product and/or its author?

-- GarryHamilton

There used to be lots of such utilities for ExBase derivatives (such as FoxPro) back in the hayday of ExBase that kind of resembled the activity around VB active-X components.

CategoryLowEnd