"I want PowerfulAdHocDataProcessingTools."
As a developer, and a power user, I sometimes have a bunch of data that I want to process, and I don't want to have to run a proposal through all the layers of management, and perform a cost-benefit analysis, and form and run a project team over the course of months, just to get some data processed. What are the best tools available, for me to just crunch some data?
Ad-Hoc "Data Processing" tasks we might want to do:
- Query all the source control checkins done during an iteration. Get the User Story number from each. (Assume that user stories are numbered, and that every checkin comment starts with the appropriate number.) Produce a list of files changed, by user story.
- We got a list of all the boats in our fleet -- in a poorly formatted Word document. We need to put this in a new database table we just created for this purpose. Edit (format) and import the data into the table.
- Some web server has the time reporting data that I need. I'm allowed to run a "report," that will display the data in HTML, but I'm not allowed to run queries against their database. So I need to "browse" to a URL, "screen scrape" the HTML, and write it to a table in my database.
- One-time conversion of data from the old system to the new one.
- Ad-hoc report of all customers who purchased product B in March who had also purchased product A within the last year. Summarize by zip-code, but also provide the detail in a spreadsheet.
- Internal (IT-deptarment) apps and utils that don't need to be pretty but easy for IT clerks to use.
SAS or SPSS.
Comments:
- As far as SAS is concerned, do you have any specific scenarios you would like to demonstrate?
See the new list above.
From what I've seen, SAS is a really strong tool for grabbing and crunching data in creative ways. ...if if you're not doing any math on it. (And, of course, SAS really shines on the math stuff.) But it's a pricy tool. -- JeffGrigg
In Unix, shell scripts and piping are often used to process data.
Command line tools, scripting languages and custom programs can manipulate data.
Script languages have progressed from awk, through perl, to python, ruby and others.
But the powerful Unix command line tools and piping are not commonly available on Windows platforms, and many GUI tools are not compatible with command line usage.
PowerShell is an extremely promising tool for the Windows DotNet environment.
(But there doesn't seem to be any equivilent in the Java environment.)
See Also: ExBase
Oh please. ExBase is not, by any stretch of the imagination, a "powerful" ad-hoc data processing tool. Comparing ExBase to (say) SAS is like comparing a steel-wheeled rollerskate to a Mac truck. Even for simple database-oriented tasks it is utterly eclipsed by MicrosoftAccess, which is still firmly in the in-line rollerskate territory.
- MicrosoftAccess does not provide a smooth transition between its mousing-world to its scripting world. Although ExBase tends to be keyboard-centric, this is a good thing when you need scripting. MS just tossed their generic VBA engine into the product. VBA is not a DomainSpecificLanguage, unlike ExBase's where the domain is data-diddling. VBA requires filtering stuff through its verbose and unintuitive API's. I agree that with Access one can do an awful lot via point-and-click, but when you hit the limit of the point-and-click world, you hit it hard (DiscontinuitySpike). As far as SAS is concerned, do you have any specific scenarios you would like to demonstrate?
- MicrosoftAccess is not tied to the "mousing-world". While it's true that VBA is not a DomainSpecificLanguage, even despite the slightly more awkward access to database facilities it is every bit as capable as ExBase if not much more so. Despite the die-hards who cling to ExBase, there's a reason MicrosoftAccess dominated (and continues to dominate) the market segment where ExBase once held sway -- it's because MicrosoftAccess is superior to any ExBase implementation in virtually every respect. SAS, on the other hand, is like SQL on steriods -- though that doesn't capture the power of SAS to extract and report interesting statistics from data sources and recombine them as needed on an ad-hoc basis. It's designed for that purpose; ExBase and SQL aren't.
- I've used MicrosoftAccess for almost as long as I've used ExBase, and still find it wanting. Access dominates because its easy to get a lot done via mere point-and-click by IT generalists, and because it's bundled with MS-Office-Pro. I will agree, though, that both could be made better by borrowing ideas from each other. For example, Access could learn from the tight integration/embedding of the query language to the app language and table-driven configuation (instead of proprietary binary blobs); and ExBase could use better GUI & mousing tie-ins and a better relationship with SQL/ODBC data sources.
More generally, importing the data into a
RelationalDatabase would enable you to use SQL to process the data. This approach could be relatively powerful -- as "powerful" as SQL data manipulation (DML).
- Newer dialects of ExBase allow SQL queries also.
See Also: ExBase, DataManipulation, NimbleDatabase, DesktopDatabase, ModernizingExBase
CategoryStatistics