Time Synchronous Processing

A brief explanation of my problem (detailed explanation can be found in HubbleTelemetryReporting):

1. I have to create a report combining data from different sources.

2. The data from each source uses a different time base with varying length records.

3. The data sometimes contains gaps.

A couple of simple report examples (see RealActualHubbleTelemetryData for an explanation of the measurement codes):

1. Report the value of CBAT1CUR, CBAT1PRS and CSS3A every 30 seconds.

2. Report the values of CBAT1PRS and CSS3A when CBAT1CUR is greater than 0 amps.

The data for any given measurement, for any given time period can quickly and easily be retrieved in time order. The difficulty lies in taking records of differing lengths with different time offsets (and don't forget those gaps) and synchronizing the values of several measurements in time to produce the required output.

For example, using SQL there are two ways to produce report example one:

1. Construct a temporary table containing the time values at which the samples are to be taken, then use this temporary table as criteria in a select statement.

2. Use pre-constructed tables containing the time values with a limited set of fixed intervals. Use the table containing the interval chosen by the user as criteria in a select statement.

Each approach has good and bad points. One requires extra time to construct the temporary table, but allows the user to choose any arbitrary sample interval. Two does not have the extra construction time, but limits the user's choices. Both solutions though can have severe performance degradation because they require a CrossJoin?. Specifically, if there are n sample points in the report interval and m database points in the report interval, then the time required for the query will be O(nm) (and depending upon the implementation, memory usage may also be fairly large).

A simple traditional program which steps through the data from the report interval in time order and spits out the value of each sample as it comes up can produce the same report in time O(n+m). This can represent a considerable savings for reports over intervals containing large amounts of data.


''Is there some reason not to use the "traditional program"?

In the SQL approach, if the creation of the temp table is actually a problem (it would surprise me if it were), how about using fixed non-intersecting intervals and then sub-selecting over the union of a cover on the desired interval?


The original designers wanted to do everything in SQL. I finally convinced them that this was a mistake and that the best solution was to divide reporting into two groups - do the stuff that SQL does well in SQL, but reports (such as the two examples given here) that are inefficient in SQL should be implemented as a traditional program.

Now the solution program of which I speak is not specific to the reports shown above, but is very flexible, allowing a wide variety of different reports to be generated.

The main point of these pages, which will be clearer as they develop, is that many parts of my general solution are, in implementation, starkly similar even though they are to a user quite distinct. These similarities lead me to suspect some sort of underlying pattern(s), which is why this discussion is on wiki.


CategoryTime


EditText of this page (last edited January 2, 2005) or FindPage with title or text search