A special case of SeparationOfConcerns.
Problem: Methods are not reusable because they have too much responsibility. In particular, they do IO and they also calculate, so whenever you want to do one of those things, you end up doing both.
Solution: Separate IO and calculation as much as possible. Methods that do only IO or only calculation are reusable. Methods that do both are not, because whenever you want one of the two you end up getting both.
Can we have someone finish up the pattern template?
Related patterns: ModelViewController, ModelDelegate
SeparateIoFromCalculation is a rule you can't follow all the time: somewhere, you must glue I/O and calculation together. For better reuse, though, it is better to have functions that do I/O and functions that do calculation, rather than functions that do both.
Example: You have a method that calculates sqrt and prints sqrt at the same time. It is called from several thousand places in your program. Then you need to calculate sqrt but don't need to print it. Or you need the sqrt of the same number several times, but don't want to repeat the expensive operation each time.
It is better to split this method into calculateSqrt and printSqrt. printSqrt is so close to plain print that it is probably better to just use print, and calculateSqrt() can then simply be called sqrt().
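A minimal Python sketch of that refactoring (print_and_sqrt is an illustrative name, not from any real codebase):

 import math

 def print_and_sqrt(x):        # the mixed method: calculation and IO together
     result = math.sqrt(x)
     print(result)
     return result

 # Separated: math.sqrt already is the pure calculation, so only the IO part
 # needs a home, and that home is plain print().
 root = math.sqrt(2.0)         # calculation only: reusable, cacheable, testable
 print(root)                   # IO only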
Some of the SeparateIoFromCalculation definitions on this page are naive approaches that only work with data sets that always fit in available memory. They fall apart when you can't read and write all the data at once.
The above seems to be arguing at a different LevelOfAbstraction. By focusing on a single implementation (read everything into memory, then calculate), one may miss that the point is to separate by code module, not by time. ContinuationsAndCoroutines, streams, or other forms of LazyEvaluation could work around the memory issues here. -- TylerMac
In the OO world, a similar separation is applied on methods: Some are mutators and others are inspectors. Creating a mixed method is considered bad practice, since then it becomes harder to use and reuse.
Consider, for example, a class used for creating XML documents. You can add new nodes (addNode) and obtain the resulting XML (getXml). If getXml() also performs the document construction, it will perform poorly and probably won't correctly maintain its invariants, unpleasantly surprising its users.
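A minimal Python sketch of that split, using a toy builder class (the names follow the example above; nothing here is a real XML library):

 class XmlBuilder:                       # toy class for illustration only
     def __init__(self, root):
         self.root = root
         self.nodes = []

     def addNode(self, name, text):      # mutator: changes state, returns nothing
         self.nodes.append((name, text))

     def getXml(self):                   # inspector: reads state, mutates nothing
         body = "".join(f"<{n}>{t}</{n}>" for n, t in self.nodes)
         return f"<{self.root}>{body}</{self.root}>"

 doc = XmlBuilder("order")
 doc.addNode("item", "widget")
 doc.addNode("qty", "3")
 assert doc.getXml() == doc.getXml()     # calling the inspector twice changes nothing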
This is probably the most familiar abstraction in computer science, right before DataStructures (AbstractDataTypes).
Is this a DesignPattern?
There are many ways to decompose the program above:
This page promotes a process decomposition. The example given is also known as a ReadEvalPrintLoop. A variation on this is known as the MasterControlProgram, or MCP, which is cast as the central evil by AlanKayIsTron because over-application of this pattern leads to heavily moded software. See TheEnd. An alternative decomposition is as objects that can read and print themselves, or the spreadsheet that decomposes calculations as a sea of cells that are directly viewed and manipulated.
The problem is that sometimes it is more code to keep them separated.
One shot:
 get x
 for each i in x
   alter i to j
   print j
 end for

Separated:
 get x
 for each i in x
   alter i to j
   store j in y
 end for
 .....
 get y
 for each j in y
   print j
 end for

Can someone please change this example so that it actually makes sense?
The code above is the wrong way to separate the code, because it turns one loop into two: at first glance it may look correct, but it runs slower and uses additional storage. Besides, the first function does IO and calculation, and the second does the same; they are only separated by a blank line. That is not what I was trying to imply. SeparateIoFromCalculation is about a function or method doing either IO or calculation, not both.
 modify x
   for each i in x
     alter i to j
   end for

 print x
   for each i in x
     print i
   end for

I assume the modified i is stored back to the corresponding place in x. If not, you are probably doing that calculation just for output formatting, and in that case the mixed calculation may be OK. Or you could have done:
 print x
   for each i in x
     print format i
   end for

 format x
   return altered x

What if we wanted to save the altered thingies, but also print them?
 get x
 for each i in x
   alter i to j
   put j into x replacing i
   print j
 end for
I'm quite unconvinced that 'saving' a value is fundamentally different from I/O. It is just output to memory rather than to the UI. However, saving a temporary, where that temporary is used and discarded as a purely computational side-effect (a side-effect of calculation), would qualify as different. If, however, you're provided an address or reference to someplace to save data, then requesting or sending information from or to that address (whether it be to a printer or elsewhere in memory) is definitely I/O.
SeparateIoFromCalculation, taken fully, means ReferentialTransparency for calculations... since the calculations neither receive input (except to initiate their computation) nor provide output (except to report their result).
The benefit of SeparateIoFromCalculation is that the calculation logic is independent of the IO logic and the data source. For example, there is a sin(double) function that calculates the sine of a double. What if its signature were changed to sin(File), calculating the sine of a double read from the start of the file?
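A hedged Python sketch of the difference (the file layout, one binary double at the start of the file, is an assumption made up for illustration):

 import math
 import struct

 def sin_of_file(path):
     """The wide signature: the calculation is welded to one input source."""
     with open(path, "rb") as f:
         (value,) = struct.unpack("d", f.read(8))   # assumes one 8-byte double at the start
     return math.sin(value)

 def read_double(path):
     """Keeping the IO separate, math.sin stays reusable for any source of doubles."""
     with open(path, "rb") as f:
         (value,) = struct.unpack("d", f.read(8))
     return value

 # y = math.sin(read_double("angles.bin"))   # composed at the call site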
Separating IO from calculation makes your code more reusable. Usually it is the IO/GUI code that changes when you are polishing your software (improving the program's looks, supporting a new format). As long as your calculation code is separate from the IO code, you can be sure the calculation stays correct whether or not you change the IO part.
Surely not all calculation can be separated from IO. For example, how would you write a FileWriter or an ExcelFileFormatReader if you shouldn't interact with the file? The point is to keep the IO code in the low-level class; only that class really needs to know its source of input.
It has been my experience that the first choice (low-level objects print themselves) gives you better ObjectOriented systems, in which you really do not care about the IO. All IO is resolved in lower-level classes: UI, database, etc. The program is just a bunch of rules; I would say construction rules. For example, to get a doctor you need a person that has a medical "speciality". So either you go get a "speciality", the list of doctors (persons) appears, and the user selects in the UI which doctor he/she prefers; or perhaps you are entering a new doctor, in which case you select its "speciality" and then add the person-related information.
The second choice (upper levels in the system connect the model and the UI) is exactly like StructuredProgramming. This leads to systems that are very hard to change. -- GuillermoSchwarz
I know that by now this page is really a DeadHorse?, but I think I'll beat on it a little more.
I was astonished that there was even any debate on separating I/O from calculation, but then it occurred to me that most of us have never programmed in AssemblyLanguage.
 LOAD POINTER REGISTER WITH LOCATION OF BUFFER
 LOAD ACCUMULATOR REGISTER WITH "FETCH" API CODE
 CALL OS API LOCATION            ; OS returns number of bytes in COUNTER REGISTER
 JUMP TO BOTTOM OF LOOP IF COUNTER REGISTER ZERO
 TOP OF LOOP
 LOAD BYTE REGISTER WITH BYTE AT CURRENT POINTER REGISTER
 (do something useful [calculation] with the byte)
 INCREMENT THE POINTER REGISTER
 DECREMENT THE COUNTER REGISTER
 JUMP TO TOP OF LOOP IF COUNTER REGISTER NOT ZERO
 BOTTOM OF LOOP

So, where did the byte come from?
In those instances where we *must* write an I/O driver to fetch or put data, we certainly don't want the code that does the meaningful calculations tangled up in it!
Now, in a higher level language, since "Boy, bring me another byte" handles I/O as a fairly abstract concept, we may see no real need for separation but, trust me, you want the nuts and bolts of I/O done away from your pure-bred algorithms.
Now, don't get me wrong, I have nothing against I/O routines as a whole - heck, some of my finest code is I/O routines - but I wouldn't want one to marry my data.
I/O can't be separated from calculations. I/O is just one form of calculation.
A better way to describe this may be to move from this:
 loop {
   x = highRiskRead()
   process(x)
 }

To this:
 loop {
   x = highRiskRead()
   insertIntoStructure(myStruct, x)
 }
 loop {
   x = getNextItem(myStruct, ...)
   process(x)
 }

This can simplify error-handling because, other than running out of disk space (assume the data structure is cached), we should not have to do heavy error-handling in the second loop: if there is a problem we can just conclude that something is too messed up and stop, without worrying about cleanup. The first approach, however, may need complex error-handling to undo half-done processing, because we know the IO is full of risk. I find error-handling is just simpler when high-risk processes are separated from low-risk processes. Ideally, all errors should be handled gracefully, but I find that complex error-handling for rare systems-related problems is not worth the code volume and is tough to test, so displaying an error message and closing down is often sufficient. For example, I don't put disk-full error-handling on every structure "insert" command, because if the disk is full there are probably far worse things to worry about. It is usually not an application's job to monitor disk or cache space, except maybe in life-support systems or bulk-load batches.
Ideally, a language would have a central handling routine for system errors, so that we can give a message and perhaps write to a log if there are problems, but otherwise not worry about graceful recovery. If task-specific graceful recovery is needed, apply specific handling code only to those critical sections.
It might be true that in resource-sensitive domains this separation is too costly, because it tends to require an intermediate buffer structure. It may be a trade-off between developer productivity and machine productivity.
-- top
The 2nd example above is a good example of an "unbounded memory" algorithm. There's no upper limit to the memory it will consume. It might not exhaust virtual memory, but you can't reliably predict how much memory it will use. That means you can't predict how many instances can be hosted on the same machine or what its impact will be on other processes. This kind of algorithm doesn't scale gracefully and should be avoided.
The preferred alternative is to use a bounded memory algorithm that solves the same problem. The 1st example above might be one of those.
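For what it's worth, a generator (one form of the LazyEvaluation mentioned earlier) gives the module-level separation without the unbounded buffer. A rough Python sketch, with highRiskRead and process replaced by toy stand-ins:

 def read_items(high_risk_read):
     """IO phase: pull items until an (assumed) end-of-input sentinel of None."""
     while (x := high_risk_read()) is not None:
         yield x

 def calculate(items, process):
     """Calculation phase: pure, works on any iterable, one item in flight at a time."""
     for x in items:
         yield process(x)

 # Toy stand-ins for the risky read and the calculation from the examples above:
 data = iter([1.0, 2.0, 3.0, None])
 for result in calculate(read_items(lambda: next(data)), lambda x: x * x):
     print(result)    # memory stays bounded, yet IO and calculation live in separate functions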
 startTransaction
 loop {
   x = highRiskRead()
   process(x)
 }
 endTransaction
This is the standard model for steps in a workflow, or requests in a server. Start a transaction, wait for a message, read the message, process the message, send zero or more messages, commit the transaction. This allows you to pipeline operations and distribute I/O & CPU bandwidth between multiple processes.
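A rough Python sketch of one such step, assuming hypothetical db and queue objects with transaction(), get(), and put() operations (not any particular library's API):

 def handle_one_step(db, queue, process):
     with db.transaction():              # start the transaction
         message = queue.get()           # wait for and read the message
         for reply in process(message):  # pure calculation: message -> zero or more replies
             queue.put(reply)            # send the replies
     # leaving the `with` block commits; an exception unwinds and rolls back instead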
 sub generic_source_loop(&risky, :&onErr:($)) {
   loop {
     try {
       risky();
       CATCH { onErr($!); }
     }
   }
 }

 sub do_calculations(*@input) {
   map { process $^x } @input;
 }

 do_calculations generic_source_loop { highRiskRead() };

Tadaa!
In ObjectOrientation, this is about separating MutatorMethods from InspectorMethods, CommandQuerySeparation.
I've encountered a situation where SeparateIoFromCalculation and data-driven-programming (perhaps a variation of TableOrientedProgramming) seemed to reduce the need for HOFs (HigherOrderFunctions). I had this domain-specific report presentation engine. It provided formatting and did optional row and column totals. However, each report cell needs custom formatting for different usages (instances) of the report engine. Either I copy-and-paste the report engine for each instance to customize it, or I find a nice way to apply situation-specific formatting rules to the cells. A bunch of domain factors could be involved in how a cell is formatted. One approach I started with went something like this:
 function reportEngine(dataStruct, cellDisplayHOF, ...) {
   ...
   while (R = rowLoop(...)) {
     while (C = columnLoop(...)) {
       ...
       cellDisplayHOF(dataStruct[R,C].value, factorA, factorB, factorC, etc...)
       ...
     } // next C
   } // next R
   ...
 }

The problem is that the report engine had to track and pass all the situational factors that could affect cell formatting, such as cell colors, alignment, and value formatting with things like "$123,456.00" for some dollar amounts, etc. The situational logic would go something like, "If this is in region X and is one of the special product categories specified by person Y after date Z, then make the cell red".
Some factors that affected formatting came from parameters passed to "reportEngine", and some were attached to the data structure because they were more cell-specific. Different situations used different factors, so for consistency I had to assume the widest set of factors. I was shuffling too much information through the report engine; it was a middle-man that shouldn't have to care about all this situation-specific info.
I decided that a better solution would be a data-driven one that separated the format calculation from the display. A structure/table like this was more useful:
 structure: reportCells
 ---------------
 rowID
 columnID
 rawValue          // needed for totals calcs
 cellFormatString  // TD attributes
 displayValue      // may have formatting, such as commas in big numbers

One routine generates the reportCells table/structure, and the other simply displays it. The display routine no longer has to care about domain factors that may affect the display. Here is a rough idea of how the display portion works:
 function displayReport(reportCells, ...) {
   ...
   while (R = rowLoop(...)) {
     print("{tr}");    // HTML row start
     while (C = columnLoop(...)) {
       print("{td %1}", reportCells[R,C].cellFormatString);
       print("%1{/td}", reportCells[R,C].displayValue);
     } // next C
     print("{/tr}");   // HTML row end
   } // next R
   ...
 }
 // (Braces used in HTML tags instead of angle-brackets to not confuse wiki.)

The "computation" of the cell formatting no longer has to pass through the display portion; it is already done in a prior step. (This is an oversimplification, but it gives an idea of what is going on. For one, the data structure may have had other columns used in other calculations, but that didn't matter because the display routine ignored extra columns.)
The downside is that some of the looping structure is duplicated between the format calculator and the display portion for many usages (not all, because the order of calculation depends on usage-specific issues). But it is worth it.
In this case, use of data-driven programming and SeparateIoFromCalculation avoided the need for HOF's, and simplified the app.
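A hedged Python sketch of that shape (field names follow the structure above; the data and rules arguments are assumptions for illustration, and row grouping is omitted to keep it short):

 def build_report_cells(data, rules):
     """Calculation: apply the situation-specific formatting rules up front."""
     cells = {}
     for (row, col), raw in data.items():
         fmt, display = rules(row, col, raw)       # e.g. ('bgcolor="red"', '$123,456.00')
         cells[(row, col)] = {"rawValue": raw,
                              "cellFormatString": fmt,
                              "displayValue": display}
     return cells

 def display_report(cells):
     """IO: walk the prebuilt cells; no domain factors are needed here."""
     for (row, col), cell in sorted(cells.items()):
         print('{td %s}%s{/td}' % (cell["cellFormatString"], cell["displayValue"]))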
See also SeparateDomainFromPresentation, AvoidExceptionsWheneverPossible, ResultSetSizeIssues