Node Js And Hof Discussion

NodeJs (a JavaScript kit) and HOF Discussion

Continued from ArrayDeletionExample


An absolute demonstration of eval()'s insufficiency and the power of closures is NodeJs, a platform for server-side JavaScript. The genius of Node is that every single I/O call is non-blocking; rather than returning its result directly, it takes a callback function, reminiscent of ContinuationPassingStyle. For example, reading from a file looks like this:

fs.readFile("path/to/file.ext", function(err, data) {
  // do stuff with file data in here
});
Because every call is non-blocking, like the above, Node essentially has concurrency ForFree. It's a fantastic way to write all sorts of applications, especially servers. And it's absolutely 100% impossible for any eval() system to achieve this kind of evented structure (at least, without actually being a closure system in disguise).
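A minimal sketch of this callback style, using setTimeout to simulate I/O delays (the file names and delays are made up for illustration; no real files are read):

```javascript
var log = [];

// Stand-in for a non-blocking read: the call itself returns immediately,
// and the callback fires after delayMs.
function fakeRead(name, delayMs, callback) {
  setTimeout(function() { callback(null, "contents of " + name); }, delayMs);
}

fakeRead("slow.txt", 50, function(err, data) { log.push(data); });
fakeRead("fast.txt", 10, function(err, data) { log.push(data); });
log.push("both reads dispatched");
// "both reads dispatched" comes first: neither call blocked, and the two
// simulated reads are now in flight concurrently. fast.txt's callback
// fires before slow.txt's, regardless of dispatch order.
```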

To show exactly why eval() can't possibly do this, here's a code example from a simple Node server:

http.createServer(function(req, res) {
  res.writeHead(200, {'Content-Type': 'text/html'});
  withImage(function(img) {
    res.end("<img src='" + img.url + "' />");
  });
}).listen(8000);
Notice that the res variable is accessed in both the outer scope and the inner callback. Without LexicalScoping and proper closures, res would be gone by the time withImage() fetched the image and called the inner callback. With those two things, this code works perfectly and reads quite naturally.

-DavidMcLean?

I almost never had a real problem with concurrency for web apps or client/server architecture. If it becomes an issue, then I rewrite it to use the concurrency-handling capability of the database, such as ACID and transactions. You appear to be using a file system for data "storage" when a database would perhaps be more appropriate. If you need to store content such as photos with concurrent uploads, then generate a unique ID first in the image or article tracker or record (usually article ID), and use that as part of the destination file name to avoid user collisions. (It's also possible to store images directly in a database, but that's not always an option. There are other approaches, but I'd have to see the domain details to recommend something specific.) It's common for us CBA developers to rely on the concurrency engine in the RDBMS for concurrency needs. Why do you HOF fans keep trying to sell refrigerators to us Eskimos? -t

Ah, figures. The fact that I mentioned files at all clearly means I advocate them over databases for all data storage. I merely selected the file-read function to demonstrate in a simple way the callback-oriented I/O; in retrospect, this was a very bad move considering to whom I'm talking.

Node's non-blocking I/O doesn't just apply to files. It works equally well with any kind of I/O, including databases, and it does so using the exact same callback design. That withImage() function actually fetches an image URL from the Google Images API, for example, not from anything on the local filesystem (unless your local filesystem just happens to be a Google datacentre).
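As a rough sketch of the same callback shape applied to a database, here is a simulated driver (the db object, query text, row shape, and delay are all stand-in assumptions, not any particular library's API):

```javascript
var order = [];

// Stub database driver; a real one would send the SQL over the network.
var db = {
  query: function(sql, callback) {
    setTimeout(function() {
      callback(null, [{ url: "http://example.com/cat.png" }]);
    }, 10);
  }
};

db.query("SELECT url FROM images LIMIT 1", function(err, rows) {
  order.push("results arrived: " + rows[0].url);
});
order.push("query dispatched");
// "query dispatched" is recorded first: db.query() returned immediately,
// and the callback runs later, when the (simulated) results come back.
```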

Actually, given what I've seen of TableOrientedProgramming, I think Node would be an excellent fit for it. TOP clearly is full of database queries. You know how whenever you make a database query there's network overhead? When you use Node, that network overhead is put to good use. While it's waiting on the results of a query, your app is free to receive other requests, perform other computations, and so on. Node's weak point is stuff that requires a lot of processing directly in-app, rather than through I/O, since its magical concurrency design doesn't inherently include threads. Because TOP involves having the database do as much processing as possible, Node's weakness is dodged nicely.

-DavidMcLean?

I'm not sure what you mean by "network overhead" here.

Re: "While it's waiting on the results of a query, your app is free to [do other things]". That hasn't been a common (explicit) need for the web-apps I've worked on. But it can possibly be handled with "hidden" frames or iframes. Web pages make it pretty easy for the user to "spawn" other pages if they want while waiting. One can put up a page that says, "While waiting for your request to process, here are some other articles/topics you can peruse". In-line ads already do this pretty much. Just make sure to set "target='_blank'" so the original "waiting" page is not wiped out.

On client-server apps, one typically makes the long process spawn a secondary process and "status screen" that says something like, "Your query/report is being processed, please wait...". When it's ready, then a notice and a "View Report" button comes up on that secondary screen.

You send a request to your database. There is some delay, due to the time taken to deliver the request and response across the network. You receive a result from your database. Network overhead. (Also, there's overhead in the DBMS itself, especially as queries grow more complex. There's plenty of overhead in the whole process for Node to exploit.) Under most platforms, that network overhead slows down your app, as synchronous code has to block while it waits for results; under Node, overhead gives your app the chance to get some work done, processing events from the queue.

You say there hasn't been an explicit need for concurrent behaviour in your Web apps, but in fact all Web apps need to be concurrent, because they will inevitably receive requests from more than one user. The "explicit" part, I think, highlights that you've missed the point a little: Why should you have to be explicit? It's a Web app; of course you're going to want decent multiuser handling. Node's evented structure means that you get multiuser concurrency in your app server implicitly, without needing to give it special consideration.

While I cannot condone using frames to implement asynchronous Web-apps (you do know what AJAX is, right?), the client-server technique you describe is a sane one. However, the idea of offloading complex processing to a separate process is the crux of Node's nimble event-loop, so that design automagically works in Node without much special server setup! The platform makes it extremely natural and clean to set up many concurrent situations, including the one you describe.

-DavidMcLean?

Maybe if we explore some kind of semi-realistic scenario. The devil's in the details of the domain needs. -t

You'd like a specific scenario? Okay. Here's a simple example:

var db   = require('db');
var http = require('http');

var s = http.createServer(function(req, res) {
  res.writeHead(200, {"Content-Type": "text/html"});
  db.query("SELECT * FROM tweets", function(err, rows) {
    for (var i = 0; i < rows.length; i++) {
      var tweet = rows[i];
      res.write("<p class='tweet'>" + tweet.text + "</p>");
    }
    res.end();
  });
});
s.listen(8080);

The above code, based somewhat on an example program from the book Node Up and Running, implements part of a Twitter clone in Node; the design is fairly conservative so as not to distract from the following explanation of the eventing process (a production version would almost certainly use an HTML templating system, for instance, rather than generating HTML directly inside the query callback as this example does). This basic code concept would be perfectly applicable to report-writing business applications, among others.

Let's look at how the above code is processed, under Node. When the application starts, practically all of the code shown above is run almost immediately. This "first pass" in Node usually does very little actual processing; in this application, the first pass sets up some callbacks and tells Node to listen on port 8080. After the first pass completes, in any Node application, Node will stop and wait for events to show up in its event queue.
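The "first pass" behaviour can be seen in miniature with a sketch like this (a setTimeout with a 0 ms delay stands in for any I/O callback):

```javascript
var order = [];

order.push("first pass: setup begins");

// Even a 0 ms timer cannot fire until the first pass has finished.
setTimeout(function() {
  order.push("event loop: callback fires after the first pass");
}, 0);

order.push("first pass: setup ends; no callbacks have run yet");
```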

Now our server is up and ready to use. Let's have client A make an HTTP request.

Now, one of two things can happen next: either Node is currently idle, in which case the request event is dequeued and its callback is run immediately, or Node is busy handling some other event, in which case the request event waits in the queue until Node finishes what it's doing and gets to it.

That about covers the Node eventing model. The most important point is that the only time Node isn't doing something is when there is nothing to do, when there are no events waiting in the queue to be processed. Node never stops to wait for a database query, or file read, or call out to a shell script, or any of those things. It's essentially always getting something done.

It should be apparent from the above example why Node has such good concurrent performance, as well as why it's weakened by applications needing significant processing directly within Node rather than through external processes.
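That weakness can be shown with a small sketch: a synchronous busy-wait (standing in for heavy in-app computation) delays a timer that was due much earlier.

```javascript
var start = Date.now();
var firedAfter = null;

// This timer is due after 10 ms...
setTimeout(function() {
  firedAfter = Date.now() - start;  // ends up >= 50 ms, not ~10 ms
}, 10);

// ...but this synchronous busy-wait blocks the single-threaded event
// loop for ~50 ms, so the timer cannot fire until it finishes.
var t0 = Date.now();
while (Date.now() - t0 < 50) { /* stands in for heavy computation */ }
```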

-DavidMcLean?

No no no no. I'm not looking for a technical-only demonstration. I want to see it solve a stated semi-realistic business need. Something along the lines of, "I'm a warehousing company and need a report that shows both the dfasdf and the asdfasdf every third Tuesday of the month because that's when our fasdfjk deliveries roll in. Here is how to best satisfy this need..." Your demo is more along the lines of "I can print 'Hello World' 0.02 seconds faster!" Which may be neat in a MentalMasturbation kind of way, but is not shown solving a real need out in the field.

You asked "How exactly does it improve performance noticeably?", so I assumed you wanted an explanation of how exactly it improves performance noticeably. Regardless, the example code is easily applicable to business applications, as I already mentioned; it's essentially a (very simple) report writer, much like Challenge #6. (I chose this specifically with the expectation that you would consider report writers business apps, so I'm a little surprised at your response.)

It seems odd to complain that my demonstration shows a way to produce results faster, when I'm demonstrating a concurrency system. Replacing your strawman description with something actually describing what I've just shown, such as "I can print your business's reports seconds-to-minutes faster", should show that what I've provided is actually a highly desirable quality in a business app.

-DavidMcLean?

Sorry, I don't see what bottleneck it's allegedly plugging. What exactly is serial before and now parallel under your gizmo? The delays in most custom web apps are NOT caused by slow server app processing, but by the delay of bytes across the network "wires", and secondly the database processing. Now databases can and do use parallelism, but you have to have the hardware and indexes etc. set up properly, which will probably be an issue in any shared, multi-user data "storage", and databases are probably more mature in that department. Most of the same kinds of decisions and trade-offs will be relevant to anything that fulfills that role. Parallelism and concurrency are often not a free lunch: they can require more discipline and planning. Let's not spend our time coding/tweaking around with such if it's not necessary, because single-threading and serial processing are usually easier to debug and grok. Thus one shouldn't parallelize willy-nilly.

Further, most web servers already do parallel processing because each user's request is or can be a separate process. Splitting further at the sub-user level doesn't gain one anything. Say there are 16 current users and 8 processors. The web server (IIS or Apache etc.) will typically split these 16 users among the 8 processors such that two users (or requests) are assigned per processor. If you split at the sub-user level also, then you simply have more little processes waiting in line. 10 2-inch processes waiting in line is not going to be better than 5 4-inch processes waiting (especially since they can be allocated to different CPU's when ready). Now when typical servers have say 100 processors, it may start to be a help, but the app code (outside of queries) is generally not doing anything processor intensive anyhow such that 90 of those will be doing NO-OP's (or working on database requests). We're not calculating pi to 500 million decimal places or predicting the weather 2 weeks ahead. If your app code (outside of queries) is the performance bottleneck, then usually you are doing something wrong, or at least the hard way. Typically the sin is not taking advantage of the database's capabilities and instead doing mass DatabaseVerbs in app code. CBA App code should mostly be like a building receptionist: guiding inputs, outputs, and service requests to their proper destination based on the business rules. The receptionists should not be processing mass piles of paper or forms. If they are doing those kinds of tasks, then you are misusing them.

A similar issue comes with queries per unit of time. Sub-user parallelism may be able to submit more queries per second, but if all the other users are also submitting queries, then we are right back to the same kind of problem. From the database side, the number of queries it can process per second (from multiple users) is going to be a far, far bigger factor than how many queries we can submit to the database (queue) per second. It doesn't address the actual/typical bottlenecks. It would be roughly comparable to bridge toll booths. We can make the highway going up to the booths wider (more lanes, say 12), but if we only have 6 booths actively processing (the database), then the same number of cars are still coming out the other end per minute (after they pay their tolls). We are not noticeably reducing a person's trip time. The wider feeder highway is mostly a waste of space and tax money. If the traffic is light enough not to bottle up the toll booths, then most likely the 12 feeder lanes will be sparse and wasted such that they are not helping the lighter-traffic trip times either.

It's rarely economical to spend your resources/time on the non-bottlenecks.

[Top, database processing is not the bottleneck per se -- even low-end DBMSs on low-end hardware can handle dozens of simultaneous queries. The bottleneck is within applications designed to work serially: They wind up waiting for apparently unrelated parts of themselves. This is exemplified by having to wait for content A to load before you can work on unrelated content B, where users are typically forced to stare at some irritating spinning "please wait" icon or irrelevant substitute content. It's because all the application's processing is serialised, thus forcing the user to perceive the cumulative effect of various inevitable small delays. Concurrent event-driven models essentially eliminate these.]

[What's notable about using concurrent event-driven models with proper support for higher-order functions -- as with AJAX and appropriate client-side libraries, Node.js, Windows 8 modern-style apps, and so on -- is that massive concurrency is essentially free. Once you grasp the underlying approach, it is no more difficult to develop and debug them than conventional "serial" applications, but the improvement in responsiveness and fluidity represents not only a better user experience, but a potential competitive advantage. It doesn't matter if you still develop apps that permit the same overall throughput as your competitor (which, of course, might even be a fellow developer within your department.) If her apps are perceived to be fluid and responsive, but your apps are perceived to be clunky because they stall and/or show a tumbling hourglass, who's going to win?]

You must be doing something weird. In my typical biz web apps, the database is usually the bottleneck. If I comment out the database portions, the rest runs in a snap. I do this frequently when tuning page esthetics. Browsers already have built-in parallelism and speed short-cut mechanisms, and one can leverage this. Your parallelism GoldPlating is a waste of programmer time. (There are other ways to speed up perceived and real web-page rendering without JS and HOF's, but we are wandering off topic here.)

And parallel algorithms are in general more difficult to debug because the order can be different on each run. It's like trying to do science in which you cannot isolate one variable because the other variables keep changing upon each test. HeisenBug risk. We can limit these problems to some extent by making certain assumptions and sticking to certain rules, but we are then accepting down-sides to gain the benefits. If the benefits are small for a particular situation, then the down-sides are not worth it. Even in theory, if two processes/events should be "parallel safe", clients (browsers & GUI engines) are often buggy such that events can cross-affect each other. I don't want to take my chances with potential HeisenBugs unless the payoff is fairly large. -t

I think you'll find the database isn't the bottleneck in your applications. Waiting for the database is. If you use an event-driven concurrency system, you never wait for the database, because there's more important stuff to do. As for difficulty in debugging, the callback-oriented structure means that coding under these systems is actually very similar to coding purely serial code. (Because they're still single-threaded, you don't have to worry about thread safety; many threading-related issues vanish when using single-threaded concurrency, partially because closures keep state encapsulated and local.) However, in evented-callback systems, you can be sure that any particular piece of code is only delayed by the things it specifically depends on; if a query is needed to show page A but not page B, then page B won't be delayed by attempts to construct page A.

Heck, even constructing a single page can have fewer delays. Suppose page A requires several database queries (Plus maybe some other external source. Perhaps some HTML sourced from the Google APIs?). In serial code, you'd make each of those queries in sequence, one by one, waiting for each query to complete before making the next one. With concurrent code, you can make all of those queries at once; databases themselves can easily handle multiple queries at once, so you'll get your results much faster than if you made the queries one by one.

-DavidMcLean?

I'm sorry, I still don't know what you are talking about. What is the "more important stuff to do"? Running a Honey Boo Boo video while the report is being generated? If I as a user click "View Report", I don't want to see Honey Boo Boo because I didn't ask for Honey Boo Boo. The button lied. (Don't need HOF's for that anyhow.) The bottom 10% of users get confused and call the help desk if too much is going on at the same time. If this is about making dancing spam more "efficient", I'm not interested in that topic today.

At least in HofPattern, the multi-panel real-time status monitor scenario was something of utility. We just disagreed about whether a JavaScript client should be the reference point for measuring "good". I'd like to get away from GUI-intensive scenarios if possible, because the trade-offs depend heavily on the client technology being used or available, which greatly complicates the comparison. A GUI of some sort is fine in a scenario as long as it doesn't become the focus point of the differences. If HOF's are mostly about making GUI's/UI's "better" in CBA, then perhaps we should spawn a more narrow topic on that alone.

No, stupid. Like I've already explained, basically anything the app needs to do is more important than sitting around waiting for a query. Receiving a request from another client, perhaps. Or finishing off a request from another client, because the results of that client's query just came back. Or, to use an example from my previous paragraph that I was sure you'd love, using the time waiting for one database query to make another database query. I have no clue why you're discussing advertising so much; I'm beginning to suspect it's a strawman tactic. -DavidMcLean?

What biz scenario would you want the app to do that? Why get report B if I, the user, only asked for report A?

I didn't actually say that, but there's an obvious reason you'd want that to happen: if a different user asked for report B, then of course you'll want to retrieve that report as well.

Use query results caching. Most RDBMS support it. Even if it didn't, I don't see what you are trying to do from a business reason perspective.

… what does query-result caching have to do with anything we've been discussing, in the slightest?

It should really be obvious from a business perspective why you'd want this. It both speeds up construction of a single report and allows for multiple users to request reports simultaneously (or for a single user to request several different reports in separate tabs, which is equivalent from an HTTP perspective). Unless you actually want slower response times from your software, the value of evented-I/O concurrency should be apparent at this point. -DavidMcLean?

It sounds to me like a similar issue in the HofPattern topic re the multi-panel real-time monitor screen matrix scenario (AKA: Brady Bunch intro). However, I cannot be sure without specifics. There are different ways to skin the cat, and the choice depends on the domain details/requirements. If we are forced to use JavaScript as the client, then yes we may have to pretty much use HOF's, but that's a client-specific issue and I don't want to explore client specifics/limitations, I want to explore solving CBA problems in a more general sense, not compare browsers to VB to PowerBuilder to Delphi etc. Other than that, I have no idea what the hell you are getting at. You called me "stupid" and I am itching to retaliate at this point. Where's my breathing exercises link? Break your scenario down step-by-step: who, what, when, where, and why. See UseCase. If you want to communicate, roll up your sleeves and do it right. If it turns out your claims are client-specific, then I am bailing out.

The "stupid" comment was in direct response to your suggestion that the only useful thing for an app to do while it waits for queries is play a Honey Boo Boo video. I mean, come on. Why would you jump immediately to something as random and worthless as that, when we'd already gone over a lot of more useful things? It's either stupidity or trolling, and I chose to attribute to ignorance what I could instead have attributed to malice.

I don't care what we're using on the client, and it's irrelevant to the topic at hand. Nothing about anything we've mentioned is client-specific. Since we've been talking about Web apps, the client probably would indeed use JavaScript, but there doesn't necessarily need to be any client-side scripting going on in these apps. Note that Node can do stuff other than Web apps: There are libraries for building a more traditional desktop GUI, the ability to access stdin and stdout for writing command-line apps, as well as provision for TCP sockets such that HTTP isn't the only option for servers. It's a very flexible platform, although Web apps are the usual choice.

Concurrency through evented I/O is a general pattern. It doesn't really need to be plugged into specific UseCases to be demonstrably useful; it has already been explained how evented I/O can improve the performance of a report-writing application, however. -DavidMcLean?

I'm sorry, I don't see what's explicitly being improved. You seem to be making some unrealistic assumptions. Parallelism alone is no guarantee of speed improvement. That's why I want to walk through a specific scenario. You are being too general and vague. I'm fucking tired of foo/lab examples of FP being great. I want real beef from a real goddam cow!

I already gave a simplified-but-practical example of how Node.js uses evented I/O to achieve improved concurrent performance, using a business-domain application (a report-writer). Did you not understand how it works? I'll try to explain in more detail, if required. -DavidMcLean?

That's not a UseCase. There are ways to run multiple threads without having to use (exposed) HOF's on clients and/or servers. You haven't ruled those out. Why are they "bad"? And cranking up the number of threads if the bottleneck is the RDBMS will do us no good.

Because they're multiple threads in the first place, which raises concerns of thread safety, race conditions, and so on. Evented I/O is usually single-threaded (Node is), making it simpler to work with. The preceding description of how Node's eventing system works may be worth another read; if you're still equating it with multi-threading, you haven't really got the basic concept. And the bottleneck isn't the RDBMS, as we've explained: It's local app code waiting for the database.

And the fact that these concurrency systems use explicit anonymous functions isn't a weakness. It's a strength, because functions are very easy to manipulate to do cleverer stuff. For example, retrieving two database queries in parallel to use in one report, a possibility I mentioned above, would be rather convoluted and messy using pure callbacks: You'd need to code up some reference-counting junk and it'd be annoying. However, because higher-order functions are so general and flexible, you can write libraries to wrap up these sorts of concurrency patterns. In fact, if I wanted to implement the above two-query thing, I wouldn't even consider writing the callback structure myself manually. I'd just load up the async library and do this:

async.parallel({
  users: makeQuery("SELECT * FROM users"),
  posts: makeQuery("SELECT * FROM posts")
}, function(err, results) {
  var users = results.users;
  var posts = results.posts;
  // can do whatever you want with these two now
  res.write(aReportMadeUsing(users, posts));
  res.end();
});

Bam. Two queries performed in parallel, used to construct one report. Tidy and intuitive. It'd be impossible to provide nice libraries like async.js if Node's concurrency didn't use handy things like higher-order functions. -DavidMcLean?
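For the curious, here is a rough sketch of how such a helper can be built from plain callbacks and closures. This is an illustrative assumption of what async.parallel does internally, not its actual source (the real library is far more thorough):

```javascript
// Minimal parallel(): run every task at once, collect results by key,
// and call done exactly once when the last task finishes (or one fails).
function parallel(tasks, done) {
  var keys = Object.keys(tasks);
  var results = {};
  var pending = keys.length;
  var failed = false;
  keys.forEach(function(key) {
    tasks[key](function(err, value) {
      if (failed) return;                          // already reported an error
      if (err) { failed = true; return done(err); }
      results[key] = value;
      if (--pending === 0) done(null, results);    // last one in: report all
    });
  });
}

// Usage with two simulated queries (setTimeout stands in for db.query):
var report = null;
parallel({
  users: function(cb) { setTimeout(function() { cb(null, ["alice", "bob"]); }, 30); },
  posts: function(cb) { setTimeout(function() { cb(null, ["hello"]); }, 10); }
}, function(err, results) {
  report = results.users.length + " users, " + results.posts.length + " posts";
});
```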

Usually an SQL JOIN or UNION is done to "combine two queries". The RDBMS can potentially parallelize multiple sub-queries. Also, multiple different techniques can implement a parallel "makeQuery" function. Bam! Granted, many existing web frameworks and languages don't make doing such very easy, but that's likely because the need is not very common. The few times I can recall when I couldn't use JOIN or UNION to get the database to do it, the queries had "lopsided" profiles such that parallelizing them would not double the speed. For example, one may take 500 milliseconds and the other 50 milliseconds. A non-parallel version would then take 550 ms and the parallel version 500 ms (under ideal conditions). That's hardly enough savings to bother with in most cases. Optimizing the graphics on the page would probably give the app more of a boost per time spent, and keep the code simpler. Further, if the server is taxed, it may not be able to parallelize them anyhow, and/or they could end up competing for the same resource, such as disk or network I/O, such that they end up waiting on each other anyhow. A lot of circumstances would have to line up just right to get a noticeable boost. If you look at the math in the context of real systems and real bottlenecks, parallelism is often over-rated for CBA. I vaguely remember one profiling expert saying that as a rule of thumb, in production you get about 20% to 40% of the theoretical maximum of the savings. Thus, if "unstacking" two queries of the same size could in theory boost the speed from 2000 ms to 1000 ms, then the typical actual average would be something like 1700 ms, since only about 30% of the theoretical 1000 ms saving is realized (2000 - 0.3 × 1000 = 1700). Spending that unstacking time tweaking the query statements or indexes may give more speed per programmer time.

And databases are starting to work parallelism into their Stored Procedure languages. See also example "fern01" later.

makeQuery() isn't a parallel function. It'd be defined like this:

function makeQuery(sql) {
  return function(callback) {
    db.query(sql, callback);
  };
}

And you still seem to be assuming the bottleneck is the database itself. It's not. The app code that has to wait for database results is. When you use evented I/O like Node, your app code doesn't have to wait for database results. Therefore that bottleneck is reduced. It's fairly simple, really. -DavidMcLean?

What else is it going to do during that time? If the user asks for Report X by pressing the Report X button, then the app has to run the necessary query(s) for Report X before delivering Report X to the user. Thus, either the user waits for the database to complete its job, or pressing Report X does something else besides (in addition to) deliver Report X, which would make the button a liar. Thus, it's either Lie or Wait. There is no 3rd option known to mankind. Maybe it can run Seti@Home while waiting so that aliens can answer that difficult question. (Seti@Home is a different app, but maybe you mix and match in weird ways such that your vision of "application" differs greatly from mine. It reminds me of the old joke: "The Emacs operating system needs a better editor.")

   Example: Frame-mania
   -----------------
   1. [Run Report A]
   -----------------
   2. Report B is finished. [View]
   -----------------
   3. Report C is running. [Cancel]
   -----------------
   4. [Run Report D]
   -----------------
   5. [Run Report E]
   -----------------
   6. Report F is running. [Cancel]
   -----------------
   7. [Run Report G]
   -----------------
   8. Report H is finished. [View]

For contrast, consider execution order in the two styles. With a conventional blocking call, the debug line prints only after the query has completed:

results = db.query("SOME QUERY HERE");
print("DEBUG: ran query");
buildReportWith(results);

With evented I/O, the debug line prints before the report is built, because db.query() returns immediately and the callback runs later, when the results arrive:

db.query("SOME QUERY HERE", function(results) {
  buildReportWith(results);
});
print("DEBUG: ran query");

    <event name="event72" namespace="_auto" blocking="no">
      <script language="footran">
        myProcess01(7)
        if foo.setting < 4 then
           myProcess02()
           statusBox.value = "done with 2"
        end if
        logThingy(99)
      </script>
    </event>

    <event name="event_flip">
       <namespace name="foo">
         <!-- display the panel, and change title of its button -->
         <panel name="my_panel" visible="true">
         <button name="my_button" title="Flip the Thing">
       </namespace>
    </event>

<event name="event_flip">
<!-- display the panel, and change title of its button -->
<panel name="my_panel" visible="true">
<button name="my_button" title="Flip the Thing">
</event>

events.flip = function() {
  // display the panel, and change title of its button
  my_panel.visible = true;
  my_button.title = "Flip The Thing";
};

No, there's a third option, which I've explained already. The app handles other events in its queue while it's waiting for a particular one to return; all tasks that a Node app must perform are represented as events with callbacks, so handling other events amounts to getting other work done. This might include accepting new requests, finalising older requests, or perhaps handling some sort of deliberately-deferred action. Basically everything you'd want your Node app to do will be represented using events, so having Node process events rather than waiting around is extremely useful and provides a great performance boon for many applications. -DavidMcLean?

Sorry, without UseCases, I cannot evaluate your claims. As written, it sounds like you are jamming too much functionality into a single app. My joke about Emacs seems to be coming true here.

It has come true: there is a port of Node.js to Emacs, called Elnode: http://www.emacswiki.org/emacs/Elnode

[Exactly the same functionality would be in an AJAX/Node.js app that would be in the same app using traditional approaches. The functionality would only be structured differently.]

Okay, but what specific spot of code or functionality is being improved? Yes it can do the same thing, but that's not the issue. I want to see the production-like code behind something like this:

Here's your app on HOF's:

  Hello World!
Here's your app on non-HOF's:

  Hello Wo~ %6 ` $k~[ z& j&^^

It's a little hard to know how to address your above request, because the claim we've made is for evented I/O, which in general is something that's only possible using callback functions or specific alterations to language semantics. I can't show a non-HOF version because evented I/O not using higher-order functions simply doesn't exist.

For the sake of completeness, by "alterations to language semantics", I mean adding a new keyword, like this:

 async users = db.query(query here...);
 async posts = db.query(query here...);
 res.write(reportUsing(users, posts));

This would function equivalently to:

 db.query(query here..., function(users) {
   db.query(query here..., function(posts) {
     res.write(reportUsing(users, posts));
   });
 });

This is a useful way to tidy up very simple callback-oriented code. It is a nice tool to have, and indeed there are a couple of JavaScript preprocessors that implement a very similar feature (Coco, for example, has "backcalls" which work like the above).

However, this keyword would be close to worthless if provided as the only option for implementing evented I/O, simply because it's not very flexible. An async keyword is indivisible and cannot be used to construct fancier, cleverer asynchronous logic; with a keyword like that, writing async.parallel wouldn't be possible. By contrast, an evented system focused on higher-order functions can easily have excellent libraries of asynchronous functionality written for it, because functions are so easy to manipulate in clever ways. -DavidMcLean?
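To make the async.parallel point concrete, here's a hedged sketch of how a "parallel" combinator can be built from plain higher-order functions. This is a simplified stand-in for illustration, not the real async.js API:

```javascript
// A simplified "parallel" combinator built from higher-order functions.
// Each task is a function taking a callback; once every task has called
// back, done() receives the collected results in task order.
function parallel(tasks, done) {
  var results = new Array(tasks.length);
  var pending = tasks.length;
  tasks.forEach(function (task, i) {
    task(function (res) {
      results[i] = res;      // slot each result by its task's index
      if (--pending === 0) { // the last task to finish reports back
        done(results);
      }
    });
  });
}

// usage with two pretend "queries" (hypothetical stand-ins for db.query)
parallel([
  function (cb) { cb("users result"); },
  function (cb) { cb("posts result"); }
], function (results) {
  console.log(results.length); // 2
});
```

Because tasks and completion handlers are just functions, combinators like this can live in an ordinary library rather than in the language itself.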

We covered this example already above, didn't we?

I described above already why the occurrence would be rare in practice. But even for the rare cases where we might want to un-stack the two queries in the app code, ideally the code/language would look something like:

 // example "fern01"
 users, posts: recordSet;  // declare
 parallel a {
     users = db.query(fooSql);  // thread "a"
 } b {
     posts = db.query(barSql);  // thread "b"
 } onError(info)  {
     raiseError("Problem: " & info.msg & " in " & info.threadName);
 } // end parallel block-set
 io.write(reportUsing(users, posts));

The "parallel" block-set is syntactically kind of like a CASE statement (not C's, it sucks badly). Such a language would NOT need (exposed) HOF's to give us something like this. Why would HOF's be an improvement on this for this kind of use? -t

So you're saying that any and all concurrency patterns should be implemented as built-in language constructs? This seems, to me, an awful way to design a language. Considering async.js alone, it contains ten different functions for concurrency control flow. Are you suggesting that there should be ten different language keywords for all these concurrency patterns? Isn't it better to provide a single, extensible concurrency framework, which can be supplemented with libraries?

In fact, the only advantage I see from your method is that it allows you to avoid learning how to use higher-order functions. Can you provide any other justification for your design, which clearly requires the language designers to hard-code every possible concurrency pattern? -DavidMcLean?

You seem to want a gateway kit into fancy-land with concurrency/parallelism. I'm only looking to give my niche some basic constructs for the rare occasions where it's needed. KISS for the rare stuff. Besides, if you go fancy-land, it's harder for the company to find and hire programmers who can read the code. It's one of the reasons why Lisp never caught on in paid development: it's too easy to re-invent your own language in the language, making it a write-only code-base. Everybody reinvents their own "clever" FOR-loop and such. It's like people who like the smell of their own farts. Maybe I'm a grumpy fogey, but so be it. Grumpy people are sometimes a good reality check for the cocky and naive. And get off my parallel lawn!

Re: "Are you suggesting that there should be ten different language keywords for all these concurrency patterns?"

No, a SystemsSoftware language could probably make fair use of such. I have no problem with different languages optimized for different domains or at least groups of domains. For example, I'm a fan of ExBase for smallish low-budget RAD and in-shop apps, but I'd never want to see somebody make an operating system with it.

Hard-coding concurrent constructs isn't KISS. Using one consistent means (callbacks) to implement all concurrency is. In addition, the callback-oriented method doesn't require the semantics of the original language (JavaScript, in this case) to be changed at all! In short, it's a much more generic, capable, generally useful method than the hard-coded design you are for some reason recommending.

Also, uh… you think applications software can't make use of concurrency patterns? I don't even know how to respond to that. -DavidMcLean?

It's a rare need, and the above or other tools (such as the database) can stretch to fit such even if it's a bit more coding when the blue moon does happen. Spend Complexity Wisely. --TopMind

The problem with that is that it can't. Coding specific concurrency patterns into the language, rather than providing a generic concurrency mechanism, means that it's not possible to set up concurrency that doesn't conform to any of those patterns.

And, even assuming it were possible, it's going to be a huge mess. And because for some unfathomable reason you hate higher-order functions, you can't even wrap up that huge mess inside a function; that huge mess will need to appear every single time you use the pattern. If you use a generic mechanism to start with, it's easy to keep it simple by encapsulating complexity inside functions and libraries (like async.js).

Basically… your solution is objectively worse, and I still see no advantage to it beyond allowing you to pretend higher-order functions aren't useful in the business domain. -DavidMcLean?

Not if we are also measuring the complexity of the language against the frequency of usage of those constructs in practice. It's a matter of weighing trade-offs.

Even then, your way loses, because your constructs are more complex than a callback system. If you use callbacks, you have a single, simple, consistent concurrency mechanism that matches up with the rest of the language. If you use your hard-coded constructs, you have a whole bunch of possibly-inconsistent new keywords that aren't anything like the rest of the language. Your way is also more complex. -DavidMcLean?

How are you measuring complexity? What key-words?

A simple count of how many constructs the language has. Under my method, there's exactly one: first-class functions. Under yours, there are at least two: your "parallel" construct and another for accessing things serially, like the "async" keyword I demonstrated earlier, and your method requires you to add a new language construct for any new pattern. Like I said, async.js has ten functions for this purpose, so you'd need at least ten new constructs for completeness. -DavidMcLean?

Please explain why I need an "async" thing. Maybe CBA doesn't need "completeness". If we find a REAL need for a new construct, THEN we'll add it. YagNi.

For making any kind of request serially. Perhaps you have to make a query that depends on the results of another; maybe you're using a ControlTable for example. If you only provide a parallel construct and no serial one, you run into exactly the same issues as providing a serial construct and no parallel one: both are essential, and you can't easily emulate one using the other. -DavidMcLean?
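For concreteness, the serial dependent-query case looks like this with callbacks. The db.query here is a hypothetical stand-in answering from a canned table, purely for illustration:

```javascript
// Serial case: the second query's SQL comes from the first query's
// result (as with a ControlTable row naming the action to run).
// db.query is a hypothetical stand-in answering from a canned table.
var cannedResults = {
  "select action from control_table": "select * from users",
  "select * from users": ["alice", "bob"]
};
var db = {
  query: function (sql, cb) { cb(cannedResults[sql]); }
};

// nested callbacks: the second query runs only after the first returns
db.query("select action from control_table", function (nextSql) {
  db.query(nextSql, function (rows) {
    console.log(rows.length); // 2
  });
});
```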

Can you identify a realistic CBA scenario where this might happen?

I just did. ControlTables. I'm given to understand you like those. -DavidMcLean?

No no no, a business setting. You know, UseCase. I WILL ALLOW ACTUAL RELATIVELY-FREQUENT SCENARIOS AND ONLY ACTUAL SCENARIOS TO DICTATE WHAT SHOULD BE IN A CBA LANGUAGE. Control tables are building blocks, not buildings.

… why? You don't see "oh, yes, you can make queries, but you can never ever use the results of a query for control logic" as a serious language flaw, whether or not you have a specific business scenario to use a query in that way? -DavidMcLean?

I don't know what you are talking about. You seem to solve problems backward: focus on the technology first and then think about the problem being solved when it should be the other way around.

If it's possible to make queries in parallel but not serially, it's not possible to use the result of a query to make another query. Since you advocate ControlTables so much, you should be able to appreciate why this is an issue.

I still don't know what you are talking about.

    result_1 = db.query(fooSql);
    result_2 = db.query(barSql);

In general, putting arbitrary limitations on your programming language, like this one, is completely pointless; they make your language worse while offering no advantages in return. So, I ask: Do you believe your language, which leans toward concurrent coding significantly (given the fact it has a "parallel" built-in construct), needs a corresponding construct for serial asynchronous calls? -DavidMcLean?

I have to disagree. In some ways programming languages do indeed "herd" developers to use certain patterns and conventions, and this is generally a good thing with teams and staff turn-over. It's difficult for the average developer to master 15 different ways to do concurrency/parallelism (and lots of other important topics also). It's somewhat comparable to standardization of equipment parts so that one can mix and match them without custom fiddling and filing (as in sand-paper) for each one. The industry wants PlugCompatibleInterchangeableEngineers whether a given developer personally wants that or not. My prior comments about Lisp apply here also. It's all a matter of weighing tradeoffs: learning time, stuff replacement time, the cost of app performance problems, etc. Many developers tend to obsess on some feature or aspect and want to go hog-wild with it (like Tables maybe? :-) Limiting the language reduces the damage done by wild individual tendencies. In CBA, being cheap to create and easy-to-maintain is usually given priority over software speed, except high-use applications or features, and these can usually be improved without HOF's. If you do the economic number crunching, your techniques don't hold water.

Every now and then a project comes along that may indeed need high-end concurrency/parallelism control and the owners may opt to hire a specialist on that subject. However, they know or should know that it's going to limit their ability to hire/rent somebody who can maintain it. Or, just buy top-end servers and use mediocre c/p programmers.

If all programmers were top-of-the-line: well-educated, experienced, had FastEyes, and photographic memories, then we'd all be using Lisp, the most flexible language known to man. (But they'd probably have the people-skills of an alligator such that nobody would want to approach them without a suit of armor and mace.)

On a contract I was once ordered: "Don't over-factor your functions, it confuses our programmers." Like I keep saying, programming is mostly about WetWare, NOT logic, NOT math, NOT physics, and NOT machines.

[You know, you're right. Please keep using the tools you're using, and please continue to cater to the lowest common denominator. That way, you, your company's developers, your clients and your users will be less likely to compete successfully against me, my company's developers, my clients, and my users. Meanwhile, I'll use tools that help me make better applications better than you, so that I can maintain my competitive advantage over you.]

A related SlashDot comment by somebody else: http://ask.slashdot.org/comments.pl?sid=3356465&cid=42549787

       // copy from slashdot link
       What the Business values:
       1. Correctness
       2. Reliability
       3. Maintainability
       4. Speed
       5. Coolness

       What the Developer values:
       1. Coolness
       2. Speed
       3. Correctness
       4. Maintainability
       5. Reliability

[Are you suggesting that improving the fluidity and concurrency of business applications -- which potentially has a direct effect on application usability, and therefore (often) profitability -- is being done at the expense of correctness, reliability and maintainability? Actually, libraries like Node.js make it easier for developers to meet certain requirements (like: make sure our users can keep editing whilst their reports run, but don't open more tabs because users lose track of them and we can't control how many reports are generated at once) without having to worry about the technical complexity of implementing concurrency, which allows developers to focus on correctness, reliability, and maintainability.]

You haven't demonstrated "correctness, reliability and maintainability". You've reinvented GOTO spaghetti, but as HOF's instead of noodles.

A strong claim, and one you haven't made in this discussion previously. How do you plan to back that accusation up? -DavidMcLean?

Next year. This year we focus on your claim that HOF's make common CBA's objectively better.

I've never made that claim, but it's so general that it's evidently true, and I'll make it now. Higher-order functions do make common custom business apps objectively better, in terms of code re-usability, maintainability, and readability. -DavidMcLean?

You cannot objectively measure "readability" here. It can be done in principle, but I'm sure you don't have the resources. http://geocities.com/tablizer/goals.htm#base has some suggested numeric metrics for maintainability that you can apply to a CBA code base example. (See also CodeChangeImpactAnalysis.)

Okay, I'll provide a (probably) objective metric: Code with fewer elements not pertaining directly to the specific problem being solved is more readable. Do you find this metric reasonable? -DavidMcLean?

Are you including or excluding "completely unrelated"? How is "directly" versus "not directly" measured?

If something is completely unrelated, by definition it doesn't directly pertain to the problem. As for the second point: If exactly the same code snippet would also be used to solve a different problem (or a few different problems), it's not specific to the problem. Reasonable? -DavidMcLean?

No, it's not reasonable because the more generic a thing becomes, the more it tends to resemble a programming language, and thus you can get a really high score by reinventing a TuringComplete programming language.

Uh… we're talking about programming languages. Already. That's what we're doing. If you mean that one might write an interpreter for a language, and then write problem solutions in that, then yes, that's a thing one can do. It's not a thing that gives you a high score, however, if the language is indeed a general-purpose TuringComplete language, because it still won't have absolutely problem-specific code. -DavidMcLean?

What exactly do you mean by "elements" in "code with fewer elements"?

Let's say tokens, to keep it concrete.

That can also be wiggled up or down based on language design. WaterbedTheory.

Okay, I'll try to clarify as to the point here. The kind of metric I'm looking for is one that measures the amount of boilerplate needed to perform a given task. For example, to sum the numbers from one to ten (as in HowToSumFromOneToTenInLispAndScheme?) looks kind of like this in most C-style languages:

 int count = 0;
 for(int i = 1; i <= 10; i++) {
   count += i;
 }

Twenty-four tokens. If we allow types to be dynamic or inferred, we can cut down the tokens by two. In Ruby, the same code looks like this:

count = (1..10).inject :+

Ten tokens, plus it's more apparent that the code is operating on the range 1-to-10. "inject" is not a very descriptive method name, I'll admit, but it's a Ruby idiom and would be recognisable for Ruby programmers. Now, we can express the same thing like this in Haskell:

count = sum [1..10]

Eight tokens now. More importantly, this is almost a direct description of the problem, except with some punctuation instead of words. "let count equal the sum of numbers from one to ten". sum isn't a higher-order function (although it was defined with one), so this isn't an example of code concision attained from higher-order functions. I think it expresses well the notion of language boilerplate, however. -DavidMcLean?

But we can have a similar DomainSpecificLanguage/API using a "plain" Algol/C-like language:

 count = sumList(rangeList(1,10));

Here, rangeList creates a list starting with the first parameter and ending with the end parameter. "sumList" adds up all the elements in the list.
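For the record, a sketch of what those two hypothetical library functions might look like in plain JavaScript. The names come from the example above; the bodies are my assumption:

```javascript
// Possible plain-JavaScript bodies for the two hypothetical library
// functions named above; the implementations are assumptions.
function rangeList(from, to) {
  var list = [];
  for (var i = from; i <= to; i++) list.push(i);
  return list;
}

function sumList(list) {
  var total = 0;
  for (var i = 0; i < list.length; i++) total += list[i];
  return total;
}

var count = sumList(rangeList(1, 10));
console.log(count); // 55
```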

I anticipate you are going to argue that one can use HOF's to inject an arbitrary operation on all elements of the list, similar to the "sorter" with an arbitrary comparing operation in the ArrayDeletionExample. However, this pattern doesn't happen that often at all in CBA (or at least nobody has found a way to convert biz apps to use such a pattern with an improved overall result). In HofPattern somebody seemed to more or less agree when they said "It also means you are, indeed, unlikely to need HOF".

Yes, you can write an API like that for a C-like language, which means you can get a similar complexity score for them too. Nothing wrong with doing it that way; I just chose examples that were only using standard-library functionality.

That gets into the issue of what to pack in the standard library and what not to. It's all about weighing tradeoffs.


Here's an example of a "business need" using the "two query" scenario(s) proposed above: Because of a recent corporate merger, the product description and price information currently comes from an Oracle RDBMS while the sales tallies from regions come from a Microsoft RDBMS (using product ID). Because they are different databases from different vendors we cannot join the two table sets on the database side, and probably must do it in the app. We want to speed the app up by allowing the app to query both the Oracle and Microsoft DB's in parallel in order to obtain RAM copies of the two table sets. It's a legitimate scenario. I rate it about "D+" on a commonality/frequency scale for reasons already given, but at least it's something we can explore from both the business needs side and the technical side. -t


Limiting Concurrent Access to a Scarce Resource

As mentioned above, using NodeJs or other single-threaded eventing environments it is trivial to wrap your database such that the server can only make a given number of concurrent queries. I recently had the opportunity (the need, really) to implement a similar setup; the scarce resource involved was not a database but RAM, due to running a RAM-expensive PDF-generation process. However, the solution, it turns out, can be applied to any scarce resource without much complexity. Here's how. (The version I actually used in production is designed to work on Q promises http://documentup.com/kriskowal/q/ rather than on (err, res) callbacks; I've made this one use callbacks since we're mostly talking about core Node on this page, but the details are pretty much the same between both versions.)

limitSection = (n, f) ->
  running = 0
  return inner = (args..., cb) ->
    if running >= n
      return process.nextTick -> inner args..., cb
    ++running
    f args..., ->
      --running
      cb arguments...

# example usage
# to run a maximum of five queries simultaneously
db.query = limitSection 5, (sql, cb) ->
  makeTheQueryWithTheSqlGiven sql, cb

The reference-counting (in the running variable) might be a little silly, but the limit decorator as a whole is short and very simple. Once implemented, it'll work on absolutely any async call you want. Thus, as stated above, it is indeed trivial to wrap the database (or any other asynchronous call) such that only a particular number run concurrently. -DavidMcLean?
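For readers less familiar with CoffeeScript, here's a rough plain-JavaScript rendering of the decorator above. The translation is informal, not production code:

```javascript
// Informal plain-JavaScript rendering of the CoffeeScript limitSection.
// Wraps async function f so at most n invocations run at once; calls
// over the limit are retried on a later tick.
function limitSection(n, f) {
  var running = 0;
  return function inner() {
    var args = Array.prototype.slice.call(arguments);
    var cb = args.pop();
    if (running >= n) {
      // over the limit: re-queue this call for the next tick
      return process.nextTick(function () {
        inner.apply(null, args.concat([cb]));
      });
    }
    ++running;
    f.apply(null, args.concat([function () {
      --running; // this slot is free again
      cb.apply(null, arguments);
    }]));
  };
}

// usage sketch: cap a pretend query function at five concurrent calls
var query = limitSection(5, function (sql, cb) { cb(null, "rows for " + sql); });
query("select 1", function (err, rows) { console.log(rows); });
```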


My attempt at a summary of this topic, from my perspective: I have not seen any frequent and realistic scenario where NodeJs would significantly help typical apps that I see. There is no major "blocking problem" to be solved on the app side (and minor ones can be solved without HOF's). The bottleneck is usually with the DB, not the app code. Perhaps NodeJs can make slight improvements to the app-side speed/responsiveness, but at the expense of code readability to average developers, similar to what is described in the heated topic GreatLispWar. Any big improvement should be easy to demonstrate with a code sample and scenario, which didn't happen here, except in reality-deprived lab toys. Slight improvements could be harder to demonstrate because they are only noticeable in aggregate or in subtle areas. Maybe there is something at the "slight" end of benefits, but again, probably at the expense of code readability. Either way, it's difficult to present the "slight" case here. -t

Rant from another "no-getter":

http://www.michaeljaycantrell.com/blog/2012/5/8/nodejs-i-dont-get-it.html

Which links to a cartoon even:

  http://www.you tube.com/watch?v=bzkRVzciAZg    
  // Remove space from "you tube" to watch. (Auto-play may happen without)
  // I especially like the analogy to assembly programming.

The comments at the bottom and the link to the StackOverflow thread are interesting. Obviously, NodeJs isn't for everything, and I don't think anyone here claims that it is. You write, "I have not seen any frequent and realistic scenario where NodeJS would significantly help typical apps that I see." I agree; I'm sure you don't, and with good reason: Conventional data-in/report-out applications and their serial UI model (i.e., enter data, choose report, wait for report to generate, rinse and repeat ad infinitum) won't benefit.

Here's a business-oriented case: Imagine that all your reports need to be displayed concurrently, in a scrollable executive dashboard of live graphical thumbnails that show live KPI summaries based on up-to-the-second OLTP data, with UI controls to filter and select data in real-time, and the thumbnails can expand into pop-up dynamic detailed 3D graphs as you pass your finger (touch-screen, of course) over them whilst the other thumbnails keep updating in the background. For that kind of dynamism, NodeJs and other event-driven, non-blocking toolkits make development relatively easy without the concurrency-management complexity of classic multi-threading. If you don't create applications like that, then you don't need NodeJs and its kin. If you do create applications like that -- and such "big data" visualisation executive toys are becoming increasingly popular, so some of us do create applications like that -- then NodeJs and the like are helpful tools to know.

There are multiple ways to skin the cat and the devil is often in the details (specifics of the scenario). That being said, I won't challenge such usage of NodeJS for now. However, I would point out that the decision probably heavily depends on the nature of the GUI system(s) involved and perhaps the network technology used to gather the info.

Indeed, there are many ways to skin the cat, and the specifics of the scenario may well define the specifics of the solution. When the specifics of the scenario require a client/server approach with a responsive, multi-tasking user interface, then NodeJs is a tool -- one of a number of such tools -- that may make it easier to build.

The best solution would probably be something that allows multiple (optionally) independent applications to run "in" their own sub-window with their own process regardless of the language of the app and/or of the "monitoring" sub-window, such that JS and HOF's are not really an issue, other than perhaps a nice side option to have. (And I would hope it didn't have the memory leak like your BradyBunchGridDiscussion example did.)

That might be the best solution to one problem, or even a category of problems. I've no doubt someone will eventually create such a tool and put it on GitHub or SourceForge. Indeed, someone may have done so already. It might use NodeJs and/or HOFs internally, so that you don't have to see them. However, frameworks like NodeJs and language constructs like HOFs are general-purpose and intended to solve the broadest possible range of relevant programming problems. If you never encounter such problems, you may have no use for them. If you do encounter such problems, and a tool exists that solves your problems (by, perhaps, using NodeJs's evented I/O or HOFs internally) then you can use it and never see them. However, if you do encounter such problems, and no tool exists that solves your problem, then evented I/O (e.g., NodeJs) and HOFs may help.

By the way, the BradyBunchGridDiscussion example (http://shark.armchair.mb.ca/~dave/hofajax/) didn't have a memory leak. It ran flawlessly on IE, Android Web browser, Safari and Chrome. It was FireFox (version 28?) that had a memory leak, something that is quite well known. Google "firefox memory leak" and see (for example) http://support.mozilla.org/en-US/questions/998289


JanuaryThirteen


CategoryJavaScript, CategoryConcurrency, CategoryFunctionalProgramming, CategoryDiscussion


EditText of this page (last edited August 25, 2014) or FindPage with title or text search