Validating Prepared Statement Definitions

On PreparedStatementsDiscussionTwo, it is alleged that PreparedStatement definitions consisting solely of static string definitions -- which optionally make reference to other static strings -- are easier to validate than dynamic SQL definitions that must (in order to be dynamic) make reference to external data and optionally make reference to other dynamic definitions.

Here, we examine that claim.

A typical set of PreparedStatement definitions, in a language like PHP, will look like the following:

 <?php
   // Example-1A
   $prepClauseStatus = "s = 'available'";

$prepStmtDef01 = "SELECT * FROM piddle WHERE p = ?"; $prepStmtDef02 = "INSERT INTO fizzle(x, y, z) VALUES(?, ?, ?)"; $prepStmtDef03 = "SELECT * FROM piddle WHERE " . $prepClauseStatus . " AND p = ?"; $prepStmtDef04 = "SELECT * FROM furgle WHERE " . $prepClauseStatus . " AND q = ?"; ?>
Note that the code is highly constrained, consisting strictly of assigning variables. Variables are only assigned string literals, or the concatenation of string literals and variables. We can express that as a grammar. Assume the existence of token definitions <VARNAME> which is a variable name (e.g., $prepClauseStatus), and <STRING> which is a sequence of any characters delimited by '"'. Other tokens are identified by their literal text.

The grammar, in modified EBNF, is:

 PHPProgram ::= "<?php" ( Assignment )* "?>"
 Assignment ::= <VARNAME> "=" StringDef ";"
 StringDef ::= ( <STRING> | <VARNAME> ) [ "." StringDef ]
If we wanted to, we could build a parser for that grammar to automate validation of PreparedStatement definitions. Additional functionality would be to verify that <STRING>s do not contain embedded variable references, and ensure that <VARNAME>s in the StringDef rule only refer to variables defined by the Assignment rule in the same PHPProgram.

We can take the same approach with dynamic SQL definitions. Typical dynamic SQL string definitions might look like the following:

 <?php
   // Example-1B
   $sqlClauseStatus = "s = 'available'";

function sqlDef01($parm1) { return "SELECT * FROM piddle WHERE p = " . sqlStringFilter($parm1); }

function sqlDef02($parm1, $parm2, $parm3) { return "INSERT INTO fizzle(x, y, z) VALUES(" . sqlStringFilter($parm1) . "," . sqlStringFilter($parm2) . "," . sqlStringFilter($parm3) . ")"; }

function sqlDef03($parm1) { return "SELECT * FROM piddle WHERE " . $sqlClauseStatus . " AND p = " . sqlStringFilter($parm1); }

function sqlDef04($parm1) { return "SELECT * FROM furgle WHERE " . $sqlClauseStatus . " AND q = " . sqlStringFilter($parm1); } ?>
Note that we can't define them as static strings like we did with the PreparedStatement definitions, because we need to incorporate dynamic external data. So, we define them as functions, in order to achieve the same flexibility to pass arbitrary parameters that we would have using PreparedStatements.

The grammar might look something like this, assuming a pre-existing token <FNNAME> to represent a function name in addition to the same <VARNAME> and <STRING> tokens used above:

 PHPProgram ::= "<?php" ( FnDef | Assignment )* "?>"
 Assignment ::= <VARNAME> "=" LiteralStringDef ";"
 LiteralStringDef ::= ( <STRING> | <VARNAME> ) [ "." LiteralStringDef ]
 FnDef ::= "function" <FNNAME> "(" ParmDefs ")" "{" "return" StringDef ";" "}"
 ParmDefs ::= <VARNAME> [ "," ParmDefs ]
 StringDef ::= ( <STRING> | <VARNAME> | WrappedExternal ) [ "." StringDef ]
 WrappedExternal ::= <FNNAME> "(" <VARNAME> ")"
If we automated it, additional functionality would be to verify that <STRING>s do not contain embedded variable references and ensure that <VARNAME>s in the StringDef rule only refer to variables defined by the Assignment rule in the same PHPProgram.

Obviously, the latter grammar is more complex. The first three rules in both grammars are very similar, but the former is only three rules. The latter requires seven rules. This makes us ask what is being gained -- at least in this case -- by using dynamic SQL instead of PreparedStatements.


My summary of the above examples: The author of this example cherry-picked both the specific filter-related API style and the level of abstraction shown here such that the example is misleading. It does not consider all the relevant levels and all the possible API styles. The details of this complaint and counter arguments can be found below. --TopMind

What is this? A signed evidence-less editorial by TopMind, inserted here loooong after TopMind has already been anonymously debating below?

Top, for someone who so frequently decries ArgumentFromAuthority (whether it actually is or not, and it usually isn't), this appears to be nothing more than an attempt to use what you believe to be the strength of your own reputation to back up your own previous arguments. In other words, you appear to be trying to use yourself in your own ArgumentFromAuthority!

The examples at the top of this page are merely one semi-formal proof of an obvious observation: On the client-side of a DBMS, PreparedStatements are simpler to audit than DynamicSql because DynamicSql construction requires filtered external data but PreparedStatement definitions do not. This is true whether PreparedStatement definitions or DynamicSql are being constructed inside some middle-ware, or exposed to the application developer.

Your "relevant levels and all the possible API styles" are irrelevant, because that simple observation -- and its semi-formal proof above -- holds true regardless what level we examine, or what API style is used.

It's not ArgumentFromAuthority because the details of the reasoning are given below, as stated. It's a summary of my criticism, and summaries are typically not the place to give detailed evidence and logic. I believe it's valid and fair to put such a summary in that spot. If a majority of 4 or more WikiZens side with you, I'll remove or move it. As far as the rest of your statements, they repeat already-given claims and are replied-to below. I won't repeat those replies up here in the spirit of OnceAndOnlyOnce. -t

Re: "...simple observation" -- Nothing you do is simple. You can complicate a period.


First of all, I don't consider PHP's prepared statements (PS) "typical". Microsoft, for one, usually expects a type indicator be given in their PS API's. Second, you used unnecessarily large function names for the wrappers/filters, making it look more verbose than it needs to. But these are mostly separate from the grammar processor issue.

You are using a variation of ArgumentByLabToy. The main problem is that functions and other language elements will probably need to be used anyhow for OTHER purposes (general factoring and code organization) such that a parser will have to be able to process them. If a parser can only process certain statements but not most or all, then the statements/syntax it cannot parse will often "derail" it, making it useless or unreliable. Typical programming language parsers don't have common sense and get thrown off track easily if not nearly perfect. Thus, even if we are only interested in 10% of the code for a particular usage, the parser still has to recognize 100% of the code (to be reliable) to properly recognize that 10%. A Guess-A-Tron can be built for either case if we accept "fuzzy parsing".

Also, it's not likely that all SQL clauses would be defined in one place for non-trivial apps. That's anti group-domain-related-things-together. You are grouping on processing technology (database), not on domain issues. Your parser has still gotta process functions, modules, and variable & naming scopes, dude.

--top

PHP's PreparedStatements are typical for PHP PDO, and as I was illustrating PHP, it seemed appropriate to use what's typical for PHP. As for the "unnecessarily large function names", the size of function names is irrelevant. What's intended to be compared is not the size of the PHP code, but the size of the grammar definitions. They are a better indication of code complexity (in this case) than code examples.

As for the above being "a variation of ArgumentByLabToy", my examples are structurally the same -- differing only in object names and query contents -- as real production code I've worked on or written. Whilst my grammar/parser examples could easily be automated, typically the only "automation" is getting a developer to audit it who hasn't written the code. In that case, the simpler grammar represents easier -- and therefore more reliable -- auditing.

You're right -- not all SQL clauses would be defined in one place for non-trivial apps. There may be multiple definitions like those shown above, associated with their relevant classes, modules, components, or whatever. In PHP they can (like with most other languages) be confined to their own source files so I don't have to process functions, modules, and variable and naming scopes, "dude".

Re: "as real production code I've worked on or written." -- Well, it does not fit typical structures I encounter and make. The number and complexity of mouse, keyboard, and eyeball movements would be roughly comparable between the two general techniques when in a fuller code-base environment. You seem to switch back and forth between a WetWare-based argument and a machine-based argument (parsing machine) as it suits a specific narrow argument. You seem to be cherry-picking your reference specimens per sub-argument to force your argument into a certain (misleading) direction. It's a debate technique you've used before.

[Can you give an example of the typical structures you encounter and make?]

That could take a while to code up in a way that makes sense to readers. I don't feel like doing such today. PS's don't interest me enough to motivate me. I'd rather spend pseudo-coding efforts on DynamicRelational or MultiParadigmDatabase, which are far more interesting to me than PS's. If you want a PS's example, you'll probably have to wait. -t

[Top, I'll take that as a "no" until I've seen otherwise, just like your "wrappers done right".]

The default answer is NOT your position, but rather "unknown" until either side presents better evidence.

[The fact that you haven't provided them is the evidence.]

Exactly. I have provided evidence at the top of the page. That you, Top, would like to see "better" evidence is irrelevant; some evidence, any evidence, trumps none at all. We do not default to "unknown" merely because you don't agree with my evidence and lack any of your own.

When your parser was proven to be useless, you waffled back to WetWare, but didn't fill in any details on measuring WetWare. That's how I interpret your round-about "presentation".

And if we are talking WetWare, like I've said before, long positional parameter "lists" can get awkward. Positional parameters are often okay in small quantities, but a maintenance headache for larger lists.

How did you prove my parser "to be useless"? It's very useful. I've built such parsers and they're very useful.

What does positional parameters have to do with this?

PageAnchor same-01

Well, actually the wrappers could use positional parameters also. It's just an API design issue: a personal syntax preference. I personally see no code maintenance benefit in such, but if YOU like such, then fine, but don't have a PersonalChoiceElevatedToMoralImperative.

[Sure it could, but what does it have to do with anything being discussed on this page?]

You are really arguing about API design, NOT prepared statements, such that this grammar pissing contest resolves almost nothing directly related to PS's.

The so-called "grammar pissing contest" demonstrates that code defining PreparedStatements is simpler than code defining dynamic SQL, and so is easier to audit for potential security issues.

The syntax could be made nearly identical to the PS calls with some API rearrangement. Thus, a syntax-only analysis would produce nearly identical scores.

Indeed, an API could be designed to identically support dynamic SQL queries or PreparedStatements. Then the only significant difference between the two (performance issues aside) is that dynamic SQL has to rely on filters to prevent SQL injection. Thus, the sole risk mitigation against SQL injection is the quality of the filters. With PreparedStatements, there is no such risky dependency. It is for that reason that OWASP recommends PreparedStatements even over its own ESAPI filters.

You are wandering off the topic of syntax, grammars, and code style. We've been over the risk issue gajillion times in other topics and I see no reason to re-re-re-re-re-debate here. YOU may like repetition, I don't. My "same syntax" point stands. QED. Good Night.

Yes, I am wandering off the topic of syntax, grammars and code style, because they're irrelevant if we define -- as you suggested -- a common API that can work with either dynamic SQL or PreparedStatements. If we have a common API that can work with either dynamic SQL or PreparedStatements, why would we ever choose to use it with dynamic SQL?

I don't understand the question. Please clarify or restate. Perhaps I am not clear on what you mean by "dynamic SQL". I've talked about the need in non-trivial systems to have "libraries of SQL snippets" to mix and match SQL portions for different needs. For example, we may have snippets for product categories, and another set of snippets for locations/regions, and yet another for "report types/styles". These different factors will interweave because they are mostly orthogonal. I'm not sure I'd call that "dynamic" or not because implementation may vary per tool stack. How about "compos-able" as a working term? The granularity of a full SQL statement is often too large for typical modularization/re-use needs in medium and large organizations or complex apps. -t

PageAnchor Dynamic-SQL-Def-01

By "dynamic SQL", I mean constructing whole SQL queries by string concatenation, in which a SQL query is created as a text string composed of static text (e.g., "SELECT * FROM customers") and optional static string constants (e.g., "SELECT * FROM customers WHERE " + $southRegionWhereClause) and optional external data filtered to hopefully prevent SQL injection (e.g., "SELECT * FROM customers WHERE " + $southRegionWhereClause + " AND custID = " + sqlStringFilter($_POST['userInputID']) and the whole string is passed to the DBMS connection's "execute query" function.

Of course, the more secure alternative is to use PreparedStatements. They still allow SQL queries to be constructed by string concatenation. A SQL query can be created as a text string composed of static text (e.g., "SELECT * FROM customers") and optional static string constants (e.g., "SELECT * FROM customers WHERE " + $southRegionWhereClause), but no external data should be included. Where we want external data, we define a parameter -- e.g., "SELECT * FROM customers WHERE " + $southRegionWhereClause + " AND custID = ?". We pass the string to the DBMS connection's "create prepared statement" function and get back a handle to a prepared statement. When we want to execute the query, we invoke it by its handle, and pass an argument to the parameter specified by the '?'. E.g., prepStmt.Execute($_POST['userInputID']).

Regardless which approach we use, we can define an API that abstracts the underlying approach. In use, it might look something like this:

 $qry = "SELECT * FROM customers WHERE " + $southRegionWhereClause + " AND custID = ?";
 $handle = $dbConnection.CreateQuery($qry);
 $handle.setParameterType(1, ParmTypes.String);
 $handle.Execute($_POST['userInputID']); 
The above could work with either dynamic SQL or PreparedStatements. If we're using dynamic SQL, setParameterType() determines what filter function gets used. If we're using PreparedStatements, setParameterType() identifies the parameter type.

So, if we have the same API for either dynamic SQL or PreparedStatements -- and we can see it's reasonable that the same API be used for both -- why would we ever use it with dynamic SQL?

I suppose you are going to argue that if the API's can be made pretty much the same, then the only decision difference maker is which one is more secure such that security becomes the pivotal factor (and SP's are the alleged security winner). That may be the case, but I not going to repeat the "security fight" here.

I haven't tested the API swappability between the underlying techniques (PS's versus filtering) in practice and so will not comment on that further. In theory we could have:

In theory (with what's known), it does not appear that syntax or grammar in terms of application programmer (database API programmers aside) or automated verification favor one or the other.

If a person or automated code analysis system does not detect that "outside values" are slipping in, then SQL injection can happen BOTH for PS's and filters. Thus your statement that filters are more vulnerable because one may "forget to wrap" or that it's easier for automated code analysis tools to verify is false. PS's do not prevent "slipping in" of outside values. One must inspect the app code to ensure such. Code inspectors won't necessarily face different code under each (PS's or filters) such that detection of one or the other is not easier or harder under each, as you originally claimed. You lost the "complexity/grammar" debate, bub. -t

We've already been over this one several times, "bub". The whole point of this page is that it's objectively simpler (based on grammatical complexity) to audit PreparedStatement definitions than dynamic SQL for inadvertent inclusion of external data. Also, what do you think is easier to do by accident: Include an external variable in a PreparedStatement definition, keeping in mind that accessing external variables normally requires special syntax -- like $_POST['varname'] in PHP -- specifically to prevent accidental inclusion? Or forget to wrap an external variable with a wrapper function?

Arrrrrg. How the hell is it "objectively simpler based on grammatical complexity" if the code can be nearly identical? I have no idea what your $_POST['varname'] example is about. It would "leak in" or not leak in the same way if the code is the same. Arrrrg. For example, using PS's does NOT prevent the following:

 // Example post-01
 $foo = $_POST['varname'];
 ...
 $sql = "SELECT * FROM users WHERE " . $foo . " AND USERNAME = ? AND PASSWORD = ?";
 $stmt = $dbc->prepare($sql);
 ...
I thought you were referring to the original examples. If we're talking about the API, then the process shown at the top of the page for auditing static strings is the same for any API. If you're following any conventional discipline for avoiding external tainting in Web apps, then it's much harder to inadvertently include an external variable where you don't want one, than to inadvertently forget to put a wrapper around an external variable were you do want one. Your example suggests freely intermixing SQL code and application code. Aside from the poor SeparationOfConcerns that represents, it's a potential security nightmare.

Why did you think I was "referring to the original examples"? I stated "Code inspectors won't necessarily face different code under each...". What did you think I meant by that? And I don't know what you mean by "conventional discipline". If you want to claim that different API design styles have different/greater risk (programmer mistyping) profiles, that's orthogonal to the PS versus filtering issue and probably should be split into a different topic and is one I may not choose to participate in.

I interpreted your "different code under each (PS's or filters)" as referring to different code. If it was the same API for both PreparedStatements and filters, why would there be "different code"?

An aspect of conventional discipline for developing secure Web applications is to isolate SQL definitions into their own modules, so as to reduce the likelihood of inadvertently tainting the definitions with external data. The examples at the top of the page are examples of aspects of that approach.

But the key point still stands: the usage of PS's versus filters does NOT by itself make for easier-to-inspect code, per accidental outside data leak-ins. API style/syntax design is technically irrelevant. You lost the "inspection" debate pure and simple. -t

As I pointed out above, if we have the same API for PreparedStatements as filtered dynamic SQL, then conditions are the same for both. Except, of course, PreparedStatements are more secure. Why, using such an API, would we choose dynamic SQL?

If we don't have the same API for PreparedStatements as filtered dynamic SQL, then it's easier to identify whether or not PreparedStatement definitions have been tainted by external data, because PreparedStatement definitions should only reference static strings and never reference external data. The corresponding dynamic SQL definitions are more difficult to audit because they necessarily must mix filtered external data with static strings, and also must ensure that all external data is filtered.

Please elaborate. How does it verify should-ness exactly?

See the top of this page. In particular, "If we wanted to, we could build a parser for that grammar to automate validation of PreparedStatement definitions. Additional functionality would be to verify that <STRING>s do not contain embedded variable references, and ensure that <VARNAME>s in the StringDef rule only refer to variables defined by the Assignment rule in the same PHPProgram."

The top of the page is comparing the apple API design to the orange API design.

How so?

What? They are obviously two different query API's (or at least "usage styles") to me, but the style choice is mostly orthogonal to PS usage, as I thought we already agreed. (Further, per above, it's questionable to wrap every query in a function, but I'll leave personal preferences to each person.)

Of course they're different APIs, but they accomplish the same thing: defining a query that incorporates external values for subsequent execution.

Sorry, I'm not understanding your point. I didn't dispute that they produce the same results. There are multiple API designs that can accomplish the same thing. That's not news.

You said they're "comparing the apple API design to the orange API design". What they're actually comparing is the fact that defining PreparedStatements as shown above -- vs defining dynamic SQL as shown above -- requires objectively less effort to audit.

As for wrapping every query in a function, how would you define a set of parametric queries outside of the main program logic?

I'm not sure what you mean. That appears to be an artificial requirement. Typically I'll only factor them to a function if they are reused at least 3 times (the same SQL called from 3 diff spots), or 2 if it's a large volume of SQL. No use putting everything in a function for the sake of putting everything in a function.

Every query that incorporates external data is a parametric query. If you define SQL queries in-line in your application code, you run the risk of tainting your SQL queries with external data. If you define SQL queries in their own modules, then you run much less risk of inadvertently tainting your SQL queries with external data. Let's assume you define SQL queries in their own modules to reduce the risk of inadvertently tainting your SQL queries with external data. How, then, do you define them so that they incorporate external data when you need them to?

Parametric versus in-lining is a code style/design issue orthogonal to PS's. Parameterizing can be done via functions and via "insertion point markers" such as question marks. I don't want to get into a code design battle here unless it clearly relates to PS's. But I would point out that positional (parameterized) can be difficult to work with when you have lots of fields/columns in a query. There can be a maintenance/reading costs to such. Whether that overrides other issues I'll leave to a shop manager/owner decision. As a coder it's not my job to make such tradeoff decisions, only describe and recommend. If you don't like their final decision, go bicker with them, not me.

Every detectable "mistake path" you've identified so far is tied to the API design/usage-style and not to the usage or non-usage of PS's. Complaints that a given API design/usage-style invites more app-coding mistakes compared to another is not related to PS usage versus non-PS-usage.

Indeed, we've already discussed this both above and below. A unifying API can be created to make definition and invocation of queries -- whether they're PreparedStatements or dynamic SQL with filters -- identical. In that case, the only distinguishing factor is security risk.

So what's all this crap about "grammar complexity" then?

It holds true if you don't use a unifying API. It's also holds true for the author of the unifying API.

Bad things happen if you use a bad API or mix same-purpose API's for no reason? Gee, that's news. Who would've guessed?

But that's not the point. The point is that there's never a good reason to prefer dynamic SQL with filters over PreparedStatements.

That's not the point of THIS topic as I understand it.

This page was spawned from PreparedStatementsDiscussionTwo, itself spawned from TopOnPreparedStatements. All contribute evidence to the thesis that there's never a good reason to prefer dynamic SQL with filters over PreparedStatements.

"Easier to validate" was the claim that spawned this. That claim has been flattened, and the "safer" issue is covered in other topics.

What do you mean by "that claim has been flattened"? It is easier to validate PreparedStatement definitions and they are safer. Of course you can wrap PreparedStatement and dynamic SQL definitions in an API that hides either or both, but that's always possible. For example, if you're comparing 'for' loops to 'while' loops, you can always hide the code you're considering inside a function called 'foo', but then you're no longer comparing 'for' loops to 'while' loops.

However, for the reasons described above, PreparedStatement definitions are grammatically simpler to validate -- even if the only place we need to validate them, in comparison to dynamic SQL definitions, is inside the aforementioned API.

Repeating of summary claims doesn't cancel the details of your failure.

Can you identify and defend the "details of [my] failure"? It appears you have no defence for choosing dynamic SQL with filters over PreparedStatements. Dynamic SQL with filters is grammatically more complex to define and audit (as shown on this page) and at greater risk of security issues. If we eliminate grammar complexity as a concern by defining a common API for dynamic SQL and PreparedStatements, dynamic SQL is still riskier than PreparedStatements because it's forced to rely on filters. Where is "[my] failure", here?

Bull! You did not show they "were grammatically more complex". API design is a personal preference and orthogonal to PS usage. Were are going around in circles again. I'm hopping off the Repetition Express.

The top of this page shows that raw dynamic SQL definitions are grammatically more complex than PreparedStatement definitions, using typical (PHP, in this case) DBMS APIs. If you disagree, please demonstrate otherwise.

I already told you twice. Take some medication for your Alzheimers, it's getting really fucking annoying to have to repeat the same shit over and over and over.

You may have "told" me, but you haven't demonstrated it. I don't see a counter-demonstration of grammar complexity that demonstrates how PreparedStatements are grammatically more complex or dynamic SQL with filters is simpler. Of course, it would be moot anyway -- even if dynamic SQL with filters were obviously simpler, who would so negligent as to risk client security for programming simplicity?

The opening examples could be used for either PS's, filters, both, or neither. You "grammatically measured" something orthogonal to the PS/filter decision.

How so? The first example is of PreparedStatement definitions, and the second is of dynamic SQL. How could they "be used for either PS's, filters, both, or neither"?

Perhaps I measured nothing, but YOU also measured nothing (on topic/relevant). I've said this already multiple times repeatedly redundantly, but it doesn't sink in for those with stubborn dig-in-heals personalities. (And there are other issues with actually applying your grammars {cough} to real tools and WetWare, but those are secondary.)

Why do you think the examples are not "on topic/relevant"? As for applying my "grammars {cough}" to real tools, I've used exactly the approach described at the top of the page to build auditing systems for real code, so I'm not sure what your "other issues" might be. Could you explain?

They are NOT relevant because the top examples are a syntax/API choice that is orthogonal (independent) of the choice between using PS's or filters. You are arguing code design choice, not PS's versus filters. Either can use either style (or other styles, since your choices suck, but that's another story).

What the examples illustrate is a fundamental difference between PreparedStatement definitions and dynamic SQL definitions: The former should never contain references to external data. The latter must contain reference to external data. This is true regardless of syntax/API choice, up to the point that we define an API that unifies PreparedStatement and dynamic SQL definitions. Of course, in that case, the definitions should never contain references to external data.

                             <?php
                               // Example-1C
                               $snoffClauseStatus = "s = 'available'";

$snoffStmtDef01 = "SELECT * FROM piddle WHERE p = ?"; $snoffStmtDef02 = "INSERT INTO fizzle(x, y, z) VALUES(?, ?, ?)"; $snoffStmtDef03 = "SELECT * FROM piddle WHERE " . $snoffClauseStatus . " AND p = ?"; $snoffStmtDef04 = "SELECT * FROM furgle WHERE " . $snoffClauseStatus . " AND q = ?"; ?>
. Groundhog day on the automated tools. Such automated tools would have to recognize functions IN GENERAL because code may be in functions for multiple purposes, which may have nothing to do with databases, correct? Typical CRUD code I see has roughly a third of the code in the MAIN function or equivalent. That means a parser will have to recognize functions and parameters 2/3 of the time. Remember the "derail" argument above? Did you forget about the derail argument? Why are you ignoring it now?

Your "derail argument" was invalidated. Secure applications do not pepper the application code with SQL. Database abstraction layers isolate the application layer from the DBMS, and SQL definitions are confined to modules in the database abstraction layer in order to reduce the likelihood of tainting them with external data. As noted above, "In PHP [SQL definitions] can (like with most other languages) be confined to their own source files so I don't have to process functions, modules, and variable and naming scopes". In other words, automated tools do not have to recognise functions "IN GENERAL". As I wrote at the top of the page, "the code is highly constrained" -- and deliberately so.

That's a code design choice orthogonal to PS's. I'm not going to argue general app code design here. Note that most shops I've been around use Stored Procedures if they truly want such strong separation, not Prepared Statements. If you TRULY want to force-separate the SQL from the app, Stored Procedures make it easier because the SQL then does not come from app code and is not inadvertently mixed in. One can even set database permissions to prevent the app from running non-stored-procedures, THEN you don't need to write code parsers to check for rogue SQL. Typically you'll see something like this in the "DBA panels":

 Allow this account to:
  [X] Run stored procedures
  [_] Run SELECT statements
  [_] Run UPDATE statements
  [_] Run DELETE statements
  [_] Etc...
In summary, if you want solid SQL-from-app separation, then using stored procedures makes it so that you don't need to write parsing-based code validators. The DBMS policy settings prevent rogue SQL alone (assuming they don't use an unknown database account). You are using the wrong tool for the job, based on the given "protection" requirements. -t

Stored procedures are certainly reasonable on the DBMS side, but they have no impact on what SQL appears on the client side, or where it appears on the client side. Whether you're invoking PreparedStatements, StoredProcedures, or dynamic SQL, secure applications do not pepper the application code with SQL.

Regardless what server-side constructs you use, separation between the DBMS interface (e.g., PDO, JDBC, ODBC, whatever) and the application code is still necessary on the client side. If you've seen shops using StoredProcedures to improve security and enhance separation and security on the server-side, that's not surprising -- sometimes StoredProcedures are entirely appropriate, especially if there's more going on than just invoking a parametric query. However, if we're only invoking a parametric query, PreparedStatements are the latest construct for ensuring security of parametric (and improving performance of repeated execution of most static) queries -- without the overhead of StoredProcedures -- but it sometimes takes a while (often decades) for conservative shops to catch up.


I'm still puzzled on how Php's PS's tell the type of the parameter. I suspect it either uses the SQL's syntax and schema info, or the variable's type indicator (seen through getType()). PDO does have an explicit type version that's closer to Microsoft's:

http://www.php.net/manual/en/pdostatement.bindvalue.php

Relying on a variable's type indicator does not seem like a very good idea to me.

Why not? Type mismatches in PreparedStatement parameter arguments aren't a security risk, they only -- at worst -- result in the query failing to execute or returning incorrect results.

Who said security risk is the only factor to judge them on?

If we can create an API that is equivalent for both dynamic SQL and PreparedStatements -- as discussed above -- then the only factor left to judge them on is security risk.

Okay, but that's already covered in prior topics. Let's both resist the urge to repeat prior battles here. I guess I am complaining about your choice of API or API usage. The type tag (indicator) is a crappy way to manage types, and I'd like to see them deprecated or fall out of style. Perhaps my anti-tag rant here is partly off topic. I say "partly" because I don't want to compare bad ideas to other bad ideas if syntax/grammar/language-design issues affect our metrics.

I can appreciate that, and some PreparedStatement APIs do not require explicit type specifications. However, if you're creating dynamic SQL, you must always specify filters for external data to (hopefully) prevent SQL injection. What's the difference between specifying a dynamic SQL filter like sqlStringFilter() or sqlNumberFilter() or sqlDateFilter(), and specifying a PreparedStatement's parameter type?

I don't understand why you are asking that question. The app code can be nearly identical, per above, depending on API design. The syntax/design choice is orthogonal to PS-ness.

Indeed, that's my point. Whether you're specifying "Number" as a filter for dynamic SQL or a parameter type for a PreparedStatement, it's the same.

I don't know what your point is.

You wrote, "The type tag (indicator) is a crappy way to manage types, and I'd like to see them deprecated or fall out of style." It suggests that one of your objections to PreparedStatements is their frequent use of ManifestTyping, i.e., explicitly specifying parameter types like "Number", "String", "Date", etc. However, if you're using dynamic SQL with filters, you also must explicitly specify parameter "types" in the form of filters that ensure external data is actually a Number, String, or Date and not an attempt at SQL injection. So, with PreparedStatements you often (but not always) have to explicitly specify parameter types. With dynamic SQL, you always have to explicitly specify parameter (i.e., external data) "types" (i.e., filters).

It's perfectly possible for filter wrappers to get type info from Php's type tags also, like the PS libraries appear to do (as one possibility), but I consider it bad practice, at least if SQL is the target query language.

It's not really possible, because PHP's built-in types aren't fine-grained enough to reliably select appropriate filters. You'll inevitably need a "Date" filter, but PHP doesn't have a "Date" type.

How do the PDO API's do it then? There's something odd going on. There are some databases that will process numeric literals with quotes around them, so maybe that's what's happening, but it's not universal across DB vendors.

Noting that a PDO prepared statement is a wrapper around a SQL PreparedStatement, PDO prepared statements work like this: When a PDO prepared statement is defined without explicit parameter types, it defers creating the SQL PreparedStatement until the first attempt is made to execute it. Upon execution, the PDO prepared statement's argument types are used to define the underlying SQL PreparedStatement and its parameter types, and then the SQL PreparedStatement is executed. This means the developer need not explicitly specify parameter types, but it has the downside that SQL errors -- if any -- are not reported until the first attempt to execute the PDO prepared statement.


The vast majority of the time, form-related or typical input-related validation is required anyhow such that the filter calls are often integrated with the validation API/framework. For example, we'll know that a given variable is to be validated as a "date" and thus the filter call is factored to a date validation handler instead of repeated for each date. Validating numbers, dates, SSN's, ID's, etc. typically will be checking for the same characters/patterns that anti-injection filters will be checking for.

That is true, but how is it relevant?

One does not need an explicit filter call for each column. It may matter depending on how "complexity" or "app code size" is measured. Thus, in practice, the filter code will usually not look like the opening examples but rather be integrated with validation. Automated code validation would then want to make sure that every "input field" goes through the standard/sanctioned validation API as apposed to "rogue" SQL creation. The same would apply even if using PS's because mere usage of PS's does not by itself stop one from directly concatenating "values" to SQL.

Sorry, I'm not following you. Could you provide an example?


Under both kinds of approaches, the code checker seems to pretty much have the same job either way: make sure the API calls that register or run SQL statements are not fed variables that get outside or suspicious values. Generally there will be one or more "statement oriented" methods and one or more "column oriented" methods (or parameters) in the query API used by the app code. The code checker has to make sure that the statement-oriented methods (or parameters) are not fed variables/values that come from outside (or from suspicious sources). It's okay if the column-oriented methods get outside info. Now, whether there are functions in the app code that the checker checks depends on the API design and app coder style, NOT on whether PS's or filters are used under the hood.

What the grammar examples at the top of the page demonstrate is that a code checker does not "have the same job either way". Checking dynamic SQL requires verifying that a SQL string is constructed from static strings and filtered external data. Checking PreparedStatement definitions is fundamentally simpler, because it need only verify that PreparedStatement definitions are constructed solely from static strings.

I don't know what you mean by "'column-oriented' methods in the query API", "statement-oriented methods", or "PS's or filters". Could you give an example or define your terms?

No. Not today.

Do you expect your claims or terminology to be taken seriously if you're not willing to defend or define them?

I believe what I have already written is sufficiently explained that most IT-experienced humans will know what I am talking about. They will know that you are not measuring what you claim to be measuring. You just have personal mind-blocks due to bias and stubbornness, and I don't feel like babysitting willfully stubborn mental patients today. In short, LetTheReaderDecide.

What do you think AdhominemAttacks do for your credibility?

I honestly believe you are nuts. However, I'm also a nut for arguing with a known nut. Maybe we can get a group discount on therapy. --top

That's just more AdhominemAttacks. If you can't defend your position rationally or logically, have you thought about reconsidering your position?

They are rational and logical. You are just stupid and read it wrong due to bias or insufficient real-world experience to understand how real shops operate.

How is "I don't feel like babysitting willfully stubborn mental patients today" a rational and logical argument in favour of your position?

{ A reader (me) decided that Top is a Crank. http://en.wikipedia.org/wiki/Crank_%28person%29 }

Projection. And at least I keep my AdhominemAttack's relatively compact instead of copy and paste lists. But then again, you seem to like repetition. -t

We have no way of knowing whether the author in question -- i.e., the one who wrote "A reader (me) decided that Top is a Crank" and pasted the list of 'Crank' attributes -- likes repetition or not, as he/she has posted on this page for the first time. I didn't post it.

And if you cannot realize that Example-1A and Example-1B are NOT inherently hard-wired to PS versus filter usage under the hood, then you are truly delusional. It's fucking obvious to normal coders. Something is fucked in your head. I suspect your stubbornness against admitting you are wrong is blinding you. Your ego writes checks your logic cannot cover and you then quibble and obfuscate over irrelevant issues to hide the fact you got it wrong. The two examples are ORTHOGONAL to PS-ness, SUCK IT, WRONGBOY! -t

They're "hard wired" to PreparedStatement vs dynamic SQL usage specifically because the examples say that's what they are. The PreparedStatement definition examples demonstrate text intended -- by the example itself -- to be sent to the DBMS via something like PHP PDO's 'prepare()' method. The dynamic SQL definition examples demonstrate text intended -- by the example itself -- to be sent to the DBMS via something like PHP PDO's 'exec()' method.

I challenge you to prove that Example-1A inherently CANNOT be used with filters alone. Dare ya dare ya! -t

That's as silly a challenge as asking me to prove that Example-1A inherently CANNOT be used as a question in an on-line quiz program about SQL, or in a game to teach newbies about SQL. Of course the strings in the examples could be used that way or in innumerable other ways. But, the examples themselves state that they're PreparedStatements and dynamic SQL, so they're not quiz questions, they're not game elements, and they're not strings that are deliberately made to look like PreparedStatements but are actually turned into filtered dynamic SQL by a hypothetical database abstraction layer. If they were any of these things, I would have said so. The examples explicitly state that they're PreparedStatements and dynamic SQL, which have clear and recognised definitions. The obvious inference -- reinforced by stating it explicitly, multiple times, above -- is that they're sent to the DBMS as-is.

No no no, we can also use the code in Example-1A for the equivalent queries and injection inspections for filters. And there is nothing wrong with making API's "deliberately made to look like PreparedStatements" if that makes code inspection easier for humans and/or machines per shop preference (naming aside). (I'm not making a design style/grokkability value judgement here because that's not the issue.)

As pointed out above, if that's what is being done, I would have stated it. As stated, the presumption is that the strings containing SQL are being sent to the DBMS as-is. At some point, that will have to occur, regardless what database abstraction layers are used. At some point, a programmer must make a decision as to whether PreparedStatements or dynamic SQL are used -- whether inside a database abstraction layer, or at an application programming level -- and at that point the considerations shown at the top of this page are relevant.

That's my very point: what your inspect-a-tron is looking at and what your complexity metrics are measuring is an arbitrary API design choice, NOT something inherently tied to PS's versus filters. That should be butt-clear to everybody in town. Your "metric" is like measuring bicycle speed by measuring the rider's shoe size: shoe size and bicycle speed are weakly related at best.

And I realize your PARTICULAR example is meant to be for PS's. But I've already agreed that that PARTICULAR scenario would score better in your grammar complexity metric. But a sample size of one tells us nothing GENERAL here. I've pointed this out already, but your thick skull ignored it. Certain specific API design choices score higher than others. That's not news.

I'm not sure what you mean by "a sample size of one". ODBC, JDBC, Perl DBI, PHP PDO, various native APIs -- all of which directly accept strings containing SQL like those in Example-1A and Example-1B -- are commonly used, including inside the database abstraction layers which you insist have some relevance. Therefore, the scenario at the top of the page is relevant whenever choosing between PreparedStatements and dynamic SQL, even if that decision is only visible in side a database abstraction layer.

Like I keep pointing out, your particular example is one of a gazillion ways to do it. If such issues are hidden at the low level, it's not something the code inspector would bother with and thus not affect the grammar score. And we are not measuring "commonality", as I keep pointing out on deaf ears. I'm not challenging typical usage pattern claims, only what the laws of logic force or don't force.

The point of this page has been to prove that PreparedStatement SQL definitions are inherently simpler than dynamic SQL. That has been proven. It doesn't matter whether that impacts the application developer or the API developer who is creating middle-ware, or not.

Filter-based API's can cross-dress to look just like out-of-the-box PS API's making your point moot.

As presented by the DBMS, PreparedStatement definitions and dynamic SQL are fundamentally different, and that has a direct impact on the client-side code complexity at the interface to the DBMS. That high level APIs can perform code translation or whatever is irrelevant; we're talking about definitions of SQL code at the interface to the DBMS.

You have yet to demonstrate a "direct impact". It's clearly orthogonal to me because those are details at an abstraction below application-level code checking and both styles can be swapped for another from the app's perspective. There's a missing link in your "logic".

The "direct impact" is what's shown at the top of this page: It's simpler to verify that PreparedStatement definitions don't contain external data than it is to verify that dynamic SQL contains filtered external data. We're not talking about what some higher-level API may do; we're talking about PreparedStatement definitions and dynamic SQL as sent to the DBMS, which is ultimately the only place where the result of client-side database security and auditing is relevant.


You still have failed to show that Example-1A could NOT be used with filter-based systems to produce the same DB results (data changes or output).

That's like asking me to prove that Example-1A inherently CANNOT be used as a question in an on-line quiz program about SQL, or in a game to teach newbies about SQL. Of course the strings in the examples could be used that way or in innumerable other ways. But, the examples themselves state that they're PreparedStatements and dynamic SQL, so they're not quiz questions, they're not game elements, and they're not strings that are deliberately made to look like PreparedStatements but are actually turned into filtered dynamic SQL by a hypothetical database abstraction layer. If they were any of these things, I would have said so. The examples explicitly state that they're PreparedStatements and dynamic SQL, which have clear and recognised definitions. The obvious inference -- reinforced by stating it explicitly, multiple times, above -- is that they're sent to the DBMS as-is.

Thus, it still stands that the choice between using Example-1A and using Example-1B is not dependent on PS's versus filters and thus any differences in them are not due to PS-versus-filters and thus their grammar scores are not related to PS-versus-filters. -t

There is no "choice between using Example-1A and using Example-1B". They exist only to prove that PreparedStatement SQL definitions are simpler than dynamic SQL under the conditions described in the scenario. Various intermediate APIs, middle-ware, deception, translation of strings in one syntax or another, etc., are not part of the scenario and are therefore irrelevant.

You are measuring something that can be abstracted away, and is thus not forced on the developer nor code checkers. You may have won one battle, but you lost the general abstraction war. Now, go waddle back to your family with cannonballs in your ass, LOSER!

What do you think AdhominemAttacks and red-neck insults do for your credibility? As for what is "forced on the developer" or "code checkers", actually I am measuring something that some developer or code checker must deal with: It's imposed by the DBMS -- and therefore reflected in the definitions of the SQL code sent to the DBMS via the DBMS API -- which some developer must deal with. However, that developer is not necessarily you; the code checker is not necessarily yours.

Why should anybody care about metrics on YOUR arbitrary choice? It only means something to you. It tests one approach out of many, but we cannot say anything general based on one sample. 1 down, 38,282,728,402,183,478,092,893,782 API designs to go. Now git back to work, Example Boy!

Whether it's visible to the application programmer or only visible to the API developer, the fundamental difference in complexity between PreparedStatements and dynamic SQL is always there, because it's imposed by the DBMS and therefore must be considered at the interface to the DBMS. That's what this page proves. What the proof provides is another point of evidence that there is no reason to favour dynamic SQL with filtered external data over PreparedStatements.

Your example does NOT demonstrate that because it's orthogonal to how PS's work under the hood. If using PS's imposes some kind of interface constraint, your particular choice of example does NOT demonstrate that. You need to rethink your "proof", something along the lines of:
  If guts use PS's, THEN App-side interface must follow pattern X
  If guts use filters, THEN App-side interface must follow pattern Y
Why? That's not what this page is about. This page simply proves that it easier to verify that PreparedStatement definitions don't include external data than it is to verify that dynamic SQL includes filtered external data.

"Dynamic SQL" is not well defined. Anyhow, I'm tired of this repetitious dance and won't get pulled into your word games. I'm confident most unbiased readers will agree that how the app coder interacts with the DB is fairly arbitrary such that you are measuring a specific and arbitrary API design rather than something INHERENT to PS-versus-filters. You think God holds a gun to their head and forces them to use an out-of-the-box API? What crazed religion do you belong to? You read the Book of Morons by Joseph Meth? The styles in the intro examples are swappable between PS and filters, proving it irrelevant and arbitrary and smart normal people will agree. You measured an arbitrary UselessTruth, at best. You are off-topic but are too stubborn to realize/admit that. Chant it over and over and might click in your stubborn little pea brain:

 BEGIN CHANT
 Arbitrary and Swappable. Arbitrary and Swappable. Arbitrary and Swappable. 
 Arbitrary and Swappable. Arbitrary and Swappable. Arbitrary and Swappable. 
 Arbitrary and Swappable. Arbitrary and Swappable. Arbitrary and Swappable. 
 Arbitrary and Swappable. Arbitrary and Swappable. Arbitrary and Swappable. 
 Arbitrary and Swappable. Arbitrary and Swappable. Arbitrary and Swappable. 
 Arbitrary and Swappable. Arbitrary and Swappable. Arbitrary and Swappable. 
 Arbitrary and Swappable. Arbitrary and Swappable. Arbitrary and Swappable. 
 Arbitrary and Swappable. Arbitrary and Swappable. Arbitrary and Swappable. 
 Arbitrary and Swappable. Arbitrary and Swappable. Arbitrary and Swappable. 
 Arbitrary and Swappable. Arbitrary and Swappable. Arbitrary and Swappable. 
 END CHANT
They're not "arbitrary and swappable" at the interface to the DBMS, and that's what matters.

And you arbitrary choose "what matters". If we "must" check all low-level API's also, then all practical low-level API's will have functions/methods, and thus jack up your grammar score: pick your poison because you lose either way, Hallucitor.

(I bet right now you are cooking up new ARBITRARY rules on what parts of API's "require" checking and which don't. I know your patterns by now, Hallucitor.)

You seem to have misunderstood the point of this page. The "grammar score" proves that it's simpler to verify that PreparedStatement definitions don't include external data than it is to verify that dynamic SQL includes filtered external data. That doesn't mean you have to use a grammar to implement some code checker, and it doesn't say anything about "what parts of API's 'require' checking and which don't". It only means it takes more effort to verify filtered dynamic SQL is secure than to verify that PreparedStatements are secure, regardless how PreparedStatements or dynamic SQL are generated.

I'm only concerned with INHERENT PS-versus-filters issues. If you invent a side topic about something you call "dynamic SQL metrics", that's fine, but it's a WalledGarden issue that only you and your weird buddy cares about and has no bearing on inherent PS-versus-filters issues. If you demonstrate that "It's harder to do Goofy-Thing-7 than to use out-of-the-box PS API's", that may indeed be true, but why anyone should care about doing Goofy-Thing-7 when it is not forced upon them, I have no clue. It's out of the blue and relates to nothing that came before that I can see other than being one possibility of API design among many others.

An inherent PreparedStatement-versus-dynamic-SQL (aka PS-versus-filters) issue is that PreparedStatement definitions must not include external data, but dynamic SQL must include (filtered) external data. It's imposed by security requirements and the DBMS. This page merely discusses one of the implications of that.

I still don't know what you mean by "dynamic SQL". Both type of sub-systems are fed essentially the same thing such that it does not matter if what they are fed is "dynamic" or "static". A classification of "dynamic" versus "static" (or whatever) would NOT affect what an app code inspector would check. The code checker would not do anything different to a string designated as "dynamic" versus "static" such that such a classification is irrelevant.

"Dynamic SQL" means SQL query text is generated at run-time from variable data, such as user input. Contrast that with PreparedStatements, where variable data such as user input is passed as parameter arguments when a predefined PreparedStatement is executed, but the PreparedStatement definition is (or should be) static. What an "app code inspector" -- such as that described at the top of the page -- can check is that PreparedStatements (as defined at the top of this page) are defined entirely and solely from static strings.

However, keep in mind that such an "app code inspector" is not a requirement. It's possible to use it if you're defining PreparedStatements like those at the top of the page, but nothing stops you from defining PreparedStatements or dynamic SQL some other way. The whole point of this page is to show that it's easier to verify that PreparedStatements haven't been tainted by external data than it is to verify that dynamic SQL has been constructed at run-time using strictly-filtered user input. The "app code inspector" described at the top of the page is merely intended to demonstrate that difference in complexity. What's crucial is not the use (or not) of "app code inspectors", but that PreparedStatement definitions be checked (somehow) for tainting by external data, or that dynamic SQL be checked (somehow) to make sure external data is filtered. What this page proves is that the former will always be simpler than the latter.

Example-1A could be used with either PS's or filters and the code checker would process it exactly the same and give the same score REGARDLESS if you personally call a string "dynamic" or "static" or "fliffernobb". It's just a word.

If Example-1A is used with "filters" (and it isn't, because I said so -- but let's run with your argument for a moment) then I presume you mean PreparedStatement-like syntax is converted to dynamic SQL with filtered external data, such that no actual SQL PreparedStatement is involved. If that's the case, then it's not a PreparedStatement, it's dynamic SQL. Notably, it's more complex than a real PreparedStatement -- and more complex than plain dynamic SQL -- because you have introduced the additional complexity of the mechanism that translates the PreparedStatement-like syntax to dynamic SQL.

And please don't say "but I intended Example-1A to be used only with PS". That's irrelevant because we are not considering ONLY your preference of API usage. You are not the center of the universe. We are not testing JUST your preferences or coding style because the claim is not limit to a specific style as given (or at least it shouldn't be if you want your "evidence" to mean anything important and general. I won't challenge your specific preference, but I don't have to use YOUR personal preference in my shop such that it means shit to me.)

I have expressed no preference of "API usage" or anything else. Why do you think I have? The only thing this page shows is that of the SQL sent to the DBMS, it takes more effort -- somewhere, by some developer -- to verify that dynamic SQL is consistently filtered than to verify that PreparedStatements do not involve external data. This is true independent of what "API usage" is (or is not) employed to generate either PreparedStatements or dynamic SQL.

The code it would verify looks exactly identical to me if one so chooses to use the API style shown in Example-1A regardless of whether PS or filters are used under the hood. If there is some OTHER code somewhere else full of Dark Complexity or whatnot, please show it. I can't read your fucking mind; you have to demonstrate with text and code where this alleged complexity is lurking and why it's allegedly in your metric scope (why it's being counted).

The "OTHER code somewhere" is the code inside some middle-ware API that is translating Example-1A's PreparedStatement syntax to dynamic SQL strings using filters, before being sent to the DBMS. If Example-1A is used to construct PreparedStatement SQL strings, then it can be sent to the DBMS as-is via low-level APIs; no translation or middle-ware API is needed. Hence, the overall process of constructing actual PreparedStatement SQL strings is obviously simpler -- and therefore easier to validate -- than using a PreparedStatement-like syntax to create dynamic SQL.

You seem to be saying if a vendor creates them, they are "low level" (and not counted in metrics) but not if a shop creates them? I consider what the vendor creates "middle-ware" also. A vendor could in theory create filter-wrapping API's that resemble typical PS API's also. That's just packaging, not an INHERENT difference in PS's versus filters. What's at the shop-side versus what's done by the vendor is not inherently dictated by PS versus filters. That technology choice does not hold a logic gun to a vendor's head to alter their behavior.

Vendors probably don't make PS-like API's without PS's because positional parameters with lots of columns are awkward to use such that there is no demand: it's not a default preference, and thus the existing market pattern has little if anything to do with "complexity". Vender middle-ware versus shop middle-ware is NOT an INHERENT feature of PS's versus filters. Focus on that word "inherent" so that you don't get side-tracked by irrelevancies again. If filters or PS's FORCE vendors to split things a certain way, then you may have a point. But I don't see force-ness. You mistake population samples for inherent traits it seems. If there is something about filters that absolutely and universally FORCES vendors and/or shops to partition query-related API's a certain way, please identify these absolute rules and prove they are absolute. And remember, I don't care about market habits or preferences. We are not measuring marketing preferences here: something commies, capitalists, anarchists, and space aliens all have to do related to PS-versus-filters.

When you measure and compare, please make double sure you are comparing something INHERENT to PS's versus filters; NOT industry preferences or habits and NOT your personal preferences.

[The comparison shows an inherent difference between prepared statements and dynamic sql. The ability to hide that difference behind an API doesn't make that difference any less inherent. If you think it does, you should reck your own rede and learn what inherent means. (Hint: it does not mean "forced".)]

You invent your own personal definition of "inherent" just like you did with "value". Lovely, Hallucitor. I don't give a shit about characteristics of "dynamic SQL". They are not directly related to the prior PS-versus-filter topics, but rather some screwy side-show given by a crazy clown. The bottom line is that a query API that uses filters under the hood CAN use Example-1A -style application code and that a "poor score" for the alternative 1B says nothing inherent about filters. The metric and example, as given near the top, says nothing inherent about filters. We were talking about tire A versus tire B on cars in the topic that spawned this one, but you went off and measured ducks instead. Your duck conclusions are a UselessTruth at best.

[I'm using the definition in the dictionary. You, if you think it means "forced", quite clearly are not. The previous topics have always been prepared statements vs. dynamic sql. Filters only come into play because they are used to get some semblance of the security inherent in prepared statements. The rest of your rant is an AdHominem. It helps your case not at all.]

Example-1A also uses "concatenation" for SQL and thus your (stated) PS example also uses "dynamic SQL" such that using dynamic SQL is not a difference maker between the two examples. (See PageAnchor Dynamic-SQL-Def-01)

[You clearly didn't bother to finish reading the definition. If you had you would know that it also has to be sent to the "DBMS connection's "execute query" function". The statements in Example-1A cannot be sent to that function because, as far as that function is concerned, they contain syntax errors.]

PS's also have something equivalent, they just use more buffering or caching-like techniques tacked on to it.

Could you explain what you mean by that, and how it's relevant?

Please re-read the definition of "dynamic SQL" from PageAnchor Dynamic-SQL-Def-01, particularly: "By 'dynamic SQL', I mean constructing whole SQL queries by string concatenation, in which a SQL query is created as a text string composed of static text ... and optional static string constants ... and optional external data filtered to hopefully prevent SQL injection ..." PreparedStatement definition strings -- as demonstrated in Example-1A -- should only be constructed from (concatenation of) static strings. The Example-1A grammar takes that into account. Dynamic SQL strings should also be constructed from (the concatenation of) static strings, but must include concatenation of filtered external data. The Example-1B grammar takes that into account. That difference -- that dynamic SQL must include filtered external data -- is why dynamic SQL is more complex than PreparedStatements.

Perhaps you are implying that "concatenation = complexity??? It's such horrid writing it's hard to know. And I still don't know what that "should" shit is about: that's human policy issue, not a technology issue. If you mean a goal, then state it as an assumed goal, not as a human policy.

Of course it's a "human policy issue" and an "assumed goal". Avoiding SQL injection is obviously a human policy issue and an assumed goal. Your computer doesn't care whether SQL injection occurs or not, and concatenation is obviously more complex than not-concatenation. Filters are more complex than not having filters, too.

Okay, but that's a general goal; I don't see the relationship between it and "dynamic SQL" as a general concept. You are mixing fruit and ducks.

The general concept is that PreparedStatement definitions -- in order to be secure -- shouldn't include external data and (therefore) don't require filters, but dynamic SQL requires filtered external data, so the former is simpler than the latter. It's hardly "mixing fruit and ducks", because they're both defined as SQL that's sent to a SQL DBMS for processing.


Application Scope

I'm going to "cut off" my concern at the application level. Whether PS-related middle-ware (query API's) are "less complex" than filter-based API's, I will not debate. I believe you are wrong on the middle-ware level, but will not challenge it for now. I will focus on application-side checking/complexity, and that's it. If you want to debate the middle-ware level and below, that's fine, but you will be debating with yourself.

And note that it's not the case that "everything can be hidden away into middle-ware". If the middle-ware has application-specific concerns, such as UI form or data column names hard-wired into it, than it doesn't qualify as "middle-ware" to me here. I'll also exclude UI issues for clarity (other than maybe error messages if there is a query problem).

--top

You appear to be constructing a StrawMan here. Middle-ware with "application-specific concerns, such as UI form or data column names hard-wired into it" doesn't sound like middle-ware to me, either. It sounds like part of the application.

Regarding middle-ware, having it is certainly more complex than not having it. If PreparedStatement-like syntax is being translated by middle-ware into dynamic SQL with filtered external data and passing it to the DBMS, that's obviously more complex infrastructure than not having any middle-ware and passing PreparedStatement definitions directly to the DBMS.

The PS API's included with say PHP are middle-ware. Thus, if you are using PHP's PS interface API's, you are using middle-ware. ODBC drivers are also middle-ware.

[He might also be constructing a NoTrueScotsman type argument. One where any API that doesn't hide the difference between PreparedStatements and dynamic SQL is declared middle-ware and therefore not truly on the application level.]

I never claimed that. I would note, though, that the examples YOU gave could be made to work either PS's or filters.

To ensure that Example-1A can be "made to work" with either PreparedStatements "or filters", there would have to be middle-ware that translates the PreparedStatement syntax into dynamic SQL (constructed with filtered external data) before sending it to the DBMS. Of course, that would mean Example-1A isn't what it states it is -- an example of PreparedStatements -- but is actually an example of PreparedStatement-like syntax used to create dynamic SQL. That's obviously more complex than sending the PreparedStatement definitions directly to the DBMS without translation, which is what Example-1A is intended to illustrate. Hence dynamic SQL aka "filters" is still more complex, in terms of code requiring security auditing (the discussion of which is what spawned this page), than PreparedStatements.

Like I keeeeeep pointing out, your example is only a "spot sample". I know what you intended in that PARTICULAR example, but it does not say anything GENERAL about PS-versus-filters because it's only a spot sample. I'm looking at generalities, NOT just your spot sample. I don't dispute your analyses for the spot sample. There is no disagreement for that specific particular one single case. And if you are looking at the overall complexity, not just in specific levels (backend, middleware, app, etc.) then you need to be fair and consider all the complexity needed to implement PS's also. If you play the "deep card", then PS's are also subject to the same deep-card rules and get counted in the complexity score. You seem to be arbitrarily cherry-picking what levels you count and don't count to rig the score toward your (wrong) claims.

No, my example isn't a "spot sample". It's illustrative of the general difference in complexity between validating PreparedStatements vs dynamic SQL in every system in which PreparedStatements vs dynamic SQL is a possible consideration. It's of concern to anyone who writes PreparedStatements and/or dynamic SQL on the client-side of a DBMS, which is (for example) most people who develop mobile applications or object-relational mappers or cloud-based applications or custom business applications or database-driven websites.

As very, very few people write DBMSs (I am one of them; I can count the others whom I know of on two hands), the relatively complexity in implementing PreparedStatement parsing and execution vs dynamic SQL execution is irrelevant. (As an aside: Once you're able to produce a parse tree or query plan from SQL, adding support for PreparedStatements is a negligible additional step. If you've designed your DBMS well, it practically falls out.)

You keep claiming it illustrates something general, but give a poor argument for that. To me it is clearly is an arbitrary API choice that is not directly dependent on whether PS's or filters are used under the hood. The alleged dependency between the API style and PS-versus-filters is a "missing link" in your explanations. We seem to be going around in circles and I should give up trying to figure out what hell you are talking about, but so far I can't kick the habit of replying. It's pretty clear our world view of software and abstraction and WetWare is so very different that we don't have enough common ground to use as a bridge between our minds. Seemingly strait-forward words mean something to you so different than what they mean to me. It's like working with an alien from Kolob.

In your work creating custom business applications, don't you frequently construct queries that involve external data?

Yes, and?

Don't those queries often look like the ones in Example-1B?

More often than 1A, yes, but again, we are not measuring on preference or popularity such that frequency doesn't matter anyhow. Actually, I tend to combine the form render and/or validation routines into value-filter-checking such that I tend not to use either of those two styles in practice. (Note: You asked for an example of such combining technique before, and I've yet to supply one.)

That's fine. Do your query definitions consist of static strings concatenated with filtered external values?

Query definitions? Static? It's often a dynamic language; there is no "static". I'm not sure what you mean. Concatenation of filtered "values" with SQL strings eventually happens along the line, but the app developer may not do the actual concatenation (depending on API design choices and special needs).

In a query definition like '"SELECT p, q FROM z WHERE p=" . sqlString($_POST['userinput']);', "SELECT p, q FROM z WHERE p=" is a static string whether the language is dynamically typed or not. Do you not have those?

I'd call it a "literal string".

That's a reasonable thing to call it, but at run-time you can't change "SELECT p, q FROM z WHERE p=" to something else. So it is static, isn't it?

It's a static string, yes.

Good. Then is it true that your query definitions consist of static strings concatenated with filtered external values, even if those query definitions are constructed inside a QBE-like mechanism?

For the most part, yes.

Excellent! Then it isn't necessarily the case that "seemingly strait-forward words mean something to you so different than what they mean to me", as you wrote above. So far, we agree.

Let's assume you have a query defined like that above:

 $qry = "SELECT p, q FROM z WHERE p=" . sqlString($_POST['userinput']);
Do you agree that it could be re-written as a PreparedStatement definition that would look like this?
 $qry = "SELECT p, q FROM z WHERE p=?";


Example 1A.2

A set of SQL clauses in PHP that use a query API that closely resembles those used for PreparedStatements (PS), but uses filters under the hood to check parameter values against injection strings.

 <?php
   // Example-1A.2
   $prepClauseStatus = "s = 'available'";

$prepStmtDef01 = "SELECT * FROM piddle WHERE p = ?"; $prepStmtDef02 = "INSERT INTO fizzle(x, y, z) VALUES(?, ?, ?)"; $prepStmtDef03 = "SELECT * FROM piddle WHERE " . $prepClauseStatus . " AND p = ?"; $prepStmtDef04 = "SELECT * FROM furgle WHERE " . $prepClauseStatus . " AND q = ?"; ?>
Note that the code is highly constrained, consisting strictly of assigning variables. Variables are only assigned string literals, or the concatenation of string literals and variables. We can express that as a grammar. Assume the existence of token definitions <VARNAME> which is a variable name (e.g., $prepClauseStatus), and <STRING> which is a sequence of any characters delimited by '"'. Other tokens are identified by their literal text.

The grammar, in modified EBNF, is:

 PHPProgram ::= "<?php" ( Assignment )* "?>"
 Assignment ::= <VARNAME> "=" stringDef ";"
 stringDef ::= ( <STRING> | <VARNAME> ) [ "." stringDef ]
Note that grammar is the same as example 1A.

Assuming you're using PHP PDO, what method would you call to pass $prepStmtDef04 to the DBMS?

That's an implementation detail; why does it matter? I don't want to dictate arbitrary implementation details in the example. Note that additional (non-PHP-library) middle-ware would likely pre-process $prepStmtDef04 to insert the parameterized values etc.

It matters because it proves my point: Dynamic SQL -- which is what your "additional (non-PHP-library) middle-ware" would have to create -- is inherently more complex than PreparedStatements. If you treat $prepStmtDef04 as a PreparedStatement, you can send it to the DBMS as-is. You wouldn't need any "additional (non-PHP-library) middle-ware" to turn strings containing PreparedStatement-like syntax into dynamic SQL because you could use them as PreparedStatements instead.

See above at "PS's require implementation complexity and filters require implementation complexity." You again seem to be arbitrarily excluding and including the various abstraction levels in your scoring per whim. Please stop doing that; it's bad form.

There's been no "excluding and including". We've consistently assumed the minimal "abstraction levels" needed to get SQL definitions from the client-side to the DBMS, i.e., ODBC, JDBC, etc. These low-level APIs must exist no matter what higher level APIs are employed, and we've consistently noted that higher-level APIs -- such as your PreparedStatement-syntax-to-dynamic-SQL-converter -- are additional system complexity on the client-side.

By the way, why would you use PreparedStatement syntax on the client-side with "additional (non-PHP-library) middle-ware" to convert it to dynamic SQL with filters and send it to the DBMS? Why not just send the PreparedStatement definitions directly to the DBMS?

That question is outside of the scope of this topic. I would note that many find the out-of-the-box PS API's difficult to use because positional parameter passing with a large quantity of parameters is an awkward UI for most humans (TooManyParameters). Sure, it can be wrapped to be more like Example-1B (without functions) or keyword-based, but then you'll penalize that stack for having fatter middle-ware.

You can define any query coding you like on the client-side. It doesn't have to look like PreparedStatements as in your example; it could be something completely different and un-SQL-like. However, to work with a SQL DBMS, you're going to have to translate that custom query coding to either PreparedStatements or filtered dynamic SQL. That is always going to involve more complexity than sending PreparedStatement definitions or filtered dynamic SQL directly to the DBMS.

As for those who "find the out-of-the-box PS API's difficult to use because positional parameter passing with a large quantity of parameters is an awkward UI for most humans", how does your PreparedStatement-like syntax -- which is translated by middle-ware to dynamic SQL with filters -- solve that problem? If we assume it's there at all, aren't you simply making it worse by creating a PreparedStatement-like API? Wouldn't it be better to create some higher-level API to facilitate constructing PreparedStatements with a large quantity of parameters? That would be worthwhile extra complexity, but converting PreparedStatement-like syntax to dynamic SQL seems like pointless extra complexity.

Sure, we can make all kinds of middle-ware to simplify various aspects of querying and/or code checking and/or code-checking grammars. I've been saying that all along. If vendors provided more API options, perhaps more co's would adopt PS's.

That didn't answer my question. Again: As for those who "find the out-of-the-box PS API's difficult to use because positional parameter passing with a large quantity of parameters is an awkward UI for most humans", how does your PreparedStatement-like syntax -- which is translated by middle-ware to dynamic SQL with filters -- solve that problem? You complain about PreparedStatements, but appear to suggest solving a problem with PreparedStatement syntax by using PreparedStatement syntax.

It's not meant to solve that problem. I was addressing your narrow "test", not solving all IT problems that ever existed. Note that it perhaps is not possible to favor automated code checkers and human coders/maintainers at the same time. These factors, at least, are being juggled against each other:

My "narrow 'test'" (by which I presume you mean the grammar comparison) is part of an informal proof; it's neither a coding necessity nor an attempt to solve "all IT problems that ever existed". Whilst it may (or may not) be reasonable to consider your list of factors "juggled against each other", they're irrelevant here. This page exists only to confirm the intuitive observation that it's easier to verify that PreparedStatement definition strings don't include external data than it is to verify that dynamic SQL strings contain filtered external data. Mechanisms that translate PreparedStatement-like syntax into dynamic SQL are either irrelevant if we treat them as PreparedStatements because they're not PreparedStatements, or add obvious complexity to creating dynamic SQL if we treat them as a syntax for creating dynamic SQL.

I disagree with this repetitious summary of already stated repetition, but it appears you play word-games with "dynamic" and shell-games with "complexity" excluding certain levels to rig your metric as you please.

Can you show where I have played "word-games with 'dynamic'" or "shell-games with 'complexity'"? I've simply shown that it's easier to verify that PreparedStatement definition string constructions don't have external data than to verify that dynamic SQL string construction has filtered external data. I find it curious that you don't directly address that simple point.

I've already pointed it out above. Of course, you disagreed, and I disagreed with your disagreements, and you disagreed with my disagreement to your disagreements, etc.

Rather than use vague rhetoric like "... and you disagreed with my disagreement to your disagreements, etc", it would strengthen your argument if you referenced where I have played "word-games with 'dynamic'" or "shell-games with 'complexity'". Otherwise, we'll have to assume your comment is spurious.

You are right, ideally I should put reference markers or PageAnchors, but doing such didn't change anything in past debates except re-trigger prior sub-debates with the same arguments all over again.

Not necessarily. A reference to a prior debate, without new arguments or AdhominemAttacks, need not trigger further debate.

In theory, you are correct. In practice, things end up repeating. I think it's your fault and you think it's mine.


It's Relative

One's acceptance or not of your complexity metric depends on multiple issues. I say LetTheReaderDecide which perspective they wish to consider. Example questions:

We can't dictate to a reader what their own concerns are; we can only make recommendations, and present and explain tradeoffs as well as possible.

As long as we're asking to LetTheReaderDecide, I suggest that readers be careful not to let your attempt at molehill mountaineering distract them from the simple observation that it's easier to verify that PreparedStatement definition strings don't contain external data than it is to verify that dynamic SQL contains filtered external data. That's because PreparedStatement definitions should consist only of concatenation of static strings, but dynamic SQL must consist of concatenation of static strings and filtered external data, and that's true whether they're created inside or outside some middle-ware.

Your bullet points may make for a worthy discussion somewhere, but they're irrelevant here. Your "example questions" are akin to asking "Is it two ducks, or is it two cherries?" or "Is the 4 displayed in Helvetica or Times New Roman?" when presented with a simple demonstration that 2 + 2 = 4.

You arbitrarily declare what's relevant and what's not, based on waffling "rules" you invent in your ass.

That's an AdhominemAttack, rather than an evidence-based point. It only weakens your credibility. It does nothing to strengthen your position, or diminish mine.

It's a summary of points already made. The evidence/reasoning is at the detail.

I see. So you believe I actually "invent in [my] ass" and the evidence/reasoning for that can be found "at the detail"? You're suggesting I have some sort of... Creative miracle ass?

I meant the part of that reply that is not an AdhominemAttack.

What part of that reply isn't an AdhominemAttack?

Let me reword it in a nicer way:

"Your criteria for 'relevant' appears to be contrived and/or inconsistent, based on my best-guess interpretation of your writing".

Obviously, I disagree. Regardless, the essential fact demonstrated by this page -- that construction of PreparedStatement definitions is simpler (and therefore easier to audit) than DynamicSql construction -- still holds.

Bullshit. You cherry-picked both the filter-related API style and the level of abstraction shown and not shown arbitrarily to mislead the reader.

No, my examples are a demonstration of the inherent difference between PreparedStatement definitions and DynamicSql construction. Don't you think string construction that doesn't need filters (or much string concatenation) is obviously simpler to audit than string construction needing filters (and at least one and usually two string concatenations per item of filtered external data)?

You arbitrarily choose what to count as part of the audit and what to exclude. "Simpler" relative to what? Total theoretically complexity, total actual complexity (how parts are currently packaged & sold), complexity from a shop's (RDBMS user) perspective? The answer changes depending on what perspective is being examined.

Actually, no, the answer doesn't change "depending on what perspective is being examined". PreparedStatement definition construction, no matter how you look at it, is simpler to validate than equivalent DynamicSql construction because the former shouldn't incorporate external data or filters, but the latter must incorporate filtered external data. This boils down to a simple, indubitable relation: All things being equal -- and in terms of constructing PreparedStatement definitions vs constructing DynamicSql, they are, because they're just constructing strings -- "filtered external data" is inherently more complex than "no filters and no external data".

PageAnchor complexity-symmetry

Like I keep saying and you keep forgetting: Yes, filters will require "extra" filter-related code to serve filter-ness, but likewise PS's will require extra PS-related code to serve PS-ness. Filters don't need PS-related code and likewise PS's don't need filter-related code. You count one but make up very poor or contradictory excuses to exclude the second.

I know you keep saying it, but your claim that "PS's will require extra PS-related code to serve PS-ness" is incorrect. PreparedStatement definitions can be sent directly to the DBMS, because the DBMS is designed to recognise them and no further processing is required. DynamicSql must be constructed with filtered external data before being sent to the DBMS. Filters are more complex than no filters. QED.

That's the "PageAnchor Bundle-Argument" argument. If bundling is the big key, then illustrate bundling, not string concentration and functions-versus-non-functions etc. Illustrate the real issue, not side word games. A grammar metric is sooooo distantly related to the bundling argument as to likely confuse the reader.

The grammar metric was only intended for you. I don't know anyone else who is likely to be "confused" by the differences between PreparedStatements and DynamicSql.

Well, it's helping neither me nor the reader. I still have no fucking idea what you are talking about. You keep flip-flopping between practical concerns and theoretical concerns willy-nilly, apparently to artificially justify your contrived, round-about, and obtuse arguments by moving the goal posts. I'll try to give up for the fourth time. Must...resist...urge...to...reply...

This much is clear to me:

  IF measuring theoretical complexity THEN
    complexity score roughly the same because:
      PS's require PS-related-code and filters require
      filter-related-code SOMEWHERE in the total necessary stack.
  ELSE  // practical complexity
    complexity score depends on the concern level/scope/jurisdiction
      of a particular user/reader (per RDBMS, middle-ware,
      and app side/level).
  END IF
PageAnchor Bundle-Not-Theory

And note I exclude the "bundling argument" from the theoretical side because how vendors bundle services is a practical-level concern, not a theoretical one because no known related theory dictates what services are bundled or not bundled. There is no inherent law of the universe that says vendors must bundle PS-related services with RDBMS, and tomorrow they may decide to bundle in filter-related services also.

I find it rather baffling that you have no idea what I'm talking about, because it's quite simple: If you're creating PreparedStatement definitions, you don't need filters or external data. If you're creating DynamicSql, you need external data and filters around it. It seems trivially obvious that the former, therefore, is simpler than the latter.

Let's look at it notationally:

PreparedStatement definitions can be created from static strings -- let's call any static string an S -- and concatenation of static strings, e.g., S + S + S, where '+' means concatenation. So, every PreparedStatement definition -- whether inside or outside middleware -- is going to be S, or S + S, or S + S + S, or S + S + S + S, and so on. To save us writing enormously long expressions, we'll use (x)* to mean "x is repeated 0 or more times". Thus, a PreparedStatement definition is S (+ S)*. In other words, a PreparedStatement definition consists of a static string with 0 or more static strings concatenated onto it.

DynamicSql can be created from static strings too, but we also need to concatenate instances of external data D, and we need to filter every D with some filter function F. So, DynamicSql definitions can be S, or S + S, or S + F(D), or S + S + F(D), or S + F(D) + S + F(D), and so on ad infinitum. In general, that means DynamicSql is S (+ F(D) | S)*, where x | y means "either x or y". In other words, DynamicSQL consists of a static string with 0 or more static strings or filtered external data concatenated onto it. (We'll start with an S rather than (F(D) | S) because we'll assume DynamicSql normally starts with a static string like "SELECT", "INSERT", "DELETE", "UPDATE", etc.)

Trivially, we can see that S (+ S)* is simpler than S (+ F(D) | S)*, and thus PreparedStatement definitions are easier to audit than DynamicSql construction.

That's all this page is intended to show.

But you are focusing only on the static "portion" of PS technology. The entire stack when using PS's requires more than just the static portion because "values" must also be processed SOMEWHERE (to be equivalent in results). You are comparing apples without stems to pears WITH stems.

As I've done all along, I am only focusing on PreparedStatement definition on the client-side of the DBMS. Note the title of this page, the first paragraph, and the fact that "PreparedStatement definition" shows up on this page -- as of this writing -- at least 113 times. This page was spawned in a discussion over whether or not PreparedStatement definition construction is easier to audit for SqlInjection opportunities than DynamicSql construction.

It is in SQL construction that errors leading to SqlInjection are likely. PreparedStatement or DynamicSql execution is far, far less likely to result in SqlInjection. Furthermore, the difference in complexity at the point of execution is negligible: PreparedStatement execution requires a reference to a predefined PreparedStatement (often a name, if not disguised behind an opaque handle) and a list of arguments which do not require filters. DynamicSql execution requires a reference to the SQL to be executed. Since both PreparedStatement execution and DynamicSql construction make reference to external data, the references to external data cancel out overall. Thus, the only overall difference in client-side complexity is "no filters required" (PreparedStatements) vs "filters required" (DynamicSql).

Filters are obviously more complex than non-filters.


Here's a more accurate view:

 .........DBMS......Middle-Ware...App
 -----------------------------------
 Filters: DBE.......FM,FAPI.......AC 
 PS's:....DBE,PSM...PSAPI.........AC

AC = App Code DBE = Database engine FM = Filter-processing mechanism FAPI = API for using filters PSAPI = API for using PS's (PS = PreparedStatements) PSM = PS-processing Mechanism
Looking at the entire stack, they both have the same number of necessary "parts": four. If there is an inherent complexity difference between PSM and FM, the two key parts, you have NOT identified it.

The inherent complexity difference is the requirement for filters, which (if we're looking at the entire client-side stack; the server-side is not relevant) consists of both the filter definitions and their application. Both DynamicSql and PreparedStatements will make reference to external data (in DynamicSql such references are made at the point of definition; with PreparedStatements such references are made at the point of execution) but only DynamicSql requires that the external data be filtered.

By the way, counting architectural modules -- which is what you appear to be doing -- is meaningless in terms of measuring overall complexity, because it makes the mistaken assumption that all modules are equal in complexity.

Maybe when you say "prepared statement" you mean literally the "prepared" part of the statement, which excludes value processing. I was generally considering the full picture, not one part. If you are slicing it as such, I don't know what the equivalent "filter" version would be at this point. It's not normally thought of that way, and is an awkward comparison set to try to match up.

The examples at the top of this page illustrate that. They show the "full picture", aside from actual execution of either PreparedStatements or DynamicSql because the intent was to illustrate creation of the constructs. The difference in execution (an argument list vs no argument list) is negligible from an auditing point of view.

[Why are you counting PSM twice on the PS side? It's part of DBE. Or maybe you want to split the DBE into three parts, a core common to PS and DSQL, a PSM, and a DSQLM. In that case, you should add a DSQLM to the Filters side. You should also add a DSAPI to the Filters side that corresponds to the PSAPI.]

 [DSQLM = Dynamic SQL processing mechanism]
 [DSAPI = API for using Dynamic SQL]
PS technology requires "direct" SQL processing also. Remember, one can feed an HTML form's contents directly into a PS (without using "?" parameters). True, it's not a recommended practice, but still must be supported to match expected PS behavior and thus is counted as part of its required component stack. Thus, PS technology is essentially a super-set of what you call "dynamic SQL". One can process all "direct" SQL through the PS API's if they wanted, and perhaps under the hood it's the same processing paths in the DB engine (perhaps with some wasted overhead of caching, etc.).

[There would need to be something in the DSQLM section. It has to return a result set immediately when the statement is sent, while the prepared statement does not.]

That's a minor implementation detail. One can view a direct statement as an "anonymous PS" (like HOF's, he he), a shortcut for: run_PS(register_PS($full_sql),$empty_array); instead of the usual $handle=register_PS($full_sql);run_PS($handle,$empty_array); [pseudo-code]

[So now you're saying that the filters side in the database would be DBE, PSM, and DSQLM. That doesn't make things any better for your case.]

Say what? Both direct SQL and PS's require the ability to process a "full" SQL statement (with values) such that it's not a difference maker. However, PS's also require caching mechanisms, value-to-statement matching (tracking), and interfaces to middle-ware to access such features.

[When trying to defend excluding the DSQLM from the filter side, you created a simple DSQLM which used the PSM which used the DBE. You say that in your response that starts, "That's a minor implementation detail.". Since you are trying to measure the complexity of the whole system, you now have to include DBE, PSM, and DSQLM on the filters side since you claim they are being used.]

That's one approach, but it's not the only approach. I'm not dictating implementation here. A full PS "kit" still requires the ability to process full/direct SQL such that processing full/direct SQL is not a difference maker. Caching it under a name or handle does not change that need. (And managing the cache and names/handles is added complexity of PS's.) How one goes about the implementation is up the DB implementer. The bottom line is still the same: PS's require PS-specific complexity that filters don't need and filters require filter-specific complexity that PS's don't need. I have not seen anything new from you that changes this fact.

That's trivially true inside the DBMS and inside the low-level DBMS APIs, because PreparedStatements and DynamicSql are distinct SQL constructs. However, unless you're a DBMS developer or security auditor, it's irrelevant. This page is focused on the complexity of constructing SQL text, not on interpreting it. When constructing SQL text, the fundamental complexity difference between constructing PreparedStatement definitions and constructing DynamicSql is that the latter requires filtered external data and the former does not.

see PageAnchor complexity-symmetry

At the DBMS API level, PreparedStatements require definition and execution calls, DynamicSql requires just execution calls. Thus, the only complexity that PreparedStatements add over DynamicSql -- in the client-side query construction and execution stack, overall -- is one extra low-level DBMS API call. That's trivial compared to the complexity of filtering external data on every DynamicSql query construction plus the complexity of the filters themselves. Of course, that's irrelevant here, because this page -- as has been often stated -- is about the complexity of constructing of PreparedStatement definitions vs the complexity of constructing DynamicSql. What APIs are used is irrelevant.

"Constructing" by who or what? R2D2? I still don't know what "complexity" you are talking about; you keep waffling up and down the abstraction ladder between DBMS guts, and the application side, and every stop in between. Yes, implementing filters requires complexity, but so does PS implementation. A clear and reasonably-general illustration would help. Your existing examples seem to revolve strangely around functions and seem arbitrary API design choices to me. You are frustrating to communicate with; very indirect and round-about and use English in peculiar ways. -t

"Constructing" by programmers, of course. The complexity I'm talking about is exemplified by the following "clear and reasonably-general illustration", which is representative of either application-level code or what goes on inside middle-ware:

 // Dynamic SQL query in PHP assuming PostgreSQL
 $qry = "SELECT * FROM customers WHERE name = '" + pg_escape_string($userInput) + "'";
 $resultset = pg_query($dbconn, $qry);

// Prepared statement query in PHP assuming PostgreSQL $qry = 'SELECT * FROM customers WHERE name = $1'; $resultset = pg_query_params($dbconn, $qry, array($userInput));
The above is rather poorly constructed from an application development point of view (freely mixing SQL and application logic is questionable) but it illustrates the point of this page: It's easier to determine that $qry in the second example doesn't reference external data -- because it doesn't need filters and shouldn't reference external data -- than it is to determine that $qry in the first example always wraps external data (i.e., $userInput) in pg_escape_string() or some equivalent. That's because the first $qry assignment is always more complex than the second, because the first always needs filtered external data, but the latter should never contain external data. There's nothing arbitrary about it -- it's a characteristic imposed by the DBMS, by the fact that DBMSs allow two different kinds of SQL constructs, PreparedStatements and DynamicSql, to be defined and used differently.

If you are talking about the app programmer side, then complexity may not be the primary metric, for maintainability and grokkability of the code is usually the key concern.

This page was spawned from a discussion that assumed the primary concern is avoiding SqlInjection, for which PreparedStatements are preferable not only because they avoid risky filters, but because they're easier to audit (because they don't need to reference external data or filters). If avoiding SqlInjection isn't the primary concern, but the primary concern is ensuring that accesses to the DBMS are minimised, than DynamicSql is obviously preferable. If the primary concern is minimising CPU resource for multiple executions of a given query, than PreparedStatements are almost inevitably preferable. If the primary concern is simplicity and readability of query definitions, then PreparedStatements are almost inevitably preferable.

Again, if we are only auditing the application code via automated code inspectors, then the "score" depends on the API style used, not the underlying injection prevention technology. You appeared to agree that such an automated code checker would only be used on app-side code, not middle-ware nor DB implementation code. As far as the last sentence, I disagree for reasons already given, but I'm not here to debate grokkability and so will not repeat those points.

"Automated code inspectors" were an aside; they weren't the point. The point is that construction of PreparedStatement definitions is simpler than construction of DynamicSql. You can see it immediately above in the difference between the two examples: The first $qry assignment requires filtered external data but the second $qry assignment does not. Hence, the second is simpler than the first. The illustrations at the top of this page are simply a general proof of that observation, using grammar complexity and assuming we've put all the "$qry = ..." definitions together by themselves. If you do put all the "$qry = ..." definitions together by themselves, I noted (as an aside) that you can easily use automated code inspectors, but it's not a requirement. The point is that no matter how or where you inspect the client-side code -- whether manually or automatically, whether inside middle-ware or not using middle-ware -- PreparedStatement definition construction will be simpler than DynamicSql construction because the latter requires filtered external data but the former does not.

I again shall refer you to PageAnchor complexity-symmetry.

How does "complexity-symmetry" affect auditing the construction of PreparedStatement definitions or DynamicSql?

If it's an "aside", then you've failed to illustrate and measure your complexity claim.

Please read the top of this page. You'll note that my complexity claim is that DynamicSql construction is more complex than PreparedStatement definition construction because the former requires filtered external data but the latter does not. This is demonstrated by a difference in grammar between the former and the latter -- the grammar for code that constructs DynamicSql needs to include code for filtered external data, whilst the grammar for code that constructs equivalent PreparedStatements excludes (by omission) code for filtered external data. The fact that you can use such grammars to automate validation of code was an aside (and a bit of a bonus, perhaps) but it isn't the point. The point is that one SQL construct needs filtered external data and the other doesn't.

While I will concede that certain parts of filter technology may be more complicated (perhaps parts that don't even matter to most developers or shops because they can be abstracted away), I've seen no clear evidence that the total sum complexity is less. You appear to be improperly extrapolating cherry-picked aspects into the whole, which to me is being intellectually sloppy. Shape up! It comes across similar to:

  "Bicycles require bicycle seats while cars don't.
  Therefore, bicycles are more complicated than cars."
Now, maybe in the end the total complexity for filters is indeed more than that of total PS technology, but you have provided very little encompassing evidence other than a narrow selection of pet parts you keep obsessing on. -t

What I "keep obsessing on" is the crucial and inescapable difference: requiring filters vs not requiring filters. Between PreparedStatement definitions and DynamicSql construction -- including the low-level APIs to send SQL to the DBMS and retrieve result sets or response codes -- there is no other difference in complexity.

A better analogy (to the extent that any analogy is comparable) is that cars with automatic transmissions (PreparedStatement definitions) are easier to drive (audit) than cars with manual transmissions (DynamicSql construction) because cars with automatic transmissions don't need a clutch (filters) to change gears (external data).

But the total manual-shift car is mechanically simpler: it has fewer total parts. The "complexity" is matter of perspective (car versus user effort), and I don't know what perspective your PS claim is coming from, and attempts to narrow that down keep proving futile for reasons that escape me.

Fewer parts is irrelevant. Using the car analogy, we're talking about driving complexity (filters vs non-filters), not mechanical complexity (DBMS internals). What we're talking about here is the complexity of having to deal with filters when using DynamicSql that is eliminated when using PreparedStatements. From the point of view of a middle-ware or application developer (or auditor), that is significant.

Why are you counting middle-ware complexity but not DBMS complexity? Because it's "bundled"? You can only exclude DBMS-side complexity if you can clearly justify the reason for drawing such a seemingly arbitrary line in the stack-sand.

The focus of this page is, and has always been, the complexity of constructing and (particularly) auditing the construction of DynamicSql vs PreparedStatement definitions. The DBMS interprets these statements, it doesn't create them. Obviously, creating SQL is particularly of concern in avoiding SqlInjection, and discussing that topic in relation to PreparedStatements is what spawned this page. Whilst auditing DBMSs is certainly a concern for a relatively small number of security-critical applications, it's not an area that the typical database-driven application developer, or the typical middleware developer, is likely to encounter. For most database-driven application or middleware developers, auditing the client-side is a valid concern; auditing the DBMS is not.

Furthermore, within a typical SQL DBMS there is no strong separation of PreparedStatement handling from DynamicSql handling. In SQL DBMSs, these are largely integrated, such that an auditor is going to be looking at the security of the DBMS as a whole. Thus, it no longer makes sense to distinguish PreparedStatements from DynamicSql -- at least in terms of auditing complexity -- on the DBMS-side.

"As a whole"? Please elaborate.

An application mission-critical enough to warrant a source-level audit of a (likely enterprise-grade) DBMS is almost certainly going to audit the whole DBMS, and not just stop at (say) the part that recognises PreparedStatement parameters.

Anyway, auditing a DBMS -- relative to auditing application code or middleware -- is such a specialised, rare and unusual task as to be essentially irrelevant here. This page was spawned from concerns over avoiding SqlInjection related to PreparedStatements and DynamicSql, which is a concern every time a database-driven application is built. Concerns about a DBMS sufficient to warrant auditing the DBMS source is a whole different category of activity, with different motivations, and it almost never happens. For instance, auditing the DBMS is almost certainly not about SqlInjection (or at least it's a negligible concern) because SqlInjection results from constructing queries; it does not result from parsing, planning, and executing them.

The only level we both agree on is the app level. Doing middle-ware and/or DB source inspection is up the organization per needs and resource trade-offs. And app level depends heavily on the API design, which is mostly orthogonal to the PS-versus-filters issue. QED #47.

The style of coding at the application level -- or what happens inside middle-ware -- may depend on the DBMS API design but the content of constructed queries is dependent on what functionality the DBMS exposes:

That will be true at the low-level DBMS API, no matter what API design is used, as long as the aforementioned DBMS functionality is exposed. (Note: Every popular SQL DBMS exposes it via ODBC, JDBC, native APIs, etc.) It will be true no matter what higher-level API design is used on top of the low-level DBMS API, though the difference between DynamicSql and PreparedStatements may only be visible inside the higher-level API. ObjectRelationalMappingLayers, for example, may not expose PreparedStatements or DynamicSql (or any SQL at all) but the developer of the ObjectRelationalMappingLayer will have had to deal with either or both.

In short, whoever interacts directly with the DBMS will have to tackle the difference in complexity between constructing DynamicSql or PreparedStatement definitions, because they're invariably what the DBMS exposes to the client-side.


AprilFourteen


EditText of this page (last edited May 22, 2014) or FindPage with title or text search