On PreparedStatementsDiscussionTwo, it is alleged that PreparedStatement definitions consisting solely of static string definitions -- which optionally make reference to other static strings -- are easier to validate than dynamic SQL definitions that must (in order to be dynamic) make reference to external data and optionally make reference to other dynamic definitions.
Here, we examine that claim.
A typical set of PreparedStatement definitions, in a language like PHP, will look like the following:
<?php
// Example-1A
$prepClauseStatus = "s = 'available'";
$prepStmtDef01 = "SELECT * FROM piddle WHERE p = ?";
$prepStmtDef02 = "INSERT INTO fizzle(x, y, z) VALUES(?, ?, ?)";
$prepStmtDef03 = "SELECT * FROM piddle WHERE " . $prepClauseStatus . " AND p = ?";
$prepStmtDef04 = "SELECT * FROM furgle WHERE " . $prepClauseStatus . " AND q = ?";
?>
Note that the code is highly constrained, consisting strictly of assigning variables. Variables are only assigned string literals, or the concatenation of string literals and variables. We can express that as a grammar. Assume the existence of token definitions <VARNAME> which is a variable name (e.g., $prepClauseStatus), and <STRING> which is a sequence of any characters delimited by '"'. Other tokens are identified by their literal text.
The grammar, in modified EBNF, is:
PHPProgram ::= "<?php" ( Assignment )* "?>"
Assignment ::= <VARNAME> "=" StringDef ";"
StringDef ::= ( <STRING> | <VARNAME> ) [ "." StringDef ]
If we wanted to, we could build a parser for that grammar to automate validation of
PreparedStatement definitions. Additional functionality would be to verify that <STRING>s do not contain embedded variable references, and ensure that <VARNAME>s in the S
tringDef rule only refer to variables defined by the Assignment rule in the same PHPProgram.
We can take the same approach with dynamic SQL definitions. Typical dynamic SQL string definitions might look like the following:
<?php
// Example-1B
$sqlClauseStatus = "s = 'available'";
function sqlDef01($parm1) {
return "SELECT * FROM piddle WHERE p = " . sqlStringFilter($parm1);
}
function sqlDef02($parm1, $parm2, $parm3) {
return "INSERT INTO fizzle(x, y, z) VALUES(" . sqlStringFilter($parm1) . "," . sqlStringFilter($parm2) . "," . sqlStringFilter($parm3) . ")";
}
function sqlDef03($parm1) {
return "SELECT * FROM piddle WHERE " . $sqlClauseStatus . " AND p = " . sqlStringFilter($parm1);
}
function sqlDef04($parm1) {
return "SELECT * FROM furgle WHERE " . $sqlClauseStatus . " AND q = " . sqlStringFilter($parm1);
}
?>
Note that we can't define them as static strings like we did with the
PreparedStatement definitions, because we need to incorporate dynamic external data. So, we define them as functions, in order to achieve the same flexibility to pass arbitrary parameters that we would have using
PreparedStatements.
The grammar might look something like this, assuming a pre-existing token <FNNAME> to represent a function name in addition to the same <VARNAME> and <STRING> tokens used above:
PHPProgram ::= "<?php" ( FnDef | Assignment )* "?>"
Assignment ::= <VARNAME> "=" LiteralStringDef ";"
LiteralStringDef ::= ( <STRING> | <VARNAME> ) [ "." LiteralStringDef ]
FnDef ::= "function" <FNNAME> "(" ParmDefs ")" "{" "return" StringDef ";" "}"
ParmDefs ::= <VARNAME> [ "," ParmDefs ]
StringDef ::= ( <STRING> | <VARNAME> | WrappedExternal ) [ "." StringDef ]
WrappedExternal ::= <FNNAME> "(" <VARNAME> ")"
If we automated it, additional functionality would be to verify that <STRING>s do not contain embedded variable references and ensure that <VARNAME>s in the S
tringDef rule only refer to variables defined by the Assignment rule in the same PHPProgram.
Obviously, the latter grammar is more complex. The first three rules in both grammars are very similar, but the former is only three rules. The latter requires seven rules. This makes us ask what is being gained -- at least in this case -- by using dynamic SQL instead of PreparedStatements.
My summary of the above examples: The author of this example cherry-picked both the specific filter-related API style and the level of abstraction shown here such that the example is misleading. It does not consider all the relevant levels and all the possible API styles. The details of this complaint and counter arguments can be found below. --TopMind
What is this? A signed evidence-less editorial by TopMind, inserted here loooong after TopMind has already been anonymously debating below?
Top, for someone who so frequently decries ArgumentFromAuthority (whether it actually is or not, and it usually isn't), this appears to be nothing more than an attempt to use what you believe to be the strength of your own reputation to back up your own previous arguments. In other words, you appear to be trying to use yourself in your own ArgumentFromAuthority!
The examples at the top of this page are merely one semi-formal proof of an obvious observation: On the client-side of a DBMS, PreparedStatements are simpler to audit than DynamicSql because DynamicSql construction requires filtered external data but PreparedStatement definitions do not. This is true whether PreparedStatement definitions or DynamicSql are being constructed inside some middle-ware, or exposed to the application developer.
Your "relevant levels and all the possible API styles" are irrelevant, because that simple observation -- and its semi-formal proof above -- holds true regardless what level we examine, or what API style is used.
It's not ArgumentFromAuthority because the details of the reasoning are given below, as stated. It's a summary of my criticism, and summaries are typically not the place to give detailed evidence and logic. I believe it's valid and fair to put such a summary in that spot. If a majority of 4 or more WikiZens side with you, I'll remove or move it. As far as the rest of your statements, they repeat already-given claims and are replied-to below. I won't repeat those replies up here in the spirit of OnceAndOnlyOnce. -t
Re: "...simple observation" -- Nothing you do is simple. You can complicate a period.
First of all, I don't consider PHP's prepared statements (PS) "typical". Microsoft, for one, usually expects a type indicator be given in their PS API's. Second, you used unnecessarily large function names for the wrappers/filters, making it look more verbose than it needs to. But these are mostly separate from the grammar processor issue.
You are using a variation of ArgumentByLabToy. The main problem is that functions and other language elements will probably need to be used anyhow for OTHER purposes (general factoring and code organization) such that a parser will have to be able to process them. If a parser can only process certain statements but not most or all, then the statements/syntax it cannot parse will often "derail" it, making it useless or unreliable. Typical programming language parsers don't have common sense and get thrown off track easily if not nearly perfect. Thus, even if we are only interested in 10% of the code for a particular usage, the parser still has to recognize 100% of the code (to be reliable) to properly recognize that 10%. A Guess-A-Tron can be built for either case if we accept "fuzzy parsing".
Also, it's not likely that all SQL clauses would be defined in one place for non-trivial apps. That's anti group-domain-related-things-together. You are grouping on processing technology (database), not on domain issues. Your parser has still gotta process functions, modules, and variable & naming scopes, dude.
--top
- Addendum: There is no reason to place them into functions unless they are called multiple times. Your "to make them equivalent" (paraphrased) is a fake requirement.
- Assuming that's so, how would you create the appropriate dynamic SQL? It's such a common requirement to have to invoke a given query with varying parameters that it's reasonable to define it in a parametric fashion. This is invariably true of interactive applications; perhaps it's less so if you're creating one-off data retrieval or reporting scripts.
- The "varying parameters" are the values of the variables. Perhaps I'm interpreting "dynamic SQL" different than you.
- How and/or where are you creating the dynamic SQL strings that use the values of the variables?
PHP's PreparedStatements are typical for PHP PDO, and as I was illustrating PHP, it seemed appropriate to use what's typical for PHP. As for the "unnecessarily large function names", the size of function names is irrelevant. What's intended to be compared is not the size of the PHP code, but the size of the grammar definitions. They are a better indication of code complexity (in this case) than code examples.
As for the above being "a variation of ArgumentByLabToy", my examples are structurally the same -- differing only in object names and query contents -- as real production code I've worked on or written. Whilst my grammar/parser examples could easily be automated, typically the only "automation" is getting a developer to audit it who hasn't written the code. In that case, the simpler grammar represents easier -- and therefore more reliable -- auditing.
You're right -- not all SQL clauses would be defined in one place for non-trivial apps. There may be multiple definitions like those shown above, associated with their relevant classes, modules, components, or whatever. In PHP they can (like with most other languages) be confined to their own source files so I don't have to process functions, modules, and variable and naming scopes, "dude".
Re: "as real production code I've worked on or written." -- Well, it does not fit typical structures I encounter and make. The number and complexity of mouse, keyboard, and eyeball movements would be roughly comparable between the two general techniques when in a fuller code-base environment. You seem to switch back and forth between a WetWare-based argument and a machine-based argument (parsing machine) as it suits a specific narrow argument. You seem to be cherry-picking your reference specimens per sub-argument to force your argument into a certain (misleading) direction. It's a debate technique you've used before.
- A "fuller code-base environment" differs only in volume, not structure. The grammatical complexity of the PreparedStatement example -- even with type declarations -- would remain constant, and simpler than the dynamic SQL example. This has nothing to do with "mouse, keyboard, and eyeball movements" and is based purely on an objective measure: code complexity measured by grammatical complexity.
- Your "grammatical complex" does not measure anything of importance to the real world. It's arbitrary. It's like counting periods for the hell of it: it doesn't mean anything.
- It measures something fundamental: the possible complexity, particularly the possible diversity, of code.
- I'm sorry, but I just don't see your point. I don't see how it matters to developers or automated testing itself in any practical sense. See ScienceAndTools.
- Grammatically simpler code is generally easier to audit for potential security issues than grammatically more complex code.
[Can you give an example of the typical structures you encounter and make?]
That could take a while to code up in a way that makes sense to readers. I don't feel like doing such today. PS's don't interest me enough to motivate me. I'd rather spend pseudo-coding efforts on DynamicRelational or MultiParadigmDatabase, which are far more interesting to me than PS's. If you want a PS's example, you'll probably have to wait. -t
- I thought you didn't use PreparedStatements.
- I meant "examples related to the PS topic".
- Then for something that by your own admission doesn't interest you, you've spent a lot of time defending your approach to not dealing with it.
- Sometimes it's good to take a break when it appears the debate is going around in circles.
- [You were being asked to provide something that would help prevent the circles. It's telling that you won't provide it yet continue the debate.]
[Top, I'll take that as a "no" until I've seen otherwise, just like your "wrappers done right".]
The default answer is NOT your position, but rather "unknown" until either side presents better evidence.
[The fact that you haven't provided them is the evidence.]
Exactly. I have provided evidence at the top of the page. That you, Top, would like to see "better" evidence is irrelevant; some evidence, any evidence, trumps none at all. We do not default to "unknown" merely because you don't agree with my evidence and lack any of your own.
When your parser was proven to be useless, you waffled back to WetWare, but didn't fill in any details on measuring WetWare. That's how I interpret your round-about "presentation".
And if we are talking WetWare, like I've said before, long positional parameter "lists" can get awkward. Positional parameters are often okay in small quantities, but a maintenance headache for larger lists.
How did you prove my parser "to be useless"? It's very useful. I've built such parsers and they're very useful.
What does positional parameters have to do with this?
PageAnchor same-01
Well, actually the wrappers could use positional parameters also. It's just an API design issue: a personal syntax preference. I personally see no code maintenance benefit in such, but if YOU like such, then fine, but don't have a PersonalChoiceElevatedToMoralImperative.
[Sure it could, but what does it have to do with anything being discussed on this page?]
You are really arguing about API design, NOT prepared statements, such that this grammar pissing contest resolves almost nothing directly related to PS's.
The so-called "grammar pissing contest" demonstrates that code defining PreparedStatements is simpler than code defining dynamic SQL, and so is easier to audit for potential security issues.
The syntax could be made nearly identical to the PS calls with some API rearrangement. Thus, a syntax-only analysis would produce nearly identical scores.
Indeed, an API could be designed to identically support dynamic SQL queries or PreparedStatements. Then the only significant difference between the two (performance issues aside) is that dynamic SQL has to rely on filters to prevent SQL injection. Thus, the sole risk mitigation against SQL injection is the quality of the filters. With PreparedStatements, there is no such risky dependency. It is for that reason that OWASP recommends PreparedStatements even over its own ESAPI filters.
You are wandering off the topic of syntax, grammars, and code style. We've been over the risk issue gajillion times in other topics and I see no reason to re-re-re-re-re-debate here. YOU may like repetition, I don't. My "same syntax" point stands. QED. Good Night.
Yes, I am wandering off the topic of syntax, grammars and code style, because they're irrelevant if we define -- as you suggested -- a common API that can work with either dynamic SQL or PreparedStatements. If we have a common API that can work with either dynamic SQL or PreparedStatements, why would we ever choose to use it with dynamic SQL?
I don't understand the question. Please clarify or restate. Perhaps I am not clear on what you mean by "dynamic SQL". I've talked about the need in non-trivial systems to have "libraries of SQL snippets" to mix and match SQL portions for different needs. For example, we may have snippets for product categories, and another set of snippets for locations/regions, and yet another for "report types/styles". These different factors will interweave because they are mostly orthogonal. I'm not sure I'd call that "dynamic" or not because implementation may vary per tool stack. How about "compos-able" as a working term? The granularity of a full SQL statement is often too large for typical modularization/re-use needs in medium and large organizations or complex apps. -t
PageAnchor Dynamic-SQL-Def-01
By "dynamic SQL", I mean constructing whole SQL queries by string concatenation, in which a SQL query is created as a text string composed of static text (e.g., "SELECT * FROM customers") and optional static string constants (e.g., "SELECT * FROM customers WHERE " + $southRegionWhereClause) and optional external data filtered to hopefully prevent SQL injection (e.g., "SELECT * FROM customers WHERE " + $southRegionWhereClause + " AND custID = " + sqlStringFilter($_POST['userInputID']) and the whole string is passed to the DBMS connection's "execute query" function.
Of course, the more secure alternative is to use PreparedStatements. They still allow SQL queries to be constructed by string concatenation. A SQL query can be created as a text string composed of static text (e.g., "SELECT * FROM customers") and optional static string constants (e.g., "SELECT * FROM customers WHERE " + $southRegionWhereClause), but no external data should be included. Where we want external data, we define a parameter -- e.g., "SELECT * FROM customers WHERE " + $southRegionWhereClause + " AND custID = ?". We pass the string to the DBMS connection's "create prepared statement" function and get back a handle to a prepared statement. When we want to execute the query, we invoke it by its handle, and pass an argument to the parameter specified by the '?'. E.g., prepStmt.Execute($_POST['userInputID']).
Regardless which approach we use, we can define an API that abstracts the underlying approach. In use, it might look something like this:
$qry = "SELECT * FROM customers WHERE " + $southRegionWhereClause + " AND custID = ?";
$handle = $dbConnection.CreateQuery($qry);
$handle.setParameterType(1, ParmTypes.String);
$handle.Execute($_POST['userInputID']);
The above could work with either dynamic SQL or PreparedStatements. If we're using dynamic SQL, setParameterType() determines what filter function gets used. If we're using PreparedStatements, setParameterType() identifies the parameter type.
So, if we have the same API for either dynamic SQL or PreparedStatements -- and we can see it's reasonable that the same API be used for both -- why would we ever use it with dynamic SQL?
I suppose you are going to argue that if the API's can be made pretty much the same, then the only decision difference maker is which one is more secure such that security becomes the pivotal factor (and SP's are the alleged security winner). That may be the case, but I not going to repeat the "security fight" here.
I haven't tested the API swappability between the underlying techniques (PS's versus filtering) in practice and so will not comment on that further. In theory we could have:
- Positional parameters with PS's underlying
- Embedded parameters with PS's underlying
- Positional parameters with filters underlying
- Embedded parameters with filters underlying
In theory (with what's known), it does not appear that syntax or grammar in terms of application programmer (database API programmers aside) or automated verification favor one or the other.
If a person or automated code analysis system does not detect that "outside values" are slipping in, then SQL injection can happen BOTH for PS's and filters. Thus your statement that filters are more vulnerable because one may "forget to wrap" or that it's easier for automated code analysis tools to verify is false. PS's do not prevent "slipping in" of outside values. One must inspect the app code to ensure such. Code inspectors won't necessarily face different code under each (PS's or filters) such that detection of one or the other is not easier or harder under each, as you originally claimed. You lost the "complexity/grammar" debate, bub. -t
We've already been over this one several times, "bub". The whole point of this page is that it's objectively simpler (based on grammatical complexity) to audit PreparedStatement definitions than dynamic SQL for inadvertent inclusion of external data. Also, what do you think is easier to do by accident: Include an external variable in a PreparedStatement definition, keeping in mind that accessing external variables normally requires special syntax -- like $_POST['varname'] in PHP -- specifically to prevent accidental inclusion? Or forget to wrap an external variable with a wrapper function?
Arrrrrg. How the hell is it "objectively simpler based on grammatical complexity" if the code can be nearly identical? I have no idea what your $_POST['varname'] example is about. It would "leak in" or not leak in the same way if the code is the same. Arrrrg. For example, using PS's does NOT prevent the following:
// Example post-01
$foo = $_POST['varname'];
...
$sql = "SELECT * FROM users WHERE " . $foo . " AND USERNAME = ? AND PASSWORD = ?";
$stmt = $dbc->prepare($sql);
...
I thought you were referring to the original examples. If we're talking about the API, then the process shown at the top of the page for auditing static strings is the same for any API. If you're following any conventional discipline for avoiding external tainting in Web apps, then it's much harder to inadvertently include an external variable where you don't want one, than to inadvertently forget to put a wrapper around an external variable were you do want one. Your example suggests freely intermixing SQL code and application code. Aside from the poor SeparationOfConcerns that represents, it's a potential security nightmare.
Why did you think I was "referring to the original examples"? I stated "Code inspectors won't necessarily face different code under each...". What did you think I meant by that? And I don't know what you mean by "conventional discipline". If you want to claim that different API design styles have different/greater risk (programmer mistyping) profiles, that's orthogonal to the PS versus filtering issue and probably should be split into a different topic and is one I may not choose to participate in.
I interpreted your "different code under each (PS's or filters)" as referring to different code. If it was the same API for both PreparedStatements and filters, why would there be "different code"?
An aspect of conventional discipline for developing secure Web applications is to isolate SQL definitions into their own modules, so as to reduce the likelihood of inadvertently tainting the definitions with external data. The examples at the top of the page are examples of aspects of that approach.
But the key point still stands: the usage of PS's versus filters does NOT by itself make for easier-to-inspect code, per accidental outside data leak-ins. API style/syntax design is technically irrelevant. You lost the "inspection" debate pure and simple. -t
As I pointed out above, if we have the same API for PreparedStatements as filtered dynamic SQL, then conditions are the same for both. Except, of course, PreparedStatements are more secure. Why, using such an API, would we choose dynamic SQL?
- Okay okay, but like I have to keep pointing out, that's not the key topic HERE. Please focus.
If we don't have the same API for PreparedStatements as filtered dynamic SQL, then it's easier to identify whether or not PreparedStatement definitions have been tainted by external data, because PreparedStatement definitions should only reference static strings and never reference external data. The corresponding dynamic SQL definitions are more difficult to audit because they necessarily must mix filtered external data with static strings, and also must ensure that all external data is filtered.
Please elaborate. How does it verify should-ness exactly?
See the top of this page. In particular, "If we wanted to, we could build a parser for that grammar to automate validation of PreparedStatement definitions. Additional functionality would be to verify that <STRING>s do not contain embedded variable references, and ensure that <VARNAME>s in the StringDef rule only refer to variables defined by the Assignment rule in the same PHPProgram."
The top of the page is comparing the apple API design to the orange API design.
How so?
What? They are obviously two different query API's (or at least "usage styles") to me, but the style choice is mostly orthogonal to PS usage, as I thought we already agreed. (Further, per above, it's questionable to wrap every query in a function, but I'll leave personal preferences to each person.)
Of course they're different APIs, but they accomplish the same thing: defining a query that incorporates external values for subsequent execution.
Sorry, I'm not understanding your point. I didn't dispute that they produce the same results. There are multiple API designs that can accomplish the same thing. That's not news.
You said they're "comparing the apple API design to the orange API design". What they're actually comparing is the fact that defining PreparedStatements as shown above -- vs defining dynamic SQL as shown above -- requires objectively less effort to audit.
As for wrapping every query in a function, how would you define a set of parametric queries outside of the main program logic?
I'm not sure what you mean. That appears to be an artificial requirement. Typically I'll only factor them to a function if they are reused at least 3 times (the same SQL called from 3 diff spots), or 2 if it's a large volume of SQL. No use putting everything in a function for the sake of putting everything in a function.
Every query that incorporates external data is a parametric query. If you define SQL queries in-line in your application code, you run the risk of tainting your SQL queries with external data. If you define SQL queries in their own modules, then you run much less risk of inadvertently tainting your SQL queries with external data. Let's assume you define SQL queries in their own modules to reduce the risk of inadvertently tainting your SQL queries with external data. How, then, do you define them so that they incorporate external data when you need them to?
Parametric versus in-lining is a code style/design issue orthogonal to PS's. Parameterizing can be done via functions and via "insertion point markers" such as question marks. I don't want to get into a code design battle here unless it clearly relates to PS's. But I would point out that positional (parameterized) can be difficult to work with when you have lots of fields/columns in a query. There can be a maintenance/reading costs to such. Whether that overrides other issues I'll leave to a shop manager/owner decision. As a coder it's not my job to make such tradeoff decisions, only describe and recommend. If you don't like their final decision, go bicker with them, not me.
Every detectable "mistake path" you've identified so far is tied to the API design/usage-style and not to the usage or non-usage of PS's. Complaints that a given API design/usage-style invites more app-coding mistakes compared to another is not related to PS usage versus non-PS-usage.
Indeed, we've already discussed this both above and below. A unifying API can be created to make definition and invocation of queries -- whether they're PreparedStatements or dynamic SQL with filters -- identical. In that case, the only distinguishing factor is security risk.
So what's all this crap about "grammar complexity" then?
It holds true if you don't use a unifying API. It's also holds true for the author of the unifying API.
Bad things happen if you use a bad API or mix same-purpose API's for no reason? Gee, that's news. Who would've guessed?
But that's not the point. The point is that there's never a good reason to prefer dynamic SQL with filters over PreparedStatements.
That's not the point of THIS topic as I understand it.
This page was spawned from PreparedStatementsDiscussionTwo, itself spawned from TopOnPreparedStatements. All contribute evidence to the thesis that there's never a good reason to prefer dynamic SQL with filters over PreparedStatements.
"Easier to validate" was the claim that spawned this. That claim has been flattened, and the "safer" issue is covered in other topics.
What do you mean by "that claim has been flattened"? It is easier to validate PreparedStatement definitions and they are safer. Of course you can wrap PreparedStatement and dynamic SQL definitions in an API that hides either or both, but that's always possible. For example, if you're comparing 'for' loops to 'while' loops, you can always hide the code you're considering inside a function called 'foo', but then you're no longer comparing 'for' loops to 'while' loops.
However, for the reasons described above, PreparedStatement definitions are grammatically simpler to validate -- even if the only place we need to validate them, in comparison to dynamic SQL definitions, is inside the aforementioned API.
Repeating of summary claims doesn't cancel the details of your failure.
Can you identify and defend the "details of [my] failure"? It appears you have no defence for choosing dynamic SQL with filters over PreparedStatements. Dynamic SQL with filters is grammatically more complex to define and audit (as shown on this page) and at greater risk of security issues. If we eliminate grammar complexity as a concern by defining a common API for dynamic SQL and PreparedStatements, dynamic SQL is still riskier than PreparedStatements because it's forced to rely on filters. Where is "[my] failure", here?
Bull! You did not show they "were grammatically more complex". API design is a personal preference and orthogonal to PS usage. Were are going around in circles again. I'm hopping off the Repetition Express.
The top of this page shows that raw dynamic SQL definitions are grammatically more complex than PreparedStatement definitions, using typical (PHP, in this case) DBMS APIs. If you disagree, please demonstrate otherwise.
I already told you twice. Take some medication for your Alzheimers, it's getting really fucking annoying to have to repeat the same shit over and over and over.
You may have "told" me, but you haven't demonstrated it. I don't see a counter-demonstration of grammar complexity that demonstrates how PreparedStatements are grammatically more complex or dynamic SQL with filters is simpler. Of course, it would be moot anyway -- even if dynamic SQL with filters were obviously simpler, who would so negligent as to risk client security for programming simplicity?
The opening examples could be used for either PS's, filters, both, or neither. You "grammatically measured" something orthogonal to the PS/filter decision.
How so? The first example is of PreparedStatement definitions, and the second is of dynamic SQL. How could they "be used for either PS's, filters, both, or neither"?
Perhaps I measured nothing, but YOU also measured nothing (on topic/relevant). I've said this already multiple times repeatedly redundantly, but it doesn't sink in for those with stubborn dig-in-heals personalities. (And there are other issues with actually applying your grammars {cough} to real tools and WetWare, but those are secondary.)
Why do you think the examples are not "on topic/relevant"? As for applying my "grammars {cough}" to real tools, I've used exactly the approach described at the top of the page to build auditing systems for real code, so I'm not sure what your "other issues" might be. Could you explain?
They are NOT relevant because the top examples are a syntax/API choice that is orthogonal (independent) of the choice between using PS's or filters. You are arguing code design choice, not PS's versus filters. Either can use either style (or other styles, since your choices suck, but that's another story).
What the examples illustrate is a fundamental difference between PreparedStatement definitions and dynamic SQL definitions: The former should never contain references to external data. The latter must contain reference to external data. This is true regardless of syntax/API choice, up to the point that we define an API that unifies PreparedStatement and dynamic SQL definitions. Of course, in that case, the definitions should never contain references to external data.
- I'm sorry, but again again again, I don't see how they are illustrating that. You seem to be using "references to external data" in a way that means something very different to you than it does to me. The opening examples are comparing API styles, NOT PS versus filters. (Granted, the out-of-the-box API's may "encourage" a certain style.)
- By "references to external data", I mean (for example, in PHP) references like $_POST['userinput']. They must never appear in PreparedStatement definitions. If they appear in dynamic SQL definitions, they must always be filtered. This isn't an issue of APIs, because it is true right down to the SQL level. Regardless what API or process generates the SQL sent to the DBMS, we have to ensure both statements hold true -- to the best of our ability -- in order to maximise security.
- I'm still not seeing what this has to do with opening examples. You are just restating the same things over again. I think I shall give up. If somebody else knows what you are talking about and wish to restate it a different way, please do. Otherwise, I am giving up for now. Writing a code parser to "check" either API style (which is orthogonal to PS-versus-filters) appears to be generally the same amount of effort to me. Signing off... -t
- The opening examples objectively demonstrate that PreparedStatement definitions are simpler to validate than dynamic SQL definitions. This is because PreparedStatement definitions can be constrained to only reference static strings. Dynamic SQL definitions can also reference static strings, but they must reference external data too.
- "Can be constrained" by what? Not by the PS API's themselves.
- [They are constrained by the grammar mentioned right after the prepared statements in PHP example at the top of this page.]
- But you are not showing the whole stack and one can use Example 1-A app-side style for the filter approach also. If you are also referring to deeper in the stack, then you need to count deeper PS-related code also. (I reject your use of the term DynamicSql.)
- As given, Example 1A and Example 1B show "the whole stack" -- at least in terms of what's actually creating DynamicSql or PreparedStatements -- because the examples assume the SQL is being sent directly to the DBMS, via low-level DBMS interface APIs like ODBC or JDBC. To use "Example 1-A app-side style for the filter approach", there must be some translation layer that's converting Example 1A's PreparedStatement syntax to dynamic SQL -- such that Example 1A is no longer an example of PreparedStatements, but an example of PreparedStatement-syntax being converted to dynamic SQL. Thus, DynamicSql construction is again shown to be more complex than PreparedStatement definition construction.
- Note that even IF SQL and app code are kept separate, it may make sense to put the compose-able SQL snippets into functions/methods/classes for typical modularization needs rather than just be "global". Thus, a code-checker/parser will still have to recognize functions/methods.
- I doubt it would make sense to do so. A chain of static strings doesn't need functions/methods/classes for modularisation. I've created query definition modules containing 500+ queries. Static strings were sufficient.
- Knowing that the type of projects you work on tend to be different than my usual niches, I'm a little skeptical. To handle all the zany business logic and legacy quirks, the usual programming constructs come in handy. A big chain of strings will probably be too limiting by itself unless one chooses to live with a certain level of duplication. But we can LetTheReaderDecide: if you, the reader, can find a way avoid the need for functions/methods in your SQL definitions, then writing a parser to validate the sub-strings will probably be easier. There; we no longer have to bicker: the ball is in the reader's court. Done!
- I've encountered plenty of zany business logic and legacy quirks, but none so zany that a set of static SQL query definitions couldn't be defined entirely from static strings. I suppose it might occur on rare occasions that creating certain definitions could benefit from a function or loop. For such exceptions a definition might have to live apart from the majority, in its own module, and be audited manually.
- "Can be" and "better designed with/without" are two different things. Anyhow, without actual specimens, this is merely an AnecdoteImpasse. Let the reader pick.
- [So why don't you give us an example of where it would be too limiting? That should either break the impasse or show that it wasn't as limiting as you claim.]
- Take a corporate computer-supply company that sells both software and hardware. There may be organizational separation between software and hardware such that the software group writes and manages most of the software-related SQL and the hardware group writes and manages most of the hardware SQL. We wouldn't want to mix these in the same file. A typical way to separate them is to put them in an OOP class and put the classes in different files. Thus, a code parser (for reference inspection) would need to be able to at least parse classes. The same applies for other file-splitting by aspects: like report type. And if we need conditionals, it may be better to wrap some of these in methods with either IF statements or polymorphism. For example, if the target date is earlier than a certain date, then maybe an archive table is used instead of the production table (or both tables if the range overlaps).
- [It's a start. Now what's the advantage of putting them in the OO classes? Can you give as a sample of what the OO classes would be like?]
- I'd have to create a fairly extensive sample snippet set and descriptions. We are back to that issue again. I believe it's reasonable to expect the use of typical modularization constructs when needed or expected per shop conventions. To up-front say "you cannot use classes/methods/functions" or the like is silly and I believe most readers will agree: let us use our fricken programming language. Just because a "flat" approach worked for you once does not mean it will always work. Beyond that, I'm kicking this back (again) to LetTheReaderDecide for now. -t
- [What issue are we back to again? My problem with your scenario so far is that it's too vague to tell if it supports your claim about static strings being too limiting. I'm asking you to flesh it out enough to actually make such a determination.]
- That criticism applies to both sides. I can't see your alleged app either and have to take your word for it.
- How does it apply to both sides? Aside from the quantity of queries (and their content, of course), my "typical set of PreparedStatement definitions" at the top of this page are identical to production code. Various files like the "typical set of PreparedStatement definitions" are created and included (using PHP's 'require' statement) where needed. That covers precisely the situation like the one you described, with multiple sets of non-overlapping query definitions like your "software-related SQL" and "hardware SQL".
- One can't verify one example is comparable to realistic production code; that's merely a claim. (And, not all dynamic languages allow simple "include" or "require" of files and need to use classes or functions.) Why quibble? Just let the damned reader decide. They know their shop style and needs. If you want to argue for arguing sake, argue with Siri. I bet even Siri would slap you eventually, you are one obstinate human being.
- How much verification do you need in order to believe it's "comparable to realistic production code"? Isn't it obvious by looking at it? It's true that not all languages allow a simple "include" or "require" like PHP, but the fundamental principle here holds no matter what the language. The principle is this: Prepared statement definitions should never include external data; dynamic SQL definitions must include filtered external data. Thus, the presence of filtered external data in the latter but not the former -- assuming all else is equal, which it is -- means the latter is inherently more complex. QED. As for me being "obstinate", do you really expect me to change my views simply because you're annoyed by my consistency?
- Sorry, but it's not obvious to me. Something key is missing from your explanation or you are assuming something unstated that I don't. Any must-ness you lay out must still be verified.
- The "must-ness" is inherent in the difference between PreparedStatement definitions and dynamic SQL definitions. The former separates definition from passing external data arguments to parameters, so PreparedStatement definitions must never contain external data. All external data is passed to parameters, but doesn't have to be filtered. Dynamic SQL definitions must contain external data -- otherwise, they're not dynamic SQL -- and that external data must be filtered. Thus, the latter is inherently more complex than the former.
- PS's can (inadvertently) accept external data. (At least many versions can.) With PS's you can use (direct) parameters but are not forced to use parameters. That's what triggered this very topic: automated code inspection to check for such since usage of PS's ALONE did not prevent such coding mistakes. Why the heck are we going around in circles? It's getting frustrating.
- Indeed, the whole purpose of this page was to show that it's easier to verify that PreparedStatements have not inadvertently accepted external data, than to verify external data in dynamic SQL has been correctly and consistently filtered. As for "going around in circles", I'm simply refuting your points. If you keep making the same points, I'll keep providing the same refutations.
- Sorry, I don't see how they are refuted. You seem to be refuting a different issue each time from my reply. After 5 tries perhaps its time to call it quits. A 6th loop is not likely to work; I know this from other debates with you. If another WikiZen with a clearly different way to describe such views wants to give it try, please be encouraged. Otherwise, your peculiar writing style is getting on my damned nerves. Please stay out of the publishing business unless you have stock in Grecian Formula and your goal is to increase its value.
- If I'm "refuting a different issue each time", then how could we be "going around in circles"? Wouldn't "going around in circles" imply that I'm refuting the same issue each time?
- It's kind of like: I explain sub-topic A, but you refute sub-topic B, and then I reply about B, then you refute with C, and I then reply about C, and you refute with an issue related to A. That's an over-simplification, but generally appears to be what's happening, from my perspective.
- That sounds very frustrating. What do you think could be done to reduce the frustration?
- Force you to take your memory pills using Clockwork-Orange-like techniques so that you don't forget prior replies.
- That's nothing but an AdhominemAttack. It does nothing to help your position.
- What is peculiar about my writing style? If you can point it out, perhaps I can change it.
- I've tried to explain in the past, but it doesn't seem to work. English just plain seems to lack enough precision.
- Interesting. You find my writing style "peculiar" enough to be "getting on [your] damned nerves", but "English just seems to lack enough precision" to explain why my writing style is getting on your nerves?
- You rely either too much on words, or too much on the same words. Diagrams, data structures, pseudo-code, etc. could be used to supplement the words, or different words could be used to say the same thing from a different perspective. The "tools" you use are not powerful enough to clearly communicate what you want to communicate, but you don't experiment much with different tools.
- This wiki is text-based, so use of diagrams is awkward and limited. I'm also not sure how diagrams would help on this page, and it appears there are more of my code samples here than yours. I'm not sure what I could say differently, either. I'm not a technical writer, any more than most programmers/academics/technicians, so perhaps you could make a greater effort to understand what I'm writing and ask questions if something is unclear?
- Asking you questions just seems to turn into variations of FractalVagueness. I cannot "see" the actual models in your head and so cannot offer specific suggestions to turn them into better descriptions.
- Have any of your questions on this page turned into FractalVagueness?
- One question: is your argument here based solely on (allegedly) not having to parse functions?
- No, my argument is based on the fact that PreparedStatement definitions require the absence of any external data, whereas dynamic SQL definitions require the presence of filtered external data. The former is simpler to validate than the latter. Having to parse functions or not provides an illustration of that which is particular to a given language (PHP), but I'm sure similar examples can be found in every language.
- No, they do NOT require the absence of external data, as I interpret that phrase. See Example post-01.
- Yes, PreparedStatement definitions do require the absence of external data. Introducing external data into PreparedStatement definitions is unacceptable due to the risk of SQL injection. The grammar at the top of the page is specifically designed to limit PreparedStatement definitions to static strings. Your example post-01 would not successfully parse, because $foo = $_POST['varname']; is not admitted by the grammar.
- You mean the grammar of the custom-rolled code checker?
- Perhaps. I mean the grammar immediately following "The grammar, in modified EBNF, is:"
- Okay, but I thought we clearly established that the app code being checked by such a grammar could be pretty much the same under both filtering and PS's. It's a matter of API design choice, NOT whether PS's or filters are being used and thus the "grammar issue" is orthogonal to PS usage.
- We were discussing the difference between PreparedStatement definitions vs dynamic SQL definitions, not the use of APIs that hide both under a common interface. You can hide almost any coding concern behind an API, so such APIs are moot.
- But we are validating the app code for "leaks", not the API. Your grammar is for checking app code, not for checking API's, correct?
- Yes, my grammar example is validating application code, not the API for defining PreparedStatements or dynamic SQL. However, even if you're implementing an API that is a wrapper around PreparedStatements and dynamic SQL, the same concerns hold true.
- No. Your parser could be checking the same thing under both approaches and not "know" the difference because it's at a level below what the parser is checking. Thus, the grammar thing is moot. I thought I've said this like 4 times already. The differences would or could be at the API/wrapper level, NOT the application code. Thus, if your checker is not checking the API/wrappers, then the app code could be identical and the checker wouldn't detect any difference or need different grammar rules. Separate issues. Your checker is checking level X, but the (required) differences are at level Y, thus the concern or design differences will not be visible at level X and so the checker will not see or need be concerned about (designed different for) the differences at level Y.
- I have no idea what you're trying to say. The examples at the top of this page are based on production code. Perhaps you could create similar examples to illustrate your point?
- Arrrg. The examples at the top are an arbitrary design choice. There are different ways to set-up/design the database access API/wrappers. I will agree that that particular design choice shown is easier to parse-check, but that does not mean all are. (Whether one should code around best-fit for human readers or automated code checking tools is a design trade-off that I won't go into here because it's off-topic.)
- How are they an arbitrary design choice? How do you define dynamic SQL queries? How do you define PreparedStatements?
- I'm not understanding the question. The query API/wrappers that an application programmer sees can be designed a myriad different ways, correct? Note that I am not talking about the implementation of PS's themselves, but of middle-ware between the PS's and app programmer, or lack of, if one want's to go direct. The automated app checker would not be checking the middle-ware nor the PS implementation, but just the app code, correct?
- APIs/wrappers can indeed be implemented in innumerable ways, but whether it's inside an API or exposed to the application developer, the following three statements are true at the interface to the DBMS: PreparedStatement definitions must not contain external data. Dynamic SQL definitions must contain filtered external data. The former is simpler to validate than the latter. These statements are true and some programmer has to deal with them, whether it's the API developer or the application developer.
- Bull, your claim has been debunked for anybody without Alzheimers.
- Don't you think it would be more effective to reference and/or summarise said debunking than resort to an ineffective (and rather childish) AdhominemAttack?
- We've been over these same points multiple times, I'm getting frustrated. LetTheReaderDecide, I believe it's pretty clear you are flat wrong and too stubborn to accept your wrongativity. Fuck off!
- Why are you getting frustrated? Do you disagree that the following statement is true at the interface to the DBMS: PreparedStatement definitions must not contain external data, and dynamic SQL definitions must contain filtered external data?
- I already told you why I am frustrated, Mr. Reagan. Example post-10 shows you are flat wrong. Search for "must-ness" on this page for more.
- There is no post-10. Do you mean post-01? Why do you think it shows that I am "flat wrong"? It supports my point, because it effectively demonstrates why PreparedStatement definitions must not contain external data. It's precisely the sort of PreparedStatement definition that must not be allowed, or at least should be identified as unacceptable through auditing.
- [It can also be reliably detected by the grammar mentioned right after the prepared statements in PHP example at the top of this page.]
- True, but that's orthogonal to PS-versus-filter usage, as already explained multiple times. And as a reminder, the example shown at the top of the page is one of many possible API/middle-ware designs that are also orthogonal to under-the-hood PS-versus-filter usage. It tells us nothing that's on-topic. I have not seen anything that would be checked by stated app-code-inspection tool that is clearly dependent on the choice between PS's and filters. You appear to be conflating middle-ware interface design choices with "absolute" under-the-hood database interaction issues. -t
- [That is the topic at hand. With the prepared statements, we can require that the statements passed to the database's API come solely from string literals or from the concatenation of string literals. This prevents user-supplied data from becoming part of the statements. Attempts to use statements that don't come from these two sources can be reliably detected by the aforementioned grammar. When using dyanmic sql, the equivalent task (in terms of security) is to come up with a set of rules that can be reliably enforced (read machine-checked) that detects the absence of the appropriate filters. Every way I can think of to accomplish that task is significantly more complex.]
- Well, every way I think of, what a code-checker inspects is orthogonal to the issue of using PS's versus filters. Show me a single scenario (coded example) where this is not the case (not dependent on some vendor-specific oddity).
- [The example at the top of this page is such an example.]
- Arrrrg! No! Example-1A and Example-1B can each have either PS's or filters under the hood (behind the API's) as one chooses. Filters can be built behind "?" insert markers and vice verse, for example. Those examples are (or can be) an interface only. They are an arbitrary middle-ware interface CHOICE only. (They may reflect an out-of-the-box default, but we are not stuck with an out-of-the-box default.) See PageAnchor same-01 for essentially the same point.
- Yes, you can hide PreparedStatements and dynamic SQL behind some unifying API, but then you're no longer talking about PreparedStatements and dynamic SQL. You're then talking -- at minimum -- about some higher-level parametric query abstraction (e.g., PHP's pg_query_params(); see http://www.php.net/manual/en/function.pg-query-params.php), and possibly going as far entirely abstracting away all SQL. For that reason, elsewhere on this page, we discounted talking about wrapping PreparedStatements and dynamic SQL in arbitrary APIs. What we're talking about here -- as pointed out above -- are PreparedStatements and dynamic SQL as we usually know them, at the interface to the DBMS. That typically means ODBC, JDBC, some native DBMS interface, or a thin abstraction layer like PHP PDO or Perl DBI. These commonly-used APIs expose PreparedStatements and dynamic SQL as SQL definitions. In these, we must ensure PreparedStatement definitions do not include external data and ensure that dynamic SQL includes filtered external data. It is for these that Example-1A and Example-1B are directly applicable. Since any higher-level query API will almost invariably be built on a low level interface to the DBMS, ensuring that PreparedStatement definitions do not include external data and ensuring that dynamic SQL includes filtered external data becomes a concern of the API developer. If no higher-level query API is used -- which is very common -- ensuring that PreparedStatement definitions do not include external data and ensuring that dynamic SQL includes filtered external data becomes a concern of the application developer. In either case, the same concerns are present and, therefore, the examples at the top of the page are applicable; the only difference lies in which developer deals with those concerns.
- Re: "as we usually know them..." -- What do you want exactly? Different API designs will be easier or harder to check by automated code-checkers. if you want to simplify automated code checkers then you'd want to change the API's. That's a design trade-off decision. I'm not picking an API design horse in this race, only saying that your examples and checker complexity metrics are dependent on the query API's used far more than whether PS's are used or not.
- By "as we usually know them", I mean typical SQL-based definitions of PreparedStatements or dynamic SQL -- like those at the top of this page -- which are typically sent to the DBMS using ODBC, JDBC, some native DBMS interface, or a thin abstraction layer like PHP PDO or Perl DBI. It's not the API calls but the PreparedStatement and dynamic SQL definition strings -- again, like the examples at the top of this page -- that have to be audited to ensure that the former don't contain any external data and that the latter do contain consistently-filtered external data. Thus, the APIs are irrelevant. Note that the examples at the top of the page don't require any API calls, except for the filters in the dynamic SQL example.
- They are NOT "irrelevant" because their design affects the complexity needed by your code checking gizmos, which is the main topic here, as I understand it. Your metric will pivot on the API/middle-ware design/choice, not on PS-ness. Whether the top examples are "typical" or not is a different issue than absolute statements about the complexity of a code checker. I'm NOT here to argue typical-ness either way. If a shop wants to avoid the use of functions in order to simplify a code inspector, they can by altering/wrapping the app API to the databases, regardless of whether they are using PS's or filters under the hood. (Whether that's a good trade-off or not is another issue. I'm only saying the option is there.)
- The design of DBMS APIs like ODBC, JDBC, native DBMS interfaces, or thin abstraction layers like PHP PDO or Perl DBI do not affect the complexity needed by my "code checking gizmos". Whatever low-level DBMS interface we're using, the SQL definition strings -- like those at the top of the page -- are the same. It's the SQL definition strings we're auditing using my "code checking gizmos", not the DBMS APIs. Low level DBMS APIs -- that expect SQL definition strings like those at the top of the page -- are invariably used even within a higher-level API. Therefore, the examples at the top of the page are relevant and the DBMS APIs are irrelevant.
- Re: "It's the SQL definition strings we're auditing using my "code checking gizmos", not the DBMS APIs." -- Since the way SQL definition strings are defined is interchangeable under PS's and filters, what does that have to do with PS's then? If it's not about PS's, then what the hell is it about? Why is it so hard to get to the bottom of your point? Jeez.
- No, it's not that the SQL definition strings are "interchangeable under PS's and filters", it's that the SQL definition strings are the same regardless what low-level API we're using to access the DBMS. It doesn't matter whether we're using ODBC, JDBC, native DBMS interfaces, or thin abstraction layers like PHP PDO or Perl DBI -- the SQL definition strings will be the same. That does not mean, however, that the SQL definition strings for PreparedStatements are the same as the SQL definition strings for dynamic SQL queries. Far from it! PreparedStatements separate definition from execution, so external data must only be present (as parameter arguments) when a pre-defined PreparedStatement is executed; external data must never be part of a PreparedStatement definition. Dynamic SQL combines definition and execution, so filtered external data must be part of the dynamic SQL definition. This fundamental difference between PreparedStatements and dynamic SQL queries -- which is manifest at the low-level DBMS API, regardless what higher-level APIs may be used -- is what this page is about. My point -- demonstrated at the top of this page -- is that it's technically simpler to verify that PreparedStatement SQL string definitions do not contain external data than it is to verify that dynamic SQL string definitions contain filtered external data.
- Your words are not clear to me and the code example at the top of the page does not demonstrate what you claim to demonstrate as I interpret your claim. For one, it's not clear what you are labeling as "dynamic SQL" and where the boundary of that designation ends in your mind. I'm growing tired of trying to figure out your point. It seems pretty damned clear to me that checking whether "external" values are leaking into SQL management API's (for lack of a better term) is orthogonal to the use of PS's or filters under the hood (lower levels). The code checker won't necessarily be checking that lower level. They are clearly different concerns to me. What the code checker inspects is not dependent on the usage of PS's versus filters--it's that fucking simple! You may have a point if you go with "as typically used in practice", but I don't necessarily disagree with that, but that is not the original claim as I understood it. I'm going to abandon this debate for now and LetTheReaderDecide on their own. I'll let my case stand as is. Here are the hard facts: -t
- 1. What the code checker inspects is not dependent on the usage of PS's versus filters (assuming we are not measuring "industry habits", per #5 below.)
- Actually, it is. PreparedStatement definitions must not include external values. Dynamic SQL -- i.e., any non- pre-prepared query dependent on external values -- must include filters on external values. If you're not clear about the distinction between PreparedStatements and dynamic SQL, see any SQL manual, any DBMS API in any popular programming language, and/or the examples at the top of the page.
- Example post-01 proves you wrong. If by "must", you mean developer/shop policy, well that's a different off-topic issue. Please clarify where this must-ness happens. I see nowhere where it is machine-enforced.
- I'm not sure why you think post-01 proves I am wrong, when it does precisely the opposite. It's an illustration of tainting a PreparedStatement definition with external data, which must be avoided. You appear to interpret "must" as meaning "is always technically enforced". That's not what is meant by "must". What is meant by "must" is that it is necessary that PreparedStatement definitions not include external data, and that dynamic SQL include filtered external data, and that this be enforced. That does not necessarily mean it's technically enforced. It could be enforced entirely by human effort -- through manual code audits, for example -- or by some combination of technical means and human effort, such as by using the code checkers suggested at the top of this page. Failure to enforce this -- whether due to technical limitations or human choice (which is what I presume you mean by "developer/shop policy") -- results in serious security risk.
- You appear to be rambling. If you are talking about human policy, it's off topic. To double clarify, an app code checker, whether human or machine, will have the goal of making sure that any external values are fed into field-oriented API calls not the SQL-string-construction API calls REGARDLESS of whether we are using PS's or filters under the hood. Under both PS's and filters, we don't want external values to "sneak in" via the SQL-string-construction side. This must-ness applies to both and is thus not a difference maker and is thus off-topic other than making sure the base assumptions are clear if somebody forgets.
- Regarding "rambling", I mentioned "human policy" only because you mentioned "developer/shop policy". I presume that's a "human" policy, no?
- Whilst it's true that when using both PreparedStatements and dynamic SQL, we "don't want external values to 'sneak in' via the SQL-string-construction side", the point this page demonstrates is that determining whether or not unfiltered external values might "sneak in" is technically simpler with PreparedStatement definitions than it is with dynamic SQL.
- You keep claiming that, but I just don't see the evidence. The code samples are not definitively shaped by whether PS or filters are used under the hood and thus tell us nothing definitive between the two techniques. I don't understand why you think those two samples back our claim.
- The code samples are shaped only by whether PreparedStatements or dynamic SQL (I presume that's what you mean by "filters"?) are used. If we're defining PreparedStatements, the definitions must (for example) never contain anything like $_POST['externalData'], whereas if we're creating dynamic SQL queries, they must contain something like sqlNumber($_POST['externalData']).
- Wrong. The code checker is only verifying that the app code is feeding the "right stuff" into the right API methods, it doesn't "know" whether PS's are used and that detail is hide-able below what the app coder and app checker have to be concerned with. Abstraction is kicking your examples right smack in their nuts.
- It's not wrong. We're assuming -- as has been stated multiple times -- that we're explicitly using low-level DBMS APIs like JDBC, ODBC, Perl DBI, PHP PDO and so on, so there is no "detail [that] is hide-able below what the app coder and app checker have to be concerned with." The only issue here is the complexity distinction between PreparedStatement definitions vs dynamic SQL, i.e., that the latter is more complex because it necessarily contains filtered external data that the former must not contain. Fundamentally, what we're talking about are differences in SQL code; particularly the impact of those differences on the client-side in terms of defining strings to specify the SQL code.
- "We're assuming..." yeah, that's the problem, as has been stated multiple times. Stop assuming. It's an arbitrary assumption and is not forced on an application coder by some inherent law of the universe or logic. You randomly pulled that assumption out of your ass. (Unless you intend to measure popularity of API design or something, which appears off-topic to me.)
- It's a reasonable assumption, because the distinction between PreparedStatement definitions vs dynamic SQL is present in every SQL DBMS that supports PreparedStatements. Since virtually every DBMS supports PreparedStatements, it is manifest in virtually every low-level DBMS interface, regardless what higher-level APIs may be used on top of it. Since higher-level APIs are invariably based on low-level DBMS interfaces, it is an issue for every system that accesses a DBMS. As has been pointed out multiple times, it's true that the concerns may not be "forced on an application coder", but they are certainly forced on some coder -- such as (in the case of higher-level APIs) the API author.
- And please stop using "dynamic sql" unless you clearly define it. Both examples are using dynamic sql as I interpret the phrase.
- "Dynamic SQL" is the standard, and conventional, term. It refers to the fact that the query text varies depending on external data, i.e., it's dynamic and constructed using string concatenation of static snippets and dynamic (usually external) data. PreparedStatements are static; their definitions either do not vary or (at worst) are constructed dynamically from strictly static snippets. Dynamic data should never be used to construct a PreparedStatement, and is only ever passed as parameter arguments at the point where a pre-defined PreparedStatement is executed.
- There is no official canonical standard. Stop making up fake standards. And whether the PS engine under the hood uses a predefined query ("static sql"?) or creates a new one is a low-level issue that is below what the code-checker would be looking for. The code checker is only checking what's fed into the API's, not how PS's or PS-related API's process SQL under the hood.
- There is a de facto standard, which is common use: Google "dynamic SQL" and note the approximately 971,000 results that agree with my usage. As for how "PS's or PS-related API's process SQL under the hood", it's been pointed out repeatedly that we're assuming JDBC, ODBC, native interfaces, PHP's PDO, Perl's DBI, etc. How they "process SQL under the hood" is well understood, i.e., they send the SQL to the DBMS.
- 2. Function usage can be removed from what the code inspector checks under filter usage if so desired (being that functions were adding to your "grammar complexity score" for lack of a better name.)
- Yes, you can conceivably remove the function definitions, presuming you can find a way to parameterise the queries (I'll leave that to you) whilst guaranteeing that parameter values are filtered. The "grammar complexity score" will still be higher for dynamic query definitions because of the filter invocations that are not required by prepared statement definitions.
- No, the score won't inherently be higher. It could potentially be identical even, per #3. (And yes, it is possible to paramaterize the queries in a fashion similar to typical out-of-the-box PS-related API's: question-mark insertion diddling. Now whether that's a developer-friendly design is another issue.)
- Yes, the score would inherently be higher. Remember, we're talking about SQL as sent to the DBMS, not higher-level APIs. In defining the SQL to be sent to the DBMS, PreparedStatements must be defined from, and only from, static strings. Dynamic SQL queries must be defined from static strings plus filtered references to external data. Filtered references to external data plus static strings (i.e., dynamic SQL query definitions) is inherently more complex than just static strings (i.e., PreparedStatement definitions).
- Your code checker does not appear to be checking for "SQL as sent to the DBMS". You are making no sense.
- My code checker would check SQL strings intended to be sent to the DBMS. For example, the PreparedStatement definitions might be sent to the DBMS for creation via PHP PDO's "prepare" method. (See http://www.php.net/manual/en/pdo.prepare.php) The dynamic SQL examples might be sent to the DBMS for execution by PHP PDO's "exec" method. (See http://www.php.net/manual/en/pdo.exec.php).
- Both scenarios would be doing that. It's not a difference maker that I see.
- The "difference maker" is that the PreparedStatement definitions are checked to make sure they're constructed strictly from static strings and nothing else, but the dynamic SQL definitions will be constructed from both static strings and filtered external (and dynamic) data.
- You seem to be mixing up under-the-hood issues with app-side checking issues.
- Here, that appears to be a random comment unrelated to my point. Did you intend to put it somewhere else? If not, I don't know what you mean by "mixing up under-the-hood issues with app-side checking issues". The only issue here is the relative complexity of typical PreparedStatement definition strings vs dynamic SQL definition strings, as used in typical (say) Web-based PHP applications.
- 3. What the code-checker inspects can be the same thing under either PS's or filter usage (via interface design choice of the DB-access middle-ware). (There may be DB-vendor-specific subtleties that vary, but I'm only looking at generalities.)
- Yes, but this (again) falls into the category of hiding concerns behind APIs. You can hide everything inside a function called DoIt() that does everything, and thereby eliminate any concerns at all.
- Good good, you are starting to get it!!! A key gear is starting to turn in your head, creaking from disuse. In this case I am assuming a separation of app concerns from query API/wrapper design concerns. In this scenario/topic the "code checker" only checks app concerns, not API/wrapper concerns. There is no reason to have the code checker check the query API's, for they won't typically have app-specific I/O ties (such as form variable access), and should be vetted via unit tests etc. (It wouldn't hurt to check the API code for things like "$_POST" though.)
- Are you sure it's not you who's "starting to get it"? I've already pointed out several times that higher-level APIs are moot. From above, for example: "We were discussing the difference between PreparedStatement definitions vs dynamic SQL definitions, not the use of APIs that hide both under a common interface. You can hide almost any coding concern behind an API, so such APIs are moot."
- No, we are discussing automated app code checkers and they won't be checking app-blind API's. If the API's are/grow app-specific, THEN they are subject to checking. Thus we cannot hide everything behind them because if we hide app-specific stuff behind them, those "insides" are then subject to automated code inspection.
- What are "app-blind API's"? I've already pointed out that the code checkers don't check APIs, they check string definitions. I'm afraid I don't understand the rest of your comment. Could you explain, perhaps with examples?
- Words are indeed failing us and it would probably require a fairly extensive coded example to explain my point. I don't wish to invest that time right now and so will bail on this topic.......for real this time. -t
- Surely a simple example wouldn't take much time or effort?
- Knowing you, I doubt it would be quick. Communicating with you has never been easy even with runnable examples.
- So your answer is no, you won't provide an example? Perhaps you're finding communication difficult because you're inclined to make assertions like "automated app code checkers [...] won't be checking app-blind API's" without backing them up with examples?
- What? You mean you intended to have it also check middle-ware API's?
- It doesn't check any APIs. Example-1A demonstrates the complexity of checking string definitions of SQL PreparedStatements. Example-1B demonstrates the complexity of checking string definitions of dynamic SQL queries. What are "app-blind API's"?
- The filter technique (without PS's) can use the Example-1A API style if the shop so desires. Do you agree with this statement?
- No, because that would require some database abstraction layer to process the SQL string before it's sent to the DBMS, which is not the scenario illustrated by the examples at the top of the page. As has been pointed out several times, the examples define SQL strings to be sent as-is to the DBMS, typically using low-level APIs like JDBC, ODBC, etc. If we allowed such an abstraction layer, it wouldn't be correct to call the first set of examples PreparedStatements, because they wouldn't be SQL PreparedStatements -- they would be PreparedStatement-like text that is converted to dynamic SQL.
- So you are dictating the middle-ware? Why? That's arbitrary.
- No, "the middle-ware" is irrelevant. We're examining the difference between the construction of SQL PreparedStatements vs the construction of dynamic SQL. It's about the construction of SQL constructs, not what APIs the SQL might flow through on the way to (presumably, but not necessarily) the DBMS. Because we've clearly indicated that Example-1A is a PreparedStatement, it strongly implies typical SQL PreparedStatement behaviour, i.e., that definition and execution are separate, therefore external data must be excluded from the definition. Because we've clearly indicated that Example-1B is dynamic SQL, it strongly implies typical dynamic SQL behaviour, i.e., that definition and execution occur in one step, therefore external data must be included and filtered. This has nothing to do with how -- or even if -- said SQL winds up in the DBMS.
- This paragraph is not clear to me. I still don't know what YOU mean by "dynamic SQL" and I'm going to stop trying to figure that out. And why are you saying "strongly implies"? What does human guessing have to do with machine-based code inspection? We are not building a guess-a-tron.
- "Dynamic SQL" is any SQL statement that is constructed and executed at run-time from a concatenation of variables and/or varying values, and static strings. I.e., the SQL that is executed depends on the values of variables. As for "strongly implies", I'm referring to what you, the reader, should infer from the terms "prepared statement" and "dynamic SQL", not what a machine-based code inspector should do.
- Let's rework the names:
<?php
// Example-1C
$snoffClauseStatus = "s = 'available'";
$snoffStmtDef01 = "SELECT * FROM piddle WHERE p = ?";
$snoffStmtDef02 = "INSERT INTO fizzle(x, y, z) VALUES(?, ?, ?)";
$snoffStmtDef03 = "SELECT * FROM piddle WHERE " . $snoffClauseStatus . " AND p = ?";
$snoffStmtDef04 = "SELECT * FROM furgle WHERE " . $snoffClauseStatus . " AND q = ?";
?>
.
- There! No mention of stinking PS's and your code checker gives it the same complexity score. QED!
- [Congratulations. You have shown that PreparedStatements are just as complex as PreparedStatements.]
- Dumbass! Who said anything about PS's being involved anywhere? Filters can be processed through positional-parameters-based-on-question-marks also, as already discussed like 4 times already.
- [The syntax of the SQL you used is only valid for PreparedStatements.]
- Bull! Filter-based validation can use those same declarations and positional-based parameters of so desired. That's an ARBITRARY middle-ware design choice.
- No, they are clearly PreparedStatement definitions as sent to the DBMS, as is recognisable from the syntax, and as was explicitly stated. What you're describing are not PreparedStatements, but SQL-like strings similar to the syntax of PreparedStatements which are sent to a SQL-syntax pre-processor for conversion to dynamic SQL. If that was what we were talking about, we would have said so.
- 4. Usage of PS's alone does not prevent "outside" values from being inserted into SQL. (I presume that's one of the reasons for an automated code checker, as proposed by the other side.)
- Indeed. That's what this whole page is about. Whilst PreparedStatements themselves cannot prevent external values from being inserted into SQL -- in fact, no DBMS call that accepts an arbitrary SQL string can guarantee that -- what this page shows is that PreparedStatement definitions to be sent to a DBMS are simpler to audit than equivalent dynamic SQL definitions.
- The last sentence is false. You are counting the wrong thing: you are not measuring PS-ness as given. (Or, at least PS-ness is a minor indirect factor.)
- Why do you think the last sentence is false? It's trivially true. As noted above, PreparedStatements must be defined from, and only from, static strings. Dynamic SQL queries must be defined from static strings plus filtered references to external data. Filtered references to external data plus static strings (i.e., dynamic SQL query definitions) are inherently more complex than just static strings (i.e., PreparedStatement definitions).
- That is human "must", I'm talking about machine issues.
- The examples at the top of this page demonstrate how the "human 'must'" can be verified by the machine.
- Under both technologies.
- The examples are obviously PreparedStatements and dynamic SQL. Is that what you mean by "both technologies"?
- They are only "obviously" because the variable names use PS terms. They could be called "$flingPoo" instead and it wouldn't change the metric given. It's a trick.
- No they're obviously PreparedStatements and dynamic SQL because the examples state that they're PreparedStatements and dynamic SQL. It has nothing to do with the variable names.
- Bull. Names and comments can lie or unintentionally mislead. One can use the same set for filters and no Code Zeus will zap them with lightning if they do.
- Huh? The examples are precisely what they are described to be. There is no deception involved, nor is it reasonable to consider it here.
- 5. In this topic, I am not debating typical or common existing industry habits/patterns with regard to middle-ware usage, design, or choice. I'm not taking a stand on those.
- They're irrelevant here, anyway.
- I'm just trying to make sure that's clear and documented because it seems to leak into the discussion from time to time.
- 6. Example-1A and Example-1B are pretty much interchangeable as far as whether PS's or filters are used "under the hood" and we cannot verify from looking at those examples alone which uses what. (Related to #3.)
- We don't have to "verify from looking at those examples alone which uses what", nor would we, nor does it matter. The developer who creates the definition strings -- like those at the top of the page -- knows that the first set of examples (Example-1A) are PreparedStatement definitions, perhaps to be used with PHP PDO's "prepare" method; and the second set (Example-1B) are dynamic SQL, perhaps to be used with PHP PDO's "exec" method.
- That's because the method names tends to mean a specific API call based on history/convention, but it could be some other name with the same syntax for the calls, or even redefined using the magic of polymorphism (although that's probably not a recommended practice in this case). It's just names.
- The examples specifically state that we're defining PreparedStatements or dynamic SQL, so "the magic of polymorphism" (?) etc. doesn't apply, nor is there any reference to method names in the examples. I mentioned PDO only to illustrate typical mechanisms for sending PreparedStatement definitions or dynamic SQL to the SQL DBMS.
- Okay, but that's one particular and arbitrary and swappable choice. With those PARTICULAR api styles you are indeed correct. But one case is not sufficient to make a generalization. The code checker engine won't know the difference, it's NOT reading your comments.
- It's not an "arbitrary and swappable choice" because we're assuming low-level DBMS APIs like ODBC, JDBC, PHP PDO, etc. These all expect SQL strings like those defined in the examples at the top of this page.
- Why are we assuming that? That's an arbitrary and limiting assumption. Did somebody hold a gun to the developer's head and say "you must use the out-of-the-box query API's". Are you in Russia?
- It's a reasonable assumption. Unless you're using some high-level database abstraction layer -- like an ObjectRelationalMapper, in which case you're not dealing with PreparedStatements, dynamic SQL, or any SQL at all -- you're defining SQL on the client-side. If you're defining SQL on the client-side, then you're almost certainly going to be creating SQL strings like those defined in the examples at the top of this page.
- It's not a reasonable assumption. And it's not all-or-nothing: raw out-of-box vender API's versus hide-SQL-all-together. There is plenty of middle ground and I've used that middle ground many times to better fit a shop's or app-style's needs and avoid hard-wiring them to specific vendors. And yes, we may still be creating SQL strings like Example-1A, BUT they don't necessarily have to be used for PS's. The filter approach (underneath) can also use the question-mark-insertion approach if one wants and also use a code-checker to make sure all externally-derived data/variables are only passed through as "column parameters" instead of the SQL part of the call, etc. And if we switched the mechanism underneath to use PS's instead of filters, it would STILL look like Example-1A and the code checker would still be checking the very same code (barring minor vendor-specific tweaks that may be needed).
- Your "middle ground" APIs internally rely on the low-level DBMS APIs like ODBC, JDBC, PHP PDO, etc. Thus, the same security issues that are illustrated by the examples at the top of this page hold true for the author of the "middle ground" API. From above, "any higher-level query API will almost invariably be built on a low level interface to the DBMS, [so] ensuring that PreparedStatement definitions do not include external data and ensuring that dynamic SQL includes filtered external data becomes a concern of the API developer. If no higher-level query API is used -- which is very common -- ensuring that PreparedStatement definitions do not include external data and ensuring that dynamic SQL includes filtered external data becomes a concern of the application developer. In either case, the same concerns are present and, therefore, the examples at the top of the page are applicable; the only difference lies in which developer deals with those concerns." Also from above, "Indeed, an API could be designed to identically support dynamic SQL queries or PreparedStatements. Then the only significant difference between the two (performance issues aside) is that dynamic SQL has to rely on filters to prevent SQL injection. Thus, the sole risk mitigation against SQL injection is the quality of the filters. With PreparedStatements, there is no such risky dependency. It is for that reason that OWASP recommends PreparedStatements even over its own ESAPI filters." When I last wrote that, you complained that I was "wondering [sic] off the topic of syntax, grammars, and code style", so I've stuck to focusing on the key differences between PreparedStatements and dynamic SQL. Hence, my assumption is still reasonable.
- I generally agree with most of that. But it doesn't contradict the fact that the metrics used at the examples at the top of the page are measuring API design and NOT difficulty of checking PS-related issues versus checking filter-related issues. Your mistake still stands. The API style is orthogonal to PS-versus-filter usage. It's true shops may typically use certain styles, but we are not measuring industry habits, like I have to keep pointing out. And this is not the topic to discuss the security merit of PS's versus filters as I keep pointing out, but I guess you have to satisfy an urge to jizz your opinions repeatedly all over this wiki. We seem to be going around in circles yet yet yet yet yet again.
- How could the metrics be measuring API design? Aside from the filter function invocations -- which are measured only by presence, not their structure or purpose or anything else -- there are no references to APIs. The metrics are measuring exactly what they claim to measure -- the complexity of the construction of certain SQL statements as string values which are assigned to variables. They are API-independent because they're strings containing SQL. If you want to define and measure metrics about an API that constructs SQL statements another way, that's fine, but it is entirely orthogonal to what is presented at the top of the page. Claiming that the metrics are measuring API design is like claiming that the CPU speed of your computer measures the number of compiler errors in Visual Studio 2013.
- Let me see if I can state this another way: all possible query API styles heavily overlap between those that use PS's under the hood versus those that use filters under the hood such that measuring the difficultly of checking the app-side query API usage for undesired "outside" value insertion will almost entire depend on the API design and NOT depend on whether those API's wrap PS's or filters (as the injection protection mechanism). Example-1A and Example-1B could be used by API's that use either PS's or filters under the hood such that the contents of Example-1A and Example-1B are not dependent on PS's versus filters. Therefore, measuring the code of Example-1A and Example-1B tells us little or nothing about PS-versus-filters. PS's versus filters is a lower-level-issue than what those examples show (or at least below what a code-checker would or could be looking at). -t
- The examples at the top of the page are explicitly not about some hypothetical "possible query API styles". They're about queries defined for use with standard DBMS APIs like ODBC, JDBC, PHP PDO, Perl DBI, and so on. They're about the construction of SQL at the client-side, and therefore reflective of the SQL itself. Thus, Example-1A and Example-1B tell us precisely the difference between creating two different SQL constructs -- prepared statements vs dynamic SQL.
- You really believe that shit, don't you? I'm confident most readers will agree with me based on what's given and thus I will resist the urge to re-re-re-re-debate these points over and over again redundantly and repeatedly multiple times redundantly again and again. LetTheReaderDecide! LetTheReaderDecide! LetTheReaderDecide! LetTheReaderDecide! LetTheReaderDecide! LetTheReaderDecide! LetTheReaderDecide! LetTheReaderDecide! LetTheReaderDecide! LetTheReaderDecide! LetTheReaderDecide! LetTheReaderDecide! LetTheReaderDecide! LetTheReaderDecide! LetTheReaderDecide! LetTheReaderDecide! LetTheReaderDecide! LetTheReaderDecide! LetTheReaderDecide! LetTheReaderDecide! LetTheReaderDecide! LetTheReaderDecide! LetTheReaderDecide! LetTheReaderDecide! LetTheReaderDecide! LetTheReaderDecide!
- [As a reader, I've decided that you, Top, are wrong.]
- Your opinion is noted. You two think a lot alike such that it doesn't surprise me you two agree with each other. What a code-checker would inspect and whether the query API's use PS's or filters under the hood to prevent injection are clearly orthogonal to me and nobody has clearly articulated with an example or clear-cut logic proving they MUST be related by the inherent laws of the universe, logic, or IT. The non-orthogonal arguments are convoluted, round-about, obtuse, and very indirect on a good day. You are crappy technical writers, at least for non-Phd's. -t
- Why would we assume "query API's that use PS's or filters under the hood" when it's clearly stated up front that we're only defining SQL PreparedStatements and dynamic SQL queries? Thus, Example-1A demonstrates the complexity of checking string definitions of SQL PreparedStatements. Example-1B demonstrates the complexity of checking string definitions of dynamic SQL queries. APIs don't enter into it, other than we can safely assume they're used somewhere to pass the defined strings -- as defined -- to the DBMS.
- By the way, what does being "crappy technical writers" have to do with it? No one claimed to be a technical writer. Should being a technical writer be a prerequisite for posting to Wiki?
- Being a sufficient technical writer is not a prerequisite to post on this wiki, but IT IS a prerequisite to being understood by a general IT audience.
- Then it's a good thing I'm not writing to a general IT audience. I'm discussing IT and programming issues with professional colleagues, not writing a book for general consumption.
- Clubby WalledGarden vocab & GroupThink. That's why you write like your convoluted shuttin wiki-buddy.
- Use of such AdhominemAttacks weakens both your arguments here and your overall credibility.
- Being wrong on the orthogonality between PS's and API's weakens yours.
- Being legitimately wrong about a technical or academic matter and admitting the wrongness once it is realised actually strengthens one's credibility. AdhominemAttacks only weaken your credibility, and do so far, far more than merely being shown to be incorrect.
Groundhog day on the automated tools. Such automated tools would have to recognize functions IN GENERAL because code may be in functions for multiple purposes, which may have nothing to do with databases, correct? Typical CRUD code I see has roughly a third of the code in the MAIN function or equivalent. That means a parser will have to recognize functions and parameters 2/3 of the time. Remember the "derail" argument above? Did you forget about the derail argument? Why are you ignoring it now?
Your "derail argument" was invalidated. Secure applications do not pepper the application code with SQL. Database abstraction layers isolate the application layer from the DBMS, and SQL definitions are confined to modules in the database abstraction layer in order to reduce the likelihood of tainting them with external data. As noted above, "In PHP [SQL definitions] can (like with most other languages) be confined to their own source files so I don't have to process functions, modules, and variable and naming scopes". In other words, automated tools do not have to recognise functions "IN GENERAL". As I wrote at the top of the page, "the code is highly constrained" -- and deliberately so.
That's a code design choice orthogonal to PS's. I'm not going to argue general app code design here. Note that most shops I've been around use Stored Procedures if they truly want such strong separation, not Prepared Statements. If you TRULY want to force-separate the SQL from the app, Stored Procedures make it easier because the SQL then does not come from app code and is not inadvertently mixed in. One can even set database permissions to prevent the app from running non-stored-procedures, THEN you don't need to write code parsers to check for rogue SQL. Typically you'll see something like this in the "DBA panels":
Allow this account to:
[X] Run stored procedures
[_] Run SELECT statements
[_] Run UPDATE statements
[_] Run DELETE statements
[_] Etc...
In summary, if you want solid SQL-from-app separation, then using stored procedures makes it so that you don't need to write parsing-based code validators. The DBMS policy settings prevent rogue SQL alone (assuming they don't use an unknown database account). You are using the wrong tool for the job, based on the given "protection" requirements. -t
Stored procedures are certainly reasonable on the DBMS side, but they have no impact on what SQL appears on the client side, or where it appears on the client side. Whether you're invoking PreparedStatements, StoredProcedures, or dynamic SQL, secure applications do not pepper the application code with SQL.
Regardless what server-side constructs you use, separation between the DBMS interface (e.g., PDO, JDBC, ODBC, whatever) and the application code is still necessary on the client side. If you've seen shops using StoredProcedures to improve security and enhance separation and security on the server-side, that's not surprising -- sometimes StoredProcedures are entirely appropriate, especially if there's more going on than just invoking a parametric query. However, if we're only invoking a parametric query, PreparedStatements are the latest construct for ensuring security of parametric (and improving performance of repeated execution of most static) queries -- without the overhead of StoredProcedures -- but it sometimes takes a while (often decades) for conservative shops to catch up.
I'm still puzzled on how Php's PS's tell the type of the parameter. I suspect it either uses the SQL's syntax and schema info, or the variable's type indicator (seen through getType()). PDO does have an explicit type version that's closer to Microsoft's:
http://www.php.net/manual/en/pdostatement.bindvalue.php
Relying on a variable's type indicator does not seem like a very good idea to me.
Why not? Type mismatches in PreparedStatement parameter arguments aren't a security risk, they only -- at worst -- result in the query failing to execute or returning incorrect results.
Who said security risk is the only factor to judge them on?
If we can create an API that is equivalent for both dynamic SQL and PreparedStatements -- as discussed above -- then the only factor left to judge them on is security risk.
Okay, but that's already covered in prior topics. Let's both resist the urge to repeat prior battles here. I guess I am complaining about your choice of API or API usage. The type tag (indicator) is a crappy way to manage types, and I'd like to see them deprecated or fall out of style. Perhaps my anti-tag rant here is partly off topic. I say "partly" because I don't want to compare bad ideas to other bad ideas if syntax/grammar/language-design issues affect our metrics.
I can appreciate that, and some PreparedStatement APIs do not require explicit type specifications. However, if you're creating dynamic SQL, you must always specify filters for external data to (hopefully) prevent SQL injection. What's the difference between specifying a dynamic SQL filter like sqlStringFilter() or sqlNumberFilter() or sqlDateFilter(), and specifying a PreparedStatement's parameter type?
I don't understand why you are asking that question. The app code can be nearly identical, per above, depending on API design. The syntax/design choice is orthogonal to PS-ness.
Indeed, that's my point. Whether you're specifying "Number" as a filter for dynamic SQL or a parameter type for a PreparedStatement, it's the same.
I don't know what your point is.
You wrote, "The type tag (indicator) is a crappy way to manage types, and I'd like to see them deprecated or fall out of style." It suggests that one of your objections to PreparedStatements is their frequent use of ManifestTyping, i.e., explicitly specifying parameter types like "Number", "String", "Date", etc. However, if you're using dynamic SQL with filters, you also must explicitly specify parameter "types" in the form of filters that ensure external data is actually a Number, String, or Date and not an attempt at SQL injection. So, with PreparedStatements you often (but not always) have to explicitly specify parameter types. With dynamic SQL, you always have to explicitly specify parameter (i.e., external data) "types" (i.e., filters).
It's perfectly possible for filter wrappers to get type info from Php's type tags also, like the PS libraries appear to do (as one possibility), but I consider it bad practice, at least if SQL is the target query language.
It's not really possible, because PHP's built-in types aren't fine-grained enough to reliably select appropriate filters. You'll inevitably need a "Date" filter, but PHP doesn't have a "Date" type.
How do the PDO API's do it then? There's something odd going on. There are some databases that will process numeric literals with quotes around them, so maybe that's what's happening, but it's not universal across DB vendors.
Noting that a PDO prepared statement is a wrapper around a SQL PreparedStatement, PDO prepared statements work like this: When a PDO prepared statement is defined without explicit parameter types, it defers creating the SQL PreparedStatement until the first attempt is made to execute it. Upon execution, the PDO prepared statement's argument types are used to define the underlying SQL PreparedStatement and its parameter types, and then the SQL PreparedStatement is executed. This means the developer need not explicitly specify parameter types, but it has the downside that SQL errors -- if any -- are not reported until the first attempt to execute the PDO prepared statement.
The vast majority of the time, form-related or typical input-related validation is required anyhow such that the filter calls are often integrated with the validation API/framework. For example, we'll know that a given variable is to be validated as a "date" and thus the filter call is factored to a date validation handler instead of repeated for each date. Validating numbers, dates, SSN's, ID's, etc. typically will be checking for the same characters/patterns that anti-injection filters will be checking for.
That is true, but how is it relevant?
One does not need an explicit filter call for each column. It may matter depending on how "complexity" or "app code size" is measured. Thus, in practice, the filter code will usually not look like the opening examples but rather be integrated with validation. Automated code validation would then want to make sure that every "input field" goes through the standard/sanctioned validation API as apposed to "rogue" SQL creation. The same would apply even if using PS's because mere usage of PS's does not by itself stop one from directly concatenating "values" to SQL.
Sorry, I'm not following you. Could you provide an example?
Under both kinds of approaches, the code checker seems to pretty much have the same job either way: make sure the API calls that register or run SQL statements are not fed variables that get outside or suspicious values. Generally there will be one or more "statement oriented" methods and one or more "column oriented" methods (or parameters) in the query API used by the app code. The code checker has to make sure that the statement-oriented methods (or parameters) are not fed variables/values that come from outside (or from suspicious sources). It's okay if the column-oriented methods get outside info. Now, whether there are functions in the app code that the checker checks depends on the API design and app coder style, NOT on whether PS's or filters are used under the hood.
What the grammar examples at the top of the page demonstrate is that a code checker does not "have the same job either way". Checking dynamic SQL requires verifying that a SQL string is constructed from static strings and filtered external data. Checking PreparedStatement definitions is fundamentally simpler, because it need only verify that PreparedStatement definitions are constructed solely from static strings.
- The top example is an arbitrary API design such that your metric is measuring only one of many possibilities. We've been over this already.
- How is it "an arbitrary API design"? It makes no reference to any particular API design, and aside from variations in the parameter indicator (the "?") across various DBMSs and APIs, it would be the same PreparedStatement definition whether using JDBC, ODBC, various native DBMS interfaces, PHP PDO, Perl DBI, or some other SQL-based DBMS access layer in almost every popular imperative programming language. It's not dependent on any particular API, because it's just SQL. The API merely provides the means by which the client-side language passes the SQL code to the DBMS. This page is about the SQL code and how it's constructed, not the API that sends it to the DBMS.
- No it's not, you are measuring an arbitrary design choice, not something inherently forced on one per PS's versus filters.
- PreparedStatements and dynamic SQL (what I presume you mean by "filters") are fundamentally different at the SQL DBMS level. The former is SQL defined and executed (with parameter arguments) in two separate steps. The latter is SQL executed in one step. This fundamental difference impacts how SQL is defined at the client-side; so either the DBMS API developer and/or the application developer must deal with it, and that's what the examples at the top of this page are about.
- Whether they are different at the "SQL DBMS Level" and use different steps is irrelevant or can be made irrelevant at the middle-ware level. You are again conflating low-level issues with middle-level issues. Stop doing that and be smarter. If you claim that low level stuff MUST affect the middle level, then prove it with logic or math. So far you just yap around with vague English. -t
- Remember that the examples at the top of this page logically prove a difference in complexity between client-side PreparedStatements definitions and dynamic SQL definitions, as demonstrated by defining strings within a typical imperative programming language. We're not talking about some hypothetical "middle-ware level". It is irrelevant. It's obviously not necessary to actually use strings like the examples in actual applications -- you can use a higher-level API if you wish. The important point is that PreparedStatement SQL definitions intended for the DBMS (as opposed to some intermediate API) are inherently simpler than dynamic SQL. That complexity difference might have been shielded from you the application developer, but some developer had to deal with it.
- You want it to be irrelevant to hide your flawed logic. It's the fucking 2010's, not the 1970's. App developers don't have to be stuck with low-level or out-of-the-box API's anymore. Stop making up arbitrary rules and arbitrary definitions and arbitrary examples. It's a bad habit. "Inherently simpler" = STUPID-ASS BULLSHIT!
- Is your emotional outburst necessary? And how is my logic flawed? It's true that "app developers don't have to be stuck with low-level or out-of-the-box API's anymore", but as I pointed out, it's irrelevant -- not because I want it to be, but because it is. This page proved that PreparedStatement SQL definitions intended for the DBMS (as opposed to some intermediate API) are inherently simpler than dynamic SQL, as demonstrated via logical proof using their relative grammar complexity. Again, that doesn't mean you the application developer has to deal with that complexity, but some developer -- maybe the one who wrote the APIs you use -- did. By the way, you might wish to examine the source code of various popular OpenSource SQL-database driven applications. Most of them, it appears, are still using "low-level or out-of-the-box API's". I suspect the custom business application world is the same.
- Re: "This page proved..." -- Example-1B is a strawman (and the first one is also misleading). The API's for filter-oriented-checking can be just like Example-1A, creating an identical score. And "popular" doesn't matter for reasons already given. We are concerned with absolutes, not preferences in this topic. I keep pointing out that "popularity" is off topic, yet you keep leaking it back into the discussion without justification. Are you retarded? I have no other explanation other than mental retardation because you keep making the same debate mistake. It's frustrates the shit out of me that you keep bringing up debunked arguments over and over. Popularity is clearly off topic and you made no effort in the past to demonstrate it was on topic. You just seem forgetful. It's like debating an Alzheimers patient. (If you wish to change the base premise to reflect some form of popularity such as-commonly-found, that's fine with but make sure you are clear on that change rather than fuzzily half insert it all about like you've been doing.)
- There is no mention of preferences in this topic. However, what you call "popularity" is the simple fact that the fundamental difference between PreparedStatements and dynamic SQL is imposed by the SQL DBMS and exposed through every low-level DBMS API. Because low-level DBMS APIs (like ODBC, JDBC, Perl DBI, PHP PDO, etc.) underpin high-level APIs, the implications of that fundamental difference -- in terms of code audit-ability and complexity -- are present in every DBMS access layer, whether it's exposed to the application developer or the API developer. It isn't a matter of "popularity", but of DBMS reality: All DBMS access layers are ultimately producing, or passing through, either dynamic SQL or PreparedStatement definitions/executions.
- Re: "There is no mention of preferences in this topic." Whether you label them as "preferences" or not, they ARE preferences (one of many possible API designs). Your logic (cough) sounds similar to, "Hitler is not a tyrant because he never called himself a tyrant". As far as counting ALL code instead of just APPLICATION code, see below near "smell scope-slip".
- Is that a deliberate invocation of GodwinsLaw? If this were USENET, rules would dictate that the debate is over and you lose. :-)
- API designs are irrelevant here, because we're talking about the construction of SQL strings to be sent to the DBMS. I'm not aware of any low-level SQL DBMS interface that doesn't accept PreparedStatement and dynamic SQL strings. Who said anything about "counting ALL code instead of just APPLICATION code"? My argument is that validating dynamic SQL is more complex than validating PreparedStatements, in general. The grammar examples at the top of the page are proof of that using a minimal case, so they aren't meant to be "counting ALL code" in and of themselves.
- And being "shielded" by API's is not necessarily a bad thing either (although I won't get into the tradeoffs here). Certain API designs can reduce the complexity needed by code checkers of the kind you mentioned at the start. It's an issue of API design, NOT whether we use PS's or filters. Either can use either example's style, per API designer preference INDEPENDENT of PS-versus-filters.
- At the point of interaction with the DBMS, it has nothing to do with API design and everything to do with the fact that the DBMS presents a fundamental difference between PreparedStatements and dynamic SQL. PreparedStatements separate definition from execution. Dynamic SQL does not. This results in a difference in complexity at the client-side. Of course, you can hide that difference in complexity behind a "shielding" API, but as has been pointed out numerous times, that's irrelevant: Some developer must consider that difference in complexity on the client-side. Of course, because "it's the fucking 2010's, not the 1970's", that difference in complexity might be given only glancing notice, as PreparedStatements are almost invariably preferable to dynamic SQL -- for various reasons -- even without considering the fact that dynamic SQL is more complex.
- Re: "At the point of interaction with the DBMS" -- why should an app checker give a shit about that? It's low-level gut diddling. And the last sentence is wondering off topic.
- I'm not sure what you mean by "why should an app checker give a shit about that?" -- what's an "app checker"? -- but at the point of interaction with the DBMS it is important, for security reasons, that SQL code not be tainted with unfiltered external data. This page simply uses grammar complexity to prove the intuitively-obvious observation that it's simpler to verify that PreparedStatement definitions (as sent to the DBMS) aren't tainted with external data than it is to verify that the external data used to construct dynamic SQL has been filtered. The reason for this is (again, this is intuitive) that PreparedStatement definitions must not contain any external data, but dynamic SQL must contain filtered external data. That, of course, is because the DBMS imposes the rule that PreparedStatement definitions must be defined in one step and executed in other, whereas dynamic SQL is executed in one step.
I don't know what you mean by "'column-oriented' methods in the query API", "statement-oriented methods", or "PS's or filters". Could you give an example or define your terms?
No. Not today.
Do you expect your claims or terminology to be taken seriously if you're not willing to defend or define them?
I believe what I have already written is sufficiently explained that most IT-experienced humans will know what I am talking about. They will know that you are not measuring what you claim to be measuring. You just have personal mind-blocks due to bias and stubbornness, and I don't feel like babysitting willfully stubborn mental patients today. In short, LetTheReaderDecide.
What do you think AdhominemAttacks do for your credibility?
I honestly believe you are nuts. However, I'm also a nut for arguing with a known nut. Maybe we can get a group discount on therapy. --top
That's just more AdhominemAttacks. If you can't defend your position rationally or logically, have you thought about reconsidering your position?
They are rational and logical. You are just stupid and read it wrong due to bias or insufficient real-world experience to understand how real shops operate.
How is "I don't feel like babysitting willfully stubborn mental patients today" a rational and logical argument in favour of your position?
{ A reader (me) decided that Top is a Crank. http://en.wikipedia.org/wiki/Crank_%28person%29 }
- "'Crank' is a pejorative term used for a person who holds an unshakable belief that most of his or her contemporaries consider to be false. A crank belief is so wildly at variance with those commonly held as to be ludicrous. Cranks characteristically dismiss all evidence or arguments which contradict their own unconventional beliefs, making rational debate a futile task, and rendering them impervious to facts, evidence, and rational inference."
- "Cranks overestimate their own knowledge and ability, and underestimate that of acknowledged experts."
- "Cranks rarely, if ever, acknowledge any error, no matter how trivial."
- "Cranks love to talk about their own beliefs, often in inappropriate social situations, but they tend to be bad listeners, being uninterested in anyone else's experience or opinions."
- "Some cranks lack academic achievement, in which case they typically assert that academic training in the subject of their crank belief is not only unnecessary for discovering the truth, but actively harmful because they believe it poisons the minds by teaching falsehoods."
- "seriously misunderstand the mainstream opinion to which they believe that they are objecting,"
- "compare themselves with Galileo or Copernicus, implying that the mere unpopularity of some belief is in itself evidence of plausibility,"
- "misunderstand or fail to use standard notation and terminology,"
- "ignore fine distinctions which are essential to correctly understand mainstream belief."
- "That is, cranks tend to ignore any previous insights which have been proven by experience to facilitate discussion and analysis of the topic of their cranky claims; indeed, they often assert that these innovations obscure rather than clarify the situation."
- "Or the crank may present their ideas in such a confused, not even wrong manner that it is impossible to determine what they are actually claiming."
Projection. And at least I keep my
AdhominemAttack's relatively compact instead of copy and paste lists. But then again, you seem to like repetition. -t
We have no way of knowing whether the author in question -- i.e., the one who wrote "A reader (me) decided that Top is a Crank" and pasted the list of 'Crank' attributes -- likes repetition or not, as he/she has posted on this page for the first time. I didn't post it.
And if you cannot realize that Example-1A and Example-1B are NOT inherently hard-wired to PS versus filter usage under the hood, then you are truly delusional. It's fucking obvious to normal coders. Something is fucked in your head. I suspect your stubbornness against admitting you are wrong is blinding you. Your ego writes checks your logic cannot cover and you then quibble and obfuscate over irrelevant issues to hide the fact you got it wrong. The two examples are ORTHOGONAL to PS-ness, SUCK IT, WRONGBOY! -t
They're "hard wired" to PreparedStatement vs dynamic SQL usage specifically because the examples say that's what they are. The PreparedStatement definition examples demonstrate text intended -- by the example itself -- to be sent to the DBMS via something like PHP PDO's 'prepare()' method. The dynamic SQL definition examples demonstrate text intended -- by the example itself -- to be sent to the DBMS via something like PHP PDO's 'exec()' method.
I challenge you to prove that Example-1A inherently CANNOT be used with filters alone. Dare ya dare ya! -t
That's as silly a challenge as asking me to prove that Example-1A inherently CANNOT be used as a question in an on-line quiz program about SQL, or in a game to teach newbies about SQL. Of course the strings in the examples could be used that way or in innumerable other ways. But, the examples themselves state that they're PreparedStatements and dynamic SQL, so they're not quiz questions, they're not game elements, and they're not strings that are deliberately made to look like PreparedStatements but are actually turned into filtered dynamic SQL by a hypothetical database abstraction layer. If they were any of these things, I would have said so. The examples explicitly state that they're PreparedStatements and dynamic SQL, which have clear and recognised definitions. The obvious inference -- reinforced by stating it explicitly, multiple times, above -- is that they're sent to the DBMS as-is.
No no no, we can also use the code in Example-1A for the equivalent queries and injection inspections for filters. And there is nothing wrong with making API's "deliberately made to look like PreparedStatements" if that makes code inspection easier for humans and/or machines per shop preference (naming aside). (I'm not making a design style/grokkability value judgement here because that's not the issue.)
As pointed out above, if that's what is being done, I would have stated it. As stated, the presumption is that the strings containing SQL are being sent to the DBMS as-is. At some point, that will have to occur, regardless what database abstraction layers are used. At some point, a programmer must make a decision as to whether PreparedStatements or dynamic SQL are used -- whether inside a database abstraction layer, or at an application programming level -- and at that point the considerations shown at the top of this page are relevant.
- "At some point" perhaps. But if that "point" is hidden in the guts of the middle-ware or below, then it's not something the code inspector will be concerned with. And it may be possible that the app programmer never has to know or care whether PS's, filters, or nothing (as-is) is used under the hood. As far as "would have stated it", that's your PERSONAL PREFERENCE of middle-ware design such that it doesn't matter what you state. We are NOT just testing your personal preferences, but ALL POSSIBLE cases, if we want to make general statements. Your particular example is one of a gazillion ways to do it.
- It's been pointed out several times that the point made by this page is that the complexity difference between PreparedStatements vs dynamic SQL exists, whether it is dealt with by the application programmer or the API developer matters not; the complexity difference has to be considered somewhere by some developer. My examples are indeed one (or two) of a gazillion ways to generate SQL, but remember that the purpose of the examples is to frame a proof of the complexity difference between PreparedStatement definitions and dynamic SQL, not to make some claim about actual use (or not) of middle-ware.
- "Some developer" yes. "Some developer" created a parametrized interface for typical PS API's one finds. The same can be done for API's that use filters instead of PS's under the hood also. And likewise, the out-of-the-box PS API's could have used the interface style of Example-1B. The choice is arbitrary. There's no law of the universe that stops such. You are NOT measuring something unique to PS's. I don't think you thought your example and claim through and got caught with your abstraction down and are now trying to cover it in goggledy-gook and irrelevancies now.
- Of course I'm not measuring something unique to PreparedStatements; I am measuring a characteristic of PreparedStatements. There may be other things that use PreparedStatement-like syntax, but they're irrelevant. The fundamental difference in complexity between actual PreparedStatement definitions and dynamic SQL is there at the interface to the DBMS, regardless what other APIs may emulate PreparedStatements.
- Re: "I am measuring a characteristic of PreparedStatements." -- That's debatable over whether API wrappers over PS's are called PS's themselves. (LaynesLaw over "PS"?) But even if that claim was true, if PS-ness is wrapped away, why should we fucking care? Don't invent fake worries just to win your quibble-war.
- What do you mean by "fake worries"?
That's my very point: what your inspect-a-tron is looking at and what your complexity metrics are measuring is an
arbitrary API design choice, NOT something inherently tied to PS's versus filters. That should be butt-clear to everybody in town. Your "metric" is like measuring bicycle speed by measuring the rider's shoe size: shoe size and bicycle speed are weakly related at best.
And I realize your PARTICULAR example is meant to be for PS's. But I've already agreed that that PARTICULAR scenario would score better in your grammar complexity metric. But a sample size of one tells us nothing GENERAL here. I've pointed this out already, but your thick skull ignored it. Certain specific API design choices score higher than others. That's not news.
I'm not sure what you mean by "a sample size of one". ODBC, JDBC, Perl DBI, PHP PDO, various native APIs -- all of which directly accept strings containing SQL like those in Example-1A and Example-1B -- are commonly used, including inside the database abstraction layers which you insist have some relevance. Therefore, the scenario at the top of the page is relevant whenever choosing between PreparedStatements and dynamic SQL, even if that decision is only visible in side a database abstraction layer.
Like I keep pointing out, your particular example is one of a gazillion ways to do it. If such issues are hidden at the low level, it's not something the code inspector would bother with and thus not affect the grammar score. And we are not measuring "commonality", as I keep pointing out on deaf ears. I'm not challenging typical usage pattern claims, only what the laws of logic force or don't force.
The point of this page has been to prove that PreparedStatement SQL definitions are inherently simpler than dynamic SQL. That has been proven. It doesn't matter whether that impacts the application developer or the API developer who is creating middle-ware, or not.
Filter-based API's can cross-dress to look just like out-of-the-box PS API's making your point moot.
As presented by the DBMS, PreparedStatement definitions and dynamic SQL are fundamentally different, and that has a direct impact on the client-side code complexity at the interface to the DBMS. That high level APIs can perform code translation or whatever is irrelevant; we're talking about definitions of SQL code at the interface to the DBMS.
You have yet to demonstrate a "direct impact". It's clearly orthogonal to me because those are details at an abstraction below application-level code checking and both styles can be swapped for another from the app's perspective. There's a missing link in your "logic".
The "direct impact" is what's shown at the top of this page: It's simpler to verify that PreparedStatement definitions don't contain external data than it is to verify that dynamic SQL contains filtered external data. We're not talking about what some higher-level API may do; we're talking about PreparedStatement definitions and dynamic SQL as sent to the DBMS, which is ultimately the only place where the result of client-side database security and auditing is relevant.
You still have failed to show that Example-1A could NOT be used with filter-based systems to produce the same DB results (data changes or output).
That's like asking me to prove that Example-1A inherently CANNOT be used as a question in an on-line quiz program about SQL, or in a game to teach newbies about SQL. Of course the strings in the examples could be used that way or in innumerable other ways. But, the examples themselves state that they're PreparedStatements and dynamic SQL, so they're not quiz questions, they're not game elements, and they're not strings that are deliberately made to look like PreparedStatements but are actually turned into filtered dynamic SQL by a hypothetical database abstraction layer. If they were any of these things, I would have said so. The examples explicitly state that they're PreparedStatements and dynamic SQL, which have clear and recognised definitions. The obvious inference -- reinforced by stating it explicitly, multiple times, above -- is that they're sent to the DBMS as-is.
Thus, it still stands that the choice between using Example-1A and using Example-1B is not dependent on PS's versus filters and thus any differences in them are not due to PS-versus-filters and thus their grammar scores are not related to PS-versus-filters. -t
There is no "choice between using Example-1A and using Example-1B". They exist only to prove that PreparedStatement SQL definitions are simpler than dynamic SQL under the conditions described in the scenario. Various intermediate APIs, middle-ware, deception, translation of strings in one syntax or another, etc., are not part of the scenario and are therefore irrelevant.
You are measuring something that can be abstracted away, and is thus not forced on the developer nor code checkers. You may have won one battle, but you lost the general abstraction war. Now, go waddle back to your family with cannonballs in your ass, LOSER!
What do you think AdhominemAttacks and red-neck insults do for your credibility? As for what is "forced on the developer" or "code checkers", actually I am measuring something that some developer or code checker must deal with: It's imposed by the DBMS -- and therefore reflected in the definitions of the SQL code sent to the DBMS via the DBMS API -- which some developer must deal with. However, that developer is not necessarily you; the code checker is not necessarily yours.
Why should anybody care about metrics on YOUR arbitrary choice? It only means something to you. It tests one approach out of many, but we cannot say anything general based on one sample. 1 down, 38,282,728,402,183,478,092,893,782 API designs to go. Now git back to work, Example Boy!
Whether it's visible to the application programmer or only visible to the API developer, the fundamental difference in complexity between PreparedStatements and dynamic SQL is always there, because it's imposed by the DBMS and therefore must be considered at the interface to the DBMS. That's what this page proves. What the proof provides is another point of evidence that there is no reason to favour dynamic SQL with filtered external data over PreparedStatements.
- Bullshit, you stupid, willfully stubborn delusional idiot.
- This AdhominemAttack serves what purpose?
- Just like your repetition of bullshit, it serves no fucking purpose whatsoever.
- That's very mature of you.
Your example does NOT demonstrate that because it's orthogonal to how PS's work under the hood. If using PS's imposes some kind of interface constraint, your particular choice of example does NOT demonstrate that. You need to rethink your "proof", something along the lines of:
If guts use PS's, THEN App-side interface must follow pattern X
If guts use filters, THEN App-side interface must follow pattern Y
Why? That's not what this page is about. This page simply proves that it easier to verify that PreparedStatement definitions don't include external data than it is to verify that dynamic SQL includes filtered external data.
"Dynamic SQL" is not well defined. Anyhow, I'm tired of this repetitious dance and won't get pulled into your word games. I'm confident most unbiased readers will agree that how the app coder interacts with the DB is fairly arbitrary such that you are measuring a specific and arbitrary API design rather than something INHERENT to PS-versus-filters. You think God holds a gun to their head and forces them to use an out-of-the-box API? What crazed religion do you belong to? You read the Book of Morons by Joseph Meth? The styles in the intro examples are swappable between PS and filters, proving it irrelevant and arbitrary and smart normal people will agree. You measured an arbitrary UselessTruth, at best. You are off-topic but are too stubborn to realize/admit that. Chant it over and over and might click in your stubborn little pea brain:
BEGIN CHANT
Arbitrary and Swappable. Arbitrary and Swappable. Arbitrary and Swappable.
Arbitrary and Swappable. Arbitrary and Swappable. Arbitrary and Swappable.
Arbitrary and Swappable. Arbitrary and Swappable. Arbitrary and Swappable.
Arbitrary and Swappable. Arbitrary and Swappable. Arbitrary and Swappable.
Arbitrary and Swappable. Arbitrary and Swappable. Arbitrary and Swappable.
Arbitrary and Swappable. Arbitrary and Swappable. Arbitrary and Swappable.
Arbitrary and Swappable. Arbitrary and Swappable. Arbitrary and Swappable.
Arbitrary and Swappable. Arbitrary and Swappable. Arbitrary and Swappable.
Arbitrary and Swappable. Arbitrary and Swappable. Arbitrary and Swappable.
Arbitrary and Swappable. Arbitrary and Swappable. Arbitrary and Swappable.
END CHANT
They're not "arbitrary and swappable" at the interface to the DBMS, and that's what matters.
And you arbitrary choose "what matters". If we "must" check all low-level API's also, then all practical low-level API's will have functions/methods, and thus jack up your grammar score: pick your poison because you lose either way, Hallucitor.
(I bet right now you are cooking up new ARBITRARY rules on what parts of API's "require" checking and which don't. I know your patterns by now, Hallucitor.)
You seem to have misunderstood the point of this page. The "grammar score" proves that it's simpler to verify that PreparedStatement definitions don't include external data than it is to verify that dynamic SQL includes filtered external data. That doesn't mean you have to use a grammar to implement some code checker, and it doesn't say anything about "what parts of API's 'require' checking and which don't". It only means it takes more effort to verify filtered dynamic SQL is secure than to verify that PreparedStatements are secure, regardless how PreparedStatements or dynamic SQL are generated.
I'm only concerned with INHERENT PS-versus-filters issues. If you invent a side topic about something you call "dynamic SQL metrics", that's fine, but it's a WalledGarden issue that only you and your weird buddy cares about and has no bearing on inherent PS-versus-filters issues. If you demonstrate that "It's harder to do Goofy-Thing-7 than to use out-of-the-box PS API's", that may indeed be true, but why anyone should care about doing Goofy-Thing-7 when it is not forced upon them, I have no clue. It's out of the blue and relates to nothing that came before that I can see other than being one possibility of API design among many others.
An inherent PreparedStatement-versus-dynamic-SQL (aka PS-versus-filters) issue is that PreparedStatement definitions must not include external data, but dynamic SQL must include (filtered) external data. It's imposed by security requirements and the DBMS. This page merely discusses one of the implications of that.
I still don't know what you mean by "dynamic SQL". Both type of sub-systems are fed essentially the same thing such that it does not matter if what they are fed is "dynamic" or "static". A classification of "dynamic" versus "static" (or whatever) would NOT affect what an app code inspector would check. The code checker would not do anything different to a string designated as "dynamic" versus "static" such that such a classification is irrelevant.
"Dynamic SQL" means SQL query text is generated at run-time from variable data, such as user input. Contrast that with PreparedStatements, where variable data such as user input is passed as parameter arguments when a predefined PreparedStatement is executed, but the PreparedStatement definition is (or should be) static. What an "app code inspector" -- such as that described at the top of the page -- can check is that PreparedStatements (as defined at the top of this page) are defined entirely and solely from static strings.
However, keep in mind that such an "app code inspector" is not a requirement. It's possible to use it if you're defining PreparedStatements like those at the top of the page, but nothing stops you from defining PreparedStatements or dynamic SQL some other way. The whole point of this page is to show that it's easier to verify that PreparedStatements haven't been tainted by external data than it is to verify that dynamic SQL has been constructed at run-time using strictly-filtered user input. The "app code inspector" described at the top of the page is merely intended to demonstrate that difference in complexity. What's crucial is not the use (or not) of "app code inspectors", but that PreparedStatement definitions be checked (somehow) for tainting by external data, or that dynamic SQL be checked (somehow) to make sure external data is filtered. What this page proves is that the former will always be simpler than the latter.
Example-1A could be used with either PS's or filters and the code checker would process it exactly the same and give the same score REGARDLESS if you personally call a string "dynamic" or "static" or "fliffernobb". It's just a word.
If Example-1A is used with "filters" (and it isn't, because I said so -- but let's run with your argument for a moment) then I presume you mean PreparedStatement-like syntax is converted to dynamic SQL with filtered external data, such that no actual SQL PreparedStatement is involved. If that's the case, then it's not a PreparedStatement, it's dynamic SQL. Notably, it's more complex than a real PreparedStatement -- and more complex than plain dynamic SQL -- because you have introduced the additional complexity of the mechanism that translates the PreparedStatement-like syntax to dynamic SQL.
- If you are talking about TOTAL code, that's a whole different animal than what an app-side checker has to check. I still think you are wrong because you are vendor-centric-biased, but total code is off-topic anyhow so that I won't go down that bicker road. I smell scope-slip. My original understanding was that the proposed code checker ONLY checks application code, not "generic" query API implementation. By "generic" I mean that the query API does not have app-specific stuff in it, like references to HTML "form" variables. One doesn't typically put those in query API's. I stated this before and you didn't seem to have an issue with this assumption earlier.
- What is "TOTAL code"? The "proposed code checker" is part of a proof that validating dynamic SQL is inherently more complex than validating PreparedStatements. You don't have to use the "proposed code checker". Whilst it can be used if the application developer is constructing PreparedStatement SQL and dynamic SQL, it's not necessary in general (as long as the SQL is validated somehow). Fundamentally, it's a practical proof of the intuitive observation that of the SQL sent to the DBMS, a developer -- somewhere between the application and the DBMS -- has to make sure that PreparedStatements don't include external data and that dynamic SQL includes filtered external data, and the former is technically simpler than the latter. If you disagree, please demonstrate otherwise.
- AND if we are checking total API's, they will likely have function/methods in them, cranking up the PS side's grammar score.
- We aren't using the "grammar score" to check "total API's". (I'm not even sure what that means.) We're only using the grammar score to demonstrate that valid PreparedStatement SQL string construction is simpler than valid dynamic SQL string construction. If the strings are constructed explicitly by the application developer, it will be more complex to construct and validate dynamic SQL strings than PreparedStatement strings -- as proven at the top of this page. Therefore, if the strings are constructed via middle-ware, the middle-ware for constructing dynamic SQL will be more complex than the middle-ware for constructing PreparedStatements. If you disagree, please demonstrate otherwise.
- But we are not measuring "dynamic SQL"; that's NOT a direct concern; the concern is PS's versus filters, and filters can use the Example-1A style also. I've said this already, but it's not sinking in. You are comparing irrelevant shit; wrapper/API styles instead of inherent technology features or limits. Do you know what "inherent" means? You are getting side-tracked easily. And if you claim all the middle-ware would be more complex, prove it. And you can't pick and choose which middle-ware to include and exclude from the count without good reasons. Vendor-side-coding versus shop-side is not dividing line, per below, because that cut-off is NOT inherent to PS-versus-filters.
- [It says, quite clearly, at the top of the page that the comparison is between dynamic SQL and prepared statements. How can dynamic SQL not be a direct concern when it's one of the things being compared?]
- I thought the intent was to compare PS's to filters. If you want to compare the "PS-influenced API style X" to "API style Y", that's fine, but it has nothing directly to do PS-versus-filters. ("Dynamic SQL" is a vague term, as I already pointed out. I still don't know what you mean by it due to FractalVagueness on your part.) I guess you invented a WalledGarden topic and it took me a while to realize that. So I'm bailing out... -t
- [Nope, the comparison for this page, and for the discussions that led to this page, has always been prepared statements vs. dynamic SQL. Filters came into it because dynamic SQL needs filters to even have a chance of being as secure as prepared statements. (It's one of those inherent differences.) This page shows that it's more complex to ensure that dynamics SQL has the appropriate filters in place than it is to ensure that prepared statements don't come from external data. (BTW, your definition of FractalVagueness, as pointed out on that page, applies equally to every statement ever uttered in any natural language. Any problems we have communicating are not due to FractalVagueness.)]
And please don't say "but I intended Example-1A to be used only with PS". That's irrelevant because we are not considering ONLY your preference of API usage. You are not the center of the universe. We are not testing JUST your preferences or coding style because the claim is not limit to a specific style as given (or at least it shouldn't be if you want your "evidence" to mean anything important and general. I won't challenge your specific preference, but I don't have to use YOUR personal preference in my shop such that it means shit to me.)
I have expressed no preference of "API usage" or anything else. Why do you think I have? The only thing this page shows is that of the SQL sent to the DBMS, it takes more effort -- somewhere, by some developer -- to verify that dynamic SQL is consistently filtered than to verify that PreparedStatements do not involve external data. This is true independent of what "API usage" is (or is not) employed to generate either PreparedStatements or dynamic SQL.
The code it would verify looks exactly identical to me if one so chooses to use the API style shown in Example-1A regardless of whether PS or filters are used under the hood. If there is some OTHER code somewhere else full of Dark Complexity or whatnot, please show it. I can't read your fucking mind; you have to demonstrate with text and code where this alleged complexity is lurking and why it's allegedly in your metric scope (why it's being counted).
The "OTHER code somewhere" is the code inside some middle-ware API that is translating Example-1A's PreparedStatement syntax to dynamic SQL strings using filters, before being sent to the DBMS. If Example-1A is used to construct PreparedStatement SQL strings, then it can be sent to the DBMS as-is via low-level APIs; no translation or middle-ware API is needed. Hence, the overall process of constructing actual PreparedStatement SQL strings is obviously simpler -- and therefore easier to validate -- than using a PreparedStatement-like syntax to create dynamic SQL.
You seem to be saying if a vendor creates them, they are "low level" (and not counted in metrics) but not if a shop creates them? I consider what the vendor creates "middle-ware" also. A vendor could in theory create filter-wrapping API's that resemble typical PS API's also. That's just packaging, not an INHERENT difference in PS's versus filters. What's at the shop-side versus what's done by the vendor is not inherently dictated by PS versus filters. That technology choice does not hold a logic gun to a vendor's head to alter their behavior.
Vendors probably don't make PS-like API's without PS's because positional parameters with lots of columns are awkward to use such that there is no demand: it's not a default preference, and thus the existing market pattern has little if anything to do with "complexity". Vender middle-ware versus shop middle-ware is NOT an INHERENT feature of PS's versus filters. Focus on that word "inherent" so that you don't get side-tracked by irrelevancies again. If filters or PS's FORCE vendors to split things a certain way, then you may have a point. But I don't see force-ness. You mistake population samples for inherent traits it seems. If there is something about filters that absolutely and universally FORCES vendors and/or shops to partition query-related API's a certain way, please identify these absolute rules and prove they are absolute. And remember, I don't care about market habits or preferences. We are not measuring marketing preferences here: something commies, capitalists, anarchists, and space aliens all have to do related to PS-versus-filters.
When you measure and compare, please make double sure you are comparing something INHERENT to PS's versus filters; NOT industry preferences or habits and NOT your personal preferences.
[The comparison shows an inherent difference between prepared statements and dynamic sql. The ability to hide that difference behind an API doesn't make that difference any less inherent. If you think it does, you should reck your own rede and learn what inherent means. (Hint: it does not mean "forced".)]
You invent your own personal definition of "inherent" just like you did with "value". Lovely, Hallucitor. I don't give a shit about characteristics of "dynamic SQL". They are not directly related to the prior PS-versus-filter topics, but rather some screwy side-show given by a crazy clown. The bottom line is that a query API that uses filters under the hood CAN use Example-1A -style application code and that a "poor score" for the alternative 1B says nothing inherent about filters. The metric and example, as given near the top, says nothing inherent about filters. We were talking about tire A versus tire B on cars in the topic that spawned this one, but you went off and measured ducks instead. Your duck conclusions are a UselessTruth at best.
[I'm using the definition in the dictionary. You, if you think it means "forced", quite clearly are not. The previous topics have always been prepared statements vs. dynamic sql. Filters only come into play because they are used to get some semblance of the security inherent in prepared statements. The rest of your rant is an AdHominem. It helps your case not at all.]
Example-1A also uses "concatenation" for SQL and thus your (stated) PS example also uses "dynamic SQL" such that using dynamic SQL is not a difference maker between the two examples. (See PageAnchor Dynamic-SQL-Def-01)
[You clearly didn't bother to finish reading the definition. If you had you would know that it also has to be sent to the "DBMS connection's "execute query" function". The statements in Example-1A cannot be sent to that function because, as far as that function is concerned, they contain syntax errors.]
PS's also have something equivalent, they just use more buffering or caching-like techniques tacked on to it.
Could you explain what you mean by that, and how it's relevant?
Please re-read the definition of "dynamic SQL" from PageAnchor Dynamic-SQL-Def-01, particularly: "By 'dynamic SQL', I mean constructing whole SQL queries by string concatenation, in which a SQL query is created as a text string composed of static text ... and optional static string constants ... and optional external data filtered to hopefully prevent SQL injection ..." PreparedStatement definition strings -- as demonstrated in Example-1A -- should only be constructed from (concatenation of) static strings. The Example-1A grammar takes that into account. Dynamic SQL strings should also be constructed from (the concatenation of) static strings, but must include concatenation of filtered external data. The Example-1B grammar takes that into account. That difference -- that dynamic SQL must include filtered external data -- is why dynamic SQL is more complex than PreparedStatements.
Perhaps you are implying that "concatenation = complexity??? It's such horrid writing it's hard to know. And I still don't know what that "should" shit is about: that's human policy issue, not a technology issue. If you mean a goal, then state it as an assumed goal, not as a human policy.
Of course it's a "human policy issue" and an "assumed goal". Avoiding SQL injection is obviously a human policy issue and an assumed goal. Your computer doesn't care whether SQL injection occurs or not, and concatenation is obviously more complex than not-concatenation. Filters are more complex than not having filters, too.
Okay, but that's a general goal; I don't see the relationship between it and "dynamic SQL" as a general concept. You are mixing fruit and ducks.
The general concept is that PreparedStatement definitions -- in order to be secure -- shouldn't include external data and (therefore) don't require filters, but dynamic SQL requires filtered external data, so the former is simpler than the latter. It's hardly "mixing fruit and ducks", because they're both defined as SQL that's sent to a SQL DBMS for processing.
Application Scope
I'm going to "cut off" my concern at the application level. Whether PS-related middle-ware (query API's) are "less complex" than filter-based API's, I will not debate. I believe you are wrong on the middle-ware level, but will not challenge it for now. I will focus on application-side checking/complexity, and that's it. If you want to debate the middle-ware level and below, that's fine, but you will be debating with yourself.
And note that it's not the case that "everything can be hidden away into middle-ware". If the middle-ware has application-specific concerns, such as UI form or data column names hard-wired into it, than it doesn't qualify as "middle-ware" to me here. I'll also exclude UI issues for clarity (other than maybe error messages if there is a query problem).
--top
You appear to be constructing a StrawMan here. Middle-ware with "application-specific concerns, such as UI form or data column names hard-wired into it" doesn't sound like middle-ware to me, either. It sounds like part of the application.
- No, if it has application concerns, then it's NOT middle-ware. There appears to be a misunderstanding. Where did my writing give you the opposite impression? And if it has UI concerns it's not "query" middle-ware.
- I agreed with you that "application-specific concerns, such as UI form or data column names hard-wired into it" aren't middle-ware. Why did you mention it in the first place?
- As examples of DIS-qualifiers. In other words, to be called "middle-ware" here, the query API should be usable in a different organization altogether ("app-neutral") and not have UI-engine-specific code or app-specific-UI code in it. (Some UI-engine-specific code may be needed for the purpose of displaying query error messages).
- Middle-ware portability has never been an issue here; I'm not sure why you're raising it.
- See below.
Regarding middle-ware, having it is certainly more complex than not having it. If PreparedStatement-like syntax is being translated by middle-ware into dynamic SQL with filtered external data and passing it to the DBMS, that's obviously more complex infrastructure than not having any middle-ware and passing PreparedStatement definitions directly to the DBMS.
The PS API's included with say PHP are middle-ware. Thus, if you are using PHP's PS interface API's, you are using middle-ware. ODBC drivers are also middle-ware.
- There is a considerable difference in complexity between hypothetical middle-ware that translates PreparedStatement-like syntax into dynamic SQL with filtered external data running on top of a DBMS interface layer, and middle-ware that consists of just the DBMS interface layer.
- PS's require implementation complexity and filters require implementation complexity. How much of that is distributed to the RDBMS compared to the middle-ware is fairly flexible (WaterbedTheory) and as far as I'm concerned is off-topic. You may say that since the RDBMS already has PS's implemented that it's a "sunk cost" and thus shouldn't be counted. But that's merely a marketing habit, not an inherent feature of PS. There is no Rule of Logic that says DB's couldn't be sold with and without PS's like 6 cylinder versus 8 cylinder cars. It's market quibbling, not universal partition logic. Now if you want to change your claim to be "based on what is CURRENTLY available in the market, PS's are less complex to use/implement/check/have-sex-with/etc.", be my guest.
- The internal complexity of the DBMS has never been an issue here; I'm not sure why you're raising it.
- YOU are the ones drifting deeper and deeper into the guts of implementation. I wanted to only look at app-side interfaces.
- Actually, no, we're not "drifting deeper and deeper into the guts of implementation." We've always remained at precisely the same level, maintaining precisely the same argument.
- I guess there is miscommunication then, because it appears vague and flip-floppy to me.
[He might also be constructing a
NoTrueScotsman type argument. One where any API that doesn't hide the difference between
PreparedStatements and dynamic SQL is declared middle-ware and therefore not truly on the application level.]
I never claimed that. I would note, though, that the examples YOU gave could be made to work either PS's or filters.
To ensure that Example-1A can be "made to work" with either PreparedStatements "or filters", there would have to be middle-ware that translates the PreparedStatement syntax into dynamic SQL (constructed with filtered external data) before sending it to the DBMS. Of course, that would mean Example-1A isn't what it states it is -- an example of PreparedStatements -- but is actually an example of PreparedStatement-like syntax used to create dynamic SQL. That's obviously more complex than sending the PreparedStatement definitions directly to the DBMS without translation, which is what Example-1A is intended to illustrate. Hence dynamic SQL aka "filters" is still more complex, in terms of code requiring security auditing (the discussion of which is what spawned this page), than PreparedStatements.
Like I keeeeeep pointing out, your example is only a "spot sample". I know what you intended in that PARTICULAR example, but it does not say anything GENERAL about PS-versus-filters because it's only a spot sample. I'm looking at generalities, NOT just your spot sample. I don't dispute your analyses for the spot sample. There is no disagreement for that specific particular one single case. And if you are looking at the overall complexity, not just in specific levels (backend, middleware, app, etc.) then you need to be fair and consider all the complexity needed to implement PS's also. If you play the "deep card", then PS's are also subject to the same deep-card rules and get counted in the complexity score. You seem to be arbitrarily cherry-picking what levels you count and don't count to rig the score toward your (wrong) claims.
No, my example isn't a "spot sample". It's illustrative of the general difference in complexity between validating PreparedStatements vs dynamic SQL in every system in which PreparedStatements vs dynamic SQL is a possible consideration. It's of concern to anyone who writes PreparedStatements and/or dynamic SQL on the client-side of a DBMS, which is (for example) most people who develop mobile applications or object-relational mappers or cloud-based applications or custom business applications or database-driven websites.
As very, very few people write DBMSs (I am one of them; I can count the others whom I know of on two hands), the relatively complexity in implementing PreparedStatement parsing and execution vs dynamic SQL execution is irrelevant. (As an aside: Once you're able to produce a parse tree or query plan from SQL, adding support for PreparedStatements is a negligible additional step. If you've designed your DBMS well, it practically falls out.)
You keep claiming it illustrates something general, but give a poor argument for that. To me it is clearly is an arbitrary API choice that is not directly dependent on whether PS's or filters are used under the hood. The alleged dependency between the API style and PS-versus-filters is a "missing link" in your explanations. We seem to be going around in circles and I should give up trying to figure out what hell you are talking about, but so far I can't kick the habit of replying. It's pretty clear our world view of software and abstraction and WetWare is so very different that we don't have enough common ground to use as a bridge between our minds. Seemingly strait-forward words mean something to you so different than what they mean to me. It's like working with an alien from Kolob.
In your work creating custom business applications, don't you frequently construct queries that involve external data?
Yes, and?
Don't those queries often look like the ones in Example-1B?
More often than 1A, yes, but again, we are not measuring on preference or popularity such that frequency doesn't matter anyhow. Actually, I tend to combine the form render and/or validation routines into value-filter-checking such that I tend not to use either of those two styles in practice. (Note: You asked for an example of such combining technique before, and I've yet to supply one.)
That's fine. Do your query definitions consist of static strings concatenated with filtered external values?
Query definitions? Static? It's often a dynamic language; there is no "static". I'm not sure what you mean. Concatenation of filtered "values" with SQL strings eventually happens along the line, but the app developer may not do the actual concatenation (depending on API design choices and special needs).
In a query definition like '"SELECT p, q FROM z WHERE p=" . sqlString($_POST['userinput']);', "SELECT p, q FROM z WHERE p=" is a static string whether the language is dynamically typed or not. Do you not have those?
I'd call it a "literal string".
That's a reasonable thing to call it, but at run-time you can't change "SELECT p, q FROM z WHERE p=" to something else. So it is static, isn't it?
It's a static string, yes.
Good. Then is it true that your query definitions consist of static strings concatenated with filtered external values, even if those query definitions are constructed inside a QBE-like mechanism?
For the most part, yes.
Excellent! Then it isn't necessarily the case that "seemingly strait-forward words mean something to you so different than what they mean to me", as you wrote above. So far, we agree.
Let's assume you have a query defined like that above:
$qry = "SELECT p, q FROM z WHERE p=" . sqlString($_POST['userinput']);
Do you agree that it could be re-written as a PreparedStatement definition that would look like this?
$qry = "SELECT p, q FROM z WHERE p=?";
Example 1A.2
A set of SQL clauses in PHP that use a query API that closely resembles those used for PreparedStatements (PS), but uses filters under the hood to check parameter values against injection strings.
<?php
// Example-1A.2
$prepClauseStatus = "s = 'available'";
$prepStmtDef01 = "SELECT * FROM piddle WHERE p = ?";
$prepStmtDef02 = "INSERT INTO fizzle(x, y, z) VALUES(?, ?, ?)";
$prepStmtDef03 = "SELECT * FROM piddle WHERE " . $prepClauseStatus . " AND p = ?";
$prepStmtDef04 = "SELECT * FROM furgle WHERE " . $prepClauseStatus . " AND q = ?";
?>
Note that the code is highly constrained, consisting strictly of assigning variables. Variables are only assigned string literals, or the concatenation of string literals and variables. We can express that as a grammar. Assume the existence of token definitions <VARNAME> which is a variable name (e.g., $prepClauseStatus), and <STRING> which is a sequence of any characters delimited by '"'. Other tokens are identified by their literal text.
The grammar, in modified EBNF, is:
PHPProgram ::= "<?php" ( Assignment )* "?>"
Assignment ::= <VARNAME> "=" stringDef ";"
stringDef ::= ( <STRING> | <VARNAME> ) [ "." stringDef ]
Note that grammar is the same as example 1A.
Assuming you're using PHP PDO, what method would you call to pass $prepStmtDef04 to the DBMS?
That's an implementation detail; why does it matter? I don't want to dictate arbitrary implementation details in the example. Note that additional (non-PHP-library) middle-ware would likely pre-process $prepStmtDef04 to insert the parameterized values etc.
It matters because it proves my point: Dynamic SQL -- which is what your "additional (non-PHP-library) middle-ware" would have to create -- is inherently more complex than PreparedStatements. If you treat $prepStmtDef04 as a PreparedStatement, you can send it to the DBMS as-is. You wouldn't need any "additional (non-PHP-library) middle-ware" to turn strings containing PreparedStatement-like syntax into dynamic SQL because you could use them as PreparedStatements instead.
See above at "PS's require implementation complexity and filters require implementation complexity." You again seem to be arbitrarily excluding and including the various abstraction levels in your scoring per whim. Please stop doing that; it's bad form.
There's been no "excluding and including". We've consistently assumed the minimal "abstraction levels" needed to get SQL definitions from the client-side to the DBMS, i.e., ODBC, JDBC, etc. These low-level APIs must exist no matter what higher level APIs are employed, and we've consistently noted that higher-level APIs -- such as your PreparedStatement-syntax-to-dynamic-SQL-converter -- are additional system complexity on the client-side.
- It's true the middle-ware may be bigger with filters, but the flip side is that the DB and/or lower level does NOT need PS engines or PS-related API's etc. If we look at the entire stack, the NECESSARY complexity will roughly balance out. We don't need complexity devoted to PS if we use filters.
- PreparedStatements have been a feature of popular SQL DBMSs for over a decade. How likely is it that the proven-secure, proven-reliable, speed-improving (when a query has to be re-executed) PreparedStatement facilities -- which, in a well-designed DBMS, are easier to add to the DBMS than to create a layer that translates PreparedStatement-syntax to dynamic SQL -- are going to be removed by SQL DBMS vendors in favour of using some risky filter-based middleware? That's an unreasonable scenario. This page is based on real situations facing real development, not hypothetical middle-ware and imaginary DBMSs.
- Again again again, were are measuring necessary complexity, not what the market currently does. Red herring and/or memory shot to hell.
- No, we're measuring the complexity of defining SQL PreparedStatements vs dynamic SQL from the client-side, as stated from the beginning and consistently throughout. We're looking at the "stack" that actually exists, using actual DBMSs that already exist, not hypothetical speculations.
- You didn't clearly state that. If you are talking about EXISTING products as they are, then I'll agree with you for the most part. You appear to be waffling in an out of the market/popularity thing (versus inherent features).
- I clearly stated, from the beginning, that this page is about demonstrating that PreparedStatement SQL definitions are simpler to validate than dynamic SQL. Given that PreparedStatements and dynamic SQL are supported by every popular SQL DBMS, why would anything other than existing products need to be considered?
- And I clearly stated that what you show being validated is an API design choice and not inherently tied to the use or choice of PS's versus filters and thus you are measuring something not tied to PS usage but rather API design which is orthogonal to PS usage.
- What's being measured is a logical result of characteristics of PreparedStatements vs DynamicSql, as implemented by SQL DBMSs. Fundamentally, the difference derives from the fact that PreparedStatements are defined and executed in separate steps, whilst DynamicSql is executed in one step. PreparedStatements parametrise the execution step, but there's no equivalent for DynamicSql. Thus, DynamicSql must incorporate filtered external data. PreparedStatements do not need to filter external data. API choice only determines whether the DynamicSql filtering -- assuming DynamicSql is used, instead of PreparedStatements -- goes on inside some middle-ware used by the application or inside the application itself.
- But "what technology X must do" is in this case at an abstraction level potentially below that of application code auditing. Thus, even if it were true (I won't concede that here without caveats), it's irrelevant to the issue of app code auditing. And I do realize your PARTICULAR example it may matter, but that's only a single specimen that cannot be extrapolated to all possible or even actual environments.
- You might not be the auditor, but someone should be. Some developer -- maybe not you -- on the client-side of the DBMS will have to deal with the difference in complexity between defining PreparedStatements (no filters needed) and constructing DynamicSql (must have filters).
- But what needs to be audited does not directly depend on whether PS's or filters are used at a lower level. For example, if a shop's query middle-ware has been used in multiple applications and application categories, it's not likely to rely on form-specific or app-specific I/O and thus doesn't typically have to be audited for I/O "leaks". (All middle-ware should be inspected for general quality, but that's another story.) Only the application code would need auditing for rouge form or URL parameter input and the like. If you think that the middle-ware should also be audited for such, then the Php PS-interface libraries should be subject to the same scrutiny and complexity metrics. How do you know they don't grab input from "$_form['my_secret_form_value']" and append it to certain "direct" SQL statements?
- Everything on the client-side of the DBMS should be audited. What this page demonstrates is that it will be easier to audit PreparedStatement definition construction than DynamicSql construction, because the latter requires filters but the former does not.
- Does "everything" include PHP's PS-interface libraries?
- If by "PHP's PS-interface libraries" you mean PHP PDO -- which is a database interface library; not limited to PreparedStatements -- and PHP's native interfaces like (for example) the Postgresql functions (see http://www.php.net/manual/en/ref.pgsql.php), then absolutely. I know of security-conscious companies that use PHP with PostgreSQL because they're OpenSource and therefore can be audited right down to the OperatingSystem.
- Those use functions/methods, so your grammar score wouldn't be much different. I still can get a strait story of what the top examples are meant to illustrate. I should have given up a long time ago, but I guess I'm a glutton for punishment. Poor me.
- The top examples illustrate the fact that client-side construction of PreparedStatement definitions is simpler (to audit or validate) than equivalent DynamicSql construction, because PreparedStatement definitions don't need filters.
- But "make up for it" by needing PS-related stuff on the client side. I've pointed this out already and you ignored it. You keep ignoring that side of things. It's so annoying when you ignore my replies and repeat the same claim yet again. Filters need filter-related code and PS's need PS-related code. You only showed half the issue and thus cheated.
- What "PS-related stuff on the client side" is needed, other than the invocation of a predefined PreparedStatement with its arguments? Invocation is not at issue here, because we're talking about the complexity of defining PreparedStatements vs constructing DynamicSql.
- And you didn't address the function/method complaint. If the top examples illustrate "simplicity", I don't see any objective metric related to "simplicity", except in terms of (alleged) lack of functions/methods. I want to see an object metric, not just a verbal claim.
- The complexity of the grammar required to verify the absence of external data in PreparedStatement definitions vs the complexity of the grammar needed to verify that DynamicSql contains filtered external data, is precisely such an "object metric", as shown at the top of this page. I'm not sure what you mean (or meant) by "the function/method complaint". It appears to be irrelevant. The only issue of concern is the presence or absence of filtered external data. Is the invocation of filters what you meant by "the function/method complaint"?
- Arrrrrg. You said all client-side (non-RDBMS) code should be inspected. This includes PS-processing/interface API's (such as those shipped with Php). And such PS-processing/interface API's will likely have functions/methods. Thus, your example should count the grammar complexity to process (inspect) such functions/methods, but it strangely doesn't. Problem is simple; why is it tripping you up? I don't get it. Arrrg. -t
- I said all client-side code should be inspected. That doesn't mean I expect all client-side code to be passed through an automated code-checker. I expect re-usable, generic client-side code to be audited only when it's -- rarely, presumably -- written or changed, not for every new application. For such code, automated code-checkers are needless overhead. However, client-side application code needs to be checked every time it's changed, which is likely to be relatively frequently, and it's likely to be tediously lengthy as a result of many query definitions. For such code, automated code-checkers are likely to be of value. That said, the examples at the top of the page aren't intended to be illustrations of best (or any, necessarily) practice for application development. They're primarily intended to be proof that validating PreparedStatement definitions is easier than validating DynamicSql construction. The specifics of particular APIs do not change that simple fact, which is based on the (equally simple) fact that not having to employ external data or filters (for PreparedStatements) is inherently simpler to validate than having to employ filters on necessary external data (for DynamicSql).
- This appears to a big contradiction. If the automated code checker is ONLY inspecting app code, then it can end up inspecting code like Example-1A under PS usage and filter-based usage. What an app-side automated code inspector examines depends heavily on the query API's and Example-1A and 1B are orthogonal to whether PS's or filters are used under the hood. If you are talking about complexity somewhere else (in your indirect meandering way), your example doesn't show that.
- There is no "big contradiction". The core point is that constructing PreparedStatement definitions is simpler than constructing DynamicSql because the former doesn't require filtered external data, but the latter requires filtered external data. (Grammar complexity is used to prove this at the top of the page, and it is noted that we can use the grammar to create automated code checkers, but that's an aside rather than the core point.) If we assume Example-1A isn't defining PreparedStatements but is defining DynamicSql using PreparedStatement syntax, then that core point is still demonstrated: In this case, it's because implementing it would require a code translator to convert PreparedStatement SQL syntax to DynamicSql syntax (with filtered external data, of course), which is obviously more complex than just sending PreparedStatement definitions directly to the DBMS. No matter how you look at it, the fact that PreparedStatement definitions don't require filtered external data but DynamicSql construction does, is going to result in more complexity to construct DynamicSql than PreparedStatements somewhere in the SQL construction system.
- But we don't need to actually build an automated grammar checker for middle-ware, and you appear to agree with this. Even IF the middle-ware (non-app) for filter usage was more complex (which I don't concede by the way), it's not something the scenario you set up would need to be concerned with. You keep flip-flopping between practical-land and theory-land when I try to nail down your example context and scope. Are we measuring app-side complexity, or total complexity, or some mutt in-between? I don't fucking know from your writing. The context and scope jumps all over the place like a goto-loving frog on crack.
- Simply: SQL construction that doesn't require filters is simpler than SQL construction that requires filters. If you disagree, please demonstrate a client-side stack where defining DynamicSql (or "filtered immediate SQL" or whatever) construction is equivalent in complexity, or simpler, than PreparedStatement definition between the point of application programmer's definition to the low-level (i.e., the point where SQL text is sent to the DBMS) DBMS API like ODBC or JDBC.
- See PageAnchor Complexity-Symmetry. And why cut off your scope of concern of complexity at the RDBMS? It appears rather arbitrary to me. (Concern scope is relative to what a reader/user is doing, as already explained.)
- This page is about validating PreparedStatement definitions. The DBMS doesn't produce PreparedStatement definitions, it consumes them. Equivalently, if we were talking about the complexity of writing a Java code generator, we wouldn't be interested in the complexity of a Java compiler. It would be irrelevant.
- By the way, if one is using PostgreSQL or MySQL, the source code of the DB is available for inspection, and this includes code to process PS's.
- True, but PreparedStatement support is a negligible addition to DBMS complexity. PreparedStatement support and DynamicSql are integrated on the interpreter (DBMS) side, so if you're auditing the DBMS, you can't ignore the one you're not using and only look at the other. And, of course, on this page we're talking about the complexity of constructing SQL statements, not the complexity of interpreting them.
- You only claim it's "negligible". And I don't know what you mean by "ignore the one you're not using". Please clarify.
- Yes, I claim it's negligible. By "ignore the one you're not using", I mean you can't separate PreparedStatement and DynamicSql implementation inside the DBMS to the point that you can audit just the PreparedStatement part if you only use PreparedStatements, or only audit the DynamicSql part if you only use DynamicSql. There is one SQL interpreter for both constructs. The only "extra" PreparedStatement code is handling parameter definitions, and PreparedStatement invocation. These are trivial compared to the complexity of the rest of the DBMS. (I speak from experience of writing DBMSs.)
By the way, why would you use PreparedStatement syntax on the client-side with "additional (non-PHP-library) middle-ware" to convert it to dynamic SQL with filters and send it to the DBMS? Why not just send the PreparedStatement definitions directly to the DBMS?
That question is outside of the scope of this topic. I would note that many find the out-of-the-box PS API's difficult to use because positional parameter passing with a large quantity of parameters is an awkward UI for most humans (TooManyParameters). Sure, it can be wrapped to be more like Example-1B (without functions) or keyword-based, but then you'll penalize that stack for having fatter middle-ware.
You can define any query coding you like on the client-side. It doesn't have to look like PreparedStatements as in your example; it could be something completely different and un-SQL-like. However, to work with a SQL DBMS, you're going to have to translate that custom query coding to either PreparedStatements or filtered dynamic SQL. That is always going to involve more complexity than sending PreparedStatement definitions or filtered dynamic SQL directly to the DBMS.
- You only claim it's more complex. You haven't clearly demonstrated it.
- It's self-evident. Given a language S supported by a DBMS, the infrastructure is obviously simpler if we code in language S than if we code in language P and require a translator T(P) → S.
- There's nothing inherently stopping the DBMS from also supporting P. We've been over this already.
- PageAnchor Bundle-Argument
- Except... No DBMS supports P. From the beginning, this page has explicitly compared PreparedStatements and dynamic SQL, which all popular SQL DBMSs support. Why would we include some hypothetical language P that no DBMS supports? It's irrelevant to comparing the complexity of validating PreparedStatement definitions vs dynamic SQL.
- Again, you seem to waffle back and forth between theoretical complexity and "practical" complexity. As far as the way products currently are typically found (bundled) in the wild, I will agree with you on the "complexity" issue. But I do not concede the theoretical side. If you count the total stack from top to bottom, then in-the-field (vendor) bundling does not give you a free ride on the theoretical side. A service not used by the filtering technique should not count against it anymore than hypothetical bundling of positional filtering API's should count against the PS score.
- Where do I "waffle back and forth between theoretical complexity and 'practical' complexity"? I've never mentioned theoretical complexity, except to deprecate your attempts to include it, like here. My measure of complexity is entirely pragmatic -- which is as it should be, because that's what's relevant to developers and security auditors. I've merely (and only slightly) formalised the intuitive observation that string construction consisting only of concatenation of static strings is simpler than string construction consisting of concatenation of static strings and function invocations. (I.e., PreparedStatement definitions vs dynamic SQL construction.) What's theoretical about that, and how have I waffled about it?
- If it's "entirely pragmatic", then one should consider the reasonable possibility that a shop will want to group SQL into functions or classes and/or use wrappers to make using PS's and/or filters easier (both of which will use functions/methods). Excluding functions is not "pragmatic". And if you claim it is from personal experience, then we've reached an AnecdoteImpasse and we can LetTheReaderDecide rather than bicker over what's "pragmatic". I'm confident most readers will find your exclusion of functions silly.
- My exclusion of functions is for the sake of illustration, not a necessity. (Of course, if you are doing automated code validation, excluding functions may make it easier to write code validation mechanisms, but automated code validation is not a requirement.) If you are modularising code through functions, or using them to avoid duplication of code, that's entirely reasonable. However, you will inevitably be writing the same functions for the same purpose whether you're defining PreparedStatements or dynamic SQL, so they're not a distinguishing factor. I have focused strictly on the distinguishing factors between defining PreparedStatements and constructing dynamic SQL.
- But you are lopsided about what distinguishing factors you show and which you don't. Yes, using filters requires concatenation SOMEWHERE as you correctly point out, but PS's require their own PS-only processing that you don't show. You sometimes say that's because the PS's happen at a lower level of abstraction, but concatenation can also: that's an middle-ware or DB engine design choice. As far as your "but it's already built into most RDBMSs" argument, if one accepts that argument, then THAT is the distinguishing factor, not concatenation versus non-concatenation. Your words and your presentation don't jive up such that your presentation is misleading if go with the "built-in" (bundling) argument. Your example makes it look like one needs something outright and the other doesn't need it outright, but the REAL issue is bundling (of PS-related complexity) versus non-bundling (of filter-related complexity). Granted, I'm not quite sure how to illustrate bundling either, but I DO KNOW that you did it wrong. If bundling is the difference maker, then state that clearly and CLEARLY illustrate/demonstrate it in the examples. I see no bundle-ness directly shown anywhere.
- What do you mean by "PS's require their own PS-only processing"? The rest, I'm afraid, I don't follow. Could you explain? E.g., what is a "bundle-ness"? My examples ultimately show a simple, fundamental truth about SQL DBMSs: When using dynamic SQL, you need filters. When using PreparedStatements, you don't. Hence, PreparedStatements are simpler.
- See PageAnchor Bundle-Argument. You seem to be arguing that the mechanisms to process PS's are "bundled" in RDBMS and thus these PS mechanisms shouldn't be "counted" in the complexity score. If we are using filters, we don't need these PS-processing mechanisms. However, you are not crediting filters for that lack of need (less required complexity) because of the "bundle argument". But that's not what you are illustrating.
- The internal complexity of RDBMSs isn't relevant to the complexity of constructing client-side strings containing different kinds of SQL constructs, any more than the internal complexity of (say) a Java compiler is relevant to the complexity of constructing PHP strings containing different kinds of Java constructs. In short, it's irrelevant here. If RDBMS internals were normally included in the audit surface for application development, maybe it would be different.
- I don't know enough about the scope of your claim to verify relevancy. Query API's may also be excluded from an "audit surface" for practical reasons.
- That certain APIs may be excluded for practical reasons doesn't mean they aren't part of the query construction audit surface, it only means that surface is unaudited. What do you mean by, "I don't know enough about the scope of your claim to verify relevancy"?
- Re: "When using dynamic SQL, you need filters" -- If you mean using filters requires filters, well that's pretty obvious. ("Dynamic SQL" doesn't mean anything to me. Stop using it.)
- No, I mean if you're using dynamic SQL, you need filters. I'm using "dynamic SQL" in the usual sense, as described above starting with '"Dynamic SQL" is the standard, and conventional, term.' If it helps, think of it as "SQL queries that don't use PreparedStatements."
- See DefinitionOfDynamicSql.
- By the way, only about 20% of query-related code I see/use resembles Example-1B. Thus, it's not a practical representative.
- What does the other 80% of the code you see/use resemble or look like?
- Like I've said before, often the app-side calls are rolled into CRUD/QueryByExample API's. For example, a DataDictionary-like structure(s) using tables, function calls, or objects may be used. The calls to the filters are built into the data-dictionary object processors. These often produce clauses, such as an UPDATE field set or WHERE criteria clause, that may be plugged into SQL. Thus, most fields and related values are not micro-managed by the app coder: the CRUD/QBE API's do 80% of the grunt work. -t
- Inside your CRUD/QueryByExample APIs, isn't the code ultimately constructing dynamic SQL by concatenating static strings and filtered external data?
- Often I don't know, I just use the API's and never checked. That's the wonderful thing about abstraction: I don't have to know or care and it's swappable with a different approach as long as the results are the same.
As for those who "find the out-of-the-box PS API's difficult to use because positional parameter passing with a large quantity of parameters is an awkward UI for most humans", how does your PreparedStatement-like syntax -- which is translated by middle-ware to dynamic SQL with filters -- solve that problem? If we assume it's there at all, aren't you simply making it worse by creating a PreparedStatement-like API? Wouldn't it be better to create some higher-level API to facilitate constructing PreparedStatements with a large quantity of parameters? That would be worthwhile extra complexity, but converting PreparedStatement-like syntax to dynamic SQL seems like pointless extra complexity.
Sure, we can make all kinds of middle-ware to simplify various aspects of querying and/or code checking and/or code-checking grammars. I've been saying that all along. If vendors provided more API options, perhaps more co's would adopt PS's.
That didn't answer my question. Again: As for those who "find the out-of-the-box PS API's difficult to use because positional parameter passing with a large quantity of parameters is an awkward UI for most humans", how does your PreparedStatement-like syntax -- which is translated by middle-ware to dynamic SQL with filters -- solve that problem? You complain about PreparedStatements, but appear to suggest solving a problem with PreparedStatement syntax by using PreparedStatement syntax.
It's not meant to solve that problem. I was addressing your narrow "test", not solving all IT problems that ever existed. Note that it perhaps is not possible to favor automated code checkers and human coders/maintainers at the same time. These factors, at least, are being juggled against each other:
- RDBMS complexity
- RDBMS swappability
- Middle-ware complexity
- Automated code checking
- Human developer readability/grokkability
- Machine performance
My "narrow 'test'" (by which I presume you mean the grammar comparison) is part of an informal proof; it's neither a coding necessity nor an attempt to solve "all IT problems that ever existed". Whilst it may (or may not) be reasonable to consider your list of factors "juggled against each other", they're irrelevant here. This page exists only to confirm the intuitive observation that it's easier to verify that PreparedStatement definition strings don't include external data than it is to verify that dynamic SQL strings contain filtered external data. Mechanisms that translate PreparedStatement-like syntax into dynamic SQL are either irrelevant if we treat them as PreparedStatements because they're not PreparedStatements, or add obvious complexity to creating dynamic SQL if we treat them as a syntax for creating dynamic SQL.
I disagree with this repetitious summary of already stated repetition, but it appears you play word-games with "dynamic" and shell-games with "complexity" excluding certain levels to rig your metric as you please.
Can you show where I have played "word-games with 'dynamic'" or "shell-games with 'complexity'"? I've simply shown that it's easier to verify that PreparedStatement definition string constructions don't have external data than to verify that dynamic SQL string construction has filtered external data. I find it curious that you don't directly address that simple point.
I've already pointed it out above. Of course, you disagreed, and I disagreed with your disagreements, and you disagreed with my disagreement to your disagreements, etc.
Rather than use vague rhetoric like "... and you disagreed with my disagreement to your disagreements, etc", it would strengthen your argument if you referenced where I have played "word-games with 'dynamic'" or "shell-games with 'complexity'". Otherwise, we'll have to assume your comment is spurious.
You are right, ideally I should put reference markers or PageAnchors, but doing such didn't change anything in past debates except re-trigger prior sub-debates with the same arguments all over again.
Not necessarily. A reference to a prior debate, without new arguments or AdhominemAttacks, need not trigger further debate.
In theory, you are correct. In practice, things end up repeating. I think it's your fault and you think it's mine.
It's Relative
One's acceptance or not of your complexity metric depends on multiple issues. I say LetTheReaderDecide which perspective they wish to consider. Example questions:
- Does one consider a theoretical position or a practical position in terms of "what's already out there"?
- Are you satisfied with your existing query API choices in terms of developer UI-friendliness?
- Is "middle-ware" somebody else's concern or yours?
- Do you intend to inspect middle-ware, or just app-side code for "input" issues?
- Would you accept adjusting or adding middle-ware if it makes app-side code inspection easier?
- Is developer grokkability considered more important than security inspections?
We can't dictate to a reader what their own concerns are; we can only make recommendations, and present and explain tradeoffs as well as possible.
As long as we're asking to LetTheReaderDecide, I suggest that readers be careful not to let your attempt at molehill mountaineering distract them from the simple observation that it's easier to verify that PreparedStatement definition strings don't contain external data than it is to verify that dynamic SQL contains filtered external data. That's because PreparedStatement definitions should consist only of concatenation of static strings, but dynamic SQL must consist of concatenation of static strings and filtered external data, and that's true whether they're created inside or outside some middle-ware.
- Even if that's true, it may not affect the TOTAL complexity score, depending on what parts one counts and doesn't count; and the issue of what one counts depends on their particular perspective or concerns, per above. As far as "molehill mountaineering", projection. Your entire function-exclusion in your grammar metric is the biggest fucking molehilling I've seen in a long time. Usage of functions is a really dumb thing to base on argument over.
- I have consistently focused on the actual -- not hypothetical -- construction of PreparedStatement definitions and the construction of dynamic SQL whether inside or outside middle-ware. I take it that your "TOTAL complexity score" is meant to include the innards of DBMSs, but they interpret PreparedStatement definitions and dynamic SQL rather than (normally) constructing them -- nor are ClosedSource DBMSs normally audit-able -- so they are irrelevant here. It's no more relevant to include SQL DBMS complexity in a measure of the complexity of generating SQL statements, than it would be to include (say) Java compiler complexity in a measure of the complexity of generating Java code statements.
- "Function-exclusion", as pointed out above, is only because you will inevitably be writing the same functions for the same purpose whether you're defining PreparedStatements or dynamic SQL, so they're not a distinguishing factor. I have focused on the distinguishing factors between defining PreparedStatements and constructing dynamic SQL.
- What exactly are these "distinguishing factors" and how do your examples CLEARLY illustrate them? The only semi-objective metric I see depends on parsing functions versus non-parsing of functions.
- The foremost distinguishing factor is that dynamic SQL executes in a single step, but PreparedStatements are defined (including parameters) in one step and executed (with specified parameter arguments) in another. Because dynamic SQL executes in a single step, the only way to include external data is to incorporate it into the SQL itself; in order to avoid SQL injection, any such external data must be filtered. PreparedStatements allow external data to be passed as parameter arguments during the execution step, so external data never need be included in the PreparedStatement definition. Therefore, PreparedStatement definitions need not (and should not) include external data, but dynamic SQL construction must include filtered external data. In short, it boils down to this: PreparedStatements don't need filters but dynamic SQL does. All else being equal -- and the examples at the top of this page are deliberately constructed to make all else equal between PreparedStatements and dynamic SQL, so as not to make dynamic SQL look worse than it is -- not-filters are simpler than filters. The examples at the top of this page provide an illustration and a proof of that observation.
- Please stop using "dynamic SQL". It doesn't mean anything useful or clear to me in this context.
- I'm using "dynamic SQL" in the usual sense, as described above starting with '"Dynamic SQL" is the standard, and conventional, term.' If it helps, think of it as "SQL queries that don't use PreparedStatements."
- See DefinitionOfDynamicSql.
Your bullet points may make for a worthy discussion somewhere, but they're irrelevant here. Your "example questions" are akin to asking "Is it two ducks, or is it two cherries?" or "Is the 4 displayed in Helvetica or Times New Roman?" when presented with a simple demonstration that 2 + 2 = 4.
You arbitrarily declare what's relevant and what's not, based on waffling "rules" you invent in your ass.
That's an AdhominemAttack, rather than an evidence-based point. It only weakens your credibility. It does nothing to strengthen your position, or diminish mine.
It's a summary of points already made. The evidence/reasoning is at the detail.
I see. So you believe I actually "invent in [my] ass" and the evidence/reasoning for that can be found "at the detail"? You're suggesting I have some sort of... Creative miracle ass?
I meant the part of that reply that is not an AdhominemAttack.
What part of that reply isn't an AdhominemAttack?
Let me reword it in a nicer way:
"Your criteria for 'relevant' appears to be contrived and/or inconsistent, based on my best-guess interpretation of your writing".
Obviously, I disagree. Regardless, the essential fact demonstrated by this page -- that construction of PreparedStatement definitions is simpler (and therefore easier to audit) than DynamicSql construction -- still holds.
Bullshit. You cherry-picked both the filter-related API style and the level of abstraction shown and not shown arbitrarily to mislead the reader.
No, my examples are a demonstration of the inherent difference between PreparedStatement definitions and DynamicSql construction. Don't you think string construction that doesn't need filters (or much string concatenation) is obviously simpler to audit than string construction needing filters (and at least one and usually two string concatenations per item of filtered external data)?
You arbitrarily choose what to count as part of the audit and what to exclude. "Simpler" relative to what? Total theoretically complexity, total actual complexity (how parts are currently packaged & sold), complexity from a shop's (RDBMS user) perspective? The answer changes depending on what perspective is being examined.
Actually, no, the answer doesn't change "depending on what perspective is being examined". PreparedStatement definition construction, no matter how you look at it, is simpler to validate than equivalent DynamicSql construction because the former shouldn't incorporate external data or filters, but the latter must incorporate filtered external data. This boils down to a simple, indubitable relation: All things being equal -- and in terms of constructing PreparedStatement definitions vs constructing DynamicSql, they are, because they're just constructing strings -- "filtered external data" is inherently more complex than "no filters and no external data".
PageAnchor complexity-symmetry
Like I keep saying and you keep forgetting: Yes, filters will require "extra" filter-related code to serve filter-ness, but likewise PS's will require extra PS-related code to serve PS-ness. Filters don't need PS-related code and likewise PS's don't need filter-related code. You count one but make up very poor or contradictory excuses to exclude the second.
I know you keep saying it, but your claim that "PS's will require extra PS-related code to serve PS-ness" is incorrect. PreparedStatement definitions can be sent directly to the DBMS, because the DBMS is designed to recognise them and no further processing is required. DynamicSql must be constructed with filtered external data before being sent to the DBMS. Filters are more complex than no filters. QED.
That's the "PageAnchor Bundle-Argument" argument. If bundling is the big key, then illustrate bundling, not string concentration and functions-versus-non-functions etc. Illustrate the real issue, not side word games. A grammar metric is sooooo distantly related to the bundling argument as to likely confuse the reader.
The grammar metric was only intended for you. I don't know anyone else who is likely to be "confused" by the differences between PreparedStatements and DynamicSql.
Well, it's helping neither me nor the reader. I still have no fucking idea what you are talking about. You keep flip-flopping between practical concerns and theoretical concerns willy-nilly, apparently to artificially justify your contrived, round-about, and obtuse arguments by moving the goal posts. I'll try to give up for the fourth time. Must...resist...urge...to...reply...
This much is clear to me:
IF measuring theoretical complexity THEN
complexity score roughly the same because:
PS's require PS-related-code and filters require
filter-related-code SOMEWHERE in the total necessary stack.
ELSE // practical complexity
complexity score depends on the concern level/scope/jurisdiction
of a particular user/reader (per RDBMS, middle-ware,
and app side/level).
END IF
PageAnchor Bundle-Not-Theory
And note I exclude the "bundling argument" from the theoretical side because how vendors bundle services is a practical-level concern, not a theoretical one because no known related theory dictates what services are bundled or not bundled. There is no inherent law of the universe that says vendors must bundle PS-related services with RDBMS, and tomorrow they may decide to bundle in filter-related services also.
I find it rather baffling that you have no idea what I'm talking about, because it's quite simple: If you're creating PreparedStatement definitions, you don't need filters or external data. If you're creating DynamicSql, you need external data and filters around it. It seems trivially obvious that the former, therefore, is simpler than the latter.
Let's look at it notationally:
PreparedStatement definitions can be created from static strings -- let's call any static string an S -- and concatenation of static strings, e.g., S + S + S, where '+' means concatenation. So, every PreparedStatement definition -- whether inside or outside middleware -- is going to be S, or S + S, or S + S + S, or S + S + S + S, and so on. To save us writing enormously long expressions, we'll use (x)* to mean "x is repeated 0 or more times". Thus, a PreparedStatement definition is S (+ S)*. In other words, a PreparedStatement definition consists of a static string with 0 or more static strings concatenated onto it.
DynamicSql can be created from static strings too, but we also need to concatenate instances of external data D, and we need to filter every D with some filter function F. So, DynamicSql definitions can be S, or S + S, or S + F(D), or S + S + F(D), or S + F(D) + S + F(D), and so on ad infinitum. In general, that means DynamicSql is S (+ F(D) | S)*, where x | y means "either x or y". In other words, DynamicSQL consists of a static string with 0 or more static strings or filtered external data concatenated onto it. (We'll start with an S rather than (F(D) | S) because we'll assume DynamicSql normally starts with a static string like "SELECT", "INSERT", "DELETE", "UPDATE", etc.)
Trivially, we can see that S (+ S)* is simpler than S (+ F(D) | S)*, and thus PreparedStatement definitions are easier to audit than DynamicSql construction.
That's all this page is intended to show.
But you are focusing only on the static "portion" of PS technology. The entire stack when using PS's requires more than just the static portion because "values" must also be processed SOMEWHERE (to be equivalent in results). You are comparing apples without stems to pears WITH stems.
As I've done all along, I am only focusing on PreparedStatement definition on the client-side of the DBMS. Note the title of this page, the first paragraph, and the fact that "PreparedStatement definition" shows up on this page -- as of this writing -- at least 113 times. This page was spawned in a discussion over whether or not PreparedStatement definition construction is easier to audit for SqlInjection opportunities than DynamicSql construction.
It is in SQL construction that errors leading to SqlInjection are likely. PreparedStatement or DynamicSql execution is far, far less likely to result in SqlInjection. Furthermore, the difference in complexity at the point of execution is negligible: PreparedStatement execution requires a reference to a predefined PreparedStatement (often a name, if not disguised behind an opaque handle) and a list of arguments which do not require filters. DynamicSql execution requires a reference to the SQL to be executed. Since both PreparedStatement execution and DynamicSql construction make reference to external data, the references to external data cancel out overall. Thus, the only overall difference in client-side complexity is "no filters required" (PreparedStatements) vs "filters required" (DynamicSql).
Filters are obviously more complex than non-filters.
Here's a more accurate view:
.........DBMS......Middle-Ware...App
-----------------------------------
Filters: DBE.......FM,FAPI.......AC
PS's:....DBE,PSM...PSAPI.........AC
AC = App Code
DBE = Database engine
FM = Filter-processing mechanism
FAPI = API for using filters
PSAPI = API for using PS's (PS = PreparedStatements)
PSM = PS-processing Mechanism
Looking at the entire stack, they both have the same number of necessary "parts": four. If there is an inherent complexity difference between PSM and FM, the two key parts, you have NOT identified it.
The inherent complexity difference is the requirement for filters, which (if we're looking at the entire client-side stack; the server-side is not relevant) consists of both the filter definitions and their application. Both DynamicSql and PreparedStatements will make reference to external data (in DynamicSql such references are made at the point of definition; with PreparedStatements such references are made at the point of execution) but only DynamicSql requires that the external data be filtered.
By the way, counting architectural modules -- which is what you appear to be doing -- is meaningless in terms of measuring overall complexity, because it makes the mistaken assumption that all modules are equal in complexity.
Maybe when you say "prepared statement" you mean literally the "prepared" part of the statement, which excludes value processing. I was generally considering the full picture, not one part. If you are slicing it as such, I don't know what the equivalent "filter" version would be at this point. It's not normally thought of that way, and is an awkward comparison set to try to match up.
The examples at the top of this page illustrate that. They show the "full picture", aside from actual execution of either PreparedStatements or DynamicSql because the intent was to illustrate creation of the constructs. The difference in execution (an argument list vs no argument list) is negligible from an auditing point of view.
[Why are you counting PSM twice on the PS side? It's part of DBE. Or maybe you want to split the DBE into three parts, a core common to PS and DSQL, a PSM, and a DSQLM. In that case, you should add a DSQLM to the Filters side. You should also add a DSAPI to the Filters side that corresponds to the PSAPI.]
[DSQLM = Dynamic SQL processing mechanism]
[DSAPI = API for using Dynamic SQL]
PS technology requires "direct" SQL processing
also. Remember, one can feed an HTML form's contents directly into a PS (without using "?" parameters). True, it's not a recommended practice, but still must be supported to match expected PS behavior and thus is counted as part of its required component stack. Thus, PS technology is essentially a super-set of what you call "dynamic SQL". One can process all "direct" SQL through the PS API's if they wanted, and perhaps under the hood it's the same processing paths in the DB engine (perhaps with some wasted overhead of caching, etc.).
[There would need to be something in the DSQLM section. It has to return a result set immediately when the statement is sent, while the prepared statement does not.]
That's a minor implementation detail. One can view a direct statement as an "anonymous PS" (like HOF's, he he), a shortcut for: run_PS(register_PS($full_sql),$empty_array); instead of the usual $handle=register_PS($full_sql);run_PS($handle,$empty_array); [pseudo-code]
[So now you're saying that the filters side in the database would be DBE, PSM, and DSQLM. That doesn't make things any better for your case.]
Say what? Both direct SQL and PS's require the ability to process a "full" SQL statement (with values) such that it's not a difference maker. However, PS's also require caching mechanisms, value-to-statement matching (tracking), and interfaces to middle-ware to access such features.
[When trying to defend excluding the DSQLM from the filter side, you created a simple DSQLM which used the PSM which used the DBE. You say that in your response that starts, "That's a minor implementation detail.". Since you are trying to measure the complexity of the whole system, you now have to include DBE, PSM, and DSQLM on the filters side since you claim they are being used.]
That's one approach, but it's not the only approach. I'm not dictating implementation here. A full PS "kit" still requires the ability to process full/direct SQL such that processing full/direct SQL is not a difference maker. Caching it under a name or handle does not change that need. (And managing the cache and names/handles is added complexity of PS's.) How one goes about the implementation is up the DB implementer. The bottom line is still the same: PS's require PS-specific complexity that filters don't need and filters require filter-specific complexity that PS's don't need. I have not seen anything new from you that changes this fact.
- [Actually, you did dictate an implementation in your response that starts, "That's a minor implementation detail.". We also started off treating the DBMS part of both as equivalent, but you objected to that.]
- No, it's a suggestion ("one can view...as"), not a decree.
- [Then how does it show that you don't need to count the DSQLM on the filter side? Wasn't that what you brought it up for?]
- Count as a difference-maker or count as a total necessary for the stack? Both require some way to execute direct/full SQL. (Non-parameter-ized. I reject your "dynamic" usage.)
- [Count in the comparison you are making at the start of this section.]
- Based on past patterns, something tells me we won't agree on the meaning/scope of the labels used for the sub-parts and we'd have to implement entire working samples to get around the English Gap so that we are counting code bulk instead of parsing words. I don't think I want to go there. I'll LetTheReaderDecide which argument is more sound.
- [I've been trying to use the terms you use for this. Now, are you going to answer the question about how your suggestion (not a decree) shows that you don't need to count the DSQLM on the filter side?]
- I'm not sure what you mean. Like I already said, processing direct/full SQL is required by both technologies (even if you personally don't use it) such that it's not a difference maker in complexity scores. It's a "part" BOTH need. If I still didn't answer your question, I'm not understanding your question.
- [In your suggestion, you not only included a non-empty DSQLM, you also brought in the entirety of the PSM. Neither of those things were included in your original analysis. But, for some reason I can't fathom, you seem to think that because you can use the PSM code and some other code to implement the filters side, it should be left out of the complexity analysis. So why do you think your suggestion indicates that the DSQLM should be left out?]
That's trivially true inside the DBMS and inside the low-level DBMS APIs, because PreparedStatements and DynamicSql are distinct SQL constructs. However, unless you're a DBMS developer or security auditor, it's irrelevant. This page is focused on the complexity of constructing SQL text, not on interpreting it. When constructing SQL text, the fundamental complexity difference between constructing PreparedStatement definitions and constructing DynamicSql is that the latter requires filtered external data and the former does not.
see PageAnchor complexity-symmetry
At the DBMS API level, PreparedStatements require definition and execution calls, DynamicSql requires just execution calls. Thus, the only complexity that PreparedStatements add over DynamicSql -- in the client-side query construction and execution stack, overall -- is one extra low-level DBMS API call. That's trivial compared to the complexity of filtering external data on every DynamicSql query construction plus the complexity of the filters themselves. Of course, that's irrelevant here, because this page -- as has been often stated -- is about the complexity of constructing of PreparedStatement definitions vs the complexity of constructing DynamicSql. What APIs are used is irrelevant.
"Constructing" by who or what? R2D2? I still don't know what "complexity" you are talking about; you keep waffling up and down the abstraction ladder between DBMS guts, and the application side, and every stop in between. Yes, implementing filters requires complexity, but so does PS implementation. A clear and reasonably-general illustration would help. Your existing examples seem to revolve strangely around functions and seem arbitrary API design choices to me. You are frustrating to communicate with; very indirect and round-about and use English in peculiar ways. -t
"Constructing" by programmers, of course. The complexity I'm talking about is exemplified by the following "clear and reasonably-general illustration", which is representative of either application-level code or what goes on inside middle-ware:
// Dynamic SQL query in PHP assuming PostgreSQL
$qry = "SELECT * FROM customers WHERE name = '" + pg_escape_string($userInput) + "'";
$resultset = pg_query($dbconn, $qry);
// Prepared statement query in PHP assuming PostgreSQL
$qry = 'SELECT * FROM customers WHERE name = $1';
$resultset = pg_query_params($dbconn, $qry, array($userInput));
The above is rather poorly constructed from an application development point of view (freely mixing SQL and application logic is questionable) but it illustrates the point of this page: It's easier to determine that $qry in the second example doesn't reference external data -- because it doesn't need filters and shouldn't reference external data -- than it is to determine that $qry in the first example always wraps external data (i.e., $userInput) in pg_escape_string() or some equivalent. That's because the first $qry assignment is always more complex than the second, because the first always needs filtered external data, but the latter should never contain external data. There's nothing arbitrary about it -- it's a characteristic imposed by the DBMS, by the fact that DBMSs allow two different kinds of SQL constructs, PreparedStatements and DynamicSql, to be defined and used differently.
If you are talking about the app programmer side, then complexity may not be the primary metric, for maintainability and grokkability of the code is usually the key concern.
This page was spawned from a discussion that assumed the primary concern is avoiding SqlInjection, for which PreparedStatements are preferable not only because they avoid risky filters, but because they're easier to audit (because they don't need to reference external data or filters). If avoiding SqlInjection isn't the primary concern, but the primary concern is ensuring that accesses to the DBMS are minimised, than DynamicSql is obviously preferable. If the primary concern is minimising CPU resource for multiple executions of a given query, than PreparedStatements are almost inevitably preferable. If the primary concern is simplicity and readability of query definitions, then PreparedStatements are almost inevitably preferable.
Again, if we are only auditing the application code via automated code inspectors, then the "score" depends on the API style used, not the underlying injection prevention technology. You appeared to agree that such an automated code checker would only be used on app-side code, not middle-ware nor DB implementation code. As far as the last sentence, I disagree for reasons already given, but I'm not here to debate grokkability and so will not repeat those points.
"Automated code inspectors" were an aside; they weren't the point. The point is that construction of PreparedStatement definitions is simpler than construction of DynamicSql. You can see it immediately above in the difference between the two examples: The first $qry assignment requires filtered external data but the second $qry assignment does not. Hence, the second is simpler than the first. The illustrations at the top of this page are simply a general proof of that observation, using grammar complexity and assuming we've put all the "$qry = ..." definitions together by themselves. If you do put all the "$qry = ..." definitions together by themselves, I noted (as an aside) that you can easily use automated code inspectors, but it's not a requirement. The point is that no matter how or where you inspect the client-side code -- whether manually or automatically, whether inside middle-ware or not using middle-ware -- PreparedStatement definition construction will be simpler than DynamicSql construction because the latter requires filtered external data but the former does not.
I again shall refer you to PageAnchor complexity-symmetry.
How does "complexity-symmetry" affect auditing the construction of PreparedStatement definitions or DynamicSql?
If it's an "aside", then you've failed to illustrate and measure your complexity claim.
Please read the top of this page. You'll note that my complexity claim is that DynamicSql construction is more complex than PreparedStatement definition construction because the former requires filtered external data but the latter does not. This is demonstrated by a difference in grammar between the former and the latter -- the grammar for code that constructs DynamicSql needs to include code for filtered external data, whilst the grammar for code that constructs equivalent PreparedStatements excludes (by omission) code for filtered external data. The fact that you can use such grammars to automate validation of code was an aside (and a bit of a bonus, perhaps) but it isn't the point. The point is that one SQL construct needs filtered external data and the other doesn't.
While I will concede that certain parts of filter technology may be more complicated (perhaps parts that don't even matter to most developers or shops because they can be abstracted away), I've seen no clear evidence that the total sum complexity is less. You appear to be improperly extrapolating cherry-picked aspects into the whole, which to me is being intellectually sloppy. Shape up! It comes across similar to:
"Bicycles require bicycle seats while cars don't.
Therefore, bicycles are more complicated than cars."
Now, maybe in the end the total complexity for filters is indeed more than that of total PS technology, but you have provided very little encompassing evidence other than a narrow selection of pet parts you keep obsessing on. -t
What I "keep obsessing on" is the crucial and inescapable difference: requiring filters vs not requiring filters. Between PreparedStatement definitions and DynamicSql construction -- including the low-level APIs to send SQL to the DBMS and retrieve result sets or response codes -- there is no other difference in complexity.
- PROVE there is "no other difference in complexity", don't just claim it. Claims are cheep. (I suspect you are going to go back to the "bundling" argument. That's fine, but if bundling is your key argument, then say so and admit it's about vendor packaging choices instead of INHERENT differences between the two techniques. Commit and don't be waffly.)
- Have a look for yourself: http://www.postgresql.org/ftp/odbc/versions/src/ and http://www.postgresql.org/ftp/source/v9.3.4/ in /postgresql-9.3.4/src/interfaces/libpq/ Of course, these are really parts of the DBMS, and are unrelated to the construction of DynamicSql or PreparedStatement definitions. This page is fundamentally about differences in complexity between constructing DynamicSql strings and constructing PreparedStatement definition strings on the client-side, not how they're sent to the DBMS.
- I'm not sure what to look for in the source because I don't know what your claim really is, especially in terms of scope. I'm not going to spend 5000 hours digging in source and cataloging it to prove X when likely you'll end up saying, "but I never claimed X". Do you mean "constructing" by application developers? They can be using Example-1A style construction if they want (and have the middle-ware to support it). And the fact that some parts are in middle-ware versus the main RDBMS is an arbitrary packaging choice: the position/level is not inherently forced by the nature of the two technologies themselves. There is no inherent feature of filters that keeps much of them from being implemented in the out-of-the-box RDBMS. I could fork PostgreSQL into FilterGreSql? and add native filtering API's/services to put the filter services at the bottom level of the stack. It's just arbitrary packaging. And I could remove PS-related services to reduce the code size (and complexity). Support for filters adds complexity somewhere and support for PS's adds complexity somewhere; and that "somewhere" is pretty arbitrary. A total stack that doesn't support filters will be smaller than a total stack that does support filters, and a total stack that doesn't support PS's will be smaller than a total stack that does support PS's.
- My claim is that it's more complex -- and therefore more difficult to audit -- construction of DynamicSql (because of needing filters) than constructing equivalent PreparedStatement definitions (because they don't need filters). Unless your FilterGreSql provides some new method of DBMS interaction that isn't DynamicSql or PreparedStatements, DynamicSql is still going to need to include explicit invocation of filters for external data -- and PreparedStatements won't -- even if the filters are (in some fashion) provided by the DBMS.
- I again shall refer you to PageAnchor complexity-symmetry. You are arbitrarily cherry-picking your scope of analysis. Knock it off!
- How does "complexity-symmetry" affect auditing the construction of PreparedStatement definitions or DynamicSql?
- I'm still not fully sure of what you are calling "construction" and "dynamic SQL". The answers seem to dance between very low level and very high level of abstraction depending the flapping of some unknown butterfly in an unknown belfry.
- See Example-1B at the top of this page for a typical example of DynamicSql construction that would need to be audited.
- That's not "typical" in my book. And even if it was, nobody is stuck with that API or coding style.
- It may not be typical in terms of your (or anyone's) particular coding style, but it is typical in terms of the content involved in constructing DynamicSql. SQL DBMSs invariably expose an API method that permits submitting DynamicSql text for immediate execution -- regardless of API or coding style -- so constructing DynamicSql text for immediate execution must involve concatenation of static strings with filtered external data. That's what Example-1B is intended to show.
A better analogy (to the extent that any analogy is comparable) is that cars with automatic transmissions (PreparedStatement definitions) are easier to drive (audit) than cars with manual transmissions (DynamicSql construction) because cars with automatic transmissions don't need a clutch (filters) to change gears (external data).
But the total manual-shift car is mechanically simpler: it has fewer total parts. The "complexity" is matter of perspective (car versus user effort), and I don't know what perspective your PS claim is coming from, and attempts to narrow that down keep proving futile for reasons that escape me.
Fewer parts is irrelevant. Using the car analogy, we're talking about driving complexity (filters vs non-filters), not mechanical complexity (DBMS internals). What we're talking about here is the complexity of having to deal with filters when using DynamicSql that is eliminated when using PreparedStatements. From the point of view of a middle-ware or application developer (or auditor), that is significant.
Why are you counting middle-ware complexity but not DBMS complexity? Because it's "bundled"? You can only exclude DBMS-side complexity if you can clearly justify the reason for drawing such a seemingly arbitrary line in the stack-sand.
The focus of this page is, and has always been, the complexity of constructing and (particularly) auditing the construction of DynamicSql vs PreparedStatement definitions. The DBMS interprets these statements, it doesn't create them. Obviously, creating SQL is particularly of concern in avoiding SqlInjection, and discussing that topic in relation to PreparedStatements is what spawned this page. Whilst auditing DBMSs is certainly a concern for a relatively small number of security-critical applications, it's not an area that the typical database-driven application developer, or the typical middleware developer, is likely to encounter. For most database-driven application or middleware developers, auditing the client-side is a valid concern; auditing the DBMS is not.
Furthermore, within a typical SQL DBMS there is no strong separation of PreparedStatement handling from DynamicSql handling. In SQL DBMSs, these are largely integrated, such that an auditor is going to be looking at the security of the DBMS as a whole. Thus, it no longer makes sense to distinguish PreparedStatements from DynamicSql -- at least in terms of auditing complexity -- on the DBMS-side.
"As a whole"? Please elaborate.
An application mission-critical enough to warrant a source-level audit of a (likely enterprise-grade) DBMS is almost certainly going to audit the whole DBMS, and not just stop at (say) the part that recognises PreparedStatement parameters.
Anyway, auditing a DBMS -- relative to auditing application code or middleware -- is such a specialised, rare and unusual task as to be essentially irrelevant here. This page was spawned from concerns over avoiding SqlInjection related to PreparedStatements and DynamicSql, which is a concern every time a database-driven application is built. Concerns about a DBMS sufficient to warrant auditing the DBMS source is a whole different category of activity, with different motivations, and it almost never happens. For instance, auditing the DBMS is almost certainly not about SqlInjection (or at least it's a negligible concern) because SqlInjection results from constructing queries; it does not result from parsing, planning, and executing them.
The only level we both agree on is the app level. Doing middle-ware and/or DB source inspection is up the organization per needs and resource trade-offs. And app level depends heavily on the API design, which is mostly orthogonal to the PS-versus-filters issue. QED #47.
The style of coding at the application level -- or what happens inside middle-ware -- may depend on the DBMS API design but the content of constructed queries is dependent on what functionality the DBMS exposes:
- If the DBMS exposes a mechanism to support immediate execution of SQL text (i.e., DynamicSql), then the SQL will inevitably be composed of static strings concatenated with filtered (to be safe from SqlInjection) external data.
- If the DBMS exposes mechanisms to support definition and execution of pre-defined parametrised SQL text (i.e., PreparedStatements), then the SQL definitions will inevitably be composed of one or more concatenated static strings, but external data should be excluded from the definitions because external data (no filters needed) should be passed as parameter arguments at the point of execution.
That will be true at the low-level DBMS API, no matter what API design is used, as long as the aforementioned DBMS functionality is exposed. (Note: Every popular SQL DBMS exposes it via ODBC, JDBC, native APIs, etc.) It will be true no matter what higher-level API design is used on top of the low-level DBMS API, though the difference between DynamicSql and PreparedStatements may only be visible inside the higher-level API. ObjectRelationalMappingLayers, for example, may not expose PreparedStatements or DynamicSql (or any SQL at all) but the developer of the ObjectRelationalMappingLayer will have had to deal with either or both.
In short, whoever interacts directly with the DBMS will have to tackle the difference in complexity between constructing DynamicSql or PreparedStatement definitions, because they're invariably what the DBMS exposes to the client-side.
AprilFourteen