Odd Hybrid Arrays

Languages such as PhpLanguage and JavaScript seem to have odd array implementations. They are like a hybrid between arrays and lists. Associative arrays in PHP are generally maps, but also have an ordering. However, one is recommended not to rely on this ordering in some situations.

Is this bad language design, a performance-enhancing compromise, or just plain nonsense?

Sounds to me more like a misunderstanding. How about a reference?

In general in all languages that offer associative arrays, either the nominal traversal order is essentially unrelated to the insertion order (extremely common, and perfectly reasonable, and one needs to sort or do a special traversal method or maintain a secondary index or key set or something if one wants insertion order as well as associative lookup), or else the documentation explicitly says that the traversal order is guaranteed to be the same as the insertion order (relatively rare). In the absence of an explicit guarantee, one should never rely on the two orders being the same, even if some test case seems to show that the insertion order was preserved in some particular case.

I don't see how it could be more complicated than that. Perhaps you misread some documentation, or perhaps someone wrote some misleading documentation that should be corrected. Hard to say, without seeing the quote(s) in question.

I mean, what, is some language designer going to say "the insertion order isn't preserved, but you should go ahead and act like it is as long as you don't mind the fact that that's false and you'll have bugs"??? The question seems to practically answer itself. Something's either guaranteed or it's not.

http://www.php.net/manual/en/language.types.array.php http://www.ecma-international.org/publications/standards/Ecma-262.htm (page 100)

Basically, arrays in these languages are always dictionaries. (In JavaScript and PHP5, objects are dictionaries too, which lets you iterate over the names of the methods, a convenient way to deal with the lack of documentation for many DOM objects.) However, the keys can be either strings or integers. String keys give normal dictionary behavior. Integer keys give sequence behavior. Setting the last element ($arr[] in PHP) gives it a key equal to the largest numeric key + 1.

I couldn't find anything about the iteration order in the PHP manual, but given the existence of the asort/arsort/ksort/krsort functions and their user comments, I assume there's an internal iteration order that's initially determined by the order of insertion and can then be manipulated through the sorting functions.

In practical usage, it's a mixed bag {pun?}. There're some neat tricks you can pull with it, eg. many DB abstraction layers give you back a result set indexed both by column position and column name. But this can also lead to hard-to-find bugs: if you try to iterate over that result set, you'll get twice as many fields as you expect. It does add to the generally kludgey feel of the language, but that's about par for PHP.

That is just done via duplication isn't it? JavaScript allows one to index something by both name and positional index, but I don't think that is done by value duplication. What exactly it is I don't know. A "lap"? (list map)

Understand, every time you think "I can't imagine a language designer doing that", chances are that the PHP designers have done that. See for example the "Why is $foo[bar] bad?" discussion on that PHP manual page. -- JonathanTang

It is my understanding that PHP arrays are more than just a map. I will see if I can find some documentation to quote.

[Not to defend PHP too much, but the example that Jonathan cites is less an indictment of PHP in particular, and more an example of why well-intentioned DWIM (Do What I Mean) "friendly" features can cause more trouble than "unfriendly" strict interpretation in languages, which in turn shows why we are likely to always have programming languages rather than just telling computers in English what we want. Quote from above URL:]

"You should always use quotes around a string literal array index. For example, use $foo['bar'] and not $foo[bar]. But why is $foo[bar] wrong? You might have seen the following syntax in old scripts:"

 <?php
 $foo[bar] = 'enemy';
 echo $foo[bar];
 // etc
 ?>  

"This is wrong, but it works. Then, why is it wrong? The reason is that this code has an undefined constant (bar) rather than a string ('bar' - notice the quotes), and PHP may in future define constants which, unfortunately for your code, have the same name. It works because PHP automatically converts a bare string (an unquoted string which does not correspond to any known symbol) into a string which contains the bare string. For instance, if there is no defined constant named bar, then PHP will substitute in the string 'bar' and use that."

"...When you turn error_reporting() up to show E_NOTICE level errors (such as setting it to E_ALL) then you will see these errors. By default, error_reporting is turned down to not show them."

[This sort of "friendliness" is quite common in various scripting languages. Part of the suggested fix, turning up the volume on warnings, is also best practice in many other languages such as Perl and C.]

As a "scripting fan", I'm generally against the above design. Scripting languages don't "have to" have leaky features like the above. If it offered a sufficient benefit to counter the down-side, I might accept it, but I don't see it here. But then again, the pattern and frequency of my mistakes often differ from that of others. Perhaps we need an InstantLanguageForm for just scripting/dynamic so I can click a few buttons and get a language that fits me and only me. -t

From the PHP documentation (re-spaced):

  $arr=array(5 => 1, 12 => 2);  // sequenced initializer
  $arr[]=56;    // This is the same as $arr[13]=56; at this point of the script
Although this "auto-max" can be a handy shortcut, it can also be very confusing if one forgets about it. If you use one and only one language, then one gets use to it and is not thrown by such when encountered in existing code or via typos. However, in many shops, especially older businesses or ones merging and un-merging often, one switches between multiple languages, making it hard to master any one and keep track of the oddities of each.

Reminds me of a saying: In static-typed languages, a single random character change will usually stop the compiler. In a normal scripting language, such a change would make it produce the wrong output. In Perl, it would make it a whole different program. -t


See also: MergingMapsAndObjects


EditText of this page (last edited March 30, 2011) or FindPage with title or text search