Every language has "gotchas". These are oddities that stump you for a while when you first encounter them. This is not meant to be a HolyWar about which language is better, but rather a catalog that helps experienced developers transition to different languages more quickly, by pointing out things which don't behave like one would expect compared to other languages. It is kind of the DeltaIsolation method of learning. Generally, languages have their own pages if you want to complain about the reasons for given features. (Feel free to add language-specific links.)
PhpLanguage
- "Optimized functions" that don't behave like full functions. For example, the "empty()" function cannot take expressions, only single variables. The workaround is to assign to an intermediate temporary variable or create your own wrapper function. (Version 4)
- you can override static functions but not static variables(php5)
- Using the return result of some functions can result in problems if a zero is returned because it is difficult to tell the difference between zero and false. One must use 3 equal signs or check the type. Examples of such functions include strtok, [forgot, come later].
- One quick fix is to append a blank space at the beginning of a string, which emulates one-based positional indexing.
- (This could have been avoided by using one-based indexing, which is more natural than zero-based in my opinion for most positional stuff.)
- Arrays have internal current element pointers, and you can forget to reset() them and wonder why you are missing elements when iterating over an array.
- mktime() has been "fixed" from its C counterpart (month is index from 1), but localtime() has not been
- Variable references can cause memory corruption if you declare a function as returning a reference and return a constant, and also in other cases which are not syntactically preventable (or even really detectible by the interpreter)
- Biggest of all, IMHO: $_REQUEST[] $_SESSION[] $_SERVER[], ob_start(), and all sorts of constructs which make writing orthogonal code require a lot of mindfulness, since all of your code is essentially tied to the interpreter's state.
- I wrote my own wrapper functions for these so that I only have to change one spot (in theory).
- array('foo', 'bar') + array('baz') == array('foo', 'bar'); array('baz') + array('foo', 'bar') == array('baz', 'bar'); So use array_merge(), not + or +=
JavaScript
- Be careful how you compare variables. You may have to append a blank or add a zero to force the internal "type" flag to be what you want it to be, otherwise you may accidentally compare a string as a number and visa versa.
- Never use Date.getYear . Use Date.getFullYear instead.
- the concatenation operator is a plus sign, so 5 + 3 may very well end up being 53. ParseInt?(5) + ParseInt?(3) is the workaround.
- ParseInt? can return octal results sometimes. Don't forget base parameter. See DebuggingNightmare.
PythonLanguage
- default values persist in each call. I mean this:
>>> def f(default=[]):
... default.append(10)
... print default
...
>>> f()
[10]
>>> f()
[10, 10]
>>> f()
[10, 10, 10]
- it has a DoubleBacktick? construct, like shell, perl and others, but while:
foo=`bar`
-
- means get the output of command bar in other languages, in python it is equivalent to
foo=repr(bar)
- on a l-value you can call an operator but not a method (not a big thing anyway)
- builtin methods have an always different id:
>>> a=10
>>> id(a.__xor__)
10263568
>>> id(a.__xor__)
10263504
>>> id(a.__xor__)
10262672
CeePlusPlus/CeeLanguage
- Confusing assignment with test for equality - especially code such as if (a=b) rather than if (a==b), or a vacuous statement such as a == a + b (rather than a = a + b). You can protect against the former where a constant is involved by deciding to CompareConstantsFromTheLeft.
- * What language doesn't have this property? I only know VB that use "=" for both assignment and comparison
- Returning pointers to TheStack. i.e.
char *int_to_string (int i)
{
char buffer[256];
sprintf (buffer, "%d", i);
return buffer; /* Evil and nasty */
}
-
- Often works initially, until buffer is overwritten by some other function's stack frame. This should be an obvious DoNotDoThis? "(ThingsYouShouldNeverDo --MarkLaBarbara)", but I find this sort of stuff in code reviews far more often than I should. A common newbie mistake.
- pointer = realloc( pointer, some_num ). If it fails for some reason, it returns NULL, and you have lost the pointer to your original memory, which is not freed. In fact, the whole idea that you have to manage your own memory.
- Of course, you can do pointer2 = realloc(pointer,some_num); if for some reason you need to use realloc.
- And, don't forget to check if pointer2 is NULL, and take appropriate action. A reasonably robust call to realloc looks like this:
void *pointer2 = realloc( pointer, some_num );
if (pointer2 == NULL)
do_something_about_the_failure(); /* usually try to crash gracefully */
else
pointer = pointer2; /* and don't forget to update other pointers that are supposed to point to the same data... */
- The CeePreprocessor. If you are using CeePlusPlus, you can mostly avoid the pre-processor. If you are using CeeLanguage, avoid function-like macros where possible (use functions instead), and avoid the token-pasting operator ##.
- Never use gets(). Use fgets() instead, if you're using ANSI C, or ____ if you're using C++. GetsIsDangerous.
- See also MemoryLeakInCpp
- Is C (and C++) the only language where the comment start token is also a part of a valid statement? (e.g "a/*p" is a valid expression!). Caught me out when I was hand-optimizing the first C compiler (and first significant C program) I wrote - I'd changed p[0] to *(p+n) because it was faster with the compiler I had...
- Surprising operator precedence, for certain combinations of operators.
This code really, REALLY looks like it should compare zero to certain bits from a:
if (a & MASK == 0) ...
but instead it parses as horribly broken nonsense, because == is evaluated before &:
if (a & (MASK == 0)) ...
So you must explicitly add parentheses to get the desired effect:
if ((a & MASK) == 0) ...
(I'm curious whether anyone can explain why these operators parse this way. Is there ever a situation where it's helpful?)
''Dunno about C++, but in C# besides the bitwise usage you can use the & with logical operations as a non-shortcircuiting "and" for those cases
where the side-effects are significant. Should work the same in C/C++, although I'd be wary of using it on non-boolean objects. In that case their
late position in evaluation order becomes useful.''
Related: CppGotchas (book)
JavaLanguage
- Changing the Object references in method parameters will not affect the caller, only the contents of the object may be modified, i.e.,
public void fn()
{
StringBuffer sb = new StringBuffer("abc");
f1(sb);
System.out.println(sb.toString()); // will print abcd
f2(sb);
System.out.println(sb.toString()); // will still print abcd
}
private void f1(StringBuffer s)
{
s.append("d");
}
private void f2(StringBuffer s)
{
s = new StringBuffer("xyz");
}
-
- Passing in an array if you need to change the object reference. I find it sad that over 2/3 of the fresh graduates that I interviewed said the second println will print "xyz".
This isn't a Java thing... it's an OO thing in general... other languages share this same behavior...
Or it could be a result of JavaPassesByValue...
Again.... Java isn't the only language to do this... CeeSharp does exactly the same thing by default... that's just how OO works... passing a reference object, will always have the effect of allowing changes to the object to show to all callers; that's what it means to be a reference object. This isn't a language gotcha... it's a knowing the difference between a reference object and a value object gotcha. -- AnonymousDonor
PhpHypertextProcessor doesn't work like this. By default, it passes objects - whole objects, not references to objects - by value. This is a frequent source of bugs in PHP, as modifications to an object don't show up in the caller unless it's explicitly passed and stored by reference. This has lead some people to complain that PHP is not object-oriented.
Then PHP is treating its objects as value objects, much as an int or string would be treated.
CeePlusPlus also doesn't work like this. Unless explicitly marked, parameters are passed by value and don't behave polymorphically. Unfortunately, this has not lead people to complain that C++ is not object-oriented. -- JonathanTang
Then C++ is treating its objects as value objects, much as an int or string would be treated. None of that changes the point that this isn't just a JavaLanguage gotcha, it's a value vs reference thing, as I said before, CeeSharp and even VisualBasic share this same behavior. The difference between a value object and a reference object is conceptual; the language is irrelevant to understanding the behavior difference between them. One must of course know how the language passes parameters... but that doesn't make it a Java problem.
Java doesn't differentiate between objects and references-to-objects, and because of this it has a ConceptualMismatch? between variable access and variable assignment. In the above code, some operations on what appears to be a StringBuffer (but which we all know is a reference-to-StringBuffer) act on the object (such as the append function), and some operations (such as assignment) act on the reference. More confusing still are Java's compound-assignment operators (such as String's +=) which act on both the object and the reference.
CeePlusPlus deals with this properly by not confusing reference objects (pointers and smart pointers) and the objects they refer to. Consequently, you need to distinguish between . and ->, and you need the & and * referencing and dereferencing operators in order to avoid this Java inconsistency. Note that what CeePlusPlus unfortunately calls references are actually not references in this sense, and are just aliases.
I'm still failing to understand what anyone finds confusing about this, seems perfectly logical to me. Here's the gist of my understanding, just to put it on the table for analysis. Objects exist somewhere in memory, variables hold pointers to those objects. I'm going to use the term pointer in the abstract sense, not in the CeeLanguage sense. Variables can either be values themselves, or be pointers to objects located in memory. When passing variables, you can pass by ref or by val (language-dependent of course), so naturally if you pass a variable that points to an object by value, then that variable is copied to create a new variable in the function, but it still points to the same object. If you pass a pointer variable by ref, then it isn't copied, it's just handed in to the function as is. Thus if you change the pointer variable to point to a new object and it was passed by ref, you change it for the function's caller too, whereas if you change a by-val pointer variable it has no effect on the caller since you received a copy of the caller's pointer. But all of this only applies to the variable, not to the object it points to. Regardless of passing by ref or by val, all changes to the object will be reflected to anyone holding a pointer variable to it. Thus passing a reference object by val appears to still pass by ref, because a reference object is a reference object, regardless of how a pointer to it was passed. Passing a pointer to a reference object by val, just makes a copy of a pointer, but it still points to the original object.
Wow, that was way more complicated to explain than I thought it'd be, seems so simple when I visualize it in my head. I think what makes it easy for me is that I don't see variables as objects, merely pointers to objects. I intuitively understand that passing by val is just copying the pointer and not the object, because I know I'm not passing objects around, only pointers to them. To me, the important thing is value vs reference semantics. If I have a customer object, that's a reference object, there should only ever be one of any particular customer, though many other objects may have pointers to it, any changes to it reflect to all pointers to it. But if I have money object, that's a value object, it doesn't have identity, so I don't pass around pointers to it, I pass around copies of it. Every variable points to a different object, thus no aliasing issues. You can fake this by making the object immutable. I feel passing all values by val as the default, is the correct thing to do, I think JavaLanguage does it right, in fact, even though in CeeSharp I can pass variables by ref if I so choose, I've never found a reason to.
(Digression can be unindented, since it's really a digression on a digression that's now on WhyIsTheFirstArgSpecial :)
- (You've never had reason to pass by reference? That's interesting; do you work hard to avoid it, or has it never come up? If the latter, what was your first programming language? It must have left you with a certain kind of mindset that automatically avoids this.)
- Let me rephrase: I've never found a reason to in CeeSharp; I started in VisualBasic, and did pass by ref often, but I found that I was doing it to get back multiple results from functions, which I now do by returning objects. Since taking up CeeSharp, I've become an object oriented programmer, primarily, and find no reason to ever pass by ref. When do you pass by ref and for what reason?
- [I've also never found a reason to pass by reference. I started with JavaLanguage. Any usage of PassByReference can be converted into returning multiple values; in languages that don't support that, it can then be converted into returning objects. Sometimes the resulting code is significantly more verbose than the original; you are, after all, making explicit all the variable-setting that's implicit in pass-by-reference. But in a good OO design, you don't need to copy the variables out anyway, because you just call methods on the returned object, which work on the values en-masse. -- JonathanTang]
- Naturally it can always be avoided, but it depends on philosophy and/or context whether it is always desirable. I sometimes find that I need to add a status return to something that is already returning a scalar, and rather than creating a new object for that, I sometimes use a reference parameter. (Don't say "exceptions"; they don't solve every such thing. Or shouldn't.)
- I am aware that some people think this is a horrific practice, but I think that's a Purist attitude; as you say, sometimes the alternative is pretty verbose, which I consider to be the worse evil. All other things being equal, verbiage decreases readability.
- The real problem here is that, what one really wants to do in such a situation is to merely return more than one value, but most languages don't directly support this. CommonLisp does, via an odd mechanism tacked onto its side, but it's more of a library trick than a central language mechanism. -- dm
- Actually, I've noticed lately that the only time I want anything back, is when I'm looking for a particular object or collection of objects, or a simple value. But rather than needing to return multiple values, I'd just pass in an object and tell the function to do something to it. I'm getting pretty big on TellDontAsk lately, I find it greatly simplifies my code. I don't ever run into a reason to need multiple values returned from a single function.
- As I said, it's not absolutely necessary, and different people will have different judgement calls. But consider that they can be very handy as an in-between step when refactoring someone else's code, even if you don't want to have them in the final code.
[Discussion on why it makes sense to pass the first, 'this' parameter by reference moved to
WhyIsTheFirstArgSpecial]
Keep in mind that this topic is about gotcha's, and not really about whether something is "logical". It is about being tripped up regardless of whether that tripping is from bad language design or one's own ignorance.
No, this topic is about LanguageGotchas, tripping on ones own ignorance is not a LanguageGotchas. LanguageGotchas are where the language doesn't do what you'd logically, based on experience, think it would do. Like scope in JavaScript, has C's syntax, but not C's scoping rules, or VB.Net's changing the way scope is handled from VB 6, which only scoped at the method level, not the block level. Programmer ignorance however, can't be blamed on the language.
I don't see why Java arguments aren't all const anyways. You shouldn't be editing the argument in valuetypes, and you shouldn't be editing the reference itself in objects. The compiler should tell you so - it would eliminate this whole class of gotchas, as they'd be compile-time errors. If you need to conditionally substitute an argument, then make it a new variable. I expect this kind of "enough rope to hang yourself" mentality in C++, but not in Java.
VbClassic
- Instances of classes that use interfaces behave differently depending on which interface is used. If the instance is in a dynamically typed variable (Variant or Object), the interface in use may depend on such non-intuitive things as the return type of the last function that supplied the reference to the variable. In VbClassic, each interface provides its own implementation of each method, and in fact, a method provided only via an interface won't even be visible unless you cast the instance to that interface type.
- ByVal and ByRef have a non-intuitive meaning when objects are involved because of the 2 different meanings of "reference" in VbClassic, one meaning for parameters, and one for class instances. One would think (and many smart VB programmers do) that if you pass an instance of a class with a default value property to a ByVal Variant procedure argument, the procedure will receive the value from the object's default value property, but this is not the case. The problem goes undiscovered for the most part, and leads to all sorts of intermittent run-time bugs. What actually happens when you pass an instance ByVal is that the procedure gets its own new, local reference to the instance and increments the reference count. Passing ByRef, on the other hand, causes the procedure to share the calling procedure's reference to the instance such that placing a different reference in the parameter variable will also be replacing it in the variable passed to the procedure argument.
This is just the ordinary meaning of CallByValue and CallByReference . It's not exactly a gotcha. It would be a gotcha if VB *didn't* act this way
- When checking Err.Number after a statement executed following On Error Resume Next, you may find an error number left from a prior statement executed in a similar fashion, even one that happened in a very distant part of the code. This is in spite of the fact that execution of many types of statement clear the error number. Avoid this by executing Err.Clear before and after such statements, or better yet, avoiding identifying errors by any other means than On Error Goto <label>.
CommonLisp
See http://wiki.alu.org/Lisp_Gotchas
SmalltalkLanguage
- A . at the end of a method can make the difference between returning something useful and returning self. Use hard returns.
- Extending the Collection class with more variables means you have to override grow to save your variables.
- at:put: but it occurs so often that you get used to it, though it remains annoying.
There are 3 other widespread gotchas in ST and there's one esoteric gotcha involving execution order with side effects, because compilers are single-pass but the language spec is triple pass. So in the example:
- aStream doSomething: (otherStream jump: 5) withSomethingElse: (otherStream next)
the jump: executes before the next. I should check out whether it actually works this way though.
PerlLanguage
Too many to mention? Maybe; take a look at the official list at http://perldoc.perl.org/perltrap.html
The first traps mentioned are not using the 'warnings' and 'strict' pragmas. These prevent a whole host of miserable problems in exchange for a little extra discipline.
Oracle SQL
- The "dummy" ROWNUM column does not always work right as a "Top N" filter tool. There are tricks to work around it such as nested queries with a SORT on the inside, but be careful.
MySql
- If you don't configure auto-increment numbers correctly, the system may reuse "deleted" numbers. This can cause unexpected oddities, including while restoring from back-ups. (The default should be non-reuse in my opinion.)
- If not configured a certain way, it may truncate string values when moving columns of different sizes without warning.
(Not really a "language". Perhaps topic should be renamed to something more general, like "Tool and Language Gotchas" or the like.)
SQL (General)
- Be careful when deciding between "UNION" and "UNION ALL". Make sure your record set has the necessary keys if using "UNION".
http://stackoverflow.com/questions/1995113/strangest-language-feature
CategoryPitfall, CategoryLanguageDesign (learn from experience)