Python White Space Discussion

By watching this discussion I had a feeling that both camps (C++ vs Python) are concerned only about safety. That Python significant spacing was made just to type safer and more readable code than C/C++. Well, if that's really the main issue, then answer is self-evident: use both brackets AND significant spacing. Safety is usually implemented through redundancy.

Using both techniques would enable parser to detect cases where you indented a block of code, but forgot brackets. That would be an error. Or a case where you used brackets, but without proper indentation. That would be a warning. Would this increase noise? Well, if a function has 200 characters, adding brackets would result in 202 characters. So we get a huge increase in safety, with only 1% increase in size. Sounds like a good deal to me.

However, the problem is not in safety. The problem is that someone don't want to type brackets. While the others are afraid of making a mistake by typing the wrong number of tabs/spaces. If an average person has to choose between the lazy approach and safe approach, the lazy approach usually wins.

If I'm wrong, if the main concern is really about safety, then language should enforce using BOTH. Additionally, such a solution would equally satisfy (or equally annoy) both programming camps.

Zexx


PythonLanguage may have much to recommend it but the spacing fascism is simply evil to the point of being intolerable. It would be nice if in general it was common practice to separate language syntax and semantics so that those of us who wanted to use the Python (or JavaLanguage or RubyLanguage or whatever) semantics and runtime could do so and still use the one true language syntax (ObjectiveCee, obviously).


When human reads the code, he looks for indentation, not for brackets. So should the compiler, to avoid difference in what programmer thinks and what compiler does. -- PetrMares

Interesting subjective opinion. Do you have any evidence for your claim? do you look for brackets when launching procedures, or are spaces what people look for and therefore procedure brackets() should not be something we use? Again, do you have any evidence?

It's self evident. Take some C code and remove all the indentation. It will be very uncomfortable to read what's going on - but ultimately possible, if you put your language-lawyer specs on. Now remove the braces but preserve the indentation. If your code was indented properly in the first place, readability is unchanged. (in fact, some would argue that because you improved the data/ink ratio, it's marginally improved. <<- does 'indented properly' == 'however python wants it'?

Wrong. By stripping out the braces, you remove visual references, impairing readability. What's self-evident is that the only thing you've saved is the trouble of matching delimiters while writing the code, with readability as a tradeoff. It's a bad trade.


[refactored from PythonProblems]

If you come from CeeCeePlusPlus you might naturally be confused that there aren't an unlimited amount of namespaces that you can create with those jolly little seagull wings { }, and that indentation controls the block structure. This is something you will get used to fairly quickly, but the indentation might cause some problems:

I think the pros are bigger than the cons, though. Python code is much easier to read; always correctly indented and almost without syntactic noise.

-- MagnusLyckaa


Before trying Python, I had written most of my code in CeeLanguage, CeePlusPlus, and PerlLanguage, and I never had a strong aversion to seagull wing formatting. I also had been annoyed by Visual Basic, which does not have curly braces. I liked punctuation, so Python seemed silly. After writing a few programs in Python, though, I have say that I like syntactically significant whitespace. I set my editor to replace tabs with four spaces when working with my own code. -- SteveHowell


Eight spaces? I surely hope not.

Behold, PEP8! For it is written:

  "
  Use 4 spaces per indentation level.

For really old code that you don't want to mess up, you can continue to use 8-space tabs. "
-- http://www.python.org/dev/peps/pep-0008/

LaughOutLoud. In 35 years they will reduce it to two spaces.


Best practice: ConvertTabsToSpaces. Always. If you get a weird error when you are running a Python script that can't easily be attributed to an obvious bug, it's a whitespace error.

Alternative best practice: Convert leading spaces to tabs. Always. And set your tab display width in your editor to whatever you like. You need to do both, but everyone working on the code gets what they want.


Python assumes a tab setting of 8 (as God intended). (as God indented)

I'm sorry? As God intended? Have you ever tried to read code that has a tab setting of 8? I usually have mine at 2 and that suits me fine; 4 would be bordercase. But with 8, hardly anything is legible.

Please don't stir up another HolyWar. It isn't worth it.

This "as God intended" thing is an old joke by Python's creator, GuidoVanRossum. Don't confuse indentation with tab setting. Most Python programmers use 4 spaces for indentation. The only high-profile Python coder I know that uses tabs is FredrikLundh. I'm pretty sure he doesn't use a tab setting of 8 in his editor, but as long as he is consistent in using tabs, that won't matter. The only issue is if a Python program uses a mix of tabs and spaces, and is edited with a tab setting other than 8. In that case, the code won't be interpreted the way it looks. Most likely there will be a syntax error.

As a Python programmer you learn not to mix tabs and spaces, just like we learn that the variable X is not the same as the variable x. There are tools (such as tabnanny) to clean this up, but I very rarely had problems with it. I've had much more problem cleaning up messy indentation in C++ programs.

Whether 2, 3, 4 or 8 spaces is suitable for indentation mainly depends on how many block levels you have in your code. A bigger indentation is certainly clearer. The problem is that you end up with too little space for your line, if it begins too far from the left margin. In my experience, Python programs usually use fewer indentation levels than, for instance, C++. Different languages encourage different programming styles.

DonKnuth once wrote: We will perhaps eventually be writing only small modules which are identified by name as they are used to build larger ones, so that devices like indentation, rather than delimiters, might become feasible for expressing local structure in the source language.


If you really hate Python's whitespace requirement, presumably it wouldn't be too difficult to write (in Python!) a pre-compiler to let you use whatever demarkations you prefer. Of course, that's not very "pythonic". :) Personally my only complaint about Python's whitespace is that it is tedious to change the indentation level of code blocks with a primitive text editor (such as a wiki page), whereas putting in an extra set of curly braces would be fast and easy. -- AndyPierce

I find changing the indentation level much easier than adding curly braces; but then, I use a real editor for both Python and wiki. -- Eric


I really don't understand why using indentation level for blocks is so controversial. You do indentation anyway; you don't want to violate OAOO; it avoids hard-to-spot errors where indentation and begin/end or {/} differ; it looks cleaner; there's no way to have unmatched braces; and I never get indentation wrong, but sometimes I do have to count braces. In other words, don't let this issue stop you from trying out Python; indentation-denoted blocks are very easy to get used to. -- FalkBruegmann

At the risk of fueling the pointless language debate, let me bring in some of my Ruby experience. I have been told by others that I write neat, easy-to-read code, but I don't always use indentation. For one thing, as my code gets mature it tends to have a number of one-line methods like

 def to_s; self.year + " " + self.month; end
and sure, according to some people's rules that's not sufficiently clear since the body of the method should be on its own line, nicely indented. All I can say is that this works fine for me.

Python would accept:

 def to_s(self): return self.year + " " + self.month
though you may need some 'str()'s in there for coercion.

Well, I suppose my general lack of experience of Python shows through in that choice of example. It's still not difficult for me to imagine a situation where I could run afoul of these sorts of indentation rules. What if I have a statement that's longer than 80 characters, and I'd like to break it up with a newline, some tabs, and then some spaces, to make it line up with the line before? Or even just an excessively verbose argument list?

 raise( MissingError,
        "Couldn't find class \"#{ class_name }\"",
        caller )

This will work fine. Open parens and brackets stay open at end-of-line. Whitespace is ignored on the next line. No backslash needed.

Okay. Can somebody unpack for me what is meant at the top of the page by saying that the interpreter gets confused by a mixture of tabs and spaces? -- francis

"Confused" is probably a bad term. "Confusing" might be better. The parser always considers the tab equivalent to spaces up to the next 8-character tab stop (see below for the exact rule), no matter how it's displayed -- this is the source of the 'as God intended' quote above. So, you can indent the second line of an if block with a tab, and the third with eight spaces, no matter how it's displayed in your editor.

This tends to send your teammates to window ledges, though. In fact, Guido van Rossum has listed allowing both tabs and spaces as one of his key "Python Regrets" (from the presentation of the same name).

YamlAintMarkupLanguage prohibits tabs in initial whitespace for this reason.


From https://docs.python.org/2.7/reference/lexical_analysis.html#indentation:

First, tabs are replaced (from left to right) by one to eight spaces such that the total number of characters up to and including the replacement is a multiple of eight (this is intended to be the same rule as used by Unix). The total number of spaces preceding the first non-blank character then determines the line's indentation. Indentation cannot be split over multiple physical lines using backslashes; the whitespace up to the first backslash determines the indentation.

so "\n \t" is the same as "\n\t"; so the tab width is 8, which is not the same as this: "\t" == " " * 8 in other words, python forgives you when you add a space by mistake ;)

Old link rotted - I replaced it with a link to the equivalent section from the documentation for Python 2.7.


When you're worried about satisfying the compiler or interpreter, you have to follow arbitrary rules. When you're worried about making your code readable to humans, whether that's yourself or others, you should follow certain principles and feel free to deviate from them when necessary. That's because computers are stupid and humans are smart. Computer readability is blunt and inflexible, while human readability is not.

Why should I give a computer control over my own judgement as to what makes code more readable? Next you'll start telling me I should mindlessly follow MicrosoftWord's execrable grammar suggestions, too.

BTW, how does Python handle here-docs? -- francis

 """
 Everything between ThreeDoubleQuotes? and ThreeDoubleQuotes? becomes a string,
 where whitespace is important and nothing is escaped.
 """
Could it be a substitute for here-docs? It's not a substitute--it is Python's syntax for the same purpose.

So text between ThreeDoubleQuotes? is not subject to the same rules about indentation? -- francis

Correct. While you have to start the opening """ with proper indentation, indentation is ignored between triple-quotes.

Triple-quoted text is one of a few exceptions to syntactically-significant whitespace; the other two notable exceptions are bracketing syntax (parentheses, square brackets, and curly braces), and line continuation via backslash.

Keep in mind throughout the above discussion that text can be triple quoted with either double quotes or single quotes.


Now that automatic code reformatting tools have been invented, I don't have to think about indentation. I can type 10 statements or Java or C++ onto a single line, hit "reformat", and everything is nicely indented for me by the computer.

Don't assume that is the case for everyone. Poor indentation has wasted hours of my time since the source is checked into VSS and locked into eternity (especially framework code). It is impossible to do this in large-scale source-controlled environment. I'd love python-style indentation rules in C#.

However, a language like Python that doesn't use block delimiters prevents the use of automatic reformatting tools, slowing down the programmer who has to constantly ensure that everything is properly lined up.

There is no use for the syntactically significant indentation when reformatting tools are available.

I disagree. Current automatic code reformatting tools (even the illustrious Emacs) do not indent and break lines nicely in the general case. Consider that the problem of typesetting took Knuth decades to solve with TeX, and that code reformatters have the added constraint of not breaking the meaning of the code. While formatting source code isn't as complex a problem as formatting prose, it is the same class of problem, and I haven't seen a beautifier with one-thousandth of the consideration given to aesthetic issues that even nroff has.

Your "constantly ensuring that everything is properly lined up" is a bit of a strawman, too. Do you complain that the C compiler doesn't insert braces and semicolons where you obviously intend them to be?

The basic premise of Python is that it should maximize readability, even at the expense of writeability. { } ; are not as readable as indented block structure alone. The hit to writeability that you are complaining about is real, but consider how often a program must be read compared to how often it must be written. Don't you think you are optimizing the rare case, at least as far as production code goes?

You miss the point. Reformatting tools are needed/useful because in most languages you need to get the block elements and the indentiation right. If you replace ";" with <return>, "{" with <TAB>, and "}" with the <back space> key your line of CeeLanguage code will be almost legally indented PythonLanguage code. Just write two small pieces in C and Python and count the key strokes to see which language is faster to write.


I incidentally had to use Python for work, and I have to admit that the (Nazi!) indent policy forces you to write very readable code. On the other hand, I do not like to be forced to do something: I'd prefer to do it because I actually find it useful rather than because I have no real other option. To conclude, based on these premises, I still prefer quick and dirty unreadable/unmaintainable Perl gibberish :) --DarioRossi?

A lot of Python Proponents claim that because Python forces you to indent your code, you write more readably. When people complain about Python forcing you to write your code a certain way, many proponents claim that "you would write code that way, anyway." It seems to me that it's strange to take both sides of this. If I was already going to write my code that way, then why does Python force me to? If I need to be forced to write readable code, is Python's whitespace going to keep me from engaging in ThreeStarProgramming? Is there really a benefit, then? -- DaveFayram

The benefit comes from preventing a class of bugs that the eye naturally fails to see. In C code, the lazy programmer might construct the following:

 if (condition)
     sometimesExecuted;
     alwaysExecuted(butShouldntBe);
 alwaysExecuted;

Why do python's supporter always come up with that example? It is obvious that the C programmer is not using python and I am certainly sure he is aware he is not using python, so he would never make such a mistake. And even if he was the kind of person who makes such silly mistakes it could have easily been fixed by himself enforcing the {} block delimiters which somehow are missing on python and are much better at preventing these issues besides of being much more flexible.

Why do Python (note singular) (note capitalization) detractors (note plural) always come up with this same objection? OF COURSE we know that C programmers aren't coding in Python. Duhh! That is why Guido created Python! He observed that faulty code, as listed above, was of such common occurance that he chose indent-based structure for Python as a remedy. As for said C coder never making said mistake, umm, hello! I've been coding in C since 1985, and I still make the aforementioned mistake, particularly when I'm time-stressed. What sucks is that I never find out about it until runtime. (Here's a quick tip to solve that problem though: always pair an if with an else, like this: if(condition) thenDo1() andDo2() else;. The "dangling else" in this case will appear outside of the lexical scope of the if statement, and thus, is a syntax error. But, I digress). I see you also recommend a different approach -- whereby the coder forces himself to always use { and }. Hmmm, this is like a machine language coder forcing himself to properly observe types in his code. It can be done, but ... WHY?! The computer is so much more effective at ensuring and automating the process. So please don't make incorrect, blanket statements about the superiority of C coders over everyone else. It's insulting and provably incorrect. There are other, legitimate claims against significant whitespace (see OlafKlischat comments below). Yours isn't one of them.

#ifdef statements tend to aggravate the issue. Python sidesteps the problem by telling the computer to use the same structure as the eye sees.

This seems to be a special case of OnceAndOnlyOnce, where it is not code that gets duplicated, but structural information.


Problems with the "whitespace is significant" rule:

I think they should just make the "whitespace is significant" rule optional (like in HaskellLanguage e.g.), i.e. in the absence of explicit block delimiter tokens like "begin", "end" etc. the whitespaces are used to decide where a block begins or ends, but you *can* always use block delimiter tokens, and if you do, they take precedence.

-- OlafKlischat


I don't have an issue with the whitespace enforcement itself but with the lack of delimiter tokens. Blocks should have an start and an end, on python's syntax everything depends on just spaces and at least for my case without the block end specifiers makes code much harder to read.

Look:

 while (a>0)
    a++;
    dostuff1();
    dostuff2();

If it is a code sniped you are looking before the end of a paper or a page break in screen and it is using python syntax. How would you be certain that dostuff2() is the last code executed by the while?

The second page has something like this:

    dostuff0();

Is there anyway to read the code and be certain that dostuff0() is inside the while or outside?

Then the missing delimiter token puts python code in disadvantage against some other high level languages, the verbosity of those languages can be really helpful. I think the objective of making the whitespace significant in python was to enforce readable code, but they canceled the effect of this 'feature' by removing the end delimiter.

-- Vexorian


I feel that by dropping the redundancy of the { } combination (and forcing their presence, not optional as in C) a chance was missed to make python both more robust and more acceptable to mainstream programmers.

Of course a new language offers a good opportunity to break the mold and be creative, but in this case it wasn't (very) broken so there was no need to fix it. As an old hand C programmer I can't help it but every time I see a statement like this:

if (something):

   do_something()
   and_some_more()

I feel I'm staring at a potential problem. I've been totally conditioned to that and I realise that is a personal issue but it stops me from exploring python further than I have so far (one 1,000 line program to design windmill blades, just to force myself to really get my feet wet). After that experience I felt that it is just too much of a hassle. That ':' in there might as well have been a {, all you'd need was the } marker to complete the block.

You don't need parentheses in if or while blocks. You could have written this instead: if something: ... Hence the need for the colon. In C, technically, blocking with { and } could be enforced, and its if statement could have been if condition { ... }, but alas, it was never to be so.

        Ruby doesn't need () around conditions and it doesn't need a colon to understand where the condition ends. This : thing in Python always puzzled me.

Redundancy in code is *good*, if you lose one level you always have the other to tell you what you think you are looking at is something else entirely.

Nobody can prove in the example above if the and_some_more() is in the logically correct spot or if some prankster moved it 3 spaces to the right (or if I've been leaning on the space bar for 2 seconds when not paying attention). With { } pair there would be no ambiguity.

Wrong. I could, as a prankster, have repositioned the } as well. But, like so many other detractors, this one case appears with alarming frequency, and is the logical converse of the pro-indent argument made by Pythonistas. The reality is, both are right, and in either case, the example is vastly too simple to get any useful data from either way. You need to examine entire programs, whole code in production, before you can say one way or the other.

Personally, I agree with Haskell's approach, where significant whitespace is optional. But, I find myself in strong preference to OberonLanguage's IF...THEN...END, or WHILE...DO...END constructs. The blocking is enforced, indentation is optional, and as a native English speaker, it just makes sense. It makes reading the code that much better. In my opinion, both C/Perl camps and the Python camps have utterly missed the clue-boat on this one.

Explicit indeed is better than Implicit, and spaces to me are implicit, braces are explicit.

-- Jacques Mattheij

Stick with it -I'm from a hard C/C++ background, and the indention issue goes away after a few months. I've been bitten too many times by C/C++ indentation not matching intent to think that braces solve all issues --PeteHardie

Explicit is better than Implicit. Surely having endfor, endif, endwhile etc. fits the explicit koan much better than just having a return and hoping no one makes any mistakes. An end marker would explicitly mark the end of a block, which would solve most of the whitespace/IDE problems in one go (IDEs would be able to automatically indent any copied text).


I like that python forces you to write readable code, but the use case that just bit me on the ass is where a library was written with 2-space indentation, then frozen, then later converted to 4-space indentation. I've just been merging changes across and of course the merge tool has "compare whitespace differences" off. Major oops. merging a few lines across so that the files check out as identical in the merge tool, then you find that the files' block structures are completely scrambled.

I'd like to see optional explicit begin and end markers, combined with the current mandatory spacing rules. That way a mistake like mine leads to code that gives an indentation error but preserves the intended structure.


I'm surprised that no one has mentioned the legibility problems cause by significant whitespace. In a Python project I'm working on, we're using 2-space indentation. That's fine, until you get 40 or 50 line functions (no, I'm not the author of those) and you're trying to figure out just what that "else:" on line 35 has as its matching "if e:". Most editors have brace matching that makes it really easy to find a matching "if". On the other hand, "find the preceding/next line with the same indentation as this one" seems to be asking an awful lot. -- BillTrost

I always find 2-space indent in C to be unreadable. I think it looks like the randomness of ragged left justification rather than something intentional. My editor of choice for Python (Programmer's Notepad) shows vertical dotted lines for indent matching, which makes it impossible to miss matching indent levels, without having to navigate the cursor to a brace. I don't know if that's a feature that comes from vanilla Scintilla or if it's specific to PN. --RussellBorogove?

Bill, you might want to refactor those functions to use shorter definitions. Note that Python supports nested procedure definitions like Pascal/MODULA-2 do, so you can take an obnoxiously long if statement and make things more concise. For example:

  def retardedProcedure(...):
    if someCondition:
      # ... lots of code
      # ...
      # ...
    else:
      # ...
      # ...
      # ...

becomes:

  def substantiallyMoreMaintainableProcedure(...):
    def consequent(...):
      # ... smaller code here
      # ...

def alternate(...): # ... smaller code again # ...

if someCondition: consequent(...) else: alternate(...)

Of course, a better solution still is to refactor the code to use proper method design in the first place; but, in the event that that is not possible without severe consequences, the above approachs works very well indeed.

-- Samuel A. Falvo II


Why, oh, why do we insist that plain text is the ultimate representation of programming languages, while the natural model for code is a syntax tree, be it procedural or functional? (With the notorious exception of the C programming language where the tree layout can vary with macrodefinitions, but any such piece of code where the difference is significant is probably broken and certainly unmaintainable?) This is especially painful in version control systems, as reported above: they should store the meaning rather the text, and recreate the text for editing using the programmer’s favourite syntax.

BTW, Knuth's proposal boils down to the same observation; the difference is that Knuth assumes each subtree must have a name for this approach to work. This is certainly not technically necessary, and I am not sure it would be good to invent and use so many names; of course, one could always do away with autogenerated names, making the code a total mess unless the names are hidden and the code reassembled by the viewer. And meaningful names for subtrees are hard to coin, and their length tends to exceed the human perception limit by 400%.


I have only one problem with the whitespace, which is that it makes it really hard to integrate good lambdas into the syntax. Cobra almost gets it with

    [1, 2, 3].map(def (i):
        i+1
    ) --> [2, 3, 4]
...but that still looks ugly to me. Any suggestions?

Well, it's not really the lambdas themselves that make that syntax ugly. It's the required parentheses around function arguments, even when those arguments are long blocks of code. CoffeeScript's approach gets this right, in my opinion; it manages indentation-based lambdas with less ugliness, as shown below, because it doesn't require parentheses around function arguments. -DavidMcLean?

    [1, 2, 3].map (i) ->
        i+1
    # evaluates to [2, 3, 4]


Fun trick: You can get end-block delimiters in python. This is useful, as several people have mentioned, for globally reformatting screwed up code. Simply define a global variable that has the name you want, and use that standalone. You can also use "pass" for this purpose (which is less intrusive, but a little confusing)

    enddef = endif = endwhile = endfor = None

def myfn(x): if x == "how many licks": for y in ("one...", "two..."): print y endfor print "three!" else: print "I'm just an owl, I don't understand these things" endif enddef

I've seen suggestions to use short comments ("#end") to achieve the same. It does improve readability and allow correction of code with broken indentation. Unfortunately, it can't be enforced and existing code doesn't use it, limiting its value. Better solution: use another language.


See also: SyntacticallySignificantWhitespaceConsideredHarmful, PythonSyntax, ForkPythonWithoutTabs


CategoryPython


EditText of this page (last edited November 11, 2014) or FindPage with title or text search