Tabs Versus Spaces

This is one of the eternal HolyWars among programmers: should source-code lines be indented using tab characters or space characters?

People generally don't mind reading code that is consistently indented using tabs, or code that is consistently indented using spaces. The problems arise when some lines are indented with tabs, and others with spaces. In such cases, the code will only display nicely if the reader has the same tab stop settings as the authors and if all the authors used consistent settings. One way to solve this problem is to force everyone to be "tab people" or "space people".
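To make the "some lines with tabs, others with spaces" problem concrete, here is a minimal sketch that counts how each line of a file is indented. The function name `classify_indentation` and the sample text are made up for illustration; nothing on this page prescribes them.

```python
def classify_indentation(text):
    """Count lines whose leading whitespace is tabs only, spaces only, or mixed."""
    counts = {"tabs": 0, "spaces": 0, "mixed": 0, "none": 0}
    for line in text.splitlines():
        # Leading whitespace is whatever lstrip would remove.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if not indent:
            counts["none"] += 1
        elif set(indent) == {"\t"}:
            counts["tabs"] += 1
        elif set(indent) == {" "}:
            counts["spaces"] += 1
        else:
            counts["mixed"] += 1
    return counts


# Hypothetical sample: one tab-indented line, one space-indented, one mixed.
sample = "def f():\n\tif x:\n        return 1\n\t    return 2\n"
print(classify_indentation(sample))
```

A file where more than one of the "tabs", "spaces", and "mixed" counts is non-zero is the kind that only displays nicely at the authors' tab stop settings.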

The case for tabs:

Gives reader control over visual effect.
A tab has a meaning of "indent one level" (disputed; see below), whereas spaces have many meanings.
Fewer keystrokes required to get things aligned.
Files are smaller.
It's the only way to get vertically aligned columns when we use proportional fonts.

The case for spaces:

Consistent display across all display hardware, text viewer/editor software, and user configuration options.
Gives author control over visual effect.
Tabs are not as "visible" (that is, a tab generally looks just like a bunch of spaces).

The case for random mixture of tabs and spaces:

Easy (you don't have to think about it).
People who don't like it can use some sort of tool to automatically indent the code however they want.

Other wrinkles:

In some programming languages and tools, tabs and spaces do have different meanings. This argument doesn't apply to those languages, except in relation to LanguagePissingMatches.
VersionControl systems generally do consider whitespace to be significant. If a developer reformats a file to have different indentation, makes a minor change, and then checks in the change, the system is likely to treat it as if the developer changed every line in the file.
In PythonLanguage, indentation is semantically significant. Either tabs or spaces may be used for indentation, though it's recommended that programmers use one or the other and stick to it. To quote the language reference, "tabs are replaced (from left to right) by one to eight spaces such that the total number of characters up to and including the replacement is a multiple of eight (this is intended to be the same rule as used by Unix)."
In some editors, hitting the TAB key inserts a tab character at the current position. However, programmers' editors can be configured to use the TAB key to trigger the editors' auto-indent function, which may generate multiple tab characters or multiple spaces.
Some text editors can determine the proper tab settings for a file by analyzing it or by reading special markup contained within. Such editors are nice for those people who have them, but they do not solve the general problem.
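The Python rule quoted above (each tab becomes one to eight spaces, so that the column reached is a multiple of eight) can be sketched as follows. This is only an illustration of the quoted sentence, not the interpreter's actual tokenizer, and `expand_leading_tabs` is a hypothetical name.

```python
def expand_leading_tabs(line, tabsize=8):
    """Replace each tab in the leading whitespace by 1..tabsize spaces so that
    the running column becomes a multiple of tabsize."""
    column = 0
    out = []
    for i, ch in enumerate(line):
        if ch == "\t":
            width = tabsize - (column % tabsize)
            out.append(" " * width)
            column += width
        elif ch == " ":
            out.append(" ")
            column += 1
        else:
            # First non-whitespace character: keep the rest of the line as-is.
            return "".join(out) + line[i:]
    return "".join(out)


# Three spaces followed by a tab land on the same column as a single tab.
print(len(expand_leading_tabs("\tx")) - 1)
print(len(expand_leading_tabs("   \tx")) - 1)
```

For the leading whitespace this should agree with the built-in `str.expandtabs(8)`; the difference is that tabs after the first non-blank character are left alone here.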

Programmers also argue about whether it is acceptable to use tab characters inside source code lines for purposes other than left-side indentation. People may use these to align elements of a multi-line statement or function call, or to align end-of-line comments. The issues are similar to those above, except that tabs inside lines cause additional problems.

Most programming languages provide a special character sequence (such as "\t" in C, C++, Java, and others) to allow a programmer to make the difference between tabs and spaces more obvious inside strings and character literals. In such languages, it is generally considered bad style to put actual tab characters inside string and character literals.

The above is an attempt to summarize the discussion below.

This is a rant about Jamie Zawinski's rant at http://www.jwz.org/doc/tabs-vs-spaces.html

Read his rant first.

Jamie Zawinski brings up the crucial issue and fails to recognize its significance. (Because he is clearly a space person.)

His mistake is reducing ASCII character 9 to "compression". In his mind, ASCII 9 means "display some number of spaces", and the problem is that people don't agree about what that number should be. WRONG! The point is that code, like any other kind of structured information, has MEANING. The tab character means "indent 1 level" and, if programming languages were XML, might be written something like <tab/>. How your editor (or browser or whatever) chooses to present <tab/> should be determined by your editor's configuration, just like the presentation of <tab/> in an XML document should be determined by its stylesheet.

To Quote: " With these three interpretations, the ASCII TAB character is essentially being used as a compression mechanism, to make sequences of SPACE-characters take up less room in the file."

In fact, the tab character is not just a compression for an unspecified number of spaces: I might create an editor that defines the "indent distance" in terms of millimeters! I might create an editor that uses colors instead of actually offsetting the beginning of lines! This reminds me of a friend who said "C++ is all waste: I can do that in pure C with arrays of function pointers."

The very worst thing to do is the suggestion that he makes: get rid of all tab characters. The problem is information entropy: it is difficult and sometimes impossible to compute "indent distance" from a file which has no tabs [think inconsistent spacing]; this is like removing all your <td> tags from a table and replacing them with an equivalent sequence of &nbsp; entities - a solution that works fine for your 21" monitor but not for my 17" one.

In fact, the solution is exactly the opposite: NEVER REPLACE TABS WITH SPACES! All editors can be configured to present whatever "indent distance" is desired and without having to add all kinds of crazy comments (as jwz seems to suggest). As long as no one in a group of developers uses spaces to indicate an indent, nobody should ever even have to know what other people's preferred "number of columns" is!

Go forth and tabify!

-- KerryKartchner

" With these three interpretations, the ASCII TAB character is essentially being used as a compression mechanism, to make sequences of SPACE-characters take up less room in the file." The rant totally ignores the etymology of the tab, and this sentence kind of sums that up: typewriters (with their tab stops, often custom-settable) had no concern for how many bytes it took to encode the content they were putting onto paper. Tabs were essentially being used as an indentation mechanism, to ensure that the indentation was consistent ("Did I hit <space> six times or only five?").

The tab character means "indent 1 level"...

That assumes "indent 1 level" is meaningful, and I have encountered lots of instances where there is no meaningful interpretation of "indent 1 level".

For example, suppose you break a function call with, say, 4 parameters into 4-5 lines, with each parameter on its own line. Often, I want the 2nd parameter to line up with the 1st; what is the "indentation level" of the 2nd parameter?

''Given that the function call starts at indentation level N, the following lines (containing further parameters to the function) are at indentation level N+1.''

No, that only makes parameters 2, 3, and 4 line up with each other. They won't line up with parameter 1 except by accident.

You need spaces (and a monospace font) to get things to line up like this:

do_something_nifty( meaningful_name_source,
                    meaningful_name_for_dest,
                    case_insensitive,
                    other_relevant_parameters );

Yeah, but if you *really* need parameter 1 to line up, you can do it like this:

do_something_nifty(
    meaningful_name_source,
    meaningful_name_for_dest,
    case_insensitive,
    other_relevant_parameters );

which works with either spaces or tabs, both monospace and proportional fonts.

Tabs for indentation, spaces for alignment

Emacs: http://www.emacswiki.org/SmartTabs

Vim: http://www.vim.org/scripts/script.php?script_id=231
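A rough way to check the "tabs for indentation, spaces for alignment" idea: indent with one tab, align the continuation lines with spaces, and render the result at several tab widths. The snippet below is a sketch with made-up data (it reuses the do_something_nifty call from above); `parameter_columns` is a hypothetical helper.

```python
call = "do_something_nifty( "
lines = [
    "\t" + call + "meaningful_name_source,",
    "\t" + " " * len(call) + "meaningful_name_for_dest,",
    "\t" + " " * len(call) + "case_insensitive );",
]


def parameter_columns(lines, tabsize):
    """Column at which each line's parameter starts once tabs are expanded."""
    rendered = [line.expandtabs(tabsize) for line in lines]
    columns = [rendered[0].index("meaningful_name_source")]
    # Continuation lines consist of whitespace followed by the parameter.
    columns += [len(r) - len(r.lstrip(" ")) for r in rendered[1:]]
    return columns


for tabsize in (2, 4, 8):
    print(tabsize, parameter_columns(lines, tabsize))
```

Because the tab only ever appears at the start of the line, changing the tab width shifts every line by the same amount and the parameters stay lined up. This still assumes a monospace font.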

For source code, indentation can in most cases be generated from the code's structure. Don't represent "indent N levels" with tabs or spaces; you already have all you need stored as "if (...) {" on the line above. Remember, source code is not text.

Then WhatIsSourceCode ?

The tab character means "indent 1 level"...

Actually, for most text editors it means "indent to the next tab stop." For example, one might use tab character to offset the line, to format the columns of a table and to indent the //-style comment on the same line, such as in:

<tab><tab><tab><tab><tab><tab><tab><tab><tab><tab>
<tab>const char *my_table[2][2] = {
<tab><tab>{ "The first line",<tab><tab>"abcd" },
<tab><tab>{ "The thirty-second line",<tab>"cdba" }<tab>// Cheating here
<tab>};

As you can see from the example, the difference between "indent 1 level" and "indent to the next tab stop" shows up in the number of tab characters needed to achieve the same visual layout: the first row needs two tabs before its second column, while the longer second row needs only one.
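The two table rows from the example can be rendered at different tab widths to see where the second column ends up. This sketch uses Python's built-in `str.expandtabs`, which implements "advance to the next tab stop"; the helper name `second_field_columns` is made up.

```python
rows = [
    '\t\t{ "The first line",\t\t"abcd" },',
    '\t\t{ "The thirty-second line",\t"cdba" }',
]
fields = ['"abcd"', '"cdba"']


def second_field_columns(tabsize):
    """Column where the second string of each row starts at this tab width."""
    return [row.expandtabs(tabsize).index(field)
            for row, field in zip(rows, fields)]


print(second_field_columns(8))  # [48, 48]: the columns line up
print(second_field_columns(4))  # [32, 36]: the columns no longer line up
```

So the layout only holds for readers whose tab stops match the author's, which is the point being made about "indent to the next tab stop".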

...and if programming languages were XML might be written something like <tab/>.

Most programming languages treat the tab character as whitespace and ignore it. You might as well just not write it at all, especially if your programming language were XML.

The problem is information entropy: it is difficult and sometimes impossible to compute "indent distance" from a file which has no tabs...

That's exactly why you have to know what other people's preferred "number of columns" is if you are using tab characters in programming languages which treat tabs as whitespace.

There are two uses of "indent to the next tab stop" in such programming languages:

The first is to duplicate in a visually efficient form the structural information already contained in the code ("a poor man's PrettyPrinter"). You don't lose this information if you convert whitespace to whitespace, and a good PrettyPrinter will do much more than your tab surrogate.

The second is to provide a reader with the visual representation which isn't contained in the code itself. In this case converting tabs to the corresponding space-indents loses nothing, but losing the tab stops layout makes a mess of the visual representation.

I might create an editor that uses colors instead of actually offsetting the beginning of lines!

And you might create an editor that uses colors instead of letters. But most editors in use don't do that, and rightly so. -- NikitaBelenki

The tab character means "indent 1 level"

To my jaded view, the tab character is shorthand notation for the XML equivalent "<TAB/>".

This, in turn, means whatever the tool that reads it wants it to mean. Here are some "meanings" that are already reasonably widespread:

Introduce some whitespace
Join or separate the current line to the preceding line (as in "make" and some of its brethren)
Separate two elements (as in tab-delimited file formats for spreadsheets and their brethren)
Signal the boundary of a semantic element (in syntax-aware IDE's)

There are surely more "meanings" than these four that just haven't occurred to me yet. Replacing tabs with spaces is an incredibly BRAINDAMAGED approach because:

It is non-reversible (without extensive comments and associated rules)
It confuses the representation of the thing with the thing
It obscures the meaning of the author
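The "separate two elements" meaning, and the claim that replacing tabs with spaces is non-reversible, can be illustrated with a tab-delimited record. This is a small sketch using Python's standard `csv` module; the data is invented.

```python
import csv
import io

# Hypothetical tab-delimited data, as a spreadsheet might export it.
data = "item\tamount\nrent\t1200\nfood\t350\n"

rows = list(csv.reader(io.StringIO(data), delimiter="\t"))
print(rows)

# After tabs are replaced by spaces, the field boundaries are gone:
# every row now parses as a single field.
flattened = data.expandtabs(8)
rows_after = list(csv.reader(io.StringIO(flattened), delimiter="\t"))
print(rows_after)
```

Nothing in the flattened text says which runs of spaces used to be separators, so the conversion cannot be undone without outside knowledge.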

Brian Cantwell-Smith presents a compelling model that is perhaps relevant: He suggests that we contemplate three different sets:

Notation: A particular "marking". "Three", "3", and "III" are three (hah!) different notations for the same thing. Some notations for "tab" include "<Tab/>", "\t", "0x0009" and "0x09" (there may be others).
Symbol: A unique token manipulated by a formal system. Various programs have various symbols for tab -- some of the more broken programs require very elaborate symbols (such as Zawinski's proposed combination of comments and spaces).
Meaning: An interpretation that we apply to a symbol. Four are enumerated above.

My suggestion is that a "file" can use whatever Notation it likes for tab, so long as the system that reads it has a mapping onto its tab Symbol, and we are then able to apply whatever Meaning we desire. My own experience has been that embedding a 0x09 in an 8-bit ASCII file is easiest to read by the largest variety of programs that I use, and results in the most straightforward way for me to reflect the various meanings that tab has.

For the "meaning" that you get out of tabs, you don't get much mileage. So someone who prefers 4-space tabs can be happy viewing the code you wrote assuming 3-space tabs. Is it worth the anal-retentiveness? It's really hard to get it exactly right all the time. I have never seen anyone so anal-retentive. Part of the problem is that by default in most editors, tab characters are indistinguishable from spaces. So you don't even know if you've got it right by just casually looking at your own code.

Having spent the slight majority of my 11 years as a software developer doing maintenance programming, and having downloaded a lot of source from SourceForge, I can tell you the de facto standard is Tabs Randomly Interspersed In Varying Densities. I don't recall ever seeing a correctly tabbed source file, but I am very thankful when I see a source file with no tab characters at all. They are readable, whether the indentation is 2, 3, or 4. What makes it unreadable is those random tabs. First, I try to guess the author's tab settings. It's usually 2, 3, 4, 6, or 8. If that fails, or if there are several source files in the same project by different authors with different tab settings, I reformat it (with vi's automatic reformatting (=)). But that's not preferable, since continuation lines and multi-line comments tend to get strangely indented by that.

So, you can continue hoping the whole world turns anal-retentive. But I'm with jwz.

Problem: tabs . . . solution: write a program to convert source to your ideal format using editable rules (for ease of maintenance).
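A minimal sketch of such a converter, assuming the only "rules" are the indent width the original author used and the string to emit per indent level. The name `reindent` and its parameters are hypothetical, and it deliberately touches leading whitespace only.

```python
def reindent(text, source_width, target_unit):
    """Rewrite leading indentation only.

    source_width: columns per indent level assumed in the input
                  (tabs are expanded to this width before counting).
    target_unit:  string emitted per indent level, e.g. "\t" or "  ".
    """
    out = []
    for line in text.splitlines():
        body = line.lstrip(" \t")
        indent = line[: len(line) - len(body)]
        columns = len(indent.expandtabs(source_width))
        levels, remainder = divmod(columns, source_width)
        # Leftover columns are kept as spaces so alignment is not lost.
        out.append(target_unit * levels + " " * remainder + body)
    return "\n".join(out)


mixed = "if x:\n    y = 1\n\tz = 2"
print(repr(reindent(mixed, 4, "\t")))
print(repr(reindent(mixed, 4, "  ")))
```

It only works if `source_width` is guessed correctly, and it will also rewrite the indentation of lines inside multi-line string literals or comments, which is where a real PrettyPrinter would have to know the language.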

So, you can continue hoping the whole world turns anal-retentive. But I'm with jwz. -- AnonymousDonor

ProgrammingIsHard. Close counts in horseshoes, grenades, and atom bombs. Not programming. If you aren't "anal-retentive", then why (how? --ed) are you programming?

Source code works the same way regardless of whether it has tabs or spaces. That is why most people don't worry much about it: ProgrammingIsHard and they have more important things to do than futz around with making sure their tabs are consistent. Maybe they should, but they don't, and your silly warfare metaphors won't change that. That's the author's point - tabs make things unnecessarily complicated and life would be easier if everyone used spaces. -- KrisJohnson

I hope you aren't working on the makefiles for my next major system build. Oversimplifying complex things is a source of much of today's increasingly unreliable software. Inconsistent tab/space formats that break already fragile IDE's (because the IDE developers in turn seem to rely on oversimplified mechanisms to identify syntactic elements) drive whole development teams into using brute-force character editors such as VI and EMACS more than two decades after superior technologies became widely available. If you need a convincing illustration, try debugging even the most straightforward JavaScript code under Venkman, the latest greatest state of the art JavaScript debugger from the Mozilla folks. I know, you don't need a debugger - all you have to do is insert lots of print statements and then look at the log files. For that matter, tell me about how your "uncomplicated" tab-to-space-converting java editor will let you develop and debug the tab-separated output that your client needs to import into their accounting packages. I guess you'll just have a simple little enhancement that will distinguish the tabs you're trying to emit to the output from the tabs you're indenting your code with - or perhaps you'll just skip all that stupid wasted whitespace and left-align everything.

I thought it was pretty clear that this discussion is related only to those cases where tabs and spaces are equivalent. We aren't talking about changing the syntax of make or destroying tab-delimited files. I don't think any automated tab-to-space conversion tool would be so brain-damaged that it would convert the "\t" sequence in a Java string to a space. If you are putting literal tab characters into your literal Java strings, rather than using \t, you and the poor souls maintaining your code are going to get some nasty surprises someday. -- KrisJohnson
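To illustrate the distinction: the "\t" escape in source text is two ordinary characters (a backslash and a t), so a tab-to-space conversion does not see it, whereas a literal tab inside the string is changed. A small sketch, with Python's `str.expandtabs` standing in for the conversion tool and a made-up line of Java as the input:

```python
# A line of Java source containing the escape sequence: backslash + 't'.
source_with_escape = 'String row = "id\\tname";'
# The same line with a real tab character inside the string literal.
source_with_literal = 'String row = "id\tname";'

print(source_with_escape.expandtabs(4) == source_with_escape)    # True
print(source_with_literal.expandtabs(4) == source_with_literal)  # False
```

The first program still emits a tab at run time after conversion; the second one silently starts emitting spaces.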

Where did anything on this page limit this discussion "only to those cases where tabs and spaces are equivalent"? It sounds to me as though your convention is painting you into a corner where you are required to jump through very special hoops to use your "simplification". Are you sure the source display in your java debugger understands your spaces and tab stops the same way as your editor? Don't you also advocate this for all the other block-structured languages - C, C++, PERL, Smalltalk, etc? Where does JavaScript fit into that mix? Don't you indent tags in XML? Do you ever line up the RHS's of blocks of initializations, for legibility? Don't you sometimes like to be able to read javadoc headers, in source code, that have something approximating a tabular format?

My argument is that your one very specific oversimplification breaks a huge number of things humans do with code to make it legible. Perhaps if you didn't always convert tabs into spaces, you might have more first-hand experience with the huge number of things you exclude by your conversion.

You've created a clumsy special case when the general works just fine.

The article by jwz which started this page is clearly about the visual formatting of source code. I believe everything else on the page up until your comments falls in line with that.

Obviously, tabs are not the same as spaces and replacing all the tabs in the world with spaces would be a disaster. I do not advocate any such position, and you insult me by assuming that I do.

I went for several years being a "tab person" and several years being a "space person". I am aware of the strengths and weaknesses of both positions, and use whichever convention is favored by whatever people I am working with. I have used both conventions with several block structured languages and have never run into a problem with any debuggers or other tools with either convention. If you are aware of a specific real-world case where using spaces instead of tabs to indent Java or C++ source code causes problems, I'd be interested to hear about it. (BTW, why would a source debugger fail if space characters are present???)

I don't understand why you ask whether I ever indent code or align things vertically. Obviously I do or I wouldn't be participating in this discussion. Yes, I indent tags in XML. Yes, I line up the RHS's of blocks of initializations for legibility. Yes, I sometimes like to be able to read javadoc headers, in source code, that have something approximating a tabular format. All these things can be done using tabs or spaces. What have I written that implies I favor doing away with all indentation? -- KrisJohnson

Please consider the simple case of a javadoc comment. Here is an excerpt from the javadoc header for java.lang.Object.clone() (I'm using version 1.47, 10/01/98 as distributed in VisualAge Java for this example).

  * @exception  CloneNotSupportedException?  if the object's class does not
  *               support the <code>Cloneable</code> interface. Subclasses
  *               that override the <code>clone</code> method can also
  *               throw this exception to indicate that an instance cannot
  *               be cloned.
  * @exception  OutOfMemoryError?            if there is not enough memory.
  * @see        java.lang.Cloneable
  */
 protected native Object clone() throws CloneNotSupportedException?;

Please note that I've added a leading space to each line in order to avoid extra wiki prettification. Now, although this surely makes me guilty of being anal retentive, please note that the continuation of the "@exception" line is misaligned. Let me ask which, to you, is more legible - the above, or the following:

  * @exception	CloneNotSupportedException?	if the object's class does not
  *						support the <code>Cloneable</code>
  *						interface. Subclasses that override
  *						the <code>clone</code> method can also
  *						throw this exception to indicate that an
  *						instance cannot be cloned.
  *
  * @exception	OutOfMemoryError?		if there is not enough memory.
  * @see	java.lang.Cloneable
  */
 protected native Object clone() throws CloneNotSupportedException?;

My clients and coworkers generally prefer the latter. Your approach precludes it. Is this specific enough for you?

I'd like to add, parenthetically, that I despise the need to embed multiple tabs in the continuation lines - I would think that by 2002 I'd be able to set tab stops as I've been able to do in any word processor for nearly 30 years. But that is clearly asking too much of the developer tool community.

There are (and have been for some time) plenty of editors which allow one to set tabs at arbitrary positions. The problem with setting them so that this particular documentation lines up "correctly", is that it's going to mess up the tab positions elsewhere in your source file -- AndySawyer

Your second example is a great illustration of the basic problem with tabs: your example only looks good if tab stops are set at 8 characters, whereas many other authors' examples only look good if tab stops are set at 3, 4, or 5 characters. Here's your second example with spaces used instead of tabs:

  * @exception   CloneNotSupportedException?     if the object's class does not
  *                                             support the <code>Cloneable</code>
  *                                             interface. Subclasses that override
  *                                             the <code>clone</code> method can also
  *                                             throw this exception to indicate that an
  *                                             instance cannot be cloned.
  *
  * @exception   OutOfMemoryError?               if there is not enough memory.
  * @see         java.lang.Cloneable 
  */
 protected native Object clone() throws CloneNotSupportedException?;

This looks identical (ignoring the question marks inserted by wiki), and it doesn't suffer when displayed in an editor with 4-character tab stops. (BTW, it has been a while since I've used javadoc; if it just doesn't grok spaces, then I'm completely wrong here.) -- KrisJohnson

Of course you can substitute spaces for tabs in my example, I created my example by substituting tabs for spaces in the original. Why do you think my second example breaks if tab stops are set to other than 8? I just tried this out using my handy-dandy MSWord and a monospace font. Yes, it's true that because of the brain-damaged tab support offered by most editors (including the lame textedit box in our browsers), the number of extra tabs has to be adjusted - but even that can be done programmatically for about the same effort as the tab-to-space conversion macro. In my opinion, a modern source code editor should simply provide for tab stops in the same way that every word processor has done it for decades.

That's exactly the point I've been trying to make. You can't open your tabbed example in any old source code editor and expect it to look like you intend it to. But you can open the spaced example in practically any editor with a monospaced font and it will look fine. -- kj

Meanwhile, how much work is it when we now have to change the text of the comment? The tab-free approach guarantees that *every* character added or removed anywhere in the line prior to the rightmost tabstop forces a compensating manual realignment of the intervening whitespace. By the way, the misalignment that results from the Wiki question-mark provides an illustrative example - it broke your tab-free approach.

I've found that a lot of work is required to edit examples such as the above with tabs or with spaces. With tabs, you may need to hit fewer keys to get things lined up, but you do still have to deal with a variable number of tabs between columns. To avoid these issues, I tend to format things much like your first example, whether I'm using tabs or spaces. -- kj

Your approach also precludes the use of proportionally-spaced fonts for rendering source code. Proportionally-spaced fonts have been demonstrated to be more legible for hundreds of years. In a proportionally-spaced font, there is no single width for a "space" - the equivalent of a tab stop is the only option.

There is no single width for a tab-stop either. I actually do write code using proportional fonts once in a while, just for a change of pace. You still have the problem that you don't know how many tabs you need to manually insert to get your columns to line up, and everything gets screwed up if you change the font later. -- kj

Finally, I would suggest that all of this tab stuff would be even more elegantly simplified by providing effective table support (because our Javadoc examples are really just tables).

Is the horse dead yet?

Yeah, rigor is setting in. I'll just note that much of what you state here about benefits of tabs is contingent upon having editors that handle tabs in a much different way than most existing source code editors do. I agree that it would be great if we had programming languages with support for smarter word-processor-like editors that knew how to lay out tables of vertically aligned columns with proportional fonts and which would automatically resize and reflow themselves as they were edited. Whenever we do get those, we probably won't be using space characters or tab characters in the files to control the layout; it will use some sort of structured format like HTML, XML, or RTF. Until we do get those editors, we have to choose plain-old-ASCII workarounds for the tools we do have. Sometimes tabs work best, and sometimes spaces work best. There is no correct answer - both options suck. -- KrisJohnson

An alternate view on tabs is that when a file is saved, tab-stops are at 8-character positions, and your editor should be smart enough to allow that when you enter a tab character, it indents however far you want it to, and converts runs of spaces to equivalent tabs and spaces. Obviously for (arguably smelly) file-formats like make, automatic expansion and compression of whitespace as tabs would need to be turned off. -- MartinRudat who read something like that in a programmer's editor somewhere.

at the end of the line

Most of this page talks about tabs vs. spaces at the beginning of the line or in the middle of a line of text. What about spaces and tabs at the end of the line (just before the newline character)?

(Should I make a whole new wikipage to discuss this?)

ImakeTool mentions some tools that do the wrong thing if spaces are accidentally added to the end of the line.
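Since trailing whitespace is invisible in most editors, a check for it has to be mechanical. A minimal sketch (the function name `trailing_whitespace_lines` is made up) that reports the 1-based numbers of lines ending in spaces or tabs:

```python
def trailing_whitespace_lines(text):
    """Return 1-based numbers of lines that end in spaces or tabs."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if line != line.rstrip(" \t")
    ]


# Hypothetical input: lines 1 and 3 carry invisible trailing whitespace.
sample = "CC = gcc \nall: prog\n\tcc -o prog prog.c\t\n"
print(trailing_whitespace_lines(sample))
```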

ProleText: Invisible formatting information is embedded in trailing spaces and tabs on the ends of lines in an ordinary looking plain-text document. "ProleText is a cool hack" -- Kjetil Torgrim Homme 1998/12/01 "We have no interest in financial gain from the format, and will make our own code to process the format available free." -- ClariNet. Everything in ProleText can easily be translated to HTML; the "inform" ProleText to HTML converter is freely available. -- http://www.vim.org/scripts/script.php?script_id=2308 http://rdrop.com/~cary/html/computer_graphics_tools.html#proletext

Perhaps neither spaces nor tabs should be in source code other than spaces to separate symbols. The source code editor should be able to style the source text independently of the meaningful code based on styling rules that can be specified by the user of the editor. In this way the code is just the code, and differences between versions when being compared will only highlight meaningful differences.
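A rough sketch of what "only highlight meaningful differences" could look like with today's files: compare two versions after stripping leading whitespace from each line. It uses Python's standard `difflib`; the two versions and the helper `changed_lines` are invented for illustration, and stripping indentation this way would of course be wrong for a language where indentation carries meaning.

```python
import difflib

# Hypothetical old and new versions: reindented with tabs, one real change.
old = "if ready:\n    start()\n    log('go')\n"
new = "if ready:\n\tstart()\n\tlog('going')\n"


def changed_lines(a, b, normalize=lambda s: s):
    """Added/removed lines between a and b after normalizing each line."""
    a_lines = [normalize(line) for line in a.splitlines()]
    b_lines = [normalize(line) for line in b.splitlines()]
    diff = list(difflib.unified_diff(a_lines, b_lines, lineterm="", n=0))
    # Skip the two file-header lines, keep only added/removed lines.
    return [line for line in diff[2:] if line[:1] in ("+", "-")]


print(len(changed_lines(old, new)))                # every indented line differs
print(changed_lines(old, new, normalize=str.lstrip))  # only the real edit
```

The raw comparison reports both indented lines as removed and re-added, while the normalized one reports only the changed log call.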

Neither a tabber nor a spacer be!

Nor a coder in PythonLanguage

cheers Danny

See TabDelimitedTables WikiTabotage SpellWhiteSpace WhitespaceLanguage SyntacticallySignificantWhitespaceConsideredHarmful WhitespaceIsGood