Forth Readability

Programs written in the ForthLanguage can be hard for some programmers to read for these reasons:

Reverse-Polish Notation (PostfixNotation) -- arguments precede the operator, which makes a lot of sense but is something many programmers are not accustomed to reading.
Stack juggling -- the basic Forth model is that of a stack machine. Data values on the stack must often be reordered to support the operations upon them. This can lead to a lot of "stack noise" in Forth programs. It is difficult for a reader to visualize the stack manipulations.
Terse operators -- many of the common words, like @ (fetch), ! (store), . (dot), and so on, are arbitrary symbols whose meanings must be memorized (Well, "at" is pretty intuitive, and I found ! pretty intuitive because SchemeLanguage uses it to indicate modification...)
No restrictions on identifier names -- an identifier (called a "word" in Forth) is any sequence of non-whitespace characters. This leads many Forth programmers to invent terse cryptic names consisting of punctuation marks.
No data types -- the basic data type in Forth is a "cell", which is an untyped byte or set of bytes. Forth does not require any declarations or have any semantic constraints on how cells can be used. This gives programmers a lot of freedom, but often makes it difficult for a reader to understand how a value is being used.
"Unsafe" pointers and arrays -- many Forth programs act directly on memory addresses and use lots of address arithmetic. Forth does not constrain this in any way.
Embedded systems use -- Forth is often used to implement embedded systems and systems with minimal memory and other capabilities. Thus, lots of the sample code one finds is terse and optimized to minimize program size and download time.
Shadow blocks -- many Forth programmers are accustomed to keeping documentation in "shadow blocks" or other files that are associated with the source code. Thus, the code itself has few inline comments because the real documentation is elsewhere.
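
To make the first three items concrete, here is a minimal sketch in standard Forth. The names total and sum-of-squares are made up for illustration, and the stack pictures in parentheses are only comments.

 variable total                \ hypothetical one-cell variable
 3 4 +                         ( -- 7 )   \ postfix: operands first, then the operator
 total !                       ( 7 -- )   \ ! stores the top of stack at the address of TOTAL
 total @ 5 +                   ( -- 12 )  \ @ fetches the cell back
 .                             \ . prints and discards the 12
 : sum-of-squares ( a b -- a*a+b*b )  dup * swap dup * + ;   \ dup and swap are the "stack noise"
 3 4 sum-of-squares .          \ prints 25

A reader has to track the stack picture mentally through dup * swap dup * + to see that both arguments get squared, which is exactly the difficulty described under "stack juggling".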

All of these problems can be surmounted by anyone willing to learn Forth. But it is certainly true that someone who only knows C and Java probably won't be able to read the average Forth program at first glance.

Inflammatory material moved to ForthFlames -- verec 27Aug2001

One aspect of common Forth practice that affects readability is the use of Forth as a MetaLanguage. In other words, many Forth programmers use the ForthLanguage to define domain-specific or application-specific languages for solving particular problems, and then write the high-level definition of the applications in those languages. This leads to a lot of inconsistency of style, syntax, and semantics. For example, a Forth program that analyzes astronomical data may look absolutely nothing like a Forth program that controls industrial robots. If the programmer does not document the created language, and if the reader is not familiar with the domain, then it can be difficult for the reader to understand the meaning of the application-specific language. And if the programmer is not a skilled language designer, a big mess can result. --KrisJohnson

Compare: Many C programmers use C to define domain-specific or application-specific libraries for solving particular problems, and then write the high-level definition of the applications using those libraries. This leads to inconsistencies of style, syntax, and semantics. For example, a C program that analyzes astronomical data may look absolutely nothing like a C program that controls industrial robots. If the programmer does not document the created library, and if the reader is not familiar with the domain, then it can be difficult for the reader to understand the meaning of the library. And if the programmer is not a skilled library designer, a big mess can result.

Should we be surprised that not understanding the domain, nor the code to act in it, would confuse the programmer, regardless of the language? --BillTrost

The difference is that in a C program, no matter who writes it, function definitions look like function definitions, function calls will look like function calls, array operations will look the same, arithmetic operators work the same, pointers work the same, #include directives work the same, and so on. A Forth programmer is more likely to invent a whole new "syntax" for a domain-specific language, which will be gibberish to one who has not figured out the programmer's intention.

The difference is that in a Forth program, no matter who writes it, words are always whitespace delimited. The uniform method by which both variables and functions are referenced leads to highly expressive programs, without rival at all until the advent of FunctionalProgramming as a household term. As a result, once you know the notation, Forth reads like prose, not like mainframe-era job control language. Array operations all look the same unless factored out (+ @ or + !), arithmetic operators work the same, pointers are all Forth knows so they must work the same, Forth uses requires or load to hierarchically load dependent modules (though this is bad form; structuring your code like a book is preferred), and so on. A Forth programmer is likely to invent a whole new syntax for a domain-specific language because that allows the most compact and correct representation of a problem's solution, which is gibberish to only those who have not bothered to read the documentation of the program. A C programmer has consistent syntax, but as a result, you lose the ability to express the problem at the problem's most natural level of abstraction. You end up with a total mess of a program, where the details of syntax often weigh more heavily on the program's design than the intended problem solution! -- SamuelFalvo?
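
The remark about array operations can be sketched as follows, assuming a standard Forth with CREATE, ALLOT and CELLS. The names samples and sample are hypothetical. Raw access is always the same address arithmetic followed by @ or !; factoring it into one word gives every later use a name to read.

 create samples  10 cells allot              \ hypothetical array of 10 cells
 42  samples 3 cells +  !                    \ raw form: base + offset, then store
 samples 3 cells +  @ .                      \ raw form: base + offset, then fetch; prints 42
 : sample  ( i -- addr )  cells samples + ;  \ factored: index in, address out
 7 5 sample !                                \ the same kind of store, now "7 into sample 5"
 5 sample @ .                                \ prints 7

Nothing here checks that the index is below 10, which is the "unsafe" side of the same coin.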

In many programming languages, variable names carry a lot of the clues about what the code is doing. In Forth, this is missing. That's one reason why Forth style tends toward lots and lots of small words - you use just as many names to explain what's going on, but you keep them in different places.
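
A small sketch of that habit, with made-up word names. Where another language might introduce named variables or constants for the intermediate values, the Forth version puts the explanatory names on tiny words instead.

 : minutes  ( n -- seconds )  60 * ;
 : hours    ( n -- seconds )  60 * minutes ;
 : timeout  ( -- seconds )    2 hours  30 minutes  + ;
 timeout .      \ prints 9000

No variable appears anywhere, yet the definition of timeout reads close to the phrase "2 hours 30 minutes".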

Interesting observations, but somewhat skewed. I am amused that I almost never see these criticisms of AssemblyLanguage. Perhaps that's because ASM isn't *supposed* to be a "high level" language.

Reverse-Polish Notation -- arguments precede operator, which makes a lot of sense but is something many programmers are not accustomed to reading
- << A consequence of modeling the actual behavior of the machine. "PRINT 1+2" is a fine abstraction, but it's not what the machine does >>
Stack juggling -- the basic Forth model is that of a stack machine. Data values on the stack must often be reordered to support the operations upon them. This can lead to a lot of "stack noise" in Forth programs. It is difficult for a reader to visualize the stack manipulations.
- << Yup, you have 2 fully exposed stacks. Yup, you get to manipulate them as you wish. On t'other hand, C and other languages of that family are rife with "stack noise" but it's hidden. >>
Terse operators -- many of the common words, like @ (fetch), ! (store), . (dot), and so on, are arbitrary symbols whose meanings must be memorized
- (Well, "at" is pretty intuitive, and I found ! pretty intuitive because SchemeLanguage uses it to indicate modification...)
- << ... unlike intuitive operators like ?:; and ~= and %= oh, and | vs || which are naturally clearer. All my favorite languages have funny operators, and I had to learn each set. >>
No restrictions on identifier names -- an identifier (called a "word" in Forth) is any sequence of non-whitespace characters. This leads many Forth programmers to invent terse cryptic names consisting of punctuation marks.
- << This is not a function of the language. Picking bad names happens in all languages. >>
- This is also patently false. I openly challenge you to find a snippet of Forth code written by anyone at all besides (a) Chuck Moore, (b) someone who contrived code deliberately to satisfy this challenge, and (c) a program where such notation isn't the most natural language to express solutions to problems. E.g., Forth Scientific Library contains tons of operators and functions that are inspired by their algebraic equivalents; the notation used therein is obvious to math-geeks who bother to read the FSL docs. Chuck's work is off-limits because he deliberately avoids long names at all costs, much to the consternation of every other Forth coder on the Internet. This claim that Forth "leads naturally" to names consisting exclusively of punctuation is to be taken only as an insult. -- SamuelFalvo?
No data types -- the basic data type in Forth is a "cell", which is an untyped byte or set of bytes. Forth does not require any declarations or have any semantic constraints on how cells can be used. This gives programmers a lot of freedom, but often makes it difficult for a reader to understand how a value is being used.
- << This is actually true. Here we get into the realm of AssemblyLanguage, where the programmer is completely responsible for invoking appropriate operations. Care in naming ameliorates this somewhat. >>
- The intended/preferred coding style is, to borrow a term from the Haskell community, to create a trusted kernel of functionality. Like any non-primitive type, in any language that I can think of except BASIC, you're supposed to create a trusted kernel of functionality that abstracts the (e.g.) pointers or handles you dole out. In Modula-2 and its progeny, this trusted kernel is called a module. In BCPL and its progeny, a compilation unit. As a result, Forth doesn't need type systems because most programmers use such trusted kernels. It's interesting that languages with the strongest of the strongest type systems, such as Modula-2 and Haskell, both advocate this type of coding for the sake of maintainability if nothing else.
"Unsafe" pointers and arrays -- many Forth programs act directly on memory addresses and use lots of address arithmetic. Forth does not constrain this in any way.
- << Neither does AssemblyLanguage. Caution, unshielded power source. Some tools can be dangerous; keep sharp ends pointed away from the body. >>
- Neither does C. You need C++ to catch such things, or non-standard compiler switches, and even then, it's only if the C/C++ compiler can catch such things.
Embedded systems use -- Forth is often used to implement embedded systems and systems with minimal memory and other capabilities. Thus, lots of the sample code one finds is terse and optimized to minimize program size and download time.
- << In those embedded applications where the lexicon is preserved, perhaps. Compiling to native code allows full clarity in labelling. >>
Shadow blocks -- many Forth programmers are accustomed to keeping documentation in "shadow blocks" or other files that are associated with the source code. Thus, the code itself has few inline comments because the real documentation is elsewhere.
- << Yup. That's a bit of a thinking shift. Keeps the size of loaded blocks down, though. Good in non-compiled environments. >>
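
A sketch of both the "no data types" complaint and the "trusted kernel" answer, using only standard words. The tally and counter- names are hypothetical. Nothing stops the first three lines from treating a cell as a number, a character or an address; the small kernel below them is the convention that keeps raw @ and ! out of the rest of the program.

 65 .                    \ the cell as a number: prints 65
 65 emit                 \ the same value as a character: prints A
 here .                  \ an address is just another cell, printable like any number
 variable tally          \ hypothetical private storage, meant to be touched only by the words below
 : counter-reset  ( -- )    0 tally ! ;
 : counter-bump   ( -- )    1 tally +! ;
 : counter-value  ( -- n )  tally @ ;
 counter-reset  counter-bump counter-bump  counter-value .   \ prints 2

The language does not enforce the boundary. It holds only as long as other code agrees to go through the counter- words.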

Kris' observations above are very pertinent. Forth is, in many ways, a "language assembly" kit (vs AssemblyLanguage). It has all the low-level power you need and the power to spin really high-level abstractions.

No seatbelts (just like AssemblyLanguage). You don't want to put this kind of raw power into careless or clumsy hands. Though those who are careless and clumsy either learn from their mistakes really fast, or move on to other languages that come pre-equipped with roll-cages.

I've written really awful Forth (a full sketch program in 48 lines) and really clean Forth (a modem control and dialer program). I've written pretty C and ugly C. Ditto AssemblyLanguage.

In the end, the programmer, not the language, determines readability. -- GarryHamilton

The end result is that the program is expressed in terms a human reader can understand. Humans do not think in terms of abstract nouns, but rather imperatives. When you look at a set of instructions, what's clearer to you, (a) "Insert the aforementioned article into the previously described receptacle," or (b) "Insert the plug into the wall socket." Both say the same thing, but one requires the reader to have a larger context kept in his/her head, while the other is simple, direct, and to the point. I would rephrase (a) as "Insert it there." Now which is clearer?

This is not to say that variables are a bad thing. If a language is designed to use variables, such as most applicative languages, then that's perfectly OK. But Forth isn't an applicative language -- it's concatenative. It's designed to not use variables, and the structure of the language reflects that. If you try to write C code in Forth, your resulting program will be an unreadable mess. If you try to write Forth code in C, the result won't even compile. If you write C code in C, it'll be clear and readable to a C programmer. If you write Forth code in Forth, it'll be clear and readable to a Forth programmer. (That's true, but Forth has less chance of being understood by someone that doesn't know the language than a program written in C. Compare them visually: ForthAndCsample. In my experience, I can teach someone who isn't a coder the rudiments of Forth in 15 minutes. Teaching them C takes much longer. --SamuelFalvo?)

Huh? If you show the two examples to a person who doesn't know any programming languages, the C example isn't gonna be easier to understand than the terse Forth example. In fact the C example would probably be harder to understand because there's more weird syntax such as i++ and for( ; ; ). Most people exposed to programming get exposed to an imperative language, like C, Pascal, or Java. They forget the learning curve that exists. I think the learning curve for Forth is similar to that of a person's first imperative language.

The PrettyPrinter for GNU Forth (gforth command 'see word') demonstrates one rather readable CodingStyle:

 : star  
   [char] * emit ;
 : stars  
   0 
   DO     star 
   LOOP 
   cr ;
 : square  
   dup 0 
   DO     dup stars 
   LOOP 
   drop ;
 : triangle  
   1 
   DO     i stars 
   LOOP ;
 : tower  
   dup triangle square ;
 : main  
   cr 7 stars cr 3 triangle cr 6 tower ;

This code is actually a fine example of how a beginner would code in Forth. Predominantly vertical, because he's not accustomed to keeping stack state in his head. Fortunately, this code can be cleaned up significantly with experience.

 : stars      0 do [char] * emit loop cr ;
 : square     dup 0 do  dup stars  loop drop ;
 : triangle   1 do  i stars  loop ;
 : tower      dup triangle square ;
 : main       cr 7 stars  cr 3 triangle  cr 6 tower ;

Note the use of spaces to separate phrases within the same definition.
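
Another common aid, not shown in the listings above, is a stack-effect comment after each name. A sketch for two of the words, where the comments are one reading of the definitions rather than part of the original code:

 : stars   ( n -- )  0 do [char] * emit loop cr ;      \ one row of n stars; n must be at least 1
 : square  ( n -- )  dup 0 do  dup stars  loop drop ;  \ n rows of n stars
 3 square

The ( n -- ) notation says each word consumes one number and leaves nothing, which spares the reader from simulating dup and drop to find that out.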

If it helps the reader to understand, feel free to annotate your code like so:

 ( let )

  : stars      0 do [char] * emit loop cr ;
  : square     dup 0 do  dup stars  loop drop ;
  : triangle   1 do  i stars  loop ;
  : tower      dup triangle square ;

 ( in )

  : main       cr 7 stars  cr 3 triangle  cr 6 tower ;

However, I'd hazard a guess that what's being attempted here is not actually expressed at the right level of abstraction. Someone who is not an experienced Forth coder will find the above only marginally easier to read than JayLanguage.

So, at the risk of introducing concepts using strange notations in one place, we can make the code easier to read (even if slightly more verbose) elsewhere:

 : invoke      >r ;   ( I've discovered using "call" primitive in GForth on 64-bit capable CPUs is buggy )
 : repeated:   begin dup while r@ invoke 1- repeat drop rdrop ;

 : stars       repeated: [char] * emit ;
 : row         over stars ;
 : onASide     repeated: row cr ;
 : square      dup onASide drop ;
 : triangle    0 do  i stars cr  loop ;
 : tower       dup triangle square ;

So, sometimes, with a little imagination and a wee bit of experience, you actually can write readable Forth after all. Who would have thought? :) --SamuelFalvo?

Ooh, that is really neat. I'd been wondering if you could do functional programming in Forth. Elegant, but it didn't work for me in gforth 0.6.2 on Linux 2.6.26-1-amd64 (or in a 32-bit chroot), so I came up with the following alternative for "invoke" that simply monkeys with the return stack rather than relying on "call". I also had to do a little magic to make sure the repeated phrase had access to the expected stack contents:

 : invoke     >r ;
 : repeated: begin  dup  while  r@ swap >r invoke r> 1-  repeat  drop rdrop ;

-- BillTrost

That's true, but Forth has less chance of being understood by someone that doesn't know the language than a program written in C.

I don't understand why it's important for people who don't know a language to be able to read code written in it. Can someone explain?

ForthLanguage, ExampleForthCode, ForthIsDead, ForthPortability, ForthReusability, ForthPessimism

CategoryForth