Markup Language

A markup language defines text elements that carry meta-information about a text: how to lay out the text, what fonts to use, whether a passage is important, and how it relates to other texts.

Some examples:

Often the plain text, once marked up, is barely recognizable, buried within the MarkupLanguage. An exception is PXSL, which reverses the process and only marks up the data.

A MarkupLanguage uses the EscapePattern.


WikiWiki is different. It doesn't require a MarkupLanguage. We use WhiteSpace and established plain text conventions: empty lines separate paragraphs, asterisks start list items. Simplicity means that many people can contribute without learning a lot of baggage first. Most of the markup happens implicitly. Putting empty lines between paragraphs is simple and most people don't have to think about it. Nobody has to memorize fancy rules. That is why anybody is able to contribute.
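The two conventions named above are small enough to sketch in a few lines of Python. This is only an illustration, not the code any actual wiki runs: the function name render and the choice of HTML output are assumptions made for the example.

```python
import html
import re

def render(text: str) -> str:
    """Render plain text using two conventions only:
    empty lines separate paragraphs, asterisks start list items."""
    out = []
    # One or more empty (or whitespace-only) lines end a block.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if not lines:
            continue
        if all(ln.startswith("*") for ln in lines):
            # Every line starts with an asterisk: treat the block as a list.
            items = "".join(
                f"<li>{html.escape(ln[1:].strip())}</li>" for ln in lines
            )
            out.append(f"<ul>{items}</ul>")
        else:
            # Anything else is an ordinary paragraph.
            out.append(f"<p>{html.escape(' '.join(lines))}</p>")
    return "\n".join(out)

print(render("First paragraph.\n\n* one\n* two"))
```

The writer never types a tag; the structure is read off the whitespace and the asterisks.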

Some of the conventions are getting dangerously close to YetAnotherMarkupLanguage: two, three, and up to six apostrophes for emphasis, square brackets for links (on other Wikis). That's not good: The benefit of not having a MarkupLanguage lies in the simplicity. Let's not lose that advantage by adding more and more tags to the TextFormattingRules.
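For comparison, here is a rough Python sketch of the apostrophe rule. It assumes the common wiki mapping (two apostrophes for italics, three for bold) and ignores the longer runs; the function name emphasize is made up for the example.

```python
import re

def emphasize(text: str) -> str:
    """Apply apostrophe emphasis: '''bold''' first, then ''italic''."""
    # Three apostrophes must be handled before two, or the shorter
    # pattern would consume part of the longer delimiter.
    text = re.sub(r"'''(.+?)'''", r"<b>\1</b>", text)
    text = re.sub(r"''(.+?)''", r"<i>\1</i>", text)
    return text

print(emphasize("a ''b'' and '''c'''"))
```

Even this tiny rule needs an ordering trick to work, which hints at how quickly such conventions stop being obvious to the person typing them.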

Contributors: AlexSchroeder.


self-reinforcing text

Simplicity isn't defined by numbers though - more text formatting rules are fine if they are simple and self-reinforcing. Putting links inside square brackets is fine, so long as the resulting formatted text preserves those square brackets - it acts as a small lesson in how to write links. That's why bullet lists start with asterisks - the markup language (which isn't) looks like the final result. Editors of printed texts have all manner of symbols for marking up corrections: a lowercase letter that should be capitalized is enclosed in a large letter C, which is a literal clue to Capitalize; inserts are indicated with a wedge-shaped caret; and so on.
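A small Python sketch of what "preserving the square brackets" could look like. The rule shown (a bare URL inside brackets) and the name render_links are assumptions for illustration; real wikis differ in the details.

```python
import html
import re

# Hypothetical rule: a bare http(s) URL inside square brackets is a link.
LINK = re.compile(r"\[(https?://[^\s\]]+)\]")

def render_links(text: str) -> str:
    """Turn [url] into a hyperlink whose visible text keeps the brackets."""
    escaped = html.escape(text)
    # The brackets stay in the link text, so the rendered page
    # shows readers exactly what to type to make a link.
    return LINK.sub(r'<a href="\1">[\1]</a>', escaped)

print(render_links("See [http://example.com/page] for more."))
```

The formatted page still shows the brackets, so reading a page teaches the rule for writing one.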

A good attribute of a markup language is the ability to scrape it directly from an HTML page and have it appear with bold and italics intact. Wiki markup doesn't quite do this. Is it possible?

Many modern email readers partially implement this -- they accept plain ASCII text email, recognize http: links and format them so they really work, and do a few other bits of formatting. The syntax-highlighting editors seem even closer to what you want to do. They only store the raw text, and regenerate all the highlighting (colors, italics, bold, links, etc.) on-the-fly every time they load a text file. That's what you want, right? Something that's easy to learn, because you can directly see everything on screen. All the "magic" formatting stuff is regenerated from the letters you can see, rather than stored in some kind of (normally) invisible formatting code. Rather than the act of creation being some sort of mystery (involving memorizing arcane control codes or hunting around on a menu), it becomes transparent -- you can see exactly which keys the author pressed to create that text. -- DavidCary
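The email-reader behaviour can be approximated in a few lines of Python. This is a sketch under stated assumptions, not how any particular mail program works: the URL pattern is deliberately crude and the name linkify is invented for the example. The point is that only the raw text is stored; the links are regenerated each time it is displayed.

```python
import html
import re

# Crude pattern: anything from http:// or https:// up to whitespace.
URL = re.compile(r"https?://[^\s<>\"]+")

def linkify(raw: str) -> str:
    """Regenerate clickable links from raw text at display time."""
    out, pos = [], 0
    for m in URL.finditer(raw):
        # Trailing punctuation usually belongs to the sentence, not the URL.
        url = m.group(0).rstrip(".,;:!?)")
        out.append(html.escape(raw[pos:m.start()]))
        out.append(f'<a href="{html.escape(url)}">{html.escape(url)}</a>')
        pos = m.start() + len(url)
    out.append(html.escape(raw[pos:]))
    return "".join(out)

stored = "Read http://example.com/a, then reply."
print(linkify(stored))  # the stored text itself is never modified
```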


Many people believe XML should be used for describing structured text, like the text used by WikiWiki systems. Their reason is that it's easy to use standard tools like XML parsers, which come in different flavors (DOM, SAX, pull parsers), or tools like XSLT.
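To make the three parser flavors concrete, here is a sketch using Python's standard library (xml.dom.minidom for DOM, xml.sax for SAX, xml.dom.pulldom for pull parsing). The sample document and its element names (page, title, para) are invented for the example; each function extracts the same paragraph texts in a different style.

```python
import xml.sax
from xml.dom import minidom, pulldom

DOC = "<page><title>MarkupLanguage</title><para>Hello</para><para>World</para></page>"

def paras_dom(doc: str) -> list:
    """DOM: build the whole tree in memory, then query it."""
    tree = minidom.parseString(doc)
    return [n.firstChild.data for n in tree.getElementsByTagName("para")]

class ParaHandler(xml.sax.ContentHandler):
    """SAX: the parser calls back as it meets each tag and text run."""
    def __init__(self):
        super().__init__()
        self.paras = []
        self._buf = None  # collects text while inside a <para>

    def startElement(self, name, attrs):
        if name == "para":
            self._buf = []

    def characters(self, content):
        if self._buf is not None:
            self._buf.append(content)

    def endElement(self, name):
        if name == "para":
            self.paras.append("".join(self._buf))
            self._buf = None

def paras_sax(doc: str) -> list:
    handler = ParaHandler()
    xml.sax.parseString(doc.encode("utf-8"), handler)
    return handler.paras

def paras_pull(doc: str) -> list:
    """Pull: the caller asks for events and expands only what it needs."""
    paras = []
    events = pulldom.parseString(doc)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "para":
            events.expandNode(node)  # read this element's subtree
            paras.append(node.firstChild.data)
    return paras

print(paras_dom(DOC), paras_sax(DOC), paras_pull(DOC))
```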

The problem I find with these markup languages is that it's very hard for people to write them without programs that keep track of what is allowed in a given context. Moreover, the tools used to process documents written in such a markup language are very complex, not only to use but also to implement. Just look at [http://xml.apache.org/xerces-j Xerces] and [http://xml.apache.org/xalan-j Xalan], the leading open-source Java tools for XML processing: they're full of bugs, incompatibilities and other problems.
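A quick way to see how easily hand-written XML goes wrong is to feed a few plausible slips to a parser. The sketch below uses Python's xml.etree.ElementTree; the helper name is_well_formed and the sample strings are made up for the example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(doc: str) -> bool:
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

samples = [
    "<p>Some <b>bold</b> text</p>",  # correct nesting
    "<p>Some <b>bold</p></b>",       # tags closed in the wrong order
    "<p>AT&T</p>",                   # unescaped ampersand
]
for s in samples:
    print(is_well_formed(s), s)
```

A single misplaced closing tag or bare ampersand makes the whole document unreadable to the parser, whereas a stray asterisk in wiki text costs at most one oddly formatted line.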

The question is: why do we need all this complexity? The simple annotation of the text, as used by WikiWiki (see TextFormattingRules), should be sufficient.

I think it may have to do with our need to solve problems in a different way. But are those different ways necessarily better?


I happen to like Icoruma, a markup language designed for a very specific purpose, which is to type rules for role-playing games. It can then be compiled into HTML or other formats. Formatting codes include: bold, italic, superscript, index entry, numbered list, unnumbered list, footnotes, tables, six levels of headings, and two types of cross-references. It also supports include files, subroutines, and custom fields. It is even Turing-complete: although there is only one kind of loop (iteration through records in a list), you can still call subroutines recursively. For an example, see: http://zzo38computer.cjb.net/icosahedral/icoruma/spells.irm


I don't know, guys. It seems like a pretty tough act to follow with XML. The content is still fully human readable and can be formatted to fit within a large number of constraints. Not only that, but the content and its representation can be isolated to any level you choose just by smarts applied to the design of the elements and their attributes. XML gets my vote as the #1 choice for cross-platform data transport.


Other references:

http://www.usemod.com/cgi-bin/wiki.pl?WikiMarkupLanguage, http://en.wikipedia.org/wiki/Lightweight_markup_language


See: AlmostFreeText

Contributors: AlexSchroeder, DavidCary, MartySchrader

CategoryLanguageFeature (may not be a very good match, but it belongs somewhere) CategoryTextFilter


Last edited December 3, 2010.