Html Sucks

[Please note that most of the ranting on this page is years out of date and no longer applies to web standards as they are enforced now.]

HyperTextMarkupLanguage is a concretion of years of groups actively avoiding a standard. While it has many advantages, including ease of use, there has not been a sufficiently critical eye cast upon the bigger picture.

HTML has some problems, which include: The majority of HTML users in the beginning just wanted to make HyperText more portable, with an emphasis on doing the minimum necessary to get the data displayed. The popularization and subsequent evolution have resulted in a very strong emphasis on presentation at all costs. As a result, many spurious formatting tags have been injected into the language. These changes in focus have added zero value in regard to the markup and creation of text, and instead divert energy that could better be used elsewhere.

The evolution has also brought us the BrowserWars, and the subsequent HTML writer's hell of trying to get a page to display properly on all the popular browsers, and all versions thereof. This would be far less of an issue if everyone were able to accept function over form.

Because the markup is embedded directly into the source text, adding further separate layers of markup is ruled out. For example, if you have a page, and I wish to highlight a section of it that happens to cross the tags in the page, the page must be completely parsed, and multiple sets of highlight tags created to simulate what should be a very simple task.
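To make the highlighting complaint concrete, here is a rough Python sketch of what "multiple sets of highlight tags" ends up looking like. The Highlighter class, the highlight function and the choice of a mark tag are made up for illustration, and the sketch ignores comments, void elements and other details a real tool would have to handle.

```python
from html import escape
from html.parser import HTMLParser


class Highlighter(HTMLParser):
    """Re-emit an HTML fragment, wrapping text offsets [start, end) in <mark>.

    Hypothetical helper: offsets count characters of the tag-free text.
    """

    def __init__(self, start, end):
        super().__init__(convert_charrefs=True)
        self.start, self.end = start, end
        self.pos = 0   # offset into the tag-free text seen so far
        self.out = []

    def handle_starttag(self, tag, attrs):
        rendered = "".join(f' {k}="{escape(v or "")}"' for k, v in attrs)
        self.out.append(f"<{tag}{rendered}>")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Clip the requested range to this text node.
        lo = max(self.start - self.pos, 0)
        hi = min(self.end - self.pos, len(data))
        if lo < hi:
            # Each text node touched by the range needs its own <mark> pair.
            self.out.append(escape(data[:lo]))
            self.out.append("<mark>" + escape(data[lo:hi]) + "</mark>")
            self.out.append(escape(data[hi:]))
        else:
            self.out.append(escape(data))
        self.pos += len(data)


def highlight(html_fragment, start, end):
    parser = Highlighter(start, end)
    parser.feed(html_fragment)
    parser.close()
    return "".join(parser.out)


print(highlight("<p>Hello <b>big</b> world</p>", 3, 12))
```

One logical highlight ("lo big wo") turns into three separate mark elements, because the range crosses the boundaries of the b element.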

There is also the lack of being able to annotate someone else's work. Due to completely practical management concerns, most non-wiki web pages cannot be modified. Because HTML isn't true HyperText, it's not possible to practice TransClusion, nor is it possible to refer to part of another document. A true markup system would have to allow anyone to mark up any text.

HTML is also a hierarchical database, which does not support the addition of new layers on top of the existing data. This makes it impossible to add multiple layers of markup, let alone providing a facility for the user to select which markup layers are displayed from the available sources.
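For comparison, here is a minimal sketch of the layered alternative this paragraph is asking for, assuming the text is stored once and each layer of markup lives outside it as a list of character ranges. The Span type, the layer names and the render function are invented for the example; nothing like this is part of HTML.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int   # offset of the first character covered
    end: int     # offset one past the last character covered
    label: str


text = "HTML is a hierarchical database."

# Each layer is independent of the others, so spans may overlap freely.
layers = {
    "author": [Span(0, 4, "term"), Span(10, 31, "claim")],
    "reader": [Span(5, 22, "disputed")],   # overlaps both author spans
}


def render(text, layers, selected):
    """List the spans of the selected layers, each with the text it covers."""
    rows = []
    for name in selected:
        for span in layers[name]:
            rows.append((name, span.label, text[span.start:span.end]))
    return rows


for row in render(text, layers, ["author", "reader"]):
    print(row)
```

Because the reader's layer never touches the author's text or markup, anyone can add one, and the user picks which layers to display by changing the selected list.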

HTML is a good tool for many reasons, but when you need to write text, and mark it up, Microsoft Word still seems to be the only way to go. But see MicrosoftWordComplaints, such as having an unpublished proprietary format that is difficult to manipulate without Microsoft tools, and inValidHtml output (in older versions of Word, at least).

Please excuse this, it is a bit of a rant, I hope I kept it fairly sane and balanced.

-- MikeWarot


Alternatives?

None of these alternatives is a replacement for HTML; at best overlapping with HTML. Some might replace HTML if you jump through a zillion hoops.


The name HTML - Hyper Text Markup Language, implies a rich set of features that don't exist in reality. -- MikeWarot

In short, HTML really sucks. ItSucksBecauseItDoesntDoWhatiWant

That's one of the weakest arguments I've seen for a while. In particular, linking to arbitrary sections would require us to establish some concept of identity for different text portions under change - that's the problem that e.g. patch tries to solve (and does so only for almost-trivial cases). -- PanuKalliokoski
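A small Python sketch of the identity problem, for what it's worth. It uses difflib from the standard library, which is not how patch works internally; the relocate function is made up for illustration. It follows a character range from an old version of a text into a new one, and gives up as soon as the range itself was edited.

```python
from difflib import SequenceMatcher


def relocate(old, new, start, end):
    """Map the range old[start:end] into `new`.

    Returns the new (start, end), or None when the range does not survive
    intact inside a single unchanged block.
    """
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for a, b, size in matcher.get_matching_blocks():
        if a <= start and end <= a + size:
            shift = b - a
            return start + shift, end + shift
    return None


old = "HTML sucks because it is bloated."
start = old.index("bloated")
end = start + len("bloated")

# Text inserted before the anchor: the link can simply be shifted.
moved = relocate(old, "Frankly, HTML sucks because it is bloated.", start, end)

# The anchored words themselves were edited: there is no obvious answer.
lost = relocate(old, "HTML sucks because it is bloat-free.", start, end)
```

The first case is the almost-trivial kind; the second is where somebody has to decide what "the same section" means.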


See also: http://diveintomark.org/archives/2003/01/13/semantic_obsolescence


Minor HTML Annoyances


HTML violates the document model for UserInterface (see DocumentDefinitions) upon introduction of the 'Form'. That is, two different users may access the same 'form' by UniformResourceIdentifier, and yet have distinct content in their forms, thus proving they aren't viewing the same document. The inconsistencies and gotchas here hurt composition of HTML documents, hurt performance, and force WebBrowsers to serve a dual role as application platforms. Or perhaps that's looking at it from the wrong direction. Perhaps Meta:Refresh, the Refresh button in browsers, and History-tracking are the problems - violations of a consistent view that HTML documents create browser-applications. After all, HTML has been consistent in going further in the direction of 'browser-as-application-platform' by its later introduction of Dynamic HTML scripting and cookies.

HtmlSucks because Applications Suck. Applications combine too many concerns, and as a result resist composition and a number of optimizations that would be appropriate for multi-user shared, distributed, live ZoomableUserInterfaces. And, clearly, any approach to UserInterface that isn't optimal for multi-user shared, distributed, live ZoomableUserInterfaces must suck by every reasonable measure of suckage. ^_^


HTML is getting to the point where it should be considered deprecated technology. For simple web content (where separating presentation from content isn't necessarily important), it has been supplanted by XHTML, which does the same thing but is well-structured (being a proper XML application). Lots of other XML applications exist for the case where content and presentation ARE separate.

The biggest problem with browser incompatibility wasn't HTML; it was support for other features such as JavaScript and the like.

At this point, picking on HTML seems kinda silly... it served its purpose. Something newer and better is here. As t -> infinity, the number of pages on the web rendered in HTML as opposed to something else is likely to approach zero.


HTML sucks because it's bloated with a bunch of stuff that shouldn't be there. Essentially, HTML 2.0 is superior to HTML 4.0. All we need is basic logical markup, a way to define your own style classes, and a good style language such as CSS. Anything more complex than this should make use of MacromediaFlash, JavaApplets or something else better suited for the task.

But when it works, it's not all that bad. And in practice, it often does work, provided you keep a few things in mind. In fact, they are pretty much the same as for programming:

The former two of these are more or less forced by XHTML 2.0 (http://www.w3.org/TR/xhtml2/). I think it goes without saying that I can't wait. If authors were forced to write semantically correct documents free from presentational garbage, then users would be given a method for getting rid of all the annoying "style," all in one surgical operation. Think about it: You could have all web pages rendered in a uniform way! And browsers wouldn't have to be gigantic monolithic reptiles to be useful; you could successfully surf the web in the EmacsEditor (modulo MacromediaFlash animations, etc.).

-- DanielBrockman


An interesting example of how an HTML 'feature' came about: the invention of the IMG tag. Be sure to read the whole thread.

http://www.webhistory.org/www.lists/www-talk.1993q1/0182.html


To make it a little bit more technical [and less understandable ;) ]

Normal text is just that: a file of characters; the only structure is basic grammar of the language in which it is written. Hypertext is much richer: it has the ability to connect internal resources with other internal resources or even with external resources. To put this into a representation which a programme can use, there are few options but to put in a tree structure. We can write software which can handle trees and transform or render them into any other format: browsers, compilers and filters are all the same thing; and we know how to make them.

As far as I can tell the meaning of "representation" as is used for html and xml means that it is a tree of text and metadata about the text. The hierarchical tree is the only relevant representation at this level: an AbstractSyntaxTree. Only after rendering or compiling of this AST can a presentation (visual or otherwise) be made. Therefore the xml file is only a serialization of the intended AST which contains the meaning of its writer. This is why good html/xml does not contain visualization or styling information but optionally offers it in an external resource.
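As a rough illustration of "the file is only a serialization of the tree", here is a short Python sketch using xml.etree.ElementTree from the standard library. The sample document and the outline and plain_text helpers are invented for the example; plain_text is just one possible rendering among many.

```python
import xml.etree.ElementTree as ET

# A made-up document: the string is only one serialization of the tree.
source = "<doc><title>Html Sucks</title><p>HTML is a <em>tree</em> of text.</p></doc>"


def outline(node, depth=0):
    """Return one line per element; indentation shows depth in the tree."""
    lines = ["  " * depth + node.tag]
    for child in node:
        lines.extend(outline(child, depth + 1))
    return lines


def plain_text(node):
    """One possible rendering: drop the markup and keep only the text."""
    return "".join(node.itertext())


root = ET.fromstring(source)
print("\n".join(outline(root)))
print(plain_text(root.find("p")))
```

Nothing in the tree says how the em element should look; a renderer decides that after parsing, which is the point being made above.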

Personally, I believe that Mosaic's visualization of images inline was the very first mistake. Notice that the img tag used to be the a tag (http://www.webhistory.org/www.lists/www-talk.1993q1/0182.html): the image would be fetched and shown only when you activated the link, and it would then be shown in a new window (or, when this was already loaded and still existed, it would be raised). This was IMHO a much better use of screen space: now you have to scroll up and down every time the text refers to the image.

When Mosaic's tabloid-style rendering hit the market, most people began to believe that this was the way html should be rendered. The whole point when html was developed, however, was that each platform should render the html to the best of the platform's abilities. Somehow this notion got lost in the rush that followed, and now we have the current problems and html sucks again as much as any other solution before it already did. HTML is no longer abstract but cluttered with bold and italic and other useless stuff. At the same time, slowly, we have learned that computer screens are not paper and that the same rules do not apply: the tabloid rendering just does not work in many cases.

Enter xhtml2.0. We have gone back to the root of html, taken 20/20 hindsight with us, and recreated html how it should have been all along. Key facts: (a) html should not be written by humans, (b) optional external resources (other html pages, images, movies, etc.) are, well, optional and the html document should be understandable without them, (c) optional rendering hints (e.g., css, think of it as a kind of skin) can be given as an external resource, (d) optional AbstractSyntaxTree transformations (e.g., xslt, think of macros in LispLanguage) can be given as an external resource. -- GideonKlok

Personally, I don't think the problem is the separation of presentation and content, and I don't think XHTML 2.0 will solve it. The problem is that, like all successful innovations, HTML has long outgrown its intended purpose. People use HTML for all sorts of things that TimBernersLee couldn't have imagined when he invented it.

HTML will continue to be written by humans for as long as it's more convenient to do that. MacromediaDreamweaver and MicrosoftFrontPage made a dent in that, but now most HTML seems to be generated by PHP/JSP/ASP, and existing GUI tools don't work well with that (though Dreamweaver MX and VisualStudioDotNet are again making it feasible). Optional external resources aren't optional at all if you want a site the general public will visit (Wiki and most technical sites are not "sites the general public will visit", which is rather fortunate for us). Rendering hints as an external resource is catching on, thankfully, but XSLT hasn't because they make it so damn complicated.

I do techie work for a midsize (~50,000 user) fanfiction website. Even in new code, we still use tables for layout, a huge bastardization of the original intent of HTML. Why? Because a non-negligible portion of the user base still uses browsers that don't support CSS positioning well. While I'd love to put everything into nicely separated presentation and content and use only styles for layout, I can't. MakeItWorkMakeItRightMakeItFast - it's more important to make it work than to make it right.

I'm going to go out on a limb and predict that going back to the root of HTML will work only for those people that, well, were the root of HTML. It's fine for the computer geeks and physicists, because they've never cared about presentation anyway. It's even fine for the gizmoscenti, because they'll have the latest browsers that all render XHTML and CSS beautifully, and if there's a glitch, they'll put up with it for the buzzword compliance. But it's not fine for the net magazines. It's not fine for the teen sites. It's not fine for the e-commerce sites. And it's certainly not fine for the poor folks trying to deliver entire application UIs through the web. These people all have clientele that care very much about the presentation, and will likely go elsewhere if it's ugly. -- JonathanTang

I do assume that - because in my experience that's the only way to create a successful design. Before it can be used in ways its designers could never have anticipated, it's gotta be used in some way, and unless you're fantastically lucky, that's usually because it solves the problem at hand. Yes, it's great when a design proves flexible and extensible, but most of the time that requires significantly more effort, which means that someone else solves the problem at hand first, and then there's no initial problem for your design to solve. WorseIsBetter.

I'd love to see a general-purpose ObjectBrowser, particularly with the capability to treat HTML pages as pathetically backwards types of objects within it. But it has to handle HTML at least as well as existing browsers. It's not enough to say "Well, it sucks at what you're used to, but it's really great at this new paradigm that will make your WWW completely obsolete." You have to be able to say, "It does HTML as well as Netscape/Mozilla/IE/Firefox/Konqueror/Galeon/Opera, and it has this great new feature that makes software so much easier." I, unfortunately, don't have time to write such a complex piece of software. Do you? -- JonathanTang

Not inlining images would be as annoying as those old books where all the photographs are on a dozen glossy pages in the middle. -- Brock


Isn't the main problem with HTML that you can't just simply design applications to load on a form? By form, I mean one of those forms that you draw on with code, or with your mouse. If your browser acted like a form, instead of a document (think VisualBasic, delphi) then web applications would be fun. Would you try and make computer applications (software) function in a wordpad editbox or a WordPerfect editbox, or StarOffice editbox? This is what html is basically forcing you to do. Why not make them function on a form, not a document? HTML can be great for documents, but who says we have to use it for the "end output" of applications? The server language still spits out HTML in the end; it doesn't spit out anything on a form, as CeePlusPlus would in an OperatingSystem.

Cross platform is not a big issue. There is nothing stopping us from designing a cross-platform style language where you draw applications on a grid (either visually or by hand code), not a bloated RTF document. All browsers could conform to the "form standard" instead of the current "document standard" (html). An HTML browser is a document placeholder, not a form placeholder. In the real offline world, all GUI applications (software) are a form.

Look, it doesn't matter what language is on the server: PhpLanguage, ActiveServerPages, PerlLanguage, PythonLanguage, etc. On the client's browser, you have HTML. In the end, everything is converting over into HTML. One way or another, the code is going to have to end up in HTML once it gets on the client's computer. As for JavaLanguage, that's different, but that isn't the solution and that has nothing to do with what I am trying to point out here. You still always have HTML on the client's computer. WebBrowsers could be redesigned to read a form-based language, not a document-based language. It's as if we are still stuck in DOS (DiskOperatingSystem) days, but instead of DOS, we are still using documents to make our applications, instead of forms. -- AnonymousDonor

--

I think that's backwards - instead of forcing HTML to be form-based, we should ditch forms, go back to documents, and work on making those documents more interactive. The problem with forms, aside from ignoring the past 25 years of UI research, is that they force the user into a modal workflow where you must have all the info the computer needs before you can proceed. The user becomes a data source for the computer, instead of the computer becoming a tool for the user. That's why DirectManipulation UIs are almost universally preferred to form-fill-in.

DirectManipulation, however, is much harder to program, and was only just beginning to become popular before the web took over. The AppleMac took the lead, and now programs like EclipseIde and some scanner tools and games like StarCraft are all starting to embrace it. But the web, and e-commerce in particular, took us back to the days of IBM mainframes. Now, for many applications, you have to fill out multiple screensful of forms to do anything. And if you hit the back button, most sites won't even correctly remember what you've filled out.

Unfortunately, bringing real DirectManipulation to the web probably requires a LetsBlowUpTheUniverse type project. And those types of projects tend not to succeed. You run into problems with HTTP being stateless, and HTML not providing any abstractions like drag-and-drop, and browser scripting not being powerful enough to do it yourself, easily at least. And oftentimes you need to round-trip back to the server, which has a lot of overhead. So I'd love to be able to use the web the same way one could use Eclipse or Morphic or a LispMachine, but I don't see it happening any time soon. -- JonathanTang

So say my web site allows someone to register, with sufficient info for me to postal mail them products. What would the not-a-form look like that handles this? -- dm

You're familiar with NakedObjects, right? Like that. ;) Obviously, at some point you'll need a form to get the data in, but that form hardly has to be the focus of the application.

For example, consider Amazon.com, who've actually done a decent job coming up with a UI given the constraints of the web. Currently, if you want to buy stuff, you visit each of the pages of the items, click "Add to shopping cart", and then "Proceed to check out". (It may be simpler with one-click ordering, I don't have it turned on). Then you get presented with about 4 forms, to:

1. Log in
2. Choose a shipping address
3. Choose a shipping method
4. Choose a payment method
5. Review order and confirm

At least it remembers all your shipping and payment details, so you don't have to type those in each time.

The problem with this is that sometimes the info in later steps is necessary to make decisions in the first steps. I once spent about an hour on an Amazon order, because I needed 3 textbooks, 2 of which shipped in 24 hours and were needed in a week, and the other of which shipped in 2-3 days and wasn't needed for a month or so. Go through whole process, get to last screen, take a look at expected arrival date. Cutting it too close. Back to shipping methods, group shipments so they ship when ready instead of minimizing shipping cost. Expensive, but should work. Need to go back and put it on a different credit card though. But wait, now I can supersaver the slow textbook and save some money. Back through process. And oh, now I'm down almost to the original price, so I can put it back on the first credit card. Arghh. (And it didn't matter anyway, because the book that "shipped in 2-3 days" got to me before the 24-hour books that I needed. Never trust shipping estimates.)

I've had similar problems when ordering soon before a vacation, because the expected arrival date requires changes in the shipping address. There's nothing to do but go through and see when they'll arrive, then go back and tweak things.

Now, imagine if instead of the form entry, you could drag any icon on the site into the shopping cart. No more clicking through a dozen pages to follow the "Amazon also recommends..." links: you'd just grab their covers and toss them in the cart.

That's what I was saying with a form style system. For example drag and drop, mouse over, OnClick, etc. are all available in a program created in a language like Delphi, C++, etc. You don't have to click submit buttons; you can do it in real time, on mouseover events, drag events, etc.

Then when you check out, you could grab an address label and stick it on the package. For a new shipping address, you'd grab a blank label, which you'd then fill out form-style (preferably right on the page, where you can do stuff like add other items or change the credit card. Modal interfaces are bad). And if you change shipping options, the order total and expected ship date would be updated in real time.

Why are modal interfaces bad? You can initiate actions in real time in C++, Delphi, Python, etc. Mouseoff, mouseon, keyboardup, keydown, are some examples of real time dynamic updates.

That way you can just throw on a new label or change a shipping setting or use a new credit card if the results aren't what you like.

The important thing is to make the user feel like they are in control. If the computer needs data, a form is fine - but don't make me drop everything to fill it out. Instead, missing pieces should be quietly indicated the way Eclipse flags syntax errors: with a noticeable but unobtrusive mark that gets the attention but doesn't require immediate correction. -- JonathanTang
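A toy Python sketch of the non-modal idea, leaving the web and the drag-and-drop part out entirely. The Order class, its field names and the shipping prices are all made up. The order can be changed in any sequence, the total is recomputed whenever it is read, and missing pieces are merely reported instead of blocking anything.

```python
from dataclasses import dataclass, field
from typing import Optional

# Made-up prices, purely for illustration.
SHIPPING_COST = {"standard": 4.0, "express": 12.0}


@dataclass
class Order:
    items: dict = field(default_factory=dict)   # title -> price
    address: Optional[str] = None
    shipping: str = "standard"
    card: Optional[str] = None

    @property
    def total(self):
        """Recomputed on every read, so any change shows up immediately."""
        return sum(self.items.values()) + SHIPPING_COST[self.shipping]

    def missing(self):
        """Fields still needed before checkout; reports them, never blocks."""
        return [name for name in ("address", "card") if getattr(self, name) is None]

    def can_confirm(self):
        return bool(self.items) and not self.missing()


order = Order()
order.items["Textbook"] = 30.0
order.shipping = "express"      # total follows at once, no "next page"
flags = order.missing()         # what a UI would quietly mark, Eclipse-style
```

A front end would show the flags next to the relevant labels and only enable the final confirmation once can_confirm() is true.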

Agreed; I use Amazon heavily, and they drive me crazy with their UI (I also haven't noticed a way to switch items between "shopping list saved for later" and "wish list", which is stupid).

It's just that previously you said "ditch forms", which made much less sense to me than your clarification "you'll need a form to get the data in, but that form hardly has to be the focus of the application".

BTW Amazon could of course fix all of this, if they were extremely careful. Naked objects just make it (as far as I can see, not having actually done this) much easier for things to "just work" without that extra effort. -- dm


I also don't understand why he wants to ditch forms. When I say forms, I just mean a form instead of a document. You can always have a document inside a form. The advantages of a form seem twofold, or threefold to me. A form allows you to draw things on it wherever you need, whereas a document forces you to conform to its rules. We use tables when we should be using a form.


As an example, take UPS shipping tools or Ebay Turbo Lister. Why is it that when one downloads the UPS shipping tools software or Turbo Lister, they are far easier and far faster to use than the corresponding online browser-based shipping center? Because HTML generally isn't designed for hosting applications, and software is. Or in other words, HTML sucks when it's trying to run applications or emulate software.


There seems to be an overhead problem with HTML. There are even some websites that have drag and drop functionality, pop-up menus, etc., but the overhead that is needed is far too much. HTML seems to be trying to reinvent software applications inside a browser. It would be better just to run software in the browser.

[Not necessarily.

If you ran a powerful, general ObjectBrowser, then you could do the following:

That's one option.

Another option would be for the server to run its own ObjectBrowser (or Space) which your browser delegates to when it runs into the aptly named .space object type. In that case, your OB would forward its user events to the remote OB and the remote would send back frames.

The third option would be to import objects and methods to the local VM so they can be run securely under a capability system. Basically, to distribute your server-side application across to the client side. This is the most involved of the three options and also the least likely. Even IF you have a powerful high-level capability based security system in place that makes absolute security possible and easy, it's conceivable that the user missed something that would let the remote code damage their machine.]

Personally, I think that HTML is completely obsoleted by a secure remote method invocation package. Why do you need a protocol when you've got a graph coming through the pipe? Sure, it uses some kind of protocol internally, somewhere, but why do you care? Why would you even want to parse HTML when you can just send messages to a Smalltalk object of class HtmlPage? Why reinvent marshalling, parsing and unmarshalling for web crap when you've got a generic mechanism that works for everything.

That's why I really, really don't see the point with XML. Apparently, it's a way to marshal, parse and unmarshal trees of tagged data. Ummm, what about graphs? And what's wrong with just creating and using the bloody class definitions in Smalltalk? -- rk

A lot of people rail against XML, but although it's overhyped, otherwise there's not that much wrong with it. In particular you can use it to represent graphs if you want to, and the advantage over using Smalltalk is of course that XML can be read and written by any programming language.
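One common way to do the graph part, sketched in Python, is to give nodes identifiers and let edges refer to them by name, so the element tree only has to carry the node list. The graph, node and edge element names and the load_graph function are made up here; they are not any standard vocabulary.

```python
import xml.etree.ElementTree as ET

# A three-node cycle, which no pure tree nesting could express directly.
source = """
<graph>
  <node id="a"><edge to="b"/></node>
  <node id="b"><edge to="c"/></node>
  <node id="c"><edge to="a"/></node>
</graph>
"""


def load_graph(xml_text):
    """Build an adjacency mapping from <node id> / <edge to> elements."""
    root = ET.fromstring(xml_text)
    return {
        node.get("id"): [edge.get("to") for edge in node.findall("edge")]
        for node in root.findall("node")
    }


graph = load_graph(source)
```

The cycle a -> b -> c -> a comes out as an ordinary adjacency mapping, and the same file can be read from any language with an XML parser.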

As for marshalling and unmarshalling, usually XML is used to represent passive data, which is relatively safe. When used to transmit active agents, well, as you know, everyone is using insecure systems, so that's more worrisome. It does the job, though, if that's what's desired.

The ultimate point, though, is who cares what representation is used at the lowest levels for transmitting raw data? It's just not that big of a deal, as long as it is a mechanism that can be used between a wide variety of different kinds of systems (which obviously would not be true of Smalltalk specific mechanisms).

But if you don't care about any of that sort of thing, do whatever you like. Fortunately, XML is not mandated by law or international treaty (oops, actually it is, never mind).


Merge with TreeUberAlles? What about relational vs hierarchical database discussions on this wiki?


UTD (http://www.cs.tut.fi/~jkorpela/data/utd.html) looks interesting, albeit not likely to be adopted any time soon. It does seem a bit idealistic in places: for example, instead of having an inline-quotation element and a block-quotation element, there's just "Q", and the browser is supposed to analyze the context to decide whether to display the quote inline or as a block. I can't imagine how this would be done, even in the year 2022. Still, there are strokes of genius in UTD, such as distinguishing between forcing and non-forcing markup. It's definitely worth a look.


CategoryWebDesign CategoryInternet CategoryRant CategorySucks


Last edited November 23, 2011