Friday, April 5, 2013

Optional HTML, good idea?

Back along the trail I was following in my post on carousels and web design firms, I stumbled across Google's HTML and CSS Style Guide. This is, of course, just one resource in a sea of options, and, as the name suggests, "more like guidelines, anyway." I found it useful to glance over for an idea of what to do with my code once I had grasped some of the basic stuff.

The Style Guide has some helpful reminders about indentation (don't mix spaces and tabs, indent child elements), plus general etiquette (stick with the previous author's style, be consistent). The whole thing can be summed up as: optimize as much as possible, and make code easy for collaborators to scan through. Somewhere halfway down the page, though, I found a section on optional HTML tags.

In all fairness, both Google and the WHATWG HTML5 spec frame this concept with "you may" and "consider" and other non-mandatory phrasing. But I thought, hey, here's another way to be kind of minimalist: keep things tidy, with no more clutter than is really needed. And indeed, I could simply get rid of the <html>, <head>, and <body> tags and a whole pile of closing tags. But! The traditionalist and skeptic in me paused and said, "Should I really mess with this right now? I haven't noticed this implemented anywhere, and my classes/research have all indicated that closing tags are the basis of a well-formed markup language. Maybe this is best left for another day."

Well, OK, inner skeptic, but I still want to learn a little more. So, some questions:

  • What exactly constitutes a valid, well-formed markup language?
A markup language (ML just for now, though not to be confused with ML, the functional programming language I just couldn't deal with) is used to annotate and describe a document. I'm not going to go into the history and fascinating variety of MLs here, so suffice it to say that for HTML or XML, the markup is intended to indicate what each tagged element contains. In HTML this could be a <header> or a <section>, which give the browser standard elements to display and style accordingly. XML uses tags to describe content, too, but it is geared toward structured data, which is why it turns up so often around databases (DBs) and data exchange. Because an XML parser is required to reject any document that isn't well-formed, every element has to be closed explicitly, either with an end tag or in self-closing form like <br/>. A document that is supposed to follow a schema or DTD also has to conform to it completely, or else anything that references its elements would just fail. Not going into that here.

So a well-formed and valid bunch of whicheverML follows all the rules in the relevant specification and passes tests like the W3C validator. Well-formed generally refers to syntax: elements arranged according to the rules are syntactically correct. The traditional definition of well-formed includes closing anything that has an opening tag and properly nesting elements. A valid ML document also follows its Document Type Definition (DTD), a sort of grammar dictionary for how to apply the language (here's the HTML4 DTD, for example). See the W3C validator's info here.
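To see the difference between HTML's forgiving rules and XML's strict ones, here's a minimal sketch using Python's standard-library ElementTree parser (my own choice for illustration; the function name is_well_formed_xml is just made up). It only checks well-formedness, not validity against a DTD. Notice that the list with its </li> tags left off, which HTML5 allows, gets rejected as XML.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(markup):
    """Return True if markup parses as well-formed XML, False otherwise.

    This checks syntax only (closing and nesting); it does not validate
    against any DTD or schema.
    """
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Every <li> closed: fine as XML (and as HTML).
print(is_well_formed_xml("<ul><li>One</li><li>Two</li></ul>"))
# </li> omitted: allowed in HTML5, but a mismatched-tag error in XML.
print(is_well_formed_xml("<ul><li>One<li>Two</ul>"))
```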

  • How much of a difference in speed does tag omission really confer? 
There's not a ton of this out there on the surface, but again, Google has been implementing, or rather, omitting, the optional tags to decrease file size. When every bit counts, taking out even just a few characters here and there can add up to a big difference. According to the Google Developers article and video on reducing HTML document file size, this can actually result in "5-20% savings" in load time.
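To get a feel for where the savings come from, here's a rough sketch comparing a made-up tiny page written out in full against the same page with the optional tags left out (the markup is my own example, not from Google's guide). As far as I can tell, the short version should still be valid under the HTML5 optional-tag rules, though the validator may nag about other things like a missing lang attribute. On a page this small the percentage is exaggerated, and gzip's fixed overhead matters a lot at this size, so treat the numbers as illustrative only.

```python
import gzip

# Hypothetical tiny page written two ways. The second drops <html>, <head>,
# </head>, <body>, </body>, </html>, and both </li> end tags.
FULL = ("<!DOCTYPE html><html><head><title>Demo</title></head>"
        "<body><ul><li>One</li><li>Two</li></ul></body></html>")
MINIMAL = ("<!DOCTYPE html><title>Demo</title>"
           "<ul><li>One<li>Two</ul>")

full_bytes = FULL.encode("utf-8")
minimal_bytes = MINIMAL.encode("utf-8")

# Uncompressed difference, in bytes and as a share of the full page.
raw_saved = len(full_bytes) - len(minimal_bytes)

# Compressed sizes, since servers usually gzip HTML before sending it.
gz_full = len(gzip.compress(full_bytes))
gz_min = len(gzip.compress(minimal_bytes))

print(f"raw bytes saved: {raw_saved} ({raw_saved / len(full_bytes):.0%})")
print(f"gzipped: {gz_full} -> {gz_min} bytes")
```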

There's a conversation on StackOverflow from 2010 about how much of a difference this could actually make. It's hypothesized that especially at Google's size, "</body></html> is 14 characters and at 3 billion searches per day, it amounts to approximately 39.12 GB of data per day ignoring compressions, or around 26 GB if we take gzipping into account." There's also another discussion from about the same time that goes into more depth on the whole issue.
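The quoted arithmetic is easy to check. Here's a quick sketch using only the numbers from the thread. Note that 39.12 only comes out if "GB" means binary gigabytes (2^30 bytes); in decimal gigabytes it's an even 42 GB. I haven't tried to reproduce the gzip figure.

```python
# Figures quoted in the StackOverflow thread, not my own measurements.
chars_omitted = len("</body></html>")  # 14 characters, 1 byte each in ASCII
searches_per_day = 3_000_000_000

bytes_per_day = chars_omitted * searches_per_day
gib_per_day = bytes_per_day / 2**30   # binary gigabytes (GiB)
gb_per_day = bytes_per_day / 10**9    # decimal gigabytes (GB)

print(f"{bytes_per_day:,} bytes/day")
print(f"{gib_per_day:.2f} GiB/day, or {gb_per_day:.2f} GB/day")
```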

  • Are there any arguments against tag omission on the whole?
Some people argue for tag omission on readability grounds; others argue against it on the very same grounds. Either way, someone will eventually be stumbling through your code. The best reason not to omit is probably compatibility: some browsers might not handle an unclosed element properly, and some scripts may get confused, so it really depends on what the content is expected to do. It winds up being a personal choice: you can include the superfluous tags, or you can be hyper-meticulous and make sure you only omit tags that can be omitted, without getting confused in the process.
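Here's a minimal sketch of the kind of script that "gets confused": a hypothetical naive checker (the names TagCounter and unbalanced_tags are made up for this example) built on Python's standard html.parser module, which simply assumes every start tag has a matching end tag. Perfectly legal HTML5 with omitted </li> tags trips it up.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Hypothetical naive checker: tallies start and end tags by name."""

    def __init__(self):
        super().__init__()
        self.starts = Counter()
        self.ends = Counter()

    def handle_starttag(self, tag, attrs):
        self.starts[tag] += 1

    def handle_endtag(self, tag):
        self.ends[tag] += 1


def unbalanced_tags(markup):
    """Return sorted tag names whose start and end counts differ."""
    counter = TagCounter()
    counter.feed(markup)
    counter.close()
    names = set(counter.starts) | set(counter.ends)
    return sorted(n for n in names if counter.starts[n] != counter.ends[n])


# Fully closed list: the naive check passes.
print(unbalanced_tags("<ul><li>One</li><li>Two</li></ul>"))
# Valid HTML5 with </li> omitted: the naive check reports "li".
print(unbalanced_tags("<ul><li>One<li>Two</ul>"))
```

The same naive assumption also flags void elements like <br>, which never have end tags at all. That's part of why it's safer to lean on a parser that knows the HTML5 rules than to count tags yourself.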

  • Is this just one of those crazy, Google-is-way-ahead-of-everyone-else-again things?
Hard to say. Tag omission is out there, but most people don't need the speed that badly. The implementation status on the HTML5 spec only suggests that tag omission hasn't been tested on the latest browser builds.

1 comment:

  1. Holy cow! I just looked at the page code for this post. Whatta pile of tags, scripts, styles, occasional content.... etc. ad infinitum. Is all that *really* necessary?