Switching your WordPress blog to html5: document outlines, themes, CSS and video

HTML 5 is a huge topic, one that has been occupying me intermittently for much of the past three months or so: when I started looking at it, I didn't realise quite how vast it was, and I only fully grasped its potential when I actually set about switching my site to html5 about two weeks ago. This was a process I thought would take just two or three days, rather like it usually does when I update my site design every spring; it ended up taking ten.

So my first piece of advice to anyone planning to switch a WordPress blog to html5 is: don't rush into it, expecting just to change your DOCTYPE declaration [i] and make a few cosmetic adaptations to your markup. Of course, you can do no more than that, since html5 is backward-compatible [ii]; but the list of new elements in html5 is quite impressive. Older browsers don't support them out of the box, but it's childishly simple to make these elements work in any browser, even IE6, by using very basic polyfills: simple pieces of code that give any browser the ability to support html5 features in the same way as those that support them natively. Depending on how far you want to use these new features, you can add as many of these as you like, à la carte, as it were.

A great deal has already been written about the subject, however, and I don't think there's any point in going through the same issues that others have covered already. If you want a broader perspective on html, there's no lack of reading material: you may want to refer to the select bibliography at the end of this article. I'll try to focus here on html5 from the perspective of a WordPress blogger and cover each of the corresponding issues in turn, writing from personal experience.

Before covering the practical aspects of implementing html5 in a WordPress blog, however, I think it's useful to put what you're planning to do in a historical perspective, because standards in page rendering have effectively stood still for over a decade. But if this isn't a priority for you, you can safely skip this part.

Putting html5 in context: the historical perspective and the reasons why html5 has recently become a topical subject

When considering why it makes sense to look at html5, it's worth recalling that html, as a language, is now twenty years old [iii]. HTML 4.01, which is the current version of html, was released in 1999, twelve years ago, and this was followed by the abandonment of html and the release of XHTML in 2000.

XHTML 1.0 added nothing new to what is available in HTML 4.01 and is merely a reformulation of HTML in XML. It's also a lot more unforgiving, because if you set your DOCTYPE to XHTML strict, no deprecated tags will be supported and the page will not be displayed correctly if its code is written incorrectly [iv].

For this reason, XHTML is actually a considerable improvement on HTML 4.01: everything needs to be in lower case and all tags need to be closed, which is, in general, very good practice. On the other hand, if your page has even one error (and remember this can be as small as an unclosed <img> tag), the user agent must stop parsing the document and present an error to the end user. If you're using a CMS such as WordPress, this is not a feasible solution, because it will tend to add code to your own output without necessarily achieving a standards-compliant result [v]. Because of this, most web designers choose to set their DOCTYPE to XHTML Transitional DTD [vi]:

<!DOCTYPE html PUBLIC
  "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

HTML5, on the other hand, uses a DOCTYPE declaration which is very short, owing to its lack of references to a Document Type Definition in the form of a URL and/or FPI. All it contains is the tag name of the root element of the document, HTML. In other words:

<!DOCTYPE HTML>

Not surprisingly, the blunder of developing a standard, XHTML, that was technically superior yet in practice impossible to implement, led to considerable confusion, which is perhaps best described by quoting Expansive Derivation:

With that, further development of XHTML was halted and thus XHTML 2.0 never happened. After this things at the W3C went into a bit of a hiatus, that is until 2004. In 2004 Opera and Mozilla, with Apple joining later, formed the Web Hypertext Application Technology Working Group, or WHAT-WG for short, and approached the W3C with a proposal. They believed that there was life in HTML yet and with some enhancements to the current version of HTML and the integration of what was known as web forms, HTML had a real future.

At that time the W3C was not interested and felt there was other areas that required more of their focus. This did not stop WHAT-WG from continuing their work and they continued working on what they termed HTML5, note the lack of a space between HTML and 5. This was a combination of HTML 4.01 with extensions as well as Web Application 1.0 and Web Forms 2.0. The true turning point in the evolution came in 2006 with an article published by Tim Berners-Lee entitled “Reinventing HTML” where he acknowledged that the W3C had made a mistake and that there was indeed still life in HTML.

In 2008 work officially started at the W3C towards HTML 5. But they did not start from scratch and instead used the work already done by the WHAT-WG as the basis from where to spring board further development.

It's pretty obvious from reading the above that one reason html5 has suddenly been resurrected is simply that prevalent web standards were effectively bursting at the seams as a result of not having been updated for over a decade. What has been happening in practice for well over half that period is that developers have been resorting to hacks in order to implement design choices that have become both desirable and sought-after but that hadn't been natively provided for by HTML 4.01 or by XHTML in its various guises.

The second reason why people are now showing interest in implementing html5, and why a growing number of the more forward-looking sites are switching to it, has to do not with web standards per se but with browser compliance. Until about two years ago, the vast majority of users would still be using Internet Explorer, and of those a significant portion still, incredibly, used a version of Internet Explorer, IE6, originally launched in 2001 and by then hopelessly outdated. In other words, most of one's site's visitors would be unlikely to be able to correctly display more richly-presented content, not only because their browser might not support a given web standard, but also because its innate deficiencies made implementing this rich content consistently across browsers an excessively daunting challenge.

This has of course now finally changed: over the past two years or so, IE6 has effectively tanked as a percentage of browsers still in use, and the emergence of Google Chrome and continued development of new improved versions of Firefox and Safari has meant that a significant and rising proportion of users (about nine-tenths depending on which features you plan to implement) are now capable of displaying html5 correctly and reasonably consistently.

So when can one start using html5? Isn't it supposed not to be ready before 2022?

Ian Hickson, the editor of the html5 Standard, set out 2022 as the date on which HTML5 would reach "Proposed Recommendation" status in a memorable interview with TechRepublic, while pointing out that one didn't have to wait for that rather exacting standard (defined as the moment when it can "require at least two browsers to completely pass [HTML 5 test suites]") [vii] to be met before html5 could be, in practice, in widespread use:

Standards development isn’t like making software — people are implementing HTML5 as we speak, and many parts of HTML5 will likely be widely used long before HTML5 is officially “done.”

Just to put this in perspective, Ian Hickson also points out that there still isn't even one browser, let alone two, that fully supports every feature of HTML 4.

What this means in practice is that it's perfectly possible to start using html5 now, and has been ever since October 2009, which was the last call for the html5 working draft and the moment when the main issues had been ironed out and the specification could be regarded as more or less stabilised.

The question, therefore, is not whether you should embrace html5 now (that's entirely dependent on whether you want to use more cutting-edge technology) but how you can implement it consistently across browsers. Fortunately, the tools for doing this rather elegantly already exist and are simple to put in place.

Of course, you can choose a via media: before jumping in full-force to HTML5 production sites, you might prefer trying a soft transition, changing your DIV names slightly. There’s no real downside to doing this; you can even use the new DOCTYPE with very little consequence. If this reassures you, you might as well start planning for a fully-fledged transition.

The new structural html5 elements

These are well documented, but here's a brief summary of the main ones.

<header>
The <header> element contains introductory information to a section or page. This can involve anything from our normal document headers (branding information) to an entire table of contents.
<nav>
The <nav> element is reserved for a section of a document that contains links to other pages or links to sections of the same page. Not all link groups need to be contained within the <nav> element, just primary navigation.
<section>
The <section> element represents a generic document or application section. It acts much the same way a <div> does by separating off a portion of the document, with the crucial difference that, unlike a <div>, a <section> is a sectioning element (see below).
<article>
The <article> element represents a portion of a page which can stand alone such as: a blog post, a forum entry, user submitted comments or any independent item of content.
<aside>
An <aside> represents content related to the main area of the document. This is usually expressed in sidebars that contain elements like related posts, tag clouds, etc. It can also be used for pull quotes.
<footer>
The <footer> element is for marking up the footer not only of the current page, but of each section contained in the page. So it's very likely that, like me, you'll be using the <footer> element multiple times within one page.

Enabling html5 structure and features in all browsers

Beyond the new elements, the most obvious benefit built into HTML5 is the numerous APIs and the opportunities it opens up for the future of web apps with the Holy Grail of application cache and offline capabilities. Google Gears gave us offline data storage and Flash introduced us to the power of application cache (Pandora uses it to save your login information). With HTML5, these capabilities are now available to use right in the language and can easily be expanded with JavaScript.

For example, you’ve probably already heard of the <section> and <article> tags, both of which are champing at the bit to be embedded in a WordPress template. But to use these HTML5 elements in IE8 (and its predecessors), you need JavaScript in order to create them in the DOM. If you don’t have JavaScript, then the elements can’t be styled with CSS. Turn off JavaScript and you turn off the styling for these elements; invariably, this will break the formatting of your page. Three core technologies make the Web work: HTML, CSS and JavaScript. All desktop browsers support them (to some degree), so if any one of them is disabled, the user will have to expect a degraded experience. JavaScript is now fundamental to the user experience: Yahoo gives compelling evidence that fewer than 1.5% of its users turn off JavaScript.

There are essentially two ways of 'turning on' html5 for older-generation browsers and the poor people who seem to cling to using them.

html5shiv

If you're looking only to use the new html5 elements (section, article, aside, etc.) and to ensure Internet Explorer (which of course wouldn't do so otherwise, but that won't deter people from continuing to use it) recognises them as block elements, all you need is to include Remy Sharp's incredibly lightweight html5 enabling script in the head element (or you can download the script and serve it inline if you prefer):

<!--[if lt IE 9]>
<script src="http://html5shiv.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
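
As a rough illustration of why such a script works (a sketch of the general technique, not the source of the html5 enabling script itself): older versions of Internet Explorer only apply CSS to an unknown element once document.createElement has been called with its name. The helper name registerHtml5Elements, the element list and the stand-in fakeDocument object are assumptions made for this example; in a browser you would pass document instead.

```javascript
// Hypothetical helper: the name and the element list are illustrative only.
var HTML5_ELEMENTS = ["article", "aside", "footer", "header", "nav", "section"];

function registerHtml5Elements(doc, names) {
  names = names || HTML5_ELEMENTS;
  // Creating (and discarding) one element per name is what makes old IE
  // start treating that tag name as an element it can style.
  for (var i = 0; i < names.length; i++) {
    doc.createElement(names[i]);
  }
  return names.length;
}

// Stand-in for the browser's document so the sketch also runs outside a browser.
var fakeDocument = {
  created: [],
  createElement: function (name) {
    this.created.push(name);
    return { tagName: name.toUpperCase() };
  }
};

registerHtml5Elements(fakeDocument);
console.log(fakeDocument.created.join(", "));
```

Browsers that don't know the new elements will typically still render them inline, so a stylesheet rule setting them to display: block is usually needed as well.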

Modernizr

Modernizr is a slightly more ambitious alternative to html5shiv that does two things:

  • it detects whether the current browser supports CSS3 features like @font-face, border-radius, border-image, box-shadow, etc, and adds extra css classes to your html tag to reflect this;
  • it makes the new HTML5 elements available for styling in Internet Explorer, like html5shiv does.

Once you've installed Modernizr, you can easily see which features are supported in each browser by looking at the modifications it brings to your html tag in its generated source code. This is what mine looks like in Safari 5:

<html lang="en-US" class=" webkit root-section w-1200 lt-1280 lt-1680 js  gradient rgba opacity textshadow multiplebgs boxshadow borderimage borderradius cssreflections csstransforms csstransitions fontface domloaded js flexbox canvas canvastext no-webgl no-touch geolocation postmessage websqldatabase no-indexeddb hashchange history draganddrop websockets rgba hsla multiplebgs backgroundsize borderimage borderradius boxshadow textshadow opacity cssanimations csscolumns cssgradients cssreflections csstransforms csstransforms3d csstransitions fontface video audio localstorage sessionstorage webworkers applicationcache svg no-inlinesvg smil svgclippaths" id="index-page">

Its usefulness is perhaps best explained by A List Apart:

Ten years ago, only the most cutting-edge web designers used CSS for layouts and styling. Browser support for CSS layouts was slim and buggy, so these people advocated for web standards adherence, while creating hacks that made CSS layouts work in all browsers. One hack that became widely used was browser sniffing: Detecting which browser and version the user had by looking at the navigator.userAgent property in JavaScript. Browser sniffing allowed for quick and easy code forking, allowing developers to target different browsers with different instructions.

Today, CSS-based layouts are commonplace and every browser has pretty solid support for them. But now we have CSS3 and HTML5, and the situation is repeating itself—different browsers demonstrate varying levels of support for these new technologies. We’ve smartened up, however, and no longer employ CSS hacks nor use browser sniffing—an unreliable, poor practice. We’ve also convinced more and more clients that websites don’t need to look exactly the same in every browser. So how do we deal with this new but familiar problem? Simple: We use feature detection, which means that we do not ask the browser “who are you?” and make unreliable assumptions from there on. Instead we ask the browser, “can you do this and that?” It’s a simple way to test browser capabilities, but doing all these tests manually all the time gets tiresome. To solve that problem (and others), you can use Modernizr.

(A List Apart, Taking Advantage of HTML5 and CSS3 with Modernizr)

So in other words, you can use Modernizr to provide elegantly for pretty much every scenario depending on which browser is used on your site: for CSS, for instance, you can set different styles for browsers depending on which properties they support.
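
The principle of feature detection can be sketched in a few lines. This is only an illustration of the idea, not Modernizr's implementation: the detectFeatures helper, the tests table and the stand-in env object are all assumptions made for the example. Each test asks "can you do this?", and the answer becomes a class name, prefixed with no- when the feature is missing, much like the classes in the html tag shown earlier.

```javascript
// Hypothetical feature tests: each one receives an environment object
// (in a browser, something like `window`) and returns true or false.
var tests = {
  localstorage: function (env) { return "localStorage" in env; },
  geolocation: function (env) { return !!env.navigator && "geolocation" in env.navigator; },
  webgl: function (env) { return "WebGLRenderingContext" in env; }
};

// Runs every test and builds a space-separated class string.
function detectFeatures(env, featureTests) {
  var classes = [];
  for (var name in featureTests) {
    if (Object.prototype.hasOwnProperty.call(featureTests, name)) {
      classes.push(featureTests[name](env) ? name : "no-" + name);
    }
  }
  return classes.join(" ");
}

// Stand-in environment so the sketch runs outside a browser:
// it "supports" local storage and geolocation, but not WebGL.
var env = { localStorage: {}, navigator: { geolocation: {} } };
var htmlClasses = detectFeatures(env, tests);
console.log(htmlClasses);
```

A stylesheet can then key its rules on those classes, for example one rule scoped under .borderradius and a fallback scoped under .no-borderradius.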

Structuring your WordPress blog for html5: the importance of planning everything ahead

A great deal has already been written about this and there's little point in my covering the ground again. Nicolas Gallagher has written an excellent Anatomy of an html5 WordPress theme which covers many of the issues. From the perspective of a WordPress user, perhaps the most directly relevant article is Smashing Magazine's Using HTML5 To Transform WordPress’s TwentyTen Theme.

Whilst these approaches have their merits, I personally chose to do things differently, in fact starting my entire draft again from scratch when I realised I hadn't taken account of all the relevant factors. As far as I could list them, they are:

  • DOM structure: this is different in html5 to HTML 4 and needs to be addressed first of all in order not to end up with an absurdly-structured site;
  • semantics: here again, the new html5 elements need to be used in a discerning way, and in some cases, it's actually preferable to continue using <div>-based structure rather than relying entirely on html5 sectioning elements.

Heading (h1 to h6) structure: easily the trickiest aspect of switching a WordPress blog to html5

Headings, created with the h1-h6 elements, are important for any online document. They should be used for anything that either looks like or acts as a heading, partly because this makes more sense semantically, and partly because it improves the accessibility of your content. Regardless of how you feel about the way your content is being spidered and indexed by search engines, which obviously can't do so unless you provide semantic, accessible content, you should always ensure what you write makes sense for anyone who, for whatever reason, can't see your content with styling applied.

In HTML 4, although there is an ongoing debate about it, heading structure was relatively simple: you ensured each page's heading was a relevant title for its contents, incorporated the corresponding words in the <title> tag, and nested it within an <h1> tag with the rest of your document structure flowing from that unique <h1>. The subject still wasn't without its pitfalls, and anyone interested in delving into this in greater detail could read an interesting post by Roger Johansson, 'Headings, heading hierarchy, and document outlines'.

The subject of heading structure in html5 (which is pretty radically different to what it is in HTML 4) is very well explained by Edward O'Connor in his article on 'Blog templates: a case study in using HTML5’s sectioning elements'. He suggests taking advantage of the provision in html5 for nesting articles into sections, each headed by an h1-level heading:

You may have noticed that—with the exception of the blog subtitle—I've only been using <h1> elements for headings, and not <h2> through <h6>. This is because sectioning elements scope heading elements, so that this document:

<body>
  <h1>Heading 1</h1>
  <h2>Heading 1.1</h2>
  <h3>Heading 1.1.1</h3>
  <h2>Heading 1.2</h2>
</body>

and this document:

<body>
  <h1>Heading 1</h1>
  <section>
    <h1>Heading 1.1</h1>
    <section>
      <h1>Heading 1.1.1</h1>
    </section>
  </section>
  <section>
    <h1>Heading 1.2</h1>
  </section>
</body>

have the same outline. Future UAs are expected to render section>h1 smaller than body>h1. For now, though, you could continue to use <h1>–<h6>, even with the new sectioning elements, because the child heading element with the highest rank is considered the heading of the section.

This rule is laid out in the specifications of the html5 document outline algorithm. You can check your html5 draft against it using the tools available online for doing so, which are the ones I used for the checks described below.
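
To make the rule concrete, here is a deliberately simplified sketch of an outline builder. It is not the full algorithm from the specification: it assumes a tree of plain objects rather than a real DOM, treats only section, article, aside and nav as sectioning elements, and nests any further heading under the root section. The names outline, render and el are hypothetical. It does show that the two documents quoted above produce the same outline.

```javascript
var SECTIONING = { section: true, article: true, aside: true, nav: true };

function outline(root) {
  var top = { title: null, rank: 0, children: [] };
  var open = [top]; // chain of currently open sections, innermost last

  function visit(node) {
    var match = /^h([1-6])$/.exec(node.tag);
    if (match) {
      var rank = Number(match[1]);
      if (top.title === null) {
        // The first heading outside nested sectioning elements titles the root.
        top.title = node.text;
        top.rank = rank;
        return;
      }
      // Close implied sections whose heading is of equal or lower rank.
      while (open.length > 1 && open[open.length - 1].rank >= rank) {
        open.pop();
      }
      var implied = { title: node.text, rank: rank, children: [] };
      open[open.length - 1].children.push(implied);
      open.push(implied);
    } else if (SECTIONING[node.tag] === true) {
      // A sectioning element gets its own outline, nested in the current section.
      open[open.length - 1].children.push(outline(node));
    } else {
      (node.children || []).forEach(visit);
    }
  }

  (root.children || []).forEach(visit);
  return top;
}

// Turns an outline into indented lines of text.
function render(section, depth) {
  depth = depth || 0;
  var title = section.title === null ? "Untitled" : section.title;
  var lines = [new Array(depth + 1).join("  ") + title];
  section.children.forEach(function (child) {
    lines = lines.concat(render(child, depth + 1));
  });
  return lines;
}

// Small node builder: a string becomes heading text, an array becomes children.
function el(tag, content) {
  return typeof content === "string"
    ? { tag: tag, text: content }
    : { tag: tag, children: content || [] };
}

var flat = el("body", [
  el("h1", "Heading 1"), el("h2", "Heading 1.1"),
  el("h3", "Heading 1.1.1"), el("h2", "Heading 1.2")
]);
var nested = el("body", [
  el("h1", "Heading 1"),
  el("section", [el("h1", "Heading 1.1"), el("section", [el("h1", "Heading 1.1.1")])]),
  el("section", [el("h1", "Heading 1.2")])
]);

console.log(render(outline(flat)).join("\n"));
console.log(render(outline(flat)).join("\n") === render(outline(nested)).join("\n"));
```

Because a heading that sits outside every nested sectioning element is assigned to the root section, the same sketch also reproduces the behaviour described in the next section.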

The issue: any heading not in a sectioning element will shift right up to the top of the document outline

The html5 specification says that:

The outline for a sectioning content element or a sectioning root element consists of a list of one or more potentially nested sections. A section is a container that corresponds to some nodes in the original DOM tree. Each section can have one heading associated with it, and can contain any number of further nested sections. The algorithm for the outline also associates each node in the DOM tree with a particular section and potentially a heading.

And also, even more crucially:

The section element is not a generic container element. When an element is needed for styling purposes or as a convenience for scripting, authors are encouraged to use the div element instead. A general rule is that the section element is appropriate only if the element's contents would be listed explicitly in the document's outline.

It took me a while to work out the implications of this: but they are pretty massive. At the heart of the html5 spec is the notion that each sectioning content element contributes an item to the overall document outline and stems from a sectioning root. For this reason, although there is a tendency to try to weed out divs altogether and replace them with sections, this is not always a good idea, for the reasons we shall develop below.

My initial premise was that I wanted to retain only one h1-level heading per page: while there is nothing wrong with having several <h1> tags on a single page and it certainly would be valid html, I didn't think it would be a good idea semantically: if only from the perspective of search engines and screen readers generally, it makes more sense having one main heading for each page, corresponding to the title tag, with all the other subdivisions branching off from it.

Yet even once you've decided on your chosen document structure, you aren't at the end of your travails: in html5, the document outline is vastly more complicated, because an html5 document is structured around sectioning elements.

If you take your existing structure and nest it within header, article and footer tags with a few <section> and <nav> elements added, you'll be surprised by how incoherent its structure will appear; Roger Johansson of 456 Berea Street explains how his attempt to convert the markup structure of a typical “document/article” page to html5 initially resulted in the footer becoming the header, and this was exactly what happened to me [viii]!

The first problem that this inevitably causes is that to avoid making the body element an untitled section, you need a heading that is outside of any sectioning elements if you want your document to be coherent and have a heading that reflects its content. Yet at the same time, I needed the <h1> heading to be one and the same as the title of every blog post in my blog post single pages: this meant I couldn't wrap my blog posts in <article> tags (since these are sectioning elements), leaving me with an unpalatable choice between letting my <body> stay untitled or moving the <h1> heading outside of the <article> tags.

A second consequence of the html5 outline algorithm is that if you include any sectioning element without a title, the algorithm will include it in the document outline nonetheless (because that is what defines a sectioning element) but will also mark it out as an 'untitled' element, which will result in an absurd-looking outline.
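
A quick way to spot such elements before they show up in an outline is to walk the markup and list every sectioning element that has no heading of its own. The sketch below is an illustration only: it works on a hypothetical tree of plain objects rather than a real DOM, and the names findUntitled, hasOwnHeading and page are assumptions made for the example.

```javascript
var SECTIONING = { section: true, article: true, aside: true, nav: true };

// True if the node contains a heading that is not inside a nested sectioning element.
function hasOwnHeading(node) {
  return (node.children || []).some(function (child) {
    if (/^h[1-6]$/.test(child.tag)) { return true; }
    return SECTIONING[child.tag] !== true && hasOwnHeading(child);
  });
}

// Returns the path of every sectioning element that would be listed as untitled.
function findUntitled(node, path) {
  path = path || node.tag;
  var found = [];
  if (SECTIONING[node.tag] === true && !hasOwnHeading(node)) {
    found.push(path);
  }
  (node.children || []).forEach(function (child) {
    found = found.concat(findUntitled(child, path + " > " + child.tag));
  });
  return found;
}

// Hypothetical page: the nav and the aside have no heading, the article does.
var page = {
  tag: "body",
  children: [
    { tag: "div", children: [{ tag: "h1" }] },
    { tag: "nav", children: [{ tag: "ul" }] },
    { tag: "article", children: [{ tag: "header", children: [{ tag: "h2" }] }] },
    { tag: "aside", children: [{ tag: "p" }] }
  ]
};

console.log(findUntitled(page));
```

Giving each reported element a heading, even one hidden via CSS, is enough to remove it from the list.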

The solution: put any content you want to appear in your document outline in a sectioning element, and anything else in a <div>

Since a <div> isn't a sectioning element, I found the best solution was actually to set three elements within <body> (which here was my sectioning root as defined in the spec), corresponding respectively to the left sidebar, the main content and the right (branding) sidebar. These didn't correspond to structured items that needed to appear as such in my document outline; in fact, quite on the contrary, I needed them not to. Since in html5, a sectioning element is defined as one to which you would attach a heading and want to have appear as such in your outline, this is no more than a common-sense rule, but it goes so much against the grain of the structure one is used to applying in HTML 4 that it's rather difficult to get used to:

html5 DOM
The basic DOM for this blog: the body is used as a wrapper (to which styles can be applied). The left sidebar, main content column and right sidebar don't correspond to 'sectioning elements' (as defined in the html5 specification) that would need to appear in the document outline: so they are enclosed within <div> elements. In this way, sectioning elements can be positioned anywhere within the three main elements and headings placed as appropriate to ensure a consistent document structure.

This left the issue of having a top-level heading for each page that adequately reflected its content: in practice, in a blog, I found that this purpose was served, for SEO purposes, by the document title (this is enclosed in the <title> tags and generated by conditional statements), while the blog post itself can safely be nested in <article> tags.

To achieve this setup, I first let the initial <title> tag be generated from the following PHP:

<title><?php if (is_home () || is_front_page()) { bloginfo('name'); } elseif ( is_category() ) { single_cat_title(); echo " - "; bloginfo('name'); } elseif (is_single() || is_page() ) { single_post_title(); } elseif (is_search() ) { bloginfo('name'); echo " search results: "; echo wp_specialchars($s); } elseif ( is_404() ) { echo "404 Page Not Found"; } else { wp_title('',true); } ?></title>

The <h1> heading for all pages is then generated by the following conditional PHP statement:

<!-- Logo -->
<div id="logo">
    <?php
    if ( is_front_page() ) :{ ?>

    <h1><a class="url" href="/blog/" rel="nofollow" title="Go to blog front page"></a><span class="hidden">Donald Jenkins</span></h1><?php }
    else :{ ?>

    <h1><a class="url" href="/" rel="nofollow" title="Go to front page"></a><span class="hidden"><?php if (is_home () ) { bloginfo('name'); } elseif ( is_category() ) { single_cat_title(); echo " - "; bloginfo('name'); } elseif (is_single() ) { bloginfo('name'); echo " &mdash;&nbsp;Blog post: "; single_post_title(); } elseif (is_page() ) { single_post_title(); } elseif (is_search() ) { bloginfo('name'); echo " search results: "; echo wp_specialchars($s); } elseif ( is_404() ) { echo "404 Page Not Found"; } else { wp_title('',true); } ?></span></h1><?php }
    endif;
    ?>
</div><!-- [End] logo -->

Finally, I inserted the following in my single.php file:

<section id="content">
    <h1 class="hidden">Article</h1><?php if (have_posts()) : ?><?php while (have_posts()) : the_post(); ?>

    <article id="post-<?php the_ID(); ?>" class="post" role="main">
        <header>
            <p class="post-date"><time datetime="<?php the_time('c') ?>" pubdate="pubdate"><?php the_time('F jS, Y') ?></time></p><?php if (is_linked_list()): ?>

            <h2 class="linked-list-item"><a href="<?php the_linked_list_link() ?>" title="Link to <?php the_title_attribute(); ?>"><?php the_title(); ?></a></h2><?php else: ?>

            <h2><?php the_title(); ?></h2><?php endif; ?>
        </header><?php the_content(); ?><?php if (is_linked_list()): ?>

        <p><strong><a href="<?php the_linked_list_link(); ?>" rel="bookmark" title="Link to <?php the_title_attribute(); ?>">&#8734;</a></strong></p><?php endif; ?>
    </article><!-- post -->

    <nav id="content-nav">
        <h2 class="hidden">Post navigation</h2>

        <ul>
            <li class="alignleft"><?php next_post_link('&#8592; <strong>Next Post</strong><br />%link') ?></li>

            <li class="alignright"><?php previous_post_link('<strong>Previous Post</strong> &#8594;<br />%link') ?></li>
        </ul>
    </nav>
    <hr>

    <section id="comments">
        <h2 class="hidden">Comments</h2><?php comments_template( '', true ); ?>
    </section><!-- comments -->
    <?php endwhile; ?><?php else : ?>

    <p><?php _e('Sorry, no posts matched your criteria.'); ?></p><?php endif; ?>
</section><!-- [End] Content -->

The combination of these three code items ensured that the hidden h1-level heading in the logo <div> would make it to the top of the document outline, as it wasn't contained in a sectioning element. The only downside of using this markup was that while I was able to enclose my blog posts in an <article> element, I inevitably ended up repeating the blog post title: once in the logo <h1> heading, once in the <article> <h2> heading. The conditional statement in the logo <div> would generate the appropriate sectioning root title for any other pages.

This setup resulted in the following outline for the home page (and a similar one for every other page other than single post pages). Note that headings marked with an asterisk were hidden—in other words, they appear in the outline, but aren't displayed:

  1. Donald Jenkins *
    1. Tagline *
    2. Front page *
      1. Latest tweet *
      2. Web presence *
    3. Colophon *
    4. Site navigation *
    5. A few topics
    6. Latest Tech post
    7. Latest Non-Tech post

And the following one for this post (compare the outline with what you see in your browser window and with the single page chart below):

  1. Donald Jenkins — Blog post: Switching your WordPress blog to html5: document outlines, themes, CSS and video *
    1. Tagline *
    2. Article *
      1. Switching your WordPress blog to html5: document outlines, themes, CSS and video
        1. Putting html5 in context: the historical perspective and the reasons why html5 has recently become a topical subject
        2. So when can one start using html5? Isn't it supposed not to be ready before 2022?
        3. The new structural html5 elements
        4. Enabling html5 structure and features in all browsers
          1. html5shiv
          2. Modernizr
        5. Structuring your WordPress blog for html5: the importance of planning everything ahead
        6. Heading (h1 to h6) structure: easily the trickiest aspect of switching a WordPress blog to html5
        7. The issue: any heading not in a sectioning element will shift right up to the top of the document outline
        8. The solution: put any content you want to appear in your document outline in a sectioning element, and anything else in a <div>
        9. Building your html5 files
          1. Serving valid html5
          2. Building the WordPress files
          3. Minimising database calls
          4. Preventing WordPress from rewriting your code
          5. The CSS file
        10. Preparing your existing WordPress database for the switch
          1. Using regex to update your existing WordPress database
          2. Using Video for Everybody to serve video consistently across all browsers from just one code snippet
        11. Bibliography
      2. Post navigation *
      3. Comments *
    3. Colophon *
    4. Site navigation *
    5. Further reference *
      1. Post metadata *
      2. Tags for this post
      3. Related articles
      4. Post navigation *
    6. Recently tweeted

I applied titles to all my sectioning elements, but when I didn't want them to appear on the web page, I simply marked the titles I didn't want to display (such as "tagline"), as well as ones I had inserted purely to give greater coherence to the document outline (such as the "Further reference" and "Article content" <h2> level headings above, used to group other sections together) as hidden (marked with an asterisk in the above list) via CSS, which meant they would be read by screenreaders and that my document outline remained coherent without burdening my site with an excessive number of headings:

Single page DOM
The document outline for a single page on this blog (compare with outline above which is generated by this structure): the post title in the central column is in a sectioning element: it only serves as the heading for the blog post itself, which is nested in <article> tags, but the title of the page (in the <title> tags) will be used by Google for SEO purposes. Hidden headings are used for some sections (Related information, Post content, Tagline), as well as the main heading for the page, which is generated by a conditional PHP statement.

Building your html5 files

Serving valid html5

Until quite recently, W3C didn't even offer an html5 validation service. It now does, although it takes care to caution users that the service is still experimental. You can also use the dedicated html5 validator, if you prefer. Validating your markup at regular intervals throughout the build process is a good way of avoiding bugs.

Building the WordPress files

Once you've worked out your document structure, building the corresponding files is relatively quick. I find it makes sense to start with a set of completely empty theme files:

header.php
The DOCTYPE, <meta> and <link> tags, syndication, stylesheets and scripts go here, as well as the start of your <body> code.
index.php
This file loads your list of blog posts and also acts as the homepage, unless you set your blog to display a static page, as I do on this site.
sidebar.php
Contains everything you'd want to appear in a sidebar: mine contains conditional WordPress statements that trigger different sidebars for different types of content.
footer.php
Contains everything you'd want to appear at the bottom of your site.
archive.php
The template file used when viewing categories, dates, posts by author, etc.
single.php
The template file used when viewing an individual post.
comments.php
Called at the bottom of the single.php file to enable the comments section: I've disabled mine and replaced it with the Disqus Comments Plugin.
page.php
Similar to single.php, but used for WordPress pages.
search.php
The template file used to display search results: I replaced mine with a Google Custom Search.
404.php
The template file that displays when a 404 error occurs.

Designer Chris Spooner has an excellent tutorial for creating a WordPress theme from scratch.

Minimising database calls

Bear in mind, however, that if you aren't planning to market your theme or to change it frequently, you can reduce the number of PHP calls considerably, for instance by replacing theme or home PHP calls with absolute URLs. Cumulatively, this eliminates a lot of unnecessary database calls. When drafting my theme files, which are intended for my own use only, I systematically avoided using PHP whenever I could.
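
As an illustration of the idea above, here is a hedged sketch of a stylesheet link in header.php, first written with a WordPress template tag and then hard-coded. The domain and theme folder name are placeholders, and how much this actually saves will depend on your setup and on any caching you have in place:

```php
<!-- Use one or the other, not both. -->

<!-- Dynamic: WordPress works out the stylesheet URL on every page load -->
<link rel="stylesheet" href="<?php bloginfo('stylesheet_url'); ?>">

<!-- Hard-coded: no PHP call, but the path must be edited by hand if the theme is renamed or moved -->
<link rel="stylesheet" href="http://example.com/wp-content/themes/mytheme/style.css">
```

The trade-off is portability: a theme with hard-coded URLs only works on the site it was written for.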

Preventing WordPress from rewriting your code

I added the following to my theme's functions.php file to deactivate most of the built-in WordPress functions that modify my painstakingly-built code, supposedly to 'improve' it, but which in practice merely mess it up and break my validation:

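<?php

// Stop WordPress from rewriting post markup on output.

// Disable wpautop, which wraps blocks of text in <p> tags and turns line breaks into <br />.
remove_filter('the_content', 'wpautop');

// Disable wptexturize, which converts straight quotes, dashes and similar characters into typographic entities.
remove_filter('the_content', 'wptexturize');

?>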

The CSS file

I find it easier to defer drafting my CSS file until I've finished with the other theme files, so that the CSS reflects the site structure's requirements and not the other way around. There's no fundamental difference between writing the CSS for an html5 WordPress theme and writing it for any other theme, although I did find I needed a lot less markup in html5, because the underlying structure is simpler:

  • you'll obviously want to use an adapted CSS Reset that includes the new html5 elements: I use one based on html5doctor.com Reset Stylesheet v1.6.1 with modifications suggested by Antonio Lupetti;
  • I used the body tag as a wrapper, instead of a separate wrapper div, using an idea from Camen Design;
  • although it isn't the subject of this post, I try to make full use of lean CSS, YUI Compression and gzipping of my CSS and javascript files.

Preparing your existing WordPress database for the switch

Using regex to update your existing WordPress database

You'll need to upgrade your existing stock of posts to achieve consistency with your new setup.

In some cases, you'll want to move your headings up from, say, <h3> to <h2> (in my case, I left my post headings unchanged, at <h3>, so that they nest inside the hidden <h2> "Article content" heading I mentioned above, which I added to achieve a clean document structure).

You'll also want to update your image and video markup to apply the new <figure>, <figcaption> and <video> elements to your existing posts. Using them enabled me to remove all the divs from my images, and all but one from my videos.

The best way to do this is:

  • back up your WordPress database using whichever tool you feel most comfortable with—there are plenty of them around;
  • install the Search Regex Plugin, which, with a little practice and cautious experimentation, can be made to adapt your posts to almost any new configuration;
  • cautiously update your posts, one bit at a time, making backups all along.
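
Before running a pattern across the whole database, it can help to try it on a sample post body first. The following is a minimal PHP sketch of the two kinds of change described above: moving headings up a level, and turning a div-wrapped image into <figure> and <figcaption>. The function names and the old "image" and "caption" class names are assumptions made for illustration; your existing markup will almost certainly differ, so the patterns will need adapting.

```php
<?php
// Hypothetical helper: move headings of one level to another, e.g. <h3> to <h2>.
// Attributes on the opening tag are preserved.
function promote_headings(string $html, int $from, int $to): string
{
    // Matches both <hN ...> and </hN>; group 1 is the optional slash,
    // group 2 is whatever attributes follow the tag name.
    $pattern = '/<(\/?)h' . $from . '\b([^>]*)>/i';
    return preg_replace($pattern, '<${1}h' . $to . '${2}>', $html) ?? $html;
}

// Hypothetical helper: convert an assumed old image wrapper
//   <div class="image"><img ...><p class="caption">Text</p></div>
// into <figure><img ...><figcaption>Text</figcaption></figure>.
function divs_to_figures(string $html): string
{
    $pattern = '#<div class="image">\s*(<img\b[^>]*>)\s*'
             . '<p class="caption">(.*?)</p>\s*</div>#is';
    $replacement = '<figure>${1}<figcaption>${2}</figcaption></figure>';
    return preg_replace($pattern, $replacement, $html) ?? $html;
}

// Example: try the heading pattern on a sample string.
echo promote_headings('<h3>Notes</h3>', 3, 2), "\n"; // <h2>Notes</h2>
```

Once a pattern behaves as expected on sample strings, the same pattern and replacement can be adapted for use in the plugin, still one small step at a time and with a fresh backup.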

Using Video for Everybody to serve video consistently across all browsers from just one code snippet

I took advantage of the switch to html5 to start using Video for Everybody, Camen Design's clever html-only code that allows you to embed three different video codecs inside one <video> tag, ensuring 100% cross-browser compatibility. You have to encode each version of the video yourself, but this isn't as bad as it sounds, and Brett Terpstra has just written a rather clever workflow for automating this process. I've coupled this with video.js, a javascript/css solution that ensures your videos render consistently across browsers. As a result, all the videos on this site are now hosted on Amazon Cloudfront and viewable on any browser, including mobile browsers. I'll try to write a more detailed post about the huge difference html5 has made in the video field in the near future.
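
To give an idea of the general shape of this markup, here is a simplified sketch of a <video> element with three sources and a plain download fallback. It is not the actual Video for Everybody snippet, which also includes a Flash fallback for older browsers; the file names and dimensions are placeholders.

```html
<video controls width="640" height="360" poster="poster.jpg">
  <!-- The browser plays the first source it is able to decode -->
  <source src="video.mp4" type="video/mp4">
  <source src="video.webm" type="video/webm">
  <source src="video.ogv" type="video/ogg">
  <!-- Shown only by browsers that don't support the <video> element at all -->
  <p>Your browser can't play this video. <a href="video.mp4">Download the MP4 version</a>.</p>
</video>
```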

Bibliography

There's no shortage of material on the various aspects of html5. The most interesting articles I've read on the subjects covered in this post are listed below:

_______________
  1. In case you aren't familiar with the DOCTYPE, it's quite simply a declaration stating what language (and its level) a document uses, and optionally what document type definition (DTD) is to be used in its handling.
  2. The list of deprecated html elements in html5 is actually quite small: it didn't include a single element I had ever used.
  3. Tim Berners-Lee created the first version of html, called html tags, in 1991. Version 2.0, which included the img tag, was released in 1995.
  4. The HTML layout engines in modern web browsers perform DOCTYPE "sniffing" or "switching", wherein the DOCTYPE in a document served as text/html determines a layout mode, such as "quirks mode" or "standards mode". The text/html serialization of HTML5, which is not SGML-based, uses the DOCTYPE only for mode selection. Ironically, because web browsers are implemented with special-purpose HTML parsers rather than general-purpose DTD-based parsers, they don't use DTDs and will never access them even if a URL is provided. The DOCTYPE is retained in HTML5 as a "mostly useless, but required" header only to trigger "standards mode" in common browsers.
  5. For that very reason, I've deactivated this feature in my WordPress theme's functions.php file.
  6. XHTML Transitional DTD is like the XHTML Strict DTD, but deprecated tags are allowed.
  7. As pointed out by HTML5 Doctor, CSS 2.1, which is in widespread use, has been in development for over ten years, and has only relatively recently become a candidate recommendation (April 23, 2009). Yet it still doesn't have two browsers that support it completely: only Internet Explorer 8 supports the full CSS 2.1 spec.
  8. As Mr Johansson explains, 'The footer element is not sectioning content, i.e. it does not create a new section. This leaves <h2>Footer heading</h2> as the only heading in the context of the body element's section. Since the body element is the document's sectioning root the outline algorithm makes it the top level heading, despite it being the last heading in the document and an h2.'