Semantic Markup and Microformats for Online Scholarship

One way in which electronic scholarship in general, and e-journals in particular, could benefit from Web 2.0 is the creation and adoption of microformats designed for academics in general and historians in particular. Microformats address small, clearly defined problems in marking up documents and set out to solve each one. Microformats.org provides a good introduction to what microformats are, along with examples of defined microformats for items such as contact cards, calendar events, and simple reviews.

Anyone who has worked with HTML or XHTML has probably wondered, at one time or another, how to mark up a specific piece of content, especially when dealing with older documents that don’t always conform to modern publishing conventions. Even then, HTML and XHTML are significantly limited when it comes to marking up documents in meaningful ways.

One place where markup falls short in academic publishing is the way that citations (footnotes or endnotes) are coded (or, in many cases, not coded). Take, as an example, a citation from an article in the open-access journal International Journal of Naval History:


<p class="MsoEndnoteText" style="text-align: justify; line-height: 150%; word-spacing: 0; margin-top: 0; margin-bottom: 0"><a name="_edn2"><font size="3" face="Times New Roman"></font></a><a href="#_ednref2" title>
        <div id="edn2">
          <p style="line-height: 150%; word-spacing: 0; margin-top: 0; margin-bottom: 0"><font size="3" face="Times New Roman"><span lang="EN-GB" style="mso-ansi-language: EN-GB; mso-bookmark: _edn2" class="MsoEndnoteReference">[2]</span></font></a><font size="3" face="Times New Roman"><span lang="EN-GB" style="mso-ansi-language:EN-GB">
          See Gary Weir, <i>An Ocean in Common: American Naval Officers,
          Scientists, and the Ocean Environment</i> (College Station, 2001),
          270-6; 334-5.</span><o:p>
          </o:p>
          </font>
        </div>

The Microsoft-specific markup aside, nothing here identifies the different pieces of information that make up a citation. A much cleaner, more semantically coded footnote that would conform to microformat standards might look like this:


<ol class="footnotes">
  <li class="citation" id="edn2">See <span class="author fn">Gary Weir</span>, <span class="title">An Ocean in Common: American Naval Officers, Scientists, and the Ocean Environment</span> (<span class="location">College Station</span>, <span class="date year">2001</span>), <span class="pages">270-6; 334-5</span>.
</li>
</ol>

I say “might” because there is no proposed microformat for scholarly citations. I chose my format for several reasons. First, it makes sense to mark up footnotes as an ordered list <ol> because that’s what they are: an ordered, sequential list. The number shows up automatically because ordered lists are, by default, displayed with numbered items. The “look and feel” of the citation can easily be changed (and should be changed) using Cascading Style Sheets (CSS). Second, I mark up the author’s name, the book title, the publisher’s location, the date of publication, and the pages used by giving each its own class attribute. This goes beyond simply using an <i> tag to italicize the book title. The <i> tag is merely presentational; it only italicizes the text, and you can italicize any text. But we want to do more than italicize the title. We want to indicate that it is, in fact, the title of a book. Moreover, we’ve indicated which text is the author’s name and which is other information, thus giving more meaning to what the text represents.
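To make “more meaning” concrete, here is a minimal Python sketch of how a program could read the citation above once its parts carry class attributes. It uses only the standard library’s html.parser module. The names CitationParser and SAMPLE are my own, and treating the first class name on each span as the field name is an assumption of this sketch, not part of any published microformat.

```python
from html.parser import HTMLParser

# The semantically coded footnote from this post, used as sample input.
SAMPLE = """<ol class="footnotes">
  <li class="citation" id="edn2">See <span class="author fn">Gary Weir</span>, <span class="title">An Ocean in Common: American Naval Officers, Scientists, and the Ocean Environment</span> (<span class="location">College Station</span>, <span class="date year">2001</span>), <span class="pages">270-6; 334-5</span>.</li>
</ol>"""


class CitationParser(HTMLParser):
    """Collect the text of each classed <span> inside <li class="citation">.

    Assumption: spans are not nested, and the first class name on a span
    (for example "author" in "author fn") names the field.
    """

    def __init__(self):
        super().__init__()
        self.citations = []   # one dict per citation found
        self._current = None  # dict being filled, or None outside a citation
        self._field = None    # field name of the currently open span, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "li" and "citation" in classes:
            self._current = {}
        elif tag == "span" and self._current is not None and classes:
            self._field = classes[0]

    def handle_data(self, data):
        # Text outside a classed span (such as "See" or punctuation) is skipped.
        if self._current is not None and self._field:
            self._current[self._field] = self._current.get(self._field, "") + data

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current is not None:
            self.citations.append(self._current)
            self._current = None


parser = CitationParser()
parser.feed(SAMPLE)
for citation in parser.citations:
    for field, text in citation.items():
        print(f"{field}: {text}")
```

Nothing comparable is possible with the original endnote, where the only distinction a program can see is that some of the text happens to be italic.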

But why, you might ask, should this matter? Isn’t the fact that it looks and works all right the most important thing? My answer is an emphatic no: the markup does indeed matter. On an ever-growing web of information, where content is constantly competing with other content for the attention of users and where findability is currency, content that is semantic, meaningful, and readable by both humans and machines will flourish. Semantic markup makes for better accessibility for all users, will make it easier for academic work to be converted and used in RSS and other XML-based tools, and will help create a uniform standard by which academics can publish scholarly work on the web. Take any 20 electronic journals on the web, and I’ll bet none of them use the same markup for citations.

Using the markup above only slightly changes how the citation displays and works in a web browser. But my changes are more concerned with enabling people to take that content and use it in myriad contexts, “future-proofing” it in effect, so that we may use this citation in other applications and display it in other ways. Standardization of scholarly citation on the web might, for instance, enable someone to create a tool that can search for how frequently a particular work is cited and aggregate a list of publications that cite a particular work. In an academic world where the influence of one’s scholarship is important, wouldn’t this be useful?
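As a rough illustration of such a tool, the Python sketch below builds an index from cited titles to the articles that cite them. Everything in it is hypothetical: the article names, the second book title, and the assumption that every journal publishes its footnotes as well-formed XHTML fragments (without an XML namespace) using the class names from my example. It parses with the standard library’s xml.etree.ElementTree.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

WEIR = ("An Ocean in Common: American Naval Officers, Scientists, "
        "and the Ocean Environment")

# Hypothetical footnote lists from two hypothetical articles.
ARTICLES = {
    "article-a": (
        '<ol class="footnotes"><li class="citation">'
        '<span class="author fn">Gary Weir</span>, '
        f'<span class="title">{WEIR}</span></li></ol>'
    ),
    "article-b": (
        '<ol class="footnotes">'
        f'<li class="citation"><span class="title">{WEIR}</span></li>'
        '<li class="citation"><span class="title">A Hypothetical Second Book</span></li>'
        '</ol>'
    ),
}


def cited_titles(xhtml):
    """Return the titles found in <li class="citation"> items of one fragment."""
    titles = []
    for li in ET.fromstring(xhtml).iter("li"):
        if "citation" not in li.get("class", "").split():
            continue
        for span in li.iter("span"):
            if "title" in span.get("class", "").split():
                # Join nested text and collapse runs of whitespace.
                titles.append(" ".join("".join(span.itertext()).split()))
    return titles


def citing_articles(articles):
    """Map each cited title to the sorted names of the articles that cite it."""
    index = defaultdict(set)
    for name, xhtml in articles.items():
        for title in cited_titles(xhtml):
            index[title].add(name)
    return {title: sorted(names) for title, names in index.items()}


for title, names in citing_articles(ARTICLES).items():
    print(f"{len(names)} citing article(s): {title}")
```

A real tool would also have to reconcile variant spellings and editions of the same work, which is exactly the kind of problem a shared citation format would make easier to tackle.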
