Markdown patch for varying empty element suffixes
As noted in a previous post, I am a big fan of the Markdown text-to-HTML conversion tool. However, nothing's perfect. I already discussed a bug involving link ids, and I have since found one other reason to patch Markdown: sites like mine generate both HTML 4.01 Strict and XML pages (an Atom feed in my case).
The issue is that Markdown assumes it is generating XHTML, not HTML,
and therefore uses the XHTML syntax for empty elements, under which
elements such as BR, HR, or IMG must either have an end tag or have
a '/' at the end of the start tag. (Note that some people recommend
putting a space before the '/' in order to satisfy both HTML and XHTML
requirements, but this apparently opens a can of worms and hence
should be avoided in my opinion.)
In my case I'm generating "real" HTML (as opposed to XHTML sent with a
text/html content type), so I need to change this behavior. The
standard way to do this in vanilla Markdown is to set the
configuration variable $g_empty_element_suffix to the proper suffix
for HTML. This fixes the HTML pages but, because the setting applies
to all output, it breaks the XML pages, in particular the Atom feed
generated by Blosxom.
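To illustrate why a single global suffix cannot serve both outputs, here is a minimal sketch in Python (not the Perl that Markdown itself is written in). It assumes the feed embeds the generated markup directly as XML, and that the two suffix forms are a bare closing bracket for HTML and a space, slash, and closing bracket for XHTML. The helper names `wrap` and `is_well_formed` are hypothetical, and the tag is assembled from pieces rather than written out literally.

```python
import xml.etree.ElementTree as ET


def wrap(suffix: str) -> str:
    """Build a small XML fragment containing one empty BR element.

    The tag is assembled from pieces; `suffix` is whatever the
    converter would append to an empty element's start tag.
    """
    return "<content>one" + "<" + "br" + suffix + "two</content>"


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as well-formed XML."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# Assumed suffix values (hypothetical names, not Markdown's own).
XHTML_SUFFIX = " />"
HTML_SUFFIX = ">"

print(is_well_formed(wrap(XHTML_SUFFIX)))
print(is_well_formed(wrap(HTML_SUFFIX)))
```

With the XHTML suffix the fragment parses; with the HTML suffix the BR element is never closed, so an XML parser rejects the fragment.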
I therefore decided to patch Markdown to be more intelligent about
setting the empty element suffix. The strategy I decided upon was to
set $g_empty_element_suffix in the start subroutine based on the
current Blosxom flavour: If the flavour is "html" then the suffix is
set to the HTML form, otherwise it is left at the default value. (Note
that the patch won't work if you're really using XHTML under the
"html" flavour; this case is hard to code for because there's no easy
way to tell that XHTML is intended rather than HTML.)
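The selection logic just described can be sketched as follows. This is an illustration in Python, not the Perl patch itself; the function names `empty_element_suffix` and `empty_tag` and the two constants are hypothetical, and the suffix values are assumptions about what the default and HTML forms look like.

```python
# Assumed values: the default (XHTML) form and the HTML form of the
# suffix appended to an empty element's start tag.
DEFAULT_SUFFIX = " />"
HTML_SUFFIX = ">"


def empty_element_suffix(flavour: str) -> str:
    """Pick the empty element suffix for the current flavour.

    Only the "html" flavour gets the HTML form; every other flavour
    (for example one that produces an Atom feed) keeps the default.
    """
    return HTML_SUFFIX if flavour == "html" else DEFAULT_SUFFIX


def empty_tag(name: str, flavour: str) -> str:
    """Assemble an empty element's tag for the given flavour."""
    return "<" + name + empty_element_suffix(flavour)


for flavour in ("html", "atom"):
    print(flavour, repr(empty_element_suffix(flavour)))
```

As in the patch, the decision depends only on the flavour name, so a site that serves XHTML under the "html" flavour would get the wrong suffix.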
I've created two patches for this problem: a Markdown 1.0 patch and a Markdown 1.0.1 patch. The code is identical; the only difference is where the patch gets applied in terms of line numbers.
UPDATED: For some reason my preferred news aggregators (NewsFire
and NetNewsWire) have an unfortunate habit of taking example
HTML/XHTML code snippets (enclosed in a CODE element and escaped
using character entities) and interpreting them as actual tags. For
that reason I've updated this post to remove examples of empty element
syntax until I figure out exactly what's going on and can work around
the problem.
2005-01-08