Stupid SGML tricks
Thursday, 15 January 2004 21:36
LiveJournal's HTML cleaner uses a parser which can parse HTML (but also understands XHTML).
This means, for example, that you can leave off quotes on attribute values if they consist only of letters, numbers, hyphens, dots, underscores, and colons. This is why <lj user=exampleusername> and <lj user="exampleusername"> both produce exampleusername (though I'd prefer the latter): the HTML cleaner doesn't see whether there are quotes or not, and the output of the parser is the same in either case.
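To see the "same output either way" point in practice, here is a minimal sketch using Python's standard html.parser as a stand-in. It is not the parser LiveJournal's cleaner actually uses, and the names TagCollector and parse_tags are made up for this illustration.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records (tag, attributes) for every start tag it sees."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs with quotes already stripped
        self.seen.append((tag, dict(attrs)))


def parse_tags(markup):
    """Hypothetical helper: parse markup and return the collected start tags."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.seen


unquoted = parse_tags('<lj user=exampleusername>')
quoted = parse_tags('<lj user="exampleusername">')

# Code working on the parser's output cannot tell the two spellings apart.
print(unquoted == quoted)
```

Anything downstream of the parser only ever sees the tag name and the attribute dictionary, so the quoting style is gone by the time a cleaner gets to look at it.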
The fact that the cleaner uses an HTML parser also means that it does SGML-style attribute minimisation; if the value of an attribute in an HTML file is the same as its name, then you can leave off the value. For example, you can pre-check a checkbox in HTML with <input type=checkbox name=foo value=27 checked>, which would have to be <input type="checkbox" name="foo" value="27" checked="checked" /> in XHTML (which does not allow attribute minimisation).
And because the parser in the HTML cleaner does this, too, you can say <lj-cut text> (see it in action here: [this is cut]) and it'll be parsed the same as <lj-cut text="text"> :p Or even <lj user>, which becomes user (i.e. <lj user="user">). Funny.
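The minimisation trick can be sketched the same way. Python's html.parser reports an attribute written without a value as None, so the expansion to name="name" has to be done by hand here. That step is an assumption about how an SGML-style parser behaves, not a description of LiveJournal's actual code, and MinimisationExpander and expand are hypothetical names.

```python
from html.parser import HTMLParser


class MinimisationExpander(HTMLParser):
    """Collects start tags, expanding minimised attributes to name="name"."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # A value of None means the attribute was written without "=value".
        expanded = {name: (name if value is None else value)
                    for name, value in attrs}
        self.seen.append((tag, expanded))


def expand(markup):
    """Hypothetical helper: parse markup and return tags with expanded attributes."""
    parser = MinimisationExpander()
    parser.feed(markup)
    parser.close()
    return parser.seen


# The HTML and XHTML spellings of the pre-checked checkbox come out identical.
html_form = expand('<input type=checkbox name=foo value=27 checked>')
xhtml_form = expand('<input type="checkbox" name="foo" value="27" checked="checked" />')
print(html_form == xhtml_form)

# The same expansion applied to the LiveJournal tags from the post.
print(expand('<lj-cut text>'))
print(expand('<lj user>'))
```

With that expansion in place, <lj user> is indistinguishable from <lj user="user">, which is exactly why it links to the user named "user".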
no subject
Date: Thursday, 15 January 2004 14:35 (UTC)
Anyway, you cannot rely on del being rendered as strike-through. Not only can users override this in their user (or browser) stylesheets, page authors can also override it, and there are user agents which use a different rendering.
I've seen a phone with limited web browsing (not WAP!), where text was always black on white, but del was rendered as red text (CSS: del {color: red}). INS on that browser was blue.
Similarly, you cannot rely on em being rendered in italics, or strong being rendered in boldface.
no subject
Date: Thursday, 15 January 2004 14:40 (UTC)
this was struck out, this was bold, and this was italic. :)
<strike style="text-decoration: none;">this was struck out</strike>, <b style="font-weight: normal;">this was bold</b>, and <i style="font-style: normal;">this was italic</i>
Sorry for the double comment, I didn't close a tag!
no subject
Date: Thursday, 15 January 2004 16:27 (UTC)
EM, STRONG, DEL, INS etc. have a real meaning: even if they are all rendered the same as normal text their meaning is not lost.