[Haddock] rebuilding haddock docs
Mark Lentczner
markl at glyphic.com
Thu Aug 19 22:46:27 EDT 2010
On Aug 9, 2010, at 8:14 AM, Simon Marlow wrote:
> Do the anchors have to change, or would it be possible to make them compatible?
>
> In the past when the .haddock format changes we have tried to degrade gracefully, still producing documentation but without the features that were enabled by the format change.
Here's the deal: Anchors were broken in that they weren't really compliant. The various hacks with escaping in links and double-anchoring were just workarounds for this. The details of how anchor ids should be constructed are in a comment in Haddock.Utils:
-------------------------------------------------------------------------------
-- * Anchor and URL utilities
--
-- NB: Anchor IDs, used as the destination of a link within a document must
-- conform to XML's NAME production. That, taken with XHTML and HTML 4.01's
-- various needs and compatibility constraints, means these IDs have to match:
-- [A-Za-z][A-Za-z0-9:_.-]*
-- Such IDs do not need to be escaped in any way when used as the fragment part
-- of a URL. Indeed, %-escaping them can lead to compatibility issues as it
-- isn't clear if such fragment identifiers should, or should not be unescaped
-- before being matched with IDs in the target document.
-------------------------------------------------------------------------------
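As a minimal sketch of what that comment requires, the pattern can be checked with a small predicate. The name isValidAnchorId is hypothetical and is not taken from Haddock.Utils; it only restates the character classes given in the comment above.

```haskell
import Data.Char (isAlpha, isAlphaNum, isAscii)

-- Hypothetical helper (not part of Haddock): True when the string
-- matches [A-Za-z][A-Za-z0-9:_.-]*
isValidAnchorId :: String -> Bool
isValidAnchorId []       = False
isValidAnchorId (c : cs) = isAsciiAlpha c && all isRest cs
  where
    -- First character: an ASCII letter only.
    isAsciiAlpha x = isAscii x && isAlpha x
    -- Remaining characters: ASCII letters, digits, or one of : _ . -
    isRest x = (isAscii x && isAlphaNum x) || x `elem` ":_.-"
```

Under this check, an id such as v:insertWith passes as written, while v:! and the %-escaped v%3A%21 do not.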
We can compare how the old code treats anchors with the new by looking at three representative functions from Data.Map: !, insertWith, and insertWith':
-- old links have hrefs ending in the fragment --
v%3A%21
v%3AinsertWith
v%3AinsertWith%27
-- old anchor points have two nested(!) A elements with these names --
v%3A%21 v:!
v%3AinsertWith v:insertWith
v%3AinsertWith%27 v:insertWith'
-- new links and anchor points use these --
v:-33-
v:insertWith
v:insertWith-39-
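The three new-style ids above are consistent with a simple rule: characters that are legal in an id pass through, and anything else is replaced by its decimal character code wrapped in hyphens. The following is a sketch inferred from those examples, not a copy of the Haddock source; the name escapeAnchorId is hypothetical, and escaping a literal '-' (so the hyphen can serve as the escape marker) is an assumption of the sketch.

```haskell
import Data.Char (isAlpha, isAlphaNum, isAscii, ord)

-- Hypothetical sketch: escape a name so it fits [A-Za-z][A-Za-z0-9:_.-]*
-- The result starts with a letter only if the input does, as is the
-- case for the "v:" prefix used in the examples.
escapeAnchorId :: String -> String
escapeAnchorId []       = []
escapeAnchorId (c : cs) = escape isFirst c ++ concatMap (escape isRest) cs
  where
    -- Keep a character the predicate accepts; otherwise emit -<code>-.
    escape ok x
      | ok x      = [x]
      | otherwise = '-' : show (ord x) ++ "-"
    isFirst x = isAscii x && isAlpha x
    -- '-' is deliberately left out here (assumption): it is the escape marker.
    isRest x = (isAscii x && isAlphaNum x) || x `elem` ":_."
```

Applied to v:!, v:insertWith, and v:insertWith', this yields the three new-style ids listed above.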
Thanks to the ambiguity in the specs over the years about % escaping and fragments, most browsers are eager to try anything to get a link to work. For most identifiers, this works in our favor: old-style links will find new-style anchors, and new-style links will find old-style anchors, since the old pages carried two anchors, one of them in new(ish) form. For identifiers with non-ASCII alphanumerics, all bets are off, since the new escaping mechanism is necessarily different.
I think this represents an acceptable degradation path, if not completely graceful.
- Mark
Mark Lentczner
http://www.ozonehouse.com/mark/
mark at glyphic.com