[Haddock] [haddock] #191: Incorrect handling of character references

Mon Jan 9 03:47:07 GMT 2012

#191: Incorrect handling of character references
---------------------+------------------------------------------------------
Reporter:  selinger  |       Owner:        
    Type:  defect    |      Status:  new   
Priority:  minor     |   Milestone:  2.10.0
 Version:  2.9.4     |    Keywords:        
---------------------+------------------------------------------------------
 In Haddock, a character reference such as &#252; is used to represent a
 non-ASCII character, in this case the German u-umlaut (ü, U+00FC).

 However, this does not work in the following situations:

 * if the character appears in italics,
 * if the character appears in a code block marked with ">",
 * if the character appears in a URL.

 Moreover, if such a character appears in a Haskell identifier between
 single quotes, the character is rendered correctly, but the word is not
 recognized as a Haskell identifier (and therefore the surrounding quotes
 are copied to the output and the identifier is not linked).

 See the attached file for examples.

 Here are some comments on how I think it could be fixed. In my opinion,
 the best way to handle the &#252; syntax would be to treat it as an input
 encoding, i.e., handle it at the I/O level, before any lexing and parsing
 is done by Haddock proper. In other words, the sequence &#252; should be
 treated as if it were a single character literally present in the input
 file.

 If it were done this way, one could use a reference such as &#252; in
 *every* context, and could even use references for plain ASCII
 characters, for example &#38; for a literal "&". Thus, if the sequence of
 6 characters &#252; had to appear literally in a comment, one could type
 it as &#38;#252;, although &\#252; would achieve the same result more
 simply.
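
 As a rough illustration of the preprocessing idea, here is a minimal
 Haskell sketch. The name decodeCharRefs is hypothetical and is not taken
 from Haddock's sources. It handles only decimal references, and assumes it
 would be applied to the comment text before Haddock's own lexer runs.

```haskell
import Data.Char (chr, isDigit)

-- | Replace decimal character references such as "&#252;" with the
-- character they denote, in a single left-to-right pass.
-- Hypothetical helper for illustration; not part of Haddock itself.
decodeCharRefs :: String -> String
decodeCharRefs ('&' : '#' : rest)
  -- Require at least one digit followed by a terminating ';'.
  | (digits@(_ : _), ';' : rest') <- span isDigit rest
  , let n = read digits :: Integer
  , n <= 0x10FFFF                       -- stay within the range of Char
  = chr (fromInteger n) : decodeCharRefs rest'
-- Anything else, including malformed references, is copied unchanged.
decodeCharRefs (c : cs) = c : decodeCharRefs cs
decodeCharRefs []       = []

main :: IO ()
main = do
  -- After decoding, italics markup simply contains the character itself.
  print (decodeCharRefs "/M&#252;nchen/" == "/M\252nchen/")
  -- "&#38;" becomes '&', and the decoded output is not scanned again.
  print (decodeCharRefs "&#38;#252;" == "&#252;")
  -- Malformed or unterminated references pass through untouched.
  print (decodeCharRefs "&#;x &#12" == "&#;x &#12")
```

 Because the decoded output is never rescanned, &#38;#252; yields the
 literal six characters &#252; as described above. The backslash form
 &\#252; is left unchanged by this pass, so it would still be up to
 Haddock's existing escape handling.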

-- 
Ticket URL: <http://trac.haskell.org/haddock/ticket/191>
haddock <http://www.haskell.org/haddock>
Haddock, The Haskell Documentation Tool