High-precision arithmetic for calculating N-gram probabilities
wren ng thornton
wren at freegeek.org
Sun Oct 16 01:36:45 BST 2011
On 10/13/11 12:34 PM, David Hall wrote:
> Also, you'll probably want to
> use log(double) and operate using + as times and logSum (== a + log(1
> + exp(b-a))) as sum. This will avoid underflow.
Actually, for (a + log(1 + exp(b - a))) you first want to ensure that
a >= b; otherwise you'd want to use (b + log(1 + exp(a - b))) instead.
Keeping the larger argument outside the exp means the exponent is never
positive, so the exp can't overflow.
Also, there are more accurate implementations of log(1 + _) and
(exp _ - 1) than the naive versions, which become extremely inaccurate
for small arguments; they're usually called log1p and expm1.
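For example (a sketch; log1p here is the standard C math-library
function, which newer versions of GHC's base expose from the Numeric
module, well after this thread):

    import Numeric (log1p)

    -- log1p x computes log(1 + x) without losing precision when x is
    -- tiny, which is exactly the regime exp (b - a) falls into here.
    logSumAccurate :: Double -> Double -> Double
    logSumAccurate a b
        | a >= b    = a + log1p (exp (b - a))
        | otherwise = b + log1p (exp (a - b))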
If you want to deal with probabilities in the log-domain, the logfloat
package[1] will handle all these details for you. In addition, it uses
newtypes so that the type system keeps you from mixing up which values
are in the log-domain and which are in the normal-domain; and it
provides a Num instance so that you can say what you mean rather than
what the implementation is doing. In fact, the package was specifically
designed for NLP tasks that need to do this sort of thing. While the
primary goal is to maximize precision, the secondary goal is to maximize
performance within those constraints; and the implementation has been
tested thoroughly on both criteria.
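For instance, a minimal usage sketch (assuming the API currently on
hackage, with Data.Number.LogFloat exporting logFloat and
logFromLogFloat; check the haddocks for the exact signatures):

    import Data.Number.LogFloat (LogFloat, logFloat, logFromLogFloat)

    -- The Num instance makes (*) add logs and (+) use the log-sum
    -- trick, so ordinary arithmetic on probabilities just works.
    sentenceProb :: [Double] -> LogFloat
    sentenceProb = product . map logFloat

    main :: IO ()
    main =
        -- 1000 n-gram probabilities of 1e-5 each: a plain Double
        -- product underflows to 0, but the log-domain value is fine.
        print (logFromLogFloat (sentenceProb (replicate 1000 1e-5))
                   :: Double)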
The only caveat is that the performance tuning was done a while ago,
back when GHC had horrible code generation for doubles, and so it uses
-fvia-C, which has since been deprecated. I've yet to install GHC7 in
order to re-tune for the new backends.
[1] http://hackage.haskell.org/package/logfloat
--
Live well,
~wren