High-precision arithmetic for calculating N-gram probabilities
wren ng thornton
wren at freegeek.org
Sun Oct 16 01:40:42 BST 2011
On 10/13/11 2:40 PM, Dan Maftei wrote:
> Yeah, log sums are actually a necessity when calculating perplexity. :)
>
> If I ever get around to profiling Rationals vs. Doubles I'll let people know
> what I found. But upon reflection, I see that they shouldn't make a
> significant difference in performance.
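As an aside, a minimal sketch of the log-space perplexity calculation being referred to, assuming a plain list of per-token probabilities (the function name is just for illustration):

    -- Perplexity of a token sequence from its per-token probabilities,
    -- computed in log space so the product of many small probabilities
    -- does not underflow to zero.
    perplexity :: [Double] -> Double
    perplexity ps = exp (negate (sum (map log ps)) / fromIntegral (length ps))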
The big performance problem with Rationals is that they're stored in
normal form, which means the implementation has to run gcd every time you
manipulate them in order to renormalize the result. If you're really
interested in performance, you'd want to try working with unnormalized
ratios rather than the Rational type. Of course, you then have to be
careful not to let the numbers get too large; otherwise, the cost of
working with full Integers becomes the new performance sinkhole.
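A minimal sketch of the unnormalized-ratio idea, with renormalization made explicit so you only pay for gcd when you choose to; the type and function names below are made up for illustration, not a drop-in replacement for Rational:

    -- An unnormalized ratio: no gcd on every operation, so multiplication
    -- is just two Integer multiplications. The trade-off is that the
    -- numerator and denominator grow without bound until you renormalize.
    data UnRatio = Integer :/ Integer

    mulUR :: UnRatio -> UnRatio -> UnRatio
    mulUR (a :/ b) (c :/ d) = (a * c) :/ (b * d)

    -- Renormalize explicitly, only when the components have grown too large.
    normalizeUR :: UnRatio -> UnRatio
    normalizeUR (a :/ b) = (a `quot` g) :/ (b `quot` g)
      where g = gcd a b

    -- Convert to Double at the end, e.g. before taking logs.
    toDouble :: UnRatio -> Double
    toDouble (a :/ b) = fromInteger a / fromInteger b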
--
Live well,
~wren