ANNOUNCE: brillig 0.3 - not quite the Brill tagger
Eric Y. Kow
eric.kow at gmail.com
Sat Sep 3 09:32:56 BST 2011
Dear Haskell NLPers,
I'd like to announce the availability of brillig (on Hackage), which aims
to be, but falls short of, a Brill tagger implementation. You can
get it by running
cabal update
cabal install brillig
For now, you may also want to get the unstable version:
darcs get --lazy http://darcsden.com/kowey/brillig
This is largely a seed-planting exercise: naive implementations of
simple algorithms, so that we have something rather than nothing
(see also fullstop, a sentence segmenter). Is there a Haskell NLTK?
No, but...
The good news:
- comes with (hopefully) easy quick start instructions
- improves accuracy over unigram baseline by 1%
- available in library form with a permissive Free software license (BSD3)
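The unigram baseline mentioned in the list above gives each word the tag it was most often seen with in the training data. A minimal sketch of such a baseline, assuming tagged text as a list of (word, tag) pairs, follows. This is not brillig's code or API: `trainUnigram`, `accuracy` and the other names are hypothetical, and falling back to the overall most frequent tag for unseen words is one common choice among several.

```haskell
import qualified Data.Map.Strict as M
import Data.List (foldl', maximumBy)
import Data.Ord (comparing)

type Token = String
type Tag   = String

-- Count how often each tag occurs with each word in the training data.
tagCounts :: [(Token, Tag)] -> M.Map Token (M.Map Tag Int)
tagCounts = foldl' add M.empty
  where
    add m (w, t) = M.insertWith (M.unionWith (+)) w (M.singleton t 1) m

-- Most frequent tag overall (hypothetical fallback for unknown words).
-- Ties go to the alphabetically later tag, because maximumBy keeps the
-- last maximal element of the key-ordered list.
mostFrequentTag :: [(Token, Tag)] -> Tag
mostFrequentTag corpus =
  fst . maximumBy (comparing snd) . M.toList $
    M.fromListWith (+) [ (t, 1 :: Int) | (_, t) <- corpus ]

-- Train a unigram tagger: each known word gets the tag it was seen with
-- most often; the count table is built once and shared across calls.
trainUnigram :: [(Token, Tag)] -> (Token -> Tag)
trainUnigram corpus = \w ->
  case M.lookup w table of
    Just tags -> fst (maximumBy (comparing snd) (M.toList tags))
    Nothing   -> fallback
  where
    table    = tagCounts corpus
    fallback = mostFrequentTag corpus

-- Fraction of held-out tokens whose predicted tag matches the gold tag
-- (assumes a non-empty held-out set).
accuracy :: (Token -> Tag) -> [(Token, Tag)] -> Double
accuracy tagger gold =
  fromIntegral (length (filter correct gold)) / fromIntegral (length gold)
  where
    correct (w, t) = tagger w == t
```

Training on most of a tagged corpus and calling `accuracy` on a held-out portion gives the kind of baseline figure discussed below.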
The bad news:
- only implements templates that look one tag back; needs to be generalised
- not actually in use for anything.
I mostly wrote this for the hell of it.
New maintainer welcome!
- top accuracy on a random tenth held out from the Brown corpus is
87.4%... the unigram baseline reported in Jurafsky and Martin's
textbook is something like 91%. Oops.
(For what it's worth, my own unigram baseline is 86.4%, which is why I'm
reporting a 1% improvement.) Hopefully the gap is due to different
data sets...
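The "one tag back" limitation in the list above presumably refers to Brill-style rule templates whose only context is the previous tag, such as "change tag A to B when the previous tag is C". The sketch below illustrates that template and a greedy transformation-based learner that repeatedly keeps the instantiated rule with the largest net reduction in errors. It is an illustration under assumptions, not brillig's implementation: `Rule`, `applyRule`, `candidates` and `learn` are hypothetical names, it works on a single tag sequence rather than a whole corpus, and it checks each rule's context against the tags as they stood before that rule was applied.

```haskell
import Data.List (maximumBy, nub)
import Data.Ord (comparing)

type Tag = String

-- One-tag-back template: change fromTag to toTag when the previous tag is prevTag.
data Rule = Rule { fromTag :: Tag, toTag :: Tag, prevTag :: Tag }
  deriving (Eq, Show)

-- Apply a rule to a tag sequence. The context is read from the input tags,
-- so all matching positions are rewritten at once; the first tag never changes.
applyRule :: Rule -> [Tag] -> [Tag]
applyRule r tags = zipWith step (Nothing : map Just tags) tags
  where
    step (Just p) t
      | p == prevTag r && t == fromTag r = toTag r
    step _ t = t

-- Number of positions where the current tags agree with the gold tags.
numCorrect :: [Tag] -> [Tag] -> Int
numCorrect current gold = length (filter id (zipWith (==) current gold))

-- Instantiate the template at every position (after the first) that is wrong.
candidates :: [Tag] -> [Tag] -> [Rule]
candidates current gold =
  nub [ Rule c g p
      | (p, c, g) <- zip3 current (drop 1 current) (drop 1 gold)
      , c /= g ]

-- Greedily learn up to n rules, each time keeping the candidate with the
-- best net gain (errors fixed minus errors introduced); stop when none helps.
learn :: Int -> [Tag] -> [Tag] -> [Rule]
learn n current gold
  | n <= 0 || null scored || bestGain <= 0 = []
  | otherwise = best : learn (n - 1) (applyRule best current) gold
  where
    base   = numCorrect current gold
    scored = [ (r, numCorrect (applyRule r current) gold - base)
             | r <- candidates current gold ]
    (best, bestGain) = maximumBy (comparing snd) scored
```

In a full tagger the starting tags would come from the unigram baseline, and generalising would mean adding templates that look further back or at neighbouring words.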
Anyway, it's out there. I hope somebody can run with it.
Have fun! Go make it better!
Eric
PS. The toves, they are slithey
--
Eric Kow <http://erickow.com>
More information about the NLP mailing list