
edu.stanford.nlp.parser.lexparser (Stanford JavaNLP API) - The Diigo Meta page

tides.umiacs.umd.edu/...package-summary.html


This link has been bookmarked by 1 person. It was first bookmarked on 08 Jun 2009, by harry.

08 Jun 09

harry
- For Chinese, the package includes two simple word segmenters. One is a lexicon-based maximum match segmenter; the other uses the parser to do Hidden Markov Model-based word segmentation. These segmentation methods are usable, but for high-quality segmentation of Chinese text you will need to segment the text yourself as a preprocessing step. The supplied grammars assume that Chinese input has already been word-segmented according to Penn Chinese Treebank conventions. Choosing Chinese with -tLPP edu.stanford.nlp.parser.lexparser.ChineseTreebankParserParams makes space-separated words the default tokenization. To do word segmentation within the parser, give one of the options -segmentMarkov or -segmentMaxMatch.
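To make the "lexicon-based maximum match" idea concrete, here is a minimal sketch of generic greedy forward maximum matching. It is an illustration of the technique only, not the Stanford parser's own -segmentMaxMatch code. The class name MaxMatchSegmenter, the method segment, the maxWordLen parameter, and the toy lexicon are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of forward maximum matching; not the Stanford implementation.
public class MaxMatchSegmenter {

    /**
     * Greedy forward maximum matching: at each position, take the longest
     * lexicon entry (up to maxWordLen characters) that matches the text.
     * If no entry of length >= 2 matches, emit one character as its own word.
     * Assumes all characters are in the Basic Multilingual Plane, since
     * String.substring indexes UTF-16 code units.
     */
    public static List<String> segment(String text, Set<String> lexicon, int maxWordLen) {
        List<String> words = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = Math.min(text.length(), i + maxWordLen);
            String match = null;
            // Try the longest candidate first, shrinking until a lexicon hit.
            for (int j = end; j > i + 1; j--) {
                String candidate = text.substring(i, j);
                if (lexicon.contains(candidate)) {
                    match = candidate;
                    break;
                }
            }
            if (match == null) {
                // Fallback: treat the single character as a word.
                match = text.substring(i, i + 1);
            }
            words.add(match);
            i += match.length();
        }
        return words;
    }

    public static void main(String[] args) {
        // Toy lexicon, invented for this example.
        Set<String> lexicon = Set.of("研究", "研究生", "生命", "起源");
        System.out.println(String.join(" ", segment("研究生命起源", lexicon, 3)));
    }
}
```

With this toy lexicon, the greedy choice of the longest match 研究生 ("graduate student") at the start yields 研究生 命 起源 instead of the intended 研究 生命 起源 ("study the origin of life"). Errors of this kind are one reason a dedicated segmenter run as a preprocessing step usually gives better input than a simple maximum match pass.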
