A Bigram Hidden Markov Model Part-of-Speech Tagger Using Viterbi Decoding
The HMM was trained on the WSJ corpus, so sentences that fall far outside the domain of WSJ-style text may produce poor tags. The tokenizer splits the input on spaces, so punctuation that is not separated from words by whitespace may not be tagged properly either.
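
To make the decoding step concrete, here is a minimal, self-contained sketch of Viterbi decoding for a bigram HMM. This is not this repository's actual code: the tag set, probability tables, floor value, and function name are illustrative assumptions.

```python
# A minimal sketch of bigram-HMM Viterbi decoding. The tag set,
# probability tables, and names here are illustrative assumptions,
# not this repository's API.
import math

def viterbi(words, tags, trans, emit, start):
    """Return the most probable tag sequence for `words`.

    trans[(t_prev, t)] -- P(t | t_prev), bigram transition probability
    emit[(t, w)]       -- P(w | t), emission probability
    start[t]           -- P(t | <s>), initial tag probability
    Unseen events get a tiny floor probability instead of zero.
    """
    floor = 1e-10
    def lp(p):  # log probability with a floor for unseen events
        return math.log(p if p > 0 else floor)

    # Initialization: score the first word with the start probabilities.
    V = [{t: lp(start.get(t, 0)) + lp(emit.get((t, words[0]), 0)) for t in tags}]
    back = [{}]

    # Recursion: for each tag, keep the best previous tag and its score.
    for i in range(1, len(words)):
        V.append({})
        back.append({})
        for t in tags:
            best_prev, best_score = max(
                ((p, V[i - 1][p] + lp(trans.get((p, t), 0))) for p in tags),
                key=lambda x: x[1])
            V[i][t] = best_score + lp(emit.get((t, words[i]), 0))
            back[i][t] = best_prev

    # Termination and backtrace.
    last = max(V[-1], key=V[-1].get)
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))

# Toy usage with a whitespace tokenizer, mirroring the note above:
tags = ["DT", "NN", "VB"]
start = {"DT": 0.8, "NN": 0.1, "VB": 0.1}
trans = {("DT", "NN"): 0.9, ("NN", "VB"): 0.8, ("DT", "VB"): 0.1,
         ("NN", "NN"): 0.1, ("VB", "DT"): 0.5}
emit = {("DT", "the"): 0.9, ("NN", "dog"): 0.5, ("VB", "barks"): 0.4}
sentence = "the dog barks".split()  # split on spaces, as the tagger does
print(viterbi(sentence, tags, trans, emit, start))  # ['DT', 'NN', 'VB']
```

The log-space formulation avoids underflow on long sentences, and the probability floor is one simple way to keep unseen transitions or emissions from zeroing out a path; a trained tagger would typically use proper smoothing instead.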