gse
Efficient text segmentation in Go; supports English, Chinese, Japanese, and other languages.
This is a Go implementation of [jieba](https://github.com/fxsjy/jieba), which is a Chinese word splitting algorithm.
Sentence tokenizer: converts text into a list of sentences.
Go library for performing Unicode Text Segmentation as described in [Unicode Standard Annex #29](https://www.unicode.org/reports/tr29/).
Go package for n-gram based text categorization, with support for utf-8 and raw text.
This is a Go implementation of [MMSEG](http://technology.chtsai.org/mmseg/), which is a Chinese word splitting algorithm.
Stemmer packages for the Go programming language. Includes English and German stemmers.
A tokenizer for Go based on a dictionary and bigram language models. (Currently supports only Chinese segmentation.)
shamoji is a word filtering package written in Go.