Package: ucto (0.5.2-2)

Unicode Tokenizer

Ucto can tokenize UTF-8 encoded text files (i.e. separate words from punctuation, split sentences, generate n-grams), and offers several other basic preprocessing steps (change case, count words/characters and reverse lines) that make your text suited for further processing such as indexing, part-of-speech tagging, or machine translation.
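As a rough illustration of what these steps mean, the sketch below separates words from punctuation, splits sentences, and generates n-grams in plain Python. It is not ucto's code or API: the function names `tokenize`, `split_sentences`, and `ngrams` are hypothetical, and the rules are far simpler than a real tokenizer's (for example, abbreviations such as "e.g." would be split incorrectly).

```python
import re


def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    # \w+ matches a run of Unicode word characters;
    # [^\w\s] matches one character that is neither word nor whitespace.
    return re.findall(r"\w+|[^\w\s]", text)


def split_sentences(tokens, terminators=(".", "!", "?")):
    """Group tokens into sentences, closing a sentence at each terminator."""
    sentences, current = [], []
    for tok in tokens:
        current.append(tok)
        if tok in terminators:
            sentences.append(current)
            current = []
    if current:  # keep trailing tokens that lack a terminator
        sentences.append(current)
    return sentences


def ngrams(tokens, n):
    """Return all contiguous n-token windows as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = tokenize("Hello, world! Ucto splits text.")
sentences = split_sentences(tokens)
bigrams = ngrams(sentences[1], 2)
```

Here `tokens` holds eight items (punctuation marks become their own tokens), `sentences` holds two token lists, and `bigrams` holds the three adjacent pairs from the second sentence.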

Ucto is a product of the ILK Research Group, Tilburg University (The Netherlands).

If you are interested in machine parsing of UTF-8 encoded text files, e.g. to do scientific research in natural language processing, ucto will likely be of use to you.

Tags: Implemented in: C++, Role: Program

Download ucto

Download for all available architectures
Architecture  Package Size  Installed Size  Files
amd64         36.7 kB       153.0 kB        [list of files]