[ Source: golang-gopkg-neurosnap-sentences.v1 ]

Package: golang-gopkg-neurosnap-sentences.v1-dev (1.0.6-1.1)


Sentence tokenizer for Go

A golang package that converts a blob of text into a list of sentences.

This package attempts to support a multitude of languages: Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Italian, Norwegian, Polish, Portuguese, Slovene, Spanish, Swedish, and Turkish.

This is an unsupervised multilingual sentence boundary detection library for Go. Its goal is to break up any text into a list of sentences, in multiple languages. The Punkt system accomplishes this by training the tokenizer on text in the given language. Once the likelihoods of abbreviations, collocations, and sentence starters are determined, finding sentence boundaries becomes easier.
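
To make the idea of learning abbreviation likelihoods from raw text more concrete, here is a minimal sketch using only the Go standard library. It is not the package's API and not the Punkt scoring itself: the function name periodRatio and the plain ratio it computes are assumptions made for illustration, standing in for the statistical likelihood measures a trained tokenizer would use. The intuition is that a word type that almost always appears with a trailing period is more likely to be an abbreviation than an ordinary word.

```go
package main

import (
	"fmt"
	"strings"
)

// periodRatio returns, for each lower-cased word type in text, the fraction
// of its occurrences that are immediately followed by a period.
// Hypothetical helper: a simplified stand-in for likelihood-based scoring.
func periodRatio(text string) map[string]float64 {
	withPeriod := map[string]int{}
	total := map[string]int{}
	for _, tok := range strings.Fields(text) {
		// Strip trailing punctuation to get the bare word type.
		word := strings.ToLower(strings.TrimRight(tok, ".,;:!?"))
		if word == "" {
			continue
		}
		total[word]++
		if strings.HasSuffix(tok, ".") {
			withPeriod[word]++
		}
	}
	ratios := make(map[string]float64, len(total))
	for w, n := range total {
		ratios[w] = float64(withPeriod[w]) / float64(n)
	}
	return ratios
}

func main() {
	corpus := "Dr. Lee met Dr. Kim at the lab. The lab was cold. Dr. Kim left the lab early."
	r := periodRatio(corpus)
	// "dr" always carries a period; "lab" only does so at a sentence end.
	fmt.Printf("dr: %.2f, lab: %.2f\n", r["dr"], r["lab"])
}
```

On this tiny corpus "dr" scores 1.00 and "lab" scores 0.33, which hints at why an unsupervised pass over enough text can separate abbreviations from words that merely happen to end a sentence.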

Many problems arise when tokenizing text into sentences, the primary one being abbreviations. By training on text in the given language, the Punkt system attempts to determine whether a word is an abbreviation, the end of a sentence, or both. It combines token- and type-based analysis of the text in two separate annotation phases.
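
The abbreviation problem described above can be seen with a toy splitter. The sketch below uses only the Go standard library; splitSentences and its hand-written abbreviation set are hypothetical names introduced for illustration, and the code is neither this package's API nor the Punkt algorithm. It simply breaks the text at tokens ending in '.', '!' or '?' unless the token is a known abbreviation.

```go
package main

import (
	"fmt"
	"strings"
)

// splitSentences breaks text at tokens ending in '.', '!' or '?', unless the
// token (lower-cased, without its trailing period) is in the abbrevs set.
// Hypothetical helper for illustration only.
func splitSentences(text string, abbrevs map[string]bool) []string {
	var sentences []string
	var current []string
	for _, tok := range strings.Fields(text) {
		current = append(current, tok)
		if !strings.HasSuffix(tok, ".") &&
			!strings.HasSuffix(tok, "!") &&
			!strings.HasSuffix(tok, "?") {
			continue
		}
		word := strings.ToLower(strings.TrimSuffix(tok, "."))
		if abbrevs[word] {
			continue // the period belongs to an abbreviation, not a boundary
		}
		sentences = append(sentences, strings.Join(current, " "))
		current = nil
	}
	// Keep any trailing text that lacks closing punctuation.
	if len(current) > 0 {
		sentences = append(sentences, strings.Join(current, " "))
	}
	return sentences
}

func main() {
	text := "Dr. Smith arrived late. He sat down."
	fmt.Println(len(splitSentences(text, nil)))                         // naive: splits after "Dr."
	fmt.Println(len(splitSentences(text, map[string]bool{"dr": true}))) // with abbreviation knowledge
}
```

Without the abbreviation set the text is cut into three pieces, with "Dr." wrongly treated as a sentence; with it, two sentences come out. The sketch cannot handle an abbreviation that also ends a sentence (the "both" case), and it needs a hand-maintained list, which is the gap a trained tokenizer is meant to close.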

Original research article: http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=BAE5C34E5C3B9DC60DFC4D93B85D8BB1?doi=10.1.1.85.5017&rep=rep1&type=pdf

Download golang-gopkg-neurosnap-sentences.v1-dev

Download for all available architectures
Architecture  Package Size  Installed Size  Files
all           3,840.7 kB    19,589.0 kB     [list of files]