How to use Apache OpenNLP in a node.js application
What is the best way to use Apache OpenNLP from node.js?
Specifically, I want to use the Name Entity Extraction API. Here is what it says; the documentation is terrible (it's a new project, I think):
http://opennlp.apache.org/documentation/manual/opennlp.html#tools.namefind
From the docs:
To use the Name Finder in a production system it is recommended to embed it directly into the application instead of using the command line interface. First the name finder model must be loaded into memory from disk or another source. In the sample below it is loaded from disk.
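import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import opennlp.tools.namefind.TokenNameFinderModel;

TokenNameFinderModel model = null;
// try-with-resources closes the stream even when loading fails
try (InputStream modelIn = new FileInputStream("en-ner-person.bin")) {
    model = new TokenNameFinderModel(modelIn);
} catch (IOException e) {
    // covers missing files as well as invalid or incompatible models
    e.printStackTrace();
}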
There are a number of reasons why model loading can fail:
Issues with the underlying I/O
The version of the model is not compatible with the OpenNLP version
The model is loaded into the wrong component, for example a tokenizer model is loaded with the TokenNameFinderModel class
The model content is not valid for some other reason
After the model is loaded, the NameFinderME can be instantiated.
NameFinderME nameFinder = new NameFinderME(model);
The initialization is now finished and the Name Finder can be used. The NameFinderME class is not thread safe; it must only be called from one thread. To use multiple threads, multiple NameFinderME instances sharing the same model instance can be created.
The input text should be segmented into documents, sentences and tokens. To perform entity detection, an application calls the find method for every sentence in the document. After every document, clearAdaptiveData must be called to clear the adaptive data in the feature generators. Not calling clearAdaptiveData can lead to a sharp drop in the detection rate after a few documents. The following code illustrates that:
for (String[][] document : documents) {
    for (String[] sentence : document) {
        Span[] nameSpans = nameFinder.find(sentence);
        // do something with the names
    }
    // reset the adaptive data once per document
    nameFinder.clearAdaptiveData();
}

The next snippet shows a call to find:

String[] sentence = new String[]{
    "Pierre", "Vinken", "is", "61", "years", "old", "."
};
Span[] nameSpans = nameFinder.find(sentence);
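The threading rule quoted above (one NameFinderME per thread, all sharing a single loaded model, with clearAdaptiveData called once per document) can be sketched as a self-contained Java program. So that it runs without OpenNLP on the classpath, it uses hypothetical stand-in classes `Model` and `Finder` in place of `TokenNameFinderModel` and `NameFinderME`; the `ThreadLocal` plus fixed thread pool arrangement is one common way to get per-thread instances, not something the OpenNLP manual prescribes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PerThreadNameFinders {
    // HYPOTHETICAL stand-in for TokenNameFinderModel: read-only, safe to share.
    static final class Model { }

    // HYPOTHETICAL stand-in for NameFinderME: it holds mutable per-instance
    // state (like adaptive data), so an instance must stay on one thread.
    static final class Finder {
        final Model model;
        private int sentencesSinceClear = 0;

        Finder(Model model) { this.model = model; }

        void find(String[] sentence) { sentencesSinceClear++; }

        void clearAdaptiveData() { sentencesSinceClear = 0; }
    }

    // The model is loaded once and shared by every Finder.
    static final Model SHARED_MODEL = new Model();
    // Records every Finder created, to show there is one per worker thread.
    static final Set<Finder> CREATED = ConcurrentHashMap.newKeySet();

    // Each thread lazily builds its own Finder from the shared model.
    static final ThreadLocal<Finder> FINDER = ThreadLocal.withInitial(() -> {
        Finder finder = new Finder(SHARED_MODEL);
        CREATED.add(finder);
        return finder;
    });

    // Runs on a worker thread: every sentence, then one clear per document.
    static int processDocument(List<String[]> document) {
        Finder finder = FINDER.get();
        int sentences = 0;
        for (String[] sentence : document) {
            finder.find(sentence);
            sentences++;
        }
        finder.clearAdaptiveData();
        return sentences;
    }

    // Spreads documents over a fixed pool; returns total sentences processed.
    static int processAll(List<List<String[]>> documents, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (List<String[]> document : documents) {
                pending.add(pool.submit(() -> processDocument(document)));
            }
            int total = 0;
            for (Future<Integer> result : pending) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With the real library, the `ThreadLocal` initializer would construct `new NameFinderME(sharedModel)` instead of `Finder`, and the per-document body would call `find` and `clearAdaptiveData` as in the loop above.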
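The nameSpans array contains one Span which marks the name Pierre Vinken. The elements between the begin and end offsets are the name tokens. In this case the begin offset is 0 and the end offset is 2 (the end offset is exclusive, so the span covers tokens 0 and 1). The Span object also knows the type of the entity; in this case it is person (defined by the model), and it can be retrieved with a call to Span.getType(). In addition to the statistical name finder, OpenNLP also offers a dictionary and a regular expression name finder implementation.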
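To make the offsets concrete, here is a minimal sketch using a hypothetical `SimpleSpan` class (not OpenNLP's `Span`) with an inclusive start and an exclusive end. In OpenNLP itself, the same values come from `Span.getStart()`, `Span.getEnd()` and `Span.getType()`, and the static helper `Span.spansToStrings(spans, tokens)` performs a similar token lookup.

```java
import java.util.Arrays;

public class SpanOffsets {
    // HYPOTHETICAL minimal stand-in for opennlp.tools.util.Span:
    // start is inclusive, end is exclusive, type is the entity label.
    static final class SimpleSpan {
        final int start;
        final int end;
        final String type;

        SimpleSpan(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    // Joins tokens[start] .. tokens[end - 1] with single spaces.
    static String coveredText(SimpleSpan span, String[] tokens) {
        return String.join(" ", Arrays.copyOfRange(tokens, span.start, span.end));
    }

    public static void main(String[] args) {
        String[] sentence = {"Pierre", "Vinken", "is", "61", "years", "old", "."};
        SimpleSpan name = new SimpleSpan(0, 2, "person");
        System.out.println(name.type + ": " + coveredText(name, sentence));
        // prints: person: Pierre Vinken
    }
}
```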
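Check out this node.js library:
https://github.com/mbejda/node-opennlp
https://www.npmjs.com/package/opennlp
Just run npm install opennlp
and look at the examples on GitHub.
var openNLP = require("opennlp");
var nameFinder = new openNLP().nameFinder;
// sentence: the text to analyse (see the package examples for the expected input)
nameFinder.find(sentence, function(err, results) {
    console.log(results);
});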