NLP - (Software|API)
List
Apache Nutch: open-source web crawler. Nutch can crawl pages and post them to Apache Solr for indexing and search.
Apache Tika: detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF)
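To make the "crawl and post to Solr" step concrete, here is a minimal sketch of what adding extracted documents to Solr can look like over HTTP, using Solr's JSON update handler. It does not show how Nutch itself indexes. The host, the core name `docs`, the field names, and the helper `build_solr_update` are all assumptions for illustration. The request is only built, not sent.

```python
import json
from urllib.request import Request


def build_solr_update(base_url, core, docs, commit=True):
    """Build (but do not send) a POST request that adds documents to a Solr core.

    base_url and core are hypothetical values; adapt them to your installation.
    """
    url = f"{base_url.rstrip('/')}/solr/{core}/update"
    if commit:
        # Ask Solr to make the documents searchable right away.
        url += "?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical output of a crawl + text extraction step.
pages = [
    {"id": "https://example.org/a", "title": "Page A", "content": "Text extracted from page A."},
    {"id": "https://example.org/b", "title": "Page B", "content": "Text extracted from page B."},
]

req = build_solr_update("http://localhost:8983", "docs", pages)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it to a running Solr instance. The field names must match the schema of the target core.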
| Library | Language | Open Source / License | Notes |
| NLTK | Python | Yes | |
| Gensim | Python | Yes | |
| spaCy (spacy.io) | Python | Yes | |
| Elasticsearch (index and search) | Java | Apache 2 | Based on Lucene. See also Crate, a query / SQL layer on top of Elasticsearch |
| Solr (index and search) | Java | Apache 2 | Based on Lucene |
| Apache OpenNLP | Java | Yes | |
| Deeplearning4j | Java, Scala | Yes | |
| Weka | Java | GPL | See https://github.com/fracpete/nlp-weka-package |
| Stanford CoreNLP | Java | GPL | The demo covers part of speech, named entity recognition, coreference, basic dependencies, collapsed dependencies, and collapsed CC-processed dependencies. GitHub: http://stanfordnlp.github.io/CoreNLP/ Run online: http://corenlp.run/ |
| LingPipe | Java | No | Topic Classification, Named Entity Recognition (NER), Sentiment Analysis, … |
| tm | R | Yes | |
| RWeka | R | Yes | Uses rJava (via JNI) |
| openNLP | R | Yes | Uses rJava (via JNI) |
| Tesseract | | | OCR (optical character recognition) engine |
| TweetNLP | Java | Yes | Tokenizer, part-of-speech tagger, hierarchical word clusters, and dependency parser for tweets |
| Smile | Java | LGPL | Statistical Machine Intelligence and Learning Engine |
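Several of the libraries above start with tokenization, and tweets need special care because of mentions, hashtags, and URLs. The following is a minimal regex-based sketch of that idea in plain Python. It is not the algorithm used by TweetNLP or any other library in the table; the pattern and the `tokenize` helper are assumptions for illustration.

```python
import re

# Alternatives are tried left to right, so URLs and @/# tokens win over plain words.
TOKEN_RE = re.compile(
    r"""
      https?://\S+        # URLs, kept as a single token
    | [@#]\w+             # @mentions and #hashtags
    | \w+(?:'\w+)?        # words, optionally with an apostrophe (don't)
    | [^\w\s]             # any other single non-space symbol
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Split a tweet-like string into a list of tokens."""
    return TOKEN_RE.findall(text)


tokens = tokenize("@anna loving #NLP, see https://example.org :)")
print(tokens)
# → ['@anna', 'loving', '#NLP', ',', 'see', 'https://example.org', ':', ')']
```

A real tweet tokenizer also handles emoticons, emoji, and trailing punctuation on URLs, which this sketch deliberately leaves out.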
Oracle