Problem description
I'm processing some English texts in a Java application, and I need to stem them. For example, from the text "amenities/amenity" I need to get "amenit".
The function looks like this:
String stemTerm(String term){
...
}
I've found the Lucene Analyzer, but it looks way too complicated for what I need. http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/analysis/PorterStemFilter.html
Is there a way to use it to stem words without building an Analyzer? I don't understand all the Analyzer business...
EDIT: I actually need stemming + lemmatization. Can Lucene do this?
Recommended answer
import org.apache.lucene.analysis.PorterStemmer;
...
String stemTerm (String term) {
PorterStemmer stemmer = new PorterStemmer();
return stemmer.stem(term);
}
See here for more details. If stemming is all you want to do, then you should use this instead of Lucene.
You should lowercase term before passing it to stem().
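
The lowercasing step can be sketched as a small wrapper. This is only an illustration under stated assumptions: StemHelper and the stemmer parameter are hypothetical names that do not come from Lucene, and the lambda in main is a toy stand-in rather than a real stemmer. Locale.ROOT is used so the result does not depend on the JVM's default locale (under a Turkish locale, for instance, "I" lowercases to a dotless "ı").

```java
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical helper, not part of Lucene.
final class StemHelper {

    // Lowercases the term in a locale-independent way, then delegates
    // to whatever stemming function the caller supplies.
    static String stemTerm(String term, UnaryOperator<String> stemmer) {
        String normalized = term.toLowerCase(Locale.ROOT);
        return stemmer.apply(normalized);
    }

    public static void main(String[] args) {
        // Toy stand-in that only strips a trailing "s"; it is NOT the
        // Porter algorithm and is here only to keep the sketch runnable.
        UnaryOperator<String> toyStemmer =
                s -> s.endsWith("s") ? s.substring(0, s.length() - 1) : s;

        System.out.println(stemTerm("Amenities", toyStemmer));
    }
}
```

With the stemmer from the answer, the second argument would presumably be a method reference to its stem(String) method, assuming that class is accessible from your package in the Lucene version you use.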