Question
Lucene's manual clearly explains what a proximity search means for a phrase of two words, such as the "jakarta apache"~10 example in
http://lucene.apache.org/core/2_9_4/queryparsersyntax.html#Proximity Searches
However, I am wondering what exactly a search like "jakarta apache lucene"~10 does. Does it allow neighboring words to be at most 10 words apart, or must every pair of words be within that distance?
Thanks!
Recommended answer
The slop (proximity) works like an edit distance (see PhraseQuery.setSlop). So, the terms can be reordered or have extra terms inserted between them. This means that the proximity is the maximum number of terms that may be added across the whole query, not between each pair of terms. That is:
"jakarta apache lucene"~3
would match:
- "jakarta lucene apache" (distance: 2)
- "jakarta extra words here apache lucene" (distance: 3)
- "jakarta some words apache separated lucene" (distance: 3)
But not:
- "lucene jakarta apache" (distance: 4)
- "jakarta too many extra words here apache lucene" (distance: 5)
- "jakarta some words apache further separated lucene" (distance: 4)
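For the examples where the query terms keep their original order, the quoted distance is simply the number of extra words sitting between the first and last query term. A minimal Python sketch of that count follows. It assumes each query term occurs exactly once in the text and in query order; `extra_terms` is a hypothetical helper written for illustration, not a Lucene API, and it does not reproduce Lucene's scorer.

```python
def extra_terms(query, text):
    """Count the extra words between the first and last query term.

    Assumption: every query term appears exactly once in `text`,
    in the same order as in `query`. Reordered terms are rejected.
    """
    terms = query.split()
    words = text.split()
    positions = [words.index(term) for term in terms]
    if positions != sorted(positions):
        raise ValueError("query terms are not in query order")
    # Words covered from the first to the last query term, inclusive.
    span = positions[-1] - positions[0] + 1
    # Whatever is in the span and is not a query term is an extra word.
    return span - len(terms)


if __name__ == "__main__":
    query = "jakarta apache lucene"
    slop = 3
    for text in (
        "jakarta extra words here apache lucene",
        "jakarta some words apache separated lucene",
        "jakarta too many extra words here apache lucene",
        "jakarta some words apache further separated lucene",
    ):
        distance = extra_terms(query, text)
        print(text, distance, "match" if distance <= slop else "no match")
```

With a slop of 3, the first two texts come out at distance 3 and match, while the last two come out at 5 and 4 and do not.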
Some people have been confused by the following:
"lucene jakarta apache" (distance: 4)
The simple explanation is that swapping terms takes two edits, so:
- jakarta apache lucene (distance: 0)
- jakarta lucene apache (first swap, distance: 2)
- lucene jakarta apache (second swap, distance: 4)
The longer, but more accurate, explanation is that every edit allows a term to be moved by one position. The first move of a swap transposes two terms on top of each other. Keeping this in mind explains why any set of three terms can be rearranged into any order with distance no greater than 4.
- jakarta apache lucene (distance: 0)
- jakarta [apache,lucene] (distance: 1)
- [jakarta,apache,lucene] (all transposed onto the same position, distance: 2)
- lucene [jakarta,apache] (distance: 3)
- lucene jakarta apache (distance: 4)
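The move-by-one-position counting described above can be checked with a short Python sketch. It follows this answer's model only: each term may move one position per edit and terms may temporarily share a position, so the cost of a reordering is the sum of how far each term travels. `reorder_moves` is a hypothetical helper for illustration, not part of Lucene, and it is not a reimplementation of Lucene's sloppy phrase scorer.

```python
from itertools import permutations


def reorder_moves(query, target):
    """Sum of single-position moves needed to turn `query` into `target`.

    Assumption: both strings contain the same distinct terms, with no
    extra words. Terms may pass through a shared position, so each
    term's cost is just the distance between its old and new index.
    """
    source = query.split()
    dest = target.split()
    if sorted(source) != sorted(dest) or len(set(source)) != len(source):
        raise ValueError("expected the same distinct terms in both phrases")
    return sum(abs(dest.index(term) - i) for i, term in enumerate(source))


if __name__ == "__main__":
    query = "jakarta apache lucene"
    # Print the move count for every ordering of the three terms.
    for order in permutations(query.split()):
        phrase = " ".join(order)
        print(phrase, reorder_moves(query, phrase))
```

Under this model no ordering of the three terms costs more than 4 moves, which agrees with the step-by-step listing.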