Problem description
I am trying to use Lucene Java 2.3.2 to implement search over a catalog of products. Apart from the regular product fields, there is a field called 'Category'. A product can belong to multiple categories. Currently, I use FilteredQuery to run the same search term against every category to get the number of results per category.
This results in 20-30 internal search calls per query to display the results, which slows the search down considerably. Is there a faster way to achieve the same result using Lucene?
Recommended answer
Here's what I did, though it's a bit heavy on memory:
What you need is to create, in advance, a bunch of BitSets, one for each category, each containing the doc IDs of all the documents in that category. Then, at search time, you use a HitCollector and check the doc IDs against the BitSets.
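To put the memory remark in rough numbers: java.util.BitSet keeps its bits in 64-bit words, so each category costs about one bit per document ID in the index. The sketch below is only a back-of-the-envelope estimate. The figures of 1,000,000 documents and 30 categories are hypothetical, and object overhead and any spare capacity left over from BitSet growth are ignored.

```java
// Rough memory estimate for keeping one BitSet per category.
// The document and category counts used in main are hypothetical examples.
final class BitSetMemoryEstimate {

    // A BitSet covering maxDoc document IDs needs about ceil(maxDoc / 64)
    // words of 8 bytes each; multiply by the number of categories.
    static long estimateBytes(int maxDoc, int categoryCount) {
        long wordsPerBitSet = (maxDoc + 63L) / 64L;
        return wordsPerBitSet * 8L * categoryCount;
    }

    public static void main(String[] args) {
        // 1,000,000 documents and 30 categories: about 3.75 MB in total.
        System.out.println(estimateBytes(1_000_000, 30)); // prints 3750000
    }
}
```

Under these assumptions the cost grows linearly with both the index size and the number of categories, which is why the approach suits a few dozen categories better than many thousands.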
Here's the code to create the bit sets:
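import java.io.IOException;
import java.util.BitSet;
import org.apache.lucene.search.HitCollector;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

// Category is assumed to be an application class (not part of Lucene)
// whose getQuery() returns the query that selects its documents.
public BitSet[] getBitSets(IndexSearcher indexSearcher,
                           Category[] categories) throws IOException {
    BitSet[] bitSets = new BitSet[categories.length];
    for (int i = 0; i < categories.length; i++) {
        Query query = categories[i].getQuery();
        // One bit per document ID, set for every document in this category.
        final BitSet bitSet = new BitSet();
        indexSearcher.search(query, new HitCollector() {
            public void collect(int doc, float score) {
                bitSet.set(doc);
            }
        });
        bitSets[i] = bitSet;
    }
    return bitSets;
}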
This is just one way to do this. You could probably use TermDocs instead of running a full search if your categories are simple enough, but this should only run once, when you load the index, anyway.
Now, when it's time to count the categories of the search results, you do this:
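import java.io.IOException;
import java.util.BitSet;
import org.apache.lucene.search.HitCollector;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public int[] getCategoryCount(IndexSearcher indexSearcher,
                              Query query,
                              final BitSet[] bitSets) throws IOException {
    final int[] count = new int[bitSets.length];
    indexSearcher.search(query, new HitCollector() {
        public void collect(int doc, float score) {
            // A hit may belong to several categories, so test every BitSet.
            for (int i = 0; i < bitSets.length; i++) {
                if (bitSets[i].get(doc)) count[i]++;
            }
        }
    });
    return count;
}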
What you end up with is an array containing the count for every category within the search results. If you also need the search results themselves, you should add a TopDocCollector to your hit collector (yo dawg...). Or you could just run the search again: 2 searches are better than 30.
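The idea of counting categories and keeping the best hits in the same pass can be sketched without Lucene on the classpath. The following is a plain-Java stand-in, not the Lucene API: CountingTopCollector and ScoredDoc are hypothetical names, and the small min-heap only plays the role that a TopDocCollector would play in real code.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Plain-Java stand-in for a hit collector that counts categories and also
// keeps the top N hits. These are hypothetical classes, not Lucene classes.
final class CountingTopCollector {

    static final class ScoredDoc {
        final int doc;
        final float score;
        ScoredDoc(int doc, float score) { this.doc = doc; this.score = score; }
    }

    private final BitSet[] categoryBitSets;
    private final int[] counts;
    private final int topN;
    // Min-heap on score, so the weakest of the kept hits is evicted first.
    private final PriorityQueue<ScoredDoc> best =
            new PriorityQueue<>(Comparator.comparingDouble((ScoredDoc d) -> d.score));

    CountingTopCollector(BitSet[] categoryBitSets, int topN) {
        this.categoryBitSets = categoryBitSets;
        this.counts = new int[categoryBitSets.length];
        this.topN = topN;
    }

    // Called once per hit, mirroring the shape of collect(int doc, float score).
    void collect(int doc, float score) {
        for (int i = 0; i < categoryBitSets.length; i++) {
            if (categoryBitSets[i].get(doc)) counts[i]++;
        }
        if (topN <= 0) return;
        if (best.size() < topN) {
            best.add(new ScoredDoc(doc, score));
        } else if (score > best.peek().score) {
            best.poll();
            best.add(new ScoredDoc(doc, score));
        }
    }

    int[] counts() { return counts.clone(); }

    // Kept hits, highest score first.
    List<ScoredDoc> topDocs() {
        List<ScoredDoc> result = new ArrayList<>(best);
        result.sort((a, b) -> Float.compare(b.score, a.score));
        return result;
    }
}
```

In actual Lucene code the heap would presumably be replaced by forwarding each collect(doc, score) call from your HitCollector to a TopDocCollector and reading its results afterwards, so that a single search yields both the counts and the hits.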