Problem description
I'm having issues sorting on string fields in Lucene 5.0. Apparently the way you sort has changed since Lucene 4. Below is a snippet showing some of the fields that are being indexed for my documents.
@Override
public Document generateDocument(Process entity)
{
    Document doc = new Document();
    doc.add(new IntField(id, entity.getID(), Field.Store.YES));
    doc.add(new TextField(title, entity.getProcessName(), Field.Store.YES));
    doc.add(new IntField(organizationID, entity.getOrganizationID(), Field.Store.YES));
    doc.add(new StringField(versionDate, DateTools.dateToString(entity.getVersionDate(), DateTools.Resolution.SECOND), Field.Store.YES));
    doc.add(new LongField(entityDate, entity.getVersionDate().getTime(), Field.Store.YES));
    return doc;
}
I would like to sort on relevance first, which works just fine. The issue I have is that sorting on the title field doesn't work. I've created a SortField, which I'm trying to use with a TopFieldCollector after a chain of method calls.
public BaseSearchCore<Process, ProcessSearchResultScore>.SearchContainer search(String searchQuery, Filter filter, int page, int hitsPerPage) throws IOException, ParseException
{
    SortField titleSort = new SortField(title, SortField.Type.STRING, true);
    // Pass the SortField itself, not the field name, to the varargs overload.
    return super.search(searchQuery, filter, page, hitsPerPage, titleSort);
}
Which goes to:
public SearchContainer search(String searchQuery, Filter filter, int page, int hitsPerPage, SortField... sortfields) throws IOException, ParseException
{
    Query query = getQuery(searchQuery);
    TopFieldCollector paginate = getCollector(sortfields);
    int startIndex = (page - 1) * hitsPerPage;
    ScoreDoc[] hits = executeSearch(query, paginate, filter, startIndex, hitsPerPage);
    return collectResults(query, filter, hitsPerPage, hits, page);
}
And finally to the method that applies the sort field:
private TopFieldCollector getCollector(SortField sortField) throws IOException
{
    // Sort by relevance first, then by the supplied field.
    SortField[] sortFields = new SortField[] {SortField.FIELD_SCORE, sortField};
    Sort sorter = new Sort(sortFields);
    TopFieldCollector collector = TopFieldCollector.create(sorter, 25000, true, false, true);
    return collector;
}
Using the returned collector, a regular query is performed and a result is returned. However, if I try to sort with this SortField, I get this exception:
java.lang.IllegalStateException: unexpected docvalues type NONE for field 'title' (expected=SORTED). Use UninvertingReader or index with docvalues.
How am I supposed to index a string field so that I can sort it alphabetically (using SortFields) in Lucene 5? Any code examples or snippets would be much appreciated.
Searching by relevancy works just fine, but when users enter an empty search query, all the results have the same relevancy. For those queries I'd rather sort by the result titles, which is what causes issues in this iteration of Lucene.
Recommended answer
A note: It's much easier to figure out bugs (both for yourself and for the people you're asking) if you first boil the problem down to the smallest example you can. Rather than sort through your architecture and classes that I don't have access to or know anything about, I'll address the problem as reproduced by this:
Sort sort = new Sort(new SortField("title", SortField.Type.STRING));
TopDocs docs = searcher.search(new TermQuery(new Term("title", "something")), 10, sort);
Where title is defined something like:
doc.add(new TextField("title", term, Field.Store.YES));
The best approach to sorting fields here is probably to take the exception's advice about docvalues. Adding DocValues to the field essentially indexes it for sorting, and is much more efficient than the typical sorting method in Lucene 4.X, as I understand it. Adding both the typical TextField and a SortedDocValuesField under the same field name seems to work rather well, and supports both searching and sorting with the same field name:
doc.add(new TextField("title", term, Field.Store.YES));
doc.add(new SortedDocValuesField("title", new BytesRef(term)));
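For reference, here is a minimal end-to-end sketch of the same idea, assuming Lucene 5.0 (lucene-core and lucene-analyzers-common) is on the classpath. The class name TitleSortSketch and the method sortedTitles are hypothetical names made up for this illustration. Lowercasing the sort key is my own addition and not part of the snippet above: SortField.Type.STRING compares the raw bytes of the stored value, so without some normalization, uppercase titles would sort ahead of lowercase ones. It uses a MatchAllDocsQuery to mimic the empty-query case from the question, where every hit has the same score and the title decides the order.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.SortedDocValuesField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.BytesRef;

// Hypothetical demo class; not part of the original question's code base.
public class TitleSortSketch {

    /** Indexes the titles in memory and returns them ordered by score, then title. */
    static List<String> sortedTitles(List<String> titles) {
        try (Directory dir = new RAMDirectory()) {
            try (IndexWriter writer =
                     new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
                for (String title : titles) {
                    Document doc = new Document();
                    // Analyzed and stored: used for searching and for display.
                    doc.add(new TextField("title", title, Field.Store.YES));
                    // Docvalues under the same name: used only for sorting.
                    // Lowercasing is an assumption of this sketch (case-insensitive order).
                    doc.add(new SortedDocValuesField(
                        "title", new BytesRef(title.toLowerCase(Locale.ROOT))));
                    writer.addDocument(doc);
                }
            } // closing the writer commits the documents

            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                // Relevance first, then title, as in the question.
                Sort sort = new Sort(
                    SortField.FIELD_SCORE,
                    new SortField("title", SortField.Type.STRING));
                TopDocs hits = searcher.search(new MatchAllDocsQuery(), 10, sort);

                List<String> result = new ArrayList<>();
                for (ScoreDoc hit : hits.scoreDocs) {
                    result.add(searcher.doc(hit.doc).get("title"));
                }
                return result;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sortedTitles(Arrays.asList("charlie", "Alpha", "bravo")));
    }
}
```

Because the stored TextField value is what gets returned, the titles keep their original casing even though the sort key is lowercased. If you want a descending sort like the one in the question, pass true as the third argument to the SortField constructor.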