Problem description
I've been using the (Java) Highlighter for Lucene (in the Sandbox package) for some time. However, it isn't very accurate at matching the correct terms in search results. It works well for simple queries; for example, searching for two separate words highlights both words in the result fragments.
However, it doesn't cope well with more complicated queries. In the simplest case, a phrase query such as "Stack Overflow" will highlight every occurrence of Stack or Overflow on its own, which gives the user the impression that the search isn't working very well.
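To make the phrase problem concrete, here is a minimal sketch in plain Java. It does not use Lucene and is not how the Sandbox Highlighter is implemented; the class `HighlightSketch`, its two methods, and the whitespace tokenization are assumptions made purely for illustration. It contrasts marking every token that matches any query term with marking tokens only where the phrase terms occur consecutively.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class HighlightSketch {

    // Term-level highlighting: wraps every token that equals any query term,
    // regardless of where it appears. This mirrors the behaviour complained about.
    static String highlightTerms(String text, List<String> terms) {
        Set<String> wanted = new HashSet<>();
        for (String t : terms) {
            wanted.add(t.toLowerCase(Locale.ROOT));
        }
        String[] tokens = text.split(" ");
        boolean[] hit = new boolean[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            hit[i] = wanted.contains(tokens[i].toLowerCase(Locale.ROOT));
        }
        return render(tokens, hit);
    }

    // Phrase-aware highlighting: wraps tokens only where all phrase terms
    // occur consecutively and in order.
    static String highlightPhrase(String text, List<String> phrase) {
        String[] tokens = text.split(" ");
        boolean[] hit = new boolean[tokens.length];
        if (!phrase.isEmpty()) {
            for (int i = 0; i + phrase.size() <= tokens.length; i++) {
                boolean match = true;
                for (int j = 0; j < phrase.size(); j++) {
                    if (!tokens[i + j].equalsIgnoreCase(phrase.get(j))) {
                        match = false;
                        break;
                    }
                }
                if (match) {
                    for (int j = 0; j < phrase.size(); j++) {
                        hit[i + j] = true;
                    }
                }
            }
        }
        return render(tokens, hit);
    }

    // Joins tokens back together, wrapping the flagged ones in <b> tags.
    private static String render(String[] tokens, boolean[] hit) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append(hit[i] ? "<b>" + tokens[i] + "</b>" : tokens[i]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "Stack Overflow covers stack and overflow";
        List<String> query = Arrays.asList("stack", "overflow");
        System.out.println(highlightTerms(text, query));
        System.out.println(highlightPhrase(text, query));
    }
}
```

The sketch splits on single spaces and ignores punctuation, stemming, and slop, all of which a real analyzer would handle, so it only illustrates the difference in matching strategy.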
I tried applying the fix here, but it came with a lot of performance caveats and in the end was simply unusable. Performance is especially a problem with wildcard queries. This is due to the way the highlighting works: instead of working only on the query string and the text, it parses the query as Lucene would and then looks for all the matches Lucene has made. Unfortunately, this means that for certain wildcard queries it can be looking for matches to 2000+ clauses in large documents, and it's simply not fast enough.
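The clause blow-up described above can be illustrated with a small sketch, again in plain Java rather than Lucene's own code. The class `WildcardExpansionSketch`, the in-memory term dictionary, and both methods are hypothetical. The idea shown is that a trailing wildcard expands into one clause per matching dictionary term, and a naive highlighter then compares each document token against each clause.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

public class WildcardExpansionSketch {

    // Expands a trailing-wildcard pattern such as "over*" into every term of
    // a (hypothetical) sorted term dictionary that starts with the prefix.
    static List<String> expandPrefix(SortedSet<String> dictionary, String prefix) {
        // Every string starting with the prefix sorts in [prefix, prefix + '\uffff').
        return new ArrayList<>(dictionary.subSet(prefix, prefix + Character.MAX_VALUE));
    }

    // Naive matching: each token is compared against the clauses until one
    // matches. Returns the number of comparisons performed, which grows
    // roughly as tokens x clauses when most tokens do not match.
    static long countComparisons(String[] tokens, List<String> clauses) {
        long comparisons = 0;
        for (String token : tokens) {
            for (String clause : clauses) {
                comparisons++;
                if (token.equals(clause)) {
                    break;
                }
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        SortedSet<String> dictionary = new TreeSet<>(
                Arrays.asList("over", "overflow", "overload", "stack", "stacks"));
        List<String> clauses = expandPrefix(dictionary, "over");
        String[] tokens = "a stack overflow is not an overload".split(" ");
        System.out.println(clauses.size() + " clauses, "
                + countComparisons(tokens, clauses) + " comparisons");
    }
}
```

With a real index the dictionary can hold many thousands of terms, so one wildcard can expand into far more clauses than this toy example, which is the cost the question describes for large documents.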
Is there a faster implementation of an accurate highlighter?
Recommended answer
There is a new, faster highlighter (it needs to be patched in, but it will be part of release 2.9):
https://issues.apache.org/jira/browse/LUCENE-1522
There is also a back-reference to this question.