Problem description
Here is the situation. I have a term stored in the index that contains a special character, such as '-'. The simplest code looks like this:
Document doc = new Document();
doc.add(new Field("message", "1111-2222-3333", Field.Store.YES, Field.Index.NOT_ANALYZED));
writer.addDocument(doc);
Then I create a query using QueryParser, like this:
String queryStr = "1111-2222-3333";
QueryParser parser = new QueryParser(Version.LUCENE_36, "message", new StandardAnalyzer(Version.LUCENE_36));
Query q = parser.parse(queryStr);
Then I run the query through a searcher and get no results. I have also tried this:
Query q = parser.parse(QueryParser.escape(queryStr));
Still no results.
Skipping QueryParser and using a TermQuery directly does what I want, but that approach is not flexible enough for user-entered text.
I think the StandardAnalyzer may be dropping the special characters from the query string. I stepped through with a debugger and found that the string is split and the actual query looks like this: "message:1111 message:2222 message:3333". I don't know exactly what Lucene has done...
So if I want to run a query that contains special characters, what should I do? Should I write my own analyzer, or subclass the default QueryParser? And how?
Update:
1 @The New Idiot @femtoRgon, I have tried QueryParser.escape(queryStr) as stated in the question, but it still doesn't work.
2 I have tried another way to solve the problem. I derived a QueryTokenizer from Tokenizer that splits words only on spaces, wrapped it in a QueryAnalyzer (which derives from Analyzer), and finally passed the QueryAnalyzer to QueryParser.
Now it works. Originally it did not work because the default StandardAnalyzer splits the query string according to its default rules (which treat some special characters as separators), so by the time QueryParser builds the query, the special characters have already been removed by StandardAnalyzer. Now I tokenize the query string myself and treat only spaces as separators, so the special characters stay in the query for further processing, and this works.
3 @The New Idiot @femtoRgon, thank you for answering my question.
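The mismatch described in update 2 can be illustrated without Lucene at all. The sketch below is plain Java and only approximates the behaviour: the class and method names (TokenSplitSketch, splitOnNonAlphanumeric, splitOnWhitespace) are hypothetical, and the real StandardAnalyzer applies more elaborate word-boundary rules than a single regular expression. It shows that splitting on non-alphanumeric characters yields three tokens, none of which equals the single un-analyzed term in the index, while splitting on whitespace keeps the term intact.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is NOT Lucene code.
final class TokenSplitSketch {

    // Rough stand-in for a word-boundary analyzer: every run of
    // letters or digits becomes a token, everything else is a separator.
    static List<String> splitOnNonAlphanumeric(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Stand-in for a whitespace-only tokenizer: '-' stays inside the token.
    static List<String> splitOnWhitespace(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String indexedTerm = "1111-2222-3333"; // stored un-analyzed, as in the question

        // prints [1111, 2222, 3333]
        System.out.println(splitOnNonAlphanumeric(indexedTerm));
        // prints [1111-2222-3333]
        System.out.println(splitOnWhitespace(indexedTerm));

        // Only the whitespace split produces a token equal to the indexed term.
        System.out.println(splitOnWhitespace(indexedTerm).contains(indexedTerm));      // true
        System.out.println(splitOnNonAlphanumeric(indexedTerm).contains(indexedTerm)); // false
    }
}
```

The general point is that the analyzer used at query time has to produce the same tokens that were written to the index; for a field indexed without analysis, that means a query-side analyzer that leaves the term whole.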
Recommended answer
I am not sure about this, but I guess you need to escape - with \. As per the Lucene docs:
The "-" or prohibit operator excludes documents that contain the term after the "-" symbol.
Again,
Lucene supports escaping special characters that are part of the query syntax. The current list of special characters is
+ - && || ! ( ) { } [ ] ^ " ~ * ? : \ /
To escape these characters, use the \ before the character.
Also remember that some characters need to be escaped twice if they have special meaning in Java as well. For example, the backslash itself must be written as \\ inside a Java string literal.
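The escaping rule above, including the double escaping in Java source, can be sketched in plain Java. This is a minimal illustration under stated assumptions: the class QueryEscapeSketch and its escape method are hypothetical helpers written for this example, not Lucene's own implementation, and the character set is taken from the list quoted above. In real code the question's QueryParser.escape(queryStr) call is the thing to use.

```java
// Hypothetical illustration only; this is NOT Lucene's QueryParser.escape.
final class QueryEscapeSketch {

    // Single characters that make up the special tokens listed above
    // (&& and || are covered by escaping each '&' and '|').
    // Note the Java-level escaping: "\\" is one backslash, "\"" is one quote.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    // Prefixes every special character with a backslash.
    static String escape(String input) {
        StringBuilder sb = new StringBuilder(input.length() * 2);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints 1111\-2222\-3333
        System.out.println(escape("1111-2222-3333"));

        // Writing the escaped query by hand needs two backslashes per '-'
        // in Java source, because the compiler consumes one of them.
        String handWritten = "1111\\-2222\\-3333";
        System.out.println(handWritten.equals(escape("1111-2222-3333"))); // true
    }
}
```

As the update in the question notes, escaping only stops the parser from treating the character as query syntax; the analyzer passed to QueryParser may still split the escaped term afterwards.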