精品一区久久,久久99国产精品久久99大师,成人伊人

本文介紹了lucene 中的高光性能非常慢的處理方法，對大家解決問題具有一定的參考價值，需要的朋友們下面隨著小編來一起學(xué)習(xí)吧！

問題描述

Lucene (4.6) 熒光筆在搜索常用詞時性能非常慢.搜索速度很快(100 毫秒)，但突出顯示可能需要一個多小時(！).

Lucene (4.6) highlighter has very slow performance, when a frequent term is searched. Search is fast (100ms), but highlight may take more than an hour(!).

詳細(xì)信息: 使用了很棒的文本語料庫(1.5GB 純文本).性能不取決于文本是否被分成更多的小塊.(也用 500MB 和 5MB 塊進(jìn)行了測試.)存儲位置和偏移量.如果搜索一個非常頻繁的術(shù)語或模式，TopDocs 檢索速度很快(100 毫秒)，但每個searcher.doc(id)"調(diào)用都很昂貴(5-50 秒)，getBestFragments() 非常昂貴(超過 1 小時).甚至它們也為此目的被存儲和索引.(硬件:core i7、8GM mem)

Details: great text corpus was used (1.5GB plain text). Performance doesn't depend if text is splitted into more small pieces or not. (Tested with 500MB and 5MB pieces as well.) Positions and offsets are stored. If a very frequent term or pattern is searched, TopDocs are retrieved fast (100ms), but each "searcher.doc(id)" calls are expensive (5-50s), and getBestFragments() are extremely expensive (more than 1 hour). Even they are stored and indexed for this purpose. (hardware: core i7, 8GM mem)

更大的背景:它將服務(wù)于語言分析研究.使用了一種特殊的詞干提取:它也存儲詞性信息.例如，如果 "adj adj adj adj noun" 被搜索，它會給出它在文本中出現(xiàn)的所有內(nèi)容.

Greater background: it would serve a language analysis research. A special stemming is used: it stores the part of speech info, too. For example if "adj adj adj adj noun" is searched, it gives all its occurrences in the text with context.

我可以調(diào)整它的性能，還是應(yīng)該選擇其他工具?

使用代碼:

            //indexing
            FieldType offsetsType = new FieldType(TextField.TYPE_STORED);
            offsetsType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);

            offsetsType.setStored(true);
            offsetsType.setIndexed(true);
            offsetsType.setStoreTermVectors(true);
            offsetsType.setStoreTermVectorOffsets(true);
            offsetsType.setStoreTermVectorPositions(true);
            offsetsType.setStoreTermVectorPayloads(true);


            doc.add(new Field("content", fileContent, offsetsType));


            //quering
            TopDocs results = searcher.search(query, limitStart+limit);

            int endPos = Math.min(results.scoreDocs.length, limitStart+limit);
            int startPos = Math.min(results.scoreDocs.length, limitStart);

            for (int i = startPos; i < endPos; i++) {
                int id = results.scoreDocs[i].doc;

                // bottleneck #1 (5-50s):
                Document doc = searcher.doc(id);

                FastVectorHighlighter h = new FastVectorHighlighter();

                // bottleneck #2 (more than 1 hour):   
                String[] hs = h.getBestFragments(h.getFieldQuery(query), m, id, "content", contextSize, 10000);

相關(guān)(未回答)問題:https://stackoverflow.com/questions/19416804/very-slow-solr-performance-when-highlighting

久久久久久久av_日韩在线中文_看一级毛片视频_日本精品二区_成人深夜福利视频_武道仙尊动漫在线观看

lucene 中的高光性能非常慢

問題描述

推薦答案

相關(guān)文檔推薦