How to extract Document Term Vector in Lucene 3.5.0
This article explains how to extract document term vectors in Lucene 3.5.0; hopefully it is a useful reference for anyone facing the same problem.

Problem Description

I am using Lucene 3.5.0 and I want to output the term vectors of each document. For example, I want to know the frequency of a term across all documents and within each specific document. My indexing code is:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileFilter;
import java.io.FileReader;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class Indexer {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            throw new IllegalArgumentException("Usage: java " + Indexer.class.getName() + " <index dir> <data dir>");
        }

        String indexDir = args[0];
        String dataDir = args[1];
        long start = System.currentTimeMillis();
        Indexer indexer = new Indexer(indexDir);
        int numIndexed;
        try {
            numIndexed = indexer.index(dataDir, new TextFilesFilter());
        } finally {
            indexer.close();
        }
        long end = System.currentTimeMillis();
        System.out.println("Indexing " + numIndexed + " files took " + (end - start) + " milliseconds");
    }

    private IndexWriter writer;

    public Indexer(String indexDir) throws IOException {
        Directory dir = FSDirectory.open(new File(indexDir));
        writer = new IndexWriter(dir,
            new StandardAnalyzer(Version.LUCENE_35),
            true,
            IndexWriter.MaxFieldLength.UNLIMITED);
    }

    public void close() throws IOException {
        writer.close();
    }

    public int index(String dataDir, FileFilter filter) throws Exception {
        File[] files = new File(dataDir).listFiles();
        for (File f : files) {
            if (!f.isDirectory() &&
                !f.isHidden() &&
                f.exists() &&
                f.canRead() &&
                (filter == null || filter.accept(f))) {
                // Read the first line (a URL) from the file itself,
                // not from a file named after its bare name.
                BufferedReader inputStream = new BufferedReader(new FileReader(f));
                String url = inputStream.readLine();
                inputStream.close();
                indexFile(f, url);
            }
        }
        return writer.numDocs();
    }

    private static class TextFilesFilter implements FileFilter {
        public boolean accept(File path) {
            return path.getName().toLowerCase().endsWith(".txt");
        }
    }

    protected Document getDocument(File f, String url) throws Exception {
        Document doc = new Document();
        doc.add(new Field("contents", new FileReader(f)));
        doc.add(new Field("urls", url, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.add(new Field("filename", f.getName(), Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.add(new Field("fullpath", f.getCanonicalPath(), Field.Store.YES, Field.Index.NOT_ANALYZED));
        return doc;
    }

    private void indexFile(File f, String url) throws Exception {
        System.out.println("Indexing " + f.getCanonicalPath());
        Document doc = getDocument(f, url);
        writer.addDocument(doc);
    }
}

Can anybody help me write a program to do that? Thanks.

Recommended Answer

First of all, you don't need to store term vectors just to know the frequency of a term in documents. Lucene stores these numbers anyway, for use in TF-IDF calculations. You can access this information by calling IndexReader.termDocs(term) and iterating over the result.
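As a rough sketch of that approach (the index path "index", field "contents", and term "lucene" are illustrative assumptions, not from the question):

```java
import java.io.File;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.store.FSDirectory;

public class TermFreqDemo {
    public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open(FSDirectory.open(new File("index")));
        Term term = new Term("contents", "lucene");
        // Document frequency: how many documents contain the term at all.
        System.out.println("docFreq: " + reader.docFreq(term));
        // Per-document frequency: walk the postings for this term.
        TermDocs termDocs = reader.termDocs(term);
        long total = 0;
        while (termDocs.next()) {
            System.out.println("doc " + termDocs.doc() + " freq " + termDocs.freq());
            total += termDocs.freq();
        }
        termDocs.close();
        System.out.println("total occurrences: " + total);
        reader.close();
    }
}
```

Summing freq() over all matching documents gives the term's total frequency across the index, while each iteration gives its frequency in one specific document.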

If you have some other purpose in mind and you actually need to access the term vectors, then you need to tell Lucene to store them, by passing Field.TermVector.YES as the last argument of the Field constructor. Then you can retrieve the vectors, e.g. with IndexReader.getTermFreqVector().
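A minimal sketch of both sides of that, assuming the question's "contents" field and an index at the hypothetical path "index":

```java
import java.io.File;
import java.io.FileReader;

import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermFreqVector;
import org.apache.lucene.store.FSDirectory;

public class TermVectorDemo {
    // At indexing time: pass Field.TermVector.YES so the vector is stored.
    static Field contentsField(File f) throws Exception {
        return new Field("contents", new FileReader(f), Field.TermVector.YES);
    }

    public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open(FSDirectory.open(new File("index")));
        for (int docId = 0; docId < reader.maxDoc(); docId++) {
            if (reader.isDeleted(docId)) continue;
            TermFreqVector tfv = reader.getTermFreqVector(docId, "contents");
            if (tfv == null) continue; // no term vector stored for this field
            String[] terms = tfv.getTerms();
            int[] freqs = tfv.getTermFrequencies();
            for (int i = 0; i < terms.length; i++) {
                System.out.println("doc " + docId + ": " + terms[i] + " -> " + freqs[i]);
            }
        }
        reader.close();
    }
}
```

getTerms() and getTermFrequencies() are parallel arrays, so terms[i] occurs freqs[i] times in that document.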
