Problem description
My use case involves indexing a Lucene document and then, on multiple later occasions, adding terms that point to this existing document, without deleting and re-adding the entire document for each new term (both for performance reasons and because the original terms are not kept).
I know that a document cannot be truly updated. My question is: why?
Or, more precisely, why are updates of any form (terms, stored fields) not supported?
Why is it not possible to add another term that points to an existing document? Technically, isn't all that is needed to place the existing doc ID in the term's posting list? Why is that hard? Are there immutable statistics that get in the way?
Are there any workarounds that support my use case of adding a term (an indexed field) to an existing document?
Recommended answer
I know that a document cannot be truly updated. My question is: why?
Gili, editing a document changes the postings of the related terms, and this is problematic because of the structure of a term's posting list. The posting list is sorted and stored sequentially in memory. Thus, to add a document to a term's posting list, you have to give it a higher doc ID; this is done by deleting and re-indexing the entire document.
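To make the argument above concrete, here is a minimal sketch of a toy inverted index. It is not Lucene's implementation or API: the class ToyIndex and its methods add_document, add_term, and search are hypothetical names invented for illustration. It assumes only what the answer states, namely that each posting list is kept sorted by doc ID and is only ever appended to, so an "update" is modeled as marking the old document deleted and re-adding it under a new, higher doc ID.

```python
from collections import defaultdict


class ToyIndex:
    """Toy append-only inverted index (illustrative only, not Lucene)."""

    def __init__(self):
        self.postings = defaultdict(list)  # term -> doc ids in ascending order
        self.docs = {}                     # doc id -> set of terms
        self.deleted = set()               # doc ids marked as deleted
        self.next_doc_id = 0

    def add_document(self, terms):
        doc_id = self.next_doc_id
        self.next_doc_id += 1
        self.docs[doc_id] = set(terms)
        for term in self.docs[doc_id]:
            # The new id is larger than every id already stored, so a plain
            # append keeps the posting list sorted without rewriting it.
            self.postings[term].append(doc_id)
        return doc_id

    def add_term(self, doc_id, term):
        """'Update' a document: mark the old copy deleted, re-add all terms."""
        terms = self.docs[doc_id] | {term}
        self.deleted.add(doc_id)
        return self.add_document(terms)

    def search(self, term):
        # Deleted ids stay in the posting lists and are filtered at query time.
        return [d for d in self.postings.get(term, []) if d not in self.deleted]


index = ToyIndex()
first = index.add_document(["lucene", "index"])
second = index.add_document(["lucene", "search"])
updated = index.add_term(first, "search")  # re-added under a new, higher id

print(index.search("search"))
print(index.search("index"))
```

In this sketch, adding "search" to the first document in place would mean inserting its doc ID ahead of the second document's ID in the middle of an already written list. Re-adding the document under a fresh ID turns that into an append, at the cost of re-indexing every term of the document.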