問題描述
這是一個看似簡單的問題:給定一個生成升序整數序列的迭代器列表,編寫一個只生成每個序列中出現的整數的簡潔生成器.
Here is a seemingly simple problem: given a list of iterators that yield sequences of integers in ascending order, write a concise generator that yields only the integers that appear in every sequence.
昨晚閱讀了幾篇論文后,我決定用 Python 編寫一個完全最小的全文索引器,在這里看到(雖然那個版本現在很老了).
After reading a few papers last night, I decided to hack up a completely minimal full text indexer in Python, as seen here (though that version is quite old now).
我的問題在于 search()
函數,它必須遍歷每個發布列表并僅生成每個列表中出現的文檔 ID.從上面的鏈接可以看出,我當前的非遞歸工作"嘗試很糟糕.
My problem is with the search()
function, which must iterate over each posting list and yield only the document IDs that appear on every list. As you can see from the link above, my current non-recursive 'working' attempt is terrible.
示例:
postings = [[1, 100, 142, 322, 12312],
[2, 100, 101, 322, 1221],
[100, 142, 322, 956, 1222]]
應該讓步:
[100, 322]
至少有一個優雅的遞歸函數解決方案,但如果可能的話,我想避免這種情況.但是,涉及嵌套生成器表達式、itertools
濫用或任何其他類型的代碼高爾夫的解決方案非常受歡迎.:-)
There is at least one elegant recursive function solution to this, but I'd like to avoid that if possible. However, a solution involving nested generator expressions, itertools
abuse, or any other kind of code golf is more than welcome. :-)
應該可以將函數安排為只需要與最小列表中的項目一樣多的步驟,而無需將整組整數吸入內存.將來,這些列表可能會從磁盤中讀取,并且大于可用的 RAM.
It should be possible to arrange for the function to only require as many steps as there are items in the smallest list, and without sucking the entire set of integers into memory. In future, these lists may be read from disk, and larger than available RAM.
在過去的 30 分鐘里,我在舌尖有了一個想法,但我無法將其完全融入代碼中.請記住,這只是為了好玩!
For the past 30 minutes I've had an idea on the tip of my tongue, but I can't quite get it into code. Remember, this is just for fun!
推薦答案
import heapq, itertools
def intersect(*its):
for key, values in itertools.groupby(heapq.merge(*its)):
if len(list(values)) == len(its):
yield key
>>> list(intersect(*postings))
[100, 322]
這篇關于加入一組產生 Python 迭代器的有序整數的文章就介紹到這了,希望我們推薦的答案對大家有所幫助,也希望大家多多支持html5模板網!