Problem description
I'm trying to wrap my brain around this but it's not flexible enough.
In my Python script I have a dictionary of dictionaries of lists. (Actually it gets a little deeper but that level is not involved in this question.) I want to flatten all this into one long list, throwing away all the dictionary keys.
So I want to turn
{1: {'a': [1, 2, 3], 'b': [0]},
 2: {'c': [4, 5, 1], 'd': [3, 8]}}
into
[1, 2, 3, 0, 4, 5, 1, 3, 8]
I could probably set up a map-reduce to iterate over items of the outer dictionary to build a sublist from each subdictionary and then concatenate all the sublists together.
But that seems inefficient for large data sets, because of the intermediate data structures (sublists) that will get thrown away. Is there a way to do it in one pass?
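One way to do it in a single pass, without building a sublist per subdictionary, is to chain lazy iterators. The sketch below uses the standard library's itertools.chain.from_iterable over a generator expression. It assumes exactly two levels of dicts above the lists and Python 3 (dict.values()); the name flatten_two_levels is illustrative, not from the question.

```python
from itertools import chain


def flatten_two_levels(nested):
    """Return a lazy iterator over every item of every inner list.

    chain.from_iterable pulls one inner list at a time from the
    generator expression, so no intermediate sublists are built.
    """
    return chain.from_iterable(
        inner_list
        for inner_dict in nested.values()
        for inner_list in inner_dict.values()
    )


nested = {1: {'a': [1, 2, 3], 'b': [0]},
          2: {'c': [4, 5, 1], 'd': [3, 8]}}
print(list(flatten_two_levels(nested)))
```

The output order follows dict iteration order, which is insertion order on Python 3.7+.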
Barring that, I would be happy to accept a two-level implementation that works... my map-reduce is rusty!
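For comparison, the two-level map-reduce described above might be sketched with functools.reduce and operator.add. It works, but it builds a throwaway sublist for each subdictionary and a new list at every concatenation step, which is exactly the overhead the question is worried about. The function name map_reduce_flatten is hypothetical.

```python
import operator
from functools import reduce


def map_reduce_flatten(nested):
    """Flatten a dict of dicts of lists with an explicit map and reduce."""
    # Map: concatenate each inner dict's lists into one sublist.
    sublists = map(
        lambda inner_dict: reduce(operator.add, inner_dict.values(), []),
        nested.values(),
    )
    # Reduce: concatenate all the sublists into the final list.
    return reduce(operator.add, sublists, [])


nested = {1: {'a': [1, 2, 3], 'b': [0]},
          2: {'c': [4, 5, 1], 'd': [3, 8]}}
print(map_reduce_flatten(nested))
```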
Update: For those who are interested, below is the code I ended up using.
Note that although I asked above for a list as output, what I really needed was a sorted list; i.e. the output of the flattening could be any iterable that can be sorted.
import operator
def genSessions(d):
    """Given the ipDict, return an iterator that provides all the sessions,
    one by one, converted to tuples."""
    for uaDict in d.values():
        for sessions in uaDict.values():
            for session in sessions:
                yield tuple(session)
...
# Flatten dict of dicts of lists of sessions into a list of sessions.
# Sort that list by start time.
sessionsByStartTime = sorted(genSessions(ipDict), key=operator.itemgetter(0))
# Then make another copy sorted by end time.
sessionsByEndTime = sorted(sessionsByStartTime, key=operator.itemgetter(1))
Thanks again to all who helped.
[Update: replaced nthGetter() with operator.itemgetter(), thanks to @intuited.]
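operator.itemgetter(n) returns a callable that fetches item n from its argument, so key=operator.itemgetter(0) behaves like key=lambda s: s[0]. The sessions below are made up for illustration and assume each tuple starts with (start_time, end_time, ...), matching the comments in the code above.

```python
import operator

# Hypothetical sessions: (start_time, end_time, label).
sessions = [(5, 9, 'b'), (1, 12, 'a'), (3, 4, 'c')]

# sorted() returns a new list, so both orderings can be kept side by side.
by_start = sorted(sessions, key=operator.itemgetter(0))
by_end = sorted(by_start, key=operator.itemgetter(1))

print(by_start)  # [(1, 12, 'a'), (3, 4, 'c'), (5, 9, 'b')]
print(by_end)    # [(3, 4, 'c'), (5, 9, 'b'), (1, 12, 'a')]
```

itemgetter also accepts several indices: operator.itemgetter(1, 0) returns a tuple, which sorts by end time and breaks ties by start time.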
Recommended answer
Edit: I re-read the original question and reworked the answer to assume that all non-dictionaries are lists to be flattened.
In cases where you're not sure how far down the dictionaries go, you would want to use a recursive function. @Arrieta has already posted a function that recursively builds a list of non-dictionary values.
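@Arrieta's code isn't reproduced here, but as a rough idea of the list-building approach, a recursive function can pass a single accumulator list down the tree. This is a hypothetical sketch (the name flatten_to_list is ours), assuming, like the generator, that every non-dict value is a list to be flattened.

```python
def flatten_to_list(d, out=None):
    """Recursively collect non-dict values from nested dicts into one list.

    One shared accumulator is passed down the recursion, so no
    per-level sublists are created and then discarded.
    """
    if out is None:
        out = []
    for v in d.values():
        if isinstance(v, dict):
            flatten_to_list(v, out)
        else:
            # Assumption: every non-dict value is a list of leaf values.
            out.extend(v)
    return out


nested = {1: {'a': [1, 2, 3], 'b': [0]},
          2: {'c': [4, 5, 1], 'd': [3, 8]}}
print(flatten_to_list(nested))
```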
This one is a generator that yields successive non-dictionary values in the dictionary tree:
def flatten(d):
    """Recursively flatten dictionary values in `d`.
    >>> hat = {'cat': ['images/cat-in-the-hat.png'],
    ...        'fish': {'colours': {'red': [0xFF0000], 'blue': [0x0000FF]},
    ...                 'numbers': {'one': [1], 'two': [2]}},
    ...        'food': {'eggs': {'green': [0x00FF00]},
    ...                 'ham': ['lean', 'medium', 'fat']}}
    >>> set_of_values = set(flatten(hat))
    >>> sorted(set_of_values, key=lambda v: (isinstance(v, str), v))
    [1, 2, 255, 65280, 16711680, 'fat', 'images/cat-in-the-hat.png', 'lean', 'medium']
    """
    try:
        for v in d.values():
            for nested_v in flatten(v):
                yield nested_v
    except AttributeError:
        # d has no .values(), so treat it as a list of leaf values.
        for list_v in d:
            yield list_v
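On Python 3.3+, the same generator can be written with yield from and an explicit type check instead of catching AttributeError, which makes the dict-versus-list branch visible rather than relying on an exception. This is a sketch; testing against collections.abc.Mapping (so dict subclasses and other mappings also recurse) and the name flatten_values are our choices.

```python
from collections.abc import Mapping


def flatten_values(d):
    """Recursively yield the leaf values in the tree rooted at d.

    Assumes, like the try/except version, that every non-mapping
    node is a list (or other iterable) whose items should be yielded.
    """
    if isinstance(d, Mapping):
        for v in d.values():
            yield from flatten_values(v)
    else:
        yield from d


hat = {'cat': ['images/cat-in-the-hat.png'],
       'fish': {'colours': {'red': [0xFF0000], 'blue': [0x0000FF]},
                'numbers': {'one': [1], 'two': [2]}},
       'food': {'eggs': {'green': [0x00FF00]},
                'ham': ['lean', 'medium', 'fat']}}
print(list(flatten_values(hat)))
```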
The doctest passes the resulting iterator to the set function. This is likely to be what you want, since, as Mr. Martelli points out, there's no intrinsic order to the values of a dictionary, and therefore no reason to keep track of the order in which they were found.
You may want to keep track of the number of occurrences of each value; this information is lost if you pass the iterator to set. If you want to track it, pass the result of flatten(hat) to some other function instead of set. Under Python 2.7 and 3.1+, that function could be collections.Counter. For compatibility with older Pythons, you can write your own counting function or (with some loss of efficiency) combine sorted with itertools.groupby.
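A sketch of both counting options, using the question's example data and Python 3 syntax. The two-level flatten_two_levels helper is a stand-in (an assumption, so the block runs on its own); the recursive flatten above could be used in its place.

```python
from collections import Counter
from itertools import chain, groupby


def flatten_two_levels(d):
    """Stand-in for flatten(): lazily yield every item of every inner list."""
    return chain.from_iterable(
        lst for inner in d.values() for lst in inner.values()
    )


data = {1: {'a': [1, 2, 3], 'b': [0]},
        2: {'c': [4, 5, 1], 'd': [3, 8]}}

# Counter tallies each value as it consumes the iterator.
counts = Counter(flatten_two_levels(data))

# Alternative: sorting puts equal values next to each other,
# then groupby yields one run per distinct value to be counted.
counts_by_groupby = {key: sum(1 for _ in group)
                     for key, group in groupby(sorted(flatten_two_levels(data)))}

print(counts[1], counts[3])               # 2 2
print(counts_by_groupby == dict(counts))  # True
```

The values 1 and 3 each occur twice, which a set would silently collapse.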