Problem Description
I am writing a Spark application and want to combine a set of Key-Value pairs (K, V1), (K, V2), ..., (K, Vn) into one Key-Multivalue pair (K, [V1, V2, ..., Vn]). I feel like I should be able to do this using the reduceByKey function with something of the flavor:
My_KMV = My_KV.reduce(lambda a, b: a.append([b]))
The error that I get when this occurs is:
NoneType"對象沒有附加"屬性.
'NoneType' object has no attribute 'append'.
My keys are integers and values V1,...,Vn are tuples. My goal is to create a single pair with the key and a list of the values (tuples).
Recommended Answer
Map and ReduceByKey
The input type and output type of reduce must be the same; therefore, if you want to aggregate a list, you have to map the input to lists. Afterwards you combine the lists into one list.
Combine lists
You'll need a method to combine lists into one list. Python provides some methods to combine lists.
append modifies the first list and will always return None.
x = [1, 2, 3]
x.append([4, 5])
# x is [1, 2, 3, [4, 5]]
extend also modifies the first list in place and returns None, but it unpacks the elements of its argument instead of nesting it:
x = [1, 2, 3]
x.extend([4, 5])
# x is [1, 2, 3, 4, 5]
Both methods return None, but you'll need a method that returns the combined list; therefore, just use the plus sign.
x = [1, 2, 3] + [4, 5]
# x is [1, 2, 3, 4, 5]
Spark
file = spark.textFile("hdfs://...")
counts = (file.flatMap(lambda line: line.split(" "))
          .map(lambda actor: (actor.split(",")[0], actor))
          # transform each value into a list
          .map(lambda nameTuple: (nameTuple[0], [nameTuple[1]]))
          # combine lists: ([1,2,3] + [4,5]) becomes [1,2,3,4,5]
          .reduceByKey(lambda a, b: a + b))
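The same pattern as a self-contained local sketch (made-up data and a local SparkContext instead of the HDFS file above), matching the question's integer keys and tuple values:

from pyspark import SparkContext

sc = SparkContext("local", "kmv-example")

# hypothetical (key, value) pairs: integer keys, tuple values
pairs = sc.parallelize([(1, ("a", 1)), (1, ("b", 2)), (2, ("c", 3))])

# wrap each value in a list, then concatenate the lists per key
kmv = pairs.mapValues(lambda v: [v]).reduceByKey(lambda a, b: a + b)

print(kmv.collect())
# [(1, [('a', 1), ('b', 2)]), (2, [('c', 3)])]  (key order may vary)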
CombineByKey
It's also possible to solve this with combineByKey, which is used internally to implement reduceByKey, but it's more complex and "using one of the specialized per-key combiners in Spark can be much faster". Your use case is simple enough for the solution above.
GroupByKey
It's also possible to solve this with groupByKey, but it reduces parallelization and therefore could be much slower for big data sets.
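A sketch of that variant, again on the hypothetical pairs RDD: groupByKey yields an iterable per key, so mapValues(list) converts it into the list the question asks for:

kmv = pairs.groupByKey().mapValues(list)
# [(1, [('a', 1), ('b', 2)]), (2, [('c', 3)])]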