
Hadoop Streaming: Mapper 'wrapping' a binary executable

Question

I have a pipeline that I currently run on a large university computer cluster. For publication purposes I'd like to convert it into MapReduce format so that anyone can run it on a Hadoop cluster such as Amazon Web Services (AWS). The pipeline currently consists of a series of Python scripts that wrap different binary executables and manage the input and output using the Python subprocess and tempfile modules. Unfortunately, I didn't write the binary executables, and many of them either don't take STDIN or don't emit STDOUT in a 'usable' fashion (e.g., they only write it to files). These problems are why I've wrapped most of them in Python.

So far I've been able to modify my Python code so that I have a mapper and a reducer that I can run on my local machine in the standard 'test format':

$ cat data.txt | mapper.py | reducer.py

The mapper formats each line of data the way the binary it wraps wants it, sends the text to the binary using subprocess.Popen (this also allows me to mask a lot of spurious STDOUT), then collects the STDOUT I want and formats it into lines of text appropriate for the reducer. The problems arise when I try to replicate the command on a local Hadoop install. I can get the mapper to execute, but it gives an error that suggests it can't find the binary executable.

  File "/Users/me/Desktop/hadoop-0.21.0/./phyml.py", line 69, in <module>
    main()
  File "/Users/me/Desktop/hadoop-0.21.0/./mapper.py", line 66, in main
    phyml(None)
  File "/Users/me/Desktop/hadoop-0.21.0/./mapper.py", line 46, in phyml
    ft = Popen(cli_parts, stdin=PIPE, stderr=PIPE, stdout=PIPE)
  File "/Library/Frameworks/Python.framework/Versions/6.1/lib/python2.6/subprocess.py", line 621, in __init__
    errread, errwrite)
  File "/Library/Frameworks/Python.framework/Versions/6.1/lib/python2.6/subprocess.py", line 1126, in _execute_child
    raise child_exception
OSError: [Errno 13] Permission denied

My hadoop command looks like the following:

./bin/hadoop jar /Users/me/Desktop/hadoop-0.21.0/mapred/contrib/streaming/hadoop-0.21.0-streaming.jar \
-input /Users/me/Desktop/Code/AWS/temp/data.txt \
-output /Users/me/Desktop/aws_test \
-mapper mapper.py \
-reducer reducer.py \
-file /Users/me/Desktop/Code/AWS/temp/mapper.py \
-file /Users/me/Desktop/Code/AWS/temp/reducer.py \
-file /Users/me/Desktop/Code/AWS/temp/binary

As I noted above, it looks to me like the mapper isn't aware of the binary; perhaps it's not being sent to the compute node? Unfortunately I can't really tell what the problem is. Any help would be greatly appreciated. It would be particularly nice to see some Hadoop streaming mappers/reducers written in Python that wrap binary executables. I can't imagine I'm the first one to try to do this! In fact, here is another post asking essentially the same question, but it hasn't been answered yet:

Hadoop/Elastic Map Reduce with binary executable?

Recommended answer

After much googling (etc.) I figured out how to include executable binaries/scripts/modules that are accessible to your mappers/reducers. The trick is to upload all your files to Hadoop first.

$ bin/hadoop dfs -copyFromLocal /local/file/system/module.py module.py

Then you need to format your streaming command like the following template:

$ ./bin/hadoop jar /local/file/system/hadoop-0.21.0/mapred/contrib/streaming/hadoop-0.21.0-streaming.jar \
-file /local/file/system/data/data.txt \
-file /local/file/system/mapper.py \
-file /local/file/system/reducer.py \
-cacheFile hdfs://localhost:9000/user/you/module.py#module.py \
-input data.txt \
-output output/ \
-mapper mapper.py \
-reducer reducer.py \
-verbose
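
If the job is launched from a script rather than typed by hand, the template can be assembled programmatically. The following is only a sketch: the function name build_streaming_command and its parameters are hypothetical, and it uses nothing beyond the flags that already appear in the template (-file, -cacheFile, -input, -output, -mapper, -reducer, -verbose).

```python
def build_streaming_command(hadoop_bin, streaming_jar, local_files, cache_files,
                            input_path, output_path, mapper, reducer,
                            verbose=True):
    """Return the argument list for a Hadoop Streaming job.

    local_files: local paths shipped with -file.
    cache_files: (hdfs_uri, link_name) pairs passed as -cacheFile uri#link_name.
    """
    cmd = [hadoop_bin, "jar", streaming_jar]
    for path in local_files:
        cmd += ["-file", path]
    for uri, link_name in cache_files:
        cmd += ["-cacheFile", "%s#%s" % (uri, link_name)]
    cmd += ["-input", input_path, "-output", output_path,
            "-mapper", mapper, "-reducer", reducer]
    if verbose:
        cmd.append("-verbose")
    return cmd
```

The returned list can be handed to subprocess.call, which avoids shell quoting problems with the # in the -cacheFile argument.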

If you're linking a Python module, you'll need to add the following code to your mapper/reducer scripts:

import sys 
sys.path.append('.')
import module
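
The path tweak presumably works because the cached file is linked into the task's working directory under the name given after the #, while Python searches the script's own directory by default. As a sketch of the same idea, the hypothetical helper below (import_from_working_dir is not part of Hadoop or of the original answer) adds the absolute working directory instead of '.' and then imports the module by name.

```python
import importlib
import os
import sys


def import_from_working_dir(module_name):
    """Import a module whose file sits in the current working directory."""
    cwd = os.getcwd()
    if cwd not in sys.path:
        # Same effect as sys.path.append('.'), but pinned to an absolute path
        sys.path.append(cwd)
    # Make sure files created after interpreter start-up are noticed
    importlib.invalidate_caches()
    return importlib.import_module(module_name)
```

Using the absolute path keeps the import working even if the script later changes directory.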

If you're accessing a binary via subprocess, your command should look something like this:

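import shlex
from subprocess import PIPE, Popen

# argument is whatever command-line argument the binary expects
cli = "./binary %s" % (argument)
cli_parts = shlex.split(cli)
mp = Popen(cli_parts, stdin=PIPE, stderr=PIPE, stdout=PIPE)
output = mp.communicate()[0]  # STDOUT of the binary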
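
For the kind of mapper the question describes, the Popen snippet can be wrapped into a line-by-line loop. This is a minimal sketch, assuming Python 3 and a binary that reads one record on STDIN and writes its result to STDOUT; the helper names run_binary, map_lines and ensure_executable are hypothetical. Errno 13 in the question's traceback is a permission error, so one possible cause (not confirmed by the answer) is a shipped binary that lacks its execute bit; ensure_executable guards against that case.

```python
import os
import shlex
import stat
from subprocess import PIPE, Popen


def ensure_executable(path):
    """Add the owner-execute bit if missing (assumed cause of Errno 13)."""
    mode = os.stat(path).st_mode
    if not mode & stat.S_IXUSR:
        os.chmod(path, mode | stat.S_IXUSR)


def run_binary(cli, text):
    """Run cli, send text to its STDIN, and return its decoded STDOUT."""
    cli_parts = shlex.split(cli)
    proc = Popen(cli_parts, stdin=PIPE, stdout=PIPE, stderr=PIPE)
    out, err = proc.communicate(text.encode("utf-8"))
    if proc.returncode != 0:
        # Surface the binary's STDERR instead of silently dropping it
        raise RuntimeError("%s failed: %s"
                           % (cli_parts[0], err.decode("utf-8", "replace")))
    return out.decode("utf-8")


def map_lines(lines, cli):
    """Yield one tab-separated 'input<TAB>result' line per non-empty input line."""
    for line in lines:
        record = line.rstrip("\n")
        if not record:
            continue
        result = run_binary(cli, record + "\n").strip()
        yield "%s\t%s" % (record, result)
```

In a real mapper script, the last step would be to call ensure_executable("./binary") once and then print every line produced by map_lines(sys.stdin, "./binary"). Starting one process per input line is simple but slow; batching records per call is worth considering if the binary supports it.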

Hope this helps.
