Problem Description
From this guide, I have successfully run the sample exercise. But when running my MapReduce job, I am getting the following error:
ERROR streaming.StreamJob: Job not Successful!
10/12/16 17:13:38 INFO streaming.StreamJob: killJob...
Streaming Job Failed!
Error from the log file
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
mapper.py
import sys

# number each input line and count word occurrences within it
i = 0
for line in sys.stdin:
    i += 1
    count = {}
    for word in line.strip().split():
        count[word] = count.get(word, 0) + 1
    # emit "word line_number:count" for every distinct word in the line
    for word, weight in count.items():
        print '%s %s:%s' % (word, str(i), str(weight))
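For example, if the first input line is "to be or not to be", the mapper emits one record per distinct word (dict iteration order is arbitrary):

to 1:2
be 1:2
or 1:1
not 1:1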
reducer.py
import sys

keymap = {}          # unused in the current version
o_tweet = "2323"     # sentinel: no previous word seen yet
id_list = []
for line in sys.stdin:
    tweet, tw = line.strip().split()
    tweet_id, w = tw.split(':')
    w = int(w)
    # print tweet, o_tweet, tweet_id, id_list
    if tweet == o_tweet:
        # same word as the previous line: pair it with every line seen so far
        for i, wt in id_list:
            print '%s:%s %s' % (tweet_id, i, str(w + wt))
        id_list.append((tweet_id, w))
    else:
        # new word: start a fresh list
        id_list = [(tweet_id, w)]
        o_tweet = tweet
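For example, given sorted reducer input containing

be 1:2
be 2:1

the reducer prints 2:1 3, pairing the two lines that share "be" with their combined weight 2+1.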
[edit] Command to run the job:
hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar contrib/streaming/hadoop-0.20.0-streaming.jar -file /home/hadoop/mapper.py -mapper /home/hadoop/mapper.py -file /home/hadoop/reducer.py -reducer /home/hadoop/reducer.py -input my-input/* -output my-output
Input is any random sequence of sentences.
Thanks,
Recommended Answer
Your -mapper and -reducer should just be the script name:
hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar contrib/streaming/hadoop-0.20.0-streaming.jar -file /home/hadoop/mapper.py -mapper mapper.py -file /home/hadoop/reducer.py -reducer reducer.py -input my-input/* -output my-output
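If the scripts still cannot be executed directly on the task nodes, a commonly used variant is to name the interpreter explicitly in the quoted command (same files and paths as above):

hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar contrib/streaming/hadoop-0.20.0-streaming.jar -file /home/hadoop/mapper.py -mapper "python mapper.py" -file /home/hadoop/reducer.py -reducer "python reducer.py" -input my-input/* -output my-output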
When the job runs, your scripts are shipped to a job folder inside HDFS, and the executing attempt task resolves them relative to its own working directory, i.e. as ".". (FYI: if you ever want to add another -file, such as a lookup table, you can open it in Python as if it were in the same directory as your scripts while your script is running in the M/R job.)
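As a minimal sketch (lookup.txt is a hypothetical side file, shipped by adding -file /home/hadoop/lookup.txt to the command), the script can load it with a plain relative path:

# lookup.txt is a hypothetical side file distributed via -file; Hadoop places
# it in the task's working directory, so a relative open() finds it
lookup = {}
with open('lookup.txt') as f:
    for line in f:
        key, value = line.strip().split('\t', 1)
        lookup[key] = value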
Also make sure you have run chmod a+x mapper.py and chmod a+x reducer.py.
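Concretely, that means adding an interpreter line as the very first line of each script, since without it the task node cannot execute the file directly:

#!/usr/bin/env python

and setting the execute bit on both files:

chmod a+x /home/hadoop/mapper.py
chmod a+x /home/hadoop/reducer.py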
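Finally, "subprocess failed with code 2" means the mapper subprocess exited with a non-zero status, so a quick way to rule out bugs in the scripts themselves is to run the same pipeline locally (sample.txt stands in for any small test file):

cat sample.txt | python mapper.py | sort | python reducer.py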