Problem description
I am a beginner in Python and am currently parsing a web-based XML file from the eventful.com API. However, I am getting Unicode errors when retrieving certain elements of the data.
I am able to retrieve the first 5 data elements I want from the XML file without any problems, but then the program terminates and produces the following error in the GAE error console:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2605' in position 0: ordinal not in range(128)
I know that the character tripping up my parser is "★", which I would prefer not to retrieve from the XML file anyway.
My code is as follows:
import urllib
import xml.dom.minidom as mdom

import webapp2


class XMLParser(webapp2.RequestHandler):
    def get(self):
        base_url = 'my xml file'
        # download the data from the XML file
        response = urllib.urlopen(base_url)
        # read the response body into a string
        data = response.read()
        # close the connection
        response.close()
        # parse the downloaded XML
        dom = mdom.parseString(data)
        node = dom.documentElement
        # print out all event names (titles) found in the eventful XML
        event_main = dom.getElementsByTagName('event')
        event_names = []
        for event in event_main:
            eventObj = event.getElementsByTagName("title")[0]
            event_names.append(eventObj)
        for ev in event_names:
            nodes = ev.childNodes
            for node in nodes:
                if node.nodeType == node.TEXT_NODE:
                    print node.data
Is there any way to retrieve the "title" elements and ignore unusual characters like the ★ character here? I would really appreciate any help on this matter. I have already tried solutions that use word.encode('us-ascii', 'ignore'), but this does not fix the issue.
----------- I HAVE FOUND THE SOLUTION:
I struggled with this problem for a while, and after talking to a lecturer about it I found that all it required was two lines of code to decode and then re-encode the downloaded XML data (after it is read into the program). Hope this helps someone else having the same issue!
unicode_data = data.decode('utf-8')
data = unicode_data.encode('ascii','ignore')
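To illustrate what those two lines do, here is a minimal sketch that applies them to a small in-memory sample before parsing. The sample XML and the helper name strip_non_ascii are made up for illustration and are not part of the eventful.com API; the sketch is written to run on Python 3, although the code above targets Python 2.

```python
import xml.dom.minidom as mdom

# Hypothetical sample standing in for the downloaded eventful.com response.
raw = u'<events><event><title>\u2605 Jazz Night</title></event></events>'.encode('utf-8')


def strip_non_ascii(data):
    """Decode UTF-8 bytes, then re-encode as ASCII, dropping anything else."""
    unicode_data = data.decode('utf-8')
    return unicode_data.encode('ascii', 'ignore')


clean = strip_non_ascii(raw)   # the star is gone, the rest is untouched
dom = mdom.parseString(clean)  # the parser now only ever sees ASCII bytes
titles = [t.firstChild.data.strip() for t in dom.getElementsByTagName('title')]
print(titles)
```

Note that this drops every non-ASCII character in the whole document, including accented letters in event names, which may or may not be what you want.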
Recommended answer
Where are you using your decoding methods?
I had this error in the past and had to decode the raw data. In other words, I would try doing
data = response.read()
# close the connection
response.close()
# decode the raw bytes and keep the result
data = data.decode("us-ascii")
That is, if the data is in fact ASCII. My point is: make sure you encode/decode the raw result while it is still a string, before you call parseString on it.
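If you would rather leave the raw response untouched, another option might be to parse the bytes as they are and filter only the title text afterwards. The sketch below assumes the response is UTF-8; the sample data and the function name ascii_titles are hypothetical, and it is written to run on Python 3.

```python
import xml.dom.minidom as mdom

# Hypothetical sample standing in for the downloaded eventful.com response.
raw = u'<events><event><title>\u2605 Jazz Night \u2605</title></event></events>'.encode('utf-8')


def ascii_titles(xml_bytes):
    """Parse the XML, then drop non-ASCII characters from each title's text."""
    dom = mdom.parseString(xml_bytes)
    titles = []
    for title in dom.getElementsByTagName('title'):
        # Join the text nodes directly under <title>.
        text = u''.join(n.data for n in title.childNodes
                        if n.nodeType == n.TEXT_NODE)
        # Encode with 'ignore' to drop the star, then decode back to text.
        titles.append(text.encode('ascii', 'ignore').decode('ascii').strip())
    return titles


result = ascii_titles(raw)
print(result)
```

This keeps the rest of the document intact and only changes what gets printed, at the cost of filtering each value individually.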