
XML parsing in Python using ElementTree


Problem description

I'm very new to Python and I need to parse some dirty XML files that need sanitising first.

I have the following python code:

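import re
import xml.etree.ElementTree as ET

totstring = ""

with open('input.sgm', 'r') as inF:
    for line in inF:
        # Strip every character outside the whitelist (digits, letters,
        # < > / whitespace = ! " -) so the markup itself survives
        string = re.sub(r'[^0-9a-zA-Z<>/\s=!"-]+', "", line)
        totstring += string  # accumulate inside the loop, not after it

data = ET.fromstring(totstring)  # returns the root Element

print(data)
# No file.close needed: the with block has already closed the file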

Parsing this input (input.sgm):

<!DOCTYPE lewis SYSTEM "lewis.dtd">
<REUTERS TOPICS="YES" LEWISSPLIT="TRAIN" CGISPLIT="TRAINING-SET" OLDID="5544" NEWID="1">
<DATE>26-FEB-1987 15:01:01.79</DATE>
<TOPICS><D>cocoa</D></TOPICS>
<PLACES><D>el-salvador</D><D>usa</D><D>uruguay</D></PLACES>
<PEOPLE></PEOPLE>
<ORGS></ORGS>
<EXCHANGES></EXCHANGES>
<COMPANIES></COMPANIES>
<UNKNOWN> 
&#5;&#5;&#5;C T
&#22;&#22;&#1;f0704&#31;reute
u f BC-BAHIA-COCOA-REVIEW   02-26 0105</UNKNOWN>
<TEXT>&#2;
<TITLE>BAHIA COCOA REVIEW</TITLE>
<DATELINE>    SALVADOR, Feb 26 - </DATELINE><BODY>Showers continued throughout the week in
the Bahia cocoa zone, alleviating the drought since early
January and improving prospects for the coming temporao,
although normal humidity levels have not been restored,
Comissaria Smith said in its weekly review.
    The dry period means the temporao will be late this year.
    Arrivals for the week ended February 22 were 155,221 bags
of 60 kilos making a cumulative total for the season of 5.93
mln against 5.81 at the same stage last year. Again it seems
that cocoa delivered earlier on consignment was included in the
arrivals figures.
    Comissaria Smith said there is still some doubt as to how
much old crop cocoa is still available as harvesting has
practically come to an end. With total Bahia crop estimates
around 6.4 mln bags and sales standing at almost 6.2 mln there
are a few hundred thousand bags still in the hands of farmers,
middlemen, exporters and processors.
    There are doubts as to how much of this cocoa would be fit
for export as shippers are now experiencing dificulties in
obtaining +Bahia superior+ certificates.
    In view of the lower quality over recent weeks farmers have
sold a good part of their cocoa held on consignment.
    Comissaria Smith said spot bean prices rose to 340 to 350
cruzados per arroba of 15 kilos.
    Bean shippers were reluctant to offer nearby shipment and
only limited sales were booked for March shipment at 1,750 to
1,780 dlrs per tonne to ports to be named.
    New crop sales were also light and all to open ports with
June/July going at 1,850 and 1,880 dlrs and at 35 and 45 dlrs
under New York july, Aug/Sept at 1,870, 1,875 and 1,880 dlrs
per tonne FOB.
    Routine sales of butter were made. March/April sold at
4,340, 4,345 and 4,350 dlrs.
    April/May butter went at 2.27 times New York May, June/July
at 4,400 and 4,415 dlrs, Aug/Sept at 4,351 to 4,450 dlrs and at
2.27 and 2.28 times New York Sept and Oct/Dec at 4,480 dlrs and
2.27 times New York Dec, Comissaria Smith said.
    Destinations were the U.S., Covertible currency areas,
Uruguay and open ports.
    Cake sales were registered at 785 to 995 dlrs for
March/April, 785 dlrs for May, 753 dlrs for Aug and 0.39 times
New York Dec for Oct/Dec.
    Buyers were the U.S., Argentina, Uruguay and convertible
currency areas.
    Liquor sales were limited with March/April selling at 2,325
and 2,380 dlrs, June/July at 2,375 dlrs and at 1.25 times New
York July, Aug/Sept at 2,400 dlrs and at 1.25 times New York
Sept and Oct/Dec at 1.25 times New York Dec, Comissaria Smith
said.
    Total Bahia sales are currently estimated at 6.13 mln bags
against the 1986/87 crop and 1.06 mln bags against the 1987/88
crop.
    Final figures for the period to February 28 are expected to
be published by the Brazilian Cocoa Trade Commission after
carnival which ends midday on February 27.
 Reuter
&#3;</BODY></TEXT>
</REUTERS>
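The &#5;, &#22;, &#2; and &#3; sequences in this sample are character references to control characters that XML 1.0 does not allow, and the expat parser behind ElementTree rejects them with a ParseError. A narrower alternative to a whitelist regex is sketched below, assuming those references are the only "dirt" in the file: it deletes just the decimal references that point outside the XML 1.0 character range and leaves everything else, punctuation included, intact. The helper names drop_invalid_char_refs and is_xml_char are hypothetical, and hexadecimal references such as &#x5; are not handled.

```python
import re
import xml.etree.ElementTree as ET

# Decimal character references such as &#5; or &#22; (hex forms not handled)
CHAR_REF = re.compile(r"&#(\d+);")


def is_xml_char(cp):
    """True if code point cp is allowed by the XML 1.0 Char production."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def drop_invalid_char_refs(text):
    """Hypothetical helper: delete only the references XML 1.0 forbids."""
    return CHAR_REF.sub(
        lambda m: m.group(0) if is_xml_char(int(m.group(1))) else "", text)


# Cut-down, hypothetical sample with the same kind of dirt as the file above
dirty = "<REUTERS><TEXT>&#2;<BODY>Cocoa review.\n Reuter\n&#3;</BODY></TEXT></REUTERS>"

root = ET.fromstring(drop_invalid_char_refs(dirty))
print(repr(root.findtext("TEXT/BODY")))
```

Because nothing outside the forbidden references is removed, the commas, full stops and colons in the body text survive the clean-up.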

How can I now get just the text from inside the BODY tag?

All the tutorials I have seen read the XML directly from a file so that ElementTree.parse works. Because I am parsing from a string, that approach does not work for me, which rules out a lot of the tutorials I have read.
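ElementTree can work from a string directly: xml.etree.ElementTree.fromstring returns the root Element (what file-based tutorials obtain via tree.getroot()), and ElementTree.parse accepts any file-like object, so wrapping the string in io.StringIO lets tutorial code built around parse() run unchanged. A minimal sketch on a cut-down, hypothetical version of the document above, assuming it has already been sanitised into well-formed XML:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, cut-down sample in the same shape as the Reuters document
sample = """<REUTERS NEWID="1">
<TEXT>
<TITLE>BAHIA COCOA REVIEW</TITLE>
<DATELINE>    SALVADOR, Feb 26 - </DATELINE><BODY>Showers continued throughout the week.</BODY>
</TEXT>
</REUTERS>"""

# fromstring() hands back the root Element (<REUTERS>) straight away ...
root = ET.fromstring(sample)

# ... while parse() returns an ElementTree; io.StringIO makes the string
# look like a file, so code written around parse() still works.
tree = ET.parse(io.StringIO(sample))
root_from_tree = tree.getroot()

# Path search relative to the root: <REUTERS> -> <TEXT> -> <BODY>
body_text = root.findtext("TEXT/BODY")
print(body_text)

# If BODY ever contains child elements, .text stops at the first child;
# itertext() walks all nested text instead.
body = root.find(".//BODY")
full_text = "".join(body.itertext())
```

Either root works with the same find/findtext calls, so the choice between fromstring and parse only matters for how the data arrives.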

Many thanks

Recommended answer

If you don't care about the particular structure of a (potentially messy) XML document and just want to quickly get the contents of a given tag/element, you may want to try the BeautifulSoup module.

from bs4 import BeautifulSoup  # BeautifulSoup 4: pip install beautifulsoup4

# html.parser lowercases tag names, so <BODY> is found as "body"
soup = BeautifulSoup(totstring, "html.parser")

body = soup.find("body")

bodytext = body.text
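If installing a third-party package is not an option, the standard library's html.parser.HTMLParser offers a similarly forgiving, structure-agnostic way to pull out a tag's text: it does not require well-formed XML and passes tag names to its handlers in lowercase. The class below is a hypothetical sketch, not part of the answer above; it simply collects the text of every BODY element it encounters.

```python
from html.parser import HTMLParser


class BodyTextCollector(HTMLParser):
    """Hypothetical helper: gather the text of every <body>...</body> element."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.parts = []
        self.bodies = []  # one string per BODY element seen

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <BODY> arrives as "body"
        if tag == "body":
            self.in_body = True
            self.parts = []

    def handle_endtag(self, tag):
        if tag == "body" and self.in_body:
            self.in_body = False
            self.bodies.append("".join(self.parts))

    def handle_data(self, data):
        # Only keep text that appears while we are inside a BODY element
        if self.in_body:
            self.parts.append(data)


parser = BodyTextCollector()
parser.feed('<REUTERS NEWID="1"><TEXT><DATELINE>SALVADOR, Feb 26 - </DATELINE>'
            '<BODY>Showers continued throughout the week.</BODY></TEXT></REUTERS>')
parser.close()
print(parser.bodies)
```

The trade-off is the same one the answer describes: the parser tolerates messy markup, but it only tracks whether it is inside a BODY element and ignores the rest of the document's structure.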
