
iterparse fails to parse a field, while other similar ones are fine

Problem description

I use Python's iterparse to parse the XML result of a nessus scan (.nessus file). The parsing fails on unexpected records, while similar ones are parsed correctly.

The general structure of the XML file is a long list of records like the one below:

<ReportHost>
  <ReportItem>
    <foo>9.3</foo>
    <bar>hello</bar>
  </ReportItem>
  <ReportItem>
    <foo>10.0</foo>
    <bar>world</bar>
  </ReportItem>
</ReportHost>
<ReportHost>
   ...
</ReportHost>

In other words, there are many hosts (ReportHost), each with many items to report (ReportItem), and each item has several characteristics (foo, bar). I want to generate one line per item, listing its characteristics.

The parsing fails in the middle of the file at a simple line (foo in that case being cvss_base_score)

<cvss_base_score>9.3</cvss_base_score>

while ~200 similar lines have been parsed without problems.

The relevant piece of code is below. It sets context markers (inReportHost and inReportItem, which tell me where in the structure of the XML file I am) and either assigns or prints a value, depending on the context.

import xml.etree.cElementTree as ET
inReportHost = False
inReportItem = False

for event, elem in ET.iterparse("test2.nessus", events=("start", "end")):
    if event == 'start' and elem.tag == "ReportHost":
        inReportHost = True
    if event == 'end' and elem.tag == "ReportHost":
        inReportHost = False
        elem.clear()
    if inReportHost:
        if event == 'start' and elem.tag == 'ReportItem':
            inReportItem = True
            cvss = ''
        if event == 'start' and inReportItem:
            if event == 'start' and elem.tag == 'cvss_base_score':
                cvss = elem.text
        if event == 'end' and elem.tag == 'ReportItem':
            print cvss
            inReportItem = False

cvss sometimes has the value None (after the cvss = elem.text assignment), even though identical entries have been parsed properly earlier in the file.

If I add something along these lines below the assignment

if cvss is None: cvss = "0"

then many of the subsequent cvss values are assigned correctly (and some others are still None).

When I take the <ReportHost>...</ReportHost> block that is parsed incorrectly and run it through the program on its own, it works fine (i.e. cvss is assigned 9.3 as expected).

I cannot see where the mistake in my code is since, within a large set of similar records, some are processed correctly and some are not (some of the records are identical and are still processed differently). I also cannot find anything particular about the records that fail: identical ones earlier and later in the file are fine.

Recommended answer

From the iterparse() documentation:

Note: iterparse() only guarantees that it has seen the ">" character of a starting tag when it emits a "start" event, so the attributes are defined, but the contents of the text and tail attributes are undefined at that point. The same applies to the element children; they may or may not be present. If you need a fully populated element, look for "end" events instead.
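The effect described in the note can be reproduced in isolation. The snippet below is only a sketch: it uses XMLPullParser instead of iterparse so that the chunk boundary can be placed by hand, and the two-chunk split and the tiny ReportItem document are made up for illustration. Since iterparse reads the file in chunks, this would also explain why identical records behave differently depending on where they sit in the file.

```python
import xml.etree.ElementTree as ET

parser = ET.XMLPullParser(events=("start", "end"))

# First chunk (hypothetical split): the data stops right after the opening tag.
parser.feed("<ReportItem><cvss_base_score>")
event, elem = list(parser.read_events())[-1]
text_at_start = elem.text  # the text has not been parsed yet
print(event, elem.tag, repr(text_at_start))

# Second chunk: the text and the closing tags arrive.
parser.feed("9.3</cvss_base_score></ReportItem>")
end_events = [(ev, el) for ev, el in parser.read_events() if ev == "end"]
text_at_end = end_events[0][1].text  # populated once the "end" event is seen
print(end_events[0][0], end_events[0][1].tag, repr(text_at_end))
```

When the opening tag and its text land in the same chunk, the "start" event usually already carries the text by the time the loop sees it, which is why most records appeared to work.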

Drop the inReport* variables and process ReportHost only on "end" events, when it is fully parsed. Use the ElementTree API to get the necessary info, such as cvss_base_score, from the current ReportHost element.
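As a rough sketch of that advice, the loop below listens only for "end" events and builds one row per ReportItem. The in-memory sample, the item_rows helper and the choice of an empty string for a missing score are assumptions made for illustration; a real .nessus file has more fields.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample mirroring the structure described in the question.
SAMPLE = b"""<Report>
  <ReportHost>
    <ReportItem>
      <cvss_base_score>9.3</cvss_base_score>
      <bar>hello</bar>
    </ReportItem>
    <ReportItem>
      <bar>world</bar>
    </ReportItem>
  </ReportHost>
</Report>"""

def item_rows(source):
    """Collect (cvss, bar) for every ReportItem, reading only complete hosts."""
    rows = []
    for event, host in ET.iterparse(source, events=("end",)):
        if host.tag != "ReportHost":
            continue
        for item in host.iter("ReportItem"):
            # findtext returns the default when the child element is absent.
            cvss = item.findtext("cvss_base_score", default="")
            bar = item.findtext("bar", default="")
            rows.append((cvss, bar))
        host.clear()  # the host is fully processed, release its children
    return rows

rows = item_rows(io.BytesIO(SAMPLE))
for cvss, bar in rows:
    print(cvss, bar)
```

Because every element is read only after its "end" event, no context flags are needed and the text is always populated.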

To preserve memory, do:

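import xml.etree.ElementTree as etree  # cElementTree was removed in Python 3.9

def getelements(filename_or_file, tag):
    """Yield each fully parsed element with the given tag."""
    context = iter(etree.iterparse(filename_or_file, events=('start', 'end')))
    _, root = next(context)  # first event is the "start" of the root element
    for event, elem in context:
        if event == 'end' and elem.tag == tag:
            yield elem
            root.clear()  # drop already processed children to preserve memory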

for host in getelements("test2.nessus", "ReportHost"):
    for cvss_el in host.iter("cvss_base_score"):
        print(cvss_el.text)
