Get file names inside a zip file on FTP server without downloading whole archive

Problem description

I have a lot of zip archives in a remote FTP server and their sizes go up to 20TB. I just need the file names inside those zip archives, so that I can plug them into my Python scripts.

Is there any way to just get the file names without actually downloading files and extracting them on my local machine? If so, can someone direct me to the right library/package?

Recommended answer

You can implement a file-like object that reads data from FTP instead of from a local file, and pass it to the ZipFile constructor instead of a (local) file name. ZipFile only needs the object to support seek, tell and read.

A trivial implementation could look like this:

from ftplib import FTP, all_errors
from ssl import SSLSocket

class FtpFile:
    """Minimal read-only, seekable file-like object backed by a file on an FTP server."""

    def __init__(self, ftp, name):
        self.ftp = ftp
        self.name = name
        self.size = ftp.size(name)
        self.pos = 0

    def seek(self, offset, whence=0):
        # whence: 0 = from start, 1 = from current position, 2 = from end
        if whence == 0:
            self.pos = offset
        elif whence == 1:
            self.pos += offset
        elif whence == 2:
            self.pos = self.size + offset
        return self.pos

    def tell(self):
        return self.pos

    def read(self, size=None):
        if size is None or size < 0:
            size = self.size - self.pos
        data = b""

        # Based on FTP.retrbinary
        # (but allows stopping after certain number of bytes read)
        # An alternative implementation is at
        # https://stackoverflow.com/q/58819210/850848#58819362
        self.ftp.voidcmd('TYPE I')
        cmd = "RETR {}".format(self.name)
        # The second argument makes the transfer start at self.pos (REST)
        conn = self.ftp.transfercmd(cmd, self.pos)
        try:
            while len(data) < size:
                buf = conn.recv(min(size - len(data), 8192))
                if not buf:
                    break
                data += buf
            # shutdown ssl layer (can be removed if not using TLS/SSL)
            if isinstance(conn, SSLSocket):
                conn.unwrap()
        finally:
            conn.close()
        try:
            # The server may report an error because the transfer was cut short
            self.ftp.voidresp()
        except all_errors:
            pass
        self.pos += len(data)
        return data

And then you can use it like:

import zipfile

ftp = FTP(host, user, passwd)
ftp.cwd(path)

ftpfile = FtpFile(ftp, "archive.zip")
zip = zipfile.ZipFile(ftpfile)
print(zip.namelist())


The above implementation is rather trivial and inefficient. It starts numerous (three at minimum) downloads of small chunks of data to retrieve the list of contained files. It can be optimized by reading and caching larger chunks, but it should give you the idea.

In particular, you can make use of the fact that you are only going to read a listing. The listing is located at the end of a ZIP archive. So you can just download the last (about) 10 KB of data at the start, and you will be able to fulfill all read calls out of that cache.
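The caching idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: `TailCachedFile`, `fetch_tail` and `ftp_tail_fetcher` are hypothetical names that are not part of the original answer, the server is assumed to support restarting a transfer at an offset, and the archive is assumed to have no trailing comment.

```python
import io
import zipfile


def ftp_tail_fetcher(ftp, name):
    """Hypothetical helper: returns a function that downloads `name`
    from a given offset to the end (assumes the server supports REST)."""
    def fetch_tail(offset):
        buf = io.BytesIO()
        ftp.retrbinary("RETR " + name, buf.write, rest=offset)
        return buf.getvalue()
    return fetch_tail


class TailCachedFile:
    """Read-only file-like object that downloads the archive tail once
    and serves every later read from that in-memory cache."""

    def __init__(self, fetch_tail, size, tail_size=10 * 1024):
        self.size = size                          # full size of the remote archive
        self.tail_start = max(0, size - tail_size)
        self.tail = fetch_tail(self.tail_start)   # the only download
        self.pos = 0

    def seek(self, offset, whence=0):
        # whence: 0 = from start, 1 = from current position, 2 = from end
        base = {0: 0, 1: self.pos, 2: self.size}[whence]
        if base + offset < 0:
            raise OSError("negative seek position")
        self.pos = base + offset
        return self.pos

    def tell(self):
        return self.pos

    def read(self, size=-1):
        if self.pos < self.tail_start:
            # The listing did not fit into the cached tail
            raise OSError("requested data lies before the cached tail")
        start = self.pos - self.tail_start
        end = None if size is None or size < 0 else start + size
        data = self.tail[start:end]
        self.pos += len(data)
        return data
```

With an open `ftp` connection, something like `zipfile.ZipFile(TailCachedFile(ftp_tail_fetcher(ftp, name), ftp.size(name))).namelist()` would then list the archive with a single download. If the listing does not fit into the cached tail, `read` raises `OSError`, and the object can be recreated with a larger `tail_size`.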

Knowing that, you can actually do a small hack. As the listing is at the end of the archive, you can download just the end of the archive. While the downloaded ZIP will be broken, it can still be listed. This way, you won't need the FtpFile class. You can even download the listing to an in-memory buffer (StringIO in Python 2, BytesIO in Python 3).

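from io import BytesIO
import zipfile

zipbytes = BytesIO()
name = "archive.zip"
size = ftp.size(name)
# Download only the last 10 KB (or the whole file if it is smaller)
ftp.retrbinary("RETR " + name, zipbytes.write, rest=max(0, size - 10*1024))

zip = zipfile.ZipFile(zipbytes)

print(zip.namelist())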

If you get a BadZipFile exception because 10 KB is too small to contain the whole listing, you can retry the code with a larger chunk.
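The retry could be automated along these lines. This is a minimal sketch, not part of the original answer: `list_names_from_tail` and `fetch_tail` are hypothetical names, and `fetch_tail(offset)` is assumed to return the archive's bytes from `offset` to the end (with ftplib it could be built on `ftp.retrbinary(..., rest=offset)`).

```python
import io
import zipfile


def list_names_from_tail(fetch_tail, total_size, chunk=10 * 1024):
    """List ZIP member names by downloading only the tail of the archive,
    doubling the downloaded chunk whenever the listing does not fit."""
    while True:
        offset = max(0, total_size - chunk)
        tail = io.BytesIO(fetch_tail(offset))
        try:
            with zipfile.ZipFile(tail) as zf:
                return zf.namelist()
        except (zipfile.BadZipFile, ValueError):
            # ValueError is caught as well because, depending on the Python
            # version, a too-small chunk may surface as a negative seek
            # instead of BadZipFile.
            if offset == 0:
                raise  # the whole archive was downloaded, so it is really broken
            chunk *= 2
```

Doubling keeps the number of retries small even for archives with very long listings, at the cost of downloading the tail again on each attempt.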
