Problem description
I am trying to read a file from an FTP server. The file is a .gz file. I would like to know whether I can work on this file while the socket is open. I tried to follow two StackOverflow questions, one on reading files without writing to disk and one on reading files from FTP without downloading, but was not successful.
I know how to extract data from, and work on, the downloaded file, but I'm not sure whether I can do it on the fly. Is there a way to connect to the site, get the data into a buffer, possibly do some data extraction, and exit?
When I tried using StringIO, I got this error:
>>> from ftplib import FTP
>>> from StringIO import StringIO
>>> ftp = FTP('ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/PMC-ids.csv.gz')
Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    ftp = FTP('ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/PMC-ids.csv.gz')
  File "C:\Python27\lib\ftplib.py", line 117, in __init__
    self.connect(host)
  File "C:\Python27\lib\ftplib.py", line 132, in connect
    self.sock = socket.create_connection((self.host, self.port), self.timeout)
  File "C:\Python27\lib\socket.py", line 553, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
gaierror: [Errno 11004] getaddrinfo failed
I just need to know how I can get the data into some variable and loop over it until the file from the FTP server has been read.
I appreciate your time and help. Thanks!
Recommended answer
FTP() takes a hostname, not a full ftp:// URL, which is why the call in the question fails with "getaddrinfo failed". Connect to the host and make sure to log in to the FTP server first. After that, use retrbinary, which pulls the file in binary mode and invokes a callback on each chunk of the file. You can use this to load the file into a string.
from ftplib import FTP
ftp = FTP('ftp.ncbi.nlm.nih.gov')
ftp.login() # Username: anonymous password: anonymous@
# Setup a cheap way to catch the data (could use StringIO too)
data = []
def handle_binary(more_data):
    data.append(more_data)
resp = ftp.retrbinary("RETR pub/pmc/PMC-ids.csv.gz", callback=handle_binary)
data = "".join(data)
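The code above passes only the hostname to FTP() and only the path to the RETR command. If you start from a full URL like the one in the question, a minimal sketch of splitting it could look like the following. It assumes Python 3 (`urllib.parse`; Python 2 has the same function in the `urlparse` module), and `split_ftp_url` is a hypothetical helper name, not part of ftplib.

```python
from urllib.parse import urlsplit


def split_ftp_url(url):
    """Return (host, path) for an ftp:// URL; the path has no leading slash."""
    parts = urlsplit(url)
    if parts.scheme != "ftp" or not parts.hostname:
        raise ValueError("not an ftp:// URL: %r" % (url,))
    return parts.hostname, parts.path.lstrip("/")


host, path = split_ftp_url("ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/PMC-ids.csv.gz")
# host is what FTP() expects; "RETR " + path is what retrbinary() expects.
retr_command = "RETR " + path
```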
Bonus points: how about we decompress the string while we're at it?
Easy mode, using the data string above:
import gzip
import StringIO
zippy = gzip.GzipFile(fileobj=StringIO.StringIO(data))
uncompressed_data = zippy.read()
Slightly better, a complete solution:
from ftplib import FTP
import gzip
import StringIO
ftp = FTP('ftp.ncbi.nlm.nih.gov')
ftp.login() # Username: anonymous password: anonymous@
sio = StringIO.StringIO()
def handle_binary(more_data):
    sio.write(more_data)
resp = ftp.retrbinary("RETR pub/pmc/PMC-ids.csv.gz", callback=handle_binary)
sio.seek(0) # Go back to the start
zippy = gzip.GzipFile(fileobj=sio)
uncompressed = zippy.read()
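The snippets above target Python 2, where the StringIO module holds byte strings. On Python 3 that module no longer exists and retrbinary delivers bytes, so a rough equivalent would use io.BytesIO. The sketch below is one way to write it, not the original answer's code; `fetch_and_gunzip` is a hypothetical helper name, and `ftp` is assumed to be a connected, logged-in ftplib.FTP object.

```python
import gzip
import io


def fetch_and_gunzip(ftp, path):
    """Download `path` into memory via retrbinary and return the decompressed bytes."""
    buf = io.BytesIO()
    # retrbinary calls buf.write once per received chunk.
    ftp.retrbinary("RETR " + path, buf.write)
    buf.seek(0)  # Go back to the start before reading
    with gzip.GzipFile(fileobj=buf) as gz:
        return gz.read()


# Usage (needs network access, so it is left commented out):
# from ftplib import FTP
# ftp = FTP('ftp.ncbi.nlm.nih.gov')
# ftp.login()
# uncompressed = fetch_and_gunzip(ftp, 'pub/pmc/PMC-ids.csv.gz')
# ftp.quit()
```

The whole compressed file still sits in memory before decompression starts, exactly as in the Python 2 version.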
In reality, it would be much better to decompress on the fly, but I don't see a way to do that with the built-in libraries (at least not easily).
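One possibility that the answer does not cover: the standard zlib module has decompressobj, which can consume a gzip stream chunk by chunk when created with wbits set to 16 + zlib.MAX_WBITS. The sketch below assumes Python 3 and a single-member gzip file; `make_gunzip_callback` and `on_data` are hypothetical names. The demo feeds locally compressed sample data in small pieces instead of contacting a server.

```python
import gzip
import zlib


def make_gunzip_callback(on_data):
    """Return (callback, finish) that decompress gzip chunks incrementally."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)

    def callback(chunk):
        out = decomp.decompress(chunk)
        if out:
            on_data(out)

    def finish():
        # Emit anything still buffered once the transfer is complete.
        out = decomp.flush()
        if out:
            on_data(out)

    return callback, finish


# Local demo: compress sample data, then feed it 5 bytes at a time.
sample = gzip.compress(b"id,value\n1,a\n2,b\n")
pieces = []
callback, finish = make_gunzip_callback(pieces.append)
for i in range(0, len(sample), 5):
    callback(sample[i:i + 5])
finish()
result = b"".join(pieces)
```

With a real connection, the same callback would be passed as the second argument to ftp.retrbinary, followed by a call to finish(). Here `on_data` simply collects the pieces, but it could instead process each piece and discard it so the full file never has to be held in memory.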