Problem description
I am looking for a simple way to save a CSV file originating from a published Google Sheets document. Since it's published, it's accessible through a direct link (modified on purpose in the example below).
All my browsers prompt me to save the CSV file as soon as I open the link.
Neither:
import urllib.request

DOC_URL = 'https://docs.google.com/spreadsheet/ccc?key=0AoOWveO-dNo5dFNrWThhYmdYW9UT1lQQkE&output=csv'
SIZE = 4096  # not defined in the original snippet; any read size works here

f = urllib.request.urlopen(DOC_URL)
cont = f.read(SIZE)
f.close()
cont = str(cont, 'utf-8')
print(cont)
nor:
import urllib.request

req = urllib.request.Request(DOC_URL)
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.13 (KHTML, like Gecko) Chrome/24.0.1284.0 Safari/537.13')
f = urllib.request.urlopen(req)
print(f.read().decode('utf-8'))
prints anything but HTML content.
(Tried the 2nd version after reading this other post: Download google docs public spreadsheet to csv with python.)
Any idea what I am doing wrong? I am logged out of my Google account, if that's worth anything, but this works from any browser I tried. As far as I understand, the Google Docs API has not yet been ported to Python 3, and given the "toy" magnitude of my little project for personal use, it would not even make much sense to use it from the get-go, if I can circumvent it.
In the 2nd attempt, I left the 'User-Agent' in, as I was thinking that requests suspected of coming from scripts (because no identification info is present) might be ignored, but it didn't make a difference.
Answer
Google responds to the initial request with a series of cookie-setting 302 redirects. If you don't store and resubmit the cookies between requests, it redirects you to the login page.
So, the problem is not with the User-Agent header; it's the fact that, by default, urllib.request.urlopen doesn't store cookies, but it does follow HTTP 302 redirects.
The following code works just fine on a public spreadsheet available at the location specified by DOC_URL:
>>> from http.cookiejar import CookieJar
>>> from urllib.request import build_opener, HTTPCookieProcessor
>>> opener = build_opener(HTTPCookieProcessor(CookieJar()))
>>> resp = opener.open(DOC_URL)
>>> # should really parse resp.getheader('content-type') for encoding.
>>> csv_content = resp.read().decode('utf-8')
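The comment in the snippet above notes that the encoding should really be parsed from the Content-Type header rather than hardcoded. A minimal sketch of how that could look using the stdlib email parser (the helper name and the header strings are illustrative, not from an actual response):

```python
from email.message import Message

def charset_from_content_type(content_type, default='utf-8'):
    # Parse a header value such as 'text/csv; charset=UTF-8' and
    # return the declared charset (lowercased), or a fallback.
    msg = Message()
    msg['Content-Type'] = content_type
    return msg.get_content_charset(default)

# With the opener above, one might then write:
# charset = charset_from_content_type(resp.getheader('Content-Type') or '')
# csv_content = resp.read().decode(charset)
```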
Having shown you how to do it in vanilla Python, I'll now say that the Right Way™ to go about this is to use the most excellent requests library. It is extremely well documented and makes these sorts of tasks incredibly pleasant to complete.
For instance, getting the same csv_content as above using the requests library is as simple as:
>>> import requests
>>> csv_content = requests.get(DOC_URL).text
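Once csv_content is in hand (from either approach), the stdlib csv module can split it into rows; the sample data below is made up purely for illustration:

```python
import csv
import io

def rows_from_csv_text(csv_content):
    # csv.reader expects a file-like object, so wrap the string.
    return list(csv.reader(io.StringIO(csv_content)))

sample = 'name,score\r\nalice,10\r\nbob,7\r\n'
rows = rows_from_csv_text(sample)
# rows[0] is the header row: ['name', 'score']
```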
That single line expresses your intent more clearly. It's easier to write and easier to read. Do yourself - and anyone else who shares your codebase - a favor and just use requests.
This concludes the article on how to save a Google Sheets file as CSV from Python 3 (or 2). We hope the answer above is helpful.