Problem description
I have a dataframe with some (hundreds of) millions of rows, and I want to convert datetime to timestamp efficiently. How can I do it?
My sample df:
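The sample frame itself did not survive on this page. The following is a hypothetical reconstruction, assuming a single column named `datetime` holding 25 hourly values starting at 2016-01-01 00:00:01. That assumption is based only on the outputs that do appear in the question and answer; it is not the asker's code.

```python
import pandas as pd

# Hypothetical reconstruction of the sample frame: 25 hourly values
# starting at 2016-01-01 00:00:01, stored in a column named 'datetime'.
df = pd.DataFrame(
    {"datetime": pd.date_range("2016-01-01 00:00:01", periods=25,
                               freq=pd.Timedelta(hours=1))}
)
print(df.head())
```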
Now I convert datetime to timestamp value by value with .apply(), but it takes a very long time (several hours) when I have some (hundreds of) millions of rows:
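The asker's exact `.apply()` code is not shown. A typical value-by-value version probably looks like the sketch below, which makes one Python-level call to `Timestamp.timestamp()` per row. The frame reuses the hypothetical sample above. In current pandas, `Timestamp.timestamp()` treats naive values as UTC, so the first row gives 1451606401, which matches the answer's output. The question's own first value, 1451602801, is exactly 3600 s smaller. That would be consistent with an environment that interpreted naive datetimes as local time one hour ahead of UTC, but this is an inference and the source does not confirm it.

```python
import pandas as pd

df = pd.DataFrame(
    {"datetime": pd.date_range("2016-01-01 00:00:01", periods=25,
                               freq=pd.Timedelta(hours=1))}
)

# Slow path: Series.apply hands each element to the lambda as a pandas
# Timestamp, so there is one Python call per row.
# Timestamp.timestamp() returns float seconds; naive values are taken as UTC.
df["ts"] = df["datetime"].apply(lambda t: int(t.timestamp()))
print(df.head())
```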
The result above is what I want.
If I try to use the .dt accessor of pandas.Series, I get an error message:

AttributeError: 'DatetimeProperties' object has no attribute 'timestamp'
If I create, e.g., the date parts of the datetimes with the .dt accessor, it is much faster than using .apply():
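The asker's `.dt` snippet is also missing. A minimal sketch of vectorized component extraction through the `.dt` accessor follows, again on the hypothetical sample frame. `.dt.date` and `.dt.hour` are existing accessor attributes. The column names `date` and `hour` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {"datetime": pd.date_range("2016-01-01 00:00:01", periods=25,
                               freq=pd.Timedelta(hours=1))}
)

# Vectorized component extraction through the .dt accessor.
df["date"] = df["datetime"].dt.date   # Python datetime.date objects
df["hour"] = df["datetime"].dt.hour   # integers 0-23
print(df.head())
```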
I want something similar for timestamps...
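But I don't really understand the official documentation: it talks about "Converting to Timestamps", but I don't see any timestamps there; it only talks about converting to datetime with pd.to_datetime(), not to timestamp...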
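Part of the confusion is terminology. In pandas, a "Timestamp" is pandas' own datetime scalar, so "Converting to Timestamps" means producing datetime values, not Unix epoch integers. The sketch below shows that documented direction: epoch seconds to datetimes with `pd.to_datetime(..., unit="s")`. The three input values are copied from the answer's output. The question asks for the reverse direction.

```python
import pandas as pd

# Unix epoch seconds, copied from the answer's printed output.
epoch_seconds = pd.Series([1451606401, 1451610001, 1451613601])

# pandas "Timestamps" are datetime values: to_datetime with unit="s"
# interprets the integers as seconds since 1970-01-01 (UTC, naive result).
as_datetime = pd.to_datetime(epoch_seconds, unit="s")
print(as_datetime)
```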
The pandas.Timestamp constructor doesn't work either (it fails with the error below):

TypeError: Cannot convert input to Timestamp
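A likely explanation, assuming the whole column was passed to the constructor: `pd.Timestamp` builds a single scalar. It accepts one element of the column but not the Series itself. The sketch below shows both cases on a small hypothetical frame. The error message text differs between pandas versions, so only the exception type is checked.

```python
import pandas as pd

df = pd.DataFrame(
    {"datetime": pd.date_range("2016-01-01 00:00:01", periods=3,
                               freq=pd.Timedelta(hours=1))}
)

# pd.Timestamp works on a single element...
first_ts = int(pd.Timestamp(df["datetime"].iloc[0]).timestamp())

# ...but a whole Series is not a valid scalar input and raises TypeError.
try:
    pd.Timestamp(df["datetime"])
    series_failed = False
except TypeError:
    series_failed = True

print(first_ts, series_failed)
```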
pandas.Series.to_timestamp also does something totally different from what I want:
    df['ts3'] = df['datetime'].to_timestamp
    df.head()

                 datetime          ts                                                ts3
    0 2016-01-01 00:00:01  1451602801  <bound method Series.to_timestamp of 0   2016...
    1 2016-01-01 01:00:01  1451606401  <bound method Series.to_timestamp of 0   2016...
    2 2016-01-01 02:00:01  1451610001  <bound method Series.to_timestamp of 0   2016...
    3 2016-01-01 03:00:01  1451613601  <bound method Series.to_timestamp of 0   2016...
    4 2016-01-01 04:00:01  1451617201  <bound method Series.to_timestamp of 0   2016...
Thanks!!
Solution
I think you need to convert to a numpy array first with values and cast to int64. The output is in ns, so you need to divide by 10 ** 9:
    import numpy as np

    df['ts'] = df.datetime.values.astype(np.int64) // 10 ** 9
    print(df)

                  datetime          ts
    0  2016-01-01 00:00:01  1451606401
    1  2016-01-01 01:00:01  1451610001
    2  2016-01-01 02:00:01  1451613601
    3  2016-01-01 03:00:01  1451617201
    4  2016-01-01 04:00:01  1451620801
    5  2016-01-01 05:00:01  1451624401
    6  2016-01-01 06:00:01  1451628001
    7  2016-01-01 07:00:01  1451631601
    8  2016-01-01 08:00:01  1451635201
    9  2016-01-01 09:00:01  1451638801
    10 2016-01-01 10:00:01  1451642401
    11 2016-01-01 11:00:01  1451646001
    12 2016-01-01 12:00:01  1451649601
    13 2016-01-01 13:00:01  1451653201
    14 2016-01-01 14:00:01  1451656801
    15 2016-01-01 15:00:01  1451660401
    16 2016-01-01 16:00:01  1451664001
    17 2016-01-01 17:00:01  1451667601
    18 2016-01-01 18:00:01  1451671201
    19 2016-01-01 19:00:01  1451674801
    20 2016-01-01 20:00:01  1451678401
    21 2016-01-01 21:00:01  1451682001
    22 2016-01-01 22:00:01  1451685601
    23 2016-01-01 23:00:01  1451689201
    24 2016-01-02 00:00:01  1451692801
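The `// 10 ** 9` step assumes the column is stored as `datetime64[ns]`. That is the usual pandas default, but newer pandas versions also support other resolutions. The sketch below makes the nanosecond storage explicit and cross-checks the answer's cast against the idiom from the pandas time-series guide ("From timestamps to epoch"). That idiom subtracts the epoch and floor-divides by a one-second `Timedelta`, so it does not depend on the storage unit. The frame is the same hypothetical sample as above.

```python
import numpy as np
import pandas as pd

# Force nanosecond storage, which the answer's "// 10 ** 9" relies on.
df = pd.DataFrame(
    {"datetime": pd.date_range("2016-01-01 00:00:01", periods=25,
                               freq=pd.Timedelta(hours=1)).astype("datetime64[ns]")}
)

# Answer's approach: reinterpret datetime64[ns] as int64 nanoseconds since
# the epoch, then floor-divide down to whole seconds.
ts_int_cast = df["datetime"].values.astype(np.int64) // 10 ** 9

# Unit-independent alternative: (datetime - epoch) is a timedelta column,
# and floor division by 1 second yields integer seconds.
ts_timedelta = (df["datetime"] - pd.Timestamp("1970-01-01")) // pd.Timedelta("1s")

print((ts_int_cast == ts_timedelta.to_numpy()).all())
```

Both lines are vectorized, so neither makes a Python call per row the way `.apply()` does.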
to_timestamp is used for converting from a period index to a datetime index.
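A small sketch of what `Series.to_timestamp` is meant for follows. It converts the *index* of a Series from periods to the timestamps at the start of each period and leaves the values alone. The monthly series is made-up illustration data. The same sketch also explains the question's `ts3` column: `to_timestamp` was referenced without parentheses, so the column stored the method object itself, shown as `<bound method ...>`.

```python
import pandas as pd

# Illustration data: a Series indexed by monthly periods.
monthly = pd.Series([10, 20, 30],
                    index=pd.period_range("2016-01", periods=3, freq="M"))

# to_timestamp() converts the PeriodIndex to a DatetimeIndex
# (start of each period by default); the values are unchanged.
converted = monthly.to_timestamp()
print(converted.index)

# Without the call parentheses this is just a reference to the method,
# which is what ended up in the question's ts3 column.
print(callable(monthly.to_timestamp))
```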