Question
I have a DataFrame in pandas called 'munged_data' with two columns, 'entry_date' and 'dob', which I have converted to Timestamps using pd.to_timestamp. I am trying to calculate people's ages from the time difference between 'entry_date' and 'dob'. To do this I need the difference in days between the two columns (so that I can then do something like round(days/365.25)). I cannot find a way to do this with a vectorized operation. When I do munged_data.entry_date-munged_data.dob I get the following:
internal_quote_id
2 15685977 days, 23:54:30.457856
3 11651985 days, 23:49:15.359744
4 9491988 days, 23:39:55.621376
7 11907004 days, 0:10:30.196224
9 15282164 days, 23:30:30.196224
15 15282227 days, 23:50:40.261632
However, I cannot seem to extract the days as an integer so that I can continue with my calculation. Any help appreciated.
Recommended answer
You need pandas 0.11 for this (0.11rc1 is out; the final release is probably next week).
In [9]: df = DataFrame([ Timestamp('20010101'), Timestamp('20040601') ])
In [10]: df
Out[10]:
0
0 2001-01-01 00:00:00
1 2004-06-01 00:00:00
In [11]: df = DataFrame([ Timestamp('20010101'),
Timestamp('20040601') ],columns=['age'])
In [12]: df
Out[12]:
age
0 2001-01-01 00:00:00
1 2004-06-01 00:00:00
In [13]: df['today'] = Timestamp('20130419')
In [14]: df['diff'] = df['today']-df['age']
In [16]: df['years'] = df['diff'].apply(lambda x: float(x.item().days)/365)
In [17]: df
Out[17]:
age today diff years
0 2001-01-01 00:00:00 2013-04-19 00:00:00 4491 days, 00:00:00 12.304110
1 2004-06-01 00:00:00 2013-04-19 00:00:00 3244 days, 00:00:00 8.887671
You need the odd apply at the end because timedelta64[ns] scalars are not yet fully supported (in the way Timestamp is now used for datetime64[ns] scalars); that support is coming in 0.12.
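For readers on a more recent pandas, the apply workaround above is usually unnecessary. The following is a minimal sketch, assuming a pandas version that provides the `.dt` accessor on timedelta Series (this is not part of the original 0.11-era answer). The column names follow the question, and the sample dates are reused from the answer's example.

```python
import pandas as pd

# Sample data reusing the dates from the answer above;
# column names follow the question ('dob', 'entry_date').
df = pd.DataFrame({
    "dob": pd.to_datetime(["2001-01-01", "2004-06-01"]),
    "entry_date": pd.to_datetime(["2013-04-19", "2013-04-19"]),
})

# Subtracting two datetime64[ns] columns gives a timedelta64[ns] Series.
diff = df["entry_date"] - df["dob"]

# Vectorized extraction of whole days as integers (no apply needed).
df["days"] = diff.dt.days

# Approximate age in years, using the 365.25 divisor from the question.
df["age_years"] = (df["days"] / 365.25).round(2)

print(df)
```

The day counts here should match the 'diff' column in the output above (4491 and 3244 days). Dividing by 365.25 is only an approximation of age; if exact birthdays matter, comparing year, month, and day components would be a safer approach.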