Problem description
After reading "Parse dates when YYYYMMDD and HH are in separate columns using pandas in Python" and "Using python pandas to parse CSV with date in format Year, Day, Hour, Min, Sec", I still am not able to parse dates when the year, month, day and hour are in separate columns. My data looks like this (column 0 is the ID, column 1 the year, column 2 the month, column 3 the day, column 4 the hour and column 5 the value):
50136 2011 1 1 21 9792
50136 2011 1 1 22 9794
50136 2011 1 1 23 9796
50136 2011 1 1 0 9798
50136 2011 1 1 1 9799
50136 2011 1 1 2 9802
I've tried the following:
df = pd.read_csv(file, parse_dates={'date': [1, 2, 3, 4]}, index_col='date')
but then I get the index not as timestamps but as unicode(?):
In [17]: print df.head()
Out [17]:
0 5
date
2011 1 1 21 50136 9792
2011 1 1 22 50136 9794
2011 1 1 23 50136 9796
2011 1 1 0 50136 9798
2011 1 1 1 50136 9799
In [18]: print df.index
Out [18]:
Index([u'2011 1 1 21', u'2011 1 1 22', u'2011 1 1 23', u'2011 1 1 0', u'2011 1 1 1', u'2011 1 1 2'], dtype=object)
I'm obviously doing something wrong, but I can't figure it out. Any advice is really appreciated.
Recommended answer
If the regular methods don't work, you can always fall back on writing your own parser. Make a function that accepts the combined columns from parse_dates and returns a datetime, and pass that function in with date_parser.
For example:
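from datetime import datetime
import pandas as pd
# pd.datetime is not available in recent pandas releases, so use the standard-library datetime
df = pd.read_csv(file, header=None, index_col='datetime',
                 parse_dates={'datetime': [1, 2, 3, 4]},
                 date_parser=lambda x: datetime.strptime(x, '%Y %m %d %H'))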
This returns:
0 5
datetime
2011-01-01 21:00:00 50136 9792
2011-01-01 22:00:00 50136 9794
2011-01-01 23:00:00 50136 9796
2011-01-01 00:00:00 50136 9798
2011-01-01 01:00:00 50136 9799
2011-01-01 02:00:00 50136 9802
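To see what the format string does on its own, independent of read_csv, here is a minimal sketch using only the standard library. The sample string is one of the index labels shown in the question; the variable names are made up for illustration.

```python
from datetime import datetime

# One combined label as it appears in the question's index:
# year, month, day and hour separated by single spaces.
label = '2011 1 1 21'

# %Y = four-digit year, %m = month, %d = day, %H = hour (24-hour clock).
parsed = datetime.strptime(label, '%Y %m %d %H')

print(parsed)  # 2011-01-01 21:00:00
```

Minutes and seconds default to zero because the format string does not mention them.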
Edit: Perhaps it is clearer if you write it as a normal function instead of a lambda:
from datetime import datetime

def dt_parse(date_string):
    # Parse a combined string such as '2011 1 1 21' into a datetime.
    dt = datetime.strptime(date_string, '%Y %m %d %H')
    return dt
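As an alternative that does not rely on date_parser at all, the columns can be read as plain integers and combined afterwards with pd.to_datetime, which accepts a DataFrame whose columns are named year, month, day and hour. This is only a sketch and not the method from the answer above: the column names, the in-memory sample and the whitespace separator are assumptions made for illustration.

```python
import io
import pandas as pd

# Sample rows copied from the question; assumed whitespace-separated, no header row.
raw = """50136 2011 1 1 21 9792
50136 2011 1 1 22 9794
50136 2011 1 1 23 9796
50136 2011 1 1 0 9798
"""

# Hypothetical column names chosen for this sketch.
cols = ['id', 'year', 'month', 'day', 'hour', 'value']
df = pd.read_csv(io.StringIO(raw), sep=r'\s+', header=None, names=cols)

# Assemble one timestamp per row from the four date-part columns.
date_parts = ['year', 'month', 'day', 'hour']
df['datetime'] = pd.to_datetime(df[date_parts])

# Use the timestamps as the index and drop the now-redundant parts.
df = df.set_index('datetime').drop(columns=date_parts)

print(df)
```

After this, df.index is a DatetimeIndex, so time-based selection and resampling work on it directly.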