Problem description
I am trying to use the drop_duplicates method on my DataFrame, but I get an error:
error: TypeError: unhashable type: 'list'
The code I am using:
df = db.drop_duplicates()
My DataFrame is huge and contains strings, floats, dates, NaNs, booleans, and integers. Any help is appreciated.
As the error message implies, drop_duplicates does not work when the DataFrame contains lists, because lists are unhashable. However, you can drop duplicates on a copy of the DataFrame cast to str and then select the matching rows from the original df using the index of the result.
Setup
import pandas as pd

df = pd.DataFrame({'Keyword': {0: 'apply', 1: 'apply', 2: 'apply', 3: 'terms', 4: 'terms'},
                   'X': {0: [1, 2], 1: [1, 2], 2: 'xy', 3: 'xx', 4: 'yy'},
                   'Y': {0: 'yy', 1: 'yy', 2: 'yx', 3: 'ix', 4: 'xi'}})
# Calling drop_duplicates directly raises the same error
df.drop_duplicates()
Traceback (most recent call last):
...
TypeError: unhashable type: 'list'
Solution
# Convert the df to str, drop duplicates, then select the surviving rows from the original df.
df.loc[df.astype(str).drop_duplicates().index]
Out[205]:
  Keyword       X   Y
0   apply  [1, 2]  yy
2   apply      xy  yx
3   terms      xx  ix
4   terms      yy  xi
# The list elements are still lists in the final result.
df.loc[df.astype(str).drop_duplicates().index].loc[0,'X']
Out[207]: [1, 2]
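A related variant, not part of the original answer: the same string-cast idea can be written with a boolean mask from duplicated() instead of an index lookup. This is only a sketch on the setup data above; the names is_dup and deduped are made up for illustration. Note that casting to str treats values with the same string form (for example 1 and '1') as equal.

```python
import pandas as pd

# Same data as in the setup above.
df = pd.DataFrame({'Keyword': {0: 'apply', 1: 'apply', 2: 'apply', 3: 'terms', 4: 'terms'},
                   'X': {0: [1, 2], 1: [1, 2], 2: 'xy', 3: 'xx', 4: 'yy'},
                   'Y': {0: 'yy', 1: 'yy', 2: 'yx', 3: 'ix', 4: 'xi'}})

# True for every row whose string representation repeats an earlier row.
is_dup = df.astype(str).duplicated()

# Keep the first occurrence of each row; the original objects (lists) are untouched.
deduped = df[~is_dup]
```

Because the mask is aligned row by row with df, this form does not depend on the index labels being unique.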
Edit: replaced iloc with loc. In this particular case both work because the index labels match the row positions, but that does not hold in general.
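To illustrate why loc is the safer choice, here is a small sketch using the same data with a non-default index. The labels 10 to 14 and the names kept, result and iloc_failed are assumptions made for this example only.

```python
import pandas as pd

# Same data as the setup, but the index labels (10..14) no longer match row positions.
df = pd.DataFrame({'Keyword': {10: 'apply', 11: 'apply', 12: 'apply', 13: 'terms', 14: 'terms'},
                   'X': {10: [1, 2], 11: [1, 2], 12: 'xy', 13: 'xx', 14: 'yy'},
                   'Y': {10: 'yy', 11: 'yy', 12: 'yx', 13: 'ix', 14: 'xi'}})

# Index labels of the rows that survive deduplication on the str-cast copy.
kept = df.astype(str).drop_duplicates().index

# loc selects by label, so it returns the intended rows.
result = df.loc[kept]

# iloc selects by position; the labels 10..14 are out of bounds for a 5-row frame.
try:
    df.iloc[list(kept)]
    iloc_failed = False
except IndexError:
    iloc_failed = True
```

With labels that happen to be valid positions, iloc would not fail but could silently return the wrong rows, which is the harder case to notice.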