Problem description
I have a pandas dataframe as follows:
A B C
1 2 x
1 2 y
3 4 z
3 5 x
I want only one row to remain out of each set of rows that share the same values in specific columns, in this example columns A and B. In other words, if a combination of values in columns A and B occurs more than once in the dataframe, only one of those rows should be kept (which one does not matter).
FWIW: the maximum number of such duplicate rows (that is, rows where columns A and B are the same) is 2.
The result should look like this:
A B C
1 2 x
3 4 z
3 5 x
or
A B C
1 2 y
3 4 z
3 5 x
Recommended answer
Use drop_duplicates with the subset parameter so that only columns A and B are compared. By default the first row of each duplicate group is kept; to keep the last one instead, add keep='last':
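import pandas as pd
# sample data from the question
df = pd.DataFrame({'A': [1, 1, 3, 3], 'B': [2, 2, 4, 5], 'C': ['x', 'y', 'z', 'x']})
df1 = df.drop_duplicates(subset=['A', 'B'])
# same as:
# df1 = df.drop_duplicates(subset=['A', 'B'], keep='first')
print(df1)
   A  B  C
0  1  2  x
2  3  4  z
3  3  5  x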
To keep the last row of each duplicate group instead:
df2 = df.drop_duplicates(subset=['A', 'B'], keep='last')
print(df2)
   A  B  C
1  1  2  y
2  3  4  z
3  3  5  x
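For anyone who wants to try this end to end, here is a minimal self-contained sketch. The DataFrame construction is an assumption reconstructed from the table in the question, and the names dupes, first_kept and last_kept are only illustrative. It also shows two optional extras that are not part of the answer above: inspecting the affected rows with duplicated(..., keep=False) before dropping anything, and renumbering the index afterwards with reset_index(drop=True).

```python
import pandas as pd

# Sample data reconstructed from the question's table (an assumption).
df = pd.DataFrame({
    "A": [1, 1, 3, 3],
    "B": [2, 2, 4, 5],
    "C": ["x", "y", "z", "x"],
})

# Rows whose (A, B) pair also appears in at least one other row.
dupes = df[df.duplicated(subset=["A", "B"], keep=False)]

# Keep the first row of each (A, B) group and renumber the index from 0.
first_kept = df.drop_duplicates(subset=["A", "B"]).reset_index(drop=True)

# Keep the last row of each (A, B) group instead.
last_kept = df.drop_duplicates(subset=["A", "B"], keep="last").reset_index(drop=True)

print(dupes)
print(first_kept)
print(last_kept)
```

Note that drop_duplicates returns a new DataFrame and leaves df unchanged, so the result has to be assigned to a name, as done here.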