Question
I'm looking for a way to do the equivalent of this SQL:
SELECT DISTINCT col1, col2 FROM dataframe_table
The pandas "Comparison with SQL" documentation doesn't have anything about DISTINCT.
.unique() only works for a single column, so I suppose I could concatenate the columns, or put them in a list/tuple and compare that way, but this seems like something pandas should do in a more native way.
Am I missing something obvious, or is there no way to do this?
Recommended answer
You can use the drop_duplicates method to get the unique rows in a DataFrame:
In [29]: df = pd.DataFrame({'a': [1, 2, 1, 2], 'b': [3, 4, 3, 5]})

In [30]: df
Out[30]:
   a  b
0  1  3
1  2  4
2  1  3
3  2  5

In [32]: df.drop_duplicates()
Out[32]:
   a  b
0  1  3
1  2  4
3  2  5
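To mirror SELECT DISTINCT col1, col2 when the frame has more columns than the ones you care about, one option is to select those columns first and then drop duplicates. This is a minimal sketch; the frame contents and the extra column named other are made up for illustration:

```python
import pandas as pd

# Hypothetical data: col1, col2 and other are illustrative column names,
# not taken from a real table.
dataframe_table = pd.DataFrame({
    "col1": [1, 2, 1, 2],
    "col2": [3, 4, 3, 5],
    "other": ["w", "x", "y", "z"],
})

# Keep only the columns of interest, then drop repeated (col1, col2) pairs.
# The first occurrence of each pair is retained, along with its original index.
distinct_pairs = dataframe_table[["col1", "col2"]].drop_duplicates()
print(distinct_pairs)
```

If the original index labels are not needed, chaining .reset_index(drop=True) gives a fresh 0..n-1 index.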
You can also provide the subset keyword argument if you only want to use certain columns to determine uniqueness. See the docstring.
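As a rough illustration of subset, here is a sketch that reuses the example frame from the answer. Note that, unlike SELECT DISTINCT on a column list, subset only controls which columns are compared; all columns are still returned:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 1, 2], 'b': [3, 4, 3, 5]})

# Uniqueness is judged on column 'a' only; by default the first row
# seen for each value of 'a' is kept.
unique_on_a = df.drop_duplicates(subset=['a'])
print(unique_on_a)

# keep='last' retains the last row for each value of 'a' instead.
last_on_a = df.drop_duplicates(subset=['a'], keep='last')
print(last_on_a)
```

Which row survives for each key depends on keep, so it is worth choosing it deliberately when the non-key columns differ between duplicates.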