久久久久久久av_日韩在线中文_看一级毛片视频_日本精品二区_成人深夜福利视频_武道仙尊动漫在线观看

如何在 python 中使用 pandas 獲取所有重復項的列表

How do I get a list of all the duplicate items using pandas in python?(如何在 python 中使用 pandas 獲取所有重復項的列表?)
本文介紹了如何在 python 中使用 pandas 獲取所有重復項的列表?的處理方法,對大家解決問題具有一定的參考價值,需要的朋友們下面隨著小編來一起學習吧!

問題描述

我有一份可能存在出口問題的物品清單.我想獲取重復項目的列表,以便手動比較它們.當我嘗試使用 pandas 重復方法時,它只返回第一個復制.有沒有辦法獲取所有重復項,而不僅僅是第一個?

I have a list of items that likely has some export issues. I would like to get a list of the duplicate items so I can manually compare them. When I try to use pandas duplicated method, it only returns the first duplicate. Is there a a way to get all of the duplicates and not just the first one?

我的數據集的一小部分如下所示:

A small subsection of my dataset looks like this:

ID,ENROLLMENT_DATE,TRAINER_MANAGING,TRAINER_OPERATOR,FIRST_VISIT_DATE
1536D,12-Feb-12,"06DA1B3-Lebanon NH",,15-Feb-12
F15D,18-May-12,"06405B2-Lebanon NH",,25-Jul-12
8096,8-Aug-12,"0643D38-Hanover NH","0643D38-Hanover NH",25-Jun-12
A036,1-Apr-12,"06CB8CF-Hanover NH","06CB8CF-Hanover NH",9-Aug-12
8944,19-Feb-12,"06D26AD-Hanover NH",,4-Feb-12
1004E,8-Jun-12,"06388B2-Lebanon NH",,24-Dec-11
11795,3-Jul-12,"0649597-White River VT","0649597-White River VT",30-Mar-12
30D7,11-Nov-12,"06D95A3-Hanover NH","06D95A3-Hanover NH",30-Nov-11
3AE2,21-Feb-12,"06405B2-Lebanon NH",,26-Oct-12
B0FE,17-Feb-12,"06D1B9D-Hartland VT",,16-Feb-12
127A1,11-Dec-11,"064456E-Hanover NH","064456E-Hanover NH",11-Nov-12
161FF,20-Feb-12,"0643D38-Hanover NH","0643D38-Hanover NH",3-Jul-12
A036,30-Nov-11,"063B208-Randolph VT","063B208-Randolph VT",
475B,25-Sep-12,"06D26AD-Hanover NH",,5-Nov-12
151A3,7-Mar-12,"06388B2-Lebanon NH",,16-Nov-12
CA62,3-Jan-12,,,
D31B,18-Dec-11,"06405B2-Lebanon NH",,9-Jan-12
20F5,8-Jul-12,"0669C50-Randolph VT",,3-Feb-12
8096,19-Dec-11,"0649597-White River VT","0649597-White River VT",9-Apr-12
14E48,1-Aug-12,"06D3206-Hanover NH",,
177F8,20-Aug-12,"063B208-Randolph VT","063B208-Randolph VT",5-May-12
553E,11-Oct-12,"06D95A3-Hanover NH","06D95A3-Hanover NH",8-Mar-12
12D5F,18-Jul-12,"0649597-White River VT","0649597-White River VT",2-Nov-12
C6DC,13-Apr-12,"06388B2-Lebanon NH",,
11795,27-Feb-12,"0643D38-Hanover NH","0643D38-Hanover NH",19-Jun-12
17B43,11-Aug-12,,,22-Oct-12
A036,11-Aug-12,"06D3206-Hanover NH",,19-Jun-12

我的代碼目前看起來像這樣:

My code looks like this currently:

df_bigdata_duplicates = df_bigdata[df_bigdata.duplicated(cols='ID')]

有幾個重復的項目.但是,當我使用上面的代碼時,我只得到第一項.在 API 參考中,我看到了如何獲得最后一項,但我想擁有所有這些,這樣我就可以直觀地檢查它們以了解為什么會出現差異.所以,在這個例子中,我想獲取所有三個 A036 條目和 11795 條目以及任何其他重復的條目,而不是第一個.非常感謝任何幫助.

There area a couple duplicate items. But, when I use the above code, I only get the first item. In the API reference, I see how I can get the last item, but I would like to have all of them so I can visually inspect them to see why I am getting the discrepancy. So, in this example I would like to get all three A036 entries and both 11795 entries and any other duplicated entries, instead of the just first one. Any help is most appreciated.

推薦答案

方法#1:打印ID為重復ID之一的所有行:

Method #1: print all rows where the ID is one of the IDs in duplicated:

>>> import pandas as pd
>>> df = pd.read_csv("dup.csv")
>>> ids = df["ID"]
>>> df[ids.isin(ids[ids.duplicated()])].sort("ID")
       ID ENROLLMENT_DATE        TRAINER_MANAGING        TRAINER_OPERATOR FIRST_VISIT_DATE
24  11795       27-Feb-12      0643D38-Hanover NH      0643D38-Hanover NH        19-Jun-12
6   11795        3-Jul-12  0649597-White River VT  0649597-White River VT        30-Mar-12
18   8096       19-Dec-11  0649597-White River VT  0649597-White River VT         9-Apr-12
2    8096        8-Aug-12      0643D38-Hanover NH      0643D38-Hanover NH        25-Jun-12
12   A036       30-Nov-11     063B208-Randolph VT     063B208-Randolph VT              NaN
3    A036        1-Apr-12      06CB8CF-Hanover NH      06CB8CF-Hanover NH         9-Aug-12
26   A036       11-Aug-12      06D3206-Hanover NH                     NaN        19-Jun-12

但我想不出一個很好的方法來防止重復 ids 這么多次.我更喜歡方法 #2:ID 上的 groupby.

but I couldn't think of a nice way to prevent repeating ids so many times. I prefer method #2: groupby on the ID.

>>> pd.concat(g for _, g in df.groupby("ID") if len(g) > 1)
       ID ENROLLMENT_DATE        TRAINER_MANAGING        TRAINER_OPERATOR FIRST_VISIT_DATE
6   11795        3-Jul-12  0649597-White River VT  0649597-White River VT        30-Mar-12
24  11795       27-Feb-12      0643D38-Hanover NH      0643D38-Hanover NH        19-Jun-12
2    8096        8-Aug-12      0643D38-Hanover NH      0643D38-Hanover NH        25-Jun-12
18   8096       19-Dec-11  0649597-White River VT  0649597-White River VT         9-Apr-12
3    A036        1-Apr-12      06CB8CF-Hanover NH      06CB8CF-Hanover NH         9-Aug-12
12   A036       30-Nov-11     063B208-Randolph VT     063B208-Randolph VT              NaN
26   A036       11-Aug-12      06D3206-Hanover NH                     NaN        19-Jun-12

這篇關于如何在 python 中使用 pandas 獲取所有重復項的列表?的文章就介紹到這了,希望我們推薦的答案對大家有所幫助,也希望大家多多支持html5模板網!

【網站聲明】本站部分內容來源于互聯網,旨在幫助大家更快的解決問題,如果有圖片或者內容侵犯了您的權益,請聯系我們刪除處理,感謝您的支持!

相關文檔推薦

How to draw a rectangle around a region of interest in python(如何在python中的感興趣區域周圍繪制一個矩形)
How can I detect and track people using OpenCV?(如何使用 OpenCV 檢測和跟蹤人員?)
How to apply threshold within multiple rectangular bounding boxes in an image?(如何在圖像的多個矩形邊界框中應用閾值?)
How can I download a specific part of Coco Dataset?(如何下載 Coco Dataset 的特定部分?)
Detect image orientation angle based on text direction(根據文本方向檢測圖像方向角度)
Detect centre and angle of rectangles in an image using Opencv(使用 Opencv 檢測圖像中矩形的中心和角度)
主站蜘蛛池模板: 男女网站免费观看 | 一级片在线免费播放 | 久久国产欧美日韩精品 | 久久婷婷国产麻豆91 | 亚洲区一区二 | 久久久久亚洲精品 | 一区二区三区在线观看视频 | 国产精品久久久久久福利一牛影视 | 日本久久一区 | 永久网站 | 在线色网 | 天天综合天天 | 永久免费av| 妞干网av| 成人精品在线观看 | 亚洲综合视频 | 夏同学福利网 | 久久精品国内 | 国产中文原创 | 一区二区三区亚洲视频 | 日韩电影免费观看中文字幕 | 欧美日韩国产中文 | 亚洲欧美男人天堂 | 欧美成人二区 | 一区二区三区视频在线免费观看 | 国产区在线免费观看 | 国产精品亚洲片在线播放 | 日韩区| 欧美日韩一区二区在线播放 | 国产亚洲一区二区精品 | 午夜精品一区二区三区在线 | 日韩电影a| 成在线人视频免费视频 | 91伊人| 国产午夜三级一区二区三 | 精品久久久久久久 | 羞羞网站免费 | 天天操天天干天天爽 | 欧美日韩国产一区 | 国产精品不卡一区 | 国产美女免费视频 |