Problem description
I would like to create a rank within each year (so in 2012, Manager B is ranked 1, and in 2011, Manager B is ranked 1 again). I struggled with the pandas rank function for a while and do not want to resort to a for loop.
s = pd.DataFrame([['2012','A',3],['2012','B',8],['2011','A',20],['2011','B',30]], columns=['Year','Manager','Return'])
Out[1]:
   Year Manager  Return
0  2012       A       3
1  2012       B       8
2  2011       A      20
3  2011       B      30
The issue I'm having is with the additional code (didn't think this would be relevant before):
s = pd.DataFrame([['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]], columns=['Year', 'Manager', 'Return'])
b = pd.DataFrame([['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]], columns=['Year', 'Manager', 'Return'])
s = s.append(b)
s['Rank'] = s.groupby(['Year'])['Return'].rank(ascending=False)
raise Exception('Reindexing only valid with uniquely valued Index '
Exception: Reindexing only valid with uniquely valued Index objects
Any ideas? This is the real data structure I am using, and I have been having trouble re-indexing it.
Recommended answer
It sounds like you want to group by Year, then rank the Return values within each group in descending order.
import pandas as pd
s = pd.DataFrame([['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]],
                 columns=['Year', 'Manager', 'Return'])
s['Rank'] = s.groupby(['Year'])['Return'].rank(ascending=False)
print(s)
yields
   Year Manager  Return  Rank
0  2012       A       3     2
1  2012       B       8     1
2  2011       A      20     2
3  2011       B      30     1
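Note that rank returns floating-point values and, by default, averages the ranks of tied rows. If whole-number ranks are wanted, something like the following may help. It is only a sketch: the choice of method='min' and the RankInt column name are assumptions for illustration, not part of the original answer.

```python
import pandas as pd

s = pd.DataFrame(
    [['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]],
    columns=['Year', 'Manager', 'Return'],
)

# Rank within each year, highest Return first.
# method='min' gives every row in a tie the lowest rank of that tie.
ranks = s.groupby('Year')['Return'].rank(ascending=False, method='min')

# rank() returns floats; the cast is safe here because Return has no NaN.
s['RankInt'] = ranks.astype(int)  # RankInt is a hypothetical column name
print(s)
```

Other values of method (for example 'dense' or 'first') change how ties are numbered, so the right choice depends on how tied managers should be reported.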
To address the OP's revised question: The error message
ValueError: cannot reindex from a duplicate axis
occurs when trying to groupby/rank on a DataFrame with duplicate values in the index. You can avoid the problem by constructing s to have unique index values after appending:
s = pd.DataFrame([['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]], columns=['Year', 'Manager', 'Return'])
b = pd.DataFrame([['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]], columns=['Year', 'Manager', 'Return'])
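# DataFrame.append was removed in pandas 2.0; on older versions this was s.append(b, ignore_index=True)
s = pd.concat([s, b], ignore_index=True)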
yields
   Year Manager  Return
0  2012       A       3
1  2012       B       8
2  2011       A      20
3  2011       B      30
4  2012       A       3
5  2012       B       8
6  2011       A      20
7  2011       B      30
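To see whether a frame is affected, you can inspect its index directly. The sketch below uses pd.concat in place of DataFrame.append (which was removed in pandas 2.0); the variable names rows, cols, dup and uniq are made up for the example.

```python
import pandas as pd

rows = [['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]]
cols = ['Year', 'Manager', 'Return']
s = pd.DataFrame(rows, columns=cols)
b = pd.DataFrame(rows, columns=cols)

dup = pd.concat([s, b])                      # keeps the labels 0..3 from both frames
uniq = pd.concat([s, b], ignore_index=True)  # relabels the rows 0..7

print(dup.index.is_unique)    # False
print(uniq.index.is_unique)   # True

# Labels that occur more than once (second and later occurrences)
print(dup.index[dup.index.duplicated()].tolist())  # [0, 1, 2, 3]
```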
If you've already appended new rows using
s = s.append(b)
then use reset_index to create a unique index:
s = s.reset_index(drop=True)
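Putting the pieces together, here is a minimal end-to-end sketch under the same assumptions as the question's data (pd.concat stands in for the old append call, and rows and cols are illustrative names). Because every row appears twice, the default method='average' produces tied ranks of 1.5 and 3.5 rather than 1 and 2.

```python
import pandas as pd

rows = [['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]]
cols = ['Year', 'Manager', 'Return']

# Combine two copies of the data; the result has duplicate index labels.
s = pd.concat([pd.DataFrame(rows, columns=cols), pd.DataFrame(rows, columns=cols)])

# Replace the duplicated labels with a fresh 0..n-1 index.
s = s.reset_index(drop=True)

# Rank within each year, highest Return first (ties share the average rank).
s['Rank'] = s.groupby('Year')['Return'].rank(ascending=False)
print(s)
```

If the duplicated rows should not be ranked twice, dropping them first with drop_duplicates is one option, though whether that is appropriate depends on what the repeated rows represent in the real data.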