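So I want to bin two features (turn each one into categories) and then combine them into a new feature, the same way you would bin coordinates into grid cells on a map.
The problem is that the features are not uniformly distributed, so when binning the two features/coordinates I want to use quantiles (as pandas.qcut() does).
Is there a better way than calling qcut() on each feature separately and then concatenating the resulting labels?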
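For reference, the approach the question describes (quantile-binning each feature separately with qcut() and then joining the labels) might look roughly like this. The skewed data, the column names x and y, and the choice of 4 bins are all assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical skewed data: two unevenly distributed features
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.exponential(size=1000), "y": rng.lognormal(size=1000)})

# Quantile-bin each feature independently into 4 equal-count bins (integer labels 0..3)
x_bin = pd.qcut(df["x"], 4, labels=False)
y_bin = pd.qcut(df["y"], 4, labels=False)

# Concatenate the two bin labels into a single "grid cell" string label, e.g. "2_0"
df["cell"] = x_bin.astype(str) + "_" + y_bin.astype(str)
print(df["cell"].value_counts())
```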
Create a Cartesian product.
Consider the DataFrame df:
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(A=np.random.rand(20), B=np.random.rand(20)))
A B
0 0.538186 0.038985
1 0.185523 0.438329
2 0.652151 0.067359
3 0.746060 0.774688
4 0.373741 0.009526
5 0.603536 0.149733
6 0.775801 0.585309
7 0.091238 0.811828
8 0.504035 0.639003
9 0.671320 0.132974
10 0.619939 0.883372
11 0.301644 0.882258
12 0.956463 0.391942
13 0.702457 0.099619
14 0.367810 0.071612
15 0.454935 0.651631
16 0.882029 0.015642
17 0.880251 0.348386
18 0.496250 0.606346
19 0.805688 0.401578
We can create new categorical columns with pd.qcut:
d1 = df.assign(
A_cut=pd.qcut(df.A, 2, labels=[1, 2]),
B_cut=pd.qcut(df.B, 2, labels=list('ab'))
)
A B A_cut B_cut
0 0.538186 0.038985 1 a
1 0.185523 0.438329 1 b
2 0.652151 0.067359 2 a
3 0.746060 0.774688 2 b
4 0.373741 0.009526 1 a
5 0.603536 0.149733 1 a
6 0.775801 0.585309 2 b
7 0.091238 0.811828 1 b
8 0.504035 0.639003 1 b
9 0.671320 0.132974 2 a
10 0.619939 0.883372 2 b
11 0.301644 0.882258 1 b
12 0.956463 0.391942 2 a
13 0.702457 0.099619 2 a
14 0.367810 0.071612 1 a
15 0.454935 0.651631 1 b
16 0.882029 0.015642 2 a
17 0.880251 0.348386 2 a
18 0.496250 0.606346 1 b
19 0.805688 0.401578 2 b
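To illustrate why qcut() suits unevenly distributed features: it cuts at quantiles, so each bin gets (roughly) the same number of rows, whereas pd.cut() splits the value range into equal-width intervals. The cubic series below is an assumed, deliberately skewed example.

```python
import numpy as np
import pandas as pd

# Hypothetical skewed feature: most values are small, a few are very large
s = pd.Series(np.arange(1, 101) ** 3, dtype=float)

# pd.cut: 4 equal-width intervals over [min, max] -> very uneven bin counts
width_counts = np.bincount(pd.cut(s, 4, labels=False)).tolist()

# pd.qcut: 4 intervals bounded by quartiles -> equal bin counts
quantile_counts = np.bincount(pd.qcut(s, 4, labels=False)).tolist()

print(width_counts)     # [62, 17, 11, 10]
print(quantile_counts)  # [25, 25, 25, 25]
```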
You can create the Cartesian product using tuples:
d2 = d1.assign(cartesian=pd.Categorical(d1.filter(regex='_cut').apply(tuple, axis=1)))
print(d2)
A B A_cut B_cut cartesian
0 0.538186 0.038985 1 a (1, a)
1 0.185523 0.438329 1 b (1, b)
2 0.652151 0.067359 2 a (2, a)
3 0.746060 0.774688 2 b (2, b)
4 0.373741 0.009526 1 a (1, a)
5 0.603536 0.149733 1 a (1, a)
6 0.775801 0.585309 2 b (2, b)
7 0.091238 0.811828 1 b (1, b)
8 0.504035 0.639003 1 b (1, b)
9 0.671320 0.132974 2 a (2, a)
10 0.619939 0.883372 2 b (2, b)
11 0.301644 0.882258 1 b (1, b)
12 0.956463 0.391942 2 a (2, a)
13 0.702457 0.099619 2 a (2, a)
14 0.367810 0.071612 1 a (1, a)
15 0.454935 0.651631 1 b (1, b)
16 0.882029 0.015642 2 a (2, a)
17 0.880251 0.348386 2 a (2, a)
18 0.496250 0.606346 1 b (1, b)
19 0.805688 0.401578 2 b (2, b)
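If a single integer per grid cell is more convenient than a tuple (for example as a model feature), one alternative, which is my own suggestion rather than part of the tuple approach above, is to combine the category codes row-major style: cell = code_A * n_B + code_B. The seeded data is hypothetical, and since .cat.codes is -1 for missing values, this sketch assumes there are no NaNs.

```python
import numpy as np
import pandas as pd

# Assumed example data, seeded so the result is reproducible
rng = np.random.default_rng(42)
df = pd.DataFrame({"A": rng.random(20), "B": rng.random(20)})

n_a, n_b = 2, 2  # number of quantile bins per feature
a_cut = pd.qcut(df["A"], n_a)
b_cut = pd.qcut(df["B"], n_b)

# Row-major cell id in range [0, n_a * n_b); assumes no NaNs (their code would be -1)
df["cell"] = a_cut.cat.codes * n_b + b_cut.cat.codes

# Each cell id maps back to exactly one (A bin, B bin) pair
print(df.groupby("cell").size())
```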
You can even declare an ordering for them if you want.
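A minimal sketch of declaring that ordering, assuming you want every (A bin, B bin) combination listed up front in row-major order: pass the full product of both sets of categories as categories together with ordered=True. The seeded data is illustrative only. A side effect of declaring all combinations is that empty grid cells still show up (with a count of 0) in value_counts().

```python
import itertools

import numpy as np
import pandas as pd

# Assumed example data with a fixed seed (not the numbers shown above)
rng = np.random.default_rng(0)
df = pd.DataFrame({"A": rng.random(20), "B": rng.random(20)})

a_cut = pd.qcut(df["A"], 2, labels=[1, 2])
b_cut = pd.qcut(df["B"], 2, labels=list("ab"))

# Declare every (A bin, B bin) pair in row-major order:
# (1, 'a') < (1, 'b') < (2, 'a') < (2, 'b')
cells = list(itertools.product(a_cut.cat.categories, b_cut.cat.categories))

# Ordered categorical of tuples; every observed pair must be one of the declared cells
cartesian = pd.Categorical(list(zip(a_cut, b_cut)), categories=cells, ordered=True)

# Counts are reported for every declared cell, including empty ones
print(pd.Series(cartesian).value_counts(sort=False))
```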