python - Pandas GroupBy on subsets of same DataFrame
This question is an extension of my previous one. I have a pandas DataFrame:
import random
import numpy as np
import pandas as pd

codes = ["one", "two", "three"]
colours = ["black", "white"]
textures = ["soft", "hard"]
n = 100  # length of the DataFrame
df = pd.DataFrame({
    'id': range(1, n + 1),
    'weeks_elapsed': [random.choice(range(1, 25)) for i in range(1, n + 1)],
    'code': [random.choice(codes) for i in range(1, n + 1)],
    'colour': [random.choice(colours) for i in range(1, n + 1)],
    'texture': [random.choice(textures) for i in range(1, n + 1)],
    'size': [random.randint(1, 100) for i in range(1, n + 1)],
    'scaled_size': [random.randint(100, 1000) for i in range(1, n + 1)]
}, columns=['id', 'weeks_elapsed', 'code', 'colour', 'texture', 'size', 'scaled_size'])

I am grouping by colour and code, and computing statistics on size and scaled_size as below:
grouped = df.groupby(['code', 'colour']).agg({
    'size': [np.sum, np.average, np.size, pd.Series.idxmax],
    'scaled_size': [np.sum, np.average, np.size, pd.Series.idxmax]
}).reset_index()

Now, I want to run the above calculations on df multiple times, for different weeks_elapsed intervals. Below is a brute-force solution; is there a more succinct and faster way to run this? Also, how can I concatenate the results for the different intervals into a single DataFrame?
cut_offs = [4, 12]
grouped = {c: {} for c in cut_offs}
for c in cut_offs:
    # .loc replaces the original .ix, which newer pandas versions no longer provide
    grouped[c] = df.loc[df.weeks_elapsed <= c].groupby(['code', 'colour']).agg({
        'size': [np.sum, np.average, np.size, pd.Series.idxmax],
        'scaled_size': [np.sum, np.average, np.size, pd.Series.idxmax]
    }).reset_index()

I am particularly interested in np.average and np.size for the different weeks_elapsed intervals.
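As for combining the per-interval results into one DataFrame, one possible sketch is to build the filtered groupby for each cut-off inside a dict comprehension and hand the dict to pd.concat, which labels each block with its cut-off. The names combined, stats and cut_off, the fixed random seed, and the trimmed-down sample frame are assumptions made for illustration; the string aggregations 'sum', 'mean', 'size' and 'idxmax' are swapped in for the NumPy functions used above.

```python
import random

import pandas as pd

random.seed(0)  # assumption: fixed seed so the sketch is reproducible
n = 100
df = pd.DataFrame({
    'weeks_elapsed': [random.choice(range(1, 25)) for _ in range(n)],
    'code': [random.choice(["one", "two", "three"]) for _ in range(n)],
    'colour': [random.choice(["black", "white"]) for _ in range(n)],
    'size': [random.randint(1, 100) for _ in range(n)],
    'scaled_size': [random.randint(100, 1000) for _ in range(n)],
})

cut_offs = [4, 12]
stats = ['sum', 'mean', 'size', 'idxmax']  # string names instead of np.* functions

# One aggregated frame per cut-off. pd.concat stacks them and uses the dict
# keys as an extra outer index level, named 'cut_off' here.
combined = pd.concat(
    {
        c: df[df.weeks_elapsed <= c]
        .groupby(['code', 'colour'])[['size', 'scaled_size']]
        .agg(stats)
        for c in cut_offs
    },
    names=['cut_off'],
)
```

The result is indexed by (cut_off, code, colour), so combined.loc[12] returns the statistics for all rows with weeks_elapsed <= 12, and combined.reset_index() gives a flat frame if that is preferred. This still runs one groupby per cut-off, so it is more compact than the loop above rather than necessarily faster.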
So this is not a working answer, but maybe it can be extended to ultimately get there.

filter = np.array([12, 4])
for f in filter:
    df.loc[(df['weeks_elapsed'] <= f), 'filter'] = f

Now, df looks like
>>> df.head()
Out[384]:
   id  weeks_elapsed  code colour texture  size  adjusted_size  filter
0   1             20     1  white    soft    64            494     NaN
1   2              3     3  white    hard    22            650       4
2   3             22     2  black    hard    41            770     NaN
3   4              2     2  black    hard     4            325       4
4   5              4     2  black    hard    19            536       4

where filter contains the smallest grouping the row belongs to. The next step is
>>> df.groupby(['filter', 'code', 'colour']).agg({'size': [np.sum, np.average, np.size, pd.Series.idxmax], 'adjusted_size': [np.sum, np.average, np.size, pd.Series.idxmax]}).reset_index()
Out[387]:
    filter  code colour  adjusted_size                          size  \
                                   sum     average size idxmax   sum
0        4     1  black           2195  548.750000    4     45   142
1        4     1  white            286  286.000000    1     81    58
2        4     3  black            927  463.500000    2     99   121
3        4     3  white           5850  585.000000   10     95   511
4        4     2  black           1102  367.333333    3      4    94
5        4     2  white            852  852.000000    1     75     2
6       12     1  white           2499  499.800000    5     72   267
7       12     3  black           4709  588.625000    8     84   431
8       12     3  white            569  189.666667    3     97   171
9       12     2  black           2446  611.500000    4     49   241
10      12     2  white           2859  714.750000    4     43   203

      average size idxmax
0   35.500000    4      5
1   58.000000    1     81
2   60.500000    2     99
3   51.100000   10     88
4   31.333333    3     21
5    2.000000    1     75
6   53.400000    5     69
7   53.875000    8     12
8   57.000000    3     59
9   60.250000    4     36
10  50.750000    4     43

However, these are not the groups you are looking for: observations with filter=4 end up only in the grouping belonging to 4, not in the grouping for filter=12.
I tried looking at expanding_mean, but that works row-wise. So far this is incomplete, but maybe it helps someone else reply to this.
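One way the disjoint filter groups might be turned into the cumulative groups the question asks for is to aggregate sum and size once per disjoint bin and then accumulate them across the bins, deriving the average as cumulative sum divided by cumulative size. The following is only a sketch under assumptions: the names per_bin, wide, cum_sum, cum_size and cum_mean are made up for illustration, the sample frame is a seeded, trimmed-down stand-in, and only the size column is handled. idxmax cannot be accumulated this way and is left out.

```python
import random

import numpy as np
import pandas as pd

random.seed(0)  # assumption: fixed seed so the sketch is reproducible
n = 100
df = pd.DataFrame({
    'weeks_elapsed': [random.choice(range(1, 25)) for _ in range(n)],
    'code': [random.choice(["one", "two", "three"]) for _ in range(n)],
    'colour': [random.choice(["black", "white"]) for _ in range(n)],
    'size': [random.randint(1, 100) for _ in range(n)],
})

cut_offs = [4, 12]

# Label each row with the smallest cut-off it falls under, as in the answer.
# Rows above every cut-off keep NaN and are dropped by groupby.
df['filter'] = np.nan
for f in sorted(cut_offs, reverse=True):
    df.loc[df['weeks_elapsed'] <= f, 'filter'] = f

# Sum and row count per disjoint bin, computed in a single groupby.
per_bin = df.groupby(['code', 'colour', 'filter'])['size'].agg(['sum', 'size'])

# One column per bin; combinations with no rows in a bin contribute 0.
wide = per_bin.unstack('filter', fill_value=0)

# Accumulating across the bins turns "in bin 12 only" into "weeks_elapsed <= 12".
cum_sum = wide['sum'].cumsum(axis=1)
cum_size = wide['size'].cumsum(axis=1)
cum_mean = cum_sum / cum_size  # NaN where a group has no rows up to that cut-off
```

Each of cum_sum, cum_size and cum_mean is indexed by (code, colour) with one column per cut-off, so the column for 12 also includes the observations that were labelled with filter=4.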
python pandas group-by condition dataframes