Breeding: python - Pandas GroupBy on subsets of same DataFrame -

Thursday, 15 May 2014

This question is an extension of my previous one. I have a pandas DataFrame:

    import random
    import pandas as pd

    codes = ["one", "two", "three"]
    colours = ["black", "white"]
    textures = ["soft", "hard"]
    n = 100  # length of the DataFrame

    df = pd.DataFrame({
        'id': range(1, n + 1),
        'weeks_elapsed': [random.choice(range(1, 25)) for i in range(1, n + 1)],
        'code': [random.choice(codes) for i in range(1, n + 1)],
        'colour': [random.choice(colours) for i in range(1, n + 1)],
        'texture': [random.choice(textures) for i in range(1, n + 1)],
        'size': [random.randint(1, 100) for i in range(1, n + 1)],
        'scaled_size': [random.randint(100, 1000) for i in range(1, n + 1)],
    }, columns=['id', 'weeks_elapsed', 'code', 'colour', 'texture', 'size', 'scaled_size'])

I am grouping by colour and code and computing statistics on size and scaled_size as below:

    import numpy as np

    # Sum, mean, count and index of the maximum for each (code, colour) group.
    grouped = df.groupby(['code', 'colour']).agg({
        'size': [np.sum, np.average, np.size, pd.Series.idxmax],
        'scaled_size': [np.sum, np.average, np.size, pd.Series.idxmax],
    }).reset_index()

Now I want to run the above calculations on df multiple times, for different weeks_elapsed intervals. Below is a brute-force solution; is there a more succinct and faster way to run this? Also, how can I concatenate the results for the different intervals into a single DataFrame?

    cut_offs = [4, 12]
    grouped = {c: {} for c in cut_offs}
    for c in cut_offs:
        # Keep only the rows up to the cut-off, then aggregate as before.
        grouped[c] = df.loc[df.weeks_elapsed <= c].groupby(['code', 'colour']).agg({
            'size': [np.sum, np.average, np.size, pd.Series.idxmax],
            'scaled_size': [np.sum, np.average, np.size, pd.Series.idxmax],
        }).reset_index()

I am particularly interested in np.average and np.size for the different weeks_elapsed intervals.

So this is not a working answer, but maybe it can be extended to ultimately get there.

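    filter = np.array([12, 4])
    for f in filter:
        # The smaller cut-off is applied last and overwrites the larger one,
        # so each row ends up labelled with the smallest cut-off it satisfies.
        df.loc[df['weeks_elapsed'] <= f, 'filter'] = f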

Now, df looks like this:

    >>> df.head()
    Out[384]:
       id  weeks_elapsed   code colour texture  size  adjusted_size  filter
    0   1             20    one  white    soft    64            494     NaN
    1   2              3  three  white    hard    22            650       4
    2   3             22    two  black    hard    41            770     NaN
    3   4              2    two  black    hard     4            325       4
    4   5              4    two  black    hard    19            536       4

where filter contains the smallest grouping that the row belongs to. The next step is:

    >>> df.groupby(['filter', 'code', 'colour']).agg({
    ...     'size': [np.sum, np.average, np.size, pd.Series.idxmax],
    ...     'adjusted_size': [np.sum, np.average, np.size, pd.Series.idxmax],
    ... }).reset_index()
    Out[387]:
        filter   code colour  adjusted_size                            size  \
                                        sum     average  size  idxmax   sum
    0        4    one  black           2195  548.750000     4      45   142
    1        4    one  white            286  286.000000     1      81    58
    2        4  three  black            927  463.500000     2      99   121
    3        4  three  white           5850  585.000000    10      95   511
    4        4    two  black           1102  367.333333     3       4    94
    5        4    two  white            852  852.000000     1      75     2
    6       12    one  white           2499  499.800000     5      72   267
    7       12  three  black           4709  588.625000     8      84   431
    8       12  three  white            569  189.666667     3      97   171
    9       12    two  black           2446  611.500000     4      49   241
    10      12    two  white           2859  714.750000     4      43   203

          average  size  idxmax
    0   35.500000     4       5
    1   58.000000     1      81
    2   60.500000     2      99
    3   51.100000    10      88
    4   31.333333     3      21
    5    2.000000     1      75
    6   53.400000     5      69
    7   53.875000     8      12
    8   57.000000     3      59
    9   60.250000     4      36
    10  50.750000     4      43

However, these are not the groups you are looking for: observations with filter=4 appear only in the grouping belonging to 4, not in the grouping for filter=12, even though they also satisfy weeks_elapsed <= 12.
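One possible way to extend the disjoint buckets, offered only as a sketch: sums and counts are additive, so accumulating them across the cut-offs recovers the cumulative totals, and the cumulative mean is their ratio. The data below is a small hypothetical sample, not the data from this post, and the names labelled, per_bucket, sums, counts and means are illustrative. The string aggregations 'sum' and 'count' stand in for np.sum and np.size. idxmax is not additive and is not covered.

```python
import numpy as np
import pandas as pd

# Small hypothetical sample (not the data from the post).
df = pd.DataFrame({
    'weeks_elapsed': [2, 3, 10, 11, 20, 4, 12],
    'code': ['one', 'one', 'one', 'one', 'one', 'two', 'two'],
    'colour': ['black'] * 7,
    'size': [10, 20, 30, 40, 50, 60, 70],
})
cut_offs = [4, 12]

# Label each row with the smallest cut-off it satisfies (NaN if none).
df['filter'] = np.nan
for c in sorted(cut_offs, reverse=True):
    df.loc[df['weeks_elapsed'] <= c, 'filter'] = c

# Drop rows beyond the largest cut-off and use integer bucket labels.
labelled = df.dropna(subset=['filter']).astype({'filter': int})

# Sum and count per disjoint bucket.
per_bucket = (labelled.groupby(['code', 'colour', 'filter'])['size']
              .agg(['sum', 'count']))

# One column per cut-off (missing buckets count as 0), accumulated left to right.
sums = per_bucket['sum'].unstack('filter', fill_value=0).cumsum(axis=1)
counts = per_bucket['count'].unstack('filter', fill_value=0).cumsum(axis=1)
means = sums / counts

print(means)
```

In this sample the column for cut-off 12 also includes the rows labelled 4, which is the cumulative behaviour the disjoint groups lack.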

I tried looking at expanding_mean, but that works row-wise. So far this is incomplete, but maybe it helps someone else reply to this.
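A more direct route to a single DataFrame, again only a sketch under assumptions: filter the frame once per cut-off, aggregate, and let pd.concat stack the results with the cut-off as an extra index level. The sample data and the helper name stats_up_to are hypothetical, and the strings 'mean' and 'count' are used in place of np.average and np.size (they agree when there are no missing values).

```python
import pandas as pd

# Small hypothetical sample (not the data from the post).
df = pd.DataFrame({
    'weeks_elapsed': [2, 3, 10, 11, 20, 4, 12],
    'code': ['one', 'one', 'one', 'one', 'one', 'two', 'two'],
    'colour': ['black'] * 7,
    'size': [10, 20, 30, 40, 50, 60, 70],
    'scaled_size': [100, 200, 300, 400, 500, 600, 700],
})
cut_offs = [4, 12]


def stats_up_to(frame, cut_off):
    """Mean and count of both size columns for rows with weeks_elapsed <= cut_off."""
    subset = frame[frame['weeks_elapsed'] <= cut_off]
    return (subset.groupby(['code', 'colour'])[['size', 'scaled_size']]
            .agg(['mean', 'count']))


# The dict keys become the outermost index level, named 'cut_off'.
combined = pd.concat({c: stats_up_to(df, c) for c in cut_offs},
                     names=['cut_off'])

print(combined)
```

Each cut-off still triggers its own groupby, so this is no faster than the loop in the question; it only shortens the code and returns one frame, from which a single interval can be selected with combined.loc[12].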

python pandas group-by condition dataframes
