Breeding: R aggregate all possible combinations incl. "don't cares"

Thursday, 15 April 2010


Say we've got a data frame with 3 columns representing 3 different cases, each of which can be in state 0 or 1. A 4th column contains a measurement.

set.seed(123)
df <- data.frame(round(runif(25)),
                 round(runif(25)),
                 round(runif(25)),
                 runif(25))
colnames(df) <- c("v1", "v2", "v3", "x")
head(df)

  v1 v2 v3         x
1  0  1  0 0.2201189
2  1  1  0 0.3798165
3  0  1  1 0.6127710

aggregate(df$x, by=list(df$v1, df$v2, df$v3), FUN=mean)

  Group.1 Group.2 Group.3         x
1       0       0       0 0.1028646
2       1       0       0 0.5081943
3       0       1       0 0.4828984
4       1       1       0 0.5197925
5       0       0       1 0.4571073
6       1       0       1 0.3219217
7       0       1       1 0.6127710
8       1       1       1 0.6029213

The aggregate function calculates the mean for all possible combinations. However, in my research I also need to know the outcome of combinations where some columns may have any state. For example, the mean of all observations with v1==1 & v2==1, regardless of the contents of v3. The result should look like this, with an asterisk representing "don't care":

  Group.1 Group.2 Group.3         x
1       *       *       * 0.1234567 (this is the mean of all rows)
2       0       *       * 0.1234567
3       1       *       * 0.1234567
4       *       0       * 0.1224567
5       *       1       * 0.1234567
[ all other possible combinations follow, there should be a total of 27 rows ]

Is there an easy way to accomplish this?
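To make the target concrete, a single "don't care" mean can be computed by hand with plain logical subsetting. This is only a sketch of what one row of the desired table means, not the general solution; the data frame is rebuilt here so the snippet runs on its own, and the name m is just illustrative.

```r
# Rebuild the example data (same seed and draw order as above)
set.seed(123)
df <- data.frame(v1 = round(runif(25)),
                 v2 = round(runif(25)),
                 v3 = round(runif(25)),
                 x  = runif(25))

# Mean of x for v1 == 1 and v2 == 1, ignoring v3 ("1 1 *")
m <- mean(df$x[df$v1 == 1 & df$v2 == 1])
m
```

Doing this for every pattern by hand does not scale, which is why a method that enumerates the combinations is needed.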

Here is the ldply-ddply method:

library(plyr)
ldply(list(.(v1,v2,v3),.(v1),.(v2),.()), function(y) ddply(df,y,summarise,x=mean(x)))

   v1 v2 v3         x  .id
1   0  0  0 0.1028646 <NA>
2   0  0  1 0.4571073 <NA>
3   0  1  0 0.4828984 <NA>
4   0  1  1 0.6127710 <NA>
5   1  0  0 0.5081943 <NA>
6   1  0  1 0.3219217 <NA>
7   1  1  0 0.5197925 <NA>
8   1  1  1 0.6029213 <NA>
9   0 NA NA 0.4436400 <NA>
10  1 NA NA 0.4639997 <NA>
11 NA  0 NA 0.4118793 <NA>
12 NA  1 NA 0.5362985 <NA>
13 NA NA NA 0.4566702 <NA>

Essentially, you create a list of all the variable combinations you are interested in, and iterate over it with ldply, using ddply to perform the aggregation. The magic of plyr puts it all into a compact data frame for you. All that remains is to remove the spurious .id column introduced by the grand mean (.()) and to replace the NAs in the groups with "*" if needed.
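The clean-up step described above might look like the following sketch. It uses base R only; res is a hypothetical hand-built stand-in for a few rows of the plyr result, and groups is an illustrative name for the grouping columns.

```r
# Hypothetical stand-in for part of the ldply/ddply output
res <- data.frame(
  v1  = c(0, 1, NA, NA),
  v2  = c(NA, NA, 0, NA),
  v3  = c(NA, NA, NA, NA),
  x   = c(0.4436400, 0.4639997, 0.4118793, 0.4566702),
  .id = NA
)

# Drop the spurious .id column
res$.id <- NULL

# Replace NA in the grouping columns with "*";
# note that these columns become character as a result
groups <- c("v1", "v2", "v3")
res[groups] <- lapply(res[groups], function(col) {
  ifelse(is.na(col), "*", as.character(col))
})
res
```

Because the grouping columns end up as character, this conversion is probably best kept as the very last step, after any numeric filtering or sorting.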

To get all combinations, you can use combn and lapply to generate a list with all the relevant combinations to plug into ldply:

all.combs <- unlist(lapply(0:3,combn,x=c("v1","v2","v3"),simplify=FALSE),recursive=FALSE)
ldply(all.combs, function(y) ddply(df,y,summarise,x=mean(x)))

    .id         x v1 v2 v3
1  <NA> 0.4566702 NA NA NA
2  <NA> 0.4436400  0 NA NA
3  <NA> 0.4639997  1 NA NA
4  <NA> 0.4118793 NA  0 NA
5  <NA> 0.5362985 NA  1 NA
6  <NA> 0.4738541 NA NA  0
7  <NA> 0.4380543 NA NA  1
8  <NA> 0.3862588  0  0 NA
9  <NA> 0.5153666  0  1 NA
10 <NA> 0.4235250  1  0 NA
11 <NA> 0.5530440  1  1 NA
12 <NA> 0.3878900  0 NA  0
13 <NA> 0.4882400  0 NA  1
14 <NA> 0.5120604  1 NA  0
15 <NA> 0.4022073  1 NA  1
16 <NA> 0.4502901 NA  0  0
17 <NA> 0.3820042 NA  0  1
18 <NA> 0.5013455 NA  1  0
19 <NA> 0.6062045 NA  1  1
20 <NA> 0.1028646  0  0  0
21 <NA> 0.4571073  0  0  1
22 <NA> 0.4828984  0  1  0
23 <NA> 0.6127710  0  1  1
24 <NA> 0.5081943  1  0  0
25 <NA> 0.3219217  1  0  1
26 <NA> 0.5197925  1  1  0
27 <NA> 0.6029213  1  1  1
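For readers without plyr, the same idea can be sketched in base R with aggregate and rbind. This is an alternative written for illustration, not the method from the answer above: the helper summarise_subset and the names vars, subsets and result are made up here, and the empty subset (grand mean) is added explicitly rather than relying on combn with m = 0.

```r
# Rebuild the example data (same seed and draw order as above)
set.seed(123)
df <- data.frame(v1 = round(runif(25)),
                 v2 = round(runif(25)),
                 v3 = round(runif(25)),
                 x  = runif(25))
vars <- c("v1", "v2", "v3")

# All subsets of the grouping variables, empty subset (grand mean) first
subsets <- c(list(character(0)),
             unlist(lapply(seq_along(vars), combn, x = vars, simplify = FALSE),
                    recursive = FALSE))

# Hypothetical helper: mean of x per group for one subset of variables
summarise_subset <- function(keep) {
  if (length(keep) == 0) {
    out <- data.frame(x = mean(df$x))
  } else {
    out <- aggregate(df["x"], by = df[keep], FUN = mean)
  }
  # Variables that were not grouped on are "don't care"
  for (v in setdiff(vars, keep)) out[[v]] <- "*"
  # Grouped variables become character so all pieces can be stacked
  for (v in keep) out[[v]] <- as.character(out[[v]])
  out[c(vars, "x")]
}

result <- do.call(rbind, lapply(subsets, summarise_subset))
nrow(result)
```

With three binary variables there are 1 + 3 + 3 + 1 = 8 subsets contributing at most 1 + 6 + 12 + 8 = 27 rows, which is the total asked for in the question; a combination that never occurs in the data simply produces no row.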
