Sunday, 15 August 2010

R - reshape2 - multiple rows to single rows with name change and varying numbers of rows per group




I have the following data set and I am new to the reshape2 functions:

df<-data.frame(site=c("a", "a", "a", "b", "b", "c"), polygonarea=c(0.6, 0.3, 0.1, 0.7, 0.3, 1.0), originyear=c(1900, 1910, 1905, 1950, 1975, 2000))

I want to turn df into a data frame with this structure:

df2<-data.frame(site=c("a", "b", "c"), polygonarea1=c(0.6, 0.7, 1.0), originyear1=c(1900, 1950, 2000), polygonarea2=c(0.3, 0.3, NA), originyear2=c(1910, 1975, NA), polygonarea3=c(0.1, NA, NA), originyear3=c(1905, NA, NA))

I've tried without success to use the reshape2 package, because this example has NA cells and no aggregate function. Also, the column headers change names (i.e., they gain a '1', '2', '3', etc. suffix).

How do I do this?

You can do

library(reshape2)
df2 <- dcast( melt(cbind(df, id=ave(rep.int(1, nrow(df)), df$site, FUN=seq_along)), id.vars=c("site","id")), site~variable+id )

Here we use ave to give each row a unique id within each site. Looking at that part on its own, it gives

# cbind(df, id=ave(rep.int(1, nrow(df)), df$site, FUN=seq_along))
  site polygonarea originyear id
1    a         0.6       1900  1
2    a         0.3       1910  2
3    a         0.1       1905  3
4    b         0.7       1950  1
5    b         0.3       1975  2
6    c         1.0       2000  1
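The per-group counter can be tried in isolation. This is a minimal sketch in base R; the `site` vector below is a stand-in that mirrors the site column of the question, not part of the original answer.

```r
# Stand-in group labels, mirroring the site column of df
site <- c("a", "a", "a", "b", "b", "c")

# ave() splits its first argument by the grouping vector, applies FUN to
# each piece, and puts the results back in the original row order.
# The argument name must be upper-case FUN; a lower-case `fun=` is not
# matched to it.
id <- ave(rep.int(1, length(site)), site, FUN = seq_along)

print(id)
```

Each site restarts its count at 1, which is what later lets dcast build one column per position within a site.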

Then we melt the variables, keeping site and id as the identifier columns. That looks like

# head(melt(cbind(df, id=ave(rep.int(1, nrow(df)), df$site, FUN=seq_along)), id.vars=c("site","id")))
  site id    variable value
1    a  1 polygonarea   0.6
2    a  2 polygonarea   0.3
3    a  3 polygonarea   0.1
4    b  1 polygonarea   0.7
5    b  2 polygonarea   0.3
6    c  1 polygonarea   1.0

Then we dcast them into the layout we want.

  site polygonarea_1 polygonarea_2 polygonarea_3 originyear_1 originyear_2 originyear_3
1    a           0.6           0.3           0.1         1900         1910         1905
2    b           0.7           0.3            NA         1950         1975           NA
3    c           1.0            NA            NA         2000           NA           NA
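The same pipeline may be easier to follow when split into named steps. This is a sketch that assumes the reshape2 package is installed; the intermediate names `long` and `wide` are illustrative and do not appear in the original answer.

```r
library(reshape2)  # provides melt() and dcast()

df <- data.frame(
  site        = c("a", "a", "a", "b", "b", "c"),
  polygonarea = c(0.6, 0.3, 0.1, 0.7, 0.3, 1.0),
  originyear  = c(1900, 1910, 1905, 1950, 1975, 2000)
)

# Step 1: number the rows within each site (1, 2, 3, ...)
df$id <- ave(rep.int(1, nrow(df)), df$site, FUN = seq_along)

# Step 2: long format, one row per site/id/variable combination
long <- melt(df, id.vars = c("site", "id"))

# Step 3: wide format, one row per site and one column per variable/id
# pair; combinations that do not occur in the data become NA
wide <- dcast(long, site ~ variable + id)

print(wide)
```

Because every site/variable/id combination occurs at most once, dcast has nothing to aggregate, so no aggregate function is needed.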

This puts a "_" in the variable names. If you want to remove it, you can do

names(df2) <- gsub("_(?=[^_]+$)","", names(df2), perl=TRUE)

(It's a bit awkward because we want to make sure we remove only the last "_" and not any others.)
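To see why the lookahead matters, the pattern can be run on a name that contains more than one underscore. This is a small base R sketch; the column names in `nms` are made up for illustration.

```r
# Made-up column names; the second one contains an extra underscore
nms <- c("site", "polygon_area_1", "originyear_2")

# "_(?=[^_]+$)" matches an underscore only when everything after it, up
# to the end of the string, contains no further underscore, i.e. only
# the last one. Lookaheads require perl = TRUE.
cleaned <- gsub("_(?=[^_]+$)", "", nms, perl = TRUE)

print(cleaned)
```

The underscore inside "polygon_area" is kept, and only the separator in front of the numeric suffix is dropped.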

r plyr reshape reshape2
