Saturday, 15 January 2011

Create new column using row-wise operations in Data.Table R -



Create new column using row-wise operations in Data.Table R -

i have 2 data.tables in r follows:

dt_a columns sid, date, value1, value2, sid primary key , date secondary key.

dt_b has columns sid, date1, date2, date3, ...., date12. here sid primary key , hence each row corresponds unique sid , other column names correspond secondary keys in dt_a.

i want add together column dt_a ith row contains corresponding element table dt_b[sid_i, date_i]

below illustration code , desired output:

require(data.table) dt_a <- data.table(sid = c(1,2,3,4,5,1,3), date = c("jan 2012", "feb 2012", "april 2012", "may 2012", "dec 2012", "feb 2012", "oct 2012"), value1 = rep("1", 7), value2 = rep("1", 7)) dt_b <- data.table(sid = as.character(c(1,2,3,4,5)), "jan 2012" = rep("1", 5), "feb 2012" = rep("2", 5), "march 2012" = rep("3", 5), "april 2012" = rep("4", 5), "may 2012" = rep("5", 5), "june 2012" = rep("6", 5), "july 2012" = rep("7", 5), "aug 2012" = rep("8", 5), "sept 2012" = rep("9", 5), "oct 2012" = rep("10", 5), "nov 2012" = rep("11", 5), "dec 2012" = rep("12", 5)) #set keys setkey(dt_a, sid, date) setkey(dt_b, sid) #define function fun1 fun1 <- function(x){ trycatch(dt_b[x[1], x[2], with=false], error = function(e) null) } #desired output dt_a$newcol <- sapply(apply(dt_a, 1, fun1),"[[",1) dt_a

although current method works on little example, actual dt_a has 20 1000000 rows. method hangs there. can operation in more efficient way using data.table or other method?

melt sec data.table:

library(reshape2) dt_b.melted = melt(dt_b, id.vars = 'sid', variable.name = 'date')

once it's melted, can set both same key, , join/assign/whatnot:

setkey(dt_b.melted, sid, date) setkey(dt_a, sid, date) dt_a[dt_b.melted, newcol := value]

r data.table

No comments:

Post a Comment