Thursday, 15 July 2010

R - Compare consecutive rows in data.table and replace row values




I have a data.table in R that contains multiple status values for each user, collected at different time points. I want to compare the status values at consecutive time points and update the rows with a flag whenever the status changes. Please see the example below.

library(data.table)

# Input: one row per user (sid) and date, with two status columns
dt_a <- data.table(
  sid     = c(1, 1, 2, 2, 2, 3, 3),
  date    = as.Date(c("2014-06-22", "2014-06-23", "2014-06-22", "2014-06-23",
                      "2014-06-24", "2014-06-22", "2014-06-23")),
  status1 = c("a", "b", "a", "a", "b", "a", "a"),
  status2 = c("c", "c", "c", "c", "d", "d", "e"))

# Desired output: 1 where the status differs from the previous row of the same sid
dt_a_final <- data.table(
  sid     = c(1, 1, 2, 2, 2, 3, 3),
  date    = as.Date(c("2014-06-22", "2014-06-23", "2014-06-22", "2014-06-23",
                      "2014-06-24", "2014-06-22", "2014-06-23")),
  status1 = c("0", "1", "0", "0", "1", "0", "0"),
  status2 = c("0", "0", "0", "0", "1", "0", "1"))

The original data table dt_a is:

  sid       date status1 status2
1   1 2014-06-22       a       c
2   1 2014-06-23       b       c
3   2 2014-06-22       a       c
4   2 2014-06-23       a       c
5   2 2014-06-24       b       d
6   3 2014-06-22       a       d
7   3 2014-06-23       a       e

The final required data table dt_a_final is:

  sid       date status1 status2
1   1 2014-06-22       0       0
2   1 2014-06-23       1       0
3   2 2014-06-22       0       0
4   2 2014-06-23       0       0
5   2 2014-06-24       1       1
6   3 2014-06-22       0       0
7   3 2014-06-23       0       1

How can I accomplish this?

Here is an option:

dt_a[,
  c("s1change", "s2change") :=
    lapply(.SD, function(x) c(0, head(x, -1L) != tail(x, -1L))),
  .SDcols = c("status1", "status2"),   # .SD contains these columns
  by = sid
]

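Here, we create two new columns and populate them with lapply on .SD (defined to contain status1 and status2). The function compares every value of a column except the last with every value of the same column except the first, so each element is compared with the one that follows it. It returns TRUE every time there is a change in the column. We add a 0 at the beginning since the first value never changes; this also coerces the result to a numeric vector (thanks eddi).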
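To see what the function does outside of data.table, here is a minimal sketch in base R that applies the same comparison to a single vector. The name flag_changes is a hypothetical helper introduced only for illustration; it is not part of data.table or of the answer above.

```r
# Hypothetical helper (name chosen for illustration): flag positions
# where a value differs from the one immediately before it.
flag_changes <- function(x) {
  # head(x, -1L) drops the last element, tail(x, -1L) drops the first,
  # so the comparison pairs each element with its successor.
  # The leading 0 stands for the first element, which has no predecessor,
  # and turns the logical result into a numeric vector.
  c(0, head(x, -1L) != tail(x, -1L))
}

flag_changes(c("a", "b"))       # 0 1
flag_changes(c("a", "a", "b"))  # 0 0 1
flag_changes("a")               # 0 (a single value has nothing to compare with)
```

Running this per user is what the by = sid argument takes care of in the data.table call.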

Then we do this by sid, and voilà:

   sid       date status1 status2 s1change s2change
1:   1 2014-06-22       a       c        0        0
2:   1 2014-06-23       b       c        1        0
3:   2 2014-06-22       a       c        0        0
4:   2 2014-06-23       a       c        0        0
5:   2 2014-06-24       b       d        1        1
6:   3 2014-06-22       a       d        0        0
7:   3 2014-06-23       a       e        0        1

You can then subset to drop the original status columns if you want. It isn't possible to re-use them directly because the data type of the result is different from the original (numeric vs. character).
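One possible way to end up with the flags under the original column names is sketched below: compute the change columns, delete the character status columns, then rename. This is only a sketch under the assumption that the data looks like the example in the question; the variable names status_cols and change_cols are introduced here for illustration.

```r
library(data.table)

# Same example data as in the question
dt <- data.table(
  sid     = c(1, 1, 2, 2, 2, 3, 3),
  date    = as.Date(c("2014-06-22", "2014-06-23", "2014-06-22", "2014-06-23",
                      "2014-06-24", "2014-06-22", "2014-06-23")),
  status1 = c("a", "b", "a", "a", "b", "a", "a"),
  status2 = c("c", "c", "c", "c", "d", "d", "e"))

status_cols <- c("status1", "status2")    # illustrative names
change_cols <- c("s1change", "s2change")

# Step 1: add the numeric change flags per sid, as in the answer
dt[, (change_cols) := lapply(.SD, function(x) c(0, head(x, -1L) != tail(x, -1L))),
   .SDcols = status_cols, by = sid]

# Step 2: drop the original character columns by reference
dt[, (status_cols) := NULL]

# Step 3: give the flag columns the original names
setnames(dt, change_cols, status_cols)

print(dt)
```

Note that the flags here are numeric, whereas dt_a_final in the question stores them as character; if that matters, they could be converted with as.character afterwards.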

r data.table
