r - How to use dplyr to eliminate for loops?
Does anyone know of a dplyr method for doing pairwise matching on data with missing observations, followed by some subsequent arithmetic? Below is a for-loop-heavy MWE in base R; I couldn't get my arms around a dplyr approach (despite the first-class vignettes and documentation).
In brief, the code calculates dev, the average of the non-missing quantity observations q sold at adjacent (adj) stores in a given week.
Edit: I'm interested in states with divergent policies. Let the vertical line below represent the state boundary: counties 1, 2, and 3 are in state A (with policy A), and counties 4, 5, and 6 are in state B (with policy B). Counties may have multiple stores.
    ----|----
     1  |  4
        |----
    ----|  5
     2  |
    ----|----
     3  |  6
    ----|----

contig.id identifies a county that is contiguous with 1 or more counties in the opposite state. For example, county 1 (contig.id == 1) is adjacent to counties 4 and 5 in the opposite state (adj1 == 4 and adj2 == 5), but we disregard county 2's geographic adjacency since 1 and 2 are in the same state.
By the same method, county 4 (contig.id == 4) is adjacent only to county 1 (adj1 == 1 and adj2 == NA). End edit.
    df <- data.frame(store = c(1001,1001,145,331,228,228,500,500,61,1135),
                     end.week = c(20061125,20061118,20061125,20061125,20061125,
                                  20061118,20061125,20061118,20061118,20061125),
                     contig.id = c(1,1,2,3,4,4,4,4,5,NA),
                     adj1 = c(4,4,5,6,1,1,1,1,1,NA),
                     adj2 = c(5,5,NA,NA,NA,NA,NA,NA,2,NA),
                     q = c(12.25,14.5,18.75,16,16.5,22,55.25,8.25,24,37.75))

    dev <- NULL
    dev1 <- NULL

    for (i in 1:length(df$contig.id)) {
      temp1 <- integer(0)
      temp2 <- integer(0)
      if (is.na(df$contig.id[i]) == FALSE) {
        temp1 <- which((df$contig.id == df$adj1[i]) &
                       (df$end.week == df$end.week[i]))
        if (length(temp1) > 0) {
          dev[i] <- sum(df$q[temp1])
        }
        if (is.na(df$adj2[i]) == FALSE) {
          temp2 <- which((df$contig.id == df$adj2[i]) &
                         (df$end.week == df$end.week[i]))
          if (length(temp2) > 0) {
            dev[i] <- dev[i] + sum(df$q[temp2])
          }
        }
      } else {
        dev[i] <- NA
      }
      dev[i] <- dev[i] / (length(temp1) + length(temp2))
      dev1[i] <- df$q[i] / dev[i]
    }

    df <- cbind(df, dev, dev1)
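To make the arithmetic concrete, here is the dev calculation for one row worked by hand (an illustrative check against the output shown further down, not part of the original code):

    # Store 1001 in week 20061125 sits in county 1, which is adjacent to
    # counties 4 and 5 (adj1 == 4, adj2 == 5).
    # Stores in county 4 that week: 228 (q = 16.5) and 500 (q = 55.25);
    # county 5 has no observation that week.
    (16.5 + 55.25) / 2   # 35.875 -> dev for store 1001, week 20061125
    12.25 / 35.875       # ~0.3415 -> dev1 = q/dev for that row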
So you have 3 kinds of information here, which is why you needed such complex for-looping. I've tried to normalize your data into 3 tables:
    library(dplyr)
    library(tidyr)

    stores_time <- df %>%
      select(-contig.id, -adj1, -adj2)

    stores_space <- df %>%
      select(store, contig.id) %>%
      mutate(county = contig.id %>% paste0("c", .)) %>%
      select(-contig.id) %>%
      unique

    counties <- df %>%
      select(contig.id, adj1, adj2) %>%
      mutate(county = contig.id %>% paste0("c", .)) %>%
      select(-contig.id) %>%
      unique %>%
      gather(varname, adj_next_state, starts_with("adj")) %>%
      select(-varname) %>%
      mutate(adj_next_state = adj_next_state %>% paste0("c", .))

Now we have data on each store's sales over time (stores_time), on each store's "location" in space (i.e. which county it is in, stores_space), and on the adjacency of counties (counties). I've also converted the adjacency data from wide to long, which may come in handy if you have counties adjacent to more than 2 other counties.
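As a side note, gather() has since been superseded by pivot_longer() in tidyr 1.0+. A minimal sketch of the same counties table built with the newer verb (same content, row order may differ):

    counties <- df %>%
      select(contig.id, adj1, adj2) %>%
      distinct() %>%
      mutate(county = paste0("c", contig.id)) %>%
      select(-contig.id) %>%
      # pivot the adj1/adj2 columns into one long adj_next_state column
      pivot_longer(starts_with("adj"),
                   names_to = "varname", values_to = "adj_next_state") %>%
      select(-varname) %>%
      mutate(adj_next_state = paste0("c", adj_next_state))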
We can join all of these together to obtain a dataset of each store's performance in both "time" and "space":
    stores_tsc <- stores_time %>%
      left_join(stores_space) %>%
      left_join(counties)

To calculate dev, we need to join this table onto itself, because for each store x time combination we want the average over adjacent stores. When we join the table to itself, we need to join county to adj_next_state. We can use some select magic to make that easy:
    stores_tsc %>%
      # rename one column
      select(store, end.week, county = adj_next_state) %>%
      # left join the table onto itself;
      # removing unneeded columns and using unique prevents duplicate rows
      left_join(stores_tsc %>% select(-adj_next_state, -store) %>% unique,
                by = c("county", "end.week")) %>%
      # filter out the store in an unknown county
      filter(county != "cNA") %>%
      # calculate dev
      group_by(store, end.week) %>%
      summarize(dev = mean(q, na.rm = TRUE)) %>%
      ungroup %>%
      mutate(dev = ifelse(is.nan(dev), yes = NA, no = dev))

      store end.week      dev
    1    61 20061118 14.50000
    2   145 20061125       NA
    3   228 20061118 14.50000
    4   228 20061125 12.25000
    5   331 20061125       NA
    6   500 20061118 14.50000
    7   500 20061125 12.25000
    8  1001 20061118 18.08333
    9  1001 20061125 35.87500

You can then merge this with stores_time to calculate dev1 = q/dev.
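A minimal sketch of that last step, assuming the summarized result above has been saved as devs (a name introduced here purely for illustration):

    # devs: store, end.week, dev (the summarized table above, saved to a name)
    devs %>%
      left_join(stores_time %>% select(store, end.week, q),
                by = c("store", "end.week")) %>%
      mutate(dev1 = q / dev)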
Tags: r, for-loop, spatial, dplyr