Sunday, 15 February 2015

R: combinatorial string replacement




I am on the lookout for a gsub-based function that would enable combinatorial string replacement, so that if I have an arbitrary number of string replacement rules

replrules=list("<x>"=c(3,5),"<alk>"=c("hept","oct","non"),"<end>"=c("ane","ene"))

and a target string

string="<x>-methyl<alk><end>"

it would give me a data frame with the final string name and the substitutions made, as in

name             x  alk   end
3-methylheptane  3  hept  ane
5-methylheptane  5  hept  ane
3-methyloctane   3  oct   ane
5-methyloctane   5  ...   ...
3-methylnonane   3
5-methylnonane   5
3-methylheptene  3
5-methylheptene  5
3-methyloctene   3
5-methyloctene   5
3-methylnonene   3
5-methylnonene   5

The target string could be of arbitrary structure, e.g. string="1-<alk>anol", or each pattern could occur several times, as in string="<alk>anedioic acid, di<alk>yl ester".

What would be an elegant way to do this kind of thing in R?

How about:

d <- do.call(expand.grid, replrules)
d$name <- paste0(d$'<x>', "-", "methyl", d$'<alk>', d$'<end>')
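To make the role of expand.grid concrete, here is a minimal, self-contained sketch of the same idea. It assumes the replrules list from the question; the only change is passing stringsAsFactors = FALSE so the columns stay plain character vectors, which is a choice made for this illustration.

```r
replrules <- list("<x>"   = c(3, 5),
                  "<alk>" = c("hept", "oct", "non"),
                  "<end>" = c("ane", "ene"))

# expand.grid builds one row per combination of the rule values
# (2 * 3 * 2 = 12 rows), with the first rule varying fastest.
d <- do.call(expand.grid, c(replrules, stringsAsFactors = FALSE))

# Paste the columns together in the order they appear in the target string.
d$name <- paste0(d$`<x>`, "-methyl", d$`<alk>`, d$`<end>`)

nrow(d)       # 12
head(d$name, 3)
```

This hard-codes the structure of one target string, which is why the later edits tokenise the string instead.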

Edit

This seems to work (substituting each of these strings into strsplit):

string = "<x>-methyl<alk><end>"
string2 = "<x>-ethyl<alk>acosane"
string3 = "1-<alk>anol"

Using Richard's regex:

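d <- do.call(expand.grid, list(replrules, stringsAsFactors = FALSE))
names(d) <- gsub("<|>", "", names(d))
# split the target string into literal pieces and placeholder names
s <- strsplit(string3, "(<|>)", perl = TRUE)[[1]]
out <- list()
for (i in s) {
  # use the matching column of d for a placeholder, the literal text otherwise
  out[[i]] <- ifelse(i %in% names(d), d[i], i)
}
d$name <- do.call(paste0, unlist(out, recursive = FALSE))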
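As a quick illustration of what the strsplit step hands to the loop, the sketch below tokenises two of the example strings. The variable names tokens3 and tokens1 are made up for this example.

```r
# Splitting on "<" or ">" separates literal text from placeholder names.
tokens3 <- strsplit("1-<alk>anol", "<|>")[[1]]
tokens3
# "1-"   "alk"  "anol"

# A placeholder at the start of the string, or two adjacent placeholders,
# produce empty-string tokens.
tokens1 <- strsplit("<x>-methyl<alk><end>", "<|>")[[1]]
tokens1
# ""        "x"       "-methyl" "alk"     ""        "end"
```

Because the loop stores each piece under its own token as the list name, a token that appears twice would overwrite its earlier entry.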

Edit

This should work for repeated items:

d <- do.call(expand.grid, list(replrules, stringsAsFactors = FALSE))
names(d) <- gsub("<|>", "", names(d))
string4 = "<x>-methyl<alk><end>oate<alk>"
s <- strsplit(string4, "(<|>)", perl = TRUE)[[1]]
out <- list()
# index by position rather than by token, so repeated tokens are kept
for (i in seq_along(s)) {
  out[[i]] <- ifelse(s[i] %in% names(d), d[s[i]], s[i])
}
d$name <- do.call(paste0, unlist(out, recursive = FALSE))
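Since the question asked for something gsub-based, here is one possible sketch that wraps the steps in a reusable function. The name expand_template and its arguments are hypothetical, not part of the original answer. It uses gsub with fixed = TRUE instead of tokenising, and it drops rules whose pattern does not occur in the target string, which is a design choice of this sketch.

```r
# Hypothetical helper: expand every combination of rules into a template.
expand_template <- function(template, rules) {
  # Keep only the rules whose pattern actually occurs in the template,
  # so unused rules do not multiply the number of rows.
  used <- rules[vapply(names(rules), grepl, logical(1),
                       x = template, fixed = TRUE)]
  d <- do.call(expand.grid, c(used, stringsAsFactors = FALSE))

  name <- rep(template, nrow(d))
  for (pat in names(used)) {
    # Substitute row by row; fixed = TRUE treats "<alk>" as literal text,
    # and gsub replaces every occurrence of the pattern.
    name <- mapply(gsub, pattern = pat, replacement = d[[pat]], x = name,
                   MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)
  }

  names(d) <- gsub("<|>", "", names(d))
  data.frame(name = name, d, stringsAsFactors = FALSE)
}

replrules <- list("<x>"   = c(3, 5),
                  "<alk>" = c("hept", "oct", "non"),
                  "<end>" = c("ane", "ene"))

res  <- expand_template("<x>-methyl<alk><end>", replrules)
res2 <- expand_template("<alk>anedioic acid, di<alk>yl ester", replrules)
res2$name
```

With the second string only the <alk> rule applies, so res2 has three rows and both occurrences of <alk> are replaced by the same value in each row.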

string r gsub
