R: What is a way to manage paths to local files outside a git repository without clutter or conflicts from differing paths on collaborators' machines?
I regularly collaborate on large data analysis projects using git and statistical software such as R. Because the datasets are large and may change upon re-download, I do not keep them in the repository. While I design the final versions of the scripts I develop to use command line arguments to read the paths to the raw datasets, it's easier to test and debug by reading the files straight into the R environment. As I develop, I therefore end up with lines such as
something = read.raw.file("path/to/file/on/my/machine")
#something = read.raw.file("path/to/file/on/collaborators/machine")
#something = read.raw.file("path/to/file/on/other/collaborators/machine")
cluttering up the code.
There must be a better way. I've tried adding a file that each script reads before running, such as
# proj-config.local
path.to.raw.file.1 = "/path/to/file/on/my/machine"
and adding it to .gitignore, but this is a "heavyweight" workaround given how much time it takes. It's also not obvious to collaborators what one is doing or that they should do the same, and they might name or locate the file differently (since it's ignored), so the shared line of code that reads the file ends up wrong, etc.
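One way to soften the "not obvious to collaborators" problem with the config-file approach is to make the script fail loudly, with instructions, when the local file is missing. The sketch below assumes a committed template file named proj-config.template (hypothetical) next to the ignored proj-config.local, and a hypothetical helper read_local_config(); it evaluates the assignments in the local file into a private environment rather than the global one.

```r
# Hypothetical helper: read per-user settings from a git-ignored file such as
# "proj-config.local" and stop with instructions if it is missing.
# "proj-config.template" is an assumed, committed example file.
read_local_config <- function(file = "proj-config.local",
                              template = "proj-config.template") {
  if (!file.exists(file)) {
    stop(sprintf("Local config '%s' not found; copy '%s' to '%s' and edit the paths.",
                 file, template, file), call. = FALSE)
  }
  cfg <- new.env()
  sys.source(file, envir = cfg)  # evaluate the assignments into cfg only
  as.list(cfg)                   # named list, e.g. cfg$path.to.raw.file.1
}
```

A script would then call cfg <- read_local_config() and use cfg$path.to.raw.file.1. Keeping the template under version control documents which settings each collaborator is expected to provide.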
Is there a better way to manage local, outside-repo paths and references?
P.S. I didn't notice this issue being addressed in any of these related questions:
- Workflow for statistical analysis and report writing
- Project organization with R
- What best practices do you use for programming in R?
- How to combine "revision control" with "workflow" for R?
- How does software development compare with statistical programming/analysis?
- Essential skills of a data scientist
- Ensuring reproducibility in an R environment
- R and version control for the solo data analyst
A solution I've been using is to build in the concept of a search path that can be used to locate files. In one particular application, I've built in the ability to override the search path with an environment variable, similar to how the PATH variable is commonly used.
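The override idea can be sketched as a small function that puts any directories listed in an environment variable ahead of the built-in defaults, so a collaborator can redirect the search without editing shared code. The function name resolve_search_path() is hypothetical; it splits on .Platform$path.sep, which is ":" on Unix-alikes and ";" on Windows.

```r
# Hypothetical helper: build a search path in which the directories listed in
# an environment variable (if set) are searched before the built-in defaults.
resolve_search_path <- function(env_var, defaults = character(0),
                                sep = .Platform$path.sep) {
  from_env <- Sys.getenv(env_var, unset = "")
  env_dirs <- if (nzchar(from_env)) {
    strsplit(from_env, sep, fixed = TRUE)[[1]]
  } else {
    character(0)
  }
  dirs <- c(env_dirs, defaults)
  unique(dirs[nzchar(dirs)])  # drop empty components and duplicates, keep order
}
```

Because unique() keeps the first occurrence, a directory named in the environment variable keeps its higher priority even if it also appears among the defaults.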
I wrote a function, findfileinpath (below), to search the supplied path and return the files it finds. It takes the path as a vector and also allows the pieces to be separated by a character, as the OS typically does.
You would use it like this (as an illustration only):
datasearchpath = c(
  "path/to/file/on/my/machine",
  "path/to/file/on/collaborators/machine",
  "path/to/file/on/other/collaborators/machine",
  Sys.getenv('datasearchpath')
)
datafilename = "data_file.csv"
datapathname = findfileinpath(datafilename, path=datasearchpath)[1]  # take the first match
if (is.na(datapathname)) {
  stop(paste("cannot find data file", datafilename), call.=FALSE)
}
...
I use this to locate files to source(), and to locate configuration files, data sets, etc. I have multiple different paths, some of them exposed in the environment or in various configuration files, others internal. It works pretty well.
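For the source() case, a minimal sketch might look like the following. source_first_found() is a hypothetical helper, not part of the answer's code: it sources the first copy of a script found in a vector of directories and stops with a clear message if none exists, mirroring the is.na() guard in the illustration.

```r
# Hypothetical helper: source() the first copy of a script found in `dirs`.
source_first_found <- function(filename, dirs, envir = globalenv()) {
  candidates <- file.path(dirs, filename)
  found <- candidates[file.exists(candidates)]
  if (length(found) == 0) {
    stop(paste("cannot find script", filename), call. = FALSE)
  }
  source(found[1], local = envir)  # evaluate the script in `envir`
  invisible(found[1])              # return the path that was actually used
}
```

Returning the chosen path makes it easy to log which machine-specific copy a run picked up.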
In the illustration above, the datasearchpath environment variable can be set (outside of R) to a colon-separated series of paths to search.
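One place to set such a variable outside the R code is a per-user environment file. R's startup reads a .Renviron file, and base R's readRenviron() can load one explicitly. The sketch below assumes a git-ignored file called local.Renviron (hypothetical name) containing a line like datasearchpath=/data/a:/data/b; load_search_path() is likewise a hypothetical helper.

```r
# Sketch: load a git-ignored environment file (if present) and return the
# directories listed in the search-path variable as a character vector.
# "local.Renviron" is an assumed file name, not an R convention.
load_search_path <- function(env_file = "local.Renviron", var = "datasearchpath") {
  if (file.exists(env_file)) readRenviron(env_file)  # sets variables for this session
  value <- Sys.getenv(var, unset = "")
  if (!nzchar(value)) return(character(0))
  strsplit(value, .Platform$path.sep, fixed = TRUE)[[1]]
}
```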
My implementation of findfileinpath defaults to searching the system's PATH environment variable, split on the colon character. (This won't be applicable on Windows; I use it on Mac and Linux.)
#' findfileinpath: locates files by searching the supplied paths
#'
#' @param filename character: name of the file to search for
#'
#' @param path character: path to search, either a vector, or optionally
#'   separated by \code{sep}.
#'
#' @param sep character: separator character used to split \code{path}
#'   into multiple components.
#'
findfileinpath = function(filename, path=c('.', Sys.getenv('PATH')), sep=':') {
  # List the potential files, and return those that exist.
  files = data.frame(name=file.path(unlist(strsplit(path, sep)), filename),
                     stringsAsFactors=FALSE)
  files$exist = file.exists(files$name)
  files[files$exist==TRUE, 1]
}
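Since the colon default is what ties the function to Mac and Linux, a possible portable variant (a sketch, not the author's code) defaults the separator to .Platform$path.sep, which is ";" on Windows, where splitting on ":" would break drive letters such as C:. The name find_file_in_path is hypothetical; it also skips empty path components and avoids building a data frame.

```r
# Sketch of a portable variant of findfileinpath (hypothetical name).
find_file_in_path <- function(filename, path = c(".", Sys.getenv("PATH")),
                              sep = .Platform$path.sep) {
  dirs <- unlist(strsplit(path, sep, fixed = TRUE))
  dirs <- dirs[nzchar(dirs)]             # ignore empty components
  candidates <- file.path(dirs, filename)
  candidates[file.exists(candidates)]    # character(0) if nothing matches
}
```

As with the original, indexing the result with [1] yields NA when nothing is found, so the is.na() check in the usage illustration still applies.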