Monday, 15 June 2015

csv - R cluster package error: daisy() function "long vectors (argument 11) are not supported in .C"




I'm trying to convert a data.frame with numeric, nominal, and NA values to a dissimilarity matrix using the daisy function from the cluster package in R. My purpose is to create a dissimilarity matrix before applying k-means clustering for customer segmentation. The data.frame has 133,153 rows and 36 columns. Here's my machine.

sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)

How can I fix the daisy error?

Since the Windows computer has 3 GB of RAM, I increased the virtual memory to 100 GB, hoping that would be enough to create the matrix, but it didn't work. I still got a couple of errors about memory. I've looked into other R packages for solving the memory problem, but they don't work. I cannot use the bigmemory and biganalytics packages because they accept only numeric matrices. clara and the ff package also take only numeric matrices.

CRAN's cluster package suggests the Gower similarity coefficient as a distance measure before applying k-means. The Gower coefficient takes numeric, nominal, and NA values.

store1 <- read.csv("/users/scdavis6/documents/work/client1.csv", header = FALSE)
df <- as.data.frame(store1)
save(df, file = "df.rda")
library(cluster)
daisy1 <- daisy(df, metric = "gower", type = list(ordratio = c(1:35)))
# Error in daisy(df, metric = "gower", type = list(ordratio = c(1:35))) :
#   long vectors (argument 11) are not supported in .C

**Edit: I have RStudio linked to an Amazon Web Services (AWS) r3.8xlarge instance with 244 GB of memory and 32 vCPUs. I tried creating the daisy matrix on that computer, but it did not have enough RAM.**

**Edit 2: I used the clara function for clustering the dataset.**

# 50 samples
clara2 <- clara(df, 3, metric = "euclidean", stand = FALSE, samples = 50, rngR = FALSE, pamLike = TRUE)

If you have a lot of data, use algorithms that do not require O(n^2) memory. Swapping to disk will kill performance, so it is not a sensible option.
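To make the O(n^2) cost concrete, here is a rough back-of-the-envelope calculation in base R for the 133,153 rows mentioned in the question. It assumes the result is stored as one 8-byte double per pair of rows (the lower triangle only), which is how a dist-style object is laid out. The count comes out well above R's 2^31 - 1 element limit for ordinary vectors, which is consistent with the "long vectors ... not supported in .C" message, regardless of how much RAM the machine has.

```r
# Size of the lower-triangle dissimilarity vector for n observations.
n <- 133153                     # rows in the question's data.frame
n_pairs <- n * (n - 1) / 2      # number of pairwise dissimilarities
bytes <- n_pairs * 8            # assumption: one 8-byte double per pair
gib <- bytes / 2^30

# Vectors with more than 2^31 - 1 elements are "long vectors" in R,
# and the .C interface does not accept them.
exceeds_limit <- n_pairs > .Machine$integer.max

cat(sprintf("pairs:  %.0f\n", n_pairs))
cat(sprintf("memory: %.1f GiB\n", gib))
cat("long vector needed:", exceeds_limit, "\n")
```

That is roughly 8.9 billion dissimilarities and about 66 GiB for the result alone, before any working copies.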

Rather than swapping, either reduce your data set size or use index acceleration to avoid the O(n^2) memory cost. (And it's not only O(n^2) memory, but also O(n^2) distance computations, which will take a long time!)
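One way to reduce the data set size is to cluster a random subset, so that the dissimilarity object stays small. The sketch below is only an illustration: the `toy` data frame, its column names, the subset size `m`, and `k = 3` are all made up, and it uses pam() instead of k-means because pam() accepts a dissimilarity object directly, whereas kmeans() expects the raw numeric data.

```r
library(cluster)  # provides daisy() and pam()

set.seed(1)  # reproducible illustration

# Hypothetical stand-in for the real data: mixed numeric/nominal columns with an NA
n <- 5000
toy <- data.frame(
  spend  = c(rnorm(n - 1, mean = 50, sd = 10), NA),
  visits = rpois(n, lambda = 4),
  region = factor(sample(c("north", "south", "west"), n, replace = TRUE))
)

# Cluster a random subset so only m * (m - 1) / 2 dissimilarities are stored
m <- 500
idx <- sample(nrow(toy), m)
d_sub <- daisy(toy[idx, ], metric = "gower")
fit <- pam(d_sub, k = 3)   # diss is detected automatically for dist-like input

length(d_sub)              # 124750
table(fit$clustering)      # cluster sizes within the subset
```

Rows outside the subset are not assigned to any cluster here. Assigning them afterwards, for example to the nearest medoid, would be a separate step, and whether a sample of this size is representative depends on the data.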

r csv amazon-ec2 cluster-analysis
