Breeding: Perform cross-validation on randomForest with R

Monday, 15 June 2015

I am using the randomForest package in R to train a model for classification. To compare it with other classifiers, I need a way to display all the information given by the rather verbose cross-validation method in Weka. Therefore, the R script should output something like [a] from Weka.

Is there a way to validate an R model via RWeka to produce those measures? If not, how is cross-validation on a random forest done purely in R? Is it possible to use rfcv from the randomForest package here? I could not get it to work.

I know that the out-of-bag (OOB) error used in randomForest is a kind of cross-validation. But I need the full information suited for a comparison.
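For reference, the OOB figures that randomForest already keeps can be read off the fitted object. The following is a minimal sketch, assuming the built-in iris data as a stand-in for the real data set; it only shows where the OOB confusion matrix and error rate live, not a replacement for Weka-style cross-validation.

```r
library(randomForest)

set.seed(825)
# iris is only a placeholder; substitute the imputed data frame and its class column
rf <- randomForest(Species ~ ., data = iris, ntree = 200)

# OOB confusion matrix: rows = actual class, columns = predicted class,
# plus a class.error column
print(rf$confusion)

# OOB error rate after the last tree has been added
oob_error <- rf$err.rate[rf$ntree, "OOB"]

# predict() without newdata returns the OOB predictions, which can be
# tabulated against the true labels like any other hold-out prediction
oob_pred <- predict(rf)
oob_cm <- table(actual = iris$Species, predicted = oob_pred)
oob_accuracy <- sum(diag(oob_cm)) / sum(oob_cm)
print(oob_accuracy)
```

The OOB accuracy computed from the table is simply one minus the OOB error rate stored in the model.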

What I have tried so far using R is [b]. However, the code produces an error on my setup [c] due to missing values.

So, can you help me with the cross-validation?

Appendix

[a] Weka:

    === Stratified cross-validation ===
    === Summary ===

    Correctly Classified Instances        3059               96.712  %
    Incorrectly Classified Instances       104                3.288  %
    Kappa statistic                          0.8199
    Mean absolute error                      0.1017
    Root mean squared error                  0.1771
    Relative absolute error                 60.4205 %
    Root relative squared error             61.103  %
    Coverage of cases (0.95 level)          99.6206 %
    Mean rel. region size (0.95 level)      78.043  %
    Total Number of Instances             3163

    === Detailed Accuracy By Class ===

                     TP Rate  FP Rate  Precision  Recall   F-Measure  MCC      ROC Area  PRC Area  Class
                     0,918    0,028    0,771      0,918    0,838      0,824    0,985     0,901     sick-euthyroid
                     0,972    0,082    0,991      0,972    0,982      0,824    0,985     0,998     negative
    Weighted Avg.    0,967    0,077    0,971      0,967    0,968      0,824    0,985     0,989

    === Confusion Matrix ===

        a    b   <-- classified as
      269   24 |    a = sick-euthyroid
       80 2790 |    b = negative
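Several of these figures follow from the confusion matrix alone, so any R procedure that yields a cross-validated confusion matrix can reproduce them. The sketch below uses base R only and recomputes accuracy, the kappa statistic, and the per-class TP rate, FP rate, precision and F-measure from the four counts above. The variable names are illustrative, and measures that need class probabilities (ROC area, PRC area, the error terms) are not covered.

```r
# Confusion matrix from the Weka output (rows = actual, columns = predicted)
cm <- matrix(c(269,   24,
                80, 2790),
             nrow = 2, byrow = TRUE,
             dimnames = list(actual    = c("sick-euthyroid", "negative"),
                             predicted = c("sick-euthyroid", "negative")))

n <- sum(cm)

# Overall accuracy: share of instances on the diagonal
accuracy <- sum(diag(cm)) / n

# Cohen's kappa: agreement corrected for chance agreement
expected   <- sum(rowSums(cm) * colSums(cm)) / n^2
kappa_stat <- (accuracy - expected) / (1 - expected)

# Per-class measures
recall    <- diag(cm) / rowSums(cm)                        # TP rate
precision <- diag(cm) / colSums(cm)
fp_rate   <- (colSums(cm) - diag(cm)) / (n - rowSums(cm))  # FP / actual negatives
f_measure <- 2 * precision * recall / (precision + recall)

print(round(c(accuracy = accuracy, kappa = kappa_stat), 4))
print(round(cbind(tp_rate = recall, fp_rate, precision, f_measure), 3))
```

Rounded to the precision Weka prints, the results agree with the summary above (accuracy 96.712 %, kappa 0.8199).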

[b] Code so far:

    library(randomForest) # randomForest() and rfImpute()
    library(foreign)      # read.arff()
    library(caret)        # train() and trainControl()

    ntrees <- 2 # 200
    mydataset <- 'd:\\your\\directory\\se.arff' # http://hakank.org/weka/se.arff

    mydb = read.arff(mydataset)
    mydb.imputed <- rfImpute(class ~ ., data = mydb, ntree = ntrees, importance = TRUE)
    myres.rf <- randomForest(class ~ ., data = mydb.imputed, ntree = ntrees, importance = TRUE)
    summary(myres.rf)

    # specify type of resampling: 10-fold CV
    fitcontrol <- trainControl(method = "rf", number = 10, repeats = 10)
    set.seed(825)

    # deal with NA | NULL values in categorical variables
    #mydb.imputed[is.na(mydb.imputed)] <- 1
    #mydb.imputed[is.null(mydb.imputed)] <- 1

    rffit <- train(class ~ ., data = mydb.imputed,
                   method = "rf",
                   trControl = fitcontrol,
                   ## these last options are passed through
                   ## to randomForest()
                   ntree = ntrees, importance = TRUE, na.action = na.omit)
    rffit

The error is:

    Error in names(resamples) <- gsub("^\\.", "", names(resamples)) :
      attempt to set an attribute on NULL

Using traceback():

    5: nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,
           method = models, ppOpts = preProcess, ctrl = trControl, lev = classLevels,
           ...)
    4: train.default(x, y, weights = w, ...)
    3: train(x, y, weights = w, ...)
    2: train.formula(class ~ ., data = mydb.imputed, method = "rf",
           trControl = fitcontrol, ntree = ntrees, importance = TRUE,
           sampsize = rep(minorityclassnum, 2), na.action = na.omit)
    1: train(class ~ ., data = mydb.imputed, method = "rf", trControl = fitcontrol,
           ntree = ntrees, importance = TRUE, sampsize = rep(minorityclassnum,
               2), na.action = na.omit) at #39
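The traceback ends in caret's resampling setup rather than in missing-value handling, so one plausible cause (an assumption, not confirmed on the original data) is that trainControl() was given method = "rf". That argument expects a resampling scheme such as "cv" or "repeatedcv"; "rf" is a model name that belongs in train(). A minimal sketch of the corrected call, using iris as a placeholder data set and a small ntree to keep it quick:

```r
library(caret)         # train(), trainControl(), confusionMatrix()
library(randomForest)  # backend used by method = "rf"

set.seed(825)

# Resampling scheme: 10-fold cross-validation, repeated 10 times
fit_control <- trainControl(method = "repeatedcv", number = 10, repeats = 10)

# iris stands in for the imputed data frame; the model type goes here
rf_fit <- train(Species ~ ., data = iris,
                method = "rf",
                trControl = fit_control,
                ntree = 50)   # passed through to randomForest()

# Resampled accuracy and kappa for each candidate mtry
print(rf_fit)

# Confusion matrix aggregated over the held-out folds
print(confusionMatrix(rf_fit))
```

The printed results include cross-validated accuracy and kappa, which covers part of what the Weka summary reports.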

[c] R version info via sessionInfo()

    R version 3.1.0 (2014-04-10)
    Platform: i386-w64-mingw32/i386 (32-bit)

    [...]

    other attached packages:
     [1] e1071_1.6-3        caret_6.0-30       ggplot2_1.0.0      foreign_0.8-61     randomForest_4.6-7 DMwR_0.4.1
     [7] lattice_0.20-29    JGR_1.7-16         iplots_1.1-7       JavaGD_0.6-1       rJava_0.9-6

I don't know Weka, but I have done randomForest modelling in R and have used the predict function in R for this.

Try using this function:

    predict(model, data)

Bind the output to the original values and use the table command to get the confusion matrix.
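Applied to cross-validation, this idea can be sketched as follows: split the data into stratified folds, train on all folds but one, call predict() on the held-out fold, and tabulate the pooled predictions against the true labels. The example assumes the built-in iris data and 10 folds; the fold assignment and all variable names are illustrative.

```r
library(randomForest)

set.seed(825)
k <- 10

# Stratified fold assignment: deal out fold labels 1..k within each class
folds <- integer(nrow(iris))
for (cl in levels(iris$Species)) {
  idx <- which(iris$Species == cl)
  folds[idx] <- sample(rep_len(seq_len(k), length(idx)))
}

# Collect the held-out prediction for every row
pred <- factor(rep(NA, nrow(iris)), levels = levels(iris$Species))
for (i in seq_len(k)) {
  test <- folds == i
  fit <- randomForest(Species ~ ., data = iris[!test, ], ntree = 200)
  pred[test] <- predict(fit, newdata = iris[test, ])
}

# Bind predictions to the original values and build the confusion matrix
results <- data.frame(actual = iris$Species, predicted = pred)
cm <- table(actual = results$actual, predicted = results$predicted)
print(cm)

accuracy <- sum(diag(cm)) / sum(cm)
print(accuracy)
```

Because every row is predicted exactly once by a model that never saw it, the resulting table plays the same role as Weka's cross-validated confusion matrix.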

r validation weka random-forest cross-validation
