Monday, 15 June 2015

Perform cross-validation on randomForest with R




I am using the randomForest package for R to train a model for classification. To compare it to other classifiers, I need a way to display all the information given by the rather verbose cross-validation method in Weka. Therefore, the R script should output something like [a] from Weka.

Is there a way to validate an R model via RWeka to produce those measures? If not, how is cross-validation on a random forest done purely in R? Is it possible to use rfcv from the randomForest package here? I could not get it to work.
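For reference, rfcv() does run k-fold cross-validation, but it is aimed at feature selection: it reports the cross-validated error for a shrinking number of predictors rather than a Weka-style summary. A minimal sketch, assuming the built-in iris data as a stand-in for the thyroid set (the data set and the fold count are illustrative choices, not taken from the question):

```r
library(randomForest)  # rfcv()

set.seed(825)
x <- iris[, 1:4]   # predictors only; rfcv() takes x and y separately, not a formula
y <- iris$Species  # factor response, so this is treated as classification

# 10-fold CV; with step = 0.5 each step halves the number of predictors kept
cv <- rfcv(trainx = x, trainy = y, cv.fold = 10, step = 0.5)

cv$n.var     # number of predictors used at each step
cv$error.cv  # cross-validated error rate for each of those steps

# Cross-validated predictions of the model that uses all predictors,
# tabulated against the true classes
table(predicted = cv$predicted[[1]], actual = y)
```

The last line gives a cross-validated confusion matrix, from which further measures can be derived by hand.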

I do know that the out-of-bag (OOB) error used in randomForest is a kind of cross-validation. But I need the full information for a suitable comparison.
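The OOB estimate already exposes more than a single error figure. A minimal sketch of where those pieces live on a fitted randomForest object, again assuming iris as placeholder data:

```r
library(randomForest)

set.seed(825)
rf <- randomForest(Species ~ ., data = iris, ntree = 200)

# Confusion matrix built from out-of-bag predictions only:
# rows are the true classes, the last column is the per-class error
rf$confusion

# Overall OOB error rate after the final tree
rf$err.rate[rf$ntree, "OOB"]

# The OOB predictions themselves, usable for computing any further measure
oob_pred <- rf$predicted
mean(oob_pred == iris$Species)  # OOB accuracy
```

Because rf$predicted holds one out-of-bag prediction per training row, it can be fed into the same confusion-matrix calculations as cross-validated predictions.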

What I have tried so far using R is [b]. However, the code also produces an error on my setup [c] due to missing values.

So, can you help me with the cross-validation?

Appendix

[a] Weka:

    === Stratified cross-validation ===
    === Summary ===

    Correctly Classified Instances        3059               96.712  %
    Incorrectly Classified Instances       104                3.288  %
    Kappa statistic                          0.8199
    Mean absolute error                      0.1017
    Root mean squared error                  0.1771
    Relative absolute error                 60.4205 %
    Root relative squared error             61.103  %
    Coverage of cases (0.95 level)          99.6206 %
    Mean rel. region size (0.95 level)      78.043  %
    Total Number of Instances             3163

    === Detailed Accuracy By Class ===

                   TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
                   0,918    0,028    0,771      0,918   0,838      0,824  0,985     0,901     sick-euthyroid
                   0,972    0,082    0,991      0,972   0,982      0,824  0,985     0,998     negative
    Weighted Avg.  0,967    0,077    0,971      0,967   0,968      0,824  0,985     0,989

    === Confusion Matrix ===

        a    b   <-- classified as
      269   24 |    a = sick-euthyroid
       80 2790 |    b = negative
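Most of the summary figures above follow directly from the confusion matrix, so any R workflow that yields cross-validated predictions can reproduce them. A short worked example in base R, using only the four counts from the Weka output (the variable names are illustrative):

```r
# Confusion matrix copied from the Weka output: rows = actual, columns = predicted
cm <- matrix(c(269,   24,
                80, 2790),
             nrow = 2, byrow = TRUE,
             dimnames = list(actual    = c("sick-euthyroid", "negative"),
                             predicted = c("sick-euthyroid", "negative")))

n          <- sum(cm)
accuracy   <- sum(diag(cm)) / n                         # correctly classified / total
expected   <- sum(rowSums(cm) * colSums(cm)) / n^2      # agreement expected by chance
kappa_stat <- (accuracy - expected) / (1 - expected)    # Cohen's kappa

recall    <- diag(cm) / rowSums(cm)  # "TP Rate" per class
precision <- diag(cm) / colSums(cm)

round(100 * accuracy, 3)  # 96.712
round(kappa_stat, 4)      # 0.8199
round(recall, 3)
round(precision, 3)
```

The ROC and PRC areas cannot be recovered this way, since they need class probabilities rather than hard predictions.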

[b] Code so far:

    library(randomForest) # randomForest() and rfImpute()
    library(foreign)      # read.arff()
    library(caret)        # train() and trainControl()

    ntrees <- 2 # 200
    mydataset <- 'd:\\your\\directory\\se.arff' # http://hakank.org/weka/se.arff

    mydb = read.arff(mydataset)
    mydb.imputed <- rfImpute(class ~ ., data=mydb, ntree = ntrees, importance = TRUE)
    myres.rf <- randomForest(class ~ ., data=mydb.imputed, ntree = ntrees, importance = TRUE)
    summary(myres.rf)

    # specify type of resampling to 10-fold CV
    fitcontrol <- trainControl(method = "rf",number = 10,repeats = 10)
    set.seed(825)

    # deal with NA | NULL values in categorical variables
    #mydb.imputed[is.na(mydb.imputed)] <- 1
    #mydb.imputed[is.null(mydb.imputed)] <- 1

    rffit <- train(class~ ., data=mydb.imputed,
                   method = "rf",
                   trControl = fitcontrol,
                   ## This last option is actually one
                   ## for rf() that passes through
                   ntree = ntrees, importance = TRUE, na.action = na.omit)
    rffit

The error is:

    Error in names(resamples) <- gsub("^\\.", "", names(resamples)) :
      attempt to set an attribute on NULL

Using traceback():

    5: nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,
           method = models, ppOpts = preProcess, ctrl = trControl, lev = classLevels,
           ...)
    4: train.default(x, y, weights = w, ...)
    3: train(x, y, weights = w, ...)
    2: train.formula(class~ ., data = mydb.imputed, method = "rf", trControl = fitcontrol,
           ntree = ntrees, importance = TRUE, sampsize = rep(minorityclassnum, 2),
           na.action = na.omit)
    1: train(class~ ., data = mydb.imputed, method = "rf", trControl = fitcontrol,
           ntree = ntrees, importance = TRUE, sampsize = rep(minorityclassnum, 2),
           na.action = na.omit) at #39
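One likely cause, offered as a reading of the traceback rather than a confirmed diagnosis: the method argument of trainControl() expects a resampling scheme such as "cv" or "repeatedcv", whereas "rf" is a model name that belongs in train(). With an unknown scheme no resamples are created, which would explain the NULL. A minimal sketch of the intended setup, assuming iris as placeholder data and a fixed mtry of 2 (both are illustrative choices):

```r
library(caret)         # train(), trainControl(), confusionMatrix()
library(randomForest)  # backend for method = "rf"

set.seed(825)

# Resampling scheme: 10-fold CV; keep the held-out predictions for later use
fit_control <- trainControl(method = "cv", number = 10, savePredictions = TRUE)

rf_fit <- train(Species ~ ., data = iris,
                method    = "rf",                  # the model name goes here
                trControl = fit_control,
                tuneGrid  = data.frame(mtry = 2),  # one candidate, so one set of predictions
                ntree     = 200)                   # passed through to randomForest()

# Held-out predictions pooled over the 10 folds
cv_pred <- rf_fit$pred

# Accuracy, kappa and per-class statistics from the cross-validated predictions
# (older caret releases may need the e1071 package installed for this call)
confusionMatrix(data = cv_pred$pred, reference = cv_pred$obs)
```

For repeated cross-validation, method = "repeatedcv" together with repeats = 10 would match the number and repeats arguments already used in [b].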

[c] R version info via sessionInfo():

    R version 3.1.0 (2014-04-10)
    Platform: i386-w64-mingw32/i386 (32-bit)
    [...]
    other attached packages:
     [1] e1071_1.6-3  caret_6.0-30  ggplot2_1.0.0  foreign_0.8-61  randomForest_4.6-7  DMwR_0.4.1
     [7] lattice_0.20-29  JGR_1.7-16  iplots_1.1-7  JavaGD_0.6-1  rJava_0.9-6

I don't know Weka, but I have done randomForest modelling in R, and I used the predict function in R for this.

Try using this function:

    predict(model, data)

Bind the output to the original values, and use the table command to get the confusion matrix.
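Applied inside a loop over folds, the predict-and-table idea gives a cross-validation in plain R. A minimal sketch, assuming iris as placeholder data and simple random (not stratified) folds, so the fold assignment differs from Weka's stratified scheme:

```r
library(randomForest)

set.seed(825)
k <- 10

# Assign every row to one of k folds at random
folds <- sample(rep(seq_len(k), length.out = nrow(iris)))

# Will hold one held-out prediction per row
cv_pred <- factor(rep(NA, nrow(iris)), levels = levels(iris$Species))

for (i in seq_len(k)) {
  held_out <- folds == i
  # Fit on the other k - 1 folds, predict the fold that was left out
  model <- randomForest(Species ~ ., data = iris[!held_out, ], ntree = 200)
  cv_pred[held_out] <- predict(model, newdata = iris[held_out, ])
}

# Confusion matrix of actual classes vs. cross-validated predictions
cm <- table(actual = iris$Species, predicted = cv_pred)
cm
sum(diag(cm)) / sum(cm)  # cross-validated accuracy
```

Predicting on rows the model was trained on would give an overly optimistic table, which is why each fold is predicted by a model that never saw it.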

r validation weka random-forest cross-validation
