
C5.0 decision tree - c50 code called exit with value 1


For anyone interested, the data can be found here: http://www.kaggle.com/c/titanic-gettingStarted/data. I think you need to be registered in order to download it.

Regarding your problem: first off, I think you meant to write

new_model <- C5.0(train[,-2],train$Survived)

Next, notice the structure of the Cabin and Embarked columns. Both factors have an empty string as a level name (check with levels(train$Embarked)). This is the point where C50 falls over. If you modify your data such that

levels(train$Cabin)[1] <- "missing"
levels(train$Embarked)[1] <- "missing"

your algorithm will now run without an error.
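If you are not sure which columns are affected, a loop over the factor columns can rename every empty level in one pass. This is only a sketch in base R on a made-up data frame: the name toy, its values, and the label "missing" are illustrative assumptions, not taken from the Titanic data.

```r
# Toy data frame standing in for `train`; the values are invented.
toy <- data.frame(
  Embarked = factor(c("S", "", "C", "S")),
  Cabin    = factor(c("", "C85", "", "E46")),
  Age      = c(22, 38, 26, 35)
)

# Rename the empty-string level in every factor column that has one.
# Non-factor columns (such as Age) are left untouched.
for (col in names(toy)) {
  if (is.factor(toy[[col]])) {
    lv <- levels(toy[[col]])
    lv[lv == ""] <- "missing"
    levels(toy[[col]]) <- lv
  }
}

# Inspect the result: no factor should have "" as a level any more.
levels(toy$Embarked)
levels(toy$Cabin)
```

Matching on the value "" rather than on position [1] means the loop still does the right thing for a factor that has no empty level at all.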


Just in case: you can take a look at the error with

summary(new_model)

This error also occurs when the name of a variable contains special characters. For example, you will get it if a variable name contains "я" (a letter from the Russian alphabet).
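One way to spot such names before calling C5.0 is to check which column names cannot be converted to ASCII. The sketch below uses base R's iconv on a made-up data frame; the replacement scheme (var plus the column index) is an arbitrary choice for illustration, not something the C50 package requires.

```r
# Toy data frame with one non-ASCII column name ("\u044f" is the letter "я").
df <- data.frame(a = 1:3, b = 4:6)
names(df)[2] <- "\u044f"

# iconv() returns NA for strings it cannot convert to the target encoding,
# so NA marks the names that contain non-ASCII characters.
bad <- is.na(iconv(names(df), from = "UTF-8", to = "ASCII"))

# Replace the offending names with plain ASCII placeholders.
names(df)[bad] <- paste0("var", which(bad))

names(df)
```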


I got the idea after reading this post. Here is what finally worked:

library(C50)
test$Survived <- NA
combinedData <- rbind(train, test)
combinedData$Survived <- factor(combinedData$Survived)
# fixing empty character level names
levels(combinedData$Cabin)[1] <- "missing"
levels(combinedData$Embarked)[1] <- "missing"
new_train <- combinedData[1:891, ]
new_test <- combinedData[892:1309, ]
new_model <- C5.0(new_train[, -2], new_train$Survived)
new_model_predict <- predict(new_model, new_test)
submitC50 <- data.frame(PassengerId = new_test$PassengerId, Survived = new_model_predict)
write.csv(submitC50, file = "c50dtree.csv", row.names = FALSE)

The intuition behind this is that, by combining the two sets before splitting them again, both the train and the test data set end up with consistent factor levels.
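To see the effect on factor levels in isolation, here is a small base R sketch with invented data (train_toy and test_toy are hypothetical stand-ins for the real sets). The test set lacks one level that the training set has; after rbind and a split by row count, both halves carry the same levels.

```r
# Toy stand-ins for the real data; "Q" appears only in the training set.
train_toy <- data.frame(Embarked = factor(c("S", "C", "Q", "S")))
test_toy  <- data.frame(Embarked = factor(c("S", "S", "C")))

# Before combining, the level sets differ.
identical(levels(train_toy$Embarked), levels(test_toy$Embarked))

# rbind() on data frames merges the factor levels of both inputs.
combined <- rbind(train_toy, test_toy)

# Split again by row count; subsetting rows keeps the full level set.
n_train   <- nrow(train_toy)
new_train <- combined[seq_len(n_train), , drop = FALSE]
new_test  <- combined[-seq_len(n_train), , drop = FALSE]

# After the round trip, both halves share the same levels.
identical(levels(new_train$Embarked), levels(new_test$Embarked))
```

Using nrow(train_toy) instead of hard-coded row ranges keeps the split correct if the number of training rows changes.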