Classification and decision trees for beginners

Follow the full discussion on Reddit.
Hi, I'm working with a social science dataset about the academic performance of university students. The dataset has many variables (gender, vital records, parents' education level, lecture attendance, hours dedicated to studying, etc.). I'm supposed to use statistical classification techniques, and this is the first time I've used them. In particular, I was trying to use a classification tree.

Since I have to make predictions about the students' academic performance, I thought of using their final grades as the response variable. It is a discrete ordinal variable based on a Likert scale. In general, all the variables I'm working with are discrete, so I was looking for information about how to build a classification tree with these variables.

First of all, I was considering dichotomizing the response, dividing its outcomes into Pass/Fail regardless of the specific grade with which students passed their exams: is that a good idea? Then, I'd like to know how I should proceed to build a tree-based model that identifies the groups of students who are expected to pass or fail the year.

I'm using R. I know there are two well-known packages (rpart and tree), but I suspect there are others as well. Which package do you use or recommend for CART and tree-based models?
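To make the two steps in the question concrete (collapsing the ordinal grade into Pass/Fail, then growing a tree on discrete predictors), here is a minimal sketch in plain Python rather than R. It is an illustration of CART-style recursive partitioning with the Gini index, not the code of rpart or tree. Assumptions: the pass threshold (PASS_THRESHOLD = 3), the feature names (attendance, study_hours) and the six students are all hypothetical; predictors are coded as ordered integers and split with "x <= t" rules. Unordered categorical variables such as gender would need subset-of-levels splits, which this sketch does not handle. Note also that the Pass/Fail recoding discards the ordering among passing grades.

```python
from collections import Counter

# Hypothetical cut-off: grades >= PASS_THRESHOLD count as "Pass".
# The real threshold depends on how the grading scale is defined.
PASS_THRESHOLD = 3


def dichotomize(grade, threshold=PASS_THRESHOLD):
    """Collapse an ordinal grade into a binary Pass/Fail label."""
    return "Pass" if grade >= threshold else "Fail"


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, feature):
    """Return (weighted impurity, threshold) of the best split
    (x <= t) vs (x > t) on one ordinal feature, or None if no split exists."""
    values = sorted({row[feature] for row in rows})
    n = len(labels)
    best = None
    for t in values[:-1]:  # splitting at the max value would leave one side empty
        left = [y for row, y in zip(rows, labels) if row[feature] <= t]
        right = [y for row, y in zip(rows, labels) if row[feature] > t]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or impurity < best[0]:
            best = (impurity, t)
    return best


def build_tree(rows, labels, features, depth=0, max_depth=2, min_size=2):
    """Grow a small binary classification tree by recursive partitioning."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(labels) < min_size or gini(labels) == 0.0:
        return {"leaf": majority}
    candidates = []
    for f in features:
        split = best_split(rows, labels, f)
        if split is not None:
            candidates.append((split[0], f, split[1]))
    if not candidates:
        return {"leaf": majority}
    impurity, feature, threshold = min(candidates)
    if impurity >= gini(labels):  # stop if the split does not reduce impurity
        return {"leaf": majority}
    left = [i for i, row in enumerate(rows) if row[feature] <= threshold]
    right = [i for i, row in enumerate(rows) if row[feature] > threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree([rows[i] for i in left], [labels[i] for i in left],
                           features, depth + 1, max_depth, min_size),
        "right": build_tree([rows[i] for i in right], [labels[i] for i in right],
                            features, depth + 1, max_depth, min_size),
    }


def predict(tree, row):
    """Follow the split rules from the root down to a leaf."""
    while "leaf" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]


# Made-up toy data, for illustration only.
students = [
    {"attendance": 5, "study_hours": 4},
    {"attendance": 4, "study_hours": 3},
    {"attendance": 2, "study_hours": 1},
    {"attendance": 1, "study_hours": 2},
    {"attendance": 3, "study_hours": 4},
    {"attendance": 2, "study_hours": 3},
]
grades = [5, 4, 1, 2, 3, 2]
labels = [dichotomize(g) for g in grades]
tree = build_tree(students, labels, ["attendance", "study_hours"])
print(tree)
# {'feature': 'attendance', 'threshold': 2, 'left': {'leaf': 'Fail'}, 'right': {'leaf': 'Pass'}}
```

In R, rpart grows classification trees with the Gini index by default and also supports cost-complexity pruning through its cp parameter, which this sketch omits; on a real dataset, pruning or a depth limit matters far more than in this six-row toy example.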

