Data Mining - Decision Tree (DT) Algorithm

About

Decision trees (DT) are supervised classification algorithms.

Decision trees extract predictive information in the form of human-understandable tree rules. A decision tree is an algorithm that is useful for many classification problems, and it can help explain the model’s logic through human-readable “If… Then…” rules.

Each decision in the tree can be seen as a feature.

Algorithm

The creation of a tree is a quest for pure nodes: at each level, choose the attribute that produces the “purest” nodes, i.e. the attribute with the highest information gain.
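The information gain of an attribute is the entropy of the parent node minus the size-weighted entropy of the child nodes produced by splitting on that attribute. The sketch below is a minimal Python illustration, assuming each record is a dict and the class label is stored under the key passed as `target`; the six toy rows and the names `entropy` and `information_gain` are hypothetical and are not taken from the Titanic data set.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the parent node minus the weighted entropy of the children
    obtained by splitting `rows` on every distinct value of `attribute`."""
    parent = entropy([r[target] for r in rows])
    children = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        children += len(subset) / len(rows) * entropy(subset)
    return parent - children


# Hypothetical toy rows, for illustration only.
rows = [
    {"sex": "female", "cls": "1", "survive": "yes"},
    {"sex": "female", "cls": "3", "survive": "yes"},
    {"sex": "male", "cls": "1", "survive": "yes"},
    {"sex": "male", "cls": "3", "survive": "no"},
    {"sex": "male", "cls": "3", "survive": "no"},
    {"sex": "male", "cls": "1", "survive": "no"},
]

# The attribute with the highest information gain is chosen for the split.
best = max(["sex", "cls"], key=lambda a: information_gain(rows, a, "survive"))
print(best)
```

On these rows, splitting on `sex` gains about 0.46 bits against about 0.08 bits for `cls`, so `sex` would be chosen first.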

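Applying that choice recursively gives a greedy, top-down construction. The following is an ID3-style sketch (ID3 is one well-known variant, chosen here for illustration), assuming categorical attributes and a hypothetical nested-dict representation: a leaf is a plain class label, and an internal node holds the split attribute, one branch per value, and the majority label. The optional `max_depth` stop is an added assumption, not part of the original text.

```python
from collections import Counter
from math import log2


def entropy(labels):
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain(rows, attribute, target):
    """Parent entropy minus the weighted entropy of the children."""
    result = entropy([r[target] for r in rows])
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        result -= len(subset) / len(rows) * entropy(subset)
    return result


def build_tree(rows, attributes, target, max_depth=None):
    """Grow a tree greedily. The majority label is kept on every internal
    node so that it can act as a fallback or as a pruned leaf."""
    labels = [r[target] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    # Stop when the node is pure, no attribute is left, or the depth limit is hit.
    if len(set(labels)) == 1 or not attributes or max_depth == 0:
        return majority
    best = max(attributes, key=lambda a: gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    next_depth = None if max_depth is None else max_depth - 1
    branches = {
        value: build_tree([r for r in rows if r[best] == value], remaining, target, next_depth)
        for value in {r[best] for r in rows}
    }
    return {"attr": best, "branches": branches, "majority": majority}


# Hypothetical toy rows, for illustration only.
rows = [
    {"sex": "female", "cls": "1", "survive": "yes"},
    {"sex": "female", "cls": "3", "survive": "yes"},
    {"sex": "male", "cls": "1", "survive": "yes"},
    {"sex": "male", "cls": "3", "survive": "no"},
    {"sex": "male", "cls": "3", "survive": "no"},
    {"sex": "male", "cls": "1", "survive": "no"},
]

tree = build_tree(rows, ["sex", "cls"], "survive")
stump = build_tree(rows, ["sex", "cls"], "survive", max_depth=1)
print(tree["attr"], stump["branches"]["male"])
```

The full tree splits on `sex` and then on `cls`, while the depth-limited `stump` stops after one split and labels every male passenger with the majority label `no`.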

Overfitting

Decision trees are prone to overfitting. They can overfit badly because of the highly complex decision boundaries they can produce; pruning ameliorates the effect but rarely eliminates it completely.
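Pruning comes in several forms. The sketch below shows one of them, reduced-error pruning, chosen here for illustration: working bottom-up, a subtree is replaced by its majority label whenever that does not lower accuracy on held-out validation rows. It assumes a hypothetical nested-dict tree (internal nodes carry `attr`, `branches` and `majority`); the `ticket_parity` attribute and the validation rows are made up to represent a split on noise.

```python
def predict(tree, row):
    """Follow branches until a leaf (a plain label) is reached."""
    while isinstance(tree, dict):
        child = tree["branches"].get(row[tree["attr"]])
        if child is None:  # unseen value: fall back to the node's majority label
            return tree["majority"]
        tree = child
    return tree


def accuracy(tree, rows, target):
    if not rows:
        return 1.0
    return sum(predict(tree, r) == r[target] for r in rows) / len(rows)


def prune(tree, rows, target):
    """Reduced-error pruning. Returns a new tree; the input is not modified."""
    if not isinstance(tree, dict):
        return tree
    # Prune the children first, each with the validation rows that reach it.
    branches = {
        value: prune(child, [r for r in rows if r[tree["attr"]] == value], target)
        for value, child in tree["branches"].items()
    }
    node = {**tree, "branches": branches}
    # Collapse the node into a leaf if that does not hurt validation accuracy.
    if accuracy(node["majority"], rows, target) >= accuracy(node, rows, target):
        return node["majority"]
    return node


# Hypothetical overfit tree: male passengers were split on a noise attribute.
tree = {
    "attr": "sex", "majority": "no",
    "branches": {
        "female": "yes",
        "male": {"attr": "ticket_parity", "majority": "no",
                 "branches": {"even": "yes", "odd": "no"}},
    },
}
validation = [
    {"sex": "female", "ticket_parity": "odd", "survive": "yes"},
    {"sex": "male", "ticket_parity": "even", "survive": "no"},
    {"sex": "male", "ticket_parity": "odd", "survive": "no"},
]

pruned = prune(tree, validation, "survive")
print(pruned)
```

The noisy `ticket_parity` split is collapsed into the leaf `no`, while the `sex` split survives because removing it would misclassify the female validation row.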

Example

Titanic (Survive Yes or No)

Titanic Data Set

if Ticket Class = "1" then
   if Sex = "female" then Survive = "yes"
   if Sex = "male" and Age < 5 then Survive = "yes"
if Ticket Class = "2" then
   if Sex = "female" then Survive = "yes"
   if Sex = "male" then Survive = "no"
if Ticket Class = "3" then
   if Sex = "male" then Survive = "no"
   if Sex = "female" then 
      if Age < 4  then Survive = "yes"
      if Age >= 4 then Survive = "no"
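The rule set can be transcribed directly into code. A minimal sketch, assuming the three rule blocks cover ticket classes 1, 2 and 3 and that age is a number; the function name `survives` is hypothetical. It returns `None` where no rule fires, because the rules do not say what happens to, for example, a first-class male aged 5 or more.

```python
def survives(ticket_class, sex, age):
    """Apply the Titanic rule set: return "yes" or "no", or None when no rule fires."""
    if ticket_class == "1":
        if sex == "female":
            return "yes"
        if sex == "male" and age < 5:
            return "yes"
    elif ticket_class == "2":
        if sex == "female":
            return "yes"
        if sex == "male":
            return "no"
    elif ticket_class == "3":
        if sex == "male":
            return "no"
        if sex == "female":
            return "yes" if age < 4 else "no"
    return None  # not covered by the rules


print(survives("3", "female", 2))
```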

Every path from the root is a rule
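Because every root-to-leaf path is a rule, the rules can be read off a tree mechanically. The sketch below assumes a hypothetical nested-dict tree (internal nodes have `attr` and `branches`, leaves are labels) that mirrors the third-class rules above; `AgeGroup` is an illustrative stand-in for the `Age < 4` test.

```python
def tree_to_rules(tree, conditions=()):
    """Yield one (conditions, label) pair per root-to-leaf path."""
    if not isinstance(tree, dict):  # a leaf: the path so far is a complete rule
        yield list(conditions), tree
        return
    for value, child in tree["branches"].items():
        yield from tree_to_rules(child, conditions + ((tree["attr"], value),))


# Hypothetical tree for third-class passengers.
tree = {
    "attr": "Sex",
    "branches": {
        "male": "no",
        "female": {"attr": "AgeGroup",
                   "branches": {"under 4": "yes", "4 or over": "no"}},
    },
}

for conds, label in tree_to_rules(tree):
    test = " and ".join(f'{a} = "{v}"' for a, v in conds)
    print(f'if {test} then Survive = "{label}"')
```

This prints one “if … then” line per leaf, three in total.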

Type

Univariate

Single tests at the nodes: each node tests one attribute.

Multivariate

Compound tests at the nodes: each node tests a combination of several attributes.
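The difference lies only in what a node evaluates. A minimal sketch, assuming numeric attributes; the multivariate test uses a weighted sum of attributes (an oblique split), which is one common form of compound test chosen here for illustration. The class names, the `fare` attribute, and the weights are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class UnivariateTest:
    """Single test: compare one attribute with a threshold."""
    attribute: str
    threshold: float

    def __call__(self, row: Dict[str, float]) -> bool:
        return row[self.attribute] < self.threshold


@dataclass
class MultivariateTest:
    """Compound test: compare a weighted sum of several attributes with a threshold."""
    attributes: Sequence[str]
    weights: Sequence[float]
    threshold: float

    def __call__(self, row: Dict[str, float]) -> bool:
        total = sum(w * row[a] for a, w in zip(self.attributes, self.weights))
        return total < self.threshold


passenger = {"age": 3.0, "fare": 20.0}  # hypothetical record
print(UnivariateTest("age", 5.0)(passenger))
print(MultivariateTest(["age", "fare"], [1.0, -0.1], 0.0)(passenger))
```

A univariate node yields axis-parallel decision boundaries; a multivariate node of this kind can draw a slanted boundary in a single test, at the cost of a less readable rule.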
