Decision tree classifier
Syntax
parameters = dtcfit(X,y)
parameters = dtcfit(X,y,options)
Inputs
 X
 Training data.
 Type: double
 Dimension: vector | matrix
 y
 Target values.
 Type: double
 Dimension: vector | matrix
 options
 Type: struct

 criterion
 Function to measure quality of a split. 'gini' for Gini Impurity (default) or 'entropy' for Information Gain.
 Type: char
 Dimension: string
 splitter
 Strategy used to choose the split at each node. 'best' to choose best split (default) or 'random' to choose random split.
 Type: char
 Dimension: string
 max_depth
 The maximum depth of the tree. If not assigned, nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.
 Type: integer
 Dimension: scalar
 min_samples_split
 The minimum number of samples required to split an internal node (default: 2). If an integer, it is used as the minimum number of samples; if a double, (min_samples_split * n_samples) is used as the minimum number of samples for each split.
 Type: double | integer
 Dimension: scalar
 min_samples_leaf
 The minimum number of samples required at a leaf node (default: 1). If a node has fewer than min_samples_leaf samples, the tree is not grown further under that node. If an integer, it is used as the minimum number of samples; if a double, (min_samples_leaf * n_samples) is used as the minimum number of samples for each node.
 Type: double | integer
 Dimension: scalar
 min_weight_fraction_leaf
 The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node (default: 0).
 Type: double
 Dimension: scalar
 max_features
 The number of features to consider when looking for the best split (default: the number of features in the training data). If an integer, max_features features are considered at each split; if a double, floor(max_features * n_features) features are considered at each split.
 Type: double | integer
 Dimension: scalar
 random_state
 Controls the randomness of the model. At each split, features are randomly permuted. random_state is the seed used by the random number generator.
 Type: integer
 Dimension: scalar
 max_leaf_nodes
 Grow a tree with max_leaf_nodes in best-first fashion. Best nodes are defined by their reduction in impurity. If not assigned, the number of leaf nodes is unlimited.
 Type: integer
 Dimension: scalar
 min_impurity_decrease
 A node is split only if the split reduces the impurity by at least this value (default: 0).
 Type: double
 Dimension: scalar
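For reference, the criterion and min_impurity_decrease options can be read against the standard textbook impurity measures below. This is a sketch, not a statement of how dtcfit computes them internally: the symbols are assumptions introduced here, with p_k the fraction of samples of class k at a node, K the number of classes, and n, n_L, n_R the sample counts at a node and at its left and right children. Any additional weighting applied before comparing against min_impurity_decrease is not specified in this page.

```latex
% Gini impurity (criterion = 'gini')
G = 1 - \sum_{k=1}^{K} p_k^{2}

% Entropy (criterion = 'entropy')
H = -\sum_{k=1}^{K} p_k \log_2 p_k

% Impurity decrease of a candidate split, with I equal to G or H
\Delta I = I_{\text{node}} - \frac{n_L}{n}\, I_{L} - \frac{n_R}{n}\, I_{R}
```

With I = H, the quantity \Delta I is the information gain of the split. A pure node (all samples in one class) has G = 0 and H = 0.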
Outputs
 parameters
 Contains all the values passed to dtcfit as options. Additionally, it has the key-value pairs below.
 Type: struct

 scorer
 Function handle pointing to 'accuracy' function.
 Type: function handle
 classes
 The class labels (single output problem), or a matrix of class labels (multioutput problem).
 Type: double
 Dimension: vector | matrix
 max_features
 The inferred value of max_features.
 Type: integer
 Dimension: scalar
 feature_importances
 Feature importances. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as Gini Importance.
 Type: double
 Dimension: n_features
 n_samples
 Number of rows in the training data.
 Type: integer
 Dimension: scalar
 n_features
 Number of columns in the training data.
 Type: integer
 Dimension: scalar
Example
Usage of dtcfit without options
data = dlmread('iris.csv', ',', 1);
X = data(:,1:end-1);
y = data(:,end);
parameters = dtcfit(X, y);
> parameters
parameters = struct [
classes: [Matrix] 1 x 3
0 1 2
criterion: gini
feature_importances: [Matrix] 1 x 4
0.00000 0.01333 0.06406 0.92261
max_features: 4
min_impurity_decrease: 0
min_samples_leaf: 1
min_samples_split: 2
min_weight_fraction_leaf: 0
n_features: 4
n_samples: 150
splitter: best
]
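Usage of dtcfit with options
The sketch below shows one way the options struct might be filled in, using only the field names documented under Inputs. The specific values are illustrative assumptions, not recommended settings, and no output is shown because the result depends on the data and the seed.

```oml
data = dlmread('iris.csv', ',', 1);
X = data(:,1:end-1);
y = data(:,end);

% Build the options struct field by field; unset fields keep their defaults.
options.criterion = 'entropy';     % split on information gain instead of Gini
options.max_depth = 3;             % integer: stop growing below depth 3
options.min_samples_leaf = 0.1;    % double: read as 0.1 * n_samples per leaf
options.max_features = 0.5;        % double: floor(0.5 * n_features) per split
options.random_state = 0;          % fixed seed so repeated runs match

parameters = dtcfit(X, y, options);
```

For the 150 x 4 iris data used above, the fractional settings correspond to 0.1 * 150 = 15 samples per leaf and floor(0.5 * 4) = 2 features per split.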
Comments
dtcfit performs classification by constructing a decision tree. Once the tree has been built, it can be used for prediction with the dtcpredict function.
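A minimal sketch of the fit-then-predict workflow follows. The call form of dtcpredict (fitted struct first, data second) is an assumption here, since this page does not give its syntax; check the dtcpredict reference before relying on it. The variable names ypred and train_accuracy are illustrative.

```oml
data = dlmread('iris.csv', ',', 1);
X = data(:,1:end-1);
y = data(:,end);

parameters = dtcfit(X, y);

% ASSUMPTION: dtcpredict(parameters, X) returns one predicted label per row of X.
ypred = dtcpredict(parameters, X);

% Fraction of training rows whose predicted label matches the target.
train_accuracy = mean(ypred(:) == y(:));
```

Accuracy measured on the training data is usually optimistic for an unrestricted tree, so a held-out set gives a more honest estimate.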