Decision tree construction

We use the ID3 algorithm to build the decision tree.

## The ID3 algorithm

The core of ID3 is to select, at each node of the decision tree, the feature with the maximum information gain, and to build the tree recursively. Concretely: starting from the root node, compute the information gain of every candidate feature, pick the feature with the largest gain as the splitting feature of that node, and create a child node for each value of that feature; then apply the same procedure recursively to each child node, and stop when the information gain of every remaining feature is very small or no features are left. ID3 is equivalent to choosing the probability model by maximum likelihood.

Before using ID3 to build the decision tree, let's look at the data again.
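In the notation of the previous article, the quantities ID3 works with can be stated compactly: with C_k the set of samples of class k and D_i the subset of D where feature A takes its i-th value,

```latex
H(D)   = -\sum_{k=1}^{K} \frac{|C_k|}{|D|} \log_2 \frac{|C_k|}{|D|}
H(D|A) = \sum_{i=1}^{n} \frac{|D_i|}{|D|} H(D_i)
g(D,A) = H(D) - H(D|A)
```

At each node, the feature A with the largest information gain g(D, A) is chosen.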

| ID | Age | Has a job | Has own house | Credit rating | Category (loan approved?) |
|----|-------------|-----|-----|-----------|-----|
| 1  | youth       | no  | no  | fair      | no  |
| 2  | youth       | no  | no  | good      | no  |
| 3  | youth       | yes | no  | good      | yes |
| 4  | youth       | yes | yes | fair      | yes |
| 5  | youth       | no  | no  | fair      | no  |
| 6  | middle-aged | no  | no  | fair      | no  |
| 7  | middle-aged | no  | no  | good      | no  |
| 8  | middle-aged | yes | yes | good      | yes |
| 9  | middle-aged | no  | yes | excellent | yes |
| 10 | middle-aged | no  | yes | excellent | yes |
| 11 | elderly     | no  | yes | excellent | yes |
| 12 | elderly     | no  | yes | good      | yes |
| 13 | elderly     | yes | no  | good      | yes |
| 14 | elderly     | yes | no  | excellent | yes |
| 15 | elderly     | no  | no  | fair      | no  |
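As a quick check on the table (a minimal sketch, independent of the full implementation later in this article): D contains 9 "yes" and 6 "no" samples, so its empirical entropy is about 0.971, the value used in the previous article.

```python
from math import log

# class labels of the 15 samples in the table: 9 "yes", 6 "no"
labels = ['yes'] * 9 + ['no'] * 6

def entropy(labels):
    """Empirical (Shannon) entropy of a list of class labels."""
    n = len(labels)
    counts = {}
    for c in labels:
        counts[c] = counts.get(c, 0) + 1
    return -sum(k / n * log(k / n, 2) for k in counts.values())

print(round(entropy(labels), 3))  # 0.971
```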

Using the results of the previous article: since feature A3 (has own house) has the largest information gain, we select A3 as the feature of the root node. It splits the training set D into two subsets, D1 (A3 = "yes") and D2 (A3 = "no"). Since D1 contains only samples of the same class, it becomes a leaf node, labeled "yes".
For D2 we need to select a new feature from A1 (age), A2 (has a job), and A4 (credit rating), computing the information gain of each:

g(D2, A1) = H(D2) - H(D2 | A1) = 0.251
g(D2, A2) = H(D2) - H(D2 | A2) = 0.918
g(D2, A4) = H(D2) - H(D2 | A4) = 0.474
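These gains can be checked directly (a minimal sketch: D2 below encodes the nine rows of the table with "has own house" = no as (age, job, credit, class) tuples, using the same integer encodings as the code later in this article):

```python
from math import log

# rows of D2: (age, job, credit, class)
D2 = [(0, 0, 0, 'no'), (0, 0, 1, 'no'), (0, 1, 1, 'yes'),
      (0, 0, 0, 'no'), (1, 0, 0, 'no'), (1, 0, 1, 'no'),
      (2, 1, 1, 'yes'), (2, 1, 2, 'yes'), (2, 0, 0, 'no')]

def entropy(rows):
    """Empirical (Shannon) entropy of the class column (last element of each row)."""
    n = len(rows)
    counts = {}
    for r in rows:
        counts[r[-1]] = counts.get(r[-1], 0) + 1
    return -sum(k / n * log(k / n, 2) for k in counts.values())

def gain(rows, axis):
    """Information gain g(D, A) = H(D) - H(D | A) for the feature at index axis."""
    cond = 0.0
    for v in set(r[axis] for r in rows):
        sub = [r for r in rows if r[axis] == v]
        cond += len(sub) / len(rows) * entropy(sub)
    return entropy(rows) - cond

for name, axis in (('A1', 0), ('A2', 1), ('A4', 2)):
    print('g(D2,%s) = %.3f' % (name, gain(D2, axis)))
```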
By this calculation, feature A2 (has a job), which has the largest information gain, is selected as the feature of the node. Since A2 has two possible values, two child nodes are derived from this node: one corresponding to "yes" (has a job), containing 3 samples that all belong to the same class, so it becomes a leaf node labeled "yes"; the other corresponding to "no" (no job), containing 6 samples that also all belong to the same class, so it too becomes a leaf node, labeled "no".
This completes a decision tree that uses only two features (it has two internal nodes); the resulting tree is shown in the figure below.

We have now built the decision tree by hand with the ID3 algorithm; next, let's see how to generate it in code.

## Write code to build a decision tree

We use a nested dictionary to store the structure of the decision tree. For example, the tree we derived in the previous section can be expressed as:
{'Has own house': {0: {'Has a job': {0: 'no', 1: 'yes'}}, 1: 'yes'}}
We create a function majorityCnt that returns the most frequent element (class label) in classList, and a function createTree that builds the decision tree recursively. The code is as follows:

```python
# -*- coding: UTF-8 -*-
from math import log
import operator

"""
Function description: compute the empirical entropy (Shannon entropy) of a given data set

Parameters:
    dataSet - the data set
Returns:
    shannonEnt - empirical entropy (Shannon entropy)
"""
def calcShannonEnt(dataSet):
    numEntries = len(dataSet)                       # number of rows in the data set
    labelCounts = {}                                # dictionary counting occurrences of each label
    for featVec in dataSet:                         # tally each feature vector
        currentLabel = featVec[-1]                  # extract the label
        if currentLabel not in labelCounts:         # if the label is not in the dictionary yet, add it
            labelCounts[currentLabel] = 0
        labelCounts[currentLabel] += 1              # count the label
    shannonEnt = 0.0                                # empirical entropy (Shannon entropy)
    for key in labelCounts:                         # compute the Shannon entropy
        prob = float(labelCounts[key]) / numEntries # probability of this label
        shannonEnt -= prob * log(prob, 2)           # apply the entropy formula
    return shannonEnt                               # return the empirical entropy

"""
Function description: create the test data set

Parameters:
    none
Returns:
    dataSet - the data set
    labels - the feature labels
"""
def createDataSet():
    dataSet = [[0, 0, 0, 0, 'no'],
               [0, 0, 0, 1, 'no'],
               [0, 1, 0, 1, 'yes'],
               [0, 1, 1, 0, 'yes'],
               [0, 0, 0, 0, 'no'],
               [1, 0, 0, 0, 'no'],
               [1, 0, 0, 1, 'no'],
               [1, 1, 1, 1, 'yes'],
               [1, 0, 1, 2, 'yes'],
               [1, 0, 1, 2, 'yes'],
               [2, 0, 1, 2, 'yes'],
               [2, 0, 1, 1, 'yes'],
               [2, 1, 0, 1, 'yes'],
               [2, 1, 0, 2, 'yes'],
               [2, 0, 0, 0, 'no']]
    labels = ['Age', 'Has a job', 'Has own house', 'Credit rating']  # feature labels
    return dataSet, labels                          # return the data set and feature labels

"""
Function description: partition the data set on a given feature

Parameters:
    dataSet - the data set to partition
    axis - index of the feature to partition on
    value - value of the feature to keep
Returns:
    retDataSet - the partitioned data set
"""
def splitDataSet(dataSet, axis, value):
    retDataSet = []                                 # list for the returned data set
    for featVec in dataSet:                         # iterate over the data set
        if featVec[axis] == value:
            reducedFeatVec = featVec[:axis]         # remove the axis-th feature
            reducedFeatVec.extend(featVec[axis+1:]) # add the matching row to the returned data set
            retDataSet.append(reducedFeatVec)
    return retDataSet                               # return the partitioned data set

"""
Function description: select the optimal feature

Parameters:
    dataSet - the data set
Returns:
    bestFeature - index of the feature with the largest information gain (the optimal feature)
"""
def chooseBestFeatureToSplit(dataSet):
    numFeatures = len(dataSet[0]) - 1               # number of features
    baseEntropy = calcShannonEnt(dataSet)           # Shannon entropy of the data set
    bestInfoGain = 0.0                              # largest information gain so far
    bestFeature = -1                                # index of the optimal feature
    for i in range(numFeatures):                    # iterate over all features
        featList = [example[i] for example in dataSet]  # all values of the i-th feature
        uniqueVals = set(featList)                  # build a set; elements are unique
        newEntropy = 0.0                            # empirical conditional entropy
        for value in uniqueVals:                    # compute the information gain
            subDataSet = splitDataSet(dataSet, i, value)     # subDataSet is the partitioned subset
            prob = len(subDataSet) / float(len(dataSet))     # probability of the subset
            newEntropy += prob * calcShannonEnt(subDataSet)  # empirical conditional entropy by the formula
        infoGain = baseEntropy - newEntropy         # information gain
        # print("The gain of feature %d is %.3f" % (i, infoGain))  # print the gain of each feature
        if infoGain > bestInfoGain:                 # compare with the best gain so far
            bestInfoGain = infoGain                 # update to the larger information gain
            bestFeature = i                         # record the index of the feature with the largest gain
    return bestFeature                              # return the index of the feature with the largest gain

"""
Function description: return the most frequent element (class label) in classList

Parameters:
    classList - list of class labels
Returns:
    sortedClassCount[0][0] - the most frequent element (class label)
"""
def majorityCnt(classList):
    classCount = {}
    for vote in classList:                          # count occurrences of each element in classList
        if vote not in classCount:
            classCount[vote] = 0
        classCount[vote] += 1
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)  # sort by count in descending order
    return sortedClassCount[0][0]                   # return the most frequent element in classList

"""
Function description: create the decision tree

Parameters:
    dataSet - the training data set
    labels - the feature labels
    featLabels - stores the selected optimal feature labels
Returns:
    myTree - the decision tree
"""
def createTree(dataSet, labels, featLabels):
    classList = [example[-1] for example in dataSet]     # class labels (grant a loan: yes or no)
    if classList.count(classList[0]) == len(classList):  # stop splitting if all classes are identical
        return classList[0]
    if len(dataSet[0]) == 1:                        # all features used: return the most frequent class label
        return majorityCnt(classList)
    bestFeat = chooseBestFeatureToSplit(dataSet)    # select the optimal feature
    bestFeatLabel = labels[bestFeat]                # label of the optimal feature
    featLabels.append(bestFeatLabel)
    myTree = {bestFeatLabel: {}}                    # grow the tree from the label of the optimal feature
    del(labels[bestFeat])                           # delete the used feature label (note: mutates the caller's list)
    featValues = [example[bestFeat] for example in dataSet]  # all values of the optimal feature in the training set
    uniqueVals = set(featValues)                    # remove duplicate values
    for value in uniqueVals:                        # for each value, build a subtree recursively
        myTree[bestFeatLabel][value] = createTree(splitDataSet(dataSet, bestFeat, value), labels, featLabels)
    return myTree

if __name__ == '__main__':
    dataSet, labels = createDataSet()
    featLabels = []
    myTree = createTree(dataSet, labels, featLabels)
    print(myTree)
```
```
{'Has own house': {0: {'Has a job': {0: 'no', 1: 'yes'}}, 1: 'yes'}}
```
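Given a tree stored this way, classifying a new sample is just a walk down the nested dictionary until a leaf label is reached. A minimal illustrative sketch (the classify helper and the feature-order list below are assumptions, not part of the article's code):

```python
def classify(tree, featLabels, featVec):
    """Follow the branches of the nested-dict tree until a leaf label is reached."""
    node = tree
    while isinstance(node, dict):
        feat = next(iter(node))                  # the feature tested at this node
        value = featVec[featLabels.index(feat)]  # the sample's value for that feature
        node = node[feat][value]                 # descend into the matching branch
    return node

tree = {'Has own house': {0: {'Has a job': {0: 'no', 1: 'yes'}}, 1: 'yes'}}
featLabels = ['Has own house', 'Has a job']
print(classify(tree, featLabels, [0, 1]))        # no house, has a job -> 'yes'
print(classify(tree, featLabels, [0, 0]))        # no house, no job -> 'no'
```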

When building the decision tree recursively, the recursion has two termination conditions. The first is that all class labels are exactly the same, in which case that label is returned directly. The second is that all features have been used but the data still cannot be divided into groups containing only a single class; in that case the tree cannot be built perfectly, because the features (the dimensionality of the data) are insufficient. Since this second stop condition cannot simply return a unique class label, the most frequent class is chosen as the return value.
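The majority-vote fallback of the second stop condition is what majorityCnt implements; the same logic can be expressed more compactly with collections.Counter (an equivalent sketch, not a replacement for the article's code):

```python
from collections import Counter

def majority_label(class_list):
    """Return the most frequent class label, like majorityCnt."""
    return Counter(class_list).most_common(1)[0][0]

print(majority_label(['yes', 'no', 'yes', 'no', 'no']))  # 'no' appears most often
```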