Unit 7 Assignment 6, answers by: A.Aziz Altowayan
Learning Homework 1
Question:
Answer:
I calculated the entropy according to:
H(p) = - ∑_{i} p_{i} · log_{2}(p_{i})
Since this is a binary classification problem (PlayTennis: Yes/No), looking only at the PlayTennis labels we have:
D = (No, No, Yes, Yes, Yes, No, Yes, No, Yes, Yes, Yes, Yes, Yes, No)
with the estimated probabilities
p_{1} = P(Yes) = 9/14 and p_{2} = P(No) = 5/14
so, p = (9/14, 5/14)
When we plug that into the entropy equation, we get:
H(p) = - ( (9/14) . log_{2}(9/14) + (5/14) . log_{2}(5/14) ) = 0.94
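As a quick check, the same number can be computed in a few lines of Python (a minimal sketch, not the assignment's actual code file):

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits: H(p) = -sum_i p_i * log2(p_i)."""
    # Skip zero probabilities, since lim p->0 of p*log2(p) is 0.
    return -sum(p * log2(p) for p in probs if p > 0)

print(round(entropy([9/14, 5/14]), 2))  # 0.94
```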
I further worked out the entropy and information-gain calculations for this problem in Python; see the completed and explained calculations in this code file.
The following graph shows the calculated information gain for each attribute (using the same formula as in the Python code):
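The graph itself is not reproduced here, but the gains can be recomputed with a short script. This is a sketch that assumes the standard 14-example PlayTennis table from Mitchell's textbook (which matches the Yes/No sequence above):

```python
from math import log2

# Assumed data: Mitchell's classic PlayTennis table.
# Rows are (Outlook, Temperature, Humidity, Wind, PlayTennis).
DATA = [
    ("Sunny","Hot","High","Weak","No"), ("Sunny","Hot","High","Strong","No"),
    ("Overcast","Hot","High","Weak","Yes"), ("Rain","Mild","High","Weak","Yes"),
    ("Rain","Cool","Normal","Weak","Yes"), ("Rain","Cool","Normal","Strong","No"),
    ("Overcast","Cool","Normal","Strong","Yes"), ("Sunny","Mild","High","Weak","No"),
    ("Sunny","Cool","Normal","Weak","Yes"), ("Rain","Mild","Normal","Weak","Yes"),
    ("Sunny","Mild","Normal","Strong","Yes"), ("Overcast","Mild","High","Strong","Yes"),
    ("Overcast","Hot","Normal","Weak","Yes"), ("Rain","Mild","High","Strong","No"),
]
ATTRS = ["Outlook", "Temperature", "Humidity", "Wind"]

def entropy(rows):
    """Entropy of the Yes/No label distribution over a set of rows."""
    n = len(rows)
    p_yes = sum(1 for r in rows if r[-1] == "Yes") / n
    return -sum(p * log2(p) for p in (p_yes, 1 - p_yes) if p > 0)

def gain(rows, i):
    """Gain(D, A) = H(D) - sum_v (|D_v| / |D|) * H(D_v)."""
    remainder = 0.0
    for v in {r[i] for r in rows}:
        subset = [r for r in rows if r[i] == v]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(rows) - remainder

for i, name in enumerate(ATTRS):
    print(f"Gain({name}) = {gain(DATA, i):.3f}")
```

Running this reproduces the gains discussed below: Outlook ≈ 0.247, Temperature ≈ 0.029, Humidity ≈ 0.152, Wind ≈ 0.048.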
Here is a sample of the (messy) hand calculation:
And the decision tree:
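The tree figure is not reproduced here; in text form, the standard ID3 tree for this dataset (consistent with the information-gain values above: Outlook at the root, Humidity under Sunny, Wind under Rain) looks like:

```
Outlook?
├── Sunny    → Humidity? ── High → No | Normal → Yes
├── Overcast → Yes
└── Rain     → Wind?     ── Strong → No | Weak → Yes
```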
But then (after some research) I realized that the Humidity attribute should be more informative than Temperature, as indicated by the information-gain values (0.152 > 0.029). That makes sense: when the Outlook is Sunny, Humidity alone determines whether to play tennis or not, no matter what the Temperature is.
If-then rules:
if ( (Outlook == Rain and Wind == Strong) or (Outlook == Sunny and Humidity == High) ) then No else Yes
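As a sanity check, the rule can be applied programmatically and compared against every training example. This sketch again assumes Mitchell's standard 14-example table:

```python
# Assumed data: Mitchell's classic PlayTennis table.
# Rows are (Outlook, Temperature, Humidity, Wind, PlayTennis).
DATA = [
    ("Sunny","Hot","High","Weak","No"), ("Sunny","Hot","High","Strong","No"),
    ("Overcast","Hot","High","Weak","Yes"), ("Rain","Mild","High","Weak","Yes"),
    ("Rain","Cool","Normal","Weak","Yes"), ("Rain","Cool","Normal","Strong","No"),
    ("Overcast","Cool","Normal","Strong","Yes"), ("Sunny","Mild","High","Weak","No"),
    ("Sunny","Cool","Normal","Weak","Yes"), ("Rain","Mild","Normal","Weak","Yes"),
    ("Sunny","Mild","Normal","Strong","Yes"), ("Overcast","Mild","High","Strong","Yes"),
    ("Overcast","Hot","Normal","Weak","Yes"), ("Rain","Mild","High","Strong","No"),
]

def play_tennis(outlook, temperature, humidity, wind):
    """The if-then rule read off the decision tree."""
    if (outlook == "Rain" and wind == "Strong") or \
       (outlook == "Sunny" and humidity == "High"):
        return "No"
    return "Yes"

# The rule reproduces the label of every training example.
assert all(play_tennis(*row[:4]) == row[-1] for row in DATA)
print("rule matches all 14 examples")
```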
- For a similar problem, see this worked example.
- Entropy and Information Gain used as described in Introduction to Artificial Intelligence, by W. Ertel.