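# Data Mining Exercises: Finding All Strong Association Rules in R from a Rule Template and an Information Table, Decision-Tree Induction Based on Information Gain, and Code for Computing Information Entropy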

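## 1. (30 points) Let the minimum support threshold be 0.2500 and the minimum confidence be 0.6500. For the rule template and information table below, find all strong association rules in R: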

S ∈ R, P(S, x) ∧ Q(S, y) ==> Gpa(S, w) [s, c]

Major Status Age Gpa Count

(1) Gpa = Good

Major Status Age Count

(2) Gpa = Excellent

Major Status Age Count

Major(S, Arts) ∧ Age(S, Young) ==> Gpa(S, Good) [s = 150/500 = 0.3000, c = 150/150 = 1.0000]

Major(S, Arts) ∧ Age(S, Old) ==> Gpa(S, Excellent) [s = 150/500 = 0.3000, c = 150/200 = 0.7500]
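The support and confidence figures above can be checked with a few lines of Python. This is only a minimal sketch: the helper names `rule_stats` and `is_strong` are illustrative assumptions, and the counts (150 matching records, 150 or 200 records satisfying the antecedent, 500 records in total) are the ones that appear in the rules above.

```python
MIN_SUPPORT = 0.25     # minimum support threshold from the problem statement
MIN_CONFIDENCE = 0.65  # minimum confidence from the problem statement


def rule_stats(both_count, antecedent_count, total):
    """Return (support, confidence) of a rule A ==> B.

    both_count:       records that satisfy both A and B
    antecedent_count: records that satisfy A
    total:            all records in the table
    """
    support = both_count / total
    confidence = both_count / antecedent_count
    return support, confidence


def is_strong(support, confidence):
    """A rule is strong when it meets both thresholds."""
    return support >= MIN_SUPPORT and confidence >= MIN_CONFIDENCE


# (both_count, antecedent_count, total) for the two rules above
rules = {
    "Major=Arts AND Age=Young ==> Gpa=Good": (150, 150, 500),
    "Major=Arts AND Age=Old ==> Gpa=Excellent": (150, 200, 500),
}

for name, counts in rules.items():
    s, c = rule_stats(*counts)
    print(f"{name}: s={s:.4f}, c={c:.4f}, strong={is_strong(s, c)}")
```

Both rules clear the 0.25 support and 0.65 confidence thresholds, which is why they are reported as strong.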


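## 2. (30 points) Suppose the class-label attribute Gpa has two distinct values ({Good, Excellent}). Use decision-tree induction based on information gain to classify the data.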

P: Gpa = Good

N: Gpa = Excellent

| p | n | I(p,n) |
| --- | --- | --- |
| 300 | 200 | 0.97095 |

I(p,n) = -0.6 log2(0.6) - 0.4 log2(0.4) = 0.97095
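As a quick check of the value above, the two-class expected information can be computed directly from the counts. This is a small sketch; the function name `info` is an assumption chosen to mirror the I(p,n) notation.

```python
import math


def info(p, n):
    """Expected information I(p, n) in bits for p samples of one class and n of the other."""
    total = p + n
    result = 0.0
    for count in (p, n):
        if count > 0:  # a zero count contributes nothing (0 * log2(0) is taken as 0)
            fraction = count / total
            result -= fraction * math.log2(fraction)
    return result


print(round(info(300, 200), 5))  # 0.97095
```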

| Major | pi | ni | I(pi,ni) |
| --- | --- | --- | --- |
| Arts | 200 | 150 | 0.98523 |
| Appl_Science | 0 | 50 | 0 |
| Science | 100 | 0 | 0 |

E(Major) = 350/500 * 0.98523 = 0.68966

Here the only non-zero term comes from Major = Arts, for which I(200,150) = -(4/7) log2(4/7) - (3/7) log2(3/7) = 0.98523
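The weighted average E(Major) can be reproduced from the per-value counts in the Major table. The sketch below assumes a dictionary layout of `{value: (pi, ni)}`; the names `info` and `expected_info` are illustrative.

```python
import math


def info(p, n):
    """Expected information I(p, n) in bits for a two-class split."""
    total = p + n
    result = 0.0
    for count in (p, n):
        if count > 0:
            fraction = count / total
            result -= fraction * math.log2(fraction)
    return result


def expected_info(partitions):
    """Weighted average of I(pi, ni) over all values of one attribute."""
    total = sum(p + n for p, n in partitions.values())
    return sum((p + n) / total * info(p, n) for p, n in partitions.values())


# (pi, ni) counts per value of Major, taken from the table above
major = {"Arts": (200, 150), "Appl_Science": (0, 50), "Science": (100, 0)}

print(round(expected_info(major), 5))  # 0.68966
```

The pure partitions (Appl_Science and Science) have zero entropy, so only the Arts term survives in the sum.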

Status pi ni I(pi,ni)

E(Status) = 200/500 * 0.81128 + 300/500 * 0.65002 = 0.71452

| Age | pi | ni | I(pi,ni) |
| --- | --- | --- | --- |
| Old | 50 | 150 | 0.81128 |
| Young | 250 | 50 | 0.65002 |

E(Age) = E(Status) = 0.71452

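Gain(Major) = 0.97095 - 0.68966 = 0.28129

Gain(Status) = Gain(Age) = 0.97095 - 0.71452 = 0.25643

Major has the largest information gain, so it is chosen as the splitting attribute at the root, and the data is then partitioned by its three values.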
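The gain comparison can be sketched in Python as follows. Only Major and Age are included, because their per-value counts are listed above; Status is omitted from the code, although the text gives it the same gain as Age. The helper names and the `{value: (pi, ni)}` layout are assumptions.

```python
import math


def info(p, n):
    """Expected information I(p, n) in bits for a two-class split."""
    total = p + n
    result = 0.0
    for count in (p, n):
        if count > 0:
            fraction = count / total
            result -= fraction * math.log2(fraction)
    return result


def gain(partitions):
    """Information gain of an attribute given its {value: (pi, ni)} counts."""
    p = sum(pi for pi, _ in partitions.values())
    n = sum(ni for _, ni in partitions.values())
    expected = sum((pi + ni) / (p + n) * info(pi, ni) for pi, ni in partitions.values())
    return info(p, n) - expected


attributes = {
    "Major": {"Arts": (200, 150), "Appl_Science": (0, 50), "Science": (100, 0)},
    "Age": {"Old": (50, 150), "Young": (250, 50)},
}

gains = {name: gain(parts) for name, parts in attributes.items()}
best = max(gains, key=gains.get)  # attribute with the largest gain

for name, value in gains.items():
    print(f"Gain({name}) = {value:.5f}")
print("Split on:", best)
```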

(1) Major = Arts

Status Age Gpa Count

Status pi ni I(pi,ni)

E(Status) = 200/350*0.81128= 0.46359

| Age | pi | ni | I(pi,ni) |
| --- | --- | --- | --- |
| Old | 50 | 150 | 0.81128 |
| Young | 150 | 0 | 0 |

E(Age) = E(Status)= 0.46359
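The same calculation can be repeated inside the Major = Arts partition. The sketch below works from weighted (Age, Gpa, Count) rows; these rows are derived from the Old/Young counts above rather than copied from an original table, and the variable names are assumptions. It also prints the gain of Age within this partition, which is I(200,150) minus the expected information.

```python
import math
from collections import defaultdict


def entropy(counts):
    """Shannon entropy in bits of a collection of class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


# (Age, Gpa, Count) rows for Major = Arts, derived from the pi / ni counts above
rows = [("Old", "Good", 50), ("Old", "Excellent", 150), ("Young", "Good", 150)]

# Tally class counts overall and within each Age value.
overall = defaultdict(int)
by_age = defaultdict(lambda: defaultdict(int))
for age, gpa, count in rows:
    overall[gpa] += count
    by_age[age][gpa] += count

total = sum(overall.values())

# Weighted average entropy of the Age partitions, i.e. E(Age) within Arts
expected = sum(
    sum(classes.values()) / total * entropy(classes.values())
    for classes in by_age.values()
)
gain_age = entropy(overall.values()) - expected

print("E(Age) within Arts:", round(expected, 5))
print("Gain(Age) within Arts:", round(gain_age, 5))
```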

| Age | Gpa | Count |
| --- | --- | --- |
| Old | Good | 50 |
| Old | Excellent | 150 |

Status Age Gpa Count

(2) Major = Appl_Science

Status Age Gpa Count

(3) Major = Science

Status Age Gpa Count

Decision tree branches (from the original figure):

Major = Appl_Science → Gpa = Excellent

Major = Science → Gpa = Good

## Small tricks

### Code for computing information entropy

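```python
import math


def entropy(counts):
    """Return the Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0  # no samples: treat the entropy as 0
    # Normalize the raw counts into probabilities.
    probabilities = [c / total for c in counts]
    result = 0.0
    for p in probabilities:
        if p > 0:  # skip zero terms, since 0 * log2(0) is taken as 0
            result -= p * math.log2(p)
    return result


counts = [100, 100, 150]  # compute the entropy of the counts 100, 100, 150

result = entropy(counts)
print("Entropy:", result)
```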
