A Comparison of Several Algorithms (Draft)
Algorithm | Model | Strategy (loss) | Solver |
---|---|---|---|
Linear regression | $f(x)=W^T x+b$ | Least squares: $L(W,b)=(f(x)-y)^2$ | Gradient descent, Newton's method |
LASSO regression | $f(x)=W^T x+b$ | Least squares with an L1 penalty: $L(W,b)=(f(x)-y)^2+\lambda\Vert W\Vert_1$ | Coordinate descent |
Ridge regression | $f(x)=W^T x+b$ | Least squares with an L2 penalty: $L(W,b)=(f(x)-y)^2+\frac{1}{2}\lambda\Vert W\Vert_2^2$ | Gradient descent |
Logistic regression | $f(x)=\dfrac{1}{1+e^{-(W^T x+b)}}$ | Cross-entropy loss: $-\ln p(y\mid x)=-\dfrac{1}{m}\sum_{i=1}^m\left(y\ln\hat y+(1-y)\ln(1-\hat y)\right)$, where $\hat y=\dfrac{1}{1+e^{-(W^T x+b)}}$ | Gradient descent, Newton's method |
Perceptron | $f(x)=\mathrm{sign}(W^T x+b)$, where $\mathrm{sign}$ is the sign function | Make misclassified points as close as possible to the current separating hyperplane: $L(w,b)=-\sum_{x_i\in M} y_i(w\cdot x_i+b)$, where $M$ is the set of misclassified points | Stochastic gradient descent: update the parameters once for each misclassified sample |
K-nearest neighbors | $y=\arg\max_{c_j}\sum_{x_i\in N_K(x)} I(y_i=c_j)$ | — | — |
Naive Bayes | $P(Y=c_k\mid X=x)=\dfrac{P(Y=c_k)\prod_j P(X^{(j)}=x^{(j)}\mid Y=c_k)}{\sum_k P(Y=c_k)\prod_j P(X^{(j)}=x^{(j)}\mid Y=c_k)}$, under the conditional-independence assumption | Expected-risk minimization, equivalent to posterior maximization: $y=f(x)=\arg\max_{c_k} P(Y=c_k)\prod_j P(X^{(j)}=x^{(j)}\mid Y=c_k)$ | Maximum likelihood estimation: $P(Y=c_k)=\dfrac{\sum_{i=1}^N I(y_i=c_k)}{N}$, $P(X^{(j)}=a_{jl}\mid Y=c_k)=\dfrac{\sum_{i=1}^N I(x_i^{(j)}=a_{jl},\,y_i=c_k)}{\sum_{i=1}^N I(y_i=c_k)}$ |
SVM (linearly separable) | $f(x)=\mathrm{sign}(W^T x+b)$ | Margin maximization / hinge loss: $\min\dfrac{1}{2}\Vert w\Vert^2$ subject to $y_i(W^T x_i+b)-1\ge 0$ | Lagrangian duality, SMO |
SVM (approximately linearly separable, soft margin) | $f(x)=\mathrm{sign}(W^T x+b)$ | Margin maximization / hinge loss: $\min\dfrac{1}{2}\Vert w\Vert^2+C\sum_{i=1}^N\xi_i$ subject to $y_i(w\cdot x_i+b)\ge 1-\xi_i$ | Lagrangian duality, SMO |
SVM (nonlinear, kernelized) | $f(x)=\mathrm{sign}(W^T x+b)$ | Margin maximization / hinge loss, solved in the dual: $\min_{\alpha}\dfrac{1}{2}\sum_{i=1}^N\sum_{j=1}^N\alpha_i\alpha_j y_i y_j K(x_i,x_j)-\sum_{i=1}^N\alpha_i$ subject to $\sum_{i=1}^N\alpha_i y_i=0$ and $0\le\alpha_i\le C$ | Lagrangian duality, SMO |
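The first row of the table (least squares fit by gradient descent) can be sketched as follows; the toy data, learning rate, and step count are illustrative choices, not from the table:

```python
# Minimal sketch: least-squares linear regression f(x) = w*x + b (1-D case),
# fit by batch gradient descent on L(w, b) = mean((f(x) - y)^2).

def fit_linear(xs, ys, lr=0.01, steps=2000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Noise-free data generated from y = 2x + 1, so GD should recover w≈2, b≈1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]
w, b = fit_linear(xs, ys)
```

The same loop with an extra `lr * lam * w` term subtracted from `w` each step would give the ridge-regression column.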
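Logistic regression's cross-entropy loss has a particularly clean gradient: for the sigmoid model, the per-sample derivative with respect to $W^T x+b$ reduces to $\hat y - y$. A minimal 1-D sketch, with illustrative toy data and hyperparameters:

```python
import math

# Minimal sketch: logistic regression f(x) = 1/(1 + e^{-(w*x + b)}),
# fit by gradient descent on the mean cross-entropy loss.

def fit_logistic(xs, ys, lr=0.1, steps=3000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # For sigmoid + cross-entropy, d(loss)/d(w*x+b) = y_hat - y.
        grads = [(1 / (1 + math.exp(-(w * x + b))) - y, x) for x, y in zip(xs, ys)]
        w -= lr * sum(g * x for g, x in grads) / n
        b -= lr * sum(g for g, _ in grads) / n
    return w, b

# Labels in {0, 1}: points below 0 are class 0, points above 0 are class 1.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(xs, ys)
```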
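The perceptron's solver column ("update once per misclassified sample") translates directly into code: whenever $y_i(w\cdot x_i+b)\le 0$, apply $w \leftarrow w + \eta\, y_i x_i$ and $b \leftarrow b + \eta\, y_i$. A sketch on an illustrative linearly separable toy set:

```python
# Minimal sketch of the perceptron: f(x) = sign(w·x + b), trained by
# stochastic gradient descent that updates once per misclassified point.

def train_perceptron(samples, lr=1.0, max_epochs=100):
    dim = len(samples[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, y in samples:
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                # Misclassified (or on the hyperplane): push it to the correct side.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                errors += 1
        if errors == 0:  # every point classified correctly
            break
    return w, b

# Linearly separable toy data, labels in {-1, +1}.
data = [([3.0, 3.0], 1), ([4.0, 3.0], 1), ([1.0, 1.0], -1)]
w, b = train_perceptron(data)
```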
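K-nearest neighbors has empty strategy and solver columns because there is no training step: the model $y=\arg\max_{c_j}\sum_{x_i\in N_K(x)} I(y_i=c_j)$ is evaluated directly at query time. A sketch with illustrative toy points:

```python
from collections import Counter

# Minimal sketch of K-nearest-neighbor classification: take the K training
# points closest to x and return the majority label among them.

def knn_predict(train, x, k=3):
    # Sort training points by squared Euclidean distance to the query x.
    by_dist = sorted(train,
                     key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

train = [((0.0, 0.0), 'a'), ((0.1, 0.2), 'a'), ((1.0, 1.0), 'b'),
         ((0.9, 1.1), 'b'), ((1.2, 0.9), 'b')]
label = knn_predict(train, (1.0, 0.9), k=3)
```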
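The naive Bayes row combines the two MLE formulas (class frequencies for the prior, per-class feature-value frequencies for the conditionals) with posterior maximization. A counting-based sketch over illustrative categorical toy data, without Laplace smoothing:

```python
from collections import Counter, defaultdict

# Minimal sketch of naive Bayes with the MLE estimates from the table:
# P(Y=c) as class frequency, P(X^(j)=a | Y=c) as per-class feature-value
# frequency; prediction is argmax_c P(Y=c) * prod_j P(x_j | Y=c).

def fit_nb(X, y):
    n = len(y)
    class_count = Counter(y)
    prior = {c: cnt / n for c, cnt in class_count.items()}
    cond = defaultdict(Counter)  # (feature index j, class c) -> value counts
    for xi, yi in zip(X, y):
        for j, v in enumerate(xi):
            cond[(j, yi)][v] += 1

    def predict(x):
        def score(c):
            p = prior[c]
            for j, v in enumerate(x):
                p *= cond[(j, c)][v] / class_count[c]
            return p
        return max(prior, key=score)

    return predict

# Toy data: two categorical features, labels in {-1, 1}.
X = [(1, 'S'), (1, 'M'), (1, 'M'), (1, 'S'), (2, 'S'), (2, 'M'), (2, 'L')]
y = [-1, -1, 1, 1, -1, 1, 1]
predict = fit_nb(X, y)
```

In practice the product is computed as a sum of logs, and Laplace smoothing is added so that unseen feature values do not zero out a class's score.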