# 【SVM】简单介绍（三）

### 1、Lagrange Multipliers

Consider a general constrained optimization problem with one constraint of each type:

$$
\begin{aligned}
\text{Minimize } \quad & f(x) \\
\text{subject to } \quad & \left\{\begin{array}{l} a(x) \geq 0 \\ b(x) \leq 0 \\ c(x) = 0 \end{array}\right.
\end{aligned}
$$

Introduce one Lagrange multiplier per constraint to form the Lagrangian:

$$
\begin{aligned}
L(x, \alpha) = {} & f(x) - \alpha_1 a(x) - \alpha_2 b(x) - \alpha_3 c(x) \\
& \left\{\begin{array}{l} \alpha_1 \geq 0 \\ \alpha_2 \leq 0 \\ \alpha_3 \text{ is unconstrained} \end{array}\right.
\end{aligned}
$$

Maximizing the Lagrangian over the multipliers recovers the objective on the feasible set and blows up everywhere else:

$$
\max_\alpha L(x, \alpha) = \left\{\begin{array}{lr}
f(x), & \text{if }\left\{\begin{array}{l} a(x) \geq 0 \\ b(x) \leq 0 \\ c(x) = 0 \end{array}\right. \\
+\infty, & \text{otherwise}
\end{array}\right.
$$
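This behavior of the inner maximum is easy to check numerically. The sketch below uses a toy problem of my own choosing (minimize $f(x)=x^2$ subject to $a(x)=x-1 \geq 0$, ignoring the $b$ and $c$ constraints), not one from the article:

```python
import numpy as np

# Toy problem (my own illustration): minimize f(x) = x^2 subject to
# a(x) = x - 1 >= 0, so L(x, alpha) = x^2 - alpha * (x - 1) with alpha >= 0.
f = lambda x: x**2
a = lambda x: x - 1.0
L = lambda x, alpha: f(x) - alpha * a(x)

alphas = np.linspace(0.0, 100.0, 1001)  # grid over feasible multipliers alpha >= 0

# Feasible point (a(2) = 1 >= 0): the inner max over alpha recovers f(x), at alpha = 0.
print(L(2.0, alphas).max())   # 4.0 == f(2)

# Infeasible point (a(0) = -1 < 0): L(0, alpha) = alpha grows without bound in alpha.
print(L(0.0, alphas).max())   # 100.0 on this grid, and larger on any larger grid
```

The finite grid only approximates the supremum, but it is enough to show the dichotomy: feasible points score $f(x)$, infeasible ones are pushed to $+\infty$.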

$$
\begin{aligned}
\min_x\ & \max_\alpha L(x, \alpha) \\
\text{subject to } \quad & \left\{\begin{array}{l} a(x) \geq 0 \\ b(x) \leq 0 \\ c(x) = 0 \end{array}\right.
\end{aligned}
$$

When strong duality holds, the order of the min and max can be exchanged:

$$
\min_x \max_\alpha L(x, \alpha) = \max_\alpha \min_x L(x, \alpha)
$$

Once $x$ is eliminated by the inner minimization, the final optimization problem no longer involves the samples directly, only the Lagrange multipliers, so SVM appears to be unaffected by the dimensionality of the samples.
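One way to see this dimension-independence concretely: the dual objective touches the samples only through inner products $\mathbf{x}_i^T \mathbf{x}_j$ (this appears explicitly in the hard-margin dual later in the post), and the Gram matrix of inner products is unchanged if the data is embedded in a higher-dimensional space by any orthonormal map. A quick numpy sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))          # 5 samples living in 2 dimensions

# Embed the samples in 7 dimensions: pad with zeros, then apply an
# orthonormal map M (obtained from a QR decomposition of a random matrix).
M, _ = np.linalg.qr(rng.normal(size=(7, 7)))
X_high = np.hstack([X, np.zeros((5, 5))]) @ M.T   # 5 samples in 7 dimensions

# The Gram matrix — all the dual problem ever sees — is identical:
print(np.allclose(X @ X.T, X_high @ X_high.T))    # True
```

Since $M$ is orthonormal, $X_{\text{high}} X_{\text{high}}^T = X_{\text{pad}} M^T M X_{\text{pad}}^T = X X^T$, so the dual problem, and hence the SVM, cannot tell the two representations apart.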

### 2、KKT Conditions

$$
\begin{aligned}
& \text{Stationarity: } \nabla f\left(x^*\right) - \alpha_1 \nabla a\left(x^*\right) - \alpha_2 \nabla b\left(x^*\right) - \alpha_3 \nabla c\left(x^*\right) = 0 \\
& \text{Primal feasibility: } \left\{\begin{array}{l} a\left(x^*\right) \geq 0 \\ b\left(x^*\right) \leq 0 \\ c\left(x^*\right) = 0 \end{array}\right. \\
& \text{Dual feasibility: } \left\{\begin{array}{l} \alpha_1 \geq 0 \\ \alpha_2 \leq 0 \\ \alpha_3 \text{ is unconstrained} \end{array}\right. \\
& \text{Complementary slackness: } \left\{\begin{array}{l} \alpha_1 a\left(x^*\right) = 0 \\ \alpha_2 b\left(x^*\right) = 0 \\ \alpha_3 c\left(x^*\right) = 0 \end{array}\right.
\end{aligned}
$$
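As a concrete check, here is a hypothetical toy instance (minimize $f(x)=x^2$ subject to $a(x)=x-1 \geq 0$, with no $b$ or $c$ constraint); the numbers are my own example, not from the article. Its minimizer is $x^*=1$ with multiplier $\alpha_1=2$, and all four KKT conditions can be verified numerically:

```python
# Hypothetical toy instance: minimize f(x) = x^2 subject to a(x) = x - 1 >= 0.
# The constrained minimizer is x* = 1 with multiplier alpha_1 = 2.
x_star, alpha_1 = 1.0, 2.0

grad_f = 2.0 * x_star   # ∇f(x*) = 2x
grad_a = 1.0            # ∇a(x*) = 1
a_val = x_star - 1.0    # a(x*)

print(grad_f - alpha_1 * grad_a)   # 0.0  (stationarity)
print(a_val)                       # 0.0  (primal feasibility: constraint active)
print(alpha_1)                     # 2.0  (dual feasibility: alpha_1 >= 0)
print(alpha_1 * a_val)             # 0.0  (complementary slackness)
```

Note the constraint is active here ($a(x^*)=0$), which is why $\alpha_1$ may be strictly positive without violating complementary slackness.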

### 3、Hard Margin SVM Dual Problem

$$
\begin{aligned}
\text{Minimize } \quad & \frac{1}{2}\|\mathbf{w}\|^2 \\
\text{subject to } \quad & 1 - y_i\left(\mathbf{w}^T \mathbf{x}_i + b\right) \leq 0 \quad \text{for } i = 1, \ldots, n
\end{aligned}
$$

The Lagrangian, with a multiplier $\alpha_i \geq 0$ for each constraint, is:

$$
\mathcal{L} = \frac{1}{2}\mathbf{w}^T\mathbf{w} + \sum_{i=1}^n \alpha_i\left(1 - y_i\left(\mathbf{w}^T \mathbf{x}_i + b\right)\right)
$$

Setting $\partial \mathcal{L} / \partial \mathbf{w} = \mathbf{0}$ and $\partial \mathcal{L} / \partial b = 0$ gives:

$$
\begin{aligned}
\mathbf{w} + \sum_{i=1}^n \alpha_i\left(-y_i\right)\mathbf{x}_i = \mathbf{0} \quad &\Rightarrow \quad \mathbf{w} = \sum_{i=1}^n \alpha_i y_i \mathbf{x}_i \\
\sum_{i=1}^n \alpha_i y_i = 0, \quad & \alpha_i \geq 0
\end{aligned}
$$

(Substituting $\mathbf{w}$ back into the Lagrangian yields the dual problem.)

$$
\begin{aligned}
\max. \quad & W(\boldsymbol{\alpha}) = \sum_{i=1}^n \alpha_i - \frac{1}{2}\sum_{i=1,\,j=1}^n \alpha_i \alpha_j y_i y_j \mathbf{x}_i^T \mathbf{x}_j \\
& \text{subject to } \alpha_i \geq 0, \quad \sum_{i=1}^n \alpha_i y_i = 0
\end{aligned}
$$
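The dual is a small quadratic program, so it can be handed to a generic solver. Below is a sketch using `scipy.optimize.minimize` (SLSQP) on a made-up one-dimensional dataset of two points; the data and variable names are my own illustration, not from the article:

```python
import numpy as np
from scipy.optimize import minimize

# Invented 1-D toy data: one positive point at x = 2, one negative at x = 0.
# The max-margin separator should be w = 1, b = -1 (boundary at x = 1).
X = np.array([[2.0], [0.0]])
y = np.array([1.0, -1.0])
n = len(y)

Q = (y[:, None] * y[None, :]) * (X @ X.T)       # Q[i,j] = y_i y_j x_i^T x_j
neg_W = lambda a: -(a.sum() - 0.5 * a @ Q @ a)  # negated dual objective (to minimize)

res = minimize(neg_W, np.zeros(n), method="SLSQP",
               bounds=[(0.0, None)] * n,                          # alpha_i >= 0
               constraints=[{"type": "eq", "fun": lambda a: a @ y}])  # sum alpha_i y_i = 0
alpha = res.x

w = (alpha * y) @ X              # w = sum_i alpha_i y_i x_i
sv = np.argmax(alpha)            # any sample with alpha_i > 0 is on the margin,
b = y[sv] - w @ X[sv]            # so y_i (w^T x_i + b) = 1 pins down b

print(alpha)   # approx [0.5, 0.5]
print(w, b)    # approx [1.0] and -1.0
```

For this tiny instance one can also solve the dual by hand ($\alpha_1 = \alpha_2 = 1/2$ from the equality constraint plus stationarity), which matches the solver's output.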

After solving for $\boldsymbol{\alpha}$, the weight vector is recovered as:

$$
\mathbf{w} = \sum_{i=1}^n \alpha_i y_i \mathbf{x}_i
$$

1. Since the labels take the values +1 or -1, the formula above implies that positive and negative samples contribute roughly equally to the separating hyperplane, provided the two classes are roughly comparable in size.
2. Each sample $\mathbf{x}_i$ has an associated $\alpha_i$, and when $\alpha_i = 0$ that sample contributes nothing to the classifier, which is indeed the case. The samples that do contribute are called Support Vectors.
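To see point 2 in action, extend the toy dual with a third point that lies strictly outside the margin; its multiplier comes out (numerically) zero, so it is not a support vector. Again a sketch with invented data:

```python
import numpy as np
from scipy.optimize import minimize

# Invented 1-D data: negative point at 0, positive points at 2 and 3.
# The max-margin boundary sits at x = 1 (w = 1, b = -1); the point at 3
# lies strictly outside the margin, so its alpha should vanish.
X = np.array([[0.0], [2.0], [3.0]])
y = np.array([-1.0, 1.0, 1.0])
n = len(y)

Q = (y[:, None] * y[None, :]) * (X @ X.T)       # Q[i,j] = y_i y_j x_i^T x_j
neg_W = lambda a: -(a.sum() - 0.5 * a @ Q @ a)  # negated dual objective

res = minimize(neg_W, np.zeros(n), method="SLSQP",
               bounds=[(0.0, None)] * n,
               constraints=[{"type": "eq", "fun": lambda a: a @ y}])
alpha = res.x

support = np.flatnonzero(alpha > 1e-6)   # indices with alpha_i > 0
print(np.round(alpha, 4))   # approx [0.5, 0.5, 0.0]
print(support)              # the point at x = 3 is not a support vector
```

Deleting the point at $x=3$ and re-solving gives exactly the same $\mathbf{w}$ and $b$: only the support vectors determine the classifier.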

THE END