Machine Learning Lecture Notes

1 Introduction

1.1 Machine Learning vs. Statistics

Definition

  • Machine Learning: a field that takes an algorithmic approach to data analysis, processing and prediction.

  • Algorithmic: produces good predictions or extracts useful information from data to solve a practical problem.

Figure 1: Venn diagram showing the overlap of Statistics, Data Science and AI.

1.2 Applications

Supervised Learning

  • Definition:
    Automate decision-making processes by generalising from input-output pairs $(x_i, y_i)$, $i \in \{1, \ldots, N\}$, for some $N \in \mathbb{N}$.

  • Drawback
    Creating a dataset of inputs and outputs is often a laborious manual process.
  • Advantage
    Supervised learning algorithms are well understood and their performance is easy to measure.
  • Example (train ticket pricing by distance; see the sketch below)
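A minimal sketch of this setting in Python; the (distance, price) pairs and the linear model are illustrative assumptions, not data from the lecture:

```python
# Supervised learning sketch: the (distance, price) pairs below play the
# role of the input-output pairs (x_i, y_i); the values are made up.
import numpy as np

distances = np.array([10.0, 50.0, 120.0, 300.0, 450.0])  # inputs x_i (km)
prices    = np.array([3.5, 9.0, 21.0, 52.0, 78.0])       # outputs y_i

# Fit price ~ a * distance + b by least squares
a, b = np.polyfit(distances, prices, deg=1)

# Generalise: predict the price of a ticket for an unseen distance
print(f"predicted price for 200 km: {a * 200 + b:.2f}")
```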

Unsupervised Learning

  • Definition
    Only the input data is known, and no known output data is given to the algorithm.
  • Drawback
    Harder to understand and evaluate than supervised learning.
  • Advantage
    Only the input data is needed and there is no process of “creating input-output pairs” involved.
  • Example (trade portfolio; see the sketch below)
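A matching sketch for the unsupervised setting; the asset features and the use of scikit-learn's KMeans are illustrative assumptions:

```python
# Unsupervised learning sketch: only inputs are given, no outputs y.
# The algorithm must find structure (here: clusters of assets) on its own.
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical feature vector per asset: (mean daily return, volatility)
X = np.array([
    [0.001, 0.02],
    [0.002, 0.03],
    [0.010, 0.15],
    [0.012, 0.18],
])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # e.g. [0 0 1 1]: two discovered groups of assets
```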

Data & Tasks & Algorithms

  1. data (explained with a case)
    - data point
    - feature
    - feature extraction/engineering (see the sketch after this list)

  2. task (figure: overview of ML tasks)
    - regression
    - classification
    - clustering
    - dimensionality reduction
    < Different tasks have different loss functions, refer to 1.3–Function >

  3. algorithm
    - support vector machines (SVMs)
    - nearest neighbours
    - random forest
    - k-means
    - matrix factorisation/autoencoder
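A small sketch of feature extraction/engineering from item 1; the raw records and the chosen features are hypothetical:

```python
# Feature extraction sketch: turning raw data points into numeric
# feature vectors that an algorithm can consume. Records are made up.
raw_trips = [
    {"from": "Oxford", "to": "London", "km": 90, "peak": True},
    {"from": "Leeds",  "to": "York",   "km": 40, "peak": False},
]

def extract_features(trip):
    # Each data point becomes the feature vector (distance, peak indicator)
    return [float(trip["km"]), 1.0 if trip["peak"] else 0.0]

X = [extract_features(t) for t in raw_trips]
print(X)  # [[90.0, 1.0], [40.0, 0.0]]
```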

1.3 Deep Learning

Definition (supervised & unsupervised)

Deep learning solves problems by employing (artificial) neural networks: functions constructed by alternately composing affine and (simple) non-linear functions.

Task

  • prediction
  • classification
  • image recognition
  • speech recognition and synthesis
  • simulation
  • optimal decision making

  (Figure: tasks of machine learning)

Applications in Finance

  • detect fraud
  • machine-read cheques
  • perform credit scoring

Limits

  • limited explainability of deep learning
  • black-box nature of neural networks

Function

$f = (f_1, \ldots, f_O) \colon \mathbb{R}^I \to \mathbb{R}^O$

  • inputs $x_1, \ldots, x_I$   ($I \in \mathbb{N}$)

  • outputs $f_1(x_1, \ldots, x_I), \ldots, f_O(x_1, \ldots, x_I)$   ($O \in \mathbb{N}$)

  • loss function

    L

    (

    f

    )

    :

    =

    1

    I

    i

    =

    1

    I

    l

    (

    f

    i

    ^

    ,

    f

    i

    )

    L(f):=frac{1}{I}sum_{i=1}^Il(hat{f_i},f_i)

    L(f):=I1i=1Il(fi^,fi)   (e.g. squared loss, absolute loss)
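A small numeric sketch of the loss; the predicted and true values are made up for illustration:

```python
# Loss function sketch: average the per-component losses l(f_i_hat, f_i)
# between predicted and true values.
import numpy as np

f_hat = np.array([1.2, 0.4, 2.9])  # predicted values
f     = np.array([1.0, 0.5, 3.0])  # true values

squared_loss  = np.mean((f_hat - f) ** 2)   # l(a, b) = (a - b)^2
absolute_loss = np.mean(np.abs(f_hat - f))  # l(a, b) = |a - b|

print(squared_loss, absolute_loss)
```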

1) Example

  • Regression Problem: data fitting
  • Binary Classification Problem: the direction of the next price change / credit risk analysis

2) Constructing a Class of Functions

  • rich in the sense that it encompasses “almost any” reasonable functional relationship between the outputs and inputs;
  • parameterised by a finite set of parameters, so that we can actually work with it numerically;
  • able to cope with high-dimensional inputs and outputs.

3) Selection of an Optimal $f$

  • implementable numerically;
  • efficient enough to cope with large numbers of samples;
  • able to avoid the pitfall of overfitting, that is, producing a function $f$ that performs well with the training data but poorly with other data (see the sketch below).
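A synthetic sketch of the overfitting pitfall; the sine data and the polynomial models are illustrative assumptions:

```python
# Overfitting sketch: a high-degree polynomial matches the training
# points almost perfectly but generalises poorly to other data.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 8)
y_train = np.sin(2 * np.pi * x_train) + 0.1 * rng.standard_normal(8)
x_test  = np.linspace(0, 1, 100)
y_test  = np.sin(2 * np.pi * x_test)

for degree in (1, 3, 7):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse  = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")
```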

4) Functions in Deep Learning

  • Function (alternating affine and activation functions):

    $f = \sigma_r \circ L_r \circ \cdots \circ \sigma_1 \circ L_1 \colon \mathbb{R}^I \to \mathbb{R}^O$

    where, writing $x = (x_1, \ldots, x_{d_i}) \in \mathbb{R}^{d_i}$ (with $d_0 = I$ and $d_r = O$):
    - $L_i \colon \mathbb{R}^{d_{i-1}} \to \mathbb{R}^{d_i}$ is an affine function, transmitting $d_{i-1}$ signals to $d_i$ units or neurons;
    - $\sigma_i(x) := (\sigma_i(x_1), \ldots, \sigma_i(x_{d_i}))$ applies a scalar activation function $\sigma_i \colon \mathbb{R} \to \mathbb{R}$ componentwise, transforming the $d_i$ signals.

    < satisfies the requirements in 2) >
    < Since the construction makes no data-specific assumptions, it is general enough to fit diverse data. >

  • Optimal $f$: stochastic gradient descent (SGD); see the sketch below
    - the matrices and vectors parameterising its layers are optimised
    - a randomly drawn subset of samples (a minibatch) is used at each step
    - the gradient is computed using a form of algorithmic differentiation (backpropagation)

  • Loss function
    generally the absolute value of the residual, but other choices are used for particular tasks
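A minimal NumPy sketch of this construction: a two-layer network $f = L_2 \circ \sigma_1 \circ L_1$ (identity activation on the output, a common special case) with one minibatch SGD step, gradients computed by backpropagation. The sizes, data, and learning rate are illustrative assumptions:

```python
# Sketch of f = L_2 ∘ sigma_1 ∘ L_1 and one SGD step on a minibatch.
import numpy as np

rng = np.random.default_rng(0)
I, d1, O = 3, 4, 1                                     # d_0 = I, d_1 hidden, d_2 = O
W1, b1 = rng.standard_normal((d1, I)), np.zeros(d1)    # affine L_1
W2, b2 = rng.standard_normal((O, d1)), np.zeros(O)     # affine L_2
relu = lambda z: np.maximum(z, 0.0)                    # activation sigma_1

def forward(x):
    h = relu(W1 @ x + b1)        # sigma_1(L_1(x))
    return W2 @ h + b2, h        # L_2(...), identity output activation

def sgd_step(batch, lr=0.01):
    """One SGD step: average gradients over a minibatch of (x, y) pairs."""
    global W1, b1, W2, b2
    gW1, gb1, gW2, gb2 = 0.0, 0.0, 0.0, 0.0
    for x, y in batch:
        y_hat, h = forward(x)
        d_out = 2 * (y_hat - y)            # d(squared loss)/d(y_hat)
        gW2 += np.outer(d_out, h); gb2 += d_out
        d_h = (W2.T @ d_out) * (h > 0)     # backpropagation through ReLU
        gW1 += np.outer(d_h, x);   gb1 += d_h
    n = len(batch)
    W1 -= lr * gW1 / n; b1 -= lr * gb1 / n
    W2 -= lr * gW2 / n; b2 -= lr * gb2 / n

# Randomly drawn minibatch of synthetic samples
batch = [(rng.standard_normal(I), rng.standard_normal(O)) for _ in range(4)]
sgd_step(batch)  # parameters move one step along the estimated negative gradient
```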

History of Deep Learning

  • Heaviside function < problem sheet 1 >

    $f(x; w, b) := H(w^{\top} x + b)$, where

    $H(x) := \begin{cases} 0, & x < 0 \\ 1, & x \geq 0 \end{cases}$

  • Artificial Neuron cases (see the sketch below)
    - 1: if neuron A is activated, then neuron C is activated as well (since it receives two input signals from neuron A); but if neuron A is off, then neuron C is off as well.
    - 2: (logical AND) neuron C is activated only when both neurons A and B are activated (a single input signal is not enough to activate neuron C).
    - 3: (logical OR) neuron C is activated if either neuron A or neuron B is activated (or both).
    - 4: (logical NOT) neuron C is activated only if neuron A is active and neuron B is off; if A is active all the time, neuron C computes the negation of neuron B.
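A minimal sketch of the neuron cases above, built from the Heaviside function $f(x; w, b) = H(w^{\top}x + b)$; the particular weights and biases are one possible choice, not the only one:

```python
# Heaviside-neuron sketch: each logic gate is f(x; w, b) = H(w.x + b)
# for a suitable (assumed, non-unique) choice of weights and bias.
import numpy as np

H = lambda z: np.where(z >= 0, 1, 0)          # Heaviside step function
neuron = lambda x, w, b: H(np.dot(w, x) + b)  # f(x; w, b) := H(w'x + b)

for a in (0, 1):
    for b_in in (0, 1):
        x = np.array([a, b_in])
        AND     = neuron(x, np.array([1, 1]),  -2)   # case 2: both inputs needed
        OR      = neuron(x, np.array([1, 1]),  -1)   # case 3: at least one input
        A_NOT_B = neuron(x, np.array([1, -1]), -1)   # case 4: A on and B off
        print(a, b_in, "->", AND, OR, A_NOT_B)
```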