Dive into Deep Learning (PyTorch edition): Chapter 2 Exercises

admin • 2022-01-28 08:14 • Artificial Intelligence

Dive Into Deep Learning

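2. Preliminaries
- 2.1 Data Manipulation
  - Exercise 1
  - Exercise 2
- 2.2 Data Preprocessing
  - Exercise 1
- 2.3 Linear Algebra
  - Exercise 1
  - Exercise 2
  - Exercise 3
  - Exercise 4
  - Exercise 5
  - Exercise 6
  - Exercise 7
  - Exercise 8
- 2.4 Calculus
  - Exercise 1
  - Exercise 2
  - Exercise 3
  - Exercise 4
- 2.5 Automatic Differentiation
  - Exercise 1
  - Exercise 2
  - Exercise 3
  - Exercise 4
  - Exercise 5
- 2.6 Probability
  - Exercise 1
  - Exercise 2
  - Exercise 3
  - Exercise 4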

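(I have only just started learning deep learning and am trying to write up the exercises for every section. I am a beginner, so if anything here is wrong or not fully understood, corrections are very welcome.)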

2. Preliminaries

2.1 Data Manipulation

Exercise 1

Run the code in this section. Change the conditional statement X == Y in this section to X < Y or X > Y, and then see what kind of tensor you get.

import torch

X = torch.arange(12, dtype=torch.float32).reshape((3,4))
Y = torch.tensor([[2.0, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])
X < Y, X > Y

Output

(tensor([[ True, False,  True, False],
         [False, False, False, False],
         [False, False, False, False]]),
 tensor([[False, False, False, False],
         [ True,  True,  True,  True],
         [ True,  True,  True,  True]]))

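Exercise 2

Replace the two tensors that operate by element in the broadcasting mechanism with other shapes, e.g., 3-dimensional tensors. Is the result the same as expected?

Adding shapes (2x1x3) + (1x3x2) raises an error, because the trailing dimensions (3 and 2) differ and neither is 1: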

a = torch.tensor([[[1,2,3]],[[5,3,5]]])  # 2x1x3
b = torch.tensor([[[1,2],[3,5],[6,7]]])  # 1x3x2
a + b

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Input In [103], in <module>
      1 a = torch.tensor([[[1,2,3]],[[5,3,5]]])
      2 b = torch.tensor([[[1,2],[3,5],[6,7]]])
----> 3 a + b

RuntimeError: The size of tensor a (3) must match the size of tensor b (2) at non-singleton dimension 2

For (1x2x3) + (2x1x3):

a = torch.tensor([[[1, 2, 3], [4, 5, 6]]])  # 1x2x3
b = torch.tensor([[[7, 8, 9]], [[4, 5, 6]]])  # 2x1x3
a + b

The result has shape (2x2x3):

tensor([[[ 8, 10, 12],
         [11, 13, 15]],

        [[ 5,  7,  9],
         [ 8, 10, 12]]])

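On the broadcasting mechanism, the official PyTorch documentation says two tensors are broadcastable if:
1. Each tensor has at least one dimension.
2. When iterating over the dimension sizes, starting at the trailing dimension, the dimension sizes must either be equal, one of them is 1, or one of them does not exist.
An article on this site by <狗狗狗大王> explains this in great detail:

Condition 2: go through the dimensions of the two tensors in order, starting from the trailing one. Every pair of corresponding dimensions of x and y must match. A pair counts as matching if any of the following holds:
if the two dimensions have the same size
elif the dimension exists in one tensor but not in the other
elif the dimension exists in both tensors but has size 1 in one of them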
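The matching rule above can be sketched in plain Python, without PyTorch. This is only an illustration of the rule as stated; `broadcast_shape` is a hypothetical helper name, not a PyTorch function.

```python
from itertools import zip_longest

def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError if the shapes do not match."""
    result = []
    # Walk both shapes from the trailing dimension; a missing dimension is treated as size 1.
    for da, db in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if da == db or da == 1 or db == 1:
            result.append(db if da == 1 else da)
        else:
            raise ValueError(f"sizes {da} and {db} cannot be broadcast together")
    return tuple(reversed(result))

print(broadcast_shape((1, 2, 3), (2, 1, 3)))  # (2, 2, 3)

try:
    broadcast_shape((2, 1, 3), (1, 3, 2))
except ValueError as err:
    print("error:", err)
```

The two calls mirror the two cases tried above: (1x2x3) with (2x1x3) broadcasts to (2x2x3), while (2x1x3) with (1x3x2) fails on the trailing dimension.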

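2.2 Data Preprocessing

os.makedirs(dir_name2, exist_ok=True) creates directories recursively.
With exist_ok=True, no error is raised if the directory already exists.

os.path.join() joins path components into a file path.

Exercise 1

Create a raw dataset with more rows and columns. (1) Delete the column with the most missing values. (2) Convert the preprocessed dataset to tensor format.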

import os
import pandas as pd
import torch

os.makedirs(os.path.join('..', '2.1', 'data2'), exist_ok=True)
data_file = os.path.join('..', '2.1', 'data2', 'house_tiny.csv')
with open(data_file, 'w') as f:
    f.write('NumRooms,Alley,Floor,Price\n')  # column names
    f.write('NA,Pave,2,127500\n')  # each following row is one data example
    f.write('2,NA,1,106000\n')
    f.write('4,NA,NA,178100\n')
    f.write('NA,NA,2,140000\n')
    f.write('NA,NA,2,152000\n')
data = pd.read_csv(data_file)
inputs, outputs = data.iloc[:, 0:3], data.iloc[:, 3]

num = inputs.isnull().sum()  # number of missing values in each column
Max_NaN = inputs.isnull().sum().idxmax()  # label of the column with the most missing values
inputs = inputs.drop(Max_NaN, axis=1)  # drop that column from inputs
inputs = inputs.fillna(inputs.mean())  # replace missing entries with the mean of their column
inputs = pd.get_dummies(inputs, dummy_na=True)

x, y = torch.tensor(inputs.values), torch.tensor(outputs.values)  # convert to tensors
x, y

Output

(tensor([[3.0000, 2.0000],
         [2.0000, 1.0000],
         [4.0000, 1.7500],
         [3.0000, 2.0000],
         [3.0000, 2.0000]], dtype=torch.float64),
 tensor([127500, 106000, 178100, 140000, 152000]))

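df.isnull().sum() counts the null values in each column and returns one count per column.
.idxmax() returns the index label of the largest value in a pandas Series.
drop removes rows by default; to remove a column, pass axis=1.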

2.3 Linear Algebra

Exercise 1

Prove that the transpose of the transpose of a matrix A is A, i.e., (A^T)^T = A.

A = torch.randn(4, 3)
A == A.T.T

Output

tensor([[True, True, True],
        [True, True, True],
        [True, True, True],
        [True, True, True]])
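The check above only confirms the identity for one random matrix. A short index argument, using nothing but the definition $(A^\top)_{ij} = A_{ji}$, seems enough to cover the general case:

```latex
\left( (A^\top)^\top \right)_{ij} = (A^\top)_{ji} = A_{ij}
\quad \text{for all } i, j
\;\Longrightarrow\; (A^\top)^\top = A
```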

Exercise 2

Given two matrices A and B, show that the sum of their transposes equals the transpose of their sum, i.e., A^T + B^T = (A + B)^T.

A = torch.arange(12).reshape(3,4)
B = torch.randn(3,4)
A.T + B.T == (A + B).T

Output

tensor([[True, True, True],
        [True, True, True],
        [True, True, True],
        [True, True, True]])
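Again the numerical check covers a single pair of matrices. Entry by entry, with $(M^\top)_{ij} = M_{ji}$ as the only assumption, the general statement can be written as:

```latex
\left( (A + B)^\top \right)_{ij} = (A + B)_{ji} = A_{ji} + B_{ji}
= (A^\top)_{ij} + (B^\top)_{ij} = \left( A^\top + B^\top \right)_{ij}
```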

Exercise 3

Given any square matrix A, is A + A^T always symmetric? Why?

A = torch.randn(4, 4)
(A + A.T).T == (A + A.T)

Output

tensor([[True, True, True, True],
        [True, True, True, True],
        [True, True, True, True],
        [True, True, True, True]])
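For the "why" part, a one-line argument works, relying on the two standard transpose identities $(X + Y)^\top = X^\top + Y^\top$ and $(X^\top)^\top = X$:

```latex
(A + A^\top)^\top = A^\top + (A^\top)^\top = A^\top + A = A + A^\top
```

So A + A^T equals its own transpose for every square A, which is the definition of a symmetric matrix.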

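Exercise 4

We defined a tensor X of shape (2, 3, 4) in this section. What is the output of len(X)?

X = torch.arange(24).reshape(2, 3, 4)
len(X)

Output

2

Exercise 5

For a tensor X of arbitrary shape, does len(X) always correspond to the length of a certain axis of X? What is that axis?

X = torch.arange(24).reshape(2, 3, 4)  # 2x3x4 tensor
Y = torch.arange(24).reshape(4, 6)  # 4x6 tensor
Z = torch.ones(1)   # 1-D tensor with a single element
len(X), len(Y), len(Z)

Output

(2, 4, 1)

len() returns the length of axis 0 of the tensor. (A 0-dimensional tensor has no axis 0, so len() raises a TypeError for it.)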

Exercise 6

Run A / A.sum(axis=1) and see what happens. Can you analyze the reason?

If A is a square matrix:

A = torch.arange(16).reshape(4, 4)  # 4x4
A / A.sum(axis=1)

Output

tensor([[0.0000, 0.0455, 0.0526, 0.0556],
        [0.6667, 0.2273, 0.1579, 0.1296],
        [1.3333, 0.4091, 0.2632, 0.2037],
        [2.0000, 0.5909, 0.3684, 0.2778]])

If A is not a square matrix:

A = torch.arange(12).reshape(3, 4)  # 3x4
B = torch.arange(12).reshape(4, 3)  # 4x3
A / A.sum(axis=1)  # or B / B.sum(axis=1)

Output

RuntimeError                              Traceback (most recent call last)
Input In [78], in <module>
----> 1 A / A.sum(axis=1)

RuntimeError: The size of tensor a (4) must match the size of tensor b (3) at non-singleton dimension 1

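A.sum(axis=1) reduces along axis 1 and returns a 1-D tensor with one entry per row of A. Broadcasting then aligns this 1-D tensor with the last axis (the columns) of A.
If A is square, the number of rows equals the number of columns, so the element-wise division runs. Note that it divides column j by the sum of row j, so it is not a row normalization.
If A is not square, the two lengths differ, so the element-wise operation fails.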
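If the point of dividing by A.sum(axis=1) is to normalize each row, keeping the reduced axis appears to work for any shape. A minimal sketch, assuming row normalization is the goal (the 3x4 matrix is just an example, and float32 is used to keep the division explicit):

```python
import torch

A = torch.arange(12, dtype=torch.float32).reshape(3, 4)

# keepdim=True keeps the reduced axis with size 1, so the row sums have shape (3, 1)
row_sums = A.sum(dim=1, keepdim=True)

# (3, 4) / (3, 1) broadcasts across columns: every row is divided by its own sum
normalized = A / row_sums

print(row_sums.shape)
print(normalized.sum(dim=1))  # each row now sums to 1 (up to rounding)
```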

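Exercise 7

Consider a tensor with shape (2, 3, 4). What are the shapes of the summation outputs along axis 0, 1, and 2?

A = torch.arange(24).reshape(2,3,4)  # 2x3x4
A.sum(axis=0).shape, A.sum(axis=1).shape, A.sum(axis=2).shape

Output

(torch.Size([3, 4]), torch.Size([2, 4]), torch.Size([2, 3]))

Summing along axis 0, 1, or 2 collapses that axis and keeps the other two, giving shapes (3, 4), (2, 4), and (2, 3) respectively.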

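Exercise 8

Feed a tensor with 3 or more axes to the linalg.norm function and observe its output. What does this function compute for tensors of arbitrary shape?

A, B = torch.randn(2,3,4), torch.randn(3, 4)
outputs1 = torch.linalg.norm(A)  # 2x3x4 tensor
outputs2 = torch.linalg.norm(B)  # 3x4 tensor
A, B, outputs1, outputs2

Output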

(tensor([[[ 2.1417, -1.2939, -0.0506,  0.0582],
          [ 0.9437,  0.3785, -0.0736, -0.1000],
          [-0.2323,  1.3399,  0.6603,  0.8154]],
 
         [[-0.1303, -0.4355, -0.2770,  1.8112],
          [ 0.7443, -0.1177,  0.8033,  0.0264],
          [ 0.5158, -0.1448, -0.7694, -0.5072]]]),
 tensor([[-1.1264,  0.0546,  0.4413, -0.1869],
         [-1.7601, -0.4381, -0.2288, -1.7541],
         [-0.1453,  1.0307, -0.8918,  0.7459]]),
 tensor(4.0225),
 tensor(3.2180))

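With default arguments this is the L2 norm of the flattened tensor, i.e. the square root of the sum of the squares of all elements. It can be read as a measure of the magnitude of the tensor.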
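A quick way to check this reading is to compare the result against a manual square root of the sum of squares. This is only a sanity check with default arguments (the seed is arbitrary):

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only for reproducibility
A = torch.randn(2, 3, 4)

norm = torch.linalg.norm(A)          # default arguments on a 3-axis tensor
manual = torch.sqrt((A ** 2).sum())  # square root of the sum of all squared elements

print(norm, manual)
print(torch.allclose(norm, manual))
```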

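2.4 Calculus

Exercise 1

Plot the function $y = f(x) = x^{3} - \frac{1}{x}$ and its tangent line at $x = 1$.

Since $f'(x) = 3x^{2} + \frac{1}{x^{2}}$, we have $f'(1) = 4$ and $f(1) = 0$, so the tangent line is $y = 4x - 4$.

x = np.arange(0.1, 3, 0.1)  # assumed range, starting above 0 to avoid dividing by zero; needs numpy as np and the plot helper from section 2.4 of the book
plot(x, [x ** 3 - 1 / x, 4 * x - 4], 'x', 'f(x)', legend=['f(x)', 'Tangent line (x=1)'])

Output

[Figure: f(x) = x^3 - 1/x and its tangent line at x = 1]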
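The slope and intercept of the tangent line at x = 1 can be double-checked numerically. The sketch below uses a central difference in plain Python; `numerical_derivative` and the step size `h` are illustrative choices, not part of the book's code.

```python
def f(x):
    return x ** 3 - 1 / x

def numerical_derivative(func, x, h=1e-5):
    # Central difference approximation of func'(x)
    return (func(x + h) - func(x - h)) / (2 * h)

x0 = 1.0
slope = numerical_derivative(f, x0)
intercept = f(x0) - slope * x0  # the tangent line is y = slope * x + intercept

print(slope, intercept)  # close to 4 and -4
```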

Exercise 2

Find the gradient of the function $f(x) = 3x_{1}^{2} + 5e^{x_{2}}$.

$\nabla f(x) = [6x_{1},\ 5e^{x_{2}}]$

Differentiate the first term with respect to $x_{1}$ and the second term with respect to $x_{2}$; the gradient is the vector of these partial derivatives.
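The gradient can be checked against finite differences at an arbitrary point. A small sketch in plain Python (the test point and the step size `h` are arbitrary choices):

```python
import math

def f(x1, x2):
    return 3 * x1 ** 2 + 5 * math.exp(x2)

def analytic_grad(x1, x2):
    # Gradient derived by hand: [6 * x1, 5 * exp(x2)]
    return (6 * x1, 5 * math.exp(x2))

def numerical_grad(x1, x2, h=1e-6):
    # Central differences, one coordinate at a time
    d1 = (f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h)
    d2 = (f(x1, x2 + h) - f(x1, x2 - h)) / (2 * h)
    return (d1, d2)

point = (1.5, -0.5)  # arbitrary test point
print(analytic_grad(*point))
print(numerical_grad(*point))
```

The two printed tuples should agree to several decimal places.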

Exercise 3

What is the gradient of the function $f(x) = \|x\|_{2}$?

$\frac{\partial f(x)}{\partial x} = \frac{\partial \|x\|_{2}}{\partial x} = \frac{x}{\|x\|_{2}}$

The L2 norm can be written as $\sqrt{x^{\top}x}$, so the gradient of $f(x)$ is the derivative of $\sqrt{x^{\top}x}$ with respect to $x$ (defined for $x \neq 0$):

$f'(x) = \frac{2x}{2\sqrt{x^{\top}x}} = \frac{x}{\sqrt{x^{\top}x}} = \frac{x}{\|x\|_{2}}$
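As a sanity check on $\frac{x}{\|x\|_{2}}$, the sketch below compares it with a finite-difference gradient in plain Python. The helper names and the test vector are illustrative only.

```python
import math

def l2_norm(x):
    return math.sqrt(sum(v * v for v in x))

def norm_grad(x):
    # Analytic gradient x / ||x||_2, valid for x != 0
    n = l2_norm(x)
    return [v / n for v in x]

def norm_grad_numeric(x, h=1e-6):
    # Central difference along each coordinate
    grad = []
    for i in range(len(x)):
        plus = x[:i] + [x[i] + h] + x[i + 1:]
        minus = x[:i] + [x[i] - h] + x[i + 1:]
        grad.append((l2_norm(plus) - l2_norm(minus)) / (2 * h))
    return grad

x = [3.0, -4.0]
print(norm_grad(x))          # [0.6, -0.8]
print(norm_grad_numeric(x))  # approximately the same
```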

Exercise 4

Can you write out the chain rule for the case where $u = f(x, y, z)$ and $x = x(a, b)$, $y = y(a, b)$, $z = z(a, b)$?

Because $u$ depends on $a$ and $b$ through several variables, the derivatives are partial derivatives:

$\frac{\partial u}{\partial a} = \frac{\partial u}{\partial x}\frac{\partial x}{\partial a} + \frac{\partial u}{\partial y}\frac{\partial y}{\partial a} + \frac{\partial u}{\partial z}\frac{\partial z}{\partial a}$

$\frac{\partial u}{\partial b} = \frac{\partial u}{\partial x}\frac{\partial x}{\partial b} + \frac{\partial u}{\partial y}\frac{\partial y}{\partial b} + \frac{\partial u}{\partial z}\frac{\partial z}{\partial b}$

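2.5 Automatic Differentiation

requires_grad is an attribute of Tensor, PyTorch's general-purpose data structure. It indicates whether gradient information for that tensor should be tracked during computation.

Summing a vector x is equivalent to taking its dot product with an all-ones vector, so the gradient of x.sum() with respect to x is a vector of ones.

Exercise 1

Why is the second derivative much more expensive to compute than the first derivative?

Because the second derivative is obtained by differentiating the first derivative: the first derivative has to be computed first, with its own computation graph kept, and then differentiated again.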
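To make the extra work concrete, here is a minimal sketch of computing a second derivative with autograd. The function y = x^3 and the point x = 3 are arbitrary choices; the key assumption is that `create_graph=True` is what lets the first derivative be differentiated again.

```python
import torch

x = torch.tensor(3.0, requires_grad=True)
y = x ** 3

# First derivative; create_graph=True also records how it was computed,
# so that it can be differentiated a second time.
(dy_dx,) = torch.autograd.grad(y, x, create_graph=True)

# Second derivative: a second backward pass through the graph of dy_dx
(d2y_dx2,) = torch.autograd.grad(dy_dx, x)

print(dy_dx.item(), d2y_dx2.item())  # 27.0 18.0
```

The second call has to traverse a graph that only exists because the first call built it, which is where the additional cost comes from.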

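Exercise 2

After running the function for backpropagation, immediately run it again and see what happens.

import torch

x = torch.arange(4.0, requires_grad=True)
y = 2 * torch.dot(x, x)
y.backward()
y.backward()  # run backpropagation again immediately
x.grad

Output

RuntimeError: Trying to backward through the graph a second time, but the saved intermediate results have already been freed. Specify retain_graph=True when calling .backward() or autograd.grad() the first time.  # by the second backward call, the intermediate results saved for the first one have already been freed

Specify retain_graph=True on the first call to .backward():

import torch

x = torch.arange(4.0, requires_grad=True)
y = 2 * torch.dot(x, x)
y.backward(retain_graph=True)  # keep the computation graph from being freed
y.backward()
x.grad

Output

tensor([ 0.,  8., 16., 24.])

By default PyTorch frees the computation graph after backward(), so it cannot be called twice in a row unless the first call keeps the graph. Gradients also accumulate in x.grad: the result above is 4x + 4x = 8x. To run backward() repeatedly without accumulation, reset x.grad between calls, for example with x.grad.zero_().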

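Exercise 3

In the control flow example where we calculate the derivative of d with respect to a, what would happen if we changed the variable a to a random vector or matrix?

import torch

def f(a):
    b = a * 2
    while b.norm() < 1000:
        b = b * 2
    if b.sum() > 0:
        c = b
    else:
        c = 100 * b
    return c

a = torch.randn(size=(2,2), requires_grad=True)  # a is 2x2
d = f(a)
d.backward()

Output

RuntimeError: grad can be implicitly created only for scalar outputs  # backward() without arguments does not work on a vector or matrix output

For a non-scalar output (a vector or matrix), backward() needs a gradient argument whose shape matches the output.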
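Two ways around the error, sketched under the assumption that we want the gradient of the sum of all entries of d: reduce d to a scalar first, or pass an all-ones `gradient` of the same shape. The seed is arbitrary.

```python
import torch

def f(a):
    b = a * 2
    while b.norm() < 1000:
        b = b * 2
    if b.sum() > 0:
        c = b
    else:
        c = 100 * b
    return c

torch.manual_seed(0)  # arbitrary seed, only for reproducibility
a = torch.randn(size=(2, 2), requires_grad=True)

# Option 1: reduce the output to a scalar before calling backward()
d = f(a)
d.sum().backward()
grad_from_sum = a.grad.clone()

# Option 2: pass an explicit gradient with the same shape as the output
a.grad.zero_()
d2 = f(a)
d2.backward(gradient=torch.ones_like(d2))

print(torch.equal(grad_from_sum, a.grad))
print(torch.allclose(a.grad, (d / a).detach()))
```

Both options give the same gradient, and since f only scales a by a constant chosen by the control flow, that gradient matches d / a element-wise.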

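Exercise 4

Redesign an example of finding the gradient of the control flow. Run and analyze the result.

import torch

def f(a):
    b = a / 2
    while b > 1:
        b = b / 2  # halve b until it is at most 1, so the loop always terminates
    if b < 3:
        c = b * 2
    else:
        c = b * 3
    return c

a = torch.randn(size=(), requires_grad=True)
d = f(a)
d.backward()
a.grad == d / a

Output

tensor(True)

The idea is the same as the example in 2.5.4: for any given input, f is linear in a (f(a) = k * a for a constant k chosen by the control flow), so the gradient equals d / a.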

Exercise 5

Let $f(x) = \sin(x)$. Plot $f(x)$ and $\frac{df(x)}{dx}$, where the latter is computed without exploiting that $f'(x) = \cos(x)$.

Incorrect code

import torch
import matplotlib.pyplot as plt
import numpy as np
x = torch.arange(-3*np.pi, 3*np.pi, 0.1,requires_grad=True)
y = torch.sin(x)

y.sum().backward()

plt.plot(x, y, label='y=sin(x)') 
plt.plot(x, x.grad, label='dsin(x)=cos(x)') 
plt.legend(loc='upper center')
plt.show()

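Error:

RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.

numpy() cannot be called on a tensor that requires grad; use tensor.detach().numpy() instead.

Incorrect code

y.backward()

Error:

grad can be implicitly created only for scalar outputs

Backpropagation without arguments only works when y is a scalar. Each input x_i produces its own output y_i = sin(x_i), so y.sum() is a scalar whose gradient with respect to x_i is exactly dy_i/dx_i.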

import torch
import matplotlib.pyplot as plt
import numpy as np
x = torch.arange(-3*np.pi, 3*np.pi, 0.1,requires_grad=True)
y = torch.sin(x)

y.sum().backward()

plt.plot(x.detach(), y.detach(), label='y=sin(x)') 
plt.plot(x.detach(), x.grad, label='dsin(x)=cos(x)') 
plt.legend(loc='upper center')
plt.show()

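Output

[Figure: y = sin(x) and its derivative computed by autograd, plotted from -3π to 3π]

2.6 Probability

%matplotlib inline is a magic command: IPython provides a set of predefined "magic functions" that can be called with a command-line style syntax. The name after "%" is the magic command, followed by its arguments.
This one renders plots inline in the notebook, so plt.show() can be omitted.

torch.cumsum(input, dim, *, dtype=None, out=None) returns the cumulative sum of the elements of input along dimension dim.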

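Exercise 1

Conduct m = 500 groups of experiments where each group draws n = 10 samples. Vary m and n. Observe and analyze the experimental results.

Increasing either m or n increases the total number of samples, so the estimated frequencies move closer to the true probabilities.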
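The effect can be reproduced without PyTorch. The sketch below assumes the experiment is rolling a fair six-sided die, as in the book's example; `estimate_error` is a hypothetical helper that pools all m * n rolls and reports the worst deviation from 1/6.

```python
import random

def estimate_error(m, n, seed=0):
    """Roll a fair die n times in each of m groups and return the largest
    absolute gap between the pooled frequencies and the true value 1/6."""
    rng = random.Random(seed)
    counts = [0] * 6
    for _ in range(m * n):
        counts[rng.randrange(6)] += 1
    total = m * n
    return max(abs(c / total - 1 / 6) for c in counts)

for m, n in [(5, 10), (500, 10), (500, 1000)]:
    print(m, n, estimate_error(m, n))
```

The printed error should shrink as m * n grows, whichever of the two is increased.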

Exercise 2

Given two events with probability P(A) and P(B), compute upper and lower bounds on P(A∪B) and P(A∩B).
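One standard way to get these bounds uses only $P(A \cup B) = P(A) + P(B) - P(A \cap B)$ and the fact that every probability lies in $[0, 1]$:

```latex
\max\{P(A),\, P(B)\} \;\le\; P(A \cup B) \;\le\; \min\{1,\; P(A) + P(B)\}

\max\{0,\; P(A) + P(B) - 1\} \;\le\; P(A \cap B) \;\le\; \min\{P(A),\, P(B)\}
```

The union is smallest and the intersection largest when one event contains the other. The union is largest and the intersection smallest when the events overlap as little as possible.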

Exercise 3

Assume that we have a sequence of random variables, say A, B, and C, where B only depends on A, and C only depends on B. Can you simplify the joint probability P(A, B, C)?

A Markov chain is a collection of discrete random variables with the Markov property. Specifically, consider a collection of random variables $X = \{X_{n} : n > 0\}$ in a probability space $(\Omega, \mathcal{F}, \mathbb{P})$, indexed by a one-dimensional countable set. If the random variables all take values in a countable set, $X = s_{i}, s_{i} \in s$, and their conditional probabilities satisfy

$P(X_{t+1} \mid X_{t}, \ldots, X_{1}) = P(X_{t+1} \mid X_{t})$

then $X$ is a Markov chain. A, B, and C form such a chain, so

$P(ABC) = P(A)P(B \mid A)P(C \mid BA) = P(A)P(B \mid A)P(C \mid B)$

See 《张宇概率论与数理统计9讲》 (2022 edition), p. 7, item 4, note.
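The simplification can be illustrated on a toy chain of three binary variables. All probability tables below are made-up numbers; the joint distribution is built as a chain A -> B -> C, and the check confirms that conditioning C on both A and B gives the same value as conditioning on B alone.

```python
from itertools import product

# Hypothetical tables for a chain A -> B -> C of binary variables
p_a = {0: 0.3, 1: 0.7}
p_b_given_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}
p_c_given_b = {0: {0: 0.2, 1: 0.8}, 1: {0: 0.5, 1: 0.5}}

# Joint distribution P(A, B, C) = P(A) P(B | A) P(C | B)
joint = {(a, b, c): p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]
         for a, b, c in product((0, 1), repeat=3)}

def cond_c_given_ab(c, a, b):
    # P(C = c | A = a, B = b), computed from the joint table
    return joint[(a, b, c)] / sum(joint[(a, b, k)] for k in (0, 1))

def cond_c_given_b(c, b):
    # P(C = c | B = b), computed by summing A out of the joint table
    num = sum(joint[(a, b, c)] for a in (0, 1))
    den = sum(joint[(a, b, k)] for a in (0, 1) for k in (0, 1))
    return num / den

total = sum(joint.values())
max_gap = max(abs(cond_c_given_ab(c, a, b) - cond_c_given_b(c, b))
              for a, b, c in product((0, 1), repeat=3))
print(total, max_gap)  # total is 1 and the gap is 0, up to rounding
```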

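Exercise 4

In Section 2.6.2.6, the first test is more accurate. Why not run the first test twice rather than run both the first and second tests?

Because the two tests have different characteristics, each one is affected by different sources of error. Running the first test twice would compound the same source of error, since both runs share the same weaknesses, whereas running the first and the second test together largely cancels these effects out.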
