# A Brief Introduction to HRNet

## 0 Preface

Keypoint detection approaches generally fall into two categories:

1. Regression-based: directly regress the position coordinates of each keypoint.
2. Heatmap-based: predict one heatmap per keypoint, i.e. a score for the keypoint appearing at each position. HRNet belongs to this category.
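To make the contrast concrete, here is a toy sketch (illustrative only, using an assumed 64×48 feature map) of what the two target encodings look like for a single keypoint:

```python
import numpy as np

# one keypoint at (x=12, y=20) on a 64x48 feature map
x, y = 12, 20

# (1) regression target: the coordinates themselves
reg_target = np.array([x, y], dtype=np.float32)

# (2) heatmap target: a score map peaking at the keypoint,
#     here a 2D Gaussian with a standard deviation of 1 pixel
ys, xs = np.mgrid[0:64, 0:48]
heatmap_target = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / 2.0)

print(reg_target.shape)      # (2,)
print(heatmap_target.shape)  # (64, 48)
print(heatmap_target[y, x])  # 1.0 at the keypoint
```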

## 1 HRNet Network Architecture

```python
# Stage1: four Bottleneck blocks; the first block uses a 1x1 conv
# downsample branch to raise the channel count from 64 to 256
downsample = nn.Sequential(
    nn.Conv2d(64, 256, kernel_size=1, stride=1, bias=False),
    nn.BatchNorm2d(256, momentum=BN_MOMENTUM)
)
self.layer1 = nn.Sequential(
    Bottleneck(64, 64, downsample=downsample),
    Bottleneck(256, 64),
    Bottleneck(256, 64),
    Bottleneck(256, 64)
)
```


## 2 Visualizing the Prediction Results (Heatmaps)

The predicted heatmaps are $\frac{1}{4}$ of the network input resolution, so their height and width are 64 and 48 respectively. For each keypoint's heatmap, the position with the maximum predicted score is taken as the predicted keypoint location; mapping it back to the original image yields the keypoint coordinates on the original image (the figure below marks the position of each predicted keypoint on the original image).
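As a minimal sketch of the argmax decoding step (assuming heatmaps of shape `[num_kps, 64, 48]` and a 4× stride back to the input resolution; the function name is illustrative, not the repo's):

```python
import numpy as np

def decode_heatmaps(heatmaps, stride=4):
    """Take the argmax of each keypoint heatmap and map it back
    to input-image coordinates by multiplying with the stride."""
    num_kps, h, w = heatmaps.shape
    coords = np.zeros((num_kps, 2))
    for k in range(num_kps):
        idx = np.argmax(heatmaps[k])
        py, px = divmod(idx, w)                 # row, column of the max score
        coords[k] = [px * stride, py * stride]  # back to input resolution
    return coords

# toy example: one keypoint whose peak sits at (x=10, y=20) on the heatmap
hm = np.zeros((1, 64, 48))
hm[0, 20, 10] = 1.0
print(decode_heatmaps(hm))  # → [[40. 80.]]
```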

Each keypoint location is predicted by adjusting the highest heatvalue location with a quarter offset in the direction from the highest response to the second highest response.

```python
import math
import numpy as np

# Quarter-offset refinement: shift each predicted coordinate by 0.25 px
# toward the neighboring cell with the higher heatmap response.
heatmap_height = batch_heatmaps.shape[2]
heatmap_width = batch_heatmaps.shape[3]
for n in range(coords.shape[0]):      # for each image in the batch
    for p in range(coords.shape[1]):  # for each keypoint
        hm = batch_heatmaps[n][p]
        px = int(math.floor(coords[n][p][0] + 0.5))
        py = int(math.floor(coords[n][p][1] + 0.5))
        if 1 < px < heatmap_width - 1 and 1 < py < heatmap_height - 1:
            # horizontal and vertical response differences around the peak
            diff = np.array([
                hm[py][px + 1] - hm[py][px - 1],
                hm[py + 1][px] - hm[py - 1][px]
            ])
            coords[n][p] += np.sign(diff) * .25
```


```json
"kps": ["nose","left_eye","right_eye","left_ear","right_ear","left_shoulder","right_shoulder","left_elbow","right_elbow","left_wrist","right_wrist","left_hip","right_hip","left_knee","right_knee","left_ankle","right_ankle"]
```


## 3 Loss Computation

The loss function, defined as the mean squared error, is applied for comparing the predicted heatmaps and the groundtruth heatmaps. The groundtruth heatmaps are generated by applying 2D Gaussian with standard deviation of 1 pixel centered on the groundtruth location of each keypoint.

```json
"kps": ["nose","left_eye","right_eye","left_ear","right_ear","left_shoulder","right_shoulder","left_elbow","right_elbow","left_wrist","right_wrist","left_hip","right_hip","left_knee","right_knee","left_ankle","right_ankle"]
"kps_weights": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.2, 1.2, 1.5, 1.5, 1.0, 1.0, 1.2, 1.2, 1.5, 1.5]
```
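As a minimal numpy sketch of the ideas above (Gaussian target generation plus a per-keypoint weighted MSE; function and variable names are illustrative, not the repo's):

```python
import numpy as np

def gaussian_target(h, w, cx, cy, sigma=1.0):
    """Ground-truth heatmap: a 2D Gaussian with std of 1 pixel
    centered on the keypoint's ground-truth location (cx, cy)."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

def weighted_mse(pred, target, kps_weights):
    """MSE between predicted and ground-truth heatmaps, with each
    keypoint's contribution scaled by its weight from kps_weights."""
    # pred/target: [num_kps, H, W], kps_weights: [num_kps]
    per_kp = ((pred - target) ** 2).mean(axis=(1, 2))
    return (per_kp * np.asarray(kps_weights)).mean()
```

Weighting wrist, ankle, elbow and knee heatmaps more strongly (1.5 / 1.2) makes the loss focus on the limb extremities, which are harder to localize than the face and torso keypoints.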


## 4 Evaluation Metric

$$\mathrm{OKS} = \frac{\sum_{i}\left[e^{-d_i^2/(2s^2k_i^2)} \cdot \delta(v_i>0)\right]}{\sum_{i}\left[\delta(v_i>0)\right]}$$

- $i$ denotes the $i$-th keypoint.
- $v_i$ is the visibility of the $i$-th keypoint, provided by the ground truth (GT). $v_i=0$ means the point is not labeled (usually because it lies outside the image); $v_i=1$ means the point is occluded but its position can roughly be guessed (e.g. when a person stands sideways one ear is hidden, yet its position is still roughly inferable); $v_i=2$ means the point is visible.
- $\delta(x)$ equals 1 when $x$ is True and 0 when $x$ is False. From the formula above, OKS is computed only over the keypoints labeled in the GT, i.e. all keypoints with $v_i>0$.
- $d_i$ is the Euclidean distance between the $i$-th predicted keypoint and its corresponding GT keypoint.
- $s$ is the square root of the object's area. In the authors' words: "scale s which we define as the square root of the object segment area"; the area here should be the segmentation area, which is provided in the COCO annotations.
- $k_i$ is a constant controlling the falloff for keypoint category $i$. In the authors' words: "κi is a per-keypoint constant that controls falloff". These constants were obtained by statistics over the validation set (5000 images); for how $k_i$ is computed, see section 1.3 "Tuning OKS" on the official COCO website.
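A minimal numpy sketch of the formula above (argument names are illustrative; the actual COCO evaluation lives in `pycocotools`):

```python
import numpy as np

def oks(pred, gt, v, area, k):
    """Object Keypoint Similarity between predicted and GT keypoints.

    pred, gt: [num_kps, 2] arrays of (x, y) coordinates
    v:        [num_kps] GT visibility flags (0, 1 or 2)
    area:     object segmentation area (s is its square root, so s^2 = area)
    k:        [num_kps] per-keypoint falloff constants
    """
    d2 = ((pred - gt) ** 2).sum(axis=1)  # squared distances d_i^2
    labeled = v > 0                      # delta(v_i > 0)
    e = np.exp(-d2 / (2 * area * np.asarray(k) ** 2))
    return e[labeled].sum() / labeled.sum()

# a perfect prediction on every labeled keypoint gives OKS = 1
gt = np.array([[10.0, 10.0], [30.0, 40.0]])
print(oks(gt.copy(), gt, v=np.array([2, 2]), area=100.0, k=[0.1, 0.1]))  # → 1.0
```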

## 5 Miscellaneous

### 5.1 Data Augmentation

The augmentations used include random rotation (between $-45^{\circ}$ and $45^{\circ}$), random scaling (between 0.65 and 1.35), random horizontal flipping, and half-body augmentation (with some probability the person is cropped so that only the upper-body or lower-body keypoints remain). In the source code, the author implements these operations mainly through affine transformations; if you are not familiar with affine transforms, the code can be fairly hard to follow.
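As a rough illustration of the affine-transform idea only, here is a hand-rolled 2×3 matrix combining rotation about a fixed point with scaling (the repo itself builds its matrices differently, from source/destination point pairs):

```python
import numpy as np

def rotate_scale_matrix(cx, cy, angle_deg, scale):
    """2x3 affine matrix: rotate by angle_deg about (cx, cy), then scale."""
    a = np.deg2rad(angle_deg)
    cos, sin = np.cos(a) * scale, np.sin(a) * scale
    # rotation/scale part plus a translation that keeps (cx, cy) fixed
    return np.array([
        [cos, -sin, cx - cos * cx + sin * cy],
        [sin,  cos, cy - sin * cx - cos * cy],
    ])

def apply_affine(pts, m):
    """Apply a 2x3 affine matrix to an [N, 2] array of (x, y) points."""
    return pts @ m[:, :2].T + m[:, 2]

# the rotation center stays (approximately) in place under a 45-degree rotation
m = rotate_scale_matrix(96, 128, 45, 1.0)
print(apply_affine(np.array([[96.0, 128.0]]), m))
```

The same matrix that warps the image is applied to the keypoint coordinates, so image and labels stay consistent under rotation, scaling and flipping.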

THE END