Traditional Image Processing (Part 1)

admin • 2022-11-20 20:04 • Artificial Intelligence

Contents

Image Classification
Handwritten Character Classification

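Image Classification

HOG (Histogram of Oriented Gradients) is a local feature descriptor widely used in computer vision and pattern recognition; it captures local texture and shape through the distribution of gradient directions. An image classification task can use a "HOG + classifier" pipeline: first compute the HOG descriptor of the whole image, then feed the resulting feature vector to a classifier.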

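How HOG Works

Because HOG operates on local grid cells of the image, it stays fairly robust to geometric and photometric deformations. In addition, with multi-scale sampling and strong local photometric normalization, small deformations of handwritten characters can be ignored without hurting detection performance.
The HOG feature extraction process is as follows: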

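1. Compute the gradient (magnitude and direction) of every pixel in the image.
2. Divide the image into small square cells.
3. For each cell, build a histogram of gradient directions in which every pixel votes for its direction bin with a weight equal to its gradient magnitude; this histogram is the cell's descriptor.
4. Group several neighboring cells into a larger block and concatenate their descriptors (usually normalized within the block) to get the block's descriptor.
5. Concatenate the descriptors of all blocks in the image to get the image descriptor, which is the image's HOG feature.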
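
The steps above can be sketched in plain NumPy. This is a simplified illustration and not the implementation inside skimage.feature.hog: it assumes unsigned gradients, hard assignment to direction bins (no interpolation between bins), and plain L2 normalization per block. The function names and the default cell size of 8 are chosen only for this example.

```python
import numpy as np

def cell_histograms(img, cell=8, bins=9):
    """Steps 1-3: magnitude-weighted direction histogram for every cell."""
    img = img.astype(np.float64)
    # Step 1: per-pixel gradients along rows (gy) and columns (gx).
    gy, gx = np.gradient(img)
    mag = np.hypot(gx, gy)
    # Unsigned direction in [0, 180) degrees.
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    # Step 2: split the image into non-overlapping cells.
    n_rows, n_cols = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((n_rows, n_cols, bins))
    # Step 3: each pixel votes for its bin, weighted by gradient magnitude.
    for r in range(n_rows):
        for c in range(n_cols):
            sl = (slice(r * cell, (r + 1) * cell), slice(c * cell, (c + 1) * cell))
            hist[r, c] = np.bincount(bin_idx[sl].ravel(),
                                     weights=mag[sl].ravel(),
                                     minlength=bins)
    return hist

def block_descriptor(hist, block=2, eps=1e-6):
    """Steps 4-5: concatenate cells per block, L2-normalize, concatenate blocks."""
    n_rows, n_cols, _ = hist.shape
    feats = []
    for r in range(n_rows - block + 1):      # blocks slide one cell at a time
        for c in range(n_cols - block + 1):
            v = hist[r:r + block, c:c + block].ravel()
            feats.append(v / np.sqrt(np.sum(v ** 2) + eps ** 2))
    return np.concatenate(feats)

# A horizontal ramp: every gradient points along the columns (direction 0).
ramp = np.tile(np.arange(32, dtype=float), (32, 1))
hist = cell_histograms(ramp)
print(hist.shape)                    # (4, 4, 9)
print(block_descriptor(hist).shape)  # (324,)
```

For the ramp image all votes land in the first bin, which makes the result easy to check by hand.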

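CIFAR-10 Classification

CIFAR-10 is a small image classification dataset with 10 classes that is commonly used in deep learning. Its images are only 32×32 pixels, yet they cover a rich variety of scenes.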

Data Loading
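
The post does not include the loading code, while the training script relies on a `Cifar` helper whose `load_cifar10()` method returns the training and test arrays. The following is only a sketch of what such a helper could look like, not the author's code. It assumes the "python version" of CIFAR-10 has been unpacked into a directory (the default path is a placeholder), where each batch file is a pickled dict holding `b"data"` (rows of 3072 bytes: the red, green and blue planes) and `b"labels"`. The channels are flipped to BGR only because the training script converts with `cv2.COLOR_BGR2GRAY`.

```python
import os
import pickle
import numpy as np

class Cifar:
    """Hypothetical loader for the CIFAR-10 'python version' batch files."""

    def __init__(self, root="./data/cifar-10-batches-py"):  # placeholder path
        self.root = root

    def _load_batch(self, name):
        with open(os.path.join(self.root, name), "rb") as f:
            batch = pickle.load(f, encoding="bytes")
        # Each row: 1024 red, 1024 green, 1024 blue values -> (N, 32, 32, 3).
        images = batch[b"data"].reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)
        # RGB -> BGR, an assumption made to match cv2.COLOR_BGR2GRAY.
        images = np.ascontiguousarray(images[..., ::-1])
        return images, np.array(batch[b"labels"])

    def load_cifar10(self):
        train = [self._load_batch(f"data_batch_{i}") for i in range(1, 6)]
        train_x = np.concatenate([x for x, _ in train])
        train_y = np.concatenate([y for _, y in train])
        test_x, test_y = self._load_batch("test_batch")
        return train_x, train_y, test_x, test_y
```

If the loader returns RGB images instead, the grayscale conversion in the training script should use `cv2.COLOR_RGB2GRAY`.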

## Model training
import cv2
import numpy as np
from skimage.feature import hog
from tqdm import tqdm
from sklearn.ensemble import RandomForestClassifier

class RFClassifier:
    def __init__(self):
        # Load the data. NOTE: Cifar is the author's data-loading helper;
        # its definition is not included in this post.
        self.data = Cifar()
        self.train_x, self.train_y, self.test_x, self.test_y = self.data.load_cifar10()
        # Build the model
        self.clf = RandomForestClassifier(n_estimators=800, min_samples_leaf=5, verbose=True, n_jobs=-1)
        # Extract HOG features for the training set and the test set
        self.train_x_hog = self.extract_features(self.train_x)
        self.test_x_hog = self.extract_features(self.test_x)
        print(self.train_x_hog.shape)
        print(self.train_x.shape)

    # Convert every image to grayscale and stack the HOG descriptors into a matrix
    def extract_features(self, images):
        feats = []
        for img in tqdm(images):
            # Assumes the loader returns BGR images; use cv2.COLOR_RGB2GRAY for RGB
            gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
            feats.append(self.extract_feature(gray))
        return np.array(feats)

    # Extract the HOG feature of a single grayscale image
    def extract_feature(self, img):
        hog_feat = hog(img,
                       orientations=9,
                       pixels_per_cell=(3, 3),
                       cells_per_block=(2, 2),
                       feature_vector=True,
        )
        return hog_feat

    # Train the model
    def fit(self):
        self.clf.fit(self.train_x_hog, self.train_y)

    # Evaluate the model
    def evaluate(self):
        train_y_pred = self.clf.predict(self.train_x_hog)
        # Accuracy on the training set
        train_accuracy = np.mean(train_y_pred == np.asarray(self.train_y))
        print("train accuracy: {}".format(train_accuracy))
        test_y_pred = self.clf.predict(self.test_x_hog)
        # Accuracy on the test set
        test_accuracy = np.mean(test_y_pred == np.asarray(self.test_y))
        print("test accuracy: {}".format(test_accuracy))


if __name__=="__main__":
    data=RFClassifier()
    data.fit()
    data.evaluate()

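Running this gives about 98% accuracy on the training set but only around 50% on the test set. The large gap shows that the model overfits, and that recognizing natural-scene images with hand-crafted features such as HOG is much harder than with a convolutional neural network. This approach is therefore better suited to simpler tasks.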
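
When judging results like these it helps to know how many dimensions each HOG vector has. The sketch below computes the descriptor length from the image size and the `hog` parameters. It assumes the block-counting convention used by `skimage.feature.hog` (cells per axis by integer division, blocks sliding one cell at a time); the helper name is made up for this example, and the two calls use the two parameter sets that appear in this post.

```python
def hog_descriptor_length(image_shape, orientations, pixels_per_cell, cells_per_block):
    """Length of the flattened HOG vector of a 2-D image."""
    # Cells per axis: partial cells at the border are dropped.
    n_cells = [s // c for s, c in zip(image_shape, pixels_per_cell)]
    # Blocks per axis: a block window slides over the cells with stride 1.
    n_blocks = [n - b + 1 for n, b in zip(n_cells, cells_per_block)]
    per_block = cells_per_block[0] * cells_per_block[1] * orientations
    return n_blocks[0] * n_blocks[1] * per_block

# CIFAR-10 settings: 32x32 image, 3x3-pixel cells, 2x2-cell blocks
print(hog_descriptor_length((32, 32), 9, (3, 3), (2, 2)))  # 2916
# Handwritten character settings: 64x64 image, 5x5-pixel cells, 3x3-cell blocks
print(hog_descriptor_length((64, 64), 9, (5, 5), (3, 3)))  # 8100
```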

Handwritten Character Classification

Preprocess the images:

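Grayscale conversion: the color and brightness of a character do not change its meaning; a black "A" and a red "A" mean the same thing. To avoid introducing unnecessary noise into the model, grayscale images are used as training data, so each original 3-channel image is converted to a 1-channel grayscale image.
Binarization: like color and brightness, the darkness of the strokes does not change a character's meaning. To simplify computation and remove background interference, the image is binarized.
Cropping and resizing: cropping removes as much of the blank area around the character as possible, which increases the feature differences between different characters. Resizing then brings every image to the same size (64×64 in the code that follows) so that all HOG vectors have the same length.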
Data Preprocessing

import numpy as np
from glob import glob
import os
from tqdm import tqdm
from skimage.io import imread
from skimage.feature import hog
from skimage.transform import resize


img_paths = sorted(glob(r'./data/English/Hnd/Img/*/*.png'))

# Binarize: pixels below 0.5 become 0 (ink), all others become 1 (background)
def binary(img):
    return (img >= 0.5).astype(np.float64)

# Crop away the blank margins, then resize
def preprocess(img):
    height, width = img.shape
    # np.where(condition) returns a tuple of index arrays, one per axis,
    # holding the coordinates of the elements that satisfy the condition.
    rows, cols = np.where(img < 1.)
    # Largest row/column index that contains ink
    max_x, max_y = max(rows), max(cols)
    # Smallest row/column index that contains ink
    min_x, min_y = min(rows), min(cols)
    # Side length of the square that encloses the character
    size = max(max_y - min_y, max_x - min_x)
    # Padding that centers the character inside the square
    y_empty = (size - (max_y - min_y)) // 2
    x_empty = (size - (max_x - min_x)) // 2
    # Crop; the +1 keeps the last ink row/column because slice ends are exclusive
    img = img[max(min_x - x_empty, 0):min(max_x + x_empty + 1, height),
              max(min_y - y_empty, 0):min(max_y + y_empty + 1, width)]
    img = resize(img, (64, 64))
    return img

# Extract features
def hog_features(img_path):
    # Read the image as grayscale (floats in [0, 1] for color input)
    img = imread(img_path, as_gray=True)
    # Binarize
    img = binary(img)
    img = preprocess(img)
    # Extract HOG features
    hog_feat = hog(img, orientations=9, pixels_per_cell=(5, 5), cells_per_block=(3, 3))
    return hog_feat

# Save the extracted features next to each image as a .npy file
for img_path in tqdm(img_paths):
    np.save(os.path.splitext(img_path)[0] + ".npy", hog_features(img_path))
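
The cropping logic can be checked on a small synthetic image without any dataset. The sketch below is a NumPy-only illustration of the square-crop idea. The helper name `crop_to_square` and the 10×10 test image are invented for this example, and the final resize step is left out.

```python
import numpy as np

def crop_to_square(img):
    """Crop a binary image (0 = ink, 1 = background) to a square around the ink."""
    height, width = img.shape
    rows, cols = np.where(img < 1.0)
    min_r, max_r = rows.min(), rows.max()
    min_c, max_c = cols.min(), cols.max()
    # Extent of the longer side of the ink's bounding box.
    size = max(max_r - min_r, max_c - min_c)
    # Padding added to the shorter side so the crop becomes square.
    pad_r = (size - (max_r - min_r)) // 2
    pad_c = (size - (max_c - min_c)) // 2
    return img[max(min_r - pad_r, 0):min(max_r + pad_r + 1, height),
               max(min_c - pad_c, 0):min(max_c + pad_c + 1, width)]

# A 10x10 white image with a vertical stroke 6 pixels tall and 2 pixels wide.
img = np.ones((10, 10))
img[2:8, 4:6] = 0.0
cropped = crop_to_square(img)
print(cropped.shape)  # (6, 6)
```

The narrow stroke ends up centered in a 6×6 square, with two blank columns on each side.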

Train on the images and predict:

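# os.path.split()
# Splits a path into its directory part and its file name.
# Usage: os.path.split(path)
# Arguments:
# path is the full path to split:
# if it ends in a file name, the result is (directory, file name)
# if it ends in a separator (a bare directory such as "a/b/"), the result is (directory, "") with an empty file name


# os.path.dirname(path)
# Usage: os.path.dirname(path)
# Purpose: strips the last component (the file name) and returns the directory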
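
Combined, these two calls recover the class label from a file's path, which is how the training script further down derives its labels. A small sketch; the path is a made-up example that follows the `./data/English/Hnd/Img/*/*.png` pattern used in this post.

```python
import os

path = "./data/English/Hnd/Img/Sample001/img001-001.png"  # hypothetical file

directory = os.path.dirname(path)
print(directory)              # ./data/English/Hnd/Img/Sample001
print(os.path.split(path))    # ('./data/English/Hnd/Img/Sample001', 'img001-001.png')
print(os.path.split("a/b/"))  # ('a/b', '')

# The label is the name of the folder that contains the file.
label = os.path.split(directory)[-1]
print(label)                  # Sample001
```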

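# sorted(): returns a new sorted list
# set(): removes duplicates (the resulting order is not guaranteed)
# sorted(list(set(x))): removes duplicates from x, then sorts in ascending order

# zip() takes iterables as arguments, packs their corresponding elements into tuples,
# and returns an iterator over those tuples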

For example, given two lists, one flat and one nested, build a dict from them:

# Convert the zip object to a dict to inspect it
a = [1, 2, 3]
d = [['a', 'b', 'c'], ['aa', 'bb', 'cc'], ['aaa', 'bbb', 'ccc']]
dict(zip(a, d))  # {1: ['a', 'b', 'c'], 2: ['aa', 'bb', 'cc'], 3: ['aaa', 'bbb', 'ccc']}

# If a holds the keys and each inner list is one row of values
[dict(zip(a, value)) for value in d]
# [{1: 'a', 2: 'b', 3: 'c'},
#  {1: 'aa', 2: 'bb', 3: 'cc'},
#  {1: 'aaa', 2: 'bbb', 3: 'ccc'}]

Source: https://blog.csdn.net/weixin_29796905/article/details/113479564


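# numpy.load() reads an array back from a .npy file on disk.
# np.save(file, arr, allow_pickle=True, fix_imports=True)
# Saves an array to a binary file in NumPy .npy format.
# Arguments:
# file: the file name or path to save to; a relative path is resolved against the current working directory, and ".npy" is appended if the name does not already end with it.
# arr: the array to save, i.e. arr is written to the file named file.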
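
A quick round trip shows both functions together, including the automatic `.npy` suffix. This is a self-contained sketch that writes to a temporary directory; the array and the file name `feature` are placeholders.

```python
import os
import tempfile
import numpy as np

feature = np.arange(6, dtype=np.float64).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp:
    base = os.path.join(tmp, "feature")  # no extension on purpose
    np.save(base, feature)               # written as feature.npy
    saved_as = os.listdir(tmp)
    restored = np.load(base + ".npy")

print(saved_as)                           # ['feature.npy']
print(np.array_equal(feature, restored))  # True
```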

a = [1,2,3] 
b = [4,5,6]  
print(list(zip(a,b)))
for i,j in zip(a,b):     
    print(f"{i},{j}")

#coding=utf-8
from glob import glob
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split
from tqdm import tqdm
import os

img_paths = sorted(glob(r'./data/English/Hnd/Img/*/*.npy'))
# Extract the labels: the name of the folder that contains each feature file
labels = [os.path.split(os.path.dirname(im))[-1] for im in img_paths]
# Map the labels to integer IDs
labels_set = sorted(set(labels))
labels_dict = dict(zip(labels_set, range(len(labels_set))))
labels_id = [labels_dict[label] for label in labels]
# Stack the features into a matrix
features = []
for feature_path in tqdm(img_paths):
    feature = np.load(feature_path)
    features.append(feature)
features = np.array(features)

# Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(features, labels_id, test_size=0.15)

clf = LinearSVC(multi_class="ovr", verbose=True, max_iter=10000)
# Train the model
clf.fit(x_train, y_train)
# Predict on the test set
pred = clf.predict(x_test)
# Compute the accuracy
print(np.mean(pred == np.array(y_test)))
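
The classifier predicts integer IDs, so reading its output requires the reverse mapping from ID back to label. A minimal sketch using the same `sorted(set(...))` and `zip` construction as the script; the folder names and the predicted IDs are invented for illustration.

```python
labels = ["Sample002", "Sample001", "Sample002", "Sample010"]  # hypothetical folder names

labels_set = sorted(set(labels))
labels_dict = dict(zip(labels_set, range(len(labels_set))))    # label -> ID
id_to_label = {i: label for label, i in labels_dict.items()}   # ID -> label

pred = [2, 0, 1]  # stand-in for the output of clf.predict(...)
decoded = [id_to_label[i] for i in pred]
print(decoded)    # ['Sample010', 'Sample001', 'Sample002']
```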

