Alibaba Tianchi Machine Learning (Data Analysis Master Competition 3: Car Product Clustering Analysis)

admin • 2022-11-20 20:04 • Artificial Intelligence

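Competition Background

The task is set against a background of competitor analysis: the car data is clustered so that each car is assigned to a group. For a given model, its competing models can then be found through the cluster it falls in. The task encourages learners to use the model data to profile car models, providing data-driven support for product positioning and competitor analysis.

Competition Data

Data source: car_price.csv, which contains 26 fields for 205 car models.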

No.	Field	Description (Type)
1	Car_ID	Unique ID of each observation (Integer)
2	Symboling	Assigned insurance risk rating. A value of +3 indicates that the auto is risky; -3 indicates that it is probably pretty safe. (Categorical)
3	carCompany	Name of car company (Categorical)
4	fueltype	Car fuel type, i.e., gas or diesel (Categorical)
5	aspiration	Aspiration used in a car (Categorical)
6	doornumber	Number of doors in a car (Categorical)
7	carbody	Body of car (Categorical)
8	drivewheel	Type of drive wheel (Categorical)
9	enginelocation	Location of car engine (Categorical)
10	wheelbase	Wheelbase of car (Numeric)
11	carlength	Length of car (Numeric)
12	carwidth	Width of car (Numeric)
13	carheight	Height of car (Numeric)
14	curbweight	The weight of a car without occupants or baggage (Numeric)
15	enginetype	Type of engine (Categorical)
16	cylindernumber	Number of cylinders in the car (Categorical)
17	enginesize	Size of engine (Numeric)
18	fuelsystem	Fuel system of car (Categorical)
19	boreratio	Bore ratio of car (Numeric)
20	stroke	Stroke or volume inside the engine (Numeric)
21	compressionratio	Compression ratio of car (Numeric)
22	horsepower	Horsepower (Numeric)
23	peakrpm	Car peak rpm (Numeric)
24	citympg	Mileage in city (Numeric)
25	highwaympg	Mileage on highway (Numeric)
26	price (dependent variable)	Price of car (Numeric)

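Competition Task

Participants are to run a cluster analysis on this car data and find the competitors of Volkswagen cars. The task must be completed in a notebook in Tianchi Lab and shared on the competition forum.
(Cluster analysis is one of the most common data analysis methods. It can be used to group users and also to group products, for example in competitor analysis. Participants may choose the number of clusters themselves based on the characteristics of the dataset, and should explain the basis for that choice.)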

import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import LabelEncoder
from sklearn import preprocessing

Load the data

df=pd.read_csv("car_price.csv")

Extract the columns whose values are strings. Note that the argument to include is the letter O (object dtype), not the digit 0.

cols = df.select_dtypes(include='O').columns
df2 = df.copy()

Convert the string columns to numeric labels

for col in cols:
    le = LabelEncoder()
    df2[col] = le.fit_transform(df[col])
df2
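
To make the encoding step concrete, here is a minimal sketch of what LabelEncoder does to a single column. The toy fueltype values are hypothetical and are not read from car_price.csv. The encoder sorts the unique labels and uses each label's position as its integer code.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy column for illustration; not read from car_price.csv
toy = pd.DataFrame({"fueltype": ["gas", "diesel", "gas", "gas"]})

le = LabelEncoder()
codes = le.fit_transform(toy["fueltype"])

# classes_ holds the sorted unique labels; a label's index is its code
mapping = {str(label): int(code) for code, label in enumerate(le.classes_)}
print(mapping)         # {'diesel': 0, 'gas': 1}
print(codes.tolist())  # [1, 0, 1, 1]

# inverse_transform recovers the original strings from the codes
restored = le.inverse_transform(codes)
print(list(restored))
```

Because the codes are ordered integers, a distance-based method such as KMeans will treat code 2 as farther from code 0 than code 1 is, even when the categories have no natural order. One-hot encoding is a common alternative when that matters, though the notebook here does not use it.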

Scale the data

scaler = preprocessing.MinMaxScaler()

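Note: assign to df2.loc[:], not to df2. fit_transform returns a NumPy array, so assigning it to df2 would replace the DataFrame and lose its column labels.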
df2.loc[:] = scaler.fit_transform(df2)
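
As an alternative to the in-place assignment, the scaled array can be wrapped in a new DataFrame that reuses the original column labels and index. The sketch below uses a hypothetical two-column toy frame rather than the real data, and the names toy and scaled are illustrative only. MinMaxScaler maps each column to the range [0, 1] via (x - min) / (max - min).

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy values for illustration; not read from car_price.csv
toy = pd.DataFrame({
    "horsepower": [50, 100, 150],
    "price": [5000.0, 10000.0, 25000.0],
})

scaler = MinMaxScaler()

# fit_transform returns a plain NumPy array, so rebuild a DataFrame
# with the original column labels and index
scaled = pd.DataFrame(
    scaler.fit_transform(toy),
    columns=toy.columns,
    index=toy.index,
)
print(scaled)
```

Building a new frame this way also avoids writing float results into integer columns, which some pandas versions may warn about.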

columns = ['symboling', 'fueltype', 'aspiration',
           'doornumber', 'carbody', 'drivewheel', 'enginelocation', 'wheelbase',
           'carlength', 'carwidth', 'carheight', 'curbweight', 'enginetype',
           'cylindernumber', 'enginesize', 'fuelsystem', 'boreratio', 'stroke',
           'compressionratio', 'horsepower', 'peakrpm', 'citympg', 'highwaympg',
           'price']

kmeans= KMeans(n_clusters=10)

kmeans.fit(df2[columns])
y_pred = kmeans.predict(df2[columns])
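
The task asks participants to explain the basis for the number of clusters, while the value 10 above is given without justification. One common way to support a choice, sketched here as an assumption rather than as the method this notebook used, is to compare inertia (the within-cluster sum of squares) and the silhouette score across several values of k. The data below is synthetic, generated with make_blobs as a stand-in for the scaled feature matrix.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the scaled features (assumption: 3 well-separated groups)
X, _ = make_blobs(
    n_samples=200,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=0.5,
    random_state=0,
)

inertias = {}
scores = {}
for k in range(2, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_                      # lower is tighter; always falls as k grows
    scores[k] = silhouette_score(X, model.labels_)    # higher means better-separated clusters

# Pick the k with the highest silhouette score
best_k = max(scores, key=scores.get)
print(best_k)
```

On the real data, the same loop could be run over df2[columns]. Inertia alone keeps decreasing as k grows, so it is usually read by looking for an "elbow", while the silhouette score gives a single value to compare.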

df['result'] = y_pred
df

List the unique car names for easier inspection

df['CarName'].unique()
name = 'volkswagen rabbit'

Extract the rows that share the same cluster label; these are the competitors of our car

label = df[df['CarName']==name]['result'].values[0]

df[df['result']==label]
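
The lookup above handles one model name. Since the task asks for the competitors of Volkswagen cars in general, the same idea can be extended to every model of the brand. The sketch below uses a hypothetical toy table with made-up cluster labels, and it assumes the brand is the first word of CarName, which may not hold for every row of the real file (spelling variants would need to be normalized first).

```python
import pandas as pd

# Hypothetical toy result table; names and cluster labels are illustrative only
toy = pd.DataFrame({
    "CarName": ["volkswagen rabbit", "volkswagen dasher", "toyota corolla",
                "honda civic", "bmw x3", "audi 100ls"],
    "result": [2, 5, 2, 2, 7, 5],
})

# Assumption: the brand is the first whitespace-separated word of CarName
toy["brand"] = toy["CarName"].str.split().str[0]

target_brand = "volkswagen"

# Every cluster label that contains at least one model of the target brand
vw_labels = toy.loc[toy["brand"] == target_brand, "result"].unique()

# Competitors: cars in those clusters that belong to other brands
competitors = toy[toy["result"].isin(vw_labels) & (toy["brand"] != target_brand)]
print(competitors[["CarName", "result"]])
```

Excluding the target brand's own rows keeps the output focused on rival models; dropping that condition would reproduce the behavior of the single-model lookup above.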
