Data distribution exploration function (ready to call directly)

``````
import matplotlib.pyplot as plt

data.hist(bins=50, figsize=(20, 15))  # one histogram per numeric column
plt.show()
``````

``````
merchant['avg_sales_lag3'].value_counts().sort_index()
``````
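For intuition, `value_counts().sort_index()` counts how often each distinct value occurs and then orders the result by value rather than by frequency. A minimal sketch on a made-up Series (the values are illustrative and do not come from the `merchant` data):

```python
import pandas as pd

# Toy stand-in for a column such as merchant['avg_sales_lag3'] (made-up values)
s = pd.Series([1.2, 0.9, 1.2, 1.0, 0.9, 1.2])

# value_counts() counts each distinct value; sort_index() orders the result by value
counts = s.value_counts().sort_index()
print(counts)
```

Sorting by index makes it easier to spot gaps or outliers in the value range, which is harder when the rows are ordered by frequency.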

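data: the dataset, as a DataFrame
variable: the feature (column) name; must be a string
va_type: one of two options. 'numeric' draws a histogram for a numeric variable; 'category' draws a bar chart for a categorical variable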
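``````
import matplotlib.pyplot as plt


def data_distribution_explore(data, variable, va_type):
    """Plot the distribution of one column of a DataFrame."""
    if va_type == 'numeric':
        # Histogram for a numeric variable
        distri = data[variable].values.tolist()
        plt.figure(figsize=(20, 5))
        nums, bins, patches = plt.hist(distri, bins=25)
        # Put a tick at every bin edge and rotate the labels for readability
        plt.xticks(bins, [f'{edge:.2f}' for edge in bins], rotation=45)
        # Write each bin's count just above the centre of its bar
        for num, left, right in zip(nums, bins[:-1], bins[1:]):
            plt.annotate(f'{int(num)}',
                         xy=((left + right) / 2, num),
                         xytext=(0, 0.8),
                         textcoords='offset points',
                         ha='center', va='bottom')
        plt.title(variable, fontsize=20)
        # plt.xlabel(variable, fontsize=20)
        plt.ylabel('count', fontsize=20)
    elif va_type == 'category':
        # Bar chart for a categorical variable, ordered by category value
        distri = data[variable].value_counts().sort_index()
        plt.figure(figsize=(20, 5))
        bars = plt.bar(distri.index, distri.values, 0.6)
        for rect in bars:
            height = rect.get_height()
            plt.annotate('{}'.format(height),
                         xy=(rect.get_x() + rect.get_width() / 2, height),
                         xytext=(0, 0.8),  # offset above the bar
                         textcoords='offset points',
                         ha='center', va='bottom')
        plt.title(variable, fontsize=20)
        # plt.xlabel(variable, fontsize=20)
        plt.ylabel('count', fontsize=20)
    else:
        raise ValueError("va_type must be 'numeric' or 'category'")

    plt.show()
``````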
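To double-check the counts annotated on the plots, it can help to compute the same numbers without drawing anything. The sketch below is only an illustration: `distribution_table` is a hypothetical helper that is not part of the original code, and the `demo` DataFrame contains made-up values standing in for `merchant`. It relies on the fact that `plt.hist` bins its data with `np.histogram`.

```python
import numpy as np
import pandas as pd


def distribution_table(data, variable, va_type, bins=25):
    """Hypothetical helper: return the counts that the plot would display."""
    if va_type == 'numeric':
        # Same binning as plt.hist, which uses np.histogram internally
        values = data[variable].dropna().to_numpy()
        counts, edges = np.histogram(values, bins=bins)
        # Label each bin by its left edge
        return pd.Series(counts, index=np.round(edges[:-1], 4), name='count')
    if va_type == 'category':
        # Same counts as the bar chart, ordered by category value
        return data[variable].value_counts().sort_index().rename('count')
    raise ValueError("va_type must be 'numeric' or 'category'")


# Synthetic data standing in for the merchant table (made-up values)
demo = pd.DataFrame({
    'category_2': [1, 2, 2, 3, 3, 3],
    'avg_sales_lag3': [0.5, 0.7, 1.05, 1.1, 1.4, 2.0],
})
cat_counts = distribution_table(demo, 'category_2', 'category')
num_counts = distribution_table(demo, 'avg_sales_lag3', 'numeric', bins=3)
print(cat_counts)
print(num_counts)
```

Note that this helper drops missing values before binning, which is an added assumption; the plotting function passes the column through unchanged.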

``````
data_distribution_explore(merchant, 'category_2', va_type='category')
``````

``````
for i in ['avg_sales_lag3', 'avg_sales_lag6', 'avg_sales_lag12']:
    data_distribution_explore(merchant, i, va_type='numeric')
``````
