解放劳动力,自动数据报表

最近在关注Android系统的优化,但是也涉及到了处理和分析线上用户的使用情况。所以需要每天观察线上数据,目前为止,还没有一个可视化的技术数据平台,所以动手搭建一个自动日报或者周报的输出模块,用来解放双手。文中有些步骤为工作环境原因的特殊操作,没有普遍性。

# 获取原始数据

公司考虑到数据安全,原始数据的获取需要经过特殊的密钥申请,但是基本上是aws的那套安全机制。细节就不在这里记录。只需从我们的执行服务器上执行特定的脚本按月拉取截止到目前为止的所有相关原始数据。

1
./pull.sh 08

同步好数据以后,因为原始数据可能会有其他同事做其他分析用,所以一般上我会再同步一份到我的工作目录。

这个步骤在自动报表中,会在每天凌晨1点钟自动运行

# 准备工作目录

为了清晰结构,我习惯性地在工作目录中区分原始数据,中间数据,错误数据,以及结果数据,这样有利于分析问题和有效减少数据预处理的时间。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
.
|-- data-err
|-- data-mid
|-- data-res
|-- data-src
|-- modules
|-- doCalANRV2.py
|-- doCalAPKCrashV2.py
|-- doCalAPKV2.py
|-- doCalBluetoothV2.py
|-- doCalCrackV2.py
|-- doCalDAUV2.py
|-- doCalDHBV2.py
|-- doCalKeyEventV2.py
|-- doCalOtaV2.py
|-- doCalSourceSelectV2.py
|-- doCalVoiceV2.py
|-- muidataV2All.sh
|-- muidataV2.sh
|-- prefilter_post.py
|-- prefilter.py
|-- syndata.sh
|-- synres.sh
`-- TODO.md

上面所有的同步数据到工作目录,我会使用./syndata.sh 来完成。这个步骤在自动报表中,会在每天凌晨3点钟自动运行

# 清洗数据,中间结果入库

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
#!/bin/bash
if [[ "$#" == "0" ]]; then
base=`date +%Y%m%d`
else
base=$1
fi
LastDayType4Get=`date -d "$base yesterday" +%d/%b/%Y`
LastDayType4Post=`date -d "$base yesterday" +%Y%m%d`
LastDayType4Month=`date -d "$base yesterday" +%Y_%m`
#####################################################
# 处理OTA数据
# 1、从心跳数据中获取升级后第一次启动的数量
# 2、从update数据中获取请求升级包的数量
#
# '''DB Format
# +--------+---------+----------------+--------+
# | hw | version | infoname | value |
# +--------+---------+----------------+--------+
# | 32MQ1 | xxx | firstrun | 3456 |
# +--------+---------+----------------+--------+
# | 32MQ1 | xxx | to_xxxx | 3456 |
# +--------+---------+----------------+--------+
# '''
#####################################################
cat data-src/merge_dhb_${LastDayType4Month}.log | grep ${LastDayType4Get} | python prefilter.py > data-mid/dhb.log
cat data-src/merge_update_${LastDayType4Month}.log | grep ${LastDayType4Get} > data-mid/update.log
python doCalOtaV2.py data-mid/update.log data-mid/dhb.log
#####################################################
# 处理ANR数据
#
# '''DB Format
# +--------+---------+----------------+--------+
# | hw | version | infoname | value |
# +--------+---------+----------------+--------+
# | 32MQ1 | xxx | anr_package | 3456 |
# +--------+---------+----------------+--------+
# '''
#####################################################
cat data-src/merge_anr_${LastDayType4Month}.log | grep ${LastDayType4Get} | python prefilter.py | python doCalANRV2.py
#####################################################
# 处理DAU数据
#
# '''DB Format
# +--------+---------+----------------+--------+
# | hw | version | infoname | value |
# +--------+---------+----------------+--------+
# | 32MQ1 | xxx | dau_package | 3456 |
# +--------+---------+----------------+--------+
# '''
#####################################################
cat data-src/merge_pv_${LastDayType4Month}.log | grep ${LastDayType4Get} | python prefilter.py 2>data-err/pv.err | python doCalDAUV2.py
#####################################################
# 处理卡顿数据
# '''DB Format
# +--------+---------+----------------+--------+
# | hw | version | infoname | value |
# +--------+---------+----------------+--------+
# | 32MQ1 | xxx | crack_package | 3456 |
# +--------+---------+----------------+--------+
# '''
#####################################################
cat data-src/merge_crack_${LastDayType4Month}.log | grep ${LastDayType4Get} | python prefilter.py 2>data-err/crack.err | python doCalCrackV2.py
#####################################################
# 处理奔溃数据
#
# '''DB Format
# +--------+---------+----------------+--------+
# | hw | version | infoname | value |
# +--------+---------+----------------+--------+
# | 32MQ1 | xxx | crash_package | 3456 |
# +--------+---------+----------------+--------+
# '''
#####################################################
cat data-src/merge_crashdata_${LastDayType4Month}.log | grep ${LastDayType4Post} | python prefilter_post.py 2>data-err/crash.err | python doCalAPKCrashV2.py
#####################################################
# 处理蓝牙数据
#
# '''DB Format
# +--------+---------+----------------+--------+
# | hw | version | infoname | value |
# +--------+---------+----------------+--------+
# | 32MQ1 | xxx | bstart | 3456 |
# +--------+---------+----------------+--------+
# | 32MQ1 | xxx | bsuccess | 3456 |
# +--------+---------+----------------+--------+
# '''
#####################################################
cat data-src/merge_sys_${LastDayType4Month}.log | grep ${LastDayType4Get} | python prefilter.py 2>data-err/sys.err | python doCalBluetoothV2.py
#####################################################
# 处理语音数据
# '''DB Format
# +--------+---------+----------------+--------+
# | hw | version | infoname | value |
# +--------+---------+----------------+--------+
# | 32MQ1 | xxx | vstart | 3456 |
# +--------+---------+----------------+--------+
# | 32MQ1 | xxx | vsuccess | 3456 |
# +--------+---------+----------------+--------+
# '''
#####################################################
cat data-src/merge_sys_${LastDayType4Month}.log | grep ${LastDayType4Get} | python prefilter.py | python doCalVoiceV2.py
#####################################################
# 处理切源数据
# '''DB Format
# +--------+---------+--------------------+--------+
# | hw | version | infoname | value |
# +--------+---------+--------------------+--------+
# | 32MQ1 | xxx | hdmi_start | 3456 |
# +--------+---------+--------------------+--------+
# | 32MQ1 | xxx | hdmi_success | 3456 |
# +--------+---------+--------------------+--------+
# | 32MQ1 | xxx | atv_start | 3456 |
# +--------+---------+--------------------+--------+
# | 32MQ1 | xxx | atv_success | 3456 |
# +--------+---------+--------------------+--------+
# | 32MQ1 | xxx | dtv_start | 3456 |
# +--------+---------+--------------------+--------+
# | 32MQ1 | xxx | dtv_success | 3456 |
# +--------+---------+--------------------+--------+
# | 32MQ1 | xxx | vga_start | 3456 |
# +--------+---------+--------------------+--------+
# | 32MQ1 | xxx | vga_success | 3456 |
# +--------+---------+--------------------+--------+
# '''
#####################################################
cat data-src/merge_sys_${LastDayType4Month}.log | grep ${LastDayType4Get} | python prefilter.py | python doCalSourceSelectV2.py
#####################################################
# 处理系统按键数据
# '''DB Format
# +--------+---------+----------------+--------+
# | hw | version | infoname | value |
# +--------+---------+----------------+--------+
# | 32MQ1 | xxx | irtime | 3456 |
# +--------+---------+----------------+--------+
# | 32MQ1 | xxx | bltime | 3456 |
# +--------+---------+----------------+--------+
# '''
#####################################################
cat data-src/merge_sbtn_${LastDayType4Month}.log | grep ${LastDayType4Get} | python prefilter.py | python doCalKeyEventV2.py
#####################################################
# 处理设备应用数据
# '''DB Format
# +--------+---------+-------------------+--------+
# | hw | version | infoname | value |
# +--------+---------+-------------------+--------+
# | 32MQ1 | xxx | install_package | 3456 |
# +--------+---------+-------------------+--------+
# | 32MQ1 | xxx | uninstall_package | 3456 |
# +--------+---------+-------------------+--------+
# | 32MQ1 | xxx | lunch_package | 3456 |
# +--------+---------+-------------------+--------+
# '''
#####################################################
cat data-src/merge_apk_${LastDayType4Month}.log | grep ${LastDayType4Get} | python prefilter.py 2>data-err/apk.err | python doCalAPKV2.py
#####################################################
# 处理设备心跳数据
# '''DB Format
# +--------+---------+-------------------+--------+
# | hw | version | infoname | value |
# +--------+---------+-------------------+--------+
# | 32MQ1 | xxx | dhbapk_package | 3456 |
# +--------+---------+-------------------+--------+
# | 32MQ1 | xxx | dhbcpu_cpuusage | 3456 |
# +--------+---------+-------------------+--------+
# | 32MQ1 | xxx | dhbmeml_meml | 3456 |
# +--------+---------+-------------------+--------+
# | 32MQ1 | xxx | dhbmeml_memu | 3456 |
# +--------+---------+-------------------+--------+
# '''
#####################################################
cat data-src/merge_dhb_${LastDayType4Month}.log | grep ${LastDayType4Get} | grep 'cup=' | python prefilter.py 2>data-err/dhb.err | python doCalDHBV2.py
#####################################################
# 处理设备使用过程数据
#####################################################
cat data-src/merge_dhb_${LastDayType4Month}.log | grep ${LastDayType4Get} | grep 'cup=' | python prefilter.py 2>data-err/dhb.err | python doCalUseInfoV2.py

# 本地计算,出图

使用入库的中间结果,开发计算术想要的指标。

1
2
3
4
5
6
7
8
➜ mui-data git:(master tree | grep Show
├── doShowANR.py
├── doShowBluetooth.py
├── doShowCrack.py
├── doShowDHB.py
├── doShowOTA.py
├── doShowUseInfo.py
├── doShowVoice.py

加入到脚本中一起执行:

1
2
3
4
5
6
7
8
#!/bin/bash
python doShowANR.py $1
python doShowBluetooth.py $1
python doShowCrack.py $1
python doShowDHB.py $1
python doShowOTA.py $1
python doShowVoice.py $1
python doShowUseInfo.py $1

# 制作markdown报表模板

模板的使用方式为仅仅将[date]替换成当天的时间值

1
2
3
4
5
6
7
8
9
# MUI 系统技术数据统计日报
> [date]
### OTA成功率
> 截止到 [date] 为止,线上稳定版本升级成功率结果
![](data-res/[date]_otainfo.png)

# 生成报表

替换模板中的[date]字段,即可生成报表:

1
sed -e 's/\[date\]/2017_08_30/g' report_temp.md > report_2017_08_30.md

# 将报表转换成图片(以下为虚拟数据)