A new approach to locating the key line in mobile crash stacks

2020-12-07 19:15:49 Aliyun yunqi

Brief introduction: The crash stack is an important aid in day-to-day application troubleshooting, and mobile development is no exception. To help users locate a problem quickly from the stack, we face a seemingly simple task: highlight the key line in the crash.

Zhang Yue (here), Alibaba Cloud cloud-native application R&D platform EMAS

1. Preface

The crash stack is an important aid in day-to-day application troubleshooting, and mobile development is no exception. To help users locate a problem quickly from the stack, we face a seemingly simple task: highlight the key line in the crash.
Crash stack key line: the line in the stack that belongs to the user's own code and directly causes the crash.
For instance:
[Figure: a crash stack with the key line highlighted]

2. Industry solutions

Competing products in the industry mostly decide by Package Name. When no Package Name is available, some competitors mark the first line of the stack, and others mark the first line that does not belong to a system library.
For example, in the case below the key line is placed on the first fastjson line.
[Figure: a crash stack in which the first fastjson line is marked as the key line]

This approach easily runs into two problems:
1. Most of the time, the Package Name has little to do with the package names that actually appear in the crash.
2. Once an app is componentized, the Package Name cannot cover first-party and second-party libraries.
To solve these problems better, we propose a new approach based on the word frequency ratio and the word frequency score.

3. New scheme

So, on top of the Package Name, we need an auxiliary signal that lets us recognize these two situations and locate the key line more accurately.

The idea we came up with is to use the full set of crash stacks, compute the word frequency ratio and the corresponding word frequency score, and refine the key-line decision through probability.

The implementation differs between the two platforms.

3.1 iOS

1) Identifying the main package

On iOS we do not actually need to rely on the Bundle ID that the user fills in, because an iOS crash report naturally carries a Binary Images section. We store the user's main package information in advance and use it in the later judgment.
[Figure: the Binary Images section of an iOS crash report]

2) Direct positioning:

[Figure: direct positioning example]

For componentized packages, we can use the information in Binary Images to count how often each package name appears. The frequency distribution is shown in the figure below:

[Figure: frequency distribution of package names. The x-axis is the package name (labels omitted for lack of space); the y-axis is the number of times the package name appears.]

The less frequently a package name appears, the more likely we consider it to be a first-party or second-party library.
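The counting step can be sketched roughly as follows. This is a minimal illustration, not the EMAS implementation: it assumes each crash report has already been parsed into the list of image names from its Binary Images section, it counts each name once per crash (an assumption), and the function name `image_frequency_ratio` and the sample names are hypothetical.

```python
from collections import Counter


def image_frequency_ratio(crashes):
    """Return {image_name: share of crashes whose Binary Images list contains it}."""
    counts = Counter()
    for images in crashes:
        counts.update(set(images))  # count each image name once per crash
    total = len(crashes)
    return {name: n / total for name, n in counts.items()} if total else {}


# Hypothetical, already-parsed Binary Images lists from four crashes.
crashes = [
    ["DemoApp", "DemoLogin", "UIKit", "Foundation"],
    ["OtherApp", "UIKit", "Foundation"],
    ["ThirdApp", "DemoLogin", "UIKit", "Foundation"],
    ["FourthApp", "UIKit", "Foundation"],
]
ratios = image_frequency_ratio(crashes)

# Rare names first: these are the first-party / second-party candidates.
for name, ratio in sorted(ratios.items(), key=lambda kv: (kv[1], kv[0])):
    print(f"{name}: {ratio:.2f}")
```

System libraries show up in every crash, while an app's own components only show up in that app's crashes, which is why a low ratio points toward first-party or second-party code.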

3.2 Android

Android is a little more complicated. First, the package name cannot be clearly identified in an Android crash. Second, an Android Package Name is not a single word but a long string of dot-separated segments, for example

"com.aliyun.emasha.cache".

If we simply matched on the word frequency ratio of the whole package name, the following problems would arise:
a. If the historical data contains only the package name com.aliyun.emasha.cache, a com.aliyun.emasha.login that shows up later will not match.
b. With the same prefix com.aliyun.emasha, a match on com.aliyun.emasha and a match on com.aliyun.emasha.cache give very different word frequencies for the package name, which goes against common sense.

So we have to solve these two problems:
a. Cover, as far as possible, crashes that have not happened yet.
b. As the matched prefix gets longer, take into account the influence of the matches at the preceding levels.

This is where the concepts of package name level and word frequency score come in:
a. Package name level: apply split(".") to the package name to get an array; from front to back, the elements are level 1, level 2, level 3, and so on.
b. Package name word frequency score: a score, accumulated over multiple levels from the word frequency ratios of the package name, that estimates whether the package belongs to a third-party library. The higher the score, the more likely it is a third-party library.
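As a rough sketch of the level split, the snippet below breaks a package name into its cumulative prefixes and computes a ratio for each prefix over historical package names. The article does not give the exact definition of the word frequency ratio, so the one used here (share of observed package names that start with the prefix) is an assumption, and the helper names and sample data are hypothetical.

```python
from collections import Counter


def package_levels(package_name):
    """Split on "." and return the cumulative prefixes, level 1 first."""
    parts = package_name.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def prefix_frequency_ratio(package_names):
    """Share of observed package names that start with each prefix (assumed definition)."""
    counts = Counter()
    for name in package_names:
        counts.update(package_levels(name))
    total = len(package_names)
    return {prefix: n / total for prefix, n in counts.items()} if total else {}


# Hypothetical historical data: package names seen in earlier crash stacks.
history = [
    "com.aliyun.emasha.cache",
    "com.aliyun.emasha.cache",
    "com.aliyun.emasha.login",
    "org.json",
]
prefix_ratios = prefix_frequency_ratio(history)

print(package_levels("com.aliyun.emasha.cache"))
print(prefix_ratios["com.aliyun.emasha"], prefix_ratios["com.aliyun.emasha.cache"])
```

Because every prefix gets its own ratio, a package such as com.aliyun.emasha.upload that never appeared in the history can still be matched at the com.aliyun.emasha level, which addresses problem a.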

But that is still not enough. If the word frequency ratios were simply accumulated, package names beginning with com would always score very high, higher than every package name beginning with org, and that does not agree with our experience. We believe matches at different levels should carry different weights, so we picked a set of weights by intuition, one per level from level 1 to level 6:

0 5 2 1 1 1

Here is an example, using the package name com.alibaba.aliyun.emas.ha.tlog and the word frequency ratio of each of its levels:

com 1
com.alibaba 0.3
com.alibaba.aliyun 0.1
com.alibaba.aliyun.emas 0.05
com.alibaba.aliyun.emas.ha 0.02
com.alibaba.aliyun.emas.ha.tlog 0.01

If it matches com, the word frequency score is 1 * 0 = 0
If it matches com.alibaba, the word frequency score is 1 * 0 + 0.3 * 5 = 1.5
If it matches com.alibaba.aliyun, the word frequency score is 1 * 0 + 0.3 * 5 + 0.1 * 2 = 1.7
And so on

But in our experience, between a match on com.alibaba and a match on com.alibaba.aliyun, the latter is more likely to be the key line, so its word frequency score should be lower. So we make a common-sense correction here: when a match has too few levels, the weights of the missing levels are made up using the ratio of the last matched level.

The final results are as follows:
If it matches com, the word frequency score is 1 * 0 + 1 * 5 + 1 * 2 + 1 * 1 + 1 * 1 + 1 * 1 = 10
If it matches com.alibaba, the word frequency score is 1 * 0 + 0.3 * 5 + 0.3 * 2 + 0.3 * 1 + 0.3 * 1 + 0.3 * 1 = 3
If it matches com.alibaba.aliyun, the word frequency score is 1 * 0 + 0.3 * 5 + 0.1 * 2 + 0.1 * 1 + 0.1 * 1 + 0.1 * 1 = 2

This seems more in line with our experience.

So here is the final definition of the word frequency score: a score, accumulated over multiple levels from the word frequency ratios of the package name, that estimates whether the package belongs to a third-party library. The higher the score, the more likely it is a third-party library. If a package name has too few levels, the missing levels are filled in and added as well, which raises the score of short package names.
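The definition can be written down as a short function. This is a sketch that reproduces the worked numbers above, assuming the weights 0 5 2 1 1 1 apply to levels 1 to 6 and that missing levels reuse the ratio of the last matched level; the names `LEVEL_WEIGHTS` and `word_frequency_score` are hypothetical, and ignoring levels beyond the sixth is an assumption of this sketch.

```python
LEVEL_WEIGHTS = (0, 5, 2, 1, 1, 1)  # weights for levels 1..6, as quoted in the text


def word_frequency_score(level_ratios, weights=LEVEL_WEIGHTS):
    """Weighted sum of per-level ratios, padding short matches.

    level_ratios holds the word frequency ratio of each matched level,
    level 1 first. Levels that were not matched reuse the ratio of the
    last matched level. Levels beyond len(weights) are ignored (assumption).
    """
    if not level_ratios:
        raise ValueError("at least one matched level is required")
    score = 0.0
    for level, weight in enumerate(weights):
        ratio = level_ratios[min(level, len(level_ratios) - 1)]
        score += ratio * weight
    return score


# The three matches from the example above.
print(round(word_frequency_score([1]), 2))            # match on com
print(round(word_frequency_score([1, 0.3]), 2))       # match on com.alibaba
print(round(word_frequency_score([1, 0.3, 0.1]), 2))  # match on com.alibaba.aliyun
```

With the padding in place, a shorter match always scores at least as high as a longer match on the same package, provided the ratios do not increase with depth.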

Computing the word frequency score for all packages gives the following distribution:

[Figure: distribution of word frequency scores. The x-axis is the package name (labels omitted for lack of space); the y-axis is the word frequency score of the package name.]

Based on observation and testing, a threshold of about 0.2 separates the user's package names from third-party and system libraries.
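One way the threshold could be applied to a stack is sketched below. Several parts are assumptions rather than details from the article: a frame's package is matched on its longest run of known prefixes, a package never seen before scores 0 and is treated as user code, and the key line is the first frame from the top whose score falls below the threshold. The function names and the sample ratios are made up for illustration.

```python
LEVEL_WEIGHTS = (0, 5, 2, 1, 1, 1)  # weights for levels 1..6, as quoted in the text
THRESHOLD = 0.2                     # the approximate threshold quoted in the text


def score_package(package_name, prefix_ratios, weights=LEVEL_WEIGHTS):
    """Score the longest run of leading prefixes found in prefix_ratios."""
    parts = package_name.split(".")
    matched = []
    for i in range(1, len(parts) + 1):
        ratio = prefix_ratios.get(".".join(parts[:i]))
        if ratio is None:
            break
        matched.append(ratio)
    if not matched:
        return 0.0  # assumption: a never-seen package is treated as user code
    return sum(
        matched[min(level, len(matched) - 1)] * weight
        for level, weight in enumerate(weights)
    )


def find_key_line(frame_packages, prefix_ratios, threshold=THRESHOLD):
    """Index of the first frame whose package scores below the threshold, or None."""
    for index, package in enumerate(frame_packages):
        if score_package(package, prefix_ratios) < threshold:
            return index
    return None


# Hypothetical ratios and a hypothetical stack, top frame first.
sample_ratios = {
    "java": 0.9, "java.util": 0.8,
    "com": 0.6, "com.alibaba": 0.3, "com.alibaba.fastjson": 0.3,
    "com.demo": 0.01, "com.demo.shop": 0.01,
}
stack = ["java.util", "com.alibaba.fastjson", "com.demo.shop.cart"]
key_index = find_key_line(stack, sample_ratios)
print(key_index, stack[key_index])
```

Returning None when every frame scores above the threshold corresponds to a stack with no key line.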

3.3 Overall architecture

We also made some optimizations in the engineering implementation:
1. Business data used to be stored in OSS, but EMR-OSS currently processes files slowly, so we switched to HBase, which is better suited to parallel processing.
2. Only incremental crash logs are computed. Stock data is stored in HyperLogLog form, and the incremental result is merged with the stock after each computation.
[Figure: overall architecture]
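The reason HyperLogLog fits the stock-plus-increment pattern is that two sketches can be merged without revisiting the raw data. Below is a minimal, self-contained HyperLogLog written for illustration only; it is not the storage format used in the project, and what exactly gets counted (distinct crash IDs here) is an assumption.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog: approximate distinct counting with mergeable state."""

    def __init__(self, precision=12):
        self.p = precision
        self.m = 1 << precision  # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")          # 64-bit hash value
        index = h >> (64 - self.p)                     # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)          # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1   # position of the leftmost 1-bit
        self.registers[index] = max(self.registers[index], rank)

    def merge(self, other):
        """Fold another sketch into this one by taking register-wise maxima."""
        if self.p != other.p:
            raise ValueError("precision mismatch")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        estimate = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            estimate = self.m * math.log(self.m / zeros)
        return estimate


# Stock sketch built from older crashes, increment sketch from new ones.
stock = HyperLogLog()
for i in range(6000):
    stock.add(f"crash-{i}")

increment = HyperLogLog()
for i in range(4000, 10000):  # overlaps with the stock on purpose
    increment.add(f"crash-{i}")

stock.merge(increment)
print(round(stock.count()))  # approximate number of distinct crash IDs
```

Merging takes the register-wise maximum, so the merged sketch is identical to one built from all the data at once, and items that appear in both the stock and the increment are not counted twice.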

4. Effect evaluation

Deciding by Package Name in the conventional way, F1 Score:
[Figure: F1 Score of the Package Name approach]

Using the word frequency score approach, F1 Score:
[Figure: F1 Score of the word frequency score approach]

5. Real-world effect evaluation

The evaluation above only looks at each package name on its own. In production, accuracy may differ once we factor in where the crash line appears, how often the package name occurs, and stacks that have no key line at all. So we ran a highlighting test in the real environment. The method: take 50 apps in production and the top 3 crashes of each app. The overall accuracy is as follows, and it is fairly high.

Android accuracy: (33*3-9)/(33*3)*100% = 90.91%
iOS accuracy: (17*3-0)/(17*3)*100% = 100%
Overall accuracy: (50*3-9)/(50*3)*100% = 94%

6. Reflection

Even a small requirement can be taken to considerable depth. In the future we can consider connecting more desensitized data across users, understanding that data, and bringing more data value to customers.

7. Next steps

1. A colleague on the algorithm team suggested that third-party package names could be identified with deep learning, using labeling plus a CNN. This is worth trying as a follow-up.
2. The parameters and formula of the word frequency score were set from experience and intuition. They could instead be fitted through training on labeled data, which is another optimization direction.

8. Closing

Mobile R&D platform EMAS

Alibaba's application R&D platform EMAS is a leading cloud-native application R&D platform in China (mobile apps, H5 applications, mini programs, web applications, etc.). Built on a broad range of cloud-native technologies (Backend as a Service, Serverless, DevOps, low code, etc.), it is committed to providing enterprises and developers with one-stop application R&D management services that cover the full application lifecycle: development, testing, operations and maintenance, and business operation.

This article is original content of Alibaba Cloud and may not be reproduced without permission.
