Network traffic around us

2020-12-07 19:15:30 Aliyun yunqi

Author: qinglianghu

One. Good and evil in network traffic

It is not only your friends and relatives who surf the Internet alongside you; web crawlers live on the Internet too. Almost two out of every five web page views come from "fake" visitors, that is, web crawlers. These crawlers can also be divided into "good" and "evil". Those that comply with network rules, such as the crawlers "raised" by the search engines we all know, are welcome. Those that violate network rules, seeking out and exploiting vulnerabilities to make huge profits and harvest large amounts of private data, are not.
Figure 1.1: Share of 2019 traffic generated by good bots, bad bots, and humans

Imperva's "2020 Bad Bot Report" shows that in 2019, well-intentioned machine traffic dropped to 13.1%, while malicious crawler traffic, compared with the previous year (18.1%), rose to 24.1%, almost a quarter of all network traffic.

Two. The current state of malicious network traffic

1. Malicious traffic level distribution
Malicious traffic can be classified by the sophistication of the crawler. Imperva divides it into three classes.

  • Simple: Simple, easily detected malicious requests, about a fifth of all malicious requests.
  • Moderate: These crawlers switch networks and send malicious requests through anonymous proxies; about half of all malicious requests.
  • Sophisticated: In addition to using anonymous proxies, these crawlers forge mouse tracks, clicks, and other user interactions to evade detection, and can almost completely simulate human browsing behavior. Moderate and sophisticated crawlers together are also known as APBs (advanced persistent bots) and account for about 70% of malicious traffic.

Figure 2.1: Distribution of malicious traffic by sophistication level

For three years in a row, the distribution of malicious traffic by sophistication has been very consistent. The simple malicious requests that are easiest to detect account for 26.3%, moderate requests for 53.6%, and sophisticated requests for 20.1%. APBs (moderate plus sophisticated) therefore account for 73.7%, slightly higher than the previous year. The development of "second-dial" IP technology, which rapidly redials to obtain a fresh IP address, has made many simple defenses that restrict crawlers through IP blacklists ineffective.

2. Distribution of malicious traffic in different industries

Malicious traffic is pervasive in every industry, but some problems are specific to certain industries. For example, any website with a login portal may face credential-stuffing ("database collision") attacks, while price scraping is concentrated mainly in the e-commerce industry.
Figure 2.2: Distribution of malicious traffic across industries

Financial services: For the second year in a row, financial services ranked first among all industries attacked by malicious traffic, with malicious traffic accounting for 47.7%. Most of it comes from credential-stuffing attacks whose goal is to access the private information of these companies' users.

Education: Malicious traffic accounts for 45.7%. Crawlers are generally used to obtain papers, look up student course selection, and gain access to accounts.

Marketplaces and trading platforms: This is another industry hit by a large volume of malicious traffic. As in e-commerce, these crawlers are mainly used to obtain price information and user accounts.

Government: 37.5% of the traffic to government websites is malicious. These crawlers mostly scrape business registration information and election information.

Nonprofit organizations: Attackers use the donation pages of nonprofit organizations to verify whether illegally obtained financial account information is valid. This kind of attack traffic is difficult for a nonprofit organization's servers to deal with.

Airlines and travel: Malicious traffic accounts for 30.5% in air travel, and its composition is more complex. It includes not only direct scraping by competitors but also traffic from third-party ecosystem companies. Unauthorized agents, competitors, and ticket scalpers use advanced crawlers to obtain fares. This raises transaction costs for ordinary users and causes many customer complaints, and the crawlers can also slow server responses or even bring servers down. In addition, travel companies' user accounts face credential-stuffing attacks: black and gray market operators try to steal mileage points from users' accounts for profit.

3. Malicious traffic sources

70% of malicious traffic comes from large data centers (cloud service providers), slightly down from last year. The share of malicious traffic from home networks has increased for three consecutive years, rising from 22.7% to 27.8%. The share from mobile networks is low, at only 2.3%.
Figure 2.3: Distribution of malicious traffic sources in 2019

By country, the United States has topped the list for six years in a row, although its share fell from last year's 53.4% to 45.9%. China ranks fourth with 4.8%.
Figure 2.4: Distribution of malicious traffic by source country in 2019

Among the countries whose traffic is blocked most often, Russia ranks first with a 21.1% share, and China is second. This is mainly due to foreign networks blocking traffic from these countries.
Figure 2.5: Blocking of malicious traffic by different companies in 2019

Three. Cause analysis and countermeasures

In 2019, malicious machine traffic reached a quarter of total network traffic. Beyond that, malicious machine traffic has entered its next stage of development: its operators are trying to improve their image and make themselves look legitimate. The black and gray market is building itself into a professional business. Its operators "obtain" data from other websites, package it, and provide it to companies willing to buy. All of this is cleverly packaged as "business intelligence" services.

The reasons for the rapid growth of malicious network traffic can be summed up as follows:

1. Market orientation

First, there are large profits to be made in the black and gray market. Its operators now have professional-looking websites and offer business intelligence services described as pricing intelligence, alternative financial data, or competitive insight. These companies usually offer industry-specific data products. As more and more scraped data becomes available for purchase, competitive pressure among enterprises in an industry increases. No business wants to fail because its information was incomplete.
Figure 3.1: Data listings on a certain platform

Meanwhile, as membership systems continue to improve and spread, each user's account holds some digital currency or points that can be redeemed, transferred, or gifted. Account passwords from data leaks, combined with the growth of membership systems, make malicious credential stuffing easier. Malicious bots scrape data from websites without permission (for example, pricing and inventory) to gain a competitive advantage. Scraped personal data can even be used by criminals for fraud, theft, and other criminal activities.

Second, demand for traffic itself is increasing in different areas. In China, it is well known that fans buy traffic to boost their favorite stars. In the U.S., bot-controlled social media accounts can interfere with voting in elections.

Finally, job listings now include many positions related to data crawling, and they pay very well. In such an environment, the problem of malicious machine traffic is unlikely to disappear.

2. Web crawlers in the gray zone

Most malicious machine traffic comes from web crawlers. As a computer technology, crawling is inherently neutral, so crawlers themselves are not prohibited by law. However, using crawler technology to obtain data can violate the law or even constitute a crime.

In the November 2019 case in which hiQ used crawlers to scrape LinkedIn data, the court ruled that LinkedIn must not prevent hiQ from accessing, copying, and using the public information on the LinkedIn website, and must not take legal or technical measures to obstruct it; any such measures already in place had to be removed within 24 hours.

The following measures may mitigate the negative effects of malicious crawlers to some extent.

Block requests with outdated User-Agent headers. Crawler request headers are generally randomly generated, and much crawler code was written years ago, so the UA in these requests is out of date.
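As a rough illustration of the User-Agent check, the sketch below flags requests whose UA names a browser version below a chosen threshold. The function name `is_outdated_user_agent`, the `MIN_MAJOR_VERSION` thresholds, and the decision to treat an empty UA as outdated are all assumptions made for this example, not values from the report; real thresholds would need tuning against your own traffic.

```python
import re

# Hypothetical minimum major versions; tune these against your own traffic.
MIN_MAJOR_VERSION = {"Chrome": 100, "Firefox": 100}

# Matches tokens such as "Chrome/41.0.2228.0" or "Firefox/120.0".
_UA_PATTERN = re.compile(r"(Chrome|Firefox)/(\d+)")


def is_outdated_user_agent(user_agent: str) -> bool:
    """Return True if the UA is empty or names a browser older than the threshold."""
    if not user_agent or not user_agent.strip():
        # Assumption for this sketch: a missing UA is treated like an outdated one.
        return True
    match = _UA_PATTERN.search(user_agent)
    if match is None:
        # Unknown client family: this rule alone gives no reason to block.
        return False
    family, major = match.group(1), int(match.group(2))
    return major < MIN_MAJOR_VERSION[family]
```

A check like this would typically run in middleware before the request reaches application code, and is best combined with other signals, since a crawler can send any UA string it likes.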

Block certain proxy service providers. Many crawlers use the free or cheap third-party proxy services available on the market, so blocking requests from these proxies may be a good choice.
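One way to sketch proxy blocking is to match the client address against a list of network ranges using Python's standard `ipaddress` module. The ranges in `BLOCKED_PROXY_NETWORKS` below are reserved documentation networks used purely as placeholders, and `is_blocked_proxy` is a hypothetical helper name; a real deployment would load the ranges of the providers it has decided to block.

```python
import ipaddress

# Placeholder CIDR ranges (reserved documentation networks), standing in for
# the address ranges of proxy providers you have decided to block.
BLOCKED_PROXY_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_blocked_proxy(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked proxy network."""
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        # Not a parseable IP address: this rule does not apply.
        return False
    # Membership is False when the address and network IP versions differ.
    return any(address in network for network in BLOCKED_PROXY_NETWORKS)
```

A linear scan is fine for a short list. With many thousands of ranges, a prefix tree or the firewall's own IP-set facility would be a more suitable home for this rule.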

Manage all the access points to your website, including the mobile web version, the H5 and mini-program versions, and integrations with third-party platforms.

Analyze the website's request logs.

Record and analyze the website's login failures.
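To illustrate what analyzing login failures might look like, the sketch below flags client IPs that fail to log in to many different accounts, a pattern typical of credential stuffing. The event format `(client_ip, username, succeeded)`, the function name `find_credential_stuffing_ips`, and the default threshold of three distinct accounts are assumptions for this example only.

```python
from collections import defaultdict


def find_credential_stuffing_ips(events, min_distinct_accounts=3):
    """Return sorted client IPs with failed logins on many distinct accounts.

    events: iterable of (client_ip, username, succeeded) tuples, an assumed
    format for already-parsed login log records.
    """
    failed_accounts = defaultdict(set)
    for client_ip, username, succeeded in events:
        if not succeeded:
            # Track which accounts each IP has failed to log in to.
            failed_accounts[client_ip].add(username)
    return sorted(
        ip
        for ip, accounts in failed_accounts.items()
        if len(accounts) >= min_distinct_accounts
    )


sample_events = [
    ("198.51.100.9", "alice", False),
    ("198.51.100.9", "bob", False),
    ("198.51.100.9", "carol", False),
    ("192.0.2.20", "dave", False),
    ("192.0.2.20", "dave", False),
    ("192.0.2.20", "dave", True),
]
suspicious_ips = find_credential_stuffing_ips(sample_events)
```

Counting distinct accounts rather than raw failures keeps a user who simply mistypes their own password several times from being flagged.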

 

This article is original content from Alibaba Cloud and may not be reproduced without permission.
