
Didi Log Service Suite

2021-01-23 17:08:12 Obsuite

01 Challenges of Log Services

With the escalation of Sino-US friction and the rise of open-source culture in China, major Internet companies and leading enterprises in many industries are moving toward an open-source, secure, independent, and controllable path. Building log infrastructure on the open-source engines Kafka and ElasticSearch has become an industry consensus. Such infrastructure needs:

  • Log collection: collecting logs from servers, clients, the Web, and databases;
  • Log ETL: real-time log ETL, ETL pipeline monitoring, and ETL pipeline quality measurement;
  • Log retrieval: full-text search and restoring the context of a log entry;
  • Log analysis: Adhoc OLAP over logs.

As log traffic and the number of log tasks keep growing, the problems of log timeliness, operational friendliness, service stability, and data security become very tricky. For example:

1) Challenges in the log collection stage

  • Physical machines, virtual machines, and containerized environments must all be supported, with log collection at service granularity and elastic scale-out and scale-in;
  • A massive fleet of hundreds of thousands of Agents must be monitored, operated, and managed across multiple versions;
  • A shared, multi-tenant, tiered guarantee model must be supported;
  • Rich task-level metrics, fault diagnosis, and self-healing must be provided.

2) Challenges in the log ETL stage

  • ETL semantics should be simple, clear, and decoupled from the underlying infrastructure; there is strong demand for expressing ETL in SQL;
  • An ETL pipeline has many stages, each with its own metrics and inconsistent metric definitions, so locating and troubleshooting problems is very costly;
  • An ETL pipeline involves both log storage and log computing, so end-to-end elastic scale-out and scale-in within a Quota is technically challenging.

3) Challenges in log storage

  • Disk I/O hot spots in Kafka can cause a cluster-wide avalanche of production and consumption;
  • Resource isolation between Topics is poor: a sudden traffic surge or a replay of historical data affects the stability of the whole cluster;
  • With many Kafka clusters and topics, a platform is needed to make up for the capabilities missing from the community Kafka-Manager.

4) Challenges in log retrieval

  • ElasticSearch is constrained by a metadata bottleneck: the number of Shards in a cluster cannot grow past the hundreds-of-thousands level, so scalability must be addressed;
  • ElasticSearch lacks multi-tenancy and query isolation for cluster resources, which is the biggest threat to stability;
  • ElasticSearch lacks an end-to-end, multi-dimensional monitoring system, so operations support is insufficient and operational friendliness must be improved.

5) Challenges in log analysis

  • Adhoc query and analysis over detail data at the hundred-million-row scale;
  • High-precision deduplication over dimensions with cardinality in the hundreds of millions;
  • An end-to-end, multi-dimensional monitoring system is missing, so operations support is insufficient and operational friendliness must be improved.

02 Didi Logi Log Service Suite

With enterprises undergoing digital transformation, whole business processes moving to the cloud, and microservices, containerization, and related technologies developing rapidly, businesses have three pressing needs for a stable, easy-to-use log infrastructure:

  • Service assurance: full-link tracing is an important guarantee of stability;
  • Business operations: A/B testing, campaign analysis, end-user behavior analysis, and precision marketing create strong demand for ingesting logs at MB/s rates within seconds and for searching TB-scale logs within seconds;
  • Business security: identifying the source of an attack to stop asset losses, plus security auditing and tracing, require Adhoc analysis over TB-scale logs.

The Didi Logi log service suite has been refined over 7 years at Didi. For every stage (log collection, log storage, log computing, log retrieval, and log analysis), it turns component capabilities into PaaS services and applies targeted optimizations to engine stability and scalability. The architecture is as follows:

It has the following advantages:

  • Open source, independent, and controllable: the PaaS components Logi-Agent, Logi-LogX, Logi-KafkaManager, and Logi-ElasticSearchManager are all planned to be open-sourced;
  • Stable and reliable engines: Agent reaches 40 MB/s single-task collection performance with controllable resource isolation; LogX runs real-time ETL tasks with second-level delay and heavily optimized computing performance; Didi Kafka carries real-time traffic on the order of hundreds of GB/s; Didi ElasticSearch runs index storage clusters of tens of PB with 99.95% stability;
  • Accumulated operations experience: hundreds of thousands of log service tasks rely on end-to-end, full-link guarantees for the timeliness, integrity, observability, and operational friendliness of log data; flexible resource scheduling and tiered guarantees have been built into the product;
  • A professional, easy-to-use platform: end-to-end self-service onboarding of a whole log pipeline completes within minutes; SQL templates plus UDFs support customized cleaning; data retrieval over hundreds of TB returns within seconds.

》Introduction to Logi-Agent

Logi-Agent aims to be an enterprise-grade data collection platform, responsible for collecting the company's data from many kinds of endpoints and in many forms. The architecture is as follows:

Didi's online Logi-Agent deployment covers 100,000 nodes, collects 130 GB/s of logs, and runs 20,000+ log collection tasks, with a maximum single-task collection capability of 40 MB/s.

》Introduction to Logi-Kafka

Logi-Kafka turns the high-frequency scenarios seen from the different perspectives of users and developers into PaaS capabilities, improving operational friendliness, engine observability, and user convenience. It is open source at https://github.com/didi/kafka... and has 500+ free users. Demo address: http://117.51.146.109:8080/ , account and password: admin/admin

Didi's Kafka deployment has a cluster scale of 500+ and carries 60 GB/s of traffic. It brings experience from large shared multi-tenant clusters (peak CPU utilization 30%, disk 50%) and an SLA commitment of 99.95%. More than 40 feature enhancements have been made on top of engine version 2.5. Disk overload protection, dynamic partition migration, and business thread isolation are Didi-specific features and the key to stability.
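To make the 99.95% SLA figure concrete, the following minimal sketch converts an availability percentage into a downtime budget. The function name and the 30-day window are illustrative assumptions and are not part of Logi-Kafka.

```python
def downtime_budget_minutes(sla_percent: float, window_days: int = 30) -> float:
    """Return the downtime (in minutes) allowed by an availability SLA.

    Hypothetical helper for illustration only; not part of any Logi component.
    """
    if not 0 <= sla_percent <= 100:
        raise ValueError("sla_percent must be between 0 and 100")
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    window_minutes = window_days * 24 * 60
    # The unavailable fraction of the window is the downtime budget.
    return (100 - sla_percent) / 100 * window_minutes


if __name__ == "__main__":
    # A 99.95% SLA over a 30-day window leaves roughly 21.6 minutes.
    print(round(downtime_budget_minutes(99.95), 1))
```

Under these assumptions, a 99.95% commitment leaves about 21.6 minutes of downtime per 30 days, half of what a 99.9% commitment would allow.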

》Introduction to Logi-LogX

The LogX service uses MB/s as its Quota unit and StreamingSQL plus UDFs to express ETL logic. It supports dynamic scale-out and scale-in in Quota units and, for each task, builds an end-to-end metrics system covering channel performance, timeliness, and integrity.
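The idea of scaling in Quota units can be sketched as follows. This is only an illustration under stated assumptions: the 20% headroom, the 10 MB/s step, the per-worker throughput, and both function names are hypothetical and do not describe how LogX actually makes scaling decisions.

```python
import math


def next_quota_mb_s(observed_mb_s: float, headroom: float = 0.2,
                    step_mb_s: int = 10) -> int:
    """Propose a quota: observed traffic plus headroom, rounded up to a step.

    headroom and step_mb_s are assumed values chosen for illustration.
    """
    if observed_mb_s < 0 or headroom < 0 or step_mb_s <= 0:
        raise ValueError("invalid input")
    target = observed_mb_s * (1 + headroom)
    return math.ceil(target / step_mb_s) * step_mb_s


def workers_for_quota(quota_mb_s: float, per_worker_mb_s: float) -> int:
    """Number of workers needed to serve a quota, given per-worker throughput."""
    if quota_mb_s < 0 or per_worker_mb_s <= 0:
        raise ValueError("invalid input")
    return math.ceil(quota_mb_s / per_worker_mb_s)


if __name__ == "__main__":
    quota = next_quota_mb_s(95)          # 95 MB/s observed -> 120 MB/s quota
    print(quota, workers_for_quota(quota, per_worker_mb_s=50))
```

Expressing capacity in MB/s keeps the scaling decision independent of how many machines or containers happen to back a task, which is one plausible reason to choose traffic as the Quota unit.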

Didi runs 20,000+ StreamingSQL ETL tasks, with a maximum single-task traffic of 500 MB/s and a 90th-percentile end-to-end ETL delay of less than 2 minutes, and can scale out and in dynamically within minutes.
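A 90th-percentile delay target of this kind can be checked with a small sketch like the one below. It uses the nearest-rank percentile definition, which is an assumption: the article does not say how LogX computes its quantiles, and the function names and sample delays are hypothetical.

```python
def percentile_nearest_rank(values, percent: int):
    """Return the nearest-rank percentile of values (percent is 1..100)."""
    if not values:
        raise ValueError("values must not be empty")
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    ordered = sorted(values)
    # Integer ceiling of percent * n / 100, avoiding float rounding issues.
    rank = (percent * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_delay_target(delays_s, percent: int = 90, limit_s: float = 120) -> bool:
    """True if the given percentile of end-to-end delays is below the limit."""
    return percentile_nearest_rank(delays_s, percent) < limit_s


if __name__ == "__main__":
    # Hypothetical end-to-end delays in seconds for ten ETL batches.
    delays = [30, 45, 50, 60, 70, 80, 90, 100, 110, 300]
    print(percentile_nearest_rank(delays, 90), meets_delay_target(delays))
```

In this sample the single 300-second outlier does not break the target, which is exactly what a percentile-based delay objective is meant to tolerate.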

》Introduction to Logi-ElasticSearch

Presented as the most professional ElasticSearch-Manager in the industry, it turns the high-frequency scenarios seen from the different perspectives of users and developers into PaaS capabilities, and builds in the features needed to run indexing as a fully managed service.

It provides capacity planning based on index templates, which raised cluster disk utilization from 30% to 65%. Open-sourcing is being prepared.

The self-developed ElasticSearch-GateWay provides cross-cluster access, multi-version compatibility, tenant definition and security, DSL auditing and analysis, and more. It supports 5 billion data reads per day and 12 million writes per second at Didi, and is the cornerstone component behind the smooth ES engine upgrades 2.3.3->6.6.1->7.6.1.

Didi's ElasticSearch deployment has a cluster scale of 3,500+ and 8 PB of storage, and brings experience from shared multi-tenant clusters (1,000+ instances, 600,000 Shards, peak CPU utilization 45%, disk 60%).

The SLA commitment is 99.95%. More than 150 feature enhancements have been made on top of engine version 7.6.1, and write performance is 2 times that of the community version.

FastIndex builds a 50 TB index in 1 hour and is open source (https://github.com/didi/ES-Fastloader).

The self-developed DCDR provides high availability for indexes across clusters and gives 50+ online primary search scenarios multi-site active-active capability. In total, 30+ PRs have been contributed to the ES community.

03 Didi Logi Application Cases

Didi Logi serves many scenarios inside Didi, with in-depth practice in fault location, log analysis, log services, business operations, security auditing, log assets, and log dashboards.

Because space is limited, we focus next on the log service LogInsight and the business operations product Magic Mirror, and analyze the business value that can be generated on top of Didi Logi.

》LogInsight

LogInsight builds on the capabilities of Didi Logi and is primarily a cloud log storage solution. To meet the log storage and analysis needs that arise after moving to the cloud and to containers, it provides log cold backup, resource management, log retrieval, and other capabilities.

  • Significantly lower log usage and storage costs: fully managed, elastically scalable, and operations-free; cold backup storage costs about 0.02 yuan/GB/month, which significantly reduces storage overhead; custom retention of 1 to 365 days is supported;
  • Faster problem discovery and location, improving business stability: interface performance and error logs are analyzed statistically with big-data stream computing, providing interface call relationships, topology, upstream and downstream traffic analysis, service error location, error clustering, and other functions;
  • Safe and reliable: availability is no less than 99.9% and hundreds of TB of logs can be handled; data is collected in real time and lands within minutes; stored logs are not lost, which meets log audit requirements.
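The cold backup price above translates into a monthly bill as in the following minimal sketch. The price of 0.02 yuan/GB/month and the 1 to 365 day retention range come from the list above; the function names and the 1 TB = 1024 GB conversion are assumptions made for illustration, and actual LogInsight billing rules are not described in this article.

```python
PRICE_YUAN_PER_GB_MONTH = 0.02   # approximate price quoted for cold backup storage
GB_PER_TB = 1024                 # assumed conversion factor


def monthly_cold_storage_cost_yuan(size_tb: float) -> float:
    """Approximate monthly cost of keeping size_tb of logs in cold backup."""
    if size_tb < 0:
        raise ValueError("size_tb must not be negative")
    return size_tb * GB_PER_TB * PRICE_YUAN_PER_GB_MONTH


def is_valid_retention(days: int) -> bool:
    """Retention is configurable between 1 and 365 days."""
    return 1 <= days <= 365


if __name__ == "__main__":
    # Hypothetical example: 100 TB kept in cold backup for one month.
    print(round(monthly_cold_storage_cost_yuan(100), 2))
```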

》Magic Mirror

Magic Mirror is a professional, scenario-based intelligent analysis platform for user behavior. It provides an end-to-end solution that runs from data collection, storage, computing, and analysis through to operational improvement.

  • Scenario-based analysis models: user retention analysis, user trajectory analysis, and user profile analysis;
  • Basic service capabilities: core metrics for the current day can be viewed in real time, with real-time computing and second-level data output; dashboards support integrated reports;
  • Data analysis capabilities: non-developers can build their own metrics; multiple types of visual reports, data export and analysis, and data reported through Omega are supported;
  • Multi-product satisfaction surveys: multi-organization and multi-product structures, online automatic configuration, and prize draws to increase participation are supported.

With the Didi Logi log service suite, enterprises can better meet general operations observability and application observability needs in log scenarios, and can also better serve business operations, security auditing, log analysis, log mining, and other scenarios.

The overall open-source plan for Didi Logi is as follows. You are welcome to follow it.

Enterprise users who run the open-source version in production can join OCE, where we provide additional and better support, such as exclusive technical salons, one-on-one exchange opportunities for enterprises, and a dedicated Q&A group. The OCE application entry is in the menu of the Obsuite official account; you can also click 【OCE authentication】 to apply directly.

Copyright notice
This article was created by [Obsuite]. Please include the original link when reprinting. Thank you.
https://chowdera.com/2021/01/20210123170728370C.html