
LinkedIn architecture in the past decade

2021-10-22 10:36:23 Bird's nest

Original article: A Brief History of Scaling LinkedIn

Josh Clemm is a senior engineering manager at LinkedIn and has been with the company since 2011. He recently (2015/07/20) wrote an article describing the architectural changes LinkedIn has made in response to its rapidly growing user base.
The article is a bit like Ziliu's "This Decade of Taobao Technology".

2003 was LinkedIn's first year, and the company's goal was to connect people's professional networks to open up better job opportunities. Only 2,700 members signed up in the first week after launch. Time flies, and LinkedIn's product, membership, and server load have all grown enormously since then.
Today, LinkedIn has more than 350 million members worldwide. We serve hundreds of thousands of page requests every second, and mobile now accounts for more than 50% of our traffic (the "mobile moment"). All of those requests fetch data from our backend systems, which in turn handle millions of queries per second.

Which raises the question: how did all of this happen?

The early years

Leo

Like many sites today, LinkedIn started as a single application that did everything. That application was called "Leo". It hosted all the Java Servlet pages, handled the business logic, and connected to a handful of LinkedIn databases.
*Ha! The style of the early website: simple and practical*
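To make the monolith concrete, here is a minimal sketch (assumed, not LinkedIn's actual code) of what a Leo-style page might look like: a single Java servlet that handles the request, runs the business logic, and talks to the data-access code directly. The `ProfileDao` and `Profile` types are hypothetical stand-ins.

```java
import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// One servlet owns presentation, business logic, and data access: the shape of a
// monolith like Leo. Hypothetical sketch, not LinkedIn's code.
public class ProfilePageServlet extends HttpServlet {
    private final ProfileDao profileDao = new ProfileDao(); // hypothetical DAO talking straight to the DB

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        long memberId = Long.parseLong(req.getParameter("id"));

        // Business logic and data access sit right next to the rendering code.
        Profile profile = profileDao.findById(memberId);

        resp.setContentType("text/html");
        PrintWriter out = resp.getWriter();
        out.println("<html><body>");
        out.println("<h1>" + profile.name() + "</h1>");
        out.println("<p>" + profile.headline() + "</p>");
        out.println("</body></html>");
    }

    // Minimal hypothetical types so the sketch is self-contained.
    record Profile(String name, String headline) {}

    static class ProfileDao {
        Profile findById(long id) { return new Profile("Member " + id, "Example headline"); }
    }
}
```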

Member Graph

One of the first jobs is to manage social networks between members . We need a system to traverse through the graph (graph traversals) To query relational data , At the same time, the data needs to reside in memory for efficiency and performance . From this different use feature , Obviously, this requires an independent of Leo System to facilitate scale-up , So one called "Clould" Dedicated to membership diagrams (member graph) An independent system was born . This is a LinkedIn The first service system . In order to and Leo System separation , We use Java RPC To communicate .
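To see why the graph got its own in-memory system: a typical query is "how many hops separate member A from member B", which is a breadth-first traversal over adjacency lists held in RAM. The sketch below is a toy illustration of that idea, not the actual Cloud service.

```java
import java.util.*;

// Toy in-memory member graph: adjacency lists in RAM, queried by breadth-first traversal.
// A sketch of the idea behind a dedicated graph service, not LinkedIn's implementation.
public class MemberGraph {
    private final Map<Long, Set<Long>> connections = new HashMap<>();

    public void connect(long a, long b) {
        connections.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        connections.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    /** Returns the connection distance between two members, or -1 if they are not within maxHops. */
    public int distance(long from, long to, int maxHops) {
        if (from == to) return 0;
        Set<Long> visited = new HashSet<>(List.of(from));
        Queue<Long> frontier = new ArrayDeque<>(List.of(from));
        for (int hop = 1; hop <= maxHops && !frontier.isEmpty(); hop++) {
            Queue<Long> next = new ArrayDeque<>();
            for (long member : frontier) {
                for (long neighbor : connections.getOrDefault(member, Set.of())) {
                    if (neighbor == to) return hop;
                    if (visited.add(neighbor)) next.add(neighbor);
                }
            }
            frontier = next;
        }
        return -1;
    }
}
```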

Around the same time we also needed to grow our search capability. Our member graph service also fed data into a new Lucene-based search service.

Replica read DBs

As the site grew, so did Leo. Its roles and its functionality kept expanding, and so did its complexity. Load balancing let us run multiple Leo instances, but the added load also strained LinkedIn's most critical system: the member profile database.

One of the easiest fixes was classic vertical scaling, adding more CPUs and memory. That bought us some time, but we would still hit scaling limits down the road. The profile database handled both reads and writes. To scale it, we introduced replica slave DBs. Each replica was a copy of the member database, kept in sync using an early version of databus (now open source). The replicas handled all read traffic, and logic was added to guarantee consistency between the master and the replicas.

*After the master/slave read-write split, we moved on to database partitioning*
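A minimal sketch of the read/write split described above, assuming the database connections are already configured elsewhere: writes always go to the master, while reads are spread across the replicas and accept a little replication lag.

```java
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import javax.sql.DataSource;

// Route writes to the master and reads to the replica copies. A sketch of the
// read/write split described above; the DataSource wiring is assumed to exist.
public class ReadWriteRouter {
    private final DataSource master;
    private final List<DataSource> readReplicas;

    public ReadWriteRouter(DataSource master, List<DataSource> readReplicas) {
        this.master = master;
        this.readReplicas = readReplicas;
    }

    /** All writes must go to the single master. */
    public DataSource forWrite() {
        return master;
    }

    /** Reads can tolerate a little replication lag, so any replica will do. */
    public DataSource forRead() {
        return readReplicas.get(ThreadLocalRandom.current().nextInt(readReplicas.size()));
    }
}
```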

As the site saw more and more traffic, the single monolithic Leo went down frequently. It was difficult to troubleshoot and recover, and hard to release new code. High availability is critical to LinkedIn, so it was clear we needed to "kill" Leo and break it up into many small, functional, stateless services.
*"Kill Leo" was the internal mantra for years*

Service Oriented Architecture

Engineers started extracting microservices that held APIs and business logic, such as the search, profile, communications, and groups platforms. Next, the presentation layers were extracted, for areas like the recruiter product and the public profile pages. New products and new services were built outside of Leo, and before long a vertical stack existed for each functional area.
We built frontend servers to fetch data from different domains, handle the presentation logic, and generate HTML (via JSP). We also built mid-tier services to provide API access to the data models, and backend data services to provide consistent access to the databases. By 2010 we already had over 150 separate services. Today, we have over 750.

Because the services were stateless, scaling could be achieved by spinning up new instances of any service and load balancing across them. We drew a red line for every service so we knew its load capacity, and built early warning and performance monitoring around it.
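The "red line" can be pictured as nothing more than a per-service capacity threshold compared against observed load; the numbers and the alerting below are purely illustrative.

```java
// Purely illustrative: a per-service "red line" (the load level at which we expect
// trouble) checked against the observed queries per second.
public class RedLineMonitor {
    private final String serviceName;
    private final double redLineQps;   // capacity established by load testing (illustrative)
    private final double warnFraction; // e.g. start warning at 80% of the red line

    public RedLineMonitor(String serviceName, double redLineQps, double warnFraction) {
        this.serviceName = serviceName;
        this.redLineQps = redLineQps;
        this.warnFraction = warnFraction;
    }

    /** Call periodically with the load currently observed for this service. */
    public void check(double observedQps) {
        if (observedQps >= redLineQps) {
            System.err.printf("%s is OVER its red line: %.0f of %.0f qps, add instances now%n",
                    serviceName, observedQps, redLineQps);
        } else if (observedQps >= redLineQps * warnFraction) {
            System.err.printf("%s is approaching its red line: %.0f of %.0f qps%n",
                    serviceName, observedQps, redLineQps);
        }
    }
}
```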

Caching

LinkedIn's predictable growth pushed us to scale out further. We knew we could reduce the load by adding more layers of caching. Many applications started to introduce mid-tier caching layers such as memcached or couchbase. We also added caching at the data layer and, when appropriate, used Voldemort to serve precomputed results.

Later on, we actually removed many of the mid-tier caches, which stored derived data from multiple domains. While caching looks like a simple way to relieve pressure at first, the complexity around cache invalidation and the call graph became unmanageable. Keeping the caches closest to the data store reduced latency, allowed us to scale horizontally, and reduced cognitive load.
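The pattern that remained, a cache kept right next to the data store rather than a shared mid-tier cache, is essentially cache-aside: read through the cache, fall back to the store on a miss, and invalidate on write. The in-process map below stands in for a real cache such as memcached or couchbase.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Cache-aside kept next to the data store: read through the cache, fall back to the
// store on a miss, invalidate on write. The in-process map stands in for a real
// cache such as memcached or couchbase (illustrative sketch only).
public class CacheAsideStore<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loadFromStore;  // e.g. a database read
    private final BiConsumer<K, V> writeToStore; // e.g. a database write

    public CacheAsideStore(Function<K, V> loadFromStore, BiConsumer<K, V> writeToStore) {
        this.loadFromStore = loadFromStore;
        this.writeToStore = writeToStore;
    }

    public Optional<V> get(K key) {
        // Serve from cache if present, otherwise load from the source of truth and remember it.
        return Optional.ofNullable(cache.computeIfAbsent(key, loadFromStore));
    }

    public void put(K key, V value) {
        writeToStore.accept(key, value); // write to the source of truth first
        cache.remove(key);               // then invalidate so the next read reloads fresh data
    }
}
```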

Kafka

To collect its ever-growing amount of data, LinkedIn developed many custom pipelines for streaming and queueing data. For example, we needed to flow data into the data warehouse, feed batches of data into Hadoop workflows for analysis, aggregate huge volumes of logs from every service, collect tracking events such as page views, queue messages for the inMail messaging system, and keep the search index up to date whenever someone edited their profile, and so on.
As the site kept growing, more of these custom pipelines appeared. Because the site needed to scale, every individual pipeline needed to scale too, and something had to give. The result was the development of Kafka, our distributed publish-subscribe messaging system. Kafka became the universal pipeline. Built around the concept of a commit log, with a strong focus on speed and scalability, it enables near-real-time access to any data source, drives our Hadoop jobs, allows us to build real-time analytics, has vastly improved our site monitoring and alerting capability, and lets us visualize and track our call graphs. Today, Kafka handles more than 500 billion events per day.
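As a small, concrete example of the publish-subscribe model: producing one tracking event to a Kafka topic with the standard Java client. The broker address, topic name, and payload here are made up for illustration.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Publish a tracking event to a Kafka topic with the standard Java client.
// Broker address, topic name, and payload are illustrative.
public class PageViewPublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Keying by member id keeps one member's events on the same partition, in order.
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("page-view-events", "member-42", "{\"page\":\"/profile\"}");
            producer.send(record, (metadata, exception) -> {
                if (exception != null) {
                    exception.printStackTrace();
                } else {
                    System.out.println("Sent to partition " + metadata.partition()
                            + " at offset " + metadata.offset());
                }
            });
        } // close() flushes any buffered records
    }
}
```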

Inversion

Scaling can be measured along many dimensions, including organizational ones. At the end of 2011, LinkedIn kicked off an internal initiative called "Inversion". We paused feature development and let the entire engineering organization focus on improving tooling, deployment, infrastructure, and developer productivity. It succeeded in giving us the agility to build scalable new products quickly.

Recent years

Rest.li

When we moved from Leo to a service-oriented architecture, the APIs we had extracted were based on Java RPC. They were becoming inconsistent across teams and were too tightly coupled to the presentation layer, and it was only getting worse. To solve this we built a new API model called Rest.li. Rest.li was our move toward a data-model-centric architecture, and it brought a single, consistent, stateless RESTful API model across the company.
Based on JSON over HTTP, our new APIs finally made it easy to write non-Java clients. LinkedIn today is still mostly a Java shop, but it also has many clients written in Python, Ruby, Node.js, and C++, some developed in house and some coming from acquisitions. Moving off RPC also freed us from backward-compatibility trouble between the presentation tier and the backend. On top of that, using Dynamic Discovery (D2) with Rest.li gave us automated client-side load balancing, service discovery, and scalable API clients.
Today, LinkedIn has 975 Rest.li resources and serves more than 100 billion Rest.li calls per day across all of our data centers.
The Rest.li R2/D2 technology stack
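Real Rest.li resources are defined against Pegasus-generated data templates, which are too heavy to reproduce here, so the sketch below shows the same idea, a stateless resource exposed as JSON over HTTP, using only the JDK's built-in HTTP server as a stand-in. Path, port, and payload are invented for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// A stateless resource exposed as JSON over HTTP (the idea Rest.li standardizes),
// sketched with only the JDK's built-in HTTP server. Path, port, and payload are illustrative.
public class MembersResourceSketch {
    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);

        // GET /members/{id} returns a JSON representation of the member.
        server.createContext("/members/", exchange -> {
            String id = exchange.getRequestURI().getPath().substring("/members/".length());
            byte[] body = ("{\"id\":\"" + id + "\",\"name\":\"Example Member\"}")
                    .getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });

        server.start();
        System.out.println("Listening on http://localhost:8080/members/42");
    }
}
```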

Super Blocks

A service-oriented architecture works well to decouple domains and scale services independently, but there is a downside. Many of our applications fetch many different kinds of data, which in turn causes hundreds of downstream calls. This is the "call graph", or "fanout". For example, any request for a profile page fetches far more than profile data: photos, connections, groups, subscriptions, follows, blog posts, connectivity, recommendations, and more. This call graph is difficult to manage, and it was only getting harder to keep under control.
We introduced the concept of a superblock: a grouping of backend services with a single access API. That allows one team to optimize the block as a whole while keeping the call graph of every client in check.
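A superblock can be pictured as one aggregation service that fans out to the underlying backends in parallel and returns a single combined response, so each client sees one call instead of many. The service names and types below are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// One aggregation entry point (a "superblock") fans out to several backend services
// in parallel and returns a single combined view, so the client's call graph stays
// a single call. Service names and types are hypothetical.
public class ProfileViewSuperblock {
    private final ExecutorService pool = Executors.newFixedThreadPool(8);

    public ProfileView fetch(long memberId) {
        // Fan out to the downstream services in parallel.
        CompletableFuture<String> photo =
                CompletableFuture.supplyAsync(() -> photoService(memberId), pool);
        CompletableFuture<Integer> connections =
                CompletableFuture.supplyAsync(() -> connectionCount(memberId), pool);
        CompletableFuture<String> headline =
                CompletableFuture.supplyAsync(() -> headlineService(memberId), pool);

        // Join the parallel calls into one response object for the client.
        return new ProfileView(photo.join(), connections.join(), headline.join());
    }

    // Stand-ins for the real downstream services.
    private String photoService(long id) { return "https://cdn.example.com/photos/" + id; }
    private int connectionCount(long id) { return 500; }
    private String headlineService(long id) { return "Example headline"; }

    public record ProfileView(String photoUrl, int connectionCount, String headline) {}
}
```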

Multi-Data Center

As a global company with a fast-growing membership, we needed to expand beyond one data center, and we worked on that for several years. First, we served public profiles out of two data centers (Los Angeles and Chicago). Once that proved viable, we began enhancing all of our services to handle data replication, calls from different origins, one-way replication events, and pinning users to the data center geographically closest to them.
Most of our databases run on Espresso, a new in-house multi-tenant datastore.
Espresso was built with multiple data centers in mind: it provides master-master support and handles the difficult parts of data replication.

Multiple data centers are incredibly important for high availability. You have to avoid any single point of failure, not just for an individual service but for the entire site. Today, LinkedIn runs three main data centers, with additional PoPs around the globe.
LinkedIn's operational setup as of 2015 (circles represent data centers, diamonds represent PoPs)
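The "pin users to a nearby data center" piece can be sketched as sticky routing: each member has a home data center, and other data centers are only used as failover. The data center names and the assignment rule below are invented for illustration.

```java
import java.util.List;
import java.util.Map;

// Sticky routing: each member has a "home" data center, and other data centers are
// only used when the home one is unavailable. Data center names and the assignment
// rule are invented for illustration.
public class DataCenterRouter {
    private final List<String> dataCenters = List.of("us-west", "us-east", "eu-central");
    private final Map<String, Boolean> healthy; // fed by health checks in a real system

    public DataCenterRouter(Map<String, Boolean> healthy) {
        this.healthy = healthy;
    }

    /** The member's home data center, chosen once and kept sticky (here: a simple hash). */
    public String homeDataCenter(long memberId) {
        return dataCenters.get((int) Math.floorMod(memberId, dataCenters.size()));
    }

    /** Route to the home data center, falling back to any healthy one. */
    public String route(long memberId) {
        String home = homeDataCenter(memberId);
        if (healthy.getOrDefault(home, false)) return home;
        return dataCenters.stream()
                .filter(dc -> healthy.getOrDefault(dc, false))
                .findFirst()
                .orElseThrow(() -> new IllegalStateException("no healthy data center"));
    }
}
```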

What else have we done?

Of course, our scaling story is never quite this simple. Our engineering and operations teams have done countless things over the years, including these larger initiatives:
Many of our most critical systems have their own rich histories of scaling and evolution, including the member graph service (the first service outside Leo), search (the second service), the news feed, our communications platform, and the member profile backend.

We also built a data infrastructure platform to enable long-term growth. This was first evident with Databus and Kafka, and continued with Samza for data streams, Espresso and Voldemort as storage solutions, Pinot for analytics, and other custom solutions. In addition, our tooling has improved so that engineers can deploy this infrastructure automatically.

We also developed a huge number of offline workflows using Hadoop and Voldemort data to power intelligent features such as "People You May Know", "Similar Experiences", "Alumni You May Be Interested In", and profile browse maps.

We also rethought how we build our frontends, adding client-side templates into the mix for interactive pages such as the member home page and the university pages. These applications can be far more interactive, since our servers only need to send JSON or partial JSON, and the templated pages are cached by CDNs and the browser. We began using BigPipe and the Play framework as well, moving our model from a threaded server to a non-blocking asynchronous one.
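The move from a threaded to a non-blocking model boils down to returning a future from the request handler instead of tying up a thread while backend data is fetched. The handler shape below is a generic sketch, not the actual Play API.

```java
import java.util.concurrent.CompletableFuture;

// The non-blocking idea in miniature: the handler returns a future immediately instead
// of blocking the request thread while backend data is fetched. Generic sketch only.
public class AsyncProfileHandler {

    /** Kick off the backend fetch and return immediately with a future of the JSON response. */
    public CompletableFuture<String> handle(long memberId) {
        return fetchProfileJson(memberId)                         // non-blocking backend call
                .thenApply(json -> "{\"profile\":" + json + "}"); // shape the partial JSON the client templates expect
    }

    private CompletableFuture<String> fetchProfileJson(long memberId) {
        // Stand-in for an asynchronous call to a backend service.
        return CompletableFuture.supplyAsync(
                () -> "{\"id\":" + memberId + ",\"name\":\"Example Member\"}");
    }

    public static void main(String[] args) {
        new AsyncProfileHandler().handle(42L)
                .thenAccept(System.out::println)
                .join(); // wait only in this demo so the JVM does not exit early
    }
}
```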

Beyond the code, we use multiple tiers of proxies with Apache Traffic Server and HAProxy to handle load balancing, data center pinning, security, intelligent routing, server-side rendering, and more.

Finally, we continue to improve the performance of our servers with optimized hardware, advanced memory and system tuning, and newer Java runtimes.

What's next

LinkedIn is still growing rapidly today, and there is plenty of work left to do. We are busy solving problems that only seem to be partially solved. Come join us!

Thanks to Steve, Swee, Venkat, Eran, Ram, Brandon, Mammad, and Nick for their review and help.

Copyright notice
This article was posted by [Bird's nest]. Please include the original link when reposting. Thanks.
https://chowdera.com/2021/10/20211009000611569f.html
