
The sword of Damocles in software development

2020-12-08 13:23:11 osc_ ftbxuxl1


Why does your program keep producing bugs?

Why does fixing bugs take up most of your time?

Read this article to the end, and I promise you will be able to design more stable programs, shake off the bugs that keep dogging you, and work on projects with more peace of mind!

I remember that when I was at school, the projects we built were either homework or something to show off in a competition, so their quality was very low.

How low?

For most projects, once the basic functions worked, we called it done and never considered abnormal cases at all. It was enough for the project to run successfully even once, so that I could take a few screenshots and put them in a PPT or a lab report to hand in to the teacher or present at a contest defense.

And what if a bug showed up in the project?

  • If some feature did not work during testing, that was simple: ignore it and just Photoshop a screenshot of it working.

  • If some feature turned out not to work during the competition, that was easy too: just blame it on “the network at the venue is bad”.

However, these “tricks” do not work in a company. Enterprise projects must bring real value to the business, and there is no room for carelessness or deceit.

When I first joined a company as an intern, I still had the aggressive streak from my school projects: as long as the basic functions were done, I made sure to finish development as fast as possible.

One day, as I was getting ready to leave work early, a tester came over and said:

“Hey, your program has a bug. Why is the amount of this user's order negative?”

Write a bug

For me, a newcomer to the workplace, this was the first time in my life that anyone had told me my code had a bug, that I had a problem, that I was wrong.

At that moment, the first thought in my mind was how to gloss over the bug, not how to fix it! Clearly I had picked up a very bad habit.

Over the next few days, I received one bug report after another from testing and fixed them one by one. If such a flawed program had been released online, the loss would have been immeasurable. Thinking about it now still scares me.

Develop for 1 day, fix bugs for 4 days

After this, I realized that when developing projects in a company, you cannot pursue development speed alone; you also have to pay attention to the stability of the project. Otherwise, the extra rework will take far longer than the time you saved during development, and it will affect how your colleagues see you. And if a bug you wrote makes it online, the consequences are even more unimaginable!

Later, after working at two big companies, ByteDance and Tencent, I understood even better how important project stability is, and I accumulated experience in improving project stability that you can only get at large companies.

I have summed up 10 risk points that are usually overlooked in development, as well as 16 ways to reduce risk and improve project stability, to share with you ~

Before sharing them, let me tell a story first.

Sword of Damocles

In ancient Greek legend, Damocles was a courtier of Dionysius II, the tyrant (a title for certain ancient Greek rulers) of Syracuse, Italy, in the 4th century BC, and he was very fond of flattering Dionysius.

He flattered him: “As a great man with power and authority, Dionysius is truly fortunate.”

So Dionysius offered to swap places with him for a day, so that he could try out the fortune of a ruler.

At the banquet that evening, Damocles greatly enjoyed being king. When the banquet was almost over, he looked up and noticed a sword hanging above the throne, suspended by a single horsehair. He immediately lost interest in the food and the beauties, and begged the tyrant to let him go; he never wanted to be so fortunate again.

Sword of Damocles

What does this story tell us?

  1. Behind peace and tranquility, danger and unease are always lurking.

  2. However much honor and status a person gains, they must pay an equal price.

  3. The higher the status and the safer it seems, the more dangerous it actually is.

  4. Prepare for danger in times of peace, and stay alert to serious consequences that may arise at any moment.

So what does this have to do with software development? Let me unveil the sword of Damocles in software development.

Danger lurks everywhere

“Behind peace and tranquility, danger and unease are always lurking.”

Software development is just like this. On the surface, a machine is “dead”: it only executes the instructions people enter or the programs they write, never deviating, perfectly obedient. It seems that once we have written the code and thrown it onto the machine, we can sleep soundly.

But is that true? Can we really trust machines and programs?

In fact, the world of programs is full of danger: human factors, environmental factors, and more can all affect our programs. Therefore, in software development we must always stick to the distrust principle and stay overly pessimistic: treat every request, service, interface, return value, machine, framework, and piece of middleware related to the program as untrustworthy, proceed one careful step at a time, and build defenses everywhere.

The distrust principle in the world of programs

So why should I write code so carefully and trust nothing?

The pain of big projects

“However much honor and status a person gains, they must pay an equal price.”

In software development, the more valuable a project is, the more pressure it has to bear. Let's listen to what a big project has to say.

I am a big project worth over 100 million, serving tens of millions of users every day and helping them gain knowledge and happiness.

My friends only see my aura and glory; they do not see the pressure and risk I carry. Today I finally have a chance to share my feelings with you.

I remember that many years ago, when I was still a kid, only a few owners were developing me. I grew very fast in those days. Although only a few dozen people used me, I felt relaxed and happy, and if I slacked off once in a while, nobody noticed.

Later, I gained more and more functions and grew stronger. Countless new faces came to visit me every day and enjoy the services I provide. Little by little, more developers left their marks on me; I felt myself becoming complicated and began to feel the pressure. I could no longer find a chance to slack off, because the moment I rested, my owners would lose a great deal of money.

Now I have become a big project that tens of millions of users depend on every day. I finally have more value, but I also have many more troubles and sense greater danger.

First, I serve millions of users at the same time, and there may be hundreds of thousands or even millions of requests per second for me to handle. So I have to work under high load all the time. Forget resting: if I am even a little slow, users complain and my owners get criticized for it.

To run, I have to rely on the support of many brothers, so I must get along well with them. If even one brother falls, I am affected.

Behind my great strength is a very fragile heart. I have been strengthened and remodeled so many times that, while my functions keep growing, I have also been implanted with all kinds of frameworks and plug-ins. I keep growing like a snowball, and I do not know when I will explode. So my owners have to be extremely careful every time they change me, and I now grow very slowly.

But what scares me most is the bad guys!

They are different from normal users. Some keep sending requests, trying to knock me down. Some sneak around behind my back, trying to take control of me directly. Some keep their eyes on me, watching and recording my every move. And some attempt illegal operations, trying to make huge profits from me.

Being a big project is so tiring. I do not know how much longer I can hold on.

Is it really trustworthy?

“The higher the status and the safer it seems, the more dangerous it actually is.”

This is an era of open source and shared software. When we develop projects, we more or less use existing resources from the Internet, such as dependency packages, tools, components, frameworks, interfaces, and ready-made cloud services. These resources can greatly improve our development efficiency.

Take cloud services as an example: they have become an almost indispensable resource for development. In the past, to build a website you might have had to buy your own physical server, connect it to the network, and then deploy the project onto it. Today, you can simply log in to the cloud website of a large company (such as Tencent Cloud or Alibaba Cloud) and rent a cloud server, which is very economical.

Cloud services

Now consider mainstream development frameworks. In the past, a simple website interface might have needed only the three basic technologies HTML, CSS, and JavaScript. Today, website styles and interactions are becoming more and more complex, and we have to use well-known frameworks such as Vue and React to improve development efficiency.

It all sounds fine, and there seems to be nothing to doubt, because we instinctively trust big companies and big names.

However, did you know that when you decide to use someone else's resources, you hand over part of the control of your project or system, possibly even half of its life?

So think about it: are the resources you use really trustworthy?

The following 10 questions may change how you think about development.

1. Are development tools trustworthy?

We usually write code in large, full-featured development tools such as JetBrains IDEA or VS Code. Many students who are just starting to write code, and even experienced veterans, trust their development tools absolutely.

For example, if you type a on the keyboard, the editor must display a.

However, for reasons such as insufficient memory, development tools do in fact act up.

For example, you want to call a function. Normally, after you type the first few letters of its name, the development tool automatically suggests the full function name. If the tool gives no suggestion, your first suspicion is that the function does not exist, not that the editor failed to prompt you as expected. In this case, you can wait a moment for the editor, or confirm whether the function really does not exist, instead of immediately creating a new one.

Or the project will not run even though you have checked everything and found nothing wrong. At this point, you might as well restart the development tool or clear its cache; maybe it will work!

There are also some very amusing situations: for example, the editor flags errors everywhere, yet the project still runs.

Why won't it run? Why does it run?

Therefore, do not trust development tools absolutely.

2. Are open source projects trustworthy?

This is an era of open source software. On open source platforms such as GitHub we can find many excellent open source projects, and good ones can even earn more than 100,000 stars. Are these well-known open source projects trustworthy?

Not entirely! You can see this from the Issues of any open source project, and usually the bigger the project, the more problems are found. The Vue project, for example, has accumulated more than 8,000 issues that were raised and closed.

Vue project issues

I remember once using Tomcat, a well-known open source server, and running into a bug: every time a specific request was received, an error was reported. At first I never suspected Tomcat; I kept trying to work out what was wrong with my own code. After repeated investigation and searching, I finally confirmed that it was a bug in Tomcat itself!

Although open source projects are not entirely trustworthy, compared with private projects, everyone interested in the project can find and fix its problems together, which improves its reliability to a certain extent.

3. Are dependency libraries trustworthy?

When we develop projects, we usually use a large number of dependency libraries. We search for a library in an official registry (such as Maven or npm), and then the package manager installs it automatically with a single command or a line in a configuration file. Very convenient.

However, are the libraries published to these official registries trustworthy?

Leaving aside the fact that almost any developer can publish a library to the official registry, even a library from a big Internet company may not be trustworthy.

The one that impressed me most is fastjson, Alibaba's JSON serialization library. Almost everyone knows it, and it is widely praised for its extremely fast parsing. However, this library has repeatedly been found to contain high-risk vulnerabilities that allow attackers to execute commands remotely! The average developer would never discover this, and it has done great harm to projects.

Therefore, when selecting dependency libraries, we should do our research, make sure the library is as secure as possible, and make sure it does not conflict with existing dependencies.

4. Are programming languages trustworthy?

“Java is a strongly typed language and is robust.” I believe anyone who has learned Java knows this sentence by heart. But can strongly typed programming languages be trusted?

Some readers may object here: if the most basic, lowest-level programming languages we use all the time have bugs, how can we trust the frameworks built on them?

But the truth is that all programming languages have bugs! Almost every new release of a programming language fixes some historical bugs. Java even has a dedicated database for recording bugs!

Java Bug database

However, I believe that for most developers, even if a program happens to trigger a bug in the programming language itself, they do not have enough confidence to question the language and instead simply modify their code to work around it.

Indeed, questioning a programming language requires solid fundamentals and knowledge. But once you find a puzzling problem in a program, I suggest you do not ignore it. Spend some time exploring: you may discover a significant bug, and you will also deepen your understanding of the language.

5. Is the server trustworthy?

The server is the host on which the project runs, and its performance and stability directly affect the project's processes.

Whether individual developers or companies, people usually rent cloud servers from large companies to deploy projects, which saves the trouble of building and maintaining servers themselves.

But are the cloud servers of big companies trustworthy?

Not entirely! Even though today's cloud server providers promise that the SLA (service level agreement) of their services can reach five nines (99.999%, about 5 minutes of downtime per year) or even six nines (99.9999%, about 30 seconds of downtime per year), there are still risks.
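To make those SLA figures concrete, here is a small sketch of the arithmetic behind them. It assumes a 365-day year, and the class and method names are made up for illustration.

```java
// Minimal sketch: convert an availability target ("number of nines") into
// the downtime it allows per year. Assumes a 365-day year.
final class SlaDowntime {
    static final double SECONDS_PER_YEAR = 365 * 24 * 60 * 60; // 31,536,000

    // nines = 5 means 99.999% availability, i.e. an unavailable fraction of 10^-5.
    static double allowedDowntimeSeconds(int nines) {
        double unavailableFraction = Math.pow(10, -nines);
        return SECONDS_PER_YEAR * unavailableFraction;
    }
}
```

Under that assumption, five nines leave about 315 seconds (a little over 5 minutes) of downtime per year, and six nines about 31.5 seconds.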

There is a very famous case: in 2013, China's largest social communication app suffered a massive outage that affected hundreds of millions of users. The cause was that a municipal road construction crew cut a network cable, so the servers hosting the app could not be reached.

Apart from unreliable availability, there may also be security and privacy issues. Of course, cloud service providers normally do not take users' data, but there is no way to trust them absolutely. After all, data privacy is crucial to a company, which is why large companies build their own server rooms and networks.

Computer room

6. Is the database trustworthy?

Most of a company's business data is stored in a database, and the project's back-end program operates on and queries that data.

As with servers, we can build our own database with software such as MySQL, or rent a cloud database from a large company directly. Is the database trustworthy?

In fact, in enterprise back-end projects, the database is usually the performance bottleneck and is relatively fragile. As concurrent access rises, query performance drops, and in severe cases the whole system may go down! Even with cloud database services from big companies, there may be nothing they can do about slow queries (queries that take a long time).

The data in the database may not be reliable either. Sometimes an administrator makes a mistake and accidentally deletes data or inserts a wrong record, which may affect users and cause losses. Worse still, someone may delete the database and run away, with no regard for the rules!

Delete the database and run away

Therefore, do not trust the database too much. Use techniques like caching to share the load on the database, and back it up regularly. Otherwise, once the database goes down or data is lost, the loss is immeasurable!

7. Is the cache service trustworthy?

Caching is an essential technique for developing high-performance programs. Data that is slow to query, for example from a database, is stored in memory and read directly from there to improve query performance. With caching, the project can support more simultaneous queries, and the database is protected as well.

The mainstream caching technologies today are Redis, Memcached, and so on. You can set them up on your own servers, or rent cloud caching services from big companies directly.

A cache that stores key-value pairs

So is the cache service trustworthy?

If the project's concurrency is not especially high, ordinary caching is enough. But if the project is large in scale, the cache may not withstand the pressure and, in serious cases, will go down. Once the cache crashes, a flood of queries goes straight to the database, the database goes down in an instant, and in severe cases the whole project is paralyzed!

Therefore, when using a cache, you need to estimate the concurrency and ensure high availability by building clusters and synchronizing data. In addition, you need to guard against problems such as cache avalanche, cache penetration, and cache breakdown, briefly explained here.

Cache avalanche: a large number of cache entries expire at the same time, so requests miss the cache and all hit the database, causing the database to go down.

Cache penetration: requests keep asking for keys that do not exist in the cache, so they go straight to the database, causing the database to go down.

Cache breakdown: a hot key with a high request frequency suddenly expires, so all its requests hit the database at once, causing the database to go down.

If you do not guard against these three problems, even a cache service rented from a big company will be blown up all the same.
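As one illustration, a common mitigation for cache avalanche is to add random jitter to each entry's expiry time, so that entries written together do not all expire in the same instant. The sketch below shows only that calculation; the class name, method name, and durations are assumptions for illustration, not part of any cache library.

```java
import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;

// Minimal sketch: spread out expiry times to reduce the chance of a cache avalanche.
final class CacheTtl {
    // Returns baseTtl plus a random extra between 0 and maxJitter (inclusive).
    // maxJitter is assumed to be non-negative.
    static Duration withJitter(Duration baseTtl, Duration maxJitter) {
        long extraMillis = ThreadLocalRandom.current().nextLong(maxJitter.toMillis() + 1);
        return baseTtl.plusMillis(extraMillis);
    }
}
```

The result would be used as the expiry when writing an entry to the cache, for example `CacheTtl.withJitter(Duration.ofMinutes(30), Duration.ofMinutes(5))`. Penetration and breakdown need different measures, such as caching empty results for missing keys or letting only one request rebuild a hot key.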

An avalanche

8. Is object storage trustworthy?

Projects often let users upload pictures or files. This kind of data is usually large, and storing it in a database is inconvenient. Although we could save the files directly on the server, it is better to use a dedicated object storage service.

You can simply think of object storage as a large folder through which we upload and download files. Big cloud service providers also offer professional object storage services, so you do not have to build your own. Is object storage trustworthy?

In general, files uploaded to object storage are not damaged or lost, and the uploaded data can also be synchronized across regions as a backup.

Cross-region synchronization

However, I remember one time when a file uploaded to object storage did not match the source file: it was 1 MB smaller. At first I thought the file had been compressed automatically during upload, but after downloading it from object storage, I found that it still did not match the source file! Although the probability of this is extremely small, from that moment on I stopped believing in object storage.

Let me also talk about cross-region synchronization, from my own real experience. The business I am responsible for is fairly important: if a single data center goes down, the loss could be hundreds of thousands of dollars per minute! So I configured automatic cross-region synchronization for the object storage: files are uploaded to the Guangzhou data center first and then synchronized automatically to the Shanghai data center, and the operations team promised that the synchronization delay would not exceed 15 minutes.

I believe most developers, once they have configured data synchronization, stop paying attention and simply trust that it will happen automatically. But when I wrote a program to monitor the synchronization and compare the data, I found that data often failed to synchronize, in up to 10% of cases!

Therefore, you cannot trust object storage completely. Although the object storage services of large companies are reliable most of the time, there is no guarantee that nothing will go wrong. Especially with synchronized backups: how many people have checked whether the synchronization actually succeeded? Write a program to verify it and protect yourself.
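One simple way to do such verification is to compare a checksum of the source bytes with a checksum of what is read back from storage. The sketch below uses SHA-256 from the Java standard library; the class and method names are hypothetical, and how the bytes are downloaded depends on your storage provider and is left out.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Minimal sketch: verify an upload by comparing SHA-256 checksums.
final class UploadVerifier {
    // Returns the SHA-256 digest of data as a lowercase hex string.
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // True only if the bytes read back from storage match the source bytes.
    static boolean matches(byte[] source, byte[] downloaded) {
        return sha256Hex(source).equals(sha256Hex(downloaded));
    }
}
```

The same comparison can be run between two regions to check that a synchronized copy really matches the original. For large files, you would feed the digest from a stream instead of loading everything into memory.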

9. Are API interfaces trustworthy?

In development, we often call API interfaces provided by other systems to implement a feature easily. For example, to look up the weather somewhere, you can call a weather query interface provided by someone else instead of writing it yourself. We can also provide API interfaces for others to use. Especially in a microservice architecture, services interact and cooperate through interface calls.

Almost every API provider will say how safe their interfaces are and tell you to use them without worry. So are API interfaces really trustworthy?

In fact, API interfaces are the least trustworthy resource!

First, the provider of an API interface can be any developer, and it is hard to judge the stability and security of the interface from their word alone.

Even if the interface performs well and is secure, you do not know how many people are using it at the same time as you. Maybe it is just you, or maybe 1 million other developers? In such a competitive environment, can the interface's QPS (queries per second) meet your expectations? Will its response really never time out?

Worse still, a provider may quietly change the API interface without notifying callers, so that every caller fails, seriously affecting the operation of the project!

Therefore, when calling a third-party API interface, be careful, be careful, and be careful again!

In addition, if we are the provider of an API interface, we should also protect our own interface and prevent it from being called by too many developers at once and going down.
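On the caller's side, one way to be careful is to never wait indefinitely for someone else's interface. The sketch below wraps a call in a timeout with a fallback value, using `CompletableFuture` from the standard library (Java 9 or later). `GuardedCall` and `callWithTimeout` are hypothetical names, and the supplier stands in for whatever remote call you actually make.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Minimal sketch: call an untrusted interface with a timeout and a fallback.
final class GuardedCall {
    // Runs remoteCall asynchronously. If it fails or does not finish within
    // timeoutMillis, the fallback is returned instead.
    static <T> T callWithTimeout(Supplier<T> remoteCall, long timeoutMillis, T fallback) {
        return CompletableFuture.supplyAsync(remoteCall)
                .completeOnTimeout(fallback, timeoutMillis, TimeUnit.MILLISECONDS)
                .exceptionally(error -> fallback)
                .join();
    }
}
```

Note that this sketch only stops waiting; the slow call itself keeps running in the background, so a real client should also set timeouts on the underlying connection.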

API There are complex call relationships

10. Is Serverless trustworthy?

If servers are not trustworthy, can we skip renting servers and rent a Serverless service from a large company as the project's back end instead?

Serverless, or serverless architecture, does not really mean that no server is needed. It means that deploying the project's interfaces and operating and maintaining the servers are handled by the service provider, so developers do not have to care about servers and can concentrate on writing code.

docker Containers

It sounds great, but is Serverless trustworthy?

Serverless can greatly improve the efficiency of development and operations, but compared with servers and other resources it is even less trustworthy!

First, Serverless itself is deployed on servers, so it is inevitably affected by them.

Second, Serverless services do not keep the application's state for long. They start up when a request arrives, so there is a cold start period. Although there are many optimizations and workarounds, the performance of the interface cannot be guaranteed precisely, and in high-concurrency scenarios it often falls short of expectations.

Most importantly, when you choose a Serverless service, you are tied to one cloud service provider, and migrating later is very difficult! Just imagine: is it really a good thing to leave all of your project's functions for someone else to maintain? Once the cloud service provider changes its architecture or interfaces, your code has to change with it, and that change is out of your control!

Of course, Serverless has many advantages and is an inevitable trend in cloud computing. I just hope that before you use it, you consider the possible risks and take measures to deal with them.

Cloud Computing Era

Summary: precisely because we place so much trust in these big-name, seemingly safe resources, the dangers behind them are harder to notice, and the consequences are often more deadly!

Defensive programming

“Prepare for danger in times of peace, and stay alert to serious consequences that may arise at any moment.”

In software development, even when a project appears to be working properly, risk is everywhere, so we need to learn the idea of defensive programming. Play the stubborn skeptic, trust no one, look for the risks in the program, and defend proactively.

Below are 16 defensive programming methods. After learning them, you can greatly reduce the risk in your programs.

Praying programming

1. Programming habits

To reduce the risk in your programs, first develop good programming habits.

First, when writing code, keep a good attitude: do not write code in a hurry or just to get a task off your plate. If you only want to satisfy the requirements, you will very likely miss the risks in the code, or notice them and be unwilling to fix them. This does save development time, but when problems appear later, you will spend even more time investigating, communicating, and fixing bugs. More haste, less speed.

When writing code, if you use the same complex variable name or string in several places, do not type it by hand; use your favorite “copy and paste” to avoid bugs caused by typos.

Copy and paste a shuttle

In addition, we should check return values more carefully in the code, choose safe syntax and data structures, and avoid obsolete syntax. Different programming languages have different best practices. In Java, for example, you should check every variable that may be null to prevent an NPE (NullPointerException); when developing multithreaded programs, choose the thread-safe ConcurrentHashMap instead of HashMap; and so on. You can also use assert (assertions) to make sure that variables hold the expected values while the program runs.
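The three Java habits just mentioned can be sketched together in one small class. `VisitCounter` and its methods are hypothetical names chosen for illustration.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch: null checks, a thread-safe map, and an assertion.
final class VisitCounter {
    // Thread-safe map: safe to update from many request threads at once.
    private final Map<String, Integer> visits = new ConcurrentHashMap<>();

    int recordVisit(String userId) {
        // Reject null at the boundary with a clear message.
        Objects.requireNonNull(userId, "userId must not be null");
        // merge() is atomic on ConcurrentHashMap, so concurrent increments are not lost.
        int total = visits.merge(userId, 1, Integer::sum);
        // Only evaluated when the JVM runs with assertions enabled (-ea).
        assert total > 0 : "visit count should be positive";
        return total;
    }

    int visitsOf(String userId) {
        // ConcurrentHashMap rejects null keys, and getOrDefault avoids a null result.
        return userId == null ? 0 : visits.getOrDefault(userId, 0);
    }
}
```

Because assertions are disabled by default in Java, they are a development aid; checks that must always run, like the null check here, belong in ordinary code.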

It is recommended to write code in an editor with checking features, which automatically flags errors as you type and suggests good coding style, greatly reducing development risk. In addition, before committing code, check it several times, especially files that were copied and pasted, which often contain spots you forgot to change. After committing, you can also ask experienced colleagues to read and check the code (code review) to further ensure there are no syntax or logic errors.

Editor syntax check and prompt

2. Exception handling

A running program is unpredictable: the same piece of code may produce different results in different situations, or even fail with an exception. So many mainstream programming languages have exception handling mechanisms. In Java, for example, you wrap the risky code in try, handle the exception in catch, and finally release resources and clean up in finally.

In programming, we should make sensible use of exception handling to defend against possible problems in the code. In an exception handler we usually log the error, report it and raise an alert, retry, and so on.

For example, since we do not trust the database, we add exception handling when querying and manipulating data. If the database acts up and an operation fails, we record the failure in the log and notify developers by email, SMS, or another alerting channel, so that the problem is discovered and investigated right away. If necessary, automatic retries can also be implemented to save some manual work.
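The log-then-retry pattern described above might look like the following sketch. The `Retry` class is hypothetical, the operation passed in stands for a database query or any other call you do not trust, and sending the email or SMS alert is left to the caller who catches the final exception.

```java
import java.util.concurrent.Callable;
import java.util.logging.Level;
import java.util.logging.Logger;

// Minimal sketch: retry a failing operation a few times, logging every failure.
final class Retry {
    private static final Logger LOG = Logger.getLogger(Retry.class.getName());

    // Runs the operation up to maxAttempts times, pausing delayMillis between tries.
    // The last failure is rethrown so the caller can raise an alert or degrade.
    static <T> T withRetry(Callable<T> operation, int maxAttempts, long delayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.call();
            } catch (Exception e) {
                last = e;
                LOG.log(Level.WARNING, "attempt " + attempt + " of " + maxAttempts + " failed", e);
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis);
                }
            }
        }
        throw last;
    }
}
```

Retrying is only safe for operations that can be repeated without side effects, such as queries; blindly retrying an insert or a payment could apply it twice.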

It's abnormal

3. Request validation

No request can be trusted. Even on an intranet, a mistake somewhere may cause a wrong request to be sent.

So for every interface we write, before running any business logic, be sure to validate the request parameters first. Here are a few common checks:

  1. Parameter type check: for example, the request parameter should be an Integer rather than a Long.

  2. Value validity check: for example, an integer must be greater than or equal to 0, a string must be longer than 5 characters, or a value must match a particular format such as a mobile phone number or an ID card number.

  3. User permission check: many interfaces may only be called by logged-in users or administrators, so you have to determine the identity of the current user from the request parameters (or request headers). It would certainly be unacceptable for an ordinary user to download a paid VIP movie!
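The three kinds of checks above can be sketched as a single validation step that runs before any business logic. Everything specific here is an assumption for illustration: the parameter names, the 11-digit phone format, and the "VIP" role string.

```java
import java.util.Optional;
import java.util.regex.Pattern;

// Minimal sketch: validate type, value, and permission before doing any work.
final class OrderRequestValidator {
    // Illustrative format only: 11 digits starting with 1.
    private static final Pattern PHONE = Pattern.compile("1\\d{10}");

    // Returns an error message, or an empty Optional when the request passes all checks.
    static Optional<String> validate(String amountParam, String phone, String role) {
        int amount;
        try {
            amount = Integer.parseInt(amountParam);        // 1. parameter type check
        } catch (NumberFormatException e) {
            return Optional.of("amount must be an integer");
        }
        if (amount < 0) {                                   // 2. value validity check
            return Optional.of("amount must be >= 0");
        }
        if (phone == null || !PHONE.matcher(phone).matches()) {
            return Optional.of("invalid phone number");
        }
        if (!"VIP".equals(role)) {                          // 3. user permission check
            return Optional.of("VIP membership required");
        }
        return Optional.empty();
    }
}
```

A check like `amount < 0` is exactly what would have caught the negative order amount from the story at the beginning.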

4. flow control

above-mentioned , All requests are untrusted , It's not just the requested value , And the amount and frequency of requests . For all interfaces , Limit the frequency of its calls , Prevent the interface from being flushed by a large number of instantaneous requests . For paid interfaces , It also prevents the number of user requests for the interface from exceeding the number of original purchases .

Besides , There is also a situation that is easily overlooked , If your interface A We call other people's interfaces in B, Maybe your interface A Its own logic can withstand every second 1000 A request , But you're sure the interface B Can you bear it ?

therefore , Flow control is needed , It's not just about preventing the interface from being blown up , It can also protect internal services and calls .

what , You said your interface is very good , It can resist 100 Million requests , There are no other services called , Then I'll look for 100 ten thousand + 1 Individuals also request your interface , Look, you're afraid !

DDOS Distributed denial of service attacks

The commonly used flow control can be divided into different granularity :

  1. User flow control: limit the number of calls each user makes to an interface in a certain period of time.

  2. Interface flow control: limit the total number of calls to an interface in a certain period of time.

  3. Single-machine flow control: limit the total number of calls to all interfaces of the project on a single server in a certain period of time.

  4. Distributed flow control: limit the total number of requests across all servers of the project in a certain period of time.

Of course, beyond the approaches above, flow control can be very flexible, and there are many excellent rate-limiting tools, such as RateLimiter in the Guava library for Java (token-bucket, single-machine rate limiting) and Alibaba's Sentinel framework for distributed rate limiting.
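To make the token-bucket idea and user-level flow control concrete, here is a minimal single-machine sketch. It is not Guava's or Sentinel's API; the class names `TokenBucket` and `UserRateLimiter` and the injectable `clock` parameter are assumptions made for this illustration, and the code is not thread-safe.

```python
import time


class TokenBucket:
    """Refills `rate` tokens per second, holding at most `capacity` tokens."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def try_acquire(self):
        now = self.clock()
        # Add the tokens earned since the last call, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class UserRateLimiter:
    """User-level flow control: one bucket per user id (hypothetical helper)."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets = {}

    def allow(self, user_id):
        if user_id not in self.buckets:
            self.buckets[user_id] = TokenBucket(self.rate, self.capacity, self.clock)
        return self.buckets[user_id].try_acquire()
```

A request handler would call `allow(user_id)` first and reject the request when it returns `False`. Interface-level control works the same way with one shared bucket; distributed control needs shared state outside the process, which this sketch does not cover.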

Sentinel flow control dashboard

5. Rollback

Sometimes an operation we perform on the project goes wrong, whether it was done by hand or by a machine, and causes an online failure. When that happens, you can choose to roll back.

Rolling back means undoing an operation and restoring the project to its previous state. Here are some common kinds of rollback.

Data rollback

Sometimes we want to insert data in bulk, but halfway through the insert the program suddenly throws an exception. At that point we need to roll back the data already inserted, as if nothing had happened. Otherwise there is a risk of data inconsistency.

The most common approach is to wrap batch database operations in a transaction and, when an exception occurs, call the rollback method of the database client.
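A minimal sketch of this all-or-nothing batch insert, using Python's built-in `sqlite3` module. The `users` table and the `insert_batch` helper are made up for the example; other databases and clients follow the same commit/rollback pattern.

```python
import sqlite3


def insert_batch(conn, names):
    """Insert every name or none of them (hypothetical helper)."""
    try:
        for name in names:
            conn.execute("INSERT INTO users(name) VALUES (?)", (name,))
        conn.commit()    # make the whole batch visible at once
        return True
    except sqlite3.Error:
        conn.rollback()  # undo the rows inserted before the failure
        return False
```

If the second of three rows violates a constraint, the first row is rolled back as well, so the table looks exactly as it did before the batch started.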

Configuration rollback

If the project's configuration, such as the database connection address, is written into the code, then whenever the configuration is wrong or the address changes you have to modify the code, which is a lot of trouble.

A better way is to publish the configuration to a configuration center and let the project read it from there dynamically. If you accidentally publish a wrong configuration, you can roll it back directly in the configuration center and restore the previous one.
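The core idea behind configuration rollback is simply keeping every published version. The toy in-memory `ConfigCenter` below is an assumption made for illustration and does not mirror the API of any real configuration center.

```python
class ConfigCenter:
    """Keeps every published version so a bad release can be undone."""

    def __init__(self, initial):
        self.history = [dict(initial)]

    def current(self):
        return self.history[-1]

    def publish(self, changes):
        # Each publish creates a new version on top of the current one.
        self.history.append({**self.current(), **changes})

    def rollback(self):
        # Drop the latest version, but never the very first one.
        if len(self.history) > 1:
            self.history.pop()
        return self.current()
```

Because the application reads `current()` at run time instead of a value baked into the code, undoing a bad publish requires no code change and no rebuild.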

Release rollback

Nobody can guarantee that their code is correct. Quite often a project shows no problems during verification in the test environment, yet problems appear everywhere once it goes online. That tells us something is wrong with the newly released code.

In that case, the simplest fix is to roll back the version: repackage and release the code that worked before. Large companies generally have their own release platform, where you can roll back through the UI and automatically republish a previous build of the project.

6. Multi-level cache

As mentioned above, caching is very important to a project. It is not only a tool for improving performance, but also a protective umbrella for the database.

But what if the cache crashes ?

There are two options. The first is to run the cache as a cluster, so as to ensure its high availability.

Redis cluster

But nothing can be trusted, and a cluster can fail too!

So there is the second option: if the first-level cache is down, back it with a second-level cache!

Usually, in high-concurrency projects, we design a multi-level cache: a distributed cache plus a local cache. When a request needs data, it first queries the distributed cache (such as Redis). If the distributed cache has gone down entirely, it gets the data from the local cache instead. That way, even if the cache crashes, the system can hold out for a while.
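The lookup order described above might be sketched like this. `remote_get` stands in for a distributed cache client such as a Redis wrapper, and `CacheUnavailable`, `TwoLevelCache` and `load_from_db` are hypothetical names invented for the example; a real local cache would also need a size limit and expiry.

```python
class CacheUnavailable(Exception):
    """Raised by the (hypothetical) distributed cache client when the cluster is down."""


class TwoLevelCache:
    def __init__(self, remote_get, load_from_db):
        self.remote_get = remote_get      # first choice: distributed cache
        self.load_from_db = load_from_db  # last resort: the database
        self.local = {}                   # in-process copy, unbounded here for brevity

    def get(self, key):
        try:
            value = self.remote_get(key)
        except CacheUnavailable:
            # Distributed cache is down: serve the last value we saw, if any.
            if key in self.local:
                return self.local[key]
            value = None
        if value is None:
            value = self.load_from_db(key)
        self.local[key] = value           # remember it for the next outage
        return value
```

The trade-off is staleness: during an outage the local copy may be out of date, which is usually acceptable for a short time but should be a conscious decision per data type.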

This may differ from some multi-level cache designs you have seen. Sometimes the local cache is used as the first-level cache for hot data, and the distributed cache is consulted only when the local cache misses. That design mainly aims to reduce the number of requests to the distributed cache and further improve performance, which is a different purpose from the design above.

Multi-level cache design

7. Service circuit breaking and degradation

Every year on Double Eleven, we stare at the flash-sale page right on time, only to be greeted by that one line: “Please try again later!”

Our projects are far more fragile than we think, and many services run into problems for all sorts of reasons. During a promotion, for example, a large number of users visit at the same time and the services receive far more requests. If the project cannot withstand the pressure, it goes down.

To guard against this risk, we can use a service degradation strategy: if the system really cannot serve every user, settle for second best and return a “friendly” message or page to the user directly, instead of letting the pressure crush the project.

Combined with service circuit breaking, degradation can be switched on and off dynamically according to indicators such as system load. For example, when the machine's CPU is fully occupied, turn on degradation and return an error directly; when the CPU returns to normal, go back to returning data and performing operations as usual.
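As a rough illustration of how circuit breaking and degradation fit together, the sketch below opens the breaker after a number of consecutive failures and returns a fallback during a cool-down period. It uses call failures rather than CPU load as the trigger, and `CircuitBreaker` with its parameters is a hypothetical class written for this example, not Hystrix's API.

```python
import time


class CircuitBreaker:
    def __init__(self, max_failures, cooldown, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown = cooldown   # seconds to stay degraded once opened
        self.clock = clock
        self.failures = 0
        self.opened_at = None      # None means the breaker is closed

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown:
                return fallback()  # degraded: do not touch the struggling service
            self.opened_at = None  # cool-down over: let one trial call through
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0          # a success closes the breaker again
        return result
```

Here `fallback` would return something like the “Please try again later!” message. While the breaker is open, the failing service receives no traffic at all, which gives it a chance to recover.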

Hystrix is a well-known circuit breaking and degradation framework for microservices.


8. Active detection

As mentioned above, even a synchronization service from a big company may synchronize late or even lose data. So, to further ensure that synchronization succeeds and the data is accurate, we can do active detection.

For example, write a scheduled script or task that checks at regular intervals whether the data at the source address and the target address are consistent, or checks whether the data is correct through some logic. You can also run the check immediately after each synchronization, which is even safer.
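The consistency check at the heart of such a script can be very small. The sketch below assumes both sides can be read as key-to-value dictionaries; `find_inconsistencies` is a hypothetical helper, and for large data sets one would compare checksums or sample rows rather than load everything.

```python
def find_inconsistencies(source, target):
    """Compare two key -> value snapshots and report what differs."""
    missing = sorted(k for k in source if k not in target)
    stale = sorted(k for k in source if k in target and source[k] != target[k])
    extra = sorted(k for k in target if k not in source)
    return {"missing": missing, "stale": stale, "extra": extra}
```

A scheduled task would call this periodically and raise an alert, or trigger a repair, whenever any of the three lists is non-empty.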

Active detection

9. Data compensation

When a data inconsistency is detected, we need to compensate for it, for example by synchronizing again the data that was not synchronized, or by updating the inconsistent data.

Besides fixing inconsistencies found by active detection, data compensation is also widely used in business design and architecture design.

For example, after a call to a query interface fails, pause for a while and then retry automatically, or get the data from somewhere else. Likewise, when a message queue producer fails to send a message, it should resend automatically and record the failure, instead of simply discarding the message.
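The "pause, then retry automatically" idea might look like the sketch below. The doubling delay is one common choice rather than something the article prescribes, and `call_with_retry` with its parameters is a hypothetical helper.

```python
import time


def call_with_retry(func, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call func; on failure wait base_delay, then 2 * base_delay, ... and try again."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the error so it can be recorded
            sleep(base_delay * 2 ** attempt)
```

Retries are only safe when the operation can be repeated without side effects, or when the receiver can recognize duplicates. Otherwise a retried payment or message could be applied twice.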

The idea behind data compensation is to guarantee the eventual consistency of the data. Data errors are not terrible; if you correct your mistakes, you are still a good kid. The same idea is widely used in distributed transaction scenarios.

10. Data backup

Data is the lifeblood of an enterprise, so we must keep it as safe and complete as possible.

Many of us store important documents in several places, such as our own computer and online storage. Likewise, in software development, we should copy important data and keep replicas in different places. That way, even if one server goes down, the data can still be fetched from other servers, which reduces the risk.

Data backup

11. Heartbeat

Interfaces are complicated and fickle. If our project relies on other interfaces for its functionality, we had better make sure those interfaces stay alive, otherwise the project itself may be affected.

For instance, when a user pays through a bank, we must call an interface provided by the bank to get the balance of the bank card. If that interface goes down, the balance cannot be fetched and users cannot pay. That is lost revenue!

So we need to keep in touch with important interfaces at all times, in case they die unexpectedly. You can use a heartbeat mechanism: call the interface regularly, or send a heartbeat packet, to determine whether it is still alive. As soon as a call times out or fails, the problem can be investigated and handled immediately, which greatly shortens the time an incident has an impact.
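A heartbeat check can be sketched as a small state machine that is driven by a timer. In the example below, `probe` stands for whatever call tests the dependency (an HTTP ping, for instance), and `HeartbeatMonitor`, `max_misses` and `on_down` are hypothetical names chosen for the illustration.

```python
class HeartbeatMonitor:
    """Marks a dependency as down after `max_misses` consecutive failed probes."""

    def __init__(self, probe, max_misses=3, on_down=print):
        self.probe = probe            # callable: raises or returns False on failure
        self.max_misses = max_misses
        self.on_down = on_down        # e.g. send an alert to the on-call developer
        self.misses = 0
        self.alive = True

    def beat(self):
        """Run one heartbeat; a scheduler would call this at a fixed interval."""
        try:
            ok = bool(self.probe())
        except Exception:
            ok = False
        if ok:
            self.misses = 0
            self.alive = True
        else:
            self.misses += 1
            if self.misses >= self.max_misses and self.alive:
                self.alive = False
                self.on_down(self.misses)  # notify once per outage
        return self.alive
```

Requiring several consecutive misses before declaring the interface dead avoids false alarms from a single slow response, at the cost of detecting a real outage a few intervals later.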

Heartbeat detection

12. Redundant design

When evaluating system resources and capacity, we should build in some redundancy. For example, if the database currently holds 1 GB of data in total and you want to synchronize it to another store (such as Elasticsearch), request at least double the storage space, that is, 2 GB, to cope with later data growth. The more potential the business has, the larger the redundancy multiple can be, but be careful not to over-provision. After all, resources are expensive too!

In fact, redundancy is an important design idea in general. When designing a business or system architecture, do not limit yourself to current conditions; consider future growth and choose a model that is relatively easy to extend. Otherwise, as the project grows larger and larger, every change to it will be difficult.

13. Elastic scaling

One has to have dreams. Maybe one day the little project that only 100 people used suddenly becomes popular, and hundreds of thousands of new users arrive.

However, because the project is deployed on a single server, it cannot support that many people and simply goes down. These users are very disappointed and never want to use our project again.

The dream is shattered

This is another common risk. We can use elastic scaling, so that the system automatically expands or shrinks its resources according to current usage and resource occupancy.

For example, allocate more machines (containers) when the system is under heavy pressure, and remove a few when the pressure is low. This not only copes effectively with sudden traffic growth, but also saves cost in quiet periods, and spares us the trouble of allocating and adjusting machines by hand.
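One simple way to decide how many machines to run is a proportional rule: scale the instance count by the ratio of observed load to target load, within fixed bounds. The function below is a sketch under that assumption; `desired_instances` and its default values are made up for the example.

```python
import math


def desired_instances(current, cpu_percent, target_percent=60, minimum=1, maximum=10):
    """Scale the instance count so that average CPU moves toward the target."""
    wanted = math.ceil(current * cpu_percent / target_percent)
    # Clamp to the bounds: never scale to zero, never exceed the budget.
    return max(minimum, min(maximum, wanted))
```

For instance, two instances running at 90% CPU with a 60% target give `ceil(2 * 90 / 60) = 3` instances. The upper bound matters as much as the lower one, since it caps the cost when traffic spikes are actually an attack.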

14. Multi-site active-active

As mentioned earlier, servers cannot be trusted. Never mind a single server going down; a natural or man-made disaster can take out an entire data center at once!

Unlike backup, multi-site active-active means building independent data centers in different cities. Under normal circumstances, users get correct service no matter which site they reach, that is, several sites are “live” and serving at the same time.

And when the business in one location fails, users can reach the healthy business systems in other locations and still get correct service.

This way, even if the data center in Guangzhou goes down, we still have Shanghai; if Shanghai goes down, we still have Beijing.

The more sites that are live at the same time, the more reliable the system, but also the higher the cost and complexity, so it is almost only large companies that run multi-site active-active. Never let what you invest under normal conditions exceed the loss a failure would cause!

Ele.me's multi-site active-active technical implementation (Part 1): general introduction

15. Monitoring and alerting

A project cannot run normally forever, but we also can't stare at a screen 24 hours a day to monitor it, can we? Nor can we ignore the project entirely and wait for users to complain after a bug appears.

So the best approach is to add monitoring and alerting to the service. When something goes wrong in the program, the information is reported to the monitoring platform and a notification is sent to the developers as soon as possible. You can also view the running state of the project in real time on the monitoring platform, so that problems can be located more quickly.
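As a small illustration of the alerting side, the sketch below watches the error rate over the most recent requests and notifies once when it crosses a threshold. `ErrorRateAlarm`, its window size and the `notify` callback are assumptions made for the example; a real setup would report metrics to a monitoring platform and define the alert rule there.

```python
from collections import deque


class ErrorRateAlarm:
    def __init__(self, window, threshold, notify):
        self.results = deque(maxlen=window)  # True means the request failed
        self.threshold = threshold           # e.g. 0.5 for a 50% error rate
        self.notify = notify                 # e.g. send a message to the developers
        self.firing = False

    def record(self, failed):
        self.results.append(bool(failed))
        rate = sum(self.results) / len(self.results)
        full = len(self.results) == self.results.maxlen
        if full and rate >= self.threshold and not self.firing:
            self.firing = True
            self.notify(rate)                # alert once when the threshold is crossed
        elif rate < self.threshold:
            self.firing = False              # recovered: allow the next alert
```

Waiting for a full window before alerting avoids paging someone because the very first request after a restart happened to fail.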

Grafana monitoring platform

16. Online diagnosis and hot fixes

Since nothing in the world of programs can be trusted and danger is everywhere, we may as well prepare for the worst and assume the online program will produce bugs.

Since they cannot be prevented, stay on standby and fix each bug as quickly as possible once it appears, to reduce the impact.

Normally, fixing a bug means going through code changes, committing, merging, packaging and building, releasing, and so on. By the time the whole process is finished, the system may already be dead.

To be more efficient, we can use online diagnosis and hot fix techniques. When a bug appears, first use an online diagnostic tool to get the running state and code execution information of the project easily, which speeds up the investigation. Once the problem is found, use a hot fix to modify the running code directly, without rebuilding or restarting the project!

In Java, we can use Arthas, Alibaba's open-source diagnostic tool, which also supports online hot fixes. You can also write your own scripts to achieve this, but it is a bit more complicated.

Arthas Logo

At this point, some readers are bound to complain: why does writing a program require thinking about so many problems that have nothing to do with the feature itself? Code that could have been written in five minutes may now not be finished in an hour!

Super vicious

In fact, not every project needs to be absolutely safe (and of course none can be). The point is to stay alert to danger even in peaceful times and make defensive programming a habit.

When putting this into practice, weigh factors such as the project's scale, audience, architecture and urgency to decide how much safety it needs. Don't over-design or worry about imaginary dangers.

Let's slow down, think calmly before developing, anticipate and avoid risks, and not let the sword of Damocles fall.


This article was written by [osc_ ftbxuxl1]. Please include a link to the original when reposting. Thank you.