I accidentally drew 24 diagrams to analyze network application layer protocols!

2020-11-10 10:44:26 Programmer

The overall outline of this article is as follows

After the previous two articles, I believe readers have a basic understanding of computer networks. Next we will walk through the individual protocol layers, still using a top-down approach, because it is easier for readers to follow and absorb (to put it bluntly, it makes my articles easier to like; I'll see myself out).

Ordinary users generally don't care how network applications actually work. But we are programmers. To borrow a popular line: if programmers don't understand computer networks, do you expect ordinary Internet users to understand them?

The application layer here refers to layers 5, 6, and 7 of the OSI reference model, that is, the session layer, the presentation layer, and the application layer.

We will use the OSI reference model for this introduction because it covers more layers; doing so also deepens your understanding of the TCP/IP model.

Application layer concept

Definition of application layer protocol

Today, more and more applications communicate over the network, including Web browsers, remote login, e-mail, file transfer, and file download. Application layer protocols are the rules and standards that govern these activities.

An application layer protocol defines how application processes on different end systems pass messages to each other. Generally speaking, it defines the following:

  • The types of messages exchanged: whether a message is a request message or a response message
  • The syntax of each message type: which fields a message contains and how they are laid out
  • The semantics of the message fields: what each field of the message means
  • The rules for when and how a process sends messages and responds to them

Application layer architecture

Application architecture refers to how the application layer is structured. Generally speaking, there are two main architectures:

  • Client-server architecture
  • Peer-to-peer (P2P) architecture

Let's start with the client-server architecture.

In the client-server architecture, there is an always-on host called the server, which serves requests from clients. The most common example is the Web server, which serves requests from browsers.

BTPnRe.png

When the Web server receives a request that the user sends through the browser, it processes the request and returns information or a page, which the browser presents to the user. This is the client-server model.

There are two points to note:

  • In the client-server model, clients usually do not communicate with each other directly.
  • Servers usually have a fixed, well-known IP address at which they can be reached.

As the number of clients grows rapidly, a single server often cannot keep up with all the requests. For this reason, a data center hosting a large number of hosts is often used to handle all client requests.

In contrast, the P2P (peer-to-peer) architecture depends very little on such data centers, because in a P2P architecture the application communicates directly between pairs of hosts, which are called peers. Unlike a centralized system built around a central server, every client in a peer-to-peer network is a node that also acts as a server. Common P2P applications include file sharing, video conferencing, and Internet telephony.

One of the most important characteristics of P2P is self-scalability. An important goal of a P2P network is for every client to both provide and consume resources such as bandwidth and storage space. Therefore, as more nodes join and the number of requests increases, the capacity of the whole system increases too. A client-server structure with a fixed set of servers does not have this property; it is what makes P2P scalable.

Process communication

We covered two architectures above: the client-server model and the P2P model. We all know that a computer can run multiple applications, and to us they appear to run at the same time. So how do they communicate with each other?

In operating system terms, what actually communicates is a process, not a program. A process can be thought of as a program running in an end system. When multiple processes run on the same end system, they use interprocess communication mechanisms to talk to each other, and the rules for that communication are determined by the operating system.

The interface between a process and the computer network

Computers are huge and complicated, and so are computer networks. A network application is not made up of a single process; it consists of multiple processes cooperating with each other. So how do processes distributed across multiple end systems communicate? Between each process and the network there is a software interface called a socket. A socket is the interface through which an application sends and receives data, and it can be opened, read, written, and closed much like a file.

Here is a simple analogy for the relationship between a socket and a network process: the process is a house and its socket is the door of the house. When a process wants to communicate with another process, it pushes the message out of its door, the transport infrastructure carries the message to the other house, and the message enters that house through its door.

The following is a flow chart of communication through a socket

BTPePO.png

As you can see from the figure, a socket is an interface inside a host or server process and is controlled by the application developer. Data travels from the TCP buffer of one end system, across the network, to the TCP buffer of the other end system; the socket then reads the message from the TCP buffer and hands it to the program.

A socket is the programming interface used to build network applications, so it is also called the application programming interface (API) between the application and the network. The application developer controls everything on the application side of the socket but has little control over the transport layer side: the developer can only choose the transport protocol and perhaps set a few transport layer parameters, such as the maximum buffer size and the maximum segment length.
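To make the "door of the house" analogy concrete, here is a minimal sketch using Python's standard socket module. It is only an illustration under stated assumptions: the function names, the loopback address, and the choice of letting the operating system pick a free port are not from the original article.

```python
import socket
import threading


def run_echo_server(server_sock: socket.socket) -> None:
    """Accept one client and send its message back in upper case."""
    conn, _addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())


def echo_once(message: bytes) -> bytes:
    """Exchange one short message between two TCP sockets on the loopback interface."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server_sock:
        # Port 0 asks the operating system for any free port (a demo assumption).
        server_sock.bind(("127.0.0.1", 0))
        server_sock.listen(1)
        host, port = server_sock.getsockname()

        worker = threading.Thread(target=run_echo_server, args=(server_sock,))
        worker.start()

        # Client side: open the "door", push a message out, read the reply.
        with socket.create_connection((host, port), timeout=5) as client_sock:
            client_sock.sendall(message)
            reply = client_sock.recv(1024)

        worker.join()
        return reply
```

Calling echo_once(b"hello") sends the bytes through the client socket and returns whatever the server process pushed back through its own socket. Note that the code only chooses the transport protocol (SOCK_STREAM, that is, TCP); everything below the socket is handled by the operating system.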

Process addressing

We mentioned above that network applications send messages to each other, but how does an application know where to send them? It is like sending a letter: you have written the content, but it cannot be delivered without the recipient's address. There must be a mechanism that uniquely identifies the other party, and that identifier is the IP address. We will discuss IP addresses in detail in later articles. For now it is enough to know that an IP (IPv4) address is a 32-bit number that uniquely identifies a host on the Internet.

Is the IP address alone enough? A computer may run several network applications at the same time, so how do we determine which one should receive a message? For that we also need the application's port number. For example, Web servers are identified by port 80 and mail server programs by port 25.
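As a small illustration of "IP address plus port number", the sketch below uses Python's ipaddress module to show that an IPv4 address is just a 32-bit number. The helper name describe_endpoint and the sample address 192.0.2.1 (a documentation-only address) are assumptions made for this example.

```python
import ipaddress


def describe_endpoint(ip_text: str, port: int) -> dict:
    """Describe the (IP address, port) pair that identifies a receiving process."""
    address = ipaddress.IPv4Address(ip_text)
    if not 0 <= port <= 65535:
        raise ValueError("port numbers are 16-bit values")
    return {
        "ip": str(address),
        "ip_as_int": int(address),        # the same address as one 32-bit integer
        "bits": address.max_prefixlen,    # 32 for IPv4
        "port": port,
    }
```

For instance, describe_endpoint("192.0.2.1", 80) describes a hypothetical Web server process: the IP address picks the host and port 80 picks the application on that host.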

How an application chooses a transport service

We know that application protocols belong to the application layer of the four-layer Internet protocol stack, and the four layers must cooperate to get the work done. So far we only have an application layer protocol. When we need to send a message, how do we send it? It is like knowing where your destination is: how do you get there? On foot, by bus, by subway, or by taxi?

An application has several "means of transport" to choose from when sending messages. We can compare them in terms of reliable data transfer, throughput, timing, and security, as discussed below.

  • Reliable data transfer

As discussed before, packets can be lost in a computer network, and how serious that is depends on the nature of the application. For e-mail, file transfer, remote host access, and Web document transfer, data loss can have very serious consequences. For online games or multi-party video conferencing, the impact may be relatively small. For this reason, the reliability of data transfer is the first thing to consider. If a protocol guarantees that data is delivered, it is said to provide reliable data transfer. Applications that can tolerate some data loss are called loss-tolerant applications.

  • Throughput

In the previous article we introduced the concept of throughput: the rate at which the sending process can deliver bits to the receiving process during data transfer. Applications with throughput requirements are called bandwidth-sensitive applications; they need a specific amount of throughput. Elastic applications, by contrast, can make use of as much or as little throughput as happens to be available.

  • Timing

What does timing mean? A timing guarantee ensures that data sent by one application reaches the other within a specified time. This is also a factor that applications consider when choosing a transport service, and it sounds natural: network applications care about when their packets arrive. In a game, for example, if a packet arrives late, your character freezes in the middle of the road.

  • Security

Finally, a transport protocol can provide an application with one or more security services.

Transport services provided by the Internet

Having covered how to choose a transport service, let's look at what services the Internet actually provides. The Internet offers applications two transport layer protocols, UDP and TCP. The table below lists the requirements of some network applications; you can choose a suitable transport layer protocol according to these needs.

Application | Data loss | Bandwidth | Time sensitive
----------- | --------- | --------- | --------------
File transfer | No loss | Elastic | No
E-mail | No loss | Elastic | No
Web documents | No loss | Elastic | No
Internet telephony / video conferencing | Loss-tolerant | Bandwidth-sensitive | Yes, around 100 ms
Streaming audio / video | Loss-tolerant | Bandwidth-sensitive | Yes, a few seconds
Interactive games | Loss-tolerant | Bandwidth-sensitive | Yes, around 100 ms
Smartphone messaging | No loss | Elastic | Not critical

Let's look at the application scenarios of these two transport protocols.

TCP

The TCP service model has the following features:

  • Connection-oriented service

Before application layer messages start to flow, TCP has the client and server exchange transport layer control information with each other. This handshake alerts the client and server so that they can prepare for the incoming data. After the handshake phase, a TCP connection is established. The connection is full duplex, meaning the processes on both sides can send and receive messages over it at the same time. When the application finishes sending messages, the connection must be torn down.

  • Reliable data transfer

Communicating processes can rely on TCP to deliver all data without errors and in the proper order. An application can rely on TCP to deliver the same byte stream to the receiving socket, with no missing or duplicated bytes.

  • Congestion control

TCP congestion control does not necessarily benefit the communicating processes directly, but it benefits the Internet as a whole. When the network between sender and receiver is congested, TCP congestion control throttles the sending process (client or server). We will discuss congestion control later.

UDP

UDP is a lightweight transport protocol that provides only minimal services. UDP is connectionless, so there is no handshake before two processes communicate. UDP does not guarantee that a message reaches the receiver, a bit like a messenger who takes no responsibility for delivery. What is more, messages that do arrive at the receiving process may arrive out of order.
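The absence of a handshake is easy to see in code. The following sketch sends one UDP datagram between two sockets on the loopback interface with Python's socket module; the function name and the loopback setup are assumptions for illustration. On a real network the datagram could be lost or reordered, which is exactly what UDP does not protect against.

```python
import socket


def send_datagram(message: bytes) -> bytes:
    """Send one UDP datagram over loopback and return what the receiver got."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))   # any free port, chosen by the OS
        receiver.settimeout(5)            # give up instead of waiting forever

        # No connect() and no accept(): the datagram is simply addressed and sent.
        sender.sendto(message, receiver.getsockname())

        data, _sender_addr = receiver.recvfrom(2048)
        return data
```

Compared with a TCP exchange, there is no listen/accept/connect step at all: each datagram carries its own destination address.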

Here are the protocols used by some of the applications listed in the table above:

Application | Application layer protocol | Underlying transport protocol
----------- | -------------------------- | -----------------------------
E-mail | SMTP | TCP
Remote terminal access | Telnet | TCP
Web | HTTP | TCP
File transfer | FTP | TCP
Streaming multimedia | HTTP | TCP
Internet telephony | SIP, RTP | TCP or UDP

Application layer protocol

Let's focus on the important protocols in the application layer.

WWW and HTTP

The World Wide Web (WWW), or simply the Web, is a system that presents information on the Internet in the form of hypertext. The client used to display WWW content is called a Web browser. With a browser, we don't need to care which server holds the content we want to visit; we only need to know what we want to visit.

The WWW defines three important concepts:

  • URI, which defines the means and location of access to information
  • HTML, which defines how information is represented
  • HTTP, which defines the access rules of the WWW

URI / URL

URI stands for Uniform Resource Identifier. It is used to uniquely identify resources on the Internet.

URL stands for Uniform Resource Locator. It is what we usually call a web address, and it is a subset of URI.

URIs include not only URLs but also URNs (Uniform Resource Names). The relationship between them is as follows

URIs are no longer limited to identifying Internet resources; they can serve as identifiers for any resource.
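To see the difference between a URL and a URN in practice, here is a small sketch based on urllib.parse.urlsplit from the Python standard library. The helper describe_uri and its "has a network location" heuristic are assumptions made for this illustration, not a formal classification rule, and the sample URN is just an example identifier.

```python
from urllib.parse import urlsplit


def describe_uri(uri: str) -> dict:
    """Split a URI into parts and guess whether it locates a resource on a host."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,            # None when the URI names no host
        "path": parts.path,
        # Heuristic (assumption): a URI with a network location acts as a URL.
        "is_locator": bool(parts.netloc),
    }
```

describe_uri("http://www.someSchool.edu/someDepartment/home.index") reports a scheme, a host, and a path, so it tells us where to fetch the resource. describe_uri("urn:isbn:0451450523") has a scheme and a name but no host: it identifies something without saying where it lives.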

HTML

HTML stands for HyperText Markup Language. It is a markup language consisting of a set of tags. These tags unify the format of documents on the network and link scattered Internet resources into a logical whole. An HTML document is descriptive text made up of HTML commands, which can describe text, graphics, animation, sound, tables, links, and so on.

HTTP

The Web's application layer protocol is HTTP (HyperText Transfer Protocol), which is the core protocol of the Web. Let's look at some of the core concepts around HTTP.

Web page

A Web page is made up of objects. An object is simply a file, such as an HTML file, an image, or a Java applet, and each one can be located by its URI. A Web page usually contains many objects; it can be seen as a collection of objects.

Browser

Just as the major mail services use the SMTP mail delivery protocol, browsers are the main carriers of HTTP. How many browsers can you think of? Since the end of the browser war with Netscape, browsers have developed rapidly. The main browsers that have appeared so far are

Web server

A Web server can serve documents to Web clients such as browsers; it can host website files for the whole world to browse and data files for the whole world to download. At present the three most mainstream Web servers are Apache, Nginx, and IIS.

CDN

CDN stands for Content Delivery Network. It applies the caching and proxy techniques of the HTTP protocol to answer client requests in place of the origin server. A CDN is a network built on top of the existing network. It relies on edge servers deployed in many places and on the load balancing, content distribution, and scheduling modules of a central platform, so that users can fetch the content they need from a nearby node. This reduces network congestion and improves response speed and hit rate. The key technologies of a CDN are content storage and content distribution.

Suppose you want to buy a book from Amazon. In the past you had to order on the shopping site and have the book shipped from the United States, through customs, to your home. Now that Amazon has set up a local warehouse in China, the book no longer has to be mailed from the United States; it reaches you as quickly as possible from within China.

WAF

A WAF (Web Application Firewall) is a Web application protection system. It protects Web applications by enforcing a set of security policies for HTTP/HTTPS. It is an application layer firewall that specifically inspects HTTP traffic, and it is a security technology for protecting Web applications.

A WAF usually sits in front of the Web server and can block attacks such as SQL injection and cross-site scripting. A widely used open source project is ModSecurity, which can be fully integrated into Apache or Nginx.

WebService

A WebService is a kind of Web application. It is a remote invocation technology that works across programming languages and operating system platforms.

WebService is an application service development specification defined by the W3C. It uses a client-server architecture, usually defines the service interface with WSDL, and transfers XML or SOAP messages over HTTP. It is a service architecture based on the Web (HTTP) that can run on an intranet or, with proper protection, on the Internet.

HTTP

HTTP is a set of conventions and specifications for transferring hypertext data such as text, images, audio, and video between two points in the computer world. HTTP is an application layer protocol that uses TCP as its transport layer protocol, because documents and data are important information that must not be lost.

HTTP request-response process

Let's explore the HTTP request-response process with an example. Assume the URL is http://www.someSchool.edu/someDepartment/home.index. When we type the URL and press Enter, the following happens:

  • The domain name is first resolved through DNS to find the IP address of www.someSchool.edu. The HTTP client process then initiates a TCP connection to the server www.someSchool.edu on port 80 (the default port for HTTP). A socket is associated with this connection in both the client process and the server process.
  • The HTTP client sends an HTTP request message to the server through its socket. The message contains the path of the resource, /someDepartment/home.index. We will discuss HTTP request messages in detail later.
  • The HTTP server receives the request message through its socket, parses it, retrieves the object www.someSchool.edu/someDepartment/home.index from its storage (RAM or disk), encapsulates the object in an HTTP response message, and sends the response to the client through its socket.
  • The HTTP server tells TCP to close the TCP connection. In practice, TCP does not actually terminate the connection until it knows the client has received the response message intact.
  • After the HTTP client receives the response message, the TCP connection is closed. The client extracts the HTML file from the response, examines it, and then goes through the other objects referenced in it.
  • Once this is done, the HTTP client presents the resources to the user on the display.

At this point, the whole process of typing a URL and pressing Enter is complete. The process described above is a simple request-response exchange; a real one can be much more complex.
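The steps above can be sketched with Python's standard http.server and http.client modules. This is only a toy under stated assumptions: the server runs on the loopback interface with a port chosen by the operating system instead of www.someSchool.edu on port 80, DNS resolution is skipped, and DemoHandler, PAGE, and fetch are names invented for the example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body>home page</body></html>"


class DemoHandler(BaseHTTPRequestHandler):
    """Toy server: returns one object for one known path, 404 otherwise."""

    def do_GET(self):
        if self.path == "/someDepartment/home.index":
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(PAGE)))
            self.end_headers()
            self.wfile.write(PAGE)      # the object, wrapped in a response message
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass                            # keep the demo quiet


def fetch(path: str):
    """Start the toy server, send one GET request, return (status, body)."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # Open a TCP connection, send the request message, read the response.
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        try:
            conn.request("GET", path)
            response = conn.getresponse()
            return response.status, response.read()
        finally:
            conn.close()
    finally:
        server.shutdown()
        server.server_close()
```

fetch("/someDepartment/home.index") walks through the connect, request, response, and close steps once. Because the handler keeps the default HTTP/1.0 behavior of http.server, the server closes the connection after a single response.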

HTTP request characteristics

From the process above we can summarize the following characteristics of HTTP message transfer:

  • Supports the client-server model
  • Simple and fast: when a client requests a service from the server, it only needs to send the request method and path. The common request methods are GET, HEAD, and POST, and each specifies a different kind of interaction between client and server. Because the HTTP protocol is simple, HTTP server programs are small and communication is fast.
  • Flexible: HTTP allows any type of data object to be transferred. The type being transferred is indicated by Content-Type.
  • Connectionless: here connectionless means that each connection handles only one request. After the server has handled the client's request and received the client's acknowledgment, the connection is closed. This can save transmission time.
  • Stateless: HTTP is a stateless protocol, meaning it has no memory of earlier transactions. If later processing needs earlier information, that information must be sent again, which can increase the amount of data transferred per connection. On the other hand, the server responds faster when it does not need earlier information.

The HTTP request-response process described above uses non-persistent connections: each time the server has delivered an object, the TCP connection is closed, so every TCP connection carries exactly one request message and one response message.

Non-persistent connections have some shortcomings:

  • First, a new connection must be established and maintained for each requested object.
  • Second, for every such connection, TCP buffers must be allocated and TCP variables kept on both the client and the server. This places a heavy burden on the Web server, which may be serving hundreds or even thousands of clients at the same time.

With HTTP 1.1 persistent connections, the server leaves the TCP connection open after sending a response. Subsequent request and response messages between the same client and server can be sent over the same connection. Typically, the HTTP server closes a connection once it has gone unused for a certain (configurable) period of time.
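One rough way to observe connection reuse is to let a toy server report which client port each request arrived from: if several requests arrive from the same client port, they traveled over the same TCP connection. The sketch below does this with Python's http.server and http.client on the loopback interface; KeepAliveHandler and client_ports_seen are hypothetical names for this illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class KeepAliveHandler(BaseHTTPRequestHandler):
    """Toy HTTP/1.1 server whose response body is the client's source port."""

    protocol_version = "HTTP/1.1"       # enables persistent connections

    def do_GET(self):
        body = str(self.client_address[1]).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))  # required for keep-alive
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass


def client_ports_seen(paths):
    """Send several GET requests over ONE connection; return the ports the server saw."""
    server = HTTPServer(("127.0.0.1", 0), KeepAliveHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        try:
            ports = []
            for path in paths:
                conn.request("GET", path)
                ports.append(int(conn.getresponse().read()))
            return ports
        finally:
            conn.close()                # lets the server-side handler finish
    finally:
        server.shutdown()
        server.server_close()
```

With a persistent connection, client_ports_seen(["/a", "/b", "/c"]) returns the same port three times, because no new TCP connection is opened between requests.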

HTTP Message format

Having described the HTTP request-response process, you should now have a deeper understanding of HTTP. Next let's look at the format of HTTP messages.

An HTTP message consists of three parts:

  • Start line (start line): basic information describing the request or response;
  • Header fields (header): describe the message in more detail in key-value form;
  • Message body (entity): the data actually transferred; it does not have to be plain text and can be binary data such as images or video.

The start line and the header fields together are often called the request header or response header, collectively the header; the message body is also called the entity, or body. The HTTP protocol requires every message to have a header but allows the body to be absent: header information is mandatory, entity information is optional. There must also be an empty line (CRLF) between the header and the body. If we draw an HTTP request as a diagram, it looks like this
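The three-part structure can be made concrete with a short parser. This is a deliberately simplified sketch, assuming a complete message held in memory, one value per header name, and no chunked encoding; parse_http_message is a name invented for this example.

```python
def parse_http_message(raw: bytes) -> dict:
    """Split a raw HTTP message into start line, header fields, and body."""
    # The first empty line (CRLF CRLF) separates the header from the body.
    head, _separator, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _colon, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()   # header names are case-insensitive

    return {"start_line": start_line, "headers": headers, "body": body}
```

A request such as b"GET /some/page.html HTTP/1.1\r\nHost: www.example.com\r\n\r\n" parses into a start line, one header field, and an empty body, while a response with Content-Length: 2 followed by the bytes "hi" yields a two-byte body.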

In a little more detail, it looks like this

One thing to note about this diagram: a request sent with the GET method normally has no entity body, while a request sent with the POST method does. HTTP clients usually use POST when the user submits a form. That said, HTML forms can also use the GET method, in which case the form data is carried in the requested URL. The HEAD method is similar to GET, except that the server does not return the requested object.

Now let's look at the HTTP response message

As you can see, only the start line differs between the request message and the response message; everything else has the same structure.

Request line of a request message:

GET /some/page.html HTTP/1.1

Status line of a response message:

HTTP/1.1 200 OK

HTTP is a stateless protocol: every request the server receives from a client is treated as a brand new request, and the server knows nothing about the client's request history. Session and Cookie exist mainly to make up for this statelessness.

What is a Session

When a client sends a request, the server sets aside a block of memory for it; this object is the Session object, and its storage structure is a ConcurrentHashMap (in Java servlet containers). Session makes up for the statelessness of HTTP: the server can use it to store a record of the client's operations during the same session.

How Session determines whether requests belong to the same session

When the server receives the first request, it creates a Session object, generates a sessionId, and asks the client to set a cookie by sending the response header Set-Cookie: JSESSIONID=XXXXXXX. After receiving the response, the client stores the cookie JSESSIONID=XXXXXXX, which expires when the browser session ends.

From then on, every request the client sends to the same site carries this cookie (containing the sessionId) in its request header. The server reads the cookie, takes the value named JSESSIONID, and thereby obtains the sessionId for the request.
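The exchange described above can be imitated in a few lines. The sketch below is not how a servlet container implements sessions; it is a minimal in-memory model, assuming a plain dictionary as the session store and a random hex string as the sessionId. SESSIONS and handle_request are hypothetical names.

```python
import uuid
from http.cookies import SimpleCookie

# In-memory session store: sessionId -> data remembered for that client (assumption).
SESSIONS = {}


def handle_request(cookie_header):
    """Return (session_id, set_cookie_value_or_None) for one incoming request."""
    session_id = None
    if cookie_header:
        cookie = SimpleCookie()
        cookie.load(cookie_header)
        if "JSESSIONID" in cookie:
            session_id = cookie["JSESSIONID"].value

    set_cookie = None
    if session_id not in SESSIONS:
        # First request (or unknown id): create a Session and ask the client to store its id.
        session_id = uuid.uuid4().hex
        SESSIONS[session_id] = {"visits": 0}
        set_cookie = f"JSESSIONID={session_id}"

    SESSIONS[session_id]["visits"] += 1
    return session_id, set_cookie
```

The first call, handle_request(None), creates a session and returns a Set-Cookie value. Passing that value back as the Cookie header on the next call finds the same session, so the server "remembers" the client even though HTTP itself is stateless.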

Shortcomings of Session

The Session mechanism has a drawback. Suppose server A stores the Session and the site sits behind a load balancer. If traffic to A surges for a while, requests will be forwarded to server B, but B does not have A's Session, so the Session becomes invalid.

What are Cookies

An HTTP cookie (also called a Web cookie or browser cookie) is a small piece of data that the server sends to the user's Web browser. The browser stores the cookie and sends it back to the server with subsequent requests. Typically it is used to tell whether two requests come from the same browser, for example to keep a user logged in.

The HTTP Cookie mechanism supplements and improves the stateless HTTP protocol.

Cookies are mainly used for the following three purposes:

  • Session management

Logins, shopping carts, game scores, or anything else the server should remember

  • Personalization

User preferences, themes, or other settings

  • Tracking

Recording and analyzing user behavior

Cookies were once used for general client-side storage. That made sense when they were the only way to store data on the client, but modern storage APIs are now recommended. Cookies are sent with every request, so they can degrade performance (especially on mobile data connections).

After receiving an HTTP request, the server can send a Set-Cookie header with the response. The cookie is usually stored by the browser and then sent back to the server in the Cookie HTTP header of later requests.

The Set-Cookie HTTP response header sends a cookie from the server to the user agent. Here is an example of sending a cookie

This header tells the client to store the cookie

Now, with every new request to the server, the browser sends all previously stored cookies back to the server using the Cookie header.

There are two types of cookies: session cookies and persistent cookies. If a cookie does not contain an expiration date, it is treated as a session cookie. Session cookies are kept in memory and never written to disk; when the browser is closed, the cookie is lost for good. If a cookie contains an expiration date, it is treated as a persistent cookie, and it is removed from disk on that date.

There are also the Secure and HttpOnly flags of cookies. Let's go through them one by one.

Session Cookies

The example above creates a session cookie. A session cookie is deleted when the client shuts down, because it does not specify an Expires or Max-Age directive.

However, Web browsers may use session restoring, which makes most session cookies effectively permanent, as if the browser had never been closed.

Persistent Cookies

A persistent cookie does not expire when the client shuts down; it expires at a specific date (Expires) or after a specific length of time (Max-Age). For example

Set-Cookie: id=a3fWa; Expires=Wed, 21 Oct 2015 07:28:00 GMT;
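The difference between the two kinds of cookie comes down to whether an Expires or Max-Age attribute is present. The following sketch uses http.cookies.SimpleCookie from the Python standard library to build and inspect Set-Cookie values; make_set_cookie and is_session_cookie are helper names made up for this illustration.

```python
from http.cookies import SimpleCookie


def make_set_cookie(name: str, value: str, max_age=None) -> str:
    """Build the value of a Set-Cookie header, optionally with a Max-Age attribute."""
    cookie = SimpleCookie()
    cookie[name] = value
    if max_age is not None:
        cookie[name]["max-age"] = max_age   # lifetime in seconds
    return cookie[name].OutputString()


def is_session_cookie(set_cookie_value: str) -> bool:
    """A cookie with neither Expires nor Max-Age is a session cookie."""
    cookie = SimpleCookie()
    cookie.load(set_cookie_value)
    morsel = next(iter(cookie.values()))
    return not morsel["expires"] and not morsel["max-age"]
```

make_set_cookie("id", "a3fWa") produces a session cookie, while adding max_age=3600 or an Expires date, as in the header shown above, makes the cookie persistent.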

Although cookies can simplify users' online activities, their use is controversial, because many people regard them as an invasion of user privacy: by combining cookies with the account information users provide, a Web site can learn a great deal about its users.

Web cache

A Web cache, also called a proxy server, is a network entity that satisfies HTTP requests on behalf of the origin server. The Web cache has its own disk storage and keeps copies of recently requested objects there, as shown in the figure below

A user's browser can be configured to use a Web cache. Once configured, the browser's requests go to the proxy server first rather than to the origin server, and the proxy checks whether it holds the requested object. If it does, it returns the object directly. If it does not, the proxy requests the object from the origin server, returns it to the client, and saves a copy in its own disk storage.
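The lookup logic of a Web cache fits in a few lines. The sketch below is a simplified model, assuming an in-memory dictionary instead of disk storage and ignoring expiry and validation of cached objects; the WebCache class and the fetch_from_origin callback are hypothetical names.

```python
class WebCache:
    """Toy proxy cache: answer from the local store, ask the origin only on a miss."""

    def __init__(self, fetch_from_origin):
        self._fetch_from_origin = fetch_from_origin   # callable: url -> object bytes
        self._store = {}                               # url -> cached copy
        self.origin_requests = 0

    def get(self, url: str) -> bytes:
        if url not in self._store:
            # Cache miss: act as a client toward the origin server and keep a copy.
            self.origin_requests += 1
            self._store[url] = self._fetch_from_origin(url)
        # Cache hit (or freshly stored copy): act as a server toward the browser.
        return self._store[url]
```

With cache = WebCache(lambda url: b"object for " + url.encode()), requesting the same URL twice contacts the origin only once; the second response comes straight from the cache.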

Note that the client and the origin server form a client-server relationship, while the proxy server acts as both a server (to the client) and a client (to the origin server).

Proxy servers are usually provided by an ISP (Internet Service Provider), which is what we commonly call a network operator.

So why do we need a proxy server? You can probably guess after reading the description above.

  • First, a proxy server can greatly reduce the response time for client requests, so users get their responses faster.
  • Second, a proxy server can reduce the traffic on an organization's access link to the Internet, lowering bandwidth usage and operator costs.
  • Third, a proxy server can take pressure off the origin server and improve application performance.

DASH

As described above, HTTP can transfer ordinary files, audio, and video, and the kinds of content being transferred are identified by MIME types. When HTTP delivers video, it simply treats the video as an object. An object is really just a file, and every file in HTTP is identified by a URL. When a user watches a video, the client opens a TCP connection to the server and sends a GET request for that URL. As the server responds, the client buffers the received bytes, and once the buffered data exceeds a preset threshold, the client application starts playing the video.

One limitation of this approach is that every client receives the same video encoding, even though the bandwidth available to each client differs. This wastes bandwidth. If a 2 Mbps connection and a 50 Mbps fiber line receive the same encoding and start playback after almost the same wait, why would I pay for the 50 Mbps fiber?

To improve on this, DASH was built on top of HTTP. DASH stands for Dynamic Adaptive Streaming over HTTP. The idea is that networks with different available bandwidth should be sent data at different bit rates. DASH lets clients with different Internet access rates play the video at different encoding rates. A 3G user and a fiber user will naturally choose different rates, so that each makes the best use of its bandwidth.
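A minimal sketch of the rate choice a DASH-style client might make is shown here. The list of encoding rates, the `safety` margin, and the function name are assumptions for illustration only; real players use more elaborate estimators.

```python
# Hypothetical encoding rates (kbit/s) that a server offers for one video.
VERSIONS_KBPS = [300, 750, 1500, 3000, 6000]


def pick_bitrate(available_kbps, measured_kbps, safety=0.8):
    """Return the highest encoding rate that fits within a fraction of the measured bandwidth.

    Falls back to the lowest rate when even that one does not fit.
    """
    budget = measured_kbps * safety
    fitting = [rate for rate in sorted(available_kbps) if rate <= budget]
    return fitting[-1] if fitting else min(available_kbps)


slow_choice = pick_bitrate(VERSIONS_KBPS, measured_kbps=2000)    # e.g. a mobile user
fast_choice = pick_bitrate(VERSIONS_KBPS, measured_kbps=50000)   # e.g. a fiber user
print(slow_choice, fast_choice)
```

The client would repeat this choice for each chunk it requests, so the rate can go up or down as the measured bandwidth changes.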

CDN

As more and more users come online, video has gradually become both the bottleneck of data transmission and a strong user demand. For an Internet video company, the most direct way to provide streaming service is to build a single large data center, store all videos there, and stream them from the data center to the whole world. But this approach has three problems

  • If the client is far from the data center, packets from server to client cross many communication links and possibly many ISPs. How fast can playback be then?
  • A popular video is sent to clients over the same links again and again, which seriously wastes network bandwidth, and the video company pays for sending the same bytes repeatedly.
  • Single point of failure: if the video data center goes down or has a problem, video cannot be played anywhere in the world.

To serve users all over the world around the clock, almost all major video companies use a content distribution network (Content Distribution Network, CDN). A CDN manages servers in multiple geographic locations and caches videos, audio, documents, and other content on them.

CDN Content selection strategy

A CDN manages servers in multiple geographic locations, stores copies of the videos on its servers, and tries to direct each user request to a suitable CDN location. So where should the servers be placed? There are two server placement principles

  • Enter deep: the main goal is to get close to users. By reducing the number of links and routers between end users and CDN clusters, it improves the delay and throughput that users experience.
  • Bring home: this principle builds large clusters at a small number (for example, 10) of key locations and brings the ISPs home to them. Compared with the enter-deep design, the bring-home design usually has lower maintenance and management costs.

CDN Workflow

A CDN can be a private CDN, owned by the content provider itself, or a third-party CDN, which distributes content on behalf of multiple content providers.

Now let's look at the CDN workflow, as shown in the figure below

  • The user wants to access content on a given website.

  • The user's host sends a DNS query to the local DNS server (LDNS). The LDNS relays the request to the website's DNS server, which returns to the LDNS the address of the website's CDN authoritative server.

  • The LDNS sends a second request to the website's CDN authoritative server to obtain the address of a content distribution server. The CDN returns the address of the content distribution server to the local DNS server.

  • The local DNS server sends the address of the CDN content distribution server to the user.

  • Once the user knows the address of the CDN content distribution server, no further steps are needed: the client opens a TCP connection directly to it and sends an HTTP GET request. If DASH is used, chunks at different rates are selected through the URLs of the different versions and sent to the user.

CDN Cluster selection strategy

At the core of any CDN deployment is the cluster selection strategy, the mechanism that dynamically directs clients to a server cluster or data center within the CDN. A simple strategy is to assign the client to the geographically closest cluster. This strategy ignores the fact that delay and available bandwidth on Internet paths change over time, and it always assigns the same cluster to a given client. An alternative is real-time measurement, which periodically checks the delay and packet loss between clusters and clients and chooses based on those results.
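The two strategies can be compared with a small sketch. The cluster names, coordinates, and measurement values are invented for illustration, and ranking by packet loss first and delay second is just one possible scoring rule, not the rule any particular CDN uses.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))


def closest_cluster(client_pos, clusters):
    """Geographically closest strategy: ignores current network conditions."""
    return min(clusters, key=lambda name: haversine_km(client_pos, clusters[name]))


def best_measured_cluster(measurements):
    """Real-time measurement strategy: lowest loss first, then lowest delay (assumed rule)."""
    return min(measurements, key=lambda name: (measurements[name][1], measurements[name][0]))


# Hypothetical clusters: name -> (latitude, longitude)
clusters = {"frankfurt": (50.1, 8.7), "virginia": (38.9, -77.5), "tokyo": (35.7, 139.7)}
# Hypothetical periodic probes: name -> (round-trip time in ms, packet loss rate)
measurements = {"frankfurt": (180.0, 0.05), "virginia": (95.0, 0.0), "tokyo": (250.0, 0.0)}

client = (48.9, 2.35)  # a client near Paris
by_distance = closest_cluster(client, clusters)
by_measurement = best_measured_cluster(measurements)
print(by_distance, by_measurement)
```

With these made-up numbers the nearest cluster is congested, so the two strategies disagree, which is the weakness of the purely geographic choice.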

DNS Internet directory service protocol

Think about a question: in how many ways can we humans be identified? By an ID card, by a social security number, or by a driver's license. Although there are many ways, in a given context one may be more suitable than another. Hosts on the Internet are like people: they can be identified in several ways. One way is the hostname, such as www.facebook.com or www.google.com. But that is how people remember hosts, and routers don't understand it. Routers prefer fixed-length, hierarchical IP addresses. So what is an IP address?

Put simply for now, an IP address consists of 4 bytes and has a strict hierarchy. In an address such as 121.7.106.83, the bytes are separated by dots, and each byte is written as a decimal number from 0 to 255. (We will discuss IP addresses in detail later.)
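The 4-byte structure is easy to see with Python's standard `ipaddress` module. This is only a small illustration using the example address from the text.

```python
import ipaddress

addr = ipaddress.IPv4Address("121.7.106.83")

raw = addr.packed      # the address as 4 raw bytes
octets = list(raw)     # each byte as a decimal number between 0 and 255
as_int = int(addr)     # the same 32 bits read as one integer

print(len(raw), octets)

# A component outside 0-255 is rejected, because it does not fit in one byte.
try:
    ipaddress.IPv4Address("121.7.106.256")
    rejected = False
except ValueError:
    rejected = True
```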

So routers work with IP addresses, while we humans remember hostnames. How is a hostname we know translated into the IP address that routers need? This is where DNS comes in.

DNS stands for Domain Name System. It is a distributed database implemented in a hierarchy of DNS servers, and it is also an application-layer protocol that lets hosts query that database. DNS servers are usually UNIX machines running the BIND (Berkeley Internet Name Domain) software. The DNS protocol runs over UDP and uses port 53.

DNS Basic overview

Like HTTP, FTP, and SMTP, DNS is an application-layer protocol. It runs between communicating end systems in client-server mode and relies on the underlying end-to-end transport protocol to carry DNS messages between them. However, DNS is not an application that users interact with directly. Instead, it provides a core function for user applications and other software on the Internet.

DNS is usually not used on its own. It is typically employed by other application-layer protocols, including HTTP, SMTP, and FTP, to resolve user-supplied hostnames to IP addresses.

Let's illustrate the DNS resolution process with an example. It overlaps with, but is not the same as, the full story of what the browser does after you enter a URL.

What happens when you type www.someschool.edu/index.html into the browser? For the user's host to send an HTTP request message to the web server www.someschool.edu, the following steps take place

  • The user's host runs the client side of the DNS application.
  • The browser extracts the hostname www.someschool.edu from the URL and passes it to the DNS client.
  • The DNS client sends a query containing the hostname to a DNS server.
  • The DNS client eventually receives a reply containing the IP address for that hostname.
  • Once the browser has the IP address, it can initiate a TCP connection to the HTTP server process on port 80 at that IP address.
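The steps above can be sketched with the Python standard library. The helper names `extract_hostname` and `resolve` are made up for this example. `resolve` hands the name to the operating system's resolver, which plays the role of the DNS client, and it needs network access, so it is defined here but not called.

```python
import socket
from urllib.parse import urlsplit


def extract_hostname(url):
    """Pull the hostname out of what the user typed, with or without a scheme."""
    if "//" not in url:
        url = "//" + url  # urlsplit only recognizes the host after "//"
    return urlsplit(url).hostname


def resolve(hostname, port=80):
    """Ask the OS resolver for the host's addresses (requires network access)."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


host = extract_hostname("www.someschool.edu/index.html")
print(host)
# With network access, resolve(host) would return the list of IP addresses,
# and the browser would then open a TCP connection to port 80 on one of them.
```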

Besides translating hostnames to IP addresses, DNS provides several other important services

  • Host aliasing: a host with a complicated hostname can have one or more aliases. For example, a host named relay1.west-coast.enterprise.com might also have the two aliases enterprise.com and www.enterprise.com. In this case, relay1.west-coast.enterprise.com is called the canonical hostname. Aliases are easier to remember than the canonical hostname. An application can call DNS to obtain the canonical hostname for an alias as well as the host's IP address.
  • Mail server aliasing: similarly, a mail application can call DNS to resolve the hostname alias it is given.
  • Load distribution: DNS is also used to distribute load among replicated servers. Busy sites such as cnn.com are replicated over multiple servers, each running on a different end system with a different IP address. For these replicated web servers, a set of IP addresses is associated with one canonical hostname, and the DNS database stores this set. When clients query the name, DNS replies with the whole set but rotates the order of the addresses. Because a client usually sends its HTTP request to the first address in the reply, the load is spread across the replicated servers.

DNS Work Overview

DNS is a complex system, and here we only cover the main aspects of how it operates. Below is a general overview of how DNS works

Suppose an application running on the user's host (such as a web browser or a mail reader) needs to translate a hostname into an IP address. The application calls the client side of DNS and specifies the hostname to translate. The DNS client on the user's host then sends a DNS query message into the network, using UDP and port 53. After some delay, the DNS client receives a DNS reply message containing the mapping for the hostname. From the user host's point of view, DNS is a black box whose inner workings are invisible. In reality, the black box that implements the service is quite complex: it consists of a large number of DNS servers and an application-layer protocol that defines how DNS servers and querying hosts communicate.

A simple early design for DNS would be a single DNS server on the Internet holding all the mappings. This is a centralized design, and it does not suit today's Internet, which has a huge and growing number of hosts. A centralized design has the following problems

  • A single point of failure: if the DNS server crashes, the entire Internet is paralyzed.
  • Traffic volume: a single DNS server would have to handle every DNS query, possibly millions or tens of millions of them.
  • Distant centralized database: a single DNS server cannot be close to all users. A DNS server in the United States cannot be close to clients querying from Australia. Those queries would inevitably cross slow, congested links and suffer serious delays.
  • Maintenance: the maintenance cost is huge, and the database would need frequent updates.

So DNS cannot be designed centrally, because such a design does not scale at all. DNS uses a distributed design instead, with the following characteristics

Distributed 、 Hierarchical database

The first problem a distributed design must solve is the scalability of DNS servers. DNS therefore uses a large number of DNS servers, organized hierarchically and distributed around the world. No single DNS server holds the mappings for all hosts on the Internet. Instead, the mappings are distributed across the DNS servers.

Broadly, there are three kinds of DNS servers: root DNS servers, top-level domain (Top-Level Domain, TLD) DNS servers, and authoritative DNS servers. The hierarchy of these servers is shown in the figure below

Suppose a DNS client wants to know the IP address of www.amazon.com. How do the servers above resolve it? First, the client contacts one of the root servers, which returns the IP addresses of the TLD servers for the top-level domain com. The client then contacts one of these TLD servers, which returns the IP address of an authoritative server for amazon.com. Finally, the client contacts one of the authoritative servers for amazon.com, which returns the IP address of www.amazon.com.
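The three-step walk through the hierarchy can be simulated with plain dictionaries. Everything in this sketch is invented for illustration: the server names are placeholders, 192.0.2.10 is a documentation address rather than Amazon's real one, and no network traffic is involved.

```python
# Toy data standing in for the three tiers of DNS servers (all values are made up).
ROOT_SERVER = {"com": "tld-server-com"}
TLD_SERVERS = {"tld-server-com": {"amazon.com": "auth-server-amazon"}}
AUTH_SERVERS = {"auth-server-amazon": {"www.amazon.com": "192.0.2.10"}}


def resolve(hostname):
    """Walk root -> TLD -> authoritative and return (ip, steps)."""
    labels = hostname.split(".")
    tld = labels[-1]                 # "com"
    domain = ".".join(labels[-2:])   # "amazon.com"
    steps = []

    tld_server = ROOT_SERVER[tld]                   # 1. the root points to the TLD server
    steps.append(("root", tld_server))

    auth_server = TLD_SERVERS[tld_server][domain]   # 2. the TLD server points to the authoritative server
    steps.append(("tld", auth_server))

    ip = AUTH_SERVERS[auth_server][hostname]        # 3. the authoritative server returns the address
    steps.append(("authoritative", ip))
    return ip, steps


ip, steps = resolve("www.amazon.com")
for tier, answer in steps:
    print(f"{tier:>13} server answered: {answer}")
```

Each tier only knows about the tier directly below it, which is why no single server needs the full mapping table.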

Let's look at the domain name server hierarchy

  • Root DNS servers: there are more than 400 root servers around the world, managed by 13 different organizations. The list of root servers and their operators can be found at https://root-servers.org/ . Root servers provide the IP addresses of TLD servers.
  • Top-level domain DNS servers: each top-level domain, such as com, org, net, edu, and gov, and each country-code top-level domain, such as uk, fr, ca, and jp, has its own TLD server or server cluster. For a list of all top-level domains, see https://tld-list.com/ . TLD servers provide the IP addresses of authoritative DNS servers.
  • Authoritative DNS servers: every organization with publicly accessible hosts on the Internet, such as web servers and mail servers, must provide publicly accessible DNS records that map the names of those hosts to IP addresses. An organization's authoritative DNS server holds these DNS records.

These three types make up the DNS server hierarchy. There is one more important kind of DNS server: the local DNS server. Strictly speaking, the local DNS server does not belong to the hierarchy, but it is crucial all the same. Every ISP (Internet Service Provider), such as a residential ISP or an institutional ISP, has a local DNS server. When a host connects to an ISP, the ISP gives the host the IP addresses of one or more of its local DNS servers. By checking the network connection status, users can easily find the IP address of their DNS server. When a host sends a DNS request, the request goes to the local DNS server, which acts as a proxy and forwards the request into the DNS server hierarchy.

DNS cache

The DNS cache (DNS caching), sometimes called the DNS resolver cache, is a temporary database maintained by the operating system. It contains records of recently visited websites and other Internet domains. In other words, DNS caching is a technique in which the computer stores resolved results so that they can be reused directly and quickly on the next visit. So how does the DNS cache work?

DNS caching workflow

Before the browser sends a request out to the network, the computer intercepts the request and looks up the domain name in the DNS cache database. This database holds a list of recently accessed domain names along with the addresses that DNS resolved for them on the first request.
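Here is a rough sketch of such a cache. The `DnsCache` class and the fake resolver are hypothetical, and expiring entries by a time-to-live value is an assumption added for the example; it is not a description of how any particular operating system implements its cache.

```python
import time


class DnsCache:
    """Toy resolver cache: remember answers until their time-to-live runs out."""

    def __init__(self, resolver, clock=time.monotonic):
        self._resolver = resolver  # hypothetical callable: name -> (address, ttl_seconds)
        self._clock = clock
        self._entries = {}         # name -> (address, expiry time)

    def lookup(self, name):
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and entry[1] > now:
            return entry[0]                       # still fresh: answer from the cache
        address, ttl = self._resolver(name)       # first request, or the entry expired
        self._entries[name] = (address, now + ttl)
        return address


resolver_calls = []

def fake_resolver(name):
    """Stand-in for a real DNS query; always answers with a documentation address."""
    resolver_calls.append(name)
    return "192.0.2.1", 60


fake_now = [0.0]  # a controllable clock so the example needs no real waiting
cache = DnsCache(fake_resolver, clock=lambda: fake_now[0])

a1 = cache.lookup("www.example.com")  # miss: asks the resolver
a2 = cache.lookup("www.example.com")  # hit: answered from the cache
fake_now[0] = 61.0
a3 = cache.lookup("www.example.com")  # expired: asks the resolver again
print(len(resolver_calls))
```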

DNS Records and messages

The DNS servers that together implement the DNS distributed database store resource records (Resource Record, RR). RRs provide hostname-to-IP-address mappings. Every DNS reply message carries one or more resource records, which answer the client's query.

A resource record is a 4-tuple containing the following fields

(Name, Value, Type, TTL)

RRs come in different types. The table below summarizes them

DNS RR type     Description
A record        IPv4 host record, maps a domain name to an IPv4 address
AAAA record     IPv6 host record, maps a domain name to an IPv6 address
CNAME record    Alias record, maps an alias to a DNS domain name
MX record       Mail exchange record, maps a DNS domain name to a mail server
PTR record      Pointer record, used for reverse lookup (IP address to domain name)
SRV record      Service record, maps the available services
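The 4-tuple and a few of the record types can be modeled directly in Python. The record contents are invented examples (example.com names and documentation addresses), and `find` is a hypothetical helper, not part of any DNS library.

```python
from typing import NamedTuple


class ResourceRecord(NamedTuple):
    """One DNS resource record as the (Name, Value, Type, TTL) 4-tuple."""
    name: str
    value: str
    type: str
    ttl: int  # seconds the record may be cached


records = [
    ResourceRecord("www.example.com", "web1.example.com", "CNAME", 300),
    ResourceRecord("web1.example.com", "192.0.2.10", "A", 300),
    ResourceRecord("web1.example.com", "2001:db8::10", "AAAA", 300),
    ResourceRecord("example.com", "mail.example.com", "MX", 3600),
]


def find(all_records, name, rtype):
    """Return the values of every record matching the given name and type."""
    return [r.value for r in all_records if r.name == name and r.type == rtype]


print(find(records, "web1.example.com", "A"))
print(find(records, "example.com", "MX"))
```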

DNS message

There are two kinds of DNS messages, query messages and reply messages, and both have the same format. The DNS message format is shown below

The message format is explained below

  • The first 12 bytes of a message are the header section. The first field (the identifier) is a 16-bit number that identifies the query. The identifier is copied into the reply to the query so that the client can match received replies with sent queries. The flags field contains several flags. One of them is a 1-bit query/reply flag that indicates whether the message is a query (0) or a reply (1).

  • The question section contains information about the query being made. It includes: 1) a name field, containing the hostname being queried; 2) a type field, indicating the type of question being asked about the name, for example a host address associated with the name (type A) or the mail server for the name (type MX).

  • In a reply from a DNS server, the answer section contains the resource records for the name that was originally queried. As mentioned above, a DNS RR is a 4-tuple whose Type field can take different values. The answer section of a reply can contain multiple RRs, because a hostname can have more than one IP address.

  • The authority section contains records for other authoritative servers.

  • The additional section contains other helpful records.
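To make the 12-byte header concrete, here is a sketch that packs and inspects one with Python's `struct` module. The layout of six 16-bit fields (identifier, flags, and four counts) follows the standard DNS message format. The helper names are made up, and only the query/reply bit and the recursion-desired bit of the flags are handled.

```python
import struct

# Six unsigned 16-bit fields in network byte order:
# identifier, flags, question count, answer count, authority count, additional count
HEADER = struct.Struct("!6H")

QR_REPLY = 0x8000           # top bit of the flags: 0 = query, 1 = reply
RECURSION_DESIRED = 0x0100  # asks the server to resolve the name fully


def build_query_header(query_id, recursion_desired=True):
    """Build the header of a query carrying exactly one question."""
    flags = RECURSION_DESIRED if recursion_desired else 0  # reply bit stays 0
    return HEADER.pack(query_id, flags, 1, 0, 0, 0)


def parse_header(data):
    """Return (identifier, is_reply) from the first 12 bytes of a message."""
    query_id, flags, *_counts = HEADER.unpack(data[:12])
    return query_id, bool(flags & QR_REPLY)


query = build_query_header(0x1234)
# A hand-built reply header that copies the identifier and sets the reply bit.
reply = HEADER.pack(0x1234, QR_REPLY | RECURSION_DESIRED, 1, 1, 0, 0)

print(len(query), parse_header(query), parse_header(reply))
```

Matching the identifier in the reply against the one in the query is how the client pairs answers with outstanding requests.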

I will write a separate article on the details of DNS records.

P2P file distribution

The protocols discussed above, HTTP, SMTP, and DNS, all use the client-server model, which relies heavily on always-on infrastructure servers. P2P, by contrast, is a client-to-client model with minimal dependence on always-on infrastructure servers.

P2P stands for peer-to-peer. It is a distributed network architecture. In a P2P system, all computers and devices are called peers, and they exchange work with one another. Every peer in a peer-to-peer network is equal to every other peer: there are no privileged peers and no central administrator device.

In a sense, the peer-to-peer network is the most egalitarian network in the computing world. Every peer is equal and has the same rights and obligations as the others. Each peer is both a client and a server.

In fact, every resource available in a peer-to-peer network is shared among the peers without any central server. The shared resources can include processor capacity, disk storage, and network bandwidth.

What P2P does

The main goal of P2P is to share resources and help computers and devices work together to provide a specific service or perform a specific task. As mentioned earlier, P2P is used to share all kinds of computing resources, such as network bandwidth or disk storage. However, the most common use of peer-to-peer networks is file sharing on the Internet. Peer-to-peer networks are well suited to file sharing because they let connected computers receive and send files at the same time.

BitTorrent is the main protocol used for P2P file sharing.

Why P2P networks are useful

P2P networks have some characteristics that make them useful

  • They are hard to take down completely. Even if one peer goes offline, the other peers keep running and communicating. To make a P2P network stop working, you would have to shut down all of its peers.
  • They scale well. Adding new peers is easy because no configuration is needed on a central server.
  • For file sharing, the larger the peer-to-peer network, the faster it is. Because the same file is stored on many peers, when someone needs to download the file, it is downloaded from multiple locations at the same time.

TELNET

TELNET, also known as remote login, is an application-layer protocol that lets users control a remote host from their local machine, as shown in the figure below

Host A can access host B directly through the TELNET protocol.

TELNET uses a single TCP connection. Text commands are sent to the remote host over this connection and executed there.

Remote login with TELNET requires the following

  • You must know the remote host's IP address or domain name
  • You must know the login ID and password

TELNET remote login normally uses port 23.

TELNET works as follows

  • The local host establishes a connection, which is a TCP connection, with the remote host. The user needs to know the IP address or domain name of the remote host.
  • Once connected, characters typed at the local terminal are sent to the remote host in NVT (Network Virtual Terminal) format. In practice this means sending packets to the remote host.
  • After the remote host receives a packet, the output it produces is sent back to the local host in NVT format, including the echo of the typed command and the result of executing it.
  • Finally, the local terminal disconnects from the remote host, which is the process of tearing down the TCP connection.

SSH

TELNET has one very obvious drawback: the packets exchanged between the local and remote hosts are sent in plaintext, with no encryption at all. They are therefore easy for attackers to sniff and misuse. For data security, we usually use SSH for remote login.

SSH is an encrypted remote login system. With SSH, the communication is encrypted, so even if packets are sniffed and captured, the information in them cannot be read. SSH also has other features

  • SSH can use stronger authentication mechanisms
  • SSH can transfer files
  • SSH supports port forwarding

Port forwarding is a method SSH provides for securing network communication. It can carry messages of other TCP/IP protocols: when used this way, SSH sets up a secure tunnel between the client and the server for the other service. Port forwarding means forwarding messages received on a specific port to a specified IP address and port.

FTP

FTP (File Transfer Protocol) is one of the application-layer protocols. FTP involves two parts: the FTP server and the FTP client. The FTP server stores files, and users use an FTP client to access resources on the FTP server through the FTP protocol.

Because FTP transfers files efficiently, it is often used to transfer large files over the network.

By default, FTP uses TCP ports 20 and 21: port 20 carries data, and port 21 carries control information. The control connection on TCP port 21 is used to direct file transfers. Each file transfer opens a separate TCP data connection, which is closed once the transfer completes, and command and response processing then continues on the control connection.

SMTP

The protocol that provides email service is SMTP (Simple Mail Transfer Protocol). SMTP also uses TCP at the transport layer.

Early email established a TCP connection directly between the sending and receiving hosts. After composing a message, the sender saved it to disk, established a TCP connection with the receiving host, and transferred the message to the receiving host's disk. Once the message was sent, the sender deleted it from the local disk. If the receiving host could not accept the message for some reason, the sender waited a while and sent it again.

Although this method guarantees the integrity and validity of the mail, it does not suit today's Internet, because mail can only be delivered while the hosts are online, which is clearly not mature enough.

This led to the concept of the mail server. Mail servers are the core of the whole mail system. Each recipient has a mailbox on a mail server, and the mailbox manages and stores the messages sent to that user.

A typical message journey is: it starts at the sender's user agent, goes to the sender's mail server, then to the recipient's mail server, where it is placed in the recipient's mailbox. When the recipient wants to read mail from the mailbox, the recipient's mail server authenticates the user. If the sender's mail server cannot deliver a message to the recipient's server, it holds the message in a message queue and tries again later, usually about every 30 minutes. If delivery still fails after a certain period, the server deletes the message from the queue and notifies the sender by email.

Now that you know the general process of sending mail between two mail servers, how does SMTP actually send a message from Alice's mail server to Bob's mail server? There are three phases

  • Connection establishment: the SMTP client requests a TCP connection to port 25 on the server. Once the connection is established, the SMTP server and client announce their domain names to each other and confirm the other party's domain name.
  • Mail transfer: once the connection is established, mail transfer begins. Relying on TCP's reliable data transfer, SMTP delivers the message accurately to the recipient's mail server. The SMTP client passes the source address, the destination address, and the content of the message to the SMTP server, which responds accordingly and accepts the message.
  • Connection release: the SMTP client issues a quit command, the server responds after processing it, and the TCP connection is closed.
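The three phases map onto Python's standard `smtplib` roughly as sketched here. The addresses and the helper names `build_message` and `send` are placeholders. `send` needs a reachable SMTP server, so the example only builds the message and does not call it.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Assemble the source address, destination address, and content of a message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send(msg, server_host, port=25):
    """Deliver msg to an SMTP server (requires a reachable server)."""
    # Phase 1, connection establishment: open a TCP connection to the server.
    with smtplib.SMTP(server_host, port, timeout=10) as smtp:
        # Phase 2, mail transfer: send_message greets the server if needed,
        # then hands over the sender, the recipient, and the message data.
        smtp.send_message(msg)
    # Phase 3, connection release: leaving the with-block sends QUIT
    # and closes the TCP connection.


msg = build_message("alice@example.com", "bob@example.org", "Hi", "Hello Bob")
print(msg["From"], "->", msg["To"])
```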

MIME type

At first, Internet email could only handle plain text. It was later extended to support MIME types, which we mentioned briefly above. MIME stands for Multipurpose Internet Mail Extensions.

It is an Internet standard that extends the email standard to support many formats, including the following

  • Hypertext Markup Language text: extension .html, MIME type text/html
  • XML file: extension .xml, MIME type text/xml
  • Plain text: extension .txt, MIME type text/plain
  • PNG image: extension .png, MIME type image/png
  • GIF image: extension .gif, MIME type image/gif
  • JPEG image: extensions .jpeg and .jpg, MIME type image/jpeg
  • AVI file: extension .avi, MIME type video/x-msvideo, and so on.
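Python's standard `mimetypes` module performs this kind of extension-to-type lookup. The sketch is only an illustration: `content_type_for` is a made-up helper, and falling back to application/octet-stream for unknown files is a common convention rather than something this article prescribes.

```python
import mimetypes


def content_type_for(filename):
    """Guess the MIME type from the file extension, with a generic fallback."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or "application/octet-stream"


for name in ["index.html", "notes.txt", "logo.png", "anim.gif", "photo.jpg", "README"]:
    print(f"{name:<12} {content_type_for(name)}")
```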

Postscript

This article covered many application-layer protocols, including HTTP, DNS, SMTP, FTP, and TELNET.

We use these application-layer protocols in our daily work. We are not only users but also programmers, so we have to understand them. I drew some diagrams to help you understand these protocols. Behind the simplified pictures lie complex, demanding specifications and the complexity of the implementations.

If you found the article well written, I hope you will like it, share it, and leave a comment. That will motivate me to keep writing, and I hope you will support me.

Reply cxuan to my official account, The programmer cxuan, to get the PDFs I wrote myself.


Copyright notice
This article was written by [Programmer]. Please include a link to the original when reposting. Thank you.