This article works through ten questions about the TCP/IP protocol suite that every programmer should know. They come up frequently in interviews and are part of a programmer's essential background.
Ten Questions About TCP/IP
I. The TCP/IP Model
TCP/IP (Transmission Control Protocol/Internet Protocol) is the family of network protocols that forms the foundation of the Internet; it is the Internet's core protocol suite.
The TCP/IP reference model divides these protocols into four layers: the link layer, the network layer, the transport layer, and the application layer. The figure below shows how each layer of the TCP/IP model corresponds to the layers of the OSI model.
The TCP/IP protocol suite wraps data layer by layer from the top down. At the top is the application layer, with familiar protocols such as HTTP and FTP. The second layer is the transport layer, home of the well-known TCP and UDP protocols. The third layer is the network layer, where IP lives; it adds the IP addresses and other data that determine the destination of the transmission. The fourth layer is the data link layer, which adds an Ethernet header to the data to be sent and computes a CRC, preparing it for the final transmission.
The figure above shows the role of each layer in TCP/IP. Communication over TCP/IP corresponds to data going down and then back up the protocol stack. On the way down the stack, each layer on the sending side adds its own header (and trailer), attaching the information needed to get the data to its destination. On the way up the stack, each layer on the receiving side strips its header and trailer, until the transmitted data is recovered.
The figure above uses HTTP as a concrete example.
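The layer-by-layer wrapping and unwrapping described above can be sketched in a few lines of Python. This is only a toy illustration: the bracketed tags stand in for real binary headers, and the names `encapsulate` and `decapsulate` are made up for this example.

```python
# Placeholder "headers"; real TCP, IP, and Ethernet headers are binary structures.
ETH, IP, TCP, CRC = b"[ETH]", b"[IP]", b"[TCP]", b"[CRC]"


def encapsulate(payload: bytes) -> bytes:
    """Going down the stack on the sender: each layer adds its header."""
    segment = TCP + payload          # transport layer
    packet = IP + segment            # network layer
    frame = ETH + packet + CRC       # link layer adds a header and a trailer
    return frame


def decapsulate(frame: bytes) -> bytes:
    """Going up the stack on the receiver: each layer strips its header."""
    packet = frame[len(ETH):-len(CRC)]
    segment = packet[len(IP):]
    return segment[len(TCP):]


request = b"GET / HTTP/1.1\r\n\r\n"   # application-layer data (HTTP)
frame = encapsulate(request)
print(frame)
print(decapsulate(frame) == request)
```

The receiver removes the headers in the reverse of the order in which the sender added them, which is why the two functions mirror each other.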
II. Data Link Layer
The physical layer converts between the stream of 0 and 1 bits and the voltage levels or light pulses of the physical medium.
The data link layer divides the sequence of 0s and 1s into data frames that are sent from one node to an adjacent node. Nodes are uniquely identified by their MAC addresses (MAC: the physical address; each host has a MAC address).
- Framing: add a header and trailer to the network-layer datagram to form a frame; the frame header contains the source MAC address and the destination MAC address.
- Transparent transmission: zero-bit stuffing, escape characters.
- Reliable transmission: rarely used on links with a low error rate, but wireless links (WLAN) do guarantee reliable transmission.
- Error detection (CRC): the receiver checks for errors and discards the frame if it finds one.
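As a rough sketch of framing plus CRC-based error detection, the following uses Python's `zlib.crc32`. The frame layout is simplified (a real Ethernet header also carries an EtherType field, among other details), and the MAC addresses and function names are hypothetical.

```python
import struct
import zlib

DST = bytes.fromhex("aabbccddee01")   # hypothetical destination MAC
SRC = bytes.fromhex("aabbccddee02")   # hypothetical source MAC


def make_frame(dst_mac: bytes, src_mac: bytes, payload: bytes) -> bytes:
    """Header (destination + source MAC) + payload + 4-byte CRC trailer."""
    body = dst_mac + src_mac + payload
    return body + struct.pack("<I", zlib.crc32(body))


def receive_frame(frame: bytes):
    """Return the payload, or None if the CRC check fails (frame discarded)."""
    body, trailer = frame[:-4], frame[-4:]
    if struct.pack("<I", zlib.crc32(body)) != trailer:
        return None
    return body[12:]                   # skip the two 6-byte MAC addresses


frame = make_frame(DST, SRC, b"hello")
corrupted = bytearray(frame)
corrupted[14] ^= 0x01                  # flip one bit inside the payload
print(receive_frame(frame))
print(receive_frame(bytes(corrupted)))
```

Note that the receiver only detects the error and drops the frame; recovering the lost data is left to other mechanisms.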
III. Network Layer
IP is the core of the TCP/IP suite: all TCP, UDP, ICMP, and IGMP data is transmitted in IP datagrams. Note that IP is not a reliable protocol. That is, IP provides no mechanism for dealing with data that fails to arrive; this is considered the job of the upper layers: TCP, or the application itself when UDP is used.
1.1 IP Addresses
At the data link layer we generally identify nodes by MAC address. At the IP layer we need a similar kind of address, and that is the IP address.
A 32-bit IP address is divided into network bits and host bits. This reduces the number of entries in a router's routing table: with a network address, all terminals that share the same network address can be confined to the same range, so the routing table only needs to record the direction of each network address to reach the corresponding terminals.
Class A IP addresses: 0.0.0.0 ~ 127.255.255.255
Class B IP addresses: 128.0.0.0 ~ 191.255.255.255
Class C IP addresses: 192.0.0.0 ~ 223.255.255.255
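A small sketch of how an address splits into network bits and host bits, using Python's standard `ipaddress` module. The helper names are made up for this example, and `address_class` assumes the standard classful boundaries on the first octet (below 128 is class A, below 192 is class B, below 224 is class C).

```python
import ipaddress


def address_class(ip: str) -> str:
    """Classify an IPv4 address by its first octet (classful addressing)."""
    first_octet = int(ipaddress.IPv4Address(ip)) >> 24
    if first_octet < 128:
        return "A"
    if first_octet < 192:
        return "B"
    if first_octet < 224:
        return "C"
    return "D/E"


def split_address(ip: str, prefix_len: int):
    """Return (network address, host part) for the given prefix length."""
    iface = ipaddress.ip_interface(f"{ip}/{prefix_len}")
    network = iface.network
    host_part = int(iface.ip) & int(network.hostmask)
    return str(network.network_address), host_part


print(address_class("10.1.2.3"), address_class("172.16.0.1"), address_class("192.168.1.10"))
print(split_address("192.168.1.10", 24))
```

A router only needs an entry for the network part (here `192.168.1.0`); the host part matters only once the packet reaches that network.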
1.2 IP Header
Only one field is introduced here: the eight-bit TTL field. It specifies how many routers a packet may pass through before it is discarded. Each time an IP packet passes through a router, its TTL is decremented by 1; when the TTL reaches zero, the packet is discarded automatically.
The maximum value of this field is 255, so a packet is discarded after passing through at most 255 routers. The initial value differs from system to system; it is typically 32 or 64.
2. ARP and RARP
ARP is a protocol that obtains a MAC address from an IP address.
ARP (Address Resolution Protocol) is a resolution protocol. Initially, a host has no idea which host and which interface a given IP address corresponds to. When the host wants to send an IP packet, it first checks its ARP cache (a cache of the IP-to-MAC address mapping table).
If the IP-to-MAC entry being looked up does not exist, the host broadcasts an ARP request that contains the IP address to be resolved. Every host that receives the broadcast compares that address with its own IP address. The host that finds a match prepares an ARP reply containing its own MAC address and sends it to the host that issued the broadcast.
When the broadcasting host receives the ARP reply, it updates its own ARP cache (the place where the IP-to-MAC table is stored). It then uses the new cache entry to build the data link layer frame and send the packet.
RARP works in the opposite direction and is not covered here.
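The cache-then-broadcast logic of ARP can be sketched as follows. This is a simulation only, not real ARP traffic: the `lan` dictionary, the addresses in it, and the `Host` class are all hypothetical.

```python
class Host:
    """A host that resolves IP addresses to MAC addresses, ARP-style."""

    def __init__(self, lan):
        self.lan = lan          # hypothetical LAN: {ip: mac of the owning host}
        self.arp_cache = {}     # IP -> MAC entries learned so far
        self.broadcasts = 0     # number of ARP broadcasts sent

    def resolve(self, ip):
        # 1. Check the ARP cache first.
        if ip in self.arp_cache:
            return self.arp_cache[ip]
        # 2. Cache miss: broadcast a request; only the owner of `ip` replies.
        self.broadcasts += 1
        mac = self.lan.get(ip)
        # 3. Update the cache with the reply, if any.
        if mac is not None:
            self.arp_cache[ip] = mac
        return mac


lan = {"192.168.1.1": "aa:aa:aa:aa:aa:01", "192.168.1.20": "aa:aa:aa:aa:aa:14"}
host = Host(lan)
print(host.resolve("192.168.1.20"), host.broadcasts)   # miss, one broadcast
print(host.resolve("192.168.1.20"), host.broadcasts)   # hit, still one broadcast
```

The second lookup is answered from the cache, so no further broadcast is needed.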
3. ICMP
IP is not a reliable protocol; it does not guarantee that data will be delivered. Naturally, the work of ensuring delivery has to be done by other modules. One important module is ICMP (Internet Control Message Protocol). ICMP is not a higher-layer protocol; it is an IP-layer protocol.
When an error occurs while an IP packet is being delivered, for example the host is unreachable or the route is unreachable, ICMP packages the error information and sends it back to the sending host, giving the host a chance to handle the error. This is why protocols built on top of the IP layer are able to achieve reliability.
IV. ping
ping is arguably the best-known application of ICMP and is part of the TCP/IP suite. The ping command checks whether the network is connected and helps us analyze and diagnose network faults.
For example, when a website cannot be reached, we usually ping it first, and ping reports some useful information in return.
The word ping comes from sonar ranging, and that is exactly what the program does: it uses ICMP packets to probe whether another host is reachable. It sends an ICMP echo request with type code 8, and the probed host answers with an ICMP echo reply with type code 0.
The ping program measures the interval between request and reply and counts how many packets were delivered, so the user can judge the general state of the network. For each reply, ping reports the transmission time and the TTL.
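To make the echo request concrete, here is a minimal sketch that builds (but does not send) an ICMP echo request with Python's `struct` module. The field layout and the checksum follow the standard ICMP definition; the function names, identifier, and payload are illustrative choices. Actually sending the packet would need a raw socket, which usually requires elevated privileges, so that part is left out.

```python
import struct

ICMP_ECHO_REQUEST = 8   # type of the request sent by ping
ICMP_ECHO_REPLY = 0     # type of the answer from the probed host


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used by ICMP."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                      # fold the carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes = b"ping") -> bytes:
    """Type, code, checksum, identifier, sequence number, then the payload."""
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, checksum, identifier, sequence)
    return header + payload


packet = build_echo_request(identifier=1, sequence=1)
print(packet.hex())
```

A receiver can validate the packet by running the same checksum over the whole message: a correct packet yields 0.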
V. Traceroute
Traceroute is an important tool for discovering the route between a host and a destination host, and also the most convenient one.
How traceroute works is quite interesting. Once it has the IP address of the destination host, it first sends the destination a UDP datagram with TTL=1. The first router that receives the datagram decrements the TTL by 1; the TTL is now 0, so the router discards the datagram and at the same time sends an ICMP time-exceeded message back to the host. On receiving this message, the host sends another UDP datagram to the destination with TTL=2, which prompts the second router to send an ICMP message. This repeats until the destination host is reached. In this way traceroute obtains the IP addresses of all the routers along the path.
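The TTL trick can be illustrated with a small simulation; no packets are sent. The `path` list and its addresses are hypothetical. One assumption goes beyond the text: the destination host is taken to answer the final probe with an ICMP "port unreachable" message, which is how UDP-based traceroute normally recognizes that it has reached the end.

```python
def send_probe(path, ttl):
    """Simulate one UDP probe; return (responder IP, ICMP message)."""
    for router_ip in path[:-1]:            # every hop except the destination
        ttl -= 1                           # each router decrements the TTL
        if ttl == 0:
            return router_ip, "time exceeded"
    return path[-1], "port unreachable"    # the probe reached the destination


def traceroute(path, max_hops=30):
    """Collect the responders for TTL = 1, 2, 3, ... until the destination."""
    hops = []
    for ttl in range(1, max_hops + 1):
        responder, message = send_probe(path, ttl)
        hops.append(responder)
        if message == "port unreachable":
            break
    return hops


path = ["10.0.0.1", "10.0.1.1", "203.0.113.9"]   # two routers, then the destination
print(traceroute(path))
```

Each probe "dies" one hop further along the path, so the list of responders reconstructs the route.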
VI. TCP/UDP
TCP and UDP are both transport-layer protocols, but they have different characteristics and suit different application scenarios. They are compared below.
Message-oriented transmission means that UDP sends exactly the messages the application layer hands it, one message at a time. The application must therefore choose a suitable message size. If a message is too long, the IP layer has to fragment it, which reduces efficiency. If it is too short, the IP datagram carries too little data.
Byte-stream-oriented
In byte-stream mode, although the application hands data to TCP one block at a time (in blocks of varying size), TCP treats the application's data as a continuous, unstructured stream of bytes. TCP has a buffer; when a block handed over by the application is too long, TCP can split it into shorter pieces before sending.
Congestion control and flow control are key features of TCP and are explained later.
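The difference between message-oriented and byte-stream-oriented delivery can be observed with local socket pairs. This is a sketch under stated assumptions: it uses `socket.socketpair` with Unix-domain datagram and stream sockets as stand-ins for UDP and TCP (so it assumes a Unix-like system), and the function name `demo` is made up.

```python
import socket


def demo():
    # Datagram sockets: every send() arrives as one separate message.
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    a.send(b"hello")
    a.send(b"world")
    datagrams = [b.recv(1024), b.recv(1024)]
    a.close()
    b.close()

    # Stream sockets: the receiver just sees bytes, with no message boundaries.
    c, d = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    c.send(b"hello")
    c.send(b"world")
    c.close()                      # signal end of stream
    stream = b""
    while True:
        chunk = d.recv(1024)       # may return any number of the pending bytes
        if not chunk:
            break
        stream += chunk
    d.close()
    return datagrams, stream


datagrams, stream = demo()
print(datagrams)
print(stream)
```

With the stream socket, an application that needs message boundaries has to add them itself, for example with a length prefix or a delimiter.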
Some applications of TCP and UDP
When should TCP be used?
When the quality of network communication matters, for example when all of the data must be delivered to the other side accurately. TCP is typically used by applications that require reliability, such as the file-transfer protocols HTTP, HTTPS, and FTP, and the mail-transfer protocols POP and SMTP.
When should UDP be used?
When the requirements on communication quality are not high and the communication should be as fast as possible, UDP can be used.
VII. DNS
DNS (Domain Name System) is a distributed database on the Internet that maps domain names and IP addresses to each other. It lets users access the Internet more conveniently, without having to remember the numeric IP strings that machines read directly. The process of obtaining the IP address that corresponds to a host name is called domain name resolution (or host name resolution). DNS runs on top of UDP and uses port 53.
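As an illustration of what a resolver puts on the wire, the sketch below builds a minimal DNS query for an A record using `struct`. It follows the standard DNS message layout (a 12-byte header followed by one question); the function name, the query ID, and the example host name are arbitrary choices, and the query is only constructed, not sent.

```python
import struct

DNS_PORT = 53   # DNS runs over UDP on port 53


def build_dns_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a DNS query asking for the A (IPv4 address) record of hostname."""
    # Header: ID, flags (recursion desired), 1 question, 0 answer/authority/additional.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # The name is encoded as length-prefixed labels, ended by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)   # QTYPE = A, QCLASS = IN
    return header + question


query = build_dns_query("example.com")
print(query.hex())
# To resolve for real, this would be sent in a UDP datagram to a resolver:
#   sock.sendto(query, (resolver_ip, DNS_PORT))
```

In everyday code the operating system's resolver does this work, for example through `socket.getaddrinfo`.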
VIII. TCP Connection Establishment and Termination
1. Three-way handshake
TCP is connection-oriented: whichever side wants to send data, a connection must first be established between the two sides. In the TCP/IP suite, TCP provides a reliable connection service, and a connection is initialized with a three-way handshake. The purpose of the three-way handshake is to synchronize the sequence numbers and acknowledgment numbers of both sides and to exchange TCP window size information.
The first handshake: establishing the connection. The client sends a connection request segment with SYN set to 1 and Sequence Number set to x; the client then enters the SYN_SENT state and waits for the server's acknowledgment;
The second handshake: the server receives the SYN segment. It must acknowledge the client's SYN segment, so it sets Acknowledgment Number to x+1 (Sequence Number + 1); at the same time it sends its own SYN request, with SYN set to 1 and Sequence Number set to y. The server puts all of this into a single segment (the SYN+ACK segment) and sends it to the client; the server then enters the SYN_RECV state;
The third handshake: the client receives the server's SYN+ACK segment, sets Acknowledgment Number to y+1, and sends an ACK segment to the server. Once this segment has been sent, both the client and the server enter the ESTABLISHED state, and the TCP three-way handshake is complete.
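The exchange of sequence and acknowledgment numbers can be traced with a small simulation. Nothing is sent over a network here; the function name and the dictionary representation of a segment are made up for this sketch, and the state names follow the usual `netstat` spelling.

```python
def three_way_handshake(x: int, y: int):
    """Simulate the handshake; x and y are the two initial sequence numbers."""
    client, server = "CLOSED", "LISTEN"
    segments = []

    # First handshake: the client sends SYN with seq = x.
    segments.append({"from": "client", "SYN": 1, "seq": x})
    client = "SYN_SENT"

    # Second handshake: the server acknowledges x and sends its own SYN (seq = y).
    segments.append({"from": "server", "SYN": 1, "ACK": 1, "seq": y, "ack": x + 1})
    server = "SYN_RECV"

    # Third handshake: the client acknowledges y.
    segments.append({"from": "client", "ACK": 1, "seq": x + 1, "ack": y + 1})
    client = server = "ESTABLISHED"

    return segments, client, server


segments, client_state, server_state = three_way_handshake(x=100, y=300)
for segment in segments:
    print(segment)
print(client_state, server_state)
```

Each side's acknowledgment number is the other side's sequence number plus 1, which is how both sides confirm that they have seen each other's initial sequence number.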
Why a three-way handshake?
To prevent a stale connection request segment from suddenly arriving at the server and causing an error.
A concrete example of a stale connection request segment: the first connection request segment sent by the client is not lost but is held up at some network node for a long time, so that it reaches the server only some time after the connection has been released. By then the segment is long since invalid, but the server, on receiving it, mistakes it for a new connection request from the client. It therefore sends the client an acknowledgment segment agreeing to establish a connection. Without the three-way handshake, the new connection would be established as soon as the server sent its acknowledgment. Because the client has not actually requested a connection, it ignores the server's acknowledgment and sends the server no data. The server, however, believes a new transport connection has been established and keeps waiting for the client to send data, so a lot of server resources are wasted. The three-way handshake prevents this. In the situation just described, the client does not acknowledge the server's acknowledgment; since the server receives no acknowledgment, it knows that the client did not request a connection.
2. Four-way close (four waves)
After the client and server have established a TCP connection through the three-way handshake and the data transfer has finished, the TCP connection must be torn down. Closing a TCP connection involves the so-called "four waves".
The first wave: host 1 (which can be either the client or the server) sets the Sequence Number and sends a FIN segment to host 2; host 1 then enters the FIN_WAIT_1 state. This means host 1 has no more data to send to host 2;
The second wave: host 2 receives the FIN segment from host 1 and returns an ACK segment to host 1, with Acknowledgment Number equal to the received Sequence Number plus 1; host 1 enters the FIN_WAIT_2 state. Host 2 is telling host 1, "I agree to your request to close";
The third wave: host 2 sends a FIN segment to host 1 to request that the connection be closed, and host 2 enters the LAST_ACK state;
The fourth wave: host 1 receives the FIN segment from host 2 and sends an ACK segment to host 2, then enters the TIME_WAIT state; host 2 closes the connection as soon as it receives the ACK segment from host 1. If host 1 has waited 2MSL and still has received nothing further, this shows that the other end has closed normally, and host 1 can close the connection as well.
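The state changes during the close can be traced in the same simulated style. The function name and the sequence numbers are illustrative. CLOSE_WAIT is the standard TCP state host 2 is in after acknowledging the first FIN; it is not named in the text above.

```python
def four_way_close(u: int, v: int):
    """Trace the close; u and v are the current sequence numbers of host 1 and host 2."""
    states = {"host1": "ESTABLISHED", "host2": "ESTABLISHED"}
    trace = []

    def step(event, **new_states):
        states.update(new_states)
        trace.append((event, states["host1"], states["host2"]))

    step(f"host1 -> host2: FIN seq={u}", host1="FIN_WAIT_1")
    step(f"host2 -> host1: ACK ack={u + 1}", host1="FIN_WAIT_2", host2="CLOSE_WAIT")
    step(f"host2 -> host1: FIN seq={v}", host2="LAST_ACK")
    step(f"host1 -> host2: ACK ack={v + 1}", host1="TIME_WAIT", host2="CLOSED")
    step("host1: 2MSL timer expires", host1="CLOSED")
    return trace


trace = four_way_close(u=500, v=900)
for event, host1_state, host2_state in trace:
    print(f"{event:32} host1={host1_state:11} host2={host2_state}")
```

Between the second and third steps host 2 may still send data, which is why its ACK and its FIN are separate segments.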
Why four waves?
TCP is a connection-oriented, reliable, byte-stream-based transport-layer protocol, and it is full duplex. This means that when host 1 sends a FIN segment, it only indicates that host 1 has no more data to send: host 1 is telling host 2 that all of its data has been sent, but host 1 can still receive data from host 2. When host 2 returns an ACK segment, it indicates that host 2 knows host 1 has no more data to send, but host 2 can still send data to host 1. When host 2 also sends a FIN segment, host 2 has no more data to send either and is telling host 1 so. After that, both sides can close the TCP connection cleanly.
Why wait 2MSL?
MSL: Maximum Segment Lifetime, the longest time any segment can remain in the network before it is discarded.
There are two reasons:
- To guarantee that TCP's full-duplex connection can be closed reliably
- To guarantee that duplicate segments from this connection disappear from the network
The first point: suppose host 1 went straight to CLOSED. Because IP is unreliable, or for other network reasons, host 2 might not receive host 1's final ACK. Host 2 would then resend its FIN after a timeout, but since host 1 would already be CLOSED, no connection matching the resent FIN could be found. So host 1 does not go directly to CLOSED but stays in TIME_WAIT; when it receives the FIN again, it can make sure the other side receives the ACK, and the connection is finally closed correctly.
The second point: suppose host 1 went straight to CLOSED and then opened a new connection to host 2. We cannot guarantee that the new connection uses a different port number from the connection just closed; the new and old connections may have the same port number. Usually nothing goes wrong, but there is a special case: if the new connection has the same port number as the closed one, and some data from the previous connection is still lingering in the network, that delayed data may reach host 2 only after the new connection has been established. Because the port numbers are the same, TCP would take the delayed data as belonging to the new connection, and it would be confused with the new connection's real packets. So the TCP connection stays in TIME_WAIT for 2 times MSL, which ensures that all data from this connection has disappeared from the network.
IX. TCP Flow Control
If the sender sends data too fast, the receiver may not be able to keep up, and data will be lost. Flow control means keeping the sender's rate from being too fast, so that the receiver has time to receive.
The sliding window mechanism makes it easy to implement flow control of the sender on a TCP connection.
Suppose A sends data to B. When the connection is established, B tells A: "My receive window is rwnd = 400" (rwnd stands for receiver window). The sender's send window therefore must not exceed the receive window value given by the receiver. Note that the TCP window is measured in bytes, not in segments. Assume each segment is 100 bytes long and the initial sequence number of the data segments is 1. Uppercase ACK denotes the ACK flag bit in the header; lowercase ack denotes the value of the acknowledgment field.
As the figure shows, B performs flow control three times. The first time it reduces the window to rwnd = 300, the second time to rwnd = 100, and finally to rwnd = 0, which means the sender is not allowed to send any more data. This state, in which the sender pauses, lasts until host B sends a new window value. All three segments that B sends to A have ACK = 1; the acknowledgment number field is meaningful only when ACK = 1.
TCP keeps a persistence timer for each connection. Whenever one side of a TCP connection receives a zero-window notification from the other side, it starts the persistence timer. When the timer expires, it sends a zero-window probe segment (carrying 1 byte of data), and the other side reports its current window value when it acknowledges the probe. If the window is still zero, the side that sent the probe restarts the persistence timer.
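The rule that the sender may not exceed the receiver's advertised window comes down to simple arithmetic on byte sequence numbers. The sketch below uses the 100-byte segments and the starting sequence number 1 from the description above; the function name and the particular sequence of events are illustrative, not a transcription of the figure.

```python
def bytes_allowed(next_seq: int, last_ack: int, rwnd: int) -> int:
    """How many more bytes the sender may send right now.

    next_seq: sequence number of the next new byte to send
    last_ack: latest ack value received (next byte the receiver expects)
    rwnd:     latest receive window advertised by the receiver, in bytes
    """
    in_flight = next_seq - last_ack      # sent but not yet acknowledged
    return max(0, rwnd - in_flight)


# Connection set up with rwnd = 400; nothing sent yet.
print(bytes_allowed(next_seq=1, last_ack=1, rwnd=400))      # 400
# Three 100-byte segments sent (bytes 1..300), none acknowledged yet.
print(bytes_allowed(next_seq=301, last_ack=1, rwnd=400))    # 100
# Receiver acknowledges up to byte 200 and shrinks the window to 300.
print(bytes_allowed(next_seq=301, last_ack=201, rwnd=300))  # 200
# Receiver advertises a zero window: the sender must pause.
print(bytes_allowed(next_seq=301, last_ack=301, rwnd=0))    # 0
```

When the result is 0 because of a zero window, the persistence timer and its probe segments are what eventually get the transfer moving again.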
X. TCP Congestion Control
1. Slow start and congestion avoidance
The sender maintains a state variable called the congestion window, cwnd. The size of the congestion window depends on how congested the network is, and it changes dynamically. The sender sets its send window equal to the congestion window.
The sender's rule for controlling the congestion window is: as long as the network is not congested, enlarge the congestion window so that more packets can be sent; as soon as the network becomes congested, shrink the congestion window to reduce the number of packets injected into the network.
Slow start algorithm:
When a host starts sending data, injecting a large number of bytes into the network at once may cause congestion, because the load on the network is not yet known.
A better approach is to probe first: increase the send window gradually from small to large, that is, increase the congestion window value from small to large.
Usually, when it first starts sending segments, the sender sets the congestion window cwnd to the value of one maximum segment size (MSS). Each time an acknowledgment for a new segment arrives, it increases the congestion window by at most one MSS. Growing the sender's congestion window cwnd gradually in this way makes the rate at which packets are injected into the network more reasonable.
After each transmission round, the congestion window cwnd doubles. The duration of a transmission round is in effect the round-trip time RTT.
The term "transmission round", however, emphasizes that all the segments the congestion window cwnd allows are sent back to back, and that the acknowledgment for the last byte sent has been received.
Also, the "slow" in slow start does not mean that cwnd grows slowly. It means that TCP sets cwnd = 1 when it starts sending, so that the sender sends only one segment at first (in order to probe the network for congestion), and then increases cwnd gradually.
To keep the congestion window cwnd from growing so large that it causes network congestion, a slow start threshold state variable, ssthresh, is also needed. It is used as follows:
- When cwnd < ssthresh, use the slow start algorithm described above.
- When cwnd > ssthresh, stop using slow start and use the congestion avoidance algorithm instead.
- When cwnd = ssthresh, either the slow start algorithm or the congestion avoidance algorithm can be used.
Congestion avoidance algorithm: let the congestion window cwnd grow slowly, that is, increase the sender's cwnd by 1 for every round-trip time RTT instead of doubling it. The congestion window then grows slowly and linearly, much more slowly than under the slow start algorithm.
Whether in the slow start phase or the congestion avoidance phase, as soon as the sender judges that the network is congested (the evidence being that an acknowledgment has not arrived), it sets the slow start threshold ssthresh to half of the sender's window value at the moment congestion occurred (but not less than 2). It then resets the congestion window cwnd to 1 and runs the slow start algorithm.
The purpose is to quickly reduce the number of packets the host sends into the network, so that the congested router has enough time to work through the backlog of packets in its queue.
The figure below illustrates the congestion control process described above with concrete numbers. The send window is now the same size as the congestion window.
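The interplay of slow start, congestion avoidance, and the reaction to a timeout can be reproduced with a short simulation, with cwnd counted in segments. The function name, the initial ssthresh of 16, and the round at which the loss occurs are illustrative choices. Two simplifications are assumed: cwnd is capped at ssthresh when doubling would overshoot it, and congestion avoidance is used when cwnd equals ssthresh.

```python
def simulate_cwnd(rounds: int, ssthresh: int = 16, loss_at=()):
    """Return the cwnd used in each transmission round."""
    cwnd = 1
    history = []
    for rnd in range(rounds):
        history.append(cwnd)
        if rnd in loss_at:
            # Timeout: ssthresh becomes half the current window (at least 2),
            # cwnd goes back to 1, and slow start begins again.
            ssthresh = max(cwnd // 2, 2)
            cwnd = 1
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)   # slow start: double every round
        else:
            cwnd += 1                        # congestion avoidance: add 1 every round
    return history


history = simulate_cwnd(rounds=19, ssthresh=16, loss_at={12})
print(history)
# [1, 2, 4, 8, 16, 17, 18, 19, 20, 21, 22, 23, 24, 1, 2, 4, 8, 12, 13]
```

The window grows exponentially up to ssthresh, then linearly; after the timeout at cwnd = 24, ssthresh drops to 12 and the same pattern repeats from cwnd = 1.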
2. Fast retransmit and fast recovery
The fast retransmit algorithm first requires the receiver to send a duplicate acknowledgment immediately whenever it receives an out-of-order segment (so that the sender learns as early as possible that a segment has not arrived), rather than waiting until it has data of its own to send and piggybacking the acknowledgment.
Suppose the receiver has received M1 and M2 and has acknowledged each of them, and that it then fails to receive M3 but does receive M4.
Clearly the receiver cannot acknowledge M4, because M4 arrived out of order. Under the basic rules of reliable transmission, the receiver may either do nothing or send an acknowledgment for M2 at a suitable time.
Under the fast retransmit algorithm, however, the receiver should promptly send a duplicate acknowledgment for M2, so that the sender learns early that segment M3 has not reached the receiver. The sender then sends M5 and M6; on receiving each of them, the receiver again sends a duplicate acknowledgment for M2. The sender has thus received four acknowledgments for M2 from the receiver, the last three of which are duplicates.
The fast retransmit algorithm further specifies that as soon as the sender receives three duplicate acknowledgments in a row, it should immediately retransmit the segment the other side has not received, M3, without waiting for M3's retransmission timer to expire.
Because the sender retransmits unacknowledged segments as early as possible, fast retransmit can raise the throughput of the whole network by about 20%.
Fast retransmit is used together with the fast recovery algorithm, which has two main points:
- When the sender receives three duplicate acknowledgments in a row, it performs the "multiplicative decrease" algorithm and halves the slow start threshold ssthresh.
- Unlike slow start, the slow start algorithm is not executed at this point (the congestion window cwnd is not set to 1). Instead, cwnd is set to the halved value of ssthresh, and the congestion avoidance algorithm ("additive increase") is then executed, so that the congestion window grows slowly and linearly.
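The sender-side reaction to duplicate acknowledgments can be sketched as follows. Acknowledgments are numbered by segment (an ack of 2 means "M2 received"), matching the M1..M6 example. The function name and the state dictionary are made up for this sketch, and one assumption is made explicit: the new ssthresh is taken to be half of the current cwnd.

```python
def on_ack(state: dict, ack: int):
    """Process one acknowledgment; return the segment to retransmit, or None."""
    if ack == state["last_ack"]:
        state["dup_acks"] += 1
        if state["dup_acks"] == 3:
            # Fast retransmit: resend the missing segment without waiting for its timer.
            # Fast recovery: halve the threshold and continue from there, not from cwnd = 1.
            state["ssthresh"] = state["cwnd"] // 2
            state["cwnd"] = state["ssthresh"]
            return ack + 1
    else:
        state["last_ack"] = ack
        state["dup_acks"] = 0
    return None


state = {"last_ack": 0, "dup_acks": 0, "cwnd": 16, "ssthresh": 32}
# Acks for M1 and M2, then three duplicate acks for M2 (triggered by M4, M5, M6).
results = [on_ack(state, ack) for ack in [1, 2, 2, 2, 2]]
print(results)
print(state["cwnd"], state["ssthresh"])
```

The third duplicate acknowledgment for M2 triggers the retransmission of M3, and cwnd continues from the halved value under congestion avoidance.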
source | https://juejin.im/post/684490...