
Double 11 2020: Alibaba Cloud GRTN opens the second half of live streaming and RTC technology

2020-12-07 19:19:20 Aliyun yunqi

Live streaming has become one of the favorite ways for big spenders to shop. Pursuing the ultimate live experience is a long-term goal of Taobao's engineers. To improve the shopping experience, make live streams smoother and let shoppers buy faster, Taobao Live adopted GRTN, the global real-time transport network of Alibaba Cloud CDN, for the first time during Double 11 2020. The data show that, compared with traditional HTTP-FLV/RTMP, end-to-end live latency fell by 83% once GRTN was enabled. So what is GRTN, and what core technologies lie behind it?

This article reviews the development of Internet live streaming technology, analyzes the technical challenges of live latency, and explains the design ideas, technical principles, characteristics and application practice of GRTN, Alibaba Cloud's global real-time transport network, as well as GRTN's attempt to escape the involution that traditional live streaming technology is stuck in. GRTN is not designed only for Internet live streaming. For users of audio and video RTC streaming technology, such as cloud conferencing, cloud gaming and cloud desktop, what new possibilities and innovation opportunities open up after moving to GRTN? This article tries to answer that.

Author: Zi Rong, senior technical expert at Alibaba Cloud, responsible for R&D of Alibaba Cloud's live video products and its real-time streaming media acceleration platform

The evolution of Internet live streaming technology

The development of Internet live streaming technology can be roughly divided into 4 stages: the innovation period, the evolution period, the production period and the bottleneck period.

 picture  1.png

The first widely known live broadcast on the Internet dates back more than 20 years, to the last year of the 20th century, when Victoria's Secret streamed its fashion show online (the Victoria's Secret show we know today). Although the picture was extremely blurry, it still drew millions of viewers and showed how attractive this new medium was. At that time Netflix, now a world-famous streaming company, still made its living renting DVDs. This was the innovation period of live streaming technology: it moved viewing from offline files and DVD rental to online, but the experience was still poor, with latency measured in minutes and frequent stalls.

Next, as Internet infrastructure evolved, streaming media technology also made great progress. The typical development was a CDN-friendly segmented model: the media stream is cut into segment files of 2 to 10 seconds each and distributed through a CDN. This adapts well to delay jitter on the Internet and gives a relatively smooth viewing experience, while latency shrinks from minutes to tens of seconds. This was the evolution period of Internet live streaming, when the main applications were TV programs and sports events.

By 2016, with the mobile Internet entering the 4G era and the rise of beauty streamers, game streamers and similar applications, interactive live streaming exploded and live streaming apps sprang up like mushrooms. Internet celebrities could go live anytime and anywhere from their phones. The mainstream protocols in China at this point were the familiar RTMP, HTTP-FLV and HLS. Because the underlying transport was still TCP, latency was typically 5 to 10 seconds, but the picture was clear and smooth.

Today, after four years of rapid growth, users expect ever more from Internet live streaming. A conventional 5 to 10 second delay makes real-time interaction difficult. In the currently popular live commerce and online education businesses, for example, real-time interaction between streamer and audience, or between teacher and students, still has much room for improvement. With the arrival of the 5G era, new scenarios such as AR/VR immersive live streaming and 4K holographic remote live streaming demand higher bandwidth and lower latency. Yet live streaming technology has not made a fundamental breakthrough in recent years. Live CDN vendors have poured effort into optimizing the quality of existing TCP-based RTMP/FLV systems. The main techniques are fine-grained scheduling, precise coverage, high-quality resources, better cache hit ratios, TCP stack tuning and analysis of live streaming behavior. These quality optimization systems keep getting more sophisticated, but the latency gain is only on the order of a hundred milliseconds, sometimes just tens of milliseconds, and stalls fall by only a few percentage points. The improvement users actually feel is very limited. Internet live streaming technology has hit a bottleneck, and this kind of involution is, to some extent, holding back the business.

Internet live broadcast delay distribution and technical challenges

So how can we make a breakthrough on latency? To answer that, we first need to look at how live latency is distributed overall. The whole Internet live streaming link can be divided into 7 steps: capture, encoding, sending, distribution, receiving, decoding and rendering.

 picture  2.png

Of these, the latency of capture plus encoding and of decoding plus rendering is relatively fixed, about 100 ms in total. Distribution and receiving vary the most, from tens of milliseconds to seconds, depending mainly on link delay jitter, protocol stack optimization and the coverage of CDN resources.
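The budget described above can be written down as a small calculation. This is only a sketch: the stage names follow the seven steps listed earlier, and the per-stage numbers in the usage note are illustrative assumptions, not measurements from GRTN.

```python
# The seven stages of the live link, in order.
STAGES = ("capture", "encode", "send", "distribute", "receive", "decode", "render")


def end_to_end_ms(per_stage_ms):
    """Sum the per-stage latencies; every stage must be present."""
    missing = [s for s in STAGES if s not in per_stage_ms]
    if missing:
        raise ValueError(f"missing stages: {missing}")
    return sum(per_stage_ms[s] for s in STAGES)


def variable_share(per_stage_ms):
    """Fraction of the total spent in the stages the article calls variable
    (distribution and receiving)."""
    variable = per_stage_ms["distribute"] + per_stage_ms["receive"]
    return variable / end_to_end_ms(per_stage_ms)
```

For example, with assumed values of 20/30/10/30/20/20/30 ms for the seven stages, the fixed capture, encode, decode and render stages add up to 100 ms and the total is 160 ms. If distribution and receiving grow to seconds, they dominate the budget, which is why the rest of the article concentrates on them.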

In the traditional architecture, these 7 steps are independent of one another. The advantage is a clear division of labor between teams. The problem is that optimizations can hardly cross the boundaries between steps, so system-level optimization is out of reach. For example, if the encoder could take transport congestion into account and adjust its bitrate in real time, congestion could be eased to some extent and latency reduced. Likewise, in traditional streaming media transport, media data handling and the underlying transport are independent: the TCP congestion control algorithm underneath is a general-purpose algorithm that knows nothing about the characteristics of media. Such a layered structure cannot form an immediate feedback loop, so buffer sizes are designed conservatively to protect smoothness, and end-to-end latency is sacrificed. If the transport layer and the application layer are integrated, with QoS control designed specifically for media characteristics and coordinated with rate control on the encoder side, latency can be reduced greatly.
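The idea of letting transport feedback steer the encoder can be sketched as a simple additive-increase, multiplicative-decrease controller. This is a minimal illustration under assumed thresholds and step sizes, not the congestion control that GRTN or WebRTC actually uses; every name and default below is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RateController:
    """Toy encoder rate controller driven by transport feedback."""
    bitrate_kbps: float
    min_kbps: float = 300.0
    max_kbps: float = 4000.0
    loss_threshold: float = 0.02   # assumed: above 2% loss counts as congestion
    rtt_inflation: float = 1.5     # assumed: RTT 50% above baseline counts too
    backoff: float = 0.85          # multiplicative decrease factor
    step_kbps: float = 50.0        # additive increase per feedback round

    def on_feedback(self, loss_ratio, rtt_ms, base_rtt_ms):
        congested = (loss_ratio > self.loss_threshold
                     or rtt_ms > self.rtt_inflation * base_rtt_ms)
        if congested:
            self.bitrate_kbps *= self.backoff
        else:
            self.bitrate_kbps += self.step_kbps
        # Keep the target inside the range the encoder can produce.
        self.bitrate_kbps = max(self.min_kbps, min(self.max_kbps, self.bitrate_kbps))
        return self.bitrate_kbps
```

Each feedback report from the receiver yields a new target bitrate for the encoder, so congestion is answered by sending less data rather than by growing buffers.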

The steps above therefore need to be tightly coupled. Only when every step along the link is aware of the others can latency be pushed to its limit.

Comparison of mainstream low-latency live streaming solutions in the industry

There are 5 mainstream streaming media protocols and technologies in the industry: WebRTC, QUIC, SRT, CMAF and LLHLS. They are compared here along the following 8 dimensions:

 picture  3.png

Time of introduction: WebRTC was proposed first, followed by QUIC; the newest is LLHLS, released by Apple last year.
Completeness: completeness here means whether the technology covers every step of the live link described earlier. WebRTC, for example, can be considered full coverage: it spans capture, encoding and decoding, transport and rendering. Strictly speaking, WebRTC is therefore not a protocol but an open framework for real-time streaming media communication. QUIC is a next-generation transport protocol being standardized by the IETF. SRT was only a video transport protocol when it was open-sourced in 2017, but with support from many encoder vendors it can now also influence the encoder's bitrate to keep latency relatively stable.
Underlying transport protocol and type: WebRTC, QUIC and SRT are all UDP-based and stream-oriented, while CMAF and LLHLS are segment-based and run over HTTP.

Standards and client support: WebRTC is already a W3C standard and builds on many IETF RFCs; almost all browsers and mobile operating systems support it. QUIC is expected to officially become the next-generation HTTP standard, HTTP/3, by the end of this year, and Chrome already supports it.

Scenarios and latency: WebRTC is designed for real-time audio and video communication, with end-to-end latency within 400 ms and typically about 250 ms. The other protocols can get within 2 s, but only with considerable extra engineering effort.

Weighing all of these factors, Alibaba Cloud's new-generation transport network chose WebRTC. Its latency is low, and it supports full duplex on a single channel, so it can deliver low latency and interaction together.

The positioning of GRTN

To reduce the end-to-end latency of live streaming, Alibaba Cloud CDN worked with the Mobile Taobao technology team, Video Cloud and the XG Lab, extending from live streaming and low-latency live streaming into the RTC field and investing in QoS and AAA. The result is GRTN (Global Realtime Transport Network), the global real-time transport network.

GRTN is built on heterogeneous nodes in the central cloud and the edge cloud, forming an ultra-low-latency, fully distributed, communication-grade streaming media transport network. It now carries audio and video stream transport and exchange for both Internet live streaming and RTC business scenarios. RTS, the low-latency live streaming service built on GRTN, supports publishing and playback over standard H5 WebRTC and keeps latency within 1 s at tens of millions of concurrent viewers. For RTC, end-to-end latency can be held at about 250 ms.

GRTN architecture

The following is a typical architecture of a traditional interactive live streaming system. Its characteristics are:
• Tree-shaped hierarchy
• Mainstream upstream (publishing) protocols: RTMP/WebRTC
• Mainstream downstream (playback) protocols: HTTP-FLV/RTMP/HLS
• Live distribution and the RTC publishing system are separate
• End-to-end latency of about 6 s

 picture  4.png

The main drawbacks of the traditional architecture are:

• High cost, mainly because the media data path is long and the live distribution and RTC publishing systems are isolated from each other
• High latency, because it relies on TCP-based RTMP/HTTP-FLV and the media data path is long
• Hard to extend, because RTMP/HTTP-FLV are not full duplex, so the business can only support one-way live streaming, and video interaction needs a separate bypass system.

Compared with the traditional live streaming architecture, the technical characteristics of the GRTN architecture are:

• Hybrid networking: tree-shaped hierarchy plus peer-to-peer mesh networking
• Capability sinking: protocols are offloaded at the edge, and the internal transport protocol is unified
• Separation of control and data: dynamic path planning plus fully distributed SFUs

 picture  5.png

The core value of the architecture upgrade is:

• Lower cost. GRTN is a converged multi-service network that supports live streaming, RTC and video-to-cloud services, so service reuse is high. In addition, GRTN's internal links are shorter, and the cost inside each node is lower.
• Better quality. GRTN's internal network is a mesh built with dynamic route selection; internal link latency can be about 20 ms, and internal links use a private protocol for efficient transport. Client publishing and distribution are both built on WebRTC, with QoS congestion control designed specifically for streaming media and still being iterated and refined using online data.
• Easy to extend. GRTN supports WebRTC, which allows full-duplex communication on a single connection, so media streams can be published and subscribed to freely. This leaves much more room for imagination in extending the business.

GRTN core technology: peer-to-peer networking and dynamic path planning

The traditional live streaming architecture is a hierarchical tree. Because the media stream path is relatively fixed, this structure lets a team spend more of its R&D resources on media protocol handling early in a product's life, which keeps risk under control while capabilities are built quickly. As the business grows, however, the defects of this architecture become more and more obvious: high latency, high cost and poor extensibility. To some extent it holds the business back. For example, latency is hard to push below 6 s, and video interaction can only be achieved through a bypass system.

To solve these problems at the root, GRTN adopts hybrid networking, that is, a combination of the hierarchical structure and peer-to-peer mesh networking. The hierarchy helps with operations and capacity evaluation, while the mesh helps build a high-quality, low-cost network. The routing center periodically collects the results of internal link probing. To work with dynamic networking, the streaming media brain module manages stream information and also supports path switching, capacity planning and integrated scheduling that balances cost against quality.
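One way to picture dynamic path planning is a shortest-path search over the link latencies that the routing center collects. The sketch below uses Dijkstra's algorithm and optimizes latency only; the node names are made up, and the real GRTN planner is described as also weighing cost and capacity, which this example leaves out.

```python
import heapq


def lowest_latency_path(links, src, dst):
    """links maps node -> {neighbor: probed latency in ms}.

    Returns (total_ms, [nodes on the path]), or (inf, []) if dst is unreachable.
    """
    best = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == dst:
            break                      # the first pop of dst is already optimal
        if cost > best.get(node, float("inf")):
            continue                   # stale heap entry
        for nxt, ms in links.get(node, {}).items():
            new_cost = cost + ms
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                prev[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    if dst not in best:
        return float("inf"), []
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return best[dst], path[::-1]
```

Re-running the search whenever a new round of probe results arrives gives a simple form of path switching: if the probed latency of a relay link rises, the next plan routes around it.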

GRTN core technology: multipath transmission

To improve the reliability of transport on GRTN's internal links while balancing cost and quality, GRTN supports 3 multipath transmission modes on internal links: racing mode, standby mode and intelligent mode. It can adapt and switch between them automatically under the constraints of reliability, quality and cost.
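The article does not define the three modes. One plausible reading of racing mode is that the same packet is sent over several paths and the receiver keeps whichever copy arrives first. Under that assumption, the receiving side needs duplicate suppression, sketched below. The class name and window size are hypothetical, and sequence-number wraparound is ignored for brevity.

```python
class RacingReceiver:
    """Accepts the first copy of each sequence number, drops later copies."""

    def __init__(self, window=1024):
        self.window = window     # how many recent sequence numbers to remember
        self.seen = set()
        self.highest = -1

    def accept(self, seq):
        """Return True the first time seq arrives, False for duplicates or stale packets."""
        if seq in self.seen or seq <= self.highest - self.window:
            return False
        self.seen.add(seq)
        if seq > self.highest:
            self.highest = seq
            # Forget sequence numbers that have fallen out of the window.
            floor = self.highest - self.window
            self.seen = {s for s in self.seen if s > floor}
        return True
```

The trade-off is bandwidth for reliability: every packet costs one transmission per path, which is presumably why cheaper modes exist alongside it.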

GRTN core technology: capability sinking

Streaming media technology is known for its many protocols, mainly because the businesses it serves are so diverse. The following table compares the technical evolution trends in the streaming media industry:

 picture 6.png

The table above lists only the relatively common protocols, but it already shows how complex streaming media protocols are. In the traditional architecture these protocols are processed in the center, and the edge mainly passes data through for distribution. The problem is that the protocol processing path is too long, which raises both cost and latency. A natural idea is therefore to sink protocol and media processing capability down to the CDN edge and leave only control in the center, separating control from data in a way similar to the Service Mesh design idea. In essence, all of these protocols carry audio and video elementary streams (ES, for example the common H.264/H.265/AAC/OPUS/VP8/VP9/AV1). Different protocols solve the transport problem for different encapsulation formats, such as TS (Transport Stream), PS (Program Stream), MP4, fMP4 (fragmented MP4) and FLV, and different encapsulation formats are essentially different answers to how ES streams should be packaged in different scenarios. Designing a general-purpose transport protocol and caching system for ES streams at the edge is therefore entirely feasible. GRTN sinks protocol processing to the edge nodes and currently supports streaming protocols such as RTMP, HTTP-FLV, WebRTC and GB28181.
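The point that every protocol ultimately carries the same elementary stream can be sketched as one edge cache of ES frames with a pluggable packager per egress protocol. The two packagers below are deliberately toy framings invented for this example, not real FLV, TS or RTP encapsulation, and all class and function names are hypothetical.

```python
import struct
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class EsFrame:
    codec: str        # e.g. "h264"
    pts_ms: int       # presentation timestamp in milliseconds
    keyframe: bool
    payload: bytes    # raw elementary-stream data


# Toy packagers (NOT real container formats): they only show that one cached
# frame can be wrapped differently for each egress protocol.
def length_prefixed(frame: EsFrame) -> bytes:
    return struct.pack(">I", len(frame.payload)) + frame.payload


def timestamped(frame: EsFrame) -> bytes:
    return struct.pack(">QI", frame.pts_ms, len(frame.payload)) + frame.payload


PACKAGERS: Dict[str, Callable[[EsFrame], bytes]] = {
    "toy-length": length_prefixed,
    "toy-timestamped": timestamped,
}


class EdgeEsCache:
    """Keeps frames from the latest keyframe onward for one video stream."""

    def __init__(self) -> None:
        self.frames: List[EsFrame] = []

    def push(self, frame: EsFrame) -> None:
        if frame.keyframe:
            self.frames = [frame]        # restart the cache at each keyframe
        elif self.frames:
            self.frames.append(frame)    # frames before the first keyframe are dropped

    def serve(self, protocol: str) -> List[bytes]:
        package = PACKAGERS[protocol]
        return [package(f) for f in self.frames]
```

With this split, adding an egress protocol means registering one more packager at the edge, while ingest and caching stay untouched.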

GRTN core technology: two-way real-time signaling network

As mentioned earlier, one of GRTN's core values is high quality. Beyond low latency, high quality also means fast disaster recovery switching and better core metrics such as the rate at which the first frame appears within one second.
A common capability in RTC scenarios is client network mobility. For example, when users arrive home or leave home during a meeting, the phone has to switch between 4G and Wi-Fi. Another case is when the CDN node a client is connected to becomes abnormal. In both cases the client switches access nodes while communicating with GRTN. GRTN's two-way real-time signaling network delivers network switch messages within milliseconds, so when the network of a publishing client switches, the subscribing clients do not notice the switching that happens inside GRTN at all.

GRTN core technology: continuously iterated QoS

GRTN brings live latency down from 6 s to within 1 s and achieves RTC latency of about 250 ms. Besides the move to a mesh network and protocol sinking, the most important reason is that GRTN uses QoS that is aware of media characteristics, which is fundamentally different from general-purpose QoS strategies such as those of TCP or QUIC.

WebRTC's QoS is a multi-dimensional decision system built around the characteristics of streaming media and involves many algorithms and policy parameters. To make it easier for the business layer to tune the underlying QoS algorithms and parameters, GRTN provides a pluggable QoS integration framework. Combined with GRTN's data-driven quality evaluation system, it allows integrate-once, iterate-continuously development: different algorithms and parameters can be evaluated online through GRTN's A/B quality evaluation system and compete against each other.
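A pluggable framework with A/B evaluation could look roughly like the sketch below: strategies register under a name, sessions are assigned to a bucket deterministically, and the bucket with the best observed quality wins. The strategy formulas, the bucket hashing and the use of stall ratio as the metric are assumptions made for illustration; the article does not describe GRTN's actual interfaces.

```python
import hashlib
from typing import Callable, Dict, List

# A strategy maps the observed loss ratio to a redundancy ratio (toy policy).
QosStrategy = Callable[[float], float]
REGISTRY: Dict[str, QosStrategy] = {}


def register(name: str):
    """Decorator that plugs a strategy into the registry under a name."""
    def deco(fn: QosStrategy) -> QosStrategy:
        REGISTRY[name] = fn
        return fn
    return deco


@register("conservative")
def conservative(loss_ratio: float) -> float:
    return min(0.5, loss_ratio)


@register("aggressive")
def aggressive(loss_ratio: float) -> float:
    return min(0.5, loss_ratio * 2.0)


def assign_bucket(session_id: str, buckets: List[str]) -> str:
    """Stable assignment: the same session always lands in the same bucket."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    return buckets[int.from_bytes(digest[:8], "big") % len(buckets)]


def pick_winner(stall_ratios: Dict[str, List[float]]) -> str:
    """Choose the bucket with the lowest mean stall ratio."""
    return min(stall_ratios,
               key=lambda name: sum(stall_ratios[name]) / len(stall_ratios[name]))
```

Hashing the session ID keeps a viewer on one strategy for the whole session, so the per-bucket quality samples stay comparable.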

QoS also has a lot in common with the dynamic path planning described earlier. One of the most important problems in QoS research is telling network jitter apart from congestion. If it is congestion, feedback must go upstream so that the source adjusts its bandwidth use (for example by lowering the bitrate or switching streams). If it is only brief jitter, a relatively aggressive loss recovery strategy can be enabled instead. Dynamic path planning faces a similar question: if the problem is only temporary, the current link can be kept and QoS loss recovery can absorb the packet loss, but if the link is truly congested, the path must be switched as soon as possible.
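A very simple way to separate brief jitter from sustained congestion is to look at how many recent measurement intervals were bad. The sketch below uses a sliding window over loss ratios; the window length and thresholds are arbitrary assumptions, and production systems typically combine loss with delay trends rather than relying on loss alone.

```python
from collections import deque


class LinkStateClassifier:
    """Labels a link as healthy, jitter or congestion from recent loss samples."""

    def __init__(self, window=5, loss_threshold=0.05, sustain=4):
        self.samples = deque(maxlen=window)   # True means the interval was bad
        self.loss_threshold = loss_threshold
        self.sustain = sustain                # bad intervals needed for congestion

    def observe(self, loss_ratio):
        self.samples.append(loss_ratio > self.loss_threshold)
        bad = sum(self.samples)
        if bad == 0:
            return "healthy"
        if bad >= self.sustain:
            return "congestion"
        return "jitter"


# Reactions taken from the paragraph above, expressed as labels only.
ACTIONS = {
    "healthy": "keep current settings",
    "jitter": "keep the link, strengthen loss recovery",
    "congestion": "lower bitrate upstream or switch path",
}
```

Isolated bad intervals are labeled jitter and handled by loss recovery on the current link, while a run of bad intervals crosses the sustain threshold and triggers the heavier reaction.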

GRTN core technology: streaming media twin

Upgrading GRTN to a mesh also brings new challenges. As everyone knows, ensuring an adequate supply of CDN resources for 618 and Double 11 is essential. In the traditional hierarchy, L1, L2 and the center can each be evaluated separately using business hit ratios. In a mesh, internal links are planned dynamically, which means traffic distribution is dynamic too, and that makes evaluating the overall capacity of the CDN system a new challenge. Another example is how the dynamic routing algorithm should balance quality and cost so that GRTN stays both low-cost and high-quality. To address problems like these, GRTN borrows the idea of the digital twin (Digital Twin, https://en.wikipedia.org/wiki/Digital_twin) and has designed a streaming media twin (Streaming Media Digital Twin) system, used for capacity evaluation, algorithm training, incident review, simulated stress testing and so on.
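As a rough illustration of capacity evaluation in a twin, planned stream paths can be replayed offline to accumulate per-link load and flag links that would exceed capacity. This is a toy model with hypothetical node names and numbers; it says nothing about how GRTN's twin system is actually built.

```python
from collections import Counter


def simulate_link_load(stream_paths, stream_mbps):
    """stream_paths maps stream_id -> list of nodes; returns load per directed link."""
    load = Counter()
    for stream_id, path in stream_paths.items():
        for a, b in zip(path, path[1:]):
            load[(a, b)] += stream_mbps[stream_id]
    return dict(load)


def over_capacity(load, capacity_mbps):
    """Links whose simulated load exceeds capacity (unknown links count as zero)."""
    return sorted(link for link, mbps in load.items()
                  if mbps > capacity_mbps.get(link, 0.0))
```

Feeding the same function with paths produced by different routing algorithms gives a cheap way to compare them before any traffic is moved.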

GRTN core technology: programmability

The business scenarios built on top of streaming media technology are very rich, including e-commerce, video conferencing, online education, enterprise live streaming and new retail, so there is a lot of demand for custom development. Making GRTN programmable is an attempt to improve the stability of the system. So far the central streaming brain, the node-side business module, the media data transmission module and the media signaling processing module have all been made programmable, and in most cases binary releases can be avoided.

RTS, Alibaba Cloud's ultra-low-latency live streaming product

To make it easier for customers and the industry to adopt GRTN, Alibaba Cloud has built RTS, an ultra-low-latency live streaming service, on top of GRTN. It has four technical features:

First, second-level latency and excellent resistance to weak networks. At the same stall rate, latency can be reduced by 80%. Compared with the 5 to 10 s latency of traditional RTMP and FLV, RTS latency is within 1 s, and it continues to self-learn and iterate on online big data.

Second, maturity and stability. RTS has gone through more than 2 years of dedicated R&D, was tested online during Taobao Live's 618 promotion, and is now fully deployed in Taobao Live.

Third, open standards. To make it easy for customers with self-developed players to use the RTS service, Alibaba Cloud's WebRTC access signaling protocol is completely open and transparent.

Fourth, wide coverage and high concurrency. RTS is built on more than 2,800 Alibaba Cloud edge nodes and can support tens of millions of concurrent viewers.

Two ways for customers to adopt RTS

First, for customers who use the cloud vendor's playback SDK: after upgrading the playback SDK, the network transport protocol can be chosen flexibly per business need, giving a more cost-effective combination.
1. Add an RTS playback domain name to the live streaming service: one published stream, two ways to pull it.
2. Nothing needs to change on the streamer side. Only the playback SDK is upgraded, and the player automatically recognizes the different URL parameters.

 picture  7.png

Second, for customers with self-developed players: Alibaba Cloud provides an open, standard WebRTC protocol integration with demo code. Customers upgrade their own network module, and the underlying network integration is more transparent.
1. Add an RTS playback domain name to the live streaming service: one published stream, two ways to pull it.
2. Nothing needs to change on the streamer side. Only the player's network module is upgraded to pull and play the ultra-low-latency stream.

 picture  8.png

The future upgrade path for low latency plus interactive experience

In upgrading the user experience of streaming media businesses, what GRTN brings today is not only lower latency. Another very important capability is flexible interaction. GRTN blurs the boundary between live streaming and RTC, and between co-streaming and conferencing. In the GRTN world, media streams have only publish and subscribe relationships:
• Live streaming: 1 publisher, N subscribers.
• 1v1 client-side co-streaming plus live streaming: 3 publishers, N subscribers. The third published stream is the composite stream mixed on the streamer side.
• 1v1 server-side co-streaming plus live streaming: 3 publishers, N subscribers. The third published stream is the composite stream produced on the server side. When the composite stream is published, GRTN's stream switching capability lets the audience switch between the two without noticing.
• Video conferencing: many publishers, many subscribers. GRTN can switch between large and small (high-definition and low-definition) video streams in a conference.
Combined with GRTN's tiered latency capability, which trades off cost and stability (50 ms / 250 ms / 800 ms / 6 s / ...), different scenarios and product capabilities can be mixed and matched.
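The publish and subscribe view of these scenarios can be sketched as a small routing table. The class below is a hypothetical illustration of the model, not GRTN's API: it only tracks who publishes which stream and who subscribes to it, and shows a viewer moving from the streamer's own stream to a composite stream when co-streaming starts.

```python
from collections import defaultdict


class MediaRouter:
    """Tracks publish/subscribe relationships between clients and streams."""

    def __init__(self):
        self.publishers = {}                    # stream_id -> publisher id
        self.subscriptions = defaultdict(set)   # stream_id -> subscriber ids

    def publish(self, stream_id, publisher):
        self.publishers[stream_id] = publisher

    def subscribe(self, stream_id, subscriber):
        if stream_id not in self.publishers:
            raise KeyError(f"unknown stream: {stream_id}")
        self.subscriptions[stream_id].add(subscriber)

    def switch(self, subscriber, old_stream, new_stream):
        """Move a subscriber to another stream, e.g. when co-streaming starts."""
        self.subscribe(new_stream, subscriber)   # join the new stream first
        self.subscriptions[old_stream].discard(subscriber)

    def fanout(self, stream_id):
        return sorted(self.subscriptions.get(stream_id, ()))
```

Plain live streaming is one `publish` plus many `subscribe` calls. In the co-streaming case the streamer, the guest and the composite stream are three publications, and viewers are switched to the composite one. A conference is simply every participant publishing and subscribing at once.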

Online education experience upgrade

In the past, the typical online education architecture was bypass live streaming. There is a WebRTC-based RTC media center in the cloud, and teachers and students who need real-time interaction join the WebRTC channel. The RTC media center then converts the stream to RTMP and pushes it to a live CDN for bypass live streaming. Because live latency is high and the protocols differ, a student watching the live stream who wants to switch into the RTC channel to interact gets a poor experience, sometimes with a black screen interruption.

 picture  9.png

To reduce latency and eliminate the black screen problem completely, the cloud WebRTC two-way communication channel needs to sink into GRTN and be carried by GRTN. Interaction and distribution are then merged into a single ultra-low-latency interactive channel. With the business layer controlling WebRTC stream subscription relationships, there is no longer a clear line between the live channel and the interactive channel. The business can switch between the 2 modes on demand, and the switch is completely imperceptible to both students and teachers.

E-commerce live streaming experience upgrade

Today's live commerce is not essentially different from the online education case above, and the same approach applies to its architecture upgrade. It is worth noting that after adopting GRTN, the business side's technical team can focus more on innovation. Take a small streamer's sales room as an example: in the past it supported only text or 1v1 co-streaming. With GRTN, this one-way product presentation room can easily become a multi-person interactive discussion room, similar to a video conference. Can a better interactive experience drive GMV conversion in the same way that lower latency has?

 picture  10.png

This article has focused on GRTN's core server-side capabilities. I hope it offers some inspiration to engineers working on streaming media technology. Thanks for reading.

 

This article is original content from Alibaba Cloud and may not be reproduced without permission.
