
Terway source code analysis

2020-11-07 20:15:31 gaorong404

Background

As the company's business grows, the underlying container environment also needs to be deployed across multiple regions in a multi-cloud architecture. Using a CNI plugin is an efficient way to build a Kubernetes network in such a multi-cloud environment. For our Alibaba Cloud deployment we use terway, the CNI plugin provided by Alibaba Cloud. terway offers a VPC-interconnected networking scheme that integrates easily with existing infrastructure, avoids the packet encapsulation overhead of an overlay network, is simple to use, and makes network problems easy to diagnose. This article is a brief code walkthrough of the plugin to understand how it works, as groundwork for later diagnosis and maintenance.

Functional division

The terway code open-sourced by Alibaba Cloud consists of three parts:

  • CNI plugin: the CNI binary itself. It implements the ADD, DEL, and VERSION interfaces for kubelet to call. After lightly processing the parameters passed in by kubelet, the plugin calls terwayBackendServer over gRPC to carry out the concrete logic, such as allocating a network device. Once the synchronous call to terwayBackendServer has assigned a device, the plugin sets up the pod sandbox's network namespace through ipvlanDriver.Driver and applies traffic shaping through TC. The plugin is installed onto every node by an initContainer in a daemonSet.
  • backend server: the main execution logic of terway. It performs IPAM and allocates the corresponding network devices; this part is the subject of this analysis. The program runs on every node as a daemonSet.
  • networkPolicy: implemented on top of calico felix and completely decoupled from the two parts above. The network devices terway creates are prefixed with cali in order to stay compatible with calico's schema.

TerwayBackendServer

In terway's main function, a gRPC server is started to listen for requests, and a TerwayBackendServer is created at the same time. TerwayBackendServer encapsulates all of the operational logic, and the newNetworkService function initializes its submodule instances in turn, specifically:

  • ECS client: the client used to operate on ECS. All create/delete/update operations ultimately go through this client; it is a thin wrapper around the alicloud SDK.
  • kubernetes: the pod management module, used to synchronize kubernetes pod information.
  • resourceDB: used to persist state information, so that state can be restored after a restart or similar operations.
  • resourceManager: the instance that manages resource allocation. terway instantiates a different resourceManager depending on the configuration; here we use the ENIMultiIP mode, which corresponds to newENIIPResourceManager.

In ENIMultiIP mode, terway applies for alicloud elastic network interfaces (ENIs) and configures multiple secondary VPC IP addresses on each of them, then maps and assigns those IP addresses to Pods. The Pods share the host's network segment, enabling native communication across the VPC's vswitches.

The overall architecture is shown in the figure below:

First, let's look at the kubernetes pod management module, which is used to fetch kubernetes pod state. To support advanced features such as traffic shaping, terway needs information that cannot be passed in through the CNI call and must instead be looked up from kubernetes. In addition, in some abnormal cases the CNI plugin may not be called back at all, for example when a user runs kubectl delete pod --force --grace-period=0 directly. In those cases terway treats kubernetes as the single source of truth, ensuring that a pod's network devices are ultimately released when the pod is deleted. The module's main methods are GetPod and GetLocalPod. GetPod requests pod information from the apiserver; if the pod has already been deleted from the apiserver, it falls back to a local storage. That storage uses boltDB as its on-disk backing file: every pod that has been handled keeps a copy of its information there, and the copy is not removed when the pod is deleted from the apiserver, so later code that needs the pod's information can still get it from storage. The copy is cleaned up by an asynchronous goroutine one hour after the pod is deleted. GetLocalPod fetches information about all pods on the node from the apiserver; it is the place that calls kubernetes most often, but it is currently invoked only once every 5 minutes by the two cleanup goroutines, so the call volume is small and the load it puts on the apiserver is limited. The module also caches a copy of the data in the local DB, so the information remains available even after the kubernetes pod has been deleted.
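The fallback described above can be sketched as follows. This is a minimal illustration, not terway's actual code: PodInfo, podManager, and cleanExpired are invented names, and in-memory maps stand in for the apiserver client and the boltDB-backed store.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// PodInfo is a trimmed stand-in for the pod data terway caches locally.
type PodInfo struct {
	Name      string
	deletedAt time.Time // set once the pod disappears from the apiserver
}

// podManager sketches the GetPod fallback: ask the apiserver first and
// fall back to the local store (boltDB in terway, a map here) when the
// pod has already been deleted.
type podManager struct {
	apiserver map[string]*PodInfo // stand-in for a live apiserver lookup
	local     map[string]*PodInfo // stand-in for the boltDB-backed store
}

func (m *podManager) GetPod(name string) (*PodInfo, error) {
	if p, ok := m.apiserver[name]; ok {
		m.local[name] = p // keep a local copy for later fallback
		return p, nil
	}
	if p, ok := m.local[name]; ok {
		return p, nil // apiserver copy is gone; serve the cached one
	}
	return nil, errors.New("pod not found")
}

// cleanExpired drops local copies one hour after the pod was deleted,
// mirroring the asynchronous cleanup goroutine.
func (m *podManager) cleanExpired(now time.Time) {
	for name, p := range m.local {
		_, live := m.apiserver[name]
		if !live && !p.deletedAt.IsZero() && now.Sub(p.deletedAt) > time.Hour {
			delete(m.local, name)
		}
	}
}

func main() {
	m := &podManager{apiserver: map[string]*PodInfo{}, local: map[string]*PodInfo{}}
	m.apiserver["web-0"] = &PodInfo{Name: "web-0"}
	m.GetPod("web-0")            // first read caches a copy into the local store
	delete(m.apiserver, "web-0") // e.g. kubectl delete pod --force
	m.local["web-0"].deletedAt = time.Now().Add(-2 * time.Hour)
	p, err := m.GetPod("web-0") // still served from the local store
	fmt.Println(p.Name, err == nil)
	m.cleanExpired(time.Now())
	_, err = m.GetPod("web-0")
	fmt.Println(err != nil) // copy expired after one hour
}
```

The key property is that deleting the pod from the apiserver does not invalidate the local copy until the delayed cleanup runs, so a late CNI DEL can still find the information it needs.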

Next is the resourceDB module, which persists state information. The DB records the currently assigned pods and their network devices (networkResource); every device request or release updates it, and after a restart the program recovers the previous run's data from resourceDB once initialization completes.
Besides the basic assign and delete operations on the DB, terway also starts an asynchronous goroutine that cleans up periodically, to guarantee eventual consistency in abnormal cases. The goroutine fetches all pod information from the apiserver and compares it with the records in the DB; if the corresponding pod has been deleted, it first releases the corresponding network device and then removes the record from the DB. This delayed cleanup also makes it possible for a StatefulSet Pod to keep the same IP address across an update.
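The cleanup loop above can be sketched like this. It is a simplified illustration under assumed names (record, reconcile); the real code works against boltDB and the apiserver, while maps and a callback stand in here. The ordering matters: the device is released before the record is dropped, so a crash between the two steps leaves a record that a later pass will retry, never a leaked device with no record.

```go
package main

import "fmt"

// record is a trimmed resourceDB entry: which pod holds which network resource.
type record struct {
	Pod   string
	ResID string
}

// reconcile sketches terway's periodic cleanup: for every DB record whose
// pod no longer exists in the apiserver, release the device first and only
// then drop the record.
func reconcile(db map[string]record, livePods map[string]bool, release func(resID string) error) error {
	for key, rec := range db {
		if livePods[rec.Pod] {
			continue // pod still exists; keep its device
		}
		if err := release(rec.ResID); err != nil {
			return err // keep the record; retry on the next tick
		}
		delete(db, key)
	}
	return nil
}

func main() {
	db := map[string]record{
		"default/web-0": {Pod: "default/web-0", ResID: "eniip-1"},
		"default/web-1": {Pod: "default/web-1", ResID: "eniip-2"},
	}
	live := map[string]bool{"default/web-0": true} // web-1 was deleted
	var released []string
	_ = reconcile(db, live, func(id string) error {
		released = append(released, id)
		return nil
	})
	fmt.Println(len(db), released) // web-1's device released, record dropped
}
```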

The most important module is resourceManager. Its interface encapsulates operations on concrete network devices, as shown below:

// ResourceManager Allocate/Release/Pool/Stick/GC pod resource
// managed pod and resource relationship
type ResourceManager interface {
    Allocate(context *networkContext, prefer string) (types.NetworkResource, error)
    Release(context *networkContext, resID string) error
    GarbageCollection(inUseResList map[string]interface{}, expireResList map[string]interface{}) error
}

The three methods make it obvious which operations can be performed. Every time the CNI plugin calls the backendServer, the server calls ResourceManager to perform the actual allocation or release. For the ENIMultiIP mode, the concrete implementation is eniIPResourceManager:

type eniIPResourceManager struct {
    pool pool.ObjectPool
}

It has only one member, pool, whose concrete implementation type is simpleObjectPool. The pool maintains information about all current ENIs; when resourceManager allocates or releases a network device, it actually takes it from or returns it to the pool:

func (m *eniIPResourceManager) Allocate(ctx *networkContext, prefer string) (types.NetworkResource, error) {
    return m.pool.Acquire(ctx, prefer)
}

func (m *eniIPResourceManager) Release(context *networkContext, resID string) error {
    if context != nil && context.pod != nil {
        return m.pool.ReleaseWithReverse(resID, context.pod.IPStickTime)
    }
    return m.pool.Release(resID)
}

func (m *eniIPResourceManager) GarbageCollection(inUseSet map[string]interface{}, expireResSet map[string]interface{}) error {
    for expireRes := range expireResSet {
        if err := m.pool.Stat(expireRes); err == nil {
            err = m.Release(nil, expireRes)
            if err != nil {
                return err
            }
        }
    }
    return nil
}

As the code above shows, what resourceManager actually operates on is the simpleObjectPool object. Let's look at what this pool does. First, its initialization:

// NewSimpleObjectPool return an object pool implement
func NewSimpleObjectPool(cfg Config) (ObjectPool, error) {
    if cfg.MinIdle > cfg.MaxIdle {
        return nil, ErrInvalidArguments
    }

    if cfg.MaxIdle > cfg.Capacity {
        return nil, ErrInvalidArguments
    }

    pool := &simpleObjectPool{
        factory:  cfg.Factory,
        inuse:    make(map[string]types.NetworkResource),
        idle:     newPriorityQueue(),
        maxIdle:  cfg.MaxIdle,
        minIdle:  cfg.MinIdle,
        capacity: cfg.Capacity,
        notifyCh: make(chan interface{}),
        tokenCh:  make(chan struct{}, cfg.Capacity),
    }

    if cfg.Initializer != nil {
        if err := cfg.Initializer(pool); err != nil {
            return nil, err
        }
    }

    if err := pool.preload(); err != nil {
        return nil, err
    }

    log.Infof("pool initial state, capacity %d, maxIdle: %d, minIdle %d, idle: %s, inuse: %s",
        pool.capacity,
        pool.maxIdle,
        pool.minIdle,
        queueKeys(pool.idle),
        mapKeys(pool.inuse))

    go pool.startCheckIdleTicker()

    return pool, nil
}

You can see that creation initializes the member variables in turn from the config, where:

  • factory is used to allocate network devices. It calls the ECS SDK to allocate resources and stores the result in the pool afterwards; the concrete implementation is eniIPFactory.
  • inuse stores every networkResource currently in use.
  • idle stores every networkResource that is currently free, i.e. already allocated by the factory but not yet actually used by a pod. When a networkResource is no longer in use, it is returned to idle. This gives the pool a degree of buffering and avoids calling the factory for every allocation and release. idle is of type priorityQueue: all free networkResources are kept in a priority queue whose comparison function compares the reverse field. reverse defaults to the enqueue time, i.e. the time the networkResource was released, so a freshly released IP tends not to be reused immediately. The reverse field also gets special treatment for StatefulSet resources: because a StatefulSet is a stateful workload, IP release is handled specially to keep the address reusable by the same pod wherever possible.
  • maxIdle and minIdle are the maximum and minimum number of entries allowed in the idle queue. minIdle provides a certain buffering capacity, although that value is not guaranteed; maxIdle mainly prevents caching too much. If too many free networkResources sit unused, some of them are released: IP addresses are more than a node-level resource, since they also consume quota in the whole vpc/vswitch/security group, so too many idle addresses could prevent other nodes or cloud products from being allocated IPs.
  • capacity is the pool's capacity, i.e. the maximum number of networkResources that can be allocated. The value can be set by the user, but if it exceeds the maximum the ECS instance type allows, it is capped at that maximum.
  • tokenCh is a buffered channel whose capacity is the capacity value above, used as a token bucket. The pool fills it with elements during initialization; from then on, being able to read an element from the channel means the pool is not yet full. Every call to the factory to allocate a networkResource first reads an element from the channel, and every call to the factory to release a network device puts an element back.
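The tokenCh pattern in the last bullet is a standard Go idiom. Here is a minimal, self-contained sketch of it, independent of terway's actual types (tokenPool, tryAcquire, and release are illustrative names): a buffered channel starts full, a successful receive consumes capacity, and a send returns it.

```go
package main

import "fmt"

// tokenPool sketches how the pool bounds total allocations with a buffered
// channel used as a token bucket.
type tokenPool struct {
	tokenCh chan struct{}
}

func newTokenPool(capacity int) *tokenPool {
	p := &tokenPool{tokenCh: make(chan struct{}, capacity)}
	for i := 0; i < capacity; i++ {
		p.tokenCh <- struct{}{} // fill the bucket at init time
	}
	return p
}

// tryAcquire consumes a token if one is available; a successful read
// means the pool is not yet at capacity.
func (p *tokenPool) tryAcquire() bool {
	select {
	case <-p.tokenCh:
		return true
	default:
		return false
	}
}

// release returns a token after a device is destroyed.
func (p *tokenPool) release() { p.tokenCh <- struct{}{} }

func main() {
	p := newTokenPool(2)
	fmt.Println(p.tryAcquire(), p.tryAcquire(), p.tryAcquire()) // true true false
	p.release()
	fmt.Println(p.tryAcquire()) // true again after a release
}
```

Using a channel rather than a counter plus mutex also lets callers block on `<-p.tokenCh` when they prefer to wait for capacity instead of failing fast.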

Once the member variables are initialized, the Initializer is called. It invokes a closure defined in newENIIPResourceManager: at startup, resourceManager reads the data stored on local disk (the information in resourceDB) to learn which networkResources are currently in use, then fetches all current eni devices and IPs from ecs, and iterates over all the IPs, determining whether each is in use and initializing inuse and idle accordingly. This ensures that the in-memory pool data can be rebuilt after a restart.
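The rebuild step can be reduced to a small pure function. This is a sketch under assumed names (rebuildPool is not terway's function): the in-use set comes from resourceDB, the full IP list comes from ECS, and each IP lands on one side of the pool.

```go
package main

import "fmt"

// rebuildPool sketches the Initializer closure in newENIIPResourceManager:
// IDs recorded as in use in the local resourceDB go to the inuse side of
// the pool, every other IP known to ECS goes to the idle side.
func rebuildPool(dbInUse map[string]bool, ecsIPs []string) (inuse, idle []string) {
	for _, ip := range ecsIPs {
		if dbInUse[ip] {
			inuse = append(inuse, ip)
		} else {
			idle = append(idle, ip)
		}
	}
	return inuse, idle
}

func main() {
	db := map[string]bool{"192.168.0.10": true}                     // read from resourceDB on disk
	ecs := []string{"192.168.0.10", "192.168.0.11", "192.168.0.12"} // listed via the ECS API
	inuse, idle := rebuildPool(db, ecs)
	fmt.Println(inuse, idle)
}
```

Because ECS is treated as the authority on which IPs exist and resourceDB only on which are in use, an IP that was released but never recorded still ends up in idle rather than being lost.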

Then preload is called; it ensures that the pool's idle queue holds at least minIdle free elements, preventing a flood of factory calls at startup.
Finally, go pool.startCheckIdleTicker() starts a goroutine that periodically runs checkIdle to check whether the idle queue holds more than maxIdle elements; if so, the excess is released through the factory. Each factory call also signals this goroutine through notifyCh to trigger a check.
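The trimming step of checkIdle can be sketched as below. This is an illustration, not terway's code: a plain slice stands in for the priority queue, and dispose stands in for the factory's release path.

```go
package main

import "fmt"

// checkIdle sketches the periodic trim: while the idle queue holds more
// than maxIdle entries, pop the oldest and hand it to the factory to
// release. Returns the remaining idle entries.
func checkIdle(idle []string, maxIdle int, dispose func(string)) []string {
	for len(idle) > maxIdle {
		res := idle[0] // oldest entry, i.e. the queue head
		idle = idle[1:]
		dispose(res)
	}
	return idle
}

func main() {
	var disposed []string
	idle := checkIdle([]string{"ip-1", "ip-2", "ip-3", "ip-4"}, 2,
		func(r string) { disposed = append(disposed, r) })
	fmt.Println(idle, disposed) // [ip-3 ip-4] [ip-1 ip-2]
}
```

Releasing from the head of the queue pairs naturally with the reverse field described earlier: the entries that have been idle longest (and are least likely to be wanted back) are released first.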

After the pool structure is initialized, all of resourceManager's operations on networkResources go through the pool, and the pool calls the factory to allocate or release as needed.

The concrete implementation of factory is eniIPFactory, which calls the ecs SDK to apply for and release eniIPs and maintains the corresponding data structures. Unlike the mode that uses eni devices directly, in ENIMultiIP mode each eni carries multiple eniIPs; an eni device is identified by the ENI struct and an eniIP by the ENIIP struct. terway creates a goroutine for every ENI, and all allocation and release of that ENI's eniIPs happens inside that goroutine; the factory communicates with it over channels. Each goroutine has a receive channel, ipBacklog, used to pass allocation requests to it. Whenever the factory needs to create (eniIPFactory.Create) an eniIP, it iterates over the existing ENI devices; if a device still has free eniIP slots, the factory sends an element down that device's ipBacklog channel to request an allocation, and once the goroutine has allocated the eniIP it notifies the factory through the factory's resultChan, completing the allocation. If every ENI's eniIPs are fully allocated, a new ENI device and its corresponding goroutine are created first. Because every ENI device has a primary IP, the first allocation on a new ENI does not need to go through ipBacklog; the primary IP is returned directly. Release (Dispose) is the mirror image: eniIPs are released first, and only when a single eniIP (the primary one) remains is the whole ENI device released. All ecs calls are rate-limited through a buffered channel to prevent bursts of calls.
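The per-ENI worker pattern can be sketched as follows. This is a simplified illustration with invented names (runENIWorker, allocResult); the real eniIPFactory carries richer request and result types and calls the ECS SDK where the comment indicates.

```go
package main

import "fmt"

// allocResult is a trimmed stand-in for what flows back on the factory's
// result channel after a goroutine finishes an allocation.
type allocResult struct {
	eni string
	ip  string
}

// runENIWorker sketches the per-ENI goroutine: one allocation request
// arrives per element on ipBacklog, and each produces one result.
func runENIWorker(eni string, ipBacklog <-chan struct{}, results chan<- allocResult) {
	n := 0
	for range ipBacklog {
		n++
		// the real code calls the ECS SDK here to assign a secondary
		// private IP on this specific ENI
		results <- allocResult{eni: eni, ip: fmt.Sprintf("%s-ip-%d", eni, n)}
	}
}

func main() {
	backlog := make(chan struct{})
	results := make(chan allocResult)
	go runENIWorker("eni-1", backlog, results)

	backlog <- struct{}{} // factory asks this ENI for one more eniIP
	r := <-results        // factory's resultChan side of the handshake
	fmt.Println(r.eni, r.ip)
	close(backlog) // shutting down the ENI stops its worker
}
```

Confining each ENI's ECS calls to a single goroutine serializes operations per device without any locking, while different ENIs still allocate in parallel.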

Summary

All in all, terway's implementation is clearly structured and quite extensible. Later on, this makes it easier for us to build customization and operations support on top of it, and to integrate it into the company's infrastructure.

Read More

terway design Doc

Copyright notice
This article was written by [gaorong404]. Please include a link to the original when reposting. Thanks.