Amazon VPC CNI plugin increases pods per node limits

As of August 2021, the Amazon VPC Container Network Interface (CNI) plugin supports “prefix assignment mode”, enabling you to run more pods per node on AWS Nitro-based EC2 instance types. To achieve higher pod density, the VPC CNI plugin leverages a new VPC capability that enables IP address prefixes to be associated with elastic network interfaces (ENIs) attached to EC2 instances. You can now assign /28 (16 IP addresses) IPv4 address prefixes to network interfaces, instead of assigning individual secondary IPv4 addresses. This significantly increases the number of pods that can run per node.

In this post, we’ll look under the hood at how the feature is implemented, walk through how to configure VPC CNI with prefix assignment enabled, and discuss use cases and key considerations.

Amazon VPC CNI prefix assignment mode

Pods are the smallest deployable units of computing that can be created and managed in Kubernetes. A pod requires a unique IP address to communicate in the Kubernetes cluster (host networking pods being an exception). By default, Amazon Elastic Kubernetes Service (EKS) runs the VPC Container Network Interface (CNI) plugin, which assigns an IP address to each pod by managing network interfaces and IP addresses on EC2 instances. The VPC CNI plugin integrates directly with EC2 networking to provide high-performance, low-latency container networking in Kubernetes clusters running on AWS. This plugin assigns an IP address from the cluster’s VPC to each pod. By default, the number of IP addresses available to assign to pods is based on the maximum number of elastic network interfaces and secondary IPs per interface that can be attached to an EC2 instance type.

With prefix assignment mode, the maximum number of elastic network interfaces per instance type remains the same, but you can now configure Amazon VPC CNI to assign /28 (16 IP addresses) IPv4 address prefixes, instead of assigning individual IPv4 addresses to network interfaces. The pods are assigned an IPv4 address from the prefix assigned to the ENI.

How it works

Amazon VPC CNI is deployed on worker nodes as a Kubernetes DaemonSet named aws-node. The plugin consists of two primary components: the Local IP Address Management (L-IPAM) daemon and the CNI plugin.

L-IPAM daemon (IPAMD) is responsible for creating and attaching network interfaces to worker nodes, assigning prefixes to network interfaces, and maintaining a warm pool of IP prefixes on each node for assignment to pods as they are scheduled.
The CNI plugin is responsible for wiring the host network (for example, configuring the network interfaces and virtual ethernet pairs) and adding the correct network interface to a pod’s namespace. The CNI plugin communicates with IPAMD via Remote Procedure Calls.

The following commands are examples of EC2 API calls that show how VPC CNI interacts with the EC2 control plane.

Under the hood, when a worker node initializes, IPAMD requests that EC2 assign a CIDR block prefix to the primary ENI.

aws ec2 assign-private-ip-addresses --network-interface-id eni-38664474 --ipv4-prefix-count 1 --secondary-private-ip-address-count 0

As IP needs increase, more prefixes are requested for the existing ENI.

aws ec2 assign-private-ip-addresses --network-interface-id eni-38664474 --ipv4-prefix-count 1 --secondary-private-ip-address-count 0

When the number of prefixes for an ENI reaches the limit, secondary ENIs will be allocated.

aws ec2 create-network-interface --subnet-id subnet-9d4a7abc --ipv4-prefix-count 1

IPAMD tracks how many IPs are consumed in each prefix, and unassigns a prefix once none of its IPs are in use.

aws ec2 unassign-private-ip-addresses --network-interface-id eni-38664473 --ipv4-prefixes <prefix-cidr>

Reserving space within a subnet specifically for prefixes can also be useful if there are special requirements to avoid conflicts with other AWS services such as EC2, or to minimize fragmentation within a subnet (we’ll discuss fragmentation in detail later in this post). The Amazon VPC CNI plugin will automatically use the reserved prefix for IP address assignment. Note that the VPC CNI plugin does not make the API call below automatically; you need to perform it out of band.

aws ec2 create-subnet-cidr-reservation --subnet-id subnet-1234 --reservation-type prefix --cidr 69.89.31.0/24

When prefix mode is enabled, the following diagram illustrates the pod IP allocation process.

Getting started

In this section, we will create an EKS cluster and configure prefix assignment mode. Prefix assignment mode works with VPC CNI version 1.9.0 or later.
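
Because prefix assignment mode needs VPC CNI 1.9.0 or later, it can help to check the installed version before enabling it. The following is a minimal sketch, not part of the original walkthrough: `version_ge` is a hypothetical helper that relies on GNU `sort -V`, and the commented-out cluster query assumes the aws-node container is the first container in the DaemonSet and that its image tag looks like `v1.9.0-eksbuild.1`.

```shell
# version_ge A B: succeeds when version A >= version B (requires GNU sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Against a live cluster (assumption: image tag formatted like v1.9.0-eksbuild.1):
#   tag=$(kubectl get daemonset aws-node -n kube-system \
#     -o jsonpath='{.spec.template.spec.containers[0].image}' | sed 's/.*:v//; s/-.*//')
#   version_ge "$tag" 1.9.0 && echo "prefix assignment supported"

# Offline illustration with literal version strings.
version_ge 1.10.1 1.9.0 && echo "1.10.1 is new enough"
version_ge 1.7.5 1.9.0 || echo "1.7.5 is too old"
```

Comparing with `sort -V` rather than plain string comparison matters here, since a lexical comparison would rank 1.10.1 below 1.9.0.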

Create an EKS cluster

Use eksctl to create a cluster. Make sure you are using the latest version of eksctl for this example. Note we are instructing eksctl to automatically discover and install the latest version of VPC CNI through EKS add-ons. Copy the following configuration and save it to a file called cluster.yaml:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: pd-cluster
  region: us-west-2

iam:
  withOIDC: true

addons:
  - name: vpc-cni
    version: latest

eksctl create cluster -f cluster.yaml

Enabling prefix assignment mode

Use the parameter ENABLE_PREFIX_DELEGATION to configure the VPC CNI plugin to assign prefixes to network interfaces.

kubectl set env daemonset aws-node -n kube-system ENABLE_PREFIX_DELEGATION=true

Confirm that the environment variable is set.

kubectl describe daemonset -n kube-system aws-node | grep ENABLE_PREFIX_DELEGATION

Scale faster, preserve IPv4 addresses

The Amazon VPC CNI supports setting either WARM_PREFIX_TARGET, or one or both of WARM_IP_TARGET and MINIMUM_IP_TARGET. The recommended configuration (and the default in the installation manifest) is to set WARM_PREFIX_TARGET to 1. You can set this value manually using the command below if your manifest does not already have it set.

kubectl set env ds aws-node -n kube-system WARM_PREFIX_TARGET=1

In prefix assignment mode, you cannot set WARM_PREFIX_TARGET, WARM_IP_TARGET, or MINIMUM_IP_TARGET to zero. The right values for these parameters depend on your use case. If set, WARM_IP_TARGET and/or MINIMUM_IP_TARGET take precedence over WARM_PREFIX_TARGET.

With the default setting, WARM_PREFIX_TARGET will allocate one additional complete (/28) prefix even if the existing prefix is used by only one pod. If the ENI does not have enough space to assign a prefix, a new ENI is generated. When a new ENI is created, IPAMD determines how many prefixes are required to maintain the WARM_PREFIX_TARGET, and then allocates those prefixes. New ENIs will be attached only when all prefixes assigned to existing ENIs have been exhausted.

In most cases, the recommended value of 1 for WARM_PREFIX_TARGET provides a good balance between fast pod launch times and few unused IP addresses assigned to the instance. This behavior is an improvement over the default WARM_ENI_TARGET of 1 in individual secondary IP address mode, where the number of IP addresses allocated to a node depends on the instance type. For example, with a c5.18xlarge, the maximum of 50 IP addresses per ENI would be allocated by default. With prefix assignment, IP addresses are always allocated in /28 (16 IP address) chunks, independent of instance type.

If you have a need to further conserve IPv4 addresses per node, with only a minor performance penalty to pod launch time, you can instead use WARM_IP_TARGET and MINIMUM_IP_TARGET settings, which override WARM_PREFIX_TARGET if set. By setting WARM_IP_TARGET to a value less than 16, you can prevent IPAMD from keeping one full free prefix attached.

For a concrete example, imagine a user who estimates that, at steady state, they will have 25 pods deployed per node. They are using m5.large instances, which support 3 ENIs and 9 prefixes per ENI. This user is operating in a VPC with limited IPv4 address space and is willing to sacrifice some pod launch time performance to minimize unused IP addresses per node, so they decide to set WARM_IP_TARGET and MINIMUM_IP_TARGET instead of relying on the default recommended behavior of WARM_PREFIX_TARGET set to 1. They set MINIMUM_IP_TARGET to 25, because they expect at least 25 pods to be scheduled on every node as a baseline. They also set WARM_IP_TARGET to 5. Keep in mind that with prefix assignment, IPAMD can only allocate IPv4 addresses in chunks of 16.

When a node is started in their cluster, IPAMD will allocate 2 prefixes (32 IP addresses) to the primary ENI to satisfy the MINIMUM_IP_TARGET of 25 (this user is not using CNI custom networking, so only the primary ENI is used in this example). Now 25 pods get scheduled to the worker node, and IPAMD takes no action, because enough IP addresses are already allocated to meet the pod requirements, and there are 7 unused IP addresses, which satisfies the WARM_IP_TARGET requirement of 5. Next, their application experiences a spike in traffic, and an additional 12 pods are scheduled to the worker node to meet the demand. When the 3rd of these pods gets scheduled, IPAMD will calculate that there are only 4 free IP addresses left on the node, fewer than the WARM_IP_TARGET of 5, and will call EC2 to attach an additional prefix to the ENI. 7 of the 12 additional pods will immediately be given IP addresses from the existing pool; however, the last 5 pods may see a slight delay in starting as the additional prefix gets attached. Once this new steady state is reached, the worker node will still have only the primary ENI attached, with 3 prefixes, for a total of 48 IPs allocated, and 37 pods running on the node.
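
The arithmetic in this example can be reproduced with a small sketch. It is a simplified model and not IPAMD’s actual code: the hypothetical `prefixes_needed` function assumes IPAMD keeps at least max(pods + WARM_IP_TARGET, MINIMUM_IP_TARGET) addresses allocated, rounded up to whole /28 prefixes.

```shell
# prefixes_needed PODS MINIMUM_IP_TARGET WARM_IP_TARGET
# Simplified model: keep at least max(PODS + WARM_IP_TARGET, MINIMUM_IP_TARGET)
# addresses allocated, rounded up to whole /28 prefixes (16 addresses each).
prefixes_needed() {
  want=$(( $1 + $3 ))
  if [ "$want" -lt "$2" ]; then want=$2; fi
  echo $(( (want + 15) / 16 ))
}

# Walk through the pod counts from the example (MINIMUM_IP_TARGET=25, WARM_IP_TARGET=5).
for pods in 0 25 27 28 37; do
  n=$(prefixes_needed "$pods" 25 5)
  echo "pods=$pods prefixes=$n allocated_ips=$(( n * 16 ))"
done
```

Under this model the node stays at 2 prefixes through 27 pods and moves to 3 prefixes (48 IPs) at the 28th pod, which is the 3rd pod of the traffic spike described above.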

For more details on the use cases and examples of combinations of these settings, see the VPC CNI documentation. Note that MINIMUM_IP_TARGET is not required when using WARM_IP_TARGET, but it is recommended if you have an expected baseline of pods per node in your cluster. If it were not set in the example above, IPAMD would have allocated only a single prefix on node launch, and an additional prefix would have been attached during the initial scheduling of the 25 pods. An additional network interface only needs to be attached if the maximum number of prefixes is reached on an existing ENI. In the case of this m5.large example, that’s 9 prefixes, or 144 IP addresses, so it’s highly unlikely you’ll need an additional ENI. There is no performance impact from running all pods through one ENI, because traffic still runs through a single underlying network card (only the recently launched p4d.24xlarge instance type supports more than one network card).

It’s more performant to use WARM_IP_TARGET and MINIMUM_IP_TARGET with prefix assignment mode than to set these variables in the older individual secondary IP address networking mode. In that networking mode, you get one-at-a-time control over IP address allocation, but setting WARM_IP_TARGET can drastically increase the number of EC2 API calls needed to achieve that fine-grained allocation, resulting in API throttling and delays attaching any new ENIs or IPs to worker nodes. Allocating an additional prefix to an existing ENI is a faster EC2 API operation than creating and attaching a new ENI to the instance, which gives you better performance characteristics while being frugal with IPv4 address allocation. Attaching a prefix typically completes in under a second, whereas attaching a new ENI can take up to 10 seconds. For most use cases, IPAMD will only need a single ENI per worker node when running in prefix assignment mode. If you can afford (in the worst case) up to 15 unused IPs per node, we strongly recommend using the newer prefix assignment networking mode and realizing the performance and efficiency gains that come with it.

Calculate max pods

When using VPC CNI, the maximum number of pods that can run per node depends on VPC CNI settings. As part of this launch, we’ve updated EKS managed node groups to automatically calculate and set the recommended max pods value based on instance type and VPC CNI configuration values, as long as you are using at least VPC CNI version 1.9. The max pods value will be set on any newly created managed node groups, or node groups updated to a newer AMI version. This helps both prefix assignment use cases and CNI custom networking, where you previously needed to manually set a lower max pods value.

Important: Managed node groups look for the VPC CNI plugin installed in the kube-system namespace with daemonset name aws-node and container name aws-node. If you’ve customized your VPC CNI installation to run elsewhere, the managed node groups automatic max pods calculation will be skipped, and the default value built into the EKS optimized AMI will remain set.

If you are using self-managed node groups or a managed node group with a custom AMI ID, you must manually compute the recommended max pods value. Let’s see how the max pods value is calculated in prefix assignment mode. Note that the total number of prefixes and private IP addresses is limited by the number of private IPs allowed on the instance. For example, each ENI on an m5.large instance has a limit of 10 slots, one of which is needed for the ENI’s primary IP address, which leaves 9 slots that can be used for /28 prefixes.

You can use the following formula to determine the maximum number of pods you can deploy on a node when Prefix IP mode is enabled.

Number of network interfaces for the instance type × (number of slots per network interface - 1) × 16
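
As a quick check of the formula, here is a minimal sketch using shell arithmetic. The function name `max_prefix_ips` is hypothetical, and the m5.large inputs (3 ENIs, 10 slots per ENI) are the figures quoted in this post.

```shell
# max_prefix_ips ENIS SLOTS_PER_ENI
# One slot per ENI holds the primary IP; each remaining slot can hold a /28 prefix.
max_prefix_ips() {
  echo $(( $1 * ($2 - 1) * 16 ))
}

max_prefix_ips 3 10   # m5.large: 3 ENIs, 10 slots per ENI; prints 432
```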

For backwards compatibility reasons, the default max pods value per instance type in the EKS optimized Amazon Linux AMI will not change. When using prefix attachments with smaller instance types like the m5.large, you’re likely to exhaust the instance’s CPU and memory resources long before you exhaust its IP addresses, and max_pods might differ from the result of the above formula.

To help simplify this process for users of self-managed node groups and managed node groups with custom AMIs, we’ve introduced a max-pods-calculator.sh script that finds the Amazon EKS recommended maximum number of pods based on your instance type and VPC CNI configuration settings.

Download max-pods-calculator.sh.

curl -o max-pods-calculator.sh https://raw.githubusercontent.com/awslabs/amazon-eks-ami/master/files/max-pods-calculator.sh

Mark the script as executable on your computer.

chmod +x max-pods-calculator.sh

Run the script to find the max_pods value.

./max-pods-calculator.sh --instance-type m5.large --cni-version 1.9.0 --cni-prefix-delegation-enabled

The maximum number of pods recommended by Amazon EKS for an m5.large instance is 110.

The number of IPv4 addresses that can be attached to an m5.large with prefix assignment enabled is actually much higher: 3 ENIs × 9 prefixes per ENI × 16 IPs per prefix = 432 IPs. However, the max pods calculator script limits the return value to 110 based on Kubernetes scalability thresholds and recommended settings. If your instance type has more than 30 vCPUs, this limit jumps to 250, a number based on internal EKS scalability team testing. Prefix assignment mode is especially relevant for users of CNI custom networking, where the primary ENI is not used for pods. With prefix assignment, you can still attach at least 110 IPs on nearly every Nitro instance type, even without the primary ENI used for pods. In the example above, an m5.large with CNI custom networking and prefix assignment enabled can still be allocated 288 IPv4 addresses.
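
The capping rule described above can be sketched as follows. This is an illustration of the stated thresholds only, not the logic of max-pods-calculator.sh itself; `recommended_max_pods` is a hypothetical name, and the inputs are the address counts and m5.large vCPU count (2) used in this post.

```shell
# recommended_max_pods IPS VCPUS
# Caps the address count at 110, or at 250 when the instance has more than 30 vCPUs.
recommended_max_pods() {
  if [ "$2" -gt 30 ]; then cap=250; else cap=110; fi
  if [ "$1" -lt "$cap" ]; then echo "$1"; else echo "$cap"; fi
}

recommended_max_pods 432 2   # m5.large, all three ENIs used for pods; prints 110
recommended_max_pods 288 2   # m5.large with CNI custom networking; prints 110
```

In both cases the address capacity is well above the cap, which is why custom networking no longer lowers the recommended value on this instance type.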

Create a managed node group

Prefix assignment mode is supported on AWS Nitro-based EC2 instance types. Choose one of the Amazon EC2 Nitro instance types running Amazon Linux 2. This capability is not supported on Windows.

eksctl create nodegroup \
    --cluster pd-cluster \
    --region us-west-2 \
    --name pg-nodegroup \
    --node-type m5.large \
    --nodes 1

Deploy Sample Application

Now let’s deploy a sample NGINX application with 80 replicas to demonstrate prefix IP assignment.

cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 80
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: public.ecr.aws/nginx/nginx:1.21
        ports:
        - containerPort: 80
EOF

List the pods created by the deployment:

kubectl get pods -l app=nginx

You can now open Amazon EC2 in the AWS Management Console and navigate to the instance pd-cluster-pg-nodegroup-Node. Under the Networking tab, you’ll notice IPv4 prefixes assigned to the ENI. Prefix and ENI counts may vary according to the number of replicas/pods. In this case, the replica count is 80, which requires a minimum of five /28 prefixes.

Let’s scale the deployment to observe a new IPv4 prefix being added to the ENI.

kubectl scale deployment.v1.apps/nginx-deployment --replicas=110

In the image above, you will see a new IPv4 prefix attached to the ENI.

Key considerations

Subnet fragmentation

When EC2 allocates a /28 IPv4 prefix to an ENI, it has to be a contiguous block of IP addresses from your subnet. If the subnet that the prefix is generated from is fragmented (highly used subnet with scattered secondary IP addresses), the prefix attachment may fail, and you will see the following error message in VPC CNI logs:

failed to allocate a private IP/Prefix address: InsufficientCidrBlocks: The specified subnet does not have enough free cidr blocks to satisfy the request
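
To see why a subnet with plenty of free addresses can still fail this way, consider a simplified sketch. It models a single /24 subnet as 16 aligned blocks of 16 addresses and counts the blocks with no address in use; the function name `free_prefix_blocks` is hypothetical, and the model ignores the addresses AWS reserves in every subnet.

```shell
# free_prefix_blocks USED...
# Counts the 16-address aligned blocks of a /24 subnet that contain none of the
# used host offsets (0-255) passed as arguments.
free_prefix_blocks() {
  free=0
  block=0
  while [ "$block" -lt 16 ]; do
    taken=0
    for addr in "$@"; do
      if [ $(( addr / 16 )) -eq "$block" ]; then taken=1; fi
    done
    if [ "$taken" -eq 0 ]; then free=$(( free + 1 )); fi
    block=$(( block + 1 ))
  done
  echo "$free"
}

# 16 addresses packed into one block leave 15 blocks free for prefixes.
free_prefix_blocks 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
# The same number of addresses scattered one per block leaves none.
free_prefix_blocks 5 21 37 53 69 85 101 117 133 149 165 181 197 213 229 245
```

Both cases use only 16 of 256 addresses, yet the scattered layout leaves no room for a single /28 prefix.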

To avoid fragmentation and have sufficient contiguous space for creating prefixes, you can use VPC Subnet CIDR reservations, a feature introduced recently along with prefix assignment. With this feature, you can reserve IP space within a subnet for exclusive use by prefixes. EC2 will not use the space to assign individual IP addresses to ENIs or EC2 instances, avoiding fragmentation of this space. Once you create a reservation, the VPC CNI plugin will call EC2 APIs to assign prefixes that are automatically allocated from the reserved space.

You can create a prefix reservation even if some of the space in the reservation range is currently taken up by secondary IP addresses. Once IP addresses from that space are released, EC2 won’t reassign them. As a best practice, we recommend creating a new subnet, reserving space for prefixes, and then enabling prefix assignment with VPC CNI for worker nodes running in that subnet. If the new subnet is dedicated only to pods running in your EKS cluster with VPC CNI prefix assignment enabled, then you can skip the prefix reservation step.

Upgrade/downgrade behavior

Prefix mode works with VPC CNI version 1.9.0 and later. Downgrading of Amazon VPC CNI add-on to a version lower than 1.9.0 must be avoided once the prefix mode is enabled and prefixes are assigned to ENIs. You must delete and recreate nodes if you decide to downgrade the VPC CNI.

To increase the number of available IP addresses, we highly recommend creating new nodes and node groups. Then cordon and drain all the existing nodes to safely evict your existing pods. Pods on the new nodes will be assigned IPs from the prefixes assigned to the ENI. After you confirm the pods are running, you can delete the old nodes and node groups.
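
The cordon-and-drain step could be scripted along these lines. This is a sketch under assumptions rather than a prescribed procedure: `drain_nodegroup` is a hypothetical helper, it assumes the old nodes belong to an EKS managed node group and therefore carry the `eks.amazonaws.com/nodegroup` label, and the drain flags shown may need adjusting for your workloads.

```shell
# drain_nodegroup NODEGROUP: cordon, then drain, every node in a managed node group.
# Assumes nodes carry the eks.amazonaws.com/nodegroup label applied by EKS managed node groups.
drain_nodegroup() {
  for node in $(kubectl get nodes -l "eks.amazonaws.com/nodegroup=$1" -o name); do
    # Stop new pods from being scheduled onto the node.
    kubectl cordon "$node"
    # Evict running pods; DaemonSet pods such as aws-node are left in place.
    kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
  done
}

# Usage (old-nodegroup is a placeholder name):
#   drain_nodegroup old-nodegroup
```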

Resource requests

Another important consideration is setting pod resource requests and limits. As a best practice, you should always set at least pod resource requests in your workload specifications. If you don’t set these values, you may see resource contention on nodes, given the increase in pods that can be scheduled with prefix assignment enabled. With prefix assignment, IPv4 addresses are no longer the limiting factor for pods per node when using the VPC CNI plugin.

Comparison to security groups for pods

Prefix assignment is a good networking mode choice if your workload requirements can be met with a node level security group configuration. With prefix assignment, multiple pods are shared across an ENI, and the security groups associated with that ENI. If you have requirements where each pod needs distinct security groups, then you can leverage the security groups for pods feature. You need to choose your networking mode/strategy based on your use case and requirements. If pod launch time and pod density on smaller instance types are important to you, and you can work with node level security groups, then use prefix assignment. If you have security requirements where pods need a specific set of security groups, then apply a SecurityGroupPolicy custom resource and pods will be allocated dedicated branch network interfaces with those security groups applied. A pod is wired up to the primary IP address of the branch interface. Currently, no secondary IP addresses or prefixes are allocated to branch interfaces.

접두사 할당은 워크로드 요구사항이 노드 수준 보안 그룹 구성으로 충족될 수 있는 경우 좋은 네트워킹 모드의 선택지이다. 접두사 할당을 사용하면 여러 파드들이 ENI 및 해당 ENI와 연결된 보안그룹에서 공유된다. 각 파드들이 고유한 보안 그룹이 필요한 요구사항이 있는 경우 파드 기능에 대한 보안그룹을 활용할 수 있다. 사용 사례와 요구사항에 대한 네트워크 모드나 전략을 선택할 필요가 있다. 더 작은 인스턴스 유형의 파드 실행 시간과 파드 밀도가 중요하고, 노드 수준의 보안 그룹으로 작업할 수 있는 경우 접두사 할당을 사용한다. 파드에 특정 보안그룹의 세트가 필요한 보안 요구사항이 있는 경우 사용자 정의 리소스인 SecurityGroupPolicy를 적용하면 해당 보안 그룹이 적용된 전용 브랜치 네트워크 인터페이스가 파드에 할당된다. 하나의 파드는 브랜치 인터페이스의 주 IP 주소에 연결된다. 현재 보조 IP 주소나 접두사는 브랜치 인터페이스에 할당되지 않는다.
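As an illustration of the SecurityGroupPolicy custom resource mentioned above, the following Python sketch builds a manifest and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The policy name, namespace, label, and security group ID are hypothetical placeholders. The `apiVersion` and field layout follow the SecurityGroupPolicy schema as I understand it, so verify them against the EKS user guide before use.

```python
import json


def security_group_policy(name, namespace, match_labels, group_ids):
    """Build a SecurityGroupPolicy manifest as a plain dictionary."""
    return {
        "apiVersion": "vpcresources.k8s.aws/v1beta1",
        "kind": "SecurityGroupPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Pods in this namespace carrying these labels get the security groups.
            "podSelector": {"matchLabels": dict(match_labels)},
            "securityGroups": {"groupIds": list(group_ids)},
        },
    }


# All values below are hypothetical placeholders.
policy = security_group_policy(
    name="db-clients",
    namespace="default",
    match_labels={"role": "db-client"},
    group_ids=["sg-0123456789abcdef0"],
)

manifest_json = json.dumps(policy, indent=2)
print(manifest_json)
```

Pods that do not match the selector keep using the node level security groups, so the policy can be scoped to only the workloads that need it.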

The max pods calculator script is not relevant for security groups for pods, because the maximum number of such pods per node is limited through Kubernetes extended resources instead. The number of branch network interfaces is advertised as an extended resource, and a webhook injects a branch interface resource request into any pod that matches a SecurityGroupPolicy. When a pod with a branch interface requirement is scheduled, a separate component, the VPC resource controller (running on the EKS control plane), calls an EC2 API to create a branch interface and attach it to the worker node. The maximum number of branch interfaces per instance type can be found in the EKS documentation, and more details on this behavior can be found in the launch blog. Note that pods allocated branch network interfaces and pods allocated an IP from an ENI prefix can co-exist on the same node.

노드당 최대 수는 쿠버네티스의 확장된 리소스를 통해 제한되기 때문에 최대 파드 계산기 스크립트는 파드의 보안그룹과 관련이 없다. 브랜치 네트워크 인터페이스의 수가 확장된 리소스만큼 보급되고 SecurityGroupPolicy에 해당되는 모든 파드는 브랜치 네트워크 리소스 요청에 대한 웹훅이 주입된다. 브랜치 네트워크 요구사항이 있는 파드가 스케쥴되면 별도의 구성요소인 VPC 리소스 컨트롤러(EKS 컨트롤 플레인에서 실행 중인)가 EC2 API를 호출하여 분기 인터페이스를 생성하고 워커 노드에 연결한다. 인스턴스 유형 별 브랜치 네트워크의 최대수는 EKS 문서에서 찾을 수 있고, 이 동작에 대한 더 자세한 내용은 출시 블로그에서 찾을 수 있다. 브랜치 네트워크 인터페이스가 할당된 파드와 ENI 접두사에서 IP가 할당된 파드는 동일한 노드에서 공존할 수 있다.
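The extended resource accounting described above can be sketched as a toy model in Python. This is a simplified simulation, not the actual webhook or scheduler code: the resource name `vpc.amazonaws.com/pod-eni` is, to my knowledge, the one the VPC resource controller advertises, and the node capacity, labels, and pod names are hypothetical.

```python
from dataclasses import dataclass, field

# Extended resource name assumed to represent branch network interfaces.
POD_ENI = "vpc.amazonaws.com/pod-eni"


@dataclass
class Node:
    name: str
    branch_capacity: int  # branch interfaces advertised by the node
    branch_in_use: int = 0
    pods: list = field(default_factory=list)


def matches(policy_labels: dict, pod_labels: dict) -> bool:
    """True when the pod carries every label the policy selects on."""
    return all(pod_labels.get(k) == v for k, v in policy_labels.items())


def mutate(pod: dict, policy_labels: dict) -> dict:
    """Webhook step: add a branch interface request to matching pods."""
    out = {**pod, "requests": dict(pod.get("requests", {}))}
    if matches(policy_labels, pod["labels"]):
        out["requests"][POD_ENI] = 1
    return out


def schedule(node: Node, pod: dict) -> bool:
    """Scheduler step: admit the pod only if enough branch interfaces remain."""
    need = pod["requests"].get(POD_ENI, 0)
    if node.branch_in_use + need > node.branch_capacity:
        return False
    node.branch_in_use += need
    node.pods.append(pod["name"])
    return True


# Hypothetical node that advertises two branch interfaces.
node = Node(name="node-1", branch_capacity=2)
policy_labels = {"role": "db-client"}
pods = [
    {"name": "a", "labels": {"role": "db-client"}},
    {"name": "b", "labels": {"role": "db-client"}},
    {"name": "c", "labels": {"role": "db-client"}},
    {"name": "d", "labels": {"role": "web"}},
]

results = [schedule(node, mutate(p, policy_labels)) for p in pods]
print(results)
print(node.pods)
```

In this model the third matching pod is rejected because both branch interfaces are taken, while the non-matching pod is still admitted. That reflects the point that pods on branch interfaces and pods using prefix-assigned IPs can share a node.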

At the moment, there is no option that gives you the best of both worlds: pod level security groups together with high pod density on small instance types and fast pod launch time. One potential idea, which we are researching, is described in this containers roadmap issue. However, it turns into a tricky scheduling problem.

현재로서는 작은 인스턴스 유형의 고밀도 파드 수준 보안그룹과 빠른 파드 실행 시간이라는 두가지 장점을 모두 누릴 수 있는 옵션은 없다. 우리가 연구하고 있는 이 컨테이너 로드맵 문제에 한가지 잠재적인 아이디어가 설명되어 있지만 이는 까다로운 스케쥴링 문제가 된다.

Cleanup

정리하기

To avoid future costs, delete the Amazon EKS cluster created for this exercise. This action, in addition to deleting the cluster, will also delete the node group.

비용 낭비를 피하기 위해 이 예제를 위해 생성한 Amazon EKS 클러스터를 제거한다. 아래 명령은 클러스터를 제거하는 것에 더해 노드 그룹 또한 제거한다.

eksctl delete cluster --name pd-cluster

Conclusion

결론

In the container roadmap issues 138 and 1557, you asked us for more details on prefix assignment and the various configuration options possible with the Amazon VPC CNI plugin. In this post, we have discussed prefix assignment mode in detail, summarized VPC CNI workflows, and described installation and setup choices. The key considerations and use cases sections provide guidance on using prefix assignment mode.

컨테이너 로드맵 이슈 138과 1557에서 여러분들이 접두사 할당과 Amazon VPC CNI 플러그인으로 가능한 다양한 구성 옵션에 대한 자세한 내용을 요청했었다. 이 게시글에서 우리는 접두사 할당 모드에 대해 자세히 설명하고, VPC CNI 워크플로우를 요약하고, 설치와 설정 선택에 대해 설명했다. 주요 고려사항과 사용 사례 섹션에서는 접두사 할당 모드에 대한 가이드를 제공했다.

We covered how to use prefix assignment mode on Amazon EKS to increase pod density. We are also constantly learning from your use of prefix assignment, and we plan to improve the VPC CNI’s graceful retry methods so that it handles prefix assignment failures on fragmented subnets.

우리는 파드 밀도를 증가시키기 위한 Amazon EKS에서 접두사 할당 모드를 사용하는 방법에 대해 다뤘다. 또한 접두사 할당 실패를 해결하기 위해 세분화된 서브넷을 처리할 때 접두사 할당을 사용하고 VPC CNI의 우아한 재시도 방법으로 업그레이드할 계획을 지속적으로 배우고 있다.

Visit the EC2 user guide to learn more about assigning prefixes to Amazon EC2 network interfaces, and the VPC guide for subnet CIDR reservations. Please visit the Amazon EKS user guide for prefix assignment installation instructions and for any recent product improvements. You can provide feedback on the VPC CNI plugin, review our roadmaps, and suggest new features on the AWS Containers Roadmap, hosted on GitHub.

Amazon EC2 네트워크 인터페이스에 접두사를 할당하는 것에 대해 더 자세히 알아보려면 EC2 사용자 가이드를 방문하고, 서브넷 CIDR 예약에 대한 VPC 가이드에도 방문할 수 있다. 접두사 할당 설치 가이드와 제품의 최근 개선사항들에 대해서는 Amazon EKS 사용자 가이드를 참조할 수 있다. GitHub에서 호스팅되는 AWS 컨테이너 로드맵에서 VPC CNI 플러그인에 대한 피드백을 제공하고, 로드맵을 평가하고, 새로운 기능을 제안할 수 있다.

YongTrans
