AWS EKS NodeGroup "Create failed": Instances failed to join the kubernetes cluster


Adding another reason to the list:

In my case the nodes were running in private subnets and I hadn't configured a private endpoint under API server endpoint access.

After the update, the node groups weren't updated automatically, so I had to recreate them.
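For reference, enabling the private endpoint looks roughly like this in Terraform. This is a minimal sketch, not the configuration from this answer: the cluster name, IAM role, and subnet references are placeholders.

resource "aws_eks_cluster" "this" {
  name     = "dev-cluster"            # placeholder cluster name
  role_arn = aws_iam_role.cluster.arn # assumes a cluster IAM role defined elsewhere

  vpc_config {
    subnet_ids              = [aws_subnet.dev-private-subnet.id] # placeholder subnet
    endpoint_private_access = true  # lets nodes in private subnets reach the API server
    endpoint_public_access  = true  # keep or disable depending on how you access the cluster
  }
}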


I noticed there was no answer here, but about 2k visits to this question over the last six months. There seem to be a number of reasons why you could be seeing these failures. To regurgitate the AWS documentation found here: https://docs.aws.amazon.com/eks/latest/userguide/troubleshooting.html

  • The aws-auth-cm.yaml file does not have the correct IAM role ARN for your nodes. Ensure that the node IAM role ARN (not the instance profile ARN) is specified in your aws-auth-cm.yaml file. For more information, see Launching self-managed Amazon Linux nodes. (A Terraform sketch of this ConfigMap follows this list.)

  • The ClusterName in your node AWS CloudFormation template does not exactly match the name of the cluster you want your nodes to join. Passing an incorrect value to this field results in an incorrect configuration of the node's /var/lib/kubelet/kubeconfig file, and the nodes will not join the cluster.

  • The node is not tagged as being owned by the cluster. Your nodes must have the following tag applied to them, where <cluster-name> is replaced with the name of your cluster (see the node-group sketch after this list).

    Key:   kubernetes.io/cluster/<cluster-name>
    Value: owned
  • The nodes may not be able to access the cluster using a public IP address. Ensure that nodes deployed in public subnets are assigned a public IP address. If not, you can associate an Elastic IP address to a node after it's launched. For more information, see Associating an Elastic IP address with a running instance or network interface. If the public subnet is not set to automatically assign public IP addresses to instances deployed to it, then we recommend enabling that setting. For more information, see Modifying the public IPv4 addressing attribute for your subnet. If the node is deployed to a private subnet, then the subnet must have a route to a NAT gateway that has a public IP address assigned to it.

  • The STS endpoint for the Region that you're deploying the nodes to is not enabled for your account. To enable the region, see Activating and deactivating AWS STS in an AWS Region.

  • The worker node does not have a private DNS entry, resulting in the kubelet log containing a node "" not found error. Ensure that the VPC where the worker node is created has values set for domain-name and domain-name-servers as Options in a DHCP options set. The default values are domain-name:<region>.compute.internal and domain-name-servers:AmazonProvidedDNS. For more information, see DHCP options sets in the Amazon VPC User Guide.
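To illustrate the first bullet, here is a minimal Terraform sketch of the aws-auth ConfigMap. The role ARN and account ID are placeholders, not values from the documentation; the same mapRoles entry applies whether you manage the ConfigMap with kubectl or Terraform.

resource "kubernetes_config_map" "aws_auth" {
  metadata {
    name      = "aws-auth"
    namespace = "kube-system"
  }

  data = {
    # Must be the node IAM *role* ARN, not the instance profile ARN
    mapRoles = yamlencode([
      {
        rolearn  = "arn:aws:iam::111122223333:role/eks-node-role" # placeholder
        username = "system:node:{{EC2PrivateDNSName}}"
        groups   = ["system:bootstrappers", "system:nodes"]
      }
    ])
  }
}

And for the ownership tag, a sketch of how the kubernetes.io/cluster/<cluster-name> tag can be propagated to self-managed nodes through an Auto Scaling group. Resource names and sizes are placeholders, and the launch template is assumed to exist elsewhere.

resource "aws_autoscaling_group" "nodes" {
  name                = "dev-cluster-nodes"                # placeholder
  min_size            = 1
  max_size            = 3
  vpc_zone_identifier = [aws_subnet.dev-private-subnet.id] # placeholder subnet

  launch_template {
    id      = aws_launch_template.node.id # assumed to be defined elsewhere
    version = "$Latest"
  }

  # The tag EKS expects on every worker node instance
  tag {
    key                 = "kubernetes.io/cluster/dev-cluster" # <cluster-name> = dev-cluster
    value               = "owned"
    propagate_at_launch = true
  }
}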
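These are sketches under the assumptions stated above; adapt the names and ARNs to your own cluster before applying them.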

I myself had an issue with the tagging where I needed an uppercase letter. In reality, if you can use another avenue to deploy your EKS cluster, I would recommend it (eksctl, the AWS CLI, even Terraform).


Firstly, I had the NAT gateway in my private subnet. Then I moved the NAT gateway back to the public subnet, which worked fine.

Terraform code is as follows:

resource "aws_internet_gateway" "gw" {  vpc_id = aws_vpc.dev-vpc.id  tags = {    Name = "dev-IG"  }}resource "aws_eip" "lb" {  depends_on    = [aws_internet_gateway.gw]  vpc           = true}resource "aws_nat_gateway" "natgw" {  allocation_id = aws_eip.lb.id  subnet_id     = aws_subnet.dev-public-subnet.id  depends_on = [aws_internet_gateway.gw]  tags = {    Name = "gw NAT"  }}