
Issues faced while launching ECS Tasks (pulling image from an ECR repo) from a private subnet

I created a private subnet. I created an ECR repo with Private visibility and pushed an image into it.

Then, I created an ECS Cluster.

I added a Task Definition with no Task Role and a Task Execution Role (which ecs-tasks.amazonaws.com can assume) that has the AmazonECSTaskExecutionRolePolicy permission policy attached. The container in the task definition has private repository authentication enabled.

Then, I created a task as follows:

aws ecs run-task --task-definition <task-definition-name> --cluster <ecs-cluster-name> --network-configuration '{"awsvpcConfiguration": {"subnets":["<subnet-id>"], "securityGroups": ["<sg-id>"], "assignPublicIp": "DISABLED" }}' --count 1 --launch-type FARGATE

The task did not start and stopped with the following error:

ERROR: ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve ecr registry auth: service call has been retried 3 time(s): RequestError: send request failed caused by: Post "https://api.ecr.<region>.amazonaws.com/": dial tcp <Public-IP>:443: i/o timeout

FIX: I added the following VPC Interface endpoints and attached a Security Group to them:

com.amazonaws.<region>.ecr.api

com.amazonaws.<region>.ecr.dkr

The Security Group associated with these VPC endpoints has the following rules:

Inbound - HTTPS from VPC CIDR

Outbound - HTTPS to Anywhere
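
As a rough sketch, the inbound rule above could be added with the AWS CLI as follows. The region, security group ID, and VPC CIDR are hypothetical placeholders, and `run` is a hypothetical dry-run helper (not part of the AWS CLI) that only prints the command unless DRY_RUN=0 is set. Only the inbound rule is shown.

```shell
# Hypothetical placeholder values; replace with your own.
REGION="${REGION:-us-east-1}"
SG_ID="${SG_ID:-sg-0123456789abcdef0}"
VPC_CIDR="${VPC_CIDR:-10.0.0.0/16}"

# Hypothetical helper: prints the command by default, runs it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Allow HTTPS (TCP 443) into the endpoint security group from the VPC CIDR.
allow_https_from_vpc() {
  run aws ec2 authorize-security-group-ingress \
    --region "$REGION" \
    --group-id "$SG_ID" \
    --protocol tcp --port 443 \
    --cidr "$VPC_CIDR"
}

allow_https_from_vpc
```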

Running the task again produced the following error:

CannotPullContainerError: inspect image has been retried 5 time(s): failed to resolve ref "<ECR-Image-URI>": failed to do request: Head "<ECR-URI>/v2/<name>/manifests/latest": dial tcp: lookup <ECR-URI> on <Private-IP>:53: no such host

FIX: I associated the same private subnet (where the ECS task is launched) with these VPC endpoints.
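
One way to associate the subnet with existing interface endpoints is `aws ec2 modify-vpc-endpoint`. This is a sketch under assumptions: the endpoint IDs and subnet ID are hypothetical placeholders, and `run` is a hypothetical dry-run helper that prints the command unless DRY_RUN=0 is set.

```shell
# Hypothetical placeholder values; replace with your own.
REGION="${REGION:-us-east-1}"
SUBNET_ID="${SUBNET_ID:-subnet-0123456789abcdef0}"
# IDs of the ecr.api and ecr.dkr interface endpoints (hypothetical).
ENDPOINT_IDS="${ENDPOINT_IDS:-vpce-0aaaaaaaaaaaaaaaa vpce-0bbbbbbbbbbbbbbbb}"

# Hypothetical helper: prints the command by default, runs it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Add the task's private subnet to each interface endpoint.
add_subnet_to_endpoints() {
  for id in $ENDPOINT_IDS; do
    run aws ec2 modify-vpc-endpoint \
      --region "$REGION" \
      --vpc-endpoint-id "$id" \
      --add-subnet-ids "$SUBNET_ID"
  done
}

add_subnet_to_endpoints
```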

ERROR: CannotPullContainerError: ref pull has been retried 5 time(s): failed to copy: httpReadSeeker: failed open: failed to do request: Get "https://prod-<region>-starport-layer-bucket.s3.<region>.amazonaws.com/

FIX: Add the following VPC Gateway endpoint:

com.amazonaws.<region>.s3

Associate it with the Route Table (my RT has no route to a NAT or Internet GW).
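
ECR keeps image layers in S3, which is why the pull fails on an S3 URL. A minimal sketch of creating the gateway endpoint and associating it with a route table might look like this. The VPC ID, route table ID, and region are hypothetical placeholders, and `run` is a hypothetical dry-run helper that prints the command unless DRY_RUN=0 is set.

```shell
# Hypothetical placeholder values; replace with your own.
REGION="${REGION:-us-east-1}"
VPC_ID="${VPC_ID:-vpc-0123456789abcdef0}"
ROUTE_TABLE_ID="${ROUTE_TABLE_ID:-rtb-0123456789abcdef0}"

# Hypothetical helper: prints the command by default, runs it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Create the S3 gateway endpoint and attach it to the route table.
create_s3_gateway_endpoint() {
  run aws ec2 create-vpc-endpoint \
    --region "$REGION" \
    --vpc-id "$VPC_ID" \
    --vpc-endpoint-type Gateway \
    --service-name "com.amazonaws.${REGION}.s3" \
    --route-table-ids "$ROUTE_TABLE_ID"
}

create_s3_gateway_endpoint
```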

ERROR: ResourceInitializationError: failed to validate logger args: : signal: killed

FIX: I added the following VPC Interface endpoint and attached the same Security Group as above:

com.amazonaws.<region>.logs
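
The CloudWatch Logs endpoint could be created along these lines. Treat it as a sketch: the VPC, subnet, and security group IDs are hypothetical placeholders, `run` is a hypothetical dry-run helper that prints the command unless DRY_RUN=0 is set, and enabling private DNS is an assumption (the steps above do not say how it was set).

```shell
# Hypothetical placeholder values; replace with your own.
REGION="${REGION:-us-east-1}"
VPC_ID="${VPC_ID:-vpc-0123456789abcdef0}"
SUBNET_ID="${SUBNET_ID:-subnet-0123456789abcdef0}"
SG_ID="${SG_ID:-sg-0123456789abcdef0}"

# Hypothetical helper: prints the command by default, runs it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Create the CloudWatch Logs interface endpoint in the task's subnet.
# Assumption: private DNS is enabled so the default service hostname
# resolves to the endpoint.
create_logs_endpoint() {
  run aws ec2 create-vpc-endpoint \
    --region "$REGION" \
    --vpc-id "$VPC_ID" \
    --vpc-endpoint-type Interface \
    --service-name "com.amazonaws.${REGION}.logs" \
    --subnet-ids "$SUBNET_ID" \
    --security-group-ids "$SG_ID" \
    --private-dns-enabled
}

create_logs_endpoint
```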

The ECS Task started running after this.
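
To recap, the sketch below prints the four endpoint service names this post ended up with and then lists the endpoints that exist in the VPC, so the two can be compared by eye. The function name, region, and VPC ID are hypothetical, and `run` is a hypothetical dry-run helper that prints the command unless DRY_RUN=0 is set.

```shell
# Hypothetical placeholder values; replace with your own.
REGION="${REGION:-us-east-1}"
VPC_ID="${VPC_ID:-vpc-0123456789abcdef0}"

# Hypothetical helper: prints the command by default, runs it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Endpoint services added in this post, for the given region.
required_endpoint_services() {
  region="$1"
  echo "com.amazonaws.${region}.ecr.api"   # Interface
  echo "com.amazonaws.${region}.ecr.dkr"   # Interface
  echo "com.amazonaws.${region}.s3"        # Gateway
  echo "com.amazonaws.${region}.logs"      # Interface
}

# List the endpoints currently present in the VPC.
list_vpc_endpoints() {
  run aws ec2 describe-vpc-endpoints \
    --region "$REGION" \
    --filters "Name=vpc-id,Values=${VPC_ID}" \
    --query "VpcEndpoints[].[ServiceName,VpcEndpointType,State]" \
    --output table
}

required_endpoint_services "$REGION"
list_vpc_endpoints
```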

