
Issues faced while launching ECS Tasks (pulling image from an ECR repo) from a private subnet

I created a private subnet. I created an ECR repo with Private visibility and pushed an image into it.

Then, I created an ECS Cluster.

I added a Task Definition with no Task Role and a Task Execution Role (which ecs-tasks.amazonaws.com can assume) that has the AmazonECSTaskExecutionRolePolicy permission policy attached. The container in the task definition has private repository authentication enabled.

Then, I created a task as follows:

aws ecs run-task --task-definition <task-definition-name> --cluster <ecs-cluster-name> --network-configuration '{"awsvpcConfiguration": {"subnets":["<subnet-id>"], "securityGroups": ["<sg-id>"], "assignPublicIp": "DISABLED" }}' --count 1 --launch-type FARGATE

The task did not start and stopped with the following error:

ERROR: ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve ecr registry auth: service call has been retried 3 time(s): RequestError: send request failed caused by: Post "https://api.ecr.<region>.amazonaws.com/": dial tcp <Public-IP>:443: i/o timeout

FIX: I added the following VPC Interface endpoints and linked a Security Group to them:

com.amazonaws.<region>.ecr.api

com.amazonaws.<region>.ecr.dkr

The Security Group associated with these VPC endpoints has the following rules:

Inbound - HTTPS from VPC CIDR

Outbound - HTTPS to Anywhere
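
The two interface endpoints above can also be created from the CLI. The following is a minimal sketch under stated assumptions: the region and all IDs are placeholder values, interface_endpoint_cmd is a hypothetical helper, and private DNS is assumed to be enabled so that the default ECR hostnames resolve to the endpoints. The script only prints the aws ec2 create-vpc-endpoint commands so they can be reviewed before anything is created.

```shell
#!/usr/bin/env bash
# Sketch only: every ID below is a placeholder, not a value from this post.
REGION="${REGION:-us-east-1}"
VPC_ID="${VPC_ID:-vpc-0123456789abcdef0}"
SUBNET_ID="${SUBNET_ID:-subnet-0123456789abcdef0}"
SG_ID="${SG_ID:-sg-0123456789abcdef0}"

# Hypothetical helper: print (not run) the command that creates one
# interface endpoint for the given service suffix, e.g. "ecr.api".
interface_endpoint_cmd() {
  local service="$1"
  printf 'aws ec2 create-vpc-endpoint --region %s --vpc-id %s --vpc-endpoint-type Interface --service-name com.amazonaws.%s.%s --subnet-ids %s --security-group-ids %s --private-dns-enabled\n' \
    "$REGION" "$VPC_ID" "$REGION" "$service" "$SUBNET_ID" "$SG_ID"
}

# Print one command per ECR endpoint; pipe the output to "bash" to execute.
for service in ecr.api ecr.dkr; do
  interface_endpoint_cmd "$service"
done
```

Because the helper takes the service suffix as its argument, the same function should work for any other interface endpoint that uses the same subnet and Security Group.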

Running the task again produced the following error:

CannotPullContainerError: inspect image has been retried 5 time(s): failed to resolve ref "<ECR-Image-URI>": failed to do request: Head "<ECR-URI>/v2/<name>/manifests/latest": dial tcp: lookup <ECR-URI> on <Private-IP>:53: no such host

FIX: I associated these VPC endpoints with the same private subnet in which the ECS task is launched.

ERROR: CannotPullContainerError: ref pull has been retried 5 time(s): failed to copy: httpReadSeeker: failed open: failed to do request: Get "https://prod-<region>-starport-layer-bucket.s3.<region>.amazonaws.com/

FIX: Add the following VPC Gateway endpoint:

com.amazonaws.<region>.s3

Link it to the Route Table used by the private subnet (my Route Table has no route to a NAT or Internet Gateway).
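
The S3 endpoint is needed because ECR serves image layers from S3, as the bucket URL in the error shows. Unlike the interface endpoints, a gateway endpoint is attached to route tables rather than to subnets and Security Groups. A minimal sketch of the corresponding CLI call, assuming placeholder values for the region and IDs; s3_gateway_endpoint_cmd is a hypothetical helper that prints the command instead of running it:

```shell
#!/usr/bin/env bash
# Sketch only: the region and IDs are placeholders, not values from this post.
REGION="${REGION:-us-east-1}"
VPC_ID="${VPC_ID:-vpc-0123456789abcdef0}"
ROUTE_TABLE_ID="${ROUTE_TABLE_ID:-rtb-0123456789abcdef0}"

# Hypothetical helper: print (not run) the command that creates the S3
# gateway endpoint and associates it with the given route table.
s3_gateway_endpoint_cmd() {
  printf 'aws ec2 create-vpc-endpoint --region %s --vpc-id %s --vpc-endpoint-type Gateway --service-name com.amazonaws.%s.s3 --route-table-ids %s\n' \
    "$REGION" "$VPC_ID" "$REGION" "$ROUTE_TABLE_ID"
}

# Review the printed command, then pipe it to "bash" to execute it.
s3_gateway_endpoint_cmd
```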

ERROR: ResourceInitializationError: failed to validate logger args: : signal: killed

FIX: I added the following VPC Interface endpoint and linked it to the same Security Group as above:

com.amazonaws.<region>.logs

The ECS Task started running after this.
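
To summarize, the task needed four endpoints: ecr.api, ecr.dkr, s3 and logs. The sketch below checks a list of endpoint service names against that set and prints whichever are missing. It is only an illustration: missing_endpoints is a hypothetical helper, the region default is a placeholder, and the example feeds it a hard-coded list instead of querying AWS.

```shell
#!/usr/bin/env bash
# Sketch only: the region default is a placeholder.
REGION="${REGION:-us-east-1}"

# Hypothetical helper: given the endpoint service names that already exist
# in the VPC (one per line on stdin), print the required ones that are missing.
missing_endpoints() {
  local existing service
  existing="$(cat)"
  for service in ecr.api ecr.dkr s3 logs; do
    if ! printf '%s\n' "$existing" | grep -qxF "com.amazonaws.${REGION}.${service}"; then
      printf 'com.amazonaws.%s.%s\n' "$REGION" "$service"
    fi
  done
}

# Example with a hard-coded list: only the two ECR endpoints exist here,
# so the s3 and logs endpoints are reported as missing.
printf '%s\n' "com.amazonaws.${REGION}.ecr.api" "com.amazonaws.${REGION}.ecr.dkr" | missing_endpoints
```

Against a real VPC, the input could come from aws ec2 describe-vpc-endpoints with a vpc-id filter and --query 'VpcEndpoints[].ServiceName' --output text, converted to one name per line (for example with tr '\t' '\n') before being piped into the helper.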

