Author: lorenzomicheli

Description: Setup EKS cluster on Fargate to run your application at scale!

Repository: git://github.com/lorenzomicheli/pimp-your-eks.git
Created: 2020-04-01T09:15:41Z
Project home: https://github.com/lorenzomicheli/pimp-your-eks

License: Apache License 2.0


Pimp your EKS cluster on Fargate


Introduction

Setting up a Kubernetes cluster is not an easy task. Amazon EKS greatly simplifies the creation and management of a Kubernetes (k8s) cluster and, used together with AWS Fargate, removes the need to provision and manage the servers that are part of the cluster.

The purpose of this post is to guide you step by step in the creation of the cluster and the configuration of the following components:

  • OIDC Provider (k8s services accounts integrated with AWS IAM)
  • ALB Ingress Controller (k8s ingresses integrated with AWS ALB)
  • External DNS (k8s services and ingresses integrated with Amazon Route53)
  • Horizontal Pod Autoscaler (automatically scales the number of pods according to CPU usage)
  • Pod Monitoring for EKS Fargate with Prometheus and Grafana (monitoring and alerting for k8s infrastructure and applications)
  • Pod Logging for EKS Fargate (logging integration for k8s applications with Amazon CloudWatch Logs)

Prerequisites

In order to follow this guide you will need the following tools:

  • kubectl - Kubernetes CLI tool
  • eksctl - a simple CLI tool for creating clusters on EKS
  • aws-iam-authenticator - a tool to use AWS IAM credentials to authenticate to a Kubernetes cluster
  • helm - package manager for Kubernetes
  • jq - a lightweight and flexible command-line JSON processor
  • [OPTIONAL] k9s - a terminal UI to interact with your Kubernetes clusters

Once Helm is installed, add the stable chart repository and refresh the local chart index:

    helm repo add stable https://kubernetes-charts.storage.googleapis.com
    helm repo update

Cluster creation

Let’s start creating our cluster. To run our containers on AWS Fargate, we need to create one or more Fargate profiles and define which namespaces are going to be deployed with these profiles.

Create a file named cluster.yaml where we define the name of the cluster, the region where it will be deployed, and the Fargate profiles of our choice.

    apiVersion: eksctl.io/v1alpha5
    kind: ClusterConfig
    metadata:
      name: eks-cluster
      region: eu-west-1
    fargateProfiles:
      - name: default
        selectors:
          - namespace: default
      - name: kube-system
        selectors:
          - namespace: kube-system

To create our cluster and the profiles we use the eksctl tool as follows:

    eksctl create cluster -f cluster.yaml

By default, eksctl will create a new VPC and a number of subnets used by your Fargate containers.

In some scenarios, you might want to use your own VPC and subnets which you have already configured:

  • You need at least 2 public and 2 private subnets.

  • Tag the VPC subnets you specify with the key
    kubernetes.io/cluster/<cluster-name> and the value shared so that
    Kubernetes can discover them. <cluster-name> must match the name of
    your Amazon EKS cluster as defined in cluster.yaml, and the shared
    value allows more than one cluster to use the subnet.

  • Tag the private subnets with kubernetes.io/role/internal-elb: 1 so
    that Kubernetes knows it can use them for internal load balancers.

  • Tag the public subnets with kubernetes.io/role/elb: 1 so that
    Kubernetes uses only those subnets for external load balancers.

  • Add the following block to cluster.yaml, specifying the ids of your
    public and private subnets in the respective AZs, before running the
    eksctl command:

    vpc:
      subnets:
        private:
          eu-west-1a: { id: subnet-0ff156e0c4a6d300c }
          eu-west-1b: { id: subnet-0549cdab573695c03 }
        public:
          eu-west-1a: { id: subnet-0ff156e0c4a6d300b }
          eu-west-1b: { id: subnet-0549cdab573695c43 }
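
The subnet tagging described above can be applied with the aws CLI. The following sketch only prints the create-tags commands for the private subnets (the subnet ids are the example ids from this guide; substitute your own and pipe the output to sh to actually apply them — public subnets take kubernetes.io/role/elb instead of internal-elb):

```shell
# Print (without running) the tagging commands for the private subnets.
# Subnet ids below are the example ids used in this guide.
CLUSTER=eks-cluster
cmds=$(for subnet in subnet-0ff156e0c4a6d300c subnet-0549cdab573695c03; do
  echo aws ec2 create-tags --resources "$subnet" \
    --tags "Key=kubernetes.io/cluster/${CLUSTER},Value=shared" \
           "Key=kubernetes.io/role/internal-elb,Value=1"
done)
echo "$cmds"
```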

ALB Ingress Controller on Amazon EKS

In order to trigger the creation of an Application Load Balancer (currently the only load balancer type supported by EKS on Fargate) and the necessary supporting AWS resources whenever an Ingress resource is created on the cluster, we need to configure the ALB Ingress Controller for Kubernetes.

Create the following iam-policy.json file:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "acm:DescribeCertificate",
            "acm:ListCertificates",
            "acm:GetCertificate"
          ],
          "Resource": "*"
        },
        {
          "Effect": "Allow",
          "Action": [
            "ec2:AuthorizeSecurityGroupIngress",
            "ec2:CreateSecurityGroup",
            "ec2:CreateTags",
            "ec2:DeleteTags",
            "ec2:DeleteSecurityGroup",
            "ec2:DescribeAccountAttributes",
            "ec2:DescribeAddresses",
            "ec2:DescribeInstances",
            "ec2:DescribeInstanceStatus",
            "ec2:DescribeInternetGateways",
            "ec2:DescribeNetworkInterfaces",
            "ec2:DescribeSecurityGroups",
            "ec2:DescribeSubnets",
            "ec2:DescribeTags",
            "ec2:DescribeVpcs",
            "ec2:ModifyInstanceAttribute",
            "ec2:ModifyNetworkInterfaceAttribute",
            "ec2:RevokeSecurityGroupIngress"
          ],
          "Resource": "*"
        },
        {
          "Effect": "Allow",
          "Action": [
            "elasticloadbalancing:AddListenerCertificates",
            "elasticloadbalancing:AddTags",
            "elasticloadbalancing:CreateListener",
            "elasticloadbalancing:CreateLoadBalancer",
            "elasticloadbalancing:CreateRule",
            "elasticloadbalancing:CreateTargetGroup",
            "elasticloadbalancing:DeleteListener",
            "elasticloadbalancing:DeleteLoadBalancer",
            "elasticloadbalancing:DeleteRule",
            "elasticloadbalancing:DeleteTargetGroup",
            "elasticloadbalancing:DeregisterTargets",
            "elasticloadbalancing:DescribeListenerCertificates",
            "elasticloadbalancing:DescribeListeners",
            "elasticloadbalancing:DescribeLoadBalancers",
            "elasticloadbalancing:DescribeLoadBalancerAttributes",
            "elasticloadbalancing:DescribeRules",
            "elasticloadbalancing:DescribeSSLPolicies",
            "elasticloadbalancing:DescribeTags",
            "elasticloadbalancing:DescribeTargetGroups",
            "elasticloadbalancing:DescribeTargetGroupAttributes",
            "elasticloadbalancing:DescribeTargetHealth",
            "elasticloadbalancing:ModifyListener",
            "elasticloadbalancing:ModifyLoadBalancerAttributes",
            "elasticloadbalancing:ModifyRule",
            "elasticloadbalancing:ModifyTargetGroup",
            "elasticloadbalancing:ModifyTargetGroupAttributes",
            "elasticloadbalancing:RegisterTargets",
            "elasticloadbalancing:RemoveListenerCertificates",
            "elasticloadbalancing:RemoveTags",
            "elasticloadbalancing:SetIpAddressType",
            "elasticloadbalancing:SetSecurityGroups",
            "elasticloadbalancing:SetSubnets",
            "elasticloadbalancing:SetWebACL"
          ],
          "Resource": "*"
        },
        {
          "Effect": "Allow",
          "Action": [
            "iam:CreateServiceLinkedRole",
            "iam:GetServerCertificate",
            "iam:ListServerCertificates"
          ],
          "Resource": "*"
        },
        {
          "Effect": "Allow",
          "Action": ["cognito-idp:DescribeUserPoolClient"],
          "Resource": "*"
        },
        {
          "Effect": "Allow",
          "Action": [
            "waf-regional:GetWebACLForResource",
            "waf-regional:GetWebACL",
            "waf-regional:AssociateWebACL",
            "waf-regional:DisassociateWebACL"
          ],
          "Resource": "*"
        },
        {
          "Effect": "Allow",
          "Action": ["tag:GetResources", "tag:TagResources"],
          "Resource": "*"
        },
        {
          "Effect": "Allow",
          "Action": ["waf:GetWebACL"],
          "Resource": "*"
        },
        {
          "Effect": "Allow",
          "Action": [
            "shield:DescribeProtection",
            "shield:GetSubscriptionState",
            "shield:DeleteProtection",
            "shield:CreateProtection",
            "shield:DescribeSubscription",
            "shield:ListProtections"
          ],
          "Resource": "*"
        }
      ]
    }

Create an IAM OIDC provider and associate it with your cluster:

    eksctl utils associate-iam-oidc-provider --region eu-west-1 --cluster eks-cluster \
      --approve

Create an IAM policy called ALBIngressControllerIAMPolicy for the ALB Ingress Controller pod that allows it to make calls to AWS APIs on your behalf:

    aws iam create-policy --policy-name ALBIngressControllerIAMPolicy \
      --policy-document file://iam-policy.json

We need to define a Kubernetes service account named alb-ingress-controller in the kube-system namespace, a cluster role, and a cluster role binding for the ALB Ingress Controller. Create a file named rbac-role.yaml as follows:

    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      labels:
        app.kubernetes.io/name: alb-ingress-controller
      name: alb-ingress-controller
    rules:
      - apiGroups:
          - ""
          - extensions
        resources:
          - configmaps
          - endpoints
          - events
          - ingresses
          - ingresses/status
          - services
          - pods/status
        verbs:
          - create
          - get
          - list
          - update
          - watch
          - patch
      - apiGroups:
          - ""
          - extensions
        resources:
          - nodes
          - pods
          - secrets
          - services
          - namespaces
        verbs:
          - get
          - list
          - watch
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      labels:
        app.kubernetes.io/name: alb-ingress-controller
      name: alb-ingress-controller
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: alb-ingress-controller
    subjects:
      - kind: ServiceAccount
        name: alb-ingress-controller
        namespace: kube-system
    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      labels:
        app.kubernetes.io/name: alb-ingress-controller
      name: alb-ingress-controller
      namespace: kube-system

And apply it with kubectl:

    kubectl apply -f rbac-role.yaml

Create an IAM role for the ALB Ingress Controller and attach it to the service account created in the previous step. Replace <ACCOUNT_ID> with the id of your AWS account and <AWS_REGION> with the region where the cluster has been created:

    eksctl create iamserviceaccount --region <AWS_REGION> \
      --name alb-ingress-controller \
      --namespace kube-system \
      --cluster eks-cluster \
      --attach-policy-arn arn:aws:iam::<ACCOUNT_ID>:policy/ALBIngressControllerIAMPolicy \
      --override-existing-serviceaccounts \
      --approve

Create a file alb-ingress-controller.yaml to define the alb-ingress-controller deployment, replacing <CLUSTER_NAME> with the name of your cluster, <VPC_ID> with the id of the VPC used by the cluster, and <AWS_REGION> with the region where the cluster has been created:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app.kubernetes.io/name: alb-ingress-controller
      name: alb-ingress-controller
      namespace: kube-system
    spec:
      selector:
        matchLabels:
          app.kubernetes.io/name: alb-ingress-controller
      template:
        metadata:
          labels:
            app.kubernetes.io/name: alb-ingress-controller
        spec:
          containers:
            - name: alb-ingress-controller
              args:
                - --ingress-class=alb
                - --cluster-name=<CLUSTER_NAME>
                - --aws-vpc-id=<VPC_ID>
                - --aws-region=<AWS_REGION>
                # - --aws-api-debug
                # - --aws-max-retries=10
              env:
                # - name: AWS_ACCESS_KEY_ID
                #   value: KEYVALUE
                # - name: AWS_SECRET_ACCESS_KEY
                #   value: SECRETVALUE
              image: docker.io/amazon/aws-alb-ingress-controller:v1.1.6
          serviceAccountName: alb-ingress-controller
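
Rather than editing the placeholders by hand, they can be filled in with sed. A small sketch of the substitution, run here against an inline copy of the relevant args (the VPC id is a hypothetical example; the same sed invocation can be run against alb-ingress-controller.yaml):

```shell
# Fill the <...> placeholders used by the manifest; ids are illustrative.
CLUSTER_NAME=eks-cluster
VPC_ID=vpc-0123456789abcdef0
AWS_REGION=eu-west-1
snippet='- --cluster-name=<CLUSTER_NAME>
- --aws-vpc-id=<VPC_ID>
- --aws-region=<AWS_REGION>'
filled=$(printf '%s\n' "$snippet" | sed \
  -e "s/<CLUSTER_NAME>/$CLUSTER_NAME/" \
  -e "s/<VPC_ID>/$VPC_ID/" \
  -e "s/<AWS_REGION>/$AWS_REGION/")
echo "$filled"
```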

Deploy the ALB Ingress Controller with the following command:

    kubectl apply -f alb-ingress-controller.yaml

Verify that the alb-ingress-controller is running and that there are no errors in its logs (kubectl logs -n kube-system deployment/alb-ingress-controller):

    kubectl get pods -n kube-system | grep alb-ingress-controller
    alb-ingress-controller-688dd94984-m6554   1/1   Running   0   4d22h

Deploy a sample application

We deploy the game 2048 as a sample application to verify that the ALB Ingress Controller creates an Application Load Balancer in response to the Ingress object.

First we create a new Fargate profile for this application:

    eksctl create fargateprofile --cluster eks-cluster --region eu-west-1 \
      --name 2048-game --namespace 2048-game

Create a 2048.yaml to define our namespace, deployment, service and
ingress:

    ---
    apiVersion: v1
    kind: Namespace
    metadata:
      name: "2048-game"
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: "2048-deployment"
      namespace: "2048-game"
    spec:
      selector:
        matchLabels:
          app: "2048"
      replicas: 5
      template:
        metadata:
          labels:
            app: "2048"
        spec:
          containers:
            - image: alexwhen/docker-2048
              imagePullPolicy: Always
              name: "2048"
              ports:
                - containerPort: 80
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: "service-2048"
      namespace: "2048-game"
    spec:
      ports:
        - port: 80
          targetPort: 80
          protocol: TCP
      type: NodePort
      selector:
        app: "2048"
    ---
    apiVersion: extensions/v1beta1
    kind: Ingress
    metadata:
      name: "2048-ingress"
      namespace: "2048-game"
      annotations:
        kubernetes.io/ingress.class: alb
        alb.ingress.kubernetes.io/scheme: internet-facing
        alb.ingress.kubernetes.io/target-type: ip
      labels:
        app: 2048-ingress
    spec:
      rules:
        - http:
            paths:
              - path: /*
                backend:
                  serviceName: "service-2048"
                  servicePort: 80

Apply the manifest file:

    kubectl apply -f 2048.yaml

After a few minutes, verify that the Ingress resource was created with the following command.

    kubectl get ingress/2048-ingress -n 2048-game
    NAME           HOSTS   ADDRESS                                                                   PORTS   AGE
    2048-ingress   *       example-2048game-2048ingr-6fa0-352729433.region-code.elb.amazonaws.com   80      24h

Open a browser and navigate to the ADDRESS URL from the previous command output to see the sample application.

When you finish experimenting with your sample application, delete it with the following command.

    kubectl delete -f 2048.yaml

Route53 Integration

external-dns provisions DNS records based on the host information of services and ingresses. It sets up and manages records in Route 53 that point to the ALBs deployed by the ingress controller.

Create a private hosted zone on Route 53 and associate it with the VPC where EKS is running (https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/hosted-zone-private-creating.html).

Create the service account:

    eksctl create iamserviceaccount --name external-dns --namespace kube-system \
      --cluster eks-cluster \
      --attach-policy-arn arn:aws:iam::aws:policy/AmazonRoute53FullAccess \
      --override-existing-serviceaccounts --approve --region eu-west-1

Create a file external-dns.yaml as follows:

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: external-dns
      namespace: kube-system
    ---
    apiVersion: rbac.authorization.k8s.io/v1beta1
    kind: ClusterRole
    metadata:
      name: external-dns
    rules:
      - apiGroups: [""]
        resources: ["services"]
        verbs: ["get", "watch", "list"]
      - apiGroups: [""]
        resources: ["pods"]
        verbs: ["get", "watch", "list"]
      - apiGroups: ["extensions"]
        resources: ["ingresses"]
        verbs: ["get", "watch", "list"]
      - apiGroups: [""]
        resources: ["nodes"]
        verbs: ["list"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1beta1
    kind: ClusterRoleBinding
    metadata:
      name: external-dns-viewer
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: external-dns
    subjects:
      - kind: ServiceAccount
        name: external-dns
        namespace: kube-system
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: external-dns
      namespace: kube-system
    spec:
      selector:
        matchLabels:
          app: external-dns
      strategy:
        type: Recreate
      template:
        metadata:
          labels:
            app: external-dns
        spec:
          serviceAccountName: external-dns
          containers:
            - name: external-dns
              # image: registry.opensource.zalan.do/teapot/external-dns:v0.5.18
              image: eu.gcr.io/k8s-artifacts-prod/external-dns/external-dns:v0.7.0
              args:
                - --source=service
                - --source=ingress
                - --domain-filter=my-domain.com # makes ExternalDNS see only the hosted zones matching the provided domain; omit to process all available hosted zones
                - --provider=aws
                - --policy=upsert-only # prevents ExternalDNS from deleting any records; omit to enable full synchronization
                - --aws-zone-type=private # only look at private hosted zones (valid values are public, private, or no value for both)
                - --registry=txt
                - --txt-owner-id=my-identifier
          securityContext:
            fsGroup: 65534
          volumes:
            - name: token-vol
              projected:
                sources:
                  - serviceAccountToken:
                      path: token

Edit the --domain-filter flag to include your hosted zone(s) and deploy external-dns:

    kubectl apply -f external-dns.yaml

Verify that it deployed correctly:

    kubectl get deployment external-dns -n kube-system

In order to check if the integration works, you can add the external-dns.alpha.kubernetes.io/hostname annotation in the 2048 ingress annotations using the domain created in the private hosted zone:

    apiVersion: extensions/v1beta1
    kind: Ingress
    metadata:
      name: "2048-ingress"
      namespace: "2048-game"
      annotations:
        kubernetes.io/ingress.class: alb
        alb.ingress.kubernetes.io/scheme: internet-facing
        alb.ingress.kubernetes.io/target-type: ip
        external-dns.alpha.kubernetes.io/hostname: 2048.my-domain.com # add this new annotation
      labels:
        app: 2048-ingress
    spec:
      rules:
        - http:
            paths:
              - path: /*
                backend:
                  serviceName: "service-2048"
                  servicePort: 80

Then apply it once again with kubectl. You can then check the external-dns pod logs (kubectl logs -n kube-system deployment/external-dns) or the Route53 private hosted zone for the new DNS records.


Horizontal Pod Autoscaler

The Kubernetes Horizontal Pod Autoscaler automatically scales the number of pods in a deployment, replication controller, or replica set based on that resource's CPU utilization. This can help your applications scale out to meet increased demand or scale in when resources are not needed, thus freeing up your worker nodes for other applications. When you set a target CPU utilization percentage, the Horizontal Pod Autoscaler scales your application in or out to try to meet that target.
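
Under the hood, the controller computes desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization). A quick sketch of that arithmetic in shell, with made-up utilization figures:

```shell
# HPA scaling rule: desired = ceil(current * utilization / target)
current_replicas=3
current_utilization=80   # observed average CPU, percent (example value)
target_utilization=50    # target set on the autoscaler, percent
# integer ceiling division
desired=$(( (current_replicas * current_utilization + target_utilization - 1) / target_utilization ))
echo "$desired"   # ceil(3 * 80 / 50) = ceil(4.8) = 5
```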

Install the Kubernetes Metrics Server:

    DOWNLOAD_URL=$(curl -Ls "https://api.github.com/repos/kubernetes-sigs/metrics-server/releases/latest" | jq -r .tarball_url)
    DOWNLOAD_VERSION=$(grep -o '[^/v]*$' <<< $DOWNLOAD_URL)
    curl -Ls $DOWNLOAD_URL -o metrics-server-$DOWNLOAD_VERSION.tar.gz
    mkdir metrics-server-$DOWNLOAD_VERSION
    tar -xzf metrics-server-$DOWNLOAD_VERSION.tar.gz --directory metrics-server-$DOWNLOAD_VERSION --strip-components 1
    kubectl apply -f metrics-server-$DOWNLOAD_VERSION/deploy/1.8+/
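
The grep on the second line keeps everything after the final "/v" of the tarball URL, i.e. the bare version number. For example (the URL below is a hypothetical sample of what the GitHub API returns):

```shell
# Extract the version number from a GitHub tarball URL.
url="https://api.github.com/repos/kubernetes-sigs/metrics-server/tarball/v0.3.6"
version=$(echo "$url" | grep -o '[^/v]*$')
echo "$version"
```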

Verify that the metrics-server deployment is running the desired number of pods with the following command.

    kubectl get deployment metrics-server -n kube-system

Test the autoscaler

To test the autoscaler configuration, let’s run an Apache web server and a stress test against it.

First, run the web server and expose it as a service (the CPU request is needed so that the autoscaler can compute utilization):

    kubectl run httpd --image=httpd --requests=cpu=100m --expose --port 80

Then configure the autoscaler to target 50% CPU utilization, with a minimum of 1 pod and a maximum of 5:

    kubectl autoscale deployment httpd --cpu-percent=50 --min=1 --max=5
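
For reference, kubectl autoscale is shorthand for creating a HorizontalPodAutoscaler object; a sketch of the equivalent manifest:

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: httpd
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: httpd
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 50
```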

Describe the autoscaler to crosscheck the configuration:

    kubectl describe hpa/httpd

Run a load test against it:

    kubectl run apache-bench -i --tty --rm --image=httpd -- ab -n 500000 -c 1000 http://httpd.default.svc.cluster.local/

Verify that the pods are scaling out:

    kubectl get horizontalpodautoscaler.autoscaling/httpd

And cleanup:

    kubectl delete deployment.apps/httpd service/httpd horizontalpodautoscaler.autoscaling/httpd

Monitoring with Prometheus and Grafana

We will use Prometheus to record real-time metrics from our cluster and Grafana to create dashboards that display these metrics.

So far we have created an EKS cluster backed by the Fargate engine. As of now, one limitation of AWS Fargate is the lack of persistent storage on either EBS or EFS. If we spin up a Prometheus container on Fargate, it will collect metrics locally in the container, and if the container terminates we will lose all our metrics. The same holds for Grafana: if we deploy a Grafana container on Fargate, our fancy dashboards won't survive a container termination.

In most cases this behavior is not acceptable. To overcome this limitation, we will configure an EKS node group, which creates an Auto Scaling group spanning the VPC AZs with one desired EC2 instance. We will also create an EFS volume to persist our data and make it available in all AZs.

Prometheus

Prometheus is an open-source systems monitoring and alerting toolkit. It records real-time metrics in a time series database built using a HTTP pull model, with flexible queries and real-time alerting.

Option 1: Setup Prometheus without persistence

This setup will deploy a Prometheus container on Fargate, which means that the metrics database will not survive a container termination.

Let's start by creating a Fargate profile and a namespace:

    eksctl create fargateprofile --cluster eks-cluster --region eu-west-1 \
      --name prometheus --namespace prometheus

    kubectl create namespace prometheus

Install Prometheus from Helm Hub without any persistence:

    helm install prometheus stable/prometheus \
      --set alertmanager.persistentVolume.enabled="false" \
      --set server.persistentVolume.enabled="false" \
      -n prometheus

Enable port forwarding:

    kubectl port-forward -n prometheus deploy/prometheus-server 8080:9090

Verify the installation by navigating with your browser to http://localhost:8080/targets.

Option 2: Setup Prometheus with persistence

EKS node group definition

In an eksctl-nodegroup.yaml file we define the new managed node group:

    apiVersion: eksctl.io/v1alpha5
    kind: ClusterConfig
    metadata:
      name: eks-cluster
      region: eu-west-1
    managedNodeGroups:
      - name: ng-1-monitoring
        labels: { role: monitoring }
        instanceType: t3.small
        desiredCapacity: 1
        minSize: 1
        maxSize: 2

And we create the managed node group:

    eksctl create nodegroup --config-file=eksctl-nodegroup.yaml

Verify that the node group is up and running with the following command:

    eksctl get nodegroup --cluster eks-cluster --region eu-west-1

EFS Configuration

Next, we are going to create an EFS filesystem. It will be used by the Prometheus container to store real-time metric data:

    aws efs create-file-system \
      --tags Key=Name,Value=PrometheusFs \
      --region eu-west-1

You will get a response similar to the following:

    {
      "OwnerId": "174112236391",
      "CreationToken": "41d3ee19-9f1d-4b60-859d-937d2e2e1586",
      "FileSystemId": "fs-98132853",
      "CreationTime": "2020-03-31T09:33:57+02:00",
      "LifeCycleState": "available",
      "Name": "PrometheusFs",
      "NumberOfMountTargets": 0,
      "SizeInBytes": {
        "Value": 6144,
        "ValueInIA": 0,
        "ValueInStandard": 6144
      },
      "PerformanceMode": "generalPurpose",
      "Encrypted": false,
      "ThroughputMode": "bursting",
      "Tags": [
        {
          "Key": "Name",
          "Value": "PrometheusFs"
        }
      ]
    }
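
In practice you will want the FileSystemId in a variable for the next steps. Since jq is already a prerequisite of this guide, the response can be parsed as follows (shown here against a trimmed copy of the example response above):

```shell
# Parse the FileSystemId out of the create-file-system response.
# In real use: FS_ID=$(aws efs create-file-system ... | jq -r .FileSystemId)
resp='{"FileSystemId": "fs-98132853", "LifeCycleState": "available", "Name": "PrometheusFs"}'
FS_ID=$(printf '%s' "$resp" | jq -r .FileSystemId)
echo "$FS_ID"
```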

Then we need to create a mount target for each private subnet, specifying the security group of the EKS cluster created with eksctl:

    aws eks describe-cluster --name eks-cluster --region eu-west-1 | jq ".cluster.resourcesVpcConfig.clusterSecurityGroupId"
    "sg-029325a242a081e08"

    aws efs create-mount-target \
      --file-system-id fs-98132853 \
      --subnet-id "subnet-09be91ec7cb1ddbd6" \
      --security-group "sg-029325a242a081e08" \
      --region eu-west-1

    aws efs create-mount-target \
      --file-system-id fs-98132853 \
      --subnet-id "subnet-0e5d1f9e4e203ce75" \
      --security-group "sg-029325a242a081e08" \
      --region eu-west-1

EFS Provisioner

Now that the node group and the EFS filesystem have been created, we move on to the k8s setup. Create a prometheus namespace:

    kubectl create namespace prometheus

In order to have our prometheus container mount the EFS volume we have just created, we need to deploy an efs-provisioner.

The efs-provisioner allows you to mount EFS storage as PersistentVolumes in Kubernetes. It consists of a container that has access to an AWS EFS resource. The container reads a ConfigMap which contains the EFS filesystem ID, the AWS region and the name you want to use for your efs-provisioner. This name will be used later when you create a storage class.

To do that we create a prometheus-efs-manifest.yaml as follows:

    ---
    kind: PersistentVolumeClaim
    apiVersion: v1
    metadata:
      name: prometheus-efs
      namespace: prometheus
      annotations:
        volume.beta.kubernetes.io/storage-class: "prometheus-efs"
    spec:
      accessModes:
        - ReadWriteMany
      resources:
        requests:
          storage: 1Mi
    ---
    kind: StorageClass
    apiVersion: storage.k8s.io/v1
    metadata:
      name: prometheus-efs
    provisioner: prometheus/aws-efs
    ---
    kind: ClusterRole
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: efs-provisioner-runner
    rules:
      - apiGroups: [""]
        resources: ["persistentvolumes"]
        verbs: ["get", "list", "watch", "create", "delete"]
      - apiGroups: [""]
        resources: ["persistentvolumeclaims"]
        verbs: ["get", "list", "watch", "update"]
      - apiGroups: ["storage.k8s.io"]
        resources: ["storageclasses"]
        verbs: ["get", "list", "watch"]
      - apiGroups: [""]
        resources: ["events"]
        verbs: ["create", "update", "patch"]
    ---
    kind: ClusterRoleBinding
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: run-efs-provisioner
    subjects:
      - kind: ServiceAccount
        name: efs-provisioner
        # replace with the namespace where the provisioner is deployed
        namespace: prometheus
    roleRef:
      kind: ClusterRole
      name: efs-provisioner-runner
      apiGroup: rbac.authorization.k8s.io
    ---
    kind: Role
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: leader-locking-efs-provisioner
      namespace: prometheus
    rules:
      - apiGroups: [""]
        resources: ["endpoints"]
        verbs: ["get", "list", "watch", "create", "update", "patch"]
    ---
    kind: RoleBinding
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: leader-locking-efs-provisioner
      namespace: prometheus
    subjects:
      - kind: ServiceAccount
        name: efs-provisioner
        # replace with the namespace where the provisioner is deployed
        namespace: prometheus
    roleRef:
      kind: Role
      name: leader-locking-efs-provisioner
      apiGroup: rbac.authorization.k8s.io

Then we create a prometheus-efs-deployment.yaml file defining a ConfigMap and a Deployment. In the ConfigMap, replace <FS_ID> in the field file.system.id with your EFS filesystem id (fs-98132853 in our example) and <AWS_REGION> with the region of our cluster; in the Deployment, replace <EFS_URL> with the DNS name of the EFS filesystem, which has the format fs_id.efs.region.amazonaws.com.

    ---
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: efs-provisioner
      namespace: prometheus
    data:
      file.system.id: <FS_ID>
      aws.region: <AWS_REGION>
      provisioner.name: prometheus/aws-efs
      dns.name: ""
    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: efs-provisioner
      namespace: prometheus
    ---
    kind: Deployment
    apiVersion: apps/v1
    metadata:
      name: efs-provisioner
      namespace: prometheus
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: efs-provisioner
      strategy:
        type: Recreate
      template:
        metadata:
          labels:
            app: efs-provisioner
        spec:
          serviceAccount: efs-provisioner
          containers:
            - name: efs-provisioner
              image: quay.io/external_storage/efs-provisioner:latest
              env:
                - name: FILE_SYSTEM_ID
                  valueFrom:
                    configMapKeyRef:
                      name: efs-provisioner
                      key: file.system.id
                - name: AWS_REGION
                  valueFrom:
                    configMapKeyRef:
                      name: efs-provisioner
                      key: aws.region
                - name: DNS_NAME
                  valueFrom:
                    configMapKeyRef:
                      name: efs-provisioner
                      key: dns.name
                      optional: true
                - name: PROVISIONER_NAME
                  valueFrom:
                    configMapKeyRef:
                      name: efs-provisioner
                      key: provisioner.name
              volumeMounts:
                - name: pv-volume
                  mountPath: /persistentvolumes
          volumes:
            - name: pv-volume
              nfs:
                server: <EFS_URL>
                path: /
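
The <EFS_URL> value does not need to be looked up separately; it can be derived from the filesystem id and region already used in this guide:

```shell
# Build the NFS server hostname for the EFS volume from its id and region.
FS_ID=fs-98132853
AWS_REGION=eu-west-1
EFS_URL="${FS_ID}.efs.${AWS_REGION}.amazonaws.com"
echo "$EFS_URL"
```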

And we apply our configuration:

    kubectl apply -f prometheus-efs-manifest.yaml,prometheus-efs-deployment.yaml

Verify the correctness of the deployment:

    kubectl get deployment efs-provisioner -n prometheus
    NAME              READY   UP-TO-DATE   AVAILABLE   AGE
    efs-provisioner   1/1     1            1           3d18h

Prometheus deployment

Once we have our efs-provisioner up and running, we create a
prometheus-helm-config.yaml:

    alertmanager:
      persistentVolume:
        enabled: true
        existingClaim: prometheus-efs
    server:
      persistentVolume:
        enabled: true
        existingClaim: prometheus-efs

Install Prometheus using helm with the configuration above:

    helm install prometheus stable/prometheus -f prometheus-helm-config.yaml -n prometheus

Enable port forwarding:

    kubectl port-forward -n prometheus deploy/prometheus-server 8080:9090

Verify the installation by navigating with your browser to http://localhost:8080/targets.

Leave Prometheus running for some time so that it collects metrics from the cluster. Then, if you kill the pod, the new pod will use the same database and write-ahead log persisted on the shared EFS filesystem.

Grafana

Grafana is an open source analytics and monitoring solution that we will use to display the metrics stored in Prometheus.

Likewise, deploying Grafana on EKS on Fargate will not persist your dashboards after a pod restart or termination. If that works for you, continue with Option 1; otherwise follow Option 2.

Option 1: Setup Grafana without persistence

Let's start by creating a namespace:

    kubectl create namespace grafana

Create a grafana-config.yaml file to configure the Prometheus datasource and two sample dashboards:

    persistence:
      enabled: false
    datasources:
      datasources.yaml:
        apiVersion: 1
        datasources:
          - name: Prometheus
            type: prometheus
            url: http://prometheus-server.prometheus.svc.cluster.local
            access: proxy
            isDefault: true
    dashboardProviders:
      dashboardproviders.yaml:
        apiVersion: 1
        providers:
          - name: "default"
            orgId: 1
            folder: ""
            type: file
            disableDeletion: false
            editable: true
            options:
              path: /var/lib/grafana/dashboards/default
    dashboards:
      default:
        kube:
          gnetId: 8588
          revision: 1
          datasource: Prometheus
        prometheus-stats:
          gnetId: 2
          revision: 2
          datasource: Prometheus
Install Grafana from the Helm Hub without any persistence:

    helm install grafana stable/grafana --namespace grafana -f grafana-config.yaml --set adminPassword='admin123'

Deploy a Grafana ingress:

    apiVersion: extensions/v1beta1
    kind: Ingress
    metadata:
      name: "grafana"
      namespace: "grafana"
      annotations:
        kubernetes.io/ingress.class: alb
        alb.ingress.kubernetes.io/scheme: internet-facing
        alb.ingress.kubernetes.io/target-type: ip
        external-dns.alpha.kubernetes.io/hostname: grafana.my-domain.com
      labels:
        app: grafana
    spec:
      rules:
        - http:
            paths:
              - path: /*
                backend:
                  serviceName: grafana
                  servicePort: 80

Get the ingress properties for Grafana:

    kubectl get ingress -n grafana

The output shows the URL of the ALB:

    NAME      HOSTS   ADDRESS                                                               PORTS   AGE
    grafana   *       6318d69f-grafana-grafana-fdb1-895535036.eu-west-1.elb.amazonaws.com   80      3d17h

Option 2: Setup Grafana with persistence

This setup is exactly the same as the one done for Prometheus Option 2. The only difference is that there is no need to create another EKS node group, as both Prometheus and Grafana will be deployed in the same one.

Create a new k8s namespace for Grafana as follows:

    kubectl create namespace grafana

Then you will need to create another EFS filesystem and deploy another EFS Provisioner to mount the new filesystem.

Follow the EFS Configuration and EFS Provisioner steps described in the Prometheus section, replacing every occurrence of the string “prometheus“ with “grafana“ in both the YAML files and the command lines. Then create and apply grafana-efs-manifest.yaml and grafana-efs-deployment.yaml using the grafana namespace and the grafana EFS filesystem.

Create a grafana-config.yaml file to configure the datasource and add a couple of dashboards:

    replicas: 1
    persistence:
      enabled: true
      existingClaim: grafana-efs
    datasources:
      datasources.yaml:
        apiVersion: 1
        datasources:
          - name: Prometheus
            type: prometheus
            url: http://prometheus-server.prometheus.svc.cluster.local
            access: proxy
            isDefault: true
    dashboardProviders:
      dashboardproviders.yaml:
        apiVersion: 1
        providers:
          - name: "default"
            orgId: 1
            folder: ""
            type: file
            disableDeletion: false
            editable: true
            options:
              path: /var/lib/grafana/dashboards/default
    dashboards:
      default:
        kube:
          gnetId: 8588
          revision: 1
          datasource: Prometheus
        prometheus-stats:
          gnetId: 2
          revision: 2
          datasource: Prometheus

Install Grafana from the Helm Hub, this time with persistence enabled:

    helm install grafana stable/grafana --namespace grafana -f grafana-config.yaml --set adminPassword='admin123'

Deploy a Grafana ingress:

    apiVersion: extensions/v1beta1
    kind: Ingress
    metadata:
      name: "grafana"
      namespace: "grafana"
      annotations:
        kubernetes.io/ingress.class: alb
        alb.ingress.kubernetes.io/scheme: internet-facing
        alb.ingress.kubernetes.io/target-type: ip
        external-dns.alpha.kubernetes.io/hostname: grafana.my-domain.com
      labels:
        app: grafana
    spec:
      rules:
        - http:
            paths:
              - path: /*
                backend:
                  serviceName: grafana
                  servicePort: 80

Get the ingress properties for Grafana:

    kubectl get ingress -n grafana

The output shows the URL of the ALB:

    NAME      HOSTS   ADDRESS                                                               PORTS   AGE
    grafana   *       6318d69f-grafana-grafana-fdb1-895535036.eu-west-1.elb.amazonaws.com   80      3d17h

Logging on CloudWatch Logs

Currently EKS on Fargate does not support Kubernetes DaemonSets, therefore CloudWatch Container Insights cannot be used. To send container logs to CloudWatch with EKS on Fargate, we need to create a new Service Account and attach an IAM policy that allows it to do so.

    eksctl create iamserviceaccount --name cwl-fargate \
      --namespace default --cluster eks-cluster \
      --attach-policy-arn arn:aws:iam::aws:policy/CloudWatchFullAccess \
      --approve --region eu-west-1

On top of that, we will need a forwarder to forward logs from k8s to CloudWatch.

Fluent Bit is an open source, multi-platform log processor and forwarder that collects data and logs from different sources, unifies them, and sends them to multiple destinations. It’s fully compatible with Docker and Kubernetes environments.

First, let’s create a fluentbit-config.yaml holding the ConfigMap that configures Fluent Bit. Replace <AWS_REGION> with the region of the cluster, and replace <LOG_GROUP_NAME> and <LOG_STREAM_PREFIX> with a log group name and a log stream prefix of your choice, respectively:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: fluentbit-config
    data:
      # Configuration files: server, input, filters and output
      # ======================================================
      fluent-bit.conf: |
        [INPUT]
            Name              tail
            Tag               *.logs
            Path              /var/log/*.log
            DB                /var/log/logs.db
            Mem_Buf_Limit     5MB
            Skip_Long_Lines   On
            Refresh_Interval  10
        [OUTPUT]
            Name              cloudwatch
            Match             *
            region            <AWS_REGION>
            log_group_name    <LOG_GROUP_NAME>
            log_stream_prefix <LOG_STREAM_PREFIX>
            auto_create_group true

Next we configure a test pod that uses the fluent-bit ConfigMap above. The main container writes its output both to standard output and to a log file, and a sidecar container running the fluent-bit image forwards that file to CloudWatch:

fluentbit-sidecar.yaml

    apiVersion: v1
    kind: Pod
    metadata:
      name: counter
    spec:
      serviceAccountName: cwl-fargate
      containers:
        - name: count
          image: busybox
          args:
            - /bin/sh
            - -c
            - >
              i=0;
              while true;
              do
                echo "$i: $(date) this is an app log" 2>&1 | tee -a /var/log/app.log;
                echo "$(date) $(uname -r) $i" 2>&1 | tee -a /var/log/system.log;
                i=$((i+1));
                sleep 60;
              done
          volumeMounts:
            - name: varlog
              mountPath: /var/log
        - name: count-agent
          image: amazon/aws-for-fluent-bit:latest
          imagePullPolicy: Always
          ports:
            - containerPort: 2020
          env:
            - name: FLUENTD_HOST
              value: "fluentd"
            - name: FLUENTD_PORT
              value: "24224"
          volumeMounts:
            - name: varlog
              mountPath: /var/log
            - name: fluentbit-config
              mountPath: /fluent-bit/etc/
      terminationGracePeriodSeconds: 10
      volumes:
        - name: varlog
          emptyDir: {}
        - name: fluentbit-config
          configMap:
            name: fluentbit-config
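Note the dual-write trick in the count container: tee -a sends each line to standard output (visible via kubectl logs) and appends it to the shared log file that the Fluent Bit sidecar tails. A minimal local sketch of the pattern, using a temporary file in place of /var/log/app.log:

```shell
# Each echo goes to stdout AND is appended to the log file by `tee -a`,
# mirroring what the "count" container does with /var/log/app.log.
LOG=$(mktemp)
for i in 0 1; do
  echo "$i: $(date) this is an app log" | tee -a "$LOG"
done
grep -c '' "$LOG"   # two lines were appended to the file
```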

Apply:

    kubectl apply -f fluentbit-config.yaml,fluentbit-sidecar.yaml

After a few minutes, check the CloudWatch console: you should see the new log group and the logs generated by the test app.


AWS Secrets Manager Integration

Most likely your application needs secrets (API keys, database credentials, tokens). Kubernetes does not provide a full-featured secrets management system out of the box, so we need to integrate an external one.

AWS Secrets Manager is a secrets management system used to store, rotate, monitor, and control access to secrets.

At the time of this writing, there is no integration between EKS and AWS Secrets Manager. Fortunately GoDaddy wrote their own integration and open sourced it on GitHub.

For each secret in AWS Secrets Manager, we need to define an ExternalSecret. An External Secrets controller deployed in our cluster fetches the secret from AWS Secrets Manager and translates it into a Secret entity. This process is totally transparent to the Pod, which can access the Secret normally.


Let’s start by creating a new secret in AWS Secrets Manager:

    aws secretsmanager create-secret --region eu-west-1 --name secret-consumer-service/credentials \
      --secret-string '{"username":"admin","password":"1234"}'
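To double-check what was stored, you can read the secret back with `aws secretsmanager get-secret-value` and pick a field out of the JSON with jq (already a prerequisite). Since the AWS call needs live credentials, the jq step is shown here on the literal secret string:

```shell
# On a live account you would pipe the output of:
#   aws secretsmanager get-secret-value --region eu-west-1 \
#     --secret-id secret-consumer-service/credentials \
#     --query SecretString --output text
# into jq. The extraction itself, on the literal secret string:
SECRET_STRING='{"username":"admin","password":"1234"}'
echo "$SECRET_STRING" | jq -r .username   # prints: admin
```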

Then we define a scm-iam-policy.json to allow access to our secrets:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "secretsmanager:GetResourcePolicy",
            "secretsmanager:GetSecretValue",
            "secretsmanager:DescribeSecret",
            "secretsmanager:ListSecretVersionIds"
          ],
          "Resource": ["arn:aws:secretsmanager:eu-west-1:<ACCOUNT_ID>:secret:*"]
        }
      ]
    }
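A small sanity check before handing the document to IAM: jq exits non-zero on malformed JSON, so piping the file through it catches copy/paste errors early. The heredoc below recreates the policy so the snippet is self-contained (use your own account id in place of the <ACCOUNT_ID> placeholder):

```shell
# Recreate the policy file and validate it; jq fails loudly on bad JSON.
cat > scm-iam-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetResourcePolicy",
        "secretsmanager:GetSecretValue",
        "secretsmanager:DescribeSecret",
        "secretsmanager:ListSecretVersionIds"
      ],
      "Resource": ["arn:aws:secretsmanager:eu-west-1:<ACCOUNT_ID>:secret:*"]
    }
  ]
}
EOF
jq -e '.Statement[0].Action | length' scm-iam-policy.json   # prints: 4
```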

And create the IAM policy

    aws iam create-policy --policy-name EKSSecretsManagerIAMPolicy \
      --policy-document file://scm-iam-policy.json

Create a ServiceAccount secrets-manager-sa.yaml:

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      labels:
        app.kubernetes.io/name: secrets-manager
      name: secrets-manager
      namespace: kube-system

Apply it:

    kubectl apply -f secrets-manager-sa.yaml

Create an IAM role for the external secrets controller and attach it to the service account created in the previous step. Replace <ACCOUNT_ID> with the id of your AWS account and <AWS_REGION> with the region where the cluster has been created:

    eksctl create iamserviceaccount --region <AWS_REGION> \
      --name secrets-manager \
      --namespace kube-system \
      --cluster eks-cluster \
      --attach-policy-arn arn:aws:iam::<ACCOUNT_ID>:policy/EKSSecretsManagerIAMPolicy \
      --override-existing-serviceaccounts \
      --approve

Add the kubernetes-external-secrets repository to helm:

    helm repo add external-secrets https://godaddy.github.io/kubernetes-external-secrets/
    helm repo update

Create a kubernetes-external-secrets-config.yaml to configure kubernetes-external-secrets:

    serviceAccount:
      create: false
      name: secrets-manager
    env:
      AWS_REGION: eu-west-1
    securityContext:
      fsGroup: 65534

and deploy kubernetes-external-secrets:

    helm install external-secrets external-secrets/kubernetes-external-secrets \
      -f ./kubernetes-external-secrets-config.yaml

Now that we have configured kubernetes-external-secrets, let’s see how to use it.

We create an external-secrets.yaml file to map our ExternalSecrets to the secrets defined in AWS Secrets Manager:

    apiVersion: "kubernetes-client.io/v1"
    kind: ExternalSecret
    metadata:
      name: secret-consumer-service
    spec:
      backendType: secretsManager
      data:
        - key: secret-consumer-service/credentials
          name: password
          property: password
        - key: secret-consumer-service/credentials
          name: username
          property: username

and apply it:

    kubectl apply -f external-secrets.yaml

You can verify that the secret has been correctly imported by running:

    kubectl get secret secret-consumer-service -o=yaml
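Note that kubectl shows Secret values base64-encoded. To recover the plaintext of a single key you can combine jsonpath with base64 -d; the decode step is demonstrated below on a literal value, since the kubectl call needs a live cluster:

```shell
# On the cluster you would run:
#   kubectl get secret secret-consumer-service -o jsonpath='{.data.username}' | base64 -d
# The encode/decode round-trip itself, on a literal value:
encoded=$(printf 'admin' | base64)
printf '%s' "$encoded" | base64 -d   # prints: admin
```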

We create a test pod secrets-consumer-pod.yaml that logs our secrets to standard output (do not do this in production code). It reads the Secret keys username and password from secret-consumer-service and maps them to two environment variables to be used by our application:

    apiVersion: v1
    kind: Pod
    metadata:
      name: secrets-consumer
    spec:
      containers:
        - name: secrets-consumer
          image: busybox
          args:
            - /bin/sh
            - -c
            - >
              while true;
              do
                echo "Username: $SECRET_USERNAME";
                echo "Password: $SECRET_PASSWORD";
                sleep 60;
              done
          env:
            - name: SECRET_USERNAME
              valueFrom:
                secretKeyRef:
                  name: secret-consumer-service
                  key: username
            - name: SECRET_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: secret-consumer-service
                  key: password
      restartPolicy: Never
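Inside the container, the secretKeyRef values arrive as ordinary environment variables, so the busybox loop only has to expand them. A local sketch of what the container sees (values hardcoded here purely for illustration):

```shell
# Simulate the injected environment and run the same expansion the pod performs.
export SECRET_USERNAME=admin SECRET_PASSWORD=1234
line=$(sh -c 'echo "Username: $SECRET_USERNAME"')
echo "$line"   # prints: Username: admin
```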

Apply it:

    kubectl apply -f secrets-consumer-pod.yaml

If you check the logs with kubectl logs secrets-consumer, you should see the secrets:

    secrets-consumer Username: admin
    secrets-consumer Password: 1234