EKS DNS troubleshooter - DNS Diagnostic tool
EKS DNS troubleshooter is an automated DNS troubleshooting utility for EKS cluster. It can be used to test, validate and troubleshoot DNS issues in EKS cluster.
This tool runs as a pod in EKS cluster and scans all the cluster configuration, performs DNS resolution and identifies the issue, then generates a diagnosis report in JSON format.
Spend less time in finding a root cause, more on Important Stuff!
Tool verifies the following scenarios to validate/troubleshoot DNS in EKS cluster:
v1.6.6
as of now).kube-dns
) exist and its endpoints.10.100.0.10
) and individual Coredns pod IPs.log
plugin is enabled in Coredns Configmap).To deploy the EKS DNS Troubleshooter to an EKS cluster:
curl -o iam-policy.json https://raw.githubusercontent.com/joshisumit/eks-dns-troubleshooter/v1.1.0/deploy/iam-policy.json
DNSTroubleshooterIAMPolicy
using the policy downloaded in the previous step.
aws iam create-policy \
--policy-name DNSTroubleshooterIAMPolicy \
--policy-document file://iam-policy.json
Take note of the policy ARN that is returned.
eks-dns-ts
, a cluster role, and a cluster role binding for the EKS DNS troubleshooter to use with the following command.
kubectl apply -f https://raw.githubusercontent.com/joshisumit/eks-dns-troubleshooter/v1.1.0/deploy/rbac-role.yaml
Create an IAM role for the EKS DNS Troubleshooter and attach the role to the service account created in the previous step.
eks-dns-troubleshooter
and attach the DNSTroubleshooterIAMPolicy
IAM policy that you created in a previous step to it. Note the Amazon Resource Name (ARN) of the role, once you’ve created it. Refer this document for details on creating an IAM role for Service accounts with eksctl, AWS CLI or AWS Management Console.Annotate the Kubernetes service account with the ARN of the role that you created with the following command (replace IAM Role ARN).
kubectl annotate serviceaccount -n default eks-dns-ts \
eks.amazonaws.com/role-arn=arn
iam:
role/eks-dns-troubleshooter
Deploy the EKS DNS Troubleshooter with the following command.
kubectl apply -f https://raw.githubusercontent.com/joshisumit/eks-dns-troubleshooter/v1.1.0/deploy/eks-dns-troubleshooter.yaml
Verify that pod is deployed
kubectl logs -l=app=eks-dns-troubleshooter
Should display output similar to the following.
time="2020-06-27T15:22:34Z" level=info msg="\n\t-----------------------------------------------------------------------\n\tEKS DNS Troubleshooter\n\tRelease: v1.1.0\n\tBuild: git-89606ea\n\tRepository: https://github.com/joshisumit/eks-dns-troubleshooter\n\t-----------------------------------------------------------------------\n\t"
{"file":"main.go:81","func":"main","level":"info","msg":"Running on Kubernetes v1.15.11-eks-af3caf","time":"2020-06-27T15:22:34Z"}
{"file":"main.go:100","func":"main","level":"info","msg":"kube-dns service ClusterIP: 10.100.0.10","time":"2020-06-27T15:22:34Z"}
{"file":"verify-configs.go:42","func":"checkServieEndpoint","level":"info","msg":"Endpoints addresses: {[{192.168.27.238 0xc0004488a0 \u0026ObjectReference{Kind:Pod,Namespace:kube-system,Name:coredns-6658f9f447-gm5sn,UID:75d7a6cf-4092-4501-a56f-d3c859cdc5d0,APIVersion:,ResourceVersion:36301646,FieldPath:,}} {192.168.5.208 0xc0004488b0 \u0026ObjectReference{Kind:Pod,Namespace:kube-system,Name:coredns-6658f9f447-gqqk2,UID:f1864697-db9d-460c-b219-d5fbbe7484b8,APIVersion:,ResourceVersion:36301561,FieldPath:,}}] [] [{dns 53 UDP} {dns-tcp 53 TCP}]}","time":"2020-06-27T15:22:34Z"}
...
{"file":"main.go:173","func":"main","level":"info","msg":"DNS Diagnosis completed. Please check diagnosis report in /var/log/eks-dns-diag-summary.json file.","time":"2020-06-27T15:23:34Z"}
Once pod is running it will validate and troubleshoot DNS, which will take around 2 mins and then it will generate the diagnosis report in the JSON format (last line in the above log excerpt).
POD_NAME=$(kubectl get pods -l=app=eks-dns-troubleshooter -o jsonpath='{.items..metadata.name}')
kubectl exec -ti $POD_NAME -- cat /var/log/eks-dns-diag-summary.json | jq
OR download the diagnosis report JSON file locally
kubectl exec -ti $POD_NAME -- cat /var/log/eks-dns-diag-summary.json > eks-dns-diag-summary.json
{
"diagnosisCompletion": true,
"diagnosisToolInfo": {
"release": "v1.1.0",
"repo": "https://github.com/joshisumit/eks-dns-troubleshooter",
"commit": "git-89606ea"
},
"Analysis": {
"dnstest": "DNS resolution is working correctly in the cluster",
"naclRules": "naclRules are configured correctly...NOT blocking any DNS communication",
"securityGroupConfigurations": "securityGroups are configured correctly...not blocking any DNS communication"
},
"eksVersion": "v1.16.8-eks-e16311",
"corednsChecks": {
"clusterIP": "10.100.0.10",
"endpointsIP": [
"192.168.10.9",
"192.168.17.192"
],
"notReadyEndpoints": [],
"namespace": "kube-system",
"imageVersion": "v1.6.6",
"recommendedVersion": "v1.6.6",
"dnstestResults": {
"dnsResolution": "success",
"description": "tests the internal and external DNS queries against ClusterIP and two Coredns Pod IPs",
"domainsTested": [
"amazon.com",
"kubernetes.default.svc.cluster.local"
],
"detailedResultForEachDomain": [
{
"domain": "amazon.com",
"server": "10.100.0.10:53",
"result": "success"
},
{
"domain": "amazon.com",
"server": "192.168.10.9:53",
"result": "success"
},
{
"domain": "amazon.com",
"server": "192.168.17.192:53",
"result": "success"
},
{
"domain": "kubernetes.default.svc.cluster.local",
"server": "10.100.0.10:53",
"result": "success"
},
{
"domain": "kubernetes.default.svc.cluster.local",
"server": "192.168.10.9:53",
"result": "success"
},
{
"domain": "kubernetes.default.svc.cluster.local",
"server": "192.168.17.192:53",
"result": "success"
}
]
},
"replicas": 2,
"podNames": [
"coredns-76f4cb57b4-25x8d",
"coredns-76f4cb57b4-2vs9w"
],
"corefile": ".:53 {\n log\n errors\n health\n kubernetes cluster.local in-addr.arpa ip6.arpa {\n pods insecure\n upstream\n fallthrough in-addr.arpa ip6.arpa\n }\n prometheus :9153\n forward . /etc/resolv.conf\n cache 30\n loop\n reload\n loadbalance\n}\n",
"resolvconf": {
"SearchPath": [
"default.svc.cluster.local",
"svc.cluster.local",
"cluster.local",
"eu-west-2.compute.internal"
],
"Nameserver": [
"10.100.0.10"
],
"Options": [
"ndots:5"
],
"Ndots": 5
},
"errorCheckInCorednsLogs": {
"errorsInLogs": false
}
},
"eksClusterChecks": {
"securityGroupChecks": {
"IsClusterSGRuleCorrect": true,
"InboundRule": {
"isValid": "true"
},
"OutboundRule": {
"isValid": "true"
}
},
"naclRulesCheck": true,
"region": "eu-west-2",
"securityGroupIds": [
"eksctl-dev-cluster-nodegroup-ng-1-SG-40FRGTVAFSPT",
"eksctl-dev-cluster-cluster-ClusterSharedNodeSecurityGroup-1PISH2DINONDS"
],
"clusterName": "dev-cluster",
"clusterSecurityGroup": "sg-0529eb51ffbae7373"
}
}
log
plugin in the coredns ConfigMap before running the tool.IAM roles for Service Accounts (IRSA)
to associate the Service Account that the EKS DNS Troubleshooter Deployment runs as with an IAM role that is able to perform these functions. If you are unable to use IRSA
, you may associate an IAM Policy with the EC2 instance on which the EKS DNS Troubleshooter pod runs./var/log/eks-dns-tool.log
- Tool execution logs which can be used for debugging purpose/var/log/eks-dns-diag-summary.json
- Final Diagnosis result in JSON formatkubectl exec -ti $POD_NAME -- /app/eks-dnshooter
curl
, dig
, nslookup
etc.