Why does the DNS Operator show a RouteHealthDegraded or can't marshal DNS message error?
Virtual Private Cloud Classic infrastructure
You receive an error message similar to one of the following.
The following example error message is displayed when installing IBM Cloud Pak for Data from the console.
XXX.us-south.containers.appdomain.cloud: Get "http://image-registry-openshift-image-registry.ocp-data-privacy-prod-c-XXX.us-south.containers.appdomain.cloud/v2/": dial tcp: lookup image-registry-openshift-image-registry.ocp-data-privacy-prod-c-XXX.us-south.containers.appdomain.cloud on XXX.XX.X.XX:XX: can't marshal DNS message
Example nslookup error.
# nslookup XXX.XXX.databases.appdomain.cloud
Server: XXX.XX.X.XX
Address: XXX.XX.X.XX:XX
Non-authoritative answer:
*** Can't find XXX.XXX.databases.appdomain.cloud: Parse error
Non-authoritative answer:
*** Can't find XXX.XXX.databases.appdomain.cloud: Parse error
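To check captured logs or command output for either of these symptoms, a small filter along the following lines might help. This is only a sketch: the function name `has_dns_marshal_symptom` is hypothetical, and the patterns are copied from the example messages above, so other clients might word the failure differently.

```shell
#!/bin/sh
# Hypothetical helper (not part of any CLI): reads text on stdin and
# succeeds (exit 0) when it contains one of the error strings shown in
# the example messages above.
has_dns_marshal_symptom() {
    grep -E -q "can't marshal DNS message|Can't find .*: Parse error"
}

# Example usage (the log file name is an assumption):
#   has_dns_marshal_symptom < install.log && echo "possible bufsize issue"
```

A match only suggests that this issue might apply. Confirm it by reviewing the `bufsize` value in the `dns-default` ConfigMap.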
The fix for bug 1953097 enabled the CoreDNS bufsize plug-in, which allows responses of up to 1232 bytes. Some DNS resolvers can't receive responses greater than 512 bytes. DNS resolvers that retry lookups over TCP, such as dig, are not impacted. DNS clients that don't require UDP DNS messages to exceed 512 bytes are also not impacted.
Update your cluster master and worker nodes.
- Update the master.
    ibmcloud oc cluster master update --cluster <clusterID> --version <4.6.38_openshift|4.7.19_openshift>
- After you update your master, run the cluster get command to get the cluster state and version, and verify that the state is deployed.
    ibmcloud oc cluster get --cluster <clusterID>
- Get the details of the openshift-dns configmap and review the bufsize by running the following command.
    oc get ConfigMap -n openshift-dns dns-default -o yaml
- If the bufsize is still 1232, get the name of the DNS Operator pod.
    oc get pods -n openshift-dns-operator
  Example output
    NAME                            READY   STATUS    RESTARTS   AGE
    dns-operator-111aa1aaab-xxxx1   2/2     Running   0          5h49m
- Delete the DNS Operator pod.
    oc delete pod <dns_operator_pod> -n openshift-dns-operator
- Wait for the DNS pod to restart. Run get pods with the --watch option to verify that the pod is deployed.
    oc get pods -n openshift-dns --watch
- Verify that the DNS Operator pod is running.
    oc get pods -n openshift-dns-operator
- Get the ConfigMap YAML and verify that the bufsize is 512.
    oc get configmap -n openshift-dns dns-default -o yaml
- After a few minutes, verify that the DNS resolution works. If you see an nslookup error, retry the nslookup.
    / # nslookup XXX.XXX.databases.appdomain.cloud
    Server: XXX.XX.X.XX
    Address: XXX.XX.X.XX:XX
    Non-authoritative answer:
    XXX.XXX.databases.appdomain.cloud canonical name = icd-prod-us-south-db-lm0sr.us-south.containers.appdomain.cloud
    icd-prod-us-south-db-lm0sr.us-south.containers.appdomain.cloud canonical name = icd-prod-us-south-db-lm0sr.XXX.akadns.net
    Non-authoritative answer:
    XXX.XXX.databases.appdomain.cloud canonical name = icd-prod-us-south-db-lm0sr.us-south.containers.appdomain.cloud
    icd-prod-us-south-db-lm0sr.us-south.containers.appdomain.cloud canonical name = icd-prod-us-south-db-lm0sr.XXX.akadns.net
    Name: icd-prod-us-south-db-lm0sr.XXX.akadns.net
    Address: XXX.XX.XXX.XX
    Name: icd-prod-us-south-db-lm0sr.XXX.akadns.net
    Address: XXX.XX.X.XX
    Name: icd-prod-us-south-db-lm0sr.XXX.akadns.net
    Address: XXX.XX.XXX.XXX
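The two verification steps above (reading the `bufsize` value and retrying `nslookup`) could be scripted roughly as follows. This is a minimal sketch, not an official procedure: the helper names `corefile_bufsize` and `retry` are hypothetical, the parser assumes the Corefile contains a line of the form `bufsize <number>`, and the retry loop assumes that `nslookup` exits with a nonzero status on failure in your image.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the first "bufsize" directive
# found in ConfigMap YAML (or a raw Corefile) read from stdin.
corefile_bufsize() {
    awk '$1 == "bufsize" { print $2; exit }'
}

# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between attempts. Returns 0 as soon as the command succeeds, 1 otherwise.
# Note: plain POSIX sh has no "local", so these variables are global.
retry() {
    attempts=$1; delay=$2; shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        i=$((i + 1))
        [ "$i" -le "$attempts" ] && sleep "$delay"
    done
    return 1
}

# Example usage (attempt count and delay are arbitrary choices):
#   oc get configmap -n openshift-dns dns-default -o yaml | corefile_bufsize
#   retry 5 10 nslookup XXX.XXX.databases.appdomain.cloud
```

If `corefile_bufsize` prints `1232`, the DNS Operator has not yet rewritten the ConfigMap, and deleting the DNS Operator pod as described above still applies.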