Why does the DNS Operator show a RouteHealthDegraded or can't marshal DNS message error?
Virtual Private Cloud Classic infrastructure
You receive an error message similar to one of the following.
The following example error message is displayed when installing IBM Cloud Pak for Data from the console.
XXX.us-south.containers.appdomain.cloud: Get "http://image-registry-openshift-image-registry.ocp-data-privacy-prod-c-XXX.us-south.containers.appdomain.cloud/v2/": dial tcp: lookup image-registry-openshift-image-registry.ocp-data-privacy-prod-c-XXX.us-south.containers.appdomain.cloud on XXX.XX.X.XX:XX: can't marshal DNS message
Example nslookup error.
# nslookup XXX.XXX.databases.appdomain.cloud
Server: XXX.XX.X.XX
Address: XXX.XX.X.XX:XX
Non-authoritative answer:
*** Can't find XXX.XXX.databases.appdomain.cloud: Parse error
Non-authoritative answer:
*** Can't find XXX.XXX.databases.appdomain.cloud: Parse error
The fix for bug 1953097 enabled the CoreDNS bufsize plug-in with a response size of 1232 bytes. Some DNS resolvers can't receive responses greater than 512 bytes. DNS resolvers that retry lookups over TCP, such as dig, are not impacted. DNS clients that don't require UDP DNS messages to exceed 512 bytes are also not impacted.
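To see whether a particular lookup produces a UDP response above the 512-byte limit described above, you can inspect the message size that dig reports. The following is a minimal sketch, not part of the documented fix: it assumes dig is installed on the client, and the function names dig_msg_size and exceeds_512 are hypothetical helpers introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical helpers; these names are illustrative and not part of dig, oc, or ibmcloud.

# Read dig output on stdin and print the received message size in bytes.
# dig reports it on a line such as ";; MSG SIZE  rcvd: <bytes>".
# Exits non-zero when no such line is found.
dig_msg_size() {
  awk '/MSG SIZE/ && /rcvd:/ { print $NF; found = 1; exit } END { if (!found) exit 1 }'
}

# Succeed when the given size is larger than the classic 512-byte UDP limit.
exceeds_512() {
  [ "$1" -gt 512 ]
}

# Example (requires network access; <hostname> is a placeholder):
#   +notcp and +ignore keep dig on UDP instead of retrying over TCP,
#   +bufsize=1232 advertises a 1232-byte EDNS0 buffer.
#
#   size=$(dig +notcp +ignore +bufsize=1232 <hostname> | dig_msg_size)
#   if exceeds_512 "$size"; then echo "response is larger than 512 bytes"; fi
```

A response larger than 512 bytes only indicates that the lookup is a candidate for this problem; whether it fails depends on the resolver or client that receives it.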
Update your cluster master and worker nodes.
-
ibmcloud oc cluster master update --cluster <clusterID> --version <4.6.38_openshift|4.7.19_openshift>
-
After you update your master, run the cluster get command to get the cluster state and version, and verify that the state is deployed.
ibmcloud oc cluster get --cluster <clusterID>
-
Get the details of the dns-default ConfigMap in the openshift-dns namespace and review the bufsize by running the following command.
oc get ConfigMap -n openshift-dns dns-default -o yaml
-
If the bufsize is still 1232, get the name of the DNS Operator pod.
oc get pods -n openshift-dns-operator
Example output
NAME                            READY   STATUS    RESTARTS   AGE
dns-operator-111aa1aaab-xxxx1   2/2     Running   0          5h49m
-
Delete the DNS Operator pod.
oc delete pod <dns_operator_pod> -n openshift-dns-operator
-
Wait for the DNS pod to restart. Run get pods with the --watch option to verify that the pod is deployed.
oc get pods -n openshift-dns --watch
-
Verify that the DNS Operator pod is running.
oc get pods -n openshift-dns-operator
-
Get the ConfigMap YAML and verify that the bufsize is 512.
oc get configmap -n openshift-dns dns-default -o yaml
-
After a few minutes, verify that DNS resolution works. If you see an nslookup error, retry the nslookup.
/ # nslookup XXX.XXX.databases.appdomain.cloud
Server: XXX.XX.X.XX
Address: XXX.XX.X.XX:XX
Non-authoritative answer:
XXX.XXX.databases.appdomain.cloud canonical name = icd-prod-us-south-db-lm0sr.us-south.containers.appdomain.cloud
icd-prod-us-south-db-lm0sr.us-south.containers.appdomain.cloud canonical name = icd-prod-us-south-db-lm0sr.XXX.akadns.net
Non-authoritative answer:
XXX.XXX.databases.appdomain.cloud canonical name = icd-prod-us-south-db-lm0sr.us-south.containers.appdomain.cloud
icd-prod-us-south-db-lm0sr.us-south.containers.appdomain.cloud canonical name = icd-prod-us-south-db-lm0sr.XXX.akadns.net
Name: icd-prod-us-south-db-lm0sr.XXX.akadns.net
Address: XXX.XX.XXX.XX
Name: icd-prod-us-south-db-lm0sr.XXX.akadns.net
Address: XXX.XX.X.XX
Name: icd-prod-us-south-db-lm0sr.XXX.akadns.net
Address: XXX.XX.XXX.XXX
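The bufsize check and the nslookup retry in the steps above can be scripted. The following is a minimal sketch under stated assumptions, not an official procedure: it assumes that the dns-default ConfigMap stores the CoreDNS configuration under a Corefile key and that the bufsize directive sits on its own line. The function names corefile_bufsize and retry are hypothetical helpers introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical helpers; these names are illustrative and not part of oc or ibmcloud.

# Print the value of the first "bufsize" directive in a Corefile read from stdin.
# Exits non-zero when no bufsize directive is present.
corefile_bufsize() {
  awk '$1 == "bufsize" { print $2; found = 1; exit } END { if (!found) exit 1 }'
}

# Run a command up to <attempts> times, sleeping <delay_seconds> after each failure.
# Usage: retry <attempts> <delay_seconds> <command> [args...]
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while :; do
    "$@" && return 0
    [ "$i" -ge "$attempts" ] && return 1
    i=$((i + 1))
    sleep "$delay"
  done
}

# Example (requires cluster access; the Corefile key is an assumption,
# and <hostname> is a placeholder):
#
#   oc get configmap -n openshift-dns dns-default -o jsonpath='{.data.Corefile}' | corefile_bufsize
#   retry 5 30 nslookup <hostname>
```

If corefile_bufsize prints 512, the ConfigMap reflects the fix. The retry helper gives DNS a few minutes to settle before a failed lookup is treated as a real error.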