Why does the DNS Operator show a RouteHealthDegraded or can't marshal DNS message error?

Virtual Private Cloud, Classic infrastructure

You receive an error message similar to one of the following.

The following example error message is displayed when installing IBM Cloud Pak for Data from the console.

XXX.us-south.containers.appdomain.cloud: Get "http://image-registry-openshift-image-registry.ocp-data-privacy-prod-c-XXX.us-south.containers.appdomain.cloud/v2/": dial tcp: lookup image-registry-openshift-image-registry.ocp-data-privacy-prod-c-XXX.us-south.containers.appdomain.cloud on XXX.XX.X.XX:XX: can't marshal DNS message

Example nslookup error.

# nslookup XXX.XXX.databases.appdomain.cloud
Server:        XXX.XX.X.XX
Address:    XXX.XX.X.XX:XX

Non-authoritative answer:
*** Can't find XXX.XXX.databases.appdomain.cloud: Parse error

Non-authoritative answer:
*** Can't find XXX.XXX.databases.appdomain.cloud: Parse error

The fix for bug 1953097 enabled the CoreDNS bufsize plug-in with a buffer size of 1232 bytes, so CoreDNS can send UDP responses of up to 1232 bytes. Some DNS resolvers can't receive UDP responses larger than 512 bytes. DNS resolvers that retry lookups over TCP, such as dig, are not impacted. DNS clients whose lookups don't require UDP DNS messages larger than 512 bytes are also not impacted.
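To see whether a particular lookup is likely to be affected, one option is to compare the response size that dig reports against the 512-byte limit. The following is a minimal sketch, not an official procedure. The function names dig_msg_size and exceeds_udp_512 are hypothetical helpers, and the dig invocation in the comment assumes that your version of dig supports the +notcp, +ignore, and +bufsize options.

```shell
#!/bin/sh
# Hypothetical helpers; they are not part of dig or the oc CLI.

# Read dig output on stdin and print the number from its
# ";; MSG SIZE  rcvd: N" line. Prints nothing if the line is absent.
dig_msg_size() {
    sed -n 's/^;; MSG SIZE[[:space:]]*rcvd:[[:space:]]*\([0-9][0-9]*\).*$/\1/p' | head -n 1
}

# Succeed (exit 0) when the given response size in bytes is larger
# than 512, the size that some resolvers cannot receive over UDP.
exceeds_udp_512() {
    [ "$1" -gt 512 ]
}

# Example usage (assumes dig is installed and the DNS server is reachable):
#   size="$(dig +notcp +ignore +bufsize=1232 <hostname> @<dns_server_ip> | dig_msg_size)"
#   if exceeds_udp_512 "$size"; then
#       echo "Response is $size bytes; 512-byte-only resolvers might fail"
#   fi
```

In this sketch, +ignore is used so that dig reports a truncated UDP response as-is rather than retrying over TCP, which would otherwise hide the problem.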

Update your cluster master and worker nodes.

  1. Update your cluster master.

    ibmcloud oc cluster master update --cluster <clusterID> --version <4.6.38_openshift|4.7.19_openshift>
    
  2. After you update your master, run the cluster get command to get the cluster state and version and verify that the state is deployed.

    ibmcloud oc cluster get --cluster <clusterID>
    
  3. Update your worker nodes.

  4. Get the details of the dns-default ConfigMap in the openshift-dns namespace and review the bufsize value by running the following command.

    oc get ConfigMap -n openshift-dns dns-default -o yaml
    
  5. If the bufsize is still 1232, get the name of the DNS Operator pod.

    oc get pods -n openshift-dns-operator
    

    Example output

    NAME READY STATUS RESTARTS AGE
    dns-operator-111aa1aaab-xxxx1 2/2 Running 0 5h49m
    
  6. Delete the DNS Operator pod.

    oc delete pod <dns_operator_pod> -n openshift-dns-operator
    
  7. Wait for the DNS pod to restart. Run oc get pods with the --watch option to verify that the pod is deployed.

    oc get pods -n openshift-dns --watch
    
  8. Verify that the DNS Operator pod is running.

    oc get pods -n openshift-dns-operator
    
  9. Get the ConfigMap YAML and verify that the bufsize is 512.

    oc get configmap -n openshift-dns dns-default -o yaml
    
  10. After a few minutes, verify that DNS resolution works. If you see an nslookup error, retry the nslookup command.

    / # nslookup XXX.XXX.databases.appdomain.cloud
    Server:        XXX.XX.X.XX
    Address:    XXX.XX.X.XX:XX
    
    Non-authoritative answer:
    XXX.XXX.databases.appdomain.cloud    canonical name = icd-prod-us-south-db-lm0sr.us-south.containers.appdomain.cloud
    icd-prod-us-south-db-lm0sr.us-south.containers.appdomain.cloud    canonical name = icd-prod-us-south-db-lm0sr.XXX.akadns.net
    
    Non-authoritative answer:
    XXX.XXX.databases.appdomain.cloud    canonical name = icd-prod-us-south-db-lm0sr.us-south.containers.appdomain.cloud
    icd-prod-us-south-db-lm0sr.us-south.containers.appdomain.cloud    canonical name = icd-prod-us-south-db-lm0sr.XXX.akadns.net
    Name:    icd-prod-us-south-db-lm0sr.XXX.akadns.net
    Address: XXX.XX.XXX.XX
    Name:    icd-prod-us-south-db-lm0sr.XXX.akadns.net
    Address: XXX.XX.X.XX
    Name:    icd-prod-us-south-db-lm0sr.XXX.akadns.net
    Address: XXX.XX.XXX.XXX
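
The bufsize check and the DNS Operator pod restart in the steps above can be sketched as a small set of shell functions. This is an illustrative sketch only. The function names are hypothetical, the parsing assumes that the ConfigMap YAML contains a line of the form "bufsize <number>", and the restart helper assumes that the openshift-dns-operator namespace contains a single DNS Operator pod.

```shell
#!/bin/sh
# Hypothetical helpers that wrap the oc commands from the steps above.

# Read ConfigMap YAML on stdin and print the first bufsize value found.
# Prints nothing if no "bufsize <number>" entry is present.
extract_bufsize() {
    grep -Eo 'bufsize[[:space:]]+[0-9]+' | head -n 1 | awk '{print $2}'
}

# Print the bufsize currently set in the dns-default ConfigMap.
current_bufsize() {
    oc get configmap -n openshift-dns dns-default -o yaml | extract_bufsize
}

# Delete the DNS Operator pod only when the bufsize is still 1232.
# Assumption: the first pod listed in the namespace is the DNS Operator pod.
restart_operator_if_needed() {
    size="$(current_bufsize)"
    if [ "$size" != "1232" ]; then
        echo "bufsize is ${size:-unknown}; not deleting the DNS Operator pod"
        return 0
    fi
    pod="$(oc get pods -n openshift-dns-operator -o name | head -n 1)"
    if [ -z "$pod" ]; then
        echo "No pod found in openshift-dns-operator" >&2
        return 1
    fi
    oc delete "$pod" -n openshift-dns-operator
}

# Example usage (requires a logged-in oc session):
#   restart_operator_if_needed
#   oc get pods -n openshift-dns --watch
#   current_bufsize
```

After the pods restart, running current_bufsize again is expected to print 512 if the update took effect, which corresponds to the verification in step 9.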