Debugging the health of the location control plane
When you create a Satellite location, IBM automatically sets up a master for the location control plane in IBM Cloud. Additionally, you must assign at least three hosts to the Satellite location control plane as worker nodes to run location components that IBM configures. If the location control plane components that run on your hosts have issues, use the following steps to find and replace the unhealthy host.
- Get your Satellite location ID.
ibmcloud sat location ls
- List the hostnames of the subdomains for your location control plane hosts.
ibmcloud sat location dns ls --location <location_name_or_ID>
Example output:
Retrieving location subdomains... OK
Hostname   Records   Health Monitor   SSL Cert Status   SSL Cert Secret Name   Secret Namespace
ne1d37313068166254bcb-edfc0a8ba65085c5081eced6816c5b9c-c000.us-east.satellite.appdomain.cloud   169.62.196.20,169.62.196.23,169.62.196.30   None   created   ne1d37313068166254bcb-edfc0a8ba65085c5081eced6816c5b9c-c000   default
ne1d37313068166254bcb-edfc0a8ba65085c5081eced6816c5b9c-c001.us-east.satellite.appdomain.cloud   169.62.196.30   None   created   ne1d37313068166254bcb-edfc0a8ba65085c5081eced6816c5b9c-c001   default
ne1d37313068166254bcb-edfc0a8ba65085c5081eced6816c5b9c-c002.us-east.satellite.appdomain.cloud   169.62.196.20   None   created   ne1d37313068166254bcb-edfc0a8ba65085c5081eced6816c5b9c-c002   default
ne1d37313068166254bcb-edfc0a8ba65085c5081eced6816c5b9c-c003.us-east.satellite.appdomain.cloud   169.62.196.23   None   created   ne1d37313068166254bcb-edfc0a8ba65085c5081eced6816c5b9c-c003   default
ne1d37313068166254bcb-edfc0a8ba65085c5081eced6816c5b9c-ce00.us-east.satellite.appdomain.cloud   ne1d37313068166254bcb-edfc0a8ba65085c5081eced6816c5b9c-c000.us-east.satellite.appdomain.cloud   None   created   ne1d37313068166254bcb-edfc0a8ba65085c5081eced6816c5b9c-ce00   default
- Check the health of the location control plane subdomains by running curl against each hostname on port 30000. If every hostname returns a 200 response, the control plane hosts are healthy and serving Kubernetes traffic. If any hostname does not return a 200 response, continue to the next step.
curl -v http://<hostname>:30000
Example output of a failed response:
* Rebuilt URL to: http://169.xx.xxx.xxx:30000/
*   Trying 169.xx.xxx.xxx...
* TCP_NODELAY set
* Connection failed
* connect to 169.xx.xxx.xxx port 30000 failed: Operation timed out
* Failed to connect to 169.xx.xxx.xxx port 30000: Operation timed out
* Closing connection 0
curl: (7) Failed to connect to 169.xx.xxx.xxx port 30000: Operation timed out
Example output of a 200 response:
* Rebuilt URL to: http://169.xx.xxx.xxx:30000/
*   Trying 169.xx.xxx.xxx...
* TCP_NODELAY set
* Connected to 169.xx.xxx.xxx (169.xx.xxx.xxx) port 30000 (#0)
> GET / HTTP/1.1
> Host: 169.xx.xxx.xxx:30000
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 200 OK
< content-length: 58
< cache-control: no-cache
< content-type: text/html
< connection: close
<
<html><body><h1>200 OK</h1>
Service ready.
</body></html>
* Closing connection 0
- Find the ID of the host that did not return a 200 response. Compare the IP address from the failed curl output in the previous step (for example, the address in the Trying 169.xx.xxx.xxx line) with the Worker IP column in the output of the following command.
ibmcloud sat host ls --location <location_ID> | grep infrastructure
Example output:
Name    ID                     State      Status   Cluster          Worker ID                Worker IP
host1   aaaaa1a11aaaaaa111aa   assigned   Ready    infrastructure   sat-virtualser-1234...   169.xx.xxx.xxx
host2   bbbbbbb22bb2bbb222b2   assigned   Ready    infrastructure   sat-virtualser-1234...   169.xx.xxx.xxx
host3   ccccc3c33ccccc3333cc   assigned   Ready    infrastructure   sat-virtualser-1234...   169.xx.xxx.xxx
- Add a host to the location control plane in the same zone as the unhealthy host, so that the control plane has enough compute resources to keep running after you remove the unhealthy host.
- Remove the unhealthy host from the location control plane.
- Optional: You can reload the operating system on the unhealthy host and try to attach and assign the host to IBM Cloud Satellite again.
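The curl health check from the steps above can be run against every location control plane subdomain in one pass. The following Bash sketch is hypothetical: the function names extract_hostnames, probe, and check_control_plane are illustrative, the 10-second timeout is an arbitrary choice, and the parsing assumes the ibmcloud sat location dns ls output layout shown in the example (a header row whose first column is Hostname, followed by one subdomain per row). Instead of curl -v, it uses curl's -w '%{http_code}' option to print only the status code, which is 000 when curl cannot connect.

```shell
#!/usr/bin/env bash
# Hypothetical helper functions for the curl health check.
# Assumes the `ibmcloud sat location dns ls` output layout from the example:
# a header row whose first column is "Hostname", then one subdomain per row.

# Read the DNS listing on stdin and print the first column of each
# non-empty row that follows the "Hostname" header row.
extract_hostnames() {
  awk 'found && NF { print $1 } $1 == "Hostname" { found = 1 }'
}

# Print the HTTP status code returned by http://<hostname>:30000.
# curl prints 000 when it cannot connect; "|| true" keeps that case non-fatal.
probe() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 10 "http://$1:30000" || true
}

# Usage: check_control_plane <location_name_or_ID>
# Prints OK or FAILED for each subdomain and returns 1 if any check failed.
check_control_plane() {
  local location="$1" host code status=0
  while read -r host; do
    code="$(probe "$host")"
    if [ "$code" = "200" ]; then
      printf 'OK      %s\n' "$host"
    else
      printf 'FAILED  %s (HTTP %s)\n' "$host" "$code"
      status=1
    fi
  done < <(ibmcloud sat location dns ls --location "$location" | extract_hostnames)
  return "$status"
}
```

To use it, save the functions to a file, source that file in Bash, and run check_control_plane <location_name_or_ID>. Any subdomain reported as FAILED is a candidate for the next step, where you match its IP address to a host ID.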
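Matching a failing IP address to a host ID, as described in the steps above, can also be scripted. This Bash sketch is a hypothetical illustration: the function names find_host_by_ip and lookup_unhealthy_host are made up, and it assumes the ibmcloud sat host ls column layout from the example output, where Worker IP is the last column. It keeps the original filter on rows that contain infrastructure and prints the Name and ID of each matching host.

```shell
#!/usr/bin/env bash
# Hypothetical helper that maps a failing IP address to a control plane host.
# Assumes the `ibmcloud sat host ls` layout from the example output, where
# Worker IP is the last column, and keeps the original grep for "infrastructure".

# Read the host listing on stdin and print "<Name> <ID>" for rows that
# contain "infrastructure" and whose last column equals the given IP address.
find_host_by_ip() {
  awk -v ip="$1" '/infrastructure/ && $NF == ip { print $1, $2 }'
}

# Usage: lookup_unhealthy_host <location_ID> <worker_IP>
lookup_unhealthy_host() {
  ibmcloud sat host ls --location "$1" | find_host_by_ip "$2"
}
```

The printed ID is the value you need when you remove the unhealthy host from the location control plane. An empty result means no control plane host has that Worker IP, so double-check the IP address taken from the curl output.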