Running kubernetes kubeadm cluster behind a corporate firewall/proxy server
It looks like you are misunderstanding a few Kubernetes concepts that I'd like to help clarify here. A reference to `node.kubernetes.io` is not an attempt to make any network calls to that domain; it is simply the convention Kubernetes uses to namespace string keys. So if you ever have to apply labels, annotations, or tolerations, you would define your own keys like `subdomain.domain.tld/some-key`.
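For example, a pod spec using a hypothetical custom key (`example.com/dedicated` here is made up for illustration) could look like this; no network call is ever made to `example.com`:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo
  labels:
    # a namespaced label key, just a string
    example.com/team: platform
spec:
  tolerations:
  # a namespaced toleration key, also just a string
  - key: example.com/dedicated
    operator: Equal
    value: gpu
    effect: NoSchedule
  containers:
  - name: app
    image: nginx
```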
As for the Calico issue that you are experiencing, it looks like this error is our culprit:

```
network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: i/o timeout]
```

`10.96.0.1` is the IP address used to refer to the Kubernetes API server from within pods. It seems like the `calico/node` pod running on your node is failing to reach the API server. Could you give more context around how you set up Calico? Do you know what version of Calico you are running?
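If you're not sure which Calico version you deployed, you can usually read it off the image tag (this assumes the standard `calico-node` DaemonSet name in the `kube-system` namespace, as used by the stock Calico manifests):

```shell
# Print the calico/node image; the tag is the Calico version
kubectl -n kube-system get daemonset calico-node \
  -o jsonpath='{.spec.template.spec.containers[0].image}'
```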
The fact that your `calico/node` instance is trying to access the `crd.projectcalico.org/v1/clusterinformations` resource tells me that it is using the Kubernetes datastore for its backend. Are you sure you're not trying to run Calico in etcd mode?
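You can confirm which datastore backend `calico-node` was configured with by inspecting its environment (again assuming the standard DaemonSet name and namespace; in the stock manifests the variable is `DATASTORE_TYPE`):

```shell
# Prints "kubernetes" for the Kubernetes datastore, "etcdv3" for etcd mode
kubectl -n kube-system get daemonset calico-node \
  -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="DATASTORE_TYPE")].value}'
```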
It doesn't seem like you have a problem pulling the image, as otherwise you would see an `ImagePullBackOff` status. (Although that may come later, after the error message you are seeing.)
The error you are seeing from your pods is related to them not being able to connect to the kube-apiserver internally. It looks like a timeout, so most likely there's something wrong with the `kubernetes` service in your `default` namespace. You can check it like this, for example:

```
$ kubectl -n default get svc
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   2d20h
```
It could be that it's missing(?) You can always re-create it:

```
$ cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  labels:
    component: apiserver
    provider: kubernetes
  name: kubernetes
  namespace: default
spec:
  clusterIP: 10.96.0.1
  type: ClusterIP
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: 443
EOF
```
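Once the service exists, you can check in-cluster connectivity to the API server from a throwaway pod (the `curlimages/curl` image is an assumption; any image with `curl` works, and `-k` skips TLS verification for this test only):

```shell
# If this also times out, the problem is cluster networking / kube-proxy
# rather than Calico's own configuration
kubectl run api-check --rm -it --restart=Never \
  --image=curlimages/curl -- \
  curl -sk https://10.96.0.1:443/version
```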
The toleration is basically saying that the pod can tolerate being scheduled on a node that has the `node.kubernetes.io/not-ready:NoExecute` and `node.kubernetes.io/unreachable:NoExecute` taints, but your error doesn't look like it is related to that.
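If you want to rule taints out anyway, you can list what is actually set on your nodes:

```shell
# Show the taint keys on every node; look for not-ready / unreachable
kubectl get nodes \
  -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'
```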
That issue normally means the Docker daemon is unable to respond. If there is any other service consuming too much CPU or I/O, this issue might occur.
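A quick way to check that on the node itself (this assumes a systemd host with Docker as the container runtime, which matches a typical kubeadm setup of that era):

```shell
# Is the daemon running and responsive?
systemctl status docker --no-pager
docker info > /dev/null && echo "docker is responding"

# Look for processes hogging CPU that could starve the daemon
top -b -n 1 | head -15
```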