TLS errors after converting from a self-signed to a commercial certificate
When I installed the cluster, I used a self-signed certificate from our internal CA. Everything was fine until I started getting certificate errors from the application I was deploying to the OKD cluster. Rather than keep fixing one certificate problem after another, we decided to buy a commercial certificate and install it. So we bought a wildcard SAN certificate from GlobalSign (with the same names as the one we originally got from our internal CA), and I'm having a lot of problems installing it.
Keep in mind that I have tried dozens of iterations; I'm only documenting my most recent attempt here to figure out what's wrong. This is on my test cluster (a VM server), and I revert to a snapshot after every attempt. The snapshots are working clusters that use the internal CA certificates.
My first step is to build the CA file to pass to the installer. I downloaded GlobalSign's root and intermediate certificates and put them in a ca-globalsign.crt file (PEM format).
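Before pointing the installer at the bundle, it can help to confirm that ca-globalsign.crt really contains both certificates and that they were not glued together on one line, which can happen when the first file has no trailing newline before cat appends the second. This is a minimal sketch, assuming the bundle should hold exactly two certificates (one intermediate plus one root); split_pem_bundle and check_ca_bundle are hypothetical helper names, not part of OKD or openssl.

```python
import re
import sys

# One PEM certificate block, including its BEGIN/END armor lines.
PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----\s+.*?-----END CERTIFICATE-----", re.DOTALL
)
# An END marker directly followed by a BEGIN marker on the same line, which is
# what `cat a.crt b.crt` produces when a.crt lacks a trailing newline.
GLUED = re.compile(r"-----END CERTIFICATE-----[ \t]*-----BEGIN")


def split_pem_bundle(text: str) -> list[str]:
    """Return every PEM certificate in the bundle, in file order."""
    return PEM_CERT.findall(text)


def check_ca_bundle(text: str, expected: int = 2) -> None:
    """Raise ValueError if the bundle is malformed or has the wrong cert count.

    expected=2 is an assumption: one intermediate plus one root.
    """
    if GLUED.search(text):
        raise ValueError("two certificates share a line; add a newline between them")
    found = len(split_pem_bundle(text))
    if found != expected:
        raise ValueError(f"expected {expected} certificates, found {found}")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_bundle.py ca-globalsign.crt
    with open(sys.argv[1], encoding="ascii") as fh:
        check_ca_bundle(fh.read())
    print("bundle looks complete")
```

The glued-line check matters because a bundle can still contain two BEGIN/END pairs while being laid out in a way some PEM readers reject.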
When I run
openssl verify -CAfile ../ca-globalsign.crt labtest.mycompany.com.pem
I get:
labtest.mycompany.com.pem: OK
and
openssl x509 -in labtest.mycompany.com.pem -text -noout
gives me (sensitive values redacted):
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            (redacted)
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: C=BE, O=GlobalSign nv-sa, CN=GlobalSign Organization Validation CA - SHA256 - G2
        Validity
            Not Before: Apr 29 16:11:07 2019 GMT
            Not After : Apr 29 16:11:07 2020 GMT
        Subject: C=US, ST=(redacted), L=(redacted), OU=Information Technology, O=(redacted), CN=labtest.mycompany.com
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (2048 bit)
                Modulus:
                    (redacted)
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            Authority Information Access:
                CA Issuers - URI:http://secure.globalsign.com/cacert/gsorganizationvalsha2g2r1.crt
                OCSP - URI:http://ocsp2.globalsign.com/gsorganizationvalsha2g2
            X509v3 Certificate Policies:
                Policy: 1.3.6.1.4.1.4146.1.20
                  CPS: https://www.globalsign.com/repository/
                Policy: 2.23.140.1.2.2
            X509v3 Basic Constraints:
                CA:FALSE
            X509v3 Subject Alternative Name:
                DNS:labtest.mycompany.com, DNS:*.labtest.mycompany.com, DNS:*.apps.labtest.mycompany.com
            X509v3 Extended Key Usage:
                TLS Web Server Authentication, TLS Web Client Authentication
            X509v3 Subject Key Identifier:
                (redacted)
            X509v3 Authority Key Identifier:
                (redacted)
                (redacted)
I ran these checks on my local computer, and everything I know about SSL indicates that the certificate is fine. The new files are stored in the project I use to keep my configuration, such as the OKD install inventory.
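To double-check that the Subject Alternative Name list above covers every hostname the inventory uses, the SAN line can be matched against those names. This is a minimal sketch using a simplified form of the RFC 6125 wildcard rule (a "*" may only be the whole leftmost label and stands for exactly one label); parse_dns_sans, san_matches and uncovered are hypothetical helpers, and the hostname list is taken from the inventory below.

```python
import re


def parse_dns_sans(x509_text: str) -> list[str]:
    """Collect the DNS: entries from `openssl x509 -text -noout` output."""
    return re.findall(r"DNS:([^,\s]+)", x509_text)


def san_matches(pattern: str, hostname: str) -> bool:
    """Simplified RFC 6125 matching: '*' may only be the whole leftmost
    label and stands for exactly one label."""
    p = pattern.lower().rstrip(".").split(".")
    h = hostname.lower().rstrip(".").split(".")
    if len(p) != len(h):
        return False
    if p[0] == "*":
        return p[1:] == h[1:]
    return p == h


def uncovered(sans: list[str], hostnames: list[str]) -> list[str]:
    """Return the hostnames that no SAN entry covers."""
    return [h for h in hostnames if not any(san_matches(s, h) for s in sans)]


SAN_LINE = ("DNS:labtest.mycompany.com, DNS:*.labtest.mycompany.com, "
            "DNS:*.apps.labtest.mycompany.com")
HOSTS = [
    "console.labtest.mycompany.com",        # master public hostname
    "registry.apps.labtest.mycompany.com",  # registry route
    "okdmastertest.labtest.mycompany.com",  # master host
]

if __name__ == "__main__":
    print(uncovered(parse_dns_sans(SAN_LINE), HOSTS))  # prints []
```

All three names are covered, so a name mismatch is unlikely to be the problem here. A route such as a.b.apps.labtest.mycompany.com would not be covered, because the wildcard spans only one label.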
Next, I updated the certificate paths in the Ansible inventory and ran
ansible-playbook -i ../okd_install/inventory/okd_labtest_inventory.yml playbooks/redeploy-certificates.yml
From what I read in the documentation, the playbook should simply run through and the cluster should come up with the new certificate. That is not what happens. With openshift_master_overwrite_named_certificates: false in the inventory file, the playbook completes, but it only replaces the certificate on the *.apps.labtest domain; console.labtest keeps the original certificate. The console does come online, except that monitoring in the cluster console reports "Bad Gateway".
If I run the playbook again with openshift_master_overwrite_named_certificates: true, then /var/log/containers/master-api*.log fills up with errors like this:
{"log":"I0507 15:53:28.451851 1 logs.go:49] http: TLS handshake error from 10.128.0.56:46796: EOF\n","stream":"stderr","time":"2019-05-07T19:53:28.451894391Z"}
{"log":"I0507 15:53:28.455218 1 logs.go:49] http: TLS handshake error from 10.128.0.56:46798: EOF\n","stream":"stderr","time":"2019-05-07T19:53:28.455272658Z"}
{"log":"I0507 15:53:28.458742 1 logs.go:49] http: TLS handshake error from 10.128.0.56:46800: EOF\n","stream":"stderr","time":"2019-05-07T19:53:28.461070768Z"}
{"log":"I0507 15:53:28.462093 1 logs.go:49] http: TLS handshake error from 10.128.0.56:46802: EOF\n","stream":"stderr","time":"2019-05-07T19:53:28.463719816Z"}
and these:
{"log":"I0507 15:53:29.355463 1 logs.go:49] http: TLS handshake error from 10.70.25.131:44424: remote error: tls: bad certificate\n","stream":"stderr","time":"2019-05-07T19:53:29.357218793Z"}
{"log":"I0507 15:53:29.357961 1 logs.go:49] http: TLS handshake error from 10.70.25.132:43128: remote error: tls: bad certificate\n","stream":"stderr","time":"2019-05-07T19:53:29.358779155Z"}
{"log":"I0507 15:53:29.357993 1 logs.go:49] http: TLS handshake error from 10.70.25.132:43126: remote error: tls: bad certificate\n","stream":"stderr","time":"2019-05-07T19:53:29.358790397Z"}
{"log":"I0507 15:53:29.405532 1 logs.go:49] http: TLS handshake error from 10.70.25.131:44428: remote error: tls: bad certificate\n","stream":"stderr","time":"2019-05-07T19:53:29.406873158Z"}
{"log":"I0507 15:53:29.527221 1 logs.go:49] http: TLS handshake error from 10.70.25.132:43130: remote error: tls: bad certificate\n","stream":"stderr","time":"2019-05-07T19:53
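To see which clients are failing and why, these lines can be tallied by source address and reason. This sketch assumes the files under /var/log/containers/ use the Docker json-file layout shown above (one JSON object per line with a log field); summarize is a hypothetical helper, and only IPv4 client addresses are matched. In Go's TLS stack, "remote error: tls: bad certificate" means the peer sent a bad_certificate alert, typically because that client rejected the certificate the API server presented, while a bare EOF means the client closed the connection mid-handshake.

```python
import glob
import json
import re
from collections import Counter

# Go net/http server message, e.g.
# "http: TLS handshake error from 10.128.0.56:46796: EOF"
# Only IPv4 client addresses are matched in this sketch.
HANDSHAKE = re.compile(r"TLS handshake error from ([\d.]+):\d+: (.+)$")


def summarize(lines) -> Counter:
    """Count handshake errors per (client IP, reason) in json-file log lines."""
    counts: Counter = Counter()
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        if not isinstance(record, dict):
            continue
        match = HANDSHAKE.search(str(record.get("log", "")).rstrip("\n"))
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    lines = []
    for path in glob.glob("/var/log/containers/master-api*.log"):
        with open(path, encoding="utf-8", errors="replace") as fh:
            lines.extend(fh)
    for (ip, reason), n in summarize(lines).most_common():
        print(f"{n:6d}  {ip:15}  {reason}")
```

Truncated lines such as the last one above are not valid JSON and are skipped rather than miscounted.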
The playbook then hangs at TASK [Remove web console pods] and sits there for hours. If I log in to the master and run oc get pods in the openshift-web-console namespace, the old pod is stuck in the Terminating state. When I describe the new pod, which is stuck in Pending, it reports that the disk is full. I suspect the real cause is that it cannot communicate with the storage system because of all the TLS errors above, so it just stays there. If I force-delete the terminating pod, restart the master pod, delete the new pod that was trying to start, and restart again, the cluster recovers. The web console then comes online, but all my log files are flooded with these TLS errors. More concerning, though, is that the playbook hangs at that point, so I assume the later steps would cause problems even after the web console is back online.
I have also tried redeploying the server CA. That caused a problem because my new certificate is not a CA certificate. When I instead ran only the CA redeployment playbook, letting the cluster regenerate its own server CA, it completed, but when I then ran redeploy-certificates.yml, I got the same result.
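The CA redeployment failure is consistent with the certificate dump above: the commercial certificate carries X509v3 Basic Constraints CA:FALSE, and its Key Usage lists only Digital Signature and Key Encipherment, so it is not permitted to sign the other certificates the cluster issues internally. A quick way to read that flag from the same openssl x509 -text output is sketched below; is_ca_certificate is a hypothetical helper, and the regex assumes openssl's usual text layout.

```python
import re

# Captures the CA flag printed under "X509v3 Basic Constraints:" by
# `openssl x509 -text -noout`, e.g. "CA:FALSE" or "CA:TRUE, pathlen:0".
BASIC_CONSTRAINTS = re.compile(r"X509v3 Basic Constraints:[^\n]*\n\s*CA:(TRUE|FALSE)")


def is_ca_certificate(x509_text: str) -> bool:
    """True only if the certificate explicitly declares CA:TRUE.

    A missing Basic Constraints extension is treated as "not a CA".
    """
    match = BASIC_CONSTRAINTS.search(x509_text)
    return match is not None and match.group(1) == "TRUE"


LEAF_EXCERPT = """\
            X509v3 Basic Constraints:
                CA:FALSE
"""

if __name__ == "__main__":
    print(is_ca_certificate(LEAF_EXCERPT))  # prints False
```

In other words, a purchased server certificate like this one belongs in the named-certificate and router settings, while the cluster's internal CA has to stay a certificate that is allowed to act as a CA.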
Here is my inventory file:
all:
  children:
    etcd:
      hosts:
        okdmastertest.labtest.mycompany.com:
    masters:
      hosts:
        okdmastertest.labtest.mycompany.com:
    nodes:
      hosts:
        okdmastertest.labtest.mycompany.com:
          openshift_node_group_name: node-config-master-infra
        okdnodetest1.labtest.mycompany.com:
          openshift_node_group_name: node-config-compute
          openshift_schedulable: True
    OSEv3:
      children:
        etcd:
        masters:
        nodes:
        # https://docs.okd.io/latest/install_config/persistent_storage/persistent_storage_glusterfs.html#overview-containerized-glusterfs
        # https://github.com/openshift/openshift-ansible/tree/master/playbooks/openshift-glusterfs
        # glusterfs:
      vars:
        openshift_deployment_type: origin
        ansible_user: root
        openshift_master_cluster_method: native
        openshift_master_default_subdomain: apps.labtest.mycompany.com
        openshift_install_examples: true
        openshift_master_cluster_hostname: console.labtest.mycompany.com
        openshift_master_cluster_public_hostname: console.labtest.mycompany.com
        openshift_hosted_registry_routehost: registry.apps.labtest.mycompany.com
        openshift_certificate_expiry_warning_days: 30
        openshift_certificate_expiry_fail_on_warn: false
        openshift_master_overwrite_named_certificates: true
        openshift_hosted_registry_routetermination: reencrypt
        openshift_master_named_certificates:
          - certfile: "/Users/me/code/devops/okd_install/certs/labtest/commercial.04.29.2019.labtest.mycompany.com.pem"
            keyfile: "/Users/me/code/devops/okd_install/certs/labtest/commercial.04.29.2019.labtest.mycompany.com.key"
            cafile: "/Users/me/code/devops/okd_install/certs/ca-globalsign.crt"
            names:
              - "console.labtest.mycompany.com"
              # - "labtest.mycompany.com"
              # - "*.labtest.mycompany.com"
              # - "*.apps.labtest.mycompany.com"
        openshift_hosted_router_certificate:
          certfile: "/Users/me/code/devops/okd_install/certs/labtest/commercial.04.29.2019.labtest.mycompany.com.pem"
          keyfile: "/Users/me/code/devops/okd_install/certs/labtest/commercial.04.29.2019.labtest.mycompany.com.key"
          cafile: "/Users/me/code/devops/okd_install/certs/ca-globalsign.crt"
        openshift_hosted_registry_routecertificates:
          certfile: "/Users/me/code/devops/okd_install/certs/labtest/commercial.04.29.2019.labtest.mycompany.com.pem"
          keyfile: "/Users/me/code/devops/okd_install/certs/labtest/commercial.04.29.2019.labtest.mycompany.com.key"
          cafile: "/Users/me/code/devops/okd_install/certs/ca-globalsign.crt"
        # LDAP auth
        openshift_master_identity_providers:
          - name: 'mycompany_ldap_provider'
            challenge: true
            login: true
            kind: LDAPPasswordIdentityProvider
            attributes:
              id:
                - dn
              email:
                - mail
              name:
                - cn
              preferredUsername:
                - sAMAccountName
            bindDN: '[email protected]'
            bindPassword: (redacted)
            insecure: true
            url: 'ldap://dc-pa1.int.mycompany.com/ou=mycompany,dc=int,dc=mycompany,dc=com'
What am I missing? I believe the redeploy-certificates.yml playbook is designed to renew certificates, so why can't I use it to switch to a new commercial certificate? It seems to (partly) replace the certificate on the router while breaking the internal server certificate in the process. I'm at my wit's end and don't know what else to try.
You should set openshift_master_cluster_hostname and openshift_master_cluster_public_hostname to different hostnames, and both hostnames must resolve in DNS. Your commercial certificate is then used only for the external (public) access point.
The openshift_master_cluster_public_hostname and openshift_master_cluster_hostname parameters in the Ansible inventory file, by default /etc/ansible/hosts, must be different.
If they are the same, the named certificates will fail and you will need to re-install them.
# Native HA with External LB VIPs
openshift_master_cluster_hostname=internal.paas.example.com
openshift_master_cluster_public_hostname=external.paas.example.com
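These rules can be checked against the inventory before running any playbook. The sketch works on a plain dict, so the vars section of the YAML inventory would have to be loaded first (for example with PyYAML's yaml.safe_load, which is not in the standard library). The check that the named certificate should list the public hostname and not the internal one is an interpretation of "used as an external access point" above, not a rule taken from openshift-ansible; check_master_hostnames is a hypothetical helper.

```python
from __future__ import annotations


def check_master_hostnames(inv_vars: dict) -> list[str]:
    """Flag the hostname problems described above in an inventory `vars` dict.

    The rules encoded here follow this answer, not openshift-ansible itself.
    """
    problems: list[str] = []
    internal = inv_vars.get("openshift_master_cluster_hostname")
    public = inv_vars.get("openshift_master_cluster_public_hostname")
    if internal is not None and internal == public:
        problems.append("cluster_hostname and cluster_public_hostname are identical")
    for cert in inv_vars.get("openshift_master_named_certificates", []):
        names = cert.get("names", [])
        if internal is not None and internal in names:
            problems.append(f"named certificate covers internal hostname {internal}")
        if public is not None and public not in names:
            problems.append(f"named certificate does not list public hostname {public}")
    return problems


# Values copied from the question's inventory.
QUESTION_VARS = {
    "openshift_master_cluster_hostname": "console.labtest.mycompany.com",
    "openshift_master_cluster_public_hostname": "console.labtest.mycompany.com",
    "openshift_master_named_certificates": [
        {"names": ["console.labtest.mycompany.com"]},
    ],
}

if __name__ == "__main__":
    for problem in check_master_hostnames(QUESTION_VARS):
        print(problem)
```

Run against the question's values, it reports both the identical hostnames and the named certificate covering the internal hostname.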
Also, for testing purposes, it is better to configure the certificates for each component one step at a time. For example, first configure a custom master host certificate and verify it. Then configure a custom wildcard certificate for the default router and verify it, and so on for the remaining components. Once every individual certificate redeployment succeeds, you can finally run with the full set of parameters to switch the cluster to the commercial certificate.
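For the "verify it" part of each step, one option is to connect to each endpoint and check which CA issued the certificate it actually serves. The sketch uses only Python's ssl module; the port 8443 in the usage comment assumes the default OKD 3.x master API port (the inventory does not override it), and checking registry.apps.labtest.mycompany.com on port 443 would exercise the router certificate. fetch_peercert and issuer_org are hypothetical helper names.

```python
from __future__ import annotations

import socket
import ssl
import sys


def fetch_peercert(host: str, port: int = 443, cafile: str | None = None) -> dict:
    """Do a verified TLS handshake and return the peer certificate as a dict.

    With cafile=None the system trust store is used; pass the internal CA
    file instead to inspect endpoints that still serve internal certificates.
    Raises ssl.SSLCertVerificationError if the chain or hostname does not verify.
    """
    context = ssl.create_default_context(cafile=cafile)
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


def issuer_org(peercert: dict) -> str | None:
    """Return the issuer's organizationName from getpeercert()'s nested tuples."""
    for rdn in peercert.get("issuer", ()):
        for key, value in rdn:
            if key == "organizationName":
                return value
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_endpoint.py console.labtest.mycompany.com 8443
    host = sys.argv[1]
    port = int(sys.argv[2]) if len(sys.argv) > 2 else 443
    print(host, port, issuer_org(fetch_peercert(host, port)))
```

After the master step, the console endpoint should report GlobalSign nv-sa as the issuer; a verification error instead usually points at a missing intermediate or a hostname mismatch.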
For more details, see Configuring Custom Certificates in the OKD documentation. Hope it helps.