How to use HttpClient with any SSL certificate, no matter how "bad" it is
I am using Apache's HttpClient in a web crawler that only crawls public data.
I would like it to be able to crawl a website with an invalid certificate, no matter how invalid.
My crawler does not pass any usernames, passwords etc. and no sensitive data is sent or received.
For this use case, I would crawl the http version of a site if it exists, but sometimes of course it doesn't.
How can this be done with Apache's HttpClient?
I tried a few suggestions like this one, but they still fail with some invalid certificates, for example:
failed for url:https://dh480.badssl.com/, reason:java.lang.RuntimeException: Could not generate DH keypair
failed for url:https://null.badssl.com/, reason:Received fatal alert: handshake_failure
failed for url:https://rc4-md5.badssl.com/, reason:Received fatal alert: handshake_failure
failed for url:https://rc4.badssl.com/, reason:Received fatal alert: handshake_failure
failed for url:https://superfish.badssl.com/, reason:Connection reset
Note that I tried this with jdk.tls.disabledAlgorithms in my $JAVA_HOME/jre/lib/security/java.security file set to nothing, to ensure this wasn't the issue, and I still get failures like the above.
The short answer to your question, which is specifically how to trust all certificates, is to use TrustAllStrategy and do something like this:
// Build an SSLContext that trusts every certificate chain the server presents.
SSLContextBuilder sslContextBuilder = new SSLContextBuilder();
sslContextBuilder.loadTrustMaterial(null, new TrustAllStrategy());
SSLConnectionSocketFactory socketFactory =
        new SSLConnectionSocketFactory(sslContextBuilder.build());
CloseableHttpClient httpclient =
        HttpClients.custom().setSSLSocketFactory(socketFactory).build();
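To make it clearer what "trust all" amounts to underneath, here is a minimal sketch using only plain JSSE rather than HttpClient's TrustAllStrategy. The class name TrustAllContext is made up for illustration; it is an assumption of mine, not part of any library. The trust manager simply never throws, so every chain is accepted.

```java
import java.security.GeneralSecurityException;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

// Hypothetical helper, plain JSSE only (no HttpClient classes involved).
class TrustAllContext {

    /** Builds an SSLContext whose trust manager accepts every certificate chain. */
    static SSLContext create() {
        TrustManager trustAll = new X509TrustManager() {
            @Override
            public void checkClientTrusted(X509Certificate[] chain, String authType) {
                // Never throws, so any client chain is accepted.
            }

            @Override
            public void checkServerTrusted(X509Certificate[] chain, String authType) {
                // Never throws, so any server chain is accepted.
            }

            @Override
            public X509Certificate[] getAcceptedIssuers() {
                return new X509Certificate[0];
            }
        };
        try {
            SSLContext context = SSLContext.getInstance("TLS");
            // null key managers and null SecureRandom mean "use the defaults".
            context.init(null, new TrustManager[] { trustAll }, null);
            return context;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not build trust-all SSLContext", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Built context for protocol: " + create().getProtocol());
    }
}
```

Keep in mind that trusting every chain does not switch off hostname verification. In HttpClient 4.4+ that is controlled by the HostnameVerifier handed to SSLConnectionSocketFactory (for example NoopHostnameVerifier.INSTANCE as the second constructor argument), so a site with a mismatched hostname can still be rejected unless you relax that as well.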
But... an invalid certificate might not be your main problem. A handshake_failure can happen for a number of reasons, but in my experience it's usually due to an SSL/TLS version mismatch or a cipher suite negotiation failure. It doesn't mean that the SSL certificate is "bad"; it just means the server and the client could not agree on a protocol version or cipher suite. You can see the handshake quite clearly using a tool like Wireshark.
While Wireshark is great for seeing where the handshake fails, it won't help you come up with a solution. Whenever I've debugged handshake_failures in the past, I've found this tool especially useful: https://testssl.sh/
You can point this script at any given site to learn more about what protocols are available on the target and what your client needs to support in order to establish a successful handshake. It will also print information about the certificate.
For example (showing only parts of the testssl.sh output):
./testssl.sh www.google.com
....
Testing protocols (via sockets except TLS 1.2, SPDY+HTTP2)
SSLv2 not offered (OK)
SSLv3 not offered (OK)
TLS 1 offered
TLS 1.1 offered
TLS 1.2 offered (OK)
....
Server Certificate #1
Signature Algorithm SHA256 with RSA
Server key size RSA 2048 bits
Common Name (CN) "www.google.com"
subjectAltName (SAN) "www.google.com"
Issuer "Google Internet Authority G3" ("Google Trust Services" from "US")
Trust (hostname) Ok via SAN and CN (works w/o SNI)
Chain of trust "/etc/*.pem" cannot be found / not readable
Certificate Expiration expires < 60 days (58) (2018-10-30 06:14 --> 2019-01-22 06:14 -0700)
....
Testing all 102 locally available ciphers against the server, ordered by encryption strength
(Your /usr/bin/openssl cannot show DH/ECDH bits)
Hexcode  Cipher Suite Name (OpenSSL)    KeyExch.  Encryption  Bits
------------------------------------------------------------------------
xc030    ECDHE-RSA-AES256-GCM-SHA384    ECDH      AESGCM      256
xc02c    ECDHE-ECDSA-AES256-GCM-SHA384  ECDH      AESGCM      256
xc014    ECDHE-RSA-AES256-SHA           ECDH      AES         256
xc00a    ECDHE-ECDSA-AES256-SHA         ECDH      AES         256
x9d      AES256-GCM-SHA384              RSA       AESGCM      256
x35      AES256-SHA                     RSA       AES         256
xc02f    ECDHE-RSA-AES128-GCM-SHA256    ECDH      AESGCM      128
xc02b    ECDHE-ECDSA-AES128-GCM-SHA256  ECDH      AESGCM      128
xc013    ECDHE-RSA-AES128-SHA           ECDH      AES         128
xc009    ECDHE-ECDSA-AES128-SHA         ECDH      AES         128
x9c      AES128-GCM-SHA256              RSA       AESGCM      128
x2f      AES128-SHA                     RSA       AES         128
x0a      DES-CBC3-SHA                   RSA       3DES        168
So using this output we can see that if your client only supports SSLv3, the handshake will fail because that protocol is not offered by the server. The protocol offering is unlikely to be the problem, but you can double-check your Java client's support by getting the list of enabled protocols. You can provide an overridden implementation of the SSLConnectionSocketFactory from the code snippet above to print the enabled/supported protocols and cipher suites as follows (see SSLSocket):
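import java.io.IOException;
import java.util.Arrays;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocket;
import org.apache.http.conn.ssl.SSLConnectionSocketFactory;

class MySSLConnectionSocketFactory extends SSLConnectionSocketFactory {
    // The superclass has no no-arg constructor, so pass the SSLContext through.
    MySSLConnectionSocketFactory(SSLContext sslContext) {
        super(sslContext);
    }
    // Called for each new socket before the handshake; print what the client can offer.
    @Override
    protected void prepareSocket(SSLSocket socket) throws IOException {
        System.out.println("Supported Ciphers: " + Arrays.toString(socket.getSupportedCipherSuites()));
        System.out.println("Supported Protocols: " + Arrays.toString(socket.getSupportedProtocols()));
        System.out.println("Enabled Ciphers: " + Arrays.toString(socket.getEnabledCipherSuites()));
        System.out.println("Enabled Protocols: " + Arrays.toString(socket.getEnabledProtocols()));
    }
}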
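If you just want a quick look at what your JVM enables by default, without opening a connection, something along these lines should also work. It is only a sketch using plain JSSE: the class name ClientTlsDefaults and its helper methods are my own invention, and it reports the JVM-wide defaults, which may differ from what a customised HttpClient socket factory ends up using.

```java
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLParameters;

// Hypothetical helper: inspects the JVM's default TLS settings.
class ClientTlsDefaults {

    /** Returns the SSLParameters the default SSLContext would apply to new sockets. */
    static SSLParameters defaults() {
        try {
            return SSLContext.getDefault().getDefaultSSLParameters();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("No default SSLContext available", e);
        }
    }

    /** Protocol names enabled by default, using JSSE naming such as "TLSv1.2". */
    static List<String> enabledProtocols() {
        return Arrays.asList(defaults().getProtocols());
    }

    /** True if the given JSSE protocol name is enabled by default. */
    static boolean isProtocolEnabled(String protocol) {
        return enabledProtocols().contains(protocol);
    }

    public static void main(String[] args) {
        System.out.println("Enabled by default: " + enabledProtocols());
        // Compare against the protocol lines of the testssl.sh output for the target site.
        for (String protocol : new String[] { "SSLv3", "TLSv1", "TLSv1.1", "TLSv1.2" }) {
            System.out.println(protocol + " enabled: " + isProtocolEnabled(protocol));
        }
    }
}
```

Comparing this list with the "Testing protocols" section of testssl.sh tells you whether there is any protocol version both sides can agree on.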
I often encounter handshake_failure when there is a cipher suite negotiation failure. To avoid this error, the client's list of supported cipher suites must contain at least one matching cipher suite from the server's list of supported cipher suites.
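As a rough illustration of that matching rule, the sketch below intersects the cipher suites your JVM enables by default with a list for the server. Everything named here (CipherOverlap, common, the two-entry server list) is hypothetical. Note that testssl.sh prints OpenSSL names while Java uses the IANA names, so the server list has to be translated by hand; the two entries shown are my transcription of ECDHE-RSA-AES256-GCM-SHA384 and AES128-SHA.

```java
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.net.ssl.SSLContext;

// Hypothetical helper: finds cipher suites that both sides could negotiate.
class CipherOverlap {

    /** Returns the client suites that also appear in the server list, in client order. */
    static List<String> common(String[] clientSuites, String[] serverSuites) {
        Set<String> server = new HashSet<>(Arrays.asList(serverSuites));
        List<String> shared = new ArrayList<>();
        for (String suite : clientSuites) {
            if (server.contains(suite)) {
                shared.add(suite);
            }
        }
        return shared;
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // Suites this JVM enables by default (IANA / JSSE names).
        String[] client = SSLContext.getDefault().getDefaultSSLParameters().getCipherSuites();

        // Assumption: hand-translated from the OpenSSL names in the testssl.sh output.
        String[] server = {
            "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384", // ECDHE-RSA-AES256-GCM-SHA384
            "TLS_RSA_WITH_AES_128_CBC_SHA"           // AES128-SHA
        };

        List<String> shared = common(client, server);
        if (shared.isEmpty()) {
            System.out.println("No common cipher suite: expect handshake_failure");
        } else {
            System.out.println("Common cipher suites: " + shared);
        }
    }
}
```

An empty result is a strong hint that the failure is about cipher negotiation and has nothing to do with the certificate.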
If the server requires AES256-based cipher suites, you probably need the Java Cryptography Extension (JCE) unlimited strength policy files. These are subject to national export restrictions, so they may not be available to someone outside the US.
More on cipher restrictions, if you're interested: https://crypto.stackexchange.com/questions/20524/why-there-are-limitations-on-using-encryption-with-keys-beyond-certain-length
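You can check whether your JVM is currently limited with a few lines like the following. This is just a sketch; the class and method names (AesKeyLengthCheck, maxAesKeyLength, allowsAes256) are made up for the example. As far as I know, recent JDKs (8u161 and later, and Java 9+) ship with the unlimited policy enabled by default, so on those this should report that AES-256 is allowed.

```java
import java.security.NoSuchAlgorithmException;
import javax.crypto.Cipher;

// Hypothetical helper: reports whether the JVM's crypto policy permits 256-bit AES keys.
class AesKeyLengthCheck {

    /** Maximum AES key length in bits allowed by the installed policy files. */
    static int maxAesKeyLength() {
        try {
            return Cipher.getMaxAllowedKeyLength("AES");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("AES is not available on this JVM", e);
        }
    }

    /** True if AES256-based cipher suites can be used under the current policy. */
    static boolean allowsAes256() {
        return maxAesKeyLength() >= 256;
    }

    public static void main(String[] args) {
        System.out.println("Max allowed AES key length: " + maxAesKeyLength());
        System.out.println(allowsAes256()
                ? "AES-256 suites are permitted by the current policy"
                : "AES-256 suites are blocked; the unlimited strength policy is not in effect");
    }
}
```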