Certificates Explained

Wed 23 August 2017
Technology

Why am I writing this?

First, these are my humble non-security-expert opinions. I can keep my computers safe, and I can make sure my programs are using proper security protocols.

When I work with others and the subject of HTTPS certificates comes up, there is often a level of uncertainty about getting involved. Two great things about website security:

There's a large community of brilliant people behind it, always improving it.
It's easy for us to "stand on their shoulders" and benefit from it.

By the end of this article, you should feel more comfortable with website security. If you read about "POODLE compromises SSL" or "Deprecating TLS 1.0," you will know how it will eventually affect you.

Understanding Certificates and PKI

The first thing to understand about certificates is PKI, or public key infrastructure. Simply put, it's like a pair of "best friend" keychains where the two heart halves make one heart. With PKI, you have two keys, a private key and a public key, and they work as a pair: what one key locks, only the other can unlock. You keep the private key to yourself, and you share the public key with anyone you want (unlike the keychain example, where only one person gets the other half, you share the one public key with everyone). If someone wants to encrypt a message for you to read, they encrypt it with your public key, and you decrypt it with your private key. If you want to sign a document, you use your private key. Anyone can validate the signature using your public key.
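To make the two-key idea concrete, here is a toy sketch using textbook RSA with tiny primes. The numbers and the helper name apply_key are my own illustration, not anything a real certificate uses; real keys are thousands of bits long and real software adds padding on top of this math.

```python
# Toy RSA with tiny primes, for illustration only. Never use numbers
# this small (or unpadded RSA) for anything real.
p, q = 61, 53
n = p * q                      # shared by both keys
phi = (p - 1) * (q - 1)
e = 17                         # public exponent
d = pow(e, -1, phi)            # private exponent (needs Python 3.8+)

public_key = (e, n)            # share this with everyone
private_key = (d, n)           # keep this to yourself


def apply_key(number, key):
    """Raise `number` to the key's exponent, modulo n."""
    exponent, modulus = key
    return pow(number, exponent, modulus)


# Encrypt with the public key, decrypt with the private key.
message = 65
ciphertext = apply_key(message, public_key)
recovered = apply_key(ciphertext, private_key)

# Sign with the private key, verify with the public key.
document_digest = 1234         # stand-in for the hash of a document
signature = apply_key(document_digest, private_key)
verified = apply_key(signature, public_key) == document_digest

print(ciphertext, recovered, verified)
```

Either key undoes what the other one did, which is why the same pair can be used both for encryption and for signatures.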

So thinking of it in terms of web browsing, here is a basic, high-level explanation:

If you can check the signature of a website's certificate, then there's some confidence they are who they say they are. They sign it with their private key, and you can verify it with their public key.
Using their public certificate, you can encrypt a message and send it to their website, starting an encrypted tunnel between the two of you. They can decrypt it with their private key.

Certificate Authorities and Keystores

So how do you trust that web server, and that the certificate is legitimate? Probably because it has been issued (signed) by a Certificate Authority (CA) that your machine already trusts.

A CA takes the time and diligence to validate a certificate request, sign the certificate, and issue it to the requestor. The CAs keep their private keys under tight wraps, and share their public certificates with companies like Microsoft, Apple and Oracle (Java). Those companies maintain a keystore of certificates from the CAs they trust, and they distribute it to all their installs via software updates. If a CA cert is compromised, the companies remove that cert from their keystore and send out an update. (This is one reason why you should not ignore software updates.)

As long as the website you are visiting has a certificate issued by a trusted CA, you can be reasonably confident that your communication is safely encrypted. Your browser will warn you about websites that are not using trusted certificates.
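If you are curious what "a keystore of trusted CAs" looks like from a program's point of view, here is a small sketch using Python's standard ssl module. It only inspects the defaults on your machine; what it prints will differ from system to system.

```python
import ssl

# Build a client context that uses the platform's trusted CA certificates.
context = ssl.create_default_context()

# The default context refuses servers whose certificate cannot be verified
# or whose certificate does not match the requested host name.
print("verification required:", context.verify_mode == ssl.CERT_REQUIRED)
print("host name checked:", context.check_hostname)

# Where OpenSSL looks for CA certificates on this machine.
print(ssl.get_default_verify_paths())

# How many certificates are loaded so far. On some platforms the store is
# loaded lazily, so this count can be lower than the real number of CAs.
print(context.cert_store_stats())
```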

If you visit a site that has a self-signed certificate (usually temporary) or has a certificate from some other CA, and you want to trust that certificate, you can import that certificate into your keystore. This gives you the ability to enhance the trust that the software companies above provide.
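The same idea applies to programs, not just browsers. As a sketch, a Python client can trust one extra certificate on top of the system CAs. The helper name build_context and the file name in the usage note are hypothetical; the ssl calls are from the standard library.

```python
import ssl


def build_context(extra_ca_file=None):
    """Return a client context that trusts the system CAs, plus the
    certificates in one extra PEM file if a path is given."""
    context = ssl.create_default_context()
    if extra_ca_file is not None:
        # Adds to the trusted set; the system CAs stay trusted as well.
        context.load_verify_locations(cafile=extra_ca_file)
    return context
```

Calling build_context("internal-server.pem") (a made-up file name) would give you a context that accepts that self-signed server as well as everything your OS already trusts.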

One final note about certificates! CAs like Let's Encrypt provide certificates if one can prove he is in control of the domain and its web server content. CAs like Verisign often provide certificates if one can prove he represents a registered business at a physical address. The former provides you with secure encryption, while the latter also provides you with validation of the site owner.

Hashing

So, how can you be confident that the signed certificate from a website hasn't been tampered with by someone else? That's where hashing comes in. Hashing is basically taking every byte of a document as a number and adding it to a total. Two important points:

The total has to stay within a limited size; if the math results in a value bigger than can fit, then it will be reduced to something that does fit. So for example, if a hashing algorithm was going to provide a 4 digit number as a result, the number would be between 0000 and 9999. If somehow during the adding the total went over 9999, there would be some logic to get it back within the acceptable range.

The math has to scramble the result thoroughly (often called the "avalanche effect"). If you simply add one more character to the document, or offset one character's value by one (for example, change b to c), the resulting hashes must be completely different.

Those two points make it very difficult for someone to produce a different document with the same hash (known as a "collision"), which is what a fake signature would need.
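Here is a rough sketch of those two points. The toy_hash function is my own made-up "add up the bytes" hash with a 4 digit result; it gets the first point right and fails the second, which is exactly why real algorithms like SHA256 do much more than addition.

```python
import hashlib


def toy_hash(data: bytes) -> str:
    """Add up every byte and wrap the total into the range 0000-9999."""
    return f"{sum(data) % 10000:04d}"


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Point 1: the result always has the same size, whatever the input size.
print(toy_hash(b"a"), toy_hash(b"a" * 1_000_000))

# Point 2 is where plain addition fails: reordering the letters gives the
# same total (a collision), and changing b to c only moves the total by one.
print(toy_hash(b"ab") == toy_hash(b"ba"))
print(toy_hash(b"b"), toy_hash(b"c"))

# A real hash function gives unrelated-looking results for the same inputs.
print(sha256_hex(b"ab") == sha256_hex(b"ba"))
print(sha256_hex(b"b"))
print(sha256_hex(b"c"))
```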

For illustration, here are a few md5 hashes.

60b725f10c9c85c70d97880dfe8191b3 (this is for a document containing just the letter a)
3b5d5c3712955042212316173ccf37be (this is for a document containing just the letter b)
2bc412f4f5ce0e545ce77ec0ef2cda14 (this is for a 120 megabyte document on my machine)
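You can reproduce hashes like these with Python's hashlib. One assumption on my part: a text file holding "just the letter a" usually ends with a newline character, so this sketch tries each letter both with and without one and reports which form matches the value listed above.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


# Values listed in the article.
listed_a = "60b725f10c9c85c70d97880dfe8191b3"
listed_b = "3b5d5c3712955042212316173ccf37be"

# Assumption: each document may have held the letter plus a trailing newline.
for letter, listed in ((b"a", listed_a), (b"b", listed_b)):
    with_newline = md5_hex(letter + b"\n")
    without_newline = md5_hex(letter)
    print(letter, "with newline matches:", with_newline == listed)
    print(letter, "without newline matches:", without_newline == listed)
```

Note how a single invisible newline is enough to produce a completely different hash.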

Creating collisions with the MD5 algorithm was once considered very difficult, but it is now practical. That's why there are others like SHA1 and SHA256, where collisions are much, much harder to produce. There are many more, too.

One final point about hashing that you may have already guessed: it is a one-way process. You cannot "un-hash" something to get the original content.

Ciphers

While hashing is good for signing and validating a document, ciphers handle the encrypting and decrypting of complete documents. You are probably already familiar with ciphers. As kids we had the letter swapping cipher, substituting A with B, B with C and so on. In WWII the Germans used the Enigma machine. Today there are many, many different ciphers and there are new ones in the works. Institutions like NIST hold competitions and decide which "winners" can be used going forward.
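The childhood letter swapping cipher fits in a few lines of Python. This is only a sketch of that toy cipher (the function name shift_letters is mine), and it has nothing in common with the ciphers used on the web beyond the basic idea of a reversible scramble.

```python
def shift_letters(text: str, shift: int) -> str:
    """Replace each letter with the one `shift` places later in the
    alphabet, wrapping from Z back to A. Other characters are kept."""
    result = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            result.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            result.append(ch)
    return "".join(result)


secret = shift_letters("HELLO", 1)     # A becomes B, B becomes C, ...
plain = shift_letters(secret, -1)      # shifting back decrypts
print(secret, plain)
```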

Protocols

Protocols are formal rules that, as far as we are concerned here, document how a web browser and server establish encrypted communication.

Using the cipher example of swapping letters, let's make up a protocol. Alice and Bob are in a class together, with Eve sitting between them. Alice and Bob write notes to each other and pass them via Eve. They notice Eve is peeking at the notes, so they decide to use a cipher: in the note, replace A with B, B with C, etc. Eve can no longer read the notes!

That doesn't last long, as Bob carelessly left a decrypted note in the trash and Eve figured out the substitution cipher. Eve is right back into it, reading their messages. Alice and Bob come up with a new protocol. First, Alice will "cough" when she wants to send a note. Bob will send a fake message to Alice, and the first letter is the new substitute for A. For example, if the fake message starts with R, then in the real notes, A will be replaced with R, B will be replaced with S and so on. Alice then sends her real message to Bob. Eve is back in the dark!

So the first messaging protocol in this example was simply using a cipher. The next protocol involved a signal (cough), a seed (fake message) and a cipher.
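Alice and Bob's second protocol can be sketched like this. The messages and the helper names are invented for illustration; the point is only that the cipher stays the same while the seed changes how it is applied.

```python
def shift_letters(text: str, shift: int) -> str:
    """Shift each letter `shift` places along the alphabet, wrapping around."""
    result = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            result.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            result.append(ch)
    return "".join(result)


def shift_from_seed(fake_message: str) -> int:
    """The first letter of the fake message is the new substitute for A."""
    return ord(fake_message[0].upper()) - ord("A")


# Signal: Alice coughs. Seed: Bob sends a fake message starting with R.
fake_message = "Running late today"
shift = shift_from_seed(fake_message)

# Cipher: Alice encrypts her real note, and Bob reverses the shift.
note = "MEET AT NOON"
secret = shift_letters(note, shift)
readable = shift_letters(secret, -shift)
print(secret, "->", readable)
```

Eve still sees every message, including the seed. The trick only works as long as she does not know the rules, which is one reason real protocols are more sophisticated.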

In the world of web browsing, Alice and Bob are the web client/browser and the web server. Eve is the man in the middle (MITM), and is everywhere between! Some examples of MITM are internet service providers, company computer networks and perhaps even malware running on your computer. So of course, the protocols are a little more sophisticated than the old cough trick that Alice and Bob used so well.

Protocols have been evolving for as long as we have needed private browsing sessions. SSL versions 1, 2 and 3 are now deprecated. So are TLS versions 1.0 and 1.1. As I write this, TLS 1.2 is the currently preferred protocol, and TLS 1.3 is on the way.
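In a client program, picking a protocol version is often a single setting. As a sketch with Python's standard ssl module (assuming Python 3.7 or later), this is how a client could refuse anything older than TLS 1.2:

```python
import ssl

context = ssl.create_default_context()

# Refuse SSL 3, TLS 1.0 and TLS 1.1; accept TLS 1.2 and anything newer.
context.minimum_version = ssl.TLSVersion.TLSv1_2
print(context.minimum_version)

# Whether this Python/OpenSSL build can speak TLS 1.3 at all.
print(ssl.HAS_TLSv1_3)
```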

Pulling it all together

At a high level, when you start a private web browsing session:
The web client/browser requests a URL.
The server sends you its signed certificate, which includes its public key.
The web client verifies the certificate against the CA certificates in its keystore.
The web client negotiates with the server on what protocols, hashes and ciphers can be used in their conversation.
The web client sends an encrypted message to the server.

That's all there is to it.
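If you want to watch those steps happen, here is a minimal sketch using Python's standard socket and ssl modules. The function name describe_tls_session is mine, and the host you pass in is up to you. It needs network access, and it raises an error if the certificate cannot be verified.

```python
import socket
import ssl


def describe_tls_session(host, port=443, timeout=10):
    """Connect to host, complete the TLS handshake and report what was agreed."""
    context = ssl.create_default_context()      # trusted CAs from the OS
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # The handshake happens here: protocol and cipher negotiation,
        # certificate delivery and verification against the trusted CAs.
        with context.wrap_socket(sock, server_hostname=host) as tls:
            certificate = tls.getpeercert()
            return {
                "protocol": tls.version(),
                "cipher": tls.cipher()[0],
                "subject": certificate.get("subject"),
                "issuer": certificate.get("issuer"),
                "expires": certificate.get("notAfter"),
            }
```

Calling describe_tls_session("example.org") returns a small dictionary with the negotiated protocol and cipher, who the certificate was issued to, which CA issued it, and when it expires.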

Wrapping up

First, seeing that for security you depend on the CAs, and you depend on your OS trusting the CAs, keep your OS up to date! Of course, your OS will keep the encryption components up to date, too.

Here are some examples of "what does it mean?"

POODLE attack

At a time when SSL 1 and 2 were already deprecated, some Google employees found that you could trick a web client and server that both supported newer protocols into falling back to SSL 3, which had a weakness an attacker could exploit. Web servers had to be fixed so they would no longer do this. Web clients had to be updated so they would not allow this.

SHA1 collision

So MD5 and SHA1 have already had demonstrations of collisions, meaning someone can modify the document and have the hash remain the same. The more complicated the algorithm, the harder it is to do. The SHA1 collision was very hard to do, and took a lot of resources (cloud computing, lots of $$$). In theory no one is going to do this. Today. For web browsing and e-commerce, it's best to move on to using SHA256 and other algorithms. If you are downloading a program from apache.org, and it is verified with an MD5 sum, meh. Who's going to break into apache.org's site and spend lots of money on finding a collision, just so they can put their malware into a software package? And keep doing that, seeing that their downloads are always being updated?
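Checking a download against a published checksum is easy to do yourself. Here is a sketch that uses SHA256 rather than MD5; the function names are my own, and the file path and published digest are whatever the project you download from gives you.

```python
import hashlib
import hmac


def file_sha256(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so that large downloads do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, published_hex):
    """Compare a file's SHA-256 with the digest published by the project."""
    actual = file_sha256(path)
    return hmac.compare_digest(actual, published_hex.strip().lower())
```

If matches_published_digest returns False, the file is not the one the publisher hashed, whether because of a broken download or something worse.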

Self-signed Sites

When you are working in your company and you come across a self-signed site, it's probably OK. If you fire up a web server with a self-signed cert, you can share that cert with people in your organization. Tell them they can trust the cert. They can import it into their keystore, or do it via their web browser when visiting the web server. No CA needed.

If you want a properly signed cert, you can submit a certificate signing request (CSR), which contains your public key, to your company's CA. All the other machines in the company already trust that CA, so once you install the signed cert from the CA, the other machines will implicitly trust that certificate.

Protocol Deprecation

"After x, we will no longer support TLS 1.0, switch to using TLS 1.2 now." That's usually a message from the web server folks, giving you fair warning to update your web clients. Update your OS, update Java, .NET or whatever your HTTP client program uses.

"What if it's the other way around? After my Java update, my client no longer connects to our old SSL 3 server, and we need to keep it as SSL 3." There are ways to configure Java's security files so it will allow the use of older protocols, ciphers, etc. It can be done. A better solution: the server's owners should probably make the SSL 3 web server reachable only through a newer web server, which is set up as a reverse proxy. If you can't control the web server and you must modify your client, then you have no choice; be careful with what you send over the wire!