Cryptographic Hashing: An Overview
Feb 10, 2020 | By Andrew Merenbach
I'm a security engineer here at FloQast. Our security team crosscuts all organizational functions to manage the risk facing the company. Today I wanted to talk about hashing, which can serve several important cryptographic purposes in a SaaS application.
Cryptographic hashing involves processing data mathematically to derive an output that we can use as a placeholder for the original. Given a particular input, a hash (or digest) of that input is generally:
- A one-way, many-to-one mapping of input to output;
- Based on a mathematical formula;
- Subject to significant output changes for even slight input changes;
- A long number of a fixed length;
- Able to identify the input for practical purposes without revealing it.
(N.b.: Hashing functions aren't only for cryptography. The above won't always hold 100% true for other use cases.)
Cryptographic hashing is not encryption because it's not supposed to be reversible. Sure, technically you could encrypt data into something that looks like a digest and throw away the key, but it could still be decoded if someone somehow got a copy of the key. Hashing instead is more like a fingerprint. A fingerprint doesn't give us the name, age, height, weight, or favorite color of the person with whom it's associated. Likewise, a message digest doesn't tell us the length of the input, whether the input was an image or text or something else, whether the word FloQast appears in the input, and so on.
Let's hash the message HELLO, WORLD! in the macOS terminal with a 256-bit Secure Hash Algorithm:
% echo -n 'HELLO, WORLD!' | shasum -a 256
b8d28d44584a6440028c72b4c7e774b11331e8f6f3cbae8ed482aef9c27fef74  -
% echo -n 'HELLO, WORLD!' | shasum -a 256
b8d28d44584a6440028c72b4c7e774b11331e8f6f3cbae8ed482aef9c27fef74  -
% echo -n 'HELLO, WORLD!' | shasum -a 256
b8d28d44584a6440028c72b4c7e774b11331e8f6f3cbae8ed482aef9c27fef74  -
No matter how many times we create a SHA-256 digest of the message, the output will remain the same: it is deterministic. This applies across multiple computers and processor architectures. Because the output length is fixed (always 256 bits) while inputs can be far longer, there are more possible inputs than possible outputs, so some different inputs must produce the same output. This is known as a collision.
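To see determinism and the fixed output length outside the terminal, here is a minimal Python sketch using the standard library's hashlib module. The helper name sha256_hex is just a label chosen for this example, not something from a particular library.

```python
import hashlib


def sha256_hex(message: bytes) -> str:
    """Return the SHA-256 digest of message as a hexadecimal string."""
    return hashlib.sha256(message).hexdigest()


# Determinism: hashing the same bytes twice gives the same digest.
first = sha256_hex(b"HELLO, WORLD!")
second = sha256_hex(b"HELLO, WORLD!")
print(first == second)

# Fixed length: 64 hex characters (256 bits), whatever the input size.
lengths = []
for message in (b"", b"HELLO, WORLD!", b"x" * 1_000_000):
    lengths.append(len(sha256_hex(message)))
    print(len(message), "input bytes ->", lengths[-1], "hex characters")
```

An empty input, a short string, and a megabyte of data all come out as 64 hexadecimal characters.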
Collisions can result in all sorts of security mayhem, from impersonation and forgeries to broken source control systems. A good cryptographic hashing algorithm will not only minimize the likelihood of collisions, but also make them hard to predict. If we modify our above example to make one of the letters lower-case, we can see the output has totally changed:
% echo -n 'HeLLO, WORLD!' | shasum -a 256
6ef325dd396b1ce20cf8cc3815c9ceba0a9daaeb835d24f967d363ff65b0a78c  -
While a collision may exist for either of these outputs and this particular digest algorithm, we're not likely to find it by simply substituting out letters.
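One way to put a number on "totally changed" is to count how many of the 256 output bits differ between the two digests. The sketch below does that in Python. The function name differing_bits is made up for illustration. For a well-behaved hash we would expect roughly half of the bits to flip, though the exact count depends on the inputs.

```python
import hashlib


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bits that differ between the SHA-256 digests of a and b."""
    digest_a = int.from_bytes(hashlib.sha256(a).digest(), "big")
    digest_b = int.from_bytes(hashlib.sha256(b).digest(), "big")
    # XOR leaves a 1 wherever the two digests disagree.
    return bin(digest_a ^ digest_b).count("1")


changed = differing_bits(b"HELLO, WORLD!", b"HeLLO, WORLD!")
print(f"{changed} of 256 bits differ")
```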
Because we can't effectively work backward from the output to the input, we can save a copy of the output and use it later. If we do this for a file instead of a string, we can check to see if the contents have changed at all:
% echo helloworld > message.txt
% shasum -a 256 message.txt
8cd07f3a5ff98f2a78cfc366c13fb123eb8d29c1ca37c79df190425d5b9e424d  message.txt
% echo Helloworld > message.txt
% shasum -a 256 message.txt
733b8d6bf076298654e1aa28d26a47f43f1cc0958476e19fc09444da2e7884de  message.txt
We can take advantage of this method to see if the contents of a file have changed, such as a copy of a contract or an important piece of source code. Because the SHA family of hash functions (along with some others) is designed to be fast, it is ideal for validating the integrity of files to see if they've been tampered with or possibly corrupted accidentally, such as over an unreliable network connection.
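The same check can be scripted. Here is a rough Python sketch, assuming we recorded the digest earlier and kept it somewhere trustworthy. The names file_sha256 and is_unchanged are hypothetical helpers written for this example. The file is read in chunks so that large files don't have to fit in memory.

```python
import hashlib
import hmac
import os
import tempfile


def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in fixed-size chunks and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path: str, expected_hex: str) -> bool:
    """Compare the file's current digest with a previously recorded one."""
    return hmac.compare_digest(file_sha256(path), expected_hex)


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "message.txt")
    with open(path, "wb") as handle:
        handle.write(b"helloworld\n")
    recorded = file_sha256(path)          # save this for later
    before_edit = is_unchanged(path, recorded)

    with open(path, "wb") as handle:
        handle.write(b"Helloworld\n")     # one letter changed
    after_edit = is_unchanged(path, recorded)

print(before_edit)  # True
print(after_edit)   # False
```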
You might even think to use SHA-256 for password hashing in a database. The problem is that because it's so fast, it won't do much to slow down an attacker trying to recover plaintext passwords. Anyone with a copy of the database could precompute the hash values of common passwords, making it possible to suss out every row with a password of hello123 in a matter of microseconds.
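To make the weakness concrete, here is a small Python sketch of why unsalted fast hashes are a poor fit for passwords. The user names, the password list, and the leaked_rows table are all invented for this example. The point is that the lookup table is built once and then matches every row that shares a password.

```python
import hashlib


def unsalted_hash(password: str) -> str:
    """Plain SHA-256 of the password: fast and unsalted (not recommended)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Hypothetical database rows: user name -> stored unsalted hash.
leaked_rows = {
    "alice": unsalted_hash("hello123"),
    "bob": unsalted_hash("correct horse battery staple"),
    "carol": unsalted_hash("hello123"),
}

# Precomputed once, reusable against any database hashed the same way.
common_passwords = ["password", "hello123", "letmein"]
lookup = {unsalted_hash(p): p for p in common_passwords}

# Every row with a common password falls to a single dictionary lookup.
cracked = {user: lookup[h] for user, h in leaked_rows.items() if h in lookup}
print(cracked)  # {'alice': 'hello123', 'carol': 'hello123'}
```

Note that alice and carol have identical stored hashes, so finding one finds the other.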
To help combat this, a slower algorithm or key derivation function designed for password hashing can stymie these sorts of attacks without inconveniencing legitimate users. In their simplest form, these processes simply make hashing take orders of magnitude longer, which makes precomputing much more expensive in terms of time.
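As a rough illustration of the cost difference, the Python sketch below times one SHA-256 call against PBKDF2 from the standard library's hashlib. PBKDF2 is just one example of a key derivation function, picked here because it ships with Python. The iteration count of 200,000 is an assumption for the demo, not a recommendation. PBKDF2 also takes a salt argument, so a fixed placeholder is passed purely to keep the comparison repeatable.

```python
import hashlib
import time


def time_call(func) -> float:
    """Return how many seconds a single call to func takes."""
    start = time.perf_counter()
    func()
    return time.perf_counter() - start


password = b"hello123"
placeholder_salt = b"demo-only-salt"  # fixed for the demo; not how real salts work
iterations = 200_000                  # assumed work factor, tune for real systems

fast = time_call(lambda: hashlib.sha256(password).digest())
slow = time_call(
    lambda: hashlib.pbkdf2_hmac("sha256", password, placeholder_salt, iterations)
)

print(f"SHA-256, one pass:          {fast:.6f} s")
print(f"PBKDF2, {iterations} iterations: {slow:.6f} s")
```

A legitimate user pays the slower cost once per login, while an attacker pays it for every guess.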
What if we could also make it possible for multiple rows to have the same password without their hashes being the same? Enter salting.
A salt is a piece of randomly generated data that is prepended or appended to the password as part of the cryptographic hashing process, then stored alongside the resulting hash. It's not secret, not at all, and its primary purpose is to make precomputing hashes impractical. Combine salting with a slower hashing function, and a hacker who wants to break an entire database of passwords is going to have to go one row at a time, potentially very slowly. Let's suppose that your salt is two bytes long:
% echo -n '2sHELLO, WORLD!' | shasum -a 256
da1c8986242f128d2dea484483446b90dfa3a35b613121ca1de76eff68380907  -
% echo -n 'xfHELLO, WORLD!' | shasum -a 256
563f022aee539a070ebec1b1afd4c4e6aaab141ecff2c1886c4982cd084f9bc8  -
We prepend the two bytes to our original string HELLO, WORLD! and take advantage of how drastically the output changes. The only catch is that when we store the hash, we need to be sure to store the salt as well!
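Putting the salt and the slower function together, a minimal Python sketch of storing and checking a password might look like the following. The function names, the 16-byte salt length, and the iteration count are all assumptions made for this example. A real system would more likely lean on a maintained password-hashing library than hand-rolled helpers.

```python
import hashlib
import hmac
import os
from typing import Tuple

ITERATIONS = 200_000  # assumed work factor for this sketch


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, derived_hash); both must be stored with the user row."""
    salt = os.urandom(16)  # fresh random salt per password
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, derived


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the hash with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return hmac.compare_digest(candidate, expected)


# Two users with the same password end up with different stored hashes.
salt_a, hash_a = hash_password("hello123")
salt_b, hash_b = hash_password("hello123")
print(hash_a != hash_b)

print(verify_password("hello123", salt_a, hash_a))  # True
print(verify_password("hello124", salt_a, hash_a))  # False
```

Because each row has its own salt, a single precomputed table no longer works against the whole database.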
It's now 2020. Maybe there's an even better option: Single Sign-On, or SSO. This allows a service to not store any password data at all, instead relying on some sort of outside authority such as Google, GitHub, or a directory service to deal with password security. Then when you try to sign into the service, you can choose the option to sign in with an external account. For instance, many GitHub integrations give you the option of signing in with GitHub: when you try to sign in, you will be taken to the GitHub login page. Once your login is complete, you'll return to the integration page. The integration never gets a copy of your password.
Although we provide both password-based and SSO authentication for our customers, FloQast strongly encourages everyone to make use of their own organization's directory system. This makes onboarding faster, encourages self-service, and reduces the number of passwords that people have to create. Better security for everyone!