Hashing, MACs and password storage
Hash properties, which algorithms are dead, HMAC, and how passwords should actually be stored.
What a hash is
A cryptographic hash function takes input of any size and produces a fixed-size digest (SHA-256: 256 bits). It's one-way, deterministic and fast. Three properties define a secure hash, and exams love asking for all three with definitions:
- Preimage resistance: given a digest h, you can't find any input m with hash(m) = h. (Can't reverse it.)
- Second-preimage resistance: given an input m₁, you can't find a different m₂ with the same hash. (Can't substitute a specific document.)
- Collision resistance: you can't find any two distinct inputs with the same hash. (The strongest requirement of the three, because the attacker is free to choose both inputs; this is what falls first.)
The avalanche effect (flipping one input bit changes about half the output bits) is why hashes work for integrity checking: any tampering produces a wildly different digest.
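The avalanche effect is easy to observe directly. The sketch below, using only Python's standard `hashlib`, flips a single input bit and counts how many of the 256 digest bits change. The message text and the `bit_difference` helper are illustrative choices, not part of any standard.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

original = b"transfer 100 to alice"
# Flip the lowest bit of the first byte: 't' (0x74) becomes 'u' (0x75).
tampered = bytes([original[0] ^ 0x01]) + original[1:]

d1 = hashlib.sha256(original).digest()
d2 = hashlib.sha256(tampered).digest()

changed = bit_difference(d1, d2)
print(f"input bits changed:  {bit_difference(original, tampered)}")
print(f"digest bits changed: {changed} of {len(d1) * 8}")
```

The count should land somewhere near 128, which is what "about half the output bits" means for a 256-bit digest.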
Collisions must exist (infinite inputs, finite outputs); security means they're infeasible to find. The birthday paradox is why collision resistance is weaker than it looks: for an n-bit hash, collisions arrive after about 2^(n/2) attempts, not 2^n. So SHA-256 gives 128-bit collision resistance.
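The 2^(n/2) birthday bound can be checked on a toy scale. The sketch below truncates SHA-256 to 24 bits (an assumption made purely so the search finishes quickly) and hashes successive counter values until two of them collide. The helper names `truncated_hash` and `find_collision` are hypothetical.

```python
import hashlib

def truncated_hash(data: bytes, bits: int) -> int:
    """Return the top `bits` bits of SHA-256(data) as an integer (a toy n-bit hash)."""
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return digest >> (256 - bits)

def find_collision(bits: int):
    """Hash distinct counter strings until two share the same truncated digest."""
    seen = {}  # truncated digest -> first message that produced it
    counter = 0
    while True:
        msg = str(counter).encode()
        h = truncated_hash(msg, bits)
        if h in seen:
            return seen[h], msg, counter + 1
        seen[h] = msg
        counter += 1

BITS = 24
m1, m2, attempts = find_collision(BITS)
print(f"{m1!r} and {m2!r} collide after {attempts} hashes")
print(f"2^(n/2) = {2 ** (BITS // 2)}, 2^n = {2 ** BITS}")
```

The number of attempts should be on the order of 2^12 (thousands), nowhere near the 2^24 (millions) a preimage search over the same toy hash would need.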
Algorithm status
| Algorithm | Output | Status |
|---|---|---|
| MD5 | 128-bit | Broken: collisions on demand. Never for security. |
| SHA-1 | 160-bit | Broken: practical collision demonstrated (Google's "SHAttered", 2017). Retired. |
| SHA-256 / SHA-512 (SHA-2) | 256/512-bit | Current standard, no practical attacks. |
| SHA-3 (Keccak) | variable | Secure; a structurally different backup to SHA-2, not its replacement. |
MD5 and SHA-1 remain fine for non-security jobs like detecting accidental corruption, and you'll still see MD5 in forensics for legacy compatibility, but anywhere an adversary could exploit a collision (signatures, certificates), they're disqualified.
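To see the output sizes from the table concretely, the following sketch computes each digest with Python's standard `hashlib` and reports its length. It assumes Python 3.9 or later for the `usedforsecurity=False` flag, which marks the MD5 and SHA-1 calls as non-security use; the sample message is arbitrary.

```python
import hashlib

message = b"hello"
sizes = {}
for name in ("md5", "sha1", "sha256", "sha512", "sha3_256"):
    # usedforsecurity=False signals that this call is not a security control.
    h = hashlib.new(name, message, usedforsecurity=False)
    sizes[name] = h.digest_size * 8
    print(f"{name:9} {sizes[name]:4d} bits  {h.hexdigest()[:16]}...")
```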
Hashes alone don't authenticate
A hash proves integrity only if the attacker can't replace the hash too. Shipping a file with its SHA-256 on the same compromised server proves nothing. To bind integrity to a key, use a MAC (message authentication code): a tag computed from message + shared secret key.
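HMAC is the standard construction; HMAC-SHA256 wraps the hash with the key in a specific double-hash pattern that defeats length extension attacks. Length extension is a quirk of the Merkle-Damgård structure shared by MD5, SHA-1 and SHA-2: given H(secret‖message) and the length of the secret, an attacker can compute a valid hash for the message with extra data appended, without ever learning the secret. HMAC exists partly because this naive keyed hashing is broken.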
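A minimal sketch of computing and checking an HMAC-SHA256 tag with Python's standard `hmac` module follows. The key generation, the message, and the helper names `make_tag` and `verify_tag` are illustrative assumptions; in practice the key would be provisioned to both parties ahead of time.

```python
import hashlib
import hmac
import secrets

def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute the HMAC-SHA256 tag of `message` under `key`."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time to avoid timing leaks."""
    return hmac.compare_digest(make_tag(key, message), tag)

key = secrets.token_bytes(32)  # shared secret known to sender and receiver
message = b"amount=100&to=alice"
tag = make_tag(key, message)

print(verify_tag(key, message, tag))                  # genuine message
print(verify_tag(key, b"amount=900&to=mallory", tag)) # tampered message
```

Note the use of `hmac.compare_digest` rather than `==`: a comparison that exits early on the first mismatched byte can leak how much of a forged tag was correct.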
MAC vs signature: both prove integrity and authenticity, but a MAC's key is shared, meaning either party could have produced it, so there is no non-repudiation. Signatures (private key) give non-repudiation. This contrast is a classic short-answer question.
Password storage
Passwords are the special case where fast hashing is a vulnerability. The threat: a stolen database of hashes attacked offline at billions of guesses per second on GPUs.
What "doing it right" means, layer by layer:
- Never store passwords in plaintext, and never store them encrypted either: encryption is reversible and the key lives somewhere.
- Salt: a unique random value per user, stored beside the hash. Salts kill rainbow tables (precomputed hash-to-password lookups) and ensure identical passwords hash differently. Salts are not secret.
- Use a deliberately slow function, ideally a memory-hard one, not bare SHA-256. The accepted options: Argon2id (Argon2 won the 2015 Password Hashing Competition; preferred), scrypt, bcrypt, or PBKDF2 (the weakest acceptable, but FIPS-friendly). Argon2id and scrypt are memory-hard, which specifically hurts GPU/ASIC crackers; bcrypt and PBKDF2 are slow but not memory-hard.
- A pepper: an additional secret kept outside the database (in an HSM or app config) and mixed into every hash. An attacker who steals only the database can't verify guesses offline without it. Optional but increasingly common.
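The layers above can be sketched with Python's standard library alone. This example uses `hashlib.scrypt` only because it ships with Python (Argon2id needs a third-party package). The cost parameters are illustrative assumptions rather than tuned recommendations. Applying the pepper by HMAC-ing the password before the slow hash is one common approach, not the only one. The function names are hypothetical.

```python
import hashlib
import hmac
import secrets

# Illustrative scrypt cost parameters (assumed; tune for your own hardware).
SCRYPT_N, SCRYPT_R, SCRYPT_P = 2 ** 14, 8, 1

def _slow_hash(password: str, salt: bytes, pepper: bytes) -> bytes:
    """Mix in the pepper with HMAC, then run the memory-hard scrypt KDF."""
    peppered = hmac.new(pepper, password.encode("utf-8"), hashlib.sha256).digest()
    return hashlib.scrypt(peppered, salt=salt, n=SCRYPT_N, r=SCRYPT_R,
                          p=SCRYPT_P, dklen=32)

def hash_password(password: str, pepper: bytes):
    """Return (salt, digest); both are stored in the database, the pepper is not."""
    salt = secrets.token_bytes(16)  # unique random salt per user, not secret
    return salt, _slow_hash(password, salt, pepper)

def verify_password(password: str, salt: bytes, expected: bytes, pepper: bytes) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    return hmac.compare_digest(_slow_hash(password, salt, pepper), expected)

# Stand-in for a secret loaded from outside the database (HSM, app config).
pepper = secrets.token_bytes(32)
salt, stored = hash_password("correct horse battery staple", pepper)
print(verify_password("correct horse battery staple", salt, stored, pepper))
print(verify_password("wrong guess", salt, stored, pepper))
```

Hashing the same password twice yields different digests because each call draws a fresh salt, which is exactly what defeats precomputed tables.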
Current NIST guidance (SP 800-63-4, 2025) is worth quoting because it reversed older folklore: a minimum length of 15 characters for single-factor use (8 with MFA), a maximum length of at least 64 characters so long passphrases fit, no composition rules (no "must contain a symbol"), no periodic expiry (change only on evidence of compromise), screening of candidate passwords against breached-password blocklists, and throttling of online guessing. Length and uniqueness beat complexity theatre.
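As a rough illustration of how those rules translate into an acceptance check, here is a minimal sketch. The `check_password` function and the tiny `BLOCKLIST` set are hypothetical stand-ins; a real deployment would screen against a large breached-password corpus and enforce throttling elsewhere.

```python
MIN_LENGTH_SINGLE_FACTOR = 15
MIN_LENGTH_WITH_MFA = 8

# Hypothetical stand-in for a real breached-password blocklist.
BLOCKLIST = {"password123456789", "qwertyuiopasdfgh", "correcthorsebatterystaple"}

def check_password(candidate: str, mfa: bool = False) -> list:
    """Return the reasons to reject `candidate`; an empty list means accept."""
    problems = []
    minimum = MIN_LENGTH_WITH_MFA if mfa else MIN_LENGTH_SINGLE_FACTOR
    if len(candidate) < minimum:
        problems.append(f"shorter than {minimum} characters")
    if candidate.lower() in BLOCKLIST:
        problems.append("appears in the breached-password blocklist")
    # Deliberately absent: composition rules and any upper length cap below 64.
    return problems

print(check_password("Tr0ub4dor&3"))
print(check_password("a long lowercase passphrase"))
print(check_password("correcthorsebatterystaple"))
```

The notable design choice is what the function does not do: it never inspects character classes, so an all-lowercase passphrase passes on length alone.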
Other places hashes appear
- File/evidence integrity: forensic images are hashed at acquisition so any later change is provable (chain of custody).
- Deduplication and identification: malware samples referenced by SHA-256.
- Commit IDs: Git names every commit, tree and blob by the hash of its content, so the history forms a hash tree (now SHA-256-capable, historically SHA-1; the SHAttered attack is why it's migrating).
- Proof of work: Bitcoin mining is brute-forcing partial SHA-256 preimages: miners vary a nonce until the block's double SHA-256 hash falls below a target value.
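The proof-of-work idea in the last bullet can be sketched as a toy miner. This is a simplification made for illustration, not Bitcoin's actual block format: the header bytes are arbitrary, the nonce encoding is an assumption, and the difficulty is set to 16 leading zero bits so the search finishes in a moment.

```python
import hashlib

def double_sha256(data: bytes) -> bytes:
    """Apply SHA-256 twice, as Bitcoin does for block hashes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def mine(header: bytes, difficulty_bits: int):
    """Try successive nonces until the hash, read as an integer, is below the target."""
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while True:
        digest = double_sha256(header + nonce.to_bytes(8, "little"))
        if int.from_bytes(digest, "big") < target:
            return nonce, digest
        nonce += 1

HEADER = b"toy block header"
DIFFICULTY_BITS = 16
nonce, digest = mine(HEADER, DIFFICULTY_BITS)
print(f"nonce {nonce} gives {digest.hex()}")
```

Finding the nonce takes tens of thousands of hashes on average at this difficulty, while checking it takes one call to `double_sha256`; that asymmetry is the whole point of proof of work.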
Quick recall
- Three properties: preimage, second-preimage, collision resistance. Birthday paradox halves the bits for collisions.
- MD5 and SHA-1 are broken for security; SHA-256 is the default; SHA-3 is the structural alternative.
- HMAC = keyed hash done right; MAC gives no non-repudiation (shared key), signatures do.
- Passwords: salt per user + slow, ideally memory-hard password hash (Argon2id > scrypt/bcrypt > PBKDF2), optional pepper.
- 800-63-4: ≥15 chars single-factor, no composition rules, no forced expiry, blocklist + throttling.