cyber revision

Hashing, MACs and password storage

Hash properties, which algorithms are dead, HMAC, and how passwords should actually be stored.

What a hash is

A cryptographic hash function takes input of any size and produces a fixed-size digest (SHA-256: 256 bits). It's one-way, deterministic and fast. Three properties define a secure hash, and exams love asking for all three with definitions:

  1. Preimage resistance: given a digest h, you can't find any input m with hash(m) = h. (Can't reverse it.)
  2. Second-preimage resistance: given an input m₁, you can't find a different m₂ with the same hash. (Can't substitute a specific document.)
  3. Collision resistance: you can't find any two distinct inputs with the same hash. (The strongest of the three requirements, because the attacker gets to choose both inputs; this is what falls first.)

The avalanche effect (flipping one input bit changes about half the output bits) is why hashes work for integrity checking: any tampering produces a wildly different digest.
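
A minimal sketch of the fixed-size, deterministic and avalanche behaviour described above, using Python's standard hashlib. The message and the helper names (sha256_bits, differing_bits) are made up for illustration.

```python
import hashlib

def sha256_bits(data: bytes) -> int:
    """Return the SHA-256 digest of data as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the output bits that differ between the digests of a and b."""
    return bin(sha256_bits(a) ^ sha256_bits(b)).count("1")

original = b"transfer 100 to alice"
# Flip the lowest bit of the first byte: 't' (0x74) becomes 'u' (0x75).
tampered = bytes([original[0] ^ 0x01]) + original[1:]

# Digest size in bits, regardless of input length.
print(len(hashlib.sha256(original).digest()) * 8)
# Deterministic: hashing the same input twice gives the same digest.
print(hashlib.sha256(original).digest() == hashlib.sha256(original).digest())
# Avalanche: one flipped input bit changes roughly half of the 256 output bits.
print(differing_bits(original, tampered))
```

The last number varies with the input but should land somewhere near 128, which is why even a one-bit change is obvious when digests are compared.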

Collisions must exist (infinite inputs, finite outputs); security means they're infeasible to find. The birthday paradox is why collision resistance is weaker than it looks: for an n-bit hash, collisions arrive after about 2^(n/2) attempts, not 2^n. So SHA-256 gives 128-bit collision resistance.
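
The birthday bound can be seen on a toy scale. The sketch below truncates SHA-256 to 24 bits (an assumption made purely so the search finishes quickly) and counts how many messages it takes to hit a collision. The function names are illustrative.

```python
import hashlib
from itertools import count

def truncated_hash(data: bytes, bits: int) -> int:
    """First `bits` bits of SHA-256(data) as an integer (a toy n-bit hash)."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - bits)

def find_collision(bits: int):
    """Hash the messages b"0", b"1", ... until two share a truncated digest."""
    seen = {}  # truncated digest -> first message that produced it
    for attempts in count(1):
        msg = str(attempts - 1).encode()
        h = truncated_hash(msg, bits)
        if h in seen:
            return seen[h], msg, attempts
        seen[h] = msg

m1, m2, attempts = find_collision(bits=24)
# Expect attempts on the order of 2**12 (thousands), not 2**24 (millions).
print(m1, m2, attempts)
```

Scaling the same reasoning up, a 256-bit digest needs on the order of 2^128 attempts, which is where the "128-bit collision resistance" figure comes from.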

Algorithm status

Algorithm                    Output                             Status
MD5                          128-bit                            Broken: collisions on demand. Never for security.
SHA-1                        160-bit                            Broken: practical collision demonstrated ("SHAttered", Google and CWI, 2017). Retired.
SHA-256 / SHA-512 (SHA-2)    256/512-bit                        Current standard, no practical attacks.
SHA-3 (Keccak)               224 to 512-bit (SHAKE: variable)   Secure; a structurally different backup to SHA-2, not its replacement.

MD5 and SHA-1 remain fine for non-security jobs like detecting accidental corruption, and you'll still see MD5 in forensics for legacy compatibility, but anywhere an adversary could exploit a collision (signatures, certificates), they're disqualified.

Hashes alone don't authenticate

A hash proves integrity only if the attacker can't replace the hash too. Shipping a file with its SHA-256 on the same compromised server proves nothing. To bind integrity to a key, use a MAC (message authentication code): a tag computed from message + shared secret key.

HMAC is the standard construction. HMAC-SHA256 wraps the hash with the key in a specific double-hash pattern that defeats length extension attacks. These are a quirk of the Merkle–Damgård structure shared by MD5, SHA-1 and SHA-2: given H(secret‖message), an attacker can compute a valid hash for the message with extra data appended, without ever learning the secret. HMAC exists partly because this naive keyed hashing is broken.
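
A minimal sketch of computing and checking an HMAC-SHA256 tag with Python's standard hmac module. The message and the helper names (make_tag, verify_tag) are illustrative; the random key stands in for a secret the two parties would already share.

```python
import hashlib
import hmac
import secrets

def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over message with the shared key."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)

key = secrets.token_bytes(32)      # stand-in for a pre-shared secret key
message = b"amount=100&to=alice"
tag = make_tag(key, message)

print(verify_tag(key, message, tag))                              # True
print(verify_tag(key, message + b"&to=mallory", tag))             # False: message altered
print(verify_tag(secrets.token_bytes(32), message, tag))          # False: wrong key
```

hmac.compare_digest is used instead of == so the comparison time does not leak how many leading bytes of a guessed tag were correct.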

MAC vs signature: both prove integrity and authenticity, but a MAC's key is shared, meaning either party could have produced it, so there is no non-repudiation. Signatures (private key) give non-repudiation. This contrast is a classic short-answer question.
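
The non-repudiation gap can be shown in a few lines. In this sketch (names and message are hypothetical), Alice and Bob hold the same key, so a tag Bob computes himself is indistinguishable from one Alice sent.

```python
import hashlib
import hmac
import secrets

shared_key = secrets.token_bytes(32)  # held by both Alice and Bob

def mac(key: bytes, message: bytes) -> bytes:
    """HMAC-SHA256 tag over message."""
    return hmac.new(key, message, hashlib.sha256).digest()

message = b"Alice owes Bob 1000"
tag_from_alice = mac(shared_key, message)  # what Alice would send
tag_from_bob = mac(shared_key, message)    # what Bob can compute on his own

# Identical tags: a third party cannot tell which of them produced it.
print(hmac.compare_digest(tag_from_alice, tag_from_bob))  # True
```

With a signature, only the holder of the private key could have produced the value, which is what lets a third party attribute it.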

Password storage

Passwords are the special case where fast hashing is a vulnerability. The threat: a stolen database of hashes attacked offline at billions of guesses per second on GPUs.

What "doing it right" means, layer by layer:

  1. Never store passwords in plaintext, and don't encrypt them either: encryption is reversible and the key lives somewhere.
  2. Salt: a unique random value per user, stored beside the hash. Salts kill rainbow tables (precomputed hash-to-password lookups) and ensure identical passwords hash differently. Salts are not secret.
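  3. Use a deliberately slow, ideally memory-hard function, not bare SHA-256. The accepted options: Argon2id (Argon2 won the Password Hashing Competition in 2015; preferred), scrypt, bcrypt, or PBKDF2 (the weakest acceptable, but FIPS-friendly). Argon2id and scrypt are memory-hard, which specifically hurts GPU/ASIC crackers; bcrypt needs only a few kilobytes and PBKDF2 is purely iterated, so both rely on work factor alone.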
  4. A pepper: an additional secret kept outside the database (in an HSM or app config). An attacker who steals only the database cannot test guesses offline without it. Optional but increasingly common.
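
The layers above can be sketched with the standard library alone. This example uses scrypt rather than the preferred Argon2id only because hashlib ships scrypt and Argon2 needs a third-party package. The cost parameters, the HMAC-based way of applying the pepper, and all function names are assumptions for illustration, not a recommended configuration.

```python
import hashlib
import hmac
import secrets

# Assumed scrypt cost parameters for illustration; tune for real hardware.
SCRYPT_N, SCRYPT_R, SCRYPT_P = 2**14, 8, 1

def _derive(password: str, pepper: bytes, salt: bytes) -> bytes:
    """Apply the pepper with HMAC-SHA256, then run the slow, memory-hard scrypt."""
    peppered = hmac.new(pepper, password.encode("utf-8"), hashlib.sha256).digest()
    return hashlib.scrypt(peppered, salt=salt,
                          n=SCRYPT_N, r=SCRYPT_R, p=SCRYPT_P, dklen=32)

def hash_password(password: str, pepper: bytes) -> tuple[bytes, bytes]:
    """Return (salt, derived_key). Both are stored in the database; the pepper is not."""
    salt = secrets.token_bytes(16)  # unique random salt per user, not secret
    return salt, _derive(password, pepper, salt)

def verify_password(password: str, pepper: bytes, salt: bytes, stored: bytes) -> bool:
    """Re-derive with the stored salt and compare in constant time."""
    return hmac.compare_digest(_derive(password, pepper, salt), stored)

pepper = secrets.token_bytes(32)  # stand-in; would live in an HSM or app config
salt_a, key_a = hash_password("correct horse battery staple", pepper)
salt_b, key_b = hash_password("correct horse battery staple", pepper)

print(key_a != key_b)  # True: same password, different salts, different hashes
print(verify_password("correct horse battery staple", pepper, salt_a, key_a))  # True
print(verify_password("wrong guess", pepper, salt_a, key_a))                   # False
```

Note that nothing here is reversible: verification re-derives the key from the submitted password and compares, it never recovers the original.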

Current NIST guidance (SP 800-63-4, 2025) is worth quoting because it reversed older folklore: a minimum length of 15 characters for single-factor use (8 with MFA), support for long passphrases (at least 64 characters accepted), no composition rules (no "must contain a symbol"), no periodic expiry (change only on evidence of compromise), screening of candidates against breached-password blocklists, and throttling of online guessing. Length and uniqueness beat complexity theatre.
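
As a rough sketch, the length and blocklist rules above translate into a very small check. The function name, the messages and the tiny blocklist are hypothetical; a real deployment would screen against a large breached-password corpus and handle throttling elsewhere.

```python
def check_password(candidate: str, blocklist: set[str], has_mfa: bool = False) -> list[str]:
    """Return a list of policy problems; an empty list means the candidate is accepted."""
    problems = []
    min_length = 8 if has_mfa else 15  # 15 for single-factor use, 8 alongside MFA
    if len(candidate) < min_length:
        problems.append(f"shorter than {min_length} characters")
    if candidate.lower() in blocklist:
        problems.append("appears in breached-password blocklist")
    # Deliberately no composition checks (symbols, digits, mixed case) and no expiry logic.
    return problems

BLOCKLIST = {"password", "123456", "qwertyuiopasdfgh"}  # tiny hypothetical sample

print(check_password("correct horse battery staple", BLOCKLIST))  # []
print(check_password("short", BLOCKLIST))                         # ['shorter than 15 characters']
print(check_password("qwertyuiopasdfgh", BLOCKLIST))              # ['appears in breached-password blocklist']
print(check_password("tenchars!!", BLOCKLIST, has_mfa=True))      # []
```

What the function leaves out is the point: there is no "must contain a symbol" branch and no password-age check.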

Other places hashes appear

  • File/evidence integrity: forensic images are hashed at acquisition so any later change is provable (chain of custody).
  • Deduplication and identification: malware samples referenced by SHA-256.
  • Commit IDs: Git is a hash tree (now SHA-256-capable, historically SHA-1; the SHAttered attack is why it's migrating).
  • Proof of work: Bitcoin mining is brute-forcing partial SHA-256 preimages (finding a block header whose double SHA-256 digest falls below a target value).
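
A toy version of the proof-of-work idea from the last bullet, to show what "partial preimage" means in practice. This is a simplification and not Bitcoin's actual header format or difficulty encoding: the header bytes, the 8-byte nonce and the 16-bit difficulty are assumptions chosen so the search finishes quickly.

```python
import hashlib
from itertools import count

def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice, as Bitcoin does for block hashes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def mine(header: bytes, difficulty_bits: int) -> int:
    """Find a nonce whose digest has at least `difficulty_bits` leading zero bits."""
    target = 1 << (256 - difficulty_bits)  # digests below this value qualify
    for nonce in count():
        digest = double_sha256(header + nonce.to_bytes(8, "little"))
        if int.from_bytes(digest, "big") < target:
            return nonce

header = b"toy block header"
nonce = mine(header, difficulty_bits=16)
winning = double_sha256(header + nonce.to_bytes(8, "little"))
print(nonce, winning.hex())  # the hex digest starts with at least four zeros
```

Finding the nonce takes about 2^16 tries on average here, while checking it takes a single hash. That asymmetry between search and verification is the whole mechanism.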

Quick recall

  • Three properties: preimage, second-preimage, collision resistance. Birthday paradox halves the bits for collisions.
  • MD5 and SHA-1 are broken for security; SHA-256 is the default; SHA-3 is the structural alternative.
  • HMAC = keyed hash done right; MAC gives no non-repudiation (shared key), signatures do.
  • Passwords: salt per user + slow, ideally memory-hard KDF (Argon2id > scrypt/bcrypt > PBKDF2), optional pepper.
  • 800-63-4: ≥15 chars single-factor, no composition rules, no forced expiry, blocklist + throttling.