In cryptography, a collision attack on a cryptographic hash tries to find two arbitrary inputs that will produce the same hash value, i.e. a hash collision. In contrast to a preimage attack, neither the hash value nor one of the inputs is specified.
There are roughly two types of collision attacks:
Collision attack: Find two arbitrary different messages m1 and m2 such that hash(m1) = hash(m2).
Chosen-prefix collision attack: Given two different prefixes p1 and p2, find two appendages m1 and m2 such that hash(p1 ∥ m1) = hash(p2 ∥ m2), where ∥ denotes concatenation.
Classical collision attack
Mathematically stated, a collision attack finds two different messages m1 and m2 such that hash(m1) = hash(m2). In a classical collision attack, the attacker has no control over the content of either message; both are arbitrarily chosen by the algorithm.
Just as symmetric-key ciphers are vulnerable to brute-force attacks, every cryptographic hash function is inherently vulnerable to collisions found by a birthday attack. Due to the birthday problem, these attacks are much faster than a brute-force search would be. A hash of n bits can be broken in about 2^(n/2) evaluations of the hash function.
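The birthday bound can be observed on a deliberately shortened hash. The following is a minimal sketch, not an attack on any real function: it assumes a toy hash made by keeping only the top 24 bits of SHA-256, and the names `truncated_hash` and `birthday_collision` are illustrative. With n = 24, a collision is expected after roughly 2^12 (a few thousand) evaluations rather than 2^24.

```python
import hashlib
from itertools import count

def truncated_hash(message: bytes, bits: int = 24) -> int:
    """Toy hash (an assumption for illustration): top `bits` bits of SHA-256."""
    digest = hashlib.sha256(message).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)

def birthday_collision(bits: int = 24):
    """Hash distinct messages until two of them share a truncated hash value."""
    seen = {}  # hash value -> first message that produced it
    for i in count():
        message = b"msg-%d" % i
        h = truncated_hash(message, bits)
        if h in seen:
            # Return both colliding messages and the number of hashes computed.
            return seen[h], message, i + 1
        seen[h] = message

m1, m2, tries = birthday_collision(24)
print(m1, m2, tries)
```

Neither message is chosen in advance; the search simply reports whichever pair collides first, which matches the "arbitrary messages" nature of the classical attack.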
More efficient attacks are possible by applying cryptanalysis to specific hash functions. When a collision attack is discovered and is found to be faster than a birthday attack, a hash function is often denounced as "broken". The NIST hash function competition was largely induced by published collision attacks against two very commonly used hash functions, MD5 and SHA-1. The collision attacks against MD5 have improved so much that they take just a few seconds on a regular computer.
Hash collisions created this way are usually of constant length and largely unstructured, so they cannot directly be applied to attack widespread document formats or protocols. However, workarounds are possible by abusing dynamic constructs present in many formats. A malicious file built this way contains two different messages in the same document and conditionally displays one or the other, depending on which of the two colliding values is present:
Computer programs have conditional constructs that allow testing whether a location in the file has one value or another.
Some document formats like PostScript, or macros in Microsoft Word, also have conditional constructs.
File formats that can include images, including TIFF and PDF, are vulnerable to collision attacks by using colliding hash values as indexed colors, such that text of one message is displayed with a bright color that blends into the background, and text of the other message is displayed with a dark color.
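The conditional-display trick can be sketched with a toy iterated hash. Everything below is an assumption made for illustration: `toy_hash` is a deliberately weak 16-bit construction (so that a collision can be found by brute force), and `render` is a hypothetical viewer, not any real document format. Two files differ only in their first block, share the same suffix containing both texts, and therefore hash identically while displaying different content.

```python
import hashlib
from itertools import count

BLOCK = 8  # block size in bytes for the toy hash

def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function with a 16-bit state (deliberately weak)."""
    return hashlib.sha256(state + block).digest()[:2]

def toy_hash(data: bytes) -> bytes:
    """Toy iterated hash: zero-pad to a block multiple, then chain compress()."""
    data += b"\x00" * (-len(data) % BLOCK)
    state = b"\x00\x00"
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state

def colliding_blocks():
    """Birthday search for two different first blocks that give the same state."""
    seen = {}
    for i in count():
        block = i.to_bytes(BLOCK, "big")
        state = compress(b"\x00\x00", block)
        if state in seen:
            return seen[state], block
        seen[state] = block

def render(doc: bytes) -> str:
    """Hypothetical viewer: compare the first block with an embedded reference
    copy and show one of the two texts stored in the shared suffix."""
    marker, reference = doc[:BLOCK], doc[BLOCK:2 * BLOCK]
    text_a, text_b = doc[2 * BLOCK:].decode().split("|")
    return text_a if marker == reference else text_b

block_a, block_b = colliding_blocks()
# The suffix is identical in both files and carries both messages.
suffix = block_a + b"I owe you $10|I owe you $10,000"
doc_a = block_a + suffix
doc_b = block_b + suffix

print(toy_hash(doc_a) == toy_hash(doc_b))
print(render(doc_a))
print(render(doc_b))
```

Because the two first blocks leave the hash in the same internal state, any common suffix appended afterwards preserves the collision. That is why an unstructured colliding block is enough once the format offers a conditional construct.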
A real-world collision attack was published in December 2008, when a group of security researchers published a forged X.509 signing certificate that could be used to impersonate a certificate authority, taking advantage of a chosen-prefix collision attack against the MD5 hash function. This meant that an attacker could impersonate any SSL-secured website as a man-in-the-middle, thereby subverting the certificate validation built into every web browser to protect electronic commerce. The rogue certificate may not be revocable by real authorities, and could also have an arbitrary forged expiry time. Although MD5 was known to be very weak in 2004, at least one Microsoft code-signing certificate was still using MD5 in May 2012.
The Flame malware successfully used a new variation of a chosen-prefix collision attack to spoof code signing of its components by a Microsoft root certificate that still used the compromised MD5 algorithm.
Many applications of cryptographic hash functions do not rely on collision resistance, so collision attacks do not affect their security. For example, password hashing and HMACs are not vulnerable. For the attack to be useful, the attacker must be in control of the input to the hash function.
Because digital signature algorithms cannot sign a large amount of data efficiently, most implementations use a hash function to reduce the amount of data that needs to be signed to a constant size. Digital signature schemes are therefore often vulnerable to hash collisions, unless they use techniques such as randomized hashing.
Note that all public key certificates, like SSL certificates, also rely on the security of digital signatures and are compromised by hash collisions.
The usual attack scenario goes like this:
# Mallory creates two different documents, A and B, that have an identical hash value.
# Mallory then sends document A to Alice, who agrees to what the document says, signs it and sends it back to Mallory.
# Mallory copies the signature sent by Alice from document A to document B.
# Mallory then sends document B to Bob, claiming that Alice signed it. Because the digital signature matches the document hash, Bob's software is unable to detect the substitution.
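The scenario above can be sketched end to end under strong simplifying assumptions. The signature scheme here is textbook RSA with tiny primes, and the hash is an 11-bit truncation of SHA-256 chosen so that it fits below the modulus; both are insecure stand-ins, and the function names and document texts are hypothetical.

```python
import hashlib
from itertools import count

# Textbook RSA with tiny primes (p = 61, q = 53): insecure, illustration only.
N, E, D = 3233, 17, 2753

def weak_hash(message: bytes) -> int:
    """Toy 11-bit hash (top bits of SHA-256), small enough to fit below N."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") >> (256 - 11)

def sign(message: bytes) -> int:
    """Hash-then-sign: only the hash of the message is signed."""
    return pow(weak_hash(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    """Check that the signature opens to the hash of the given message."""
    return pow(signature, E, N) == weak_hash(message)

def find_colliding_documents():
    """Step 1: Mallory varies a reference number in both documents until
    a harmless variant and a harmful variant share a hash value."""
    benign = {weak_hash(b"Alice pays Mallory $10 (ref %d)" % i): i
              for i in range(256)}
    for j in count():
        doc_b = b"Alice pays Mallory $10,000 (ref %d)" % j
        h = weak_hash(doc_b)
        if h in benign:
            return b"Alice pays Mallory $10 (ref %d)" % benign[h], doc_b

doc_a, doc_b = find_colliding_documents()
signature = sign(doc_a)           # Step 2: Alice signs document A.
print(verify(doc_a, signature))   # Alice's signature is valid for A ...
print(verify(doc_b, signature))   # Steps 3-4: ... and also verifies for B.
```

Since verification depends only on the hash, Bob's check cannot distinguish the two documents. Randomized hashing defeats the precomputation in step 1 because the signer mixes in a value that Mallory cannot predict before the documents are prepared.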
Cryptographic hash function