• Natanael@slrpnk.net
      4 months ago

      TL;DR: modern hash algorithms process data in fixed-size blocks. MD5 takes 512 bits (64 bytes) at a time and carries a 128-bit internal state between blocks, which is also the size of the final hash.

      The core function in a hash is a little scrambler function (the compression function) that takes two different inputs, the current state and the next message block, and gives you a single output back.

      So it starts with a fixed value built into the algorithm, and then scrambles the first block of the message with it. Then it takes that scrambled piece and mixes that with the next block of the message, then takes THAT scrambled piece and mixes it with the next block. And so on until the end of the message. The last scrambled piece is the hash value.
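Here is a rough sketch of that chaining loop, in case it helps. This is not MD5: `compress`, `pad`, `toy_hash`, the 8-byte block size and the 4-byte state are all made up for illustration, and the scrambler is just truncated SHA-256 standing in for a real compression function.

```python
import hashlib

BLOCK_SIZE = 8        # toy block size in bytes (assumption; MD5 uses 64)
IV = b"\x00" * 4      # toy fixed starting value built into the "algorithm"


def compress(state: bytes, block: bytes) -> bytes:
    """Toy scrambler: mixes the current state with one block, returns a new 4-byte state."""
    return hashlib.sha256(state + block).digest()[:4]


def pad(message: bytes) -> bytes:
    """Simplified padding: 0x80, zeros, then the message length in 4 bytes."""
    padded = message + b"\x80"
    while len(padded) % BLOCK_SIZE != BLOCK_SIZE - 4:
        padded += b"\x00"
    return padded + len(message).to_bytes(4, "big")


def toy_hash(message: bytes) -> bytes:
    """Chain the scrambler over every block; the last state is the hash."""
    state = IV
    padded = pad(message)
    for i in range(0, len(padded), BLOCK_SIZE):
        state = compress(state, padded[i:i + BLOCK_SIZE])
    return state


print(toy_hash(b"hello world").hex())
```

The thing to notice is that each step only sees the previous state and the current block, which is exactly what the attacks below exploit.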

      Collision attacks target that core function by figuring out how to tweak multiple messages so that their scrambler outputs “collide”, ending up equal. So you can hash two tweaked messages and get the same hash value. The tweaks usually need a bunch of random-looking bits in the message to work.

      Then for a multicollision we don’t just do it for two messages. We do it for every letter in the alphabet. For an HTML document we encode something like <div hidden garbage=xyz>a</div> and repeat for every letter. Every letter gets its own random-looking garbage value. Then we have many documents with the same hash and one letter different. We can show you a hash and then pick which letter to present you with in the document. All of them check out.

      But then we repeat the attack. We add another whole alphabet right after the first one! Now we have <div hidden garbage=xyz>a</div> <div hidden garbage_2=xyz>a</div>. And because the second letter is in a different block, that works just fine! Adding a second letter doesn’t change the first intermediate value, and you can attack the second intermediate value for the second letter separately. So you add the whole alphabet again (with new calculated garbage for every letter in the second position), and now after the second letter we have a new intermediate value which is the same regardless of which letter we pick in the second position.

      So now we can independently pick any letter in the first position and in the second position too! Every combination of two letters has the same hash because of the hidden calculated garbage after each letter!
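To make the position-by-position idea concrete, here is a toy demo. Big caveats: this is not an attack on MD5. The state is only 16 bits so that plain brute force can find the garbage, whereas real MD5 collisions rely on cryptanalysis. The names (`compress`, `find_multicollision`, `encode`), the 3-letter alphabet and the 4-byte block layout (1 letter byte + 3 garbage bytes) are all assumptions for illustration, and padding is left out because every message has the same length.

```python
import hashlib
from itertools import product

IV = b"\x00\x00"      # toy 16-bit starting state (assumption)
ALPHABET = "abc"      # tiny alphabet to keep the demo fast


def compress(state: bytes, block: bytes) -> bytes:
    """Toy scrambler with a 16-bit output, weak enough to brute-force."""
    return hashlib.sha256(state + block).digest()[:2]


def toy_hash(message: bytes) -> bytes:
    """Chain the scrambler over 4-byte blocks (no padding in this sketch)."""
    state = IV
    for i in range(0, len(message), 4):
        state = compress(state, message[i:i + 4])
    return state


def find_multicollision(state: bytes):
    """Find one block per letter such that all of them map `state` to the same new state."""
    seen = {letter: {} for letter in ALPHABET}   # letter -> {output: block}
    for g in range(2 ** 24):
        garbage = g.to_bytes(3, "big")
        for letter in ALPHABET:
            block = letter.encode() + garbage
            out = compress(state, block)
            seen[letter].setdefault(out, block)
            if all(out in seen[other] for other in ALPHABET):
                return {l: seen[l][out] for l in ALPHABET}, out
    raise RuntimeError("no common output found")


# Build three positions, each attacked separately from the shared intermediate state.
positions = []
state = IV
for _ in range(3):
    blocks, state = find_multicollision(state)
    positions.append(blocks)
final_state = state


def encode(word: str) -> bytes:
    """Pick the precomputed block for each letter of `word`, position by position."""
    return b"".join(positions[i][ch] for i, ch in enumerate(word))


messages = [encode("".join(w)) for w in product(ALPHABET, repeat=3)]
hashes = {toy_hash(m) for m in messages}
print(len(messages), "messages,", len(hashes), "distinct hash")
```

All 27 three-letter combinations end up with the same toy hash, because every position starts from a state that no longer depends on the earlier letter choices.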

      Then we just repeat the multicollision attack on the whole alphabet over and over until your document is long enough to encode your message. And that message may include the document’s own hash, since the hash is already fixed before we choose which letters to show.