Math is Hard, But Not as Hard as it Used to Be

by Dennis Fisher on Wednesday March 1, 2017

The Internet’s security infrastructure is incredibly fragile, and every once in a while a small fragment of it comes loose and breaks all over the floor. Depending upon the importance of the piece, the mess can sometimes be swept up quickly before anyone notices. That is most decidedly not the case with the recent demonstration of a practical collision in SHA-1.

That collision already has produced reverberations around the web and the aftershocks will be felt for a long time. The most immediate effect is that site owners and organizations that somehow still depend on SHA-1 will need to accelerate their plans to move to a more secure hash algorithm. Like, now. Longer term, this result demonstrates, again, how quickly things are moving in security and how difficult it is to feel like you have a handle on it at any particular moment.

If you’re not familiar, SHA-1 is an older and commonly used hash algorithm, one that has been under assault by security researchers and cryptographers for many years. The algorithm is more than 20 years old, and for more than half of its life, cryptographers have been warning about various weaknesses in it that make it open to attack. Some of these attacks were theoretical, while others were more practical, depending upon the resources at the attacker’s disposal. Theoretical collision attacks were published as far back as 2005.

What researchers from Google and the Centrum Wiskunde and Informatica in the Netherlands revealed recently is not at all theoretical but a practical demonstration that SHA-1 is cooked. The researchers showed that they could take two unique PDFs and produce the same SHA-1 hash digest for both. That’s kind of the opposite of how hash functions are supposed to work.

“We started by creating a PDF prefix specifically crafted to allow us to generate two documents with arbitrary distinct visual contents, but that would hash to the same SHA-1 digest. In building this theoretical attack in practice we had to overcome some new challenges. We then leveraged Google’s technical expertise and cloud infrastructure to compute the collision which is one of the largest computations ever completed,” the team from Google and CWI said.

It’s a highly impressive piece of work, and there is some serious understatement in the researchers’ explanation of how they accomplished it. Saying that they “leveraged Google’s technical expertise and cloud infrastructure” kind of glosses over the enormous amount of compute power and human brain power needed to produce this collision. The team used 6,500 years of CPU computations for the first phase of the attack, 110 years of GPU computations for the second phase, and more than nine quintillion hash calculations in all.

That’s a staggering amount of resources, and the number of organizations that can bring that kind of power to bear on a problem is relatively small. But it’s not zero. Many of those organizations are known by three-letter names and have classified budgets, but some others are on the wrong side of the fence. I asked Marc Stevens, one of the researchers from CWI, involved in this research, how many groups could produce a collision like this.

“It required more than 110 years of total GPU time, but many countries and large companies can afford enough GPUs to do this in reasonable time. But it also requires expertise in cryptanalysis and scaling the attack,” he said.

In this equation, the cryptanalysis expertise is the resource that’s in the shortest supply. Compute power can be had now in quantities and at prices that would have been mind-bending just a decade ago. But the ability to do the kind of work that the Google CWI teams did to get to this point is still relatively scarce, which is a good thing for users and defenders. But that will not always be the case, and the time is coming when more and more of these security problems that were once thought to be unsolvable will start falling in rapid succession.

Tags: Encryption