Massive Dark Web Database Exposes the Shame of Our Password Habits

There are a lot of reasons to worry about the implications of the discovery of a massive trove of 1.4 billion stolen passwords by researchers at the firm 4iQ last week. The way the huge data set exposes the shame of our lousy password habits is what will linger.

First the bad news: 4iQ CTO Julio Casal used a post on Medium on December 8 to call attention to his firm’s discovery, on December 5, of a giant, 41-gigabyte dump of stolen passwords. The collection, which was current as of late November, was found in what Casal described as “an underground community forum” and contained more than 1.4 billion username and password pairs stored in clear text.

Most of the credentials in the dataset - about 1 billion username and password pairs - were already known and have been available on other lists of leaked credentials like exploit.in. But a healthy share - around 28% or 385 million - are new credential pairs. In all, 14% of exposed username and password pairs listed as cleartext in the dataset 4iQ found had not previously been decrypted. That adds up to more than 200 million username and password pairs that are now exposed.

In an indication of how valuable stolen passwords and usernames are, the data wasn’t just dumped as The World’s Biggest Spreadsheet. Rather, the dataset had been carefully groomed and structured to make it fast and easy to search and retrieve credentials. The data was organized alphabetically in a directory tree that was further fragmented into 1,981 pieces to allow for fast searching. Scripts and search tools that could further aid queries were included and carefully documented in a README file, Casal noted.
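To see why an alphabetical directory tree makes lookups fast, here is a minimal Python sketch of prefix-based sharding. The exact layout of the dump is not described here, so the `data` root, the two-level depth and the `symbols` bucket are all hypothetical; the point is only that a query touches one small fragment instead of scanning all 41 gigabytes.

```python
from pathlib import PurePosixPath


def shard_path(key: str, depth: int = 2) -> PurePosixPath:
    """Map a lookup key to a shard path built from its leading characters.

    Hypothetical layout for illustration only: one directory level per
    leading character, with non-alphanumeric characters sent to a shared
    "symbols" bucket.
    """
    parts = []
    for ch in key.lower()[:depth]:
        # ASCII letters and digits get their own directory; everything else shares one.
        parts.append(ch if ch.isascii() and ch.isalnum() else "symbols")
    return PurePosixPath("data", *parts)


# Two keys with the same leading characters land in the same fragment.
print(shard_path("alice@example.com"))
print(shard_path("alan@example.org"))
```

With a scheme like this, the cost of a search depends on the size of one fragment rather than the size of the whole collection, which is presumably why the data was split into so many pieces.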

The criminals’ attention to detail has some unintended benefits. The biggest of those is to help us better see, understand and absorb the shame of our bad password hygiene. And this isn’t just about weak password choice. Yes, the sequence ‘123456’ is the most common password choice in this massive trove of data, appearing 9.2 million times. And yes, ‘password’ is a top-five choice (#4, appearing 1.3 million times), as are ‘qwerty’ (#3, appearing 1.6 million times) and ‘111111’ (#5, appearing 1.2 million times). We’ve seen this pattern before - pretty much every time a sizeable trove of stolen passwords is published. And given that this trove is largely a superset of those other troves, it makes sense that the same crummy list of insecure passwords rises to the top.

What’s more interesting is how the arrangement of the data also allows us to see patterns of password reuse across different accounts, as well as how password use patterns change over time with a given account. In his post, Casal notes examples where the same value is used across six different accounts, as well as how a single Hotmail account migrates its password among a series of distinct, but trivially different options (so: ‘iluvbeepbeep’ becomes ‘1luvb33pb33p’ becomes ‘!luvb33pb33p’).
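The Hotmail example shows why swapping letters for look-alike characters adds little protection: a few lines of code can fold the variants back together. The sketch below is illustrative only. The substitution table is an assumption (real-world variants are far more varied, and ‘1’ could just as easily stand for ‘l’), and the function names are hypothetical.

```python
# Hypothetical look-alike table; deliberately small and not exhaustive.
LEET_MAP = str.maketrans({
    "1": "i", "!": "i",
    "3": "e",
    "0": "o",
    "4": "a", "@": "a",
    "5": "s", "$": "s",
    "7": "t",
})


def normalize(password: str) -> str:
    """Lowercase the password and undo common character substitutions."""
    return password.lower().translate(LEET_MAP)


def is_trivial_variant(old: str, new: str) -> bool:
    """True if two different passwords collapse to the same normalized form."""
    return old != new and normalize(old) == normalize(new)


# The sequence of passwords quoted from Casal's post.
history = ["iluvbeepbeep", "1luvb33pb33p", "!luvb33pb33p"]
distinct_forms = {normalize(p) for p in history}
print(len(distinct_forms))
```

A check like this could be used defensively, for instance to reject a "new" password that is only a cosmetic variation on the old one.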

This kind of close look at our collective password habits is sobering - or amusing, depending on your mood. But the toll we all pay for lax password hygiene is heavy. Just this week, for example, Osaka University in Japan announced that personal data belonging to 80,000 students, graduates, staff and current and former employees was stolen by hackers who used a university lecturer’s ID and password to log into the university’s computer system and install a data-stealing Trojan horse program. So-called “credential stuffing” - or the use by hackers of stolen passwords across different websites - was also to blame for a rash of Pinterest account takeovers, The Register reported.

Despite media attention to ‘sophisticated attackers’ and Mr. Robot-style exploits, phishing of user and administrator credentials remains the most common avenue that attackers - sophisticated or not - use to get onto target networks and computers.

Unfortunately, there is no easy solution here. As I noted over at Security Ledger: passwords are widely recognized as a security weakness, but they’re also hard to kill off entirely. The best solution to this problem may start with awareness (and a data trove like the one 4iQ is publicizing helps with that). Beyond that, companies need to start implementing stronger and more individualized authentication technologies, especially for high value accounts and assets.

Paul Roberts is the Editor in Chief at The Security Ledger and the founder of The Security of Things Forum.