A Reuters story highlights more details about our discovery.
The business of recovering stolen credentials is not as simple as it would seem. We have set records before, initially with the Adobe user database including 153 million records, then with 360 million recovered in February 2014, and finally with 1.2 billion credentials in the most substantial breach known to date. Today, large amounts of stolen credentials may not grab headlines, but they never lose their potency, especially when they are recovered by the good guys and returned to their rightful owners. Such is life in our group, which recovers about 100 million stolen credentials every month.
This task is difficult and tedious. Besides automated daily harvesting, we interface with hundreds of hackers, monitoring whether they have any new information. We do not pay hackers for stolen data. If they have something new and valuable, we start our dance: ask, negotiate, finagle, anything permissible to get the data without rewarding the bad guys for their work. After seeing almost everything, and hearing even more, we have become skeptics, analyzing every bit of information we come across. Hence, when someone claims to have 900 million credentials in a single batch, we have to approach it with curbed enthusiasm.
“Well isn’t it an interesting claim, I’ll believe it when I see it”, one of the Hold Security Cyber Intelligence Analysts writes to the hacker. This one is talkative, willing to brag and boast of his success stories. He replies by providing samples of the data he has acquired. Verifying stolen credentials is a science: an average unencrypted email address and password pair is 29 characters long. In a random sample of genuine data, the distribution of credentials by country, email provider, and major corporate address is nearly always the same. From the sample, we can conclude that it is a collection of multiple breaches over time. The hacker admits this without hesitation. Here comes the hard part: how do we get the data? What does the hacker want in exchange?
Over the past month (April 2016), we have identified 120 million stolen records. This stolen data consists of information from a major Eastern European communications firm, some medium-sized online service providers, and mostly unattributed data moved around by hackers in search of easy gains. We’ve invested countless hours into acquiring this data without helping the hacker monetize his work. Because of this, there is an ever-flowing stream of data that falls through the uncompensated grasp of the Deep Web.
“50 rubles” is what the hacker wants for this incredibly large set of data. He can’t be serious; at today’s exchange rate, that is less than one US dollar. A price this low greatly undermines the data’s credibility and value, like an expensive sports car being sold for pennies at auction. “I am just getting rid of it but I won’t do it for free”, he replies. In reality, 50 rubles is next to nothing, but we refuse to contribute even insignificant amounts to his cause. It is rather funny to negotiate over this, but finally the hacker just asks us to add likes/votes to his social media page (so much for anonymity). That we can do, and once he is satisfied with the results we get a link to an incredible 10-gigabyte compressed database, which takes us more than an hour to download.
Now, for the moment of truth. Number of records: 917 million. Data consistency: 29.8 characters per credential. Email domain distribution: consistent with western domains, major email providers, and corporate domains. But duplicates stand out. Out of 80 million credentials starting with the letter “a”, only 19 million unique credential pairs are found. It is not unusual, since most people still reuse credentials across different services, but an overlap of roughly 75% is substantial.
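For readers who want to see what these sanity checks look like in practice, here is a minimal sketch in Python. It is an illustration, not our production tooling: the `email:password` one-per-line format, the `profile_batch` name, the choice to measure length as email plus password without the separator, and the case-insensitive treatment of email addresses are all assumptions made for the example.

```python
from collections import Counter


def profile_batch(lines):
    """Summarize a batch of 'email:password' lines (format is an assumption)."""
    total = 0
    total_chars = 0
    unique_pairs = set()
    domains = Counter()

    for raw in lines:
        line = raw.strip()
        if not line or ":" not in line:
            continue  # skip blank or malformed rows
        # Split on the first colon only, so passwords may contain colons.
        email, _, password = line.partition(":")
        total += 1
        # Assumed definition of length: email plus password, no separator.
        total_chars += len(email) + len(password)
        # Assumed: emails compare case-insensitively, passwords do not.
        unique_pairs.add((email.lower(), password))
        domains[email.rpartition("@")[2].lower()] += 1

    return {
        "records": total,
        "avg_length": total_chars / total if total else 0.0,
        "unique": len(unique_pairs),
        "duplicate_share": 1 - len(unique_pairs) / total if total else 0.0,
        "top_domains": domains.most_common(3),
    }


sample = [
    "a@example.com:pw1",
    "a@example.com:pw1",
    "b@test.org:secret",
]
print(profile_batch(sample))
```

Plugging in the figures above, 19 million unique pairs out of 80 million gives a duplicate share of 1 - 19/80, or about 0.76. An in-memory set like this one would not hold hundreds of millions of records; at that scale something like sorting on disk would probably be needed.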
Now for the final step: comparing the unique credentials from this enormous batch to the records that we have already seen. And it’s a letdown. Only 0.45% is new, meaning that roughly 1 out of 200 credentials is one we have never seen before. Is it disappointing? Of course, but more importantly, we know that most of the stolen data has already been identified and many companies and individuals are already secured. And the 4 million credentials that we have never seen before? Well, those are being processed and distributed to companies and individuals who can secure their systems against abuse. All in a good day’s work, with the ups and downs behind the scenes of our credentials recovery operation.
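The comparison step can be sketched in a few lines of Python. This is only one way to do it under stated assumptions: that previously seen credentials are kept as SHA-256 fingerprints rather than plaintext, and that email addresses are compared case-insensitively. The names `fingerprint` and `find_new` are hypothetical and exist only for this example.

```python
import hashlib


def fingerprint(email, password):
    """Hash a credential pair so the archive need not hold plaintext."""
    # A NUL byte separates the fields so ("ab", "c") and ("a", "bc") differ.
    material = f"{email.lower()}\x00{password}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()


def find_new(batch, known):
    """Return (new_pairs, share_new) over the deduplicated pairs in batch.

    batch: iterable of (email, password) tuples
    known: set of fingerprints for credentials seen before
    """
    seen_in_batch = set()
    new_pairs = []
    for email, password in batch:
        fp = fingerprint(email, password)
        if fp in seen_in_batch:
            continue  # duplicate inside this batch
        seen_in_batch.add(fp)
        if fp not in known:
            new_pairs.append((email, password))
    share_new = len(new_pairs) / len(seen_in_batch) if seen_in_batch else 0.0
    return new_pairs, share_new


known = {fingerprint("a@example.com", "pw1")}
batch = [("A@example.com", "pw1"), ("b@test.org", "x"), ("b@test.org", "x")]
new_pairs, share_new = find_new(batch, known)
print(new_pairs, share_new)
```

In these terms, the 0.45% figure above corresponds to a `share_new` of about 0.0045.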
However, the story doesn’t end here. When we peel back the layers and dig deeper, we find that the hacker is holding something back from us. After several days of communication and a couple more strategically timed votes on his social media pages, he shares more useful information. In the end, this kid from a small town in Russia collected an incredible 1.17 billion stolen credentials from numerous breaches that we are still working on identifying. Of those, 272 million credentials turned out to be unique, which in turn translated to 42.5 million credentials, roughly 15% of the unique set, that we have never seen before.