Internet Archive Data Breach Exposes 31 Million Accounts

Nonprofit Digital Archive Also Suffers Denial-of-Service Attacks, Defacement
Servers that house the Internet Archive. (Image: John Blyberg/CC-BY2.0)

The Internet Archive, a nonprofit digital library that provides free access to archived websites and other material, is in day three of combating a distributed denial-of-service attack preventing users from accessing the service.

The San Francisco-based organization's website has been under cyberattack since Tuesday. On Wednesday, the free Have I Been Pwned data breach notification service began emailing subscribers to warn that 31 million Internet Archive accounts were breached in September. Exposed account information includes users' email addresses, screen names and hashed passwords.

Of the 31 million leaked records, slightly more than half contained an email address that was already in the HIBP database, meaning the address had appeared in a previous breach.

Starting Wednesday, individuals who were able to access Internet Archive's site saw a message injected through JavaScript that read: "Have you ever felt like the Internet Archive runs on sticks and is constantly on the verge of suffering a catastrophic security breach? It just happened. See 31 million of you on HIBP!"

Run by Australian developer Troy Hunt, HIBP is a free service that allows individuals to register their email address. If the email address appears in a data leak or dump, the service emails the subscriber to alert them.
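The subscribe-and-alert flow described above can be sketched in a few lines. This is a toy, in-memory illustration only, not how HIBP is implemented: the `BreachNotifier` class, its method names and the returned fields are all hypothetical.

```python
from dataclasses import dataclass, field


def normalize(email: str) -> str:
    """Lowercase and trim an address so lookups are case-insensitive."""
    return email.strip().lower()


@dataclass
class BreachNotifier:
    """Toy in-memory model of a breach notification service (hypothetical)."""

    subscribers: set = field(default_factory=set)
    seen: set = field(default_factory=set)  # addresses from earlier breaches

    def subscribe(self, email: str) -> None:
        """Register an address to be alerted about future breaches."""
        self.subscribers.add(normalize(email))

    def load_breach(self, name: str, emails: list) -> dict:
        """Record a breach; report who to alert and how many addresses were already known."""
        addresses = {normalize(e) for e in emails}
        already_known = addresses & self.seen
        to_alert = sorted(addresses & self.subscribers)
        self.seen |= addresses
        return {
            "breach": name,
            "total": len(addresses),
            "already_known": len(already_known),
            "alert": to_alert,
        }
```

Loading a breach intersects its addresses with the subscriber set to decide who gets an email, and with previously seen addresses to count how many were already known from earlier leaks.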

Whether the website injection, data theft and leak, and DDoS attacks are connected remains unclear.

Hunt said the message injection appeared to result from someone accessing a polyfill JavaScript file on a subdomain. Per Mozilla: "a polyfill is a piece of code - usually JavaScript on the Web - used to provide modern functionality on older browsers that do not natively support it."

Credit for at least some of the DDoS attacks against Internet Archive has been claimed by a group that calls itself "Sn_darkmeta" on the social platform X. The group said that it's attacking to protest ongoing U.S. support of Israel.

Breach Timeline

In a timeline posted to X, Hunt said someone sent him a copy of the stolen data on Sept. 30, although he was traveling and wasn't able to review it until five days later, on Saturday, at which point he realized what he'd been sent.

Hunt said he alerted Internet Archive to the breach on Sunday, shared a copy of the data he received, and said he intended to load the 31,081,179 email addresses exposed in the breach into HIBP within 72 hours. Hunt told BleepingComputer the 6.4-gigabyte SQL file he received, named ia_users.sql, carried a last-updated timestamp of Sept. 28, suggesting that's the date the data was stolen.

The Internet Archive confirmed receipt of the data on Monday, prompting Hunt to request "a disclosure notice" from the organization. He repeated the request on Tuesday but has yet to receive one. Such a notice involves a breached organization directly warning users about a breach.

"Obviously I would have liked to see that disclosure much earlier, but understanding how under attack they are I think everyone should cut them some slack," Hunt said. "They're a non-profit doing great work and providing a service that so many of us rely heavily on."

"To the archive.org attackers - that isn't sticking it to some evil multinational, it's attacking a genuinely great resource run on near nothing resource, sweat and tears," British cybersecurity researcher Kevin Beaumont said in a Mastodon post. "If you're going to attack things - please aim better."

The attacks against the site follow a major legal defeat for Internet Archive. The U.S. Court of Appeals for the Second Circuit on Sept. 4 upheld a lower court ruling in the case of "Hachette v. Internet Archive," which found that a book digitization project launched in 2020 isn't protected under the U.S. Copyright Act's "fair use" allowances, and infringed some publishers' copyrights.

Bcrypt Password Hashing

What's the risk posed by attackers having stolen password hashes, which the Internet Archive scrambled using the bcrypt algorithm? Unlike hashing algorithms such as MD5 and SHA-256 - which are designed for speed - bcrypt is a deliberately slow algorithm designed to be much tougher to break, according to password management and authentication vendor SpecOps.
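The difference between a fast hash and a deliberately slow one can be illustrated with Python's standard library. Note the substitution: bcrypt itself is not in the standard library, so this sketch uses salted PBKDF2-HMAC-SHA256 as a stand-in for a slow, tunable password hash. The iteration count and function names are assumptions chosen for illustration, not a description of Internet Archive's setup.

```python
import hashlib
import hmac
import os

# Hypothetical work factor, chosen for illustration only; not a recommendation.
ITERATIONS = 200_000


def fast_hash(password: str) -> str:
    """Single unsalted SHA-256 pass: cheap to compute, so cheap to guess against."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def slow_hash(password: str, salt: bytes, iterations: int = ITERATIONS) -> bytes:
    """Salted PBKDF2-HMAC-SHA256 (stand-in for bcrypt); cost grows with `iterations`."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


def verify(password: str, salt: bytes, expected: bytes, iterations: int = ITERATIONS) -> bool:
    """Recompute the slow hash and compare it in constant time."""
    return hmac.compare_digest(slow_hash(password, salt, iterations), expected)


# A fresh random salt per password means identical passwords hash differently.
salt = os.urandom(16)
stored = slow_hash("correct horse battery staple", salt)
```

Every guess an attacker makes against `stored` has to repeat the full iteration count, which is the property that makes slow, salted hashes expensive to brute-force.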

Any bcrypt-hashed passwords that are at least eight characters long, and which combine uppercase and lowercase characters, numbers and symbols, are especially resistant to cracking.

Using currently available tools and technology, brute-forcing an 8-character password with all of those elements would take 286 years, SpecOps said. Increasing the password length to 12 characters would increase the time required for brute-forcing to 23 million years.
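The arithmetic behind such estimates is straightforward: the number of candidate passwords grows exponentially with length, and the worst-case time is that count divided by the attacker's guessing rate. The sketch below assumes a 94-character printable ASCII alphabet and a purely hypothetical guessing rate. SpecOps' hardware and bcrypt cost assumptions are not given here, so the output is not expected to reproduce its figures.

```python
import string

SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

# Upper, lower, digits and ASCII punctuation: 26 + 26 + 10 + 32 = 94 symbols.
ALPHABET_SIZE = len(string.ascii_letters + string.digits + string.punctuation)

# Hypothetical attacker speed in guesses per second; an assumption, not a measurement.
ASSUMED_RATE = 100_000.0


def keyspace(length: int, alphabet_size: int = ALPHABET_SIZE) -> int:
    """Number of possible passwords of exactly `length` characters."""
    return alphabet_size ** length


def worst_case_years(length: int, guesses_per_second: float,
                     alphabet_size: int = ALPHABET_SIZE) -> float:
    """Years needed to try every candidate at a fixed guessing rate."""
    return keyspace(length, alphabet_size) / guesses_per_second / SECONDS_PER_YEAR


for length in (8, 12):
    years = worst_case_years(length, ASSUMED_RATE)
    print(f"{length} characters: about {years:.3g} years at the assumed rate")
```

Whatever rate is assumed, each added character multiplies the worst-case time by the alphabet size, which is why length matters more than any single hardware upgrade on the attacker's side.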

Facing such constraints, attackers have little incentive to try to brute-force passwords hashed using bcrypt.

"Short, non-complex passwords can still be cracked relatively quickly, highlighting the huge risks of allowing users to create weak - yet very common - passwords such as password, 123456 and admin," SpecOps said. "But once a combination of characters are used in passwords over eight characters in length, the time to crack quickly becomes a near-impossible task for hackers."

Even so, security experts advise Internet Archive users to change their password once the site again becomes accessible. For breached sites, one risk is that attackers may previously have been able to inject malware into the infrastructure, allowing them to intercept passwords before they get hashed and stored, for an unknown length of time.

Experts also renewed a by-now standard warning for users never to reuse passwords across different websites. Many attackers continue to practice credential stuffing, referring to obtaining legitimate email address and password pairs - sometimes stolen directly, sometimes recycled from preexisting data leaks and dumps - and using them to try to access other sites and services.

While defenses against credential stuffing exist, such as multifactor authentication, not all sites and services actively defend themselves or their users against such attacks (see: Breach-Weary Snowflake Moves to MFA, 14-Character Passwords).
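One signal a site might watch for, alongside multifactor authentication, is a single source failing logins across many different accounts, which is the pattern credential stuffing tends to produce. The following is a minimal sketch of that heuristic; the `StuffingDetector` class, its threshold and the notion of a "source" identifier are all hypothetical and not drawn from any specific product.

```python
from collections import defaultdict


class StuffingDetector:
    """Flags a source that fails logins for many distinct usernames (hypothetical heuristic)."""

    def __init__(self, max_distinct_users: int = 5):
        # Threshold is an illustrative assumption, not a tuned value.
        self.max_distinct_users = max_distinct_users
        self._failed = defaultdict(set)  # source -> usernames with failed logins

    def record_failure(self, source: str, username: str) -> bool:
        """Record a failed login; return True once the source exceeds the threshold."""
        self._failed[source].add(username.strip().lower())
        return len(self._failed[source]) > self.max_distinct_users
```

A real deployment would also need time windows and handling for shared addresses, such as corporate gateways, which this sketch leaves out.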


About the Author

Mathew J. Schwartz

Executive Editor, DataBreachToday & Europe, ISMG

Schwartz is an award-winning journalist with two decades of experience in magazines, newspapers and electronic media. He has covered the information security and privacy sector throughout his career. Before joining Information Security Media Group in 2014, where he now serves as the executive editor, DataBreachToday and for European news coverage, Schwartz was the information security beat reporter for InformationWeek and a frequent contributor to DarkReading, among other publications. He lives in Scotland.



