The Importance of Privacy: Sexual Fetishes, Salaries, and Other Things We Know About You
It’s time to stop pretending that you can be anonymous on the Internet.
In a data breach, unauthorized users obtain access to databases of user information that are stored by companies. While most people assume such breaches are harmless, the types of data obtained can be used for blackmail, identity theft, and fraudulent financial activity, among other things. Worse, chances are that far more data has been collected about you without authorization than you imagine.
A Dose of Reality
Troy Hunt’s Internet security website, Have I Been Pwned (HIBP), which was launched on December 4, 2013, had recorded 8,418,474,549 compromised customer accounts across 397 breach events as of August 18, 2019, a number that exceeds the current population of Earth. This list does not include the 2017 Equifax data breach, which exposed the private data of up to 143 million consumers in the US. To put it into a different perspective, Facebook’s 2018 10-K reports 2.32 billion monthly active users on its platform; the number of compromised accounts reported by HIBP is roughly 3.6x the number of Facebook users.
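The comparison with Facebook is a single division, and it can be checked with a minimal sketch. Both inputs are the figures quoted above; the variable names are only illustrative.

```python
# Figures quoted in the article: HIBP as of August 18, 2019, and Facebook's 2018 10-K.
hibp_compromised_accounts = 8_418_474_549
facebook_monthly_active_users = 2_320_000_000

# How many times larger the HIBP count is than Facebook's monthly active user base.
ratio = hibp_compromised_accounts / facebook_monthly_active_users
print(f"{ratio:.1f}x")  # prints "3.6x"
```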
Within those 8.42 billion compromised accounts, over 80 different types of identifiable data have been illicitly harvested. As Chart 1 illustrates, the most common types of harvested data are emails (21%), passwords (18%), usernames (13%), IP addresses (10%), and names (7%). Almost one third (31%) of the total harvested data, though, falls into the “other” category, which alarmingly includes information on sexual preferences, sexual orientation, sexual fetishes, credit status information, family structure, smoking habits, nationalities, income levels, and government-issued IDs. This type of information can be used to blackmail users, discriminate against job applicants, or carry out other targeted activities.
HIBP recorded its highest number of breaches in 2016: 86 breaches (see Chart 2). Each breach is an event where a database containing personal records was accessed and exposed in an unauthorized manner. The number of breaches rose year by year before 2016 and has fallen since. In 2017, for example, the number of breaches fell by more than half compared with the previous year. At first glance, this data seems to indicate a positive future for data privacy, but comparing it with the number of accounts compromised per year cautions against that narrative.
Chart 3 illustrates the number of compromised accounts per year. In each breach event, the accounts that are exposed are considered compromised accounts. In 2019, while Chart 2 shows only 24 breach events, Chart 3 shows a record number of compromised accounts: almost 1.80 billion (Chart 3’s units are labeled in millions), compared to 1.57 billion in 2016. Every breach event in 2016 resulted in an average of 18,283,820 compromised accounts; every breach event in 2019, on the other hand, resulted in an average of 74,889,726 compromised accounts.
Those numbers represent a 4.10x increase in the average number of compromised accounts an attacker gains access to when breaching a database in 2019 compared with 2016. While the number of breaches has been going down, attackers may be looking for higher-profile targets, which provide larger datasets of user information.
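The per-breach averages and the 4.10x figure can be cross-checked against the yearly totals with a short sketch. The event counts and averages are the ones quoted above. The totals are reconstructed by multiplying the two, so they are approximate if the published averages were rounded, and the variable names are hypothetical.

```python
# Breach events per year (Chart 2) and average compromised accounts per event,
# both as quoted in the article.
breach_events = {2016: 86, 2019: 24}
avg_accounts_per_breach = {2016: 18_283_820, 2019: 74_889_726}

# Reconstructed yearly totals: events multiplied by the average per event.
totals = {
    year: breach_events[year] * avg_accounts_per_breach[year]
    for year in breach_events
}

# Ratio of the 2019 average to the 2016 average.
ratio = avg_accounts_per_breach[2019] / avg_accounts_per_breach[2016]

for year, total in totals.items():
    print(f"{year}: {total / 1e9:.2f} billion compromised accounts")
print(f"2019 vs. 2016 average per breach: {ratio:.2f}x")
```

The reconstructed totals come out at about 1.57 billion for 2016 and 1.80 billion for 2019, matching the Chart 3 figures, and the ratio rounds to 4.10.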
A notable high-profile 2019 breach on the HIBP list is that of the former company Verifications.io, for which HIBP reported 763,117,241 compromised accounts. The information from those compromised accounts included genders, employers, job titles, names, usernames, phone numbers, physical addresses, IP addresses, geographic locations, email addresses, and dates of birth. Following the breach, the website went down and had not come back up at the time of publication. The breach not only compromised user data but also impacted business activity.
Examples of Notable Account Breaches
- Adobe (2013): 153 million accounts were compromised. Data collected included Adobe’s internal ID for customers, usernames, emails, encrypted passwords that could be easily deciphered, and password hints.
- Ashley Madison (2015): Ashley Madison is a website that promotes extramarital affairs. The data stolen in the 2015 breach was leaked after the website refused to shut down. Over 25GB of data covering 30.8 million accounts was released, including sexual orientation, physical addresses, phone numbers, names, passwords, and emails.
- LinkedIn (2012): 165 million emails and passwords were stolen.
- MySpace (2008): 360 million email addresses, passwords, and usernames were compromised.
- Verifications.io (2019): an email marketing firm few have heard of, Verifications.io had about 763 million compromised accounts containing personal data: emails, employers, genders, locations, IP addresses, job titles, names, phone numbers, and physical addresses.
- Mate1 (2016): a smaller dating site that boasts 46 million users, Mate1’s hack resulted in over 27 million accounts compromised with information on astrological signs, dates of birth, drinking habits, drug habits, education levels, emails, ethnicities, fitness levels, genders, locations, income levels, job titles, names, parenting plans, passwords, personal descriptions, physical attributes, political views, relationship status, religion, sexual fetishes, travel habits, usernames, web activity, and work habits.
The Illusion of Compliance
Once data is recorded by a central authority, the user has no guaranteed control over it, only perceived control. Ashley Madison, for example, charged its users $19 to delete their data from its database. Even after users paid, their data was never fully deleted.
The Impact Team, the group behind the breach, claimed that Ashley Madison’s parent company, Avid Life Media, received $1.7 million between 2014 and 2015 for its account removal service. Yet after accessing the Ashley Madison user database, the Impact Team was able to retrieve the supposedly deleted user data.
It’s Worse Than What I’ve Described
HIBP provides a list of 397 breach events that have occurred since 2007, but the list is by no means exhaustive. Other sources such as KrebsOnSecurity offer additional data points that include other types of breaches as well, such as ones that contain data that is not (yet) publicly available. For example, in 2019, a hacker downloaded 30GB of Capital One credit application data and was subsequently arrested by the FBI. “That data included approximately 140,000 social security numbers and 80,000 bank account numbers on US consumers and roughly 1 million Social Insurance Numbers (SINs) for Canadian credit card customers” who applied for a Capital One credit card product between 2005 and 2019. Other data included customer status data (e.g., credit scores, credit limits, balances, payment history, contact information) and transaction data from a total of 23 days between 2016 and 2018, according to the official Capital One statement on the breach.
In the digital realm, centralized data storage has become a game of cat and mouse between cybersecurity and hackers. User data must be protected, and security professionals take on the complicated task of maintaining patches, identifying vulnerabilities, and properly implementing a secure architecture in all aspects of the centralized system. On the other hand, hackers need only discover a single point of entry to gain the upper hand. Ultimately, the cost of the entire game is paid for by the user. The cost of a company’s cybersecurity is packaged into the price of a product or subscription paid for by the user, and the profit a hacker makes from selling data is earned at the expense of the user.
What This Means for You
You’re probably looking for actionable steps to take to prevent this type of data collection from happening to you. The answer isn’t very clear, especially because data about you is being shared or accessed by organizations without your conscious consent (as was the case with Cambridge Analytica, which pulled Facebook user data directly from Facebook).
The first answer that comes to mind seems impractical: just don’t put any of your information on the Internet. Unfortunately, your data is an entry ticket to the Internet these days: services want to track you with cookies, forms require your email address, and so on. Without giving up some privacy, you are unable to reap the full benefits, and ultimately the decision to give up information comes down to a personal judgment (whether the information you will receive is worth providing your data and risking that privacy infringement).
Another option takes heavier lifting and requires a restructuring of the Internet. Financial incentives must move away from advertising, which is the primary way that people make money on the Internet today. Startups like Worthyt are reimagining this by shifting the financial incentive away from advertising to quality audience interaction. On a structural level, blockchain solutions can serve to decentralize data, which isolates attacks.
The smallest step you can take is to simply be more conscious about your activity on the Internet. An email address you give to one website can be used to link your activity elsewhere in the future if that website is compromised. One tool I like to use is a website called Sharklasers, which temporarily gives you a randomly generated email address (or one you specify). You can use this free service to sign up for things anonymously. Because you can access the address’s inbox, you can receive activation links as well.
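To see why a reused email address matters, here is a minimal sketch of how records from two separate breaches could be joined on that address. Everything in it is invented for illustration: the records, the field names, and the `link_by_email` helper are hypothetical and do not describe any real dataset or any particular attacker’s tooling.

```python
from collections import defaultdict

# Hypothetical leaked records; every field and value is made up for illustration.
breach_a = [
    {"email": "alex@example.com", "username": "alex_runs", "ip": "203.0.113.7"},
    {"email": "sam@example.com", "username": "sam88", "ip": "198.51.100.4"},
]
breach_b = [
    {"email": "Alex@Example.com", "employer": "Acme Corp", "job_title": "Analyst"},
]


def link_by_email(*datasets):
    """Merge records from several datasets into one profile per email address."""
    profiles = defaultdict(dict)
    for records in datasets:
        for record in records:
            # Normalize so that differences in case or whitespace still match.
            key = record["email"].strip().lower()
            profiles[key].update(
                {field: value for field, value in record.items() if field != "email"}
            )
    return dict(profiles)


profiles = link_by_email(breach_a, breach_b)
print(profiles["alex@example.com"])
```

In this toy example, the profile for alex@example.com ends up combining a username and IP address from one breach with an employer and job title from the other. Using a different throwaway address for each signup leaves nothing shared to join on.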