Data Protection Posted on November 17, 2023 Written by Marin Milenkoski

Legal and Privacy Challenges of Data Scraping in the Digital Age

Data Scraping – beyond the familiar term, what secrets does it hold? In this blog, Marin Milenkoski explores the complex world of data scraping and examines the legal and privacy aspects of gathering publicly available information, a practice that is widely used but often misunderstood.

Data scraping, also known as data harvesting, can be a source of confusion when it comes to gathering information that is publicly available. Many people, including those in the field of data protection, have long believed that using publicly accessible data from platforms like LinkedIn or other social media sites is legally permissible. The argument has been that individuals, by sharing their information on platforms like LinkedIn, have essentially given consent for others to use their data for various purposes. However, this assumption is not entirely accurate.

Some businesses claim to be fully compliant with data protection regulations, asserting that they only use publicly available information from LinkedIn, which they consider to be entirely safe because individuals have consented to this use by providing their data on social media. They also point to LinkedIn’s privacy policy, which they believe permits the use of personal information and argue that users expect to be contacted when they leave such information on the platform. Some of these arguments are made by consultants in the field of data protection.

It’s important to clarify that while LinkedIn allows people to create professional profiles, post articles and comments, search for jobs, and connect with others to expand their professional networks, this does not grant unrestricted access to their data for any purpose. LinkedIn’s Privacy Policy states that personal information is visible to others but does not imply that it can be used for any purpose without limitations.

Now, let’s focus on an important aspect – Article 14 of the General Data Protection Regulation (GDPR), which is often overlooked by many companies. This article applies when personal data is obtained from sources other than the data subject, such as public databases, third-party providers, or intermediaries. According to Article 14, the controller must give the data subject information about the processing, such as the controller’s identity, the purposes of the processing, and the source of the data. This must happen within a reasonable period after obtaining the personal data, and at the latest within one month.
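To make the one-month limit concrete, the short Python sketch below shows one way a controller might track it for scraped records. It is an illustration only, not legal advice: the function names `notice_deadline` and `is_notice_overdue` are hypothetical, and it assumes that "one month" means the same day of the following calendar month (or the last day of that month if there is no such day). It also ignores the earlier deadlines Article 14 sets when the data is used to contact the person or is disclosed to another recipient.

```python
import calendar
from datetime import date


def notice_deadline(obtained_on: date) -> date:
    """Latest date for the Article 14 notice (assumed calendar-month rule).

    Assumption: the deadline is the same day of the following month,
    clamped to the last day of that month when the day does not exist.
    """
    # Roll over to January of the next year when the data was obtained in December.
    year = obtained_on.year + (obtained_on.month // 12)
    month = obtained_on.month % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(obtained_on.day, last_day))


def is_notice_overdue(obtained_on: date, today: date) -> bool:
    """True if the assumed one-month outer limit has already passed."""
    return today > notice_deadline(obtained_on)


if __name__ == "__main__":
    obtained = date(2023, 11, 17)
    print("Data obtained on:", obtained)
    print("Notify by:", notice_deadline(obtained))
    print("Overdue on 2023-12-20?", is_notice_overdue(obtained, date(2023, 12, 20)))
```

A check like this only flags the outer limit. Whether an earlier deadline or an exemption applies still needs a case-by-case assessment.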

Regarding the lawful basis for data scraping, it’s crucial to demonstrate that you have explicit consent from individuals before extracting their personal information if you intend to scrape the personal data of EU and UK residents. Obtaining consent is often the primary and, in many cases, the only lawful method for scraping personal data from EU and UK residents. Alternatively, web scrapers can rely on legitimate interest as the legal basis for scraping, storing, and using this personal data. However, that legitimate interest must be strong and well justified to comply with GDPR principles; a vague or weak one may not be sufficient.

In most situations, it is government bodies and law enforcement agencies, among others, that can make a reasonable case for having a legitimate interest in scraping the personal data of EU and UK residents. They often engage in such activities for the broader benefit of the public.

With this in mind, the Information Commissioner’s Office, along with eleven other data protection and privacy authorities worldwide, has issued a joint statement calling for individuals’ personal data to be safeguarded against unlawful data scraping on social media platforms. The statement explicitly outlines the privacy risks that can arise from such scraping, even though many people believe it to be harmless.

As stated in the joint statement, many data protection authorities have seen increased reports of mass data scraping from social media and other websites. The reports raise a number of privacy concerns, including the use of scraped data for:

Targeted Cyberattacks

For instance, when identity and contact information is scraped and shared on ‘hacking forums,’ malicious actors may use this data for precise social engineering or phishing attacks.

Identity Fraud

Scraped data can be exploited to submit fraudulent loan or credit card applications or to impersonate individuals by creating fake social media accounts in their name.

Monitoring, Profiling, and Surveillance

Scraped data may be utilized to populate facial recognition databases and provide unauthorized access to authorities for surveillance purposes.

Unauthorized Political or Intelligence Activities

Foreign governments or intelligence agencies might use scraped data for unauthorized purposes, potentially compromising individuals’ privacy.

Unwanted Direct Marketing or Spam

Scraped data often includes contact information that can be exploited to send large volumes of unsolicited marketing messages, resulting in spam.

These privacy concerns highlight the need for vigilant monitoring and regulation of data scraping activities to protect individuals from various forms of misuse and privacy violations.

If you have further questions or would like more data protection advice and insights, contact us at [email protected] or reach out directly to Marin on LinkedIn.
