
Web Scraping: A Business Model and Its Legal Framework

Web scraping is a business model that emerged with the rise of the internet. It relies on software programs that extract data from one website and display it on another, allowing consumers to compare offers from different websites and choose among them.

This technique, also known as screen scraping, locates and selects data on the web so that third-party information can be transferred or reused. However, the owners of that data may regard this activity as harmful to their interests.

The legality of this practice is complex because it depends on several factors: how the scraping is carried out, whether the database is original, and whether a substantial investment (of money, time, or effort) was made to obtain the data.

How Can a Database Be Protected?

The Spanish Intellectual Property Law (LPI) protects databases in two ways: as original collections and, for non-original databases, through a sui generis right. Article 12.2 of the LPI defines a database as follows:

“(…) Databases are considered to be collections of works, data, or other independent elements arranged systematically or methodically and accessible individually by electronic or other means.”

But when is a database original? That depends on the criteria used to select or arrange its contents. This protection covers the structure of the database, not its contents.

However, a database does not have to be original to receive legal protection; it can also be protected by a sui generis right. Even if the selection or arrangement of its contents is not original, a database may still have economic value because of the effort invested in creating it. The law therefore protects it as a related right, albeit with a lower level of protection than an original database.

Sui Generis Database Rights

Article 133 of the LPI states:

“The sui generis right over a database protects the substantial investment, whether qualitative or quantitative, made by its manufacturer, whether in financial resources, time, effort, energy, or similar, for the acquisition, verification, or presentation of its content.”

Not every non-original database is therefore protected, only those whose creation involved a substantial investment, assessed either by the volume of information or by its value.

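The sui generis right arises when the making of the database is completed and expires 15 years after 1 January of the year following completion. A substantial change to the database's contents can give the resulting database its own new term of protection.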

The database owner is not necessarily the person who uploads the data but rather the manufacturer. Article 133.3 of the LPI defines the manufacturer as the person who takes the initiative and assumes the risk of making substantial investments to obtain, verify, or present the content. Simply financing a database created by a third party does not, by itself, establish ownership.

A sui generis database manufacturer can prohibit the extraction and/or reuse of all or a substantial part of the database’s content if its acquisition, verification, or presentation involved substantial investment.

Sui generis database rights therefore allow the manufacturer to prevent unauthorized use of the database, including plagiarism, and protect the economic value of the investment. However, the manufacturer cannot prevent lawful users from extracting or reusing insubstantial parts of the database, provided that such use does not interfere with the normal exploitation of the database or unfairly harm its owner.

Lawful users may also extract substantial parts of a database without authorization in certain cases: for private purposes (if the database is non-electronic), for teaching or scientific research (citing the source), or for public security, administrative, or judicial proceedings.

Is Screen Scraping Legal?

The Spanish Supreme Court has addressed the legality of this technique in specific cases, such as the disputes between Ryanair and online ticket resellers like Atrápalo and Lastminute.

In these cases, the Supreme Court ruled that Ryanair’s database was not protected by intellectual property rights because it did not meet the necessary requirements for originality or sui generis protection:

“A minimum level of originality is required, which is not present in the ordered catalog of flights on Ryanair’s website.”

Both the Commercial Court No. 2 of Barcelona and the Barcelona Provincial Court found that: “Ryanair had only invested in generating its own data (flights, destinations, schedules, prices, etc.) and in the necessary IT processing to ensure system reliability and accessibility.”

The Court of Justice of the European Union (CJEU) stated that for sui generis protection:

“(…) it is necessary to meet the requirements for sui generis protection, which include substantial investment not in the creation of data but in their collection, verification, or presentation.”

Since Ryanair’s database did not qualify for sui generis protection, the Supreme Court did not need to assess whether a substantial part of its content had been extracted or reused.

Other arguments against screen scraping include breach of contract, as website terms and conditions often prohibit scraping. However, the Supreme Court ruled that mere browsing does not establish a contract unless the user expressly agrees to the terms.

Additionally, web scraping can be challenged under unfair competition laws if it involves unauthorized exploitation of a competitor’s investment.

What Does the EU Artificial Intelligence Regulation Say About Scraping? Is It Prohibited?

The EU Artificial Intelligence Regulation (AI Act), in Article 5(1)(e), does not broadly prohibit web scraping but bans specific uses deemed to pose an “unacceptable risk” to fundamental rights and privacy. Specifically, it prohibits placing on the market, putting into service, or using AI systems that create or expand facial recognition databases through untargeted scraping of facial images.

“(e) the placing on the market, the putting into service for this specific purpose, or the use of AI systems that create or expand facial recognition databases through the untargeted scraping of facial images from the internet or CCTV footage;”

The ban covers only this use of the technique, building facial recognition databases, which it targets in order to protect privacy. Other scraping activities may serve different purposes and be subject to different regulations.

Thus, web scraping:

  • Is not generally prohibited by the AI Act.
  • Is prohibited when used to create or expand facial recognition databases through untargeted scraping of facial images.

The AI Act thus focuses on large-scale biometric identification rather than on the extraction of data from databases protected by intellectual property.
