Perplexity AI claims to follow robots.txt, but Amazon investigates

Amazon's cloud division, Amazon Web Services (AWS), is investigating accusations that one of its customers, Perplexity AI, has been scraping content from websites without the site owners' permission. The investigation centers on a specific practice: ignoring a common web standard known as the Robots Exclusion Protocol (robots.txt).

What is `robots.txt` and why does it matter?

Imagine your website as your house. Robots.txt acts like a sign on your door. It tells automated programs, or “bots,” which areas of your website they are allowed to visit and which ones are off-limits. While respecting robots.txt isn’t mandatory, it’s generally been a well-understood courtesy since the 1990s.
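
To make the "sign on the door" idea concrete, here is a minimal sketch of how a well-behaved crawler could consult robots.txt before fetching a page, using Python's standard-library urllib.robotparser. The rules, the bot names (ExampleBot, OtherBot), and the example.com URLs are all hypothetical and chosen for illustration; they do not describe any real site or any company's crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents (illustrative only, not from a real site).
# It bans one named bot from the whole site and keeps every other bot
# out of the /private/ area.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A polite crawler asks before each request and skips the URL if refused.
for agent, url in [
    ("ExampleBot", "https://example.com/news/story"),
    ("OtherBot", "https://example.com/news/story"),
    ("OtherBot", "https://example.com/private/draft"),
]:
    verdict = "allowed" if parser.can_fetch(agent, url) else "disallowed"
    print(f"{agent} -> {url}: {verdict}")
```

Nothing in this check is enforced by the website itself. The crawler has to choose to run it and to honor the answer, which is why ignoring robots.txt is a matter of conduct rather than a technical break-in.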

Wired discovers a suspicious crawler

Tech publication Wired reported uncovering a virtual machine, essentially a software-based computer running on shared cloud hardware, that was bypassing websites' robots.txt instructions. This machine, hosted on an AWS server with an IP address (44.221.181.252) linked to Perplexity AI, reportedly visited several prominent news websites hundreds of times in the last three months.

How did they know it was Perplexity AI?

Wired conducted a test. They entered headlines or short descriptions from the websites in question into Perplexity’s AI chatbot. The chatbot then responded with information that closely resembled the articles, with little to no attribution given to the original source. This suggested Perplexity might be using the scraped content to power its AI.

Is Perplexity the only culprit?

While Amazon’s investigation focuses on Perplexity AI, a recent Reuters report suggests this practice of ignoring robots.txt might be more widespread among AI companies looking to train their large language models.

What does Amazon say?

Amazon is clear: its customers must comply with robots.txt instructions. Its terms of service prohibit illegal activity, and the company expects customers to respect website owners' wishes regarding how their content is accessed.

Perplexity AI denies wrongdoing, with a caveat

Perplexity maintains that it follows robots.txt guidelines. A company spokesperson says its chatbot respects the protocol and that its services comply with Amazon's terms of service. However, the company admits to an exception: if a user specifically includes a URL in a chatbot query, the robots.txt instructions might be bypassed in that instance.

Perplexity CEO previously denied accusations

Aravind Srinivas, CEO of Perplexity AI, has previously denied claims that his company disregards robots.txt and then tries to cover it up. He acknowledges that Perplexity uses third-party web crawlers alongside its own, and admits the bot identified by Wired belonged to one of these external services.

The investigation by Amazon is ongoing. Whether Perplexity AI will face any consequences for its alleged actions remains to be seen.
