The use of artificial intelligence tools has surged in recent years. Publishers like The New York Times (NYT) now face new challenges with AI companies, including copyright disputes and unauthorized use of content. In October 2024, NYT issued a cease-and-desist notice to Perplexity, an AI startup, demanding that the company stop using the publisher's content. This ongoing conflict reflects the growing tension between publishers and AI platforms that use web data to generate responses for users.
This article explores the conflict between Perplexity and NYT, including the core issues around copyright, scraping, and web crawling. It also examines how these developments align with other clashes between publishers and AI companies.
What Sparked the Conflict Between NYT and Perplexity?
On October 2, 2024, The New York Times sent a letter to Perplexity. In it, the publisher accused the AI platform of violating copyright laws. NYT claimed that Perplexity used its articles to create summaries and responses for users without permission. NYT had earlier made efforts to block access to its website, yet its content still appeared in Perplexity’s results.
Perplexity had previously assured NYT that it stopped using crawling technology. Despite this assurance, NYT said the platform continued surfacing its content. The publisher demanded Perplexity explain how it was accessing its material. NYT also asked the startup to provide full information about its methods before October 30, 2024.
Perplexity’s Response: Denying Scraping Claims
In response, Perplexity denied scraping NYT’s data to build foundation models. The company clarified that it does not collect data to train large AI models. Instead, it indexes web pages and provides factual summaries through citations. When users ask questions, Perplexity aggregates public content, including news articles, to deliver summarized answers.
This raises a complex legal and ethical question. Publishers like NYT argue that even indexing or repurposing their content without permission can violate copyright laws. Meanwhile, AI companies, including Perplexity, claim that their purpose is to deliver factual information rather than build proprietary models.
The Larger Issue: Web Scraping and Copyright Concerns
The conflict between Perplexity and NYT reflects a broader debate around AI, web scraping, and copyright protections. Publishers fear that AI platforms could siphon off their original content. By using automated crawlers and scrapers, AI firms can gather massive datasets from websites. These datasets are used to generate answers, summaries, or chat-based outputs.
Many publishers argue that this practice undermines the value of original journalism. If AI systems can instantly summarize news articles, fewer users will visit the original websites. This results in loss of traffic and ad revenue for publishers.
How Publishers Block Scraping Attempts
Many media companies, including NYT, deploy technical measures to discourage web crawlers. They often use robots.txt files, a web standard that tells crawlers which parts of a website they may or may not index. It is worth noting that robots.txt is advisory rather than enforceable: it states the publisher's rules, but compliance depends on the crawler choosing to honor them.
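To make the mechanism concrete, here is a minimal sketch of how a well-behaved crawler consults robots.txt rules, using Python's standard-library parser. The bot name and rules below are illustrative, not taken from any publisher's actual robots.txt file:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules of the kind publishers use to block
# specific AI crawlers while allowing everything else.
rules = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A compliant crawler checks permission before fetching each page.
print(parser.can_fetch("ExampleAIBot", "https://example.com/article"))  # False
print(parser.can_fetch("SomeOtherBot", "https://example.com/article"))  # True
```

Nothing in this check physically blocks a request; a crawler that skips the `can_fetch` call can still download the page, which is exactly the behavior publishers complain about.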
Despite these efforts, AI companies have been known to bypass such restrictions. According to Reuters, some platforms ignore or work around the robots.txt protocols. Publishers argue that this behavior reflects disregard for established online norms.
NYT’s Legal Battle with OpenAI
This is not the first time NYT has taken a stand against an AI company. In late 2023, NYT filed a lawsuit against OpenAI, the creator of ChatGPT. NYT accused OpenAI of using millions of its articles without permission to train its language models. The case highlighted growing concerns among publishers about unauthorized content usage by AI firms.
Chatbots like ChatGPT rely on vast datasets to generate responses. Many news outlets fear their archived content could be used in these models without proper compensation. As a result, legal disputes over content ownership and fair use are becoming increasingly common.
Copyright Law and Fair Use: Where Do AI Companies Stand?
Copyright law allows limited use of copyrighted material under the principle of fair use. However, the concept of fair use becomes murky in the context of AI-generated summaries. Some argue that summarizing public articles falls under fair use, especially if the summaries link back to the original sources.
Others, including NYT, argue that AI platforms devalue original content by rephrasing it for their own outputs. Publishers say that AI-generated summaries reduce user engagement with the original articles. They believe this threatens the sustainability of quality journalism.
What Is at Stake for Perplexity and Other AI Platforms?
For Perplexity, the outcome of this conflict could shape its business model and future operations. If courts rule that AI-generated summaries infringe on copyright laws, other AI platforms might face similar restrictions. Many startups rely on public data to provide value to users. If publishers successfully block this access, AI companies may need to negotiate licensing agreements for content usage.
Such restrictions could also affect users’ access to information. AI tools that summarize news articles offer convenience. They help users stay updated without sifting through multiple sources. However, these tools also risk disrupting revenue models for journalism.
Will Licensing Agreements Be the Solution?
Some publishers are exploring licensing agreements with AI firms. Under these agreements, AI companies would pay a fee to access and use their content. This model could benefit both parties. Publishers would gain a new revenue stream, while AI firms could legally use high-quality content.
However, licensing agreements could increase operational costs for AI startups. Smaller companies may struggle to afford the fees demanded by large publishers. This could lead to fewer AI tools entering the market, reducing competition and innovation.
The Future of AI and Journalism: A Complicated Relationship
The Perplexity-NYT conflict highlights the tensions between AI platforms and traditional media outlets. Publishers rely on ad revenue and subscriptions, while AI tools promise free, instant access to information. As AI systems become more powerful, regulating content usage will become a priority.
This conflict also reflects broader shifts in the media landscape. The rise of generative AI means that news consumers increasingly expect concise, AI-generated summaries. As a result, publishers must find ways to stay relevant in an AI-driven world.
Conclusion: Finding Common Ground
The ongoing dispute between Perplexity and The New York Times could set an important precedent. How it is resolved may redefine how AI platforms interact with publicly accessible content. Both publishers and AI firms need to strike a balance between innovation and fair compensation.
One potential solution lies in collaborative partnerships. AI companies could develop tools that complement traditional journalism, rather than replace it. For example, AI-powered recommendation engines could drive more traffic to original sources. Similarly, publishers could integrate AI technologies to enhance their news platforms.
However, if disputes like this remain unresolved, regulatory bodies may need to step in. New laws might emerge to govern content scraping, AI usage, and digital copyright. Until then, the tension between AI platforms and publishers will continue to shape the future of journalism.
Ultimately, the goal should be to harness the potential of AI while respecting the value of original content. Both sides stand to benefit from a fair and transparent content ecosystem. With the right approach, AI and journalism can coexist and thrive together.