
    Despite blocking mechanisms, AI businesses scrape websites

    By Skypeak Limits, 24 June 2024

    According to Reuters, a number of artificial intelligence companies are bypassing the instructions that websites publish in their robots.txt files.

    Over the past several days, Perplexity, a company that markets its service as “a free AI search engine,” has come under widespread criticism. Wired reported that Perplexity has been ignoring the Robots Exclusion Protocol, commonly known as robots.txt, and has been scraping Wired's website as well as other Condé Nast properties. That report came shortly after Forbes accused Perplexity of copying one of its stories and republishing it across multiple platforms. The technology website The Shortcut also accused the company of scraping its articles. Now Reuters has reported that Perplexity is not the only AI company bypassing robots.txt files and scraping websites to obtain content that is then used to train its technology.

    TollBit is a startup that pairs publishers with AI companies to broker licensing deals. According to Reuters, the company sent publishers a letter warning that “AI agents from multiple sources (not just one company) are opting to bypass the robots.txt protocol in order to retrieve content from sites.” A robots.txt file contains instructions telling web crawlers which pages they may and may not access. Web developers have used the protocol since 1994, but compliance is entirely voluntary.
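To make the voluntary nature of the protocol concrete, here is a minimal sketch of how a well-behaved crawler might consult robots.txt before fetching a page, using Python's standard `urllib.robotparser` module. The crawler name `ExampleAIBot`, the `example.com` URLs, and the rules themselves are hypothetical and chosen only for illustration; they are not taken from any site or company mentioned in this article.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named crawler entirely,
# and keep every other crawler out of /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text supplied as a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

# A compliant crawler performs this check itself before each request.
for agent, url in [
    ("ExampleAIBot", "https://example.com/articles/1"),
    ("OtherBot", "https://example.com/articles/1"),
    ("OtherBot", "https://example.com/private/notes"),
]:
    print(agent, url, is_allowed(parser, agent, url))
```

The check runs entirely on the crawler's side. Nothing in the protocol stops a crawler from skipping it or from identifying itself under a different user agent, which is why compliance depends on the operator's good faith.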

    TollBit's letter did not name any specific company. Business Insider, however, says it has learned that OpenAI and Anthropic, the companies behind the ChatGPT and Claude chatbots, respectively, are also bypassing robots.txt signals. Both companies have previously stated that they respect the “do not crawl” directives that websites include in their robots.txt files.

    During its investigation, Wired found that a machine running on an Amazon server that was “certainly operated by Perplexity” was bypassing the robots.txt instructions on its website. To determine whether Perplexity was scraping its content, Wired gave the company's tool headlines from its articles or short prompts describing its stories. The tool reportedly produced results that closely paraphrased the articles “with minimal attribution,” and on occasion it even generated inaccurate summaries of the stories. According to Wired, in one instance the chatbot falsely claimed that Wired had reported on a specific California police officer committing a crime.

    In an interview with Fast Company, Aravind Srinivas, the chief executive officer of Perplexity, said that his company “is not ignoring the Robot Exclusions Protocol and then lying about it.” That does not mean, however, that it does not benefit from crawlers that do ignore the protocol. Srinivas explained that the company uses third-party web crawlers in addition to its own, and that the crawler Wired identified was one of them. When Fast Company asked whether Perplexity had told the crawler provider to stop scraping Wired's website, he replied only, “it's complicated.”

    Defending his company's practices, Srinivas told the publication that the Robots Exclusion Protocol is “not a legal framework.” He also suggested that publishers and companies like his may need to establish a new kind of relationship. He reportedly insinuated as well that Wired had deliberately used prompts designed to make Perplexity's chatbot behave the way it did, so regular users would not get the same results. As for the inaccurate summaries the tool had generated, Srinivas said, “We have never said that we have never hallucinated.”
