An e4m story by Kanchan Srivastava published on 14 September 2023 says that leading Hindi newspaper publishers Dainik Bhaskar and Amar Ujala have decided to stop OpenAI and other large tech companies that are developing and promoting large language models (LLMs) from using their content to build generative AI tools. Members of the India-based Digital News Publishers Association (DNPA) such as Bennett Coleman, HT Media, and The Hindu have restricted access to Microsoft-backed OpenAI. They are taking action to block web crawlers from accessing their content and are updating their copyright terms and conditions accordingly.
Global news publishers such as the New York Times, The Guardian, CNN and Reuters have already blocked OpenAI’s access to their online content. The NY Times has threatened to file a lawsuit against OpenAI over the use of its generative AI tool on copyrighted articles published by the paper.
Additionally, the two Hindi news media groups (Dainik Bhaskar also publishes dailies in Gujarati and Marathi apart from Hindi) are attempting to prevent AI developers and platforms from scanning their digital content to train LLMs without written permission. Both publishers updated the terms and conditions on their websites last week, defining the ‘non-commercial use’ of content in more detail and warning against its unauthorized use as free fodder for LLMs.
On Dainik Bhaskar’s website, the new terms and conditions read, ‘All materials published or available on the Services are protected by copyright, and owned or controlled by DBCL solely or in association with third parties or with such other parties who are given credit as the provider of the Content. Non-commercial use of the Service shall also include the use of Content only upon obtaining prior written consent from DBCL in connection with: (1) the development of any software program, including, but not limited to, training a machine learning or artificial intelligence (AI) system or large language model (LLM); or (2) providing archived or cached data sets containing Content to another person or entity.’
The company has defined its content rights as ‘including, but not limited to all text, photographs, images, illustrations, designs, audio clips, video clips, “look and feel,” metadata, data, or compilations, all referred to as the “Content.”’ Amar Ujala has added similar conditions on its website in Hindi, barring the makers of all devices and services, including robots and spiders, from using its content without written permission.
Generative AI tools, including ChatGPT, are based on large language models (LLMs), which are trained using various algorithms on a huge library of documents, articles, scholarly work, technical reports and other verbal and visual communication, including social media, that can be crawled on the internet. It is in the first place a brute-force exercise of digesting the huge mass of available content, categorizing it, and then establishing algorithms that, when prompted, mimic natural or semantic rules for putting together strings of words on the basis of a hierarchy of rules for various categories or subject sets. When senseless strings or strings with misplaced artefacts are generated, these are called ‘hallucinations.’
As far as the Indian newspapers are concerned, OpenAI has neither acknowledged their contributions to its data collection nor presented any hope or model of revenue sharing. Some of the Indian publishers are also contemplating the development of their own large language models and AI models, even as they say that they are keeping ChatGPT and other AI tools out of their newsrooms. At the same time, the technology is seen as an inevitable tool, with television news channels having already demonstrated 3D animated anchors.
The competition commissions and courts in developed countries are taking on the big tech monopolies
The DNPA and the Indian Newspaper Society took Google India to the Competition Commission of India (CCI) more than two years ago, alleging that the tech platform has deprived them of their fair share of ad revenue. Google has denied this and the matter is still pending with the CCI. The government appears reluctant to empower its own news media by shoring up its economics along the lines of the models imposed by the Australian and Canadian governments. Similar cases against Google and Facebook are also pending in the US courts.
On 19 September 2023, Google made what is described as a last-ditch effort to avoid paying a Euro 2.42 billion (approximately Rs 20,500 crore) EU antitrust fine related to its shopping service, claiming that the regulators had not sufficiently demonstrated that its practices were anti-competitive. The fine was levied by EU antitrust chief Margrethe Vestager in 2017. Google’s current appeal to the Court of Justice of the European Union (CJEU) follows the rejection of its earlier appeal by the General Court in 2021.
The fine under appeal is one of three penalties imposed by the EU on Google, totalling Euro 8.5 billion (approximately Rs 72,000 crore). The CJEU is expected to rule in the coming months on the shopping case, while the other penalties pertain to the abuse of its dominant position to disadvantage rival advertisers through its AdSense service and to its practices with the Android operating system.