Successive Indian governments have followed a closed-door policy on sharing government data with the public. Both the UPA and the NDA, during their respective regimes, have tried to hide data and have made it difficult for people to get authentic data that should normally be available in any democratic nation. All mature democracies encourage data sharing to expand the public domain. India shares data too, but far less transparently.
There are three important issues here: the availability of data, regular updating of data, and the authentication of data. Most government and private institutions in India do not have the most critical data that they should have. For example, the country has no updated data on jobs, garbage or water. The employment data was first discontinued in 2009 by the UPA, and the little that remained in the form of the Labour Bureau's annual survey data was discontinued in 2017 by the NDA. The garbage and groundwater data was last updated almost a decade ago, in the 2010-11 Census. Data sets that appear on government websites keep vanishing without notice and are replaced by new data sets with no explanation, continuity or transition information.
Making data access difficult is not a recent phenomenon
The callous attitude to data and the lack of transparency around it are not a recent phenomenon; they have been there for over a decade. The trend has been to hide data from citizens and the press and to unlock it only after user verification. The UPA went as far as to remove data from the public domain and then create the Right to Information Act (RTI Act), under which getting data from the government became a skill and a test of patience, a fine art that only activists and crusaders have acquired. As a result, the common citizen was left out of the discourse and a lot of critical government data disappeared from the websites. The UPA claimed it had given the citizens a right to social justice.
The present government is equally guilty, though it claims to be different. The available data has become even more selective and disjointed. In some cases the method of calculating data has been changed. Data series also often begin in 2014, which defies logic; statistics covering at least 10 years should be displayed. Also, to access most of the data hosted by the National Informatics Centre (NIC), you have to log in with a user ID and password. This is not an accepted global practice and does nothing for ease of doing business. No mature democracy keeps data under lock and key.
Critical data such as solid waste and water data has also disappeared from the websites of the CPCB, the CWC and other government bodies, despite being crucial for Swachh Bharat and other flagship programs. At the recent Media Rumble in Delhi, we attended a session on sourcing data, primarily on how data makes great stories. I spoke to Rakesh Dubbudu, the speaker at the session and the founder of FACTLY (https://factly.in/), a well-known data journalism portal.
Data entry is outsourced to contractors
Dubbudu confirmed that authentic, updated data is hard to find but said there are several sources worth trying. “The RBI data is one of the most authentic and comprehensive data available. There are two types of data here – one, which RBI generates itself, for which annual data as well as quarterly releases are available. Then there is data that the RBI collates painstakingly from other sources including the states. Fifty parameters are tracked including the state’s GDP and fiscal deficit that can be fairly informative. These are available in its annual reports of states and are usually dependable,” said Dubbudu. “Then there is data from MOSPI, the Ministry of Statistics and Programme Implementation which again is authentic and detailed. However, at times these may not be fully updated.”
Dubbudu added that data from parliament proceedings is normally authentic because a false statement there can be referred to a privileges committee. “Also there are parliament standing committee reports which have high quality data that is usually the latest. The CAG reports are also a great source of information with wonderful insights; although they are often issued two years after the time the event occurred. From 2009-10 onwards most ministries have started publishing annual data; it gives complete data of at least two to three years. For crime data the NCRB data is there but usually outdated. Then there are scheme websites like those of the Ujjwala scheme where the data is granular but keeps changing as it is frequently updated,” said Dubbudu.
Reasons behind the existence of unreliable data
Asked why data is plentiful but of uncertain integrity, Dubbudu said, “One of the reasons of lack of authentic data is because government recruitment methods have not been updated. Today since all information is entered in a database, one of the biggest recruitment by the government should be of data entry operators. But there are no in-house data entry operators in the government. Everything is outsourced to contractors and there is little accountability.” This could be the reason for data errors, especially because contracts are usually short term, lasting six months or a year. The contractual worker's lack of ownership, responsibility and long-term commitment could thus be a reason we lack data integrity.