Transparency Community Voices Concerns over Data.gov

Yesterday, a group of organizations, including OMB Watch, submitted concerns about the high-value datasets published on Data.gov in compliance with the Open Government Directive (OGD) issued on Dec. 8. The OGD required agencies to publish at least three high-value datasets through Data.gov within 45 days. The groups outlined the major problems with the site and its implementation thus far; I have summarized these issues below.

Format & Usability

A major concern of the community is that releasing data in certain formats may make it more usable for coders and the tech-savvy, but not for the general public. If data is released solely in formats such as XML or CSV, leaving most of the public unable to decipher it, then Data.gov’s attempt to make the government more transparent has failed. What good is information if we can’t read it?

The solution? We proposed that the administration strike a balance between releasing data in machine-readable formats and presenting the data to the public through web-based interfaces. Such interfaces can be designed in a user-friendly way that aggregates the raw data for quick and convenient access.

Definition of High Value


The OGD mandated that agencies release at least three “high-value” datasets on Data.gov. However, the Sunlight Foundation noted that only 16 of the 58 datasets posted by major agencies were previously unavailable. The vast majority of the released datasets were already online, though not in machine-readable format. In other words, the administration picked mostly the low-hanging fruit in its first data release. On the other hand, the data is now available in a central location and in a better format.

Still, this leaves open the question of how the government is defining “high value.” According to the Open Government Directive, high-value data is information that can be used to increase agency accountability and responsiveness; improve public knowledge of the agency and its operations; further the core mission of the agency; create economic opportunity; or respond to need and demand as identified through public consultation. Yet agencies do not have to demonstrate how the datasets they submit qualify under these categories. To resolve this issue, we called for such a requirement. We also asked for notations indicating which datasets were previously available and which are new, and for the release of datasets that help hold agencies accountable for their policy and spending decisions.

Data Quality

This third issue centers mainly on the fact that some datasets could not be opened, were missing portions, or lacked headers. Missing headers, of course, mean that the data cannot be used even by coders. Moreover, some datasets were being quietly removed from the site without public notification.

Here, we stressed the need for a better feedback mechanism than the one that exists on the site. In particular, there needs to be a system for reporting problems with specific datasets. Further, we asked that all datasets on Data.gov be directly associated with their code sheets.

Ultimately, this is a great first step in showing the amount of data the agencies are capable of putting out in machine-readable format. Never before have we been able to access so much raw data in one place. Despite the short deadline for this disclosure, several executive agencies released more than the required three datasets. The implementation, however, needs to be improved by creating public-facing interfaces, requiring agencies to demonstrate the value of the data, and providing a means of user feedback.

To improve Data.gov and the range of data on the site, the administration is welcoming comments on its blog, Join the Dialogue. Additionally, Data.gov allows users to rate each dataset on ease of access, usefulness, and data utility, and to give it an overall rating.

To read the group’s arguments and other points of contention in full, see the letter we sent here.  Feel free to give us your feedback.

 
