Agencies Make Data More Widely Available Through Data.gov

On Jan. 22, executive agencies posted hundreds of datasets onto Data.gov as required under the Open Government Directive (OGD). Many transparency advocates have lauded the administration’s efforts while at the same time raising questions about how well this first initiative under the OGD actually worked. The release of the datasets has triggered discussions about the value of the data, how individual privacy rights are protected, whether the datasets being released are new, and the quality of the data that has been released.

Under the OGD, published Dec. 8, 2009, executive branch agencies had 45 days to release at least three "high-value" datasets on their websites and register them with Data.gov. These datasets were to be information "not previously available online or in a downloadable format" and were to be published "online in an open format." Altogether, about 300 new datasets were uploaded to Data.gov, with 175 labeled "high-value." The topics of the released datasets varied widely across the agencies, from population counts of wild horses and burros to hate crime statistics.

Despite the short deadline for this disclosure, several executive agencies released more than the required three datasets. The Departments of Defense, Energy, and Labor led the pack by releasing six high-value datasets each. Numerous independent agencies, such as the National Transportation Safety Board and the Equal Employment Opportunity Commission, also released datasets, despite the fact that the OGD does not appear to apply to them.

A sticking point is the definition of "high-value." According to the OGD, the definition covers information "that can be used to increase agency accountability and responsiveness; improve public knowledge of the agency and its operations; further the core mission of the agency; create economic opportunity; or respond to need and demand as identified through public consultation." A review of the released datasets seems to indicate that only a limited number target agency accountability. Most agencies did not provide a justification for why a released dataset was considered "high-value."

The datasets that were released are supposed to be just the first installment. By April 7, each federal agency is required to develop an Open Government Plan that includes an inventory of high-value information available for download and identifies high-value information not yet available to the public. For such information, the agency is to provide specific target dates for making the material publicly available.

Some of the new sets that were released on Jan. 22 improve public access, while others are simply data that has been released in the past, now offered in raw formats. For example, instead of publishing the data only in PDF format, agencies have made the underlying data available. Some changes will likely be perceived as very positive. In one case, the Centers for Medicare & Medicaid Services, part of the Department of Health and Human Services, chose to publish for free Medicaid information that previously was available only by purchasing a CD-ROM from the agency. Several datasets offer increased insight into inspections and safety ratings, including two on housing inspection scores from the Department of Housing and Urban Development, a tire grading system and child car seat scores from the Department of Transportation, and chemical hazard information from the U.S. Environmental Protection Agency.

However, not all of the data appear to be new. Many of the datasets, while already available elsewhere, were republished in new, machine-readable formats that the public can more easily turn into useful tools. According to the Sunlight Foundation, of the 58 datasets released by the executive agencies, only 16 were previously unavailable in some format online.

The sheer scope of topics made it difficult for any one organization to evaluate the usefulness or value of the new data. Heather West of the Center for Democracy and Technology wrote, “There are some data sets that are clearly high value to the public. Hopefully, this is the start of a process to release all the data sets that are valuable, no matter how valuable or to whom.”

A complication is that some of the datasets originally posted by agencies have already been taken down. All three sets posted by the Peace Corps were removed, as well as two from the Nuclear Regulatory Commission. Additionally, a Department of Education set on expenditure data in public schools, which was already available through the National Center for Education Statistics, was removed from the list of OGD datasets on Data.gov. These removals have reportedly been made largely because of concerns over individual privacy rights. In fact, the data policy on Data.gov does not mention anything that would guarantee permanent public access, meaning the agencies can take down information just as easily as they can put it up.

To improve Data.gov and the range of data included on the site, the administration is welcoming comments on its blog, Join the Dialogue. Additionally, Data.gov allows users to rate each dataset on ease of access, usefulness, and data utility, and to give it an overall ranking. Several of the new datasets have already received numerous votes. The Department of Veterans Affairs data on patient satisfaction with hospitals currently has 47 votes but low scores, while the Department of Homeland Security’s dataset on the Federal Emergency Management Agency’s disaster declarations has 10 votes and top marks in each category.

Overall, the effort is promising: if the government can push out this much data in 45 days, it should be able to accomplish a good deal more. It should be noted that most of the datasets are available only in raw formats, and some of the files are quite large, ranging up to several hundred megabytes, so the general public will find them of limited use. The hope is that public interest groups, reporters, academics, and others will review the information, build interfaces, and report on findings. As agencies move forward with this process, it will be important for them to identify the most important and useful datasets and develop their own interfaces to allow broader public review of the information. The administration’s ongoing dialogue with partner groups and the public will likely be key in identifying these top datasets.
