
November 27, 2007


Listed below are links to weblogs that reference The value of already public data:

Comments

mamacate

While I agree with you that the availability of federal data on the web is terrible (I spent an hour trying to walk an office assistant through finding our OWN data on the IPEDS PAS website, and she's darned smart but the interface is utterly baffling), I think that any responsible provision of data is in fact "hard" or at least incredibly time-consuming. Data have to be described and documented in order to be meaningful, so a giant file of census data on the web would need to have reams of documentation associated with it. Ideally, the enormous data sets would be broken down into subsets of interest. These are available in a variety of ways, but the fact is that you really *do* need to understand what you're looking for, and "fishing expeditions" (even when they're off tips of icebergs) are at great risk of terrible misuse and misinterpretation of data. I'm all for making public data public, but I'd like to ask the federal government to do so in a meaningful and responsible way, and I don't think that means slapping ASCII files up on a server without documentation. If data availability were the main issue, we'd be done: it's interpretation and management of data that are the real challenges.

Sara Wood

That is an excellent point, thank you. The issue of data curation is crucial to the understanding and use of data. Furthermore, data curation is time-consuming and expensive. Academics have been looking into this issue for years, and governments could learn from their lead. Have a look at some of the work being done at places like ICPSR at the University of Michigan ( http://www.icpsr.umich.edu/ ), the Institute for Quantitative Social Science at Harvard ( http://www.iq.harvard.edu/ ) or the Data Enclave at NORC at the University of Chicago ( http://www.norc.org ).

Matt

It seems as though the problem is a separation of authority from technical knowledge. The people with authority don't have technical knowledge (Marybeth Peters, the US Register of Copyrights, is a self-proclaimed Luddite) and assume it's not possible (and/or misinterpret regulations). Then, the people with knowledge are either not in the government or are lazy and tell their bosses that it can't be done.

Those with the motivation and skill to rise to a position of authority who also possess technical knowledge may tend toward the private sector, where they can, you know, make money.

Jon Koomey

I agree strongly with mamacate's comment above. Making those data available with machine-readable metadata is not a trivial task, even if there were a standardized metadata format that could simply be applied to those data. The Umich link above seems to describe one effort to standardize on such a format (the Data Documentation Initiative).


Aside: When I clicked on the links above for www.iq.harvard.edu and www.norc.org, I found that they are bad links. I think these are just typos or stray characters that have corrupted the links.

Joe Hellerstein

There is a lot of historical census data from the IPUMS project at U. Minnesota, http://usa.ipums.org/usa. And yes, understanding it and interpreting it is complex. But still, having it there is way better than not.

From their description:

"What is IPUMS?

The Integrated Public Use Microdata Series (IPUMS) consists of thirty-nine high-precision samples of the American population drawn from fifteen federal censuses and from the American Community Surveys of 2000-2006. Some of these samples have existed for years, and others were created specifically for this database. The thirty-nine samples, which draw on every surviving census from 1850-2000, and the 2000-2006 ACS samples, collectively comprise our richest source of quantitative information on long-term changes in the American population. However, because different investigators created these samples at different times, they employed a wide variety of record layouts, coding schemes, and documentation. This has complicated efforts to use them to study change over time. The IPUMS assigns uniform codes across all the samples and brings relevant documentation into a coherent form to facilitate analysis of social and economic change."
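
To make the "uniform codes" idea concrete, here is a rough sketch of what recoding one variable across two samples might look like. Everything in it is invented for illustration: the sample names, the marital-status codes, and the `harmonize` helper are assumptions, not the actual IPUMS coding scheme or tooling.

```python
# Hypothetical per-sample coding schemes for a single variable.
# These codes are made up for illustration; they are NOT real IPUMS codes.
SAMPLE_CODEBOOKS = {
    "sample_a": {"M": "married", "S": "single", "W": "widowed"},
    "sample_b": {1: "single", 2: "married", 3: "widowed"},
}

# Hypothetical uniform scheme that every sample is recoded into.
UNIFORM_CODES = {"married": 1, "single": 2, "widowed": 3}


def harmonize(sample, raw_code):
    """Translate a sample-specific code into the uniform code.

    Raises KeyError if the sample or the raw code is undocumented, which is
    the point: without a codebook the value cannot be interpreted.
    """
    label = SAMPLE_CODEBOOKS[sample][raw_code]
    return UNIFORM_CODES[label]


# "M" in sample_a and 2 in sample_b both mean "married",
# so both map to the same uniform code.
assert harmonize("sample_a", "M") == harmonize("sample_b", 2)
```

The lookup itself is trivial; the expensive part is writing and checking the codebooks for every sample, which is the documentation work the earlier comments describe.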

Bill

"Can't you put everything onto an FTP server somewhere and I can harvest it all and ongoing as new data are available?"

You can get almost all of the census datasets for free here:

ftp://ftp2.census.gov/

http://www2.census.gov
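
For the "harvest it all and ongoing" part, a minimal Python sketch might look like the following. It assumes the server accepts anonymous FTP logins and that comparing file names is enough to spot new data; both are assumptions, and the helper names `list_remote` and `names_to_fetch` are hypothetical.

```python
from ftplib import FTP


def list_remote(host, directory="/"):
    """Return the entry names in one directory of an anonymous FTP server."""
    with FTP(host, timeout=30) as ftp:
        ftp.login()          # no arguments means anonymous login
        ftp.cwd(directory)
        return ftp.nlst()    # plain list of names in the current directory


def names_to_fetch(remote_names, already_fetched):
    """Names present on the server that have not been downloaded yet."""
    seen = set(already_fetched)
    return sorted(name for name in remote_names if name not in seen)


# Example run (needs network access, so it is left commented out):
# remote = list_remote("ftp2.census.gov")
# print(names_to_fetch(remote, already_fetched=[]))
```

Running this on a schedule and saving the list of fetched names between runs would cover the "ongoing" part. It compares names only, so a file that is revised in place under the same name would be missed.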
