
Learn Open Data / Open Data Overview

Definition of Open Data

The following is a useful general definition of open data (see the chapter "Generating Economic Value Through Open Data" in "Beyond Transparency").

  • Accessible to all - the data become accessible outside of the organization that generated or collected it
  • Machine-readable - data must be usable, which means it must be made available in formats that are easily used by third-party applications
  • Free - the data can be accessed at zero or low cost
  • Unrestricted rights to use - data that is unencumbered by contractual or other restrictions leads to the maximum potential for innovation
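
As a minimal sketch of what "machine-readable" means in practice, the Python example below reads a small comma-separated (CSV) table with the standard library and summarizes it without any manual transcription. The dataset, its column names (station_id, date, flow_cfs), and the function name are hypothetical and are used only for illustration; they do not come from any particular open data publisher.

```python
import csv
import io

# Hypothetical sample of a published dataset. The column names and values
# are invented for illustration only.
SAMPLE_CSV = """station_id,date,flow_cfs
A001,2020-01-01,12.5
A001,2020-01-02,13.0
B002,2020-01-01,7.25
"""


def mean_flow_by_station(csv_text):
    """Return the mean of the flow_cfs column for each station_id."""
    totals = {}
    counts = {}
    # DictReader uses the header row to name each column, so a third-party
    # program can look up values by name instead of by position.
    for row in csv.DictReader(io.StringIO(csv_text)):
        station = row["station_id"]
        totals[station] = totals.get(station, 0.0) + float(row["flow_cfs"])
        counts[station] = counts.get(station, 0) + 1
    return {station: totals[station] / counts[station] for station in totals}


if __name__ == "__main__":
    print(mean_flow_by_station(SAMPLE_CSV))
```

The same values published only as a scanned report or an image would first have to be transcribed by hand, which is the kind of effort that a machine-readable format avoids.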

Why Open Data?

The following are reasons for open data:

  1. Society/citizens have paid for the data so it should be made public.
  2. Good government is based on transparency and open data helps with transparency.
  3. Transparency leads to better government (assuming that the data are not gamed and feedback results in positive change).
  4. Open data breaks down data silos - data will actually get used instead of just being filed away somewhere.
  5. Open data leads to open innovation - resources can be spent on using data for research and innovation rather than finding and transcribing it.
  6. Open data leads to more/better communication - multiple entities see the same data.
  7. Open data can reduce costs because it is easier to find and access data.
  8. Open data facilitates education and comprehension - we can see the data behind the narrative.
  9. More access to more data allows more people to find more solutions for the many problems we face.
  10. The next generation will include more data scientists and we need to encourage science, technology, engineering, and math careers.

Concerns about Open Data

The following are common concerns about open data, with counter-points provided. The word "my" is used to indicate the data publisher, for example a government entity.

  1. I don't know how the data will be used.
    • That is correct - transparency and openness lead to free access to data. Once the decision has been made to make data openly available, limits cannot be placed on who gets to use the data.
  2. People won't understand the data.
    • This can be improved by providing good metadata, documentation, and data annotations.
    • If other people can't understand the data, how well do the maintainers understand it?
    • Neutral third parties such as nonprofits can help provide a lens through which to view the data.
  3. My data will be used against me.
    • If this occurs, does that mean you are doing something wrong?
    • Public data should support policy, law, etc.
    • Mitigate this concern with good documentation.
  4. Someone will sell my data and make money.
    • Maybe, but if it is government data, the taxpayers already paid once. It is really only an issue if the third party makes money and citizens don't have access to the data they paid to develop (and the freedom to use it themselves).
    • Also, it is not easy to resell something that is free (consider weather data, Google traffic, etc.).
    • Do a good job publishing the data and many consumers may prefer the free original source.
    • Allowing innovation and new economic growth is good, isn't it?
  5. The data have quality issues.
    • Probably, but all data have quality issues. Use a disclaimer.
    • Alternatively, publishing data should raise quality as issues are identified and addressed.
  6. We must protect privacy.
    • Agreed, so publish data to a reasonable level using best practices (see the Sunlight Foundation Guidelines).
    • In contrast, look at how much information people provide on social media, and note that government staff salaries and contact information are often published. Government must be conservative when publishing private information, but this should not be used as a blanket reason to prohibit all open data on a topic.
  7. It will take resources to publish data.
    • Agreed, but in many cases electronic forms of the data already exist and the incremental effort to publish open data should be small.
    • Technologies exist to simplify publishing the data.
    • Knowing that data will be published can lead to process improvements and higher quality outputs.
    • Publishing the data may actually reduce costs in other areas, such as data sharing within an organization.
    • Publishing data as a matter of policy may avoid costly open records requests.
  8. Once people start using the data, we'll have to maintain the data and that will require resources.
    • Agreed...is that bad? Why is the data not already being maintained?
    • Using open data in government and business processes will help improve processes, which can reduce costs.