Learn Open Data / Resources
The following are useful open data resources:
- Policies, Initiatives, and Guides
- Data Formats
- Software - Portals
- Software - Tools
- Software - Linux Command Line Shell
Policies, Initiatives, and Guides
- Open Data Handbook - guide, stories, resources
- Open Government Data (The Book)
- Sunlight Foundation Open Data Policy Guidelines - the source for policy guidance
- U.S. Data Federation - federal government open data initiatives, including Project Open Data
Data Formats
The following data formats are commonly used for open datasets:
- Tabular
  - Comma-separated-value (or other delimiter) - see OWF Learn CSV
- Spatial
  - Esri Shapefile
  - GeoJSON - see OWF Learn GeoJSON
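As a rough illustration of how the tabular and spatial formats differ in practice, the following minimal Python sketch reads a small CSV string and a small GeoJSON string using only the standard library. The sample data, column names (`station`, `flow_cfs`), and function names are hypothetical and made up for this example; real datasets will have their own layout.

```python
import csv
import io
import json

# Hypothetical sample data, made up for illustration only.
CSV_TEXT = "station,flow_cfs\nA1,12.5\nB2,7.0\n"
GEOJSON_TEXT = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [-105.0, 40.5]},
            "properties": {"station": "A1"},
        }
    ],
})


def read_csv_rows(text, delimiter=","):
    """Parse delimited text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def read_geojson_points(text):
    """Return (longitude, latitude, properties) for each Point feature."""
    collection = json.loads(text)
    points = []
    for feature in collection.get("features", []):
        geometry = feature.get("geometry") or {}
        if geometry.get("type") == "Point":
            # GeoJSON stores coordinates in longitude, latitude order.
            lon, lat = geometry["coordinates"][:2]
            points.append((lon, lat, feature.get("properties", {})))
    return points


if __name__ == "__main__":
    print(read_csv_rows(CSV_TEXT))
    print(read_geojson_points(GEOJSON_TEXT))
```

Note that the CSV reader returns every value as a string, so numeric columns need explicit conversion, whereas GeoJSON values keep their JSON types. Shapefiles are a binary format and would need a third-party library, which this sketch does not cover.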
Software - Portals
Open data portals provide searchable catalogs and access to datasets in various formats. The following are popular open data portal software platforms:
- CKAN (open source)
- Socrata (commercial)
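To give a sense of what a searchable catalog looks like to a program, here is a minimal Python sketch that builds a search request for CKAN's Action API (`package_search`) and pulls resource links out of a response. The portal address `https://portal.example.org` and the sample response are hypothetical, the response is trimmed to the few fields used here, and no network call is made. Socrata exposes a different API that this sketch does not cover.

```python
import json
from urllib.parse import urlencode


def build_search_url(portal_url, query, rows=5):
    """Build a CKAN Action API package_search URL for a portal base URL."""
    params = urlencode({"q": query, "rows": rows})
    return f"{portal_url.rstrip('/')}/api/3/action/package_search?{params}"


def list_resources(response_text):
    """Return (dataset title, resource format, resource URL) tuples
    from the JSON text of a package_search response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        return []
    found = []
    for dataset in payload["result"]["results"]:
        for resource in dataset.get("resources", []):
            found.append(
                (dataset.get("title"), resource.get("format"), resource.get("url"))
            )
    return found


# Hypothetical response, trimmed to the fields used above.
SAMPLE_RESPONSE = json.dumps({
    "success": True,
    "result": {
        "count": 1,
        "results": [
            {
                "title": "Stream Flow",
                "resources": [
                    {"format": "CSV", "url": "https://portal.example.org/flow.csv"}
                ],
            }
        ],
    },
})

if __name__ == "__main__":
    print(build_search_url("https://portal.example.org", "stream flow"))
    print(list_resources(SAMPLE_RESPONSE))
```

Separating URL construction from response parsing keeps the sketch runnable offline; fetching the URL is left to whichever download tool is preferred.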
Software - Tools
The following software tools are helpful for processing datasets:
- curl (Windows) - useful to automate data downloads (also available in Cygwin, Linux, MinGW)
- TSTool (Windows and Linux) - useful to automate time series and tabular dataset processing
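For readers who prefer a script over a command-line tool, the kind of download automation mentioned for curl can also be sketched with Python's standard library. This is an alternative illustration, not how curl or TSTool work internally; the function names are hypothetical, and the demonstration uses a local `file://` URL as a stand-in for a real dataset URL so that it runs without network access.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path


def download(url, destination):
    """Stream the content at url into destination and return the path."""
    destination = Path(destination)
    destination.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)
    return destination


def download_all(urls, folder):
    """Download each URL into folder, naming each file after the last
    path segment of its URL (assumes URLs without query strings)."""
    saved = []
    for url in urls:
        name = url.rstrip("/").rsplit("/", 1)[-1] or "download"
        saved.append(download(url, Path(folder) / name))
    return saved


if __name__ == "__main__":
    # A local file:// URL stands in for a real dataset URL (assumption).
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "sample.csv"
        source.write_text("station,flow_cfs\nA1,12.5\n")
        saved = download_all([source.as_uri()], Path(tmp) / "out")
        print(saved[0].read_text())
```

A scheduled job could call `download_all` with a list of dataset URLs to refresh local copies; error handling and retries are omitted to keep the sketch short.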
Software - Linux Command Line Shell
A Linux shell provides command-line access to powerful data processing tools. Such a shell is more powerful than the Windows command prompt and is portable across operating systems. The following are options for enabling a Linux shell; one may already be available, depending on what has been installed on the computer.
- Cygwin - a Linux environment on Windows (see OWF Learn Cygwin)
- Git Bash - installed with Git
  - see Git for Windows - this will start a download
  - git is available on Linux environments through normal install tools
- Windows Subsystem for Linux
  - Windows 10 now offers a Linux shell - see documentation to enable
- Linux
  - pick a version such as Ubuntu
  - can install on Windows using a virtual machine such as VirtualBox