Maintaining Open Datasets
Maintaining an open dataset involves managing the dataset's electronic data assets according to the protocols that have been defined for it. This can be simple or very complicated, depending on the size and complexity of the dataset.
The following are fundamental considerations when maintaining an open dataset:
- Organization - Determine a responsible entity that is suitable to act as steward of the dataset:
- Ideally, this entity should have a certain level of "ownership" and an organizational mission that aligns with maintaining the dataset, so that there is continued interest in serving in the maintenance role.
- The organization and technology platform may overlap, for example if a cross-jurisdictional nonprofit open data portal organization is formed to host data.
- The entity should have agreed in principle to publish the dataset in an open way, without barriers or strings attached, using suitable machine-readable formats.
- A suitable open data policy and suitable licenses should be used.
- Ideally funding is identified to help the entity perform maintenance. Although some volunteer effort may be appropriate, reliance on volunteer efforts can lead to gaps in maintenance.
- Establish protocols for submitting feedback and implement procedures to update the dataset accordingly.
- Technology Platform(s) - Select one or more technology platforms that are cost-effective, sustainable, and functional.
- Technologies are used to collect, process, store, and publish data.
- A platform can be simple (such as a file on a website) or complex (such as an open data portal), depending on requirements.
- The platform should support open data formats.
- Recognize that technology can be a limitation if it is costly or difficult to learn, use, and support; strive for simplicity.
- Data Format - Maintain and publish datasets in formats that can be sustained.
- For a simple dataset, an Excel workbook may be suitable for maintenance.
- For a complex dataset, a database may be required.
- The approach that is chosen should be readily supportable with available human resources.
- Data Elements - The dataset should contain elements that enable its use:
- Unique identifiers should be assigned to data records so that each record can be identified unambiguously and joined to other datasets.
- Descriptive names should be provided in data records.
- Use a simple data model and "flat" representation if possible.
- Metadata - Metadata for the dataset should be published:
- At a minimum, provide a table that lists the data elements and their descriptions.
- Use open metadata standards where possible.
- Publishing - Publish the dataset and metadata:
- Make the dataset and metadata easily accessible.
- Provide multiple formats to facilitate use.
- Minimize cost to use the data by selecting appropriate technologies.
- Include a license or terms of use, a disclaimer, and similar notices to define the terms for use and distribution.
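
Several of the considerations above (unique identifiers, a flat data model, a minimal data dictionary, and publishing in more than one format) can be illustrated with a short script. The following is a minimal sketch in Python using only the standard library, not a prescribed workflow. The station records, the field names (`station_id`, `station_name`, `elevation_ft`), and the helper functions are all hypothetical and exist only for illustration, and CSV and JSON are assumed here as examples of open, machine-readable formats.

```python
import csv
import io
import json

# Hypothetical flat records: each row has a unique identifier and a descriptive name.
RECORDS = [
    {"station_id": "ST-001", "station_name": "North Fork near Example Town", "elevation_ft": 5280},
    {"station_id": "ST-002", "station_name": "South Fork at Example Bridge", "elevation_ft": 4950},
]

# Hypothetical minimal data dictionary: one row per data element, with a description.
DATA_DICTIONARY = [
    {"element": "station_id", "description": "Unique identifier for the station"},
    {"element": "station_name", "description": "Descriptive name of the station"},
    {"element": "elevation_ft", "description": "Station elevation, in feet"},
]


def find_duplicate_ids(records, id_field):
    """Return the sorted identifier values that appear more than once."""
    seen, duplicates = set(), set()
    for record in records:
        value = record[id_field]
        if value in seen:
            duplicates.add(value)
        seen.add(value)
    return sorted(duplicates)


def undocumented_elements(records, dictionary):
    """Return sorted field names used in the records but missing from the dictionary."""
    documented = {row["element"] for row in dictionary}
    used = {field for record in records for field in record}
    return sorted(used - documented)


def join_on_id(left, right, id_field):
    """Inner-join two lists of flat records on a shared unique identifier."""
    right_by_id = {row[id_field]: row for row in right}
    return [
        {**row, **right_by_id[row[id_field]]}
        for row in left
        if row[id_field] in right_by_id
    ]


def to_csv(rows):
    """Serialize a non-empty list of flat dicts as CSV text (header taken from the first row)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Serialize the same rows as JSON text, as a second publishing format."""
    return json.dumps(rows, indent=2)


if __name__ == "__main__":
    # Basic maintenance checks before publishing.
    print("Duplicate identifiers:", find_duplicate_ids(RECORDS, "station_id"))
    print("Undocumented elements:", undocumented_elements(RECORDS, DATA_DICTIONARY))
    # The same data and its data dictionary, ready to publish in two formats.
    print(to_csv(RECORDS))
    print(to_json(RECORDS))
    print(to_csv(DATA_DICTIONARY))
```

Checks like these can be run each time the dataset is updated, before the files are published. Keeping the data flat means the same rows can be written to CSV and JSON without any restructuring, and the unique identifier is what allows `join_on_id` to combine the records with another dataset that uses the same identifier.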