
Record information about data sets you share with others

Using metadata to make it easier to catalogue, validate, reuse and share your data.

When you create a spreadsheet, CSV file or other data in , you should create a record with information about your data and store it with your data. This information is called metadata. By doing this, you will:

  • make your data searchable
  • find it easier to catalogue and validate your data
  • make sure your data is accessible and reusable - often your data is reused even when you do not expect it to be

If you’re making your data open, refer to the guide on publishing your tabular data. All CSV files should comply with the Tabular data standard.

Who should use this guidance

Use this guidance if you are creating any data in tabular form that you intend to share. Data, in this instance, refers to data sets collected, used and maintained for analytics or for providing government services. It does not refer to finished documents.

You should use this guidance if your government organisation does not currently have metadata guidance for you to use. This guidance will become part of a collection to assist those already working with metadata.

Do not follow this guidance if you are creating, maintaining or managing metadata for geospatial data (data that references a location on the surface of the Earth). Instead, use metadata for spatial data sets, including those covered by the . You can also refer to the open standards profiles on ‘Exchange of location point’ and ‘Identifying property and street information’ for more details.

Using metadata in government

By following this guidance, you will be using a consistent metadata vocabulary, which will improve interoperability across government. The metadata vocabulary in this guidance uses the open standards schema.org and Dublin Core, which are both recommended for government use.

If you are intending to publish your data, you should also read ‘Publishing tabular data’.

Where to record and store your metadata

When recording metadata, it’s useful to store it close to, or with, the data it’s describing.

You can do this by storing metadata:

  • within the data spreadsheet, on a separate tab
  • in a separate file, such as a readme file, and keep a record showing the link between data and metadata
  • in a Metadata Catalogue if your government organisation has one
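If you keep metadata in a separate file, one option is to save a JSON file alongside the data under a predictable name. That makes the link between data and metadata obvious. The Python sketch below assumes a hypothetical naming convention in which `commuting.csv` is described by `commuting.metadata.json`. The helper names are also illustrative, so adapt them to any convention or catalogue your organisation already uses.

```python
import json
from pathlib import Path


def metadata_path_for(data_file: Path) -> Path:
    """Return the sidecar metadata path for a data file.

    Hypothetical convention: commuting.csv -> commuting.metadata.json,
    stored in the same folder as the data.
    """
    return data_file.with_name(data_file.stem + ".metadata.json")


def save_metadata(data_file: Path, metadata: dict) -> Path:
    """Write the metadata as UTF-8 JSON next to the data file it describes."""
    target = metadata_path_for(data_file)
    target.write_text(json.dumps(metadata, indent=2, ensure_ascii=False), encoding="utf-8")
    return target


def load_metadata(data_file: Path) -> dict:
    """Read back the metadata recorded for a data file."""
    return json.loads(metadata_path_for(data_file).read_text(encoding="utf-8"))


# Usage (writes commuting.metadata.json in the current folder):
# save_metadata(Path("commuting.csv"), {"name": "GDS London Office Employees office commuting tendencies"})
```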

When publishing your data, you will need to consider where you store your metadata, depending on the types of data you are publishing and how findable you want your metadata to be. Read our guidance on ‘Publishing tabular data’ to understand more about how to publish metadata.

Making metadata machine readable and accessible

To make metadata machine readable and accessible, you must format it in a specific way. For example, use camelCase for metadata terms. In camelCase, there are no spaces between words, and every word after the first begins with a capital letter, as in ‘dateCreated’ or ‘temporalCoverage’.
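If you are deriving term names from plain-English phrases, a small helper can apply camelCase consistently. The Python sketch below is illustrative. The function name `to_camel_case` is an assumption, and it treats any run of letters and digits as a word.

```python
import re


def to_camel_case(phrase: str) -> str:
    """Convert a phrase such as 'date created' into camelCase ('dateCreated')."""
    # Split on anything that is not a letter or digit (spaces, hyphens, underscores).
    words = re.findall(r"[A-Za-z0-9]+", phrase)
    if not words:
        return ""
    first, *rest = words
    # The first word is all lower case; each later word starts with a capital letter.
    return first.lower() + "".join(w[:1].upper() + w[1:].lower() for w in rest)


print(to_camel_case("temporal coverage"))  # temporalCoverage
```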

When recording your metadata, make sure you use plain English and follow the writing for GOV.UK guide. For example, do not use jargon, and make sure you define technical terms and expand acronyms. Try to avoid using symbols that users might misinterpret.

If you do not have the information you need to record, you can still add the metadata term, but record its value as “unknown” where relevant.

Metadata you should record

You should record information that will help others:

  • understand who created your data and when - use ‘creator’ and ‘dateCreated’ to record who created the data set and the date they created it

  • find the data you’ve saved on a shared network, and identify whether it’s the data set they need - use ‘name’, ‘description’ and ‘identifier’ to describe your data

  • validate the data you’ve collected - use ‘expires’ and ‘supersededBy’ so users know which version of your data to use, ‘temporalCoverage’ to indicate the time period to which your data applies, and ‘conformsTo’ to tell users whether your file conforms to a specific standard or schema

  • use the data you’ve collected appropriately - use ‘hasDigitalDocumentPermission’ to make sure users do not share sensitive data in ways they should not, and ‘license’ to help users understand their rights to use the data you’ve collected

  • understand the structure and format of your CSV tabular data - use the and read our guidance on ‘Publishing tabular data’ to get started
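To show how the terms listed above fit together, the Python sketch below builds a metadata record as a dictionary. It records “unknown” for any listed term you have no information for. The `build_record` helper and the use of a plain dictionary are assumptions for illustration. They are not part of schema.org, Dublin Core or any government tooling. Extra terms such as ‘contributor’ or ‘encodingFormat’ are kept as supplied.

```python
# Terms named in the list above, in the order they are introduced.
RECOMMENDED_TERMS = (
    "creator", "dateCreated",
    "name", "description", "identifier",
    "expires", "supersededBy", "temporalCoverage", "conformsTo",
    "hasDigitalDocumentPermission", "license",
)


def build_record(known: dict) -> dict:
    """Return a metadata record with every recommended term present.

    Terms missing from `known` are recorded as "unknown"; any extra
    terms supplied (for example 'contributor') are kept as they are.
    """
    record = {term: "unknown" for term in RECOMMENDED_TERMS}
    record.update(known)
    return record


record = build_record({
    "name": "GDS London Office Employees office commuting tendencies",
    "dateCreated": "2002-10-02",
})
```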

Try to avoid recording any metadata that includes personal data. If you include personal data, you will need to comply with the principles, rights and obligations contained in GDPR. You can read the for more information.

Recording dates in your metadata

You must record any dates using the ISO 8601 standard, which is an Open Standard selected for use by the government.

This means listing the date and time elements in descending order of size (years, months, days, hours, minutes, seconds, milliseconds and microseconds). You should provide the right level of accuracy for your data set. For example, if you publish your data set once a year, it might be enough to provide a date down to the day, for example, 2020-07-14. If you publish multiple times a day, it is better to include information down to the second, for example, 2020-07-14T12:57:03Z.
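As a minimal sketch of the two levels of accuracy described above, Python’s standard `datetime` module can produce both forms. The helper names are illustrative. The sketch assumes timestamps are converted to UTC and marked with ‘Z’, as in the example above. It rejects times with no time zone rather than guessing one.

```python
from datetime import date, datetime, timezone


def iso_date(d: date) -> str:
    """Date-only ISO 8601, for example 2020-07-14."""
    return d.isoformat()


def iso_timestamp(dt: datetime) -> str:
    """ISO 8601 timestamp in UTC, accurate to the second, with a 'Z' suffix."""
    if dt.tzinfo is None:
        raise ValueError("Give the time zone so the time can be converted to UTC")
    utc = dt.astimezone(timezone.utc).replace(microsecond=0)
    return utc.strftime("%Y-%m-%dT%H:%M:%SZ")


print(iso_date(date(2020, 7, 14)))                                   # 2020-07-14
print(iso_timestamp(datetime(2020, 7, 14, 12, 57, 3, tzinfo=timezone.utc)))  # 2020-07-14T12:57:03Z
```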

Record the provenance of your data

Using ‘creator’ or ‘contributor’

You should record who created a data set so users can communicate with the creator and understand if the data is relevant to them. For example, a data analyst may want to find out how reliable a data set is before undertaking any analysis.

Record a name for future reference, and an email address if possible. This name and email address should refer to:

  • the name of a team or organisation
  • a role within a team
  • an individual name in some cases - if you can do this while remaining GDPR compliant

For example, creator:“Data Standards Authority team [email protected]”

You can use ‘contributor’ instead if multiple organisations or teams are contributing to the data set. You can also use ‘creator’ and ‘contributor’ together to make it fully clear where the data has come from.

Using ‘dateCreated’

You should record the date when you create a data set to help users of the data set know whether it is valid and relevant to them. You must record the date using the Open Standard ISO 8601.

For example, dateCreated:“2002-10-02”

You must capture the exact time a data set is collected if you’re collecting more than one version of a data set a day.

Help users find, use and identify your data set

Using ‘name’

You must include the name of your data set so users can find and identify the right data set.

You should try to make sure the name captures information that will help users decide whether the data set meets their needs, such as the topic and any specific information about place and geography.

For example, name:“GDS London Office Employees office commuting tendencies”

Using ‘description’

You can add a description to your data set, in addition to its name, so that users of your data can find out if it’s relevant to them.

The description should only describe the type of data collected and should not include warnings about how to use the data. Explain any warnings using the term ‘accessRights’ instead.

For example, description:“The amount GDS employees commute to the office and their busiest times to travel. This data also shows the tendencies of GDS employees to work from home”

Using ‘identifier’

You should uniquely identify your data set so that users of your data know exactly which source they’re using.

You should identify your data set by:

  • using the identification system your organisation is using (in cases where organisations have a system in place)

  • using a meaningless identifier you’ve created - this should be a random number rather than a sequential or semi-sequential one, to avoid implying any meaning

Using a meaningless identifier avoids the misunderstandings that come from attaching meaning to identifiers. For example, the meaning behind an identifier can change over time, while a meaningless identifier can stay genuinely constant.

For example, identifier:“362857580”

You can ensure this meaningless identifier stays unique by keeping a catalogue of all data sets with their identifiers.
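The Python sketch below shows one way to generate a random, non-sequential identifier and check it against a catalogue. It assumes a hypothetical catalogue held as a set of existing identifiers and a 9-digit format like the example above. It uses the standard `secrets` module so that identifiers are not predictable from each other.

```python
import secrets


def new_identifier(catalogue: set, digits: int = 9) -> str:
    """Return a random identifier not already in the catalogue, and record it.

    `catalogue` is a hypothetical set of identifiers already in use.
    Identifiers have exactly `digits` digits and no leading zero.
    """
    low = 10 ** (digits - 1)
    while True:
        # randbelow(9 * low) + low falls between 100000000 and 999999999 for 9 digits.
        candidate = str(secrets.randbelow(9 * low) + low)
        if candidate not in catalogue:
            catalogue.add(candidate)
            return candidate


catalogue = {"362857580"}
identifier = new_identifier(catalogue)
```

Drawing at random rather than counting upwards means an identifier says nothing about when or in what order the data set was created.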

Using ‘encodingFormat’

You should record the file format in which you store your data so users know how to use and import it.

Operating systems commonly use file extensions to decide which program to open a file with. Common file extensions include XLS for Excel spreadsheets and CSV for comma-separated values files.

For example, encodingFormat:“xls”

If you think you may publish your data set, you can also record the media type. Browsers use media types to decide how to present some data. For more information, read ‘Publishing tabular data’.

A media type is also known as a Multipurpose Internet Mail Extensions (MIME) type. Mozilla .

For example, encodingFormat:“jpeg”
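If you record both the file extension and the media type, a small lookup keeps them consistent. The Python sketch below uses an illustrative, deliberately short mapping to registered media types. The mapping, the function name and the use of “unknown” for unlisted extensions are assumptions. Python’s standard `mimetypes` module is an alternative if you need wider coverage.

```python
from pathlib import Path

# Illustrative mapping from file extension to registered media type.
MEDIA_TYPES = {
    ".csv": "text/csv",
    ".xls": "application/vnd.ms-excel",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    ".json": "application/json",
}


def encoding_format(filename: str) -> tuple:
    """Return (file extension, media type) for a file name.

    Extensions not in the mapping get the media type "unknown".
    """
    suffix = Path(filename).suffix.lower()
    return suffix.lstrip("."), MEDIA_TYPES.get(suffix, "unknown")


print(encoding_format("Commuting.XLS"))  # ('xls', 'application/vnd.ms-excel')
```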

Help others validate your data

Using ‘supersededBy’

When the data you’re collecting replaces an older version, you should record this change so that users use the most up-to-date version.

You must only use ‘supersededBy’ when the data you’re collecting has:

  • the same period of time and location as the older version of the spreadsheet or file

  • different content to the older version of the spreadsheet or file

The new version of the data will need its own unique URL or identifier. For example:

supersededBy:“/government/organisations/government-digital-service/about/v1”

You may also choose ‘isRelatedTo’, a more generic term that can describe any kind of relationship between resources.

Using ‘supersedes’

You can use ‘supersedes’ as an additional property, as this will allow users to understand the history or timeline of a data set.

For example, supersedes:“/government/organisations/government-digital-service/about/v1”
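‘supersededBy’ and ‘supersedes’ point in opposite directions. The old record names its replacement, and the new record names what it replaces. The Python sketch below sets both links at once and refuses to link two records that share an identifier. The `supersede` helper and the ‘v2’ path in the usage lines are hypothetical.

```python
def supersede(old: dict, new: dict) -> None:
    """Link an older metadata record and its replacement in both directions."""
    if old["identifier"] == new["identifier"]:
        raise ValueError("The new version needs its own unique URL or identifier")
    old["supersededBy"] = new["identifier"]  # the old record points forward
    new["supersedes"] = old["identifier"]    # the new record points back


old = {"identifier": "/government/organisations/government-digital-service/about/v1"}
new = {"identifier": "/government/organisations/government-digital-service/about/v2"}  # hypothetical
supersede(old, new)
```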

Using ‘expires’

If you’re no longer using a particular data set, or it has been superseded by other data, you should record it as expired. You can do this by adding the date from which your data set is no longer valid. You should give any replacement data a new name and identifier.

When you become aware that a data set should no longer be used, remember to revisit it and update its metadata.

For example, expires:“2003-12-04”

Do not use ‘expires’ to record the period of time that your data applies to. In these cases, use ‘temporalCoverage’ instead.
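Because ‘expires’ is recorded as an ISO 8601 date, tools can check it directly. The Python sketch below treats a data set as expired on or after its ‘expires’ date. It assumes a record with no ‘expires’ value, or a value of “unknown”, has not expired. Both the rule and the helper name are illustrative choices.

```python
from datetime import date


def is_expired(record: dict, today: date) -> bool:
    """Return True if the record's 'expires' date is today or earlier."""
    expires = record.get("expires")
    if not expires or expires == "unknown":
        return False  # assumption: no recorded expiry means still valid
    return date.fromisoformat(expires) <= today


print(is_expired({"expires": "2003-12-04"}, date(2020, 7, 14)))  # True
```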

Using ‘temporalCoverage’

If you’re collecting data over a range of dates, you should record this so users know the period that the content applies to. You should record this range using the Open Standard ISO 8601.

For example, temporalCoverage:“2002-10-02/2013-01-01”

If your data does not have a specified end date, you can use “..” in place of the end date. This follows .

For example, temporalCoverage:“2020-10-02/..”
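The Python sketch below reads a ‘temporalCoverage’ value in the two forms shown above and checks whether a given day falls inside it. It only handles date-only ranges written as “start/end” or “start/..”. Full ISO 8601 intervals with times or durations are out of scope, and the helper names are hypothetical.

```python
from datetime import date
from typing import Optional, Tuple


def parse_coverage(value: str) -> Tuple[date, Optional[date]]:
    """Split 'start/end' into dates; an end of '..' means no end date."""
    start_text, end_text = value.split("/")  # raises ValueError if not exactly two parts
    start = date.fromisoformat(start_text)
    end = None if end_text == ".." else date.fromisoformat(end_text)
    return start, end


def covers(value: str, day: date) -> bool:
    """Return True if `day` falls within the coverage range, inclusive."""
    start, end = parse_coverage(value)
    return start <= day and (end is None or day <= end)


print(covers("2020-10-02/..", date(2030, 1, 1)))  # True
```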

Using ‘conformsTo’

You should tell users whether your file conforms to a specific standard or schema so they can easily validate it. This could be the CSV on the Web schema or the RFC 4180 standard.

For example, conformsTo:“”

Your department, agency or local authority might also use a particular schema for specific types of data collection, and you may want to record this. For example, the data standards for publishing brownfield land registers.

Make sure your data is used appropriately

Using ‘license’

For protected data such as personal, sensitive or commercial data, you should record information that will help users of the data understand its terms and conditions.

You may want to include the relevant data sharing agreement, legal regulation or certification. This could be a memorandum of understanding (MOU) or Data Protection Impact Assessment.

The open standards vocabularies schema.org and Dublin Core both use the American spelling ‘license’ for the noun ‘licence’. You should use ‘license’ for consistency.

For example, license:“Memorandum of Understanding between the Charity Commission for England and Wales and the Office for Students”

When publishing open data, you should label the data you’ve collected with its licence for use. In many cases within government, this will be the . You should also link to the licence file to explain what the licence means and how others can use your data.

For example, license:

Using ‘hasDigitalDocumentPermission’

You should record the sensitivity of your data so it’s not shared or published in ways it should not be.

You should provide information about who should be able to access the data you’ve collected, and any restrictions, including:

  • whether it’s open or restricted/protected

  • the handling caveat for the data

  • the security classification of data

For example, hasDigitalDocumentPermission:“restricted access”

Updates to this page

Published 7 August 2020
