Record key information about Essential Shared Data Assets
Updated 26 April 2024
Using metadata to describe Essential Shared Data Assets (ESDAs) makes it easier to catalogue, validate, reuse and share your data.
In this context and for the purposes of specific ESDA guidance, a data asset is a container that holds one or more datasets. Datasets are individual, structured files containing data that are organised within data assets.
If you are responsible for data assets that are shared with another public sector organisation to deliver an essential purpose or process (that is, Essential Shared Data Assets), you should include information about your data using an agreed metadata standard.
This information is called metadata and the records that contain this information may be referred to as attributes or meta(data) elements. By doing this, you will:
- make your data searchable and easier for users to find
- make it easier for the data to be catalogued and validated
- ensure your data is accessible and reusable - your data is often reused even when you do not expect it to be
You should use the Data Catalogue Vocabulary (DCAT) to describe the metadata for your ESDAs if you create, maintain and share datasets with other organisations.
A dataset is defined in DCAT as: “A collection of data, published or curated by a single agent, and available for access or download in one or more representations.”
This guidance also applies to distributions, which are defined in DCAT as: “A specific representation of a dataset. A dataset might be available in multiple serialisations that may differ in various ways, including natural language, media-type or format, schematic organisation, temporal and spatial resolution, level of detail or profiles (which might specify any or all of the above).”
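As a rough illustration of the two DCAT definitions above, the sketch below models one dataset with two distributions as a JSON-LD style Python dictionary. The title and the choice of distributions are hypothetical, the helper name `media_types` is an assumption, and plain strings are used for the media types to keep the sketch short.

```python
import json

# Hypothetical record: one dataset, available in two representations.
# The prefixes point at the DCAT and Dublin Core Terms namespaces.
dataset = {
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dct": "http://purl.org/dc/terms/",
    },
    "@type": "dcat:Dataset",
    "dct:title": "Example building occupancy",  # hypothetical title
    "dcat:distribution": [
        {"@type": "dcat:Distribution", "dcat:mediaType": "text/csv"},
        {"@type": "dcat:Distribution", "dcat:mediaType": "application/pdf"},
    ],
}


def media_types(record: dict) -> list:
    """Return the media type of every distribution of a dataset record."""
    return [dist["dcat:mediaType"] for dist in record["dcat:distribution"]]


if __name__ == "__main__":
    print(json.dumps(dataset, indent=2))
```

The point of the structure is that the descriptive metadata sits on the dataset, while format-specific details sit on each distribution.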
If the data asset you create will be published as spreadsheets, CSV files or other tabular data, and you’re making your data open, you should also refer to the guide on publishing your tabular data. All CSV files should comply with the Tabular data standard.
Who should use this guidance
This guidance is part of a collection on open standards to assist those already working with metadata and you should adopt it when submitting your ESDAs for inclusion in the Government Data Marketplace.
You must follow this guidance if the data assets being shared have been designated as Essential Shared Data Assets (ESDAs), which refer to data collected, used and maintained to deliver public services (and any other purposes or processes defined in the underlying guidance on ESDAs). As best practice, use this guidance if you create any other type of data assets, unless the use of other metadata standards is more appropriate.
For example, if you are creating, maintaining or managing metadata for geospatial data (that which references data to a location on the surface of the Earth), you should instead use the metadata for spatial datasets, including those covered by the . You can also refer to the open standards profiles on ‘Exchange of location point’ and ‘Identifying property and street information’ for more details.
Using metadata in government
By following this guidance, you will use a consistent metadata vocabulary to describe Essential Shared Data Assets and improve data interoperability across government. This guidance is related to other Open Standards such as schema.org and Dublin Core, both recommended for government use.
Where to record and store your ESDA metadata
When recording metadata, it’s important to store it linked to, or with, the underlying data it’s describing. You can do this by storing metadata:
- within the dataset itself, or
- in a separate file, such as a readme file, keeping a record of the link between data and metadata, and
- in a Metadata Catalogue (if your organisation does not currently have one you should consider creating one to support your ESDA submissions)
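The second option above, a separate file that keeps the link to the data it describes, could look something like this minimal sketch. The `.metadata.json` naming convention, the `describes` key and the function name `write_sidecar` are all assumptions made for illustration, not part of any standard.

```python
import json
from pathlib import Path


def write_sidecar(data_path: Path, metadata: dict) -> Path:
    """Write metadata to a JSON file stored next to the data file.

    The sidecar records the data file's name under the (assumed) key
    'describes', so the link between data and metadata is kept.
    """
    sidecar = data_path.with_name(data_path.name + ".metadata.json")
    record = dict(metadata, describes=data_path.name)
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar
```

For a file called `occupancy.csv`, this writes `occupancy.csv.metadata.json` in the same folder.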
When publishing your data, you will need to consider:
- how the metadata may be harvested by data portals, such as the Government Data Marketplace
- how easy it will be for users to discover the data assets you intend to publish, and
- how it will enable both humans and machines to interpret the metadata, for example by avoiding the use of acronyms and domain jargon in the title
Read our guidance on ‘Publishing tabular data’ to understand more about how you publish metadata.
Making your ESDA metadata machine readable and accessible
To make metadata machine readable and accessible, you must format your metadata in a specific way.
When recording the metadata for your ESDAs, use plain English: make it specific, informative, clear and to the point, and follow the Writing for GOV.UK guide. For example, do not use jargon, and make sure you define technical terms and expand acronyms. Try to avoid using symbols that users or machines might misinterpret.
If you provide usage guidelines within the metadata and include links to content stored elsewhere (for example, URLs to ancillary documentation or web pages), make sure users outside your organisation can access them. Otherwise, remove the links and provide that information alongside your dataset.
It is important that all Mandatory attributes are completed when publishing Essential Shared Data Assets. For Recommended or Optional metadata attributes, if you do not have the information you need to record, you can still add the attribute, but enter “unknown” or “not applicable” as relevant, in preference to ‘null values’.
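The rule above can be sketched as a small check that rejects records with missing Mandatory attributes and fills empty Optional ones with “unknown” instead of a null value. The two attribute sets below are assumptions chosen for illustration only, so take the real lists from the ESDA guidance. The function name `complete_record` is also hypothetical.

```python
# Assumed attribute sets, for illustration only.
MANDATORY = {"title", "description", "identifier"}
OPTIONAL = {"alternativeTitle", "expires"}


def complete_record(record: dict) -> dict:
    """Return a copy of the record that is safe to publish.

    Raises ValueError if a mandatory attribute is missing or empty.
    Optional attributes with no value are set to 'unknown', not None.
    """
    missing = sorted(name for name in MANDATORY if not record.get(name))
    if missing:
        raise ValueError("missing mandatory attributes: " + ", ".join(missing))
    completed = dict(record)
    for name in OPTIONAL:
        if completed.get(name) in (None, ""):
            completed[name] = "unknown"
    return completed
```

Whether “unknown” or “not applicable” is the right filler depends on the attribute, so a fuller version would let the caller choose.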
Metadata you should record
When submitting your ESDAs, you should record all Mandatory information that will help others:
- be informed on where and when your data was collected - use ‘creator’ and ‘dateCreated’ to record who created the data and the date they created it
- find the data you’ve saved on a shared network, and identify whether it’s the data they need - use ‘title’, ‘description’ and ‘identifier’ to describe your data
- identify the version of the data you’ve collected - use ‘expires’ and ‘supersededBy’ so users know which version of your data to use
- understand the period and standards your data relates to - use ‘temporalCoverage’ to indicate the time period to which your data applies, and ‘conformsTo’ to tell users whether your file conforms to a specific standard or schema
- use the data you’ve collected appropriately - state the ‘accessRights’ and ‘securityClassification’ so users do not share sensitive data in ways they should not, and state the ‘license’ that applies to your data assets to help users understand their rights to use the data you’ve collected
Below are further examples of metadata attributes that need to be included. This is not a comprehensive list. You should follow the specific guidance for Essential Shared Data Assets and use the Metadata Exchange Model applicable to these. In the Metadata Exchange Model, these attributes are referred to as ‘properties’ and each contains a definition and usage notes, as well as some specific examples.
Recording time and dates in your metadata
Using ‘Created’
You should record the date when you create a dataset to help users of the dataset know whether it is valid and relevant to them. You must record any dates using the ISO 8601 standard, which is an Open Standard selected for use by the government.
For example, created:“2002-10-02”
You must capture the exact time a dataset is collected when you’re collecting more than one version of a dataset a day. This means listing the date and time elements in descending order of size (years, months, days, hours, minutes, seconds, milliseconds and microseconds). You should provide the right level of accuracy for your dataset.
For example, if you publish your dataset once a year, it might be enough to provide a date down to the day, for example, 2020-07-14. If you publish multiple times a day, it is better to include information down to the second, for example, 2020-07-14T12:57:03Z. Note that in the ISO 8601 standard, ‘Z’ specifically means UTC (often known as GMT in the UK). Make it clear when a time is not in British Summer Time even though the date falls in summer. The example above is a July date with a UTC time stamp, indicating data published shortly before 2pm (BST) on 14 July 2020.
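The two levels of precision, and the difference between UTC and British Summer Time, can be reproduced with Python's standard library. This is a minimal sketch: the variable names are illustrative, and BST is modelled as a fixed offset of UTC+1, which is only valid for summer dates.

```python
from datetime import date, datetime, timedelta, timezone

# Day-level precision, enough for a dataset published once a year.
day_precision = date(2020, 7, 14).isoformat()

# Second-level precision in UTC. The trailing 'Z' marks the time as UTC.
published = datetime(2020, 7, 14, 12, 57, 3, tzinfo=timezone.utc)
second_precision = published.strftime("%Y-%m-%dT%H:%M:%SZ")

# The same instant shown in British Summer Time (assumed fixed UTC+1).
BST = timezone(timedelta(hours=1), "BST")
local_text = published.astimezone(BST).isoformat()

if __name__ == "__main__":
    print(day_precision)     # 2020-07-14
    print(second_precision)  # 2020-07-14T12:57:03Z
    print(local_text)        # 2020-07-14T13:57:03+01:00
```

Both `12:57:03Z` and `13:57:03+01:00` describe the same moment. Recording the offset or the ‘Z’ explicitly is what removes the ambiguity.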
Record the provenance of your data
Using ‘creator’
You should record who created a dataset so users can communicate with the creator and understand if the data is relevant to them. For example, a data analyst may want to find out how reliable a dataset is before undertaking any analysis.
Record the name of the organisation, taken from the list of values associated with this attribute, for example, “Cabinet Office”.
Help users find, use and identify your dataset
Using ‘title’
You must include the name of your dataset so users can find and identify the right dataset.
You should try to ensure the name captures information that will help users determine whether the dataset meets their needs, for example by capturing the topic and specific information about place and geography.
For example, title:“Government Digital Services London Office staff building occupancy”.
To keep titles short yet meaningful, you could use ‘alternativeTitle’ or ‘description’ to state whether the dataset relates to “All London Offices” or a specific location, for example “Whitechapel”.
Using ‘description’
You must provide a description so that there is a rich, human readable explanation of the data asset, in addition to the title, so that users of your data can find out if it’s relevant to them.
The descriptions of your data should only describe the type of data collected and should not include warnings about how to use the data - any warnings should be explained within usage notes or by reference to other properties such as ‘accessRights’.
Using ‘identifier’
You should uniquely identify your dataset so that users of your data know exactly which source they’re using.
You should identify your data asset by:
- using the identification system your organisation is using (in cases where organisations have a system in place)
- using an opaque identifier you’ve created - this should be random numbers rather than sequential or semi-sequential numbers to avoid meaning being implied
Using a meaningless identifier avoids the misunderstandings that come from reading meaning into identifiers. For example, meaning can change over time, whereas a meaningless identifier can stay genuinely constant.
For example, identifier:“362857580”
You can ensure this meaningless identifier stays unique by keeping a catalogue of all datasets with their identifiers.
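One way to combine the two ideas above, random rather than sequential numbers and a catalogue that guards uniqueness, is sketched below. The function name `new_identifier`, the nine-digit length and the use of an in-memory set as the catalogue are assumptions for illustration. A real catalogue would be persistent.

```python
import secrets


def new_identifier(catalogue: set, digits: int = 9) -> str:
    """Return a random numeric identifier not already in the catalogue.

    The identifier is drawn at random, so it carries no implied meaning.
    It is added to the catalogue before being returned.
    """
    lowest = 10 ** (digits - 1)  # smallest number with this many digits
    while True:
        candidate = str(secrets.randbelow(9 * lowest) + lowest)
        if candidate not in catalogue:
            catalogue.add(candidate)
            return candidate
```

Calling `new_identifier(catalogue)` repeatedly with the same set never returns the same value twice, because each issued identifier is recorded before it is handed back.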
Using ‘mediaType’
In a distribution, you should record the file format or encoding method of the data asset being described, since many datasets are (or can be) published in multiple formats (mediaTypes). For example:
- CSV: text/csv
- Excel (.xlsx): application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
- Geopackage: application/geopackage+sqlite3
- HTML: text/html
- PDF: application/pdf
- Word (.docx): application/vnd.openxmlformats-officedocument.wordprocessingml.document
Use the media type which is most relevant to your dataset, for example so that browsers can decide how to present the underlying data. The media type is taken from the list of values defined by IANA.
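The list above can be turned into a simple lookup from file extension to media type. This is a minimal sketch: the mapping only covers the formats listed in this guidance, and the function name `media_type_for` is hypothetical.

```python
from pathlib import PurePath
from typing import Optional

# Media types for the formats listed in this guidance, keyed by extension.
MEDIA_TYPES = {
    ".csv": "text/csv",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    ".gpkg": "application/geopackage+sqlite3",
    ".html": "text/html",
    ".pdf": "application/pdf",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
}


def media_type_for(filename: str) -> Optional[str]:
    """Return the media type for a file name, or None if it is not listed."""
    suffix = PurePath(filename).suffix.lower()
    return MEDIA_TYPES.get(suffix)
```

An explicit table is used here rather than guessing from the operating system, so the result is the same on every machine.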
Make sure your data is used appropriately
Using ‘licence’
For protected data such as personal, sensitive or commercial data, you should record information that will help users of the data understand its terms and conditions.
You may want to include the relevant data-sharing agreement, legal regulation or certification. This could be a memorandum of understanding (MOU) or Data Protection Impact Assessment.
NOTE: The open standards vocabularies for schema.org, Dublin Core and DCAT spell the noun ‘licence’ using the American spelling ‘license’.
For example, licence:“Memorandum of Understanding between the Charity Commission for England and Wales and the Office for Students”
When publishing open data, you should label the data you’ve collected with its licence for use. In many cases within government, this will be the Open Government Licence. You should also link to the licence file to explain what the licence means and how others can use your code and content.
For example, licence:
Using ‘accessRights’
You should record the sensitivity of your data so it’s not shared or published in ways it should not be.
You should provide information about who should be able to access the data you’ve collected, and any restrictions including:
- whether it’s Open, Commercial or Internal
- the handling caveat for the data
- the security classification of data
For example, use ‘Internal’ for data with restricted access.
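A record can be checked against the three access levels named above before it is published. This is a sketch only: the function name `normalise_access_rights` is hypothetical, and the allowed values are limited to the three mentioned in this guidance.

```python
# The three access levels named in this guidance.
ACCESS_RIGHTS = ("Open", "Commercial", "Internal")


def normalise_access_rights(value: str) -> str:
    """Return the access level in its standard spelling.

    Matching ignores case and surrounding spaces. Any other value
    raises ValueError so that it cannot be published by mistake.
    """
    cleaned = value.strip().lower()
    for allowed in ACCESS_RIGHTS:
        if cleaned == allowed.lower():
            return allowed
    raise ValueError("accessRights must be one of: " + ", ".join(ACCESS_RIGHTS))
```

Rejecting unrecognised values, rather than defaulting to ‘Open’, errs on the side of not exposing sensitive data.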
Additional Information
Refer to the full set of attributes in the which has more comprehensive information and usage notes of all metadata requirements that apply to ESDAs.