Secondary data
A data management plan may also need to consider the use of secondary data in research. Secondary data are intellectual property belonging to another party, so they may only be used with that party's permission and subject to any terms specified by the provider. You should investigate any secondary data sources that will be essential to your research in advance, so that you are confident they can be used for the purposes anticipated.
You will, of course, also need to consider issues of data management, as you would for primary data, such as: where the data will be stored; how they will be kept secure, particularly if they contain confidential or sensitive information; and how the data will be processed, for example if data are being combined from multiple sources, used as inputs into modelling activities, or transformed in any way as part of the research.
It is recommended that as part of your data management planning you prepare a list of the key data sources you will use in your research, with full references by DOI or other persistent identifier where possible. For each data source, record the terms of use, and whether the data will be consulted only, or will be incorporated into data outputs intended for distribution in support of project findings.
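As a rough illustration, the list of data sources described above could be kept as a small structured register. The sketch below is one possible layout, not a prescribed format: the `DataSource` class, its field names, the `register_to_csv` helper and the example entries (including the identifiers) are all hypothetical placeholders.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class DataSource:
    """One entry in a register of secondary data sources (hypothetical layout)."""
    title: str
    identifier: str  # DOI or other persistent identifier, where available
    provider: str
    terms_of_use: str  # licence name, or a pointer to the written terms
    use: str  # "consulted" or "incorporated" into distributable outputs

    def __post_init__(self):
        allowed = ("consulted", "incorporated")
        if self.use not in allowed:
            raise ValueError(f"use must be one of {allowed}, got {self.use!r}")


def register_to_csv(sources):
    """Serialise the register to CSV text, one row per data source."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(DataSource)])
    writer.writeheader()
    for source in sources:
        writer.writerow(asdict(source))
    return buffer.getvalue()


# Placeholder entries only: these are not real datasets or identifiers.
sources = [
    DataSource(
        title="Example survey dataset",
        identifier="doi:10.xxxx/placeholder-1",
        provider="Example data repository",
        terms_of_use="CC BY 4.0",
        use="incorporated",
    ),
    DataSource(
        title="Example secure-access microdata",
        identifier="doi:10.xxxx/placeholder-2",
        provider="Example data service",
        terms_of_use="Data use agreement (signed copy held on file)",
        use="consulted",
    ),
]

print(register_to_csv(sources))
```

Keeping the register in a plain format such as CSV means it can be attached to the data management plan and updated as sources are added or their terms change.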
Below are some of the things you should consider.
Are the data accessible from a reliable source?
Where possible, use publicly accessible, archived data sources, such as a data repository or data centre, a data provider's website, or a document archive, so that sources can be formally cited in your research.
Will the data provider give you permission to use the data in the research?
You should check the data source for a licence or terms of use. If you cannot find any information, contact the data provider to obtain their terms of use in writing.
Many research and public sector datasets available from public data repositories and government providers are published under an open licence or a licence with broad re-use permissions, such as the Creative Commons Attribution licence or the Open Government Licence (for public sector information). These allow data to be re-used, modified and redistributed for any purpose; often the only requirement will be that the source is acknowledged in any published re-use.
Some research and public sector data may be published under licences that place restrictions on commercial re-use and redistribution (such as the Creative Commons Attribution-NonCommercial licence). Data containing confidential information may be supplied under special licences that require compliance with confidentiality and information security requirements (see below).
Data products provided by commercial organisations are likely to be supplied for a fee under proprietary licences that prohibit the distribution of copies or derived information.
Are there any conditions attached to use of the data?
Any data not supplied under an open licence may have special conditions of use that will restrict what can be done with the data and any derived information, and may require organisational signature of a licence or data use agreement before data can be used.
Data that contain confidential information, such as national census microdata provided by the UK Data Service, may be supplied only once you and the University have signed a data use agreement containing confidentiality clauses, and demonstrated that you will meet any information security conditions.
For example, in order to be able to consult Office for National Statistics (ONS) data held in the UK Data Service Secure Lab you would need to:
- complete the Safe Researcher training course to qualify as an ONS Accredited Researcher;
- arrange organisational signature of the Secure Access User Agreement. You would need to contact your School's Contracts Manager to have the agreement reviewed and signed;
- complete any additional documentation specific to the data collection(s) you wish to consult;
- either attend the Secure Lab in person at the UK Data Archive, or, if you wish to access the data remotely from on campus, ensure the PC you use is: located in a room that is not accessible to the public and is locked when unattended; protected by a password lock with a five-minute timeout; and assigned a static routable IP address. You should contact your School's IT Business Partner to arrange for assignment of a static IP address to your designated PC;
- submit any derived datasets to the data provider for clearance before publication.
Can you distribute the data in part or whole or any derived datasets?
If secondary data will be incorporated into new datasets, or used to derive new datasets, you should ideally be able to distribute the modified or derived data in support of your research outputs, for example by depositing the modified or derived dataset in a data repository.
If data have been supplied under an open licence or a standard licence for research and public sector data, then redistribution in whole or part will usually be possible. The main exception is a licence with a No Derivatives clause (e.g. Creative Commons Attribution-NoDerivatives 4.0 International), which allows unmodified copies to be shared but does not allow distribution of modified or derived versions. Beware also of other standard licence restrictions (e.g. Non-Commercial or Share-Alike clauses) that may dictate the terms on which a modified dataset can be shared.
Commercial data providers are unlikely to permit distribution of modified or derived datasets. Data supplied under a confidentiality agreement will not be distributable, and publication of derived data may also be prohibited, or permitted only after the derived dataset has been vetted by the data provider.
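The considerations above can be summarised as a simple decision sketch. The code below is illustrative only and is no substitute for reading the actual licence or agreement: the `LicenceTerms` flags, the `KNOWN_LICENCES` table and the `derived_data_notes` function are hypothetical names introduced here, and the table covers only a few Creative Commons licences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenceTerms:
    """Flags describing the conditions attached to a data source (hypothetical)."""
    attribution: bool = False
    non_commercial: bool = False
    no_derivatives: bool = False
    share_alike: bool = False
    proprietary: bool = False   # e.g. a commercial data product
    confidential: bool = False  # supplied under a confidentiality agreement


# Illustrative lookup table; extend it with the licences you actually encounter.
KNOWN_LICENCES = {
    "CC BY 4.0": LicenceTerms(attribution=True),
    "CC BY-NC 4.0": LicenceTerms(attribution=True, non_commercial=True),
    "CC BY-ND 4.0": LicenceTerms(attribution=True, no_derivatives=True),
    "CC BY-SA 4.0": LicenceTerms(attribution=True, share_alike=True),
}


def derived_data_notes(terms):
    """Return (may_share_derived_data, notes) as a first-pass assessment."""
    if terms.confidential:
        return False, ["Confidential data: not distributable; derived data may "
                       "need clearance from the provider before publication."]
    if terms.proprietary:
        return False, ["Proprietary licence: distribution of modified or "
                       "derived datasets is unlikely to be permitted."]
    if terms.no_derivatives:
        return False, ["No Derivatives clause: modified or derived versions "
                       "may not be distributed."]
    notes = []
    if terms.attribution:
        notes.append("Acknowledge the source in any published re-use.")
    if terms.non_commercial:
        notes.append("Non-Commercial clause: the derived dataset may not be "
                     "shared for commercial use.")
    if terms.share_alike:
        notes.append("Share-Alike clause: the derived dataset must be shared "
                     "under the same or a compatible licence.")
    return True, notes


for name, terms in KNOWN_LICENCES.items():
    allowed, notes = derived_data_notes(terms)
    print(name, "->", "shareable" if allowed else "not shareable", notes)
```

A first-pass check of this kind is most useful when recorded alongside each entry in your list of data sources, so that any restriction on sharing derived data is identified before the research begins.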