Licensing data
When making data available to others outside the research team, you should observe two rules:
- Always make the data available under licence, so that it is clear to anyone wishing to access and use the data who owns them and on what terms they can be used;
- Make the data available under the most open licence possible, which allows the widest possible scope for re-use and redistribution.
Data should be made available under an open licence, unless there is good reason to license them on a more restrictive basis, for example, to prohibit commercial re-use of data in which a commercial partner has an interest.
What is a licence?
A licence is an official authorisation to make use of specified material. As well as telling users what they are and are not allowed to do with the material, a licence also provides protection to the creators and owners of intellectual property. An accompanying rights statement establishes legal ownership of the licensed item and asserts the right of its creator(s) to be recognised as such. The attribution condition that is common to many open licences is the legal basis of your right to be credited as the creator of the licensed material. Many licences also include formal disclaimers of liability for any harm or damage that may arise from someone else's use of the material.
Open licences
An open licence makes an item free for anyone to access, use, modify and share for any purpose. Examples of open licences include:
- Creative Commons licences for creative works (including research publications and datasets);
- Open Source licences for software source code;
- Licences for specific types of work, such as the Open Data Commons licences for databases;
- Government open data licences, such as the UK Open Government Licence for public sector materials;
- Public Domain Dedications, such as the Creative Commons CC0 Public Domain Dedication: strictly speaking, this is a rights waiver, not a licence, but it is generally considered a type of open licence.
The Creative Commons licence suite includes versions with Non-Commercial and No-Derivatives terms. These and any licences with similar terms are less open licences, because of the restrictions they place on re-use. But if material cannot be made available under a more open licence, it is still wise to publish under a standard licence. The Creative Commons Attribution-NonCommercial (CC BY-NC) licence still grants broad permission for use in research and teaching and other non-commercial activities.
The Open Definition provides a list of conformant open licences for creative works (including publications and datasets). The Open Source Initiative lists Open Source licences for software.
The University does not prescribe use of any particular open licences for data or software, as the most appropriate licence will depend on the nature of the material and related requirements.
Creative Commons Attribution (CC BY) is widely used for the licensing of datasets (as well as Open Access publications and other materials), and is a good choice that will suit most requirements. It is the default licence recommended by the University's Research Data Archive. Other licences may be used or preferred by some repositories. For example, by default NERC data centres release primary data from NERC-funded research under the Open Government Licence; the Dryad Digital Repository releases data only under the Creative Commons Zero Public Domain Dedication.
Licensing software
In most cases, short scripts and segments of code written to perform standard operations, e.g. for purposes of data processing, statistical analysis or data visualisation, can be archived alongside data, under the same licence as the dataset (for example, a Creative Commons Attribution licence). This approach is best suited to situations where the code is likely to have little independent use value to anyone else, and any re-use is likely to be solely for the purpose of validating results, e.g. by re-running analyses described in a paper.
Where re-use of source code in new contexts or further development is anticipated, for example if substantial original software has been developed, or source code has been written in the context of an ongoing project or established community, it will be appropriate to release the code under an Open Source licence.
There are a number of popular Open Source licences for software, which are listed by the Open Source Initiative, and there is a useful licence picker tool at choosealicense.com. Another useful resource, tl;drLegal, provides plain English summaries of many Open Source licences. For detailed guidance on software licensing, consult our Guide to publishing research software (PDF).
How to license data or software
A licence to make use of intellectual property is issued by or on behalf of the intellectual property rights-holders. The first thing therefore is to establish who owns the intellectual property, and your right or authorisation to issue the licence. The Intellectual property rights and research data web page provides guidance on identifying the rights-holders in data or software. Rights-holders are typically the University (for IP created by University employees), students (in the absence of any contract or assignment agreement indicating otherwise), or third parties involved in research, such as commercial partners, collaborator organisations, or research sponsors.
If the material has been created by multiple authors, or multiple parties have interests in it, you should ensure that any proposed release under a specific licence is agreed by all concerned beforehand, because a licence cannot be revoked once it has been applied to material. Where ownership of research data resides with the University, researchers are authorised under the Research Data Management Policy to make data and source code available under an open licence, provided no commercial, legal or ethical restrictions apply.
To license material, you should clearly mark it with both a rights statement and a licence statement. These combined statements make clear to any prospective users who is the owner of intellectual property rights in the licensed material, and the terms on which the material can be used.
The rights and licence statements should be included in the public information recorded about the material (such as a metadata record in a data repository, or the landing page of a software code repository), as well as in the material itself and/or in its primary documentation (such as a readme file or user manual). You do not necessarily have to mark all individual files with these statements, providing item-level statements are clearly visible. Licence statements should include the URL to the full legal code of the licence used (the URL can be embedded in text or a licence logo image).
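As a rough illustration of the marking requirement, the following Python sketch checks whether the text of a readme file appears to contain both a rights statement and a licence statement with a URL. The function name missing_statements and the two patterns are hypothetical heuristics introduced here for illustration; they are not a University tool and they will not recognise every valid wording.

```python
import re

# Hypothetical heuristics (assumptions, not an official rule):
# a rights statement begins with "©" or "Copyright", then a four-digit year
# and at least one word naming a rights-holder.
RIGHTS_PATTERN = re.compile(r"(©|\bCopyright\b)\s*\d{4}\s+\S+")

# A licence statement says "licensed under" (or "licenced under") and is
# followed somewhere later in the text by a URL to the licence's legal code.
LICENCE_PATTERN = re.compile(
    r"licen[cs]ed under\b.*?https?://\S+", re.IGNORECASE | re.DOTALL
)


def missing_statements(text):
    """Return a list naming the statements that could not be found in text."""
    missing = []
    if not RIGHTS_PATTERN.search(text):
        missing.append("rights statement")
    if not LICENCE_PATTERN.search(text):
        missing.append("licence statement with URL")
    return missing


readme = (
    "Example dataset\n\n"
    "© 2019 University of Reading. This dataset is licensed under a "
    "Creative Commons Attribution 4.0 International License: "
    "https://creativecommons.org/licenses/by/4.0/.\n"
)
print(missing_statements(readme))
```

An empty list means both statements were detected. A check like this can only flag obvious omissions; it cannot confirm that the rights-holders named are correct or complete.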
Most data repositories will include rights and licence statements in the metadata record for an item. A repository will usually enable you to specify rights and licence information when you deposit the dataset. The University's Research Data Archive provides a licence picker tool for uploaded files, with various standard licences and the option to upload your own licence. The licence information displays both in the file metadata and on the item record.
It is important that the rights statement identify all owners of intellectual property in the material. For example, the rights statement for a dataset created by a member of University staff jointly with student John Smith must identify the University and John Smith as rights-holders (assuming the student has not assigned his IP to any other party under contract).
Examples of combined rights and open licence statements are:
© 2019 University of Reading. This dataset is licensed under a Creative Commons Attribution 4.0 International License: https://creativecommons.org/licenses/by/4.0/.
Copyright 2019 University of Reading and John Smith. This software is licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0/.
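The example statements above follow a common pattern: a year, the rights-holders, the type of material, the licence name and the licence URL. A minimal Python sketch of assembling such a statement is shown here. The names Licence, join_holders and combined_statement are hypothetical helpers introduced for illustration, and the wording they produce is one reasonable form, not a prescribed template.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Licence:
    """A licence name paired with the URL of its full legal code."""
    name: str
    url: str


def join_holders(holders):
    """Join rights-holder names as 'A', 'A and B', or 'A, B and C'."""
    if not holders:
        raise ValueError("at least one rights-holder is required")
    if len(holders) == 1:
        return holders[0]
    return ", ".join(holders[:-1]) + " and " + holders[-1]


def combined_statement(year, holders, material, licence):
    """Build a rights statement followed by a licence statement."""
    rights = f"© {year} {join_holders(holders)}."
    terms = f"This {material} is licensed under {licence.name}: {licence.url}."
    return f"{rights} {terms}"


# Licence details copied from the dataset example above.
CC_BY_4 = Licence(
    name="a Creative Commons Attribution 4.0 International License",
    url="https://creativecommons.org/licenses/by/4.0/",
)

statement = combined_statement(2019, ["University of Reading"], "dataset", CC_BY_4)
print(statement)
```

Passing every rights-holder in the holders list, for example the University and a student co-owner, keeps the generated statement consistent with the requirement that all owners of intellectual property be identified.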