General Questions

Q1. Why do I need to manage my data?

In recent years there has been a significant drive by research funders - and in some cases researchers themselves - to encourage greater openness with research data. This has been partly in response to concerns about the non-reproducibility of research and the potential for malpractice, but also in part to facilitate data reuse and aggregation. Another driver is the HKU Policy on the Management of Research Data & Records, which states that 'accurate and retrievable research data are an essential component of any research project and necessary to verify and defend, when required, the process and outcomes of research.'

In order to enable greater openness, research data needs to be discoverable, accessible, and described in such a way that it is intelligible to others. Where there is value in doing so, it also needs to be preserved and curated for the long term.

Besides benefiting the broader research community, data that is well managed is also likely to have more immediate benefits to the research group that created it and their collaborators. By storing, documenting, and preserving data efficiently, it should be easier for researchers to find the data they need when they need it.

If you are interested in learning more about the motivating factors behind the drive towards research data management, take a look at:

Q2. How much detail is required when documenting data?

There is a certain minimum amount of information about your data (metadata) that is required to ensure it can be properly cited: the names of those responsible for creating the data; a title; a publisher; and the publication year. This will generally be required during the data deposit process, along with information about how and why the data was generated. Most data repositories have a form to fill out when depositing data, and it is a good idea to see what information they ask for before you get too far into a project.
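As a rough illustration, the minimum citation metadata listed above can be held in a small record and checked for completeness before deposit. This is only a sketch: the class name, field names, citation layout, and the example values are hypothetical and do not reflect any particular repository's deposit form.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetCitation:
    """Minimum citation metadata: creators, title, publisher, publication year."""
    creators: List[str]
    title: str
    publisher: str
    year: int

    def missing_fields(self) -> List[str]:
        """Return the names of any required fields that are empty."""
        missing = []
        if not [c for c in self.creators if c.strip()]:
            missing.append("creators")
        if not self.title.strip():
            missing.append("title")
        if not self.publisher.strip():
            missing.append("publisher")
        if self.year <= 0:
            missing.append("year")
        return missing

    def cite(self) -> str:
        """Build a simple citation string (the layout is an illustrative assumption)."""
        missing = self.missing_fields()
        if missing:
            raise ValueError("Missing citation fields: " + ", ".join(missing))
        return f"{'; '.join(self.creators)} ({self.year}). {self.title}. {self.publisher}."


# Hypothetical placeholder values, for illustration only.
example = DatasetCitation(
    creators=["Chan, T. M.", "Lee, S."],
    title="Example survey responses",
    publisher="Example Repository",
    year=2024,
)
print(example.cite())
```

Checking for missing fields early, rather than at deposit time, fits the advice to look at a repository's form before getting too far into a project.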

It is always sensible to add some documentation to a dataset whilst you are still working on it - explain abbreviations, and add notes about data that seem odd or which may cause confusion to those not involved in generating the data. Not only will this assist researchers who may wish to look at the data in future, but it will also help you and other members of your team to understand it if you need to revisit it yourself in a year or two's time.

Q3. What is metadata?

In the context of research data management, 'metadata' is the contextual information about the data that will help others to find and understand it. This will usually include information such as 'who created the data', 'what is the data about', 'are there any restrictions regarding who can use the data and in what circumstances', and so forth. Different disciplines will generally find different metadata fields useful. Library catalogues are essentially catalogues of metadata.

In some disciplines research publications are often the richest source of information about how a particular dataset was derived, so it is important to link articles to data. In others it may be necessary to develop additional documentation about how the data was collected, organised and used.

Q4. Is it OK simply to instruct people who want to look at my data to call me or send an email?

One of the principles behind funder expectations is that funded research data is a public good produced in the public interest and should be made freely and openly available with as few restrictions as possible in a timely and responsible manner.

A simple direction to interested parties to "contact the author" would not normally be considered sufficient. Decisions about data archiving, preservation and possible future sharing of data need to be made.

Q5. Will the University offer help and support to meet this requirement?

The University will support researchers in meeting HKU and funder expectations or requirements via advice and guidance. It offers services and infrastructure to support various aspects of research data management.

Enquiries relating to research data management should be directed to researchdata@hku.hk, where they will be reviewed and addressed by the Scholarly Communications team of staff from the HKU Libraries.

Information about software, services, and good practice is available on the rest of this website.

Q6. The data upon which I based my analysis was not generated by me or my team. What should I do?

You do not need to deposit existing data or that belonging to a third party unless you have materially altered or added to it. If you have substantially altered the data - and this includes restructuring it so as to enable analysis - then whether you can or should deposit it will depend largely on the intellectual property (IP) rights vested in the data and the licence it was published under (if applicable). Unless it is clear that you have the right to publish the data in its modified form, email researchdata@hku.hk for further advice.

Q7. The data I'm producing needs to remain private. Does this mean requirements regarding archiving and sharing don't apply to me?

Funders expect data to be archived whenever possible. HKU asks that all data be archived. This does not necessarily mean it will always be shared publicly or without restrictions. When restrictions are necessary, it is recommended to archive the data in the HKU Scholars Hub as a "dark archive". In this case, please consider using an embargo period for storage in the Hub, after which the data may be made publicly accessible.

If your data must remain permanently restricted, please consider making two versions of the data: the original data, and an anonymised or redacted version that can be made publicly accessible. The Hub can store both versions, and allow or disallow access accordingly.
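One simple way to produce a redacted public copy alongside the original is to drop the columns that directly identify people. The sketch below assumes tabular CSV data; the function name, column names, and sample rows are hypothetical. Removing direct identifiers alone may not be enough to anonymise a dataset, so treat this as a starting point rather than a complete procedure.

```python
import csv
import io
from typing import Iterable, List


def redact_csv(source_text: str, columns_to_drop: Iterable[str]) -> str:
    """Return a copy of CSV text with the named columns removed."""
    drop = set(columns_to_drop)
    reader = csv.DictReader(io.StringIO(source_text))
    # Keep the remaining columns in their original order.
    kept: List[str] = [name for name in (reader.fieldnames or []) if name not in drop]
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=kept, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        # extrasaction="ignore" silently skips the dropped columns.
        writer.writerow(row)
    return out.getvalue()


# Hypothetical sample data, for illustration only.
original = (
    "name,email,age_group,score\n"
    "A. Person,a@example.org,30-39,7\n"
    "B. Person,b@example.org,40-49,5\n"
)
public_version = redact_csv(original, ["name", "email"])
print(public_version)
```

The original text is left unchanged, so both versions can be deposited with different access settings.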

The record for the data should indicate why access to the data is restricted or not possible. For example, Expectation vi from the EPSRC guidelines on archiving and sharing data states: "Where access to the data is restricted the published metadata should also give the reason and summarize the conditions which must be satisfied for access to be granted. For example 'commercially confidential' data, in which a business organization has a legitimate interest, might be made available to others subject to a suitable legally enforceable non-disclosure agreement."

If you believe that even publishing a record about your data would be problematic (e.g. it might expose matters of national security), please contact researchdata@hku.hk.


Software and computer code

Q8. If the 'result' in a paper is the demonstration of a novel piece of software, does the software count as data?

Probably not. If data has been generated as a result of running software code, then it may be helpful to provide a link to that code in the metadata, but the software itself would not constitute data in most situations.

Q9. If custom-written software is used to process the data, do instructions need to be provided on use of this software?

If the software is essential to validating the research findings then adequate information should be provided to enable its re-running by third parties. This may involve taking additional steps to preserve the software in addition to the data itself.


HKU Scholars Hub, and other archives

Q10. I have data that I need to deposit in an 'appropriate' data archive. How do I find such an archive?

An extensive directory of data repositories is available from re3data.org. These range from very generic commercially-provided repositories such as Figshare, to narrowly-defined subject-specific repositories. Generally speaking, it's better to use a subject-specific repository than a generic one, as they will have staff that understand the data and can help curate it properly as time passes. More options are shown here.

Some repositories request that the data they receive is accompanied by metadata (contextual information about the data) in a particular format, so it's worth getting in touch with appropriate repositories before you get too far into the data gathering process. It's much easier to document your data as you gather it rather than leaving it until the end of a project - and documenting data during a project can also help you and any collaborators find relevant information more quickly whilst you are still working on it. Feel free to discuss this with your Subject Librarian or arrange to meet with one of the RDM support team by emailing researchdata@hku.hk.

Unfortunately, subject-specific data repositories do not exist for many fields. If there is no appropriate disciplinary repository for your data, you can meet HKU and funder requirements by depositing it in HKU's institutional data repository: HKU DataHub.

Even if you deposit your data in an external data repository, you should still create a record for it in the HKU DataHub, so that the University can keep track of research outputs. This is increasingly expected by research funders, and can help with the assessment of impact. Dataset records published in DataHub can be hyperlinked to the repository hosting your data.

Q11. When should I deposit my data in the HKU DataHub, rather than another data repository?

HKU DataHub is the University of Hong Kong's institutional data repository. This does not, however, mean that all research data you wish to preserve should go into the DataHub. If there is a specialist data repository for your discipline, you should under normal circumstances deposit your research data there rather than in the DataHub. You can find out more information about specialist data repositories from re3data.org and here.

You should deposit your data in HKU DataHub if there is no more appropriate specialized data repository in your field. You should, however, create a metadata record for your data in HKU Scholars Hub even if you deposit the data itself elsewhere. This fulfills your HKU requirements, and helps the University know what the data is and where it is held in the event of an audit. It also improves the visibility of your data, as HKU DataHub item records are indexed by search engines such as Google Dataset Search.

You may have the option to deposit your data in another general data repository such as Dryad, Zenodo, or OSF, based on your individual needs. These may be convenient, but there are reasons why they are not advised as alternatives to specialist repositories or for data underlying published research conclusions:

  • DataHub is the institutional data repository that serves as a hub for research data and other scholarly outputs produced by HKU-affiliated researchers and students. Users may also establish a DataHub profile to showcase all of their published records.
  • DataHub is managed by the HKU Libraries and relevant support and advisory services are available from the Libraries.
  • DataHub, and to some degree specialist repositories, are likely to have better longevity than other free generalist and/or commercial repositories that are outside the control of the institution.
  • DataHub and specialist repositories are likely to be able to offer a better level of post-deposit curation in the future than free services (things like format migrations and integrity checking).
  • The Hub and (most) specialist repositories include a metadata review to ensure that minimum standards are met.

Finally, some journal publishers accept data deposits alongside the articles they publish. If depositing in a publisher's data repository, check that the terms and conditions meet your funder's minimum expectations, and then also create a metadata record in the Hub.

For further advice and options, please see the Deposit of Data page.

Q12. Can I use the HKU DataHub for data deposit only at the end of my project?

No, you do not have to wait until the end of your project to deposit your data in DataHub. You may upload the data under your DataHub account as private items while your research project is ongoing. You may also upload the files to a collaborative space, using the “Project” function, to manage your data with your collaborators without publishing the items. Even after your data are published on DataHub, you may update them. "Versioning" of data (i.e. version 1, version 2, etc.) is available on DataHub. Updating major metadata and elements of the item, such as title, authors and uploaded files, will lead to a new version with a versioned DOI. Please read this full list of elements that would trigger a new version for reference.

Some funders such as the EPSRC indicate that a record describing your research data should normally be made available within 12 months of the data being generated, even if access to the dataset itself is restricted.


How do I...?

Q13. I'm putting together a project bid and need to complete a data management plan. How do I go about doing this?

Most of the major funding bodies provide a data management plan (DMP) template as well as guidance for completing the plan. Before going any further, it's also worth visiting your funder's own website to ensure you are referring to the latest versions of the template and guidelines where available. The RDM Requirements section gives links for summary requirements of several funders.

The Data Management Plan page on this site gives further description and sources for a DMP.

If you would like more detailed advice as to what to include in a DMP, arrange to meet with one of the Library's RDM support team by writing to researchdata@hku.hk.

Q14. I need to include a Digital Object Identifier (DOI) for my data in my article submission. How do I get one?

Most data repositories will assign a unique identifier, most commonly a Digital Object Identifier, when you deposit your data with them.

If you deposit in HKU DataHub, a DOI can be automatically minted for your data upon publication. You may also wish to “reserve” a DOI for your dataset uploaded to DataHub before it is published, so that you can cite it in other locations before your data become accessible on DataHub.