Data storage tips and options:
A commonly recommended practice for backing up and storing your data is the 3-2-1 Rule, which says you should keep:
- 3 copies of your data
- on 2 types of storage media
- with 1 copy offsite
Having 1 copy offsite protects your data from local risks like theft, lab fires, flooding, or natural disasters. Using 2 storage media improves the likelihood that at least one version will be readable in the future should one media type become obsolete or degrade unexpectedly. Having 3 copies helps ensure that your data will exist somewhere without being overly redundant.
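As an illustration only, the 3-2-1 Rule can be written as a small check against a backup plan. This is a minimal sketch: the `Copy` class, the `check_3_2_1` function, and the example locations are hypothetical names made up for this example, and whether a given location counts as offsite is something you would need to confirm for your own setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of a dataset (all values used below are illustrative)."""
    location: str  # where the copy lives, e.g. "personal laptop"
    medium: str    # type of storage media, e.g. "external drive"
    offsite: bool  # True if stored away from the primary work site


def check_3_2_1(copies: list[Copy]) -> list[str]:
    """Return a list of problems; an empty list means the plan meets 3-2-1."""
    problems = []
    if len(copies) < 3:
        problems.append(f"only {len(copies)} copies; need at least 3")
    media = {c.medium for c in copies}
    if len(media) < 2:
        problems.append(f"only {len(media)} type(s) of storage media; need at least 2")
    if not any(c.offsite for c in copies):
        problems.append("no offsite copy; need at least 1")
    return problems


# Hypothetical plan: treating cloud storage as the offsite copy is an assumption.
plan = [
    Copy("personal laptop", "internal disk", offsite=False),
    Copy("external drive kept in the lab", "external drive", offsite=False),
    Copy("cloud storage", "cloud", offsite=True),
]
print(check_3_2_1(plan) or "plan satisfies the 3-2-1 rule")
```

A plan with a single copy on a laptop would fail all three checks, which is a quick way to see what is missing.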
Types of commonly used storage devices/services:
- Networked drives
- Personal laptops/computers
- External storage devices
- Cloud storage (Dropbox, Google Drive, OneDrive)
- Carleton University ShareFile
While many cloud-based data storage options are secure, researchers should be cautious when using them. One consideration with commercial cloud services (e.g., Dropbox or Google Drive) is whether the data is stored in a Canadian data centre, as provincial privacy legislation may prohibit storing data that contains personal information outside Canada. Services such as networked drives and ShareFile have passed the University's privacy impact assessment for this reason.
If you need a lot of space, want to share your data with collaborators within the Carleton community, and do not need a DOI, then Carleton University ShareFile may be right for you. ShareFile is hosted by Carleton University ITS and is a great sandbox with lots of storage, so it can give you a place to collaborate on your work before publishing in a repository.
Check out these ten simple rules for digital data storage for more information, and also check here to learn more about sustainable file formats.
What are the benefits of publishing my research data?
(Taken from Portage Repository Options Guide)
- Publishing research data facilitates data reuse across and within disciplines. Some published data are open for sharing and reuse without restriction (e.g., under Creative Commons or another open data license); in other cases, it may be appropriate to impose restrictions on sharing and reuse.
- Funders and journals increasingly require that data be published in a trustworthy repository as part of a well-developed data management plan (see Tri-Agency RDM Policy).
- Instructions for authors or author guidelines often specify each publication's data sharing policy. Examples include Nature, Springer Nature, PLOS, and Wiley. These requirements typically include making all supporting datasets openly available without restrictions when the article is published.
- You may also be required to include a Digital Object Identifier (DOI) for your dataset when you submit for publication. See the 'How do I decide which repository to use?' section below to find out more.
- Publishing your data is also a good way to ensure they remain accessible beyond the life of the study for which they were collected. The ability to find and re-use data is increasingly important for verifying published research findings and supporting new research. Visit here for more information on Research Data Management.
Where can I publish my data?
Data repositories can be categorized broadly as:
- Domain-specific (disciplinary)
- Dryad Digital Repository (for data underlying scientific and medical publications), re3data.org (Registry of Data Repositories), PLOS ONE Recommended repositories (by discipline), Springer Nature Recommended repositories (by discipline)
- Not usually open to academic researchers (open data, etc…)
Many journals, publishers, and funding agencies require researchers to deposit replication datasets in a repository and obtain a DOI to cite in their work. Depositing a replication dataset makes it more easily discoverable, so other researchers can reuse it and verify that a study can be replicated without having to contact the original study's author(s). The Carleton University Dataverse Collection is a great option for depositing these datasets. Just be sure to follow the Dataverse Best Practices: Replication Dataset Guidelines when depositing.
The FAIR Principles
All repositories should follow the FAIR principles, which call for data to be Findable, Accessible, Interoperable, and Reusable: https://www.go-fair.org/fair-principles/
How do I decide which repository to use?
Data Services recommends the Carleton University Dataverse Collection, a free repository for smaller datasets (<3 GB) that allows collaboration and version control and accepts all data types. We offer support for uploading and curating your data, and it will be widely discoverable in Omni (the Library's search tool), DataCite, Google Dataset Search, FRDR, and more. You will be provided with a DOI, and the service ensures security and preservation. The Scholars Portal Dataverse FAQ may help you decide whether Dataverse is right for you.
If you have larger datasets, then the Federated Research Data Repository (FRDR) may be right for you. This repository also mints DOIs, is widely discoverable, and ensures preservation and security.
Source: Current Metadata Practices for Long Tail Research Data, Kathleen Shearer, COAR presentation
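The storage and repository guidance on this page can be summarized as a rough decision sketch. This is only an illustration and not official policy: the function name `suggest_option`, its parameters, and the 3 GB cut-off are assumptions made for the example, and you should confirm current size limits with Data Services.

```python
# Assumed cut-off for "smaller datasets"; confirm the current limit
# with Data Services before relying on it.
DATAVERSE_MAX_GB = 3.0


def suggest_option(size_gb: float, needs_doi: bool, carleton_only: bool) -> str:
    """Suggest a starting point based on size, DOI needs, and audience."""
    # Working storage for Carleton collaborators when no DOI is required.
    if not needs_doi and carleton_only:
        return "Carleton University ShareFile"
    # Smaller datasets that need a DOI and wide discoverability.
    if size_gb < DATAVERSE_MAX_GB:
        return "Carleton University Dataverse Collection"
    # Larger datasets that need a DOI and wide discoverability.
    return "Federated Research Data Repository (FRDR)"


print(suggest_option(size_gb=0.5, needs_doi=True, carleton_only=False))
print(suggest_option(size_gb=50, needs_doi=True, carleton_only=False))
```

A disciplinary repository may still be the better home for your data, so treat the result as a first suggestion rather than a final answer.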
Preparing your data for upload:
(Modified from Saint Mary's University Research Data Page)
Before uploading your data to a repository, it is helpful to answer the following questions:
- Are you the principal investigator? If you are not the PI, you may need to get the PI's permission to share the data, and/or ensure they are aware of the plan.
- How much data do you have? (in MB/GB/TB)
- Do you have any existing documentation (readme.txt, lab notebooks, a DMP) with a description of the data, file naming conventions, and methodology for the data collection?
- Did you use an existing metadata schema?
- Are there existing constraints or requirements placed on your data archiving and/or sharing (grants, journal data policies, research networks, etc.)?
- Are there specific restrictions on the data (legal, ethical, intellectual property)?
- Is there a repository specific to your industry/discipline where your data would be the most useful?
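Two of the questions above (how much data you have, and whether documentation such as a readme exists) can be answered with a short script. The following is a minimal sketch: `human_size` and `summarize` are hypothetical helper names, sizes are reported in decimal units (1 MB = 1,000,000 bytes), and the readme check only looks at the top level of the folder.

```python
from pathlib import Path


def human_size(num_bytes: int) -> str:
    """Format a byte count in decimal units (B, KB, MB, GB, TB)."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        if size < 1000:
            return f"{size:.1f} {unit}"
        size /= 1000
    return f"{size:.1f} TB"


def summarize(folder: str) -> dict:
    """Total the file sizes under `folder` and look for a top-level readme."""
    root = Path(folder)
    files = [p for p in root.rglob("*") if p.is_file()]
    total = sum(p.stat().st_size for p in files)
    has_readme = any(
        p.is_file() and p.name.lower().startswith("readme")
        for p in root.iterdir()
    )
    return {"files": len(files), "size": human_size(total), "has_readme": has_readme}
```

Calling `summarize("my_dataset")` on your own data folder (the folder name here is a placeholder) returns the file count, the total size, and whether a readme was found.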
Resources to help you choose the best storage/repository option:
- Storage and Repository Decision Chart
- How to select a repository
- Generalist repository comparison chart
- Generalist repository interactive chart
- Safeguarding Your Research (Government of Canada Guidelines)
- Case study scenarios regarding data security breaches
- Sustainable file types
- Guidance on depositing data into public repositories
Other Repository Options:
- ICPSR: a large international archive of social science data. The Library's subscription to ICPSR gives us free access to deposit at OpenICPSR.
- Re3data.org: a comprehensive listing of disciplinary and institutional repositories for hosting and sharing research data. Use the Repository Finder tool to do a more focused search for an appropriate repository.
- Directory of Open Access Repositories (OpenDOAR): provides a quality-assured list of open access repositories around the world.
- Zenodo: international OpenAIRE repository hosted by CERN
- Figshare or Dryad: popular multi-disciplinary repositories
- OSF: Open Science Framework (OSF) is a free and open source project management repository that supports researchers across their entire project lifecycle
For more information on data storage and repositories, view this recent presentation: