The Tool
Practical skills assignment
- Find a private (not already publicly shared) dataset that you have recently created or used, and evaluate it against the recommendations in Section 13.2 of Chapter 13, "Project management," of Experimentology. Which recommendations does it fulfill? What needs work?
- Find the data and metadata standards, and the data repositories, that are most relevant to your subfield.
- Identify a public dataset in your subfield. This could be a dataset of your own or from your own lab, or a publicly shared dataset from another lab in your field. Give a basic description of the dataset in your response paper (who generated it, what kind of data it contains, where it is publicly shared). Evaluate this dataset using the FAIR checklist:
  - Findable: Which data search tools can find this dataset? How easy would it be to find? What repository is it stored in? How searchable is that repository? Can the data be cited?
  - Accessible: Can the data be easily retrieved and downloaded? Are reasonable access restrictions in place?
  - Interoperable: Do the data and metadata conform to recognized standards in this discipline? How good are those standards?
  - Reusable: Is there sufficient information to support interpretation and reuse of the data?
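Most of the FAIR questions above call for judgment, but a few reduce to simple checks on a dataset's metadata record. The sketch below is a hypothetical Python helper, not part of any FAIR tool: the `DatasetRecord` fields (DOI, repository, download URL, license, file formats, metadata standard, README/codebook), the simplified DOI regex, and the list of "open" formats are all illustrative assumptions rather than any repository's real schema. It only flags obvious gaps to prompt the questions; it does not answer them.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical metadata record: field names are illustrative assumptions,
# not the schema of any real repository.
@dataclass
class DatasetRecord:
    title: str
    doi: Optional[str] = None                 # persistent identifier, e.g. "10.1234/abcd"
    repository: Optional[str] = None          # where the data are deposited
    download_url: Optional[str] = None        # direct link for retrieval
    license: Optional[str] = None             # e.g. "CC-BY-4.0"
    file_formats: List[str] = field(default_factory=list)
    metadata_standard: Optional[str] = None   # e.g. "BIDS" or "NWB"
    has_readme: bool = False
    has_codebook: bool = False

# Simplified DOI shape check: "10." + registrant code + "/" + suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
# Assumed set of open, widely readable formats; adjust for your subfield.
OPEN_FORMATS = {"csv", "tsv", "json", "nwb", "nii", "nii.gz", "hdf5"}


def fair_gaps(rec: DatasetRecord) -> Dict[str, List[str]]:
    """Return, for each FAIR principle, a list of obvious gaps in the record."""
    gaps: Dict[str, List[str]] = {
        "Findable": [], "Accessible": [], "Interoperable": [], "Reusable": [],
    }
    # Findable: a persistent identifier makes the data citable;
    # deposit in a repository makes it discoverable by search tools.
    if not (rec.doi and DOI_PATTERN.match(rec.doi)):
        gaps["Findable"].append("no valid DOI, so the data are hard to cite")
    if not rec.repository:
        gaps["Findable"].append("not deposited in a searchable repository")
    # Accessible: can the files actually be retrieved?
    if not rec.download_url:
        gaps["Accessible"].append("no direct download link")
    # Interoperable: a community standard and open file formats.
    if not rec.metadata_standard:
        gaps["Interoperable"].append("no community metadata standard")
    closed = [f for f in rec.file_formats if f.lower() not in OPEN_FORMATS]
    if closed:
        gaps["Interoperable"].append("non-open formats: " + ", ".join(closed))
    # Reusable: a license plus documentation needed to interpret the data.
    if not rec.license:
        gaps["Reusable"].append("no license stated")
    if not (rec.has_readme or rec.has_codebook):
        gaps["Reusable"].append("no README or codebook")
    return gaps


if __name__ == "__main__":
    example = DatasetRecord(
        title="Example EEG study",  # illustrative record, not a real dataset
        repository="OpenNeuro",
        metadata_standard="BIDS",
        file_formats=["tsv", "mat"],
    )
    for principle, issues in fair_gaps(example).items():
        print(f"{principle}: {'; '.join(issues) if issues else 'no obvious gaps'}")
```

On the illustrative record this flags the missing DOI, download link, license, and documentation, plus the `mat` file. An empty list means only that no obvious gap was detected, not that the dataset is FAIR; questions such as how good the relevant standards are still need a written answer.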
In your response paper, describe what you did to complete the practical activity, including any snags you hit.
Then provide a critical evaluation of the tool. What promise do data repositories and data standards hold for addressing the challenge of lost and wasted data? What are the biggest technical obstacles to data sharing in your subfield? These might include the effort required to develop data standards, to prepare datasets for sharing, or to find data once they have been shared. What are the biggest social obstacles to data sharing in your subfield?
Useful Links and Resources
- A short course on how to evaluate the FAIRness of data.
- A graphical introduction to tidy data, in a Twitter thread.
- Data management start kit, a list of resources and websites collected by gofair.org.
- ORCID (persistent identifiers for researchers): https://orcid.org
- Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., et al. (2016). “The FAIR Guiding Principles for scientific data management and stewardship.” Scientific Data, 3(1), 1-9.
Example data standards:
- Neurodata Without Borders (NWB): https://www.nwb.org/
- BIDS (Brain Imaging Data Structure): https://bids.neuroimaging.io/
- GIN: https://g-i-n.net/international-guidelines-library/
Example repositories:
- DANDI Archive: https://dandiarchive.org
- OpenNeuro: https://openneuro.org/
- Harvard Dataverse: https://dataverse.harvard.edu/
- Mendeley Data: https://data.mendeley.com/
- Gene Expression Omnibus (GEO): https://www.ncbi.nlm.nih.gov/geo/ (genomics data repository)
- Figshare
- Databrary
- DSpace@MIT: a digital repository for MIT research.
- Canadian Federated Research Data Repository
Example data search tools:
- Google Dataset Search: https://datasetsearch.research.google.com/
- re3data.org, a registry of research data repositories: https://www.re3data.org/ (to browse by subject, click Browse at the top of the page)
- Data.gov catalog: https://catalog.data.gov/dataset