Resources about reproducibility with specific tools, languages, or methods.
💻 Computers & 🗄️ data
- Test-Driven Data Analysis (TDDA), automatically generating reference tests for data workflows: https://github.com/tdda/tdda
Talk recording:
🇷 R
- Reproducible Research with R & RStudio, Christopher Gandrud: https://christophergandrud.github.io/RepResR-RStudio/
- Reproducible Research CRAN Task View: https://cran.r-project.org/view=ReproducibleResearch
- renv!
- R³: https://r-cubed.rostools.org/
- Tools for Reproducible Research by K. Broman: https://kbroman.org/Tools4RR/pages/schedule.html
- 🌍 Geospatial research with R
- "Geocomputation with R’s guide to reproducible spatial data analysis" by Jakub Nowosad; slides: https://jakubnowosad.com/ogh2022; video: https://doi.org/10.5446/59404
- An R package is a great way to package a workflow: with the fusen R package, you can easily create a package based on a single R Markdown file
- What they forgot to teach you about R: https://rstats.wtf/
- Reproducible Analysis in R - slides and video of workshop: https://n8cir.org.uk/events/event-resource/analyses-r/ (GitHub)
- rrtools
- Reproducibility in R with parallel computations: https://pat-s.me/post/reproducibility-when-going-parallel/
- R Markdown: Making R Markdown Work Better for You by Alison Hill (3/24/2021)
🐍 Python
- https://kbroman.org/Tools4RR/assets/lectures/13_python.pdf
- Research Software Engineering with Python (Book for self-study): https://merely-useful.github.io/py-rse/
- NBIS Tools for Reproducible Research (Conda, Snakemake, R Markdown, Jupyter, Docker, Singularity): https://nbis-reproducible-research.readthedocs.io/
🧮 Matlab
- A concise guide to reproducible MATLAB projects, David Wilby: https://rse.shef.ac.uk/blog/2022-05-05-concise-guide-to-reproducible-matlab/
🤯 Machine Learning (ML) / Artificial Intelligence (AI)
- ML Code Completeness Checklist: https://medium.com/paperswithcode/ml-code-completeness-checklist-e9127b168501 (Papers with Code)
- https://ai.facebook.com/blog/how-the-ai-community-can-get-serious-about-reproducibility/
- How to avoid machine learning pitfalls: a guide for academic researchers (https://doi.org/10.48550/arXiv.2108.02497)
- Heil, B.J., Hoffman, M.M., Markowetz, F. et al. Reproducibility standards for machine learning in the life sciences. Nat Methods 18, 1132–1135 (2021). https://doi.org/10.1038/s41592-021-01256-7
🦯 Blind peer review
- How to make the data and code for your manuscript available to peer reviewers before making it public: https://www.cambridge.org/core/blog/2019/08/19/how-to-make-the-data-and-code-for-your-manuscript-available-to-peer-reviewers-before-making-it-public/
🔬 Microscopes
- Montero Llopis, P., Senft, R.A., Ross-Elliott, T.J. et al. Best practices and tools for reporting reproducible fluorescence microscopy methods. Nat Methods (2021). https://doi.org/10.1038/s41592-021-01156-w
GitHub
- Perez-Riverol Y, Gatto L, Wang R, Sachsenberg T, Uszkoreit J, Leprevost FdV, et al. (2016) Ten Simple Rules for Taking Advantage of Git and GitHub. PLoS Comput Biol 12(7): e1004947. https://doi.org/10.1371/journal.pcbi.1004947
- Crystal-Ornelas, Robert, Brandon Edwards, Katherine Hébert, Emma J. Hudgins, Luna L. Sánchez-Reyes, Eric R. Scott, Matthew Grainger, et al. 2022. Not Just for Programmers: How Github Can Accelerate Collaborative and Reproducible Research in Ecology and Evolution. MetaArXiv. July 13. https://doi.org/10.31222/osf.io/x3p2q
🧾 Software and data citation and licensing
At some point, the citation and licensing of software can become important for researchers. One structural problem of current science is that research software and research data are very often not covered by the commonly used 'success metrics' for scientific careers. Ensuring proper citation of software and data should thus be of high importance for developers and researchers alike. This holds both for your own software and data (making it citable, citing it) and for the software and data you use. Data sharing can benefit your scientific career by leading to greater collaboration, increased confidence in findings, and goodwill between researchers (https://doi.org/10.1038/d41586-019-01506-x). Furthermore, several studies have shown that articles making data available have a citation benefit and that the data are actually reused (https://doi.org/10.7717/peerj.175, https://doi.org/10.1371/journal.pone.0230416). The same can be argued for software.
Licensing is important to keep in mind when starting to share, collaboratively develop, or reuse code and data. It's worth getting a quick overview of what copyright is (https://en.wikipedia.org/wiki/Copyright) and acknowledging that (i) copyright law differs widely across legal jurisdictions, (ii) the laws and their application to "modern" things like data and software are partly still evolving, and (iii) we need copyright law to be able to allow people to use our work. Important disclaimer: the information provided here is not legal advice. If you are unsure about copyright and licensing, consult a lawyer.
"The Legal Framework for Reproducible Scientific Research - Licensing and Copyright" (https://doi.org/10.1109/MCSE.2009.19, public PDF at https://academiccommons.columbia.edu/doi/10.7916/D8GH9TD8/download) gives a good overview and provides clear recommendations on practices and licenses, as does "A Quick Guide to Software Licensing for the Scientist-Programmer" (https://doi.org/10.1371/journal.pcbi.1002598). If you want to make sure that the licenses of software you use or share support your intentions and do not conflict with each other, TL;DR Legal can help you out: https://tldrlegal.com/. The website http://forschungslizenzen.de provides information about rights and licenses for research data (German only), with a special focus on the humanities.
The Software Sustainability Institute's page "How to cite and describe software" is a great starting point for software citation (https://www.software.ac.uk/how-cite-software), although it is a bit outdated. A more current article is "Recognizing the value of software: a software citation guide" (https://doi.org/10.12688/f1000research.26932.2), which includes recent initiatives such as Software Heritage (https://www.softwareheritage.org/). If you use a modern reference manager, the biblatex-software style (https://www.softwareheritage.org/2020/05/26/citing-software-with-style/) might be useful. The GitHub-Zenodo integration makes getting a citable DOI for every release very easy (https://guides.github.com/activities/citable-code/), but manual publishing from GitLab(.com, ZIV-GitLab) is almost as simple. Pro-tip: look for .zenodo.json files on GitHub to automate the metadata insertion on Zenodo, and consider publishing a software paper in JOSS (https://joss.theoj.org/) or JORS (https://openresearchsoftware.metajnl.com/).
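To make the .zenodo.json pro-tip more concrete, here is a minimal sketch in Python that assembles such a metadata file. The field names (title, description, upload_type, creators, license, keywords) follow Zenodo's deposit metadata as far as we know, but please check them against the current Zenodo documentation. The helper function `build_zenodo_metadata` and all values (title, author, affiliation) are hypothetical placeholders, not taken from a real project.

```python
import json


def build_zenodo_metadata(title, description, creators, license_id, keywords):
    """Return a dict shaped like a .zenodo.json file (hypothetical helper)."""
    return {
        "title": title,
        "description": description,
        "upload_type": "software",  # tells Zenodo the record is software
        "creators": creators,       # list of {"name": ..., "affiliation": ...}
        "license": license_id,
        "keywords": keywords,
    }


# All values below are made-up placeholders for illustration.
metadata = build_zenodo_metadata(
    title="example-analysis: code for a hypothetical study",
    description="Placeholder description of a hypothetical research software package.",
    creators=[{"name": "Doe, Jane", "affiliation": "Example University"}],
    license_id="MIT",
    keywords=["reproducibility", "research software"],
)

# Serialize to JSON text; save this as ".zenodo.json" in the repository root.
zenodo_json = json.dumps(metadata, indent=2, ensure_ascii=False)
print(zenodo_json)
```

Saving the printed text as `.zenodo.json` in the root of the repository should let the GitHub-Zenodo integration pick up the metadata at the next release, so you do not have to edit the Zenodo record by hand each time.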
For data citation, university libraries and data repositories are your places to go. Data publication is part of more established practices around research data management (RDM, Forschungsdatenmanagement - FDM) and is often required by funders. Therefore, most universities have services in this area (e.g., https://www.uni-muenster.de/Forschungsdaten/), and the more static nature of data, compared to software, makes some things easier as well. Generic information can be found at DataCite ("Cite your Data", https://datacite.org/cite-your-data.html) and Dataverse (https://dataverse.org/best-practices/data-citation). Open Data Commons (https://opendatacommons.org/) provides established licenses to use and an excellent FAQ (https://opendatacommons.org/faq/licenses/).
In a nutshell 🐿️:
- Make your own data and software citable and provide the desired citation in your README.
- Make your data and software usable by others by using open licenses (data licenses for data, software licenses for software).
- Put software on Zenodo and/or Software Heritage.
- Put data in a suitable research data repository, which you can find on https://www.re3data.org/.
- Cite all data and software that you use with their proper version and DOI. If data you reuse does not have a DOI, ask the author to make it citable.
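As a small illustration of the first and last points in the list above, the following Python sketch builds a citation string that always carries a version and a DOI, ready to paste into a README. The function `format_citation`, the citation layout, and all values (authors, title, and the DOI `10.5281/zenodo.0000000`) are hypothetical; adapt the layout to the citation style your community expects.

```python
def format_citation(authors, year, title, version, doi):
    """Build a plain-text citation that includes version and DOI.

    Raises ValueError if version or DOI is missing, because a citation
    without them does not identify what was actually used.
    """
    if not version or not doi:
        raise ValueError("A citation needs both a version and a DOI.")
    author_part = "; ".join(authors)
    return f"{author_part} ({year}). {title} (version {version}). https://doi.org/{doi}"


# Made-up example values; the DOI is a placeholder, not a real record.
citation = format_citation(
    authors=["Doe, J.", "Roe, R."],
    year=2022,
    title="example-analysis",
    version="1.2.0",
    doi="10.5281/zenodo.0000000",
)
print(citation)
```

Refusing to build a citation without a version or DOI is a deliberate choice in this sketch: it surfaces the gap early, which is the moment to ask the original author to make the resource citable.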
🖥️ Research Software Engineering & Software publishing
- Adding software to package management systems can increase its citations by 280%:
You think something is missing on this page? Get in touch with o2r.support@uni-muenster.de. Thanks!