National Network of Libraries of Medicine

Data Tools

  • Bitbucket – A platform to store, manage, and share code using Git and Mercurial version control technologies. An alternative to GitHub that interfaces with Atlassian products like Jira.
  • Colectica – A software suite for integrating Data Documentation Initiative metadata standards into survey data, for open science.
  • CRAN – A place to find and share R packages.
  • CSV Fingerprints – A tool that makes it easier to spot mistakes in CSV files.
  • Dash – Open source software that interfaces with repositories, allowing researchers to easily deposit and share data.
  • Data analysis: R – "A free software environment for statistical computing and graphics." Runs on Unix, Windows, and macOS.
  • GitHub – A web-based hosting service for Git repositories that makes it much easier to use Git and collaborate on code. Also a social network for sharing open source code.
  • OpenRefine – A tool for working with messy data: cleaning it; transforming it from one format into another; extending it with web services; and linking it to databases like Freebase.
  • PyPI: the Python Package Index – A place to find and share Python packages.
  • Python – A popular programming language used in data science.
  • R Markdown – An authoring format for R that combines prose, code, and output in a single document.
  • rOpenSci – A non-profit devoted to creating R packages to be used in open science.
  • RStudio – Desktop software (an integrated development environment) for writing R code and editing R packages.
  • SciPy.org – "A Python-based ecosystem of open-source software for mathematics, science, and engineering." Includes the popular IPython and NumPy packages.
  • Subversion (SVN) – Version control software, a less widely used alternative to Git.
  • REDCap – Developed by researchers at Vanderbilt University in 2004, REDCap "is a secure web application for building and managing online surveys and databases. While REDCap can be used to collect virtually any type of data (including 21 CFR Part 11, FISMA, and HIPAA-compliant environments), it is specifically geared to support online or offline data capture for research studies and operations."
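  • Tidy Data – A set of principles for structuring datasets (each variable a column, each observation a row), supported by R packages such as tidyr, for data cleaning/curation.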
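
As a rough illustration of the kind of mistake a tool like CSV Fingerprints helps surface, the sketch below flags rows whose field count differs from the most common one in the file. It is not how CSV Fingerprints itself is implemented; the function name `find_ragged_rows` and the sample data are hypothetical and chosen only for this example. It uses Python's standard `csv` module.

```python
import csv
import io
from collections import Counter


def find_ragged_rows(text, delimiter=","):
    """Return (expected_width, problems) for a CSV string.

    expected_width is the most common number of fields per row;
    problems lists (row_number, width) for every row that differs.
    Hypothetical helper, written for illustration only.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return 0, []
    # Treat the most frequent field count as the intended table width.
    widths = Counter(len(row) for row in rows)
    expected_width = widths.most_common(1)[0][0]
    problems = [
        (row_number, len(row))
        for row_number, row in enumerate(rows, start=1)
        if len(row) != expected_width
    ]
    return expected_width, problems


# Made-up sample: row 3 is missing a field, row 4 has one too many.
sample = "id,name,age\n1,Ana,34\n2,Ben\n3,Cy,29,extra\n"
expected_width, problems = find_ragged_rows(sample)
print(expected_width, problems)
```

A check like this catches only structural problems (missing or extra fields); wrong values in the right number of columns still need a tool such as OpenRefine or manual review.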