Michael Luu
May 11, 2023
Slides are publicly available at:
Git is a distributed version control system that tracks changes in any set of computer files, usually used for coordinating work among programmers collaboratively developing source code during software development. Its goals include speed, data integrity, and support for distributed, non-linear workflows.[1]
GitHub, Inc. is an Internet hosting service for software development and version control using Git. It provides the distributed version control of Git plus access control, bug tracking, software feature requests, task management, continuous integration, and wikis for every project.[1]
The reproducibility crisis refers to the fact that many scientific findings in biomedical research cannot be replicated by independent researchers.
The crisis has led to calls for increased transparency and data sharing in scientific research, as well as greater emphasis on replication studies to confirm the validity of scientific findings.
An analysis of past studies indicates that the cumulative (total) prevalence of irreproducible preclinical research exceeds 50%, resulting in approximately US$28,000,000,000 (US$28B)/year spent on preclinical research that is not reproducible—in the United States alone.
Repo (Repository) - A repo or repository is generally a folder in which git manages and tracks the files that are contained within.
Staging - This is a ‘holding area’ for the changes that you would like to include in your next commit. We need to ‘stage’ a file before we can ‘commit’ it.
Commit - A commit is a ‘snapshot’ of current changes among the files that have been ‘staged’ in your repository. A commit also requires a small message or text description, in which you can describe the changes.
Push & Pull - If git is configured to work with GitHub (or another remote repository), push sends our local commits to the remote repository, and pull brings the changes from the remote repository into our local copy.
Cloning - Allows the user to create a clone of the GitHub repository, including the files, history, and branches, on the local machine.
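The terms above can be tied together in one short terminal session. This is only a sketch under stated assumptions: a local bare repository stands in for GitHub so that it runs without an account, and the file name `analysis.R`, the folder names, and the `Demo User` identity are all made up for illustration.

```shell
#!/bin/sh
set -eu

# Scratch area; every path below is a throwaway created for this demo.
work=$(mktemp -d)
cd "$work"

# Assumption: a bare local repository plays the role of the GitHub remote.
git init --quiet --bare remote.git

# Repo: turn a plain folder into a git repository.
git init --quiet project
cd project
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"  # placeholder identity

# Staging: place a new file in the holding area.
echo 'x <- c(1, 2, 3)' > analysis.R
git add analysis.R

# Commit: snapshot the staged changes, with a short message.
git commit --quiet -m "Add first draft of analysis"

# Push: send the commit to the remote repository.
git remote add origin "$work/remote.git"
branch=$(git rev-parse --abbrev-ref HEAD)
git push --quiet origin "$branch"

# Cloning: a collaborator copies the files, history, and branches.
cd "$work"
git clone --quiet "$work/remote.git" collaborator
cd collaborator
git config user.name "Demo Collaborator"
git config user.email "collaborator@example.com"
echo 'mean(x)' >> analysis.R
git add analysis.R
git commit --quiet -m "Compute the mean"
git push --quiet origin "$branch"

# Pull: bring the collaborator's commit into the original repository.
cd "$work/project"
git pull --quiet --ff-only origin "$branch"
git log --oneline
```

With a real GitHub repository, the only line that would change in spirit is the `git remote add origin` step, which would point at the repository's URL instead of a local path.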
git provides a highly structured workflow that promotes reproducibility and transparency
git provides a continuous log of the changes you have made to your project / analysis (via Commit Messages)
git provides the ability to travel through time with the files in your project - easily jump back to previous versions of files (via Commits)
git provides the freedom to explore new ideas or analyses without the fear of affecting your primary analysis (via Branching)
GitHub provides a remote backup of your git repository
GitHub provides a highly structured method of sharing your repository with other collaborators
GitHub provides you with an online presence (think of this as an online portfolio of your work)
GitHub provides you with free online hosting (a website) for your project
Git and GitHub are an industry standard for version control - this skill is transferable to many industries
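Three of the git benefits listed above (the commit log, time travel, and branching) can be tried out in a throwaway repository. The sketch below is illustrative only: the file `analysis.R`, its contents, the branch name `explore-idea`, and the `Demo User` identity are assumptions, not part of any real project.

```shell
#!/bin/sh
set -eu

# Throwaway repository for the demo.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"  # placeholder identity

# Two commits, so there is some history to look at.
echo 'model <- "linear"' > analysis.R
git add analysis.R
git commit --quiet -m "Fit a linear model"

echo 'model <- "logistic"' > analysis.R
git add analysis.R
git commit --quiet -m "Switch to a logistic model"

# Continuous log: one line per commit, newest first.
git log --oneline

# Time travel: bring back the file as it was one commit ago...
git checkout --quiet HEAD~1 -- analysis.R
cat analysis.R
# ...and then return to the latest committed version.
git checkout --quiet HEAD -- analysis.R

# Branching: try a new idea without touching the primary analysis.
main_branch=$(git rev-parse --abbrev-ref HEAD)
git checkout --quiet -b explore-idea
echo 'model <- "random forest"' > analysis.R
git commit --quiet -am "Try a random forest"

# Back on the original branch, the primary analysis is unchanged.
git checkout --quiet "$main_branch"
cat analysis.R
```

The experiment stays available on the `explore-idea` branch, so it can be merged later or simply left alone.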
git is designed as a command line tool (i.e. to take full advantage of git, you will have to learn commands to enter in the terminal)
There is a barrier to entry in getting your local git repository to ‘talk’ with GitHub
Although git can version control any type of file, it is best used in conjunction with text files (e.g. source code)
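The last point, that git works best with text files, is easiest to see in the output of `git diff`. In this rough sketch, the file names `notes.txt` and `data.bin` and their contents are invented for illustration. The "binary" file is just a few bytes containing a NUL character, which is enough for git to treat it as binary.

```shell
#!/bin/sh
set -eu

# Throwaway repository for the demo.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"  # placeholder identity

# One text file and one file containing a NUL byte (treated as binary).
printf 'alpha\nbeta\n' > notes.txt
printf 'A\000B\n' > data.bin
git add notes.txt data.bin
git commit --quiet -m "Add a text file and a binary file"

# Change both files.
printf 'alpha\ngamma\n' > notes.txt
printf 'A\000C\n' > data.bin

# Text file: the diff shows exactly which lines were removed and added.
git --no-pager diff -- notes.txt

# Binary file: git only reports that the two versions differ.
git --no-pager diff -- data.bin
```

Both files are versioned equally well; the difference is that only the text file gets a readable, line-by-line record of what changed.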
https://happygitwithr.com/ (Git / GitHub Integration with RStudio)
https://git-scm.com/book/en/v2 (Definitive Git Book)