A Brief Introduction to Git and Github

Michael Luu

May 11, 2023

Slides are publicly available at:

https://mluu921.github.io/git-github-slides/

What is git?

Git is a distributed version control system that tracks changes in any set of computer files, usually used for coordinating work among programmers collaboratively developing source code during software development. Its goals include speed, data integrity, and support for distributed, non-linear workflows.1

What is GitHub?

GitHub, Inc. is an Internet hosting service for software development and version control using Git. It provides the distributed version control of Git plus access control, bug tracking, software feature requests, task management, continuous integration, and wikis for every project.1

Why…?

Reproduciblity Crisis

  • The reproducibility crisis refers to the fact that many scientific findings in biomedical research cannot be replicated by independent researchers.

  • The crisis has led to calls for increased transparency and data sharing in scientific research, as well as greater emphasis on replication studies to confirm the validity of scientific findings.

Reproduciblity Crisis

An analysis of past studies indicates that the cumulative (total) prevalence of irreproducible preclinical research exceeds 50%, resulting in approximately US$28,000,000,000 (US$28B)/year spent on preclinical research that is not reproducible—in the United States alone.

Reproducible Research

Important Git Jargon

  • Repo (Repository) - A repo or repository is generally a folder in which git manages and tracks the files that are contained within.

  • Staging - This is a ‘holding area’ of the files that you would like to version control. We need to ‘stage’ a file before we can ‘commit’ it.

  • Commit - A commit is a ‘snapshot’ of current changes among the files that have been ‘staged’ in your repository. A commit also requires a small message or text description, in which you can describe the changes.

Important Git Jargon

  • Branching & Merging - Git allows for a non-linear workflow, in which we can ‘branch’ away from the current source. With Git we can create infinite number of branches in which we can experiment without affecting the main source code. If content, we can merge those changes from various branches back to the main source.

Important Git Jargon

  • Push & Pull - If git is configured to work with Github (or another remote repository), push allows us to push the changes to a remote repository (Github), and pull allows us to pull in the changes from Github

  • Cloning - Allows the user to create clone of the Github repository, including the files, history, and branches onto the local machine

Why you should use it?

  • git provides a highly structured workflow that promotes reproducibility and transparency

  • git provides a continuous log of the changes you have done on your project / analysis (via Commit Messages)

  • git provides the ability to travel through time with the files in your project - easily jump back to previous versions of files (via Commits)

  • git provides the freedom to explore new ideas/analysis without the fear of affecting your primary analysis (via Branching)

Why you should use it?

  • Github provides a remote backup of your git repository

  • Github provides a highly structured method of sharing your repository with other collaborators

  • Github provides you with an online presence (think of this as an online portfolio of your work)

  • Github provides you with a free online hosting / website for your project

  • Git and Github is a industry standard version control system - this skill is transferable to many industries

However…

git is complicated

  • git is designed as a command line tool (e.g. to take full advantage of git, you will have to learn commands to enter in the terminal)

  • There is a barrier to entry on getting your local git repository to ‘talk’ with Github

  • Although git can version control any type of files, it is best used in conjunction with text files (e.g. source code)

git is complicated

  • There’s additional work to consider in your current workflow
    • Storing your project into a “project folder”
    • Making the folder a git repository
    • Making commits at logical points in time to snapshot your project
    • Writing commit messages to describe the changes
    • Pushing your work to Github

Setting up Git and Github

Creating a Git Repository

Staging and Commiting

Making a commit

Making additional commits

Branching and Merging

Creating and switching branch

Making a commit on a branch

Merging branches

Deleting branches

Creating a branch from a commit

Publishing a repo

Publishing a repo

Cloning a repo

Cloning a repo

Cloning a repo

Additional Resources