2  Git and GitHub

Git is a version control system. It tracks the changes you make to files and records them as commits in a local repository. It also lets you push those commits to remote repositories hosted on services like GitHub.
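
The local-commit, then-push cycle described above can be sketched in Python by driving the git command line through the standard library's subprocess module. This is a minimal illustration, not an official Blueprint workflow. It assumes git is installed and on your PATH. So that it runs offline, a local bare repository stands in for a GitHub remote. The file name, commit message, author identity, and helper names (`git`, `demo_commit_and_push`) are all hypothetical.

```python
import subprocess
import tempfile
from pathlib import Path


def git(*args: str, cwd: Path) -> str:
    """Run a git command in `cwd` and return its stdout (raises on failure)."""
    result = subprocess.run(
        ["git", *args], cwd=cwd, check=True, capture_output=True, text=True
    )
    return result.stdout


def demo_commit_and_push(workdir: Path) -> str:
    # A bare repository stands in for a hosted remote such as a GitHub repo.
    remote = workdir / "remote.git"
    remote.mkdir()
    git("init", "--bare", cwd=remote)

    # The local repository, where changes are made and committed.
    local = workdir / "analysis"
    local.mkdir()
    git("init", cwd=local)
    # Hypothetical author identity, set only for this throwaway repository.
    git("config", "user.name", "Example User", cwd=local)
    git("config", "user.email", "user@example.com", cwd=local)

    # 1. Change a file (hypothetical file name and contents).
    (local / "clean_data.R").write_text("# drop rows with missing IDs\n")
    # 2. Stage and commit the change, recording why it was made.
    git("add", "clean_data.R", cwd=local)
    git("commit", "-m", "Drop rows with missing IDs", cwd=local)

    # 3. Push the commit to the remote's `main` branch.
    git("remote", "add", "origin", str(remote), cwd=local)
    git("push", "origin", "HEAD:main", cwd=local)

    # Read the latest commit message back from the remote to confirm it arrived.
    return git("log", "-1", "--format=%s", "main", cwd=remote).strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(demo_commit_and_push(Path(tmp)))  # prints "Drop rows with missing IDs"
```

In day-to-day work you would run the same `git add`, `git commit`, and `git push` commands in a terminal or through RStudio's Git pane. The Python wrapper only packages the sequence as one self-contained, runnable script.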

2.1 Why Git and GitHub?

In 2017, Jennifer Bryan expanded on the ‘Why Git?’ chapter of her invaluable instructional book in the paper “Excuse me, do you have a moment to talk about version control?” She articulates the benefits so clearly that there is no point in rephrasing them:

  • Doing your work becomes tightly integrated with organizing, recording, and disseminating it. It’s not a separate, burdensome task you are tempted to neglect.
  • Collaboration is much more structured, with powerful tools for asynchronous work and managing versions.
  • The marginal effort required to create a web presence for a project is negligible.
  • By using common mechanics across work modes (research, teaching, analysis), you achieve basic competence quickly and avoid the demoralizing forget-relearn cycle.

If you have worked on a data analysis project at Blueprint, particularly one that involves collaboration, you have undoubtedly done at least one of the following:

  • Searched for an explanation of when or why a specific data-processing choice was made
  • Felt disoriented while trying to navigate someone else’s code
  • Spent hours undoing a change that broke the pipeline
  • Wanted someone’s help or advice on a problem, but decided it wasn’t worth the effort of giving them access

Hosted version control like GitHub gives you tools that make each of these situations simpler and more routine.

2.2 Gitting rolling

At a minimum, you’ll need to install git and register a GitHub account. Optionally, you can also connect git and GitHub to your RStudio environment. If you’ve already done those things, great, you’re good to go! Otherwise, we strongly recommend that you pause here and read and work through Happy Git before continuing.

2.3 Usage at Blueprint (as of May 2023)

Right now, usage varies widely. Some members of the Data Lab team use git and GitHub to version and host both R packages and analysis project code, but git is not yet part of the typical analysis project. We will begin rolling out the basic workflow in summer 2023 and evaluate how it goes.