Automated Version Control


  • Version control is like an unlimited ‘undo’.
  • Version control also allows many people to work in parallel.

Setting Up Git


  • Use git config with the --global option to configure a user name, email address, editor, and other preferences once per machine.
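A minimal sketch of that one-time setup. The name, email address, and editor are placeholders, and the first two commands exist only so that trying the sketch does not overwrite real settings.

```shell
# Sandbox: keep this sketch away from your real ~/.gitconfig.
# Remove these two lines to configure your actual machine.
export HOME="$(mktemp -d)"
unset XDG_CONFIG_HOME

# Placeholder identity and editor; substitute your own values.
git config --global user.name "Example User"
git config --global user.email "user@example.com"
git config --global core.editor "nano -w"

# Show everything stored at the global (per-user) level.
git config --global --list
```

Because the settings are global, every repository on the machine picks them up unless a repository overrides them locally.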

Creating a Repository


  • git init initializes a repository.
  • Git stores all of its repository data in the .git directory.
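As a small illustration, the commands below create a repository in a scratch location and list the hidden .git directory. The directory name "project" is hypothetical.

```shell
set -e
# Hypothetical project directory, created in a scratch location.
project="$(mktemp -d)/project"
mkdir "$project"
cd "$project"

# Turn the directory into a Git repository.
git init --quiet

# The repository data lives in the hidden .git directory.
ls -a
```

Deleting .git would discard the project's history while leaving the working files in place.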

Tracking Changes


  • git status shows the status of a repository.
  • Files can be stored in a project’s working directory (which users see), the staging area (where the next commit is being built up), and the local repository (where commits are permanently recorded).
  • git add puts files in the staging area.
  • git commit saves the staged content as a new commit in the local repository.
  • Write a commit message that accurately describes your changes.
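The path of a file through the three areas can be sketched as follows. The file name, identity, and commit message are placeholders, and the repository is created in a scratch directory.

```shell
set -e
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
# Repository-local identity (placeholder values) so the commit can be recorded.
git config user.name "Example User"
git config user.email "user@example.com"

# 1. Working directory: create a file; Git reports it as untracked.
echo "first note" > notes.txt
git status --short        # prints "?? notes.txt"

# 2. Staging area: queue the file for the next commit.
git add notes.txt
git status --short        # prints "A  notes.txt"

# 3. Local repository: record the staged content with a descriptive message.
git commit --quiet -m "Add first note"
git log --oneline
```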

Exploring History


  • git diff displays differences between commits.
  • git restore recovers old versions of files.
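A short sketch of both commands on a throwaway repository with two commits. The file name and contents are made up for illustration; HEAD~1 refers to the commit before the most recent one.

```shell
set -e
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Example User"
git config user.email "user@example.com"

# Two commits that change the same file.
echo "draft one" > notes.txt
git add notes.txt
git commit --quiet -m "Add first draft"
echo "draft two" > notes.txt
git commit --quiet -am "Revise draft"

# Differences between the previous commit (HEAD~1) and the latest (HEAD).
git diff HEAD~1 HEAD -- notes.txt

# Recover the older version of the file into the working directory.
git restore --source=HEAD~1 notes.txt
cat notes.txt             # prints "draft one"
```

The restore changes only the working copy; the history still contains both commits.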

Ignoring Things


  • The .gitignore file is a text file that tells Git which untracked files and folders to ignore in the repository.
  • You can list specific files or folders to be ignored by Git, or you can re-include files that a broader pattern would otherwise ignore by prefixing the pattern with an exclamation mark (!).
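The sketch below shows one ignore pattern and one exception to it. All file names are hypothetical; a pattern that starts with ! re-includes files that an earlier pattern excluded.

```shell
set -e
repo="$(mktemp -d)"
cd "$repo"
git init --quiet

# Hypothetical files: two logs and one source file.
touch debug.log keep.log analysis.py

# Ignore every .log file, then re-include keep.log with a leading "!".
printf '%s\n' '*.log' '!keep.log' > .gitignore

# Untracked files Git still sees; debug.log is absent from the list.
git status --short

# Show which rule causes debug.log to be ignored.
git check-ignore -v debug.log
```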

Remotes in GitHub


  • A local Git repository can be connected to one or more remote repositories.
  • Use the SSH protocol to connect to remote repositories.
  • git push copies changes from a local repository to a remote repository.
  • git pull copies changes from a remote repository to a local repository.
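A runnable sketch of the connect, push, and pull cycle. To keep it offline, a bare repository on the local disk stands in for GitHub; that substitution, the remote name origin, and the file contents are assumptions made for the example. With GitHub, the URL would be an SSH address instead of a local path.

```shell
set -e
work="$(mktemp -d)"
cd "$work"

# Stand-in for a GitHub repository: a bare repository on the local disk.
# With GitHub the URL would be an SSH address of the form
# git@github.com:<user>/<repo>.git (placeholder, not a real repository).
git init --quiet --bare remote.git

git init --quiet local
cd local
git config user.name "Example User"
git config user.email "user@example.com"
echo "hello" > readme.txt
git add readme.txt
git commit --quiet -m "Add readme"

# Connect the local repository to the remote under the name "origin".
git remote add origin "$work/remote.git"
git remote -v

# Copy local commits to the remote, then bring in anything new from it.
branch="$(git symbolic-ref --short HEAD)"   # current branch name
git push --quiet origin "$branch"
git pull --quiet origin "$branch"
```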

Open Science


  • Open scientific work is more useful and more highly cited than closed work.

Licensing


  • The LICENSE, LICENSE.md, or LICENSE.txt file is often used in a repository to indicate how the contents of the repo may be used by others.
  • People who incorporate software licensed under the GNU General Public License (GPL) into their own software must also release the derived software under the GPL if they decide to share it; most other open licenses do not require this.
  • The Creative Commons family of licenses allows people to mix and match requirements and restrictions on attribution, creation of derivative works, further sharing, and commercialization.
  • People who are not lawyers should not try to write licenses from scratch.

Citation


  • Add a CITATION file to a repository to explain how you want your work cited.

Hosting


  • Projects can be hosted on university servers, on personal domains, or on a public hosting service.
  • Rules regarding intellectual property and storage of sensitive information apply no matter where code and data are hosted.

Collaborating


  • git clone copies a remote repository to create a local repository with a remote called origin automatically set up.
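To see the automatic origin remote, the sketch below clones a small repository built on the local disk, which stands in for a repository hosted on GitHub. The directory names source and copy are arbitrary.

```shell
set -e
work="$(mktemp -d)"
cd "$work"

# Build a small repository to act as the remote (a stand-in for GitHub).
git init --quiet source
git -C source config user.name "Example User"
git -C source config user.email "user@example.com"
echo "shared work" > source/readme.txt
git -C source add readme.txt
git -C source commit --quiet -m "Add readme"

# Clone it; Git records where the copy came from as the remote "origin".
git clone --quiet source copy
git -C copy remote -v
```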

Branches


  • Branches allow parallel work without affecting the main codebase.
  • Each branch is an independent line of development; changes stay isolated until merged.
  • Use git branch, git switch, and git merge to manage branches.
  • Merging integrates changes from one branch into another.
  • Use git push origin branch-name to share a branch on GitHub.
  • Pull requests enable code review and discussion before merging.
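A minimal local sketch of the branch, switch, and merge cycle. The branch name feature and the file names are arbitrary; sharing the branch with git push origin feature would additionally require a configured remote, which this example leaves out.

```shell
set -e
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Example User"
git config user.email "user@example.com"
echo "base" > notes.txt
git add notes.txt
git commit --quiet -m "Add notes"
main_branch="$(git symbolic-ref --short HEAD)"   # name of the default branch

# Create a branch and switch to it.
git branch feature
git switch --quiet feature

# Commit on the branch; the default branch is unaffected.
echo "experiment" > experiment.txt
git add experiment.txt
git commit --quiet -m "Add experiment"

# Switch back and integrate the branch's changes.
git switch --quiet "$main_branch"
git merge --quiet feature
git branch --merged        # now lists feature as well
```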

Forks


  • A fork is your own copy of another GitHub repository (the “upstream” repo), stored under your account. You can make changes freely even if you don’t have write access to the original.
  • You can contribute by creating a pull request from your fork to the upstream repository.
  • To keep your fork updated, add the upstream repo as a remote with git remote add and pull its changes with git pull.
  • Forking is the standard way to contribute when you don’t have write access to a repo: it lets you propose changes without affecting the original project directly.
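Keeping a fork current can be sketched offline with local stand-ins: a plain repository plays the upstream project and a bare copy of it plays the fork. On GitHub the fork is created through the web interface; the remote name upstream is a common convention rather than a requirement, and all paths and contents here are hypothetical.

```shell
set -e
work="$(mktemp -d)"
cd "$work"

# Stand-ins for GitHub: "upstream" is the original project and "fork.git"
# is a server-side copy of it.
git init --quiet upstream
git -C upstream config user.name "Upstream Maintainer"
git -C upstream config user.email "maintainer@example.com"
echo "v1" > upstream/readme.txt
git -C upstream add readme.txt
git -C upstream commit --quiet -m "Add readme"
git clone --quiet --bare upstream fork.git

# Clone your fork (it becomes "origin"), then register the original project.
git clone --quiet fork.git local
cd local
git remote add upstream "$work/upstream"
git remote -v

# The original project moves on after the fork was made.
echo "v2" > "$work/upstream/readme.txt"
git -C "$work/upstream" commit --quiet -am "Update readme"

# Pull the new upstream commit into the local copy of the fork.
branch="$(git symbolic-ref --short HEAD)"
git pull --quiet upstream "$branch"
cat readme.txt             # prints "v2"
```

A following git push origin would then bring the fork itself up to date.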

Conflicts


  • Conflicts occur when two or more people change the same lines of the same file.
  • The version control system does not allow people to overwrite each other’s changes blindly, but highlights conflicts so that they can be resolved.
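The sketch below provokes a conflict on purpose by changing the same line on two branches, then resolves it. The branch name other, the file, and its contents are made up; in practice the resolution is done in an editor rather than by overwriting the file.

```shell
set -e
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Example User"
git config user.email "user@example.com"
echo "original line" > notes.txt
git add notes.txt
git commit --quiet -m "Add notes"
main_branch="$(git symbolic-ref --short HEAD)"

# Two branches change the same line in different ways.
git switch --quiet -c other
echo "change from other" > notes.txt
git commit --quiet -am "Edit on other"
git switch --quiet "$main_branch"
echo "change from main" > notes.txt
git commit --quiet -am "Edit on main"

# Git refuses to pick a winner: the merge stops and marks the conflict.
git merge other || true
cat notes.txt              # shows both versions between <<<<<<< and >>>>>>>

# Resolve by writing the desired content, then stage and commit.
echo "combined change" > notes.txt
git add notes.txt
git commit --quiet -m "Merge other, resolving conflict in notes.txt"
```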

Supplemental: Using Git from RStudio


  • Using RStudio’s Git integration allows you to version control a project over time.