
Git

Reading time: ~60 min

Managing your files by simply saving them in folders on a hard drive runs afoul of some core concerns of anyone working on a computer for a living:

  1. Preserving your work. It's easy to accidentally overwrite a file containing significant amounts of work. Depending on how much work is lost, this can be devastating. Pixar, for example, deleted nearly all of the files for Toy Story 2 when an errant rm -r -f * command was run (the -r and -f flags mean "recursive" and "force", respectively). They were saved by the Supervising Technical Director, who had made a copy of the file tree so she could work from home after giving birth to her son.

  2. Tracking history. If you have a record of what you did and when you did it, you can perform more powerful operations on your content. For example, suppose you recently made two rounds of edits on a document, and you decide that the first round should be discarded because the circumstances that motivated those edits have changed. If the first-round edits are recorded separately, you may be able to discard them automatically. Otherwise, you'll have to undo them by hand.

  3. Managing versions. Slightly different use cases often require you to maintain different versions of a given codebase. For example, different clients might each need custom modifications. If you maintain these versions in separate directories, every change to the common part of the codebase has to be copied into each of the versions. This quickly becomes a major maintenance headache.

  4. Facilitating teamwork. Each team member should have maximum flexibility to work on a project and have that work reflected in their teammates' copies of the project. This takes some care, because if two people change the same file at the same time, their new versions must be merged.

Software designed to address these concerns is called version control. We will be working with a specific version control system called Git, which was created by Linus Torvalds in 2005 and has since become the most widely used version control system among software developers.

Git main concepts

Git keeps a record, called a repository, of the history and versions of the contents of a particular directory (including its subdirectories, their subdirectories, and so on). The typical setup is to create a single directory for all of the files relevant to a given project and initialize a repository in that directory.

Git uses two components to manage a repository in a given directory: a command-line program called git and a hidden subdirectory called .git. Commands are issued to git to manipulate the contents of .git.

Unlike syncing services such as Dropbox or Google Drive, Git doesn't do anything automatically; every interaction is deliberate. This is helpful, because it means that changes made by a colleague won't show up on your machine uninvited, where they might break your environment.

Conceptually, a Git repository consists of a collection of complete snapshots of the directory contents. These snapshots are called commits. The commit immediately preceding a given commit is called its parent. Commits and parent-child relationships between commits are the fundamental constructs of a Git repository.

Exercise

  1. The name of the hidden subdirectory containing the files Git uses to maintain a repository is ____.
  2. True or false: Git keeps your folder synced to the cloud at all times.
  3. True or false: Commits in a Git repository are organized using parent-child relationships between commits.
  4. A commit corresponds most closely to a ____.

Changes in a Git project migrate through a series of zones. When you make changes in your directory, Git initially knows nothing about them. You stage your changes to a staging area, then commit them to the repository. A project involving multiple contributors typically has a remote copy of the repository on a website like GitHub. When you are ready for your colleagues to get your changes, you push your local repository to the remote repository.

Changes in a Git project are staged, committed, and pushed to a remote repository.

Why does Git have so many zones? The staging area is necessary to help you distinguish files you want Git to track from files you don't want Git to track, and to provide an area to prepare for a well-organized commit. Having both local and remote copies of the repository allows you to make commits even when you don't have network access. Although this workflow might seem at first to be overly complicated, its benefits for flexibility and organization are often regarded as a positive distinguishing feature of Git (as compared to version control systems with fewer such zones).
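The first three zones can be watched directly with git status --porcelain, which prints a short status code for each file. The sketch below assumes a Unix-like shell with Git installed; the throwaway directory, the placeholder identity, and forcing the first branch to be called master are choices made for the demo, not part of the tutorial's setup. The fourth zone (the remote) is left out because it needs a remote repository.

```shell
# Throwaway demo repository (hypothetical location from mktemp)
repo=$(mktemp -d) && cd "$repo"
git -c init.defaultBranch=master init -q     # call the first branch master, as this tutorial does
git config user.name "Jane Doe"              # placeholder identity, local to this repository
git config user.email "jdoe@example.com"

echo 'Once upon a time,' > chapter-1.txt
# Working directory: the file exists, but Git is not tracking it yet
git status --porcelain                       # ?? chapter-1.txt

# Staging area: the file is prepared for the next commit
git add chapter-1.txt
git status --porcelain                       # A  chapter-1.txt

# Local repository: the snapshot is recorded as a commit
git commit -q -m 'Initial commit'
git status --porcelain                       # prints nothing: no pending changes
git log --oneline
```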

Exercise

  1. Removing changes that have been prepared to be included in the next commit is called ____ those changes.
  2. In a typical Git project with 4 zones, ____ of them are stored on your computer (as opposed to the cloud).

Branches

Suppose that you and a colleague begin working on different parts of a project at the same time. The commits you make and the commits they make might share a parent (namely, the latest commit at the time when you begin working). If we visualize the set of commits as a graph, this corresponds to a split in the graph.

A fork in the commit graph.

You can maintain these two separate lines of development in the same repository by labeling them as new branches, as illustrated in the figure above. A long-standing convention is to have a main branch called master and to give other branches descriptive names (many hosting services, including GitHub, now call the main branch main by default). A branch is a pointer to a particular commit. When a commit is added to a given branch, the pointer moves to the new commit:

A branch is a pointer to the latest commit in a given line of development. When a commit is added to a branch, the pointer moves to that new commit.
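To see that a branch really is just a pointer, you can ask Git which commit a branch names (git rev-parse) before and after committing. A minimal sketch in a throwaway repository; the directory, placeholder identity, and file contents are illustrative assumptions.

```shell
# Throwaway demo repository (hypothetical location from mktemp)
repo=$(mktemp -d) && cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Jane Doe"              # placeholder identity
git config user.email "jdoe@example.com"

echo 'Once upon a time,' > chapter-1.txt
git add chapter-1.txt
git commit -q -m 'Initial commit'
before=$(git rev-parse master)               # the commit master points to now

echo 'there be dragons' >> chapter-1.txt
git commit -q -am 'Add a line'               # -a stages changes to tracked files
after=$(git rev-parse master)                # master has moved to the new commit

# The new commit's parent is the commit master used to point to
git rev-parse master~1                       # same hash as $before
echo "$before -> $after"
```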

Exercise
A branch is a ____.

Typically you will want to merge the changes from your branch back into master. In the example above, the mybranch commit is a descendant of the master commit. In this case there is no potential for conflicts, and the merge can be performed by simply pointing master to the same commit as mybranch. This is called a fast-forward merge. After merging, it's safe to delete the mybranch pointer.

If no commits have been added to master, the changes in mybranch can be merged into master by simply moving the master pointer forward. This is called a fast-forward merge.
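The fast-forward case can be reproduced with the mybranch scenario from the figure. This sketch uses git merge --ff-only, which refuses to merge unless a fast-forward is possible, so it doubles as a check that no merge commit was needed. The demo repository and its contents are hypothetical.

```shell
# Throwaway demo repository (hypothetical location from mktemp)
repo=$(mktemp -d) && cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Jane Doe"              # placeholder identity
git config user.email "jdoe@example.com"

echo 'Once upon a time,' > chapter-1.txt
git add chapter-1.txt
git commit -q -m 'Initial commit'

git checkout -q -b mybranch                  # create mybranch and switch to it
echo 'there be dragons' >> chapter-1.txt
git commit -q -am 'Add dragons'

git checkout -q master                       # master has no new commits of its own
git merge --ff-only mybranch                 # just moves the master pointer forward
git rev-parse master mybranch                # prints the same hash twice
git branch -d mybranch                       # safe: master points at the same commit
```

Without --ff-only, a plain git merge mybranch would also fast-forward here; the flag only makes the expectation explicit.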

After your branch is merged into master, your colleague wants to merge their branch as well. If you edited the same parts of the same files as your colleague, a decision will have to be made about which version of those sections to incorporate into master. Git handles this by putting conflict markers in the file, which look like:

<<<<<<< HEAD
The quick brown fox jumped over the lazy dog
=======
The brown fox jumped over the quick lazy dog
>>>>>>> mybranch

Your colleague will have to locate each conflicted section, edit it into the desired final text, remove the conflict markers, and then stage and commit the resulting files. This commit will have two parents, indicating the two commits that were merged.

If two branches have diverged, then changes from one branch (theirbranch) can be merged into the other (master). The result is a new merge commit.
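The following sketch reproduces a conflicting merge like the fox example above and then resolves it, assuming a throwaway repository; the file name story.txt and the branch name theirbranch are illustrative. git diff --name-only --diff-filter=U lists files with unresolved conflicts, and git rev-list --parents shows that the resulting merge commit has two parents.

```shell
# Throwaway demo repository (hypothetical location from mktemp)
repo=$(mktemp -d) && cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Jane Doe"              # placeholder identity
git config user.email "jdoe@example.com"

echo 'The brown fox jumped over the lazy dog' > story.txt
git add story.txt
git commit -q -m 'Initial commit'

git checkout -q -b theirbranch               # the colleague's line of development
echo 'The brown fox jumped over the quick lazy dog' > story.txt
git commit -q -am 'Their edit'

git checkout -q master                       # a different edit to the same line
echo 'The quick brown fox jumped over the lazy dog' > story.txt
git commit -q -am 'Our edit'

# Both branches changed the same line, so Git stops and asks us to resolve it
git merge theirbranch || echo 'merge stopped: resolve the conflicts'
git diff --name-only --diff-filter=U         # story.txt
cat story.txt                                # both versions, wrapped in conflict markers

# Resolve by writing the final text, then stage and commit
echo 'The quick brown fox jumped over the quick lazy dog' > story.txt
git add story.txt
git commit -q -m 'Merge theirbranch'
git rev-list --parents -n 1 HEAD             # merge commit hash, then its two parents
```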

We will discuss the commands for performing these operations in the Core Git workflow section below.

Exercise
Suppose that you make a copy of a popular repository on GitHub (called a fork), and you spend a couple of months working on a new feature in a new branch you create. If you propose to merge your new branch back into the master branch of the project (this is called a pull request), is it likely that the merge will be a fast-forward merge?

Configuring Git

When you first set up Git on your machine, there are a few configuration steps you'll want to take. The first is to tell Git your name and email address, which it records in every commit you make.

git config --global user.name "Jane Doe"
git config --global user.email "jdoe@gmail.com"

You might also want to turn on colors:

git config --global color.ui true
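To double-check which name, email, and color settings Git will use, you can read them back with git config. The sketch below writes them with --local into a throwaway repository so that your real global settings stay untouched; repository-local values take precedence over --global ones. The repository location is hypothetical.

```shell
# Throwaway demo repository (hypothetical location from mktemp)
repo=$(mktemp -d) && cd "$repo"
git init -q

# --local writes to this repository's .git/config only
git config --local user.name "Jane Doe"
git config --local user.email "jdoe@gmail.com"
git config --local color.ui true

# Read back the values Git will actually use in this repository
git config user.name                         # Jane Doe
git config user.email                        # jdoe@gmail.com
# List every setting together with the file it came from
git config --list --show-origin
```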

Core Git workflow

In this section, we'll work through all of the commands necessary to carry out the most common Git operations. We'll begin by creating a directory and initializing a Git repository inside it.

mkdir our-novel
cd our-novel
git init
ls -a

We can see that git init did create a .git directory. The other way to get a Git repository is to clone one from a website like GitHub.
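Cloning works the same way whether the source is a GitHub URL or a path on your own machine, so a local clone is an easy way to try it out. A minimal sketch, assuming a throwaway working directory; the directory names our-novel and our-novel-copy and the placeholder identity are illustrative.

```shell
# Throwaway working directory (hypothetical location from mktemp)
work=$(mktemp -d) && cd "$work"
git -c init.defaultBranch=master init -q our-novel   # creates the directory and the repository
cd our-novel
git config user.name "Jane Doe"              # placeholder identity
git config user.email "jdoe@example.com"
echo 'Once upon a time,' > chapter-1.txt
git add chapter-1.txt
git commit -q -m 'Initial commit'
cd ..

# git clone accepts a URL (such as a GitHub address) or, as here, a local path
git clone -q our-novel our-novel-copy
ls -a our-novel-copy                         # includes .git and chapter-1.txt
git -C our-novel-copy log --oneline          # the clone has the full history
```

For a repository on GitHub you would pass its URL instead, with your own username and repository name.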

Next, let's create a file for our initial commit. The Git command for staging a file is git add. The --all option stages all changes (new, modified, and deleted files) in the entire working tree, not just the current directory.

echo 'Once upon a time,' > chapter-1.txt
git add chapter-1.txt # or git add --all

We can inspect the status of our working directory and repository using git status.

git status

The contents of the staging area are indented under the heading Changes to be committed.

Now we can commit the staged changes, including a descriptive commit message with -m:

git commit -m 'Initial commit'

We can display a record of commits using git log.

git log

You'll notice that commits are identified by a long hexadecimal string like d9599305d257a40c0b394a1af78dfe995f0010c7. This string is a hash (SHA-1 by default) of all of the data relevant to the commit: its snapshot, its parent commits, the author and committer, timestamps, and the commit message. The name HEAD is a pointer to the branch you're currently on, so HEAD -> master indicates that the master branch is the currently checked-out branch.
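A few lower-level commands make the hash and HEAD discussion concrete. The sketch below assumes a throwaway repository with a single commit and Git's default SHA-1 object format (repositories created with the SHA-256 format use 64-character hashes instead); the directory and identity are placeholders.

```shell
# Throwaway demo repository (hypothetical location from mktemp)
repo=$(mktemp -d) && cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Jane Doe"              # placeholder identity
git config user.email "jdoe@example.com"
echo 'Once upon a time,' > chapter-1.txt
git add chapter-1.txt
git commit -q -m 'Initial commit'

git rev-parse HEAD                           # full 40-character hash of the current commit
git rev-parse --short HEAD                   # a shorter prefix that is still unique
git cat-file -p HEAD                         # raw commit: tree, author, committer, message
git symbolic-ref HEAD                        # refs/heads/master: HEAD points at a branch
```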

The output of the git log command is more helpful with a few of its options set to a non-default state. Let's go ahead and make a git alias so we don't have to type all of these options out every time. We'll use the name lol, which is a customary choice for this alias.

git config --global alias.lol "log --graph --decorate --all --oneline"
git lol

Finally, if we want to store a copy of the repository on GitHub, we visit github.com and create a new repository. Then we connect our local Git repository to the remote one we just created.

git remote add origin git@github.com:jovyan/MyRepo.git
git push --set-upstream origin master

where jovyan is replaced by your actual GitHub username and MyRepo by your repository's name. The first line connects your local repository to the remote one and names it origin. The second line pushes master to GitHub and sets origin/master as the upstream (tracking) branch of your local master. Note that the --set-upstream origin master part is only necessary on the first push; subsequent pushes can be done with plain git push.
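You can rehearse the push workflow without a GitHub account by using a bare repository (one with no working directory) on your own disk as the remote. In this sketch the local path ../MyRepo.git stands in for the GitHub URL, which is an assumption for the demo; the remaining commands are the ones shown above. master@{upstream} asks Git which remote branch master tracks.

```shell
# Throwaway working directory (hypothetical location from mktemp)
work=$(mktemp -d) && cd "$work"
# A bare repository plays the role of the GitHub repository
git -c init.defaultBranch=master init -q --bare MyRepo.git

git -c init.defaultBranch=master init -q our-novel
cd our-novel
git config user.name "Jane Doe"              # placeholder identity
git config user.email "jdoe@example.com"
echo 'Once upon a time,' > chapter-1.txt
git add chapter-1.txt
git commit -q -m 'Initial commit'

# Same two commands as above, with a local path in place of the GitHub URL
git remote add origin ../MyRepo.git
git push -q --set-upstream origin master

git rev-parse --abbrev-ref 'master@{upstream}'   # origin/master
git push                                     # later pushes need no arguments
```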

It's a good habit to begin each work session by running git pull to fetch any changes that have been pushed by collaborators to the remote repository and merge those changes into your current branch. This operation aborts if you have uncommitted changes in your working directory that the incoming changes would overwrite. One good way to resolve this issue is to stash your local changes, pull, and then reapply them.

git stash
git pull
git stash apply

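The command git stash records your uncommitted changes in a special commit that is not on any branch and restores a clean working directory. The command git stash apply reapplies the most recent stash to your working directory (git stash pop does the same and also removes that stash from the list).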
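The stash round trip can be checked in a throwaway repository without any remote. This sketch assumes a single committed file (directory, identity, and contents are placeholders) and shows that git stash apply leaves the stash in git stash list.

```shell
# Throwaway demo repository (hypothetical location from mktemp)
repo=$(mktemp -d) && cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Jane Doe"              # placeholder identity
git config user.email "jdoe@example.com"
echo 'Once upon a time,' > chapter-1.txt
git add chapter-1.txt
git commit -q -m 'Initial commit'

echo 'there be dragons' >> chapter-1.txt     # an uncommitted local change
git stash -q                                 # save it away
git status --porcelain                       # prints nothing: working directory is clean
git stash list                               # stash@{0}: WIP on master: ...

git stash apply -q                           # bring the change back
git status --porcelain                       #  M chapter-1.txt (modified, not staged)
```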

Exercise

  1. The command for initializing a new Git repository is ____.
  2. The command for checking which files are staged is ____.
  3. The command for staging a file is ____.
  4. The command for committing is ____.
  5. The command for showing a decorated history of commits is ____.

Git Branching Commands

Suppose we want to experiment with dragons in the novel's storyline. We can make a new branch called dragons for working on this idea.

git branch dragons
git lol

We've created a new branch called dragons, but we still have the master branch checked out (you can tell because HEAD still points to master). Let's switch to the new branch:

git checkout dragons
git lol

We can now add some dragon content and commit it:

printf '\n\nthere be dragons!\n' >> chapter-1.txt
git add chapter-1.txt
git commit -m 'Add some dragons'

Now let's switch back to the master branch and commit some different changes:

git checkout master
printf '\n\nin a galaxy far away\n' >> chapter-1.txt
git add chapter-1.txt
git commit -m 'Write another line'
git lol

Suppose we decide we do want to incorporate the dragons into the story. We want to merge the dragons branch into master.

While we have the master branch checked out, we run

git merge dragons

Git tells us that this merge led to conflicts, which we'll have to resolve before Git can create the merge commit. Let's look at the new contents of chapter-1.txt:

cat chapter-1.txt

The next step is to edit the file, stage it, and commit it. Typically you would edit the file in a text editor (we'll see a particularly good way to do it later in this course when we cover VS Code), but here we'll just use echo.

echo 'Once upon a time..., in a galaxy far away..., there be dragons!' > chapter-1.txt
git add chapter-1.txt
git commit -m "Merge the dragons into the story"
git lol

Now we can delete the dragons branch. Since branches are just pointers to commits, this operation does not result in the loss of any snapshots in our project history.

git branch -d dragons
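Deleting a branch with git branch -d is safe because Git refuses when the branch has commits that are not yet merged into the current branch. A minimal sketch in a throwaway repository; the branch names experiment and wizards, the identity, and the file contents are hypothetical.

```shell
# Throwaway demo repository (hypothetical location from mktemp)
repo=$(mktemp -d) && cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Jane Doe"              # placeholder identity
git config user.email "jdoe@example.com"
echo 'Once upon a time,' > chapter-1.txt
git add chapter-1.txt
git commit -q -m 'Initial commit'

git branch experiment                        # points at a commit master already contains
git checkout -q -b wizards
echo 'a wizard appears' >> chapter-1.txt
git commit -q -am 'Add a wizard'             # a commit that master does not have
git checkout -q master

git branch -d experiment                     # succeeds: nothing would be lost
git branch -d wizards || echo 'refused: wizards has unmerged commits'
```

If you really do want to discard an unmerged branch, git branch -D forces the deletion.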

Exercise
Write a sequence of Git commands to create two new branches, one with dragons in the story and one with wizards in the story. Commit a change to each branch, then merge the wizard branch into the dragons branch, and finally merge the dragons branch into master. Use git lol to confirm that your repository log reflects the wizards to dragons to master merging sequence.

Undoing changes

Suppose you want to have a look at the state of your novel one commit ago. You can refer to the commit any number of steps back from HEAD using a tilde followed by that number, as in HEAD~1 (for a merge commit, the tilde follows the first parent). The git show command lets us extract a single file from a given commit:

git show HEAD~1:chapter-1.txt

Alternatively, you can refer to a particular commit by a unique prefix of its hash (note that you'll have to run git lol to get an appropriate commit identifier for your session before you can run this command):

git show 06d23b9:chapter-1.txt
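The sketch below builds a two-commit history in a throwaway repository (contents are illustrative) and names the earlier commit three equivalent ways: HEAD~1, HEAD^ (first parent), and an abbreviated hash obtained with git rev-parse --short.

```shell
# Throwaway demo repository (hypothetical location from mktemp)
repo=$(mktemp -d) && cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Jane Doe"              # placeholder identity
git config user.email "jdoe@example.com"
echo 'Once upon a time,' > chapter-1.txt
git add chapter-1.txt
git commit -q -m 'Initial commit'
echo 'in a galaxy far away' >> chapter-1.txt
git commit -q -am 'Write another line'

git show HEAD~1:chapter-1.txt                # Once upon a time,
git rev-parse HEAD~1 HEAD^                   # the same hash printed twice
short=$(git rev-parse --short HEAD~1)        # abbreviated hash, like 06d23b9 above
git show "$short":chapter-1.txt              # Once upon a time,
```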

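We can see just the changes between two commits with git diff. Listing the older commit first makes the output show the changes made going from that commit to the newer one:

git diff HEAD~1 HEAD chapter-1.txt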
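The order of the two commits passed to git diff matters: the output describes how to turn the first into the second. This sketch, in a throwaway two-commit repository with illustrative contents, shows the same line appearing as an addition or a deletion depending on the order.

```shell
# Throwaway demo repository (hypothetical location from mktemp)
repo=$(mktemp -d) && cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Jane Doe"              # placeholder identity
git config user.email "jdoe@example.com"
echo 'Once upon a time,' > chapter-1.txt
git add chapter-1.txt
git commit -q -m 'Initial commit'
echo 'in a galaxy far away' >> chapter-1.txt
git commit -q -am 'Write another line'

# Older commit first: the new line shows up as an addition
git diff HEAD~1 HEAD -- chapter-1.txt | grep '^+in'    # +in a galaxy far away
# Swap the order and the same line appears as a deletion
git diff HEAD HEAD~1 -- chapter-1.txt | grep '^-in'    # -in a galaxy far away
# --stat summarizes which files changed and by how much
git diff --stat HEAD~1 HEAD
```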

Let's say you decide you want to go back to the version of a file two commits ago. You can check out a single file:

git checkout HEAD~2 chapter-1.txt
git status

This operation overwrites the file in your working directory and also stages that older version, so git status lists chapter-1.txt under Changes to be committed. You can commit that change directly, or edit the file further and then stage and commit.
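Because git checkout with a commit and a file path updates both the staging area and the working directory, the restored file shows up as already staged. A sketch in a throwaway three-commit repository (contents are illustrative); git restore, available in Git 2.23 and later, appears in a comment as a newer equivalent.

```shell
# Throwaway demo repository (hypothetical location from mktemp)
repo=$(mktemp -d) && cd "$repo"
git -c init.defaultBranch=master init -q
git config user.name "Jane Doe"              # placeholder identity
git config user.email "jdoe@example.com"
echo 'Once upon a time,' > chapter-1.txt
git add chapter-1.txt
git commit -q -m 'Initial commit'
echo 'in a galaxy far away' >> chapter-1.txt
git commit -q -am 'Write another line'
echo 'there be dragons' >> chapter-1.txt
git commit -q -am 'Add dragons'

git checkout HEAD~2 chapter-1.txt            # restore the version from two commits ago
git status --porcelain                       # M  chapter-1.txt (already staged)
cat chapter-1.txt                            # Once upon a time,
# Newer equivalent:
# git restore --source=HEAD~2 --staged --worktree chapter-1.txt
git commit -q -m 'Go back to the opening line'
```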

Exercise
Write a Git command to replace the contents of main.py with their contents four commits ago.

Solution. We check out the file at that commit: git checkout HEAD~4 main.py.

Exercise
Write a Git command to show the changes in the file main.py from four commits ago to two commits ago.

Solution. We use git diff and specify the two revisions: git diff HEAD~4 HEAD~2 main.py.
