This section looks "under the hood" to understand how Git stores your data. This is not required for daily use, but it makes you a much better Git user.
When you run git init, Git creates a hidden .git directory. This directory is the entire repository. It contains all your commits, branches, tags, and configuration.
- All of your project's historical data is stored in the
.git/objectsdirectory. - These "objects" are not your files directly. They are compressed, read-only data objects. There are 3 main types we care about: Blobs, Trees, and Commits.
Why does Git separate the file name from its content? This is the core concept of Git's design.
A blob (Binary Large Object) stores only the content of a file.
- It does not know the file's name, its location, or its permissions.
- Its hash (e.g.,
83baae...) is a SHA-1 hash of only its content. - This is key: If two files in your project have the exact same content, Git stores only one blob and points to it from two different places. This is called deduplication.
A tree is Git's way of storing a directory.
- A tree object contains a list of pointers.
- Each pointer includes:
- The mode (e.g.,
100644for a file). - The type (is it a
blobor anothertree?). - The hash of the blob or tree it's pointing to.
- The file name.
- The mode (e.g.,
- The tree object links a file name to a blob (which is just content). This is incredibly efficient.
A commit object ties everything together into a "snapshot".
- Each commit has a unique SHA-1 hash (e.g.,
3a3cbb8...). - This hash is generated from all the information in the commit:
- A tree hash: A single pointer to the top-level tree that represents the entire state of your project at that moment.
- A parent hash: The hash of the commit that came before this one. This is what creates the chain of history. (The first commit has no parent).
- Author/Committer info: Your name, email, and the timestamp.
- The commit message you wrote.
The git cat-file command lets you inspect these raw Git objects. The -p flag stands for "pretty-print," which makes them readable.
Let's trace our first commit (3a3cbb8):
Step 1: Inspect the Commit We ask Git to show us the commit object.
$ git cat-file -p 3a3cbb8
tree 9152961f6b8b059e761ed8a77d4c3d7e48600a7b
author Your Name <your@email.com> 1763153000 +0100
committer Your Name <your@email.com> 1763153000 +0100
A: content added- This tells us the commit points to a
treewith the hash9152961....
Step 2: Inspect the Tree Now let's inspect that tree.
$ git cat-file -p 9152961
100644 blob 83baae61804e65cc73a7201a7252750c76066a30 content.txt- This tree object tells us it contains one item: a
blobwith hash83baae6...and the file name iscontent.txt.
Step 3: Inspect the Blob Finally, let's inspect that blob.
$ git cat-file -p 83baae6
This is my first file- And there is our raw file content! This shows the full hierarchy: Commit -> Tree -> Blob.
Now, let's add a second file (titles.md) and make a new commit ("B: titles added").
Our log now looks like this:
$ git log --oneline
db540cd (HEAD -> main) B: titles added
3a3cbb8 A: content addedStep 1: Inspect the New Commit
Let's cat-file our new commit, db540cd.
$ git cat-file -p db540cd
tree c6a58c8f139c614485b636f652df7b589d9f8b21
parent 3a3cbb8c4cd19dca31992d67ee9f54396ca48033
author Your Name <your@email.com> 1763153435 +0100
committer Your Name <your@email.com> 1763153435 +0100
B: titles added- Key Insight 1: It now has a
parenthash! This is3a3cbb8, the hash of our first commit. This is the link that forms the project's history. - It also points to a new tree,
c6a58c8....
Step 2: Inspect the New Tree Let's inspect this new tree.
$ git cat-file -p c6a58c8
100644 blob 83baae61804e65cc73a7201a7252750c76066a30 content.txt
100644 blob e9c5221e7883b5523363b3617e4f3a076a5966c4 titles.md- Key Insight 2: This tree lists both files.
- Key Insight 3 (Deduplication): Look at the blob hash for
content.txt. It's83baae6...—the exact same hash as before. We didn't change the file, so Git did not save it again. It just re-used the pointer to the existing blob. It only created a new blob fortitles.md.
This is what "Git stores an entire snapshot" means. Every commit points to a tree that defines the entire project, but it does so efficiently by re-using any objects (blobs or trees) that haven't changed.
If you want to see the changes in a commit without doing cat-file, use git log -p.
- The
-pflag (for "patch") shows the commit message and the diff (the exact lines that were added or removed) for each commit in your history.
$ git log -p
commit db540cd...
Author: ...
Date: ...
B: titles added
diff --git a/titles.md b/titles.md
new file mode 100644
index 0000000..e9c5221
--- /dev/null
+++ b/titles.md
@@ -0,0 +1 @@
+My New Title
commit 3a3cbb8...
Author: ...
Date: ...
A: content added
...