How git clone Really Works: A Deep Dive into Git’s Object Database

Published: (December 11, 2025 at 01:22 AM EST)
Source: Dev.to


What git clone Actually Does

When you run git clone, Git performs the following steps:

  1. Negotiates with the remote to discover available references (branches, tags).
  2. Downloads the full object graph — all commits, trees, and blobs reachable from those references — efficiently packed and delta‑compressed.
  3. Writes these objects into .git/objects/pack/, sets up local refs and HEAD, and then checks out a working directory from the root tree of the checked‑out commit.

In essence:

clone = copy the object graph + set references + checkout the working tree

The Git Object Model: Core Building Blocks

Git is a content‑addressed database, not a traditional filesystem. Every file, directory, commit, and tag exists as an immutable object identified by a cryptographic hash (SHA‑1 or SHA‑256). This makes Git’s data model tamper‑evident, deduplicated, and verifiable.

Type     Purpose              Contains
Blob     File data            Raw bytes and a header
Tree     Directory snapshot   Mode, name, and object IDs for children
Commit   Snapshot metadata    Author, message, parent commits, root tree
Tag      Annotated reference  Tag message and pointer
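
As a rough illustration of content addressing, the sketch below computes a blob's object ID the way Git does in a SHA-1 repository: it hashes a short header (the word blob, the content length, and a NUL byte) followed by the raw bytes. The function name git_blob_id is made up for this example.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    print(git_blob_id(b""))         # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    print(git_blob_id(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```

These should match what git hash-object prints for an empty file and for a file containing the single line "hello".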

The Object Graph

commit C
├── tree -> T_root
│            ├── mode 100644 "README.md" -> blob B1
│            ├── mode 100755 "build.sh"  -> blob B2
│            └── mode 040000 "src"       -> tree T_src
│                                            ├── "main.go" -> blob B3
│                                            └── "util.go" -> blob B4
└── parent -> commit P
               ├── tree -> T_prev
               └── parent -> ...

Key ideas

  • A commit points to a tree, which represents a snapshot of the repository.
  • Trees point to blobs (files) or other subtrees (directories).
  • Commits form a Directed Acyclic Graph (DAG) through parent references.
  • Identical content produces identical hashes, so Git automatically reuses objects.
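
The ideas above can be sketched in a few lines. This example serializes tree objects by hand and shows that two files with identical content share one blob ID. The helper names (object_id, tree_payload) and the file names are illustrative, and the sorting rule is simplified to what this small example needs.

```python
import hashlib


def object_id(kind: str, payload: bytes) -> str:
    """Hash '<kind> <size>' + NUL + payload, as Git does in a SHA-1 repository."""
    header = f"{kind} {len(payload)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


def tree_payload(entries) -> bytes:
    """Serialize (mode, name, hex_id) entries as '<mode> <name>' NUL <20 raw bytes>."""
    def sort_key(entry):
        mode, name, _ = entry
        # Git orders entries by name, comparing directories as if they ended in '/'.
        return name.encode() + (b"/" if mode == "40000" else b"")

    return b"".join(
        mode.encode() + b" " + name.encode() + b"\0" + bytes.fromhex(hex_id)
        for mode, name, hex_id in sorted(entries, key=sort_key)
    )


readme = object_id("blob", b"hello\n")
copy_of_readme = object_id("blob", b"hello\n")  # identical content, identical ID

# Directories are stored with mode 40000 (tools display it as 040000).
src_tree = object_id("tree", tree_payload([("100644", "main.go", readme)]))
root_tree = object_id("tree", tree_payload([
    ("100644", "README.md", readme),
    ("100644", "COPY.md", copy_of_readme),
    ("40000", "src", src_tree),
]))

EMPTY_TREE = object_id("tree", tree_payload([]))
```

Because a tree's ID depends on its children's IDs, changing any blob changes every tree and commit above it, while unchanged subtrees keep their IDs and are reused.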

How git clone Communicates with the Remote

The clone operation is a structured conversation between your Git client and the remote server.

The remote server advertises:

  • Its available references (e.g., refs/heads/main, refs/tags/v1.0)
  • Supported capabilities (e.g., side-band, ofs-delta, multi_ack)

Negotiation Phase

The client responds with:

  • Wants: the commits it needs (the tips of the refs it is cloning)
  • Haves: the commits it already has (none for a fresh clone; these matter for later fetches)
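
On the wire, lines such as wants and haves are framed as pkt-lines: four hex digits giving the total length (prefix included), then the payload, with 0000 acting as a flush marker. The sketch below encodes and decodes that framing. The function names are made up for this example, and the object ID is only a placeholder.

```python
def pkt_line(payload: bytes) -> bytes:
    """Frame payload as a pkt-line: 4 hex digits of total length, then the payload."""
    length = len(payload) + 4  # the length prefix counts itself
    if length > 65520:
        raise ValueError("payload too large for one pkt-line")
    return f"{length:04x}".encode("ascii") + payload


FLUSH_PKT = b"0000"  # special packet that ends a section of the conversation


def read_pkt_lines(stream: bytes):
    """Yield payloads from a byte string of pkt-lines; yield None for special packets."""
    pos = 0
    while pos < len(stream):
        length = int(stream[pos:pos + 4], 16)
        if length < 4:  # 0000 flush (protocol v2 adds a few more special values)
            yield None
            pos += 4
        else:
            yield stream[pos + 4:pos + length]
            pos += length


placeholder_id = "ce013625030ba8dba906f756967f9e9ca394464a"  # not a real commit
wire = pkt_line(f"want {placeholder_id}\n".encode()) + FLUSH_PKT
```

A want line for a 40-character object ID is 50 bytes in total, which is why captured Git traffic so often starts with 0032want.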

The server analyzes the commit graph to determine exactly which objects the client lacks.
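
Conceptually, the server's answer is a set difference: everything reachable from the wants, minus everything reachable from the haves. The sketch below shows that idea on a toy graph whose object names mirror the minimal example later in this article. Real servers use smarter traversal and bitmaps, so treat this as an illustration only.

```python
def reachable(graph: dict, roots) -> set:
    """Return every object ID reachable from roots by following references."""
    seen = set()
    stack = list(roots)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        stack.extend(graph.get(oid, ()))
    return seen


def objects_to_send(graph: dict, wants, haves) -> set:
    """Objects the client lacks: reachable from wants but not from haves."""
    return reachable(graph, wants) - reachable(graph, haves)


# Toy graph: each object maps to the objects it points at.
GRAPH = {
    "C1": ["T1", "C0"],  # commit -> root tree, parent commit
    "C0": ["T0"],
    "T1": ["B2"],        # tree -> blob
    "T0": ["B1"],
    "B1": [],
    "B2": [],
}

fresh_clone = objects_to_send(GRAPH, wants=["C1"], haves=[])
later_fetch = objects_to_send(GRAPH, wants=["C1"], haves=["C0"])
```

With no haves, the whole graph is sent; once the client reports C0, only C1, T1, and B2 remain.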

Packfile Transfer Phase

The server:

  • Gathers all reachable objects from the requested commits
  • Delta‑compresses them for efficient transfer
  • Streams a single .pack file to the client

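The client writes this pack into .git/objects/pack/ and builds a matching index (.idx) locally so that objects can be looked up by ID:

.git/objects/pack/pack-XXXX.pack
.git/objects/pack/pack-XXXX.idx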
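
A .pack file starts with a 12-byte header (the signature PACK, a version, and an object count, all big-endian) and, in a SHA-1 repository, ends with a 20-byte checksum of everything before it. The sketch below builds the smallest valid pack, one with zero objects, and checks it. Both function names are invented for this example, and the object entries themselves are not parsed.

```python
import hashlib
import struct


def build_empty_pack() -> bytes:
    """Build a version-2 pack containing zero objects, with a SHA-1 trailer."""
    header = b"PACK" + struct.pack(">II", 2, 0)  # signature, version, object count
    return header + hashlib.sha1(header).digest()


def inspect_pack(data: bytes):
    """Return (version, object_count, checksum_ok) for a SHA-1 pack held in memory."""
    if data[:4] != b"PACK":
        raise ValueError("not a packfile")
    version, count = struct.unpack(">II", data[4:12])
    body, trailer = data[:-20], data[-20:]
    return version, count, hashlib.sha1(body).digest() == trailer


empty_pack = build_empty_pack()
```

Running inspect_pack on the bytes of a real pack-XXXX.pack file should report its version and object count in the same way.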

Protocol Flow Overview

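Client                          Server
  |          ls-refs              |
  |------------------------------>|
  |       refs + capabilities     |
  |<------------------------------|
  |           want(s)             |
  |------------------------------>|
  |           have(s)             |
  |------------------------------>|
  |        ACK/NAK + pack         |
  |<------------------------------|

After the transfer, the new repository's .git directory looks roughly like this:

.git
├── HEAD                 -> "ref: refs/heads/main"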
├── config               -> [remote "origin"]
├── refs
│   ├── heads/main
│   ├── remotes/origin/main
│   └── tags/
└── objects
    ├── pack/
    │   ├── pack-XYZ.pack
    │   └── pack-XYZ.idx
    └── info/

Key components

  • .git/objects/pack: packed object store
  • .git/refs/heads: local branches
  • .git/refs/remotes/origin: remote‑tracking branches
  • .git/index: staging cache
  • .git/HEAD: symbolic reference to the current branch
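
To make the role of HEAD concrete, the sketch below resolves a symbolic HEAD to a commit ID inside a throwaway directory that mimics this layout. It is deliberately simplified: it only reads loose ref files and ignores .git/packed-refs, where a freshly cloned repository may keep some of its refs. The function name and the commit ID are placeholders.

```python
import tempfile
from pathlib import Path


def resolve_head(git_dir: Path) -> tuple[str, str]:
    """Return (ref_name, commit_id) for a HEAD that points at a loose ref file."""
    head = (git_dir / "HEAD").read_text().strip()
    if not head.startswith("ref: "):
        return "HEAD", head  # detached HEAD: the file holds an object ID directly
    ref_name = head[len("ref: "):]
    return ref_name, (git_dir / ref_name).read_text().strip()


# Build a temporary directory shaped like a minimal .git folder.
with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"
    (git_dir / "refs" / "heads").mkdir(parents=True)
    (git_dir / "HEAD").write_text("ref: refs/heads/main\n")
    fake_commit = "ce013625030ba8dba906f756967f9e9ca394464a"  # placeholder ID
    (git_dir / "refs" / "heads" / "main").write_text(fake_commit + "\n")
    resolved = resolve_head(git_dir)
```

In a real repository, git rev-parse HEAD performs this lookup and also handles packed refs.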

How Git Checkout Creates Files

The checkout process transforms database objects into real files:

  1. Read HEAD → resolve branch → resolve commit
  2. Read the commit’s root tree
  3. Traverse the tree and write each blob to the working directory
  4. Cache path–blob mappings in the index

HEAD -> refs/heads/main -> commit C -> tree T_root
                                   |-> blobs -> files in the working tree
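
The traversal can be sketched with a toy in-memory object store. The IDs (T_root, B1, and so on) are labels borrowed from the diagrams in this article rather than real hashes, and the index here is just a dictionary from path to blob ID, which is far simpler than Git's actual index format.

```python
import os
import tempfile
from pathlib import Path

# Toy object store: ID -> ("blob", bytes) or ("tree", [(mode, name, child_id), ...]).
OBJECTS = {
    "B1": ("blob", b"# demo\n"),
    "B2": ("blob", b"#!/bin/sh\necho build\n"),
    "B3": ("blob", b"package main\n"),
    "T_src": ("tree", [("100644", "main.go", "B3")]),
    "T_root": ("tree", [
        ("100644", "README.md", "B1"),
        ("100755", "build.sh", "B2"),
        ("040000", "src", "T_src"),
    ]),
}


def checkout_tree(tree_id: str, dest: Path, index: dict, prefix: str = "") -> None:
    """Write every blob under tree_id into dest and record path -> blob ID."""
    _, entries = OBJECTS[tree_id]
    for mode, name, child_id in entries:
        rel_path = prefix + name
        if mode == "040000":  # subtree: create the directory and recurse
            (dest / rel_path).mkdir(parents=True, exist_ok=True)
            checkout_tree(child_id, dest, index, rel_path + "/")
        else:  # blob: write the file contents
            target = dest / rel_path
            target.write_bytes(OBJECTS[child_id][1])
            if mode == "100755":
                os.chmod(target, 0o755)  # executable file
            index[rel_path] = child_id


with tempfile.TemporaryDirectory() as tmp:
    index = {}
    checkout_tree("T_root", Path(tmp), index)
    written = sorted(
        p.relative_to(tmp).as_posix() for p in Path(tmp).rglob("*") if p.is_file()
    )
```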
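
Inside a packfile, similar objects are delta-compressed: each object is stored either in full or as a delta against a base object.

[header]
[OBJ_A full]
[OBJ_B delta -> base OBJ_A]
[OBJ_C full]
...
[checksum]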

This mechanism significantly reduces both disk usage and network transfer size.

Data Integrity and Security

  • Every object’s hash covers both its header and content—change any byte, and the hash changes.
  • Commits link via parent hashes, creating a verifiable chain of trust.
  • Tools such as git fsck and git verify-pack detect corruption.
  • Signed commits and tags add cryptographic authenticity.

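Git’s integrity model rests on hash linkage: altering any object changes its ID and the ID of every object that points to it, so the guarantee is as strong as the underlying hash function.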
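
A much-reduced version of the check that git fsck performs can be sketched as follows: decompress a stored object, hash it again, and compare the result with the name it was stored under. The example uses the loose-object encoding (zlib over header plus content) rather than a packfile, and the function names are made up.

```python
import hashlib
import zlib


def encode_loose(kind: str, content: bytes) -> tuple[str, bytes]:
    """Return (object_id, zlib-compressed bytes) in Git's loose-object encoding."""
    raw = f"{kind} {len(content)}".encode("ascii") + b"\0" + content
    return hashlib.sha1(raw).hexdigest(), zlib.compress(raw)


def is_intact(object_id: str, stored: bytes) -> bool:
    """Re-hash the decompressed bytes and compare with the ID they are stored under."""
    try:
        raw = zlib.decompress(stored)
    except zlib.error:
        return False
    return hashlib.sha1(raw).hexdigest() == object_id


oid, stored = encode_loose("blob", b"hello\n")
# Simulate tampering: change one character, then recompress.
tampered = zlib.compress(zlib.decompress(stored).replace(b"hello", b"hellp"))
```

The tampered bytes are still valid zlib data, but they no longer hash to the original ID, so the check fails.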

Example: Minimal Repository Flow

  1. Initial commit C0 → tree T0 → blob B1 (README)
  2. Next commit C1 (parent C0) → tree T1 → blob B2 (modified README)
  3. Server packs {C1, C0, T1, T0, B2, B1}
  4. Client writes pack → sets refs → checks out C1 → files appear
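
The same flow can be reproduced in memory. The sketch below serializes two commits, two trees, and two blobs into a dictionary keyed by object ID. The author identity, timestamp, file contents, and helper names are all invented so that the result is reproducible.

```python
import hashlib

STORE = {}  # object ID -> serialized object (header + payload)


def put(kind: str, payload: bytes) -> str:
    """Store an object under its SHA-1 ID and return that ID."""
    raw = f"{kind} {len(payload)}".encode("ascii") + b"\0" + payload
    oid = hashlib.sha1(raw).hexdigest()
    STORE[oid] = raw  # storing identical content twice changes nothing
    return oid


def put_tree(readme_id: str) -> str:
    """Store a tree with a single entry: a regular file named README."""
    return put("tree", b"100644 README\0" + bytes.fromhex(readme_id))


def put_commit(tree_id: str, parents, message: str) -> str:
    """Store a commit object pointing at tree_id and its parent commits."""
    who = "Example Author <author@example.com> 1700000000 +0000"  # made up
    lines = [f"tree {tree_id}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {who}", f"committer {who}", "", message, ""]
    return put("commit", "\n".join(lines).encode())


B1 = put("blob", b"first draft\n")
T0 = put_tree(B1)
C0 = put_commit(T0, [], "Initial commit")

B2 = put("blob", b"second draft\n")
T1 = put_tree(B2)
C1 = put_commit(T1, [C0], "Update README")
```

After this runs, STORE holds the six objects from step 3, and C1's serialized form contains C0's ID on its parent line.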

Visual summary

refs/heads/main -> C3 -> C2 -> C1 -> C0

Each commit points to its root tree; trees link to blobs; references point to commits—forming a single, content‑addressed DAG.

Key Mental Models

  • Git is a database, not a filesystem. Every file, directory, and commit is an immutable object in a key–value store.
  • Cloning = graph download + reference binding. You fetch an object graph, then assign human‑readable names (branches, tags).
  • The working tree = a view of one tree object. Switching branches simply changes which tree object you’re viewing.
  • The index = a performance cache. It speeds up diffing and staging by tracking file stats and blob IDs.

Closing Thoughts

git clone doesn’t just copy files. It reconstructs a graph‑based database of snapshots, hashes, and relationships. Understanding this process gives you a more predictable, transparent view of how Git actually manages your code—and why it’s so efficient at doing so.
