A Guide to Git and GitHub for Data Analysts

Published: (January 17, 2026 at 03:59 AM EST)
Source: Dev.to

What is Git and Why Version Control Matters

Version Control is a system that records changes to a file or set of files over time so that you can recall specific versions later.

Git is a Distributed Version Control System (DVCS). Unlike a centralized system, where the history lives on a single server, every developer’s computer holds a full copy of the project history.

Why is this important?

  • The “Undo” Button: If you break your code at 2 AM, you can quickly restore the project to the state it was in at 10 PM, provided you committed that state.
  • Collaboration: Multiple data analysts can work on the same file simultaneously. Git compares each person’s changes line by line and merges (combines) them automatically, asking you to resolve a conflict only when two people edited the same lines.
  • Branching: You can create parallel universes (branches) to test ideas without breaking the main working code.
  • Context: It tells you who wrote a line of code, when, and importantly, why (via commit messages).
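
The benefits listed above can be tried out in a throwaway repository. The following is a minimal sketch, not a prescribed workflow: the directory name demo, the file query.sql, the branch name experiment, and the "Demo Analyst" identity are all placeholders invented for illustration.

```shell
# Work in a temporary directory so no real project is touched
cd "$(mktemp -d)"
git init -q demo && cd demo
git config user.name "Demo Analyst"        # placeholder identity
git config user.email "demo@example.com"

# Two commits: a working version, then a "2 AM" mistake
echo "SELECT * FROM sales;" > query.sql
git add query.sql
git commit -q -m "Add sales query"
echo "SELECT * FROM salez;" > query.sql
git commit -q -am "Rename table in sales query"

# Context: every commit records who changed what, when, and why
git log --oneline

# Undo: add a new commit that reverses the most recent one
git revert --no-edit HEAD
cat query.sql                              # back to the working query

# Branching: try an idea without touching the main line of work
# (newer Git versions also offer: git switch -c experiment)
git checkout -b experiment
```

Note that git revert keeps the mistaken commit in the history and records its reversal as a third commit, so nothing is lost.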

Note on Git vs. GitHub

  • Git is the tool (the software installed on your machine).
  • GitHub is the service (a website that hosts Git repositories in the cloud). Think of it as: Git is MP3, GitHub is Spotify.

How to Track Changes (The Git Workflow)

Tracking changes in Git follows a three‑stage process:

  • Working Directory: Where you edit files.
  • Staging Area (Index): Where you choose what to save.
  • Repository: Where committed snapshots are stored permanently (the .git directory). HEAD points to the latest commit on your current branch.
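
To watch a file move through these three stages, git status --short can be run after each step. This is an illustrative sketch only: the directory stages, the file report.csv, and the identity settings are assumptions, not part of the original guide.

```shell
cd "$(mktemp -d)"
git init -q stages && cd stages
git config user.name "Demo Analyst"        # placeholder identity
git config user.email "demo@example.com"

# 1. Working directory: the file exists but Git is not tracking it yet
echo "region,revenue" > report.csv
git status --short          # shows: ?? report.csv

# 2. Staging area: the file is selected for the next snapshot
git add report.csv
git status --short          # shows: A  report.csv

# 3. Repository: the snapshot is stored and HEAD points to it
git commit -q -m "Add report header"
git status --short          # prints nothing: the working tree is clean
git log --oneline
```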

Common Commands

# Initialize a new Git repository
git init

# Show the status of your files
git status

Step A – Staging

# Add a specific file
git add main.py

# Or add all new and changed files in the current directory and its subdirectories
git add .

Step B – Committing

# Create a permanent snapshot with a message
git commit -m "Implement the quadratic formula function"

Best Practice: Write commit messages in the imperative mood (e.g., “Add feature” not “Added feature”).
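
One way to keep commit messages short and imperative is to stage related changes separately, so that each snapshot has a single purpose. The sketch below assumes two hypothetical scripts, clean.py and plot.py, and a placeholder identity; it is one possible habit rather than a rule.

```shell
cd "$(mktemp -d)"
git init -q commits && cd commits
git config user.name "Demo Analyst"        # placeholder identity
git config user.email "demo@example.com"

echo "print('clean data')" > clean.py
echo "print('plot data')" > plot.py

# Stage and commit each file on its own
git add clean.py
git diff --staged --stat      # review exactly what will be committed
git commit -q -m "Add data cleaning script"

git add plot.py
git commit -q -m "Add plotting script"

# List commit subjects only, newest first
git log --format=%s
```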

How to Push Code to GitHub

“Pushing” uploads your local repository history to a remote server (GitHub).

Prerequisite: Create a new empty repository on GitHub.com.

Step A – Connect Local to Remote

git remote add origin https://github.com/cyrusz55/my-project.git

Step B – Push the Code

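git push -u origin main

  • origin – the conventional name for the remote you just added (here, the GitHub repository).
  • main – the branch you are sending. GitHub uses main as its default branch name; if your local branch is still called master (the default in older Git versions), rename it first with git branch -M main.
  • -u – sets the upstream, allowing you to run git push later without extra arguments.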
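
The connect-and-push steps can be rehearsed offline before touching GitHub. In this sketch a local bare repository (remote.git) stands in for the empty GitHub repository; that substitution, the README.md file, and the identity settings are assumptions made so the example runs without a network or an account.

```shell
cd "$(mktemp -d)"
git init -q --bare remote.git              # stand-in for the empty GitHub repository
REMOTE="$(pwd)/remote.git"

git init -q my-project && cd my-project
git config user.name "Demo Analyst"        # placeholder identity
git config user.email "demo@example.com"
echo "# my-project" > README.md
git add README.md
git commit -q -m "Add README"

# Older Git versions name the first branch master; rename it so the push matches
git branch -M main

# Connect and push; with GitHub, the URL would be the https://github.com/... address
git remote add origin "$REMOTE"
git push -q -u origin main

git remote -v                 # lists origin for fetch and push
git status -sb                # first line shows: ## main...origin/main
```

Once the upstream is set, a plain git push from this branch sends new commits to the same remote branch.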

How to Pull Code from GitHub

“Pulling” downloads data from GitHub to your computer. There are two common scenarios.

Scenario A – Starting from Scratch (git clone)

git clone https://github.com/cyrusz55/my-project.git

This command creates a new folder, initializes a repository in it, sets up the remote link (origin), and downloads the full history in one step.

Scenario B – Updating Existing Code (git pull)

git pull origin main

This fetches new commits from the main branch on origin and merges them into your current local branch.
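
Both scenarios can be simulated locally. The sketch below again uses a local bare repository in place of GitHub, with two hypothetical analysts (alice and bob) cloning it; all names and files are invented for illustration. It splits the update into git fetch followed by git merge, which is what git pull does by default unless it has been configured to rebase.

```shell
cd "$(mktemp -d)"
BASE="$(pwd)"

# Build a stand-in "GitHub" repository with one commit on main
git init -q seed && cd seed
git config user.name "Demo Analyst"        # placeholder identity
git config user.email "demo@example.com"
echo "v1" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
git branch -M main
cd "$BASE"
git clone -q --bare seed remote.git

# Scenario A: two analysts start from scratch with git clone
git clone -q remote.git alice
git clone -q remote.git bob

# Alice publishes a change
cd "$BASE/alice"
git config user.name "Alice"               # placeholder identity
git config user.email "alice@example.com"
echo "v2" >> notes.txt
git commit -q -am "Update notes"
git push -q origin main

# Scenario B: Bob updates his existing copy in two explicit steps
cd "$BASE/bob"
git fetch -q origin           # download new commits without touching local files
git merge -q origin/main      # fast-forward the local branch to match
cat notes.txt                 # now contains both v1 and v2
```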

Summary Cheatsheet

Goal             Command
Start Git        git init
Check status     git status
Stage files      git add .
Save snapshot    git commit -m "message"
Download repo    git clone
Upload changes   git push
Update local     git pull

Happy coding! 🚀
