A Guide to Git and GitHub for Data Analysts
Source: Dev.to
What is Git and Why Version Control Matters
Version Control is a system that records changes to a file or set of files over time so that you can recall specific versions later.
Git is a Distributed Version Control System (DVCS). Unlike a centralized system, where the full history lives on a single server, every developer’s computer has a full copy of the project history.
Why is this important?
- The “Undo” Button: If you break your code at 2 AM, you can revert the project to the state it was in at 10 PM, provided you committed that state.
- Collaboration: Multiple data analysts can work on the same file simultaneously. Git merges (combines) their changes automatically when they touch different parts of the file, and flags a conflict for you to resolve when they overlap.
- Branching: You can create parallel universes (branches) to test ideas without breaking the main working code.
- Context: It tells you who wrote a line of code, when, and importantly, why (via commit messages).
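The benefits above can be tried out in a throwaway repository. The following is a minimal sketch, not a prescribed workflow: the file name analysis.py, its contents, the branch name experiment, and the "Demo Analyst" identity are all placeholders chosen for illustration.

```shell
# Work in a temporary directory so no real project is touched.
demo="$(mktemp -d)"
cd "$demo"
git init --quiet
git config user.name "Demo Analyst"          # placeholder identity
git config user.email "demo@example.com"     # placeholder identity

# Save a working version of a (hypothetical) analysis script.
echo 'total = 1 + 1' > analysis.py
git add analysis.py
git commit --quiet -m "Add analysis script"

# Undo: break the file, then restore the last committed version.
echo 'total = 1 +' > analysis.py
git checkout -- analysis.py

# Branching: try an idea on a separate branch.
git checkout --quiet -b experiment
echo 'total = 2 * 1' > analysis.py
git commit --quiet -am "Try multiplication instead of addition"

# Context: who last changed each line, and the commit messages so far.
git blame analysis.py
git log --oneline

# Switch back: the original branch still has the working version.
git checkout --quiet -
cat analysis.py
```

The final cat prints the original line, because the experiment lives only on its own branch.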
Note on Git vs. GitHub
- Git is the tool (the software installed on your machine).
- GitHub is the service (a website that hosts Git repositories in the cloud). Think of it as: Git is MP3, GitHub is Spotify.
How to Track Changes (The Git Workflow)
Tracking changes in Git follows a three‑stage process:
- Working Directory: Where you edit files.
- Staging Area (Index): Where you choose what to save.
- Repository: The committed history of your project. HEAD points to the most recent commit on your current branch.
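To see a file move through the three stages, you can watch the output of git status --short after each step. This is a small sketch in a temporary directory; the file contents and the "Demo Analyst" identity are placeholders.

```shell
# Work in a temporary directory so no real project is touched.
demo="$(mktemp -d)"
cd "$demo"
git init --quiet

# Working directory: the file exists, but Git does not track it yet.
echo 'print("hello")' > main.py
git status --short        # "?? main.py" means untracked

# Staging area: the file is queued for the next snapshot.
git add main.py
git status --short        # "A  main.py" means staged

# Repository: the snapshot is recorded in the history.
git -c user.name="Demo Analyst" -c user.email="demo@example.com" \
    commit --quiet -m "Add main.py"
git status --short        # no output: nothing left to commit
git log --oneline
```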
Common Commands
# Initialize a new Git repository
git init
# Show the status of your files
git status
Step A – Staging
# Add a specific file
git add main.py
# Or add all new and changed files in the current directory and its subdirectories
git add .
Step B – Committing
# Create a permanent snapshot with a message
git commit -m "Implement the quadratic formula function"
Best Practice: Write commit messages in the imperative mood (e.g., “Add feature” not “Added feature”).
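Steps A and B work together: whatever you leave unstaged stays out of the snapshot. The sketch below stages one of two new files, commits it, then commits the rest. The file notes.py, the file contents, and the "Demo Analyst" identity are placeholders for illustration.

```shell
# Work in a temporary directory so no real project is touched.
demo="$(mktemp -d)"
cd "$demo"
git init --quiet
git config user.name "Demo Analyst"          # placeholder identity
git config user.email "demo@example.com"     # placeholder identity

# Two new files: one ready to save, one still a draft.
echo 'import math' > main.py                 # stand-in for the real function
echo '# scratch notes' > notes.py

# Stage and commit only main.py.
git add main.py
git commit --quiet -m "Implement the quadratic formula function"
git status --short    # "?? notes.py": the draft was left out of the snapshot

# Later, stage everything that remains and commit it separately.
git add .
git commit --quiet -m "Add scratch notes"

# One line per commit, newest first.
git log --oneline
```

Read top to bottom, imperative messages in git log --oneline look like a list of instructions that rebuild the project.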
How to Push Code to GitHub
“Pushing” uploads your local repository history to a remote server (GitHub).
Prerequisite: Create a new empty repository on GitHub.com.
Step A – Connect Local to Remote
git remote add origin https://github.com/cyrusz55/my-project.git
Step B – Push the Code
git push -u origin main
- origin: the remote name (GitHub).
- main: the branch you are sending (formerly master).
- -u: sets the upstream, allowing you to run git push later without extra arguments.
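You can rehearse the push without a GitHub account. In this sketch a local bare repository stands in for the empty GitHub repository, which is an assumption made so the example runs offline; with a real project you would use your GitHub URL instead. The git branch -M main line renames the current branch to main, which helps if your Git version still creates master by default.

```shell
# A local bare repository stands in for the empty GitHub repository (assumption).
remote="$(mktemp -d)/my-project.git"
git init --quiet --bare "$remote"

# A small local project with one commit.
work="$(mktemp -d)"
cd "$work"
git init --quiet
git config user.name "Demo Analyst"          # placeholder identity
git config user.email "demo@example.com"     # placeholder identity
echo '# my-project' > README.md
git add README.md
git commit --quiet -m "Add README"

# Make sure the local branch is called main, whatever the Git default was.
git branch -M main

# Step A and Step B, pointed at the stand-in remote.
git remote add origin "$remote"
git push --quiet -u origin main

git remote -v                  # lists origin for fetch and push
git status --short --branch    # "## main...origin/main" shows the upstream is set
```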
How to Pull Code from GitHub
“Pulling” downloads data from GitHub to your computer. There are two common scenarios.
Scenario A – Starting from Scratch (git clone)
git clone https://github.com/cyrusz55/my-project.git
This command initializes a repository, creates the remote link, and downloads the full history in one step.
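To check what a clone sets up, you can clone a small local repository and inspect the result. In this sketch the seed repository is a stand-in for the GitHub project (an assumption so the example runs offline), and the "Demo Analyst" identity is a placeholder.

```shell
# Build a small stand-in for the GitHub repository, with one commit (assumption).
seed="$(mktemp -d)"
git -C "$seed" init --quiet
echo '# my-project' > "$seed/README.md"
git -C "$seed" add README.md
git -C "$seed" -c user.name="Demo Analyst" -c user.email="demo@example.com" \
    commit --quiet -m "Add README"
git -C "$seed" branch -M main

# Clone it, the same way you would clone the GitHub URL.
cd "$(mktemp -d)"
git clone --quiet "$seed" my-project
cd my-project

git remote -v        # origin was created automatically
git log --oneline    # the history came along with the files
```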
Scenario B – Updating Existing Code (git pull)
git pull origin main
This fetches new commits from the remote and merges them into your current local branch.
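With default settings, a pull behaves roughly like a git fetch followed by a git merge. The sketch below runs the two steps separately so each is visible. The local bare repository remote.git stands in for GitHub, and the directory names mine and teammate, the file data.txt, and the identities are all placeholders.

```shell
base="$(mktemp -d)"
cd "$base"

# Stand-in remote with one commit on main (assumption: replaces GitHub).
git init --quiet seed
cd seed
git config user.name "Demo Analyst"          # placeholder identity
git config user.email "demo@example.com"     # placeholder identity
echo 'v1' > data.txt
git add data.txt
git commit --quiet -m "Add data file"
git branch -M main
cd ..
git clone --quiet --bare seed remote.git

# Your copy and a teammate's copy of the same project.
git clone --quiet remote.git mine
git clone --quiet remote.git teammate

# The teammate pushes a new commit.
cd teammate
git config user.name "Demo Teammate"         # placeholder identity
git config user.email "teammate@example.com" # placeholder identity
echo 'v2' > data.txt
git commit --quiet -am "Update data file"
git push --quiet origin main
cd ..

# Your copy is now behind. Update it in two explicit steps.
cd mine
git fetch --quiet origin          # download new commits into origin/main
git merge --quiet origin/main     # merge them into the local main branch
cat data.txt                      # now shows the teammate's version
```

Running git pull origin main in the mine directory instead of the last two git commands would produce the same end state here.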
Summary Cheatsheet
| Goal | Command |
|---|---|
| Start Git | git init |
| Check status | git status |
| Stage files | git add . |
| Save snapshot | git commit -m "message" |
| Download repo | git clone |
| Upload changes | git push |
| Update local | git pull |
Happy coding! 🚀