Linux Fundamentals for Data Engineering

Published: (June 11, 2026 at 09:15 PM EDT)
4 min read
Source: Dev.to


Introduction

As a data engineer, most of your work will happen [...] In this article, I will walk you through the [...]

Most data engineers start their journey on Windows. To install WSL, open Windows CMD as Administrator and run:

wsl --install -d Ubuntu-22.04

After installation, restart your PC. You can now run wsl in CMD. One important lesson I learned during setup: WSL [...] -sh instead of bash, you are running a minimal [...]

SSH (Secure Shell) is the standard way to connect [...] The basic SSH command syntax is:

ssh username@server_ip -p port_number

For example, to connect to our assignment server: ssh root@159.65.222.96 -p 22
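The syntax above can be wrapped in a small helper that fills in the default port. This is only a sketch: ssh_command is a hypothetical function name, and port 2222 is a made-up value used for illustration. It prints the command instead of running it, so it is safe to try anywhere.

```shell
# Hypothetical helper: prints the ssh command for a user, a host, and an
# optional port. The port falls back to 22, the default SSH port.
ssh_command() {
    printf 'ssh %s@%s -p %s\n' "$1" "$2" "${3:-22}"
}

ssh_command root 159.65.222.96          # prints: ssh root@159.65.222.96 -p 22
ssh_command briank 159.65.222.96 2222   # 2222 is a made-up non-default port
```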

Port 22 is the default SSH port. When connecting to a server for the first time, SSH asks: Are you sure you want to continue connecting? Always type yes and press Enter.

One important thing to understand about your prompt:

# at the end means you are root (full admin)

$ at the end means you are a normal user

Always run whoami to confirm which user you are [...]

On a shared server, every person should have their own user account. To create a new user:

adduser briank
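Because adduser needs root, a script can confirm the current user before going further, using the prompt convention above (# for root, $ for everyone else). A minimal sketch, assuming the usual rule that root is user ID 0; prompt_symbol is a hypothetical helper name, not a standard command.

```shell
# Maps a numeric user ID to the prompt symbol a default shell would show.
prompt_symbol() {
    if [ "$1" -eq 0 ]; then
        echo '#'    # UID 0 is root
    else
        echo '$'    # any other UID is a normal user
    fi
}

current_uid="$(id -u)"
echo "You are $(whoami) (uid $current_uid), prompt symbol: $(prompt_symbol "$current_uid")"
```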

An important lesson I learned: Linux usernames must be lowercase. When I tried BrianK, I got this error:

Please enter a username matching the regular [...]

The fix was simple: use lowercase.

adduser briank
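To catch this before adduser complains, a name can be checked against a pattern first. The sketch below uses a simplified pattern that I am assuming for illustration (a lowercase letter followed by lowercase letters, digits, hyphens, or underscores); the exact rule adduser applies depends on the system's configuration. is_valid_username is a hypothetical helper name.

```shell
# Hypothetical check with a simplified, assumed pattern. LC_ALL=C keeps
# the a-z range strictly lowercase regardless of locale.
is_valid_username() {
    printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z][a-z0-9_-]*$'
}

for name in briank BrianK; do
    if is_valid_username "$name"; then
        echo "$name: ok"
    else
        # Suggest the lowercase form, which is the fix described above.
        echo "$name: rejected, try $(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')"
    fi
done
```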

To give the user sudo (admin) privileges: usermod -aG sudo briank

To verify the user was created successfully: id briank

Output:

uid=1088(briank) gid=1088(briank)

Here are the most important Linux commands every [...]

pwd             # print current directory
ls              # list files
ls -la          # detailed list including hidden files
cd Documents    # go into a folder
cd ..           # go up one level
cd ~            # go to home directory

touch notes.txt              # create empty file
mkdir linux_assignment       # create folder
cp notes.txt backup.txt      # copy file
mv backup.txt old.txt        # rename/move file
rm old.txt                   # delete file
cat notes.txt                # view file contents
head -10 notes.txt           # view first 10 lines
tail -10 notes.txt           # view last 10 lines
grep "error" log.txt         # search inside file
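Here is how several of these commands fit together in one short session. It is only a sketch: it runs in a throwaway directory created with mktemp, and the contents of log.txt are made up for the example.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir="$(mktemp -d)"
cd "$workdir" || exit 1

mkdir linux_assignment
cd linux_assignment || exit 1

# Create a small log file (made-up contents for this example).
printf 'start\nerror: disk full\nok\nerror: timeout\n' > log.txt

cp log.txt backup.txt     # copy the file
mv backup.txt old.txt     # rename the copy
head -n 1 log.txt         # first line: start
tail -n 1 log.txt         # last line: error: timeout

# grep -c counts the lines that contain the search term.
error_count="$(grep -c "error" log.txt)"
echo "lines containing error: $error_count"

rm old.txt                # delete the renamed copy
```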

whoami      # current username
uname -a    # system and kernel info
hostname    # server name
uptime      # how long server has been running
df -h       # disk space usage
free -h     # memory usage
top         # running processes (q to quit)
ps aux      # list all processes
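These commands become more useful inside scripts. As one example, the sketch below reads the disk usage of the root filesystem and compares it with a threshold. The 80 percent threshold and the name disk_usage_percent are my own assumptions; df -P is used instead of df -h because its fixed column layout is easier to parse.

```shell
# Prints the usage percentage (without the % sign) of the filesystem
# that holds the given path. Defaults to the root filesystem.
disk_usage_percent() {
    df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

threshold=80    # assumed threshold, adjust to taste
usage="$(disk_usage_percent /)"

if [ "$usage" -ge "$threshold" ]; then
    echo "warning: / is ${usage}% full"
else
    echo "ok: / is ${usage}% full"
fi
```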

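Linux file permissions control who can read, [...]

-rwxr-xr-x

Breaking this down (the leading - marks a regular file):

r = read (4)
w = write (2)
x = execute (1)

Three groups: owner, group, others. Each group's digit is the sum of its permissions, so rwx = 4 + 2 + 1 = 7 and r-x = 4 + 1 = 5.

To change permissions:

chmod 755 script.sh    # owner: rwx, group and others: r-x
chmod 644 notes.txt    # owner: rw-, group and others: r--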
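The arithmetic behind 755 and 644 can be sketched as a small function that adds up 4, 2, and 1 for each group of three characters. perm_to_octal is a hypothetical helper name; it expects the nine permission characters without the leading file-type character, and it does not handle special bits such as setuid or the sticky bit.

```shell
# Converts a 9-character permission string such as rwxr-xr-x to octal.
perm_to_octal() {
    perms="$1"
    result=""
    while [ -n "$perms" ]; do
        # Take the next three characters (one group) off the front.
        triplet="$(printf '%s' "$perms" | cut -c1-3)"
        perms="$(printf '%s' "$perms" | cut -c4-)"
        digit=0
        case "$triplet" in r??) digit=$((digit + 4)) ;; esac   # read
        case "$triplet" in ?w?) digit=$((digit + 2)) ;; esac   # write
        case "$triplet" in ??x) digit=$((digit + 1)) ;; esac   # execute
        result="$result$digit"
    done
    printf '%s\n' "$result"
}

perm_to_octal rwxr-xr-x    # prints 755
perm_to_octal rw-r--r--    # prints 644
```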

To change file ownership: chown briank notes.txt

ip a                    # show network interfaces
ping google.com -c 4    # test connectivity
netstat -tulnp          # show open ports
ss -tlnp                # modern version of netstat
curl ifconfig.me        # show your public IP

PostgreSQL is the most popular open source database [...]

apt update
apt install postgresql postgresql-contrib -y

systemctl start postgresql
systemctl enable postgresql
systemctl status postgresql

su -s /bin/bash postgres
psql

CREATE DATABASE briank;
\c briank
CREATE SCHEMA staging;

CREATE TABLE staging.farmers (
    id SERIAL PRIMARY KEY,
    farmer_name VARCHAR(100),
    county VARCHAR(50),
    subcounty VARCHAR(50),
    acreage DECIMAL(5,2),
    crop VARCHAR(50),
    loan_amount DECIMAL(10,2),
    loan_status VARCHAR(20),
    season VARCHAR(20)
);

INSERT INTO staging.farmers
    (farmer_name, county, subcounty, acreage, crop, loan_amount, loan_status, season)
VALUES
    ('John Kipchumba', 'Uasin Gishu', 'Turbo', 2.5, 'Maize', 15000.00, 'Paid', '2023A'),
    ('Mary Jelimo', 'Uasin Gishu', 'Soy', 1.8, 'Maize', 12000.00, 'Defaulted', '2023A'),
    ('Peter Rotich', 'Uasin Gishu', 'Eldoret East', 3.2, 'Maize', 20000.00, 'Paid', '2023B');

\l           - list all databases
\c dbname    - connect to database
\dt          - list all tables
\du          - list all users
\q           - quit psql

To allow tools like DBeaver or pgAdmin to connect [...]

postgresql.conf: change listen_addresses to '*'

pg_hba.conf: add this line at the bottom: [...]

Then restart PostgreSQL:

systemctl restart postgresql
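The listen_addresses change can also be scripted instead of edited by hand. The sketch below is an assumption-heavy illustration: set_listen_all is a hypothetical helper, it relies on GNU sed for the -i flag, and it is demonstrated on a temporary file with made-up contents. The location of the real postgresql.conf depends on your installation, so check it (and back it up) before pointing the function at it.

```shell
# Sets listen_addresses = '*' in a postgresql.conf-style file.
# If the setting exists (commented out or not), the line is rewritten;
# otherwise the setting is appended. Assumes GNU sed.
set_listen_all() {
    conf_file="$1"
    pattern="^[[:space:]]*#?[[:space:]]*listen_addresses[[:space:]]*=.*"
    if grep -Eq "$pattern" "$conf_file"; then
        sed -i -E "s/$pattern/listen_addresses = '*'/" "$conf_file"
    else
        printf "listen_addresses = '*'\n" >> "$conf_file"
    fi
}

# Demonstration on a temporary file, not on a real configuration.
demo_conf="$(mktemp)"
printf "#listen_addresses = 'localhost'\nport = 5432\n" > "$demo_conf"
set_listen_all "$demo_conf"
cat "$demo_conf"
```

PostgreSQL only picks up this setting after a restart, which is why systemctl restart postgresql is still needed afterwards.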

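SCP (Secure Copy Protocol) uses SSH to transfer files [...]

Upload a file from Windows to the server:

scp C:\Users\Brian\notes.txt root@159.65.222.96:/root/

Download a file from the server to Windows:

scp root@159.65.222.96:/root/notes.txt C:\Users\Brian\Downloads\

Copy a whole folder with -r:

scp -r myfolder/ root@159.65.222.96:/root/

Use a specific private key with -i:

scp -i ~/.ssh/mykey.pem file.txt root@server:/path/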

During this assignment I encountered several [...]

Lesson 1: Always check who you are. [...] whoami before every session saved me [...]

Lesson 2: Usernames must be lowercase. BrianK failed; briank worked perfectly.

Lesson 3: The prompt tells you everything. # means root, $ means normal user. =# in psql means ready, (# means incomplete [...]

Lesson 4: WSL is not always Ubuntu. [...] apt, sudo, and ssh taught me [...] cat /etc/os-release.

Lesson 5: Shared servers have history. [...] grep and tail to verify [...]

Linux is the backbone of modern data engineering. [...] The best way to learn Linux is by doing. Set up WSL [...]

As I continue my journey in data engineering at [...]

Brian Kiplangat - LuxDevHQ Data Engineering

GitHub: https://github.com/kiplangatbrian85/
