JShell - Making my own shell

Published: (December 11, 2025 at 05:01 AM EST)
5 min read
Source: Dev.to

Source: Dev.to

Overview

A few months back I decided to create JShell as a fun project to learn more about the good old C. A shell simply reads user input (commands), executes programs if needed, prints the output, and repeats this. While it may look like a simple program that runs commands such as ls, a shell is actually close to a full‑fledged programming language: it can redirect output, support loops, conditionals, variables, and even host an HTTP server written entirely in Bash.

Note: Shells and terminals are separate applications. Terminals (e.g., kitty, Windows Terminal) provide the GUI front‑end, while shells (e.g., Bash, PowerShell) handle command parsing and execution.

Consistent behavior across shells is ensured by the POSIX standard, which defines the shell language. Most mainstream shells follow this standard; Windows Command Prompt and PowerShell are notable exceptions, but they are out of scope for this series.

Data Structures

The first step of a shell is to take user input and convert it into a structured form. JShell represents each command as a linked list of simple_command structures:

typedef struct simple_command {
    sc_arg *words;          // linked list of words (first member is command name)
    int wordc;              // number of words
    redirection *redirects; // linked list of redirects (explained later)
    int redirc;             // number of redirects
    struct simple_command *next; // next command to execute
} simple_command;

Example Input

# just a simple command with 2 arguments
rm -rf /boot

# command with output redirection
ls dir > list.txt

# command with a quoted argument containing spaces
cat "file name with spaces"

# backslash escapes a quote inside a double‑quoted string
cat "file name with \" in middle"

A naive approach might split the input by spaces or newlines, but handling escaped characters, quotes, and operators quickly becomes error‑prone. The more robust solution is lexing: reading the input character by character to produce a stream of tokens, which are then parsed into an abstract syntax tree (AST). In JShell, the simple_command struct serves as the AST node.

Lexing

Lexing groups characters into tokens such as words, quoted strings, or operators. It can be performed lazily during parsing or in a single pass. JShell initially performed lexing and parsing separately, but later switched to on‑demand lexing for efficiency.

Token Types

// Token types
enum TokenType {
    T_WORD,        // Regular word/identifier
    T_GTR,         // > redirect
    T_DGTR,        // >> append redirect
    T_LINEBREAK,  // Newline or semicolon
    T_END          // End of input
};

Token Structure

// Token structure returned by the lexer
struct token {
    const char *str;   // Pointer to the token text
    int len;           // Length of the token
    enum TokenType type;
};

Lexer State

// Stores state of the lexer for a given input string
struct lexer {
    const char *curr;   // Current position in input
    struct token tk;    // Current token
};

The lexer maintains the current position in the input and a buffer for the token being built.

Lexing Algorithm

The main lexing function follows these steps:

  1. Skip blanks – advance past any whitespace.
  2. Skip comments – if the current character is #, ignore characters until a newline.
  3. Detect fixed tokens – handle newlines, semicolons, and redirection operators.
  4. Identify words – treat any other sequence as a word, stopping at delimiters such as space, newline, ;, or &.
  5. Handle concatenation – consecutive strings without intervening spaces are concatenated (e.g., echo "a"b'c'abc).

Lexer Implementation

enum L_STATE nextToken(struct lexer *l) {
    struct token *t = &l->tk;
    t->flags = 0;

    /* Skip blanks */
    while (isBlank(*l->curr))
        l->curr++;

    /* Skip comments */
    if (*l->curr == '#') {
        while (*l->curr != '\0' && *l->curr != '\n')
            l->curr++;
    }

    /* Newline or semicolon */
    if (*l->curr == '\n' || *l->curr == ';') {
        t->str = l->curr;
        t->len = 1;
        t->type = T_LINEBREAK;
        l->curr++;
        return l->state = L_EOC;
    }

    /* End of input */
    if (*l->curr == '\0') {
        t->type = T_END;
        return l->state = L_EOF;
    }

    const char *start = l->curr;

    /* Numeric prefix (optional) */
    if (*l->curr >= '0' && *l->curr curr++;

    /* Redirection operators */
    if (strncmp(l->curr, ">>", 2) == 0) {
        t->type = T_DGTR;
        l->curr += 2;
        if (*l->curr == '&')
            l->curr++;
        t->str = start;
        t->len = l->curr - start;
        return l->state = L_CONTINUE;
    } else if (*l->curr == '>') {
        t->type = T_GTR;
        l->curr++;
        if (*l->curr == '&')
            l->curr++;
        t->str = start;
        t->len = l->curr - start;
        return l->state = L_CONTINUE;
    } else {
        /* Not a redirection – rewind to start of word */
        l->curr = start;
    }

    /* Word token */
    t->type = T_WORD;
    do {
        if (*l->curr == '"') {
            /* Double‑quoted string */
            while (*l->curr != '"') {
                if (*++l->curr == '\\' && *++l->curr != '\0')
                    l->curr++;
                if (*l->curr == '\0')
                    return l->state = L_EOF_DQUOTE;
            }
        } else if (*l->curr == '\'') {
            /* Single‑quoted string */
            do {
                if (*++l->curr == '\0')
                    return l->state = L_EOF_QUOTE;
            } while (*l->curr != '\'');
        } else if (*l->curr == '\\') {
            /* Escape next character */
            if (*(l->curr + 1) != '\0')
                l->curr++;
        }
        l->curr++;
    } while (isNotSpecialChar(*l->curr));

    t->str = start;
    t->len = l->curr - start;
    return l->state = L_CONTINUE;
}

The function returns a lexer state (L_CONTINUE, L_EOC, L_EOF, etc.) indicating whether more tokens are available.

Next Steps

This post covered the basics of lexing in JShell. Future installments will explore:

  • Parsing the token stream into a full AST
  • Implementing auto‑completion
  • Additional shell features such as pipelines, background execution, and built‑in commands

Stay tuned for the next part!

Back to Blog

Related posts

Read more »