JShell - Making my own shell
Source: Dev.to
Overview
A few months back I decided to create JShell as a fun project to learn more about the good old C. A shell simply reads user input (commands), executes programs if needed, prints the output, and repeats this. While it may look like a simple program that runs commands such as ls, a shell is actually close to a full‑fledged programming language: it can redirect output, support loops, conditionals, variables, and even host an HTTP server written entirely in Bash.
Note: Shells and terminals are separate applications. Terminals (e.g., kitty, Windows Terminal) provide the GUI front‑end, while shells (e.g., Bash, PowerShell) handle command parsing and execution.
Consistent behavior across shells is ensured by the POSIX standard, which defines the shell language. Most mainstream shells follow this standard; Windows Command Prompt and PowerShell are notable exceptions, but they are out of scope for this series.
Data Structures
The first step of a shell is to take user input and convert it into a structured form. JShell represents each command as a linked list of simple_command structures:
typedef struct simple_command {
sc_arg *words; // linked list of words (first member is command name)
int wordc; // number of words
redirection *redirects; // linked list of redirects (explained later)
int redirc; // number of redirects
struct simple_command *next; // next command to execute
} simple_command;
Example Input
# just a simple command with 2 arguments
rm -rf /boot
# command with output redirection
ls dir > list.txt
# command with a quoted argument containing spaces
cat "file name with spaces"
# backslash escapes a quote inside a double‑quoted string
cat "file name with \" in middle"
A naive approach might split the input by spaces or newlines, but handling escaped characters, quotes, and operators quickly becomes error‑prone. The more robust solution is lexing: reading the input character by character to produce a stream of tokens, which are then parsed into an abstract syntax tree (AST). In JShell, the simple_command struct serves as the AST node.
Lexing
Lexing groups characters into tokens such as words, quoted strings, or operators. It can be performed lazily during parsing or in a single pass. JShell initially performed lexing and parsing separately, but later switched to on‑demand lexing for efficiency.
Token Types
// Token types
enum TokenType {
T_WORD, // Regular word/identifier
T_GTR, // > redirect
T_DGTR, // >> append redirect
T_LINEBREAK, // Newline or semicolon
T_END // End of input
};
Token Structure
// Token structure returned by the lexer
struct token {
const char *str; // Pointer to the token text
int len; // Length of the token
enum TokenType type;
};
Lexer State
// Stores state of the lexer for a given input string
struct lexer {
const char *curr; // Current position in input
struct token tk; // Current token
};
The lexer maintains the current position in the input and a buffer for the token being built.
Lexing Algorithm
The main lexing function follows these steps:
- Skip blanks – advance past any whitespace.
- Skip comments – if the current character is
#, ignore characters until a newline. - Detect fixed tokens – handle newlines, semicolons, and redirection operators.
- Identify words – treat any other sequence as a word, stopping at delimiters such as space, newline,
;, or&. - Handle concatenation – consecutive strings without intervening spaces are concatenated (e.g.,
echo "a"b'c'→abc).
Lexer Implementation
enum L_STATE nextToken(struct lexer *l) {
struct token *t = &l->tk;
t->flags = 0;
/* Skip blanks */
while (isBlank(*l->curr))
l->curr++;
/* Skip comments */
if (*l->curr == '#') {
while (*l->curr != '\0' && *l->curr != '\n')
l->curr++;
}
/* Newline or semicolon */
if (*l->curr == '\n' || *l->curr == ';') {
t->str = l->curr;
t->len = 1;
t->type = T_LINEBREAK;
l->curr++;
return l->state = L_EOC;
}
/* End of input */
if (*l->curr == '\0') {
t->type = T_END;
return l->state = L_EOF;
}
const char *start = l->curr;
/* Numeric prefix (optional) */
if (*l->curr >= '0' && *l->curr curr++;
/* Redirection operators */
if (strncmp(l->curr, ">>", 2) == 0) {
t->type = T_DGTR;
l->curr += 2;
if (*l->curr == '&')
l->curr++;
t->str = start;
t->len = l->curr - start;
return l->state = L_CONTINUE;
} else if (*l->curr == '>') {
t->type = T_GTR;
l->curr++;
if (*l->curr == '&')
l->curr++;
t->str = start;
t->len = l->curr - start;
return l->state = L_CONTINUE;
} else {
/* Not a redirection – rewind to start of word */
l->curr = start;
}
/* Word token */
t->type = T_WORD;
do {
if (*l->curr == '"') {
/* Double‑quoted string */
while (*l->curr != '"') {
if (*++l->curr == '\\' && *++l->curr != '\0')
l->curr++;
if (*l->curr == '\0')
return l->state = L_EOF_DQUOTE;
}
} else if (*l->curr == '\'') {
/* Single‑quoted string */
do {
if (*++l->curr == '\0')
return l->state = L_EOF_QUOTE;
} while (*l->curr != '\'');
} else if (*l->curr == '\\') {
/* Escape next character */
if (*(l->curr + 1) != '\0')
l->curr++;
}
l->curr++;
} while (isNotSpecialChar(*l->curr));
t->str = start;
t->len = l->curr - start;
return l->state = L_CONTINUE;
}
The function returns a lexer state (L_CONTINUE, L_EOC, L_EOF, etc.) indicating whether more tokens are available.
Next Steps
This post covered the basics of lexing in JShell. Future installments will explore:
- Parsing the token stream into a full AST
- Implementing auto‑completion
- Additional shell features such as pipelines, background execution, and built‑in commands
Stay tuned for the next part!