1 Introduction
`ls`, `mv`, `cd`), using version control (e.g., `git`, `hg`), installing packages (e.g., `apt-get`, `npm`), or dealing with infrastructure (e.g., `docker`). Experts can adapt and play with a multitude of commands and arguments, chaining them together to create more complex workflows. All this versatility runs into a common user-interface problem, recognition over recall (Nielsen 2005b): users have to recall the particularities of syntax and argument combinations instead of being able to use a more recognizable symbol (as in graphical user interfaces).

1.1 Contribution
- We identified nine Customization Practices, grouped into three high-level themes: Shortcuts introduce new names. They can be used for nicknaming commands (and correcting misspellings in the process), abbreviating subcommands like `git push`, and bookmarking locations for quick navigation. Modifications change the semantics of commands. We can use these types of aliases for substituting commands, such as replacing `more` with `less`, for overriding defaults to customize commands to personal contexts, which often involves colorizing output, and also running certain commands as root by elevating privilege. Aliases that combine multiple commands are Scripts. They enable many ways of transforming data using Unix pipes, and allow for automating repetitive workflows by chaining subcommands.
- A Curated Dataset of Command-Line Customizations, consisting of over 2.2 million shell aliases collected from GitHub. We view our dataset as a playground for fine-grained discovery that can benefit researchers, tool-builders, and command-line users; for example, researchers can use this knowledge base to discover which commands are frequently used together and how they are combined, while tool-builders can see how their programs are being customized. We also describe the effective mining technique we used to distill this knowledge, which allowed us to capture almost the whole population (94.09%) of relevant shell configuration files.
- We formulate Implications for Improving Command-Line Experience that go beyond single customization practices to address shortcomings and tie them to existing user experience research. Codifying emergent behavior (Fast et al. 2014) found in our customizations enables learning repair rules and discovering workflows. We are able to uncover conceptual design flaws, where customizations indicate frustrations with underlying command structures, supporting prior research on potential flaws in the conceptual design of certain commands (Perez De Rosso and Jackson 2013). Based on the prevalence of highly variable command redefinitions, we propose contextual defaults, the ability to suggest different command preferences based on user context (Stefanidis et al. 2011). Overall, we find that many customizations deal with the tension of Interactivity vs Scripting: commands being used to interactively navigate systems, while at the same time being used within scripts for batch-processing.
2 Background
`sh` and its various descendants (Jones 2011; Seibold 2020). The POSIX family of standards defines a Shell Command Language (IEEE and The Open Group 2018; Greenberg 2017), whose standard implementation is still the `sh` utility, but there exists a wide variety of popular POSIX-compliant shells like `bash` or `zsh`. These implementations are free to extend the functionality of the shell, but all share a common subset of core commands and programming language constructs. In this paper, we focus on the built-in `alias` command, available on all POSIX shells.

2.1 Usage and Syntax
The `alias` command allows the user to create alias definitions, defining command substitutions. When the shell processes the command line, it replaces known alias names with their defined string values. For example,
alias ll='ls -l'
defines the alias `ll`, which is replaced by the alias value `ls -l`. In this case, `ls` is the standard command for listing directory contents, with the argument `-l` specifying a long-form output format. So the alias `ll` (present in many system configurations) is used to specify a default argument to a commonly used command under a different name.
alias ducks='du -cksh * | sort -hr | head -n 15'
defines the command `ducks` by chaining together three different command-line tools (`du`, `sort`, and `head`) in order to return the 15 largest files in the current directory.
alias name=value
Here, `value` can optionally be enclosed in single (`'`) or double (`"`) quotes, and `name` can be any identifier that is a valid command name.1
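As a rough illustration of the `alias name=value` syntax, the following Python sketch extracts the name and the unquoted value from a single definition line. The function name `parse_alias` and the regular expression are assumptions made for this example, not the parser used in the paper; quote removal is delegated to Python's `shlex` module, which follows POSIX-style quoting rules and raises `ValueError` on unbalanced quotes.

```python
import re
import shlex

# Hypothetical pattern for lines of the form:  alias name=value
ALIAS_LINE = re.compile(r"^\s*alias\s+([^\s=]+)=(.*)$")

def parse_alias(line):
    """Return (name, value) for a simple alias definition, or None."""
    match = ALIAS_LINE.match(line)
    if match is None:
        return None
    name, raw_value = match.groups()
    # shlex removes the enclosing single or double quotes, if any.
    # Only the first shell word after '=' is the alias value.
    words = shlex.split(raw_value)
    value = words[0] if words else ""
    return name, value

print(parse_alias("alias ll='ls -l'"))
print(parse_alias('alias grep="grep --color=always"'))
```

Lines that do not start with `alias` yield `None`, so the helper can be applied to every line of a configuration file and the results filtered afterwards.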
alias grep='grep --color=always'
We write `a` → `b` to indicate an alias that replaces the name `a` with the value `b`.

2.2 Dotfiles
`.bashrc`, `.zshrc`, or `.profile` and their main difference is the order in which they are executed.2 Often, aliases are also stored in other files referred to by these startup scripts.

`.`) so that they are hidden by default on most Unix-based systems. In recent years, people have started sharing their dotfiles on platforms like GitHub.3 This has the advantage of being able to sync one’s configurations across different machines, and also enables exchange and discovery of configurations between users.

3 Dataset
3.1 Data Collection
`.bashrc` or `.bash_profile`).

`alias`.

`github-searcher`7 that uses a clever sampling strategy to vastly increase the number of results we are able to retrieve.
alias language:Shell size:101..200
alias language:Shell size:201..300
alias language:Shell
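The size-qualified queries above suggest that the search space is partitioned by file size, so that each query stays below the result limit of the search API. The following Python sketch generates such a family of queries. The function name, the bucket width, and the range bounds are assumptions for illustration; the actual sampling strategy of `github-searcher` may differ.

```python
def size_partitioned_queries(base, start, stop, step):
    """Yield one search query per file-size bucket [lo, hi]."""
    for lo in range(start, stop, step):
        hi = min(lo + step - 1, stop)
        yield f"{base} size:{lo}..{hi}"

# Reproduces the two size-qualified example queries shown above.
queries = list(size_partitioned_queries("alias language:Shell", 101, 300, 100))
for query in queries:
    print(query)
```

A real crawler would presumably adapt the bucket width to the number of hits per bucket; a fixed width is used here only to keep the sketch short.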
3.2 Parsing
`|` and `|&`), logical composition (`&&` and `||`), background execution (`&`) and simple chaining (`;`). Arguments are separated by whitespace, but care is taken to handle quoted arguments correctly. For example, `echo "hello world"` is parsed as one command (`echo`) with one argument (`"hello world"`). See Fig. 2 for a more elaborate example.
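A minimal Python sketch of this kind of parsing is given here, together with the `sudo` exception described in this section. It is not the authors' parser: it relies on the standard `shlex` module (Python 3.8 or later for the combination of `punctuation_chars` and `whitespace_split`), and the function name and the dictionary layout of a parsed command are assumptions. One known limitation is that quotes are removed before classification, so a quoted operator character such as `"|"` would be mistaken for a real operator.

```python
import shlex

OPERATOR_CHARS = "|&;"  # characters that make up |, |&, &&, ||, & and ;

def parse_alias_value(value):
    """Split an alias value into commands and the operators between them."""
    lexer = shlex.shlex(value, posix=True, punctuation_chars=OPERATOR_CHARS)
    lexer.whitespace_split = True  # split words on whitespace only
    lexer.commenters = ""          # treat '#' as an ordinary character
    commands, operators, tokens = [], [], []

    def flush():
        # Turn the tokens collected so far into one command record.
        if not tokens:
            return
        words = list(tokens)
        tokens.clear()
        sudo = words[0] == "sudo" and len(words) > 1
        if sudo:
            words = words[1:]  # the first argument of sudo is the real command
        commands.append({"command": words[0], "arguments": words[1:], "sudo": sudo})

    for token in lexer:
        if token and all(ch in OPERATOR_CHARS for ch in token):
            flush()
            operators.append(token)
        else:
            tokens.append(token)
    flush()
    return commands, operators

print(parse_alias_value('echo "hello world"'))
print(parse_alias_value("sudo apt-get install"))
```

Every token other than the command name is kept as a separate argument, which matches the deliberately simple treatment of arguments described in the text.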
`ls -l -a --color=always`); combined short arguments without a dash (e.g., `tar xvzf archive.tar`); dictionary-style arguments (e.g., `dd if=/dev/zero of=/dev/sda`); subcommands (e.g., `git commit -m "wip"`); and many more. Since the parser cannot know the intentions of any command, it simply treats each token as a separate argument. There is one exception: if the command is `sudo`, then its first argument is taken as the real command. For example, `sudo apt-get install` is parsed as the command `apt-get` with argument `install` and the sudo flag set.

3.3 Provenance
`.bashrc`, `aliases.zsh` or `.profile` (see Table 1). We found another 2.78% of aliases originating from scripts related to Git, with file names like `git.plugin.zsh` or `git.bash`. The remaining aliases are more or less evenly distributed among a variety of file names, none of which contributes more than half a percent of aliases, in most cases significantly less. The average number of aliases per file is 11 ± 18; the median is 6.
% of Files | Files | Name Pattern | Aliases | % of Aliases |
---|---|---|---|---|
14.35 | 27870 | ⋆alias⋆ | 612516 | 27.79 |
27.72 | 53844 | ⋆bashrc⋆ | 591396 | 26.83 |
22.15 | 43011 | ⋆zshrc⋆ | 487002 | 22.09 |
9.42 | 18298 | ⋆profile⋆ | 199009 | 9.03 |
1.26 | 2455 | git⋆ | 61248 | 2.78 |
`dot`, as in `dotfiles`, `dot-files`, `dots`, `mydotfiles`, and so on. Looking at these names and descriptions, we can see a clear bias towards personal configurations and settings management. On average, each repository contributes 16 ± 28 aliases; the median is 8.
% of Repos | Repos | Word in Description | Aliases | % of Aliases |
---|---|---|---|---|
21.91 | 30259 | my | 582448 | 26.42 |
17.00 | 23483 | dotfiles | 466006 | 21.14 |
12.75 | 17612 | files | 316963 | 14.38 |
6.85 | 9466 | configuration | 175333 | 7.95 |
5.33 | 7364 | config | 131430 | 5.96 |
4.54 | 6269 | personal | 113885 | 5.17 |
4.13 | 5707 | linux | 101747 | 4.62 |
3.17 | 4385 | bash | 94739 | 4.30 |
3.88 | 5353 | scripts | 91021 | 4.13 |
2.06 | 2840 | zsh | 74034 | 3.36 |
3.4 Reproducibility
4 Analysis
The most common alias name is `ls`, appearing a total of 83782 times, which is 3.8% of all alias definitions. Note that this counts `ls` as an alias name, i.e., a redefinition of the `ls` command; as a command, `ls` appears 260156 times (10.27%). This is a bit less often than `git`, the most common command, which appears in 327786 aliases (12.93%). The most common argument, across all commands, is `--color=auto`, appearing 153931 times (4.24%).
| | # | % |
---|---|---|
Name | ||
ls | 83782 | 3.80 |
ll | 62465 | 2.83 |
grep | 44479 | 2.02 |
la | 43760 | 1.99 |
l | 39539 | 1.79 |
Command | ||
git | 327786 | 12.93 |
ls | 260156 | 10.27 |
cd | 166632 | 6.58 |
grep | 89598 | 3.54 |
vim | 46545 | 1.84 |
Argument | ||
--color=auto | 153931 | 4.24 |
-i | 70640 | 1.95 |
-a | 42910 | 1.18 |
-l | 39519 | 1.09 |
-v | 35295 | 0.97 |
`git` and `ls`, showing us the top arguments given with each and the most common alias names by which the command/argument combinations are referred to. Here we can already identify some of the typical alias use cases. Looking at `ls`, we find that aliases are used to redefine the command with a default argument (`ls` → `ls --color=auto`); to shorten a common invocation (`ll` → `ls -alF`); and to correct a spelling mistake (`sl` → `ls`). We also notice that in the case of `git`, most aliases are used for shortening `git` subcommand invocations (e.g., `gd` → `git diff`).
Command | % | Arguments | Aliases (%) |
---|---|---|---|
git | 5.85 | status | gs (54.27), gst (19.19) |
git | 3.48 | | g (75.71), gti (5.74) |
git | 3.20 | checkout | gco (50.52), gc (13.87), gch (7.56) |
git | 3.18 | push | gp (46.73), gps (9.23), push (7.56) |
git | 3.16 | diff | gd (79.89) |
git | 2.86 | pull | gpl (18.30), gl (16.59), gp (15.07) |
git | 2.78 | branch | gb (73.54), gbr (6.57) |
git | 2.71 | add | ga (80.96) |
git | 2.00 | commit | gc (63.16), gci (5.33) |
git | 1.96 | commit -m | gcm (31.29), gc (25.18), gm (7.97) |
ls | 14.45 | --color=auto | ls (99.04) |
ls | 8.63 | -A | la (97.61) |
ls | 7.80 | -CF | l (98.75) |
ls | 6.78 | -alF | ll (97.49) |
ls | 5.46 | -l | ll (78.83), l (7.91) |
ls | 3.75 | | l (27.90), sl (21.45) |
ls | 2.88 | -G | ls (96.47) |
ls | 2.74 | -la | ll (38.42), la (26.87), lla (12.63) |
ls | 2.67 | -a | la (76.94) |
ls | 1.92 | -al | ll (49.69), la (12.23), l (8.49) |
4.1 Inductive Coding
`cd`, `git`, `ssh`, `ls`, and `vim`. Unique aliases often contain user-specific file system paths (e.g., `gitbash` → `source /Users/j/mybin/gitsh`), happen to have a unique combination of arguments (e.g., `ls` → `ls -GphF`) or are otherwise highly particular (e.g., `h23` → `history -23000`).

`man` pages and other forms of documentation.10 To increase the trustworthiness of our codes, coding was performed independently in parallel by the two authors. After a first iteration, we compared our labels, consolidating different naming conventions. In subsequent iterations, we identified ways of formalizing the emerging categories, i.e., constructing automated mechanisms for classifying alias definitions as belonging to certain categories. The suitability for mechanical classification was an important factor for the viability of any emerging themes. The discussion of these formalizations additionally served to establish a better shared understanding. Ultimately, we reached a saturation point at which further coding and analysis did not lead to further insights.

5 Customization Practices
| | # | % |
---|---|---|
Shortcuts | ||
Nicknaming Commands | 244872 | 11.11 |
Abbreviating Subcommands | 194850 | 8.84 |
Bookmarking Locations | 321546 | 14.59 |
Modifications | ||
Substituting Commands | 100564 | 4.56 |
Overriding Defaults | 319239 | 14.48 |
Colorizing Output | 182623 | 8.29 |
Elevating Privilege | 93683 | 4.25 |
Scripts | ||
Transforming Data | 74719 | 3.39 |
Chaining Subcommands | 22062 | 1.00 |
5.1 Shortcuts
`gs` → `git status` has a compression ratio of 5. Figure 3 shows the distribution of compression ratios over all aliases in the dataset. The median compression ratio is 4.25, meaning half of all alias values are at least four times as long as their alias names. A compression ratio less than 1 indicates a name that is longer than the value it aliases.
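The numbers above are consistent with reading the compression ratio as the length of the alias value divided by the length of the alias name; the following Python sketch assumes that definition. The three example aliases all appear in the text, but the median over them is purely illustrative and unrelated to the dataset median.

```python
from statistics import median

def compression_ratio(name, value):
    """Length of the alias value divided by the length of the alias name."""
    return len(value) / len(name)

# A tiny illustrative sample, not the dataset.
examples = {"gs": "git status", "ll": "ls -l", "sl": "ls"}
ratios = {name: compression_ratio(name, value) for name, value in examples.items()}
median_ratio = median(ratios.values())

print(ratios)
print(median_ratio)
```

Under this definition, `sl` → `ls` has a ratio of exactly 1, and any alias whose name is longer than its value falls below 1.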
`cat` command with a similarly named file as an argument. The second longest alias name is a Swedish compound word of 131 characters,11 aliasing the `ls` command.

`line` echoes 23635 dashes, achieving a compression ratio of 5911, the highest among all aliases. The second highest comes from an alias named `BEEP`, which invokes the Linux `beep` utility 9 times in succession, with a combined 4471 arguments. When executed, it appears to play Daft Punk’s 2001 instrumental single Aerodynamic.

`g` → `git`, `c` → `clear`, `h` → `history`, and `v` → `vim`. Almost all (93.03%) of these kinds of aliases introduce a nickname that is shorter than the command they are referring to, and about half (50.58%) introduce a name that is only one or two characters long.

`got`
→ `git`. To determine instances of these typographical errors, we surveyed and experimented with different string distance measures (Navarro 2001) and decided on using the Damerau-Levenshtein distance (Damerau 1964).

`grpe` → `grep`), case-sensitivity (`Jupyter` → `jupyter`), localization (`pluralise` → `pluralize`), and punctuation (`docker-build` → `docker_build`).

`git push --tags`
executes the `push` subcommand of `git` with the `--tags` flag enabled. We identified 67 commands in our dataset that take subcommands, such as `git`, `docker`, or `systemctl`. Notably, we found 194850 aliases (8.84%) that are purely abbreviations of subcommands, without any arguments beyond the subcommand, for example `gs` → `git status` or `gd` → `git diff`. The majority of such subcommand abbreviations (58.5%) are for `git`, with 113980 aliases defined purely for abbreviating `git` subcommands, accounting for 36.77% of all aliases involving `git`. The command with the second-most subcommand abbreviations is the package manager `pacman`, with only 9918 instances (5.09% of subcommand abbreviations, but 68.67% of all aliases involving `pacman`).

`starwars` → `telnet towel.blinkenlights.nl` and `dl` → `cd ~/Downloads`
are both bookmark aliases.

- A string containing a forward slash (`/`), indicating a path.
- An IPv4 address, matched by the liberal regular expression `[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+`
- A string containing one of the known top-level domains12 preceded by a dot (`.`) and followed by a slash (`/`), colon (`:`) or the end of the string.
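The three criteria in this list translate fairly directly into a predicate over a single argument string. The Python sketch below is one possible reading, not the authors' classifier: the function name is made up, `KNOWN_TLDS` is a small hypothetical subset standing in for the full list of top-level domains, and exclusions such as `/dev/null` are not handled.

```python
import re

# The liberal IPv4 pattern quoted in the list above.
IPV4 = re.compile(r"[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+")

# Hypothetical subset of top-level domains; the real list is much longer.
KNOWN_TLDS = ("com", "org", "net", "nl", "io")

# A known TLD preceded by a dot and followed by '/', ':' or end of string.
DOMAIN = re.compile(r"\.(?:%s)(?:[/:]|$)" % "|".join(KNOWN_TLDS))

def is_location(argument):
    """Return True if the argument looks like a path, IPv4 address or domain."""
    return (
        "/" in argument
        or IPV4.search(argument) is not None
        or DOMAIN.search(argument) is not None
    )

for candidate in ["~/Downloads", "towel.blinkenlights.nl", "192.168.0.1", "status"]:
    print(candidate, is_location(candidate))
```

Requiring a slash, colon, or end of string after the top-level domain keeps an argument like `file.community` from being counted as a `.com` domain.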
`/dev/null` is not a location for our purposes. Neither is `origin/master`, and thus `gm` → `git merge origin/master` does not count as a bookmark. We also exclude aliases that merely reference unnamed relative directories (e.g., `../..`).

`cd` command is featured heavily. Most other uses seem to be development related, like starting services such as web servers or databases with pre-defined locations, opening frequently edited files, or outputting logs, as in `onoz` → `cat /var/log/errors.log`.
5.2 Modifications
`more` → `less`, replacing a standard Unix utility (`more`) with a more capable but similar command (`less`). This can also be used for subterfuge, as in `emacs` → `vim` (appearing 132 times in our dataset) or indeed `vim` → `emacs` (86 times, alas).

`vi` → `vim`, `vim` → `nvim`, and `vi` → `nvim`.

`ls`
→ `ls -G`, then the alias re-defines the command and effectively overrides its default settings. Any time the command is now executed, it will be with the arguments specified in the alias. There are 319239 aliases in our dataset (14.48%) that are used to override defaults in this way. Aliases to override the defaults of the `grep` family of commands (`grep`, `egrep`, `fgrep`) occur 96970 times, accounting for 4.4% of all alias definitions (and 68.27% of all `grep` appearances). The `ls` command is redefined with new defaults 75374 times, accounting for 3.42% of all aliases (28.99% of `ls` appearances).

`mv`, `cp`, and `rm`, but also `ln`, for creating symbolic links) enable interactive mode (`-i` and variations), which prompts the user before performing potentially destructive actions. Verbose output (`-v`) also plays a role here, describing exactly what kind of effects a command execution had or will have. Enabling verbosity can also be seen as a kind of output formatting, although much more common is the wish for human-readable output. For example, the alias `df` → `df -h` ensures that the available disk space is displayed in common size units, as opposed to just the raw number of bytes. But by far the most common reason for overriding defaults is to enable colorized output. This behavior is so prevalent that we count it as a customization practice in its own right.

`less -R`
or `grep --color=always`), setting an environment variable (as in `ssh` → `TERM=xterm256color ssh`), running the command through a tool that colorizes its output (like `grcat` or `pygmentize`), or even replacing a command outright (`diff` → `colordiff`). Taking all these varieties into account, more than half of all command redefinitions (57.21%) enable colored output by default. This amounts to a surprising 182623 aliases, or 8.29% of all aliases in the dataset. If we extend this count to also include aliases that introduce new names (like `ll` → `ls -l --color=auto`), then more than 10% of aliases colorize a command’s output.

`sudo`
command allows the user to execute another command with superuser privileges. Combining a command with `sudo` is often necessary if the other command needs to modify critical parts of the system. In our dataset, we found 93683 aliases (4.25%) in which a command is prefixed with `sudo`. The top `sudo`-prefixed command is the package manager `apt-get`, appearing 10467 times with `sudo`. Remarkably, these are 89.35% of all occurrences of `apt-get`. In fact, 72.45% of all occurrences of the package managers `apt`* (Debian and derivatives; including `apt`, `apt-get`, `apt-cache`, `aptitude`, and `$apt_pref`), `pacman`, `abs` and `aur` (Arch Linux), `yum` (RPM), `dnf` (Fedora), `zypper` (openSUSE), `port` (macOS), and `gem` (Ruby) are together with `sudo`, and these package managers account for 29.1% of all `sudo`
occurrences. Interestingly, the macOS package manager `brew` rarely appears with `sudo` (only 1.07%), even though it is the third most frequently occurring package manager overall, behind `apt`* and `pacman`.

`systemctl`, `shutdown`, `lsof` or `mount`.

5.3 Scripts
`|`), used in 39.66% of alias scripts, followed by the operators for simple chaining (`;`), with 29.61%, and logical conjunction (`&&`), with 26.88%. Other operators (`||`, `|&`) appear in only 3.85% of multi-command aliases.

`|`) creates an interface between two otherwise separate programs. It embodies the Unix philosophy of small tools doing one thing well, which can then be connected together to accomplish more complex tasks. There are 74719 aliases (3.39%) combining two or more commands using only the pipe operator. The most common command occurring after a pipe, by far, is `grep`, which makes an appearance in almost half of all pipelines (46.16%), more than three times as often as `xargs` and `sort`. The most common data sources are `ps`, `git`, and `ls`, which are found at the beginning of almost a third (32%) of all pipelines. Figure 4 shows a flow diagram of the top pipelines with three commands.
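Statistics of this kind can be computed once every pipeline is reduced to the ordered list of its command names. The Python sketch below uses a handful of made-up toy pipelines, so its percentages bear no relation to the reported ones; the function names and the data layout are assumptions.

```python
from collections import Counter

# Toy pipelines: each inner list holds the command names of one alias,
# in the order in which they are connected by pipes.
pipelines = [
    ["ps", "grep"],
    ["history", "uniq", "tail"],
    ["ls", "grep"],
    ["du", "sort", "more"],
]

def share_after_pipe(pipelines, command):
    """Percentage of pipelines in which `command` appears after a pipe."""
    hits = sum(1 for stages in pipelines if command in stages[1:])
    return 100 * hits / len(pipelines)

def source_counts(pipelines):
    """Count which commands act as the data source (first stage)."""
    return Counter(stages[0] for stages in pipelines)

print(share_after_pipe(pipelines, "grep"))
print(source_counts(pipelines).most_common(3))
```

Counting each pipeline at most once per command matches the phrasing "makes an appearance in almost half of all pipelines"; counting every occurrence instead would give a different figure for pipelines that use the same command twice.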
`diskspace` → `du -S | sort -n -r | more` or `weather` → `wget -qO - http://wttr.in/ | head -7`, to the very terse, as in `h` → `history | uniq | tail -15` or `lll` → `ls -trlh | less`. Interestingly, aliases with the same name usually describe pipelines with the same general shape (the same commands in the same order), but slightly different argument combinations:

- `lsd` → `ls -l | grep "^d"`
- `lsd` → `ls -la | grep ^d`
- `lsd` → `ls -lGFA --color | grep -i "^d.*/"`
- `lsd` → `ls -lh | grep --color=never '^d'`

This highlights the highly personal nature of aliases, each customized for an individual use case.

`brew`
has a subcommand `update`, for updating the package database, and a subcommand `upgrade`, for upgrading previously installed packages to the latest available versions. 28.08% of all aliases involving the `brew` command contain the composition `brew update && brew upgrade` (sometimes with `;` instead of `&&`), with alias names like `update`, `brewup`, `bup`, etc. This pattern of repeated subcommand invocations can be found in 22062 aliases (1%), and it is most prevalent among package managers, like `brew`, `apt-get`, `npm` or `gem`, mostly for the same purpose as above.

`git`, however, with 12063 occurrences (3.89% of all aliases using `git`). Here, the uses are more varied, e.g., `commit` → `git add . && git commit -m`, or `gitpull` → `git stash && git pull && git stash pop`, or indeed `whoops` → `git reset --hard && git clean -df`.

6 Implications
6.1 Learning Repair Rules
$ apt-get install vim
E: Could not open lock file /var/lib/dpkg/lock - open (13:
Permission denied)
E: Unable to lock the administration directory (/var/lib/dpkg/),
are you root?
`apt-get`, or even looking at the specific error that is produced, a command repair system trained on our dataset of alias definitions could easily suggest the correct fix: `sudo apt-get install vim`. It is reasonable to assume that this could be inferred as the correct invocation, because in aliases the command sequence `apt-get install` occurs almost exclusively prefixed with `sudo`.

`systemctl` command:
$ systemctl docker status
Unknown command verb docker.
`systemctl status docker`. It is again very plausible that a repair rule for this type of error could be learned from our dataset, based on the prevalence of aliases containing the command `systemctl` together with an argument `status` that occurs in first position, indicating the latent knowledge that `status` is in fact a subcommand of `systemctl`.

6.2 Discovering Workflows
`sort` the output of the `ps` command, the alias `mem10` → `ps auxf | sort -nr -k 4 | head -10` can serve as a suggestion for the complex but common data transformation that results in showing the ten most memory-intensive processes.

`brew upgrade` results in a failure, we can suggest using `brew update && brew upgrade` instead, based on the patterns in our dataset (cf. Section 6.1).

`comm`
command for comparing sorted files line-by-line is not synthesizable in general, it becomes trivially parallelizable if each of its input lines is known to be unique. Evidence that this is indeed the common case can be found in our dataset, where 41.29% of all occurrences of `comm` follow `sort | uniq` or `sort -u`, and the remainder mostly have unique data sources as input, like `pacman -Qeq`.

6.3 Uncovering Conceptual Design Flaws
`commit` → `git add . && git commit -m` or `gac` → `git add --all && git commit`.15 Another frustration is having to use `git stash` to temporarily save uncommitted changes and clean the working directory in order to avoid conflicts when using other Git commands. Stashing in itself has no higher purpose in version control; it merely exists as a concept to work around limitations in Git.16 This can be seen in aliases like `gspull` → `git stash && git pull && git stash pop`, which defines a new type of pull command that stashes away ongoing work before pulling in remote changes and finally re-applying the stashed work. The same problem occurs when switching branches, hence aliases like `gsc` → `git stash && git checkout $1 && git stash pop`.

`gitstatus` → `git remote update && git status`. Unless one first manually updates Git’s local information about remote branches, the command `git status` will happily report that the local branch is up-to-date with respect to its remote origin, even if the remote repository is in fact many commits ahead.

6.4 Contextual Defaults
`java` → `java -ea -server` ensures that Java programs are always run on a server-optimized virtual machine) or interactive terminal vs shell script use (cf. Section 6.5), or if the tool assumes a certain type of user with different needs than the actual user.

`ffmpeg` command is `ffmpeg` → `ffmpeg -hide_banner`, suppressing verbose default output that can be confusing for newcomers but is helpful for the tool developers when providing support and locating errors.17 We could imagine providing different sets of defaults to different users, effectively alias starter packs, generated from our data. We see parallels to work that investigates contextual preferences and personalization in information systems (De Amo et al. 2015; Stefanidis et al. 2011) and privacy research (Wijesekera et al. 2018; Alom et al. 2019).

6.5 Interactivity vs Scripting
`mv` → `mv -i`. Here, the `mv` command is redefined to always run interactively, prompting the user at critical points, i.e., before overwriting existing files. The default operating mode of `mv`, and most other commands, is to assume that the user is aware of and okay with the possible consequences of running it, and that they have not made any mistakes in its invocation. This is of course a much more useful assumption in a scripting context.

`mount` → `mount | column -t`, which aligns the output of the `mount` command for easier reading, or `df` → `df -h` or `ll` → `ls -lh`, which change the default output of these commands so that file sizes are not shown simply in bytes but rather in much more practical common units like megabytes. The high prevalence of aliases for colorizing output (e.g., `grep` → `grep --color=auto`) is also notable, as color only makes sense in an interactive context. In terminals, colorful text is achieved by inserting ANSI escape codes into the text stream. This is a hindrance for scripts, but tools could easily detect whether they are run in an interactive terminal or as part of a script and adjust their output accordingly.

7 Threats to Validity
8 Related Work
`csh` shell from 168 users. The data was used in a follow-up study to analyze the use of interactive systems by examining the frequency of command invocations for different groups of users (Greenberg and Witten 1988). In later work, Davison and Hirsh (1998) use probabilistic action modeling to predict user action sequences based on the same dataset. Korvemaker and Greiner (2000) similarly predict future action sequences in command lines, but condition on actions of the particular user group with the goal of enabling adaptive user interfaces. Other work in the context of adaptive user interfaces by Jacobs and Blockeel (2001) uses association rule learning on the shell logs to produce scripts to automate common task sequences. Khosmood et al. (2014) use the same corpus and two additional, more recent, corpora to learn a model that can identify user profiles based on their command-line behavior. Bespoke (Vaithilingam and Guo 2019) is a system that synthesizes specialized graphical user interfaces (GUIs) based on command usage. Our work can be viewed as a source of input to such a system: common shell workflows captured in aliases could be passed to it and turned into GUIs.