Smart preparsing with the bash shell

Is hand quoting in bash a pain? Do you want to display the currently executing command line in your xterm title? Don't you wish you could just paste a URL into a waiting bash and have a browser start automatically?

The bash(1) shell is a ubiquitous interactive command line environment. It is powerful, but inevitably has many annoyances. Below is a trick you can add to your .bashrc initialization file, which I call smart preparsing.

When you're using bash interactively in an xterm, smart preparsing lets you execute shell commands, and arbitrarily modify the command line programmatically, right after you type RETURN and right before bash looks at it. This is a generalization of the bash xterm title trick explained previously.

You can do anything you like with the command line this way. You can inspect it to see if it's a URL, then modify it to read "firefox URL". You can take the value and write it into your xterm's title bar. You can see if the command is special, and if so surround the parameters by quotation marks, but only if they're missing.

It's not trivial to do some of these things with a standard shell. For example, the canonical form of a shell statement is "COMMAND [ARG1 ARG2 ...]". If COMMAND isn't an executable program, or some predefined symbol, the typical shell returns an error. So how do we recognize a statement that looks like "http://localhost:8080" as a URL?

In the case of bash(1), the answer lies in using the GNU readline facility creatively. When bash is loaded in an xterm and asks for input, it delegates the work of recognizing the keystrokes to the readline library. In turn, readline has a macro binding facility, which lets us replace the effect of typing a single key with a predefined sequence of other keys.
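
Because everything here relies on readline, it only makes sense in an interactive shell. The sketch below shows one way a .bashrc might guard the setup. It is an assumption about how you could organize things, not part of the original recipe, and the helper name has_interactive_flag is hypothetical. It relies on the special parameter $-, which contains the letter i when bash is interactive.

```shell
# Hypothetical helper: succeeds if the given option-flag string
# (normally the value of $-) contains "i", i.e. the shell is interactive.
has_interactive_flag() {
    case "$1" in
        *i*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Possible use near the readline setup in .bashrc:
if has_interactive_flag "$-"; then
    : # safe to install bind/complete definitions here
fi
```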

Take a look at the following bash commands. Each bind instruction tells readline to replace a sequence of keys with a different sequence of keys:

    bind  '"\C-x1": "\C-afoo \t\C-a\C-d\C-d\C-d\C-d"'
    bind  '"\C-x2": ""'
    bind  '"\C-x3": "\C-abar \t\C-a\C-d\C-d\C-d\C-d"'
    bind  'RETURN: "\C-x1\C-x2\C-x3\C-e\n"'

The last instruction tells readline to replace the RETURN keystroke with the sequence "\C-x1\C-x2\C-x3\C-e\n". In readline-speak, this sequence means "press CONTROL+X, press 1, press CONTROL+X, press 2, press CONTROL+X, press 3, press CONTROL+E, press NEWLINE".

The NEWLINE causes readline to send the command line to bash for processing. If we hadn't set up the bind instructions, then the effect of RETURN would be to press NEWLINE only.

The CONTROL+E key normally tells readline to put the cursor at the end of the line; we do this just to be nice. What about "CONTROL+X followed by 1"? Normally, readline doesn't treat this specially, but look again at the bind instructions above: the first instruction binds "CONTROL+X followed by 1" to another sequence, so when readline sees "CONTROL+X followed by 1" it replaces it with "CONTROL+A, f, o, o, SPACE, TAB, CONTROL+A..." and so on. Similarly, we've set up "CONTROL+X followed by 2" to be replaced by nothing, and "CONTROL+X followed by 3" to be replaced by another complicated sequence.

If you look at the macro definition for "\C-x1", it helps to know that "\C-a" places the cursor at the beginning of the line, and "\C-d" deletes the character immediately following the cursor. So the sequence means: go to the beginning, type 'foo ' (note the trailing space), press TAB, go to the beginning, delete the next four characters.

So imagine the current command line contains "abc". Then "\C-x1" causes readline to replace the command line with "foo abc", to press TAB while the cursor is one space after "foo", then to go back and delete "foo" so that in the end the command line is again "abc". Try seeing what "\C-x3" does, then put all this together and explain how the binding for RETURN works.

So far, this seems rather pointless, but the TAB key has a special meaning in readline: it invokes bash's programmable completion for the command word foo. Normally, bash doesn't know about foo, but if we tell it the following, then the function fooF() is executed whenever TAB is pressed on a line that starts with foo.

function fooF() {
   # some instructions...
}
complete -F fooF foo

What this means for us is that when we press RETURN, readline causes bash to execute fooF(), and fooF() is a normal shell function which can look at the command line through the $COMP_LINE variable. Note that all this is happening before the NEWLINE ("\n") is sent, which subsequently causes bash to actually read and execute the command line.
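
To see $COMP_LINE in isolation, here is a minimal sketch of a completion function that only records what is on the line. The names demo, demoF and SEEN_LINE are hypothetical and chosen for illustration; demo does not need to exist as a program.

```shell
# Hypothetical names: "demo" is a dummy command word and demoF is its
# completion function; neither is part of the original recipe.
demoF() {
    # COMP_LINE is the whole line as readline sees it, including the
    # 5-character dummy prefix "demo ", so strip that before use.
    SEEN_LINE="${COMP_LINE:5}"
    # Offer no completions, so nothing is inserted into the line.
    COMPREPLY=()
}
complete -F demoF demo
```

At an interactive prompt, typing "demo ls -l" and pressing TAB should run demoF; clearing the line and running echo "$SEEN_LINE" afterwards should then show "ls -l".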

So we can do things in fooF(), such as filling in the xterm title with the contents of the command line $COMP_LINE. Another thing we can do is put some text in the variable $COMPREPLY, which holds the completion candidates. When bash returns from executing fooF() and finds a single candidate there, it inserts that text at the cursor position. For example, if we decide to put COMPREPLY="firefox ", then we get the following (the cursor is represented by []):

abc[] (press RETURN)
foo []abc (press TAB)
(complete foo, execute fooF(), set COMPREPLY)
foo firefox []abc (finish the "\C-x1" sequence)
[]firefox abc (the "\C-x1" sequence is now finished)
(start the "\C-x2" sequence)...
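
The trace above can be mimicked with plain string operations, which is handy for checking your understanding without touching any key bindings. This is only a simulation under the assumption that the completion inserts exactly the text we supply; the helper name simulate_x1 is hypothetical.

```shell
# Hypothetical helper that mimics the "\C-x1" macro with string
# operations. $1 is the typed line, $2 is the text put in COMPREPLY.
simulate_x1() {
    local line="$1" reply="$2"
    local prefix="foo "                 # C-a, then type "foo "
    line="${prefix}${reply}${line}"     # TAB inserts the reply after "foo "
    line="${line:${#prefix}}"           # C-a, then C-d four times
    printf '%s\n' "$line"
}

simulate_x1 "abc" "firefox "   # prints: firefox abc
simulate_x1 "abc" ""           # prints: abc
```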

So far, we're able to replace a command line such as "abc" with "firefox abc" and the like, but how do we make bigger changes to the command line? Remember that by default, we bound "\C-x2" to the empty string "", so when it is called next, nothing is done and we go straight to "\C-x3". But in fooF(), we're allowed to rebind "\C-x2" if we like, depending on what we see in the command line.

For example, I have a shell function e() defined as follows:

function e() {
    echo "$@" | bc -l
}

What this function does is call bc(1), the arbitrary precision calculator. I do this so I can type things like "e 2+5" on the command line, and bash responds with 7. Sometimes I type things like

    % e "sqrt(2/77)"
    .16116459280507605967

but more often than not I forget to put in the quote marks, so I really type

    % e sqrt(2/77)
    bash: syntax error near unexpected token `('

This is correct behaviour, since bash treats () specially, but it's very annoying. What I'd like instead is for bash to be smart enough to recognize that I meant to type the argument in quotes.

Let's suppose that the function fooF() recognizes that I've typed "e sqrt(2/77)" and decides to help me. It can rebind the "\C-x2" sequence as follows:

    bind  '"\C-x2": "\C-a\M-f\C-f\"\C-e\""'

This instruction means: replace "CONTROL+X followed by 2" with a sequence which does the following: go to the beginning of the line, go forward one word, go forward one character, press DOUBLEQUOTE, go to the end of the line, press DOUBLEQUOTE again. Imagine that the command line is "e sqrt(2/77)", and see what happens to it when "\C-x2" is called.
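
If you want to check your answer, the effect of that macro can be imitated with string operations. The sketch below assumes the line consists of one command word, a single space, and then the argument text; the helper name simulate_quote_arg is hypothetical.

```shell
# Hypothetical helper mimicking the rebound "\C-x2" macro, assuming the
# line is one command word, a single space, then the argument text.
simulate_quote_arg() {
    local line="$1"
    local cmd="${line%% *}"     # C-a, M-f: skip over the first word
    local rest="${line#* }"     # C-f: step past the separating space
    printf '%s "%s"\n' "$cmd" "$rest"   # insert ", C-e, insert "
}

simulate_quote_arg 'e sqrt(2/77)'   # prints: e "sqrt(2/77)"
```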

Now we're nearly done. We can call arbitrary scripts in fooF(), perform arbitrary edits afterwards by redefining "\C-x2", and then let bash properly see the command line.
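
One way to keep the decision logic testable is to separate "which edit do I want?" from the actual call to bind. The sketch below is an assumption about how that could be organized, not the way the listing further down does it. The helper name choose_x2_macro is hypothetical; it prints the macro body that "\C-x2" would be bound to, or an empty line when no edit is wanted.

```shell
# Hypothetical helper: print the readline macro body that "\C-x2" should
# be bound to for a given command line. An empty result means "no edit".
choose_x2_macro() {
    case "$1" in
        http://*)
            printf '%s\n' '\C-a\"\C-e\"'            # quote the whole line
            ;;
        e\ [\"\']*)
            printf '\n'                             # argument already quoted
            ;;
        e\ *)
            printf '%s\n' '\C-a\M-f\C-f\"\C-e\"'    # quote the argument only
            ;;
        *)
            printf '\n'                             # leave the line alone
            ;;
    esac
}

choose_x2_macro 'e sqrt(2/77)'   # prints: \C-a\M-f\C-f\"\C-e\"
```

Inside fooF(), the printed string could then be placed between the quotes of a bind instruction for "\C-x2".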

The only important thing left to do is to clean up. This is important because if we changed "\C-x2", then it will stay changed forever and we don't want to have strange key sequences executing if we press "CONTROL+X followed by 2" by mistake! To clean up, we simply call another function barF(), using the same technique used for fooF().

Here is the full system in place, which does title bar manipulations, calls a browser on simple URLs and quotes the calculator input. It also quotes the simple URLs in case they contain bad shell characters such as "&".

If you add this somewhere in your .bashrc, then all these things will happen when you press RETURN. But don't do it now, because you'll get errors. The code below needs the bc and firefox programs to be installed, and might interact with other statements in your .bashrc if you've extensively customized it. Just use it as a guide for your own experiments.

# this function redefines the RETURN key in readline,
# so that we can do things before bash executes it.
# 
# This is done by inserting the key sequence C-x1, C-x2, C-x3,
# where C-x1 calls the function fooF(), C-x2 can be redefined
# any way you like, and C-x3 calls the function barF().
#
# The idea is that fooF() performs something like changing the
# xterm title, and maybe rebinds C-x2 so that we can modify the
# command line. Then barF() performs cleanup and other things.
#
# The way that C-x1 calls fooF() is by prepending a dummy function
# foo before the command and pressing TAB, which calls the programmable
# completion facility of bash. Then bash calls fooF() for us, but we 
# must not forget to unset COMPREPLY, otherwise we'll get garbage displayed. 
# If you want to fill in COMPREPLY, you should do so in barF()
#
# We need the three stages because we can't change the command line
# directly inside the fooF() function, at least I don't know how.
#
# Currently, I'm using fooF() to change the xterm title, and look at
# the command line to see if I want to protect it from shell expansion.
# C-x2 either does nothing, or puts quote marks around the command line.
# barF() does cleanup.

# Call this function once, after all the functions below are defined,
# to install the key bindings and completion hooks.
function setup-interactive-foo() {
    # the macros below assume the emacs editing mode of readline
    set -o emacs
    bind  '"\C-x1": "\C-afoo \t\C-a\C-d\C-d\C-d\C-d"'
    bind  '"\C-x2": ""'
    bind  '"\C-x3": "\C-abar \t\C-a\C-d\C-d\C-d\C-d"'
    bind  'RETURN: "\C-x1\C-x2\C-x3\C-e\n"'

    complete -F fooF foo
    complete -F barF bar
}

function setup-quote-protect() {
    bind  '"\C-x2": "\C-a\"\C-e\""'
}

function setup-quote-protect-arg() {
    bind  '"\C-x2": "\C-a\M-f\C-f\"\C-e\""'
}

function clear-quote-protect() {
    bind  '"\C-x2": ""'
}

# succeeds if the character at offset $2 of string $1 is a quote mark
function check-quote-char() {
    local C=${1:$2:1}
    [ "$C" = '"' ] || [ "$C" = "'" ]
}

# command line looks like "foo $@"
function fooF() {
    # place command line in xterm title bar
    # (printf with %s avoids interpreting backslashes typed on the line)
    printf '\033]0;bash: %s\007' "${COMP_LINE:4}"
    
    # rudimentary parsing of command line
    case "${COMP_LINE:4}" in
        http://*)
            setup-quote-protect
            ;;
        e\ *)
            if ! check-quote-char "${COMP_LINE:4}" 2; then
                setup-quote-protect-arg
            fi
            ;;
    esac
    unset -v COMPREPLY
}

# command line looks like "bar $@"
function barF() {
    clear-quote-protect
    case "${COMP_LINE:4}" in
        \"http*)
            COMPREPLY="firefox "
            ;;
        *)
            unset -v COMPREPLY
            ;;
    esac
}

function e() {
    echo "$@" \
        | sed 's/sin/s/g;s/cos/c/g;s/arctan/a/g;s/log/l/g;s/exp/e/g;s/bessel/j/g;' \
        | bc -l
}
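
While experimenting, it helps to have a way back to the normal RETURN key. The function below is a sketch of such an undo step and is not part of the listing above; the name teardown_interactive_foo is hypothetical. It assumes the bindings were installed by setup-interactive-foo and relies on accept-line being readline's usual function for RETURN.

```shell
# Hypothetical helper, not part of the original listing: restore the
# default RETURN behaviour and remove the helper bindings.
teardown_interactive_foo() {
    bind 'RETURN: accept-line'   # readline's usual function for RETURN
    bind -r '\C-x1'              # remove the three helper macros
    bind -r '\C-x2'
    bind -r '\C-x3'
    complete -r foo bar          # drop the dummy completion specs
}
```

Typing teardown_interactive_foo at the prompt and pressing RETURN should run the preparsing macros one last time and then leave you with a plain shell again.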