Welcome to the spot monorepo where multiple applications share code via toucanlib.
The applications are:
- spot: The text editor,
- m4: An implementation of the m4 macro processor,
- bc: A basic calculator, and
- freq: Determines the character frequency in a file.
spot and m4 utilise the built-in regular expression module in toucanlib, which understands the regex syntax listed in a later section.
toucanlib has a monolithic interface to make usage easy, but multiple modules to make maintenance easy. Currently, the modules include:
- gen: Generic functions,
- num: Number functions,
- buf: Various buffers,
- gb: Gap buffers,
- eval: Arithmetic expression evaluation,
- ht: Hash table,
- regex: Regular expressions,
- curses: Curses (terminal graphics), and
- fs: File system related functions.
This software is cross-platform and has been written in ANSI C with the minimum use of non-standard libraries. To install, edit one of the scripts below (depending upon your operating system) to set `install_dir`.
Then simply run:
$ make
$ PREFIX="$HOME" make install
$ make clean
or
> nmake /F nMakefile
> set PREFIX=%HOMEDRIVE%%HOMEPATH%&& nmake /F nMakefile install
> nmake /F nMakefile clean
Make sure `PREFIX/bin` is included in your `PATH`.
spot is a cross-platform text editor that has been written in ANSI C with the minimum use of non-standard libraries.
It uses double-buffering to display flicker-free graphics without using any curses library.
Text is edited using gap buffers, which are very efficient for most operations. A nice balance has been achieved between optimisation, features, and code maintainability.
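The gap buffer idea can be sketched in a few lines of C. This is a minimal illustration with hypothetical names (`struct gap_buf`, `gb_insert`, `gb_left` and friends are not toucanlib's gb module). The text is stored in one array with a movable gap at the cursor, so inserting or deleting at the cursor is cheap, and moving the cursor only shifts characters across the gap.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: the text is a[0..g) followed by a[c..e);
 * a[g..c) is the gap, and the cursor sits at g. */
struct gap_buf {
    char *a;    /* allocation */
    size_t g;   /* gap start (cursor) */
    size_t c;   /* gap end (char under the cursor, if c < e) */
    size_t e;   /* allocation size */
};

int gb_init(struct gap_buf *b, size_t size)
{
    if (size == 0)
        size = 1;
    if ((b->a = malloc(size)) == NULL)
        return 1;
    b->g = 0;
    b->c = b->e = size;
    return 0;
}

/* Double the allocation, moving the text after the gap to the new end. */
static int gb_grow(struct gap_buf *b)
{
    size_t tail = b->e - b->c, new_e = b->e * 2;
    char *t = realloc(b->a, new_e);
    if (t == NULL)
        return 1;
    memmove(t + new_e - tail, t + b->c, tail);
    b->a = t;
    b->c = new_e - tail;
    b->e = new_e;
    return 0;
}

/* Insert at the cursor: only the gap shrinks, no text is shifted. */
int gb_insert(struct gap_buf *b, char ch)
{
    if (b->g == b->c && gb_grow(b))
        return 1;
    b->a[b->g++] = ch;
    return 0;
}

/* Delete the char under the cursor by widening the gap. */
int gb_delete(struct gap_buf *b)
{
    if (b->c == b->e)
        return 1;
    b->c++;
    return 0;
}

/* Moving the cursor moves one char across the gap. */
int gb_left(struct gap_buf *b)
{
    if (b->g == 0)
        return 1;
    b->g--;
    b->c--;
    b->a[b->c] = b->a[b->g];
    return 0;
}

int gb_right(struct gap_buf *b)
{
    if (b->c == b->e)
        return 1;
    b->a[b->g] = b->a[b->c];
    b->g++;
    b->c++;
    return 0;
}

size_t gb_len(const struct gap_buf *b)
{
    return b->g + (b->e - b->c);
}

/* Copy the text, without the gap, into out (gb_len(b) + 1 bytes). */
void gb_text(const struct gap_buf *b, char *out)
{
    memcpy(out, b->a, b->g);
    memcpy(out + b->g, b->a + b->c, b->e - b->c);
    out[gb_len(b)] = '\0';
}
```

Growing by doubling keeps repeated inserts cheap on average. The cost appears when the cursor jumps far, because every character in between has to cross the gap.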
The region is the area between the cursor and the mark, with whichever appears first included in the region and whichever appears last excluded from the region. It is cleared by editing commands, and navigational commands are used to change its coverage.
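To make the half-open rule concrete, here is a tiny sketch. It assumes the cursor and mark are plain character offsets into the buffer, and `region_bounds` is a hypothetical name, not spot's code.

```c
#include <stddef.h>

/* Compute the region as the half-open range [*start, *end): the
 * earlier of cursor and mark is included, the later one excluded. */
void region_bounds(size_t cursor, size_t mark, size_t *start, size_t *end)
{
    if (cursor < mark) {
        *start = cursor;
        *end = mark;
    } else {
        *start = mark;
        *end = cursor;
    }
}
```

The region length is then simply `*end - *start`, which is zero when the cursor sits on the mark.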
The status bar displays `!` if the last command failed, followed by `*` if the buffer has been modified. The filename associated with the buffer is presented next, followed by the current row and column number in brackets. The hex value of the char under the cursor (which may be in the command line) is then displayed. Finally, if the last command included a shell command which succeeded (the process terminated normally), then the exit status is displayed.
The command line is at the bottom of the window and is used for two-step commands that require user input. Most single-step commands work inside the command line.
The keybindings are listed below. `^a` means pressing `Ctrl` plus `a`. `^[` is the `Ctrl` key plus `[`, but is also generated by pressing the `Esc` key. `LK` denotes the left arrow key, and `RK` denotes the right arrow key.
Keys | Command |
---|---|
^b | Left character |
^f | Right character |
^p | Up line |
^n | Down line |
^d | Delete character |
^h | Backspace character |
^a | Start of line |
^e | End of line |
^2 | Set mark |
^g | Clear mark, or exit command line |
^l | Centre cursor on the screen and redraw graphics |
^w | Cut region |
^y | Paste |
^k | Cut to end of line |
^t | Trim trailing white-space and remove non-printable chars |
^s | Exact forward search (moves cursor to start of the match) |
^z | Regex forward search (moves cursor to after the match) |
^r | Regex replace region * |
^u | Go to line number |
^q | Insert hex |
^[ b | Left word |
^[ f | Right word |
^[ l | Lowercase word |
^[ u | Uppercase word |
^[ k | Cut to start of line |
^[ m | Match bracket <>, [], {}, or () |
^[ n | Repeat last search |
^[ w | Copy region |
^[ ! | Remove current gap buffer without saving ^ |
^[ = | Rename gap buffer |
^[ $ | Insert shell command from the command line |
^[ ` | Insert shell command of logical line under the cursor + |
^[ < | Start of gap buffer |
^[ > | End of gap buffer |
^x ^c | Close editor without saving any buffers |
^x ^s | Save the current buffer only |
^x ^f | New gap buffer |
^x i | Insert file |
^x LK | Move left one gap buffer |
^x RK | Move right one gap buffer |
+
The logical line under the cursor is formed by joining neighbouring lines that end in a backslash, to accommodate long lines. These end-of-line backslashes are removed from the logical line, as are `\n` characters. `2>&1` is added to the end of the logical line to capture `stderr` in most situations. If some `stderr` text comes through uncaptured, then it can be cleared by redrawing the screen (`^l`).
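Here is a rough sketch of how such a logical line could be assembled from a plain string. It assumes `t` holds the buffer text and `pos` is the cursor offset. `logical_line` and `line_start` are hypothetical names, and the space inserted before `2>&1` is an assumption.

```c
#include <stdlib.h>
#include <string.h>

/* Offset of the start of the physical line containing pos. */
static size_t line_start(const char *t, size_t pos)
{
    while (pos > 0 && t[pos - 1] != '\n')
        pos--;
    return pos;
}

/* Build the logical line around offset pos in t (pos <= strlen(t)).
 * Returns a malloc'd string, or NULL on allocation failure. */
char *logical_line(const char *t, size_t pos)
{
    static const char suffix[] = " 2>&1";   /* leading space assumed */
    size_t s = line_start(t, pos), e = pos, i, j = 0;
    char *out;

    /* Extend backwards while the previous line ends in a backslash. */
    while (s >= 2 && t[s - 2] == '\\')
        s = line_start(t, s - 1);
    /* Extend forwards while the current line ends in a backslash. */
    for (;;) {
        while (t[e] != '\0' && t[e] != '\n')
            e++;
        if (t[e] == '\n' && e > 0 && t[e - 1] == '\\')
            e++;
        else
            break;
    }
    if ((out = malloc(e - s + sizeof suffix)) == NULL)
        return NULL;
    /* Copy, dropping each '\n' and the backslash that precedes it. */
    for (i = s; i < e; i++) {
        if (t[i] == '\n' || (t[i] == '\\' && t[i + 1] == '\n'))
            continue;
        out[j++] = t[i];
    }
    memcpy(out + j, suffix, sizeof suffix);
    return out;
}
```

With the cursor on either half of `echo a \` / `b`, this yields `echo a b 2>&1`.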
*
Regex replace region syntax is `_find_replace` where the first character (in this example, `_`) is the delimiter. The anchors `^` and `$` are relative to the region, not the buffer.
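Splitting this syntax is straightforward. The sketch below (a hypothetical `parse_replace_cmd`) takes the delimiter from the first character and splits at its next occurrence. It assumes the delimiter cannot be escaped inside the find part, which is not specified above.

```c
#include <string.h>

/* Split cmd into a find part and a replace part. cmd[0] is the
 * delimiter. Returns 0 on success, 1 if cmd is empty or the second
 * delimiter is missing. The replace part runs to the end of cmd. */
int parse_replace_cmd(const char *cmd, const char **find, size_t *find_len,
                      const char **rep)
{
    const char *sep;

    if (cmd[0] == '\0')
        return 1;
    sep = strchr(cmd + 1, cmd[0]);
    if (sep == NULL)
        return 1;
    *find = cmd + 1;
    *find_len = (size_t) (sep - (cmd + 1));
    *rep = sep + 1;
    return 0;
}
```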
^
The text editor will exit if this is the last gap buffer.
m4 is a general-purpose macro processor. It performs text replacement, similar to the C preprocessor, although it is not tied to any specific language. It performs programmatic find-and-replace on text, but can also perform other tasks, such as executing shell commands.
m4 has two classes of macros: built-in macros and user-defined macros.
Built-in macros are written in C and can only be added or modified by editing the source code and recompiling. User-defined macros are written in the m4 language and are added using the `define` macro. Macros from either class can be undefined (removed), but built-in macros cannot come back with their old built-in nature (although you can reuse the name for a new user-defined macro).
m4 reads word-by-word from a centralised input buffer. If quote mode is not active, each word is looked up in a hash table to see if it is the name of a macro. If it is, then the macro is pushed onto the stack. If the macro takes arguments, then these are collected. When the macro is finished, for user-defined macros, the arguments are substituted into the placeholders in the macro definition and the resultant text is pushed back into the input. Built-in macros usually perform some other operation on the arguments, and some of them also push the result back into the input.
Nested macro calls are handled by the stack. While collecting the arguments of one macro, another macro may be encountered. m4 deals with macros immediately: it adds the new macro to the stack and starts collecting its arguments. Only once processing of the inner macro is done will execution return to the outer macro. The inner macro may push its result back into the input, which will then be read and rechecked for macros, until eventually the text has been fully expanded and ends up in the argument collection of the outer macro. This gives m4 its powerful recursive nature.
m4 also has flexible output. At any given time the output is either the argument collection of a macro, or one of the eleven diversions (output buffers). Diversion -1 is discarded and is useful when defining a lot of macros. Diversion 0 is regularly flushed to `stdout` and is the default. Diversions 1 to 9 are temporary storage. They can be written to file, which is very useful when writing dynamic code. They can also be undiverted into each other, and will all be flushed to `stdout` if the program exits without error.
m4 uses quotes to suppress the expansion of macros, passing words directly to the output. Quotes can be nested. When entering quote mode the left quote is eaten, and when exiting quote mode the right quote is eaten, but quotes in between are retained. Due to the recursive nature of m4, text is often evaluated multiple times, and each time the outer layer of quotes is stripped.
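The quote-eating rule can be modelled with a depth counter. This is only a sketch. It assumes single-character quotes (real m4 quote strings may be longer), it ignores macro lookup, and `strip_quote_layer` is a hypothetical name.

```c
#include <stdlib.h>
#include <string.h>

/* Strip one layer of quotes from in, as happens when quoted text is
 * read from the input. Returns a malloc'd string, or NULL on
 * allocation failure. */
char *strip_quote_layer(const char *in, char lq, char rq)
{
    char *out = malloc(strlen(in) + 1);
    char *p = out;
    size_t depth = 0;

    if (out == NULL)
        return NULL;
    for (; *in != '\0'; in++) {
        if (*in == lq) {
            if (depth++ > 0)        /* nested left quote: retained */
                *p++ = *in;
        } else if (*in == rq && depth > 0) {
            if (--depth > 0)        /* nested right quote: retained */
                *p++ = *in;
        } else {
            *p++ = *in;             /* ordinary char, or unmatched rq */
        }
    }
    *p = '\0';
    return out;
}
```

With `[` and `]` as quotes, one pass turns `[[hello $1]]` into `[hello $1]`, and a pass over `[hello world]` gives `hello world`, matching the example further down.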
m4 only checks for macros when reading from the input in non-quote mode. During argument collection, quote mode prevents commas from being interpreted as argument separators (as do added parentheses). Once argument collection is finished, quotes have no effect on the substitution of collected arguments into their placeholders; this occurs irrespective of the depth of quotation.
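Here is a simplified model of argument collection. Commas split arguments only at quote depth 0 and parenthesis depth 0, and the outer layer of quotes is eaten as it is read. `collect_args`, `MAX_ARGS` and `ARG_LEN` are hypothetical. Quotes are assumed to be single characters, nested macro calls are not expanded, and white-space after a comma is kept, which may differ from the real implementation.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ARGS 10   /* hypothetical limits for the sketch */
#define ARG_LEN 64

/* s points just after the opening '('. Returns the number of
 * arguments collected, or -1 if ')' is missing or a limit is hit. */
int collect_args(const char *s, char lq, char rq, char args[MAX_ARGS][ARG_LEN])
{
    int n = 0, keep;
    size_t len = 0, quote = 0, paren = 0;
    char ch;

    for (; *s != '\0'; s++) {
        ch = *s;
        keep = 1;
        if (ch == lq) {
            keep = quote++ > 0;            /* outer left quote is eaten */
        } else if (ch == rq && quote > 0) {
            keep = --quote > 0;            /* outer right quote is eaten */
        } else if (quote == 0) {
            if (ch == '(') {
                paren++;                   /* added parentheses nest */
            } else if (ch == ')' && paren > 0) {
                paren--;
            } else if (ch == ')' || (ch == ',' && paren == 0)) {
                args[n][len] = '\0';       /* end of this argument */
                n++;
                len = 0;
                if (ch == ')')
                    return n;
                if (n == MAX_ARGS)
                    return -1;
                keep = 0;
            }
        }
        if (keep) {
            if (len + 1 >= ARG_LEN)
                return -1;
            args[n][len++] = ch;
        }
    }
    return -1;                             /* no closing ')' */
}
```

For example, `a, [b, c], f(x, y))` collects three arguments, `a`, ` b, c` and ` f(x, y)`.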
This version of m4 has advanced safety features to make it easier to detect unintended usage. Built-in macros do not have name pass-through when called without arguments: to use a built-in macro name without it being detected, it must be quoted. All macros must be passed the correct number of arguments; variadic macros are not allowed.
In the example below, the quotes are changed to `[` and `]`. Then a new macro called `world` is created with a defining text of `cool`. `world` is then shown to expand to `cool`, as expected.
It superficially looks like `x` is defined as `[[hello $1]]`; however, as `[[hello $1]]` is read from the input, quote mode is entered and exited, stripping off the outer layer of quotes. As shown by `dumpdef`, `x` is actually defined with a single layer of quotes, `[hello $1]`.
`x` is called with what appears to be an argument of `[world]`. However, as `[world]` is read from the input, quote mode is entered and exited, stripping off the quotes, resulting in a gathered argument of `world` (which was not looked up in the hash table due to the quote mode).
The argument, `world`, is substituted into the placeholder `$1`, interestingly, unaffected by the quotes in the definition! The result, `[hello world]` (which cannot be viewed directly), is pushed back into the input. When this is re-read, quote mode is entered and exited again, stripping off the quotes and preventing `world` from being expanded, leaving the output as simply `hello world`.
Quoting the input to `x` and the output from `x` gave no opportunity for `world` to expand to `cool`.
changequote([, ])
define(world, cool)
world
cool
define(x, [[hello $1]])
dumpdef([x])
User-def: x: [hello $1]
x([world])
hello world
m4exit
I strongly recommend learning m4. For a small investment in learning, it provides a powerful tool that is free from a lot of the limitations imposed by many programming languages.
I will now introduce the built-in macros.
changequote(left_quote, right_quote)
Sets the left and right quote strings. Please note that they must be different, non-empty strings that can only contain graph (visible), non-comma, non-parenthesis characters. It is normally a good idea to pick strings that are not commonly used in any downstream programming language, such as `<[` and `]>`.
define(`macro_name', `macro_def')
`define` is used to create user-defined macros. If the macro already exists, then the old macro will be replaced, even if it is a built-in macro (in which case the built-in version cannot be brought back). Please note that quotes are needed when replacing a macro, otherwise the macro will expand during argument collection, prior to the `define` macro being executed. Macro names must start with an alpha character or underscore, followed by zero or more alpha, digit, or underscore characters. The macro definition is the text that the macro will expand into. It can take argument placeholders, `$0` to `$9`. `$0` is the macro name. `$1` to `$9` are the arguments collected when the macro is called.
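Placeholder substitution can be sketched as a two-pass copy: measure, then fill. `substitute` is a hypothetical name. Following the rules above, it ignores quotes in the definition and treats a placeholder without a matching argument as an error.

```c
#include <stdlib.h>
#include <string.h>

/* $0 becomes the macro name and $1 to $9 the collected arguments.
 * Returns a malloc'd string, or NULL if a placeholder has no
 * matching argument or allocation fails. */
char *substitute(const char *def, const char *name,
                 const char *args[], int nargs)
{
    const char *p, *v;
    size_t len = 0;
    char *out, *q;
    int k;

    /* First pass: validate placeholders and measure the result. */
    for (p = def; *p != '\0'; p++) {
        if (*p == '$' && p[1] >= '0' && p[1] <= '9') {
            k = p[1] - '0';
            if (k > nargs)
                return NULL;
            len += strlen(k == 0 ? name : args[k - 1]);
            p++;
        } else {
            len++;
        }
    }
    if ((out = malloc(len + 1)) == NULL)
        return NULL;
    /* Second pass: copy, expanding each placeholder (quotes ignored). */
    for (p = def, q = out; *p != '\0'; p++) {
        if (*p == '$' && p[1] >= '0' && p[1] <= '9') {
            k = p[1] - '0';
            v = k == 0 ? name : args[k - 1];
            memcpy(q, v, strlen(v));
            q += strlen(v);
            p++;
        } else {
            *q++ = *p;
        }
    }
    *q = '\0';
    return out;
}
```

For instance, substituting `world` into `[hello $1]` gives `[hello world]`, as in the earlier walkthrough.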
divert(div_num)
`divert` changes the active diversion. m4 starts in diversion 0, which is regularly flushed to `stdout`. Diversion -1 is discarded, which is often useful when defining multiple macros, as the leftover newline characters are typically not wanted in the output.
divnum
`divnum` pushes the active diversion number into the input.
dnl
`dnl` deletes up to and including the next newline character. It is often used for single-line comments or for removing the newline character after a macro definition.
dumpdef(`macro_name', ...)
`dumpdef` prints the definitions of the specified macros (which should be quoted) to `stderr`. Useful as a help command.
dumpdefall
Prints all macro definitions to `stderr`. Useful when debugging.
errexit
Exit upon the first user-related error.
errok
Continue execution even with user-related errors.
traceon
Prints the location in the input and the name of each macro to `stderr` when it is called.
traceoff
Turns off trace.
errprint(error_message)
`errprint` prints a message to `stderr`.
esyscmd(shell_command)
`esyscmd` runs an operating-system-specific shell command and reads the `stdout` of that command into the input.
evalmath(arithmetic_expression, verbose)
`evalmath` evaluates an arithmetic expression. It understands `(`, `)`, `^`, `*`, `/`, `%` (modulus), and unary and binary `+` and `-`. It works with signed long integers. If verbose is 1, then the postfix form of the expression is printed to `stderr`.
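For readers curious how such an evaluator can work, here is a recursive-descent sketch over `long`. It is not necessarily toucanlib's approach (the verbose postfix output suggests the eval module converts to postfix first), and the precedence choices are assumptions. `^` binds tightest and is right-associative. Unary `+`/`-` bind more loosely than `^`, so `-2^2` is -4. Negative exponents are rejected, and overflow is not checked. `eval_str` is a hypothetical name.

```c
#include <ctype.h>

/* Parser state: current position and a sticky error flag. */
struct parser {
    const char *s;
    int err;
};

static long expr(struct parser *p);
static long unary(struct parser *p);

/* Skip white-space and return the next char without consuming it. */
static char peek(struct parser *p)
{
    while (isspace((unsigned char) *p->s))
        p->s++;
    return *p->s;
}

/* primary := number | '(' expr ')' */
static long primary(struct parser *p)
{
    long v = 0;
    if (peek(p) == '(') {
        p->s++;
        v = expr(p);
        if (peek(p) == ')')
            p->s++;
        else
            p->err = 1;
        return v;
    }
    if (!isdigit((unsigned char) *p->s))
        p->err = 1;
    while (isdigit((unsigned char) *p->s))
        v = v * 10 + (*p->s++ - '0');
    return v;
}

/* power := primary ['^' unary], so 2^3^2 is 2^9 */
static long power(struct parser *p)
{
    long base = primary(p), e, r = 1;
    if (peek(p) != '^')
        return base;
    p->s++;
    e = unary(p);
    if (e < 0)
        p->err = 1;          /* negative exponents rejected (assumption) */
    while (e-- > 0)
        r *= base;
    return r;
}

/* unary := ('+' | '-') unary | power */
static long unary(struct parser *p)
{
    char c = peek(p);
    if (c == '-' || c == '+') {
        p->s++;
        return c == '-' ? -unary(p) : unary(p);
    }
    return power(p);
}

/* term := unary {('*' | '/' | '%') unary} */
static long term(struct parser *p)
{
    long v = unary(p), r;
    char op;
    while ((op = peek(p)) == '*' || op == '/' || op == '%') {
        p->s++;
        r = unary(p);
        if (op == '*')
            v *= r;
        else if (r == 0)
            p->err = 1;      /* division or modulus by zero */
        else
            v = op == '/' ? v / r : v % r;
    }
    return v;
}

/* expr := term {('+' | '-') term} */
static long expr(struct parser *p)
{
    long v = term(p), r;
    char op;
    while ((op = peek(p)) == '+' || op == '-') {
        p->s++;
        r = term(p);
        v = op == '+' ? v + r : v - r;
    }
    return v;
}

/* Returns 0 and stores the value in *res on success, 1 on error. */
int eval_str(const char *s, long *res)
{
    struct parser p;
    p.s = s;
    p.err = 0;
    *res = expr(&p);
    return p.err || peek(&p) != '\0';
}
```

The same grammar would also suit bc, which understands the same operators.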
ifdef(`macro_name', `when_defined', `when_undefined')
`ifdef` checks to see if the first argument is a macro, and if so, pushes the second argument back into the input. Otherwise, the third argument is pushed back into the input. The macro name should be quoted to prevent it from expanding during argument collection. Also, importantly, macros will be expanded and processed immediately during argument collection, before the branch in logic. So, the second and third arguments should also be quoted.
ifelse(switch, case_a, `when_a', case_b, `when_b', ... , `default')
`ifelse` is like a switch statement in C. The first argument is the string that is compared against the 2nd, 4th, 6th, ... arguments, and upon the first match the next argument is pushed back into the input. If there is no match, then the default argument (the last argument) is pushed back into the input. Remember that arguments will be expanded and processed during argument collection, which occurs before the branch in logic. So, it is a good idea to quote the 3rd, 5th, 7th, ... and last arguments.
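Once the arguments have been collected, the selection itself is simple. The hypothetical `ifelse_select` below assumes an even argument count (the switch, case/result pairs, then the default) and returns NULL otherwise.

```c
#include <string.h>

/* args[0] is the switch, args[1], args[3], ... are the cases,
 * args[2], args[4], ... their results, and args[n - 1] the default. */
const char *ifelse_select(const char *args[], int n)
{
    int i;

    if (n < 4 || n % 2 != 0)
        return NULL;
    for (i = 1; i + 1 < n - 1; i += 2)
        if (strcmp(args[0], args[i]) == 0)
            return args[i + 1];     /* first match wins */
    return args[n - 1];             /* no match: default */
}
```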
include(filename)
`include` pushes the contents of a file into the input. Macros will be processed.
incr(number)
`incr` increments a number. The result is pushed into the input.
lsdir(dir_name)
`lsdir` inserts a directory listing, with a line of hyphens separating the directories (shown first) from the files (shown second).
m4exit(exit_value)
`m4exit` allows the user to request early termination of m4, specifying the desired exit value in the first argument. Please note that the specified exit value will be overwritten if any errors occurred at any time during the script.
recrm(file_path)
`recrm` recursively removes a path if it exists.
regexrep(text, regex_find, replace, newline_sensitive, verbose)
`regexrep` searches text for a regex pattern and replaces the matches. If newline_sensitive is 1, then newline-sensitive matching occurs. If verbose is 1, then the postfix form of the expression and the nondeterministic finite automaton (NFA) structure are printed to `stderr`.
sysval
`sysval` pushes the return value of the last shell command run via `esyscmd` into the input.
tnl(str)
tnl
trims trailing newline characters from the end of the first argument.
undefine(`macro_name')
`undefine` removes a macro from the hash table. It is necessary to quote the macro name. Built-in macros cannot be retrieved once undefined.
undivert(div_num_or_filename)
`undivert` appends the contents of a diversion or file onto the current active diversion. Undiverted diversions are emptied. A diversion cannot be undiverted into itself, and diversion -1 cannot be undiverted (as it is discarded). It is important to note that no processing occurs during this operation: macros are not expanded.
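The diversion rules can be modelled with eleven buffers, where diversion `d` lives at index `d + 1`. This is a hypothetical sketch (`struct divs` and the `divs_*` functions are illustrative, and the fixed capacity is a simplification). It shows `divert`, output to the active diversion, and `undivert` with its restrictions.

```c
#include <string.h>

#define NUM_DIVS 11      /* diversions -1 to 9 */
#define DIV_CAP 256      /* fixed capacity: a simplification */

struct divs {
    char buf[NUM_DIVS][DIV_CAP];   /* diversion d at index d + 1 */
    size_t len[NUM_DIVS];
    int active;                    /* -1 to 9, starts at 0 */
};

void divs_init(struct divs *d)
{
    memset(d, 0, sizeof *d);
    d->active = 0;
}

int divs_divert(struct divs *d, int num)
{
    if (num < -1 || num > 9)
        return 1;
    d->active = num;
    return 0;
}

/* Output goes to the active diversion; diversion -1 discards it. */
int divs_write(struct divs *d, const char *s)
{
    size_t n = strlen(s), i = (size_t) (d->active + 1);

    if (d->active == -1)
        return 0;
    if (d->len[i] + n >= DIV_CAP)
        return 1;
    memcpy(d->buf[i] + d->len[i], s, n + 1);
    d->len[i] += n;
    return 0;
}

/* Append diversion num to the active one, then empty num. Undiverting
 * -1, or a diversion into itself, is an error. No macro processing. */
int divs_undivert(struct divs *d, int num)
{
    size_t src, dst;

    if (num < 0 || num > 9 || num == d->active)
        return 1;
    src = (size_t) (num + 1);
    dst = (size_t) (d->active + 1);
    if (d->active != -1) {
        if (d->len[dst] + d->len[src] >= DIV_CAP)
            return 1;
        memcpy(d->buf[dst] + d->len[dst], d->buf[src], d->len[src] + 1);
        d->len[dst] += d->len[src];
    }
    d->len[src] = 0;
    d->buf[src][0] = '\0';
    return 0;
}
```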
writediv(div_num, filename, append)
`writediv` empties the specified diversion to file, creating any missing directories in the file path. If append is 1, then it appends to the end of the file; otherwise, the file is overwritten.
bc is a cross-platform basic calculator. It reads from `stdin` and works with signed long integers. It understands `(`, `)`, `^`, `*`, `/`, `%` (modulus), and unary and binary `+` and `-`.
`freq` determines the character frequency in a file. Non-graph characters are displayed using their hex value. A character and its count are separated by a space, and only characters present in the file are reported.
Usage:
freq file
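Here is a minimal sketch of the counting and reporting, with bytes tallied in a 256-entry table. The two-digit uppercase hex format (e.g. `0A` for a newline) is a guess, because the exact format is not stated. `count_chars` and `print_freq` are hypothetical names.

```c
#include <ctype.h>
#include <stdio.h>

/* Tally every byte read from fp into counts[0..255]. Returns 1 on a
 * read error, otherwise 0. */
int count_chars(FILE *fp, unsigned long counts[256])
{
    int ch, i;

    for (i = 0; i < 256; i++)
        counts[i] = 0;
    while ((ch = getc(fp)) != EOF)
        counts[ch]++;
    return ferror(fp) ? 1 : 0;
}

/* Print "char count" for each byte that occurred, in byte order.
 * Non-graph bytes are shown in hex (format assumed). */
void print_freq(const unsigned long counts[256], FILE *out)
{
    int i;

    for (i = 0; i < 256; i++) {
        if (counts[i] == 0)
            continue;               /* only present characters */
        if (isgraph(i))
            fprintf(out, "%c %lu\n", i, counts[i]);
        else
            fprintf(out, "%02X %lu\n", (unsigned int) i, counts[i]);
    }
}
```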
Regular expression syntax:
- Special escaped characters (interpreted in all contexts):
  - `\t` Tab.
  - `\n` Line feed.
  - `\r` Carriage return.
  - `\0` Null character.
  - `\xAA` Two-digit hex value.
- Non-special escape characters. Interpretation is deactivated inside character sets:
  - `\c` Any other character, `c`, is interpreted literally.
- `[^a-b]` Character set.
- `(` and `)` Grouping.
- Anchors. These work line-wise when in newline sensitive mode, otherwise they work at the entire text level (uninfluenced by embedded `\n` characters):
  - `^` Start of line.
  - `$` End of line.
- Repetition operators:
  - `*` Zero or more.
  - `+` One or more.
  - `?` Zero or one.
- `|` Alternate.
- `.` Any character, except `\n` when in newline sensitive mode.
All escaped characters work in both the find and replace components of a regular expression replacement operation. Typically, escape sequences are deactivated inside character sets, with the exception of the special escaped characters mentioned above.
For example, `[\x41-\x5A]` is the character set containing the characters `A` to `Z`, inclusive, whereas `[\*]` is the character set containing the characters `\` and `*`.
- Most characters are treated literally inside a character set (see the escaped character rules above).
- `^` immediately after the opening `[` negates the set and is not considered as a character.
- `-` between characters creates an inclusive range.
- Character sets cannot be empty.
- The first character (after any negating `^`) is always treated literally, so `[^]]` is the set of all characters except for `]`.
- As an example, `[][^\%-]` is the set containing the characters `]`, `[`, `^`, `\`, `%`, and `-`.
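The character set rules above can be sketched as a parser that fills a 256-entry membership table. `parse_set` is a hypothetical name, and it takes the text just after `[`. Treating a reversed range such as `z-a` or a malformed `\x` escape as an error is an assumption.

```c
#include <string.h>

static int hexval(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Read one member. Only the special escapes are interpreted inside a
 * set; any other backslash is a literal '\'. Returns -1 on error. */
static int read_set_char(const char **sp)
{
    const char *s = *sp;
    int h, l;

    if (s[0] == '\\') {
        switch (s[1]) {
        case 't': *sp = s + 2; return '\t';
        case 'n': *sp = s + 2; return '\n';
        case 'r': *sp = s + 2; return '\r';
        case '0': *sp = s + 2; return '\0';
        case 'x':
            h = hexval(s[2]);
            l = h < 0 ? -1 : hexval(s[3]);
            if (l < 0)
                return -1;
            *sp = s + 4;
            return h * 16 + l;
        default:
            *sp = s + 1;
            return '\\';
        }
    }
    *sp = s + 1;
    return (unsigned char) s[0];
}

/* On success, fills set (1 = member), points *end just past the
 * closing ']' and returns 0. Returns 1 on error. */
int parse_set(const char *s, unsigned char set[256], const char **end)
{
    int negate = 0, first = 1, lo, hi, c;

    memset(set, 0, 256);
    if (*s == '^') {
        negate = 1;                         /* '^' right after '[' */
        s++;
    }
    for (;;) {
        if (*s == '\0')
            return 1;                       /* missing ']' */
        if (*s == ']' && !first)
            break;                          /* first char is literal */
        if ((lo = read_set_char(&s)) < 0)
            return 1;
        first = 0;
        if (s[0] == '-' && s[1] != ']' && s[1] != '\0') {
            s++;                            /* inclusive range */
            if ((hi = read_set_char(&s)) < 0 || hi < lo)
                return 1;
            for (c = lo; c <= hi; c++)
                set[c] = 1;
        } else {
            set[lo] = 1;
        }
    }
    if (negate)
        for (c = 0; c < 256; c++)
            set[c] = !set[c];
    *end = s + 1;
    return 0;
}
```

With this sketch, `][^\%-]` yields exactly the six characters listed above, `\x41-\x5A]` yields `A` to `Z`, and a lone `]` is rejected because sets cannot be empty.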
Enjoy, Logan =)_