Project author: modernish

Project description:
Modernish is a library for writing robust, portable, readable, and powerful programs for POSIX-based shells and utilities.
Language: Shell
Repository: git://github.com/modernish/modernish.git
Created: 2016-02-03T22:48:38Z
Project community: https://github.com/modernish/modernish

License: ISC License

Download: see Releases.

For code examples, see EXAMPLES.md and share/doc/modernish/examples.

modernish – harness the shell

  • Sick of quoting hell and split/glob pitfalls?
  • Tired of brittle shell scripts going haywire and causing damage?
  • Mystified by line noise commands like [, [[, (( ?
  • Is scripting basic things just too hard?
  • Ever wish that find were a built-in shell loop?
  • Do you want your script to work on nearly any shell on any Unix-like OS?

Modernish is a library for shell script programming which provides features
like safer variable and command expansion, new language constructs for loop
iteration, and much more. Modernish programs are shell programs; the new
constructs are mixed with shell syntax so that the programmer can take
advantage of the best of both.

There is no compiled code to install, as modernish is written entirely in the
shell language. It can be deployed in embedded or multi-user systems in which
new binary executables may not be introduced for security reasons, and is
portable among numerous shell implementations. The installer can also
bundle
a reduced copy of the library with your scripts, so they can run portably with
a known version of modernish without requiring prior installation.

Join us and help breathe some new life into the shell! We
are looking for testers, early adopters, and developers to join us.
Download the latest release
or check out the very latest development code from the master branch.
Read through the documentation below. Play with the example scripts and
write your own. Try to break the library and send reports of breakage.

Getting started

Run install.sh and follow instructions, choosing your preferred shell
and install location. After successful installation you can run modernish
shell scripts and write your own. Run uninstall.sh to remove modernish.

Both the install and uninstall scripts are interactive by default, but
support fully automated (non-interactive) operation as well. Command
line options are as follows:

install.sh [ -n ] [ -s shell ] [ -f ] [ -P pathspec ]
[ -d installroot ] [ -D prefix ] [ -B scriptfile … ]

  • -n: non-interactive operation
  • -s: specify default shell to execute modernish
  • -f: force unconditional installation on specified shell
  • -P: specify an alternative DEFPATH for the installation (be careful;
    usually *not* recommended)
  • -d: specify root directory for installation
  • -D: extra destination directory prefix (for packagers)
  • -B: bundle modernish with your scripts (-D required, -n implied); see
    Appendix F

uninstall.sh [ -n ] [ -f ] [ -d installroot ]

  • -n: non-interactive operation
  • -f: delete */modernish directories even if files are left in them
  • -d: specify root directory of modernish installation to uninstall

Two basic forms of a modernish program

In the simple form, modernish is added to a script written for a specific
shell. In the portable form, your script is shell-agnostic and may run on any
shell that can run modernish.

Simple form

The simplest way to write a modernish program is to source modernish as a
dot script. For example, if you write for bash:

    #! /bin/bash
    . modernish
    use safe
    use sys/base
    ...your program starts here...

The modernish use command loads modules with optional functionality. The
safe module initialises the safe mode.
The sys/base module contains modernish versions of certain basic but
non-standardised utilities (e.g. readlink, mktemp, which), guaranteeing
that modernish programs all have a known version at their disposal. There are
many other modules as well. See Modules for more
information.

The above method makes the program dependent on one particular shell (in this
case, bash). So it is okay to mix and match functionality specific to that
particular shell with modernish functionality.

(On zsh, there is a way to integrate modernish with native zsh scripts. See
Appendix E.)

Portable form

The most portable way to write a modernish program is to use the special
generic hashbang path for modernish programs. For example:

    #! /usr/bin/env modernish
    #! use safe
    #! use sys/base
    ...your program begins here...

For portability, it is important that there is no space after env modernish;
NetBSD and OpenBSD consider trailing spaces part of the name, so env will
fail to find modernish.

A program in this form is executed by whatever shell the user who installed
modernish on the local system chose as the default shell. Since you as the
programmer can’t know what shell this is (other than the fact that it passed
some rigorous POSIX compliance testing executed by modernish), a program in
this form must be strictly POSIX compliant – except, of course, that it
should also make full use of the rich functionality offered by modernish.

Note that modules are loaded in a different way: the use commands are part of
a hashbang comment (starting with #! like the initial hashbang path). Only such
lines that immediately follow the initial hashbang path are evaluated; even
an empty line in between causes the rest to be ignored.
This special way of pre-loading modules is needed to make any aliases they
define work reliably on all shells.

Interactive use

Modernish is primarily designed to enhance shell programs/scripts, but also
offers features for use in interactive shells. For instance, the new repeat
loop construct from the var/loop module can be quite practical to repeat
an action x times, and the safe module on interactive shells provides
convenience functions for manipulating, saving and restoring the state of
field splitting and globbing.

To use modernish on your favourite interactive shell, you have to add it to
your .profile, .bashrc or similar init file.

Important: Upon initialising, modernish adapts itself to
other settings, such as the locale. It also removes certain aliases that
may keep modernish from initialising properly. So you have to organise your
.profile or similar file in the following order:

  • first, define general system settings (PATH, locale, etc.);
  • then, . modernish and use any modules you want;
  • then define anything that may depend on modernish, and set your aliases.
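
As an illustration, a ~/.profile might be organised in that order like this (the paths, locale and aliases are placeholder values, and var/loop is just one module you might load):

```shell
# ~/.profile — example ordering (all values here are illustrative)

# 1. General system settings first:
export PATH=/usr/local/bin:/usr/bin:/bin
export LANG=en_US.UTF-8

# 2. Then load modernish and any modules (skipped if it is not installed):
if command -v modernish >/dev/null 2>&1; then
    . modernish
    use var/loop        # e.g. for the LOOP/repeat constructs
fi

# 3. Finally, definitions that may depend on modernish, and your aliases:
alias ll='ls -l'
```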

Non-interactive command line use

After installation, the modernish command can be invoked as if it were a
shell, with the standard command line options from other shells (such as
-c to specify a command or script directly on the command line), plus some
enhancements. The effect is that the shell chosen at installation time will
be run enhanced with modernish functionality. It is not possible to use
modernish as an interactive shell in this way.

Usage:

  1. modernish [ --use=module | shelloption … ]
    [ scriptfile ] [ arguments ]
  2. modernish [ --use=module | shelloption … ]
    -c [ script [ me-name [ arguments ] ] ]
  3. modernish --test [ testoption … ]
  4. modernish [ --version | --help ]

In the first form, the script in the file scriptfile is
loaded and executed with any arguments assigned to the positional parameters.

In the second form, -c executes the specified modernish
script, optionally with the me-name assigned to $ME and the
arguments assigned to the positional parameters.

The --use option pre-loads any given modernish modules
before executing the script.
The module argument to each specified --use option is split using
standard shell field splitting. The first field is the module name and any
further fields become arguments to that module’s initialisation routine.

Any given short-form or long-form shelloptions are
set or unset before executing the script. Both POSIX
shell options
and shell-specific options are supported, depending on
the shell executing modernish.
Using the shell option -e or -o errexit is an error, because modernish
does not support it and
would break.

The --test option runs the regression test suite and exits. This verifies
that the modernish installation is functioning correctly. See
Appendix B
for more information.

The --version and --help options output the relevant information and exit.

Non-interactive usage examples

  • Count to 10 using a basic loop:
    modernish --use=var/loop -c 'LOOP for i=1 to 10; DO putln "$i"; DONE'
  • Run a portable-form
    modernish program using zsh and enhanced-prompt xtrace:
    zsh /usr/local/bin/modernish -o xtrace /path/to/program.sh

Shell capability detection

Modernish includes a battery of shell feature, quirk and bug detection
tests, each of which is given a special capability ID.
See Appendix A for a
list of shell capabilities that modernish currently detects, as well
as further general information on the capability detection framework.

thisshellhas is the central function of the capability detection
framework. It not only tests for the presence of shell features/quirks/bugs,
but can also detect specific shell built-in commands, shell reserved words,
shell options (short or long form), and signals.

Modernish itself extensively uses capability detection to adapt itself to the
shell it’s running on. This is how it works around shell bugs and takes
advantage of efficient features not all shells have. But any script using
the library can do this in the same way, with the help of this function.

Test results are cached in memory, so repeated checks using thisshellhas
are efficient and there is no need to avoid calling it to optimise
performance.

Usage:

thisshellhas item

  • If item contains only ASCII capital letters A-Z, digits 0-9 or _,
    return the result status of the associated modernish
    capability detection test.
  • If item is any other ASCII word, check if it is a shell reserved
    word or built-in command on the current shell.
  • If item is -- (end-of-options delimiter), disable the recognition of
    operators starting with - for subsequent items.
  • If item starts with --rw= or --kw=, check if the identifier
    immediately following these characters is a shell reserved word
    (a.k.a. shell keyword).
  • If item starts with --bi=, similarly check for a shell built-in command.
  • If item starts with --sig=, check if the shell knows about a signal
    (usable by kill, trap, etc.) by the name or number following the =.
    If a number > 128 is given, the remainder of its division by 128 is checked.
    If the signal is found, its canonicalised signal name is left in the
    REPLY variable, otherwise REPLY is unset. (If multiple --sig= items
    are given and all are found, REPLY contains only the last one.)
  • If item is -o followed by a separate word, check if this shell has a
    long-form shell option by that name.
  • If item is any other letter or digit preceded by a single -, check if
    this shell has a short-form shell option by that character.
  • item can also be one of the following two operators.
    • --cache runs all external modernish shell capability tests
      that have not yet been run, causing the cache to be complete.
    • --show performs a --cache and then outputs all the IDs of
      positive results, one per line.

thisshellhas continues to process items until one of them produces a
negative result or is found invalid, at which point any further items are
ignored. So the function only returns successfully if all the items
specified were found on the current shell. (To check if either one item or
another is present, use separate thisshellhas invocations separated by the
|| shell operator.)

Exit status: 0 if this shell has all the items in question; 1 if not; 2 if
an item was encountered that is not recognised as a valid identifier.

Note: The tests for the presence of reserved words, built-in commands,
shell options, and signals are different from capability detection tests in an
important way: they only check if an item by that name exists on this shell,
and don’t verify that it does the same thing as on another shell.
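
A sketch of typical use (it exits quietly if modernish is not installed; ARITHPP is one of the capability IDs mentioned elsewhere in this document):

```shell
# Sketch: adapt a script to the current shell's capabilities.
command -v modernish >/dev/null 2>&1 || exit 0
. modernish

# Capability ID test (all-caps IDs query the detection framework):
if thisshellhas ARITHPP; then
    putln "this shell supports the ++ and -- arithmetic operators"
fi

# Several items at once: all must be present for success.
if thisshellhas --bi=printf -o noglob; then
    putln "printf is a built-in and the noglob long-form option exists"
fi

# A signal lookup leaves the canonicalised name in REPLY:
if thisshellhas --sig=15; then
    putln "signal 15 is known here as: $REPLY"
fi
```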

Names and identifiers

All modernish functions require portable variable and shell function names,
that is, ones consisting of ASCII uppercase and lowercase letters, digits,
and the underscore character _, and that don’t begin with a digit. For shell
option names, the constraints are the same except a dash - is also
accepted. An invalid identifier is generally treated as a fatal error.

Internal namespace

Function-local variables are not supported by the standard POSIX shell; only
global variables are provided for. Modernish needs a way to store its
internal state without interfering with the program using it. So most of the
modernish functionality uses an internal namespace _Msh_* for variables,
functions and aliases. All these names may change at any time without
notice. Any names starting with _Msh_ should be considered sacrosanct and
untouchable; modernish programs should never directly use them in any way.

Of course this is not enforceable, but names starting with _Msh_ should be
uncommon enough that no unintentional conflict is likely to occur.

Modernish system constants

Modernish provides certain constants (read-only variables) to make life easier.
These include:

  • $MSH_VERSION: The version of modernish.
  • $MSH_PREFIX: Installation prefix for this modernish installation (e.g.
    /usr/local).
  • $MSH_MDL: Main modules directory.
  • $MSH_AUX: Main helper scripts directory.
  • $MSH_CONFIG: Path to modernish user configuration directory.
  • $ME: Path to the current program. Replacement for $0. This is
    necessary if the hashbang path #!/usr/bin/env modernish is used, or if
    the program is launched like sh /path/to/bin/modernish /path/to/script.sh, as these set $0 to the path to bin/modernish and
    not your program’s path.
  • $MSH_SHELL: Path to the default shell for this modernish installation,
    chosen at install time (e.g. /bin/sh). This is a shell that is known to
    have passed all the modernish tests for fatal bugs. Cross-platform scripts
    should use it instead of hard-coding /bin/sh, because on some operating
    systems (NetBSD, OpenBSD, Solaris) /bin/sh is not POSIX compliant.
  • $SIGPIPESTATUS: The exit status of a command killed by SIGPIPE (a
    broken pipe). For instance, if you use grep something somefile.txt | more and you quit more before grep is finished, grep is killed by
    SIGPIPE and exits with that particular status.
    Hardened commands or functions may need to handle such a SIGPIPE exit
    specially to avoid unduly killing the program. The exact value of this
    exit status is shell-specific, so modernish runs a quick test to determine
    it at initialisation time.
    If SIGPIPE was set to ignore by the process that invoked the current
    shell, $SIGPIPESTATUS can’t be detected and is set to the special value
    1. See also the description of the WRN_NOSIGPIPE ID for thisshellhas.
  • $DEFPATH: The default system path guaranteed to find compliant POSIX
    utilities, as given by getconf PATH.
  • $ERROR: A guaranteed unset variable that can be used to trigger an
    error that exits the (sub)shell, for instance:
    : "${4+${ERROR:?excess arguments}}" (error on 4 or more arguments)

Control character, whitespace and shell-safe character constants

POSIX does not provide for the quoted C-style escape codes commonly used in
bash, ksh and zsh (such as $'\n' to represent a newline character),
leaving the standard shell without a convenient way to refer to control
characters. Modernish provides control character constants (read-only
variables) with hexadecimal suffixes $CC01 .. $CC1F and $CC7F, as well as $CCe,
$CCa, $CCb, $CCf, $CCn, $CCr, $CCt, $CCv (corresponding with
printf backslash escape codes). This makes it easy to insert control
characters in double-quoted strings.

More convenience constants, handy for use in bracket glob patterns for use
with case or modernish match:

  • $CONTROLCHARS: All ASCII control characters.
  • $WHITESPACE: All ASCII whitespace characters.
  • $ASCIIUPPER: The ASCII uppercase letters A to Z.
  • $ASCIILOWER: The ASCII lowercase letters a to z.
  • $ASCIIALNUM: The ASCII alphanumeric characters 0-9, A-Z and a-z.
  • $SHELLSAFECHARS: Safe-list for shell-quoting.
  • $ASCIICHARS: The complete set of ASCII characters (minus NUL).

Usage examples:

    # Use a glob pattern to check against control characters in a string:
    if str match "$var" "*[$CONTROLCHARS]*"; then
        putln "\$var contains at least one control character"
    fi
    # Use '!' (not '^') to check for characters *not* part of a particular set:
    if str match "$var" "*[!$ASCIICHARS]*"; then
        putln "\$var contains at least one non-ASCII character"
    fi
    # Safely split fields at any whitespace, comma or slash (requires safe mode):
    use safe
    LOOP for --split=$WHITESPACE,/ field in $my_items; DO
        putln "Item: $field"
    DONE

Reliable emergency halt

The die function reliably halts program execution, even from within
subshells, optionally
printing an error message. Note that die is meant for an emergency program
halt only, i.e. in situations where continuing would mean the program is in an
inconsistent or undefined state. Shell scripts running in an inconsistent or
undefined state may wreak all sorts of havoc. They are also notoriously
difficult to terminate correctly, especially if the fatal error occurs within
a subshell: exit won’t work then. That’s why die is optimised for
killing all the program’s processes (including subshells and external
commands launched by it) as quickly as possible. It should never be used for
exiting the program normally.

On interactive shells, die behaves differently. It does not kill or exit your
shell; instead, it issues SIGINT to the shell to abort the execution of your
running command(s), which is equivalent to pressing Ctrl+C.
In addition, if die is invoked from a subshell such as a background job, it
kills all processes belonging to that job, but leaves other running jobs alone.

Usage: die [ message ]

If the trap stack module
is active, a special
DIE pseudosignal
can be trapped (using plain old trap or
pushtrap)
to perform emergency cleanup commands upon invoking die.

If the MSH_HAVE_MERCY variable is set in a script and die is invoked
from a subshell, then die will only terminate the current subshell and its
subprocesses and will not execute DIE traps, allowing the script to resume
execution in the parent process. This is for use in special cases, such as
regression tests, and is strongly discouraged for general use. Modernish
unsets the variable on init so it cannot be inherited from the environment.
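
A sketch of die with an emergency cleanup trap (this assumes the trap stack module is named var/stack/trap, and the block exits quietly if modernish is not installed):

```shell
# Sketch: register emergency cleanup, then guard a vital command with die.
command -v modernish >/dev/null 2>&1 || exit 0
. modernish
use var/stack/trap          # assumed module name for the trap stack

# This only runs if die is ever invoked:
pushtrap 'putln "emergency cleanup would run here" >&2' DIE

# If a vital command fails, halt the whole program at once:
test -d / || die "root directory is missing"
putln "die was never invoked, so the DIE trap never ran"
```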

Low-level shell utilities

Outputting strings

The POSIX shell lacks a simple, straightforward and portable way to output
arbitrary strings of text, so modernish adds two commands for this.

  • put prints each argument separated by a space, without a trailing newline.
  • putln prints each argument, terminating each with a newline character.

There is no processing of options or escape codes. (Modernish constants
$CCn, etc.
can be used to insert control characters in double-quoted strings. To process escape codes, use
printf
instead.)
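
For example (a sketch that exits quietly if modernish is not installed):

```shell
command -v modernish >/dev/null 2>&1 || exit 0
. modernish

put "Enter a name: "        # no trailing newline: arguments joined by spaces
putln ""                    # now terminate that line
putln "one" "two"           # each argument ends its own line
putln "-n"                  # no option processing: prints '-n' literally
```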

The echo command is notoriously unportable and somewhat broken, so it is
deprecated in favour of put and putln. Modernish does provide its own
version of echo, but it is only activated for portable-form
scripts. Otherwise, the shell-specific version of echo is left intact.
The modernish version of echo does not interpret any escape codes
and supports only one option, -n, which, like BSD echo, suppresses the
final newline. However, unlike BSD echo, if -n is the only argument, it is
not interpreted as an option and the string -n is printed instead. This makes
it safe to output arbitrary data using this version of echo as long as it is
given as a single argument (using quoting if needed).

Legibility aliases: not, so, forever

Modernish sets three aliases that can help to make the shell language look
slightly friendlier. Their use is optional.

not is a new synonym for !. They can be used interchangeably.

so is a command that tests if the previous command exited with a status
of zero, so you can test the preceding command’s success with if so or
if not so.

forever is a new synonym for while :;. This allows simple infinite loops
of the form: forever do stuff; done.
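
All three aliases in one sketch (exits quietly if modernish is not installed; the break keeps the forever loop finite):

```shell
command -v modernish >/dev/null 2>&1 || exit 0
. modernish

test -d /surely/nonexistent
if not so; then
    putln "the previous command failed"
fi

# 'forever' loops until something breaks out of it:
i=0
forever do
    i=$((i + 1))
    if let "i >= 3"; then break; fi
done
putln "looped $i times"
```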

Enhanced exit

The exit command can be used as normal, but has gained some extra capabilities.

Extended usage: exit [ -u ] [ status [ message ] ]

  • As per standard, if status is not specified, it defaults to the exit
    status of the command executed immediately prior to exit.
    Otherwise, it is evaluated as a shell arithmetic expression. If it is
    invalid as such, the shell exits immediately with an arithmetic error.
  • Any remaining arguments after status are combined, separated by spaces,
    and taken as a message to print on exit. The message shown is preceded by
    the name of the current program ($ME minus directories). Note that it is
    not possible to skip status while specifying a message.
  • If the -u option is given, and the shell function showusage is defined,
    that function is run in a subshell before exiting. It is intended to print
    a message showing how the command should be invoked. The -u option has no
    effect if the script has not defined a showusage function.
  • If status is non-zero, the message and the output of the showusage
    function are redirected to standard error.
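
A sketch of the -u option in action (the function and file names are placeholders; the block exits quietly if modernish is not installed):

```shell
command -v modernish >/dev/null 2>&1 || exit 0
. modernish

showusage() {
    putln "usage: ${ME##*/} [ -q ] file ..."
}

check_args() {
    if let "$# == 0"; then
        # status 2 is non-zero, so message and usage go to standard error:
        exit -u 2 "at least one file argument is required"
    fi
}
check_args "some-file.txt"      # passes, so the program continues
putln "arguments OK"
```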

chdir

chdir is a robust cd replacement for use in scripts.

The standard cd command
is designed for interactive shells and appropriate to use there.
However, for scripts, its features create serious pitfalls:

  • The $CDPATH variable is searched. A script may inherit a user’s
    exported $CDPATH, so cd may change to an unintended directory.
  • cd cannot be used with arbitrary directory names (such as untrusted user
    input), as some operands have special meanings, even after --. POSIX
    specifies that - changes directory to $OLDPWD. On zsh (even in sh mode
    on zsh <= 5.7.1), numeric operands such as +12 or -345 represent
    directory stack entries. All such paths need escaping by prefixing ./.
  • Symbolic links in directory path components are not resolved by default,
    leaving a potential symlink attack vector.

Thus, robust and portable use of cd in scripts is unreasonably difficult.
The modernish chdir function calls cd in a way that takes care of all
these issues automatically: it disables $CDPATH and special operand
meanings, and resolves symbolic links by default.

Usage: chdir [ -f ] [ -L ] [ -P ] [ -- ] directorypath

Normally, failure to change the present working directory to directorypath
is a fatal error that ends the program. To tolerate failure, add the -f
option; in that case, exit status 0 signifies success and exit status 1
signifies failure, and scripts should always check and handle exceptions.

The options -L (logical: don’t resolve symlinks) and -P (physical:
resolve symlinks) are the same as in cd, except that -P is the default.
Note that on a shell with BUG_CDNOLOGIC (NetBSD sh),
the -L option to chdir does nothing.

To use arbitrary directory names (e.g. directory names input by the user or
other untrusted input) always use the -- separator that signals the end of
options, or paths starting with - may be misinterpreted as options.
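
For instance (a sketch; /tmp stands in for an untrusted path, and the block exits quietly if modernish is not installed):

```shell
command -v modernish >/dev/null 2>&1 || exit 0
. modernish

dir=/tmp                    # imagine this came from untrusted input

# '-f' tolerates failure; '--' keeps odd names like '-foo' or '+12' safe:
if chdir -f -- "$dir"; then
    putln "now in: $PWD"
else
    putln "could not enter: $dir" >&2
fi
```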

insubshell

The insubshell function checks if you’re currently running in a
subshell environment
(usually called simply subshell).

A subshell is a copy of the parent shell that starts out as an exact
duplicate (including non-exported variables, functions, etc.), except for
traps. A new subshell is invoked by constructs like (parentheses),
$(command substitutions), pipe|lines, and & (to launch a background
subshell). Upon exiting a subshell, all changes to its state are lost.

This is not to be confused with a newly initialised shell that is
merely a child process of the current shell, which is sometimes
(confusingly and wrongly) called a “subshell” as well.
This documentation avoids such a misleading use of the term.

Usage: insubshell [ -p | -u ]

This function returns success (0) if it was called from within a subshell
and non-success (1) if not. One of two options can be given:

  • -p: Store the process ID (PID) of the current subshell or main shell
    in REPLY.
  • -u: Store an identifier in REPLY that is useful for determining if
    you’ve entered a subshell relative to a previously stored identifier. The
    content and format are unspecified and shell-dependent.
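
For example (a sketch that exits quietly if modernish is not installed):

```shell
command -v modernish >/dev/null 2>&1 || exit 0
. modernish

insubshell && putln "in a subshell" || putln "in the main shell"

# A command substitution always runs in a subshell:
result=$(insubshell && putln yes || putln no)
putln "inside \$( ): $result"
```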

isset

isset checks if a variable, shell function or option is set, or has
certain attributes. Usage:

  • isset varname: Check if a variable is set.
  • isset -v varname: Identical to the above.
  • isset -x varname: Check if variable is exported.
  • isset -r varname: Check if variable is read-only.
  • isset -f funcname: Check if a shell function is set.
  • isset -optionletter (e.g. isset -C): Check if shell option is set.
  • isset -o optionname: Check if shell option is set by long name.

Exit status: 0 if the item is set; 1 if not; 2 if the argument is not
recognised as a valid identifier.
Unlike most other modernish commands, isset does not treat an invalid
identifier as a fatal error.

When checking a shell option, a nonexistent shell option is not an error,
but returns the same result as an unset shell option. (To check if a shell
option exists, use thisshellhas.)

Note: just isset -f checks if shell option -f (a.k.a. -o noglob) is
set, but with an extra argument, it checks if a shell function is set.
Similarly, isset -x checks if shell option -x (a.k.a -o xtrace)
is set, but isset -x varname checks if a variable is exported. If you
use unquoted variable expansions here, make sure they’re not empty, or
the shell’s empty removal mechanism will cause the wrong thing to be checked
(even in the safe mode).
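
The variants above in one sketch (the variable names are placeholders; the block exits quietly if modernish is not installed):

```shell
command -v modernish >/dev/null 2>&1 || exit 0
. modernish

foo=bar
export exp=hi
unset -v gone

isset foo && putln "foo is set"
isset -x exp && putln "exp is exported"
isset gone || putln "gone is not set"
isset -f putln && putln "putln is a shell function"
if isset -C; then putln "noclobber is on"; else putln "noclobber is off"; fi
```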

setstatus

setstatus manually sets the exit status $? to the desired value. The
function exits with the status indicated. This is useful in conditional
constructs if you want to prepare a particular exit status for a subsequent
exit or return command to inherit under certain circumstances.
The status argument is parsed as a shell arithmetic expression. A negative
value is treated as a fatal error. The behaviour of values greater than 255
is not standardised and depends on your particular shell.
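
A sketch of the intended pattern: prepare a status inside a conditional, then let a plain return inherit it (the function is a hypothetical example; the block exits quietly if modernish is not installed):

```shell
command -v modernish >/dev/null 2>&1 || exit 0
. modernish

validate() {
    case "$1" in
    ( "" ) setstatus 2 ;;       # prepare a distinct failure status...
    ( *  ) setstatus 0 ;;
    esac
    return                      # ...which the plain 'return' inherits
}
validate ""
putln "status for empty argument: $?"       # 2
```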

Testing numbers, strings and files

The test/[ command is the bane of casual shell scripters. Even advanced
shell programmers are frequently caught unaware by one of the many pitfalls
of its arcane, hackish syntax. It attempts to look like shell grammar without
being shell grammar, causing myriad problems
(1,
2).
Its -a, -o, ( and ) operators are inherently and fatally broken as
there is no way to reliably distinguish operators from operands, so POSIX
deprecates their use;
however, most manual pages do not include this essential information, and
even the few that do will not tell you what to do instead.

Ksh, zsh and bash offer a [[ alternative that fixes many of these problems,
as it is integrated into the shell grammar. Nevertheless, it increases
confusion, as entirely different grammar and quoting rules apply
within [[]] than outside it, yet many scripts end up using them
interchangeably. It is also not available on all POSIX shells. (To make
matters worse, Busybox ash has a false-friend [[ that is just an alias
of [, with none of the shell grammar integration!)

Finally, the POSIX test/[ command is incompatible with the modernish
“safe mode” which aims to eliminate most of the need to quote variables.
See use safe for more information.

Modernish deprecates test/[ and [[ completely. Instead, it offers a
comprehensive alternative command design that works with the usual shell
grammar in a safer way while offering various feature enhancements. The
following replacements are available:

Integer number arithmetic tests and operations

To test if a string is a valid number in shell syntax, str isint is
available. See String tests.

The arithmetic command let

An implementation of let as in ksh, bash and zsh is now available to all
POSIX shells. This makes C-style signed integer arithmetic evaluation
available to every
supported shell,
with the exception of the unary ++ and -- operators
(which are a nonstandard shell capability detected by modernish under the ID of
ARITHPP).

This means let should be used for operations and tests, e.g. both
let "x=5" and if let "x==5"; then… are supported (note: single = for
assignment, double == for comparison). See POSIX
2.6.4 Arithmetic Expansion
for more information on the supported operators.

Multiple expressions are supported, one per argument. The exit status of let
is zero (the shell’s idea of success/true) if the last expression argument
evaluates to non-zero (the arithmetic idea of true), and 1 otherwise.

It is recommended to adopt the habit to quote each let expression with
"double quotes", as this consistently makes everything work as expected:
double quotes protect operators that would otherwise be misinterpreted as
shell grammar, while shell expansions starting with $ continue to work.
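
Putting that together (a sketch that exits quietly if modernish is not installed):

```shell
command -v modernish >/dev/null 2>&1 || exit 0
. modernish

let "x = 5"                     # single '=' assigns
if let "x == 5"; then           # double '==' compares
    putln "x is five"
fi

# One expression per argument; the last one determines the exit status:
let "y = x * 3" "y > 10" && putln "y = $y, and y > 10"
```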

Arithmetic shortcuts

Various handy functions that make common arithmetic operations
and comparisons easier to program are available from the
var/arith module.

String and file tests

The following notes apply to all commands described in the subsections of
this section:

  1. “True” is understood to mean exit status 0, and “false” is understood to
    mean a non-zero exit status – specifically 1.
  2. Passing more than the number of arguments specified for each command
    is a fatal error. (If the
    safe mode is not used, excessive arguments
    may be generated accidentally if you forget to quote a variable. The
    test result would have been wrong anyway, so modernish kills the
    program immediately, which makes the problem much easier to trace.)
  3. Passing fewer than the number of arguments specified to the command is
    assumed to be the result of removal of an empty unquoted expansion.
    Where possible, this is not treated as an error, and an exit status
    corresponding to the omitted argument(s) being empty is returned instead.
    (This helps make the safe mode possible; unlike
    with test/[, paranoid quoting to avoid empty removal is not needed.)

String tests

The str function offers various operators for tests on strings. For
example, str in $foo "bar" tests if the variable foo contains “bar”.

The str function takes unary (one-argument) operators that check a property
of a single word, binary (two-argument) operators that check a word against a
pattern, as well as an option that makes binary operators check multiple words
against a pattern.

Unary string tests

Usage: str operator [ word ]

The word is checked for the property indicated by operator; if the result
is true, str returns status 0, otherwise it returns status 1.

The available unary string test operators are:

  • empty: The word is empty.
  • isint: The word is a decimal, octal or hexadecimal integer number in
    valid POSIX shell syntax, safe to use with let, $(()) and other
    arithmetic contexts on all POSIX-derived shells. This operator ignores
    leading (but not trailing) spaces and tabs.
  • isvarname: The word is a valid portable shell variable or function name.

If word is omitted, it is treated as empty, on the assumption that it is
an unquoted empty variable. Passing more than one argument after the
operator is a fatal error.

Binary string matching tests

Usage: str operator [ [ word ] pattern ]

The word is compared to the pattern according to the operator; if it
matches, str returns status 0, otherwise it returns status 1.
The available binary matching operators are:

  • eq: word is equal to pattern.
  • ne: word is not equal to pattern.
  • in: word includes pattern.
  • begin: word begins with pattern.
  • end: word ends with pattern.
  • match: word matches pattern as a shell glob pattern
    (as in the shell’s native case construct).
    A pattern that ends in an unescaped backslash is considered invalid
    and causes str to return status 2.
  • ematch: word matches pattern as a POSIX
    extended regular expression.
    An empty pattern is a fatal error.
    (In UTF-8 locales, check if
    thisshellhas WRN_EREMBYTE
    before matching multi-byte characters.)
  • lt: word lexically sorts before (is ‘less than’) pattern.
  • le: word is lexically ‘less than or equal to’ pattern.
  • gt: word lexically sorts after (is ‘greater than’) pattern.
  • ge: word is lexically ‘greater than or equal to’ pattern.

If word is omitted, it is treated as empty on the assumption that it is an
unquoted empty variable, and the single remaining argument is assumed to be
the pattern. Similarly, if both word and pattern are omitted, an empty
word is matched against an empty pattern. Passing more than two
arguments after the operator is a fatal error.
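These operators map closely onto quoted `case` patterns: a quoted expansion in a `case` pattern is matched literally, while an unquoted one is matched as a glob. A plain-POSIX sketch (function names illustrative, not the modernish API):

```shell
# Illustrative plain-POSIX stand-ins for some binary matching operators.
# Quoting "$2" in the pattern makes it literal; leaving it unquoted
# (str_match) lets the shell match it as a glob pattern.
str_begin() { case $1 in ( "$2"* )  ;; ( * ) return 1 ;; esac; }
str_end()   { case $1 in ( *"$2" )  ;; ( * ) return 1 ;; esac; }
str_in()    { case $1 in ( *"$2"* ) ;; ( * ) return 1 ;; esac; }
str_match() { case $1 in (  $2   )  ;; ( * ) return 1 ;; esac; }
```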

Multi-matching option

Usage: str -M operator [ [ word … ] pattern ]

The -M option causes str to compare any number of words to the
pattern. The available operators are the same as the binary string
matching operators listed above.

All matching words are stored in the REPLY variable, separated
by newline characters ($CCn) if there is more than one match.
If no words match, REPLY is unset.

The exit status returned by str -M is as follows:

  • If no words match, the exit status is 1.
  • If one word matches, the exit status is 0.
  • If between two and 254 words match, the exit status is the number of matches.
  • If 255 or more words match, the exit status is 255.

Usage example: the following matches a given GNU-style long-form command
line option $1 against a series of available options. To make it possible
for the options to be abbreviated, we check if any of the options begin with
the given argument $1.

```sh
if str -M begin --fee --fi --fo --fum --foo --bar --baz --quux "$1"; then
	putln "OK. The given option $1 matched $REPLY"
else
	case $? in
	( 1 ) putln "No such option: $1" >&2 ;;
	( * ) putln "Ambiguous option: $1" "Did you mean:" "$REPLY" >&2 ;;
	esac
fi
```

File type tests

These avoid the snags with symlinks you get with [ and [[.
By default, symlinks are not followed. Add -L to operate on files
pointed to by symlinks instead of symlinks themselves (the -L makes
no difference if the operands are not symlinks).

These commands all take one argument. If the argument is absent, they return
false. More than one argument is a fatal error. See notes 1-3 in the
parent section.

is present file: Returns true if the file is present in the file
system (even if it is a broken symlink).

is -L present file: Returns true if the file is present in the file
system and is not a broken symlink.

is sym file: Returns true if the file is a symbolic link (symlink).

is -L sym file: Returns true if the file is a non-broken symlink, i.e.
a symlink that points (either directly or indirectly via other symlinks)
to a non-symlink file that is present in the file system.

is reg file: Returns true if file is a regular data file.

is -L reg file: Returns true if file is either a regular data file
or a symlink pointing (either directly or indirectly via other symlinks)
to a regular data file.

Other commands are available that work exactly like is reg and is -L reg
but test for other file types. To test for them, replace reg with one of:

  • dir for a directory
  • fifo for a named pipe (FIFO)
  • socket for a socket
  • blockspecial for a block special file
  • charspecial for a character special file
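For comparison, here is roughly how the `reg` and `sym` tests relate to the standard `test` utility, which always resolves symlinks for `-f`. This is a hedged sketch with illustrative names, not modernish's implementation:

```shell
# Hedged sketch relating 'is reg'/'is sym' to test(1):
is_sym()   { [ -h "$1" ]; }                  # a symlink itself, broken or not
is_reg()   { [ -f "$1" ] && ! [ -h "$1" ]; } # regular file, not via a symlink
is_L_reg() { [ -f "$1" ]; }                  # regular file after resolving symlinks
```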

File comparison tests

The following notes apply to these commands:

  • Symlinks are not resolved/followed by default. To operate on files pointed
    to by symlinks, add -L before the operator argument, e.g. is -L newer.
  • Omitting any argument is a fatal error, because no empty argument (removed or
    otherwise) would make sense for these commands.

is newer file1 file2: Compares file timestamps, returning true if file1
is newer than file2. Also returns true if file1 exists, but file2 does
not; this is consistent for all shells (unlike test file1 -nt file2).

is older file1 file2: Compares file timestamps, returning true if file1
is older than file2. Also returns true if file1 does not exist, but file2
does; this is consistent for all shells (unlike test file1 -ot file2).

is samefile file1 file2: Returns true if file1 and file2 are the same
file (hardlinks).

is onsamefs file1 file2: Returns true if file1 and file2 are on the
same file system. If any non-regular, non-directory files are specified, their
parent directory is tested instead of the file itself.
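The missing-file semantics documented above can be pinned down in plain POSIX shell; this hedged sketch (function name illustrative, and not how modernish implements it) uses the portable `find -prune -newer` idiom for the timestamp comparison itself:

```shell
# Hedged sketch of the documented 'is newer' semantics:
is_newer() {
    [ -e "$1" ] || return 1     # file1 missing: false
    [ -e "$2" ] || return 0     # file1 exists, file2 missing: true
    # find -prune -newer prints $1 only if its mtime is newer than $2's
    [ -n "$(find "$1" -prune -newer "$2")" ]
}
```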

File status tests

These always follow symlinks.

is nonempty file: Returns true if the file exists, is not a broken
symlink, and is not empty. Unlike [ -s file ], this also works
for directories, as long as you have read permission in them.

is setuid file: Returns true if the file has its set-user-ID flag set.

is setgid file: Returns true if the file has its set-group-ID flag set.

I/O tests

is onterminal FD: Returns true if file descriptor FD is associated
with a terminal. The FD may be a non-negative integer number or one of the
special identifiers stdin, stdout and stderr which are equivalent to
0, 1, and 2. For instance, is onterminal stdout returns true if commands
that write to standard output (FD 1), such as putln, would write to the
terminal, and false if the output is redirected to a file or pipeline.
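The FD-name mapping described above can be sketched in plain POSIX shell on top of `test -t` (function name illustrative, not the modernish API):

```shell
# Hedged plain-POSIX sketch of an 'is onterminal'-style check:
is_onterminal() {
    case $1 in
        ( stdin )  set -- 0 ;;
        ( stdout ) set -- 1 ;;
        ( stderr ) set -- 2 ;;
    esac
    [ -t "$1" ]     # true only if the FD is attached to a terminal
}
```

For example, `is_onterminal stdout` is false whenever standard output is piped or redirected to a file.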

File permission tests

Any symlinks given are resolved, as these tests would be meaningless
for a symlink itself.

can read file: True if the file’s permission bits indicate that you can read
the file - i.e., if an r bit is set and applies to your user.

can write file: True if the file’s permission bits indicate that you can
write to the file: for non-directories, if a w bit is set and applies to your
user; for directories, both w and x.

can exec file: True if the file’s type and permission bits indicate that
you can execute the file: for regular files, if an x bit is set and applies
to your user; for other file types, never.

can traverse file: True if the file is a directory and its permission bits
indicate that a path can traverse through it to reach its subdirectories: for
directories, if an x bit is set and applies to your user; for other file
types, never.

The stack

In modernish, every variable and shell option gets its own stack. Arbitrary
values/states can be pushed onto the stack and popped off it in reverse
order. For variables, both the value and the set/unset state are (re)stored.

Usage:

  • push [ --key=value ] item [ item … ]
  • pop [ --keepstatus ] [ --key=value ] item [ item … ]

where item is a valid portable variable name, a short-form shell option
(dash plus letter), or a long-form shell option (-o followed by an option
name, as two arguments).

Before pushing or popping anything, both functions check that all the given
arguments are valid, and pop checks that each item has a non-empty stack. This
allows pushing and popping groups of items with a check for the integrity of
the entire group. pop exits with status 0 if all items were popped
successfully, and with status 1 if one or more of the given items could not
be popped (and no action was taken at all).

The --key= option is an advanced feature that can help different modules
or functions to use the same variable stack safely. If a key is given to
push, then for each item, the given key value is stored along with the
variable’s value for that position in the stack. Subsequently, restoring
that value with pop will only succeed if the key option with the same key
value is given to the pop invocation. Similarly, popping a keyless value
only succeeds if no key is given to pop. If there is any key mismatch, no
changes are made and pop returns status 2. Note that this is
a robustness/convenience feature, not a security feature; the keys are not
hidden in any way.

If the --keepstatus option is given, pop will exit with the
exit status of the command executed immediately prior to calling pop. This
can avoid the need for awkward workarounds when restoring variables or shell
options at the end of a function. However, note that this makes failure to pop
(stack empty or key mismatch) a fatal error that kills the program, as pop
no longer has a way to communicate this through its exit status.
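To illustrate the concept only, a per-variable value stack that preserves the set/unset state can be kept in plain POSIX shell with eval and counter variables. This minimal sketch omits almost everything modernish's real push/pop do (argument validation, shell options, `--key`, whole-group integrity checks):

```shell
# Minimal illustrative value stack; NOT the modernish implementation.
push_var() {    # usage: push_var VARNAME
    eval "_d=\$((\${_stack_$1_depth:-0} + 1))"
    if eval "[ -n \"\${$1+set}\" ]"; then
        eval "_stack_$1_v$_d=\$$1 _stack_$1_s$_d=set"
    else
        eval "_stack_$1_s$_d=unset"    # remember the unset state too
    fi
    eval "_stack_$1_depth=$_d"
}
pop_var() {     # usage: pop_var VARNAME; fails (status 1) if stack empty
    eval "_d=\${_stack_$1_depth:-0}"
    [ "$_d" -gt 0 ] || return 1
    if eval "[ \"\$_stack_$1_s$_d\" = set ]"; then
        eval "$1=\$_stack_$1_v$_d"
    else
        unset "$1"                     # restore the unset state
    fi
    eval "unset _stack_$1_v$_d _stack_$1_s$_d; _stack_$1_depth=$((_d - 1))"
}
```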

The shell options stack

push and pop allow saving and restoring the state of any shell option
available to the set builtin. The precise shell options supported
(other than the ones guaranteed by POSIX) depend on
the shell modernish is running on.
To facilitate portability, nonexistent shell options are treated as unset.

Long-form shell options are matched to their equivalent short-form shell
options, if they exist. For instance, on all POSIX shells, -f is
equivalent to -o noglob, and push -o noglob followed by pop -f works
correctly. This also works for shell-specific short & long option
equivalents.

On shells with a dynamic no option name prefix, that is, on ksh, zsh and
yash (where, for example, noglob is the opposite of glob), the no
prefix is ignored, so something like push -o glob followed by pop -o noglob
does the right thing. But this depends on the shell and should never
be used in portable scripts.

The trap stack

Modernish can also make traps stack-based, so that each
program component or library module can set its own trap commands
without interfering with others. This functionality is provided
by the var/stack/trap module.

Modules

As modularity is one of modernish’s
design principles,
much of its essential functionality is provided in the form of loadable
modules, so the core library is kept lean. Modules are organised
hierarchically, with names such as safe, var/loop and sys/cmd/harden. The
use command loads and initialises a module or a combined directory of modules.

Internally, modules exist in files with the name extension .mm in
subdirectories of lib/modernish/mdl – for example, the module
var/stack/trap corresponds to the file lib/modernish/mdl/var/stack/trap.mm.

Usage:

  • use modulename [ argument … ]
  • use [ -q | -e ] modulename
  • use -l

The first form loads and initialises a module. All arguments, including the
module name, are passed on to the dot script unmodified, so modules know
their own name and can implement option parsing to influence their
initialisation. See also
Two basic forms of a modernish program
for information on how to use modules in portable-form scripts.

In the second form, the -q option queries if a module is loaded, and the -e
option queries if a module exists. use returns status 0 for yes, 1 for no,
and 2 if the module name is invalid.

The -l option lists all currently loaded modules in the order in which
they were originally loaded. Just add | sort for alphabetical order.

If a directory of modules, such as sys/cmd or even just sys, is given as the
modulename, then all the modules in that directory and any subdirectories are
loaded recursively. In this case, passing extra arguments is a fatal error.

If a module file X.mm exists along with a directory X, resolving to the
same modulename, then use will load the X.mm module file without
automatically loading any modules in the X directory, because it is expected
that X.mm handles the submodules in X manually. (This is currently the case
for var/loop which auto-loads submodules containing loop types on first use).

The complete lib/modernish/mdl directory path, which depends on where
modernish is installed, is stored in the system constant $MSH_MDL.

The following subchapters document the modules that come with modernish.

use safe

The safe module sets the ‘safe mode’ for the shell. It removes most of the
need to quote variables, parameter expansions, command substitutions, or glob
patterns. It uses shell settings and modernish library functionality to secure
and demystify split and glob mechanisms. This creates a new and safer way of
shell script programming, essentially building a new shell language dialect
while still running on all POSIX-compliant shells.

Why the safe mode?

One of the most common headaches with shell scripting is caused by a
fundamental flaw in the shell as a scripting language: constantly
active field splitting
(a.k.a. word splitting) and pathname expansion
(a.k.a. globbing). To cope with this situation, it is hammered into
programmers of shell scripts to be absolutely paranoid about properly
quoting nearly everything, including
variable and parameter expansions, command substitutions, and patterns passed
to commands like find.

These mechanisms were designed for interactive command line usage, where they
do come in very handy. But when the shell language is used as a programming
language, splitting and globbing often ends up being applied unexpectedly to
unquoted expansions and command substitutions, helping cause thousands of
buggy, brittle, or outright dangerous shell scripts.

One could blame the programmer for forgetting to quote an expansion properly,
or one could blame a pitfall-ridden scripting language design where hammering
punctilious and counterintuitive habits into casual shell script programmers is
necessary. Modernish does the latter, then fixes it.

How the safe mode works

Every POSIX shell comes with a little-used ability to disable global field
splitting and pathname expansion: IFS=''; set -f. An empty IFS variable
disables split; the -f (or -o noglob) shell option disables pathname
expansion. The safe mode sets these, and two others (see below).

The reason these safer settings are hardly ever used is that they are not
practical to use with the standard shell language. For instance, both
for textfile in *.txt and for item in $(some command), the latter of
which both (!) field-splits and pathname-expands the output of a command,
break under these settings.
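The effect of those two settings is easy to demonstrate in any POSIX shell: with IFS emptied and globbing disabled, an unquoted expansion stays one field and keeps its wildcard character.

```shell
# The safe-mode core settings in action.
demo() {
    IFS=''; set -f              # disable field splitting and globbing
    var='two words and a *'
    set -- $var                 # deliberately unquoted
    printf '%s\n' "$#" "$1"     # one argument, wildcard left intact
}
demo
```

Without those settings, the same `set -- $var` would split the value into four or more fields and expand the `*` against the current directory.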

However, that is where modernish comes in. It introduces several powerful
new loop constructs, as well as arbitrary code
blocks with local settings, each of which
has straightforward, intuitive operators for safely applying field splitting
or pathname expansion – to specific command arguments only. By default,
they are not both applied to the arguments, which is much safer. And your
script code as a whole is kept safe from them at all times.

With global field splitting and pathname expansion removed, a third issue
still affects the safe mode: the shell’s empty removal mechanism. If the
value of an unquoted expansion like $var is empty, it will not expand to
an empty argument, but will be removed altogether, as if it were never
there. This behaviour cannot be disabled.

Thankfully, the vast majority of shell and Un*x commands order their arguments
in a way that is actually designed with empty removal in mind, making it a
good thing. For instance, when doing ls $option some_dir, if $option is
-l the listing will be long-format, and if it is empty it will be removed, which
is the desired behaviour. (An empty argument there would cause an error.)
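Empty removal is easy to observe directly: an unquoted empty expansion vanishes entirely, while a quoted one becomes an empty argument.

```shell
# Empty removal in action, with the safe-mode settings active.
IFS=''; set -f
option=''
set -- $option                  # unquoted empty expansion is removed
printf 'unquoted: %s\n' "$#"    # prints: unquoted: 0
set -- "$option"                # quoted: one empty argument survives
printf 'quoted: %s\n' "$#"      # prints: quoted: 1
```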

However, one command that is used in almost all shell scripts, test/[,
is completely unable to cope with empty removal due to its idiosyncratic
and counterintuitive syntax. Potentially empty operands come before options,
so operands removed as empty expansions cause errors or, worse, false
positives. Thus, the safe mode does not remove the need for paranoid
quoting of expansions used with test/[ commands. Modernish fixes
this issue by deprecating test/[ completely and offering
a safe command design
to use instead, which correctly deals with empty removal.

With the ‘safe mode’ shell settings, plus the safe, explicit and readable
split and glob operators and test/[ replacements, the only quoting
requirements left are:

  1. a very occasional need to stop empty removal from happening;
  2. to quote "$@" and "$*" until shell bugs are fixed (see notes below).

In addition to the above, the safe mode also sets these shell options:

  • set -C (set -o noclobber) to prevent accidentally overwriting files using
    output redirection. To force overwrite, use >| instead of >.
  • set -u (set -o nounset) to make it an error to use unset (that is,
    uninitialised) variables by default. You’ll notice this will catch many
    typos before they cause you hard-to-trace problems. To bypass the check
    for a specific variable, use ${var-} instead of $var (be careful).
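Both options, along with their escape hatches, can be demonstrated in subshells so the failing cases are contained:

```shell
# set -C (noclobber) and set -u (nounset) in action.
d=$(mktemp -d)
: > "$d/log"                                    # create an existing file
( set -C; echo hi > "$d/log" ) 2>/dev/null \
    || echo 'blocked by -C'                     # plain > refuses to overwrite
( set -C; echo hi >| "$d/log" ) \
    && echo 'forced with >|'                    # >| overrides noclobber
( set -u; : "$not_set_anywhere" ) 2>/dev/null \
    || echo 'caught by -u'                      # unset variable is an error
( set -u; : "${not_set_anywhere-}" ) \
    && echo 'bypassed with ${var-}'             # ${var-} disables the check
rm -rf "$d"
```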

Important notes for safe mode

  • The safe mode is not compatible with existing conventional shell scripts,
    written in what we could now call the ‘legacy mode’. Essentially, the safe
    mode is a new way of shell script programming. That is why it is not enabled
    by default, but activated by loading the safe module. It is highly
    recommended that new modernish scripts start out with use safe.
  • The shell applies entirely different quoting rules to string matching glob
    patterns within case constructs. The safe mode changes nothing here.
  • Due to shell bugs ID’ed as BUG_PP_*, the positional
    parameter expansions $@ and $* should still always be quoted. As of
    late 2018, these bugs have been fixed in the latest or upcoming release
    versions of all
    supported shells.
    But, until buggy versions fall out of use
    and modernish no longer supports any BUG_PP_* shell bugs, quoting "$@"
    and "$*" remains mandatory even in safe mode (unless you know with
    certainty that your script will be used on a shell with none of these bugs).
  • The behaviour of "$*" changes in safe mode. It uses the first character
    of $IFS as the separator for combining all positional parameters into
    one string. Since IFS is emptied in safe mode, there is no separator,
    so it will string them together unseparated. You can use something like
    push IFS; IFS=' '; var="$*"; pop IFS
    or LOCAL IFS=' '; BEGIN var="$*"; END
    to use the space character as a separator.
    (If you’re outputting the positional parameters, note that the
    put
    command always separates its arguments by spaces, so you can
    safely pass it multiple arguments with "$@" instead.)
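The change in "$*" behaviour, and the push/pop idiom for choosing a separator, can be shown with plain-shell stand-ins for push and pop (a simple save/restore variable is used here in place of the real stack):

```shell
# "$*" with an empty IFS versus an explicitly chosen separator.
demo() {
    set -- alpha beta gamma
    IFS=''                      # as in safe mode
    joined=$*
    printf '%s\n' "$joined"     # unseparated: alphabetagamma
    _saveIFS=$IFS; IFS=' '      # stand-in for: push IFS; IFS=' '
    joined=$*
    IFS=$_saveIFS               # stand-in for: pop IFS
    printf '%s\n' "$joined"     # space-separated: alpha beta gamma
}
demo
```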

Extra options for the safe mode

Usage: use safe [ -k | -K ] [ -i ]

The -k and -K module options install an extra handler that
reliably kills the program
if it tries to execute a command that is not found, on shells that have the
ability to catch and handle ‘command not found’ errors (currently bash, yash,
and zsh). This helps catch typos, forgetting to load a module, etc., and stops
your program from continuing in an inconsistent state and potentially causing
damage. The MSH_NOT_FOUND_OK variable may be set to temporarily disable this
check. The uppercase -K module option aborts the program on shells that
cannot handle ‘command not found’ errors (so should not be used for portable
scripts), whereas the lowercase -k variant is ignored on such shells.

If the -i option is given, or the shell is interactive, two extra one-letter
functions are loaded, s and g. These are pre-command modifiers for use when
split and glob are globally disabled; they allow running a single command with
local split and glob applied to that command’s arguments only. They also have
some options designed to manipulate, examine, save, restore, and generally
experiment with the global split and glob state on interactive shells. Type
s --help and g --help for more information. In general, the safe mode is
designed for scripts and is not recommended for interactive shells.

use var/loop

The var/loop module provides an innovative, robust and extensible
shell loop construct. Several powerful loop types are provided, while
advanced shell programmers may find it easy and fun to
create their own.
This construct is also ideal for the
safe mode:
the for, select and find loop types allow you to selectively
apply field splitting and/or pathname expansion to specific arguments
without subjecting a single line of your code to them.

The basic form is a bit different from native shell loops. Note the caps:
LOOP looptype arguments; DO
your commands here
DONE

The familiar do...done block syntax cannot be used because the shell
will not allow modernish to add its own functionality to it. The
DO...DONE block does behave in the same way as do...done: you can
append redirections at the end, pipe commands into a loop, etc. as usual.
The break and continue shell builtin commands also work as normal.

Remember: using lowercase do...done with modernish LOOP will
cause the shell to throw a misleading syntax error.
So will using uppercase
DO...DONE with the shell’s native loops. To help you remember to use the
uppercase variants for modernish loops, the LOOP keyword itself is also in
capitals.

Loops exist in submodules of var/loop named after the loop type; for
instance, the find loop lives in the var/loop/find module. However, the
core var/loop module will automatically load a loop type’s module when
that loop is first used, so use-ing individual loop submodules at your
script’s startup time is optional.

The LOOP block internally uses file descriptor 8 to do
its thing.
If your script happens to use FD 8 for other purposes, you should
know that FD 8 is made local to each loop block, and always appears
initially closed within DO...DONE.

Simple repeat loop

This simply iterates the loop the number of times indicated. Before the first
iteration, the argument is evaluated as a shell integer arithmetic expression
as in let
and its value used as the number of iterations.

```sh
LOOP repeat 3; DO
	putln "This line is repeated 3 times."
DONE
```

BASIC-style arithmetic for loop

This is a slightly enhanced version of the
FOR loop in BASIC.
It is more versatile than the repeat loop but still very easy to use.

LOOP for varname=initial to limit [ step increment ]; DO
some commands
DONE

To count from 1 to 20 in steps of 2:

```sh
LOOP for i=1 to 20 step 2; DO
	putln "$i"
DONE
```

Note the varname=initial needs to be one argument as in a shell
assignment (so no spaces around the =).

If “step increment” is omitted, increment defaults to 1 if limit is
equal to or greater than initial, or to -1 if limit is less than
initial (so counting backwards ‘just works’).

Technically precise description: On entry, the initial, limit and
increment values are evaluated once as shell arithmetic expressions as in
let,
the value of initial is assigned to varname, and the loop iterates.
Before every subsequent iteration, the value of increment (as determined
on the first iteration) is added to the value of varname, then the limit
expression is re-evaluated; as long as the current value of varname is
less (if increment is non-negative) or greater (if increment is
negative) than or equal to the current value of limit, the loop reiterates.
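The description above, sketched as a plain POSIX while loop for the non-negative-increment case (modernish's LOOP for also handles counting down, and re-evaluates limit as an arithmetic expression rather than a fixed value):

```shell
# Plain-shell sketch of the BASIC-style counting loop.
count() {                       # usage: count initial limit step
    i=$1
    while [ "$i" -le "$2" ]; do
        printf '%s\n' "$i"
        i=$((i + $3))           # add the increment before re-checking limit
    done
}
count 1 20 2                    # prints 1 3 5 ... 19, one per line
```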

C-style arithmetic for loop

A C-style for loop akin to for (( )) in ksh93, bash and zsh is now
available on all POSIX-compliant shells, with a slightly different syntax.
The one loop argument contains three arithmetic expressions (as in
let),
separated by semicolons within that argument. The first is only evaluated
before the first iteration, so is typically used to assign an initial value.
The second is evaluated before each iteration to check whether to continue
the loop, so it typically contains some comparison operator. The third is
evaluated before the second and further iterations, and typically increases
or decreases a value. For example, to count from 1 to 10:

```sh
LOOP for "i=1; i<=10; i+=1"; DO
	putln "$i"
DONE
```

However, using complex expressions allows doing much more powerful things.
Any or all of the three expressions may also be left empty (with their
separating ; character remaining). If the second expression is empty, it
defaults to 1, creating an infinite loop.

(Note that ++i and i++ can only be used on shells with
ARITHPP,
but i+=1 or i=i+1 can be used on all POSIX-compliant shells.)
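The three-expression structure can be spelled out as a plain POSIX while loop, using only portable arithmetic (i = i + 1 instead of ++ or +=):

```shell
# Plain-shell equivalent of LOOP for "i=1; i<=10; i+=1".
i=1                                     # first expression: run once
while [ "$((i <= 10))" -ne 0 ]; do      # second: checked before each iteration
    printf '%s\n' "$i"
    : "$((i = i + 1))"                  # third: advance the counter
done
```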

Enumerative for/select loop with safe split/glob

The enumerative for and select loop types mirror those already present in
native shell implementations. However, the modernish versions provide safe
field splitting and globbing (pathname expansion) functionality that can be
used without globally enabling split or glob for any of your code – ideal
for the safe mode. They also add a unique operator
for processing text in fixed-size slices. The select loop type brings
select functionality to all POSIX shells and not just ksh, zsh and bash.

Usage:

LOOP [ for | select ] [ operators ] varname in argument;
DO commands ; DONE

Simple usage example:

```sh
LOOP select --glob textfile in *.txt; DO
	putln "You chose text file $textfile."
DONE
```

If the loop type is for, the loop iterates once for each argument, storing
it in the variable named varname.

If the loop type is select, the loop presents before each iteration a
numbered menu that allows the user to select one of the arguments. The prompt
from the PS3 variable is displayed and a reply read from standard input. The
literal reply is stored in the REPLY variable. If the reply was a number
corresponding to an argument in the menu, that argument is stored in the
variable named varname. Then the loop iterates. If the user enters ^D (end of
file), REPLY is cleared and the loop breaks with an exit status of 1. (To
break the menu loop under other conditions, use the break command.)

The following operators are supported. Note that the split and glob
operators are only for use in the safe mode.

  • One of --split or --split=characters. This operator safely applies
    the shell’s field splitting mechanism to the arguments given. The simple
    --split operator applies the shell’s default field splitting by space,
    tab, and newline. If you supply one or more of your own characters to
    split by, each of these characters will be taken as a field separator if
    it is whitespace, or field terminator if it is non-whitespace. (Note that
    shells with QRK_IFSFINAL treat both whitespace and
    non-whitespace characters as separators.)
  • One of --glob or --fglob. These operators safely apply shell pathname
    expansion (globbing) to the arguments given. Each argument is taken as
    a pattern, whether or not it contains any wildcard characters. For any
    resulting pathname that starts with - or + or is identical to ! or
    (, ./ is prefixed to keep various commands from misparsing it as an
    option or operand. Non-matching patterns are treated as follows:
    • --glob: Any non-matching patterns are quietly removed. If none match,
      the loop will not iterate but break with exit status 103.
    • --fglob: All patterns must match. Any nonexistent path terminates the
      program. Use this if your program would not work after a non-match.
  • --base=string. This operator prefixes the given string to each of the
    arguments, after first applying field splitting and/or pathname expansion
    if specified.
    If --glob or --fglob are given, then the string is used as a base
    directory path for pathname expansion, without expanding any wildcard
    characters in that base directory path itself.
    If such a base directory can’t be entered, then, if --glob was given, the loop
    breaks with status 98, or if --fglob was given, the program terminates.
  • One of --slice or --slice=number. This operator divides the
    arguments in slices of up to number characters. The default slice size
    is 1 character, allowing for easy character-by-character processing.
    (Note that shells with WRN_MULTIBYTE will
    not slice multi-byte characters correctly.)

If multiple operators are given, their mechanisms are applied in the
following order: split, glob, base, slice.
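Fixed-size slicing can be sketched in plain POSIX shell with parameter expansion alone. Like shells with WRN_MULTIBYTE, this sketch does not guarantee correct handling of multi-byte characters (what `?` matches varies by shell and locale); the function name is illustrative:

```shell
# Plain-POSIX sketch of --slice-style chunking.
slice() {                       # usage: slice string size
    _str=$1
    while [ -n "$_str" ]; do
        _chunk= _i=0
        while [ -n "$_str" ] && [ "$_i" -lt "$2" ]; do
            _rest=${_str#?}                 # all but the first character
            _chunk=$_chunk${_str%"$_rest"}  # append that first character
            _str=$_rest _i=$((_i + 1))
        done
        printf '%s\n' "$_chunk"
    done
}
slice abcdefg 3                 # prints abc, def, g, one per line
```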

The find loop

This powerful loop type turns your local POSIX-compliant
find utility
into a shell loop, safely integrating both find
and xargs functionality into the POSIX shell. The infamous
pitfalls and limitations
of using find and xargs as external commands are gone, as all
the results from find are readily available to your main shell
script. Any “dangerous” characters in file names (including
whitespace and even newlines) “just work”, especially if the
safe mode
is also active. This gives you the flexibility to use either the find
expression syntax, or shell commands (including your own shell functions), or
some combination of both, to decide whether and how to handle each file found.

Usage:

LOOP find [ options ] varname [ in path … ]
[ find-expression ] ; DO commands ; DONE

LOOP find [ options ] --xargs[=arrayname] [ in path … ]
[ find-expression ] ; DO commands ; DONE

The loop recursively walks down the directory tree for each path given.
For each file encountered, it uses the find-expression to decide
whether to iterate the loop with the path to the file stored in the
variable referenced by varname. The find-expression is a standard
find
utility expression except as described below.

Any number of paths to search may be specified after the in keyword.
By default, a nonexistent path is a fatal error.
The entire in clause may be omitted, in which case it defaults to in .,
so the current working directory will be searched. Any argument that starts
with a -, or is identical to ! or (, indicates the end of the paths
and the beginning of the find-expression; if you need to explicitly
specify a path with such a name, prefix ./ to it.

Except for syntax errors, any errors or warnings issued by find are
considered non-fatal and will cause the exit status of the loop to be
non-zero, so your script has the opportunity to handle the exception.
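For contrast, without modernish the standard way to get find results into shell code without mangling odd file names is to have find run a child shell via -exec ... {} +. This sketch shows that pattern and its key limitation, which LOOP find removes: the loop body runs in a child shell, so it cannot set variables in the main script.

```shell
# Standard find + child-shell pattern; names with spaces survive
# because they are passed as arguments, never parsed from text.
d=$(mktemp -d)
: > "$d/a.txt"; : > "$d/b b.txt"; : > "$d/c.log"
find "$d" -name '*.txt' -type f -exec sh -c '
    for file in "$@"; do
        printf "found: %s\n" "$file"   # runs in a child shell:
    done                               # cannot set main-shell variables
' sh {} +
rm -rf "$d"
```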

Available options
  • Any single-letter options supported by your local find utility. Note that
    POSIX specifies
    -H and -L only, so portable scripts should only use these.
    Options that require arguments (-f on BSD find) are not supported.
  • --xargs. This operator is specified instead of the varname; it is a
    syntax error to have both. Instead of one iteration per found item, as many
    items as possible per iteration are stored into the positional parameters
    (PPs), so your program can access them in the usual way (using "$@" and
    friends). Note that --xargs therefore overwrites the current PPs (however,
    a shell function or LOCAL block will give
    you local PPs). Modernish clears the PPs upon completion of the loop, but if
    the loop is exited prematurely (such as by break), the last chunk survives.
    • On shells with the KSHARRAY
      capability, an
      extra variant is available: --xargs=arrayname which uses the named
      array instead of the PPs. It otherwise works identically.
  • --try. If this option is specified, then if one of the primaries used in
    the find-expression is not supported by either the find utility used by
    the loop or by modernish itself, LOOP find will not throw a
    fatal error
    but will instead quietly abort the loop without iterating it, set the loop’s
    exit status to 128, and leave the invalid primary in the REPLY variable.
    (Expression errors other than ‘unknown primary’ remain fatal errors.)
  • One of --split or --split=characters. This operator, which is only
    accepted in the safe mode, safely applies the
    shell’s field splitting mechanism to the path name(s) given (but not
    to any patterns in the
    find-expression, which are passed on to the find
    utility as given)
    . The simple --split operator applies the shell’s default
    field splitting by space, tab, and newline. Alternatively, you can supply
    one or more characters to split by. If any pathname resulting from the
    split starts with - or + or is identical to ! or (, ./ is prefixed.
  • One of --glob or --fglob. These operators are only accepted in the
    safe mode. They safely apply shell pathname
    expansion (globbing) to the path name(s) given (but not to any
    patterns in the find-expression, which are passed on to the find
    utility as given). All path names are taken as patterns, whether or
    not they contain any wildcard characters. If any pathname resulting
    from the expansion starts with - or + or is identical to ! or (,
    ./ is prefixed. Non-matching patterns are treated as follows:
    • --glob: Any pattern not matching an existing path will output a
      warning to standard error and set the loop’s exit status to 103 upon
      normal completion, even if other existing paths are processed
      successfully. If none match, the loop will not iterate.
    • --fglob: Any pattern not matching an existing path is a fatal error.
  • --base=basedirectory. This operator prefixes the given basedirectory
    to each of the path names (and thus to each path found by find), after
    first applying field splitting and/or pathname expansion if specified.
    If --glob or --fglob are given, then wildcard characters are only
    expanded in the path names and not in the prefixed basedirectory.
    If the basedirectory can’t be entered, then either the loop breaks with
    status 98, or if --fglob was given, the program terminates.
Available find-expression operands

LOOP find can use all expression operands supported by your local find
utility; see its manual page. However, portable scripts should use only
operands specified by POSIX
along with the modernish additions described below.

The modernish -iterate expression primary evaluates as true and causes the
loop to iterate, executing your commands for each matching file. It may be
used any number of times in the find-expression to start a corresponding
series of loop iterations. If it is not given, the loop acts as if the entire
find-expression is enclosed in parentheses with -iterate appended. If the
entire find-expression is omitted, it defaults to -iterate.

The modernish -ask primary asks confirmation of the user. The text of the
prompt may be specified in one optional argument (which cannot start with -
or be equal to ! or (). Any occurrences of the characters {} within the
prompt text are replaced with the current pathname. If not specified, the
default prompt is: "{}"? If the answer is affirmative (y or Y in the
POSIX locale), -ask yields true, otherwise false. This can be used to make
any part of the expression conditional upon user input, and (unlike commands in
the shell loop body) is capable of influencing directory traversal mid-run.

The standard -exec and -ok primaries are integrated into the main shell
environment. When used with LOOP find, they can call a shell builtin command
or your own shell function directly in the main shell (no subshell). The called command’s exit
status is used in the find expression as a true/false value capable of
influencing directory traversal (for example, when combined with -prune),
just as if it were an external command -exec’ed with the standard utility.

Some familiar, easy-to-use but non-standard find operands from GNU and/or
BSD may be used with LOOP find on all systems. Before invoking the find
utility, modernish translates them internally to portable equivalents.
The following expression operands are made portable:

  • The -or, -and and -not operators: same as -o, -a, !.
  • The -true and -false primaries, which always yield true/false.
  • The BSD-style -depth n primary, e.g. -depth +4 yields true on depth
    greater than 4 (minimum 5), -depth -4 yields true on depth less than 4
    (maximum 3), and -depth 4 yields true on a depth of exactly 4.
  • The GNU-style -mindepth and -maxdepth global options.
    Unlike BSD -depth, these GNU-isms are pseudo-primaries that
    always yield true and affect the entire LOOP find operation.

Expression primaries that write output (-print and friends) may be used for
debugging or logging the loop. Their output is redirected to standard error.

Picking a find utility

Upon initialisation, the var/loop/find module searches for a POSIX-compliant
find utility under various names in $DEFPATH and then in $PATH. To see a
trace of the full command lines of utility invocations when the loop runs, set
the _loop_DEBUG variable to any value.

For debugging or system-specific usage, it is possible to use a certain find
utility in preference to any others on the system. To do this, add an argument
to a use var/loop/find command before the first use of the loop. For example:

  • use var/loop/find bsdfind (prefer utility by this name)
  • use var/loop/find /opt/local/bin (look for a utility here first)
  • use var/loop/find /opt/local/bin/gfind (try this one first)
Compatibility mode for obsolete find utilities

Some systems come with obsolete or broken find utilities that don’t fully
support -exec ... {} + aggregating functionality as specified by POSIX.
Normally, this is a fatal error, but passing the -b/-B option to the
use command, e.g. use var/loop/find -b, enables a compatibility mode
that tolerates this defect. If no compliant find is found, then an obsolete
or broken find is used as a last resort, a warning is printed to standard
error, and the variable _loop_find_broken is set. The -B option is
equivalent to -b but does not print a warning. Loop performance may suffer as
modernish falls back to the older -exec ... {} \; form, which is very inefficient.

Scripts using this compatibility mode should handle their logic using shell
code in the loop body as much as possible (after DO) and use only simple
find expressions (before DO), as obsolete utilities are often buggy and
breakage is likely if complex expressions or advanced features are used.

find loop usage examples

Simple example script: without the safe mode, the *.txt pattern
must be quoted to prevent it from being expanded by the shell.

    . modernish
    use var/loop
    LOOP find TextFile in ~/Documents -name '*.txt'
    DO
        putln "Found my text file: $TextFile"
    DONE

Example script with safe mode: the --glob option
expands the patterns of the in clause, but not the expression – so it
is not necessary to quote any pattern.

    . modernish
    use safe
    use var/loop
    LOOP find --glob lsProg in /*bin /*/*bin -type f -name ls*
    DO
        putln "This command may list something: $lsProg"
    DONE

Example use of the modernish -ask primary: ask the user if they want to
descend into each directory found. The shell loop body could skip unwanted
results, but cannot physically influence directory traversal, so skipping large
directories would take long. A find expression can prevent directory
traversal using the standard -prune primary, which can be combined with
-ask, so that unwanted directories never iterate the loop in the first place.

    . modernish
    use safe
    use var/loop
    LOOP find file in ~/Documents \
        -type d \( -ask 'Descend into "{}" directory?' -or -prune \) \
        -or -iterate
    DO
        put "File found: "
        ls -li $file
    DONE

Creating your own loop

The modernish loop construct is extensible. To define a new loop type, you
only need to define a shell function called _loopgen_type where type
is the loop type. This function, called the loop iteration generator, is
expected to output lines of text to file descriptor 8, containing properly
shell-quoted
iteration commands for the shell to run, one line per iteration.

The internal commands expanded from LOOP, DO and DONE (which are
defined as aliases) launch that loop iteration generator function in the
background with safe mode enabled, while causing
the main shell to read lines from that background process through a pipe,
evaling each line as a command before iterating the loop. As long as that
iteration command finishes with an exit status of zero, the loop keeps
iterating. If it has a nonzero exit status or if there are no more commands
to read, iteration terminates and execution continues beyond the loop.

Instead of the normal internal namespace
which is considered off-limits for modernish scripts, var/loop and its
submodules use a _loop_* internal namespace for variables, which is also
for use by user-implemented loop iteration generator functions.

The above is just the general principle. For the details, study the comments
and the code in lib/modernish/mdl/var/loop.mm and the loop generators in
lib/modernish/mdl/var/loop/*.mm.

use var/local

This module defines a new LOCAL…BEGIN…END shell code block
construct with local variables, local positional parameters and local shell
options. The local positional parameters can be filled using safe field
splitting and pathname expansion operators similar to those in the LOOP
construct described above.

Usage: LOCAL [ localitem | operator … ] [ -- [ word … ] ] ;
BEGIN commands ; END

The commands are executed once, with the specified localitems applied.
Each localitem can be:

  • A variable name with or without a = immediately followed by a value.
    This renders that variable local to the block, initially either unsetting
    it or assigning the value, which may be empty.
  • A shell option letter immediately preceded by a - or + sign. This
    locally turns that shell option on or off, respectively. This follows the
    counterintuitive syntax of set. Long-form shell options like -o
    optionname and +o optionname are also supported. It depends on the
    shell what options are supported. Specifying a nonexistent option is a
    fatal error. Use thisshellhas to check
    for a non-POSIX option’s existence on the current shell before using it.

Modernish implements LOCAL blocks as one-time shell functions that use
the stack
to save and restore variables and settings. So the return command exits the
block, causing the global variables and settings to be restored and resuming
execution at the point immediately following END. Like any shell function, a
LOCAL block exits with the exit status of the last command executed within
it, or with the status passed on by or given as an argument to return.

The positional parameters ($@, $1, etc.) are always local to the block, but
a copy is inherited from outside the block by default. Any changes to the
positional parameters made within the block will be discarded upon exiting it.

However, if a double-dash -- argument is given in the LOCAL command line,
the positional parameters outside the block are ignored and the set of words
after -- (which may be empty) becomes the positional parameters instead.

These words can be modified prior to entering the LOCAL block using the
following operators. The safe glob and split operators are only accepted in
the safe mode. The operators are:

  • One of --split or --split=characters. This operator safely applies
    the shell’s field splitting mechanism to the words given. The simple
    --split operator applies the shell’s default field splitting by space,
    tab, and newline. If you supply one or more of your own characters to
    split by, each of these characters will be taken as a field separator if
    it is whitespace, or field terminator if it is non-whitespace. (Note that
    shells with QRK_IFSFINAL treat both whitespace and
    non-whitespace characters as separators.)
  • One of --glob or --fglob. These operators safely apply shell pathname
    expansion (globbing) to the words given. Each word is taken as a pattern,
    whether or not it contains any wildcard characters. For any resulting
    pathname that starts with - or + or is identical to ! or (, ./
    is prefixed to keep various commands from misparsing it as an option
    or operand. Non-matching patterns are treated as follows:
    • --glob: Any non-matching patterns are quietly removed.
    • --fglob: All patterns must match. Any nonexistent path terminates the
      program. Use this if your program would not work after a non-match.
  • --base=string. This operator prefixes the given string to each of the
    words, after first applying field splitting and/or pathname expansion
    if specified.
    If --glob or --fglob are given, then the string is used as a base
    directory path for pathname expansion, without expanding any wildcard
    characters in that base directory path itself.
    If such base directory can’t be entered, then if --glob was given, all
    words are removed, or if --fglob was given, the program terminates.
  • One of --slice or --slice=number. This operator divides the
    words in slices of up to number characters. The default slice size
    is 1 character, allowing for easy character-by-character processing.
    (Note that shells with WRN_MULTIBYTE will
    not slice multi-byte characters correctly.)

If multiple operators are given, their mechanisms are applied in the
following order: split, glob, base, slice.

Important var/local usage notes

  • Due to the limitations of aliases and shell reserved words, LOCAL has
    to use its own BEGIN…END block instead of the shell’s do…done.
    Using the latter results in a misleading shell syntax error.
  • LOCAL blocks do not mix well with use of the shell capability
    LOCALVARS
    (shell-native functionality for local variables), especially not on shells
    with QRK_LOCALUNS or QRK_LOCALUNS2. Using both with the same variables
    causes unpredictable behaviour, depending on the shell.
  • Warning! Never use break or continue within a LOCAL block to
    resume or break from enclosing loops outside the block! Shells with
    QRK_BCDANGER allow this, preventing END from
    restoring the global settings and corrupting the stack; shells without
    this quirk will throw an error if you try this. A proper way to do what
    you want is to exit the block with a nonzero status using something like
    return 1, then append something like || break or || continue to
    END. Note that this caveat only applies when crossing BEGIN…END
    boundaries. Using continue and break to continue or break loops
    entirely within the block is fine.

use var/arith

These shortcut functions are alternatives for using
let.

Arithmetic operator shortcuts

inc, dec, mult, div, mod: simple integer arithmetic shortcuts. The first
argument is a variable name. The optional second argument is an
arithmetic expression, but a sane default value is assumed (1 for inc
and dec, 2 for mult and div, 256 for mod). For instance, inc X is
equivalent to X=$((X+1)) and mult X Y-2 is equivalent to X=$((X*(Y-2))).

ndiv is like div but with correct rounding down for negative numbers.
Standard shell integer division simply chops off any digits after the
decimal point, which has the effect of rounding down for positive numbers
and rounding up for negative numbers. ndiv consistently rounds down.

Arithmetic comparison shortcuts

These have the same name as their test/[ option equivalents. Unlike
with test, the arguments are shell integer arithmetic expressions, which can be
anything from simple numbers to complex expressions. As with $(( )),
variable names are expanded to their values even without the $.

    Function            Returns successfully if:
    eq <expr> <expr>    the two expressions evaluate to the same number
    ne <expr> <expr>    the two expressions evaluate to different numbers
    lt <expr> <expr>    the 1st expr evaluates to a smaller number than the 2nd
    le <expr> <expr>    the 1st expr evaluates to a number smaller than or equal to the 2nd
    gt <expr> <expr>    the 1st expr evaluates to a greater number than the 2nd
    ge <expr> <expr>    the 1st expr evaluates to a number greater than or equal to the 2nd

use var/assign

This module is provided to solve a common POSIX shell language annoyance: in a
normal shell variable assignment, only literal variable names are accepted, so
it is impossible to use a variable whose name is stored in another variable.
The only way around this is to use eval which is too difficult to use safely.
Instead, you can now use the assign command.

Usage: assign [ [ +r ] variable=value … ] | [ -r variable=variable2 … ] …

assign safely processes assignment-arguments in the same form as customarily
given to the readonly and export commands, but it only assigns values to
variables without setting any attributes. Each argument is grammatically an
ordinary shell word, so any part or all of it may result from an expansion. The
absence of a = character in any argument is a fatal error. The text preceding
the first = is taken as the variable name in which to store the value; an
invalid variable name is a fatal error. No whitespace is accepted before the
= and any whitespace after the = is part of the value to be assigned.

The -r (reference) option causes the part to the right of the = to be
taken as a second variable name variable2, and its value is assigned to
variable instead. +r turns this option back off.

Examples: Each of the lines below assigns the value ‘hello world’ to the
variable greeting.

    var=greeting; assign $var='hello world'
    var=greeting; assign "$var=hello world"
    tag='greeting=hello world'; assign "$tag"
    var=greeting; gvar=myinput; myinput='hello world'; assign -r $var=$gvar

use var/readf

readf reads arbitrary data from standard input into a variable until end
of file, converting it into a format suitable for passing to the
printf
utility. For example, readf var <foo; printf "$var" >bar will copy foo to
bar. Thus, readf allows storing both text and binary files into shell
variables in a textual format suitable for manipulation with standard shell
facilities.

All non-printable, non-ASCII characters are converted to printf octal or
one-letter escape codes, except newlines. Not encoding newline characters
allows for better processing by line-based utilities such as grep, sed,
awk, etc. However, if the file ends in a newline, that final newline is
encoded to \n to protect it from being stripped by command substitutions.

Usage: readf [ -h ] varname

The -h option disables conversion of high-byte characters (accented letters,
non-Latin scripts). Do not use for binary files; this is only guaranteed to
work for text files in an encoding compatible with the current locale.

Caveats:

  • Best for small-ish files. The encoded file is stored in memory (a shell
    variable). For a binary file, encoding in printf format typically
    about doubles the size, though it could be up to four times as large.
  • If the shell executing your program does not have printf as a builtin
    command, the external printf command will fail if the encoded file
    size exceeds the maximum length of arguments to external commands
    (getconf ARG_MAX will obtain this limit for your system). Shell builtin
    commands do not have this limit. Check for a printf builtin using
    thisshellhas if you need to be sure,
    and always harden
    printf!

use var/shellquote

This module provides an efficient, fast, safe and portable shell-quoting
algorithm for quoting arbitrary data in such a way that the quoted values are
safe to pass to the shell for parsing as string literals. This is essential
for any context where the shell must grammatically parse untrusted input,
such as when supplying arbitrary values to trap or eval.

The shell-quoting algorithm is optimised to minimise exponential growth when
quoting repeatedly. By default, it also ensures that quoted strings are
always one single printable line, making them safe for terminal output and
processing by line-oriented utilities.

shellquote

Usage: shellquote [ -f|+f|-P|+P ] varname[=value] …

The values of the variables specified by name are shell-quoted and stored
back into those variables.
Repeating a variable name will add another level of shell-quoting.
If a = plus a value (which may be empty) is appended to the varname,
that value is shell-quoted and assigned to the variable.

Options modify the algorithm for variable names following them, as follows:

  • By default, newlines and any control characters are converted into
    ${CC*}
    expansions and quoted with double quotes, ensuring that the quoted string
    consists of a single line of printable text. The -P option forces pure
    POSIX quoted strings that may span multiple lines; +P turns this back off.

  • By default, a value is only quoted if it contains characters not present
    in $SHELLSAFECHARS. The -f option forces unconditional quoting,
    disabling optimisations that may leave shell-safe characters unquoted;
    +f turns this back off.

shellquote will die if you
attempt to quote an unset variable (because there is no value to quote).

shellquoteparams

The shellquoteparams command shell-quotes the current positional
parameters in place using the default quoting method of shellquote. No
options are supported and any attempt to add arguments results in a syntax
error.

use var/stack

Modules that extend the stack.

use var/stack/extra

This module contains stack query and maintenance functions.

If you only need one or two of these functions, they can also be loaded as
individual submodules of var/stack/extra.

For the four functions below, item can be:

  • a valid portable variable name
  • a short-form shell option: dash plus letter
  • a long-form shell option: -o followed by an option name (two arguments)
  • --trap=SIGNAME to refer to the trap stack for the indicated signal
    (as set by pushtrap from var/stack/trap)

stackempty [ --key=value ] [ --force ] item: Tests if the stack
for an item is empty. Returns status 0 if it is, 1 if it is not. The key
feature works as in pop: by default, a key
mismatch is considered equivalent to an empty stack. If --force is given,
this function ignores keys altogether.

clearstack [ --key=value ] [ --force ] item [ item … ]:
Clears one or more stacks, discarding all items on them.
If (part of) the stack is keyed or a --key is given, only clears until a
key mismatch is encountered. The --force option overrides this and always
clears the entire stack (be careful, e.g. don’t use within
LOCAL…BEGIN…END).
Returns status 0 on success, 1 if that stack was already empty, 2 if
there was nothing to clear due to a key mismatch.

stacksize [ --silent | --quiet ] item: Leaves the size of a stack in
the REPLY variable and, if option --silent or --quiet is not given,
writes it to standard output.
The size of the complete stack is returned, even if some values are keyed.

printstack [ --quote ] item: Outputs a stack’s content.
Option --quote shell-quotes each stack value before printing it, allowing
for parsing multi-line or otherwise complicated values.
Columns 1 to 7 of the output contain the number of the item (down to 0).
If the item is set, columns 8 and 9 contain a colon and a space, and
if the value is non-empty or quoted, columns 10 and up contain the value.
Sets of values that were pushed with a key are started with a special
line containing --- key:value. A subsequent set pushed with no key is
started with a line containing --- (key off).
Returns status 0 on success, 1 if that stack is empty.

use var/stack/trap

This module provides pushtrap and poptrap. These functions integrate
with the main modernish stack
to make traps stack-based, so that each
program component or library module can set its own trap commands without
interfering with others.

This module also provides a new
DIE pseudosignal
that allows pushing traps to execute when
die
is called.

Note an important difference between the trap stack and stacks for variables
and shell options: pushing traps does not save them for restoring later, but
adds them alongside other traps on the same signal. All pushed traps are
active at the same time and are executed from last-pushed to first-pushed
when the respective signal is triggered. Traps cannot be pushed and popped
using push and pop but use dedicated commands as follows.

Usage:

  • pushtrap [ --key=value ] [ --nosubshell ] [ -- ] command sigspec [ sigspec … ]
  • poptrap [ --key=value ] [ -R ] [ -- ] sigspec [ sigspec … ]

pushtrap works like regular trap, with the following exceptions:

  • Adds traps for a signal without overwriting previous ones.
  • An invalid signal is a fatal error. When using non-standard signals, check if
    thisshellhas --sig=yoursignal
    before using it.
  • Unlike regular traps, a stack-based trap does not cause a signal to be
    ignored. Setting one will cause it to be executed upon the shell receiving
    that signal, but after the stack traps complete execution, modernish re-sends
    the signal to the main shell, causing it to behave as if no trap were set
    (unless a regular POSIX trap is also active).
    Thus, pushtrap does not accept an empty command as it would be pointless.
  • Each stack trap is executed in a new
    subshell
    to keep it from interfering
    with others. This means a stack trap cannot change variables except within
    its own environment, and exit will only exit the trap and not the program.
    The --nosubshell option overrides this behaviour, causing that particular
    trap to be executed in the main shell environment instead. This is not
    recommended unless absolutely needed, as you have to be extra careful to
    avoid exiting the shell or otherwise interfering with other stack traps.
    This option cannot be used with
    DIE traps.
  • Each stack trap is executed with $? initially set to the exit status
    that was active at the time the signal was triggered.
  • Stack traps do not have access to the positional parameters.
  • pushtrap stores current $IFS (field splitting) and $- (shell options)
    along with the pushed trap. Within the subshell executing each stack trap,
    modernish restores IFS and the shell options f (noglob), u
    (nounset) and C (noclobber) to the values in effect during the
    corresponding pushtrap. This is to avoid unexpected effects in case a trap
    is triggered while temporary settings are in effect.
    The --nosubshell option disables this functionality for the trap pushed.
  • The --key option applies the keying functionality inherited from
    plain push to the trap stack.
    It works the same way, so the description is not repeated here.

poptrap takes just signal names or numbers as arguments. It takes the
last-pushed trap for each signal off the stack. By default, it discards
the trap commands. If the -R option is given, it stores commands to
restore those traps into the REPLY variable, in a format suitable for
re-entry into the shell. Again, the --key option works as in
plain pop.

With the sole exception of
DIE traps,
all stack-based traps, like native shell traps, are reset upon entering a
subshell.
However, commands for printing traps will print the traps for
the parent shell, until another trap, pushtrap or poptrap command is
invoked, at which point all memory of the parent shell’s traps is erased.

Trap stack compatibility considerations

Modernish tries hard to avoid incompatibilities with existing trap practice.
To that end, it intercepts the regular POSIX trap command using an alias,
reimplementing and interfacing it with the shell’s builtin trap facility
so that plain old regular traps play nicely with the trap stack. You should
not notice any changes in the POSIX trap command’s behaviour, except for
the following:

  • The regular trap command does not overwrite stack traps (but still
    overwrites existing regular traps).
  • Unlike zsh’s native trap command, signal names are case insensitive.
  • Unlike dash’s native trap command, signal names may have the SIG prefix;
    that prefix is quietly accepted and discarded.
  • Setting an empty trap action to ignore a signal only works fully (passing
    the ignoring on to child processes) if there are no stack traps associated
    with the signal; otherwise, an empty trap action merely suppresses the
    signal’s default action for the current process – e.g., after executing
    the stack traps, it keeps the shell from exiting.
  • The trap command with no arguments, which prints the traps that are set
    in a format suitable for re-entry into the shell, now also prints the
    stack traps as pushtrap commands. (bash users might notice the SIG
    prefix is not included in the signal names written.)
  • The bash/yash-style -p option, including its yash-style --print
    equivalent, is now supported on all shells. If further arguments are
    given after that option, they are taken as signal specifications and
    only the commands to recreate the traps for those signals are printed.
  • Saving the traps to a variable using command substitution (as in:
    var=$(trap)) now works on every
    shell supported by modernish,
    including (d)ash, mksh and zsh which don’t support this natively.
  • To reset (unset) a trap, the modernish trap command accepts both
    valid POSIX syntax
    and legacy bash/(d)ash/zsh syntax, like trap INT to unset a SIGINT
    trap (which only works if the trap command is given exactly one
    argument). Note that this is for compatibility with existing scripts only.
  • Bypassing the trap alias to set a trap using the shell builtin command
    will cause an inconsistent state. This may be repaired with a simple trap
    command; as modernish prints the traps, it will quietly detect ones it
    doesn’t yet know about and make them work nicely with the trap stack.

POSIX traps for each signal are always executed after that signal’s stack-based
traps; this means they should not rely on modernish modules that use the trap
stack to clean up after themselves on exit, as those cleanups would already
have been done.

The new DIE pseudosignal

The var/stack/trap module adds a new DIE pseudosignal whose traps are
executed upon invoking die.
This allows for emergency cleanup operations upon fatal program failure,
as EXIT traps cannot be executed after die is invoked.

  • On non-interactive shells (as well as
    subshells
    of interactive shells), DIE is its own pseudosignal with its own trap
    stack and POSIX trap. In order to kill the malfunctioning program as quickly
    as possible (hopefully before it has a chance to delete all your data), die
    doesn’t wait for those traps to complete before killing the program. Instead,
    it executes each DIE trap simultaneously as a background job, then gathers
    the process IDs of the main shell and all its subprocesses, sending SIGKILL
    to all of them except any DIE trap processes. Unlike other traps, DIE
    traps are inherited by and survive in subshell processes, and pushtrap may
    add to them within the subshell. Whatever shell process invokes die will
    fork all DIE trap actions before being SIGKILLed itself. (Note that any
    DIE traps pushed or set within a subshell will still be forgotten upon
    exiting the subshell.)
  • On an interactive shell (not including its
    subshells),
    DIE is simply an alias for INT, and INT traps
    (both POSIX and stack) are cleared out after executing them once. This is
    because die uses SIGINT for command interruption on interactive shells, and
    it would not make sense to execute emergency cleanup commands repeatedly. As
    a side effect of this special handling, INT traps on interactive shells do
    not have access to the positional parameters and cannot return from functions.

use var/string

String comparison and manipulation functions.

use var/string/touplow

toupper and tolower: convert case in variables.

Usage:

  • toupper varname [ varname … ]
  • tolower varname [ varname … ]

Arguments are taken as variable names (note: they should be given without
the $) and case is converted in the contents of the specified variables,
without reading input or writing output.

toupper and tolower try hard to use the fastest available method on the
particular shell your program is running on. They use built-in shell
functionality where available and working correctly, otherwise they fall back
on running an external utility.

Which external utility is chosen depends on whether the current locale uses
the Unicode UTF-8 character set or not. For non-UTF-8 locales, modernish
assumes the POSIX/C locale and tr is always used. For UTF-8 locales,
modernish tries hard to find a way to correctly convert case even for
non-Latin alphabets. A few shells have this functionality built in with
typeset. The rest need an external utility. Modernish initialisation
tries tr, awk, GNU awk and GNU sed before giving up and setting
the variable MSH_2UP2LOW_NOUTF8. If isset MSH_2UP2LOW_NOUTF8, it
means modernish is in a UTF-8 locale but has not found a way to convert
case for non-ASCII characters, so toupper and tolower will convert
only ASCII characters and leave any other characters in the string alone.
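
As a plain-shell illustration of the fallback behaviour: for ASCII data, converting the contents of a variable with tr(1) is roughly what toupper does on shells without suitable built-in case conversion. This is a sketch of the idea, not the library’s actual code:

```shell
# Rough ASCII-only equivalent of `toupper var` using tr(1);
# modernish picks a faster built-in method where available.
var='hello, world'
var=$(printf '%s' "$var" | tr '[:lower:]' '[:upper:]')
printf '%s\n' "$var"    # HELLO, WORLD
```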

use var/string/trim

trim: strip whitespace from the beginning and end of a variable’s value.
Whitespace is defined by the [:space:] character class. In the POSIX
locale, this is tab, newline, vertical tab, form feed, carriage return, and
space, but in other locales it may be different.
(On shells with BUG_NOCHCLASS,
$WHITESPACE
is used to define whitespace instead.) Optionally, a string of literal
characters can be provided in the second argument. Any characters appearing
in that string will then be trimmed instead of whitespace.
Usage: trim varname [ characters ]
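
For comparison, the effect of trim (default whitespace trimming) can be approximated in plain POSIX shell with sed, at the cost of a fork that the modernish function avoids:

```shell
# Approximate `trim var` with sed; modernish trims in-place.
var='   some text   '
var=$(printf '%s\n' "$var" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
printf '[%s]\n' "$var"    # [some text]
```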

use var/string/replacein

replacein: Replace the leading occurrence (default), the trailing
occurrence (-t), or all occurrences (-a) of a string with another string
in a variable.
Usage: replacein [ -t | -a ] varname oldstring newstring

use var/string/append

append and prepend: Append or prepend zero or more strings to a
variable, separated by a string of zero or more characters, avoiding the
hairy problem of dangling separators.
Usage: append|prepend [ --sep=separator ] [ -Q ] varname [ string … ]
If the separator is not specified, it defaults to a space character.
If the -Q option is given, each string is
shell-quoted
before appending or prepending.
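
The “hairy problem” referred to is the separator bookkeeping that plain shell otherwise requires. For example, append --sep=, list a b c replaces a manual idiom like:

```shell
# Add the separator only when the variable already has content;
# this is the bookkeeping that `append --sep=, list a b c` replaces.
list=
for item in a b c; do
	list="${list:+$list,}$item"
done
printf '%s\n' "$list"    # a,b,c
```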

use var/unexport

The unexport function clears the “export” bit of a variable, conserving
its value, and/or assigns values to variables without setting the export
bit. This works even if set -a (allexport) is active, allowing an “export
all variables, except these” way of working.

Usage is like export, with the caveat that variable assignment arguments
containing non-shell-safe characters or expansions must be quoted as
appropriate, unlike in some specific shell implementations of export.
(To get rid of that headache, use safe.)

Unlike export, unexport does not work for read-only variables.

use var/genoptparser

As the getopts builtin is not portable when used in functions, this module
provides a command that generates modernish code to parse options for your
shell function in a standards-compliant manner. The generated parser
supports short-form (one-character) options which can be stacked/combined.

Usage:
generateoptionparser [ -o ] [ -f func ] [ -v varprefix ]
[ -n options ] [ -a options ] [ varname ]

  • -o: Write parser to standard output.
  • -f: Function name to prefix to error messages. Default: none.
  • -v: Variable name prefix for options. Default: opt_.
  • -n: String of options that do not take arguments.
  • -a: String of options that require arguments.
  • varname: Store parser in specified variable. Default: REPLY.

At least one of -n and -a is required. All other arguments are optional.
Option characters must be valid components of portable variable names, so
they must be ASCII upper- or lowercase letters, digits, or the underscore.

generateoptionparser stores the generated parser code in a variable: either
REPLY or the varname specified as the first non-option argument. This makes
it possible to generate and use the parser on the fly with a command like
eval "$REPLY" immediately following the generateoptionparser invocation.

For better efficiency and readability, it will often be preferable to insert
the option parser code directly into your shell function instead. The -o
option writes the parser code to standard output, so it can be redirected to
a file, inserted into your editor, etc.

Parsed options are shifted out of the positional parameters while setting or
unsetting corresponding variables, until a non-option argument, a --
end-of-options delimiter argument, or the end of arguments is encountered.
Unlike with getopts, no additional shift command is required.

Each specified option gets a corresponding variable with a name consisting
of the varprefix (default: opt_) plus the option character. If an option
is not passed to your function, the parser unsets its variable; otherwise it
sets it to either the empty value or its option-argument if it requires one.
Thus, your function can check if any option x was given using
isset,
for example, if isset opt_x; then
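
To give an idea of the result, here is a hand-written sketch in the spirit of a generated parser for a hypothetical function myfunc taking -q (no argument) and -o (requires argument). It is not the generator’s literal output and omits option stacking:

```shell
# Hypothetical function with options -q and -o <arg>, default opt_ prefix.
myfunc() {
	unset -v opt_q opt_o
	while :; do
		case ${1-} in
		( -q )	opt_q='' ;;
		( -o )	shift; opt_o=$1 ;;
		( -- )	shift; break ;;
		( -* )	echo "myfunc: invalid option: $1" >&2; return 2 ;;
		( * )	break ;;
		esac
		shift
	done
	# options are shifted out; "$@" now holds only the operands
	if [ "${opt_q+s}" ]; then echo 'quiet mode'; fi
	if [ "${opt_o+s}" ]; then printf 'output=%s\n' "$opt_o"; fi
	printf 'operands=%s\n' "$*"
}
```

For example, myfunc -q -o out.txt foo bar sets opt_q (empty), sets opt_o=out.txt, and leaves foo bar as the operands.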

use sys/base

Some very common and essential utilities are not specified by POSIX, differ
widely among systems, and are not always available. For instance, the
which and readlink commands have incompatible options on various GNU and
BSD variants and may be absent on other Unix-like systems. The sys/base
module provides a complete re-implementation of such non-standard but basic
utilities, written as modernish shell functions. Using the modernish version
of these utilities can help a script to be fully portable. These versions
also have various enhancements over the GNU and BSD originals, some of which
are made possible by their integration into the modernish shell environment.

use sys/base/mktemp

A cross-platform shell implementation of mktemp that aims to be just as
safe as native mktemp(1) implementations, while avoiding the problem of
having various mutually incompatible versions and adding several unique
features of its own.

Creates one or more unique temporary files, directories or named pipes,
atomically (i.e. avoiding race conditions) and with safe permissions.
The path name(s) are stored in REPLY and optionally written to stdout.

Usage: mktemp [ -dFsQCt ] [ template … ]

  • -d: Create a directory instead of a regular file.
  • -F: Create a FIFO (named pipe) instead of a regular file.
  • -s: Silent. Store output in $REPLY, don’t write any output or message.
  • -Q: Shell-quote each unit of output. Separate by spaces, not newlines.
  • -C: Automated cleanup. Pushes a trap to remove the files on exit. On an
    interactive shell, that’s all this option does. On a non-interactive
    shell, the following applies: clean up on receiving SIGPIPE and SIGTERM
    as well. On receiving SIGINT, clean up if the option was given at least
    twice, otherwise notify the user of files left. On the invocation of
    die, clean up if the option was given at least three times, otherwise
    notify the user of files left.
  • -t: Prefix one temporary-files directory to all the templates:
    $XDG_RUNTIME_DIR or $TMPDIR if set, or /tmp. The templates may not
    contain any slashes. If the template has neither any trailing Xes nor a
    trailing dot, a dot is added before the random suffix.

The template defaults to “/tmp/temp.”. A suffix of random shell-safe ASCII
characters is added to the template to create the file. For compatibility with
other mktemp implementations, any optional trailing X characters in the
template are removed. The length of the suffix will be equal to the number of
Xes removed, or 10, whichever is more. The longer the random suffix, the
higher the security of using mktemp in a shared directory such as /tmp.

Since /tmp is a world-writable directory shared by other users, for best
security it is recommended to create a private subdirectory using mktemp -d
and work within that.

Option -C cannot be used without option -s when in a
subshell.
Modernish will detect this and treat it as a
fatal error. The reason is that a typical command substitution like
tmpfile=$(mktemp -C)
is incompatible with auto-cleanup, as the cleanup EXIT trap would be
triggered not upon exiting the program but upon exiting the command
substitution subshell that just ran mktemp, thereby immediately undoing
the creation of the file. Instead, do something like:
mktemp -sC; tmpfile=$REPLY
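
The pitfall can be demonstrated with a plain EXIT trap, no modernish required: a trap set inside a command substitution fires when that subshell exits, so any cleanup scheduled there happens immediately:

```shell
# The trap runs as the command substitution subshell exits,
# so its output is captured along with the "real" output:
out=$(trap 'echo CLEANUP' EXIT; echo created)
printf '%s\n' "$out"
```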

This module depends on the trap stack to do auto-cleanup (the -C option),
so it will automatically use var/stack/trap on initialisation.

use sys/base/readlink

readlink reads the target of a symbolic link, robustly handling strange
filenames such as those containing newline characters. It stores the result
in the REPLY variable and optionally writes it on standard output.

Usage: readlink [ -nsefmQ ] path [ path … ]

  • -n: If writing output, don’t add a trailing newline.
    This does not remove the separating newlines if multiple paths are given.
  • -s: Silent operation: don’t write output, only store it in REPLY.
  • -e, -f, -m: Canonicalise. Convert each path found into a canonical
    and absolute path that can be used starting from any working directory.
    Relative paths are resolved starting from the present working directory.
    Double slashes are removed. Any special pathname components
    . and .. are resolved. All symlinks encountered are
    followed, but a path does not need to contain any symlinks. UNC network
    paths (as on Cygwin) are supported. These options differ as follows:
    • -e: All pathname components must exist to produce a result.
    • -f: All but the last pathname component must exist to produce a result.
    • -m: No pathname component needs to exist; this always produces a result.
      Nonexistent pathname components are simulated as regular directories.
  • -Q: Shell-quote each unit of output. Separate by spaces instead
    of newlines. This generates a list of arguments in shell syntax,
    guaranteed to be suitable for safe parsing by the shell, even if the
    resulting pathnames should contain strange characters such as spaces or
    newlines and other control characters.

The exit status of readlink is 0 on success and 1 if the path either is
not a symlink, or could not be canonicalised according to the option given.
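
The robustness matters because the classic portable fallback parses ls -ld output, which breaks on link names or targets containing “ -> ” or newlines. The naive approach is shown here only to illustrate what modernish readlink improves upon (the demo_link name is made up):

```shell
# Naive symlink reading by parsing ls(1) output -- fragile with
# strange filenames, which is why modernish reimplements readlink.
ln -sf target "${TMPDIR:-/tmp}/demo_link.$$"
t=$(ls -ld "${TMPDIR:-/tmp}/demo_link.$$" | sed 's/.* -> //')
printf '%s\n' "$t"    # target
rm -f "${TMPDIR:-/tmp}/demo_link.$$"
```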

use sys/base/rev

rev copies the specified files to the standard output, reversing the order
of characters in every line. If no files are specified, the standard input
is read.

Usage: like rev on Linux and BSD, which is like cat except that - is
a filename and does not denote standard input. No options are supported.

use sys/base/seq

A cross-platform implementation of seq that is more powerful and versatile
than native GNU and BSD seq(1) implementations. The core is written in
bc, the POSIX arbitrary-precision calculator language. That means this
seq inherits the capacity to handle numbers with a precision and size only
limited by computer memory, as well as the ability to handle input numbers
in any base from 1 to 16 and produce output in any base 1 and up.

Usage: seq [ -w ] [ -L ] [ -f format ] [ -s string ] [ -S scale ]
[ -B base ] [ -b base ] [ first [ incr ] ] last

seq prints a sequence of arbitrary-precision floating point numbers, one
per line, from first (default 1), to as near last as possible, in increments of
incr (default 1). If first is larger than last, the default incr is -1.
An incr of zero is treated as a fatal error.

  • -w: Equalise width by padding with leading zeros. The longest of the
    first, incr or last arguments is taken as the length that each
    output number should be padded to.
  • -L: Use the current locale’s radix point in the output instead of the
    full stop (.).
  • -f: printf-style floating-point format. The format string is passed on
    (with an added \n) to awk’s builtin printf function. Because of that,
    the -f option can only be used if the output base is 10. Note that
    awk’s floating point precision is limited, so very large or long
    numbers will be rounded.
  • -s: Instead of writing one number per line, write all numbers on one
    line separated by string and terminated by a newline character.
  • -S: Explicitly set the scale (number of digits after the radix point).
    Defaults to the largest number of digits after the radix point among
    the first, incr or last arguments.
  • -B: Set input and output base from 1 to 16. Defaults to 10.
  • -b: Set arbitrary output base from 1. Defaults to input base. See the
    bc(1) manual for more information on the output format for bases
    greater than 16.

The -S, -B and -b options take shell integer numbers as operands. This
means a leading 0X or 0x denotes a hexadecimal number and a leading 0
denotes an octal number.

For portability reasons, modernish seq uses a full stop (.) for the
radix point, regardless of the
system locale. This applies both to command arguments and to output.
The -L option causes seq to use the current locale’s radix point
character for output only.
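
For instance, seq 2 2 10 prints the even numbers from 2 through 10. Without modernish, the integer case can be written in plain POSIX shell, losing the arbitrary precision, scale and base handling described above:

```shell
# Plain POSIX integer equivalent of `seq 2 2 10`:
i=2
while [ "$i" -le 10 ]; do
	printf '%s\n' "$i"
	i=$((i + 2))
done
```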

Differences with GNU and BSD seq

The -S, -B and -b options are modernish innovations.
The -w, -f and -s options are inspired by GNU and BSD seq.
The following differences apply:

  • Like GNU and unlike BSD, the separator specified by the -s option
    is not appended to the final number and there is no -t option to
    add a terminator character.
  • Like GNU and unlike BSD, the -s option-argument is taken as literal
    characters and is not parsed for backslash escape codes like \n.
  • Unlike GNU and like BSD, the output radix point defaults to a full stop,
    regardless of the current locale.
  • Unlike GNU and like BSD, if incr is not specified,
    it defaults to -1 if first > last, 1 otherwise.
    For example, seq 5 1 counts backwards from 5 to 1, and
    specifying seq 5 -1 1 as with GNU is not needed.
  • Unlike GNU and like BSD, an incr of zero is not accepted.
    To output the same number or string infinite times, use
    yes instead.
  • Unlike both GNU and BSD, the -f option accepts any format specifiers
    accepted by awk‘s printf() function.

The sys/base/seq module depends on, and automatically loads,
var/string/touplow.

use sys/base/shuf

Shuffle lines of text.
A portable reimplementation of a commonly used GNU utility.

Usage:

  • shuf [ -n max ] [ -r rfile ] file
  • shuf [ -n max ] [ -r rfile ] -i low-high
  • shuf [ -n max ] [ -r rfile ] -e argument

By default, shuf reads lines of text from standard input, or from file
(the file - signifies standard input).
It writes the input lines to standard output in random order.

  • -i: Use sequence of non-negative integers low through high as input.
  • -e: Instead of reading input, use the arguments as lines of input.
  • -n: Output a maximum of max lines.
  • -r: Use rfile as the source of random bytes. Defaults to /dev/urandom.
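
The general technique can be sketched with standard tools: tag each line with a random key, sort on the keys, then strip them (decorate-sort-undecorate). This is only an approximation of the idea; modernish shuf reads proper random bytes from rfile:

```shell
# Shuffle lines with awk/sort/cut; a rough stand-in for `shuf`.
printf '%s\n' one two three four |
awk 'BEGIN { srand() } { printf "%.6f\t%s\n", rand(), $0 }' |
sort -n | cut -f2-
```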

Differences with GNU shuf:

  • Long option names are not supported.
  • The -o/--output-file option is not supported; use output redirection.
    Safely shuffling files in-place is not supported; use a temporary file.
  • --random-source=file is changed to -r file.
  • The -z/--zero-terminated option is not supported.

use sys/base/tac

tac (the reverse of cat) is a cross-platform reimplementation of the GNU
tac utility, with some extra features.

Usage: tac [ -rbBP ] [ -s separator ] [ file … ]

tac outputs the files in reverse order of lines/records.
If file is - or is not given, tac reads from standard input.

  • -s: Specify the record (line) separator. Default: linefeed.
  • -r: Interpret the record separator as an
    extended regular expression.
    This allows using separators that may vary. Each separator is preserved
    in the output as it is in the input.
  • -b: Assume the separator comes before each record in the input, and also
    output the separator before each record. Cannot be combined with -B.
  • -B: Assume the separator comes after each record in the input, but output
    the separator before each record. Cannot be combined with -b.
  • -P: Paragraph mode: output text last paragraph first. Input paragraphs
    are separated from each other by at least two linefeeds. Cannot be combined
    with any other option.
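
The default behaviour (reverse the order of lines, linefeed separator) can be emulated with standard awk; the separator, regex and paragraph modes above are what the module adds on top of this basic idea:

```shell
# Print lines in reverse order, like `tac` with no options:
printf '%s\n' first second third |
awk '{ line[NR] = $0 } END { for (i = NR; i >= 1; i--) print line[i] }'
```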

Differences between GNU tac and modernish tac:

  • The -B and -P options were added.
  • The -r option interprets the record separator as an extended regular
    expression. This is an incompatibility with GNU tac unless expressions
    are used that are valid as both basic and extended regular expressions.
  • In UTF-8 locales, multi-byte characters are recognised and reversed
    correctly.

use sys/base/which

The modernish which utility finds external programs and reports their
absolute paths, offering several unique options for reporting, formatting
and robust processing. The default operation is similar to GNU which.

Usage: which [ -apqsnQ1f ] [ -P number ] program [ program … ]

By default, which finds the first available path to each given program.
If program is itself a path name (contains a slash), only that path’s base
directory is searched; if it is a simple command name, the current $PATH
is searched. Any relative paths found are converted to absolute paths.
Symbolic links are not followed. The first path found for each program is
written to standard output (one per line), and a warning is written to
standard error for every program not found. The exit status is 0 (success)
if all programs were found, 1 otherwise.

which also leaves its output in the REPLY variable. This may be useful
if you run which in the main shell environment. The REPLY value will
not survive a command substitution
subshell
as in ls_path=$(which ls).
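
Conceptually, the default search walks the colon-separated $PATH and reports the first executable match, roughly like the sketch below (findcmd is a made-up name; the real which converts results to absolute paths, sets REPLY, and handles all the options):

```shell
# Minimal sketch of a $PATH search.
# The () function body keeps the IFS and set -f changes local.
findcmd() (
	IFS=':'
	set -f    # no globbing while splitting $PATH
	for dir in $PATH; do
		if [ -f "$dir/$1" ] && [ -x "$dir/$1" ]; then
			printf '%s\n' "$dir/$1"
			exit 0
		fi
	done
	exit 1
)
```

For example, findcmd sh prints the first sh found in $PATH, and the function returns status 1 if nothing is found.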

The following options modify the default behaviour described above:

  • -a: List all programs that can be found in the directories searched,
    instead of just the first one. This is useful for finding duplicate
    commands that the shell would not normally find when searching its $PATH.
  • -p: Search in $DEFPATH
    (the default standard utility PATH provided by the operating system)
    instead of in the user’s $PATH, which is vulnerable to manipulation.
  • -q: Be quiet: suppress all warnings.
  • -s: Silent operation: don’t write output, only store it in the REPLY
    variable. Suppress warnings except, if you run which -s in a subshell,
    a warning that the REPLY variable will not survive the subshell.
  • -n: When writing to standard output, do not write a final newline.
  • -Q: Shell-quote each unit of output. Separate by spaces instead
    of newlines. This generates a one-line list of arguments in shell syntax,
    guaranteed to be suitable for safe parsing by the shell, even if the
    resulting pathnames should contain strange characters such as spaces or
    newlines and other control characters.
  • -1 (one): Output the results for at most one of the arguments in
    descending order of preference: once a search succeeds, ignore
    the rest. Suppress warnings except a subshell warning for -s.
    This is useful for finding a command that can exist under
    several names, for example:
    which -f -1 gnutar gtar tar
    This option modifies which’s exit status behaviour: which -1
    returns successfully if at least one command was found.
  • -f: Throw a fatal error
    in cases where which would otherwise return status 1 (non-success).
  • -P: Strip the indicated number of pathname elements from the output,
    starting from the right.
    -P1: strip /program;
    -P2: strip /*/program,
    etc. This is useful for determining the installation root directory for
    an installed package.
  • --help: Show brief usage information.

use sys/base/yes

yes very quickly outputs infinite lines of text, each consisting of its
space-separated arguments, until terminated by a signal or by a failure to
write output. If no argument is given, the default line is y. No options
are supported.

This infinite-output command is useful for piping into commands that need an
indefinite input data stream, or to automate a command requiring interactive
confirmation.

Modernish yes is like GNU yes in that it outputs all its arguments,
whereas BSD yes only outputs the first. It can output multiple gigabytes
per second on modern systems.
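
In plain shell, such an infinite writer is just a loop; piping it into a consumer that stops reading also illustrates termination by failure to write output (SIGPIPE), as mentioned above. Modernish yes is far faster than this sketch:

```shell
# Slow plain-shell stand-in for `yes hello`: the loop is killed
# by SIGPIPE once head closes the pipe after three lines.
while :; do echo 'hello'; done | head -n 3
```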

use sys/cmd

Modules in this category contain functions for enhancing the invocation of
commands.

use sys/cmd/extern

extern is like command but always runs an external command, without
having to know or determine its location. This provides an easy way to
bypass a builtin, alias or function. It does the same $PATH search
the shell normally does when running an external command. For instance, to
guarantee running external printf just do: extern printf ...

Usage: extern [ -p ] [ -v ] [ -u varname … ]
[ varname=value … ] command [ argument … ]

  • -p: The command, as well as any commands it further invokes, are searched in
    $DEFPATH
    (the default standard utility PATH provided by the operating system)
    instead of in the user’s $PATH, which is vulnerable to manipulation.
    • extern -p is much more reliable than the shell’s builtin
      command -p
      because: (a) many existing shell installations use a wrong search path for
      command -p; (b) command -p does not export the default PATH, so
      something like command -p sudo cp foo /bin/bar searches only sudo in
      the secure default path and not cp.
  • -v: don’t execute command but show the full path name of the command that
    would have been executed. Any extra arguments are taken as more command
    paths to show, one per line. extern exits with status 0 if all the commands
    were found, 1 otherwise. This option can be combined with -p.
  • -u: Temporary export override. Unset the given variable in the
    environment of the command executed, even if it is currently exported. Can
    be specified multiple times.
  • varname=value assignment-arguments: These variables/values are
    temporarily exported to the environment during the execution of the command.
    • This is provided because assignments preceding extern cause unwanted,
      shell-dependent side effects, as extern is a shell function. Be
      sure to provide assignment-arguments following extern instead.
    • Assignment-arguments after a -- end-of-options delimiter are not parsed;
      this allows commands containing a = sign to be executed.
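
The secure default path used by -p is the one the operating system reports via getconf PATH (modernish stores it in $DEFPATH). A plain-shell sketch of that underlying mechanism, not of extern itself:

```shell
# Obtain the OS default utility path and run a command with it,
# bypassing whatever is in the user's own $PATH:
DEFPATH=$(getconf PATH)
printf 'default path: %s\n' "$DEFPATH"
PATH=$DEFPATH ls /dev/null
```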

use sys/cmd/harden

The harden function allows implementing emergency halt on error
for any external commands and shell builtin utilities. It is
modernish’s replacement for set -e a.k.a. set -o errexit (which is
fundamentally
flawed,
not supported and will break the library).
It depends on, and auto-loads, the sys/cmd/extern module.

harden sets a shell function with the same name as the command hardened,
so it can be used transparently. This function hardens the given command by
checking its exit status against values indicating error or system failure.
Exactly what exit statuses signify an error or failure depends on the
command in question; this should be looked up in the
POSIX specification
(under “Utilities”) or in the command’s man page or other documentation.

If the command fails, the function installed by harden calls die, so it
will reliably halt program execution, even if the failure occurred within a
subshell.

Usage:

harden [ -f funcname ] [ -[cSpXtPE] ] [ -e testexpr ]
[ var=value … ] [ -u var … ] command_name_or_path
[ command_argument … ]

The -f option hardens the command as the shell function funcname instead
of defaulting to command_name_or_path as the function name. (If the latter
is a path, that’s always an invalid function name, so the use of -f is
mandatory.) If command_name_or_path is itself a shell function, that
function is bypassed and the builtin or external command by that name is
hardened instead. If no such command is found, harden dies with the message
that hardening shell functions is not supported. (Instead, you should invoke
die directly from your shell function upon detecting a fatal error.)

The -c option causes command_name_or_path to be hardened and run
immediately instead of setting a shell function for later use. This option
is meant for commands that run once; it is not efficient for repeated use.
It cannot be used together with the -f option.

The -S option allows specifying several possible names/paths for a
command. It causes the command_name_or_path to be split by comma and
interpreted as multiple names or paths to search. The first name or path
found is used. Requires -f.

The -e option, which defaults to >0, indicates the exit statuses
corresponding to a fatal error. It depends on the command what these are;
consult the POSIX spec and the manual pages.
The status test expression testexpr, argument
to the -e option, is like a shell arithmetic
expression, with the binary operators == != <= >= < > turned
into unary operators referring to the exit status of the command in
question. Assignment operators are disallowed. Everything else is the same,
including && (logical and) and || (logical or) and parentheses.
Note that the expression needs to be quoted as the characters used in it
clash with shell grammar tokens.
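
For instance, -e '>1' suits grep, where status 1 merely means “no match”. Conceptually, the function that harden then installs behaves like this hand-rolled sketch (hardened_grep is a made-up name; the real function calls die and prints a detailed message):

```shell
# Sketch of the behaviour of `harden -e '>1' grep`:
hardened_grep() {
	grep "$@" || {
		rc=$?
		if [ "$rc" -gt 1 ]; then
			echo "hardened_grep: grep failed with status $rc" >&2
			exit 125	# the real hardened function calls die
		fi
		return "$rc"	# status 1 (no match) passes through
	}
}
```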

The -X option causes harden to always search for and harden an external
command, even if a built-in command by that name exists.

The -E option causes the hardening function to consider it a fatal error
if the hardened command writes anything to the standard error stream. This
option allows hardening commands (such as
bc)
where you can’t rely on the exit status to detect an error. The text written
to standard error is passed on as part of the error message printed by
die. Note that:

  • Intercepting standard error necessitates that the command be executed from a
    subshell.
    This means any builtins or shell functions hardened with -E cannot
    influence the calling shell (e.g. harden -E cd renders cd ineffective).
  • -E does not disable exit status checks; by default, any exit status greater
    than zero is still considered a fatal error as well. If your command does
    not even reliably return a 0 status upon success, then you may want to add
    -e '>125', limiting the exit status check to reserved values indicating
    errors launching the command and signals caught.

The -p option causes harden to search for commands using the
system default path (as obtained with getconf PATH) as opposed to the
current $PATH. This ensures that you’re using a known-good external
command that came with your operating system. By default, the system-default
PATH search only applies to the command itself, and not to any commands that
the command may search for in turn. But if the -p option is specified at
least twice, the command is run in a subshell with PATH exported as the
default path, which is equivalent to adding a PATH=$DEFPATH assignment
argument (see below).

Examples:

  1. harden make # simple check for status > 0
  2. harden -f tar '/usr/local/bin/gnutar' # id.; be sure to use this 'tar' version
  3. harden -e '> 1' grep # for grep, status > 1 means error
  4. harden -e '==1 || >2' gzip # 1 and >2 are errors, but 2 isn't (see manual)

Important note on variable assignments

As far as the shell is concerned, hardened commands are shell functions and
not external or builtin commands. This essentially changes one behaviour of
the shell: variable assignments preceding the command will not be local to
the command as usual, but will persist after the command completes.
(POSIX technically makes that behaviour
optional
but all current shells behave the same in POSIX mode.)

For example, this means that something like

  1. harden -e '>1' grep
  2. # [...]
  3. LC_ALL=C grep regex some_ascii_file.txt

should never be done, because the meant-to-be-temporary LC_ALL locale
assignment will persist and is likely to cause problems further on.

To solve this problem, harden supports adding these assignments as
part of the hardening command, so instead of the above you do:

  1. harden -e '>1' LC_ALL=C grep
  2. # [...]
  3. grep regex some_ascii_file.txt

With the -u option, harden also supports unsetting variables for the
duration of a command, e.g.:

  1. harden -e '>1' -u LC_ALL grep

The -u option may be specified multiple times.
It causes the hardened command to be invoked from a
subshell
with the specified variables unset.

Hardening while allowing for broken pipes

If you’re piping a command’s output into another command that may close
the pipe before the first command is finished, you can use the -P option
to allow for this:

  1. harden -e '==1 || >2' -P gzip # also tolerate gzip being killed by SIGPIPE
  2. gzip -dc file.txt.gz | head -n 10 # show first 10 lines of decompressed file

head will close the pipe of gzip input after ten lines; the operating
system kernel then kills gzip with the PIPE signal before it’s finished,
causing a particular exit status that is greater than 128. This exit status
would normally make harden kill your entire program, which in the example
above is clearly not the desired behaviour. If the exit status caused by a
broken pipe were known, you could specifically allow for that exit status in
the status expression. The trouble is that this exit status varies depending
on the shell and the operating system. The -P option was made to solve
this problem: it automatically detects and whitelists the correct exit
status corresponding to SIGPIPE termination on the current system.

Tolerating SIGPIPE is an option and not the default, because in many
contexts it may be entirely unexpected and a symptom of a severe error if a
command is killed by a broken pipe. It is up to the programmer to decide
which commands should expect SIGPIPE and which shouldn’t.

Tip: It could happen that the same command should expect SIGPIPE in one
context but not another. You can create two hardened versions of the same
command, one that tolerates SIGPIPE and one that doesn’t. For example:

  1. harden -f hardGrep -e '>1' grep # hardGrep does not tolerate being aborted
  2. harden -f pipeGrep -e '>1' -P grep # pipeGrep for use in pipes that may break

Note: If SIGPIPE was set to ignore by the process invoking the current
shell, the -P option has no effect, because no process or subprocess of
the current shell can ever be killed by SIGPIPE. However, this may cause
various other problems and you may want to refuse to let your program run
under that condition.
thisshellhas WRN_NOSIGPIPE can help
you easily detect that condition so your program can make a decision. See
the WRN_NOSIGPIPE description for more information.

Tracing the execution of hardened commands

The -t option will trace command output. Each execution of a command
hardened with -t causes the command line to be output to standard
error, in the following format:

  1. [functionname]> commandline

where functionname is the name of the shell function used to harden the
command and commandline is the actual command executed. The
commandline is properly shell-quoted in a format suitable for re-entry
into the shell; however, command lines longer than 512 bytes will be
truncated and the unquoted string (TRUNCATED) will be appended to the
trace. If standard error is on a terminal that supports ANSI colours,
the tracing output will be colourised.

The -t option was added to harden because the commands that you harden
are often the same ones you would be particularly interested in tracing. The
advantage of using harden -t over the shell’s builtin tracing facility
(set -x or set -o xtrace) is that the output is a lot less noisy,
especially when using a shell library such as modernish.

Note: Internally, -t uses the shell file descriptor 9, redirecting it to
standard error (using exec 9>&2). This allows tracing to continue to work
normally even for commands that redirect standard error to a file (which is
another enhancement over set -x on most shells). However, this does mean
harden -t conflicts with any other use of the file descriptor 9 in your
shell program.

If file descriptor 9 is already open before harden is called, harden
does not attempt to override this. This means tracing may be redirected
elsewhere by doing something like exec 9>trace.out before calling
harden. (Note that redirecting FD 9 on the harden command itself will
not work as it won’t survive the run of the command.)
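The file descriptor 9 mechanism can be illustrated in plain POSIX shell (the file name trace.out is just an example):

```shell
# Open file descriptor 9 on a log file before any tracing starts.
exec 9>trace.out

# Even while standard error is redirected away, writes to FD 9 still
# reach trace.out -- this is why harden -t keeps working for commands
# that redirect their own standard error.
{
    echo 'stderr noise' >&2
    echo '[cp]> cp -r src dst' >&9
} 2>/dev/null

cat trace.out   # prints: [cp]> cp -r src dst
```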

Simple tracing of commands

Sometimes you just want to trace the execution of some specific commands as
in harden -t (see above) without actually hardening them against command
errors; you might prefer to do your own error handling. trace makes this
easy. It is modernish’s replacement or complement for set -x a.k.a. set -o xtrace.
Unlike harden -t, it can also trace shell functions.

Usage 1: trace [ -f funcname ] [ -[cSpXE] ]
[ var=value … ] [ -u var … ] command_name_or_path
[ command_argument … ]

For non-function commands, trace acts as a shortcut for
harden -t -P -e '>125 && !=255' command_name_or_path.
Any further options and arguments are passed on to harden as given. The
result is that the indicated command is automatically traced upon execution.
A bonus is that you still get minimal hardening against fatal system errors.
Errors in the traced command itself are ignored, but your program is
immediately halted with an informative error message if the traced command:

  • cannot be found (exit status 127);
  • was found but cannot be executed (exit status 126);
  • was killed by a signal other than SIGPIPE (exit status > 128, except
    the shell-specific exit status for SIGPIPE, and except 255 which is
    used by some utilities, such as ssh and rsync, to return an error).

Note: The caveat for command-local variable assignments for harden also
applies to trace. See
Important note on variable assignments
above.
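For instance, tracing a couple of file manipulation commands in a modernish script might look like this (a minimal sketch, assuming modernish is installed; the directory and file names are illustrative):

```shell
#! /usr/bin/env modernish
#! use safe -k
#! use sys/cmd/harden

trace mkdir
trace cp

# Each invocation below is now echoed to standard error in the
# [command]> command line format, and fatal system errors (exit
# status 126, 127, or a signal) still halt the program.
mkdir -p myproject/src
cp main.sh myproject/src/
```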

Usage 2: [ #! ] trace -f funcname

If no further arguments are given, trace -f will trace the shell
function funcname without applying further hardening (except against
nonexistence). trace -f can be used to trace the execution of modernish
library functions as well as your own script’s functions. The trace output
for shell functions shows an extra () following the function name.

Internally, this involves setting an alias under the function’s name, so
the limitations of the shell’s alias expansion mechanism apply: only
function calls that the shell had not yet parsed before calling trace -f
will be traced. So you should use trace -f at the beginning of your
script, before defining your own functions. To facilitate this, trace -f
does not check that the function funcname exists while setting up
tracing, but only when attempting to execute the traced function.

In portable-form
modernish scripts, trace -f should be used as a hashbang command to be
compatible with alias expansion on all shells. Only the trace -f form
may be used that way. For example:

  #! /usr/bin/env modernish
  #! use safe -k
  #! use sys/cmd/harden
  #! trace -f push
  #! trace -f pop
  ...your program begins here...

use sys/cmd/mapr

mapr (map records) is an alternative to xargs that shares features with the
mapfile command in bash 4.x. It is fully integrated into your script’s main
shell environment, so it can call your shell functions as well as builtin and
external utilities.
It depends on, and auto-loads, the sys/cmd/procsubst module.

Usage: mapr [ -d delimiter | -P ] [ -s count ] [ -n number ]
[ -m length ] [ -c quantum ] callback

mapr reads delimited records from the standard input, invoking the specified
callback command once or repeatedly as needed, with batches of input records
as arguments. The callback may consist of multiple arguments. By default, an
input record is one line of text.

Options:

  • -d delimiter: Use the single character delimiter to delimit input records,
    instead of the newline character. A NUL (0) character and multi-byte
    characters are not supported.
  • -P: Paragraph mode. Input records are delimited by sequences consisting of
    a newline plus one or more blank lines, and leading or trailing blank lines
    will not result in empty records at the beginning or end of the input. Cannot
    be used together with -d.
  • -s count: Skip and discard the first count records read.
  • -n number: Stop processing after passing a total of number records to
    invocation(s) of callback. If -n is not supplied or number is 0, all
    records are passed, except those skipped using -s.
  • -m length: Set the maximum argument length in bytes of each callback
    command call, including the callback command argument(s) and the current
    batch of up to quantum input records. The length of each argument is
    increased by 1 to account for the terminating null byte. The default
    maximum length depends on constraints set by the operating system for
    invoking external commands. If length is 0, this limit is disabled.
  • -c quantum: Pass at most quantum arguments at a time to each call to
    callback. If -c is not supplied or if quantum is 0, the number of
    arguments per invocation is not limited except by -m; whichever limit is
    reached first applies.

Arguments:

  • callback: Call the callback command with the collected arguments each
    time quantum lines are read. The callback command may be a shell function or
    any other kind of command, and is executed from the same shell environment
    that invoked mapr. If the callback command exits or returns with status
    255 or is interrupted by the SIGPIPE signal, mapr will not process any
    further batches but immediately exit with the status of the callback
    command. If it exits with another exit status 126 or greater, a
    fatal error
    is thrown. Otherwise, mapr exits with the status of the last-executed
    callback command.
  • argument: If there are extra arguments supplied on the mapr command line,
    they will be added before the collected arguments on each invocation of
    the callback command.
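As a sketch of how mapr integrates with the main shell environment, the callback below is a shell function that accumulates state in a global variable (the input file and the count/countlines names are illustrative):

```shell
#! /usr/bin/env modernish
#! use sys/cmd/mapr

count=0
countlines() {
    # Each batch of input records arrives as the positional parameters.
    # Because the callback runs in the main shell environment, the
    # assignment to 'count' persists across invocations.
    count=$((count + $#))
}

# Pass at most 500 lines per callback invocation.
mapr -c 500 countlines < /etc/services
putln "$count lines"
```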

Differences from mapfile

mapr was inspired by the bash 4.x builtin command mapfile a.k.a.
readarray, and uses similar options, but there are important differences.

  • mapr passes all the records as arguments to the callback command.
  • mapr does not support assigning records directly to an array. Instead,
    all handling is done through the callback command (which could be a shell
    function that assigns its arguments to an array.)
  • The callback command is specified directly instead of with a -C option,
    and it may consist of several arguments (as with xargs).
  • The record separator itself is never included in the arguments passed
    to the callback command (so there is no -t option to remove it).
  • mapr supports paragraph mode.
  • If the callback command exits with status 255, processing is aborted.

Differences from xargs

mapr shares important characteristics with
xargs
while avoiding its myriad pitfalls.

  • Instead of being an external utility, mapr is fully integrated into the
    shell. The callback command can be a shell function or builtin, which can
    directly modify the shell environment.
  • mapr is line-oriented by default, so it is safe to use for input
    arguments that contain spaces or tabs.
  • mapr does not parse or modify the input arguments in any way, e.g. it
    does not process and remove quotes from them like xargs does.
  • mapr supports paragraph mode.

use sys/cmd/procsubst

This module provides a portable
process substitution
construct, the advantage being that this is not limited to bash, ksh or zsh
but works on all POSIX shells capable of running modernish. It is not
possible for modernish to introduce the original ksh syntax into other
shells. Instead, this module provides a % command for use within a
$(command substitution).

The % command takes one simple command as its arguments, executes it in
the background, and writes a file name from which to read its output. So
if % is used within a command substitution as intended, that file name
is passed on to the invoking command as an argument.

The % command supports one option, -o. If that option is given, then it is
expected that, instead of reading input, the invoking command writes output to
the file name passed on to it, so that the command invoked by % -o can read
that data from its standard input.

Example syntax comparison:

  ksh/bash/zsh:
    diff -u <(ls) <(ls -a)
  modernish:
    diff -u $(% ls) $(% ls -a)

  ksh/bash/zsh:
    IFS=' ' read -r user vsz args < <(ps -o 'user= vsz= args=' -p $$)
  modernish:
    IFS=' ' read -r user vsz args < $(% ps -o 'user= vsz= args=' -p $$)

  ksh/bash/zsh:
    { some commands; } > >(tee stdout.log) 2> >(tee stderr.log)
    (both tee commands write terminal output to standard output)
  modernish:
    { some commands; } > $(% -o tee stdout.log) 2> $(% -o tee stderr.log)
    (both tee commands write terminal output to standard error)

Unlike the bash/ksh/zsh version, modernish process substitution only works
with simple commands. This includes shell function calls, but not aliases or
anything involving shell grammar or reserved words (such as redirections,
pipelines, loops, etc.). To use such complex commands, enclose them in a shell
function and call that function from the process substitution.
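For example, a pipeline can be made available to % by wrapping it in a shell function first (a sketch, assuming modernish is installed; the function and file names are hypothetical):

```shell
#! /usr/bin/env modernish
#! use sys/cmd/procsubst

# % only accepts a simple command, so wrap each pipeline in a function.
sortuniq_old() { sort old.txt | uniq; }
sortuniq_new() { sort new.txt | uniq; }

# Each $(% ...) substitutes a file name connected to the function's output.
diff -u $(% sortuniq_old) $(% sortuniq_new)
```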

Also note that anything a command invoked by % -o writes to its
standard output is redirected to standard error. The main shell environment’s
standard output is not available because the command substitution subsumes it.

use sys/cmd/source

The source command sources a dot script like the . command, but
additionally supports passing arguments to sourced scripts like you would
pass them to a function. It mostly mimics the behaviour of the source
command built in to bash and zsh.

If a filename without a directory path is given, then, unlike the .
command, source looks for the dot script in the current directory by
default, as well as searching $PATH.

It is a fatal error to attempt to source a directory, a file with no read
permission, or a nonexistent file.
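A short sketch of passing arguments to a dot script (assuming modernish is installed; the script name and arguments are illustrative):

```shell
#! /usr/bin/env modernish
#! use sys/cmd/source

# greet.sh contains:  putln "hello $1, verbosity=$2"
# Unlike plain '.', source finds it in the current directory and
# passes the arguments along like a function call.
source greet.sh world 2
```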

use sys/dir

Functions for working with directories.

use sys/dir/countfiles

countfiles: Count the files in a directory using nothing but shell
functionality, so without external commands. (It’s amazing how many pitfalls
this has, so a library function is needed to do it robustly.)

Usage: countfiles [ -s ] directory [ globpattern … ]

Count the number of files in a directory, storing the number in REPLY
and (unless -s is given) printing it to standard output.
If any globpatterns are given, only count the files matching them.
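A minimal sketch, assuming modernish is installed (the directory and glob pattern are illustrative):

```shell
#! /usr/bin/env modernish
#! use safe
#! use sys/dir/countfiles

# Count *.txt files in /tmp silently (-s), then report via REPLY.
countfiles -s /tmp '*.txt'
putln "found $REPLY text files"
```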

use sys/dir/mkcd

The mkcd function makes one or more directories, then, upon success,
changes into the last-mentioned one. mkcd inherits mkdir’s usage, so
options depend on your system’s mkdir; only the
POSIX options
are guaranteed.
When mkcd is run from a script, it uses cd -P to change the working
directory, resolving any symlinks in the present working directory path.

use sys/term

Utilities for working with the terminal.

use sys/term/putr

This module provides commands to efficiently output a string repeatedly.

Usage:

  • putr [ number | - ] string
  • putrln [ number | - ] string

Output the string number times. When using putrln, add a newline at
the end.

If a - is given instead of a number, the string is repeated as many
times as fits on one terminal line, i.e. the terminal width divided by
the length of the string, rounded down.

Note that, unlike with put and putln, only a single string
argument is accepted.

Example: putrln - '=' prints a full terminal line of equals signs.

use sys/term/readkey

readkey: read a single character from the keyboard without echoing back to
the terminal. Buffering is done so that multiple waiting characters are read
one at a time.

Usage: readkey [ -E ERE ] [ -t timeout ] [ -r ] [ varname ]

-E: Only accept characters that match the extended regular expression
ERE (the type of RE used by grep -E/egrep). readkey will silently
ignore input not matching the ERE and wait for input matching it.

-t: Specify a timeout in seconds (one significant digit after the
decimal point). After the timeout expires, no character is read and
readkey returns status 1.

-r: Raw mode. Disables INTR (Ctrl+C), QUIT, and SUSP (Ctrl+Z) processing
as well as translation of carriage return (13) to linefeed (10).

The character read is stored into the variable referenced by varname,
which defaults to REPLY if not specified.

This module depends on the trap stack to save and restore the terminal state
if the program is stopped while reading a key, so it will automatically
use var/stack/trap on initialisation.
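A minimal sketch of a confirmation prompt, assuming modernish is installed (the prompt text and variable name are illustrative):

```shell
#! /usr/bin/env modernish
#! use sys/term/readkey

put "Really delete? [y/n] "
# Accept only y/n (either case); give up after 10 seconds.
if readkey -E '[yYnN]' -t 10 answer; then
    putln "$answer"
else
    putln '' 'timed out; assuming no'
    answer=n
fi
```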


Appendix A: List of shell cap IDs

This appendix lists all the shell
capabilities,
quirks, and
bugs
that modernish can detect in the current shell, so that modernish scripts
can easily query the results of these tests and decide what to do. Certain
problematic system conditions
are also detected this way and listed here.

The all-caps IDs below are all usable with the
thisshellhas
function. This makes it easy for a cross-platform modernish script to
be aware of relevant conditions and decide what to do.

Each detection test has its own little test script in the
lib/modernish/cap directory. These tests are executed on demand, the
first time the capability or bug in question is queried using
thisshellhas. See README.md in that directory for further information.
The test scripts also document themselves in the comments.
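Querying these IDs from a script is a one-liner per condition (a sketch, assuming modernish is installed; the IDs shown are real, the messages are illustrative):

```shell
#! /usr/bin/env modernish

# Branch on a capability and on a bug at run time; the corresponding
# test script in lib/modernish/cap runs on first query only.
if thisshellhas LEPIPEMAIN; then
    putln "the last element of a pipeline runs in the main shell"
fi
if thisshellhas BUG_CMDPV; then
    putln "avoiding 'command -pv' on this shell"
fi
```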

Capabilities

Modernish currently identifies and supports the following non-standard
shell capabilities:

  • ADDASSIGN: Add a string to a variable using additive assignment,
    e.g. VAR+=string
  • ANONFUNC: zsh anonymous functions (basically the native zsh equivalent
    of modernish’s var/local module)
  • ARITHCMD: standalone arithmetic evaluation using a command like
    ((expression)).
  • ARITHFOR: ksh93/C-style arithmetic for loops of the form
    for ((exp1;exp2;exp3)); do commands; done.
  • ARITHPP: support for the ++ and -- unary operators in shell arithmetic.
  • CESCQUOT: Quoting with C-style escapes, like $'\n' for newline.
  • DBLBRACKET: The ksh88-style [[ double-bracket command ]],
    implemented as a reserved word, integrated into the main shell grammar,
    and with a different grammar applying within the double brackets.
    (ksh93, mksh, bash, zsh, yash >= 2.48)
  • DBLBRACKETERE: DBLBRACKET plus the =~ binary operator to match a
    string against an extended regular expression.
  • DBLBRACKETV: DBLBRACKET plus the -v unary operator to test if a
    variable is set. Named variables only. (Testing positional parameters
    (like [[ -v 1 ]]) does not work on bash or ksh93; check $# instead.)
  • DOTARG: Dot scripts support arguments.
  • HERESTR: Here-strings, an abbreviated kind of here-document.
  • KSH88FUNC: define ksh88-style shell functions with the function keyword,
    supporting dynamically scoped local variables with the typeset builtin.
    (mksh, bash, zsh, yash, et al)
  • KSH93FUNC: the same, but with static scoping for local variables. (ksh93 only)
    See Q28 at the ksh93 FAQ for an explanation
    of the difference.
  • KSHARRAY: ksh93-style arrays. Supported on bash, zsh (under emulate sh),
    mksh, and ksh93.
  • LEPIPEMAIN: execute last element of a pipe in the main shell, so that
    things like somecommand | read somevariable work. (zsh, AT&T ksh,
    bash 4.2+)
  • LINENO: the $LINENO variable contains the current shell script line
    number.
  • LOCALVARS: the local command creates dynamically scoped local variables
    within functions defined using standard POSIX syntax.
  • NONFORKSUBSH: as a performance optimisation,
    subshells are
    implemented without forking a new process, so they share a PID with the main
    shell. (AT&T ksh93; it has many bugs
    related to this, but there’s a nice workaround: ulimit -t unlimited forces
    a subshell to fork, making those bugs disappear! See also BUG_FNSUBSH.)
  • PRINTFV: The shell’s printf builtin has the -v option to print to a variable,
    which avoids forking a command substitution subshell.
  • PROCREDIR: the shell natively supports <(process redirection),
    a special kind of redirection that connects standard input (or standard
    output) to a background process running your command(s).
    This exists on yash.
    Note this is not combined with a redirection like < <(command).
    Contrast with bash/ksh/zsh’s PROCSUBST where this <(syntax)
    substitutes a file name.
  • PROCSUBST: the shell natively supports <(process substitution),
    a special kind of command substitution that substitutes a file name,
    connecting it to a background process running your command(s).
    This exists on ksh93 and zsh.
    (Bash has it too, but its POSIX mode turns it off, so modernish can’t use it.)
    Note this is usually combined with a redirection, like < <(command).
    Contrast this with yash’s PROCREDIR where the same <(syntax)
    is itself a redirection.
  • PSREPLACE: Search and replace strings in variables using special parameter
    substitutions with a syntax vaguely resembling sed.
  • RANDOM: the $RANDOM pseudorandom generator.
    Modernish seeds it if detected. The variable is then set to read-only
    whether the generator is detected or not, in order to block it from losing
    its special properties by being unset or overwritten, and to stop it being
    used if there is no generator. This is because some of modernish depends
    on RANDOM either working properly or being unset.
    (The use case for non-readonly RANDOM is setting a known seed to get
    reproducible pseudorandom sequences. To get that in a modernish script,
    use awk’s srand(yourseed) and int(rand()*32768).)
  • ROFUNC: Set functions to read-only with readonly -f. (bash, yash)
  • TESTERE: The regular test/[ builtin command supports the =~ binary
    operator to match a string against an extended regular expression.
  • TESTO: The test/[ builtin supports the -o unary operator to check if
    a shell option is set.
  • TRAPPRSUBSH: The ability to obtain a list of the current shell’s native
    traps from a command substitution subshell, for example: var=$(trap),
    as long as no new traps have been set within that command substitution.
    Note that the var/stack/trap module transparently reimplements this
    feature on shells without this native capability.
  • TRAPZERR: This feature ID is detected if the ERR trap is an alias for
    the ZERR trap. According to the zsh manual, this is the case for zsh on
    most systems, i.e. those that don’t have a SIGERR signal. (The
    trap stack
    uses this feature test.)
  • VARPREFIX: Expansions of type ${!prefix@} and ${!prefix*} yield
    all names of set variables beginning with prefix in the same way and with
    the same quoting effects as $@ and $*, respectively.
    This includes the name prefix itself, unless the shell has BUG_VARPREFIX.
    (bash; AT&T ksh93)

Quirks

Modernish currently identifies and supports the following shell quirks:

  • QRK_32BIT: mksh: the shell only has 32-bit arithmetic. Since every modern
    system these days supports 64-bit long integers even on 32-bit kernels, we
    can now count this as a quirk.
  • QRK_ANDORBG: On zsh, the & operator takes the last simple command
    as the background job and not an entire AND-OR list (if any).
    In other words, a && b || c & is interpreted as
    a && b || { c & } and not { a && b || c; } &.
  • QRK_ARITHEMPT: In yash, with POSIX mode turned off, a set but empty
    variable yields an empty string when used in an arithmetic expression,
    instead of 0. For example, foo=''; echo $((foo)) outputs an empty line.
  • QRK_ARITHWHSP: In yash
    and FreeBSD /bin/sh, trailing whitespace from variables is not trimmed in arithmetic
    expansion, causing the shell to exit with an ‘invalid number’ error. POSIX is silent
    on the issue. The modernish isint function (to determine if a string is a valid
    integer number in shell syntax) is QRK_ARITHWHSP compatible, tolerating only
    leading whitespace.
  • QRK_BCDANGER: break and continue can affect non-enclosing loops,
    even across shell function barriers (zsh, Busybox ash; older versions
    of bash, dash and yash). (This is especially dangerous when using
    var/local
    which internally uses a temporary shell function to try to protect against
    breaking out of the block without restoring global parameters and settings.)
  • QRK_EMPTPPFLD: Unquoted $@ and $* do not discard empty fields.
    POSIX says
    for both unquoted $@ and unquoted $* that empty positional parameters
    may be discarded from the expansion. AFAIK, just one shell (yash)
    doesn’t.
  • QRK_EMPTPPWRD: POSIX says
    that empty "$@" generates zero fields but empty '' or "" or
    "$emptyvariable" generates one empty field. But it leaves unspecified
    whether something like "$@$emptyvariable" generates zero fields or one
    field. Zsh, pdksh/mksh and (d)ash generate one field, as seems logical.
    But bash, AT&T ksh and yash generate zero fields, which we consider a
    quirk. (See also BUG_PP_01)
  • QRK_EVALNOOPT: eval does not parse options, not even --, which makes it
    incompatible with other shells: on the one hand, (d)ash does not accept
    eval -- "$command" whereas on other shells this is necessary if the command
    starts with a -, or the command would be interpreted as an option to eval.
    A simple workaround is to prefix arbitrary commands with a space.
    Both situations are POSIX compliant,
    but since they are incompatible without a workaround, the minority situation
    is labeled here as a QuiRK.
  • QRK_EXECFNBI: In pdksh and zsh, exec looks up shell functions and
    builtins before external commands, and if it finds one it does the
    equivalent of running the function or builtin followed by exit. This
    is probably a bug in POSIX terms; exec is supposed to launch a
    program that overlays the current shell, implying the program launched by
    exec is always external to the shell. However, since the
    POSIX language
    is rather vague and possibly incorrect,
    this is labeled as a shell quirk instead of a shell bug.
  • QRK_FNRDREXIT: On FreeBSD sh and NetBSD sh, an error in a redirection
    attached to a function call causes the shell to exit. This affects
    redirections of all functions, including modernish library functions
    as well as functions set by harden.
  • QRK_GLOBDOTS: Pathname expansion of .* matches the pseudonames . and
    .. so that, e.g., cp -pr .* backup/ cannot be used to copy all your
    hidden files. (bash \< 5.2, (d)ash, AT&T ksh != 93u+m, yash)
  • QRK_HDPARQUOT: Double quotes within certain parameter substitutions in
    here-documents aren’t removed (FreeBSD sh; bosh). For instance, if
    var is set, ${var+"x"} in a here-document yields "x", not x.
    POSIX considers it undefined
    to use double quotes there, so they should be avoided for a script to be
    fully POSIX compatible.
    (Note this quirk does not apply for substitutions that remove patterns,
    such as ${var#"$x"} and ${var%"$x"}; those are defined by POSIX
    and double quotes are fine to use.)
    (Note 2: single quotes produce widely varying behaviour and should never
    be used within any form of parameter substitution in a here-document.)
  • QRK_IFSFINAL: in field splitting, a final non-whitespace IFS delimiter
    character is counted as an empty field (yash \< 2.42, zsh, pdksh). This is a QRK
    (quirk), not a BUG, because POSIX is ambiguous on this.
  • QRK_LOCALINH: On a shell with LOCALVARS, local variables, when declared
    without assigning a value, inherit the state of their global namesake, if
    any. (dash, FreeBSD sh)
  • QRK_LOCALSET: On a shell with LOCALVARS, local variables are immediately set
    to the empty value upon being declared, instead of being initially without
    a value. (zsh)
  • QRK_LOCALSET2: Like QRK_LOCALSET, but only if the variable by the
    same name in the global/parent scope is unset. If the global variable is
    set, then the local variable starts out unset. (bash 2 and 3)
  • QRK_LOCALUNS: On a shell with LOCALVARS, local variables lose their local
    status when unset. Since the variable name reverts to global, this means that
    unset will not necessarily unset the variable! (yash, pdksh/mksh. Note:
    this is actually a behaviour of typeset, to which modernish aliases local
    on these shells.)
  • QRK_LOCALUNS2: This is a more treacherous version of QRK_LOCALUNS that
    is unique to bash. The unset command works as expected when used on a local
    variable in the same scope that variable was declared in, however, it
    makes local variables global again if they are unset in a subscope of that
    local scope, such as a function called by the function where it is local.
    (Note: since QRK_LOCALUNS2 is a special case of QRK_LOCALUNS, modernish
    will not detect both.)
    On bash >= 5.0, modernish eliminates this quirk upon initialisation
    by setting shopt -s localvar_unset.
  • QRK_OPTABBR: Long-form shell option names can be abbreviated down to a
    length where the abbreviation is not redundant with other long-form option
    names. (ksh93, yash)
  • QRK_OPTCASE: Long-form shell option names are case-insensitive. (yash, zsh)
  • QRK_OPTDASH: Long-form shell option names ignore the -. (ksh93, yash)
  • QRK_OPTNOPRFX: Long-form shell option names use a dynamic no prefix for
    all options (including POSIX ones). For instance, glob is the opposite
    of noglob, and nonotify is the opposite of notify. (ksh93, yash, zsh)
  • QRK_OPTULINE: Long-form shell option names ignore the _. (yash, zsh)
  • QRK_PPIPEMAIN: On zsh \<= 5.5.1, in all elements of a pipeline, parameter
    expansions are evaluated in the current environment (with any changes they
    make surviving the pipeline), though the commands themselves of every
    element but the last are executed in a subshell. For instance, given unset
    or empty v, in the pipeline cmd1 ${v:=foo} | cmd2, the assignment to
    v survives, though cmd1 itself is executed in a subshell.
  • QRK_SPCBIXP: Variable assignments directly preceding
    special builtin commands
    are exported, and persist as exported. (bash; yash)
  • QRK_UNSETF: If ‘unset’ is invoked without any option flag (-v or -f), and
    no variable by the given name exists but a function does, the shell unsets
    the function. (bash)

Bugs

Modernish currently identifies and supports the following shell bugs:

  • BUG_ALIASCSHD: A spurious syntax error occurs if a here-document
    containing a command substitution is used within two aliases that define a
    block. The syntax error reporting a missing } occurs because the alias
    terminating the block is not correctly expanded. This bug affects
    var/local and
    var/loop
    as they define blocks this way. Workaround: make a shell function that
    handles the here-document and call that shell function from the block/loop
    instead. Bug found on: dash \<= 0.5.10.2; Busybox ash \<= 1.31.1.
  • BUG_ALIASPOSX: Running any command “foo” in POSIX mode like
    POSIXLY_CORRECT=y foo will globally disable alias expansion on a
    non-interactive shell (killing modernish), unless POSIX mode is globally
    enabled. Bug found on bash 4.2 through 5.0.
    Note: on bash versions with this bug, modernish automatically enables
    POSIX mode to avoid triggering it. A side effect is that process substitution
    (PROCSUBST) isn’t available.
  • BUG_ARITHINIT: Using unset or empty variables (dash <= 0.5.9.1 on macOS)
    or unset variables (yash <= 2.44) in arithmetic expressions causes the
    shell to exit, instead of taking them as a value of zero.
  • BUG_ARITHLNNO: The shell supports $LINENO, but the variable is
    considered unset in arithmetic contexts, like $(( LINENO > 0 )).
    This makes it error out under set -u and default to zero otherwise.
    Workaround: use shell expansion like $(( $LINENO > 0 )). (FreeBSD sh)
  • BUG_ARITHNAN: The case-insensitive special floating point constants
    Inf and NaN are recognised in arithmetic evaluation, overriding any
    variables with the names Inf, NaN, INF, nan, etc. (AT&T ksh93;
    zsh 5.6 - 5.8)
  • BUG_ARITHSPLT: Unquoted $((arithmetic expressions)) are not
    subject to field splitting as expected. (zsh, mksh<=R49)
  • BUG_ASGNCC01: if IFS contains a $CC01 (^A) character, unquoted expansions in
    shell assignments discard that character (if present). Found on: bash 4.0-4.3
  • BUG_ASGNLOCAL: If you have a function-local variable (see LOCALVARS)
    with the same name as a global variable, and within the function you run a
    shell builtin command preceded by a temporary variable assignment, then
    the global variable is unset. (zsh \<= 5.7.1)
  • BUG_BRACQUOT: shell quoting within bracket patterns has no effect (zsh \< 5.3;
    ksh93). This bug means the - retains its special meaning of ‘character
    range’, and an initial ! (and, on some shells, ^) retains the meaning of
    negation, even in quoted strings within bracket patterns, including quoted
    variables.
  • BUG_CASEEMPT: An empty case list on a single line, as in case x in esac,
    is a syntax error. (AT&T ksh93)
  • BUG_CASELIT: If a case pattern doesn’t match as a pattern, it’s tried
    again as a literal string, even if the pattern isn’t quoted. This can
    result in false positives when a pattern doesn’t match itself, like with
    bracket patterns. This contravenes POSIX and breaks use cases such as
    input validation. (AT&T ksh93) Note: modernish match works around this.
  • BUG_CASEPAREN: case patterns without an opening parenthesis
    (i.e. with only an unbalanced closing parenthesis) are misparsed
    as a syntax error within command substitutions of the form $( ).
    Workaround: include the opening parenthesis. Found on: bash 3.2
  • BUG_CASESTAT: The case conditional construct prematurely clobbers the
    exit status $?. (found in zsh \< 5.3, Busybox ash \<= 1.25.0, dash \<
    0.5.9.1)
  • BUG_CDNOLOGIC: The cd built-in command lacks the POSIX-specified -L
    option and does not support logical traversal; it always acts as if the -P
    (physical traversal) option was passed. This also renders the -L option
    to modernish chdir ineffective. (NetBSD sh)
  • BUG_CDPCANON: cd -P (and hence also modernish
    chdir) does not correctly canonicalise/normalise a
    directory path that starts with three or more slashes; it reduces these to
    two initial slashes instead of one in $PWD. (zsh \<= 5.7.1)
  • BUG_CMDEXEC: using command exec (to open a file descriptor, using
    command to avoid exiting the shell on failure) within a function causes
    bash \<= 4.0 to fail to restore the global positional parameters when
    leaving that function. It also renders bash \<=4.0 prone to hanging.
  • BUG_CMDEXPAN: if the command command results from an expansion, it acts
    like command -v, showing the path of the command instead of executing it.
    For example: v=command; "$v" ls or set -- command ls; "$@" don’t work.
    (AT&T ksh93)
  • BUG_CMDOPTEXP: the command builtin does not recognise options if they
    result from expansions. For instance, you cannot conditionally store -p
    in a variable like defaultpath and then do command $defaultpath someCommand. (found in zsh \< 5.3)
  • BUG_CMDPV: command -pv does not find builtins ({pd,m}ksh), does not
    accept the -p and -v options together (zsh \< 5.3) or ignores the -p
    option altogether (bash 3.2); in any case, it’s not usable to find commands
    in the default system PATH.
  • BUG_CMDSETPP: using command set -- has no effect; it does not set the
    positional parameters. For compat, use set without command. (mksh \<= R57)
  • BUG_CMDSPASGN: preceding a
    special builtin
    with command does not stop preceding invocation-local variable
    assignments from becoming global. (AT&T ksh93)
  • BUG_CMDSPEXIT: preceding a
    special builtin
    (other than eval, exec, return or exit)
    with command does not always stop
    it from exiting the shell if the builtin encounters error.
    (bash \<= 4.0; zsh \<= 5.2; mksh; ksh93)
  • BUG_CSNHDBKSL: Backslashes within non-expanding here-documents within
    command substitutions are incorrectly expanded to perform newline joining,
    as opposed to left intact. (bash \<= 4.4)
  • BUG_CSUBBTQUOT: A spurious syntax error is thrown when using double
    quotes within a backtick-style command substitution that is itself within
    double quotes. (AT&T ksh93 \< 93u+m 2022-05-20)
  • BUG_CSUBLNCONT: Backslash line continuation is not processed correctly
    within modern-form $(command substitutions).
    (AT&T ksh93 \< 93u+m 2022-05-21)
  • BUG_CSUBRMLF: A bug affecting the stripping of final linefeeds from
    command substitutions. If a command substitution does not produce any
    output to substitute and is concatenated in a string or here-document,
    then the shell removes any consecutive linefeeds occurring directly before
    the command substitution in that string or here-document.
    (dash \<= 0.5.10.2, Busybox ash, FreeBSD sh)
  • BUG_CSUBSTDO: If standard output (file descriptor 1) is closed before
    entering a command substitution, and any other file descriptors are
    redirected within the command substitution, commands such as echo or
    putln will not work within the command substitution, acting as if standard
    output is still closed (AT&T ksh93 \<= AJM 93u+ 2012-08-01). Workaround: see
    cap/BUG_CSUBSTDO.t.
  • BUG_DEVTTY: the shell can’t redirect output to /dev/tty if
    set -C/set -o noclobber (part of safe mode)
    is active. Workaround: use >| /dev/tty instead of > /dev/tty.
    Bug found on: bash on certain systems (at least QNX and Interix).
  • BUG_DOLRCSUB: parsing problem where, inside a command substitution of
    the form $(...), the sequence $$'...' is treated as $'...' (i.e. as
    a use of CESCQUOT), and $$"..." as $"..." (bash-specific translatable
    string). (Found in bash up to 4.4)
  • BUG_DQGLOB: globbing is not properly deactivated within
    double-quoted strings. Within double quotes, a * or ? immediately
    following a backslash is interpreted as a globbing character. This applies
    to both pathname expansion and pattern matching in case. Found in: dash.
    (The bug is not triggered when using modernish
    match.)
  • BUG_EXPORTUNS: Setting the export flag on an otherwise unset variable
    causes a set and empty environment variable to be exported, though the
    variable continues to be considered unset within the current shell.
    (FreeBSD sh \< 13.0)
  • BUG_FNSUBSH: Function definitions within subshells (including command
    substitutions) are ignored if a function by the same name exists in the
    main shell, so the wrong function is executed. unset -f is also silently
    ignored. ksh93 (all current versions as of November 2018) has this bug.
    It only applies to non-forked subshells. See NONFORKSUBSH.
  • BUG_FORLOCAL: a for loop in a function makes the iteration variable
    local to the function, so it won’t survive the execution of the function.
    Found on: yash. This is intentional and documented behaviour on yash in
    non-POSIX mode, but in POSIX terms it’s a bug, so we mark it as such.
  • BUG_GETOPTSMA: The getopts builtin leaves a : instead of a ? in
    the specified option variable if a given option that requires an argument
    lacks an argument, and the option string does not start with a :. (zsh)
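    A portable way to sidestep the discrepancy is to start the option string
    with a colon, so that a missing option-argument is reported as : on all
    shells. A minimal sketch (the parse function and its -f option are
    hypothetical names for illustration):

```shell
# Hypothetical sketch: with a leading ':' in the option string,
# getopts reports a missing option-argument as ':' everywhere,
# so the script need not rely on the buggy '?' vs ':' distinction.
parse() {
    while getopts ':f:' opt; do
        case $opt in
        f) printf 'file=%s\n' "$OPTARG" ;;
        :) printf 'missing argument for -%s\n' "$OPTARG"; return 1 ;;
        *) printf 'unknown option\n'; return 1 ;;
        esac
    done
}
out=$(parse -f)   # -f requires an argument but none is given
echo "$out"       # prints: missing argument for -f
```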
  • BUG_HDOCBKSL: Line continuation using backslashes in expanding
    here-documents is handled incorrectly. (zsh up to 5.4.2)
  • BUG_HDOCMASK: Here-documents (and here-strings, see HERESTRING) use
    temporary files. This fails if the current umask setting denies the user
    read permission, so the here-document can’t be read from the shell’s
    temporary file. Workaround: ensure a user-readable umask when using
    here-documents.
    (bash, mksh, zsh)
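    The workaround can be sketched as saving and restoring the umask around
    the here-document (variable names here are illustrative):

```shell
# Save the current umask, switch to a user-readable one for the
# here-document, then restore the original value afterwards.
old_umask=$(umask)
umask 077       # files created are user-readable, so the temp file works
IFS= read -r line <<EOF
hello from a here-document
EOF
umask "$old_umask"
echo "$line"
```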
  • BUG_IFSCC01PP: If IFS contains a $CC01 (^A) control character, the
    expansion "$@" (even quoted) is gravely corrupted. Since many modernish
    functions use this to loop through the positional parameters, this breaks
    the library.
    (Found in bash \< 4.4)
  • BUG_IFSGLOBC: In glob pattern matching (such as in case and [[), if a
    wildcard character is part of IFS, it is matched literally instead of as a
    matching character. This applies to glob characters *, ?, [ and ].
    Since nearly all modernish functions use case for argument validation and
    other purposes, nearly every modernish function breaks on shells with this
    bug if IFS contains any of these characters!
    (Found in bash \< 4.4)
  • BUG_IFSGLOBP: In pathname expansion (filename globbing), if a
    wildcard character is part of IFS, it is matched literally instead of as a
    matching character. This applies to glob characters *, ?, [ and ].
    (Bug found in bash, all versions up to at least 4.4)
  • BUG_IFSGLOBS: in glob pattern matching (as in case or parameter
    substitution with # and %), if IFS starts with ? or * and the
    "$*" parameter expansion inserts any IFS separator characters, those
    characters are erroneously interpreted as wildcards when a quoted "$*" is
    used as the glob pattern. (AT&T ksh93)
  • BUG_IFSISSET: AT&T ksh93 (2011/2012 versions): ${IFS+s} always yields s
    even if IFS is unset. This applies to IFS only.
  • BUG_ISSETLOOP: AT&T ksh93: Expansions like ${var+set}
    remain static when used within a for, while or
    until loop; the expansions don’t change along with the state of the
    variable, so they cannot be used to check whether a variable is set
    within a loop if the state of that variable may change
    in the course of the loop.
  • BUG_KBGPID: AT&T ksh93: If a single command ending in & (i.e. a background
    job) is enclosed in a { braces; } block with an I/O redirection, the $!
    special parameter is not set to the background job’s PID.
  • BUG_KUNSETIFS: AT&T ksh93: Unsetting IFS fails to activate default field
    splitting if the following conditions are met: 1. IFS is set and empty
    (i.e. split is disabled) in the main shell, and at least one expansion has
    been processed with that setting; 2. The code is currently executing in a
    non-forked subshell (see NONFORKSUBSH).
  • BUG_LNNONEG: $LINENO becomes wildly inaccurate, even negative, when
    dotting/sourcing scripts. Bug found on: dash with LINENO support compiled in.
  • BUG_LOOPRET1: If a return command is given with a status argument within
    the set of conditional commands in a while or until loop (i.e., between
    while/until and do), the status argument is ignored and the function
    returns with status 0 instead of the specified status.
    Found on: dash \<= 0.5.8; zsh \<= 5.2
  • BUG_LOOPRET2: If a return command is given without a status argument
    within the set of conditional commands in a while or until loop (i.e.,
    between while/until and do), the exit status passed down from the
    previous command is ignored and the function returns with status 0 instead.
    Found on: dash \<= 0.5.10.2; AT&T ksh93; zsh \<= 5.2
  • BUG_LOOPRET3: If a return command is given within the set of conditional
    commands in a while or until loop (i.e., between while/until and
    do), and the return status (either the status argument to return or the
    exit status passed down from the previous command by return without a
    status argument) is non-zero, and the conditional command list itself yields
    false (for while) or true (for until), and the whole construct is
    executed in a dot script sourced from another script, then too many levels of
    loop are broken out of, causing program flow corruption or premature exit.
    Found on: zsh \<= 5.7.1
  • BUG_MULTIBIFS: We’re on a UTF-8 locale and the shell supports UTF-8
    characters in general (i.e. we don’t have WRN_MULTIBYTE) – however, using
    multi-byte characters as IFS field delimiters still doesn’t work. For
    example, "$*" joins positional parameters on the first byte of IFS
    instead of the first character. (ksh93, mksh, FreeBSD sh, Busybox ash)
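    For reference, this is the expected behaviour with a single-byte
    separator, which works on all shells (the bug only concerns multi-byte
    IFS characters):

```shell
# "$*" joins the positional parameters on the first character of IFS;
# with a single-byte IFS this works everywhere.
set -- one two three
IFS=','
joined="$*"
unset IFS          # restore default field splitting afterwards
echo "$joined"     # prints: one,two,three
```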
  • BUG_NOCHCLASS: POSIX-mandated character [:classes:] within bracket
    [expressions] are not supported in glob patterns. (mksh)
  • BUG_NOEXPRO: Cannot export read-only variables. (zsh <= 5.7.1 in sh mode)
  • BUG_OPTNOLOG: on dash, setting -o nolog causes $- to wreak havoc:
    trying to expand $- silently aborts parsing of an entire argument,
    so e.g. "one,$-,two" yields "one,". (Same applies to -o debug.)
  • BUG_PP_01: POSIX says
    that empty "$@" generates zero fields but empty '' or "" or
    "$emptyvariable" generates one empty field. This means concatenating
    "$@" with one or more other, separately quoted, empty strings (like
    "$@""$emptyvariable") should still produce one empty field. But on
    bash 3.x, this erroneously produces zero fields. (See also QRK_EMPTPPWRD)
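    The POSIX-specified behaviour can be checked with a small field-counting
    helper (count_fields is a hypothetical name used for illustration):

```shell
# Per POSIX, with zero positional parameters, "$@" concatenated with
# a separately quoted empty string must yield exactly one empty field.
count_fields() { echo $#; }
set --                        # zero positional parameters
emptyvariable=
n=$(count_fields "$@""$emptyvariable")
echo "$n"                     # 1 on POSIX-correct shells; 0 with BUG_PP_01
```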
  • BUG_PP_02: Like BUG_PP_01, but with unquoted $@ and only
    with "$emptyvariable"$@, not $@"$emptyvariable".
    (mksh \<= R50f; FreeBSD sh \<= 10.3)
  • BUG_PP_03: When IFS is unset or empty (zsh 5.3.x) or empty (mksh \<= R50),
    assigning var=$* only assigns the first field, failing to join and
    discarding the rest of the fields. Workaround: var="$*"
    (POSIX leaves var=$@, etc. undefined, so we don’t test for those.)
  • BUG_PP_03A: When IFS is unset, assignments like var=$*
    incorrectly remove leading and trailing spaces (but not tabs or
    newlines) from the result. Workaround: quote the expansion. Found on:
    bash 4.3 and 4.4.
  • BUG_PP_03B: When IFS is unset, assignments like var=${var+$*},
    etc. incorrectly remove leading and trailing spaces (but not tabs or
    newlines) from the result. Workaround: quote the expansion. Found on:
    bash 4.3 and 4.4.
  • BUG_PP_03C: When IFS is unset, assigning var=${var-$*} only assigns
    the first field, failing to join and discarding the rest of the fields.
    (zsh 5.3, 5.3.1) Workaround: var=${var-"$*"}
  • BUG_PP_04A: Like BUG_PP_03A, but for conditional assignments within
    parameter substitutions, as in : ${var=$*} or : ${var:=$*}.
    Workaround: quote either $* within the expansion or the expansion
    itself. (bash \<= 4.4)
  • BUG_PP_04E: When assigning the positional parameters ($*) to a variable
    using a conditional assignment within a parameter substitution, e.g.
    : ${var:=$*}, the fields are always joined and separated by spaces,
    except if IFS is set and empty. Workaround as in BUG_PP_04A.
    (bash 4.3)
  • BUG_PP_04_S: When IFS is null (empty), the result of a substitution
    like ${var=$*} is incorrectly field-split on spaces.
    The assignment itself succeeds normally.
    Found on: bash 4.2, 4.3
  • BUG_PP_05: POSIX says
    that empty $@ and $* generate zero fields, but with null IFS, empty
    unquoted $@ and $* yield one empty field. Found on: dash 0.5.9
    and 0.5.9.1; Busybox ash.
  • BUG_PP_06A: POSIX says
    that unquoted $@ and $* initially generate as many fields as there are
    positional parameters, and then (because $@ or $* is unquoted) each field is
    split further according to IFS. With this bug, the latter step is not
    done if IFS is unset (i.e. default split). Found on: zsh \< 5.4
  • BUG_PP_07: unquoted $* and $@ (including in substitutions like
    ${1+$@} or ${var-$*}) do not perform default field splitting if
    IFS is unset. Found on: zsh (up to 5.3.1) in sh mode
  • BUG_PP_07A: When IFS is unset, unquoted $* undergoes word splitting
    as if IFS=' ', and not the expected IFS=" ${CCt}${CCn}".
    Found on: bash 4.4
  • BUG_PP_08: When IFS is empty, unquoted $@ and $* do not generate
    one field for each positional parameter as expected, but instead join
    them into a single field without a separator. Found on: yash \< 2.44
    and dash \< 0.5.9 and Busybox ash \< 1.27.0
  • BUG_PP_08B: When IFS is empty, unquoted $* within a substitution (e.g.
    ${1+$*} or ${var-$*}) does not generate one field for each positional
    parameter as expected, but instead joins them into a single field without
    a separator. Found on: bash 3 and 4
  • BUG_PP_09: When IFS is non-empty but does not contain a space,
    unquoted $* within a substitution (e.g. ${1+$*} or ${var-$*}) does
    not generate one field for each positional parameter as expected,
    but instead joins them into a single field separated by spaces
    (even though, as said, IFS does not contain a space).
    Found on: bash 4.3
  • BUG_PP_10: When IFS is null (empty), assigning var=$* removes any
    $CC01 (^A) and $CC7F (DEL) characters. (bash 3, 4)
  • BUG_PP_10A: When IFS is non-empty, assigning var=$* prefixes each
    $CC01 (^A) and $CC7F (DEL) character with a $CC01 character. (bash 4.4)
  • BUG_PP_1ARG: When IFS is empty on bash <= 4.3 (i.e. field
    splitting is off), ${1+"$@"} or "${1+$@}" is counted as a single
    argument instead of each positional parameter as separate arguments.
    This also applies to prepending text only if there are positional
    parameters with something like "${1+foobar $@}".
  • BUG_PP_MDIGIT: Multiple-digit positional parameters don’t require expansion
    braces, so e.g. $10 = ${10} (dash; Busybox ash). This is classed as a bug
    because it causes a straight-up incompatibility with POSIX scripts. POSIX
    says:
    “The parameter name or symbol can be enclosed in braces, which are
    optional except for positional parameters with more than one digit […]”.
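    The POSIX rule that this bug violates can be seen in a short snippet:

```shell
# Positional parameters above 9 require braces per POSIX:
# ${10} is the tenth parameter, while $10 is $1 followed by a literal 0.
set -- a b c d e f g h i j
braced=${10}
unbraced=$10
echo "$braced $unbraced"   # j a0 on POSIX-compliant shells
```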
  • BUG_PP_MDLEN: For ${#x} expansions where x >= 10, only the first digit of
    the positional parameter number is considered, e.g. ${#10}, ${#12},
    ${#123} are all parsed as if they are ${#1}. Then, string parsing is
    aborted so that further characters or expansions, if any, are lost.
    Bug found in: dash 0.5.11 - 0.5.11.4 (fixed in dash 0.5.11.5)
  • BUG_PSUBASNCC: in an assignment parameter substitution of the form
    ${foo=value}, if the characters $CC01 (^A) or $CC7F (DEL) are in the
    value, all their occurrences are stripped from the expansion (although the
    assignment itself is done correctly). If the expansion is quoted, only
    $CC01 is stripped. This bug is independent of the state of IFS, except if
    IFS is null, the assignment in ${foo=$*} (unquoted) is buggy too: it
    strips $CC01 from the assigned value. (Found on bash 4.2, 4.3, 4.4)
  • BUG_PSUBBKSL1: A backslash-escaped } character within a quoted parameter
    substitution is not unescaped. (bash 3.2, dash \<= 0.5.9.1, Busybox 1.27 ash)
  • BUG_PSUBEMIFS: if IFS is empty (no split, as in safe mode), then if a
    parameter substitution of the forms ${foo-$*}, ${foo+$*}, ${foo:-$*} or
    ${foo:+$*} occurs in a command argument, the characters $CC01 (^A) or
    $CC7F (DEL) are stripped from the expanded argument. (Found on: bash 4.4)
  • BUG_PSUBEMPT: Expansions of the form ${V-} and ${V:-} are not
    subject to normal shell empty removal if that parameter is unset, causing
    unexpected empty arguments to commands. Workaround: ${V+$V} and
    ${V:+$V} work as expected. (Found on FreeBSD 10.3 sh)
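    The expected empty removal can be illustrated with an argument-counting
    helper (count_args is a hypothetical name used for illustration):

```shell
# Per POSIX, an unquoted ${V-} with V unset expands to nothing and
# is then removed entirely, contributing zero arguments.
count_args() { echo $#; }
unset -v V
n=$(count_args ${V-})
echo "$n"    # 0 on shells without BUG_PSUBEMPT
```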
  • BUG_PSUBIFSNW: When field-splitting unquoted parameter substitutions like
    ${var#foo}, ${var##foo}, ${var%foo} or ${var%%foo} on non-whitespace
    IFS, if there is an initial empty field, a spurious extra initial empty
    field is generated. (mksh)
  • BUG_PSUBNEWLN: Due to a bug in the parser, parameter substitutions
    spread over more than one line cause a syntax error.
    Workaround: instead of a literal newline, use $CCn.
    (found in dash \<= 0.5.9.1 and Busybox ash \<= 1.28.1)
  • BUG_PSUBSQUOT: in pattern matching parameter substitutions
    (${param#pattern}, ${param%pattern}, ${param##pattern} and
    ${param%%pattern}), if the whole parameter substitution is quoted with
    double quotes, then single quotes in the pattern are not parsed. POSIX
    says
    they are to keep their special meaning, so that glob characters may
    be quoted. For example: x=foobar; echo "${x#'foo'}" should yield bar
    but with this bug yields foobar. (dash \<= 0.5.9.1; Busybox 1.27 ash)
  • BUG_PSUBSQHD: Like BUG_PSUBSQUOT, but occurring within a here-document
    instead of within double quotes. (dash \<= 0.5.9.1; mksh)
  • BUG_PUTIOERR: Shell builtins that output strings (echo, printf, ksh/zsh
    print), and thus also modernish put and putln, do not check for I/O
    errors on output. This means a script cannot check for them, and a script
    process in a pipe can get stuck in an infinite loop if SIGPIPE is ignored.
  • BUG_READWHSP: If there is more than one field to read, read does not
    trim trailing IFS whitespace. (dash 0.5.7, 0.5.8)
  • BUG_REDIRIO: the I/O redirection operator <> (open a file descriptor
    for both read and write) defaults to opening standard output (i.e. is
    short for 1<>) instead of defaulting to opening standard input (0<>) as
    POSIX specifies.
    (AT&T ksh93)
  • BUG_REDIRPOS: Buggy behaviour occurs if a redirection is positioned
    in between two variable assignments in the same command. On zsh 5.0.x, a
    parse error is thrown. On zsh 5.1 to 5.4.2, anything following the
    redirection (other assignments or command arguments) is silently ignored.
  • BUG_SCLOSEDFD: bash \< 5.0 and dash fail to establish a block-local scope
    for a file descriptor that is added to the end of the block as a redirection
    that closes that file descriptor (e.g. } 8<&- or done 7>&-). If that FD
    is already closed outside the block, the FD remains global, so you can’t
    locally exec it. So with this bug, it is not straightforward to make a
    block-local FD appear initially closed within a block. Workaround: first open
    the FD, then close it – for example: done 7>/dev/null 7>&- will establish
    a local scope for FD 7 for the preceding do…done block while still
    making FD 7 appear initially closed within the block.
  • BUG_SETOUTVAR: The set builtin (with no arguments) only prints native
    function-local variables when called from a shell function. (yash \<= 2.46)
  • BUG_SHIFTERR0: The shift builtin silently returns a successful exit
    status (0) when attempting to shift a number greater than the current
    amount of positional parameters. (Busybox ash \<= 1.28.4)
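    Scripts that need to detect over-shifting portably can guard the shift
    with an explicit count check rather than relying on shift’s exit status:

```shell
# Check $# before shifting instead of trusting shift's exit status,
# which Busybox ash <= 1.28.4 reports incorrectly.
set -- a b
if [ "$#" -ge 3 ]; then
    shift 3
    status=shifted
else
    status="too few parameters ($#)"
fi
echo "$status"    # prints: too few parameters (2)
```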
  • BUG_SPCBILOC: Variable assignments preceding
    special builtins
    create a partially function-local variable if a variable by the same name
    already exists in the global scope. (bash \< 5.0 in POSIX mode)
  • BUG_TESTERR1A: test/[ exits with a non-error false status
    (1) if an invalid argument is given to an operator. (AT&T ksh93)
  • BUG_TESTILNUM: On dash (up to 0.5.8), giving an illegal number to test -t
    or [ -t causes some kind of corruption so the next test/[ invocation
    fails with an “unexpected operator” error even if it’s legit.
  • BUG_TESTONEG: The test/[ builtin supports a -o unary operator to
    check if a shell option is set, but it ignores the no prefix on shell
    option names, so something like [ -o noclobber ] gives a false positive.
    Bug found on yash up to 2.43. (The TESTO feature test implicitly checks
    against this bug and won’t detect the feature if the bug is found.)
  • BUG_TRAPEMPT: The trap builtin does not quote empty traps in its
    output, rendering the output unsuitable for shell re-input. For instance,
    trap '' INT; trap outputs “trap -- INT” instead of “trap -- '' INT”.
    (found in mksh \<= R56c)
  • BUG_TRAPEXIT: the shell’s trap builtin does not know the EXIT trap by
    name, but only by number (0). Using the name throws a “bad trap” error. Found in
    klibc 2.0.4 dash.
  • BUG_TRAPFNEXI: When a function issues a signal whose trap exits the
    shell, the shell is not exited immediately, but only on return from the
    function. (zsh)
  • BUG_TRAPRETIR: Using return within eval triggers infinite recursion if
    both a RETURN trap and the functrace shell option are active. This bug in
    bash-only functionality triggers a crash when using modernish, so to avoid
    this, modernish automatically disables the functrace shell option if a
    RETURN trap is set or pushed and this bug is detected. (bash 4.3, 4.4)
  • BUG_TRAPSUB0: Subshells in traps fail to pass down a nonzero exit status of
    the last command they execute, under certain conditions or consistently,
    depending on the shell. (bash \<= 4.0; dash 0.5.9 - 0.5.10.2; yash \<= 2.47)
  • BUG_TRAPUNSRE: When a trap unsets itself and then resends its own signal,
    the execution of the trap action (including functions called by it) is
    not interrupted by the now-untrapped signal; instead, the process
    terminates after completing the entire trap routine. (bash \<= 4.2; zsh)
  • BUG_UNSETUNXP: If an unset variable is given the export flag using the
    export command, a subsequent unset command does not remove that export
    flag again. Workaround: assign to the variable first, then unset it to
    unexport it. (Found on AT&T ksh JM-93u-2011-02-08; Busybox 1.27.0 ash)
  • BUG_VARPREFIX: On a shell with the VARPREFIX feature, expansions of type
    ${!prefix@} and ${!prefix*} do not find the variable name
    prefix itself. (AT&T ksh93)
  • BUG_ZSHNAMES: A series of lowercase names, normally okay for script use
    as per POSIX convention, is reserved for special use. Unsetting these
    names is impossible in most cases, and changing them may corrupt important
    shell or system settings. This may conflict with
    simple-form modernish scripts.
    This bug is detected on zsh when it was not initially invoked in emulation
    mode, and emulation mode was enabled using emulate sh post invocation
    instead (which does not disable these conflicting parameters).
    As of zsh 5.6, the list of variable names affected is: aliases argv
    builtins cdpath commands dirstack dis_aliases dis_builtins
    dis_functions dis_functions_source dis_galiases dis_patchars
    dis_reswords dis_saliases fignore fpath funcfiletrace
    funcsourcetrace funcstack functions functions_source functrace
    galiases histchars history historywords jobdirs jobstates
    jobtexts keymaps mailpath manpath module_path modules nameddirs
    options parameters patchars path pipestatus prompt psvar
    reswords saliases signals status termcap terminfo userdirs
    usergroups watch widgets zsh_eval_context zsh_scheduled_events
  • BUG_ZSHNAMES2: Two lowercase variable names histchars and signals,
    normally okay for script use as per POSIX convention, are reserved for
    special use on zsh, even if zsh is initialised in sh mode (via a sh
    symlink or using the --emulate sh option at startup).
    Bug found on: zsh <= 5.7.1. The bug is only detected if BUG_ZSHNAMES is
    not detected, because this bug’s effects are included in that one’s.

Warning IDs

Warning IDs do not identify any characteristic of the shell, but instead
warn about a potentially problematic system condition that was detected at
initialisation time.

  • WRN_EREMBYTE: The current system locale setting supports Unicode UTF-8
    multi-byte/variable-length characters, but the utility used by
    str ematch
    to match extended regular expressions (EREs) does not support them
    and treats all characters as single bytes. This means multi-byte characters
    will be matched as multiple characters, and character [:classes:]
    within bracket expressions will only match ASCII characters.
  • WRN_MULTIBYTE: The current system locale setting supports Unicode UTF-8
    multi-byte/variable-length characters, but the current shell does not
    support them and treats all characters as single bytes. This means
    counting or processing multi-byte characters with the current shell will
    produce incorrect results. Scripts that need compatibility with this
    system condition should check if thisshellhas WRN_MULTIBYTE and resort
    to a workaround that uses external utilities where necessary.
  • WRN_NOSIGPIPE: Modernish has detected that the process that launched
    the current program has set SIGPIPE to ignore, an irreversible condition
    that is in turn inherited by any process started by the current shell, and
    their subprocesses, and so on. The system constant
    $SIGPIPESTATUS
    is set to the special value 99999 and neither the current shell nor any
    process it spawns is now capable of receiving SIGPIPE. The
    -P option to harden
    is also rendered ineffective.
    Depending on how a given command foo is implemented, it is now possible
    that a pipeline such as foo | head -n 10 never ends; if foo doesn’t
    check for I/O errors, the only way it would ever stop trying to write
    lines is by receiving SIGPIPE as head terminates.
    Programs that use commands in this fashion should check if thisshellhas WRN_NOSIGPIPE and either employ workarounds or refuse to run if so.

Appendix B: Regression test suite

Modernish comes with a suite of regression tests to detect bugs in modernish
itself, which can be run using modernish --test after installation. By
default, it will run all the tests verbosely but without tracing the command
execution. The install.sh installer will run modernish --test -eqq on the
selected shell before installation.

A few options are available to specify after --test:

  • -h: show help.
  • -e: disable or reduce expensive (i.e. slow or memory-hogging) tests.
  • -q: quieter operation; report expected fails [known shell bugs]
    and unexpected fails [bugs in modernish]. Add -q again for
    quietest operation (report unexpected fails only).
  • -s: entirely silent operation.
  • -t: run only specific test sets or tests. Test sets are those listed
    in the full default output of modernish --test. This option requires
    an option-argument in the following format:
    testset1:num1,num2,…/testset2:num1,num2,…/…
    The colon followed by numbers is optional; if omitted, the entire set
    will be run, otherwise the given numbered tests will be run in the given
    order. Example: modernish --test -t match:2,4,7/arith/shellquote:1 runs
    test 2, 4 and 7 from the match set, the entire arith set, and only
    test 1 from the shellquote set.
    A testset can also be given as the incomplete beginning of a name or as
    a shell glob pattern. In that case, all matching sets will be run.
  • -x: trace each test using the shell’s xtrace facility. Each trace is
    stored in a separate file in a specially created temporary directory. By
    default, the trace is deleted if a test does not produce an unexpected
    fail. Add -x again to keep expected fails as well, and again to
    keep all traces regardless of result. If any traces were saved,
    modernish will tell you the location of the temporary directory at the
    end, otherwise it will silently remove the directory again.
  • -E: don’t run any tests, but output a command to open the tests that would
    have been run in your editor. The editor from the VISUAL or EDITOR
    environment variable is used, with vi as a default. This option should be
    used together with -t to specify tests. All other options are ignored.
  • -F: takes an argument with the name or path to a find utility to
    prefer when testing LOOP find.
    More info here.

These short options can be combined so, for example,
--test -qxx is the same as --test -q -x -x.

Difference between capability detection and regression tests

Note the difference between these regression tests and the cap tests listed in
Appendix A. The latter are
tests for whatever shell is executing modernish: they detect capabilities
(features, quirks, bugs) of the current shell. They are meant to be run via
thisshellhas and are designed to
be taken advantage of in scripts. On the other hand, these tests run by
modernish --test are regression tests for modernish itself. It does not
make sense to use these in a script.

New/unknown shell bugs can still cause modernish regression tests to fail,
of course. That’s why some of the regression tests also check for
consistency with the results of the capability detection tests: if there is a
shell bug in a widespread release version that modernish doesn’t know about
yet, this in turn is considered to be a bug in modernish, because one of its
goals is to know about all the shell bugs in all released shell versions
currently seeing significant use.

Testing modernish on all your shells

The testshells.sh program in share/doc/modernish/examples can be used to
run the regression test suite on all the shells installed on your system.
You could put it as testshells in some convenient location in your
$PATH, and then simply run:

  1. testshells modernish --test

(adding any further options you like – for instance, you might like to add
-q to avoid very long terminal output). On first run, testshells will
generate a list of shells it can find on your system and it will give you a
chance to edit it before proceeding.

Appendix C: Supported locales

modernish, like most shells, fully supports two system locales: POSIX
(a.k.a. C, a.k.a. ASCII) and Unicode’s UTF-8. It will work in other locales,
but things like converting to upper/lower case, and matching single
characters in patterns, are not guaranteed.

Caveat: some shells or operating systems have bugs that prevent (or lack
features required for) full locale support. If portability is a concern,
check for thisshellhas WRN_MULTIBYTE or thisshellhas BUG_NOCHCLASS
where needed. See Appendix A.

Scripts/programs should not change the locale (LC_* or LANG) after
initialising modernish. Doing this might break various functions, as
modernish sets specific versions depending on your OS, shell and locale.
(Temporarily changing the locale is fine as long as you don’t use
modernish features that depend on it – for example, setting a specific
locale just for an external command. However, if you use harden, see
the important note
in its documentation!)

Appendix D: Supported shells

Modernish builds on the
POSIX 2018 Edition
standard, so it should run on any sufficiently POSIX-compliant shell and
operating system. It uses both
bug/feature detection
and
regression testing
to determine whether it can run on any particular shell, so it does not
block or support particular shell versions as such. However, modernish has
been confirmed to run correctly on the following shells:

  • bash 3.2 or higher
  • Busybox ash 1.20.0 or higher, excluding 1.28.x
    (also possibly excluding anything older than 1.27.x on UTF-8 locales,
    depending on your operating system)
  • dash (Debian sh)
    0.5.7 or higher, excluding 0.5.10, 0.5.10.1, 0.5.11-0.5.11.4
  • FreeBSD sh 11.0 or higher
  • gwsh
  • ksh 93u+ 2012-08-01, 93u+m
  • mksh version R55 or higher
  • NetBSD sh 9.0 or higher
  • yash 2.40 or higher (2.44+ for POSIX mode)
  • zsh 5.3 or higher

Currently known not to run modernish due to excessive bugs:

Appendix E: zsh: integration with native scripts

This appendix is specific to zsh.

While modernish duplicates some functionality already available natively
on zsh, it still has plenty to add. However, writing a normal
simple-form modernish script turns
emulate sh on for the entire script, so you lose important aspects
of the zsh language.

But there is another way – modernish functionality may be integrated
with native zsh scripts using ‘sticky emulation’, as follows:

  1. emulate -R sh -c '. modernish'

This causes modernish functions to run in sh mode while your script will still
run in native zsh mode with all its advantages. The following notes apply:

  • Using the safe mode is not recommended, as zsh
    does not apply split/glob to variable expansions by default, and the
    modernish safe mode would defeat the ${~var} and ${=var} flags that apply
    these on a case by case basis. This does mean that:
    • The --split and --glob operators to constructs such as
      LOOP find
      are not available. Use zsh expansion flags instead.
    • Quoting literal glob patterns to commands like find remains necessary.
  • Using LOCAL is not recommended.
    Anonymous functions
    are the native zsh equivalent.
  • Native zsh loops should be preferred over modernish loops, except where
    modernish adds functionality not available in zsh (such as LOOP find or
    user-programmed loops).

See man zshbuiltins under emulate, option -c, for more information.
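As a hedged illustration (assuming zsh 5.3 or later and an installed modernish;
the file names and glob pattern are ours, not prescribed), a native zsh script
using sticky emulation might look like this:

```zsh
#!/usr/bin/env zsh
# Sketch only. Load modernish in sticky sh emulation: modernish functions
# run in sh mode, while the rest of this script keeps native zsh semantics.
emulate -R sh -c '. modernish'

# Native zsh remains fully available:
typeset -a logs
logs=( *.log(N) )            # zsh glob qualifier: empty list if nothing matches
for f in $logs; do
  putln "processing: $f"     # modernish putln, running in sh emulation
done
```

Note how quoting literal glob patterns and zsh expansion flags work exactly as
in any native zsh script; only the modernish functions themselves are affected
by the emulation.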

Appendix F: Bundling modernish with your script

The modernish installer install.sh can bundle one or more scripts with a
stripped-down version of the modernish library. This allows the bundled scripts
to run with a known version of modernish, whether or not modernish is installed
on the user’s system. Like modernish itself, bundling is cross-platform and
portable (or as portable as your script is).

Bundled scripts are not modified. Instead, for each script, a wrapper script is
installed under the same name in the installation root directory. This wrapper
automatically looks for a suitable
POSIX-compliant shell that passes the modernish battery of fatal bug tests,
then sets up the environment to run the real script with modernish on that
shell. Your modernish script can be run through the supplied wrapper script
from any directory location on any POSIX-compliant operating system, as long as
all files remain in the same location relative to each other.

Bundling is always a non-interactive installer operation, with options
specified on the command line. The installer usage for bundling is as follows:

install.sh -B -D rootdir [ -d subdir ] [ -s shell ] scriptfile [ scriptfile … ]

The -B option enables bundling mode. The option does not itself take an
option-argument. Instead, any number of scriptfiles to bundle can be given
as arguments following all other options. All scripts are bundled with a
single copy of modernish. The bundling operation does not deal with any
auxiliary files the scripts may require (other than modernish modules); any
such files need to be added manually after bundling is complete.

The -D option specifies the path to the bundled installation’s root
directory, where wrapper scripts are installed. This option is mandatory.
If the directory doesn’t exist, it is created.

The -d option specifies the subdirectory of the -D root directory where the
bundled scripts and modernish are installed. It can contain slashes to install
the bundle at a deeper directory level. The default subdirectory is bndl.
The option-argument can be empty or /, in which case the bundle is installed
directly into the installation root directory.
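To illustrate how -D and -d combine into the bundle directory, here is a small
model in plain POSIX sh (our own sketch, not install.sh's actual code):

```sh
# Illustrative model of combining the -D root with the -d subdirectory.
# 'bndl' is the documented default; an empty or '/' subdir selects the root.
bundle_dir() {
    rootdir=$1
    subdir=${2-bndl}                  # default when -d is not given
    case "$subdir" in
        ''|/) printf '%s\n' "$rootdir" ;;
        *)    printf '%s/%s\n' "$rootdir" "$subdir" ;;
    esac
}

bundle_dir /opt/myapp           # -> /opt/myapp/bndl
bundle_dir /opt/myapp lib/sh    # -> /opt/myapp/lib/sh
bundle_dir /opt/myapp /         # -> /opt/myapp
```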

The -s option specifies a preferred shell for the bundled scripts. A shell
name or a full path to a shell can be given. Wrapper scripts try the full path
first (if any), then try to find a shell with its basename, and then try to
find a shell with that basename minus any version number (e.g. bash instead
of bash-5.0 or ksh instead of ksh93). If none of that produces a shell
that passes the fatal bug tests, the wrapper continues with the normal shell search.
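The fallback order can be sketched in plain POSIX sh (our own illustrative
code, not the actual wrapper; the function name is hypothetical):

```sh
# Illustrative sketch of the preferred-shell fallback order: full path first,
# then the basename, then the basename minus any version number. Prints a
# usable shell path on stdout, or fails so the caller can continue with the
# normal shell search.
find_preferred_shell() {
    pref=$1
    # 1. Try the full path as given (if it is a path at all).
    case $pref in
        */*) [ -x "$pref" ] && { printf '%s\n' "$pref"; return 0; } ;;
    esac
    # 2. Try the basename in $PATH.
    base=${pref##*/}
    command -v "$base" && return 0
    # 3. Strip a trailing version number (e.g. bash-5.0 -> bash, ksh93 -> ksh).
    stripped=$(printf '%s\n' "$base" | sed 's/[-.0-9]*$//')
    [ "$stripped" != "$base" ] && command -v "$stripped" && return 0
    return 1
}

find_preferred_shell /nonexistent/sh-99.9   # likely prints the path to 'sh'
```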

This means the script won’t fail to launch if the preferred shell can’t be
found. Instead, it is up to the script itself to refuse to run if required
shell-specific conditions are not met. Scripts should use the
thisshellhas
function to check for any nonstandard
capabilities
required, or any
bugs
or
quirks
that the script is incompatible with (or indeed requires!).
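For example (a sketch; LINENO is one of modernish's documented capability IDs,
while BUG_SOMEBUG is a placeholder, not a real bug ID — consult the
capabilities and bugs lists for the IDs your script actually needs):

```sh
# Sketch: a bundled script enforcing its own shell requirements at startup.
thisshellhas LINENO || {
    putln "This script requires a shell with LINENO support." >&2
    exit 2
}
if thisshellhas BUG_SOMEBUG; then   # hypothetical bug ID, for illustration
    putln "This script cannot run on shells with BUG_SOMEBUG." >&2
    exit 2
fi
```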

Bundling is supported for both
portable-form
and
simple-form
modernish scripts. The installer automatically adapts the wrapper scripts to
the form used. For simple-form scripts, the directory containing the bundled
modernish core library (by default, .../bndl/bin/modernish) is prefixed to
$PATH so that . modernish works. Since simple-form scripts are often more
shell-specific, you may want to specify a preferred shell with the -s option.

To save space, the bundled copy of the modernish library is reduced as follows:

  • all comments are stripped from the code;
  • interactive use is not supported;
  • the regression test suite is not included;
  • thisshellhas does not have the --cache and --show operators;
  • the cap/*.t capability detection scripts are “statically linked”
    (directly included) into bin/modernish instead of shipped as separate files.

A README.modernish file is added with a short explanation, the licence,
and a link for people to get the complete version of modernish. Please do
not remove this when distributing bundled scripts.

