The Ultimate Guide to Argv
What is argv?
Argv is short for argument vector. Vector is a fancy way of saying n-tuple.
When executing a program, e.g. from the terminal, you may pass in the argv:
$ echo foo bar
echo is the program to execute. (echo, foo, bar) is the argv given to the program. (foo, bar) are referred to as arguments.
argv can also be referred to as the command line or cmdline.
How is argv useful?
argv is one of several mechanisms for providing input to a program: either data it should work on, or configuration that modifies its standard behavior.
argv can also be used for process identification in tools like htop, ps, pgrep, or pkill.
argv is also available in the special proc(5) file system as /proc/[pid]/cmdline.
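On Linux, /proc/[pid]/cmdline stores the argv elements back to back, each terminated by a NUL byte. A minimal Python sketch of decoding it, assuming the process has not rewritten its argv in a way that changes this layout. parse_cmdline is a hypothetical helper, not a library function:

```python
import os


def parse_cmdline(raw: bytes) -> list[str]:
    """Hypothetical helper: split /proc/[pid]/cmdline contents into argv.

    Every element ends with a NUL byte, so splitting leaves one empty
    string after the last terminator. That artifact is dropped, while
    genuinely empty arguments in the middle are kept.
    """
    parts = raw.split(b"\0")
    if parts and parts[-1] == b"":
        parts.pop()
    # os.fsdecode decodes the bytes the same way Python decodes file names.
    return [os.fsdecode(part) for part in parts]
```

For example, parse_cmdline(open("/proc/self/cmdline", "rb").read()) returns the argv of the running Python interpreter, whose argv[0] is the interpreter itself rather than the script.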
How is argv represented?
This depends on your runtime system.
In Python, argv is a list available via sys.argv. It is similar in other high-level languages.
In C, argv is given to the program through the parameters of its int main(int argc, char **argv) function: an integer parameter signifying the length of the tuple and an array of pointers to the individual strings of the tuple 1 2.
At the assembler level, argv is present on top of the stack 3 when the program is started by the kernel. It is again laid out as an integer signifying the length of the tuple, followed by the pointers to the individual character arrays of the tuple 4.
Argv[0]
The first element in the argv tuple contains the name of the program 1. argv[0] is defined by the parent process (execve(2)), and by convention it is the name the program was invoked by, typically the basename(1) of the executed file when it is found via PATH. If the file is referred to through a file system link, argv[0] is determined by the link name.
argv[0] can usually be ignored, but you need to be aware of it. For example, if you want to dispatch the argv to another program, you usually want to pass only elements 1..n (instead of 0..n, i.e. omitting the 0th element).
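A minimal Python sketch of dispatching the arguments to another program, assuming the target is grep. forward_argv is a hypothetical helper: the new argv[0] names the target, and only elements 1..n are carried over.

```python
def forward_argv(argv: list[str], program: str) -> list[str]:
    """Hypothetical helper: build the argv for `program` from our own argv.

    argv[0] names this program, so it is replaced by the target's name;
    only elements 1..n are carried over.
    """
    return [program, *argv[1:]]
```

A wrapper could then call subprocess.run(forward_argv(sys.argv, "grep")) to hand its own arguments to grep.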
Sometimes it can also be useful. For example, BusyBox 5 implements the functionality of several different programs and determines which program to act as from argv[0] 6. Thanks to this trick, it can be used as a drop-in replacement for these programs while existing on the system as a single executable.
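The BusyBox trick can be sketched in Python as a lookup on the basename of argv[0]. The applets below are hypothetical stand-ins that return their output as a string. This is an illustration of the idea, not how BusyBox itself is implemented:

```python
import os


# Hypothetical applets; each returns its output instead of printing it.
def applet_echo(args: list[str]) -> str:
    return " ".join(args)


def applet_true(args: list[str]) -> str:
    return ""


APPLETS = {"echo": applet_echo, "true": applet_true}


def dispatch(argv: list[str]) -> str:
    """Choose the applet by the basename of argv[0], like a multi-call binary."""
    name = os.path.basename(argv[0])
    applet = APPLETS.get(name)
    if applet is None:
        raise SystemExit(f"{name}: applet not found")
    return applet(argv[1:])
```

Invoked through a link named echo, argv[0] is echo (or a path ending in it), so the echo applet runs. BusyBox also accepts the applet name as the first argument, as in busybox echo foo, which this sketch leaves out.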
argv[0] can also be rewritten by the program itself, typically to present itself in a more useful manner, but also to hide itself by appearing as something else [C17-5.1.2.2.1].
Interpreting argv
The semantics of the argv contents are defined by the program. However, there are some patterns and standards.
Conventions
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “NOT RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
1. SUSv4
Most software (on Linux) mostly conforms to the [SUSv4-2018] or [GNU-Coding-Standards-4.8] standards. We will start by examining [SUSv4-2018], specifically the Utility Conventions section [XBD-Utility-Conventions]. I took some liberties in interpreting the standard so that this section can also serve as a description of best current practice for new programs.
You may use the [12.2-Utility-Syntax-Guidelines] as your only guidelines, but it is my hope that this document describes best current practices unconstrained by the historical considerations of [SUSv4-2018], and does so with more clarity, while still describing practices that are no longer best but still somewhat current.
Note
The SUSv4 standard refers to programs as “utilities” 12.
The arguments given to a program can be categorized into disjoint sets of options, operands, and option-arguments.
2. Options and Operands
All arguments that cannot be recognized as options or option-arguments SHALL be recognized as operands.
Caution
This definition violates [SUSv4-2018] [12.2.9], but it is the current best practice: it deliberately includes options that follow operands. It has to be included in this section because it is foundational to the rest of the document. Also see 2. GNU Option Order.
Note
Operands are sometimes referred to as positional arguments and usually provide data.
Arguments that consist of a <hyphen-minus> character followed by a single letter or digit SHALL be recognized as options [12.1.1] [12.2.3] [12.2.4] unless they are preceded in the argv by 10. Options Terminator Operand.
Hint
These are also referred to as short options or shortopts. Long options will be discussed in the 2. GNU Coding Standards and further sections.
Caution
The definition of options is complemented by 6. Bundled Options and 1. Long Options.
Note
Options are sometimes also referred to as flags and usually modify the program behavior.
Example
Command line executing the program echo with argv (echo, -n), where -n is a short option:
$ echo -n
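A minimal Python sketch of the classification rules in this section. It deliberately implements only what this section defines: no option takes an option-argument, and bundled and long options (defined in later sections) fall through as operands. classify is a hypothetical helper:

```python
def classify(args: list[str]) -> list[tuple[str, str]]:
    """Hypothetical helper: label each of argv[1:] as option, terminator or operand.

    Only this section's rules are implemented: no option takes an
    option-argument, and bundles such as -ne or long options such as
    --version are not handled, so they fall through as operands.
    """
    labels = []
    options_ended = False
    for arg in args:
        if options_ended:
            labels.append(("operand", arg))
        elif arg == "--":
            # Only the first -- terminates the options.
            labels.append(("terminator", arg))
            options_ended = True
        elif len(arg) == 2 and arg[0] == "-" and arg[1].isascii() and arg[1].isalnum():
            labels.append(("option", arg))
        else:
            labels.append(("operand", arg))
    return labels
```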
3. Option Arguments
Options MAY require option-arguments [12.1.1]. An option-argument is passed in as the argv element following the option name.
Example
$ xargs -I %
-I is an option and % is its option-argument: argv = (xargs, -I, %).
Options that require an option-argument may also be referred to as argumented-options.
Options that do not require an option-argument may also be referred to as non-argumented-options.
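Python's standard getopt module follows these conventions closely, so it is handy for illustrating them. In its short-option string, a letter followed by a colon declares an argumented-option. The arguments mirror xargs -t -I % echo %. As getopt expects, argv[0] is not passed in:

```python
import getopt

# "t" declares a non-argumented-option; "I:" declares an argumented-option.
opts, operands = getopt.getopt(["-t", "-I", "%", "echo", "%"], "tI:")
print(opts)      # [('-t', ''), ('-I', '%')]
print(operands)  # ['echo', '%']
```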
4. Mandatory Option Arguments
For historical reasons we need to distinguish between optional and mandatory option-arguments.
If an option accepts an option-argument, that option-argument SHALL be mandatory [12.2.7], i.e. not optional.
Example
$ xargs -I
xargs: option requires an argument -- 'I'
Try 'xargs --help' for more information.
[1]
Note
The [SUSv4-2018] recommends against optional option-arguments but ultimately permits them.
The motivation is not entirely clear. It may include:
implementation simplicity
not providing significant benefit
future portability reasons (different implementations choosing different defaults)
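With Python's getopt module, an argumented-option declared as "I:" always requires its option-argument, and a missing one raises getopt.GetoptError. A minimal sketch of turning that into an xargs-like error. parse is a hypothetical wrapper, and its message only imitates the xargs output shown above:

```python
import getopt


def parse(args: list[str]) -> tuple[list[tuple[str, str]], list[str]]:
    """Hypothetical wrapper: parse an xargs-like argv[1:] where -I requires
    an option-argument, exiting with an error message if it is missing."""
    try:
        return getopt.getopt(args, "I:")
    except getopt.GetoptError as err:
        # err.opt holds the offending option letter, here "I".
        raise SystemExit(f"option requires an argument -- '{err.opt}'") from err
```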
5. Bundled Option-Argument
A bundled option-argument refers to an option and its option-argument represented as a single argv element, with the option-argument immediately following the option.
Bundled option-arguments SHALL NOT be recognized or accepted as options, except for historical compatibility reasons [12.2.6] [12.1.2] [12.1.2.a].
Example
The bundled form below would be equivalent to the unbundled form xargs -I % if permitted.
$ xargs -I%
Caution
Not to be confused with 6. Bundled Options, which are recommended.
Hint
Option-argument bundling is not permitted because it creates ambiguities with 6. Bundled Options and complicates parsing implementation for no significant benefit.
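Many parsers, including Python's getopt module, accept the bundled form -I% exactly like -I %. A program that wants to follow this guideline therefore has to reject it itself. A minimal sketch of such a check, run before parsing. reject_bundled_optarg is hypothetical and ignores long options:

```python
import getopt


def reject_bundled_optarg(args: list[str], argumented: str) -> None:
    """Hypothetical check: raise ValueError if an argumented-option has its
    option-argument attached in the same argv element (e.g. -I%).

    `argumented` lists the letters of options that take an option-argument.
    Such a letter may still close a bundle (e.g. -tI %).
    """
    expect_optarg = False
    for arg in args:
        if expect_optarg:
            # This element is the option-argument itself, not an option.
            expect_optarg = False
            continue
        if arg == "--":
            break
        if arg == "-" or not arg.startswith("-") or arg.startswith("--"):
            continue  # operand, or a long option (not handled here)
        letters = arg[1:]
        for i, letter in enumerate(letters):
            if letter in argumented:
                if i != len(letters) - 1:
                    raise ValueError(f"bundled option-argument not accepted: {arg}")
                expect_optarg = True


# For comparison, getopt itself accepts the bundled form:
print(getopt.getopt(["-I%"], "I:"))  # ([('-I', '%')], [])
```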
6. Bundled Options
Bundled options are one or more short options without option-arguments, followed by at most one option that takes an option-argument, grouped into a single argv element behind one - delimiter [12.2.5] [12.2.14].
Bundled options SHOULD be recognized as options.
Example
Unbundled form of options:
$ echo -n -e
is semantically equivalent to the bundled form of options:
$ echo -ne
Note
Improves user experience when using the program manually and often.
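Bundled options as Python's standard getopt module handles them, using a hypothetical option set where -n and -e are non-argumented-options and -I is an argumented-option that may close a bundle:

```python
import getopt

# Hypothetical option set: -n and -e are non-argumented-options,
# -I is an argumented-option (the colon marks it).
SHORTOPTS = "neI:"

unbundled, _ = getopt.getopt(["-n", "-e"], SHORTOPTS)
bundled, _ = getopt.getopt(["-ne"], SHORTOPTS)
print(bundled == unbundled)  # True

# An argumented-option may close the bundle; its option-argument
# is taken from the next argv element.
print(getopt.getopt(["-neI", "%"], SHORTOPTS))
# ([('-n', ''), ('-e', ''), ('-I', '%')], [])
```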
7. Option Order
Example
grep -ri and grep -ir are equivalent.
Caution
The standard exempts 9. Mutually Exclusive Options from this requirement.
Note
Improves user experience when using the program manually and often.
Option order may be semantically significant.
Some programs' domain or purpose requires option order to be semantically significant for them to function correctly.
Example
find(1) can express propositional logic in argv, e.g.:
$ find '(' -name 'a' -or -name 'foo' ')' -and -not -type d
Note
find(1) is actually POSIX-conforming because -name and similar arguments are actually not options. They are operands.
If a program chooses to treat option order as semantically significant, it MUST be documented in the OPTIONS section [12.1.3] [12.2.11].
8. Option Repetition
Non-argumented-options may be repeated in the argv [12.1.3].
Note
The standard does not impose any requirements on program behavior “unless otherwise stated in the OPTIONS section” [12.1.3]. This likely refers to standard utilities described in other volumes of the standard.
If a non-argumented-option is repeated in the argv and the program documentation does not explicitly specify this behavior, the program MUST either terminate with an error or accept the options as if they were not repeated.
Note
Repetition of non-argumented-options is sometimes used to, e.g., increase verbosity levels. Example, lspci(8):
$ lspci -vv
Note
Accepting the options as if not repeated is likely the default behavior of argument parsers that do not consider this use case.
Note
Terminating with an error is strictly saner behavior if the argument parsing does consider the use case but does not assign any significant semantics to it.
Note
Mandating this behavior only when not otherwise specified by the program’s documentation allows for some niche use cases, but it is probably advisable to consider other solutions, such as an option with a mandatory option-argument, before opting into this behavior.
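A minimal Python sketch of both behaviors in one hypothetical program: repeating -v is documented to raise the verbosity level, while repeating any other non-argumented-option (here a hypothetical -n) terminates with an error. It relies on getopt returning every occurrence in argv order:

```python
import getopt


def parse_flags(args: list[str]) -> tuple[int, bool, list[str]]:
    """Parse a hypothetical program's argv[1:] with options -v and -n.

    -v is documented as repeatable: each occurrence raises the verbosity.
    -n (dry run) is not, so repeating it is an error.
    """
    opts, operands = getopt.getopt(args, "vn")
    verbosity = 0
    seen: set[str] = set()
    for opt, _value in opts:
        if opt == "-v":
            verbosity += 1
        elif opt in seen:
            raise SystemExit(f"option {opt} may only be given once")
        seen.add(opt)
    return verbosity, "-n" in seen, operands
```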
Argumented-options may be repeated [12.1.9].
Hint
These are sometimes also referred to as cumulative options or cumulative arguments.
Interpretation of repeated argumented-options is determined on program-specific basis.
If the repetition is accepted, the options should be interpreted in the order specified in the argv [12.2.11].
Example
$ sed -e 'script-1' -e 'script-2'
$ rsync -av --exclude foo --exclude bar 1 2
Question
Should repetition where some instance of the option overrides the other be permitted?
It may be useful in niche cases when composing the argv, by allowing successive options to override preceding ones, but it feels wrong.
It should probably also be consistent with resolving mutually exclusive options, which have a precedent in [12.2.11].
Also see 9. Mutually Exclusive Options.
9. Mutually Exclusive Options
Programs may interpret options as mutually exclusive [12.1.3] [12.2.11].
Multiple mutually exclusive options may be accepted in a single argv as long as such options are documented as mutually exclusive and each is documented to override any incompatible options preceding it.
Caution
A single argv, not a single argv element.
Note
These options are exempted from the insignificant option order recommendation by the standard [12.2.11].
Note
When considering whether the guidelines should permit this behavior, it should be considered in the context of general option repetition. The standard does not seem to provide a guideline on that topic (i.e. it allows arbitrary repetition) except for this specific exception.
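A minimal Python sketch of the "last one wins" rule for a hypothetical program that documents -q (quiet) and -v (verbose) as mutually exclusive, each overriding any incompatible option preceding it:

```python
import getopt

# Hypothetical program: -q and -v are documented as mutually exclusive,
# each overriding any incompatible option preceding it.
MODES = {"-q": "quiet", "-v": "verbose"}


def output_mode(args: list[str]) -> str:
    opts, _operands = getopt.getopt(args, "qv")
    mode = "normal"
    for opt, _value in opts:  # getopt preserves argv order,
        mode = MODES[opt]     # so the last one specified wins
    return mode
```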
10. Options Terminator Operand
The first -- operand should be accepted as a delimiter indicating the end of options [12.2.10].
Note
This exists to distinguish operands that would otherwise be recognized as options.
Example
argv = (/usr/bin/printf, --, --version):
$ /usr/bin/printf -- --version
--version
vs:
$ /usr/bin/printf --version
printf (GNU coreutils) 8.32
[...]
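The terminator as Python's standard getopt module implements it, with a hypothetical --version long option. Everything after the first -- is returned as an operand, even if it looks like an option:

```python
import getopt

# Hypothetical program with a single long option, --version.
LONGOPTS = ["version"]

# Without the terminator, --version is recognized as an option ...
print(getopt.getopt(["--version"], "", LONGOPTS))  # ([('--version', '')], [])
# ... after the first --, every argument is an operand, including a second --.
print(getopt.getopt(["--", "--version", "--"], "", LONGOPTS))  # ([], ['--version', '--'])
```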
11. Standard Input/Output Operand
The - operand may refer to standard input, standard output, or a file named - [12.2.13].
Question
The motivation is unclear, except for being short, and the occasional use case when utilities are composed, where passing in stdin/stdout the same way as a file name may be convenient to implement.
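A minimal Python sketch of mapping the - operand to standard input when opening input files. open_input is hypothetical and assumes UTF-8 text input. Python's getopt module, for instance, already treats a lone - as an operand rather than an option:

```python
import sys
from typing import TextIO


def open_input(operand: str) -> TextIO:
    """Hypothetical helper: return standard input for the '-' operand,
    else open the named file as UTF-8 text.

    A file literally named '-' can still be given as './-'.
    """
    if operand == "-":
        return sys.stdin
    return open(operand, encoding="utf-8")
```

A caller should close only the files it opened itself, not sys.stdin.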
2. GNU Coding Standards
The GNU Coding Standards for Command Line Interfaces [GNU-Coding-Standards-4.8] mostly extend [SUSv4-2018] but occasionally violate it 9.
1. Long Options
Programs SHOULD also accept options in the form of long options, also referred to as longopts. Longopts are signified by a prefix of two <hyphen-minus> characters: --.
Example
$ rsync --version
Long options generally follow the same guidelines as short options as defined in 1. SUSv4, except for obvious incompatibilities such as option bundling.
Programs should accept a long-option version of each short option, for the sake of user friendliness. E.g. rsync --verbose and rsync -v are equivalent.
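Short and long spellings of the same option can be mapped to one meaning. A sketch with Python's getopt module, modeled on rsync -v/--verbose. is_verbose is hypothetical:

```python
import getopt


def is_verbose(args: list[str]) -> bool:
    """Hypothetical helper: treat -v and --verbose as the same option."""
    opts, _operands = getopt.getopt(args, "v", ["verbose"])
    return any(opt in ("-v", "--verbose") for opt, _value in opts)
```

Like GNU getopt_long, Python's getopt also accepts unambiguous abbreviations of long options, so --verb is recognized as --verbose here.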
2. GNU Option Order
Programs SHOULD accept options regardless of their position relative to operands, if possible, even though this violates [SUSv4-2018].
Hint
Options following operands may be referred to as tail options 14.
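Python's standard getopt module offers both behaviors, which makes the difference easy to see. getopt.getopt stops at the first operand, as [SUSv4-2018] expects, while getopt.gnu_getopt also picks up tail options (unless the option string starts with + or the POSIXLY_CORRECT environment variable is set):

```python
import getopt

ARGS = ["file1", "-v", "file2"]

# POSIX-style scanning stops at the first operand, so "-v" becomes an operand.
print(getopt.getopt(ARGS, "v"))      # ([], ['file1', '-v', 'file2'])
# GNU-style scanning also recognizes options after operands (tail options).
print(getopt.gnu_getopt(ARGS, "v"))  # ([('-v', '')], ['file1', 'file2'])
```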
3. Recommended Option Names
Operands used as file name arguments should be used for input files only. Output files should be specified using the -o or --output options.
The GNU Coding Standards also contain a table of recommended long option names and their semantics: https://www.gnu.org/prep/standards/html_node/Option-Table.html#Option-Table.
Interestingly, the table specifies --quiet and --silent as synonyms. There is at least one common program that uses these differently. Can’t remember which right now.
There is also a list of recommended short options in [TaO-Command-Line-Options].
4. Recommended Options
Programs should support two standard options: --version and --help.
CGI programs should accept these as command-line options as well as PATH_INFO; for example, visiting http://example.org/p.cgi/--help should output the same information as invoking p.cgi --help on the command line.
Note
Well, this is interesting. It should probably be excluded from the best current practice: for one, CGI is practically non-existent nowadays; second, it looks funky. But it’s only --version and --help. Idk.
De Facto Standards
A de facto standard is a custom or convention that has achieved a dominant position by public acceptance or market forces [DeFacto].
Bundling an option-argument with a long option is NOT RECOMMENDED.
However, it is common in existing software, usually with = as the separator.
Example
$ git --git-dir=foo
Question
Why is this still a thing?
It seems like more work for no benefit compared to no bundling: --git-dir foo.
At first I thought this was historical, possibly because shell scripts could just remove the prefix and eval the rest, but no. This is still implemented by modern software.
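Many parsers accept both spellings of a long option with an option-argument, GNU getopt_long among them. With Python's getopt module, a trailing = in the long option name declares that the option takes an option-argument, and both forms parse identically. This says nothing about how git itself parses its arguments:

```python
import getopt

# A trailing "=" declares that --git-dir takes an option-argument.
LONGOPTS = ["git-dir="]

bundled, _ = getopt.getopt(["--git-dir=foo"], "", LONGOPTS)
separate, _ = getopt.getopt(["--git-dir", "foo"], "", LONGOPTS)
print(bundled == separate == [("--git-dir", "foo")])  # True
```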
Options are often only boolean switches (aka flags). Normally, if unspecified, the option is off; when specified, the option is on. E.g.:
$ grep -q
If an option is on by default and is turned off when specified, it may be realized as a long option with the --no- prefix. E.g.:
$ wget --no-verbose
In addition to the meaning described in 10. Options Terminator Operand, the -- operand is also commonly used to signify the end of the operands meant for the program in the argv[0] position and the start of an argv for another program to execute. This technique is known as Bernstein chaining 7.
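A minimal Python sketch of the splitting step of such a chain, assuming the wrapper's own arguments never contain a literal -- as an option-argument. split_chain is hypothetical:

```python
def split_chain(args: list[str]) -> tuple[list[str], list[str]]:
    """Hypothetical helper: split argv[1:] at the first '--' into our own
    arguments and the argv of the next program in the chain."""
    if "--" in args:
        i = args.index("--")
        return args[:i], args[i + 1:]
    return list(args), []
```

After handling its own arguments, the wrapper can replace itself with the next program via os.execvp(child[0], child), which passes child unchanged as that program's argv, including its argv[0].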
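Golang Standards
Golang kind of goes its own way: its standard flag package treats -name and --name as equivalent, does not support bundling of short options, and stops parsing options at the first operand or after the -- terminator. This style actually seems to originate from the X Toolkit [TaO-Command-Line-Options].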
Historical Standards
All recommendations in this section SHOULD NOT be regarded as best current practice unless historical reasons are involved, in which case this entire document is irrelevant.
First we will discuss patterns that are common and allowed by [SUSv4-2018].
All options should precede the operands [12.2.9].
Hint
Disregard. See 2. Options and Operands and 2. GNU Option Order.
Note
This guideline seems to be strictly adhered to in the FreeBSD world, and it is kind of annoying.
Programs accepting option-arguments may accept multiple option-arguments for a single option bundled into a single argv element. In that case, the option-arguments should be separated by comma (,) or <blank> 11 characters [12.2.8].
Hint
Cumulative options should be used for this purpose instead.
This bundled approach may be chosen due to compositional synergy with other utilities. But in that case, it should be considered whether the other tools could be modified to work with the cumulative approach as well.
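For comparison, a minimal Python sketch of splitting such a bundled option-argument, as used by e.g. ps -o pid,comm. It assumes the POSIX locale, where <blank> is a space or a tab. split_optarg_list is hypothetical:

```python
import re


def split_optarg_list(optarg: str) -> list[str]:
    """Hypothetical helper: split one option-argument holding several values
    separated by commas or blanks (space or tab in the POSIX locale)."""
    return [item for item in re.split(r"[,\t ]+", optarg) if item]
```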
Optional option-arguments are permitted [12.1.2] [12.1.7].
Example
xargs -i and xargs -i{} are equivalent as per xargs(1).
Hint
Optional option-arguments are not recommended. See 4. Mandatory Option Arguments.
Bundled option-arguments may be accepted [12.1.2] [12.1.2.a] [12.1.2.b].
Hint
Mostly only of historical significance. See 5. Bundled Option-Argument.
Optional option-arguments MUST be bundled with their option [12.1.2.b]:
$ xargs -i{}
Hint
This is required to distinguish option-arguments from operands. However, optional option-arguments are not recommended in the first place. See 4. Mandatory Option Arguments.
Now, we will look at some other practices that are relatively rare by now.
Flags (as in boolean switch options) were sometimes signified by the prefixes - and + to turn an option on and off respectively.
Example
$ setopt -x
$ setopt +x
Note
This style seems to originate in the [X-Toolkit-Style].
Flags may also be realized by switching the letter case, e.g. with the -d flag being on and the -D flag being off 10.
Some programs may also accept short opts without the - signifier. Example:
$ ps eof
Note
The ps(1) manual page indicates this style originates from some kind of BSD. Why this is distinct from UNIX eludes me.
Thanks, I Hate This
[SUSv4-2018] seems well-formed and relatively straightforward, but it is not an easy read. It is actually pretty confusing in several places.
The important thing to realize when reading [SUSv4-2018] is that [12.1.2.a] refers to a “standard utility” 13, which is quite easy to miss. Actually, the entire [12.1.2] and related guidelines require careful deconstruction.
Thanks, I hate it and I hope to never see [XBD-Utility-Conventions] again.
Backlog
This document still suffers from:
missing guidelines on some edge cases
inconsistent phrasing
inconsistent use of admonitions
some terms may be over-specified
some terms may be under-specified
missing section on conformity status of existing argument parsing solutions
The 1. SUSv4 section mentions a relatively large number of best current practices that violate it. Maybe the structuring into sections by standard should be abandoned by now.
I know this document violates RFC 2119 section 6 and I don’t care. I find it useful. This is not an Internet standard. It’s not even a standard. It’s an
Any contradictions, ambiguities, typographical issues, inconsistencies, or complaints shall be directed to /dev/null jan@matejka.ninja
References
- 1
ISO/IEC 9899:2018 Information technology — Programming languages — C https://www.iso.org/standard/74528.html Section 5.1.2.2.1 Program startup
- 2
https://publications.gbdirect.co.uk//c_book/chapter10/arguments_to_main.html
- 3
- 4
https://github.com/jan-matejka/code-golf/blob/master/2.echo-argv/nasm/main.asm
- 5
- 6
- 7
- 8
https://stackoverflow.com/questions/402377/using-getopts-to-process-long-and-short-command-line-options
- 9
“We consider standards such as POSIX; we don’t “obey” them.” https://www.gnu.org/prep/standards/html_node/Program-Behavior.html#Program-Behavior
- 10
- 11
One of the characters that belong to the blank character class as defined via the LC_CTYPE category in the current locale. In the POSIX locale, a <blank> character is either a <tab> or a <space>.
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_74
- 12
- Utility:
A program, excluding special built-in utilities provided as part of the Shell Command Language, that can be called by name from a shell to perform a specific task, or related set of tasks.
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_439
- 13
- Standard Utilities
The utilities described in the Shell and Utilities volume of POSIX.1-2017 [XCU-Shell-And-Utilities].
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_369
- 14
- 12.1.1
Section 12.1 paragraph 1 in SUSv4-2018.
- 12.1.2
Section 12.1 paragraph 2 in SUSv4-2018.
- 12.1.2.a
Section 12.1 paragraph 2.a in SUSv4-2018.
- 12.1.2.b
Section 12.1 paragraph 2.b in SUSv4-2018.
- 12.1.3
Section 12.1 paragraph 3 in SUSv4-2018.
- 12.1.7
Section 12.1 paragraph 6 in SUSv4-2018.
- 12.1.9
Section 12.1 paragraph 9 in SUSv4-2018.
- 12.2-Utility-Syntax-Guidelines
Section 12.2 in SUSv4-2018.
- 12.2.3
Section 12.2 Guideline 3 in SUSv4-2018.
- 12.2.4
Section 12.2 Guideline 4 in SUSv4-2018.
- 12.2.5
Section 12.2 Guideline 5 in SUSv4-2018.
- 12.2.6
Section 12.2 Guideline 6 in SUSv4-2018.
- 12.2.7
Section 12.2 Guideline 7 in SUSv4-2018.
- 12.2.8
Section 12.2 Guideline 8 in SUSv4-2018.
- 12.2.9
Section 12.2 Guideline 9 in SUSv4-2018.
- 12.2.10
Section 12.2 Guideline 10 in SUSv4-2018.
- 12.2.11
Section 12.2 Guideline 11 in SUSv4-2018.
- 12.2.13
Section 12.2 Guideline 13 in SUSv4-2018.
- 12.2.14
Section 12.2 Guideline 14 in SUSv4-2018.
- GNU-Coding-Standards-4.8
https://www.gnu.org/prep/standards/html_node/Command_002dLine-Interfaces.html
- TaO-Command-Line-Options
- X-Toolkit-Style
https://www.x.org/releases/X11R7.7/doc/libXt/intrinsics.html#Parsing_the_Command_Line
- C17
ISO/IEC 9899:2018
- C17-5.1.2.2.1
[C17] § 5.1.2.2.1
- DeFacto
A de facto standard is a custom or convention that has achieved a dominant position by public acceptance or market forces.
Campbell, Robert; Pentz, Ed; Borthwick, Ian (2012). Academic and Professional Publishing. Chandos Publishing. p. 9.
- SUSv4-2018
Single UNIX Specification 2018 edition
SUSv4 2018 edition is simultaneously POSIX.1-2017 and IEEE Std 1003.1™-2017 and The Open Group Technical Standard Base Specifications, Issue 7.
- XBD-Utility-Conventions
SUSv4 2018 edition Volume XBD Section 12. Utility Conventions
https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap12.html#tag_12
- XCU-Shell-And-Utilities
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/toc.html