Wednesday, 1 June 2011

How to write a UNIX man page

Introduction
Man pages are common on UNIX and UNIX-like systems for providing online documentation for user commands, libraries, APIs, file formats and the like. So common, in fact, that one might think there is a magic tool that authors use to write them. Well, there is and there isn't. If you consider vi or emacs to be magic, or the text formatting tools nroff and troff, then indeed you would be right. That's about as magic as it gets.

When you use the man command to display a man page, the text file that you wrote in your favourite editor is passed through one or more text-processing tools, such as the tbl preprocessor, the nroff formatter and the col filter, before being displayed on-screen. Each of these tools has its own man page describing its behaviour.

This article discusses writing man pages for Solaris or Linux, although the instructions will be practically identical for other UNIX systems. The best way to learn how to write a man page is often to take an existing man page that someone else has written and change it for your own needs.  However, this article will give you some useful pointers.

Chapters
Man pages are organised by chapters, much like the chapters of a book. Each chapter is identified by a title and a number. The main difference between writing man pages for Solaris and for Linux is the chapter numbering, which differs between the two.

To find out what information should be contained within a particular chapter, type man -s<N> intro on Solaris or man <N> intro on Linux, where <N> is the chapter number of interest. This will pull up the introduction page for the chapter. For example, man -s1 intro will identify that this chapter is for User Commands. On Solaris, chapter 1M (man -s1m intro) is for System Administration Commands such as those you would usually only run as the root user, while on Linux this information goes in chapter 8 (man 8 intro).
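
As a quick illustration of the two command forms above, here is a minimal Python sketch. The function name and the "solaris"/"linux" labels are my own inventions for the example, not part of any real tool.

```python
def intro_command(chapter: str, system: str) -> str:
    """Return the command that shows the intro page for a chapter.

    `system` is a hypothetical label: "solaris" or "linux".
    """
    chapter = chapter.lower()
    if system == "solaris":
        # Solaris selects the chapter with the -s option.
        return f"man -s{chapter} intro"
    if system == "linux":
        # Linux takes the chapter number as a plain argument.
        return f"man {chapter} intro"
    raise ValueError(f"unknown system: {system!r}")


print(intro_command("1M", "solaris"))  # man -s1m intro
print(intro_command("8", "linux"))     # man 8 intro
```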

If you're unsure of the title of a chapter or what chapter number you should be using, open the man page for another similar type of command that comes with the OS and use the same chapter number in your man page.

Basic Layout
A typical man page starts with some preamble identifying the title and chapter number, and is then laid out in a number of sections:

Section      Description
NAME         Name of command and summary line
SYNOPSIS     Identifies the different ways the command can be invoked and its command-line arguments
DESCRIPTION  A description of what the command does and how to use it
OPTIONS      A description of each command-line option and what effect it has
SEE ALSO     A list of related man pages or documentation

Man pages may include any sections that are relevant, but the above list is normal for a basic man page and this article will use the above list. Other common sections that appear in man pages include ENVIRONMENT VARIABLES, EXAMPLES, EXIT STATUS, FILES, NOTES, AUTHOR, COPYRIGHT and BUGS.

Man pages are text files called <name>.<chapter>, where <name> is the name of the man page (usually the same as the command it is describing), and <chapter> is the chapter number in lowercase. Man pages for chapter <chapter> are contained within a directory called man/man<chapter>, again in lowercase.
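
The naming rule above can be sketched in a few lines of Python. The helper name man_page_path and the "prosed" command are hypothetical, and the root directory "man" is simply the one used in the paragraph above.

```python
from pathlib import PurePosixPath


def man_page_path(name: str, chapter: str, root: str = "man") -> PurePosixPath:
    """Return <root>/man<chapter>/<name>.<chapter>, with the chapter in lowercase."""
    chapter = chapter.lower()
    return PurePosixPath(root) / f"man{chapter}" / f"{name}.{chapter}"


print(man_page_path("prose", "1"))    # man/man1/prose.1
print(man_page_path("prosed", "1M"))  # man/man1m/prosed.1m (hypothetical admin command)
```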

Fonts
Throughout a man page, different font faces have particular meanings. Default text is known as "Roman". Bold text is used for text that must be typed exactly as shown (or for general emphasis within paragraphs). Italic text, which is usually displayed underlined instead, is used for arguments that must be replaced by something else.

Note that on Solaris, bold in man pages does not show up without some tweaking.  I'll discuss this in a separate posting.

In Linux, apostrophes don't always display as apostrophes in PuTTY. To fix this, make sure PuTTY is configured to assume received data is in the UTF-8 character set.

General Formatting Rules

Macro commands
Macro commands for the text formatter appear on lines of their own, each beginning with a single dot. Anything else you type will appear in the man page as formatted paragraphs, fully justified against the left and right margins. The text formatter will automatically split up and hyphenate long words when necessary.

Line breaks and paragraph breaks (.br and .LP)
Line breaks in man pages are generally swallowed up, so if you're typing a long paragraph, you can usually hit Enter whenever you like.

If you actually want to begin a new paragraph, leave one blank line. Alternatively, use the .LP command on a line by itself to request a new paragraph.

If you want to force a line break (but not a new paragraph), use the .br command on a line by itself to request a line break.

Be careful when putting in line breaks. Solaris swallows up extra space when displaying man pages, but Linux does not.

Bold text (.B, .BR and \fB)
If a line begins with .B, the text that follows will be bold. If the text contains spaces, enclose it in double quotes. E.g.

The word
.B bold
will be bolded.

To switch back to Roman text without incurring a space, use .BR instead. Its arguments alternate between the two fonts: the first argument will be bold, the second argument Roman. As before, if an argument must contain spaces, enclose it in double quotes. E.g.

The word
.BR bold ,
will be bolded but the comma will be Roman.

Alternatively, the escape sequence \fB starts bold face, and \fR returns to Roman. E.g.

The word \fBbold\fR, will be bold.

Italic text (.I, .IR and \fI)
As previously mentioned, italic text usually appears underlined. If a line begins with .I, the text that follows will be in italics. If the text contains spaces, enclose it in double quotes. E.g.

The word
.I italic
will be underlined.

To switch back to Roman text without incurring a space, use .IR instead. Its arguments alternate between the two fonts: the first argument will be in italics, the second argument Roman. As before, if an argument must contain spaces, enclose it in double quotes. E.g.

The word
.IR italic ,
will be underlined but the comma will be Roman.

Alternatively, the escape sequence \fI starts italics, and \fR returns to Roman. E.g.

The word \fIitalic\fR, will be underlined.
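
If you generate man page text from a script, the inline font escapes described above are easy to wrap in helpers. This is only a sketch: the function names bold and italic are my own, and it assumes the text passed in contains no backslashes of its own.

```python
def bold(text: str) -> str:
    """Wrap text in \\fB ... \\fR so it is set in bold, then return to Roman."""
    return f"\\fB{text}\\fR"


def italic(text: str) -> str:
    """Wrap text in \\fI ... \\fR so it is set in italics, then return to Roman."""
    return f"\\fI{text}\\fR"


line = f"Run {bold('prose')} with a {italic('binary_file')} argument."
print(line)
# Run \fBprose\fR with a \fIbinary_file\fR argument.
```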

Indenting paragraphs (.RS, .RE, .HP and .TP)
There are several ways to achieve paragraph indentation. The simplest form is .RS <N> where <N> is the number of characters to indent. This sets up a relative indent, and .RE ends a relative indent. E.g.

.RS 3
This paragraph is indented by 3 characters.
.RE

The .RS command can be nested to create different levels of indentation. Each successive .RE returns the indentation back to the previous setting. E.g.

.RS 3
This line is indented by 3 characters.
.RS 3
This line is indented by 6 characters.
.RE
This line is indented by 3 characters.
.RE
Now we're back to normal.

Alternatively, the .HP command can be used to set up a hanging indent. Like .RS it is given an argument specifying the number of characters to indent by, but the first line of the paragraph stays at the left margin and only the lines that follow it are indented. To remove the indentation, start a new paragraph with .LP. E.g.

.HP 3
This paragraph begins at the left margin, but once the text wraps onto further lines, those lines are indented by 3 characters.
.LP
This paragraph is normal.

The .TP command sets up a tagged indent and is typically used when discussing command-line options. This allows for paragraph indentation that follows an initial line that is not indented. The first line immediately following a .TP command contains the text to display that is not indented. All further lines and paragraphs will be indented. The .TP command can be given an argument specifying the number of characters to indent, or if omitted will use whatever indentation setting was specified with the last .TP command. E.g.

.TP 8
.B -a
This argument does something.
.TP
.B -b
This argument does something else.
.LP
Now we're back to normal.

In the above example, the -a and -b options appear in bold in the left column, while the description of what the argument does appears in the right column. The left column is 8 characters wide.

It is common to indent the whole block using a relative indent. E.g.

Command-line options are:
.RS 3
.TP 8
.B -a
This argument does something.
.TP
.B -b
This argument does something else.
.LP
.RE
Now we're back to normal.
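
When there are many options, typing the same .TP pattern over and over gets tedious. Here is a rough Python sketch that produces the indented block above from a list of (flag, description) pairs. The function name option_list and its parameters are my own, not part of any standard tool.

```python
def option_list(options, tag_width=8, indent=3):
    """Render (flag, description) pairs as an indented .TP list."""
    lines = [f".RS {indent}"]
    for position, (flag, description) in enumerate(options):
        # Only the first .TP states the width; later ones reuse it.
        lines.append(f".TP {tag_width}" if position == 0 else ".TP")
        lines.append(f".B {flag}")
        lines.append(description)
    # .LP ends the tagged paragraphs and .RE undoes the relative indent.
    lines += [".LP", ".RE"]
    return "\n".join(lines)


print(option_list([
    ("-a", "This argument does something."),
    ("-b", "This argument does something else."),
]))
```

The output is the same sequence of lines as the hand-written example, from .RS 3 through to .RE.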

Preamble (.TH)
The preamble generally includes comments and a title line.

Comments are lines that begin with .\" as in:

.\" This is a comment in a man page

The title line takes the format .TH <n> <s> <d> <f> <m>, where each argument is described in the table below:

<n>  Name of man page (from file name)
<s>  Section (chapter number from file name)
<d>  Date of most recent change
<f>  Left page footer text, commonly the product name and version that provides this manual page
<m>  Main page (centre) header text, commonly the title of the chapter

If an argument contains spaces, it must be enclosed within double quotes. If the left page footer or main page header text is omitted, defaults will be assumed. Here is an example:

.TH prose "1" "17 November 2010"
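
The quoting rule for the title line can be sketched in Python as follows. The helper names quote and title_line are my own, the "PROSE 0.1" footer is a made-up version string, and the sketch assumes that no argument contains a double quote.

```python
def quote(arg: str) -> str:
    """Wrap an argument in double quotes if it is empty or contains spaces."""
    return f'"{arg}"' if arg == "" or " " in arg else arg


def title_line(name, section, date, footer=None, header=None):
    """Build a .TH line; footer and header are the optional 4th and 5th arguments."""
    if header is not None and footer is None:
        # The header is the fifth argument, so the fourth must be present too.
        raise ValueError("a header needs a footer before it")
    args = [name, section, date] + [a for a in (footer, header) if a is not None]
    return ".TH " + " ".join(quote(a) for a in args)


print(title_line("prose", "1", "17 November 2010"))
# .TH prose 1 "17 November 2010"
print(title_line("prose", "1", "17 November 2010", "PROSE 0.1", "User Commands"))
# .TH prose 1 "17 November 2010" "PROSE 0.1" "User Commands"
```

Quoting an argument that has no spaces is harmless, so the first line is equivalent to the example in the article.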

Man Page Sections (.SH)
New sections in man pages are started with the Section Heading command .SH. This macro resets formatting, displays a section heading title and sets up for a new paragraph. It takes a single argument, the section heading text. If this text contains spaces, enclose it within double quotes. For example:

.SH SYNOPSIS

begins the SYNOPSIS section.

NAME
The NAME section usually consists of one line:

<command> - <summary>

where <command> is the name of the command that the man page is describing, and <summary> is a one line summary. Here is an example NAME section:

.SH NAME
prose - PROSE script compiler and engine

SYNOPSIS
The SYNOPSIS section provides the syntax of the command and its arguments, as typed on the command line.
When in bold, a word must be typed exactly as displayed. When in italics (or underlined), a word can be replaced with an argument that the user supplies. Symbols are used to further identify the syntax:

[ ]  An argument, when surrounded by brackets, is optional.
|    Arguments separated by a vertical bar are exclusive. You can supply only one item from such a list.
...  Arguments followed by an ellipsis can be repeated. When an ellipsis follows a bracketed set, the expression within brackets can be repeated.

Here is an example SYNOPSIS section:

.SH SYNOPSIS
\fBprose\fR [\fB\-D\fR \fIn|n1-n2\fR] [\fB\-m\fR \fImodule_dir\fR]
.RS 6
[\fB\-s\fR \fIschema_dir\fR] \fIbinary_file\fR ...
.RE
.LP
.B "prose --help"
.br
.B "prose --version"

To display a literal backslash, write \\ so that it is not interpreted as the start of an escape sequence.
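
If the text for a man page comes from somewhere else, the escaping can be done mechanically. The Python sketch below doubles backslashes as described above, and also writes option dashes as \- in the way the example SYNOPSIS does for \-D. Both function names are my own.

```python
def escape_backslashes(text: str) -> str:
    """Double every backslash so that it is displayed literally."""
    return text.replace("\\", "\\\\")


def escape_flag(flag: str) -> str:
    """Escape an option flag the way the SYNOPSIS example does (\\-D)."""
    # Backslashes are handled first so the ones added for dashes are not doubled.
    return escape_backslashes(flag).replace("-", "\\-")


print(escape_backslashes("a\\b"))  # a\\b
print(escape_flag("-D"))           # \-D
```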

DESCRIPTION
The DESCRIPTION section provides a narrative overview of the command's behaviour. This includes how it interacts with files or data, and how it handles the standard input, standard output and standard error. Internals and implementation details are normally omitted. This section attempts to provide a succinct overview in answer to the question, "what does it do?". Here is an example:

.SH DESCRIPTION
Loads one or more PAL binary files into the execution engine,
and launches the
.B 'main'
functions located underneath each module root.

PROSE binary files are created by passing PAL instructions to the
.B prism
assembler tool.

Note that the PROSE scripting language does not yet exist.  When
it does, this tool will also compile PROSE scripts into bytecode.

OPTIONS
The OPTIONS section lists the command-line options with a description of how each affects the command's operation. Here is an example:

.SH OPTIONS
.TP
.BI \-D " n|n1-n2"
enable debug mode, reporting debug messages at level
.IR n ,
or between the levels of
.I n1
and
.IR n2 .
There are a number of debug levels available, ranging from 1-127,
where 1 is
the least amount of information.  Debug levels from 50 and above
are only
available if the tool has been compiled with the ENABLE_EXTRADEBUG
flag.
.TP
.BI \-m " module_dir"
specify a different directory in which to locate the PROSE modules.
Run
.B "prose --help"
to see the default location.
.TP
.BI \-s " schema_dir"
specify a different directory in which to locate the PROSE schema
definitions.
Run
.B "prose --help"
to see the default location.

SEE ALSO
The SEE ALSO section is a comma-separated list of related man pages and documentation. Here is an example:

.SH "SEE ALSO"
.BR prism (1),
.BR pal_intro (5).
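
To tie the five sections together, here is a rough Python sketch that assembles a complete basic man page as one string. The function man_page and its parameter names are my own. It simplifies the OPTIONS entries to a bold flag followed by a description, and the sample values are shortened versions of the examples in this article.

```python
def man_page(name, chapter, date, summary, synopsis, description, options, see_also):
    """Assemble a basic man page: title line, then the five usual sections."""
    lines = [
        f'.TH {name} "{chapter}" "{date}"',
        ".SH NAME",
        f"{name} - {summary}",
        ".SH SYNOPSIS",
        synopsis,
        ".SH DESCRIPTION",
        description,
        ".SH OPTIONS",
    ]
    for flag, text in options:
        # One tagged paragraph per option: the flag in bold, then its description.
        lines += [".TP", f".B {flag}", text]
    lines.append('.SH "SEE ALSO"')
    for position, (page, section) in enumerate(see_also):
        # Entries are comma-separated, and the last one ends with a full stop.
        ending = "." if position == len(see_also) - 1 else ","
        lines.append(f".BR {page} ({section}){ending}")
    return "\n".join(lines) + "\n"


page = man_page(
    name="prose",
    chapter="1",
    date="17 November 2010",
    summary="PROSE script compiler and engine",
    synopsis="\\fBprose\\fR [\\fB\\-m\\fR \\fImodule_dir\\fR] \\fIbinary_file\\fR ...",
    description="Loads one or more PAL binary files into the execution engine.",
    options=[("\\-m", "specify a different directory in which to locate the PROSE modules.")],
    see_also=[("prism", "1"), ("pal_intro", "5")],
)
print(page, end="")
```

The resulting text could then be saved under the file name described in the Basic Layout section.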

Further Reading
See man man on a Solaris or Linux host for a general discussion on the format of man pages. See also man -s5 man on Solaris or man 7 man on Linux for a list of standard man page macros (including the macros discussed above), and see man tbl for macros that can be used for formatting tables.

1 comment:

  1. A nice post about a great, old capability of Unix - in fact, one of the capabilities that made Unix great. Other OS's of the time did not have succinct, usable manuals, especially online.

    Lots of folk criticized the manual pages for being obtuse and unusable, but for the target audience, they were perfect.

    They also fit very nicely with the Unix Philosophy, now mostly forgotten, I'm afraid.

    Anyway, good work!
