author mike-m <mikem.llvm@gmail.com> 2010-05-06 23:45:43 +0000
committer mike-m <mikem.llvm@gmail.com> 2010-05-06 23:45:43 +0000
commit 68cb31901c590cabceee6e6356d62c84142114cb
tree 6444bddc975b662fbe47d63cd98a7b776a407c1a /docs/re_format.7
parent c26ae5ab7e2d65b67c97524e66f50ce86445dec7
Overhauled llvm/clang docs builds. Closes PR6613.
NOTE: 2nd part changeset for cfe trunk to follow.

*** PRE-PATCH ISSUES ADDRESSED
- clang api docs fail build from objdir
- clang/llvm api docs collide in install PREFIX/
- clang/llvm main docs collide in install
- clang/llvm main docs are full of hard coded destination assumptions and make use of absolute root in static html files; namely CommandGuide tools hard code a website destination for cross references and some html cross references assume website root paths

*** IMPROVEMENTS
- bumped Doxygen from 1.4.x -> 1.6.3
- splits llvm/clang docs into 'main' and 'api' (doxygen) build trees
- provide consistent, reliable doc builds for both main+api docs
- support build vs. install vs. website intentions
- support objdir builds
- document targets with 'make help'
- correct clean and uninstall operations
- use recursive dir delete only where absolutely necessary
- added call function fn.RMRF which safeguards against botched 'rm -rf'; if any target (or any variable is evaluated) which attempts to remove any dirs which match a hard-coded 'safelist', a verbose error will be printed and make will error-stop.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@103213 91177308-0d34-0410-b5e6-96231b3b80d8
Diffstat (limited to 'docs/re_format.7')
-rw-r--r-- docs/re_format.7 756
1 files changed, 0 insertions, 756 deletions
diff --git a/docs/re_format.7 b/docs/re_format.7
deleted file mode 100644
index 0c0928716f..0000000000
--- a/docs/re_format.7
+++ /dev/null
@@ -1,756 +0,0 @@
-.\" $OpenBSD: re_format.7,v 1.14 2007/05/31 19:19:30 jmc Exp $
-.\"
-.\" Copyright (c) 1997, Phillip F Knaack. All rights reserved.
-.\"
-.\" Copyright (c) 1992, 1993, 1994 Henry Spencer.
-.\" Copyright (c) 1992, 1993, 1994
-.\" The Regents of the University of California. All rights reserved.
-.\"
-.\" This code is derived from software contributed to Berkeley by
-.\" Henry Spencer.
-.\"
-.\" Redistribution and use in source and binary forms, with or without
-.\" modification, are permitted provided that the following conditions
-.\" are met:
-.\" 1. Redistributions of source code must retain the above copyright
-.\" notice, this list of conditions and the following disclaimer.
-.\" 2. Redistributions in binary form must reproduce the above copyright
-.\" notice, this list of conditions and the following disclaimer in the
-.\" documentation and/or other materials provided with the distribution.
-.\" 3. Neither the name of the University nor the names of its contributors
-.\" may be used to endorse or promote products derived from this software
-.\" without specific prior written permission.
-.\"
-.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
-.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
-.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
-.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
-.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
-.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
-.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
-.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
-.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
-.\" SUCH DAMAGE.
-.\"
-.\" @(#)re_format.7 8.3 (Berkeley) 3/20/94
-.\"
-.Dd $Mdocdate: May 31 2007 $
-.Dt RE_FORMAT 7
-.Os
-.Sh NAME
-.Nm re_format
-.Nd POSIX regular expressions
-.Sh DESCRIPTION
-Regular expressions (REs),
-as defined in
-.St -p1003.1-2004 ,
-come in two forms:
-basic regular expressions
-(BREs)
-and extended regular expressions
-(EREs).
-Both forms of regular expressions are supported
-by the interfaces described in
-.Xr regex 3 .
-Applications dealing with regular expressions
-may use one or the other form
-(or indeed both).
-For example,
-.Xr ed 1
-uses BREs,
-whilst
-.Xr egrep 1
-talks EREs.
-Consult the manual page for the specific application to find out which
-it uses.
-.Pp
-POSIX leaves some aspects of RE syntax and semantics open;
-.Sq **
-marks decisions on these aspects that
-may not be fully portable to other POSIX implementations.
-.Pp
-This manual page first describes regular expressions in general,
-specifically extended regular expressions,
-and then discusses differences between them and basic regular expressions.
-.Sh EXTENDED REGULAR EXPRESSIONS
-An ERE is one** or more non-empty**
-.Em branches ,
-separated by
-.Sq \*(Ba .
-It matches anything that matches one of the branches.
-.Pp
-A branch is one** or more
-.Em pieces ,
-concatenated.
-It matches a match for the first, followed by a match for the second, etc.
-.Pp
-A piece is an
-.Em atom
-possibly followed by a single**
-.Sq * ,
-.Sq + ,
-.Sq ?\& ,
-or
-.Em bound .
-An atom followed by
-.Sq *
-matches a sequence of 0 or more matches of the atom.
-An atom followed by
-.Sq +
-matches a sequence of 1 or more matches of the atom.
-An atom followed by
-.Sq ?\&
-matches a sequence of 0 or 1 matches of the atom.
-.Pp
-A bound is
-.Sq {
-followed by an unsigned decimal integer,
-possibly followed by
-.Sq ,\&
-possibly followed by another unsigned decimal integer,
-always followed by
-.Sq } .
-The integers must lie between 0 and
-.Dv RE_DUP_MAX
-(255**) inclusive,
-and if there are two of them, the first may not exceed the second.
-An atom followed by a bound containing one integer
-.Ar i
-and no comma matches
-a sequence of exactly
-.Ar i
-matches of the atom.
-An atom followed by a bound
-containing one integer
-.Ar i
-and a comma matches
-a sequence of
-.Ar i
-or more matches of the atom.
-An atom followed by a bound
-containing two integers
-.Ar i
-and
-.Ar j
-matches a sequence of
-.Ar i
-through
-.Ar j
-(inclusive) matches of the atom.
-.Pp
-An atom is a regular expression enclosed in
-.Sq ()
-(matching a part of the regular expression),
-an empty set of
-.Sq ()
-(matching the null string)**,
-a
-.Em bracket expression
-(see below),
-.Sq .\&
-(matching any single character),
-.Sq ^
-(matching the null string at the beginning of a line),
-.Sq $
-(matching the null string at the end of a line),
-a
-.Sq \e
-followed by one of the characters
-.Sq ^.[$()|*+?{\e
-(matching that character taken as an ordinary character),
-a
-.Sq \e
-followed by any other character**
-(matching that character taken as an ordinary character,
-as if the
-.Sq \e
-had not been present**),
-or a single character with no other significance (matching that character).
-A
-.Sq {
-followed by a character other than a digit is an ordinary character,
-not the beginning of a bound**.
-It is illegal to end an RE with
-.Sq \e .
-.Pp
-A bracket expression is a list of characters enclosed in
-.Sq [] .
-It normally matches any single character from the list (but see below).
-If the list begins with
-.Sq ^ ,
-it matches any single character
-.Em not
-from the rest of the list
-(but see below).
-If two characters in the list are separated by
-.Sq - ,
-this is shorthand for the full
-.Em range
-of characters between those two (inclusive) in the
-collating sequence, e.g.\&
-.Sq [0-9]
-in ASCII matches any decimal digit.
-It is illegal** for two ranges to share an endpoint, e.g.\&
-.Sq a-c-e .
-Ranges are very collating-sequence-dependent,
-and portable programs should avoid relying on them.
-.Pp
-To include a literal
-.Sq ]\&
-in the list, make it the first character
-(following a possible
-.Sq ^ ) .
-To include a literal
-.Sq - ,
-make it the first or last character,
-or the second endpoint of a range.
-To use a literal
-.Sq -
-as the first endpoint of a range,
-enclose it in
-.Sq [.
-and
-.Sq .]
-to make it a collating element (see below).
-With the exception of these and some combinations using
-.Sq [
-(see next paragraphs),
-all other special characters, including
-.Sq \e ,
-lose their special significance within a bracket expression.
-.Pp
-Within a bracket expression, a collating element
-(a character,
-a multi-character sequence that collates as if it were a single character,
-or a collating-sequence name for either)
-enclosed in
-.Sq [.
-and
-.Sq .]
-stands for the sequence of characters of that collating element.
-The sequence is a single element of the bracket expression's list.
-A bracket expression containing a multi-character collating element
-can thus match more than one character,
-e.g. if the collating sequence includes a
-.Sq ch
-collating element,
-then the RE
-.Sq [[.ch.]]*c
-matches the first five characters of
-.Sq chchcc .
-.Pp
-Within a bracket expression, a collating element enclosed in
-.Sq [=
-and
-.Sq =]
-is an equivalence class, standing for the sequences of characters
-of all collating elements equivalent to that one, including itself.
-(If there are no other equivalent collating elements,
-the treatment is as if the enclosing delimiters were
-.Sq [.
-and
-.Sq .] . )
-For example, if
-.Sq x
-and
-.Sq y
-are the members of an equivalence class,
-then
-.Sq [[=x=]] ,
-.Sq [[=y=]] ,
-and
-.Sq [xy]
-are all synonymous.
-An equivalence class may not** be an endpoint of a range.
-.Pp
-Within a bracket expression, the name of a
-.Em character class
-enclosed
-in
-.Sq [:
-and
-.Sq :]
-stands for the list of all characters belonging to that class.
-Standard character class names are:
-.Bd -literal -offset indent
-alnum digit punct
-alpha graph space
-blank lower upper
-cntrl print xdigit
-.Ed
-.Pp
-These stand for the character classes defined in
-.Xr ctype 3 .
-A locale may provide others.
-A character class may not be used as an endpoint of a range.
-.Pp
-There are two special cases** of bracket expressions:
-the bracket expressions
-.Sq [[:<:]]
-and
-.Sq [[:>:]]
-match the null string at the beginning and end of a word, respectively.
-A word is defined as a sequence of
-characters starting and ending with a word character
-which is neither preceded nor followed by
-word characters.
-A word character is an
-.Em alnum
-character (as defined by
-.Xr ctype 3 )
-or an underscore.
-This is an extension,
-compatible with but not specified by POSIX,
-and should be used with
-caution in software intended to be portable to other systems.
-.Pp
-In the event that an RE could match more than one substring of a given
-string,
-the RE matches the one starting earliest in the string.
-If the RE could match more than one substring starting at that point,
-it matches the longest.
-Subexpressions also match the longest possible substrings, subject to
-the constraint that the whole match be as long as possible,
-with subexpressions starting earlier in the RE taking priority over
-ones starting later.
-Note that higher-level subexpressions thus take priority over
-their lower-level component subexpressions.
-.Pp
-Match lengths are measured in characters, not collating elements.
-A null string is considered longer than no match at all.
-For example,
-.Sq bb*
-matches the three middle characters of
-.Sq abbbc ;
-.Sq (wee|week)(knights|nights)
-matches all ten characters of
-.Sq weeknights ;
-when
-.Sq (.*).*
-is matched against
-.Sq abc ,
-the parenthesized subexpression matches all three characters;
-and when
-.Sq (a*)*
-is matched against
-.Sq bc ,
-both the whole RE and the parenthesized subexpression match the null string.
-.Pp
-If case-independent matching is specified,
-the effect is much as if all case distinctions had vanished from the
-alphabet.
-When an alphabetic that exists in multiple cases appears as an
-ordinary character outside a bracket expression, it is effectively
-transformed into a bracket expression containing both cases,
-e.g.\&
-.Sq x
-becomes
-.Sq [xX] .
-When it appears inside a bracket expression,
-all case counterparts of it are added to the bracket expression,
-so that, for example,
-.Sq [x]
-becomes
-.Sq [xX]
-and
-.Sq [^x]
-becomes
-.Sq [^xX] .
-.Pp
-No particular limit is imposed on the length of REs**.
-Programs intended to be portable should not employ REs longer
-than 256 bytes,
-as an implementation can refuse to accept such REs and remain
-POSIX-compliant.
-.Pp
-The following is a list of extended regular expressions:
-.Bl -tag -width Ds
-.It Ar c
-Any character
-.Ar c
-not listed below matches itself.
-.It \e Ns Ar c
-Any backslash-escaped character
-.Ar c
-matches itself.
-.It \&.
-Matches any single character that is not a newline
-.Pq Sq \en .
-.It Bq Ar char-class
-Matches any single character in
-.Ar char-class .
-To include a
-.Ql \&]
-in
-.Ar char-class ,
-it must be the first character.
-A range of characters may be specified by separating the end characters
-of the range with a
-.Ql - ;
-e.g.\&
-.Ar a-z
-specifies the lower case characters.
-The following literal expressions can also be used in
-.Ar char-class
-to specify sets of characters:
-.Bd -unfilled -offset indent
-[:alnum:] [:cntrl:] [:lower:] [:space:]
-[:alpha:] [:digit:] [:print:] [:upper:]
-[:blank:] [:graph:] [:punct:] [:xdigit:]
-.Ed
-.Pp
-If
-.Ql -
-appears as the first or last character of
-.Ar char-class ,
-then it matches itself.
-All other characters in
-.Ar char-class
-match themselves.
-.Pp
-Patterns in
-.Ar char-class
-of the form
-.Eo [.
-.Ar col-elm
-.Ec .]\&
-or
-.Eo [=
-.Ar col-elm
-.Ec =]\& ,
-where
-.Ar col-elm
-is a collating element, are interpreted according to
-.Xr setlocale 3
-.Pq not currently supported .
-.It Bq ^ Ns Ar char-class
-Matches any single character, other than newline, not in
-.Ar char-class .
-.Ar char-class
-is defined as above.
-.It ^
-If
-.Sq ^
-is the first character of a regular expression, then it
-anchors the regular expression to the beginning of a line.
-Otherwise, it matches itself.
-.It $
-If
-.Sq $
-is the last character of a regular expression,
-it anchors the regular expression to the end of a line.
-Otherwise, it matches itself.
-.It [[:<:]]
-Anchors the single character regular expression or subexpression
-immediately following it to the beginning of a word.
-.It [[:>:]]
-Anchors the single character regular expression or subexpression
-immediately following it to the end of a word.
-.It Pq Ar re
-Defines a subexpression
-.Ar re .
-Any set of characters enclosed in parentheses
-matches whatever the set of characters without parentheses matches
-(that is a long-winded way of saying the constructs
-.Sq (re)
-and
-.Sq re
-match identically).
-.It *
-Matches the single character regular expression or subexpression
-immediately preceding it zero or more times.
-If
-.Sq *
-is the first character of a regular expression or subexpression,
-then it matches itself.
-The
-.Sq *
-operator sometimes yields unexpected results.
-For example, the regular expression
-.Ar b*
-matches the beginning of the string
-.Qq abbb
-(as opposed to the substring
-.Qq bbb ) ,
-since a null match is the only leftmost match.
-.It +
-Matches the singular character regular expression
-or subexpression immediately preceding it
-one or more times.
-.It ?
-Matches the singular character regular expression
-or subexpression immediately preceding it
-0 or 1 times.
-.Sm off
-.It Xo
-.Pf { Ar n , m No }\ \&
-.Pf { Ar n , No }\ \&
-.Pf { Ar n No }
-.Xc
-.Sm on
-Matches the single character regular expression or subexpression
-immediately preceding it at least
-.Ar n
-and at most
-.Ar m
-times.
-If
-.Ar m
-is omitted, then it matches at least
-.Ar n
-times.
-If the comma is also omitted, then it matches exactly
-.Ar n
-times.
-.It \*(Ba
-Used to separate patterns.
-For example,
-the pattern
-.Sq cat\*(Badog
-matches either
-.Sq cat
-or
-.Sq dog .
-.El
-.Sh BASIC REGULAR EXPRESSIONS
-Basic regular expressions differ in several respects:
-.Bl -bullet -offset 3n
-.It
-.Sq \*(Ba ,
-.Sq + ,
-and
-.Sq ?\&
-are ordinary characters and there is no equivalent
-for their functionality.
-.It
-The delimiters for bounds are
-.Sq \e{
-and
-.Sq \e} ,
-with
-.Sq {
-and
-.Sq }
-by themselves ordinary characters.
-.It
-The parentheses for nested subexpressions are
-.Sq \e(
-and
-.Sq \e) ,
-with
-.Sq (
-and
-.Sq )\&
-by themselves ordinary characters.
-.It
-.Sq ^
-is an ordinary character except at the beginning of the
-RE or** the beginning of a parenthesized subexpression.
-.It
-.Sq $
-is an ordinary character except at the end of the
-RE or** the end of a parenthesized subexpression.
-.It
-.Sq *
-is an ordinary character if it appears at the beginning of the
-RE or the beginning of a parenthesized subexpression
-(after a possible leading
-.Sq ^ ) .
-.It
-Finally, there is one new type of atom, a
-.Em back-reference :
-.Sq \e
-followed by a non-zero decimal digit
-.Ar d
-matches the same sequence of characters matched by the
-.Ar d Ns th
-parenthesized subexpression
-(numbering subexpressions by the positions of their opening parentheses,
-left to right),
-so that, for example,
-.Sq \e([bc]\e)\e1
-matches
-.Sq bb\&
-or
-.Sq cc
-but not
-.Sq bc .
-.El
-.Pp
-The following is a list of basic regular expressions:
-.Bl -tag -width Ds
-.It Ar c
-Any character
-.Ar c
-not listed below matches itself.
-.It \e Ns Ar c
-Any backslash-escaped character
-.Ar c ,
-except for
-.Sq { ,
-.Sq } ,
-.Sq \&( ,
-and
-.Sq \&) ,
-matches itself.
-.It \&.
-Matches any single character that is not a newline
-.Pq Sq \en .
-.It Bq Ar char-class
-Matches any single character in
-.Ar char-class .
-To include a
-.Ql \&]
-in
-.Ar char-class ,
-it must be the first character.
-A range of characters may be specified by separating the end characters
-of the range with a
-.Ql - ;
-e.g.\&
-.Ar a-z
-specifies the lower case characters.
-The following literal expressions can also be used in
-.Ar char-class
-to specify sets of characters:
-.Bd -unfilled -offset indent
-[:alnum:] [:cntrl:] [:lower:] [:space:]
-[:alpha:] [:digit:] [:print:] [:upper:]
-[:blank:] [:graph:] [:punct:] [:xdigit:]
-.Ed
-.Pp
-If
-.Ql -
-appears as the first or last character of
-.Ar char-class ,
-then it matches itself.
-All other characters in
-.Ar char-class
-match themselves.
-.Pp
-Patterns in
-.Ar char-class
-of the form
-.Eo [.
-.Ar col-elm
-.Ec .]\&
-or
-.Eo [=
-.Ar col-elm
-.Ec =]\& ,
-where
-.Ar col-elm
-is a collating element, are interpreted according to
-.Xr setlocale 3
-.Pq not currently supported .
-.It Bq ^ Ns Ar char-class
-Matches any single character, other than newline, not in
-.Ar char-class .
-.Ar char-class
-is defined as above.
-.It ^
-If
-.Sq ^
-is the first character of a regular expression, then it
-anchors the regular expression to the beginning of a line.
-Otherwise, it matches itself.
-.It $
-If
-.Sq $
-is the last character of a regular expression,
-it anchors the regular expression to the end of a line.
-Otherwise, it matches itself.
-.It [[:<:]]
-Anchors the single character regular expression or subexpression
-immediately following it to the beginning of a word.
-.It [[:>:]]
-Anchors the single character regular expression or subexpression
-immediately following it to the end of a word.
-.It \e( Ns Ar re Ns \e)
-Defines a subexpression
-.Ar re .
-Subexpressions may be nested.
-A subsequent backreference of the form
-.Pf \e Ns Ar n ,
-where
-.Ar n
-is a number in the range [1,9], expands to the text matched by the
-.Ar n Ns th
-subexpression.
-For example, the regular expression
-.Ar \e(.*\e)\e1
-matches any string consisting of identical adjacent substrings.
-Subexpressions are ordered relative to their left delimiter.
-.It *
-Matches the single character regular expression or subexpression
-immediately preceding it zero or more times.
-If
-.Sq *
-is the first character of a regular expression or subexpression,
-then it matches itself.
-The
-.Sq *
-operator sometimes yields unexpected results.
-For example, the regular expression
-.Ar b*
-matches the beginning of the string
-.Qq abbb
-(as opposed to the substring
-.Qq bbb ) ,
-since a null match is the only leftmost match.
-.Sm off
-.It Xo
-.Pf \e{ Ar n , m No \e}\ \&
-.Pf \e{ Ar n , No \e}\ \&
-.Pf \e{ Ar n No \e}
-.Xc
-.Sm on
-Matches the single character regular expression or subexpression
-immediately preceding it at least
-.Ar n
-and at most
-.Ar m
-times.
-If
-.Ar m
-is omitted, then it matches at least
-.Ar n
-times.
-If the comma is also omitted, then it matches exactly
-.Ar n
-times.
-.El
-.Sh SEE ALSO
-.Xr ctype 3 ,
-.Xr regex 3
-.Sh STANDARDS
-.St -p1003.1-2004 :
-Base Definitions, Chapter 9 (Regular Expressions).
-.Sh BUGS
-Having two kinds of REs is a botch.
-.Pp
-The current POSIX spec says that
-.Sq )\&
-is an ordinary character in the absence of an unmatched
-.Sq ( ;
-this was an unintentional result of a wording error,
-and change is likely.
-Avoid relying on it.
-.Pp
-Back-references are a dreadful botch,
-posing major problems for efficient implementations.
-They are also somewhat vaguely defined
-(does
-.Sq a\e(\e(b\e)*\e2\e)*d
-match
-.Sq abbbd ? ) .
-Avoid using them.
-.Pp
-POSIX's specification of case-independent matching is vague.
-The
-.Dq one case implies all cases
-definition given above
-is the current consensus among implementors as to the right interpretation.
-.Pp
-The syntax for word boundaries is incredibly ugly.
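
The matching rules in the removed page (earliest start first, then longest match, and the ERE/BRE split) can be tried out through the regex(3) interface the page refers to. The following is a minimal sketch in C, assuming a POSIX <regex.h>; re_span is a hypothetical helper name introduced here for illustration and is not part of the manual page or of LLVM.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (an assumption, not from the manual page):
 * compile `pattern` with the given regcomp(3) flags, search `text`,
 * and report the span of the overall match as [*start, *end).
 *
 * cflags = REG_EXTENDED selects EREs; cflags = 0 selects BREs.
 *
 * Returns 1 on a match, 0 on no match (or a regexec error),
 * and -1 if the pattern fails to compile.
 */
int
re_span(const char *pattern, int cflags, const char *text,
    int *start, int *end)
{
	regex_t re;
	regmatch_t m[1];
	int rc;

	assert(pattern != NULL && text != NULL);
	assert(start != NULL && end != NULL);

	if (regcomp(&re, pattern, cflags) != 0)
		return -1;

	/* m[0] receives the offsets of the whole match. */
	rc = regexec(&re, text, 1, m, 0);
	regfree(&re);
	if (rc != 0)
		return 0;

	*start = (int)m[0].rm_so;
	*end = (int)m[0].rm_eo;
	return 1;
}
```

Under these assumptions, searching "abbbc" with the ERE "bb*" should report the span 1 to 4 (the three middle characters), and searching "abbb" with "b*" should report the empty span 0 to 0, since the null match at the start is the leftmost one. Passing 0 instead of REG_EXTENDED compiles the pattern as a BRE, so the back-reference example from the page is written "\\([bc]\\)\\1" as a C string literal; it should match "bb" but not "bc".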