This manual describes flexc++, a tool which can generate lexical scanners: programs recognizing patterns in text. Usually, scanners are used in combination with parsers, which in turn can be generated, by bisonc++ for example.
Flexc++ reads a lexer file, containing rules: pairs of regular expressions and C++ code. Flexc++ then generates several files, defining a class (Scanner by default). The member function lex is used to analyze input for occurrences of the regular expressions. Whenever it finds a match, it executes the corresponding C++ code.
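The idea of pairing regular expressions with code can be sketched by hand. The fragment below is only an illustration and is not the code flexc++ generates: the names Rule and scan are hypothetical, std::regex stands in for the generated tables, and the first rule matching at the current position wins, which is a simplification of what a real scanner does.

```cpp
#include <cassert>
#include <functional>
#include <regex>
#include <string>
#include <vector>

// Hypothetical stand-in for one rule: a regular expression plus the code
// to run once it has matched. This is not what flexc++ generates.
struct Rule
{
    std::regex pattern;
    std::function<void(std::string const &)> action;
};

// Scans `input' from left to right. At each position the first rule that
// matches at exactly that position runs its action on the matched text.
// Characters matched by no rule are skipped.
void scan(std::string const &input, std::vector<Rule> const &rules)
{
    auto pos = input.cbegin();
    while (pos != input.cend())
    {
        bool found = false;
        for (Rule const &rule : rules)
        {
            std::smatch m;
            if (std::regex_search(pos, input.cend(), m, rule.pattern,
                                  std::regex_constants::match_continuous)
                && m.length(0) > 0)
            {
                assert(m.position(0) == 0);     // match starts at `pos'
                rule.action(m.str(0));
                pos += m.length(0);
                found = true;
                break;
            }
        }
        if (!found)
            ++pos;
    }
}
```

A caller would fill a std::vector<Rule> with one entry per pattern, for instance an identifier pattern whose action stores or prints the matched text, and then call scan on the text to analyze.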
Flexc++ is highly comparable to the programs flex and flex++, written by Vern Paxson. The goal was to create a similar program, but to completely implement it in C++. Most flex / flex++ grammars should be usable with flexc++, with minor adjustments (see also the differences with flex / flex++ 2).
This edition of the manual documents version 0.98.00 and provides detailed information on how to use flexc++ and how flexc++ works. Some texts are adapted from the flex manual. The manual page flexc++(1) offers a quick overview of the command line options and option directives.
The most recent version of both this manual and flexc++ itself can be found at our website http://flexcpp.org/. If you find bugs in flexc++ or mistakes in the documentation, please report them.
Flexc++ was designed and written by Frank B. Brokken, Jean-Paul van Oosten, and (until version 0.5.3) Richard Berendsen.
Flexc++, contrary to flex and flex++, generates code that is explicitly intended for use by C++ programs. The well-known flex(1) program generates C source-code and flex++(1) merely offers a C++-like shell around the yylex function generated by flex(1) and hardly supports present-day ideas about C++ software development.
Contrary to this, flexc++ creates a C++ class offering a predefined member function lex matching input against regular expressions and possibly executing C++ code once regular expressions were matched. The code generated by flexc++ is pure C++, allowing its users to apply all of the features offered by that language.
Flexc++'s synopsis is:
flexc++ [OPTIONS] rules-file
Its options are covered in section 1.1.1; the format of its rules-file is discussed in chapter 3.
header (-b)
Use header as the pathname of the file containing the scanner class's base class. Defaults to the name of the scanner class plus base.h.
skeleton (-C)
Use skeleton as the pathname of the file containing the skeleton of the scanner class's base class. Its filename defaults to flexc++base.h.
When this option is specified the resulting scanner does not distinguish between the following rules:

    First       // initial F is transformed to f
    first
    FIRST       // all capitals are transformed to lower case chars

With a case-insensitive scanner only the first rule can be matched, and flexc++ will issue warnings for the second and third rule about rules that cannot be matched.
Input processed by a case-insensitive scanner is also handled case insensitively. The above-mentioned First rule is matched for all of the following input words: first First FIRST firST.
Although the matching process proceeds case insensitively, the matched text (as returned by the scanner's matched() member) always contains the original, unmodified text. So, with the above input matched() returns, respectively, first, First, FIRST and firST, while matching the rule First.
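The behavior described above (case-insensitive matching, with the matched text left unmodified) can be imitated with the standard library. The function matchFirst below is a hypothetical helper, not part of any generated scanner; it uses std::regex::icase as a stand-in for the case-insensitive option.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns the text matched by the rule `First' when `word' is compared
// case insensitively, or an empty string when `word' does not match.
std::string matchFirst(std::string const &word)
{
    static std::regex const rule("First", std::regex::icase);

    std::smatch m;
    if (!std::regex_match(word, m, rule))
        return "";

    // regex_match covers the whole word, so the lengths agree
    assert(m.str(0).size() == word.size());
    return m.str(0);        // the original, unmodified input text
}
```

Calling matchFirst("firST") returns "firST" rather than "first" or "First": only the comparison ignores case, the returned text does not change.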
header (-c)
Use header as the pathname of the file containing the scanner class. Defaults to the name of the scanner class plus the suffix .h.
class
Use class (rather than Scanner) as the name of the scanner class. Unless overridden by other options generated files will be given the (transformed to lower case) class* name instead of scanner*.
skeleton (-C)
Use skeleton as the pathname of the file containing the skeleton of the scanner class. Its filename defaults to flexc++.h.
`rules-file'.output. Details cover the used character ranges, information about the regexes, the raw NFA states, and the final DFAs.
lex and its support functions with debugging code, showing the actual parsing process on the standard output stream. When included, the debugging output is active by default, but its activity may be controlled using the setDebug(bool on-off) member. Note that #ifdef DEBUG macros are not used anymore. By rerunning flexc++ without the --debug option an equivalent scanner is generated not containing the debugging code.
genericName (-f)
lex-function source file, see the --lex-source option for that). By default the header file names will be equal to the name of the generated class.
header (-i)
Use header as the pathname of the file containing the implementation header. Defaults to the name of the generated scanner class plus the suffix .ih. The implementation header should contain all directives and declarations only used by the implementations of the scanner's member functions. It is the only header file that is included by the source file containing lex()'s implementation. User-defined implementations of other class members may use the same convention, thus concentrating all directives and declarations that are required for the compilation of other source files belonging to the scanner class in one header file.
skeleton (-I)
Use skeleton as the pathname of the file containing the skeleton of the implementation header. Its filename defaults to flexc++.ih.
Scanner(std::istream &in, std::ostream &out) constructor, by default assuming that input is read from std::cin.
skeleton (-L)
Use skeleton as the pathname of the file containing the lex() member function's skeleton. Its filename defaults to flexc++.cc.
funname
Use funname rather than lex as the name of the member function performing the lexical scanning.
source (-l)
Use source as the name of the source file containing the scanner member function lex. Defaults to lex.cc.
--debug option.
Displaying the matched rules can be suppressed by calling the generated scanner's member setDebug(false) (or, of course, by re-generating the scanner without specifying --matched-rules).
depth (-m)
depth. By default the maximum depth is set to 10. When more than depth specification files are used the scanner throws a Max stream stack size exceeded std::length_error exception.
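The depth limit can be pictured as a bounded stack of nested specification files. The class FileStack below is a hypothetical sketch of that idea, not flexc++'s own implementation; only the default of 10, the exception type, and the message text are taken from the description above. Whether the initial file counts toward the limit is an assumption made here.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stack of nested specification files with a depth limit.
class FileStack
{
    std::size_t d_maxDepth;
    std::vector<std::string> d_files;

    public:
        explicit FileStack(std::size_t maxDepth = 10)
        :
            d_maxDepth(maxDepth)
        {
            assert(maxDepth > 0);
        }

        // Enters a nested file; throws once the limit would be exceeded.
        void push(std::string const &name)
        {
            if (d_files.size() == d_maxDepth)
                throw std::length_error("Max stream stack size exceeded");
            d_files.push_back(name);
        }

        // Leaves the most recently entered file.
        void pop()
        {
            assert(!d_files.empty());
            d_files.pop_back();
        }

        std::size_t depth() const
        {
            return d_files.size();
        }
};
```

A program catching std::length_error around push can report which file tried to nest too deeply.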
namespace (-n)
namespace. By default no namespace is defined. If this option is used the implementation header will contain a commented out using namespace declaration for the requested namespace.
lex function. By default #line directives are entered at the beginning of the action statements in the generated lex.cc file, allowing the compiler and debuggers to associate errors with lines in your grammar specification file, rather than with the source file containing the lex function itself.
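The effect of a #line directive can be observed directly in standard C++. In the sketch below the line number 12 and the file name "lexer" are made-up values standing in for a position in a specification file; the struct Location and the function actionLocation are hypothetical as well.

```cpp
#include <cassert>
#include <string>

struct Location
{
    int line;
    std::string file;
};

// Reports the position the compiler associates with the statement that
// follows the #line directive.
Location actionLocation()
{
#line 12 "lexer"
    Location here{ __LINE__, __FILE__ };    // the compiler calls this line 12
    assert(here.line == 12 && here.file == "lexer");
    return here;
}
```

Diagnostics and debugger positions for the statements following the directive now refer to "lexer" rather than to the file that actually contains the code, which is the same mechanism the generated lex.cc relies on. Note that the directive stays in effect for the rest of the source file.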
lex member function is (re)written each time flexc++ is called. This option should normally be avoided, as this file contains parsing tables which are altered whenever the grammar definition is modified.
This option does not result in the generated program displaying returned tokens and matched text. If that is what you want, use the --print-tokens option.
lex function are displayed on the standard output stream, just before returning the token to lex's caller. Displaying tokens and matched text is suppressed again when the lex.cc file is generated without using this option. The function showing the tokens (ScannerBase::print__) is called from Scanner::printTokens, which is defined in-line in scanner.h. Calling ScannerBase::print__, therefore, can also easily be controlled by an option controlled by the program using the scanner object.
This option does not show the tokens returned and text matched by flexc++ itself when reading its input. If that is what you want, use the --own-tokens option.
directory (-S)
-B -C, -H, and -I).
directory
--target-directory option does not affect files that were explicitly named (either as option or as directive).
--construction and --show-filenames options.
    %%
    [_a-zA-Z][_a-zA-Z0-9]*      return 1;
The main() function below defines a local Scanner object, and calls lex() as long as it does not return 0. lex() will return 0 if the end of the input stream is reached. (By default std::cin will be used).
    #include <iostream>
    #include "scanner.h"

    using namespace std;

    int main()
    {
        Scanner scanner;

        while (scanner.lex())
            cout << "[Identifier: " << scanner.matched() << "]";

        return 0;
    }
Each identifier on the input stream is replaced by itself and some surrounding text. By default, flexc++ echoes all characters it cannot match to cout. If you do not want this, simply use the following pattern:
    %%
    [_a-zA-Z][_a-zA-Z0-9]*      return 1;
    .|\n                        // ignore
The second pattern causes the scanner to ignore all other characters on the input stream. The first pattern still matches all identifiers, even those that consist of only one letter, but everything else is ignored. The second pattern has no associated action, and that is precisely what happens in lex: nothing. The stream is simply scanned for more characters.
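The combined effect of these two patterns (keep identifiers, drop everything else) can be approximated without flexc++. The function identifiers below is a hypothetical helper built on std::sregex_iterator; it is a sketch of the behavior, not of the generated scanner.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Collects all identifiers found in `input', silently skipping every
// character that is not part of an identifier.
std::vector<std::string> identifiers(std::string const &input)
{
    static std::regex const identifier("[_a-zA-Z][_a-zA-Z0-9]*");

    std::vector<std::string> result;
    for (std::sregex_iterator it(input.begin(), input.end(), identifier), end;
         it != end; ++it)
    {
        assert(it->length() > 0);   // the pattern cannot match empty text
        result.push_back(it->str());
    }
    return result;
}
```

For the input "x = y1 + 42;" this yields the identifiers x and y1; the operators, the number, and the white space are dropped, just as the `.|\n' rule drops them.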
It is also possible to let the generated lexer do all the work. The simple lexer below will print out all identifiers itself.
    %%
    [_a-zA-Z][_a-zA-Z0-9]*      {
                                    std::cout << "[Identifier: " << matched() << "]\n";
                                }
    .|\n                        // ignore

Note how a compound statement may be used instead of a one line statement at the end of the line. The opening brace must appear on the same line as the pattern, however. Also note that inside an action, we can use members of Scanner. matched() returns the text that was last matched.
And below is a main() function that is used with the generated scanner.
    #include "scanner.h"

    int main()
    {
        Scanner scanner;
        scanner.lex();
        return 0;
    }
Note how simple the main function is. Scanner::lex() does not return until the entire input stream has been processed, because none of the patterns has an associated action with a return statement.
Command-line editing and history is provided by the GNU readline library. The bobcat library offers a class FBB::ReadLineStream encapsulating GNU's readline library's facilities, and this class is used to implement the requested features.
The lexical scanner is a simple one. It recognizes C++ identifiers and \n characters, and ignores all other characters. Here is its specification:
    %class-name Scanner
    %interactive

    %%

    [[:alpha:]_][[:alnum:]_]*   return 1;
    \n                          return '\n';
    .

Create the lexical scanner from this specification file:

    flexc++ lexer
Assuming that the directory containing the specification file also contains the file main.cc whose implementation is shown below, then execute the following command to create the interactive scanner program:

    g++ --std=c++0x *.cc -lbobcat

This completes the construction of the interactive scanner. Here is main.cc's source:
    #include <iostream>
    #include <bobcat/readlinestream>

    #include "scanner.h"

    using namespace std;
    using namespace FBB;

    int main()
    {
        ReadLineStream rls("? ");       // create the ReadLineStream, using "? "
                                        // as a prompt before each line
        Scanner scanner(rls);           // pass `rls' to the interactive scanner

                                        // process all the line's tokens
                                        // (the prompt is provided by `rls')
        while (int token = scanner.lex())
        {
            if (token == '\n')          // end of line: new prompt
                continue;
                                        // process other tokens
            cout << scanner.matched() << '\n';
            if (scanner.matched()[0] == 'q')
                return 0;
        }
    }

Here is an example of some interaction with the program. The end-of-line comments were not entered; they were added by us for documentary purposes:
    $ a.out
    ? hello world               // enter some words
    hello
    world                       // echoed after pressing Enter
    ? hello world               // this is shown after pressing up-arrow
    ? hello world^H^H^Hman      // do some editing and press Enter
    hello                       // the tokens as edited are returned
    woman
    ? q                         // end the program
    $

The interactive scanner only supports one constructor, by default using std::cin to read from and std::cout to write to:

    explicit Scanner(std::istream &in = std::cin, std::ostream &out = std::cout);

Furthermore, interactive scanners only support switching output streams (through switchOstream members).
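To make the three rules of the interactive scanner's specification concrete, here is a hand-written approximation reading from any std::istream. The functions nextToken and words are hypothetical and merely mimic the rules (identifier returns 1, newline returns '\n', anything else is ignored); the code generated by flexc++ is organized differently.

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Returns 1 for an identifier, '\n' for a newline, and 0 at end of input.
// All other characters are ignored. `matched' receives the matched text.
int nextToken(std::istream &in, std::string &matched)
{
    matched.clear();

    int ch;
    while ((ch = in.get()) != std::istream::traits_type::eof())
    {
        if (ch == '\n')
        {
            matched = "\n";
            return '\n';
        }
        if (std::isalpha(ch) || ch == '_')
        {
            matched += static_cast<char>(ch);
            while (std::isalnum(in.peek()) || in.peek() == '_')
                matched += static_cast<char>(in.get());
            return 1;
        }
        // any other character: ignored, like the `.' rule
    }
    return 0;
}

// Mirrors the loop in main.cc: collects the text of all identifier tokens.
std::vector<std::string> words(std::string const &text)
{
    std::istringstream in(text);
    std::vector<std::string> result;
    std::string matched;

    while (int token = nextToken(in, matched))
    {
        if (token == '\n')              // end of line: nothing to store
            continue;
        assert(!matched.empty());
        result.push_back(matched);
    }
    return result;
}
```

Because nextToken accepts a std::istream reference, the same function works with std::cin, a string stream, or a stream offering line editing, which is the property the interactive scanner's constructor relies on.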