Chapter 1: Introduction

This manual describes flexc++, a tool which can generate lexical scanners: programs recognizing patterns in text. Usually, scanners are used in combination with parsers, which in turn can be generated, by bisonc++ for example.

Flexc++ reads a lexer file, containing rules: pairs of regular expressions and C++ code. Flexc++ then generates several files, defining a class (Scanner by default). The member function lex is used to analyze input for occurrences of the regular expressions. Whenever it finds a match, it executes the corresponding C++ code.

Flexc++ closely resembles the programs flex and flex++, written by Vern Paxson. The goal was to create a similar program, implemented entirely in C++. Most flex / flex++ grammars should be usable with flexc++ after minor adjustments (see also chapter 2, on the differences with flex / flex++).

This edition of the manual documents version 0.98.00 and provides detailed information on how to use flexc++ and how flexc++ works. Some of its text is adapted from the flex manual. The manual page flexc++(1) offers a quick overview of the command line options and option directives.

The most recent version of both this manual and flexc++ itself can be found at our website, http://flexcpp.org/. If you find bugs in flexc++ or mistakes in the documentation, please report them.

Flexc++ was designed and written by Frank B. Brokken, Jean-Paul van Oosten, and (until version 0.5.3) Richard Berendsen.

1.1: Running Flexc++

Flexc++(1) was modeled on flex(1) and flex++(1). Like those two programs, flexc++ generates code that performs pattern matching on text and can execute actions when certain regular expressions are recognized.

Unlike flex and flex++, flexc++ generates code that is explicitly intended for use by C++ programs. The well-known flex(1) program generates C source code. Flex++(1) merely offers a C++-like shell around the yylex function generated by flex(1), and hardly supports present-day ideas about C++ software development.

In contrast, flexc++ creates a C++ class offering a predefined member function lex, which matches input against regular expressions and can execute C++ code once a regular expression has been matched. The code generated by flexc++ is pure C++, allowing its users to apply all of the features offered by that language.

Flexc++'s synopsis is:

flexc++ [OPTIONS] rules-file

Its options are covered in section 1.1.1; the format of its rules-file is discussed in chapter 3.

1.1.1: Flexc++ options

If available, single letter options are listed between parentheses following their associated long-option variants. Single letter options require arguments if their associated long options require arguments as well.

1.2: Some simple examples

1.2.1: A simple lexer file and main function

The following lexer file detects identifiers:

%%
[_a-zA-Z][_a-zA-Z0-9]* return 1;

The main() function below defines a local Scanner object and calls lex() for as long as it does not return 0. lex() returns 0 when the end of the input stream is reached. (By default, std::cin is used.)

#include <iostream>
#include "scanner.h"

using namespace std;

int main()
{
	Scanner scanner;
	while (scanner.lex())
		cout << "[Identifier: " << scanner.matched() << "]";

	return 0;
}

In the output, each identifier is replaced by itself plus some surrounding text. All other input is copied unchanged, because by default the scanner generated by flexc++ echoes all characters it cannot match to cout. If you do not want this, add a second rule:

%%
[_a-zA-Z][_a-zA-Z0-9]*		return 1;
.|\n						// ignore

The second pattern causes the scanner to ignore every character that is not part of an identifier. The first pattern still matches all identifiers, even those consisting of a single letter: as in flex, the longest match wins, and when two rules match the same text the rule listed first is used. Everything else is ignored. The second pattern has no associated action, and that is precisely what happens in lex: nothing. The stream is simply scanned for more characters.

It is also possible to let the generated lexer do all the work. The simple lexer below will print out all identifiers itself.

%%
[_a-zA-Z][_a-zA-Z0-9]*		{
	std::cout << "[Identifier: " << matched() << "]\n";
}

.|\n						// ignore

Note how a compound statement may be used instead of a one-line statement at the end of the line. The opening brace must appear on the same line as the pattern, however. Also note that inside an action we can use members of Scanner: matched() returns the text that was last matched. Below is a main() function that can be used with the generated scanner.

#include "scanner.h"

int main()
{
	Scanner scanner;
	scanner.lex();

	return 0;
}

Note how simple the main function is. Scanner::lex() does not return until the entire input stream has been processed, because none of the patterns has an associated action with a return statement.

1.2.2: An interactive scanner supporting command-line editing

The flexc++(1) manual page contains an example of an interactive scanner. Let's add command-line editing and command-line history to that scanner.

Command-line editing and history are provided by the GNU readline library. The bobcat library offers a class FBB::ReadLineStream encapsulating the readline library's facilities, and this class is used to implement the requested features.

The lexical scanner is a simple one. It recognizes C++ identifiers and \n characters, and ignores all other characters. Here is its specification:


%class-name Scanner
%interactive
%%
[[:alpha:]_][[:alnum:]_]*   return 1;
\n                          return '\n';
.
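
The identifier rule above uses POSIX character classes, where the earlier examples spelled out the character ranges. The following sketch checks that the two notations accept the same strings. It uses std::regex from the standard library purely as a checking aid (flexc++ has its own pattern matcher), and it assumes the default "C" locale. The names isIdentifier, rangePattern and classPattern are hypothetical.

```cpp
#include <regex>
#include <string>

// The two identifier patterns used in this chapter.
std::string const rangePattern = "[_a-zA-Z][_a-zA-Z0-9]*";
std::string const classPattern = "[[:alpha:]_][[:alnum:]_]*";

// Returns true if `text' as a whole is matched by `pattern'.
bool isIdentifier(std::string const &text, std::string const &pattern)
{
    return std::regex_match(text, std::regex(pattern));
}
```

Under these assumptions both patterns accept hello and _x1, and both reject 9abc and the empty string.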
    
Create the lexical scanner from this specification file:

    flexc++ lexer
        

Assuming that the directory containing the specification file also contains the file main.cc, whose implementation is shown below, execute the following command to create the interactive scanner program:


    g++ --std=c++0x *.cc -lbobcat
        
This completes the construction of the interactive scanner. Here is main.cc's source:

#include <iostream>
#include <bobcat/readlinestream>

#include "scanner.h"

using namespace std;
using namespace FBB;

int main()
{
    ReadLineStream rls("? ");       // create the ReadLineStream, using "? "
                                    // as a prompt before each line
                                    
    Scanner scanner(rls);           // pass `rls' to the interactive scanner

                                    // process all the line's tokens
                                    // (the prompt is provided by `rls')
    while (int token = scanner.lex())
    {                                   
        if (token == '\n')          // end of line: new prompt
            continue;
                                    // process other tokens
        cout << scanner.matched() << '\n';
        if (scanner.matched()[0] == 'q')
            return 0;
    }
}
    
Here is an example of some interaction with the program. The end-of-line comments are not part of the input; we added them for documentation purposes:
   
    $ a.out
    ? hello world               // enter some words
    hello 
    world                       // echoed after pressing Enter
    ? hello world               // this is shown after pressing up-arrow
    ? hello world^H^H^Hman      // do some editing and press Enter
    hello                       // the tokens as edited are returned 
    woman
    ? q                         // end the program
    $
        
The interactive scanner only supports one constructor, by default using std::cin to read from and std::cout to write to:

    explicit Scanner(std::istream &in = std::cin,
                     std::ostream &out = std::cout);
        
Furthermore, interactive scanners only support switching output streams (through switchOstream members).