Writing a Real C++ Program – Part 3

August 2, 2011

This is the third instalment in a series of C++ programming tutorials that started here.

Introduction

In the previous instalment, the word dictionary you created was instantiated by reading its contents from a file. File input/output is one of the most important aspects of real C++ programming, so I propose to provide here a short general tutorial, covering things that introductory text books tend to gloss over. I’ll then go on to look at error reporting, where you can get real data to populate the dictionary, and consider if what you have written so far is fast enough.

Text versus Binary Data

Wherever possible, you should make your C++ programs deal with textual data (i.e. data you could create with your favourite text editor). Doing so has a lot of advantages – you can read the data easily, tests are easy to write, and you can edit it if there is a problem. However, sometimes you need to use binary data, which is just a bunch of bytes. If you open this with a text editor, you will see a lot of weird characters, and if you edit and save it, you stand a very high chance of damaging the data. Binary data is used when you want to write something like a structure to a file as a blob of bytes, and then read it back in again.
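To make the idea of a "blob of bytes" concrete, here is a minimal sketch of writing a small structure to a stream and reading it back. The Header structure and the function names are invented for this illustration, and a string stream stands in for a file; with a real file you would open the stream with the std::ios::binary flag. This approach only suits simple structures with no pointers, strings or other class members, and the bytes written are not portable between machines with different data layouts.

```cpp
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical fixed-layout record, invented for this example.
struct Header {
    int version;
    int wordCount;
};

// Write the raw bytes of 'h' to 'out'; returns false if the write failed.
bool writeHeader( std::ostream & out, const Header & h ) {
    out.write( reinterpret_cast<const char *>( &h ), sizeof h );
    return ! out.fail();
}

// Read the raw bytes back into 'h'; returns false on a short or failed read.
bool readHeader( std::istream & in, Header & h ) {
    in.read( reinterpret_cast<char *>( &h ), sizeof h );
    return ! in.fail();
}

// Usage sketch: round-trip a Header through an in-memory stream.
bool roundTrip( const Header & original, Header & copy ) {
    std::stringstream buf( std::ios::in | std::ios::out | std::ios::binary );
    return writeHeader( buf, original ) && readHeader( buf, copy );
}
```

Notice that both functions test the stream after the operation, exactly as you would for text input and output.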

Streams

The basic tool for implementing input and output in C++ is the stream. Most beginner C++ programmers are somewhat familiar with the cin and cout streams, provided by the standard library, but most are not really familiar with how to use them.

The first thing you need to forget from your introductory text is any idea of using formatted input from cin via the >> extraction operator. Code like this has no place in a real C++ program:

int n;
cin >> n;

There are any number of reasons why this code is a bad idea, but the main one is that you cannot guarantee what the type of the next thing in the stream is, and if you get it wrong, error recovery and reporting can be difficult to say the least. In all cases, real C++ programs that read text data (the rules for binary data are somewhat different) should read data without expecting specific types, and then parse it. In outline, your text-processing code should look something like this:

string line;
while( getline( cin, line ) )  {
    parse( line );
}

Here, parse() is a function you have written which deals with lines of input, splitting them up and checking the types of their components. You’ll need a parser when you come to extract words from user submissions later in the project. You didn’t need a parser for your dictionary reading code as effectively getline() itself is a parser for parsing text into lines, which luckily is exactly what we wanted.
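As a rough illustration of what such a parsing function might look like, here is a minimal sketch which assumes that each line holds a word followed by an integer count. The WordCount structure, the parseWordCount() name and the line format are all invented for this example and are not part of the Scheck project. Formatted extraction is used here on a string stream holding a single line, so a bad line can simply be rejected without leaving cin in a failed state.

```cpp
#include <sstream>
#include <string>

// Hypothetical record type: one word and an integer count per line.
struct WordCount {
    std::string word;
    int count;
};

// Parse a line of the form "<word> <count>". Returns true and fills 'wc'
// if the line matches that format; returns false otherwise.
bool parseWordCount( const std::string & line, WordCount & wc ) {
    std::istringstream is( line );
    std::string word;
    int count;
    if ( ! ( is >> word >> count ) ) {
        return false;       // a field is missing, or count is not an integer
    }
    std::string extra;
    if ( is >> extra ) {
        return false;       // unexpected trailing text on the line
    }
    wc.word = word;
    wc.count = count;
    return true;
}
```

The caller can then decide what to do with a rejected line, for example reporting it along with its line number.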

The code above is the canonical form for reading text in C++, where "canonical" means "do it this way unless you have a very, very good reason not to." Unfortunately, many beginner C++ programmers believe that this is the right way to do things:

string line;
while( ! cin.eof() ) {
    getline( cin, line );
    // do stuff with line
}

This actually looks sensible, but unfortunately it is not. The eof() function returns the end-of-file state of a stream after an input operation has taken place – it does not predict what the result of the next input operation will be. To see why eof() doesn’t work, consider this code, which prints lines with line numbers:

string line;
int n = 1;
while( ! cin.eof() ) {
    getline( cin, line );
    cout << n++ << " " << line << "\n";
}

Now, consider what happens if you hand this code an empty file (i.e. one with a zero size, containing no data whatsoever.) The test for end-of-file passes, as nothing has been read yet, and so the loop is entered. Then the call to getline() fails, but as this failure is never tested for, nothing happens. And then the line number and line are output – a line which does not exist in the input file! The code should have been written as:

string line;
int n = 1;
while( getline( cin, line ) ) {
    cout << n++ << " " << line << "\n";
}

which for an empty file outputs the correct results – nothing.

To summarise the above – never use eof() to control a read-loop (it is OK to use it to test for the condition which may have caused the loop to terminate, but do this outside of the loop) and always test the results of every single input operation. You should also normally test every file output operation, and report an error if one occurs, although output is typically far less prone to failure than is input.
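Putting that summary into code, a sketch of the line-numbering example with every operation tested might look like this. The function names numberLines() and numberLinesDemo() are invented for the illustration, and the demo uses string streams in place of cin and cout so that it can be run without any files.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Copy lines from 'in' to 'out', prefixing each with its line number.
// Returns true only if every write succeeded and input ended at end-of-file.
bool numberLines( std::istream & in, std::ostream & out ) {
    std::string line;
    int n = 1;
    while( std::getline( in, line ) ) {
        if ( ! ( out << n++ << " " << line << "\n" ) ) {
            return false;   // an output operation failed
        }
    }
    return in.eof();        // false if the loop stopped for some other reason
}

// Usage sketch: run numberLines() over in-memory text.
bool numberLinesDemo( const std::string & text, std::string & result ) {
    std::istringstream in( text );
    std::ostringstream out;
    bool ok = numberLines( in, out );
    result = out.str();
    return ok;
}
```

Here eof() is only consulted after the loop has finished, to find out why it finished.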

All of this raises the question – how does this test actually work:

while( getline( cin, line ) ) {

If you were to look at the definition of getline() in the Standard Library, you would see that it is implemented something like this:

istream & getline( istream & is, string & s ) {
    // read input into 's' from 'is' somehow
    return is;
}

In other words, getline() returns the stream that is its first parameter, so the while loop is testing the stream after the read operation. How is it doing that? Well, the compiler knows that while() needs a built-in type (integer, pointer, bool etc.) and will look to see if the thing actually being used has a conversion operator to such a type. As it happens, input streams have a conversion to void pointer that looks something like this:

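class istream {
    ...
    operator void *() {
        // fail() is true if failbit or badbit is set
        if ( fail() ) {
            return 0;
        }
        else {
            return this;
        }
    }
};

So if the previous operation (in this case getline()) failed, which is what happens when it tries to read at end-of-file and finds no more data, the type conversion operator returns zero, and the while() test fails. Note that it is the failure that is tested, not eof() itself: a final line with no trailing newline sets the end-of-file state but is still read successfully. (C++11 replaces the void pointer conversion with an explicit conversion to bool, but the loop test works in the same way.)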
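A small experiment shows what the loop test actually reports. In this sketch (the ReadState structure and readOnce() function are invented for the illustration) the idea is to read twice from a stream that holds a single line with no trailing newline. The first read succeeds even though it leaves the stream at end-of-file, and only the second read, which finds nothing, makes the test fail. Strictly speaking, then, the conversion reports whether the last operation failed, rather than eof() on its own.

```cpp
#include <istream>
#include <sstream>
#include <string>

// State of a stream immediately after one getline() call.
struct ReadState {
    bool testPassed;    // what a while() or if() test on the stream sees
    bool atEof;         // what eof() reports
};

// Perform a single getline() on 'is' and record the resulting state.
ReadState readOnce( std::istream & is ) {
    std::string line;
    ReadState rs;
    rs.testPassed = std::getline( is, line ) ? true : false;
    rs.atEof = is.eof();
    return rs;
}
```

Called twice on a std::istringstream constructed from "last line", the first call gives testPassed and atEof both true, and the second gives testPassed false.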

File Streams

As well as the cin and cout streams linked to standard input and output, C++ also provides streams that can be used with named files. When performing text processing, you will almost always want to read files using an ifstream and write them using an ofstream. There is a third stream type, fstream, which can be used for both reads and writes, but this is rarely used for text processing.

The constructor for file stream objects can take the name of the file to open as a parameter. For example, this code opens an output stream:

ofstream out( "results.txt" );

It is always a good idea to check that opening the stream actually did work, as it is an operation that is quite prone to failure:

ofstream out( "results.txt" );
if ( ! out.is_open() ) {
    // report error somehow
}

The constructor parameter must be a char *, and this leads to one of the more irritating syntax errors your C++ compiler can produce. This code (note the use of a string containing the file name):

string fname = "results.txt";
ofstream out( fname );

produces this error message:

error: no matching function for call to 
  'basic_ofstream<char>::basic_ofstream(string&)'

You can stare at the offending code for a long time before you remember that ofstream is a typedef for basic_ofstream<char>, that the stream constructor parameter has to be a char *, and that C++ provides no automatic conversion from string to char *. You need to rewrite the code:

string fname = "results.txt";
ofstream out( fname.c_str() );

This is a major annoyance, and one that bites me personally at least once a week, but I’m afraid you have to live with it unless your compiler supports C++11, which adds file stream constructors that take a std::string.

Back To The Code

Let’s apply the guidelines above to the code you have written so far. Your dictionary constructor currently looks like this:

Dictionary( const std::string & fname ) {
    std::ifstream wlist( fname.c_str() );
    std::string word;
    while( std::getline( wlist, word ) ) {
        mWords.insert( word );
    }
}

You need to test that the file was opened correctly, and that the read-loop actually terminates because of an end-of-file, and not for some other reason. Applying what you now know (or have been reminded of) about file I/O, change your code:

Dictionary( const std::string & fname ) {
    std::ifstream wlist( fname.c_str() );
    if ( ! wlist.is_open() ) {
        // report open error
    }
    std::string word;
    while( std::getline( wlist, word ) ) {
        mWords.insert( word );
    }
    if ( ! wlist.eof() ) {
        // report read error
    }
}

Ah – slight snag! How are you going to report an error? Well, C++ gives you several alternatives. You could:

return a special error code
set a flag in an object that can be tested to see if an error occurred
set a global error number, as in C
throw an exception

I’m going to suggest that only the last is a general solution (you can’t for example return error codes from object construction) and that throwing exceptions should be the normal error handling strategy for a C++ program. Some people may dispute this, citing all sorts of factors from performance to difficulty of design, but my experience is that using exceptions leads to the cleanest, clearest code, and causes few performance problems. Note that I am only talking about dealing with errors here – one should not use exceptions as a general value return or flow control mechanism.

Throwing Exceptions

So if you are going to throw exceptions, what kind of exceptions are they going to be? Well, you are spoiled for choice here – in C++ you can throw objects of any type that takes your fancy. One quick and dirty solution to our problem would be to throw character pointers:

if ( ! wlist.is_open() ) { 
    throw "could not open file";
}

This will work, but it has a couple of problems. First, building up the error message is difficult (in fact it’s impossible to do safely) and secondly you can’t differentiate your exceptions from other people’s. Better practice is to derive your own exception class from one of the C++ Standard Library exceptions. The one you should normally derive from is runtime_error, which implements error messages. To create your own exception type, add a new header file called error.h to your inc directory:

#ifndef INC_ERROR_H
#define INC_ERROR_H
#include <stdexcept>
#include <string>
class ScheckError : public std::runtime_error {
   public:
      ScheckError( const std::string & emsg ) 
         : std::runtime_error( emsg ) {
      }
};
#endif

Things to note here:

As with the dictionary header, you need to provide include guards.
The runtime_error class is defined in the <stdexcept> header, not in <exception> (I always forget this.)
You need to pass the error message from the derived ScheckError class to the base runtime_error class using an initialisation list. If you are not sure what an initialisation list is, you may need to refer back to your C++ text book.
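Because the message is now a std::string, building it up is easy. The sketch below repeats the ScheckError class so that it is self-contained, and adds two helper functions, throwAtLine() and messageOf(), which are invented for this illustration. It assembles a message with a string stream and shows that the error can also be caught through its runtime_error base class, with what() still returning the full message.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Same class as in error.h, repeated so the sketch is self-contained.
class ScheckError : public std::runtime_error {
   public:
      ScheckError( const std::string & emsg )
         : std::runtime_error( emsg ) {
      }
};

// Hypothetical helper: throw a ScheckError whose message includes
// a file name and a line number.
void throwAtLine( const std::string & fname, int lineNo ) {
    std::ostringstream os;
    os << "Bad entry in " << fname << " at line " << lineNo;
    throw ScheckError( os.str() );
}

// Catch the error via its base class and return the message it carries.
std::string messageOf( const std::string & fname, int lineNo ) {
    try {
        throwAtLine( fname, lineNo );
    }
    catch( const std::runtime_error & e ) {
        return e.what();
    }
    return "";
}
```

For example, messageOf( "words.dat", 7 ) returns the string "Bad entry in words.dat at line 7".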

You can now modify the dictionary class to include this header and use it to report the errors:

Dictionary( const std::string & fname ) {
    std::ifstream wlist( fname.c_str() );
    if ( ! wlist.is_open() ) {
        throw ScheckError( "Could not open dictionary file " + fname );
    }
    std::string word;
    while( std::getline( wlist, word ) ) {
        mWords.insert( word );
    }
    if ( ! wlist.eof() ) {
        throw ScheckError( "Error reading dictionary file " + fname );
    }
}

Before you add the exception handling code, it’s instructive to see what happens if you don’t handle them. Change your main.cpp file so that the dictionary creation code looks like this:

Dictionary d( "data/not-there.dat" );

and recompile:

g++ -I inc src/main.cpp -o bin/scheck

If you run your program now, the dictionary creation code should throw an exception, because the file open will fail. Exactly what happens after that will depend on your C++ implementation. The C++ Standard says that if an exception is not handled, the Standard C++ function terminate() must be called, but exactly what (if any) diagnostics terminate() produces is not specified. On my GCC implementation on Windows, I get:

terminate called after throwing an instance of 'ScheckError'

and then the program crashes, with several operating system error messages.

Obviously, it would be nicer if the program exited on error in a more controlled manner. You need to add exception handling to your program.

Exception Handling

The general rule of exception handling is that you handle exceptions as far away from the throw-site (where the exception originated) as is possible. In other words, you should catch exceptions, which are typically thrown in functions at low levels of abstraction, at the highest level of abstraction possible. This is because it is only at the higher levels that your program will know what to do with the exception – for example, how to report it, whether it is a fatal error, how to recover from it etc. You should normally not catch exceptions in the code that directly calls the function that threw the exception – in that case using a return value from the function to indicate the error might be a better bet. And you should always provide exception handling in main() to catch any exceptions that escape from the rest of the application.

In your case, main() is in fact the only obvious place to put the exception handling code. You can make this quite ornate, or quite simple – I’d suggest the latter. Catch the exceptions you expect your application to throw (ScheckErrors), and then everything else. Write diagnostic messages to the standard error stream, and return a fail status from main:

int main() {
    try {
      ... previous contents of main() now go here ...
    }
    catch( const ScheckError & e ) {
        cerr << "Error: " << e.what() << endl;
        return 1;
    }
    catch( ... ) {
        cerr << "Error: unknown exception" << endl;
        return 2;
    }
}

Note that exceptions should always be thrown by value (don’t create them with new) and always caught by const reference. This avoids all sorts of problems with memory management and makes sure that virtual functions in the exception hierarchy work properly. Exceptions should also always be caught in a well-defined order, with the most derived exception types being caught first. If you don’t do this, your specialised exception types will appear not to be caught.
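The effect of handler order is easy to demonstrate. In the sketch below, which repeats the ScheckError class to be self-contained and uses function names invented for the illustration, both functions throw the same ScheckError, but the one that lists the base class handler first never reaches its ScheckError handler. Many compilers will warn about the second function, which is exactly the point.

```cpp
#include <stdexcept>
#include <string>

// Same class as in error.h, repeated so the sketch is self-contained.
class ScheckError : public std::runtime_error {
   public:
      ScheckError( const std::string & emsg )
         : std::runtime_error( emsg ) {
      }
};

void thrower() {
    throw ScheckError( "oops" );
}

// Most derived type first: the ScheckError handler is chosen.
std::string derivedFirst() {
    try {
        thrower();
    }
    catch( const ScheckError & ) {
        return "ScheckError";
    }
    catch( const std::runtime_error & ) {
        return "runtime_error";
    }
    return "not thrown";
}

// Base type first: it matches the ScheckError too, so the
// second handler can never be reached.
std::string baseFirst() {
    try {
        thrower();
    }
    catch( const std::runtime_error & ) {
        return "runtime_error";
    }
    catch( const ScheckError & ) {
        return "ScheckError";
    }
    return "not thrown";
}
```

Handlers are tried in the order they are written, and the first one that matches wins, so derivedFirst() returns "ScheckError" while baseFirst() returns "runtime_error".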

You may not have realised it, but your ScheckError class has a virtual member function what() inherited from the std::exception base class. It returns a const char * pointing to the error message the exception contains, if the actual exception type supports error messages.

With this exception handling in place, your code should now produce much more civilised error messages:

Error: Could not open dictionary file data/not-there.dat

Getting Real Data

Having (at least for the moment) got error handling sorted, it’s time to turn to getting some real word lists for the dictionary – the "quick brown fox" stuff is not going to impress your boss! I did a bit of Googling, and found this site, which provides very complete English (US and UK) lists, suitable to be used directly by Scheck. I suggest you download the Scowl zip file featured there and get familiar with its contents. I’m not going to repeat the Scowl documentation here, but you need to use the mk-list Perl script to build your word lists. And if you haven’t got Perl installed on your development system, you really should do so. I used it to generate a 3.5MB word list to use for performance testing.

One thing that worried me about using a std::set as the basis for the dictionary was the speed at which words could be added to it from a file (I was very confident that search speed would not be a problem). With the 3.5MB word list I generated, I found that I could populate the dictionary in about 1.6 seconds on my less than state-of-the-art laptop. I would class this at the low end of acceptable, but still as acceptable. However, later in the series I will look at ways of speeding this up.
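If you want to take a similar measurement yourself, one rough way of doing it is sketched below. This is not necessarily how the figure above was obtained; the function names timedLoad() and loadFromText() are invented for the illustration, and a string stream stands in for the word list file. Note that std::clock() measures processor time, so time spent waiting for the disk may not be fully counted.

```cpp
#include <cstddef>
#include <ctime>
#include <istream>
#include <set>
#include <sstream>
#include <string>

// Read one word per line from 'in' into 'words' and return the
// processor time used, in seconds.
double timedLoad( std::istream & in, std::set<std::string> & words ) {
    std::clock_t start = std::clock();
    std::string word;
    while( std::getline( in, word ) ) {
        words.insert( word );
    }
    return double( std::clock() - start ) / CLOCKS_PER_SEC;
}

// Usage sketch: load an in-memory word list and report how many
// distinct words it contained.
std::size_t loadFromText( const std::string & text, double & seconds ) {
    std::istringstream in( text );
    std::set<std::string> words;
    seconds = timedLoad( in, words );
    return words.size();
}
```

With a real word list you would pass a std::ifstream to timedLoad() instead, and probably repeat the measurement a few times, as the first run may include the cost of reading the file from disk.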

Conclusion

That’s it for this instalment. Hopefully you have learned:

How to read text files – don’t use eof() to control read-loops.
You should always check all file I/O operations for errors.
Exceptions are the best general way of handling errors in C++ programs.
Your dictionary implementation looks as if it is going to be fast enough.

Coming next: Parsing words from the submission data.

Sources for this and all other tutorials in the series are available here.


One Comment

Joshua

This tutorial is amazing! The pedagogy and content are both superb. Thank you for providing this.

BTW: I arrived here from r/learnprogramming. This tutorial and the one about stringstreams are the most useful things since opposable thumbs.
