
Writing a Real C++ Program – Part 2

July 29, 2011

This is the second instalment in a series of C++ programming tutorials that started here.

Introduction

One of the documents you created when planning the project was a list of the various elements of the project, together with an estimate of their complexity. The reason for doing this can now be revealed – you need to know which part of the problem to attack first. In general, you should always go for the most difficult part of the problem first. There are several reasons for this. Firstly, if you can’t do the difficult part, there is simply no point in continuing with the project. Secondly, once the difficult part is done, you will have a much clearer idea of how long the project as a whole is going to take. And lastly, once the difficult part is out of the way, it won’t be a nagging worry, always hanging over your head.

So – difficult bit first, which from your original analysis is the dictionary used to do the spell-checking. You need to think a bit about what this element of the system requires and provides:

  • Users of the dictionary (i.e. other parts of the system) need to be able to ask "is this word spelled correctly?" and get the right answer.
  • They also need to get the answer quickly!
  • Dictionaries need to be loaded with lists of words – some of these are custom lists and some are not.

These requirements raise an interesting issue that wasn’t really covered in the project specification – what exactly is a "word"?  How do you get words from the submissions? And where are you going to get the word lists to load into the dictionary from? It looks like this might be a good point to update your system design diagram:

[Figure: updated system design diagram]

and your list of elements and issues:

| Element      | Complexity    | Issues |
|--------------|---------------|--------|
| Dictionary   | High          | Where to get content? How to search quickly? How to load quickly? Different kinds of dictionary. |
| Checker      | Moderate      | How to read submissions? How to get words? |
| Word         | Moderate/High | What exactly is a word? How to get words out of submissions? |
| Reports      | Moderate      | Formatting. Need to support XML later. |
| Word List    | Moderate?     | Where to get lists of English and technical words from? What is word list format? |
| Command Line | Low           | Maybe just use argc and argv. |

 

Finding new problems like this is extremely common when developing software, so there is nothing to worry about in the fact that the list above is growing. But note that the longer list inevitably means that the project duration is going to be different from what you thought – you should re-estimate whenever the list changes.

In order that you don’t get stuck on details, it’s often necessary to make quick arbitrary decisions. So for the moment, let’s say that a word is a sequence of non-space characters, and that a word list is a file containing words, one word per line. You can always revisit these decisions later.
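To make that working definition concrete, here is a minimal sketch of pulling "words" out of a line of text. The function name SplitWords is an assumption made for illustration, not something the project has defined yet:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits a line into "words", where a word is any run
// of non-space characters (the working definition adopted above).
std::vector<std::string> SplitWords( const std::string & line ) {
    std::istringstream is( line );
    std::vector<std::string> words;
    std::string word;
    // operator>> skips leading whitespace, then reads up to the next whitespace
    while( is >> word ) {
        words.push_back( word );
    }
    return words;
}
```

Note that with this definition punctuation stays attached to the word, so "quick," and "quick" are different words. That is one reason the decision may well need revisiting later.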

The Dictionary Class

To recap, you have decided that dictionaries can be used to look up words, and that the dictionary data is contained in files of word lists. How to implement this in C++?

C++ is a multi-paradigm language, which means that it offers lots of different ways of doing things. You could implement the dictionary as a group of functions, as a C programmer would. You might be able to do it in a functional style using templates. Or you could create a class.

I’m going to suggest that the last option (creating a class) is in this case the right one to go for. You use classes when you have some complexity you want to hide (and the dictionary looks as if it might be quite complicated) and/or when you have several kinds of a thing that are similar, but a bit different (you’ve got the English and the Tech dictionaries). However, a class is not always the correct thing to use, and you do not design C++ programs by creating classes – classes are an implementation feature.

The interface for the Dictionary class that I suggest you use looks like this [Note – in this tutorial series I am not going to address coding standards (things like indentation style, naming conventions etc.), but I will be using a consistent standard that you are free to copy or ignore as you choose]:

#include <string>

class Dictionary {
    public:
        Dictionary( const std::string & fname );
        bool Check( const std::string & word  ) const;
};

There are a few things to notice here. Firstly, the class has no default constructor – instead, it is constructed from a filename (the name of the file containing its word list). Most classes should not have default constructors, as a default constructed object cannot normally fulfil the responsibilities of the class – in this case it would not be able to look up words. Secondly, Standard Library strings are used where one might expect character pointers to be employed. It is almost always preferable to use std::string (defined in the <string> header file) over character pointers in C++ code, though as we will see later in the series, there are exceptions to this. Thirdly, parameters are passed to functions by reference to avoid copying overhead. And lastly, const is used to indicate that those parameters cannot be changed, and that the Check function can be called on const Dictionary instances. These are all standard features of C++, which I don’t propose to describe in any more detail here – if you are not familiar with any of them, it would probably be a good idea to go back to your introductory C++ book and read up on them before proceeding with this tutorial.

You should put the above code in a C++ header file called dictionary.h (remember, simplest names possible for files, all lower-case file names) and save it in the inc directory of your project tree. The class needs to be in a header file because it is potentially going to be used in several places in the scheck application, and wherever it is used the declaration needs to be seen by the using code.
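If the point about const is unclear, the following sketch may help. It uses a stand-in class with the same interface as Dictionary (it ignores the file name and accepts only "dog") and a made-up helper, CountMisspelt, that receives the dictionary by const reference. Both are assumptions for illustration only:

```cpp
#include <string>
#include <vector>

// Stand-in with the same interface as the tutorial's Dictionary. It ignores
// the file name and accepts only "dog". Illustrative only.
class Dictionary {
    public:
        Dictionary( const std::string & ) {}
        bool Check( const std::string & word ) const {
            return word == "dog";
        }
};

// Hypothetical helper: the dictionary and the word list are passed by const
// reference, so neither is copied. The call to d.Check() compiles only
// because Check is declared as a const member function.
int CountMisspelt( const Dictionary & d,
                   const std::vector<std::string> & words ) {
    int n = 0;
    for ( std::vector<std::string>::size_type i = 0; i < words.size(); ++i ) {
        if ( ! d.Check( words[i] ) ) {
            ++n;
        }
    }
    return n;
}
```

If you remove the const after Check's parameter list, the call inside CountMisspelt should no longer compile, because d is a reference to a const object.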

You can now try to use the Dictionary from the main.cpp file you created in the first tutorial. Modify main.cpp like this:

#include <iostream>
#include <string>
#include "dictionary.h"
using namespace std;

int main() {
    cout << "scheck version 0.1" << endl;
    Dictionary d( "mydict.dat" );
    string word = "dog";
    if ( d.Check( word ) ) {
        cout << word << " is OK\n";
    }
    else {
        cout << word << " is misspelt \n";
    }
}

and save it. You are now going to be taken on a short side-trip through some of the most common compiler error messages and their causes!

Compilation And Linker Errors

Let’s try compiling your new code with the compiler command line from the previous tutorial:

 g++ src/main.cpp -o bin/scheck

Oh dear! An immediate failure, with an error which will look similar to this (exact wording will be compiler dependent, and I have trimmed all the errors somewhat for presentation purposes):

fatal error: dictionary.h: No such file or directory
compilation terminated.

The compiler is telling you it cannot find the new header file you created. Note that it has no problems finding the <iostream> and <string> system header files, only yours. This is because we have not told it where to look for your project’s headers. The compiler option to do this is -I, so this tells it to look in the inc directory:

g++ -I inc src/main.cpp -o bin/scheck

The error regarding the header file should go away (if it doesn’t you have either the header file or the directory name wrong) and be replaced with the even more intimidating:

main.cpp:(.text+0xb2): undefined reference to `Dictionary::Dictionary(std::string const&)'
main.cpp:(.text+0x12a): undefined reference to `Dictionary::Check(std::string const&)'
collect2: ld returned 1 exit status

This is actually not a compiler error message. Your code has been compiled correctly, but now the linker (which is the piece of software that actually puts together the executable, and which is called ld) is complaining that the Dictionary constructor and Check function have not been defined. Whenever you see the "undefined reference" message it means the linker cannot find all the code it needs to create the executable. There are a lot of possible causes for this error, but there is no point in staring at your code looking for syntax errors – you won’t find any!

In this case, it’s fairly obvious what the problem is – you haven’t actually defined the constructor or Check function, all you did was to declare them. You need to add the definitions to your code. For the moment, let’s make this as simple as possible:

#include <string>
class Dictionary {
    public:
        Dictionary( const std::string & fname ) {
        }
        bool Check( const std::string & word ) const {
            return false;
        }
};
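For comparison, the difference between declaring and defining can also be seen by keeping only the declarations in the class and writing the definitions outside it. This is a sketch of an alternative layout, not what this instalment does (the tutorial keeps the definitions in the class, as above). It is shown as a single piece of code so that it stands on its own:

```cpp
#include <string>

// Declarations only: this is the part that would live in dictionary.h.
class Dictionary {
    public:
        Dictionary( const std::string & fname );
        bool Check( const std::string & word ) const;
};

// Definitions: these could live in a separate source file (a hypothetical
// src/dictionary.cpp) that is compiled and linked together with main.cpp.
Dictionary::Dictionary( const std::string & ) {
    // nothing to load yet
}

bool Dictionary::Check( const std::string & ) const {
    return false;   // placeholder, same behaviour as the in-class version
}
```

With that layout, and assuming the hypothetical file name above, both source files would have to be handed to the compiler (for example g++ -I inc src/main.cpp src/dictionary.cpp -o bin/scheck), otherwise the linker would report the same "undefined reference" errors.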

You should now be able to compile your code and to run it:

$ g++ -I inc src/main.cpp -o bin/scheck
$ bin/scheck
scheck version 0.1
dog is misspelt 

Obviously, "dog" is not misspelt, but this is still encouraging progress! But before going on to make the dictionary work correctly, I want to illustrate one more error message. Modify main.cpp so your header file is included twice:

#include <iostream>
#include <string>
#include "dictionary.h"
#include "dictionary.h"
using namespace std;

... rest of main.cpp here ...

and recompile. You should get the following message:

error: redefinition of 'class Dictionary'

This happens because each translation unit (which is what the compilation of a single C++ source file is referred to as technically) can contain only a single definition of each class and/or function. By including the header twice you ended up with two definitions of the class. Now, you might think that you are not going to go around including the same header twice, but it turns out to be very easy to do this accidentally. It’s also very easy to prevent, by using include guards – modify your header file so it looks like this:

#ifndef INC_DICTIONARY_H
#define INC_DICTIONARY_H

#include <string>
class Dictionary {
    ... same as before ...
};

#endif

If you recompile, you should find the redefinition error has gone away. The include guards work like this: the first time dictionary.h is included, the macro INC_DICTIONARY_H is not defined, so the #ifndef test succeeds, and the code following the #ifndef is processed by the compiler, which results in INC_DICTIONARY_H becoming defined. Then, the next time the header file is included, the #ifndef test fails, and the code it controls is not compiled.

You should use include guards around all your header file code. There are numerous naming conventions for the guard names, which should be based on the header file’s name, but please note that names like __DICTIONARY__ and _DICTIONARY_H_ must not be used – names that contain double underscores, or begin with an underscore and an uppercase letter are reserved for the C++ compiler’s own use.
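You can watch the guards at work without any header files at all. The sketch below pastes the same guarded definition twice into one piece of code, which is roughly what the preprocessor produces when a guarded header is included twice. The Point struct, the Sum function and the INC_POINT_H guard name are all made up for the illustration:

```cpp
// First "inclusion": INC_POINT_H is not yet defined, so this text is compiled.
#ifndef INC_POINT_H
#define INC_POINT_H
struct Point {
    int x;
    int y;
};
#endif

// Second "inclusion": INC_POINT_H is now defined, so the #ifndef test fails
// and the preprocessor skips everything up to the matching #endif.
#ifndef INC_POINT_H
#define INC_POINT_H
struct Point {
    int x;
    int y;
};
#endif

// Uses the single surviving definition of Point.
int Sum( const Point & p ) {
    return p.x + p.y;
}
```

If you delete the #ifndef, #define and #endif lines, the compiler should report a redefinition of Point, just as it did for Dictionary.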

Back To The Dictionary

It’s time to make your dictionary class actually do what it is supposed to do. You need some way of searching through a list of words efficiently. Happily, the C++ Standard Library provides many facilities for doing just that. In fact, you are rather spoiled for choice. I am going to suggest that, for this first implementation, you use a std::set. In C++, sets are collections of unique objects which, being typically implemented as balanced binary search trees, can be searched very efficiently (a lookup takes time proportional to the logarithm of the number of elements). This of course begs the question – how do I know what the Standard Library contains, and when to use which bit of it? I’m afraid there is only one answer to this, and that is "experience". However, you can gain this experience rather painlessly by reading an excellent book on the library – The C++ Standard Library by Nicolai Josuttis. If there was one book that every single C++ programmer should read, this is it.

Anyhow, a set it is. Sets are actually C++ templates – you need to say what kind of thing they are going to contain. In this case, you want them to contain strings, so modify your class:

#ifndef INC_DICTIONARY_H
#define INC_DICTIONARY_H

#include <string>
#include <set>

class Dictionary {
    public:
        Dictionary( const std::string & fname ) {
            mWords.insert( "dog" );
        }
        bool Check( const std::string & word ) const {
            return mWords.find( word ) != mWords.end();
        }
    private:
        std::set <std::string> mWords;
};

#endif

 

You should now be able to compile and run your code:

$ g++ -I inc src/main.cpp -o bin/scheck
$ bin/scheck
scheck version 0.1
dog is OK

Hurrah! But what does the new code do? Well, the code added to the constructor inserts the word "dog" into the set. The code in the Check function uses the set’s search function find() to look up the word passed as a parameter. If the word is not found, the find() function returns an iterator (think of an iterator as a kind of pointer) which points to the end() of the set, which is a special, unused value. If the word is found, then an iterator pointing to the found word is returned.
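The behaviour of std::set described above can be tried out in isolation. In this sketch the names Contains and MakeSampleSet are assumptions made for the example. Contains does the same find()/end() comparison as Check, and MakeSampleSet shows that inserting a word that is already present leaves the set unchanged:

```cpp
#include <set>
#include <string>

// Hypothetical helper mirroring Dictionary::Check: true if word is in the set.
bool Contains( const std::set<std::string> & words, const std::string & word ) {
    // find() returns end() when the word is absent
    return words.find( word ) != words.end();
}

// Builds a small set. The second insert of "dog" is ignored, because a set
// holds each value at most once.
std::set<std::string> MakeSampleSet() {
    std::set<std::string> words;
    words.insert( "dog" );
    words.insert( "fox" );
    words.insert( "dog" );
    return words;
}
```

Calling MakeSampleSet() gives a set of size 2, for which Contains reports true for "dog" and "fox" and false for anything else.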

You now need to load the set up with a word list. The issue of where to get lists of correctly spelt words has not been addressed yet, so you will need to create your own. As a starter, I suggest creating a list which contains all the words in the phrase "the quick brown fox jumped over the lazy dog", all in lower-case, one word per line. Save this file as mydict.dat in your project’s data directory (please note that if you are a Linux user, or a user of a Linux-like system such as Cygwin or OSX, the data file provided in the downloads mentioned at the bottom of this page will not currently work for you – you must create the file from scratch yourself).

Now modify the dictionary.h header again. You need to add code to open a C++ ifstream object, and read lines from it. I will have more to say on this subject in the next tutorial, but for the moment, simply use this code:

#ifndef INC_DICTIONARY_H
#define INC_DICTIONARY_H

#include <string>
#include <set>
#include <fstream>

class Dictionary {
    public:
        Dictionary( const std::string & fname ) {
            std::ifstream wlist( fname.c_str() );
            std::string word;
            while( std::getline( wlist, word ) ) {
                mWords.insert( word );
            }
        }
        bool Check( const std::string & word ) const {
            return mWords.find( word ) != mWords.end();
        }
    private:
        std::set <std::string> mWords;
};

#endif

Also, change the constructor call in main.cpp:

Dictionary d( "data/mydict.dat" );

Recompile your code and run it again; you should find it still works. If it doesn’t (i.e. if "dog" is detected as misspelt) then check the name of your word list data file, and check that you have actually spelled "dog" correctly there!

One more change, and it’s a wrap for this instalment. It would be nice to interactively test the dictionary, so modify main.cpp once more:

int main() {
    cout << "scheck version 0.1" << endl;
    Dictionary d( "data/mydict.dat" );
    string word;
    while( getline( cin, word ) ) {
        if ( d.Check( word ) ) {
            cout << word << " is OK\n";
        }
        else {
            cout << word << " is misspelt\n";
        }
    }
}

If you recompile and run your program now, you can interact with it:

scheck version 0.1
dog
dog is OK
fox
fox is OK
cat
cat is misspelt
zzzz
zzzz is misspelt

which is getting near to what you want the final product to do! Unfortunately, as you’ll see as you progress through the next few tutorials, there are problems with what you have wrought so far.

Conclusion

We’ll wrap on that high point. In this instalment you have hopefully learned:

  • Discovering new issues as you develop your project is to be expected, and is nothing to worry about.
  • Classes are one way of implementing features in C++, but not the only way.
  • Errors from the compiler and linker do not just report problems with syntax.
  • The Standard Library contains classes that make some apparently difficult problems a breeze.

 

Coming next: File handling, error reporting and performance measuring.

Sources for this and all other tutorials in the series available here.



10 Comments
  1. Bill

    Thanks for pointing out the common compiler/linker errors. I can still remember struggling through error text like that as a new programmer and how frustrating it was. It’s not like the textbook called out what they were and what they meant.

    I hope these don’t seem too nit-picky, but…
    Did you mean for Dictionary’s ctor to be `explicit`?
    It looks like you left off the `const` qualifier in the first dictionary.h snippet.
    After you add `set` to the dictionary, “dog is OK”, but we haven’t populated the set yet.

  2. justin

    Hi, I really like the aim of this series, but unfortunately I seem to be encountering something that I would consider odd. The program only recognizes the last word in my data file, even copying the class directly from the tutorial. Here is a picture of the relevant items (the data file, class header, and output):

    I suspect it’s about the data file itself, yet it’s solely newline-separated, the words are clearly added to the set (as seen in the picture), and the last word is recognized. I removed “dog” to see if “lazy” would work, and it did.

    Any insight would be appreciated.

    –Justin

    • @justin I would tend to suspect the data. Did you download this from my site or create your own? I just hex dumped the test data for my version and the lines are CR/LF terminated (as I develop on Windows), which I think might cause a problem if you are on Linux – I will now go away and fire up a VM to test this. If you are on Linux, delete the file and recreate it yourself with your favourite text editor and see if that works. Let me know how you get on and thanks for bringing this to my attention!

      Neil

    • @justin It looks like that is the problem, or at least there is a problem using my download on Linux (or a Linux like operating system like cygwin). For now, I will add a note to the tutorial about this.

      • Justin

        @Neil I appreciate the reply, but I had created all my files from scratch, including the data file. Replicated it on a second Windows machine with the same issue. It’s odd. But I moved over to my linux distro and launched it there, no issue. No idea what is going on, but I’ll just continue this series under linux. Thanks for your time, once again!

      • @justin I still think this is a line-ending issue. What are you using to edit your Cygwin files? If it is Code::Blocks, go to Editor Settings and change the End-of-line Mode to LF, and try again. Or do this from a Cygwin bash shell:

        $ cat > mydict.dat
        the
        quick
        brown
        fox
        ^D

        where ^D is Control-D. Let me know if either of these fixes the problem.

      • justin

        @Neil, it turns out that you were correct (never doubted you!). Notepad++ was saving the mydict.dat with both CR and LF line-endings, and when converted over to LF exclusively, it worked, no change to the code. Will keep this in mind during future projects cross-platform. I appreciate the help! I totally bypassed the thought of just using nano or vi in cygwin to create the file in the first place. OH WELL– lesson learned.

  3. Frentz

    Sorry, noob question 🙂

    I had to use the full path of the data file (“C:\\Users\\ace\\Desktop\\__DEVEL\\scheck\\data\\mydict.dat”), since i had errors opening the file when i tried to use the following alternatives:
    “data/mydict.dat”
    “./data/mydict.dat”
    “data\\mydict.dat”
    “.\\data\\mydict.dat”

    What did i get wrong?

    • Where are you executing the program from? You should be running it from the scheck root directory, so that your command line looks like either “bin\scheck.exe” or “bin/scheck.exe”, depending on what shell you are using.

  4. Frentz

    ugh sorry …. dumb moment for me! -)
