
Writing a Real C++ Program – Part 7

August 18, 2011

This is the seventh instalment in a series of C++ programming tutorials that started here.

Introduction

If you think back to the initiation of the scheck project, you may remember that your boss indicated that the output of the program should be in CSV format, with maybe XML needed later. Well, today when you came into work, the following email was waiting for you:

We need to go ahead with producing XML output. Just create an XHTML table for now. Oh, and it seems that the guys in the Amsterdam office may want JSON output at some time in the future. This should all be selectable from the command line.

As it happens, the reporting entity was the next in complexity in the original project plan, so it makes sense to look at that now.

A good way to start is to write out samples of the output that you want your reporter to produce (for the moment, forget the JSON output). For the CSV format it will be something like this:

word,context,line,file
"blug","crystals will be blug in colour","27","chemo123.txt"
"pssible","it's pssible, though unlikely","40","chemo123.txt"


while the XML will be like this:

<table>
  <tr>
    <td>blug</td>
    <td>crystals will be blug in colour</td>
    <td>27</td>
    <td>chemo123.txt</td>
  </tr>
  <tr>
    <td>pssible</td>
    <td>it's pssible, though unlikely</td>
    <td>40</td>
    <td>chemo123.txt</td>
  </tr>
</table>


What do the two have in common? Well, they both have a header (the field names for the CSV and the table tag for the XML), and they both output records, formatted as required. The XML output also has a tail, in the form of the table close tag. If you were to write pseudocode to produce the required output you might end up with something like this:

if CSV
    write csv header
else if XML
    write xml header
loop
    check word spelling
    if error
       if CSV
          write csv record
       else if XML
          write xml record
if XML
    write xml tail


I think you can see that when you add the JSON formatting to the mix (and who knows what formats in the future) those if-ladders are going to get out of hand. C++ offers a neater solution. You can write code something like this:

rep = create CSV or XML reporter
rep.write header
loop
    check word spelling
    if error
        rep.report error
rep.write tail

In other words, you transfer the responsibility of working out exactly what needs to be output to the CSV and XML reporter objects. When you want to add JSON, you just need to implement a JSON reporter, and the remainder of the code stays unchanged.

This ability of different objects to present the same interface, but to respond differently when the interface is used, is called polymorphism. C++ actually allows for several different types of polymorphic behaviour, but the one you need here is run-time polymorphism, which is implemented using inheritance and virtual functions.

The Base Class

In order to use polymorphism, you need to derive your CSV and XML reporters from a base class. The base class specifies the interface that the CSV and XML reporters must implement. There are a number of possible ways of designing the base class to solve the problem at hand – this is what I came up with:

class Reporter {
  public:
    Reporter( std::ostream & os ) : mOut( os ) {
    }
    virtual ~Reporter() {
    }
    virtual void ReportHeader() = 0;
    virtual void ReportError( const std::string & word,
                   const std::string & context,
                   unsigned int line,
                   const std::string & filename ) = 0;
    virtual void ReportFooter() = 0;
  protected:
    std::ostream & Out() {
      return mOut;
    }
  private:
    std::ostream & mOut;
};

As with the Parser class, the Reporter’s constructor takes a stream object (the stream that reports will be written to) and uses it to initialise the private mOut reference member. This member is going to be used by the derived classes to write their output, so a protected member function is provided to give them access. It’s important that the mOut reference itself is not made protected. Protected data members are always a bad idea – if you have protected data you may as well go the whole hog and make it public.

The virtual destructor is an absolute requirement for all classes that are going to be derived from. Exactly why this is the case I will explain a bit later, but for now be assured that this is non-negotiable!

The interface that the CSV and XML reporters will have to implement is provided by the ReportHeader, ReportError and ReportFooter functions. These functions are declared as virtual, which means that they will behave polymorphically, and as pure (using the somewhat strange =0 syntax), which means that derived classes must provide implementations. Any class that has one or more pure virtual functions is said to be an abstract base class. You cannot directly create instances of such a class – it can only be used as a base for other classes.
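As a quick check of these rules, here is a minimal sketch using cut-down classes. MiniReporter and MiniCSV are hypothetical stand-ins, not part of scheck, and std::is_abstract assumes a compiler with C++11 support.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <type_traits>

// Hypothetical cut-down stand-in for Reporter: one pure virtual function.
class MiniReporter {
  public:
    virtual ~MiniReporter() {}
    virtual void ReportHeader( std::ostream & os ) = 0;
};

// Hypothetical derived class. Implementing the pure virtual function
// makes it a concrete class that can be instantiated.
class MiniCSV : public MiniReporter {
  public:
    void ReportHeader( std::ostream & os ) {
        os << "word,context,line,file\n";
    }
};

// MiniReporter r;   // would not compile: MiniReporter is abstract

// Compile-time queries (C++11): true for the base, false for the derived class.
const bool kBaseIsAbstract = std::is_abstract<MiniReporter>::value;
const bool kDerivedIsAbstract = std::is_abstract<MiniCSV>::value;

// Calls ReportHeader through a base class reference and returns the text.
std::string HeaderVia( MiniReporter & r ) {
    std::ostringstream os;
    r.ReportHeader( os );
    return os.str();
}
```

Passing a MiniCSV object to HeaderVia returns the CSV header line, even though HeaderVia only knows about the base class.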

If you are happy with the design, save the code in a header file called reporter.h in the inc directory. You will have to add include guards and the various Standard Library header files.

The Derived Classes

The derived CSV and XML reporter classes now have to provide implementations for the pure virtual functions declared in the Reporter base class. You should write these as separate .h and .cpp files – this wasn’t really necessary for the Reporter class because it basically doesn’t do anything. Here’s the header file for the CSV reporter class:

#ifndef INC_SCHECK_CSVREPORTER_H
#define INC_SCHECK_CSVREPORTER_H
#include "reporter.h"
class CSVReporter : public Reporter {
  public:
    CSVReporter( std::ostream & os );
    void ReportHeader();
    void ReportError( const std::string & word,
                      const std::string & context,
                      unsigned int line,
                      const std::string & filename );
    void ReportFooter();
};
#endif


You need a constructor because you have to have some way of passing the output stream to the Reporter constructor. You do not need a destructor, because there is nothing for it to do. There is no need to declare the interface functions as virtual, because once a function is declared so, the "virtualness" cannot be removed. And they must not be declared as pure, because you are going to implement them. Here’s a simple implementation of the CSVReporter class:
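To see that the "virtualness" really is inherited, here is a small sketch. Base, Derived and MoreDerived are hypothetical classes used only for illustration.

```cpp
#include <string>

// Hypothetical classes, used only to illustrate the point.
class Base {
  public:
    virtual ~Base() {}
    virtual std::string Name() const { return "Base"; }
};

class Derived : public Base {
  public:
    // No "virtual" keyword here, but this still overrides Base::Name
    // and is still a virtual function.
    std::string Name() const { return "Derived"; }
};

class MoreDerived : public Derived {
  public:
    // Overrides again; the function has been virtual all the way down.
    std::string Name() const { return "MoreDerived"; }
};

// Calls through a base class reference use the dynamic type of the object.
std::string NameOf( const Base & b ) {
    return b.Name();
}
```

Calling NameOf with a MoreDerived object returns "MoreDerived". If your compiler supports C++11, you can also add the override specifier after the parameter list, which asks the compiler to check that the function really does override a base class function.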

#include "csvreporter.h"

using std::string;

static string ToCSV( const string & s ) {
    string csv;
    for ( unsigned int i = 0; i < s.size(); ++i ) {
        if ( s[i] == '"') {
            csv += '"';
        }
        csv += s[i];
    }
    return '"' + csv + '"';
}

CSVReporter :: CSVReporter( std::ostream & os ) 
      : Reporter( os ) {
}

void CSVReporter :: ReportHeader() {
     Out() << "word,context,line,file\n";
}

void CSVReporter :: ReportError( const string & word,
                                 const string & context,
                                 unsigned int line,
                                 const string & filename ) {
    Out() << ToCSV( word ) << ","
          << ToCSV( context ) << ","
          << '"' << line << '"' << ","
          << ToCSV( filename ) << "\n";
}

void CSVReporter :: ReportFooter() {
   // nothing to do
}

 

This is all straightforward stuff, with the possible exception of the ToCSV function. This function does two things – it wraps the string passed to it in double quotes, and it turns any double-quote inside the string into a pair of double-quotes. This is necessary if the CSV generated is to conform to what passes for the CSV standard. The function is made static because you do not want it to be used outside of this specific .cpp file. An alternative would be to make it a private member function, but ToCSV does not actually use any of the members of CSVReporter, and so making it a member is not necessary. This is important – not all (in fact nowhere near all) functions in C++ programs need to be class members.
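To make the quoting rule concrete, here is the same logic as a standalone sketch, with a few sample inputs and their results in the comments. The sample strings are made up for illustration.

```cpp
#include <string>

using std::string;

// Same logic as ToCSV: double any embedded double-quotes, then wrap the
// whole string in double-quotes.
static string ToCSV( const string & s ) {
    string csv;
    for ( string::size_type i = 0; i < s.size(); ++i ) {
        if ( s[i] == '"' ) {
            csv += '"';      // emit an extra quote before the original one
        }
        csv += s[i];
    }
    return '"' + csv + '"';
}

// ToCSV( "plain" )                gives  "plain"
// ToCSV( "a,b" )                  gives  "a,b"    (the comma is safe inside the quotes)
// ToCSV( "the \"blug\" sample" )  gives  "the ""blug"" sample"
```

Note that the wrapping quotes are what protect embedded commas, which is why the context field in the sample output can contain one.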

Arguably, only the context actually needs to have CSV quoting applied to it, as misspelt words will not contain double-quotes because of the parser’s logic, and file names generally don’t contain such characters. However, things can change, and it’s safest to apply the quoting to all strings.

You can now test the CSV output by modifying main (you will have to include csvreporter.h, and you need to update your makefile too):

Dictionary d( "data/mydict.dat" ); 
const char * subtext = "data/sub2.txt"; 
ifstream sub( subtext ); 
if ( ! sub.is_open() ) { 
    throw ScheckError( string("cannot open ") + subtext ); 
} 
Parser p( sub ); 
CSVReporter rep( cout );

string word; 
rep.ReportHeader(); 
while( ( word = p.NextWord() ) != "" ) { 
    if ( ! d.Check( word ) ) { 
        rep.ReportError( word, p.Context(), p.LineNo(), subtext ); 
    } 
} 
rep.ReportFooter();

If you build and run this, you may find that it almost works, but that it exposes a bug in the Context member function of the Parser class. In the code I supply, I add a newline to the end of each input line, which helps with parsing hyphenated words. Unfortunately, it messes up the context! You can fix it by removing the newline in the string returned by Context, like this:

string Parser :: Context() const {
    return mLine.substr( 0, mLine.size() - 1 );
}

With that fix in place, the code now outputs misspelt words in nicely formatted CSV.
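The fix above relies on mLine always ending with a newline. If you would rather not depend on that, a slightly more defensive sketch removes the last character only when it really is a newline. StripTrailingNewline is a hypothetical free function, not part of the supplied code; Context could simply return StripTrailingNewline( mLine ).

```cpp
#include <string>

using std::string;

// Hypothetical helper: returns s without its final character if, and only
// if, that character is a newline. Any other string is returned unchanged.
string StripTrailingNewline( const string & s ) {
    if ( ! s.empty() && s[ s.size() - 1 ] == '\n' ) {
        return s.substr( 0, s.size() - 1 );
    }
    return s;
}
```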

You should now be able to write the XML reporter yourself – it should do basically what the CSV reporter does, but wraps the output in the relevant XML tags. If you don’t want to write it yourself, an implementation is supplied in the downloadable code supporting this tutorial.
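If you get stuck, here is one possible sketch of an XML reporter. It is not the implementation from the downloadable code; the ToXML helper and its choice of escaped characters are my assumptions, and the Reporter class is repeated only so that the sketch compiles on its own.

```cpp
#include <ostream>
#include <sstream>
#include <string>

using std::string;

// Base class as defined earlier, repeated so the sketch is self-contained.
// In the project you would #include "reporter.h" instead.
class Reporter {
  public:
    Reporter( std::ostream & os ) : mOut( os ) {}
    virtual ~Reporter() {}
    virtual void ReportHeader() = 0;
    virtual void ReportError( const string & word,
                              const string & context,
                              unsigned int line,
                              const string & filename ) = 0;
    virtual void ReportFooter() = 0;
  protected:
    std::ostream & Out() { return mOut; }
  private:
    std::ostream & mOut;
};

// Assumed helper: replaces the characters that are special in XML element
// content with the corresponding entities.
static string ToXML( const string & s ) {
    string xml;
    for ( string::size_type i = 0; i < s.size(); ++i ) {
        switch ( s[i] ) {
            case '&': xml += "&amp;"; break;
            case '<': xml += "&lt;";  break;
            case '>': xml += "&gt;";  break;
            default:  xml += s[i];    break;
        }
    }
    return xml;
}

class XMLReporter : public Reporter {
  public:
    XMLReporter( std::ostream & os ) : Reporter( os ) {}
    void ReportHeader() { Out() << "<table>\n"; }
    void ReportError( const string & word, const string & context,
                      unsigned int line, const string & filename ) {
        // The line number is written directly to the stream.
        Out() << "  <tr>\n"
              << "    <td>" << ToXML( word ) << "</td>\n"
              << "    <td>" << ToXML( context ) << "</td>\n"
              << "    <td>" << line << "</td>\n"
              << "    <td>" << ToXML( filename ) << "</td>\n"
              << "  </tr>\n";
    }
    void ReportFooter() { Out() << "</table>\n"; }
};

// Usage sketch: formats one made-up error and returns the resulting text.
string SampleReport() {
    std::ostringstream os;
    XMLReporter rep( os );
    rep.ReportHeader();
    rep.ReportError( "blug", "blug < blue & green", 27, "chemo123.txt" );
    rep.ReportFooter();
    return os.str();
}
```

Unlike the CSV case, the escaping here matters for any field that might contain an ampersand or an angle bracket, so it is applied to all three string fields.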

Dynamic Object Creation

As things stand, you can create either a CSVReporter or an XMLReporter object, with the type determined by the code you have written at compile-time. However, what you really want is to decide which kind of reporter you need at run-time. To do that, you need to create the reporter object dynamically using new, depending on some run-time condition.

The run-time condition I suggest you use to decide which reporter to create is the number of arguments to main – if there are no arguments (apart from the command name), create a CSV reporter, if there are any arguments, create an XML reporter. This isn’t good enough for a real program, but you haven’t implemented the command line processing code yet, so it will do as a start.  You will have to change the signature of main:

int main( int argc, char * argv[] ) {


You can now create the objects dynamically. As new returns a pointer, you need to store the return value in a pointer variable. Of what type should that pointer be? Well, it cannot be a pointer to a CSVReporter, as you may be creating an XMLReporter, and it cannot be a pointer to an XMLReporter, as you may be creating a CSVReporter. In fact, what you need is a pointer to the Reporter base class, which is compatible with pointers to any of its derived classes:

Reporter * rep = 0;
if ( argc == 1 ) {
    rep = new CSVReporter( cout );
}
else {
    rep = new XMLReporter( cout );
}


You also need to change the calls to the reporter’s functions to use pointer access syntax:

 rep->ReportHeader();    
 while( ( word = p.NextWord() ) != "" ) {
     if ( ! d.Check( word ) ) {
         rep->ReportError( word, p.Context(), p.LineNo(), subtext );
     }
 }
 rep->ReportFooter();

 

And lastly, because you allocated the reporter with new, you must add a call to delete at the end of main.

delete rep;


You can now make the program and try running it. If you run it as:

src/scheck


you should get CSV output, but if you run it as:

src/scheck xxxxx

(what xxxxx is isn’t important) you should get XML output.
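If you want to keep main tidy, one option is to move the decision into a small function that returns a Reporter pointer. This is only a sketch: MakeReporter and HeaderFor are hypothetical names, and the classes are cut down to a single function so that the code compiles on its own.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Cut-down stand-ins for the tutorial's classes (header output only).
class Reporter {
  public:
    Reporter( std::ostream & os ) : mOut( os ) {}
    virtual ~Reporter() {}
    virtual void ReportHeader() = 0;
  protected:
    std::ostream & Out() { return mOut; }
  private:
    std::ostream & mOut;
};

class CSVReporter : public Reporter {
  public:
    CSVReporter( std::ostream & os ) : Reporter( os ) {}
    void ReportHeader() { Out() << "word,context,line,file\n"; }
};

class XMLReporter : public Reporter {
  public:
    XMLReporter( std::ostream & os ) : Reporter( os ) {}
    void ReportHeader() { Out() << "<table>\n"; }
};

// Hypothetical factory using the same rule as main: no arguments means
// CSV, anything else means XML. The caller owns the returned object and
// must delete it.
Reporter * MakeReporter( int argc, std::ostream & os ) {
    if ( argc == 1 ) {
        return new CSVReporter( os );
    }
    return new XMLReporter( os );
}

// Usage sketch: returns the header text produced for a given argc.
std::string HeaderFor( int argc ) {
    std::ostringstream os;
    Reporter * rep = MakeReporter( argc, os );
    rep->ReportHeader();
    delete rep;
    return os.str();
}
```

In main you would write Reporter * rep = MakeReporter( argc, cout ); and keep the delete at the end. When the real command line processing arrives, only MakeReporter needs to change.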

The Virtual Destructor

If you recall, I insisted that the Reporter base class have a virtual destructor. You can now see why this must be the case. Simplifying, a call to your code to create a CSV reporter looks like this:

Reporter * rep = new CSVReporter( cout );
delete rep;

Here you are creating a derived class object dynamically, and storing a pointer to it in a base class pointer. You then delete the object through the base class pointer.

The C++ Standard says very clearly that if you are in this situation, and if the base class does not have a virtual destructor, then the program’s behaviour is undefined. When the C++ Standard says that you have "undefined behaviour" it means that you cannot predict what the program may do from that point onwards. This is not a good position to be in. In fact, what will probably happen is that the base class destructor will be called, omitting any call to the derived class’s destructor. In the case of your code, the derived class doesn’t have a destructor, so that might not seem so bad, but you still are in undefined behaviour territory, and the next release of your compiler might produce code that does something much nastier.

Bottom line – if you are going to derive from a class, always give that class a virtual destructor.
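Here is a minimal sketch of the well-defined case, where the base class does have a virtual destructor. The classes and the counter are hypothetical and exist only to show which destructors run.

```cpp
// Hypothetical counter: records how many Derived destructors have run.
static int gDerivedDestroyed = 0;

class Base {
  public:
    virtual ~Base() {}      // virtual, so deleting via Base * is safe
};

class Derived : public Base {
  public:
    ~Derived() { ++gDerivedDestroyed; }
};

// Creates a Derived, deletes it through a Base pointer, and returns how
// many Derived destructors ran. Well defined because ~Base is virtual.
int DeleteThroughBase() {
    gDerivedDestroyed = 0;
    Base * p = new Derived;
    delete p;
    return gDerivedDestroyed;
}
```

DeleteThroughBase returns 1, showing that the derived class destructor is called before the base class one. I have deliberately not shown the version without virtual, because its behaviour is undefined and nothing useful can be said about what it would do.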

Conclusion

That wraps it up for this instalment. I will continue talking about pointers in the next one, explain why you really need to use smart rather than plain pointers, and take a look at how you can begin to write tests for your code.

Sources for this and all other tutorials in the series are available here.



8 Comments
  1. Justin permalink

    Hi, it’s me again. I was looking for some help with this cryptic (linker?) error that I’m running into with my XMLreporter class.

    Justin@Justin-PC ~/code/scheck
    $ make
    g++ -I inc -c src/xmlreporter.cpp
    src/xmlreporter.cpp: In member function `virtual void XMLreporter::ReportError(c
    onst std::string&, const std::string&, unsigned int, const std::string&)’:
    src/xmlreporter.cpp:39: error: invalid conversion from `unsigned int’ to `const
    char*’
    src/xmlreporter.cpp:39: error: initializing argument 1 of `std::basic_string<_CharT, _Traits, _Alloc>::basic_string(const _CharT*, const _Alloc&) [with _CharT
    = char, _Traits = std::char_traits<char>, _Alloc = std::allocator<char>]’
    make: *** [xmlreporter.o] Error 1

    My code is here: https://s3.amazonaws.com/folderholder/scheck.zip

    As far as I can tell, everything is as it should be. My class looks exactly like yours (sans styling and actual function), the types are copy-pasted exactly from my CSV class, and that one compiles without error. It seems as though it’s my ToXML function, but syntactically it’s identical to my ToCSV function. Wisdom is appreciated. Thanks again for these tutorials!

    • @Justin. These are actually compilation errors. You are getting them because when you say:

      ToXML(line,3)

      you are passing an integer (line) to a function that expects a string. You need to find some way of converting the integer to a string, or do what I did and just output it directly to the stream.

      • Justin permalink

        And this is why amateur programmers shouldn’t code on little sleep. Stupid mistakes. Always. Thanks though!

  2. martin permalink

    Hi, I was trying to implement the XML output class, but am running into an issue.

    I’m formatting it as follows:

    \t
    \t\t
    \t\t
    \t\t
    \t

    I’m using a string to hold the lines, one at a time. The issue arises when needing to assign the strings for the values.
    outstring = ‘\t’ + toXML(foo); causes issues whereas no errors are thrown, but the line is printed as blank.
    outstring = “\t\t” + toXML(foo); works for the details of each item.

    I tried outstring = “\t” + toXML(foo); but that doesn’t compile due to it not being a string, but rather a character literal, and wrapping the tab in a string() call has zero effect as well. Using gcc, 4.5.1

    Any advice? I’d appreciate it.

    PS. in the time writing this, I figured I could use a string stream, then call the member function sstream.string() on it and use that to print to the file, but it seems like a roundabout way to do it.
    Am I just totally off-base with my approach? Thanks! And these tutorials are great.

    • martin permalink

      Uh oh, the website killed my formatting..

      it’s as follows
      –errors_opentag
      \t –item_opentag
      \t\t –word
      \t\t –context
      \t\t –line
      \t\t –file
      \t — item_closetag
      –errors_closetag

      • @martin I would need to see more code – specifically, what is the return type of your toXML function? In C++ “\t” represents a string containing a single tab and “\t\t” represents a string containing two tabs – this works perfectly well with GCC, and I use it all the time myself.

  3. martin permalink

    I understand that you need to see more code, but the issue is replicated here:

    string foo;
    // foo = ""; does not make a difference
    foo += "\t" + "helllo";
    cout << foo;

    And it produces this:
    error: invalid operands of types ‘const char [2]’ and ‘const char [7]’ to binary ‘operator+’

    • OK, that’s because when you say “\t” + “hello” the compiler sees it as trying to add two arrays of char (which is what string literals are), which is not possible in C++. You need to make one of them a string:
      foo += string("\t") + "helllo";

      Now the compiler will see it as adding a string and an array of char, and there is an overload on operator + to do that.
