
15.3.2014

7 min read

Do not muddle references and pointers

.CPP files under the hood with pitfalls!

Introduction

With this article I am starting a series about aspects of C++ that can confuse or challenge junior programmers, or that are simply interesting if you want to know the language more deeply. I will try to explain everything clearly and comprehensibly. The articles will contain a lot of C++ code with plenty of comments. I usually use the GCC compiler with the C++11 standard enabled on a Linux machine; if I use a different environment, I will mention it. This first article is aimed at the first target group, who often struggle with the two entities named in the title. My goal is to describe the difference between these fundamental entities and to offer a kind of guide on how to use them with the Force.

So what kind of beasts are pointers and references? or some kind of history

History comes first. Pointers are a legacy of C times. I don't mean this in a negative sense, because they are a very useful inheritance. Pointers provide random access to the computer's memory: a pointer is an address in memory. What can you find at this address? Usually your variables or sets of variables (arrays). References appeared in C++ to solve a problem with overloaded operators whose result must be usable in further operations, such as []. You probably remember the correct way to overload it, but for a moment let's imagine that C++ doesn't have references. It would look like this:

#include <cstddef>
#include <iostream>

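// Common base class that holds the data.
class Overloader
{
 public:
  // Virtual destructor: derived objects are later deleted through Overloader*.
  virtual ~Overloader() {}

 protected:
  Overloader()
  {
    for(std::size_t i = 0; i < size; ++i)
    {
      array[i] = static_cast<int>(i);
    }
  }

  static const std::size_t size = 3;
  int array[size];
};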

// Class with an overloaded operator [] that returns the element by value.
class OverloadWithValue : public Overloader
{
 public:
  int operator [] (int index)
  {
    return array[index];
  }
};

and everything is great until you try to assign to the returned value:

OverloadWithValue value;
std::cout << value[0] << std::endl; // Works! It prints 0
value[0] = 1; // compile error: lvalue required as left operand of assignment

OK, then you can write something like this with pointers:

class OverloadWithPtr : public Overloader
{
 public:
  int* operator [] (int index)
  {
    return array + index;
  }
};

and it works in the following way:

OverloadWithPtr ptr;
std::cout << *ptr[0] << std::endl; // Prints 0
*ptr[0] = 1;
std::cout << *ptr[0] << std::endl; // Prints 1

but when you see this, you think "ugh, it is ugly!". We are forced to use the dereference operator *. It also breaks the encapsulation of the class, because we get a raw pointer and nothing stops us from using it to modify any value in its memory space. We can play tricks like this:

int *ptrHack = ptr[0];
// Writes a value that does not fit in int and that the class cannot validate.
ptrHack[2] = 1000000000000000000;
ptrHack[3] = 0; // Writes past the end of the array (undefined behavior).

Very bad boy, very bad... So this is a typical situation where operator [] needs to return a proper l-value that we can assign to without breaking the encapsulation of the class. References are what we need. Meet:

class OverloadWithRef : public Overloader
{
 public:
  int& operator [] (int index)
  {
    return array[index];
  }
};

...

OverloadWithRef ref;
std::cout << ref[0] << std::endl; // Prints 0.
ref[0] = 1;
std::cout << ref[0] << std::endl; // Prints 1.

Perfect! It works, it is elegant and it doesn't hurt anybody, but we still don't know what a reference is. Conceptually it behaves like a constant pointer that is dereferenced automatically. That means we still reach the object by its address and we can modify the object, but the address itself can't be changed. In simple words, it is a nice way to hide ugly pointer constructions and their dereference operator from us.

Main difference or understand everything

Let's see how we create pointers for different purposes:

int *pointer0 = new int(10);            // (1.1)
int *pointer1 = new int[1] { 10 };      // (1.2)

int variable = 10;
int *pointer2 = &variable;              // (1.3)
int *pointer3 = nullptr;                // (1.4)

delete pointer0;                        // (1.5)
delete [] pointer1;                     // (1.6)

In order of appearance:

(1.1) We use operator new to create an integer object whose value is 10, and we must use operator delete to free it (1.5).
(1.2) We use operator new [] to create a set of integer objects. The set has size 1 and its single object has the value 10. In such cases do not forget to free the memory with operator delete [] (1.6).
(1.3) We use operator &, which takes the address of the variable (do not muddle it with the declaration of a reference!). We don't free this pointer, because it points to a variable on the stack, which is destroyed when execution leaves its scope.
(1.4) We set the pointer to nullptr, which means that nothing was allocated and there is nothing to free. There is just silence and nirvana.

Let's see how we create references:

void takeReference(int &ref)                     // (2.1)
{
  ref = 10;
}

void takePointerReference(int *&ref)             // (2.2)
{
  ref = new int[1] {10}; // The caller becomes responsible for delete [].
}

...

int value;
int &ref = value;                                // (2.3)
ref = 10; // value will be 10.

int empty;
takeReference(empty); // empty will be 10.

int *array;
takePointerReference(array); // array[0] will be 10.
delete [] array; // Free the memory allocated inside the function.

Actually (2.1), (2.2) and (2.3) are very similar, but they are used in different contexts. In each case we get a reference to an already existing variable: (2.1) is a function parameter that refers to the int passed by the caller; (2.2) is a function parameter that refers to the caller's pointer; (2.3) is a local reference to a local variable. If you look at (1.4) and (2.*) a little longer, you will see the first difference between pointers and references: a reference has to be bound to something when it is created, but a pointer does not. It means that the following code doesn't compile:

int &ref; // Compile error: ref declared as reference but not initialized

but this does:

int *ptr;

OK, you may ask about a case like this:

int *ptr = nullptr;
int &ref = *ptr; // Undefined behavior: dereferences nullptr.
std::cout << ref << std::endl; // Probably a segmentation fault here.

I want to ask a counter-question: who is your doctor? Never act like this and you will be blessed. It isn't a problem with the reference. It is a problem with dereferencing nullptr, which is undefined behavior according to the standard. No compiler guarantees correct behavior here.

Take a look at (1.2) and (2.*) again. See? A reference refers to a single object. Next: a reference can't be made to refer to another object after its creation. The following code proves it:

int a = 0;
int b = 1;
int &ref = a; // ref refers to a, whose value is 0.
ref = b; // ref still refers to a, but the value of a is now 1.
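
The claim can also be checked by comparing addresses. The following is only a minimal sketch: the function name rebindDemo and the values array are made up for this illustration. It contrasts what assignment and ++ do to a reference with what they do to a pointer.

```cpp
#include <cassert>

// Hypothetical demo function; returns the final value of a.
int rebindDemo()
{
  int a = 0;
  int b = 1;

  int &ref = a;        // ref is bound to a once and for all.
  ref = b;             // Copies the value of b into a; ref is not re-bound.
  assert(&ref == &a);  // Still the address of a...
  assert(&ref != &b);  // ...and not the address of b.

  int *ptr = &a;       // A pointer, by contrast, can be re-pointed.
  ptr = &b;
  assert(ptr == &b);

  int values[2] = { 10, 20 };
  int &first = values[0];
  int *cursor = values;
  ++first;             // Changes the value: values[0] becomes 11.
  ++cursor;            // Changes the address: cursor now points at values[1].
  assert(values[0] == 11);
  assert(*cursor == 20);

  return a;            // 1, because ref = b assigned through ref.
}
```

Calling rebindDemo() from main should return 1 without tripping any assertion: the same operators act on the referred object when applied to a reference, and on the address when applied to a pointer.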

If you go back to the implementation of operator [] that returns a pointer, you may notice one more thing: references don't support address arithmetic. You will never see code that applies operator +, operator -, operator -- or operator ++ to a reference in order to change its address. Only pointers can do this, and that's why C and C++ are often called low-level languages when it comes to memory management.

When do we usually use them with Force? or typical use cases

What a pointer can be used for:

  • we want to allocate an object and hand its lifetime management over to another place;

Overloader* create()
{
  return new OverloadWithValue();
}

Overloader *overmind = create();
...
// Deleting through a base-class pointer is only well defined
// if Overloader declares a virtual destructor.
delete overmind;

  • we want to hold an array of objects on the heap;

int *array = new int[10];
...
delete [] array;

  • we want to pass an argument to a function or method that can be nullptr (NULL before C++11); do not forget to check for it!

void func0(int *massive)
{
  if (massive == nullptr) return;
  ...
}

  • we want to use a class field with postponed initialization.

class NullField
{
 public:
  NullField() : obj(nullptr)
  {
  }

  void setObj(NullField *obj)
  {
    this->obj = obj;
  }

 private:
  NullField *obj;
};
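
Two of the pointer use cases above, the heap array and the argument that may be nullptr, can be put together in one small sketch. The helper names sumOrZero and heapArrayDemo are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: sums count integers, treating nullptr as "no data".
int sumOrZero(const int *values, std::size_t count)
{
  if (values == nullptr) return 0;  // The check the list item asks for.
  int total = 0;
  for (std::size_t i = 0; i < count; ++i)
  {
    total += values[i];
  }
  return total;
}

// Hypothetical demo: allocates a heap array, uses it and releases it.
int heapArrayDemo()
{
  int *array = new int[3] { 1, 2, 3 };
  int total = sumOrZero(array, 3);
  delete [] array;                   // new [] must be paired with delete [].
  array = nullptr;                   // Reset so it cannot be used by accident.
  assert(sumOrZero(array, 3) == 0);  // A null argument is handled, not dereferenced.
  return total;                      // 1 + 2 + 3
}
```

Resetting the pointer to nullptr after delete [] is a habit rather than a requirement, but it turns a later accidental use into a case that the nullptr check can catch.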
    

What a reference can be used for:

  • we want to pass to a function or method an argument that is always initialized, and we don't want to spend time copying it as passing by value would;

#include <vector>

// It is good practice to add const if we don't change the argument's value.
// The element type int is only an example.
void inputReference(const std::vector<int> &data)
{
  // Use data.
}

  • we want to hold in a class a field that is initialized somewhere else;

class RefField
{
 public:
  RefField(NullField &field) : ref(field)
  {
  }

 private:
  NullField &ref;
};

void useNullAndRefField()
{
  NullField nField;
  RefField rField(nField);
} // Mind the order of destruction: rField is destroyed first, so its reference never dangles.

  • we want to return a field from a class without copying its data.

class VectorField
{
 public:
  const std::vector<int>& getData() const
  {
    return data;
  }

 private:
  std::vector<int> data;
};
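
That no copy is made can be checked by comparing addresses. This is only a sketch under assumptions: the Holder class is a hypothetical stand-in modelled on VectorField, and int is an arbitrary element type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical holder, modelled on the VectorField class.
class Holder
{
 public:
  Holder() : data(3, 7) {}   // Three elements, each equal to 7.

  const std::vector<int>& getData() const
  {
    return data;             // Returns an alias, not a copy.
  }

 private:
  std::vector<int> data;
};

// Takes the vector by const reference, so no elements are copied.
std::size_t countElements(const std::vector<int> &data)
{
  return data.size();
}

// Hypothetical demo; returns the number of elements seen through the reference.
std::size_t noCopyDemo()
{
  Holder holder;
  const std::vector<int> &view = holder.getData();
  assert(&view == &holder.getData());        // Same object both times.

  std::vector<int> copy = holder.getData();  // A copy is made only on request.
  assert(&copy != &view);

  return countElements(view);
}
```

The caller decides whether to pay for a copy: binding the result to a const reference keeps the alias, while initializing a std::vector<int> from it copies the elements.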

And one last thing, don't worry: polymorphism works well with both entities.

void passByPointer(Overloader *over)
{
}
void passByReference(Overloader &over)
{
}
...
OverloadWithValue value;
passByPointer(&value);
passByReference(value);
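
The snippet above shows that a derived object can be passed through both a base pointer and a base reference. With a virtual function, dynamic dispatch works through both as well. Here is a minimal sketch; the Animal and Dog classes are hypothetical and are not part of the Overloader hierarchy.

```cpp
#include <cassert>
#include <string>

// Hypothetical hierarchy, used only for this illustration.
class Animal
{
 public:
  virtual ~Animal() {}
  virtual std::string sound() const { return "(silence)"; }
};

class Dog : public Animal
{
 public:
  std::string sound() const override { return "woof"; }
};

std::string soundByPointer(const Animal *animal)
{
  return animal->sound();   // Dispatches on the dynamic type.
}

std::string soundByReference(const Animal &animal)
{
  return animal.sound();    // Same dispatch through a reference.
}

// Hypothetical demo; returns what a Dog says when seen as an Animal.
std::string polymorphismDemo()
{
  Dog dog;
  assert(soundByPointer(&dog) == soundByReference(dog));
  return soundByReference(dog);
}
```

Both calls reach Dog::sound, so polymorphismDemo() should return "woof". Passing the Dog by value as an Animal would slice the object and call the base version instead.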

Conclusion

That's all I wanted to say. Here are a few phrases that sum up the article:

  • pointers and references are among the most powerful and basic entities of the C++ world;
  • they are different;
  • their use cases are different;
  • each use case leads to different code, but in every case the object is accessed by address rather than copied, so the code stays fast;
  • a reference is safer than a pointer;
  • a pointer is more flexible than a reference.