C With (Object) Prototypes

Introduction

When talking about prototypes, especially in C, the first thing that comes to mind is function prototypes, which are those little function declarations found in header files or at the beginning of a file. They act as a “forward declaration” for the function, leaving the definition for later in the file or even for a separate file (it's how shared libraries work, placing prototypes in header files and the implementations in the shared object files.)

There is another kind of prototype, though: object prototypes. This kind of prototype is actually very far from C, since it belongs to the object-oriented paradigm. In fact, even among object-oriented languages, prototypes don't seem to be well known (or, at least, they used to be fairly unknown; I didn't really research whether things have changed.)

In this article I'll briefly go over what object prototypes are, and then provide an implementation for the C language (thus giving C an object-oriented interface.)

Object Prototypes

What is an object prototype? It's an object in the “object-oriented” sense; however, it is not created from a class, but from another object. In practice, instead of defining a class and creating instances of that class (the objects), an already existing object is cloned and then its methods and properties are changed accordingly.

To illustrate this with an example, using classes one would write something like this (in pseudocode):

class Example {
     private a;
    
     public Example() { a = 0; }

     public getA() { return a; }
     public setA(v) { a = v; }
}

class Example2 extends Example {
     public Example2() { super(); }
    
     public getA() { return super.getA() + 2; }
}

o = new Example();
o2 = new Example2();
	  

The Example class defines some methods to get and set a value; the Example2 class subclasses the Example class and overrides the getA method to return the same value incremented by 2. Example2's constructor calls super to initialize the a variable.

Using object prototypes, one would write something like this instead (again pseudocode):

Example = Object.clone();
Example.a = 0;
Example.getA = function () { return this.a; }
Example.setA = function (v) { this.a = v; }

Example2 = Example.clone();
Example2.getA = function () { return this.a + 2; }
	  

This uses a more functional style, but only for brevity. As I'll show later, methods can be defined using C's standard features.

The example clearly shows the difference: classes are static definitions and objects are created from them. Prototypes are dynamic entities used to create other objects, which can then be used as prototypes for other objects, and so on.

There are certainly better and more detailed explanations on the web, but this is enough to understand the rest of this article.

Object-Oriented C

Since C's inception, there have been many attempts at adding an object-oriented interface to the language.

Of course, there's C++; Objective-C; the less known ooc (not to be confused with the other ooc!); the GTK version, GObject; and even that one cousin no one wants to talk about, Xt.

All of these share one common trait: their objects are defined through the use of classes.

This article will define CWP, an object-oriented interface based on prototypes.

The Interface

Looking at the example above, we can see that we need the following features: a way to define properties (Example.a = 0); a way to define methods (Example.getA = function ...); a way to access properties (this.a); and a way to call methods (Object.clone()).

Some prototype-based languages allow methods to be defined as properties of the object, but C isn't as permissive, for reasons explained later on, so “properties” and “methods” are treated as separate entities.

Given this, the following C functions are required:

  • cwp_set for setting a property value
  • cwp_get to get a property value
  • cwp_method to define a method
  • cwp_call to call a method

For completeness, two other functions, cwp_unset and cwp_unmethod, are defined to permanently remove a property or a method, respectively.

These are the functions needed to access the object-oriented interface, but something important is still missing: as explained earlier, when using prototypes, objects are created from other objects, so we need a starting point, i.e. an object that exists the moment the code is compiled and is never deleted.

We accomplish this by creating the cwpObject prototype statically.

cwpObject

The static cwpObject prototype requires at least a method to create other objects, which will be called “create” (in the example it's called “clone”), and a method to delete the object when it's not needed anymore (this is C: memory is not managed automatically), which will be called “destroy”.

Regarding properties, it's common for an object to override a method but then call the superclass's implementation somewhere inside it (the example clearly shows this in the overridden getA method). As such, the cwpObject prototype will include a “super” property containing the object acting as a superclass.

These are the strict minimum, but there are some more methods which are pretty useful to have. One of these is the “to string” method, used to generate a representation of the object that can be used with e.g. printf. Another one is the “equals” method: for those familiar with Java, it's the same thing. For those not familiar with it, it compares two objects by their contents, rather than by identity as == does.

How do we represent this in C? Of course, it's a struct:

struct object {
     /* fields are defined later on */
};

Naturally, this means that if we want to pass objects to functions or return objects from them, we need to make the actual type a pointer:

typedef struct object *cwp_object_t;
	  

For the sake of information hiding, the typedef will be placed in the header file (e.g. cwp.h) and the structure definition in the implementation file (e.g. cwp.c), so that users don't know the structure layout and use the cwp_object_t type instead.
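As a minimal sketch of this split, here is what each file might contain, condensed into one listing so that it compiles on its own. The functions cwp_sketch_new, cwp_sketch_size and cwp_sketch_free, and the placeholder nprops field, are hypothetical and exist only for this illustration.

```c
/* ---- cwp.h: the only part users see ---- */
#include <stddef.h>

typedef struct object *cwp_object_t;   /* opaque handle */

cwp_object_t cwp_sketch_new(void);              /* hypothetical */
size_t cwp_sketch_size(const cwp_object_t o);   /* hypothetical */
void cwp_sketch_free(cwp_object_t o);           /* hypothetical */

/* ---- cwp.c: the only place that knows the layout ---- */
#include <stdlib.h>

struct object {
     size_t nprops;   /* placeholder field for this sketch */
};

cwp_object_t cwp_sketch_new(void) {
     return calloc(1, sizeof(struct object));
}

size_t cwp_sketch_size(const cwp_object_t o) {
     return o->nprops;
}

void cwp_sketch_free(cwp_object_t o) {
     free(o);
}
```

Code that only sees the cwp.h part can store and pass around cwp_object_t values, but an expression like o->nprops does not compile there, since struct object is an incomplete type outside cwp.c.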

The object Structure

As explained earlier, an object contains any number of properties and methods, so struct object will contain vectors for properties and methods:

struct object {
     size_t nprops;
     struct property *properties;
     size_t nmeths;
     struct method *methods;
};
	  

Good practices tell us to also include the respective sizes.

If we just use the raw vectors, users will have to know the index of a particular entry (property or method) before they can use it. This makes code unnecessarily hard to read: what is that 0? Is it “create”? Is it “destroy”? As such, it's better to search properties and methods by name (i.e. by string) rather than by index:

struct property {
     const char *name;
     cwp_object_t value;
};
	  

Having properties store only objects will make things a bit more verbose, but simplifies some operations as shown later on.

This takes care of properties, but what about methods? How do we associate a function with a string? Thankfully, C has function pointers, which let us store a function in a variable, as long as the function's signature matches the pointer's type.

struct method {
     const char *name;
     cwp_method_t value;
};
	  

cwp_method_t is a typedef, which will be shown after introducing some additional concepts.

Indeed, struct property and struct method are essentially the same. Do we really need both? Not strictly, but using only one data type (e.g. by placing the two value entries in a union) would require additional casting or some indicator telling us whether we are dealing with a property or a method, which is error-prone and has no real practical advantage. With two different types, on the other hand, we don't need to pay as much attention to assignments, and we can also leverage the compiler's type checking.

Properties

Properties are values associated with a name. They can't be executed like methods, but can be used by methods to perform their operations. They are essentially the “instance variables” of class-based objects. Unfortunately, unlike instance variables, properties can't be made private, i.e. accessible only by methods. There is, however, a trick to “hide” them, explained near the end of the article.

Our struct property associates a name with a cwp_object_t value. We could extend the structure to also hold types like int or double, but then we'd need a way to pass the value with the appropriate type when setting a property (C is statically typed). This means having multiple “set” functions, one for each accepted type.

While it isn't particularly complex, it raises at least one important problem: what happens when we remove the property? As long as the value is an integer, there isn't much of an issue, but what if it is, for example, a string? Do we need to free it? What if the pointer to that string is still being used outside the object? The same goes for generic pointers.

There isn't a definite answer here; it depends entirely on what the implementor thinks is best. Since whatever decision we make is likely to be the wrong one in some use cases, we're going to delegate all these issues to the cwp_object_t's “destroy” method. After all, if we need int or float values, we can simply box them like Java does (and we're going to do exactly that later on).

Even though properties only have cwp_object_t values, there is still the problem of what to do when we destroy an object. After all, a single object can be a property of multiple objects. Consider this example (pseudocode using the functions explained earlier):

o1 = cwp_call(cwpObject, "create");
o2 = cwp_call(cwpObject, "create");
o3 = cwp_call(cwpObject, "create");

cwp_set(o1, "prop", o3);
cwp_set(o2, "prop", o3);
cwp_call(o1, "destroy");
cwp_call(o3, "destroy");
	  

The o3 object is a property of both o1 and o2, so when we destroy o1, o3 must still be alive. Similarly, when o3 is destroyed, o2 still has it as a property, meaning o3 must stay alive until o2 is destroyed.

We solve this problem by adding a form of reference counting. Reference counting is often associated with automatic memory management, but it solves our problem easily even though memory is managed manually.

Thus, struct object now looks like this:

struct object {
     size_t nref;
     size_t nprops;
     struct property *properties;
     size_t nmeths;
     struct method *methods;
};
	  

Now, setting a property will increment the object's nref field, while unsetting a property or destroying the container object will decrement the nref field. Of course, if there are no more references to the object, that means it can be safely destroyed.
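The following is a minimal sketch of the counting scheme, replaying the o1/o2/o3 example above. It assumes a single property slot per object and uses hypothetical names (struct rc_object, rc_create, rc_set, rc_destroy) rather than the CWP functions. In this sketch, every “destroy” drops exactly one reference, and memory is freed only when the count reaches zero.

```c
#include <stdlib.h>

/* Stripped-down object: a reference count and one property slot */
struct rc_object {
     size_t nref;
     struct rc_object *prop;   /* hypothetical single property */
};

/* Create an object; the caller owns the first reference */
static struct rc_object *rc_create(void) {
     struct rc_object *o = calloc(1, sizeof *o);
     if (o != NULL) {
          o->nref = 1;
     }
     return o;
}

/* Drop one reference; free the object when the last one goes away */
static void rc_destroy(struct rc_object *o) {
     if (o == NULL) {
          return;
     }
     o->nref -= 1;
     if (o->nref > 0) {
          return;
     }
     rc_destroy(o->prop);   /* release the container's reference */
     free(o);
}

/* Store value as the property: the container takes a new reference */
static void rc_set(struct rc_object *self, struct rc_object *value) {
     value->nref += 1;        /* increment first: value may equal the old one */
     rc_destroy(self->prop);  /* release the old value, if any */
     self->prop = value;
}
```

Plain reference counting can't reclaim cycles: an object that (directly or indirectly) holds itself as a property never reaches zero.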

Function Signatures

Before talking about defining methods, there is an important matter to take care of: as shown by struct method the actual value of a method is a function pointer. This means every function assigned to that pointer must have the same signature. The structure of this signature is decided entirely by the implementor, so users will have to put up with seemingly nonsensical choices.

Defining a good signature, especially for functions exposed to the users, is as hard as finding good names for variables. For CWP, I strived for signatures that are as uniform as possible, though they turned out to be a little bit unintuitive.

Some methods happen to return a value, while some others don't, so the first thing to do is decide how to handle these returned values.

A possible solution is to always specify the return value, like this:

cwp_object_t method(...)
	  

That would result in code like:

a = cwp_call(cwpObject, "create");
b = cwp_call(a, "a method");
...
cwp_call(a, "destroy");
cwp_call(b, "another method");
	  

However, some methods that normally wouldn't return values might do so for some particular object:

a = cwp_call(cwpObject, "create");
b = cwp_call(a, "special method");
...
cwp_call(a, "destroy");
c = cwp_call(b, "destroy");
	  

I find this type of interface too irregular and it's easy to forget special cases. Thus, I decided to make the returned value a function argument:

cwp_call(cwpObject, "create", &a);
cwp_call(a, "special method", &b);
...
cwp_call(a, "destroy", NULL);
cwp_call(b, "destroy", &c);
	  

While it's true that in practice it isn't too different from the case with the returned value, it makes the code as a whole look more harmonious, since every call looks the same.

We've taken care of return values, but what about errors? Sure, we could use the returned value to indicate an error, but some methods might return values of a type that isn't really suited for errors (e.g. integers: sometimes -1 is an error, while other times it's a valid value).

We could use the now free returned value of the method to report an error using a dedicated type (like an enum), but then it would be easy for users to forget about it. Consider this example:

cwp_call(cwpObject, "create", &a);
cwp_call(a, "method", NULL);
	  

What happens if the first cwp_call fails (e.g. because there is no memory available for the object)? The second call then operates on an invalid object and the program will most likely crash, but that might happen only after several days of execution, and debugging it would likely be hard.

To reduce these kinds of problems, the error is yet another function parameter; in particular, it's the first one. That way, users can't say they forgot about it: either they explicitly pass NULL, or they at least acknowledge that an error can occur at any time.

Lastly, methods need other parameters to work, which depend entirely on the method itself. This is easily solved by using variadic arguments (i.e. the <stdarg.h> header file.)

After all these considerations, a method will be defined as:

typedef void (*cwp_method_t)(cwp_error_t *, const cwp_object_t, const char *, void *, va_list *);
	  

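This typedef should be placed in the cwp.h header file, so that the function prototypes for cwp_call and the other functions can use it. It must come after the definition of cwp_error_t (shown later) and after including <stdarg.h>, which declares va_list.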

The returned value is of type void * because some methods can't return an object no matter what; to accommodate different types of values, void * is our only choice.

Of course, so far I've talked about methods, but the same reasoning applies to the user-facing functions:

void cwp_set(cwp_error_t *error, const cwp_object_t self, const char *name, cwp_object_t value);
void cwp_get(cwp_error_t *error, const cwp_object_t self, const char *name, cwp_object_t *returned);
void cwp_unset(cwp_error_t *error, const cwp_object_t self, const char *name);
void cwp_method(cwp_error_t *error, const cwp_object_t self, const char *name, cwp_method_t function);
void cwp_call(cwp_error_t *error, const cwp_object_t self, const char *name, void *returned, ...);
void cwp_unmethod(cwp_error_t *error, const cwp_object_t self, const char *name);
	  

The methods follow the same structure so that it's easier to define them (less cognitive load).
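To show how the variadic arguments can reach a method, here is one possible sketch of cwp_call's dispatch, reduced to the method vector alone. The linear search by name, the handling of a NULL method, and reporting an unknown name as cwpARG are assumptions made for this illustration; the add method is hypothetical.

```c
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

typedef enum { cwpNONE, cwpARG, cwpNOMEM } cwp_error_t;
typedef struct object *cwp_object_t;
typedef void (*cwp_method_t)(cwp_error_t *, const cwp_object_t, const char *, void *, va_list *);

struct method {
     const char *name;
     cwp_method_t value;
};

/* Only the method vector is kept for this sketch */
struct object {
     size_t nmeths;
     struct method *methods;
};

void cwp_call(cwp_error_t *error, const cwp_object_t self, const char *name, void *returned, ...) {
     size_t i;
     va_list args;

     if (self == NULL || name == NULL) {
          if (error) { *error = cwpARG; }
          return;
     }

     for (i = 0; i < self->nmeths; ++i) {
          if (strcmp(self->methods[i].name, name) != 0) {
               continue;
          }
          if (error) { *error = cwpNONE; }
          /* A NULL method (like cwpObject's "destroy") does nothing */
          if (self->methods[i].value != NULL) {
               va_start(args, returned);
               self->methods[i].value(error, self, name, returned, &args);
               va_end(args);
          }
          return;
     }

     /* Assumption for this sketch: an unknown method is an argument error */
     if (error) { *error = cwpARG; }
}

/* Hypothetical method: returns the sum of two int arguments */
static void add(cwp_error_t *error, const cwp_object_t self, const char *name, void *returned, va_list *args) {
     int a = va_arg(*args, int);
     int b = va_arg(*args, int);

     (void)self;
     (void)name;
     if (error) { *error = cwpNONE; }
     if (returned) { *(int *)returned = a + b; }
}
```

Passing a pointer to the va_list, rather than the va_list itself, is explicitly allowed by the C standard, and it means cwp_call can still use its own list after the method returns (and remains the one responsible for calling va_end).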


Methods

After the discussion about function signatures, defining methods is actually fairly straightforward: first, we define a function with the cwp_method_t signature, then use it as the method value. Since this is one of the most common actions, we can define a macro that saves us from writing out the signature every time (which also helps if the signature changes for one reason or another):

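/* The macro parameter must not be called "name": it would also replace
   the method's own name parameter */
#define cwp_defmethod(fname) void fname(cwp_error_t *error, const cwp_object_t self, const char *name, void *returned, va_list *args)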

It can then be used like:

static cwp_defmethod(a_method); /* Function prototype */
...
static cwp_defmethod(a_method) {
     /* The function body goes here */
}
	  

The arguments are implicitly declared, sure, but with an appropriate set of macros we can deal with that too:

#define cwp_error(value) do { if (error) { *error = value; } } while (0)
#define cwp_return(value, type) do { if (returned) { *(type *)returned = value; } } while (0)
	  

As explained earlier, methods can return any type of value, so the cwp_return macro needs to specify the type of the value.

They can be used like this:

static cwp_defmethod(a_method) {
     cwp_error(cwpNONE); /* Everything is fine */

     cwp_return(0, int);
}

We can also define a macro for the variadic arguments, like:

#define cwp_argument(type) va_arg(*args, type)
	  

Which is used like this:

static cwp_defmethod(a_method) {
     int a = cwp_argument(int);
}

Among the possible ways a method can fail, the most common are wrong arguments and running out of memory (the latter usually in the “create” method), so a minimal definition for cwp_error_t would be this:

typedef enum {
     cwpNONE,
     cwpARG,
     cwpNOMEM,
} cwp_error_t;
	  

cwpNONE means there were no errors. It should always be set when a method terminates correctly; otherwise, a caller reusing the same error variable might see a stale error left over from an earlier call.
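Putting the macros together, a complete method might look like the sketch below. int_divide is a hypothetical method, and call_divide is a hypothetical stand-in for cwp_call that only exists to supply the va_list. Note that the macro parameter is named fname, so that it doesn't also replace the method's name parameter.

```c
#include <stdarg.h>
#include <stddef.h>

typedef enum { cwpNONE, cwpARG, cwpNOMEM } cwp_error_t;
typedef struct object *cwp_object_t;

#define cwp_defmethod(fname) void fname(cwp_error_t *error, const cwp_object_t self, const char *name, void *returned, va_list *args)
#define cwp_error(value) do { if (error) { *error = value; } } while (0)
#define cwp_return(value, type) do { if (returned) { *(type *)returned = value; } } while (0)
#define cwp_argument(type) va_arg(*args, type)

/* Hypothetical method: divide two ints, rejecting a zero divisor */
static cwp_defmethod(int_divide) {
     int a = cwp_argument(int);
     int b = cwp_argument(int);

     (void)self;
     (void)name;

     cwp_error(cwpNONE);   /* clear any stale error first */
     if (b == 0) {
          cwp_error(cwpARG);
          return;
     }
     cwp_return(a / b, int);
}

/* Hypothetical variadic front end, standing in for cwp_call */
static void call_divide(cwp_error_t *error, void *returned, ...) {
     va_list args;

     va_start(args, returned);
     int_divide(error, NULL, "divide", returned, &args);
     va_end(args);
}
```

Because cwpNONE is set first, a caller whose error variable still holds cwpNOMEM from an earlier call sees it cleared after a successful call.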

About the returned value: what happens if we want to return nothing? A straightforward implementation might simply return NULL, but that's not ideal. NULL causes issues whenever it's not expected and is, in general, a headache (Tony Hoare, who introduced null references, later called them his “billion-dollar mistake”). It's much better to return a special cwp_object_t value, so that other functions working with objects always operate on a valid object. This special object will be called cwpEmpty.

Defining the cwpObject prototype

Now that we have a framework to define properties and methods, we can finally define the cwpObject prototype.

Since it has to be used in cwp_call, its type must be cwp_object_t, i.e. a pointer. However, the object must already exist before the program is executed (we can't rely on users calling a function like init_cwp, and doing that automatically is a hack.)

We solve this problem by defining a variable of type struct object (not a pointer), then assigning its address to cwpObject:

struct object object = {
     /* filled in below */
};

cwp_object_t cwpObject = &object;

What's inside object? The first value is, of course, the nref field. Since cwpObject always exists, it effectively holds a reference to itself, so its value will be 1.

After the number of references, we have the properties and the methods. These vectors must be statically initialized too, so we apply the same trick and define two more variables:

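/* Forward declarations: the definitions come further down in cwp.c */
struct object empty;
static cwp_defmethod(object_create);
static cwp_defmethod(object_destroy);
static cwp_defmethod(object_equals);
static cwp_defmethod(object_tostring);

struct property object_properties[] = {
     {"super", &empty},
};

struct method object_methods[] = {
     {"create", object_create},
     {"destroy", NULL},   /* cwpObject itself can't be destroyed */
     {"equals", object_equals},
     {"to string", object_tostring},
};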

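&empty is the address of the structure behind the cwpEmpty object introduced previously; its definition follows these variables. We use &empty rather than cwpEmpty because a static initializer must be a constant expression: the address of a static object is one, while the value of a variable such as cwpEmpty is not.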

Using cwpEmpty as cwpObject's superclass is purely an implementation choice; other implementors might decide to make it something else.

The C functions used as methods are, of course, defined using the framework fleshed out earlier. The “destroy” method is special: since cwpObject can't be destroyed, we can't assign a function to it, but at the same time the method must be available to objects created from this prototype. A NULL pointer solves this, as long as cwp_call properly checks for it: in that case nothing is executed and cwpNONE is returned as the error code.

The “to string” method is defined using a space in its name, rather than using the more familiar “toString”. This is merely to show that we are not bound to any particular limitation with names, as long as it's a valid string.

Now that we have these vectors, our object becomes:

struct object object = {
     1,
     sizeof(object_properties)/sizeof(object_properties[0]), object_properties,
     sizeof(object_methods)/sizeof(object_methods[0]), object_methods,
};
	  

By using sizeof, we can expand the vectors without worrying about updating the nprops and nmeths fields: the compiler does it for us.

The cwpEmpty object is defined in a similar way:

struct method empty_methods[] = {
     {"equals", object_equals},
     {"to string", empty_tostring},
};

struct object empty = {
     1,
     0, NULL,
     sizeof(empty_methods)/sizeof(empty_methods[0]), empty_methods,
};

cwp_object_t cwpEmpty = &empty;
	  

cwpEmpty isn't supposed to be used as a prototype, so it has no properties and no “create” method (and, naturally, no “destroy” method either). However, the “equals” and “to string” methods are pretty useful (especially “equals”), so they are made available to the object.
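Gathered into one place, the static setup might look like the sketch below (methods omitted for brevity). It illustrates that empty has to be declared before object_properties refers to it; here this is done with a tentative definition that is completed a few lines later. The COUNT_OF macro is a hypothetical helper for the sizeof idiom.

```c
#include <stddef.h>

typedef struct object *cwp_object_t;

struct property {
     const char *name;
     cwp_object_t value;
};

/* Methods are omitted to keep the sketch short */
struct object {
     size_t nref;
     size_t nprops;
     struct property *properties;
};

/* Hypothetical helper for the sizeof idiom: works on arrays, not pointers */
#define COUNT_OF(a) (sizeof(a)/sizeof((a)[0]))

/* Tentative definition: lets object_properties take empty's address */
struct object empty;

static struct property object_properties[] = {
     {"super", &empty},
     /* new entries here update nprops automatically */
};

struct object empty = { 1, 0, NULL };

static struct object object = {
     1,
     COUNT_OF(object_properties), object_properties,
};

cwp_object_t cwpObject = &object;
cwp_object_t cwpEmpty = &empty;
```

Every initializer here is a constant expression (sizes and addresses of static objects), so cwpObject and cwpEmpty are valid as soon as the program starts, without any initialization function.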

Creating Objects

Now that our starting point is ready, we can finally define our generic “create” method. It's “generic” because this method will simply clone an already existing object as-is, without modifying anything. Specific changes are left to users when they want to “subclass” (or rather, specialize) an object.

There are many ways in which an object can be cloned. In some languages, changing a prototype also changes the objects already cloned from it, while others keep the two entities separate, i.e. once an object is cloned, changes to the prototype don't affect the clone.

There are practical advantages and disadvantages to both choices, and neither is strictly better. However, the second option is easier and less error-prone to implement, so that's how CWP objects will be created. On the other hand, it uses a lot more memory, since every clone carries its own copy of the property and method vectors.
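The difference between the two strategies can be shown with a deliberately tiny, hypothetical object type: struct mini, with at most four integer slots, plus the functions below. clone_copy behaves like CWP's copying clone, while clone_delegate stands for the other approach, where a clone forwards the lookups it can't answer to its prototype.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature object: up to four named int slots and a parent */
struct mini {
     struct mini *parent;     /* used only by delegating clones */
     size_t n;
     const char *names[4];
     int values[4];
};

/* Copying clone (CWP's choice): the clone gets its own copy of every slot */
static struct mini *clone_copy(const struct mini *proto) {
     struct mini *c = malloc(sizeof *c);
     if (c != NULL) {
          *c = *proto;
          c->parent = NULL;
     }
     return c;
}

/* Delegating clone: starts empty and forwards lookups to its prototype */
static struct mini *clone_delegate(struct mini *proto) {
     struct mini *c = calloc(1, sizeof *c);
     if (c != NULL) {
          c->parent = proto;
     }
     return c;
}

/* Look a slot up, following parent links; 0 if found, -1 otherwise */
static int mini_get(const struct mini *o, const char *name, int *out) {
     size_t i;

     for (; o != NULL; o = o->parent) {
          for (i = 0; i < o->n; ++i) {
               if (strcmp(o->names[i], name) == 0) {
                    *out = o->values[i];
                    return 0;
               }
          }
     }
     return -1;
}
```

With copying, each clone pays for its own copy of the slots but never changes behind its owner's back; with delegation, clones are small but silently pick up later changes to the prototype.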

Even though the method isn't supposed to modify the new object, it does actually need to make an important change: the “destroy” method needs to be changed from NULL to the object_destroy function.

The method would then need to follow these steps:

  1. Check if the arguments are valid
  2. Allocate enough memory for the object and the object's vectors
  3. Copy the prototype's property and method vectors
  4. Change the “destroy” method
  5. Increment the properties' nref field

Later on, some special objects will be introduced which follow a similar pattern. For convenience, a function oalloc implementing the steps 2, 3, and 5 will be defined:

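/* Requires <stdlib.h> and <string.h>; strdup is POSIX (and C23) */
static cwp_object_t oalloc(cwp_error_t *error,
                           size_t nprops, struct property *props,
                           size_t nmeths, struct method *methods) {
     cwp_object_t r;
     size_t i, j;
     char *c;

     r = calloc(1, sizeof(struct object));
     if (r == NULL) {
          cwp_error(cwpNOMEM);
          return NULL;
     }

     r->nref = 1;
     r->nprops = nprops;
     r->nmeths = nmeths;

     if (r->nprops > 0) {
          r->properties = calloc(r->nprops, sizeof(struct property));
          if (r->properties == NULL) {
               free(r);
               cwp_error(cwpNOMEM);
               return NULL;
          }
     }

     if (r->nmeths > 0) {
          r->methods = calloc(r->nmeths, sizeof(struct method));
          if (r->methods == NULL) {
               free(r->properties);
               free(r);
               cwp_error(cwpNOMEM);
               return NULL;
          }
     }

     if (r->nprops > 0) {
          r->properties = memcpy(r->properties, props, r->nprops*sizeof(struct property));
     }

     if (r->nmeths > 0) {
          r->methods = memcpy(r->methods, methods, r->nmeths*sizeof(struct method));
     }

     /* Names past the statically defined ones must be owned by the clone */
     for (i=object.nprops; i<r->nprops; ++i) {
          if ((c = strdup(r->properties[i].name)) == NULL) {
               break;
          }
          r->properties[i].name = c;
     }

     for (j=object.nmeths; i>=r->nprops && j<r->nmeths; ++j) {
          if ((c = strdup(r->methods[j].name)) == NULL) {
               break;
          }
          r->methods[j].name = c;
     }

     if (i < r->nprops || j < r->nmeths) {
          /* A strdup failed: keeping the shared name would make "destroy"
             free memory the clone doesn't own, so undo everything */
          while (i-- > object.nprops) {
               free((char *)r->properties[i].name);
          }
          while (j-- > object.nmeths) {
               free((char *)r->methods[j].name);
          }
          free(r->properties);
          free(r->methods);
          free(r);
          cwp_error(cwpNOMEM);
          return NULL;
     }

     /* The clone now holds a reference to each property value */
     for (i=0; i<r->nprops; ++i) {
          r->properties[i].value->nref += 1;
     }

     return r;
}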

The name of a property or a method is duplicated for reasons that will be explained later on (some readers may have already spotted the problem it solves!)

Now, the object_create function will be defined like this:

static cwp_defmethod(object_create) {
     cwp_object_t r;

     cwp_error(cwpNONE);

     if (returned == NULL) {
          return;
     }

     if (self == NULL) {
          cwp_error(cwpARG);
          return;
     }

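     /* Clone self, whatever prototype it is, not always cwpObject */
     r = oalloc(error, self->nprops, self->properties, self->nmeths, self->methods);

     if (r == NULL) {
          return;
     }

     /* "destroy" is always the second method; don't override a custom
        one inherited from the prototype */
     if (r->methods[1].value == NULL) {
          r->methods[1].value = object_destroy;
     }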

     cwp_return(r, cwp_object_t);
}
	  

When returned is NULL, we don't create the object at all because the caller isn't interested in the return value (for whatever reason). When allocating the object, the error code is set by oalloc so in case of errors we can simply return.

When an object is freshly created, before being returned to the caller, the layout of the methods and the properties is known. As such, the “destroy” method is set to object_destroy immediately using an index in the vector.

The generic “create” doesn't need any additional arguments, so only self is checked for validity.

Destroying Objects

We have a method to create objects, but now we need a method to destroy them. Intuitively, it's just a matter of reversing the “create” method:

  1. Check if the arguments are valid
  2. Decrement the properties' nref field
  3. Destroy properties with no more references
  4. Free the allocated memory for the object

There's not much else to explain, so here is the C code:

static cwp_defmethod(object_destroy) {
     size_t i;
     cwp_error_t err;

     cwp_error(cwpNONE);
     cwp_return(cwpEmpty, cwp_object_t);

     if (self == NULL) {
          cwp_error(cwpARG);
          return;
     }

     /* Other references remain: just drop the caller's one */
     if (self->nref > 1) {
          self->nref -= 1;
          return;
     }

     for (i=0; i<self->nprops; ++i) {
          /* Skip self-references to avoid infinite recursion */
          if (self->properties[i].value != self) {
               /* "destroy" releases our reference to the value and frees
                  it only if that was the last one */
               err = cwpNONE;
               cwp_call(&err, self->properties[i].value, "destroy", NULL);
               if (err != cwpNONE) {
                    /* error may be NULL, so it is never dereferenced here */
                    cwp_error(err);
                    return;
               }
          }

          if (i >= object.nprops) {
               free((char *)self->properties[i].name);
          }
     }

     for (i=object.nmeths; i<self->nmeths; ++i) {
          free((char *)self->methods[i].name);
     }

     free(self->properties);
     free(self->methods);
     free(self);
}

Since this method isn't expected to return a meaningful value, it returns cwpEmpty.

When examining properties, special care must be taken as an object can contain itself as a property. Since the “destroy” method is called on the object's properties, if a property is the object itself that would cause a sort of infinite recursion. When this case is met, we can skip the property, because the value is going to be destroyed anyway.

Now, here comes something interesting: what happens when the allocated memory is freed? Of course, the answer is that it becomes invalid.

When creating a new object, memory is simply copied over using memcpy. This isn't a big deal, because what is copied is just pointers. However, when an object is destroyed, all the memory allocated for it should be freed.

Since memcpy only copies pointers, the two objects end up sharing the same data, so we can't just call free on those pointers: if the data is still used by a living object, it becomes invalid, causing a crash (if nothing worse).

As such, names are duplicated. Objects take care of deleting themselves thanks to the nref field (as long as objects are never destroyed with free, but only with the “destroy” method) and function pointers don't need to be deallocated. This means the burden of managing the strings is entirely on us. Duplicating the string as a newly allocated piece of memory solves all of our problems, since we can then deallocate it without worrying.

Of course, we don't need to duplicate every possible name. The statically defined names inside object don't need to be duplicated, since they must be always available and can never be deallocated (otherwise it's impossible to create new objects).

We now have a method to create and a method to destroy objects. “Deep equality”, performed by the “equals” method, is just a matter of comparing each property and method of the two objects (with “equals”). Special cases that can optimize execution aside, it's a fairly trivial implementation, and as such it's omitted. The “to string” method will be explained later, after introducing other concepts.

Setting, Getting and Unsetting

After creating an object, it's only natural that one would want to modify its properties or methods.

Since they are defined as vectors, getting a property or a method is a matter of following these steps:

  1. Check if the arguments are valid
  2. Search the vector for the property or the method with the desired name
  3. If the element is found, return or execute it
  4. Otherwise, return cwpEmpty
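A minimal sketch of steps 2 to 4 for properties, assuming a plain linear search with strcmp; find_property and get_property are hypothetical helper names, and only the fields needed here are kept in struct object.

```c
#include <stddef.h>
#include <string.h>

typedef struct object *cwp_object_t;

struct property {
     const char *name;
     cwp_object_t value;
};

struct object {
     size_t nref;
     size_t nprops;
     struct property *properties;
};

static struct object empty = { 1, 0, NULL };
static cwp_object_t cwpEmpty = &empty;

/* Hypothetical helper: index of the named property, or nprops if absent */
static size_t find_property(const struct object *self, const char *name) {
     size_t i;

     for (i = 0; i < self->nprops; ++i) {
          if (strcmp(self->properties[i].name, name) == 0) {
               break;
          }
     }
     return i;
}

/* Sketch of a getter: a missing property yields cwpEmpty, never NULL */
static cwp_object_t get_property(const struct object *self, const char *name) {
     size_t i = find_property(self, name);

     return (i < self->nprops) ? self->properties[i].value : cwpEmpty;
}
```

Because the result is never NULL, a caller can pass it straight to another function that expects an object.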

Similarly, setting a property or a method would follow these steps:

  1. Check if the arguments are valid
  2. Search the vector for the property or the method with the desired name
  3. If the element is found, change the value field
    • If it was a property, decrease the old value's nref field
    • If the old value has no more references, call “destroy” on it
  4. Otherwise, increase the vector's size and add the new property or method to it

For both these actions, we need to first search the appropriate vector to find out if the object already has the property or the method. There are probably more efficient implementations than plain vectors for this use case, but for the sake of simplicity and ease of understanding, let's roll with that.

Like when creating an object, it's important to duplicate the string used as a name. However, unlike the “create” method, we can't trust the input: during creation we can safely assume the strings are properly NUL-terminated, which can't be said for user-provided strings. Using asprintf to duplicate the string is one option, though not an end-all solution: with a plain "%s" format it still reads up to the first NUL, just like strdup, so bounding the length (e.g. with strndup or a "%.*s" format) is what actually guards against unterminated input.

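Similarly, changing the vector's size can introduce some very subtle bugs. Using reallocarray, which fails instead of silently overflowing when multiplying the element count by the element size, makes the operation a bit safer, at the price of requiring a fairly recent feature that isn't available on every platform.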
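As a sketch of the growth step, assuming a vector of properties like the one above (with the value simplified to void *): grow_array is a hypothetical, portable stand-in that performs the same overflow check as reallocarray, and append_property is likewise hypothetical. The important detail is that the result of the reallocation goes into a temporary, so that a failure leaves the old vector intact.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct property {
     const char *name;
     void *value;   /* simplified: the real vector stores cwp_object_t */
};

/* Portable stand-in for reallocarray(ptr, n, size): refuse n*size overflow */
static void *grow_array(void *ptr, size_t n, size_t size) {
     if (size != 0 && n > SIZE_MAX / size) {
          return NULL;
     }
     return realloc(ptr, n * size);
}

/* Append a property; on failure nothing changes and -1 is returned */
static int append_property(struct property **props, size_t *nprops,
                           const char *name, void *value) {
     struct property *p;
     char *copy;

     /* strdup equivalent, so the sketch doesn't depend on POSIX */
     copy = malloc(strlen(name) + 1);
     if (copy == NULL) {
          return -1;
     }
     strcpy(copy, name);

     p = grow_array(*props, *nprops + 1, sizeof **props);
     if (p == NULL) {
          free(copy);   /* the old vector is still valid */
          return -1;
     }
     p[*nprops].name = copy;
     p[*nprops].value = value;
     *props = p;
     *nprops += 1;
     return 0;
}
```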

Unsetting (or “deleting”) a property or a method is a special case: to begin with, we have to deal with the fact that the requested method or property might be in the middle of the vector, so we can't just do the reverse of cwp_set by shrinking it.

The solution is actually fairly simple: we swap the position of the requested property or method with the last element of the vector, then perform the other operations.

However, there is at least one other important issue: some properties or methods are required by other functions or methods to perform their duties. The clearest example is “destroy”, which is called by many functions to make sure objects are destroyed and there are no memory leaks. As such, if the property or method to delete has an index in the vector which is less than object.nprops or object.nmeths, the property or method is not deleted.
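The swap-with-last removal, including the check that protects the entries inherited from cwpObject, can be sketched as follows. remove_at is a hypothetical helper operating on indices, and its nbase parameter plays the role of object.nmeths (or object.nprops for properties); freeing the removed name and shrinking the allocation are left out.

```c
#include <stddef.h>

struct method {
     const char *name;
     void (*value)(void);   /* simplified stand-in for cwp_method_t */
};

/*
 * Remove entry i by moving the last entry into its slot.
 * Entries below nbase (inherited from cwpObject) are refused.
 * Returns 0 on success, -1 if the entry is protected or out of range.
 */
static int remove_at(struct method *meths, size_t *nmeths, size_t nbase, size_t i) {
     if (i < nbase || i >= *nmeths) {
          return -1;
     }
     meths[i] = meths[*nmeths - 1];
     *nmeths -= 1;
     return 0;
}
```

Swapping changes the order of the remaining entries, which is harmless here because entries are always looked up by name.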

The actual implementation of the cwp_set/cwp_method, cwp_get/cwp_call and cwp_unset/cwp_unmethod functions is not shown because it's just a matter of following what has been explained already, without any particularly special considerations.

Boxing: Making cwpObject Actually Useful

Now we can add, change or remove properties and methods from our objects. However, right now it's hardly useful. After all, properties can only be objects, and even though methods can operate on arbitrary data, that would miss the point of object orientation. What we need is a way to store arbitrary data, e.g. an integer, as a property.

The solution is to create an object that contains the actual value, like a cwpInteger that contains the integer value we want to store. Taking from the Java world, this process is called boxing.

Ideally, we want to store as many built-in types inside properties as possible, so we create the following prototypes:

  • cwpString for strings (char *)
  • cwpInt for signed integers (int)
  • cwpNat for “natural numbers” (unsigned int)
  • cwpLongInt and cwpLongLongInt for signed integers with more bits (long int, long long int)
  • cwpLongNat and cwpLongLongNat for natural numbers with more bits (unsigned long int, unsigned long long int)
  • cwpFloat and cwpDouble for floating-point numbers (float, double)

How do we store the boxed value? After all, we specifically forbade adding those fields to struct property because of easy-to-introduce errors. The answer is another struct:

struct primitive {
     const char *name;
     char *string_value;
     int int_value;
     unsigned int uint_value;
     long int long_value;
     unsigned long int ulong_value;
     long long int longlong_value;
     unsigned long long int ulonglong_value;
     float float_value;
     double double_value;
};
	  

This makes our struct object look like this:

struct object {
     size_t nref;
     size_t nprops;
     struct property *properties;
     size_t nmeths;
     struct method *methods;
     size_t nprivs;
     struct primitive *primitives;
};
	  

Unlike struct property, this structure is completely invisible to the user. In particular, we forbid any modification after creating the object. That way, we reduce the risk of introducing bugs caused by mismanagement of these values.

With this new structure, our object becomes:

static struct object object = {
     1,
     sizeof(object_properties)/sizeof(object_properties[0]), object_properties,
     sizeof(object_methods)/sizeof(object_methods[0]), object_methods,
     0, NULL,
};
	  

Of course, a plain object doesn't need primitive values, so their count is 0 and the pointer is set to NULL (the same goes for cwpEmpty).

Also, the oalloc function needs to be changed accordingly. It's just a matter of changing the signature:

static cwp_object_t oalloc(cwp_error_t *error,
                           size_t nprops, struct property *props,
                           size_t nmeths, struct method *methods, 
                           size_t nprivs, struct primitive *privs)
	  

And adding a block of code that copies the primitive values into the new object:

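if (r->nprivs > 0) {
     r->primitives = calloc(r->nprivs, sizeof(struct primitive));
     if (r->primitives == NULL) {
          free(r->properties);
          free(r->methods);
          free(r);        /* r itself was heap-allocated by oalloc too */
          cwp_error(cwpNOMEM);

          return NULL;
     }

     /* Shallow copy: the prototype's values become the defaults. */
     memcpy(r->primitives, privs, r->nprivs*sizeof(struct primitive));
} else {
     r->primitives = NULL;
}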
	  

Actually, since this structure is fairly limited in what it can do, we could probably optimize its memory usage by placing the values in a union or something similar. For the sake of simplicity, though, the implementation explained in this article keeps it as is.
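As a sketch of that optimization (not what the implementation in this article does), a tagged union stores a single value plus a tag saying which member is live. primitive_compact, primitive_kind and primitive_as_longlong are hypothetical names.

```c
#include <assert.h>

/* Hypothetical tag telling which union member holds the value. */
enum primitive_kind {
    PRIM_STRING, PRIM_INT, PRIM_UINT, PRIM_LONG, PRIM_ULONG,
    PRIM_LONGLONG, PRIM_ULONGLONG, PRIM_FLOAT, PRIM_DOUBLE
};

/* Compact alternative to struct primitive: one value instead of nine. */
struct primitive_compact {
    const char *name;
    enum primitive_kind kind;
    union {
        char *string_value;
        int int_value;
        unsigned int uint_value;
        long int long_value;
        unsigned long int ulong_value;
        long long int longlong_value;
        unsigned long long int ulonglong_value;
        float float_value;
        double double_value;
    } as;
};

/* Read a signed integer primitive as a long long.
 * Returns 0 on success, -1 if the tag is not a signed integer kind. */
static int primitive_as_longlong(const struct primitive_compact *p,
                                 long long *out)
{
    switch (p->kind) {
    case PRIM_INT:      *out = p->as.int_value;      return 0;
    case PRIM_LONG:     *out = p->as.long_value;     return 0;
    case PRIM_LONGLONG: *out = p->as.longlong_value; return 0;
    default:            return -1;
    }
}
```

The trade-off is that code can no longer read or compare every field blindly: something like number_equals would first have to switch on the tag and convert both values to a common type.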

Now we can define the prototypes. Numbers are essentially the same, except for the “create” method, which takes a value of the appropriate type and assigns it to the appropriate field in the vector of primitive values.

As such, only cwpInt will be shown:

/* Forward declarations: the method table below refers to these
   functions before they are defined. */
static cwp_defmethod(int_create);
static cwp_defmethod(int_value);
static cwp_defmethod(number_equals);

struct property box_properties[] = {
     {"super", &object},
};

struct primitive number_primitives[] = {
     {"value", NULL, 0, 0, 0, 0, 0, 0, 0, 0},
};

struct method int_methods[] = {
     {"create", int_create},
     {"destroy", NULL},
     {"equals", number_equals},
     {"value", int_value},
};

struct object integer = {
     1,
     sizeof(box_properties)/sizeof(box_properties[0]), box_properties,
     sizeof(int_methods)/sizeof(int_methods[0]), int_methods,
     sizeof(number_primitives)/sizeof(number_primitives[0]), number_primitives,
};

cwp_object_t cwpInt = &integer;

static cwp_defmethod(int_create) {
     cwp_object_t r;
     int number;

     cwp_error(cwpNONE);

     if (returned == NULL) {
          return;
     }

     r = oalloc(error, integer.nprops, integer.properties, integer.nmeths, integer.methods, integer.nprivs, integer.primitives);

     if (r == NULL) {
          return;
     }

     /* The static prototype has no destructor; instances do. */
     r->methods[1].value = object_destroy;

     number = cwp_argument(int);

     r->primitives[0].int_value = number;

     cwp_return(r, cwp_object_t);
}

static cwp_defmethod(int_value) {
     int number;

     cwp_error(cwpNONE);

     if (self == NULL) {
          cwp_error(cwpARG);
          cwp_return(0, int);
          return;
     }

     number = self->primitives[0].int_value;

     cwp_return(number, int);
}
	  

We keep the “destroy” method because, even though it's a boxed integer, it's still an object, so users can add properties and methods to it.

box_properties is used by cwpString too, while number_primitives is used by the objects boxing a number.

Like object_equals, number_equals is also very simple: it's a matter of comparing each number field of the two objects (after all, the float number 0.0 is the same as the integer number 0, mathematically).
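A minimal sketch of the field-by-field comparison follows. primitive_numbers_equal is a hypothetical helper that a method like number_equals could call; it works on a local copy of struct primitive so it can be compiled on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local copy of struct primitive as defined earlier in the article. */
struct primitive {
    const char *name;
    char *string_value;
    int int_value;
    unsigned int uint_value;
    long int long_value;
    unsigned long int ulong_value;
    long long int longlong_value;
    unsigned long long int ulonglong_value;
    float float_value;
    double double_value;
};

/* Compare every numeric field of two boxed numbers;
 * name and string_value are ignored. */
static bool primitive_numbers_equal(const struct primitive *a,
                                    const struct primitive *b)
{
    return a->int_value == b->int_value
        && a->uint_value == b->uint_value
        && a->long_value == b->long_value
        && a->ulong_value == b->ulong_value
        && a->longlong_value == b->longlong_value
        && a->ulonglong_value == b->ulonglong_value
        && a->float_value == b->float_value
        && a->double_value == b->double_value;
}
```

One consequence worth keeping in mind: since each boxed number only fills the field of its own type and leaves the others at 0, this rule makes an int 0 equal to a float 0.0, but an int 1 and a float 1.0 compare as different. Treating them as equal would require converting both values to a common type (for example double) before comparing.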

After defining all the boxed numbers, we can finally store a number inside an object:

cwp_call(&error, cwpObject, "create", &o);
cwp_call(&error, cwpInt, "create", &i, -7);
cwp_set(&error, o, "a number", i);
	  

Now, let's define cwpString.

The structure of this object is pretty much the same, with a few differences. To begin with, the “value” primitive is stored in the string_value field. But a string also has an inherent length. We define it as a primitive value, initially set to 0 (i.e. the empty string), and then define a method “length” (or “size”, or even both) to get this number.

Thus, cwpString is defined like this:

/* Forward declarations for the method table below. */
static cwp_defmethod(string_create);
static cwp_defmethod(string_destroy);
static cwp_defmethod(string_equals);
static cwp_defmethod(string_length);
static cwp_defmethod(string_toarray);

struct method string_methods[] = {
     {"create", string_create},
     {"destroy", NULL},
     {"equals", string_equals},
     {"length", string_length},
     {"to c array", string_toarray},
};

struct primitive string_primitives[] = {
     {"length", NULL, 0, 0, 0, 0},
     {"value", "", 0, 0, 0, 0},
};

struct object string = {
     1,
     sizeof(box_properties)/sizeof(box_properties[0]), box_properties,
     sizeof(string_methods)/sizeof(string_methods[0]), string_methods,
     sizeof(string_primitives)/sizeof(string_primitives[0]), string_primitives,
};

cwp_object_t cwpString = &string;

static cwp_defmethod(string_create) {
     cwp_object_t r;
     char *str, *copy;
     int size;

     cwp_error(cwpNONE);

     if (returned == NULL) {
          return;
     }

     str = cwp_argument(char *);
     if (str == NULL) {
          cwp_error(cwpARG);
          cwp_return(cwpEmpty, cwp_object_t);
          return;
     }

     size = asprintf(&copy, "%s", str);
     if (size < 0) {
          cwp_error(cwpNOMEM);
          cwp_return(cwpEmpty, cwp_object_t);
          return;
     }

     r = oalloc(error, string.nprops, string.properties, string.nmeths, string.methods, string.nprivs, string.primitives);

     if (r == NULL) {
          free(copy);     /* don't leak the duplicated string */
          return;
     }

     r->methods[1].value = string_destroy;

     r->primitives[0].int_value = size;    /* "length" */
     r->primitives[1].string_value = copy; /* "value" */

     cwp_return(r, cwp_object_t);
}

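static cwp_defmethod(string_destroy) {
     cwp_error(cwpNONE);
     cwp_return(cwpEmpty, cwp_object_t);

     if (self == NULL) {
          cwp_error(cwpARG);
          return;
     }

     /* Other references remain: drop this one and keep the string. */
     if (self->nref > 1) {
          self->nref--;
          return;
     }

     /* Last reference: free the boxed copy, then the rest of the object. */
     free(self->primitives[1].string_value);
     self->primitives[1].string_value = NULL;

     object_destroy(error, self, returned, args);
}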
	  

Like when dealing with property or method names, we must duplicate the string that will be boxed. Not only does this make sure we can safely deallocate all the memory used by our string object, it also makes sure (as far as asprintf goes) that our strings are NUL-terminated. Another advantage is that strings can't be manipulated from outside, which makes them immutable objects.

String equality is essentially a wrapper around a strcmp of the string_value fields of the two objects. Unlike with normal objects, when comparing strings what we care about is whether or not the “value” properties are the same, so we ignore other properties and methods (but, of course, a subclass might think differently!).
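The comparison described above might look like the following sketch. struct string_box is a hypothetical, minimal stand-in for the two string primitives (“length” and “value”); since the length is computed once in string_create, it can serve as a cheap early exit before calling strcmp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the "length" and "value" primitives. */
struct string_box {
    int length;
    char *string_value;
};

/* Two boxed strings are equal when their text is equal;
 * any other property or method is deliberately ignored. */
static bool string_box_equals(const struct string_box *a,
                              const struct string_box *b)
{
    if (a == NULL || b == NULL)
        return a == b;             /* equal only if both are missing */
    if (a->length != b->length)
        return false;              /* different lengths: no strcmp needed */
    return strcmp(a->string_value, b->string_value) == 0;
}
```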

cwpString contains a special method, “to c array”. This method is used to get the string as a char *, so that it can be used in e.g. printf.

Ultimately, it's just a matter of returning the content of the string_value field, but we can't just return the pointer, as that would break consistency within the object (that is, someone might change the contents of the string or, worse, deallocate it!).

As such, the method is defined as:

static cwp_defmethod(string_toarray) {
     char *copy;

     if (self == NULL) {
          cwp_error(cwpARG);
          cwp_return(NULL, char *);
          return;
     }

     if (asprintf(&copy, "%s", self->primitives[1].string_value) < 0) {
          cwp_error(cwpNOMEM);
          cwp_return(NULL, char *);
          return;
     }

     cwp_error(cwpNONE);

     cwp_return(copy, char *);
}
	  

Now that we also have string objects, we can get a printable representation of our objects:

cwp_call(&error, o, "to string", &s);
cwp_call(&error, s, "to c array", &p);
printf("%s\n", p);
free(p);
cwp_call(&error, s, "destroy", NULL);
	  

The “to string” method is just a matter of returning a new cwpString (or one cached in one of the properties) to the caller. Of course, the returned values (especially the one created by “to c array”) should be deallocated after being used.

About Generic Pointers

With this implementation, the most essential types can be stored as a property of an object. However, a particular type is missing: pointers. We do have pointers to characters (“strings”), but not other types of pointers (i.e. void *, since we can't realistically box every possible pointer).

This is a design choice. Strings are a special kind of pointer: the content being pointed to is supposed to be readable by humans, so it will not contain strange bytes and (bugs aside) is always terminated by a NUL. These characteristics allow us to treat strings in a certain way, e.g. by copying them before storing them. The latter in particular means we can deallocate them as we please whenever we're done with them, without affecting the outside world (and vice versa).

We can't do that with generic pointers. The major issue is that in some cases it might be impossible to copy the pointed content. For example, how do we copy a FILE *? In this case, it's not just a matter of using memcpy (or asprintf, or…).

Since we can't copy a FILE *, storing it as-is inside an object would mean that someone outside could invalidate that pointer in one way or another, making our object useless if not dangerous.

To avoid these kinds of subtle bugs, generic pointers can't be stored inside an object. Users will have to manage them “the old way” (or work some casting magic and treat them as integers, but that's a different kind of problem.)

Private Properties, or Hiding Information Without Writing Code

As introduced earlier, properties can't be made private. An object always exposes all its properties and code can change them any time, which is something that can cause bugs if some piece of code doesn't behave properly.

Does this mean all we can do is hope for the best? Not quite. Unlike other prototype-based languages, CWP lacks a “for each” (or “map”, or…) method, that is, there is no built-in way to enumerate the properties of an object.

Why is this important? Because properties are stored in a vector that is not accessible from outside. Since it's not accessible, users can't look at its content without examining the memory layout of the whole object (e.g. by casting it to a uint8_t *, then using pointer arithmetic to get the pointer to the vector), which isn't a very reliable method, and a faux pas can even corrupt the whole thing.

Thanks to this “missing feature”, the only way a user can know which properties an object has is reading the object's documentation. Consider this example:

cwp_object_t Point(int x, int y) {
     cwp_object_t o;
     cwp_object_t ox, oy;
     cwp_object_t s;

     cwp_call(NULL, cwpObject, "create", &o);
     cwp_call(NULL, cwpInt, "create", &ox, x);
     cwp_call(NULL, cwpInt, "create", &oy, y);
     cwp_set(NULL, o, "x", ox);
     cwp_set(NULL, o, "y", oy);

     cwp_call(NULL, cwpString, "create", &s, "This is private!");
     cwp_set(NULL, o, "private", s);

     return o;
}
	  

Which is documented by this piece of text:

Point
-----
Constructors:
     Point(int x, int y)
          Create a new Point object at position (X, Y).

Properties:
     x (int)
          The X coordinate of the Point object.

     y (int)
          The Y coordinate of the Point object.

Methods:
	  

The “private” property is not part of the documentation; thus, a user who doesn't read the source code of the Point prototype can't know about it.

As the old saying goes: what the eye doesn't see, the heart doesn't grieve over. If the documentation doesn't mention a property (or even a method), that property (or method) doesn't exist, and if a user somehow uses it, it's a bug in the user's code.

I admit this method is not the classic encapsulation we're all used to, but the expected result is the same: preventing users from accessing certain properties or methods. The details are unimportant.

By relying entirely on the documentation, a “for each” loop might look like this:

char *names[] = { "prop1", "prop2" };
size_t nnames = 2;
for (size_t i=0; i<nnames; ++i) {
     cwp_get(&error, o, names[i], &r);

     /* Rest of the code here */
}
	  

This works because the documentation tells us that o has two properties, called prop1 and prop2.

This approach also has the added bonus of pushing developers to keep the documentation up to date. Sometimes, applications or libraries don't really update their documentation, or even leave it unfinished, for one reason or another. This is common with tools like Doxygen that parse comments: some functions or data types might lack such a comment or have a very brief one, which isn't really helpful. When documentation plays a role as important as it does in CWP, failing to keep it complete and up to date means users will be unable to use the prototypes at all, whether built-in or user-defined.

Performance and Uses

Even with just a cursory examination, it's obvious that CWP is slower and uses more resources than class-based approaches to “object-oriented C”.

However, unlike them, it doesn't require anything special before it can be used: since classes are static entities, changing just one of them can break the whole program (e.g. because the linker can't find a symbol anymore when linking .o files). CWP just needs to be #included and it will always work. Of course this doesn't mean it's automatically better, but it's certainly a plus.

Given its performance, is it really good for general usage? Certainly, it doesn't make sense to use it to build common data structures. If you need a linked list, it's better to use the usual struct list with raw pointers everyone is familiar with. This isn't Java, you don't need objects for everything.

However, there are some use cases in which objects work better than plain structs. The most obvious example is graphical user interfaces, which have been modeled as sets of objects communicating with each other through events ever since Smalltalk. Thanks to the dynamic nature of CWP, it's easier to specialize certain generic widgets: for example, a button instance can change its behaviour according to some state just by calling cwp_method at some point. With class-based interfaces this operation is often not possible.
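The kind of per-instance specialization described above can be illustrated in plain C with function pointers. This is not the CWP API: struct button, click_fn, button_set_click and the two handlers are hypothetical. cwp_method does the equivalent by name, through the object's method vector.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical widget whose click behaviour lives in the instance. */
struct button;
typedef const char *(*click_fn)(struct button *self);

struct button {
    const char *label;
    click_fn on_click;
};

static const char *default_click(struct button *self)
{
    (void)self;
    return "submitted";
}

static const char *locked_click(struct button *self)
{
    (void)self;
    return "form is locked";
}

/* Replace the handler of one instance only; other buttons are untouched. */
static void button_set_click(struct button *b, click_fn fn)
{
    b->on_click = fn;
}
```

Only the button whose handler was replaced changes behaviour; a class-based design would typically need a new subclass to achieve the same.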

Conclusion

A complete working (and commented) implementation is available on my GitLab page.

It's especially useful since this article omitted the code in some parts.