Executing JSON

Introduction

As the old adage goes, “Code is data and data is code”. This article is just a fun experiment on the matter.

The code presented here has quite a lot of poor design choices, but it's meant to be simple and easy to understand (as much as possible) while still doing its job. As such, please don't mind too much the lack of error checking and how some stuff isn't particularily great!

Getting started

The goal of this article is to show that it is possible to execute JSON, that is, writing a program in that language and feed the program to an interpreter. The interpreter will then return the result of the computation.

An interpreter is essentially made of three components: a reader, an evaluator and a printer.

The reader component's job is to get the input from the user and transform it so that the next component can work on it.

The evaluator is the actual workhorse of the interpreter: it examines the input received from the reader, manipulates the environment (e.g. some variables are defined) and returns results from the various computations.

The printer is simply a component that shows the result to the user. In interactive environments (say, a shell like bash) the value is shown immediately, but in a different place might write it to a specific file. Regardless, this component doesn't really change the input but merely gives it to the user in a suitable manner.

Our JSON interpreter will be structured like that too. The implementation language will be Javascript, because it integrates well with JSON and because it's a fairly flexible language, meaning developing a prototype is faster.

Input from the user

Before we can read anything from the user, we have to decide how the input is structured. Naturally, being JSON means that the syntax is already decided, but we still have to decide how the program is executed. For example, in C execution always starts from a function called “main” while in Javascript expressions are evaluated immediately when the interpreter meets them, meaning a statement like console.log(1) will be executed immediately unless it's inside the body of a function.

This interpreter will do the same, so the input must be in a format that allows iterating over the various statements. We'll use an array for this: besides being the iterable collection, it is also valid JSON by itself, meaning we can use JSON syntax to structure a JSON program.

Thus, executing the interpreter would be something like

interpreter([exp1, exp2, exp3, exp4]);

We can now write the skeleton of our interpreter:

function interpreter(input) {
  let ret = null;

  for (let i=0; i<input.length; ++i) {
    // TODO
  }

  return ret;
}

The default return value is null because the interpreter will always return valid JSON, and null is valid JSON for “there is no data”.

The reader and the printer

Reading input is quite simple:

function jread(v) {
  let ret = null;

  try {
    ret = JSON.parse(v);
  } catch (e) {
    ret = v;
  }

  return ret;
}

Because we expect to work with JSON, the reader will first try to parse the input to obtain some actual JSON. If the parsing fails, we pass the input as-is, because it might be a normal string and that's still valid data.

Like the reader, the printer is also quite simple:

function jprint(v) {
  return v;
}

For the purpose of this article it is assumed that the interpreter will be called from a command line (e.g. the developer console of the browser), so the printer will simply return the value to the command line, letting the command line itself handle the value.

Now that we have these two components, we can add them to the interpreter, inside the for:

ret = jread(input[i]);
// TODO: evaluate ret
ret = jprint(ret);

Evaluating expressions

As said earlier, the purpose of the reader is to transform the data taken from the user in a suitable format that the evaluator can compute. We chose to transform the JSON defined by the user into valid JSON, so our evaluator will examine the content of the given JSON element to understand what to do.

In the simplest case, an element that is not an object (i.e. is not enclosed in curly braces) is a “primitive” value (or “atom”) and evaluates to itself, so our evaluator will return the input value as-is if it's an atom:

function jeval(v) {
  if (typeof v !== 'object' || v instanceof Array) {
    return v;
  }

  // Other actions
}

When the value v is an object, jeval will treat is as a “complex” expression; the evaluator will examine it and will try to compute an atomic value out of it.

Because complex expressions are objects, jeval needs a way to differentiate the intented purpose of the expression. A very simple case is defining a variable versus using it. This is done by using a specific keyword: for example in Javascript a variable is defined with var or let, while in C it's defined by specifying a type before the variable name.

We're going to do the same by reading the type property of the object being examined. This property will contain all the informations needed to describe the purpose of the expression, and will then not be limited to just variables.

Thus, jeval can be expanded with a simple switch statement:

let ev = null; // The result of the evaluation
switch (v.type) {
  // Accepted types here
  break;
default:
  ev = null;
  break;
}
return ev;

Type tags

“Type tags” are the accepted values of the type property. Here we're going to define only the basic ones, but the list can be expanded as needed.

To define a variable, the type tag will be the actual type of the stored value. Even though the tag could've just been something like “var”, a properly written interpreter might want to add type-checking during evaluation, especially since it's fairly cheap in this case.

The types accepted by the interpreter are: "num", to define a number; "str" to define a string; "vec" for arrays. Objects are not included in the list, because their semantics are not well defined (e.g. should the fields of an object be evalued? Some implementations might say “yes”, while other might say “no”.) Dealing with objects would then complicate the whole thing and it's beyond the scope of the article.

Expressions tagged with those type tags will then have two more properties: name will assign a name to the variable; if missing the value will disappear once the computation is complete, unless used by another expression. value will be the actual value assigned to the variable. It can be any valid JSON as long as the type match after being evaluated.

To obtain the value stored inside a variable, an expression tagged "var" is used. This expression will simply have a name property contaning the name of the variable.

Since functions are also complex expressions, the interpreter will treat them as they were variable definitions introduced by the "fn" tag. Unlike simple variables, functions are defined “in two steps”: the first step is specifying the "fn" tag, the other is using a special object as the variable's value. This object will have two properties: args is an array of strings, in which each element is an argument accepted by the function; body is an array of JSON elements, representing the body of the function, so each element within will be evaluated by the interpreter.

Since we defined functions, we need a way to execute them; as such, a "call"-tagged expression will be used. This expression will have a name property indicating the name of the function to call (as a string) and a args property, an array listing the values for each argument passed to the function.

The environment

Since we have a way to store values inside variables, we need a place to store variables so that we can obtain them later. This place is the environment and it will be a simple object (or rather, a dictionary): when a variable is defined the name will be used as a key to find the stored value.

This object will be initialized by the interpreter and it will be passed to jeval, so the function signature is now:

function intepreter(v, env) {
  // etc
}

This environment will be manipulated by two functions: one to look up a key called jlook and the other to update the value stored at the specified key, jupdate.

The environment can be manipulated directly, but as we'll see later we'll then have to change how the environment is used, so let's use functions right away.

function jlook(key, env) {
  return (env[key]) ? env[key] : null;
}

function jupdate(key, v, env) {
  env[key] = v;
  return v;
}

Now that we have our environment, we can finally store our variables. Thus, the switch in jeval can be extended like so:

case 'num':
case 'str':
case 'vec':
  ev = jeval(v.value, env);
  if (v.name) {
    jupdate(v.name, ev, env);
  }
  break;

Obtaining the value from a variable is quite straightforward:

case 'var':
  ev = jlook(v.name, env);
  if (typeof ev === 'object') {
    return '<function ' + v.name + '>';
  }
  break;

Even though functions are also values stored inside variables, they are considered a special case (at least in this implementation), so when "var" is requested on a function value, a special string is returned instead of the actual object. As we'll see later, the object stored for functions is considered invalid by the interpreter, so without this special handling things would break in some cases.

Functions

Functions are special objects: they don't have a type tag, but they have properties storing their argument list and body. Due to this, they have to be treated in a special way.

Also, the purpose of functions is to define expressions that can be executed any time multiple times, so we also have to consider how to handle the variables used inside: if variables are defined while the function is executed there aren't particular problems (as long as the variable is defined before being used), but if a variable is defined outside then there must be a way to obtain the value stored there.

There are two way to do this, dynamically and statically. The differences between the two are outside the scope of the article. For our interpreter, we're going to use the second method (also known as static scope.)

When a function is defined (type tag "fn"), the interpreter will call the jfn function, whose purpose is to define an object with enough informations so that expression tagged "call" can execute it.

function jfn(v, env) {
  let f = {};

  f.env = {__parent: env};
  f.args = v.args;
  f.body = v.body;

  return f;
}

The new env property is used to find non-local variables; in particular, if f.env does not have a value associated with the variable (f.env[key] is undefined), then the __parent environment is examined. This is repeated until a value is found or there are no more parent environment.

Since functions can also change the value of a non-local variable, updating the environment follows the same procedure: environments are examined until the one with the variable to be updated is found or there are no more parents.

function jlook(key, env) {
  let e = env;
  let r = undefined;

  while (r === undefined && e !== undefined) {
    r = e[name];
    e = e.__parent;
  }

  return r;
}

function jupdate(key, v, env) {
  let e = env;
  let r = undefined;

  while (r === undefined && e !== undefined) {
    r = e[name];
    e = e.__parent;
  }

  e = (e === undefined) ? env : e;
  e[name] = val;

  return e;
}

Now that functions are stored in the environment, we can execute them with jcall:

function jcall(f, args, env) {
  let e = {__parent: f.env};

  for (let i=0; i<f.args.length; ++i) {
    e[f.args[i]] = jeval(args[i], env);
  }

  let ret = null;
  for (let i=0; i<f.body.length; ++i) {
    ret = jeval(f.body[i], e);
  }

  return ret;
}

A new environment is created for the purpose of storing the values associated with the function arguments. Then the expressions within the body are simply evaluated using the new environment.

Now that we have these two function, extending the switch in jeval is trivial:

case 'fn':
  ev = jfn(v.value, env);
  if (v.name) {
    jupdate(v.name, ev, env);
  }
  break;
case 'call':
  ev = jcall(jlook(v.name, env), v.args, env);
  break;

Conditionals and loops

It's hard to make significant programs without at least conditional expressions. Loops can be substituted with recursive functions, but they are not always the best choice, either for limitations in the implementation (e.g. the limit on nested functions is too low) or simply because they are slower than explicit loops.

A conditional expression will be tagged with "if" and will contain three properties: cond is the condtion to check; then is an array containing expressions to evaluate if cond is truthy; else is an array with expressions to evaluate if cond is falsy.

What constitute “truth” and “falsity” is fairly arbitrary: it can be two special values like the strings "true" and "false" or it might be that only one value is false and the rest is true. For simplicty this implementation will use the second approach, treating any non-null values as true and null as false.

The implementation is just what one would expect (the switch case is omitted):

function jif(cond, then, other, env) {
  let c = jeval(cond, env);
  return (c === null) ? jeval(other, env) : jeval(then, env);
}

Regarding loops, we're going to use the “while” semantics, that is, as long as the condition is true the body will be executed. Extending this semantics to “for” isn't too hard, but is not necessary for this article.

function jloop(cond, body, env) {
  let r = null;

  while (jeval(cond, env) !== null) {
    for (let i=0; i<body.length; ++i) {
      r = jeval(body[i], env);
    }
  }

  return r;
}

Arithmetic and logic

The subset of JSON defined so far doesn't have any way to specify actions like “increase the value of this variable by 5” or “check that both these expression are null”. That's why the interpreter must define these operations within itself (“primitive operations”) and expose them to the user somehow.

With the interpreter defined so far, there are two ways to expose these operations: make them type tags or make them special function objects. The first approach makes sure that the operation will always be the same even if the user defines a function with the same name, but on the other hand it will be impossible to use them as arguments to higer order functions; the other approach is the opposite: since they are functions they can be used as arguments in function calls, at the cost of letting the user redefine them.

The chosen method is not really important as long as it's documented, so this article will only show how to implement the operations.

Since we can use arrays as values (arrays are valid JSON), there's no need to limit these operations to two values (unless it does not make sense to have more than a certain number of arguments. For example, the not operation applies only to one argument.)

The sum and the multiplications are very easy and similar with each other:

let r = 0; // use 1 for multiplication
for (let i=0; i<args.length; ++i) {
  r = r + jeval(args[i], env); // use * for multiplication
}
// r will have the final value

Subtraction and division are more difficult, because their semantics change depending on how many arguments are passed to them. With only one argument, subtraction will negate the value, so -(1) will return -1; division will instead calculate 1/n, so /(3) will give 1/3. More than one argument means applying the operation each two arguments, so -(1, 2, 3, 4) means ((1 - 2) - 3) - 4 and /(1, 2, 3, 4) means ((1 / 2) / 3) / 4.

// subtraction
if (args.length === 1) {
  r = -(jeval(args[0], env);
} else {
  r = jeval(args[0], env);
  for (let i=1; i<args.length; ++i) {
    r = r - jeval(args[i], env);
  }
}

// division
if (args.length === 1) {
  r = 1 / jeval(args[0], env);
} else {
  r = jeval(args[0], env);
  for (let i=1; i<args.length; ++i) {
    r = r / jeval(args[i], env);
  }
}

Even though they are not strictly arithmetic operations, since we are dealing with numbers let's also define comparison operations, that is, “less than”, “greater than”, etc.

Since arguments are passed in as an array, the operation will check that the elements inside the array are ordered according to the operation. If they are not, null will be returned immediately as the “unordered” value is found. If they are, a truthy value will be returned; since any non-null value is fine, for the sake of being explanatory this implementation will return the string "true".

let m = jeval(args[0], env);
for (let i=1; i<args.length; ++i) {
  let n = jeval(args[i], env);
  if (m <= n) { // use >=, > or < accordingly for other operations
    return null;
  }
}
return 'true';

Logical operations are also fairly straightforward: && will evaluate the arguments as long as they are non-null, || will evaluate the arguments as long as they are null while not will change a non-null value into null and null to "true":

// &&
let c = jeval(args[0], env);
for (let i=1; i<args.length && c!==null; ++i) {
  c = jeval(args[i], env);
}

// ||
c = jeval(args[0], env);
for (let i=1; i<args.length && c===null; ++i) {
  c = jeval(args[i], env);
}

// not
c = (jeval(args[0], env) === null) ? 'true' : null;

There are actually more primitive operations than arithmetic, logic and numerical ordering (especially with array and strings as primitive values), but they are outside the scope of this article.

A micro example

Now that the interpreter is finished, we can finally write a simple program. And what's simpler than “Hello World!”?

let helloworld = [
  {type: 'str',
   name: 'hello',
   value: 'Hello World!'},
  {type: 'fn',
   name: 'greet',
   value: {args: [],
           body: [{type: 'var',
                   name: 'hello'}]}},
  {type: 'call',
   name: 'greet',
   args: []},
];

console.info(interpreter(helloworld));

Epilogue

The subtitle of this article is “With a Lithp”.

Also check out this other article writtent ages ago.