Automatic Differentiation with Computational Graph, Part I

Computational graph is a directed graph used to represent a sequence of operations to some input data, until its eventual output. This sequence of operations could be any form of transformations or manipulation to the input data. In this article we will restrict ourselves to those operations that involve mathematical expressions.

The key elements of a computational graph are;

a node and
an edge

Since the graph is used to represent data flow along a sequence of operations, I will expand the concept of a computational graph to also incorporate an additional element; rules. The significance of this addition will become apparent as we proceed. Thus, for our purpose a computational graph will be composed of the following elements;

Node,
Edge and
Rules

A Node

A node will be either;

A round circle, representing either an input, an output or both. Its designation shall depend on the edges from the node.
A square box, representing an operation on an input. We shall refer to such a node as an operator node.

An Edge

This is directed arrow which indicates the direction of flow of data. Its direction will specify whats a input or output.

If the arrow originates from an node we shall refer to such an edge as an output edge and the node as the input node.
Conversely, if it originates from a node we shall refer to such an edge as an input edge, and the node as the output node.

Rules

Rules will guide us on breaking down our expressions and assist us in drawing the graph.

The rules are as follows;

An input node will have at least 1 output edge, conversely an output node will have at most 1 input edge.
An operator node will have at least 1 and at most 2 input edges.
An operator node will have at most 1 output edge.
An operator node will have at most 1 output node.
Between an input and output node, there should be at least 1 operator node.
An edge shall not be dangling. That is, it should not be pointing to an empty space.

Lets look at a few examples;

  graph LR
    A@{shape: rect, label: "A"};

According to rule 2 this is not a valid graph.

  graph LR
    A@{shape: rect, label: "A"} --> B@{shape: rect, label: "B"};

According to rule 2 and 3 this is not be a valid graph.

  graph LR
    A@{shape: circ, label: "A"} --> B@{shape: circ, label: "B"};

According to rule 5 this is not be a valid graph.

  graph LR
    A@{shape: circ, label: "A"};

According to rule 1 and 6 this is not a valid graph.

  graph LR
    A@{shape: circ, label: "A"} --> B@{shape: rect, label: "B"} --> C@{shape: circ, label: "C"};

This is a valid graph.

Transforming expressions into a computational graph

Having met the elements of a computational graph we are now in a position to apply them to some expressions. The steps we shall follow are:

Identify the inputs.
Identify the outputs.
Identify the operations on the inputs to produce the outputs.
Apply rules to steps 1, 2 and 3 in order to generate nodes and edges.

Example 1:

Consider the expression, $\displaystyle y=x+1$. The steps we would follow are;

Identify the inputs;

In this case $\displaystyle x$ and $\displaystyle 1$.

graph LR
  A@{shape: circle, label: "1"}
  B@{shape: circle, label: "x"}

Identify the outputs;
In this case $\displaystyle y$
```
  graph LR
  B@{shape: circle, label: "y"}
```
Identify operations on the inputs;
In this case only one, which is $\displaystyle +$
```
  graph LR
  B@{shape: rect, label: "&plus;"}
```

Using edges to join our inputs, operations and outputs we get;

graph LR
  A@{shape: circle, label: "1"} --> C
  B@{shape: circ, label: "x"} --> C
  C@{shape: rect, label: "&plus;"} --> D@{shape: circ, label: "y"};

Example 2:

Convert the following expression to a computational graph; $\displaystyle f = (x^2+y^3)^\frac{1}{2} + \sin(x^2+e^y)$

solution:

Identify inputs: $\displaystyle x$ and $\displaystyle y$

2 Identify outputs: $\displaystyle f$

Identify operators: $\displaystyle +,\quad (.)^\frac{1}{2},\quad (.)^2,\quad (.)^3,\quad \sin(.),\quad e^.$
Apply rules and draw the graph

graph LR
  A@{shape: circ, label: 'x'} --> B@{shape: rect, label: "(.)^2"} --> C@{shape: circ, label: "A"}
  D@{shape: circ, label: 'y'} --> E@{shape: rect, label: "(.)^3"} --> F@{shape: circ, label: "B"}
  D --> G@{shape: rect, label: "e^(.)"} --> H@{shape: circ, label: "C"}
  C --> J@{shape: rect, label: "&plus;"} --> K@{shape: circ, label: "D"}
  F --> J
  K --> L@{shape: rect, label: "(.)^1/2"} --> M@{shape: circ, label: "E"}
  C --> N@{shape: rect, label: "&plus;"} --> P@{shape: circ, label: "F"}
  H --> N
  P --> Q@{shape: rect, label: "sin(.)"} --> R@{shape: circ, label: "G"}
  R --> S@{shape: rect, label: "&plus;"} --> T@{shape: circ, label: "H"}
  M --> S

From the graph we can now also generate its sub-expressions starting from the right;

$$ H = E + G$$$$ E = D^\frac{1}{2}$$$$ G = \sin(F)$$$$ F = A + C $$$$ D = A + B $$$$ A = x^2 $$$$ B = y^3 $$$$ C = e^y $$

Conclusion

The incorporation of rules to a computational graph significantly helps in simplifying and clarifying the process of breaking down long expressions and generating sub-expressions. This shall prove to be very useful in automatic differentiation as discussed on part II of this article;

The pros of this technique are:

Errors can quickly be Identified.
Its easy to generate sub-expressions.

while its cons are:

For long expression the graph can quickly become large.

Exercise:

Generate the computational graph and sub-expressions for the following expressions;

$\displaystyle f=d-g(a*b + c)$ where g(.) is some arbitrary function.
$\displaystyle f=L(y,\ g(w*x+b))$ where L(., .) and g(.) are some arbitrary functions.

A Node#

An Edge#

Rules#

Transforming expressions into a computational graph#

Example 1:#

Example 2:#

Conclusion#

Exercise:#