Implementing Linked Lists with Contiguous Arrays

1 Introduction

The linked list is a well-known data structure, with a straightforward implementation that is a common exercise for students and interviewees alike. The advantage of a linked list is the \mathcal O(1) cost of insertion and erasure of elements at arbitrary positions, which does not require the existing data elements to be moved around. It is also useful for swapping elements, or splicing two lists together, which can be done purely by modifying the links.

The problem is that the traversal of a linked list is deeply offensive to modern hardware, and it’s difficult to avoid traversal when implementing a higher-level algorithm. Even if the use case primarily concerns insertion/erasure, there is often some traversal involved to get to the required position of insertion/erasure.

Pointer chasing is one of the best ways to destroy performance on modern hardware, and the linked list is the king of pointer chasing. This has been highlighted in recent years as the importance of cache-friendliness has become more publicized, and it has been the subject of a number of recent conference talks.

The traversal of a linked list is so disastrous that its cost can overwhelm the linked list’s benefits, to such an extent that a vector ends up being the better choice, despite its expensive insert/erase. This is particularly the case with conventional implementations, such as std::list, that do separate allocations for each node. This has the cost of an allocation/deallocation for every insertion/erasure, and potentially scatters the nodes wildly over memory.

An alternate implementation is the intrusive linked list, where the allocation is outsourced to the user, who can preallocate the nodes in a contiguous array and then link/unlink them at their leisure. Allocating the nodes in a contiguous array improves the situation somewhat, but the link pointers (prev, next) can still be spread out over a large area as they are separated by the data elements:

prev next [---data---] prev next [---data---] prev next [---data---] ...

When traversing the list, we don’t necessarily need to access the whole of each data element; we primarily wish to follow the trail of pointers. We will refer to traversal that doesn’t touch the values as pure traversal. This memory layout is inefficient for pure traversal, as a significant portion of each cache line is filled with irrelevant data, which wastes memory bandwidth and increases the chance of cache misses. This gets worse with larger data types.

2 Reinventing the Linked List

My objective was to create a non-standard linked list that attempts to mitigate the cost of pure traversal while still retaining the \mathcal O(1) insert/erase benefit, and then to see how that affects the performance of various algorithms. To do this, I began by separating the storage of the values from the nodes that describe the linked structure.

struct list {
    struct node {
        index_type prev, next;      // indexes into the nodes/values arrays
    };
    std::vector<value_type> values; // the data elements
    std::vector<node> nodes;        // the linked structure; values[i] belongs to nodes[i]
    index_type head, tail;          // indexes of the first and last nodes
};

(templates omitted)

Both the nodes and the values are stored in their own contiguous arrays. This allows indexes to be used instead of pointers, which in turn allows small unsigned integers (such as uint16_t) to be used instead of the full 64 bits of a pointer. The idea here is to make the array of nodes as compact as possible.

[---data---] [---data---] [---data---] [---data---] [---data---] ...
prev next prev next prev next prev next prev next ...

The association between a node and its value is implicit — the value associated with nodes[i] is values[i]. There is no explicit pointer-to-value. Each index identifies both a node and a value, and this design allows us to pack as many of these indexes into cache as possible.
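
To make the saving concrete, here is a minimal sketch comparing a 16-bit-index node with a conventional pointer-based node (the type names are illustrative, not from the actual implementation):

#include <cstdint>

// A node with 16-bit indexes occupies 4 bytes, so sixteen of them fit in a
// single 64-byte cache line, versus only four of the 16-byte pointer pairs
// a conventional node needs on a 64-bit platform.
struct node16   { std::uint16_t prev, next; };
struct node_ptr { void* prev; void* next; };

static_assert( sizeof(node16)   == 4,  "packed 4-byte node" );
static_assert( sizeof(node_ptr) == 16, "two 64-bit pointers (on x64)" );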

The contiguous property of the two vectors is an invariant of the list. The insertion and erasure operations are designed to preserve it:

  • Insertion pushes a node and value to the back of their respective vectors, which is amortized \mathcal O(1). The links are updated as usual.
  • Erasure moves the back node/value to the index of the erased node/value and then pops both vectors. The links need to be updated to reflect this move, and it costs a move of one node and one value, but there is no deallocation and it is \mathcal O(1) (see the sketch below).
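
Here is a minimal, self-contained sketch of the swap-with-back erase, assuming the member layout from the struct above; the helper logic and names are illustrative, not the actual implementation:

#include <cstdint>
#include <utility>
#include <vector>

template<typename T, typename Index = std::uint16_t>
struct mini_list {
    struct node { Index prev, next; };
    static constexpr Index npos = Index(-1); // "no node" sentinel

    std::vector<T>    values;
    std::vector<node> nodes;
    Index head = npos, tail = npos;

    void erase( Index i ) {
        // 1. Splice node i out of the chain.
        const node n = nodes[i];
        ( n.prev == npos ? head : nodes[n.prev].next ) = n.next;
        ( n.next == npos ? tail : nodes[n.next].prev ) = n.prev;

        // 2. Move the back node/value into slot i and re-point its neighbours.
        const Index last = Index( nodes.size() - 1 );
        if( i != last ) {
            values[i] = std::move( values[last] );
            nodes[i]  = nodes[last];
            ( nodes[i].prev == npos ? head : nodes[nodes[i].prev].next ) = i;
            ( nodes[i].next == npos ? tail : nodes[nodes[i].next].prev ) = i;
        }

        // 3. Pop both vectors: no deallocation, O(1).
        values.pop_back();
        nodes.pop_back();
    }
};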

With the basic design established, I then fleshed out the functionality, following the interface of std::list so that it can be used as a drop-in replacement. The implementation is on GitHub: https://github.com/cwzx/cwds.

2.1 Benefits

There are several benefits to this approach that can be noted:

  • As the implementation is backed by vectors, additional functionality not found in std::list can be provided, such as .reserve(). Being able to preallocate the memory is a big win for performance.
  • No allocation on insert, apart from when the vectors need to grow to accommodate more elements. This can be avoided by reserving capacity upfront.
  • No deallocation on erase.
  • As the values are separate from the linking machinery, we can implement some algorithms directly on the vector of values, bypassing the linked structure entirely (see the sketch after this list). This is possible when the algorithm accesses all of the elements and doesn’t depend on the order of the elements. Examples:
    • .remove()
    • .remove_if()
    • .unique()

    And when applied to the whole list, these STL algorithms can also benefit in this way:

    • std::accumulate
    • std::any_of
    • std::all_of
    • std::none_of
    • std::count
    • std::count_if
    • std::fill
    • std::replace
    • std::replace_if

    These can be given the full cache/prefetcher benefits of linear traversal over contiguous arrays.

  • Some operations do not touch the values at all; they just rewire the links. For example, .reverse(). These operations can be implemented directly on the vector of nodes, bypassing the values entirely.
  • The list can be constructed from, or assigned, an existing vector of values, by either move or copy.
  • The list can be copied/moved without needing to update any links.
  • Without any explicit pointers to worry about, the list can be easily serialized.
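
As a minimal illustration of the bypass mentioned above (not cw::list’s actual code), an order-independent, whole-list algorithm can run straight over the contiguous values:

#include <algorithm>
#include <cstddef>
#include <vector>

// Count matching elements by scanning the contiguous values array directly.
// The links are irrelevant because the result doesn't depend on list order,
// so we get a linear, prefetcher-friendly scan instead of pointer chasing.
template<typename T, typename Pred>
std::size_t count_if_values( const std::vector<T>& values, Pred pred ) {
    return std::size_t( std::count_if( values.begin(), values.end(), pred ) );
}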

2.2 Disadvantages

  • Due to the separation of nodes from values, swapping two lists invalidates all iterators to both lists.
  • Both .merge() and .splice() require allocation and moving the values across, as we require contiguous data.
  • Erasing an element invalidates iterators, as the (vector) back is moved to the erased index, and the moved element could have any position in the list.

3 Benchmarks

Benchmarks were performed to compare cw::list with std::list. Tests were performed for different numbers of elements, and for different sizes of the value type. For cw::list, the smallest possible unsigned integer type was used for the index type depending on the number of elements tested. Each test was repeated multiple times.

The following compiler and settings were used:

  • Visual Studio 2015 Preview
  • x64 Release
  • AVX2 instructions enabled

A custom clock was used in place of std::chrono::high_resolution_clock, built on Windows’ actual high-resolution timer (QueryPerformanceCounter).

The following hardware was used:

  • Intel i5-4670K @ 4.4GHz (Haswell)
  • 16GB DDR3 1600MHz CAS-9

Three list creation strategies were tested:

  • Back — insert each value at the back.
  • Mid — insert each value at the midpoint.
  • Random — insert each value at the front or back, chosen at random with equal probability.

The first strategy produces a linear ordering (i.e. iterating over the list corresponds to linearly traversing the underlying arrays), the other two produce nonlinear orderings.

As we’re interested in comparing the performance of the two list implementations, we will plot the ratio of the time taken by std::list to the time taken by cw::list. Ratios greater than 1 indicate that cw::list is faster.

3.1 Pure Traversal

Pure traversal means iterating over all the elements of the list without touching the values.

For value types between 1 byte and 8 bytes we see the same pattern:

[Figure: pure traversal, value types of 1–8 bytes]

Going beyond 8 bytes, the advantage cw::list has over std::list at large list sizes grows:

[Figure: pure traversal, 1024-byte values]

3.1.1 Large list improvements

Value Type Size (bytes)   Improvement Factor (linear)   Improvement Factor (nonlinear)   List Sizes
1–8                       1.6                           3.2                              >300 000
16                        2.3                           4.5                              >250 000
32                        3.2                           7.5                              >200 000
64                        4.75                          14                               >100 000
128                       15                            20                               >120 000
1024                      25                            40                               >120 000


3.2 Accumulate

Now we test traversal where every value is accessed. std::accumulate sums the values of the list. As the objective here is to test traversal of the linked structure, we use the standard implementation of accumulate, rather than cw::list's optimized version that bypasses the linked structure.

[Figures: accumulate benchmark for value sizes of 8, 16, 32, 64, 128 and 1024 bytes]

As with the previous benchmark, we see improvements for large lists, with benefits increasing with value size, although to a lesser degree this time. The nonlinear traversals see the greatest improvement, with the random pattern tending to see the most benefit.

This is good news, as we would expect a list in the wild to be in a nonlinear, pseudo-random configuration, particularly after prolonged use under many insertions/erasures/swaps/etc.

3.2.1 Large list improvements

Value Type Size (bytes)   Improvement Factor (linear)   Improvement Factor (nonlinear)   List Sizes
1–8                       1.2                           1.8                              >300 000
16                        1.8                           2.25                             >200 000
32                        1.7                           2.15                             >750
64                        1.45                          2.8                              >2500
128                       1.9                           3.9                              >2500
1024                      3.6                           6.2                              >70 000


3.3 Insert Random Values Sorted

This benchmark creates a list by inserting random values such that the list remains sorted. This tests both traversal (to find the insertion point) and insertion. The traversal is performed by std::lower_bound, which is a binary search. Note that a binary search uses pure traversal to navigate the lists (via std::advance). We can also compare the lists to std::vector, for which the binary search uses direct indexing.

For this benchmark, the maximum list sizes tested are significantly smaller than in the previous two benchmarks, as the total cost grows quadratically for every container: each insertion costs \mathcal O(n) iterator steps for the lists (std::lower_bound must advance linearly over non-random-access iterators) and \mathcal O(n) element moves for the vector.

First we can compare the two list implementations:

[Figure: cw::list vs std::list, sorted insertion of random values]

Again, cw::list performs better for large list sizes, for all value sizes.

We can also compare both std::list and cw::list with std::vector. The vector wins for small value types, but the lists win for large value types:

[Figure: cw::list vs std::vector, sorted insertion of random values]

For cw::list, the performance advantage over std::vector begins at 256-byte values.

[Figure: std::list vs std::vector, sorted insertion of random values]

For std::list, the performance advantage over std::vector also begins at 256-byte values, but it appears to decay somewhat with increasing list sizes.

The performance of std::vector on this task declines significantly as the value size increases due to the cost of insertion, which must shift large amounts of data proportional to the value size.

For std::list, performance declines slightly with value size, due either to the allocations or to the cost of traversal: for large value sizes, every node visited pulls in an entire cache line even though the value itself is never touched.

However, the performance of cw::list is almost completely independent of value size, as the dominant factor is the (pure) traversal required by the binary search, which is independent of value size.

4 Conclusion

Contiguous arrays are great for performance, but the requirements of a linked list mean we can’t traverse these linearly. However, we can still gain some benefit by mitigating the effect of pointer-chasing by separating the nodes from the values and making them as compact as possible.

The benchmarks show that this approach is a win over std::list for large list sizes and the benefit increases with value size.

However, for applications involving a combination of traversal and insertion, the vector still wins for small value types (128 bytes or less in this example), which appears to be true for all list sizes.

So unless you have large value types, the standing advice remains: just use a vector.

From Torque to Angular Acceleration… Correctly.

Introduction

I’ve been seeing the relationship between torque and angular acceleration stated as

\alpha = I^{-1}\tau.

Although this has a nice analogy with the force, mass and acceleration of translational dynamics (a=m^{-1} F), it is actually incorrect. There is an extra term missing from the right-hand side. The correct version is

\alpha = I^{-1}(\tau-\omega\times L).

This extra \omega\times L term is sometimes called the gyroscopic term. It is non-zero when the angular velocity and angular momentum have different directions. Unfortunately there are some lecture notes, slides and articles that make no mention of this extra term, nor do they explain why it has been omitted.

This is particularly the case with material aimed at the development of real-time simulations of rigid-body dynamics, such as game development.

Looking at open source physics libraries, you most likely won’t find this missing term there either, so where has it gone?

The reason for the omission is that the extra term causes instability in numerical simulations with general-purpose solvers. There’s no physical reason to omit it — indeed the extra term has real physical consequences — but where physical accuracy isn’t paramount it is simpler to just remove the term than to employ a special-purpose solver.

This article will derive the torque-angular acceleration relationship in full.

Derivation

From basic rigid-body dynamics we know that the net torque, \tau\in\mathbb R^3, is the time derivative of the angular momentum, L\in\mathbb R^3,

\tau = \dot L.

The moment of inertia tensor, I\in\mathbb R^3\otimes\mathbb R^3\cong\mathbb R^{3\times 3}, is defined as the linear relationship between angular velocity \omega\in\mathbb R^3 and angular momentum,

L = I\omega.

Combining these two equations gives

\tau = \frac{\mathrm d}{\mathrm dt}(I\omega) = I\dot\omega + \dot I\omega.

The key thing to note is that the moment of inertia tensor is a time-dependent quantity — it changes as the body rotates. Only when described in a body-fixed rotating frame is it constant.

Now \dot\omega is just the definition of angular acceleration, \alpha, so this becomes

\tau = I\alpha + \dot I\omega.

The tricky bit is the second term, which involves the time derivative of the inertia tensor. If \tilde I is the constant inertia tensor in the body-fixed frame and R\in\mathrm{SO}(3) is the current rotation matrix for the body (the body-to-world rotation), then

I = R \tilde I R^{\mathrm T}

and

\dot I = \dot R \tilde I R^{\mathrm T} + R \tilde I \dot R^{\mathrm T}.

The rotation matrix evolves due to angular velocity via

\dot R = \omega_{\times} R,

where \omega_{\times}\in\mathfrak{so}(3) is the anti-symmetric cross-product matrix generated by \omega, giving

\dot I = \omega_{\times} I - I\omega_{\times}

and

\dot I\omega = \omega_{\times} I\omega - I\omega_{\times}\omega.

The second term is null,

I\omega_{\times}\omega = I(\omega\times\omega) = 0,

and the remaining term is

\dot I\omega = \omega_{\times} I\omega = \omega\times L.

Therefore the torque is

\tau = I\alpha + \omega\times L

and the angular acceleration is

\alpha = I^{-1}(\tau - \omega\times L).
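
As a concrete illustration, here is a minimal sketch of this formula evaluated in the body-fixed frame, where the inertia tensor is a constant diagonal (I_1,I_2,I_3); the names and types are illustrative, not from any particular physics library:

#include <array>

using Vec3 = std::array<double,3>;

Vec3 cross( const Vec3& a, const Vec3& b ) {
    return { a[1]*b[2] - a[2]*b[1],
             a[2]*b[0] - a[0]*b[2],
             a[0]*b[1] - a[1]*b[0] };
}

// alpha = I^{-1}( tau - omega x L ), with L = I omega. In the body frame the
// diagonal inertia makes the inverse a per-axis division (Euler's equations).
Vec3 angularAccel( const Vec3& tau, const Vec3& omega, const Vec3& I ) {
    const Vec3 L    = { I[0]*omega[0], I[1]*omega[1], I[2]*omega[2] };
    const Vec3 gyro = cross( omega, L ); // the gyroscopic term
    return { ( tau[0] - gyro[0] ) / I[0],
             ( tau[1] - gyro[1] ) / I[1],
             ( tau[2] - gyro[2] ) / I[2] };
}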

The Gyroscopic Term

In the absence of torque the gyroscopic term means the angular acceleration can still be non-zero. This is the cause of a phenomenon called torque-free precession, where the angular velocity vector gradually rotates around the angular momentum vector.

The reason for the gyroscopic term’s mysterious absence from numerical simulations is that it can cause instability when used with general-purpose solvers. A full treatment of this instability is beyond the scope of this short article, but suffice it to say that numerical instability is not acceptable. A special solver may be needed to safely include this term, so as long as a physically accurate result isn’t required, it is simpler to omit it completely. This is a simple and cheap solution to the problem, but it means the simulation isn’t accurate enough for scientific use, as it cannot exhibit phenomena such as torque-free precession. However, for non-scientific applications, such as games, it can still give a physically plausible result without the risk of instability or the cost of a special solver.

Automatically Computing the Derivatives of a Vector Field using Dual Numbers

Dual numbers are 2-dimensional numbers that can be regarded as an extension of the real numbers. Along with the complex and split-complex numbers, they are one of the three 2-dimensional unital associative algebras over the reals.

Each dual number can be described as a linear combination of the real unit 1 and the dual unit \varepsilon. Whereas the complex numbers feature an imaginary unit that squares to -1, the dual unit squares to zero, \varepsilon^2=0. This leads to the following algebraic operations.

We will write a dual number x+\varepsilon y as the pair (x,y).

Addition and subtraction are element-wise,
(x,y) \pm (a,b) = (x\pm a,y\pm b),
with identity (0,0).

Multiplication is
(x,y) (a,b) = (xa,ya+xb),
which is commutative and has identity (1,0).

Division is therefore
\frac{(x,y)}{(a,b)}= (\frac xa,\frac{ya-xb}{a^2}).

From this we can see that
(x,y)^n = (x^n,nx^{n-1} y),\quad n\in\mathbb N_1.

Extending Real Functions to the Dual Plane

We can use this algebraic structure to extend any real analytic function f:\mathbb R\to\mathbb R to the dual plane, \tilde f:\mathbb R^2\to\mathbb R^2 by taking its Taylor series and substituting a dual argument.

f(x) = \sum_{n=0}^\infty \frac{f^{(n)}(a)}{n!} (x-a)^n

\tilde f(x,y) = \sum_{n=0}^\infty \frac{f^{(n)}(a)}{n!} (x-a,y)^n

\tilde f(x,y) = \sum_{n=0}^\infty \frac{f^{(n)}(a)}{n!} \left((x-a)^n,n(x-a)^{n-1} y\right)

\tilde f(x,y) = \left(\sum_{n=0}^\infty \frac{f^{(n)}(a)}{n!}(x-a)^n,y\sum_{n=0}^\infty \frac{f^{(n)}(a)}{n!}n(x-a)^{n-1}\right)

\tilde f(x,y) = (f(x),yf'(x)).

We can use this rule to uniquely extend any differentiable function to the dual plane, or indeed any suitable subset of the real line to the corresponding subset of the dual plane. This extension is compatible with the composition of functions, i.e. the extension of a composition is the composition of the extensions,

\widetilde{f\circ g} = \tilde f\circ\tilde g.

This is useful as it means we can extend complicated functions by composing elementary functions.

f(x)       f'(x)        \tilde f(x,y)
x+c        1            (x+c, y)
cx         c            (cx, cy)
x^n        nx^{n-1}     (x^n, nx^{n-1}y)
e^x        e^x          (e^x, e^x y)
\log x     1/x          (\log x, y/x)
\sin x     \cos x       (\sin x, y\cos x)

For example, to extend \sin^2 x we extend \sin x to give (\sin x,y\cos x), then square the result to give (\sin^2 x,2y\sin x\cos x).

In other words, we don’t need to know the derivative of the complicated function in order to extend it, we only need to know the derivatives of the elementary functions of which it is composed.

Automatic Differentiation

A real function’s derivative at a given point x\in\mathbb R can be obtained by applying its extension to the dual number (x,1) and then taking the dual part of the result,

\tilde f(x,1) = (f(x),f'(x)).

This is called automatic differentiation. This is particularly useful when we don’t necessarily need to know f' as a function, we just want to obtain its value at a particular point x. If the function f is a complicated combination of many elementary functions, then computing the derivative on paper can be both time-consuming and error-prone. Instead of finding f' in full and then evaluating it, we can just evaluate the original function f with a dual number.
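
To make this concrete, here is a minimal sketch of forward automatic differentiation with dual numbers in C++ — a toy illustration, with only the operations needed for the \sin^2 x example from earlier:

#include <cmath>
#include <iostream>

// A dual number x + epsilon*y stored as (re, du), with epsilon^2 = 0.
struct Dual { double re, du; };

Dual operator*( Dual a, Dual b ) {
    return { a.re * b.re, a.du * b.re + a.re * b.du }; // (xa, ya + xb)
}
Dual sin( Dual a ) {
    return { std::sin( a.re ), a.du * std::cos( a.re ) }; // (sin x, y cos x)
}

// Written once, evaluated with either real or dual arguments.
template<typename T>
T f( T x ) { using std::sin; return sin( x ) * sin( x ); }

int main() {
    const Dual d = f( Dual{ 0.5, 1.0 } ); // evaluate the extension at (x,1)
    std::cout << "f(0.5)  = " << d.re << '\n'  // sin^2(0.5)
              << "f'(0.5) = " << d.du << '\n'; // 2 sin(0.5) cos(0.5)
}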

This technique can be applied to multivariable functions, such as vector fields, V:\mathbb R^n\to\mathbb R^n, giving

\tilde V(x+\varepsilon y) = V(x) + \varepsilon (y\cdot\nabla) V|_{x}.

The dual part is the directional derivative of the vector field V in the direction y, evaluated at x. By choosing the direction to be the jth basis vector, e_j, and then taking the ith component of the result, we can automatically compute the partial derivatives of the vector field,

\tilde V^i(x+\varepsilon e_j) = V^i(x) + \varepsilon \partial_j V^i|_{x}.

If we denote the dual part as \mathfrak D(x,y) = y then the Jacobian is given by

\left[\begin{array}{ccc}\mathfrak D\tilde V(x,e_1)&\cdots&\mathfrak D\tilde V(x,e_n)\end{array}\right].

This means we can compute the derivatives of an n-dimensional vector field by evaluating the vector field n times with dual numbers.

In a practical implementation it is more convenient to think of (x,y) as a vector of dual numbers, rather than a pair of real vectors. This allows for a single implementation of the vector field to be evaluated with a vector of either real or dual elements, for example, by templating the scalar type in C++:

template<typename Scalar>
Vector<Scalar> MyVectorField( const Vector<Scalar>& x ) {
    // evaluate the vector field, e.g. V(x) = ( sin(x[0]*x[1]), x[0] + x[1] ),
    // using only operations that are defined for both real and dual Scalars
}

The version of automatic differentiation discussed here (using dual numbers) is called forward automatic differentiation. There is an alternate version called reverse automatic differentiation, as well as extensions to higher-order derivatives. Wikipedia has a decent article on the topic.

Iterating Over Distinct Pairs in C++

It is common in programming to iterate over a data set, processing each element in turn. However, there are cases where one wants to process pairs of elements from the data set. This situation readily arises when evaluating binary relations between elements, such as collisions (intersections) between geometric objects, or the relation “has the same colour as” between coloured objects.

Many binary relations (such as the aforementioned examples) feature two properties — symmetry and reflexivity — that mean we don’t have to process all pairs, but only a subset of the pairs. Symmetry means the order doesn’t matter — e.g. if x has the same colour as y then y has the same colour as x. Reflexivity means that every element is related to itself — e.g. x has the same colour as x for all x. These two properties mean we only need to iterate over the distinct pairs.

Let X be a set of N elements. The set of pairs of elements from X is just the Cartesian product X^2 = X\times X. In this set the pair (x,y) is a different element to (y,x), and this set also includes elements paired with themselves, (x,x). There are N^2 pairs in total.

On the other hand, we can consider distinct pairs, where the two elements x and y must be different, and the order of the pairing does not matter, i.e. (x,y) and (y,x) represent the same distinct pair. The set of distinct pairs contains N(N-1)/2 elements.

In order to iterate over the pairs of X, one typically writes nested for loops, like so:

for(int i=0;i<N;++i)
    for(int j=0;j<N;++j)
        process_pair( X[i], X[j] );

One can use the range-based for loops of C++11 to extend this to more general containers:

for( auto& x : X )
    for( auto& y : X )
        process_pair( x, y );

For distinct pairs, the indexed approach looks like this:

for(int i=0;i<N;++i)
    for(int j=i+1;j<N;++j)
        process_pair( X[i], X[j] );

i.e. for each i, the index j runs over values strictly greater than i.

What would be nice is to leverage the iterators of C++ in order to make use of both range-based for loops and the host of STL algorithms for processing pairs and distinct pairs of elements from a container, something like this:

for( auto& p : pairs(X) )
    process_pair( p );
for( auto& p : distinct_pairs(X) )
    process_pair( p );

The STL already has std::pair for representing a pair of values. The naive solution to this would be to have pairs() construct a new container of std::pairs, copying the elements over from the original container, and then iterating over the new container as normal. This is bad for the following reasons:

  • The cost associated with allocation and copying.
  • It multiplies the memory requirement by 2N+1 for pairs, or by N for distinct pairs. For large containers this can be a problem. In fact, it doesn’t take a very large value of N for the memory requirements to exceed the limits of a desktop machine: if you have more than 131072 1-byte elements, you’ll need more than 32 GB to store all the pairs, which is the limit of most motherboards at the time of writing (185364 for distinct pairs).
  • We may wish to mutate the original elements in place — mutating an element on one iteration may affect future iterations.

Due to the memory requirements, constructing a container of pairs of references is out of the question as well. A much more elegant solution is to use the existing iterators of the container to create new iterator types: pairs_iterator and distinct_pairs_iterator.

My implementation of this approach can be found on GitHub.

The two new iterator types describe each pair using a std::pair of iterators to the original container. The dereferencing operation returns a std::pair of references to the elements, which can then be used to read/write the elements that make up the pair. All of the usual iterator operations, such as incrementation, can be implemented in terms of those from the container iterator.
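
Here is a minimal sketch of the core of such an iterator; the names and details are illustrative, and the full implementation is in the GitHub repository:

#include <iterator>
#include <utility>

// Iterates over distinct pairs by holding two iterators into the original
// container, with the invariant that 'first' precedes 'second'.
template<typename It>
struct distinct_pairs_iterator {
    It first, second, end;

    // Dereferencing yields a pair of references to the underlying elements.
    auto operator*() const {
        return std::pair<decltype(*first), decltype(*second)>( *first, *second );
    }
    distinct_pairs_iterator& operator++() {
        if( ++second == end ) {              // inner range exhausted:
            ++first;                         // advance the outer element
            second = std::next( first );     // and restart just past it
            if( second == end ) first = end; // reached the past-the-end state
        }
        return *this;
    }
    bool operator==( const distinct_pairs_iterator& o ) const {
        return first == o.first && second == o.second;
    }
    bool operator!=( const distinct_pairs_iterator& o ) const {
        return !( *this == o );
    }
};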

The only problematic operation is operator->(). Since the pairs of elements don’t actually exist anywhere in memory, we can’t return a pointer to them. Without this operator, the iterators technically don’t satisfy the InputIterator concept, but they do satisfy the more basic Iterator concept. It may be possible to resolve this by returning a custom wrapper that mimics a pointer to a pair of elements, but I haven’t tried it. It turns out that this shortcoming is not really a problem, since generic algorithms tend not to use this operator (it’s used to access members of the element type, and an arbitrary type need not have any members).

The iterators do satisfy the OutputIterator concept, as you can assign a std::pair of values to a std::pair of references, which will perform a member-wise assignment that updates the elements of the original container. This approach is therefore compatible with algorithms that mutate, although with great caution as the result may well be sensitive to the order of iteration over the pairs (which is the same order as in the indexed for loops above).

The utility functions pairs() and distinct_pairs() construct a wrapper object that represents the set of pairs without actually storing them. The wrapper only contains the range (pair of iterators) that defines the extent of original data set, and provides the key begin() and end() functionality that returns the new iterators. The const versions are called cpairs() and cdistinct_pairs().

The really cool thing about this approach is that pairs() and distinct_pairs() can be applied to any data set that is bookended by a pair of iterators. This means we can apply it to itself to iterate over pairs-of-pairs:

for( auto pp : pairs(pairs(X)) )
    process_pair_of_pairs( pp );

This works because the reference type of the iterator isn’t actually a reference — it’s a std::pair of references from the underlying iterator. This means the type of pp is “a pair of a pair of references”, i.e. std::pair<std::pair<T&,T&>,std::pair<T&,T&>>, so there are no references to temporaries to worry about.

Examples

A big data set

#include <algorithm>
#include <iostream>
#include <numeric>
#include <vector>
using namespace std;

int main() {
    // a big vector
    vector<int> v( 1 << 17 );

    // fill the vector with ascending numbers starting with 1
    iota( begin(v), end(v), 1 );

    // count the number of distinct pairs whose sum is even
    auto ps = distinct_pairs(v);
    cout << count_if( begin(ps), end(ps),
        []( auto p ) {
            return ( p.first + p.second ) % 2 == 0;
        }
    );
}

/* Output:
4294901760
*/

Note that it would take 64 GB of memory to store the distinct pairs for this example, yet this implementation had a running time of only 9 seconds and memory usage of around 1 MB.

Print distinct pairs of distinct pairs

vector<int> v { 1, 2, 3, 4 };

for( auto p : distinct_pairs(distinct_pairs(v)) ) {
    cout << "( ( " << p.first.first << ", " << p.first.second << " ), ( "
         << p.second.first << ", " << p.second.second << " ) )" << endl;
}

/* Output:
( ( 1, 2 ), ( 1, 3 ) )
( ( 1, 2 ), ( 1, 4 ) )
( ( 1, 2 ), ( 2, 3 ) )
( ( 1, 2 ), ( 2, 4 ) )
( ( 1, 2 ), ( 3, 4 ) )
( ( 1, 3 ), ( 1, 4 ) )
( ( 1, 3 ), ( 2, 3 ) )
( ( 1, 3 ), ( 2, 4 ) )
( ( 1, 3 ), ( 3, 4 ) )
( ( 1, 4 ), ( 2, 3 ) )
( ( 1, 4 ), ( 2, 4 ) )
( ( 1, 4 ), ( 3, 4 ) )
( ( 2, 3 ), ( 2, 4 ) )
( ( 2, 3 ), ( 3, 4 ) )
( ( 2, 4 ), ( 3, 4 ) )
*/

Limitations

If your data set contains two different elements that are equal, the distinct_pairs iterator will still consider them to be distinct. In other words, ‘distinct’ refers to instances of elements, not to semantic distinctness with respect to the equality comparison.

The Dynamics of a Transform Hierarchy

Transforms are used in game development to describe the position, rotation and scale of objects in a virtual world. These properties are encoded in a coordinate transformation, which is a function that transforms the object’s local coordinates into the world’s coordinates. These functions are invertible affine maps, consisting of an invertible linear map and a translation.

Modern games often feature complex objects, each made up of several sub-objects. To manage this complexity it has become common to organize the game objects into a hierarchy, where each object is assigned a parent, which is either another object or the world itself. This forms a tree structure with the world as the root. In such a hierarchy, each object’s transform describes its position, rotation and scale relative to its parent. The textbook example is a tank and its turret — the turret has a constant position and scale relative to its parent tank, but can rotate of its own accord.

In order to determine the transform of an object relative to the world, the individual transforms are composed by walking through the tree from the object to the root. Each transform in the sequence transforms from child to parent and the result of the composition is a single transform that describes the object relative to the world.

[Figure: composition of transforms]

However, when dealing with dynamics we must acknowledge that these transforms are changing over time. In order to advance a transform forward in time, we could do everything in global terms — applying global velocities and accelerations and then converting the new global transform back into a local transform. This approach is easier, but it effectively bypasses the hierarchy as far as dynamics is concerned.

An interesting alternative is to use local velocities and accelerations, which allows the local transforms to be updated directly. In order to take this approach, we need to understand how to convert these velocities and accelerations between the different frames of reference. This is especially important when the dynamics is being driven by physics, such as Newtonian mechanics. Physical forces produce acceleration, and given the acceleration of an object relative to the world, we must determine the acceleration of the object relative to its parent. This involves taking into account the dynamics of each object in the tree between it and the root. This article explores the mathematics needed to do this.

We begin by expressing the algebraic structure of transforms in terms that keep the translation, rotation and scale components separate (rather than mixing them into a single matrix). Then we discuss our preferred description of the dynamics of a transform using linear and angular velocities and accelerations. The key step is then to extend the algebraic structure to include these velocities and accelerations, which provides the machinery to convert the dynamics between different frames of reference. This is used to derive local-to-global and global-to-local conversions of velocity and acceleration. These results are applied to instantaneous modifications of the dynamics (e.g. by impulses) to gain corresponding conversions for discrete changes in velocity and acceleration.

Finally, we look at a couple of example applications. The first shows how to apply Newtonian forces to generate local accelerations, which results in the well-known inertial forces arising. The second application is for linear impulses, where we also discuss how to counteract inherited changes, which is important for correct collision response.

Transforms

To begin, let’s define more precisely what kind of transforms we are dealing with. Nothing in this article is specific to 3-dimensional space, so we will work in n-dimensional space. Let a transform be a triple (T,R,S) consisting of a translation vector T\in\mathbb R^n, a rotation matrix R\in\mathrm{SO}(n,\mathbb R), and a positive scale factor S\in\mathbb R. As a function A:\mathbb R^n\to\mathbb R^n, the transform acts on a coordinate vector x by

x\mapsto A(x) = R S x + T,

i.e. the scale is applied first, then the rotation, and finally the translation. We will write the linear part as J = RS. The linear part coincides with the Jacobian of the map (hence the notation) and its action performs a change of basis. If A is a transform from child to parent then J is a change of basis from the child basis to the parent basis.

Transforms form a group under composition. We can express the group structure (i.e. composition, identity, inverse) in terms of the individual translation, rotation and scale components.

Composition of Transforms

The composition of two transforms is a third transform. The resulting transform can be expressed with separate translation, rotation and scale components:

(T_3,R_3,S_3) = (T_2,R_2,S_2)\circ (T_1,R_1,S_1),

where

S_3 = S_2 S_1
R_3 = R_2 R_1
T_3 = R_2 S_2 T_1 + T_2 = A_2(T_1).

Note that the composition of functions g\circ f means that f is applied first, then g.

Some implementations allow for multiple scale factors, one for each axis, which are usually stored as a vector. The use of a vector for this can be rather misleading — the correct interpretation is a diagonal n\times n matrix. When composing such transforms, the scaling matrix loses its diagonality. It becomes a mixture of diagonal scalings and rotations, which requires the full n\times n matrix to describe. For this reason we will restrict ourselves to a single scale factor.
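
As a minimal, self-contained sketch of the composition formulas above (Vec3, Mat3 and the helper functions are ad-hoc illustrations, not a particular maths library):

#include <array>

using Vec3 = std::array<double,3>;
using Mat3 = std::array<Vec3,3>;

Vec3 mul( const Mat3& m, const Vec3& v ) {
    Vec3 r{};
    for( int i = 0; i < 3; ++i )
        for( int j = 0; j < 3; ++j )
            r[i] += m[i][j] * v[j];
    return r;
}

Mat3 mul( const Mat3& a, const Mat3& b ) {
    Mat3 r{};
    for( int i = 0; i < 3; ++i )
        for( int j = 0; j < 3; ++j )
            for( int k = 0; k < 3; ++k )
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

struct Xform { Vec3 T; Mat3 R; double S; };

// (T3,R3,S3) = (T2,R2,S2) o (T1,R1,S1): S3 = S2 S1, R3 = R2 R1, T3 = A2(T1).
Xform compose( const Xform& a2, const Xform& a1 ) {
    const Vec3 st = { a2.S * a1.T[0], a2.S * a1.T[1], a2.S * a1.T[2] };
    const Vec3 t  = mul( a2.R, st );
    return { { t[0] + a2.T[0], t[1] + a2.T[1], t[2] + a2.T[2] },
             mul( a2.R, a1.R ),
             a2.S * a1.S };
}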

Identity Transform

The identity transform does nothing, x\mapsto x. It is given by (0,\mathrm{id},1): the null vector 0\in\mathbb R^n, the identity matrix \mathrm{id}\in\mathrm{SO}(n,\mathbb R), and the unit scalar 1\in\mathbb R.

Inverse Transform

The inverse of a transform (T,R,S) is a transform denoted (T,R,S)^{-1} that is defined by

(T,R,S)\circ (T,R,S)^{-1} = (0,\mathrm{id},1) = (T,R,S)^{-1} \circ (T,R,S),

i.e. the inverse undoes the action of the transform, resulting in the identity transform. It is given by

(T',R',S') = (T,R,S)^{-1},

where

S' = 1 / S
R' = R^{\mathrm T}
T' = R^{\mathrm T}( -T/S ) = -S' R' T = -J'T

and R^{\mathrm T} is the matrix transpose of R, which describes the inverse rotation. J' is the linear part of the inverse, which is the inverse of the linear part,

J'= J^{-1} = R'S' = R^{\mathrm T}/S.

Dynamics of the Transform

To describe how a transform (T,R,S) is changing over time we need to consider the time derivatives of the three components. In this article we are going to assume that the scale factor S is constant. It’s unusual to have a time-varying scale factor, and taking it to be constant simplifies some of the following equations.

For the translational and rotational parts, we will consider the first two time derivatives (velocity and acceleration). Dynamics in this domain is often specified by second-order differential equations, such as Newton’s second law, and it is unusual to encounter derivatives of third-order or higher, so two should be sufficient for most applications.

Describing Translational Dynamics

Since the translation T describes the relative position of the object to its parent, the relative velocity is its first derivative and the relative acceleration is its second derivative,

v = \dot T,
a = \dot v = \ddot T.

Describing Rotational Dynamics

The dynamics of the rotation R is given by the derivatives \dot R and \ddot R. However, it is not convenient to use these quantities directly. We are using n\times n matrices to describe rotations, but only a relatively small number of matrices are valid rotation matrices. The valid matrices form a smooth geometric object — a manifold — within the ambient space of matrices. The time derivative \dot R is a tangent vector to this manifold. Due to the curvature of this manifold, the types of matrix that describe valid tangent vectors is dependent on the current rotation, R. This means we cannot choose \dot R independently of R.

[Figure: angular velocity as a tangent vector to the rotation group at the identity]

What we would like is a way to specify the time derivative in terms independent of the current rotation, analogous to the way we can choose a velocity independently of the current position. We do this by describing the time derivative using a vector in the tangent space at the identity rotation, \omega\in\mathfrak{so}(n,\mathbb R). This tangent vector can be associated with a tangent vector at R by

\dot R = \omega R.

We call the matrix \omega an angular velocity. The tangent space at the identity is denoted \mathfrak{so}(n,\mathbb R) and consists of n\times n skew-symmetric matrices, i.e.

\omega^{\mathrm T} = -\omega.

This follows straightforwardly from the definition of a rotation matrix (by differentiating the orthogonality constraint). The skew-symmetry is preserved by addition and scalar multiplication, which isn’t surprising given that it characterizes a vector space. Although individual matrix multiplications do not preserve skew-symmetry, the space does feature a Lie bracket given by the matrix commutator, [\alpha,\beta] = \alpha\beta - \beta\alpha.

Although the angular velocities are n\times n matrices, we do not need n^2 scalars to describe them. The dimension of the rotation group (and hence its tangent spaces) is n(n-1)/2. This means for 2-dimensional space we need only a single scalar,

\left[ \begin{array}{cc} 0 & -\omega\\\omega & 0 \end{array} \right] \in\mathfrak{so}(2,\mathbb R).

The action of this matrix on a 2-vector v corresponds to scalar multiplication of the perpendicular vector,

\left[ \begin{array}{cc} 0 & -\omega\\\omega & 0 \end{array} \right] v = \omega v^\perp

where the perpendicular vector is (x,y)^\perp = (-y,x). The commutator vanishes for the 2-dimensional case, [\alpha,\beta]=0\ \forall\alpha,\beta\in\mathfrak{so}(2,\mathbb R).

For 3-dimensional space we need 3 scalars,

\left[\begin{array}{ccc}0 & -\omega_3 & \omega_2\\\omega_3 & 0 & -\omega_1\\-\omega_2 & \omega_1 & 0\end{array}\right] \in\mathfrak{so}(3,\mathbb R).

This allows for an angular velocity to be interpreted as a 3-vector (\omega_1,\omega_2,\omega_3)\in\mathbb R^3. Under this identification, the action of the matrix on a 3-vector v corresponds to the cross product \omega\times v. Similarly, the commutator of the matrix representations corresponds to the cross product of the vector representations, [\alpha,\beta]=\alpha\times\beta. Since this vector interpretation of angular velocity is unique to the 3-dimensional case, we will keep things in the matrix form. The matrix form is also a bit nicer to work with, since matrix multiplication is associative, while the cross product is not.

The second derivative is then given by

\ddot R = (\alpha + \omega^2) R,

where we have defined the angular acceleration,

\alpha = \dot\omega.

The angular accelerations are also skew-symmetric matrices. This allows us to choose R, \omega and \alpha independently of each other while still providing a complete description of \dot R and \ddot R.

So a transform A=(T,R,S) and its dynamics can be described in terms of the quantities (A,v,a,\omega,\alpha). Now that we have decided how to describe the dynamics of individual transforms, we can extend the transforms’ group structure to include the velocities and accelerations. This algebraic structure will be the means by which we convert between local and global descriptions of an object’s dynamics.

Dynamics of the Composition

To compute the dynamics of the composition, we take the expression for the composition of two transforms and compute the derivatives of its translation and rotation components. The definitions of velocity and acceleration are then used to identify v, a, \omega and \alpha for the composition.

A_3 = A_2\circ A_1
v_3 = v_2 + J_2 v_1 + \omega_2 J_2 T_1
a_3 = a_2 + J_2 a_1 + \alpha_2 J_2 T_1 + (\omega_2)^2 J_2 T_1 + 2\omega_2 J_2 v_1
\omega_3 = \omega_2 + R_2 \omega_1 R_2^{\mathrm T}
\alpha_3 = \alpha_2 + R_2 \alpha_1 R_2^{\mathrm T} + [\omega_2,\omega_3] .

Dynamics of the Inverse

The inverse converts a child-to-parent transform into a parent-to-child transform. The inverse transform describes the parent relative to the child. By computing the time derivative of the expression for the inverse, we can express the dynamics of the parent relative to the child.

A' = A^{-1}
v' = J^{-1}( \omega T - v )
a' = J^{-1}( \alpha T - \omega^2 T + 2\omega v - a )
\omega' = -R^{\mathrm T}\omega R
\alpha' = -R^{\mathrm T}\alpha R .

Dynamics of the Identity

The identity transform describes an object relative to itself. The dynamics of such a transform is null:

A = (0,\mathrm{id},1)
v = 0
a = 0
\omega = 0
\alpha = 0 .

Note that v and a are null vectors, while \omega and \alpha are zero matrices (the zero matrix is skew-symmetric).

The interpretation of this is that an object is always at rest relative to itself, regardless of its dynamics relative to other objects/the world.

Local and Global Dynamics

We now have enough machinery to convert between local and global descriptions of an object’s dynamics.

For each object we can describe its velocities and accelerations either locally (relative to the parent) or globally (relative to the world). For objects that are parented to the world these notions coincide, but in general they do not.

[Figure: a hierarchy of objects related by transforms; each object has exactly one parent]

Let an object have transform A_1 relative to its parent, and the parent have transform A_2 relative to the world.

Converting Local Dynamics To Global Dynamics

The composition A_3=A_2\circ A_1 describes the object relative to the world. Given its local dynamics, and the global dynamics of the parent, the object’s global dynamics is given by the dynamics of the composition, given above.

In the case where there are many levels in the hierarchy between the root and the object, each node’s global dynamics depends on the global dynamics of its parent. This approach can be applied recursively, in exactly the same way that a transform hierarchy usually operates. In iterative terms this means starting at the root and walking through the tree towards the object, computing the global transform (and dynamics) of each node in sequence.

Converting Global Dynamics To Local Dynamics

If we are given the dynamics of A_3 (the global dynamics of the object), and want to find the dynamics of A_1 (the local dynamics of the object), we can express A_1 in terms of A_2 and A_3 via

A_1 = A_2^{-1}\circ A_3.

The dynamics of A_1 is then given by combining the dynamics of the inverse of A_2 with the dynamics of the composition with A_3:

A_1 = A_2^{-1}\circ A_3
v_1 = J_2^{-1}( v_3-v_2-\omega_2(T_3-T_2) )
a_1 = J_2^{-1}( a_3-a_2-(\alpha_2-(\omega_2)^2)(T_3-T_2) - 2\omega_2(v_3-v_2) )
\omega_1 =  R_2^{\mathrm T}( \omega_3-\omega_2 ) R_2
\alpha_1 = R_2^{\mathrm T}( \alpha_3-\alpha_2 + [\omega_3,\omega_2] ) R_2.

The velocity and acceleration can also be written as

v_1 = J_2^{-1}( v_3-v_2-\omega_2J_2T_1 )
a_1 = J_2^{-1}( a_3-a_2-(\alpha_2+(\omega_2)^2)J_2T_1 - 2\omega_2J_2v_1 ).
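
Reusing the Vec3/Mat3 helpers from the composition sketch above, the global-to-local velocity conversion is a direct transcription (again an illustrative sketch, with the parent's angular velocity passed as its skew matrix):

Vec3 sub( const Vec3& a, const Vec3& b ) {
    return { a[0] - b[0], a[1] - b[1], a[2] - b[2] };
}

// v1 = J2^{-1}( v3 - v2 - omega2 ( T3 - T2 ) ), where J2inv = R2^T / S2.
Vec3 localVelocity( const Mat3& J2inv, const Mat3& omega2,
                    const Vec3& T2, const Vec3& T3,
                    const Vec3& v2, const Vec3& v3 ) {
    return mul( J2inv, sub( sub( v3, v2 ), mul( omega2, sub( T3, T2 ) ) ) );
}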

Local and Global Deltas

There are occasions when it is necessary to make instantaneous modifications to the velocity and acceleration of an object. A relevant example is modifying an object’s velocity in response to a collision. We can use the above results to convert the changes in velocities and accelerations between local and global forms.

As before, A_1 is the child to parent transform, A_2 is the parent to world transform, and A_3=A_2\circ A_1 is the child to world transform. These transforms are assumed to be unchanged by the interaction, and the velocity/accel is modified instantaneously, meaning no time elapses between the old and new velocities/accelerations.

Global change in child due to local change in child

\Delta v_3 = J_2\Delta v_1
\Delta a_3 = J_2\Delta a_1 + 2\omega_2\Delta v_3
\Delta\omega_3 = R_2 \Delta\omega_1 R_2^{\mathrm T}
\Delta\alpha_3 = R_2 \Delta\alpha_1 R_2^{\mathrm T} + [\omega_2,\Delta\omega_3]

Local change in child required to produce the given global change in child

This is just the inverse of the above.

\Delta v_1 = J_2^{-1}\Delta v_3
\Delta a_1 = J_2^{-1}( \Delta a_3 - 2\omega_2\Delta v_3 )
\Delta\omega_1 =  R_2^{\mathrm T} \Delta\omega_3 R_2
\Delta\alpha_1 = R_2^{\mathrm T}( \Delta\alpha_3 + [\Delta\omega_3,\omega_2] ) R_2

Global change in child due to global change in parent

A child’s global dynamics depends on both its local dynamics and its parent’s global dynamics. If the child is locally unmodified but the parent is globally modified then the child’s global dynamics changes according to the following.

\Delta v_3 = \Delta v_2 + \Delta\omega_2 J_2T_1
\Delta a_3 = \Delta a_2 + (\Delta\alpha_2 + \Delta\omega_2\omega_2 + \omega_2\Delta\omega_2 + (\Delta\omega_2)^2) J_2T_1 + 2\Delta\omega_2 J_2v_1
\Delta\omega_3 = \Delta\omega_2
\Delta\alpha_3 = \Delta\alpha_2 + [\Delta\omega_2,\omega_3-\omega_2]

Example: Newtonian Mechanics — Force to Acceleration

In Newtonian mechanics, the dynamics of an object is driven by forces, which are influences that determine its acceleration via F=ma, where F\in\mathbb R^n is the total force, and m>0 is the mass of the object. However, this is only valid in what physicists call an inertial frame, which is a frame of reference that is either stationary or moving with constant (translational) velocity — no acceleration or rotation. This corresponds to our world frame, the root of the hierarchy.

[Figure: a transform hierarchy of two objects in the world]

Consider a non-inertial frame, such as the Earth, which has translational velocity and acceleration due to its orbit around the Sun, angular velocity due to its rotation around its axis, and even some angular acceleration that causes its axis to precess. Let us also have an object that is parented to the non-inertial frame, such as a vehicle on the Earth.

If the Earth has transform A_2 and the vehicle has transform A_1 then we can convert the acceleration of the object relative to the world to acceleration relative to the Earth by using the global-to-local acceleration equation above,

a_1 = J_2^{-1}( a_3-a_2-(\alpha_2+(\omega_2)^2)J_2T_1 - 2\omega_2J_2v_1 ).

We can write this in terms of the applied force F and mass m,

a_1 = \frac{1}{m} J_2^{-1} F - J_2^{-1}(a_2 + (\alpha_2+(\omega_2)^2)J_2T_1 + 2\omega_2J_2v_1 ).

Recall that the role of J_2 is only to change between bases; it does not affect the meaning of the vector it acts on. Therefore let’s introduce a bit of notation:

  • Applied force expressed in the Earth basis: \tilde F = J_2^{-1} F.
  • Acceleration of the Earth relative to the world expressed in the Earth basis: \tilde a_2 = J_2^{-1} a_2.
  • Angular velocity of the Earth relative to the world expressed in the Earth basis: \tilde\omega_2 = J_2^{-1} \omega_2 J_2.
  • Angular acceleration of the Earth relative to the world expressed in the Earth basis: \tilde\alpha_2 = J_2^{-1} \alpha_2 J_2.

With this notation, the local acceleration becomes much cleaner:

a_1 = \frac{1}{m}\tilde F - \tilde a_2 - \tilde\alpha_2 T_1 - (\tilde\omega_2)^2 T_1 - 2\tilde\omega_2v_1.

The local acceleration is therefore given not only by the applied force, but by four additional contributions due to the global dynamics of the parent. The additional four terms are conventionally interpreted as extra forces (by F=ma), and are known to physicists as inertial forces. They are regarded as ‘fictitious’ forces, as they do not result from physical interactions.

This illustrates that in the absence of applied forces an object may still experience local accelerations due to the dynamics of its parent.

Since we have kept this in the n-dimensional matrix form, it is now easy to go into the 2- and 3-dimensional specializations.

In 3D

In 3D, the skew-symmetric matrices are identified with 3-vectors, and their action on a vector becomes the cross product:

  • Centrifugal: -\tilde\omega_2\times(\tilde\omega_2\times T_1)
  • Coriolis: -2\tilde\omega_2\times v_1
  • Euler: -\tilde\alpha_2\times T_1

Since physicists usually operate in 3 dimensions using the vector form of angular velocity, this form may be more familiar.

In 2D

In 2D, the skew-symmetric matrices are identified with scalars, and their action on a vector becomes scalar multiplication with the perpendicular vector, (x,y)^\perp = (-y,x):

  • Centrifugal: (\tilde\omega_2)^2 T_1
  • Coriolis: -2\tilde\omega_2 v_1^\perp
  • Euler: -\tilde\alpha_2 T_1^\perp

Example: Newtonian Mechanics — Linear Impulses

An impulse is the integral of an applied force over a given time interval. In an inertial frame this is equal to the overall change in momentum produced by that force acting over the time interval. However, when a large force acts over a very small time interval it is often not feasible to simulate the action of the force itself. Instead, the corresponding impulse can be applied as an instantaneous modification to the object’s dynamics. A relevant example is collision response.

An impulse corresponds to a change in velocity, \Delta v = \frac 1m\Delta p, in the same way that a force corresponds to an acceleration. In order to convert the global change in velocity into a local change in velocity, we use the global-to-local conversion of delta.

Let the global delta be \Delta v_3 = \Delta p/m then, applying the formulae above, the local deltas are

\Delta v_1 = \frac 1m J_2^{-1} \Delta p
\Delta a_1 = -\frac 2m J_2^{-1} \omega_2\Delta p.

This shows that applying an impulse can cause changes in both the local velocity and the local acceleration of an object. The change in acceleration is due to the Coriolis term, which is velocity-dependent.

Counteracting the Inherited Effect from the Parent

When the parent’s global dynamics have been modified, all of its children (and children’s children, etc.) will automatically inherit this modification to their own global dynamics thanks to the hierarchy. There may be some applications where this is undesirable and you want to modify an object in isolation. An example is a car crash — the dynamic child objects inside the car should continue moving forward globally when the car crashes, otherwise they will appear to be fixed to the car.

To do this we need to apply the opposite global effect to the immediate children of the object that has been modified, to counter the inherited changes to their dynamics. The children’s children will inherit this correction automatically and do not need to be corrected further. If \Delta v is the global change in velocity that has been added to the parent, then for each child the local velocity and acceleration need to be updated by

\Delta v_1 = - J_2^{-1} \Delta v
\Delta a_1 = 2 J_2^{-1} \omega_2\Delta v.

It is especially important to do this when a child collides with its parent (or the parent’s parent etc.) to get the correct collision response.

Implementation

To implement this approach, you will need not only a Transform class, but also a DynamicTransform class. Both of these will have the group structure of composition, identity and inverse derived in this article. A hierarchy can then be made in the usual way, just using DynamicTransforms in place of Transforms.

Here is a sketch to illustrate.

template<typename T,int N>
struct Transform {
    Vec<T,N> translation;
    Rotation<T,N> rotation;
    T scale;
    
    // Function evaluation
    Vec<T,N> operator()( const Vec<T,N>& rhs ) const;
    
    // Composition
    Transform<T,N> operator*( const Transform<T,N>& rhs ) const;
    Transform<T,N>& operator*=( const Transform<T,N>& rhs );
    
    Transform<T,N> Inverse() const;
    
    static const Transform<T,N>& Identity();
    
    // Linear part
    Transform<T,N> LinearTransform() const;
    
    // Inverse transpose of linear part
    Transform<T,N> NormalTransform() const;
    
    // To augmented matrix representation
    Mat<T,N+1,N+1> ToMatrix() const;
};

template<typename T,int N>
struct DynamicTransform {
    Transform<T,N> transform;
    Vec<T,N> velocity, accel;
    Skew<T,N> angularVelocity, angularAccel;
    
    // Composition
    DynamicTransform<T,N> operator*( const DynamicTransform<T,N>& rhs ) const;
    DynamicTransform<T,N>& operator*=( const DynamicTransform<T,N>& rhs );
    
    DynamicTransform<T,N> Inverse() const;
    
    static const DynamicTransform<T,N>& Identity();
    
    // Transform deltas using this dynamic transform
    Vec<T,N> TransformDeltaVel( const Vec<T,N>& dv ) const;
    Vec<T,N> TransformDeltaAcc( const Vec<T,N>& dv, const Vec<T,N>& da ) const;
    Skew<T,N> TransformDeltaAngVel( const Skew<T,N>& domega ) const;
    Skew<T,N> TransformDeltaAngAcc( const Skew<T,N>& domega, const Skew<T,N>& dalpha ) const;
};

A Quick Note On Rigid Body Dynamics

For application to rigid bodies, the ‘positions’ of objects here correspond to the centre of mass of the rigid body. This is because rigid body motion is decomposed into rotations about the centre of mass and translations of the centre of mass.

Conclusion

We have discussed the algebraic structure of n-dimensional transforms consisting of translation, rotation and scale components, and described their dynamics in terms of linear and angular velocities and accelerations. By extending the algebraic structure to include these velocities and accelerations, we can convert between local and global descriptions of an object’s dynamics. The resulting equations apply to any number of dimensions, to any source of dynamics, and to a hierarchy of arbitrary depth. This in turn allowed us to convert instantaneous changes in velocity and acceleration between local and global frames of reference. We applied these results to the example of Newtonian mechanics, where the global-to-local conversion of acceleration produces the well-known inertial forces, and to the example of linear impulses.

Numerical Integration for Rotational Dynamics

In order to advance a simulation forward in time, a numerical integration scheme is needed to evaluate the continuous part of the dynamics, which is specified by a differential equation. For translational dynamics, this is usually of the form

\begin{array}{l}\dot x = v\\\dot v=a(x,v),\end{array}

where x is the position, and v is the velocity. The acceleration, defined as \dot v, is a function of the position and velocity. Under the assumption that acceleration is constant, the differential equations have a closed-form solution, giving the update formula

\begin{array}{l}\Delta v = a \Delta t\\ \Delta x = \frac 12 (v + v') \Delta t,\end{array}

where x' = x + \Delta x and v' = v + \Delta v.
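
A minimal sketch of this translational update step (illustrative names, 1D for brevity; the vector case is identical component-wise):

struct State { double x, v; };

// Advance one step of size dt under constant acceleration a.
State step( State s, double a, double dt ) {
    const double v2 = s.v + a * dt;                // Delta v = a dt
    const double x2 = s.x + 0.5 * (s.v + v2) * dt; // Delta x = (v + v') dt / 2
    return { x2, v2 };
}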

In this article, we will derive an analogous result for rotational dynamics — the numerical integration scheme for the differential equations describing rotational motion under the assumption of constant angular acceleration.

The assumption of constant acceleration is not always appropriate for translational dynamics, particularly for systems like springs that have rapidly varying acceleration. For rotational dynamics, a proper discussion of the appropriateness of constant angular acceleration is best left to a dedicated article, as it would involve the details of rigid body dynamics, which is not the concern of this article.

Translational and Rotational Analogies in 3D

Whereas translational motion describes the change in an object’s position over time, rotational motion describes the change in an object’s orientation. Positions are described relative to an origin by a displacement vector, with the null vector describing the origin. In an analogous fashion, orientations are described relative to a standard orientation by a rotation, with the identity rotation describing the standard.

Rotations in 3D can be quantified by:

  • an axis and an angle;
  • a rotation matrix (rotations are, after all, linear maps): a real 3-by-3 matrix with determinant 1 whose inverse is its transpose;
  • a unit quaternion.

We will assume that the reader is already familiar with rotations and their various descriptions.

In 3-dimensional space, the space of rotations also possesses a 3-dimensional structure. This is not the case in general — for example, in 2D the space of rotations is 1-dimensional. Because of this, there are many analogies between translational and rotational dynamics in 3D.

Translational                           | Rotational
position, x\in\mathbb R^3               | rotation matrix, R\in\mathrm{SO}(3,\mathbb R), or unit quaternion, q\in S^3\subset\mathbb H
velocity, v=\dot x\in\mathbb R^3        | angular velocity, \omega\in\mathbb R^3
acceleration, a=\dot v\in\mathbb R^3    | angular acceleration, \alpha=\dot\omega\in\mathbb R^3
mass, m\in\mathbb R                     | moment of inertia tensor, I\in\mathbb R^3\otimes\mathbb R^3\cong\mathbb R^{3\times 3}
momentum, p=mv\in\mathbb R^3            | angular momentum, L=I\omega\in\mathbb R^3
force, F=\dot p\in\mathbb R^3           | torque, \tau=\dot L\in\mathbb R^3

The aim of this article is not to discuss how accelerations are produced by forces, nor how angular accelerations are produced by torques, and therefore we will be focusing on the top half of this table.

The main reason that rotational motion is more complicated than translational motion is that the space of rotations is not a linear space — it is `curved’. In a linear space, a constant velocity moves you along a straight line; in the space of rotations, a constant angular velocity moves you along a curve.

The Rotational Equations

The key piece of information missing from the table above is the relation between the angular velocity, \omega, and the time derivative of the rotation, \dot R (or \dot q).

For rotation matrices this is
\dot R=\omega_{\times} R,

where \omega_{\times} is an anti-symmetric matrix describing the linear map x\mapsto\omega\times x, called the cross-product matrix,
\omega_{\times} = \left[\begin{array}{ccc}0 & -\omega_3 & \omega_2\\\omega_3 & 0 & -\omega_1\\-\omega_2 & \omega_1 & 0\end{array}\right].
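As a sketch, the conversion \omega\mapsto\omega_{\times} is a direct transcription of this matrix, here using a hypothetical row-major Mat3 aggregate alongside the earlier Vec3:

struct Mat3 {
    float m[3][3];
};

// Build the cross-product matrix of w: the matrix of the linear map v -> w × v.
Mat3 CrossProductMatrix( const Vec3& w )
{
    return { { {  0.0f, -w.z,   w.y  },
               {  w.z,   0.0f, -w.x  },
               { -w.y,   w.x,   0.0f } } };
}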

For the mathematically literate, \omega_{\times} is an element of the Lie algebra \mathfrak{so}(3,\mathbb R), the tangent space to \mathrm{SO}(3,\mathbb R) at the identity. The conversion \omega\mapsto\omega_{\times} is an isomorphism between our physical space \mathbb R^3 and \mathfrak{so}(3,\mathbb R), which is possible as they are of the same dimension. Under this identification, the cross product coincides with the matrix commutator, which is the Lie bracket for the algebra. The time derivative \dot R is a tangent vector at R, which is given by the pushforward of \omega_{\times} by right translation.

The quaternionic version of this is
\dot q=\frac 12\omega q,

where \omega is interpreted as an imaginary quaternion and the multiplication is quaternionic. The identification of vectors with imaginary quaternions is analogous to the identification of vectors with anti-symmetric matrices, as the imaginary quaternions form the tangent space to the unit quaternions at the identity, 1. The factor of \frac 12 arises because each rotation is described by two unit quaternions (q and -q), whereas it is described by only one rotation matrix.
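For illustration, here is this derivative in code, reusing the earlier Vec3 and a bare-bones Quat type (scalar part w, vector part x, y, z) with the standard Hamilton product:

struct Quat {
    float w, x, y, z;
};

// Hamilton product of two quaternions.
Quat operator*( const Quat& a, const Quat& b )
{
    return { a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z,
             a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
             a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
             a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w };
}

// dq/dt = (1/2) omega q, with omega promoted to the imaginary quaternion (0, omega).
Quat QuatDerivative( const Quat& q, const Vec3& omega )
{
    Quat wq = Quat{ 0.0f, omega.x, omega.y, omega.z } * q;
    return { 0.5f * wq.w, 0.5f * wq.x, 0.5f * wq.y, 0.5f * wq.z };
}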

Integrating the Rotational Equations

The easy part of the integration is the angular velocity: under the assumption of constant angular acceleration, it updates as

\omega'=\omega + \alpha\Delta t.

However, the other equation is not so straightforward.

In the special case that \alpha=0, the angular velocity is constant and the solution is given by the exponential map

R'=\exp(\omega_{\times}\Delta t) R
q'=\exp(\frac 12 \omega\Delta t) q.

But when \alpha\neq 0 and \omega varies with time, this becomes much more complicated. The solution can be written as an infinite series via the Magnus expansion,

R'=\exp( \Omega_{\times} ) R
\Omega=\Omega_1 + \Omega_2 + \cdots.

For constant angular acceleration, we can compute the first few of these terms by hand,

\Omega_1 = \frac 12 (\omega+\omega')\ \Delta t
\Omega_2 = \frac{1}{12}\ \alpha\times\omega\ \Delta t^3 = \frac{1}{12}\ \omega'\times\omega\ \Delta t^2
\Omega_3 = \frac{1}{240}\ \alpha\times(\alpha\times\omega)\ \Delta t^5.

This series is guaranteed to converge as long as the angle rotated per time-step is less than \pi/\sqrt{2} (which is about 127 degrees). Typically the first one or two terms are plenty. Note that we have given \Omega in vector form, but for use in the matrix exponential the sum should be converted to the cross-product matrix form. We can also reverse-engineer the quaternion version from this,

q'=\exp( \frac 12 \Omega ) q,

where \Omega is now an imaginary quaternion.
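Putting this together, one integration step for the quaternion representation might look like the following sketch, truncating the Magnus expansion after \Omega_2. Cross is the usual cross product, and QuatExp is the imaginary-quaternion exponential, an approximation of which is sketched later in the article:

Vec3 Cross( const Vec3& a, const Vec3& b )
{
    return { a.y * b.z - a.z * b.y,
             a.z * b.x - a.x * b.z,
             a.x * b.y - a.y * b.x };
}

Quat QuatExp( const Vec3& v ); // the imaginary-quaternion exponential, defined later

// Advance (q, omega) by one time-step under constant angular acceleration alpha.
void IntegrateRotation( Quat& q, Vec3& omega, const Vec3& alpha, float dt )
{
    Vec3 omega1 = omega + alpha * dt;                            // omega'
    Vec3 Omega  = ( omega + omega1 ) * ( dt * 0.5f )             // Omega_1
                + Cross( omega1, omega ) * ( dt * dt / 12.0f );  // Omega_2
    q = QuatExp( Omega * 0.5f ) * q;                             // q' = exp(Omega / 2) q
    omega = omega1;
}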

A Geometric Interpretation

We can think of the Magnus expansion \Omega as a polynomial in \Delta t with values in \mathfrak{so}(3,\mathbb R), the tangent space to \mathrm{SO}(3,\mathbb R) at the identity. In other words it is a path in the tangent space that begins at the null vector \Omega(0)=0 and ends at some vector \Omega(\Delta t).

The exponential map maps this tangent space to the manifold, \mathfrak{so}(3,\mathbb R)\to\mathrm{SO}(3,\mathbb R), in a way that identifies the null vector with the identity. Therefore it `wraps’ the path around the manifold so the starting point is the identity, \exp(\Omega(0))=\exp(0)=\mathrm{id}, and the end point is \exp(\Omega(\Delta t)).

The act of multiplying this path on the right by the current rotation, \exp(\Omega)\mapsto \exp(\Omega)R, is called right translation. It `moves’ the path across the manifold so that the starting point is now at R, and the end point is the new rotation R'. The integration therefore moves us along paths that are constrained to the rotation group.

Evaluating the Exponential Map

For both matrices and quaternions, the exponential map is defined by the series

\exp( X ) = \sum_{n=0}^\infty \frac{X^n}{n!}.

However, when X is an anti-symmetric matrix, the exponential map corresponds to the conversion from axis-angle to rotation matrix (the length of the associated vector is the angle, and its direction is the axis). The analogous statement applies to the quaternionic version: the exponential map acting on imaginary quaternions corresponds to the axis-angle to unit quaternion conversion,

\exp( X ) = \cos\left(\left\lVert X\right\rVert\right) + \frac{X}{\left\lVert X\right\rVert}\sin\left(\left\lVert X\right\rVert\right).
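As a sketch, this exact formula can be implemented directly, guarding against division by zero for vanishing angles:

#include <cmath>

// Exact imaginary-quaternion exponential via the axis-angle formula above.
// Quat and Vec3 are the bare-bones types from the earlier sketches.
Quat QuatExpExact( const Vec3& v )
{
    float n = std::sqrt( v.x * v.x + v.y * v.y + v.z * v.z );  // ||X||
    if ( n < 1e-6f )
        return { 1.0f, 0.0f, 0.0f, 0.0f };  // identity for tiny angles
    float s = std::sin( n ) / n;
    return { std::cos( n ), s * v.x, s * v.y, s * v.z };
}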

The catch is that performing such a conversion involves evaluating trig functions, which are expensive. Instead, we can truncate the series expansion of the exponential map to get an approximation. This is appropriate because the argument is proportional to \Delta t, so for small time-steps the higher powers are negligible.

For an imaginary quaternion, X^2 = -\left\lVert X\right\rVert^2. This means the quaternion exponential map to third-order is
\exp( X ) \approx (1 - \frac 12 \left\lVert X\right\rVert^2) + (1 - \frac 16 \left\lVert X\right\rVert^2) X.

The full exponential map produces unit quaternions from imaginary quaternions, so we should also build a normalization into this approximation,
\exp( X ) \approx \frac{(1 - \frac 12 \left\lVert X\right\rVert^2) + (1 - \frac 16 \left\lVert X\right\rVert^2) X }{ \sqrt{(1 - \frac 12 \left\lVert X\right\rVert^2)^2 + (1 - \frac 16 \left\lVert X\right\rVert^2)^2 \left\lVert X\right\rVert^2}}.

This approximation is good for small rotations per time-step. The error is less than 1% for rotations up to 90 degrees per time-step. For faster rotations, the full exponential map should be applied.
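Here is a sketch of this normalized third-order approximation, again using the bare-bones Quat and Vec3 types; this is the QuatExp assumed by the earlier integration sketch:

#include <cmath>

// Third-order approximation of the imaginary-quaternion exponential with
// built-in normalization, avoiding trig functions entirely.
Quat QuatExp( const Vec3& v )
{
    float n2 = v.x * v.x + v.y * v.y + v.z * v.z;      // ||X||^2
    float a  = 1.0f - n2 / 2.0f;                       // scalar coefficient
    float b  = 1.0f - n2 / 6.0f;                       // vector coefficient
    float k  = 1.0f / std::sqrt( a * a + b * b * n2 ); // normalization
    return { a * k, b * v.x * k, b * v.y * k, b * v.z * k };
}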

Unfortunately, rotation matrices are not so easily (ortho)normalized, so for matrices you are best off implementing the exponential map via the full axis-angle to rotation matrix conversion. For this reason, the quaternion description is preferable to the rotation matrix description.

Conclusion

We have derived a method for time-stepping rotational dynamics that expresses each update as a rotation. The update rotation is determined in vector form (axis-angle) by truncating the Magnus expansion and then converted to the underlying algebraic representation (rotation matrix/unit quaternion) by applying the exponential map.

For the quaternion version, we also approximated the exponential map in a way that avoids evaluating trig functions while preserving normalization, so the resulting quaternion is always a valid rotation.