Friday, March 6, 2015

Approximation by spline functions and parametric spline curves with SciPy

The flexibility of splines provides best fitting results in most cases when the underlying math of the data is unknown. You can become convinced of this trying to find best approximation of your data using CurveExpert. By selecting Tools->CurveFinder… you can search for best data fit. After that call spline interpolation by selecting Calculate->Polynomial Spline… and entering spline’s degree. In most cases spline interpolation is the best fit.


However, there are many other ways of curve fitting by splines. If there is no restriction that curve must pass exactly through data points, we can use spline approximation methods to reduce amount of piecewise polynomials that represent spline curve and eliminate end effects like that one at the picture above. Moreover, this will produce smoother curve than interpolated one.

SciPy's fitpack2.py module provides a pretty useful set of classes representing univariate and bivariate splines. It is based on a collection of Fortran routines DIERCKX and provides object-oriented wrapping to this low-level functionality.

Further I will speak about univariate splines, but I'm pretty sure that similar operations can be scaled to bivariate splines.

To construct univariate spline object you can  instantiate one of the three classes
  • UnivariateSpline,
  • InterpolatedUnivariateSpline,
  • LSQUnivariateSpline,
passing them datapoints, spline degree, knot vector (for LSQUnivariateSpline) and optional additional parameters like weights vector, smoothing factor and boundary of the approximation interval.

Being a superclass for others, UnivariateSpline has the ability to be constructed in alternative way by passing it tuple of knot vector t, B-spline coefficients c and spline degree k. This is realized by defining ._from_tck classmethod. I find it pretty useful for creation of new spline instances from others, or for construction of parametric spline curves.

Here we need to understand the difference between spline functions (which we get instantiating any of the mentioned fitpack2 classes) and parametric spline curves.

Spline functions \(y=f(x)\) maps a real number \(x\) to a real number \(y\) and its knot vector is defined in the same space as \(x\) (in other words, represents position of knots along x-axis). This can lead to some problems, like representing vertical lines.

Spline curves \(f(t) = (x(t),y(t))\) are represented by two distinct spline functions that use same knot vector in its own parameter space \(t\). So we can say that parametric spline curve is a collection of two spline functions. That gives us the ability to represent much more complex shapes.

Another important property of parametric spline curves is the ability to define it through its control polygon, or in other words, control points.

Many modern CAD systems, as well as 3D an 2D computer graphics software has in their tools parametric spline curves and surfaces for representing complex shapes. Edition of the spline curves is provided by moving control points. This gives the designer control over spline shape in a natural way as the curve is attracted to its control points.

The same approach can be used to manually approximate any set of data points by parametric spline curves. And the purpose of this topic is to show you that this approach can give the best fitting result with minimum knot vector length and ultimate smoothness.

First things first, so let’s take a look at the construction process of parametric spline curves with SciPy:

import numpy, matplotlib and scipy’s interpolate module:
import numpy as np
import matplotlib.pyplot as plt
import scipy.interpolate as si
define some control points and set some variables:
x = [6, -2, 4, 6, 8, 14, 6]
y = [-3, 2, 5, 0, 5, 2, -3]

xmin = min(x)
xmax = max(x)
ymin = min(y)
ymax = max(y)

n=len(x)
plotpoints = 100
next, set spline degree and find knot vector:
k=3
knotspace = range(n)
knots = si.InterpolatedUnivariateSpline(knotspace,knotspace,k=k).get_knots()
knots_full = np.concatenate(([knots[0]]*k, knots, [knots[-1]]*k ))
Actually we could define knot vector manually, but to get smooth spline and avoid some difficulties     we’ve just constructed the instance of InterpolatedUnivariateSpline and got its knots.

Now we need to construct tuples of knots, coefficients, and spline degree:
tckX = knots_full , x, k
tckY = knots_full , y, k
Here we pass coordinates of control points as spline coefficients. This provides independence of control points on knot vector for parametric spline curves and assure its free-form property.

Finally, construct spline functions from this tuples using ._from_tck classmethod:
splineX = si.UnivariateSpline._from_tck(tckX)
splineY = si.UnivariateSpline._from_tck(tckY)
Now we can evaluate our parametric spline curve for the set of parameter values tP:
tP = np.linspace(knotspace[0], knotspace[-1], plotpoints)
xP = splineX(tP)
yP = splineY(tP)
So let’s take a look at the spline functions and parametric spline curve they represent:


Black triangles along parameter axes represent knots. Green polygons are actually the control polygons of the parametric spline curve and the spline functions.

While control polygon of parametric spline curve is originally defined by \(x\) and \(y\) and does not depend on the knot values, control polygons of spline functions do depend on knot values. To be more specific, coordinates of the spline functions’ control points are points \( (cp_i, x_i) \) and \( (cp_i, y_i) \) where

\[{cp_i} = {t_{i+1}+…+t_{i+k}\over k}\]
are coordinates of the control points along parameter axis \(t\), \(t_i\) are knots and \(k\) is a degree of the spline function.

So, to implement this formula we can define a simple function:
def getControlPoints(knots, k):
 n = len(knots)-1-k
 cx = np.zeros(n)
 for i in range(n):
  tsum = 0
  for j in range(1,k+1):
   tsum+=knots[i+j]
  cx[i] = float(tsum)/k
 return cx

cp = getControlPoints(knots_full, k)
Changing of spline function’s control points in order to change the spline’s shape is not an option. Assume we decided to edit control points of some spline function. Moving points along vertical axis is not a problem, as knot vector is independent on spline coefficients. But if we try to move points along the parameter axis we face the need to find new values of knots. To do that we hopefully would like to solve a linear equation system:

\[ \left( \begin{array}{cccc}
{t_{1}+…+t_{1+k}} \\
{t_{2}+…+t_{2+k}} \\
... \\
{t_{i+1}+…+t_{i+k}} \end{array} \right)
=
k \cdot {\left( \begin{array}{cccc}
cp_0 \\
cp_1 \\
... \\
cp_i \end{array} \right) }
\]
complemented with some boundary conditions. We can even get the solution, but there is no guarantee that knots will be in ascending order. Moreover, if we add boundary conditions, based on the knowledge that first and last \(k+1\) knot values in knot vector are equal we get overdetermined system.

That’s why we would like to use easy-editable parametric spline curves since their control points are independent on knot vector values.

Now let’s compare this approach with some approximation approaches provided by scipy’s fitpack2 module in approximation of some real engineering data.

Assume we have some data and we need to find the best smooth spline approximation with minimal length of knot vector.


Let's do this for second from the bottom curve. First of all we need to extract data points from the scanned image. I used Gsys application for this purpose but there are many alternatives.

So, now we have vectors of \(x\) and \(y\) representing data points:


Before we start, it’s better to define the vector of weights \(w\) in order to push our further approximations to pass close to the end points of the original data:
wend = 3
w = [wend] + [1]*(num_points-2) + [wend]
First let’s find least squares spline approximation.


As LSQUnivariateSpline class requires knot vector on the call, we need to construct it somehow.

try with uniform knot vector:
knot_offset=(xmax-xmin)/(nknot+1)
knots = np.linspace( knot_offset, xmax-knot_offset, nknot )
instantiate LSQUnivariateSpline class using knot vector and weight vector:
lsqspline = si.LSQUnivariateSpline( x, y, knots, k=k, w = w)
get full-length spline knot vector, spline coefficients and coordinates of control points along x-axis:
knots = lsqspline.get_knots()
knots_full = np.concatenate(([knots[0]]*k, knots, [knots[-1]]*k ))
coeffs_y = lsqspline.get_coeffs()
coeffs_x = getControlPoints(knots_full, k)
evaluate spline points and plot the result:
xP = np.linspace( x[0], x[-1], nsample )
yP = lsqspline (xP)



Well, as you can see this is definitely not the approximation we are looking for. And do believe me, varying the length of knot-vector will not make it better.

So, next step is to try with non-uniform knot vector. You can achieve this either by inventing some efficient algorithm to construct best-fitting non-uniform knot vector (which is, actually, a non-trivial task) or by selecting values of interior points manually. I’ve chosen the second way and it took me about 30 minutes to get a good result:
knots = [1.2, 1.85, 2.0, 2.8, 3.5]



As you can see, this is a different story and I was pretty happy to find such good fitting!

If we evaluate this spline representation to a wider boundaries, we can see that its extrapolating behavior, frankly speaking, is not perfect:


So, if we seek not only for interpolation but extrapolation as well, we need to play with knot vector values for some more time.

Actually, we can avoid the necessity of defining knot vector by using UnivariateSpline class and passing it the smoothing factor s on the call.
try with \(s=0.1\):
spline = si.UnivariateSpline( x, y, k=k , s=0.1, w=w)
knots = spline.get_knots()
knots_full = np.concatenate(([knots[0]]*k, knots, [knots[-1]]*k ))
coeffs_x = getControlPoints(knots_full, k)
coeffs_y = spline.get_coeffs()

xP = np.linspace( x[0], x[-1], nsample )
yP = spline (xP)



knots: [ 0.1062 6.026 ]

Not even close…

try with \(s=0.005\):


knots: [ 0.1062 1.704 2.086 2.772 6.026 ]

Better, closer, warmer!

Further decreasing of smoothing factor will lead to better results. But at the same time, knot vector length increases. If we put \(s=0\) we get polynomial spline interpolation like we saw before:


knots: [ 0.1062 0.4779 0.7301 0.9292 1.088 1.227 1.336 1.473 1.581
1.704 1.805 1.877 1.935 2.007 2.086 2.216 2.389 2.534
2.656 2.772 2.887 3.024 3.161 3.284 3.436 3.602 3.811
3.977 4.143 4.323 4.525 4.72 4.951 5.167 5.355 5.507
5.687 6.026 ]

As you can see, it is not smooth, has data-long knot vector and behave bad on extrapolation.

So what using of parametric spline curves can give us?

Let's take as the basic one the smooth spline function approximation with s=0.005. First we will parameterize it and then we edit its control points to find the best fitting.

To emphasize the independence of spline coefficients on the knot values I suggest you to norm knot vector of the original spline function to some value. I've chosen the number of data points, but it can be any other number. Some usually simply take 1.
num_points=len(x)
num_knots = len(knots_full)
ka = (knots_full[-1] - knots_full[0])/(num_points)
knotsp = np.zeros(num_knots)
for i in range(num_knots):
 knotsp[i] = num_points-((knots_full[-1] - knots_full[i]))/ka
Knot vector knotsp is defined in new space, but relative position between knots stays the same to preserve spline's shape.

Now we need to repeat the same procedure that we started with to get parametric spline:
tckX = knotsp , coeffs_x, k
tckY = knotsp , coeffs_y, k
splineX = si.UnivariateSpline._from_tck(tckX)
splineY = si.UnivariateSpline._from_tck(tckY)
define control points' coordinates along parameter axis t:
coeffs_p = getControlPoints(knotsp, k)
and evaluate points of the curve:
tP = np.linspace(-3,  num_points+3, 100)
xP = splineX(tP)
yP = splineY(tP)


As we can see, spline function \(x(t)\) is a straight line what is obvious as coefficients coeffs_p are just a scaled version of the original coeffs_x.

If I had this spline curve being represented in the window of some design software I just had to drag control points to desired place until the curve will fit initial data points. To do this with matplotlib we need to use some additional framework for 2D graphics. But this will go straight out of the topic. Without that we just can manually iteratively select some new coordinates for control points. It’s pretty easy and took me about 10 minutes to get best result. Of course it’s not what you would like to do every time you need to make an approximation but it’s good for this example.

So, we have redefined control points, which are, actually, coefficients of the spline functions:
coeffs_x = [ x[0], 1.1,   1.7,  2.16,  3.1,   3.0,   x[-1]]
coeffs_y = [ y[0], 0.11,  0.17,  0.32,  0.27,  0.19,  y[-1]]
I took initial values for boundary points to be sure that our spline curve interpolates them.

Now just rebuild spline curve with new coefficients:


knots: [0. 10.79631069 13.37747897 18.0127707 40.]

What I see, is the best fitting result of all previous approximation. The curve is rather smooth, knot vector is short, and, what is amazing, we received good extrapolation result even without an intention. That's because spline is tangent to control polygon at the end points.

However there is one significant challenge we face using parametric spline curve as the approximation of some data. Since \(x\) values depends on \(t\) as well as \(y\) values, there is no obvious connection between \(x\) and \(y\) values. In case we want to get \(y_i\) that corresponds to some \(x_i\) we need to dig deep into underlying B-spline functions and get proper piecewise polynomial representation of the \(x(t)\) spline function. Then we must solve k-degree equation against \(t\) for given \(x\) and choose proper root. This problem can be complicated even more if spline curve cross itself, producing loops.

Conclusions



SciPy provides ultimate tools for approximation by spline functions that in right hands can be extremely helpful. Moreover, using this functionality we can construct parametric spline curves and enable editing the spline's shape simply by moving its control points. This can seem less scientific comparing to selecting proper knot vectors or playing with smoothing factor, but can give better results and provides intuitive and easy way of data fitting. Despite that, there are some challenges to overcome, like receiving dependency \(y=f(x)\) for parametric spline curves to use them in every-day approximations.


Parametric Spline Curves Example 1

Parametric Spline Curves Example 2

Curve Data