Introduction
It's been a while since I've done one of these, but life has been pretty busy Also, this isn't the most efficient algorithm, there are several places where it could use improvement.
Anyways, here's something that I've been working on. It's not specific to the Java programming language, but it may be useful to you if you want to perform spline interpolation over a 1D data set.
Difficulty: Medium-hard. While this tutorial doesn't involve any difficult Java concepts (just the basic Java syntax is enough), there is significant math content, in particular systems of equations and linear algebra. It's not necessary that you completely understand the algorithm in order to know how to use it, though.
What is Spline Interpolation?
Spline interpolation is the action of taking a set of points and fitting multiple piece-wise continuous functions (known as splines) to these points. The most common functions to fit are polynomials. While in general if you want higher precision you'll need higher order polynomials, I have found that you can get the best accuracy using cubic functions (due to floating point errors, higher order polynomials start getting worse performance). Cubic functions are the minimum you use to get a smooth curve over all your data. Anything else will only be piecewise smooth.
Note that spline interpolation is not the same thing as regression curve fitting! For a given set of data, you are guaranteed to cross all the data points. Also, spline interpolation can only be used on functions, so you can't have multiple y values for a given x value.
If you're really looking for speed, simple linear functions will provide good speed with decent accuracy.
Math Section
The basis of spline interpolation is solve a system of linear equations. The number of equations we need (and thus the number of points needed) is dictated by the number of coefficients we need to solve for.
For example, take the cubic function:
a x^3 + b x^2 + c x + d = y
This has 4 coefficients a, b, c, and d, thus would require 4 points.
So, first step is to write down our system of equations with all the points plugged in.
a x1^3 + b x1^2 + c x1 + d = y1
a x2^3 + b x2^2 + c x2 + d = y2
a x3^3 + b x3^2 + c x3 + d = y3
a x4^3 + b x4^2 + c x4 + d = y4
This can easily be solved using any linear-algebra solver. Because I didn't want to solve this out by hand (and didn't have access to a CAS), I chose to implement a simple Gaussian elimination algorithm with partial pivoting.
// the gaussian elimnation algorithm // I won't explain this here because it's not in the scope of this tip private static double[] lin_solve(double[][] matrix) { double[] results = new double[matrix.length]; int[] order = new int[matrix.length]; for (int i = 0; i < order.length; ++i) { order[i] = i; } for (int i = 0; i < matrix.length; ++i) { // partial pivot int maxIndex = i; for (int j = i + 1; j < matrix.length; ++j) { if (Math.abs(matrix[maxIndex][i]) < Math.abs(matrix[j][i])) { maxIndex = j; } } if (maxIndex != i) { // swap order { int temp = order[i]; order[i] = order[maxIndex]; order[maxIndex] = temp; } // swap matrix for (int j = 0; j < matrix[0].length; ++j) { double temp = matrix[i][j]; matrix[i][j] = matrix[maxIndex][j]; matrix[maxIndex][j] = temp; } } if (Math.abs(matrix[i][i]) < 1e-15) { throw new RuntimeException("Singularity detected"); } for (int j = i + 1; j < matrix.length; ++j) { double factor = matrix[j][i] / matrix[i][i]; for (int k = i; k < matrix[0].length; ++k) { matrix[j][k] -= matrix[i][k] * factor; } } } for (int i = matrix.length - 1; i >= 0; --i) { // back substitute results[i] = matrix[i][matrix.length]; for (int j = i + 1; j < matrix.length; ++j) { results[i] -= results[j] * matrix[i][j]; } results[i] /= matrix[i][i]; } double[] correctResults = new double[results.length]; for (int i = 0; i < order.length; ++i) { // switch the order around back to the original order correctResults[order[i]] = results[i]; } return results; }
once we have the coefficients, we can easily interpolate between [x1,x4] using our cubic function. However, how do we pick x1,x2,x3, and x4?
The simplest method is to split our data into sets which each contain the minimum number of points needed to solve the linear system of equations.
Take the example data set:
x data
[1,2,3,4,5,6,7,8,9,10]
If we're using a cubic function, we would split the data as such:
[1,2,3,4], [5,6,7,8], [9,10]
What do we do with the last two points? there's no possible way we could create an exact cubic function for these last two points! There are two easy solutions to this problem: one is to throw these last two data points away, and the second is to make up 2 more "fake" data points. Because spline interpolation will always intersect the data points specified, we can add extra points outside the domain (similar to adding a "dummy portion" to make a function periodic for use with Fourier Transforms). While the second method is somewhat better, I chose to implement the first method (partially because I wanted to perform some extrapolation, too), though both methods are roughly the same difficulty.
The code
Here's the code for the spline interpolation. This method will take in x,y data lists, the x value you want to interpolate, and the order of the polynomial you want to use. In general order 3 polynomials (cubic functions) give you the best trade-off between mathematical accuracy and floating point truncation errors.
public static double poly_interpolate(double[] dataX, double[] dataY, double x, int power) { int xIndex = 0; while (xIndex < dataX.length - (1 + power + (dataX.length - 1) % power) && dataX[xIndex + power] < x) { xIndex += power; } double matrix[][] = new double[power + 1][power + 2]; for (int i = 0; i < power + 1; ++i) { for (int j = 0; j < power; ++j) { matrix[i][j] = Math.pow(dataX[xIndex + i], (power - j)); } matrix[i][power] = 1; matrix[i][power + 1] = dataY[xIndex + i]; } double[] coefficients = lin_solve(matrix); double answer = 0; for (int i = 0; i < coefficients.length; ++i) { answer += coefficients[i] * Math.pow(x, (power - i)); } return answer; }
Conclusion
Hopefully this may be useful to someone who wants to perform interpolation on a data set. Spline interpolation is also not limited to 2-d functions. They can be expanded to 3D data sets, though the computation time will significantly increase.
For practical applications, it's better to "solve out" the values of the coefficients using a CAS, then plugging in the actual values later. This would eliminate the need for the linear system solver, which won't always succeed, and also has significant truncation errors associated with it.
Also, the look-up table algorithm provided here (searching linearly through an array) isn't the most efficient method. There are other methods (such as using a tree) which will provide significant performance boosts over O(n).
Happy coding