Algorithms: curve fitting

For any set of $N$ data points there is a unique $N-1$ order polynomial that passes through all of the points. Its construction can be bootstrapped from the linear case that is so well known. Let $(x_1,y_1)$ and $(x_2, y_2)$ be on a line $y=a+bx$ . Then the equations

$y_1=a+bx_1, \qquad y_2=a+bx_2$

have solution

$y=\big({x_1y_2-x_2y_1\over x_1-x_2}\big) +\big({y_1-y_2\over x_1-x_2}\big) x$

which we rewrite as

$y=\big(y_1{x-x_2\over x_1-x_2}\big) -\big(y_2{x-x_1\over x_1-x_2}\big)$

From this we see how to build the general solution, for example for three data points the unique quadratic polynomial that interpolates the points is (Lagrange interpolation formula)

$y=y_1\big({(x-x_2)(x-x_3)\over(x_1-x_2)(x_2-x_3)}\big)-y_2\big({(x-x_1)(x-x_3)\over(x_1-x_2)(x_2-x_3)}\big)$

$+y_3\big({(x-x_2)(x-x_1)\over(x_1-x_2)(x_2-x_3)}\big)$

which is the solution to the linear system

$\left(\begin{array}{ccc} x_1^2 & x_1 & 1\\ x_2^2 & x_2 & 1\\ x_1^3 & x_3 & 1\end{array}\right)\left(\begin{array}{c} a\\ b\\ c\end{array}\right)=\left(\begin{array}{c} y_1\\ y_2\\ y_3\end{array}\right)$

Chebyshev polynomials make a much better choice than $x^k$ functions for expansion on the domain $-1\le x\le 1$ because the matrix that is inverted in the algorithm is far better conditioned than in the monomial case. If we are working with data such that $a\le x_i\le b$ , we can transform it into the domain $-1\le x'_i\le 1$ with the transformation $x'_i={2x_i-a-b\over b-a}$ .
By changing one routine in the program, we could expand the data into a sum of orthogonal polynomials of any type, Chebyshev, Legendre, Hermite, and so forth.

Best fits to curves
Suppose that you have lab data: $(x_1,y_1),(x_2,y_2),\cdots,(x_n,y_n)$ and you need a best fit to a curve $y=f(x)$ . Using least squares we can find the coefficients $a_j$ such that

$f=\sum_j a_j p_j(x)$

where $p_j$ are some well-behaved polynomial functions on the domain of your $x's$ , $x_i\in [a,b]$ . Map this domain into $\xi_j\in [-1,1]$ and use Chebyshev polynomials $p_j=T_j$ , since on $[-1,1]$ these polynomials are bounded and the resulting linear equations are well-conditioned.

We create the cumulative deviation function

$Q=\sum_j(y_j-\sum_k a_k T_k(\xi_j))^2$

and set up the linear equations

${\partial Q\over \partial a_m}=0, \qquad \sum_j y_j T_m(\xi_j)=\sum_{j,k}a_k T_m(\xi_j)T_k(\xi_j)$

$Z_m=\sum_k a_k B_{mk}$

$Z_m=\sum_j y_j T_m(\xi_j), \quad B_{mk}=\sum_jT_m(\xi_j)T_k(\xi_j)$

and invert the array $B$

$a=B^{-1}\cdot Z$

Select $N$ in $f=\sum_k^N a_k T_k(\xi_j)$ . You need to compute all $T_k(\xi_j)$ from $k=0$ to $k=N$ which can be done recursively using $T_0(\xi)=1$ , $T_1(\xi)=\xi$ , $T_{k+1}(\xi)=2\xi T_k(\xi)-T_{k-1}(\xi)$ .
Let’s code it up in julia, which is very easy to translate into python, R, C, whatever…

using LinearAlgebra

"""Chebyshev fit N^th order y=sum_j^N a_jT_j(u+vx) for input
   vectors X, Y. outputs [a_0,...,a_{N-1}],[u,v].
   CFit(X,Y,N,digits)
   Example: X=[1,2,3,4], Y=[1,4,9,16], CFit(X,Y,3,5) outpu is
   [7.375, 7.5, 1.125, 0.0], [-1.6666666666666667, 0.6666666666666666]
   ie y=7.375+7.5*T_1((2*x-5)/3)+1.125*T_2((2*x-5)/3)=x^2 """
CFit=function(X::Array{Float64},y::Array{Float64},N::Int,prec::Int)
n=length(X)
a=X[1]
b=X[n]
u=(a+b)/(b-a)
v=2.0/(b-a)
x=(v .* X) .-u 
M=zeros(n,n)
Z=zeros(N)
B=zeros(N,N)
for j in 1:n
M[j,1]=1.0
M[j,2]=x[j]
for k in 3:N
M[j,k]=2.0*x[j]*M[j,k-1]-M[j,k-2]
end
end

for k in 1:N
Z[k]=sum(y .* M[:,k])
for l in 1:N
B[k,l]= sum(M[:,l] .* M[:,k])
end
end
round.(inv(B)*Z,digits=prec), [-u;v]
end

We output the coefficients and the rule to get $\xi$ from $x$ .
Try it out:

x=[1.0;2;3;4;5]
y=[1.0,4,9,16,25]
CFit(x,y,4,5)
([11.0, 12.0, 2.0, 0.0], [-1.5, 0.5])

so we got

$y=11.0+12.0 T_1((x-3)/2+2T_2((x-3)/2)+0=x^2$

Integration and differentiation of the best fit curve is pretty trivial once this expansion is known.

Using

$\int_a^b T_n({2x-a-b\over b-a})\, dx={(b-a)\over 2}\int_{-1}^1 T_n(\xi)\, d\xi$

$={(b-a)\over 2}\Big({1+(-1)^n\over 1-n^2}\Big)$

we get a very simple formula for the integral of our data’s best fit curve

$\int_a^b f(x)\, dx={(b-a)\over 2}\sum_n a_n\Big({1+(-1)^n\over 1-n^2}\Big)$

Home 2.0

Algorithms: curve fitting

Related Posts

Algorithms: curve fitting

Published by Jeff on September 14, 2021September 14, 2021

Related Posts

Parallel programming: Heat equation with pthreads

Algorithms: Krylov subspaces, Arnoldi iteration and extreme eigenvalues

Algorithms: The “lost art” of interpolation