Algorithms: curve fitting

For any set of N data points there is a unique N-1 order polynomial that passes through all of the points. Its construction can be bootstrapped from the linear case that is so well known. Let (x_1,y_1) and (x_2, y_2) be on a line y=a+bx. Then the equations

    \[y_1=a+bx_1, \qquad y_2=a+bx_2\]

have solution

    \[y=\big({x_1y_2-x_2y_1\over x_1-x_2}\big) +\big({y_1-y_2\over x_1-x_2}\big) x\]

which we rewrite as

    \[y=\big(y_1{x-x_2\over x_1-x_2}\big) -\big(y_2{x-x_1\over x_1-x_2}\big)\]

From this we see how to build the general solution, for example for three data points the unique quadratic polynomial that interpolates the points is (Lagrange interpolation formula)

    \[y=y_1\big({(x-x_2)(x-x_3)\over(x_1-x_2)(x_2-x_3)}\big)-y_2\big({(x-x_1)(x-x_3)\over(x_1-x_2)(x_2-x_3)}\big)\]

    \[+y_3\big({(x-x_2)(x-x_1)\over(x_1-x_2)(x_2-x_3)}\big)\]

which is the solution to the linear system

    \[\left(\begin{array}{ccc} x_1^2 & x_1 & 1\\ x_2^2 & x_2 & 1\\ x_1^3 & x_3 & 1\end{array}\right)\left(\begin{array}{c} a\\ b\\ c\end{array}\right)=\left(\begin{array}{c} y_1\\ y_2\\ y_3\end{array}\right)\]

Chebyshev polynomials make a much better choice than x^k functions for expansion on the domain -1\le x\le 1 because the matrix that is inverted in the algorithm is far better conditioned than in the monomial case. If we are working with data such that a\le x_i\le b, we can transform it into the domain -1\le x'_i\le 1 with the transformation x'_i={2x_i-a-b\over b-a}.
By changing one routine in the program, we could expand the data into a sum of orthogonal polynomials of any type, Chebyshev, Legendre, Hermite, and so forth.

Best fits to curves
Suppose that you have lab data: (x_1,y_1),(x_2,y_2),\cdots,(x_n,y_n) and you need a best fit to a curve y=f(x). Using least squares we can find the coefficients a_j such that

    \[f=\sum_j a_j p_j(x)\]

where p_j are some well-behaved polynomial functions on the domain of your x's, x_i\in [a,b]. Map this domain into \xi_j\in [-1,1] and use Chebyshev polynomials p_j=T_j, since on [-1,1] these polynomials are bounded and the resulting linear equations are well-conditioned.

We create the cumulative deviation function

    \[Q=\sum_j(y_j-\sum_k a_k T_k(\xi_j))^2\]

and set up the linear equations

    \[{\partial Q\over \partial a_m}=0, \qquad \sum_j y_j T_m(\xi_j)=\sum_{j,k}a_k T_m(\xi_j)T_k(\xi_j)\]

or

    \[Z_m=\sum_k a_k B_{mk}\]

    \[Z_m=\sum_j y_j T_m(\xi_j), \quad B_{mk}=\sum_jT_m(\xi_j)T_k(\xi_j)\]

and invert the array B

    \[a=B^{-1}\cdot Z\]

Select N in f=\sum_k^N a_k T_k(\xi_j). You need to compute all T_k(\xi_j) from k=0 to k=N which can be done recursively using T_0(\xi)=1, T_1(\xi)=\xi, T_{k+1}(\xi)=2\xi T_k(\xi)-T_{k-1}(\xi).
Let’s code it up in julia, which is very easy to translate into python, R, C, whatever…

using LinearAlgebra

"""Chebyshev fit N^th order y=sum_j^N a_jT_j(u+vx) for input
   vectors X, Y. outputs [a_0,...,a_{N-1}],[u,v].
   CFit(X,Y,N,digits)
   Example: X=[1,2,3,4], Y=[1,4,9,16], CFit(X,Y,3,5) outpu is
   [7.375, 7.5, 1.125, 0.0], [-1.6666666666666667, 0.6666666666666666]
   ie y=7.375+7.5*T_1((2*x-5)/3)+1.125*T_2((2*x-5)/3)=x^2 """
CFit=function(X::Array{Float64},y::Array{Float64},N::Int,prec::Int)
n=length(X)
a=X[1]
b=X[n]
u=(a+b)/(b-a)
v=2.0/(b-a)
x=(v .* X) .-u 
M=zeros(n,n)
Z=zeros(N)
B=zeros(N,N)
for j in 1:n
M[j,1]=1.0
M[j,2]=x[j]
for k in 3:N
M[j,k]=2.0*x[j]*M[j,k-1]-M[j,k-2]
end
end

for k in 1:N
Z[k]=sum(y .* M[:,k])
for l in 1:N
B[k,l]= sum(M[:,l] .* M[:,k])
end
end
round.(inv(B)*Z,digits=prec), [-u;v]
end

We output the coefficients and the rule to get \xi from x.
Try it out:

x=[1.0;2;3;4;5]
y=[1.0,4,9,16,25]
CFit(x,y,4,5)
([11.0, 12.0, 2.0, 0.0], [-1.5, 0.5])

so we got

    \[y=11.0+12.0 T_1((x-3)/2+2T_2((x-3)/2)+0=x^2\]

Integration and differentiation of the best fit curve is pretty trivial once this expansion is known.

Using

    \[\int_a^b T_n({2x-a-b\over b-a})\, dx={(b-a)\over 2}\int_{-1}^1 T_n(\xi)\, d\xi\]

    \[={(b-a)\over 2}\Big({1+(-1)^n\over 1-n^2}\Big)\]

we get a very simple formula for the integral of our data’s best fit curve

    \[\int_a^b f(x)\, dx={(b-a)\over 2}\sum_n a_n\Big({1+(-1)^n\over 1-n^2}\Big)\]

Home 2.0
error: Content is protected !!