Algorithms: Nonlinear fits

Recall Newton’s method for finding roots of a function:
Suppose you think x is a root of f(x), i.e. f(x)=0, but f(x) is only close to zero. A better guess may be y, where

    \[f(y)=f(x)+(y-x){\partial f\over \partial x}+\cdots\approx f(x)+(y-x)f'(x)\approx 0\]

Solving for y gives

    \[y=x-{f(x)\over f'(x)}\]

which can be iterated until the estimate is as close to a root as you wish (http://abadonna.physics.wisc.edu/2021/08/05/root-finding/).
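As a concrete illustration (my own toy example, not from the original derivation), here is the Newton iteration in base R for f(x)=x^2-2, whose positive root is \sqrt{2}:

```r
# Newton's method for f(x) = x^2 - 2, root at sqrt(2)
f  <- function(x) x^2 - 2
fp <- function(x) 2*x        # derivative f'(x)

x <- 1                       # initial guess
for (i in 1:8) {
  x <- x - f(x)/fp(x)        # y = x - f(x)/f'(x)
}
x                            # close to 1.414214
```

Convergence is quadratic, so a handful of iterations is plenty from a reasonable starting point.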

The same idea can be used to construct nonlinear least-squares fits to data.
Suppose the data consist of the points (t_1,t_2,\cdots, t_N), (x_1,x_2,\cdots, x_N), and the fit function f(p,t) depends on M parameters p_\alpha. Create the M functions {\partial f(t)\over \partial p_\alpha}, which are useful for expanding

    \[f(p_1+h_1, p_2+h_2,\cdots, p_M+h_M,t)=f(p_1, p_2,\cdots, p_M,t)+\sum_\alpha {\partial f(t)\over \partial p_\alpha} h_\alpha+\cdots\]

    \[=f+\bm{J}\cdot\bm{h}\]

where \bm{h}=(h_1, \cdots, h_M).

The least-squares measure of how far the data x deviates from the fit is

    \[R(p)=\sum_{i=1}^N(x_i-f(p,t_i))^2\]

Then

    \[R(p+h)=\sum_{i=1}^N(x_i-f(p,t_i)-\sum_\alpha {\partial f(t_i)\over \partial p_\alpha} h_\alpha+\cdots)^2\]

or

    \[R(p+h)=\sum_{i=1}^N(x_i-f(p,t_i)-\sum_\alpha J_\alpha(t_i) h_\alpha+\cdots)^2\]

and

    \[{\partial R(p+h)\over \partial h_\beta}=-2\sum_i (x_i-f(p,t_i)-\sum_\alpha J_\alpha(t_i) h_\alpha+\cdots)(\sum_\gamma J_\gamma(t_i){\partial h_\gamma\over \partial h_\beta})\]

    \[=-2\sum_i (x_i-f(p,t_i)-\sum_\alpha J_\alpha(t_i) h_\alpha+\cdots)(\sum_\gamma J_\gamma(t_i)\delta_{\beta,\gamma})\]

    \[=-2\sum_i (x_i-f(p,t_i)-\sum_\alpha J_\alpha(t_i) h_\alpha+\cdots)J_\beta(t_i)\]

Setting this to zero for each \beta gives the h values that minimize R:

    \[\sum_i (x_i-f(p,t_i))J_\beta(t_i)=\sum_\alpha \sum_i J_\alpha(t_i)J_\beta(t_i) h_\alpha\]

Think of
\bullet J_\alpha(t_i) as an M\times N matrix,
\bullet \sum_i J_\alpha(t_i)J_\beta(t_i)=Q_{\alpha,\beta} as the \alpha,\beta component of a M\times M matrix,
\bullet v_i=(x_i-f(p,t_i)) is the i^{th} component of an N-vector,
\bullet h_\alpha is the \alpha^{th} component of an M-vector.

    \[\bm{J}\cdot\bm{v}=\bm{Q}\cdot\bm{h}, \quad \bm{h}=\bm{Q}^{-1}\cdot \bm{J}\cdot\bm{v}\]

gives the increment in the parameters, yielding a better fit than the original guess.
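To make the matrix bookkeeping concrete, here is a toy check in base R (my own illustration; it uses base solve() rather than a matrix-inverse call). For a model that is linear in the parameters, f(p,t)=p_1+p_2 t, the Jacobian does not depend on p, so a single step lands on the exact least-squares solution:

```r
# toy model f(p,t) = p1 + p2*t with exact parameters p = (2, 3)
t <- 1:5
x <- 2 + 3*t                     # "data" with no noise
p <- c(0, 0)                     # initial guess

# Jacobian: column alpha holds df/dp_alpha evaluated at each t_i (N x M)
J <- cbind(rep(1, length(t)), t)
v <- x - (p[1] + p[2]*t)         # residual vector v_i = x_i - f(p, t_i)
Q <- t(J) %*% J                  # Q = J^T J, an M x M matrix
h <- solve(Q, t(J) %*% v)        # increment h = Q^{-1} J^T v
p <- p + as.vector(h)            # one step recovers (2, 3) exactly
```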

The process is pretty simple:
\bullet Make a plot of x versus t and come up with an estimate of the parameters p_\alpha, \alpha=1,2,\cdots,M,
\bullet compute R,
\bullet compute h_\alpha, and update the original guess to p_\alpha\rightarrow p_\alpha+h_\alpha.
\bullet Repeat until R is as small as you want.

These types of algorithms are sensitive to the initial guess, so take care constructing a good one; given a good starting point, the algorithm converges very nicely.

Let’s implement this in R. We will make a fit for x=A+B e^{-Ct} and find the best A,B,C.

install.packages('matlib')   # provides inv(); only needed once
library(matlib)
# First make some noiseless data: a vector of t values and the
# corresponding x values for known A, B, C
t <- seq(1, 20, by = 1)
x <- 88 + 1000*exp(-1.23*t)
# initial guesses for the parameters
A <- 96
B <- 1100
C <- 1.5

R0 <- sum((x - (A + B*exp(-C*t)))^2)   # initial residual
R0

# Now we code the algorithm and run 10 iterations,
# hand-coding df/dA = 1, df/dB = exp(-C*t), df/dC = -B*t*exp(-C*t)
for (j in 1:10){
  J  <- matrix(c(rep(1, length(t)), exp(-C*t), -B*t*exp(-C*t)), nrow = length(t))
  Jt <- t(J)
  Q  <- Jt %*% J
  Qi <- inv(Q)
  dp <- (Qi %*% Jt) %*% (x - A - B*exp(-C*t))
  A  <- A + dp[1,1]
  B  <- B + dp[2,1]
  C  <- C + dp[3,1]
  R  <- sum((x - (A + B*exp(-C*t)))^2)
}
R   # final residual

Run the code. Before the run A=96, B=1100, C=1.5; after the run A=88, B=1000, C=1.23, in exact agreement with the parameters used to generate the data.
Below is an R function that fits data to a Lorentzian peak (such as a real spectral line). It is part of my “Physics307” data analysis package.
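As a cross-check (my addition, not part of the original), base R's built-in nls() performs the same kind of fit. A little noise is added to the data here because the nls() documentation warns against fitting artificial zero-residual data:

```r
set.seed(1)
t <- seq(1, 20, by = 1)
x <- 88 + 1000*exp(-1.23*t) + rnorm(length(t), sd = 0.5)

# same model and starting guess as in the hand-rolled loop above
fit <- nls(x ~ A + B*exp(-C*t), start = list(A = 96, B = 1100, C = 1.5))
coef(fit)   # close to A = 88, B = 1000, C = 1.23
```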

lorentzian <- deriv(~ w^2/(w^2+(x-b)^2), namevec = c("b", "w"), function.arg = c("x", "b", "w"))
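The deriv() call above builds the Jacobian automatically: it returns a function whose value carries the analytic gradient as a "gradient" attribute, which is what LorentzFit reads off instead of hand-coding the partials. A minimal illustration (my own example):

```r
# deriv() returns a function whose result holds a "gradient" attribute
g   <- deriv(~ a*x^2, namevec = "a", function.arg = c("x", "a"))
out <- g(x = 2, a = 3)
as.numeric(out)            # value: 3 * 2^2 = 12
attr(out, "gradient")      # d/da (a*x^2) = x^2 = 4
```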

LorentzFit <- function(ch, cntdat, b0, w0, iter){
#' Simple fit to a Lorentzian distribution y ~ A w^2/(w^2+(x-b)^2).
#' @param ch A vector of bin/channel labels.
#' @param cntdat A vector of bin occupancies/MCA channel counts.
#' @param b0 A first guess of the peak channel location.
#' @param w0 A first guess of the peak width in channels.
#' @param iter Number of iterations.
#' @return A vector c(R0, R, b, w, A): initial and final residuals, fit parameters, amplitude.
#' @examples
#' fit <- LorentzFit(x, y, 300, 60, 5)

  cnt  <- lorentzian(ch, b0, w0)
  grad <- attr(cnt, "gradient")
  val  <- cnt[1:length(ch)]
  Y    <- cntdat/max(cntdat)      # normalize so the amplitude A drops out of the fit
  R0   <- sum((Y - val)^2)
  for (i in 1:iter){
    J      <- t(grad)
    Q      <- t(grad) %*% grad
    Qi     <- inv(Q)              # inv() from matlib
    params <- Qi %*% J %*% (Y - val)
    b0 <- b0 + params[1]; w0 <- w0 + params[2]
    cnt  <- lorentzian(ch, b0, w0)
    grad <- attr(cnt, "gradient")
    val  <- cnt[1:length(ch)]
    R    <- sum((Y - val)^2)
  }
  c(R0, R, b0, w0, max(cntdat))
}