We talked about root finding (http://abadonna.physics.wisc.edu/2021/08/05/root-finding/) and automatic differentiation (http://abadonna.physics.wisc.edu/2021/09/11/auto-differentiation-make-a-julia-package) previously; now let’s discuss a more powerful optimization/root-finding method that uses automatic differentiation: the gradient descent method.
The idea is that if we have a function $f(\mathbf{x})$ whose minimum we wish to find, and if the function is convex (so that we find the global minimum and not some local one), then if $\mathbf{x}_0$ is a good guess (i.e. $f(\mathbf{x}_0)$ is already close to the minimum value) then heading “downhill” from $\mathbf{x}_0$ by some small amount will take us to a better guess $\mathbf{x}_1$. In other words if $\gamma$ is small enough, then

$$\mathbf{x}_1 = \mathbf{x}_0 - \gamma\,\nabla f(\mathbf{x}_0)$$

is a better guess. Of course this may end up oscillating around an actual minimum if $\gamma$ is not allowed to vary with each iteration. Apply the algorithm twice (with the same step size $\gamma_n$ in both applications)

$$\mathbf{x}_{n+1} = \mathbf{x}_n - \gamma_n\,\nabla f(\mathbf{x}_n), \qquad \mathbf{x}_n = \mathbf{x}_{n-1} - \gamma_n\,\nabla f(\mathbf{x}_{n-1})$$

and subtract, assume that we are close to convergence so that

$$\mathbf{x}_{n+1} - \mathbf{x}_n \approx 0$$

and so is ignorable:

$$0 \approx (\mathbf{x}_n - \mathbf{x}_{n-1}) - \gamma_n\big(\nabla f(\mathbf{x}_n) - \nabla f(\mathbf{x}_{n-1})\big)$$

Dot $\nabla f(\mathbf{x}_n) - \nabla f(\mathbf{x}_{n-1})$ into each side, and take absolute values (we want $\gamma_n > 0$ so that we are always heading downhill):

$$\gamma_n = \frac{\left|(\mathbf{x}_n - \mathbf{x}_{n-1})\cdot\big(\nabla f(\mathbf{x}_n) - \nabla f(\mathbf{x}_{n-1})\big)\right|}{\left|\nabla f(\mathbf{x}_n) - \nabla f(\mathbf{x}_{n-1})\right|^2}$$

This is one of the Barzilai–Borwein step sizes.
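Before turning this into code, here is a quick numerical check of the step-size rule on a one-dimensional quadratic (the names below are just for illustration, not part of the package). For a quadratic, $\nabla f(x_n) - \nabla f(x_{n-1}) = f''\,(x_n - x_{n-1})$, so the rule gives $\gamma = 1/f''$ and the very next step lands exactly on the minimum:

fp = function(x)        # derivative of f(x) = (x - 2)^2
    return 2.0*(x - 2.0)
end
x0 = 0.0; g = 0.1       # initial guess and initial step size
x1 = x0 - g*fp(x0)      # first step: x1 = 0.4
g = abs((x1 - x0)*(fp(x1) - fp(x0)))/(fp(x1) - fp(x0))^2   # gamma = 0.5 = 1/f''
x2 = x1 - g*fp(x1)      # second step: x2 = 2.0, the exact minimum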
This can be used to solve a system of nonlinear equations

$$f_i(\mathbf{x}) = 0, \qquad i = 1, \ldots, N$$

for $\mathbf{x}$ by minimizing the sum of squares

$$F(\mathbf{x}) = \sum_{i=1}^{N} f_i(\mathbf{x})^2,$$

which is zero exactly at a root of the system (a worked example appears at the end of the post).
We can easily add this functionality to the DualNumbers package that I posted way back, by adding a gradient function to it. We modify the DualNumbers package to make the Dn struct mutable, so that we can select which of the $x_i$ will be a dual number, which facilitates building the components of the gradient one at a time. Then we add the gradient helper, the gradient, and the gradient descent:
module DualNum
import Base: *, +, -, /, ^, log, exp, sqrt, sin, cos, tan, sinh, cosh, tanh
mutable struct Dn
    val    # the value
    eps    # the dual (derivative-carrying) part
end
# the rest of the DualNum package is the same
.
.
.
# here is the additional stuff...
""" helper function for gradients"""
Dnj=function(x,j)
y=Dn.(x)
for k in 1:length(x)
if k!=j
y[k].eps=0.0
end
end
return y
end
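# For example (illustration only, using the default printing of the struct):
#   Dnj([1.0 2.0 3.0], 2)  ->  Dn(1.0, 0.0)  Dn(2.0, 1.0)  Dn(3.0, 0.0)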
""" Gradient of a function of several variables"""
gradient=function(f,x)
g=[Deriv(f(Dnj(x,j))) for j in 1:length(x)]
return g
end
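# Sanity check against a gradient computed by hand (illustration only):
# for ftest(x) = x[1]^2 + 3.0*x[2] the gradient is (2*x[1], 3), so
#   gradient(ftest, [1.5 2.0])  ->  [3.0, 3.0]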
""" Gradient descent for f from initial guess x0, with initial step size g,
at most iter iterations, and step-length convergence tolerance prec"""
grad_descent = function(f, x0, g, iter, prec)
    g0 = gradient(f, x0)
    for k in 1:iter
        ggrad = g*g0              # the step gamma_n * grad f(x_n)
        x1 = x0 .- ggrad'         # head downhill
        g1 = gradient(f, x1)
        diffg = g1 .- g0          # change in the gradient
        # adaptive step size derived above
        g = abs(sum((x1 - x0) .* diffg'))/sum(diffg .* diffg)
        x0, g0 = x1, g1
        if sqrt(sum(ggrad .* ggrad)) <= prec   # stop once the step is tiny
            break
        end
    end
    return x0
end
end # module DualNum
Let’s run it on some functions that are nicely convex near their minima. We need an initial value of $\gamma$, which I take to be 0.1, …
func3 = function(x)
    return -1.0*cos(x[1]^2 + x[2]^2)   # minimum of -1 at the origin
end
func4 = function(x)
    return (x[1] - 2.0)^2 + x[2]^2 + 2*x[3]^2   # minimum of 0 at (2, 0, 0)
end
grad_descent(func4, [0.3 0.3 0.5], 0.1, 8, 0.001)
1×3 Matrix{Float64}:
2.0 5.63528e-7 -8.05855e-5
grad_descent(func3, [0.3 1.0], 0.1, 30, 0.001)
1×2 Matrix{Float64}:
0.000768393 0.00256131
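Finally, the nonlinear-system trick from above. Here is a sketch (the system, the name sysF, and the starting point are my own choices, assuming the package’s dual arithmetic handles the products and powers as in the earlier post): solve $x + y = 3$, $xy = 2$ by minimizing the sum of squared residuals. The sum of squares is only locally convex, so a starting guess near the desired root matters.

sysF = function(x)
    f1 = x[1] + x[2] - 3.0   # first equation, f1 = 0
    f2 = x[1]*x[2] - 2.0     # second equation, f2 = 0
    return f1^2 + f2^2       # vanishes exactly at a root of the system
end
grad_descent(sysF, [0.5 2.5], 0.1, 50, 1.0e-6)   # should approach the root [1.0 2.0]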