Algorithms: gradient descent

We previously talked about root finding (http://abadonna.physics.wisc.edu/2021/08/05/root-finding/) and automatic differentiation (http://abadonna.physics.wisc.edu/2021/09/11/auto-differentiation-make-a-julia-package). Let’s now discuss a more powerful optimization/root-finding method that uses automatic differentiation: gradient descent.

The idea is this: suppose we have a function f(\bm{x}) whose root we wish to find, and suppose the function is convex (so that we find the global minimum rather than some local one). If \bm{x}_0 is a good guess (i.e. f(\bm{x}_0)\approx 0), then heading “downhill” from \bm{x}_0 by some small amount will take us to a better guess \bm{x}_1. In other words, if \gamma is small enough, then

    \[\bm{x}_1=\bm{x}_0-\gamma\bm{\nabla}f(\bm{x}_0)\]

is a better guess. Of course, the iteration may end up oscillating around the actual minimum if \gamma is not allowed to vary from one iteration to the next. To derive a suitable variable step size, apply the algorithm twice

    \[\bm{x}_n=\bm{x}_{n-1}-\gamma\bm{\nabla}f(\bm{x}_{n-1})\]

    \[\bm{x}_{n+1}=\bm{x}_{n}-\gamma\bm{\nabla}f(\bm{x}_{n})\]

and subtract, which gives

    \[\bm{x}_{n+1}-\bm{x}_n=(\bm{x}_{n}-\bm{x}_{n-1})-\gamma(\bm{\nabla}f(\bm{x}_{n})-\bm{\nabla}f(\bm{x}_{n-1}))\]

Near convergence

    \[|\bm{x}_{n+1}-\bm{x}_n| <\epsilon\]

so the left-hand side is negligible, leaving

    \[(\bm{x}_{n}-\bm{x}_{n-1})=\gamma(\bm{\nabla}f(\bm{x}_{n})-\bm{\nabla}f(\bm{x}_{n-1}))\]

Now dot \bm{\nabla}f(\bm{x}_{n})-\bm{\nabla}f(\bm{x}_{n-1}) into each side and take absolute values (we want \gamma\ge 0, so that we are always heading downhill); solving for \gamma gives

    \[\gamma=\gamma_n={|(\bm{x}_{n}-\bm{x}_{n-1})\cdot (\bm{\nabla}f(\bm{x}_{n})-\bm{\nabla}f(\bm{x}_{n-1}))|\over |\bm{\nabla}f(\bm{x}_{n})-\bm{\nabla}f(\bm{x}_{n-1})|^2}\]
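As a quick numerical check of this step-size formula (it is essentially the Barzilai–Borwein step), here is a sketch on the simple convex example f(x, y) = x^2 + y^2, with its gradient coded by hand; the function names `gradf`, `xprev`, `xnext` are my own, not part of the package discussed later:

```julia
# Check the variable step size γ_n = |Δx·Δg| / |Δg|^2 on f(x, y) = x^2 + y^2,
# whose gradient is (2x, 2y).
gradf(x) = [2 * x[1], 2 * x[2]]

xprev = [1.0, 2.0]
xnext = xprev - 0.1 * gradf(xprev)     # one fixed step with γ = 0.1
dx = xnext - xprev
dg = gradf(xnext) - gradf(xprev)
gamma_n = abs(sum(dx .* dg)) / sum(dg .* dg)
println(gamma_n)   # 0.5, the exact minimizing step for this quadratic
```

For this quadratic the formula recovers γ = 0.5, and indeed x - 0.5·(2x) = 0 lands on the minimum in a single step.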

This can be used to solve a system of nonlinear equations

    \[f_1(\bm{x})=0, \quad f_2(\bm{x})=0, \cdots, f_n(\bm{x})=0\]

for \bm{x}=(x_1,x_2,\cdots, x_n) by minimizing the non-negative function

    \[F={1\over 2}\sum_{j=1}^n f_j^2(\bm{x})\]
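To make this concrete, here is a minimal sketch of the idea on a hypothetical two-equation system of my own choosing, f_1 = x^2 + y^2 - 4 and f_2 = x - y (root at x = y = \sqrt{2}), descending F with a small fixed step and a hand-coded gradient rather than the dual-number machinery:

```julia
# Solve x^2 + y^2 - 4 = 0, x - y = 0 by descending F = (f1^2 + f2^2)/2.
f1(x) = x[1]^2 + x[2]^2 - 4.0
f2(x) = x[1] - x[2]
# grad F = f1 * grad f1 + f2 * grad f2, written out by hand for this sketch
gradF(x) = f1(x) .* [2 * x[1], 2 * x[2]] .+ f2(x) .* [1.0, -1.0]

function solve_system(x, gamma, iters)
    for n in 1:iters
        x = x - gamma * gradF(x)   # fixed-step descent on F
    end
    return x
end

x = solve_system([1.0, 1.5], 0.05, 2000)
println(x)   # approaches the root (sqrt(2), sqrt(2))
```

Because F is a sum of squares, a root of the system is a global minimum of F with F = 0, so driving F downhill drives every f_j toward zero at once.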

We can easily add this functionality to the DualNumbers package that I posted a while back by giving it a gradient function. We modify the package to make the Dn struct mutable, so that we can select which of the x_j is a dual number when building the components of the gradient. Then we add the gradient helper, the gradient itself, and the gradient descent:

module DualNum
import Base: *, +, -, /, log, exp, sqrt, sin, cos, tan, sinh, cosh, tanh, ^

mutable struct Dn
    val
    eps
end

# the rest of the DualNum package is the same
.
.
.
# here is the additional stuff...

""" Helper for gradients: dualize x so that only component j has eps=1 """
Dnj=function(x,j)
    y=[Dn(x[k],1.0) for k in 1:length(x)]
    for k in 1:length(x)
        if k!=j
            y[k].eps=0.0
        end
    end
    return y
end

""" Gradient of a function of several variables (Deriv extracts the eps part) """
gradient=function(f,x)
    g=[Deriv(f(Dnj(x,j))) for j in 1:length(x)]
    return g
end

""" Gradient descent from x0 with initial step g, at most iter steps, tolerance prec """
grad_descent=function(f,x0,g,iter,prec)
    for k in 1:iter
        ggrad=g*gradient(f,x0)
        x1=x0 .- ggrad'                 # step downhill: x1 = x0 - g*grad f(x0)
        diffg=gradient(f,x1) .- gradient(f,x0)
        g=abs(sum((x1-x0) .* diffg'))/sum(diffg .* diffg)   # variable step size
        x0=x1
        if sqrt(sum(ggrad .* ggrad)) <= prec   # last step was tiny: converged
            break
        end
    end
    return x0
end

end # module


Let’s run it on some nice test functions (func4 is convex; func3 is at least convex near its minimum at the origin). We need an initial value of \gamma, which I take to be 0.1, …

func3=function(x)
    return -1.0*cos(x[1]^2+x[2]^2)
end

func4=function(x)
    return (x[1]-2.0)^2+x[2]^2+2*x[3]^2
end

grad_descent(func4, [0.3 0.3 0.5], 0.1, 8, 0.001)
1×3 Matrix{Float64}:
 2.0  5.63528e-7  -8.05855e-5

grad_descent(func3, [0.3 1.0], 0.1, 30,0.001)
1×2 Matrix{Float64}:
 0.000768393  0.00256131
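As a sanity cross-check of the func4 run, here is a standalone sketch of the same descent with the gradient of func4 coded by hand instead of via dual numbers; the names `gradf`, `descend`, and `res` are mine, and I allow a few extra iterations rather than the 8 used above:

```julia
# Cross-check: minimize (x1-2)^2 + x2^2 + 2*x3^2 with a hand-coded gradient
# and the same variable-step-size update as grad_descent.
gradf(x) = [2 * (x[1] - 2.0), 2 * x[2], 4 * x[3]]

function descend(x0, g, iter, prec)
    for k in 1:iter
        step = g * gradf(x0)
        x1 = x0 - step
        dg = gradf(x1) - gradf(x0)
        g = abs(sum((x1 - x0) .* dg)) / sum(dg .* dg)   # variable step size
        x0 = x1
        if sqrt(sum(step .* step)) <= prec
            break
        end
    end
    return x0
end

res = descend([0.3, 0.3, 0.5], 0.1, 20, 0.001)
println(res)   # converges toward the minimum at [2, 0, 0]
```

This agrees with the dual-number run above, which landed on (2.0, ~0, ~0).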