Algorithms: gradient descent

We previously talked about root finding (http://abadonna.physics.wisc.edu/2021/08/05/root-finding/) and automatic differentiation (http://abadonna.physics.wisc.edu/2021/09/11/auto-differentiation-make-a-julia-package). Let’s now discuss a more powerful optimization/root-finding method that uses automatic differentiation: gradient descent.

The idea is this: suppose we have a function f(\bm{x}) whose root we wish to find, and suppose the function is convex (so that we find the global minimum rather than some local one). If \bm{x}_0 is a good guess (i.e. f(\bm{x}_0)\approx 0), then heading “downhill” from \bm{x}_0 by some small amount will take us to a better guess \bm{x}_1. In other words, if \gamma is small enough, then

    \[\bm{x}_1=\bm{x}_0-\gamma\bm{\nabla}f(\bm{x}_0)\]

is a better guess. Of course, the iteration may end up oscillating around the actual minimum if \gamma is not allowed to vary from one iteration to the next. To derive a suitable variable step size, apply the algorithm twice

    \[\bm{x}_n=\bm{x}_{n-1}-\gamma\bm{\nabla}f(\bm{x}_{n-1})\]

    \[\bm{x}_{n+1}=\bm{x}_{n}-\gamma\bm{\nabla}f(\bm{x}_{n})\]

and subtract, which gives

    \[\bm{x}_{n+1}-\bm{x}_n=(\bm{x}_{n}-\bm{x}_{n-1})-\gamma(\bm{\nabla}f(\bm{x}_{n})-\bm{\nabla}f(\bm{x}_{n-1}))\]

Near convergence

    \[|\bm{x}_{n+1}-\bm{x}_n| <\epsilon\]

so the left-hand side is negligible, leaving

    \[(\bm{x}_{n}-\bm{x}_{n-1})=\gamma(\bm{\nabla}f(\bm{x}_{n})-\bm{\nabla}f(\bm{x}_{n-1}))\]

Now dot \bm{\nabla}f(\bm{x}_{n})-\bm{\nabla}f(\bm{x}_{n-1}) into each side and take absolute values (we want \gamma\ge 0, so that we are always heading downhill); solving for \gamma gives

    \[\gamma=\gamma_n={|(\bm{x}_{n}-\bm{x}_{n-1})\cdot (\bm{\nabla}f(\bm{x}_{n})-\bm{\nabla}f(\bm{x}_{n-1}))|\over |\bm{\nabla}f(\bm{x}_{n})-\bm{\nabla}f(\bm{x}_{n-1})|^2}\]
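As a quick numerical check of this step-size formula (it is essentially the Barzilai–Borwein step), here is a sketch on the simple convex example f(x, y) = x^2 + y^2, with its gradient coded by hand; the function names `gradf`, `xprev`, `xnext` are my own, not part of the package discussed later:

```julia
# Check the variable step size γ_n = |Δx·Δg| / |Δg|^2 on f(x, y) = x^2 + y^2,
# whose gradient is (2x, 2y).
gradf(x) = [2 * x[1], 2 * x[2]]

xprev = [1.0, 2.0]
xnext = xprev - 0.1 * gradf(xprev)     # one fixed step with γ = 0.1
dx = xnext - xprev
dg = gradf(xnext) - gradf(xprev)
gamma_n = abs(sum(dx .* dg)) / sum(dg .* dg)
println(gamma_n)   # 0.5, the exact minimizing step for this quadratic
```

For this quadratic the formula recovers γ = 0.5, and indeed x - 0.5·(2x) = 0 lands on the minimum in a single step.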

This can be used to solve a system of nonlinear equations

    \[f_1(\bm{x})=0, \quad f_2(\bm{x})=0, \cdots, f_n(\bm{x})=0\]

for \bm{x}=(x_1,x_2,\cdots, x_n) by minimizing the non-negative function

    \[F={1\over 2}\sum_{j=1}^n f_j^2(\bm{x})\]
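To make this concrete, here is a minimal sketch of the idea on a hypothetical two-equation system of my own choosing, f_1 = x^2 + y^2 - 4 and f_2 = x - y (root at x = y = \sqrt{2}), descending F with a small fixed step and a hand-coded gradient rather than the dual-number machinery:

```julia
# Solve x^2 + y^2 - 4 = 0, x - y = 0 by descending F = (f1^2 + f2^2)/2.
f1(x) = x[1]^2 + x[2]^2 - 4.0
f2(x) = x[1] - x[2]
# grad F = f1 * grad f1 + f2 * grad f2, written out by hand for this sketch
gradF(x) = f1(x) .* [2 * x[1], 2 * x[2]] .+ f2(x) .* [1.0, -1.0]

function solve_system(x, gamma, iters)
    for n in 1:iters
        x = x - gamma * gradF(x)   # fixed-step descent on F
    end
    return x
end

x = solve_system([1.0, 1.5], 0.05, 2000)
println(x)   # approaches the root (sqrt(2), sqrt(2))
```

Because F is a sum of squares, a root of the system is a global minimum of F with F = 0, so driving F downhill drives every f_j toward zero at once.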

We can easily add this functionality to the DualNumbers package that I posted a while back by giving it a gradient function. We modify the package to make the Dn struct mutable, so that we can select which of the x_j is a dual number when building the components of the gradient. Then we add the gradient helper, the gradient itself, and the gradient descent:

module DualNum
import Base: *, +, -, /, log, exp, sqrt, sin, cos, tan, sinh, cosh, tanh, ^

mutable struct Dn
    val
    eps
end

# the rest of the DualNum package is the same
.
.
.
# here is the additional stuff...

""" Helper for gradients: dualize x so that only component j has eps=1 """
Dnj=function(x,j)
    y=[Dn(x[k],1.0) for k in 1:length(x)]
    for k in 1:length(x)
        if k!=j
            y[k].eps=0.0
        end
    end
    return y
end

""" Gradient of a function of several variables (Deriv extracts the eps part) """
gradient=function(f,x)
    g=[Deriv(f(Dnj(x,j))) for j in 1:length(x)]
    return g
end

""" Gradient descent from x0 with initial step g, at most iter steps, tolerance prec """
grad_descent=function(f,x0,g,iter,prec)
    for k in 1:iter
        ggrad=g*gradient(f,x0)
        x1=x0 .- ggrad'                 # step downhill: x1 = x0 - g*grad f(x0)
        diffg=gradient(f,x1) .- gradient(f,x0)
        g=abs(sum((x1-x0) .* diffg'))/sum(diffg .* diffg)   # variable step size
        x0=x1
        if sqrt(sum(ggrad .* ggrad)) <= prec   # last step was tiny: converged
            break
        end
    end
    return x0
end

end # module


Let’s run it on some nice test functions (func4 is convex; func3 is at least convex near its minimum at the origin). We need an initial value of \gamma, which I take to be 0.1, …

func3=function(x)
    return -1.0*cos(x[1]^2+x[2]^2)
end

func4=function(x)
    return (x[1]-2.0)^2+x[2]^2+2*x[3]^2
end

grad_descent(func4, [0.3 0.3 0.5], 0.1, 8, 0.001)
1×3 Matrix{Float64}:
 2.0  5.63528e-7  -8.05855e-5

grad_descent(func3, [0.3 1.0], 0.1, 30,0.001)
1×2 Matrix{Float64}:
 0.000768393  0.00256131
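As a sanity cross-check of the func4 run, here is a standalone sketch of the same descent with the gradient of func4 coded by hand instead of via dual numbers; the names `gradf`, `descend`, and `res` are mine, and I allow a few extra iterations rather than the 8 used above:

```julia
# Cross-check: minimize (x1-2)^2 + x2^2 + 2*x3^2 with a hand-coded gradient
# and the same variable-step-size update as grad_descent.
gradf(x) = [2 * (x[1] - 2.0), 2 * x[2], 4 * x[3]]

function descend(x0, g, iter, prec)
    for k in 1:iter
        step = g * gradf(x0)
        x1 = x0 - step
        dg = gradf(x1) - gradf(x0)
        g = abs(sum((x1 - x0) .* dg)) / sum(dg .* dg)   # variable step size
        x0 = x1
        if sqrt(sum(step .* step)) <= prec
            break
        end
    end
    return x0
end

res = descend([0.3, 0.3, 0.5], 0.1, 20, 0.001)
println(res)   # converges toward the minimum at [2, 0, 0]
```

This agrees with the dual-number run above, which landed on (2.0, ~0, ~0).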