Skip to content

2nd Order Methods #19

@max-vassili3v

Description

@max-vassili3v

Suppose we have a function $f: \mathbb{R^n} \to \mathbb{R}$. Using dual numbers we can input a dual vector $x + J\epsilon$ and get out a dual number. This induces a function $g: \mathbb{R}^n \to \mathbb{R}^n$ defined as $g(x) \to \text{Dual parts of }f(x + J\epsilon)$. If we now input $y + I\epsilon'$ into $g$, where $\epsilon_i\epsilon'_j \neq 0$ for all $1 \leq i, j \leq n$, the $\epsilon_i\epsilon'_j$ terms give us the Hessian coefficients.

In application, if we allow definitions DualVector(::DualVector, ::AbstractMatrix) and Dual(::Dual, ::DualVector), this should have the same effect, and would open up discussion or examples concerning 2nd order methods in DualArrays.jl.

Example:

  • f(x) = x[1] * x[2]
  • g(x) = f(DualVector(x, I(length(x)))).partials
  • d = DualVector([1, 2], I(2))
  • In order to evaluate g(d), we start with a DualVector(d, I(2))
  • We then pass through f giving us Dual(d[1], [1, 0]) * Dual(d[2], [0, 1])
  • The product rule makes this evaluate to Dual(d[1] * d[2], [d[2], d[1]]) (Note: we can overload operations such that [d[2], d[1]] = d[2] * [1,0] + d[1] * [0,1] returns a DualVector. In general, for the latter argument to be a DualVector instead of a Vector{Dual} operations between Dual and AbstractVector also need to be overloaded)
  • The jacobian of [d[2], d[1]] is actually a matrix of nested dual parts, so this is our Hessian. We verify from the definition of d that this is the correct Hessian, [0 1;1 0].

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions