
Custom TensorFlow Keras optimizer


Update: TF 2.2 forced me to clean up all the implementations, so they can now be used as a reference for TF best practices. I've also added a section below on _get_hyper vs. _set_hyper.


I've implemented Keras AdamW in all major TF & Keras versions - I invite you to examine optimizers_v2.py. Several points:

  • You should inherit from OptimizerV2, which is indeed what you linked; it's the latest and current base class for tf.keras optimizers.
  • You are correct in (1) - this is a documentation mistake; the methods are private, as they aren't meant to be used by the user directly.
  • apply_gradients (or any other method) is only overridden if the default doesn't accomplish what's needed for a given optimizer; in your linked example, it's just a one-liner add-on to the original (see the sketch after this list).
  • "So, it seems that a _create_slots method must be defined in an optimizer subclass if that subclass does not override apply_gradients" - the two are unrelated; it's coincidental.

  • What is the difference between _resource_apply_dense and _resource_apply_sparse?

The latter deals with sparse layers - e.g. Embedding - and the former with everything else; example.
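
As a rough sketch of both methods in a bare-bones subclass (PlainSGD is hypothetical, not the linked code, and targets the TF 2.x OptimizerV2 API this answer is about): the sparse variant receives only the gradient rows that were actually touched, plus their indices, so it updates just those rows.

```python
import tensorflow as tf

class PlainSGD(tf.keras.optimizers.Optimizer):  # OptimizerV2 base in TF 2.x
    def __init__(self, learning_rate=0.01, name="PlainSGD", **kwargs):
        super().__init__(name, **kwargs)
        self._set_hyper("learning_rate", learning_rate)

    def _resource_apply_dense(self, grad, var, apply_state=None):
        # Dense case: `grad` has the same shape as `var` (Dense, Conv, ...).
        lr = self._get_hyper("learning_rate", var.dtype)
        return var.assign_sub(lr * grad)

    def _resource_apply_sparse(self, grad, var, indices, apply_state=None):
        # Sparse case (e.g. Embedding): `grad` holds only the rows at `indices`,
        # so only those rows of `var` are updated.
        lr = self._get_hyper("learning_rate", var.dtype)
        return var.scatter_sub(tf.IndexedSlices(lr * grad, indices))
```

This sketch assumes a plain float or tensor learning rate; to also support a LearningRateSchedule, OptimizerV2 provides _decayed_lr(var_dtype) for resolving it per step.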

  • When should I use _create_slots()?

When defining additional per-weight tf.Variables that hold the optimizer's own state; example: the weights' first- and second-order moments (e.g. Adam). It uses add_slot().
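
For instance, a momentum variant of the sketch above (again hypothetical, same OptimizerV2 assumptions) would create one velocity slot per variable in _create_slots and read it back with get_slot during the update:

```python
import tensorflow as tf

class MomentumSGD(tf.keras.optimizers.Optimizer):  # OptimizerV2 base in TF 2.x
    def __init__(self, learning_rate=0.01, momentum=0.9, name="MomentumSGD",
                 **kwargs):
        super().__init__(name, **kwargs)
        self._set_hyper("learning_rate", learning_rate)
        self._set_hyper("momentum", momentum)

    def _create_slots(self, var_list):
        # One persistent per-variable tf.Variable ("slot") for the velocity;
        # add_slot initializes it to zeros by default.
        for var in var_list:
            self.add_slot(var, "velocity")

    def _resource_apply_dense(self, grad, var, apply_state=None):
        lr = self._get_hyper("learning_rate", var.dtype)
        momentum = self._get_hyper("momentum", var.dtype)
        velocity = self.get_slot(var, "velocity")  # read the slot back
        v_t = velocity.assign(momentum * velocity - lr * grad)
        return var.assign_add(v_t)

    # A real optimizer would also implement _resource_apply_sparse, as above.
```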


_get_hyper vs. _set_hyper: they enable setting and getting Python literals (int, str, etc.), callables, and tensors. They exist largely for convenience: anything set via _set_hyper can be retrieved via _get_hyper, which avoids repeating boilerplate code. I dedicated a Q&A to it here.
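
A quick illustration with the stock OptimizerV2-era Adam (i.e. before the TF 2.11 optimizer rewrite; tf.keras.optimizers.legacy.Adam in newer releases), whose __init__ registers its hyperparameters this way. The calls below poke at private methods purely for demonstration:

```python
import tensorflow as tf

opt = tf.keras.optimizers.Adam(learning_rate=1e-3, beta_1=0.9)

# Adam's __init__ stored these via _set_hyper; _get_hyper reads them back,
# whether they were given as Python literals, tensors, or callables.
beta_1 = opt._get_hyper("beta_1")                   # the stored value
beta_1_cast = opt._get_hyper("beta_1", tf.float16)  # optionally cast to a dtype

# _set_hyper can also overwrite a hyperparameter after construction:
opt._set_hyper("beta_1", 0.95)

# _serialize_hyperparameter is the matching helper for get_config(): it turns
# the stored value back into a plain, serializable Python value.
print(opt._serialize_hyperparameter("beta_1"))
```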


  1. Yes, this looks to be a documentation error. The underscore-prefixed names are the correct methods to override. Related is the non-Keras Optimizer, which has these all defined but not implemented in the base class: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/training/optimizer.py
```python
  def _create_slots(self, var_list):
    """Create all slots needed by the variables.

    Args:
      var_list: A list of `Variable` objects.
    """
    # No slots needed by default
    pass

  def _resource_apply_dense(self, grad, handle):
    """Add ops to apply dense gradients to the variable `handle`.

    Args:
      grad: a `Tensor` representing the gradient.
      handle: a `Tensor` of dtype `resource` which points to the variable
        to be updated.

    Returns:
      An `Operation` which updates the value of the variable.
    """
    raise NotImplementedError()

  def _resource_apply_sparse(self, grad, handle, indices):
    """Add ops to apply sparse gradients to the variable `handle`.

    Similar to `_apply_sparse`, the `indices` argument to this method has been
    de-duplicated. Optimizers which deal correctly with non-unique indices may
    instead override `_resource_apply_sparse_duplicate_indices` to avoid this
    overhead.

    Args:
      grad: a `Tensor` representing the gradient for the affected indices.
      handle: a `Tensor` of dtype `resource` which points to the variable
        to be updated.
      indices: a `Tensor` of integral type representing the indices for
        which the gradient is nonzero. Indices are unique.

    Returns:
      An `Operation` which updates the value of the variable.
    """
    raise NotImplementedError()
```
  2. I don't know about apply_dense. For one thing, if you do override it, the code mentions that a per-replica DistributionStrategy could be "dangerous":
```python
    # TODO(isaprykin): When using a DistributionStrategy, and when an
    # optimizer is created in each replica, it might be dangerous to
    # rely on some Optimizer methods.  When such methods are called on a
    # per-replica optimizer, an exception needs to be thrown.  We do
    # allow creation per-replica optimizers however, because the
    # compute_gradients()->apply_gradients() sequence is safe.
```