
What does compute_gradients return in tensorflow


compute_gradients(a, b) returns d[ sum a ]/db. So in your case this returns d mean_sqr / d theta, where theta is the set of all variables. There is no "dx" in this equation, because you are not computing gradients with respect to the inputs. So what happens to the batch dimension? You remove it yourself in the definition of mean_sqr:

mean_sqr = tf.reduce_mean(tf.pow(y_ - y, 2))

thus (I am assuming y is 1D for simplicity)

d[ mean_sqr ] / d theta = d[ 1/M SUM_{i=1}^M (pred(x_i) - y_i)^2 ] / d theta
                        = 1/M SUM_{i=1}^M d[ (pred(x_i) - y_i)^2 ] / d theta
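
Concretely, to answer the title question: in the TF 1.x API, compute_gradients returns a list of (gradient, variable) pairs, one per variable. A minimal sketch, assuming a hypothetical one-variable linear model (not your exact code):

import tensorflow as tf  # assumes the TF 1.x graph-mode API

x  = tf.placeholder(tf.float32, [None, 1])   # batch of inputs
y_ = tf.placeholder(tf.float32, [None, 1])   # batch of targets
theta = tf.Variable(tf.zeros([1, 1]))        # the only trainable variable
y  = tf.matmul(x, theta)                     # predictions

mean_sqr = tf.reduce_mean(tf.pow(y_ - y, 2))

opt = tf.train.AdamOptimizer()
grads_and_vars = opt.compute_gradients(mean_sqr)
# grads_and_vars == [(d mean_sqr / d theta, theta)] -- one pair per
# variable, with the batch dimension already averaged away by reduce_mean.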

As the equation above shows, you are in control of whether the gradient sums over the batch, takes the mean, or does something different: if you defined mean_sqr with reduce_sum instead of reduce_mean, the gradients would be summed over the batch, and so on.
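
A runnable sketch of that difference with toy numbers (TF 1.x; the factor between the two gradients is exactly the batch size M):

import numpy as np
import tensorflow as tf  # assumes TF 1.x

x  = tf.placeholder(tf.float32, [None, 1])
y_ = tf.placeholder(tf.float32, [None, 1])
w  = tf.Variable(2.0)
y  = w * x

grad_mean = tf.gradients(tf.reduce_mean(tf.pow(y_ - y, 2)), [w])[0]
grad_sum  = tf.gradients(tf.reduce_sum(tf.pow(y_ - y, 2)),  [w])[0]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    feed = {x: np.ones((4, 1)), y_: np.zeros((4, 1))}
    print(sess.run([grad_mean, grad_sum], feed))  # [4.0, 16.0]
    # grad_sum is M = 4 times grad_mean: reduce_sum adds the
    # per-example gradients, reduce_mean averages them.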

On the other hand, apply_gradients simply "applies the gradients"; the exact update rule is optimiser-dependent. For GradientDescentOptimizer it would be

theta <- theta - learning_rate * gradients(theta)
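
For instance, a minimal sketch of that rule with a toy one-variable loss (TF 1.x):

import tensorflow as tf  # assumes TF 1.x

w = tf.Variable(3.0)
loss = tf.pow(w - 1.0, 2)  # toy loss, gradient is 2 * (w - 1)

opt = tf.train.GradientDescentOptimizer(learning_rate=0.1)
grads_and_vars = opt.compute_gradients(loss)      # [(dloss/dw, w)]
train_step = opt.apply_gradients(grads_and_vars)  # w <- w - 0.1 * dloss/dw

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(train_step)
    print(sess.run(w))  # 3.0 - 0.1 * 2 * (3.0 - 1.0) = 2.6

Note that opt.minimize(loss) is just these two calls chained; splitting them is useful when you want to inspect or modify the gradients in between.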

For the Adam optimizer that you are using, the update rule is of course more complex.
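
For reference, the Adam update (from the original Adam paper; g is the gradient, m and v the running first and second moment estimates, t the step counter) is roughly:

m <- beta1 * m + (1 - beta1) * g
v <- beta2 * v + (1 - beta2) * g^2
m_hat = m / (1 - beta1^t),  v_hat = v / (1 - beta2^t)
theta <- theta - learning_rate * m_hat / (sqrt(v_hat) + epsilon)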

Note, however, that tf.gradients is more like "backprop" than a true gradient in the mathematical sense: it follows the dependencies recorded in the graph and does not recognise dependencies that run in the "opposite" direction.
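
A quick sketch of that limitation (TF 1.x):

import tensorflow as tf  # assumes TF 1.x

x = tf.Variable(2.0)
y = tf.pow(x, 2)  # the graph has an edge x -> y

print(tf.gradients(y, x))  # [<tf.Tensor ...>]: backprop follows x -> y
print(tf.gradients(x, y))  # [None]: mathematically x = sqrt(y), so dx/dy
                           # exists, but there is no graph path from y
                           # back to x, so tf.gradients cannot see it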