
Ordering of batch normalization and dropout?


In Ioffe and Szegedy (2015), the authors state that "we would like to ensure that for any parameter values, the network always produces activations with the desired distribution". So the Batch Normalization layer is actually inserted right after a Conv layer/Fully Connected layer, but before feeding into the ReLU (or any other kind of) activation. See this video at around the 53-minute mark for more details.

As far as dropout goes, I believe dropout is applied after the activation layer. In the dropout paper (figure 3b), the dropout factor/probability matrix r(l) for hidden layer l is applied on y(l), where y(l) is the result of applying the activation function f.

So in summary, the order of using batch normalization and dropout is:

-> CONV/FC -> BatchNorm -> ReLU (or other activation) -> Dropout -> CONV/FC ->
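For example, in tf.keras that ordering could look like this (just a minimal sketch; the filter counts, dropout rate, and input shape are arbitrary placeholders):

```python
from tensorflow.keras import layers, models

# CONV -> BatchNorm -> ReLU -> Dropout -> CONV, as summarized above
model = models.Sequential([
    layers.Conv2D(32, 3, padding="same", input_shape=(32, 32, 3)),  # CONV
    layers.BatchNormalization(),                                    # BatchNorm (before activation)
    layers.Activation("relu"),                                      # ReLU (or other activation)
    layers.Dropout(0.25),                                           # Dropout (after activation)
    layers.Conv2D(64, 3, padding="same"),                           # next CONV
])
```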


As noted in the comments, an amazing resource to read up on the order of layers is here. I have gone through the comments, and it is the best resource on this topic I have found on the internet.

My 2 cents:

Dropout is meant to block information from certain neurons completely, to make sure the neurons do not co-adapt. So batch normalization has to come after dropout; otherwise you are passing information through the normalization statistics.

If you think about it, this is the reason why, in typical ML problems, we don't compute the mean and standard deviation over the entire dataset and then split it into train, test, and validation sets. We split first, then compute the statistics over the train set, and use them to normalize and center the validation and test sets.
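The same principle in plain scikit-learn, as a minimal sketch (the toy data and split sizes are made up):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = np.random.rand(1000, 20)  # toy feature matrix
X_train, X_test = train_test_split(X, test_size=0.2, random_state=0)

# Fit the normalization statistics on the train split only...
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
# ...and reuse those train statistics on the test set, never refitting on it.
X_test_scaled = scaler.transform(X_test)
```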

So I suggest Scheme 1 (this takes pseudomarvin's comment on the accepted answer into consideration):

-> CONV/FC -> ReLU (or other activation) -> Dropout -> BatchNorm -> CONV/FC

as opposed to Scheme 2

-> CONV/FC -> BatchNorm -> ReLU (or other activation) -> Dropout -> CONV/FC -> (the order in the accepted answer)

Please note that this means the network under Scheme 2 should show more over-fitting than the network under Scheme 1, but the OP ran some tests, as mentioned in the question, and they support Scheme 2.
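Here is how the two orderings could look in tf.keras, purely to make them explicit (a rough sketch; the layer sizes and dropout rate are arbitrary placeholders):

```python
from tensorflow.keras import layers, models

# Scheme 1: CONV -> ReLU -> Dropout -> BatchNorm (this answer)
scheme_1 = models.Sequential([
    layers.Conv2D(32, 3, padding="same", input_shape=(32, 32, 3)),
    layers.Activation("relu"),
    layers.Dropout(0.25),
    layers.BatchNormalization(),
    layers.Conv2D(64, 3, padding="same"),
])

# Scheme 2: CONV -> BatchNorm -> ReLU -> Dropout (accepted answer)
scheme_2 = models.Sequential([
    layers.Conv2D(32, 3, padding="same", input_shape=(32, 32, 3)),
    layers.BatchNormalization(),
    layers.Activation("relu"),
    layers.Dropout(0.25),
    layers.Conv2D(64, 3, padding="same"),
])
```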


Usually, just drop the Dropout (when you have BN):

  • "BN eliminates the need for Dropout in some cases cause BN provides similar regularization benefits as Dropout intuitively"
  • "Architectures like ResNet, DenseNet, etc. not using Dropout

For more details, refer to the paper Understanding the Disharmony between Dropout and Batch Normalization by Variance Shift, as already mentioned by @Haramoz in the comments.
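If you go that route, a typical ResNet-style block simply interleaves Conv, BN, and the activation, with no Dropout at all. A minimal tf.keras sketch (the filter counts and input shape are placeholders):

```python
from tensorflow.keras import layers, models

def conv_bn_relu(x, filters):
    """Conv -> BatchNorm -> ReLU block with no Dropout, in the spirit of ResNet/DenseNet."""
    x = layers.Conv2D(filters, 3, padding="same", use_bias=False)(x)  # bias is redundant before BN
    x = layers.BatchNormalization()(x)
    return layers.Activation("relu")(x)

inputs = layers.Input(shape=(32, 32, 3))
x = conv_bn_relu(inputs, 32)
x = conv_bn_relu(x, 64)
model = models.Model(inputs, x)
```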