How to get reproducible results in keras


You can find the answer at the Keras docs: https://keras.io/getting-started/faq/#how-can-i-obtain-reproducible-results-using-keras-during-development.

In short, to be sure of getting reproducible results from a Python script running on a single machine's CPU, you have to do the following:

  1. Set the PYTHONHASHSEED environment variable to a fixed value
  2. Seed the python built-in pseudo-random generator with a fixed value
  3. Seed the numpy pseudo-random generator with a fixed value
  4. Seed the tensorflow pseudo-random generator with a fixed value
  5. Configure a new global tensorflow session

Following the Keras link at the top, the source code I am using is the following:

    # Seed value
    # Apparently you may use different seed values at each stage
    seed_value = 0

    # 1. Set the `PYTHONHASHSEED` environment variable to a fixed value
    import os
    os.environ['PYTHONHASHSEED'] = str(seed_value)

    # 2. Seed the `python` built-in pseudo-random generator with a fixed value
    import random
    random.seed(seed_value)

    # 3. Seed the `numpy` pseudo-random generator with a fixed value
    import numpy as np
    np.random.seed(seed_value)

    # 4. Seed the `tensorflow` pseudo-random generator with a fixed value
    import tensorflow as tf
    tf.set_random_seed(seed_value)  # TensorFlow 1.x
    # for TensorFlow 2.x:
    # tf.random.set_seed(seed_value)
    # or, through the compatibility module:
    # tf.compat.v1.set_random_seed(seed_value)

    # 5. Configure a new global `tensorflow` session (TensorFlow 1.x)
    from keras import backend as K
    session_conf = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
    sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
    K.set_session(sess)
    # for later versions, through the compatibility module:
    # session_conf = tf.compat.v1.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
    # sess = tf.compat.v1.Session(graph=tf.compat.v1.get_default_graph(), config=session_conf)
    # tf.compat.v1.keras.backend.set_session(sess)

Needless to say, you do not have to specify any seed or random_state in the numpy, scikit-learn or tensorflow/keras functions used in your Python script, because the source code above sets their global pseudo-random generators to a fixed value.
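As a quick sanity check of steps 2 and 3, the sketch below re-seeds the global generators and confirms that the same numbers come out both times. The helper name `seed_everything` is made up for this example (it is not part of Keras or NumPy), and the TensorFlow steps are left out so that the snippet runs without it.

```python
import os
import random

import numpy as np


def seed_everything(seed_value=0):
    """Hypothetical helper: seed the Python and NumPy global generators."""
    # This only affects child processes started from here; the running
    # interpreter fixed its hash seed when it started.
    os.environ['PYTHONHASHSEED'] = str(seed_value)
    random.seed(seed_value)
    np.random.seed(seed_value)


def draw():
    """Draw a few numbers from both global generators."""
    return random.random(), np.random.rand(3).tolist()


seed_everything(0)
first = draw()

seed_everything(0)
second = draw()

# Re-seeding with the same value replays the same sequence.
print(first == second)
```

Neither call to `draw()` passes a seed of its own; the agreement comes entirely from the global state set by `seed_everything`.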


Theano's documentation talks about the difficulties of seeding random variables and why they seed each graph instance with its own random number generator.

Sharing a random number generator between different RandomOp instances makes it difficult to produce the same stream regardless of other ops in the graph, and to keep RandomOps isolated. Therefore, each RandomOp instance in a graph will have its very own random number generator. That random number generator is an input to the function. In typical usage, we will use the new features of function inputs (value, update) to pass and update the rng for each RandomOp. By passing RNGs as inputs, it is possible to use the normal methods of accessing function inputs to access each RandomOp’s rng. In this approach there is no pre-existing mechanism to work with the combined random number state of an entire graph. So the proposal is to provide the missing functionality (the last three requirements) via auxiliary functions: seed, getstate, setstate.

They also provide examples on how to seed all the random number generators.

You can also seed all of the random variables allocated by a RandomStreams object by that object’s seed method. This seed will be used to seed a temporary random number generator, that will in turn generate seeds for each of the random variables.

>>> srng.seed(902340)  # seeds rv_u and rv_n with different seeds each
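The scheme described above (one seed feeds a temporary generator, which hands a distinct seed to each random variable) can be sketched in plain Python. This only illustrates the idea and is not Theano's implementation; the class `ToyStreams` and its methods are invented for the example.

```python
import random


class ToyStreams:
    """Toy stand-in for a collection of independently seeded streams."""

    def __init__(self, names):
        # One private generator per named random variable.
        self.rngs = {name: random.Random() for name in names}

    def seed(self, master_seed):
        # A temporary generator turns the single master seed into
        # one derived seed per stream.
        temp = random.Random(master_seed)
        for name in sorted(self.rngs):
            self.rngs[name].seed(temp.randrange(2 ** 30))

    def draw(self, name):
        return self.rngs[name].random()


streams = ToyStreams(["rv_u", "rv_n"])

streams.seed(902340)
first = (streams.draw("rv_u"), streams.draw("rv_n"))

streams.seed(902340)
second = (streams.draw("rv_u"), streams.draw("rv_n"))

# The same master seed reproduces both streams.
print(first == second)

# Extra draws from one stream do not disturb the other.
streams.seed(902340)
streams.draw("rv_u")
streams.draw("rv_u")
print(streams.draw("rv_n") == first[1])
```

Because each variable owns its generator, the value drawn from `rv_n` does not depend on how many times `rv_u` was sampled first, which is the isolation the Theano documentation is after.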


I finally got reproducible results with my code. It's a combination of answers I saw around the web. The first thing is doing what @alex says:

  1. Set numpy.random.seed;
  2. Use PYTHONHASHSEED=0 for Python 3.
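PYTHONHASHSEED is read when the interpreter starts, so it has to be in the environment before Python is launched, for example `PYTHONHASHSEED=0 python script.py` (where `script.py` is a placeholder name). The sketch below shows the effect by starting child interpreters with the variable set; the helper `hash_in_child` is hypothetical and exists only for this demonstration.

```python
import os
import subprocess
import sys


def hash_in_child(text, hash_seed):
    """Return hash(text) as computed by a fresh interpreter that was
    started with PYTHONHASHSEED set to hash_seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


# Two separate interpreters started with the same hash seed agree
# on the hash of a string.
print(hash_in_child("keras", 0) == hash_in_child("keras", 0))
```

Without a fixed PYTHONHASHSEED, string hashes generally differ from one interpreter run to the next, which can change the iteration order of sets and anything derived from it.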

Then you have to solve the issue noted by @user2805751 regarding cuDNN by calling your Keras code with the following additional THEANO_FLAGS:

  1. dnn.conv.algo_bwd_filter=deterministic,dnn.conv.algo_bwd_data=deterministic
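THEANO_FLAGS is an environment variable, so it can be given on the command line when launching the script or set from Python before Theano is first imported. Here is a minimal sketch of the second option using the two flags above; it appends to any flags already present, and the import is left commented out so that the snippet does not require Theano.

```python
import os

deterministic_flags = ",".join([
    "dnn.conv.algo_bwd_filter=deterministic",
    "dnn.conv.algo_bwd_data=deterministic",
])

# Keep any flags that are already set and append the deterministic ones.
existing = os.environ.get("THEANO_FLAGS", "")
os.environ["THEANO_FLAGS"] = ",".join(
    flag for flag in (existing, deterministic_flags) if flag
)

# Theano reads THEANO_FLAGS when it is first imported, so the import
# (and any Keras import that pulls in Theano) must come after this point.
# import theano
```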

And finally, you have to patch your Theano installation as per this comment, which basically consists in:

  1. replacing every call to a *_dev20 operator with its regular version in theano/sandbox/cuda/opt.py.

This should get you the same results for the same seed.

Note that there might be a slowdown. I saw a running time increase of about 10%.