Getting some form of keras multi-processing/threading to work on Windows


In combination with a `Sequence`, using `use_multiprocessing=False` and, e.g., `workers=4` does work.

I just realized that with the example code in the question I was not seeing the speed-up because the data was being generated too fast. Inserting a `time.sleep(2)` into `__getitem__` (simulating slow batch loading) makes the difference evident.

```python
import time

import numpy as np
from keras.layers import Dense
from keras.models import Sequential
from keras.utils import Sequence, to_categorical


class DummySequence(Sequence):
    def __init__(self, x_set, y_set, batch_size):
        self.x, self.y = x_set, y_set
        self.batch_size = batch_size

    def __len__(self):
        return int(np.ceil(len(self.x) / float(self.batch_size)))

    def __getitem__(self, idx):
        batch_x = self.x[idx * self.batch_size:(idx + 1) * self.batch_size]
        batch_y = self.y[idx * self.batch_size:(idx + 1) * self.batch_size]
        time.sleep(2)  # simulate slow batch loading, e.g. reading from disk
        return np.array(batch_x), np.array(batch_y)


x = np.random.random((100, 3))
y = to_categorical(np.random.random(100) > .5).astype(int)
seq = DummySequence(x, y, 10)

model = Sequential()
model.add(Dense(32, input_dim=3))
model.add(Dense(2, activation='softmax'))
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

print('single worker')
model.fit_generator(generator=seq,
                    steps_per_epoch=10,
                    epochs=2,
                    verbose=2,
                    workers=1)

print('achieves speed-up!')
model.fit_generator(generator=seq,
                    steps_per_epoch=10,
                    epochs=2,
                    verbose=2,
                    workers=4,
                    use_multiprocessing=False)
```

On my laptop, this produced the following:

```
single worker
>>> model.fit_generator(generator=seq,
...                     steps_per_epoch = 10,
...                     epochs = 2,
...                     verbose=2,
...                     workers=1)
Epoch 1/2
 - 20s - loss: 0.6984 - acc: 0.5000
Epoch 2/2
 - 20s - loss: 0.6955 - acc: 0.5100
```

and

```
achieves speed-up!
>>> model.fit_generator(generator=seq,
...                     steps_per_epoch = 10,
...                     epochs = 2,
...                     verbose=2,
...                     workers=4,
...                     use_multiprocessing=False)
Epoch 1/2
 - 6s - loss: 0.6904 - acc: 0.5200
Epoch 2/2
 - 6s - loss: 0.6900 - acc: 0.5000
```
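A rough estimate accounts for the two epoch times, assuming each `__getitem__` call costs the 2 s of `time.sleep(2)`, the model update itself is negligible, and the four worker threads fetch batches in parallel rounds of up to four (10 steps per epoch):

$$
T_{1} \approx 10 \times 2\,\mathrm{s} = 20\,\mathrm{s},
\qquad
T_{4} \approx \left\lceil \tfrac{10}{4} \right\rceil \times 2\,\mathrm{s} = 3 \times 2\,\mathrm{s} = 6\,\mathrm{s}
$$

This is only a back-of-the-envelope model (Keras also prefetches batches into a queue), but it lines up with the 20 s and 6 s per epoch reported above.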

Important notes: You will probably want `self.lock = threading.Lock()` in `__init__` and then `with self.lock:` in `__getitem__`. Do the absolute bare minimum inside the `with self.lock:` block; as far as I understand it, that would be any reference to `self.xxxx`. While one thread is inside the `with self.lock:` block, the other worker threads have to wait before they can enter it.
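The lock advice can be sketched as follows. This is a minimal, stdlib-only illustration, not the original code: `LockedBatchSequence` and its `load_delay` parameter are hypothetical names, the class only mimics the `__len__`/`__getitem__` interface of `keras.utils.Sequence` (in real Keras code you would subclass `Sequence`), and a `ThreadPoolExecutor` stands in for the thread pool Keras uses when `use_multiprocessing=False`. The point is that the lock is held only while copying references to the shared attributes; the slow part runs outside it, so the workers can overlap.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class LockedBatchSequence:
    """Hypothetical class with the same __len__/__getitem__ interface as
    keras.utils.Sequence. A plain class is used so the sketch runs without
    Keras; in a real project it would subclass keras.utils.Sequence."""

    def __init__(self, x_set, y_set, batch_size, load_delay=0.0):
        self.x, self.y = x_set, y_set
        self.batch_size = batch_size
        self.load_delay = load_delay  # hypothetical: simulated I/O time per batch
        self.lock = threading.Lock()

    def __len__(self):
        return -(-len(self.x) // self.batch_size)  # ceiling division

    def __getitem__(self, idx):
        # Critical section: only copy the references to the shared attributes.
        with self.lock:
            x, y = self.x, self.y
            bs, delay = self.batch_size, self.load_delay
        # Slow work happens outside the lock, so other workers are not blocked.
        time.sleep(delay)
        start, stop = idx * bs, (idx + 1) * bs
        return x[start:stop], y[start:stop]


# Usage: fetch every batch with 4 threads, roughly analogous to
# workers=4 with use_multiprocessing=False.
seq = LockedBatchSequence(list(range(100)), [i % 2 for i in range(100)],
                          batch_size=10, load_delay=0.2)
t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    batches = list(pool.map(seq.__getitem__, range(len(seq))))
elapsed = time.perf_counter() - t0
print(f"{len(batches)} batches in {elapsed:.2f}s")
```

Fetching the 10 batches one at a time would take about 2 s here; with four threads it should take roughly a third of that. If `time.sleep` were moved inside the `with self.lock:` block, the batches would be serialized again and the speed-up would disappear.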

Additionally, if you want multithreading to speed up calculations (i.e. CPU operations are the bottleneck), do not expect any speed-up: the global interpreter lock (GIL) prevents Python threads from executing Python code in parallel. Multithreading only helps if the bottleneck is I/O, because the GIL is released while a thread waits on I/O. Apparently, to speed up CPU computations we need true multiprocessing, which Keras currently does not support on Windows 10. Perhaps it is possible to hand-craft a multiprocessing generator (I have no idea).
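The I/O-versus-CPU distinction can be seen in isolation with a small stdlib-only sketch (no Keras involved). Here `time.sleep` stands in for I/O and a pure-Python loop stands in for CPU-bound preprocessing; the function names, batch counts, and durations are arbitrary choices for illustration. On a standard (GIL-enabled) CPython build, the sleep-based batches should finish roughly four times faster with four threads, while the CPU-bound timings should stay about the same. Real batch code that spends most of its time inside NumPy routines that release the GIL may land somewhere in between.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def io_bound_batch(idx):
    # time.sleep releases the GIL, like waiting on a disk or network read.
    time.sleep(0.2)
    return idx


def cpu_bound_batch(_idx):
    # Pure-Python arithmetic holds the GIL, so threads can only take turns.
    total = 0
    for i in range(1_000_000):
        total += i * i
    return total


def timed(fn, n_batches, workers):
    """Run fn over n_batches indices on a thread pool; return results and seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fn, range(n_batches)))
    return results, time.perf_counter() - start


io_results, io_serial = timed(io_bound_batch, 8, workers=1)
_, io_threaded = timed(io_bound_batch, 8, workers=4)
cpu_results, cpu_serial = timed(cpu_bound_batch, 4, workers=1)
_, cpu_threaded = timed(cpu_bound_batch, 4, workers=4)

print(f"I/O-bound: 1 worker {io_serial:.2f}s, 4 workers {io_threaded:.2f}s")
print(f"CPU-bound: 1 worker {cpu_serial:.2f}s, 4 workers {cpu_threaded:.2f}s")
```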


I tested your proposal on my solution with GPU/CPU monitoring.

  1. There is some speed increase in my case: 440 s vs. 550 s, i.e. about 20% less time.
  2. The CPU uses only one core at a time. GPU load does not exceed 22%.

It looks like the single core is used more efficiently when more workers are assigned. However, no true multiprocessing is enabled.

TF 2.0

Keras 2.2.4