
Why Bother With Recurrent Neural Networks For Structured Data?


In practice, even in NLP, RNNs and CNNs are often competitive; here's a 2017 review paper that shows this in more detail. In theory, RNNs may be better suited to the full complexity and sequential nature of language, but in practice the bigger obstacle is usually training the network properly, and RNNs are finicky.

Another problem that might have a chance of working is the balanced-parenthesis problem (either with only parentheses in the strings, or with parentheses mixed in among other distractor characters). This requires processing the inputs sequentially and tracking some state, and might be easier to learn with an LSTM than with an FFN; a sketch of how such a dataset could be generated is below.
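For concreteness, here is a minimal sketch of generating that dataset (the helper names and the distractor vocabulary are my own illustrative choices, not from the original discussion):

```python
import numpy as np

def random_paren_string(length, vocab='()ab'):
    """Sample a random string over parentheses plus distractor characters."""
    return ''.join(np.random.choice(list(vocab), size=length))

def is_balanced(s):
    """Label: 1 if the parentheses in s are balanced, else 0."""
    depth = 0
    for ch in s:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth < 0:  # a closer with no matching opener
                return 0
    return int(depth == 0)

strings = [random_paren_string(20) for _ in range(10000)]
labels = [is_balanced(s) for s in strings]
```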

Update: Some data that looks sequential might not actually need to be treated sequentially. For example, if you provide a sequence of numbers to add, then since addition is commutative an FFN will do just as well as an RNN. This can also be true of many health problems, where the dominant information is not sequential in nature. Suppose a patient's smoking habits are measured every year. From a behavioral standpoint the trajectory is important, but if you're predicting whether the patient will develop lung cancer, the prediction will be dominated simply by the number of years the patient smoked (perhaps restricted to the last 10 years for the FFN).
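A quick way to see the point about commutativity (a toy sketch of my own, not from the original post): permuting the elements within each input row leaves the target unchanged, so the ordering carries no information for the model to exploit.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.uniform(size=(5, 8))   # 5 "sequences" of 8 numbers each
y = X.sum(axis=1)              # target: the sum of each sequence

# Shuffle the order within every sequence; the target is identical,
# so an FFN that ignores order loses nothing relative to an RNN.
X_shuffled = np.apply_along_axis(rng.permutation, 1, X)
print(np.allclose(X_shuffled.sum(axis=1), y))  # True
```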

So you want to make the toy problem more complex, so that it requires taking the ordering of the data into account. One option is a simulated time series where you want to predict whether there was a spike in the data, but you don't care about absolute values, only about the relative size of the spike; one way to generate such a series is sketched below.
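A hypothetical setup for such a series (the function name, noise level, and spike factor are illustrative assumptions, not from the original post): draw each series at a random absolute level, then optionally inject a point that is large only relative to its neighbours.

```python
import numpy as np

rng = np.random.RandomState(1)

def make_series(length=50, spike_prob=0.5):
    """Noisy series at a random level; with probability spike_prob one
    point jumps to 3x the local level. Label: whether a spike occurred."""
    level = rng.uniform(1, 100)                # absolute scale is uninformative
    x = level * (1 + 0.1 * rng.randn(length))
    has_spike = rng.rand() < spike_prob
    if has_spike:
        x[rng.randint(length)] *= 3            # spike is only *relatively* large
    return x, int(has_spike)

data = [make_series() for _ in range(10000)]
```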

Update 2:

I modified your code to show a case where RNNs perform better. The trick was to use more complex conditional logic that is more naturally modeled by LSTMs than by FFNs. The code is below. For 8 columns, we see that the FFN trains in 1 minute and reaches a validation loss of 6.3. The LSTM takes 3x longer to train, but its final validation loss is 6x lower, at 1.06.

As we increase the number of columns, the LSTM has a larger and larger advantage, especially if we add more complicated conditions. For 16 columns the FFN's validation loss is 19 (and you can see the training curve more clearly, since the model isn't able to fit the data instantly). In comparison, the LSTM takes 11 times longer to train but reaches a validation loss of 0.31, roughly 60 times smaller than the FFN's! You can play around with even larger matrices to see how far this trend extends.

```python
from keras import models
from keras import layers
from keras.layers import Dense, LSTM
import numpy as np
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; must be set before importing pyplot
import matplotlib.pyplot as plt
import time

np.random.seed(20180908)

rows = 20500
cols = 10

# Randomly generate Z.
Z = 100 * np.random.uniform(0.05, 1.0, size=(rows, cols))

larger = np.max(Z[:, :cols // 2], axis=1).reshape((rows, 1))
larger2 = np.max(Z[:, cols // 2:], axis=1).reshape((rows, 1))
smaller = np.min((larger, larger2), axis=0)

# Target is now the max of the first half of the array.
Z = np.append(Z, larger, axis=1)
# Target is now the min of the max of each half of the array.
# Z = np.append(Z, smaller, axis=1)

# Shuffle.
np.random.shuffle(Z)

# Training and validation data.
split = 10000
X_train = Z[:split, :-1]
X_valid = Z[split:, :-1]
Y_train = Z[:split, -1:].reshape(split, 1)
Y_valid = Z[split:, -1:].reshape(rows - split, 1)

print(X_train.shape)
print(Y_train.shape)
print(X_valid.shape)
print(Y_valid.shape)

print("Now setting up the FNN")

## FNN model.
tick = time.time()

# Define model.
network_fnn = models.Sequential()
network_fnn.add(layers.Dense(32, activation='relu', input_shape=(X_train.shape[1],)))
network_fnn.add(Dense(1, activation=None))

# Compile model.
network_fnn.compile(optimizer='adam', loss='mean_squared_error')

# Fit model.
history_fnn = network_fnn.fit(X_train, Y_train, epochs=500, batch_size=128,
                              verbose=False, validation_data=(X_valid, Y_valid))

tock = time.time()
print()
print('%.2f minutes.' % ((tock - tick) / 60))

print("Now evaluating the FNN")
loss_fnn = history_fnn.history['loss']
val_loss_fnn = history_fnn.history['val_loss']
epochs_fnn = range(1, len(loss_fnn) + 1)
print("train loss: ", loss_fnn[-1])
print("validation loss: ", val_loss_fnn[-1])

plt.plot(epochs_fnn, loss_fnn, 'black', label='Training Loss')
plt.plot(epochs_fnn, val_loss_fnn, 'red', label='Validation Loss')
plt.title('FNN: Training and Validation Loss')
plt.legend()
plt.show()

plt.scatter(Y_train, network_fnn.predict(X_train), alpha=0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.title('training points')
plt.show()

plt.scatter(Y_valid, network_fnn.predict(X_valid), alpha=0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.title('valid points')
plt.show()

print("LSTM")

## LSTM model.
# Reshape to (samples, timesteps, features): each column becomes one timestep.
X_lstm_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
X_lstm_valid = X_valid.reshape(X_valid.shape[0], X_valid.shape[1], 1)

tick = time.time()

# Define model.
network_lstm = models.Sequential()
network_lstm.add(layers.LSTM(32, activation='relu', input_shape=(X_lstm_train.shape[1], 1)))
network_lstm.add(layers.Dense(1, activation=None))

# Compile model.
network_lstm.compile(optimizer='adam', loss='mean_squared_error')

# Fit model.
history_lstm = network_lstm.fit(X_lstm_train, Y_train, epochs=500, batch_size=128,
                                verbose=False, validation_data=(X_lstm_valid, Y_valid))

tock = time.time()
print()
print('%.2f minutes.' % ((tock - tick) / 60))

print("now eval")
loss_lstm = history_lstm.history['loss']
val_loss_lstm = history_lstm.history['val_loss']
epochs_lstm = range(1, len(loss_lstm) + 1)
print("train loss: ", loss_lstm[-1])
print("validation loss: ", val_loss_lstm[-1])

plt.plot(epochs_lstm, loss_lstm, 'black', label='Training Loss')
plt.plot(epochs_lstm, val_loss_lstm, 'red', label='Validation Loss')
plt.title('LSTM: Training and Validation Loss')
plt.legend()
plt.show()

plt.scatter(Y_train, network_lstm.predict(X_lstm_train), alpha=0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.title('training')
plt.show()

plt.scatter(Y_valid, network_lstm.predict(X_lstm_valid), alpha=0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.title('validation')
plt.show()
```