How to perform k-fold cross validation with tensorflow?


I know this question is old but in case someone is looking to do something similar, expanding on ahmedhosny's answer:

The TensorFlow tf.data API can create Dataset objects from Python generators, so one option is to combine it with scikit-learn's KFold and build a dataset from the KFold.split() generator:

    import numpy as np
    from sklearn.model_selection import LeaveOneOut, KFold
    import tensorflow as tf
    import tensorflow.contrib.eager as tfe

    # TF 1.x API: tf.contrib was removed in TF 2.x, where eager execution is on by default
    tf.enable_eager_execution()

    from sklearn.datasets import load_iris
    data = load_iris()
    X = data['data']
    y = data['target']

    def make_dataset(X_data, y_data, n_splits):
        def gen():
            # yield one (train, test) split per fold
            for train_index, test_index in KFold(n_splits).split(X_data):
                X_train, X_test = X_data[train_index], X_data[test_index]
                y_train, y_test = y_data[train_index], y_data[test_index]
                yield X_train, y_train, X_test, y_test
        return tf.data.Dataset.from_generator(gen, (tf.float64, tf.float64, tf.float64, tf.float64))

    dataset = make_dataset(X, y, 10)

You can then iterate through the dataset either in graph-based TensorFlow or using eager execution. Using eager execution:

    for X_train, y_train, X_test, y_test in tfe.Iterator(dataset):
        ....
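The loop body is elided above. As a rough sketch of what usually goes there, the snippet below trains and scores one model per fold and averages the results. To keep it runnable without TensorFlow it iterates a plain Python generator instead of the tf.data pipeline, and it uses scikit-learn's LogisticRegression as a stand-in for the network. The stand-in model, the `fold_generator` name and `shuffle=True` are all assumptions made for illustration, not part of the answer.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

data = load_iris()
X, y = data["data"], data["target"]


def fold_generator(X_data, y_data, n_splits):
    # Yields the same four-tuple as the generator above, without tf.data.
    # shuffle=True is an addition: the iris rows are sorted by class, so
    # unshuffled test folds would be dominated by a single species.
    kfold = KFold(n_splits, shuffle=True, random_state=0)
    for train_index, test_index in kfold.split(X_data):
        yield (X_data[train_index], y_data[train_index],
               X_data[test_index], y_data[test_index])


fold_scores = []
for X_train, y_train, X_test, y_test in fold_generator(X, y, 10):
    # Stand-in model (assumption). Build a fresh model for every fold so
    # nothing learned on one fold carries over into the next.
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    fold_scores.append(model.score(X_test, y_test))

mean_score = float(np.mean(fold_scores))
print("per-fold accuracy:", np.round(fold_scores, 3))
print("mean accuracy: %.3f" % mean_score)
```

With a real network the same pattern applies: re-create (or re-initialise) the model inside the loop, train on the `_train` arrays, evaluate on the `_test` arrays, and report the mean over folds.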


Neural networks are usually trained on large datasets, where cross-validation is rarely used because it is very expensive. For a small dataset like Iris (50 samples per species), you probably do need it. Why not use scikit-learn with different random seeds to split your data into training and test sets?

    from sklearn.model_selection import train_test_split

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

Then, for each of the k rounds:

  1. split the data differently by passing a different value to "random_state"
  2. train the net on the _train split
  3. test it on the _test split
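The three steps above could be sketched roughly as follows. The number of repeats (`n_repeats`) and the LogisticRegression stand-in for the network are assumptions made for illustration. Note that this is repeated random sub-sampling rather than strict k-fold: the test sets of different rounds can overlap.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

data = load_iris()
X, y = data["data"], data["target"]

n_repeats = 5  # hypothetical number of rounds
scores = []
for seed in range(n_repeats):
    # 1. split the data differently by passing a different random_state
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.33, random_state=seed)
    # 2. train on the _train split (stand-in model, rebuilt every round)
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    # 3. test on the _test split
    scores.append(model.score(X_test, y_test))

mean_score = float(np.mean(scores))
print("accuracy per round:", np.round(scores, 3))
print("mean accuracy: %.3f" % mean_score)
```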

If you don't like relying on the random seed and want a more structured k-fold split, you can use this, taken from here.

    from sklearn.model_selection import KFold, cross_val_score

    X = ["a", "a", "b", "c", "c", "c"]
    k_fold = KFold(n_splits=3)
    for train_indices, test_indices in k_fold.split(X):
        print('Train: %s | test: %s' % (train_indices, test_indices))

Output:

    Train: [2 3 4 5] | test: [0 1]
    Train: [0 1 4 5] | test: [2 3]
    Train: [0 1 2 3] | test: [4 5]


Modifying @ahmedhosny's answer:

    from sklearn.model_selection import KFold, cross_val_score

    # k (number of folds) and all_data (the full dataset) are defined elsewhere
    k_fold = KFold(n_splits=k)
    train_ = []
    test_ = []
    for train_indices, test_indices in k_fold.split(all_data.index):
        train_.append(train_indices)
        test_.append(test_indices)
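A possible way to use the stored index lists afterwards, assuming `all_data` is a pandas DataFrame (the answer does not say so, but `.index` suggests it). The toy frame and `k = 5` below are hypothetical stand-ins. One detail worth noting: KFold yields integer positions, not index labels, so the rows are selected with `.iloc` rather than `.loc`.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

# Hypothetical stand-ins for the undefined names in the snippet above.
all_data = pd.DataFrame(
    {"feature": np.arange(10, dtype=float), "label": [0, 1] * 5},
    index=list("abcdefghij"),  # non-integer index on purpose
)
k = 5

k_fold = KFold(n_splits=k)
train_, test_ = [], []
for train_indices, test_indices in k_fold.split(all_data.index):
    train_.append(train_indices)
    test_.append(test_indices)

for fold, (train_idx, test_idx) in enumerate(zip(train_, test_)):
    # Positional selection: works whatever the DataFrame's index labels are.
    train_df = all_data.iloc[train_idx]
    test_df = all_data.iloc[test_idx]
    print(fold, len(train_df), len(test_df), list(test_df.index))
```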