
How to improve accuracy of Tensorflow camera demo on iOS for retrained graph


Since you are not using the YOLO detector, the MAINTAIN_ASPECT flag is set to false. Hence the image in the Android app is not cropped but scaled. However, the code snippet you provided doesn't show where the flag is actually initialised, so confirm that its value really is false in your app.
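To see why this matters, here is a minimal sketch of the difference between the two behaviours. The names (`ComputeScale`, `Scale`) are illustrative, not the actual demo API:

```cpp
#include <algorithm>

// Hypothetical sketch of how the frame transformation differs depending on
// whether the aspect ratio is maintained (crop) or not (stretch).
struct Scale { float x; float y; };

Scale ComputeScale(int srcW, int srcH, int dstW, int dstH, bool maintainAspect) {
    const float sx = static_cast<float>(dstW) / srcW;
    const float sy = static_cast<float>(dstH) / srcH;
    if (maintainAspect) {
        // Scale uniformly by the larger factor and crop the overflow,
        // so the aspect ratio is preserved.
        const float s = std::max(sx, sy);
        return {s, s};
    }
    // Stretch each axis independently: nothing is cropped, but a 640x480
    // frame squeezed into 224x224 distorts the image the model sees.
    return {sx, sy};
}
```

With MAINTAIN_ASPECT false, a 640x480 camera frame gets different x and y scale factors, which can hurt accuracy if the model was trained on undistorted images.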

I know this isn't a complete solution but hope this helps you in debugging the issue.


TensorFlow object detection ships with default, standard configurations; the relevant settings are listed below.

Important things to check against your own ML model:

-> model_file_name - set this to match your .pb file name.

-> model_uses_memory_mapping - optional; enable it to reduce overall memory usage.

-> labels_file_name - this varies based on your label file name.

-> input_layer_name/output_layer_name - make sure you use the input/output layer names you used during graph (.pb) file creation.

snippet:

// If you have your own model, modify this to the file name, and make sure
// you've added the file to your app resources too.
static NSString* model_file_name = @"graph"; //@"tensorflow_inception_graph";
static NSString* model_file_type = @"pb";
// This controls whether we'll be loading a plain GraphDef proto, or a
// file created by the convert_graphdef_memmapped_format utility that wraps a
// GraphDef and parameter file that can be mapped into memory from file to
// reduce overall memory usage.
const bool model_uses_memory_mapping = true;
// If you have your own model, point this to the labels file.
static NSString* labels_file_name = @"labels"; //@"imagenet_comp_graph_label_strings";
static NSString* labels_file_type = @"txt";
// These dimensions need to match those the model was trained with.
const int wanted_input_width = 224;
const int wanted_input_height = 224;
const int wanted_input_channels = 3;
const float input_mean = 117.0f;
const float input_std = 1.0f;
const std::string input_layer_name = "input";
const std::string output_layer_name = "final_result";

For custom-image TensorFlow detection, you can use the working snippet below:

-> For this process you just need to pass a UIImage's CGImage object.

NSString* RunInferenceOnImageResult(CGImageRef image) {
    tensorflow::SessionOptions options;

    tensorflow::Session* session_pointer = nullptr;
    tensorflow::Status session_status = tensorflow::NewSession(options, &session_pointer);
    if (!session_status.ok()) {
        std::string status_string = session_status.ToString();
        return [NSString stringWithFormat: @"Session create failed - %s",
                status_string.c_str()];
    }
    std::unique_ptr<tensorflow::Session> session(session_pointer);
    LOG(INFO) << "Session created.";

    tensorflow::GraphDef tensorflow_graph;
    LOG(INFO) << "Graph created.";

    NSString* network_path = FilePathForResourceNames(@"tensorflow_inception_graph", @"pb");
    PortableReadFileToProto([network_path UTF8String], &tensorflow_graph);

    LOG(INFO) << "Creating session.";
    tensorflow::Status s = session->Create(tensorflow_graph);
    if (!s.ok()) {
        LOG(ERROR) << "Could not create TensorFlow Graph: " << s;
        return @"";
    }

    // Read the label list.
    NSString* labels_path = FilePathForResourceNames(@"imagenet_comp_graph_label_strings", @"txt");
    std::vector<std::string> label_strings;
    std::ifstream t;
    t.open([labels_path UTF8String]);
    std::string line;
    while (t) {
        std::getline(t, line);
        label_strings.push_back(line);
    }
    t.close();

    // Read the image (previously the Grace Hopper sample; now the CGImage passed in).
    //NSString* image_path = FilePathForResourceNames(@"grace_hopper", @"jpg");
    int image_width;
    int image_height;
    int image_channels;
//    std::vector<tensorflow::uint8> image_data = LoadImageFromFile(
//        [image_path UTF8String], &image_width, &image_height, &image_channels);
    std::vector<tensorflow::uint8> image_data =
        LoadImageFromImage(image, &image_width, &image_height, &image_channels);

    const int wanted_width = 224;
    const int wanted_height = 224;
    const int wanted_channels = 3;
    const float input_mean = 117.0f;
    const float input_std = 1.0f;
    assert(image_channels >= wanted_channels);

    tensorflow::Tensor image_tensor(
        tensorflow::DT_FLOAT,
        tensorflow::TensorShape({1, wanted_height, wanted_width, wanted_channels}));
    auto image_tensor_mapped = image_tensor.tensor<float, 4>();
    tensorflow::uint8* in = image_data.data();
    // tensorflow::uint8* in_end = (in + (image_height * image_width * image_channels));
    float* out = image_tensor_mapped.data();
    for (int y = 0; y < wanted_height; ++y) {
        const int in_y = (y * image_height) / wanted_height;
        tensorflow::uint8* in_row = in + (in_y * image_width * image_channels);
        float* out_row = out + (y * wanted_width * wanted_channels);
        for (int x = 0; x < wanted_width; ++x) {
            const int in_x = (x * image_width) / wanted_width;
            tensorflow::uint8* in_pixel = in_row + (in_x * image_channels);
            float* out_pixel = out_row + (x * wanted_channels);
            for (int c = 0; c < wanted_channels; ++c) {
                out_pixel[c] = (in_pixel[c] - input_mean) / input_std;
            }
        }
    }

    NSString* result;
//    result = [NSString stringWithFormat: @"%@ - %lu, %s - %dx%d", result,
//              label_strings.size(), label_strings[0].c_str(), image_width, image_height];

    std::string input_layer = "input";
    std::string output_layer = "output";
    std::vector<tensorflow::Tensor> outputs;
    tensorflow::Status run_status = session->Run({{input_layer, image_tensor}},
                                                 {output_layer}, {}, &outputs);
    if (!run_status.ok()) {
        LOG(ERROR) << "Running model failed: " << run_status;
        tensorflow::LogAllRegisteredKernels();
        result = @"Error running model";
        return result;
    }
    tensorflow::string status_string = run_status.ToString();
    result = [NSString stringWithFormat: @"Status :%s\n", status_string.c_str()];

    tensorflow::Tensor* output = &outputs[0];
    const int kNumResults = 5;
    const float kThreshold = 0.1f;
    std::vector<std::pair<float, int> > top_results;
    GetTopN(output->flat<float>(), kNumResults, kThreshold, &top_results);

    std::stringstream ss;
    ss.precision(3);
    for (const auto& result : top_results) {
        const float confidence = result.first;
        const int index = result.second;
        ss << index << " " << confidence << "  ";
        // Write out the result as a string.
        if (index < label_strings.size()) {
            // Just for safety: theoretically the index stays under 1000 unless
            // numerical issues lead to a wrong prediction.
            ss << label_strings[index];
        } else {
            ss << "Prediction: " << index;
        }
        ss << "\n";
    }

    LOG(INFO) << "Predictions: " << ss.str();
    tensorflow::string predictions = ss.str();
    result = [NSString stringWithFormat: @"%@ - %s", result, predictions.c_str()];
    return result;
}

Scaling an image to a custom width and height - C++ code snippet:

std::vector<uint8> LoadImageFromImage(CGImageRef image,
                                      int* out_width, int* out_height,
                                      int* out_channels) {
    const int width = (int)CGImageGetWidth(image);
    const int height = (int)CGImageGetHeight(image);
    const int channels = 4;

    CGColorSpaceRef color_space = CGColorSpaceCreateDeviceRGB();
    const int bytes_per_row = (width * channels);
    const int bytes_in_image = (bytes_per_row * height);
    std::vector<uint8> result(bytes_in_image);
    const int bits_per_component = 8;
    CGContextRef context = CGBitmapContextCreate(result.data(), width, height,
                                                 bits_per_component, bytes_per_row, color_space,
                                                 kCGImageAlphaPremultipliedLast | kCGBitmapByteOrder32Big);
    CGColorSpaceRelease(color_space);
    CGContextDrawImage(context, CGRectMake(0, 0, width, height), image);
    CGContextRelease(context);
    // Note: this releases the CGImage passed in by the caller. Remove this
    // line if the caller still owns (and will release) the image itself.
    CFRelease(image);

    *out_width = width;
    *out_height = height;
    *out_channels = channels;
    return result;
}

The function above loads the image data at your image's dimensions. For accurate TensorFlow classification with this model, the input width and height should both be 224 pixels.

Call the LoadImageFromImage function above from RunInferenceOnImageResult, passing the CGImage reference along with the pointers that receive the actual width, height, and channel count.


Please change this code:

// If you have your own model, modify this to the file name, and make sure
// you've added the file to your app resources too.
static NSString* model_file_name = @"tensorflow_inception_graph";
static NSString* model_file_type = @"pb";
// This controls whether we'll be loading a plain GraphDef proto, or a
// file created by the convert_graphdef_memmapped_format utility that wraps a
// GraphDef and parameter file that can be mapped into memory from file to
// reduce overall memory usage.
const bool model_uses_memory_mapping = false;
// If you have your own model, point this to the labels file.
static NSString* labels_file_name = @"imagenet_comp_graph_label_strings";
static NSString* labels_file_type = @"txt";
// These dimensions need to match those the model was trained with.
const int wanted_input_width = 299;
const int wanted_input_height = 299;
const int wanted_input_channels = 3;
const float input_mean = 128.0f;
const float input_std = 1.0f;
const std::string input_layer_name = "Mul";
const std::string output_layer_name = "final_result";

The line to change here is: const float input_std = 1.0f;
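To make the effect of these two constants concrete, here is a small sketch of the normalisation every camera byte goes through (`NormalizeInput` is an illustrative name; the demo inlines this arithmetic in its pixel loop):

```cpp
// Camera bytes are 0..255 and each is fed to the graph as
// (pixel - mean) / std, so mean and std must match what the
// model was trained with.
float NormalizeInput(float pixel, float mean, float stddev) {
    return (pixel - mean) / stddev;
}
// With mean 128 and std 1.0 the inputs span roughly [-128, 127];
// with mean 128 and std 128 they span roughly [-1, 1]. Which range
// is correct depends on how your graph was retrained.
```

If the range the graph receives doesn't match the range it saw during training, predictions can be systematically wrong even though the app runs without errors.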