efficiency of fwrite for massive numbers of small writes

First of all, fwrite() is a library function and not a system call. Secondly, it already buffers the data.

You might want to experiment with increasing the size of the buffer. This is done by using setvbuf(). On my system this only helps a tiny bit, but YMMV.
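As a rough sketch of the setvbuf() suggestion: the helper name `open_with_big_buffer` and the buffer size are made up for illustration, and whether a larger buffer helps at all depends on your system.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper (not from the original post): opens `path` for binary
// writing and asks stdio for a fully buffered stream of `buffer_bytes` bytes.
// Returns nullptr on failure.
std::FILE *open_with_big_buffer(const char *path, std::size_t buffer_bytes)
{
    std::FILE *fp = std::fopen(path, "wb");
    if (!fp)
        return nullptr;

    // setvbuf() has to be called before any other operation on the stream.
    // Passing nullptr as the buffer lets the library allocate it.
    if (std::setvbuf(fp, nullptr, _IOFBF, buffer_bytes) != 0)
    {
        std::fclose(fp);
        return nullptr;
    }
    return fp;
}
```

You would then call something like `open_with_big_buffer("out.bin", 1 << 20)` and keep using plain fwrite() on the returned stream.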

If setvbuf() does not help, you could do your own buffering and only call fwrite() once you've accumulated enough data. This involves more work, but will almost certainly speed up the writing, as your own buffering can be made much more lightweight than fwrite()'s.
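One possible shape for such do-it-yourself buffering is sketched below. The class name `SmallWriteBuffer` and its interface are assumptions made for this example, not something from the question, and the capacity is something you would tune.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <vector>

// Hypothetical helper: collects small records in memory and only calls
// fwrite() when the buffer is full or flush() is requested.
class SmallWriteBuffer
{
public:
    SmallWriteBuffer(std::FILE *fp, std::size_t capacity)
        : fp_(fp), buf_(capacity), used_(0) {}

    // Appends `size` bytes. Returns false if writing to the file failed.
    bool write(const void *data, std::size_t size)
    {
        if (size == 0)
            return true;
        // Not enough room left: push out what has been collected so far.
        if (size > buf_.size() - used_ && !flush())
            return false;
        // A record bigger than the whole buffer goes straight to fwrite().
        if (size > buf_.size())
            return std::fwrite(data, 1, size, fp_) == size;
        std::memcpy(buf_.data() + used_, data, size);
        used_ += size;
        return true;
    }

    // Writes any pending bytes. Must be called before closing the file.
    bool flush()
    {
        const std::size_t n = used_;
        used_ = 0;
        return n == 0 || std::fwrite(buf_.data(), 1, n, fp_) == n;
    }

private:
    std::FILE *fp_;
    std::vector<char> buf_;
    std::size_t used_;
};
```

The caller is responsible for calling flush() before fclose(); the sketch deliberately does not flush in a destructor so that errors stay visible.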

edit: If anyone tells you that it's the sheer number of fwrite() calls that is the problem, demand to see evidence. Better still, do your own performance tests. On my computer, 500,000,000 two-byte writes using fwrite() take 11 seconds. This equates to throughput of about 90MB/s.
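If you want to run that kind of test yourself, something along these lines should do. This is a sketch, not the exact program behind the numbers above; the function name `time_small_fwrites` is invented here, and the timings you get will depend on your machine and filesystem.

```cpp
#include <chrono>
#include <cstdio>

// Hypothetical benchmark helper: writes `count` two-byte records with one
// fwrite() call each and returns the elapsed wall-clock time in seconds,
// or a negative value if anything failed.
double time_small_fwrites(const char *path, long count)
{
    std::FILE *fp = std::fopen(path, "wb");
    if (!fp)
        return -1.0;

    const unsigned char record[2] = {0xAB, 0xCD};
    const auto start = std::chrono::steady_clock::now();

    bool ok = true;
    for (long i = 0; i < count && ok; ++i)
        ok = std::fwrite(record, sizeof record, 1, fp) == 1;

    // Close inside the timed region so that flushing the stdio buffer counts.
    ok = (std::fclose(fp) == 0) && ok;

    const auto stop = std::chrono::steady_clock::now();
    if (!ok)
        return -1.0;
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling it with a count of 500000000 would correspond to the test described above; start with a much smaller count to make sure you have the disk space.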

Last but not least, the huge discrepancy between 11 seconds in my test and one hour mentioned in your question hints at the possibility that there's something else going on in your code that's causing the very poor performance.

c++ unix fwrite system-calls

Your problem is not the buffering for fwrite(), but the total overhead of making the library call with small amounts of data. If you write just 1MB of data in 4-byte pieces, you make about 250000 function calls. You'd better try to collect your data in memory and then write it to the disk with one single call to fwrite().

UPDATE: if you need evidence:
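To illustrate the "collect in memory, write once" idea, here is a minimal sketch. It assumes the whole data set fits in memory as a std::vector<int>; the function name `write_all_at_once` is made up for the example.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper: writes every element of `values` to `path` with a
// single fwrite() call. Returns true on success.
bool write_all_at_once(const char *path, const std::vector<int> &values)
{
    std::FILE *fp = std::fopen(path, "wb");
    if (!fp)
        return false;

    // One library call for the whole block instead of one per element.
    const std::size_t written = values.empty()
        ? 0
        : std::fwrite(values.data(), sizeof(int), values.size(), fp);

    const bool ok = (written == values.size());
    return (std::fclose(fp) == 0) && ok;
}
```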

    $ dd if=/dev/zero of=/dev/null count=50000000 bs=2
    50000000+0 records in
    50000000+0 records out
    100000000 bytes (100 MB) copied, 55.3583 s, 1.8 MB/s
    $ dd if=/dev/zero of=/dev/null count=50 bs=2000000
    50+0 records in
    50+0 records out
    100000000 bytes (100 MB) copied, 0.0122651 s, 8.2 GB/s


OK, well, that was interesting. I thought I'd write some actual code to see what the speed was. And here it is. Compiled using C++ DevStudio 2010 Express. There's quite a bit of code here. It times 5 ways of writing the data:-

Naively calling fwrite
Using a buffer and doing fewer calls to fwrite using bigger buffers
Using the Win32 API naively
Using a buffer and doing fewer calls to Win32 using bigger buffers
Using Win32 but double buffering the output and using asynchronous writes

Please check that I've not done something a bit stupid with any of the above.

The program uses QueryPerformanceCounter for timing the code and ends the timing after the file has been closed to try and include any pending internal buffered data.

The results on my machine (an old WinXP SP3 box):-

fwrite on its own is generally the fastest although the buffered version can sometimes beat it if you get the size and iterations just right.
Naive Win32 is significantly slower
Buffered Win32 doubles the speed but it is still easily beaten by fwrite
Asynchronous writes were not significantly better than the buffered version. Perhaps someone could check my code and make sure I've not done something stupid as I've never really used the asynchronous IO before.

You may get different results depending on your setup.

Feel free to edit and improve the code.

    #define _CRT_SECURE_NO_WARNINGS
    #include <stdio.h>
    #include <memory.h>
    #include <Windows.h>

    const int
        // how many times fwrite/my_fwrite is called
        c_iterations = 10000000,
        // the size of the buffer used by my_fwrite
        c_buffer_size = 100000;

    char
        buffer1 [c_buffer_size],
        buffer2 [c_buffer_size],
        *current_buffer = buffer1;

    int
        write_ptr = 0;

    __int64
        write_offset = 0;

    OVERLAPPED
        overlapped = {0};

    // write to a buffer, when buffer full, write the buffer to the file using fwrite
    void my_fwrite (void *ptr, int size, int count, FILE *fp)
    {
        const int
            c = size * count;

        if (write_ptr + c > c_buffer_size)
        {
            fwrite (buffer1, write_ptr, 1, fp);
            write_ptr = 0;
        }

        memcpy (&buffer1 [write_ptr], ptr, c);
        write_ptr += c;
    }

    // write to a buffer, when buffer full, write the buffer to the file using Win32 WriteFile
    void my_fwrite (void *ptr, int size, int count, HANDLE fp)
    {
        const int
            c = size * count;

        if (write_ptr + c > c_buffer_size)
        {
            DWORD
                written;

            WriteFile (fp, buffer1, write_ptr, &written, 0);
            write_ptr = 0;
        }

        memcpy (&buffer1 [write_ptr], ptr, c);
        write_ptr += c;
    }

    // write to a double buffer, when buffer full, write the buffer to the file using
    // asynchronous WriteFile (waiting for previous write to complete)
    void my_fwrite (void *ptr, int size, int count, HANDLE fp, HANDLE wait)
    {
        const int
            c = size * count;

        if (write_ptr + c > c_buffer_size)
        {
            WaitForSingleObject (wait, INFINITE);

            overlapped.Offset = write_offset & 0xffffffff;
            overlapped.OffsetHigh = write_offset >> 32;
            overlapped.hEvent = wait;

            WriteFile (fp, current_buffer, write_ptr, 0, &overlapped);
            write_offset += write_ptr;
            write_ptr = 0;
            current_buffer = current_buffer == buffer1 ? buffer2 : buffer1;
        }

        memcpy (current_buffer + write_ptr, ptr, c);
        write_ptr += c;
    }

    int main ()
    {
        // do lots of little writes
        FILE
            *f1 = fopen ("f1.bin", "wb");

        LARGE_INTEGER
            f1_start,
            f1_end;

        QueryPerformanceCounter (&f1_start);

        for (int i = 0 ; i < c_iterations ; ++i)
        {
            fwrite (&i, sizeof i, 1, f1);
        }

        fclose (f1);

        QueryPerformanceCounter (&f1_end);

        // do a few big writes
        FILE
            *f2 = fopen ("f2.bin", "wb");

        LARGE_INTEGER
            f2_start,
            f2_end;

        QueryPerformanceCounter (&f2_start);

        for (int i = 0 ; i < c_iterations ; ++i)
        {
            my_fwrite (&i, sizeof i, 1, f2);
        }

        // flush whatever is left in the buffer
        if (write_ptr)
        {
            fwrite (buffer1, write_ptr, 1, f2);
            write_ptr = 0;
        }

        fclose (f2);

        QueryPerformanceCounter (&f2_end);

        // use Win32 API, without buffer
        HANDLE
            f3 = CreateFile (TEXT ("f3.bin"), GENERIC_WRITE, 0, 0, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, 0);

        LARGE_INTEGER
            f3_start,
            f3_end;

        QueryPerformanceCounter (&f3_start);

        for (int i = 0 ; i < c_iterations ; ++i)
        {
            DWORD
                written;

            WriteFile (f3, &i, sizeof i, &written, 0);
        }

        CloseHandle (f3);

        QueryPerformanceCounter (&f3_end);

        // use Win32 API, with buffer
        HANDLE
            f4 = CreateFile (TEXT ("f4.bin"), GENERIC_WRITE, 0, 0, CREATE_ALWAYS, FILE_FLAG_WRITE_THROUGH, 0);

        LARGE_INTEGER
            f4_start,
            f4_end;

        QueryPerformanceCounter (&f4_start);

        for (int i = 0 ; i < c_iterations ; ++i)
        {
            my_fwrite (&i, sizeof i, 1, f4);
        }

        // flush whatever is left in the buffer
        if (write_ptr)
        {
            DWORD
                written;

            WriteFile (f4, buffer1, write_ptr, &written, 0);
            write_ptr = 0;
        }

        CloseHandle (f4);

        QueryPerformanceCounter (&f4_end);

        // use Win32 API, with double buffering
        HANDLE
            f5 = CreateFile (TEXT ("f5.bin"), GENERIC_WRITE, 0, 0, CREATE_ALWAYS, FILE_FLAG_OVERLAPPED | FILE_FLAG_WRITE_THROUGH, 0),
            wait = CreateEvent (0, false, true, 0);

        LARGE_INTEGER
            f5_start,
            f5_end;

        QueryPerformanceCounter (&f5_start);

        for (int i = 0 ; i < c_iterations ; ++i)
        {
            my_fwrite (&i, sizeof i, 1, f5, wait);
        }

        // wait for the previous write, then write and wait for the final partial buffer
        if (write_ptr)
        {
            WaitForSingleObject (wait, INFINITE);

            overlapped.Offset = write_offset & 0xffffffff;
            overlapped.OffsetHigh = write_offset >> 32;
            overlapped.hEvent = wait;

            WriteFile (f5, current_buffer, write_ptr, 0, &overlapped);
            WaitForSingleObject (wait, INFINITE);
            write_ptr = 0;
        }

        CloseHandle (f5);

        QueryPerformanceCounter (&f5_end);

        CloseHandle (wait);

        LARGE_INTEGER
            freq;

        QueryPerformanceFrequency (&freq);

        // the elapsed times are 64-bit values, so print them with %I64d (MSVC) rather than %d
        printf ("  fwrites without buffering = %I64dms\n", (1000 * (f1_end.QuadPart - f1_start.QuadPart)) / freq.QuadPart);
        printf ("     fwrites with buffering = %I64dms\n", (1000 * (f2_end.QuadPart - f2_start.QuadPart)) / freq.QuadPart);
        printf ("    Win32 without buffering = %I64dms\n", (1000 * (f3_end.QuadPart - f3_start.QuadPart)) / freq.QuadPart);
        printf ("       Win32 with buffering = %I64dms\n", (1000 * (f4_end.QuadPart - f4_start.QuadPart)) / freq.QuadPart);
        printf ("Win32 with double buffering = %I64dms\n", (1000 * (f5_end.QuadPart - f5_start.QuadPart)) / freq.QuadPart);
    }
