How to efficiently write large files to disk on background thread (Swift)


Performance depends on whether or not the data fits in RAM. If it does, then you should use NSData writeToURL with the atomically feature turned on, which is what you're doing.

Apple's notes about this being dangerous when "writing to a public directory" are completely irrelevant on iOS because there are no public directories. That section only applies to OS X. And frankly it's not really important there either.

So, the code you've written is as efficient as possible as long as the video fits in RAM (about 100MB would be a safe limit).
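As a minimal sketch of that approach in current Swift naming (Data and URL rather than NSData and NSURL), assuming the payload fits comfortably in memory. The helper name writeInBackground is hypothetical and not part of any Apple API:

```swift
import Foundation

/// Hypothetical helper: writes `data` to `url` atomically on a background queue.
/// `completion` is called on that same background queue with nil on success.
func writeInBackground(_ data: Data, to url: URL,
                       completion: @escaping (Error?) -> Void) {
    DispatchQueue.global(qos: .utility).async {
        do {
            // .atomic writes to a temporary file first, then moves it into place,
            // so readers never observe a half-written file.
            try data.write(to: url, options: .atomic)
            completion(nil)
        } catch {
            completion(error)
        }
    }
}
```

The caller is responsible for hopping back to the main queue inside the completion closure before touching any UI.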

For files that don't fit in RAM, you need to use a stream or your app will crash while holding the video in memory. To download a large video from a server and write it to disk, you should use NSURLSessionDownloadTask.

In general, streaming (including NSURLSessionDownloadTask) will be orders of magnitude slower than NSData.writeToURL(), so don't use a stream unless you need to. All operations on NSData are extremely fast. It is perfectly capable of dealing with files that are multiple terabytes in size with excellent performance on OS X (iOS obviously can't have files that large, but it's the same class with the same performance).


There are a few issues in your code.

This is wrong:

let filePath = NSTemporaryDirectory() + named

Instead always do:

let filePath = NSTemporaryDirectory().stringByAppendingPathComponent(named)

But that's not ideal either. You should avoid using paths (they are buggy and slow). Instead use a URL like this:

let tmpDir = NSURL(fileURLWithPath: NSTemporaryDirectory())!
let fileURL = tmpDir.URLByAppendingPathComponent(named)

Also, you're using a path to check if the file exists... don't do this:

if NSFileManager.defaultManager().fileExistsAtPath( filePath ) {

Instead use NSURL to check if it exists:

if fileURL.checkResourceIsReachableAndReturnError(nil) {
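For readers on a newer toolchain, the same URL-based steps can be sketched with current Swift naming. This assumes Swift 3 or later, and "video.mov" is a placeholder file name:

```swift
import Foundation

// Build the file URL from the temporary directory instead of concatenating paths.
// "video.mov" is a placeholder name used only for illustration.
let fileURL = FileManager.default.temporaryDirectory
    .appendingPathComponent("video.mov")

// checkResourceIsReachable() throws when the file cannot be reached,
// so try? maps "missing" (or any other error) to nil, which becomes false here.
let exists = (try? fileURL.checkResourceIsReachable()) ?? false
print(exists ? "file exists" : "no file at \(fileURL.lastPathComponent)")
```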


Latest Solution (2018)

Another useful possibility is to call a closure whenever the buffer is filled (or, if you record for a timed length, whenever that interval elapses) to append the data, and a second closure to announce the end of the stream of data. In combination with some of the Photo APIs this could lead to good outcomes. Declarations like the ones below could then be fired during processing:

var dataSpoolingFinished: ((URL?, Error?) -> Void)?
var dataSpooling: ((Data?, Error?) -> Void)?

Handling these closures in your management object may allow you to succinctly handle data of any size while keeping the memory under control.

Couple that idea with the use of a recursive method that aggregates pieces of work into a single dispatch_group and there could be some exciting possibilities.

Apple docs state:

DispatchGroup allows for aggregate synchronization of work. You can use them to submit multiple different work items and track when they all complete, even though they might run on different queues. This behavior can be helpful when progress can’t be made until all of the specified tasks are complete.
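A small sketch of that aggregation idea, under the assumption that each chunk can be processed independently. The queue label, the chunk count and the finishedChunks array are placeholders for illustration:

```swift
import Foundation

// Hypothetical example: three independent work items tracked by one group.
let group = DispatchGroup()
let queue = DispatchQueue(label: "example.spooling", attributes: .concurrent)
let lock = NSLock()
var finishedChunks = [Int]()

for chunkIndex in 0..<3 {
    // async(group:) enters the group now and leaves it when the closure returns.
    queue.async(group: group) {
        // Stand-in for processing or writing one chunk of data.
        lock.lock()
        finishedChunks.append(chunkIndex)
        lock.unlock()
    }
}

// wait() blocks the caller until every work item has completed.
// In an app, group.notify(queue:) is the non-blocking alternative.
group.wait()
print("chunks finished:", finishedChunks.sorted())
```

A closure such as dataSpoolingFinished could be called from group.notify(queue:) once all chunk work has drained.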

Other Noteworthy Solutions (~2016)

I have no doubt that I will refine this some more but the topic is complex enough to warrant a separate self-answer. I decided to take some advice from the other answers and leverage the NSStream subclasses. This solution is based on an Obj-C sample (NSInputStream inputStreamWithURL example ios, 2013, May 12) posted over on the SampleCodeBank blog.

Apple documentation notes that with an NSStream subclass you do NOT have to load all data into memory at once. That is the key to being able to manage multimedia files of any size (not exceeding available disk or RAM space).

NSStream is an abstract class for objects representing streams. Its interface is common to all Cocoa stream classes, including its concrete subclasses NSInputStream and NSOutputStream.

NSStream objects provide an easy way to read and write data to and from a variety of media in a device-independent way. You can create stream objects for data located in memory, in a file, or on a network (using sockets), and you can use stream objects without loading all of the data into memory at once.

File System Programming Guide

Apple's Processing an Entire File Linearly Using Streams article in the FSPG also provided the notion that NSInputStream and NSOutputStream should be inherently thread safe.
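To make the "no need to load everything into memory" point concrete, here is a sketch of a chunked file copy in current Swift naming (InputStream and OutputStream rather than the NS-prefixed classes). The function name copyFile and the 64 KB default buffer are assumptions chosen for illustration:

```swift
import Foundation

/// Hypothetical helper: copies `source` to `destination` in fixed-size chunks,
/// so only `bufferSize` bytes are held in memory at any time.
/// Returns the number of bytes copied.
func copyFile(from source: URL, to destination: URL,
              bufferSize: Int = 64 * 1024) throws -> Int {
    guard let input = InputStream(url: source),
          let output = OutputStream(url: destination, append: false) else {
        throw CocoaError(.fileNoSuchFile)
    }
    input.open()
    output.open()
    defer { input.close(); output.close() }

    var buffer = [UInt8](repeating: 0, count: bufferSize)
    var total = 0
    while true {
        let bytesRead = input.read(&buffer, maxLength: bufferSize)
        if bytesRead < 0 { throw input.streamError ?? CocoaError(.fileReadUnknown) }
        if bytesRead == 0 { break }   // end of file

        // write(_:maxLength:) may accept fewer bytes than offered, so loop
        // until the whole chunk has been handed to the output stream.
        var offset = 0
        while offset < bytesRead {
            let written = buffer[offset..<bytesRead].withUnsafeBufferPointer { ptr in
                output.write(ptr.baseAddress!, maxLength: ptr.count)
            }
            if written <= 0 { throw output.streamError ?? CocoaError(.fileWriteUnknown) }
            offset += written
        }
        total += bytesRead
    }
    return total
}
```

Because the function is synchronous, it can be called from any background queue without scheduling the streams on a run loop.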


Further Refinements

This object doesn't use the stream delegate methods. There is plenty of room for other refinements, but this is the basic approach I will take. The main focus on the iPhone is enabling large-file management while constraining memory use via a buffer (TBD: leverage the outputStream in-memory buffer). To be clear, Apple does mention that convenience methods such as writeToURL are only meant for smaller files, which makes me wonder why they don't take care of larger files as well. These are not edge cases (note: I will file this question as a bug).

Conclusion

I will have to test further for integrating on a background thread as I don't want to interfere with any NSStream internal queuing. I have some other objects that use similar ideas to manage extremely large data files over the wire. The best method is to keep file sizes as small as possible in iOS to conserve memory and prevent app crashes. The APIs are built with these constraints in mind (which is why attempting unlimited video is not a good idea), so I will have to adapt expectations overall.

(Gist Source, Check gist for latest changes)

import Foundation
import Darwin.Mach.mach_time

class MNGStreamReaderWriter: NSObject {

    var copyOutput: NSOutputStream?
    var fileInput: NSInputStream?
    var outputStream: NSOutputStream? = NSOutputStream(toMemory: ())
    var urlInput: NSURL?

    convenience init(srcURL: NSURL, targetURL: NSURL) {
        self.init()
        self.fileInput  = NSInputStream(URL: srcURL)
        self.copyOutput = NSOutputStream(URL: targetURL, append: false)
        self.urlInput   = srcURL
    }

    // Note: the copy goes to the targetURL passed to init; destURL is not used here.
    func copyFileURLToURL(destURL: NSURL, withProgressBlock block: (fileSize: Double, percent: Double, estimatedTimeRemaining: Double) -> ()) {

        guard let copyOutput = self.copyOutput, let fileInput = self.fileInput, let urlInput = self.urlInput else { return }

        let fileSize   = sizeOfInputFile(urlInput)
        let bufferSize = 4096
        let buffer     = UnsafeMutablePointer<UInt8>.alloc(bufferSize)
        defer { buffer.dealloc(bufferSize) }   // release the buffer on every exit path

        var counter  = 0
        var copySize = 0

        // mach_absolute_time() returns ticks; the timebase converts them to nanoseconds
        var timebase = mach_timebase_info_data_t()
        mach_timebase_info(&timebase)

        fileInput.open()
        copyOutput.open()

        //start time
        let time0 = mach_absolute_time()

        copyLoop: while fileInput.hasBytesAvailable {
            let bytesRead = fileInput.read(buffer, maxLength: bufferSize)
            //check for read errors
            if bytesRead < 0 {
                print(fileInput.streamStatus.rawValue)
                break
            }
            if bytesRead == 0 { break }   // end of file

            // write() may accept fewer bytes than offered, so loop until the chunk is flushed
            var offset = 0
            while offset < bytesRead {
                let bytesWritten = copyOutput.write(buffer + offset, maxLength: bytesRead - offset)
                //check for write errors
                if bytesWritten <= 0 {
                    print(copyOutput.streamStatus.rawValue)
                    break copyLoop
                }
                offset += bytesWritten
            }
            copySize += bytesRead

            counter += 1
            if let fileSize = fileSize where fileSize > 0 && counter % 10 == 0 {
                //pass back a progress tuple
                let percent      = Double(copySize) / Double(fileSize)
                let time1        = mach_absolute_time()
                let elapsedNanos = Double(time1 - time0) * Double(timebase.numer) / Double(timebase.denom)
                let elapsed      = elapsedNanos / Double(NSEC_PER_SEC)
                let estTimeLeft  = ((1 - percent) / percent) * elapsed
                block(fileSize: Double(copySize), percent: percent, estimatedTimeRemaining: estTimeLeft)
            }
        }

        //send final progress tuple
        block(fileSize: Double(copySize), percent: 1, estimatedTimeRemaining: 0)

        //close streams
        fileInput.close()
        copyOutput.close()
    }

    func sizeOfInputFile(src: NSURL) -> Int? {
        guard let path = src.path else { return nil }
        do {
            let attributes = try NSFileManager.defaultManager().attributesOfItemAtPath(path)
            // NSFileSize is the attribute key for the size in bytes
            return (attributes[NSFileSize] as? NSNumber)?.integerValue
        } catch let inputFileError as NSError {
            print(inputFileError.localizedDescription, inputFileError.localizedRecoverySuggestion)
        }
        return nil
    }
}

Delegation

Here's a similar object that I rewrote from an article (Advanced File I/O in the background, Eidhof, C., ObjC.io). With just a few tweaks this could be made to emulate the behavior above. Simply redirect the data to an NSOutputStream in the processDataChunk method.

(Gist Source - Check gist for latest changes)

import Foundation

class MNGStreamReader: NSObject, NSStreamDelegate {

    var callback: ((lineNumber: UInt, stringValue: String) -> ())?
    var completion: ((Int) -> Void)?
    var fileURL: NSURL?
    var inputData: NSData?
    var inputStream: NSInputStream?
    var lineNumber: UInt = 0
    var queue: NSOperationQueue?
    var remainder: NSMutableData?
    var delimiter: NSData?
    //var reader:NSInputStreamReader?

    func enumerateLinesWithBlock(block: (UInt, String) -> (), completionHandler completion: (numberOfLines: Int) -> Void) {

        if self.queue == nil {
            self.queue = NSOperationQueue()
            self.queue!.maxConcurrentOperationCount = 1
        }

        assert(self.queue!.maxConcurrentOperationCount == 1, "Queue can't be concurrent.")
        assert(self.inputStream == nil, "Cannot process multiple input streams in parallel")

        self.callback = block
        self.completion = completion

        if self.fileURL != nil {
            self.inputStream = NSInputStream(URL: self.fileURL!)
        } else if self.inputData != nil {
            self.inputStream = NSInputStream(data: self.inputData!)
        }

        self.inputStream!.delegate = self
        self.inputStream!.scheduleInRunLoop(NSRunLoop.currentRunLoop(), forMode: NSDefaultRunLoopMode)
        self.inputStream!.open()
    }

    convenience init?(withData inbound: NSData) {
        self.init()
        self.inputData = inbound
        self.delimiter = "\n".dataUsingEncoding(NSUTF8StringEncoding)
    }

    convenience init?(withFileAtURL fileURL: NSURL) {
        self.init()
        // only file URLs can be read this way
        guard fileURL.fileURL else { return nil }
        self.fileURL = fileURL
        self.delimiter = "\n".dataUsingEncoding(NSUTF8StringEncoding)
    }

    @objc func stream(aStream: NSStream, handleEvent eventCode: NSStreamEvent) {

        switch eventCode {
        case NSStreamEvent.OpenCompleted:
            // nothing to do until data arrives
            break
        case NSStreamEvent.EndEncountered:
            self.inputStream!.close()
            self.inputStream = nil
            // run on the serial queue so the last line is emitted after all pending chunks
            self.queue!.addOperationWithBlock({ () -> Void in
                self.emitLineWithData(self.remainder ?? NSData())
                self.remainder = nil
                self.completion!(Int(self.lineNumber) + 1)
            })
            break
        case NSStreamEvent.ErrorOccurred:
            NSLog("error")
            break
        case NSStreamEvent.HasSpaceAvailable:
            NSLog("HasSpaceAvailable")
            break
        case NSStreamEvent.HasBytesAvailable:
            NSLog("HasBytesAvailable")
            // length (not capacity) must be 4096, otherwise maxLength below would be 0
            if let buffer = NSMutableData(length: 4096) {
                let length = self.inputStream!.read(UnsafeMutablePointer<UInt8>(buffer.mutableBytes), maxLength: buffer.length)
                if 0 < length {
                    buffer.length = length
                    self.queue!.addOperationWithBlock({ [weak self] () -> Void in
                        self?.processDataChunk(buffer)
                    })
                }
            }
            break
        default:
            break
        }
    }

    func processDataChunk(buffer: NSMutableData) {
        if self.remainder != nil {
            self.remainder!.appendData(buffer)
        } else {
            self.remainder = buffer
        }

        // mng_enumerateComponentsSeparatedBy is an NSData extension that is not shown in this answer
        self.remainder!.mng_enumerateComponentsSeparatedBy(self.delimiter!, block: { (component: NSData, last: Bool) in
            if !last {
                self.emitLineWithData(component)
            } else {
                // keep the trailing partial line until the next chunk arrives
                if 0 < component.length {
                    self.remainder = (component.mutableCopy() as! NSMutableData)
                } else {
                    self.remainder = nil
                }
            }
        })
    }

    func emitLineWithData(data: NSData) {
        let lineNumber = self.lineNumber
        self.lineNumber = lineNumber + 1
        if 0 < data.length {
            if let line = NSString(data: data, encoding: NSUTF8StringEncoding) {
                callback!(lineNumber: lineNumber, stringValue: line as String)
            }
        }
    }
}


You should consider using NSStream (NSOutputStream/NSInputStream). If you choose this approach, keep in mind that the background thread's run loop will need to be started (run) explicitly.

NSOutputStream has a method called outputStreamToFileAtPath:append: which is what you might be looking for.
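In current Swift that method surfaces as the initializer OutputStream(toFileAtPath:append:). A minimal sketch of appending chunks with it follows. The helper name appendChunk is hypothetical, and the stream is used synchronously, so no run loop is needed for this particular pattern:

```swift
import Foundation

/// Hypothetical helper: appends one chunk of bytes to the file at `url`.
/// Returns false if the stream cannot be created or a write fails.
func appendChunk(_ chunk: [UInt8], to url: URL) -> Bool {
    guard let stream = OutputStream(toFileAtPath: url.path, append: true) else {
        return false
    }
    stream.open()
    defer { stream.close() }

    // write(_:maxLength:) may accept fewer bytes than offered, so loop.
    var offset = 0
    while offset < chunk.count {
        let written = chunk[offset...].withUnsafeBufferPointer {
            stream.write($0.baseAddress!, maxLength: $0.count)
        }
        if written <= 0 { return false }
        offset += written
    }
    return true
}
```

Opening and closing the stream per chunk keeps the sketch short. For many chunks, keeping one stream open for the whole transfer would avoid the repeated open cost.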

Similar question:

Writing a String to an NSOutputStream in Swift