How to efficiently write large files to disk on background thread (Swift)

ios swift multithreading large-files large-data

Performance depends on whether or not the data fits in RAM. If it does, you should use NSData's writeToURL(_:atomically:) with atomic writing turned on, which is what you're doing.

Apple's notes about this being dangerous when "writing to a public directory" are completely irrelevant on iOS because there are no public directories. That section only applies to OS X. And frankly it's not really important there either.

So, the code you've written is as efficient as possible as long as the video fits in RAM (about 100MB would be a safe limit).
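Here is a minimal sketch in current Swift syntax (Swift 5, where `Data` is the value-type counterpart of `NSData`). It moves the same atomic write off the main thread using a global dispatch queue. The helper name `writeInBackground` and the file name are hypothetical. The completion runs on the background queue, so UI code would need to hop back to the main queue.

```swift
import Foundation

/// Writes `data` to `url` on a background queue, then reports the result.
/// `.atomic` writes to an auxiliary file first and then replaces the target,
/// so readers never see a half-written file.
func writeInBackground(_ data: Data,
                       to url: URL,
                       completion: @escaping (Error?) -> Void) {
    DispatchQueue.global(qos: .utility).async {
        do {
            try data.write(to: url, options: .atomic)
            completion(nil)
        } catch {
            completion(error)
        }
    }
}

// Usage: write 1 MB of zeros to the temporary directory and wait for the result.
let target = FileManager.default.temporaryDirectory.appendingPathComponent("video.bin")
let payload = Data(count: 1_048_576)
let done = DispatchSemaphore(value: 0)
var writeError: Error?
writeInBackground(payload, to: target) { error in
    writeError = error
    done.signal()
}
done.wait()   // only for this demo; an app would not block here
```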

For files that don't fit in RAM, you need to use a stream or your app will crash while holding the video in memory. To download a large video from a server and write it to disk, you should use NSURLSessionDownloadTask.
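When the data arrives piece by piece, a stream-style write with `FileHandle` keeps only the current chunk in memory. The sketch below is not `NSURLSessionDownloadTask`, which already writes its download to a temporary file for you. It assumes a hypothetical `nextChunk` producer, such as a recorder callback, that returns `nil` when the data ends.

```swift
import Foundation

/// Appends each chunk to the file as it arrives, so only one chunk is in memory.
/// `nextChunk` is a hypothetical producer; it returns nil when the data ends.
func streamChunks(to url: URL, nextChunk: () -> Data?) throws {
    // FileHandle(forWritingTo:) needs an existing file; createFile creates or truncates it.
    FileManager.default.createFile(atPath: url.path, contents: nil)
    let handle = try FileHandle(forWritingTo: url)
    defer { handle.closeFile() }
    while let chunk = nextChunk() {
        handle.write(chunk)   // legacy API: raises (rather than throws) on I/O errors
    }
}

// Usage: write ten 64 KB chunks without ever holding all 640 KB at once.
let out = FileManager.default.temporaryDirectory.appendingPathComponent("chunks.bin")
var remaining = 10
try streamChunks(to: out) {
    guard remaining > 0 else { return nil }
    remaining -= 1
    return Data(repeating: UInt8(remaining), count: 65_536)
}
```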

In general, streaming (including NSURLSessionDownloadTask) will be orders of magnitude slower than NSData.writeToURL(), so don't use a stream unless you need to. All operations on NSData are extremely fast. On OS X it handles files multiple terabytes in size with excellent performance (iOS obviously can't have files that large, but it's the same class with the same performance).

There are a few issues in your code.

This is wrong:

let filePath = NSTemporaryDirectory() + named

Instead always do:

let filePath = (NSTemporaryDirectory() as NSString).appendingPathComponent(named)

But that's not ideal either: you should avoid using paths (they are buggy and slow). Instead, use a URL like this:

let tmpDir = URL(fileURLWithPath: NSTemporaryDirectory())
let fileURL = tmpDir.appendingPathComponent(named)

Also, you're using a path to check if the file exists... don't do this:

if FileManager.default.fileExists(atPath: filePath) {

Instead, use the URL to check whether it exists:

if (try? fileURL.checkResourceIsReachable()) == true {
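The next sketch, in current Swift, puts the URL-based pieces together. It builds a temporary file URL and clears out any stale file before writing. `freshTemporaryURL(named:)` is a hypothetical helper. `checkResourceIsReachable()` throws when nothing exists at the URL, so `try?` turns that case into `nil`.

```swift
import Foundation

/// Builds a file URL in the temporary directory and removes any stale file there.
/// The helper name is hypothetical; `named` mirrors the variable in the question.
func freshTemporaryURL(named: String) throws -> URL {
    let tmpDir = URL(fileURLWithPath: NSTemporaryDirectory(), isDirectory: true)
    let fileURL = tmpDir.appendingPathComponent(named)
    // checkResourceIsReachable() throws when the file is missing; try? maps that to nil.
    if (try? fileURL.checkResourceIsReachable()) == true {
        try FileManager.default.removeItem(at: fileURL)
    }
    return fileURL
}

// Usage: create a file, then ask for a fresh URL with the same name.
let url = try freshTemporaryURL(named: "clip.mov")
try Data("old".utf8).write(to: url)
let again = try freshTemporaryURL(named: "clip.mov")
```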

Latest Solution (2018)

Another useful option is to invoke a closure whenever the buffer fills (or when a timed recording segment ends) to append the data, and another closure to announce the end of the data stream. Combined with some of the Photo APIs, this could lead to good outcomes. Declarations like the following could be fired during processing:

var dataSpoolingFinished: ((URL?, Error?) -> Void)?
var dataSpooling: ((Data?, Error?) -> Void)?

Handling these closures in your management object may allow you to succinctly handle data of any size while keeping the memory under control.
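Here is a minimal sketch of that arrangement. It assumes a producer (a hypothetical `Recorder`) that fires the two closures, and a hypothetical `SpoolManager` that handles them by appending each buffer to a file with `FileHandle`. Memory use stays at roughly one buffer regardless of the total size.

```swift
import Foundation

/// Hypothetical producer (e.g. a recorder): fires `dataSpooling` each time its
/// buffer fills and `dataSpoolingFinished` when the stream of data ends.
final class Recorder {
    var dataSpoolingFinished: ((URL?, Error?) -> Void)?
    var dataSpooling: ((Data?, Error?) -> Void)?

    /// Simulates `count` full buffers of `size` bytes, then the end of the stream.
    func simulate(buffers count: Int, size: Int, output: URL) {
        for _ in 0..<count { dataSpooling?(Data(count: size), nil) }
        dataSpoolingFinished?(output, nil)
    }
}

/// Hypothetical management object: handles the closures by appending each
/// buffer to a file, so only one buffer is held in memory at a time.
final class SpoolManager {
    private let handle: FileHandle
    private(set) var bytesWritten = 0
    private(set) var finishedURL: URL?

    init(recorder: Recorder, fileURL: URL) throws {
        // FileHandle(forWritingTo:) needs an existing file; createFile creates or truncates it.
        FileManager.default.createFile(atPath: fileURL.path, contents: nil)
        handle = try FileHandle(forWritingTo: fileURL)
        recorder.dataSpooling = { [weak self] data, error in
            guard let self = self, error == nil, let data = data else { return }
            self.handle.write(data)
            self.bytesWritten += data.count
        }
        recorder.dataSpoolingFinished = { [weak self] url, _ in
            guard let self = self else { return }
            self.handle.closeFile()
            self.finishedURL = url
        }
    }
}

// Usage: three 4 KB buffers followed by the end-of-stream callback.
let spoolURL = FileManager.default.temporaryDirectory.appendingPathComponent("spool.bin")
let recorder = Recorder()
let manager = try SpoolManager(recorder: recorder, fileURL: spoolURL)
recorder.simulate(buffers: 3, size: 4096, output: spoolURL)
```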

Couple that idea with a recursive method that aggregates pieces of work into a single DispatchGroup (dispatch_group), and there could be some exciting possibilities.

Apple docs state:

DispatchGroup allows for aggregate synchronization of work. You can use them to submit multiple different work items and track when they all complete, even though they might run on different queues. This behavior can be helpful when progress can’t be made until all of the specified tasks are complete.
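The following is a simpler, non-recursive sketch of the `DispatchGroup` idea. Each chunk is written to its own part file on a global queue, and the group tracks when all of the writes are done. The `writeParts` helper and the `part-N.bin` naming are hypothetical, and write errors are ignored to keep the example short.

```swift
import Foundation

/// Writes each chunk to its own part file concurrently and returns once every
/// write has finished. Helper and file naming are hypothetical; errors are ignored.
func writeParts(_ parts: [Data], into directory: URL) -> [URL] {
    let group = DispatchGroup()
    let urls = parts.indices.map { directory.appendingPathComponent("part-\($0).bin") }
    for (index, part) in parts.enumerated() {
        // async(group:) associates the work item with the group until it finishes.
        DispatchQueue.global(qos: .utility).async(group: group) {
            // Each work item touches only its own file, so no locking is needed.
            try? part.write(to: urls[index], options: .atomic)
        }
    }
    // Blocks the calling thread; in an app, group.notify(queue:) avoids blocking.
    group.wait()
    return urls
}

let written = writeParts([Data(count: 10), Data(count: 20), Data(count: 30)],
                         into: FileManager.default.temporaryDirectory)
```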

File System Programming Guide

Apple's "Processing an Entire File Linearly Using Streams" article in the File System Programming Guide (FSPG) also suggests that NSInputStream and NSOutputStream should be inherently thread safe.

Further Refinements

This object doesn't use stream delegate methods. There is plenty of room for other refinements, but this is the basic approach I will take. On the iPhone, the main focus is managing large files while constraining memory use with a buffer (TBD: leverage the outputStream in-memory buffer). To be clear, Apple does say that its writeToURL convenience methods are intended only for smaller files. That makes me wonder why they don't handle larger files too; these are not edge cases, and I will file the question as a bug.
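For the TBD item above, here is a sketch of how an in-memory `OutputStream` buffers data. Bytes written to it accumulate in memory and can be read back through the `dataWrittenToMemoryStreamKey` property. `writeAll` is a hypothetical helper; it loops because `write(_:maxLength:)` may accept fewer bytes than requested.

```swift
import Foundation

/// Writes all of `data` to an open output stream, looping because
/// write(_:maxLength:) may accept fewer bytes than requested.
/// Returns false on a stream error. (Hypothetical helper.)
func writeAll(_ data: Data, to stream: OutputStream) -> Bool {
    return data.withUnsafeBytes { (raw: UnsafeRawBufferPointer) -> Bool in
        guard let base = raw.bindMemory(to: UInt8.self).baseAddress else { return true } // empty data
        var offset = 0
        while offset < raw.count {
            let written = stream.write(base + offset, maxLength: raw.count - offset)
            if written <= 0 { return false }
            offset += written
        }
        return true
    }
}

// An in-memory output stream accumulates everything written to it.
let memoryStream = OutputStream(toMemory: ())
memoryStream.open()
let ok = writeAll(Data("chunk 1\n".utf8), to: memoryStream)
    && writeAll(Data("chunk 2\n".utf8), to: memoryStream)
// Read the accumulated bytes back (before closing) via the memory-stream property.
let buffered = memoryStream.property(forKey: .dataWrittenToMemoryStreamKey)
let bufferedData = (buffered as? Data) ?? (buffered as? NSData).map { Data(referencing: $0) }
memoryStream.close()
```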

Conclusion

I will have to test further before integrating this on a background thread, as I don't want to interfere with NSStream's internal queuing. I have some other objects that use similar ideas to manage extremely large data files over the wire. The best approach on iOS is to keep file sizes as small as possible to conserve memory and prevent app crashes. The APIs are built with these constraints in mind (which is why attempting unlimited-length video is not a good idea), so I will have to adjust my expectations overall.

(Gist source; check the gist for the latest changes.)

import Foundation

class MNGStreamReaderWriter: NSObject {
    var copyOutput: OutputStream?
    var fileInput: InputStream?
    var outputStream: OutputStream? = OutputStream(toMemory: ())
    var urlInput: URL?

    convenience init(srcURL: URL, targetURL: URL) {
        self.init()
        self.fileInput  = InputStream(url: srcURL)
        self.copyOutput = OutputStream(url: targetURL, append: false)
        self.urlInput   = srcURL
    }

    /// Copies the source file to the target given in init (destURL is unused),
    /// 4 KB at a time, reporting progress every 10 chunks.
    func copyFileURLToURL(destURL: URL,
                          withProgressBlock block: (_ fileSize: Double, _ percent: Double, _ estimatedTimeRemaining: Double) -> Void) {
        guard let copyOutput = copyOutput, let fileInput = fileInput, let urlInput = urlInput else { return }
        let fileSize   = sizeOfInputFile(urlInput)
        let bufferSize = 4096
        let buffer     = UnsafeMutablePointer<UInt8>.allocate(capacity: bufferSize)
        var counter    = 0
        var copySize   = 0

        fileInput.open()
        copyOutput.open()
        defer {
            // Always free the buffer and close both streams, even after an error.
            buffer.deallocate()
            fileInput.close()
            copyOutput.close()
        }

        // Start time (Date replaces mach_absolute_time(), whose ticks are not always nanoseconds).
        let start = Date()

        while fileInput.hasBytesAvailable {
            let bytesRead = fileInput.read(buffer, maxLength: bufferSize)
            if bytesRead < 0 {
                print("Read failed:", String(describing: fileInput.streamError))
                return
            }
            if bytesRead == 0 { break }   // end of file

            // Write only the bytes actually read, looping until the chunk is fully written.
            var offset = 0
            while offset < bytesRead {
                let written = copyOutput.write(buffer + offset, maxLength: bytesRead - offset)
                if written <= 0 {
                    print("Write failed:", String(describing: copyOutput.streamError))
                    return
                }
                offset += written
            }
            copySize += bytesRead

            counter += 1
            if let fileSize = fileSize, fileSize > 0, counter % 10 == 0 {
                // Pass back a progress tuple; floating-point division (Int division always gave 0).
                let percent     = Double(copySize) / Double(fileSize)
                let elapsed     = Date().timeIntervalSince(start)
                let estTimeLeft = ((1 - percent) / percent) * elapsed
                block(Double(copySize), percent, estTimeLeft)
            }
        }
        // Send the final progress tuple.
        block(Double(copySize), 1, 0)
    }

    func sizeOfInputFile(_ src: URL) -> Int? {
        do {
            let attributes = try FileManager.default.attributesOfItem(atPath: src.path)
            // The attribute key is FileAttributeKey.size, not the string "fileSize".
            return (attributes[.size] as? NSNumber)?.intValue
        } catch {
            print(error.localizedDescription)
            return nil
        }
    }
}

Delegation

Here's a similar object that I rewrote from an article on Advanced File I/O in the background (Eidhof, C., ObjC.io). With just a few tweaks it could be made to emulate the behavior above: simply redirect the data to an NSOutputStream in the processDataChunk method.

(Gist source; check the gist for the latest changes.)

import Foundation

class MNGStreamReader: NSObject, StreamDelegate {
    var callback: ((_ lineNumber: UInt, _ stringValue: String) -> Void)?
    var completion: ((Int) -> Void)?
    var fileURL: URL?
    var inputData: Data?
    var inputStream: InputStream?
    var lineNumber: UInt = 0
    var queue: OperationQueue?
    var remainder: Data?
    var delimiter = Data("\n".utf8)

    convenience init(withData inbound: Data) {
        self.init()
        self.inputData = inbound
    }

    convenience init?(withFileAtURL fileURL: URL) {
        // Accept only file URLs (the original guard was inverted and rejected them).
        guard fileURL.isFileURL else { return nil }
        self.init()
        self.fileURL = fileURL
    }

    /// Reads the input line by line. Stream events arrive on the current thread's
    /// run loop (which must be running); lines are parsed on a serial queue.
    func enumerateLinesWithBlock(_ block: @escaping (UInt, String) -> Void,
                                 completionHandler completion: @escaping (_ numberOfLines: Int) -> Void) {
        if queue == nil {
            let serialQueue = OperationQueue()
            serialQueue.maxConcurrentOperationCount = 1
            queue = serialQueue
        }
        assert(queue!.maxConcurrentOperationCount == 1, "Queue can't be concurrent.")
        assert(inputStream == nil, "Cannot process multiple input streams in parallel")
        callback = block
        self.completion = completion

        if let url = fileURL {
            inputStream = InputStream(url: url)
        } else if let data = inputData {
            inputStream = InputStream(data: data)
        }
        guard let stream = inputStream else { return }
        stream.delegate = self
        stream.schedule(in: .current, forMode: .default)
        stream.open()
    }

    func stream(_ aStream: Stream, handle eventCode: Stream.Event) {
        switch eventCode {
        case .openCompleted:
            break   // the original fell through to .endEncountered here
        case .hasBytesAvailable:
            // Read into a 4 KB array (the original's NSMutableData(capacity:) had length 0).
            let bufferSize = 4096
            var buffer = [UInt8](repeating: 0, count: bufferSize)
            guard let input = aStream as? InputStream else { break }
            let length = input.read(&buffer, maxLength: bufferSize)
            guard length > 0 else { break }
            let chunk = Data(buffer[0..<length])
            queue?.addOperation { [weak self] in self?.processDataChunk(chunk) }
        case .endEncountered:
            aStream.close()
            aStream.remove(from: .current, forMode: .default)
            inputStream = nil
            // Flush the last partial line on the serial queue, after all pending chunks.
            queue?.addOperation { [weak self] in
                guard let self = self else { return }
                if let rest = self.remainder { self.emitLineWithData(rest) }
                self.remainder = nil
                self.completion?(Int(self.lineNumber))
            }
        case .errorOccurred:
            print("Stream error:", String(describing: aStream.streamError))
        default:
            break
        }
    }

    /// Emits every complete line in the accumulated data and keeps the trailing
    /// partial line. The original called a custom mng_enumerateComponentsSeparatedBy
    /// helper that isn't shown; this inlines the split with Data.range(of:).
    func processDataChunk(_ buffer: Data) {
        var pending = (remainder ?? Data()) + buffer
        while let range = pending.range(of: delimiter) {
            emitLineWithData(pending.subdata(in: pending.startIndex..<range.lowerBound))
            pending = pending.subdata(in: range.upperBound..<pending.endIndex)
        }
        remainder = pending.isEmpty ? nil : pending
    }

    func emitLineWithData(_ data: Data) {
        let current = lineNumber
        lineNumber = current + 1
        if !data.isEmpty, let line = String(data: data, encoding: .utf8) {
            callback?(current, line)
        }
    }
}

You should consider using NSStream (NSOutputStream/NSInputStream). If you choose this approach, keep in mind that a background thread's run loop must be started (run) explicitly.

NSOutputStream has a method called outputStreamToFileAtPath:append:, which may be what you're looking for.
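In Swift, the corresponding initializer is `OutputStream(toFileAtPath:append:)`. If you want to skip the run-loop setup, one option is to write synchronously on a private serial queue, so the stream is only ever touched from that queue. This is a sketch, not the only possible design. `BackgroundFileWriter` and its queue label are hypothetical names.

```swift
import Foundation

/// Appends chunks through OutputStream(toFileAtPath:append:) on a private serial
/// queue. Writes are synchronous, so no run loop or delegate is needed.
/// The class name and queue label are hypothetical.
final class BackgroundFileWriter {
    private let queue = DispatchQueue(label: "example.file-writer")
    private let stream: OutputStream

    init?(path: String, append: Bool) {
        guard let stream = OutputStream(toFileAtPath: path, append: append) else { return nil }
        self.stream = stream
        queue.async { stream.open() }
    }

    func write(_ chunk: Data) {
        let stream = self.stream
        queue.async {
            chunk.withUnsafeBytes { (raw: UnsafeRawBufferPointer) -> Void in
                guard let base = raw.bindMemory(to: UInt8.self).baseAddress else { return }
                var offset = 0
                // write(_:maxLength:) may write fewer bytes than asked, so loop.
                while offset < raw.count {
                    let n = stream.write(base + offset, maxLength: raw.count - offset)
                    if n <= 0 { return }   // error or no progress: give up in this sketch
                    offset += n
                }
            }
        }
    }

    /// Closes the stream after all queued writes; `completion` runs on the writer queue.
    func close(completion: @escaping () -> Void) {
        let stream = self.stream
        queue.async {
            stream.close()
            completion()
        }
    }
}

// Usage: queue two writes, then wait for the close to finish.
let logPath = FileManager.default.temporaryDirectory.appendingPathComponent("stream.txt").path
let writer = BackgroundFileWriter(path: logPath, append: false)!
writer.write(Data("hello ".utf8))
writer.write(Data("world".utf8))
let closed = DispatchSemaphore(value: 0)
writer.close { closed.signal() }
closed.wait()
```

Because every stream call runs on the same serial queue, writes stay in order and the stream is never used from two threads at once.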

CodeHunter
