Taming the AWS framework to upload a large file to S3

Warning: this is not the cleanest code I have ever written, as it was for a proof of concept, but I'm hoping someone will find it useful. Please add error handling; I removed it for clarity :-).


For TenStats, I had to upload video files of about 300 to 400 MB to S3. As you can imagine, that can take some time. iOS has some nice features for networking in the background, both downloading and uploading, so I was hopeful.

Then I discovered the AWS Transfer Utility and was thrilled. It worked great, with one big exception: if the upload took more than an hour, it failed. I did not find this out by running into the issue myself, but because my users did. They were using the app in real life, and the real world does not always include a nice, stable Wi-Fi connection...

After digging around a bit, I found that the AWS Transfer Utility uses Cognito identities to create temporary credentials that allow you to upload the file. Those credentials have a limited lifespan of 60 minutes.

From there, I realized that if I wanted to reliably upload a large file to AWS S3 in the background, I needed to roll up my sleeves and do some dirty work. To keep this post short, I'll leave out the NSURLSession background setup, as it adds another layer of complexity; I'll save that for another post.

Overview

If you work with the AWS framework you probably know this already, but in my opinion it is a very verbose framework: you have to go through multiple small steps to get anything working.

For example, if you want to upload a file, you'll need to:

  • Configure the AWS framework
  • Create a request telling AWS you are starting a multipart upload
  • Run the request you just created
  • Create a request asking AWS for a pre-signed URL for each part
  • Run the request you just created
  • Upload each part to the pre-signed URL you got
  • Create a request telling AWS that you uploaded everything
  • Run the request you just created

Yes, that is a lot of steps. For simpler scenarios, I guess you can stick with the AWS Transfer Utility...

Configure AWS

First, the AWS framework needs to be configured so it can generate pre-signed URLs that are valid for more than 60 minutes. The recommended way to do this is on your own servers. I did not have any servers available for this, and I was doing a proof of concept anyway, so I generated those links in the app.


//create a credentials provider to tell AWS how to sign those URLs
let credentialsProvider = AWSStaticCredentialsProvider(accessKey: accessKey, secretKey: secretKey)

//create a service configuration with the credentials provider we just created
let configuration = AWSServiceConfiguration(region: .USEast1, credentialsProvider: credentialsProvider)

//set this as the default configuration
//this way, any time the AWS framework needs credential
//information, it will get it from the credentials provider we just created
AWSServiceManager.default().defaultServiceConfiguration = configuration

The accessKey and secretKey variables, which hold the credentials, were retrieved from my Parse server. I did not store those pieces of information in the app for security reasons, but as I mentioned before, this is not the most secure way of doing things either.
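Why does signing with long-term keys help? According to the AWS documentation, a pre-signed URL generated with temporary credentials stops working when those credentials expire, whatever expiry you asked for, which is consistent with the one-hour failures I was seeing. Signing with long-term keys removes that ceiling, but Signature Version 4 still caps a pre-signed URL at 7 days (604,800 seconds). Here is a minimal sketch of computing an expiry date within that cap; `presignedExpiry(lifetime:from:)` is a hypothetical helper, and clamping the value (rather than failing) is an assumption.

```swift
import Foundation

/// SigV4 pre-signed URLs can be valid for at most 7 days (604,800 seconds).
let maxPresignedLifetime: TimeInterval = 7 * 24 * 60 * 60

/// Hypothetical helper: expiry date for a pre-signed URL, with the requested
/// lifetime clamped to between 1 second and the 7-day maximum.
func presignedExpiry(lifetime: TimeInterval, from now: Date = Date()) -> Date {
    let clamped = min(max(lifetime, 1), maxPresignedLifetime)
    return now.addingTimeInterval(clamped)
}

// The 36 hours used later in this post is well under the limit.
let start = Date(timeIntervalSince1970: 0)
print(presignedExpiry(lifetime: 36 * 60 * 60, from: start).timeIntervalSince(start))  // 129600.0
```

With the SDK, the resulting `Date` is what you would assign to the pre-signed URL request's `expires` property.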

Create a multipart upload

Realistically, I did not want to upload the whole file in one go: if the connection to S3 dropped for some reason, the entire upload would need to be restarted. In comes multipart upload.
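S3 puts a few limits on multipart uploads: every part except the last must be at least 5 MiB, no part may exceed 5 GiB, and an upload can have at most 10,000 parts (numbered 1 to 10,000). Here is a minimal sketch of picking a chunk size that respects those limits. `chooseChunkSize(fileSize:preferred:)` is a hypothetical helper, not part of the AWS SDK, and the "double it until it fits" policy is just one reasonable choice.

```swift
import Foundation

// S3 multipart limits: every part but the last is at least 5 MiB,
// no part exceeds 5 GiB, and an upload has at most 10,000 parts.
let minPartSize: Int64 = 5 * 1024 * 1024
let maxPartSize: Int64 = 5 * 1024 * 1024 * 1024
let maxPartCount: Int64 = 10_000

/// Hypothetical helper: raises the preferred chunk size to the 5 MiB minimum,
/// then doubles it until the file fits in 10,000 parts.
/// Returns nil if that would need parts larger than 5 GiB.
func chooseChunkSize(fileSize: Int64, preferred: Int64) -> Int64? {
    var chunkSize = max(preferred, minPartSize)
    while (fileSize + chunkSize - 1) / chunkSize > maxPartCount {
        chunkSize *= 2
    }
    return chunkSize <= maxPartSize ? chunkSize : nil
}

// A 400 MB video with 5 MiB chunks needs 77 parts.
let videoSize: Int64 = 400_000_000
if let chunkSize = chooseChunkSize(fileSize: videoSize, preferred: 5 * 1024 * 1024) {
    print(chunkSize, (videoSize + chunkSize - 1) / chunkSize)  // 5242880 77
}
```

For the 300 to 400 MB videos in this post, the 5 MiB minimum alone keeps you far below the 10,000-part limit.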


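//create a request to start a multipart upload
let multipart = AWSS3CreateMultipartUploadRequest()

//in AWS S3 parlance the key is the name of the file; it needs to be unique
//within the bucket, otherwise the upload overwrites the existing object
//(use the same key and bucket for every request of this upload)
multipart.key = self.fileName  //e.g. "myGreatFile"

//tell AWS which bucket you want to upload to
multipart.bucket = self.bucketName  //e.g. "mygreatbucket"

//and the content type of the file you are uploading (in my case an MP4 video)
multipart.contentType = self.contentType  //e.g. "video/mp4"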

The code above only builds the request to create a multipart upload; it does not run it. So the next step is to actually create the upload on the server by running that request.

Start a multipart upload


//access the default AWS S3 object, which is configured appropriately
let awsService = AWSS3.default()

//actually create the multipart upload using the multipart request we created earlier
awsService.createMultipartUpload(multipart).continueWith(block: { (task: AWSTask<AWSS3CreateMultipartUploadOutput>) -> Any? in
    //get the ID that AWS uses to uniquely identify this upload, as you'll need it later
    //(error handling omitted: check task.error before using task.result)
    self.multipartUploadId = task.result?.uploadId

    //as individual parts complete you'll want to keep track of them,
    //as AWS S3 requires the list of all parts to be able to reassemble the file
    self.completedPartsInfo = AWSS3CompletedMultipartUpload()
    //start with an empty list so completed parts can be appended to it
    self.completedPartsInfo.parts = []

    //now that we have an upload ID we can actually start uploading the parts
    self.uploadAllParts()
    return nil
})

I obviously left all the error handling code out for clarity, but you get the idea. Once the request completes, you have a unique ID for your upload so AWS knows where to put your data.

Create the parts to upload

The next step is simply to split the file into multiple parts to be sent to AWS. One thing to know about a background NSURLSession is that you need to schedule all the requests at the beginning of the session. You can't add more tasks as the previous ones complete (well, you can, but you'll run into issues because each added request gets delayed). I discovered this the hard way...


func uploadAllParts()
{
    //get the size of the file to upload
    let fileAttributes = try! FileManager.default.attributesOfItem(atPath: self.fileURL().path)
    let fileSizeNumber = fileAttributes[FileAttributeKey.size] as! NSNumber
    let fileSize = fileSizeNumber.int64Value

    //figure out how many parts we're going to have, rounding up:
    //a 5 MB file with 10 MB chunks is 1 part, a 20 MB file with 10 MB chunks is exactly 2
    let chunkSize = Int64(self.chunckSize)
    let partsCount = Int((fileSize + chunkSize - 1) / chunkSize)
    guard partsCount > 0 else { return }

    //create a part for each chunk; AWS part numbers start at 1
    for chunkIndex in 1...partsCount
    {
        //reading from the file allocates a buffer the same size as the chunk,
        //so release it after each part to avoid running out of memory
        autoreleasepool {
            self.uploadAWSPart(chunkIndex)
        }
    }
}
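The post relies on a `fileForChunk(_:)` helper that isn't shown. Here is a minimal sketch of what it could do, assuming parts are numbered from 1 and every part except the last is exactly `chunkSize` bytes: work out the part's byte range with ceiling division, then copy just that range into its own temporary file, since background upload tasks upload from files. `partRange` and `writeChunk` are hypothetical names.

```swift
import Foundation

/// Byte range of a 1-based part, or nil if the file has no such part.
/// Ceiling division means a file that is an exact multiple of the chunk
/// size does not get an extra, empty part.
func partRange(partNumber: Int, fileSize: Int64, chunkSize: Int64) -> Range<Int64>? {
    let partCount = (fileSize + chunkSize - 1) / chunkSize
    guard partNumber >= 1, Int64(partNumber) <= partCount else { return nil }
    let start = Int64(partNumber - 1) * chunkSize
    return start..<min(start + chunkSize, fileSize)
}

/// Hypothetical stand-in for fileForChunk(_:): copies one part of `source`
/// into its own temporary file, reading only that part into memory.
func writeChunk(of source: URL, partNumber: Int, chunkSize: Int64) throws -> URL? {
    let handle = try FileHandle(forReadingFrom: source)
    defer { handle.closeFile() }
    let fileSize = Int64(handle.seekToEndOfFile())
    guard let range = partRange(partNumber: partNumber, fileSize: fileSize, chunkSize: chunkSize) else {
        return nil
    }
    handle.seek(toFileOffset: UInt64(range.lowerBound))
    let data = handle.readData(ofLength: Int(range.upperBound - range.lowerBound))
    let chunkURL = FileManager.default.temporaryDirectory
        .appendingPathComponent("\(source.lastPathComponent).part\(partNumber)")
    try data.write(to: chunkURL, options: .atomic)
    return chunkURL
}
```

In the post's terms, `self.fileForChunk(awsPartNumber)` would roughly correspond to `try writeChunk(of: self.fileURL(), partNumber: awsPartNumber, chunkSize: Int64(self.chunckSize))`.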

Create the parts' upload requests

In the last step we looped over every part; in this section, we'll actually create each part's upload request.


func uploadAWSPart(_ awsPartNumber: Int)
{
    //Get a presigned URL for AWS S3
    let getPreSignedURLRequest = AWSS3GetPreSignedURLRequest()

    //specify which bucket we want to upload to
    getPreSignedURLRequest.bucket = self.bucketName

    //specify the name (key) of the file
    getPreSignedURLRequest.key = self.fileName

    //for upload, we need to do a PUT
    getPreSignedURLRequest.httpMethod = .PUT

    //this is where the magic happens: you can specify how long you want
    //this pre-signed URL to be valid for, in this case 36 hours
    getPreSignedURLRequest.expires = Date(timeIntervalSinceNow: 36 * 60 * 60)

    //Important: set contentType for a PUT request, and send the same value when uploading
    getPreSignedURLRequest.contentType = self.contentType

    //tell AWS which upload this part belongs to, using the ID we got earlier
    getPreSignedURLRequest.setValue(self.multipartUploadId, forRequestParameter: "uploadId")

    //tell AWS the index of this part; request parameters end up in the
    //URL's query string, which is why the value has to be a String
    getPreSignedURLRequest.setValue(String(awsPartNumber), forRequestParameter: "partNumber")

    //generate the file for the current chunk
    //NSURLSession can only upload from files when working in the background,
    //so we need to create a file containing just this part
    let chunkURL = self.fileForChunk(awsPartNumber)

    //AWS wants an MD5 hash of the part to make sure everything got transferred OK
    let md5 = (try? Data(contentsOf: chunkURL))?.base64MD5()
    getPreSignedURLRequest.contentMD5 = md5

    //create a presigned URL request for this specific chunk
    let presignedTask = AWSS3PreSignedURLBuilder.default().getPreSignedURL(getPreSignedURLRequest)

    //run the request to get a presigned URL
    presignedTask.continueWith(executor: AWSExecutor.mainThread(), block: { (task: AWSTask<NSURL>) -> Any? in
        if let presignedURL = task.result
        {
            //we now have the URL we can use to upload this chunk...
            self.startUploadForPresignedURL(presignedURL as URL, chunkURL: chunkURL, awsPartNumber: awsPartNumber)
        }
        return nil
    })
}
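The code above calls a `base64MD5()` extension on `Data` that isn't shown. S3 expects Content-MD5 to be the base64 encoding of the raw 16-byte MD5 digest (not of its hex string). On Apple platforms you would normally get the digest from CryptoKit's `Insecure.MD5` or from CommonCrypto; the sketch below instead implements RFC 1321 MD5 directly so it has no dependencies. It is a hypothetical stand-in for the helper, not the code used in the app.

```swift
import Foundation

extension Data {
    /// Hypothetical stand-in for base64MD5(): MD5 (RFC 1321) of the data,
    /// base64-encoded as the Content-MD5 header expects.
    func base64MD5() -> String {
        // Per-step left-rotation amounts.
        let shifts: [UInt32] = [7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22,
                                5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20,
                                4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23,
                                6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21]
        // Constants defined by the RFC as floor(|sin(i + 1)| * 2^32).
        let constants: [UInt32] = (0..<64).map { i in
            UInt32(UInt64(abs(sin(Double(i + 1))) * 4_294_967_296.0))
        }

        // Pad: 0x80, zeros up to 56 mod 64, then the bit length (little-endian).
        var message = [UInt8](self)
        let bitLength = UInt64(message.count) &* 8
        message.append(0x80)
        while message.count % 64 != 56 { message.append(0) }
        for i in 0..<8 { message.append(UInt8(truncatingIfNeeded: bitLength >> (8 * UInt64(i)))) }

        var a0: UInt32 = 0x67452301, b0: UInt32 = 0xefcdab89
        var c0: UInt32 = 0x98badcfe, d0: UInt32 = 0x10325476
        for blockStart in stride(from: 0, to: message.count, by: 64) {
            // Sixteen little-endian 32-bit words per 64-byte block.
            var words = [UInt32](repeating: 0, count: 16)
            for j in 0..<16 {
                for byte in 0..<4 {
                    words[j] |= UInt32(message[blockStart + 4 * j + byte]) << (8 * UInt32(byte))
                }
            }
            var a = a0, b = b0, c = c0, d = d0
            for i in 0..<64 {
                let f: UInt32
                let g: Int
                switch i {
                case 0..<16: f = (b & c) | (~b & d); g = i
                case 16..<32: f = (d & b) | (~d & c); g = (5 * i + 1) % 16
                case 32..<48: f = b ^ c ^ d; g = (3 * i + 5) % 16
                default: f = c ^ (b | ~d); g = (7 * i) % 16
                }
                let sum = f &+ a &+ constants[i] &+ words[g]
                a = d; d = c; c = b
                b = b &+ ((sum << shifts[i]) | (sum >> (32 - shifts[i])))
            }
            a0 = a0 &+ a; b0 = b0 &+ b; c0 = c0 &+ c; d0 = d0 &+ d
        }

        // The digest is a0, b0, c0, d0, each in little-endian byte order.
        var digest = [UInt8]()
        for word in [a0, b0, c0, d0] {
            for i in 0..<4 { digest.append(UInt8(truncatingIfNeeded: word >> (8 * UInt32(i)))) }
        }
        return Data(digest).base64EncodedString()
    }
}
```

Since the same chunk is hashed both when signing and when uploading, you may want to compute this once per part and reuse it.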

Start uploading a part

The code above gets the pre-signed URL for a single part but does not actually start the upload; that is done with NSURLSession.


    func startUploadForPresignedURL(_ presignedURL: URL, chunkURL: URL, awsPartNumber: Int)
    {
        //create the request with the presigned URL
        var request = URLRequest(url: presignedURL)
        request.cachePolicy = .reloadIgnoringLocalCacheData
        request.httpMethod = "PUT"
        //these headers must match the values used to sign the URL
        request.setValue(self.contentType, forHTTPHeaderField: "Content-Type")
        request.setValue((try? Data(contentsOf: chunkURL))?.base64MD5(), forHTTPHeaderField: "Content-MD5")

        //create the upload task with the request; background sessions upload from a file
        let uploadTask = self.session!.uploadTask(with: request, fromFile: chunkURL)

        //set the part number as the description so we can keep track of the various tasks
        uploadTask.taskDescription = String(awsPartNumber)

        //start the part upload
        uploadTask.resume()
    }

 

Handle response for each part

As the responses come in, you need to save some information about each part so that, once they are all complete, you can pass that information back to AWS.


    func handleSuccessfulPartUploadInSession(_ session: URLSession, task: URLSessionTask)
    {
        //for each part we need to save the ETag and the part number
        let completedPart = AWSS3CompletedPart()

        //remember how we saved the part number in the task description? Time to get it back
        completedPart.partNumber = NSNumber(value: Int(task.taskDescription!)!)

        //save the ETag, as AWS needs it to reassemble the file
        //(error handling omitted: check the HTTP status code first)
        let headers = (task.response as! HTTPURLResponse).allHeaderFields
        completedPart.eTag = headers["ETag"] as? String

        //add the part to the list of completed parts, creating the list if needed
        if self.completedPartsInfo.parts == nil
        {
            self.completedPartsInfo.parts = []
        }
        self.completedPartsInfo.parts!.append(completedPart)

        //check if there are any other parts still uploading
        self.session!.getAllTasks(completionHandler: { (tasks: [URLSessionTask]) -> Void in
            //completed tasks are removed from the list, but the current task
            //is still listed, hence the comparison with 1
            if tasks.count > 1
            {
                //upload is still in progress
            }
            else
            {
                //all parts were uploaded, let AWS know
                self.completeUpload(session)
            }
        })
    }
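One caveat with the handler above: NSURLSession reports an HTTP error response (for example a 403 once a pre-signed URL has expired, or a 400 when the Content-MD5 does not match) as a task that completed without an error, so the status code is worth checking before saving the ETag. Here is a minimal sketch of that check as a pure function; `UploadedPart` and `uploadedPart(taskDescription:statusCode:headers:)` are hypothetical names, and the header lookup is case-insensitive because HTTP header names are.

```swift
import Foundation

/// Hypothetical value type for a part that S3 accepted.
struct UploadedPart: Equatable {
    let partNumber: Int
    let eTag: String
}

/// Turns a finished upload task's description, HTTP status code and response
/// headers into an UploadedPart, or returns nil if the part should be retried.
func uploadedPart(taskDescription: String?, statusCode: Int,
                  headers: [AnyHashable: Any]) -> UploadedPart? {
    // A 403 or 400 from S3 still "completes" the task, so check the status first.
    guard (200..<300).contains(statusCode),
          let description = taskDescription,
          let partNumber = Int(description) else { return nil }
    // HTTP header names are case-insensitive; don't depend on the "ETag" spelling.
    for (key, value) in headers {
        if let name = key.base as? String, name.lowercased() == "etag",
           let eTag = value as? String {
            return UploadedPart(partNumber: partNumber, eTag: eTag)
        }
    }
    return nil
}
```

Inside the handler, the inputs would come from `task.taskDescription` and from the `HTTPURLResponse`'s `statusCode` and `allHeaderFields`.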

Complete the upload 

Now that all the parts are uploaded, we need to let AWS know about it and give it the information it needs to reassemble those parts.


    func completeUpload(_ session: URLSession)
    {
        //S3 rejects the request unless the parts are listed in ascending part-number order
        self.completedPartsInfo.parts?.sort { ($0.partNumber?.intValue ?? 0) < ($1.partNumber?.intValue ?? 0) }

        //close up the session as we are done
        self.session?.finishTasksAndInvalidate()
        self.session = nil

        //create the request to complete the multipart upload,
        //using the same bucket and key as the parts
        let complete = AWSS3CompleteMultipartUploadRequest()
        complete.uploadId = self.multipartUploadId
        complete.bucket = self.bucketName
        complete.multipartUpload = self.completedPartsInfo
        complete.key = self.fileName

        //run the request that will complete the upload
        AWSS3.default().completeMultipartUpload(complete).continueWith(block: { (task: AWSTask<AWSS3CompleteMultipartUploadOutput>) -> Any? in
            //handle errors and do any needed cleanup
            return nil
        })
    }
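For reference, the completion request the SDK sends is a POST whose XML body lists each part number with its ETag, which is why the sort order matters. Here is a minimal sketch that builds that body from `(partNumber, eTag)` pairs, based on the S3 REST API's CompleteMultipartUpload format; `completeMultipartUploadBody(parts:)` is a hypothetical helper, and with the SDK you never build this yourself.

```swift
import Foundation

/// Hypothetical helper: builds the XML body of an S3 CompleteMultipartUpload request.
func completeMultipartUploadBody(parts: [(partNumber: Int, eTag: String)]) -> String {
    var xml = "<CompleteMultipartUpload xmlns=\"http://s3.amazonaws.com/doc/2006-03-01/\">"
    // S3 rejects the list unless part numbers are in ascending order.
    for part in parts.sorted(by: { $0.partNumber < $1.partNumber }) {
        // Escape the characters that are not allowed raw in XML text.
        let eTag = part.eTag
            .replacingOccurrences(of: "&", with: "&amp;")
            .replacingOccurrences(of: "<", with: "&lt;")
        xml += "<Part><PartNumber>\(part.partNumber)</PartNumber><ETag>\(eTag)</ETag></Part>"
    }
    return xml + "</CompleteMultipartUpload>"
}

// Parts arrive in whatever order the uploads finish.
print(completeMultipartUploadBody(parts: [(partNumber: 2, eTag: "\"b2\""), (partNumber: 1, eTag: "\"a1\"")]))
```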

 

That's it. It always seems like a lot of code to me, but this is what you'll need. Keep in mind that if you are uploading small files, the Transfer Utility is your friend.

Also keep in mind that, on top of this code, you'll most likely need to add code to properly track the completion of the NSURLSessionTasks.
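Here is one way to do that tracking without asking the session for its task list: a minimal sketch, assuming you know the total number of parts up front (`uploadAllParts()` already computes it). It records each part's ETag as it succeeds and reports when every part number from 1 to the total is present, which also tells you which parts still need a retry. `PartTracker` is a hypothetical name; the lock is there because session delegate callbacks can arrive on a background queue.

```swift
import Foundation

/// Hypothetical tracker of completed parts for one multipart upload.
final class PartTracker {
    private let totalParts: Int
    private var eTags: [Int: String] = [:]
    private let lock = NSLock()

    init(totalParts: Int) {
        precondition(totalParts > 0, "an upload needs at least one part")
        self.totalParts = totalParts
    }

    /// Records a finished part and returns true once every part is recorded.
    /// Recording the same part twice (e.g. after a retry) overwrites its ETag.
    @discardableResult
    func record(partNumber: Int, eTag: String) -> Bool {
        lock.lock()
        defer { lock.unlock() }
        eTags[partNumber] = eTag
        return (1...totalParts).allSatisfy { eTags[$0] != nil }
    }

    /// Part numbers that still need to be uploaded (or re-uploaded), in order.
    func missingParts() -> [Int] {
        lock.lock()
        defer { lock.unlock() }
        return (1...totalParts).filter { eTags[$0] == nil }
    }

    /// Completed parts sorted by part number, ready for the completion request.
    func sortedParts() -> [(partNumber: Int, eTag: String)] {
        lock.lock()
        defer { lock.unlock() }
        return eTags.keys.sorted().map { (partNumber: $0, eTag: eTags[$0]!) }
    }
}
```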