Download YouTube videos with AWS Lambda and store them on S3

Recently, I was faced with the challenge of downloading videos from YouTube and storing them on S3 using AWS Lambda.

Sounds easy? Remember that Lambda comes with a few limitations:

  1. 512 MB of disk space available at /tmp
  2. 3008 MB of memory
  3. 15 minutes maximum execution time

While working on a solution, I tried three approaches:

  1. Download the video from YouTube to /tmp and then upload it to S3: Does not work with videos larger than 512 MB.
  2. Download the video from YouTube into memory and then upload it to S3: Does not work with videos larger than ~3 GB.
  3. Download the video from YouTube and stream it to S3 while downloading: Works for all videos that can be processed within 15 minutes. I have not found a video that took longer than a few minutes to process.

Let’s look at how I finally solved the problem with a streaming approach in Node.js. I use the youtube-dl library to get easy access to YouTube videos.

First, we create a PassThrough stream in Node.js. A pass-through stream is a duplex stream where you can write on one side and read on the other side.

const stream = require('stream');
const passtrough = new stream.PassThrough();
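
To make the write-on-one-side, read-on-the-other behavior concrete, here is a minimal sketch that uses only Node's built-in stream module. The helper name roundTrip is made up for this illustration and is not part of the solution itself.

```javascript
const { PassThrough } = require('stream');

// Hypothetical helper, for illustration only: writes chunks into one side of
// a PassThrough stream and reads the buffered data back from the other side.
function roundTrip(chunks) {
  const pt = new PassThrough();
  for (const chunk of chunks) {
    pt.write(chunk); // producer side (in our case: the video download)
  }
  pt.end();
  const data = pt.read(); // consumer side (in our case: the S3 upload)
  return data === null ? '' : data.toString('utf8');
}

console.log(roundTrip(['video ', 'bytes'])); // prints "video bytes"
```

The data comes out unchanged and in the order it was written, which is what makes it possible to connect a downloader to an uploader.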

Next, we need to write data to the stream. This is done by the youtube-dl library.

const youtubedl = require('youtube-dl');
const dl = youtubedl(event.videoUrl, ['--format=best[ext=mp4]'], {maxBuffer: Infinity});
dl.pipe(passtrough); // write video to the pass-through stream
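
One thing to keep in mind: in Node.js, pipe() does not forward errors from the source stream to the destination. If the download fails halfway, the pass-through stream never ends and the upload could wait until the Lambda times out. The following is a minimal sketch of one way to handle this, using only built-in streams. The helper pipeWithErrors and the stand-in source stream are assumptions for illustration; in the real function, dl would take the place of source.

```javascript
const { PassThrough, Readable } = require('stream');

// Hypothetical helper: pipes `source` into a new PassThrough and forwards
// source errors, because pipe() alone does not propagate them.
function pipeWithErrors(source) {
  const out = new PassThrough();
  source.on('error', (err) => out.destroy(err)); // fail the consumer side too
  source.pipe(out);
  return out;
}

// Stand-in for the download stream; it never produces data in this demo.
const source = new Readable({ read() {} });
const out = pipeWithErrors(source);
out.on('error', (err) => console.log('upload side saw:', err.message));
source.destroy(new Error('download failed'));
```

With the error forwarded, whatever reads from the pass-through stream (the S3 upload in our case) gets a chance to fail fast instead of hanging.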

And finally, we need to upload the stream to S3. We make use of the Multipart Upload feature of S3, which allows us to upload a big file in smaller chunks. This way, we only have to buffer small chunks (64 MB each in this case) in memory and not the whole file.

const AWS = require('aws-sdk');
const upload = new AWS.S3.ManagedUpload({
  params: {
    Bucket: process.env.BUCKET_NAME,
    Key: 'video.mp4',
    Body: passtrough
  },
  partSize: 1024 * 1024 * 64 // 64 MB in bytes
});
upload.send((err) => {
  if (err) {
    console.log('error', err);
  } else {
    console.log('done');
  }
});
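
How does the part size relate to Lambda's limits? Here is a rough back-of-the-envelope sketch. It assumes S3's limit of 10,000 parts per multipart upload and ManagedUpload's default queueSize of 4 concurrent part uploads. The helper multipartLimits is made up for this calculation, and the buffered figure is an approximation, not a measurement.

```javascript
const MB = 1024 * 1024;
const MAX_PARTS = 10000; // assumed S3 limit on parts per multipart upload

// Hypothetical helper, for illustration only.
function multipartLimits(partSize, queueSize = 4) {
  return {
    // Largest object that fits into MAX_PARTS parts of this size.
    maxObjectBytes: partSize * MAX_PARTS,
    // Rough upper bound on part data held in memory by concurrent uploads.
    bufferedBytes: partSize * queueSize
  };
}

const limits = multipartLimits(64 * MB);
console.log(limits.maxObjectBytes / (1024 * MB), 'GiB max object size'); // 625
console.log(limits.bufferedBytes / MB, 'MiB buffered at most'); // 256
```

Under these assumptions, 64 MB parts stay well below the memory limit while leaving plenty of headroom for large videos.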

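That’s it. Now you can download YouTube videos with Lambda and upload them to S3 without being limited by disk space or memory, as long as the transfer finishes within the 15-minute execution limit. I recommend running the code in a “big” Lambda function with 3008 MB of memory for better network performance.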

You can find the full source code on GitHub including a SAM template to provision the AWS resources. Have fun!