Running LibreOffice In AWS Lambda: 2022 Edition, Open-Sourced

History

Exactly 5 years ago, I wrote about running LibreOffice in AWS Lambda for the first time. It enabled people to generate PDFs from any office file format for a fraction of a dollar. As opposed to a traditional server-based approach, it utilizes serverless computing to drastically reduce costs and get better stability. Each Lambda invocation gets a fresh runtime environment, so users benefit from not having to worry about frozen zombie processes or memory leaks.

The original approach was based on compiling a LibreOffice build specifically suited for the AWS Lambda execution environment (with many Linux libraries missing) and packaging it as a Lambda layer. It can be found here: https://github.com/vladholubiev/serverless-libreoffice

Since then, a lot of things changed in the AWS Lambda landscape. Lambda got native Docker support, higher memory limits, expandable /tmp space, etc.

The most common problems observed with the original approach were:

  • Stripped out LibreOffice build lacked many features and supported fewer file formats than a standalone desktop build.
  • No fonts or emoji support.
  • Limited disk space available for converting large files, since a layer consumed ~50% of the /tmp space.
  • A complex process of upgrading to a new version due to constraints of the AWS Lambda’s provided environment.

A New Approach Was Born

To address those problems, I wrote a new version based on Docker and compiled it with the latest LibreOffice 7.4. The setup is a bit more involved, but it’s more flexible, easier to maintain, and produces better-quality outputs. Some of the benefits:

  • The size limit of the compiled LibreOffice increased from 512 MB (Lambda layer limitation) to 10 GB — meaning less stuff needs to be stripped out from the LibreOffice package. The new base image is 877 MB in size.
  • More repeatable build — by utilizing Docker for the build process, it’s much easier to replicate a build locally and upgrade to new LibreOffice versions.
  • Ability to install fonts or other LibreOffice extensions — my base image already includes CJK fonts!

The outcome of this is:

  • a base Docker image, that you can use to package your Lambda function Docker image with the latest LibreOffice and its dependencies installed, ready to be used for PDF generation.
  • a helper utility node.js library to help invoke LibreOffice commands to do file conversion.

Check them out:

Bonus tip: by default, Lambda allocates 512 MB of /tmp space for your Lambda. When converting larger files, you might need more space. Since March 2022, it’s now possible to extend the /tmp space to 10 GB! See the blog post.

How to Use It

If you are already bundling your Lambdas as Docker images, the setup should be familiar to you. If not — check out the AWS Documentation on the topic.

It starts with adjusting your Dockerfile. By default, when you’re building a Docker image for your Lambda, you base it off the official AWS image like this:

FROM public.ecr.aws/lambda/nodejs:16-x86_64

What you need to do instead, is to change this line to the base image for Node.js 16 with LibreOffice:

FROM public.ecr.aws/shelf/lambda-libreoffice-base:7.4-node16-x86_64

Next, you finish your Dockefile with the usual bundling steps:

COPY handler.js ${LAMBDA_TASK_ROOT}/
CMD [ "handler.handler" ]

I’ll skip the rest of the steps needed, like building the image, pushing it to the ECR repository, etc, as it’s out of the scope of this article. You can find tutorials on that online, as well as in the AWS Docs.

So now, by changing the base image of your Lambda, you are able to execute LibreOffice commands in your Lambda! Here is an example that creates a dummy TXT file and converts it to PDF:

const {execSync} = require('child_process');
const {writeFileSync} = require('fs');
module.exports.handler = () => {
  writeFileSync('/tmp/hello.txt', Buffer.from('Hello World!'));
  execSync(`
    cd /tmp
    libreoffice7.4 --headless --invisible --nodefault --view --nolockcheck --nologo --norestore --convert-to pdf --outdir /tmp ./hello.txt
  `);
};

If you want a more high-level API around LibreOffice rather than fiddling with the commands like libreoffice7.4 --headless, check out https://github.com/shelfio/aws-lambda-libreoffice. It provides API like this:

const {convertTo} = require('@shelf/aws-lambda-libreoffice');
convertTo('document.docx', 'pdf'); // returns /tmp/document.pdf

The library handles specifics of the Lambda environment, like saving files to the only writable path /tmp and cleaning up residual files.

Summary

Thousands of people used the original 2017 build of LibreOffice for Lambda to run their production workloads.

And now, there is a new, recommended way of running LibreOffice in AWS Lambda. It utilizes Docker instead of Lambda layers. It has the latest LibreOffice version compiled and CJK fonts support.

Right now, there is a pre-built base image for Node 16 environment, but it could be easily extended to support more languages. Pull Requests and contributions are welcome! Check out the Dockerfile.node16-x86_64 as an example for inspiration.