Fuzzing and Parsing Securely

May 19, 2021 | By Jeremy Mill


Here at FloQast, we do quite a bit of parsing, especially of xlsx, the language of accountants. As part of our normal secure software development lifecycle (SSDLC) and our application security (appsec) program in general, we test the components that become part of our product. Testing covers not just the usual OWASP Top 10 issues but also more advanced techniques like fuzzing.

Fuzzing is repeatedly giving a program random or pseudo-random inputs with the goal of discovering unhandled errors or unintended behaviors. Often it reveals crashes, denial of service conditions (like infinite loops) or other potentially dangerous scenarios.
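To make that concrete, here is a minimal sketch of the naive, purely random approach. Everything in it is hypothetical and for illustration only: parseRecord is a toy parser with a deliberate bug, and naiveFuzz is a made-up helper, not part of any fuzzing library.

```javascript
const crypto = require('crypto');

// Hypothetical toy parser: byte 0 declares how many payload bytes follow.
function parseRecord (bytes) {
  if (bytes.length === 0) throw new Error('Empty input');
  const declared = bytes[0];
  let sum = 0;
  for (let i = 0; i < declared; i++) {
    // Deliberate bug: the declared length is trusted, so this can read past the end.
    sum += bytes.readUInt8(1 + i);
  }
  return sum;
}

// Feed the target random inputs and keep one sample input per kind of failure.
function naiveFuzz (target, iterations) {
  const failures = new Map();
  for (let i = 0; i < iterations; i++) {
    const input = crypto.randomBytes(crypto.randomInt(0, 16));
    try {
      target(input);
    } catch (error) {
      const kind = `${error.name} ${error.code || error.message}`;
      if (!failures.has(kind)) failures.set(kind, input);
    }
  }
  return failures;
}
```

Calling naiveFuzz(parseRecord, 500) and printing the returned Map should surface the out-of-bounds read quickly here, because the toy bug is shallow. Real parsers hide their bugs behind format checks that random bytes rarely get past.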

The Fuzzer

A large portion of our tech stack is written in Node.js, so our fuzzer must support JavaScript as a first-class target. Luckily for us, GitLab acquired two excellent security companies, Peach Tech and Fuzzit, in June 2020. Between them, these companies are behind excellent tools like:

  • Peach Fuzz
  • pythonfuzz
  • jsfuzz
  • and a bunch of other awesome security tools

This was lucky for us because GitLab kept jsfuzz open source. It can be found here: jsfuzz.

jsfuzz is a 'coverage-guided fuzzer', which means that instead of feeding purely random input to an application, it mutates the input to generate test cases that exercise as many branches of the source code as possible. This should give us more accurate results and a higher degree of confidence that our fuzzing has adequately tested the software under test. Other coverage-guided fuzzers include AFL and libFuzzer.
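The idea can be sketched with a toy loop. This is not how jsfuzz is implemented. It is only an illustration, under the assumption that the target reports the branches it took into a Set, where a real fuzzer would instrument the code instead. The names target, mutate, and coverageGuidedFuzz are all hypothetical.

```javascript
// Toy target: the "bug" sits behind three nested byte checks.
function target (bytes, coverage) {
  if (bytes.length > 0 && bytes[0] === 0x50) {
    coverage.add('b0');
    if (bytes.length > 1 && bytes[1] === 0x4b) {
      coverage.add('b1');
      if (bytes.length > 2 && bytes[2] === 0x03) {
        throw new Error('bug reached');
      }
    }
  }
}

// Return a copy of the input with one random byte overwritten or appended.
function mutate (input) {
  const out = Buffer.from(input.length ? input : [0]);
  const randomByte = Math.floor(Math.random() * 256);
  if (Math.random() < 0.5) {
    out[Math.floor(Math.random() * out.length)] = randomByte;
    return out;
  }
  return Buffer.concat([out, Buffer.from([randomByte])]);
}

function coverageGuidedFuzz (seed, maxIterations) {
  const corpus = [seed];
  const seen = new Set();
  for (let i = 0; i < maxIterations; i++) {
    const parent = corpus[Math.floor(Math.random() * corpus.length)];
    const candidate = mutate(parent);
    const coverage = new Set();
    try {
      target(candidate, coverage);
    } catch (error) {
      return { crash: candidate, iterations: i + 1 };
    }
    // Keep any input that reached a branch we had not seen before.
    let reachedNewBranch = false;
    for (const branch of coverage) {
      if (!seen.has(branch)) {
        seen.add(branch);
        reachedNewBranch = true;
      }
    }
    if (reachedNewBranch) corpus.push(candidate);
  }
  return { crash: null, iterations: maxIterations };
}
```

Because inputs that unlock a new branch are kept and mutated further, the loop only has to guess one byte at a time. A purely random fuzzer would have to guess all three bytes at once.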

One of the best parts of jsfuzz is that setup is simple. Most often the hardest part of fuzzing is getting everything set up. Compared to other fuzzing frameworks, this was a breeze and the maintainers provide a comparatively huge number of examples.

The Target and the Corpus

Our target this time was a library named sheetjs, which is also known as xlsx and js-xlsx on npm: https://www.npmjs.com/package/xlsx. The library is relatively popular, with just over 1,000,000 downloads/week. It comes in an open-source version and a pro version with extended features. At a high level this library is a:

Parser and writer for various spreadsheet formats. Pure-JS cleanroom implementation from official specifications, related documents, and test files. Emphasis on parsing and writing robustness, cross-format feature compatibility with a unified JS representation, and ES3/ES5 browser compatibility back to IE6.
-Taken from the project description on npm

The testing outlined here was performed on the open-source version using the latest commit to the main branch of their GitHub repository.

At a high level our plan was:

  • Give jsfuzz a known-good .xlsx file as a corpus
  • Let jsfuzz run until we get a crash
  • For each crash:
    • See if it's an error we should skip
      • If yes, add it to our 'ignore list'
      • If no, add it to our findings
  • For each crash in our findings:
    • Validate the bug using a "real world" example
    • Evaluate the impact of the bug
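
For the validation step, it can help to replay a suspect input in a separate process, so that a hang or a hard crash doesn't take the triage script down with it. The sketch below is one way to do that with Node's built-in child_process module. The helper name classifyRun and its labels are hypothetical, and the default timeout is an arbitrary choice.

```javascript
const { spawnSync } = require('child_process');

// Run a snippet of JavaScript in a fresh Node.js process and label the outcome.
function classifyRun (source, timeoutMs = 2000) {
  const result = spawnSync(process.execPath, ['-e', source], {
    timeout: timeoutMs,
    encoding: 'utf8'
  });
  // spawnSync kills the child and reports ETIMEDOUT when the timeout expires.
  if (result.error && result.error.code === 'ETIMEDOUT') return 'hang';
  // Killed by some other signal, for example an abort after running out of memory.
  if (result.signal) return 'killed';
  // An uncaught exception makes Node.js exit with a non-zero status.
  if (result.status !== 0) return 'crash';
  return 'ok';
}
```

In practice the source string would load the saved crash file and pass its contents to the parser under test. A 'hang' result points at a CPU-bound denial of service, while 'killed' or 'crash' call for a closer look at the child's stderr.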

A "corpus" is a known-good file or set of files. Giving these files to a coverage-guided fuzzer can greatly increase the speed and efficacy of the fuzzer when compared to starting with random bytes. In this case, our corpus was a single xlsx file named cat.xlsx that was as simple an Excel file as possible:

a b c
2 cat 11.15

While I started with a single file, ideally you should provide several files to increase your odds of reaching more unexplored code paths. I stored cat.xlsx in a folder named corpus, next to the fuzzer fuzz.js, which we'll create in the next section.

Let's Start Fuzzing!

First, let's get jsfuzz installed.

npm install -g jsfuzz

Now let's create our fuzzer target. This starting version doesn't yet know which errors to ignore. I named mine fuzz.js, and it looked like this:

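const xlsx = require('xlsx');

// Entry point that jsfuzz calls with each generated input
async function fuzz (bytes) {
  try {
    await xlsx.read(bytes)
  } catch (error) {
    // Rethrow anything not on the ignore list so it is reported as a crash
    if (!acceptable(error)) throw error
  }
}

// True if the error message starts with one of the expected prefixes
function acceptable (error) {
  return !!expected
    .find(message => error.message.startsWith(message))
}

// Message prefixes of errors to ignore; empty to begin with
const expected = [
]

exports.fuzz = fuzz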

This file:

  • Imports our library under test
  • Defines a 'fuzz' function that jsfuzz will call
  • Sets up our error handling for expected errors
  • Doesn't forget to export the 'fuzz' function, which would make an engineer question their career choices

There are almost always going to be errors we want to ignore while fuzzing. Imagine a JPEG parser and an error that says invalid format. That's an error we're not interested in while fuzzing, since we're probably going to create thousands of them! As we run our tests and discover errors that we want to ignore, we can start to populate the expected array.
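
As a small illustration of how prefix matching sorts errors into "skip" and "report", here is a sketch with made-up sample errors. The two prefixes and the three sample messages are assumptions chosen for the example, not output from a real fuzzing run.

```javascript
// Sample prefixes of messages we have decided to ignore.
const expected = ['Corrupted zip', 'Cannot read property'];

// True if the error message starts with any ignored prefix.
function acceptable (error) {
  return expected.some(prefix => error.message.startsWith(prefix));
}

// Made-up errors standing in for what a fuzzing run might throw.
const samples = [
  new Error('Corrupted zip: missing 42 bytes'),
  new TypeError("Cannot read property 'foo' of undefined"),
  new Error('Maximum call stack size exceeded')
];

for (const error of samples) {
  console.log(acceptable(error) ? 'skip  ' : 'REPORT', error.message);
}
```

Short prefixes are a trade-off. A prefix like 'Cannot read property' silences every TypeError of that shape, including ones that might point at a real bug. Newer Node.js releases also word that message as "Cannot read properties of ...", so a prefix copied from one Node.js version may stop matching on another.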

Let's see how we kick off our fuzzer with our corpus:

jsfuzz ./fuzz.js corpus

After a few runs my set of expected errors looked like this:

const expected = [
  'Unrecognized LOTUS BOF',
  'Unexpected BIFF Ver',
  'Bad Gutters',
  "Cannot set property 'ImData'",
  "Cannot read property",
  'MulBlank read error',
  'Bad SerAr',
  'Corrupted zip',
  'End of data reached',
  'Header Signature',
  'CFB file size',
  'Cannot find file',
  'Unsupported ELFs',
  'RangeError',
  'String record expects Formula'
]

Handling Errors Without Messages

A problem came up with my error handler while testing. I was getting unhandled exceptions that came from Node.js rather than from xlsx and that didn't have a message associated with them. To handle these errors I made some changes to my acceptable function:

function acceptable (error) {
  if (error instanceof RangeError && error.code === 'ERR_BUFFER_OUT_OF_BOUNDS') {
    console.log('---skipping RangeError')
    return true
  } else if (error instanceof RangeError) {
    console.log('---skipping unknown RangeError')
    return true
  }
  return !!expected
    .find(message => error.message.startsWith(message))
}

I definitely could have handled this a bit more gracefully, but in this case, it worked well enough. I wanted to skip these RangeErrors because they didn't really crash anything and I wanted to find some juicier bugs.
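
One possible way to handle this more gracefully is sketched below. It is a hypothetical variant, not the handler used in this testing. It assumes we also want to survive values that have no message at all, such as a thrown string or undefined, which would otherwise make error.message.startsWith throw inside the handler itself. The single prefix in expected is a placeholder.

```javascript
// Placeholder ignore list for the example.
const expected = ['Corrupted zip'];

function acceptable (error) {
  // Skip Node.js bounds errors by type rather than by message text.
  if (error instanceof RangeError) return true;
  // Fall back to a string form when there is no usable message property.
  const message = error && typeof error.message === 'string'
    ? error.message
    : String(error);
  return expected.some(prefix => message.startsWith(prefix));
}

// Reading 4 bytes from a 2-byte buffer throws a RangeError from Node.js itself.
try {
  Buffer.alloc(2).readUInt32LE(0);
} catch (error) {
  console.log(error.name, error.code, acceptable(error));
}
```

Skipping every RangeError is still a blunt instrument, since it would also hide a "Maximum call stack size exceeded" error, which is a RangeError too. Checking error.code first and logging the rest keeps those visible.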

The Results

Within an hour or so, three potentially high-severity bugs were discovered. Two were out-of-memory exceptions that crashed Node.js, resulting in a denial of service condition, and one was a condition that caused the application to hang and consume 100% of the CPU. These issues were assigned the following CVEs:

  • CVE-2021-32012
  • CVE-2021-32013
  • CVE-2021-32014

The bugs were disclosed to and patched by the vendor on the following timeline:

  • 5/4/21 - Initial Vendor Contact
  • 5/4/21 - Discussion with vendor on the root cause
  • 5/13/21 - Vendor releases patch
  • 5/19/21 - This blog post

During the disclosure process, it was discovered that CVE-2021-32013 and CVE-2021-32012 had a common root cause and CVE-2021-32014 was unrelated.

I'd like to give a huge shoutout to the devs at https://sheetjs.com/ who were great to work with during the disclosure process. Getting a bug, triaging it, and releasing a patch in ~9 days is awesome and a breath of fresh air.

Actions if You’re an ‘Xlsx’ User

If you use xlsx from sheetjs, you should upgrade to version 0.17.0 or the latest release of the pro version.

Jeremy Mill
Jeremy Mill is a Senior Application Security Engineer at FloQast. He is a former Marine and enjoys speaking at conferences and building robust, integrated security programs.
