Fuzzing and Parsing Securely
Here at FloQast, we do quite a bit of parsing, especially of the language of accountants known as xlsx. As part of our normal secure software development lifecycle (SSDLC) and our application security (appsec) program in general, we test the components that become part of our product. Testing includes not just the normal OWASP Top 10 type issues, but also more advanced attack techniques like fuzzing.
Fuzzing is repeatedly giving a program random or pseudo-random inputs with the goal of discovering unhandled errors or unintended behaviors. Often it reveals crashes, denial of service conditions (like infinite loops) or other potentially dangerous scenarios.
The Fuzzer
A large portion of our tech stack is written in Node.js, so our fuzzer must support JavaScript as a first-class target. Luckily for us, GitLab acquired two excellent security companies, Peach Tech and Fuzzit, in June 2020. Between them, these companies are behind excellent tools like:
- Peach Fuzzer
- pythonfuzz
- jsfuzz
- and a bunch of other awesome security tools
This was lucky for us because GitLab kept jsfuzz open source. It can be found here: jsfuzz.
jsfuzz is a ‘coverage-guided fuzzer’, which means that instead of feeding purely random input to an application, it mutates the input to generate test cases that exercise as many branches of the source code as possible. This should give us more accurate results and a higher degree of confidence that our fuzzing has adequately covered the software under test. Other coverage-guided fuzzers include AFL and libFuzzer.
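To make the idea concrete, here is a toy sketch of a coverage-guided loop. It is not how jsfuzz is implemented: the target function, the branch labels, and the helper names (makeRng, mutate, execute, fuzzLoop) are all made up for illustration, and real fuzzers get their coverage from instrumentation rather than hand-placed markers. The point is only the feedback step, where any mutated input that reaches a branch we have not seen before is kept and mutated further.

```javascript
// Toy coverage-guided fuzzer. Every name here is illustrative only.

// Small deterministic PRNG so that runs are repeatable.
function makeRng (seed) {
  let a = seed >>> 0
  return function next () {
    a = (a + 0x6D2B79F5) >>> 0
    let t = a
    t = Math.imul(t ^ (t >>> 15), t | 1)
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61)
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296
  }
}

// Hypothetical target: records the branches it reaches and "crashes"
// only when the input starts with the three bytes 0x50 0x4b 0x03.
function target (bytes, coverage) {
  coverage.add('entry')
  if (bytes[0] === 0x50) {
    coverage.add('first byte ok')
    if (bytes[1] === 0x4b) {
      coverage.add('second byte ok')
      if (bytes[2] === 0x03) {
        throw new Error('deep bug reached')
      }
    }
  }
}

// Mutation: overwrite one randomly chosen byte with a random value.
function mutate (input, rng) {
  const out = Buffer.from(input)
  out[Math.floor(rng() * out.length)] = Math.floor(rng() * 256)
  return out
}

// Run the target once and report what it covered and whether it threw.
function execute (input) {
  const coverage = new Set()
  let error = null
  try {
    target(input, coverage)
  } catch (e) {
    error = e
  }
  return { coverage, error }
}

function fuzzLoop (seedInput, maxExecutions, rng) {
  const corpus = [seedInput]
  const seen = new Set(execute(seedInput).coverage)
  for (let i = 1; i <= maxExecutions; i++) {
    const parent = corpus[Math.floor(rng() * corpus.length)]
    const candidate = mutate(parent, rng)
    const { coverage, error } = execute(candidate)
    if (error) return { crash: candidate, executions: i, corpusSize: corpus.length }
    // Feedback step: keep inputs that reach a branch we have not seen yet.
    let foundNew = false
    for (const id of coverage) {
      if (!seen.has(id)) {
        seen.add(id)
        foundNew = true
      }
    }
    if (foundNew) corpus.push(candidate)
  }
  return { crash: null, executions: maxExecutions, corpusSize: corpus.length }
}

const result = fuzzLoop(Buffer.from([0, 0, 0]), 200000, makeRng(1))
console.log(result.crash ? `crash after ${result.executions} executions` : 'no crash found')
```

Guessing all three bytes in one shot would succeed roughly once in 16.7 million tries (256^3). Keeping the inputs that make partial progress lets the loop solve one byte at a time instead.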
One of the best parts of jsfuzz is that setup is simple. Most often the hardest part of fuzzing is getting everything set up. Compared to other fuzzing frameworks, this was a breeze, and the maintainers provide a comparatively huge number of examples.
The Target and the Corpus
Our target this time was a library named sheetjs, which is also known as xlsx and js-xlsx on npm: https://www.npmjs.com/package/xlsx. The library is relatively popular, with just over 1,000,000 downloads/week. The xlsx library has a pro version with extended features as well as the open-source version. At a high level this library is a:
Parser and writer for various spreadsheet formats. Pure-JS cleanroom implementation from official specifications, related documents, and test files. Emphasis on parsing and writing robustness, cross-format feature compatibility with a unified JS representation, and ES3/ES5 browser compatibility back to IE6.
(Taken from the project description on npm)
The testing outlined here was performed on the open-source version using the latest commit to the main branch of their GitHub repository.
At a high level our plan was:
- Give a ‘known’ good .xlsx file as a corpus
- Let jsfuzz run until we get a crash
- For each crash:
  - See if it’s an error we should skip
    - If yes, add it to our ‘ignore list’
    - If no, add it to our findings
- For each crash in our findings:
  - Validate the bug using a “real world” example
  - Evaluate the impact of the bug
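For the validation step, one option is to replay a suspicious input in a separate Node.js process with a time limit and a memory cap, so that a hang or an out-of-memory crash cannot take the harness down with it. The sketch below is an assumption about how one might do this, not the setup used for this post: runIsolated and replaySnippet are hypothetical helpers, and the file paths in the usage comment are placeholders. It relies only on child_process.spawnSync and Node's --max-old-space-size flag.

```javascript
const { spawnSync } = require('child_process')
const path = require('path')

// Run a snippet of JavaScript in a separate Node.js process with a time
// limit and a heap limit, then classify what happened.
function runIsolated (code, { timeoutMs = 5000, maxHeapMb = 256 } = {}) {
  const result = spawnSync(
    process.execPath,
    [`--max-old-space-size=${maxHeapMb}`, '-e', code],
    { timeout: timeoutMs, encoding: 'utf8' }
  )
  // spawnSync reports an expired timeout as an error with code ETIMEDOUT.
  if (result.error && result.error.code === 'ETIMEDOUT') {
    return { verdict: 'hang', detail: `no exit within ${timeoutMs} ms` }
  }
  if (result.error) throw result.error // e.g. the child could not be started
  if (result.status === 0) return { verdict: 'ok', detail: '' }
  // Non-zero exit code, or termination by a signal.
  return { verdict: 'crash', detail: result.signal || `exit code ${result.status}` }
}

// Build a snippet that feeds one saved input file to a fuzz target module.
// Both paths are placeholders supplied by the caller.
function replaySnippet (targetPath, inputPath) {
  const target = JSON.stringify(path.resolve(targetPath))
  const input = JSON.stringify(path.resolve(inputPath))
  return `Promise.resolve(require(${target}).fuzz(require('fs').readFileSync(${input})))` +
    '.catch(error => { console.error(error); process.exit(1) })'
}

// Demonstration with a snippet that never returns.
console.log(runIsolated('while (true) {}', { timeoutMs: 1000 }))

// Hypothetical usage against the fuzz target from this post:
// runIsolated(replaySnippet('./fuzz.js', './some-saved-input'), { timeoutMs: 10000 })
```

A 'hang' verdict points at a CPU-bound loop, while a 'crash' verdict with a signal in the detail field is what an aborted process (for example, one that ran out of heap) would typically look like.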
A “corpus” is a known good file or set of files. Giving these files to a coverage-guided fuzzer can greatly increase the speed and efficacy of the fuzzer when compared to starting with random bytes. In this case, our corpus was a single xlsx file named cat.xlsx that was as simple an Excel file as possible:
a | b | c |
---|---|---|
2 | cat | 11.15 |
While I started with a single file, ideally you should provide several files to increase the odds of reaching more unexplored code paths. We stored cat.xlsx in a folder named corpus, next to our fuzzer fuzz.js, which we’ll create in the next section.
Let’s Start Fuzzing!
First, let’s get jsfuzz installed.
npm install -g jsfuzz
Now let’s create our fuzzer target. This is our starting file, which doesn’t yet know how to handle any errors. I named mine fuzz.js, and it looked like this:
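    const xlsx = require('xlsx');

    // jsfuzz calls this function with a Buffer of mutated bytes.
    async function fuzz (bytes) {
      try {
        await xlsx.read(bytes)
      } catch (error) {
        // Rethrow anything we have not explicitly marked as expected.
        if (!acceptable(error)) throw error
      }
    }

    function acceptable (error) {
      return !!expected
        .find(message => error.message.startsWith(message))
    }

    // Message prefixes of errors we want the fuzzer to ignore.
    const expected = [
    ]

    exports.fuzz = fuzz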
This file:
- Imports our library under test
- Defines a ‘fuzz’ function that jsfuzz will call
- Sets up our error handling for expected errors
- Exports the ‘fuzz’ function (forgetting this step would make an engineer question their career choices)
There are almost always going to be errors we want to ignore while fuzzing. Imagine a jpeg parser and an error that says invalid format. That’s an error we’re not interested in while fuzzing, and we’re probably going to create thousands of them! As we run our tests and discover errors that we want to ignore, we can start adding them to the expected array.
Let’s see how we kick off our fuzzer with our corpus:
jsfuzz ./fuzz.js corpus
After a few runs my set of expected errors looked like this:
    const expected = [
      'Unrecognized LOTUS BOF',
      'Unexpected BIFF Ver',
      'Bad Gutters',
      "Cannot set property 'ImData'",
      "Cannot read property",
      'MulBlank read error',
      'Bad SerAr',
      'Corrupted zip',
      'End of data reached',
      'Header Signature',
      'CFB file size',
      'Cannot find file',
      'Unsupported ELFs',
      'RangeError',
      'String record expects Formula'
    ]
Handling Errors Without Messages
A problem came up with my error handler while testing. I was getting unhandled exceptions from node, and not xlsx, that didn’t have a message associated with them. To handle these errors I made some changes to my acceptable function:
    function acceptable (error) {
      if (error instanceof RangeError && error.code === 'ERR_BUFFER_OUT_OF_BOUNDS') {
        console.log('---skipping RangeError')
        return true
      } else if (error instanceof RangeError) {
        console.log('---skipping unknown RangeError')
        return true
      }
      return !!expected
        .find(message => error.message.startsWith(message))
    }
I definitely could have handled this a bit more gracefully, but in this case, it worked well enough. I wanted to skip these RangeErrors because they didn’t really crash anything and I wanted to find some juicier bugs.
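A slightly more graceful variant is sketched below, under a few assumptions of mine: the rule names and the messageOf and skipReason helpers are hypothetical, and the two message prefixes are just a sample taken from the expected list shown earlier. Each rule is tried in turn, and thrown values that are not Error objects or that carry no message are handled without throwing a second error from inside the handler.

```javascript
// Sample of expected message prefixes; in practice this would be the full list.
const expected = ['Corrupted zip', 'End of data reached']

// Extract a message from whatever was thrown, without assuming it is an Error.
function messageOf (thrown) {
  if (typeof thrown === 'string') return thrown
  if (thrown != null && typeof thrown.message === 'string') return thrown.message
  return ''
}

// Ordered rules: the first one that matches names the reason for skipping.
const rules = [
  {
    reason: 'buffer out of bounds',
    matches: thrown => thrown instanceof RangeError &&
      thrown.code === 'ERR_BUFFER_OUT_OF_BOUNDS'
  },
  {
    reason: 'other RangeError',
    matches: thrown => thrown instanceof RangeError
  },
  {
    reason: 'expected message',
    matches: thrown => {
      const message = messageOf(thrown)
      return expected.some(prefix => message.startsWith(prefix))
    }
  }
]

// Returns the reason for skipping, or null if the error is a finding.
function skipReason (thrown) {
  const rule = rules.find(candidate => candidate.matches(thrown))
  return rule ? rule.reason : null
}

function acceptable (thrown) {
  const reason = skipReason(thrown)
  if (reason) console.log(`---skipping: ${reason}`)
  return reason !== null
}
```

Keeping the rules in a list means a new class of uninteresting error becomes one more entry rather than one more else-if branch.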
The Results
Within an hour or so, three potential high-severity bugs were discovered. Two were out-of-memory exceptions that crashed Node.js, resulting in a denial of service condition, and one was a condition that caused the application to hang and consume 100% of the CPU. These issues were assigned the following CVEs:
- CVE-2021-32012
- CVE-2021-32013
- CVE-2021-32014
The bugs were disclosed to and patched by the vendor on the following timeline:
- 5/4/21 – Initial Vendor Contact
- 5/4/21 – Discussion with vendor on the root cause
- 5/13/21 – Vendor releases patch
- 5/19/21 – This blog post
During the disclosure process, it was discovered that CVE-2021-32013 and CVE-2021-32012 had a common root cause and CVE-2021-32014 was unrelated.
I’d like to give a huge shoutout to the devs at https://sheetjs.com/ who were great to work with during the disclosure process. Getting a bug, triaging it, and releasing a patch in ~9 days is awesome and a breath of fresh air.
Actions if You’re an ‘Xlsx’ User
If you use xlsx from sheetjs, you should upgrade to version 0.17.0 or the latest release of the pro version.