Research Report: Building a File Observatory for Secure Parser Development
2021
Parsing untrusted data is notoriously challenging. Failure to handle maliciously crafted data correctly can (and does) lead to a wide range of vulnerabilities. The Language-theoretic security (LangSec) philosophy seeks to obviate the need for developers to apply ad hoc solutions by instead offering formally correct and verifiable input handling throughout the software development lifecycle. One of the key components in developing secure parsers is a broad-coverage corpus that helps developers understand the problem space for a given format and that can potentially serve as a source of seeds for fuzzing and other automated testing. In this paper, we offer an update on work initially reported at the LangSec 2020 conference on the development of a file observatory to gather and enable analysis of a diverse collection of files at scale. The initial focus of the observatory is on Portable Document Format (PDF) files and file formats typically embedded in PDFs. We report on the addition of a bug tracker corpus and new analytic methods.