1
0
Fork 0
mirror of https://github.com/mozilla/pdf.js.git synced 2025-04-25 09:38:06 +02:00

[Regression] Eagerly fetch/parse the entire /Pages-tree in corrupt documents (issue 14303, PR 14311 follow-up)

*Please note:* This is similar to the method that existed prior to PR 3848, but the new method will *only* be used as a fallback when parsing of corrupt PDF documents.

The implementation in PR 14311 unfortunately turned out to be *way* too simplistic, as evident by the recently added test-files in issue 14303, since it may *cause* infinite loops in `PDFDocument.checkLastPage` for some corrupt PDF documents.[1]
To avoid this, the easiest solution that I could come up with was to fallback to eagerly parsing the *entire* /Pages-tree when the /Count-entry validation fails during document initialization.

Fixes *at least* two of the issues listed in issue 14303, namely the `poppler-395-0.pdf...` and `GHOSTSCRIPT-698804-1.pdf...` documents.

---
[1] The whole point of PR 14311 was obviously to *get rid of* infinte loops during document initialization, not to introduce any more of those.
This commit is contained in:
Jonas Jenwald 2021-12-02 01:40:52 +01:00
parent f61b74e38e
commit 1fac6371d3
7 changed files with 504 additions and 35 deletions

View file

@ -60,12 +60,6 @@ class MissingDataException extends BaseException {
}
}
class PageDictMissingException extends BaseException {
constructor(msg) {
super(msg, "PageDictMissingException");
}
}
class ParserEOFException extends BaseException {
constructor(msg) {
super(msg, "ParserEOFException");
@ -547,7 +541,6 @@ export {
isWhiteSpace,
log2,
MissingDataException,
PageDictMissingException,
ParserEOFException,
parseXFAPath,
readInt8,