mirror of
https://github.com/mozilla/pdf.js.git
synced 2025-04-25 09:38:06 +02:00
[api-minor] Validate the /Pages-tree /Count entry during document initialization (issue 14303)
*This patch basically extends the approach from PR 10392, by also checking the last page.*

Currently, in e.g. the `Catalog.numPages`-getter, we're simply assuming that if the /Pages-tree has an *integer* /Count entry it must also be correct/valid. As can be seen in the referenced PDF documents, that entry may be completely bogus, which causes general parsing to break down elsewhere in the worker-thread (and hangs the browser).

Rather than hoping that the /Count entry is correct, we obviously need to validate it, similar to all other data found in PDF documents. This turns out to be a little less straightforward than one would like, since the only way to do this (as far as I know) is to parse the *entire* /Pages-tree and essentially count the pages. To avoid doing that for all documents, this patch tries to take a short-cut by checking if the last page (based on the /Count entry) can be successfully fetched. If so, we assume that the /Count entry is correct and use it as-is; otherwise we'll iterate through (potentially) the *entire* /Pages-tree to determine the number of pages.

Unfortunately these changes will have a number of *somewhat* negative side-effects (please see a possibly incomplete list below); however, I cannot see a better way to address this bug.

- This will slow down initial loading/rendering of all documents, at least by some amount, since we now need to fetch/parse more of the /Pages-tree in order to be able to access the *last* page of the PDF documents.
- For poorly generated PDF documents, where the entire /Pages-tree only has *one* level, we'll unfortunately need to fetch/parse the *entire* /Pages-tree to get to the last page. While there's a cache to help reduce repeated data lookups, this will affect initial loading/rendering of *some* long PDF documents.
- This will affect the `disableAutoFetch = true` mode negatively, since we now need to fetch/parse more data during document initialization. While the `disableAutoFetch = true` mode should still be helpful in larger/longer PDF documents, for smaller ones the effect/usefulness may unfortunately be lost.

As one *small* additional bonus, we should now also be able to support opening PDF documents where the /Pages-tree /Count entry is completely invalid (e.g. contains a non-integer value).

Fixes two of the issues listed in issue 14303, namely the `poppler-67295-0.pdf` and `poppler-85140-0.pdf` documents.
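The short-cut described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the code from this patch: the page tree is modelled as plain objects, and `PageNotFoundError`, `getPageAt`, `countPagesByTraversal` and `validatedNumPages` are hypothetical names that do not exist in pdf.js.

```javascript
// Toy model (assumption): a /Pages node is { Type: "Pages", Count, Kids },
// and a leaf page is { Type: "Page" }. All helper names are hypothetical.

class PageNotFoundError extends Error {}

// Find the page at `index` (0-based), using the /Count of intermediate
// nodes to skip whole subtrees instead of visiting every leaf.
function getPageAt(root, index) {
  let node = root;
  let remaining = index;
  while (node.Type === "Pages") {
    let next = null;
    for (const kid of node.Kids) {
      const size = kid.Type === "Pages" ? kid.Count : 1;
      if (remaining < size) {
        next = kid;
        break;
      }
      remaining -= size;
    }
    if (!next) {
      throw new PageNotFoundError(`Page index ${index} not found.`);
    }
    node = next;
  }
  return node;
}

// Slow path: walk the entire tree and count the leaf pages.
function countPagesByTraversal(node) {
  if (node.Type === "Page") {
    return 1;
  }
  let total = 0;
  for (const kid of node.Kids) {
    total += countPagesByTraversal(kid);
  }
  return total;
}

// Trust the root /Count only if it is a positive integer *and* the last page
// it implies can be fetched; otherwise fall back to counting.
// Note: in this toy model a /Count that is too small would also pass.
function validatedNumPages(root) {
  const count = root.Count;
  if (Number.isInteger(count) && count > 0) {
    try {
      getPageAt(root, count - 1);
      return count;
    } catch (ex) {
      if (!(ex instanceof PageNotFoundError)) {
        throw ex;
      }
    }
  }
  return countPagesByTraversal(root);
}

const page = () => ({ Type: "Page" });
const goodTree = { Type: "Pages", Count: 3, Kids: [page(), page(), page()] };
const bogusTree = { Type: "Pages", Count: 1000, Kids: [page(), page(), page()] };
const invalidTree = { Type: "Pages", Count: 2.5, Kids: [page(), page(), page()] };

console.log(validatedNumPages(goodTree));
console.log(validatedNumPages(bogusTree));
console.log(validatedNumPages(invalidTree));
```

In this model the fast path costs one descent through the tree, while the fallback visits every node, which mirrors the trade-off listed in the side-effects above.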
parent 9a1e27efc5
commit d0c4bbd828
8 changed files with 215 additions and 16 deletions
@@ -60,6 +60,12 @@ class MissingDataException extends BaseException {
   }
 }
 
+class PageDictMissingException extends BaseException {
+  constructor(msg) {
+    super(msg, "PageDictMissingException");
+  }
+}
+
 class ParserEOFException extends BaseException {
   constructor(msg) {
     super(msg, "ParserEOFException");
@@ -541,6 +547,7 @@ export {
   isWhiteSpace,
   log2,
   MissingDataException,
+  PageDictMissingException,
   ParserEOFException,
   parseXFAPath,
   readInt8,
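A dedicated exception class such as `PageDictMissingException` lets calling code tell "this page dictionary does not exist" apart from other failures. The following is a minimal sketch of that pattern, not the project's actual usage: `BaseException` here is a simplified stand-in (assumed to store a message and a name), and `lookupPageDict` and `describePage` are hypothetical helpers.

```javascript
// Simplified stand-in for the project's BaseException (assumption: it keeps
// a message and a name; the real implementation may differ).
class BaseException extends Error {
  constructor(message, name) {
    super(message);
    this.name = name;
  }
}

// Same shape as the class added in the diff above.
class PageDictMissingException extends BaseException {
  constructor(msg) {
    super(msg, "PageDictMissingException");
  }
}

// Hypothetical lookup that signals a missing page with the dedicated type.
function lookupPageDict(pageDicts, pageIndex) {
  const dict = pageDicts[pageIndex];
  if (!dict) {
    throw new PageDictMissingException(
      `Page dictionary at index ${pageIndex} is missing.`
    );
  }
  return dict;
}

// Hypothetical caller: handle the "missing page" case, re-throw the rest.
function describePage(pageDicts, pageIndex) {
  try {
    return lookupPageDict(pageDicts, pageIndex).label;
  } catch (ex) {
    if (ex instanceof PageDictMissingException) {
      return "missing";
    }
    throw ex;
  }
}

const dicts = [{ label: "first" }];
console.log(describePage(dicts, 0));
console.log(describePage(dicts, 5));
```

Checking with `instanceof` keeps unrelated errors propagating unchanged, so only the expected failure mode is handled.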