mirror of https://github.com/mozilla/pdf.js.git synced 2025-04-20 07:08:08 +02:00
pdf.js/src
Jonas Jenwald d0c4bbd828 [api-minor] Validate the /Pages-tree /Count entry during document initialization (issue 14303)
*This patch basically extends the approach from PR 10392, by also checking the last page.*

Currently, in e.g. the `Catalog.numPages`-getter, we're simply assuming that if the /Pages-tree has an *integer* /Count entry it must also be correct/valid.
As can be seen in the referenced PDF documents, that entry may be completely bogus, which causes general parsing to break down elsewhere in the worker-thread (hanging the browser).

Rather than hoping that the /Count entry is correct, we obviously need to validate it, just like all other data found in PDF documents. This turns out to be a little less straightforward than one would like, since the only way to do this (as far as I know) is to parse the *entire* /Pages-tree and essentially count the pages.
To avoid doing that for all documents, this patch tries to take a short-cut by checking if the last page (based on the /Count entry) can be successfully fetched. If so, we assume that the /Count entry is correct and use it as-is; otherwise we'll iterate through (potentially) the *entire* /Pages-tree to determine the number of pages.
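
The short-cut described above can be sketched roughly as follows. This is only a minimal illustration on a toy in-memory tree, not the actual patch: the node shape (`count`, `kids`) and the helper names `countPages`, `getPage` and `resolveNumPages` are hypothetical and do not correspond to the pdf.js source.

```javascript
// Toy /Pages-tree: inner nodes have a `kids` array, leaf nodes are pages.
// All names here are hypothetical and chosen for illustration only.

// Slow path: walk the whole tree and count the leaf (page) nodes.
function countPages(root) {
  let total = 0;
  const stack = [root];
  while (stack.length > 0) {
    const node = stack.pop();
    if (!node.kids) {
      total++;
      continue;
    }
    stack.push(...node.kids);
  }
  return total;
}

// Fetch the page at a zero-based index, throwing if it doesn't exist.
function getPage(root, index) {
  let seen = 0;
  const stack = [root];
  while (stack.length > 0) {
    const node = stack.pop();
    if (!node.kids) {
      if (seen === index) {
        return node;
      }
      seen++;
      continue;
    }
    // Push the kids in reverse, so that they're visited in document order.
    for (let i = node.kids.length - 1; i >= 0; i--) {
      stack.push(node.kids[i]);
    }
  }
  throw new RangeError(`Page index ${index} not found.`);
}

// Trust `count` only if it's a positive integer *and* the last page
// (according to `count`) can be fetched; otherwise count the pages.
function resolveNumPages(root) {
  const count = root.count;
  if (Number.isInteger(count) && count > 0) {
    try {
      getPage(root, count - 1);
      return count;
    } catch (ex) {
      // Fall through to the slow path below.
    }
  }
  return countPages(root);
}
```

Note that, in this sketch, a `count` that is too *small* still passes the check, since the page it points at does exist; only a too large or non-integer value triggers the full traversal.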

Unfortunately these changes will have a number of *somewhat* negative side-effects (please see the possibly incomplete list below); however, I cannot see a better way to address this bug.
 - This will slow down initial loading/rendering of all documents, at least by some amount, since we now need to fetch/parse more of the /Pages-tree in order to be able to access the *last* page of the PDF document.
 - For poorly generated PDF documents, where the entire /Pages-tree only has *one* level, we'll unfortunately need to fetch/parse the *entire* /Pages-tree to get to the last page. While there's a cache to help reduce repeated data lookups, this will affect initial loading/rendering of *some* long PDF documents.
 - This will affect the `disableAutoFetch = true` mode negatively, since we now need to fetch/parse more data during document initialization. While the `disableAutoFetch = true` mode should still be helpful in larger/longer PDF documents, for smaller ones the effect/usefulness may unfortunately be lost.

As one *small* additional bonus, we should now also be able to support opening PDF documents where the /Pages-tree /Count entry is completely invalid (e.g. contains a non-integer value).

Fixes two of the issues listed in issue 14303, namely the `poppler-67295-0.pdf` and `poppler-85140-0.pdf` documents.
2021-11-27 21:57:35 +01:00
core [api-minor] Validate the /Pages-tree /Count entry during document initialization (issue 14303) 2021-11-27 21:57:35 +01:00
display [api-minor] Replace PDFDocumentProxy.getStats with a synchronous PDFDocumentProxy.stats getter 2021-11-20 12:20:55 +01:00
images Vectorize the logo. 2012-10-29 14:08:52 -04:00
scripting_api JS - Avoid a popup to ask for specific version of Acrobat 2021-10-29 23:09:59 +10:00
shared [api-minor] Only use Workers when postMessage transfers are supported (PR 11123 follow-up) 2021-11-19 16:47:58 +01:00
doc_helper.js [api-major] Completely remove the global PDFJS object 2018-03-01 18:13:27 +01:00
interfaces.js Use ESLint to ensure that exports are sorted alphabetically 2021-01-09 20:37:51 +01:00
license_header.js Update the year in the license_header files 2021-02-11 17:52:26 +01:00
license_header_libre.js Update the year in the license_header files 2021-02-11 17:52:26 +01:00
pdf.image_decoders.js Fix the Jbig2Image export for the gulp image_decoders build (PR 9729 follow-up, issue 13367) 2021-05-12 19:41:29 +02:00
pdf.js Try to expose more API-functionality in the TypeScript definitions 2021-09-13 13:57:56 +02:00
pdf.sandbox.external.js [JS] Fix several issues found in pdf in #13269 2021-05-04 19:21:51 +02:00
pdf.sandbox.js Clean-up usage of the TESTING-define in src/pdf.sandbox.js 2021-05-11 12:39:33 +02:00
pdf.scripting.js Tweak the pdf.scripting.js bundling, to improve overall consistency 2020-10-25 16:36:56 +01:00
pdf.worker.entry.js Update the year in the license_header files 2021-02-11 17:52:26 +01:00
pdf.worker.js Convert the src/pdf.js and src/pdf.worker.js files to use standard import/export statements 2020-05-20 13:18:23 +02:00
worker_loader.js Update Prettier to version 2.0 2020-04-14 12:28:14 +02:00