mirror of https://github.com/mozilla/pdf.js.git synced 2025-04-19 14:48:08 +02:00

PDF Reader in JavaScript

Jonas Jenwald 60bcce184e Check that the first page can be successfully loaded, to try and ascertain the validity of the XRef table (issue 7496, issue 10326)		2018-12-29 12:47:25 +01:00

For PDF documents with sufficiently broken XRef tables, it's usually quite obvious when you need to fall back to indexing the entire file. However, for certain kinds of corrupted PDF documents the XRef table will, for all intents and purposes, appear to be valid. It's not until you actually try to fetch various objects that things will start to break, which is the case in the referenced issues[1].

Since there's generally a real effort being made in PDF.js to load even corrupt PDF documents, this patch contains a suggested approach to attempt to do a bit more validation of the XRef table during the initial document loading phase.

Here the choice is made to attempt to load the first page, as a basic sanity check of the validity of the XRef table. Please note that attempting to load a more-or-less arbitrarily chosen object, without any context of what it's supposed to be, isn't very useful, which is why this particular choice was made. Obviously, just because the first page can be loaded successfully, that doesn't guarantee that the entire XRef table is valid; however, if even the first page fails to load you can be reasonably sure that the document is not valid[2].

Even though this patch won't cause any significant increase in the amount of parsing required during initial loading of the document[3], it will require loading of more data upfront, which thus delays the initial `getDocument` call. Whether or not this is a problem depends very much on what you actually measure. Please consider the following examples:

```javascript
console.time('first');
getDocument(...).promise.then((pdfDocument) => {
  console.timeEnd('first');
});

console.time('second');
getDocument(...).promise.then((pdfDocument) => {
  pdfDocument.getPage(1).then((pdfPage) => { // Note: the API uses `pageNumber >= 1`, the Worker uses `pageIndex >= 0`.
    console.timeEnd('second');
  });
});
```

The first case is pretty much guaranteed to show a small regression; however, the second case won't be affected at all since the Worker caches the result of `getPage` calls. Again, please remember that the second case is what matters for the standard PDF.js use-case, which is why I'm hoping that this patch is deemed acceptable.

---

[1] In issue 7496, the problem is that the document is edited without the XRef table being correctly updated. In issue 10326, the generator was sorting the XRef table according to the offsets rather than the objects.

[2] The idea of checking the first page in particular came from the "standard" use-case for the PDF.js library, i.e. the default viewer, where a failure to load the first page basically means that nothing will work; note how `{BaseViewer, PDFThumbnailViewer}.setDocument` depends completely on being able to fetch the first page.

[3] The only extra parsing is caused by, potentially, having to traverse part of the `Pages` tree to find the first page.
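
The approach described in the commit message above can be pictured roughly as follows. This is only an illustrative sketch, not the code from the patch: `openWithSanityCheck`, `parseXRef` and `loadFirstPage` are hypothetical names invented for this example, and the real Worker code is structured differently.

```javascript
// Hypothetical sketch: none of these names are taken from the PDF.js source.
// `parseXRef(recoveryMode)` is assumed to build the cross-reference table,
// either from the data stored in the file (false) or by indexing the
// entire file (true). `loadFirstPage()` is assumed to fetch page index 0
// through that table and to reject if the page cannot be loaded.
async function openWithSanityCheck({ parseXRef, loadFirstPage }) {
  // First attempt: trust the XRef table as stored in the file.
  parseXRef(false);
  try {
    await loadFirstPage();
    return { recoveryMode: false };
  } catch (reason) {
    // The table appeared valid, but the first page could not be fetched,
    // so fall back to indexing the entire file and try once more.
    parseXRef(true);
    await loadFirstPage(); // A second failure is propagated to the caller.
    return { recoveryMode: true };
  }
}
```

A successful first-page load is treated here as "good enough" evidence, in line with the caveat in the message that it cannot guarantee the whole table is valid.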
.github	Attempt to clarify the meaning of "extension" in the ISSUE_TEMPLATE	2017-10-21 11:32:03 +02:00
docs	Update remaining examples, and docs, to utilize current API functionality (issue 10377)	2018-12-24 12:33:39 +01:00
examples	Update remaining examples, and docs, to utilize current API functionality (issue 10377)	2018-12-24 12:33:39 +01:00
extensions	Add OpenAction destination support, off by default, to the viewer	2018-12-19 11:45:17 +01:00
external	Replace `String.prototype.substr()` occurrences with `String.prototype.substring()`	2018-09-28 11:41:07 +02:00
l10n	Update translations	2018-12-22 15:54:42 +01:00
src	Check that the first page can be successfully loaded, to try and ascertain the validity of the XRef table (issue 7496, issue 10326)	2018-12-29 12:47:25 +01:00
test	Check that the first page can be successfully loaded, to try and ascertain the validity of the XRef table (issue 7496, issue 10326)	2018-12-29 12:47:25 +01:00
web	Merge pull request #10334 from Snuffleupagus/OpenAction-dest	2018-12-23 20:49:50 +01:00
.editorconfig	Uses editorconfig to maintain consistent coding styles	2015-11-14 07:32:18 +05:30
.eslintignore	Turn on ESLint in examples directory, apply examples-specific exceptions	2018-12-11 15:23:26 +01:00
.eslintrc	Enable eslint-plugin-import to prevent unresolved paths	2018-11-23 13:50:28 +01:00
.gitattributes	Fixing C++,PHP and Pascal presence in the repo	2015-10-29 13:03:51 -05:00
.gitignore	Include `package-lock.json` for reproducible builds	2018-06-02 20:29:47 +02:00
.gitmodules	Update fonttools location and version (issue 6223)	2015-07-17 12:51:09 +02:00
.mailmap	Add mgol's name to AUTHORS, add .mailmap	2017-11-22 10:46:11 +01:00
.travis.yml	Upgrade to Gulp 4	2018-12-17 16:20:13 +01:00
AUTHORS	Add SehyunPark to AUTHORS	2017-11-29 22:24:08 +09:00
EXPORT	Adds ECCN response statement	2017-10-23 13:31:36 -05:00
gulpfile.js	Upgrade to Gulp 4	2018-12-17 16:20:13 +01:00
LICENSE	cleaned whitespace	2015-02-17 11:07:37 -05:00
package-lock.json	Update packages	2018-12-22 16:35:34 +01:00
package.json	Update packages	2018-12-22 16:35:34 +01:00
pdfjs.config	Bump versions in `pdfjs.config`	2018-10-27 16:55:23 +02:00
README.md	Add Build Status Button	2018-10-13 18:26:48 -04:00
systemjs.config.js	Provide custom messages for the `no-restricted-globals` ESLint rule, and refactor the `.eslintrc` files (PR 9868 follow-up)	2018-07-23 14:10:13 +02:00

README.md

PDF.js

PDF.js is a Portable Document Format (PDF) viewer that is built with HTML5.

PDF.js is community-driven and supported by Mozilla Labs. Our goal is to create a general-purpose, web standards-based platform for parsing and rendering PDFs.

Contributing

PDF.js is an open source project and is always looking for more contributors. To get involved, visit:

Feel free to stop by #pdfjs on irc.mozilla.org for questions or guidance.

Getting Started

Online demo

https://mozilla.github.io/pdf.js/web/viewer.html

Browser Extensions

Firefox

PDF.js is built into version 19+ of Firefox.

Chrome

The official extension for Chrome can be installed from the Chrome Web Store. This extension is maintained by @Rob--W.
Build Your Own - Get the code as explained below and run gulp chromium. Then open Chrome, go to Tools > Extensions and load the (unpacked) extension from the build/chromium directory.

Getting the Code

To get a local copy of the current code, clone it using git:

$ git clone https://github.com/mozilla/pdf.js.git
$ cd pdf.js

Next, install Node.js via the official package or via nvm. You need to install the gulp package globally (see also gulp's getting started):

$ npm install -g gulp-cli

If everything worked out, install all dependencies for PDF.js:

$ npm install

Finally, you need to start a local web server as some browsers do not allow opening PDF files using a file:// URL. Run:

$ gulp server

and then you can open:

http://localhost:8888/web/viewer.html

Please keep in mind that this requires an ES6 compatible browser; refer to Building PDF.js for usage with older browsers.

It is also possible to view all test PDF files on the right side by opening:

http://localhost:8888/test/pdfs/?frame

Building PDF.js

In order to bundle all src/ files into two production scripts and build the generic viewer, run:

$ gulp generic

This will generate pdf.js and pdf.worker.js in the build/generic/build/ directory. Both scripts are needed but only pdf.js needs to be included since pdf.worker.js will be loaded by pdf.js. The PDF.js files are large and should be minified for production.

Using PDF.js in a web application

To use PDF.js in a web application you can choose to use a pre-built version of the library or to build it from source. We supply pre-built versions for usage with NPM and Bower under the pdfjs-dist name. For more information and examples please refer to the wiki page on this subject.
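
As a rough illustration of what using the library looks like, the sketch below loads a document and reads the text of its first page. It assumes that `pdfjsLib` is the object exposed by the pre-built library in your setup (for example, the result of requiring pdfjs-dist) and that the worker script has already been made available; `firstPageText` is a helper name made up for this example, and the wiki remains the authoritative reference.

```javascript
// Sketch only: `pdfjsLib` is passed in by the caller, so this function does
// not assume any particular way of loading the library.
// `firstPageText` is a hypothetical helper name, not part of the PDF.js API.
async function firstPageText(pdfjsLib, url) {
  // `getDocument` returns a loading task; its `promise` resolves with the
  // document once the initial data has been parsed.
  const pdfDocument = await pdfjsLib.getDocument(url).promise;
  // The API counts pages from 1.
  const pdfPage = await pdfDocument.getPage(1);
  const textContent = await pdfPage.getTextContent();
  return {
    numPages: pdfDocument.numPages,
    // Each text item carries its string in the `str` property.
    text: textContent.items.map((item) => item.str).join(' '),
  };
}
```

Passing the library object in as a parameter keeps the helper independent of whether the library was loaded through a script tag, NPM or Bower.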

Including via a CDN

PDF.js is hosted on several free CDNs:

Learning

You can play with the PDF.js API directly from your browser using the live demos below:

Interactive examples

More examples can be found in the examples folder. Some of them use the pdfjs-dist package, which can be built and installed in this repo directory via the gulp dist-install command.

For an introduction to the PDF.js code, check out the presentation by our contributor Julian Viereck:

https://www.youtube.com/watch?v=Iv15UY-4Fg8

More learning resources can be found at:

https://github.com/mozilla/pdf.js/wiki/Additional-Learning-Resources

Questions

Check out our FAQs and get answers to common questions:

https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions

Talk to us on IRC (Internet Relay Chat):

#pdfjs on irc.mozilla.org

File an issue:

https://github.com/mozilla/pdf.js/issues/new

Follow us on Twitter:

https://twitter.com/pdfjs