mirror of https://github.com/mozilla/pdf.js.git synced 2025-04-19 22:58:07 +02:00

PDF Reader in JavaScript

Jonas Jenwald 0351852d74 [api-minor] Decode all JPEG images with the built-in PDF.js decoder in `src/core/jpg.js`

Currently some JPEG images are decoded by the built-in PDF.js decoder in `src/core/jpg.js`, while others attempt to use the browser JPEG decoder. This inconsistency seems unfortunate for a number of reasons:

 - It adds, compared to the other image formats supported in the PDF specification, a fair amount of code/complexity to the image handling in the PDF.js library.

 - The PDF specification supports JPEG images with features, e.g. certain ColorSpaces, that browsers are unable to decode natively. Hence, determining if a JPEG image is possible to decode natively in the browser requires a non-trivial amount of parsing. In particular, we're parsing (part of) the raw JPEG data to extract certain marker data and we also need to parse the ColorSpace for the JPEG image.

 - While some JPEG images may, for all intents and purposes, appear to be natively supported, there are still cases where the browser may fail to decode some JPEG images. In order to support those cases, we've had to implement a fallback to the PDF.js JPEG decoder if there are any issues during the native decoding. This also means that it's no longer possible to simply send the JPEG image to the main-thread and continue parsing, but you now need to actually wait for the main-thread to indicate success/failure first. In practice this means that there's a code-path where the worker-thread is forced to wait for the main-thread, while the reverse should always be the case.

 - The native decoding, for anything except the simplest of JPEG images, results in increased peak memory usage because there's a handful of short-lived copies of the JPEG data (see PR 11707). Furthermore this also leads to data being parsed on the main-thread, rather than the worker-thread, which you usually want to avoid for e.g. performance and UI-responsiveness reasons.

 - Not all environments, e.g. Node.js, fully support native JPEG decoding. This has, historically, led to some issues and support requests.

 - Different browsers may use different JPEG decoders, possibly leading to images being rendered slightly differently depending on the platform/browser where the PDF.js library is used.

Originally the implementation in `src/core/jpg.js` was unable to handle all of the JPEG images in the test-suite, but over the last couple of years I've fixed (hopefully) all of those issues. At this point in time, there are two kinds of failure with this patch:

 - Changes which are basically imperceptible to the naked eye, where some pixels in the images are essentially off-by-one (in all components), which could probably be attributed to things such as different rounding behaviour in the browser/PDF.js JPEG decoder. This type of "failure" accounts for the vast majority of the total number of changes in the reference tests.

 - Changes where the JPEG images now look ever so slightly blurrier than with the native browser decoder. For quite some time I've just assumed that this pointed to a general deficiency in the `src/core/jpg.js` implementation, however I've discovered when comparing two viewers side-by-side that the differences vanish at higher zoom levels (usually around 200% is enough). Basically if you disable [this downscaling in canvas.js](`8fb82e939c/src/display/canvas.js (L2356-L2395)`), which is what happens when zooming in, the differences simply vanish! Hence I'm pretty satisfied that there are no significant problems with the `src/core/jpg.js` implementation, and the problems are rather tied to the general quality of the downscaling algorithm used. It could even be seen as a positive that all images now share the same downscaling behaviour, since this actually fixes one old bug; see issue 7041.

2020-05-22 00:22:48 +02:00
.github	Update links from IRC to Matrix.	2020-02-27 16:26:17 -08:00
docs	Update the getting started page of the website for the new release	2020-03-19 23:07:45 +01:00
examples	[api-minor] Decode all JPEG images with the built-in PDF.js decoder in `src/core/jpg.js`	2020-05-22 00:22:48 +02:00
extensions	Update Prettier to version 2.0	2020-04-14 12:28:14 +02:00
external	Reduce usage of SystemJS, in the development viewer, even further	2020-05-20 13:36:52 +02:00
l10n	Update l10n files	2020-05-16 11:47:08 +02:00
src	[api-minor] Decode all JPEG images with the built-in PDF.js decoder in `src/core/jpg.js`	2020-05-22 00:22:48 +02:00
test	[api-minor] Decode all JPEG images with the built-in PDF.js decoder in `src/core/jpg.js`	2020-05-22 00:22:48 +02:00
web	Reduce usage of SystemJS, in the development viewer, even further	2020-05-20 13:36:52 +02:00
.editorconfig	Uses editorconfig to maintain consistent coding styles	2015-11-14 07:32:18 +05:30
.eslintignore	Replace the bundled `ReadableStream` polyfill with the `web-streams-polyfill` npm package (issue 11157)	2019-09-23 22:16:59 +02:00
.eslintrc	Reduce usage of SystemJS, in the development viewer, even further	2020-05-20 13:36:52 +02:00
.gitattributes	Fixing C++,PHP and Pascal presence in the repo	2015-10-29 13:03:51 -05:00
.gitignore	Include `package-lock.json` for reproducible builds	2018-06-02 20:29:47 +02:00
.gitmodules	Update fonttools location and version (issue 6223)	2015-07-17 12:51:09 +02:00
.gitpod.Dockerfile	Simplifies code contributions by automating the dev setup with gitpod.io	2019-11-06 04:12:19 +00:00
.gitpod.yml	Simplifies code contributions by automating the dev setup with gitpod.io	2019-11-06 04:12:19 +00:00
.mailmap	Add mgol's name to AUTHORS, add .mailmap	2017-11-22 10:46:11 +01:00
.prettierrc	Update Prettier to version 2.0	2020-04-14 12:28:14 +02:00
.travis.yml	Use Node LTS releases to fix Travis CI builds (issue 10790)	2020-04-22 00:06:27 +02:00
AUTHORS	Add SehyunPark to AUTHORS	2017-11-29 22:24:08 +09:00
CODE_OF_CONDUCT.md	Add Mozilla Code of Conduct file	2019-03-27 21:00:01 -07:00
EXPORT	Adds ECCN response statement	2017-10-23 13:31:36 -05:00
gulpfile.js	Add a `minified-es5` gulp task (issue 11858)	2020-05-10 13:41:42 +02:00
LICENSE	cleaned whitespace	2015-02-17 11:07:37 -05:00
package-lock.json	Reduce usage of SystemJS, in the development viewer, even further	2020-05-20 13:36:52 +02:00
package.json	Reduce usage of SystemJS, in the development viewer, even further	2020-05-20 13:36:52 +02:00
pdfjs.config	Bump versions in `pdfjs.config`	2020-03-19 23:01:17 +01:00
README.md	Remove any mention of Gitpod from the README (issue 11732)	2020-04-11 16:47:27 +02:00
systemjs.config.js	docs: Fix simple typo, occurences -> occurrences	2020-04-18 07:53:18 +10:00

README.md

PDF.js

PDF.js is a Portable Document Format (PDF) viewer that is built with HTML5.

PDF.js is community-driven and supported by Mozilla Labs. Our goal is to create a general-purpose, web standards-based platform for parsing and rendering PDFs.

Contributing

PDF.js is an open source project and is always looking for more contributors. To get involved, visit:

Feel free to stop by our Matrix room for questions or guidance.

Getting Started

Online demo

Please note that the "Modern browsers" version assumes native support for features such as async/await, Promise, and ReadableStream.

Modern browsers: https://mozilla.github.io/pdf.js/web/viewer.html
Older browsers: https://mozilla.github.io/pdf.js/es5/web/viewer.html
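
As a rough illustration of the note above, a page could probe for those features before deciding which demo to link to. This is a minimal sketch and not part of PDF.js: `supportsModernViewer` and `pickViewerUrl` are hypothetical helper names, and the check covers only the three features named above, which may not be the complete set the "Modern browsers" build relies on.

```javascript
// Hypothetical helpers, not part of PDF.js.
// `scope` defaults to the global object; it is a parameter so the check
// can be exercised with a fake environment.
function supportsModernViewer(scope = globalThis) {
  const hasPromise = typeof scope.Promise === "function";
  const hasReadableStream = typeof scope.ReadableStream === "function";

  let hasAsyncAwait = true;
  try {
    // Engines without async functions throw a SyntaxError here. A strict
    // Content Security Policy may also throw, in which case this sketch
    // conservatively reports "not supported".
    new Function("return async () => {};");
  } catch (e) {
    hasAsyncAwait = false;
  }
  return hasPromise && hasReadableStream && hasAsyncAwait;
}

function pickViewerUrl(scope = globalThis) {
  return supportsModernViewer(scope)
    ? "https://mozilla.github.io/pdf.js/web/viewer.html"
    : "https://mozilla.github.io/pdf.js/es5/web/viewer.html";
}
```

Passing the environment in explicitly keeps the decision easy to test; in a real page the default `globalThis` would normally be enough.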

Browser Extensions

Firefox

PDF.js is built into version 19+ of Firefox.

Chrome

The official extension for Chrome can be installed from the Chrome Web Store. This extension is maintained by @Rob--W.
Build Your Own - Get the code as explained below and issue gulp chromium. Then open Chrome, go to Tools > Extensions and load the (unpacked) extension from the directory build/chromium.

Getting the Code

To get a local copy of the current code, clone it using git:

$ git clone https://github.com/mozilla/pdf.js.git
$ cd pdf.js

Next, install Node.js via the official package or via nvm. You need to install the gulp package globally (see also gulp's getting started):

$ npm install -g gulp-cli

If everything worked out, install all dependencies for PDF.js:

$ npm install

Finally, you need to start a local web server as some browsers do not allow opening PDF files using a file:// URL. Run:

$ gulp server

and then you can open:

http://localhost:8888/web/viewer.html

Please keep in mind that this requires an ES6 compatible browser; refer to Building PDF.js for usage with older browsers.

It is also possible to view all test PDF files on the right side by opening:

http://localhost:8888/test/pdfs/?frame

Building PDF.js

In order to bundle all src/ files into two production scripts and build the generic viewer, run:

$ gulp generic

This will generate pdf.js and pdf.worker.js in the build/generic/build/ directory. Both scripts are needed but only pdf.js needs to be included since pdf.worker.js will be loaded by pdf.js. The PDF.js files are large and should be minified for production.
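
Once pdf.js is included in a page, rendering a document follows a load, get page, render sequence. The following is a minimal sketch under stated assumptions rather than the project's recommended setup: `renderFirstPage` is a hypothetical helper, the library object is passed in as `pdfjsLib` so the sketch does not depend on how the script was loaded, and the optional `workerSrc` value is whatever path your deployment serves pdf.worker.js from.

```javascript
// Hypothetical helper, not part of PDF.js.
async function renderFirstPage(pdfjsLib, url, canvas, options = {}) {
  const { scale = 1.5, workerSrc } = options;
  if (workerSrc) {
    // Optional (assumption): point pdf.js at pdf.worker.js explicitly.
    pdfjsLib.GlobalWorkerOptions.workerSrc = workerSrc;
  }

  // getDocument() returns a loading task; its `promise` resolves to the document.
  const pdf = await pdfjsLib.getDocument(url).promise;
  const page = await pdf.getPage(1); // page numbers start at 1

  // Size the canvas to the page at the requested scale, then draw into it.
  const viewport = page.getViewport({ scale });
  canvas.width = viewport.width;
  canvas.height = viewport.height;
  await page.render({ canvasContext: canvas.getContext("2d"), viewport }).promise;

  return { numPages: pdf.numPages, width: viewport.width, height: viewport.height };
}
```

In a browser this would be called with the library object your page obtained, a PDF URL, and a `<canvas>` element; how the library object is exposed depends on the build and is left to the caller here.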

Using PDF.js in a web application

To use PDF.js in a web application you can choose to use a pre-built version of the library or to build it from source. We supply pre-built versions for usage with NPM and Bower under the pdfjs-dist name. For more information and examples please refer to the wiki page on this subject.
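
To give a feel for the library API once pdfjs-dist is installed, here is a small sketch that collects the text of each page. It is an illustration under assumptions, not the documented integration path: `extractText` is a hypothetical helper, and the library object is passed in as `pdfjsLib` because the exact import path depends on the package version and environment.

```javascript
// Hypothetical helper, not part of PDF.js.
// Returns an array with one string of text per page.
async function extractText(pdfjsLib, source) {
  const pdf = await pdfjsLib.getDocument(source).promise;
  const pages = [];
  for (let pageNumber = 1; pageNumber <= pdf.numPages; pageNumber++) {
    const page = await pdf.getPage(pageNumber);
    // getTextContent() resolves to an object whose `items` carry the text runs.
    const textContent = await page.getTextContent();
    pages.push(textContent.items.map((item) => item.str).join(" "));
  }
  return pages;
}
```

Pages are fetched one at a time to keep the sketch simple; for large documents you may prefer to process pages as they resolve instead of holding all text in memory.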

Including via a CDN

PDF.js is hosted on several free CDNs:

Learning

You can play with the PDF.js API directly from your browser using the live demos below:

Interactive examples

More examples can be found in the examples folder. Some of them use the pdfjs-dist package, which can be built and installed in this repo directory via the gulp dist-install command.

For an introduction to the PDF.js code, check out the presentation by our contributor Julian Viereck:

https://www.youtube.com/watch?v=Iv15UY-4Fg8

More learning resources can be found at:

https://github.com/mozilla/pdf.js/wiki/Additional-Learning-Resources

The API documentation can be found at:

https://mozilla.github.io/pdf.js/api/

Questions

Check out our FAQs and get answers to common questions:

https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions

Talk to us on Matrix:

https://chat.mozilla.org/#/room/#pdfjs:mozilla.org

File an issue:

https://github.com/mozilla/pdf.js/issues/new

Follow us on Twitter:

https://twitter.com/pdfjs