Add basic validation of the `scanLines` parameter in JPEG images, before delegating decoding to the browser

mirror of https://github.com/mozilla/pdf.js.git synced 2025-04-25 09:38:06 +02:00

In some cases PDF documents can contain JPEG images that the native browser decoder cannot handle, e.g. images with DNL (Define Number of Lines) markers or images where the SOF (Start of Frame) marker contains a wildly incorrect `scanLines` parameter.
Currently, for "simple" JPEG images, we're relying on native image decoding to *fail* before falling back to the implementation in `src/core/jpg.js`. In some cases (see e.g. issue 10880) the native image decoder doesn't fail outright, and thus some images may not render.

In an attempt to improve the current situation, this patch adds validation of the JPEG image SOF data to force the use of `src/core/jpg.js` directly in cases where the native JPEG decoder cannot be trusted to do the right thing.
Unfortunately, the only way to implement this is to parse the *beginning* of the JPEG image data, looking for a SOF marker. To limit the impact of this extra parsing, the result is cached on the `JpegStream` instance, and this code is only run for images which passed all of the pre-existing "can the JPEG image be natively rendered and/or decoded" checks.
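
For illustration, the SOF check described above could look roughly like the sketch below. It is not the code from this patch: `findScanLines`, `JpegStreamSketch`, and the factor of 10 used to decide what counts as "wildly incorrect" are all made up for the example, and it works on a plain `Uint8Array` rather than on a real stream.

```javascript
// Hypothetical helper: scan the start of a JPEG byte array for the first
// SOF0/SOF1/SOF2 marker and return its scanLines field, or null if no such
// marker is found. This is a naive byte-by-byte scan; it does not walk the
// segment structure, so it could in principle match bytes inside a payload.
function findScanLines(bytes) {
  for (let i = 0; i + 6 < bytes.length; i++) {
    if (bytes[i] !== 0xff) {
      continue; // Not the start of a marker.
    }
    const marker = bytes[i + 1];
    if (marker === 0xc0 || marker === 0xc1 || marker === 0xc2) {
      // After the marker: length (2 bytes), precision (1 byte),
      // then scanLines as a big-endian 16-bit value.
      return (bytes[i + 5] << 8) | bytes[i + 6];
    }
  }
  return null;
}

// Hypothetical stand-in for a JPEG stream object; `dictHeight` plays the
// role of the height taken from the image dictionary.
class JpegStreamSketch {
  constructor(bytes, dictHeight) {
    this.bytes = bytes;
    this.dictHeight = dictHeight;
    this._maybeValidDimensions = undefined; // Cache for the parsed result.
  }

  get maybeValidDimensions() {
    if (this._maybeValidDimensions !== undefined) {
      return this._maybeValidDimensions; // Parse the image data only once.
    }
    const scanLines = findScanLines(this.bytes);
    let valid = true;
    if (scanLines === null) {
      // Assumption: no SOF found, so there is nothing to validate here and
      // the decision is left to the other checks.
      valid = true;
    } else if (scanLines === 0) {
      // A DNL marker is expected, which native decoders may not handle.
      valid = false;
    } else if (scanLines > this.dictHeight * 10) {
      // Assumption: "wildly incorrect" means more than 10x the dictionary
      // height; the threshold is illustrative only.
      valid = false;
    }
    return (this._maybeValidDimensions = valid);
  }
}
```

With a dictionary height of 100, a SOF `scanLines` value of 100 passes in this sketch, while 0 (DNL expected) or 60000 would force the fallback to `src/core/jpg.js`.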

---

*Slightly off-topic:* Working on this *really* makes me start questioning if native rendering/decoding of JPEG images is actually a good idea.
There are certain kinds of JPEG images that aren't supported natively, and all of the validation which is now necessary isn't "free". At this point, in the `NativeImageDecoder`, we're having to check for certain properties in the image dictionary, parse the `ColorSpace`, and finally read the actual image data to find the SOF marker.
Furthermore, we cannot just send the image to the main-thread and be done in the "JpegStream" case, but we also need to wait for rendering to complete (or fail) before continuing with other parsing.
In the "JpegDecode" case we're even having to parse part of the image on the main-thread, which seems completely at odds with the principle of doing all heavy parsing in the Worker, and there are also a couple of potentially large (temporary) allocations/copies of TypedArray data involved.

Jonas Jenwald

2020-01-09 21:39:31 +01:00

parent 3472b671e7

commit 5494f7d5bc

3 changed files with 133 additions and 2 deletions

									
src/core/image_utils.js (3 lines changed)

    @@ -41,7 +41,8 @@ class NativeImageDecoder {
             this.xref,
             this.resources,
             this.pdfFunctionFactory
    -      )
    +      ) &&
    +      image.maybeValidDimensions
         );
       }
