In Webpack version `5.99.0` the way that `export` statements are handled was changed slightly, with much less boilerplate code being generated, which unfortunately breaks our `tweakWebpackOutput` function that's used to expose the exported properties globally and that e.g. the viewer depends upon.
Given that we were depending on formatting that should most likely be viewed as nothing more than an internal implementation detail in Webpack, we instead work-around this by manually defining the structures that were previously generated.
Obviously this will lead to a tiny bit more manual work in the future, however we don't change the API-surface often enough that it should be a big issue *and* the relevant unit-tests are updated such that it shouldn't be possible to break this.
*NOTE:* In the future we might want to consider no longer using global properties like this, and instead rely only on proper `export`s throughout the code-base.
However changing this would likely be non-trivial (given edge-cases), and it'd be an `api-major` change, so let's just do the minimal amount of work to unblock Webpack updates for now.
These `getAll` methods are not used anywhere within the PDF.js code-base, outside of tests, and were mostly added (speculatively) for third-party users.
To still allow access to the same data we instead introduce iterators on these classes, which (slightly) shortens the code and allows us to remove the `objectFromMap` helper function.
A summary of the changes in this patch:
- Replace the `getAll` methods with iterators in the following classes: `AnnotationStorage`, `Metadata`, and `OptionalContentGroup`.
- Change, and also re-name, `AnnotationStorage.prototype.setAll` into a test-only method since it's not used elsewhere.
- Remove the `Metadata.prototype.has` method, since it's only used in tests and can be trivially replaced by calling `Metadata.prototype.get` and checking if the returned value is `null`.
We want to iterate through the data in the `computeMD5` function, and `Map`s have "nicer" support for that than generic objects.
(Somewhat recently `Map` performance was improved in Firefox, however this also isn't really performance sensitive code.)
With this patch, all the paths components are collected in the worker until a path
operation is met (i.e., stroke, fill, ...).
Then in the canvas a Path2D is created and will replace the path data transfered from the worker,
this way when rescaling, the Path2D can be reused.
In term of performances, using Path2D is very slightly improving speed when scaling the canvas.
Currently the MD5 computation doesn't actually work (at all?), since we're invoking the `calculateMD5` function without providing all of the necessary parameters and the PDF-data thus isn't taken into account.
Fixing this caused unit-tests to fail, which isn't that surprising since the current date/time is used in the MD5 computation, and we thus utilize Jasmine to work-around that.
The new API-functionality will allow a PDF document to be downloaded in the viewer e.g. while the PasswordPrompt is open, or in cases when document initialization failed.
Normally the raw data of the PDF document would be accessed via the `PDFDocumentProxy.prototype.getData` method, however in these cases the `PDFDocumentProxy`-instance isn't available.
This patch extends the approach of PR 14543, by also treating e.g. minus signs followed by '(' or '<' as zero.
Inside of a /Contents stream those characters will generally mean the start of one or more glyphs.
Currently we have a number of spots in the code-base where we need to clamp a value to a [min, max] range. This is either implemented using `Math.min`/`Math.max` or with a local helper function, which leads to some unnecessary duplication.
Hence this patch adds and re-uses a single helper function for this, which we'll hopefully be able to remove in the future once https://github.com/tc39/proposal-math-clamp/ becomes generally available.
This complements the existing `LocalColorSpaceCache`, which is unique to each `getOperatorList`-invocation since it also caches by `Name`, which should help reduce unnecessary re-parsing especially for e.g. `ICCBased` ColorSpaces once we properly support those.
Currently we re-implement the same helper function twice, which in hindsight seems like the wrong decision since that way it's quite easy for the implementations to accidentally diverge.
The reason for doing it this way was because the code in the worker-thread is able to check for `Ref`- and `Name`-instances directly, which obviously isn't possible in the viewer but can be solved by passing validation-functions to the helper.
When rendering big PDF pages at high zoom levels, we currently fall back
to CSS zoom to avoid rendering canvases with too many pixels. This
causes zoomed in PDF to look blurry, and the text to be potentially
unreadable.
This commit adds support for rendering _part_ of a page (called
`PDFPageDetailView` in the code), so that we can render portion of a
page in a smaller canvas without hiting the maximun canvas size limit.
Specifically, we render an area of that page that is slightly larger
than the area that is visible on the screen (100% larger in each
direction, unless we have to limit it due to the maximum canvas size).
As the user scrolls around the page, we re-render a new area centered
around what is currently visible.
Currently we modify the EXIF-block in place, which may end up "breaking" the JPEG-data of the original PDF document since e.g. saving it from the viewer no longer contains the real EXIF-block.
Hence the EXIF-block replacement is moved into the `JpegStream` class, such that we can copy the data before doing the replacement.
Currently this rule is disabled in a number of spots across the code-base, and unless absolutely necessary we probably shouldn't disable linting, so let's just update the code to fix all the outstanding cases.
Given that most inferred links will overlap existing LinkAnnotations, creating a lot of unused `borderStyle` objects seem unnecessary.
Hence we can move that into the `AnnotationLayer.prototype.addLinkAnnotations` method instead, which also allows us to slightly reduce the API-surface.
Automatically detect links in the text content of a file and automatically
generate link annotations at the appropriate locations to achieve
automatic link detection and hyperlinking.
- Most of the these are only used in the `src/display/api.js` file, and this leads to slightly shorter code.
- A number of unit-tests need a `BaseCanvasFactory`-instance, however that one is available through the `PDFDocumentProxy`-instance nowadays.
- For other unit-tests the remaining necessary default Factory-definitions can be moved into the `test/unit/test_utils.js` file.
These old exceptions have a fair amount of overlap given how/where they are being used, which is likely because they were introduced at different points in time, hence we can shorten and simplify the code by replacing them with a more general `ResponseException` instead.
Besides an error message, the new `ResponseException` instances also include:
- A numeric `status` field containing the server response status, similar to the old `UnexpectedResponseException`.
- A boolean `missing` field, to allow easily detecting the situations where `MissingPDFException` was previously thrown.
In order to fix bug 1935076, we'll have to add a pure js fallback in case wasm is disabled
or simd isn't supported. Unfortunately, this fallback will take some space.
So, the main goal of this patch is to reduce the overall size (by ~93k).
As a side effect, it should make easier to use an other wasm file (which must export
_jp2_decode, _malloc and _free).
When a dash separates two digits, it's very likely to not be a hyphen
inserted to split a word into two lines (e.g. "par\n-ser"), but rather
either a minus sign, a range, or a date. For example, in the tracemonkey
PDF there is `2008-02` (a date) split across two lines.
Preserving the dash, similarly to how we do for compound words, allows
searches for "2008-02" to find a match.
The `/Root/AcroForm/Fields` array contains a "ridiculous" number of LinkAnnotations, which obviously makes no sense since those are not form fields.
To improve performance we'll thus ignore those when collecting the field objects.
This commit makes sure that the font tests are no longer reported as
being unit tests and that the PDF.js logo is shown in the browser tab to
make PDF.js-specific resources/tabs more easily identifyable during e.g.
development.
The reporter is used in both the unit and the font tests, so this commit
moves it to the test root folder to more clearly indicate that this is a
shared resource and so the font tests don't have to reach into the unit
tests folder to import it (which improves separation).
Note that PR 19212 tried to change the test-server to fix the intermittent failure in Google Chrome, however that unfortunately caused *other* unit-tests to start failing.
As long as this unit-test still runs successfully in Mozilla Firefox that should be enough, and it doesn't seem like a good use of our time to hunt down a bug that only happens in Google Chrome.