Automatically detect links in the text content of a file and automatically
generate link annotations at the appropriate locations to achieve
automatic link detection and hyperlinking.
Given that we now have a few different factory-url parameters, we introduce a helper function for parsing them.
*Please note:* These parameters have always been documented as requiring a trailing slash[1], which we can now easily enforce during the `getDocument`-call.
---
[1] I recall that we've occasionally seen issues because users miss that detail, and the new Error should hopefully be more easily actionable than one thrown during rendering/parsing.
This CSS feature is now available in *most* browsers that we support, with old Chromium-based browsers being the only exception; please see https://developer.mozilla.org/en-US/docs/Web/CSS/color_value/color-mix#browser_compatibility
From this data we see that the feature in question has been supported since Chrome 111, which was released on 2023-03-01 (i.e. almost two years ago).
Please note that we've never guaranteed that all features and functionality will be available in the oldest supported browsers.
Furthermore, even with the `color-mix` fallback removed PopupAnnotations will still function just as before but may render with the default color (defined in the CSS-file) rather than the one specified in the PDF document.
Currently we're manually computing the width/height of the /Rect-entry in a number of spots throughout the worker-thread Annotation code, which these new getters help avoid.
Currently we're initializing the image-options for every page, which seems unnecessary since it should suffice to do that once per document.
Also, changes the `BasePdfManager` constructor to improve readability/documentation a little bit.
This patch is adding some code in order to extract a drawing as curves from an image.
The algorithm is basically the following:
- reduce the dimensions
- make it gray
- apply a bilateral filter in order to add some blurryness while keeping the edges
- compute the histogram
- guess what's the background color which should contain a large majority of the pixels
- make a binary image
- extract the contours in using the Suzuki algorithm
- apply the Douglas-Peucker algorithm in order to reduce the number of points
The algorithm is improvable but it should work pretty well if there's a clear difference between
the background and the drawing.
In a v2 we could use a ML model in order to improve the extraction.
There's few changes related to the UI in order to make the tool usable, but they're very basic
for the moment.
This patch fixes a bug that caused incorrect curve shapes when an endpoint lies beyond the page boundaries. It adds a check for the endpoint's position, and if it is outside the page, the point is excluded from the shape's coordinates.
Steps to reproduce this in `master`:
1. Open https://mozilla.github.io/pdf.js/web/viewer.html
2. Use the "Open"-button (in the secondaryToolbar), or drag-and-drop, to load another PDF document.
3. Enable the highlight-editor.
4. Try to pick a new colour.
Note how it's no longer possible to change the default highlight-colour.
The reason for this is that we're only initializing the viewer-toolbar `ColorPicker` *once*, which doesn't work since every PDF document gets its own `AnnotationEditorUIManager`-instance. To address this we simply need to re-initialize the viewer-toolbar `ColorPicker`, and note that this patch won't affect the Firefox PDF Viewer.
After PR 2317, which landed in 2012, we'd immediately clean-up after rendering for pages with large image resources. This had the effect that re-rendering, e.g. after zooming, would force us to re-parse the entire page which could easily lead to bad performance.
In PR 16108, which landed in 2023, we tried to lessen the impact of that by slightly delaying clean-up however that's obviously not a perfect solution (and it increased the complexity of the relevant code).
Furthermore, the condition for this "immediate" clean-up seems a bit arbitrary to me since a page could easily contain a large number of smaller images whose total size vastly exceeds the threshold.
Hence this patch, which suggests that we remove the conditional and delayed clean-up after rendering. Compared to the situation back in 2012, a number of things have improved since:
- We have *multiple* caches for repeated image-resources on the worker-thread[1], which helps reduce overall memory usage and improves performance.
- We downsize huge images on the worker-thread, which means that the images we're using on the main-thread cannot be arbitrarily large.
- The amount of available RAM on devices should be a lot higher, since more than a decade has passed.
A future improvement here, for more resource constrained environments, could be to instead clean-up when actually needed using e.g. `WeakRef`s (see issue 18148).
---
[1] More specifically:
- `LocalImageCache`, which caches image-data by /Name and /Ref on the `PartialEvaluator.prototype.getOperatorList` level.
- `RegionalImageCache`, which caches image-data by /Ref on the `PartialEvaluator`-instance (i.e. at the page) level.
- `GlobalImageCache`, which caches image-data by /Ref globally at the document level.
It fixes#19360.
Each glyph in the test case has a fill and a stroke pattern, so the current transform used
to scale the glyph outline must be the same.
In setting the stroke color to green, I noticed that the last outline contains some non-closed
subpaths, so when generating the glyph outline, every time we 'moveTo', we close the previous
subpath.
- Most of the these are only used in the `src/display/api.js` file, and this leads to slightly shorter code.
- A number of unit-tests need a `BaseCanvasFactory`-instance, however that one is available through the `PDFDocumentProxy`-instance nowadays.
- For other unit-tests the remaining necessary default Factory-definitions can be moved into the `test/unit/test_utils.js` file.
Currently we're not checking that the response is actually OK before getting the data, which means that rather than throwing an error we can get an empty `ArrayBuffer`.
To avoid duplicating code we can move an existing helper into `src/core/core_utils.js` and re-use it when fetching the JPX wasm-file as well.
Given that we nowadays provide default Node.js versions of these Factory-parameters it no longer seems necessary to mention that environment specifically.
These old exceptions have a fair amount of overlap given how/where they are being used, which is likely because they were introduced at different points in time, hence we can shorten and simplify the code by replacing them with a more general `ResponseException` instead.
Besides an error message, the new `ResponseException` instances also include:
- A numeric `status` field containing the server response status, similar to the old `UnexpectedResponseException`.
- A boolean `missing` field, to allow easily detecting the situations where `MissingPDFException` was previously thrown.
In order to fix bug 1935076, we'll have to add a pure js fallback in case wasm is disabled
or simd isn't supported. Unfortunately, this fallback will take some space.
So, the main goal of this patch is to reduce the overall size (by ~93k).
As a side effect, it should make easier to use an other wasm file (which must export
_jp2_decode, _malloc and _free).
In the affected font the total number of mapping-entries is `1142348`, and no less than `997473` of them are duplicates.
Given that every duplicate causes a lot of Array elements to be moved this becomes extremely inefficient, which we can avoid by keeping track of seen `charCode`s and directly build the final mappings-Array instead.
Currently we re-implement a number of helper functions specifically for this code, which seems completely unnecessary since there's already general purpose ones available in the `src/core/core_utils.js` file.
It lets the user make a pinch gesture with a finger on page with a drawing
and the second finger on an other page.
On mobile, it's pretty easy to be in such a situation.
It fixes#19239.
When the canvas isn't existing the editor has no image: it's fine because the editor is invisible.
Once it's made visible, the canvas is set when the annotation layer has been rendered.
This appears to have regressed in PR 13808, since it removed the `matrix`-entry from array returned by the `MeshShading.prototype.getIR` method *without* also updating the indexes in the `MeshShadingPattern` constructor.
The `/Root/AcroForm/Fields` array contains a "ridiculous" number of LinkAnnotations, which obviously makes no sense since those are not form fields.
To improve performance we'll thus ignore those when collecting the field objects.
There's a few cases where we're looping through the result of `Dict.prototype.getKeys` and then manually look-up the values, which after PR 19051 can be replaced with direct iteration instead.
Originally the code in this file was used in both the GENERIC and MOZCENTRAL builds, however that's no longer the case.
Hence we can now directly call `NetworkManager.prototype.request` and remove these old "helper" methods that only had a single call-site each.
From section [11.6.4.3 Mask Shape and Opacity](https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf#G10.4848628) in the PDF specification:
- An image XObject may contain its own *soft-mask image* in the form of a subsidiary image XObject in the `SMask` entry of the image dictionary (see "Image Dictionaries"). This mask, if present, shall override any explicit or colour key mask specified by the image dictionary's `Mask` entry. Either form of mask in the image dictionary shall override the current soft mask in the graphics state.