1
0
Fork 0
mirror of https://github.com/mozilla/pdf.js.git synced 2025-04-19 22:58:07 +02:00
Commit graph

352 commits

Author SHA1 Message Date
Calixte Denizet
be1f5671bb [api-minor] Use a Path2D when doing a path operation in the canvas (bug 1946953)
With this patch, all the paths components are collected in the worker until a path
operation is met (i.e., stroke, fill, ...).
Then in the canvas a Path2D is created and will replace the path data transfered from the worker,
this way when rescaling, the Path2D can be reused.
In term of performances, using Path2D is very slightly improving speed when scaling the canvas.
2025-03-22 20:35:24 +01:00
Jonas Jenwald
9e8d4e4d46 [api-minor] Attempt to support fetching the raw data of the PDF document from the PDFDocumentLoadingTask-instance (issue 15085)
The new API-functionality will allow a PDF document to be downloaded in the viewer e.g. while the PasswordPrompt is open, or in cases when document initialization failed.
Normally the raw data of the PDF document would be accessed via the `PDFDocumentProxy.prototype.getData` method, however in these cases the `PDFDocumentProxy`-instance isn't available.
2025-03-16 10:09:44 +01:00
Jonas Jenwald
7b5cd9cddd Use arrow functions with some Promise.then calls
A lot of this is fairly old code, which we can shorten slightly by using arrow functions instead of "regular" functions.
2025-03-02 19:57:38 +01:00
Jonas Jenwald
2e62f426fe Use arrow function with various Array methods
A lot of this is quite old code, which we can shorten slightly by using arrow functions instead of "regular" functions.
2025-03-02 15:19:04 +01:00
Jonas Jenwald
d5ce35f744 Move the EXIF-block replacement into JpegStream (PR 19356 follow-up)
Currently we modify the EXIF-block in place, which may end up "breaking" the JPEG-data of the original PDF document since e.g. saving it from the viewer no longer contains the real EXIF-block.
Hence the EXIF-block replacement is moved into the `JpegStream` class, such that we can copy the data before doing the replacement.
2025-02-20 12:41:39 +01:00
Jonas Jenwald
36979e9eb2 Fix all outstanding ESLint arrow-body-style warnings
Currently this rule is disabled in a number of spots across the code-base, and unless absolutely necessary we probably shouldn't disable linting, so let's just update the code to fix all the outstanding cases.
2025-02-17 15:45:44 +01:00
Jonas Jenwald
33cba30bdb Search for destinations in both /Names and /Dests dictionaries (issue 19474)
Currently we only use either one of them, preferring the NameTree when it's available.
2025-02-14 15:49:05 +01:00
Jonas Jenwald
db43f158dc Inline the default Factory-definitions in getDocument
- Most of the these are only used in the `src/display/api.js` file, and this leads to slightly shorter code.

 - A number of unit-tests need a `BaseCanvasFactory`-instance, however that one is available through the `PDFDocumentProxy`-instance nowadays.

 - For other unit-tests the remaining necessary default Factory-definitions can be moved into the `test/unit/test_utils.js` file.
2025-01-18 14:09:14 +01:00
Jonas Jenwald
75cba72ca6 [api-major] Replace MissingPDFException and UnexpectedResponseException with one exception
These old exceptions have a fair amount of overlap given how/where they are being used, which is likely because they were introduced at different points in time, hence we can shorten and simplify the code by replacing them with a more general `ResponseException` instead.

Besides an error message, the new `ResponseException` instances also include:
 - A numeric `status` field containing the server response status, similar to the old `UnexpectedResponseException`.

 - A boolean `missing` field, to allow easily detecting the situations where `MissingPDFException` was previously thrown.
2025-01-16 22:51:05 +01:00
Jonas Jenwald
6f062abb76 Skip LinkAnnotations when collecting field objects (issue 19281)
The `/Root/AcroForm/Fields` array contains a "ridiculous" number of LinkAnnotations, which obviously makes no sense since those are not form fields.
To improve performance we'll thus ignore those when collecting the field objects.
2025-01-04 11:54:45 +01:00
Jonas Jenwald
c6e3fc4fe6 Take the userUnit into account in the PageViewport class (issue 19176) 2024-12-08 15:51:04 +01:00
Jonas Jenwald
f8d11a3a3a
Merge pull request #19074 from Rob--W/issue-12744-test
Add test cases for redirected responses
2024-12-02 19:06:55 +01:00
Rob Wu
f97b4b9a66 Add test cases for redirected responses
Regression tests for issue #12744 and PR #19028
2024-12-02 17:57:49 +01:00
Rob Wu
28b0220bc2 Replace createTemporaryNodeServer with TestPdfsServer
Some tests rely on the presence of a server that serves PDF files.
When tests are run from a web browser, the test files and PDF files are
served by the same server (WebServer), but in Node.js that server is not
around.

Currently, the tests that depend on it start a minimal Node.js server
that re-implements part of the functionality from WebServer.

To avoid code duplication when tests depend on more complex behaviors,
this patch replaces createTemporaryNodeServer with the existing
WebServer, wrapped in a new test utility that has the same interface in
Node.js and non-Node.js environments (=TestPdfsServer).

This patch has been tested by running the refactored tests in the
following three configurations:

1. From the browser:
   - http://localhost:8888/test/unit/unit_test.html?spec=api
   - http://localhost:8888/test/unit/unit_test.html?spec=fetch_stream

2. Run specific tests directly with jasmine without legacy bundling:
   `JASMINE_CONFIG_PATH=test/unit/clitests.json ./node_modules/.bin/jasmine --filter='^api|^fetch_stream'`

3. `gulp unittestcli`
2024-12-02 17:57:49 +01:00
Rob Wu
131d4650a5 Drop trailing whitespace from test/unit/api_spec.js
test/unit/api_spec.js is the only JS file in the tree with trailing
whitespace. Because `trim_trailing_whitespace = true` in .editorconfig,
any editor supporting EditorConfig would trim whitespace when the file
is changed, which results in test failures.

This commit fixes the issue by trimming the trailing whitespace and
adjusting the test expectations.
2024-11-24 23:37:16 +01:00
Jonas Jenwald
1a56b35af7
Merge pull request #19003 from Snuffleupagus/api-unittest-image-helpers
Add helper functions to load image blob/bitmap data in `test/unit/api_spec.js`
2024-11-06 09:11:28 +01:00
Jonas Jenwald
e92a929a58 Try to improve handling of missing trailer dictionaries in XRef.indexObjects (issue 18986)
The problem with the referenced PDF document has nothing to do with invalid dates, as the issue seems to suggest, but rather with the fact that it has neither an XRef table nor a trailer dictionary.
Given that crucial parts of the internal document structure is missing, you might argue that it's not really a PDF document.

In an attempt to support this kind of corruption, we'll simply iterate through all (previously found) XRef entries and pick one that *might* be a valid /Root dictionary.
There's obviously no guarantee that this works, and it might not be fast in larger PDF documents, but at least it cannot be any worse than *immediately* throwing `InvalidPDFException` as we previously did here.

*Please note:* I'm totally fine with this patch being rejected, since it's somewhat questionable if we should actually attempt to support "PDF documents" with this level of corruption.
2024-11-05 18:19:26 +01:00
Jonas Jenwald
f2fb3b95ce Add helper functions to load image blob/bitmap data in test/unit/api_spec.js
This avoids repeating the same code multiple times, and as part of the changes we'll also utilize existing PDF.js helpers more.
2024-11-04 14:09:34 +01:00
Calixte Denizet
c9050be863 [Editor] Add the possibility to save an updated stamp annotation (bug 1921291) 2024-10-02 11:45:16 +02:00
Calixte Denizet
0382dd0e25 [Editor] When deleting an annotation with popup, then delete the popup too 2024-09-26 17:52:25 +02:00
Jonas Jenwald
bb302dd993 [api-minor] Pass CanvasFactory/FilterFactory, rather than instances, to getDocument
This unifies the various factory-options, since it's consistent with `CMapReaderFactory`/`StandardFontDataFactory`, and ensures that any needed parameters will always be consistently provided when creating `CanvasFactory`/`FilterFactory`-instances.

As shown in the modified example this may simplify some custom implementations, since we now provide the ability to access the `CanvasFactory`-instance used with a particular `getDocument`-invocation.
2024-09-23 11:26:30 +02:00
Calixte Denizet
ddba096191 Make tagged images visible for screen readers (bug 1708040)
The idea is to insert a span in the text layer with an aria-role set to img
and use the bounding box provided by the attribute field in the tag dict in
order to have non-null dimensions for the image to make it "visible".
2024-09-05 17:59:42 +02:00
Jonas Jenwald
c4fdb28573 Remove PDFWorkerUtil and move its contents into PDFWorker instead
This is possible thanks to features, i.e. private fields and in particular static initialization blocks, that didn't exist back when we started using classes in the code-base.
2024-07-29 11:22:43 +02:00
Jonas Jenwald
c4cd405a8f Ignore non-dictionary nodes when parsing StructTree data (issue 18503) 2024-07-28 12:08:44 +02:00
Calixte Denizet
c3065629ca [Editor] Correctly save a non-ascii alt text 2024-07-24 19:13:45 +02:00
Jonas Jenwald
d24a61c648 Allow /XYZ destinations without zoom parameter (issue 18408)
According to the PDF specification these destinations should have a zoom parameter, which may however be `null`, but it shouldn't be omitted; please see https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf#G11.2095870

Hence we try to work-around bad PDF generators by making the zoom parameter optional when validating explicit destinations in both the worker and the viewer.
2024-07-18 13:29:32 +02:00
Jonas Jenwald
403d023617 Allow e.g. /FitH destinations without additional parameter (bug 1907000)
According to the PDF specification these destinations should have a coordinate parameter, which may however be `null`, but it shouldn't be omitted; please see https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf#G11.2095870

Hence we try to work-around bad PDF generators by making the coordinate parameter optional when validating explicit destinations in both the worker and the viewer.
2024-07-11 10:36:44 +02:00
Jonas Jenwald
5ee61690f3
Merge pull request #18390 from alexcat3/fix-issue-18099
Handle toUnicode cMaps that omit leading zeros in hex encoded UTF-16 (issue 18099)
2024-07-06 18:57:07 +02:00
alexcat3
1c364422a6 Handle toUnicode cmaps that omit leading zeros in hex encoded UTF-16 (issue 18099)
Add unit test to check compatability with such cmaps

In the PDF in issue 18099. the toUnicode cmap had a line to map the glyph char codes from 00 to 7F to the corresponding code points. The syntax to map a range of char codes to a range of unicode code points is
<start_char_code> <end_char_code> <start_unicode_codepoint>
As the unicode code points are supposed to be given in UTF-16 BE, the PDF's line SHOULD have probably read
<00> <7F> <0000>
Instead it omitted two leading zeros from the UTF-16 like this
<00> <7F> <00>
This confused PDF.js into mapping these character codes to the UTF-16 characters with the corresponding HIGH bytes (01 became \u0100, 02 became \u0200, et cetera), which ended up turning latin text in the PDF into chinese when it was copied
I'm not sure if the PDF spec actually allows PDFs to do this, but since there's at least one PDF in the wild that does and other PDF readers read it correctly, PDF.js should probably support this
2024-07-06 11:29:21 -04:00
Tim van der Meij
2a44203d96
Fix the "caches image resources at the document/page level as expected (issue 11878)" unit test
This unit test fails occasionally (albeit much less than before thanks
to PR #17663), so we change the parsing time check's divisor to prevent
it from happening again. If the last page's rendering time is less than
or equal to 50% of the first page's rendering time that should be enough
proof that no worker thread re-parsing occurred while also providing a
wide enough range to avoid intermittents.

Note that the assertion is now equal to the one we already have in the
"caches image resources at the document/page level, with main-thread
copying of complex images (issue 11518)" unit test which seems to work
reliably so far.
2024-07-06 16:30:07 +02:00
Jonas Jenwald
06334c97ef Improve the loadingParams functionality in the API
- Move the definition of the `loadingParams` Object, to simplify the code.

 - Add a unit-test, since none existed and the viewer depends on this functionality.
2024-05-24 09:26:40 +02:00
Jonas Jenwald
c5f92437f7 Avoid re-parsing global images that failed decoding (issue 18042, PR 17428 follow-up)
For images that failed to decode once we want to avoid a pointless round-trip to the main-thread, which could otherwise happen for globally cached images.
2024-05-14 13:58:36 +02:00
Jonas Jenwald
6d523c316c [api-minor] Include the document /Lang attribute in the textContent-data
- These changes will allow a simpler way of implementing PR 17770.

 - The /Lang attribute is fetched lazily, with the first `getTextContent` invocation. Given the existing worker-thread caching, this will thus only need to be done *once* per PDF document (and most PDFs don't included this data).

 - This makes the /Lang attribute *directly available* in the `textLayer`, which has the following advantages:
    - We don't need to block, and thus delay, overall viewer initialization on fetching it (nor pass it around throughout the viewer).

    - Third-party users of the `textLayer` will automatically benefit from this, once we start actually using the /Lang attribute in PR 17770.
      *Please note:* This also, importantly, means that the `text` reference-tests will then cover this code (which wouldn't otherwise have been the case).
2024-05-14 12:44:41 +02:00
Jonas Jenwald
2b69fb76ac [api-minor] Improve the FileSpec implementation
- Check that the `filename` is actually a string, before parsing it further.
 - Use proper "shadowing" in the `filename` getter.
 - Add a bit more validation of the data in `pickPlatformItem`.
 - Last, but not least, return both the original `filename` and the (path stripped) variant needed in the display-layer and viewer.
2024-05-01 18:02:05 +02:00
Jonas Jenwald
bf4e36d1b5 [api-minor] Expose the /Desc-attribute of file attachments in the viewer (issue 18030)
In the viewer this will be displayed in the `title` of the hyperlink, which is probably the best we can do here given how the viewer is implemented.
2024-05-01 09:02:11 +02:00
Calixte Denizet
45fa867577 Allow to insert several annotations under the same parent in the structure tree
While testing stamp insertion with the added pdf, I noticed that the tags using a MCID
weren't considered when trying to attach an annotation to it.
2024-04-24 16:23:05 +02:00
Jonas Jenwald
91898e5923 Extend the globally cached image main-thread copying to "complex" images as well (PR 17428 follow-up)
In PR 17428 this functionality was limited to "larger" images, to not affect performance negatively. However it turns out that it's also beneficial to consider more "complex" images, regardless of their size, that contain /SMask or /Mask data; see issue 11518.
2024-04-20 11:10:09 +02:00
Tim van der Meij
2e5282928f
Merge pull request #17854 from Snuffleupagus/rm-PromiseCapability
[api-minor] Replace the `PromiseCapability` with  `Promise.withResolvers()`
2024-04-02 15:21:43 +02:00
Jonas Jenwald
e4d0e84802 [api-minor] Replace the PromiseCapability with Promise.withResolvers()
This replaces our custom `PromiseCapability`-class with the new native `Promise.withResolvers()` functionality, which does *almost* the same thing[1]; please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/withResolvers

The only difference is that `PromiseCapability` also had a `settled`-getter, which was however not widely used and the call-sites can either be removed or re-factored to avoid it. In particular:
 - In `src/display/api.js` we can tweak the `PDFObjects`-class to use a "special" initial data-value and just compare against that, in order to replace the `settled`-state.
 - In `web/app.js` we change the only case to manually track the `settled`-state, which should hopefully be OK given how this is being used.
 - In `web/pdf_outline_viewer.js` we can remove the `settled`-checks, since the code should work just fine without it. The only thing that could potentially happen is that we try to `resolve` a Promise multiple times, which is however *not* a problem since the value of a Promise cannot be changed once fulfilled or rejected.
 - In `web/pdf_viewer.js` we can remove the `settled`-checks, since the code should work fine without them:
     - For the `_onePageRenderedCapability` case the `settled`-check is used in a `EventBus`-listener which is *removed* on its first (valid) invocation.
     - For the `_pagesCapability` case the `settled`-check is used in a print-related helper that works just fine with "only" the other checks.
 - In `test/unit/api_spec.js` we can change the few relevant cases to manually track the `settled`-state, since this is both simple and *test-only* code.

---
[1] In browsers/environments that lack native support, note [the compatibility data](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/withResolvers#browser_compatibility), it'll be polyfilled via the `core-js` library (but only in `legacy` builds).
2024-04-01 11:42:37 +02:00
Calixte Denizet
136c1faa7f Display outlines even if one has no title
Fixes #17856.
2024-03-29 21:30:24 +01:00
Jonas Jenwald
0d039937f9 Add better support for /Launch actions with /FileSpec dictionaries (issue 17846) 2024-03-26 20:15:48 +01:00
Jonas Jenwald
eded037d06 [api-minor] Use the Fetch API, when supported, to load PDF documents in Node.js environments
Given that modern Node.js versions now implement support for a fair number of "browser" APIs, we can utilize the standard Fetch API to load PDF documents that are specified via http/https URLs.

Please find compatibility information at:
 - https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API#browser_compatibility
 - https://nodejs.org/dist/latest-v18.x/docs/api/globals.html#fetch
 - https://developer.mozilla.org/en-US/docs/Web/API/Response#browser_compatibility
 - https://nodejs.org/dist/latest-v18.x/docs/api/globals.html#response
2024-02-21 22:38:42 +01:00
Jonas Jenwald
37e98e39f6 Skip any whitespace after the first object in linearized PDFs (issue 17665)
This way the code is now consistent with the non-linearized branch in the `PDFDocument.startXRef` getter.
2024-02-12 22:05:36 +01:00
Jonas Jenwald
19ef3e367b Tweak the issue 11878 unit-test parsing time check (PR 17428 follow-up)
This unit-test has been failing occasionally in Chrome and Node.js, hence we tweak the parsing time check to reduce the likelihood of that happening.
2024-02-12 12:31:55 +01:00
Calixte Denizet
7f2428a77e Reduce memory use and improve perfs when computing the bounding box of a bezier curve (bug 1875547)
It isn't really a fix for the mentioned bug but it slightly improve things.
In reducing the memory use, the time spent in the GC is reduced either.
The algorithm to compute the bounding box is the same as before but it has just
been rewritten to be more efficient.
2024-01-24 23:41:14 +01:00
Jonas Jenwald
f9a384d711 Enable the arrow-body-style ESLint rule
This manually ignores some cases where the resulting auto-formatting would not, as far as I'm concerned, constitute a readability improvement or where we'd just end up with more overall indentation.

Please see https://eslint.org/docs/latest/rules/arrow-body-style
2024-01-21 16:20:55 +01:00
Jonas Jenwald
9dfe9c552c Use shorter arrow functions where possible
For arrow functions that are both simple and short, we can avoid using explicit `return` to shorten them even further without hurting readability.

For the `gulp mozcentral` build-target this reduces the overall size of the output by just under 1 kilo-byte (which isn't a lot but still can't hurt).
2024-01-21 10:13:12 +01:00
Calixte Denizet
405f573d70 Take into account empty lines when extracting text content from the appearance
Fixes #17492.
2024-01-14 20:23:29 +01:00
Jonas Jenwald
9f02cc36d4 Attempt to further reduce re-parsing for globally cached images (PR 11912, 16108 follow-up)
In PR 11912 we started caching images that occur on multiple pages globally, which improved performance a lot in many PDF documents.
However, one slightly annoying limitation of the implementation is the need to re-parse the image once the global-caching threshold has been reached. Previously this was difficult to avoid, since large image-resources will cause cleanup to run on the main-thread after rendering has finished. In PR 16108 we started delaying this cleanup a little bit, to improve performance if a user e.g. zooms and/or rotates the document immediately after rendering completes.

Taking those two PRs together, we now have a situation where it's much more likely that the main-thread has "globally used" images cached at the page-level. Hence we can instead attempt to *copy* a locally cached image into the global object-cache on the main-thread and thus reduce unnecessary re-parsing of large/complex global images, which significantly reduces the rendering time in many cases.

For the PDF document in issue 11878, the rendering time of *the second page* changes as follows (on my computer):
 - With the `master`-branch it takes >600 ms to render.
 - With this patch that goes down to ~50 ms, which is one order of magnitude faster.

(Note that all other pages are, as expected, completely unaffected by these changes.)

This new main-thread copying is limited to "large" global images, since:
 - Re-parsing of small images, on the worker-thread, is usually fast enough to not be an issue.
 - With the delayed cleanup after rendering, it's still not guaranteed that an image is available in a page-level cache on the main-thread.
 - This forces the worker-thread to wait for the main-thread, which is a pattern that you always want to avoid unless absolutely necessary.
2023-12-21 21:26:21 +01:00
Jonas Jenwald
674052d3fc Re-factor the blob-URL caching in DownloadManager.openOrDownloadData
Cache blob-URLs on the actual data, rather than DOM elements, to reduce potential duplicates (note the updated unit-test).
2023-10-17 10:18:34 +02:00