1
0
Fork 0
mirror of https://github.com/mozilla/pdf.js.git synced 2025-04-19 14:48:08 +02:00
Commit graph

20014 commits

Author SHA1 Message Date
Andrii Vitiv
824a619a2a
Fix error on empty response headers
Fixes https://github.com/mozilla/pdf.js/issues/18957

https://github.com/mozilla/pdf.js/pull/18682 introduced a regression that causes the following error:

```
Uncaught TypeError: Failed to construct 'Headers': Invalid name
    at PDFNetworkStreamFullRequestReader._onHeadersReceived (pdf.mjs:10214:29)
    at NetworkManager.onStateChange (pdf.mjs:10103:22)
```

The mentioned PR replaced a call to `getResponseHeader()` with `getAllResponseHeaders()` without handling cases where it may return null or an empty string. Quote from the [docs](https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest/getAllResponseHeaders#return_value):

> Returns:
>
>A string representing all of the response's headers (except those whose field name is Set-Cookie) separated by CRLF, or null if no response has been received. If a network error happened, an empty string is returned.

Run the following code and observe the error in the console. Note that the URL is intentionally set to an invalid value to simulate network error

```js
<script src="//mozilla.github.io/pdf.js/build/pdf.mjs" type="module"></script>

<script type="module">
  var url = 'blob:';

  pdfjsLib.GlobalWorkerOptions.workerSrc = '//mozilla.github.io/pdf.js/build/pdf.worker.mjs';

  var loadingTask = pdfjsLib.getDocument(url);
  loadingTask.promise
    .then((pdf) => console.log('PDF loaded'))
    .catch((reason) => console.error(reason));

</script>
```
2024-11-05 21:52:50 +02:00
Jonas Jenwald
e92a929a58 Try to improve handling of missing trailer dictionaries in XRef.indexObjects (issue 18986)
The problem with the referenced PDF document has nothing to do with invalid dates, as the issue seems to suggest, but rather with the fact that it has neither an XRef table nor a trailer dictionary.
Given that crucial parts of the internal document structure is missing, you might argue that it's not really a PDF document.

In an attempt to support this kind of corruption, we'll simply iterate through all (previously found) XRef entries and pick one that *might* be a valid /Root dictionary.
There's obviously no guarantee that this works, and it might not be fast in larger PDF documents, but at least it cannot be any worse than *immediately* throwing `InvalidPDFException` as we previously did here.

*Please note:* I'm totally fine with this patch being rejected, since it's somewhat questionable if we should actually attempt to support "PDF documents" with this level of corruption.
2024-11-05 18:19:26 +01:00
Jonas Jenwald
2c90eee5a8 Shorten a few helper functions in src/core/core_utils.js
In a few cases we can ever so slightly shorten the code without negatively impacting the readability.
2024-11-05 13:58:00 +01:00
Jonas Jenwald
f2fb3b95ce Add helper functions to load image blob/bitmap data in test/unit/api_spec.js
This avoids repeating the same code multiple times, and as part of the changes we'll also utilize existing PDF.js helpers more.
2024-11-04 14:09:34 +01:00
Jonas Jenwald
e4a5bd9555 Bump library version to 4.9 2024-11-04 10:37:35 +01:00
Jonas Jenwald
cefd1ebcd2
Merge pull request #18959 from Snuffleupagus/Node-20
[api-minor] Update the minimum supported Node.js version to 20, and only support the Fetch API for "remote" PDF documents in Node.js
2024-11-04 10:29:55 +01:00
Jonas Jenwald
9269fb9be2 Remove the BaseFullReader and BaseRangeReader classes in the src/display/node_stream.js file
After the previous patch these base-classes are only extended once each and they can thus be combined with the final classes.
2024-11-03 16:18:12 +01:00
Jonas Jenwald
cbf0ca71bf [api-minor] Only support the Fetch API for "remote" PDF documents in Node.js environments
The Fetch API has been supported since Node.js version 18, see https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API#browser_compatibility
2024-11-03 16:18:10 +01:00
Jonas Jenwald
c7407230c1 [api-minor] Load Node.js packages/polyfills with process.getBuiltinModule
This allows *synchronous* loading of Node.js modules and (indirectly) packages, thus simplifying the code a fair bit.
2024-11-03 16:13:58 +01:00
Jonas Jenwald
4f01cdef18 [api-minor] Update the minimum supported Node.js version to 20
This patch updates the minimum supported environments as follows:
 - Node.js 20, which was released on 2023-04-18 and has now entered the "Maintenance"-phase; see https://github.com/nodejs/release#release-schedule

Furthermore, note also that Node.js 18 will fairly soon reach EOL.
2024-11-03 16:13:55 +01:00
Tim van der Meij
35673d3e6e
Merge pull request #19001 from timvandermeij/integration-test-scripting-uppercase
Fix the "must convert input to uppercase" scripting integration test
2024-11-03 15:47:45 +01:00
Tim van der Meij
3adf8b6be0
Fix the "must convert input to uppercase" scripting integration test
This integration test fails intermittently because we're not
(correctly) awaiting the sandbox actions. The `27R` field in
`issue14862.pdf` triggers sandbox events for every typing action, but
for the backspace and "a" character typing actions we weren't awaiting
the sandbox trip at all, and for other places we weren't awaiting it
fully (causing some characters to be missed in the assertion).

This commit fixes the issues by using the appropriate helper functions,
similar to what we did in PR #18399. Not only is this shorter in terms
of code, but it also fixed the near-permafail for this test with newer
versions of Puppeteer.
2024-11-03 15:08:55 +01:00
Tim van der Meij
20fbb4d661
Merge pull request #19000 from timvandermeij/types-node
Install and use the most recent Node types for the types tests
2024-11-03 14:41:06 +01:00
Tim van der Meij
ccfaf20ee2
Install and use the most recent Node types for the types tests
The types tests run in Node.js and therefore use Node types for e.g.
builtins. However, we didn't explicitly indicate this in
`tsconfig.json` (see [1] for more information and [2] for the PR where
we found this). Moreover, we didn't explicitly install the most recent
version of `@types/node` which implicitly made us fall back to version
14.14.45 (because that was installed as a dependency of other modules)
whereas much newer versions are available and we need those after
changes in Node.js (see [3] for more information and [4] for the PR
where we found this).

This commit fixes both issues by explicitly installing and using the
most recent Node.js types, which should also avoid future issues with
the types tests.

[1] https://github.com/TypeStrong/ts-node/issues/1012
[2] https://github.com/mozilla/pdf.js/pull/18237
[3] https://stackoverflow.com/questions/78790943/in-typescript-5-6-buffer-is-not-assignable-to-arraybufferview-or-uint8arr
[4] https://github.com/mozilla/pdf.js/pull/18959
2024-11-03 13:40:55 +01:00
Tim van der Meij
c1bcb46b3b
Merge pull request #18999 from Snuffleupagus/toBase64Util-unittest
Use the `toBase64Util` helper function in the unit-tests
2024-11-03 13:04:13 +01:00
Jonas Jenwald
f78a8f3c54 Use the toBase64Util helper function in the unit-tests 2024-11-03 11:25:19 +01:00
Tim van der Meij
5f77b907eb
Merge pull request #18997 from Snuffleupagus/Node-enable-Blob-unittest
Enable the 'gets PDF filename from query string appended to "blob:" URL' unit-test in Node.js
2024-11-03 11:11:36 +01:00
Tim van der Meij
7ae21ece4a
Merge pull request #18998 from Snuffleupagus/Node-enable-XFA-alt-unittest
Enable the "should have an alt attribute from toolTip" unit-test in Node.js
2024-11-03 11:10:54 +01:00
Tim van der Meij
3755d680e4
Merge pull request #18995 from timvandermeij/updates
Update dependencies and translations to the most recent versions
2024-11-03 11:09:15 +01:00
Jonas Jenwald
faf9e32ecb Enable the "should have an alt attribute from toolTip" unit-test in Node.js
Despite the pending-message mentioning "Image", this appears to be another case where the code actually depends on [`Blob`](https://developer.mozilla.org/en-US/docs/Web/API/Blob/Blob#browser_compatibility); note cf3ca8b5bc/src/core/xfa/template.js (L3453)
2024-11-03 00:15:44 +01:00
Jonas Jenwald
15fbee158c Enable the 'gets PDF filename from query string appended to "blob:" URL' unit-test in Node.js
The necessary functionality has been supported in Node.js for quite some time now, please see:
 - https://developer.mozilla.org/en-US/docs/Web/API/Blob/Blob#browser_compatibility
 - https://developer.mozilla.org/en-US/docs/Web/API/URL/createObjectURL_static#browser_compatibility
2024-11-02 23:53:03 +01:00
Tim van der Meij
3854ab5efd
Update translations to the most recent versions 2024-11-02 20:21:59 +01:00
Tim van der Meij
3cd906829e
Update dependencies to the most recent versions 2024-11-02 20:20:58 +01:00
Tim van der Meij
cf3ca8b5bc
Merge pull request #18994 from timvandermeij/bump
Bump the stable version in `pdfjs.config`
2024-11-02 19:59:41 +01:00
Tim van der Meij
627a588336
Bump the stable version in pdfjs.config 2024-11-02 19:56:10 +01:00
Tim van der Meij
3634dab10c
Merge pull request #18988 from Snuffleupagus/split-dom-factory
Move the various DOM-factories into their own files
2024-11-02 19:06:47 +01:00
Tim van der Meij
e930f3030c
Merge pull request #18992 from Snuffleupagus/getPdfManager-inline-flushChunks
Inline the `flushChunks` helper function, used in `getPdfManager` on the worker-thread
2024-11-02 18:58:29 +01:00
Jonas Jenwald
e5485108ec
Merge pull request #18990 from Snuffleupagus/ensure-structTree-serializable
Ensure that serializing of StructTree-data cannot fail during loading
2024-11-02 15:17:10 +01:00
Jonas Jenwald
aa4839ed0f
Merge pull request #18993 from Snuffleupagus/stringToUTF16HexString-hexNumbers
Use the `hexNumbers` structure in the `stringToUTF16HexString` helper
2024-11-02 15:15:47 +01:00
Jonas Jenwald
2145a7b9ca Use the hexNumbers structure in the stringToUTF16HexString helper
We can re-use the `hexNumbers` structure here, since that allows us to directly lookup the hexadecimal values and shortens the code.
2024-11-02 15:00:32 +01:00
Jonas Jenwald
196f7d7df1 Inline the flushChunks helper function, used in getPdfManager on the worker-thread
- This helper function has only a single call-site, and the function is fairly short.

 - It'll only be invoked if range requests are *disabled*, or if the entire PDF manages to load *before* the headers are resolved (which is very unlikely).
   Hence, by default, this helper function is not invoked.

 - By inlining the code we're able to utilize the existing error-handling at the call-site, rather than having to duplicate it, which further reduces the size of this code.

Finally, while slightly unrelated, this patch also adds optional chaining in one spot in the file (PR 16424 follow-up).
2024-11-02 11:06:30 +01:00
Jonas Jenwald
b26dc19392 Ensure that serializing of StructTree-data cannot fail during loading
I discovered that doing skip-cache re-reloading of https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf would *intermittently* cause (some of) the AnnotationLayers to break with errors printed in the console (see below).

In hindsight this bug is really obvious, however it took me quite some time to find it, since the `StructTreePage.prototype.serializable` getter will lookup various data and all of those cases can fail during loading when streaming and/or range requests are being used.

Finally, to prevent any future errors, ensure that the viewer won't break in these sort of situations.

```
Uncaught (in promise)
Object { message: "Missing data [19098296, 19098297)", name: "UnknownErrorException", details: "MissingDataException: Missing data [19098296, 19098297)", stack: "BaseExceptionClosure@resource://pdf.js/build/pdf.mjs:453:29\n@resource://pdf.js/build/pdf.mjs:456:2\n" }
viewer.mjs:8801:55

\#renderAnnotationLayer: "UnknownErrorException: Missing data [17552729, 17552730)". viewer.mjs:8737:15

Uncaught (in promise)
Object { message: "Missing data [17552729, 17552730)", name: "UnknownErrorException", details: "MissingDataException: Missing data [17552729, 17552730)", stack: "BaseExceptionClosure@resource://pdf.js/build/pdf.mjs:453:29\n@resource://pdf.js/build/pdf.mjs:456:2\n" }
viewer.mjs:8801:55
```
2024-11-01 17:43:59 +01:00
Jonas Jenwald
4e12906061 Move the various DOM-factories into their own files
- Over time the number and size of these factories have increased, especially the `DOMFilterFactory` class, and this split should thus aid readability/maintainability of the code.

 - By introducing a couple of new import maps we can avoid bundling the `DOMCMapReaderFactory`/`DOMStandardFontDataFactory` classes in the Firefox PDF Viewer, since they are dead code there given that worker-thread fetching is always being used.

 - This patch has been successfully tested, by running `$ ./mach test toolkit/components/pdfjs/`, in a local Firefox artifact-build.

*Note:* This patch reduces the size of the `gulp mozcentral` output by `1.3` kilo-bytes, which isn't a lot but still cannot hurt.
2024-11-01 13:31:28 +01:00
Tim van der Meij
06f3b2d0a6
Merge pull request #18983 from Snuffleupagus/api-FetchBuiltInCMap-FetchStandardFontData-async
Change the "FetchBuiltInCMap"/"FetchStandardFontData" message-handlers to be asynchronous
2024-10-31 20:30:11 +01:00
Jonas Jenwald
3ed438aef5
Merge pull request #18979 from Snuffleupagus/L10n-#elements-lazy-init
Don't initialize `L10n.#elements` eagerly since it's unused in MOZCENTRAL builds
2024-10-31 11:03:24 +01:00
Jonas Jenwald
7572382c7a Change the "FetchBuiltInCMap"/"FetchStandardFontData" message-handlers to be asynchronous
This way we can directly throw Errors, rather than having to "manually" return rejected Promises, which is ever so slightly shorter.

Also, since `useWorkerFetch` is always true in MOZCENTRAL builds these message-handlers should not be invoked there.
2024-10-31 09:29:11 +01:00
Jonas Jenwald
cdd4b052f9 Don't initialize L10n.#elements eagerly since it's unused in MOZCENTRAL builds
It's not necessary to manually start translation in the Firefox PDF Viewer, and doing so would even cause problems there (see issue 17142).
2024-10-30 15:20:44 +01:00
Jonas Jenwald
f013c39b9f
Merge pull request #18978 from Snuffleupagus/toHexUtil-simplify
Re-factor the `toHexUtil` helper (PR 17862 follow-up)
2024-10-29 17:10:36 +01:00
calixteman
f142fb8c28
Merge pull request #18972 from calixteman/refactor_highlight
[Editor] Refactor the free highlight stuff in order to be able to use the code for more general drawing
2024-10-29 17:09:03 +01:00
Jonas Jenwald
db1238aae3 Re-factor the toHexUtil helper (PR 17862 follow-up)
We can re-use the `hexNumbers` structure, since that allows us to directly lookup the hexadecimal values and shortens the code.
2024-10-29 16:35:44 +01:00
Jonas Jenwald
9870099e90
Merge pull request #18977 from Snuffleupagus/api-ReaderHeadersReady-simplify
Simplify the "ReaderHeadersReady" message-handler in the API
2024-10-29 15:46:31 +01:00
Calixte Denizet
5a9607b2ad [Editor] Refactor the free highlight stuff in order to be able to use the code for more general drawing
One goal is to make the code for drawing with the Ink tool similar to the one to free highlighting:
it doesn't really make sense to have so different ways to do almost the same thing.

When the zoom level is high, it'll avoid to create a too big canvas covering all the page which consume
more memory, makes the drawing very slow and the overall user xp pretty bad.

A second goal is to be able to easily implement more drawing tools where we would just have to implement
how to draw from the pointer coordinates.
2024-10-29 15:41:08 +01:00
Jonas Jenwald
25cf4add05
Merge pull request #17862 from Snuffleupagus/fingerprints-toHex
Improve the implementation of the `PDFDocument.fingerprints`-getter
2024-10-29 15:34:02 +01:00
Jonas Jenwald
afb4813d1c Simplify the "ReaderHeadersReady" message-handler in the API
We can convert the handler to an `async` function, which removes the need to create a temporary Promise here.
Given the age of this code it shouldn't hurt to simplify it a little bit.
2024-10-29 14:59:39 +01:00
Jonas Jenwald
8f47d06d07 Add helper functions to allow using new Uint8Array methods
This allows using the new methods in browsers that support them, e.g. Firefox 133+, while still providing fallbacks where necessary; see https://github.com/tc39/proposal-arraybuffer-base64

*Please note:* These are not actual polyfills, but only implements what we need in the PDF.js code-base. Eventually this patch should be reverted, once support is generally available.
2024-10-29 10:22:35 +01:00
Jonas Jenwald
bfc645bab1 Introduce some Uint8Array.fromBase64 and Uint8Array.prototype.toBase64 usage in the main code-base
See https://github.com/tc39/proposal-arraybuffer-base64
2024-10-29 10:22:35 +01:00
Jonas Jenwald
f9fc477080 Improve the implementation of the PDFDocument.fingerprints-getter
- Add explicit `length` validation of the /ID entries. Given the `EMPTY_FINGERPRINT` constant we're already *implicitly* assuming a particular length.

 - Move the constants into the `fingerprints`-getter, since they're not used anywhere else.

 - Replace the `hexString` helper function with the standard `Uint8Array.prototype.toHex` method; see https://github.com/tc39/proposal-arraybuffer-base64
2024-10-29 10:22:35 +01:00
Jonas Jenwald
3a85479c67
Merge pull request #18974 from Snuffleupagus/issue-18973
Allow `StreamsSequenceStream` to skip sub-streams that are not actual Streams (issue 18973)
2024-10-29 10:21:35 +01:00
Jonas Jenwald
48a18585f2 Allow StreamsSequenceStream to skip sub-streams that are not actual Streams (issue 18973)
This extends PR 13796 to also handle the case where sub-streams contain invalid data, i.e. anything that isn't a Stream, however please note that in these cases there's no guarantee that we'll render the page "correctly".

Note that Adobe Reader, i.e. the PDF reference implementation, cannot render the last page of the referenced PDF document.
2024-10-29 09:36:08 +01:00
Jonas Jenwald
93961e2802
Merge pull request #18971 from Snuffleupagus/AltText-fluent
[Editor] Utilize Fluent "better" when localizing the AltText
2024-10-28 20:53:06 +01:00