1
0
Fork 0
mirror of https://github.com/mozilla/pdf.js.git synced 2025-04-25 17:48:07 +02:00
Commit graph

278 commits

Author SHA1 Message Date
Jonas Jenwald
ec1a05c104 Add missing startWorkerTask calls in the "SaveDocument" handler
Without these calls we'll not actually wait for saving to complete when document destruction runs; compare with other `WorkerTask`-usage in this file.
While I cannot imagine that this has caused any problems for library users, the code is however not technically correct as-is.
2024-12-21 14:22:18 +01:00
Jonas Jenwald
ede589dd6e Shorten the WorkerMessageHandler class a little bit
- Use `this` in all scopes where that's possible, to avoid having to spell out `WorkerMessageHandler` everywhere.

 - Inline the `isMessagePort` helper function, since there's only a single call-site.

 - Use a static initialization block to move more code into the `WorkerMessageHandler` class itself.
2024-11-30 14:07:16 +01:00
Jonas Jenwald
8ec399d7e1 Convert the getPdfManager function to be asynchronous
This is fairly old code, and by making the function `async` we can handle initialization errors "automatically" without the need for try-catch statements.
2024-11-22 17:49:43 +01:00
Jonas Jenwald
2c0cc48d1b Replace the forEach method in Dict with "proper" iteration support 2024-11-17 12:45:32 +01:00
Calixte Denizet
4bf7787084 Simplify saving added/modified annotations.
Having this map to collect the different changes will allow to know if some objects have already been modified.
2024-11-12 10:59:38 +01:00
Jonas Jenwald
196f7d7df1 Inline the flushChunks helper function, used in getPdfManager on the worker-thread
- This helper function has only a single call-site, and the function is fairly short.

 - It'll only be invoked if range requests are *disabled*, or if the entire PDF manages to load *before* the headers are resolved (which is very unlikely).
   Hence, by default, this helper function is not invoked.

 - By inlining the code we're able to utilize the existing error-handling at the call-site, rather than having to duplicate it, which further reduces the size of this code.

Finally, while slightly unrelated, this patch also adds optional chaining in one spot in the file (PR 16424 follow-up).
2024-11-02 11:06:30 +01:00
Calixte Denizet
3103deaa44 Fix missing annotation parent in using the one from the Fields entry
Fixes #15096.
2024-10-04 20:00:19 +02:00
Tim van der Meij
c77b97daff
Update the JS/CSS files for the new Prettier/Stylelint versions 2024-07-13 16:29:47 +02:00
Jonas Jenwald
a4ffc1066c Move the internal API/Worker isEditing-state into RenderingIntentFlag
In *hindsight* this seems like a better idea, since it avoids the need to manually pass `isEditing` around as a boolean value.
Note that `RenderingIntentFlag` is *internal* functionality, not exposed in the official API, which means that it can be extended and modified as necessary.
2024-07-04 23:34:30 +02:00
Calixte Denizet
64635f3b35 [api-minor][Editor] When switching to editing mode, redraw pages containing editable annotations
Right now, editable annotations are using their own canvas when they're drawn, but
it induces several issues:
 - if the annotation has to be composed with the page then the canvas must be correctly
   composed with its parent. That means we should move the canvas under canvasWrapper
   and we should extract composing info from the drawing instructions...
   Currently it's the case with highlight annotations.
 - we use some extra memory for those canvas even if the user will never edit them, which
   the case for example when opening a pdf in Fenix.

So with this patch, all the editable annotations are drawn on the canvas. When the
user switches to editing mode, then the pages with some editable annotations are redrawn but
without them: they'll be replaced by their counterpart in the annotation editor layer.
2024-07-02 14:11:40 +02:00
Jonas Jenwald
f6cd03955b [api-minor] Move the page reference/number caching into the API
Rather than having to handle this *manually* throughout the viewer, this functionality can instead be moved into the API which simplifies the code slightly.
2024-04-29 18:54:06 +02:00
Jonas Jenwald
e4d0e84802 [api-minor] Replace the PromiseCapability with Promise.withResolvers()
This replaces our custom `PromiseCapability`-class with the new native `Promise.withResolvers()` functionality, which does *almost* the same thing[1]; please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/withResolvers

The only difference is that `PromiseCapability` also had a `settled`-getter, which was however not widely used and the call-sites can either be removed or re-factored to avoid it. In particular:
 - In `src/display/api.js` we can tweak the `PDFObjects`-class to use a "special" initial data-value and just compare against that, in order to replace the `settled`-state.
 - In `web/app.js` we change the only case to manually track the `settled`-state, which should hopefully be OK given how this is being used.
 - In `web/pdf_outline_viewer.js` we can remove the `settled`-checks, since the code should work just fine without it. The only thing that could potentially happen is that we try to `resolve` a Promise multiple times, which is however *not* a problem since the value of a Promise cannot be changed once fulfilled or rejected.
 - In `web/pdf_viewer.js` we can remove the `settled`-checks, since the code should work fine without them:
     - For the `_onePageRenderedCapability` case the `settled`-check is used in a `EventBus`-listener which is *removed* on its first (valid) invocation.
     - For the `_pagesCapability` case the `settled`-check is used in a print-related helper that works just fine with "only" the other checks.
 - In `test/unit/api_spec.js` we can change the few relevant cases to manually track the `settled`-state, since this is both simple and *test-only* code.

---
[1] In browsers/environments that lack native support, note [the compatibility data](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/withResolvers#browser_compatibility), it'll be polyfilled via the `core-js` library (but only in `legacy` builds).
2024-04-01 11:42:37 +02:00
Calixte Denizet
2133da166e When updating, write the xref table in the same format as the previous one (bug 1878916)
The specs are unclear about what kind of xref table format must be used.
In checking the validity of some pdfs in the preflight tool from Acrobat
we can guess that having the same format is the correct way to do.
The pdf in the mentioned bug, after having been changed, wasn't correctly
displayed in neither Chrome nor Acrobat: it's now fixed.
2024-02-13 14:14:37 +01:00
Jonas Jenwald
37e98e39f6 Skip any whitespace after the first object in linearized PDFs (issue 17665)
This way the code is now consistent with the non-linearized branch in the `PDFDocument.startXRef` getter.
2024-02-12 22:05:36 +01:00
Calixte Denizet
f2196f7803 StructParents entry isn't required on pages with no tagged contents (bug 1855641) 2023-09-28 14:23:10 +02:00
Calixte Denizet
a8573d4e1b [Editor] Add the ability to create/update the structure tree when saving a pdf containing newly added annotations (bug 1845087)
When there is no tree, the tags for the new annotions are just put under the root element.
When there is a tree, we insert the new tags at the right place in using the value
of structTreeParentId (added in PR #16916).
2023-09-16 18:34:58 +02:00
Jonas Jenwald
ff96c413d3 Use await even more in the "SaveDocument" worker-thread handler
Given that the function is already asynchronous we can make use of `await` even more and reduce the amount of indentation a little bit.
2023-09-16 13:06:48 +02:00
Jonas Jenwald
50937a3539 Ensure that the entire PDF document is loaded *before* we begin saving it
When I started looking at PR 16938 it occurred to me that some of the new structTree-methods are synchronously accessing certain dictionary-data (not used during "normal" structTree-parsing), which may not be generally safe since everything in a dictionary could be a reference (and the relevant data may not have been loaded yet).

Rather than suggesting that we make all those new methods even more asynchronous, to me the overall simplest and safest solution is to ensure that the *entire* PDF document has been loaded *before* we begin saving it. In practice this shouldn't really affect "performance" of saving noticeably, since it's always depended on the entire PDF document being downloaded.

Finally note that with the exception of the PDF document possibly not having been fully downloaded when saving is triggered, all other "global" document properties are pretty much guaranteed to already be available at this point.
2023-09-12 13:26:57 +02:00
Jonas Jenwald
64e8557fb5 [api-minor] Deprecate the PDFDocumentProxy.getJavaScript method
This method is very old, however with the exception of the auto-print hack (when scripting is disabled) in the viewer it's never actually been used.

Most likely the idea with `PDFDocumentProxy.getJavaScript` was that it'd be useful if scripting support was added, however it turned out that it was a bit too simplistic and instead a number of new methods were added for the scripting use-cases.
2023-08-01 09:02:05 +02:00
Calixte Denizet
33fdec1392 Don't replace Acroform dictionary if nothing has changed when saving (bug 1844572) 2023-07-22 17:51:06 +02:00
Jonas Jenwald
88524bf9ae Don't reset temporary XRef-entries during saving (PR 16392 follow-up)
*Please note:* I'm not aware of any bugs caused by this, however that might be more luck than anything else.

In PR 16392 the `incrementalUpdate` function, and all of its various helpers, were made asynchronous. However the call-site in `src/core/worker.js` wasn't updated, which means that we currently reset temporary XRef-entries while saving is ongoing.
2023-07-20 15:49:59 +02:00
Jonas Jenwald
3a886e7264 Move the isNodeJS-helper into the src/shared/util.js file
With the changes in the previous patch the `isNodeJS`-helper no longer needs to live in its own file, which helps get rid of a closure in the *built* files.
2023-07-17 16:42:25 +02:00
Calixte Denizet
599b9498f2 [Editor] Add support for printing/saving newly added Stamp annotations
In order to minimize the size the of a saved pdf, we generate only one
image and use a reference in each annotation using it.
When printing, it's slightly different since we have to render each page
independantly but we use the same image within a page.
2023-06-26 15:47:05 +02:00
Calixte Denizet
71479fdd21 [Editor] Avoid to have duplicated entries in the Annot array when saving an existing and modified annotation 2023-06-15 22:02:10 +02:00
Jonas Jenwald
1753e321cd Remove the compatibility checks in WorkerMessageHandler.createDocumentHandler
For some time these checks have only targeted Node.js environments, since the features in question exist in all supported browsers (even when a `legacy`-build is used).

Now that we've updated the minimum supported Node.js version to 18, a number of polyfills are thus (finally) no longer necessary in that environment. Hence for certain *basic* functionality, such as e.g. text-extraction, it's now possible to use either a modern- or a `legacy`-build of the PDF.js library in Node.js environments.

*Please note:* For e.g. canvas-rendering in Node.js environments it's still necessary to use a `legacy`-build, since that functionality requires various polyfills.
2023-05-07 13:43:19 +02:00
Jonas Jenwald
ed8be6f882 [api-minor] Update the minimum supported Node.js version to 18
This patch updates the minimum supported environments as follows:
 - Node.js 18, which was released on 2022-04-19; see https://en.wikipedia.org/wiki/Node.js#Releases

Note also that Node.js 16 will soon reach EOL, and thus no longer receive any security updates.
2023-05-07 13:43:19 +02:00
Jonas Jenwald
d950b91c4e Introduce some logical assignment in the src/core/ folder 2023-04-29 13:49:37 +02:00
Jonas Jenwald
317abd6d07 Change the createPromiseCapability helper function into a PromiseCapability class
This is not only slightly more compact, but it also simplifies the handling of the `settled` getter.
2023-04-29 13:43:24 +02:00
Tim van der Meij
c9359957e6
Merge pull request #16305 from Snuffleupagus/PDFJSDev-skip-PRODUCTION
Remove the `PRODUCTION` build-target
2023-04-22 14:53:30 +02:00
Calixte Denizet
117bbf7cd9 [api-minor] Don't normalize the text used in the text layer.
Some arabic chars like \ufe94 could be searched in a pdf, hence it must be normalized
when creating the search query. So to avoid to duplicate the normalization code,
everything is moved in the find controller.
The previous code to normalize text was using NFKC but with a hardcoded map, hence it
has been replaced by the use of normalize("NFKC") (it helps to reduce the bundle size
by 30kb).
In playing with this \ufe94 char, I noticed that the bidi algorithm wasn't taking into
account some RTL unicode ranges, the generated font wasn't embedding the mapping this
char and the unicode ranges in the OS/2 table weren't up-to-date.

When normalized some chars can be replaced by several ones and it induced to have
some extra chars in the text layer. To avoid any regression, when copying some text
from the text layer, a copied string is normalized (NFKC) before being put in the
clipboard (it works like this in either Acrobat or Chrome).
2023-04-17 14:31:23 +02:00
Jonas Jenwald
804aa896a7 Stop using the PRODUCTION build-target in the JavaScript code
This *special* build-target is very old, and was introduced with the first pre-processor that only uses comments to enable/disable code.
When the new pre-processor was added `PRODUCTION` effectively became redundant, at least in JavaScript code, since `typeof PDFJSDev === "undefined"` checks now do the same thing.

This patch proposes that we remove `PRODUCTION` from the JavaScript code, since that simplifies the conditions and thus improves readability in many cases.
*Please note:* There's not, nor has there ever been, any gulp-task that set `PRODUCTION = false` during building.
2023-04-17 12:04:34 +02:00
Jonas Jenwald
edd13895dd Limit the Path2D-checks in the worker-thread to Node.js (PR 16238 follow-up, issue 16289)
The changes in PR 16238 were intended specifically for Node.js environments, however they accidentally applied to older browsers as well.

*Please note:* In up-to-date browsers `Path2D` is available in Workers, which should be connected to the introduction of `OffscreenCanvas`.
2023-04-14 11:51:11 +02:00
Tim van der Meij
13f2426aab
Merge pull request #16238 from Snuffleupagus/update-Node-compat-check
Update the Node.js compatibility-check in the worker-thread
2023-04-01 14:20:33 +02:00
Jonas Jenwald
57a307d0cd Update the Node.js compatibility-check in the worker-thread
*Please note:* In Node.js environments a `legacy`-build must be used since only those versions include any polyfills.

Previously we'd only check if `ReadableStream` is natively supported, however since Node.js version 18 that's now been implemented; please see https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream#browser_compatibility
Hence we'll also check for the availability of `Path2D`, since that's browser-specific functionality not expected to be available in Node.js environments; please see https://developer.mozilla.org/en-US/docs/Web/API/Path2D#browser_compatibility
2023-03-30 18:36:15 +02:00
Jonas Jenwald
5063a6f2a9 [api-minor] Remove the disableCombineTextItems option
*Please note:* This parameter has never been used within the PDF.js library/viewer itself, and it was only ever added for backwards compatibility reasons.

This parameter was added in PR 7475, over six years ago, to try and optionally maintain the previous *default* text-extraction behaviour.
However as part of the general text-extraction improvements in PR 13257, almost two years ago, the `disableCombineTextItems` functionality was accidentally "broken" in various ways. Note how the only (very basic) unit-test was updated in a way that doesn't really make sense, since generally speaking you'd expect that using the option should result in *more* (or at least the same number of) text-items. Furthermore there's also the recent issue 16209, where the option causes almost all textContent to be concatenated together.

Hence this patch proposes that we simply remove the `disableCombineTextItems` option since it's essentially unused/untested functionality, as evident from the fact that it took almost two years for someone to notice that it's broken.
2023-03-30 14:23:38 +02:00
Calixte Denizet
2d0f30a67c Use the position of the previous xref stream if any when saving a pdf (bug 1823296) 2023-03-21 19:27:24 +01:00
Calixte Denizet
3a21423386 [Acroform] Use the full path to find the node in the XFA datasets where to store the value
I noticed several 'Path not found' errors because of a field called #subform[2].
From the XFA specs, the hash is used for a class of elements in the template tree.
When we're looking for a node in the datasets tree, it doesn't make sense to search
for a class. Hence the path element starting with a hash are just skipped.
2023-02-23 12:09:39 +01:00
Tim van der Meij
22618213c7
Merge pull request #16040 from Snuffleupagus/arrayBuffersToBytes
Re-factor the `arraysToBytes` helper function (PR 16032 follow-up)
2023-02-12 11:47:57 +01:00
Jonas Jenwald
6d4d402a78 Move the arrayBuffersToBytes helper function into the worker-thread
Given that this helper function is only used on the worker-thread, there's no reason to duplicate it in both of the *built* `pdf.js` and `pdf.worker.js` files.
2023-02-11 21:34:37 +01:00
Jonas Jenwald
18042163ce Improve the consistency between the LocalPdfManager/NetworkPdfManager constructor
Currently these classes take a bunch of parameters (somewhat randomly ordered), probably because this is very old code that's been extended over the years.
Hence this patch changes the constructors to use parameter-objects instead, which improves consistency and (slightly) reduces the amount of code as well.

*Please note:* Also removes the `msgHandler`-property on these classes, since I cannot find a single call-site that accesses it.
2023-02-11 13:39:52 +01:00
Jonas Jenwald
14b0e8c0b6 Ensure that "GetAnnotations" errors are propagated to the main-thread (PR 15267 follow-up)
With the changes in PR 15267 we're now accidentally swallowing "GetAnnotations" errors, rather than propagating them to the main-thread as intended.
2023-02-10 12:18:35 +01:00
Jonas Jenwald
c56f25409d Re-factor the arraysToBytes helper function (PR 16032 follow-up)
Currently this helper function only has two call-sites, and both of them only pass in `ArrayBuffer` data. Given how it's implemented there's a couple of code-paths that are completely unused (e.g. the "string" one), and in particular the intended fast-paths don't actually work.
This patch re-factors and simplifies the helper function, and it'll no longer accept anything except `ArrayBuffer` data (hence why it's also re-named).

Note that at the time when `arraysToBytes` was added we still supported browsers without TypedArray functionality, and we'd then simulate them using regular Arrays.
2023-02-10 10:26:35 +01:00
Jonas Jenwald
5ba596786c Change WorkerTasks, in WorkerMessageHandler.createDocumentHandler, to a use a Set
This is a tiny bit more compact, thanks to the `Set.prototype.delete` method.
2023-02-09 22:01:16 +01:00
Jonas Jenwald
96d338e437 Reduce usage of the arrayByteLength helper function
We're using this helper function when reading data from the [`PDFWorkerStreamReader.read`](a49d1d1615/src/core/worker_stream.js (L90-L98)) and [`PDFWorkerStreamRangeReader.read`](a49d1d1615/src/core/worker_stream.js (L122-L128)) methods, and as can be seen they always return `ArrayBuffer` data. Hence we can simply get the `byteLength` directly, and don't need to use the helper function.

Note that at the time when `arrayByteLength` was added we still supported browsers without TypedArray functionality, and we'd then simulate them using regular Arrays.
2023-02-09 15:50:38 +01:00
Jonas Jenwald
70d362f22c Remove an unnecessary variable in getPdfManager, in the src/core/worker.js file
Another tiny piece of clean-up, since adding a `catch`-handler to a Promise shouldn't require an intermediate variable.
2022-11-17 15:31:41 +01:00
Jonas Jenwald
a2a200175f Remove unnecessary function names in the src/core/worker.js file
Currently *some* functions in this file have names while others don't, and in a few cases the names are no longer entirely accurate.
For the relevant functions there should really be no need to name them, and if memory serves this was originally done since browsers (many years ago) didn't always handle anonymous functions correctly in stack traces.
2022-11-17 15:12:48 +01:00
Calixte Denizet
3ca03603c2 [Annotation] Fix printing/saving for annotations containing some non-ascii chars and with no fonts to handle them (bug 1666824)
- For text fields
 * when printing, we generate a fake font which contains some widths computed thanks to
   an OffscreenCanvas and its method measureText.
   In order to avoid to have to layout the glyphs ourselves, we just render all of them
   in one call in the showText method in using the system sans-serif/monospace fonts.
 * when saving, we continue to create the appearance streams if the fonts contain the char
   but when a char is missing, we just set, in the AcroForm dict, the flag /NeedAppearances
   to true and remove the appearance stream. This way, we let the different readers handle
   the rendering of the strings.
- For FreeText annotations
  * when printing, we use the same trick as for text fields.
  * there is no need to save an appearance since Acrobat is able to infer one from the
    Content entry.
2022-11-10 19:05:39 +01:00
Jonas Jenwald
caef47a0cf Remove the PdfManager.onLoadedStream method (PR 15616 follow-up)
After the clean-up in PR 15616, the `PdfManager.onLoadedStream` method now only has a single call-site.
Hence why this patch suggests that we remove this method and replace it with an *optional* parameter in `PdfManager.requestLoadedStream` instead. By making the new behaviour opt-in, we'll thus not change any existing call-site.
2022-10-29 14:42:17 +02:00
Jonas Jenwald
bcffbf74f3 Let the PdfManager.requestLoadedStream method return the stream
*This is very old code, and it could thus do with some simplification.*

Note how in the `src/core/worker.js` file we're combining both the `PdfManager.requestLoadedStream` and `PdfManager.onLoadedStream` methods in order to access the stream-data. This seems unnecessary, and it's simple enough to always let the `PdfManager.requestLoadedStream` method return the stream-data as well.
2022-10-24 17:00:48 +02:00
Jonas Jenwald
f2f0a1e871 [api-minor] Stop sending "UnsupportedFeature" from the worker-thread GetOperatorList-handling
This code was added all the way back in PR 6698, almost seven years ago, for backwards compatibility reasons. At this point in time, it seems that we can remove that since:
 - We have more fine-grained "UnsupportedFeature" reporting elsewhere in the worker-thread code nowadays.
 - The GetOperatorList-handling is now using `ReadableStream`s, which means that errors are being forwarded to the main-thread anyway.
 - We're also no longer displaying a notification-bar, in the *built-in* Firefox PDF Viewer, for any of these "UnsupportedFeature" messages.
2022-10-13 11:46:17 +02:00