1
0
Fork 0
mirror of https://github.com/mozilla/pdf.js.git synced 2025-04-19 22:58:07 +02:00
Commit graph

1251 commits

Author SHA1 Message Date
Calixte Denizet
3103deaa44 Fix missing annotation parent in using the one from the Fields entry
Fixes #15096.
2024-10-04 20:00:19 +02:00
Calixte Denizet
c9050be863 [Editor] Add the possibility to save an updated stamp annotation (bug 1921291) 2024-10-02 11:45:16 +02:00
Calixte Denizet
0382dd0e25 [Editor] When deleting an annotation with popup, then delete the popup too 2024-09-26 17:52:25 +02:00
Calixte Denizet
fc1564f476 Correctly compute the font size when printing a text field with an auto font size (bug 1917734) 2024-09-25 14:05:54 +02:00
Jonas Jenwald
bb302dd993 [api-minor] Pass CanvasFactory/FilterFactory, rather than instances, to getDocument
This unifies the various factory-options, since it's consistent with `CMapReaderFactory`/`StandardFontDataFactory`, and ensures that any needed parameters will always be consistently provided when creating `CanvasFactory`/`FilterFactory`-instances.

As shown in the modified example this may simplify some custom implementations, since we now provide the ability to access the `CanvasFactory`-instance used with a particular `getDocument`-invocation.
2024-09-23 11:26:30 +02:00
Jonas Jenwald
0a621ba73a Use fs/promises in the Node.js unit-tests (PR 17714 follow-up)
This is available in all Node.js versions that we currently support, and using it allows us to remove callback-functions; please see https://nodejs.org/docs/latest-v18.x/api/fs.html#promises-api
2024-09-22 12:57:23 +02:00
Calixte Denizet
46fac8b2c1 [Editor] Take into account the device pixel ratio when drawing an added image
Fixes #18626.
2024-09-16 14:48:26 +02:00
Jonas Jenwald
c52e8485a7
Merge pull request #18731 from Snuffleupagus/TextLayer-ensureCtxFont
Ensure that textLayers can be rendered in parallel, without interfering with each other
2024-09-11 16:29:53 +02:00
Jonas Jenwald
5b3d3c7dd9 Ensure that textLayers can be rendered in parallel, without interfering with each other
Note that the textContent is returned in "chunks" from the API, through the use of `ReadableStream`s, and on the main-thread we're (normally) using just one temporary canvas in order to measure the size of the textLayer `span`s; see the [`#layout`](5b4c2fe1a8/src/display/text_layer.js (L396-L428)) method.

*Order of events, for parallel textLayer rendering:*
 1. Call [`render`](5b4c2fe1a8/src/display/text_layer.js (L155-L177)) of the textLayer for page A.
 2. Immediately call `render` of the textLayer for page B.
 3. The first text-chunk for pageA arrives, and it's parsed/layout which means updating the cached [fontSize/fontFamily](5b4c2fe1a8/src/display/text_layer.js (L409-L413)) for the textLayer of page A.
 4. The first text-chunk for pageB arrives, which means updating the cached fontSize/fontFamily *for the textLayer of page B* since this data is unique to each `TextLayer`-instance.
 5. The second text-chunk for pageA arrives, and we don't update the canvas-font since the cached fontSize/fontFamily still apply from step 3 above.

Where this potentially breaks down is between the last steps, since we're using just one temporary canvas for all measurements but have *individual* fontSize/fontFamily caches for each textLayer.
Hence it's possible that the canvas-font has actually changed, despite the cached values suggesting otherwise, and to address this we instead cache the fontSize/fontFamily globally through a new (static) helper method.

*Note:* Includes a basic unit-test, using dummy text-content, which fails on `master` and passes with this patch.

Finally, pun intended, ensure that temporary textLayer-data is cleared *before* the `render`-promise resolves to avoid any intermittent problems in the unit-tests.
2024-09-11 15:28:51 +02:00
Calixte Denizet
06f9d8002d Consider foo-\nBar as a compound word
Fixes #18693.
2024-09-11 15:01:54 +02:00
Calixte Denizet
bae32b4fd2 [JS] Let AFSpecial_KeystrokeEx match a format without 'decoration' (bug 1916714)
It'll let the user enter 1234567 instead of 123-4567 for example.
It works like this in other pdf viewers.
2024-09-09 20:29:14 +02:00
Tim van der Meij
c159cb1335
Merge pull request #18682 from Snuffleupagus/responseHeaders
Use response-`Headers` in the different `IPDFStream` implementations
2024-09-08 11:49:50 +02:00
Calixte Denizet
68332ec236 Avoid to have a white line around the canvas
The canvas must have the same dims as the page in order to avoid to see the page
background.
2024-09-07 20:12:29 +02:00
Jonas Jenwald
840cc5e0d4 Use response-Headers in the different IPDFStream implementations
Given that the `Headers` functionality is now available in all browsers/environments that we support, [see MDN](https://developer.mozilla.org/en-US/docs/Web/API/Headers#browser_compatibility), we can utilize "proper" `Headers` in the helper functions that are used to parse the response.
2024-09-07 12:34:53 +02:00
Calixte Denizet
ddba096191 Make tagged images visible for screen readers (bug 1708040)
The idea is to insert a span in the text layer with an aria-role set to img
and use the bounding box provided by the attribute field in the tag dict in
order to have non-null dimensions for the image to make it "visible".
2024-09-05 17:59:42 +02:00
Jonas Jenwald
d3a94f17cb Use Headers consistently in the different IPDFStream implementations
The `Headers` functionality is now available in all browsers/environments that we support, which allows us to consolidate and simplify how the `httpHeaders` API-option is handled; see https://developer.mozilla.org/en-US/docs/Web/API/Headers#browser_compatibility

Also, simplifies the old `NetworkManager`-constructor a little bit.
2024-09-02 11:56:24 +02:00
Nicolò Ribaudo
229ad1bb2c
Use the URL global instead of the deprecated url.parse
The Node.js url.parse API (https://nodejs.org/api/url.html#urlparseurlstring-parsequerystring-slashesdenotehost)
is deprecated because it's prone to security issues (to the point that Node.js doesn't even publish CVEs for it anymore).

The official reccomendation is to instead use the global URL constructor, available both in Node.js and in browsers.
Node.js filesystem APIs accept URL objects as parameter, so this also avoids a few URL->filepath conversions.
2024-08-27 18:19:25 +02:00
Jonas Jenwald
8728f7f134 Support an odd number of digits in hexadecimal strings (issue 18645)
See https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf#G6.1840792
2024-08-23 16:31:43 +02:00
Nicolò Ribaudo
f051597e23
Allow specifying custom match logic in PDFFindController
This patch allows embedders of PDF.js to provide custom match
logic for seaching in PDFs. This is done by subclassing the
PDFFindController class and overriding the `match` method.

`match` is called once per PDF page, receives as parameters the
search query, the page contents, and the page index, and returns
an array of { index, length } objects representing the search
results.
2024-08-13 10:45:57 +02:00
Jonas Jenwald
c4fdb28573 Remove PDFWorkerUtil and move its contents into PDFWorker instead
This is possible thanks to features, i.e. private fields and in particular static initialization blocks, that didn't exist back when we started using classes in the code-base.
2024-07-29 11:22:43 +02:00
Jonas Jenwald
c4cd405a8f Ignore non-dictionary nodes when parsing StructTree data (issue 18503) 2024-07-28 12:08:44 +02:00
Jonas Jenwald
d116ab1a22 Shorten the errors mentioning API parameters in BaseCMapReaderFactory and BaseStandardFontDataFactory
The current error-messages also mention internal parameters, which an end-user obviously don't have to care about. So, let's try to avoid confusion here by only including the API parameters.
2024-07-27 16:54:54 +02:00
Calixte Denizet
c3065629ca [Editor] Correctly save a non-ascii alt text 2024-07-24 19:13:45 +02:00
Jonas Jenwald
d24a61c648 Allow /XYZ destinations without zoom parameter (issue 18408)
According to the PDF specification these destinations should have a zoom parameter, which may however be `null`, but it shouldn't be omitted; please see https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf#G11.2095870

Hence we try to work-around bad PDF generators by making the zoom parameter optional when validating explicit destinations in both the worker and the viewer.
2024-07-18 13:29:32 +02:00
Jonas Jenwald
8a979c2d0e [api-minor] Remove Outliner from the official API
As far as I can tell `Outliner` is only exposed in the API because we need to access it when running some of the reference-tests, but is otherwise not used.
Hence this seems like something that should be kept *internal* and thus only exposed in TESTING-builds.
2024-07-16 13:08:26 +02:00
calixteman
9b1b5ff7e7
Merge pull request #18419 from calixteman/reuse_old_dict_when_updating
[Editor] Update the freetext annotation dictionary instead of creating a new one when updating an existing freetext
2024-07-11 11:24:15 +02:00
Calixte Denizet
6711123f68 [Editor] Update the freetext annotation dictionary instead of creating a new one when updating an existing freetext 2024-07-11 10:44:21 +02:00
Jonas Jenwald
403d023617 Allow e.g. /FitH destinations without additional parameter (bug 1907000)
According to the PDF specification these destinations should have a coordinate parameter, which may however be `null`, but it shouldn't be omitted; please see https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf#G11.2095870

Hence we try to work-around bad PDF generators by making the coordinate parameter optional when validating explicit destinations in both the worker and the viewer.
2024-07-11 10:36:44 +02:00
Jonas Jenwald
5ee61690f3
Merge pull request #18390 from alexcat3/fix-issue-18099
Handle toUnicode cMaps that omit leading zeros in hex encoded UTF-16 (issue 18099)
2024-07-06 18:57:07 +02:00
alexcat3
1c364422a6 Handle toUnicode cmaps that omit leading zeros in hex encoded UTF-16 (issue 18099)
Add unit test to check compatability with such cmaps

In the PDF in issue 18099. the toUnicode cmap had a line to map the glyph char codes from 00 to 7F to the corresponding code points. The syntax to map a range of char codes to a range of unicode code points is
<start_char_code> <end_char_code> <start_unicode_codepoint>
As the unicode code points are supposed to be given in UTF-16 BE, the PDF's line SHOULD have probably read
<00> <7F> <0000>
Instead it omitted two leading zeros from the UTF-16 like this
<00> <7F> <00>
This confused PDF.js into mapping these character codes to the UTF-16 characters with the corresponding HIGH bytes (01 became \u0100, 02 became \u0200, et cetera), which ended up turning latin text in the PDF into chinese when it was copied
I'm not sure if the PDF spec actually allows PDFs to do this, but since there's at least one PDF in the wild that does and other PDF readers read it correctly, PDF.js should probably support this
2024-07-06 11:29:21 -04:00
Tim van der Meij
2a44203d96
Fix the "caches image resources at the document/page level as expected (issue 11878)" unit test
This unit test fails occasionally (albeit much less than before thanks
to PR #17663), so we change the parsing time check's divisor to prevent
it from happening again. If the last page's rendering time is less than
or equal to 50% of the first page's rendering time that should be enough
proof that no worker thread re-parsing occurred while also providing a
wide enough range to avoid intermittents.

Note that the assertion is now equal to the one we already have in the
"caches image resources at the document/page level, with main-thread
copying of complex images (issue 11518)" unit test which seems to work
reliably so far.
2024-07-06 16:30:07 +02:00
Jonas Jenwald
38528d1116 Remove the renderForms parameter from the Annotation getOperatorList methods
The `renderForms` parameter pre-dates the introduction of the general `intent` parameter, which means that we're now effectively passing the same state twice to these `getOperatorList` methods.
2024-07-05 12:25:18 +02:00
Jonas Jenwald
f3d177e3e4 [api-minor] Remove the deprecated renderTextLayer and updateTextLayer functions (PR 18104 follow-up) 2024-06-30 15:16:00 +02:00
Tim van der Meij
2f3bf6f07e
Don't ignore errors in the Jasmine suite start/end stages
Currently errors in `afterAll` are logged, but don't fail the tests.
This could cause new errors during test teardown to go by unnoticed.

Moreover, the integration test use a different reporting mechanism which
also handled errors differently (this is extra reason to do #12730).

This patch fixes the issues by consistently handling errors in
`suiteStarted` and `suiteDone` in both reporting mechanisms.

Fixes #18319.
2024-06-23 20:59:48 +02:00
bootleq
890c567eca Expose entireWord in updateFindControlState
Allow apps with supportsIntegratedFind to better monitor the find state.

A recognized use case is the Firefox findbar, its "not found" sound must
consider `entireWord` and only make noise when it is off.

See related implementation in
https://hg.mozilla.org/mozilla-central/rev/16b902cbcf26

This change can help if we have to move the implementation from cpp to jsm.
2024-06-21 13:12:59 +08:00
Calixte Denizet
c14c3cfc9f Improve date parsing in the js sandbox
If for example dd:mm is failing we just try with d:m which is equivalent
to the regex /d{1,2}:m{1,2}/. This way it allows the user to forget the
0 for the first days/months.
2024-06-14 17:21:50 +02:00
Calixte Denizet
6fa98ac99f [api-minor] Simplify how the list of points are structured
Instead of sending to the main thread an array of Objects for a list of points (or quadpoints),
we'll send just a basic float buffer.
It should slightly improve performances (especially when cloning the data) and use slightly less memory.
2024-05-30 15:36:15 +02:00
Jonas Jenwald
06334c97ef Improve the loadingParams functionality in the API
- Move the definition of the `loadingParams` Object, to simplify the code.

 - Add a unit-test, since none existed and the viewer depends on this functionality.
2024-05-24 09:26:40 +02:00
Jonas Jenwald
15b5808eee [api-minor] Re-factor the basic textLayer-functionality
This is very old code, and predates e.g. the introduction of JavaScript classes, which creates unnecessarily unwieldy code in the viewer.
By introducing a new `TextLayer` class in the API, similar to how e.g. the `AnnotationLayer` looks, we're able to keep most parameters on the class-instance itself. This removes the need to manually track them in the viewer, and simplifies the call-sites.

This also removes the `numTextDivs` parameter from the "textlayerrendered" event, since that's only added to support default-viewer functionality that no longer exists.

Finally we try, as far as possible, to polyfill the old `renderTextLayer` and `updateTextLayer` functions since they are exposed in the library API.
For *simple* invocations of `renderTextLayer` the behaviour should thus be the same, with only a warning printed in the console.
2024-05-17 14:20:20 +02:00
Tim van der Meij
4db843617f
Merge pull request #18047 from Snuffleupagus/issue-18042
Avoid re-parsing global images that failed decoding (issue 18042, PR 17428 follow-up)
2024-05-15 15:40:18 +02:00
Tim van der Meij
6b237e3358
Implement a unit test for the BaseException class
The issue from #18003 hasn't been shown to be caused by PDF.js, but it
did surface that we don't have (direct) unit test coverage for the
`BaseException` class. This made it more difficult to prove that the
`stack` property was already available on exception instances, but more
importantly it caused the CI to be green even though the suggested
change would have caused the `stack` property to disappear.

To avoid future regressions, for e.g. similar changes or a rewrite from
a closure to a proper class, this commit introduces a dedicated unit
test for `BaseException` that asserts that our exception instances
indeed expose all expected properties.
2024-05-14 20:21:42 +02:00
Jonas Jenwald
c5f92437f7 Avoid re-parsing global images that failed decoding (issue 18042, PR 17428 follow-up)
For images that failed to decode once we want to avoid a pointless round-trip to the main-thread, which could otherwise happen for globally cached images.
2024-05-14 13:58:36 +02:00
Jonas Jenwald
6d523c316c [api-minor] Include the document /Lang attribute in the textContent-data
- These changes will allow a simpler way of implementing PR 17770.

 - The /Lang attribute is fetched lazily, with the first `getTextContent` invocation. Given the existing worker-thread caching, this will thus only need to be done *once* per PDF document (and most PDFs don't included this data).

 - This makes the /Lang attribute *directly available* in the `textLayer`, which has the following advantages:
    - We don't need to block, and thus delay, overall viewer initialization on fetching it (nor pass it around throughout the viewer).

    - Third-party users of the `textLayer` will automatically benefit from this, once we start actually using the /Lang attribute in PR 17770.
      *Please note:* This also, importantly, means that the `text` reference-tests will then cover this code (which wouldn't otherwise have been the case).
2024-05-14 12:44:41 +02:00
Jonas Jenwald
c0b5d93ef4
Merge pull request #18052 from Snuffleupagus/textLayer-only-ReadableStream
Restore broken functionality and simplify the implementation in `src/display/text_layer.js`
2024-05-14 12:30:27 +02:00
Jonas Jenwald
049848ba00 Unify the ReadableStream and TextContent code-paths in src/display/text_layer.js
The only reason that this code still accepts `TextContent` is for backward-compatibility purposes, so we can simplify the implementation by always using a `ReadableStream` internally.
2024-05-07 13:03:57 +02:00
Jonas Jenwald
2643570364 [api-minor] Re-factor how Node.js packages/polyfills are loaded (issue 17245)
*Please note:* This removes top level await from the GENERIC builds of the PDF.js library.

Despite top level await being supported in all modern browsers/environments, note [the MDN compatibility data](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/await#browser_compatibility), it seems that many frameworks and build-tools unfortunately have trouble with it.
Hence, in order to reduce the influx of support requests regarding top level await it thus seems that we'll have to try and fix this.

Given that top level await is only needed for Node.js environments, to load packages/polyfills, we re-factor things to limit the asynchronicity to that environment.
The "best" solution, with the least likelihood of causing future problems, would probably be to await the load of Node.js packages/polyfills e.g. at the top of the `getDocument`-function. Unfortunately that doesn't work though, since that's a *synchronous* function that we cannot change without breaking "the world".

Hence we instead await the load of Node.js packages/polyfills together with the `PDFWorker` initialization, since that's the *first point* of asynchronicity during initialization/loading of a PDF document. The reason that this works is that the Node.js packages/polyfills are only needed during fetching of the PDF document respectively during rendering, neither of which can happen *until* the worker has been initialized.
Hopefully this won't cause any future problems, since looking at the history of the PDF.js project I don't believe that we've (thus far) ever needed a Node.js dependency at an earlier point.
This new pattern for accessing Node.js packages/polyfills will also require some care during development *and* importantly reviewing, to ensure that no new top level await is added in the main code-base.
2024-05-06 23:20:03 +02:00
Jonas Jenwald
a790f2df5d [api-minor] Remove the unused onlyStripPath option from the getFilenameFromUrl helper function 2024-05-03 08:29:41 +02:00
Jonas Jenwald
2b69fb76ac [api-minor] Improve the FileSpec implementation
- Check that the `filename` is actually a string, before parsing it further.
 - Use proper "shadowing" in the `filename` getter.
 - Add a bit more validation of the data in `pickPlatformItem`.
 - Last, but not least, return both the original `filename` and the (path stripped) variant needed in the display-layer and viewer.
2024-05-01 18:02:05 +02:00
Jonas Jenwald
bf4e36d1b5 [api-minor] Expose the /Desc-attribute of file attachments in the viewer (issue 18030)
In the viewer this will be displayed in the `title` of the hyperlink, which is probably the best we can do here given how the viewer is implemented.
2024-05-01 09:02:11 +02:00
Calixte Denizet
45fa867577 Allow to insert several annotations under the same parent in the structure tree
While testing stamp insertion with the added pdf, I noticed that the tags using a MCID
weren't considered when trying to attach an annotation to it.
2024-04-24 16:23:05 +02:00