mirror of https://github.com/mozilla/pdf.js.git synced 2025-04-23 00:28:06 +02:00
Commit graph

6460 commits

Author SHA1 Message Date
Calixte Denizet
6fa98ac99f [api-minor] Simplify how the list of points are structured
Instead of sending to the main thread an array of Objects for a list of points (or quadpoints),
we'll send just a basic float buffer.
It should slightly improve performance (especially when cloning the data) and use slightly less memory.
2024-05-30 15:36:15 +02:00
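To illustrate the idea in commit 6fa98ac99f above, here is a minimal sketch of packing a list of point Objects into one flat float buffer. The helper names `packPoints` and `pointAt` are hypothetical and the `[x0, y0, x1, y1, ...]` layout is an assumption; the real PDF.js data layout may differ.

```javascript
// Hypothetical helpers for illustration only; not the actual PDF.js code.

// Pack [{ x, y }, ...] into a single Float32Array laid out as
// [x0, y0, x1, y1, ...] (assumed layout).
function packPoints(points) {
  const buffer = new Float32Array(points.length * 2);
  for (let i = 0; i < points.length; i++) {
    buffer[2 * i] = points[i].x;
    buffer[2 * i + 1] = points[i].y;
  }
  return buffer;
}

// Read the point at `index` back out of the flat buffer.
function pointAt(buffer, index) {
  return { x: buffer[2 * index], y: buffer[2 * index + 1] };
}

const packed = packPoints([
  { x: 1, y: 2 },
  { x: 3.5, y: 4.5 },
]);
```

A single typed array is one contiguous allocation, which is why it tends to be cheaper to clone between threads than many small Objects.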
Tim van der Meij
ee545930ea
Merge pull request #18171 from Snuffleupagus/move-pendingTextLayers
Don't register a pending `TextLayer` until `render` is invoked (PR 18104 follow-up)
2024-05-28 15:37:51 +02:00
Jonas Jenwald
f2e7eee00e Don't register a pending TextLayer until render is invoked (PR 18104 follow-up)
After the re-factoring in PR 18104 there's now a *theoretical* risk that a pending `TextLayer` is never removed, which we can avoid by not registering it until `render` is invoked.
Note that this doesn't affect the viewer or tests, but if a third-party user calls `new TextLayer(...)` without a following call of either the `render`- or `cancel`-method we'd block global clean-up without this patch.
2024-05-26 18:38:40 +02:00
Jonas Jenwald
27436d52b2 Reduce indentation when parsing new annotations in getOperatorList
This code has, over the years, become more complex and less indentation generally helps readability.
2024-05-25 12:00:44 +02:00
Jonas Jenwald
ce52ce063e Change parsingType3Font to a getter (PR 14448 follow-up)
We can easily "compute" `parsingType3Font` from the `type3FontRefs`-value, and thus avoid having to separately track two related properties.
2024-05-25 10:46:12 +02:00
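The getter pattern described in commit ce52ce063e above could look roughly like this. Only the names `parsingType3Font` and `type3FontRefs` come from the commit message; the class name and the use of a `Set` are assumptions made for the sketch.

```javascript
// Hypothetical class; only the two property names come from the commit message.
class EvaluatorStateSketch {
  constructor() {
    // Assumed to hold a collection while a Type3 font is being parsed,
    // and to be null otherwise.
    this.type3FontRefs = null;
  }

  // Derived from `type3FontRefs`, so there is no second flag to keep in sync.
  get parsingType3Font() {
    return !!this.type3FontRefs;
  }
}

const state = new EvaluatorStateSketch();
const before = state.parsingType3Font;
state.type3FontRefs = new Set(["font_1"]);
const during = state.parsingType3Font;
```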
Jonas Jenwald
c349ac3a5d Skip the temporary variable when calling #findStreamLength (PR 18125 follow-up) 2024-05-25 10:38:32 +02:00
Jonas Jenwald
17e09e5478
Merge pull request #18159 from Snuffleupagus/loadingParams-test
Improve the `loadingParams` functionality in the API
2024-05-24 23:21:24 +02:00
Jonas Jenwald
cfcb700ecc Prevent XRef errors from breaking font loading (bug 1898802)
Note that the referenced file is trivially corrupt, since it contains *two* PDF documents placed in the same file which doesn't make sense (and isn't how a PDF document should be updated).
However it's still a good idea to ensure that `loadFont` is able to handle errors when resolving References, since that allows us to invoke the existing fallback font handling.
2024-05-24 21:37:35 +02:00
Jonas Jenwald
06334c97ef Improve the loadingParams functionality in the API
- Move the definition of the `loadingParams` Object, to simplify the code.

 - Add a unit-test, since none existed and the viewer depends on this functionality.
2024-05-24 09:26:40 +02:00
Jonas Jenwald
3afa9bfc42 Improve /Page validation for linearized documents (issue 18138)
The referenced PDF document contains corrupt linearization-data, that doesn't point to the *first* page as intended.
2024-05-22 12:04:02 +02:00
Jonas Jenwald
2a52fda11b
Merge pull request #17770 from Aditi-1400/fix-issue-16843
Add language attribute to canvas
2024-05-21 21:35:43 +02:00
calixteman
5da2894278
Merge pull request #18136 from calixteman/ml_stamp
[Editor] Pass a buffer instead of a blob url to the ML api
2024-05-21 18:24:26 +02:00
Calixte Denizet
b20ddff300 [Editor] Pass a buffer instead of a blob url to the ML api 2024-05-21 17:07:03 +02:00
Calixte Denizet
2369e40d2e [Editor] Update popup position and contents after a FreeText has been edited 2024-05-21 16:54:10 +02:00
Aditi
9edca0a5ed Add lang attribute to canvas element
Fixes issue #16843.
In certain cases, the text layer was misaligned
due to a difference between the `lang` attribute
of the viewer and the canvas. This commit addresses
the problem by adding the `lang` attribute to the canvas.

The issue was caused because PDF.js uses serif/sans-serif
fonts to generate the text layer and relies on system fonts.
The difference in the `lang` attribute led to different fonts
being picked, causing the misalignment.
2024-05-21 19:41:24 +05:30
Jonas Jenwald
57014d0d13 Support corrupt PDF documents that contain "endsteam" commands (issue 18122)
This patch also re-factors the findStreamLength-helper to avoid even more code duplication.
2024-05-21 13:38:17 +02:00
Jonas Jenwald
9ee7c07b83
Merge pull request #18104 from Snuffleupagus/TextLayer-class
[api-minor] Re-factor the basic textLayer-functionality
2024-05-21 12:28:28 +02:00
Jonas Jenwald
59637c1fa8
Merge pull request #18115 from Snuffleupagus/freeze-evaluatorOptions
Freeze `evaluatorOptions` in the src/core/pdf_manager.js file
2024-05-21 12:19:04 +02:00
Jonas Jenwald
440b4b6eeb Support charCodes larger than 32-bit in adjustMapping (issue 18117)
This also required changing the initial `charCodeToGlyphId`-data to an Object, which seems generally correct since it's consistent with existing code in the `src\core\{cff_font, type1_font}.js` files.
2024-05-20 12:13:55 +02:00
Jonas Jenwald
3cd6c6c0e6 Freeze evaluatorOptions in the src/core/pdf_manager.js file
Given that these options are passed from the API we don't want to accidentally modify them.
2024-05-18 15:16:12 +02:00
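As a small illustration of commit 3cd6c6c0e6 above, `Object.freeze` makes accidental writes to an options Object ineffective. The option names below are placeholders chosen for the sketch, not a statement of what `evaluatorOptions` actually contains.

```javascript
// Illustrative option names; assumptions for this sketch.
const evaluatorOptions = Object.freeze({
  maxImageSize: -1,
  ignoreErrors: true,
});

try {
  // Throws a TypeError in strict-mode code (including ES modules),
  // and is silently ignored in sloppy-mode code.
  evaluatorOptions.ignoreErrors = false;
} catch {
  // Expected in strict mode.
}

const stillTrue = evaluatorOptions.ignoreErrors;
```

Note that `Object.freeze` is shallow, so nested Objects would need to be frozen separately.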
Jonas Jenwald
15b5808eee [api-minor] Re-factor the basic textLayer-functionality
This is very old code, and predates e.g. the introduction of JavaScript classes, which creates unnecessarily unwieldy code in the viewer.
By introducing a new `TextLayer` class in the API, similar to how e.g. the `AnnotationLayer` looks, we're able to keep most parameters on the class-instance itself. This removes the need to manually track them in the viewer, and simplifies the call-sites.

This also removes the `numTextDivs` parameter from the "textlayerrendered" event, since that's only added to support default-viewer functionality that no longer exists.

Finally we try, as far as possible, to polyfill the old `renderTextLayer` and `updateTextLayer` functions since they are exposed in the library API.
For *simple* invocations of `renderTextLayer` the behaviour should thus be the same, with only a warning printed in the console.
2024-05-17 14:20:20 +02:00
Jonas Jenwald
d8e0fca609 Don't invoke cleanupTextLayer when there are pending textLayers
*Please note:* This doesn't really affect the viewer, but may affect the library API if multiple PDF documents are opened in parallel.

Since we clean-up "global" textLayer-data when destroying a PDF document, this means that other active PDFs could potentially break by invoking `cleanupTextLayer` unconditionally. Note that textLayer rendering is an asynchronous task, and we thus need to ensure those are all finished before running clean-up.
2024-05-17 08:52:10 +02:00
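The guard described in commit d8e0fca609 above, combined with the later follow-up that only registers a layer once `render` is invoked, can be sketched as follows. Everything except the method name `render` is hypothetical (`pendingLayers`, `cleanupGlobalData`, `finish`), and the sketch is synchronous for brevity whereas real textLayer rendering is asynchronous.

```javascript
// Hypothetical sketch; not the actual PDF.js implementation.
const pendingLayers = new Set();
let cleanupCount = 0;

function cleanupGlobalData() {
  if (pendingLayers.size > 0) {
    return; // Other layers may still depend on the shared data.
  }
  cleanupCount++; // Stand-in for releasing the shared ("global") data.
}

class TextLayerSketch {
  render() {
    // Registered only once rendering starts, so a layer that is
    // constructed but never rendered cannot block clean-up.
    pendingLayers.add(this);
  }

  finish() {
    // Called when rendering completes or is cancelled.
    pendingLayers.delete(this);
  }
}

const layer = new TextLayerSketch();
cleanupGlobalData(); // Runs: nothing is pending yet.
layer.render();
cleanupGlobalData(); // Skipped: one layer is pending.
layer.finish();
cleanupGlobalData(); // Runs again.
```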
Jonas Jenwald
d5f3829f91 Actually disable TextLayerRenderTask.prototype.#processItems when MAX_TEXT_DIVS_TO_RENDER is reached (PR 18089 follow-up)
I broke this accidentally in PR 18089, sorry about that!
Note that since `#processItems` is private we can no longer just "replace" the method as was done in PR 18052.
2024-05-16 11:48:11 +02:00
Tim van der Meij
4db843617f
Merge pull request #18047 from Snuffleupagus/issue-18042
Avoid re-parsing global images that failed decoding (issue 18042, PR 17428 follow-up)
2024-05-15 15:40:18 +02:00
Jonas Jenwald
6b171540b7 Initialize the networkStream synchronously in getDocument
This is fairly old code, and at some point the need for this to be asynchronous disappeared.
2024-05-14 17:04:25 +02:00
Jonas Jenwald
cbb8748a22 Inline the _fetchDocument helper function in getDocument
This function has been modified a number of times over the years, and at this point it's small/simple enough that we can just inline the code instead.
2024-05-14 16:29:41 +02:00
Jonas Jenwald
036fd11ad7 Improve the TextLayerRenderTask implementation
- Change all possible semi-private methods into properly private ones. Note that this code is old enough to predate standard classes.

 - Move the `appendText` helper function into `TextLayerRenderTask`, as a private method, to avoid having to manually pass in the scope.

 - Simplify `#layoutText` by directly passing in all necessary data. This is possible after the changes in PR 18052.
2024-05-14 14:10:17 +02:00
Jonas Jenwald
c5f92437f7 Avoid re-parsing global images that failed decoding (issue 18042, PR 17428 follow-up)
For images that failed to decode once we want to avoid a pointless round-trip to the main-thread, which could otherwise happen for globally cached images.
2024-05-14 13:58:36 +02:00
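One way to picture commit c5f92437f7 above is a cache that also remembers failures, so a broken image is not decoded again. This is only a sketch under that assumption; `globalImageCache`, `decodeImage` and `getImage` are hypothetical names, and the actual change concerns worker/main-thread messaging that is not modelled here.

```javascript
// Hypothetical cache; not the actual PDF.js image cache.
const globalImageCache = new Map();
let decodeAttempts = 0;

// Stand-in decoder that fails for one particular reference.
function decodeImage(ref) {
  decodeAttempts++;
  if (ref === "bad") {
    throw new Error("decode failed");
  }
  return { ref, width: 1, height: 1 };
}

function getImage(ref) {
  if (globalImageCache.has(ref)) {
    // May be `null`, which records an earlier decoding failure.
    return globalImageCache.get(ref);
  }
  let result;
  try {
    result = decodeImage(ref);
  } catch {
    result = null;
  }
  globalImageCache.set(ref, result);
  return result;
}

const firstTry = getImage("bad");
const secondTry = getImage("bad"); // Served from the cache, no second decode.
```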
Jonas Jenwald
6d523c316c [api-minor] Include the document /Lang attribute in the textContent-data
- These changes will allow a simpler way of implementing PR 17770.

 - The /Lang attribute is fetched lazily, with the first `getTextContent` invocation. Given the existing worker-thread caching, this will thus only need to be done *once* per PDF document (and most PDFs don't include this data).

 - This makes the /Lang attribute *directly available* in the `textLayer`, which has the following advantages:
    - We don't need to block, and thus delay, overall viewer initialization on fetching it (nor pass it around throughout the viewer).

    - Third-party users of the `textLayer` will automatically benefit from this, once we start actually using the /Lang attribute in PR 17770.
      *Please note:* This also, importantly, means that the `text` reference-tests will then cover this code (which wouldn't otherwise have been the case).
2024-05-14 12:44:41 +02:00
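The "fetched lazily, only once" behaviour described in commit 6d523c316c above resembles caching a promise. The sketch below assumes a `lang` field on the returned data and uses a hypothetical `DocumentSketch` class; it does not reproduce the actual worker-thread caching.

```javascript
// Hypothetical class; the `lang` field name is an assumption.
class DocumentSketch {
  #langPromise = null;
  fetchCount = 0;

  // Stand-in for reading the /Lang entry from the document.
  #fetchLang() {
    this.fetchCount++;
    return Promise.resolve("en-US");
  }

  // The first call starts the fetch; later calls reuse the same promise.
  getLang() {
    return (this.#langPromise ||= this.#fetchLang());
  }

  async getTextContent() {
    return { items: [], lang: await this.getLang() };
  }
}

const doc = new DocumentSketch();
const firstLang = doc.getLang();
const secondLang = doc.getLang();
```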
Jonas Jenwald
c0b5d93ef4
Merge pull request #18052 from Snuffleupagus/textLayer-only-ReadableStream
Restore broken functionality and simplify the implementation in `src/display/text_layer.js`
2024-05-14 12:30:27 +02:00
Jonas Jenwald
298d72133e
Merge pull request #18051 from Snuffleupagus/NodePackages
[api-minor] Re-factor how Node.js packages/polyfills are loaded (issue 17245)
2024-05-14 11:43:57 +02:00
Jonas Jenwald
761abc7cc3
Merge pull request #18066 from Snuffleupagus/rm-FontFaceObject-ignoreErrors
Remove the `ignoreErrors` option from the `FontFaceObject` class
2024-05-14 09:49:08 +02:00
Tim van der Meij
0347e59b99
Merge pull request #18061 from Snuffleupagus/api-report-Stats
Slightly re-factor how the viewer initializes debug-only functionality
2024-05-13 19:38:59 +02:00
Jonas Jenwald
4aee67227e Remove the unused Font.prototype.spaceWidth getter (PR 13424 follow-up)
This getter became unused in PR 13424, well over two years ago, and apparently none of us noticed that.
2024-05-11 11:50:51 +02:00
Jonas Jenwald
5f6f1686b5 Remove the ignoreErrors option from the FontFaceObject class
- The `stopAtErrors` API option, which is the inverse of the "internal" `ignoreErrors` option, is explicitly documented as applying to *parsing* (i.e. the worker-thread) while the `FontFaceObject` class is used during rendering (i.e. the main-thread); see b6765403a1/src/display/api.js (L164-L167)

 - A glyph that fails in the `FontRendererFactory`, on the worker-thread, will already cause (overall) parsing to stop when `ignoreErrors === false` hence checking the option on the main-thread as well seems redundant; see b6765403a1/src/core/evaluator.js (L4527-L4533)

 - Removing this option simplifies the code, and slightly reduces the number of options that we need to handle in the main-thread code.
2024-05-11 10:18:23 +02:00
Jonas Jenwald
5e50479ac6 Use more object destructuring in the "commonobj" handler in the API 2024-05-11 09:44:10 +02:00
Jonas Jenwald
4a8d742592 Move the reporting of page Stats into the API
This avoids having to add a couple of event listeners in the viewer, when debugging is enabled, and is consistent with the existing handling of `FontInspector` and `StepperManager` in the API.
2024-05-11 09:42:05 +02:00
Jonas Jenwald
8d86e18a32 Restore the MAX_TEXT_DIVS_TO_RENDER limit in the textLayer
This limit is currently completely non-functional, since the check happens *after* the entire textLayer has been parsed and appended to the DOM. It seems that this has been *accidentally* broken ever since the introduction of `ReadableStream` support.
The reason that this hasn't caused noticeable textLayer-related performance issues in practice is probably because we nowadays manage to coalesce the textLayer into fewer overall DOM elements, whereas years ago many PDF documents ended up with one DOM element *per* glyph.

By moving this check, and thus restoring the functionality, we're also able to remove the `render` helper function and simplify the code.
2024-05-07 13:04:00 +02:00
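The point of commit 8d86e18a32 above is that the limit must be checked while items are processed rather than after everything has been appended. A minimal sketch, using a plain array as a stand-in for the DOM and a hypothetical `processItemsSketch` function with a tiny limit for illustration:

```javascript
// Tiny limit for illustration; the real constant and its value live in PDF.js.
const MAX_TEXT_DIVS_SKETCH = 3;

// Appends items until the limit is reached and reports whether
// every item was appended.
function processItemsSketch(items, container) {
  for (const item of items) {
    if (container.length >= MAX_TEXT_DIVS_SKETCH) {
      return false; // Stop before appending anything more.
    }
    container.push(item);
  }
  return true;
}

const container = [];
const completed = processItemsSketch(["a", "b", "c", "d", "e"], container);
```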
Jonas Jenwald
30840e411e Ensure that the textLayer styleCache is always cleared, even on failure
By also moving it to the `TextLayerRenderTask`-instance, we can avoid a bit of manual parameter passing.
2024-05-07 13:04:00 +02:00
Jonas Jenwald
049848ba00 Unify the ReadableStream and TextContent code-paths in src/display/text_layer.js
The only reason that this code still accepts `TextContent` is for backward-compatibility purposes, so we can simplify the implementation by always using a `ReadableStream` internally.
2024-05-07 13:03:57 +02:00
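A rough sketch of the unification in commit 049848ba00 above: an already-resolved `TextContent` Object can be wrapped in a `ReadableStream` that emits it as a single chunk, so that only one code-path is needed afterwards. The function name `toTextContentStream` is hypothetical, and the sketch assumes an environment with a global `ReadableStream` (modern browsers, Node.js 18+).

```javascript
// Hypothetical helper; not the actual PDF.js implementation.
function toTextContentStream(source) {
  if (source instanceof ReadableStream) {
    return source; // Already a stream, use it as-is.
  }
  // Wrap the plain Object so that consumers can always read a stream.
  return new ReadableStream({
    start(controller) {
      controller.enqueue(source);
      controller.close();
    },
  });
}

const textContent = { items: [{ str: "Hello" }], styles: {} };
const wrapped = toTextContentStream(textContent);
const passedThrough = toTextContentStream(wrapped);
```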
Jonas Jenwald
2643570364 [api-minor] Re-factor how Node.js packages/polyfills are loaded (issue 17245)
*Please note:* This removes top level await from the GENERIC builds of the PDF.js library.

Despite top level await being supported in all modern browsers/environments, note [the MDN compatibility data](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/await#browser_compatibility), it seems that many frameworks and build-tools unfortunately have trouble with it.
Hence, in order to reduce the influx of support requests regarding top level await, it seems that we'll have to try and fix this.

Given that top level await is only needed for Node.js environments, to load packages/polyfills, we re-factor things to limit the asynchronicity to that environment.
The "best" solution, with the least likelihood of causing future problems, would probably be to await the load of Node.js packages/polyfills e.g. at the top of the `getDocument`-function. Unfortunately that doesn't work though, since that's a *synchronous* function that we cannot change without breaking "the world".

Hence we instead await the load of Node.js packages/polyfills together with the `PDFWorker` initialization, since that's the *first point* of asynchronicity during initialization/loading of a PDF document. The reason that this works is that the Node.js packages/polyfills are only needed during fetching of the PDF document respectively during rendering, neither of which can happen *until* the worker has been initialized.
Hopefully this won't cause any future problems, since looking at the history of the PDF.js project I don't believe that we've (thus far) ever needed a Node.js dependency at an earlier point.
This new pattern for accessing Node.js packages/polyfills will also require some care during development *and* importantly reviewing, to ensure that no new top level await is added in the main code-base.
2024-05-06 23:20:03 +02:00
Jonas Jenwald
9b41bfc374 Introduce helper functions for parsing /Matrix and /BBox arrays 2024-05-03 22:37:50 +02:00
Jonas Jenwald
52f7ff155d Validate even more dictionary properties
This checks primarily Arrays, but also some other properties, that we'll end up sending (sometimes indirectly) to the main-thread.
2024-05-03 22:37:14 +02:00
Jonas Jenwald
1b811ac113
Merge pull request #18034 from Snuffleupagus/FileSpec-filename-stripPath
[api-minor] Improve the `FileSpec` implementation
2024-05-03 09:03:17 +02:00
Jonas Jenwald
a790f2df5d [api-minor] Remove the unused onlyStripPath option from the getFilenameFromUrl helper function 2024-05-03 08:29:41 +02:00
Jonas Jenwald
c419c8333b
Merge pull request #18037 from Snuffleupagus/validate-more-widths
Add even more validation of width-data (PR 18017 follow-up)
2024-05-02 14:41:02 +02:00
Jonas Jenwald
6c05f8b381 Add even more validation of width-data (PR 18017 follow-up)
I missed this case in PR 18017, sorry about that.
2024-05-02 11:24:15 +02:00
calixteman
33732ff2cb
Merge pull request #18035 from calixteman/rm_max_group_size
Remove the limit used to decided if a group canvas must be upscaled or not
2024-05-01 20:14:28 +02:00
Jonas Jenwald
2b69fb76ac [api-minor] Improve the FileSpec implementation
- Check that the `filename` is actually a string, before parsing it further.
 - Use proper "shadowing" in the `filename` getter.
 - Add a bit more validation of the data in `pickPlatformItem`.
 - Last, but not least, return both the original `filename` and the (path stripped) variant needed in the display-layer and viewer.
2024-05-01 18:02:05 +02:00
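The first and last bullets of commit 2b69fb76ac above (validate that the value is a string, then return both the original and the path-stripped filename) might be sketched like this. The function name `describeFilename` and the returned property names are assumptions, not the actual `FileSpec` API.

```javascript
// Hypothetical helper; property names are assumptions for this sketch.
function describeFilename(value) {
  if (typeof value !== "string") {
    // Not a string, so there is nothing sensible to parse.
    return { rawFilename: "", filename: "" };
  }
  // Keep only the part after the last forward or backward slash.
  const filename = value.split(/[\\/]/).pop();
  return { rawFilename: value, filename };
}

const described = describeFilename("docs\\reports/a.pdf");
```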
Calixte Denizet
5c771628de Remove the limit used to decided if a group canvas must be upscaled or not
It fixes issues #14982 and #14724.
The main problem with upscaling a canvas is that it can induce pixelation
(see issue #14724), so this patch simply removes the limit; as a side
effect, that also fixes issue #14982.
As far as I can tell from comparing several profiles (memory profiles in
particular) in the Firefox profiler, there is no noticeable difference in
memory use.
2024-05-01 18:01:54 +02:00