pdf.js

mirror of https://github.com/mozilla/pdf.js.git synced 2025-04-19 22:58:07 +02:00

Author	SHA1	Message	Date
Nicolò Ribaudo	229ad1bb2c	Use the URL global instead of the deprecated url.parse The Node.js url.parse API (https://nodejs.org/api/url.html#urlparseurlstring-parsequerystring-slashesdenotehost) is deprecated because it's prone to security issues (to the point that Node.js doesn't even publish CVEs for it anymore). The official recommendation is to instead use the global URL constructor, available both in Node.js and in browsers. Node.js filesystem APIs accept URL objects as parameters, so this also avoids a few URL->filepath conversions.	2024-08-27 18:19:25 +02:00
Jonas Jenwald	8728f7f134	Support an odd number of digits in hexadecimal strings (issue 18645) See https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf#G6.1840792	2024-08-23 16:31:43 +02:00
Nicolò Ribaudo	f051597e23	Allow specifying custom match logic in PDFFindController This patch allows embedders of PDF.js to provide custom match logic for searching in PDFs. This is done by subclassing the PDFFindController class and overriding the `match` method. `match` is called once per PDF page, receives as parameters the search query, the page contents, and the page index, and returns an array of { index, length } objects representing the search results.	2024-08-13 10:45:57 +02:00
Jonas Jenwald	c4fdb28573	Remove `PDFWorkerUtil` and move its contents into `PDFWorker` instead This is possible thanks to features, i.e. private fields and in particular static initialization blocks, that didn't exist back when we started using classes in the code-base.	2024-07-29 11:22:43 +02:00
Jonas Jenwald	c4cd405a8f	Ignore non-dictionary nodes when parsing StructTree data (issue 18503)	2024-07-28 12:08:44 +02:00
Jonas Jenwald	d116ab1a22	Shorten the errors mentioning API parameters in `BaseCMapReaderFactory` and `BaseStandardFontDataFactory` The current error-messages also mention internal parameters, which an end-user obviously doesn't have to care about. So, let's try to avoid confusion here by only including the API parameters.	2024-07-27 16:54:54 +02:00
Calixte Denizet	c3065629ca	[Editor] Correctly save a non-ascii alt text	2024-07-24 19:13:45 +02:00
Jonas Jenwald	d24a61c648	Allow /XYZ destinations without zoom parameter (issue 18408) According to the PDF specification these destinations should have a zoom parameter, which may however be `null`, but it shouldn't be omitted; please see https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf#G11.2095870 Hence we try to work-around bad PDF generators by making the zoom parameter optional when validating explicit destinations in both the worker and the viewer.	2024-07-18 13:29:32 +02:00
Jonas Jenwald	8a979c2d0e	[api-minor] Remove `Outliner` from the official API As far as I can tell `Outliner` is only exposed in the API because we need to access it when running some of the reference-tests, but is otherwise not used. Hence this seems like something that should be kept internal and thus only exposed in TESTING-builds.	2024-07-16 13:08:26 +02:00
calixteman	9b1b5ff7e7	Merge pull request #18419 from calixteman/reuse_old_dict_when_updating [Editor] Update the freetext annotation dictionary instead of creating a new one when updating an existing freetext	2024-07-11 11:24:15 +02:00
Calixte Denizet	6711123f68	[Editor] Update the freetext annotation dictionary instead of creating a new one when updating an existing freetext	2024-07-11 10:44:21 +02:00
Jonas Jenwald	403d023617	Allow e.g. /FitH destinations without additional parameter (bug 1907000) According to the PDF specification these destinations should have a coordinate parameter, which may however be `null`, but it shouldn't be omitted; please see https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf#G11.2095870 Hence we try to work-around bad PDF generators by making the coordinate parameter optional when validating explicit destinations in both the worker and the viewer.	2024-07-11 10:36:44 +02:00
Jonas Jenwald	5ee61690f3	Merge pull request #18390 from alexcat3/fix-issue-18099 Handle toUnicode cMaps that omit leading zeros in hex encoded UTF-16 (issue 18099)	2024-07-06 18:57:07 +02:00
alexcat3	1c364422a6	Handle toUnicode cmaps that omit leading zeros in hex encoded UTF-16 (issue 18099) Add unit test to check compatibility with such cmaps. In the PDF in issue 18099, the toUnicode cmap had a line to map the glyph char codes from 00 to 7F to the corresponding code points. The syntax to map a range of char codes to a range of Unicode code points is <start_char_code> <end_char_code> <start_unicode_codepoint>. As the Unicode code points are supposed to be given in UTF-16 BE, the PDF's line should probably have read <00> <7F> <0000>. Instead it omitted two leading zeros from the UTF-16, like this: <00> <7F> <00>. This confused PDF.js into mapping these character codes to the UTF-16 characters with the corresponding HIGH bytes (01 became \u0100, 02 became \u0200, et cetera), which ended up turning Latin text in the PDF into Chinese when it was copied. I'm not sure if the PDF spec actually allows PDFs to do this, but since there's at least one PDF in the wild that does and other PDF readers read it correctly, PDF.js should probably support this.	2024-07-06 11:29:21 -04:00
Tim van der Meij	2a44203d96	Fix the "caches image resources at the document/page level as expected (issue 11878)" unit test This unit test fails occasionally (albeit much less than before thanks to PR #17663), so we change the parsing time check's divisor to prevent it from happening again. If the last page's rendering time is less than or equal to 50% of the first page's rendering time that should be enough proof that no worker thread re-parsing occurred while also providing a wide enough range to avoid intermittents. Note that the assertion is now equal to the one we already have in the "caches image resources at the document/page level, with main-thread copying of complex images (issue 11518)" unit test which seems to work reliably so far.	2024-07-06 16:30:07 +02:00
Jonas Jenwald	38528d1116	Remove the `renderForms` parameter from the Annotation `getOperatorList` methods The `renderForms` parameter pre-dates the introduction of the general `intent` parameter, which means that we're now effectively passing the same state twice to these `getOperatorList` methods.	2024-07-05 12:25:18 +02:00
Jonas Jenwald	f3d177e3e4	[api-minor] Remove the deprecated `renderTextLayer` and `updateTextLayer` functions (PR 18104 follow-up)	2024-06-30 15:16:00 +02:00
Tim van der Meij	2f3bf6f07e	Don't ignore errors in the Jasmine suite start/end stages Currently errors in `afterAll` are logged, but don't fail the tests. This could cause new errors during test teardown to go unnoticed. Moreover, the integration tests use a different reporting mechanism which also handles errors differently (this is an extra reason to do #12730). This patch fixes the issues by consistently handling errors in `suiteStarted` and `suiteDone` in both reporting mechanisms. Fixes #18319.	2024-06-23 20:59:48 +02:00
bootleq	890c567eca	Expose entireWord in updateFindControlState Allow apps with supportsIntegratedFind to better monitor the find state. A recognized use case is the Firefox findbar, its "not found" sound must consider `entireWord` and only make noise when it is off. See related implementation in https://hg.mozilla.org/mozilla-central/rev/16b902cbcf26 This change can help if we have to move the implementation from cpp to jsm.	2024-06-21 13:12:59 +08:00
Calixte Denizet	c14c3cfc9f	Improve date parsing in the js sandbox If for example dd:mm is failing we just try with d:m which is equivalent to the regex /d{1,2}:m{1,2}/. This way it allows the user to forget the 0 for the first days/months.	2024-06-14 17:21:50 +02:00
Calixte Denizet	6fa98ac99f	[api-minor] Simplify how the list of points are structured Instead of sending to the main thread an array of Objects for a list of points (or quadpoints), we'll send just a basic float buffer. It should slightly improve performance (especially when cloning the data) and use slightly less memory.	2024-05-30 15:36:15 +02:00
Jonas Jenwald	06334c97ef	Improve the `loadingParams` functionality in the API - Move the definition of the `loadingParams` Object, to simplify the code. - Add a unit-test, since none existed and the viewer depends on this functionality.	2024-05-24 09:26:40 +02:00
Jonas Jenwald	15b5808eee	[api-minor] Re-factor the basic textLayer-functionality This is very old code, and predates e.g. the introduction of JavaScript classes, which creates unnecessarily unwieldy code in the viewer. By introducing a new `TextLayer` class in the API, similar to how e.g. the `AnnotationLayer` looks, we're able to keep most parameters on the class-instance itself. This removes the need to manually track them in the viewer, and simplifies the call-sites. This also removes the `numTextDivs` parameter from the "textlayerrendered" event, since that's only added to support default-viewer functionality that no longer exists. Finally we try, as far as possible, to polyfill the old `renderTextLayer` and `updateTextLayer` functions since they are exposed in the library API. For simple invocations of `renderTextLayer` the behaviour should thus be the same, with only a warning printed in the console.	2024-05-17 14:20:20 +02:00
Tim van der Meij	4db843617f	Merge pull request #18047 from Snuffleupagus/issue-18042 Avoid re-parsing global images that failed decoding (issue 18042, PR 17428 follow-up)	2024-05-15 15:40:18 +02:00
Tim van der Meij	6b237e3358	Implement a unit test for the `BaseException` class The issue from #18003 hasn't been shown to be caused by PDF.js, but it did surface that we don't have (direct) unit test coverage for the `BaseException` class. This made it more difficult to prove that the `stack` property was already available on exception instances, but more importantly it caused the CI to be green even though the suggested change would have caused the `stack` property to disappear. To avoid future regressions, for e.g. similar changes or a rewrite from a closure to a proper class, this commit introduces a dedicated unit test for `BaseException` that asserts that our exception instances indeed expose all expected properties.	2024-05-14 20:21:42 +02:00
Jonas Jenwald	c5f92437f7	Avoid re-parsing global images that failed decoding (issue 18042, PR 17428 follow-up) For images that failed to decode once we want to avoid a pointless round-trip to the main-thread, which could otherwise happen for globally cached images.	2024-05-14 13:58:36 +02:00
Jonas Jenwald	6d523c316c	[api-minor] Include the document /Lang attribute in the textContent-data - These changes will allow a simpler way of implementing PR 17770. - The /Lang attribute is fetched lazily, with the first `getTextContent` invocation. Given the existing worker-thread caching, this will thus only need to be done once per PDF document (and most PDFs don't include this data). - This makes the /Lang attribute directly available in the `textLayer`, which has the following advantages: - We don't need to block, and thus delay, overall viewer initialization on fetching it (nor pass it around throughout the viewer). - Third-party users of the `textLayer` will automatically benefit from this, once we start actually using the /Lang attribute in PR 17770. Please note: This also, importantly, means that the `text` reference-tests will then cover this code (which wouldn't otherwise have been the case).	2024-05-14 12:44:41 +02:00
Jonas Jenwald	c0b5d93ef4	Merge pull request #18052 from Snuffleupagus/textLayer-only-ReadableStream Restore broken functionality and simplify the implementation in `src/display/text_layer.js`	2024-05-14 12:30:27 +02:00
Jonas Jenwald	049848ba00	Unify the `ReadableStream` and `TextContent` code-paths in `src/display/text_layer.js` The only reason that this code still accepts `TextContent` is for backward-compatibility purposes, so we can simplify the implementation by always using a `ReadableStream` internally.	2024-05-07 13:03:57 +02:00
Jonas Jenwald	2643570364	[api-minor] Re-factor how Node.js packages/polyfills are loaded (issue 17245) Please note: This removes top level await from the GENERIC builds of the PDF.js library. Despite top level await being supported in all modern browsers/environments, note [the MDN compatibility data](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/await#browser_compatibility), it seems that many frameworks and build-tools unfortunately have trouble with it. Hence, in order to reduce the influx of support requests regarding top level await it thus seems that we'll have to try and fix this. Given that top level await is only needed for Node.js environments, to load packages/polyfills, we re-factor things to limit the asynchronicity to that environment. The "best" solution, with the least likelihood of causing future problems, would probably be to await the load of Node.js packages/polyfills e.g. at the top of the `getDocument`-function. Unfortunately that doesn't work though, since that's a synchronous function that we cannot change without breaking "the world". Hence we instead await the load of Node.js packages/polyfills together with the `PDFWorker` initialization, since that's the first point of asynchronicity during initialization/loading of a PDF document. The reason that this works is that the Node.js packages/polyfills are only needed during fetching of the PDF document respectively during rendering, neither of which can happen until the worker has been initialized. Hopefully this won't cause any future problems, since looking at the history of the PDF.js project I don't believe that we've (thus far) ever needed a Node.js dependency at an earlier point. This new pattern for accessing Node.js packages/polyfills will also require some care during development and importantly reviewing, to ensure that no new top level await is added in the main code-base.	2024-05-06 23:20:03 +02:00
Jonas Jenwald	a790f2df5d	[api-minor] Remove the unused `onlyStripPath` option from the `getFilenameFromUrl` helper function	2024-05-03 08:29:41 +02:00
Jonas Jenwald	2b69fb76ac	[api-minor] Improve the `FileSpec` implementation - Check that the `filename` is actually a string, before parsing it further. - Use proper "shadowing" in the `filename` getter. - Add a bit more validation of the data in `pickPlatformItem`. - Last, but not least, return both the original `filename` and the (path stripped) variant needed in the display-layer and viewer.	2024-05-01 18:02:05 +02:00
Jonas Jenwald	bf4e36d1b5	[api-minor] Expose the /Desc-attribute of file attachments in the viewer (issue 18030) In the viewer this will be displayed in the `title` of the hyperlink, which is probably the best we can do here given how the viewer is implemented.	2024-05-01 09:02:11 +02:00
Calixte Denizet	45fa867577	Allow to insert several annotations under the same parent in the structure tree While testing stamp insertion with the added pdf, I noticed that the tags using a MCID weren't considered when trying to attach an annotation to it.	2024-04-24 16:23:05 +02:00
Tim van der Meij	bda98b91cb	Merge pull request #17967 from Snuffleupagus/eventBus-signal Add `signal`-support in the `EventBus`, and utilize it in the viewer (PR 17964 follow-up)	2024-04-23 15:55:59 +02:00
Jonas Jenwald	9e80c6d228	Merge pull request #17978 from Snuffleupagus/pr-17428-followup Extend the globally cached image main-thread copying to "complex" images as well (PR 17428 follow-up)	2024-04-22 16:46:23 +02:00
Tim van der Meij	335d8394cd	Merge pull request #17979 from Snuffleupagus/image-errors-shorter-msg [api-minor] Remove the image-related error message prefixes	2024-04-22 15:35:10 +02:00
Jonas Jenwald	912b57b95d	[api-minor] Remove the image-related error message prefixes Other custom errors, based on `BaseException`, do not use such a format.	2024-04-20 12:51:45 +02:00
Jonas Jenwald	702ee7b1e1	Add `signal`-support in the `EventBus`, and utilize it in the viewer (PR 17964 follow-up) This mimics the `signal` option that's available for `addEventListener`, see [MDN](https://developer.mozilla.org/en-US/docs/Web/API/EventTarget/addEventListener#signal).	2024-04-20 12:00:58 +02:00
Jonas Jenwald	91898e5923	Extend the globally cached image main-thread copying to "complex" images as well (PR 17428 follow-up) In PR 17428 this functionality was limited to "larger" images, to not affect performance negatively. However it turns out that it's also beneficial to consider more "complex" images, regardless of their size, that contain /SMask or /Mask data; see issue 11518.	2024-04-20 11:10:09 +02:00
Calixte Denizet	901d995a7e	Correctly update the xref table when an annotation is deleted	2024-04-18 21:27:39 +02:00
Calixte Denizet	52ea2333b3	Remove the tag for missing font subset when trying to find a substitution Fixes #17929.	2024-04-11 20:34:28 +02:00
Tim van der Meij	d01a0bd0c8	Fix annotation border style parsing by handling empty dash arrays The PDF specification states that empty dash arrays, i.e. arrays with zero elements, are in fact valid. In that case the dash array simply corresponds to a solid, unbroken line. However, this case was erroneously being flagged as invalid and therefore the annotation was not drawn because its width was set to zero. This commit fixes the issue by allowing dash arrays to have a length of zero.	2024-04-08 16:34:27 +02:00
Tim van der Meij	2e5282928f	Merge pull request #17854 from Snuffleupagus/rm-PromiseCapability [api-minor] Replace the `PromiseCapability` with `Promise.withResolvers()`	2024-04-02 15:21:43 +02:00
Jonas Jenwald	e4d0e84802	[api-minor] Replace the `PromiseCapability` with `Promise.withResolvers()` This replaces our custom `PromiseCapability`-class with the new native `Promise.withResolvers()` functionality, which does almost the same thing[1]; please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/withResolvers The only difference is that `PromiseCapability` also had a `settled`-getter, which was however not widely used and the call-sites can either be removed or re-factored to avoid it. In particular: - In `src/display/api.js` we can tweak the `PDFObjects`-class to use a "special" initial data-value and just compare against that, in order to replace the `settled`-state. - In `web/app.js` we change the only case to manually track the `settled`-state, which should hopefully be OK given how this is being used. - In `web/pdf_outline_viewer.js` we can remove the `settled`-checks, since the code should work just fine without it. The only thing that could potentially happen is that we try to `resolve` a Promise multiple times, which is however not a problem since the value of a Promise cannot be changed once fulfilled or rejected. - In `web/pdf_viewer.js` we can remove the `settled`-checks, since the code should work fine without them: - For the `_onePageRenderedCapability` case the `settled`-check is used in a `EventBus`-listener which is removed on its first (valid) invocation. - For the `_pagesCapability` case the `settled`-check is used in a print-related helper that works just fine with "only" the other checks. - In `test/unit/api_spec.js` we can change the few relevant cases to manually track the `settled`-state, since this is both simple and test-only code. 
--- [1] In browsers/environments that lack native support, note [the compatibility data](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/withResolvers#browser_compatibility), it'll be polyfilled via the `core-js` library (but only in `legacy` builds).	2024-04-01 11:42:37 +02:00
Calixte Denizet	136c1faa7f	Display outlines even if one has no title Fixes #17856.	2024-03-29 21:30:24 +01:00
Jonas Jenwald	0d039937f9	Add better support for /Launch actions with /FileSpec dictionaries (issue 17846)	2024-03-26 20:15:48 +01:00
Jonas Jenwald	0022310b9c	Merge pull request #17706 from Snuffleupagus/Node-Fetch-API [api-minor] Use the Fetch API, when supported, to load PDF documents in Node.js environments	2024-03-19 11:04:28 +01:00
Jonas Jenwald	eded037d06	[api-minor] Use the Fetch API, when supported, to load PDF documents in Node.js environments Given that modern Node.js versions now implement support for a fair number of "browser" APIs, we can utilize the standard Fetch API to load PDF documents that are specified via http/https URLs. Please find compatibility information at: - https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API#browser_compatibility - https://nodejs.org/dist/latest-v18.x/docs/api/globals.html#fetch - https://developer.mozilla.org/en-US/docs/Web/API/Response#browser_compatibility - https://nodejs.org/dist/latest-v18.x/docs/api/globals.html#response	2024-02-21 22:38:42 +01:00
Jonas Jenwald	90b2664622	Add better validation for the "PREFERENCE" kind `AppOptions` Given that the "PREFERENCE" kind is used e.g. to generate the preference-list for the Firefox PDF Viewer, those options need to be carefully validated. With this patch we'll now check this unconditionally in development mode, during testing, and when creating the preferences in the gulpfile.	2024-02-20 18:38:15 +01:00
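
Commit 8728f7f134 in the log above concerns hexadecimal strings with an odd number of digits. The PDF specification it links to says that a missing final digit is assumed to be 0. The sketch below illustrates only that rule. `decodeHexString` is a hypothetical helper and not the PDF.js parser, and JavaScript's `\s` is used as an approximation of PDF white-space.

```javascript
// Hypothetical helper for illustration; this is not the PDF.js implementation.
// `body` is the text between the "<" and ">" delimiters of a hex string.
function decodeHexString(body) {
  // White-space inside a hexadecimal string is ignored
  // (approximated here with JavaScript's \s class).
  let digits = body.replace(/\s+/g, "");
  if (!/^[0-9A-Fa-f]*$/.test(digits)) {
    throw new Error("Invalid character in hexadecimal string");
  }
  // An odd number of digits means the final digit is missing: assume 0.
  if (digits.length % 2 === 1) {
    digits += "0";
  }
  const bytes = new Uint8Array(digits.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(digits.slice(2 * i, 2 * i + 2), 16);
  }
  return bytes;
}

// Usage: <901FA> decodes to the three bytes 0x90, 0x1F, 0xA0.
const decoded = decodeHexString("901FA");
```

Padding the digit string before decoding keeps the byte loop free of special cases. A streaming parser would more likely handle the dangling digit when it reaches the closing delimiter.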

1235 commits
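
Commit d01a0bd0c8 in the log above describes empty dash arrays as valid and equivalent to a solid, unbroken line. A minimal sketch of that validation rule follows. `classifyDashArray` and its return values are hypothetical names chosen for illustration, and treating an all-zero array as invalid is an assumption based on the PDF specification's requirement that dash lengths are nonnegative and not all zero.

```javascript
// Hypothetical helper for illustration; this is not the PDF.js implementation.
// Returns "solid", "dashed", or "invalid" for an annotation border dash array.
function classifyDashArray(dashArray) {
  if (!Array.isArray(dashArray)) {
    return "invalid";
  }
  // An empty dash array is valid and simply means a solid, unbroken line.
  if (dashArray.length === 0) {
    return "solid";
  }
  let allZero = true;
  for (const element of dashArray) {
    // Every element must be a nonnegative number (this also rejects NaN).
    if (typeof element !== "number" || !(element >= 0)) {
      return "invalid";
    }
    if (element > 0) {
      allZero = false;
    }
  }
  // Assumption: a non-empty array containing only zeros is not a usable pattern.
  return allZero ? "invalid" : "dashed";
}

// Usage: the empty array is accepted instead of being flagged as invalid.
const kind = classifyDashArray([]);
```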