- Over time the number and size of these factories have increased, especially the `DOMFilterFactory` class, and this split should thus aid readability/maintainability of the code.
- By introducing a couple of new import maps we can avoid bundling the `DOMCMapReaderFactory`/`DOMStandardFontDataFactory` classes in the Firefox PDF Viewer, since they are dead code there given that worker-thread fetching is always being used.
- This patch has been successfully tested, by running `$ ./mach test toolkit/components/pdfjs/`, in a local Firefox artifact-build.
*Note:* This patch reduces the size of the `gulp mozcentral` output by `1.3` kilo-bytes, which isn't a lot but still cannot hurt.
This patch updates the minimum supported browsers as follows:
- Google Chrome 103[1], which was released on 2022-06-21; see https://chromereleases.googleblog.com/2022/06/stable-channel-update-for-desktop_21.html
Note that nowadays we usually try, where feasible and possible, to support browsers that are about two years old. By limiting support to only "recent" browsers we reduce the risk of holding back improvements of the *built-in* Firefox PDF Viewer, and also (significantly) reduce the maintenance/support burden for the PDF.js contributors.
*Please note:* As always, the minimum supported browser version assumes that a `legacy`-build of the PDF.js library is being used; see https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#faq-support
---
[1] This is consistent with the minimum supported version in the recently updated Google Chrome addon.
For the Firefox pdf viewer, we want to use AI to guess an alt-text when adding an image to a pdf.
For now the telemtry stuff is not implemented and will come soon.
In order to test it locally:
- set enableAltText, enableFakeMLManager and enableUpdatedAddImage to true.
or in Firefox:
- set browser.ml.enable, pdfjs.enableAltText and pdfjs.enableUpdatedAddImage to true.
The `streamqueue` dependency is only used for the test targets in the
Gulpfile to make sure that the test types are run in series. This is
done by modelling the test processes as readable streams and then having
`streamqueue` combine them into a single readable stream for Gulp that
processes the inner readable streams in series (in contrast to the
`ordered-read-streams` dependency which is very similar but processes
the inner streams in parallel).
However, modelling the test processes as readable streams is a bit odd
because we're not actually streaming any data as one might expect.
Instead, we only use them to signal test process completion/abortion.
Fortunately nowadays, with modern Gulp versions, we don't need readable
streams and `streamqueue` anymore because we can achieve the same result
with simple asynchronous functions that can be passed to e.g.
`gulp.series()` calls. Note that we already do this in various places,
and overall it should be a better fit for test process invocations.
The Git logic for pushing to the `pdfjs-dist` repository was already
removed in PR #18350, but it was forgotten to also remove the logic to
create a Git repository in the first place. This commit fixes the open
action for that from
https://github.com/mozilla/pdf.js/pull/18358#discussion_r1661367441.
Moreover, we implement the suggestion from
https://github.com/mozilla/pdf.js/pull/18358#discussion_r1661364026
about moving the Git URL to a separate variable for consistency and we
update the homepage URL to the HTTPS scheme to avoid an HTTP -> HTTPS
redirect and enforce TLS-encrypted connections.
In the MOZCENTRAL build the `BasePreferences` class is almost unused, since it's only being used to initialize and update the `AppOptions` which means that theoretically we could move the relevant code there.
However for the other build targets (e.g. GENERIC and CHROME) we still need to keep the `BasePreferences` class, although we can re-factor things to move the necessary validation inside of `AppOptions` and thus simplify the code and reduce duplication.
The patch also moves the event dispatching, for changed preference values, into `AppOptions` instead.
For provenance, enabled in PR #18352, to work the repository URL in
`package.json` is required to match the repository URL of the GitHub
Actions invocation. This should fix the following error we encountered
publishing a new release today:
```
npm error 422 Unprocessable Entity - PUT https://registry.npmjs.org/pdfjs-dist - Error verifying sigstore provenance bundle: Failed to validate repository information: package.json: "repository.url" is "git+https://github.com/mozilla/pdfjs-dist.git", expected to match "https://github.com/mozilla/pdf.js" from provenance
```
This commit removes the following warnings from the `npm publish` output:
```
npm warn publish npm auto-corrected some errors in your package.json when publishing. Please run "npm pkg fix" to address these errors.
npm warn publish errors corrected:
npm warn publish Removed invalid "scripts"
npm warn publish "repository.url" was normalized to "git+https://github.com/mozilla/pdfjs-dist.git"
```
For the "scripts" section it turns out that if the package doesn't have
any scripts it's expected to explicitly set it to an empty object; refer
to https://github.com/npm/cli/issues/6918 and
https://github.com/denoland/dnt/pull/414.
The release builds are currently not reproducible because ZIP files
record the modification date of files generated during the build
process, meaning that two builds from identical source code, made
at different times, result in different output.
This is undesirable because it makes detecting differences in the output
harder, for instance recently during the Gulp 5 efforts, because the
modification date differences are irrelevant and could obscure actually
important differences in the output during e.g. code changes. Moreover,
reprodicibility of build artifacts has become increasingly important;
please refer to the Reproducible Builds initiative at
https://reproducible-builds.org (note the "Why does it matter?" section
specifically) and https://reproducible-builds.org/docs/timestamps which
further explains the problem of timestamps in build artifacts.
This commit fixes the issue by configuring the ZIP file creation to use
the (fixed) date of the last Git commit for which the release is being
made. With this the build is fully reproducible so that identical source
code builds result in bit-by-bit identical output artifacts.
To improve readability we convert the compression method to take a
parameter object and use template strings where useful.
The JSDoc builds are currently not reproducible because a timestamp is
included in the output, meaning that two builds from identical source
code, made at different times, result in different output.
This is undesirable because it makes diffing the output difficult, for
instance recently during the Gulp 5 efforts, because the timestamp
differences are irrelevant and could obscure actually important
differences in the output during e.g. code changes. Moreover,
reprodicibility of build artifacts has become increasingly important;
please refer to the Reproducible Builds initiative at
https://reproducible-builds.org (note the "Why does it matter?" section
specifically) and https://reproducible-builds.org/docs/timestamps which
further explains the problem of timestamps in build artifacts.
This commit fixes the issue by configuring JSDoc to not include the
timestamps in the output. It's not relevant for end users and without it
the build is fully reproducible so that identical source code builds
result in bit-by-bit identical output artifacts.
Note that this option sadly can only be set via a configuration file,
and not via the command line parameters like we used to have, so for
consistency we also move the other options into the configuration file
so they are all in one place and the Gulpfile becomes a bit simpler.
Wintersmith is no longer maintained given that the most recent version
is from six years ago, and all vulnerabilities that NPM reports
originate from Wintersmith's dependencies. Metalsmith, and its plugins,
on the other hand have recently had releases and don't have known
vulnerabilities. In fact, the number of reported vulnerabilities by NPM
even goes down to zero with this patch applied.
This commit therefore replaces Wintersmith with Metalsmith by providing
a transparent drop-in replacement, in a way that requires the least
amount of changes to the code and the generated output.
Note that this patch does update our versions of jQuery, Bootstrap and
the Highlight.js theme because the previous versions were very outdated
and didn't work correctly with Metalsmith. Moreover, those old versions
contained vulnerabilities that are hereby fixed.
Fixes#18198.
This is a major version bump, and the changelog at
https://github.com/gulpjs/gulp/releases/tag/v5.0.0 indicates one
breaking change that impacts us, namely that streams are now by default
interpreted/transformed to UTF-8 encoding. This breaks `gulp.src` calls
that work on binary files such as images or CMaps, but is fortunately
easy to fix for us by disabling re-encoding for all `gulp.src` calls
(see https://github.com/gulpjs/gulp/issues/2764#issuecomment-2063415792
for more information). This restores the previous behavior of copying
the files as-is without Gulp performing any transformations to it, which
is what we want because Gulp is only used for bundling and we make sure
that the source files have the right encoding.
The `merge-stream` dependency is no longer maintained and doesn't work
in combination with Gulp 5 anymore (for more information refer to
https://github.com/gulpjs/gulp/issues/2802#issuecomment-2094130656).
Fortunately the Gulp team maintains a drop-in replacement dependency
called `ordered-read-streams` with the same API as `merge-stream`.
Indeed, running all affected Gulp targets and comparing build artifacts
with `diff -r <old> <new>` confirms that no unexpected changes are made.
Fixes a part of #17922.
The `through2` dependency got introduced over four years ago in #11325 to
replace the unmaintained `gulp-transform` dependency. However, sadly the
same holds for `through2` since the last release was also four years ago.
Fortunately the `through2` dependency can trivially be replaced with the
built-in Node.js `stream.Transform` API nowadays. In fact, the `through2`
dependency mentions themselves in their README already that they are "a
tiny wrapper around Node.js streams.Transform". The `stream.Transform`
API is available in all Node.js versions we support, and in Node.js 6
already the simplified constructor approach for `stream.Transform` got
introduced to simplify creating custom stream transformers; see
https://nodejs.org/docs/latest-v6.x/api/stream.html#stream_new_stream_transform_options.
This commit therefore replaces `through2` by switching to the
`stream.Transform` API directly so we don't need any wrappers anymore.
Note that for our case the only change we have to make is to enable
object mode, see https://nodejs.org/api/stream.html#object-mode, because
we pass in `VinylFile` objects instead of e.g. regular `Buffer` objects.
I have confirmed in two ways that this is indeed a drop-in replacement:
- Running the Gulp targets that call the `transform` function and
diffing the resulting `build` folder before/after this patch, with
`diff -r build-old/ build-new/`, to ensure that there are no
unexpected changes in the output.
- Changing the Gulpfile to, instead of UTF-8, transform the files to
ASCII, and diffing the resulting `build` folder to confirm that the
transformation logic works and produces different results, such as:
```
diff build/lib/core/standard_fonts.js build-ascii/lib/core/standard_fonts.js
284c284
< t["Trinité"] = true;
---
> t["Trinit�"] = true;
```
In Node.js 14.14.0 the `fs.rmSync` function was added that removes files
and directories. The `recursive` option is used to remove directories
and their contents, making it a drop-in replacement for the `rimraf`
dependency we use.
Given that PDF.js now requires Node.js 18+ we can be sure that this
option is available, so we can safely remove `rimraf` and reduce the
number of project dependencies.
Co-authored-by: Wojciech Maj <kontakt@wojtekmaj.pl>
The Puppeteer update should in particular be helpful for us because it
contains improved WebDriver BiDi compatibility, a newer Chrome version
(both might help for #17962) and an official deprecation of CDP for
Firefox. Note that the latter doesn't require changes on our end because
we already use WebDriver BiDi unconditionally for Firefox since commit
4db0174. The full release notes can be found at
https://github.com/puppeteer/puppeteer/releases/tag/puppeteer-core-v22.8.0.
This global was only introduced to work-around problems caused by the GENERIC PDF.js build using top level await. Since that was removed in the previous commit, this global is now dead code.
This patch updates the minimum supported browsers as follows:
- Safari 16.4, which was released on 2023-03-27; see https://developer.apple.com/documentation/safari-release-notes/safari-16_4-release-notes
Nowadays we usually we try, where feasible and possible, to support browsers/environments that are about two years old. The reasons for limiting support to a slightly more recent Safari version include:
- Safari has always been slower, compared to other browsers, at implementing e.g. new JavaScript features.
- Trying to provide support for Safari is often difficult, and over the years we have seen *a lot* of bugs that are specific to Safari.
- Safari is, and has been for many years, only listed as "mostly" supported in the FAQ.
- This allows us to remove feature-testing, only relevant to Safari, from the main code-base.
By limiting support to only "recent" browsers we reduce the risk of holding back improvements of the built-in Firefox PDF Viewer, and also (significantly) reduce the maintenance/support burden for the PDF.js core contributors.
*Please note:* As always, the minimum supported browser version assumes that a `legacy`-build of the PDF.js library is being used; see https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#faq-support
In Node.js 10.12.0 the `recursive` option was added to `fs.mkdirSync`.
This option allows us to create a directory and all its parent
directories if they do not exist, making it a drop-in replacement for
the `mkdirp` dependency we use.
Given that PDF.js now requires Node.js 18+ we can be sure that this
option is available, so we can safely remove `mkdirp` and reduce the
number of project dependencies.
Co-authored-by: Wojciech Maj <kontakt@wojtekmaj.pl>
This replaces our custom `PromiseCapability`-class with the new native `Promise.withResolvers()` functionality, which does *almost* the same thing[1]; please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/withResolvers
The only difference is that `PromiseCapability` also had a `settled`-getter, which was however not widely used and the call-sites can either be removed or re-factored to avoid it. In particular:
- In `src/display/api.js` we can tweak the `PDFObjects`-class to use a "special" initial data-value and just compare against that, in order to replace the `settled`-state.
- In `web/app.js` we change the only case to manually track the `settled`-state, which should hopefully be OK given how this is being used.
- In `web/pdf_outline_viewer.js` we can remove the `settled`-checks, since the code should work just fine without it. The only thing that could potentially happen is that we try to `resolve` a Promise multiple times, which is however *not* a problem since the value of a Promise cannot be changed once fulfilled or rejected.
- In `web/pdf_viewer.js` we can remove the `settled`-checks, since the code should work fine without them:
- For the `_onePageRenderedCapability` case the `settled`-check is used in a `EventBus`-listener which is *removed* on its first (valid) invocation.
- For the `_pagesCapability` case the `settled`-check is used in a print-related helper that works just fine with "only" the other checks.
- In `test/unit/api_spec.js` we can change the few relevant cases to manually track the `settled`-state, since this is both simple and *test-only* code.
---
[1] In browsers/environments that lack native support, note [the compatibility data](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise/withResolvers#browser_compatibility), it'll be polyfilled via the `core-js` library (but only in `legacy` builds).
This patch updates the minimum supported browsers as follows:
- Google Chrome 98, which was released on 2022-02-01; see https://chromereleases.googleblog.com/2022/02/stable-channel-update-for-desktop.html
The primary reason for this version bump is `structuredClone` support, please see https://developer.mozilla.org/en-US/docs/Web/API/structuredClone#browser_compatibility
At this point in time we're using `structuredClone` in different parts of the code-base, and it's unfortunately functionality that's difficult to polyfill *completely* (affects the `transfer` option specifically).
Note that nowadays we usually try, where feasible and possible, to support browsers that are about two years old. By limiting support to only "recent" browsers we reduce the risk of holding back improvements of the *built-in* Firefox PDF Viewer, and also (significantly) reduce the maintenance/support burden for the PDF.js contributors.
*Please note:* As always, the minimum supported browser version assumes that a `legacy`-build of the PDF.js library is being used; see https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#faq-support
Given that the "PREFERENCE" kind is used e.g. to generate the preference-list for the Firefox PDF Viewer, those options need to be carefully validated.
With this patch we'll now check this unconditionally in development mode, during testing, and when creating the preferences in the gulpfile.
Over time, as we've started relying more and more on import maps, the number of aliases have increased a lot. This is now affecting the size and readability of `createWebpackConfig`, which was already fairly large and complex, hence moving the aliases to their own function should help improve things a little bit.
As part of the changes in PR 17686 we "accidentally" enabled source-maps for the *minified* builds, which seems unnecessary since those have never been included in the `pdfjs-dist` output.
Locally this patch reduces the run-time of `gulp minified` by ~15 percent.
Rather than first building the library and then use Terser "manually" to minify the files, we can utilize a Webpack plugin to combine these steps which helps to simplify the gulpfile.