In order to fix bug 1935076, we'll have to add a pure js fallback in case wasm is disabled
or simd isn't supported. Unfortunately, this fallback will take some space.
So, the main goal of this patch is to reduce the overall size (by ~93k).
As a side effect, it should make easier to use an other wasm file (which must export
_jp2_decode, _malloc and _free).
JSON imports are now supported by all tools used in PDF.js' build
process. The `chromecom.js` file is bundled by webpack and
import attributes are thus removed, so browser compatibility for this
new syntax is not relevant.
Flat config is the new config system used by ESLint 9.
To make the migration easier, they also added
flat config support to ESLint 8.
This commit migrates the various ESLint configs in the repository to use
the new system, **without** upgrading to ESLint 9 yet.
Note that the new version of `eslint-plugin-import` contains support
for ESLint 9.
Moreover, the new version of Babel removed the leading space for import
comments (see https://github.com/babel/babel/pull/16780), so we update
the corresponding test expectation to match this.
Fixes a part of #17928.
We introduced quite a few new strings recently, but during the last few
rounds of updates we didn't see new translations updates becoming
available. I only just now recalled the announcement that Mozilla is
moving from Mercurial to Git, and indeed the Mercurial page at
https://hg.mozilla.org/l10n-central hasn't been updated since July
anymore and that's were we used to pull our translations from.
This commit fixes the issue by changing the URLs to the Mozilla Git
repositories hosted on GitHub instead.
This commit updates the Babel plugin to:
- apply the same flattening logic that we already
have for blocks, to flatten blocks nested inside
class static blocks
- remove class static blocks when, after flattening
all the blocks they contain, they are empty.
Before this commit, the transform output was the
same as the input.
When an image has a non-zero SMaskInData it means that the image
has an alpha channel.
With JPX images, the colorspace isn't required (by spec) so when we
don't have it, the JPX decoder will handle the conversion in RGBA
format.
This only happens when it's safe to do so. The exceptions are:
- when the class extends another subclass: removing the constructor would remove the error about the missing super() call
- when there are default parameters, that could have side effects
- when there are destructured prameters, that could have side effects
Given that the Fetch API is supported since Node.js 18 we should be able to use it when downloading l10n files, which allows us to simplify the code and to make it fully `async`.
Note how we're using custom `__non_webpack_import__`-calls in the code-base, that we replace during the post-processing stage of the build, to be able to write `import`-calls that Webpack will leave alone during parsing.
This work-around is necessary since we let Babel discards all comments, given that we generally don't need/want them in the builds, hence why we cannot utilize `/* webpackIgnore: true */`-comments in the source-code.
After the changes in PR 17563 it thus seems to me that we should be able to just move this re-writing into the Babel plugin instead.
Looking at the *built* files you'll notice some lines containing nothing more than a semicolon. This is the result of (mostly top-level) `if`-statements, which include `PDFJSDev`-checks, that evalute to `false` during Babel parsing.
This has always annoyed me a bit, and looking at Babel plugin it seems that we can fix this simply by *removing* the relevant nodes.
This part of the (modern) preprocessor is now dead code, since we no longer use `require` statements anywhere in the main code-base.
Note that as part of the changes leading up to PDF.js version `4` we removed all[1] the remaining `require` statements, and we also have an ESLint rule to ensure that no new ones are accidentally added.
---
[1] With two small exceptions, in benchmarking-code and in the Webpack-example.
All of our static evaluation & dead-code elimination transforms need to
happen in post-order, transforming inner nodes first. This is so that
in complex nested cases all transforms see the simplified version of
their inner nodes.
For example:
async getNimbusExperimentData() {
if (!PDFJSDev.test("GECKOVIEW")) { return null; }
// other code
}
-> [evaluation of PDFJSDev.*]
async getNimbusExperimentData() {
if (!false) { return null; }
// other code
}
-> [!false -> true]
async getNimbusExperimentData() {
if (true) { return null; }
// other code
}
-> [if (true) -> replace with the if branch]
async getNimbusExperimentData() {
return null;
// other code
}
-> [early return -> remove dead code]
async getNimbusExperimentData() {
return null;
// other code
}
This was done correctly in all cases except for our `UnaryExpression`
transform, which was happening in pre-order.
This commit converts the pdfjsdev-loader transform into a Babel plugin,
to skip a AST->string->AST round-trip.
Before this commit, the webpack build process was:
1. Babel parses the code
2. Babel transforms the AST
3. Babel generates the code
4. Acorn parses the code
5. pdfjsdev-loader transforms the AST
6. @javascript-obfuscator/escodegen generates the code
7. Webpack parses the file
8. Webpack concatenates the files
After this commit, it is reduced to:
1. Babel parses the code
2. Babel transforms the AST
3. babel-plugin-pdfjs-preprocessor transforms the AST
4. Babel generates the code
5. Webpack parses the file
6. Webpack concatenates the files
This change improves the build time by ~25% (tested on MacBook Air M2):
- `gulp lib`: 3.4s to 2.6s
- `gulp dist`: 36s to 29s
- `gulp generic`: 5.5s to 4.0s
- `gulp mozcentral`: 4.7s to 3.2s
The new Babel plugin doesn't support the `saveComments` option of
pdfjsdev-loader, and it just always discards comments. Even though
pdfjsdev-loader supported multiple values for that option, it was
effectively ignored due to `acorn` dropping comments by default.
This is in preparation for the next commit, which will convert
preprocessor2.mjs to a Babel plugin. The purpose of this commit
is to help git track the rename regardless of the large amount
of changes.
- For the generic viewer we use @fluent/dom and @fluent/bundle
- For the builtin pdf viewer in Firefox, we set a localization url
and then we rely on document.l10n which is a DOMLocalization object.
Given the amount of work put into removing `require`-calls from the code-base, let's ensure that new ones aren't accidentally added in the future.
Note that we still have a couple of files where `require` is being used, in particular:
- The Node.js examples, however those will be updated to use `import` in PR 17081.
- The Webpack examples, and related support files, however I unfortunately don't know enough about Webpack to be able to update those. (Hopefully users of that code will help out here, once version `4` is released.)
- The `statcmp`-tool, since *some* of those `require`-calls cannot be converted to `import` without other code changes (and that file is only used during benchmarking).
Please find additional details at https://github.com/import-js/eslint-plugin-import/blob/main/docs/rules/no-commonjs.md
At this point in time all browsers, and also Node.js, support standard `import`/`export` statements and we can now finally consider outputting modern JavaScript modules in the builds.[1]
In order for this to work we can *only* use proper `import`/`export` statements throughout the main code-base, and (as expected) our Node.js support made this much more complicated since both the official builds and the GitHub Actions-based tests must keep working.[2]
One remaining issue is that the `pdf.scripting.js` file cannot be built as a JavaScript module, since doing so breaks PDF scripting.
Note that my initial goal was to try and split these changes into a couple of commits, however that unfortunately didn't really work since it turned out to be difficult for smaller patches to work correctly and pass (all) tests that way.[3]
This is a classic case of every change requiring a couple of other changes, with each of those changes requiring further changes in turn and the size/scope quickly increasing as a result.
One possible "issue" with these changes is that we'll now only output JavaScript modules in the builds, which could perhaps be a problem with older tools. However it unfortunately seems far too complicated/time-consuming for us to attempt to support both the old and modern module formats, hence the alternative would be to do "nothing" here and just keep our "old" builds.[4]
---
[1] The final blocker was module support in workers in Firefox, which was implemented in Firefox 114; please see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/import#browser_compatibility
[2] It's probably possible to further improve/simplify especially the Node.js-specific code, but it does appear to work as-is.
[3] Having partially "broken" patches, that fail tests, as part of the commit history is *really not* a good idea in general.
[4] Outputting JavaScript modules was first requested almost five years ago, see issue 10317, and nowadays there *should* be much better support for JavaScript modules in various tools.
Given the limitations of the old pre-processor that's used for CSS/HTML files, this unfortunately isn't as "easy" to implement as it is for JavaScript code.
Since this is the first case where we've wanted to do conditional CSS imports, rather than trying to completely re-write the pre-processor, this patch settles for handling it explicitly in the `expandCssImports` function.
It occurred to me that we can actually run this unit-test in Node.js environments by making use of the preprocessor to stub out the browser globals there.
- Replace FoxitSans with LiberationSans: LiberationSans is already there (for XFA) and we can use
it as a good replacement of FoxitSans.
- For now we just try to substitue standard fonts, the strategy is the following:
* we try to find a font locally from a hardcoded list;
* if it fails then we use Liberation as fallback (only for Helvetica for the moment);
* else we just fallback on the system serif/sansserif/monospace font.
Now that https://bugzilla.mozilla.org/show_bug.cgi?id=1247687 has landed in Firefox, we're able to use worker-modules during development :-)
This removes the final piece of SystemJS usage from the PDF.js library, thus allowing a fair bit of clean-up, and we now use *only* native `import`/`export` statements everywhere in development mode.