Given that we handle non-embedded Calibri fonts as "mapped to standard font", we really ought to be able to use the same glyph mapping as for an actual standard font.
Note that this actually improves consistency in the code, given how we already handle such fonts if they happen to be of the `CIDFontType2` type; see b47c7eca83/src/core/fonts.js (L1186-L1190)
According to the PDF specification these destinations should have a zoom parameter, which may however be `null`, but it shouldn't be omitted; please see https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf#G11.2095870
Hence we try to work-around bad PDF generators by making the zoom parameter optional when validating explicit destinations in both the worker and the viewer.
Switching to an editing mode can be asynchronous (e.g. if an editable annotation exists on a
visible page), so we must add a new editor only when the page rendering is done.
This functionality is purposely limited to development mode and GENERIC builds, since it's unnecessary in e.g. the *built-in* Firefox PDF Viewer, and will only be used when a `<base>`-element is actually present.
*Please note:* We also have tests in mozilla-central that will *indirectly* ensure that relative filter-URLs work as intended in the Firefox PDF Viewer, see https://searchfox.org/mozilla-central/source/toolkit/components/pdfjs/test/browser_pdfjs_filters.js
---
To test that the issue is fixed, the following code can be used:
```html
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<base href=".">
<title>base href (issue 18406)</title>
</head>
<body>
<ul>
<li>Place this code in a file, named `base_href.html`, in the root of the PDF.js repository</li>
<li>Run <pre>npx gulp dist-install</pre></li>
<li>Run <pre>npx gulp server</pre></li>
<li>Open <a href="http://localhost:8888/base_href.html">http://localhost:8888/base_href.html</a> in a browser</li>
<li>Compare rendering with <a href="http://localhost:8888/web/viewer.html?file=/test/pdfs/issue16287.pdf">http://localhost:8888/web/viewer.html?file=/test/pdfs/issue16287.pdf</a></li>
</ul>
<canvas id="the-canvas" style="border: 1px solid black; direction: ltr;"></canvas>
<script src="/node_modules/pdfjs-dist/build/pdf.mjs" type="module"></script>
<script id="script" type="module">
//
// If absolute URL from the remote server is provided, configure the CORS
// header on that server.
//
const url = '/test/pdfs/issue16287.pdf';
//
// The workerSrc property shall be specified.
//
pdfjsLib.GlobalWorkerOptions.workerSrc =
'/node_modules/pdfjs-dist/build/pdf.worker.mjs';
//
// Asynchronous download PDF
//
const loadingTask = pdfjsLib.getDocument(url);
const pdf = await loadingTask.promise;
//
// Fetch the first page
//
const page = await pdf.getPage(1);
const scale = 1.5;
const viewport = page.getViewport({ scale });
// Support HiDPI-screens.
const outputScale = window.devicePixelRatio || 1;
//
// Prepare canvas using PDF page dimensions
//
const canvas = document.getElementById("the-canvas");
const context = canvas.getContext("2d");
canvas.width = Math.floor(viewport.width * outputScale);
canvas.height = Math.floor(viewport.height * outputScale);
canvas.style.width = Math.floor(viewport.width) + "px";
canvas.style.height = Math.floor(viewport.height) + "px";
const transform = outputScale !== 1
? [outputScale, 0, 0, outputScale, 0, 0]
: null;
//
// Render PDF page into canvas context
//
const renderContext = {
canvasContext: context,
transform,
viewport,
};
page.render(renderContext);
</script>
</body>
</html>
```
According to the PDF specification these destinations should have a coordinate parameter, which may however be `null`, but it shouldn't be omitted; please see https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf#G11.2095870
Hence we try to work-around bad PDF generators by making the coordinate parameter optional when validating explicit destinations in both the worker and the viewer.
Add unit test to check compatability with such cmaps
In the PDF in issue 18099. the toUnicode cmap had a line to map the glyph char codes from 00 to 7F to the corresponding code points. The syntax to map a range of char codes to a range of unicode code points is
<start_char_code> <end_char_code> <start_unicode_codepoint>
As the unicode code points are supposed to be given in UTF-16 BE, the PDF's line SHOULD have probably read
<00> <7F> <0000>
Instead it omitted two leading zeros from the UTF-16 like this
<00> <7F> <00>
This confused PDF.js into mapping these character codes to the UTF-16 characters with the corresponding HIGH bytes (01 became \u0100, 02 became \u0200, et cetera), which ended up turning latin text in the PDF into chinese when it was copied
I'm not sure if the PDF spec actually allows PDFs to do this, but since there's at least one PDF in the wild that does and other PDF readers read it correctly, PDF.js should probably support this
There's no specification for that (even if it's possible to have an idea from
the xfa specs) so we just want to hide them in order to avoid to display something
wrong.
When an image has a non-zero SMaskInData it means that the image
has an alpha channel.
With JPX images, the colorspace isn't required (by spec) so when we
don't have it, the JPX decoder will handle the conversion in RGBA
format.
Note that the referenced file is trivially corrupt, since it contains *two* PDF documents placed in the same file which doesn't make sense (and isn't how a PDF document should be updated).
However it's still a good idea to ensure that `loadFont` is able to handle errors when resolving References, since that allows us to invoke the existing fallback font handling.
Fixes issue #16843.
In certain cases, the text layer was misaligned
due to a difference between the `lang` attribute
of the viewer and the canvas. This commit addresses
the problem by adding the `lang` attribute to the canvas.
The issue was caused because PDF.js uses serif/sans-serif
fonts to generate the text layer and relies on system fonts.
The difference in the `lang` attribute led to different fonts
being picked, causing the misalignment.
This also required changing the initial `charCodeToGlyphId`-data to an Object, which seems generally correct since it's consistent with existing code in the `src\core\{cff_font, type1_font}.js` files.
For images that failed to decode once we want to avoid a pointless round-trip to the main-thread, which could otherwise happen for globally cached images.
It fixes issues #14982 and #14724.
The main problem of upscaling a canvas is that it can induces some pixelation
(see issue #14724). So this patch is just removing the limit and as a side
effect it fixes issue #14982.
As far as I can tell, in looking different profiles (especially some memory profile)
in using the Firefox profiler, I don't see any noticeable difference in term of
memory use.
and implement then in using some SVG filters and composition.
Composing in using destination-in in order to multiply RGB components by
the alpha from the mask isn't perfect: it'd be a way better to natively have
alpha masks support, it induces some small rounding errors and consequently
computed RGB are approximatively correct.
In term of performance, it's a real improvement, for example, the pdf in
issue #17779 is now rendered in few seconds.
There are still some room for improvement, but overall it should be a way
better.
*Note:* This borrows a helper function from the viewer, however the code cannot be directly shared since the worker-thread has access to various primitives.
In PR 17428 this functionality was limited to "larger" images, to not affect performance negatively. However it turns out that it's also beneficial to consider more "complex" images, regardless of their size, that contain /SMask or /Mask data; see issue 11518.
The PDF specification states that empty dash arrays, i.e. arrays with
zero elements, are in fact valid. In that case the dash array simply
corresponds to a solid, unbroken line. However, this case was erroneously
being flagged as invalid and therefore the annotation was not drawn
because its width was set to zero. This commit fixes the issue by
allowing dash arrays to have a length of zero.