For simplicity we will abort /Form XObject parsing *immediately* when encountering a circular reference, rather than letting it continue up until some limit (as e.g. PDFium appears to do), which should be fine since there are never any guarantees if/how *corrupt* PDF documents will render.
Previously we'd only do this for Type1/CFF fonts, see e.g. PR 6736, since the font-program may update the /FontMatrix.
However, it seems that we should do this unconditionally to account for fonts with non-default /FontMatrix-entries in the font-dictionary (which seem to be pretty rare).
In PDF version 2.0 the handling of Indexed color spaces was clarified as follows:
> The index value should be an integer in the range 0 to hival. If the value is a real number, it shall be rounded to the nearest integer (0.5 values shall be rounded up); if it is outside the range 0 to hival, it shall be adjusted to the nearest value within that range.
Please refer to https://github.com/pdf-association/pdf-differences/tree/main/IndexedColor
The current text layer approach based on absolutely positioned
`<span>` elements by default causes flickering with text selection,
and we have browser-specific workarounds to solve that.
In Chrome, the workaround involves moving the `.endOfContent` element to
right after the last element that contains some selected content. This
works well in simple PDFs, but breaks when we have `span.markedContent`
elements. Given a text layer structure like the following, rendered
as four consecutive lines:
```html
<span class="markedContent">
<br>
<span>development enter the construction phase (estimated at around</span>
</span>
<span class="markedContent">
<br>
<span>300 MEUR).</span>
</span>
<span class="markedContent">
<br>
<span>Kreate's EBITA increased to 2.8 MEUR (Q4'23: 2.7 MEUR) and the</span>
</span>
<span class="markedContent">
<br>
<span>margin rose to 3.7% (Q4'23: 3.4%). However, profitability was</span>
</span>
```
when starting to select from inside the first line and dragging down
to the empty space after the second line, Chrome will anchor the
selection at the beginning of either the `<br>` or the `<span>` inside
the last `.markedContent`, depending on whether the selection is in
"per-character mode" (i.e. click and drag) or "per-word mode" (i.e.
double click and drag). This causes us to insert the `.endOfContent`
element in the wrong place (one element too far), which causes one
more line to be selected, which triggers another `"selecctionchange"`
event, which causes us to move `.endOfContent` again, and so on, looping
until when the whole page is selected.
This commit fixes the issue by making sure that when the end of the
selection range points to the _begining_ of an element, we walk back
the dom finding the first non-empty element, and attatch `.endOfContent`
to the end of that.
In the included PDF document the Type3-font doesn't contain any glyph definition for "space", despite that character being referenced in the /Contents stream.
While missing Type3-glyphs obviously cannot be rendered, we still need to update the current canvas position such that any char/word-spacing is correctly applied.
The test-case was found at https://github.com/pdf-association/pdf-differences/tree/main/Type3WordSpacing
Originally this function would "manually" invoke the rendering commands for Type3-glyphs, however that was changed some time ago:
- Initial `Path2D` support was added in PR 14858, but the old code kept for Node.js compatibility.
- Since PR 15951 we've been using a `Path2D` polyfill in Node.js environments.
Hence, after the previous commit, we can further simplify this function by *directly* returning/using the `Path2D` object when rendering Type3-glyphs; see also https://github.com/mozilla/pdf.js/pull/19731#discussion_r2018712695
While this won't improve performance significantly, when compared to the introduction of `Path2D`, it definately cannot hurt.
With this patch, all the paths components are collected in the worker until a path
operation is met (i.e., stroke, fill, ...).
Then in the canvas a Path2D is created and will replace the path data transfered from the worker,
this way when rescaling, the Path2D can be reused.
In term of performances, using Path2D is very slightly improving speed when scaling the canvas.
This patch extends the approach of PR 14543, by also treating e.g. minus signs followed by '(' or '<' as zero.
Inside of a /Contents stream those characters will generally mean the start of one or more glyphs.
For Type3 glyphs with `d1` operators it's easy to compute a fallback bounding box, however for `d0` the situation is more difficult.
Given that we nowadays compute the min/max of basic path-rendering operators on the worker-thread, we can utilize that by parsing these Type3 operatorLists to guess a more suitable fallback bounding box.
One of the images have a corrupt SMask, where the /Height-entry is bogus; see the excerpt below (via https://brendandahl.github.io/pdf.js.utils/browser/).
```
SMask (stream) [id: 17, gen: 0]
ColorSpace = /DeviceGray
Height = /Length
Subtype = /Image
Filter = /FlateDecode
Type = /XObject
Width = 157
Matte (array)
BitsPerComponent = 8
Length = 3893
<view contents> download
```
Hence we enable SMask/Mask images to fallback to the parent image dimensions, and also add more validation of the width/height to get a better error message when that data is wrong.
Web archive no longer has the revision for the saved PDF referenced in
that test case, so this updates that link to a more recent revision to
enable the tests to run on a clean clone.
When rendering big PDF pages at high zoom levels, we currently fall back
to CSS zoom to avoid rendering canvases with too many pixels. This
causes zoomed in PDF to look blurry, and the text to be potentially
unreadable.
This commit adds support for rendering _part_ of a page (called
`PDFPageDetailView` in the code), so that we can render portion of a
page in a smaller canvas without hiting the maximun canvas size limit.
Specifically, we render an area of that page that is slightly larger
than the area that is visible on the screen (100% larger in each
direction, unless we have to limit it due to the maximum canvas size).
As the user scrolls around the page, we re-render a new area centered
around what is currently visible.
During the XRef stream parsing we're attempting to lookup an entry that hasn't yet been found, since parsing is currently running, and given that we'd also cache free/missing XRef entries we'd then return an incorrect value during normal PDF parsing.
The simplest solution here is to just not cache free/missing XRef entries, since a properly generated PDF document shouldn't be trying to access objects it doesn't contain.
Furthermore, the amount of "extra" parsing now needed for such XRef entries shouldn't be significant enough to be an issue.
It fixes#19505.
We were invaliding throwing actions (in setting event.rc to false) and all the event process was stopped.
Now we're just dumping the exception in the console: the action is skipped and event.rc is not set
else the input fields aren't updated wit KeyStroke actions.
Rather than modifying the "raw" dimensions of the page, we'll instead apply the `userUnit` as an *additional* scale-factor via CSS.
*Please note:* It's not clear to me if this solution is fully correct either, or if there's other problems with it, but it at least *appears* to work.
---
With these changes, the following CSS variables are now assumed to be available/set as necessary: `--total-scale-factor`, `--scale-factor`, `--user-unit`, `--scale-round-x`, and `--scale-round-y`.
*Note:* For the issue mentioned on Matrix it'll obviously still make sense to improve the regular expression to detect more URL edge-cases.
However it occurred to me that even once that particular case is fixed there'll always be a risk that inferred links could overlap, and effectively block, the actual LinkAnnotations.
Hence this patch removes the URL comparison to ensure that overlapping inferred links will always be ignored.
Automatically detect links in the text content of a file and automatically
generate link annotations at the appropriate locations to achieve
automatic link detection and hyperlinking.
It fixes#19360.
Each glyph in the test case has a fill and a stroke pattern, so the current transform used
to scale the glyph outline must be the same.
In setting the stroke color to green, I noticed that the last outline contains some non-closed
subpaths, so when generating the glyph outline, every time we 'moveTo', we close the previous
subpath.
In the affected font the total number of mapping-entries is `1142348`, and no less than `997473` of them are duplicates.
Given that every duplicate causes a lot of Array elements to be moved this becomes extremely inefficient, which we can avoid by keeping track of seen `charCode`s and directly build the final mappings-Array instead.
It fixes#19239.
When the canvas isn't existing the editor has no image: it's fine because the editor is invisible.
Once it's made visible, the canvas is set when the annotation layer has been rendered.
This appears to have regressed in PR 13808, since it removed the `matrix`-entry from array returned by the `MeshShading.prototype.getIR` method *without* also updating the indexes in the `MeshShadingPattern` constructor.
The `/Root/AcroForm/Fields` array contains a "ridiculous" number of LinkAnnotations, which obviously makes no sense since those are not form fields.
To improve performance we'll thus ignore those when collecting the field objects.
From section [11.6.4.3 Mask Shape and Opacity](https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf#G10.4848628) in the PDF specification:
- An image XObject may contain its own *soft-mask image* in the form of a subsidiary image XObject in the `SMask` entry of the image dictionary (see "Image Dictionaries"). This mask, if present, shall override any explicit or colour key mask specified by the image dictionary's `Mask` entry. Either form of mask in the image dictionary shall override the current soft mask in the graphics state.
When computing the left offset of the highlighted text, we cannot use
.offsetLeft because the text might have been scaled through CSS, and it
needs to be taken into account.
Use `.getClientRects()`/`.getBoundingClientRect()` instead, which will
return measurements scaled appropriately.
The date was create in UTC+0 and then amended in using set-Month/Date which take into account
the user timezone.
With this patch we build all the date in the user timezone.