mirror of https://github.com/mozilla/pdf.js.git synced 2025-04-20 07:08:08 +02:00
pdf.js/src
Jonas Jenwald 6eef69de22 Export the "raw" toUnicode-data from PartialEvaluator.preEvaluateFont
Compared to other data structures, such as `Dict`s, we're purposely *not* caching Streams on the `XRef`-instance.[1]
The somewhat unfortunate effect of not caching Streams is that repeatedly getting the *same* Stream-data requires re-parsing/re-initializing a bunch of data; see `XRef.fetch` and related methods.

For font-parsing in particular, we're currently fetching the `toUnicode`-data, which is very often a Stream, in `PartialEvaluator.preEvaluateFont` and then *again* in `PartialEvaluator.extractDataStructures` soon afterwards.
By instead letting `PartialEvaluator.preEvaluateFont` export the "raw" `toUnicode`-data, we can avoid *some* unnecessary re-parsing/re-initializing when handling fonts.
*Please note:* In this particular case, given that `PartialEvaluator.preEvaluateFont` only accesses the "raw" `toUnicode` data, exporting a Stream should be safe.
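
To make the saving concrete, here is a minimal toy sketch of the idea. It is not the pdf.js code: `ToyStream`, `ToyXRef`, `fetchStream`, `streamInits`, and the simplified function bodies are all hypothetical stand-ins, assuming only that every fetch of an uncached Stream creates a fresh instance.

```javascript
// Toy model only: none of these classes are the real pdf.js implementations.
class ToyStream {
  constructor(bytes) {
    this.bytes = bytes;
    this.pos = 0; // current read position
  }
}

class ToyXRef {
  constructor(objects) {
    this.objects = objects; // Map from a reference name to raw bytes
    this.streamInits = 0; // counts how often a Stream had to be (re-)created
  }

  // Streams are deliberately not cached, so each fetch builds a new instance.
  fetchStream(ref) {
    this.streamInits++;
    return new ToyStream(this.objects.get(ref));
  }
}

// Before: both steps fetch the toUnicode data independently.
function loadFontTwice(xref, ref) {
  const first = xref.fetchStream(ref); // stands in for preEvaluateFont
  const second = xref.fetchStream(ref); // stands in for extractDataStructures
  return { first, second };
}

// After: the first step exports the "raw" data and the second step reuses it.
function preEvaluate(xref, ref) {
  const toUnicode = xref.fetchStream(ref);
  return { toUnicode }; // exported untouched; nothing is read from it here
}

function extractDataStructures(preEvaluated) {
  return preEvaluated.toUnicode; // no second fetch needed
}

const xrefA = new ToyXRef(new Map([["toUnicode", [1, 2, 3]]]));
loadFontTwice(xrefA, "toUnicode");

const xrefB = new ToyXRef(new Map([["toUnicode", [1, 2, 3]]]));
const reused = extractDataStructures(preEvaluate(xrefB, "toUnicode"));

console.log(xrefA.streamInits, xrefB.streamInits, reused.pos);
```

In this toy model the exported Stream is still at position 0 when the second step receives it, which mirrors the condition the note above relies on: the first step must not consume the data it hands over.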

---
[1] The reasons for this include:
 - Streams, especially `DecodeStream`-instances, can become *very* large once read. Caching them is therefore a bad idea because of the (potential) memory impact.

 - Attempting to read from the *same* Stream-instance more than once won't work unless it's `reset` in between, since methods such as `getBytes` always start at the current data position.

 - Given that parsing, even in the worker-thread, is now fairly asynchronous, it's generally impossible to assert that any one Stream-instance isn't being accessed "concurrently" by e.g. different `getOperatorList` calls. Hence `reset`-ing a cached Stream-instance isn't going to work in the general case.
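
The last two points can be illustrated with a small toy sketch. `PositionedStream` is a hypothetical stand-in that models only the read position, `getBytes`, and `reset`; it is not the pdf.js Stream class, and the "consumers" are just interleaved calls rather than real asynchronous parsing.

```javascript
// Toy stand-in for a Stream; it models only the position-based reading.
class PositionedStream {
  constructor(bytes) {
    this.bytes = bytes;
    this.pos = 0;
  }

  // Reads from the current position, never from the start.
  getBytes(length) {
    const end =
      length === undefined
        ? this.bytes.length
        : Math.min(this.pos + length, this.bytes.length);
    const chunk = this.bytes.slice(this.pos, end);
    this.pos = end;
    return chunk;
  }

  reset() {
    this.pos = 0;
  }
}

// Reading the same instance twice: the second read finds nothing left.
const shared = new PositionedStream([10, 20, 30, 40]);
const firstRead = shared.getBytes();
const secondRead = shared.getBytes();
shared.reset();
const afterReset = shared.getBytes();

// Two interleaved consumers of one cached instance.
const cached = new PositionedStream([10, 20, 30, 40]);
const headA = cached.getBytes(2); // consumer A reads the first two bytes
cached.reset(); // consumer B starts over on the shared instance
const headB = cached.getBytes(1); // consumer B reads one byte
const tailA = cached.getBytes(2); // consumer A resumes, but the position moved

console.log(firstRead, secondRead, afterReset, headA, headB, tailA);
```

Consumer A expected to continue with the third and fourth bytes, but B's `reset` moved the shared position, so A receives the second and third bytes instead. This is the sense in which `reset`-ing a cached instance cannot be made safe in general.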
2021-05-08 12:04:13 +02:00
| Name | Last commit | Date |
| --- | --- | --- |
| core | Export the "raw" toUnicode-data from PartialEvaluator.preEvaluateFont | 2021-05-08 12:04:13 +02:00 |
| display | Remove unnecessary closure in src/display/text_layer.js, and use standard classes | 2021-05-05 18:44:56 +02:00 |
| images | Vectorize the logo. | 2012-10-29 14:08:52 -04:00 |
| scripting_api | [JS] Fix several issues found in pdf in #13269 | 2021-05-04 19:21:51 +02:00 |
| shared | Display widget signature | 2021-04-10 19:13:28 +02:00 |
| doc_helper.js | [api-major] Completely remove the global PDFJS object | 2018-03-01 18:13:27 +01:00 |
| interfaces.js | Use ESLint to ensure that exports are sorted alphabetically | 2021-01-09 20:37:51 +01:00 |
| license_header.js | Update the year in the license_header files | 2021-02-11 17:52:26 +01:00 |
| license_header_libre.js | Update the year in the license_header files | 2021-02-11 17:52:26 +01:00 |
| pdf.image_decoders.js | Use ESLint to ensure that exports are sorted alphabetically | 2021-01-09 20:37:51 +01:00 |
| pdf.js | Merge pull request #13105 from Snuffleupagus/BasePdfManager-parseDocBaseUrl | 2021-03-19 23:03:20 +01:00 |
| pdf.sandbox.external.js | [JS] Fix several issues found in pdf in #13269 | 2021-05-04 19:21:51 +02:00 |
| pdf.sandbox.js | [JS] Use heap allocation when initializing quickjs sandbox (#13286) | 2021-04-23 12:04:14 +02:00 |
| pdf.scripting.js | Tweak the pdf.scripting.js bundling, to improve overall consistency | 2020-10-25 16:36:56 +01:00 |
| pdf.worker.entry.js | Update the year in the license_header files | 2021-02-11 17:52:26 +01:00 |
| pdf.worker.js | Convert the src/pdf.js and src/pdf.worker.js files to use standard import/export statements | 2020-05-20 13:18:23 +02:00 |
| worker_loader.js | Update Prettier to version 2.0 | 2020-04-14 12:28:14 +02:00 |