mirror of
https://github.com/mozilla/pdf.js.git
synced 2025-04-20 07:08:08 +02:00
Add unit test to check compatibility with such cmaps. In the PDF in issue 18099, the toUnicode cmap had a line to map the glyph char codes from 00 to 7F to the corresponding code points. The syntax to map a range of char codes to a range of Unicode code points is <start_char_code> <end_char_code> <start_unicode_codepoint>. As the Unicode code points are supposed to be given in UTF-16 BE, the PDF's line SHOULD probably have read <00> <7F> <0000>. Instead it omitted two leading zeros from the UTF-16 value, like this: <00> <7F> <00>. This confused PDF.js into mapping these character codes to the UTF-16 characters with the corresponding HIGH bytes (01 became \u0100, 02 became \u0200, et cetera), which ended up turning Latin text in the PDF into Chinese when it was copied. I'm not sure if the PDF spec actually allows PDFs to do this, but since there's at least one PDF in the wild that does and other PDF readers read it correctly, PDF.js should probably support this. |
||
---|---|---|
chromium | ||
font | ||
fuzz | ||
images | ||
integration | ||
pdfs | ||
resources | ||
stats | ||
types | ||
unit | ||
.eslintrc | ||
.gitignore | ||
add_test.mjs | ||
annotation_layer_builder_overrides.css | ||
downloadutils.mjs | ||
draw_layer_test.css | ||
driver.js | ||
integration-boot.mjs | ||
test.mjs | ||
test_manifest.json | ||
test_slave.html | ||
testutils.mjs | ||
text_layer_test.css | ||
webserver.mjs | ||
xfa_layer_builder_overrides.css |
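
The range mapping described in the commit message can be illustrated with a minimal sketch. This is not the PDF.js implementation; `hexToUtf16BE` and `mapBfRange` are hypothetical helper names, and the approach (left-padding a short destination to a whole number of 16-bit units before decoding it) is an assumption about one way to tolerate `<00> <7F> <00>`.

```javascript
// Hypothetical helper (not part of PDF.js): decode a hex string as UTF-16 BE.
// Assumption: a destination shorter than one 16-bit unit is left-padded with
// zeros, so "00" is treated like "0000".
function hexToUtf16BE(hex) {
  const padded = hex.padStart(Math.ceil(hex.length / 4) * 4, "0");
  let out = "";
  for (let i = 0; i < padded.length; i += 4) {
    out += String.fromCharCode(parseInt(padded.slice(i, i + 4), 16));
  }
  return out;
}

// Hypothetical helper: expand one range line of the form
// <start_char_code> <end_char_code> <start_unicode_codepoint>
// into a Map from char code to Unicode string.
function mapBfRange(startHex, endHex, dstHex) {
  const start = parseInt(startHex, 16);
  const end = parseInt(endHex, 16);
  const base = hexToUtf16BE(dstHex);
  const map = new Map();
  for (let code = start; code <= end; code++) {
    // Only the last UTF-16 code unit is incremented across the range.
    const last = base.charCodeAt(base.length - 1) + (code - start);
    map.set(code, base.slice(0, -1) + String.fromCharCode(last));
  }
  return map;
}

const shortForm = mapBfRange("00", "7F", "00");
const fullForm = mapBfRange("00", "7F", "0000");
console.log(shortForm.get(0x41), fullForm.get(0x41));
```

Under these assumptions both the short and the full destination map char code 41 to the same Latin letter, rather than to a character whose high byte is 41.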