mirror of https://github.com/mozilla/pdf.js.git synced 2025-04-20 15:18:08 +02:00

Attempt to combine separate beginText/endText sequences in getTextContent (issue 9984)

Please note that while this *improves* issue 9984 slightly (and likely others too), it's not a complete solution.
The remaining issues stem from more general problems with the existing heuristics for combining separate text items.
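
To illustrate the kind of heuristic referred to above, here is a minimal sketch of combining adjacent text items. It is not the code from this commit or from pdf.js: `mergeAdjacentItems`, the flat `{ str, x, y, width }` item shape, and the `maxGap` and `baselineTolerance` thresholds are all hypothetical simplifications (real `getTextContent` items carry their position in a `transform` array).

```javascript
// Hypothetical helper, NOT part of pdf.js: merges neighbouring text items
// that appear to share a baseline and are separated by only a small gap.
// Assumed item shape (a simplification): { str, x, y, width }.
function mergeAdjacentItems(items, maxGap = 1, baselineTolerance = 0.5) {
  const merged = [];
  for (const item of items) {
    const prev = merged[merged.length - 1];
    if (prev) {
      // Items on (roughly) the same baseline are candidates for merging.
      const sameBaseline = Math.abs(prev.y - item.y) <= baselineTolerance;
      // Horizontal distance between the end of prev and the start of item.
      const gap = item.x - (prev.x + prev.width);
      if (sameBaseline && gap >= 0 && gap <= maxGap) {
        prev.str += item.str;
        prev.width = item.x + item.width - prev.x;
        continue;
      }
    }
    // Copy the item so the caller's input array is left unmodified.
    merged.push({ ...item });
  }
  return merged;
}

// Usage: two fragments on one baseline, followed by a separate line.
const sample = [
  { str: "Hel", x: 0, y: 10, width: 15 },
  { str: "lo", x: 16, y: 10, width: 10 },
  { str: "World", x: 0, y: -2, width: 25 },
];
const result = mergeAdjacentItems(sample);
```

A fixed threshold like this is exactly where such heuristics tend to break down: a gap that means "same word" in one font size can mean "new word" or "new column" in another, which is why merging text sequences can only ever be a partial fix.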
Jonas Jenwald 2018-08-18 13:28:40 +02:00
parent 160ca55163
commit 497b765ede
4 changed files with 47 additions and 7 deletions


@@ -72,6 +72,7 @@
!issue9458.pdf
!issue9915_reduced.pdf
!issue9940.pdf
!issue9984.pdf
!bad-PageLabels.pdf
!decodeACSuccessive.pdf
!filled-background.pdf

BIN
test/pdfs/issue9984.pdf Normal file

Binary file not shown.


@@ -1352,6 +1352,13 @@
"link": false,
"type": "eq"
},
{ "id": "issue9984-text",
"file": "pdfs/issue9984.pdf",
"md5": "41be5f1b43f61892978cfc57c74ccf4c",
"rounds": 1,
"link": false,
"type": "text"
},
{ "id": "issue8570",
"file": "pdfs/issue8570.pdf",
"md5": "0355731adb72df233eaa10464dcc8c51",