Metadata
- Source
- DECA-287
- Type
- Bug
- Priority
- Major
- Status
- Open
- Resolution
- N/A
- Assignee
- N/A
- Reporter
- Jonathan Hung
- Created
- 2012-06-27T15:51:37.201-0400
- Updated
- 2013-01-27T12:05:45.948-0500
- Versions
- 0.5
- 0.6
- 0.7
- Fixed Versions
- Future
- Component
- genpdf
Description
For some images, large sections of text are omitted when generating Type 3 or Type 4 PDFs. Typically, the top few lines of text are missing.
To reproduce, run the following on the relevant image:
./decapod-genpdf.py -d test-t4 -t 4 -p test-t4.PDF filename.png/jpeg
./decapod-genpdf.py -d test-t3 -t 3 -p test-t3.PDF filename.png/jpeg
The following two images reproduce this error:
2-1-1.jpg
faithful-to-the-book-page-4-copy.jpeg (see the attached PDF for the results of a Type 3 export)
The following two images do not produce this error (despite being somewhat similar):
4-1-01-grey.jpg
Image_0016-grey.png
Format and colour do not appear to play a role: colour and TIFF versions of the problematic images exhibit the same behaviour.
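To rerun the reproduction steps across all four images listed above, the commands can be generated from the same flags used in the report (`-d`, `-t`, `-p`). This is a minimal sketch: the helper name `genpdf_command` and the per-image output naming (`<image stem>-t<type>`) are assumptions, and the script only prints the commands rather than running `decapod-genpdf.py`.

```python
import shlex

# Images named in this report (hypothetical grouping for convenience).
FAILING = ["2-1-1.jpg", "faithful-to-the-book-page-4-copy.jpeg"]
WORKING = ["4-1-01-grey.jpg", "Image_0016-grey.png"]


def genpdf_command(image, pdf_type):
    """Build the argv used in the report for one image and one PDF type.

    The output directory/PDF naming scheme is an assumption, chosen so that
    runs for different images and types do not overwrite each other.
    """
    stem = image.rsplit(".", 1)[0]
    out = f"{stem}-t{pdf_type}"
    return ["./decapod-genpdf.py", "-d", out, "-t", str(pdf_type),
            "-p", f"{out}.pdf", image]


if __name__ == "__main__":
    # Print (do not execute) one command per image and PDF type.
    for image in FAILING + WORKING:
        for pdf_type in (3, 4):
            print(shlex.join(genpdf_command(image, pdf_type)))
```

Comparing the outputs of the failing and working images side by side makes it easier to see which lines go missing.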
Attachments
Comments
-
tamir@tamirhassan.com commented
2013-01-27T12:03:52.545-0500
This happens because the line-finding stage of layout analysis failed: the text lines were not found, so they were not used in further processing.
I tried it with the current version and got a much better result: only the page number at the top is missing.
Ideally, all content not recognized as text lines would be included as part of a background image.
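The idea of keeping unrecognized content in a background image can be sketched as follows: copy the page, blank out every region the line finder reported as a text line, and keep everything else. Anything the line finder missed (such as the page number) then survives in the background instead of disappearing. This is a hypothetical illustration, not decapod's implementation: it assumes the page is a grayscale pixel grid and that line finding yields `(top, left, bottom, right)` boxes; `background_layer` is an invented name.

```python
from typing import List, Tuple

Image = List[List[int]]          # grayscale rows: 0 = black, 255 = white
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


def background_layer(image: Image, text_lines: List[Box], paper: int = 255) -> Image:
    """Return a copy of the page with every detected text-line box painted
    in the paper colour. Pixels outside those boxes are left unchanged, so
    content the line finder missed is kept in the background."""
    background = [row[:] for row in image]  # do not modify the input page
    height = len(image)
    width = len(image[0]) if image else 0
    for top, left, bottom, right in text_lines:
        # Clip each box to the page bounds before painting it.
        for y in range(max(top, 0), min(bottom, height)):
            for x in range(max(left, 0), min(right, width)):
                background[y][x] = paper
    return background


# Row 0 holds a "page number" the line finder missed; row 2 is a detected line.
page = [[0, 0, 255, 255],
        [255, 255, 255, 255],
        [0, 0, 0, 0]]
bg = background_layer(page, [(2, 0, 3, 4)])
```

Here `bg` keeps the page-number pixels in row 0 while the detected line in row 2 is removed, since that text would be rendered separately.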
-
tamir@tamirhassan.com commented
2013-01-27T12:05:08.641-0500
(This comment relates to the file test-t3.pdf.) This is the output I got when running genpdf to produce the same PDF (t3). Only the page number at the top (not recognized as text?) is missing.
Tamir