DECA-287 | Fluid Project Issues Archive

Metadata

Source: DECA-287
Type: Bug
Priority: Major
Status: Open
Resolution: N/A
Assignee: N/A
Reporter: Jonathan Hung
Created: 2012-06-27T15:51:37.201-0400
Updated: 2013-01-27T12:05:45.948-0500
Versions: 0.5, 0.6, 0.7
Fixed Versions: Future
Component: genpdf

Description

For some images, large sections of text are omitted when generating a Type 3 or Type 4 PDF. Typically, the top few lines of text are missing.

To reproduce, run the following on the relevant image:
./decapod-genpdf.py -d test-t4 -t 4 -p test-t4.PDF filename.png/jpeg
./decapod-genpdf.py -d test-t3 -t 3 -p test-t3.PDF filename.png/jpeg
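
To repeat both runs across several images, the two commands above can be generated programmatically. The following is a minimal sketch, not part of Decapod: the helper name `genpdf_commands` is hypothetical, and it only assumes the script name and the `-d`, `-t`, and `-p` flags exactly as they appear in the commands above.

```python
import shlex


# The script name and the -d, -t, -p flags are copied from the report's
# commands; the helper itself is a hypothetical convenience wrapper.
def genpdf_commands(image_path, types=(3, 4), script="./decapod-genpdf.py"):
    """Return one argv list per PDF type for the given page image."""
    commands = []
    for pdf_type in types:
        work_dir = f"test-t{pdf_type}"
        commands.append([
            script,
            "-d", work_dir,
            "-t", str(pdf_type),
            "-p", f"{work_dir}.PDF",
            image_path,
        ])
    return commands


if __name__ == "__main__":
    for image in ("2-1-1.jpg", "faithful-to-the-book-page-4-copy.jpeg"):
        for argv in genpdf_commands(image):
            # Print only; pass argv to subprocess.run to actually execute.
            print(shlex.join(argv))
```

Because every image reuses the same `test-t3` and `test-t4` names, running the printed commands for more than one image would overwrite the earlier output unless the results are copied elsewhere between runs.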

The following two images reproduce this error:
2-1-1.jpg
faithful-to-the-book-page-4-copy.jpeg (see attached PDF to see the results of a Type 3 export)

The following two images do not produce this error (despite being somewhat similar):
4-1-01-grey.jpg
Image_0016-grey.png

Format and colour do not appear to play a role: colour and TIFF versions of the problematic images exhibit the same behaviour.

Attachments

Comments

tamir@tamirhassan.com commented 2013-01-27T12:03:52.545-0500

The cause is that the line-finding stage of layout analysis failed: the text lines were not found and were therefore not used in further processing.

I've tried it out with the current version and get a much better result – only the page number at the top is missing.

Ideally, all content not recognized as text lines would be included as part of a background image.
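
The idea of keeping unrecognized content in a background image could be sketched roughly as follows. This is an illustration under stated assumptions, not how genpdf is implemented: the function name, the page representation (a 2D list of grayscale values), and the box format `(x0, y0, x1, y1)` with exclusive upper bounds are all hypothetical.

```python
# Hypothetical sketch: nothing here is taken from the genpdf code base.
def background_without_text(page, line_boxes, fill=255):
    """Copy a grayscale page, blanking pixels inside recognized line boxes.

    Anything outside the boxes (for example a page number that was not
    detected as a text line) is left untouched in the returned background.
    """
    background = [row[:] for row in page]  # copy so the input is not modified
    height = len(background)
    width = len(background[0]) if height else 0
    for x0, y0, x1, y1 in line_boxes:
        # Clamp each box to the page so out-of-range boxes are harmless.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                background[y][x] = fill
    return background
```

With this approach, a line that the layout analysis misses simply stays in the background layer instead of disappearing from the output.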

tamir@tamirhassan.com commented 2013-01-27T12:05:08.641-0500

(this comment relates to the file test-t3.pdf) This is the output that I got when running genpdf on the same pdf (t3). Only the page number at the top (not recognized as text?) is missing. Tamir