Sunday, 15 September 2013

pdf generation - Remove all text from PDF file -



I am using Ghostscript to convert a source PDF file into an array of PNG images. Before I convert a PDF page to a PNG image, I need to extract (delete) all text from the PDF, so that the converted page image contains only the other elements, excluding text.

Can I accomplish this with Ghostscript, or do I need different tools?

I am interested in any tool that can read and save the source PDF while removing all text.

You can accomplish what you want without Ghostscript, using a text editor.

First, convert the compressed PDF into one that has (nearly) all PDF objects' contents and streams expanded into a readable form, using qpdf:

qpdf --qdf --object-streams=disable input.pdf editable.pdf

Open the new editable.pdf file in a text editor (one which gracefully handles the remaining binary blobs within the PDF, such as font or ICC resources).

Search for all occurrences of the Tj and TJ strings (the PDF operators used to show text) within the PDF object streams, and alter them to jT and JT respectively (undefined, nonsense PDF operators). Save the file as edited.pdf.
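The same search-and-replace can be scripted instead of done by hand. The following is a minimal Python sketch, not part of the original answer: the function names are made up for illustration, and it assumes the file has already been expanded with the qpdf command above. It swaps the two letters of each operator, so the file length and all byte offsets stay unchanged. It scans the whole file naively, so a binary blob that happens to contain the same byte pattern would be altered too, and it does not touch the less common ' and " text-showing operators.

```python
import re

# A text-showing operator follows its operand, i.e. comes after ')', ']', '>'
# or whitespace, and is itself followed by whitespace.
_SHOW_TEXT = re.compile(rb"(?<=[\s\)\]>])T([jJ])(?=\s)")
_DISABLED = re.compile(rb"(?<=[\s\)\]>])([jJ])T(?=\s)")


def strip_text_operators(data: bytes) -> bytes:
    """Turn Tj into jT and TJ into JT (same length, undefined operators)."""
    return _SHOW_TEXT.sub(lambda m: m.group(1) + b"T", data)


def restore_text_operators(data: bytes) -> bytes:
    """Undo strip_text_operators by swapping the letters back."""
    return _DISABLED.sub(lambda m: b"T" + m.group(1), data)


def strip_text_from_file(src: str, dst: str) -> None:
    """Read a qpdf-expanded PDF and write a copy with text operators disabled."""
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(strip_text_operators(data))
```

Under these assumptions, calling strip_text_from_file("editable.pdf", "edited.pdf") would produce the edited file described above.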

Now convert edited.pdf to PNG images as needed.

Note that edited.pdf will still display in PDF viewers, but with the text missing. However, it is easy to restore the text again by restoring the original Tj/TJ operators.
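For the conversion step, here is a hedged Python sketch that calls Ghostscript with its png16m device. The question does not show the exact command used, so the resolution, the output file pattern, and the helper names below are assumptions for illustration only.

```python
import subprocess


def build_gs_command(pdf_path, out_pattern="page-%03d.png", dpi=150):
    """Build a Ghostscript command line that renders each page to a PNG file.

    out_pattern and dpi are illustrative defaults, not values from the post.
    On Windows the executable is usually gswin64c instead of gs.
    """
    return [
        "gs",
        "-sDEVICE=png16m",   # 24-bit colour PNG output device
        f"-r{dpi}",          # rendering resolution in dpi
        "-o", out_pattern,   # output file pattern; -o also implies -dBATCH -dNOPAUSE
        pdf_path,
    ]


def render_pages(pdf_path, out_pattern="page-%03d.png", dpi=150):
    """Run Ghostscript and raise CalledProcessError if it fails."""
    subprocess.run(build_gs_command(pdf_path, out_pattern, dpi), check=True)
```

Calling render_pages("edited.pdf") would then write page-001.png, page-002.png and so on into the current directory.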

Update/correction

My bad! My original reply contained a repeated typo: in several places I had mixed up the capitalization of the Tj/TJ operators. Sorry for any confusion this may have created.

Update 2

To clarify what an "object stream" is: in the "normalized" form created by the qpdf command given above, all objects with streams look like this (where NNN is an integer number):

NNN 0 obj
<<
  % here are the key:value pairs of the object dictionary
  /Key1 somevalue1
  /Key2 somevalue2
  % ... (more key:value pairs)
>>
stream
  % here is the content of the object stream
endstream
endobj

An "image stream" has the same structure. Its key:value pairs typically contain the following 4 entries, in any order (where NNN and MMM are integer values giving the width and height of the image in pixels):

/Type /XObject
/Subtype /Image
/Width NNN
/Height MMM
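To tell image streams apart from the content streams that carry text operators, the dictionary in front of each stream can be inspected. This is a rough Python sketch under the assumption that the file is in the expanded form produced by qpdf. It uses a regular expression rather than a real PDF parser, and the function name is hypothetical.

```python
import re

# "NNN 0 obj << ... >> stream": the dictionary body may not run past "endobj",
# so objects without a stream are skipped rather than merged with the next one.
_STREAM_OBJ = re.compile(
    rb"(\d+) 0 obj\s*<<((?:(?!endobj).)*?)>>\s*stream", re.DOTALL
)


def find_image_objects(data: bytes):
    """Return (object number, width, height) for each image stream found."""
    images = []
    for match in _STREAM_OBJ.finditer(data):
        obj_dict = match.group(2)
        if not re.search(rb"/Subtype\s*/Image\b", obj_dict):
            continue  # not an image, e.g. a page content stream
        width = re.search(rb"/Width\s+(\d+)", obj_dict)
        height = re.search(rb"/Height\s+(\d+)", obj_dict)
        if width and height:
            images.append(
                (int(match.group(1)), int(width.group(1)), int(height.group(1)))
            )
    return images
```

Streams that this function does not report are the candidates in which to look for Tj and TJ operators.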

pdf-generation ghostscript
