pdf-generation - Remove all text from PDF file
I am using Ghostscript to convert a source PDF file into an array of PNG images. Before converting a PDF page to a PNG image, I need to extract (delete) the text from the PDF, so that the converted page image contains all the other elements but no text.
Can I accomplish this with Ghostscript, or do I need different tools?
I am interested in any tool that can read and re-save the source PDF while removing the text.
You can accomplish what you want without Ghostscript, using a text editor.
Convert your compressed PDF into one that has (nearly) all PDF objects' contents and streams expanded into a readable form, using qpdf:
qpdf --qdf --object-streams=disable input.pdf editable.pdf
Open the new editable.pdf file in a text editor (one that gracefully handles the remaining binary blobs within the PDF, such as font or ICC resources).
Search for all occurrences of the Tj and TJ strings (these are the PDF operators used to show text) within the PDF object streams, and alter them to Jt and JT respectively (these are undefined, nonsense PDF operators). Save the file as edited.pdf.
Now convert edited.pdf to PNG images as needed.
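For the conversion step, here is a minimal sketch of how Ghostscript could be driven from Python. The helper names `build_gs_command` and `render_pages`, the output pattern `page-%03d.png` and the 150 dpi resolution are illustrative assumptions, not part of the original answer. It also assumes the Ghostscript executable is called `gs` (on Windows it is typically `gswin64c`).

```python
import subprocess


def build_gs_command(pdf_path, out_pattern="page-%03d.png", dpi=150):
    """Return a Ghostscript argument list that renders every page to a PNG.

    The defaults are illustrative; %03d in the pattern is expanded by
    Ghostscript into the page number.
    """
    return [
        "gs",                       # assumed executable name
        "-dSAFER", "-dBATCH", "-dNOPAUSE",
        "-sDEVICE=png16m",          # 24-bit RGB PNG output device
        f"-r{dpi}",                 # rendering resolution in dpi
        f"-sOutputFile={out_pattern}",
        pdf_path,
    ]


def render_pages(pdf_path, out_pattern="page-%03d.png", dpi=150):
    """Run Ghostscript; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_gs_command(pdf_path, out_pattern, dpi), check=True)
```

Calling `render_pages("edited.pdf")` would then write one PNG per page into the current directory.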
Note that edited.pdf will still display in PDF viewers, but with the text missing. However, it is easy to restore the text again by restoring the original Tj/TJ operators.
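If you would rather not do the search-and-replace by hand, it can be scripted. The following is only a sketch under some assumptions: the input is the QDF file produced by qpdf, stream bodies sit between `stream` and `endstream` lines, and the names `swap_operators`, `DISABLE`, `RESTORE` and `patch_file` are made up for illustration. Because `Tj`/`Jt` and `TJ`/`JT` have the same length, the byte offsets and stream lengths in the file stay valid. Be aware that it also touches matching bytes inside text strings, and could in principle hit a coincidental match inside a binary stream.

```python
import re
from pathlib import Path

# Matches the body of every "stream ... endstream" section.
STREAM_RE = re.compile(rb"(stream\r?\n)(.*?)(\r?\nendstream)", re.DOTALL)

DISABLE = {b"Tj": b"Jt", b"TJ": b"JT"}                 # hide the text
RESTORE = {new: old for old, new in DISABLE.items()}   # bring it back


def swap_operators(pdf_bytes, mapping):
    """Replace operator names inside stream bodies according to mapping."""
    names = b"|".join(re.escape(name) for name in mapping)
    # Whole tokens only: not part of a longer word, not a /Name.
    op_re = re.compile(rb"(?<![\w/])(" + names + rb")(?!\w)")

    def patch_stream(match):
        head, body, tail = match.groups()
        return head + op_re.sub(lambda op: mapping[op.group(1)], body) + tail

    return STREAM_RE.sub(patch_stream, pdf_bytes)


def patch_file(src, dst, mapping=DISABLE):
    """Read src, swap the operators, write the result to dst."""
    Path(dst).write_bytes(swap_operators(Path(src).read_bytes(), mapping))
```

With this, `patch_file("editable.pdf", "edited.pdf")` would produce the text-free file, and `patch_file("edited.pdf", "restored.pdf", RESTORE)` would undo the change.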
Update: My bad! My original reply contained a repeated typo. I had used tj at all places where TJ should have been used. Sorry for any confusion this may have created.
To clarify what an "object stream" is: in the "normalized" form created by the qpdf command given above, objects with streams look like this (where NNN is an integer number):
    NNN 0 obj
    <<
      % Here are the key:value pairs of the object dictionary
      /Key1 somevalue1
      /Key2 somevalue2
      % ... (more key:value pairs)
    >>
    stream
      % Here is the content of the object stream
    endstream
    endobj
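To see which objects in a QDF file carry a stream, the layout above can be scanned with a regular expression. This is a rough sketch, not a PDF parser: it assumes the tidy layout qpdf writes in QDF mode, and `iter_stream_objects` is a hypothetical helper name.

```python
import re

# "NNN G obj << dictionary >> stream ... endstream", never crossing an endobj
# while looking for the end of the dictionary.
OBJ_WITH_STREAM_RE = re.compile(
    rb"(\d+) (\d+) obj\s*<<((?:(?!endobj).)*?)>>\s*stream\r?\n(.*?)\r?\nendstream",
    re.DOTALL,
)


def iter_stream_objects(pdf_bytes):
    """Yield (object number, dictionary bytes, stream bytes) per stream object."""
    for match in OBJ_WITH_STREAM_RE.finditer(pdf_bytes):
        yield int(match.group(1)), match.group(3).strip(), match.group(4)
```

Looping over `iter_stream_objects(Path("editable.pdf").read_bytes())` (with `Path` from `pathlib`) and printing the dictionaries is a quick way to tell page content streams apart from fonts and images before editing anything.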
An "image stream" has the same structure. Its key:value pairs will typically contain the next 4 entries, though not necessarily in this order (where NNN and MMM are integer values giving the width and height of the image in pixels):
    /Type /XObject
    /Subtype /Image
    /Width NNN
    /Height MMM
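Recognizing an image stream from its dictionary can be sketched like this. The function name `image_size` is hypothetical, and the sketch assumes `/Width` and `/Height` are written as plain integers in the dictionary rather than as indirect references.

```python
import re


def image_size(dict_bytes):
    """Return (width, height) if the dictionary describes an image, else None."""
    if not re.search(rb"/Subtype\s*/Image\b", dict_bytes):
        return None
    # Key order does not matter, so each key is searched for independently.
    width = re.search(rb"/Width\s+(\d+)", dict_bytes)
    height = re.search(rb"/Height\s+(\d+)", dict_bytes)
    if not (width and height):
        return None
    return int(width.group(1)), int(height.group(1))
```

Streams for which this returns a size are images and should be left alone when you edit the text operators.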
pdf-generation ghostscript