Thursday, 15 March 2012

iTextSharp does not retrieve the TAB character


I'm reading a PDF file with iTextSharp, but the following code does not return the tab character or line breaks (Enter).

var rect = new System.util.RectangleJ(x, y, width, height);
var filters = new RenderFilter[1];
filters[0] = new RegionTextRenderFilter(rect);
ITextExtractionStrategy strategy = new FilteredTextRenderListener(new LocationTextExtractionStrategy(), filters);
var currentText = PdfTextExtractor.GetTextFromPage(pdfReader, pageNumber, strategy);

Can anyone help me?

Thank you.

Nobody can answer your question, because your assumption that the concept of a tab character exists in a PDF content stream is wrong.

There is no such thing as a tab character between two words. Tabs are created by defining distances between words. Text is added at absolute positions, and if two snippets of text need to be separated by a tab space, the coordinates are adapted in accordance with that requirement. There are no tab characters! There are only differences in the distances between text snippets.
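Since a "tab" only exists as a wider-than-usual gap, one way to get tab characters back is to look at the horizontal distance between consecutive snippets on a line. What follows is a minimal sketch, not iTextSharp code: TextChunk, GapJoiner, JoinLine, and the two thresholds are hypothetical names introduced for illustration. It assumes you have already collected the start and end x coordinates of each snippet, and that suitable threshold values (in PDF user space units) are chosen per document.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical model of one extracted text snippet on a single line:
// its text plus the x coordinates where it starts and ends.
public sealed class TextChunk
{
    public string Text { get; }
    public float StartX { get; }
    public float EndX { get; }

    public TextChunk(string text, float startX, float endX)
    {
        Text = text;
        StartX = startX;
        EndX = endX;
    }
}

public static class GapJoiner
{
    // Joins the chunks of one line from left to right.
    // A gap wider than tabThreshold becomes "\t", a gap wider than
    // spaceThreshold becomes " ", and anything narrower is treated
    // as a continuation of the same word. Both thresholds are
    // assumptions that have to be tuned for the document at hand.
    public static string JoinLine(IEnumerable<TextChunk> chunks, float spaceThreshold, float tabThreshold)
    {
        var sorted = new List<TextChunk>(chunks);
        sorted.Sort((a, b) => a.StartX.CompareTo(b.StartX));

        var result = new StringBuilder();
        TextChunk previous = null;
        foreach (var chunk in sorted)
        {
            if (previous != null)
            {
                // Distance between the end of the previous snippet
                // and the start of the current one.
                float gap = chunk.StartX - previous.EndX;
                if (gap > tabThreshold)
                {
                    result.Append('\t');
                }
                else if (gap > spaceThreshold)
                {
                    result.Append(' ');
                }
            }
            result.Append(chunk.Text);
            previous = chunk;
        }
        return result.ToString();
    }
}
```

For example, with a space threshold of 1.5 and a tab threshold of 20, the snippets "Name" (x from 50 to 80) and "Value" (x from 150 to 185) are 70 units apart, so JoinLine puts a tab between them.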

iTextSharp can give you detailed information about the position of the text snippets stored within the PDF. You can find some code in the accepted answer to this question: "PDF Reading highlighed text (highlight annotations) using C#"

We demonstrated the concept of text extraction at our iText Summit in Cologne on June 17, 2014. These slides may help you on your way: http://www.slideshare.net/itextpdf/itext-summit-2014-talk-unstructured-pdf

itextsharp
