I have code that extracts text from a PDF using a filetotext class. It worked until last week, when something changed in the PDFs being generated. The weird thing is that the characters appear to be there and correct once I add 29 to the ord of each character.
Example response debug printout: /F1 7.31 Tf 0 0 0 rg 1 0 0 1 195.16 597.4 Tm ($PRXQW)Tj ET BT
The code uses gzuncompress on the stream section of the PDF. The string $PRXQW is "Amount": adding 29 (decimal) to the ord of each character gives me that. But sometimes a character does not follow this exact translation. For example, what should be a ) in the text appears as the two bytes 5C 66.
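One possible reading of the 5C 66 case: those two bytes are the ASCII characters `\f`, which is the PDF literal-string escape for a form feed (byte 0x0C), and 0x0C + 29 = 0x29, which is `)`. If that reading is right, the exception is the same shift applied after unescaping the string. The sketch below is in Python rather than the PHP the original code uses, the function names `unescape_pdf_literal` and `decode_shifted` are hypothetical, and the offset of 29 is only what was observed in this one file. A font's ToUnicode CMap, where the PDF provides one, would be the more reliable source for the mapping.

```python
# Escape sequences defined for PDF literal strings, mapped to byte values.
_SIMPLE_ESCAPES = {
    ord("n"): 0x0A, ord("r"): 0x0D, ord("t"): 0x09,
    ord("b"): 0x08, ord("f"): 0x0C,
    ord("("): 0x28, ord(")"): 0x29, ord("\\"): 0x5C,
}


def unescape_pdf_literal(raw: bytes) -> bytes:
    """Resolve backslash escapes in the bytes found between ( and )."""
    out = bytearray()
    i = 0
    while i < len(raw):
        b = raw[i]
        if b != 0x5C:            # ordinary byte, copy through
            out.append(b)
            i += 1
            continue
        i += 1                   # skip the backslash
        if i >= len(raw):
            break
        c = raw[i]
        if c in _SIMPLE_ESCAPES:
            out.append(_SIMPLE_ESCAPES[c])
            i += 1
        elif 0x30 <= c <= 0x37:  # octal escape, one to three digits
            value = 0
            j = i
            while j < len(raw) and j < i + 3 and 0x30 <= raw[j] <= 0x37:
                value = value * 8 + (raw[j] - 0x30)
                j += 1
            out.append(value & 0xFF)
            i = j
        elif c in (0x0A, 0x0D):  # backslash + end of line: line continuation
            i += 1
            if c == 0x0D and i < len(raw) and raw[i] == 0x0A:
                i += 1
        else:                    # unknown escape: drop the backslash only
            out.append(c)
            i += 1
    return bytes(out)


def decode_shifted(raw: bytes, offset: int = 29) -> str:
    """Unescape the string, then add the observed offset to each byte.

    The offset of 29 is an assumption taken from this one PDF.
    """
    return "".join(chr(b + offset) for b in unescape_pdf_literal(raw))


print(decode_shifted(b"$PRXQW"))             # Amount
print(decode_shifted(bytes.fromhex("5c66")))  # )
```

Unescaping has to happen before the shift, because the escape sequences describe the raw character codes stored in the string, not the shifted text.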
I'm just wondering about this code-ring type of character encoding coming out of PDFs now. Has anyone seen this kind of thing?
via Chebli Mohamed