tags: topic/automation, topic/software, status/fleeting

Automating PDF Annotation

See portable-tools for valid dependencies.

Clean Documents

Page Rotation

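Each page carries a rotation value (the /Rotate entry) that is independent of how the content appears and that rotates the reference grid. Resolve it before any further processing, otherwise every extracted position lands in the wrong coordinate frame.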
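
A minimal sketch of resolving the rotation for a single point, in plain Python with no PDF library. It assumes PDF user space (origin at the bottom-left, y pointing up), a clockwise rotation in multiples of 90 degrees, and `width`/`height` taken from the unrotated page box. The function names are hypothetical, not part of any tool mentioned here.

```python
def to_displayed(x, y, width, height, rotate):
    """Map a point from unrotated page space to the space seen after rotation."""
    rotate %= 360
    if rotate == 0:
        return (x, y)
    if rotate == 90:
        return (y, width - x)
    if rotate == 180:
        return (width - x, height - y)
    if rotate == 270:
        return (height - y, x)
    raise ValueError("rotation must be a multiple of 90")


def to_unrotated(u, v, width, height, rotate):
    """Inverse of to_displayed: map a displayed point back to unrotated space."""
    rotate %= 360
    if rotate == 0:
        return (u, v)
    if rotate == 90:
        return (width - v, u)
    if rotate == 180:
        return (width - u, height - v)
    if rotate == 270:
        return (v, height - u)
    raise ValueError("rotation must be a multiple of 90")
```

Normalizing every position through something like `to_unrotated` first means later steps can compare positions without caring how the page was stored.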

Extract Bluebeam Markups

Right now I export Bluebeam markups to CSV before processing. If I converted the code to extract the markups directly with MuPDF.NET, as I've managed before with iText, that could save a step.
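
For reference, the current CSV route could look roughly like the sketch below. The column names (`Subject`, `Page Label`, `X`, `Y`) and the sample rows are assumptions made up for illustration; the real export columns depend on how the Bluebeam markup list is configured.

```python
import csv
import io

# Made-up sample standing in for an exported markup list (assumed columns).
SAMPLE_EXPORT = """Subject,Page Label,X,Y
Duplex Receptacle,E-101,506.04,473.64
GFCI,E-101,510.00,480.00
"""


def load_markups(text):
    """Parse exported CSV text into a list of dicts with numeric positions."""
    markups = []
    for row in csv.DictReader(io.StringIO(text)):
        markups.append({
            "subject": row["Subject"].strip(),
            "page": row["Page Label"].strip(),
            "x": float(row["X"]),
            "y": float(row["Y"]),
        })
    return markups


markups = load_markups(SAMPLE_EXPORT)
```

Direct extraction would replace `load_markups` with a reader over the page's annotations while keeping the same output shape, so downstream code would not need to change.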

PDF Content Positional Tokenization

Recursively parse and consume PDF vector content, replacing each recognized piece with a token that records its position.

[!example] A span with the text "GFCI" is consumed (its draw calls are removed from the page content). A gfci_label token is created and encoded with the span's position.

[!example] A duplex_receptacle token and a gfci_label token in close proximity are consumed, creating a duplex_gfci_receptacle token that inherits the duplex_receptacle's position.
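
The two examples above can be sketched in plain Python as follows. The `Token` class, the `(text, x, y)` span shape, the function names, and the `max_distance` threshold are all hypothetical; "close proximity" is assumed here to mean Euclidean distance in page units.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Token:
    kind: str
    x: float
    y: float


def label_spans(spans):
    """Consume spans whose text is "GFCI"; return (tokens, remaining spans)."""
    tokens, remaining = [], []
    for text, x, y in spans:
        if text.strip() == "GFCI":
            tokens.append(Token("gfci_label", x, y))
        else:
            remaining.append((text, x, y))
    return tokens, remaining


def merge_gfci(tokens, max_distance=20.0):
    """Merge each duplex_receptacle with its nearest unused gfci_label in range."""
    labels = [t for t in tokens if t.kind == "gfci_label"]
    consumed = set()  # indices into labels that have been merged away
    result = []
    for tok in tokens:
        if tok.kind == "gfci_label":
            continue  # unmerged labels are re-added at the end
        if tok.kind == "duplex_receptacle":
            best = None
            for i, lab in enumerate(labels):
                if i in consumed:
                    continue
                d = hypot(lab.x - tok.x, lab.y - tok.y)
                if d <= max_distance and (best is None or d < best[0]):
                    best = (d, i)
            if best is not None:
                consumed.add(best[1])
                # The merged token inherits the receptacle's position.
                result.append(Token("duplex_gfci_receptacle", tok.x, tok.y))
                continue
        result.append(tok)
    result.extend(lab for i, lab in enumerate(labels) if i not in consumed)
    return result
```

Running the merge repeatedly over its own output is one way to get the recursive behavior: each pass consumes lower-level tokens and emits higher-level ones until nothing changes.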

$ mutool show file.pdf pages/1/Contents

629 0 obj
<<
  /Filter /FlateDecode
  /Length 31375
>>
stream
q
0.12 0 0 0.12 0 0 cm
/R8 gs
/R9 gs
2 w
1 J
1 j
0 G
q
4217 3947 m
4217 3879 l
4467 3879 l
4467 3947 l
4455 3947 l
4455 3891 l
4230 3891 l
4230 3947 l
...
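
As a rough illustration of reading a dump like the one above, the sketch below walks a whitespace-separated content stream, tracks the transformation matrix through `q`, `Q`, and `cm`, and returns the transformed points of `m` and `l` operators. It is a simplification under stated assumptions: it only recognizes those five operators, discards everything else (including names such as `/R8`), and would break on string operands that contain spaces. The function names are hypothetical.

```python
IDENTITY = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)


def concat(m, n):
    """Return the matrix that applies m first, then n (PDF row-vector convention)."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = n
    return (
        a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2,
    )


def path_points(stream_text):
    """Collect (operator, x, y) for m/l operators, in transformed coordinates."""
    ctm, stack, operands, points = IDENTITY, [], [], []
    for token in stream_text.split():
        try:
            operands.append(float(token))
            continue  # numeric operand: keep collecting
        except ValueError:
            pass
        if token == "q":
            stack.append(ctm)  # save graphics state
        elif token == "Q" and stack:
            ctm = stack.pop()  # restore graphics state
        elif token == "cm" and len(operands) == 6:
            ctm = concat(tuple(operands), ctm)
        elif token in ("m", "l") and len(operands) == 2:
            x, y = operands
            a, b, c, d, e, f = ctm
            points.append((token, a * x + c * y + e, b * x + d * y + f))
        operands.clear()  # any non-numeric token ends the current operand run
    return points


sample = """q
0.12 0 0 0.12 0 0 cm
/R8 gs
2 w
q
4217 3947 m
4217 3879 l
4467 3879 l
"""
points = path_points(sample)
```

With the `0.12` scale from the dump applied, the first `m` lands near (506.04, 473.64) in page units, which is the kind of position a token would be encoded with.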