Tags: topic/automation, topic/software, status/fleeting

Automating PDF Annotation

See portable-tools for valid dependencies.

Extract Bluebeam Markups

Right now I export Bluebeam markups to CSV before processing. If I converted the code to extract the markups directly with MuPDF.NET, as I've managed before with iText, that could save a step.

PDF Content Positional Tokenization

Recursively parse and consume PDF vector content, replacing recognized drawing content with positional tokens.

[!example] A span with the text "GFCI" is consumed (its draw calls are removed from the page content). A gfci_label token is created and encoded with the span's position.

[!example] A duplex_receptacle token and a gfci_label token in close proximity are consumed, creating a duplex_gfci_receptacle token that inherits the duplex_receptacle's position.
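
The second example above could be sketched roughly as follows. This is only an illustration of the idea, not settled code: the `Token` class, the `FUSION_RULES` table, the `fuse_tokens` function, and the `max_distance` threshold are all hypothetical names and values, and distance is measured in whatever units the token positions use.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Token:
    kind: str
    x: float
    y: float


# Hypothetical rule table: (base kind, modifier kind) -> fused kind.
FUSION_RULES = {
    ("duplex_receptacle", "gfci_label"): "duplex_gfci_receptacle",
}


def distance(a, b):
    return hypot(a.x - b.x, a.y - b.y)


def fuse_once(tokens, max_distance):
    """Apply the first rule that matches; return True if a fusion happened."""
    for (base_kind, mod_kind), fused_kind in FUSION_RULES.items():
        for base in [t for t in tokens if t.kind == base_kind]:
            nearby = [
                t for t in tokens
                if t.kind == mod_kind and distance(base, t) <= max_distance
            ]
            if not nearby:
                continue
            modifier = min(nearby, key=lambda t: distance(base, t))
            # Both inputs are consumed; the fused token keeps the base position.
            tokens.remove(base)
            tokens.remove(modifier)
            tokens.append(Token(fused_kind, base.x, base.y))
            return True
    return False


def fuse_tokens(tokens, max_distance=50.0):
    """Repeat fusion until no rule applies, so fused tokens can feed later rules."""
    tokens = list(tokens)
    while fuse_once(tokens, max_distance):
        pass
    return tokens


if __name__ == "__main__":
    page_tokens = [
        Token("duplex_receptacle", 100, 100),
        Token("gfci_label", 110, 105),
        Token("gfci_label", 900, 900),  # too far away to fuse
    ]
    for token in fuse_tokens(page_tokens):
        print(token)
```

Looping until nothing changes is what makes the pass recursive in spirit: a fused token goes back into the pool and can be picked up by another rule on a later pass.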

$ mutool show file.pdf pages/1/Contents

629 0 obj
<<
  /Filter /FlateDecode
  /Length 31375
>>
stream
q
0.12 0 0 0.12 0 0 cm
/R8 gs
/R9 gs
2 w
1 J
1 j
0 G
q
4217 3947 m
4217 3879 l
4467 3879 l
4467 3947 l
4455 3947 l
4455 3891 l
4230 3891 l
4230 3947 l
...
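
To get positions out of a stream like the one above, the path operators need to be read together with the current transformation matrix. Below is a minimal sketch, assuming the stream has already been decompressed to text (as `mutool show` prints it) and handling only the `q`, `Q`, `cm`, `m`, and `l` operators. Strings, arrays, inline images, curves, and rectangles are ignored, and the function names are hypothetical.

```python
IDENTITY = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)


def mat_mul(m, n):
    """Return m x n for PDF matrices stored as (a, b, c, d, e, f)."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = n
    return (
        a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2,
    )


def transform(m, x, y):
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)


def parse_path_points(stream_text):
    """Collect moveto/lineto points, mapped through the CTM in effect."""
    ctm = IDENTITY
    saved = []      # graphics state stack for q / Q
    operands = []   # operands seen since the last operator
    points = []
    for tok in stream_text.split():
        try:
            operands.append(float(tok))
            continue
        except ValueError:
            pass
        if tok.startswith("/"):  # name operand such as /R8
            operands.append(tok)
            continue
        # Anything else is treated as an operator.
        if tok == "q":
            saved.append(ctm)
        elif tok == "Q" and saved:
            ctm = saved.pop()
        elif tok == "cm" and len(operands) >= 6:
            # The new matrix is applied before the existing CTM.
            ctm = mat_mul(tuple(operands[-6:]), ctm)
        elif tok in ("m", "l") and len(operands) >= 2:
            points.append(transform(ctm, operands[-2], operands[-1]))
        operands.clear()
    return points


def bounding_box(points):
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


SAMPLE = """q
0.12 0 0 0.12 0 0 cm
/R8 gs
2 w
q
4217 3947 m
4217 3879 l
4467 3879 l
4467 3947 l
"""

if __name__ == "__main__":
    print(bounding_box(parse_path_points(SAMPLE)))
```

With the 0.12 scale from the `cm` line applied, the first four path points from the dump span roughly (506.04, 465.48) to (536.04, 473.64) in page units. A bounding box like this is one candidate for the position that a token gets encoded with.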