---
id:
aliases: []
tags:
  - topic/automation
  - topic/software
  - status/fleeting
---

# Automating PDF Annotation

See [[portable-tools]] for valid dependencies.

## Extract Bluebeam Markups

Right now I export Bluebeam markups to CSV before processing them. If I converted the code to extract the markups directly with MuPDF.NET, as I've done before with iText, that would save a step.

## PDF Content Positional Tokenization

Recursively parse and consume PDF vector content, turning recognized drawing and text patterns into tokens that carry their page position.

> [!example]
> A span with text "GFCI" is consumed
> (its draw calls are removed from the page content).
> A `gfci_label` token is created
> and encoded with the span's position.

> [!example]
> A `duplex_receptacle` token and a `gfci_label` token
> in close proximity are consumed,
> creating a `duplex_gfci_receptacle` token
> that inherits the `duplex_receptacle`'s position.

```
$ mutool show file.pdf pages/1/Contents
629 0 obj
<< /Filter /FlateDecode /Length 31375 >>
stream
q
0.12 0 0 0.12 0 0 cm
/R8 gs
/R9 gs
2 w
1 J
1 j
0 G
q
4217 3947 m
4217 3879 l
4467 3879 l
4467 3947 l
4455 3947 l
4455 3891 l
4230 3891 l
4230 3947 l
...
```
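
A minimal Python sketch of the two tokenization steps described above, under several assumptions. It starts from the decoded content-stream text that `mutool show` prints, and `parse_ops`, `Token`, and `merge_adjacent` are hypothetical names that are not part of MuPDF or MuPDF.NET. `parse_ops` naively splits on whitespace, so it ignores string operands, arrays, dictionaries, and inline images. `merge_adjacent` treats "close proximity" as "nearest unused label within `max_dist`", matched greedily in base-token order. That is only one possible definition.

```python
import math
from dataclasses import dataclass


def _number(word):
    """Return word as a float if it is a numeric operand, else None."""
    try:
        return float(word)
    except ValueError:
        return None


def parse_ops(content):
    """Split decoded content-stream text into (operator, operands) pairs.

    Simplification: whitespace splitting only, so strings, arrays,
    dictionaries and inline images are NOT handled.
    """
    ops, operands = [], []
    for word in content.split():
        num = _number(word)
        if num is not None:
            operands.append(num)          # numeric operand, e.g. 4217
        elif word.startswith("/"):
            operands.append(word)         # name operand, e.g. /R8
        else:
            ops.append((word, operands))  # anything else is treated as an operator
            operands = []
    return ops


@dataclass(frozen=True)
class Token:
    kind: str  # hypothetical token type, e.g. "gfci_label"
    x: float   # position in page coordinates
    y: float


def merge_adjacent(tokens, base_kind, label_kind, merged_kind, max_dist):
    """Consume (base, label) pairs no farther apart than max_dist.

    Each base token greedily takes the nearest unused label; the merged
    token inherits the base token's position. Unpaired tokens pass through.
    """
    bases = [t for t in tokens if t.kind == base_kind]
    labels = [t for t in tokens if t.kind == label_kind]
    result = [t for t in tokens if t.kind not in (base_kind, label_kind)]
    used = set()
    for base in bases:
        nearest, nearest_dist = None, max_dist
        for i, label in enumerate(labels):
            if i in used:
                continue
            dist = math.hypot(label.x - base.x, label.y - base.y)
            if dist <= nearest_dist:
                nearest, nearest_dist = i, dist
        if nearest is None:
            result.append(base)
        else:
            used.add(nearest)
            result.append(Token(merged_kind, base.x, base.y))
    # Labels that found no base stay in the token list unchanged.
    result.extend(label for i, label in enumerate(labels) if i not in used)
    return result
```

Applied to the start of the stream above, `parse_ops` yields records such as `("cm", [0.12, 0.0, 0.0, 0.12, 0.0, 0.0])`, `("gs", ["/R8"])`, and `("m", [4217.0, 3947.0])`. The path coordinates are expressed in the space set up by that `cm` (scale 0.12). A real implementation would therefore need to track the `q`/`Q` graphics-state stack and transform points through the current matrix before comparing them with text-span positions.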