---
tags:
  - destiny/fleeting
  - topic/automation
  - topic/software
  - type/idea
title: Automating PDF Annotation
---

# Automating PDF Annotation

See [[portable-tools]] for valid dependencies.

## Clean Documents

### Page Rotation

Each page has a rotation value (`/Rotate`), independent of how its content appears, which rotates the drawing reference grid. This must be resolved before further processing. Simple coordinate-returning tools and functions may report values that are not adjusted for rotation, which are essentially useless for our purposes. MuPDF and similar libraries provide functions to return the "visual" (read _correct_) coordinates, but it would be ideal to redraw content with the correct orientation.

## Extract Bluebeam Markups

Right now I'm exporting Bluebeam markups to CSV before processing. Converting the code to extract the markups directly with MuPDF.Net, as I've managed before with iText, could save a step.

### Bluebeam Revu Measure Hack

Bluebeam Revu does not give coordinates for count annotations, even where count = 1. Bluebeam's `.bax` is an annotation interchange format based on XML.

1. Export markups to `.bax`
   > [!menu]
   > Markups List > Markups > Export Markups
2. Edit markups
   Convert count annotations to polygons
   ```
   Bluebeam.PDF.Annotations.AnnotationPolygon
   ```
3. Delete all markups in the PDF
4. Import markups from the edited `.bax` file
   > [!menu]
   > Markups List > Markups > Import

## PDF Content Positional Tokenization

Recursively parse and consume PDF vector content.

> [!example]
> A span with text "GFCI" is consumed.
> (draw calls are removed from page content)
> A `gfci_label` token is created
> and encoded with the span's position.

> [!example]
> A `duplex_receptacle` token and a `gfci_label` token
> in close proximity are consumed
> creating a `duplex_gfci_receptacle` token
> which inherits the `duplex_receptacle`'s position.
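
The second tokenization example (two nearby tokens consumed into one) could be sketched roughly as below. This is only an illustration under assumptions: the `Token` class, the `MERGE_RULES` table, the `merge_nearby` function, and the `max_distance` threshold are all hypothetical names and values, not part of any existing tool, and the sketch merges the first pair it finds within range rather than the nearest one.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Token:
    """Hypothetical positional token: a kind plus a page position."""
    kind: str
    x: float
    y: float


# Hypothetical rule table: (base kind, modifier kind) -> merged kind.
MERGE_RULES = {
    ("duplex_receptacle", "gfci_label"): "duplex_gfci_receptacle",
}


def merge_nearby(tokens, max_distance=50.0):
    """Repeatedly consume base/modifier pairs that lie within max_distance.

    The merged token inherits the base token's position. max_distance is an
    assumed threshold in page units and would need tuning per drawing set.
    """
    remaining = list(tokens)
    merged = True
    while merged:  # keep going so merged tokens can take part in later rules
        merged = False
        for base in remaining:
            for mod in remaining:
                new_kind = MERGE_RULES.get((base.kind, mod.kind))
                if new_kind is None:
                    continue
                if hypot(base.x - mod.x, base.y - mod.y) <= max_distance:
                    # Consume both inputs and emit the combined token.
                    remaining.remove(base)
                    remaining.remove(mod)
                    remaining.append(Token(new_kind, base.x, base.y))
                    merged = True
                    break
            if merged:
                break  # the list changed, so restart the scan
    return remaining


tokens = [
    Token("duplex_receptacle", 100.0, 100.0),
    Token("gfci_label", 110.0, 105.0),
    Token("gfci_label", 900.0, 900.0),  # too far from any receptacle
]
result = merge_nearby(tokens)
```

Restarting the scan after every merge keeps the logic simple and lets a merged token feed into further rules, which is one way to read "recursively" here. On a large sheet a spatial index would probably be needed to avoid the quadratic scan.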
## PDF Internals

```sh
$ mutool show file.pdf pages/1/Contents
```

```pdf
629 0 obj
<<
  /Filter /FlateDecode
  /Length 31375
>>
stream
q
0.12 0 0 0.12 0 0 cm
/R8 gs
/R9 gs
2 w
1 J
1 j
0 G
q
4217 3947 m
4217 3879 l
4467 3879 l
4467 3947 l
4455 3947 l
4455 3891 l
4230 3891 l
4230 3947 l
...
```
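
To make the stream above easier to read, here is a minimal sketch that walks a decoded content stream and returns the `m` (move-to) and `l` (line-to) points in page space after applying the `cm` matrices. It is an assumption-heavy illustration: `concat` and `path_points` are hypothetical helper names, the tokenizer just splits on whitespace, and it ignores strings, arrays, inline images, and every operator other than `q`, `Q`, `cm`, `m`, and `l`.

```python
def concat(m, ctm):
    """Return m x ctm for PDF matrices given as (a, b, c, d, e, f)."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = ctm
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    )


def path_points(stream_text):
    """Collect (operator, x, y) for m/l operators, transformed by the CTM."""
    ctm = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)  # identity matrix
    stack = []      # graphics state stack; only the CTM is tracked here
    operands = []   # numeric operands seen since the last operator
    points = []
    for tok in stream_text.split():
        try:
            operands.append(float(tok))
            continue
        except ValueError:
            pass  # not a number: treat it as an operator or a name
        if tok == "q":
            stack.append(ctm)
        elif tok == "Q" and stack:
            ctm = stack.pop()
        elif tok == "cm" and len(operands) >= 6:
            ctm = concat(tuple(operands[-6:]), ctm)
        elif tok in ("m", "l") and len(operands) >= 2:
            x, y = operands[-2:]
            a, b, c, d, e, f = ctm
            points.append((tok, a * x + c * y + e, b * x + d * y + f))
        operands.clear()  # operands belong to exactly one operator
    return points


sample = "q 0.12 0 0 0.12 0 0 cm /R8 gs 2 w q 4217 3947 m 4217 3879 l"
pts = path_points(sample)
```

With the `0.12` scale from the stream applied, the large integer coordinates come out in default user space units, which is the form the positional tokens would need. Page rotation is not handled here and would still have to be resolved separately.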