Tags: topic/automation, topic/software, destiny/fleeting, type/idea

Automating PDF Annotation

See portable-tools for valid dependencies.

Clean Documents

Page Rotation

Each page has a rotation value (the /Rotate entry), stored independently of the page content, which rotates the drawing reference grid when the page is displayed. This must be resolved before further processing.

Simple coordinate-returning tools and functions may report the unrotated values, which are essentially useless for our purposes.

MuPDF and similar libraries provide functions that return the "visual" (read: correct) coordinates, but it would be ideal to redraw the content in the correct orientation.
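To make the adjustment concrete, here is a minimal sketch of the mapping from unrotated to visual coordinates. It does not call MuPDF. The function name `to_visual` is hypothetical, and it assumes native PDF conventions: origin at the bottom-left, y pointing up, a page box starting at (0, 0), and a clockwise rotation in multiples of 90 degrees. A library that reports top-left-origin coordinates would need a different mapping.

```python
def to_visual(x, y, width, height, rotation):
    """Map a point from unrotated page space to displayed ("visual") space.

    width and height are the unrotated page dimensions. Assumes a bottom-left
    origin, y pointing up, and a clockwise rotation in multiples of 90.
    """
    rotation %= 360
    if rotation == 0:
        return (x, y)
    if rotation == 90:
        # The displayed page is height wide and width tall.
        return (y, width - x)
    if rotation == 180:
        return (width - x, height - y)
    if rotation == 270:
        return (height - y, x)
    raise ValueError("page rotation must be a multiple of 90 degrees")
```

For a 612 x 792 page rotated by 90, the unrotated bottom-left corner (0, 0) lands at the displayed top-left corner, (0, 612).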

Extract Bluebeam Markups

Right now I'm exporting Bluebeam markups to CSV before processing. If I converted the code to extract the markups directly with MuPDF.Net, as I've managed before with iText, that would save a step.

Bluebeam Revu Measure Hack

Bluebeam Revu gives coordinates for count annotations, even where count = 1.

Bluebeam's .bax is an annotation interchange format based on XML.

  1. Export markups to .bax

    [!menu] Markups List > Markups > Export Markups

  2. Edit markups

    Convert count annotations to polygons

    <TypeInternal>Bluebeam.PDF.Annotations.AnnotationPolygon</TypeInternal>
    
  3. Delete all markups in the PDF

  4. Import markups from the edited .bax file

    [!menu] Markups List > Markups > Import
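Step 2 could be scripted rather than done by hand. The sketch below is a rough starting point under several assumptions: the exported file parses as plain XML without namespaces, each annotation's type lives in a `TypeInternal` element as shown above, and swapping the type string is all the edit requires (a real file may need more changes). The function `retype_annotations` is hypothetical, and the count annotation's type string is left as a parameter because it is not recorded in this note.

```python
import xml.etree.ElementTree as ET

# Target type string taken from the step above.
POLYGON_TYPE = "Bluebeam.PDF.Annotations.AnnotationPolygon"


def retype_annotations(bax_xml, count_type, target_type=POLYGON_TYPE):
    """Return (edited XML text, number of TypeInternal elements changed).

    count_type is the TypeInternal value to look for; read it from an
    exported file, since this sketch does not know the real string.
    """
    root = ET.fromstring(bax_xml)
    changed = 0
    for elem in root.iter("TypeInternal"):
        if (elem.text or "").strip() == count_type:
            elem.text = target_type
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed
```

Reading the file, calling the function, and writing the result back out would replace the manual edit; the export, delete, and import steps stay in Revu.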

PDF Content Positional Tokenization

Recursively parse and consume PDF vector content.

[!example] A span with the text "GFCI" is consumed (its draw calls are removed from the page content). A gfci_label token is created and encoded with the span's position.

[!example] A duplex_receptacle token and a gfci_label token in close proximity are consumed, creating a duplex_gfci_receptacle token that inherits the duplex_receptacle's position.
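The two examples above can be sketched as a small data model plus a proximity merge. Everything here is illustrative: `Token`, `label_token`, `merge_nearby`, and the `max_distance` threshold are hypothetical names, distance is plain Euclidean distance between token positions, and removing the draw calls from the page content is not shown.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Token:
    kind: str
    x: float
    y: float


def label_token(span_text, x, y):
    """Turn a consumed text span into a token, or None if unrecognized."""
    labels = {"GFCI": "gfci_label"}
    kind = labels.get(span_text.strip())
    return Token(kind, x, y) if kind else None


def merge_nearby(tokens, base_kind, label_kind, merged_kind, max_distance):
    """Consume each base token together with its nearest unused label.

    The merged token inherits the base token's position. Tokens that find
    no partner within max_distance are passed through unchanged.
    """
    labels = [t for t in tokens if t.kind == label_kind]
    result = [t for t in tokens if t.kind not in (base_kind, label_kind)]
    for base in (t for t in tokens if t.kind == base_kind):
        def distance(label):
            return hypot(label.x - base.x, label.y - base.y)

        near = [label for label in labels if distance(label) <= max_distance]
        if near:
            labels.remove(min(near, key=distance))
            result.append(Token(merged_kind, base.x, base.y))
        else:
            result.append(base)
    return result + labels
```

A usage example with made-up positions:

```python
tokens = [
    Token("duplex_receptacle", 100, 100),
    label_token("GFCI", 104, 103),
    Token("duplex_receptacle", 500, 500),
]
merged = merge_nearby(
    tokens, "duplex_receptacle", "gfci_label", "duplex_gfci_receptacle", 10
)
```

Running the same merge repeatedly with different kind pairs is one way to get the recursive behavior: each pass consumes lower-level tokens and emits higher-level ones.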

$ mutool show file.pdf pages/1/Contents
629 0 obj
<<
  /Filter /FlateDecode
  /Length 31375
>>
stream
q
0.12 0 0 0.12 0 0 cm
/R8 gs
/R9 gs
2 w
1 J
1 j
0 G
q
4217 3947 m
4217 3879 l
4467 3879 l
4467 3947 l
4455 3947 l
4455 3891 l
4230 3891 l
4230 3947 l
...
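As a first step toward consuming a stream like the one above, here is a minimal sketch that splits the decoded text into operations and recovers path points in page units. It is not a full content-stream parser: it only handles numeric and /Name operands, as seen in the excerpt, and ignores strings, arrays, dictionaries, and inline images. The function names are hypothetical. It tracks the matrix set by `cm` (with `q`/`Q` save and restore) and applies it to `m` and `l` operands.

```python
def parse_operations(stream_text):
    """Split content-stream text into (operator, operands) pairs."""
    operations, operands = [], []
    for word in stream_text.split():
        try:
            operands.append(float(word))
        except ValueError:
            if word.startswith("/"):
                operands.append(word)  # a name operand such as /R8
            else:
                operations.append((word, operands))
                operands = []
    return operations


def concat(m1, m2):
    """Multiply two PDF matrices (a, b, c, d, e, f), applying m1 first."""
    a1, b1, c1, d1, e1, f1 = m1
    a2, b2, c2, d2, e2, f2 = m2
    return (
        a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2,
    )


def path_points(operations):
    """Yield (operator, x, y) for each m and l, transformed by the matrix."""
    ctm = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)
    stack = []
    for op, args in operations:
        if op == "q":
            stack.append(ctm)
        elif op == "Q" and stack:
            ctm = stack.pop()
        elif op == "cm":
            ctm = concat(tuple(args), ctm)
        elif op in ("m", "l"):
            x, y = args
            a, b, c, d, e, f = ctm
            yield op, a * x + c * y + e, b * x + d * y + f
```

With the 0.12 scale from the excerpt, the first move-to at 4217 3947 comes out near (506.04, 473.64) in page units, before any page rotation is applied.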