Automating PDF Annotation
See portable-tools for valid dependencies.
Clean Documents
Page Rotation
Pages have a rotation value (the page's /Rotate entry) that is applied at display time, independently of the content, so the content stream draws on an unrotated reference grid. This must be resolved before further processing.
Simple coordinate-returning tools and functions may report the unadjusted (unrotated) values, which are essentially useless for our purposes.
MuPDF and similar libraries provide functions that return the "visual" (read: correct) coordinates, but ideally the content would be redrawn in the correct orientation.
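MuPDF can already report the visual coordinates, but the arithmetic behind them is small enough to sketch. The Python below is a minimal illustration, not MuPDF's implementation. It assumes y-down coordinates with the origin at the top-left corner of the unrotated page (the convention MuPDF reports), a page box starting at (0, 0), and `rotate` as the clockwise page rotation in degrees. `to_visual` and `rect_to_visual` are hypothetical helper names.

```python
def to_visual(x: float, y: float, width: float, height: float, rotate: int) -> tuple[float, float]:
    """Map a point on the unrotated page to the displayed (rotated) page.

    Assumes y-down coordinates, origin at the top-left of the unrotated
    page, and `rotate` given as clockwise degrees (a multiple of 90).
    """
    rotate %= 360  # normalizes e.g. -90 to 270
    if rotate == 0:
        return (x, y)
    if rotate == 90:
        # Top-left corner moves to the top-right; displayed width is `height`.
        return (height - y, x)
    if rotate == 180:
        return (width - x, height - y)
    if rotate == 270:
        # Top-left corner moves to the bottom-left; displayed height is `width`.
        return (y, width - x)
    raise ValueError("page rotation must be a multiple of 90 degrees")


def rect_to_visual(x0, y0, x1, y1, width, height, rotate):
    """Rotate an axis-aligned rectangle by mapping two opposite corners."""
    corners = (to_visual(x0, y0, width, height, rotate),
               to_visual(x1, y1, width, height, rotate))
    xs, ys = zip(*corners)
    return (min(xs), min(ys), max(xs), max(ys))
```

Because page rotations are multiples of 90 degrees, an axis-aligned rectangle stays axis-aligned, so mapping two opposite corners is enough.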
Extract Bluebeam Markups
Right now I export Bluebeam markups to CSV before processing. If I converted the code to extract the markups directly with MuPDF.Net, as I've managed before with iText, that could save a step.
Bluebeam Revu Measure Hack
Bluebeam Revu gives coordinates for count annotations, even where count = 1.
Bluebeam's .bax is an XML-based annotation interchange format.
- Export markups to .bax
  [!menu] Markups List > Markups > Export Markups
- Edit markups
  - Convert count annotations to polygons:
    `<TypeInternal>Bluebeam.PDF.Annotations.AnnotationPolygon</TypeInternal>`
- Delete all markups in the PDF
- Import markups from the edited .bax file
  [!menu] Markups List > Markups > Import
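The "Edit markups" step could be scripted rather than done by hand. This is a minimal sketch that assumes only what is stated above: the .bax file is XML, and each markup's type is the text of a `TypeInternal` element. The rest of the file's layout is not assumed, so the code searches every element. The `TypeInternal` value Revu writes for count markups is not recorded here, so the caller passes it in. The function names are hypothetical, and whether Revu accepts a markup after only this element is changed is untested.

```python
import xml.etree.ElementTree as ET

# Target type, taken from the edit step above.
POLYGON_TYPE = "Bluebeam.PDF.Annotations.AnnotationPolygon"


def local_name(tag: str) -> str:
    """Strip any '{namespace}' prefix ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def retype_markups(root: ET.Element, from_types: set[str], to_type: str = POLYGON_TYPE) -> int:
    """Rewrite every <TypeInternal> whose text is in from_types; return how many changed."""
    changed = 0
    for el in root.iter():  # walks the whole tree, root included
        if local_name(el.tag) == "TypeInternal" and (el.text or "").strip() in from_types:
            el.text = to_type
            changed += 1
    return changed


def convert_bax(src_path: str, dst_path: str, from_types: set[str]) -> int:
    """Read a .bax export, retype matching markups, and write a new file."""
    tree = ET.parse(src_path)
    changed = retype_markups(tree.getroot(), from_types)
    tree.write(dst_path, encoding="utf-8", xml_declaration=True)
    return changed
```

Usage would be along the lines of `convert_bax("markups.bax", "markups-edited.bax", {count_type})`, where `count_type` is the string copied from your own export. If the file declares a default XML namespace, ElementTree writes it back with generated prefixes such as `ns0:` unless `ET.register_namespace("", uri)` is called first, so compare the output with the original before importing.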
PDF Content Positional Tokenization
Recursively parse and consume PDF vector content.
[!example] A span with text "GFCI" is consumed (its draw calls are removed from the page content). A `gfci_label` token is created and encoded with the span's position.
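The bookkeeping in this example can be sketched in Python. This illustration rests on several assumptions: spans are reduced to their text and a single anchor point (a real span from MuPDF text extraction carries a bounding box), removing the span's draw calls from the page content is not shown, and `Span`, `Token`, `LABELS`, and `consume_label_spans` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Span:
    text: str
    x: float
    y: float


@dataclass
class Token:
    kind: str
    x: float
    y: float


# Hypothetical lookup table: span text -> token kind.
LABELS = {"GFCI": "gfci_label"}


def consume_label_spans(spans: list[Span]) -> tuple[list[Span], list[Token]]:
    """Split spans into (remaining spans, label tokens).

    A span whose text is a known label is consumed: it is left out of the
    returned span list and replaced by a token at the same position.
    """
    remaining: list[Span] = []
    tokens: list[Token] = []
    for span in spans:
        kind = LABELS.get(span.text.strip())
        if kind is None:
            remaining.append(span)
        else:
            tokens.append(Token(kind, span.x, span.y))
    return remaining, tokens
```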
[!example] A `duplex_receptacle` token and a `gfci_label` token in close proximity are consumed, creating a `duplex_gfci_receptacle` token which inherits the `duplex_receptacle`'s position.
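This second example is a proximity rule over tokens. Here is a minimal sketch, assuming tokens are points, that "close proximity" means a Euclidean distance at or below a hypothetical `MAX_DISTANCE`, and that each receptacle takes the nearest unused label. The pairing is greedy, so a label near two receptacles goes to whichever comes first in the list. `apply_until_stable` is one hypothetical reading of "recursively": it reruns the rules until a pass changes nothing, so merged tokens can feed later rules.

```python
import math
from dataclasses import dataclass


@dataclass
class Token:
    kind: str
    x: float
    y: float


# Hypothetical "close proximity" threshold, in page units.
MAX_DISTANCE = 20.0


def merge_gfci_receptacles(tokens: list[Token], max_distance: float = MAX_DISTANCE) -> list[Token]:
    """Merge each duplex_receptacle with its nearest unused gfci_label in range.

    Both inputs are consumed; the new duplex_gfci_receptacle token keeps the
    receptacle's position. Unpaired tokens are kept, with leftover labels
    moved to the end of the list.
    """
    labels = [t for t in tokens if t.kind == "gfci_label"]
    used: set[int] = set()  # indices into `labels` that have been consumed
    result: list[Token] = []
    for tok in tokens:
        if tok.kind == "gfci_label":
            continue  # leftover labels are appended after the loop
        if tok.kind == "duplex_receptacle":
            best = None  # (distance, label index) of the nearest candidate
            for i, label in enumerate(labels):
                if i in used:
                    continue
                d = math.dist((tok.x, tok.y), (label.x, label.y))
                if d <= max_distance and (best is None or d < best[0]):
                    best = (d, i)
            if best is not None:
                used.add(best[1])
                result.append(Token("duplex_gfci_receptacle", tok.x, tok.y))
                continue
        result.append(tok)
    result.extend(label for i, label in enumerate(labels) if i not in used)
    return result


def apply_until_stable(tokens, rules):
    """Apply every rule in turn, repeating until a full pass changes nothing."""
    while True:
        updated = tokens
        for rule in rules:
            updated = rule(updated)
        if updated == tokens:
            return updated
        tokens = updated
```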
```console
$ mutool show file.pdf pages/1/Contents
629 0 obj
<<
/Filter /FlateDecode
/Length 31375
>>
stream
q
0.12 0 0 0.12 0 0 cm
/R8 gs
/R9 gs
2 w
1 J
1 j
0 G
q
4217 3947 m
4217 3879 l
4467 3879 l
4467 3947 l
4455 3947 l
4455 3891 l
4230 3891 l
4230 3947 l
...
```
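Parsing starts from operators like those in the stream above. The Python below is a deliberately small sketch, not a full content-stream parser. It tokenizes numbers, names, and operator keywords only (no strings, arrays, dictionaries, comments, or inline images), tracks the current transformation matrix through `q`, `Q`, and `cm`, and collects `m`/`l` points in transformed coordinates. Curves (`c`, `v`, `y`) and `re` are ignored. The function names are hypothetical. With the `0.12 0 0 0.12 0 0 cm` above, the first point `4217 3947` lands at about (506.04, 473.64).

```python
import re

# Simplified tokenizer: /Names, numbers, and operator keywords only.
TOKEN_RE = re.compile(r"/[^\s/\[\]()<>{}%]+|[-+]?(?:\d+\.?\d*|\.\d+)|[A-Za-z'\"*][A-Za-z0-9*]*")

IDENTITY = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]


def concat(m, n):
    """Return m x n for PDF matrices [a b c d e f]; m is applied first."""
    a, b, c, d, e, f = m
    A, B, C, D, E, F = n
    return [a * A + b * C, a * B + b * D,
            c * A + d * C, c * B + d * D,
            e * A + f * C + E, e * B + f * D + F]


def transform(m, x, y):
    """Map a point through matrix m."""
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)


def extract_subpaths(content: str) -> list[list[tuple[float, float]]]:
    """Collect m/l subpaths from a decoded content stream, in CTM-transformed units."""
    ctm = IDENTITY
    saved = []      # graphics-state stack (only the CTM is tracked)
    operands = []   # operands seen since the last operator
    subpaths = []
    for tok in TOKEN_RE.findall(content):
        if tok[0] == "/":
            operands.append(tok)            # name operand, e.g. /R8
        elif tok[0].isdigit() or tok[0] in "+-.":
            operands.append(float(tok))     # numeric operand
        else:
            if tok == "q":
                saved.append(ctm)
            elif tok == "Q":
                ctm = saved.pop()
            elif tok == "cm":
                ctm = concat(operands[-6:], ctm)
            elif tok == "m":
                subpaths.append([transform(ctm, *operands[-2:])])
            elif tok == "l" and subpaths:
                subpaths[-1].append(transform(ctm, *operands[-2:]))
            operands = []                   # every operator consumes its operands
    return subpaths
```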