Automating PDF Annotation
See portable-tools for valid dependencies.
Clean Documents
Page Rotation
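Each page has a rotation value (the /Rotate entry) that is independent of how the content is drawn and that rotates the reference grid against which positions are reported. This must be resolved before further processing, otherwise extracted positions and displayed positions will not line up.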
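To make the rotation issue concrete, here is a minimal sketch of mapping a point between unrotated page space and the rotated, displayed page. It assumes raw PDF-style coordinates (origin at the bottom-left, y pointing up), a clockwise rotation that is a multiple of 90 degrees, and the width and height of the unrotated page. The function names are hypothetical, and a library that reports top-left-origin coordinates would need the y axis flipped first.

```python
def to_displayed(x, y, rotate, width, height):
    """Map a point from unrotated page space to the rotated (displayed) page.

    Assumes origin bottom-left, y up, clockwise rotation in degrees, and
    width/height measured on the unrotated page.
    """
    rotate %= 360
    if rotate == 0:
        return (x, y)
    if rotate == 90:
        return (y, width - x)
    if rotate == 180:
        return (width - x, height - y)
    if rotate == 270:
        return (height - y, x)
    raise ValueError("rotation must be a multiple of 90 degrees")


def from_displayed(x, y, rotate, width, height):
    """Inverse of to_displayed: recover the unrotated coordinates."""
    rotate %= 360
    if rotate == 0:
        return (x, y)
    if rotate == 90:
        return (width - y, x)
    if rotate == 180:
        return (width - x, height - y)
    if rotate == 270:
        return (y, height - x)
    raise ValueError("rotation must be a multiple of 90 degrees")
```

Running every extracted position through one of these two functions up front means the later steps only ever see a single coordinate system.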
Extract Bluebeam Markups
Right now I'm exporting Bluebeam markups to CSV before processing. If I converted the code to extract the markups directly with MuPDF.Net, as I've managed before with iText, that could save a step.
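For reference, the current CSV step can be sketched roughly as below. The sample rows are made up, and the column names ("Subject", "Page Label", "Comments") are assumptions, since the columns in a Bluebeam export depend on how the export is configured. The helper name `markups_by_page` is hypothetical too.

```python
import csv
import io
from collections import defaultdict

# Made-up sample standing in for an exported markup list; the column
# names are assumptions and must match the actual export settings.
SAMPLE_CSV = """Subject,Page Label,Comments
Duplex Receptacle,E-101,
GFCI,E-101,
Duplex Receptacle,E-102,
"""


def markups_by_page(csv_file, page_column="Page Label"):
    """Group exported markup rows by the page they belong to."""
    by_page = defaultdict(list)
    for row in csv.DictReader(csv_file):
        by_page[row[page_column]].append(row)
    return dict(by_page)


pages = markups_by_page(io.StringIO(SAMPLE_CSV))
```

Reading the annotations straight from the PDF would replace this function and remove the manual export, but the rest of the pipeline could keep consuming the same per-page grouping.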
PDF Content Positional Tokenization
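Recursively parse and consume PDF vector content. Recognized content is removed from the page and replaced by a token that carries its position, and tokens can in turn be consumed to form higher-level tokens.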
[!example] A span with text "GFCI" is consumed (its draw calls are removed from the page content). A `gfci_label` token is created and encoded with the span's position.
[!example] A `duplex_receptacle` token and a `gfci_label` token in close proximity are consumed, creating a `duplex_gfci_receptacle` token which inherits the `duplex_receptacle`'s position.
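The two examples above can be sketched as follows. This is only an illustration under stated assumptions: the `Token` class, the `(text, x, y)` span tuples, the `LABEL_TEXT` mapping, the function names, and the distance threshold are all hypothetical, and "close proximity" is taken to mean Euclidean distance in page units.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Token:
    kind: str
    x: float
    y: float


# Hypothetical mapping from span text to the token kind it produces.
LABEL_TEXT = {"GFCI": "gfci_label"}


def consume_spans(spans):
    """Turn recognized (text, x, y) spans into tokens; return (tokens, leftover)."""
    tokens, leftover = [], []
    for text, x, y in spans:
        kind = LABEL_TEXT.get(text.strip())
        if kind:
            tokens.append(Token(kind, x, y))
        else:
            leftover.append((text, x, y))
    return tokens, leftover


def merge_nearby(tokens, base_kind, label_kind, merged_kind, max_dist):
    """Consume each base token together with its nearest label within max_dist.

    The merged token inherits the base token's position.
    """
    remaining = list(tokens)
    merged = []
    for base in [t for t in remaining if t.kind == base_kind]:
        labels = [
            t for t in remaining
            if t.kind == label_kind
            and hypot(t.x - base.x, t.y - base.y) <= max_dist
        ]
        if not labels:
            continue
        nearest = min(labels, key=lambda t: hypot(t.x - base.x, t.y - base.y))
        remaining.remove(base)
        remaining.remove(nearest)
        merged.append(Token(merged_kind, base.x, base.y))
    return remaining + merged
```

Because `merge_nearby` returns an ordinary token list, its output can be fed back in with a different rule, which is one way to get the recursive behavior described above.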
$ mutool show file.pdf pages/1/Contents
629 0 obj
<<
/Filter /FlateDecode
/Length 31375
>>
stream
q
0.12 0 0 0.12 0 0 cm
/R8 gs
/R9 gs
2 w
1 J
1 j
0 G
q
4217 3947 m
4217 3879 l
4467 3879 l
4467 3947 l
4455 3947 l
4455 3891 l
4230 3891 l
4230 3947 l
...
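As a rough illustration of how a dump like the one above could be read, the sketch below splits a decoded content stream into operators with their operands and applies the current transformation matrix (set by `cm`) to the `m` and `l` path points. It is deliberately simplified and assumes a stream containing only numbers, names, and bare operators, so strings, arrays, and inline images are not handled. The function names are hypothetical.

```python
def parse_ops(stream_text):
    """Split a content stream into (operator, operands) pairs.

    Simplification: tokens are whitespace-separated numbers, /names,
    or operators. Strings, arrays, and inline images are not supported.
    """
    ops, operands = [], []
    for tok in stream_text.split():
        try:
            operands.append(float(tok))
        except ValueError:
            if tok.startswith("/"):
                operands.append(tok)
            else:
                ops.append((tok, operands))
                operands = []
    return ops


def path_points(ops):
    """Return m/l path points transformed by the current matrix (CTM)."""
    ctm = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)
    stack, points = [], []
    for op, args in ops:
        if op == "q":                      # save graphics state
            stack.append(ctm)
        elif op == "Q" and stack:          # restore graphics state
            ctm = stack.pop()
        elif op == "cm":                   # new CTM = given matrix x old CTM
            a2, b2, c2, d2, e2, f2 = args
            a, b, c, d, e, f = ctm
            ctm = (
                a2 * a + b2 * c, a2 * b + b2 * d,
                c2 * a + d2 * c, c2 * b + d2 * d,
                e2 * a + f2 * c + e, e2 * b + f2 * d + f,
            )
        elif op in ("m", "l"):             # moveto / lineto
            x, y = args
            a, b, c, d, e, f = ctm
            points.append((a * x + c * y + e, b * x + d * y + f))
    return points
```

With the `0.12 0 0 0.12 0 0 cm` matrix from the dump, the raw path coordinates are scaled by 0.12 before they become page positions, which is the position a token created from this path would need to carry.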