Automating PDF Annotation
See portable-tools for valid dependencies.
Extract Bluebeam Markups
Right now I'm exporting Bluebeam markups to CSV before processing. If I converted the code to extract the markups directly with MuPDF.NET, as I've managed before with iText, that could save a step.
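For reference, the current CSV step could look roughly like the sketch below. It uses only Python's standard `csv` module. The column names (`Subject`, `Page Label`, `Comments`) and the sample rows are assumptions for illustration; the real headers depend on how the Bluebeam export is configured.

```python
import csv
import io

# Hypothetical sample of an exported markup list. The column names are
# assumptions; the real ones depend on the export settings.
SAMPLE_CSV = """Subject,Page Label,Comments
Rectangle,E-101,GFCI
Text Box,E-101,Relocate panel
Rectangle,E-102,GFCI
"""


def load_markups(csv_file):
    """Return one dict per markup row, with surrounding whitespace trimmed."""
    reader = csv.DictReader(csv_file)
    return [{key: (value or "").strip() for key, value in row.items()}
            for row in reader]


def group_by_page(markups, page_column="Page Label"):
    """Bucket markup rows by the value in the (assumed) page column."""
    pages = {}
    for markup in markups:
        pages.setdefault(markup[page_column], []).append(markup)
    return pages


markups = load_markups(io.StringIO(SAMPLE_CSV))
by_page = group_by_page(markups)
```

Reading the markups straight from the PDF would replace `load_markups` only; the grouping step would stay the same as long as it receives the same dict shape.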
PDF Content Positional Tokenization
Recursively parse and consume PDF vector content.
[!example] A span with text "GFCI" is consumed (its draw calls are removed from the page content). A `gfci_label` token is created and encoded with the span's position.
[!example] A `duplex_receptacle` token and a `gfci_label` token in close proximity are consumed, creating a `duplex_gfci_receptacle` token which inherits the `duplex_receptacle`'s position.
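The two examples above can be sketched as a small merge loop. This is a minimal illustration, not the actual implementation: the `Token` class, the `MERGE_RULES` table, and the `max_distance` threshold are all hypothetical names and values. Merging repeats until no rule applies, which stands in for the recursive consumption described here.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Token:
    kind: str
    x: float
    y: float


# Hypothetical rule table: (anchor kind, modifier kind) -> merged kind.
MERGE_RULES = {
    ("duplex_receptacle", "gfci_label"): "duplex_gfci_receptacle",
}


def merge_pass(tokens, max_distance):
    """Apply the first matching rule once; return (tokens, changed)."""
    for i, anchor in enumerate(tokens):
        for j, modifier in enumerate(tokens):
            if i == j:
                continue
            merged_kind = MERGE_RULES.get((anchor.kind, modifier.kind))
            if merged_kind is None:
                continue
            distance = hypot(anchor.x - modifier.x, anchor.y - modifier.y)
            if distance <= max_distance:
                # Both inputs are consumed; the new token inherits the
                # anchor's position.
                rest = [t for k, t in enumerate(tokens) if k not in (i, j)]
                return rest + [Token(merged_kind, anchor.x, anchor.y)], True
    return tokens, False


def tokenize(tokens, max_distance=50.0):
    """Repeat merge passes until no rule applies."""
    changed = True
    while changed:
        tokens, changed = merge_pass(tokens, max_distance)
    return tokens


tokens = [
    Token("duplex_receptacle", 100.0, 100.0),
    Token("gfci_label", 110.0, 105.0),   # close enough to merge
    Token("gfci_label", 900.0, 900.0),   # too far away, left as-is
]
result = tokenize(tokens)
```

Each merge removes two tokens and adds one, so the loop always terminates. A merged kind can itself appear as a key in `MERGE_RULES`, which lets higher-level tokens build on lower-level ones.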
```
$ mutool show file.pdf pages/1/Contents
629 0 obj
<<
/Filter /FlateDecode
/Length 31375
>>
stream
q
0.12 0 0 0.12 0 0 cm
/R8 gs
/R9 gs
2 w
1 J
1 j
0 G
q
4217 3947 m
4217 3879 l
4467 3879 l
4467 3947 l
4455 3947 l
4455 3891 l
4230 3891 l
4230 3947 l
...
```
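To get positions out of a stream like this, the path operators have to be read together with the current transformation matrix. The sketch below is a simplified, assumption-laden reader: it splits on whitespace (so it would break on string operands containing spaces), tracks only `q`, `Q`, `cm`, `m`, and `l`, and ignores every other operator. The function names are hypothetical.

```python
IDENTITY = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)

# First lines of the decoded stream shown above.
SAMPLE = """q
0.12 0 0 0.12 0 0 cm
/R8 gs
2 w
q
4217 3947 m
4217 3879 l
4467 3879 l
"""


def concat(m, ctm):
    """Return m x ctm, the new CTM after a `cm` operator with matrix m."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = ctm
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    )


def path_points(stream_text):
    """Collect (operator, x, y) for `m` and `l`, transformed by the CTM."""
    ctm = IDENTITY
    saved = []      # graphics state stack for q / Q
    operands = []
    points = []
    for tok in stream_text.split():
        try:
            operands.append(float(tok))
            continue
        except ValueError:
            pass
        if tok.startswith("/"):
            operands.append(tok)  # name operand such as /R8
            continue
        # Anything else is treated as an operator.
        if tok == "q":
            saved.append(ctm)
        elif tok == "Q" and saved:
            ctm = saved.pop()
        elif tok == "cm" and len(operands) >= 6:
            ctm = concat(tuple(operands[-6:]), ctm)
        elif tok in ("m", "l") and len(operands) >= 2:
            x, y = operands[-2:]
            a, b, c, d, e, f = ctm
            points.append((tok, a * x + c * y + e, b * x + d * y + f))
        operands.clear()
    return points


points = path_points(SAMPLE)
```

With the `0.12` scale from the `cm` line applied, the raw coordinates come out in page units, which is the form a positional token would most likely want to store.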