vault backup: 2025-08-29 12:47:00
@@ -15,3 +15,48 @@ See [[portable-tools]] for valid dependencies.
Right now I'm exporting Bluebeam markups to CSV before processing,
however, if I converted the code to extract the markups directly with MuPDF.NET,
as I've managed before with iText, that could save a step.

## PDF Content Positional Tokenization

Recursively parse and consume PDF vector content.

> [!example]
> A span with text "GFCI" is consumed.
> (draw calls are removed from page content)
> A `gfci_label` token is created
> and encoded with the span's position.

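A minimal Python sketch of that first step, to make the idea concrete. The `Span` and `Token` types, the `LABELS` table, and `consume_labels` are all hypothetical names for illustration, not part of MuPDF or of the existing code. It assumes spans have already been extracted with their bounding boxes, and it encodes the token position as the bbox center, which is just one possible choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A text span with its bounding box (x0, y0, x1, y1). Hypothetical type."""
    text: str
    bbox: tuple


@dataclass(frozen=True)
class Token:
    """A positional token: a kind plus a single anchor point. Hypothetical type."""
    kind: str
    x: float
    y: float


# Assumed lookup from span text to token kind; only "GFCI" comes from the note.
LABELS = {"GFCI": "gfci_label"}


def consume_labels(spans):
    """Split spans into (tokens, remaining).

    Recognized spans are consumed (they do not appear in `remaining`)
    and replaced by a token placed at the center of the span's bbox.
    """
    tokens, remaining = [], []
    for span in spans:
        kind = LABELS.get(span.text.strip())
        if kind is None:
            remaining.append(span)
            continue
        x0, y0, x1, y1 = span.bbox
        tokens.append(Token(kind, (x0 + x1) / 2, (y0 + y1) / 2))
    return tokens, remaining


spans = [
    Span("GFCI", (100.0, 200.0, 120.0, 210.0)),
    Span("PANEL A", (10.0, 10.0, 60.0, 20.0)),
]
tokens, remaining = consume_labels(spans)
```

Returning the unconsumed spans alongside the tokens keeps "consumed" explicit: later passes only ever see content that no earlier pass has claimed.
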
> [!example]
> A `duplex_receptacle` token and a `gfci_label` token
> in close proximity are consumed
> creating a `duplex_gfci_receptacle` token
> which inherits the `duplex_receptacle`'s position.

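The merge step could be sketched along these lines. This is an assumption-heavy illustration: `Token`, `merge_nearby`, and the `max_dist` threshold are hypothetical, "close proximity" is interpreted as Euclidean distance between token positions, and each base token greedily takes its nearest unused label.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """A positional token: a kind plus a single anchor point. Hypothetical type."""
    kind: str
    x: float
    y: float


def merge_nearby(tokens, base_kind, label_kind, merged_kind, max_dist):
    """Merge each base token with its nearest unused label within max_dist.

    Both inputs are consumed; the merged token inherits the base token's
    position. Tokens that take part in no merge are passed through unchanged.
    """
    consumed = set()  # indices of label tokens that were merged away
    merged = {}       # index of base token -> its replacement token
    for i, base in enumerate(tokens):
        if base.kind != base_kind:
            continue
        candidates = [
            (math.hypot(t.x - base.x, t.y - base.y), j)
            for j, t in enumerate(tokens)
            if t.kind == label_kind and j not in consumed
        ]
        if not candidates:
            continue
        dist, j = min(candidates)
        if dist <= max_dist:
            consumed.add(j)
            merged[i] = Token(merged_kind, base.x, base.y)
    return [merged.get(i, t) for i, t in enumerate(tokens) if i not in consumed]


tokens = [
    Token("duplex_receptacle", 100.0, 100.0),
    Token("gfci_label", 104.0, 103.0),   # 5 units away: close enough to merge
    Token("gfci_label", 400.0, 400.0),   # far away: left as-is
]
result = merge_nearby(
    tokens, "duplex_receptacle", "gfci_label",
    "duplex_gfci_receptacle", max_dist=10.0,
)
```

Because the output is again a flat list of tokens, the same function can be applied repeatedly with different kind triples, which is one way to read the "recursively" in the description above.
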
```
$ mutool show file.pdf pages/1/Contents

629 0 obj
<<
/Filter /FlateDecode
/Length 31375
>>
stream
q
0.12 0 0 0.12 0 0 cm
/R8 gs
/R9 gs
2 w
1 J
1 j
0 G
q
4217 3947 m
4217 3879 l
4467 3879 l
4467 3947 l
4455 3947 l
4455 3891 l
4230 3891 l
4230 3947 l
...
```

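To get from a decoded stream like the one above to positions, the path operators need to be read together with the current transformation matrix. The following is a rough stdlib-only sketch, not a full content-stream parser: `concat` and `parse_path_points` are hypothetical helpers, only `q`, `Q`, `cm`, `m`, and `l` are handled, and strings, arrays, and inline images are assumed absent (as in the excerpt).

```python
def concat(m, ctm):
    """Return m x ctm for PDF matrices given as (a, b, c, d, e, f)."""
    a, b, c, d, e, f = m
    A, B, C, D, E, F = ctm
    return (
        a * A + b * C, a * B + b * D,
        c * A + d * C, c * B + d * D,
        e * A + f * C + E, e * B + f * D + F,
    )


def parse_path_points(content):
    """Collect (operator, x, y) for `m`/`l`, with x and y mapped through the CTM."""
    ctm = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)  # identity
    stack = []      # graphics-state stack; only the CTM is tracked here
    operands = []   # numeric operands waiting for their operator
    points = []
    for word in content.split():
        try:
            operands.append(float(word))
            continue
        except ValueError:
            pass  # not a number: treat as operator or name
        if word == "q":
            stack.append(ctm)
        elif word == "Q" and stack:
            ctm = stack.pop()
        elif word == "cm" and len(operands) == 6:
            ctm = concat(tuple(operands), ctm)
        elif word in ("m", "l") and len(operands) == 2:
            x, y = operands
            a, b, c, d, e, f = ctm
            points.append((word, a * x + c * y + e, b * x + d * y + f))
        operands.clear()  # any non-numeric word ends the current operand run
    return points


# First few operators copied from the excerpt above.
sample = "q 0.12 0 0 0.12 0 0 cm /R8 gs 2 w q 4217 3947 m 4217 3879 l"
points = parse_path_points(sample)
```

With the `0.12` scale applied, the first `m` lands at roughly (506.04, 473.64), which should correspond to page-space points; those are the coordinates a token would want to carry rather than the raw operands.
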