vault backup: 2025-08-29 12:47:00

2025-08-29 12:47:00 -04:00
parent c1bdb06c6a
commit f831a58d53
6 changed files with 108 additions and 22 deletions
@@ -15,3 +15,48 @@ See [[portable-tools]] for valid dependencies.
Right now I'm exporting Bluebeam markups to CSV before processing.
If I converted the code to extract the markups directly with MuPDF.NET,
as I've managed before with iText, that could save a step.
## PDF Content Positional Tokenization
Recursively parse and consume PDF vector content, replacing recognized elements with positional tokens.
> [!example]
> A span with text "GFCI" is consumed.
> (its draw calls are removed from the page content).
> A `gfci_label` token is created
> and encoded with the span's position.
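
A minimal Python sketch of the first example, not the note's actual implementation. `TextSpan`, `Token`, `LABEL_RULES` and `consume_labels` are hypothetical names; it assumes spans have already been extracted with a position (here the span origin in page coordinates) and that labels match on exact, trimmed text.

```python
from dataclasses import dataclass


# Hypothetical structures: the note does not say how spans or tokens are stored.
@dataclass(frozen=True)
class TextSpan:
    text: str
    x: float  # assumed: span origin in page coordinates
    y: float


@dataclass(frozen=True)
class Token:
    kind: str
    x: float
    y: float


# Hypothetical rule table: exact span text -> token kind.
LABEL_RULES = {"GFCI": "gfci_label"}


def consume_labels(spans):
    """Turn matching spans into tokens; return (tokens, remaining_spans)."""
    tokens, remaining = [], []
    for span in spans:
        kind = LABEL_RULES.get(span.text.strip())
        if kind is None:
            remaining.append(span)  # not recognized: stays in the page content
        else:
            # The span is consumed; its position is carried by the new token.
            tokens.append(Token(kind, span.x, span.y))
    return tokens, remaining


tokens, rest = consume_labels([TextSpan("GFCI", 506.0, 473.6), TextSpan("20A", 600.0, 470.0)])
print(tokens)  # [Token(kind='gfci_label', x=506.0, y=473.6)]
```

Returning the unconsumed spans separately keeps "consume" explicit: whatever is left is what later passes still have to interpret.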

> [!example]
> A `duplex_receptacle` token and a `gfci_label` token
> in close proximity are consumed
> creating a `duplex_gfci_receptacle` token
> which inherits the `duplex_receptacle`'s position.
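
The second example is a proximity merge. Here is a sketch under stated assumptions: `Token`, `COMBINE_RULES`, `combine_nearby` and the 30-unit `max_dist` threshold are hypothetical. Distance is the straight-line distance between token positions, each label is used at most once, and a symbol takes its nearest eligible label.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    kind: str
    x: float
    y: float


# Hypothetical rule table: (symbol kind, label kind) -> combined kind.
COMBINE_RULES = {("duplex_receptacle", "gfci_label"): "duplex_gfci_receptacle"}


def combine_nearby(tokens, max_dist=30.0):
    """Merge symbol/label pairs within max_dist (assumed page units).

    Merged tokens come first, followed by untouched tokens in input order.
    """
    consumed = set()  # indices of tokens already merged
    merged = []
    for i, sym in enumerate(tokens):
        if i in consumed:
            continue
        best = None  # (distance, label index, combined kind)
        for j, other in enumerate(tokens):
            if j == i or j in consumed:
                continue
            kind = COMBINE_RULES.get((sym.kind, other.kind))
            if kind is None:
                continue
            d = math.dist((sym.x, sym.y), (other.x, other.y))
            if d <= max_dist and (best is None or d < best[0]):
                best = (d, j, kind)
        if best is not None:
            _, j, kind = best
            consumed.update((i, j))  # both tokens are consumed
            # The combined token inherits the symbol's (receptacle's) position.
            merged.append(Token(kind, sym.x, sym.y))
    leftovers = [t for k, t in enumerate(tokens) if k not in consumed]
    return merged + leftovers
```

For example, a `gfci_label` at (105, 100) next to a `duplex_receptacle` at (100, 100) yields one `duplex_gfci_receptacle` at (100, 100), while a second receptacle at (400, 100) is left as-is. Searching over all unconsumed tokens, rather than only those after the symbol, keeps the result independent of the order in which the tokens were emitted.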
```
$ mutool show file.pdf pages/1/Contents
629 0 obj
<<
/Filter /FlateDecode
/Length 31375
>>
stream
q
0.12 0 0 0.12 0 0 cm
/R8 gs
/R9 gs
2 w
1 J
1 j
0 G
q
4217 3947 m
4217 3879 l
4467 3879 l
4467 3947 l
4455 3947 l
4455 3891 l
4230 3891 l
4230 3947 l
...
```
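The dump above is a decompressed content stream: operands precede their operator, `q`/`Q` save and restore the graphics state, and `0.12 0 0 0.12 0 0 cm` scales the coordinates that follow by 0.12, so `4217 3947 m` lands at roughly (506.04, 473.64) in default user space. Below is a minimal sketch of walking such a stream to recover positioned path points. It assumes the stream text is already decoded (as `mutool show` prints it) and contains only numbers, names and operators. It handles `q`/`Q` nesting with an explicit stack rather than recursion. `parse_ops`, `multiply` and `path_points` are hypothetical helpers, not a MuPDF API.

```python
import re

# Simplified lexer: names, numbers and bare operators only. Strings, arrays,
# dictionaries, inline images and comments are NOT handled (assumption).
TOKEN_RE = re.compile(r"/[^\s/\[\]()<>{}%]+|[-+]?(?:\d+\.?\d*|\.\d+)|[A-Za-z'\"*]+")


def parse_ops(stream):
    """Yield (operator, operands) pairs in stream order."""
    operands = []
    for tok in TOKEN_RE.findall(stream):
        if tok.startswith("/"):
            operands.append(tok)  # name operand, e.g. /R8
        elif tok[0].isdigit() or tok[0] in "+-.":
            operands.append(float(tok))
        else:
            yield tok, operands
            operands = []


def multiply(m1, m2):
    """Concatenate PDF matrices [a b c d e f]: returns m1 x m2."""
    a1, b1, c1, d1, e1, f1 = m1
    a2, b2, c2, d2, e2, f2 = m2
    return [a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
            c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
            e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2]


def path_points(stream):
    """Return (op, x, y) in user space for m/l operators, honoring cm and q/Q."""
    ctm = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]
    stack = []  # saved CTMs for q ... Q nesting
    points = []
    for op, args in parse_ops(stream):
        if op == "q":
            stack.append(ctm)
        elif op == "Q":
            ctm = stack.pop()
        elif op == "cm":
            ctm = multiply(args, ctm)  # new CTM = cm matrix x current CTM
        elif op in ("m", "l"):
            x, y = args
            a, b, c, d, e, f = ctm
            points.append((op, a * x + c * y + e, b * x + d * y + f))
    return points


sample = "q\n0.12 0 0 0.12 0 0 cm\n/R8 gs\n2 w\nq\n4217 3947 m\n4217 3879 l\n"
for op, x, y in path_points(sample):
    print(op, round(x, 2), round(y, 2))
```

Points recovered this way are the kind of positions the `duplex_receptacle` and `gfci_label` tokens would carry. Text spans inside `BT`...`ET` would also need the text matrix, which this sketch ignores.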