---
tags:
- destiny/fleeting
- topic/automation
- topic/software
- type/idea
title: Automating PDF Annotation
---
# Automating PDF Annotation

See [[portable-tools]] for valid dependencies.
## Clean Documents
### Page Rotation

Pages can carry a rotation value (the `/Rotate` entry in the page dictionary)
that is not apparent from how the page looks,
but which rotates the drawing reference grid relative to the displayed page.
This must be resolved before further processing.

Simple tools and functions that return coordinates
may report the non-rotation-adjusted values,
which are essentially useless for our purposes.
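A minimal sketch of the conversion that rotation-aware functions perform, assuming `/Rotate` is a multiple of 90 and that both the unrotated and the displayed coordinates use PDF's y-up convention with the origin at the bottom-left corner. `to_visual` is a hypothetical helper, not a MuPDF API; MuPDF itself reports coordinates with a top-left origin (y-down), so its values differ from these by a vertical flip.

```python
from typing import Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # MediaBox as (x0, y0, x1, y1)


def to_visual(point: Point, media_box: Box, rotate: int) -> Point:
    """Map an unrotated user-space point onto the page as it is displayed.

    `rotate` is the page's /Rotate value (clockwise degrees, multiple of 90).
    Input and output are both y-up; the output origin is the bottom-left
    corner of the displayed page.
    """
    x0, y0, x1, y1 = media_box
    w, h = x1 - x0, y1 - y0
    x, y = point[0] - x0, point[1] - y0  # shift so the MediaBox starts at (0, 0)
    r = rotate % 360  # also normalises negative values such as -90
    if r == 0:
        return (x, y)
    if r == 90:  # page turned clockwise: displayed size is h wide, w tall
        return (y, w - x)
    if r == 180:
        return (w - x, h - y)
    if r == 270:  # page turned counter-clockwise
        return (h - y, x)
    raise ValueError(f"/Rotate must be a multiple of 90, got {rotate}")
```

For a 612 x 792 page with `/Rotate 90`, the content origin `(0, 0)` lands at `(0, 612)`, the top-left corner of the landscape page the viewer shows.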
|
MuPDF and similar libraries provide functions
|
|
to return the "visual" (read _correct_) coordinates,
|
|
but it would be ideal to redraw content with the correct orientation.
|
|
|
|
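One way to redraw content in the correct orientation is to bake the rotation into the content stream: wrap the existing content in `q <matrix> cm ... Q` with a matrix that performs the same mapping as the display rotation, swap the MediaBox dimensions, and set `/Rotate` to 0. This sketch only computes the matrix, the new MediaBox, and the wrapped bytes; it assumes a MediaBox of `[0 0 width height]` and an already-decoded content stream, and `derotation`/`wrap_content` are hypothetical helpers. Reading and writing the page objects is left to whichever PDF library is used.

```python
from typing import Tuple

Matrix = Tuple[float, float, float, float, float, float]  # PDF [a b c d e f]
Box = Tuple[float, float, float, float]


def derotation(width: float, height: float, rotate: int) -> Tuple[Matrix, Box]:
    """Return a `cm` matrix and new MediaBox that reproduce /Rotate with /Rotate 0.

    Assumes the original MediaBox is [0 0 width height].
    """
    r = rotate % 360
    if r == 0:
        return (1, 0, 0, 1, 0, 0), (0, 0, width, height)
    if r == 90:   # (x, y) -> (y, width - x)
        return (0, -1, 1, 0, 0, width), (0, 0, height, width)
    if r == 180:  # (x, y) -> (width - x, height - y)
        return (-1, 0, 0, -1, width, height), (0, 0, width, height)
    if r == 270:  # (x, y) -> (height - y, x)
        return (0, 1, -1, 0, height, 0), (0, 0, height, width)
    raise ValueError(f"/Rotate must be a multiple of 90, got {rotate}")


def _num(v: float) -> str:
    """Format a PDF number in fixed-point (PDF has no exponent notation)."""
    return f"{v:.4f}".rstrip("0").rstrip(".")


def wrap_content(content: bytes, m: Matrix) -> bytes:
    """Wrap decoded content in a saved graphics state transformed by `m`."""
    operands = " ".join(_num(v) for v in m)
    return b"q " + operands.encode("ascii") + b" cm\n" + content + b"\nQ"
```

Annotations (including imported Bluebeam markups), the CropBox, and the other page boxes live outside the content stream and would need the same transform applied separately.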
## Extract Bluebeam Markups

Right now I'm exporting Bluebeam markups to CSV before processing.
If I converted the code to extract the markups directly with MuPDF.Net,
as I've managed before with iText, that could save a step.
### Bluebeam Revu Measure Hack

Bluebeam Revu does not report coordinates for count annotations,
even where count = 1.

Bluebeam's .bax is an XML-based annotation interchange format.

1. Export markups to .bax

   > [!menu]
   > Markups List > Markups > Export Markups

2. Edit markups

   Convert count annotations to polygons by changing each count markup's `<TypeInternal>` to:

   ```xml
   <TypeInternal>Bluebeam.PDF.Annotations.AnnotationPolygon</TypeInternal>
   ```

3. Delete all markups in the PDF

4. Import markups from the edited .bax file

   > [!menu]
   > Markups List > Markups > Import
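Step 2 can be scripted. A minimal sketch, assuming the exported .bax has been read into a string: it swaps the `<TypeInternal>` value of every count markup for the polygon type above. The count markup's own `TypeInternal` string isn't recorded in this note, so it is passed in as `count_type` (copy it from an exported file); `counts_to_polygons` is a hypothetical helper.

```python
import re
from typing import Tuple

POLYGON = "Bluebeam.PDF.Annotations.AnnotationPolygon"


def counts_to_polygons(bax_text: str, count_type: str) -> Tuple[str, int]:
    """Rewrite every <TypeInternal> equal to `count_type` as the polygon type.

    Returns (new text, number of markups changed). Whitespace inside the
    element is tolerated; everything else in the file is left untouched.
    """
    pattern = re.compile(
        r"<TypeInternal>\s*" + re.escape(count_type) + r"\s*</TypeInternal>"
    )
    return pattern.subn(f"<TypeInternal>{POLYGON}</TypeInternal>", bax_text)
```

A regex substitution is used instead of parsing and re-serialising the XML so the rest of the file stays byte-for-byte as Bluebeam exported it; comparing the returned count with the number of count markups is a cheap sanity check before importing.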
## PDF Content Positional Tokenization

Recursively parse and consume PDF vector content.

> [!example]
> A span with text "GFCI" is consumed
> (its draw calls are removed from the page content).
> A `gfci_label` token is created
> and encoded with the span's position.
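A minimal sketch of the first example, assuming text spans have already been extracted as `(text, bbox)` pairs in page coordinates. The `LABELS` table, `Token` type, and `consume_spans` function are hypothetical, and using the bbox centre as the token's position is an assumption. Removing the span's draw calls from the page content is not modelled; the function returns the spans it did not consume so later passes can still see them.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in page coordinates
Span = Tuple[str, BBox]


@dataclass
class Token:
    kind: str
    x: float
    y: float


# Hypothetical lookup from exact span text to token kind.
LABELS: Dict[str, str] = {"GFCI": "gfci_label"}


def consume_spans(spans: List[Span]) -> Tuple[List[Token], List[Span]]:
    """Turn recognised text spans into tokens placed at the span's bbox centre.

    Returns (tokens, leftover spans); unmatched spans pass through untouched.
    """
    tokens: List[Token] = []
    leftover: List[Span] = []
    for text, bbox in spans:
        kind = LABELS.get(text.strip())
        if kind is None:
            leftover.append((text, bbox))
            continue
        x0, y0, x1, y1 = bbox
        tokens.append(Token(kind, (x0 + x1) / 2, (y0 + y1) / 2))
    return tokens, leftover
```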

> [!example]
> A `duplex_receptacle` token and a `gfci_label` token
> in close proximity are consumed,
> creating a `duplex_gfci_receptacle` token
> which inherits the `duplex_receptacle`'s position.
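The second example is a rewrite rule over tokens. A minimal sketch, assuming each token carries a single point position: `RULES`, `merge_nearby`, and the `max_dist` threshold are hypothetical, and "close proximity" is taken to mean Euclidean distance of at most `max_dist` page units. The loop repeats until no rule fires, so a combined token can itself be consumed by a later rule.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Token:
    kind: str
    x: float
    y: float


# Hypothetical rule table: (anchor kind, modifier kind) -> combined kind.
# The combined token inherits the anchor's position.
RULES: Dict[Tuple[str, str], str] = {
    ("duplex_receptacle", "gfci_label"): "duplex_gfci_receptacle",
}


def merge_nearby(tokens: List[Token], max_dist: float) -> List[Token]:
    """Combine anchor/modifier pairs within max_dist until nothing changes."""
    tokens = list(tokens)
    changed = True
    while changed:
        changed = False
        for i, anchor in enumerate(tokens):
            for j, mod in enumerate(tokens):
                if i == j:
                    continue
                combined = RULES.get((anchor.kind, mod.kind))
                near = math.dist((anchor.x, anchor.y), (mod.x, mod.y)) <= max_dist
                if combined and near:
                    # Consume both tokens and emit one at the anchor's position.
                    tokens = [t for k, t in enumerate(tokens) if k not in (i, j)]
                    tokens.append(Token(combined, anchor.x, anchor.y))
                    changed = True
                    break
            if changed:
                break  # restart the scan over the updated token list
    return tokens
```

Pairs are matched greedily in list order rather than nearest-first, which is adequate for sparse symbols but could mis-pair labels in crowded areas.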
## PDF Internals

```sh
$ mutool show file.pdf pages/1/Contents
```
```pdf
629 0 obj
<<
/Filter /FlateDecode
/Length 31375
>>
stream
q
0.12 0 0 0.12 0 0 cm
/R8 gs
/R9 gs
2 w
1 J
1 j
0 G
q
4217 3947 m
4217 3879 l
4467 3879 l
4467 3947 l
4455 3947 l
4455 3891 l
4230 3891 l
4230 3947 l
...
```
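The decoded stream is a flat sequence of operands followed by an operator (`m` moveto, `l` lineto, `cm` concatenate matrix, `q`/`Q` save/restore graphics state). A minimal sketch of reading it, assuming an already-decoded content stream: the tokenizer only understands numbers, names, and operator keywords (strings, arrays, dictionaries, inline images, and comments are not handled, and text inside `( )` would be misread), and `parse_ops`/`path_points` are hypothetical helpers. `path_points` applies `cm`, so the results are in default user space, which is still unrotated.

```python
import re
from typing import List, Tuple, Union

Operand = Union[float, str]
Matrix = Tuple[float, float, float, float, float, float]

# Numbers (4217, 0.12, -.5), names (/R8), or operator keywords (cm, gs, T*, d0).
# findall skips everything else (whitespace, brackets, etc.).
_TOKEN = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)|/[^\s/\[\]()<>{}%]+|[A-Za-z][A-Za-z0-9*]*")


def parse_ops(stream: str) -> List[Tuple[List[Operand], str]]:
    """Split a decoded content stream into (operands, operator) pairs."""
    ops: List[Tuple[List[Operand], str]] = []
    operands: List[Operand] = []
    for tok in _TOKEN.findall(stream):
        if tok.startswith("/"):
            operands.append(tok)  # name operand, e.g. /R8
        elif tok[0].isdigit() or tok[0] in "+-.":
            operands.append(float(tok))  # numeric operand
        else:
            ops.append((operands, tok))  # an operator closes the group
            operands = []
    return ops


def _concat(m: Matrix, n: Matrix) -> Matrix:
    """Return m x n for PDF matrices [a b c d e f] (row-vector convention)."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = n
    return (a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
            c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
            e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2)


def path_points(ops: List[Tuple[List[Operand], str]]) -> List[Tuple[float, float]]:
    """Collect m/l points in default user space, honouring q/Q and cm."""
    ctm: Matrix = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)
    saved: List[Matrix] = []
    points = []
    for args, op in ops:
        if op == "q":
            saved.append(ctm)
        elif op == "Q" and saved:
            ctm = saved.pop()
        elif op == "cm" and len(args) == 6:
            ctm = _concat(tuple(args), ctm)  # new CTM = M x old CTM
        elif op in ("m", "l") and len(args) == 2:
            x, y = args
            a, b, c, d, e, f = ctm
            points.append((a * x + c * y + e, b * x + d * y + f))
    return points
```

On the excerpt above, the first `4217 3947 m` comes out at approximately `(506.04, 473.64)`: the `0.12` scale in the leading `cm` converts the drawing's units into default user-space units.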