---
id:
aliases: []
tags:
- type/idea
- automation
---
# PowerShell Document Conversion

A PowerShell module for fluent conversion of "document" files,
including markup sources (.md, .adoc, .rst, etc.)
and binary formats (.pdf, .docx, etc.),
as well as ~~common high-level actions on them.~~

> [!note]
> Cmdlets should be written to accept multiple input files where appropriate.

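
One way to honor this is to declare the path parameter as a pipeline-bound `[string[]]` and do the work in a `process` block. The following is a minimal sketch, assuming a hypothetical helper named `Resolve-DocumentInput` (not part of any existing module) that only classifies each input by its file extension:

```pwsh
# Hypothetical helper: name, parameters, and output shape are illustrative assumptions.
function Resolve-DocumentInput {
    [CmdletBinding()]
    param(
        # Accepts an array directly, strings from the pipeline,
        # or objects with a FullName property (e.g. from Get-ChildItem).
        [Parameter(Mandatory, ValueFromPipeline, ValueFromPipelineByPropertyName)]
        [Alias('FullName')]
        [string[]]$Path
    )
    process {
        # process runs once per pipeline item; the loop covers -Path a, b, c.
        foreach ($item in $Path) {
            [pscustomobject]@{
                Path   = $item
                Format = [System.IO.Path]::GetExtension($item).TrimStart('.').ToLowerInvariant()
            }
        }
    }
}
```

Both `Resolve-DocumentInput -Path notes.md, paper.pdf` and `'notes.md', 'paper.pdf' | Resolve-DocumentInput` should then emit one object per file.
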
## `Convert-Document`

Candidate backends:

* pandoc
* calibre `ebook-convert` (if necessary)
* Asciidoctor

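
Each of these tools is driven from the command line, so `Convert-Document` would largely be a matter of assembling arguments. As a rough sketch for the pandoc case, the hypothetical helper below (`Get-PandocArgument` is an invented name) only builds the argument list, so the logic can be checked without pandoc installed. `--from`, `--to`, and `--output` are standard pandoc options.

```pwsh
# Hypothetical helper: builds the argument list for a pandoc call.
function Get-PandocArgument {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)][string]$Path,        # input file
        [Parameter(Mandatory)][string]$To,          # pandoc output format, e.g. 'gfm'
        [Parameter(Mandatory)][string]$OutputPath,  # destination file
        [string]$From                               # optional; pandoc otherwise guesses from the extension
    )
    $arguments = @($Path, '--to', $To, '--output', $OutputPath)
    if ($From) { $arguments += '--from', $From }
    $arguments
}
```

With pandoc on the `PATH`, the result can be splatted onto the executable: `$pandocArgs = Get-PandocArgument -Path notes.md -To docx -OutputPath notes.docx; & pandoc @pandocArgs`.
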
### Case: From PDF

* marker
  * Python-based CLI
  * Option to use an LLM for more accurate output
* MuPDF

### Case: From AsciiDoc

pandoc can write AsciiDoc but cannot read it.
Convert to HTML first (e.g. with Asciidoctor), then convert the HTML with pandoc.

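
The two-step route can be sketched as a small plan of external commands. Everything named here is an assumption for illustration: the function name `Get-AsciiDocConversionPlan`, the intermediate `.html` file written next to the source, and the choice of Asciidoctor for the first step. The CLI options used (`asciidoctor --backend`/`--out-file`, `pandoc --from`/`--to`/`--output`) do exist.

```pwsh
# Hypothetical helper: describes the AsciiDoc -> HTML -> target conversion
# as data, without running anything.
function Get-AsciiDocConversionPlan {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)][string]$Path,        # .adoc source
        [Parameter(Mandatory)][string]$To,          # pandoc output format, e.g. 'gfm'
        [Parameter(Mandatory)][string]$OutputPath   # final output file
    )
    # Assumption: the intermediate HTML is written beside the source file.
    $htmlPath = [System.IO.Path]::ChangeExtension($Path, '.html')

    # Step 1: Asciidoctor renders the AsciiDoc source to HTML.
    [pscustomobject]@{
        Command   = 'asciidoctor'
        Arguments = @('--backend', 'html5', '--out-file', $htmlPath, $Path)
    }
    # Step 2: pandoc reads that HTML and writes the requested format.
    [pscustomobject]@{
        Command   = 'pandoc'
        Arguments = @($htmlPath, '--from', 'html', '--to', $To, '--output', $OutputPath)
    }
}
```

Running the plan is then a loop such as `foreach ($step in $plan) { & $step.Command $step.Arguments }`, assuming both tools are on the `PATH`. Keeping the plan as data makes it easy to inspect or log before anything is executed.
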
## `Measure-Document`

Measure total words, occurrences of each unique word, etc.
Where applicable, report the same per page, along with the total page count.

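
The word-level measurements can be sketched for plain text as follows. `Measure-DocumentText` is a hypothetical name, and the definition of a word (a run of letters, digits, or apostrophes, compared case-insensitively) is an assumption. Page-level counts depend on the format and are left out.

```pwsh
# Hypothetical helper: counts words and per-word occurrences in a string.
function Measure-DocumentText {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)][string]$Text
    )
    # Assumed word definition: letters, digits, and apostrophes, lowercased.
    $words = @(
        [regex]::Matches($Text.ToLowerInvariant(), "[\p{L}\p{N}']+") |
            ForEach-Object { $_.Value }
    )
    # One group per distinct word, most frequent first.
    $frequency = @($words | Group-Object -NoElement | Sort-Object -Property Count -Descending)

    [pscustomobject]@{
        TotalWords  = $words.Count
        UniqueWords = $frequency.Count
        Frequency   = $frequency
    }
}
```

For example, `(Measure-DocumentText -Text (Get-Content notes.md -Raw)).TotalWords` would give a word count for a Markdown file, with no attempt to strip markup first.
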
## `Format-Document`

Run a format-specific linter on the document.

## `Get-DocumentMetadata`/`Set-DocumentMetadata`

> Reconcile document metadata formats

* Markdown YAML front matter
* AsciiDoc attributes

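
As a sketch of what reconciling the two could look like on the read side, the hypothetical helper below (`Read-DocumentMetadata` is an invented name) maps both formats onto one ordered dictionary. It assumes flat `key: value` pairs only: nested YAML and list values such as `tags:` are not parsed, and a real implementation would likely use a proper YAML parser. AsciiDoc attribute entries (`:name: value`) are read from the header, taken here to end at the first blank line.

```pwsh
# Hypothetical helper: reads flat key/value metadata from either format
# into one ordered dictionary.
function Read-DocumentMetadata {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)][string]$Text,
        [Parameter(Mandatory)][ValidateSet('markdown', 'asciidoc')][string]$Format
    )
    $metadata = [ordered]@{}
    $lines = $Text -split '\r?\n'

    if ($Format -eq 'markdown') {
        # YAML front matter must open on the first line with '---'.
        if ($lines[0].Trim() -ne '---') { return $metadata }
        for ($i = 1; $i -lt $lines.Count; $i++) {
            if ($lines[$i].Trim() -eq '---') { break }   # closing delimiter
            if ($lines[$i] -match '^(?<key>[\w-]+):\s*(?<value>.*)$') {
                $metadata[$Matches['key']] = $Matches['value'].Trim()
            }
        }
    }
    else {
        # Assumption: the header ends at the first blank line.
        foreach ($line in $lines) {
            if ([string]::IsNullOrWhiteSpace($line)) { break }
            if ($line -match '^:(?<key>[\w-]+):\s*(?<value>.*)$') {
                $metadata[$Matches['key']] = $Matches['value'].Trim()
            }
        }
    }
    $metadata
}
```

Returning the same dictionary shape for both formats is the design point: `Set-DocumentMetadata` could then accept that shape and serialize it back into whichever syntax the target file uses.
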
## `Import-Document`

Return a `PSCustomObject` with appropriate properties.

***

## `Import-XML`

Missing from base pwsh, but should be straightforward to implement.

```pwsh
function Import-XML {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [string[]]$Path
    )
    process {
        foreach ($file in $Path) {
            # Load() honors the encoding declared in the XML prolog.
            $document = [xml]::new()
            $document.Load((Convert-Path -LiteralPath $file))
            $document
        }
    }
}
```

`*.SupplierLink.xml`

## `Unblock-Excel`