# PowerShell Document Conversion
PowerShell module to allow fluent conversion of "document" files,
including markup source (.md, .adoc, .rst, etc.)
and binary formats (.pdf, .docx, etc.),
as well as allow ~~common high-level actions on them.~~
> [!note]
> Cmdlets should be written to accept multiple input files where appropriate.
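
A minimal sketch of that convention, assuming the usual PowerShell pattern: bind `-Path` from the pipeline (including the `FullName` property of `Get-ChildItem` output) and do the per-file work in a `process` block. `Invoke-DocumentAction` is a hypothetical placeholder name, and its per-file "work" only reports the normalized extension.

```pwsh
function Invoke-DocumentAction {
    # Hypothetical placeholder cmdlet illustrating multi-file input handling.
    param(
        # Accepts strings, string arrays, or FileInfo objects piped from Get-ChildItem.
        [Parameter(Mandatory, ValueFromPipeline, ValueFromPipelineByPropertyName)]
        [Alias('FullName')]
        [string[]]$Path
    )
    process {
        # process runs once per pipeline object; foreach covers -Path a, b, c.
        foreach ($item in $Path) {
            [PSCustomObject]@{
                Path      = $item
                Extension = [System.IO.Path]::GetExtension($item).ToLowerInvariant()
            }
        }
    }
}
```

With this shape, `Invoke-DocumentAction -Path a.md, b.adoc` and `Get-ChildItem *.md | Invoke-DocumentAction` both work: the `foreach` handles the array argument and `process` handles pipeline input.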
## `Convert-Document`
* pandoc
* calibre `ebook-convert` (if necessary)
* Asciidoctor
### Case: From PDF
* marker
    * Python-based CLI
    * Option to use an LLM for more accurate output
* MuPDF
### Case: From Asciidoc
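pandoc may not be able to read AsciiDoc (historically it could only write it); `pandoc --list-input-formats` shows what the installed version accepts.
If it cannot, convert to HTML first (e.g. with Asciidoctor) and pass the HTML to pandoc.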
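
A minimal sketch of how `Convert-Document` could route an input, assuming it shells out to `asciidoctor` and `pandoc` on `PATH`. `Get-ConversionPlan` is a hypothetical helper that returns the commands as objects instead of running them, so the routing can be checked without the tools installed. The PDF case (marker, MuPDF) is left out because this note does not pin down how those tools are invoked. The flags used are Asciidoctor's `-b`/`-o` and pandoc's `-f`/`-o`; for other inputs pandoc infers both formats from the file extensions.

```pwsh
function Get-ConversionPlan {
    # Hypothetical helper for Convert-Document: returns the external commands to run,
    # in order, as objects with Command and Arguments. Nothing is executed here.
    param(
        [Parameter(Mandatory)][string]$InputPath,
        [Parameter(Mandatory)][string]$OutputPath
    )
    $inExt  = [System.IO.Path]::GetExtension($InputPath).ToLowerInvariant()
    $outExt = [System.IO.Path]::GetExtension($OutputPath).ToLowerInvariant()

    if ($inExt -in '.adoc', '.asciidoc') {
        if ($outExt -in '.html', '.htm') {
            # Asciidoctor produces HTML directly, so one step is enough.
            return [PSCustomObject]@{ Command = 'asciidoctor'; Arguments = @('-b', 'html5', '-o', $OutputPath, $InputPath) }
        }
        # Do not rely on pandoc reading AsciiDoc: render to intermediate HTML first.
        $baseName = [System.IO.Path]::GetFileNameWithoutExtension($InputPath)
        $tmpHtml  = Join-Path ([System.IO.Path]::GetTempPath()) ($baseName + '.html')
        [PSCustomObject]@{ Command = 'asciidoctor'; Arguments = @('-b', 'html5', '-o', $tmpHtml, $InputPath) }
        [PSCustomObject]@{ Command = 'pandoc'; Arguments = @('-f', 'html', '-o', $OutputPath, $tmpHtml) }
        return
    }
    # Default route: pandoc infers input and output formats from the extensions.
    [PSCustomObject]@{ Command = 'pandoc'; Arguments = @($InputPath, '-o', $OutputPath) }
}
```

Running a plan is then a loop such as `foreach ($step in $plan) { & $step.Command $step.Arguments }`, checking `$LASTEXITCODE` after each step; PowerShell passes each element of the `Arguments` array to a native command as a separate argument.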
## `Measure-Document`
Measure total words, occurrences of each unique word, etc.
Where applicable, report the same per page, as well as the total page count.
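
A minimal sketch of the word statistics, assuming plain-text input (markup is not stripped) and a deliberately simple definition of a word: a run of letters, digits, apostrophes or hyphens, compared case-insensitively. `Measure-TextWords` is a hypothetical helper name; per-page counts would need page boundaries from the source format and are not attempted here.

```pwsh
function Measure-TextWords {
    # Hypothetical helper for Measure-Document: word statistics for plain text.
    # Markup is not stripped; strip or convert it before calling.
    param(
        [Parameter(Mandatory)][AllowEmptyString()][string]$Text
    )
    # Assumed definition of a word: a run of letters, digits, apostrophes or hyphens.
    $words = @([regex]::Matches($Text.ToLowerInvariant(), "[\p{L}\p{N}'-]+") |
        ForEach-Object { $_.Value })
    $counts = @{}
    foreach ($w in $words) { $counts[$w] = 1 + [int]$counts[$w] }
    [PSCustomObject]@{
        TotalWords  = $words.Count
        UniqueWords = $counts.Count
        Occurrences = $counts   # word -> number of occurrences
    }
}
```

For example, `(Measure-TextWords -Text (Get-Content notes.txt -Raw)).Occurrences.GetEnumerator() | Sort-Object Value -Descending | Select-Object -First 10` lists the ten most frequent words.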
## `Format-Document`
Run a format-specific linter on the document.
## `Get-DocumentMetadata`/`Set-DocumentMetadata`
> Reconcile document metadata formats
* markdown yaml frontmatter
* asciidoc attributes
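
A minimal sketch of reading both formats into one shape, assuming only flat `key: value` front matter and `:name: value` header attributes. Nested YAML and lists need a real YAML parser (for example `ConvertFrom-Yaml` from the `powershell-yaml` module), and values are kept verbatim, quotes included. `Get-SimpleDocumentMetadata` is a hypothetical name; returning the same ordered dictionary for both formats is one possible common representation, not a settled design.

```pwsh
function Get-SimpleDocumentMetadata {
    # Hypothetical sketch: flat Markdown YAML front matter or AsciiDoc header
    # attributes -> ordered dictionary. Values are kept verbatim (quotes included).
    param(
        [Parameter(Mandatory)][AllowEmptyString()][string[]]$Lines,
        [Parameter(Mandatory)][ValidateSet('markdown', 'asciidoc')][string]$Format
    )
    $meta = [ordered]@{}
    if ($Format -eq 'markdown') {
        # Front matter must open with '---' on the first line and close at the next '---'.
        if ($Lines[0].Trim() -ne '---') { return $meta }
        for ($i = 1; $i -lt $Lines.Count; $i++) {
            if ($Lines[$i].Trim() -eq '---') { break }
            if ($Lines[$i] -match '^([\w-]+):\s*(.*)$') { $meta[$Matches[1]] = $Matches[2].Trim() }
        }
    }
    else {
        # AsciiDoc header attributes ':name: value' run until the first blank line
        # (assumes the file has no leading blank lines or comments).
        foreach ($line in $Lines) {
            if ($line.Trim() -eq '') { break }
            if ($line -match '^:([\w-]+):\s*(.*)$') { $meta[$Matches[1]] = $Matches[2].Trim() }
        }
    }
    $meta
}
```

Typical use is `Get-SimpleDocumentMetadata -Lines (Get-Content guide.adoc) -Format asciidoc`.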
## `Import-Document`
Return a `PSCustomObject` with appropriate properties.
***
## `Import-XML`
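Missing from base pwsh (`Import-Clixml` only reads PowerShell's own CLIXML serialization format), but straightforward to implement.
```pwsh
function Import-XML {
    param(
        # Accepts several paths, or FileInfo objects piped from Get-ChildItem.
        [Parameter(Mandatory, ValueFromPipeline, ValueFromPipelineByPropertyName)]
        [Alias('FullName')]
        [string[]]$Path
    )
    process {
        foreach ($p in $Path) {
            # .NET resolves relative paths against the process directory, not $PWD.
            $resolved = (Resolve-Path -LiteralPath $p).ProviderPath
            # XmlDocument.Load honours the file's declared encoding, unlike Get-Content.
            $document = [System.Xml.XmlDocument]::new()
            $document.Load($resolved)
            $document
        }
    }
}
```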
`*.SupplierLink.xml`
## `Unblock-Excel`