PowerShell Document Conversion
A PowerShell module for fluent conversion of "document" files,
including markup sources (.md, .adoc, .rst, etc.)
and binary formats (.pdf, .docx, etc.),
along with common high-level actions on them.
Note
Cmdlets should be written to accept multiple input files where appropriate.
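One common way to meet this requirement is to declare the path parameter as a string array that also binds from the pipeline. The sketch below is only an illustration of that pattern: `Get-DocumentPath` and its output properties are hypothetical names, not part of the module.

```powershell
# Hypothetical cmdlet, used only to illustrate the parameter pattern.
function Get-DocumentPath {
    [CmdletBinding()]
    param(
        # Accepts one or more paths as an argument or from the pipeline.
        # The FullName alias lets objects that expose that property bind by name.
        [Parameter(Mandatory, Position = 0, ValueFromPipeline, ValueFromPipelineByPropertyName)]
        [Alias('FullName')]
        [string[]]$Path
    )
    process {
        # The process block runs once per pipeline item; the inner loop
        # handles an array passed directly as an argument.
        foreach ($item in $Path) {
            [PSCustomObject]@{
                Path      = $item
                Extension = [System.IO.Path]::GetExtension($item)
            }
        }
    }
}
```

Both `'a.md', 'b.adoc' | Get-DocumentPath` and `Get-DocumentPath -Path a.md, b.adoc` then emit one object per file.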
Convert-Document
- pandoc
- calibre
  - ebook-convert (if necessary)
- Asciidoctor
Case: From PDF
- marker
  - Python-based CLI
  - Option to use an LLM for more accurate output
- MuPDF
Case: From AsciiDoc
pandoc can write AsciiDoc but cannot read it. Convert the source to HTML first (for example with Asciidoctor), then pass the HTML to pandoc.
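The two-step route could be sketched as below. `Get-AsciidocConversionPlan` is a hypothetical helper that only builds the two command lines, which keeps it inspectable without either tool installed. It assumes `asciidoctor` and `pandoc` are on the PATH when the plan is actually run, and that writing the intermediate HTML next to the source file is acceptable.

```powershell
# Hypothetical helper: builds the two command lines without running them.
function Get-AsciidocConversionPlan {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)][string]$Path,
        # Any pandoc output format name, for example 'markdown' or 'docx'.
        [Parameter(Mandatory)][string]$To,
        [Parameter(Mandatory)][string]$OutFile
    )
    # Intermediate HTML file, placed next to the source document.
    $htmlPath = [System.IO.Path]::ChangeExtension($Path, '.html')
    @(
        # Step 1: AsciiDoc -> HTML with Asciidoctor.
        [PSCustomObject]@{ Tool = 'asciidoctor'; Arguments = @('-o', $htmlPath, $Path) }
        # Step 2: HTML -> target format with pandoc.
        [PSCustomObject]@{ Tool = 'pandoc'; Arguments = @('-f', 'html', '-t', $To, '-o', $OutFile, $htmlPath) }
    )
}

# To execute the plan (requires both tools to be installed):
# foreach ($step in Get-AsciidocConversionPlan -Path guide.adoc -To markdown -OutFile guide.md) {
#     $toolArgs = $step.Arguments
#     & $step.Tool @toolArgs
# }
```

Returning the plan as data rather than running it directly makes it easy to log or dry-run a conversion before touching any files.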
Measure-Document
Measure total words, occurrences of each unique word, etc. Where applicable, report the same per page, along with the total page count.
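A minimal sketch of the word-level part, operating on text that has already been extracted from a document. `Measure-Text` is a hypothetical helper name, and the definition of a "word" (a run of letters, digits, or apostrophes, compared case-insensitively) is an assumption. Per-page counts are left out because they depend on how each format exposes pages.

```powershell
# Hypothetical helper: counts words in already-extracted text.
function Measure-Text {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)]
        [AllowEmptyString()]
        [string]$Text
    )
    # Assumption: a word is a run of letters, digits, or apostrophes.
    $pattern = "[\p{L}\p{N}']+"
    $words = @([regex]::Matches($Text.ToLowerInvariant(), $pattern) | ForEach-Object { $_.Value })

    # Count occurrences of each unique word.
    $frequency = @{}
    foreach ($word in $words) {
        $frequency[$word] = 1 + [int]$frequency[$word]
    }

    [PSCustomObject]@{
        TotalWords  = $words.Count
        UniqueWords = $frequency.Count
        Frequency   = $frequency
    }
}
```

For example, `(Measure-Text -Text 'The cat and the hat.').Frequency['the']` evaluates to 2.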
Format-Document
Run a format-specific linter on the document.
Get-DocumentMetadata/Set-DocumentMetadata
Reconcile document metadata formats
- Markdown YAML front matter
- AsciiDoc attributes
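One way to reconcile the two formats is to read both into the same ordered dictionary of key/value pairs. The sketch below uses a hypothetical helper, `Get-HeaderMetadata`, and makes simplifying assumptions: it handles only flat `key: value` front matter (no nested YAML, lists, or multi-line values) and only `:name: value` attribute entries in the AsciiDoc header, which ends at the first blank line.

```powershell
# Hypothetical helper: reads flat header metadata into one common shape.
function Get-HeaderMetadata {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)][AllowEmptyString()][string[]]$Line,
        [Parameter(Mandatory)][ValidateSet('Markdown', 'AsciiDoc')][string]$Format
    )
    $metadata = [ordered]@{}
    if ($Format -eq 'Markdown') {
        # Front matter must open with '---' on the first line.
        if ($Line[0].Trim() -ne '---') { return $metadata }
        for ($i = 1; $i -lt $Line.Count; $i++) {
            $text = $Line[$i]
            # '---' or '...' closes the front matter block.
            if ($text.Trim() -in '---', '...') { break }
            if ($text -match '^(?<key>[A-Za-z0-9_-]+):\s*(?<value>.*)$') {
                # Strip surrounding whitespace and simple quoting.
                $metadata[$Matches['key']] = $Matches['value'].Trim().Trim('"').Trim("'")
            }
        }
    }
    else {
        foreach ($text in $Line) {
            # The AsciiDoc header ends at the first blank line.
            if ([string]::IsNullOrWhiteSpace($text)) { break }
            if ($text -match '^:(?<key>[A-Za-z0-9_-]+):\s*(?<value>.*)$') {
                $metadata[$Matches['key']] = $Matches['value'].Trim()
            }
        }
    }
    $metadata
}
```

A call such as `Get-HeaderMetadata -Line (Get-Content notes.md) -Format Markdown` would return the same kind of dictionary as the AsciiDoc branch, so downstream cmdlets need not care which format the file used. Full YAML support would need a real parser, which PowerShell does not ship with.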
Import-Document
Returns a PSCustomObject with properties appropriate to the document's format.
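As a rough illustration of the shape such an object might take, the sketch below returns basic file details and, for text-based markup only, the raw content. The property names and the list of text extensions are assumptions made for this example, not a settled design.

```powershell
# Sketch only: property names and the text-extension list are assumptions.
function Import-Document {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)]
        [string]$Path
    )
    $file = Get-Item -LiteralPath $Path
    $extension = $file.Extension.ToLowerInvariant()

    # Only read content for text-based markup; binary formats such as
    # .pdf or .docx would need a format-specific reader.
    $textExtensions = '.md', '.adoc', '.rst'
    $content = $null
    if ($extension -in $textExtensions) {
        $content = Get-Content -LiteralPath $file.FullName -Raw
    }

    [PSCustomObject]@{
        Path      = $file.FullName
        Name      = $file.BaseName
        Extension = $extension
        Length    = $file.Length
        Content   = $content
    }
}
```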
Import-XML
Missing from base pwsh but should be straightforward to implement.
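function Import-XML {
    [CmdletBinding()]
    param(
        # Path to the XML file to load.
        [Parameter(Mandatory)]
        [string]$Path
    )
    # -Raw reads the file as a single string rather than an array of lines.
    $xmlText = Get-Content -LiteralPath $Path -Raw
    # Casting to [xml] parses the text into a System.Xml.XmlDocument.
    $document = [xml]$xmlText
    return $document
}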
*.SupplierLink.xml