Tags: status/fleeting, topic/automation

PowerShell Document Conversion

PowerShell module for fluent conversion of "document" files, covering both markup sources (.md, .adoc, .rst, etc.) and binary formats (.pdf, .docx, etc.), plus common high-level actions on them.

Note

Cmdlets should be written to accept multiple input files where appropriate.

See portable-tools for valid dependencies.
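A minimal sketch of the parameter pattern that lets a cmdlet take several files, either as an array argument or from the pipeline. `Get-DocumentInput` and its output properties are hypothetical names used only for illustration.

```powershell
function Get-DocumentInput {
    [CmdletBinding()]
    param(
        # One or more paths, passed as an argument or piped in.
        [Parameter(Mandatory, ValueFromPipeline)]
        [string[]]$Path
    )
    process {
        # process runs once per pipeline item; the inner loop covers
        # the case where an array is passed directly to -Path.
        foreach ($item in $Path) {
            [pscustomobject]@{
                Path      = $item
                Extension = [System.IO.Path]::GetExtension($item).ToLowerInvariant()
            }
        }
    }
}
```

Both `Get-DocumentInput -Path a.md, b.pdf` and `'a.md', 'b.pdf' | Get-DocumentInput` emit one object per file.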

Convert-Document

  • pandoc
  • calibre ebook-convert (if necessary)
  • Asciidoctor
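A rough sketch of how Convert-Document might wrap pandoc, assuming pandoc is on PATH. `Get-PandocArgument` and `Convert-DocumentSketch` are hypothetical names, and reusing the format name as the output file extension is a simplification that only holds for formats such as docx or html.

```powershell
# Hypothetical helper: builds the argument list for one pandoc call.
function Get-PandocArgument {
    param(
        [Parameter(Mandatory)][string]$Path,
        [Parameter(Mandatory)][string]$To,      # pandoc output format, e.g. 'docx'
        [string]$OutputDirectory = '.'
    )
    $baseName = [System.IO.Path]::GetFileNameWithoutExtension($Path)
    $output = Join-Path $OutputDirectory ('{0}.{1}' -f $baseName, $To)
    # --to selects the writer; pandoc guesses the reader from the input extension.
    @($Path, '--to', $To, '--output', $output)
}

function Convert-DocumentSketch {
    param(
        [Parameter(Mandatory, ValueFromPipeline)][string[]]$Path,
        [Parameter(Mandatory)][string]$To,
        [string]$OutputDirectory = '.'
    )
    process {
        foreach ($item in $Path) {
            $pandocArgs = Get-PandocArgument -Path $item -To $To -OutputDirectory $OutputDirectory
            & pandoc @pandocArgs    # assumption: pandoc is installed and on PATH
        }
    }
}
```

Keeping the argument building separate from the call makes the mapping easy to inspect without running pandoc.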

Case: From PDF

  • marker
    • Python based CLI
    • Option to use LLM for more accurate output
  • MuPDF
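A hedged sketch of calling marker for the PDF case. It assumes marker's single-file command is `marker_single` and that it accepts `--output_dir` and `--use_llm`, which is my reading of the project's README; check the flags against the installed version. `Get-MarkerArgument` is a hypothetical helper name.

```powershell
# Hypothetical helper: builds arguments for marker's single-file CLI.
function Get-MarkerArgument {
    param(
        [Parameter(Mandatory)][string]$Path,
        [string]$OutputDirectory = '.',
        [switch]$UseLlm     # opt in to the LLM-assisted mode
    )
    # Flag names are assumptions taken from marker's README.
    $markerArgs = @($Path, '--output_dir', $OutputDirectory)
    if ($UseLlm) { $markerArgs += '--use_llm' }
    $markerArgs
}

# Usage (assumes marker_single is on PATH):
# $markerArgs = Get-MarkerArgument -Path .\paper.pdf -UseLlm
# & marker_single @markerArgs
```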

Case: From AsciiDoc

pandoc can write AsciiDoc but cannot read it. Convert the source to HTML first (for example with Asciidoctor), then pass the HTML to pandoc.
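The two-step route could look roughly like this, assuming both asciidoctor and pandoc are on PATH. The function names and the `.tmp.html` intermediate file are hypothetical choices for this sketch.

```powershell
# Hypothetical helper: describes the AsciiDoc -> HTML -> target conversion steps.
function Get-AsciiDocConversionStep {
    param(
        [Parameter(Mandatory)][string]$Path,
        [Parameter(Mandatory)][string]$To,          # pandoc output format, e.g. 'docx'
        [Parameter(Mandatory)][string]$OutputPath
    )
    # The intermediate HTML is written next to the final output.
    $html = [System.IO.Path]::ChangeExtension($OutputPath, '.tmp.html')
    [pscustomobject]@{
        Command   = 'asciidoctor'
        Arguments = @('-b', 'html5', '-o', $html, $Path)
    }
    [pscustomobject]@{
        Command   = 'pandoc'
        Arguments = @('--from', 'html', '--to', $To, '--output', $OutputPath, $html)
    }
}

function Convert-AsciiDocSketch {
    param(
        [Parameter(Mandatory)][string]$Path,
        [Parameter(Mandatory)][string]$To,
        [Parameter(Mandatory)][string]$OutputPath
    )
    foreach ($step in Get-AsciiDocConversionStep @PSBoundParameters) {
        $stepArgs = $step.Arguments
        & $step.Command @stepArgs
        # Stop at the first failing tool instead of feeding pandoc a missing file.
        if ($LASTEXITCODE -ne 0) {
            throw "$($step.Command) exited with code $LASTEXITCODE"
        }
    }
}
```

The sketch leaves the intermediate HTML file in place; a real cmdlet would probably clean it up.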

Measure-Document

Measure total words, occurrences of each unique word, etc. Where the format has pages, report the same counts per page along with the total page count.
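As an illustration of the word counts, here is a small sketch that works on already-extracted text. `Measure-TextSketch` is a hypothetical name, and the definition of a word (runs of letters, digits, and apostrophes, compared case-insensitively) is an assumption.

```powershell
function Measure-TextSketch {
    param([Parameter(Mandatory)][string]$Text)

    # Collect every run of letters, digits, and apostrophes as one word.
    $words = @(
        [regex]::Matches($Text.ToLowerInvariant(), "[\p{L}\p{N}']+") |
            ForEach-Object { $_.Value }
    )

    # Tally occurrences per unique word.
    $frequency = @{}
    foreach ($word in $words) {
        $frequency[$word] = 1 + $frequency[$word]
    }

    [pscustomobject]@{
        TotalWords  = $words.Count
        UniqueWords = $frequency.Count
        Frequency   = $frequency
    }
}
```

Per-page figures would need page boundaries from the source format, which this sketch does not attempt.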

Format-Document

Run a format-specific linter on the document.

Get-DocumentMetadata/Set-DocumentMetadata

Reconcile document metadata formats:

  • Markdown YAML front matter
  • AsciiDoc attributes
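One way to reconcile the two formats is to read both into the same dictionary shape. The sketch below is deliberately naive: it only handles flat `key: value` front matter (nested YAML needs a real parser) and treats any `:name: value` line as an AsciiDoc attribute. `Get-MetadataSketch` is a hypothetical name.

```powershell
function Get-MetadataSketch {
    param(
        [Parameter(Mandatory)][string]$Text,
        [Parameter(Mandatory)][ValidateSet('markdown', 'asciidoc')][string]$Format
    )
    $metadata = [ordered]@{}
    $lines = $Text -split '\r?\n'

    if ($Format -eq 'markdown') {
        # Front matter must open with '---' on the first line.
        if ($lines[0].Trim() -ne '---') { return $metadata }
        for ($i = 1; $i -lt $lines.Count; $i++) {
            # '---' or '...' closes the front matter block.
            if ($lines[$i].Trim() -in '---', '...') { break }
            if ($lines[$i] -match '^([\w-]+):\s*(.*)$') {
                $metadata[$Matches[1]] = $Matches[2].Trim()
            }
        }
    }
    else {
        # AsciiDoc attribute entries look like ':name: value'.
        foreach ($line in $lines) {
            if ($line -match '^:([\w-]+):\s*(.*)$') {
                $metadata[$Matches[1]] = $Matches[2].Trim()
            }
        }
    }
    $metadata
}
```

With both formats reduced to one dictionary, a setter can write the same keys back in either syntax.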

Import-Document

Return a PSCustomObject with appropriate properties.
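A sketch of what the returned object might look like. The property set (Path, Name, Format, Length, Content) and the list of text formats are assumptions, not a settled design, and `Import-DocumentSketch` is a hypothetical name.

```powershell
function Import-DocumentSketch {
    param(
        [Parameter(Mandatory, ValueFromPipeline)][string[]]$Path
    )
    begin {
        # Assumption: only these extensions are read as text.
        $textFormats = 'md', 'adoc', 'rst', 'txt'
    }
    process {
        foreach ($item in $Path) {
            $file = Get-Item -LiteralPath $item
            $format = $file.Extension.TrimStart('.').ToLowerInvariant()

            # Binary formats get no Content; they need a converter first.
            $content = $null
            if ($format -in $textFormats) {
                $content = Get-Content -LiteralPath $file.FullName -Raw
            }

            [pscustomobject]@{
                Path    = $file.FullName
                Name    = $file.BaseName
                Format  = $format
                Length  = $file.Length
                Content = $content
            }
        }
    }
}
```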


Import-XML

Missing from base pwsh but should be straightforward to implement.

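function Import-XML {
    param(
        # One or more XML file paths, as an argument or from the pipeline
        [Parameter(Mandatory, ValueFromPipeline)]
        [string[]]$Path
    )
    process {
        foreach ($item in $Path) {
            # -Raw reads the whole file as one string for the [xml] cast to parse
            [xml](Get-Content -LiteralPath $item -Raw)
        }
    }
}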

*.SupplierLink.xml

Unblock-Document

Remove Excel sheet protection.

Where-Document

Filter by metadata, etc.
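A minimal sketch of metadata filtering, assuming each input object carries a `Metadata` dictionary property (a hypothetical shape). `Where-DocumentSketch` is likewise a made-up name; a list-valued entry such as tags matches when it contains the requested value.

```powershell
function Where-DocumentSketch {
    param(
        [Parameter(Mandatory, ValueFromPipeline)][psobject]$InputObject,
        # Key/value pairs that must all match the document's metadata.
        [Parameter(Mandatory)][hashtable]$Metadata
    )
    process {
        $isMatch = $true
        foreach ($key in $Metadata.Keys) {
            # @() lets scalar and list values share one containment check.
            if (@($InputObject.Metadata[$key]) -notcontains $Metadata[$key]) {
                $isMatch = $false
                break
            }
        }
        if ($isMatch) { $InputObject }
    }
}
```

For example, `$docs | Where-DocumentSketch -Metadata @{ status = 'fleeting' }` passes through only the documents whose status is fleeting.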