---
id:
aliases: []
tags:
  - type/idea
  - automation
---

# PowerShell Document Conversion

PowerShell module to allow fluent conversion of "document" files, including markup source (.md, .adoc, .rst, etc.) and binary formats (.pdf, .docx, etc.), as well as allow ~~common high-level actions on them.~~

> [!note]
> Cmdlets should be written to accept multiple input files where appropriate.

See [[portable-tools]] for valid dependencies.

## `Convert-Document`

* pandoc
* calibre `ebook-convert` (if necessary)
* Asciidoctor

### Case: From PDF

* marker
  * Python-based CLI
  * Option to use an LLM for more accurate output
* MuPDF

### Case: From AsciiDoc

pandoc cannot translate from AsciiDoc, only to it. Convert to HTML first.

## `Measure-Document`

Measure total words, occurrences of unique words, etc. Where applicable, report the same per page, as well as total pages.

## `Format-Document`

Run a format-specific linter on the document.

## `Get-DocumentMetadata`/`Set-DocumentMetadata`

> Reconcile document metadata formats

* Markdown YAML frontmatter
* AsciiDoc attributes

## `Import-Document`

Return a PSCustomObject with appropriate properties.

***

## `Import-XML`

Missing from base pwsh but should be straightforward to implement.

```pwsh
function Import-XML {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [string]$Path
    )
    process {
        # -Raw reads the file as one string rather than an array of lines.
        $xmlText = Get-Content -LiteralPath $Path -Raw
        # The [xml] accelerator parses the text into a System.Xml.XmlDocument.
        $document = [xml]$xmlText
        return $document
    }
}
```

`*.SupplierLink.xml`

## `Unblock-Document`

Excel sheet protection

## `Where-Document`

Filter by metadata, etc.
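
A minimal sketch of how `Where-Document` might filter by metadata, assuming Markdown files with flat `key: value` YAML frontmatter. The helper `Get-FrontmatterValue` and the `-Property`/`-Like` parameters are hypothetical names chosen for illustration, not a settled design. Nested YAML, lists, and AsciiDoc attributes are not handled here.

```pwsh
# Hypothetical helper: reads flat `key: value` pairs from a Markdown
# frontmatter block. Nested YAML and list items are ignored in this sketch.
function Get-FrontmatterValue {
    param([Parameter(Mandatory)][string]$Path)

    $metadata = @{}
    $lines = @(Get-Content -LiteralPath $Path)

    # Frontmatter must open with `---` on the very first line.
    if ($lines.Count -eq 0 -or $lines[0].Trim() -ne '---') { return $metadata }

    foreach ($line in $lines | Select-Object -Skip 1) {
        if ($line.Trim() -eq '---') { break }   # closing delimiter
        if ($line -match '^(?<key>[\w-]+):\s*(?<value>.*)$') {
            $metadata[$Matches['key']] = $Matches['value'].Trim()
        }
    }
    return $metadata
}

# Sketch of Where-Document: passes through files whose frontmatter value
# for -Property matches the -Like wildcard pattern.
function Where-Document {
    [CmdletBinding()]
    param(
        # Accepts plain path strings or FileInfo objects (via FullName).
        [Parameter(Mandatory, ValueFromPipeline, ValueFromPipelineByPropertyName)]
        [Alias('FullName')]
        [string]$Path,

        [Parameter(Mandatory)][string]$Property,
        [Parameter(Mandatory)][string]$Like
    )
    process {
        $metadata = Get-FrontmatterValue -Path $Path
        if ($metadata.ContainsKey($Property) -and $metadata[$Property] -like $Like) {
            Get-Item -LiteralPath $Path
        }
    }
}
```

With these definitions, something like `Get-ChildItem -Filter *.md | Where-Document -Property status -Like 'draft*'` would emit only the matching files, where `status` is an example key. Note that `Where` is not an approved PowerShell verb, so importing a module that exports this name would produce a warning.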