
DONE: Mass convert already locally saved html (+htm +mht) files to pdf


I just tested with the Chrome headless browser:
(Adapted from

--- Code: Text ---
cd C:\Program Files (x86)\Google\Chrome\Application
chrome --headless --print-to-pdf="C:\result\20170619_075623.pdf" "C:\source\t2\20170619_075623.htm"

I now need to try to adapt the above AHK script.
Edit: Here is what I have tried, but I am stuck!

--- Code: AutoHotkey ---
WorkingDir := "C:\Program Files (x86)\Google\Chrome\Application"
pdParams := "chrome.exe --headless --print-to-pdf="
FileSelectFolder, SourcePath, , 0, Select Source Folder
If SourcePath =
    Return  ; dialog cancelled

FileSelectFolder, TargetPath, *%SourcePath%, 0, Select Target Folder
If TargetPath =
    Return  ; dialog cancelled

If (SourcePath == TargetPath) {
    MsgBox 0x40000,, % "SourcePath and TargetPath can't be the same: " TargetPath
    Return
}

Loop, Files, %SourcePath%\*.htm, R
{
    SplitPath, A_LoopFileFullPath, , , , OutNameNoExt
    ; As in the tested command line: --print-to-pdf="<output>" comes first, then the input file
    pCmd := pdParams """" TargetPath "\" OutNameNoExt "_.pdf" """" " " """" A_LoopFileFullPath """"
    RunWait % ComSpec " /c " pCmd, % WorkingDir
}

Thanks in advance ;)
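
For what it's worth, the tested command line above puts the output path directly after `--print-to-pdf=` and the input file last. Here is a minimal PowerShell sketch of that ordering. `Get-ChromePdfArgs` is a made-up helper name (not part of Chrome or PowerShell), and the Chrome location in the commented-out line is just the path quoted in this thread, so adjust it for your install.

```powershell
# Hypothetical helper: builds the Chrome argument string in the same order as
# the tested command line (output path first, input file last).
function Get-ChromePdfArgs {
  param(
    [Parameter(Mandatory)][string]$InFile,
    [Parameter(Mandatory)][string]$OutFile
  )
  # Quote both paths so folders containing spaces survive
  return "--headless --print-to-pdf=`"$OutFile`" `"$InFile`""
}

$chromeArgs = Get-ChromePdfArgs -InFile 'C:\source\t2\20170619_075623.htm' -OutFile 'C:\result\20170619_075623.pdf'
Write-Host $chromeArgs

# Assumed Chrome location (taken from the posts above); uncomment to actually convert:
# Start-Process -FilePath 'C:\Program Files (x86)\Google\Chrome\Application\chrome.exe' -ArgumentList $chromeArgs -Wait -NoNewWindow
```

Keeping the string building in its own function makes it easy to print and check the command before letting it loose on a whole folder.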

Something quick in PowerShell using wkhtmltopdf, (put the executable in the same directory as the script), and Chrome.

Recursively converts .html, .htm, and .mht to PDF files.

Btw, if it looks familiar, 90% came from here.

Run it from a PoSh console or use a shortcut with the following as the Target, (assuming shortcut in the same folder as the script):

--- Code: Text ---
%SystemRoot%\system32\WindowsPowerShell\v1.0\powershell.exe -sta -NoProfile -ExecutionPolicy Bypass -File "CTP.ps1"

--- Code: PowerShell ---
<#
  CTP.ps1
  Convert .htm(l) and .mht to PDF
#>

Function Get-Folder {
  Add-Type -AssemblyName System.Windows.Forms
  $FolderBrowser = New-Object System.Windows.Forms.FolderBrowserDialog
  [void]$FolderBrowser.ShowDialog()
  $temp = $FolderBrowser.SelectedPath
  If($temp -eq '') {Exit}
  Return $temp
}

If($PSVersionTable.PSVersion.Major -lt 3) {
  Write-Host '** Script requires at least Powershell V3 **'
} else {
  Write-Host 'Choose input folder: ' -NoNewline -BackgroundColor DarkGreen -ForegroundColor White
  $srcFolder = (Get-Folder)
  Write-Host $srcFolder
  Write-Host 'Choose output folder: ' -NoNewline -BackgroundColor DarkGreen -ForegroundColor White
  Do {$dstFolder = (Get-Folder)} While($dstFolder -eq $srcFolder)
  Write-Host $dstFolder

  # PowerShell doesn't care how many stacked '\' there are in a path, multiples will always be seen as one
  # ie. C:\\\\Windows = C:\Windows
  # Means you can just add to the path without worrying about the trailing character

  # Collect all files bigger than 3kB
  $aFiles = (Get-ChildItem -Include *.html,*.htm,*.mht -Path ($srcFolder + "\*") -Recurse | Where-Object {$_.Length -gt 3kb} )
  for($i = 0; $i -lt $aFiles.Count; $i++) {
    $inFile = [string]$aFiles[$i]
    # Substitute destination folder for source folder and tack .pdf on the end
    # Could probably replace the extension instead ... but laziness and all that
    $outFile = $dstFolder + $inFile.Replace($srcFolder, "") + '.pdf'
    # If output file doesn't exist then process
    if (!(Test-Path $outFile)) {
      # Grab the parent of the output file, create the folder structure if it doesn't exist
      $temp = Split-Path $outFile -Parent
      if (!(Test-Path $temp)) {
        New-Item $temp -ItemType Directory | Out-Null
      }

      Write-Host 'File:' $inFile -BackgroundColor DarkBlue -ForegroundColor Yellow
      # Switch command based on the last 3 characters of the file name,
      # anything that isn't 'mht' gets processed as htm(l)
      switch ($inFile.Substring([Math]::Max($inFile.Length - 3, 0))) {
        mht {
          $args = "`"$($inFile)`" --headless --print-to-pdf=`"$($outFile)`""
          Start-Process -FilePath "C:\Program Files (x86)\Google\Chrome\Application\chrome.exe" -Wait -NoNewWindow -ArgumentList $args
        }
        default {
          $args = "-p -q `"$($inFile)`" `"$($outFile)`""
          Start-Process -FilePath ".\wkhtmltopdf.exe" -Wait -NoNewWindow -ArgumentList $args
        }
      }
    }
  }
}

Write-Host ''
Write-Host 'Close window to exit ...'
cmd /c pause | out-null
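
On the "could probably replace the extension instead" comment in the script: a rough sketch of how that might look, using the .NET `[System.IO.Path]::ChangeExtension` method. `Get-PdfPath` is a hypothetical helper name, not something from the script, and it assumes the input file really does live under the source folder.

```powershell
# Hypothetical helper: maps a source file to its output path, keeping the
# sub-folder structure and replacing the extension with .pdf.
function Get-PdfPath {
  param(
    [Parameter(Mandatory)][string]$InFile,
    [Parameter(Mandatory)][string]$SrcFolder,
    [Parameter(Mandatory)][string]$DstFolder
  )
  # Path of the file relative to the source folder, without leading separators
  # (assumes $InFile starts with $SrcFolder)
  $relative = $InFile.Substring($SrcFolder.Length).TrimStart([char[]]'\/')
  # ChangeExtension swaps .htm/.html/.mht for .pdf instead of appending it
  $relativePdf = [System.IO.Path]::ChangeExtension($relative, '.pdf')
  return [System.IO.Path]::Combine($DstFolder, $relativePdf)
}

Get-PdfPath -InFile 'C:\source\t2\page.htm' -SrcFolder 'C:\source' -DstFolder 'C:\result'
```

One caveat with replacing rather than appending: `page.htm` and `page.html` in the same folder would both map to `page.pdf`, and since the script skips output files that already exist, the second one would silently not be converted. Tacking `.pdf` on the end avoids that, so the lazy way may actually be the safer one.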

Wow, 4wd! Thank you! ;)
It is working for html and htm but not for mht, as wkhtmltopdf can't convert from mht to pdf.

Sorry, bit busy prepping to go overseas atm, if I have time in the next day or two I'll clean it up.
--- End quote ---
No problem. By that time I will do some tests. Many thanks in advance. ;)

Added *.htm to the Get-ChildItem filter so no need to edit and run separately, (not tested but works in other scripts - just remove it if there's a problem).

As for .mht, you could use PowerShell to pass them through the IE core to output as PDF ... so goes the theory.

Alternatively, convert to HTML first then run the script, interesting blog post:

The solution to meet the requirement need not necessarily be complex. Try going to the lowest common denominator - e.g., the data type. For example, I have been saving documents and web pages as .html and now (usually) .mht/.mhtml for years and searching them successfully with WDS (Windows Desktop Search) and GDS (Google Desktop Search). The files are all backed up (synced) to Google Drive.

Interesting points:

* The search and preview of these files in Google Drive itself are not much use though, as Google Drive seems to have become somewhat proprietary in the way it enforces its preferred proprietary and/or Google Docs extensions.
Moreover, the user is obliged to risk permanent degradation of their data if they convert to Google Docs format(s) from other formats.
* The best browsers for viewing text and images in .html, .mht and .mhtml seem to be Internet Explorer :Thmbsup: and Firefox (not sure about the latest Firefox though). Chrome :down: doesn't seem to do it at all well, and Brave's capability :down: appears to be nonexistent. :o
* Other tools for viewing these files include Everything (search tool) and xplorer² (Windows Explorer alternative), and not forgetting Universal Viewer.

