Author Topic: DONE: Mass convert already locally saved html (+htm +mht) files to pdf (Read 21425 times)

jity2 · « **on:** March 14, 2019, 12:53 PM »

Dear all,

I would like to convert the html files in my archives to pdf (text OCRed + locally saved related images included) so that I can run keyword searches on them in Google Drive.
The related images saved with each html file (usually in a companion folder) need to be included, as well as the URLs present in the html. I would prefer a tool that works offline, since everything I need is already saved in the local html files; that way it doesn't waste time trying to fetch missing URLs.

My archives, covering about 20 years (many thousands of files), were mostly saved with Internet Explorer (Maxthon) for a few years, then mostly with Firefox, and now with Google Chrome (and httrack).

The idea: I choose one big folder "Source" (usually a month of archives); the tool scans by itself all the html, htm and mht files in all the subfolders, then creates all the converted pdf files in a big folder "Target", with the original names and in the same subfolders.
Example :
C:\Source\2009\2019-04\15\2009_04_15_075256.html
...
C:\Target\2009\2019-04\15\2009_04_15_075256.pdf
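
As a rough illustration of the Source-to-Target mapping described above, here is a minimal Python sketch. The function name `target_pdf_path` is hypothetical and not part of any tool mentioned in this thread; it only shows how the relative subfolder path can be carried over while the extension is swapped for .pdf.

```python
from pathlib import PureWindowsPath


def target_pdf_path(source_root: str, target_root: str, html_file: str) -> PureWindowsPath:
    """Mirror html_file's location under source_root into target_root, with a .pdf suffix."""
    # Path of the file relative to the Source folder, e.g. 2009\2019-04\15\name.html
    relative = PureWindowsPath(html_file).relative_to(PureWindowsPath(source_root))
    # Same relative path under Target, with the last extension replaced by .pdf
    return PureWindowsPath(target_root) / relative.with_suffix(".pdf")


if __name__ == "__main__":
    print(target_pdf_path(
        r"C:\Source",
        r"C:\Target",
        r"C:\Source\2009\2019-04\15\2009_04_15_075256.html",
    ))
```

One caveat of swapping the extension: page.htm and page.html in the same folder would map to the same pdf name, so appending .pdf to the full file name instead may be the safer choice.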

I have tried wkhtmltopdf, which is based on WebKit (Safari https://github.com/w...tmltopdf/issues/3163), with the following script (based on an old AHK script found here):

[Select]

WorKingDir := "C:\prog\wkhtmltox\bin"
pdParams := "wkhtmltopdf.exe "
FileSelectFolder,SourcePath,,0,Select Source Folder
If SourcePath =
ExitApp

FileSelectFolder,TargetPath,*%SourcePath%,0,Select Target Folder
If TargetPath =
ExitApp

If (SourcePath==TargetPath){
Msgbox 0x40000,, % "SourcePath and TargetPath cant be the same " TargetPath
ExitApp
}

Loop, Files, %SourcePath%\*.htm, R
{
SplitPath, A_LoopFileFullPath, , , , OutNameNoExt
pCmd := pdParams """" A_LoopFileFullPath """" " " """" TargetPath "\" OutNameNoExt "_.pdf" """"
RunWait % comspec " /c " pCmd , % WorKingDir
}
ExitApp

The results seem OK, but here are the problems that I have found:
- The output folders are not created: wkhtmltopdf puts all the created pdf files into the Target folder without subfolders. This is a problem when html files share the same name, as the earlier pdf gets overwritten!
Feature request: create output folders if necessary
https://github.com/w...tmltopdf/issues/2421

- Sometimes it creates small unnecessary pdf files (2 KB!). I can delete them later. I think they appear because saving a webpage with CTRL+S also creates some small htm files in a related folder.
Example:
C:\A\save01.html
C:\A\save01\image.gif
C:\A\save01\image.htm
...
so in fact this is normal and ok !

- I have tried to implement the following trick :
offline mode: does not try to look for missing component online for locally saved html files
https://github.com/w...tmltopdf/issues/3294

I have replaced this line of code:

[Select]

pdParams := "wkhtmltopdf.exe "

with :

[Select]

pdParams := "wkhtmltopdf.exe --proxy=http://127.0.0.1:0 "

Alas, it didn't work: it still slowly tries to fetch missing URLs online.

- No silent mode: the flashing cmd windows don't let me keep working on my computer!

If I am not using the right tool, I'd be pleased to try other ideas (maybe based on other rendering engines?).

I realize that no method is error-free, so text or images may not be rendered perfectly every time, but as long as I get most of them it will be fine.

Many thanks in advance

Jity2

Win 8.1 64bits home

Shades · « **Reply #1 on:** March 15, 2019, 01:07 AM »

There is a piece of software called Pandoc.

It converts many text-based formats to other text-based formats, HTML to PDF among them. It is available for all the major operating systems. It is freeware and really good at what it does. However, it is a command-line tool, and that makes it immediately software non grata to some. A manual is included and likely you'll need to install GhostScript (also freeware for all major OS's) for PDF. The number of parameters you can adjust is staggering, and while that may frighten you a bit, the default values have worked well for me on the occasions that I used Pandoc.

Both the offline and the online documentation are easy to follow. As it is a command-line tool, you can script it to go through your whole collection automatically, even at timed intervals if you want that as well.

tomos · « **Reply #2 on:** March 15, 2019, 03:19 AM »

@jity2 you mention OCR - what for? Are there some images with text involved?

jity2 · « **Reply #3 on:** March 15, 2019, 03:32 AM »

Hi,
Thanks Shades. I am trying Pandoc right now!

@Tomos: I only mean that when the html contains text, that text should stay real text in the pdf file, so it can be read (or later indexed by Google Drive).
There is no need to OCR the image files contained in the html files.
I am not sure I am being clear! I just don't want an image-only pdf file as a result.

Thanks in advance

jity2 · « **Reply #4 on:** March 15, 2019, 04:35 AM »

I did some Pandoc tests but I keep getting the same error :

Code example:

[Select]

pandoc C:\prog\pandoc\pandoc-2.7.1-windows-x86_64\t5\20170611_074645.htm -t latex --pdf-engine=xelatex -s -o C:\prog\pandoc\pandoc-2.7.1-windows-x86_64\t5\20170611_074645.pdf

Error:

[screenshot of the error message, not preserved]
I did some google searches but I am stuck!

Thanks in advance

jity2 · « **Reply #5 on:** March 15, 2019, 05:13 AM »

I just tested with Chrome Headless browser :
(Adapted from https://superuser.com/a/1211603/27956)
[windows+R]
cmd

[Select]

cd C:\Program Files (x86)\Google\Chrome\Application
chrome --headless --print-to-pdf="C:\result\20170619_075623.pdf" "C:\source\t2\20170619_075623.htm"

I now need to adapt the AHK script above.
Edit: Here is what I have tried, but I am stuck!

[Select]

WorKingDir := "C:\Program Files (x86)\Google\Chrome\Application"
pdParams := "chrome.exe --headless --print-to-pdf= "
FileSelectFolder,SourcePath,,0,Select Source Folder
If SourcePath =
ExitApp

FileSelectFolder,TargetPath,*%SourcePath%,0,Select Target Folder
If TargetPath =
ExitApp

If (SourcePath==TargetPath){
Msgbox 0x40000,, % "SourcePath and TargetPath cant be the same " TargetPath
ExitApp
}

Loop, Files, %SourcePath%\*.htm, R
{
SplitPath, A_LoopFileFullPath, , , , OutNameNoExt
pCmd := pdParams """" A_LoopFileFullPath """" " " """" TargetPath "\" OutNameNoExt "_.pdf" """"
RunWait % comspec " /c " pCmd , % WorKingDir
}
ExitApp

Thanks in advance

4wd · « **Reply #6 on:** March 15, 2019, 07:50 AM »

Something quick in PowerShell using wkhtmltopdf, (put the executable in the same directory as the script), and Chrome.

Recursively converts .html, .htm, and .mht to PDF files.

Btw, if it looks familiar, 90% came from here.

Run it from a PoSh console or use a shortcut with the following as the Target, (assuming shortcut in the same folder as the script):

Code: Text [Select]

%SystemRoot%\system32\WindowsPowerShell\v1.0\powershell.exe -sta -NoProfile -ExecutionPolicy Bypass -File "CTP.ps1"

CTP.ps1

Code: PowerShell [Select]

<#
  CTP.ps1
  
  Convert .htm(l) and .mht to PDF
#>
 
Function Get-Folder {
  Add-Type -AssemblyName System.Windows.Forms
  $FolderBrowser = New-Object System.Windows.Forms.FolderBrowserDialog
  [void]$FolderBrowser.ShowDialog()
  $temp = $FolderBrowser.SelectedPath
  If($temp -eq '') {Exit}
  Return $temp
}  
 
If($PSVersionTable.PSVersion.Major -lt 3) {
  Write-Host '** Script requires at least Powershell V3 **'
} else {
  Write-Host 'Choose input folder: ' -NoNewline -BackgroundColor DarkGreen -ForegroundColor White
  $srcFolder = (Get-Folder)
  Write-Host $srcFolder
  Write-Host 'Choose output folder: ' -NoNewline -BackgroundColor DarkGreen -ForegroundColor White
  Do {$dstFolder = (Get-Folder)} While($dstFolder -eq $srcFolder)
  Write-Host $dstFolder
 
  # PowerShell doesn't care how many stacked '\' there are in a path, multiples will always be seen as one
  # ie. C:\\\\Windows = C:\Windows
  # Means you can just add to the path without worrying about the trailing character
  # Collect all files bigger than 3kB
  $aFiles = (Get-ChildItem -Include *.html,*.htm,*.mht -Path ($srcFolder + "\*") -Recurse | Where-Object {$_.Length -gt 3kb} )
  for($i = 0; $i -lt $aFiles.Count; $i++) {
    $inFile = [string]$aFiles[$i]
# Substitute destination folder for source folder and tack .pdf on the end
# Could probably replace the extension instead ... but laziness and all that
    $outFile = $dstFolder + $inFile.Replace($srcFolder, "") + '.pdf'
# If output file doesn't exist then process
    if (!(Test-Path $outFile)) {
# Grab the parent of the output file, create the folder structure if it doesn't exist
      $temp = Split-Path $outFile -Parent
      if (!(Test-Path $temp)) {
        New-Item $temp -ItemType Directory | Out-Null
      }
 
      Write-Host 'File:' $inFile -BackgroundColor DarkBlue -ForegroundColor Yellow
# Switch command based on the last 3 characters of the file name, anything that isn't 'mht' gets processed
# as htm(l)
      switch ($inFile.Substring([Math]::Max($inFile.Length - 3, 0))) {
        mht {
          $args = "`"$($inFile)`" --headless --print-to-pdf=`"$($outFile)`""
          Start-Process -FilePath "C:\Program Files (x86)\Google\Chrome\Application\chrome.exe" -Wait -NoNewWindow -ArgumentList $args
          }
        default {
          $args = "-p 127.0.0.1 -q `"$($inFile)`" `"$($outFile)`""
          Start-Process -FilePath ".\wkhtmltopdf.exe" -Wait -NoNewWindow -ArgumentList $args
          }
      }
    }
  }
}
Write-Host ''
Write-Host 'Close window to exit ...'
cmd /c pause | out-null
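
For readers who don't use PowerShell, the converter choice made in the script above can be sketched in Python. This is only an illustration under stated assumptions: the function name `build_command` is hypothetical, and the executable paths and flags are simply copied from the script above (Chrome for .mht, wkhtmltopdf for everything else).

```python
from pathlib import PureWindowsPath
from typing import List

# Assumed locations, taken from the PowerShell script above; adjust for your machine.
CHROME = r"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe"
WKHTMLTOPDF = r".\wkhtmltopdf.exe"


def build_command(in_file: str, out_file: str) -> List[str]:
    """Return the argument list for converting in_file to out_file."""
    if PureWindowsPath(in_file).suffix.lower() == ".mht":
        # Chrome in headless mode prints the page straight to a pdf file
        return [CHROME, in_file, "--headless", f"--print-to-pdf={out_file}"]
    # Anything else is treated as htm(l): -p sets the proxy, -q keeps wkhtmltopdf quiet
    return [WKHTMLTOPDF, "-p", "127.0.0.1", "-q", in_file, out_file]


if __name__ == "__main__":
    print(build_command(r"C:\Source\page.mht", r"C:\Target\page.mht.pdf"))
```

Passing the result to `subprocess.run(...)` as a list avoids the manual quote escaping that the PowerShell argument strings need for paths containing spaces.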

jity2 · « **Reply #7 on:** March 15, 2019, 08:27 AM »

Wow 4wd! Thank you !

It works for html and htm, but not for mht, as wkhtmltopdf can't convert mht to pdf.

Sorry, bit busy prepping to go overseas atm, if I have time in the next day or two I'll clean it up.

No problem. By that time I will do some tests. Many thanks in advance.

4wd · « **Reply #8 on:** March 15, 2019, 05:41 PM »

Added *.htm to the Get-ChildItem filter so no need to edit and run separately, (not tested but works in other scripts - just remove it if there's a problem).

As for .mht, you could use PowerShell to pass them through the IE core to output as PDF ... so goes the theory.

Alternatively, convert to HTML first then run the script, interesting blog post: http://raywoodcocksl...-html-mht-files.html

IainB · « **Reply #9 on:** March 15, 2019, 05:43 PM »

The solution to meet the requirement need not necessarily be complex. Try going to the lowest common denominator - e.g., the data type. For example, I have been saving documents and web pages as .html and now (usually) .mht/.mhtml for years and searching them successfully with WDS (Windows Desktop Search) and GDS (Google Desktop Search). The files are all backed up (synced) to Google Drive.

Interesting points:

- The search and preview of these files in Google Drive itself is not much use, though, as it seems to have become somewhat proprietary in the way it enforces the preferred proprietary and/or Google Docs extensions.
- Moreover, the user is obliged to risk permanent degradation of their data if they convert to Google Docs format(s) from other formats.
- The best browsers for viewing text and images in .html, .mht and .mhtml seem to be Internet Explorer and Firefox (not sure about the latest Firefox though). Chrome doesn't seem to do it at all well, and Brave's capability appears to be nonexistent.
- Other tools for viewing these files include Everything (search tool) and xplorer² (Windows Explorer alternative), not forgetting Universal Viewer.

jity2 · « **Reply #10 on:** March 16, 2019, 06:48 AM »

Hi,
Many thanks 4wd.

Much appreciated.

I did some tests (one old month saved with IE and one old month saved with Firefox) with your updated script and it works fine for html and htm files at the same time.

As I have quite a lot of files, converting the htm and html files should take a few months! But it seems I can speed up the conversion by running several PowerShell instances (each a copied and renamed script, shortcut included).

I have added a few manual steps:
Modified from https://stackoverflo...irectory-recursively, here is the PowerShell code that I use to remove the pdf files smaller than 3 KB (created by wkhtmltopdf; in my case they contain no text):

[Select]

Get-ChildItem $path -Filter *.pdf -recurse -file | ? {$_.length -lt 3000} | % {Remove-Item $_.fullname}
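
Before deleting anything, it can be reassuring to list what would be removed. The following Python sketch does the same size filtering as the one-liner above but only reports the matches; `small_files` and the `C:\Target` folder are hypothetical names used for illustration, and the 3000-byte limit is the one used above.

```python
import os
from pathlib import Path
from typing import Iterator


def small_files(root: Path, suffix: str, max_bytes: int) -> Iterator[Path]:
    """Yield every file under root with the given suffix that is smaller than max_bytes."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            candidate = Path(dirpath) / name
            # Case-insensitive extension check, then the size test
            if candidate.suffix.lower() == suffix and candidate.stat().st_size < max_bytes:
                yield candidate


if __name__ == "__main__":
    # Dry run: print the candidates instead of deleting them.
    for pdf in small_files(Path(r"C:\Target"), ".pdf", 3000):
        print(pdf)
```

Once the list looks right, calling `pdf.unlink()` in the loop would perform the actual deletion.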

Then I use the freeware "Remove Empty Directories" http://www.jonasjohn.de/red.htm, which removes all empty directories in all the subfolders. It is very powerful IMHO.

I don't know if this is possible, but it would be great if the PowerShell script could skip htm and html files that are smaller than 3 KB. Thanks in advance.

You also have a great memory for my 2016 request.

But I must acknowledge that I wouldn't have been able to modify the remaining 10% of the new code!!!

For mht files to pdf:
Thanks for the mht to html link. It reminds me that finding a simple solution can lead to many trials!
Mine were not created with Internet Explorer; they were, and still are, created with Google Chrome. In my manual tests, these mht files are often better displayed in Chrome than in I.E.

Here are my tests :
[windows+R]
cmd
[Select]
cd C:\Program Files (x86)\Google\Chrome\Application

Apparently Chrome also understands if I change the code from
chrome --headless --print-to-pdf="C:\result\20170619_075623.pdf" "C:\source\t2\20170619_075623.htm"
to
chrome --headless "C:\source\t2\20170619_075623.htm" --print-to-pdf="C:\result\20170619_075623.pdf"

After some tests (thanks to https://www.autohotk...iewtopic.php?t=26819) I got working code! It works, but it also copies other files (png...) into the target folder:

[Select]

WorKingDir := "C:\Program Files (x86)\Google\Chrome\Application"
pdParams := "chrome.exe --headless "
FileSelectFolder,SourcePath,,0,Select Source Folder
If SourcePath =
ExitApp

FileSelectFolder,TargetPath,*%SourcePath%,0,Select Target Folder
If TargetPath =
ExitApp

pdParams := "chrome.exe --headless "
WorKingDir := "C:\Program Files (x86)\Google\Chrome\Application"
RunWait % comspec " /c xCopy """ SourcePath A_loopField """ """ TargetPath A_loopField """ *.mht /s /i /y",, Hide

Loop, Files, % TargetPath "\*.mht" , R
{
SplitPath, A_LoopFileFullPath, name, dir, ext, name_no_ext
outPDF_repared := dir "\" name_no_ext "" ".pdf"
pCmd := pdParams " " """" A_LoopFileFullPath """" " " """" "--print-to-pdf="outPDF_repared """"
RunWait % comspec " /c " pCmd , % WorKingDir , Hide
FileAppend, % "Result pdrepair`n" outPDF_repared "`n", % A_Temp "\LOG_pdrepair.txt"
FileRead, outLOG, % TargetPath "\LOG.txt"
FileAppend, % outLOG "`n" , % A_Temp "\LOG_pdrepair.txt"
FileDelete, % A_LoopFileFullPath
}

Msgbox 0x40000,, % "END!",1

ExitApp

I have tried to change:
RunWait % comspec " /c xCopy """ SourcePath A_loopField """ """ TargetPath A_loopField """ /s /i /y",, Hide
with
RunWait % comspec " /c xCopy """ SourcePath A_loopField """ """ TargetPath A_loopField """ *.mht /s /i /y",, Hide
or
RunWait % comspec " /c xCopy """ SourcePath A_loopField """ *.mht """ TargetPath A_loopField """ /s /i /y",, Hide
or
RunWait % comspec " /c xCopy *.mht """ SourcePath A_loopField """ """ TargetPath A_loopField """ /s /i /y",, Hide
or
RunWait % comspec " /c xCopy "\*.mht" """ SourcePath A_loopField """ """ TargetPath A_loopField """ /s /i /y",, Hide
Alas I am stuck !

So I have tried to modify your Powershell script :

[Select]

<#
CTP.ps1

Recursively convert *.mht to PDF.
#>

Function Get-Folder {
Add-Type -AssemblyName System.Windows.Forms
$FolderBrowser = New-Object System.Windows.Forms.FolderBrowserDialog
[void]$FolderBrowser.ShowDialog()
$temp = $FolderBrowser.SelectedPath
If($temp -eq '') {Exit}
If(-Not $temp.EndsWith('\')) {$temp = $temp + '\'}
Return $temp
}

If($PSVersionTable.PSVersion.Major -lt 3) {
Write-Host '** Script requires at least Powershell V3 **'
} else {
Write-Host 'Choose folder with PDF files: ' -NoNewline -BackgroundColor DarkGreen -ForegroundColor White
$srcFolder = (Get-Folder)
Write-Host $srcFolder
Write-Host 'Choose output folder: ' -NoNewline -BackgroundColor DarkGreen -ForegroundColor White
Do {$dstFolder = (Get-Folder)} While($dstFolder -eq $srcFolder)
Write-Host $dstFolder

$aFiles = (Get-ChildItem -Include *.mht -Path ($srcFolder + "*") -Recurse)
for($i = 0; $i -lt $aFiles.Count; $i++) {
$inFile = [string]$aFiles[$i]
Write-Host 'File:' $inFile -BackgroundColor DarkBlue -ForegroundColor Yellow
$outFile = $dstFolder + $inFile.Replace($srcFolder, "") + '.pdf'
$temp = Split-Path $outFile -Parent
if (!(Test-Path $temp)) {
New-Item $temp -ItemType Directory | Out-Null
}
$args = "`"$($infile)`" chrome --headless --print-to-pdf=`"$($outFile)`""
Start-Process -FilePath "C:\Program Files (x86)\Google\Chrome\Application\chrome.exe" -Wait -NoNewWindow -ArgumentList $args
}
}

See "--print-to-pdf=". Alas this doesn't work !

@IainB: I am saving mht like you now (https://www.donation....msg417446#msg417446). I no longer use Google Docs for uploaded files (I used to save htm files from Firefox, but I have stopped). I now mainly upload pdf files to Google Drive.

Many thanks in advance

4wd · « **Reply #11 on:** March 16, 2019, 08:19 AM »

I don't know if this is possible, but it would be great if the PowerShell script could skip htm and html files that are smaller than 3 KB.
-jity2 (March 16, 2019, 06:48 AM)

You already have the answer, same idea as deleting the small PDFs.

Code: PowerShell [Select]

$aFiles = (Get-ChildItem -Include *.html,*.htm -Path ($srcFolder + "*") -Recurse | Where-Object {$_.Length -gt 3kb} )

As for Chrome, I couldn't get it to work from PowerShell either, I'll have another look when I have time.

But you do have the args wrong, should be:

Code: PowerShell [Select]

$args = "`"$($infile)`" --headless --print-to-pdf=`"$($outFile)`""

Just the arguments for the command.

Could try adding a working directory also:

Code: PowerShell [Select]

Start-Process -FilePath "C:\Program Files (x86)\Google\Chrome\Application\chrome.exe" -Wait -NoNewWindow -ArgumentList $args -WorkingDirectory "C:\Program Files (x86)\Google\Chrome\Application"

jity2 · « **Reply #12 on:** March 16, 2019, 09:24 AM »

Wow this is fantastic 4wd !

For the second part, I have seen no visible difference when adding the working directory.

Many thanks. Much appreciated.

So as a summary:
To convert htm and html to pdf, use this PowerShell script written by 4wd (if PowerShell is new to you, have a look here https://www.donation....msg399588#msg399588):

Code: PowerShell [Select]

<#
  CTP.ps1
  
  Recursively convert *.htm and *.html to PDF + exclude htm and html files that are smaller than 3kb.
#>
 
Function Get-Folder {
  Add-Type -AssemblyName System.Windows.Forms
  $FolderBrowser = New-Object System.Windows.Forms.FolderBrowserDialog
  [void]$FolderBrowser.ShowDialog()
  $temp = $FolderBrowser.SelectedPath
  If($temp -eq '') {Exit}
  If(-Not $temp.EndsWith('\')) {$temp = $temp + '\'}
  Return $temp
}  
 
If($PSVersionTable.PSVersion.Major -lt 3) {
  Write-Host '** Script requires at least Powershell V3 **'
} else {
  Write-Host 'Choose input folder: ' -NoNewline -BackgroundColor DarkGreen -ForegroundColor White
  $srcFolder = (Get-Folder)
  Write-Host $srcFolder
  Write-Host 'Choose output folder: ' -NoNewline -BackgroundColor DarkGreen -ForegroundColor White
  Do {$dstFolder = (Get-Folder)} While($dstFolder -eq $srcFolder)
  Write-Host $dstFolder
 
  $aFiles = (Get-ChildItem -Include *.html,*.htm -Path ($srcFolder + "*") -Recurse | Where-Object {$_.Length -gt 3kb} )
  for($i = 0; $i -lt $aFiles.Count; $i++) {
    $inFile = [string]$aFiles[$i]
    Write-Host 'File:' $inFile -BackgroundColor DarkBlue -ForegroundColor Yellow
    $outFile = $dstFolder + $inFile.Replace($srcFolder, "") + '.pdf'
    $temp = Split-Path $outFile -Parent
    if (!(Test-Path $temp)) {
      New-Item $temp -ItemType Directory | Out-Null
    }
    $args = "`"$($infile)`" -p 127.0.0.1 `"$($outFile)`""
    Start-Process -FilePath ".\wkhtmltopdf.exe" -Wait -NoNewWindow -ArgumentList $args
  }
}

To convert mht to pdf, use this PowerShell script written by 4wd (for mht files created with Google Chrome):

Code: PowerShell [Select]

<#
  CTP.ps1
  
  Recursively convert *.mht to PDF.
#>
 
Function Get-Folder {
  Add-Type -AssemblyName System.Windows.Forms
  $FolderBrowser = New-Object System.Windows.Forms.FolderBrowserDialog
  [void]$FolderBrowser.ShowDialog()
  $temp = $FolderBrowser.SelectedPath
  If($temp -eq '') {Exit}
  If(-Not $temp.EndsWith('\')) {$temp = $temp + '\'}
  Return $temp
}  
 
If($PSVersionTable.PSVersion.Major -lt 3) {
  Write-Host '** Script requires at least Powershell V3 **'
} else {
  Write-Host 'Choose input folder: ' -NoNewline -BackgroundColor DarkGreen -ForegroundColor White
  $srcFolder = (Get-Folder)
  Write-Host $srcFolder
  Write-Host 'Choose output folder: ' -NoNewline -BackgroundColor DarkGreen -ForegroundColor White
  Do {$dstFolder = (Get-Folder)} While($dstFolder -eq $srcFolder)
  Write-Host $dstFolder
 
  $aFiles = (Get-ChildItem -Include *.mht -Path ($srcFolder + "*") -Recurse)
  for($i = 0; $i -lt $aFiles.Count; $i++) {
    $inFile = [string]$aFiles[$i]
    Write-Host 'File:' $inFile -BackgroundColor DarkBlue -ForegroundColor Yellow
    $outFile = $dstFolder + $inFile.Replace($srcFolder, "") + '.pdf'
    $temp = Split-Path $outFile -Parent
    if (!(Test-Path $temp)) {
      New-Item $temp -ItemType Directory | Out-Null
    }
    $args = "`"$($infile)`" --headless --print-to-pdf=`"$($outFile)`""
    Start-Process -FilePath "C:\Program Files (x86)\Google\Chrome\Application\chrome.exe" -Wait -NoNewWindow -ArgumentList $args -WorkingDirectory "C:\Program Files (x86)\Google\Chrome\Application"
  }
}

Many thanks again to 4wd for the great help.

See ya

4wd · « **Reply #13 on:** March 16, 2019, 07:49 PM »

Many thanks again to 4wd for the great help.
-jity2 (March 16, 2019, 09:24 AM)

Thanks, edited my OP, now does the lot, (.html, .htm, .mht).

Updated

jity2 · « **Reply #14 on:** March 17, 2019, 05:58 AM »

Wow ! Many thanks 4wd. This is working like a charm !

After several tests I realize that I have to close your script(s) each night, even if the job has not finished.
Note: wkhtmltopdf uses very little CPU and Chrome far more, so I run several copies of your script at the same time (especially for folders containing htm and html files).
The next day, when I restart the script(s), it would be great if it could skip the conversion when the pdf file already exists in the destination folder.
Currently I have to dig through and move files and specific folders by hand, to avoid spending a few hours just getting back to where it stopped.

I have tried to insert this code at row #43:

Code: PowerShell [Select]

if {($inFile.Substring([Math]::Max($inFile.Length - 3, 0))) = $outFile
next i
}

After testing it, alas this proves that ...I am still not a coder !!

Many thanks in advance

4wd · « **Reply #15 on:** March 17, 2019, 06:57 AM »

Just need to test for the existence of the output file:

Code: PowerShell [Select]

if (!(Test-Path $outFile)) {
...
}

Added, try it now, (haven't tested it but it should work).
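
The same "skip what is already done" idea, sketched in Python for illustration only. The name `pending` is hypothetical; the check mirrors the Test-Path test above and nothing more.

```python
from pathlib import Path
from typing import Iterable, Iterator, Tuple


def pending(pairs: Iterable[Tuple[Path, Path]]) -> Iterator[Tuple[Path, Path]]:
    """Yield only the (input, output) pairs whose output file does not exist yet."""
    for in_file, out_file in pairs:
        if not out_file.exists():
            yield in_file, out_file


if __name__ == "__main__":
    # Hypothetical example pair; a real run would build the list by scanning the source folder.
    jobs = [(Path(r"C:\Source\a.html"), Path(r"C:\Target\a.html.pdf"))]
    for in_file, out_file in pending(jobs):
        print("would convert", in_file, "->", out_file)
```

Note that an existence check cannot tell a complete pdf from one that was cut off when the script was closed mid-conversion, so the last file written before an interruption may be worth deleting before restarting.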

jity2 · « **Reply #16 on:** March 17, 2019, 01:22 PM »

I have tested your update and this is so great 4wd !

Many thanks.

Inspired by https://stackoverflo...fore-printing-to-pdf, I have added these flags for Chrome (see row #49):

Code: PowerShell [Select]

$args = "`"$($inFile)`" --headless --run-all-compositor-stages-before-draw --virtual-time-budget=10000 --print-to-pdf=`"$($outFile)`""

For me it doesn't seem to change anything! Maybe it will help someone in the future...

Thank you again

See ya

skwire · « **Reply #17 on:** March 19, 2019, 08:13 AM »

Thanks, 4wd. Moving thread to the Finished section.
