  • September 24, 2016, 09:03:15 PM
Topic: IDEA: Batch merge many pdf (result : only one big pdf per subfolder)?

jity2

  • Charter Member
  • Joined in 2006
  • Posts: 98
Dear all,

I need to merge many PDF files located in thousands of subfolders (several levels deep) that contain PDFs and other files. In the end, I need only one big PDF file per subfolder.



Example: “C:\Main_folder\” contains:

C:\Main_folder\subfolder_wgs\jshhd545.pdf

C:\Main_folder\subfolder_wgs\jshhd545.htm

C:\Main_folder\subfolder_wgs\ejkehe5485.pdf



C:\Main_folder\subfolder_ghdfdhd\jdjdhjd5545.pdf

C:\Main_folder\subfolder_ghdfdhd\jdsdjdh44.pdf



C:\Main_folder\subfolder_yuege255\uejgd56564\kdfhk5465.txt

C:\Main_folder\subfolder_yuege255\uejgd56564\kdfhk5465.pdf

…etc.

Desired results:

C:\Main_folder\subfolder_wgs\subfolder_wgs.pdf

C:\Main_folder\subfolder_ghdfdhd\subfolder_ghdfdhd.pdf

C:\Main_folder\subfolder_yuege255\uejgd56564\subfolder_yuege255.pdf  (or C:\Main_folder\subfolder_yuege255\uejgd56564\uejgd56564.pdf)

etc…



Note: It would be great if the results could be written to a new top-level folder (C:\Main_folder2\, for instance).

I am on Windows 8.1 64-bit.
Free or open-source solutions preferred.
Thanks in advance ;)
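
The naming rule requested above (one merged PDF per subfolder, named after that subfolder, under a mirrored tree in a second root) can be sketched in PowerShell. This is only an illustration of the rule, not anyone's posted solution: `Get-MergedPdfPath` is a hypothetical helper name, it works on plain strings without touching the disk, and it assumes the folder really lies at or below the source root.

```powershell
# Hypothetical helper: map a source subfolder to the merged PDF path
# under a mirrored output root. Pure string work, no file system access.
# Assumption: $Folder starts with $SourceRoot (this is not validated).
function Get-MergedPdfPath {
    param(
        [string]$SourceRoot,   # e.g. C:\Main_folder
        [string]$OutputRoot,   # e.g. C:\Main_folder2
        [string]$Folder        # a folder at or below $SourceRoot
    )
    $src = $SourceRoot.TrimEnd('\')
    $out = $OutputRoot.TrimEnd('\')
    $dir = $Folder.TrimEnd('\')

    # Path of the folder relative to the source root ('' for the root itself)
    $relative = $dir.Substring($src.Length).TrimStart('\')

    # The merged file takes the name of the folder that holds the PDFs
    $leaf = $dir.Substring($dir.LastIndexOf('\') + 1)

    if ($relative -eq '') { return "$out\$leaf.pdf" }
    return "$out\$relative\$leaf.pdf"
}

Get-MergedPdfPath 'C:\Main_folder' 'C:\Main_folder2' 'C:\Main_folder\subfolder_yuege255\uejgd56564'
```

With the example tree above, the nested folder maps to C:\Main_folder2\subfolder_yuege255\uejgd56564\uejgd56564.pdf, which is the second of the two naming options mentioned in the request.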

4wd

  • Supporting Member
  • Joined in 2006
  • Posts: 4,394
Something along these lines?

jity2

Thanks "4wd". ;)

I had some difficulties (viruses or hammering websites) finding the programs that you recommended in the thread (http://www.donationc....msg374877#msg374877), but I finally found a workaround with these links:
http://filehippo.com...rsal_extractor/4795/
http://web.archive.o...g/web/20140315000000*/http://www.adultpdf.com/products/txttopdf/txttopdf.exe
https://www.pdflabs....e-2.02-win-setup.exe
Universal Extractor did not work with the PDFtk setup file, but I was able to find the correct file once I installed PDFtk, in: "C:\Program Files (x86)\PDFtk\bin\".

So anyway I was able to test your solution. It works correctly for one folder. ;) I hope you can adapt it to subfolders. ;)
Note: the header is not really needed for me but please do as you prefer. ;)

Thanks in advance for your help ;)
Jity

PS: In my case I don't have password-protected PDF files, but if someone does, the passwords can be removed with the shareware "PDF Password Remover v3.1" (http://www.verypdf.c...ord-remover-com.html) using these instructions:
In Windows, open the Command Prompt, then copy/paste the following (adjust the path C:\...\ to your own):
for /r "C:\test\" %F in (*.pdf) do "C:\Program Files (x86)\PDF Password Remover v3.1\pdfdecrypt.exe" -i "%F"
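
Since the rest of the thread uses PowerShell, the same recursive loop can be sketched there as well. This is a rough equivalent under assumptions, not a tested recipe: both paths are simply copied from the one-liner above, and the only pdfdecrypt.exe option used is the `-i` shown there.

```powershell
# Assumed paths, copied from the cmd one-liner above; adjust both to your setup.
$root    = 'C:\test\'
$decrypt = 'C:\Program Files (x86)\PDF Password Remover v3.1\pdfdecrypt.exe'

# Only attempt anything if both the folder and the tool exist.
if ((Test-Path -LiteralPath $root) -and (Test-Path -LiteralPath $decrypt)) {
    # -Recurse walks all sub-folders, like "for /r"; -Filter keeps only PDFs.
    Get-ChildItem -LiteralPath $root -Recurse -File -Filter *.pdf | ForEach-Object {
        # The call operator (&) runs the tool; PowerShell quotes the path argument.
        & $decrypt -i $_.FullName
    }
} else {
    Write-Host 'Folder or pdfdecrypt.exe not found; nothing done.'
}
```

The `-File` switch needs PowerShell 3 or later, the same minimum that jsPDFt.ps1 asks for.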

4wd

Quote
I had some difficulties (viruses or hammering websites) finding the programs that you recommended in the thread (http://www.donationc....msg374877#msg374877), but I finally found a workaround with these links:
http://filehippo.com...rsal_extractor/4795/
http://web.archive.o...g/web/20140315000000*/http://www.adultpdf.com/products/txttopdf/txttopdf.exe
https://www.pdflabs....e-2.02-win-setup.exe
Universal Extractor did not work with the PDFtk setup file, but I was able to find the correct file once I installed PDFtk, in: "C:\Program Files (x86)\PDFtk\bin\".

Yes, you probably need to update the innounp binary for UniExtract. I run a newer version than the one available on the original site, along with updated extractor binaries. You can get it here (17.51MB - link will expire in 72 hours).

Quote
I hope you can adapt it to subfolders. ;)

Should be easy, I'll have a play around.

4wd

jity2's Strange PDF Thing (jsPDFt)  :P

Code: PowerShell
<#
  jsPDFt.ps1

  Concatenate PDFs in sub-folders
#>

# Show a folder picker and return the chosen path with a trailing backslash
Function Get-Folder {
  Add-Type -AssemblyName System.Windows.Forms
  $FolderBrowser = New-Object System.Windows.Forms.FolderBrowserDialog
  [void]$FolderBrowser.ShowDialog()
  $temp = $FolderBrowser.SelectedPath
  If($temp -eq '') {Exit}   # No folder chosen
  If(-Not $temp.EndsWith('\')) {$temp = $temp + '\'}
  Return $temp
}

# Merge all PDFs found directly in $folder into one PDF under $dest
Function Get-PDF {
  Param(
    [string]$folder,
    [string]$source,
    [string]$dest,
    [string]$command
  )
  If(-Not $folder.EndsWith('\')) {$folder = $folder + '\'}

  # If there are any PDFs in the folder then execute the concatenation
  If(@(Get-ChildItem -Path ($folder + '*') -Include *.pdf -File).Count -gt 0) {
    Write-Host 'Folder:' $folder -BackgroundColor Black -ForegroundColor Yellow
    # Create output file name
    If((Split-Path -Path $folder -Leaf).EndsWith('\')) { # Input was root folder
      $outFile = $dest + '\' + $source.Substring(0, 1) + '.pdf'
    } else {
      $outFile = $dest + $folder.Replace($source, "") + '\' + (Split-Path -Path $folder -Leaf) + '.pdf'
    }
    $outFile = $outFile.Replace('\\', '\')

    # Check if output folder exists and create it if necessary
    If(-Not (Test-Path (Split-Path -Path $outFile))) {
      New-Item (Split-Path -Path $outFile) -Type Directory | Out-Null
    }

    Write-Host 'File:  ' $outFile -BackgroundColor Black -ForegroundColor White
    # Compose argument string
    $arguments = '"' + $folder + '*.pdf" cat output "' + $outFile + '"'
    # Execute pdftk.exe with arguments
    Start-Process -FilePath $command -Wait -ArgumentList $arguments -NoNewWindow
  }
}

$console = $host.UI.RawUI
$size = $console.WindowSize
$size.Width = 80
$size.Height = 30
$console.WindowSize = $size

If($PSVersionTable.PSVersion.Major -lt 3) {
  Write-Host '** Script requires at least Powershell V3 **'
} else {
  Write-Host 'Choose folder with PDF files: ' -NoNewline -BackgroundColor DarkGreen -ForegroundColor White
  $srcFolder = (Get-Folder)
  Write-Host $srcFolder
  Write-Host 'Choose output folder: ' -NoNewline -BackgroundColor DarkGreen -ForegroundColor White
  # Ask again until the output folder differs from the source folder
  Do {$dstFolder = (Get-Folder)} While($dstFolder -eq $srcFolder)
  Write-Host $dstFolder

  # Full path to pdftk.exe which should be in the same folder as Powershell script
  $pdftk = '"' + (Split-Path $SCRIPT:MyInvocation.MyCommand.Path -Parent) + '\pdftk.exe"'
  # List of sub-folders
  $aFolders = @(Get-ChildItem -Path $srcFolder -Recurse -Directory | Select-Object -ExpandProperty Fullname)
  for($i=0; $i -lt $aFolders.Count; $i++) {
    # For each sub-folder call the routine
    Get-PDF $aFolders[$i] $srcFolder $dstFolder $pdftk
  }
  # Finally, call routine on source folder
  Get-PDF $srcFolder $srcFolder $dstFolder $pdftk
}

Write-Host ''
Write-Host 'Close window to exit ...'
cmd /c pause | out-null

Requires the files from PDF Toolkit (PDFtk): either extract them from the setup file using UniExtract, or install it and then copy the files into the same folder as the script (then you can uninstall it).

So your folder will look like this:

[Screenshot: 2016-06-27 17_02_29.png]

Run jsPDFt.ps1 using the shortcut.

Hopefully it'll work - it did here.
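
Because the script expects pdftk.exe in its own folder, a quick pre-flight check can save a confusing failure later. The following is a minimal sketch, not part of jsPDFt.ps1: the variable names are made up for illustration, and falling back to the current directory when `$PSScriptRoot` is empty is an assumption for interactive use.

```powershell
# Sketch: confirm pdftk.exe is beside the script before doing any work.
# $PSScriptRoot is set when this runs from a saved .ps1 file (PowerShell 3+);
# the fallback to the current directory is an assumption for interactive use.
$scriptDir = if ($PSScriptRoot) { $PSScriptRoot } else { (Get-Location).Path }
$pdftkPath = Join-Path -Path $scriptDir -ChildPath 'pdftk.exe'

if (Test-Path -LiteralPath $pdftkPath -PathType Leaf) {
    Write-Host "Found pdftk: $pdftkPath"
} else {
    Write-Host "pdftk.exe not found in $scriptDir - copy the PDFtk files there first."
}
```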

NOTES:
  • No provision for password-protected PDFs; it'll probably die a horrible death while running if it finds one.
  • No check for whether the output file already exists; if it does, it will probably be overwritten.
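
One way to address the second note is to pick an unused file name before pdftk runs. A minimal sketch, assuming a numbered suffix is acceptable; `Get-FreeOutputPath` is a hypothetical helper and is not part of the script above.

```powershell
# Hypothetical helper: return $Path if nothing exists there yet, otherwise
# the first free variant such as "name (2).pdf", "name (3).pdf", ...
function Get-FreeOutputPath {
    param([string]$Path)
    if (-not (Test-Path -LiteralPath $Path)) { return $Path }

    $dir  = [System.IO.Path]::GetDirectoryName($Path)
    $base = [System.IO.Path]::GetFileNameWithoutExtension($Path)
    $ext  = [System.IO.Path]::GetExtension($Path)

    $n = 2
    do {
        # Build the next candidate name and keep counting until one is free
        $candidate = [System.IO.Path]::Combine($dir, "$base ($n)$ext")
        $n++
    } while (Test-Path -LiteralPath $candidate)
    return $candidate
}
```

In jsPDFt.ps1 this could be applied to `$outFile` just before the argument string is composed, so that an earlier result is kept rather than replaced.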

jity2

Dear "4wd",
Many thanks. ;) It works like a charm. ;)
Thank you again ;)
See ya

4wd

You're welcome, I think we can get skwire to move this thread  :)