Screenshot Captor / Re: Scroll capture - Errors and Home button
« on: July 19, 2020, 04:18 AM »
Dear Mouser,

I think I have a similar problem with scrolling window capture (on some large, non-downloadable PDF files shared on Google Drive).
An idea: the "save each capture as a separate image" feature is great, but could it save automatically, without keeping all the captured images in RAM at once? That way, no more RAM problems! ;)
That is, Screenshot Captor would save each screenshot file to the hard drive just before scrolling to the next page. ;)
Then I could split all the saved image files in one go, for instance with XnView (see Tools/Batch processing), and make one big PDF out of them. ;)
Many thanks in advance ;)

Hi 4wd,

Thank you so much for your updated code. ;) This is working great. ;)

Not sure whether you mean folder1 should be completely empty at the end, since files that aren't PDFs will remain, i.e. delete everything in folder1 after running the Check-PDF function.

In which case, what happens to the non-PDF files that were in the folders?
Delete them, or move them with the PDFs?
Keeping whatever is left inside folder1 (as you have already done) is fine. ;)

I did some tests; here is a zip file with examples:
The main bug (see "2-itextsharp.pdfa"): when folder1 contains a folder named "bla.pdf", it causes a stack overflow in PowerShell (PowerShell closes and restarts when I run the script in edit mode).
The other issue is strange filenames (maybe some Asian text?). Edit 1: it is because on my computer the folder path is too long for some of them! Otherwise it renames them fine! ;)
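The script itself isn't quoted here, so this is an assumption: if it lists PDFs by name pattern only (for example `Get-ChildItem -Include *.pdf -Recurse`), a folder called "bla.pdf" is returned along with the real files, and whatever then treats it as a PDF can misbehave. A minimal sketch of a listing that can only return files, using `-File` (PowerShell 3.0+) plus an exact `.pdf` extension check; `Get-PdfFile` and the demo folder are hypothetical:

```powershell
# Hypothetical helper: list only real PDF files below a folder.
# -File never returns directories, so a folder named "bla.pdf" is skipped,
# and -LiteralPath stops "[" or "]" in the root path being read as wildcards.
function Get-PdfFile {
    param(
        [Parameter(Mandatory)]
        [string]$Root
    )
    Get-ChildItem -LiteralPath $Root -File -Recurse |
        Where-Object { $_.Extension -eq '.pdf' }
}

# Demo: a temporary tree with one real PDF and one folder named "bla.pdf".
$demo = Join-Path ([System.IO.Path]::GetTempPath()) ('pdfdemo_' + [guid]::NewGuid())
New-Item -ItemType Directory -Path (Join-Path $demo 'bla.pdf') -Force | Out-Null
Set-Content -LiteralPath (Join-Path $demo 'real.pdf') -Value 'x'

$found = @(Get-PdfFile -Root $demo)
$found | ForEach-Object { $_.Name }    # lists real.pdf only
```

The exact `.pdf` comparison also skips names such as "x.pdfa"; widen the check if those should be processed too.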

The others concern detecting invalid PDF files (see "1-malformed_pdf"; as far as I understand, there are at least 2 problems: 1) invalid PDF file and 2) "The image file format has not been recognized"). From experience this is a complex problem, and I think it is better if I handle it by hand with PDFinfoGUI (*) and remove those files with an Excel macro, since that way I can see very quickly whether I need to download some important files again. So please forget about those. ;)
(*) I look for files with neither "yes" nor "no" in the Encrypted column (and the other columns), then I copy that list, except the important ones.

Also, many thanks for the detailed explanation of long names. I appreciated it. ;) Your current filename-truncation code is already fine for me. ;) Thanks. ;)

It currently doesn't check for the existence of a file with the same name before renaming.
If I understand correctly, this is about "folder1\venise-.pdf" ending up with the same name as a file already in folder3, "folder3\venise.pdf". Renaming the new one (for instance with a counter, "venise1.pdf") would be fine.
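A minimal sketch of the counter idea, assuming the rename step knows the full destination path before moving; `Get-UniquePath` is a hypothetical helper, not part of 4wd's script:

```powershell
# Hypothetical helper: return Destination unchanged if it is free; otherwise
# append 1, 2, 3 ... to the base name ("venise.pdf" -> "venise1.pdf")
# until a name that does not exist yet is found. Expects a full path.
function Get-UniquePath {
    param(
        [Parameter(Mandatory)]
        [string]$Destination
    )
    if (-not (Test-Path -LiteralPath $Destination)) { return $Destination }

    $dir  = [System.IO.Path]::GetDirectoryName($Destination)
    $base = [System.IO.Path]::GetFileNameWithoutExtension($Destination)
    $ext  = [System.IO.Path]::GetExtension($Destination)

    $n = 1
    do {
        $candidate = Join-Path $dir ('{0}{1}{2}' -f $base, $n, $ext)
        $n++
    } while (Test-Path -LiteralPath $candidate)

    return $candidate
}

# Usage in the rename step (variable names are illustrative):
# $target = Get-UniquePath -Destination (Join-Path $folder3 'venise.pdf')
# Move-Item -LiteralPath $source -Destination $target
```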

I have added a small function to delete empty folders in folder3:
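Function Delete-SmallPDF2 {
  param ([string]$folder3, [bool]$delSmall)  # e.g. Delete-SmallPDF2 $folder3 $delSmall
  # Delete PDF files smaller than 2 KB when $delSmall is set (-File skips folders named *.pdf)
  if ($delSmall) {
    Get-ChildItem -LiteralPath $folder3 -Recurse -File | Where-Object {$_.Extension -eq '.pdf' -and $_.Length -lt 2048} | Remove-Item -Force
  }
  # Delete folders with no files anywhere below them, deepest first
  Get-ChildItem -LiteralPath $folder3 -Recurse -Directory |
    Where-Object {@(Get-ChildItem -LiteralPath $_.FullName -Recurse -File).Count -eq 0} |
    Sort-Object {$_.FullName.Length} -Descending |
    Remove-Item -Recurse -Force
}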

Edit 2:
I forgot to mention this error message:
PS C:\Windows\System32\WindowsPowerShell\v1.0> C:\Users\E\Documents\tests\jityPDF.ps1
Add-Type : Cannot bind parameter 'Path' to the target. Exception setting "Path": "Cannot find path 'C:\Windows\System32\WindowsPowerShell\v1.0\itextsharp.dll' because it does not exist."
So I copied "itextsharp.dll" to C:\Windows\System32\WindowsPowerShell\v1.0\itextsharp.dll
This may explain why, when I try to use 4wd's code on another drive (e.g. L:\), even after copying the 7z.dll, 7z.exe and itextsharp.dll files and adapting the code to the new locations, the script shows no errors but does not run the Check-PDF part properly: for some files it moves image-based PDFs to folder2 instead of folder3. So I stay on C:\ ;)
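The path in the error suggests `Add-Type` was given a relative path, which PowerShell resolves against the current location (here the v1.0 folder the console started in) rather than the script's folder. If the 7z.exe/7z.dll paths are built the same way, that might also explain the L:\ behaviour, though this is a guess. A minimal sketch that anchors these files to the script's own folder via `$PSScriptRoot` (PowerShell 3.0+); `$itextDll` and `$sevenZip` are illustrative names:

```powershell
# Resolve helper files against the folder that contains this .ps1 file,
# not the current location (C:\Windows\System32\WindowsPowerShell\v1.0 above).
# $PSScriptRoot is empty when the lines are pasted into a console,
# so fall back to the current location in that case.
$scriptDir = if ($PSScriptRoot) { $PSScriptRoot } else { (Get-Location).Path }

$itextDll = Join-Path $scriptDir 'itextsharp.dll'
$sevenZip = Join-Path $scriptDir '7z.exe'   # e.g. later: & $sevenZip x ...

if (Test-Path -LiteralPath $itextDll) {
    Add-Type -Path $itextDll
} else {
    Write-Warning "itextsharp.dll not found next to the script: $itextDll"
}
```

With this, the copy in C:\Windows\System32\WindowsPowerShell\v1.0 should no longer be needed, as long as itextsharp.dll sits next to the .ps1 file.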

Thanks in advance ;)

Hi 4wd,
Wow. Many thanks. This is fantastic. ;)

I did some tests, and the only small things that don't work are:
- it doesn't delete empty folders in folder1 (subfolders that are empty, or that don't hold other kinds of files).

Testing for text PDFs ...
Move-Item : Cannot retrieve the dynamic parameters for the cmdlet. The specified wildcard character pattern is not valid: [Lac_ven_Drnyvn,_Sr_ehajoe_Uduizn,_Giles_Suilo-Sm(
At C:\Users\E\Documents\test\jityPDFt3v5_7zip.ps1:82 char:7
+       Move-Item "$($files[$i])" -Destination "$($outfile)"
+       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidArgument: (:) [Move-Item], ParameterBindingException
    + FullyQualifiedErrorId : GetDynamicParametersException,Microsoft.PowerShell.Commands.MoveItemCommand

Then the "text PDFs" test stops, and the remaining files of the same kind are not processed.

Also, if a PDF file has "[" in its filename, it is ignored (but an empty folder is created in folder2 if the PDF is located in a subfolder).
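Both symptoms fit PowerShell's wildcard handling: the first positional argument of `Move-Item` is `-Path`, which treats `[` and `]` as wildcard characters, so an unclosed `[` gives "The specified wildcard character pattern is not valid" and a closed pair can silently match nothing. `-LiteralPath` uses the name exactly as written. A minimal sketch with a made-up file name, assuming `$files[$i]` in the script holds a full path:

```powershell
# Demo folder with a file whose name contains an unclosed "[" like the one in the error.
$demo = Join-Path ([System.IO.Path]::GetTempPath()) ('bracket_' + [guid]::NewGuid())
$dest = Join-Path $demo 'out'
New-Item -ItemType Directory -Path $dest -Force | Out-Null

$src = Join-Path $demo '[Lac_ven_test(.pdf'
[System.IO.File]::WriteAllText($src, 'x')   # create it without any wildcard parsing

# Move-Item $src -Destination $dest              -> wildcard pattern error
# Move-Item -LiteralPath $src -Destination $dest -> moves the file as named
Move-Item -LiteralPath $src -Destination $dest

# The script's line 82 would follow the same pattern:
#   Move-Item -LiteralPath $files[$i] -Destination $outfile
# and Get-ChildItem / Test-Path / Remove-Item calls on file names need -LiteralPath too.
Test-Path -LiteralPath (Join-Path $dest '[Lac_ven_test(.pdf')   # True
```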

And the great thing is that it finds encrypted PDFs, with or without OCR already done, and moves them accordingly. ;)

Thanks in advance ;)

Hi 4wd,

Many thanks for your detailed explanations. ;) Your latest code update works great for unzipping zip and rar files. ;)  :Thmbsup:

(A note for those following along: I downloaded this 7-Zip version and extracted it to "C:\Program Files\7-Zip". Then I copied "7z.exe" and "7z.dll" to my working directory.)

Still doesn't do extracted archives ... still thinking about it  :P

I thought that if I ran it twice it would find them during the second run (e.g. folder1\6\6\), which would be fine, but no.
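One way to get the "run it twice" behaviour in a single run is to repeat the extraction until a pass finds no archives left. A sketch under these assumptions: 7-Zip's command-line `7z.exe` with its `x` command and `-o`/`-y` switches, each archive extracted into a folder named after it (as in folder1\6\6\), and an archive deleted only when 7-Zip exits with code 0. `Expand-NestedArchive` and `-MaxPasses` are hypothetical names:

```powershell
# Hypothetical helper: extract archives again and again until none are left,
# so an archive found inside an extracted folder (folder1\6\6\) is handled too.
function Expand-NestedArchive {
    param(
        [Parameter(Mandatory)]
        [string]$Root,
        [string]$SevenZip = '7z.exe',     # pass the full path to your copy of 7z.exe
        [int]$MaxPasses = 10              # safety limit for deeply nested archives
    )
    $failed = @{}                         # archives 7-Zip could not extract
    $pass = 0
    while ($pass -lt $MaxPasses) {
        $archives = @(Get-ChildItem -LiteralPath $Root -File -Recurse | Where-Object {
            ($_.Extension -in @('.zip', '.rar', '.7z')) -and -not $failed.ContainsKey($_.FullName)
        })
        if ($archives.Count -eq 0) { break }

        # Refuse to continue if 7z.exe cannot be found, so nothing gets deleted.
        if (-not (Get-Command $SevenZip -ErrorAction SilentlyContinue)) {
            throw "7-Zip not found: $SevenZip"
        }

        $pass++
        foreach ($archive in $archives) {
            # Extract into a folder named after the archive, e.g. 6.zip -> 6\
            $target = Join-Path $archive.DirectoryName $archive.BaseName
            & $SevenZip x $archive.FullName "-o$target" -y | Out-Null
            if ($LASTEXITCODE -eq 0) {
                Remove-Item -LiteralPath $archive.FullName -Force
            } else {
                $failed[$archive.FullName] = $true   # keep it and skip it next pass
            }
        }
    }
    return $pass                          # number of extraction passes performed
}

# Usage (the path is an example):
# Expand-NestedArchive -Root 'C:\Users\E\Documents\test\folder1' -SevenZip "$PSScriptRoot\7z.exe"
```

Multi-volume archives (.part1.rar and so on) are not treated specially in this sketch.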

Thanks in advance ;)

Hi again,

Would this help to test with PowerShell whether PDFs have been OCRed?

Thanks in advance, ;)
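For reference, since the script already loads iTextSharp: a common heuristic is that a scanned PDF without OCR yields no extractable text, while an OCRed one yields its hidden text layer. A sketch assuming iTextSharp 5.x (`PdfReader`, `PdfTextExtractor.GetTextFromPage`) has already been loaded with `Add-Type`; `Test-PdfHasText` and `-MaxPages` are hypothetical, and a PDF that mixes real text with scanned pages would still count as text:

```powershell
# Hypothetical helper: $true if any of the first pages contains text,
# $false if none does (likely a scan without OCR), and $null if the file
# could not be read (malformed, password-protected, or iTextSharp not loaded).
function Test-PdfHasText {
    param(
        [Parameter(Mandatory)]
        [string]$Path,
        [int]$MaxPages = 3                # only sample the first few pages for speed
    )
    $reader = $null
    try {
        $reader = New-Object -TypeName iTextSharp.text.pdf.PdfReader -ArgumentList $Path -ErrorAction Stop
        $last = [Math]::Min($reader.NumberOfPages, $MaxPages)
        for ($page = 1; $page -le $last; $page++) {
            $text = [iTextSharp.text.pdf.parser.PdfTextExtractor]::GetTextFromPage($reader, $page)
            if (-not [string]::IsNullOrWhiteSpace($text)) { return $true }
        }
        return $false
    }
    catch {
        return $null
    }
    finally {
        if ($reader) { $reader.Close() }
    }
}

# Usage (the path is an example):
# Test-PdfHasText -Path 'C:\Users\E\Documents\test\folder1\scan.pdf'
```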

