
DonationCoder.com Software > Post New Requests Here

REQ: CLI : basically unzip + run PDFTextChecker and move resulting files


4wd:
Hi 4wd,
Wow. Many thanks. This is fantastic. ;) - jity2 (November 18, 2019, 12:36 PM)
--- End quote ---

You're welcome  :)

I did some tests and the only small things that are not working are:
- it doesn't delete empty folders in folder1 (whether the subfolders are empty or contain other kinds of files).
--- End quote ---

It just needs to call the Delete-SmallPDF function again after doing the Check-PDF function.

Not sure whether you mean folder1 should be completely empty at the end, since files that aren't PDFs will remain, i.e. delete everything in folder1 after running the Check-PDF function.

In which case, what happens to the non-PDF files that were in the folders?
Delete or move with PDF?


--- ---Testing for text PDFs ...
Move-Item : Cannot retrieve the dynamic parameters for the cmdlet. The specified wildcard character pattern is not valid: [Lac_ven_Drnyvn,_Sr_ehajoe_Uduizn,_Giles_Suilo-Sm(e-kjd.org).pdf
At C:\Users\E\Documents\test\jityPDFt3v5_7zip.ps1:82 char:7
+       Move-Item "$($files[$i])" -Destination "$($outfile)"
+       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidArgument: (:) [Move-Item], ParameterBindingException
    + FullyQualifiedErrorId : GetDynamicParametersException,Microsoft.PowerShell.Commands.MoveItemCommand

Then the "for text PDFs" test stops, and the remaining files of the same kind are not processed.

Also, if a PDF file has "[" in its filename, it is ignored (but an empty folder is created in folder2 if the PDF is in a subfolder).
--- End quote ---

Replacing strange characters in filenames was going to be part of the too-long-filename function; I just haven't got there yet.
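
Separately from renaming, the Move-Item error quoted above comes from wildcard handling: a bare path binds to `-Path`, which treats `[` and `]` as wildcard characters, so a name starting with `[` is read as an invalid pattern. `-LiteralPath` uses the name exactly as written. A minimal sketch; `Move-FileLiteral` is a hypothetical helper, not part of the script in this thread:

```powershell
# Hypothetical helper: move one file, treating its name literally.
function Move-FileLiteral {
    param(
        [Parameter(Mandatory)][string]$Source,       # full path of the file to move
        [Parameter(Mandatory)][string]$Destination   # target folder (or full target path)
    )
    # -LiteralPath: [ and ] in $Source are NOT wildcards, so a name such as
    # '[abc.pdf' no longer triggers "wildcard character pattern is not valid"
    Move-Item -LiteralPath $Source -Destination $Destination -ErrorAction Stop
}

# In the script's loop, the equivalent one-line change would be:
#   Move-Item -LiteralPath "$($files[$i])" -Destination "$($outfile)"
```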

Can you zip up some of the PDFs it stops on (strange characters, etc.)? 3 or 4 would be good.

Unfortunately, I'm going to be a bit busy this week, so I might not get back to this until next week; I'll see how I go.

EDIT: OK, besides having a full path less than 260 characters, you can apparently only have a full directory path of 247 characters maximum.

i.e.

C:\plus 243 characters\                            (max 247 chars)
C:\plus 243 characters\16characters.pdf    (max 259 chars)

Otherwise you get this:

--- ---New-Item : The specified path, file name, or both are too long. The fully qualified file name must be less than
260 characters, and the directory name must be less than 248 characters.
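
Both limits in that message can be checked with plain string arithmetic before renaming. A sketch, assuming the 247/259 values quoted above (they correspond to the classic Windows MAX_PATH of 260); `Get-FittedFileName` is a hypothetical name. It returns the name unchanged if it fits, a truncated name if only the file name is too long, and `$null` if the folder path alone is over the limit:

```powershell
# Hypothetical helper: fit a file name into the classic Windows path limits.
function Get-FittedFileName {
    param(
        [Parameter(Mandatory)][string]$Directory,
        [Parameter(Mandatory)][string]$FileName,
        [int]$MaxDirLength  = 247,  # directory name must be < 248 characters
        [int]$MaxPathLength = 259   # fully qualified file name must be < 260 characters
    )
    # Drop any trailing separator so it is counted exactly once below
    $dir = $Directory.TrimEnd([char[]]'\/')
    # Folder path already too long: renaming the file cannot help, leave it alone
    if ($dir.Length -gt $MaxDirLength) { return $null }
    # Directory + one separator + file name fits: keep the name as is
    if (($dir.Length + 1 + $FileName.Length) -le $MaxPathLength) { return $FileName }
    $ext  = [System.IO.Path]::GetExtension($FileName)
    $base = [System.IO.Path]::GetFileNameWithoutExtension($FileName)
    # Room left for the base name once directory, separator and extension are counted
    $room = $MaxPathLength - ($dir.Length + 1) - $ext.Length
    if ($room -lt 1) { return $null }
    return $base.Substring(0, [Math]::Min($base.Length, $room)) + $ext
}
```

Truncating the base name rather than the extension keeps `.pdf` intact. The sketch doesn't check whether the shortened name already exists.
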

ADDENDUM: Give the update a try. It should remove any special characters from file names and, in theory, shorten a filename if it's the length of the filename that pushes the total path length over the 259-character limit.

If, however, it's the folder path that exceeds the 247-character limit, the file won't be touched; it'll remain in the initial folder ... so goes the theory :-\

i.e.
- If the filename has diacritics and various other strange characters, they'll be removed. (At this point no rename happens.)
  An example: This: François-Xavier!!#@$%^&()_+}{ €$¥£¢ ^$.+()[{ 0123456789.pdf will turn into this: FrancoisXavier()_ Yc () 0123456789.pdf
- If the folder path is less than 248 characters and the full path is 259 characters or fewer, the file will be renamed.
- If the folder path is less than 248 characters and the full path is greater than 259 characters, the new filename will be truncated and then the file is renamed.
- If the folder path is greater than 247 characters, nothing happens - the file isn't renamed, it will remain in the initial folder.
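
The first rule in that list (dropping diacritics and other special characters) can be approximated with Unicode normalisation. This is a sketch, not the code in the update: the set of characters kept (letters, digits, space, `_`, `(`, `)`, `-`) is an assumption, so it won't reproduce the example output above exactly (for instance, it drops `¥` entirely). `Get-SafeFileName` is a hypothetical name:

```powershell
# Hypothetical helper: strip diacritics and special characters from a file name.
function Get-SafeFileName {
    param(
        [Parameter(Mandatory)][string]$Name
    )
    $ext  = [System.IO.Path]::GetExtension($Name)
    $base = [System.IO.Path]::GetFileNameWithoutExtension($Name)
    # FormD splits accented letters into base letter + combining mark ('ç' -> 'c' + U+0327)
    $decomposed = $base.Normalize([System.Text.NormalizationForm]::FormD)
    $sb = New-Object System.Text.StringBuilder
    foreach ($ch in $decomposed.ToCharArray()) {
        # Drop the combining marks, keep every other character for now
        $cat = [System.Globalization.CharUnicodeInfo]::GetUnicodeCategory($ch)
        if ($cat -ne [System.Globalization.UnicodeCategory]::NonSpacingMark) {
            [void]$sb.Append($ch)
        }
    }
    # Assumed whitelist: ASCII letters, digits, space, underscore, parentheses, hyphen
    $clean = $sb.ToString() -creplace '[^A-Za-z0-9 _()\-]', ''
    return $clean + $ext
}
```

For example, `Get-SafeFileName 'a!b@c#.pdf'` returns `abc.pdf`. A name made entirely of stripped characters comes back as just `.pdf`, so a real script would need a fallback name.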

I might have to tweak the Get-ChildItem statement in the Check-PDF routine to ignore file paths longer than 259 characters; see how you go.

Currently doesn't check for the existence of a file with the same name before renaming.

jity2:
Hi 4wd,

Thank you so much for your updated code. ;) This is working great. ;)

Not sure whether you mean folder1 should be completely empty at the end, since files that aren't PDFs will remain, i.e. delete everything in folder1 after running the Check-PDF function.

In which case, what happens to the non-PDF files that were in the folders?
Delete or move with PDF?
--- End quote ---
Keeping (like you have already done) what is left inside folder1 is fine. ;)


I did some tests; here is a zip file with examples:
The main bug (see "2-itextsharp.pdfa") is that when a folder in folder1 is named "bla.pdf", it causes a "stackoverflow" bug in PowerShell (it closes PowerShell and restarts it when I run the script in edit mode).
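
One plausible cause, assuming Check-PDF collects files with something like `Get-ChildItem -Include *.pdf` (its code isn't quoted in this thread), is that the pattern matches folders too, so a folder named `bla.pdf` gets handed to the PDF code as if it were a file. A minimal sketch of a files-only listing; `Get-PdfFile` is a hypothetical name, and `-File` needs PowerShell 3.0 or later:

```powershell
# Hypothetical helper: list PDF *files* under a folder, never folders.
function Get-PdfFile {
    param(
        [Parameter(Mandatory)][string]$Root
    )
    # -File returns files only, so a folder named 'bla.pdf' is never returned.
    # On Windows, -Filter '*.pdf' can also match longer extensions such as
    # '.pdfa', so the Where-Object keeps exactly '.pdf' (case-insensitive -eq).
    Get-ChildItem -LiteralPath $Root -Filter '*.pdf' -Recurse -File |
        Where-Object { $_.Extension -eq '.pdf' }
}
```
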
The other thing is strange filenames (maybe some Asian text?). Edit 1: it's because on my computer the folder path is too long for some of those! Otherwise it renames them fine! ;)

The others are invalid PDF files (see "1-malformed_pdf"; from what I understand there are at least 2 problems: 1) Invalid pdf file and 2) The image file format has not been recognized). From experience, detecting them is a complex problem, and I think it's better if I do it by hand with PDFinfoGUI (*) and remove them with an Excel macro, as that way I can see very quickly whether I need to download some important files again. So please forget about those. ;)
(*) Files with neither yes nor no in the "encrypted" column and the other columns. Then I copy the list, except the important ones.
https://filebin.net/t5ai1inqcfw7vp67/20191121tests.zip?t=6qfxn4l3

Also, many thanks for the detailed explanation of long names. I appreciated it. ;) Your current filename-truncation code is already fine for me. ;) Thanks. ;)

Currently doesn't check for the existence of a file with the same name before renaming.
--- End quote ---
If I understand correctly, this is because "folder1\venise-.pdf" could end up with the same name as a file already in folder3 ("folder3\venise.pdf"). Renaming the new one (for instance with a counter, "venise1.pdf") would be fine.
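
A counter like that can be added by probing for a free name before moving. A sketch; `Get-UniqueFileName` is a hypothetical helper, and this check-then-move approach assumes nothing else is writing to folder3 at the same time:

```powershell
# Hypothetical helper: return a file name that doesn't exist yet in $Directory,
# appending 1, 2, 3, ... to the base name if needed (venise.pdf -> venise1.pdf).
function Get-UniqueFileName {
    param(
        [Parameter(Mandatory)][string]$Directory,
        [Parameter(Mandatory)][string]$FileName
    )
    $base = [System.IO.Path]::GetFileNameWithoutExtension($FileName)
    $ext  = [System.IO.Path]::GetExtension($FileName)
    $candidate = $FileName
    $n = 1
    # -LiteralPath so names containing [ or ] are not treated as wildcards
    while (Test-Path -LiteralPath (Join-Path $Directory $candidate)) {
        $candidate = "$base$n$ext"
        $n++
    }
    return $candidate
}
```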

I have added a small function in order to delete empty folders in folder3
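Function Delete-SmallPDF2 {
  param (
    [bool]$delSmall
  )
  if ($delSmall) {
    # -File skips folders whose names end in .pdf; -LiteralPath copes with [ ] in names
    Get-ChildItem "$($folder3)\*" -Include *.pdf -Recurse -File | ? {$_.Length -lt 2048} | % {Remove-Item -LiteralPath $_.FullName -Force}
  }
  # Delete folders that contain no files at any depth, longest (deepest) path first,
  # so a parent is never removed before its subfolders have been checked
  Get-ChildItem -LiteralPath "$($folder3)" -Recurse -Directory | Sort-Object {$_.FullName.Length} -Descending |
    ? {@(Get-ChildItem -LiteralPath $_.FullName -Recurse -File).Count -eq 0} | Remove-Item -Recurse -Force
}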
--- End quote ---


Edit 2:
I forgot I had this error message:

--- ---PS C:\Windows\System32\WindowsPowerShell\v1.0> C:\Users\E\Documents\tests\jityPDF.ps1
Add-Type : Cannot bind parameter 'Path' to the target. Exception setting "Path": "Cannot find path 'C:\Windows\System32\WindowsPowerShell\v1.0\itextsharp.dll' because it does not exist."

So I copied "itextsharp.dll" to C:\Windows\System32\WindowsPowerShell\v1.0\itextsharp.dll.
This may explain why, when I try to use 4wd's code on another hard drive (e.g. L:\), the script shows no errors but doesn't run the Check-PDF part properly, even after copying the 7z.dll, 7z.exe and itextsharp.dll files and adapting the code to the new locations: it seems to move some image-based PDFs to folder2 instead of folder3. So I'm staying on C:\ ;)


Thanks in advance ;)
Jity2

4wd:
Edit 2:
I forgot I had this error message:

--- ---PS C:\Windows\System32\WindowsPowerShell\v1.0> C:\Users\E\Documents\tests\jityPDF.ps1
Add-Type : Cannot bind parameter 'Path' to the target. Exception setting "Path": "Cannot find path 'C:\Windows\System32\WindowsPowerShell\v1.0\itextsharp.dll' because it does not exist."

So I copied "itextsharp.dll" to C:\Windows\System32\WindowsPowerShell\v1.0\itextsharp.dll.
This may explain why, when I try to use 4wd's code on another hard drive (e.g. L:\), the script shows no errors but doesn't run the Check-PDF part properly, even after copying the 7z.dll, 7z.exe and itextsharp.dll files and adapting the code to the new locations: it seems to move some image-based PDFs to folder2 instead of folder3. So I'm staying on C:\ ;) - jity2 (November 21, 2019, 09:09 AM)
--- End quote ---

That's because, when you start the script from outside the folder it resides in, the working directory is no longer the script's folder.

PS C:\Windows\System32\WindowsPowerShell\v1.0> C:\Users\E\Documents\tests\jityPDF.ps1

Since you started the script from the folder shown in the prompt above (C:\Windows\System32\WindowsPowerShell\v1.0), that becomes the working directory, so any file referenced with '.\' that isn't in that folder will fail with the above error.

PS C:\Users\E\Documents\tests> C:\Users\E\Documents\tests\jityPDF.ps1
PS C:\Users\E\Documents\tests> .\jityPDF.ps1

Changing your current directory to the same as the script would have let either of the above work OK.
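
Another option, assuming PowerShell 3.0 or later, is to build those paths from the automatic variable `$PSScriptRoot` (the folder the running script lives in) instead of '.\', so the script works whatever the current directory is. `Resolve-ScriptFile` is a hypothetical helper name:

```powershell
# Hypothetical helper: locate a file that ships next to the script.
function Resolve-ScriptFile {
    param(
        [Parameter(Mandatory)][string]$Name,
        # Folder of the script file this function is defined in (PowerShell 3.0+)
        [string]$BaseDir = $PSScriptRoot
    )
    $full = Join-Path $BaseDir $Name
    # Fail early with a clear message instead of a confusing Add-Type error
    if (-not (Test-Path -LiteralPath $full)) {
        throw "Required file not found: $full"
    }
    return $full
}

# Usage at the top of the script (file names as used in this thread):
# Add-Type -Path (Resolve-ScriptFile 'itextsharp.dll')
# $sevenZip = Resolve-ScriptFile '7z.exe'
```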

An easier way would be to use a shortcut and set its "Start in" field once you've set up the folder1/folder2/folder3 variables; then you can just double-click the shortcut.

I've attached an example shortcut.

I'll have a look at the other things.
