
REQ: CLI : basically unzip + run PDFTextChecker and move resulting files


Hi again,

Would this help for testing, with PowerShell, whether a PDF has been OCRed?

Thanks in advance, ;)

In fact I think the last big change for WinRAR was the new Version 5 in 2017 ( And alas 7Zip4Powershell won't be updated soon (see and
edit: "7-Zip v15.06 and later support extraction of files in the RAR5 format"
I did a test with an old rar file and it worked.
-jity2 (November 16, 2019, 04:25 AM)
--- End quote ---

I think the problem is with the intermediate DLL it uses, SevenZipSharp. That library is actively maintained and did get updated for RAR5: SevenZipSharp

If I can get an updated DLL then it should work OK - might have to reinstall Visual Studio and compile it.

In the meantime I'll get it to use unrar instead.

Changed my mind and just made it use 7z.exe, see the updated script.  Just copy the latest versions of 7z.exe and 7z.dll into the same folder as the script.
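For anyone following along, calling a copy of 7z.exe that sits next to the script could look roughly like the sketch below. This is not the code from the updated script: the function name `Expand-With7z` and its parameters are made up for illustration, and it assumes 7z.exe and 7z.dll are in the folder given by `-ToolDir` (by default the script's own folder, so the default only works when run from a saved .ps1 file).

```powershell
# Hypothetical helper, not taken from the updated script.
function Expand-With7z {
    param(
        [Parameter(Mandatory)] [string] $Archive,
        [Parameter(Mandatory)] [string] $Destination,
        # Folder holding 7z.exe and 7z.dll; defaults to the script's own folder.
        [string] $ToolDir = $PSScriptRoot
    )
    $exe = Join-Path $ToolDir '7z.exe'
    if (-not (Test-Path -LiteralPath $exe)) {
        throw "7z.exe not found in $ToolDir"
    }
    # x = extract with full paths, -o<dir> = output folder (no space after -o),
    # -y = answer Yes to all prompts.
    & $exe x $Archive "-o$Destination" -y | Out-Null
    # 7-Zip returns exit code 0 when there were no errors.
    return ($LASTEXITCODE -eq 0)
}
```

Usage would be something like `Expand-With7z -Archive 'C:\test\a.rar' -Destination 'C:\test\a'` (paths here are just examples).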

Still doesn't do extracted archives ... still thinking about it  :P

You can uninstall the 7Zip4Powershell module by opening an Admin Powershell console and entering:

--- Code: PowerShell ---
Uninstall-Module -Name 7Zip4Powershell

Hi 4wd,

Many thanks for your detailed explanations. ;) Your last code update is working great for 'unzipping' zip and rar files. ;)  :Thmbsup:

(A simple note for those following: I have downloaded this 7-Zip version ( that I have extracted to "C:\Program Files\7-Zip". Then I copied "7z.exe" and "7z.dll" into my working directory.)

Still doesn't do extracted archives ... still thinking about it  :P
--- End quote ---

I thought that if I ran it twice it would find them during the next run (example: folder1\6\6\), which would be fine, but no.

Thanks in advance ;)

Okey dokey, most of the way there now ...

* Deletes duplicate archives
* Extracts archives, (including in sub-folders)
* Deletes small PDFs and empty folders
* Checks for text/image based PDFs and moves them into different folders, (recreating folder tree). All it does is count the number of text lines in the first 5 pages, (or fewer if there are fewer than 5 in total), and if that number is greater than a set threshold it regards the file as a text based PDF

Only thing left is the long names bit I think.
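The text/image check described in the last bullet can be sketched as follows. This only shows the counting rule, under a few assumptions: the page text has already been extracted by some other tool (not shown), the function name `Test-TextBasedPdf` and its parameters are hypothetical, and the default threshold of 3 is a placeholder because the actual value is not stated in the thread.

```powershell
# Hypothetical helper: applies the line-count rule to text that was
# already extracted, one string per page.
function Test-TextBasedPdf {
    param(
        [string[]] $PageText = @(),   # one string per page, in page order
        [int] $Threshold = 3,         # assumed value, the real threshold is not stated
        [int] $MaxPages = 5
    )
    $lineCount = 0
    # Look at the first $MaxPages pages, or all of them if there are fewer.
    foreach ($page in ($PageText | Select-Object -First $MaxPages)) {
        if ([string]::IsNullOrEmpty($page)) { continue }
        # Count only lines that contain something other than whitespace.
        $lineCount += @($page -split "\r?\n" | Where-Object { $_.Trim() -ne '' }).Count
    }
    # Text based only if the count is strictly greater than the threshold.
    return ($lineCount -gt $Threshold)
}
```

A scanned PDF with no text layer would give empty strings for every page, so the count stays at 0 and the function returns `$false`.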


Hi 4wd,
Wow. Many thanks. This is fantastic. ;)

I did some tests and the only small things that are not working are :
- it doesn't delete empty folders in folder1 (when some subfolders are empty, or are not empty but hold other kinds of files).


--- Code: ---
Testing for text PDFs ...
Move-Item : Cannot retrieve the dynamic parameters for the cmdlet. The specified wildcard character pattern is not valid: [Lac_ven_Drnyvn,_Sr_ehajoe_Uduizn,_Giles_Suilo-Sm(
At C:\Users\E\Documents\test\jityPDFt3v5_7zip.ps1:82 char:7
+       Move-Item "$($files[$i])" -Destination "$($outfile)"
+       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidArgument: (:) [Move-Item], ParameterBindingException
    + FullyQualifiedErrorId : GetDynamicParametersException,Microsoft.PowerShell.Commands.MoveItemCommand
After that, the "Testing for text PDFs" step stops and the remaining files of the same kind are not processed.

Also, if a PDF has "[" in its filename, the script ignores it (but still creates an empty folder in folder2 if the PDF is located in a subfolder).
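Both symptoms look like PowerShell reading "[" in the path as the start of a wildcard pattern, which is what happens when a path is bound to -Path (the default for a positional argument). A possible workaround, sketched below under that assumption, is to pass the source with -LiteralPath and to catch the error so that one bad file does not stop the run. The function name `Move-PdfLiteral` is made up; only the Move-Item call corresponds to line 82 of the script.

```powershell
# Hypothetical wrapper around the Move-Item call on line 82 of the script.
function Move-PdfLiteral {
    param(
        [Parameter(Mandatory)] [string] $Source,
        [Parameter(Mandatory)] [string] $Destination
    )
    try {
        # -LiteralPath takes the path exactly as typed, so "[" and "]"
        # in a filename are not interpreted as wildcard characters.
        Move-Item -LiteralPath $Source -Destination $Destination -ErrorAction Stop
        return $true
    } catch {
        # Report the problem and let the caller carry on with the next file.
        Write-Warning "Could not move '$Source': $($_.Exception.Message)"
        return $false
    }
}
```

Inside the loop it would be called as `Move-PdfLiteral -Source $files[$i] -Destination $outfile`. The same -LiteralPath switch exists on Test-Path, Get-Item and Get-ChildItem, which may be relevant to the files that get ignored.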

And the great thing is that it finds encrypted PDFs, with or without OCR already done, and moves them accordingly. ;)

Thanks in advance ;)

