REQ: CLI: basically unzip + run PDFTextChecker and move resulting files
« on: November 14, 2019, 02:41 AM »
Dear all,
I am trying to speed up my monthly manual archiving work by automating parts of it. I am on Windows 8.1 Home, 64-bit.
[0) I am using SyncBackPro V8 to periodically move files (zip, rar, pdf, jpg and png) to a folder named "folder1".
Once this SyncBackPro profile has run and the above files have been moved, it can launch a program (the "Run after profile" option).
I have discovered that I can run a .bat file which itself runs several .bat files (https://stackoverflo...es-within-a-bat-file). ]
The main.bat file (or PowerShell?) should do the following:
1) Delete duplicate files (MD5 checksum?)
(maybe using this PowerShell script https://n3wjack.net/...ith-just-powershell/ ?)
2) Unzip and unrar the zip and rar files in "folder1" recursively (and delete the original archives once done).
I thought of using this old coding snack in AHK (RecurUnZip https://www.donation....msg192366#msg192366), but I need something fully automatic, since my "folder1" path never changes.
(I have tried to adapt this PowerShell code using its comments, alas unsuccessfully: https://superuser.co...-the-archives/620077 !)
3) Reduce file paths to fewer than 260 characters (because some old programs can't open paths longer than 260 characters later on). I currently do this manually with "Path Scanner" (http://www.parhelia-...canner/Download.aspx)
4) Delete PDF files smaller than 2 KB (because they are garbage). (I currently do this manually with the freeware Everything, sorting the PDF files by size.)
5) Run PDFTextChecker (https://www.donation....msg255322#msg255322) on "folder1" unattended (it creates 2 files, "!Not_Searchable.txt" and "!Searchable.txt"). Move the non-searchable PDF files to "folder2" and the searchable ones to "folder3" (maybe using this old coding snack https://www.donation....msg330784#msg330784 ?).
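To make steps 1 to 4 more concrete, here is a rough sketch of what I have in mind. It is written in Python rather than batch or PowerShell, purely as an illustration, and it rests on several assumptions: all function names are made up, "2 KB" is taken to mean 2048 bytes, only zip archives are handled (rar would need an external tool), and step 3 only reports the long paths instead of shortening them.

```python
import hashlib
import zipfile
from pathlib import Path


def file_md5(path: Path) -> str:
    """Return the MD5 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def delete_duplicates(root: Path) -> list:
    """Step 1: keep the first file (in sorted path order) per checksum, delete the rest."""
    seen = {}
    removed = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        key = file_md5(path)
        if key in seen:
            path.unlink()
            removed.append(path)
        else:
            seen[key] = path
    return removed


def unzip_recursively(root: Path) -> int:
    """Step 2 (zip only): extract each archive into a sibling folder, then delete it.

    Repeats until no archive is left, so zips inside zips are handled too.
    Returns the number of archives extracted.
    """
    count = 0
    while True:
        archives = [p for p in root.rglob("*")
                    if p.is_file() and p.suffix.lower() == ".zip"]
        if not archives:
            return count
        for archive in archives:
            target = archive.with_suffix("")  # "x.zip" -> folder "x"
            with zipfile.ZipFile(archive) as zf:
                zf.extractall(target)
            archive.unlink()
            count += 1


def long_paths(root: Path, limit: int = 260) -> list:
    """Step 3 (report only): list paths whose absolute length reaches the limit."""
    return [p for p in root.rglob("*") if len(str(p.absolute())) >= limit]


def delete_small_pdfs(root: Path, min_bytes: int = 2048) -> list:
    """Step 4: delete PDF files smaller than min_bytes (assumed 2 KB = 2048 bytes)."""
    removed = []
    for path in list(root.rglob("*")):
        if (path.is_file() and path.suffix.lower() == ".pdf"
                and path.stat().st_size < min_bytes):
            path.unlink()
            removed.append(path)
    return removed
```

The idea would be to call these in order on "folder1", for example `delete_duplicates(Path("folder1"))` followed by `unzip_recursively(Path("folder1"))`. Since every function deletes files, it should first be tried on a copy of the folder.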
Then I use ABBYY FineReader 12 Corporate (not the latest version, which limits the number of pages you can OCR per month!), which automatically OCRs "folder2" every day. (Be careful if you follow this process, as it sometimes deletes PDF files without warning! So use the option to keep the original files in a separate folder. At the end of the month I manually use the freeware "PathSync" (https://www.cockos.com/pathsync/) to check the differences and find those FineReader bugs!)
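The move in step 5, which is what feeds "folder2" for the OCR run, might look roughly like the Python sketch below. I am assuming that "!Not_Searchable.txt" and "!Searchable.txt" contain one full file path per line in UTF-8; I have not checked the exact output format of PDFTextChecker, and `move_listed_files` is just a made-up name.

```python
import shutil
from pathlib import Path


def move_listed_files(list_file: Path, destination: Path) -> list:
    """Move every existing file named in list_file into destination.

    Assumes one full path per line (an unverified guess about the
    PDFTextChecker output). Blank lines and missing files are skipped,
    and an existing file in destination is never overwritten.
    """
    destination.mkdir(parents=True, exist_ok=True)
    moved = []
    # "utf-8-sig" also accepts a file that starts with a byte-order mark.
    for line in list_file.read_text(encoding="utf-8-sig").splitlines():
        name = line.strip()
        if not name:
            continue
        source = Path(name)
        if not source.is_file():
            continue
        target = destination / source.name
        if target.exists():
            continue
        shutil.move(str(source), str(target))
        moved.append(target)
    return moved
```

It would be called once per list, for example `move_listed_files(Path("folder1") / "!Not_Searchable.txt", Path("folder2"))` and the same with "!Searchable.txt" and "folder3".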
Many thanks in advance,
Jity2