Topic: active extension filter for archive drive (Read 4235 times)

questorfla

  • Supporting Member
  • Joined in 2012
  • Posts: 570
  • Fighting Slime all the Time
active extension filter for archive drive
« on: November 25, 2015, 01:02 AM »
In the office we have a system with a drive dedicated to archiving, mostly Word documents but also other specific data file types. In theory, it should contain only data files, but in fact it contains everything from temp files to executables to just about every file type that can be copied from one place to another.

There are, however, many file types that are absolutely worthless to anyone, given what the drive was originally meant for. About half the space, and probably half the files by count, are taken up by files that are worthless and never used. They only take up space and slow down searches.

From time to time, I have used "Everything.exe" (the best file search engine I have ever seen; a small plug to Voidtools.com for making it :) ) to locate the *.tmp, *.exe, etc. files and delete them all. But it takes quite a lot of time, and I am never sure I have caught all the file extensions that just waste space and will never be of use to anyone.

It occurred to me that it would be nice to have two tools for this. One would be something like the program "Filesize", creating a graph showing how much space each file extension uses. But most useful would be a "filter by extension" that could be applied to any files copied to the drive, either blocking or allowing each file based on its extension. I am inclined to think it best to "block by specific extension", because that gives more control over unnecessary files without accidentally blocking types I am just not familiar with yet.
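
For the first tool, a rough text-mode stand-in for a "Filesize"-style graph could look like the Python sketch below. It assumes Python 3 on a machine that can see the archive drive; the names `usage_by_extension` and `print_report` are made up for illustration, not an existing program. It totals bytes and file counts per lower-cased extension and prints a crude bar per extension.

```python
import os
from collections import Counter
from pathlib import Path


def usage_by_extension(root):
    """Walk `root` and return (bytes_per_ext, files_per_ext) as Counters.

    Extensions are lower-cased so ".DOC" and ".doc" are counted together;
    files without an extension are grouped under "(none)".
    """
    size_by_ext = Counter()
    count_by_ext = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            ext = Path(name).suffix.lower() or "(none)"
            try:
                size = os.path.getsize(path)
            except OSError:
                # File vanished or is unreadable: skip it rather than abort the scan.
                continue
            size_by_ext[ext] += size
            count_by_ext[ext] += 1
    return size_by_ext, count_by_ext


def print_report(root, top=20):
    """Print the `top` extensions by total size, with a crude text bar chart."""
    sizes, counts = usage_by_extension(root)
    total = sum(sizes.values()) or 1  # avoid dividing by zero on an empty tree
    for ext, size in sizes.most_common(top):
        share = 100 * size / total
        bar = "#" * int(share // 2)  # one "#" per 2% of the total
        print(f"{ext:<12} {counts[ext]:>8} files {size / 2**30:>9.2f} GiB {share:5.1f}% {bar}")
```

Usage would be something like `print_report(r"D:\Archive")`, where the path is only a placeholder. A real graph could be drawn from the same two Counters with any charting tool.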

This probably sounds a bit off the wall (and I am sure it is), but when a 4 TB drive is half full of worthless files and only 2 TB of it is data anyone really needs to keep, a filter would save upgrading to a bigger drive just to store more worthless trash, and it would stop that trash from slowing down searches for the files that are actually needed.

Most users just drag and drop entire groups of folders just to get the documents inside them. There are far too many ways to get the "trash in with the treasure", and no easy way to keep it from being stored there.

This was a "way-out-there" solution that I thought might actually exist somewhere: a "file-by-extension filter" (like a firewall for filenames) that could stop this from happening in the first place, instead of trying to skim the junk out after the fact. A tool that makes a nice graphic chart of usage by extension would also give me a way to prove or disprove that such a filter is needed at all.
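
A true "firewall for filenames" would have to hook into the file system (or rely on permissions), which is beyond a short example. A simpler approximation, assuming users can be persuaded to copy through a script instead of drag-and-drop, is a copy helper that skips blocked extensions. This Python sketch uses `shutil.copytree` with an `ignore` callback; `BLOCKED_EXTENSIONS`, `ignore_blocked`, and `filtered_copy` are hypothetical names, and the extension list is only an example.

```python
import shutil
from pathlib import Path

# Hypothetical block-list: adjust to the extensions that are actually junk on your drive.
BLOCKED_EXTENSIONS = {".tmp", ".bak", ".exe", ".dll", ".log"}


def ignore_blocked(directory, names):
    """`ignore` callback for shutil.copytree: return the set of names to skip.

    Only regular files are filtered; directories are always descended into,
    so documents inside a folder full of junk still get copied.
    """
    skipped = set()
    for name in names:
        full = Path(directory) / name
        if full.is_file() and full.suffix.lower() in BLOCKED_EXTENSIONS:
            skipped.add(name)
    return skipped


def filtered_copy(src, dst):
    """Copy the folder tree `src` into `dst`, leaving out blocked file types."""
    # dirs_exist_ok (Python 3.8+) lets repeated copies into the same archive folder succeed.
    return shutil.copytree(src, dst, ignore=ignore_blocked, dirs_exist_ok=True)
```

Because it is a block-list, unfamiliar file types still get through, which matches the "block by specific extension" preference above.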

If anyone has ever written such a thing, I figure someone here would know about it! :D

IainB

  • Supporting Member
  • Joined in 2008
  • Posts: 7,544
  • @Slartibartfarst
Re: active extension filter for archive drive
« Reply #1 on: November 25, 2015, 04:37 AM »
Off the top of my head (not sure whether this is what you are looking for):
If you listed in Everything only the files on a drive with the particular extension you wanted to inspect (say, .BAK), and displayed the path and size columns in the resulting list, you could then use NirSoft's SysExporter (which grabs data from list-view, tree-view, combo box, WebBrowser control, and text-box controls) to export the data to an Excel spreadsheet. From there you could do a quick analysis of the number of files with that extension and the total disk space they occupy, per sub-directory and for the disk as a whole.
You could then make some information-based decisions, including for example either:
  • (a) determine whether to filter files of this type out of the periodic backup process (i.e., ignore them for backup purposes), or
  • (b) delete them from backup if your backup software did not enable you (or was not set) to selectively ignore specified file types.
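
If SysExporter isn't at hand, the same per-folder numbers could be produced directly. This Python sketch (hypothetical function names, assuming Python 3 on a machine that can read the drive) counts the files ending in one extension, totals their size per sub-directory, and writes a CSV with a grand-total row that Excel can open.

```python
import csv
import os
from collections import defaultdict


def extension_by_directory(root, ext):
    """Return {directory: (file_count, total_bytes)} for files whose names end in `ext`."""
    ext = ext.lower()
    stats = defaultdict(lambda: [0, 0])
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(ext):
                try:
                    size = os.path.getsize(os.path.join(dirpath, name))
                except OSError:
                    continue  # unreadable or vanished file: leave it out
                stats[dirpath][0] += 1
                stats[dirpath][1] += size
    return {d: tuple(v) for d, v in stats.items()}


def write_csv(stats, out_path):
    """Write one row per directory plus a grand-total row."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["directory", "files", "bytes"])
        for directory, (count, size) in sorted(stats.items()):
            writer.writerow([directory, count, size])
        writer.writerow([
            "TOTAL",
            sum(count for count, _ in stats.values()),
            sum(size for _, size in stats.values()),
        ])
```

For example, `write_csv(extension_by_directory(r"D:\Archive", ".bak"), "bak_report.csv")`, with the path as a placeholder. Matching on the end of the name (rather than the last suffix) also works for compound extensions such as `.jpg_original`.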

Shades

  • Member
  • Joined in 2006
  • Posts: 2,939
Re: active extension filter for archive drive
« Reply #2 on: November 25, 2015, 06:11 AM »
- If the 4 TB disk is in a Windows PC and shared on the network, an easy solution would be to run a program called 'Belvedere' on it (an AutoHotkey (AHK) script compiled into an executable, according to its GitHub page). You could use it with the Windows Task Scheduler to clean up the 4 TB disk at regular intervals.
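
If Belvedere turns out not to suit, the scheduled-cleanup half of the idea could be sketched in a few lines of Python and run from the Windows Task Scheduler. This is not Belvedere's own logic, just an assumption-based stand-in: `clean_archive` and `JUNK_EXTENSIONS` are hypothetical, and it defaults to a dry run that only lists what it would delete.

```python
import os
from pathlib import Path

# Hypothetical junk list; review a usage report before trusting it.
JUNK_EXTENSIONS = {".tmp", ".bak", ".old"}


def clean_archive(root, junk=JUNK_EXTENSIONS, dry_run=True):
    """Find files under `root` whose extension is in `junk` and return their paths.

    With dry_run=True (the default) nothing is deleted. With dry_run=False the
    matching files are also removed; failures are printed and skipped.
    """
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if Path(name).suffix.lower() in junk:
                path = os.path.join(dirpath, name)
                matches.append(path)
                if not dry_run:
                    try:
                        os.remove(path)
                    except OSError as exc:
                        # Locked or read-only files are reported, not fatal.
                        print(f"could not delete {path}: {exc}")
    return matches
```

A cautious routine would be to run it with `dry_run=True` first, review the returned list, and only then schedule the `dry_run=False` version.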

- Another option is to run 'Belvedere' on every user PC, so that it pushes only the desired data to the 4 TB disk at scheduled intervals. You would then need to tell every user that they are no longer allowed to make backups themselves.
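
The per-PC "push only the desired data" variant is essentially an allow-list copy. A minimal Python sketch, again not Belvedere itself: `push_documents` and `ALLOWED_EXTENSIONS` are hypothetical names, and it copies a file only when the archive copy is missing or older (by modification time), keeping the folder layout.

```python
import os
import shutil
from pathlib import Path

# Hypothetical allow-list of "real data" file types.
ALLOWED_EXTENSIONS = {".doc", ".docx", ".xls", ".xlsx", ".pdf"}


def push_documents(src, dst, allowed=ALLOWED_EXTENSIONS):
    """Copy only allowed file types from `src` to `dst`, preserving the folder layout.

    A file is copied when it is missing at the destination or the source copy
    has a newer modification time. Returns the destination paths written.
    """
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src):
        rel_dir = os.path.relpath(dirpath, src)
        for name in filenames:
            if Path(name).suffix.lower() not in allowed:
                continue
            source = os.path.join(dirpath, name)
            target = os.path.normpath(os.path.join(dst, rel_dir, name))
            if os.path.exists(target) and os.path.getmtime(target) >= os.path.getmtime(source):
                continue  # archive copy is already up to date
            os.makedirs(os.path.dirname(target), exist_ok=True)
            shutil.copy2(source, target)  # copy2 also preserves the modification time
            copied.append(target)
    return copied
```

The trade-off is the reverse of a block-list: unfamiliar file types never reach the archive, which keeps it clean but can silently leave out something that mattered.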

A combination of the two is also a possibility of course.

To my knowledge your network doesn't use Active Directory (AD), just a workgroup. Still, you could lock down the shared folder(s) on the 4 TB disk (so that only a specific Windows user account or user group can write to them) and create that specific Windows account on each user PC to run 'Belvedere' from the Task Scheduler.

Perhaps you can manage this whole task with the Windows Task Scheduler alone, but 'Belvedere' does make automated file management easier. It's GPL-licensed and was last updated in 2012. Then again, how much development can you put into this piece of software without adding bells and whistles outside its scope? And if you are worried, you can get the source script from the GitHub page and alter it as you see fit.

IainB

  • Supporting Member
  • Joined in 2008
  • Posts: 7,544
  • @Slartibartfarst
Re: active extension filter for archive drive
« Reply #3 on: November 25, 2015, 10:25 AM »
^^ Having trialed Belvedere, I reckon that it might be able to fit the bill - more or less. However, I recall that it was a real CPU resource hog, which was why I stopped using it - and that was just when it was monitoring only a few files/folders on my hard drive.

UPDATE 2015-11-26 0817hrs:
   For information, these points are copied from my notes:

The GitHub page indicates that development/maintenance stopped about 4 years ago (i.e., 2012), so v0.7.1 is still the current/"latest" version and it will presumably have the same CPU overload characteristics. I had the thing set to run at startup, but had to keep terminating the process because it consistently kept a high CPU overhead even when it was "doing nothing". It looked to me as though the constant monitoring was keeping the proggie in a perpetual and inefficient bind (redolent of a certain early queuing algorithm in IBM's VM/CMS...).
I reckoned the concept was very good though    :up: , and perhaps if I had had the time and inclination then I might have tinkered about with the code, but I didn't have either.
« Last Edit: November 25, 2015, 01:21 PM by IainB »

4wd

  • Supporting Member
  • Joined in 2006
  • Posts: 5,644
Re: active extension filter for archive drive
« Reply #4 on: November 26, 2015, 06:17 AM »
Quote from questorfla:
> From time to time, I have used "Everything.exe" (the Best File Search engine I have ever seen...<a small 'plug' to Voidtools.com for making it :)> to locate the *.tmp and *.exe etc. and delete them all.

You could create a Bookmark in Everything with a RegEx that matches all the extensions you want to list for removal.

Then when you run Everything, you only need to enter, for example, badext: and it will list every file with those extensions; you then select them all and delete them at once.

E.g., here's a Bookmark I use for *.jpg_original files (created by ExifTool):

[Screenshot: 2015-11-26 23_14_35.png]

I just enter ojpg: and it lists all *.jpg_original files over the whole system.

If you tell us what extensions you commonly remove it should be a simple matter to create a RegEx that would match them all.
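
Building that RegEx from a list of extensions is mechanical. This Python sketch (a hypothetical `extension_regex` helper) escapes each extension and joins them into one alternation anchored at the end of the name, giving `\.(bak|tmp|old)$` for .bak, .tmp, and .old. Everything's regex flavour and case handling may not match Python's `re` exactly, so the assumption here is only that Python is used to build and test the pattern; check it in Everything's search box before saving it as a bookmark.

```python
import re


def extension_regex(extensions):
    """Build one pattern matching a filename that ends in any of `extensions`.

    Leading dots are optional in the input, and each extension is escaped,
    so names like "jpg_original" or "c++" are matched literally.
    """
    alternatives = "|".join(re.escape(ext.lstrip(".")) for ext in extensions)
    return rf"\.({alternatives})$"


pattern = extension_regex(["bak", "tmp", "old"])
print(pattern)  # \.(bak|tmp|old)$

# Python-side check; IGNORECASE so "REPORT.BAK" matches too.
matcher = re.compile(pattern, re.IGNORECASE)
```

The `$` anchor matters: without it, a name like `backup.bak.docx` would also match.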

An example bookmark for .bak, .tmp, and .old:

[Screenshot: 2015-11-27 14_30_53.png]

[Screenshot: 2015-11-27 14_35_14.png]

[Screenshot: 2015-11-27 14_30_35.png]
« Last Edit: November 26, 2015, 09:40 PM by 4wd »