Topics - bhuiraj

1
I need a small app that can remove duplicate entries from a single large text file (anywhere from a couple hundred MB to several GB in size). I have tried several apps for this, but they all crash, possibly because they run out of free RAM (loading an 800MB text file results in the apps using over 1.5GB of RAM). The text files are in the format of one word per line:

donation
coder
<3
cool
cat
Mike

If there is some workaround (such as splitting a file into multiple parts and then comparing each part against the rest once to make sure there are no duplicates), that would be great.
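One way the splitting workaround could look is sketched below in Python. This is only an illustration under stated assumptions, not a finished tool: the function name dedupe_large_file and the num_buckets parameter are made up for this example. Instead of comparing every part against every other part, each line is routed to a temporary bucket file based on a hash of the word, so identical words always land in the same bucket and each bucket can be deduplicated in memory on its own. The sketch assumes blank lines can be dropped, that the original line order does not matter, and that each bucket fits in RAM (raise num_buckets if it does not).

```python
import os
import tempfile
import zlib


def dedupe_large_file(src_path, dst_path, num_buckets=64):
    """Write the unique lines of src_path to dst_path and return their count.

    Hypothetical helper for illustration. Output order is NOT preserved.
    """
    unique_count = 0
    with tempfile.TemporaryDirectory() as tmp_dir:
        bucket_paths = [
            os.path.join(tmp_dir, f"bucket_{i}.txt") for i in range(num_buckets)
        ]
        buckets = [open(path, "wb") for path in bucket_paths]
        try:
            # Pass 1: stream the big file once, never holding it all in memory.
            with open(src_path, "rb") as src:
                for line in src:
                    word = line.rstrip(b"\r\n")
                    if word:  # assumption: blank lines are not wanted
                        index = zlib.crc32(word) % num_buckets
                        buckets[index].write(word + b"\n")
        finally:
            for bucket in buckets:
                bucket.close()

        # Pass 2: duplicates can only exist within one bucket, so each
        # bucket is deduplicated independently with an in-memory set.
        with open(dst_path, "wb") as dst:
            for path in bucket_paths:
                with open(path, "rb") as bucket:
                    unique_lines = set(bucket)
                for line in unique_lines:
                    dst.write(line)
                unique_count += len(unique_lines)
    return unique_count
```

A call such as dedupe_large_file("words.txt", "words_unique.txt") would need free disk space roughly equal to the size of the input for the temporary buckets, plus room for the output file.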

Please see https://www.donation...ex.php?topic=26416.0 for sample dictionary files and let me know if you have any questions. Thank you!

2
IT security is a hobby of mine and one aspect requires building large (from hundreds of megabytes to several gigabytes) dictionary (text) files in the format of one word on each line. For example:

donation
coder
<3
cat
dog
Mary

Currently, my collection includes several gigabytes of dictionaries spread across dozens of text files (the largest is over 30GB, while most are between a few MB and 1.5GB). The problem is that I waste days and even weeks of processing time because of all the duplicate entries between files. So, I need a program that lets me select a master list and multiple additional lists and do either of the following (both functions would be ideal, but it's entirely up to whatever you are comfortable doing and consider a coding snack; if you can code an app that does both and works with large text files, I can make a small donation to you as a thank you):
1) compare the master list to the additional lists and remove the duplicate entries from the additional lists
2) compare the additional lists to the master list and remove the duplicate entries from the master list (the opposite of #1)
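Both functions boil down to the same operation with the roles swapped: remove from a target list every word that appears in one or more reference lists. A minimal Python sketch of that idea follows. The names load_words and subtract_lists are hypothetical, and the sketch assumes the reference lists together fit in RAM as a set, which will not hold for the largest files here; those would need an on-disk approach such as sorting or hash partitioning first. The target list is streamed line by line, so its size is not limited by memory, and its original order is kept.

```python
def load_words(paths):
    """Collect every non-blank line from the given files into one set."""
    words = set()
    for path in paths:
        with open(path, "rb") as f:
            for line in f:
                word = line.rstrip(b"\r\n")
                if word:
                    words.add(word)
    return words


def subtract_lists(target_path, reference_paths, out_path):
    """Copy target_path to out_path, skipping words found in any reference.

    Returns the number of lines kept. Duplicates inside the target itself
    are left alone; only overlaps with the reference lists are removed.
    """
    reference = load_words(reference_paths)  # assumption: fits in memory
    kept = 0
    with open(target_path, "rb") as src, open(out_path, "wb") as dst:
        for line in src:
            word = line.rstrip(b"\r\n")
            if word and word not in reference:
                dst.write(word + b"\n")
                kept += 1
    return kept
```

For function 1, each additional list would be the target and the master the only reference, for example subtract_lists("extra1.txt", ["master.txt"], "extra1_clean.txt"). For function 2, the master is the target and the additional lists are the references. The file names are placeholders.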

I have tried a couple of public programs, but they crash or error out when I use multiple files or files larger than a couple hundred megabytes. I came across posts suggesting that RAM limitations might be the cause, but I don't think that is the case, since I have over 3GB of RAM.

I hope that someone will be able to do this, and I apologize if I have not given enough detail. Please let me know if you have any questions about this. Thank you!
