Topic: IDEA: Compare dictionary (text) files and remove duplicate entries
MilesAhead
« Reply #25 on: May 02, 2011, 05:41:35 PM »

Quote
How would I use/apply this? smiley

I would hazard a guess that most apps that use a "one word per line" flat text file dictionary just suck the whole file into RAM and split on the end-of-line marker.  For example, AutoIt3 has the user-defined function _FileReadToArray().  If the file fits in RAM it's trivial: each array element is a line of the file.  Most dictionaries I've used are less than 2 MB in size.
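
Just to make that concrete, here's a minimal sketch of the whole-file approach in Python rather than AutoIt3 (the file names are placeholders):

Code: (Python)
# Read the whole wordlist into RAM and split on the end-of-line marker,
# the same idea as AutoIt3's _FileReadToArray().
with open("wordlist.txt", "r", encoding="utf-8") as f:
    words = f.read().splitlines()

# dict.fromkeys() drops duplicate lines while keeping first-seen order.
unique = list(dict.fromkeys(words))

with open("wordlist_dedup.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(unique) + "\n")

This only works while the whole file fits comfortably in RAM, which is exactly the point being made above.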

You haven't specified whether there's any particular software you use most often to access this data.

MilesAhead
« Reply #26 on: May 02, 2011, 06:13:37 PM »

Just for grins I started a thread on a db forum:

http://forums.databasejou...d.php?p=129101#post129101

If anyone uses the terms "password cracking" or "dictionary attack" you're on your own!!


bhuiraj
« Reply #27 on: May 02, 2011, 06:51:02 PM »

Quote from: MilesAhead
How would I use/apply this? smiley

I would hazard a guess that most apps that use a "one word per line" flat text file dictionary just suck the whole file into RAM and split on the end-of-line marker.  For example, AutoIt3 has the user-defined function _FileReadToArray().  If the file fits in RAM it's trivial: each array element is a line of the file.  Most dictionaries I've used are less than 2 MB in size.

You haven't specified whether there's any particular software you use most often to access this data.


I use several different pieces of software, so there isn't one specific program I rely on. All of them support this plain dictionary/wordlist/text file format.

Quote from: MilesAhead
Just for grins I started a thread on a db forum:

http://forums.databasejou...d.php?p=129101#post129101

If anyone uses the terms "password cracking" or "dictionary attack" you're on your own!!



lol thanks smiley
DK2IT
« Reply #28 on: May 03, 2011, 03:03:20 AM »

Quote
How would I use/apply this? smiley

Just use a DB manager for SQLite, like SQLite Database Browser, or the command-line version; there are many other programs as well.

Quote
I don't see what you suggested that I didn't already in this post:
http://www.donationcoder....26416.msg245865#msg245865
Nothing new, just a real implementation, because we didn't know how fast a DB keyed on the word would be. I can say that it is very fast and does not need much RAM, but it does need hard disk space. Maybe an enterprise DB (like Oracle/MySQL/etc.) can handle GBs of data better than SQLite, but the approach is the same.
Of course, you must find the right program to handle it, because some GUI apps (like SQLite DB Browser) load the whole file into RAM and need over 1 GB for that 100 MB file. The command-line version needs only about 3 MB instead.
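
For anyone who wants to try the same thing from a script instead of a GUI, here's a rough sketch using Python's built-in sqlite3 module (file names are placeholders; the word-as-primary-key schema is the idea being described):

Code: (Python)
import sqlite3

conn = sqlite3.connect("words.db")
conn.execute("CREATE TABLE IF NOT EXISTS words (word TEXT PRIMARY KEY)")

# INSERT OR IGNORE skips duplicate keys, so loading the wordlist
# and de-duplicating it happen in the same pass.
with open("wordlist.txt", "r", encoding="utf-8") as f:
    conn.executemany("INSERT OR IGNORE INTO words VALUES (?)",
                     ((line.rstrip("\n"),) for line in f))
conn.commit()

# Lookups go through the primary-key index, so they stay fast
# even when the table is much bigger than RAM.
hit = conn.execute("SELECT 1 FROM words WHERE word = ?",
                   ("example",)).fetchone()
print("found" if hit else "not found")
conn.close()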
MilesAhead
« Reply #29 on: May 03, 2011, 07:24:16 PM »
« Reply #29 on: May 03, 2011, 07:24:16 PM »

Quote from: DK2IT
Nothing new, just a real implementation, because we didn't know how fast a DB keyed on the word would be. I can say that it is very fast and does not need much RAM, but it does need hard disk space. Maybe an enterprise DB (like Oracle/MySQL/etc.) can handle GBs of data better than SQLite, but the approach is the same.
Of course, you must find the right program to handle it, because some GUI apps (like SQLite DB Browser) load the whole file into RAM and need over 1 GB for that 100 MB file. The command-line version needs only about 3 MB instead.

I added a reply to the "ask the expert" thread I started, saying there has to be a worst-case scenario, and a keys-only DB is likely to be it.  That was yesterday, and I see it hasn't cleared the moderator.  I think they don't really want to bring up the Achilles' heel.  I doubt I'll see my reply.

To really test this out you should have some method that directly accesses the flat file, and compare it against the DB for speed and overhead.  A dummy run of a few MB doesn't mean anything; just about any manipulation done entirely in RAM is going to be fast.  We need a comparison of DB and non-DB access, say for an 8 GB flat file of words, and then see what happens.

I would tend to guess the DB overhead would not be worth the effort compared to direct flat-file access and manipulation for a simple search. Also, I suspect that if you made a 34 GB table of keys, the DB would crash on the OP's machine. smiley
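
One way to get the non-DB side of that comparison, assuming one word per line, is a plain streaming scan, so memory use stays constant no matter how big the file gets (file name and helper name are made up for illustration):

Code: (Python)
# Stream the flat file line by line instead of loading it into RAM.
def in_wordlist(path, target):
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if line.rstrip("\n") == target:
                return True
    return False

print(in_wordlist("wordlist.txt", "example"))

Worst case (word not present) reads the whole file once, so timing this against an indexed DB lookup on the same 8 GB file would settle the question.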
DK2IT
« Reply #30 on: May 04, 2011, 12:48:49 PM »

Of course, this is a quick solution for bhuiraj; it's not optimal, but it does not require special software. I've tested 1.5 GB of data with over 227 million words, and the DB is quite big and the searches are not so fast. But, of course, if you need speed you can use a DB like MySQL or Oracle with a fine-tuned configuration (in-memory indexes, query cache, partitioned tables, etc.).
In that case, however, it is possible to create an optimal solution (without the generic DB overhead), but you would need to write specific software to handle a very, very big dictionary.
MilesAhead
« Reply #31 on: May 04, 2011, 03:29:35 PM »

Where many of the entries are variations on the same base (user01, user02, user1979, user1980, etc.), my last suggestion would be to store only the "base" of the dictionary entry and generate the variations.  That way you'd only have to store "user" on disk and have the algorithm generate all the offshoots.
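
A quick sketch of what such a generator could look like (the patterns and helper name are made up for illustration):

Code: (Python)
# Store only the base ("user") and generate the numbered
# offshoots on the fly instead of keeping them all on disk.
def variations(base, start, stop, width=2):
    yield base
    for n in range(start, stop + 1):
        plain = f"{base}{n}"
        padded = f"{base}{n:0{width}d}"  # zero-padded form, e.g. user01
        yield plain
        if padded != plain:  # skip duplicates once n outgrows the padding
            yield padded

print(list(variations("user", 1, 2)))
# ['user', 'user1', 'user01', 'user2', 'user02']
print(list(variations("user", 1979, 1980)))
# ['user', 'user1979', 'user1980']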

I'm no DB expert. Haven't read any Codd in over 20 years. I think I'm at the limit of what I can contribute. smiley


DK2IT
« Reply #32 on: May 05, 2011, 07:09:05 AM »

Quote from: MilesAhead
Where many of the entries are variations on the same base (user01, user02, user1979, user1980, etc.), my last suggestion would be to store only the "base" of the dictionary entry and generate the variations.
And that could be an interesting idea.  Thmbsup