DonationCoder.com Software > Post New Requests Here

IDEA: Compare dictionary (text) files and remove duplicate entries

DK2IT:
Of course, this is a quick solution for bhuiraj. It's not optimal, but it doesn't require special software. I tested it on 1.5 GB of data, over 227 million words; the DB is quite big and the searches are not very fast. If you need speed, you can use a DB such as MySQL or Oracle with a finely tuned configuration (in-memory indexes, query cache, table partitioning, etc.).
In this case, however, an optimal solution (without the generic DB overhead) is possible, but you would need to write dedicated software to handle a very, very big dictionary.
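
For anyone who wants to try the DB route on a small scale, here is a minimal sketch in Python. It assumes SQLite through the standard sqlite3 module, which is not necessarily the DB used in the test above, and the function and table names (merge_unique, words) are made up for illustration. The primary key on the word column is what removes the duplicates.

```python
import sqlite3


def merge_unique(word_lists, db_path=":memory:"):
    """Merge several iterables of words into one sorted, duplicate-free list.

    Hypothetical helper: the table name and the use of SQLite are
    assumptions for this sketch, not the setup described in the thread.
    """
    conn = sqlite3.connect(db_path)
    # The PRIMARY KEY gives us a unique index on the word column.
    conn.execute("CREATE TABLE IF NOT EXISTS words (word TEXT PRIMARY KEY)")
    for words in word_lists:
        with conn:  # one transaction per input list
            conn.executemany(
                # OR IGNORE silently skips words that are already stored.
                "INSERT OR IGNORE INTO words (word) VALUES (?)",
                ((w.strip(),) for w in words if w.strip()),
            )
    result = [row[0] for row in conn.execute("SELECT word FROM words ORDER BY word")]
    conn.close()
    return result


if __name__ == "__main__":
    list_a = ["apple\n", "user01\n"]
    list_b = ["user01\n", "zebra\n", "\n"]
    print(merge_unique([list_a, list_b]))
```

Since an open text file is an iterable of lines, you could pass file objects instead of lists, and a file path instead of ":memory:" to keep the DB on disk. For hundreds of millions of words, expect the same size and speed limits mentioned above.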

MilesAhead:
Where many of the entries are variations on the same base (user01, user02, user1979, user1980, etc.), my last suggestion would be to store only the "base" of the dictionary entry and generate the variations. That way you'd only have to store "user" on disk and have the algorithm generate all the offshoots.

I'm no DB expert. Haven't read any Codd in over 20 years. I think I'm at the limit of what I can contribute. :)
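
A rough sketch of the "store only the base" idea, in Python. It assumes a variation is simply a base followed by trailing digits, which is only one possible rule; the names split_base, bases and expand are hypothetical.

```python
import re

# Assumption for this sketch: a "variation" is a base plus trailing digits.
_SUFFIX = re.compile(r"^(.*?)(\d+)$")


def split_base(entry):
    """Split 'user1979' into ('user', '1979'); entries without digits keep an empty suffix."""
    match = _SUFFIX.match(entry)
    return (match.group(1), match.group(2)) if match else (entry, "")


def bases(entries):
    """Return the sorted set of bases, which is all that would be stored on disk."""
    return sorted({split_base(entry)[0] for entry in entries})


def expand(base, suffixes):
    """Regenerate the base itself followed by base + suffix for each suffix."""
    yield base
    for suffix in suffixes:
        yield base + suffix


if __name__ == "__main__":
    print(bases(["user01", "user02", "user1979", "user1980", "admin"]))
    print(list(expand("user", ["01", "02"])))
```

Note that generating the offshoots from a fixed suffix list can produce entries that were never in the original dictionary, so this trades an exact word list for a much smaller file.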


DK2IT:
--- Quote from: MilesAhead ---
Where many of the entries are variations on the same base (user01, user02, user1979, user1980, etc.), my last suggestion would be to store only the "base" of the dictionary entry and generate the variations.
--- End quote ---
And that can be an interesting idea  :Thmbsup:
