

DONE: Delete double lines (all but the first) in a text file


korad:
Hello,

Sorry for my poor English.

I am looking for an application that can delete duplicate entries in a large text file. I only found a macro for UltraEdit, but it hangs if the file is larger than 1 MB. I am sure such an app already exists, but I couldn't find it with Google; I could only find other people looking for the same kind of software :) Sometimes I code little things in VBS, but I am an absolute beginner. I know I have to create two further files:

File 1: already available master file
File 2: Temporary File
File 3: Results File

Cut the first (non-empty) line from file 1 and paste it into file 2
Delete all lines in file 1 that are equal to this line
Cut line 1 from file 2 and paste it into file 3
etc. etc.
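
The steps above keep the first occurrence of each line and drop every later copy. A minimal sketch of that idea in Python (not the VBScript or AutoHotkey used elsewhere in this thread), assuming the set of distinct lines fits in memory. It swaps the temporary file for an in-memory set and treats blank lines like any other line; the function name `dedupe_keep_first` and the file names are made up for illustration:

```python
def dedupe_keep_first(src_path, dst_path):
    """Copy src to dst, keeping only the first occurrence of each line."""
    seen = set()   # stands in for the temporary file (an assumption of this sketch)
    kept = 0
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for raw in src:                  # read one line at a time
            line = raw.rstrip("\r\n")    # compare without line endings
            if line in seen:
                continue                 # a later copy: skip it
            seen.add(line)
            dst.write(line + "\n")
            kept += 1
    return kept                          # number of lines written


if __name__ == "__main__":
    # Hypothetical file names; the master file is left untouched.
    print(dedupe_keep_first("master.txt", "results.txt"))
```

Because the source is read line by line and only distinct lines are remembered, the master file itself never has to be loaded whole.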

I would appreciate some help.

Many thanks :)

chrisi

jgpaiva:
I made a similar script for another recent request.
I'll modify it and post it here so you can test it.
But I have to warn you: it'll be slow, and it'll be limited to a maximum of 64 MB of text.
Anyway, I'll give it a go.
(It's an AHK script. If it were written in C, I'm sure it'd be about a million times faster, but I don't remember C too well, and I can't compile it for Windows.)

skrommel:
 :-[ I really should start reading the whole post!

Here's one, but it sorts the file, and it's limited to 1 GB.

Skrommel



;DelDuplicates.ahk
; Removes duplicate lines from a text file
; To run, download and install AutoHotkey from www.autohotkey.com
;Skrommel @2006

infile=C:\Temp\in.txt
outfile=C:\Temp\out.txt

#MaxMem 1024
SetBatchLines,-1
FileRead,file,%infile%
If ErrorLevel=0
{
  Sort,file,U
  FileDelete,%outfile%
  FileAppend,%file%,%outfile%
  file=
}
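
For comparison, the sort-then-remove-duplicates idea behind the script above can be sketched in Python. This is only an illustration of the technique, not a translation of the AHK script: it compares lines exactly, whereas case handling in AHK's Sort depends on its options, and the name `sort_unique` is hypothetical.

```python
def sort_unique(lines):
    """Return the distinct lines in sorted order (the original order is lost)."""
    return sorted(set(lines))


sample = ["pear", "apple", "pear", "fig", "apple"]
print(sort_unique(sample))   # ['apple', 'fig', 'pear']
```

The result no longer follows the order of the input, which matters if "all but the first" is meant to keep each line at its original position.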



Try this one!

Skrommel


;DelDouble.ahk
; Removes repeated lines that directly follow each other
; (a repeat further down an unsorted file is kept)
; To run, download and install AutoHotkey from www.autohotkey.com
;Skrommel @2006

fromfile=C:\Temp\in.txt
tofile=C:\Temp\out.txt

SetBatchLines,-1
FileDelete,%tofile%
prevline=
Loop,Read,%fromfile%
{
  If A_LoopReadLine<>%prevline%
    FileAppend,%A_LoopReadLine%`n,%tofile%
  prevline=%A_LoopReadLine%
}
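
Comparing each line only with the previous one removes repeats that sit next to each other, so on an unsorted file a later copy can survive. A small Python sketch of that behaviour, with a hypothetical function name and lines compared exactly:

```python
def drop_adjacent_repeats(lines):
    """Drop a line only when it equals the line directly before it."""
    result = []
    prev = None                  # no previous line yet
    for line in lines:
        if line != prev:
            result.append(line)
        prev = line
    return result


print(drop_adjacent_repeats(["a", "a", "b", "a"]))           # ['a', 'b', 'a']
print(drop_adjacent_repeats(sorted(["a", "a", "b", "a"])))   # ['a', 'b']
```

Sorting first makes all copies adjacent, at the cost of changing the line order.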

PhilKC:
Pseudo code:


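// Needs: using System; using System.Collections; using System.IO;
StreamReader reader = new StreamReader(inputFile);   // "in" is a reserved word in C#
string[] lines = reader.ReadToEnd().Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);
reader.Close();
ArrayList checker = new ArrayList();                 // unique lines, in first-seen order
for (int i = 0; i < lines.Length; i++)
     if (!checker.Contains(lines[i]))                // linear search, slow on big files
          checker.Add(lines[i]);
StreamWriter writer = new StreamWriter(outputFile);  // "out" is a reserved word too
for (int i = 0; i < checker.Count; i++)
     writer.WriteLine(checker[i]);                   // indexer uses [], not ()
writer.Close();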
That was from memory, so, I have no idea if it would compile... (It's C# :P)

PhilKC

jgpaiva:
Here's the modified version I mentioned.
Its algorithm is quite good, but AHK is a scripting language, so it certainly takes more time than C would.
It took 1 minute 45 seconds to find the repeated entries in a 9000-line file on my Centrino 2.0 laptop.
Still, it does solve your problem.
It doesn't alter the original file; the file it creates simply doesn't contain the repeated entries.
It has a small bug: the progress bar isn't accurate. Processing is much faster near the end of the file than at the beginning. Just a heads-up, in case you start thinking about giving up at the beginning.
According to the AHK documentation, it should be able to handle 64 MB of plain text.

Hope it solves your problem.
(By the way: the .ahk file needs AutoHotkey to run, and the .exe file only accepts a file called "textfile.txt" as input and only writes to a file called "out.txt". Both are in the attached compressed file.)

.exe version
.ahk version
