Topic: delete all the lines that do NOT start with a specific string in multiple files

kalos

Hello,

I have a number of text files, and I want to delete all the lines that do NOT start with a specific string.

How can I do this?

Thanks

EDIT: the program must support Unicode characters.

housetier

On Linux, a combination of grep and the shell's for loop could accomplish this:

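export BAN="whatever should not be kept"
mkdir after
# options and pattern come first, then the file; quotes protect spaces in the pattern and in filenames
for textfile in *.txt; do grep -v "^$BAN" "$textfile" > "after/$textfile"; done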

This should work if the files all have a .txt extension.

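"grep -v" prints lines that do NOT match; "^" means beginning-of-line; "$BAN" will be replaced by whatever you put between the quotation marks in the export statement above. As written, this removes every line that starts with $BAN; to keep only the lines that start with the string, as the question asks, drop the "-v". The output from grep is then put into a new file with the same filename, but in a different directory.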
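For anyone without grep at hand, the selection logic described above can be sketched in Python. This is only an illustration, not housetier's method: the name `select_lines` and its `invert` parameter are made up here, and unlike grep the prefix is compared literally rather than as a regular expression. With `invert=True` it behaves roughly like `grep -v "^prefix"`; with the default it keeps only the lines that start with the prefix, which is what the original question asks for.

```python
def select_lines(lines, prefix, invert=False):
    """Return the lines that start with `prefix`.

    With invert=True, return the lines that do NOT start with it,
    mirroring grep's -v flag. `select_lines` is a hypothetical helper;
    the prefix is compared literally, not as a regular expression.
    """
    return [line for line in lines if line.startswith(prefix) != invert]


sample = ["keep: one", "drop this", "keep: two"]
print(select_lines(sample, "keep:"))               # lines starting with "keep:"
print(select_lines(sample, "keep:", invert=True))  # all the other lines
```

Python strings are Unicode, so non-ASCII prefixes work the same way as ASCII ones.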

If you have Cygwin, you can do this under Windows as well.

bgd77

kalos, could you specify if this is for Windows or Linux?

TucknDar

edit:
completely misunderstood... sorry

kalos

It's for Windows XP.
Thanks

Ehtyar

Well, I now despise Windows for having such a totally sucky console, but I managed to get this working.

As an exercise, I compiled GNU grep myself. It required a couple of patches, but all in all it was relatively simple. If you're not able to do this yourself, you can get an only slightly outdated binary from UnxUtils (I've included my build in the attached zip). Converting the shell script to batch was a huuuge pain in the ass because (again) the Windows console sucks: it doesn't honor redirection and piping symbols in a for loop the way it would in a standalone command (you have to enclose your command in parens before the >, < or | symbol). God help you trying to find that on Google (I ended up randomly guessing it). Finally, grep on Windows converts CRLF line endings to plain LF (for internal operational reasons), so we need the todos utility to convert them back again.
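To illustrate what that last conversion step does, here is a rough Python sketch. It is not the todos utility itself; `to_dos` is a name made up for this example. It works on raw bytes, so it assumes an ASCII-compatible encoding such as UTF-8 and would corrupt UTF-16 text.

```python
import re


def to_dos(data: bytes) -> bytes:
    """Convert bare LF line endings to CRLF.

    Existing CRLF pairs are left alone: the lookbehind only matches an LF
    that is not already preceded by CR. `to_dos` is a hypothetical helper
    and assumes an ASCII-compatible encoding (e.g. UTF-8).
    """
    return re.sub(rb"(?<!\r)\n", b"\r\n", data)


print(to_dos(b"first\nsecond\r\nthird\n"))
```

Running it twice on the same data is harmless, since lines that already end in CRLF are not touched.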

So here you are, give it a try and let me know how it goes.

Ehtyar.

[edit]
Er...should this not be moved to the coding snack new request forum?
[/edit]

[edit2]
housetier wrote: "If you have cygwin, you can do this under windows as well."
MSYS will manage it also, minus the enormity of Cygwin.
[/edit2]
« Last Edit: March 14, 2009, 05:49 PM by Ehtyar »

bgd77

I think that the only way to do it is how Ehtyar suggested. I'm no expert in the Windows Command Prompt, but I don't think it is powerful enough to make changes in files. If someone knows better, please correct me!

One alternative would be to install ActivePerl or Python and to write a small program to do this, if you think that all the trouble is worthwhile.
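A small Python program along those lines might look like the sketch below. Everything in it is an assumption for illustration: the function name `keep_matching_lines`, the "after" output folder (borrowed from the grep example earlier in the thread), and the UTF-8 default. Files saved as "Unicode" by Windows Notepad are UTF-16, in which case `encoding="utf-16"` would be the value to try. Opening the files with `newline=""` leaves CRLF line endings as they are, so no conversion step is needed afterwards.

```python
from pathlib import Path


def keep_matching_lines(folder, prefix, encoding="utf-8", out_name="after"):
    """For every *.txt file in `folder`, write only the lines that start
    with `prefix` to a file of the same name in the `out_name` subfolder.

    Returns the list of output paths. The originals are not modified.
    All names and defaults here are hypothetical choices for this sketch.
    """
    folder = Path(folder)
    out_dir = folder / out_name
    out_dir.mkdir(exist_ok=True)
    written = []
    for src in sorted(folder.glob("*.txt")):
        dst = out_dir / src.name
        # newline="" disables newline translation, so line endings survive as-is
        with open(src, encoding=encoding, newline="") as fin, \
             open(dst, "w", encoding=encoding, newline="") as fout:
            # files are processed line by line, so memory use stays small
            fout.writelines(line for line in fin if line.startswith(prefix))
        written.append(dst)
    return written
```

Calling `keep_matching_lines(".", "needle")` from the folder that holds the text files would leave the filtered copies in an "after" subfolder.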

Ehtyar

*sigh*, life would indeed be easier if Windows came with a POSIX-compatible shell or Strawberry Perl, but alas, here I am passing around utilities that others would've had on almost any other operating system. I think, however, that this is a debate for another day.

kalos, would you let us know how it goes please?

Ehtyar.

rjbull

Ehtyar wrote: "Converting the shell script to batch was a huuuge pain in the ass because (again) the Windows console sucks, and doesn't honor piping symbols in a for loop in the same way it would as a standalone command"

Ehtyar,

Take a look at Horst Schaeffer's List MODifier LMOD, which is a great workaround for making batch files to get round these kinds of problems.


Ehtyar

I do like the sound of that utility, rjbull; thanks for recommending it. However, the problem is no longer a problem, as I found the solution. I was complaining about the lack of documentation regarding the Windows console. Perhaps my searching skills failed me, but if you have to resort to guessing the syntax of any language, the documentation is in pretty poor shape.

Ehtyar.

David1904

TextHarvest is useful for keeping or deleting lines from text files. It is very simple to use.
You can find it on Tucows or other download sites.

TheQwerty

I believe since Windows NT there's been a command, findstr, which acts similarly to grep, but I'm not sure about the support for Unicode.

You should be able to do something like this:
findstr /V "^needle" input.txt>output.txt
The /V tells it to print the lines that do not match the regular expression "^needle" (that is, it drops any line beginning with "needle"). Leave out /V to keep only the lines that begin with "needle".

bgd77

I've tried it and searched some forums. findstr does not support Unicode.

So, find must be used, and it seems to be really complicated to use it for this purpose.

bgd77

I think that this can be easily done with Windows PowerShell. Does anyone know more about it?

app103

Could probably do it with WSH and JavaScript too.

Krishean

I could do this in JavaScript/WSH; I just need to know:

1. How large will the files be?
2. How much memory are you willing to let the script use (more memory = faster, less = slower)?