Extract REGEX matches from multiple text files

4wd:
Yeah, kind of overstepped my G.A.S. limit but it provided a little mental exercise.

Normally would have left it at my first post but I was a little bored ...

kalos:
Thanks, but I struggle to follow. I find AHK much more straightforward. But how can I make it work with a 25GB file?

kalos:
In the meantime I will read https://www.itprotoday.com/management-mobility/dons-188-minute-powershell-crash-course-you-can-learn

Ath:
In the meantime I will read https://www.itprotoday.com/management-mobility/dons-188-minute-powershell-crash-course-you-can-learn
-kalos (August 04, 2018, 04:59 PM)
--- End quote ---
I expect you'll comprehend what's written there; it seems well suited to PowerShell n00bs.

Thanks, but I struggle to follow. I find AHK much more straightforward. But how can I make it work with a 25GB file?
-kalos (August 04, 2018, 03:46 PM)
--- End quote ---
Please stop asking for an AHK solution. Those who have participated here so far aren't going to provide one, as a perfectly good solution has already been provided.

If you had tried the script on actual data, you wouldn't have asked again about the 'measly' 25 GB file. Yes, of course it will take some time to process, but so does an 18.8 minute PowerShell crash course.
PowerShell is built on the foundation of .NET, so it knows how to handle files efficiently.
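
As a rough illustration of that point, here is a minimal sketch of reading a large file through .NET one line at a time, so the whole file never has to sit in memory. The function name and its parameters are hypothetical and not from this thread, and it assumes every match fits on a single line.

```powershell
# Hypothetical helper: the name and parameters are illustrative only.
function Get-RegexMatchFromLargeFile {
    param(
        [string]$Path,     # file to scan
        [string]$Pattern   # regular expression to look for
    )
    $regex = [regex]$Pattern
    # .NET resolves relative paths against its own working directory,
    # so convert to a full path first.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    # File.ReadLines yields one line at a time instead of loading the whole file.
    foreach ($line in [System.IO.File]::ReadLines($fullPath)) {
        foreach ($m in $regex.Matches($line)) {
            $m.Value   # emit each match to the pipeline
        }
    }
}
```

Because the matches are emitted to the pipeline, they can be appended to a file with `>>` in the same way as the one-liners in this thread.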

Why did you leave these rather important 'details' out in your original question?
-Ath (August 04, 2018, 07:03 AM)
--- End quote ---

Um .... because that's what he normally does ... it always takes at least a week, (sometimes longer or never), before all pertinent information is obtained ...
-4wd (August 04, 2018, 07:37 AM)
--- End quote ---
I know, I know, I'm just trying to educate someone (again, but it doesn't seem to be picked up much), see my quote below...

kalos:
$items - an arbitrarily named variable
=        - assignment operator, stores the result of what follows in the variable
Get-ChildItem

Thus $items now equals an array of files in the current folder that match *.txt
$items[0] = firstfile.txt
$items[1] = secondfile.txt
etc.

$items.Count  - total number of matching files found

for(){}   - a for loop, $i is a variable that gets incremented by 1 every loop until the total number of matching files is reached

Thus loop through all the files in the array performing the following on every file:

Select-String -Path $items[$i] -Pattern $regex -AllMatches

Search each file for matching RegEx pattern, get all matches.

| % { $_.Matches } | % { $_.Value } >> $outfile

RegEx matches are piped into a ForEach loop (shorthand notation). For each regex match, pipe its value to the output file in append mode.
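
The original script is not reproduced in this quote, so the following is only a sketch of how the pieces described above might fit together. The function wrapper, its name, and its parameters are assumptions added to keep the example self-contained; the loop body follows the fragments shown above.

```powershell
# Hypothetical wrapper: the name and parameters are illustrative only.
function Export-RegexMatch {
    param(
        [string]$Folder,   # folder holding the *.txt files
        [string]$Regex,    # pattern to search for
        [string]$OutFile   # file that receives one match per line
    )
    # @() forces an array even when only one file (or none) is found.
    $items = @(Get-ChildItem -Path $Folder -Filter *.txt -File)
    for ($i = 0; $i -lt $items.Count; $i++) {
        # Find every match in the current file and append its text to the output.
        Select-String -Path $items[$i].FullName -Pattern $Regex -AllMatches |
            ForEach-Object { $_.Matches } |
            ForEach-Object { $_.Value } >> $OutFile
    }
}
```

If the output file is itself a .txt file in the scanned folder, it will be picked up as input on a later run, so it is safer to write it somewhere else.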

Don't actually need to escape the " in the RegEx either:

--- Code: PowerShell ---
$regex = '<dsf:tsdfgd trsdfge="urn:x-ssdfgs-dfg-com:isdfgc/tg4r3e-i4d" id="OsdfgsdfD">'

Will also work.

Same as the 6 lines above without assigned variables or a for loop:

--- Code: PowerShell ---
gci *.txt | % { sls $_.Name -Pattern '<dsf:tsdfgd trsdfge="urn:x-ssdfgs-dfg-com:isdfgc/tg4r3e-i4d" id="OsdfgsdfD">' -a | % { $_.Matches } | % { $_.Value } >> K:\out.txt }
-4wd (August 03, 2018, 10:11 PM)
--- End quote ---


That is very helpful, thanks!

From what I have understood, the script will first scan the folder it is in for all the txt files present, put them in an array, and process them one by one. Actually I think I can skip that bit if it can process the whole 25GB txt file at once.

As for the actual regex matches, what I would actually like it to do is to:
- scan the source file for a regex(A)
- on finding the first instance of regex(A), store that match in a variable and search for another regex(B) inside that variable.
- then apply a couple more regexes to that same variable and output specific parts of those matches found inside the initial regex(A) match. By output I mean write them sequentially, line by line, to an output file.
- then continue the loop with the next regex(A) match in the source file: store it in a variable, search for the same regex(B) etc. inside that variable, and write parts of those matches to the output file.

Sounds very basic and simple. Can you tell me what commands I need to write something like that please?
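
One possible shape for the steps listed above, as a minimal sketch only. The function name, its parameters, and the choice to write capture group 1 are assumptions, not something given in this thread. Select-String works through the file one line at a time, so this also assumes that each regex(A) match sits on a single line.

```powershell
# Hypothetical helper: the name and parameters are illustrative only.
function Export-NestedMatch {
    param(
        [string]$Path,          # source file to scan
        [string]$OuterRegex,    # regex(A)
        [string[]]$InnerRegex,  # regex(B) and any further patterns
        [string]$OutFile        # file that receives one result per line
    )
    Select-String -Path $Path -Pattern $OuterRegex -AllMatches |
        ForEach-Object { $_.Matches } |
        ForEach-Object {
            $block = $_.Value   # text of one regex(A) match
            foreach ($pattern in $InnerRegex) {
                foreach ($m in [regex]::Matches($block, $pattern)) {
                    # Emit capture group 1 if the pattern defines one,
                    # otherwise the whole inner match.
                    if ($m.Groups.Count -gt 1) { $m.Groups[1].Value } else { $m.Value }
                }
            }
        } >> $OutFile
}
```

A call might look like `Export-NestedMatch -Path .\source.txt -OuterRegex '<item .*?</item>' -InnerRegex '<name>(.*?)</name>', '<qty>(\d+)</qty>' -OutFile .\out.log`, where the file names and patterns are placeholders to be replaced with the real ones.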
