Recent Posts

176
General Software Discussion / Re: Extract REGEX matches from multiple text files
« Last post by kalos on August 10, 2018, 09:43 AM »
2. He doesn't read what has already been given because the answer is in this thread.

PS. Sorry mouser ...  :-\

Oh sorry but due to my learning difficulty I need to be pinpointed to the exact thing.
I am developing my PS understanding though and it seems very powerful  :Thmbsup:

I wrote this script:
(gc *.txt)  -replace "regex1(.+?)", "`$1" >> out

It replaces the regex with the reference from the regex and outputs to a new file. Not exactly what I want it to do.
I want to output the reference, any idea how to do that?
Also, I want to run sequential several regex matches with their own references, one by one and append each result to the output file.

I believe piping commands does not achieve this. I think piping is about getting the output object from the previous command and feed it to the next command. However, I want the various regex matches to work on the original object sequentially. This is a bit tricky, any idea?
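One way to output only the referenced group, and to run several patterns against the same original text, is to loop over the patterns and call [regex]::Matches for each. This is a minimal sketch: the sample text and both patterns are made up for illustration, and a here-string stands in for the file contents.

```powershell
# Sample text standing in for the contents of the *.txt files (hypothetical data).
$text = @'
id=101 name=alpha
id=102 name=beta
'@

# Hypothetical patterns; each has one capture group for the part to keep.
$patterns = 'id=(\d+)', 'name=(\w+)'

# Run every pattern against the same original text, one after the other,
# and collect only the text captured by group 1.
$results = foreach ($pattern in $patterns) {
    foreach ($m in [regex]::Matches($text, $pattern)) {
        $m.Groups[1].Value
    }
}

# $results | Add-Content out.txt   # uncomment to append to an output file
$results
```

Because each pattern is applied to $text itself rather than to the previous command's output, no piping between the regexes is needed.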


Also, can you tell me please how to find and select and append values from multiple xml nodes knowing their XPath?
I do that and it doesn't work:
Select-Xml -Path "*.xml" -XPath "/html:book/html:Entity" >> out
Also this doesn't work:
PS H:\> [xml]$Types = get-content *.xml
PS H:\> select-xml -xml $Types -xpath "//html:Entity"
select-xml : Namespace Manager or XsltContext needed. This query
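That error normally means the XPath uses a prefix (html:) that Select-Xml has not been told about. A minimal sketch of passing the mapping with -Namespace follows; the sample document and the namespace URI are assumptions, so substitute the xmlns value declared in the real files.

```powershell
# Hypothetical document; the namespace URI is an assumption for illustration.
[xml]$doc = @'
<book xmlns="http://www.w3.org/1999/xhtml">
  <Entity>first</Entity>
  <Entity>second</Entity>
</book>
'@

# Map the prefix used in the XPath to the namespace URI declared in the file.
$ns = @{ html = 'http://www.w3.org/1999/xhtml' }

# Select-Xml returns SelectXmlInfo objects; the matched node is in .Node
$values = Select-Xml -Xml $doc -XPath '//html:Entity' -Namespace $ns |
    ForEach-Object { $_.Node.InnerText }

$values   # first, second
```

The same -Namespace argument can be combined with -Path "*.xml" when reading from files.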

Thanks!
177
General Software Discussion / Re: Extract REGEX matches from multiple text files
« Last post by kalos on August 10, 2018, 05:32 AM »
Any idea why this does not work?
Get-Content *.xml | Out-String

I have a better idea, you tell us why you think it doesn't work.

gc *.xml -match *regex*
does not work :(

Code: PowerShell
Get-Help Get-Content

You tell us why it doesn't work.

Mmm I don't know why Get-Content *.xml | Out-String does not work to be honest.
I read at https://ss64.com/ps/out-string.html:
Send the content of Test1.txt to the console as a single string:
PS C:\> get-content C:\docs\test1.txt | out-string
So, shouldn't it work?

As for gc *.xml -match *regex* it seems that -match does not go after gc, but how do I connect/pipe these?
I had no clue, but I found in an irrelevant place on the internet this:
(Get-Content .\input.txt) -join "`r`n"
How should I know that I need to parenthesise the first object? Where does it say that in PS manual?
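On the parentheses: -match and -join are operators, not parameters of Get-Content, so without parentheses PowerShell tries to bind -match as a Get-Content parameter and fails. The built-in help topics about_Parsing and about_Comparison_Operators cover this. A small sketch, using a temporary sample file whose name and content are made up:

```powershell
# Write a small sample file to the temp folder (hypothetical content).
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'gc-demo.txt'
Set-Content -Path $path -Value 'alpha 1', 'beta', 'gamma 3'

# The parentheses run Get-Content first; -match is then applied to its result.
# On an array of lines, -match acts as a filter and returns the matching lines.
$matching = (Get-Content $path) -match '\d'

# -Raw reads the file as one string, so -match returns True/False
# and fills the automatic $Matches variable with the first match.
$raw   = Get-Content $path -Raw
$found = $raw -match 'gamma (\d)'
$digit = $Matches[1]

Remove-Item $path
$matching   # alpha 1, gamma 3
$digit      # 3
```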
178
General Software Discussion / REST API
« Last post by kalos on August 09, 2018, 02:43 PM »
Hello!

At work, they are having difficulty sending a curl command to a server via a REST API in order to download some data.

Having used wget before, I thought this curl command would be easy, and since I said so, everyone now expects me to do it.

Is there any tutorial that can enable me to do that? I have no clue.

Thanks!
179
General Software Discussion / Re: Extract REGEX matches from multiple text files
« Last post by kalos on August 09, 2018, 10:53 AM »
That's good info, so I will have to have a good read on these to be able to understand PS.

I understand better by studying examples, so I would appreciate your help with this.

PS: This is not related to the initial data file I wanted to process.

Any idea why this does not work?
Get-Content *.xml | Out-String

I want then to append to a file all the matches of a regex1 or regex2. Any idea?

Also, any idea on how to extract specific values from xml nodes?
I type  select-xml -path *.xml -xpath "/html:html/html:products/html:product/html:referenceData/html:pct/html:productId" and it doesn't work

Something else, I want to parse the text of a file and use -match on it, but I cannot figure out how to do it, it's so embarrassing!
gc *.xml -match *regex*
does not work :(
180
General Software Discussion / Re: Extract REGEX matches from multiple text files
« Last post by kalos on August 08, 2018, 08:59 AM »
gci *.txt | % { sls $_.Name -Pattern '^.*"ui_mode",(\d+).*$' -a | % { $_.Matches } | % { $_.Groups[1].Value } >> K:\out.txt }

I still struggle very much with this and Google does not help. The main source of confusion I believe is the fact that I don't know if a term in the command is a random variable name or if it is a specific variable which is part of PS core. Also, another thing is that I may be able to Google "what does % mean in powershell" to find out what a specific symbol means but if they are part of another word like $_.Matches, it becomes confusing.

So the gci command will grab all the files that match *.txt in the directory.
The pipe means that we run sequentially another command.
The sls command means that it matches the regex inside the input $_.Name. What is that? I read "The $_ variable holds a reference to the current item being processed." But I don't understand what that means. Any idea?
Then, we move on to the command % which is basically a for-each loop. So for each of the Matches found with the previous command, we run another for each loop, the Groups[1].Value. I don't know what that is either. Any idea?
Finally we append to out.txt.

Are the terms Name, Matches, Groups, Value, something standard in Powershell or they are random variables names?
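For what it is worth, Name, Matches, Groups and Value are property names defined by the .NET objects that the commands return, not names invented by the script author, and $_ is PowerShell's automatic variable for the current pipeline item. The sketch below shows this on a single made-up line, with Select-String reading from the pipeline instead of a file.

```powershell
# Select-String can also take input from the pipeline, so no file is needed here.
$line = ',["ui_mode",1234,[null,null,"inline"]'

$info = $line | Select-String -Pattern '"ui_mode",(\d+)' -AllMatches

# $info is a MatchInfo object; Matches, Groups and Value are its
# (and its children's) properties.
$typeName = $info.GetType().FullName           # Microsoft.PowerShell.Commands.MatchInfo
$whole    = $info.Matches[0].Value             # the whole match: "ui_mode",1234
$group1   = $info.Matches[0].Groups[1].Value   # only the first capture group: 1234

# $_ means "the item currently coming down the pipeline" inside % { }.
$numbers = $info | ForEach-Object { $_.Matches } | ForEach-Object { $_.Groups[1].Value }

# Get-Member lists the properties an object really has.
$info | Get-Member -MemberType Property | Select-Object -ExpandProperty Name
```

Piping any object to Get-Member is a quick way to tell a built-in property from an arbitrary variable name.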
181
General Software Discussion / Re: Extract REGEX matches from multiple text files
« Last post by kalos on August 07, 2018, 09:01 AM »
I want answers to specific questions, not a complete solution. It is not possible for anyone external to offer a complete solution because the source data cannot be shared.
Also, most of my questions are for my own understanding and may not directly relate to the specific problem.
182
General Software Discussion / Re: Extract REGEX matches from multiple text files
« Last post by kalos on August 07, 2018, 04:16 AM »
Last, how to find the next regex match in the file?
Add another Select-String line with the next RegEx.

No, I don't mean a different regex. I mean the same regex. The same regex may have multiple matches in one file. How do I make the script to find the first instance, do stuff, then find the second instance, do stuff etc?

Also, how do I make . to include newline?
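In .NET regular expressions, which PowerShell uses, the inline option (?s) makes the dot match newlines as well. A minimal sketch on a made-up two-line string:

```powershell
$text = "<a>line one`nline two</a>"

# Without (?s) the dot stops at the newline, so nothing matches.
$plain = [regex]::Match($text, '<a>(.+?)</a>').Success

# (?s) turns on single-line mode: the dot now matches newlines too.
$multi = [regex]::Match($text, '(?s)<a>(.+?)</a>').Groups[1].Value

$plain   # False
$multi   # both lines
```

This only helps if the text is held as one string; Get-Content returns an array of lines unless -Raw is used.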

Also, what is | % { $_.Matches } | % { $_.Value } >> $outfile exactly?
I don't know what % and { $_. and Value are?

Also, how do I return a specific part from regex? In normal regex text editors, you put the part in parentheses and then you replace them with \1 etc. How do I do it in Powershell?

Also, I should be able to figure this out myself, but I am looking for neat code and I can only manage to come up with messy stuff: is there a script to delete lines not containing specific literal phrases? E.g. not containing 'lue } >> $ou' without having to go through each character to check if it needs escaping or not.
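For the literal-phrase filter, two options avoid escaping by hand: Select-String -SimpleMatch, or [regex]::Escape together with -match. A sketch with made-up sample lines:

```powershell
# Single quotes keep $ and { } literal; the sample lines are hypothetical.
$lines  = 'keep: % { $_.Value } >> $outfile', 'drop this line', 'also drop'
$phrase = 'lue } >> $ou'

# -SimpleMatch treats the pattern as plain text, so nothing needs escaping.
$kept = $lines | Select-String -Pattern $phrase -SimpleMatch | ForEach-Object { $_.Line }

# Equivalent with operators: escape the phrase once, then filter the array.
$kept2 = $lines -match [regex]::Escape($phrase)

$kept
```

Both forms keep only the lines that contain the phrase, which is the same as deleting the lines that do not.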
183
General Software Discussion / Re: Extract REGEX matches from multiple text files
« Last post by kalos on August 06, 2018, 03:17 PM »
Stream EDitor

Very interesting tool, thanks!
184
General Software Discussion / Re: Extract REGEX matches from multiple text files
« Last post by kalos on August 06, 2018, 02:27 PM »
Guys, the more I am looking on it, the more I am convinced that Regex would be the best solution.
Please listen to people with more (programming) experience than you have, you are really trying to hammer round screws into square holes here, don't do that, you'll hurt yourself.

OK but I would be highly interested to learn how to do the below?
Can anyone tell me please how to find a regex in a file and append it to a file? Also, how to loop that? Last, how to find the next regex match in the file?
185
General Software Discussion / Re: Extract REGEX matches from multiple text files
« Last post by kalos on August 06, 2018, 02:25 PM »
I don't understand what CDATA is.

My xml file contains tons of tags, ie text inside <>, in a complex hierarchy.
Apart from that, it contains values both inside the <>, in the format of <someTag someID="SomeValue">, and in the format of <someTag>SomeValue</someTag>

1) I don't know what the total number and hierarchy of tags is. So can I select ALL nodes under the whole hierarchy?
2) Will PS process the both formats of values above?
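Both questions can be tried out on a tiny document. In XPath, //* selects every element at any depth and //@* every attribute, and the attribute form and the element-text form of a value are both readable from the same node. The sketch below uses a made-up document with no namespaces.

```powershell
# Hypothetical sample with one attribute value and one element-text value.
[xml]$doc = @'
<root>
  <someTag someID="AttrValue">TextValue</someTag>
  <group><inner code="X1">Deep</inner></group>
</root>
'@

# //* selects every element at any depth; //@* selects every attribute.
$elements   = $doc.SelectNodes('//*')  | ForEach-Object { $_.LocalName }
$attributes = $doc.SelectNodes('//@*') | ForEach-Object { '{0}={1}' -f $_.LocalName, $_.Value }

# Both value formats are reachable on the same node.
$node      = $doc.SelectSingleNode('//someTag')
$attrValue = $node.GetAttribute('someID')   # AttrValue
$textValue = $node.InnerText                # TextValue

$elements
$attributes
```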
186
General Software Discussion / Re: Extract REGEX matches from multiple text files
« Last post by kalos on August 06, 2018, 12:50 PM »
But I cannot make it work for my file. Any hint?

Yeah, as Ath suggested, your XML  contains CDATA so you have to read that separately.

https://stackoverflo...file-with-powershell

I will try but can you help me with the below:

Guys, the more I am looking on it, the more I am convinced that Regex would be the best solution.

Can anyone tell me please how to find a regex in a file and append it to a file? Also, how to loop that? Last, how to find the next regex match in the file?


187
General Software Discussion / Re: Extract REGEX matches from multiple text files
« Last post by kalos on August 06, 2018, 11:36 AM »
Guys, the more I am looking on it, the more I am convinced that Regex would be the best solution.

Can anyone tell me please how to find a regex in a file and append it to a file? Also, how to loop that? Last, how to find the next regex match in the file?

188
General Software Discussion / Re: Extract REGEX matches from multiple text files
« Last post by kalos on August 06, 2018, 10:55 AM »
By the way, this script looks amazing (From: https://www.codeproj...0/powershell-and-xml):

PS C:\> $xml = (Get-Content file.xml)
PS C:\> $xml = [xml](Get-Content file.xml)
PS C:\> $xml.SelectNodes("/employees/employee")

id                                      name                                    age
--                                      ----                                    ---
101                                     Frankie Johnny                          36
102                                     Elvis Presley                           79
301                                     Ella Fitzgerald                         102

But I cannot make it work for my file. Any hint?
189
General Software Discussion / Re: Extract REGEX matches from multiple text files
« Last post by kalos on August 06, 2018, 08:13 AM »
Yeah, I want to use RegEx to be honest. But I struggle to find a way to do it.

The first line of the data contains an ID. So I can store all these IDs in an array.
Then, for each entry in the array, I will be able to match some regex and output them.

The problem is that the data gets into so many deep tree branches that it gets hard to isolate them.

Mmmmm! Now I got an idea.
If I could convert the XML file in a flat structured file, where each line will display the attribute name and value (as it normally does in XML), but it will also display the attributes and values from all the above hierarchy!

That way, it will be much more manageable, because I will be able to isolate and process specific lines.

Any script that can do this?
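A rough sketch of such a flattening script: a recursive function that writes one line per value, prefixed with the full path of element names above it. Get-FlatXml is a hypothetical helper name, the sample document is made up, and this simplified version prints ancestor names only, not the ancestors' attribute values.

```powershell
[xml]$doc = @'
<CATALOG>
  <PLANT id="p1">
    <COMMON>Bloodroot</COMMON>
    <ZONE>4</ZONE>
  </PLANT>
</CATALOG>
'@

# Hypothetical helper: walks the tree and emits "path = value" lines.
function Get-FlatXml {
    param([System.Xml.XmlNode]$Node, [string]$Path = '')

    $here = "$Path/$($Node.LocalName)"
    # One line per attribute, prefixed with the full path of its element.
    foreach ($attr in $Node.Attributes) {
        "$here/@$($attr.LocalName) = $($attr.Value)"
    }
    foreach ($child in $Node.ChildNodes) {
        if ($child.NodeType -eq 'Element') {
            Get-FlatXml -Node $child -Path $here      # recurse into child elements
        } elseif ($child.NodeType -in 'Text', 'CDATA') {
            "$here = $($child.Value)"                 # text content of this element
        }
    }
}

$flat = Get-FlatXml -Node $doc.DocumentElement
$flat
```

Each output line is self-describing, so it can then be filtered line by line with -match or Select-String.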
190
General Software Discussion / Re: Extract REGEX matches from multiple text files
« Last post by kalos on August 06, 2018, 07:46 AM »
By the way, is there a way to do 'find next' in Powershell without having to find all matches and create an array? I imagine the latter is very RAM consuming.
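The .NET regex class does offer a 'find next': Match returns only the first hit, and NextMatch continues from there, so no array of all matches is built up front. A minimal sketch on a made-up string (the input text itself still has to be in memory):

```powershell
$text  = 'a1 b22 c333'
$regex = [regex]'\d+'

$m = $regex.Match($text)        # finds only the first match
$found = while ($m.Success) {
    $m.Value                    # do something with the current match
    $m = $m.NextMatch()         # continue searching after it
}

$found   # 1, 22, 333
```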
191
General Software Discussion / Re: Extract REGEX matches from multiple text files
« Last post by kalos on August 06, 2018, 07:01 AM »
Mmm, I see.

Well guys, the data is what I posted in my last post (Plants), these are three sample records and they keep repeating (with different values).

How do I parse this in the most easy way?
192
General Software Discussion / Re: Extract REGEX matches from multiple text files
« Last post by kalos on August 06, 2018, 03:26 AM »
The first lines are:

<?xml version="1.0" encoding="ISO8859-1" ?>
<CATALOG>
 <PLANT>
 <COMMON>Bloodroot</COMMON>
 <BOTANICAL>Sanguinaria canadensis</BOTANICAL>
 <ZONE>4</ZONE>
 <LIGHT>Mostly Shady</LIGHT>
 <PRICE>$2.44</PRICE>
 <AVAILABILITY>031599</AVAILABILITY>
 </PLANT>
 <PLANT>
 <COMMON>Columbine</COMMON>
 <BOTANICAL>Aquilegia canadensis</BOTANICAL>
 <ZONE>3</ZONE>
 <LIGHT>Mostly Shady</LIGHT>
 <PRICE>$9.37</PRICE>
 <AVAILABILITY>030699</AVAILABILITY>
 </PLANT>

Now this goes on and on and the last lines are:
<PLANT>
 <COMMON>Cardinal Flower</COMMON>
 <BOTANICAL>Lobelia cardinalis</BOTANICAL>
 <ZONE>2</ZONE>
 <LIGHT>Shade</LIGHT>
 <PRICE>$3.02</PRICE>
 <AVAILABILITY>022299</AVAILABILITY>
 </PLANT>
</CATALOG>

But I do not want to work it with Select-XML because it will limit my learning a lot. Instead I want to use REGEX so that I can learn something that can be applied to many other situations.
I believe I need to learn in PowerShell:
1) how to read file
2) how to search for a regex, store it in a variable then perform another regex search in that variable and return a part of the match or append it in an output file
3) how to search for the next instance of the regex and loop the above
4) all regexes must be multiline
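The four steps above can be sketched roughly as follows. A here-string with a trimmed copy of the plant data stands in for reading the file with Get-Content -Raw, and the output file name in the last comment is hypothetical.

```powershell
# Step 1: read the file as ONE string. The here-string stands in for
# something like: $text = Get-Content plants.xml -Raw
$text = @'
<CATALOG>
 <PLANT>
 <COMMON>Bloodroot</COMMON>
 <ZONE>4</ZONE>
 <PRICE>$2.44</PRICE>
 </PLANT>
 <PLANT>
 <COMMON>Columbine</COMMON>
 <ZONE>3</ZONE>
 <PRICE>$9.37</PRICE>
 </PLANT>
</CATALOG>
'@

# Steps 2 to 4: (?s) lets . cross line breaks; each outer match is one record.
$rows = foreach ($record in [regex]::Matches($text, '(?s)<PLANT>(.*?)</PLANT>')) {
    $body = $record.Groups[1].Value          # store the match in a variable
    # Search again inside that variable only.
    $name  = [regex]::Match($body, '<COMMON>(.*?)</COMMON>').Groups[1].Value
    $price = [regex]::Match($body, '<PRICE>(.*?)</PRICE>').Groups[1].Value
    "$name; $price"
}

# $rows | Add-Content out.txt   # uncomment to append to an output file
$rows
```

The foreach over [regex]::Matches is what moves on to the next instance of the outer regex.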
193
General Software Discussion / Re: Extract REGEX matches from multiple text files
« Last post by kalos on August 05, 2018, 03:54 PM »
Now we are getting somewhere, sort of.

You only didn't tell what other parts of the data you need extracted from each record, besides the ui_mode field, and what the identifying field is that should go in the first column of the csv output you suggested earlier.

We can extract the ui_mode and max_timing, and the first column would be the second text in "", i.e. for the first record iGTzEhwhMx4U
194
General Software Discussion / Re: Extract REGEX matches from multiple text files
« Last post by kalos on August 05, 2018, 03:48 PM »
Now we are getting somewhere, sort of.

You only didn't tell what other parts of the data you need extracted from each record, besides the ui_mode field, and what the identifying field is that should go in the first column of the csv output you suggested earlier.


Is your second line a question?
195
General Software Discussion / Re: Extract REGEX matches from multiple text files
« Last post by kalos on August 05, 2018, 01:55 PM »
You can check this as well:


,["t--ddbPTeIsNI","iGTzEhwhMx4U","r-iGTzEhwhMx4U",[["debug",null,null,null,null,[null,null,null,null,0]
]
,["ui_mode",1234125123l,[null,null,"inline"]
]
,["num_cols",14351435,[null,null,null,2.0]
]
,["max_timing",235123512,[null,null,null,2500.0]
]
,["check_parent_card",143512122,[null,null,null,null,1]
]
,["counterfactual_logging",213513212412,[null,null,null,null,0]
]
]
]
,["t--ddbPTeIsNI","iLS0pb0OlVDE","r-iLS0pb0OlVDE",[["debug",null,null,null,null,[null,null,null,null,0]
]
,["ui_mode",4311231235,[null,null,"inline"]
]
,["num_cols",12341241234,[null,null,null,2.0]
]
,["max_timing",23512351223,[null,null,null,2500.0]
]
,["check_parent_card",5235123412,[null,null,null,null,1]
]
,["counterfactual_logging",12351251212,[null,null,null,null,0]
]
]
]
,["t--ddbPTeIsNI","ibE7thiz85_Y","r-ibE7thiz85_Y",[["debug",null,null,null,null,[null,null,null,null,0]
]
,["ui_mode",124351235,[null,null,"inline"]
]
,["num_cols",623423451,[null,null,null,2.0]
]
,["max_timing",123512351,[null,null,null,2500.0]
]
,["check_parent_card",1235125123,[null,null,null,null,1]
]
,["counterfactual_logging",12351235145,[null,null,null,null,0]
]
]
]

Let's say I want to extract the numbers in the fields ui_mode etc. for each of these three separate records.
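A possible single-regex approach for data shaped like this, with one capture group per wanted field. The sketch uses a trimmed copy of two of the records and assumes every record contains ui_mode followed by max_timing; if a field were missing, the lazy .*? would run on into the next record.

```powershell
$text = @'
,["t--ddbPTeIsNI","iGTzEhwhMx4U","r-iGTzEhwhMx4U",[["debug",null,null,null,null,[null,null,null,null,0]
]
,["ui_mode",1234125123,[null,null,"inline"]
]
,["max_timing",235123512,[null,null,null,2500.0]
]
]
]
,["t--ddbPTeIsNI","iLS0pb0OlVDE","r-iLS0pb0OlVDE",[["debug",null,null,null,null,[null,null,null,null,0]
]
,["ui_mode",4311231235,[null,null,"inline"]
]
,["max_timing",23512351223,[null,null,null,2500.0]
]
]
]
'@

# One pattern per record: group 1 = second quoted text (the ID),
# group 2 = ui_mode number, group 3 = max_timing number.
$pattern = '(?s)\["t--[^"]*","([^"]+)".*?\["ui_mode",(\d+).*?\["max_timing",(\d+)'

$csv = foreach ($m in [regex]::Matches($text, $pattern)) {
    '{0};{1};{2}' -f $m.Groups[1].Value, $m.Groups[2].Value, $m.Groups[3].Value
}

$csv
```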
196
General Software Discussion / Re: Extract REGEX matches from multiple text files
« Last post by kalos on August 05, 2018, 01:44 PM »
Why is it useless? It's an exact representation, apart from the fact that there is more irrelevant text around it.
197
General Software Discussion / Re: Extract REGEX matches from multiple text files
« Last post by kalos on August 05, 2018, 11:09 AM »
The format of the data is like that (the only difference is that the data is multiline rather than single line as in this example):

prod1
blah
specs=a
blah
price=b
blah
prod2
blah
specs=c
blah
price=d
blah

So I want the output to be a csv like:
prod1; a; b
prod2; c; d

So I was thinking of first using a regex to highlight/save in a variable the first area of the text that belongs to a prod, which is the first six lines (I cannot use the number of lines to distinguish them as they vary).
Then it would extract a and b from that variable by matching the specs and price regex 'within' prod1 variable, so that I can distinguish them from prod2.
And then loop to complete the conversion.

Hope this helps?

So my understanding is that I cannot search for a regex that will match "specs=.+?" or something because I won't be able to distinguish this for prod1, prod2, etc.
At the same time, I cannot match the regex "prod1.+specs=.+?" because I don't know the exact text for prod1 (it's an xml attribute that is called prodID, but the value can be anything).

Do you have any idea on how to process this?
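One option that avoids a regex within a regex is to read line by line and remember which product the current lines belong to. In this sketch the prodID="..." line format is an assumption based on the description above, and the sample lines are made up; with a real file the lines would come from Get-Content.

```powershell
# Hypothetical sample; the number of lines per product varies on purpose.
$lines = 'prodID="prod1"', 'blah', 'specs=a', 'blah', 'price=b',
         'prodID="prod2"', 'specs=c', 'blah', 'blah', 'price=d'

$rows    = @()
$current = $null
foreach ($line in $lines) {
    if ($line -match 'prodID="([^"]+)"') {
        # A new product starts: write out the previous one, if any.
        if ($null -ne $current) {
            $rows += '{0}; {1}; {2}' -f $current.Id, $current.Specs, $current.Price
        }
        $current = @{ Id = $Matches[1]; Specs = ''; Price = '' }
    } elseif ($null -ne $current -and $line -match '^specs=(.+)$') {
        $current.Specs = $Matches[1]
    } elseif ($null -ne $current -and $line -match '^price=(.+)$') {
        $current.Price = $Matches[1]
    }
}
# Write out the last product.
if ($null -ne $current) {
    $rows += '{0}; {1}; {2}' -f $current.Id, $current.Specs, $current.Price
}

$rows   # prod1; a; b   and   prod2; c; d
```

Because only one line and one product are held at a time, the same idea also suits very large files.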
198
General Software Discussion / Re: Extract REGEX matches from multiple text files
« Last post by kalos on August 05, 2018, 11:03 AM »
If you are searching for a regex within a regex, 'You Are Doing It Wrong' (T).

Your initial requirement was to find and extract content using a regex, but now you need parts of that regex to be split out? That can be done using a single regex, grouping the stuff you need to split out.
And for this whole exercise to make any sense, where is the variable part of the data to find? When searching for explicit text(s), a count would suffice...
Please provide a complete example, with actual data (not an entire file!), clearly marking the stuff you need to extract, of what you want to achieve, not how you think it could/should be solved.

Indeed, I now realised it!
I will try to provide an example in a bit.
199
General Software Discussion / Re: Extract REGEX matches from multiple text files
« Last post by kalos on August 05, 2018, 05:46 AM »
$items - an arbitrarily named variable
=        - the assignment operator (stores the result of the command in $items)
Get-ChildItem

Thus $items now equals an array of files in the current folder that match *.txt
$items[0] = firstfile.txt
$items[1] = secondfile.txt
etc
etc
etc

$items.Count  - total number of matching files found

for(){}   - a for loop, $i is a variable that gets incremented by 1 every loop until the total number of matching files is reached

Thus loop through all the files in the array performing the following on every file:

Select-String -Path $items[$i] -Pattern $regex -AllMatches

Search each file for matching RegEx pattern, get all matches.

| % { $_.Matches } | % { $_.Value } >> $outfile

RegEx matches are piped into a ForEach loop (shorthand notation). For each regex match, pipe its value to the output file in append mode.
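Put together, the walkthrough above corresponds to something like the sketch below. It is a reconstruction rather than the original script: the temp folder, sample files, pattern and output name are all made up so that it can run anywhere.

```powershell
# Make two small sample files in a fresh temp folder (hypothetical data).
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ('sls-demo-' + [guid]::NewGuid())
New-Item -ItemType Directory -Path $dir | Out-Null
Set-Content (Join-Path $dir 'first.txt')  'id="A1" and id="B2"'
Set-Content (Join-Path $dir 'second.txt') 'nothing here', 'id="C3"'

$regex   = 'id="\w+"'
$outfile = Join-Path $dir 'out.log'
$items   = Get-ChildItem -Path $dir -Filter *.txt   # array of matching files

# Loop over the files by index, as described above.
for ($i = 0; $i -lt $items.Count; $i++) {
    Select-String -Path $items[$i].FullName -Pattern $regex -AllMatches |
        ForEach-Object { $_.Matches } |              # every match on every matching line
        ForEach-Object { $_.Value } >> $outfile      # append the matched text
}

$result = Get-Content $outfile
Remove-Item $dir -Recurse
$result
```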

Don't actually need to escape the " in the RegEx either:
Code: PowerShell
$regex = '<dsf:tsdfgd trsdfge="urn:x-ssdfgs-dfg-com:isdfgc/tg4r3e-i4d" id="OsdfgsdfD">'
Will also work.

Same as the 6 lines above without assigned variables or a for loop:
Code: PowerShell
gci *.txt | % { sls $_.Name -Pattern '<dsf:tsdfgd trsdfge="urn:x-ssdfgs-dfg-com:isdfgc/tg4r3e-i4d" id="OsdfgsdfD">' -a | % { $_.Matches } | % { $_.Value } >> K:\out.txt }


That is very helpful thanks!

From what I have understood, the script will first scan its own folder where it exists, for all the txt files present and process them one by one in an array. Actually I think I can skip that bit if it can process the whole 25GB txt file at once.

As for the actual regex matches, what I would actually like it to do is to:
- scan the source file for a regex(A)
- finding the first instance of regex(A), it would store it in a variable and search another regex(B) inside that variable.
- then I have a couple more regex matches that I need it to store in that variable and output specific things from these regex matches inside the initial regex(A). By output I mean write sequentially, line by line, to an output file.
- then the loop will continue with the next regex(A) match inside the source file, and store it in a variable, and search for the same regex(B) etc matches inside that variable and output parts of those regex matches in the output file.

Sounds very basic and simple. Can you tell me what commands I need to write something like that please?
200
General Software Discussion / Re: Extract REGEX matches from multiple text files
« Last post by kalos on August 04, 2018, 04:59 PM »