
176

General Software Discussion / Re: Extract REGEX matches from multiple text files

« on: August 10, 2018, 05:32 AM »

Any idea why this does not work?
Get-Content *.xml | Out-String
-kalos (August 09, 2018, 10:53 AM)

I have a better idea, you tell us why you think it doesn't work.

gc *.xml -match *regex*
does not work

Code: PowerShell
Get-Help Get-Content

You tell us why it doesn't work.
-4wd (August 09, 2018, 07:59 PM)

Mmm I don't know why Get-Content *.xml | Out-String does not work to be honest.
I read at https://ss64.com/ps/out-string.html:
Send the content of Test1.txt to the console as a single string:
PS C:\> get-content C:\docs\test1.txt | out-string
So, shouldn't it work?

As for gc *.xml -match *regex* it seems that -match does not go after gc, but how do I connect/pipe these?
I had no clue, but I found in an irrelevant place on the internet this:
(Get-Content .\input.txt) -join "`r`n"
How should I know that I need to parenthesise the first object? Where does it say that in PS manual?
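
On the parentheses question: in `gc *.xml -match *regex*`, PowerShell parses everything after the command name as arguments to Get-Content, so `-match` is read as a (non-existent) parameter rather than as an operator. Wrapping the command in parentheses makes it run first, and the operator is then applied to its output; the about_Parsing and about_Operators help topics describe this. The following is a minimal sketch of the three usual ways to connect the two; the sample file name and its contents are made up for illustration.

```powershell
# Hypothetical sample file so the example is self-contained.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'gr_sample_input.txt'
Set-Content -Path $path -Value 'alpha 1', 'beta', 'gamma 22'

# Parentheses run Get-Content first; -match then filters the resulting array of lines.
$matchingLines = (Get-Content $path) -match '\d+'

# -Raw returns the whole file as ONE string, so -match returns $true/$false
# and fills the automatic $Matches variable with the captured groups.
$whole  = Get-Content $path -Raw
$found  = $whole -match 'gamma (\d+)'
$number = $Matches[1]

# Pipeline form: Where-Object tests each line ($_) as it streams past.
$piped = Get-Content $path | Where-Object { $_ -match '^beta$' }

Remove-Item $path
```

The first form keeps every matching line, the second tests the file as a single string, and the third avoids holding all lines in a variable.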

177

General Software Discussion / REST API

« on: August 09, 2018, 02:43 PM »

Hello!

At work, they are having difficulty sending a curl command to a server via a REST API in order to download some data.

Having used wget before, I thought this curl command would be easy, and now that I have said so, everyone expects me to do it.

Is there any tutorial that can enable me to do that? I have no clue.

Thanks!

178

General Software Discussion / Re: Extract REGEX matches from multiple text files

« on: August 09, 2018, 10:53 AM »

That's good info, so I will have to have a good read on these to be able to understand PS.

I understand better by studying examples, so I would appreciate your help with this.

PS: This is not related to the initial data file I wanted to process.

Any idea why this does not work?
Get-Content *.xml | Out-String

I want then to append to a file all the matches of a regex1 or regex2. Any idea?

Also, any idea on how to extract specific values from xml nodes?
I type select-xml -path *.xml -xpath "/html:html/html:products/html:product/html:referenceData/html:pct/html:productId" and it doesn't work

Something else, I want to parse the text of a file and use -match on it, but I cannot figure out how to do it, it's so embarrassing!
gc *.xml -match *regex*
does not work
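
On the Select-Xml question: an XPath that uses a prefix such as `html:` only resolves if the prefix is mapped to a namespace URI through the `-Namespace` parameter. Below is a minimal sketch under assumptions: the document and the namespace URI are invented for illustration, and the URI has to be replaced with whatever `xmlns` value the real files declare.

```powershell
# Hypothetical document; the namespace URI is an assumption and must match the real file.
$doc = @'
<html xmlns="http://www.w3.org/1999/xhtml">
  <products>
    <product><referenceData><pct><productId>A-100</productId></pct></referenceData></product>
    <product><referenceData><pct><productId>B-200</productId></pct></referenceData></product>
  </products>
</html>
'@

# Map the prefix used in the XPath to the namespace URI.
$ns    = @{ html = 'http://www.w3.org/1999/xhtml' }
$xpath = '/html:html/html:products/html:product/html:referenceData/html:pct/html:productId'

# Each result wraps the matched node; InnerText is the text inside the element.
$ids = Select-Xml -Content $doc -XPath $xpath -Namespace $ns |
    ForEach-Object { $_.Node.InnerText }
```

For files on disk, `-Content $doc` would be swapped for `-Path *.xml`.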

179

General Software Discussion / Re: Extract REGEX matches from multiple text files

« on: August 08, 2018, 08:59 AM »

gci *.txt | % { sls $_.Name -Pattern '^.*"ui_mode",(\d+).*$' -a | % { $_.Matches } | % { $_.Groups[1].Value } >> K:\out.txt }

I still struggle very much with this and Google does not help. The main source of confusion I believe is the fact that I don't know if a term in the command is a random variable name or if it is a specific variable which is part of PS core. Also, another thing is that I may be able to Google "what does % mean in powershell" to find out what a specific symbol means but if they are part of another word like $_.Matches, it becomes confusing.

So the gci command will grab all the files that match *.txt in the directory.
The pipe means that we run sequentially another command.
The sls command means that it matches the regex inside the input $_.Name. What is that? I read "The $_ variable holds a reference to the current item being processed." But I don't understand what that means. Any idea?
Then, we move on to the command % which is basically a for-each loop. So for each of the Matches found with the previous command, we run another for each loop, the Groups[1].Value. I don't know what that is either. Any idea?
Finally we append to out.txt.

Are the terms Name, Matches, Groups, Value something standard in PowerShell, or are they arbitrary variable names?
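
For reference, these names are property names of the objects flowing through the pipeline, not arbitrary variables: `Name` belongs to the file object that gci emits, `Matches` to the MatchInfo object that Select-String emits, and `Groups` and `Value` to the .NET regex Match objects inside it. `$_` is simply "the object currently coming down the pipe". A small sketch that makes this visible, using a made-up temporary file:

```powershell
# Hypothetical one-record sample file.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'gr_ui_mode_sample.txt'
Set-Content -Path $path -Value ',["ui_mode",4311231235,[null,null,"inline"]', ',["num_cols",12341241234,[null,null,null,2.0]'

$file = Get-Item $path      # a FileInfo object, like each item gci sends down the pipe
$info = Select-String -Path $file.FullName -Pattern '^.*"ui_mode",(\d+).*$' -AllMatches

$infoType  = $info.GetType().Name      # type of what Select-String emits
$match     = $info.Matches[0]          # first regex match on that line
$matchType = $match.GetType().Name
$whole     = $match.Value              # the full matched text
$digits    = $match.Groups[1].Value    # text captured by the first ( ... ) group

Remove-Item $path
```

Piping any object to `Get-Member` (for example `$info | Get-Member`) lists the properties it offers, which is one way to tell built-in names from invented ones.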

180

General Software Discussion / Re: Extract REGEX matches from multiple text files

« on: August 07, 2018, 09:01 AM »

I want answers to specific questions, not a complete solution. It is not possible for anyone external to offer a complete solution because the source data cannot be shared.
Also, most of my questions are for my own understanding and may not directly relate to the specific problem.

181

General Software Discussion / Re: Extract REGEX matches from multiple text files

« on: August 07, 2018, 04:16 AM »

Last, how to find the next regex match in the file?
Add another Select-String line with the next RegEx.
-4wd (August 06, 2018, 04:41 PM)

No, I don't mean a different regex. I mean the same regex. The same regex may have multiple matches in one file. How do I make the script find the first instance, do stuff, then find the second instance, do stuff, etc.?

Also, how do I make . to include newline?

Also, what is | % { $_.Matches } | % { $_.Value } >> $outfile exactly?
I don't know what % and { $_. and Value are?

Also, how do I return a specific part from regex? In normal regex text editors, you put the part in parentheses and then you replace them with \1 etc. How do I do it in Powershell?

Also, I should be able to figure this out myself, but I am looking for neat code and I can only come up with messy stuff: is there a script to delete lines not containing specific literal phrases? E.g. not containing 'lue } >> $ou', without having to go through each character to check whether it needs escaping.
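
A minimal sketch touching three of the questions above, with invented sample strings: the inline option `(?s)` lets `.` match newlines, parenthesised parts are read back through `Groups[n]` (or `$1`, `$2` in a `-replace`), and `Select-String -SimpleMatch` treats the search text literally so nothing needs escaping.

```powershell
# Two multi-line blocks in one string (`n is a newline).
$text = "start`nfoo=1`nbar`nend`nstart`nfoo=2`nend"

# (?s) makes . match newlines too; (.*?) is capture group 1.
$blocks = [regex]::Matches($text, '(?s)start(.*?)end') |
    ForEach-Object { $_.Groups[1].Value.Trim() }

# In a -replace, groups are referenced as $1, $2 ...
# Single quotes stop PowerShell from expanding them as variables.
$swapped = 'foo=1' -replace '(\w+)=(\d+)', '$2=$1'

# Keep only lines containing a literal phrase; -SimpleMatch disables regex interpretation.
$lines = 'a | % { $_.Value } >> $outfile', 'unrelated line', 'Value } >> $ou again'
$kept  = $lines | Select-String -SimpleMatch 'lue } >> $ou' | ForEach-Object { $_.Line }
```

`[regex]::Matches` returns every match of the same regex in order, so the ForEach-Object body is the place to "do stuff" per instance.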

182

General Software Discussion / Re: Extract REGEX matches from multiple text files

« on: August 06, 2018, 03:17 PM »

Stream EDitor
-Ath (August 06, 2018, 02:45 PM)

Very interesting tool, thanks!

183

General Software Discussion / Re: Extract REGEX matches from multiple text files

« on: August 06, 2018, 02:27 PM »

Guys, the more I look at it, the more I am convinced that Regex would be the best solution.
-kalos (August 06, 2018, 11:36 AM)
Please listen to people with more (programming) experience than you have, you are really trying to hammer round screws into square holes here, don't do that, you'll hurt yourself.
-Ath (August 06, 2018, 02:14 PM)

OK but I would be highly interested to learn how to do the below?
Can anyone tell me please how to find a regex in a file and append it to a file? Also, how to loop that? Last, how to find the next regex match in the file?

184

General Software Discussion / Re: Extract REGEX matches from multiple text files

« on: August 06, 2018, 02:25 PM »

I don't understand what CDATA is.

My xml file contains tons of tags, ie text inside <>, in a complex hierarchy.
Apart from that, it contains values both inside the <>, in the format <someTag someID="SomeValue">, and between tags, in the format <someTag>SomeValue</someTag>

1) I don't know what the total number and hierarchy of tags is. So can I select ALL nodes under the whole hierarchy?
2) Will PS process both formats of values above?
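
Both points can be tried on a small document. This is a sketch under assumptions (the XML below is invented and contains no namespaces or CDATA): the XPath `//*` selects every element at any depth, and each element exposes its attributes as well as its text.

```powershell
# Hypothetical document showing both value formats.
$doc = [xml]'<root><someTag someID="SomeValue"><child>Inner</child></someTag><someTag>SomeValue</someTag></root>'

$rows = foreach ($el in $doc.SelectNodes('//*')) {
    # Values stored as attributes: <someTag someID="SomeValue">
    foreach ($attr in $el.Attributes) {
        '{0}/@{1} = {2}' -f $el.Name, $attr.Name, $attr.Value
    }
    # Values stored as element text: <someTag>SomeValue</someTag>
    if ($el.ChildNodes.Count -eq 1 -and $el.FirstChild.NodeType -eq [System.Xml.XmlNodeType]::Text) {
        '{0} = {1}' -f $el.Name, $el.InnerText
    }
}
```

One caveat: PowerShell also exposes child elements and attributes as properties, so an element that has a child or attribute literally called `Name` would shadow `.Name`; the sample avoids that case.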

185

General Software Discussion / Re: Extract REGEX matches from multiple text files

« on: August 06, 2018, 12:50 PM »

But I cannot make it work for my file. Any hint?
-kalos (August 06, 2018, 10:55 AM)

Yeah, as Ath suggested, your XML contains CDATA so you have to read that separately.

https://stackoverflo...file-with-powershell
-4wd (August 06, 2018, 11:37 AM)

I will try but can you help me with the below:

Guys, the more I look at it, the more I am convinced that Regex would be the best solution.

Can anyone tell me please how to find a regex in a file and append it to a file? Also, how to loop that? Last, how to find the next regex match in the file?

-kalos (August 06, 2018, 11:36 AM)

186

General Software Discussion / Re: Extract REGEX matches from multiple text files

« on: August 06, 2018, 11:36 AM »

Guys, the more I look at it, the more I am convinced that Regex would be the best solution.

Can anyone tell me please how to find a regex in a file and append it to a file? Also, how to loop that? Last, how to find the next regex match in the file?

187

General Software Discussion / Re: Extract REGEX matches from multiple text files

« on: August 06, 2018, 10:55 AM »

By the way, this script looks amazing (From: https://www.codeproj...0/powershell-and-xml):

PS C:\> $xml = (Get-Content file.xml)
PS C:\> $xml = [xml](Get-Content file.xml)
PS C:\> $xml.SelectNodes("/employees/employee")

id name age
-- ---- ---
101 Frankie Johnny 36
102 Elvis Presley 79
301 Ella Fitzgerald 102

But I cannot make it work for my file. Any hint?

188

General Software Discussion / Re: Extract REGEX matches from multiple text files

« on: August 06, 2018, 08:13 AM »

Yeah, I want to use RegEx to be honest. But I struggle to find a way to do it.

The first line of the data contains an ID. So I can store all these IDs in an array.
Then, for each entry in the array, I will be able to match some regex and output them.

The problem is that the data gets into so many deep tree branches that it gets hard to isolate them.

Mmmmm! Now I got an idea.
What if I could convert the XML file into a flat, structured file, where each line displays the attribute name and value (as XML normally does) but also the attributes and values from the whole hierarchy above it?

That way, it will be much more manageable, because I will be able to isolate and process specific lines.

Any script that can do this?

189

General Software Discussion / Re: Extract REGEX matches from multiple text files

« on: August 06, 2018, 07:46 AM »

By the way, is there a way to do 'find next' in Powershell without having to find all matches and create an array? I imagine the latter is very RAM consuming.
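
For reference, the .NET regex classes that PowerShell uses support exactly this: `Match` returns only the first hit and `NextMatch()` continues from where the previous one ended, so no array of all matches is built. A minimal sketch with an invented input string:

```powershell
$text = 'id=10; id=20; id=30'
$rx   = [regex]'id=(\d+)'

$seen = @()
$m = $rx.Match($text)                # first match only
while ($m.Success) {
    $seen += $m.Groups[1].Value      # "do stuff" with the current match
    $m = $m.NextMatch()              # resume searching after the current match
}
```

Note that the searched text itself still has to be in memory here; for a very large file, reading it line by line (for example through a `Get-Content` pipeline) is probably the part that matters more for RAM.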

190

General Software Discussion / Re: Extract REGEX matches from multiple text files

« on: August 06, 2018, 07:01 AM »

Mmm, I see.

Well guys, the data is what I posted in my last post (Plants), these are three sample records and they keep repeating (with different values).

How do I parse this in the easiest way?

191

General Software Discussion / Re: Extract REGEX matches from multiple text files

« on: August 06, 2018, 03:26 AM »

The first lines are:

<?xml version="1.0" encoding="ISO8859-1" ?>
<CATALOG>
<PLANT>
<COMMON>Bloodroot</COMMON>
<BOTANICAL>Sanguinaria canadensis</BOTANICAL>
<ZONE>4</ZONE>
<LIGHT>Mostly Shady</LIGHT>
<PRICE>$2.44</PRICE>
<AVAILABILITY>031599</AVAILABILITY>
</PLANT>
<PLANT>
<COMMON>Columbine</COMMON>
<BOTANICAL>Aquilegia canadensis</BOTANICAL>
<ZONE>3</ZONE>
<LIGHT>Mostly Shady</LIGHT>
<PRICE>$9.37</PRICE>
<AVAILABILITY>030699</AVAILABILITY>
</PLANT>

Now this goes on and on and the last lines are:
<PLANT>
<COMMON>Cardinal Flower</COMMON>
<BOTANICAL>Lobelia cardinalis</BOTANICAL>
<ZONE>2</ZONE>
<LIGHT>Shade</LIGHT>
<PRICE>$3.02</PRICE>
<AVAILABILITY>022299</AVAILABILITY>
</PLANT>
</CATALOG>

But I do not want to work it with Select-XML because it will limit my learning a lot. Instead I want to use REGEX so that I can learn something that can be applied to many other situations.
I believe I need to learn in PowerShell:
1) how to read file
2) how to search for a regex, store it in a variable then perform another regex search in that variable and return a part of the match or append it in an output file
3) how to search for the next instance of the regex and loop the above
4) all regexes must be multiline
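
The four steps listed above can be sketched as follows. This is an illustration under assumptions, not a robust XML parser: it uses a shortened copy of the sample data held in a here-string, an outer regex that isolates each PLANT block, and inner regexes run against that block only.

```powershell
# Shortened sample; with a real file this would be: $xml = Get-Content file.xml -Raw
$xml = @'
<CATALOG>
<PLANT>
<COMMON>Bloodroot</COMMON>
<BOTANICAL>Sanguinaria canadensis</BOTANICAL>
<PRICE>$2.44</PRICE>
</PLANT>
<PLANT>
<COMMON>Columbine</COMMON>
<BOTANICAL>Aquilegia canadensis</BOTANICAL>
<PRICE>$9.37</PRICE>
</PLANT>
</CATALOG>
'@

# Outer regex: (?s) lets . span newlines, .*? stops at the nearest </PLANT>.
$rows = foreach ($plant in [regex]::Matches($xml, '(?s)<PLANT>(.*?)</PLANT>')) {
    $block  = $plant.Groups[1].Value     # text of one record only
    # Inner regexes search inside that record, so values cannot leak between plants.
    $common = [regex]::Match($block, '<COMMON>(.*?)</COMMON>').Groups[1].Value
    $price  = [regex]::Match($block, '<PRICE>(.*?)</PRICE>').Groups[1].Value
    "$common; $price"
}
```

Appending the result to a file would then be `$rows | Add-Content out.txt` (the output file name is a placeholder).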

192

General Software Discussion / Re: Extract REGEX matches from multiple text files

« on: August 05, 2018, 03:54 PM »

Now we are getting somewhere, sort of.

You only didn't tell what other parts of the data you need extracted from each record, besides the ui_mode field, and what the identifying field is that should go in the first column of the csv output you suggested earlier.
-Ath (August 05, 2018, 03:24 PM)

We can extract the ui_mode and max_timing, and the first column would be the second text in "", i.e. for the first record iGTzEhwhMx4U

193

General Software Discussion / Re: Extract REGEX matches from multiple text files

« on: August 05, 2018, 03:48 PM »

Now we are getting somewhere, sort of.

You only didn't tell what other parts of the data you need extracted from each record, besides the ui_mode field, and what the identifying field is that should go in the first column of the csv output you suggested earlier.

-Ath (August 05, 2018, 03:24 PM)

Is your second line a question?

194

General Software Discussion / Re: Extract REGEX matches from multiple text files

« on: August 05, 2018, 01:55 PM »

You can check this as well:

,["t--ddbPTeIsNI","iGTzEhwhMx4U","r-iGTzEhwhMx4U",[["debug",null,null,null,null,[null,null,null,null,0]
]
,["ui_mode",1234125123l,[null,null,"inline"]
]
,["num_cols",14351435,[null,null,null,2.0]
]
,["max_timing",235123512,[null,null,null,2500.0]
]
,["check_parent_card",143512122,[null,null,null,null,1]
]
,["counterfactual_logging",213513212412,[null,null,null,null,0]
]
]
]
,["t--ddbPTeIsNI","iLS0pb0OlVDE","r-iLS0pb0OlVDE",[["debug",null,null,null,null,[null,null,null,null,0]
]
,["ui_mode",4311231235,[null,null,"inline"]
]
,["num_cols",12341241234,[null,null,null,2.0]
]
,["max_timing",23512351223,[null,null,null,2500.0]
]
,["check_parent_card",5235123412,[null,null,null,null,1]
]
,["counterfactual_logging",12351251212,[null,null,null,null,0]
]
]
]
,["t--ddbPTeIsNI","ibE7thiz85_Y","r-ibE7thiz85_Y",[["debug",null,null,null,null,[null,null,null,null,0]
]
,["ui_mode",124351235,[null,null,"inline"]
]
,["num_cols",623423451,[null,null,null,2.0]
]
,["max_timing",123512351,[null,null,null,2500.0]
]
,["check_parent_card",1235125123,[null,null,null,null,1]
]
,["counterfactual_logging",12351235145,[null,null,null,null,0]
]
]
]

Let's say I want to extract the numbers in the fields ui_mode etc. for each of these three separate records.
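
A minimal sketch of one way to do this with a single regex and three capture groups, run on a shortened copy of two of the records above. It assumes every record starts with `["t--`, that the second quoted string is the identifier, and that ui_mode always appears before max_timing; if any of that differs in the real data, the pattern needs adjusting.

```powershell
# Shortened copy of two sample records (num_cols and other fields omitted).
$data = @'
,["t--ddbPTeIsNI","iLS0pb0OlVDE","r-iLS0pb0OlVDE",[["debug",null,null,null,null,[null,null,null,null,0]
]
,["ui_mode",4311231235,[null,null,"inline"]
]
,["max_timing",23512351223,[null,null,null,2500.0]
]
]
]
,["t--ddbPTeIsNI","ibE7thiz85_Y","r-ibE7thiz85_Y",[["debug",null,null,null,null,[null,null,null,null,0]
]
,["ui_mode",124351235,[null,null,"inline"]
]
,["max_timing",123512351,[null,null,null,2500.0]
]
]
]
'@

# Group 1: record id, group 2: ui_mode number, group 3: max_timing number.
$rx = '(?s)\["t--[^"]*","([^"]+)".*?\["ui_mode",(\d+).*?\["max_timing",(\d+)'

$csv = foreach ($m in [regex]::Matches($data, $rx)) {
    '{0};{1};{2}' -f $m.Groups[1].Value, $m.Groups[2].Value, $m.Groups[3].Value
}
```

Each element of `$csv` is one output line, with the identifier in the first column.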

195

General Software Discussion / Re: Extract REGEX matches from multiple text files

« on: August 05, 2018, 01:44 PM »

Why is it useless? It's an exact representation, apart from the fact that there is more irrelevant text around it.

196

General Software Discussion / Re: Extract REGEX matches from multiple text files

« on: August 05, 2018, 11:09 AM »

The format of the data is like that (the only difference is that the data is multiline rather than single line as in this example):

prod1
blah
specs=a
blah
price=b
blah
prod2
blah
specs=c
blah
price=d
blah

So I want the output to be a csv like:
prod1; a; b
prod2; c; d

So I was thinking of first using a regex to highlight/save in a variable the first area of the text that belongs to a prod, which is the first six lines (I cannot use the number of lines to distinguish them as they vary).
Then it would extract a and b from that variable by matching the specs and price regex 'within' prod1 variable, so that I can distinguish them from prod2.
And then loop to complete the conversion.

Hope this helps?

So my understanding is that I cannot search for a regex that will match "specs=.+?" or something because I won't be able to distinguish this for prod1, prod2, etc.
At the same time, I cannot match the regex "prod1.+specs=.+?" because I don't know the exact text for prod1 (it's an xml attribute that is called prodID, but the value can be anything).

Do you have any idea on how to process this?
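
One possible approach that avoids a regex within a regex is to read the lines in order and remember which product is currently open. The sketch below works on the simplified example; the pattern `^prod\w+$` for spotting a product line is an assumption and would have to be replaced by whatever marks the prodID in the real data.

```powershell
# The simplified sample, one array element per line.
$lines = 'prod1','blah','specs=a','blah','price=b','blah',
         'prod2','blah','specs=c','blah','price=d','blah'

$rows    = @()
$current = $null
foreach ($line in $lines) {
    if ($line -match '^prod\w+$') {
        # A new product starts: emit the previous one, if any.
        if ($null -ne $current) { $rows += '{0}; {1}; {2}' -f $current.Id, $current.Specs, $current.Price }
        $current = @{ Id = $line; Specs = ''; Price = '' }
    }
    elseif ($null -ne $current -and $line -match '^specs=(.*)$') { $current.Specs = $Matches[1] }
    elseif ($null -ne $current -and $line -match '^price=(.*)$') { $current.Price = $Matches[1] }
}
# Emit the last product, which has no following product line to trigger it.
if ($null -ne $current) { $rows += '{0}; {1}; {2}' -f $current.Id, $current.Specs, $current.Price }
```

Because only one record is held at a time, the same loop body should also work inside a `Get-Content file | ForEach-Object { ... }` pipeline for a large file.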

197

General Software Discussion / Re: Extract REGEX matches from multiple text files

« on: August 05, 2018, 11:03 AM »

If you are searching for a regex within a regex, 'You Are Doing It Wrong' (T).

Your initial requirement was to find and extract content using a regex, but now you need parts of that regex to be split out? That can be done using a single regex, grouping the stuff you need to split out.
And for this whole exercise to make any sense, where is the variable part of the data to find? When searching for explicit text(s), a count would suffice...
Please provide a complete example, with actual data (not an entire file!), clearly marking the stuff you need to extract, of what you want to achieve, not how you think it could/should be solved.
-Ath (August 05, 2018, 06:20 AM)

Indeed, I now realised it!
I will try to provide an example in a bit.

198

General Software Discussion / Re: Extract REGEX matches from multiple text files

« on: August 05, 2018, 05:46 AM »

$items - an arbitrarily named variable
= - the assignment operator (stores the result of the command on the right in $items)
Get-ChildItem

Thus $items now equals an array of files in the current folder that match *.txt
$items[0] = firstfile.txt
$items[1] = secondfile.txt
etc
etc
etc

$items.Count - total number of matching files found

for(){} - a for loop, $i is a variable that gets incremented by 1 every loop until the total number of matching files is reached

Thus loop through all the files in the array performing the following on every file:

Select-String -Path $items[$i] -Pattern $regex -AllMatches

Search each file for matching RegEx pattern, get all matches.

| % { $_.Matches } | % { $_.Value } >> $outfile

RegEx matches are piped into a ForEach loop (shorthand notation). For each regex match, pipe its value to the output file in append mode.

Don't actually need to escape the " in the RegEx either:
Code: PowerShell
$regex = '<dsf:tsdfgd trsdfge="urn:x-ssdfgs-dfg-com:isdfgc/tg4r3e-i4d" id="OsdfgsdfD">'
Will also work.

Same as the 6 lines above without assigned variables or a for loop:
Code: PowerShell
gci *.txt | % { sls $_.Name -Pattern '<dsf:tsdfgd trsdfge="urn:x-ssdfgs-dfg-com:isdfgc/tg4r3e-i4d" id="OsdfgsdfD">' -a | % { $_.Matches } | % { $_.Value } >> K:\out.txt }
-4wd (August 03, 2018, 10:11 PM)

That is very helpful thanks!

From what I have understood, the script will first scan its own folder where it exists, for all the txt files present and process them one by one in an array. Actually I think I can skip that bit if it can process the whole 25GB txt file at once.

As for the actual regex matches, what I would actually like it to do is to:
- scan the source file for a regex(A)
- finding the first instance of regex(A), it would store it in a variable and search another regex(B) inside that variable.
- then I have a couple more regex matches that I need it to store in that variable and output specific things from these regex matches inside the initial regex(A). By output I mean write sequentially, line by line, to an output file.
- then the loop will continue with the next regex(A) match inside the source file, and store it in a variable, and search for the same regex(B) etc matches inside that variable and output parts of those regex matches in the output file.

Sounds very basic and simple. Can you tell me what commands I need to write something like that please?

199

General Software Discussion / Re: Extract REGEX matches from multiple text files

« on: August 04, 2018, 04:59 PM »

In the meantime I will read https://www.itprotod...course-you-can-learn

200

General Software Discussion / Re: Extract REGEX matches from multiple text files

« on: August 04, 2018, 03:46 PM »

Thanks, but I struggle to follow. I find AHK much more straightforward. But how can I make it work with a 25GB file?

Messages - kalos