(what are the $i = 0; $i -lt $items.Count; $i++-kalos (August 03, 2018, 07:04 AM)
Also, I need to append to the output file several regex matches/returns, how do I do that?
Also, if I specify a regex match, how do I specify what I want to be returned from this match?-kalos (August 03, 2018, 05:15 PM)
Why did you leave these rather important 'details' out in your original question?-Ath (August 04, 2018, 07:03 AM)
Um .... because that's what he normally does ... it always takes at least a week, (sometimes longer or never), before all pertinent information (https://d.cxcore.net/Eric%20S%20Raymond/How%20To%20Ask%20Questions%20The%20Smart%20Way.pdf) is obtained ...-4wd (August 04, 2018, 07:37 AM)
In the meantime I will read https://www.itprotoday.com/management-mobility/dons-188-minute-powershell-crash-course-you-can-learn
-kalos (August 04, 2018, 04:59 PM)
I expect you comprehend what's written there, it seems quite suited for powershell n00bs.
Thanks, but I struggle to follow. I find AHK much more straightforward. But how can I make it work with a 25GB file?-kalos (August 04, 2018, 03:46 PM)
Please stop asking for an AHK solution; those who have participated here so far aren't going to provide it, as a perfect solution has already been provided.
I know, I know, I'm just trying to educate someone (again, but it doesn't seem to be picked up much), see my quote below...
Why did you leave these rather important 'details' out in your original question?-Ath (August 04, 2018, 07:03 AM)
$items - an arbitrarily named variable
= - the assignment operator: the variable on the left receives the result of the expression on the right
Get-ChildItem (https://docs.microsoft.com/en-us/powershell/module/microsoft.powershell.management/get-childitem?view=powershell-6)
Thus $items now equals an array of files in the current folder that match *.txt
$items[0] = firstfile.txt
$items[1] = secondfile.txt
etc
$items.Count - total number of matching files found
for(){} - a for (https://ss64.com/ps/for.html) loop, $i is a variable that gets incremented by 1 every loop until the total number of matching files is reached
Thus loop through all the files in the array performing the following on every file:
Select-String -Path $items[$i] -Pattern $regex -AllMatches
Search each file for matching RegEx pattern, get all matches.
| % { $_.Matches } | % { $_.Value } >> $outfile
RegEx matches are piped into a ForEach (https://ss64.com/ps/foreach.html) loop (shorthand notation). For each regex match, pipe its value to the output file in append mode.
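The walkthrough above can be put together as a minimal sketch. The original six lines are not reproduced in this part of the thread, so the pattern, the sample file contents and the temporary folder below are assumptions made up for illustration; only the loop and the pipeline follow the description.

```powershell
# Hypothetical sandbox: a temp folder with two small .txt files (not from the thread).
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ("regex-demo-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $dir | Out-Null
Set-Content -Path (Join-Path $dir 'firstfile.txt')  -Value 'id=11 id=22'
Set-Content -Path (Join-Path $dir 'secondfile.txt') -Value 'id=33'

$regex   = 'id=\d+'                               # assumed pattern
$outfile = Join-Path $dir 'matches.out'           # not *.txt, so it is never picked up as input
$items   = @(Get-ChildItem -Path $dir -Filter *.txt)

# Loop over every matching file and append each regex match to the output file.
for ($i = 0; $i -lt $items.Count; $i++) {
    Select-String -Path $items[$i].FullName -Pattern $regex -AllMatches |
        ForEach-Object { $_.Matches } |
        ForEach-Object { $_.Value } >> $outfile
}

$result = Get-Content -Path $outfile
$result
```

Because `>>` appends, running the loop a second time adds the same matches again; delete the output file first if a fresh result is wanted.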
Don't actually need to escape the " in the RegEx either:
$regex = '<dsf:tsdfgd trsdfge="urn:x-ssdfgs-dfg-com:isdfgc/tg4r3e-i4d" id="OsdfgsdfD">'
Will also work.
Same as the 6 lines above (https://www.donationcoder.com/forum/index.php?topic=45945.msg422126#msg422126) without assigned variables or a for loop:
-4wd (August 03, 2018, 10:11 PM)
If you are searching for a regex within a regex, 'You Are Doing It Wrong' (TM).
Your initial requirement was to find and extract content using a regex, but now you need parts of that regex to be split out? That can be done using a single regex, grouping the stuff you need to split out.
And for this whole exercise to make any sense, where is the variable part of the data to find? When searching for explicit text(s), a count would suffice...
Please provide a complete example, with actual data (not an entire file!), clearly marking the stuff you need to extract, of what you want to achieve, not how you think it could/should be solved.-Ath (August 05, 2018, 06:20 AM)
Now we are getting somewhere, sort of.
You just didn't tell us which other parts of the data you need extracted from each record, besides the ui_mode field, and which identifying field should go in the first column of the csv output you suggested earlier.-Ath (August 05, 2018, 03:24 PM)
Yes, 2 actually.
Is your second line a question?-kalos (August 05, 2018, 03:48 PM)
Why is it useless? It's an exact representation, apart from the fact that there is more irrelevant text around it.-kalos (August 05, 2018, 01:44 PM)
,["t--ddbPTeIsNI","iGTzEhwhMx4U","r-iGTzEhwhMx4U",[["debug",null,null,null,null,[null,null,null,null,0]
]
,["ui_mode",1234125123l,[null,null,"inline"]
]
,["num_cols",14351435,[null,null,null,2.0]
]
,["max_timing",235123512,[null,null,null,2500.0]
]
,["check_parent_card",143512122,[null,null,null,null,1]
]
,["counterfactual_logging",213513212412,[null,null,null,null,0]
]
]
]-kalos (August 05, 2018, 01:55 PM)
<PLANT>
<COMMON>Bloodroot</COMMON>
<BOTANICAL>Sanguinaria canadensis</BOTANICAL>
<ZONE>4</ZONE>
<LIGHT>Mostly Shady</LIGHT>
<PRICE>$2.44</PRICE>
<AVAILABILITY>031599</AVAILABILITY>
</PLANT>-kalos (August 06, 2018, 03:26 AM)
it will limit my learning a lot-kalos (August 06, 2018, 03:26 AM)
Well, please first try to learn how to describe your challenge well, a tutorial was linked earlier by 4wd, then we will try to teach you how to best solve your challenge. It may not need regex at all.
But I do not want to work it with Select-XML because it will limit my learning a lot.-kalos (August 06, 2018, 03:26 AM)
A common saying about regexes goes like this: You try to solve a problem with a regex. Now you've got 2 problems...-Ath (August 06, 2018, 05:31 AM)
But I cannot make it work for my file. Any hint?-kalos (August 06, 2018, 10:55 AM)
Yeah, as Ath suggested, your XML contains CDATA so you have to read that separately.
https://stackoverflow.com/questions/1274070/how-to-read-cdata-in-xml-file-with-powershell
-4wd (August 06, 2018, 11:37 AM)
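As a rough illustration of reading CDATA, here is a small sketch that parses a made-up document and pulls the CDATA text out of one element. The element names `root` and `note` and the content are hypothetical (kalos never shared the real file); `SelectSingleNode` and `InnerText` are standard .NET `XmlNode` members.

```powershell
# Hypothetical document; the real schema was never posted.
[xml]$doc = '<root><note><![CDATA[<b>raw</b> & unescaped]]></note></root>'

# InnerText returns the element's text content, including what is inside the CDATA section.
$node  = $doc.SelectSingleNode('/root/note')
$cdata = $node.InnerText
$cdata
```

The markup inside the CDATA section comes back as plain text, so it can be searched or parsed in a second step if needed.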
Guys, the more I am looking on it, the more I am convinced that Regex would be the best solution.
Can anyone tell me please how to find a regex in a file and append it to a file? Also, how to loop that? Last, how to find the next regex match in the file?-kalos (August 06, 2018, 11:36 AM)
If I could convert the XML file in a flat structured file-kalos (August 06, 2018, 08:13 AM)
Converting your .xml to .csv is a quite easy one-liner in Powershell, assuming a single xml file, into a single .csv file:
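The one-liner itself did not survive in this copy of the thread, so the following is only a sketch of the general idea, not Ath's original command. It uses the Plants record posted earlier, wrapped in a `CATALOG` root element, and `ConvertTo-Csv` so the result stays in memory; swapping in `Export-Csv -Path <file>` would write a file instead.

```powershell
# Sample record taken from the Plants data posted earlier in the thread.
[xml]$doc = @'
<CATALOG>
<PLANT>
<COMMON>Bloodroot</COMMON>
<BOTANICAL>Sanguinaria canadensis</BOTANICAL>
<ZONE>4</ZONE>
<LIGHT>Mostly Shady</LIGHT>
<PRICE>$2.44</PRICE>
<AVAILABILITY>031599</AVAILABILITY>
</PLANT>
</CATALOG>
'@

# Each PLANT element becomes one CSV row; the child element names become the columns.
$csv = @($doc.CATALOG.PLANT |
    Select-Object COMMON, BOTANICAL, ZONE, LIGHT, PRICE, AVAILABILITY |
    ConvertTo-Csv -NoTypeInformation)
$csv
```

With a real file, the here-string would be replaced by something like `[xml]$doc = Get-Content -Raw <path>`, which loads the whole document into memory and is therefore a poor fit for very large inputs.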
Guys, the more I am looking on it, the more I am convinced that Regex would be the best solution.-kalos (August 06, 2018, 11:36 AM)
Please listen to people with more (programming) experience than you have, you are really trying to hammer round screws into square holes here, don't do that, you'll hurt yourself.-Ath (August 06, 2018, 02:14 PM)
Can anyone tell me please how to find a regex in a file and append it to a file? Also, how to loop that? Last, how to find the next regex match in the file?-kalos (August 06, 2018, 02:27 PM)
Well, the trouble is you'll have to do it in some script or programming language, as regex is actually a selection mechanism using pattern matching ('regular expressions').
Stream EDitor-Ath (August 06, 2018, 02:45 PM)
Can anyone tell me please how to find a regex in a file and append it to a file? Also, how to loop that?-kalos (August 06, 2018, 11:36 AM)
Last, how to find the next regex match in the file?
Add another Select-String line with the next RegEx.-4wd (August 06, 2018, 04:41 PM)
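A hedged sketch of that suggestion: instead of copying the Select-String line by hand, the patterns can be kept in a list and run one after the other, each appending to the same output file. The input text, the two patterns and the temp-file name are invented for illustration.

```powershell
# Hypothetical input text and patterns, for illustration only.
$text     = '<a>one</a><b>two</b><a>three</a>'
$patterns = '<a>(.+?)</a>', '<b>(.+?)</b>'
$outfile  = Join-Path ([System.IO.Path]::GetTempPath()) ("multi-" + [guid]::NewGuid() + ".txt")

# Run each regex in turn; every captured group value is appended as its own line.
foreach ($p in $patterns) {
    $text | Select-String -Pattern $p -AllMatches |
        ForEach-Object { $_.Matches } |
        ForEach-Object { $_.Groups[1].Value } |
        Add-Content -Path $outfile
}

$appended = Get-Content -Path $outfile
$appended
```

The results of the second pattern land after all results of the first, so this only fits cases where the matches have no positional relation to each other.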
most of my questions are for my own understanding and may not directly relate to the specific problem.-kalos (August 07, 2018, 09:01 AM)
You obviously don't understand what we wrote earlier. I already gave 2 (two) possible ways how to handle that. And there are other solutions too.
To spell it out, again:-Ath (August 07, 2018, 08:50 AM)
It is not possible for anyone external to offer a complete solution because the source data cannot be shared.-kalos (August 07, 2018, 09:01 AM)
Also, what is | % { $_.Matches } | % { $_.Value } >> $outfile exactly?-kalos (August 07, 2018, 04:16 AM)
| - A Pipe (https://quickleft.com/blog/command-line-tutorials-redirection-pipes/)
% - ForEach-Object (https://ss64.com/ps/foreach-object.html)
$_ - Object passed through pipe.
Matches - A Property of the object.
Value - A Property of the object.
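To make the pieces above concrete, here is a small sketch that runs the same pipeline one stage at a time, with `%` written out as `ForEach-Object`. The input line and the pattern are assumptions chosen only so that there is something to match.

```powershell
# Made-up input line; in the thread the input is a file passed via -Path.
$line = 'ui_mode=1 num_cols=2'

# Stage 1: Select-String emits one MatchInfo object per matching line.
$matchInfo = $line | Select-String -Pattern '\w+=\d' -AllMatches

# Stage 2: % { $_.Matches } unrolls the regex Match objects found in that line.
$matchObjects = $matchInfo | ForEach-Object { $_.Matches }

# Stage 3: % { $_.Value } keeps only the matched text of each Match.
$values = $matchObjects | ForEach-Object { $_.Value }
$values
```

Appending `>> $outfile` to the last stage is what writes those values to the output file in append mode.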
Let's say I want to extract the numbers in the fields ui_mode etc or each of these three separate records.-kalos (August 05, 2018, 01:55 PM)
Are Name, Matches, Groups, Value something standard in Powershell, or are they random variable names?-kalos (August 08, 2018, 08:59 AM)
4wd did a fine job of explaining all the special chars / shortcuts he used in the post just above yours.-Ath (August 08, 2018, 01:39 PM)
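For what it is worth, those names are properties of the objects that Select-String and the .NET regex engine return, not arbitrary variables. A short sketch that inspects them; the sample text and pattern are made up for illustration.

```powershell
# Made-up sample text and pattern.
$info = 'productId=10004' | Select-String -Pattern '(\w+)=(\d+)'

$infoType  = $info.GetType().FullName     # Microsoft.PowerShell.Commands.MatchInfo
$first     = $info.Matches[0]
$matchType = $first.GetType().FullName    # System.Text.RegularExpressions.Match

$whole  = $first.Value                    # the full matched text
$group1 = $first.Groups[1].Value          # first pair of parentheses
$group2 = $first.Groups[2].Value          # second pair of parentheses
$infoType, $matchType, $whole, $group1, $group2
```

Piping any of these objects to `Get-Member -MemberType Property` lists every property available on it.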
Any idea why this does not work?
Get-Content *.xml | Out-String
-kalos (August 09, 2018, 10:53 AM)
gc *.xml -match *regex*
does not work :(
I have a better idea, you tell us why you think it doesn't work.
gc *.xml -match *regex*
does not work :(
Get-Help Get-Content
You tell us why it doesn't work.-4wd (August 09, 2018, 07:59 PM)
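One likely part of the answer, sketched under assumptions: without parentheses, `-match` after `gc *.xml` is parsed as a parameter of Get-Content rather than as an operator, and even with parentheses Get-Content returns an array of lines, on which `-match` acts as a filter instead of returning True/False. The array below is a stand-in for Get-Content output; its contents are invented.

```powershell
# Stand-in for the array of lines that (Get-Content file) returns.
$lines = 'alpha', '<x>1</x>', 'gamma'

# Array on the left: -match returns the elements that match, not a boolean.
$filtered = @($lines -match '<x>')

# Single string on the left: -match returns $true/$false and fills $Matches.
$text     = $lines -join "`n"
$found    = $text -match '<x>(.+?)</x>'
$captured = $Matches[1]
$filtered, $found, $captured
```

Note also that `*regex*` is a wildcard-style pattern; a leading `*` is not a valid regular expression.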
2. He doesn't read what has already been given because the answer is in this thread.
PS. Sorry mouser ... :-\-4wd (August 10, 2018, 08:38 AM)
Also, I want to run sequential several regex matches with their own references, one by one and append each result to the output file.-kalos (August 10, 2018, 09:43 AM)
You have to make clear whether the results from the separate queries have any positional relation to each other, or can the queries be run one after the other and the output of the second, third, etc., runs appended to the first regex run?-Ath (August 10, 2018, 01:37 PM)
So it's always xml and it's always that schema? And you're just worried about duplicates?-wraith808 (August 13, 2018, 07:18 AM)
And one last question... when you say duplicate, you mean the whole record is duplicated? Or just some of the fields, i.e. productID or prod id?-wraith808 (August 13, 2018, 10:05 AM)
OK, so the input is:
<html:products>
<html:prod id="prod1">
<html:referenceData>
<html:product>
<html:classificationType>PRD</html:classificationType>
<html:productType>PRD_XE</html:productType>
<html:productId>10004</html:productId>
<html:assignedDate>2018-07-23</html:assignedDate>
</html:product>
<html:book>
<html:name>REPAIRS</html:name>
<html:legalEntity>REP_XE</html:legalEntity>
<html:location>ED</html:location>
</html:book>
</html:referenceData>
</html:prod>
The above continues to prod2 etc.
The output of the data would be:
prod1; PRD; PRD_XE; 10004; 2018-07-23; REPAIRS; REP_XE; ED
Then a new line would start with:
prod2; etc-kalos (August 13, 2018, 05:03 AM)
It doesn't even look a teensy bit like this new data you've given just now, are you playing us?
<CATALOG>
<PLANT>
<COMMON>Bloodroot</COMMON>
<BOTANICAL>Sanguinaria canadensis</BOTANICAL>
<ZONE>4</ZONE>
<LIGHT>Mostly Shady</LIGHT>
<PRICE>$2.44</PRICE>
<AVAILABILITY>031599</AVAILABILITY>
</PLANT>
-kalos (August 06, 2018, 03:26 AM)
Well guys, the data is what I posted in my last post (Plants),-kalos (August 06, 2018, 07:01 AM)
However, I want to convert the input data into a string, because I may need to match longer substrings than e.g. "<html:classificationType>(.+?)</html:classificationType>"-kalos (August 13, 2018, 05:03 AM)
You are talking b.s. here.
Also, I think there may be duplicates for each prod, e.g. more than one assignedDate node with different values, so MatchAll would be best.-kalos (August 13, 2018, 05:03 AM)
This doesn't make sense without an example, and MatchAll is inappropriate here.
extract the appropriate regex-kalos (August 13, 2018, 10:47 AM)
PLEASE STOP TELLING US HOW TO SOLVE YOUR CHALLENGE!
You didn't give the specification for that html: namespace though. (But as it's the only namespace used, for data-extraction it can be filtered out)-Ath (August 13, 2018, 02:05 PM)
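A minimal sketch of that idea, not the solution linked elsewhere in the thread: strip the `html:` prefix (assuming it really is the only prefix in use), parse the result as XML, and join the leaf values of each `prod` with `; `. The sample is the record posted above, with the mismatched `Entity` tag assumed to be `legalEntity` and a closing `products` tag added so that it parses.

```powershell
# Record from the thread; legalEntity and the closing </html:products> are assumptions.
$raw = @'
<html:products>
<html:prod id="prod1">
<html:referenceData>
<html:product>
<html:classificationType>PRD</html:classificationType>
<html:productType>PRD_XE</html:productType>
<html:productId>10004</html:productId>
<html:assignedDate>2018-07-23</html:assignedDate>
</html:product>
<html:book>
<html:name>REPAIRS</html:name>
<html:legalEntity>REP_XE</html:legalEntity>
<html:location>ED</html:location>
</html:book>
</html:referenceData>
</html:prod>
</html:products>
'@

# Filter out the undeclared namespace prefix, then parse as plain XML.
[xml]$doc = $raw -replace '(</?)html:', '$1'

# One output line per prod: its id followed by every leaf value, in document order.
$rows = foreach ($prod in $doc.SelectNodes('/products/prod')) {
    $fields = @($prod.GetAttribute('id'))
    foreach ($leaf in $prod.SelectNodes('referenceData/*/*')) { $fields += $leaf.InnerText }
    $fields -join '; '
}
$rows
```

If a prod contained a repeated node, such as a second assignedDate, it would simply show up as an extra field on that line.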
PS: This is not related to the initial data file I wanted to process.-kalos (August 09, 2018, 10:53 AM)
How can I do that?-kalos (August 20, 2018, 08:37 AM)
That's why we asked more specific questions, but you never answered them.
So then I gave you the assignment of answering all our unanswered questions, but you haven't done that up until now, so basically, we are waiting (but not holding our breath) for your answers, before accepting new questions. :(-Ath (August 20, 2018, 08:59 AM)
That finally makes some sense. Here (https://www.donationcoder.com/forum/index.php?topic=45945.msg422205#msg422205) is an example solution for putting that into a .csv formatted file.-Ath (August 13, 2018, 02:05 PM)
It doesn't even look a teensy bit like this new data you've given just now, are you playing us?-Ath (August 13, 2018, 02:05 PM)
Also, I still have not figured out how to make Powershell match . any character including newline... Any hint?-kalos (August 20, 2018, 09:36 AM)
Learn to use Powershell's built-in help system:
Get-Help about_comparison_operators
Learn to use Google:
http://lmgtfy.com/?q=powershell+regex+match+any+character+including+newline
http://lmgtfy.com/?q=powershell+regex+match+multiline
http://bfy.tw/JV48
-4wd (August 20, 2018, 07:15 PM)
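For readers who land here with the same question, a hedged sketch of what those searches typically lead to: the `(?s)` inline option makes `.` match newlines, but it only helps if the text is one string rather than an array of lines, which is what `Get-Content -Raw` (PowerShell 3.0 and later) provides. The file name and contents below are invented.

```powershell
# Hypothetical multi-line file written to a temp location.
$path = Join-Path ([System.IO.Path]::GetTempPath()) ("newline-demo-" + [guid]::NewGuid() + ".xml")
Set-Content -Path $path -Value "<rec>`nline one`nline two`n</rec>"

$lines = Get-Content -Path $path          # array: one element per line
$text  = Get-Content -Path $path -Raw     # a single string including the newlines

# Line by line, no single element contains both the opening and closing tag.
$byLine = @($lines -match '(?s)<rec>.+?</rec>')

# On the raw string, (?s) lets . run across the line breaks.
$whole = [regex]::Match($text, '(?s)<rec>(.+?)</rec>')
$byLine.Count, $whole.Success
```

This reads the whole file into memory, so it is a questionable choice for the 25GB file mentioned earlier.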
but I cannot see in the list of operators the OR :tellme:-kalos (August 21, 2018, 04:02 AM)
Get-Help about_Logical_Operators
-4wd (August 21, 2018, 07:49 AM)
I did that, but I get this:
PS H:\> Get-Help about_Logical_Operators
Get-Help : Get-Help could not find about_Logical_Operators in a help file in this session. To download updated help
topics type: "Update-Help". To get help online, search for the help topic in the TechNet library at
http://go.microsoft.com/fwlink/?LinkID=107116.
At line:1 char:1
+ Get-Help about_Logical_Operators
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : ResourceUnavailable: (:) [Get-Help], HelpNotFoundException
+ FullyQualifiedErrorId : HelpNotFound,Microsoft.PowerShell.Commands.GetHelpCommand
-kalos (August 21, 2018, 08:31 AM)
Thanks, but it needs me to run it as admin, which I cannot.-kalos (August 21, 2018, 09:42 AM)
TOPIC
about_Logical_Operators
SHORT DESCRIPTION
Describes the operators that connect statements in Windows PowerShell.
LONG DESCRIPTION
The Windows PowerShell logical operators connect expressions and
statements, allowing you to use a single expression to test for multiple
conditions.
For example, the following statement uses the and operator and
the or operator to connect three conditional statements. The statement is
true only when the value of $a is greater than the value of $b, and
either $a or $b is less than 20.
($a -gt $b) -and (($a -lt 20) -or ($b -lt 20))
Windows PowerShell supports the following logical operators.
Operator  Description                      Example
--------  -------------------------------  ------------------------
-and      Logical and. TRUE only when      (1 -eq 1) -and (1 -eq 2)
          both statements are TRUE.        False
-or       Logical or. TRUE when either     (1 -eq 1) -or (1 -eq 2)
          or both statements are TRUE.     True
-xor      Logical exclusive or. TRUE       (1 -eq 1) -xor (2 -eq 2)
          only when one of the statements  False
          is TRUE and the other is FALSE.
-not      Logical not. Negates the         -not (1 -eq 1)
          statement that follows it.       False
!         Logical not. Negates the         !(1 -eq 1)
          statement that follows it.       False
          (Same as -not)
Note: The previous examples also use the equal to comparison
operator (-eq). For more information, see about_Comparison_Operators.
The examples also use the Boolean values of integers. The integer 0
has a value of FALSE. All other integers have a value of TRUE.
The syntax of the logical operators is as follows:
<statement> {-AND | -OR | -XOR} <statement>
{! | -NOT} <statement>
Statements that use the logical operators return Boolean (TRUE or FALSE)
values.
The Windows PowerShell logical operators evaluate only the statements
required to determine the truth value of the statement. If the left operand
in a statement that contains the and operator is FALSE, the right operand
is not evaluated. If the left operand in a statement that contains
the or statement is TRUE, the right operand is not evaluated. As a result,
you can use these statements in the same way that you would use
the If statement.
SEE ALSO
about_Operators
Compare-Object
about_Comparison_operators
about_If
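A small sketch of the two points the help text makes: the combined `-and`/`-or` example, and the fact that the right operand is skipped when the left one already decides the result. The values of `$a` and `$b` and the helper `Test-Side` are made up for illustration.

```powershell
# Hypothetical helper that records how often it is actually called.
$counter = @{ Calls = 0 }
function Test-Side { $counter.Calls++; return $true }

# The example from the help text, with assumed values.
$a = 25; $b = 10
$fromHelp = ($a -gt $b) -and (($a -lt 20) -or ($b -lt 20))   # True: 25 > 10 and 10 < 20

# Short-circuit behaviour: the right side only runs when it is needed.
$skippedAnd = $false -and (Test-Side)   # left is FALSE, right side not evaluated
$skippedOr  = $true  -or  (Test-Side)   # left is TRUE, right side not evaluated
$evaluated  = $false -or  (Test-Side)   # right side evaluated once

$fromHelp, $skippedAnd, $skippedOr, $evaluated, $counter.Calls
```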
Any idea why the below does not work?-kalos (August 21, 2018, 09:42 AM)
As usual you are asking half questions without *any* documentation. And you still haven't answered all previous questions, as requested, (and even have asked new questions in the half-baked 'answer') so in my book, you're not yet ready to ask new questions.-Ath (August 21, 2018, 12:26 PM)
But I need ad-hoc answers, it's not about a specific thing I try to achieve, but mostly to learn-kalos (August 22, 2018, 03:55 AM)
.. I need ad-hoc answers, it's not about a specific thing I try to achieve, but mostly to learn-kalos (August 22, 2018, 03:55 AM)
it's nice to see the enthusiasm for learning :up:
Thanks, but it needs me to run it as admin, which I cannot.-kalos (August 21, 2018, 09:42 AM)
Any idea why the below does not work?
(gc *.xml) -match '(?s)<\?xml\ version="1\.0"\ encoding="UTF-8"\?>.+?</dbts:PmryObj>'
And it's taken 4 pages to find that out - something that should have been stated earlier.
Any idea why the below does not work?
(gc *.xml) -match '(?s)<\?xml\ version="1\.0"\ encoding="UTF-8"\?>.+?</dbts:PmryObj>'
Sure.
Q: What's the input data?
A: We don't know.
Q: What's the command output?
A: We don't know.
Q: What version of Powershell are you using?
A: We don't know.
Q: What OS are you using, (including architecture)?
A: We don't know.
Q: What are the statistics of the input file, (eg. size)?
A: We don't know.
Q: Why the hell are you trying to process all files at once instead of one at a time?
A: We don't know.
etc, etc, etc, etc ... for 4 pages.
Idea: We don't know.
Why: See point 1 here (https://www.donationcoder.com/forum/index.php?topic=45945.msg422337#msg422337).-4wd (August 22, 2018, 06:30 AM)
Maybe we should just shut this down as it's getting a bit heated, and I think that everyone is done.-wraith808 (August 23, 2018, 11:46 AM)
Agree. Moderator, please move this thread to underground.-anandcoral (August 24, 2018, 06:49 AM)
Get-Help about_*
-wraith808 (August 21, 2018, 09:04 AM)
Guys, can anyone tell me the command that will find all the regex matches, isolate a specific part of each regex match and output all of them in a file?
I have the regex, but I don't know how to indicate a part in it.
The regex is this: "<html:productType>(.+?)</html:productType>"
I used the parentheses to isolate the part of the regex that I want to be output in the file.
What should the whole command look like?
I found online and wrote this:
[regex]::match($s,"<html:productType>(.+?)</html:productType>").Groups[1].Value
But I don't know where you specify the source text or if it is correct. Any hint?
Thanks!
PS: It is really a nightmare to do some simple stuff in Powershell. There is very poor and incomplete documentation. Do you think there could be any other solution? Python maybe or anything else? I need it to work with big data though and if it has GUI it would be nice. Also, it needs to be free for commercial and any use.-kalos (September 11, 2018, 11:08 AM)
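A sketch of how the question above could be answered, under stated assumptions: `$s` is the source text (here a made-up two-line sample, where the second product type is invented), `[regex]::Matches` returns every match instead of only the first, and `Groups[1]` is the part inside the parentheses. The output path is a hypothetical temp file.

```powershell
# $s holds the source text; with a real file it could come from Get-Content -Raw <path>.
$s = @'
<html:productType>PRD_XE</html:productType>
<html:productType>PRD_AB</html:productType>
'@

$pattern = '<html:productType>(.+?)</html:productType>'

# Matches (plural) finds all occurrences; Groups[1] isolates the parenthesised part.
$values = [regex]::Matches($s, $pattern) | ForEach-Object { $_.Groups[1].Value }

# Write one captured value per line to a hypothetical output file.
$outfile = Join-Path ([System.IO.Path]::GetTempPath()) ("productTypes-" + [guid]::NewGuid() + ".txt")
$values | Set-Content -Path $outfile
Get-Content -Path $outfile
```

`[regex]::Match` (singular), as in the line found online, returns only the first match, which is why it appears to ignore the rest of the text.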
Any hint?-kalos (September 12, 2018, 04:23 AM)
We've been here before: https://www.donationcoder.com/forum/index.php?topic=45945.msg422274#msg422274
-Ath (September 12, 2018, 05:07 AM)
What could be the problem?-kalos (September 12, 2018, 05:20 AM)
You haven't shared the file, so we'll never know, unless...-Ath (September 12, 2018, 10:10 AM)
I want to output to a separate file all the matches where the (.+?) is the same.-kalos (September 17, 2018, 05:03 AM)
Run a second command on your previous output
gc FILEPATH\out.txt|group|select Count,Name >FILEPATH\out-counted.txt
-Ath (September 17, 2018, 12:59 PM)
What's the difference?
You either have 3 lines that say:
3 Product1
1 Product2
1 Product3
Or three files that contain lines that say:
File "Product1.txt"
Product1
Product1
Product1
File "Product2.txt"
Product2
File "Product3.txt"
Product3
Either way all you're getting is a count of how many times a match appears.-4wd (September 18, 2018, 09:06 AM)
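The two forms described above can both be produced from the same grouping step. A minimal sketch, using the Product1/Product2/Product3 values from the example as stand-in match results:

```powershell
# Stand-in for the lines of the earlier output file.
$found = 'Product1', 'Product2', 'Product1', 'Product3', 'Product1'

# Group identical values; each group carries a Count, a Name and the grouped items.
$groups = $found | Group-Object

# Form 1: one line per distinct value with its count.
$counts = $groups | ForEach-Object { '{0} {1}' -f $_.Count, $_.Name }

# Form 2: the items of one group, which is what a per-value file would contain.
$product1Items = @(($groups | Where-Object { $_.Name -eq 'Product1' }).Group)
$counts
```

Writing each group's items to a file named after the group's `Name` would give the "three files" variant.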
because the regex will be different! And I want to store the whole regex match in the file, which will be huge multiline text!-kalos (September 18, 2018, 09:21 AM)
Whut? :o