Show Posts

This section allows you to view all posts made by this member. Note that you can only see posts made in areas you currently have access to.


Messages - kalos

76
I don't understand why you do not answer my specific questions, regardless of the source data format and the desired output. Is what I am asking not possible to be done with Powershell?

For example, I want to perform a regex match that will output all matches of regex1 and regex2 and regex3.

How can I do that?
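
A minimal sketch of one way to get all matches of several patterns in a single pass: join them with the regex alternation character and call [regex]::Matches. The sample text and the three patterns are hypothetical stand-ins for regex1, regex2 and regex3.

```powershell
# Hypothetical sample text and patterns (stand-ins for regex1..regex3).
$text     = 'id=42; name=foo; date=2018-07-23'
$patterns = 'id=(\d+)', 'name=(\w+)', 'date=([\d-]+)'

# Join the patterns with | (the plain ASCII pipe) so one scan looks for all of them.
$combined = $patterns -join '|'

# [regex]::Matches returns every match, in the order the matches occur in the text.
$found = [regex]::Matches($text, $combined) | ForEach-Object { $_.Value }
$found
```

Because the matches come back in text order, the order of the patterns in the list does not matter.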

77
And one last question... when you say duplicate, you mean the whole record is duplicated?  Or just some of the fields, i.e. productID or prod id?

Some fields, eg there may be more than one assignedDate value, so the script will need to process these additional fields for the same prod.

The pseudocode I am looking for is like this:
1) search for the first 'prod' section of the file, convert it to a single line, and extract the appropriate regex matches (all matches) one after the other (that's why I want to specify all the regexes that the script should search for when scanning the line, as I am not sure which order they will be in; it shouldn't change, but just in case)
2) then find the next 'prod' section in the file, convert it to single line and put it in a line below the previous, then extract the regexes one by one

Any hint?

I tried to use ¦ to add OR regex matches, but I think it didn't work.

78
So it's always xml and it's always that schema?  And you're just worried about duplicates?


Yeah, for now it looks like that.

79
OK, so the input is:
<html:products>
    <html:prod id="prod1">
      <html:referenceData>
        <html:product>
          <html:classificationType>PRD</html:classificationType>
          <html:productType>PRD_XE</html:productType>
          <html:productId>10004</html:productId>
          <html:assignedDate>2018-07-23</html:assignedDate>
        </html:product>
        <html:book>
          <html:name>REPAIRS</html:name>
          <html:Entity>REP_XE</html:Entity>
          <html:location>ED</html:location>
        </html:book>
      </html:referenceData>
   </html:prod>

The above continues to prod2 etc.

The output of the data would be:
prod1; PRD; PRD_XE; 10004; 2018-07-23; REPAIRS; REP_XE; ED
Then a new line would start with:
prod2; etc


However, I want to convert the input data into a string, because I may need to match longer substrings than e.g. "<html:classificationType>(.+?)</html:classificationType>"
Also, I think there may be duplicates for each prod, e.g. more than one assignedDate node with different values, so MatchAll would be best.
thanks!
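
A rough sketch of how the input above could be turned into the "prod1; PRD; ..." line with regexes on a single string. It assumes the Entity element is closed with a matching tag and that every prod section looks like the sample; the list of field names in $fieldPattern is taken from the sample and would need extending for other nodes.

```powershell
# Sample input mirroring the post (one prod shown; the Entity tag is assumed to close properly).
$text = @'
<html:products>
    <html:prod id="prod1">
      <html:referenceData>
        <html:product>
          <html:classificationType>PRD</html:classificationType>
          <html:productType>PRD_XE</html:productType>
          <html:productId>10004</html:productId>
          <html:assignedDate>2018-07-23</html:assignedDate>
        </html:product>
        <html:book>
          <html:name>REPAIRS</html:name>
          <html:Entity>REP_XE</html:Entity>
          <html:location>ED</html:location>
        </html:book>
      </html:referenceData>
   </html:prod>
</html:products>
'@

# (?s) lets . match newlines, so one match spans a whole prod section.
$prodPattern  = '(?s)<html:prod id="([^"]+)">(.*?)</html:prod>'
# One alternation for all fields; \1 pairs each opening tag with its own closing tag.
$fieldPattern = '<html:(classificationType|productType|productId|assignedDate|name|Entity|location)>(.+?)</html:\1>'

$lines = foreach ($prod in [regex]::Matches($text, $prodPattern)) {
    # All field matches inside this prod, duplicates included, in document order.
    $values = [regex]::Matches($prod.Groups[2].Value, $fieldPattern) |
        ForEach-Object { $_.Groups[2].Value }
    (@($prod.Groups[1].Value) + $values) -join '; '
}
$lines
```

A second assignedDate inside the same prod would simply show up as one more value on that prod's line.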

80
2. He doesn't read what has already been given because the answer is in this thread.

PS. Sorry mouser ...  :-\

Oh sorry, but due to my learning difficulty I need to be pointed to the exact thing.
I am developing my PS understanding though and it seems very powerful  :Thmbsup:

I wrote this script:
(gc *.txt)  -replace "regex1(.+?)", "`$1" >> out

It replaces the regex with the reference from the regex and outputs to a new file. Not exactly what I want it to do.
I want to output the reference, any idea how to do that?
Also, I want to run sequential several regex matches with their own references, one by one and append each result to the output file.

I believe piping commands does not achieve this. I think piping is about getting the output object from the previous command and feed it to the next command. However, I want the various regex matches to work on the original object sequentially. This is a bit tricky, any idea?
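
One possible way to run several regexes against the same original content, sketched under the assumption that the content has first been stored in a variable. The sample text and patterns are hypothetical. Each pattern is applied to $content in turn, and only the captured group is emitted rather than the replaced text.

```powershell
# Hypothetical sample content; in practice it could come from Get-Content -Raw.
$content  = "alpha=1`nbeta=2`nalpha=3"
# Hypothetical patterns, each with one capture group.
$patterns = 'alpha=(\d+)', 'beta=(\d+)'

# Every pattern works on the same original $content, one after the other.
$results = foreach ($p in $patterns) {
    foreach ($m in [regex]::Matches($content, $p)) {
        $m.Groups[1].Value   # the captured part only, not the whole match
    }
}
$results
# To append to a file instead of printing: $results | Add-Content -Path out.txt
```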


Also, can you tell me please how to find and select and append values from multiple xml nodes knowing their XPath?
I do that and it doesn't work:
Select-Xml -Path "*.xml" -XPath "/html:book/html:Entity" >> out
Also this doesn't work:
PS H:\> [xml]$Types = get-content *.xml
PS H:\> select-xml -xml $Types -xpath "//html:Entity"
select-xml : Namespace Manager or XsltContext needed. This query
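
The error above is what Select-Xml reports when the XPath uses a prefix (html:) that has not been mapped to a namespace. A minimal sketch of passing that mapping through the -Namespace parameter follows. The namespace URI here is a made-up placeholder, since the real one is not shown in the post; it has to match the xmlns:html declaration in the actual file.

```powershell
# The namespace URI is a hypothetical placeholder; use the one declared in the real file.
$xmlText = @'
<html:products xmlns:html="http://example.com/hypothetical-ns">
  <html:prod id="prod1">
    <html:book>
      <html:Entity>REP_XE</html:Entity>
    </html:book>
  </html:prod>
</html:products>
'@
$doc = [xml]$xmlText

# Map the prefix used in the XPath to the namespace URI.
$ns = @{ html = 'http://example.com/hypothetical-ns' }

$entities = Select-Xml -Xml $doc -XPath '//html:Entity' -Namespace $ns |
    ForEach-Object { $_.Node.InnerText }
$entities
```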

Thanks!

81
Any idea why this does not work?
Get-Content *.xml | Out-String

I have a better idea, you tell us why you think it doesn't work.

gc *.xml -match *regex*
does not work :(

Get-Help Get-Content

You tell us why it doesn't work.

Mmm I don't know why Get-Content *.xml | Out-String does not work to be honest.
I read at https://ss64.com/ps/out-string.html:
Send the content of Test1.txt to the console as a single string:
PS C:\> get-content C:\docs\test1.txt | out-string
So, shouldn't it work?

As for gc *.xml -match *regex* it seems that -match does not go after gc, but how do I connect/pipe these?
I had no clue, but I found in an irrelevant place on the internet this:
(Get-Content .\input.txt) -join "`r`n"
How should I know that I need to parenthesise the first object? Where does it say that in PS manual?
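
A small sketch of why the parentheses matter, using a temporary file created only for the demonstration. -join and -match are operators that act on values, not parameters of Get-Content, so the command has to be wrapped in ( ) to be evaluated first. The grouping operator ( ) is described in the about_Operators help topic.

```powershell
# Create a small temporary file so the example is self-contained.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'ps-join-demo.txt'
Set-Content -Path $path -Value 'first line', 'second line'

# Get-Content emits an array with one string per line.
$lines = Get-Content $path

# The parentheses run Get-Content first; the operator then works on its result.
$joined  = (Get-Content $path) -join "`r`n"
$matched = (Get-Content $path) -match 'second'   # on an array, -match returns the matching lines

# -Raw reads the whole file as one string in a single step (PowerShell 3.0 and later).
$raw = Get-Content $path -Raw

Remove-Item $path
```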

82
General Software Discussion / REST API
« on: August 09, 2018, 02:43 PM »
Hello!

At work, they are having difficulty sending a curl command to a server via a REST API in order to download some data.

Having used wget before, I thought this curl command would be something easy, and since I said so, everyone now expects me to do it.

Is there any tutorial that can enable me to do that? I have no clue.

Thanks!

83
That's good info, so I will have to have a good read on these to be able to understand PS.

I understand better by studying examples, so I would appreciate your help with this.

PS: This is not related to the initial data file I wanted to process.

Any idea why this does not work?
Get-Content *.xml | Out-String

I want then to append to a file all the matches of a regex1 or regex2. Any idea?

Also, any idea on how to extract specific values from xml nodes?
I type  select-xml -path *.xml -xpath "/html:html/html:products/html:product/html:referenceData/html:pct/html:productId" and it doesn't work

Something else, I want to parse the text of a file and use -match on it, but I cannot figure out how to do it, it's so embarrassing!
gc *.xml -match *regex*
does not work :(

84
gci *.txt | % { sls $_.Name -Pattern '^.*"ui_mode",(\d+).*$' -a | % { $_.Matches } | % { $_.Groups[1].Value } >> K:\out.txt }

I still struggle very much with this and Google does not help. The main source of confusion I believe is the fact that I don't know if a term in the command is a random variable name or if it is a specific variable which is part of PS core. Also, another thing is that I may be able to Google "what does % mean in powershell" to find out what a specific symbol means but if they are part of another word like $_.Matches, it becomes confusing.

So the gci command will grab all the files that match *.txt in the directory.
The pipe means that we run sequentially another command.
The sls command means that it matches the regex inside the input $_.Name. What is that? I read "The $_ variable holds a reference to the current item being processed." But I don't understand what that means. Any idea?
Then, we move on to the command % which is basically a for-each loop. So for each of the Matches found with the previous command, we run another for each loop, the Groups[1].Value. I don't know what that is either. Any idea?
Finally we append to out.txt.

Are the terms Name, Matches, Groups and Value something standard in PowerShell, or are they random variable names?
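
The one-liner above, spelled out with full cmdlet names as a sketch. The temporary folder and file contents are hypothetical. Name, Matches, Groups and Value are not user-chosen variables: they are properties of the .NET objects that flow through the pipeline, and $_ is the automatic variable for the current object.

```powershell
# Self-contained demo: write a hypothetical sample file into a temp folder.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) 'ps-sls-demo'
New-Item -ItemType Directory -Path $dir -Force | Out-Null
Set-Content -Path (Join-Path $dir 'a.txt') -Value ',["ui_mode",1234,[null,null,"inline"]', ',["num_cols",99,[null]'

# gci = Get-ChildItem, % = ForEach-Object, sls = Select-String, -a = -AllMatches
$values = Get-ChildItem -Path $dir -Filter *.txt |
    ForEach-Object {
        # $_ is the file object currently in the pipeline; FullName is one of its properties.
        Select-String -Path $_.FullName -Pattern '"ui_mode",(\d+)' -AllMatches
    } |
    ForEach-Object { $_.Matches } |
    ForEach-Object { $_.Groups[1].Value }
# Matches: the regex matches found on one line. Groups[1]: the text of the first (...) in the pattern.

$values
Remove-Item $dir -Recurse
```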

85
I want answers to specific questions, not a complete solution. It is not possible for anyone external to offer a complete solution because the source data cannot be shared.
Also, most of my questions are for my own understanding and may not directly relate to the specific problem.

86
Last, how to find the next regex match in the file?
Add another Select-String line with the next RegEx.

No, I don't mean a different regex. I mean the same regex. The same regex may have multiple matches in one file. How do I make the script find the first instance, do stuff, then find the second instance, do stuff, etc.?

Also, how do I make . to include newline?

Also, what is | % { $_.Matches } | % { $_.Value } >> $outfile exactly?
I don't know what % and { $_. and Value are?

Also, how do I return a specific part from regex? In normal regex text editors, you put the part in parentheses and then you replace them with \1 etc. How do I do it in Powershell?
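
A short sketch touching the questions above, with hypothetical sample text: the inline option (?s) makes . match newlines, a foreach over [regex]::Matches handles one match at a time, and parentheses in the pattern become numbered groups (Groups[1] on a match object, $1 in a -replace replacement string).

```powershell
# Hypothetical sample: two records, the first one spanning two lines.
$text = "<a>one`nmore</a>`n<a>two</a>"

# (?s) is the inline single-line option: with it, . also matches newline characters.
# The parentheses form capture group 1, the part to return.
$pattern = '(?s)<a>(.*?)</a>'

$captured = foreach ($m in [regex]::Matches($text, $pattern)) {
    # "Do stuff" with one match at a time; here the group text is just emitted.
    $m.Groups[1].Value
}
$captured

# With -replace, groups are written $1, $2 (in single quotes so PowerShell leaves them alone).
$swapped = 'John Smith' -replace '(\w+) (\w+)', '$2 $1'
$swapped
```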

Also, I should be able to figure this out myself, but I am looking for neat code and I can only manage to come up with messy stuff: is there a script to delete lines not containing specific literal phrases? E.g. not containing 'lue } >> $ou', without having to go through each character to check whether it needs escaping.
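
One way to keep only the lines that contain a literal phrase, sketched with made-up sample lines. It uses the .NET String.Contains method, which does a plain substring test, so nothing in the phrase needs regex escaping.

```powershell
# Hypothetical sample lines.
$lines  = 'value } >> $outfile', 'unrelated line', 'blue } >> $out'
$phrase = 'lue } >> $ou'   # single quotes: PowerShell does not expand $ou

# String.Contains is a plain, case-sensitive substring test; no regex is involved.
$kept = $lines | Where-Object { $_.Contains($phrase) }
$kept

# For files, Select-String -SimpleMatch also treats the pattern literally, e.g.
# (Select-String -Path in.txt -Pattern $phrase -SimpleMatch).Line | Set-Content out.txt
```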

87
Stream EDitor

Very interesting tool, thanks!

88
Guys, the more I look at it, the more I am convinced that regex would be the best solution.
Please listen to people with more (programming) experience than you have, you are really trying to hammer round screws into square holes here, don't do that, you'll hurt yourself.

OK but I would be highly interested to learn how to do the below?
Can anyone tell me please how to find a regex in a file and append it to a file? Also, how to loop that? Last, how to find the next regex match in the file?

89
I don't understand what CDATA is.

My xml file contains tons of tags, ie text inside <>, in a complex hierarchy.
Apart from that, it contains values both inside the <>, in the format <someTag someID="SomeValue">, and between tags, in the format <someTag>SomeValue</someTag>

1) I don't know what the total number and hierarchy of tags is. So can I select ALL nodes under the whole hierarchy?
2) Will PS process both of the value formats above?
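
A small sketch bearing on both questions, using a hypothetical one-line document. The XPath //* selects every element at any depth without knowing the hierarchy in advance, //@* selects every attribute, and both value formats are readable.

```powershell
# Hypothetical document containing both value styles.
$doc = [xml]'<root><someTag someID="SomeValue"><child>Text value</child></someTag></root>'

# //* selects every element at any depth; //@* selects every attribute.
$elementNames = $doc.SelectNodes('//*')  | ForEach-Object { $_.LocalName }
$attributes   = $doc.SelectNodes('//@*') | ForEach-Object { "$($_.Name)=$($_.Value)" }

# Attribute value via GetAttribute, element text via InnerText.
$attrValue = $doc.SelectSingleNode('//someTag').GetAttribute('someID')
$textValue = $doc.SelectSingleNode('//child').InnerText
```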

90
But I cannot make it work for my file. Any hint?

Yeah, as Ath suggested, your XML  contains CDATA so you have to read that separately.

https://stackoverflo...file-with-powershell

I will try but can you help me with the below:

Guys, the more I look at it, the more I am convinced that regex would be the best solution.

Can anyone tell me please how to find a regex in a file and append it to a file? Also, how to loop that? Last, how to find the next regex match in the file?



91
Guys, the more I look at it, the more I am convinced that regex would be the best solution.

Can anyone tell me please how to find a regex in a file and append it to a file? Also, how to loop that? Last, how to find the next regex match in the file?


92
By the way, this script looks amazing (From: https://www.codeproj...0/powershell-and-xml):

PS C:\> $xml = (Get-Content file.xml)
PS C:\> $xml = [xml](Get-Content file.xml)
PS C:\> $xml.SelectNodes("/employees/employee")

id                                      name                                    age
--                                      ----                                    ---
101                                     Frankie Johnny                          36
102                                     Elvis Presley                           79
301                                     Ella Fitzgerald                         102

But I cannot make it work for my file. Any hint?

93
Yeah, I want to use RegEx to be honest. But I struggle to find a way to do it.

The first line of the data contains an ID. So I can store all these IDs in an array.
Then, for each entry in the array, I will be able to match some regex and output them.

The problem is that the data gets into so many deep tree branches that it gets hard to isolate them.

Mmmmm! Now I got an idea.
If only I could convert the XML file into a flat, structured file, where each line displays the element name and value (as it normally does in XML), but also the names from the whole hierarchy above it!

That way, it will be much more manageable, because I will be able to isolate and process specific lines.

Any script that can do this?
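
A rough sketch of such a flattening step, assuming the file can be loaded with the [xml] cast. Get-FlatXml is a hypothetical helper name, not a built-in cmdlet; it walks the tree and emits one "path = value" line per attribute and per text-only element.

```powershell
# Hypothetical sample document.
$doc = [xml]@'
<CATALOG>
  <PLANT>
    <COMMON>Bloodroot</COMMON>
    <ZONE>4</ZONE>
  </PLANT>
</CATALOG>
'@

# Hypothetical helper: emits "path = value" for every attribute and every element without child elements.
function Get-FlatXml([System.Xml.XmlNode]$Node, [string]$Path) {
    $here = "$Path/$($Node.LocalName)"
    foreach ($attr in $Node.Attributes) { "$here/@$($attr.Name) = $($attr.Value)" }
    $children = @($Node.ChildNodes | Where-Object { $_.NodeType -eq 'Element' })
    if ($children.Count -eq 0) { "$here = $($Node.InnerText)" }
    foreach ($child in $children) { Get-FlatXml $child $here }
}

$flat = Get-FlatXml $doc.DocumentElement ''
$flat
```

Each output line then carries its full ancestry, so a plain line-based filter can isolate specific fields.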

94
By the way, is there a way to do 'find next' in Powershell without having to find all matches and create an array? I imagine the latter is very RAM consuming.
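
A minimal sketch of a "find next" loop using the .NET Match.NextMatch() method, which continues the search from where the previous match ended. The sample text is hypothetical, and the $seen array exists only to show the results.

```powershell
$text = 'a1 b22 c333'

$m = [regex]::Match($text, '\d+')   # finds only the first match
$seen = @()
while ($m.Success) {
    $seen += $m.Value               # "do stuff" with the current match
    $m = $m.NextMatch()             # resume scanning after the current match
}
$seen
```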

95
Mmm, I see.

Well guys, the data is what I posted in my last post (Plants), these are three sample records and they keep repeating (with different values).

How do I parse this in the easiest way?

96
The first lines are:

<?xml version="1.0" encoding="ISO8859-1" ?>
<CATALOG>
 <PLANT>
 <COMMON>Bloodroot</COMMON>
 <BOTANICAL>Sanguinaria canadensis</BOTANICAL>
 <ZONE>4</ZONE>
 <LIGHT>Mostly Shady</LIGHT>
 <PRICE>$2.44</PRICE>
 <AVAILABILITY>031599</AVAILABILITY>
 </PLANT>
 <PLANT>
 <COMMON>Columbine</COMMON>
 <BOTANICAL>Aquilegia canadensis</BOTANICAL>
 <ZONE>3</ZONE>
 <LIGHT>Mostly Shady</LIGHT>
 <PRICE>$9.37</PRICE>
 <AVAILABILITY>030699</AVAILABILITY>
 </PLANT>

Now this goes on and on and the last lines are:
<PLANT>
 <COMMON>Cardinal Flower</COMMON>
 <BOTANICAL>Lobelia cardinalis</BOTANICAL>
 <ZONE>2</ZONE>
 <LIGHT>Shade</LIGHT>
 <PRICE>$3.02</PRICE>
 <AVAILABILITY>022299</AVAILABILITY>
 </PLANT>
</CATALOG>

But I do not want to work it with Select-XML because it will limit my learning a lot. Instead I want to use REGEX so that I can learn something that can be applied to many other situations.
I believe I need to learn in PowerShell:
1) how to read file
2) how to search for a regex, store it in a variable then perform another regex search in that variable and return a part of the match or append it in an output file
3) how to search for the next instance of the regex and loop the above
4) all regexes must be multiline
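
The four steps above could be sketched roughly as follows, using a trimmed copy of the sample data. The here-string stands in for reading a real file, and the choice of COMMON and PRICE as the fields to extract is only an illustration.

```powershell
# 1) Read the file as one string. The here-string stands in for
#    Get-Content catalog.xml -Raw (the file name is hypothetical).
$text = @'
<CATALOG>
 <PLANT>
 <COMMON>Bloodroot</COMMON>
 <BOTANICAL>Sanguinaria canadensis</BOTANICAL>
 <PRICE>$2.44</PRICE>
 </PLANT>
 <PLANT>
 <COMMON>Columbine</COMMON>
 <BOTANICAL>Aquilegia canadensis</BOTANICAL>
 <PRICE>$9.37</PRICE>
 </PLANT>
</CATALOG>
'@

# 2) and 4) Outer regex: (?s) lets . cross line breaks, so each match is one whole PLANT block.
$records = foreach ($plant in [regex]::Matches($text, '(?s)<PLANT>(.*?)</PLANT>')) {
    $block = $plant.Groups[1].Value          # store the match in a variable
    # Second regex search inside that variable; -match fills the automatic $Matches table.
    if ($block -match '<COMMON>(.+?)</COMMON>') { $name  = $Matches[1] }
    if ($block -match '<PRICE>(.+?)</PRICE>')   { $price = $Matches[1] }
    "$name; $price"
}   # 3) the foreach then moves on to the next PLANT match
$records
```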

97
Now we are getting somewhere, sort of.

You only didn't tell what other parts of the data you need extracted from each record, besides the ui_mode field, and what the identifying field is that should go in the first column of the csv output you suggested earlier.

We can extract the ui_mode and max_timing, and the first column would be the second text in "", i.e. for the first record iGTzEhwhMx4U

98
Now we are getting somewhere, sort of.

You only didn't tell what other parts of the data you need extracted from each record, besides the ui_mode field, and what the identifying field is that should go in the first column of the csv output you suggested earlier.


Is your second line a question?

99
You can check this as well:


,["t--ddbPTeIsNI","iGTzEhwhMx4U","r-iGTzEhwhMx4U",[["debug",null,null,null,null,[null,null,null,null,0]
]
,["ui_mode",1234125123l,[null,null,"inline"]
]
,["num_cols",14351435,[null,null,null,2.0]
]
,["max_timing",235123512,[null,null,null,2500.0]
]
,["check_parent_card",143512122,[null,null,null,null,1]
]
,["counterfactual_logging",213513212412,[null,null,null,null,0]
]
]
]
,["t--ddbPTeIsNI","iLS0pb0OlVDE","r-iLS0pb0OlVDE",[["debug",null,null,null,null,[null,null,null,null,0]
]
,["ui_mode",4311231235,[null,null,"inline"]
]
,["num_cols",12341241234,[null,null,null,2.0]
]
,["max_timing",23512351223,[null,null,null,2500.0]
]
,["check_parent_card",5235123412,[null,null,null,null,1]
]
,["counterfactual_logging",12351251212,[null,null,null,null,0]
]
]
]
,["t--ddbPTeIsNI","ibE7thiz85_Y","r-ibE7thiz85_Y",[["debug",null,null,null,null,[null,null,null,null,0]
]
,["ui_mode",124351235,[null,null,"inline"]
]
,["num_cols",623423451,[null,null,null,2.0]
]
,["max_timing",123512351,[null,null,null,2500.0]
]
,["check_parent_card",1235125123,[null,null,null,null,1]
]
,["counterfactual_logging",12351235145,[null,null,null,null,0]
]
]
]

Let's say I want to extract the numbers in the fields ui_mode etc. for each of these three separate records.
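
A possible sketch for the sample records above, trimmed to two records and to the ui_mode and max_timing fields. It assumes every record starts with ,["t--ddbPTeIsNI", as in the sample, and uses the second quoted string as the identifier. The semicolon-separated output format is an assumption.

```powershell
# Trimmed copy of the sample data (two records, fewer fields).
$text = @'
,["t--ddbPTeIsNI","iGTzEhwhMx4U","r-iGTzEhwhMx4U",[["debug",null,null,null,null,[null,null,null,null,0]
]
,["ui_mode",1234125123l,[null,null,"inline"]
]
,["max_timing",235123512,[null,null,null,2500.0]
]
]
]
,["t--ddbPTeIsNI","iLS0pb0OlVDE","r-iLS0pb0OlVDE",[["debug",null,null,null,null,[null,null,null,null,0]
]
,["ui_mode",4311231235,[null,null,"inline"]
]
,["max_timing",23512351223,[null,null,null,2500.0]
]
]
]
'@

# One record runs from its start marker up to the next start marker or the end of the text.
# Group 1 is the second quoted string, used as the record identifier.
$recordPattern = '(?s),\["t--ddbPTeIsNI","([^"]+)".*?(?=,\["t--ddbPTeIsNI"|\z)'

$rows = foreach ($rec in [regex]::Matches($text, $recordPattern)) {
    $id  = $rec.Groups[1].Value
    $ui  = [regex]::Match($rec.Value, '"ui_mode",(\d+)').Groups[1].Value
    $max = [regex]::Match($rec.Value, '"max_timing",(\d+)').Groups[1].Value
    "$id;$ui;$max"
}
$rows
```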

100
Why is it useless? It's an exact representation, apart from the fact that there is more irrelevant text around it.
