Extract REGEX matches from multiple text files

kalos:
Any idea why the below does not work?
-kalos (August 21, 2018, 09:42 AM)
--- End quote ---
As usual you are asking half questions without *any* documentation. And you still haven't answered all previous questions, as requested, (and even have asked new questions in the half-baked 'answer') so in my book, you're not yet ready to ask new questions.
-Ath (August 21, 2018, 12:26 PM)
--- End quote ---

But I need ad-hoc answers, it's not about a specific thing I try to achieve, but mostly to learn

4wd:
But I need ad-hoc answers, it's not about a specific thing I try to achieve, but mostly to learn
-kalos (August 22, 2018, 03:55 AM)
--- End quote ---

And yet every answer given here can be found on Google ... if you're going to learn anything, learn to ask the right questions.

Input:
<?xml version="1.0" encoding="ISO8859-1" ?>
<html:products>
    <html:prod id="prod1">
      <html:referenceData>
        <html:product>
          <html:classificationType>PRD</html:classificationType>
          <html:productType>PRD_XE</html:productType>
          <html:productId>10004</html:productId>
          <html:assignedDate>2018-07-23</html:assignedDate>
        </html:product>
        <html:book>
          <html:name>REPAIRS</html:name>
          <html:Entity>REP_XE</html:legalEntity>
          <html:location>ED</html:location>
        </html:book>
      </html:referenceData>
   </html:prod>
    <html:prod id="prod2">
      <html:referenceData>
        <html:product>
          <html:classificationType>PRD2</html:classificationType>
          <html:productType>PRD_XE2</html:productType>
          <html:productId>10005</html:productId>
          <html:assignedDate>2018-12-23</html:assignedDate>
        </html:product>
        <html:book>
          <html:name>REPAIRS2</html:name>
          <html:Entity>REP_XE2</html:legalEntity>
          <html:location>ED2</html:location>
        </html:book>
      </html:referenceData>
   </html:prod>
    <html:prod id="prod3">
      <html:referenceData>
        <html:product>
          <html:classificationType>PRD3</html:classificationType>
          <html:productType>PRD_XE3</html:productType>
          <html:productId>10014</html:productId>
          <html:assignedDate>2013-07-23</html:assignedDate>
        </html:product>
        <html:book>
          <html:name>REPAIRS3</html:name>
          <html:Entity>REP_XE3</html:legalEntity>
          <html:location>ED3</html:location>
        </html:book>
      </html:referenceData>
   </html:prod>
    <html:prod id="prod4">
      <html:referenceData>
        <html:product>
          <html:classificationType>PRD4</html:classificationType>
          <html:productType>PRD_XE4</html:productType>
          <html:productId>10567</html:productId>
          <html:assignedDate>2010-07-23</html:assignedDate>
        </html:product>
        <html:book>
          <html:name>REPAIRS4</html:name>
          <html:Entity>REP_XE4</html:legalEntity>
          <html:location>ED4</html:location>
        </html:book>
      </html:referenceData>
   </html:prod>
    <html:prod id="prod5">
      <html:referenceData>
        <html:product>
          <html:classificationType>PRD5</html:classificationType>
          <html:productType>PRD_XE5</html:productType>
          <html:productId>10004890</html:productId>
          <html:assignedDate>2015-05-15</html:assignedDate>
        </html:product>
        <html:book>
          <html:name>REPAIRS5</html:name>
          <html:Entity>REP_XE5</html:legalEntity>
          <html:location>ED5</html:location>
        </html:book>
      </html:referenceData>
   </html:prod>
</html:products>



--- Code: PowerShell ---
gc "test.xml" -Raw | sls '(?smi)(<html:prod\s.+?/html:prod>)' -AllMatches | % {$_.Matches} | % { ((((($_.Value) -replace '(<[^>]+>|\s)', '; ') -replace '`r', '') -replace '`n', '') -replace '(;\s)(;\s)+', '$1').Trim('; ') }
Output:
PRD; PRD_XE; 10004; 2018-07-23; REPAIRS; REP_XE; ED
PRD2; PRD_XE2; 10005; 2018-12-23; REPAIRS2; REP_XE2; ED2
PRD3; PRD_XE3; 10014; 2013-07-23; REPAIRS3; REP_XE3; ED3
PRD4; PRD_XE4; 10567; 2010-07-23; REPAIRS4; REP_XE4; ED4
PRD5; PRD_XE5; 10004890; 2015-05-15; REPAIRS5; REP_XE5; ED5
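
For anyone following along, here is one possible way to unroll the one-liner above into separate steps with full cmdlet names. It is only a sketch: the function name Get-ProdSummary and the cut-down sample data are made up for illustration, and it takes the text as a parameter instead of reading test.xml so it can be run on its own. The two -replace steps for `r and `n are left out because the \s replacement has already turned every carriage return and line feed into a separator by that point.

```powershell
# Hypothetical helper; the steps mirror the one-liner, one per statement.
function Get-ProdSummary {
    param(
        [Parameter(Mandatory)]
        [string]$Text   # whole file as ONE string, e.g. from Get-Content -Raw
    )

    # (?smi)        s: '.' also matches newlines
    #               m: ^ and $ match at line breaks (no anchors here, so no effect)
    #               i: ignore case
    # <html:prod\s  opening tag of one block; the \s stops <html:products>
    #               and <html:product> from matching
    # .+?           lazily take everything up to ...
    # /html:prod>   ... the first closing </html:prod>
    $blockPattern = '(?smi)(<html:prod\s.+?/html:prod>)'

    $found = Select-String -InputObject $Text -Pattern $blockPattern -AllMatches

    foreach ($match in $found.Matches) {
        # Turn every tag and every whitespace character into '; '.
        $line = $match.Value -replace '(<[^>]+>|\s)', '; '
        # Collapse runs of '; ' into a single separator.
        $line = $line -replace '(;\s)(;\s)+', '$1'
        # Drop the leading and trailing separators.
        $line.Trim('; ')
    }
}

# Cut-down stand-in for the input posted above.
$sample = @'
<html:products>
    <html:prod id="prod1">
      <html:product>
        <html:classificationType>PRD</html:classificationType>
        <html:productId>10004</html:productId>
      </html:product>
   </html:prod>
    <html:prod id="prod2">
      <html:product>
        <html:classificationType>PRD2</html:classificationType>
        <html:productId>10005</html:productId>
      </html:product>
   </html:prod>
</html:products>
'@

Get-ProdSummary -Text $sample
# PRD; 10004
# PRD2; 10005
```

To run it against a real file, something like Get-ProdSummary -Text (Get-Content -LiteralPath 'test.xml' -Raw) should do; the -Raw switch matters because the pattern has to see the whole file as one string.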


* Stop trying to put it all on one line, (just because I do doesn't mean you should).
* Stop using Powershell shortcuts until you understand what they are because they make the source harder to read, (and yes, I used them for a reason).
* No, the output is not exactly what you wanted - not my problem since getting any coherent information is like extracting a 3 course meal from a lump of granite and takes just as long.
* If it doesn't work on the files you have - again, not my problem - see above reason.
Your homework to increase your knowledge: Render the above one line into a multi-line Powershell script with no command shortcuts.
Non-optional extra: Tell us what the RegEx is doing.
Optional extra: Fix it so you get the prod value from the input data at the start of the output lines.
Optional extra: Make it process multiple files without using gc *.xml anywhere in it.
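
A rough sketch of how the two optional extras might be approached, not a definitive answer. It assumes the id attribute always comes first in the opening html:prod tag, as it does in the input above. The function names Get-ProdSummaryWithId and Get-ProdSummaryFromFolder and the throwaway demo files are hypothetical. The files are walked one at a time with Get-ChildItem rather than being globbed into Get-Content.

```powershell
# Hypothetical helper: same clean-up steps as the one-liner, plus the id as a prefix.
function Get-ProdSummaryWithId {
    param(
        [Parameter(Mandatory)]
        [string]$Text
    )

    # Named group 'id' captures the attribute value of the opening tag.
    $pattern = '(?si)<html:prod\s+id="(?<id>[^"]*)".+?/html:prod>'

    foreach ($m in [regex]::Matches($Text, $pattern)) {
        $fields = ($m.Value -replace '(<[^>]+>|\s)', '; ') -replace '(;\s)(;\s)+', '$1'
        '{0}; {1}' -f $m.Groups['id'].Value, $fields.Trim('; ')
    }
}

# Hypothetical helper: process every .xml file in a folder, one file at a time.
function Get-ProdSummaryFromFolder {
    param(
        [string]$Folder = '.'
    )

    Get-ChildItem -Path $Folder -Filter '*.xml' -File |
        Sort-Object -Property Name |
        ForEach-Object {
            $text = Get-Content -LiteralPath $_.FullName -Raw
            if ($text) {   # -Raw yields nothing for an empty file
                Get-ProdSummaryWithId -Text $text
            }
        }
}

# Demo with two throwaway files in a fresh temp folder.
$demo = Join-Path ([System.IO.Path]::GetTempPath()) ("prod-demo-{0}" -f [guid]::NewGuid())
New-Item -ItemType Directory -Path $demo -Force | Out-Null
Set-Content -LiteralPath (Join-Path $demo 'a.xml') -Value '<html:prod id="prod1"><html:name>REPAIRS</html:name></html:prod>'
Set-Content -LiteralPath (Join-Path $demo 'b.xml') -Value '<html:prod id="prod2"><html:name>REPAIRS2</html:name></html:prod>'

Get-ProdSummaryFromFolder -Folder $demo
# prod1; REPAIRS
# prod2; REPAIRS2
```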

If it doesn't work on your data, you tell us why, don't ask us, we're not mind readers.

I'm done.

tomos:
.. I need ad-hoc answers, it's not about a specific thing I try to achieve, but mostly to learn
-kalos (August 22, 2018, 03:55 AM)
--- End quote ---
it's nice to see the enthusiasm for learning :up:

Regarding the responses you're getting here - I know nothing about the topic but can see you're being given a big opportunity to learn how to approach things, how to tackle a problem, how to learn.

Can you tell us:
why don't you take the experts' advice?
why don't you answer their questions?

4wd:
Thanks, but it needs me to run it as admin, which I cannot.
-kalos (August 21, 2018, 09:42 AM)
--- End quote ---

And it's taken 4 pages to find that out - something that should have been stated earlier.

Any idea why the below does not work?

(gc *.xml) -match '(?s)<\?xml\ version="1\.0"\ encoding="UTF-8"\?>.+?</dbts:PmryObj>'
--- End quote ---

Sure.

Q: What's the input data?
A: We don't know.

Q: What's the command output?
A: We don't know.

Q: What version of Powershell are you using?
A: We don't know.

Q: What OS are you using, (including architecture)?
A: We don't know.

Q: What are the statistics of the input file, (eg. size)?
A: We don't know.

Q: Why the hell are you trying to process all files at once instead of one at a time?
A: We don't know.

etc, etc, etc, etc ... for 4 pages.
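
Most of the facts asked for above can be gathered in a few lines and pasted into a post. This is a minimal sketch: the function name Get-QuestionContext and the temp file are hypothetical, and the choice of properties is only a suggestion.

```powershell
# Hypothetical helper that collects the basics a question should state.
function Get-QuestionContext {
    param(
        [Parameter(Mandatory)]
        [string]$Path   # the input file the question is about
    )

    $file = Get-Item -LiteralPath $Path

    [pscustomobject]@{
        PowerShellVersion = $PSVersionTable.PSVersion.ToString()
        OS                = [System.Environment]::OSVersion.VersionString
        Is64BitOS         = [System.Environment]::Is64BitOperatingSystem
        FileName          = $file.Name
        FileSizeBytes     = $file.Length
        LineCount         = @(Get-Content -LiteralPath $Path).Count
    }
}

# Demo on a throwaway three-line file.
$tmp = Join-Path ([System.IO.Path]::GetTempPath()) ("ctx-{0}.xml" -f [guid]::NewGuid())
Set-Content -LiteralPath $tmp -Value '<a>', 'text', '</a>'

Get-QuestionContext -Path $tmp | Format-List
```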

Idea: We don't know.
Why: See point 1 here.
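
One thing that can be shown without the missing details is how -match behaves on an array of lines compared with a single string. Whether that is what went wrong in the quoted command cannot be confirmed, since the input was never posted; the sketch below only assumes the files span more than one line. The tag names are made up for illustration.

```powershell
# Stand-in for a file's content.
$raw = "<a>`nfirst`n</a>`n<a>`nsecond`n</a>"

# What Get-Content returns without -Raw: an array with one string per line.
$lines = $raw -split "`n"

$pattern = '(?s)<a>.+?</a>'

# Array on the left: -match tests each line on its own and returns the lines
# that match. No single line holds both <a> and </a>, so nothing comes back.
$fromLines = @($lines -match $pattern)

# Single string on the left: -match returns $true or $false and fills
# $Matches with the FIRST match only.
$isMatch   = $raw -match $pattern
$firstOnly = $Matches[0]

# To collect every match, use the .NET regex class (or Select-String -AllMatches).
$all = [regex]::Matches($raw, $pattern) | ForEach-Object { $_.Value }

"Lines matched: $($fromLines.Count)"   # Lines matched: 0
"String matched: $isMatch"             # String matched: True
"All matches: $(@($all).Count)"        # All matches: 2
```

With a wildcard and no -Raw, Get-Content emits the lines of every matching file into one array, so a pattern meant to span lines has nothing to span.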

Ath:
+1
But I need ad-hoc answers, it's not about a specific thing I try to achieve, but mostly to learn
-kalos (August 22, 2018, 03:55 AM)
--- End quote ---

In that case be as clear as you can be, by asking fully documented questions, meaning: (and I've said this before)

* provide a complete file with input data, if you have to anonymize it, then only the data should be altered, not the structure
* ask as explicitly and unambiguously as possible
* give an example of the desired/expected output, based on the input data
This entire thread is full of examples of you not following these business-standard rules...
