
Extract REGEX matches from multiple text files

kalos:
Now we are getting somewhere, sort of.

You just didn't say what other parts of the data you need extracted from each record besides the ui_mode field, or which identifying field should go in the first column of the CSV output you suggested earlier.
-Ath (August 05, 2018, 03:24 PM)
--- End quote ---

We can extract ui_mode and max_timing, and the first column would be the second quoted string, i.e. for the first record iGTzEhwhMx4U.
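For illustration, here is a minimal Python sketch (not PowerShell, and not code anyone posted in this thread) that pulls those three values out of one record with regexes. It assumes every record is laid out like the sample Ath quotes further down: the ID is the second quoted string, the ui_mode value is the last quoted string in its list, and the max_timing value is the number at the end of its list. The function name extract_row and the CSV header are hypothetical. The patterns use only character classes, lazy quantifiers and capture groups, which the .NET regex engine behind PowerShell also supports.

```python
import csv
import io
import re

# Sample record copied from the thread (the "l" after 1234125123 is kept as posted).
RECORD = '''
,["t--ddbPTeIsNI","iGTzEhwhMx4U","r-iGTzEhwhMx4U",[["debug",null,null,null,null,[null,null,null,null,0]
]
,["ui_mode",1234125123l,[null,null,"inline"]
]
,["num_cols",14351435,[null,null,null,2.0]
]
,["max_timing",235123512,[null,null,null,2500.0]
]
,["check_parent_card",143512122,[null,null,null,null,1]
]
,["counterfactual_logging",213513212412,[null,null,null,null,0]
]
]
]
'''

# ID: the second quoted string after an opening [ (assumed record layout).
ID_RE = re.compile(r'\[\s*"[^"]*"\s*,\s*"([^"]*)"')
# ui_mode: the last quoted value inside the [null,...] list that follows it.
UI_MODE_RE = re.compile(r'\["ui_mode",[^,]*,\[[^\]]*"([^"]*)"\]')
# max_timing: the number just before the ] that closes its [null,...] list.
MAX_TIMING_RE = re.compile(r'\["max_timing",[^,]*,\[[^\]]*?([\d.]+)\]')


def extract_row(record):
    """Return [id, ui_mode, max_timing] for one record; missing fields become ''."""
    row = []
    for pattern in (ID_RE, UI_MODE_RE, MAX_TIMING_RE):
        m = pattern.search(record)
        row.append(m.group(1) if m else "")
    return row


buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(["id", "ui_mode", "max_timing"])
writer.writerow(extract_row(RECORD))
print(buf.getvalue(), end="")
```

Run as-is, it should print the header line followed by iGTzEhwhMx4U,inline,2500.0. For a whole file you would first split it into records and call extract_row on each one.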

4wd:
It looks suspiciously like JSON data, which PowerShell can handle without needing much RegEx. My bad, wrong type of brackets.

What are the first 10-20 lines of the file?
And the last 20 or so? That should give us enough (in theory) to create a small test file.

If it's XML, you might be able to just use the Select-Xml cmdlet.
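Select-Xml itself is PowerShell-only. As a rough analogue of the same idea (let an XML parser find the elements instead of hand-writing regexes), here is a Python sketch using the standard library's xml.etree.ElementTree on a trimmed copy of the catalog sample kalos posts further down. This is a swapped-in technique for illustration, not how Select-Xml works internally; the XML declaration and some child elements are left out of the string for brevity.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the catalog sample from the thread (XML declaration and
# ZONE/LIGHT/AVAILABILITY omitted for brevity).
CATALOG = """<CATALOG>
 <PLANT>
 <COMMON>Bloodroot</COMMON>
 <BOTANICAL>Sanguinaria canadensis</BOTANICAL>
 <PRICE>$2.44</PRICE>
 </PLANT>
 <PLANT>
 <COMMON>Columbine</COMMON>
 <BOTANICAL>Aquilegia canadensis</BOTANICAL>
 <PRICE>$9.37</PRICE>
 </PLANT>
</CATALOG>"""

root = ET.fromstring(CATALOG)

# findall("PLANT") selects the direct <PLANT> children of <CATALOG>,
# similar in spirit to an XPath query such as /CATALOG/PLANT.
rows = [(plant.findtext("COMMON"), plant.findtext("PRICE")) for plant in root.findall("PLANT")]

for name, price in rows:
    print(name, price)
```

This prints "Bloodroot $2.44" and "Columbine $9.37", one per line. No regex is involved, and the parser copes with things like attributes, comments or reordered child elements that a hand-written pattern might trip over.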

Why is it useless? It's an exact representation, apart from the fact that there is more irrelevant text around it.
-kalos (August 05, 2018, 01:44 PM)
--- End quote ---

No, it's your interpretation, not the exact (raw) data, and only the raw data would show us the structure.

kalos:
The first lines are:

<?xml version="1.0" encoding="ISO8859-1" ?>
<CATALOG>
 <PLANT>
 <COMMON>Bloodroot</COMMON>
 <BOTANICAL>Sanguinaria canadensis</BOTANICAL>
 <ZONE>4</ZONE>
 <LIGHT>Mostly Shady</LIGHT>
 <PRICE>$2.44</PRICE>
 <AVAILABILITY>031599</AVAILABILITY>
 </PLANT>
 <PLANT>
 <COMMON>Columbine</COMMON>
 <BOTANICAL>Aquilegia canadensis</BOTANICAL>
 <ZONE>3</ZONE>
 <LIGHT>Mostly Shady</LIGHT>
 <PRICE>$9.37</PRICE>
 <AVAILABILITY>030699</AVAILABILITY>
 </PLANT>

This goes on and on, and the last lines are:
<PLANT>
 <COMMON>Cardinal Flower</COMMON>
 <BOTANICAL>Lobelia cardinalis</BOTANICAL>
 <ZONE>2</ZONE>
 <LIGHT>Shade</LIGHT>
 <PRICE>$3.02</PRICE>
 <AVAILABILITY>022299</AVAILABILITY>
 </PLANT>
</CATALOG>

But I do not want to do it with Select-Xml, because it will limit my learning a lot. Instead, I want to use regex so that I learn something that can be applied to many other situations.
I believe I need to learn how to do the following in PowerShell:
1) how to read a file
2) how to search for a regex, store the match in a variable, then run another regex search on that variable and return part of the match or append it to an output file
3) how to find the next match of the regex and repeat the above in a loop
4) all regexes must be multiline
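The four steps above map fairly directly onto a small script. Here is a minimal sketch in Python rather than PowerShell, assuming the file looks like the catalog sample above; in PowerShell the rough counterparts would be Get-Content -Raw for step 1 and [regex]::Matches() for steps 2 and 3. On point 4: "multiline" in the sense of letting . match across line breaks is the DOTALL flag in Python, written inline as (?s) and called Singleline in .NET; the flag actually named MULTILINE only changes what ^ and $ match. The function name extract, the file names and the choice of COMMON and PRICE as output columns are hypothetical.

```python
import csv
import os
import re
import tempfile

# The catalog sample from the thread, used as a stand-in input file.
SAMPLE = """<?xml version="1.0" encoding="ISO8859-1" ?>
<CATALOG>
 <PLANT>
 <COMMON>Bloodroot</COMMON>
 <BOTANICAL>Sanguinaria canadensis</BOTANICAL>
 <ZONE>4</ZONE>
 <LIGHT>Mostly Shady</LIGHT>
 <PRICE>$2.44</PRICE>
 <AVAILABILITY>031599</AVAILABILITY>
 </PLANT>
 <PLANT>
 <COMMON>Columbine</COMMON>
 <BOTANICAL>Aquilegia canadensis</BOTANICAL>
 <ZONE>3</ZONE>
 <LIGHT>Mostly Shady</LIGHT>
 <PRICE>$9.37</PRICE>
 <AVAILABILITY>030699</AVAILABILITY>
 </PLANT>
 <PLANT>
 <COMMON>Cardinal Flower</COMMON>
 <BOTANICAL>Lobelia cardinalis</BOTANICAL>
 <ZONE>2</ZONE>
 <LIGHT>Shade</LIGHT>
 <PRICE>$3.02</PRICE>
 <AVAILABILITY>022299</AVAILABILITY>
 </PLANT>
</CATALOG>
"""

# DOTALL lets "." cross line breaks, so one match can span a whole block.
# The non-greedy .*? stops at the FIRST </PLANT> after each <PLANT>.
BLOCK_RE = re.compile(r"<PLANT>(.*?)</PLANT>", re.DOTALL)
FIELDS = ("COMMON", "PRICE")  # hypothetical choice of output columns
FIELD_RES = [re.compile(rf"<{name}>(.*?)</{name}>", re.DOTALL) for name in FIELDS]


def extract(in_path, out_path):
    # Step 1: read the whole file into one string.
    with open(in_path, encoding="latin-1") as f:
        text = f.read()
    with open(out_path, "a", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        # Step 3: finditer walks every non-overlapping block, in order.
        for block in BLOCK_RE.finditer(text):
            body = block.group(1)  # Step 2: keep the outer match in a variable...
            row = []
            for rx in FIELD_RES:   # ...then run a second regex inside it only.
                m = rx.search(body)
                row.append(m.group(1).strip() if m else "")
            writer.writerow(row)   # ...and append the pieces to the output file.


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "plants.xml")
    dst = os.path.join(tmp, "out.csv")
    with open(src, "w", encoding="latin-1") as f:
        f.write(SAMPLE)
    extract(src, dst)
    with open(dst, encoding="utf-8") as f:
        result = f.read()

print(result, end="")
```

Because each <PLANT> block is captured first and the field regexes only search inside it, a plant with a missing <PRICE> yields an empty cell instead of grabbing the next plant's price. The non-greedy .*? matters: a greedy .* would run from the first <PLANT> all the way to the last </PLANT>.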

Ath:
,["t--ddbPTeIsNI","iGTzEhwhMx4U","r-iGTzEhwhMx4U",[["debug",null,null,null,null,[null,null,null,null,0]
]
,["ui_mode",1234125123l,[null,null,"inline"]
]
,["num_cols",14351435,[null,null,null,2.0]
]
,["max_timing",235123512,[null,null,null,2500.0]
]
,["check_parent_card",143512122,[null,null,null,null,1]
]
,["counterfactual_logging",213513212412,[null,null,null,null,0]
]
]
]

-kalos (August 05, 2018, 01:55 PM)
--- End quote ---
<PLANT>
 <COMMON>Bloodroot</COMMON>
 <BOTANICAL>Sanguinaria canadensis</BOTANICAL>
 <ZONE>4</ZONE>
 <LIGHT>Mostly Shady</LIGHT>
 <PRICE>$2.44</PRICE>
 <AVAILABILITY>031599</AVAILABILITY>
 </PLANT>

-kalos (August 06, 2018, 03:26 AM)
--- End quote ---

A couple of questions:

* In what way do these totally different types/forms of data relate to each other?
* Are they from different files?
* Is the first file some sort of definition file and the second the actual data?
* Is the first part embedded in a CDATA section like this: "<![CDATA[ your non-xml-formed data like in the first quote goes here ]]>" ?
* Please provide the exact filenames.
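If the answer to the CDATA question turns out to be yes, one possible approach is to let an XML parser unwrap the CDATA section and only then apply regexes to the payload. A minimal Python sketch; the element names <root> and <data> and the shortened record inside them are hypothetical stand-ins, not kalos's real files:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical wrapper document with a shortened record inside a CDATA section.
DOC = """<root>
<data><![CDATA[["t--ddbPTeIsNI","iGTzEhwhMx4U",["ui_mode",1,[null,null,"inline"]]]]]></data>
</root>"""

root = ET.fromstring(DOC)

# The parser hands back CDATA content as ordinary text, brackets and quotes intact.
payload = root.findtext("data")

# Now plain regexes can work on the unwrapped record.
record_id = re.search(r'^\["[^"]*","([^"]*)"', payload).group(1)
ui_mode = re.search(r'"ui_mode",[^,]*,\[[^\]]*"([^"]*)"\]', payload).group(1)

print(record_id, ui_mode)
```

This should print "iGTzEhwhMx4U inline". Splitting the job this way keeps the regexes focused on the non-XML part instead of also having to skip over the surrounding tags.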

Ath:
it will limit my learning a lot
-kalos (August 06, 2018, 03:26 AM)
--- End quote ---
Well, please first try to learn how to describe your challenge clearly (4wd linked a tutorial earlier); then we will try to teach you how best to solve it. It may not need regex at all.

A common saying about regexes goes like this: you try to solve a problem with a regex, and now you've got two problems...
