Extract REGEX matches from multiple text files
4wd:
What's the difference?
You either have 3 lines that say:
3 Product1
1 Product2
1 Product3
Or three files that contain lines that say:
File "Product1.txt"
Product1
Product1
Product1
File "Product2.txt"
Product2
File "Product3.txt"
Product3
Either way all you're getting is a count of how many times a match appears.
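As a rough illustration of the "count per match" view described above, here is a minimal sketch. The `$found` array is made-up sample data standing in for whatever the regex actually returns; it is not taken from anyone's real script.

```powershell
# Hypothetical sample data: one element per regex match that was found.
$found = 'Product1', 'Product2', 'Product1', 'Product3', 'Product1'

# Group identical matches, then list "count name" with the most frequent
# first (ties are ordered by name so the result is deterministic).
$summary = $found |
    Group-Object |
    Sort-Object -Property @{ Expression = 'Count'; Descending = $true },
                          @{ Expression = 'Name';  Descending = $false } |
    ForEach-Object { '{0} {1}' -f $_.Count, $_.Name }

$summary
```

The one-file-per-match layout carries the same information; the count is simply the number of lines appended to each file.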
kalos:
What's the difference?
You either have 3 lines that say:
3 Product1
1 Product2
1 Product3
Or three files that contain lines that say:
File "Product1.txt"
Product1
Product1
Product1
File "Product2.txt"
Product2
File "Product3.txt"
Product3
Either way all you're getting is a count of how many times a match appears.
-4wd (September 18, 2018, 09:06 AM)
--- End quote ---
No it's not the same, because the regex will be different! And I want to store the whole regex match in the file, which will be huge multiline text!
Ath:
because the regex will be different! And I want to store the whole regex match in the file, which will be huge multiline text!
-kalos (September 18, 2018, 09:21 AM)
--- End quote ---
Whut? :o
We're back at square one. The circle is completed, again.
You have come here, asking for 'help'. Please provide us with what you want to achieve, and stop asking for small bits of silly info, with even more silly examples. This is not going to get you to a solution, as you obviously don't understand how problem-solving works.
I'm pretty sure I've said this before, a couple of weeks ago. :(
If you can't comply with that, I'd suggest all participants ignore your requests until something useful comes out of your keyboard.
4wd:
You expected anything more?
@kalos: Paste your complete PS script here as it is currently, not as a single line but as a correctly formatted script with no PS shortcuts.
FYI: 80% of what you seemingly want now is covered by this script: https://www.donationcoder.com/forum/index.php?topic=45945.msg422784#msg422784
Only thing missing is output to separate files which would be trivial to add ...
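For what it's worth, writing each multi-line match to its own file might look something like the sketch below. Everything in it is an assumption for illustration: the sample text, the tag names, the named group `id`, and the `regex-demo` folder under the temp directory. It is not the linked script.

```powershell
# Hypothetical input: three small blocks, two of which share an id.
$text = @'
<prod><productId>10004</productId><name>A</name></prod>
<prod><productId>10005</productId><name>B</name></prod>
<prod><productId>10004</productId><name>C</name></prod>
'@

# Assumed output folder for the demo; cleared so reruns start fresh.
$outDir = Join-Path ([System.IO.Path]::GetTempPath()) 'regex-demo'
New-Item -ItemType Directory -Path $outDir -Force | Out-Null
Remove-Item -Path (Join-Path $outDir '*.txt') -ErrorAction SilentlyContinue

# (?s) lets '.' cross newlines, so one match can cover a whole block.
# The named group 'id' captures the value used as the file name.
$pattern = '(?s)<prod>.*?<productId>(?<id>.+?)</productId>.*?</prod>'

foreach ($m in [regex]::Matches($text, $pattern)) {
    $file = Join-Path $outDir ($m.Groups['id'].Value + '.txt')
    # Append the entire matched text, not just the captured id.
    Add-Content -Path $file -Value $m.Value
}
```

Blocks that share an id end up appended to the same file, so `10004.txt` receives two entries here and `10005.txt` one.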
4wd:
InputFile 1: xml-test.xml
<?xml version="1.0" encoding="ISO8859-1" ?>
<html:products>
<html:prod id="prod1">
<html:referenceData>
<html:product>
<html:classificationType>PRD</html:classificationType>
<html:productType>PRD_XE</html:productType>
<html:productId>10004</html:productId>
<html:assignedDate>2018-07-23</html:assignedDate>
</html:product>
<html:book>
<html:name>REPAIRS</html:name>
<html:Entity>REP_XE</html:legalEntity>
<html:location>ED</html:location>
</html:book>
</html:referenceData>
</html:prod>
<html:prod id="prod2">
<html:referenceData>
<html:product>
<html:classificationType>PRD2</html:classificationType>
<html:productType>PRD_XE2</html:productType>
<html:productId>10005</html:productId>
<html:assignedDate>2018-12-23</html:assignedDate>
</html:product>
<html:book>
<html:name>REPAIRS2</html:name>
<html:Entity>REP_XE2</html:legalEntity>
<html:location>ED2</html:location>
</html:book>
</html:referenceData>
</html:prod>
<html:prod id="prod3">
<html:referenceData>
<html:product>
<html:classificationType>PRD</html:classificationType>
<html:productType>PRD_XE</html:productType>
<html:productId>10004</html:productId>
<html:assignedDate>2013-07-23</html:assignedDate>
</html:product>
<html:book>
<html:name>REPAIRS3</html:name>
<html:Entity>REP_XE3</html:legalEntity>
<html:location>ED3</html:location>
</html:book>
</html:referenceData>
</html:prod>
<html:prod id="prod1">
<html:referenceData>
<html:product>
<html:classificationType>PRD4</html:classificationType>
<html:productType>PRD_XE4</html:productType>
<html:productId>10567</html:productId>
<html:assignedDate>2010-07-23</html:assignedDate>
</html:product>
<html:book>
<html:name>REPAIRS4</html:name>
<html:Entity>REP_XE4</html:legalEntity>
<html:location>ED4</html:location>
</html:book>
</html:referenceData>
</html:prod>
<html:prod id="prod5">
<html:referenceData>
<html:product>
<html:classificationType>PRD5</html:classificationType>
<html:productType>PRD_XE5</html:productType>
<html:productId>10004890</html:productId>
<html:assignedDate>2015-05-15</html:assignedDate>
</html:product>
<html:book>
<html:name>REPAIRS5</html:name>
<html:Entity>REP_XE5</html:legalEntity>
<html:location>ED5</html:location>
</html:book>
</html:referenceData>
</html:prod>
</html:products>
InputFile 2: xml test2.xml
<?xml version="1.0" encoding="ISO8859-1" ?>
<html:products>
<html:prod id="prod1">
<html:referenceData>
<html:product>
<html:classificationType>PRD</html:classificationType>
<html:productType>PRD_XE</html:productType>
<html:productId>10004</html:productId>
<html:assignedDate>2018-03-23</html:assignedDate>
</html:product>
<html:book>
<html:name>REFUNDS</html:name>
<html:Entity>REP_XE</html:legalEntity>
<html:location>ED</html:location>
</html:book>
</html:referenceData>
</html:prod>
<html:prod id="prod2">
<html:referenceData>
<html:product>
<html:classificationType>PRD2</html:classificationType>
<html:productType>PRD_XE2</html:productType>
<html:productId>10005</html:productId>
<html:assignedDate>2015-12-23</html:assignedDate>
</html:product>
<html:book>
<html:name>REPAIRS2k12</html:name>
<html:Entity>REP_XE2</html:legalEntity>
<html:location>ED57</html:location>
</html:book>
</html:referenceData>
</html:prod>
<html:prod id="prod3">
<html:referenceData>
<html:product>
<html:classificationType>PRD4</html:classificationType>
<html:productType>PRD_XER3</html:productType>
<html:productId>10014</html:productId>
<html:assignedDate>2010-07-23</html:assignedDate>
</html:product>
<html:book>
<html:name>DESTRUCTION</html:name>
<html:Entity>REP_XE3</html:legalEntity>
<html:location>ED43</html:location>
</html:book>
</html:referenceData>
</html:prod>
<html:prod id="prod4">
<html:referenceData>
<html:product>
<html:classificationType>PRD4</html:classificationType>
<html:productType>PRD_XE4</html:productType>
<html:productId>10567</html:productId>
<html:assignedDate>1999-07-23</html:assignedDate>
</html:product>
<html:book>
<html:name>WHORU</html:name>
<html:Entity>REP_XS4</html:legalEntity>
<html:location>ED4</html:location>
</html:book>
</html:referenceData>
</html:prod>
<html:prod id="prod5">
<html:referenceData>
<html:product>
<html:classificationType>PRD5</html:classificationType>
<html:productType>PRD_XE5</html:productType>
<html:productId>10004890</html:productId>
<html:assignedDate>2115-12-15</html:assignedDate>
</html:product>
<html:book>
<html:name>SCREW_THIS</html:name>
<html:Entity>REP_XE5</html:legalEntity>
<html:location>ED5</html:location>
</html:book>
</html:referenceData>
</html:prod>
</html:products>
Output: 10004890.csv
--- Code: Text ---
prod5,PRD5,PRD_XE5,10004890,2115-12-15,SCREW_THIS,REP_XE5,ED5
prod5,PRD5,PRD_XE5,10004890,2015-05-15,REPAIRS5,REP_XE5,ED5
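A CSV line like the ones above can be produced from a block of tagged text with a short `-replace` chain. This is only a sketch under assumptions: the `$block` sample is a cut-down, hypothetical block, and the `id="..."` lookaround pattern is my own choice for the demo.

```powershell
# Hypothetical, shortened product block.
$block = @'
<html:prod id="prod5">
  <html:product>
    <html:productId>10004890</html:productId>
    <html:assignedDate>2015-05-15</html:assignedDate>
  </html:product>
</html:prod>
'@

# Lookbehind/lookahead keep the quotes and 'id=' out of the match,
# so $Matches[0] is only the attribute value.
$null = $block -match '(?<=id=")[^"]+(?=")'
$prod = $Matches[0]

# Turn every tag and every whitespace character into a comma,
# collapse runs of commas, then trim the leftovers at both ends.
$fields = (($block -replace '<[^>]+>|\s', ',') -replace ',{2,}', ',').Trim(',')

$csv = "$prod,$fields"
$csv   # prod5,10004890,2015-05-15
```

One caveat with this approach: a value that contains spaces would be split into several fields, because whitespace is treated as a separator.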
10004890.xml
--- Code: Text ---
<html:prod id="prod5">
<html:referenceData>
<html:product>
<html:classificationType>PRD5</html:classificationType>
<html:productType>PRD_XE5</html:productType>
<html:productId>10004890</html:productId>
<html:assignedDate>2115-12-15</html:assignedDate>
</html:product>
<html:book>
<html:name>SCREW_THIS</html:name>
<html:Entity>REP_XE5</html:legalEntity>
<html:location>ED5</html:location>
</html:book>
</html:referenceData>
</html:prod>
<html:prod id="prod5">
<html:referenceData>
<html:product>
<html:classificationType>PRD5</html:classificationType>
<html:productType>PRD_XE5</html:productType>
<html:productId>10004890</html:productId>
<html:assignedDate>2015-05-15</html:assignedDate>
</html:product>
<html:book>
<html:name>REPAIRS5</html:name>
<html:Entity>REP_XE5</html:legalEntity>
<html:location>ED5</html:location>
</html:book>
</html:referenceData>
</html:prod>
--- Code: PowerShell ---
<#
.NAME
    XML-GUI.ps1
#>

Add-Type -AssemblyName System.Windows.Forms
[System.Windows.Forms.Application]::EnableVisualStyles()

#region begin GUI{

$Form = New-Object system.Windows.Forms.Form
$Form.ClientSize = '246,178'
$Form.text = "XML Mulcher"
$Form.BackColor = "#cccccc"
$Form.TopMost = $false
$Form.FormBorderStyle = 'Fixed3D'
$Form.MaximizeBox = $false

$TextBox1 = New-Object system.Windows.Forms.TextBox
$TextBox1.Text = ""
$TextBox1.multiline = $false
$TextBox1.ReadOnly = $true
$TextBox1.Width = 185
$TextBox1.height = 20
$TextBox1.Location = New-Object System.Drawing.Point(16,20)
$TextBox1.Font = 'Microsoft Sans Serif,10'

$ListBox1 = New-Object system.Windows.Forms.ListBox
$ListBox1.text = ""
$ListBox1.width = 100
# The height value is unreadable in the post; 60 is an assumed placeholder.
$ListBox1.height = 60
@('Classification','ProductType','ProductID') | ForEach-Object {[void] $ListBox1.Items.Add($_)}
$ListBox1.location = New-Object System.Drawing.Point(16,50)

$Label1 = New-Object system.Windows.Forms.Label
$Label1.Text = "Processing:"
$Label1.width = 68
$Label1.height = 16
$Label1.location = New-Object System.Drawing.Point(16,146)
$Label1.Font = 'Microsoft Sans Serif,8'

$TextBox2 = New-Object system.Windows.Forms.TextBox
$TextBox2.multiline = $false
$TextBox2.ReadOnly = $true
$TextBox2.Width = 140
$TextBox2.height = 16
$TextBox2.Location = New-Object System.Drawing.Point(88,144)
$TextBox2.Font = 'Microsoft Sans Serif,8'

$Button1 = New-Object system.Windows.Forms.Button
$Button1.text = "Go"
$Button1.width = 60
$Button1.height = 30
$Button1.location = New-Object System.Drawing.Point(171,65)
$Button1.Font = 'Microsoft Sans Serif,10'

$Button2 = New-Object system.Windows.Forms.Button
$Button2.text = "..."
$Button2.width = 25
$Button2.height = 25
$Button2.location = New-Object System.Drawing.Point(206,19)
$Button2.Font = 'Microsoft Sans Serif,10'

$Label2 = New-Object system.Windows.Forms.Label
$Label2.Text = "Output:"
$Label2.width = 60
$Label2.height = 16
$Label2.location = New-Object System.Drawing.Point(16,120)
$Label2.Font = 'Microsoft Sans Serif,8'

$RadioButton1 = New-Object system.Windows.Forms.RadioButton
$RadioButton1.text = "XML"
$RadioButton1.AutoSize = $true
$RadioButton1.width = 40
$RadioButton1.height = 16
$RadioButton1.location = New-Object System.Drawing.Point(88,118)
$RadioButton1.Font = 'Microsoft Sans Serif,8'

$RadioButton2 = New-Object system.Windows.Forms.RadioButton
$RadioButton2.text = "CSV"
$RadioButton2.Checked = $true
$RadioButton2.AutoSize = $true
$RadioButton2.width = 40
$RadioButton2.height = 16
$RadioButton2.location = New-Object System.Drawing.Point(148,118)
$RadioButton2.Font = 'Microsoft Sans Serif,8'

$Form.controls.AddRange(@($ListBox1,$TextBox1,$Button1,$Button2,$Label1,$TextBox2,$Label2,$RadioButton1,$RadioButton2))

#region gui events {
# "Go": run only when a folder and a list entry have both been chosen.
$Button1.Add_Click({
    if ($TextBox1.Text -ne "") {
        if ($null -ne $ListBox1.SelectedItem) {
            Clear-Host
            Set-Regex ($ListBox1.SelectedItem)
        }
    }
})

# "...": pick the folder that contains the XML files.
$Button2.Add_Click({
    $objForm = New-Object System.Windows.Forms.FolderBrowserDialog
    $objForm.Description = "Select folder containing XML"
    $objForm.SelectedPath = [System.Environment+SpecialFolder]'MyComputer'
    $objForm.ShowNewFolderButton = $false
    $result = $objForm.ShowDialog()
    if ($result -eq "OK") {
        $TextBox1.Text = $objForm.SelectedPath
    } else {
        $TextBox1.Text = ""
    }
})
#endregion events }

#endregion GUI }

#Write your logic code here
# The regex bodies below appear blanked out (underscores) in the post as
# archived; they are left as they appear and must be filled in before use.
Function Set-Regex {
    param (
        [string]$selItem
    )
    switch ($selItem) {
        "Classification" { $regex = '(____________________________)(.+?)(___)' }
        "ProductType" { $regex = '(_____________________)(.+?)(___)' }
        "ProductID" { $regex = '(___________________)(.+?)(___)' }
    }
    Mulch-Files $regex
}

Function Mulch-Files {
    param (
        [string]$pattern
    )
    # @() keeps .Count and indexing valid when only one file is found.
    $files = @(Get-ChildItem -Path ($TextBox1.Text + "\*.xml"))
    for ($h = 0; $h -lt $files.Count; $h++) {
        $TextBox2.Text = $files[$h].Name
        # Split on a captured group so each product block lands at an odd index.
        $products = (Get-Content $files[$h].FullName -Raw) -split '(____)^.*?(____________________________)'
        for ($i = 1; $i -lt $products.Count; $i += 2) {
            # Grab the block's id; $null = suppresses the True/False output.
            $null = $products[$i] -match '(_________)(.+?)(___)'
            $prod = $Matches[0]
            # Match the selected field; $Matches[0] then names the output file.
            $temp = $products[$i] -match $pattern
            for ($j = 0; $j -lt $temp.Count; $j++) {
                if ($RadioButton2.Checked) {
                    $outFile = $Matches[0] + ".csv"
                    # Tags and whitespace become commas, runs of commas collapse to one.
                    $outText = ($prod + (((($products[$i] -replace '(<[^>]+>|\s)', ',' ) -replace "`r", '') -replace "`n", '') -replace '(,)(,)+', '$1').TrimEnd(','))
                } else {
                    $outFile = $Matches[0] + ".xml"
                    $outText = $products[$i]
                }
                Out-File -FilePath $outFile -InputObject $outText -Append
            }
        }
    }
    $TextBox2.Text = "Finished"
}

[void]$Form.ShowDialog()
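The inner loop above steps through `$products` from index 1 in increments of 2. That fits how `-split` behaves when its pattern contains a capture group: the captured delimiter text is kept in the result, between the pieces it separated. A minimal sketch with a made-up string and a made-up `<p>` pattern (not the pattern from the script):

```powershell
# Hypothetical input with two "blocks" surrounded by other text.
$raw = 'junk<p>one</p>noise<p>two</p>tail'

# Because the delimiter pattern is wrapped in a capture group, -split
# returns the delimiters too:
#   'junk', '<p>one</p>', 'noise', '<p>two</p>', 'tail'
$parts = $raw -split '(<p>.*?</p>)'

# The captured blocks therefore sit at the odd indexes 1, 3, 5, ...
$blocks = for ($i = 1; $i -lt $parts.Count; $i += 2) { $parts[$i] }

$blocks
```

If the input starts with a block, the first element is an empty string, so the blocks still land on the odd indexes.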