  • September 21, 2018, 09:51 PM



Messages - kalos

1
What's the difference?

You either have 3 lines that say:

3  Product1
1  Product2
1  Product3

Or three files that contain lines that say:

File "Product1.txt"
Product1
Product1
Product1

File "Product2.txt"
Product2

File "Product3.txt"
Product3

Either way all you're getting is a count of how many times a match appears.

No it's not the same, because the regex will be different! And I want to store the whole regex match in the file, which will be huge multiline text!

2
gci FILEPATH\out.txt|group|select Count,Name >FILEPATH\out-counted.txt

No you misunderstood. I don't want to count matches. I want to group them and output them in a separate file.

For example, I will search for my regex:
<html:producttype>(.+?)</html:producttype>
The possible matches will be:
<html:producttype>Product1</html:producttype>
<html:producttype>Product1</html:producttype>
<html:producttype>Product2</html:producttype>
<html:producttype>Product1</html:producttype>
<html:producttype>Product3</html:producttype>
etc

I want the script to create one file with the matches where the (.+?) is the same, so:
1 file that contains:
<html:producttype>Product1</html:producttype>
<html:producttype>Product1</html:producttype>
<html:producttype>Product1</html:producttype>
1 file that contains:
<html:producttype>Product2</html:producttype>
and 1 file that contains:
<html:producttype>Product3</html:producttype>

Thanks!

3
Guys, after I search for regex matches in a text, how can I group the matches to separate files, by same reference inside the regex match?

For example, for every regex match <html:producttype>(.+?)</html:producttype>, I want to output to a separate file all the matches where the (.+?) is the same.

Any idea? Also, please explain the strategy/pseudocode to see how that would work.
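One possible strategy, sketched in Python rather than PowerShell: scan the text once, use the captured `(.+?)` value as a dictionary key, collect the full matches under that key, then write one file per key. The function names, the sample text and the filename-sanitising rule are assumptions made for illustration only.

```python
import re
from collections import defaultdict
from pathlib import Path

# DOTALL lets the captured part span several lines, as the post says it may.
PATTERN = re.compile(r"<html:producttype>(.+?)</html:producttype>", re.DOTALL)

sample = (
    "<html:producttype>Product1</html:producttype>\n"
    "<html:producttype>Product1</html:producttype>\n"
    "<html:producttype>Product2</html:producttype>\n"
    "<html:producttype>Product1</html:producttype>\n"
    "<html:producttype>Product3</html:producttype>\n"
)

def group_matches(text):
    """Map each captured value to the list of full matches that contain it."""
    groups = defaultdict(list)
    for m in PATTERN.finditer(text):
        groups[m.group(1)].append(m.group(0))
    return dict(groups)

def write_groups(groups, out_dir):
    """Write one file per captured value and return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for key, matches in groups.items():
        # Hypothetical naming rule: replace characters that are unsafe in filenames.
        safe = re.sub(r"[^A-Za-z0-9._-]", "_", key)
        path = out_dir / f"{safe}.txt"
        path.write_text("\n".join(matches) + "\n", encoding="utf-8")
        paths.append(path)
    return paths
```

Calling `write_groups(group_matches(sample), "out")` should leave three files in `out`, one per distinct product type. Two different captured values could sanitise to the same filename, so a real script would need a collision check.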

4
General Software Discussion / Re: Big Data tools
« on: September 14, 2018, 12:36 PM »
Is there a 'free for commercial use' software that opens and processes with regex large text files of 20GB?

TL;DR; Yes.

Suggestions:
  • Any Linux distro with a non-commercial license (most will fit)
  • Linux-tools for Windows (if it's supposed to run on Windows, you didn't say it should)
In any of these environments use tools like grep, sed, awk, perl, python etc. for text processing.

  • 'Plain' Windows (7 and newer)
Use powershell, like shown & explained in your other thread.


Thanks but I want to also be able to view the content, like EmEditor

5
Living Room / Re: Looking for smartphone
« on: September 14, 2018, 12:35 PM »
What are the top 3-5 cheapest mobiles with NFC and >5000 mAh?
Any idea?

Any suggestion for the best phone with longest battery and NFC?

6
General Software Discussion / Big Data tools
« on: September 14, 2018, 09:47 AM »
Hello!
Is there a 'free for commercial use' software that opens and processes with regex large text files of 20GB?
Thanks!

7
What could be the problem?
You haven't shared the file, so we'll never know, unless...

I made it work like that:
gci FILEPATH | sls -AllMatches '<html:productType>(.+?)<\/html:productType>' | % { $_.Matches } | % { $_.Groups[1].Value } >> FILEPATH\out.txt

But I don't know how I made it work lol, can you spot the error? Also, I know I asked before, but can you point me to somewhere that explains % { $_.Matches } | % { $_.Groups[1].Value } ?
I think % means 'for every' and $_.Matches is the object variable of the matches, while $_.Groups[1].Value is the content value of the matches objects, right? But what is [1]?

UPDATE: it seems both work, but which would be better?
Thanks!
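On the `[1]` question: in PowerShell, `%` is an alias for `ForEach-Object`, and in .NET regex a match's `Groups[0]` is the whole matched text while `Groups[1]` is what the first pair of parentheses captured. Python numbers groups the same way, so a small Python sketch (the sample line is taken from the thread's data) shows the difference:

```python
import re

line = "<html:productType>PRD_XE</html:productType>"
m = re.search(r"<html:productType>(.+?)</html:productType>", line)

whole = m.group(0)  # the entire match, tags included (like Groups[0])
inner = m.group(1)  # only the text captured by the first (...) (like Groups[1])

# "For every match, take group 1" over a text with several matches:
text = line + "\n" + "<html:productType>PRD_YY</html:productType>"
values = [mm.group(1) for mm in re.finditer(r"<html:productType>(.+?)</html:productType>", text)]
```

Here `PRD_YY` is an invented second value, added only so the loop has more than one match to visit.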

8
Any hint?
We've been here before: http://www.donationc....msg422274#msg422274

Ah great thanks!

I tested it and there is an issue. I searched in the file and there is only one instance of <html:productType>(.+?)</html:productType>
However, the output file mentioned the above value (.+?) twice. What could be the problem?

Thanks!

gci C:\XML.xml | % { sls $_.Name -Pattern '<html:productType>(.+?)<\/html:productType>' -a | % { $_.Matches } | % { $_.Groups[1].Value } >> C:\out.txt }

9
Guys, can anyone tell me the command that will find all the regex matches, isolate a specific part of each regex match and output all of them in a file?

I have the regex, but I don't know how to indicate a part in it.
The regex is this: "<html:productType>(.+?)</html:productType>"
I used the parentheses to isolate the part of the regex that I want to be output in the file.

What should the whole command be?

I found online and wrote this:
[regex]::match($s,"<html:productType>(.+?)</html:productType>").Groups[1].Value
But I don't know where you specify the source text or if it is correct. Any hint?

Thanks!

PS: It is really a nightmare to do some simple stuff in Powershell. There is very poor and incomplete documentation. Do you think there could be any other solution? Python maybe or anything else? I need it to work with big data though and if it has GUI it would be nice. Also, it needs to be free for commercial and any use.


Anyone please?

10
Guys, can anyone tell me the command that will find all the regex matches, isolate a specific part of each regex match and output all of them in a file?

I have the regex, but I don't know how to indicate a part in it.
The regex is this: "<html:productType>(.+?)</html:productType>"
I used the parentheses to isolate the part of the regex that I want to be output in the file.

What should the whole command be?

I found online and wrote this:
[regex]::match($s,"<html:productType>(.+?)</html:productType>").Groups[1].Value
But I don't know where you specify the source text or if it is correct. Any hint?

Thanks!

PS: It is really a nightmare to do some simple stuff in Powershell. There is very poor and incomplete documentation. Do you think there could be any other solution? Python maybe or anything else? I need it to work with big data though and if it has GUI it would be nice. Also, it needs to be free for commercial and any use.
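Since the post asks whether Python could do this, here is a minimal sketch of the same task: the source text is simply the file being read, and the parentheses in the regex mark the part that gets written out. It assumes a match never spans a line break, which lets the file be streamed line by line instead of loaded whole (relevant for very large inputs). The function name and file paths are placeholders.

```python
import re

PATTERN = re.compile(r"<html:productType>(.+?)</html:productType>")

def extract_to_file(src_path, dst_path):
    """Write every captured value from src_path to dst_path, one per line.

    Returns the number of values written. Reads the source line by line,
    so memory use stays small even for multi-gigabyte files.
    """
    count = 0
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            # With exactly one group, findall returns the captured strings.
            for value in PATTERN.findall(line):
                dst.write(value + "\n")
                count += 1
    return count
```

A call would look like `extract_to_file("input.xml", "out.txt")`. If a value can span several lines, this streaming approach no longer applies and the text has to be matched in larger chunks.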

11
Living Room / Label printer
« on: September 05, 2018, 04:37 PM »
Hi!

I want a very cheap and very compact solution to print return labels when I am returning goods bought online.

Is there anything like that?

A normal printer can be cheap but it is too big. Some portable printers are more compact but very expensive for the use I want.

It doesn't have to be A4, it can be much smaller!

Any idea?

Thanks!

12
Living Room / Re: Looking for smartphone
« on: August 22, 2018, 10:11 AM »
What are the top 3-5 cheapest mobiles with NFC and >5000 mAh?
Any idea?

13
Any idea why the below does not work?
As usual you are asking half questions without *any* documentation. And you still haven't answered all previous questions, as requested, (and even have asked new questions in the half-baked 'answer') so in my book, you're not yet ready to ask new questions.

But I need ad-hoc answers, it's not about a specific thing I try to achieve, but mostly to learn

14
Thanks, but it needs me to run it as admin, which I cannot.

Any idea why the below does not work?

(gc *.xml) -match '(?s)<\?xml\ version="1\.0"\ encoding="UTF-8"\?>.+?</dbts:PmryObj>'
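One likely cause, offered as a guess since the file was not shared: `Get-Content` returns one string per line by default, and `-match` against an array tests each element separately, so a pattern that has to cross line breaks never sees more than one line at a time. (`Get-Content -Raw`, available from PowerShell 3.0, returns the file as a single string instead.) The same effect can be reproduced in Python with an invented four-line sample:

```python
import re

text = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<dbts:PmryObj>\n"
    "  <a>1</a>\n"
    "</dbts:PmryObj>\n"
)
pattern = re.compile(r'(?s)<\?xml version="1\.0" encoding="UTF-8"\?>.+?</dbts:PmryObj>')

# Line by line: no single line contains both the start and the end of the pattern.
per_line_hits = [ln for ln in text.splitlines() if pattern.search(ln)]

# Whole text as one string: (?s) lets . cross the newlines, so the block matches.
whole_hits = pattern.findall(text)
```

`per_line_hits` stays empty while `whole_hits` holds the full block, which is the difference between matching an array of lines and matching one joined string.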

15
Also, I still have not figured out how to make Powershell match . any character including newline... Any hint?

Learn to use Powershell's built-in help system:
Code: PowerShell
Get-Help about_comparison_operators

Learn to use Google:
http://lmgtfy.com/?q...er+including+newline
http://lmgtfy.com/?q...egex+match+multiline
http://bfy.tw/JV48

I tried that but it is not clear if (?s) goes to either:
at the beginning of the regex and after the '
at the beginning of the regex and before the '
just before .

Any idea?
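On where `(?s)` goes: it is part of the pattern text itself, so it sits inside the quotes, as the first thing in the regex. Python uses the same inline syntax, so a short sketch can show the placement next to two equivalent alternatives (the three-line sample string is invented):

```python
import re

text = "start\nmiddle\nend"

# Without the flag, . stops at a newline, so nothing matches.
no_flag = re.search(r"start.+end", text)

# Inline flag: inside the quotes, at the very beginning of the pattern.
inline = re.search(r"(?s)start.+end", text)

# Same effect, passed as an argument instead of written in the pattern.
by_arg = re.search(r"start.+end", text, re.DOTALL)

# Scoped form: the flag applies only inside this group.
scoped = re.search(r"start(?s:.+)end", text)
```

Only `no_flag` is `None`; the other three match the whole string.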

16
but I cannot see in the list of operators the OR  :tellme:

Code: PowerShell
Get-Help about_Logical_Operators

I did that, but I get this:

PS H:\> Get-Help about_Logical_Operators
Get-Help : Get-Help could not find about_Logical_Operators in a help file in this session. To download updated help
topics type: "Update-Help". To get help online, search for the help topic in the TechNet library at
http://go.microsoft.com/fwlink/?LinkID=107116.
At line:1 char:1
+ Get-Help about_Logical_Operators
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : ResourceUnavailable: (:) [Get-Help], HelpNotFoundException
    + FullyQualifiedErrorId : HelpNotFound,Microsoft.PowerShell.Commands.GetHelpCommand

17
Also, I still have not figured out how to make Powershell match . any character including newline... Any hint?

Learn to use Powershell's built-in help system:
Code: PowerShell
Get-Help about_comparison_operators

Learn to use Google:
http://lmgtfy.com/?q...er+including+newline
http://lmgtfy.com/?q...egex+match+multiline
http://bfy.tw/JV48

 :up: but I cannot see in the list of operators the OR  :tellme:

18
How can I do that?
That's why we asked more specific questions, but you never answered them.
So then I gave you the assignment of answering all our unanswered questions, but you haven't done that up until now, so basically, we are waiting (but not holding our breath) for your answers, before accepting new questions. :(


OK I start again:

That finally makes some sense. Here is an example solution for putting that into a .csv formatted file.

The problem with that is that I do not always know the node tree hierarchy and also it may change per record! That's why I cannot use the node tree hierarchy to extract a value, but I can use a guess of it, if that helps, eg //NODE1/*/NODE3/ ?

It doesn't even look a teensy bit like this new data you've given just now, are you playing us?

It looks the same to me??? Only the attribute names and values change. But again, the records do not contain the same attributes and in the same order. There can be some basic rules that all records follow, but unfortunately the data structure is not consistent, that's why I want to use regex, to include some fuzziness in matching!

Also, I still have not figured out how to make Powershell match . any character including newline... Any hint?

19
I don't understand why you do not answer my specific questions, regardless of the source data format and the desired output. Is what I am asking not possible to be done with Powershell?

For example, I want to perform a regex match that will output all matches of regex1 and regex2 and regex3.

How can I do that?
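Two ways this could be approached, sketched in Python: run each regex separately against the same original text, or join them into one pattern with alternation (`|`) and read the matches in document order. The three tag names come from the sample data elsewhere in the thread; the second `assignedDate` value is invented to show a duplicate field.

```python
import re

PATTERNS = {
    "productType": r"<html:productType>(.+?)</html:productType>",
    "productId": r"<html:productId>(.+?)</html:productId>",
    "assignedDate": r"<html:assignedDate>(.+?)</html:assignedDate>",
}

def match_all(text):
    """Run every pattern independently against the same original text."""
    return {name: re.findall(rx, text, re.DOTALL) for name, rx in PATTERNS.items()}

# Single pass: alternation picks whichever tag comes next; \1 requires the
# closing tag to repeat the name captured in the opening tag.
COMBINED = re.compile(
    r"<html:(productType|productId|assignedDate)>(.+?)</html:\1>", re.DOTALL
)

def match_in_order(text):
    """Return (tag, value) pairs in the order they appear in the text."""
    return [(m.group(1), m.group(2)) for m in COMBINED.finditer(text)]

sample = (
    "<html:productType>PRD_XE</html:productType>\n"
    "<html:productId>10004</html:productId>\n"
    "<html:assignedDate>2018-07-23</html:assignedDate>\n"
    "<html:assignedDate>2018-08-01</html:assignedDate>\n"  # invented duplicate
)
```

`match_all` groups results by pattern, so field order in the file does not matter; `match_in_order` preserves the order, which helps when the layout varies between records.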

20
And one last question... when you say duplicate, you mean the whole record is duplicated?  Or just some of the fields, i.e. productID or prod id?

Some fields, eg there may be more than one assignedDate value, so the script will need to process these additional fields for the same prod.

The pseudocode I am looking for is like this:
1) search for the first 'prod' section of the file, convert it to a single line, extract the appropriate regex (all matches) one after the other (that's why I want to specify all the regex matches that I want the script to search for when scanning the line, as I am not sure which order they will be in - it shouldn't change but just in case)
2) then find the next 'prod' section in the file, convert it to single line and put it in a line below the previous, then extract the regexes one by one

Any hint?

I tried to use ¦ to add OR regex matches, but I think it didn't work.

21
So it's always xml and it's always that schema?  And you're just worried about duplicates?


Yeah, for now it looks like that.

22
OK, so the input is:
<html:products>
    <html:prod id="prod1">
      <html:referenceData>
        <html:product>
          <html:classificationType>PRD</html:classificationType>
          <html:productType>PRD_XE</html:productType>
          <html:productId>10004</html:productId>
          <html:assignedDate>2018-07-23</html:assignedDate>
        </html:product>
        <html:book>
          <html:name>REPAIRS</html:name>
          <html:Entity>REP_XE</html:Entity>
          <html:location>ED</html:location>
        </html:book>
      </html:referenceData>
   </html:prod>

The above continues to prod2 etc.

The output of the data would be:
prod1; PRD; PRD_XE; 10004; 2018-07-23; REPAIRS; REP_XE; ED
Then a new line would start with:
prod2; etc


However, I want to convert the input data into a string, because I may need to match longer substrings than eg "<html:classificationType>(.+?)</html:classificationType>"
Also, I think there may be duplicates for each prod, e.g. more than one assignedDate node with different values, so MatchAll would be best.
thanks!
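A rough regex-based sketch of that transformation in Python, treating the input as one string: first cut out each `prod` section, then collect every leaf value inside it in document order. The two patterns and the function name are assumptions; duplicate fields (for example a second assignedDate) would simply appear as extra values in the row.

```python
import re

# One match per <html:prod id="...">...</html:prod> section; DOTALL spans lines.
PROD = re.compile(r'<html:prod id="([^"]+)">(.*?)</html:prod>', re.DOTALL)
# Leaf elements only: content without '<', immediately followed by a closing tag.
FIELD = re.compile(r"<html:(\w+)>([^<]+)</html:\w+>")

def to_rows(xml_text):
    """Return one 'id; value; value; ...' string per prod section."""
    rows = []
    for prod in PROD.finditer(xml_text):
        values = [value.strip() for _tag, value in FIELD.findall(prod.group(2))]
        rows.append("; ".join([prod.group(1)] + values))
    return rows

sample = """<html:products>
    <html:prod id="prod1">
      <html:referenceData>
        <html:product>
          <html:classificationType>PRD</html:classificationType>
          <html:productType>PRD_XE</html:productType>
          <html:productId>10004</html:productId>
          <html:assignedDate>2018-07-23</html:assignedDate>
        </html:product>
        <html:book>
          <html:name>REPAIRS</html:name>
          <html:Entity>REP_XE</html:Entity>
          <html:location>ED</html:location>
        </html:book>
      </html:referenceData>
   </html:prod>
"""
```

For this sample, `to_rows(sample)` yields the single row `prod1; PRD; PRD_XE; 10004; 2018-07-23; REPAIRS; REP_XE; ED`. Because values are taken in document order, records with missing or reordered fields would produce misaligned columns, so a stricter version would look fields up by tag name.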

23
2. He doesn't read what has already been given because the answer is in this thread.

PS. Sorry mouser ...  :-\

Oh sorry but due to my learning difficulty I need to be pinpointed to the exact thing.
I am developing my PS understanding though and it seems very powerful  :Thmbsup:

I wrote this script:
(gc *.txt)  -replace "regex1(.+?)", "`$1" >> out

It replaces the regex with the reference from the regex and outputs to a new file. Not exactly what I want it to do.
I want to output the reference, any idea how to do that?
Also, I want to run sequential several regex matches with their own references, one by one and append each result to the output file.

I believe piping commands does not achieve this. I think piping is about getting the output object from the previous command and feed it to the next command. However, I want the various regex matches to work on the original object sequentially. This is a bit tricky, any idea?


Also, can you tell me please how to find and select and append values from multiple xml nodes knowing their XPath?
I do that and it doesn't work:
Select-Xml -Path "*.xml" -XPath "/html:book/html:Entity" >> out
Also this doesn't work:
PS H:\> [xml]$Types = get-content *.xml
PS H:\> select-xml -xml $Types -xpath "//html:Entity"
select-xml : Namespace Manager or XsltContext needed. This query

Thanks!
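The error text suggests the `html:` prefix in the query is not bound to a namespace URI. XPath engines generally need a prefix-to-URI mapping passed alongside the query (`Select-Xml` has a `-Namespace` parameter that takes a hashtable for this). A Python sketch of the same idea using the standard library follows; the URI `http://example.com/html` is purely hypothetical and must be replaced by whatever the real file declares in its `xmlns:html` attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical URI: use the value of xmlns:html from the real document.
NS = {"html": "http://example.com/html"}

doc = """<html:products xmlns:html="http://example.com/html">
  <html:prod id="prod1">
    <html:book>
      <html:name>REPAIRS</html:name>
      <html:Entity>REP_XE</html:Entity>
    </html:book>
  </html:prod>
</html:products>"""

root = ET.fromstring(doc)

# Any Entity element at any depth, with the prefix resolved through NS.
entities = [e.text for e in root.findall(".//html:Entity", NS)]

# A partially known path: '*' stands for one unknown intermediate element.
via_wildcard = [e.text for e in root.findall(".//html:prod/*/html:Entity", NS)]
```

The wildcard form corresponds to the "guess of the hierarchy" idea mentioned earlier in the thread. If the source file never declares the `html` prefix at all, an XML parser will reject it and a regex approach is the fallback.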

24
Any idea why this does not work?
Get-Content *.xml | Out-String

I have a better idea, you tell us why you think it doesn't work.

gc *.xml -match *regex*
does not work :(

Code: PowerShell
Get-Help Get-Content

You tell us why it doesn't work.

Mmm I don't know why Get-Content *.xml | Out-String does not work to be honest.
I read at https://ss64.com/ps/out-string.html:
Send the content of Test1.txt to the console as a single string:
PS C:\> get-content C:\docs\test1.txt | out-string
So, shouldn't it work?

As for gc *.xml -match *regex* it seems that -match does not go after gc, but how do I connect/pipe these?
I had no clue, but I found in an irrelevant place on the internet this:
(Get-Content .\input.txt) -join "`r`n"
How should I know that I need to parenthesise the first object? Where does it say that in PS manual?

25
General Software Discussion / REST API
« on: August 09, 2018, 02:43 PM »
Hello!

At work, they are having difficulty sending a curl command to a server via a REST API in order to download some data.

Having used wget before, I said I thought this curl command would be easy, so now everyone expects me to do it.

Is there any tutorial that can enable me to do that? I have no clue.

Thanks!
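For orientation, a typical REST download is an HTTP GET with a couple of headers, roughly `curl -H "Authorization: Bearer TOKEN" -H "Accept: application/json" URL -o FILE`. A Python standard-library sketch of the same request is below. The URL, the token and the bearer-token scheme are all assumptions, since the post does not say how the server authenticates.

```python
import shutil
import urllib.request

def build_request(url, token):
    """Build a GET request carrying an auth header and an Accept header."""
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Accept": "application/json",
        },
        method="GET",
    )

def download(request, dest_path):
    """Send the request and stream the response body to dest_path."""
    with urllib.request.urlopen(request, timeout=30) as resp, \
         open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out)
```

Usage would look like `download(build_request("https://api.example.com/v1/data", "TOKEN"), "data.json")`, where both arguments are placeholders to be swapped for the real endpoint and credentials.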
