Show Posts

151

General Software Discussion / Is it possible to group names by fuzzy logic?

« on: September 26, 2018, 11:07 AM »

Hello!

My data has several client names. Very often, a single client group appears under several different names.

For example, Batavia Insurance, Batavia Fund, etc. probably belong to the Batavia group.

Matching the first word would work in most cases, so how could I write an Excel function to group/count entries whose cells start with the same first word?

But it is not 100% safe, e.g. when you have PT Batavia Company, the prefix kills the algorithm.

I know it will never be 100% accurate, but do you have any idea how to approach this?

Thanks!
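
One possible approach, sketched in Python rather than as an Excel formula so the logic is easier to follow: skip a small set of known legal-form prefixes, then group by the first remaining word. The prefix list and the function names are hypothetical placeholders, not taken from any real data set, and the result will still need a manual check.

```python
from collections import defaultdict

# Hypothetical list of legal-form prefixes to ignore; extend it for your data.
IGNORED_PREFIXES = {"pt", "cv", "ltd", "inc"}

def group_key(name):
    """Return the first word that is not an ignored prefix, lower-cased."""
    words = [w.strip(".,").lower() for w in name.split()]
    for word in words:
        if word and word not in IGNORED_PREFIXES:
            return word
    return ""  # name was empty or consisted only of prefixes

def group_names(names):
    """Map each group key to the list of original names that share it."""
    groups = defaultdict(list)
    for name in names:
        groups[group_key(name)].append(name)
    return dict(groups)

clients = ["Batavia Insurance", "Batavia Fund", "PT Batavia Company", "Acme Ltd"]
for key, members in group_names(clients).items():
    print(key, len(members), members)
```

The same idea carries over to a spreadsheet: compute the key in a helper column, then count or pivot on that column.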

152

Living Room / Re: How to model this?

« on: September 26, 2018, 03:15 AM »

Oh damn, I forgot I am talking about EXCEL lol

153

Living Room / How to model this?

« on: September 24, 2018, 04:56 PM »

Hello,

I want to model the processing of some cases: I know the processing time of each case and the number of employees, so I can find the end date.

However, depending on the deadline, the cases may need to be reprocessed every fixed number of months, so the total number of cases may increase depending on the end date.

How can I model this? I find it difficult because the number of cases to be reprocessed affects the end date, but the end date also affects the number of cases to be reprocessed!

Any idea?

Thanks
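
A circular dependency like this can often be resolved by iterating until the numbers stop changing (a fixed-point iteration). The sketch below is only an illustration under simplifying assumptions that are not in the post: every case is reprocessed once per full interval that elapses before the end date, a reprocessing pass has its own (hypothetical) cost in hours, and capacity is expressed as total staff hours per month.

```python
import math

def solve_end_date(cases, hours_first_pass, hours_reprocess,
                   monthly_capacity_hours, reprocess_every_months,
                   max_iterations=100):
    """Return (duration_in_months, reprocessing_rounds) once they agree."""
    rounds = 0  # start by assuming no reprocessing is needed
    for _ in range(max_iterations):
        # Workload implied by the current guess for the number of rounds.
        total_hours = cases * (hours_first_pass + rounds * hours_reprocess)
        duration = total_hours / monthly_capacity_hours
        # Number of rounds implied by that duration.
        new_rounds = math.floor(duration / reprocess_every_months)
        if new_rounds == rounds:
            return duration, rounds  # guess and result agree: stable answer
        rounds = new_rounds
    raise RuntimeError("No stable end date: reprocessing outpaces capacity")

# 1000 cases, 2 h each, 0.5 h per reprocessing pass,
# 800 staff hours per month, reprocessing every 2 months.
print(solve_end_date(1000, 2, 0.5, 800, 2))
```

In a spreadsheet the same loop can be laid out as successive rows, each row feeding its end date into the next row's case count, until two rows match.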

154

General Software Discussion / Re: Extract REGEX matches from multiple text files

« on: September 18, 2018, 09:21 AM »

What's the difference?

You either have 3 lines that say:

3 Product1
1 Product2
1 Product3

Or three files that contain lines that say:

File "Product1.txt"
Product1
Product1
Product1

File "Product2.txt"
Product2

File "Product3.txt"
Product3

Either way all you're getting is a count of how many times a match appears.
-4wd (September 18, 2018, 09:06 AM)

No, it's not the same, because the regex will be different! And I want to store the whole regex match in the file, which will be a huge multiline text!

155

General Software Discussion / Re: Extract REGEX matches from multiple text files

« on: September 18, 2018, 07:56 AM »

gci FILEPATH\out.txt|group|select Count,Name >FILEPATH\out-counted.txt
-Ath (September 17, 2018, 12:59 PM)

No, you misunderstood. I don't want to count matches. I want to group them and write each group to a separate file.

For example, I will search for my regex:
<html:producttype>(.+?)</html:producttype>
The possible matches will be:
<html:producttype>Product1</html:producttype>
<html:producttype>Product1</html:producttype>
<html:producttype>Product2</html:producttype>
<html:producttype>Product1</html:producttype>
<html:producttype>Product3</html:producttype>
etc

I want the script to create one file with the matches where the (.+?) is the same, so:
1 file that contains:
<html:producttype>Product1</html:producttype>
<html:producttype>Product1</html:producttype>
<html:producttype>Product1</html:producttype>
1 file that contains:
<html:producttype>Product2</html:producttype>
and 1 file that contains:
<html:producttype>Product3</html:producttype>

Thanks!
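
The strategy can be sketched as: find every match, use the captured (.+?) part as a dictionary key, collect the full matches under that key, then write one file per key. This is a minimal Python sketch, not the PowerShell command from the thread; the function names and the output folder are assumptions made for illustration.

```python
import re
from collections import defaultdict
from pathlib import Path

# Add re.DOTALL if the captured part may span several lines.
PATTERN = re.compile(r"<html:producttype>(.+?)</html:producttype>")

def group_matches(text):
    """Map each captured value to the list of full matches that contain it."""
    groups = defaultdict(list)
    for match in PATTERN.finditer(text):
        groups[match.group(1)].append(match.group(0))
    return dict(groups)

def write_groups(groups, out_dir):
    """Write one text file per captured value into out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for key, matches in groups.items():
        # Keep only characters that are safe in a file name.
        safe_name = re.sub(r"[^A-Za-z0-9_.-]", "_", key)
        target = out_dir / f"{safe_name}.txt"
        target.write_text("\n".join(matches) + "\n", encoding="utf-8")

sample = (
    "<html:producttype>Product1</html:producttype>\n"
    "<html:producttype>Product1</html:producttype>\n"
    "<html:producttype>Product2</html:producttype>\n"
    "<html:producttype>Product1</html:producttype>\n"
    "<html:producttype>Product3</html:producttype>\n"
)
groups = group_matches(sample)
print({key: len(matches) for key, matches in groups.items()})
```

Calling write_groups(groups, "out") would then create Product1.txt, Product2.txt and Product3.txt in a folder named out.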

156

General Software Discussion / Re: Extract REGEX matches from multiple text files

« on: September 17, 2018, 05:03 AM »

Guys, after I search for regex matches in a text, how can I group the matches into separate files, by the value captured inside each regex match?

For example, for every regex match <html:producttype>(.+?)</html:producttype>, I want to output to a separate file all the matches where the (.+?) is the same.

Any idea? Also, please explain the strategy/pseudocode to see how that would work.

157

General Software Discussion / Re: Big Data tools

« on: September 14, 2018, 12:36 PM »

Is there a 'free for commercial use' software that can open large text files (20GB) and process them with regex?
-kalos (September 14, 2018, 09:47 AM)

TL;DR; Yes.

Suggestions:
Any Linux distro with a non-commercial license (most will fit)
Linux-tools for Windows (if it's supposed to run on Windows, you didn't say it should)
In any of these environments use tools like grep, sed, awk, perl, python etc. for text processing.

'Plain' Windows (7 and newer)
Use powershell, like shown & explained in your other thread.

-Ath (September 14, 2018, 10:44 AM)

Thanks, but I also want to be able to view the content, like in EmEditor.

158

Living Room / Re: Looking for smartphone

« on: September 14, 2018, 12:35 PM »

What are the top 3-5 cheapest mobiles with NFC and >5000mAh?
Any idea?
-kalos (August 22, 2018, 10:11 AM)

Any suggestion for the best phone with the longest battery life and NFC?

159

General Software Discussion / Big Data tools

« on: September 14, 2018, 09:47 AM »

Hello!
Is there a 'free for commercial use' software that can open large text files (20GB) and process them with regex?
Thanks!
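
For the processing half of the question, a file of that size does not have to be opened in full: it can be scanned line by line so that memory use stays small. A minimal Python sketch, assuming each match fits on one line; the function name and the sample pattern are placeholders.

```python
import re

def iter_matches(lines, pattern):
    """Yield (line_number, captured_text) for every match, one line at a time."""
    regex = re.compile(pattern)
    for line_number, line in enumerate(lines, start=1):
        for match in regex.finditer(line):
            yield line_number, match.group(1)

# For a real file, pass the open file handle instead of a list; Python then
# reads it lazily, line by line:
#     with open("huge.txt", encoding="utf-8", errors="replace") as handle:
#         for number, value in iter_matches(handle, r"<t>(.+?)</t>"):
#             print(number, value)
demo_lines = ["a <t>1</t>", "nothing here", "<t>2</t><t>3</t>"]
print(list(iter_matches(demo_lines, r"<t>(.+?)</t>")))
```

This does not cover the viewing side; it only shows that the regex pass itself does not need the whole file in memory.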

160

General Software Discussion / Re: Extract REGEX matches from multiple text files

« on: September 13, 2018, 05:10 AM »

What could be the problem?
-kalos (September 12, 2018, 05:20 AM)
You haven't shared the file, so we'll never know, unless...
-Ath (September 12, 2018, 10:10 AM)

I made it work like this:
gci FILEPATH | sls -AllMatches '<html:productType>(.+?)<\/html:productType>' | % { $_.Matches } | % { $_.Groups[1].Value } >> FILEPATH\out.txt

But I don't know how I made it work lol, can you spot the error? Also, I know I asked before, but can you point me to somewhere that explains % { $_.Matches } | % { $_.Groups[1].Value } ?
I think % means 'for every' and $_.Matches is the object variable of the matches, while $_.Groups[1].Value is the content value of the matches objects, right? But what is [1]?

UPDATE: it seems both work, but which would be better?
Thanks!
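
On the [1] question: in PowerShell, % is an alias for ForEach-Object and $_ is the current item in the pipeline. Each regex match carries a list of groups, where index 0 is the whole matched text and index 1 is whatever the first pair of parentheses captured. The same numbering exists in Python, shown here as a small illustration (the sample line is made up):

```python
import re

line = "<html:productType>PRD_XE</html:productType>"
pattern = re.compile(r"<html:productType>(.+?)</html:productType>")

match = pattern.search(line)
whole_match = match.group(0)   # entire matched text, tags included
first_group = match.group(1)   # only what the first (...) captured

print(whole_match)
print(first_group)
```

So Groups[1].Value in the command above is the text between the tags, while Groups[0].Value would be the match including the tags.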

161

General Software Discussion / Re: Extract REGEX matches from multiple text files

« on: September 12, 2018, 05:20 AM »

Any hint?
-kalos (September 12, 2018, 04:23 AM)
We've been here before: https://www.donation....msg422274#msg422274
-Ath (September 12, 2018, 05:07 AM)

Ah great thanks!

I tested it and there is an issue. I searched the file and there is only one instance of <html:productType>(.+?)</html:productType>
However, the output file contains the captured (.+?) value twice. What could be the problem?

Thanks!

gci C:\XML.xml | % { sls $_.Name -Pattern '<html:productType>(.+?)<\/html:productType>' -a | % { $_.Matches } | % { $_.Groups[1].Value } >> C:\out.txt }

162

General Software Discussion / Re: Extract REGEX matches from multiple text files

« on: September 12, 2018, 04:23 AM »

Guys, can anyone tell me the command that will find all the regex matches, isolate a specific part of each regex match and output all of them in a file?

I have the regex, but I don't know how to indicate a part in it.
The regex is this: "<html:productType>(.+?)</html:productType>"
I used the parentheses to isolate the part of the regex that I want to be output in the file.

What should the whole command look like?

I found online and wrote this:
[regex]::match($s,"<html:productType>(.+?)</html:productType>").Groups[1].Value
But I don't know where you specify the source text or if it is correct. Any hint?

Thanks!

PS: It is really a nightmare to do some simple stuff in Powershell. The documentation is very poor and incomplete. Do you think there could be any other solution? Python maybe, or anything else? It needs to work with big data though, and a GUI would be nice. Also, it needs to be free for commercial and any other use.
-kalos (September 11, 2018, 11:08 AM)

Anyone please?

163

General Software Discussion / Re: Extract REGEX matches from multiple text files

« on: September 11, 2018, 11:08 AM »

Guys, can anyone tell me the command that will find all the regex matches, isolate a specific part of each regex match and output all of them in a file?

I have the regex, but I don't know how to indicate a part in it.
The regex is this: "<html:productType>(.+?)</html:productType>"
I used the parentheses to isolate the part of the regex that I want to be output in the file.

What should the whole command look like?

I found online and wrote this:
[regex]::match($s,"<html:productType>(.+?)</html:productType>").Groups[1].Value
But I don't know where you specify the source text or if it is correct. Any hint?

Thanks!

PS: It is really a nightmare to do some simple stuff in Powershell. The documentation is very poor and incomplete. Do you think there could be any other solution? Python maybe, or anything else? It needs to work with big data though, and a GUI would be nice. Also, it needs to be free for commercial and any other use.

164

Living Room / Label printer

« on: September 05, 2018, 04:37 PM »

Hi!

I want a very cheap and very compact solution to print return labels when I am returning goods bought online.

Is there anything like that?

A normal printer can be cheap but it is too big. Some portable printers are more compact but very expensive for the use I want.

It doesn't have to be A4, it can be much smaller!

Any idea?

Thanks!

165

Living Room / Re: Looking for smartphone

« on: August 22, 2018, 10:11 AM »

What are the top 3-5 cheapest mobiles with NFC and >5000mAh?
Any idea?

166

General Software Discussion / Re: Extract REGEX matches from multiple text files

« on: August 22, 2018, 03:55 AM »

Any idea why the below does not work?
-kalos (August 21, 2018, 09:42 AM)
As usual you are asking half questions without *any* documentation. And you still haven't answered all previous questions, as requested, (and even have asked new questions in the half-baked 'answer') so in my book, you're not yet ready to ask new questions.
-Ath (August 21, 2018, 12:26 PM)

But I need ad-hoc answers; it's not about a specific thing I am trying to achieve, but mostly about learning.

167

General Software Discussion / Re: Extract REGEX matches from multiple text files

« on: August 21, 2018, 09:42 AM »

Thanks, but it requires running as admin, which I cannot do.

Any idea why the below does not work?

(gc *.xml) -match '(?s)<\?xml\ version="1\.0"\ encoding="UTF-8"\?>.+?</dbts:PmryObj>'
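
A likely cause, stated as an assumption because the input files were not shared: gc (Get-Content) returns the file as an array of lines by default, so -match tests each line on its own and a pattern that has to span several lines never matches, even with (?s). Reading the file as one string (Get-Content has a -Raw switch for this in PowerShell 3 and later) avoids that. The Python sketch below shows the same effect on made-up sample text; note that (?s) sits inside the quotes, at the very start of the pattern.

```python
import re

# Made-up sample standing in for one XML file.
text = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<dbts:PmryObj>\n"
    "<a>1</a>\n"
    "</dbts:PmryObj>\n"
)

# (?s) makes . match newlines too; it goes at the start of the pattern string.
pattern = re.compile(
    r'(?s)<\?xml version="1\.0" encoding="UTF-8"\?>.+?</dbts:PmryObj>'
)

# Testing line by line: no single line holds both the start and the end.
per_line_hits = [line for line in text.splitlines() if pattern.search(line)]

# Testing the whole text as one string: the multi-line match is found.
whole_text_hits = pattern.findall(text)

print(len(per_line_hits), len(whole_text_hits))
```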

168

General Software Discussion / Re: Extract REGEX matches from multiple text files

« on: August 21, 2018, 08:43 AM »

Also, I still have not figured out how to make . in a Powershell regex match any character, including newline... Any hint?
-kalos (August 20, 2018, 09:36 AM)

Learn to use Powershell's built-in help system:
Code: PowerShell
Get-Help about_comparison_operators

Learn to use Google:
http://lmgtfy.com/?q...er+including+newline
http://lmgtfy.com/?q...egex+match+multiline
http://bfy.tw/JV48
-4wd (August 20, 2018, 07:15 PM)

I tried that, but it is not clear where (?s) goes:
at the beginning of the regex, after the '
at the beginning of the regex, before the '
or just before the .

Any idea?

169

General Software Discussion / Re: Extract REGEX matches from multiple text files

« on: August 21, 2018, 08:31 AM »

But I cannot see OR in the list of operators.
-kalos (August 21, 2018, 04:02 AM)

Code: PowerShell
Get-Help about_Logical_Operators
-4wd (August 21, 2018, 07:49 AM)

I did that, but I get this:

PS H:\> Get-Help about_Logical_Operators
Get-Help : Get-Help could not find about_Logical_Operators in a help file in this session. To download updated help
topics type: "Update-Help". To get help online, search for the help topic in the TechNet library at
http://go.microsoft.com/fwlink/?LinkID=107116.
At line:1 char:1
+ Get-Help about_Logical_Operators
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : ResourceUnavailable: (:) [Get-Help], HelpNotFoundException
+ FullyQualifiedErrorId : HelpNotFound,Microsoft.PowerShell.Commands.GetHelpCommand

170

General Software Discussion / Re: Extract REGEX matches from multiple text files

« on: August 21, 2018, 04:02 AM »

Also, I still have not figured out how to make . in a Powershell regex match any character, including newline... Any hint?
-kalos (August 20, 2018, 09:36 AM)

Learn to use Powershell's built-in help system:
Code: PowerShell
Get-Help about_comparison_operators

Learn to use Google:
http://lmgtfy.com/?q...er+including+newline
http://lmgtfy.com/?q...egex+match+multiline
http://bfy.tw/JV48
-4wd (August 20, 2018, 07:15 PM)

But I cannot see OR in the list of operators.

171

General Software Discussion / Re: Extract REGEX matches from multiple text files

« on: August 20, 2018, 09:36 AM »

How can I do that?
-kalos (August 20, 2018, 08:37 AM)
That's why we asked more specific questions, but you never answered them.
So then I gave you the assignment of answering all our unanswered questions, but you haven't done that up until now, so basically, we are waiting (but not holding our breath) for your answers, before accepting new questions.
-Ath (August 20, 2018, 08:59 AM)

OK, I'll start again:

That finally makes some sense. Here is an example solution for putting that into a .csv formatted file.
-Ath (August 13, 2018, 02:05 PM)

The problem with that is that I do not always know the node tree hierarchy and also it may change per record! That's why I cannot use the node tree hierarchy to extract a value, but I can use a guess of it, if that helps, eg //NODE1/*/NODE3/ ?

It doesn't even look a teensy bit like this new data you've given just now, are you playing us?
-Ath (August 13, 2018, 02:05 PM)

It looks the same to me??? Only the attribute names and values change. But again, the records do not contain the same attributes and in the same order. There can be some basic rules that all records follow, but unfortunately the data structure is not consistent, that's why I want to use regex, to include some fuzziness in matching!

Also, I still have not figured out how to make . in a Powershell regex match any character, including newline... Any hint?

172

General Software Discussion / Re: Extract REGEX matches from multiple text files

« on: August 20, 2018, 08:37 AM »

I don't understand why you do not answer my specific questions, regardless of the source data format and the desired output. Is what I am asking not possible with Powershell?

For example, I want to perform a regex match that will output all matches of regex1 and regex2 and regex3.

How can I do that?
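
Inside a regex, OR is written with the | alternation operator, so several patterns can be joined into one and run in a single pass. A minimal Python sketch, with three tag patterns standing in for regex1, regex2 and regex3 (the group names type, id and date are hypothetical labels chosen here):

```python
import re

# Three alternatives joined with |; each has its own named capture group.
combined = re.compile(
    r"<html:productType>(?P<type>.+?)</html:productType>"
    r"|<html:productId>(?P<id>.+?)</html:productId>"
    r"|<html:assignedDate>(?P<date>.+?)</html:assignedDate>"
)

text = (
    "<html:productType>PRD_XE</html:productType>\n"
    "<html:productId>10004</html:productId>\n"
    "<html:assignedDate>2018-07-23</html:assignedDate>\n"
)

# lastgroup names the alternative that matched; results keep the text order.
results = [(m.lastgroup, m.group(m.lastgroup)) for m in combined.finditer(text)]
print(results)
```

Because the matches come back in the order they appear in the text, the order of the alternatives in the pattern does not matter.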

173

General Software Discussion / Re: Extract REGEX matches from multiple text files

« on: August 13, 2018, 10:47 AM »

And one last question... when you say duplicate, you mean the whole record is duplicated? Or just some of the fields, i.e. productID or prod id?
-wraith808 (August 13, 2018, 10:05 AM)

Some fields, eg there may be more than one assignedDate value, so the script will need to process these additional fields for the same prod.

The pseudocode I am looking for is like this:
1) search for the first 'prod' section of the file, convert it to a single line, and extract the appropriate regex matches (all of them) one after the other (that's why I want to specify all the regexes that the script should search for when scanning the line, as I am not sure which order they will be in - it shouldn't change, but just in case)
2) then find the next 'prod' section in the file, convert it to a single line, put it on a line below the previous one, and extract the regexes one by one

Any hint?

I tried to use | to add OR regex matches, but I think it didn't work.

174

General Software Discussion / Re: Extract REGEX matches from multiple text files

« on: August 13, 2018, 08:45 AM »

So it's always xml and it's always that schema? And you're just worried about duplicates?
-wraith808 (August 13, 2018, 07:18 AM)

Yeah, for now it looks like that.

175

General Software Discussion / Re: Extract REGEX matches from multiple text files

« on: August 13, 2018, 05:03 AM »

OK, so the input is:
<html:products>
<html:prod id="prod1">
<html:referenceData>
<html:product>
<html:classificationType>PRD</html:classificationType>
<html:productType>PRD_XE</html:productType>
<html:productId>10004</html:productId>
<html:assignedDate>2018-07-23</html:assignedDate>
</html:product>
<html:book>
<html:name>REPAIRS</html:name>
<html:legalEntity>REP_XE</html:legalEntity>
<html:location>ED</html:location>
</html:book>
</html:referenceData>
</html:prod>

The above continues to prod2 etc.

The output of the data would be:
prod1; PRD; PRD_XE; 10004; 2018-07-23; REPAIRS; REP_XE; ED
Then a new line would start with:
prod2; etc

However, I want to convert the input data into a string, because I may need to match longer substrings than e.g. "<html:classificationType>(.+?)</html:classificationType>"
Also, I think there may be duplicates for each prod, e.g. more than one assignedDate node with different values, so MatchAll would be best.
thanks!
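
The transformation described above can be sketched as two regex passes over the whole text: one that cuts out each prod section, and one that collects every leaf value inside it, so repeated nodes such as a second assignedDate are picked up as well. This is a minimal Python sketch, not the PowerShell solution from the thread; the function name and the closing </html:products> tag in the sample are assumptions.

```python
import re

# One prod section: capture its id and everything up to its closing tag.
PROD_BLOCK = re.compile(r'<html:prod id="(.+?)">(.*?)</html:prod>', re.DOTALL)
# Any element that holds plain text (no nested tags) between open and close.
LEAF_VALUE = re.compile(r"<html:\w+>([^<>]+)</html:\w+>")

def to_rows(xml_text):
    """Return one 'id; value; value; ...' string per prod section."""
    rows = []
    for prod_id, body in PROD_BLOCK.findall(xml_text):
        values = [value.strip() for value in LEAF_VALUE.findall(body)]
        rows.append("; ".join([prod_id] + values))
    return rows

sample = """<html:products>
<html:prod id="prod1">
<html:referenceData>
<html:product>
<html:classificationType>PRD</html:classificationType>
<html:productType>PRD_XE</html:productType>
<html:productId>10004</html:productId>
<html:assignedDate>2018-07-23</html:assignedDate>
</html:product>
<html:book>
<html:name>REPAIRS</html:name>
<html:legalEntity>REP_XE</html:legalEntity>
<html:location>ED</html:location>
</html:book>
</html:referenceData>
</html:prod>
</html:products>"""

for row in to_rows(sample):
    print(row)
```

Since the leaf pattern does not name any tag, it keeps working when attributes appear in a different order or a node is repeated; the trade-off is that it cannot tell which column a value belongs to.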

Messages - kalos
