
DonationCoder.com Software > N.A.N.Y. 2019

NANY 2019: RegexCaptor - Simple app to extract email or other patterns from text


osensnolf:
I joined just to reply. I ran the program to extract emails from a 2.6GB file, but I keep getting this error:

External exception EEFFACE

This happens less than 10% of the way through my search.

Another option to add: allow me to turn off the Results Preview. For a large result set, populating it takes a lot of time when all I really want is the exported results.

How much $$ to get this resolved?

I downloaded the version from the original link. I am a Windows 7 user; is there a better option I should use if my only goal is to extract emails from several large files?

Thanks
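For the narrower goal of pulling addresses out of several multi-gigabyte files with no preview at all, a streaming approach avoids holding results on screen or in memory. Below is a minimal Python sketch of that idea. It is not RegexCaptor's implementation, and the names `extract_stream`, `extract_file` and the simple `EMAIL_RE` pattern are hypothetical. It reads one line at a time, writes each match straight to the output file, and optionally reports progress in the same "Found N results. Scanning line M" form the program shows.

```python
from __future__ import annotations

import re
import sys
from typing import Iterable, TextIO

# Hypothetical, deliberately simple email pattern (an assumption for this
# sketch, not RegexCaptor's built-in expression).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_stream(lines: Iterable[str], out: TextIO,
                   pattern: re.Pattern = EMAIL_RE,
                   progress_every: int = 0) -> tuple[int, int]:
    """Write every match found in `lines` to `out`, one per line.

    Only the current line is held in memory; matches are never collected
    into a list, so there is no preview that grows with the result count.
    Returns (results_found, lines_scanned).
    """
    found = 0
    scanned = 0
    for scanned, line in enumerate(lines, start=1):
        for match in pattern.finditer(line):
            out.write(match.group(0) + "\n")
            found += 1
        if progress_every and scanned % progress_every == 0:
            # Progress goes to stderr so it never mixes with the exported results.
            print(f"Found {found} results. Scanning line {scanned}", file=sys.stderr)
    return found, scanned


def extract_file(src_path: str, dst_path: str) -> tuple[int, int]:
    """Stream one (possibly multi-GB) text file into an output file of matches."""
    # errors="replace" keeps a stray undecodable byte from aborting the scan.
    with open(src_path, "r", encoding="utf-8", errors="replace") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        return extract_stream(src, dst, progress_every=1_000_000)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python extract_emails.py input.txt emails.txt
    total, lines = extract_file(sys.argv[1], sys.argv[2])
    print(f"Done: {total} results from {lines} lines", file=sys.stderr)
```

Because each input file is processed independently, several large files can be handled by calling `extract_file` once per file. The sketch does not deduplicate, so an address that occurs many times is written once per occurrence.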

mouser:
Let me give it an update and see if I can reproduce the problem with a huge file.
An option to disable the preview is a good idea.

Let me ask: can you see roughly how many results it had found when it hit that error? That might give me a clue.

osensnolf:
The file is a 2.6GB .txt file containing 119m rows of data.

It stops at: Found 7968374 results. Scanning line 7970000

On row 7969999 (one line before that), I see the following email:
[email protected]

I removed that value, saved the file, and ran the test again, but it still stopped at the same place like clockwork.

What's odd is that I downloaded another (much slower) email extractor, and it too locks up at around the same point, though I do not know which row it is on. Storage and memory are not an issue.

I will now try with a larger file and then a smaller one.

UPDATE: with a 3.5GB file with 27m rows, same error.
Found 9792207 results. Scanning line 11800000

mouser:
I noticed when I last used it that it also seemed to be reporting an unusually high number of results, so it seems it's time for me to release an update.  I'll try to get one done in the next few days.

There's no excuse for it throwing that exception. It sounds like I will be able to reproduce and fix it if I just use a big enough input file. Stand by.

P.S. I think the default email regular expression in the program is not great; it might be nice to find a better one.
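As one possible direction for a stricter default, here is a hedged Python sketch of a pragmatic email pattern. It is an assumption, not the program's current expression, and it deliberately does not implement the full RFC 5322 address grammar. It rejects leading, trailing, or doubled dots in the local part, requires each domain label to start and end with a letter or digit (at most 63 characters), and requires an alphabetic top-level domain of at least two letters. The names `EMAIL_RE` and `find_emails` are hypothetical.

```python
from __future__ import annotations

import re

# A pragmatic approximation of email syntax (an assumption for this sketch);
# it does not implement the full RFC 5322 grammar.
EMAIL_RE = re.compile(
    r"(?<![A-Za-z0-9._%+-])"                        # don't start in the middle of an address
    r"[A-Za-z0-9_%+-]+(?:\.[A-Za-z0-9_%+-]+)*"      # local part: no leading, trailing or doubled dots
    r"@"
    r"(?:[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?\.)+"  # domain labels, no edge hyphens
    r"[A-Za-z]{2,}"                                 # alphabetic top-level domain
    r"(?![A-Za-z0-9-])"                             # don't stop partway through a label
)


def find_emails(text: str) -> list[str]:
    """Return every address-like substring of `text`, in order of appearance."""
    # All groups above are non-capturing, so findall returns whole matches.
    return EMAIL_RE.findall(text)


if __name__ == "__main__":
    print(find_emails("Contact foo.bar@example.com. Not: bad..dots@example.com"))
```

Any such pattern trades recall for precision: unusual but valid addresses, such as quoted local parts or internationalized domain names, will be missed.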

osensnolf:
If you need me to test it, you can send it to me by PM. I will pay once it is confirmed to work, as I would like to have it ASAP.

Other than knowing what they are, I do not know anything about regular expressions, so I'll avoid pasting something I find online; I'm sure you will be able to find a better solution faster than I could.

But yes, disabling the preview is a must. Even with a smaller file, more time is spent populating the preview than actually finding the results.

Thank you!
