|App Version Reviewed||v184.108.40.206|
|Pricing Scheme||Free for personal use; $19.99 for commercial use|
|Author Donation Link||Donate to CodeByter (Carl Danley), the program author|
|Relationship btwn. Reviewer and Product|| This was a program I have wanted for my own use for quite some time. So I encouraged Carl to code it and pestered him the whole way with ideas and requests, and offered some occasional programming advice.|
LineByter is a utility designed to find and extract patterns from text files.
It's a brand new (free) program coded by DonationCoder member Carl Danley (CodeByter) and released today.
It includes some unique features, such as duplicate removal, the ability to specify multiple match and reject patterns, and the ability to save and load profiles, that make it ideal for repeated jobs like extracting emails or URLs from text files.
Motivation for the program
When we send out the DonationCoder mailing list, a certain number of the newsletter emails bounce back each month as undeliverable. I use phplist to manage the web mailing list, but lately I have been handling the bounces by hand: I export the bounced messages from my email program, run an email extraction utility on the export to get a list of email addresses, and then feed that list into a script that turns off email notification for the forum users whose addresses are found.
In the past I've been using a now-discontinued utility designed specifically to extract emails, but it's less than ideal. It's a bit clunky, it sometimes finds things that aren't emails, and it sometimes misses real emails. It also has a bad user interface and doesn't remove duplicates. After running this utility I would bring the output file into a text editor, sort it and remove duplicates, and then go through and remove certain emails, like those that are really donationcoder.com addresses and a few known fake email address patterns that seem to show up regularly.
So that's why I have wanted, for a while now, a little utility that is better at extracting emails and that automatically does some of the things I have been doing manually. Of course I could have written a little Perl or Python script for it, but I am a big fan of custom GUI tools for such things.
LineByter is the program that emerged from my discussions with Carl about this idea. It's actually a much more general purpose program that can extract and reject all kinds of regular expression patterns, but it's also designed to be really easy to use, and it's focused closely enough on the workflow I described above that it's a real joy to use for this kind of job.
Features
Some key features of the program:
- You can drag and drop as many files to scan as you want.
- Nice progress bar so you can see how much more time it's going to take.
- Supports preset library of regular expressions so you can easily just select common patterns and add your own presets -- this is super important for letting you quickly reuse patterns and makes it suitable even for those who don't understand regular expression syntax.
- Lets you specify a list of multiple patterns that are being searched for and how to extract the data you want from these patterns.
- Lets you specify a list of additional patterns which should be rejected even if they match the first list (ideal if you want to find and extract all email patterns except those with certain properties).
- Shows a nice complete report of why each pattern was found and/or rejected.
- Automatically removes duplicates.
- Produces a final list of results in text form that can be copied to clipboard or saved to file.
- Can save and load profiles so you can reuse configuration settings for common jobs you perform.
This screenshot shows the Patterns tab. It may look overwhelming, but basically I've just selected and added two presets from the drop-down list at the top to tell it to search for all email and URL patterns and extract them. I've also added some conditions for rejection, so that all .com and .org results will be excluded, and all results shorter than 5 characters will be rejected as well.
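To make that setup concrete, here is a rough Python sketch of the same match, reject, and de-duplicate workflow. It is only an illustration of the idea, not LineByter's code: the regular expressions, the `extract` function, and the rule labels are hypothetical stand-ins, and the program's real presets may well differ.

```python
import re

# Hypothetical stand-ins for the "email" and "url" presets; the real
# LineByter presets are not shown in this review and may differ.
MATCH_PATTERNS = {
    "email": re.compile(r"([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})"),
    "url": re.compile(r"(https?://\S+)"),
}

# Reject rules applied to each extracted result, mirroring the screenshot:
# drop anything ending in .com or .org, and anything under 5 characters.
REJECT_PATTERNS = {
    "ends in .com/.org": re.compile(r"\.(?:com|org)$", re.IGNORECASE),
    "shorter than 5 chars": re.compile(r"^.{0,4}$"),
}


def extract(text):
    """Return unique, non-rejected matches in order of first appearance."""
    seen = set()
    results = []
    for line in text.splitlines():
        for pattern in MATCH_PATTERNS.values():
            for match in pattern.finditer(line):
                value = match.group(1)  # the first capture group is the result
                if any(rx.search(value) for rx in REJECT_PATTERNS.values()):
                    continue  # matched a reject rule
                if value in seen:
                    continue  # duplicate of an earlier result
                seen.add(value)
                results.append(value)
    return results


sample = """\
Delivery failed for bob@example.net and alice@example.com
Retry: bob@example.net see http://example.org/bounces?id=7
Contact carol@example.co.uk or visit https://example.org
"""

for item in extract(sample):
    print(item)
```

In this sample, `alice@example.com` and `https://example.org` are dropped by the reject rule, and the second `bob@example.net` is dropped as a duplicate.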
This screenshot shows the "Chomping Grid" which basically gives you a full report of what is found and where, and why it was rejected if it was. Note that it uses the "labels" associated with the regular expression patterns on the previous screen, and shows you when duplicates are found. You can also sort by any of these columns. A planned feature for a future version will let you quickly jump to and investigate the found pattern.
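The kind of report this grid presents can be sketched as a list of records, one per match, each carrying the pattern label, the matched text, and what happened to it. This is a hypothetical Python illustration: the `Row` fields, the `chomp` function, and the simplified patterns are assumptions made for the example and say nothing about how LineByter works internally.

```python
import re
from dataclasses import dataclass

# Simplified, hypothetical patterns keyed by label (not LineByter's presets).
MATCHERS = {"email": re.compile(r"([\w.+-]+@[\w-]+(?:\.[\w-]+)+)")}
REJECTERS = {"donationcoder address": re.compile(r"@donationcoder\.com$", re.I)}


@dataclass
class Row:
    line_no: int  # 1-based line where the match was found
    label: str    # label of the pattern that matched
    value: str    # text of the first capture group
    status: str   # "kept", "rejected", or "duplicate"
    reason: str   # label of the reject rule that fired, if any


def chomp(text):
    """Build one report row per match, recording why it was kept or dropped."""
    rows, seen = [], set()
    for line_no, line in enumerate(text.splitlines(), start=1):
        for label, pattern in MATCHERS.items():
            for match in pattern.finditer(line):
                value = match.group(1)
                # Label of the first reject rule that matches, or "" if none.
                reason = next(
                    (name for name, rx in REJECTERS.items() if rx.search(value)),
                    "",
                )
                if reason:
                    status = "rejected"
                elif value in seen:
                    status = "duplicate"
                else:
                    status = "kept"
                    seen.add(value)
                rows.append(Row(line_no, label, value, status, reason))
    return rows


report = chomp("to: a@example.net\ncc: admin@donationcoder.com\nto: a@example.net\n")

# Sorting by any column is just a sort on the corresponding field.
for row in sorted(report, key=lambda r: r.status):
    print(row.line_no, row.label, row.value, row.status, row.reason)
```

Keeping rejected and duplicate matches in the report, rather than silently discarding them, is what makes it possible to see why a given result did or did not reach the final list.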
This screenshot shows the final tab collecting all of the matches that weren't rejected. It's updated live like the other tabs (don't be concerned that my screenshot is showing email addresses -- this is from a scan of my spam folder so these are all spam addresses). The patterns extracted are the first "capture group" from the regular expression matched.
Summary
This program far exceeded my expectations and desires. It is more generally useful than the program I initially wanted, but the simple interface and the use of presets and saved profiles mean that I can use it to extract emails from a file with just one or two clicks, and once I save a profile with the patterns I want matched and rejected, I never have to mess with those settings again.
If you are looking for a completely general purpose regular expression searching tool, this program is not for you; there are more powerful and flexible grep tools available. This program is aimed squarely at people who perform repeated extraction of data from files, using common patterns that they come back to again and again. If you need such a program, this is a miracle tool and a joy to use.