Special User Sections > N.A.N.Y. 2014

"Good enough" and NANY proto-components!?


TaoPhoenix:
I like this year's NANY trend toward "back to basics" text processing. Merging some "innovative directions in NANY" ideas I thrashed out with Mouser a year or two ago, and after thinking "gee, look at all the neat text apps under pledge", I suddenly thought to poke around my oDesk account.

Turns out I have some proto-components still in alpha stages that aren't good enough yet to be legit NANY entries by themselves. But maybe one of y'all could adapt them and/or their ideas into features of your own programs. They were parts of projects I was working on a couple of years ago until I ran out of development money! :(

The next couple of messages will rough out stuff as I poke around my archives and see what I have on tap. I sometimes had a couple of different coders work on the same theme just to see different approaches, so if y'all don't care for a particular implementation, I might have a second version; either way you'll get the overall idea and can put it into your own programs.

These are all fragments that I custom commissioned and have not been released anywhere else.

TaoPhoenix:
Spawn TextFile

Given a large-ish inbound source text file, this app prototype looks for a delimiter, rips the source apart at each occurrence, and saves the pieces as several smaller files.

Tip #1: Some initial support was put in to deal with "rhetorical" annotation styles. For example, if the source file used long runs of dashes to mark its segments, the app is supposed to treat each run as a single split point rather than pounding out 15 blank files.

Tip #2: Watch out for accidental occurrences of the delimiter. In my example text (borrowed from some Slashdot comments of mine) I used asterisks * as an emphasis device, but this app will split at those points too! So before you run the splitter, you may want to do a really crafty search & replace to make sure the delimiter between sections is a character you never normally use. For example, I never use ^, so done right the "emphasis" * marks would stay put and only the splitter ^ would kick in. Along the same lines, if you for example wanted to split at dashes ------, then watch out for hyphenated words!
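Since only the attachments carry the original code, here is a minimal Python sketch of the idea in Tip #2 (the function name and choice of `^` as the safe delimiter are mine, not from the original app): only lines made up entirely of the old separator character get swapped, so inline uses like *emphasis* stay put.

```python
import re

def swap_separator(text: str, old: str = "*", new: str = "^") -> str:
    """Replace separator lines (lines consisting only of `old`) with `new`,
    leaving inline uses such as *emphasis* untouched. Refuses to run if
    `new` already occurs somewhere in the text."""
    if new in text:
        raise ValueError(f"{new!r} already occurs in the text; pick another delimiter")
    # (?m) makes ^ and $ match per line; a run of `old` on its own line
    # collapses to a single `new` marker
    return re.sub(rf"(?m)^{re.escape(old)}+\s*$", new, text)
```

After this pass, the splitter can safely cut at `^` without touching the emphasis marks.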

Included are what I believe is the (mostly) complete source code, a test run showing two split-sets from an example source file (thus showing two levels of analysis on the initial source for "different bosses"), and the app itself.

Known Bug #1: There is a glitch in the file-naming output: the exported filenames end up with different numbers of digits, so they don't sort cleanly.
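For anyone who wants to adapt the idea without digging through the attachment, here is a rough Python sketch of the splitting logic described above (function and file names are my own, not from the original source). It treats runs of consecutive delimiters as one split point per Tip #1, and zero-pads the filenames, which would address Known Bug #1:

```python
import re
from pathlib import Path

def spawn_textfile(source: str, delimiter: str = "^",
                   out_dir: str = "parts", stem: str = "part") -> list:
    """Split `source` at runs of `delimiter` and write each non-empty
    piece to its own file. Filenames are zero-padded so they sort."""
    # A run of one or more delimiters counts as a single split point,
    # so "rhetorical" separator runs don't produce blank files (Tip #1)
    pieces = [p.strip() for p in re.split(re.escape(delimiter) + r"+", source)]
    pieces = [p for p in pieces if p]
    # Pad to at least two digits, or more if there are many pieces (Bug #1)
    width = max(2, len(str(len(pieces))))
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    paths = []
    for i, piece in enumerate(pieces, 1):
        path = out / f"{stem}_{i:0{width}d}.txt"
        path.write_text(piece + "\n", encoding="utf-8")
        paths.append(path)
    return paths
```

A call like `spawn_textfile(open("notes.txt").read())` would drop `part_01.txt`, `part_02.txt`, ... into a `parts` folder.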

(Pyrohacker, you were a major inspiration for me to remember I had this stuff! You said "This app will make it easier to analyze a chunk of text and see useful information. Its focus is analysis rather than manipulation." However, specifically with a large input file in mind, I found it useful to be able to yank out the parts of text you need!)


TaoPhoenix:
This version has an option to name the files by date of processing.

The source and the app are in the same batch this time.

P.S. I also have a third, proof-of-concept awk script version, but I won't burn a whole other post on it unless someone wants it. These were the two main versions I commissioned.

I don't recall what bugs were left in this version, but I signed off at the time as "good enough". : )


TaoPhoenix:
This is the grand overall fragment that led me to make this thread.

Before even this year's NANY pledges, but especially in light of them, I dreamed up the idea ... what if you merged a word processor with custom power-user features?

Warning: This is Pre-Alpha mockup-concept level only! But since my programmer already whacked at it, at least it's half built into a shell that one of you might pick up and run with!

The idea in action:

1. I commissioned "modules" from third-tier outsource programmers for simple stuff "that didn't matter if it had an obscure bug".

2. I gave the modules to my lead project-design programmer to merge into this processor shell as feature modules. He had the leeway to clean up the initial module code, etc.

We tried to design a shell that you could drop Coding Snacks and NANYs into all day long. So for example, the spawn utilities above "are what they are" ... but what if they were built right into the text reader app? So our proof of concept has a couple of different versions of that algorithm. (Footnote: I didn't burn time above posting the third version; those two were enough for you to get the idea. It's Bassem Fawzy's version that's in here but not posted standalone above.)

"So if you'll forgive the crudity of the model" (Doc Brown!) here we go!

This project was definitely at the "Coding Lunch" level, and then I ran out of funds to keep going, plus one feature no longer matters because it's obsolete now. (TreeDB into CssZenGarden: (a) I lost interest, and (b) I now use MyInfo, and their dev might be able to fix the MyInfo code on his end to make things easier.) But the grand vision was something like merging *all* of your text and file NANYs and Snacks into "one shell to rule them all"! This would be the text Turbo Processor that did EVERYTHING!

1. The shell is based on the Scintilla project. We put a little work into finding "open licensed" source shells to drop into.

2. The first few menus are your typical word processor ones. For you programmers, it has Scintilla's pre-built highlighting support per language.

3. But the real power is in menus like Data Control, Tools, and Options. Unfortunately the only module we had time to put in was the file spawner. But for example, if you didn't like how the spawned files came out, the app can nuke them too (or any other junk files you don't need after doing some tests).

So I hope this framework inspires someone!

"New Components for the New Year!"



TaoPhoenix:
PDF Generator

Here is another example of a StandAlone module that could theoretically be part of the Turbo Processor.

Suppose you are doing web research.

For background, MilesAhead wrote me a "Beta" version of his BBSS that creates paired lists of titles and URLs.

Here is Mouser's Newsletter:
http://www.donationcoder.com/forum/index.php?topic=36729.msg343854#msg343854

We have decided we want to look at the NANY section.


1. Open them all up in tabs in Firefox. (Note - there seems to be a bug and this doesn't work right in Pale Moon.)

2. MilesAhead's BBSS reads the tabs and creates a paired list of your research tabs as titles and URLs in ... wait for it ... text files! : )

3. My module reads the URLs and then creates a matching PDF for each one.
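The pipeline in steps 1-3 can be roughed out in Python. Heavy hedging here: the alternating title/URL layout is my assumption about the BBSS text format, the function names are mine, and I've used the `wkhtmltopdf` command-line tool as a stand-in converter (the actual module apparently embeds QtWebKit, per bug #4 below):

```python
import re
import subprocess
from pathlib import Path

def parse_pairs(listing: str) -> list:
    """Turn an alternating title / URL text listing (assumed BBSS
    output format) into a list of (title, url) pairs."""
    lines = [ln.strip() for ln in listing.splitlines() if ln.strip()]
    return list(zip(lines[::2], lines[1::2]))

def url_list_to_pdfs(listing: str, out_dir: str = "pdfs") -> None:
    """Render one PDF per listed URL. Numbering every output file
    sequentially keeps the PDF count equal to the URL count even with
    duplicates (the intent described in known bug #2)."""
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    for i, (title, url) in enumerate(parse_pairs(listing), 1):
        # Sanitize titles so odd characters don't break filenames (bug #1)
        safe = re.sub(r"[^\w\-]+", "_", title)[:60]
        subprocess.run(["wkhtmltopdf", url, str(out / f"{i:03d}_{safe}.pdf")],
                       check=True)
```

Note this sketch shares the "logged-in system" limitation from bug #3: a plain URL fetch has none of your session cookies, so pages behind a login will render wrong.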

Known bugs:

1. As I just found out in this run, there are some unexplained problems with the filenames of the resulting PDFs.
2. We put a bit of thought into how duplicate URLs in a list are handled. I think the file number at the start of each filename is supposed to keep increasing, so the count of output files matches the number of original URLs. But this is why this is only a component: this part needs work.
3. Misc "look and feel" bugs: for example, if you're working with a "logged in" system such as email or Monster (my original use case), you might get badly formatted PDFs because the app wouldn't have the correct login permissions, etc. There are others. That's why I have to sign off on this "only as a component".

4. The package is rather large (it uses QtWebKit, I think), and that's more than the post limit here. If anyone wants this piece, I'll find somewhere to upload it, maybe my private server.

Some info:
The BBSS thread from Miles is here:
http://www.donationcoder.com/forum/index.php?topic=30913.0

Attached is the sample title and URL list and a couple of sample output PDFs.
