
Author Topic: Any way to automate news compilation (rss feeds, websites, etc.) daily?  (Read 21297 times)

superboyac

  • Charter Member
  • Joined in 2005
  • Posts: 6,347
I've been using RSS feeds in Feedly, on my iPad, on my Android, etc...I've used WebSite-Watcher for a while.  But eventually I always tire of these tools and go back to using a normal browser.  And I've realized why: it's too much work.  Trying to set up these things to work easily is too much work.  If you use WebSite-Watcher, you miss all your Firefox add-ons when reading the websites.  RSS feeds make you click 4-5 times before getting to the actual content, and it's annoying that most feeds are just a title and the real content is only on the actual website (what's the point?).

So here's what I want:
Something like website-watcher that collects all these feeds and prints it to a pdf file every morning for me.  Then I go grab the pdf and just read that.  Automated.  Is anything like this possible?  It's like your own custom newspaper delivered each morning.

rjbull

  • Charter Member
  • Joined in 2005
  • Posts: 3,199
In WebSite-Watcher, if you open the properties of a bookmark, one of the tabs is Actions, which allows you to do things like export the page, with or without HTML tags, or run a program against the new version of the page, including highlighted changes if you want them.  Or automatically export the page to Local Website Archive.

superboyac

  • Charter Member
  • Joined in 2005
  • Posts: 6,347
Quote from: rjbull
In WebSite-Watcher, if you open the properties of a bookmark, one of the tabs is Actions, which allows you to do things like export the page, with or without HTML tags, or run a program against the new version of the page, including highlighted changes if you want them.  Or automatically export the page to Local Website Archive.

I was looking over those options.  But is there any way to export a bunch of bookmarks to a pdf?  I don't care if things are highlighted or not.  I want something where I come in the morning, open website-watcher, and I can click a button and create a pdf of all the things I want to read today.  It does seem like WW should be able to do that, right?  I just can't figure it out yet.

rjbull

  • Charter Member
  • Joined in 2005
  • Posts: 3,199
I agree it should be possible, but I can't see a seamless way to do it either.  I started to ponder semi-automated ways of doing it with external software, but it began to look like a lot of effort.  However, I haven't updated my copy in nearly three years, so I don't know if that area has been improved.

Because your suggestion seems eminently sensible and should be available, I've sent an e-mail to "Aignes" requesting he review this thread.

superboyac

  • Charter Member
  • Joined in 2005
  • Posts: 6,347
Quote from: rjbull
I agree it should be possible, but I can't see a seamless way to do it either.  I started to ponder semi-automated ways of doing it with external software, but it began to look like a lot of effort.  However, I haven't updated my copy in nearly three years, so I don't know if that area has been improved.

Because your suggestion seems eminently sensible and should be available, I've sent an e-mail to "Aignes" requesting he review this thread.
Thanks!  I appreciate it.

I think the reason why it's impossible is because of, surprise, copyright.  Just about any big name rss feed is useless...just the article title and 5 word summaries.  Then you have to click to be sent to the original page for the content, where there will invariably be an article split into 7 parts, each with about two paragraphs of content, surrounded by much larger areas of ads and other distractions.

So trying to read one article is like this:
1) click to update articles in your rss reader
2) click on a headline to open up the rss feed for that article
3) which sends you to the rss version of the article, which is pretty much exactly the same as the headline in step #2 (what's the point?)
4) click to take you to the official website for the article
5) click through everything to read the entire article

Now, what makes you think they are going to make it easy for you to collect just the articles that you want and read them through in one shot, without all those clicks?  Not likely at all.  All of those clicks and inefficiencies I listed above are exactly how people are making money now.  Can you make money by making it easy on the reader like I'm describing?  Nope.  You will try, spend a lot of effort, and realize you can't sustain the effort with the money coming in.  So that's why we're where we're at.  Then, if you try to make it better, you will be accused of copyright infringement because you are shortcutting all those ads and clicks.  This is what happens in a market where the only way to make money is by making it harder for the customers to get what they want.

rjbull

  • Charter Member
  • Joined in 2005
  • Posts: 3,199
I never used RSS much for work, but had most of the other irritations.  One site was so full of ads, most of them animated, that I had to use Adblock Plus to be able to read it at all.  That is, the ads made it near-impossible to read the content of what was supposed to be one of the industry's leading journals.  Pointless and self-defeating; I wouldn't have viewed the site at all if I didn't really need to.  Another of the review sites insisted on splitting up its items into bite-sized bits on separate Web pages and I had to keep clicking for the next page, clipping what I needed and merging all the clips into one.  No wonder it took me the equivalent of a whole day a week to assemble the material for the current awareness bulletin.  I couldn't have done it without WebSite-Watcher.

superboyac

  • Charter Member
  • Joined in 2005
  • Posts: 6,347
I wonder if there is a way to automatically 'scrape' the good content from a website and use that to create a pdf.  I know website-watcher can catch certain things, but I doubt aignes will partake in any kind of scraping activity.  I know there are tools like dtSearch that sound like they can do some things like this.

Do you guys know of any tool that can be trained to extract text that matches a certain pattern from a website (to just get the article content and leave the other stuff out) and then sent to a text file or something for printing?
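For what it's worth, the kind of pattern-based extraction being asked about here can be roughed out with Python's standard library alone. This is only a sketch, not any particular product; the `div class="article"` selector and the function names are assumptions you would adjust per site:

```python
from html.parser import HTMLParser

class ArticleTextExtractor(HTMLParser):
    """Collect visible text inside a target element, e.g. <div class="article">."""

    def __init__(self, tag="div", cls="article"):
        super().__init__()
        self.target_tag = tag
        self.target_cls = cls
        self.depth = 0      # > 0 while we are inside the target element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == self.target_tag:
                self.depth += 1  # track nested same-name tags
        elif tag == self.target_tag and dict(attrs).get("class") == self.target_cls:
            self.depth = 1       # entered the article container

    def handle_endtag(self, tag):
        if self.depth and tag == self.target_tag:
            self.depth -= 1

    def handle_data(self, data):
        # Keep text only while inside the target element; skip ads/nav outside it
        if self.depth and data.strip():
            self.chunks.append(data.strip())

def extract_article(html, tag="div", cls="article"):
    """Return just the article text from an HTML page, one chunk per line."""
    parser = ArticleTextExtractor(tag, cls)
    parser.feed(html)
    return "\n".join(parser.chunks)
```

The resulting plain text could then be appended to a daily file for printing. Real sites would need per-site selectors, which is exactly the "training" the question asks about.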

rjbull

  • Charter Member
  • Joined in 2005
  • Posts: 3,199
WSW usually ignores graphics, so it should ignore many ads.  It looks like Check&Get Pro is a step closer (but still no cigar).  It can e-mail a copy of the text with the changes highlighted, and ignore ads.  But no mention of PDF.  I think Darwin is a Check&Get user - at least, somebody on DC is.

superboyac

  • Charter Member
  • Joined in 2005
  • Posts: 6,347
I found some other tools.

But they seem...shady.

aignes

  • Charter Honorary Member
  • Joined in 2005
  • Posts: 44
Quote from: superboyac
Something like website-watcher that collects all these feeds and prints it to a pdf file every morning for me.  Then I go grab the pdf and just read that.  Automated.  Is anything like this possible?  It's like your own custom newspaper delivered each morning.

WSW is not able to create a PDF file, but you can create a HTML report that contains found changes (Tools + Report/Export). You can also automate the report via the scripting language, an example can be found at http://www.aignes.co...ateandsendreport.htm
- Martin Aignesberger,  author of WebSite-Watcher

40hz

  • Supporting Member
  • Joined in 2007
  • Posts: 11,857
Re: Any way to automate news compilation (rss feeds, websites, etc.) daily?
« Reply #10 on: February 10, 2012, 08:28 AM »
There was a web service some years back that did exactly what you are describing, but it disappeared one day without warning or explanation. Can't remember the name of it off the top of my head.

For your project you might want to take a look at MyRSS, which is about as barebones an aggregator as you'll find anywhere. Basically it's a Python script that takes a textfile of URL links to various feeds and produces an XHTML page. The authors point out you could create a CSS file to pretty-format it or whatever. It could be set up using a scheduler (or cron on Linux) to run once per day to gather the content. From there you'd need an (AHK, maybe?) script to open it in a browser and then print it to a PDF via something like Bullzip PDF Printer or Ghostscript. And that should do it.

It's a little Rube Goldberg-ey. But it should work.
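For a sense of scale, the whole pipeline described above (URL list in, XHTML page out) fits in a few dozen lines of Python. This is a hypothetical stand-in, not MyRSS itself; the function names and the RSS-2.0-only parsing are assumptions:

```python
import urllib.request
import xml.etree.ElementTree as ET

def parse_items(rss_xml, limit=10):
    """Pull (title, link) pairs out of an RSS 2.0 document string."""
    root = ET.fromstring(rss_xml)
    pairs = []
    for item in list(root.iter("item"))[:limit]:
        pairs.append((item.findtext("title", default="(untitled)"),
                      item.findtext("link", default="")))
    return pairs

def fetch_items(feed_url, limit=10):
    """Download a feed and parse its newest items."""
    with urllib.request.urlopen(feed_url, timeout=30) as resp:
        return parse_items(resp.read(), limit)

def render_page(feeds):
    """feeds: {feed name: [(title, link), ...]} -> one XHTML page as a string."""
    parts = ["<html><head><title>Daily feeds</title></head><body>"]
    for name, items in feeds.items():
        parts.append("<h2>%s</h2>" % name)
        parts.append("<ul>")
        for title, link in items:
            parts.append('<li><a href="%s">%s</a></li>' % (link, title))
        parts.append("</ul>")
    parts.append("</body></html>")
    return "\n".join(parts)
```

A scheduler would run this once a day, write the result to a file, and hand it to a PDF printer, as sketched in the post above.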

They have an example output page of what it produces, so you can take a look and see if it gets you at least near where you want to go. Find it here.

Luck! :Thmbsup:

« Last Edit: February 10, 2012, 08:36 AM by 40hz »

superboyac

  • Charter Member
  • Joined in 2005
  • Posts: 6,347
Re: Any way to automate news compilation (rss feeds, websites, etc.) daily?
« Reply #11 on: February 10, 2012, 10:30 AM »
aignes, thanks!  I'll check that feature out and see what I can get.

40hz, I'll check out that MyRSS as well.  It seems like the hardest trick is to get the actual article content and not just the rss summary.  I'm sure I can come up with theories why the service you mentioned disappeared...without a trace!  >:(

40hz

  • Supporting Member
  • Joined in 2007
  • Posts: 11,857
Re: Any way to automate news compilation (rss feeds, websites, etc.) daily?
« Reply #12 on: February 10, 2012, 11:20 AM »
They were probably "asked" to stop doing it since it cut into ad revenues for the sites it was getting the feeds from. Ads are one reason why so many sites no longer provide full text in their feeds. (Although that doesn't excuse them making you land on two separate ad pages before they let you get to the ad-strewn article page either.) There was one other site (Feedbook?) that did a download/PDF thing too, but it's not the one I'm thinking of - and they also discontinued that part of their service. I do remember reading about it on Lifehacker. But I can't see the point of going back something like four years to find the article. Especially since it's now moot.

Guess if you still want something like that you're going to have to kludge up your own.

I used to go into NYC a couple of days a week to handle a contract client. I loved having my little personal newspaper with me those mornings on the metrorail in.
 ;D

« Last Edit: February 10, 2012, 11:28 AM by 40hz »

superboyac

  • Charter Member
  • Joined in 2005
  • Posts: 6,347
Re: Any way to automate news compilation (rss feeds, websites, etc.) daily?
« Reply #13 on: February 10, 2012, 11:58 AM »
Quote from: 40hz
They were probably "asked" to stop doing it since it cut into ad revenues for the sites it was getting the feeds from. Ads are one reason why so many sites no longer provide full text in their feeds. (Although that doesn't excuse them making you land on two separate ad pages before they let you get to the ad-strewn article page either.) There was one other site (Feedbook?) that did a download/PDF thing too, but it's not the one I'm thinking of - and they also discontinued that part of their service. I do remember reading about it on Lifehacker. But I can't see the point of going back something like four years to find the article. Especially since it's now moot.

Guess if you still want something like that you're going to have to kludge up your own.

I used to go into NYC a couple of days a week to handle a contract client. I loved having my little personal newspaper with me those mornings on the metrorail in.  ;D
Sigh...copyright and ads.  These two things take all the pleasure out of life AND they are holding all sorts of creativity back.  Ok, looks like I need to frankenstein it myself.  Yeah, having a custom newspaper is a pretty cool thing.  I KNOW a lot of people would love that.  Imagine waking up in the morning and having all your website articles and blogs packaged nicely in a pdf on your tablet, very clean, no ads, no clicking around endlessly.  Do we really have to wait 10 years until people figure out how to work copyright law into something like that?  I mean, the technology has been here for 10 years already.

daddydave

  • Supporting Member
  • Joined in 2008
  • Posts: 867
Re: Any way to automate news compilation (rss feeds, websites, etc.) daily?
« Reply #14 on: February 10, 2012, 12:10 PM »
I've only used the (Palm OS and) Windows Mobile versions, but I have used things like Plucker and iSilo/iSiloX for this purpose.  These don't produce a PDF; there was a desktop component to download and format the content and a mobile viewer app.  Plucker is freeware, but unfortunately it seems to have stopped development over a decade ago.  (The Windows Mobile spinoff was called Vade Mecum; the best tool to create those Plucker documents on the desktop was called Sunrise XP.)  iSiloX is multiplatform and actively developed, although I think the iOS version didn't get good reviews, and the desktop sync probably doesn't work as well on that platform - I don't know about Android.  You could set up a schedule for each item, tell each item to shrink or remove images, etc., and it would work with both RSS feeds and web pages.  For RSS feeds that showed partial content, you could specify a link depth to fetch, although usually it was better to just fetch the web page in that case.  iSilo really had far superior rendering to Plucker; it preserved indent levels and colors nicely, so you can look at things like formatted code listings.

« Last Edit: February 10, 2012, 03:31 PM by daddydave »

40hz

  • Supporting Member
  • Joined in 2007
  • Posts: 11,857
Re: Any way to automate news compilation (rss feeds, websites, etc.) daily?
« Reply #15 on: February 14, 2012, 04:31 PM »
@SB - Ok, ready for this? The solution might have been right under our noses all along. calibre supposedly can do this (Who woulda thunk huh?)

According to the writing on the tin:

Downloading news from the web and converting it into e-book form


calibre can automatically fetch news from websites or RSS feeds, format the news into an e-book and upload it to a connected device. The ebooks include the full versions of the articles, not just the summaries. Examples of supported news sites include:

    
  • The New York Times
  • The Wall Street Journal
  • The Economist
  • Time
  • Newsweek
  • The Guardian
  • ESPN
  • and many, many more…

calibre has over three hundred news sources and the news system is plugin based, allowing users to easily create and contribute new sources to calibre. As a result the collection of news sources keeps on growing!

If you are interested in adding support for a news site, read the User Manual [External link]. Once you have successfully created a new recipe, you can share it with other users by posting it in the calibre forum [External link] or sending it to the calibre developers for inclusion in calibre.

I haven't gotten a chance to check this out yet. But you can bet I'm going to first chance I get. :Thmbsup:


superboyac

  • Charter Member
  • Joined in 2005
  • Posts: 6,347
Re: Any way to automate news compilation (rss feeds, websites, etc.) daily?
« Reply #16 on: February 14, 2012, 04:56 PM »
Yeah!  I think you got it...we have to play with this one.

40hz

  • Supporting Member
  • Joined in 2007
  • Posts: 11,857
Re: Any way to automate news compilation (rss feeds, websites, etc.) daily?
« Reply #17 on: February 15, 2012, 03:07 PM »
re: calibre: Ok, I just gave it a workout. What it does (i.e., go out and get a feed at a scheduled time, download it, and create an ebook out of it) works quite well.

per the developer of calibre:

The news downloading feature, one of calibre's most popular, has an interesting story behind it. I used to subscribe to Newsweek, back when it was still a real news magazine. But one fine day, Newsweek simply stopped being delivered to my house and no matter how much time I spent on the phone with various sales reps, it simply would not start again. Since I'd just got my first e-book reader at the time, I decided to add the ability to download and convert websites to calibre. From the beginning, I decided to make it as modular as possible, so that other people could contribute "recipes" for different news sites. The calibre cookbook has kept on growing and now calibre has recipes for over three hundred news sources in many different languages.

The limitations, however, are annoying. Each feed gets made into its own book. You can't combine feeds using the standard scripts provided by calibre. I'm guessing you could if you were to combine them in your own script. But that defeats some of the convenience being sought.

The other problem is that a new book gets created for each source each time the "get news" button is pushed. So if you were tracking 10 feeds daily, on Monday you'd find 10 books in your library list. When it ran again on Tuesday you would then have 20 books in your library unless you deleted Monday's run. Not a real problem since you could just select all and delete. But what happens when you add something in that only gets checked weekly - and for which you want to keep a few back issues on hand? Since calibre doesn't allow you to set up folders, it starts getting excessively "manual" keeping your newsrack pruned. Which, in all fairness, may only be a problem for tech news junkies like me.

I'm in the habit of closely tracking about 30 feeds daily - and well over a hundred additional between those I peruse on a weekly or monthly basis. So having somewhere between 100 and 150 "books" in my library just for that doesn't really work for me. I suppose I could do it using a portable installation of calibre which would be used just for feeds and act as a super-newsreader. But it's kind of a kludge. And it still doesn't combine multiple feeds into a single book. I don't want a library's periodical room. I want a geek's version of Reader's Digest.

What I was hoping for was something that could support a few different collections of RSS feeds. Something that could take three different feed lists and use them to produce a daily newspaper, a weekly journal, and a monthly magazine, all on a fully automated basis.

calibre can't do that. But it's soooo close it makes me want to scream.

But that won't accomplish anything worthwhile.

So now I'm firing up my email program and composing an extremely polite message to calibre's developer Kovid Goyal to ask what it would take to get that capability added.

Time to make a wish! :)

[attached image: wish upon star.jpg]


superboyac

  • Charter Member
  • Joined in 2005
  • Posts: 6,347
Re: Any way to automate news compilation (rss feeds, websites, etc.) daily?
« Reply #18 on: February 16, 2012, 12:31 AM »
^^^AAAHH!! 40...you're too much.  That poster is my favorite one from years ago!  We actually spent an hour browsing and bought a framed one for work!

DeVamp

  • Supporting Member
  • Joined in 2008
  • Posts: 122
Re: Any way to automate news compilation (rss feeds, websites, etc.) daily?
« Reply #19 on: February 16, 2012, 06:01 AM »
Can't you just combine the feeds with an RSS mixer (examples: http://blueblots.com/tools/rss-feeds/)?

And then use this one feed each day to get your book?

If you want an extra feed or one less, you can just do it on the service instead of in calibre.
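If a hosted mixer ever vanishes (a recurring theme in this thread), the same merge can be done locally before handing the result to calibre. A rough sketch assuming plain RSS 2.0 input; `merge_rss` is a made-up name, not part of calibre or any mixer service:

```python
import xml.etree.ElementTree as ET

def merge_rss(documents, title="Combined feed"):
    """Merge the <item> elements of several RSS 2.0 documents into one feed.

    documents: list of RSS XML strings.  Returns the merged feed as a string.
    Items are appended in the order the source documents are given.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    for doc in documents:
        root = ET.fromstring(doc)
        # Move every item from the source channel into the combined channel
        for item in root.iter("item"):
            channel.append(item)
    return ET.tostring(rss, encoding="unicode")
```

Point calibre (or any reader) at the merged file and adding or dropping a feed is just editing the input list, much as suggested above for the online service.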

40hz

  • Supporting Member
  • Joined in 2007
  • Posts: 11,857
Re: Any way to automate news compilation (rss feeds, websites, etc.) daily?
« Reply #20 on: February 16, 2012, 06:37 AM »
Quote from: DeVamp
Can't you just combine the feeds with an RSS mixer (examples: http://blueblots.com/tools/rss-feeds/)?

And then use this one feed each day to get your book?

If you want an extra feed or one less, you can just do it on the service instead of in calibre.

Never heard of this before. Thanks for sharing! :Thmbsup: I will definitely be checking this one out.
 :)

iphigenie

  • Supporting Member
  • Joined in 2006
  • Posts: 1,170
Re: Any way to automate news compilation (rss feeds, websites, etc.) daily?
« Reply #21 on: February 17, 2012, 01:50 AM »
Feedly feels a bit like what you describe, but I'm sure you've seen it

I used the Java client BlogBridge for a while, and while it is not a browser, it does have a mode with a bit of that "one paper" feel.

iphigenie

  • Supporting Member
  • Joined in 2006
  • Posts: 1,170
Re: Any way to automate news compilation (rss feeds, websites, etc.) daily?
« Reply #22 on: February 17, 2012, 02:12 AM »
Just had a quick look in my curation link list. This one is a bit strange since it needs an app/software and then generates a page, but it's not far from your description: http://www.genieo.com/

IainB

  • Supporting Member
  • Joined in 2008
  • Posts: 7,540
Quote from: 40hz
...What I was hoping for was something that could support a few different collections of RSS feeds. Something that could take three different feed lists and use them to produce a daily newspaper, a weekly journal, and a monthly magazine, all on a fully automated basis.
calibre can't do that. But it's soooo close it makes me want to scream.
But that won't accomplish anything worthwhile.
So now I'm firing up my email program and composing an extremely polite message to calibre's developer Kovid Goyal to ask what it would take to get that capability added. ...
@40hz: I had cross-linked your post on this to the Re: Calibre - e-Book (Personal Library/Document) Management - Mini-Review, and as I am in the process of updating that Mini-Review, I wondered whether you had any news/response from Kovid Goyal. If you do, then could you please post it here? I could add it into the update.
Many thanks.