Author Topic: Accessing articles from many pdfs  (Read 7870 times)

tsaint

  • Charter Member
  • Joined in 2005
  • Posts: 497
  • Hi from the a*** end of the earth
Accessing articles from many pdfs
« on: August 31, 2023, 07:39 PM »
I download and keep a newspaper each day as a text-searchable PDF.
I have two pieces of software for doing a keyword or phrase search on the year's collection of PDFs.
My problem, though, is that this is not sufficient for keeping track of just some articles on a particular topic. E.g. I can locate all articles on renewable energy, but am only really interested in 5% of them.
So I want to end up with a list of links, preferably with comments, that lets me jump to the relevant PDF articles.
I don't want to do heaps of copy/pasting into a notes database.

Three approaches spring to mind:
1. A sticky notes program which would allow attaching a note to a specific page in a given PDF, with the notes database itself being searchable.
2. Bookmarking articles or pages in the PDF, then somehow extracting all bookmarks from all pages and using some other (unknown to me) s/ware to manage them. ChatGPT suggested a Python script to extract bookmarks, but it's over my head.
3. Some info management s/ware which allows linking to specific locations in local PDFs. I saw that Excel allows this if you use Acrobat and, within that, copy the page number. I use PDF-XChange and can't seem to do that.
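
For approach 2, a minimal sketch of what such a Python script might look like. It assumes the third-party pypdf library is installed and that the bookmarks were saved into the PDFs as ordinary outline entries; the function names are made up for illustration, and none of this has been tried against PDF-XChange output.

```python
from pathlib import Path


def flatten_outline(outline, page_number_of, depth=0):
    """Yield (depth, title, page) for every bookmark in a nested outline.

    pypdf represents child bookmarks as nested lists, so recurse into them.
    """
    for item in outline:
        if isinstance(item, list):
            yield from flatten_outline(item, page_number_of, depth + 1)
        else:
            yield depth, item.title, page_number_of(item)


def extract_bookmarks(pdf_path):
    """Return [(depth, title, 1-based page)] for one PDF (assumes pypdf)."""
    from pypdf import PdfReader  # third-party: pip install pypdf

    reader = PdfReader(str(pdf_path))

    def page_number_of(dest):
        # pypdf counts pages from 0; PDF viewers show them from 1.
        number = reader.get_destination_page_number(dest)
        return None if number is None else number + 1

    return list(flatten_outline(reader.outline, page_number_of))


def collect_bookmarks(folder):
    """Map each PDF file name in a folder to its list of bookmarks."""
    return {
        pdf.name: extract_bookmarks(pdf)
        for pdf in sorted(Path(folder).glob("*.pdf"))
    }
```

Calling `collect_bookmarks` on the folder holding the year's PDFs would give one dictionary that could be printed, saved to a spreadsheet, or searched.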

Probably there's a better way, as I'm sure this is not a unique want. Any ideas would be appreciated.
Thanks
Tony

rjbull

  • Charter Member
  • Joined in 2005
  • Posts: 3,205
Re: Accessing articles from many pdfs
« Reply #1 on: September 01, 2023, 05:55 PM »
I'm not sure this is a helpful bit of lateral thinking, but had you considered using XPDF to convert your PDFs to text?  Text would be a much more tractable format, easier to search, edit and bookmark.
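
A rough sketch of that conversion route, assuming the `pdftotext` command-line tool from Xpdf is on the PATH. The helper names are hypothetical. The page lookup relies on pdftotext's default behaviour of ending each page with a form feed character, so a hit in the text can still be traced back to a page in the PDF.

```python
import subprocess
from pathlib import Path


def pdftotext_command(pdf_path, txt_path):
    """Build the command line: pdftotext -layout input.pdf output.txt"""
    return ["pdftotext", "-layout", str(pdf_path), str(txt_path)]


def convert_folder(folder):
    """Convert every PDF in a folder to a .txt file next to it."""
    for pdf in sorted(Path(folder).glob("*.pdf")):
        subprocess.run(pdftotext_command(pdf, pdf.with_suffix(".txt")), check=True)


def pages_mentioning(text, phrase):
    """Return 1-based page numbers whose text contains the phrase.

    Splitting on the form feed recovers the page boundaries that
    pdftotext writes by default.
    """
    needle = phrase.lower()
    return [
        number
        for number, page in enumerate(text.split("\f"), start=1)
        if needle in page.lower()
    ]
```

After `convert_folder`, reading one of the .txt files and passing its contents to `pages_mentioning` gives the pages worth opening in the original PDF.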

tsaint

Re: Accessing articles from many pdfs
« Reply #2 on: September 03, 2023, 11:18 PM »
Thanks for taking the time to reply, RJ. It's a possibility worth pursuing, and I agree about tractable.
I'm still hopeful of a PDF-only solution though.
I know I can insert text into a PDF referring to an article I want to keep track of,
and atm that seems the easiest option.
I was hoping that the bookmarks, comments or sticky notes my favoured PDF editor can create would be globally searchable (e.g. by DocFetcher Pro or AnyTXT Searcher), but it seems I'm out of luck there.

rjbull

Re: Accessing articles from many pdfs
« Reply #3 on: September 04, 2023, 05:08 PM »
RightNote has features to deal with PDFs.  From the Help file:

Indexing settings

The professional version of RightNote allows you to index attachments and links of the following file types:
.txt, .rtf, .htm/html, .doc/docx, .xls, .csv, .pdf

In the options dialog under the Indexing settings section, you can select which file types you want to be indexed by default. Further, every note has its own setting which will override the default settings.

For example, you may want to set pdf files to not be indexed by default, since often times these can be large files, and you generally do not need the contents of these files to be indexed. If you then need a specific pdf file to be indexed, you can adjust the setting in the attachment viewer.

---------

Attachment note type

The attachment note type allows you to store any type of file in a RightNote database. For example you can store MS Word documents, Excel files and PDF Documents. If supported, the contents of the file will be indexed and made searchable.

Currently the following file types will be indexed:

txt, rtf, htm, html, doc, docx, xls, pdf.

You can open/view the file by clicking on the Open File link in the viewer. This will open the file with the default associated application for the file type, for example an xls file will be opened by MS Excel (if it is installed); a doc file will be opened by MS Word.

[...]
Note:

When you open an attachment, you are viewing a copy of the original document. If you make changes to the attachment, this will not affect the original source document.

---------

Link note type

The link note type allows you to create a link to any type of file on your computer. For example you can create links to MS Word documents, Excel files and PDF Documents. If supported, the contents of the file will be indexed and made searchable.

Currently the following file types will be indexed:

txt, rtf, htm, html, doc, docx, xls, pdf.

Note:

A link note does not store the actual contents of the file in the RightNote database. It simply points to a file on the file system (or internet url). If you open the link, you will be opening the source file pointed to by the link url.

---------

Warnings:
   A) I haven't tried it
   B) Features only in the Professional version, i.e. the payware one

tsaint

Re: Accessing articles from many pdfs
« Reply #4 on: September 06, 2023, 03:39 AM »
Thanks RJ. I'll investigate, but probably wouldn't spend the money.
I found out that PDF-XChange does indeed allow global searching of its bookmarks and comments (sticky notes), and saving the search.
While I'd like to have a notes-type program that links to those without importing the PDFs themselves, I'm happy enough to go with what I can do.
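
One way a "links plus comments" list could be kept without importing the PDFs is a small CSV index, sketched below. It uses the `#page=N` open parameter, which Acrobat and most web browsers honour. Whether PDF-XChange follows it when opened from a link is an assumption that would need testing, and the function names and file layout are invented for illustration.

```python
import csv
from pathlib import Path


def page_link(pdf_path, page):
    """Return a file URL that asks the viewer to open at a given page."""
    return f"{Path(pdf_path).resolve().as_uri()}#page={page}"


def add_note(index_file, pdf_path, page, comment):
    """Append one row (link, comment) to a CSV index of interesting articles."""
    with open(index_file, "a", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerow([page_link(pdf_path, page), comment])


def search_notes(index_file, word):
    """Return the (link, comment) rows whose comment mentions the word."""
    with open(index_file, newline="", encoding="utf-8") as handle:
        return [
            tuple(row)
            for row in csv.reader(handle)
            if word.lower() in row[1].lower()
        ]
```

The CSV opens in any spreadsheet program, so the comments stay searchable and the PDFs themselves stay where they are.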

Target

  • Honorary Member
  • Joined in 2006
  • Posts: 1,832
Re: Accessing articles from many pdfs
« Reply #5 on: September 06, 2023, 05:29 PM »
PDF Keeper (open source) appears to do a lot of indexing

No idea whether or not it fits your use case but maybe worth a look

tsaint

Re: Accessing articles from many pdfs
« Reply #6 on: September 06, 2023, 10:17 PM »
Thanks Target...that sure looks worth following up!

sphere

  • Participant
  • Joined in 2018
  • Posts: 176
Re: Accessing articles from many pdfs
« Reply #7 on: September 23, 2023, 06:42 PM »
I would look into Zotero. While it is often categorized as a citation manager, it is possible to do a lot more with it. I would look into its ability to extract PDFs through a plugin, and also how people use it in conjunction with other applications, like mind mapping applications. There was an application years ago that was mentioned here (qippa I think) that did some interesting things when analyzing the citations.

rjbull

Re: Accessing articles from many pdfs
« Reply #8 on: September 26, 2023, 04:25 PM »
sphere wrote: "There was an application years ago that was mentioned here (qippa I think) that did some interesting things when analyzing the citations."

Qiqqa, now freeware; a forum search turns up many mentions, e.g.

Qiqqa - Reference Management System - Mini-Review

Re: make scan PDF into text searchable PDF

The main proponent of Qiqqa was the formerly prolific IainB, but he's not been here since February 2023.

tsaint

Re: Accessing articles from many pdfs
« Reply #9 on: September 27, 2023, 04:37 PM »
Thanks Sphere and RJ ...
I tried Zotero, but found that even just adding some PDFs wasn't straightforward. I was expecting an "add (import) PDFs to database, then work with them" process.

Qiqqa sounds worth a try; I will install it and check out the tags and annotations.
The mention of auto tags puts me off somewhat, as I dislike s/ware that knows best what I want. I am assuming, though, that I'll be able to turn this off, as I don't want to drown in a sea of tags.