General brainstorming for Note-taking software

Nod5:
Darwin & others,
Adding an entire archive of PDFs to EndNote items manually must be extremely tiresome. Given the popularity of EndNote I bet many researchers are doing just that anyway (they probably curse a lot in the process ;D).

So if there's no tool to programmatically add the PDFs then maybe smart folks at DonationCoder could look into how hard it would be to make one? If such a tool could be completed it could draw a lot of interest to DonationCoder. And hey, by saving time it might speed up scientific progress a (very, very, very small) bit.

The main problem to solve would be how to match the pdf files and the bibliographic items. If journal articles contain their DOI id number as extractable metadata then that might be a solution. I opened a few journal articles in UltraEdit and browsed for doi numbers or something similar in the ascii text but couldn't find anything. Also, I exported metadata from some sample PDF files with the free tool PDFTK. Here's one example of what I got:

InfoKey: Author
InfoValue: x
InfoKey: GTS_PDFXVersion
InfoValue: PDF/X-1:2001
InfoKey: Producer
InfoValue: Acrobat Distiller 6.0.1 for Macintosh
InfoKey: Creator
InfoValue: InDesign: pictwpstops filter 1.0
InfoKey: ModDate
InfoValue: D:20070613112031+01'00'
InfoKey: GTS_PDFXConformance
InfoValue: PDF/X-1a:2001
InfoKey: Title
InfoValue: 14.6 News 758-759.indd NS new.indd
InfoKey: CreationDate
InfoValue: D:20070612095515+01'00'
PdfID0: cee3243181236aabefbac383315fda1f
PdfID1: 3ab8e10f7074409da8a4fc9d6eb3a2
NumberOfPages: 2
PageLabelNewIndex: 1
PageLabelStart: 758
PageLabelNumStyle: DecimalArabicNumerals
--- End quote ---

This row, "InfoValue: 14.6 News 758-759.indd NS new.indd", is a sort of identification, since the PDF is from Nature Volume 447 Number 7146, section News, page 758. But it's not as good as a DOI, and other articles I tried did not even have that InfoValue. PdfID0 & PdfID1 are made by Acrobat Distiller, I think, and do not relate to the article contents/DOI.
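To make a dump like the one above usable by a matching tool, it first has to be parsed. Here is a minimal sketch, assuming the text comes from pdftk's dump_data operation and follows the InfoKey/InfoValue pairing shown in the quote. The function name parse_pdftk_dump and the shortened sample are made up for illustration.

```python
def parse_pdftk_dump(text):
    """Split a pdftk metadata dump into (info, other) dicts.

    info  holds the InfoKey/InfoValue pairs (Title, Author, ...).
    other holds the remaining single-line fields (NumberOfPages, ...).
    """
    info = {}
    other = {}
    pending_key = None
    for line in text.splitlines():
        # Split on the first colon only, so values such as
        # "D:20070613112031+01'00'" keep their own colons.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "key: value" line
        key, value = key.strip(), value.strip()
        if key == "InfoKey":
            pending_key = value
        elif key == "InfoValue" and pending_key is not None:
            info[pending_key] = value
            pending_key = None
        else:
            other[key] = value
    return info, other


# Shortened copy of the dump quoted above.
sample = """InfoKey: Title
InfoValue: 14.6 News 758-759.indd NS new.indd
InfoKey: ModDate
InfoValue: D:20070613112031+01'00'
NumberOfPages: 2
PageLabelStart: 758
"""

info, other = parse_pdftk_dump(sample)
print(info.get("Title"), other.get("PageLabelStart"))
```

With the fields in a dict, a tool could at least try the Title and PageLabelStart values as weak hints when no DOI is present.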

Embedding DOI numbers according to some standard seems like an obviously smart thing. If available it would not only allow adding a pdf to the correct EndNote item but also the reverse: start with a pdf, autoresolve its DOI online, then import all the resolved metadata into an EndNote item, rename the pdf according to some format and import/link it to the EndNote item.
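The first step of that reverse workflow, spotting a DOI in text, can be sketched roughly like this. It assumes the article text has already been extracted from the pdf by some other tool, and that the DOI appears in the usual "10.prefix/suffix" shape. The pattern is a common heuristic rather than an official grammar, and the sample DOI is invented.

```python
import re

# Heuristic: "10.", a 4-9 digit registrant prefix, "/", then a suffix
# made of characters commonly seen in DOIs. Unusual DOIs may be missed.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def find_doi(text):
    """Return the first DOI-looking string in text, or None."""
    match = DOI_PATTERN.search(text)
    if match is None:
        return None
    # Drop sentence punctuation that the suffix pattern swallowed.
    return match.group(0).rstrip(".,;:)")


def resolver_url(doi):
    """Build the URL that the public doi.org resolver redirects from."""
    return "https://doi.org/" + doi


# Invented example text; the DOI below is not a real article.
page_text = "Published online; doi:10.1234/example.2007.001. All rights reserved."
doi = find_doi(page_text)
print(doi, resolver_url(doi))
```

Fetching the bibliographic record for the resolved DOI and writing it into EndNote would be separate steps and are not shown here.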

But it wouldn't really surprise me if DOI is not embedded at all. The electronic journal systems still seem rather ineffective and user-unfriendly. As this overview states (http://hublog.hubmed.org/archives/001306.html ):
While most of the larger publishers provided an acceptable method of authentication, the PDF files they produce are obviously not optimised for ease of use by the reader. It's almost impossible to build a tool to automatically fetch PDFs for papers [---] The implementation of all of these features could be automated with little change to the publishers' systems, but would be a major benefit to researchers struggling to deal with large amounts of literature.
--- End quote ---
(BTW, a comment on that page links to http://quosa.com/solutions.html , a tool for easy, massive article downloading and EndNote importing. No price is listed on the site (only this note on discounts http://www.quosa.org/support/helpdocs/mac/pricing.htm ). So I suspect it is very expensive. )

Without DOI in the PDFs any programmatic matching of PDF and EndNote items would probably have to be more fuzzy.
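One rough idea of what "more fuzzy" could mean: compare each pdf file name against the titles already in the library and accept the best match only above some similarity score. This sketch uses Python's standard difflib; the helper names, the 0.6 threshold and the sample titles are all assumptions for illustration, not anything EndNote provides.

```python
import re
from difflib import SequenceMatcher


def normalize(s):
    """Lowercase and collapse anything that is not a letter or digit."""
    return re.sub(r"[^a-z0-9]+", " ", s.lower()).strip()


def best_match(pdf_name, titles, threshold=0.6):
    """Return (title, score) for the closest title, or (None, score)
    when even the closest one scores below the threshold."""
    stem = normalize(pdf_name.rsplit(".", 1)[0])  # drop the extension
    best_title, best_score = None, 0.0
    for title in titles:
        score = SequenceMatcher(None, stem, normalize(title)).ratio()
        if score > best_score:
            best_title, best_score = title, score
    if best_score < threshold:
        return None, best_score
    return best_title, best_score


# Made-up library titles and file names.
titles = [
    "Origin of Species Revisited",
    "Metadata Indexing for PDF Collections",
]
print(best_match("metadata_indexing_pdf_collections.pdf", titles))
print(best_match("science.pdf", titles))
```

This only helps when file names reflect titles. Default names such as science.pdf fall below the threshold and would still need manual matching or some other signal.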

nevf:
I think I can address some of what you requested using UltraRecall (http://www.kinook.com). Here is how...

UR has a feature that I have not seen in any other PIM system. It allows you to link (or store) any document on your system e.g. pdf, doc, xls, OL items, anything. You can actually store the doc within UR database and delete it from the OS. -cnewtonne (June 06, 2007, 10:32 PM)
--- End quote ---

I just want to let everyone know that Surfulater also does this, i.e. store any files in its Knowledgebases or add links to them, then open them in their native app. You can drag and drop files from Explorer onto an Article's Attachments field or use the context menu to add links or attach files. Synchronized editing of embedded (attached) files is planned. Files with links are of course always up to date.

Surfulater version 2.00.30.0 has been released today. Download from http://www.surfulater.com

Darwin:
PDF import into Endnote is only cumbersome if you have several thousand to do "after the fact". What you do is tile an Endnote window and a file manager side by side. Then you just drag the pdf onto the title of its entry in Endnote and it is automatically copied to the Endnote managed folder and a link created to it. It's not too cumbersome if you do it as you download/save pdfs. Of course, it would be preferable, if you are having Endnote manage the pdfs, to save the pdfs directly to Endnote, but I've yet to see that as an option. Alternatively, you can simply have a link created that points to wherever you happen to have the pdf stored. This functionality was introduced several versions back... As I noted above, if I had had my head screwed on right I would have realised that having Endnote store the pdfs is silly given the size of even a small library of pdfs and that v.9 already did all that I needed...

The soft-link to your pdfs is preferable because it allows you to organise your pdfs - I had mine very organised - my folder structure goes: PDF - Journal - Year and in some cases Volume. All pdfs were saved with file names that reflect the title of the work (if I was starting again I'd put author name first, year of publication and then the title, but can't go back 7 years, sadly). Endnote just dumps the pdfs into its own folder in a common folder under the library folder. So, Endnote libraries - Library Name.data - PDF. Fortunately, desktop search technology and/or Endnote's own system mean that nothing is far from "hand". I have found this sort of liberating because occasionally I'll just save the pdf with its default name (science.pdf or article.pdf or 012387.pdf) and don't have to worry about finding it again.

Now, to get to Nod5's point finally (if I am getting it) - it would be great to have an app like Endnote scan a pdf and extract the information from it to generate the Endnote entry. I know that this would SERIOUSLY increase the size, complexity and footprint of Endnote, but it would be useful. If free search agents can scan a pdf in under a second and catalogue its content, surely an app could extract data from it to insert into a database. Of course, this would require some sort of standardised formatting on the part of publishers and authors OR the creation of some sort of tag that contains the info (a la mp3s). Fortunately, most publications now have a "download citation" feature and will auto insert everything into a new Endnote library entry. Sweet. Now if we could just save a pdf directly to Endnote...

Surfulater looks interesting - this kind of functionality is frequently requested for TexNotes/Do-Organizer, but it's been pointed out that this would significantly increase the size of the database. The developers have promised to consider it but so far nothing has come of it (at least with TexNotes - I've not really been following Do-Organizer's development). Anyway, my point being that this would be a useful feature. The issue is with portability - if one is not concerned about that, then it's not an issue.

superboyac:
One question for Endnote users:
Would you use Endnote for keeping track of your pdfs and documents if you weren't using it for research? Like, if you didn't already have Endnote for school purposes (or some other academic application) you wouldn't go out and buy it for its ability to annotate your documents, right?

I'm just trying to get a feel for who is using Endnote this way. In my case, I'd probably go with Surfulater or something a little less expensive, especially if I didn't need it for a properly formatted bibliography.

urlwolf:
Darwin & others,
Adding an entire archive of PDFs to EndNote items manually must be extremely tiresome. Given the popularity of EndNote I bet many researchers are doing just that anyway (they probably curse a lot in the process ;D).
-Nod5 (June 14, 2007, 03:47 AM)
--- End quote ---
You can bet on that.

Embedding DOI numbers according to some standard seems like an obviously smart thing. If available it would not only allow adding a pdf to the correct EndNote item but also the reverse: start with a pdf, autoresolve its DOI online, then import all the resolved metadata into an EndNote item, rename the pdf according to some format and import/link it to the EndNote item.

But it wouldn't really surprise me if DOI is not embedded at all. The electronic journal systems still seem rather ineffective and user-unfriendly.

--- End quote ---

Shameless plug: I have written about that here:
http://www.academicproductivity.com/blog/2007/on-metadata-indexing-and-mucking-around-with-pdfs/

As much of a pain as it is to manage a music collection (tags are always incomplete/wrong) a pdf collection is much worse.

A possible solution would be to implement fingerprinting for pdf (like what musicIP does) and then tag these pdfs with at least author, year, etc. Looks like the pdf specs support some tagging:

from atom prober in the comments:
PDF supports XMP. XMP allows all the dublin core metadata that Zotero, refbase, OpenOffice.org, and other products are using.

We just need to have publishers care enough to put this data in and more end-user tools to index/view/search/edit it.

--- End quote ---

Now we only need someone to code/maintain a central repository of pdf metadata, and of mappings from fingerprints to ids. Sooner or later, publishers would start incorporating metadata as well.

You know, this is not a bad idea. How many people would use it?
Then, zotero, endnote, etc could work like any mp3 tagger/player.

That'd save lots of headaches!
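A toy version of the fingerprint-to-metadata repository described above might look like the sketch below. Two loud assumptions: the "fingerprint" is just a SHA-256 hash of the file bytes, which only recognises byte-identical copies and is much cruder than the content-based fingerprinting musicIP does for audio; and the "central repository" is an in-memory dict standing in for a real shared service. The class and field names are invented.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Exact-copy fingerprint: SHA-256 hex digest of the file bytes."""
    return hashlib.sha256(data).hexdigest()


class MetadataRegistry:
    """In-memory stand-in for a central fingerprint -> metadata store."""

    def __init__(self):
        self._records = {}

    def register(self, data: bytes, metadata: dict) -> str:
        """Store metadata under the file's fingerprint and return it."""
        fp = fingerprint(data)
        self._records[fp] = dict(metadata)
        return fp

    def lookup(self, data: bytes):
        """Return the stored metadata for these bytes, or None."""
        return self._records.get(fingerprint(data))


# Fake file contents and an invented record, for illustration only.
registry = MetadataRegistry()
pdf_bytes = b"%PDF-1.4 pretend article body"
registry.register(pdf_bytes, {"author": "Example, A.", "year": 2007})

print(registry.lookup(pdf_bytes))
print(registry.lookup(b"%PDF-1.4 some other file"))
```

A real service would need a fingerprint that survives re-saving or re-downloading the same article, which is the hard part and is not attempted here.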
