Messages - Nod5

1126
urlwolf,
"A possible solution would be to implement fingerprinting for pdf (like what musicIP does) [---] Now we only need someone to code/maintain a central repository of pdf metadata, and mappings fingerprints -> ids."
This is a great idea that completely bypasses the need for DOI extraction. One way I can see it happening would be if some popular application like Zotero implemented it as an opt-in feature that works automatically in the background. That is, every time someone downloads both article metadata and a PDF through Zotero, Zotero silently uploads the PDF fingerprint and the matching metadata to some server. As the database grows, downloading just the PDF file will be enough, since the metadata is already available in the open archive. Zotero seems like the kind of innovative, community-driven tool that could pioneer something like that.
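
To make the idea concrete, here is a minimal Python sketch of the archive's two operations: contributing a fingerprint together with its metadata, and looking up metadata from the PDF alone. It assumes the simplest possible fingerprint, a SHA-256 hash of the file bytes. That is not what MusicIP does (its fingerprints are content-based), and an exact hash only matches byte-identical files. `FingerprintArchive` is a hypothetical in-memory stand-in for the central server, not an existing Zotero feature.

```python
from __future__ import annotations

import hashlib


def pdf_fingerprint(pdf_bytes: bytes) -> str:
    """Exact-content fingerprint (assumption): SHA-256 of the raw file bytes."""
    return hashlib.sha256(pdf_bytes).hexdigest()


class FingerprintArchive:
    """Hypothetical in-memory stand-in for the central fingerprint -> metadata repository."""

    def __init__(self) -> None:
        self._records: dict[str, dict] = {}

    def contribute(self, pdf_bytes: bytes, metadata: dict) -> str:
        """Opt-in upload, done when a client has both the PDF and its metadata."""
        fp = pdf_fingerprint(pdf_bytes)
        # Keep the first contribution; later uploads for the same file are ignored.
        self._records.setdefault(fp, dict(metadata))
        return fp

    def lookup(self, pdf_bytes: bytes) -> dict | None:
        """Used when a client only has the PDF file; None if nobody has contributed it."""
        return self._records.get(pdf_fingerprint(pdf_bytes))


archive = FingerprintArchive()
archive.contribute(b"%PDF-1.4 ...", {"title": "Example article", "doi": "10.1000/xyz123"})
print(archive.lookup(b"%PDF-1.4 ..."))  # {'title': 'Example article', 'doi': '10.1000/xyz123'}
```

Keeping the first contribution is just one possible policy; a real archive would need some way to handle conflicting uploads.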

One problem might be protected metadata. Some journals require a login just to see the abstracts, for instance, so I'm not sure it would be legal to store such abstracts in an alternative, open archive. One way around that would be to archive only PDF fingerprints and their matching DOI numbers, and then let Zotero and similar tools automatically resolve the DOI and grab the metadata from the resolved article page (including the abstract, if the user is logged in with access to it). Another advantage of piggybacking on the DOI system is that the archive never risks having outdated article links. A further problem is that journal publishers may change their PDF files from time to time. But perhaps that can be solved by letting the archive map multiple fingerprints to the same metadata.
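
The DOI-only variant could be sketched like this, under the same assumptions as before (SHA-256 of the file bytes as the fingerprint, a hypothetical in-memory index instead of a server). Only fingerprint -> DOI pairs are stored, several fingerprints may point to the same DOI, and the client gets a https://doi.org/ link to resolve with the user's own access.

```python
import hashlib
from urllib.parse import quote


class DoiFingerprintIndex:
    """Hypothetical index that stores only fingerprint -> DOI pairs, never abstracts."""

    def __init__(self) -> None:
        self._doi_by_fp = {}

    def add(self, pdf_bytes: bytes, doi: str) -> None:
        fp = hashlib.sha256(pdf_bytes).hexdigest()
        # DOI names are case-insensitive, so store one normalized form.
        # Several fingerprints may map to the same DOI (e.g. a revised PDF).
        self._doi_by_fp[fp] = doi.lower()

    def resolver_url(self, pdf_bytes: bytes):
        """Return a doi.org link for a known file, or None if it is not in the index."""
        doi = self._doi_by_fp.get(hashlib.sha256(pdf_bytes).hexdigest())
        if doi is None:
            return None
        # The client follows this link and reads the metadata from the publisher's
        # page, with whatever access the user's own login provides.
        return "https://doi.org/" + quote(doi, safe="/")


index = DoiFingerprintIndex()
index.add(b"original PDF bytes", "10.1000/XYZ123")
index.add(b"revised PDF bytes", "10.1000/xyz123")
print(index.resolver_url(b"revised PDF bytes"))  # https://doi.org/10.1000/xyz123
```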

edit: there's also a great post on this on Academic Productivity

1127
Darwin & others,
Adding an entire archive of PDFs to EndNote items manually must be extremely tiresome. Given the popularity of EndNote, I bet many researchers are doing just that anyway (they probably curse a lot in the process ;D).

So if there's no tool to programmatically add the PDFs, then maybe the smart folks at DonationCoder could look into how hard it would be to make one? If such a tool were completed, it could draw a lot of interest to DonationCoder. And hey, thanks to the time saved, it might even speed up scientific progress a (very, very, very small) bit.

The main problem to solve would be how to match the PDF files to the bibliographic items. If journal articles contain their DOI as extractable metadata, then that might be a solution. I opened a few journal articles in UltraEdit and searched the ASCII text for DOI numbers or something similar, but couldn't find anything. I also exported metadata from some sample PDF files with the free tool PDFTK. Here's one example of what I got:

InfoKey: Author
InfoValue: x
InfoKey: GTS_PDFXVersion
InfoValue: PDF/X-1:2001
InfoKey: Producer
InfoValue: Acrobat Distiller 6.0.1 for Macintosh
InfoKey: Creator
InfoValue: InDesign: pictwpstops filter 1.0
InfoKey: ModDate
InfoValue: D:20070613112031+01'00'
InfoKey: GTS_PDFXConformance
InfoValue: PDF/X-1a:2001
InfoKey: Title
InfoValue: 14.6 News 758-759.indd NS new.indd
InfoKey: CreationDate
InfoValue: D:20070612095515+01'00'
PdfID0: cee3243181236aabefbac383315fda1f
PdfID1: 3ab8e10f7074409da8a4fc9d6eb3a2
NumberOfPages: 2
PageLabelNewIndex: 1
PageLabelStart: 758
PageLabelNumStyle: DecimalArabicNumerals

The row "InfoValue: 14.6 News 758-759.indd NS new.indd" is a sort of identifier, since the PDF is from Nature Volume 447 Number 7146, News section, page 758. But it's not as good as a DOI, and other articles I tried did not even have that InfoValue. PdfID0 and PdfID1 are generated by Acrobat Distiller, I think, and do not relate to the article contents or DOI.
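
To check many files this way, the text printed by `pdftk file.pdf dump_data` could be parsed and searched for anything DOI-shaped. A rough Python sketch follows; the DOI pattern is an assumption loosely based on the regular expression Crossref suggests for modern DOIs and will miss some older forms, and `parse_pdftk_info` and `find_doi` are hypothetical helper names.

```python
import re

# Assumed DOI shape: "10." + 4-9 digits + "/" + suffix characters (case-insensitive).
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)


def parse_pdftk_info(dump: str) -> dict:
    """Collect InfoKey/InfoValue pairs and plain 'Key: value' lines from dump_data text."""
    info = {}
    pending_key = None
    for line in dump.splitlines():
        key, sep, value = line.partition(": ")
        if not sep:
            continue  # blank or unrecognized line
        if key == "InfoKey":
            pending_key = value
        elif key == "InfoValue" and pending_key is not None:
            info[pending_key] = value
            pending_key = None
        else:
            info[key] = value  # e.g. PdfID0, NumberOfPages
    return info


def find_doi(info: dict):
    """Return the first DOI-like string found in any metadata value, or None."""
    for value in info.values():
        match = DOI_PATTERN.search(value)
        if match:
            return match.group(0)
    return None


sample = """InfoKey: Title
InfoValue: 14.6 News 758-759.indd NS new.indd
NumberOfPages: 2"""
info = parse_pdftk_info(sample)
print(info["Title"])   # 14.6 News 758-759.indd NS new.indd
print(find_doi(info))  # None
```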

Embedding DOI numbers according to some standard seems like an obviously smart thing to do. If available, it would not only allow adding a PDF to the correct EndNote item but also the reverse: start with a PDF, autoresolve its DOI online, import all the resolved metadata into an EndNote item, rename the PDF according to some format and import/link it to the EndNote item.
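
The renaming step of that workflow could look like the minimal sketch below. It assumes the DOI lookup has already returned the metadata as a dictionary with `authors`, `year` and `title` keys (hypothetical names), and it uses an arbitrary "Author Year - Title.pdf" format, stripping characters that Windows does not allow in file names.

```python
import re


def pdf_filename(meta: dict, max_len: int = 150) -> str:
    """Build a file name like 'Smith 2007 - Some title.pdf' from resolved metadata."""
    authors = meta.get("authors") or ["Unknown"]
    first_author = authors[0].split(",")[0].strip()  # "Smith, J." -> "Smith"
    name = f"{first_author} {meta.get('year', 'n.d.')} - {meta.get('title', 'Untitled')}"
    # Remove characters Windows forbids in file names, then collapse whitespace.
    name = re.sub(r'[\\/:*?"<>|]', "", name)
    name = " ".join(name.split())
    return name[:max_len].rstrip() + ".pdf"


print(pdf_filename({"authors": ["Smith, J."], "year": 2007, "title": "Fingerprints: a review?"}))
# Smith 2007 - Fingerprints a review.pdf
```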

But it wouldn't really surprise me if the DOI is not embedded at all. Electronic journal systems still seem rather ineffective and user-unfriendly. As this overview states (http://hublog.hubmed.org/archives/001306.html ):
"While most of the larger publishers provided an acceptable method of authentication, the PDF files they produce are obviously not optimised for ease of use by the reader. It's almost impossible to build a tool to automatically fetch PDFs for papers [---] The implementation of all of these features could be automated with little change to the publishers' systems, but would be a major benefit to researchers struggling to deal with large amounts of literature."
(BTW, a comment on that page links to http://quosa.com/solutions.html , a tool for easy mass downloading of articles and importing them into EndNote. No price is listed on the site, only this note on discounts: http://www.quosa.org/support/helpdocs/mac/pricing.htm . So I suspect it is very expensive.)

Without DOIs in the PDFs, any programmatic matching of PDFs to EndNote items would probably have to be fuzzier.
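
A fuzzy match could be sketched with Python's standard `difflib`, comparing a title taken from each PDF (its file name or first-page text; how to obtain it is left open here) with the titles of the EndNote items. The `records` list and the 0.8 threshold are assumptions, and anything below the threshold is better left for manual matching.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lower-case, turn punctuation and underscores into spaces, collapse whitespace."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return " ".join(cleaned.split())


def best_match(pdf_title: str, records: list, threshold: float = 0.8):
    """Return the record whose title is most similar to pdf_title, or None if too dissimilar."""
    target = normalize(pdf_title)
    best, best_score = None, 0.0
    for record in records:
        score = SequenceMatcher(None, target, normalize(record["title"])).ratio()
        if score > best_score:
            best, best_score = record, score
    return best if best_score >= threshold else None


# Hypothetical items, e.g. exported from an EndNote library.
records = [
    {"id": 1, "title": "Fingerprinting PDF files"},
    {"id": 2, "title": "Digital object identifiers in practice"},
]
print(best_match("fingerprinting_pdf_files", records)["id"])  # 1
```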

1128
I also print around 50%. I'm slowly getting better at reading only on the screen though.

Armando,
"I disliked the fact that every single file was doubled: always had to make sure that I moved both files when I had to move them around, had to check both files when I renamed them, etc. ... Maybe there could be a way of doing it in a way that’s much more simple and automated."
I try to never rename the pdf files, keep them all in one giant folder and never move that folder. Then the problem doesn't occur very often.
But in a scenario where renaming files is often necessary, scripting could help. I made an AHK script you could try. Use it like this:
1. have script running
2. manually rename filename.pdf to newfilename.pdf in explorer
3. while newfilename.pdf is selected, press the script hotkey (Shift+§; change that to something that fits your keyboard)
4. select filename.txt (the tagfile) in explorer
5. press the hotkey again within 4 seconds
---> script autorenames filename.txt to newfilename.txt
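#Persistent
; Shift+§ hotkey; change it to something that fits your keyboard
+§::
IfWinNotActive, ahk_class CabinetWClass     ; only run when an Explorer window is active
 IfWinNotActive, ahk_class ExploreWClass
  return

xnow = %A_Now%
if (xtime != "")
  EnvSub, xnow, %xtime%, Seconds            ; real seconds since the first press (plain subtraction breaks across minutes)
if (xtime != "" and xnow < 5)              ; second press within the 4 second limit
{
SplitPath, xnew,,,, xnew_noext             ; stored new name without its extension
sendinput {F2}
sleep 100                                  ; give the rename box time to open
sendinput {Home}+{End}                     ; select the whole name, extension included
sendinput %xnew_noext%.txt
sendinput {Enter}
xnew =
xtime =
goto RemoveTrayTip
}
else                                       ; first press: remember the selected file's name
{
clipboard =
sendinput {F2}
sleep 100
sendinput {Home}+{End}^c                   ; copy the full name from the rename box
ClipWait, 1
sendinput {Esc}                            ; leave the PDF's name unchanged
xnew = %clipboard%
xtime = %A_Now%
TrayTip,, filename ready
SetTimer, RemoveTrayTip, 4000
}
return

RemoveTrayTip:
SetTimer, RemoveTrayTip, Off
TrayTip
return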

1129
Armando,
"If I could find another reliable strategy which is sufficiently portable (I don't want to lose years of tags and annotation by just changing software or OS), I'd give it a try..."

Well, you could do what I do with journal articles, but for other types of files too. That is, create a .txt file with the same name as the file you want tagged (perhaps with "_tag" at the end) and put the tags and keywords in that .txt. I think any indexing search tool will find those tags, and because the filenames match, you then find the file you're searching for. If your current filename tags use a consistent and unique format that a script can isolate (like [tag1 tag2 tag3] used only for tags), then you could probably write a batch script that migrates them to the ".txt system".

1130
Armando,
Filenotes looks interesting, though it hasn't been updated since 2005. But having (searchable) comments for all kinds of files is a good idea. It could make things easier to find without putting a lot of tags in a (long) filename.

BTW, for each PDF journal article I save, I also put its abstract in a .txt with the same name, mostly so that I can quickly reread the abstract later. But maybe I could now also programmatically, through some script, import all those abstracts as comments for the associated PDFs in Filenote or whatever PDF collection tool I eventually settle on! I hadn't thought about that before.

Darwin,
thanks for yet more suggestions and also for investigating the notes indexing issue in Archivarius!
