General Software Discussion / Re: General brainstorming for Note-taking software
« on: June 14, 2007, 03:33 PM »
urlwolf,
"A possible solution would be to implement fingerprinting for pdf (like what musicIP does) [---] Now we only need someone to code/maintain a central repository of pdf metadata, and mappings fingerprints -> ids."
This is a great idea that completely bypasses the need for DOI extraction. One way I can see it happening would be if some popular application like Zotero implemented this as an opt-in feature that works automatically in the background. That is, every time someone downloads both article metadata and a PDF through Zotero, Zotero silently uploads the PDF fingerprint and the matching metadata to some server. As the database grows, downloading just the PDF file will be enough, since the metadata is already available in the open archive. Zotero seems like the kind of tool that is innovative and community-driven enough to pioneer something like that.
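To make the idea a bit more concrete, here is a rough Python sketch of the fingerprint -> metadata repository. Everything in it is an assumption for illustration: the names fingerprint and FingerprintRegistry are made up, the "server" is just an in-memory dict, and the fingerprint is a plain SHA-256 hash of the file bytes, which is much cruder than what musicIP does for audio and is not how Zotero works.

```python
import hashlib


def fingerprint(pdf_bytes: bytes) -> str:
    """Return a hex fingerprint of the raw PDF bytes.

    Assumption: a plain SHA-256 content hash stands in for a real
    fingerprinting scheme. It only matches byte-identical files.
    """
    return hashlib.sha256(pdf_bytes).hexdigest()


class FingerprintRegistry:
    """Hypothetical central repository: fingerprint -> article metadata."""

    def __init__(self):
        self._records = {}

    def submit(self, pdf_bytes: bytes, metadata: dict) -> str:
        """Store metadata for a PDF; the first submission for a fingerprint wins."""
        fp = fingerprint(pdf_bytes)
        self._records.setdefault(fp, dict(metadata))
        return fp

    def lookup(self, pdf_bytes: bytes):
        """Return the stored metadata for this PDF, or None if it is unknown."""
        return self._records.get(fingerprint(pdf_bytes))
```

A client would call submit() whenever it has both the PDF and its metadata, and lookup() when it only has the PDF. Because this sketch hashes the exact bytes, it only recognizes identical copies of a file.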
One problem might be protected metadata. Some journals require a login just to see the abstracts, for instance, so I'm not sure about the legality of storing such abstracts in an alternative, open archive. One way around that would be to match and archive only PDF fingerprints and DOI numbers, and then let Zotero and similar tools implement some way to automatically resolve the DOI later and grab metadata from the resolved article page (including the abstract, if the user is authenticated to see it). Another advantage of piggybacking on the DOI system like this is that the archive never risks having outdated article links. Another problem is that journal publishers may change the PDF files from time to time. But perhaps that can be solved just by letting the archive map multiple fingerprints to the same metadata.
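As a sketch of that DOI-only variant, again in Python and again with made-up names (DoiArchive, resolver_url) and a made-up example DOI: the archive keeps nothing but fingerprint -> DOI pairs, several fingerprints may point at the same DOI, and the client builds a link through the public doi.org resolver and fetches the metadata itself. The fetching step is left out because it depends on each publisher's page.

```python
from urllib.parse import quote


class DoiArchive:
    """Hypothetical archive that stores only fingerprint -> DOI mappings."""

    def __init__(self):
        self._doi_by_fingerprint = {}

    def register(self, fp: str, doi: str) -> None:
        """Map a fingerprint to a DOI; many fingerprints may share one DOI."""
        self._doi_by_fingerprint[fp] = doi

    def doi_for(self, fp: str):
        """Return the DOI for a fingerprint, or None if it is unknown."""
        return self._doi_by_fingerprint.get(fp)

    def fingerprints_for(self, doi: str) -> list:
        """Return all known fingerprints (PDF revisions) for a DOI, sorted."""
        return sorted(
            fp for fp, d in self._doi_by_fingerprint.items() if d == doi
        )


def resolver_url(doi: str) -> str:
    """Build a link through the doi.org resolver instead of storing article URLs."""
    return "https://doi.org/" + quote(doi, safe="/")
```

If a publisher replaces the PDF, the new file's fingerprint is simply registered under the same DOI, so both the old and the new file resolve to the same article. No abstracts or article URLs are stored, which is the point of the workaround.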
edit: great post on Academic Productivity also