Topic: N.A.N.Y. 2014 Submission: sumatra_earmarks  (Read 41244 times)

ewemoa

  • Honorary Member
  • Joined in 2008
  • Posts: 2,922
Re: N.A.N.Y. 2014 Submission: sumatra_earmarks
« Reply #25 on: November 08, 2013, 05:09 PM »
Regarding page numbering in PDFs, I'd noticed that the page number displayed on some PDFs didn't match what applications were displaying. At some point that led to some digging in the specifications. Perhaps you are already aware, but FWIW there's a bit of somewhat introductory material about this at:

  http://www.w3.org/WAI/GL/WCAG20-TECHS/PDF17.html

(Relevant sections: Description and Example 2)

More details are in "12.4.2 Page Labels" in the PDF 1.7 spec it seems.
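A rough illustration of the mechanism in that spec section: a PDF can store page label ranges, each beginning at a physical page index and carrying a numbering style, an optional prefix and a start value, and the viewer derives the label it shows for every page from those ranges. The sketch below assumes a simplified tuple layout (first_page_index, style, prefix, start) instead of the real /PageLabels number tree, and all function names are hypothetical.

```python
# Minimal sketch of deriving page labels from label ranges, modelled loosely
# on "12.4.2 Page Labels" (PDF 1.7). The tuple layout is an illustrative
# assumption, not a real library API.
# Each range: (first_page_index, style, prefix, start), 0-based page indexes.
# style: "D" decimal, "R"/"r" upper/lower roman, "A"/"a" upper/lower letters,
# None means the label consists of the prefix only.

ROMAN_TABLE = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
               (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
               (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]


def to_roman(n: int) -> str:
    """Convert a positive integer to an uppercase Roman numeral."""
    out = []
    for value, symbol in ROMAN_TABLE:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def to_letters(n: int) -> str:
    """A..Z, then AA..ZZ, then AAA..ZZZ, and so on."""
    letter = chr(ord("A") + (n - 1) % 26)
    return letter * ((n - 1) // 26 + 1)


def format_number(n: int, style) -> str:
    """Render the numeric part of a label in the given style."""
    if style == "D":
        return str(n)
    if style in ("R", "r"):
        roman = to_roman(n)
        return roman if style == "R" else roman.lower()
    if style in ("A", "a"):
        letters = to_letters(n)
        return letters if style == "A" else letters.lower()
    return ""  # no numbering style: the label is just the prefix


def page_labels(ranges, page_count: int) -> list:
    """Expand label ranges into one label per physical page.

    Assumes the first range starts at page index 0, as the spec requires.
    """
    ranges = sorted(ranges, key=lambda r: r[0])
    labels = []
    for i, (first, style, prefix, start) in enumerate(ranges):
        end = ranges[i + 1][0] if i + 1 < len(ranges) else page_count
        for offset in range(end - first):
            labels.append(prefix + format_number(start + offset, style))
    return labels


# Hypothetical 22-page document: 1-2, i-xiv, 1-4, then appendix pages A-1, A-2.
EXAMPLE_RANGES = [(0, "D", "", 1), (2, "r", "", 1), (16, "D", "", 1), (20, "D", "A-", 1)]
```

With these made-up ranges the first two pages read 1 and 2, the next fourteen read i to xiv, numbering restarts at 1, and the last two pages get appendix-style labels A-1 and A-2. This is why the label shown in a viewer and the physical page position can disagree.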



Haven't tested the new version yet but hope to -- must find appropriate PDF first :)

Nod5

  • Supporting Member
  • Joined in 2006
  • Posts: 1,169
Re: N.A.N.Y. 2014 Submission: sumatra_earmarks
« Reply #26 on: November 09, 2013, 04:48 AM »
Yes, there can be two mismatches I think:
1. the page numbers in the pdf page margins (like on a physical book page) differ from the Sumatra toolbar page number edit box.
2. the page in the Sumatra toolbar page number edit box differs from the "count number" of the current page within the pdf file (the first number in the parentheses after the edit box in my screenshot below).

s_e uses whatever is in the toolbar edit box. The latest update handles earmarks for pages like this.
[Screenshot: roman.png]
The spec you linked also shows a special appendix page number format (A-1 ...). But I haven't seen that in any pdf file yet.
« Last Edit: November 09, 2013, 04:54 AM by Nod5 »

ewemoa

  • Honorary Member
  • Joined in 2008
  • Posts: 2,922
Re: N.A.N.Y. 2014 Submission: sumatra_earmarks
« Reply #27 on: November 12, 2013, 05:16 AM »
I tried earmarking the first three entries in the bookmarks for Free as in Freedom 2.0.

The three things listed in the popup are (in order):

1.1
00v
vii

I expected to see:

v
vii
1.1


My bad -- tested with an older version.

What I see with 131108 is:

i5
i7
 i (preceded by space?)

For each case, the grey area of the popup shows:

v i5
vii i7
1.1 i



On a side note, just learned that there's a book on PDF from O'Reilly - Developing with PDF - Dive Into the Portable Document Format - with the following blurb:

PDF is becoming the standard for digital documents worldwide, but it’s not easy to learn on your own. With capabilities that let you use a variety of images and text, embed audio and video, and provide links and navigation, there’s a lot to explore. This practical guide helps you understand how to work with PDF to construct your own documents, troubleshoot problems, and even build your own tools.

Don't know much about it, but FWIW.
« Last Edit: November 12, 2013, 05:54 AM by ewemoa »

Nod5

  • Supporting Member
  • Joined in 2006
  • Posts: 1,169
Re: N.A.N.Y. 2014 Submission: sumatra_earmarks
« Reply #28 on: November 12, 2013, 09:54 AM »
> What I see with 131108 is:
> i5
> i7
>  i (preceded by space?)
>
> For each case, the grey area of the popup shows:
> v i5
> vii i7
> 1.1 i

This is as planned, except for the page numbered "1.1". s_e doesn't recognize that format. That pdf has its pages numbered in the following order: 1 2 i ii iii ... xiv 1.1 2.1 3 4 5 6 7 ... 229
I may tweak s_e to handle the 1.1 pages better. But the 1 and 2 at the start are a challenge.

Coding background: I've indexed page numbers with roman numerals as the numeral value minus 100. E.g. vii --> 7 - 100 = -93. That gets the ordering right for the next/prev jumps. Earmarks at i, vii, 2, 6, 12 make up the index -99, -93, 2, 6, 12. So if I'm at page i (-99) and jump to the next earmark, s_e correctly jumps to vii (-93), which is the next item to the right in the index. I think I can index 1.1 as 0.1 and get the right ordering. But fitting the first 1 in that pdf into this scheme is trickier. A more general worry is that people may in practice use many different page naming schemes. A general fix would have to first read *all* page numbers of each pdf, in order, into an index/array and, when the user earmarks a page, mark that page in the index. For that I'll first need to find a way to read all page numbers of any pdf using AutoHotkey.
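The ordering trick above can be sketched in Python (s_e itself is AutoHotkey, and these function names are hypothetical): arabic labels keep their value, roman labels map to their value minus 100, and next/prev jumps become a binary search over the sorted keys.

```python
import bisect

# Sketch of the "roman value minus 100" ordering; names are hypothetical.
ROMAN_DIGITS = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}


def roman_value(numeral: str) -> int:
    """Parse a Roman numeral (any case) into an integer."""
    total, largest_seen = 0, 0
    for ch in reversed(numeral.lower()):
        value = ROMAN_DIGITS[ch]
        if value < largest_seen:
            total -= value  # subtractive pair, e.g. the "i" in "iv"
        else:
            total += value
            largest_seen = value
    return total


def index_key(label: str) -> int:
    """Arabic labels keep their value; roman labels become value - 100."""
    if label.isdigit():
        return int(label)
    if label and all(ch in ROMAN_DIGITS for ch in label.lower()):
        return roman_value(label) - 100
    raise ValueError(f"unsupported page label: {label!r}")


def next_earmark(earmarks, current):
    """Return the earmark label just after `current`, or None."""
    ordered = sorted(earmarks, key=index_key)
    keys = [index_key(label) for label in ordered]
    pos = bisect.bisect_right(keys, index_key(current))
    return ordered[pos] if pos < len(ordered) else None


def prev_earmark(earmarks, current):
    """Return the earmark label just before `current`, or None."""
    ordered = sorted(earmarks, key=index_key)
    keys = [index_key(label) for label in ordered]
    pos = bisect.bisect_left(keys, index_key(current))
    return ordered[pos - 1] if pos > 0 else None
```

For earmarks at i, vii, 2, 6 and 12 the keys are -99, -93, 2, 6, 12, so next_earmark(marks, "i") returns "vii". One consequence of the offset is that it only works while roman page numbers stay below 100: ci (101) would map to 1 and collide with arabic page 1.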

edit: the pdftk command "dump_data" gives useful output. The relevant bit for faif-2.0.pdf:
PageLabelNewIndex: 1
PageLabelStart: 1
PageLabelNumStyle: DecimalArabicNumerals
PageLabelNewIndex: 3
PageLabelStart: 1
PageLabelNumStyle: LowercaseRomanNumerals
PageLabelNewIndex: 17
PageLabelStart: 1
PageLabelNumStyle: DecimalArabicNumerals
But to work with this I'd have to recode much of s_e and add pdftk.exe as a dependency. I'm also not sure pdftk dump_data works on all pdf files, and pdftk cannot handle djvu files. So I hesitate to go there.
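For reference, turning that dump_data excerpt into a per-page list of labels is mostly a matter of grouping the PageLabel* lines and expanding each range up to the next one. This is a hedged Python sketch, not what s_e does: it assumes PageLabelNewIndex is 1-based (as the excerpt suggests), takes the page count as a parameter, and treats the style names beyond the two shown in the excerpt, as well as a PageLabelPrefix key, as assumptions about pdftk's output.

```python
# Sketch only: parse PageLabel* lines from `pdftk file.pdf dump_data` output
# and expand them into one label per page.
# Assumptions: PageLabelNewIndex is 1-based; UppercaseRomanNumerals and
# PageLabelPrefix are guesses about pdftk's output format.

SAMPLE_DUMP = """\
PageLabelNewIndex: 1
PageLabelStart: 1
PageLabelNumStyle: DecimalArabicNumerals
PageLabelNewIndex: 3
PageLabelStart: 1
PageLabelNumStyle: LowercaseRomanNumerals
PageLabelNewIndex: 17
PageLabelStart: 1
PageLabelNumStyle: DecimalArabicNumerals
"""


def to_roman(n: int) -> str:
    """Convert a positive integer to an uppercase Roman numeral."""
    table = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"),
             (90, "XC"), (50, "L"), (40, "XL"), (10, "X"), (9, "IX"),
             (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, symbol in table:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)


STYLES = {
    "DecimalArabicNumerals": str,
    "LowercaseRomanNumerals": lambda n: to_roman(n).lower(),
    "UppercaseRomanNumerals": to_roman,
}


def parse_page_label_ranges(dump_text: str) -> list:
    """Collect PageLabel* fields into one dict per label range."""
    ranges = []
    for line in dump_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines without a "Key: value" shape
        key, value = key.strip(), value.strip()
        if key == "PageLabelNewIndex":
            ranges.append({"new_index": int(value), "start": 1,
                           "style": None, "prefix": ""})
        elif ranges and key == "PageLabelStart":
            ranges[-1]["start"] = int(value)
        elif ranges and key == "PageLabelNumStyle":
            ranges[-1]["style"] = value
        elif ranges and key == "PageLabelPrefix":
            ranges[-1]["prefix"] = value
    return ranges


def expand_labels(ranges: list, page_count: int) -> list:
    """One label per page; missing or unknown styles yield the prefix only."""
    ranges = sorted(ranges, key=lambda r: r["new_index"])
    labels = []
    for i, r in enumerate(ranges):
        first = r["new_index"]
        last = ranges[i + 1]["new_index"] - 1 if i + 1 < len(ranges) else page_count
        fmt = STYLES.get(r["style"], lambda n: "")
        for offset in range(last - first + 1):
            labels.append(r["prefix"] + fmt(r["start"] + offset))
    return labels
```

With a made-up page count of 20, the excerpt expands to 1, 2, i ... xiv, then 1 to 4.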

I chose to translate the roman i ii iii ... vii into a made-up format i1 i2 i3 ... i7 to avoid a lot of column spacing in the grid view if someone earmarks xxviiii.
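The compact display format can be sketched as a small conversion: parse the roman numeral and prefix its value with "i", so vii becomes i7 and the long xxviiii becomes i29, while arabic labels pass through unchanged. The helper names are hypothetical.

```python
# Hypothetical helper for the compact "i<number>" grid format.
ROMAN_DIGITS = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}


def roman_value(numeral: str) -> int:
    """Parse a Roman numeral (any case) into an integer."""
    total, largest_seen = 0, 0
    for ch in reversed(numeral.lower()):
        value = ROMAN_DIGITS[ch]
        if value < largest_seen:
            total -= value  # subtractive pair, e.g. the "i" in "ix"
        else:
            total += value
            largest_seen = value
    return total


def compact_label(label: str) -> str:
    """'vii' -> 'i7'; arabic labels are returned unchanged."""
    if label.isdigit():
        return label
    return "i" + str(roman_value(label))
```

The trade-off is readability: i7 is narrower than vii in a grid column, but the reader has to know the convention.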
« Last Edit: November 12, 2013, 10:19 AM by Nod5 »

kyrathaba

  • N.A.N.Y. Organizer
  • Honorary Member
  • Joined in 2006
  • Posts: 3,200
Re: N.A.N.Y. 2014 Submission: sumatra_earmarks
« Reply #29 on: November 18, 2013, 07:26 AM »
Nod5, I'm impressed with the sustained effort you're putting into your entry. Kudos!

Nod5

  • Supporting Member
  • Joined in 2006
  • Posts: 1,169
Re: N.A.N.Y. 2014 Submission: sumatra_earmarks
« Reply #30 on: November 20, 2013, 02:53 PM »
Thanks kyrathaba!

v131120 new db format; works with pdf files with complex pagelabels (1,2,i,ii,iii,1.1,1.2,3,4,5...)

ewemoa: I found a workaround that doesn't need pdftk. The grid should now work with any file. If an earmarked page has a pagelabel other than a roman or arabic number, s_e uses "pagecount" (the position of the page in the pdf file) in the grid for that pdf. This update breaks the database format in the .txt, so all earmarks must be created anew manually.
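The fallback rule described above amounts to a per-pdf decision: if every earmarked label is plain arabic or roman, the grid can show the labels; if any label is something else, such as 1.1, the grid shows page positions for that whole pdf. A hypothetical Python sketch, assuming each earmark is held as a (label, position) pair, which is not necessarily how s_e's .txt database is laid out.

```python
import re

# Hypothetical sketch of the "fall back to pagecount" rule.
ROMAN_RE = re.compile(r"[ivxlcdm]+", re.IGNORECASE)


def is_simple_label(label: str) -> bool:
    """True for plain arabic numbers or roman numerals."""
    return label.isdigit() or ROMAN_RE.fullmatch(label) is not None


def grid_values(earmarks: list) -> list:
    """earmarks: (page_label, page_position) pairs for one pdf.

    Show labels if all are simple; otherwise show positions for every earmark.
    """
    if all(is_simple_label(label) for label, _ in earmarks):
        return [label for label, _ in earmarks]
    return [str(position) for _, position in earmarks]
```

Switching the whole pdf rather than individual pages keeps one numbering scheme per grid row, so earmarks stay comparable.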

ewemoa

  • Honorary Member
  • Joined in 2008
  • Posts: 2,922
Re: N.A.N.Y. 2014 Submission: sumatra_earmarks
« Reply #31 on: November 20, 2013, 11:47 PM »
> v131120 new db format; works with pdf files with complex pagelabels (1,2,i,ii,iii,1.1,1.2,3,4,5...)

Seems to be working here :)

> I found a workaround that doesn't need pdftk. The grid should now work with any file. If a page is earmarked that has a pagelabel other than a roman or arabic number s_e uses "pagecount" (the position of the page in the pdf file) in the grid for that pdf.

Nice!

Nod5

  • Supporting Member
  • Joined in 2006
  • Posts: 1,169
Re: N.A.N.Y. 2014 Submission: sumatra_earmarks
« Reply #32 on: October 25, 2014, 06:21 PM »
Chances are anyone who comes here and finds sumatra_earmarks useful may also find my new sumatra_highlight_helper useful.