Don't know how much help I might be, but I've taken a brief look at format.pl.
One thing I wondered about was its specification of utf8 in the input and output from/to files:
TXT, "<:utf8", "page_txt.txt";
BOX, "<:utf8", "page_box.txt";
I'm not experienced with AHK in that area, but I have the sense that it might be tricky (or at least not trivial) to reproduce.
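For reference, here is a small Perl sketch of what I understand the `:utf8` layer to do, so it's clear what an AHK version would have to reproduce. This is not taken from format.pl: the file name utf8_demo.txt and the sample string are made up for the demo, and I'm assuming the quoted lines are ordinary three-argument open calls.

```perl
use strict;
use warnings;

# Hypothetical demo file; format.pl itself reads page_txt.txt and page_box.txt.
my $path = "utf8_demo.txt";

# Write one non-ASCII character (e with acute accent) through the :utf8 layer,
# which encodes it as two bytes on disk.
open(my $out, ">:utf8", $path) or die "Cannot write $path: $!";
print $out "caf\x{E9}";
close($out);

# Read it back through :utf8, as format.pl does: bytes are decoded to characters.
open(my $in, "<:utf8", $path) or die "Cannot read $path: $!";
my $decoded = <$in>;
close($in);

# Read the same file as raw bytes for comparison.
open(my $raw, "<:raw", $path) or die "Cannot read $path: $!";
my $bytes = <$raw>;
close($raw);

my $decoded_len = length $decoded;   # number of characters
my $bytes_len   = length $bytes;     # number of bytes

print "decoded length: $decoded_len\n";   # prints 4
print "raw length: $bytes_len\n";         # prints 5

unlink $path;
```

The point is that anything in the script that counts or indexes characters would give different results if the AHK version read the files as plain bytes, so the port would need to decode the input as UTF-8 in some way.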
My impression of the rest of the script so far is that it doesn't look too bad. Maybe if we can get close enough, help will arrive for any remaining difficulties.
Do you have any sample input files and corresponding expected output that I might test with?