Downloading Banal

Banal is a Perl script that you can download directly or as part of the HotCRP source distribution:

Banal relies on another tool, pdftohtml, to parse PDF files and extract font and text-position information. Not every version of pdftohtml works well with banal; the short answer is to use the following version:

The longer answer is that several versions of pdftohtml are available online. In the past, banal used pdftohtml-0.39. Recent formatting tools, though, have started to generate PDF v1.5, which pdftohtml-0.39 cannot parse. A more recent version, pdftohtml-0.40a, does parse PDF v1.5, but it has a performance bug that makes it impractical to use. I have written a small patch that fixes the performance bug and makes pdftohtml-0.40a usable. The patch effectively reverts font handling to v0.39 semantics, which is perfectly fine for banal but may not be for other uses of pdftohtml-0.40a. The version of pdftohtml with this initial font patch is:
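To tell whether a given submission is one that pdftohtml-0.39 will choke on, a quick heuristic is to read the version from the PDF header, which is a comment of the form `%PDF-1.5` at the start of the file. The sketch below is only an illustration; the function names are hypothetical and are not part of banal. One caveat: since PDF 1.4, a `/Version` entry in the document catalog can override the header version, so a header check can underreport the real version.

```python
import re

# A PDF file starts with a header comment such as b"%PDF-1.5".
# Readers commonly tolerate it anywhere in the first 1024 bytes.
_HEADER_RE = re.compile(rb"%PDF-(\d+)\.(\d+)")


def header_version(head: bytes):
    """Hypothetical helper: parse (major, minor) from a PDF's leading bytes, or None."""
    m = _HEADER_RE.search(head[:1024])
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


def file_header_version(path):
    """Hypothetical helper: read the first 1024 bytes of a file and parse its header."""
    with open(path, "rb") as f:
        return header_version(f.read(1024))


def needs_newer_pdftohtml(version):
    """True if the header claims PDF 1.5 or later (too new for pdftohtml-0.39).

    Heuristic only: a catalog /Version entry may override the header version.
    """
    return version is not None and version >= (1, 5)
```

For example, `needs_newer_pdftohtml(file_header_version("paper.pdf"))` flags a file whose header reads `%PDF-1.5` or later.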

For better accuracy, particularly when calculating the leading (the vertical distance between the baselines of successive lines), it helps to have pdftohtml report text positions at high zoom levels. Unmodified pdftohtml caps the zoom factor at 3; the pdftohtml-0.40c version removes this limit.
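Why zoom matters can be seen with a toy model. Assume, as a simplification, that the tool reports each line's vertical position as an integer in zoomed units, i.e. `round(position_pt * zoom)`. Then each reported position is off by at most half a unit, a difference of two positions is off by at most one unit, and a leading estimate recovered from consecutive lines is off by at most `1 / zoom` points. The functions below are hypothetical illustrations of that bound, not banal's actual leading computation.

```python
def leading_estimates(leading_pt, zoom, n_lines=40, top_pt=72.3):
    """Leading estimates (in points) for each pair of consecutive lines.

    Hypothetical model: line i starts at top_pt + i * leading_pt points, and the
    extraction tool reports that position rounded to an integer after scaling by zoom.
    """
    tops = [round((top_pt + i * leading_pt) * zoom) for i in range(n_lines)]
    # Convert each integer difference back from zoomed units to points.
    return [(b - a) / zoom for a, b in zip(tops, tops[1:])]


def max_error(leading_pt, zoom):
    """Worst-case error (in points) of the per-pair leading estimates."""
    return max(abs(est - leading_pt) for est in leading_estimates(leading_pt, zoom))
```

Running `for z in (1.5, 3, 10): print(z, max_error(12.3, z))` shows the worst-case error shrinking as the zoom grows, which is why lifting the zoom cap of 3 can help.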



voelker@cs.ucsd.edu