Channel: MobileRead Forums - Kindle Formats

On repairing defective apnx files.

(Please don't use this thread to vent that ebooks shouldn't have page numbers, what the numbering scheme should be, or other off topic posts.)

Three of the books that I've read recently with Amazon-supplied page numbers started out OK, but the page numbers went wrong near the end. My best guess is that books with extensive notes, bibliographies, etc. are prone to having HREFs that look like page anchors to whatever tools publishers use to generate the <pageList> section in toc.ncx, and that this corrupts that file.

Details:

A Brief History of Everyone Who Ever Lived by Adam Rutherford Sep 2016, Weidenfeld & Nicolson

Utopia for Realists by Rutger Bregman March 2017, Little, Brown

Bad Blood by John Carreyrou May 2018, Knopf

These aren't cheap books, but sometimes they are on sale. I also assume that they have been out long enough not to have long wait lists at libraries. I'm curious whether the commercial EPUB versions have page number problems. (I don't check out ebooks from libraries.) I did check out paper versions of all three to compare page numbers. The Utopia for Realists paper book had significantly lower page numbers than the Amazon-supplied numbers throughout the book. Bizarrely, the page anchors in the ebook matched the paper book, as did the fixed ebook page numbers.

I thought it might be possible to repair the apnx files, but that it would be difficult to figure out how, and tedious and time consuming to do. Then I saw post#2 by Doitsu in this thread: https://www.mobileread.com/forums/sh...d.php?t=255926

I don't think kindleunpack has an option to make an epub whose toc.ncx has a <pageList> section, but it turned out to be relatively easy to use kindleunpack -> (some regex and scripting) -> kindlegen -> kindleunpack to get repaired apnx files.

The first step is to look at some of the Text/part0*.xhtml files to learn the form of page anchors used in the book. Next, make a list of file name and anchor id pairs, remove anything fishy from the list, and use it to generate a <pageList> section to insert just ahead of the closing </ncx> tag in toc.ncx. Then run kindlegen on the augmented EPUB. The only thing you need from the fat mobi is the apnx file, which should be renamed to match the one supplied by Amazon for the book.
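For anyone who wants to script the anchor-collecting step, here is a rough Python sketch. It is not the method used above, just one way to do it. The anchor form `<a id="page_123">` is purely an assumption for illustration; the real form varies by book, so inspect the XHTML first and adjust the regex. The function names are made up for this example.

```python
import re
from pathlib import Path

# ASSUMPTION: anchors look like <a id="page_123"></a>. Check the book's
# Text/part0*.xhtml files and change this pattern to match what you find.
PAGE_ANCHOR = re.compile(r'<a\s+id="(page_(\d+))"')


def anchors_in_text(file_name, xhtml, pattern=PAGE_ANCHOR):
    """Return (file_name, anchor_id, page_label) tuples in document order."""
    return [(file_name, m.group(1), m.group(2)) for m in pattern.finditer(xhtml)]


def anchors_in_dir(text_dir, pattern=PAGE_ANCHOR):
    """Collect anchors from every part*.xhtml file in text_dir, sorted by name."""
    entries = []
    for path in sorted(Path(text_dir).glob("part*.xhtml")):
        xhtml = path.read_text(encoding="utf-8")
        entries.extend(anchors_in_text(path.name, xhtml, pattern))
    return entries


def suspicious(entries):
    """Flag entries whose page number does not increase.

    These are only candidates for the "anything fishy" check; review them
    by hand before deleting anything.
    """
    flagged = []
    last = 0
    for entry in entries:
        number = int(entry[2])
        if number <= last:
            flagged.append(entry)
        else:
            last = number
    return flagged
```

Calling `anchors_in_dir("Text")` from the unpacked EPUB's content directory gives the list, and `suspicious(...)` points at duplicates or out-of-order numbers that are worth a look.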

The really good news is that the new apnx file can be copied straight to the sdr directory for the book, overwriting the existing file. (I did this with the book closed on the kindle.) This doesn't seem to faze the kindle at all. The next time the book is opened, the page numbers are correct.
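If you would rather script the copy than do it by hand, something like the following Python sketch should work. It assumes the Kindle is mounted as a USB drive and that the book's sdr directory contains exactly one apnx file; `install_apnx` and the paths are hypothetical names for this example. It keeps a copy of the original file somewhere else first, which the manual procedure above does not require but seems prudent.

```python
import shutil
from pathlib import Path


def install_apnx(new_apnx, sdr_dir, backup_dir=None):
    """Overwrite the book's existing apnx file with new_apnx.

    The new file takes the name of the existing one, so no separate rename
    step is needed. If backup_dir is given, the old file is saved there first.
    """
    existing = sorted(Path(sdr_dir).glob("*.apnx"))
    if len(existing) != 1:
        raise ValueError(f"expected exactly one apnx file, found {len(existing)}")
    target = existing[0]
    if backup_dir is not None:
        backup = Path(backup_dir)
        backup.mkdir(parents=True, exist_ok=True)
        shutil.copy2(target, backup / target.name)
    shutil.copyfile(new_apnx, target)
    return target
```

As with the manual copy, do this with the book closed on the Kindle.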

Attached is a Perl script to generate a <pageList> section from a list of file name and page anchor id pairs.
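For readers who prefer Python, here is a rough sketch of the same idea. It is not a port of the attached script. It assumes each entry is a (file name, anchor id, page label) tuple, that the XHTML files live in a Text/ directory next to toc.ncx, and that every page is a "normal" numbered page. The playOrder attribute is left out on the assumption that kindlegen does not need it; add it if your tools complain. The function names are invented for this example.

```python
from xml.sax.saxutils import escape, quoteattr


def build_page_list(entries, text_prefix="Text/"):
    """Build a <pageList> section from (file_name, anchor_id, label) tuples."""
    lines = ["<pageList>"]
    for index, (file_name, anchor_id, label) in enumerate(entries, start=1):
        src = f"{text_prefix}{file_name}#{anchor_id}"
        # Index-based ids avoid clashes if two files reuse the same anchor id.
        lines.append(
            f'  <pageTarget id="pagetarget{index}" type="normal" '
            f"value={quoteattr(label)}>"
        )
        lines.append(f"    <navLabel><text>{escape(label)}</text></navLabel>")
        lines.append(f"    <content src={quoteattr(src)}/>")
        lines.append("  </pageTarget>")
    lines.append("</pageList>")
    return "\n".join(lines)


def insert_page_list(ncx_text, page_list):
    """Insert page_list just ahead of the last closing </ncx> tag."""
    head, closing, tail = ncx_text.rpartition("</ncx>")
    if not closing:
        raise ValueError("no closing </ncx> tag found")
    return f"{head}{page_list}\n{closing}{tail}"
```

Read toc.ncx as text, pass it through `insert_page_list`, write it back, and then run kindlegen on the result.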

Attached Files
File Type: gz gen_pagelist.pl.gz (392 Bytes)
