What's working:
✅ Download JSON of each page from Amazon.
✅ Deobfuscate the SVG "DRM".
✅ Draw each letter on the page with the correct indent, placement, and font (italics, etc.).
What's mostly working:
🚧 OCR. Tesseract gets most of the text, but makes some errors.
What's not working:
❌ OCR doesn't output italics.
❌ Line breaks are hardcoded.
❌ Doesn't integrate with the original ePub code, so no chapters etc.
❌ No idea about footnotes, images, etc.
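
For the hardcoded line breaks, one possible starting point is a small reflow pass over the OCR text. This is only a sketch, not something the current code does: `reflow` is a hypothetical helper, and it assumes that a blank line in the OCR output marks a paragraph break and that a line ending in a hyphen is a word split across two lines. That second assumption will wrongly merge genuinely hyphenated words that happen to fall at the end of a line.

```python
import re


def reflow(ocr_text: str) -> str:
    """Join hard-wrapped OCR lines into paragraphs (hypothetical helper).

    Assumptions: a blank line separates paragraphs, and a trailing
    hyphen means the word continues on the next line.
    """
    paragraphs = []
    # Split the text into blocks wherever there is a blank line.
    for block in re.split(r"\n\s*\n", ocr_text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        text = ""
        for line in lines:
            if text.endswith("-"):
                # Rejoin a word that was split across two lines.
                text = text[:-1] + line
            elif text:
                # Ordinary wrap: replace the line break with a space.
                text += " " + line
            else:
                text = line
        paragraphs.append(text)
    return "\n\n".join(paragraphs)
```

Feeding it the raw text from Tesseract should give one line per paragraph, which is closer to what an ePub wants than fixed page lines.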