Discussion
Terence Eden
@Edent@mastodon.social  ·  4 days ago

What's working:

✅ Download JSON of each page from Amazon.
✅ Deobfuscate the SVG "DRM".
✅ Draw each letter on the page with the correct indent, placement, and font (italics, etc).

What's mostly working:

🚧 OCR. Tesseract gets most of the text, but makes some errors.

What's not working:
❌ OCR doesn't output italics.
❌ Linebreaks are hardcoded.
❌ Doesn't integrate into the original ePub code - so no chapters etc.
❌ No idea about footnotes, images, etc.
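
One possible way to avoid hardcoded linebreaks, sketched here purely as an illustration: if each drawn letter has a known position, letters can be grouped into lines by their vertical coordinate. The `Glyph` record, its fields, the `group_into_lines` helper, and the `tolerance` value are all hypothetical; nothing here reflects the actual JSON format from Amazon or the code used in this project.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    # Hypothetical record for one drawn letter; field names are assumptions.
    char: str
    x: float  # horizontal position on the page
    y: float  # baseline position on the page


def group_into_lines(glyphs, tolerance=2.0):
    """Group glyphs into text lines by baseline, then order each line by x.

    A glyph joins the current line when its baseline is within `tolerance`
    of the first baseline seen for that line; otherwise it starts a new line.
    """
    lines = []  # list of (baseline_y, [glyphs on that line])
    for g in sorted(glyphs, key=lambda g: (g.y, g.x)):
        if lines and abs(g.y - lines[-1][0]) <= tolerance:
            lines[-1][1].append(g)
        else:
            lines.append((g.y, [g]))
    return [
        "".join(g.char for g in sorted(row, key=lambda g: g.x))
        for _, row in lines
    ]
```

The tolerance absorbs small baseline jitter between glyphs on the same line. It would need tuning per page, and superscripts such as footnote markers would likely need separate handling.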

#Kindle #DRM #OCR

Screenshot of a page of text. Indents and italics all work.
Music Channel boosted
Oblomov
@oblomov@sociale.network  ·  7 days ago
#AskFedi Is there an #OCR for #music notation? Something that can convert scanned sheet music into a standardized music notation format that can then be typeset with appropriate programs?