CustomCleanerPlus: A custom cleanup utility for both epubs and imported html docs.
Requirements
Plugin Type: Edit
Minimum Sigil requirement: v0.9.3 or higher
Python Requirements: Python 3.4+ (Bundled or External)
OS Requirements: Windows 7, 8, 10 or Linux
*** Tested on Windows 7, 8 & 10 and Ubuntu ***
*** Untested on OSX ***
Current Version: "0.1.1"
Installation
* Select Manage Plugins from the Plugins menu. In the dialog box, select either the Bundled Python or the External Python (Python 3.4+ must be installed on your computer to run this plugin externally).
* Click Add Plugin and select CustomCleanerPlus_vXXX.zip. This will load and install the plugin into Sigil, which you can then select and run using Plugins > Edit > CustomCleanerPlus.
Description
This is an edit plugin that can be used to clean up both epubs and imported html docs. It also transforms the html code to help ensure proper xhtml compliance with the epub 2 format.
For epubs, this plugin is best run after an epub converter has been used, to remove any dross or non-compliant proprietary data still remaining in the epub's html and stylesheet(s).
For those folks who prefer crafting their epubs from an html doc imported into Sigil as their start point, this plugin should also prove useful. Use this plugin only with html documents derived from the following document types:
Word doc and docx (filtered html only), ODT (OpenOffice and LibreOffice docs only), GoogleDoc (as html, zipped) and AbiWord.
To load an html doc into Sigil, first open Sigil and go to Edit > Tools > Preferences > General Settings > Mend XHTML Source Code On: and set this to Open and then save it (you only have to do this once). Now you can load the html file in the normal way using File > Open (ensure File Types is set to html).
Features
Automatic Tasks (applied to both epubs and imported html docs):
-- Thoroughly cleans out and reformats all html files
-- Removes or changes all unneeded or non-compliant proprietary data in the html
-- Trims the epub stylesheet(s) - removes any unneeded or redundant class properties from the css
-- Ensures that all ebook image formatting is epub 2 compliant
-- Adds CSS globals and presets both for compliance and to help avoid KDP Look Inside issues
-- Removes all hard line breaks (blank lines) caused by the enter key
-- Removes all html tags that are empty or that contain just spaces
-- Removes all tabs
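
The last three automatic tasks above can be illustrated with a minimal Python sketch. This is NOT the plugin's own code: the function name basic_clean and the list of tags checked are assumptions made for illustration, and the real plugin relies on its own html handling rather than plain regular expressions.

```python
import re

# Hypothetical tag list (an assumption, not taken from the plugin).
# Matches an element from this list whose content is empty or whitespace only.
EMPTY_TAG = re.compile(
    r"<(p|span|div|h[1-6]|i|b|em|strong|u|s)\b[^>]*>\s*</\1>",
    re.IGNORECASE,
)


def basic_clean(html):
    """Strip tabs, drop empty elements and remove blank lines (sketch only)."""
    # Remove all tab characters.
    html = html.replace("\t", "")

    # Repeat until nothing changes, so nested empties such as
    # <p><span> </span></p> are removed from the inside out.
    previous = None
    while previous != html:
        previous = html
        html = EMPTY_TAG.sub("", html)

    # Drop lines that contain only whitespace.
    lines = [line for line in html.splitlines() if line.strip()]
    return "\n".join(lines)


sample = "<p>\tHello</p>\n\n<p><span> </span></p>\n<p>World</p>\n"
print(basic_clean(sample))
```

Regular expressions are used here only to keep the sketch short; they are fragile on real-world html, which is one reason to treat this as an illustration rather than a replacement for the plugin.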
Cleanup Options (via dialog):
-- Convert all <i>, <b>, <em>, <u>, <s> and <strong> tags to styled <span> tags
-- Convert all ebook text and headings to default serif throughout
-- Reformat ebook images using percentage screen values to help normalize smaller image sizes across all ereaders
-- Remove all internet link formatting
-- Remove all internal link formatting (will not remove Sigil-generated TOC link formatting in epubs)
-- Remove all bookmarks
-- Remove all <div> tags
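
As a rough illustration of the first cleanup option, the sketch below rewrites inline formatting tags as <span> tags carrying a class. It is not taken from the plugin: the class names (italic, bold, underline, strike) and the function name inline_tags_to_spans are assumptions, and any attributes on the original tags are simply discarded.

```python
import re

# Assumed class names; the plugin's real class names may differ.
TAG_CLASSES = {
    "i": "italic", "em": "italic",
    "b": "bold", "strong": "bold",
    "u": "underline",
    "s": "strike",
}

# The tag name must be followed by whitespace or ">" so that
# <img>, <br/>, <span> and similar tags are left untouched.
_OPEN = re.compile(r"<(i|b|em|u|s|strong)(\s[^>]*)?>", re.IGNORECASE)
_CLOSE = re.compile(r"</(i|b|em|u|s|strong)\s*>", re.IGNORECASE)


def inline_tags_to_spans(html):
    """Replace inline formatting tags with classed <span> tags (sketch only)."""
    def open_repl(match):
        css_class = TAG_CLASSES[match.group(1).lower()]
        return '<span class="{}">'.format(css_class)

    html = _OPEN.sub(open_repl, html)
    return _CLOSE.sub("</span>", html)


print(inline_tags_to_spans('<p><i>a</i> <strong>b</strong> <img src="x.png"/></p>'))
```

For the result to render as before, the stylesheet would also need matching rules, for example .italic { font-style: italic; } and .bold { font-weight: bold; }.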
Plugin Run
First load your epub or html doc into Sigil and then run this plugin. For epubs, ensure before you run the plugin that your epub is fully formed and contains the appropriate ebook cover, html files, xml files, stylesheet(s), images etc. After running the plugin, it is advisable to run several passes of Sigil's Tools > Delete Unused Stylesheet Classes to mop up any empty or unused classes left in your epub's stylesheet(s) after the clean up.
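
To see what that stylesheet mop-up is looking for, the following Python sketch lists class names that are defined in a stylesheet but never referenced in the html. It is only an approximation written for this README: it is not how Sigil or this plugin detects unused classes, the function name unused_classes is hypothetical, and it ignores CSS comments and other edge cases.

```python
import re

# A class selector such as ".calibre1" inside a rule's selector text.
CLASS_SELECTOR = re.compile(r"\.(-?[_a-zA-Z][-_a-zA-Z0-9]*)")
# A double-quoted class attribute such as class="one two".
CLASS_ATTR = re.compile(r'class\s*=\s*"([^"]*)"')


def unused_classes(css, html_docs):
    """Return sorted class names defined in css but unused in html_docs."""
    defined = set()
    # Only inspect the text before each "{", i.e. the selector part of a rule,
    # so values like url(image.png) inside declarations are not misread.
    for selector in re.findall(r"([^{}]+)\{", css):
        # Discard any preceding statement such as "@import ...;".
        selector = selector.rsplit(";", 1)[-1]
        defined.update(CLASS_SELECTOR.findall(selector))

    used = set()
    for html in html_docs:
        for value in CLASS_ATTR.findall(html):
            used.update(value.split())

    return sorted(defined - used)


css = "p.one { margin: 0; }\n.two { color: red; }\nspan.three, .one { font-size: 1.2em; }"
docs = ['<p class="one">x</p>', '<span class="three extra">y</span>']
print(unused_classes(css, docs))  # prints ['two']
```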
Caveats
Don't use SVG wrappers for ebook images in your epub with this plugin because they will cause problems. Unfortunately, due to a quirk in the Tidy module, these SVG problems can't be resolved. Also avoid using fake smallcaps in your doc headings, as this can cause nested <font> tag problems; it is best to add the fake smallcaps to your epub styling in Sigil after running this plugin. You will normally get the best results by using only paragraph style formatting for all text, headers and spacing in your doc. Do not use tables or captions in your html doc with this plugin.
Changes