Required E-Pub Contents
Why should I use the .epub format?
Because it's a completely open and free standard. The .epub format is a standard for eBooks created by the International Digital Publishing Forum (IDPF). It consists of basic XHTML for the book content, XML for descriptions, and a renamed zip file to hold it all. Anyone can make these eBooks, and since they're essentially just XHTML, anyone can read them.
How to Make an ePub eBook by Hand
If you're interested, I figured out the information in this guide by a combination of reverse engineering the Sherlock Holmes book from Adobe's site, reading through the specs at the IDPF web site, and trial and error until I got a working eBook to load properly in Digital Editions. (I figure it's OK to do that since Holmes is in the public domain now...).
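What you'll need:
- A text editor. Anything that can edit text files, HTML, and XML. Example: Notepad
- A .zip program. Anything that can create .zip files, ideally one that lets you add a file without compression (the mimetype file must be stored uncompressed). Example: Windows XP's built-in .zip support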
The process can be broken down into two parts:
- Prepare the content
- Put it in the container
First, let's go check out the official specs. Yes, they're very boring and hard to follow, but aren't they all? They will come in handy later on, though. Once the basic structure of the file is set up, the official specs are handy to reference for tags that aren't used very often, or if you can't remember exactly what goes in a certain tag. Don't let them scare you: we really only have to fiddle with two XML files. The rest is either straight XHTML or files you can copy from the sample file we'll be looking at later.
(Allowed Mark-up reference for included XHTML files)
Before we start preparing our own eBook, let's look inside a sample file.
- Download the sample file to your hard drive
- Rename the .epub extension to .zip
- Open the Zip file
Great. Now what is all this stuff?
A .epub file contains, at a bare minimum, the following files/folders:
- mimetype - tells a reader/operating system what's in here
- META-INF folder - This folder contains, at minimum, the container.xml file, which tells the reader software where in the container to find the book.
- OEBPS folder - Recommended location for the book's content. It contains:
  - Images folder - images go here
  - Content.opf - XML file that lists what's in the container
  - toc.ncx - This is the table of contents
  - xhtml files - The book's contents are in these
Let's look at each of these in more detail.
Feel free to extract these files and use them as a template...
One thing to note before we get started: the filenames are case-sensitive.
This means that if you have a file named "Chapter1.xhtml" and you refer to it as "chapter1.xhtml" in the .OPF file or .NCX file, the book will not display properly.
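Since a case mismatch doesn't always produce an obvious error, it can help to compare the manifest against the zip contents before opening the book in a reader. Below is a minimal Python sketch using only the standard library; `find_missing_hrefs` is a hypothetical helper name, and it assumes an OPF 2.0 package file (namespace http://www.idpf.org/2007/opf) whose hrefs contain no URL escapes such as %20.

```python
import posixpath
import xml.etree.ElementTree as ET

# Element names in an OPF 2.0 package file live in this namespace.
OPF_NS = "{http://www.idpf.org/2007/opf}"


def find_missing_hrefs(opf_xml: str, opf_dir: str, zip_names: set[str]) -> list[str]:
    """Return manifest hrefs with no exact (case-sensitive) match in the zip."""
    root = ET.fromstring(opf_xml)
    missing = []
    for item in root.iter(OPF_NS + "item"):
        href = item.get("href", "")
        # hrefs are relative to the folder that holds the .opf file.
        full_path = posixpath.normpath(posixpath.join(opf_dir, href))
        if full_path not in zip_names:  # plain string comparison, so case matters
            missing.append(href)
    return missing
```

For the layout in this guide you would pass the text of OEBPS/Content.opf, "OEBPS" as the folder, and `set(zipfile.ZipFile("MyBook.epub").namelist())` as the zip names. Any href it returns is one the reader won't find.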
mimetype
This file is just a plain ASCII text file that contains the line "application/epub+zip" and nothing else (no trailing line break).
The operating system can look at this file to figure out what a .epub file is instead of using the file extension. This file must be the first file in the zip file, and must not be compressed.
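The mimetype rules above are easy to check with a script. This is a rough sketch using Python's standard zipfile module, not a replacement for a real validator; `check_mimetype` is a hypothetical name, and it inspects the zip's directory order, which normally matches the order the files were written.

```python
import io
import zipfile

EPUB_MIMETYPE = b"application/epub+zip"


def check_mimetype(epub_bytes: bytes) -> list[str]:
    """List problems with the mimetype entry; an empty list means it looks OK."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        entries = zf.infolist()  # zip directory order (normally the order written)
        if not entries or entries[0].filename != "mimetype":
            return ["mimetype is not the first file in the zip"]
        first = entries[0]
        if first.compress_type != zipfile.ZIP_STORED:
            problems.append("mimetype is compressed")
        if zf.read(first) != EPUB_MIMETYPE:
            problems.append("mimetype must contain exactly application/epub+zip")
    return problems
```

Usage: `check_mimetype(pathlib.Path("MyBook.epub").read_bytes())` returns an empty list when the entry is first, stored, and has the right contents.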
META-INF folder
This folder contains the container.xml file, which points to the location of the Content.opf file. This folder is the same for every e-book, so you should be able to recycle the whole folder from the sample file without making changes.
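For reference, here's roughly what a reader does with container.xml: it parses the file and follows the full-path attribute of the rootfile element to the .opf file. A minimal Python sketch; the sample container.xml below is written for this guide's layout and assumes the package file is OEBPS/Content.opf, so adjust the path (including its capitalization) to match yours.

```python
import xml.etree.ElementTree as ET

# Namespace used by META-INF/container.xml (OCF container format).
NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def find_opf_path(container_xml: str) -> str:
    """Return the path (inside the zip) of the .opf file named in container.xml."""
    root = ET.fromstring(container_xml)
    rootfile = root.find("c:rootfiles/c:rootfile", NS)
    if rootfile is None or rootfile.get("full-path") is None:
        raise ValueError("container.xml has no rootfile with a full-path")
    return rootfile.get("full-path")


SAMPLE = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/Content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""

print(find_opf_path(SAMPLE))  # OEBPS/Content.opf
```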
Images folder
If you have any images for your eBook, they go in here.
Content.opf
This file lists all the files in the .epub container, defines the reading order, and stores metadata (author, genre, publisher, etc.).
Note that this file can be named anything you want to call it, as long as the container.xml file mentioned above points to the correct filename.
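If you do rename the package file, container.xml has to be updated to match. Here's a small Python sketch that writes the file for any path; `make_container_xml` is a hypothetical helper, the path is relative to the root of the zip, and the namespace and media-type values are the ones the OCF container format defines.

```python
from xml.sax.saxutils import quoteattr


def make_container_xml(opf_path: str) -> str:
    """Build META-INF/container.xml text pointing at opf_path (a path inside the zip)."""
    # quoteattr() adds the surrounding quotes and escapes &, <, > and quotes.
    return (
        '<?xml version="1.0"?>\n'
        '<container version="1.0" '
        'xmlns="urn:oasis:names:tc:opendocument:xmlns:container">\n'
        "  <rootfiles>\n"
        f"    <rootfile full-path={quoteattr(opf_path)} "
        'media-type="application/oebps-package+xml"/>\n'
        "  </rootfiles>\n"
        "</container>\n"
    )


print(make_container_xml("OEBPS/MyBook.opf"))
```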
Lots of stuff in this file. I'll go through each required tag here. Check the specs for more information about the optional metadata tags.
dc:title - Title of the book
dc:language - Identifies the language used in the book content. The value has to be a language code that complies with RFC 3066 (for example, en-US). (I'd just copy the language tag from the sample...)
dc:identifier - This is the book's unique ID. It has to be different for every e-book. The spec doesn't recommend what to use, but an ISBN would be a good bet. I used the name of my web site and the date and time.
One thing to note: because toc.ncx has to carry this same identifier, just modify what comes after "uuid:" on this line.
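If you don't have an ISBN, a randomly generated UUID is one easy way to get an identifier that's unique to each book (this swaps in a different technique from the website-plus-date approach above; it isn't what the sample uses). A minimal Python sketch; the id="BookId" attribute is an assumption, so use whatever name the unique-identifier attribute on the sample's package tag refers to.

```python
import uuid


def new_book_id() -> str:
    """Return a fresh identifier of the form urn:uuid:<random UUID>."""
    return f"urn:uuid:{uuid.uuid4()}"


book_id = new_book_id()

# The same value goes in both files. id="BookId" is an assumed name; it must
# match the unique-identifier attribute on the <package> tag in content.opf.
opf_identifier = f'<dc:identifier id="BookId">{book_id}</dc:identifier>'
ncx_uid = f'<meta name="dtb:uid" content="{book_id}"/>'

print(opf_identifier)
print(ncx_uid)
```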
Next comes the manifest. This is just a listing of the files in the .epub container and their media types.
Each item is also assigned an item ID that's used in the spine section of content.opf. This list does not have to be in any particular order.
The spine section lists the reading order of the contents. The spine doesn't have to list every file in the manifest, just the reading order. For example, if the manifest lists images, they do not have to be listed in the spine.
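To make the manifest/spine relationship concrete, here's a Python sketch that writes both sections from one list of files. The file names, ids, and the images/ and stylesheet entries are hypothetical; the media types are the standard ones for each file type, and toc="ncx" on the spine points at the manifest id of the NCX file.

```python
from xml.sax.saxutils import quoteattr

# Hypothetical book: (manifest id, href relative to the .opf, media type).
ITEMS = [
    ("ncx", "toc.ncx", "application/x-dtbncx+xml"),
    ("css", "stylesheet.css", "text/css"),
    ("cover", "images/cover.jpg", "image/jpeg"),
    ("chapter1", "chapter1.xhtml", "application/xhtml+xml"),
    ("chapter2", "chapter2.xhtml", "application/xhtml+xml"),
]
# Reading order: only the content documents, not images, CSS, or the NCX.
READING_ORDER = ["chapter1", "chapter2"]


def manifest_and_spine(items, reading_order) -> str:
    """Return <manifest> and <spine> markup; every spine idref must be in the manifest."""
    ids = {item_id for item_id, _, _ in items}
    for idref in reading_order:
        if idref not in ids:
            raise ValueError(f"spine refers to unknown manifest id {idref!r}")
    manifest = "\n".join(
        f"    <item id={quoteattr(i)} href={quoteattr(h)} media-type={quoteattr(m)}/>"
        for i, h, m in items
    )
    spine = "\n".join(f"    <itemref idref={quoteattr(r)}/>" for r in reading_order)
    return (
        f"  <manifest>\n{manifest}\n  </manifest>\n"
        f'  <spine toc="ncx">\n{spine}\n  </spine>'
    )


print(manifest_and_spine(ITEMS, READING_ORDER))
```

Driving both sections from one list means an item can't end up in the spine without also being in the manifest.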
toc.ncx
This is the table of contents. This file controls what shows up in the left Table of Contents pane in Digital Editions.
Things you need to change:
- Make sure the uid (the content of the dtb:uid meta tag) matches the dc:identifier in content.opf
- docTitle: the text inside the text tag is what shows up as the book's title in the reader software
- The navPoint tags
Each navPoint is a chapter listing: the text is the chapter name, and the src is the file it links to.
If you copy a navPoint tag set to add chapters, make sure to update the id and playOrder values.
According to the spec, the id can be anything you want, but it's easier to keep track of things if you use the same id you used for that file in the .OPF file. Also, some readers won't properly display the Table of Contents if the ids don't match.
Also, the playOrder values have to be in sequence: an item with playOrder 1 comes before an item with playOrder 2, and so on. They also have to be listed in that order and can't have any gaps (you'll get an error if you jump from 1 to 20, etc.).
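Numbering navPoints by hand is where gaps creep in. This Python sketch generates the navMap from a list of chapters so playOrder always counts up from 1 with no gaps; the chapter ids, titles, and file names are hypothetical. In the NCX markup itself the names are camel-cased (navPoint, navLabel, playOrder), and since XML is case-sensitive they have to be written that way.

```python
from xml.sax.saxutils import escape, quoteattr


def make_nav_map(chapters) -> str:
    """chapters: list of (id, title, src) tuples, already in reading order."""
    points = []
    # enumerate(start=1) gives playOrder 1, 2, 3, ... with no gaps.
    for play_order, (nav_id, title, src) in enumerate(chapters, start=1):
        points.append(
            f'  <navPoint id={quoteattr(nav_id)} playOrder="{play_order}">\n'
            f"    <navLabel><text>{escape(title)}</text></navLabel>\n"
            f"    <content src={quoteattr(src)}/>\n"
            "  </navPoint>"
        )
    return "<navMap>\n" + "\n".join(points) + "\n</navMap>"


print(make_nav_map([
    ("chapter1", "Chapter 1", "chapter1.xhtml"),
    ("chapter2", "Chapter 2", "chapter2.xhtml"),
]))
```

Reusing the manifest ids from content.opf as the navPoint ids keeps the two files easy to cross-check, as suggested above.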
Page template (.xpgt) file
This optional file isn't part of the IDPF spec, but Adobe Digital Editions uses it for page formatting, column settings, and whatnot. You don't need it at all, but your book will look nicer in Digital Editions if you include it. Other readers should just ignore it.
Note: You can also use a .css style sheet file to lay out styles for your book. Just make sure to list it in the manifest section of Content.opf.
Also of note here, any styling should be done in a CSS stylesheet, and not in the document.
Content .xhtml files
Content files should be XHTML 1.1 documents.
If you're not familiar with XHTML, it's basically HTML in which every element has a closing tag, and several presentational (style) tags aren't supported.
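A quick way to catch unclosed tags before packaging is to run each .xhtml file through an XML parser. This Python sketch only checks that the file is well-formed XML; it doesn't check which tags the spec allows. `well_formed_error` is a hypothetical helper name.

```python
from typing import Optional
import xml.etree.ElementTree as ET


def well_formed_error(xhtml: str) -> Optional[str]:
    """Return None if xhtml parses as XML, otherwise the parser's error message."""
    try:
        ET.fromstring(xhtml)
    except ET.ParseError as err:
        return str(err)  # e.g. a mismatched tag, with line and column
    return None


print(well_formed_error("<html><body><p>Hi<br></p></body></html>"))
```

An HTML-style `<br>` fails here, while the XHTML form `<br/>` passes.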
Making the Container
Now we make the .epub container that all these files go in.
1. Create an empty .zip file with whatever name you like
2. Copy the mimetype file into the zip file (don't use compression on this file)
3. Copy the rest of the files and folders mentioned above into the zip file *
4. Re-name the .zip extension to .epub
* The specification recommends that the book's files go in an "OEBPS" folder inside the zip file. If you put them somewhere else, be sure that container.xml in the META-INF folder points to the correct location of the *.opf file.
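If you'd rather script steps 1 through 4, here's a minimal sketch using Python's standard zipfile module. It assumes all your files (mimetype, META-INF, OEBPS) sit together in one folder; `build_epub` is a hypothetical helper name. It writes mimetype first and uncompressed, then adds everything else with compression.

```python
import os
import zipfile


def build_epub(src_dir: str, epub_path: str) -> None:
    """Zip src_dir into epub_path with mimetype first and stored uncompressed."""
    with zipfile.ZipFile(epub_path, "w") as zf:
        # Steps 1-2: mimetype goes in first, with no compression.
        zf.write(os.path.join(src_dir, "mimetype"), "mimetype",
                 compress_type=zipfile.ZIP_STORED)
        # Step 3: everything else, using forward slashes inside the zip.
        for folder, dirs, files in os.walk(src_dir):
            dirs.sort()  # deterministic order
            for name in sorted(files):
                full_path = os.path.join(folder, name)
                arcname = os.path.relpath(full_path, src_dir).replace(os.sep, "/")
                if arcname == "mimetype":
                    continue  # already added above
                zf.write(full_path, arcname, compress_type=zipfile.ZIP_DEFLATED)
```

Usage: `build_epub("MyBook", "MyBook.epub")`. Writing straight to a .epub name takes care of step 4. Keep the output file outside the source folder so the book doesn't get zipped into itself.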
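The zip file layout should look something like this:
- mimetype
- META-INF
  - container.xml
- OEBPS
  - Images
  - Content.opf
  - toc.ncx
  - the .xhtml content files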
You should now be able to open your eBook in Adobe Digital Editions, or any other reader that supports the .epub format.
If you want to cheat, download the file below. It's a zip file with empty chapter pages and pre-filled content and toc files, so all you have to do is paste your content into the empty files and modify the OPF and NCX files.
Checking your ePub file
So you've made a sample ePub book, and it won't open, or it opens with an error, or looks funky. What now?
epubcheck is a program that will scan your ePub file and display any errors it finds in the book.