Blog to Ebook Conversion (3)

You could keep all the text of your ebook in one large file, but usually we create a separate file for each chapter so that ebook readers can handle and navigate the book more easily.

If each chapter holds all the articles posted in one month, the following AWK program is one way to split the single large file into smaller files, one per month.

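# Usage: awk -f split.awk book.html   (split.awk and book.html are example names)
# Writes xaa.html, xab.html, ... with one month of articles per file.
BEGIN {
  outfile = "x"
  letters = "abcdefghijklmnopqrstuvwxyz"  # portable replacement for ord()/chr(), which POSIX awk lacks
  s1 = "a"
  s2 = "a"
  out = (outfile s1 s2 ".html")
  monthb4 = ""
}
/<h1>/ {
  # "<h1>8/22/2011</h1>" splits into "<h1>8", "22", "2011<", "h1>" (array begins with [1])
  split($0, array, "/")
  month = array[1] "/" array[3]       # month plus year, so 8/2011 and 8/2012 differ
  if (monthb4 != month) {
    monthb4 = month
    if (NR != 1) {
      close(out)
      # advance the two-letter suffix: aa, ab, ..., az, ba, ..., zz
      if (s2 == "z") {
        if (s1 == "z") exit 1          # no names left after xzz.html
        s1 = substr(letters, index(letters, s1) + 1, 1)
        s2 = "a"
      } else
        s2 = substr(letters, index(letters, s2) + 1, 1)
      out = (outfile s1 s2 ".html")
    }
  }
}
{ print > out }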

The program assumes that the original file is something like this:

<h1>8/22/2011</h1>
<h2>My first article.</h2>
...
<h1>9/01/2011</h1>
<h2>My second article.</h2>
...
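To see what the program actually compares, you can feed one heading line to awk's split(). Because the closing </h1> tag also contains a slash, the line breaks into four fields, and array[1] is "<h1>8": the month with the opening tag still attached. That prefix is the same on every heading, so comparing array[1] is still enough to notice a change of month. A small sketch; show_fields is a hypothetical helper name, not part of the original program:

```shell
# Print each field that awk's split() produces when a heading line is split on "/".
show_fields() {
  printf '%s\n' "$1" | awk '{
    n = split($0, a, "/")
    for (i = 1; i <= n; i++) print i ": " a[i]
  }'
}

show_fields '<h1>8/22/2011</h1>'
```

This prints four lines: "1: <h1>8", "2: 22", "3: 2011<" and "4: h1>".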

The first number on each line with an h1 tag is the month in which the article was posted. Each time the program detects a change of month, it closes the current output file and starts writing to the next one, so the files are named “xaa.html”, “xab.html”, and so on.
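The same file names can also be computed from a running count of chapters, because the two-letter suffix is simply a base-26 number (aa = 0, ab = 1, ..., az = 25, ba = 26, ..., zz = 675). Below is a minimal sketch of that variant, assuming POSIX awk run from a shell; the sample file book.html, its contents, and the counter n are illustrative assumptions, not part of the original program. It keys each chapter on month and year, and any text before the first heading stays in xaa.html together with the first month.

```shell
#!/bin/sh
# Work in a scratch directory so the generated x??.html files land there.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample input in the format described above.
cat > book.html <<'EOF'
<h1>8/22/2011</h1>
<h2>My first article.</h2>
<h1>8/30/2011</h1>
<h2>Another August article.</h2>
<h1>9/01/2011</h1>
<h2>My second article.</h2>
EOF

awk '
BEGIN { letters = "abcdefghijklmnopqrstuvwxyz"; n = 0; key = "" }
/<h1>/ {
  split($0, a, "/")
  # a[1] is "<h1>8" (month), a[3] is "2011<" (year); together they identify the month
  k = a[1] "/" a[3]
  if (key != "" && key != k) {
    close(out)               # finish the previous month
    n++                      # move on to the next file number
    if (n > 675) exit 1      # only aa..zz (676 names) are available
  }
  key = k
}
{
  # file number n (from 0) gets the suffix letters[n / 26] letters[n % 26]
  out = "x" substr(letters, int(n / 26) + 1, 1) substr(letters, n % 26 + 1, 1) ".html"
  print > out
}
' book.html

ls x*.html
```

Run as written, it lists xaa.html and xab.html: the two August articles go to the first file and the September article to the second. Deriving the name from a counter avoids the character arithmetic entirely, at the cost of a division per line.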