Blog to Ebook Conversion (5)

Suppose you have 3,129 articles in your blog, and each article is stored in one of the following files.

blog-entry-1.html
blog-entry-2.html
...
blog-entry-3128.html
blog-entry-3129.html

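In that case you can download all the files with a single command (-r recurses through the links, -np never ascends to the parent directory, and -o writes the log to log.txt):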

wget -r -np -o log.txt --accept "blog-entry-*.html" http://myFBblog.com/

This almost works, but when you list the files, they may appear in an order that is inconvenient for your purposes.

% ls -1 blog-entry-*.html
blog-entry-1000.html
blog-entry-1001.html
...
blog-entry-998.html
blog-entry-999.html
blog-entry-99.html
blog-entry-9.html
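
The effect is easy to reproduce on a handful of names. The sketch below is only an illustration: the three file names are made up, LC_ALL=C is set so that the comparison is strictly byte by byte (other locales may treat the punctuation differently), and sort -V (version sort) is a GNU coreutils extension that may not be available on every system.

```shell
#!/bin/bash
# Three made-up names, enough to show the problem.
names='blog-entry-2.html
blog-entry-10.html
blog-entry-100.html'

# Plain sort compares character by character, so "10" sorts before "2".
lexical=$(printf '%s\n' "$names" | LC_ALL=C sort)

# Version sort (GNU extension) compares the embedded numbers as numbers.
natural=$(printf '%s\n' "$names" | LC_ALL=C sort -V)

echo "$lexical"
echo "---"
echo "$natural"
```

With GNU sort, the first listing starts with blog-entry-10.html and the second with blog-entry-2.html. If you only need a numerically ordered listing now and then, piping through sort -V may be enough; renaming the files fixes the order for every tool.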

This happens because the numbers 1 through 3129 are not written with a fixed number of digits, so the names are compared character by character rather than numerically. Here is a short bash script that renames the files so that every number has four digits.

#!/bin/bash
FULL_LENGTH=20          # length of a fully padded name such as blog-entry-0001.html
MV=/bin/echo            # dry run; change this to /bin/mv if you are sure.
myzero=(dummy 0 00 000) # index = number of missing digits; index 0 is never used
for file in blog-entry-*.html
do
 len=${#file}
 if [ "$len" -ne "$FULL_LENGTH" ]
 then
  short=$((FULL_LENGTH - len))   # how many zeros to insert (1 to 3)
  newfile=${file/entry-/entry-${myzero[$short]}}
  $MV "$file" "$newfile"
 fi
done
exit 0
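
The script above depends on the padded name being exactly 20 characters long. A variant that pads the number itself with printf might be sketched as follows. The function name padded_name and the variable WIDTH are made up for this illustration, it assumes that only digits sit between "blog-entry-" and ".html", and like the original it only echoes what it would do until you change MV.

```shell
#!/bin/bash
WIDTH=4     # desired number of digits (assumption: four, as above)
MV=echo     # dry run; change this to mv once the output looks right

# Print the zero-padded version of a name such as blog-entry-12.html.
padded_name() {
  local num=${1#blog-entry-}   # strip the prefix -> 12.html
  num=${num%.html}             # strip the suffix -> 12
  # Give up on anything that is not purely digits.
  case $num in ''|*[!0-9]*) return 1 ;; esac
  # 10# forces base 10, so an already padded 0008 is not read as octal.
  printf 'blog-entry-%0*d.html\n' "$WIDTH" "$((10#$num))"
}

for file in blog-entry-*.html
do
  [ -e "$file" ] || continue                 # nothing matched the pattern
  newfile=$(padded_name "$file") || continue # skip unexpected names
  [ "$file" = "$newfile" ] || $MV "$file" "$newfile"
done
```

Because the width is a parameter, the same sketch would cover a blog with more than 9,999 entries by setting WIDTH=5, and running it a second time leaves already padded names alone.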

After running the script with MV set to /bin/mv, you will have:

% ls -1 blog-entry-*.html
blog-entry-0001.html
blog-entry-0002.html
...
blog-entry-3128.html
blog-entry-3129.html
