Blog Summary (2)

You may also want to know the date on which each article was published. Looking into the HTML file again, you will find a line like this:

<h2 class='date-header'><span>12/31/2013</span></h2>

So your code, myprog.awk, might be:

BEGIN {FS = "[<>]"}
/<h2 class='date-header'>/ {date=$5}
/post-title entry-title/ {getline;print date,$3}

And you will obtain:

12/31/2013 Taxi Driver
12/22/2013 The Accused
12/15/2013 The Silence of the Lambs
12/08/2013 Nell
12/01/2013 Contact
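If you wonder why the date ends up in $5, it may help to print every field awk produces for the date-header line. The following is only a small sketch: show_fields is a name made up for this illustration, and it calls plain awk (gawk should behave the same here).

```shell
# Print every field awk sees when both "<" and ">" act as separators.
# show_fields is a hypothetical helper name, not part of myprog.awk.
show_fields() {
    awk 'BEGIN {FS = "[<>]"} {for (i = 1; i <= NF; i++) printf("$%d=[%s]\n", i, $i)}'
}

printf '%s\n' "<h2 class='date-header'><span>12/31/2013</span></h2>" | show_fields
```

The line starts with a separator and has two separators side by side, so $1 and $3 come out empty; the tag names land in $2 and $4, and the date itself is the fifth field.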

If you wish to make a list spanning many months, first create a file, say getlist.txt, containing the archive URLs:

http://yourFBblog.com/2014-01-archive.html
...
http://yourFBblog.com/2011-04-archive.html
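Typing several dozen lines by hand is tedious, so you might let awk write the file for you. This is a rough sketch that assumes every monthly archive follows the YYYY-MM-archive.html pattern of the placeholder URLs above and that the range is the one shown (2014-01 back to 2011-04); make_list is a made-up name, and the base URL and the two end points would need adjusting for a real blog.

```shell
# Print one archive URL per month, newest first, from 2014-01 back to 2011-04.
# The base URL and file-name pattern are the placeholders used in this post.
make_list() {
    awk 'BEGIN {
        for (y = 2014; y >= 2011; y--)
            for (m = 12; m >= 1; m--) {
                if (y == 2014 && m > 1) continue    # nothing after 2014-01
                if (y == 2011 && m < 4) continue    # nothing before 2011-04
                printf("http://yourFBblog.com/%d-%02d-archive.html\n", y, m)
            }
    }'
}

make_list > getlist.txt
```

Because the program consists of a BEGIN block only, awk does not wait for any input; it prints the URLs and exits.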

Then download all the files and process them:

% wget -i getlist.txt
% gawk -f myprog.awk 201*-archive.html | cat -n

The result will be something like this:

  1  29/01/2014 My Latest Post
...
402  01/04/2011 My First Post

Actually, my current version of myprog.awk is:

BEGIN {FS = "[<>]"}
/<h2 class='date-header'>/ {
    date=$5; if(length(date)==9) date="0" date   # pad a one-digit month
    month=substr(date,1,2); day=substr(date,4,2); year=substr(date,7,4)
}
/post-title entry-title/ {getline;printf("%s-%s-%s %s\n",year,month,day,$3)}

This minor modification was required because the months from January to September are written with one digit (1 to 9), while October to December take two digits (10 to 12). Padding the short form with a leading zero keeps the substr() offsets the same in both cases. The output is therefore something like:

2013-12-31 Auld Lang Syne
2014-01-01 DX Pedition to Mars
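An alternative you might try is to split the date on "/" instead of counting characters, and let printf do the zero padding. This is only a sketch, not the program I actually use: to_iso is a made-up name, and it assumes the date is always month/day/year separated by slashes, as in the samples above.

```shell
# Reorder month/day/year into yyyy-mm-dd; %02d zero-pads month and day alike.
# to_iso is a hypothetical helper name used only for this illustration.
to_iso() {
    awk '{split($1, p, "/"); printf("%04d-%02d-%02d\n", p[3], p[1], p[2])}'
}

printf '%s\n' 1/05/2014 12/31/2013 | to_iso
```

The two sample dates come out as 2014-01-05 and 2013-12-31. Since the padding no longer depends on the length of the string, a one-digit day would be handled in the same way as a one-digit month.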
