Or you may also wish to know the date on which each article was published. Looking into the HTML file again, you will see a line like this:
<h2 class='date-header'><span>12/31/2013</span></h2>
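The field number used for the date comes from splitting this line at every < and >. The sketch below prints each field that awk produces with FS = "[<>]". The function name show_fields is hypothetical and exists only for illustration.

```shell
#!/bin/sh
# show_fields: a hypothetical helper that prints each field awk sees
# when a line is split with FS = "[<>]".
show_fields() {
  printf '%s\n' "$1" |
    awk 'BEGIN { FS = "[<>]" }
         { for (i = 1; i <= NF; i++) printf "$%d = \"%s\"\n", i, $i }'
}

line="<h2 class='date-header'><span>12/31/2013</span></h2>"
show_fields "$line"
```

For this line it prints nine fields, and the fifth is $5 = "12/31/2013". $2, $4, $6 and $8 hold the tag contents (h2 class='date-header', span, /span, /h2). The remaining fields are empty because they come from the leading <, the trailing >, or adjacent delimiters such as "><".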
So your code, myprog.awk, might be:
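BEGIN { FS = "[<>]" }                                  # split every line at < and >
/<h2 class='date-header'>/ { date = $5 }               # $5 is the text inside <span>...</span>
/post-title entry-title/ { getline; print date, $3 }   # the title is field 3 of the next line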
And you will obtain:
12/31/2013 Taxi Driver
12/22/2013 The Accused
12/15/2013 The Silence of the Lambs
12/08/2013 Nell
12/01/2013 Contact
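You can try the script without downloading anything by running it on a small hand-made page. The sketch below writes myprog.awk and a file called sample.html, then runs plain awk on it (gawk behaves the same here). The sample markup is an assumption about the archive page layout, in particular the <h3 class='post-title entry-title'> line followed by the linked title on the next line. Check it against your own HTML before relying on it.

```shell
#!/bin/sh
# Write the three-rule program from the text to myprog.awk.
cat > myprog.awk <<'EOF'
BEGIN { FS = "[<>]" }
/<h2 class='date-header'>/ { date = $5 }
/post-title entry-title/ { getline; print date, $3 }
EOF

# A hypothetical fragment of an archive page (the title markup is assumed).
cat > sample.html <<'EOF'
<h2 class='date-header'><span>12/31/2013</span></h2>
<h3 class='post-title entry-title'>
<a href='post1.html'>Taxi Driver</a>
</h3>
<h2 class='date-header'><span>12/22/2013</span></h2>
<h3 class='post-title entry-title'>
<a href='post2.html'>The Accused</a>
</h3>
EOF

awk -f myprog.awk sample.html
```

With this sample it prints "12/31/2013 Taxi Driver" and "12/22/2013 The Accused", one per line.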
If you want a list spanning many months, first make a file, say getlist.txt, that contains the URL of each monthly archive page:
http://yourFBblog.com/2014-01-archive.html
...
http://yourFBblog.com/2011-04-archive.html
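Typing every month by hand is tedious. A small POSIX shell loop can write getlist.txt for you, newest month first. This assumes there is one archive page per month and that the URLs follow the pattern above.

```shell
#!/bin/sh
# Write one archive URL per month, newest first, from 2014-01 back to 2011-04.
# Assumes every month has an archive page named YYYY-MM-archive.html.
y=2014
m=1
: > getlist.txt                      # start with an empty file
while [ "$y" -gt 2011 ] || { [ "$y" -eq 2011 ] && [ "$m" -ge 4 ]; }; do
  printf 'http://yourFBblog.com/%d-%02d-archive.html\n' "$y" "$m" >> getlist.txt
  m=$((m - 1))
  if [ "$m" -eq 0 ]; then            # step back from January to December of the previous year
    m=12
    y=$((y - 1))
  fi
done
```

This writes 34 URLs. The first is http://yourFBblog.com/2014-01-archive.html and the last is http://yourFBblog.com/2011-04-archive.html.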
Then download all the files and process them:
% wget -i getlist.txt
% gawk -f myprog.awk 201*-archive.html | cat -n
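The shell expands 201*-archive.html in sorted order, so the pages are read oldest month first. If you would rather follow the order of getlist.txt, one option is to rebuild the local file names from the list itself. This relies on wget -i saving each URL under its last path component, which is its default when no file of that name exists yet. The helper name list_in_order is hypothetical.

```shell
#!/bin/sh
# list_in_order LISTFILE PROGRAM  (a hypothetical helper, not part of wget or gawk)
# Runs the awk PROGRAM over the downloaded pages in the order their URLs
# appear in LISTFILE, then numbers the output lines with cat -n.
list_in_order() {
  # Drop everything up to the final "/" to get names like 2014-01-archive.html.
  sed 's|.*/||' "$1" | xargs awk -f "$2" | cat -n
}

# Usage, after wget -i getlist.txt has finished:
#   list_in_order getlist.txt myprog.awk
```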
The result will be something like this:
1 29/01/2014 My Latest Post
...
402 01/04/2011 My First Post
Actually, my current version of myprog.awk is:
BEGIN { FS = "[<>]" }
/<h2 class='date-header'>/ {
    date = $5
    if (length(date) == 9) date = "0" date     # one-digit month: 1/01/2014 -> 01/01/2014
    month = substr(date, 1, 2)
    day   = substr(date, 4, 2)
    year  = substr(date, 7, 4)
}
/post-title entry-title/ { getline; printf("%s-%s-%s %s\n", year, month, day, $3) }   # YYYY-MM-DD title
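This minor modification was needed because the months January to September are written with one digit (1 to 9), while October to December take two (10 to 12). Prepending a zero when the date is only nine characters long (so 1/01/2014 becomes 01/01/2014) puts the month, day, and year at fixed positions for substr. The output is then something like: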
2013-12-31 Auld Lang Syne
2014-01-01 DX Pedition to Mars
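The length test works because only the month varies in width in these dates. A date could also have a one-digit day, for example 1/5/2014, although the blog above does not appear to produce one. In that case a sturdier alternative is to split the date on / and zero-pad both parts with printf's %02d. Here is a minimal sketch, with to_iso as a hypothetical helper name:

```shell
#!/bin/sh
# to_iso: a hypothetical helper that reads one M/D/YYYY date per line
# and prints it as YYYY-MM-DD, zero-padding both month and day.
to_iso() {
  awk '{
    split($0, p, "/")                        # p[1] = month, p[2] = day, p[3] = year
    printf "%s-%02d-%02d\n", p[3], p[1], p[2]
  }'
}

printf '%s\n' 1/29/2014 12/31/2013 1/5/2014 | to_iso
```

This prints 2014-01-29, 2013-12-31 and 2014-01-05, one per line. Inside myprog.awk, the same split and %02d calls could replace the length check and the three substr calls.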