Blog Summary (2) – Spinor Lab

You may also want to know the date on which each article was published. Looking at the HTML file again, you will see a line like this:

<h2 class='date-header'><span>12/31/2013</span></h2>

So your code, myprog.awk, might be:

BEGIN {FS = "[<>]"}
/<h2 class='date-header'>/ {date=$5}
/post-title entry-title/ {getline;print date,$3}

And you will obtain:

12/31/2013 Taxi Driver
12/22/2013 The Accused
12/15/2013 The Silence of the Lambs
12/08/2013 Nell
12/01/2013 Contact
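To try the script without downloading anything, here is a minimal sketch that builds a tiny stand-in page and runs the same logic on it. The file name sample-archive.html, the function name list_posts, and the example.com links are made up for illustration, and the markup around the title (a link on the line right after the post-title tag) is an assumption based on how the script uses getline and $3.

```shell
# Build a tiny stand-in for one archive page. The markup around the title
# is an assumption: the title link sits on the line after the <h3> tag.
cat > sample-archive.html <<'EOF'
<h2 class='date-header'><span>12/31/2013</span></h2>
<h3 class='post-title entry-title'>
<a href='http://example.com/taxi-driver.html'>Taxi Driver</a>
</h3>
<h2 class='date-header'><span>12/22/2013</span></h2>
<h3 class='post-title entry-title'>
<a href='http://example.com/the-accused.html'>The Accused</a>
</h3>
EOF

# Same logic as myprog.awk, inlined so the example is self-contained.
# The dots in the first pattern stand in for the single quotes, which
# cannot appear inside a single-quoted shell string.
list_posts() {
  awk '
    BEGIN { FS = "[<>]" }                                  # split lines at < and >
    /<h2 class=.date-header.>/ { date = $5 }               # remember the latest date
    /post-title entry-title/ { getline; print date, $3 }   # title is on the next line
  ' "$@"
}

list_posts sample-archive.html
```

With FS set to "[<>]", the date line splits into an empty first field, the h2 tag, another empty field, the span tag, and then the date, which is why the date is $5. On the title line the link text lands in $3.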

If you wish to make a list spanning many months, first create a file, say getlist.txt, containing the archive URLs:

http://yourFBblog.com/2014-01-archive.html
...
http://yourFBblog.com/2011-04-archive.html
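The list does not have to be typed by hand. The following is a rough sketch of a shell function that prints one archive URL per month, newest first. The name make_list and its argument order are my own invention, and it assumes every month in the range has an archive page that follows the URL pattern above.

```shell
# Print one archive URL per month, newest first, for an inclusive range.
# Arguments: first year, first month, last year, last month.
# Pass months as plain numbers (4, not 04).
make_list() {
  y=$3; m=$4
  while [ "$y" -gt "$1" ] || { [ "$y" -eq "$1" ] && [ "$m" -ge "$2" ]; }; do
    printf 'http://yourFBblog.com/%04d-%02d-archive.html\n' "$y" "$m"
    m=$((m - 1))
    # Step back into December of the previous year.
    if [ "$m" -eq 0 ]; then m=12; y=$((y - 1)); fi
  done
}

make_list 2011 4 2014 1 > getlist.txt
```

The call shown writes every month from January 2014 back to April 2011 into getlist.txt.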

Then download all the files and process them:

% wget -i getlist.txt
% gawk -f myprog.awk 201*-archive.html | cat -n

The result will be something like this:

  1  29/01/2014 My Latest Post
...
402  01/04/2011 My First Post

Actually, my current version of myprog.awk is:

BEGIN {FS = "[<>]"}
/<h2 class='date-header'>/ {
  date=$5; if(length(date)==9) date="0" date   # pad a one-digit month
  month=substr(date,1,2); day=substr(date,4,2); year=substr(date,7,4) }
/post-title entry-title/ {getline;printf("%2s-%2s-%2s %s\n",year,month,day,$3)}

This minor modification was required because the months from January to September are written with one digit (1 to 9), while October to December are written with two digits (10 to 12). The output now looks something like:

2013-12-31 Auld Lang Syne
2014-01-01 DX Pedition to Mars
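The length test above appears to rely on the day always being written with two digits. If that cannot be guaranteed, a possible variant is to split the date on the slashes and let printf do the padding. This is only a sketch; the function name iso_date is hypothetical, and it reads one date per line rather than a whole archive page.

```shell
# Reformat M/D/YYYY dates (with or without leading zeros) as YYYY-MM-DD.
iso_date() {
  awk '{
    n = split($1, part, "/")          # part[1]=month, part[2]=day, part[3]=year
    if (n == 3)
      printf("%04d-%02d-%02d\n", part[3], part[1], part[2])
    else
      print $1                        # leave anything unexpected untouched
  }'
}

printf '%s\n' 12/31/2013 1/01/2014 9/5/2013 | iso_date
```

The same split() and printf lines could replace the length check and the three substr() calls in the date-header rule of myprog.awk.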
