Blog Summary

Sometimes you may want to list the titles of all the previous articles in your own blog or someone else’s. There are numerous ways to do this, and here is one approach.

Suppose all the articles in December 2013 are in, say,
http://yourFBblog.com/2013-12-archive.html.

First, download the file:

% wget http://yourFBblog.com/2013-12-archive.html

Looking into the file, you will see something like this:

<h3 class='post-title entry-title' itemprop='name'>
<a href='http://yourFBblog.com/2013/12/taxi-driver.html'>Taxi Driver</a>
</h3>

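This suggests that you need the line just after the one containing the phrase “post-title entry-title”, and that the title you want is the third field of that line once it is split on “<” and “>” (the first field is the empty string before the opening “<”, and the second is the tag itself). Therefore, what you need is: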

% gawk 'BEGIN {FS = "[<>]"} /post-title entry-title/ {getline; print $3}' 2013-12-archive.html

And you will get something like this:

Taxi Driver
The Accused
The Silence of the Lambs
Nell
Contact
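
If you want to try the idea without downloading anything, here is a minimal sketch. It assumes that plain awk behaves like gawk for this small script. The file name sample-archive.html, the function names extract_titles and show_fields, and the second post's URL are made up for illustration. The first function prints the titles, and the second shows how one link line breaks into fields, so you can see why $3 is the one that holds the title.

```shell
#!/bin/sh
# Hypothetical sample file that mimics the archive markup shown above.
# The second entry's URL is invented for this example.
cat > sample-archive.html <<'EOF'
<h3 class='post-title entry-title' itemprop='name'>
<a href='http://yourFBblog.com/2013/12/taxi-driver.html'>Taxi Driver</a>
</h3>
<h3 class='post-title entry-title' itemprop='name'>
<a href='http://yourFBblog.com/2013/12/the-accused.html'>The Accused</a>
</h3>
EOF

# extract_titles FILE: for each heading line, read the next line
# and print its third field (fields are separated by "<" or ">").
extract_titles() {
    awk 'BEGIN { FS = "[<>]" }
         /post-title entry-title/ { if ((getline) > 0) print $3 }' "$1"
}

# show_fields LINE: print every field of LINE with its number.
show_fields() {
    printf '%s\n' "$1" |
        awk 'BEGIN { FS = "[<>]" }
             { for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }'
}

extract_titles sample-archive.html
show_fields "<a href='x.html'>Taxi Driver</a>"
```

The check on getline's return value is only a small safeguard: if a heading happens to be the last line of the file, nothing is printed for it.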
