Blog to Ebook Conversion (4)

I first intended to simply delete all the tags from the body text of an article, but this is obviously wrong: all the paragraphs would be merged into one, making the output nearly unreadable.

So it is necessary to detect where each new paragraph starts and insert the appropriate tags there. The problem is that paragraphs are produced in the original HTML files in various ways. Some examples are:

aaa
bbb
<br />
ccc
ddd

or

<div>
aaa
bbb
</div>
<div>
 
</div>
<div>
ccc
ddd
</div>
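
One way to handle both variants is to treat certain tags as paragraph boundaries and re-wrap whatever text lies between them. The following is only a minimal sketch in Python using the standard library's html.parser, not the code actually used for the conversion. The names BREAK_TAGS, ParagraphExtractor and to_paragraphs are made up for illustration, and the choice of br, div and p as boundary tags is an assumption based on the two examples above.

```python
from html import escape
from html.parser import HTMLParser

# Tags assumed to mark a paragraph boundary (an assumption; extend as needed).
BREAK_TAGS = {"br", "div", "p"}


class ParagraphExtractor(HTMLParser):
    """Collects text between boundary tags as separate paragraphs."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.paragraphs = []
        self._buffer = []

    def flush(self):
        # split() with no arguments also splits on non-breaking spaces,
        # so a <div> holding only whitespace or &nbsp; yields no paragraph.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.paragraphs.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in BREAK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in BREAK_TAGS:
            self.flush()

    def handle_data(self, data):
        # Text inside any other tag is kept; the tag itself is dropped.
        self._buffer.append(data)


def to_paragraphs(html):
    """Return the body text re-wrapped as one <p> element per paragraph."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()   # let the parser emit any text it is still holding
    parser.flush()   # keep trailing text that follows the last tag
    return "\n".join(f"<p>{escape(p)}</p>" for p in parser.paragraphs)


if __name__ == "__main__":
    print(to_paragraphs("aaa\nbbb\n<br />\nccc\nddd\n"))
```

With this approach both examples above should come out as the same two paragraphs, because the line breaks inside a paragraph are collapsed into single spaces and the empty div in the middle is skipped.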