February 14, 2014 – Spinor Lab

My first plan was to strip every tag from the body of an article, but that is clearly wrong: all the paragraphs would be merged into one, making the output nearly unreadable.

So it is necessary to detect where each new paragraph begins and insert the appropriate tags there. The problem is that the original HTML files mark paragraphs in various ways. Some examples are:

aaa
bbb
<br />
ccc
ddd

<div>
aaa
bbb
</div>
<div>
&nbsp;
</div>
<div>
ccc
ddd
</div>
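
One way to handle both patterns is sketched below in Python. This is only an illustration, not necessarily the approach used for the actual conversion: it assumes that `<br>`, `<div>` and `<p>` tags are the only paragraph boundaries, that every other tag can simply be dropped, and that a regex is good enough for the input (a real HTML parser would be more robust). The names `normalize_paragraphs`, `BREAK_RE` and `TAG_RE` are made up for this example.

```python
import html
import re

# Assumption: only <br>, <div> and <p> tags mark paragraph boundaries.
BREAK_RE = re.compile(r"<\s*br\s*/?\s*>|</?\s*(?:div|p)\b[^>]*>", re.IGNORECASE)
# Any other tag is simply removed.
TAG_RE = re.compile(r"<[^>]+>")


def normalize_paragraphs(body: str) -> str:
    """Rewrite an article body so that every paragraph is wrapped in <p>...</p>."""
    # Replace each boundary tag with a sentinel character, then drop all other tags.
    marked = BREAK_RE.sub("\x00", body)
    stripped = TAG_RE.sub("", marked)

    paragraphs = []
    for chunk in stripped.split("\x00"):
        # Decode entities (&nbsp; becomes a non-breaking space, which split()
        # treats as whitespace) and collapse runs of whitespace and newlines.
        text = " ".join(html.unescape(chunk).split())
        if text:  # skip empty chunks such as <div>&nbsp;</div>
            paragraphs.append("<p>" + html.escape(text, quote=False) + "</p>")
    return "\n".join(paragraphs)


if __name__ == "__main__":
    br_style = "aaa\nbbb\n<br />\nccc\nddd"
    div_style = "<div>\naaa\nbbb\n</div>\n<div>\n&nbsp;\n</div>\n<div>\nccc\nddd\n</div>"
    print(normalize_paragraphs(br_style))
    print(normalize_paragraphs(div_style))
```

With this sketch, both example inputs above reduce to the same two paragraphs, because the empty `&nbsp;` block collapses to nothing and is skipped.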

Blog to Ebook Conversion (4)