Parsing HTML to fix microtypography & glyph issues


My somewhat-friend Sean built something that I use for this purpose quite often. You can view the demo here: http://files.seancoates.com/lexentity/, he blogged about it here: http://seancoates.com/blogs/lexentity, and you can grab the source here: https://github.com/scoates/lexentity

It might not cover every language you need, but it's a start for English.


You might be interested in tidy. It is bundled with PHP 5+ (all you need to use it is libtidy). It not only parses HTML but repairs it too.
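As a rough illustration of the repair step, here is a minimal sketch that runs a fragment through `tidy_repair_string()` before any typographic processing. The helper name `repair_html` is made up for this example, the two options shown are only one plausible configuration, and the fallback when the extension is missing is my own assumption.

```php
<?php
// Hypothetical helper: repair_html() is not part of tidy, just a wrapper for this sketch.
function repair_html(string $html): string
{
    if (!extension_loaded('tidy')) {
        // Assumption: hand the input back unchanged when ext-tidy is unavailable.
        return $html;
    }

    $config = [
        'show-body-only' => true, // return only the body contents, not a whole document
        'wrap'           => 0,    // do not hard-wrap long lines
    ];

    // tidy_repair_string() parses and repairs in one call; it returns false on failure.
    $repaired = tidy_repair_string($html, $config, 'utf8');

    return $repaired === false ? $html : $repaired;
}
```

Calling `repair_html('<p>Unclosed <b>bold text')` should give you markup with the missing closing tags added; the exact whitespace depends on the libtidy version.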

For localization, though, you are on your own: intl does not seem to have any data about quotation marks, for example; at least I could not find any.


As for quotes, read up on the Q tag; for the rest I would use a BBCode library, since it would be really difficult to write an algorithm that distinguishes between the dashes you need. BBCode lets the editor choose, but since that requires the editor to take an action, you might think about providing some kind of button to insert special characters. For things that are easy to recognize, you just create new rules for the BBCode library, and if they have to be locale-aware, you would create a different set of rules for each language. Obviously, inheritance in OOP would come in handy here.
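To make the inheritance idea concrete, here is a rough sketch of per-language rule sets. It uses plain regular expressions rather than any particular BBCode library, all class names are hypothetical, and the tag handling is a simple split on `<...>`, not a real HTML parser. Dashes are deliberately left out, since those are the hard-to-recognize case.

```php
<?php
// Base class: rules that are safe in any language (hypothetical names throughout).
class TypographyRules
{
    // Returns pattern => replacement pairs, applied in order.
    protected function rules(): array
    {
        return [
            '/\.{3}/u' => "\u{2026}", // three dots -> ellipsis character
        ];
    }

    // Apply all rules to a plain-text string.
    public function apply(string $text): string
    {
        $rules = $this->rules();
        return preg_replace(array_keys($rules), array_values($rules), $text);
    }

    // Apply the rules only to text between tags, leaving markup and attributes alone.
    public function applyToHtml(string $html): string
    {
        $parts = preg_split('/(<[^>]*>)/u', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
        foreach ($parts as $i => $part) {
            if ($i % 2 === 0) { // even indexes are text, odd indexes are the captured tags
                $parts[$i] = $this->apply($part);
            }
        }
        return implode('', $parts);
    }
}

// English: straight double quotes become curly quotes.
class EnglishRules extends TypographyRules
{
    protected function rules(): array
    {
        return parent::rules() + [
            '/"([^"]*)"/u' => "\u{201C}" . '$1' . "\u{201D}",
        ];
    }
}

// German: low opening quote, high closing quote.
class GermanRules extends TypographyRules
{
    protected function rules(): array
    {
        return parent::rules() + [
            '/"([^"]*)"/u' => "\u{201E}" . '$1' . "\u{201C}",
        ];
    }
}
```

Usage would be something like `(new GermanRules())->applyToHtml($html)`. Note that a quotation spanning a tag boundary, such as `"some <b>bold</b> text"`, is not matched by this sketch, because each text fragment is processed separately.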