Parsing HTML to fix microtypography & glyph issues
My somewhat-friend Sean built something that I use for this purpose quite often. You can view the demo here: http://files.seancoates.com/lexentity/, he blogged about it here: http://seancoates.com/blogs/lexentity, and you can grab the source here: https://github.com/scoates/lexentity
It might not meet your full language needs, but it's a start with English.
As for quotes, read about the Q tag; for the rest I would use a BBCode library, since it would be really difficult to write an algorithm that works out which kind of dash you need. BBCode lets the editor choose. Because the editor then has to take an explicit action, you may want to provide some kind of button for inserting special characters. For things that are easy to recognize, you just create new rules for the BBCode library, and if they have to be locale-aware you would create a different set of rules for each language. Obviously, inheritance in OOP would come in handy here.
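To make the inheritance idea concrete, here is a minimal sketch in Python. It is not tied to any particular BBCode library: the class names (`TypographyRules`, `EnglishRules`, `GermanRules`) and the rules themselves are hypothetical, and it assumes you have already separated the text from the markup, so it only ever sees plain text. The base class holds the easy-to-recognize substitutions, and each language overrides only what differs (here, the quote characters).

```python
import re


class TypographyRules:
    """Base rule set: substitutions that are easy to recognize in any language."""

    # Each rule is a (compiled pattern, replacement) pair, applied in order.
    rules = [
        (re.compile(r"\.\.\."), "\u2026"),          # three dots -> ellipsis
        (re.compile(r"(?<=\d)-(?=\d)"), "\u2013"),  # hyphen between digits -> en dash
    ]

    # Opening and closing double quotes; language subclasses override these.
    open_quote = '"'
    close_quote = '"'

    def apply(self, text):
        for pattern, replacement in self.rules:
            text = pattern.sub(replacement, text)
        # Naive pairing: straight double quotes alternate between open and close.
        parts = text.split('"')
        out = [parts[0]]
        for i, part in enumerate(parts[1:]):
            out.append(self.open_quote if i % 2 == 0 else self.close_quote)
            out.append(part)
        return "".join(out)


class EnglishRules(TypographyRules):
    open_quote = "\u201c"   # left double quotation mark
    close_quote = "\u201d"  # right double quotation mark


class GermanRules(TypographyRules):
    open_quote = "\u201e"   # double low-9 quotation mark
    close_quote = "\u201c"  # left double quotation mark, used as the closer in German


if __name__ == "__main__":
    print(EnglishRules().apply('He said "wait..." on pages 3-5'))
    print(GermanRules().apply('"Hallo"'))
```

The ambiguous cases (for example, whether a hyphen between words should be an en dash or an em dash) are deliberately left out of the automatic rules; those are the ones better left to the editor's choice.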