The poppler package provides a pdftohtml utility (shipped in poppler-utils on Debian-based distributions) that you might be able to use. There are also Python bindings to libpoppler.
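
As a rough sketch of the command-line route: poppler's converter is installed as `pdftohtml`, and it can be driven from Python with `subprocess`. The helper names `build_pdftohtml_command` and `convert_pdf_to_html` are made up for this illustration, and the choice of `-noframes` (one HTML file instead of a frameset) is an assumption about what you want, not a requirement.

```python
import shutil
import subprocess


def build_pdftohtml_command(pdf_path, html_path, first_page=None, last_page=None):
    """Build the argument list for poppler's pdftohtml (hypothetical helper)."""
    # -noframes: write a single HTML document rather than a frameset.
    cmd = ["pdftohtml", "-noframes"]
    if first_page is not None:
        cmd += ["-f", str(first_page)]  # -f: first page to convert
    if last_page is not None:
        cmd += ["-l", str(last_page)]   # -l: last page to convert
    cmd += [str(pdf_path), str(html_path)]
    return cmd


def convert_pdf_to_html(pdf_path, html_path, first_page=None, last_page=None):
    """Run pdftohtml; raises if the tool is missing or exits with an error."""
    if shutil.which("pdftohtml") is None:
        raise FileNotFoundError("pdftohtml not found on PATH (install poppler-utils)")
    cmd = build_pdftohtml_command(pdf_path, html_path, first_page, last_page)
    subprocess.run(cmd, check=True)
```

Calling `convert_pdf_to_html("input.pdf", "output.html")` would then shell out to the tool; the file names here are placeholders. How well the layout survives depends on the PDF, so it is worth checking the result on a sample document first.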