I have continued to follow my interest in what I call Web Templating, described somewhat in earlier posts. The basic idea is that I want to create very simple HTML files that, through some magic, my web server wraps with nice CSS, menus, navigation, etc. I want my web pages to have a consistent look and feel. I want web pages to be simple for me to create, and even possible for my users to create. The only thing I know of that does this is SnapLook.
For reasons mentioned in earlier posts, I want this functionality in my Python web scripts and applications as well as my web pages. So I decided to finally look at Kid a bit more. Apparently, it’s all the rage. Having really liked TAL and SimpleTAL, I found Kid to be more TAL-ish than I originally thought. Also, its Python API is much saner. It’s purely XML based, which gives it some very winning features, whereas SimpleTAL is text based. I’ve never been a big XML person, but I have definitely discovered that XML based and text based each have their own merits. It just depends on what you want to get done.
Along these lines, many of my simple web pages are not XHTML. They have unclosed <img> and <br> tags, so Kid choked on them. However, once converted, things turned out rather nice, with Kid creating HTML 4.01 Strict output from my template and web page data. The downside is that there are many web pages I am not going to convert, like the piles of DocBook. I can regenerate my current documentation, but the older, historical stuff is much more painful. In general, making all my “simple web pages” into XHTML snippets removes the simple.
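To make the failure concrete, here is a minimal sketch of why a strict XML parser rejects HTML-style pages. It uses Python 3's standard `xml.etree.ElementTree` rather than Kid itself, and the helper name `is_well_formed` and the sample snippets are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(snippet):
    """Return True if the snippet parses as XML, False otherwise."""
    try:
        ET.fromstring(snippet)
        return True
    except ET.ParseError:
        return False

# HTML-style markup: <br> and <img> are never closed.
html_style = "<p>One line<br>another line <img src='logo.png'></p>"
# XHTML-style markup: the same tags, self-closed.
xhtml_style = "<p>One line<br />another line <img src='logo.png' /></p>"

print(is_well_formed(html_style))
print(is_well_formed(xhtml_style))
```

Any XML-based templating tool sits on top of a parser like this, so the first snippet has to be repaired before it can be wrapped.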
Also, I need to be able to pull certain things out of the simple web pages, such as any meta, html, title, and body tags, and do the right thing with the content there. That means I have to parse the page. Non-XML pages can’t be easily parsed. Regular expressions are not fast, but is parsing the XML any faster? This led me down the ElementTree path, where I found ElementTidy. Then there was the matter of pulling this together when the modules are designed to work with files rather than strings. But after much reading, much frustration, a dash of code, and being generally pissed off at non-working tape libraries and hot server rooms, I created this proof of concept.
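As a rough illustration of the "pull out the title, meta, and body" step, here is a sketch using Python 3's standard `xml.etree.ElementTree`, which may differ in detail from the standalone `elementtree` package of the time. The sample page and the function name `split_page` are assumptions for the example; wrapping the string in `io.StringIO` is one way around a parser that expects a file.

```python
import io
import xml.etree.ElementTree as ET

# A made-up simple page, already valid XHTML.
PAGE = """<html>
  <head>
    <title>My simple page</title>
    <meta name="keywords" content="kid, templating" />
  </head>
  <body><p>Hello <b>world</b></p></body>
</html>"""

def split_page(text):
    """Return (title, meta dict, inner body markup) for an XHTML page."""
    # ElementTree.parse() wants a file-like object, so wrap the string.
    tree = ET.parse(io.StringIO(text))
    root = tree.getroot()
    title = root.findtext("head/title", default="")
    metas = {m.get("name"): m.get("content") for m in root.iter("meta")}
    body = root.find("body")
    if body is None:
        return title, metas, ""
    # Keep only what is inside <body>: leading text plus each child element.
    inner = (body.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in body
    )
    return title, metas, inner

title, metas, inner = split_page(PAGE)
```

The three pieces can then be handed to the template separately, so the template decides where the title and the body content end up.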
    from mod_python import apache
    from elementtidy import TidyHTMLTreeBuilder
    from xml.parsers.expat import ExpatError
    from elementtree import ElementTree
    import kid
    kid.enable_import()  # lets "import template" load the Kid template
    import template

    def handler(req):
        req.content_type = "text/html"
        page = template.Template()
        # Open inside a try so a missing file maps to a 404.
        try:
            fd = open(req.filename)
        except IOError, e:
            return apache.HTTP_NOT_FOUND
        try:
            try:
                tree = ElementTree.parse(fd)
            except ExpatError, e:
                # Bad XML: rewind and let ElementTidy clean it up.
                fd.seek(0)
                tree = TidyHTMLTreeBuilder.parse(fd)
        finally:
            # Close the file whether or not parsing succeeded.
            fd.close()
        page.html = ElementTree.tostring(tree.getroot())
        ret = page.serialize(output='html-strict')
        req.write(ret)
        return apache.OK
This is a bit of mod_python code, so you’ll need the proper Apache configs in an htaccess file or some such that associates a filename extension with this bit of mod_python code. It handles XHTML simple web pages and slightly not so XHTML web pages and wraps them neatly in a Kid template. (Which at this point only has one tag in it with the attribute py:content="XML(html)".) The page is properly parsed so that I can edit it slightly as needed, and everything is sent to the browser as HTML 4.01.
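The overall flow (try a strict parse, fall back to a repair step, then wrap the result in a layout) can be sketched without Apache at all. This is only an approximation in Python 3's standard library: `string.Template` stands in for the Kid template, and the regex-based `naive_repair` is a crude, hypothetical stand-in for ElementTidy that only self-closes a few void tags. The names `LAYOUT`, `VOID_TAG`, `naive_repair`, and `render` are all invented for this example.

```python
import re
import xml.etree.ElementTree as ET
from string import Template

# Stand-in for the Kid template: one slot that receives the page markup.
LAYOUT = Template('<html><body><div id="content">$html</div></body></html>')

# Matches <br>, <img ...>, <hr>, with or without a trailing slash.
VOID_TAG = re.compile(r"<(br|img|hr)\b([^<>]*?)\s*/?>")

def naive_repair(text):
    """Crude stand-in for ElementTidy: self-close a few void tags."""
    return VOID_TAG.sub(r"<\1\2 />", text)

def render(text):
    """Parse strictly, fall back to the repair step, then wrap the page."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        # Not well-formed XML: repair and try once more.
        root = ET.fromstring(naive_repair(text))
    return LAYOUT.substitute(html=ET.tostring(root, encoding="unicode"))

wrapped = render("<p>Hi<br>there <img src='logo.png'></p>")
```

A real tidy library copes with far messier input than this regex does; the point here is only the shape of the fallback.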
Is this method any better than SnapLook/PHP? How much slower is it? It’s not fast, but I’ve never benchmarked SnapLook either. Would other people on this planet think something like this is useful?