Archive for November, 2005

Adventures in Web Templating Part II

Tuesday, November 8th, 2005

After writing the code from my last post I spent some time tweaking it and running some benchmarks. Nothing really exact, but enough to give me some idea about the direction I’m headed in.

This direction ain’t it. The code is very, very slow at rendering web pages. I took a page straight from my existing web pages, which run through SnapLook, and 1,000 hits took over 2 minutes, 17 seconds. After figuring out that I was not using the C ElementTree implementation, I switched to it and got the time down to 1 minute 58 seconds. Holy crap, that is still slow. I haven’t read all of Kid’s code, but I believe that since I am inserting XML markup into the template, it has to regenerate an ElementTree and validate the template again. Since I already have an ElementTree I’d like to hand it over directly and skip that step, but I don’t see how. Way too much XML parsing is going on.
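To make the suspected cost concrete, here is a minimal sketch of the two approaches. It uses the modern standard-library xml.etree.ElementTree rather than the standalone elementtree package, and it says nothing about what Kid actually does internally; the template, the page, and both function names are made up for illustration. wrap_by_reparse serializes the page and parses it again before inserting it, which is what handing markup over as a string forces. wrap_direct grafts the already-parsed tree into place.

```python
import copy
import timeit
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for a site template and a simple page.
TEMPLATE = "<html><head><title>Site</title></head><body><div id='content' /></body></html>"
PAGE = "<div><h1>Hello</h1><p>Some <b>simple</b> page text.</p></div>"


def wrap_by_reparse(page_tree):
    """Serialize the page, then parse it a second time before inserting it."""
    markup = ET.tostring(page_tree, encoding="unicode")
    template = ET.fromstring(TEMPLATE)
    slot = template.find(".//div[@id='content']")
    slot.append(ET.fromstring(markup))  # the extra parse happens here
    return ET.tostring(template, encoding="unicode")


def wrap_direct(page_tree):
    """Insert a copy of the existing tree without parsing it again."""
    template = ET.fromstring(TEMPLATE)
    slot = template.find(".//div[@id='content']")
    slot.append(copy.deepcopy(page_tree))
    return ET.tostring(template, encoding="unicode")


if __name__ == "__main__":
    page_tree = ET.fromstring(PAGE)
    # Both routes should produce identical markup.
    assert wrap_by_reparse(page_tree) == wrap_direct(page_tree)
    for func in (wrap_by_reparse, wrap_direct):
        seconds = timeit.timeit(lambda: func(page_tree), number=1000)
        print("%s: %.3f s for 1,000 renders" % (func.__name__, seconds))
```

Both functions still parse the template on every call, so the difference in timings only reflects the extra serialize-and-reparse of the page itself. How much that matters will depend on page size.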

SnapLook renders the same web page, with a template that is much more featureful (compared to “stick this bit of XML in this tag”), in less than 30 seconds for 1,000 hits.

Adventures in Web Templating

Thursday, November 3rd, 2005

I have continued to follow my interests in what I call Web Templating, somewhat described in earlier posts. The basic idea is that I want to create very simple HTML files that, through some magic, my web server wraps with nice CSS, menus, navigation, etc. I want my web pages to have a consistent look and feel. I want web pages to be simple for me to create and even possible for my users to create. The only thing I know that does this is SnapLook.

For reasons mentioned in earlier posts I want to have this functionality in my Python web scripts and applications as well as my web pages. So I decided to finally look at Kid a bit more. Apparently, it’s all the rage. Having really liked TAL and SimpleTAL, I found Kid to be more TAL-ish than I originally thought. Also, its Python API is much saner. It’s purely XML based, which gives it some very winning features, whereas SimpleTAL is text based. I’ve never been a big XML person, but I have definitely discovered that XML based and text based each have their own merits. It just depends on what you want to get done.

Along these lines, many of the simple web pages are not XHTML. They have unclosed <img> and <br> tags, so Kid choked on my web pages. However, once they were converted, things turned out rather nice, with Kid creating HTML 4.01 Strict output from my template and web page data. The downside is that there are many web pages I am not going to convert, like the piles of DocBook. I can regenerate my current documentation, but the older, historical stuff is much more painful. In general, making all my “simple web pages” into XHTML snippets removes the simple.
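A quick way to see the problem is to hand both styles of markup to an XML parser. This is only a sketch using the standard-library xml.etree.ElementTree (not Kid itself), and the is_well_formed helper and both sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup as-is."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# HTML-style void tags are never closed, so an XML parser rejects them.
html_style = "<div><p>Line one<br>Line two</p><img src='logo.png'></div>"
# The XHTML spelling closes every tag and parses cleanly.
xhtml_style = "<div><p>Line one<br />Line two</p><img src='logo.png' /></div>"

print(is_well_formed(html_style))
print(is_well_formed(xhtml_style))
```

The first string fails because the parser reaches the closing p tag while br is still open; the second is accepted.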

Also, I need to be able to pull certain things out of the simple web pages, such as any meta, html, title, and body tags, and do the right thing with the content there. That means I have to parse the page. Non-XML pages can’t be easily parsed. Regular expressions are not fast, but is parsing XML any faster? This led me down the ElementTree path and to ElementTidy. Then there was pulling this together when the modules are designed to work with files rather than strings. But after much reading, much frustration, a dash of code, and being generally pissed off at non-working tape libraries and hot server rooms, I created this proof of concept.

from mod_python import apache

from elementtidy import TidyHTMLTreeBuilder
from xml.parsers.expat import ExpatError
from elementtree import ElementTree

import kid
kid.enable_import()

import template

def handler(req):

    req.content_type = "text/html"

    page = template.Template()

    try:
        # open() is inside the try so a missing file really returns a 404
        fd = open(req.filename)
        tree = ElementTree.parse(fd)
    except IOError, e:
        return apache.HTTP_NOT_FOUND
    except ExpatError, e:
        # Bad XML: rewind and let Tidy clean it up
        fd.seek(0)
        tree = TidyHTMLTreeBuilder.parse(fd)

    # Hand the page to the Kid template as a string of markup
    page.html = ElementTree.tostring(tree.getroot())
    ret = page.serialize(output='html-strict')

    req.write(ret)
    fd.close()

    return apache.OK

This is a bit of mod_python code, so you’ll need the proper Apache config in an .htaccess file or some such that associates a filename extension with this handler. It handles XHTML simple web pages and slightly-not-so-XHTML web pages and wraps them neatly in a Kid template. (At this point the template has only one tag in it, with the attribute py:content="XML(html)".) The page is properly parsed so that I can edit it slightly as needed, and everything is sent to the browser as HTML 4.01.
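The handler above passes the page through whole. The “pull out the title, meta, and body” step I mentioned could look something like the sketch below. It assumes the modern standard-library xml.etree.ElementTree, a page with no XML namespace declared, and a made-up split_page helper and sample page; a page with an xmlns attribute would need namespace-qualified paths. Wrapping the string in io.StringIO is one way around the modules wanting files rather than strings (ET.fromstring is another).

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical simple page; real pages would come from disk.
PAGE = """<html>
  <head>
    <title>Example page</title>
    <meta name="author" content="example" />
  </head>
  <body><h1>Heading</h1><p>Text.</p></body>
</html>"""


def split_page(markup):
    """Return (title, meta dict, list of body child elements)."""
    # ElementTree.parse() wants a file-like object, so wrap the string.
    tree = ET.parse(io.StringIO(markup))
    root = tree.getroot()
    title = root.findtext("head/title", default="")
    meta = {}
    for tag in root.findall("head/meta"):
        name = tag.get("name")
        if name:
            meta[name] = tag.get("content", "")
    body = root.find("body")
    children = list(body) if body is not None else []
    return title, meta, children


if __name__ == "__main__":
    title, meta, children = split_page(PAGE)
    print(title, meta, [child.tag for child in children])
```

With the pieces separated like this, the title and meta values can go into the template’s head while only the body children get dropped into the content slot.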

Is this method any better than SnapLook/PHP? How much slower? It’s not fast, but I’ve never benchmarked SnapLook either. Would other people on this planet think something like this is useful?