* Convert all links to absolute URLs first.
  http://codespeak.net/lxml/lxmlhtml.html#examples
  
* Add support for soupparser as a fallback if HTMLParser chokes.
  http://codespeak.net/lxml/elementsoup.html
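
A rough sketch of how the two items above might fit together, assuming the
input is an HTML string and the base URL is known. The helper names
`parse_html` and `absolutize` are hypothetical, and treating `ParserError` /
`XMLSyntaxError` as the signal that the parser "choked" is an assumption.
The soupparser fallback needs BeautifulSoup installed, so it is imported
only when the fallback is actually taken.

```python
import lxml.html
from lxml import etree


def parse_html(text):
    """Parse with lxml's HTML parser; fall back to soupparser on failure.

    Hypothetical helper: which exceptions count as "choking" is an assumption.
    """
    try:
        return lxml.html.fromstring(text)
    except (etree.ParserError, etree.XMLSyntaxError):
        # Imported lazily because soupparser requires BeautifulSoup.
        from lxml.html import soupparser
        return soupparser.fromstring(text)


def absolutize(text, base_url):
    """Return the parsed tree with every link resolved against base_url."""
    doc = parse_html(text)
    doc.make_links_absolute(base_url)
    return doc


if __name__ == "__main__":
    tree = absolutize('<p><a href="/docs/index.html">docs</a></p>',
                      "http://example.com/base/")
    print(lxml.html.tostring(tree))
```

Doing the conversion right after parsing means later steps only ever see
absolute URLs.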