Cheshire3: preParser Module
| NormalizerPreParser | Calls a named Normalizer to do the conversion |
| HtmlSmashPreParser | Attempts to reduce HTML to its raw text |
| RegexpSmashPreParser | Strips, replaces, or keeps data that matches a given regular expression |
| HtmlTidyPreParser | Calls the Tidy utility to turn HTML into XHTML for parsing |
| TagStripPreParser | Strips only named tags from the document, e.g. script, style |
| PdfToXmlPreParser | Wrapper around pdftohtml to turn PDF into XML |
| PdfToTxtPreParser | Converts PDF to text via the pdftotext utility |
| SgmlPreParser | Converts SGML into XML |
| AmpPreParser | Escapes lone ampersands in otherwise XML text |
| MarcToXmlPreParser | Converts MARC into MARCXML |
| MarcToSgmlPreParser | Converts MARC into Cheshire2's MarcSgml |
| TxtToXmlPreParser | Minimally wraps text in <data> XML tags |
| GzipPreParser | Gunzips a gzipped document |
| BzipPreParser | |
| B64EncodePreParser | Encodes the document in Base64 |
| B64DecodePreParser | Decodes the document from Base64 |
| UrlPreParser | |
| OpenOfficePreParser | Uses an OpenOffice server to convert documents into OpenDocument XML |
| PrintableOnlyPreParser | Replaces or strips non-printable characters |
| CharacterEntityPreParser | Transforms Latin-1 and broken character entities into numeric character entities |
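
The kinds of conversion summarised in the table can be illustrated with a minimal standalone sketch. It is not Cheshire3 code and does not use the Cheshire3 API: the function names below are hypothetical, only the Python standard library is used, and escaping the text inside the <data> wrapper is an assumption rather than documented behaviour. It only shows the sort of transformation that AmpPreParser, TxtToXmlPreParser, GzipPreParser and the Base64 preParsers are described as performing.

```python
import base64
import gzip
import re
from xml.sax.saxutils import escape

# Matches an '&' that does not begin a named or numeric entity
# such as '&amp;', '&#38;' or '&#x26;'.
LONE_AMP = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
)


def escape_lone_ampersands(text: str) -> str:
    """Hypothetical analogue of AmpPreParser: escape only bare ampersands."""
    return LONE_AMP.sub("&amp;", text)


def wrap_text_as_xml(text: str) -> str:
    """Hypothetical analogue of TxtToXmlPreParser: wrap text in <data> tags.

    Escaping '&', '<' and '>' first is an assumption made here so that
    the result is well-formed XML.
    """
    return "<data>" + escape(text) + "</data>"


def gunzip(data: bytes) -> bytes:
    """Hypothetical analogue of GzipPreParser: decompress gzipped bytes."""
    return gzip.decompress(data)


def b64_encode(data: bytes) -> bytes:
    """Hypothetical analogue of B64EncodePreParser."""
    return base64.b64encode(data)


def b64_decode(data: bytes) -> bytes:
    """Hypothetical analogue of B64DecodePreParser."""
    return base64.b64decode(data)
```

Such steps can be chained, for example by gunzipping a document, decoding the bytes to text, and then wrapping the text in <data> tags before it is handed to an XML parser.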
| Generated by Epydoc 3.0alpha2 on Wed Aug 9 18:09:56 2006 | http://epydoc.sf.net |