When displaying a large body of text, which may or may not contain HTML tags, the following things must be done:

 * Allow all tags inserted by Kupu.
 * Parse TeX code, written as \( ..code.. \) or \begin{equation} ... \end{equation}, with the TeX parser, and replace it with image tags.
 * Shorten links as described in the comment by Hans on ticket #1352.
 * Parse bracket links, [objid linkname], and replace them with a-hrefs.
 * If the text is not inside any other tags, TeX code or bracket links, convert linebreaks to br tags.
 * If there are words longer than 50 chars, chop them, unless they are inside "" or ' '.
 * Allow the following tags: whitelist=[??,??], and remove everything else.
 * Do our own attribute filtering: ???

What else? Please fill in the whitelist.

Currently I'm doing this in one pass that picks out all the 'interesting' parts and hands each one to a function according to its type. (?P<name>regex) is Python-specific notation that gives dictionary keys to match objects. The patterns and the whitelist of allowed tags are below. Feel free to add to the whitelist.

{{{
import re

# NOTE: the group names below are placeholders; rename them to match the handler functions.
ppattern = re.compile(r"""
 (?P<tag><[a-z].*?>)                 # opening html tags, those that begin with '<'
|(?P<endtag></.*?>)                  # closing html tags, those that begin with '</'
|(?P<bracket>\[.*?\])                # everything that is put inside brackets
|(?P<tex>\\\(.*?\\\))                # tex should be written inside \( ... \)
|(?P<texenv>\\begin\{(?P<env>.*?)\}(?P<body>.*?)\\end\{(?P=env)\})  # detect \begin{smthing}...\end{smthing}
|(?P<linebreak>(?<!>)\ *?\n|\r)      # detect linebreaks, unless they directly follow a closing '>'
|(?P<longword>[^ \t\n\r\f\v<>]{41})  # detect >40 char words
|(?P<end>\Z)                         # detect end of a string, so open tags can be closed
""", re.IGNORECASE | re.VERBOSE)

# whitelist is for html-tags only
whitelist = re.compile(r"""
<(
 p
|a
|br
|b
|i
|h3
|img
|embed
|li
|ul
|ol
|table
|tr
|th
|td
)(/?>|\ .*?>)                        # tag ends with '>', '/>', or a space followed by attributes
""", re.IGNORECASE | re.VERBOSE)
}}}
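
The one-pass approach described above could be wired up roughly as follows. This is only a sketch, not the actual implementation: it uses a simplified stand-in pattern, and the group names, the handler functions, the `/object/<objid>` URL scheme and the `<img class="tex">` placeholder are all assumptions made for illustration. The idea is that `match.lastgroup` names the alternative that matched, so a dictionary can route each match to its handler.

```python
import re
from html import escape

# Simplified stand-in for the page's pattern; all group names are hypothetical.
PATTERN = re.compile(r"""
     (?P<tag></?[a-z][^>]*>)                              # any HTML tag, opening or closing
    |(?P<bracket>\[(?P<objid>\S+)\ (?P<label>[^\]]+)\])   # bracket link: [objid linkname]
    |(?P<tex>\\\((?P<code>.*?)\\\))                       # inline TeX: \( ... \)
    |(?P<linebreak>(?<!>)\ *?\n|\r)                       # linebreak not directly after '>'
""", re.IGNORECASE | re.VERBOSE)

# Tag names taken from the whitelist on this page.
WHITELIST = {"p", "a", "br", "b", "i", "h3", "img", "embed",
             "li", "ul", "ol", "table", "tr", "th", "td"}
TAG_NAME = re.compile(r"</?([a-z][a-z0-9]*)", re.IGNORECASE)


def handle_tag(m):
    # Keep whitelisted tags unchanged, drop everything else.
    # Attribute filtering is deliberately left out of this sketch.
    name = TAG_NAME.match(m.group("tag")).group(1).lower()
    return m.group("tag") if name in WHITELIST else ""


def handle_bracket(m):
    # The "/object/<objid>" URL layout is invented for this example.
    return '<a href="/object/%s">%s</a>' % (
        escape(m.group("objid"), quote=True), escape(m.group("label")))


def handle_tex(m):
    # Placeholder only: real code would call the TeX parser and point at its image.
    return '<img class="tex" alt="%s" />' % escape(m.group("code"), quote=True)


def handle_linebreak(m):
    return "<br />\n"


HANDLERS = {
    "tag": handle_tag,
    "bracket": handle_bracket,
    "tex": handle_tex,
    "linebreak": handle_linebreak,
}


def render(text):
    # m.lastgroup is the name of the outermost alternative that matched.
    return PATTERN.sub(lambda m: HANDLERS[m.lastgroup](m), text)
```

For example, `render("<p>a</p>\nb")` leaves the newline alone because it directly follows a closing `>`, while `render("a\nb")` inserts a br tag. Adding a new kind of markup then means adding one alternative to the pattern and one entry to `HANDLERS`.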