cubicweb #1253650 html validation from user input [resolved]
When force-html-content-type is unset, users can enter HTML fragments that break the page display because of unknown namespaces or invalid elements. This problem happens frequently when copy/pasting from Word. Consider enforcing HTML input and stripping arbitrary text that could interfere with some browsers.
priority | normal |
---|---|
type | bug |
done in | 3.10.6 |
load | 0.500 |
load left | 0.000 |
closed by | <not specified> |
attachment
Comments
-
2010/09/24 15:04
-
2010/09/24 15:10, written by sthenault
-
2010/09/24 15:53
-
2010/11/09 15:44, written by sthenault
Take a look at http://codespeak.net/lxml/lxmlhtml.html#cleaning-up-html
The Cleaner class can allow a specific subset of tags.
Note: lxml version 2.2.2 added several helper methods (skip_attributes, skip_tags).
we *already* do that. Please give examples of what should be stripped and isn't.
For a very simple example, you can try adding an unknown attribute from a missing namespace, such as "<p x:str="" />".
(See the test.html attachment.)
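To make the kind of stripping under discussion concrete, here is a minimal sketch using only Python's standard-library html.parser. It is not CubicWeb's sanitizer and not lxml's Cleaner; the names ALLOWED_TAGS, ALLOWED_ATTRS, FragmentSanitizer and sanitize_fragment are hypothetical, and the whitelists are assumptions chosen for illustration. It rebuilds the fragment from whitelisted tags and attributes only, so a namespaced attribute such as x:str or a Word element such as <o:p> is dropped while the text is kept.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelists, for illustration only (not CubicWeb's actual lists).
ALLOWED_TAGS = {"p", "br", "b", "i", "em", "strong", "ul", "ol", "li", "a"}
ALLOWED_ATTRS = {"href", "title"}
VOID_TAGS = {"br"}  # elements that never get a closing tag


class FragmentSanitizer(HTMLParser):
    """Rebuild an HTML fragment from whitelisted tags and attributes only."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # unknown element (e.g. <o:p>) is dropped, its text is kept
        kept = "".join(
            ' %s="%s"' % (name, escape(value or "", quote=True))
            for name, value in attrs
            if name in ALLOWED_ATTRS  # drops x:str and other unknown attributes
        )
        self.parts.append("<%s%s>" % (tag, kept))

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        # Text is re-escaped; comments and declarations are silently dropped
        # because their default handlers do nothing.
        self.parts.append(escape(data, quote=False))


def sanitize_fragment(fragment):
    parser = FragmentSanitizer()
    parser.feed(fragment)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)


print(sanitize_fragment('<p x:str="">hello</p>'))  # <p>hello</p>
```

This sketch only addresses the display breakage described in this ticket. It does not check URL schemes in href values, so it should not be read as a security filter.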
still waiting for a demonstration ;)