@@ -12,11 +12,13 @@ This code was written with security best practices in mind, has an
1212extensive test suite, and has undergone
1313[adversarial security review](docs/attack_review_ground_rules.md).
1414
15- ----
15+ ## Getting Started
1616
1717[Getting Started](docs/getting_started.md) includes instructions on
1818how to get started with or without Maven.
1919
20+ ## Prepackaged Policies
21+
2022You can use
2123[prepackaged policies](https://static.javadoc.io/com.googlecode.owasp-java-html-sanitizer/owasp-java-html-sanitizer/20190325.1/org/owasp/html/Sanitizers.html):
2224
@@ -25,7 +27,9 @@ PolicyFactory policy = Sanitizers.FORMATTING.and(Sanitizers.LINKS);
2527String safeHTML = policy.sanitize(untrustedHTML);
2628```
2729
28- or the
30+ ## Crafting a Policy
31+
32+ The
2933[tests](https://github.com/OWASP/java-html-sanitizer/blob/master/src/test/java/org/owasp/html/HtmlPolicyBuilderTest.java)
3034show how to configure your own
3135[policy](https://static.javadoc.io/com.googlecode.owasp-java-html-sanitizer/owasp-java-html-sanitizer/20190325.1/org/owasp/html/HtmlPolicyBuilder.html):
@@ -40,7 +44,9 @@ PolicyFactory policy = new HtmlPolicyBuilder()
4044String safeHTML = policy.sanitize(untrustedHTML);
4145```
4246
43- or you can write
47+ ## Custom Policies
48+
49+ You can write
4450[custom policies](https://static.javadoc.io/com.googlecode.owasp-java-html-sanitizer/owasp-java-html-sanitizer/20190325.1/org/owasp/html/ElementPolicy.html)
4551to do things like changing `h1`s to `div`s with a certain class:
4652
@@ -49,9 +55,11 @@ PolicyFactory policy = new HtmlPolicyBuilder()
4955     .allowElements("p")
5056     .allowElements(
5157         new ElementPolicy() {
52-           public String apply(String elementName, List<String> attrs) {
58+           @Override public String apply(String elementName, List<String> attrs) {
59+             // Add a class attribute.
5360             attrs.add("class");
5461             attrs.add("header-" + elementName);
62+             // Return elementName to include, null to drop.
5563             return "div";
5664           }
5765         }, "h1", "h2", "h3", "h4", "h5", "h6")
@@ -64,14 +72,129 @@ need to be explicitly whitelisted using the `allowWithoutAttributes()`
6472method if you want them to be allowed through the filter when these
6573elements do not include any attributes.
6674
67- ----
75+ [Attribute policies](https://static.javadoc.io/com.googlecode.owasp-java-html-sanitizer/owasp-java-html-sanitizer/20190325.1/org/owasp/html/AttributePolicy.html) allow running custom code too. Adding an attribute policy does not weaken any default checks, such as those applied to `style` or URL attributes.
76+
77+ ```Java
78+ PolicyFactory policy = new HtmlPolicyBuilder()
79+     .allowElements("div", "span")
80+     .allowAttributes("data-foo")
81+         .matching(
82+             (String elementName, String attributeName, String value) -> {
83+               return value;  // Return the value to keep the attribute, or null to drop it.
84+             })
85+         .onElements("div", "span")
86+     .toFactory();
87+ ```
88+
89+ ## Preprocessors
90+
91+ Preprocessors allow inserting text and making large-scale structural changes.
92+
93+ ```Java
94+ PolicyFactory policy = new HtmlPolicyBuilder()
95+     // Use a preprocessor to be backwards compatible with the
96+     // <plaintext> element which
97+     .withPreprocessor(
98+         (HtmlStreamEventReceiver r) -> {
99+           // Provide user with info about links before they click.
100+           // Before: <a href="https://example.com/...">
101+           // After:  (https://example.com) <a href="https://example.com/...">
102+           return new HtmlStreamEventReceiverWrapper(r) {
103+             @Override public void openTag(String elementName, List<String> attrs) {
104+               if ("a".equals(elementName)) {
105+                 for (int i = 0, n = attrs.size(); i < n; i += 2) {
106+                   if ("href".equals(attrs.get(i))) {
107+                     String url = attrs.get(i + 1);
108+                     String origin;
109+                     try {
110+                       URI uri = new URI(url);
111+                       String scheme = uri.getScheme();
112+                       String authority = uri.getRawAuthority();
113+                       if (scheme == null && authority == null) {
114+                         origin = null;
115+                       } else {
116+                         origin = (scheme != null ? scheme + ":" : "")
117+                             + (authority != null ? "//" + authority : "");
118+                       }
119+                     } catch (URISyntaxException ex) {
120+                       origin = "about:invalid";
121+                     }
122+                     if (origin != null) {
123+                       text("(" + origin + ") ");
124+                     }
125+                   }
126+                 }
127+               }
128+               super.openTag(elementName, attrs);
129+             }
130+           };
131+         })
132+     .allowElements("a")
133+     ...
134+     .toFactory();
135+
136+ ```
137+
138+ Preprocessing happens before a policy is applied, so it cannot affect the security
139+ of the output.
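
The origin computation inside the preprocessor above can be tried on its own, since it relies only on `java.net.URI`. The following is a minimal sketch under that assumption; the class name `OriginLabel` and the method `originOf` are hypothetical names introduced here for illustration and are not part of the sanitizer library.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of the sanitizer library.
class OriginLabel {
  /**
   * Returns the scheme and authority of the given URL, null when the URL
   * has neither (for example a relative path), or "about:invalid" when the
   * URL cannot be parsed.
   */
  static String originOf(String url) {
    try {
      URI uri = new URI(url);
      String scheme = uri.getScheme();
      String authority = uri.getRawAuthority();
      if (scheme == null && authority == null) {
        return null;
      }
      return (scheme != null ? scheme + ":" : "")
          + (authority != null ? "//" + authority : "");
    } catch (URISyntaxException ex) {
      return "about:invalid";
    }
  }

  public static void main(String[] args) {
    System.out.println(originOf("https://example.com/path?q=1")); // https://example.com
    System.out.println(originOf("/relative/path"));               // null
    System.out.println(originOf("http://bad host/"));             // about:invalid
  }
}
```

Keeping this logic in a small pure function makes it easy to unit test separately from the stream-event wiring.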
140+
141+ ## Telemetry
142+
143+ When a policy rejects an element or attribute it notifies an [HtmlChangeListener](https://static.javadoc.io/com.googlecode.owasp-java-html-sanitizer/owasp-java-html-sanitizer/20190325.1/org/owasp/html/HtmlChangeListener.html).
144+
145+ You can use this to keep track of policy violation trends and find out when someone
146+ is making an effort to breach your security.
147+
148+ ```Java
149+ PolicyFactory myPolicyFactory = ...;
150+ // If you need to associate reports with some context, you can do so.
151+ MyContextClass myContext = ...;
152+
153+ String sanitizedHtml = myPolicyFactory.sanitize(
154+     unsanitizedHtml,
155+     new HtmlChangeListener<MyContextClass>() {
156+       @Override
157+       public void discardedTag(MyContextClass context, String elementName) {
158+         // ...
159+       }
160+       @Override
161+       public void discardedAttributes(
162+           MyContextClass context, String elementName, String... attributeNames) {
163+         // ...
164+       }
165+     },
166+     myContext);
167+ ```
168+
169+ **Note**: A string that sanitizes with no change notifications is not
170+ necessarily safe to use as-is. Only use the output of the sanitizer.
171+
172+ The sanitizer ensures that the output is in a subset of HTML that commonly
173+ used HTML parsers will agree on the meaning of, but the absence of
174+ notifications does not mean that the input is in such a subset,
175+ only that it does not contain elements or attributes that were removed.
176+
177+ See ["Why sanitize when you can validate"](https://github.com/OWASP/java-html-sanitizer/blob/master/docs/html-validation.md) for more on this topic.
178+
179+ ## Questions?
68180
69- Subscribe to the
70- [mailing list](http://groups.google.com/group/owasp-java-html-sanitizer-support)
71- to be notified of known [Vulnerabilities](docs/vulnerabilities.md).
72181If you wish to report a vulnerability, please see
73182[AttackReviewGroundRules](docs/attack_review_ground_rules.md).
74183
75- ----
184+ Subscribe to the
185+ [mailing list](http://groups.google.com/group/owasp-java-html-sanitizer-support)
186+ to be notified of known [Vulnerabilities](docs/vulnerabilities.md) and important updates.
187+
188+ ## Contributing
189+
190+ If you would like to contribute, please ping [@mvsamuel](https://twitter.com/mvsamuel) or [@manicode](https://twitter.com/manicode).
191+
192+ We welcome [issue reports](https://github.com/OWASP/java-html-sanitizer/issues) and PRs.
193+ PRs that change behavior or that add functionality should include both positive and
194+ [negative tests](https://www.guru99.com/negative-testing.html).
195+
196+ Please be aware that contributions fall under the [Apache 2.0 License](https://github.com/OWASP/java-html-sanitizer/blob/master/COPYING).
197+
198+ ## Credits
76199
77200[Thanks to everyone who has helped with criticism and code](docs/credits.md)