sajad torkamani

What is HTML sanitization?

HTML sanitization is the process of parsing an HTML string and keeping only the tags and attributes that are considered “safe”. It is typically used by server-side programs to remove potentially dangerous tags like <script> or event-handler attributes like onclick, which can be used in cross-site scripting (XSS) attacks.

For example, the strip_tags function in PHP lets you do something like this:

$text = '<p>Test paragraph.</p><!-- Comment --> <a href="#fragment">Other text</a>';
echo strip_tags($text);

to produce:

Test paragraph. Other text

Note how the <p> and <a> tags, along with the HTML comment, were stripped so that only the text content remains.

Or you can pass an allowlist of tags to keep (the array form requires PHP 7.4 or later; earlier versions take a string such as '<p><a>'):

echo strip_tags($text, ['p', 'a']);

to produce:

<p>Test paragraph.</p> <a href="#fragment">Other text</a>
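To make the allowlist idea more concrete, here is a rough TypeScript sketch that mimics the behaviour shown above. The stripTags helper is hypothetical and written only for illustration; it is not how PHP implements strip_tags, and a regex-based approach like this should not be trusted with real untrusted input. It also shows a limitation worth knowing: filtering by tag name alone leaves attributes such as onclick in place, a caveat the PHP documentation also gives for strip_tags.

```typescript
// Hypothetical helper for illustration only; not PHP's actual implementation.
// Regex-based tag stripping is NOT safe for real untrusted input.
function stripTags(html: string, allowedTags: string[] = []): string {
  const allowed = new Set(allowedTags.map((t) => t.toLowerCase()));
  return html
    // Drop HTML comments first.
    .replace(/<!--[\s\S]*?-->/g, "")
    // Keep a tag only if its name is on the allowlist.
    .replace(/<\/?([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>/g, (tag: string, name: string) =>
      allowed.has(name.toLowerCase()) ? tag : ""
    );
}

const text = '<p>Test paragraph.</p><!-- Comment --> <a href="#fragment">Other text</a>';
console.log(stripTags(text));             // Test paragraph. Other text
console.log(stripTags(text, ["p", "a"])); // <p>Test paragraph.</p> <a href="#fragment">Other text</a>

// The allowlist filters tag names only, so attributes pass through untouched.
const risky = '<a href="#" onclick="alert(1)">Click</a>';
console.log(stripTags(risky, ["a"]));     // <a href="#" onclick="alert(1)">Click</a>
```

Because the onclick handler survives, a tag allowlist on its own is not enough to prevent XSS; a parser-based sanitizer that also filters attributes is the safer choice.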

Sanitizing on the client side

If you’re working on a JavaScript application that must render HTML from a third-party service, you can sanitize that content on the client side with a library like DOMPurify before inserting it into the DOM.
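As a minimal sketch of where sanitization fits in client-side code, the TypeScript below routes every untrusted string through a sanitizer before it reaches innerHTML. The HtmlSanitizer interface and the renderUntrusted helper are hypothetical names introduced for this example; the interface is assumed to match the basic sanitize(dirty) call that DOMPurify exposes, and the element id and apiResponse.body in the usage comment are placeholders.

```typescript
// Minimal shape of a sanitizer. Assumption: this matches DOMPurify's
// basic `sanitize(dirty)` call, which returns a cleaned HTML string.
interface HtmlSanitizer {
  sanitize(dirty: string): string;
}

// Hypothetical helper: sanitize first, then assign to innerHTML.
// `target` only needs an innerHTML property, so any DOM element works.
function renderUntrusted(
  target: { innerHTML: string },
  dirty: string,
  sanitizer: HtmlSanitizer
): void {
  target.innerHTML = sanitizer.sanitize(dirty);
}

// In a browser with DOMPurify loaded, usage might look like this
// (the element id and `apiResponse.body` are placeholders):
//   renderUntrusted(document.getElementById("comment")!, apiResponse.body, DOMPurify);
```

Funnelling all third-party HTML through a single helper like this makes it harder to forget the sanitization step at an individual call site.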
