4 Ways to Strip & Remove HTML Tags In Javascript

Welcome to a quick tutorial on how to strip or remove HTML tags in Javascript. Need to extract “text-only” from a string of HTML code? Sanitize a string to make sure there are no HTML tags?

There are 4 common ways to strip or remove HTML tags in Javascript:

  1. Use regular expression – var txt = HTML-STRING.replace(/(<([^>]+)>)/gi, "");
  2. Directly extract the text content from an HTML element – var txt = ELEMENT.textContent;
  3. Use the DOM parser then extract the text content –
    • var dom = new DOMParser().parseFromString(HTML-STRING, "text/html");
    • var txt = dom.body.textContent;
  4. Use a library such as String Strip HTML.

That should cover the basics, but let us walk through more examples – Read on!

ⓘ I have included a zip file with all the source code at the start of this tutorial, so you don’t have to copy-paste everything… Or if you just want to dive straight in.

 

 

TLDR – QUICK SLIDES

[web_stories_embed url=”https://code-boxx.com/web-stories/strip-remove-html-tags-in-javascript/” title=”Strip & Remove HTML Tags In Javascript” poster=”https://code-boxx.com/wp-content/uploads/2023/02/STORY-JS-R4.webp” width=”360″ height=”600″ align=”center”]

Fullscreen Mode – Click Here

 

TABLE OF CONTENTS

 

DOWNLOAD & NOTES

Firstly, here is the download link to the example code as promised.

 

QUICK NOTES

If you spot a bug, feel free to comment below. I try to answer short questions too, but it is one person versus the entire world… If you need answers urgently, please check out my list of websites to get help with programming.

 

EXAMPLE CODE DOWNLOAD

Click here to download all the example source code, I have released it under the MIT license, so feel free to build on top of it or use it in your own project.

 

 

STRIP HTML TAGS IN JAVASCRIPT

All right, let us now move into the examples of stripping HTML tags in Javascript.

 

1) USING REGULAR EXPRESSION

1-regular-expression.html
 // (A) STRIP TAGS FUNCTION
function stripTags (original) {
  return original.replace(/(<([^>]+)>)/gi, "");
}
 
// (B) ORIGINAL STRING
var str = "<p>This is a <strong>string</strong> with some <u>HTML</u> in it.</p>";
 
// (C) "CLEANED" STRING
var cleaned = stripTags(str);
console.log(cleaned);

This method is probably plastered all over the Internet, and regular expression can be very hard to understand. So to keep things easy – replace(/(<([^>]+)>)/gi, "") simply means “replace all <SOMETHING> with an empty string”. Yep, that effectively removes all HTML tags.

 

2) TEXT CONTENT

2-text-content.html
// (A) STRIP TAGS FUNCTION
function stripTags (original) {
  // (A1) CREATE DUMMY ELEMENT & ATTACH HTML
  let ele = document.createElement("div");
  ele.innerHTML = original;
 
  // (A2) USE TEXT CONTENT TO STRIP TAGS
  return ele.textContent;
}
 
// (B) ORIGINAL STRING
var str = "<p>This is a <strong>string</strong> with some <u>HTML</u> in it.</p>";
 
// (C) "CLEANED" STRING
var cleaned = stripTags(str);
console.log(cleaned);

Yep, modern browsers actually come with a very handy NODE.textContent property. Just use that to get the contents of an HTML element, minus all the tags.

 

 

3) DOM PARSER & TEXT CONTENT

3-DOM-parser.htm
// (A) STRIP TAGS FUNCTION
function stripTags (original) {
  // (A1) PARSE STRING INTO NEW HTML DOCUMENT
  let parsed = new DOMParser().parseFromString(original, "text/html");
 
  // (A2) STRIP TAGS, RETURN AS TEXT CONTENT
  return parsed.body.textContent;
}
 
// (B) ORIGINAL STRING
var str = "<p>This is a <strong>string</strong> with some <u>HTML</u> in it.</p>";
 
// (C) "CLEANED" STRING
var cleaned = stripTags(str);
console.log(cleaned);

This is an alternative to the above. Still using textContent, but we are using the DOMParser instead of createElement. Yes, it does make a difference. More on that in the extras section below.

 

4) USING STRIP HTML LIBRARY

4-string-strip.html
<!-- (A) LOAD LIBRARY -->
<script src="https://cdn.jsdelivr.net/npm/string-strip-html/dist/string-strip-html.umd.js"></script>
<script>
const { stripHtml } = stringStripHtml;
 
// (B) ORIGINAL STRING
var str = "<p>This is a <strong>string</strong> with some <u>HTML</u> in it.</p>";
 
// (C) "CLEANED" STRING
var cleaned = stripHtml(str).result;
console.log(cleaned);
</script>

Lastly, for you guys who are on NodeJS – The above textContent will not work since there is effectively no browser. The String Strip HTML library is an alternative you can consider using – Yes, it also works on the “web version” as well.

 

 

EXTRA BITS & LINKS

That’s all for the main tutorial, and here is a small section on some extras and links that may be useful to you.

 

EXTRA – STRIP HTML ENTITES

CREDITS : https://css-tricks.com/snippets/javascript/htmlentities-for-javascript/
function htmlEntities(str) {
  return String(str).replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;").replace(/"/g, "&quot;");
}

For the beginners, HTML entities are simply “special characters” to represent certain symbols. For example:

  • &lt; Represents “lesser than”, will display as <.
  • &gt; Represents “greater than”, will display as >.

To display HTML code snippets, we have to use HTML entities. E.G. <p>Foo</p> should be entered as &lt;p&gt;Foo&lt;/p&gt;. To remove HTML entities, we have to use an entirely different regular expression; None of the above “strip HTML” methods will work.

 

EXTRA – WHICH IS FASTER?

extra-speed.html
// (A) SOME VERY LONG AND STINKY HTML STRING
var str = `LONG LONG HTML STRING HERE`;
 
// (B) REGEX TIMING
console.time("REGEX");
var cleaned = str.replace(/(<([^>]+)>)/gi, "");
console.timeEnd("REGEX");
 
// (C) TEXT CONTENT TIMING
console.time("CONTENT");
cleaned = document.createElement("div");
cleaned.innerHTML = str;
cleaned = cleaned.textContent;
console.timeEnd("CONTENT");
 
// (D) DOM PARSER
console.time("DOMPARSE");
cleaned = new DOMParser().parseFromString(str, "text/html");
cleaned = cleaned.body.textContent;
console.timeEnd("DOMPARSE");
 
// (E) STRIP STRING LIBRARY
const { stripHtml } = stringStripHtml;
console.time("LIB");
cleaned = stripHtml(str).result;
console.timeEnd("LIB");

For you guys who are interested in using the most effective solution – Mirror mirror on the wall, which is the fastest of them all?

Not surprising, regex wins without having to parse anything. DOM parser is faster than creating a dummy element, and the library… grossly inefficient.

 

 

LINKS & REFERENCES

 

INFOGRAPHIC CHEAT SHEET

Strip/Remove HTML Tags In Javascript (click to enlarge)

 

THE END

Thank you for reading, and we have come to the end. I hope that it has helped you to better understand, and if you want to share anything with this guide, please feel free to comment below. Good luck and happy coding!

2 thoughts on “4 Ways to Strip & Remove HTML Tags In Javascript”

  1. Hi Ws. Toh

    Thanks a lot for the article, it helped me strip html tags. I just want to mention that in the first example with regex, it is not removing the html entities for example   < etc …

Comments are closed.