Preventing a WordPress XSS Attack: Complete Guide to Validating, Sanitizing, and Escaping Data


As a developer, probably the most impactful thing you can do to secure your WordPress site is to always clean up the data your code receives from users. That generally means two things: validating or sanitizing data on the way into your system, and escaping it on the way out. Doing both reduces the chance of a WordPress XSS attack. Cross-site scripting attacks are one of the most common ways WordPress sites get compromised, so today we'll cover how they work and how to prevent them.

For WordPress Security with Confidence (my course on WordPress security), I recently surveyed disclosed vulnerabilities in WordPress core, plugins, and themes. The most common type of vulnerability (about 33%) was cross-site scripting. A cross-site scripting vulnerability (often abbreviated XSS) is one where your code lets an attacker run unauthorized JavaScript on your pages, because you failed to escape or sanitize something in your application's data flow.

Today we're going to cover why cross-site scripting is dangerous, and how to do validation, sanitization, and escaping in WordPress. Before we do, you can sign up to get a really interesting video from the course, which shows me executing an actual WordPress XSS attack. Seeing how easy it is for a "hacking noob" like me to follow a cross-site scripting tutorial successfully will help you understand why it's so important that your code doesn't allow this kind of exploit.

What exactly is a WordPress XSS attack?

At its heart, a WordPress cross-site scripting attack is one where a bad actor is able to inject code into your visitors' experience without your knowledge or approval. This is dangerous because JavaScript is an increasingly powerful and important part of websites and web apps, and injected JavaScript runs with the same access as your site's own scripts. Because so much data is available to an attacker who successfully pulls off an XSS attack, you want to be very careful to do everything you can to prevent it.

The root cause of cross-site scripting vulnerabilities, like most such issues in programming, is trusting the source of your data too much. From the perspective of a rational and kind person, it's easy to think that if you have a "Name" field, everything you get in that field will resemble a human name. Make that assumption at your own peril. From a security perspective you must be a little paranoid: assume that a human or bot will claim their name is <script>alert('XSS');</script>, and make sure nothing bad happens in your application if they do. A random JavaScript window saying "XSS" is only the most innocuous outcome of this kind of attack.

Random unwanted alerts aren't great; they're pretty annoying, and if you're not familiar with them they'll almost certainly make you feel like your site was "hacked." But they're not the worst thing that can happen in an XSS attack. Cookie stealing, data snooping, forced redirects, and in-page link replacement are also trivially easy once an attacker can successfully run an XSS attack on a website.

The Kinds of WordPress XSS Attacks

There are, generally, two types of WordPress XSS attacks: those in which your web server ends up helping the attacker, and those in which it doesn't. Technically, these break down further into three classes: stored, reflected, and DOM-based XSS attacks.

Stored (Persisted) Cross-Site Scripting

A stored (or persisted) cross-site scripting attack is, to my mind, the worst kind. In a stored attack, your web server has happily accepted data that includes attack code, saved it, and then serves that code to everyone who loads the affected page. So every page load carries a real risk that what the attacker did reaches every single visitor to your site. An example of a stored XSS vulnerability: in the past, some CMSes have made it possible for people to add JavaScript to comments on websites. When that is allowed, every visitor who is shown the comment containing the JS becomes a victim of the XSS attack. Every. Single. One.

Reflected XSS also Uses Your Server

Like a stored attack, a reflected cross-site scripting attack involves your server; what sets it apart is that the malicious input is never stored on the server. Instead, a reflected XSS vulnerability exists when your server takes input from a request and shows it back to the user without adequately cleaning it and making it safe. Because nothing is saved, the attacker typically has to get a victim to open a specially crafted link that carries the attack code.

On a WordPress site (though very few should be susceptible), an example is a search term that contains <script> tags. A search on a WordPress site typically shows up in a URL like https://wpshout.com/?s=security. But it's not very hard for an attacker to instead submit something like https://wpshout.com/?s=<script>alert(1);</script>. If your site (mostly, your theme) isn't protected, you'll actually see a JavaScript pop-up from the site on the page. (That doesn't happen here on WPShout, because such a request falls afoul of the security protection layer we have. 🤓)
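Here's a minimal sketch in plain PHP of how that search term plays out in a vulnerable template and a protected one. Both function names are hypothetical, and the code uses core PHP's htmlspecialchars() so it runs outside WordPress. In a real theme the term would arrive as $_GET['s'], and you'd typically print it with get_search_query(), which escapes its return value by default, or pass it through esc_html().

```php
<?php
// Hypothetical helpers for illustration; these are not WordPress functions.

// Vulnerable: drops the raw ?s= value straight into the page.
function render_search_heading_unsafe(string $term): string
{
    return '<h1>Results for: ' . $term . '</h1>';
}

// Safer: HTML-escapes the term first, so any tags in it display as text.
function render_search_heading_safe(string $term): string
{
    return '<h1>Results for: ' . htmlspecialchars($term, ENT_QUOTES, 'UTF-8') . '</h1>';
}

// Simulates a request to /?s=<script>alert(1);</script>
$term = '<script>alert(1);</script>';

echo render_search_heading_unsafe($term), "\n"; // a browser would run this script
echo render_search_heading_safe($term), "\n";
// <h1>Results for: &lt;script&gt;alert(1);&lt;/script&gt;</h1>
```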

DOM-based XSS Attacks are not our focus

The last type of cross-site scripting attack is DOM-based. This is the least relevant type for most WordPress sites, because it involves the WordPress server the least: a DOM-based XSS attack doesn't go through your server AT ALL, which is how it differs from both stored and reflected attacks. Practically speaking, DOM-based XSS attacks are only relevant when you're writing JavaScript for WordPress sites. Because that's not yet the most common way to build them (though it's increasingly so), we'll spend very little time on it. I heartily recommend the OWASP article and prevention cheat sheet for anyone interested in learning more about this topic.

The same methods (validation, sanitization, and escaping) prevent all three types of XSS attacks. Today we'll focus on WordPress-specific preventive measures, which are generally PHP functions WordPress provides to protect against reflected and persisted attacks. The concepts we cover also apply to DOM-based attacks, but the preventive measures there are JavaScript-only.

The Meaning of “Escaping”, “Validating”, and “Sanitizing”

There are two big ways to prevent a cross-site scripting attack: make sure that all data you accept matches your expectations, and make sure any code that would make up an XSS attack is shown to your visitors in a way that doesn't allow it to execute.

Making sure that data matches your expectations generally has two sides: validating, which tells users whether they've given you what you asked for, and sanitizing what they give you before you store it, in case they either don't listen to you or bypass your validation. In short, “sanitization” is when you clean up a value, and “validation” is when you tell a user that a value isn't what you expect and invite them to change it.

If someone gives you <script>alert(1)</script> as their age, you might validate that value by telling them it's not valid. You might sanitize it by reducing it to the age of “1”. Either works, and both are better than just one. In general, I'd say you're more secure with a sanitized value than with a merely validated one.

But in case your validation or sanitization proves inadequate, it's best to also escape that age value before you show it to your visitor. Then, if the user did manage to submit the thing you didn't want, they'd see their age shown back as the literal text <script>alert(1)</script> instead of having it run as a script. That's because you'll have HTML-escaped the value, turning the less-than and greater-than signs into the entities &lt; and &gt;, which the browser displays as characters rather than treating as the start and end of a tag. This process of making data safe for output is called “escaping.”
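Here's the age example as a minimal sketch in core PHP, so it runs on its own. It uses filter_var() for validation and sanitization and htmlspecialchars() for escaping; the 0 to 150 range is an assumption chosen for illustration. Inside WordPress, esc_html() would take the place of htmlspecialchars().

```php
<?php
// The hostile "age" from the example above.
$raw_age = '<script>alert(1)</script>';

// 1. Validate: is it a whole number in a plausible range? Returns an int or false.
$valid_age = filter_var($raw_age, FILTER_VALIDATE_INT, [
    'options' => ['min_range' => 0, 'max_range' => 150],
]);
if ($valid_age === false) {
    echo "Please enter your age as a number.\n"; // ask the user to resubmit
}

// 2. Sanitize: keep only digits and plus/minus signs. Here that leaves "1".
$sanitized_age = filter_var($raw_age, FILTER_SANITIZE_NUMBER_INT);

// 3. Escape on output, in case something slipped past steps 1 and 2.
echo htmlspecialchars($raw_age, ENT_QUOTES, 'UTF-8'), "\n";
// &lt;script&gt;alert(1)&lt;/script&gt;
```

Each layer catches the value independently, which is why the text recommends doing more than one.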

How to Validate Data in WordPress

Validation is making sure that a value matches what you expect. Typically, you’ll validate so that you can make the user resubmit their request when the validation fails.

For validation, you can use PHP's filter_var function, if that's your style. That'll often look like filter_var($_GET['email'], FILTER_VALIDATE_EMAIL). These filters are powerful, and have the distinct advantage that they work outside of WordPress. You can see all of the available validation filters on PHP.net. For most of them, a return value of false means the data is invalid.
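As a minimal sketch of validation that feeds a message back to the user, here's a small wrapper around that filter_var call. The helper name validate_email_field is hypothetical; in a real form handler the input would come from $_POST or $_GET.

```php
<?php
// Returns an error message to show the user, or null if the email is valid.
// validate_email_field is a hypothetical helper, not a PHP or WordPress function.
function validate_email_field(string $input): ?string
{
    if (filter_var($input, FILTER_VALIDATE_EMAIL) === false) {
        return 'Please enter a valid email address.';
    }
    return null;
}

var_dump(validate_email_field('someone@example.com'));       // NULL
var_dump(validate_email_field('<script>alert(1)</script>')); // string(35) "Please enter a valid email address."
```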

WordPress's functions that are specific to XSS attack prevention fall mostly into the sanitization camp. But a few are specifically useful for validation. My favorite is is_email. It's meant to do the same job as the filter_var call from the last paragraph, but it's more concise: if ( is_email( $_GET['email'] ) ) is shorter than what we typed last time, and reads much more like English.

Validation can be, and often is, done client-side with JavaScript. This is great, and there are a number of JavaScript libraries I've used in the past to help with it. The downside is that client-side validation offers no security guarantee: an attacker can skip your page entirely and send requests straight to your server. So doing some validation server-side is usually a good idea. And if you're not validating server-side, it's essential that you sanitize there.

How to Sanitize Data in WordPress

WordPress has a bevy of great functions to clean up untrusted user input. If you're a developer serious about preventing cross-site scripting, you should become well acquainted with them. But before we get into them, it's good to know that PHP's filter_var also has a number of sanitization filters. Their behavior is explained at some length on the sanitize filters page of the PHP manual. At the heart of most of these operations is simply removing every character in the submitted value that isn't of the type you wanted. So if you call filter_var($val, FILTER_SANITIZE_NUMBER_INT), every character that isn't one of -+0123456789 is removed.
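A quick sketch of what FILTER_SANITIZE_NUMBER_INT does to a few made-up inputs:

```php
<?php
// FILTER_SANITIZE_NUMBER_INT keeps only digits and the + and - signs.
$inputs = ['Age: -42 years!', '+1 (555) 010-9999', '<b>7</b>'];

foreach ($inputs as $input) {
    $clean = filter_var($input, FILTER_SANITIZE_NUMBER_INT);
    echo $input, ' => ', $clean, "\n";
}
// Age: -42 years! => -42
// +1 (555) 010-9999 => +1555010-9999
// <b>7</b> => 7
```

The second result shows the limit of this kind of filter: the value now contains only "safe" characters, but it still isn't a meaningful integer. Sanitizing narrows what can get through; it doesn't replace checking that the value makes sense.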

These filter_var calls are the basis of some of WordPress's sanitization functions, but the WordPress functions are often easier to read. They also, nicely, apply a lot of common WordPress conventions for you. Here's a quick list with a brief explanation of each, starting with the more common ones:

  • sanitize_email – Removes all characters not allowed in an email address.
  • sanitize_file_name – Cleans up a filename, like August_Reports.txt.
  • sanitize_html_class – Makes sure that an HTML class name only has valid characters.
  • sanitize_text_field – A convenient way to clean up basic form text fields.
  • sanitize_textarea_field – Like sanitize_text_field, but preserving new line characters.
  • esc_url_raw – Poorly named, but sanitizes URLs before you store them or use them to fetch data.
  • sanitize_option – Cleans up an option value based on a set of rules, and also lets you add your own rules with code like add_filter( 'sanitize_option_my_id_option', 'intval' ); WordPress will always run that filter before saving the option, because it always calls sanitize_option before saving.
  • sanitize_meta – A hook-based system to clean up meta values based on rules you define. Similar to sanitize_option.
  • wp_kses – Removes unacceptable HTML markup from a string; you supply the acceptable HTML tags (and their allowed attributes) as the second argument. (KSES is a recursive acronym which stands for “KSES Strips Evil Scripts.”)

And the less-used ones:

All of these functions are of some use, and may help you at some point while sanitizing data in WordPress. The core thing to keep in mind about sanitization is that there's no real downside to it. For example, you can run a value through sanitize_option again every time you fetch it with get_option. Running your cleaning function multiple times may seem a little wasteful, but it's a good guarantee that your values don't contain hostile code.
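Of the functions listed earlier, wp_kses is the one most easily confused with a core PHP function. PHP's strip_tags() also takes a list of allowed tags, but it keeps every attribute on the tags you allow, so an event handler like onclick survives. The sketch below uses made-up markup. wp_kses() instead takes an allowlist of tags and, for each tag, an allowlist of attributes; its call appears only in a comment because it needs WordPress loaded.

```php
<?php
$html = '<p onclick="steal()">Hi <script>alert(1)</script><b>there</b></p>';

// Core PHP: removes disallowed tags such as <script> (their text content stays),
// but leaves all attributes on the allowed <p> and <b> tags intact,
// so the onclick handler is still there.
$stripped = strip_tags($html, '<p><b>');
echo $stripped, "\n";

// Inside WordPress, wp_kses() filters attributes too. With this allowlist,
// <p> and <b> are kept but may carry no attributes, so onclick is dropped:
//
// $allowed = array(
//     'p' => array(),
//     'b' => array(),
// );
// $clean = wp_kses( $html, $allowed );
```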

It's also a great idea to build sanitization into your fetching and setting process. If you make sure you always use a function like get_user_age, and that function ensures the value it returns is of the type its name implies, you'll have a valuable extra layer of security in your system.
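Here's a minimal sketch of that idea. A plain array stands in for wherever you actually store the value (user meta, an option, a custom table). get_user_age comes from the paragraph above, while set_user_age and sanitize_age are hypothetical companions; none of them are WordPress functions, and the 0 to 150 clamp is an assumption.

```php
<?php
// Hypothetical in-memory store standing in for user meta or an option.
$age_store = [];

// One place that defines what a "clean" age is.
function sanitize_age($value): int
{
    $age = (int) filter_var($value, FILTER_SANITIZE_NUMBER_INT);
    return max(0, min($age, 150)); // clamp to an assumed plausible range
}

// Sanitize on the way in...
function set_user_age(array &$store, int $user_id, $value): void
{
    $store[$user_id] = sanitize_age($value);
}

// ...and again on the way out, in case the stored value was changed elsewhere.
function get_user_age(array $store, int $user_id): int
{
    return sanitize_age($store[$user_id] ?? 0);
}

set_user_age($age_store, 7, '<script>alert(1)</script>');
$age_store[8] = '35 years'; // simulates a value written without set_user_age()

echo get_user_age($age_store, 7), "\n"; // 1
echo get_user_age($age_store, 8), "\n"; // 35
```

Because every read goes through the same function, the rest of your code can rely on always getting an integer, even if something wrote a messy value directly to storage.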

How to Escape Data in WordPress

Escaping differs from sanitization (sometimes called filtering) in one very important way: it doesn't remove characters; it encodes them so they're safe to output. The difference is both subtle and profound. Where sanitizing an age of <alert>1</alert> would yield a value of 1, escaping it (depending on context) yields something like &lt;alert&gt;1&lt;/alert&gt;. The escaped version displays to a visitor as the original text, but the browser won't interpret it as markup and run it. That makes it vastly safer than the unescaped alternative.

Remember, though escaping can make almost any data safe in WordPress, it should still be your second line of defense. (“Layered security” is a great practice, and it's the reason you should always do both validation/sanitization and escaping.) When you escape correctly, your HTML document stays both safe and well formed. The hard thing about escaping is that the proper way to escape content depends a great deal on context.

The way you make a value safe to show in the middle of a page of HTML is different from how you make it safe inside an HTML attribute, or inside a URL.

WordPress provides five basic escaping functions. They are:

  • esc_html – Escapes text for output in HTML content, so any tags in it are displayed as text rather than rendered
  • esc_url – Checks and cleans a URL before you output it, for example in an href attribute
  • esc_js – Escapes a text string for use in inline JavaScript, such as an onclick attribute. I find it esoteric and have never really used it.
  • esc_attr – What you need for HTML attribute values, like a title or similar attribute
  • esc_textarea – For values going into a textarea in HTML

Pure-PHP alternatives also exist, and you can use them inside WordPress. I generally favor the WordPress functions: they're better named, and thus easier. But functions like PHP's htmlspecialchars work and can be used. I just find that esc_html is less to type and makes clear both my intent and what will happen. (By contrast, the plural noun in htmlspecialchars says very little about what transformation, if any, I should expect from the function.)
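To make the point about context concrete, here's a minimal sketch using only core PHP, so it runs outside WordPress. Each line escapes the same made-up title for a different destination, and the comments name the WordPress function you'd typically use in that spot. The JSON flags are one common way to embed data in a script tag, not what esc_js() does internally; in WordPress, wp_json_encode() is often used for this.

```php
<?php
$title = 'Tom & "Jerry" <script>alert(1)</script>';

// HTML text content (WordPress: esc_html).
$heading = '<h2>' . htmlspecialchars($title, ENT_QUOTES, 'UTF-8') . '</h2>';

// HTML attribute value (WordPress: esc_attr). ENT_QUOTES encodes both " and ',
// so the value can't break out of the quoted attribute.
$link = '<a href="/" title="' . htmlspecialchars($title, ENT_QUOTES, 'UTF-8') . '">Home</a>';

// A value placed inside a URL's query string. rawurlencode() handles the value;
// in WordPress you'd still pass the finished URL through esc_url() before output.
$url = 'https://example.com/?s=' . rawurlencode($title);

// Data embedded in an inline script. The JSON_HEX_* flags turn < > & ' " into
// \u escapes, so a "</script>" inside the string can't end the script element.
$script = '<script>var title = '
    . json_encode($title, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT)
    . ';</script>';

echo $heading, "\n", $link, "\n", $url, "\n", $script, "\n";
```

The same input needs four different treatments, which is why there isn't a single "escape" function that is right everywhere.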

Another thing about escaping: you generally want to do it at the last possible moment. There are reasons you may not always be able to, but escaping as late as possible is usually best because (1) escaping concerns don't get in the way of the rest of your code, and (2) you don't have to worry about some other operation breaking your escaping. (Some people give similar, but opposite, advice about sanitization: do it as early as possible. That's good advice for the same basic reason.)
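Here's a small sketch of reason (2), using a made-up value and core PHP. If a value is escaped early and later code truncates it for an excerpt, the cut can land in the middle of an HTML entity. Doing the string work on the raw value and escaping at output time avoids that.

```php
<?php
$raw = 'Tom & Jerry';

// Escaping early: later code truncates the escaped string for an excerpt
// and slices the &amp; entity in half.
$escaped_early = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8'); // "Tom &amp; Jerry"
$excerpt_early = substr($escaped_early, 0, 6);                 // "Tom &a" (broken entity)

// Escaping late: do the string work on the raw value, escape at output time.
$excerpt_late = htmlspecialchars(substr($raw, 0, 6), ENT_QUOTES, 'UTF-8'); // "Tom &amp; "

echo $excerpt_early, "\n", $excerpt_late, "\n";
```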

Security Never Stops, but XSS is One of the Most Common Problems

As mentioned at the outset, a recent survey I did of security vulnerabilities in the WordPress ecosystem found that cross-site scripting was (by a small margin) the most common vulnerability. As a developer, understanding what XSS is and how to prevent it is vital to keeping your site secure for both you and your visitors.

But security is about more than WordPress XSS attacks. Some of this we've covered before, and some you've heard at WordCamps and in blog posts. But some of it is hard to understand and hard to think clearly about. That's why I put many months into the course WordPress Security with Confidence.

WordPress Security With Confidence is your complete guide to navigating the confusing, scary, and exceptionally important world of WordPress security. The course features 10+ chapters (with video tutorials), comes in developer and non-developer versions, and gives you all the knowledge you need to handle WordPress security, with confidence. Take a look!

Now, go forth and prevent unwanted JavaScript execution!


6 Comments
Tim
December 7, 2020 7:35 pm

I am using esc_html, yet a security scan is still failing for a reflected XSS.

A URL value is passed like this:

to_amt=1″%27>%20

The PHP code renders it to the value and uses the esc_html function:

<input type="number" name="to_amt" " value="”>

Then it seems to swap the %27 for a backslash and the rendered code looks like this:

“>

Any idea why?

Aisha Henderson
May 27, 2020 9:36 am

This is one of the best articles I’ve read in a while concerning XSS attacks. Thank you for taking the time to share your knowledge.

Lisa
May 26, 2020 2:40 pm

Tried to get the bonus video for the example of an XSS attack. When I input my email address, I get: Oops! It looks like there was an error: There was an error with your submission: 404: The requested resource could not be found.

Obi Plabon
October 19, 2017 1:10 pm

Thanks for this informative article.

I’ve found another sanitization function, sanitize_textarea_field(), which was added in 4.7.0 (https://developer.wordpress.org/reference/functions/sanitize_textarea_field/). Maybe you can add this to the sanitization functions list.

Thanks