Preventing a WordPress XSS Attack: Complete Guide to Validating, Sanitizing, and Escaping Data
When it comes to making your WordPress site secure as a developer, probably the most impactful thing you can do is make sure you always clean up the data you get from users. That generally means two things: validating or sanitizing it on the way into your system, and escaping it on the way out. Together these reduce the chance of a WordPress XSS attack.
Today we’re going to cover how cross-site scripting is dangerous, and how to do validation, sanitization, and escaping in WordPress.
What exactly is a WordPress XSS attack?
The root cause of cross-site scripting vulnerabilities, like most such issues in programming, is trusting too much in the source of your data. From the perspective of a rational and kind person, it’s easy to just think that if you have a “Name” field, the only things you’ll get in that field resemble human names. But make that assumption at your own peril. From a security perspective you must be a little paranoid: assume that a human or bot will offer that their name is
Random unwanted alerts aren’t great; they’re pretty annoying and will almost certainly make you feel like your site was “hacked” if you’re not familiar with them. But they’re not the worst thing that can happen in an XSS attack: cookie stealing, data-snooping, unwanted forced redirects, and in-page link replacements are also trivially easy to execute once an attacker can successfully mount an XSS attack on a website.
The Kinds of WordPress XSS Attacks
There are, generally, two types of WordPress XSS attacks: those that your web server ends up helping the attacker with, and those that it does not. Technically, that breaks down further into three classes: stored, reflected, and DOM-based XSS attacks.
Stored (Persisted) Cross-Site Scripting
Reflected XSS also Uses Your Server
The involvement of your server in the stored XSS attack is mirrored by that of a reflected cross-site scripting attack. Both of these involve your server, but a reflected attack is differentiated by not being stored there. Rather, a reflected XSS attack is possible when your server takes input from a user and shows it back without adequately cleaning it first.
An example on a WordPress site (which very few should be susceptible to) is when a search term contains <script> tags. The URL for a search on a WordPress site typically looks like https://wpshout.com/?s=security. But it’s not very hard for an attacker to instead submit something like
DOM-based XSS Attacks are not our focus
The Meaning of “Escaping”, “Validating”, and “Sanitizing”
There are two big ways to prevent a cross-site scripting attack: making data right, and making wrong data safe. That is, make sure that all data you accept matches your expectations, and make sure that code that would make up an XSS attack is shown to your visitors in a way that does not allow it to execute.
Making sure that data matches your expectation generally has two sides: validating, which tells a user whether they’ve given you what you asked for, and sanitizing what they give you before you store it, in case they either don’t listen to you or bypass your validation. Sanitization is when you clean up a value; validation is when you tell a user that a value isn’t what you expect and invite them to make changes. If someone gives you <script>alert(1)</script> as their age, you may validate that value by telling them it’s not a valid value. You might sanitize that value by just making it into the age of “1”. Either works, and both are better than just one. In general, I’d say that you’re more secure with a sanitized value than simply a validated one.
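To make that difference concrete, here is a minimal sketch in plain PHP using the built-in filter_var function. The variable names are placeholders of mine, and this is only one way you might approach it.

```php
<?php
// Hypothetical raw input: someone typed a script tag into the "age" field.
$raw_age = '<script>alert(1)</script>';

// Validation: is this an integer at all? filter_var() returns false if not.
$validated = filter_var( $raw_age, FILTER_VALIDATE_INT );

// Sanitization: strip everything except digits and plus/minus signs.
$sanitized = filter_var( $raw_age, FILTER_SANITIZE_NUMBER_INT );

var_dump( $validated ); // bool(false), so you'd ask the user to try again
var_dump( $sanitized ); // string(1) "1"
```

Notice that the sanitized result is still a string; you’d cast it to an integer before treating it as a number.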
But if your validation or sanitization proves inadequate, it’s best to also make sure that you escape that age value before you show it to your visitor. The most common outcome of such an effort would be that you’d show that user their age back (if you failed to prevent them submitting the thing you didn’t want) as
How to Validate Data in WordPress
Validation is making sure that a value matches what you expect. Typically, you’ll validate so that you can make the user resubmit their request when the validation fails.
For validation, you can use the filter_var PHP function, if that’s your style. That’ll often look like filter_var($_GET['email'], FILTER_VALIDATE_EMAIL). These functions are powerful, and have the distinct advantage that they work outside of WordPress. You can see all of the available validation filters on PHP.net. In nearly all cases, if the function gives you false, you know that the data is invalid (FILTER_VALIDATE_BOOLEAN, for which false can be a legitimate answer, is the exception).
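As a rough sketch of how that validation call might be wrapped up, here is a small plain-PHP helper. The function name is hypothetical, and the sample addresses are invented for the example.

```php
<?php
/**
 * Return the email address if it looks valid, or null if it does not.
 * The function name is hypothetical, not part of PHP or WordPress.
 */
function example_valid_email_or_null( $raw ) {
    // filter_var() hands back the address on success and false on failure.
    $email = filter_var( $raw, FILTER_VALIDATE_EMAIL );

    return ( false === $email ) ? null : $email;
}

// In a real request you'd pass something like $_GET['email'] ?? ''.
$good = example_valid_email_or_null( 'reader@example.com' );
$bad  = example_valid_email_or_null( '<script>alert(1)</script>' );
```

When you get null back, that’s your cue to show the form again with an error message rather than to store anything.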
WordPress’s functions that are specific to XSS attack prevention fall mostly into the sanitization camp. But there are a few that are specifically useful for validation. Maybe my favorite is is_email. It’s meant to do the same thing as the filter_var call from the last paragraph, but it’s much more concise: if ( is_email( $_GET['email'] ) ) is shorter than what we typed last time, and reads much more like English.
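Here’s a minimal sketch of is_email in use, assuming it runs inside WordPress (so is_email and wp_unslash are available). The helper name is hypothetical.

```php
<?php
/**
 * Hypothetical helper: returns the submitted email if WordPress considers
 * it valid, or an empty string otherwise.
 */
function example_get_submitted_email() {
    // Ignore anything that isn't a plain string (for example ?email[]=x).
    if ( ! isset( $_GET['email'] ) || ! is_string( $_GET['email'] ) ) {
        return '';
    }

    // wp_unslash() undoes the slashes WordPress adds to request data.
    $raw = wp_unslash( $_GET['email'] );

    // is_email() returns the address when valid and false when not.
    return is_email( $raw ) ? $raw : '';
}
```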
How to Sanitize Data in WordPress
WordPress has a bevy of great functions to clean up untrusted user input. You should, if you’re a developer serious about cross-site scripting prevention, become well-acquainted with them. But before we get into them, it’s good to know that PHP’s filter_var also has a bunch of sanitization flags. The behavior of those is all explained at some length in this page of the PHP manual. At the heart of most of these operations is simply removing all the characters in what’s submitted that aren’t of the type you wanted. So if you run a call of filter_var($val, FILTER_SANITIZE_NUMBER_INT) you’ll have all of your letters and other characters that aren’t digits or plus and minus signs removed.
filter_var function calls are the basis of some of WordPress’s sanitization functions, but the WordPress functions are often easier to read. They also, nicely, apply a lot of common WordPress conventions for you. A quick list of them, with a modest explanation of each, follows. First the more common ones:
- sanitize_email – Removes all characters not allowed in an email address.
- sanitize_file_name – Clean up a filename, like
- sanitize_html_class – Makes sure that an HTML class name only has valid characters.
- sanitize_text_field – A convenient way to clean up basic form text fields.
- sanitize_textarea_field – Like sanitize_text_field, but preserving new line characters.
- esc_url_raw – Poorly named, but sanitizes URLs before you store them or use them to fetch data.
- sanitize_option – A function to clean up based on some rules; which also offers you a chance to set your own rules using code like
add_filter( 'sanitize_option_my_id_option', 'intval');which WordPress will always call before saving the option (because it always calls
- sanitize_meta – A hook-based system to clean up meta values based on rules you define. Similar to
- wp_kses – Remove unacceptable HTML markup from a string; you supply acceptable HTML tags as the second argument. (KSES is a recursive acronym which stands for “KSES Strips Evil Scripts”.)
And the less-used ones:
- sanitize_key – Used by WordPress for plugin names, rarely used in development.
- sanitize_mime_type – Cleans up MIME types, like
- sanitize_sql_orderby – Ensures string is a valid SQL ‘order by’ statement.
- sanitize_title – Turn a string (typically a post title) into a slug that is safe to use in a URL.
- sanitize_title_for_query – Clean up a post title specifically for a DB query.
- sanitize_title_with_dashes – Clean up a string specifically for permalink-type use.
- sanitize_user – Clean up a username to match WordPress standards.
- wp_filter_post_kses – Use wp_kses without needing to specify the rules, just get the same set WordPress uses for post content.
- wp_filter_nohtml_kses – Strip all HTML using wp_kses.
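To show how a few of the functions from these lists fit together, here is a minimal sketch that cleans a submitted form before storage. It assumes it runs inside WordPress; the helper name and the field names (name, email, website, bio) are invented for the example.

```php
<?php
/**
 * Hypothetical helper that cleans a submitted profile form before storage.
 * The field names are assumptions made for this sketch.
 */
function example_sanitize_profile( array $raw ) {
    return array(
        'name'    => sanitize_text_field( $raw['name'] ?? '' ),
        'email'   => sanitize_email( $raw['email'] ?? '' ),
        'website' => esc_url_raw( $raw['website'] ?? '' ),
        // Allow only links and basic emphasis in the bio; strip all other tags.
        'bio'     => wp_kses(
            $raw['bio'] ?? '',
            array(
                'a'      => array( 'href' => array() ),
                'em'     => array(),
                'strong' => array(),
            )
        ),
    );
}

// Usage inside a form handler might look like:
// $clean = example_sanitize_profile( wp_unslash( $_POST ) );
```

Each field gets the function that matches what it’s supposed to hold, which is really the whole trick of sanitization.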
All of these functions are of some use, and may be helpful for you at some point while sanitizing your data in WordPress. The core thing to keep in mind when it comes to sanitization is that there is no real downside to it. Things like sanitize_option are great as a way to sanitize a value every time you save it with update_option. These repeated runs of your cleaning function may seem a little wasteful, but they’re also a good guarantee that your values don’t contain hostile code.
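A sanitize_option rule can be sketched like this, assuming an option named my_id_option that should only ever hold a non-negative integer. The callback name is hypothetical, and the registration line is left as a comment so the snippet only defines the callback.

```php
<?php
/**
 * Hypothetical sanitize callback for an option named "my_id_option".
 * WordPress passes in the value that is about to be saved.
 */
function example_sanitize_my_id_option( $value ) {
    // absint() converts the value to a non-negative integer.
    return absint( $value );
}

// Registration, typically in your plugin's main file:
// add_filter( 'sanitize_option_my_id_option', 'example_sanitize_my_id_option' );
```

With that filter registered, a call like update_option( 'my_id_option', $anything ) goes through the callback before anything is written.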
It’s also a great idea to wrap the sanitization of your values into your fetching and setting process. If you make sure that you always use get_user_age or some such function, and that function ensures that the value it returns is of the type it implies, you’ll have a hugely beneficial extra layer of security in your system.
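Such a getter and setter pair might look like the sketch below. It assumes it runs inside WordPress and that the age lives in user meta under an "age" key; both function names and the meta key are my own inventions for illustration.

```php
<?php
/**
 * Hypothetical getter: always returns the user's age as a non-negative
 * integer, whatever happens to be stored. The "age" meta key is an assumption.
 */
function example_get_user_age( $user_id ) {
    $stored = get_user_meta( $user_id, 'age', true );

    return absint( $stored );
}

/**
 * Matching hypothetical setter: sanitizes before the value reaches the database.
 */
function example_set_user_age( $user_id, $age ) {
    return update_user_meta( $user_id, 'age', absint( $age ) );
}
```

If the rest of your code only ever touches the age through these two functions, a hostile string has nowhere to sneak through.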
How to Escape Data in WordPress
Escaping differs from sanitization (sometimes called filtering) in one very important way: it doesn’t remove characters, it just makes them safer for you to use. The difference here is both subtle and profound. What this means, at heart, is that where sanitizing an age of <alert>1</alert> would yield a value of 1, escaping it (depending on context) yields something like &lt;alert&gt;1&lt;/alert&gt;. The latter displays to a visitor exactly as it was typed, but the browser never treats it as markup to run. This is thus vastly preferable.
Remember, though escaping can make most any data safe in WordPress, it still should be your second line of defense. (“Layered security” is a great practice, and is the reason that you should always do both validation/sanitization and escaping.) What happens when you escape correctly is that your HTML document is very safe and very well formed. The hard thing about escaping is that the proper way to escape content depends a great deal on context.
The way you make sure a value is safe to show in the middle of a page of HTML is different than how you make sure it’s safe for inside of an HTML attribute declaration, or a URL.
WordPress provides a handful of basic escaping functions. They include:
- esc_html – Used to protect large blocks of HTML content
- esc_url – Checks and cleans a URL
- esc_attr – What you need for HTML attributes, like a value going into a title or similar element attribute
- esc_textarea – For values going into a textarea in HTML
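To show how the context decides which function you reach for, here is a minimal sketch of a template helper, assuming it runs inside WordPress. The helper name, its parameters, and the markup are invented for the example.

```php
<?php
/**
 * Hypothetical template helper: builds a small profile card, escaping each
 * value for the context it lands in.
 */
function example_profile_card( $name, $website, $bio ) {
    // URL context: esc_url(). Attribute context: esc_attr().
    $html  = '<a href="' . esc_url( $website ) . '" title="' . esc_attr( $name ) . '">';
    // Text shown between tags: esc_html().
    $html .= esc_html( $name );
    $html .= '</a>';
    // Text inside a textarea: esc_textarea().
    $html .= '<textarea name="bio">' . esc_textarea( $bio ) . '</textarea>';

    return $html;
}
```

The same $name value is escaped twice, in two different ways, because it appears in two different places.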
There are alternatives in pure PHP which you can also use inside of WordPress. I generally would favor the WordPress functions: they’re better named, and thus easier, but things like PHP’s htmlspecialchars work and can be used. I just find that esc_html is less to type and makes clear both my intent and what will happen.
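For comparison, the pure-PHP route can be sketched in a few lines. This runs without WordPress; the variable names are placeholders.

```php
<?php
// Hypothetical hostile value that made it past validation and sanitization.
$raw_age = '<script>alert(1)</script>';

// ENT_QUOTES also converts single quotes, which matters inside attributes.
$escaped = htmlspecialchars( $raw_age, ENT_QUOTES, 'UTF-8' );

echo $escaped, "\n"; // prints &lt;script&gt;alert(1)&lt;/script&gt;
```

A browser shows that output as the literal text the attacker typed, rather than running it.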
Another thing about escaping: you generally want to do it at the last possible second. There are reasons that you may not be able to, but in general escaping at the last possible second is best because you (1) don’t get in the way of your other programming with escaping concerns, and (2) don’t have to worry about some other programming operation breaking your escaping. (Some people give similar–but opposite–advice about sanitization–do it as early as possible–which is good advice for the same basic reason.)
Security Never Stops, but XSS is One of the Most Common Problems
As mentioned at the outset, a recent survey I did of security vulnerabilities inside of the WordPress ecosystem found that (by a small margin) cross-site scripting was the most common vulnerability. As a developer, understanding what XSS is and how you can prevent it is vital to keep your site secure for both you and your visitors.
But there’s more than WordPress XSS attacks that matters to security. Some of this we’ve covered before, some of this you’ve heard at WordCamps and in blog posts. But some of it is hard to understand and hard to think clearly about.
WordPress Security With Confidence is your complete guide to navigating the confusing, scary, and exceptionally important world of WordPress security. The course features 10+ chapters (with video tutorials), comes in developer and non-developer versions, and gives you all the knowledge you need to handle WordPress security, with confidence. Take a look!