How to Create an Excerpt From a Post Without an Excerpt and Limit It by Character Count


WordPress posts have two kinds of text associated with them: the post’s content and its excerpt. The content is the main part of the post, and the excerpt is either filled out in its own field or generated by WordPress from the first part of the content.

In this article, we’ll concentrate on what happens when the post’s excerpt is empty, and on how we can control the length of the generated excerpt. However, it’s always best to write the excerpt yourself and so gain full control over that field. How to add an excerpt to a WordPress post is described in this WPShout post:

How to Set a Custom Post Excerpt in WordPress

The Case for Requiring a Fixed Character Length Excerpt

Many sites’ homepages display a list of posts that shows only their excerpts. Sometimes that list is laid out as a grid, where each post is a block in a row. In those cases, every post gets a fixed width and height, so all the excerpts must be the same length for the blocks to line up neatly in rows and columns.

WordPress has a handy function, the_excerpt(), which displays a post’s excerpt. It works smartly: it first checks whether the post has an excerpt at all and, if not, generates one from the first words of the content. By default it takes 55 words, but you can change that number with the excerpt_length filter, which this WPShout article covers in depth:

Changing WordPress Excerpt Length: Learning Through Spelunking

In theory, we could use the_excerpt() to limit the excerpt length. On closer inspection, however, it wasn’t precise enough for us. Words are a very inaccurate unit of measure, since they vary in length, and we need something more precise: letters, or, as we call them in programming, characters. So this article describes a function that takes the content (not the excerpt) and extracts the desired number of characters from it.
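To see why words are too coarse a unit for a fixed-size grid, here is a small standalone sketch. sketch_first_words() is a hypothetical helper that simply keeps the first N whitespace-separated words; it is not WordPress’s own trimming code, just the same idea in miniature.

```php
<?php
// Hypothetical helper, for illustration only (not WordPress's wp_trim_words()):
// keep the first $count whitespace-separated words of $text.
function sketch_first_words(string $text, int $count): string {
    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    return implode(' ', array_slice($words, 0, $count));
}

$short_words = 'We ate a big red fig on a hot day and sat.';
$long_words  = 'Internationalization considerations fundamentally complicate straightforward implementations.';

// The same 5-word limit produces very different lengths in characters.
echo mb_strlen(sketch_first_words($short_words, 5)), PHP_EOL; // 16
echo mb_strlen(sketch_first_words($long_words, 5)), PHP_EOL;  // 76
```

Both excerpts are "5 words long", yet one is almost five times longer than the other, which is exactly what breaks a grid layout.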

We’ll go through each part of that function and explain everything a function like this has to take into account. Extracting characters isn’t just a matter of counting letters: the post may contain other elements, such as HTML tags wrapping the words, images, shortcodes and blocks. Counting characters also requires care with characters outside the ASCII range, which take up more than one byte in UTF-8. We’ll discuss each of these issues in turn.

Strip Tags

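The first thing we want to do is strip all HTML tags. strip_tags() also removes img tags and the HTML comments that wrap Gutenberg blocks (such as <!-- wp:paragraph -->). So one of the first commands in the function is $content = strip_tags($content);. One caveat: once the tags between two paragraphs are gone, the last word of one paragraph can stick to the first word of the next, so it’s worth adding a space after each closing </p> tag before stripping.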
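Here is a minimal standalone sketch of that step, assuming markup written without line breaks between paragraphs. sketch_strip_markup() is a hypothetical helper name; it only shows how the extra space after </p> keeps words apart once strip_tags() has removed the tags and block comments.

```php
<?php
// Hypothetical helper for illustration: strip HTML, including Gutenberg block
// comments, after adding a space behind each closing </p> so that words from
// adjacent paragraphs don't fuse together.
function sketch_strip_markup(string $html): string {
    $html = str_ireplace('</p>', '</p> ', $html);
    // strip_tags() removes HTML tags and HTML comments such as <!-- wp:paragraph -->.
    return strip_tags($html);
}

$block_markup = '<!-- wp:paragraph --><p>Hello <strong>world</strong></p><!-- /wp:paragraph -->'
    . '<!-- wp:paragraph --><p>Second paragraph</p><!-- /wp:paragraph -->';

echo '"' . strip_tags($block_markup) . '"', PHP_EOL;          // "Hello worldSecond paragraph"
echo '"' . sketch_strip_markup($block_markup) . '"', PHP_EOL; // "Hello world Second paragraph "
```

The trailing space left behind is harmless: the whitespace-collapsing step later trims it.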

Strip Shortcodes

With the HTML tags gone, we still have content we don’t want in the excerpt: shortcodes. The excerpt is meant to show only text, and an unparsed shortcode such as [gallery ids="1,2"] makes no sense to the reader. Thankfully, we don’t have to write our own parser: WordPress has a function that does exactly that, strip_shortcodes(). Note that it only removes shortcodes registered on the site, so a shortcode left behind by a deactivated plugin will remain in the text.
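Outside WordPress, the behavior can be approximated with a deliberately simplified sketch. sketch_strip_shortcodes() is hypothetical and much cruder than WordPress’s real parser (no escaped [[tags]], no nesting), but like strip_shortcodes() it only removes tag names it knows about, here the ones passed in.

```php
<?php
// Simplified, hypothetical stand-in for strip_shortcodes(), for illustration
// outside WordPress only. It removes only the tag names it is given.
function sketch_strip_shortcodes(string $text, array $tags): string {
    if ($tags === []) {
        return $text;
    }
    // Build an alternation like "gallery|caption", escaping regex characters.
    $names = implode('|', array_map(fn($t) => preg_quote($t, '~'), $tags));
    // Enclosing form, e.g. [caption]...[/caption]: drop the tags and the inner text.
    $text = preg_replace('~\[(' . $names . ')\b[^\]]*\].*?\[/\1\]~s', '', $text);
    // Standalone form, e.g. [gallery ids="1,2"].
    return preg_replace('~\[(' . $names . ')\b[^\]]*\]~', '', $text);
}

$sample = 'Intro [gallery ids="1,2"] text [caption]Pic[/caption] end';
echo sketch_strip_shortcodes($sample, ['gallery', 'caption']), PHP_EOL; // Intro  text  end
```

The doubled spaces left where the shortcodes were are cleaned up by the next step.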

Strip Newlines and Spaces

After stripping the tags, we may be left with redundant spaces and newlines. Since we want the excerpt to be one continuous paragraph of text, we use a regular expression to replace every run of whitespace (spaces, tabs, newlines, etc.) with a single space, and then trim the result.
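A minimal sketch of this step as a standalone function (the name sketch_collapse_whitespace() is hypothetical). It uses the /u modifier, which makes PCRE read the string as UTF-8; in PHP this also makes \s Unicode-aware, so a non-breaking space is collapsed as well.

```php
<?php
// Collapse every run of whitespace into a single space and trim the ends.
// /u: treat the subject as UTF-8 (and, in PHP, match Unicode whitespace with \s).
function sketch_collapse_whitespace(string $text): string {
    $collapsed = preg_replace('/\s+/u', ' ', $text);
    // preg_replace() returns null on error, e.g. if $text is not valid UTF-8 under /u.
    return trim($collapsed ?? $text);
}

$after_strip = "\n\tFirst paragraph.\n\n\n  Second\u{00A0}paragraph.  \n";
echo '"' . sketch_collapse_whitespace($after_strip) . '"', PHP_EOL; // "First paragraph. Second paragraph."
```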

str vs. mb_str

Now we’re left with only the text, and two things remain: extracting the requested number of characters and making sure words aren’t cut in the middle.

Counting letters, how hard can that be? The strpos() and substr() functions should make it a piece of cake. But I was doing this on a site in a language whose letters aren’t in the ASCII range, and two things happened: I was getting far fewer letters than the number I passed to the function, and sometimes I’d get a gibberish character at the end of the excerpt. A colleague pointed out that most of PHP’s string functions have a multibyte (mb_) counterpart designed to handle UTF-8 characters correctly. I hadn’t realized that in UTF-8, non-ASCII characters take up more than one byte (two to four). PHP’s plain string functions work on bytes, not characters, and a string of non-ASCII letters contains fewer letters than bytes; that was why I was getting shorter strings than requested. It also explains the gibberish character at the end of the excerpt: substr() cuts at a byte position, and if that position falls in the middle of a multibyte character, the leftover bytes don’t form a valid character and are displayed as gibberish.
Therefore, mb_strpos() should be used instead of strpos(), mb_substr() instead of substr(), and likewise mb_strlen() instead of strlen().
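The difference is easy to reproduce. In this sketch the Russian phrase is just an arbitrary example of non-ASCII text: each Cyrillic letter takes 2 bytes in UTF-8, while the space takes 1. The encoding is passed explicitly for clarity; without it, the mb_ functions fall back to mb_internal_encoding(), which is UTF-8 by default in current PHP versions.

```php
<?php
// Compare byte-based and character-based functions on UTF-8 text.
// "Привет мир" ("Hello world" in Russian): 6 + 3 letters of 2 bytes each, plus 1 space.
$text = 'Привет мир';

echo strlen($text), PHP_EOL;             // 19 (bytes)
echo mb_strlen($text, 'UTF-8'), PHP_EOL; // 10 (characters)

$byte_cut = substr($text, 0, 5);             // 5 bytes: 2 letters plus half of the third
$char_cut = mb_substr($text, 0, 5, 'UTF-8'); // 5 characters: "Приве"

var_dump(mb_check_encoding($byte_cut, 'UTF-8')); // bool(false): the broken "gibberish" character
var_dump(mb_check_encoding($char_cut, 'UTF-8')); // bool(true)
```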

Leave Whole Words

The next step is to trim the excerpt elegantly so that no word gets cut in the middle. This means that, in all likelihood, the excerpt won’t be exactly the number of characters passed as a parameter, but we want to stay as close to it as possible. So we check which is closer: cutting before or after the word that the limit falls in the middle of.

We use mb_strrpos() to find the last space before the character limit and mb_strpos() to find the next space after it. We then measure how far each space is from the limit and cut the excerpt at the closer one.
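Isolated from WordPress, the rule looks roughly like this. sketch_cut_at_closest_space() is a hypothetical helper written for illustration; like the article’s function, it breaks ties in favor of the later space and falls back to a hard cut when there is no space at all.

```php
<?php
// Standalone sketch of the "closest space" rule (hypothetical helper name):
// cut $text at whichever space, before or after $limit, is nearer to $limit.
function sketch_cut_at_closest_space(string $text, int $limit): string {
    if (mb_strlen($text) <= $limit) {
        return $text; // already short enough
    }
    $before = mb_strrpos(mb_substr($text, 0, $limit), ' '); // last space before the limit
    $after  = mb_strpos($text, ' ', $limit);                // first space at or after the limit
    if ($after !== false && ($before === false || $after - $limit <= $limit - $before)) {
        return mb_substr($text, 0, $after);  // finish the current word
    }
    if ($before !== false) {
        return mb_substr($text, 0, $before); // drop the partial word
    }
    return mb_substr($text, 0, $limit);      // no spaces at all: hard cut
}

echo sketch_cut_at_closest_space('The quick brown fox jumps', 11), PHP_EOL; // The quick
echo sketch_cut_at_closest_space('The quick brown fox jumps', 14), PHP_EOL; // The quick brown
```

With a limit of 11 the cut lands just after the "b" of "brown", two characters past the previous space, so the partial word is dropped; with 14 it lands one character before the end of "brown", so the word is completed.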

The Whole Function

/**
* Get a limited part of the content - sans html tags and shortcodes - 
* according to the amount written in $limit. Make sure words aren't cut in the middle
* @param int $limit - number of characters
* @return string - the shortened content, with '...' appended only if it was shortened
*/
function wpshout_the_short_content($limit) {
   $content = get_the_content();
   /* sometimes there are <p> tags that separate the words, and when the tags are removed,
   * words from adjoining paragraphs stick together.
   * so add a space after the closing </p> tags, to keep the words apart */
   $content = str_ireplace('</p>', '</p> ', $content);
   $content = strip_tags($content);
   $content = strip_shortcodes($content);
   // collapse all runs of whitespace (spaces, tabs, newlines) into single spaces
   $content = trim(preg_replace('/\s+/', ' ', $content));
   $ret = $content; /* if the content fits within the limit, it is returned whole */
   if (mb_strlen($content) > $limit) {
      $ret = mb_substr($content, 0, $limit);
      // make sure not to cut the words in the middle:
      // 1. first check if the substring already ends with a space
      if (mb_substr($ret, -1) !== ' ') {
         // 2. If it doesn't, find the last space before the end of the substring (false if none)
         $space_pos_in_substr = mb_strrpos($ret, ' ');
         // 3. then find the next space after the limit, using the original string (false if none)
         $space_pos_in_content = mb_strpos($content, ' ', $limit);
         // 4. now compare the distance of each space position from the limit
         if ($space_pos_in_content !== false
            && ($space_pos_in_substr === false
               || $space_pos_in_content - $limit <= $limit - $space_pos_in_substr)) {
            /* the closest space is after the limit: take the substring up to it */
            $ret = mb_substr($content, 0, $space_pos_in_content);
         } elseif ($space_pos_in_substr !== false) {
            // else take the substring up to the earlier space position
            $ret = mb_substr($content, 0, $space_pos_in_substr);
         }
         // if there is no space at all, the hard cut at $limit is kept
      }
      // only a shortened excerpt gets the '...' suffix
      $ret = rtrim($ret) . '...';
   }
   return $ret;
}

Update the Post with the Generated Excerpt

If we don’t want the code to generate the excerpt every time the post is displayed, we can save it to the post’s excerpt field in the database. The next time has_excerpt() runs on that post, it returns true, and our filter simply returns the stored excerpt without redoing all the calculations.

We can do that with the wp_update_post() function, passing it the post’s ID and the new excerpt. Here is a little update function that does just that; we’ll call it right after generating the excerpt. Keep in mind that wp_update_post() performs a full post update: it also changes the post’s modified date and fires the usual save hooks, such as save_post.


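/**
* Save a generated excerpt to the current post, so has_excerpt() returns true next time
* @param string $new_excerpt - the excerpt to store
*/
function wpshout_update_post_excerpt($new_excerpt){
	$post = array(
		'ID'           => get_the_ID(),  // the post currently in the loop
		'post_excerpt' => $new_excerpt,  // the generated excerpt
	);
	wp_update_post($post);
}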

How to Call the Function

The best way to call this function is by hooking it to a filter. In the WPShout article How to Customize Your WordPress Post Excerpts we learn, among other things, about the wp_trim_excerpt filter:
“The function wp_trim_excerpt() is the main function that actually generates an excerpt from WordPress post content (by shortening it to 55 words and adding “[…]”). Before it finishes, it calls the wp_trim_excerpt filter to let you filter the results. We’re using the wp_trim_excerpt filter to generate and return back our completely own excerpt behavior.”

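function wpshout_excerpt( $text ) {
	// leave excerpts in the admin area untouched
	if( is_admin() ) {
		return $text;
	}
	// only generate (and store) an excerpt when the post doesn't have one yet,
	// so posts that already have an excerpt don't trigger a database write on every view
	if ( ! has_excerpt() ) {
		$text = wpshout_the_short_content(200);
		wpshout_update_post_excerpt($text);
	}
	return $text;
}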

add_filter( 'wp_trim_excerpt', 'wpshout_excerpt' );

Conclusion

In this article, we saw how to programmatically generate a post excerpt limited by character count without cutting words in the middle, and how to save it to the post so it doesn’t have to be generated again.

