How long should your blog post be for SEO?

A common SEO concern when writing blog posts is how long they should be. This post shows how to measure the length, in number of words, of the posts already on your blog. You will need a GNU/Linux or macOS machine with access to the posts folder to follow this tutorial.

SEO post word counter script

From inside your posts folder, run the following bash script to list all the posts sorted by word count, in descending order:

#!/usr/bin/env bash

# get posts (files only, folders excluded) in an array called posts
# note: this assumes file names without spaces
posts=( $(ls -p | grep -v /) );

# print length of posts array
echo -e "\nListing posts sorted by word count (${#posts[@]} posts): \n";

# loop posts array
for post in "${posts[@]}";
  # word counting content of each file
  do cat "$post" | wc -w | while read n ;
    do printf '%4s words => %s\n' "$n" "$post";
  done;
done |
# numeric sort (-n numeric) descending (-r reverse)
sort -n -r;

One-liner version:

$ posts=( $(ls -p | grep -v /) ); echo -e "\nListing posts sorted by word count (${#posts[@]} posts): \n"; for post in "${posts[@]}"; do cat "$post" | wc -w | while read n ; do printf '%4s words => %s\n' "$n" "$post"; done; done | sort -n -r;
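The listing above splits the output of ls on whitespace, so a file name containing spaces would break it. As a minimal sketch of one way around that, the function below uses a shell glob instead of ls. The name list_posts_by_words and its optional folder argument are hypothetical and are not part of the original script.

```shell
#!/usr/bin/env bash

# Hypothetical helper: list the regular files of a folder sorted by word count.
# Usage: list_posts_by_words [FOLDER]   (defaults to the current folder)
list_posts_by_words() {
  local dir="${1:-.}" file words
  for file in "$dir"/*; do
    # skip sub-folders and anything else that is not a regular file
    [ -f "$file" ] || continue
    # reading from stdin makes wc print only the number;
    # $(( )) strips the padding some wc implementations add
    words=$(( $(wc -w < "$file") ))
    printf '%4d words => %s\n' "$words" "${file##*/}"
  done |
  # numeric sort (-n) descending (-r)
  sort -n -r
}
```

After sourcing the function, calling list_posts_by_words with no argument from the posts folder prints the same kind of report as the script above.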

Bonus track: SEO Keyword density checker script

Another thing to bear in mind for SEO is keyword density: the percentage of the words in a post that are a specific keyword. You can use the following commands to calculate it.

1) With local access to the post file:

$ KEYWORD='MY_KEYWORD'
$ TOTAL=`cat MY_POST_FILE | wc -w`
$ FOUND=`cat MY_POST_FILE | grep -io '\<'$KEYWORD'\>' | wc -w`
$ printf "Total word count: %s \nKeyword ($KEYWORD) appears %s time/s\nDensity: %s%%\n" $TOTAL $FOUND $(( 100 * $FOUND / $TOTAL ))

# Example:
KEYWORD='post'
TOTAL=`cat 2018-05-07-how-long-are-my-posts.markdown | wc -w`
FOUND=`cat 2018-05-07-how-long-are-my-posts.markdown | grep -io '\<'$KEYWORD'\>' | wc -w`
printf "Total word count: %s \nKeyword ($KEYWORD) appears %s time/s\nDensity: %s%%" $TOTAL $FOUND $(( 100 * $FOUND / $TOTAL ))
>
Total word count: 489
Keyword (post) appears 16 time/s
Density: 3%
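Shell arithmetic with $(( )) only handles integers, so the density is truncated: 16 occurrences in 489 words is reported as 3% although the exact value is about 3.27%. The sketch below wraps the same steps in a function and lets awk do the division so that two decimals are kept. The function name keyword_density and its argument order are hypothetical. It also assumes the keyword should be matched literally and as a whole word (grep -F -w), which is a slightly different choice from the '\<...\>' pattern used above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: report the density of a keyword in a local file.
# Usage: keyword_density KEYWORD FILE
keyword_density() {
  local keyword="$1" file="$2" total found density
  # total number of words in the file
  total=$(( $(wc -w < "$file") ))
  # -o prints one match per line, -i ignores case,
  # -w matches whole words only, -F treats the keyword literally
  found=$(( $(grep -o -i -w -F -- "$keyword" "$file" | wc -l) ))
  if [ "$total" -eq 0 ]; then
    echo "No words found in $file" >&2
    return 1
  fi
  # awk performs floating point division; LC_ALL=C keeps the dot as separator
  density=$(LC_ALL=C awk -v f="$found" -v t="$total" 'BEGIN { printf "%.2f", 100 * f / t }')
  printf 'Total word count: %s\nKeyword (%s) appears %s time/s\nDensity: %s%%\n' \
    "$total" "$keyword" "$found" "$density"
}
```

For example, keyword_density post 2018-05-07-how-long-are-my-posts.markdown would print the same three lines as the example above, with the density shown to two decimals.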

2) Accessing an external post URL:

$ apt-get install -y lynx
$ KEYWORD='MY_KEYWORD'
$ TOTAL=`curl -s MY_POST_URL | lynx -stdin -dump | wc -w`
$ FOUND=`curl -s MY_POST_URL | lynx -stdin -dump | grep -io '\<'$KEYWORD'\>' | wc -w`
$ printf "Total word count: %s \nKeyword ($KEYWORD) appears %s time/s\nDensity: %s%%\n" $TOTAL $FOUND $(( 100 * $FOUND / $TOTAL ))

# Example:
apt-get install -y lynx
KEYWORD='docker'
TOTAL=`curl -s "https://devopscell.com/docker/dockerignore/2018/04/25/using-dockerignore.html" | lynx -stdin -dump | wc -w`
FOUND=`curl -s "https://devopscell.com/docker/dockerignore/2018/04/25/using-dockerignore.html" | lynx -stdin -dump | grep -io '\<'$KEYWORD'\>' | wc -w`
printf "Total word count: %s \nKeyword ($KEYWORD) appears %s time/s\nDensity: %s%%" $TOTAL $FOUND $(( 100 * $FOUND / $TOTAL ))
>
Total word count: 499
Keyword (docker) appears 14 time/s
Density: 2%

The command-line browser Lynx renders the downloaded HTML as plain text before the words are counted. This keeps HTML tags and meta tags out of both the word count and the density, which would not be the case if the previous technique were applied to the raw HTML.
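The commands for an external URL download the page twice, once for the total and once for the keyword count. As a rough sketch of an alternative, the function below reads the already rendered text from standard input once and computes both numbers from that buffer. The name density_from_stdin is hypothetical, and the keyword is assumed to be a literal whole word (grep -F -w).

```shell
#!/usr/bin/env bash

# Hypothetical helper: read rendered text on stdin once and report
# the density of a keyword in it.
# Usage: some_command | density_from_stdin KEYWORD
density_from_stdin() {
  local keyword="$1" text total found
  # buffer stdin so the same text can be scanned twice
  text=$(cat)
  total=$(( $(printf '%s\n' "$text" | wc -w) ))
  # one match per line (-o), case-insensitive (-i), whole word (-w), literal (-F)
  found=$(( $(printf '%s\n' "$text" | grep -o -i -w -F -- "$keyword" | wc -l) ))
  if [ "$total" -eq 0 ]; then
    echo "No words found on stdin" >&2
    return 1
  fi
  # integer percentage, as in the commands above
  printf 'Total word count: %s\nKeyword (%s) appears %s time/s\nDensity: %s%%\n' \
    "$total" "$keyword" "$found" "$(( 100 * found / total ))"
}
```

It can be fed by the same pipeline used above, for instance: curl -s MY_POST_URL | lynx -stdin -dump | density_from_stdin MY_KEYWORD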