In this post I’ll write about how to test a website’s performance and functioning, to see if a tested optimization method is doing any good. Same goes for comparing different setups, WordPress plugins etc.
Disclaimer: I am not a professional web developer, nor would I call myself an expert. I am not doing this for any money – the main goal is to figure out what’s best for use with my websites. Budget for testing is down to my free time (no special tools paid, servers deployed etc.). Take all the provided information as: “to the best of my knowledge” – with any additions and especially corrections, being more than welcome – in order to improve the given info.
Table Of Contents (T.O.C.):
1. Why do I write and publish this stuff
2. Which website testing methods have I used
3. Gremlin website testing method v2
4. Testing the testing method
5. Testing method test results 🙂
5.1. GTmetrix testing
5.2. Octoperf testing
7. Addition: Non-cached testing
1. Why do I write and publish this stuff
A perfectly logical question that I think is worth answering here. There are several reasons for writing stuff on my websites:
Filling in any gaps of my own knowledge
When I’m writing a post, on any topic, I try to write so that even a “novice” can understand. I believe that a man can claim to know something, only if he can explain it to a layman (and/or a child).
Now, when I start thinking and writing with that in mind, I easily see if there are any things that I myself haven’t figured out well enough – and do more research/reading/learning/testing… This is a surprisingly effective method that has worked for me both with school and work. Well worth the time invested.
Information that I’ve personally gathered and published is, at least for me, the most reliable. I often use my websites as a personal reminder/reference. Sometimes it is fun to read old posts and see how stupid I was. 🙂
A post on hosting providers’ technical support even got me some job offers – that was unexpected.
Practising my English
I studied English for over a decade and got a Cambridge FCE to show for it, but a foreign language, like most other skills, gets rusty if not used. So all this drivel is also a perfect opportunity for me to practise my English writing, while my sweet Tarzan English voice gets practised and can be enjoyed on the BikeGremlin YouTube channel.
Easier answering of frequently asked questions
For questions that people often ask, it is easier to just give them a link to a post I had already written, than answering the same question 1000 times. And it feels good to share knowledge and experience, helping the community.
Getting help from the experts
When I have nicely documented all that I’ve done (and, perhaps, messed up), it is easier to get useful advice from various experts. A detail that seems trivial to me, or that I forget in a few months, might turn out to be important.
Last, but for most people not the least important. 🙂 Yes, my websites make some extra income. I use AdSense ads (see my Google AdSense experiment) and, when writing about products, affiliate links for those I like and would recommend (if they offer an affiliate program; if not, then just plain links). I had started doing it for free (practically publishing my cycling notes and charts online), but it’s not bad earning some profit (if nothing is sacrificed in terms of quality for that). The downside is that I occasionally get called out for shilling. 🙂 Can’t please everyone.
Addition: another logical question I occasionally get, either explicitly or implied. No post, review, or test I have published was ordered, much less paid for, by any company unless explicitly stated otherwise at the start of the article. Would have been great if they were. 🙂 My texts are mostly long, detailed and probably boring for most people – I suppose it’s not really something companies would sponsor in terms of boosting sales. If any company still wishes to pay for my time and effort in testing their products, I’d be happy to do it, under the following conditions:
- Suggestions are welcome, but the contents of the published texts are up to me. If I’m writing about a product, I must publish all the flaws, along with the good stuff.
- You can get the text for a review before it is published and are free to ask me not to publish it. I think that is fair if you’re paying. But I will not edit a text to make the product look better than it really is – that’s where I draw the line.
- All the reader donations are more than welcome – PayPal me.
2. Which website testing methods have I used
I had already written about testing methods in my first post about website optimization – measurement. However, most of those methods were used on “live” (“production”) websites, before and after migrations, changes etc.
I had also tested the quality of resource separation in a reseller hosting environment. I.e: does a load test on one cPanel (or DirectAdmin) account cause extra resource load on the other reseller (sub)accounts. I concluded that it doesn’t:
This is especially the case with hosting providers who use CloudLinux. Of course, since many websites in a shared hosting environment share the same physical server’s resources, if all those resources are loaded, the load will certainly affect the other websites’ performance, even if it isn’t shown in the stats. So it is worth taking a look at the total server load, especially during the testing (I am sure there are better, automated solutions, but the free, open source phpSysInfo works on many shared hosting environments):
I have also confirmed that page load time (as well as size and the number of requests) increases when Google AdSense ads are shown on a page, compared to when they aren’t:
Embedded YouTube videos cause a similar effect.
The post in which I described my testing and comparison of the Yoast and The SEO Framework plugins used a method of testing one website before and after changes. I had linked that article several times on Reddit, in discussions concerning WordPress SEO plugins. The authors of The SEO Framework plugin joined some of the discussions (Reddit topic on Yoast alternative). They raised some objections to my testing methods that seemed to have some merit, which made me think.
3. Gremlin website testing method v2
I tried to figure out a testing method that would allow the following:
- Fast and efficient website testing, so that it doesn’t take weeks, or months.
- Enough precision (if not accuracy) to show differences even when they are relatively small (comparing two different setups, or plugins).
- Results relevance – so that they can be comparable over time, when testing various setups – at least for my websites (with hosting and setup I use).
- Ability for third parties to check and confirm, or dispute results by doing similar tests themselves (with hosting and setup that they use – to see what’s best for their websites). Aiming to be as objective as possible.
Based on that, my testing method would be the following:
- Creating two identically configured hosting accounts (my hosting account configuration for WordPress), on two separate cPanel accounts with the same hosting provider, on the same hosting server (using a reseller hosting account).
- Cloning my cycling website (cloning as migrating a website, but changing the domain while doing so) to two separate subdomains (such as test1.bikegremlin.com and test2.bikegremlin.com), each to a separate hosting account (as explained above).
- Disabling all the externally loaded resources (embedded YouTube videos, Google Analytics, Google AdSense etc.). Of course, disabling indexing as well, so Google doesn’t get confused with any duplicate content.
- If testing hosting server performance, disabling all the caching (including Cloudflare).
When testing a plugin, leaving all the caching enabled, since that is how the website will be run in production.
- Running 20 consecutive GTmetrix tests, one on each test website simultaneously, using separate browser tabs. Using a test server closest to the hosting server, to keep network problems from affecting the tests. Except when testing a CDN – in which case several different testing locations are preferred.
Update November 17th 2020: GTmetrix have changed their testing and test display methods, and introduced a limit of 10 tests per day. So I will have to find a different tool for this.
- Running two load tests in the same manner. For that I’ll be using the OctoPerf free package (which can simulate 50 concurrent website visitors browsing, page after page).
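The "run both sites side by side, many times" idea from the list above can be sketched in a few lines of Python. This is only a rough illustration and not a replacement for GTmetrix: it times the download of the HTML document alone (no images, scripts, or browser rendering). The function names and the example URLs in the usage note are my own assumptions.

```python
import statistics
import time
import urllib.request
from typing import Callable, Dict, List


def fetch_seconds(url: str) -> float:
    """Time one full download of the HTML document at `url` (no rendering)."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=30) as response:
        response.read()
    return time.perf_counter() - start


def paired_runs(url_a: str, url_b: str, runs: int = 20,
                timer: Callable[[str], float] = fetch_seconds) -> Dict[str, List[float]]:
    """Measure both URLs back to back in every round, so that both
    sites are tested under roughly the same server and network conditions."""
    results: Dict[str, List[float]] = {url_a: [], url_b: []}
    for _ in range(runs):
        for url in (url_a, url_b):
            results[url].append(timer(url))
    return results


def summarize(samples: List[float]) -> Dict[str, float]:
    """Basic statistics for one site's list of measured times (seconds)."""
    return {
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "stdev": statistics.stdev(samples) if len(samples) > 1 else 0.0,
    }
```

Calling `paired_runs("https://test1.example.com/", "https://test2.example.com/")` (placeholder addresses) and passing each list to `summarize` gives one mean and median per site. The `timer` parameter is there so a different measuring function can be swapped in.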
4. Testing the testing method
One of the authors of The SEO Framework plugin claims and explains why this method of testing is not accurate enough, giving their alternative recommendations (link). The recommendations boil down to these:
- Using the Query Monitor plugin, that can measure (among other things) database query times.
- Using a tool like ab for load tests.
They claim that tools that “imitate” a visitor browsing introduce other, uncontrolled factors (like network speed, browser rendering speed etc.). They also note that an SEO plugin’s effect on speed is so small that any measuring imperfection makes the results unreliable.
When it comes to WordPress plugin testing, what I’m interested in is the following:
- Do they have any security problems (WPScan Vulnerability Database is a good place to check).
- Do they break the site (white screen of death, unwanted altering of website’s visual look, or functioning).
- Does the website load faster, or slower for the visitors (or it remains the same).
- Is the server load higher, lower, or the same (when doing load tests).
So the first thing I wish to test is the testing method itself. 🙂 I plan on doing this by comparing test results without changing anything. Testing two identical websites (apart from the different sub-domain). Afterwards, I’ll publish the results here to know how precise this method is. Anything over 1% is not very good, while over 5% is unacceptably poor. That way, at least I’ll know how reliable my testing is and whether to take it as conclusive, or just a rough guide.
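A minimal sketch of that check, assuming the load times of the two identical sites have already been collected as lists of seconds. The 1% and 5% limits are the ones stated above; the label "good" for anything up to 1% and the function names are my own additions.

```python
from statistics import mean
from typing import Sequence


def percent_difference(baseline: Sequence[float], other: Sequence[float]) -> float:
    """How much slower (+) or faster (-) `other` is than `baseline`, in percent,
    comparing the mean load times of the two sets of measurements."""
    base = mean(baseline)
    return (mean(other) - base) / base * 100.0


def rate_precision(diff_percent: float) -> str:
    """Rate the testing method using the 1% and 5% limits from the text."""
    size = abs(diff_percent)
    if size <= 1.0:
        return "good"  # label assumed; the text only names the two worse bands
    if size <= 5.0:
        return "not very good"
    return "unacceptably poor"
```

Since the two sites are identical, any difference this reports is measurement error, not a real difference in speed.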
5. Testing method test results 🙂
OK, I created two website clones – completely identical. I placed them on separate cPanel accounts, on the same server, and disabled all the analytics and AdSense ads, so they don’t affect the test results.
The hosting server is located in London, so all the tests were done from that location, to keep network speed fluctuations from affecting the results.
5.1. GTmetrix testing
The first testing was done with the GTmetrix tool. I randomly chose a page (PG1 from now on) from the first test site (TS1 from now on). Then, on the second test site (TS2), I chose the same page (PG1). I.e. the pages have identical permalinks and contents, only the sub-domains of the test sites differ (as test1.bikegremlin.com and test2.bikegremlin.com).
I opened two browser tabs, and as simultaneously as possible, ran tests on both websites. Repeating the tests 20 times for each website.
Then I did the same for another page (PG2) on both websites.
Finally, the third test page (PG3) was in fact a category, containing a list of 18 posts.
Here are the test results:
There were deviations in the page load times measured for the same pages on the two different websites.
In percents, TS2 varied from 1.11% faster, to 5.33% slower (total 6.44% variation).
In absolute page load time, the variation is from 0.01 seconds faster, to 0.04 seconds slower (total 0.05 seconds variation).
AVG T in picture 4 shows an average for all the 3 tested pages.
Based on this, my conclusion is that this testing method can reliably measure differences only if they are greater than 7%, or/and 0.05 seconds.
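The "total variation" figures above are simply the distance between the two extremes. Here is a small sketch of that arithmetic, where a negative number means TS2 was faster and a positive one means it was slower. Only the extreme values are quoted in the text, so only those are used here; the function name is my own.

```python
from typing import Sequence


def total_variation(differences: Sequence[float]) -> float:
    """Span between the most 'faster' (negative) and the most 'slower'
    (positive) difference measured between the two identical sites."""
    return max(differences) - min(differences)


# Extremes of the GTmetrix results quoted above (TS2 compared to TS1).
gtmetrix_percent = [-1.11, 5.33]
gtmetrix_seconds = [-0.01, 0.04]

print(round(total_variation(gtmetrix_percent), 2))
print(round(total_variation(gtmetrix_seconds), 2))
```

Any difference between two setups that is smaller than this span can't be told apart from the noise of the method itself.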
5.2. Octoperf testing
I created a test using Octoperf tool, which simulates 30 visitors browsing the site simultaneously. 15 of them “use” Chrome browser on Windows 10, while the other 15 “use” Safari on iPhone 10.
Those visitors are browsing 19 pages that I had randomly chosen and listed (some are pages, some posts, and some are categories that contain lists of posts). Each visitor opens page one, then, as soon as it is loaded, they open page two – and so on until they visit all the 19 listed pages. Then they clear their browser cache and start over from page one. Doing so for 10 minutes.
Since real people usually spend at least a second, or two, before clicking on another page, this kind of testing puts a lot more load on the server than 30 real visitors would. I kept an eye on the server load during the testing:
Moderate, as expected from a server that hosts hundreds of websites. Octoperf test results:
In percentage, TS2 page loading speed varied from 8.83% faster, to 4.62% slower (total 13.45% of variation).
Absolute page load time of TS2 varied from 0.06 seconds faster, to 0.03 seconds slower (total of 0.09 seconds variation).
My conclusion is that this method can reliably show differences if they are greater than 14%, or/and 0.1 seconds.
For those interested, these are detailed Octoperf test results in .pdf format:
2021 update: In order to make my tests more relevant (“realistic” if you like), I have created a new test site (WordPress website intended for testing and comparison). This website has a page created with Elementor, a very small WooCommerce e-shop with a few products, and over 70 posts. The new test website is not a copy of my cycling website, in order to make my tests more objective: opening a hosting account using a friend’s credit card and testing on a new website doesn’t let the providers easily figure out that I am doing the testing – aiming for objective testing as much as possible… plus it lets my friends pay for all the expenses, instead of me! 🙂
Also, I’ve limited Octoperf load tests to “only” 30 concurrent visitors – the original version was made with 50 visitors. These visitors are opening every page and post on the website – opening the next page right after one page is loaded. Without any caching in their browsers (like brand new visitors). Because people, when they visit a website, spend at least a few seconds on a page and don’t go opening every page of a website right after the first page is loaded, this test is close to simulating over 300 simultaneous human visitors (my estimation).
This simulation is close to over 300,000 monthly visitors (though, the test, while it’s being performed, is a lot more stressful, simulating a peak load with many simultaneous requests).
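One way to reason about the "30 simulated visitors is roughly 300 human visitors" estimate: a simulated visitor requests the next page as soon as the previous one loads, while a human also spends some time reading. The sketch below is my own back-of-the-envelope model, not the calculation behind the estimate above, and both the 0.5 second load time and the 4.5 seconds of reading time are assumed example values.

```python
def equivalent_human_visitors(virtual_users: int,
                              load_seconds: float,
                              think_seconds: float) -> float:
    """Estimate how many human visitors generate the same number of page
    requests as `virtual_users` simulated visitors that never pause.

    A simulated visitor requests one page every `load_seconds`; a human
    requests one page every `load_seconds + think_seconds`.
    """
    return virtual_users * (load_seconds + think_seconds) / load_seconds


# Assumed example values: pages load in 0.5 s, humans read for 4.5 s.
print(equivalent_human_visitors(30, load_seconds=0.5, think_seconds=4.5))
```

With longer reading times, or faster page loads, the same 30 simulated visitors correspond to even more human visitors, which fits the "over 300" wording.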
The bottom line is: I’ve designed the test to run a server through its paces, but without crashing it. I’m trying to simulate a great number of visits, not a DDOS attack.
It is good that I have measured how (im)precise this testing method is. Now I know that, when testing two different hosting servers, WordPress plugins, or different site setups, if one of them is faster or slower by more than 0.1 seconds, then it is most probably really so. Smaller differences in speed could be down to the errors of the testing method.
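That rule of thumb is easy to turn into a tiny helper. This is a sketch with names of my own choosing; the 0.1 second limit is the one stated above, and it applies to my hosting and setup, so others should measure their own limit.

```python
from statistics import mean
from typing import Sequence

NOISE_FLOOR_SECONDS = 0.1  # precision limit measured for this testing method


def is_probably_real(times_a: Sequence[float], times_b: Sequence[float],
                     noise_floor: float = NOISE_FLOOR_SECONDS) -> bool:
    """True if the mean load times differ by more than the noise floor,
    i.e. the difference is unlikely to be just measurement error."""
    return abs(mean(times_a) - mean(times_b)) > noise_floor
```

For example, mean load times of 0.50 and 0.65 seconds count as a real difference, while 0.50 and 0.55 seconds don't.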
In percentage, for pages that load for under one second, that is under 15% of measurement error. I didn’t test any pages with slower load time, since my website pages don’t load slowly enough, at least not when those with embedded YouTube videos are excluded and when all the analytics and ads are disabled. If I had included those, I would have practically been measuring Google and YouTube, not the website’s “raw” performance.
This doesn’t mean that such a setup shouldn’t be measured. It just means that when comparing plugins, optimizations, or hosting server performance, introducing those “extras” will make results too random and inconclusive. I do keep an eye on Google analytics stats and check on any pages that have long average load times. But for quick testing and comparison, including these external resources can be detrimental.
7. Addition: Non-cached testing
Thanks to some good feedback (and corrections) from the WebWhim team (link to their website), I’ve decided it’s a good idea to also include some “bare-metal” tests, i.e. tests that skip any caching, CDNs and similar. These are the tools I use for that:
- PHP simple benchmark script (script’s GitHub link)
BikeGremlin script file download copy (.zip file download)
- WordPress Hosting Benchmark tool (plugin’s wp.org link)
I’ve also decided to stop using PHP/MySQL CPU performance statistics (plugin’s wp.org link). Its database speed test results are highly inaccurate (could be called misleading).
My methods, knowledge and experience change over time, but this always stays true: 🙂