The testing industry has come a long way in the last ten years. Where do you think it will be in ten years’ time? How will the job of a tester have changed? Let’s imagine ourselves ten years in the future and then look back and see what happened…
A view from the year 2023:
The last 10 years in the test industry
Test code overload
As test automation grew in popularity and accessibility, the volume of tests and test code grew and grew. And with the progressive shortening of development cycles, continuous deployment to production became the norm. Pretty soon we started to realise that the volume of tests and test code was in many cases larger than the code it was testing, and that it was harder to change, less well controlled and more dependent on external factors such as environments and test data. Businesses were releasing code to production twice a day or more, and it became impossible to write, execute and maintain the huge automated functional test suites at that pace. Different ways of addressing this began to appear:
Some people took the view that a passing test has a negative value. A test that passes takes time to design and repeatedly execute, and doesn’t inform any future development. (Compared to a failed test, which was clearly worth writing and executing because it found a bug.) This controversial view led to testers looking critically at their regression test suites, and any tests that had never failed were cut. Only tests that had failed at some point in the past were retained, along with a small number of high-priority tests.
New tests for new functionality were allowed to remain in the test pack for only a limited time, and were removed once they had proven themselves to pass reliably. In a similar way, a hand-picked selection of regression tests was temporarily brought out of retirement when neighbouring functionality was changed. This approach was successful in terms of reducing the size of the functional test packs and increasing the speed of test execution. However, it didn’t affect the total volume of test code, which continued to grow over time. When regression tests were temporarily brought out of retirement they typically needed work before they would run again usefully. A great deal of human judgement went into deciding which tests to include and which tests to cut; this was inevitably subject to political pressure from within the IT organisation, and frequently resulted in dangerous holes in the test coverage.
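The retirement policy described above could be sketched as a simple filter over test records. This is a minimal illustration; the record fields and the 90-day probation window are assumptions for the example, not from any particular tool:

```python
from dataclasses import dataclass
from datetime import date, timedelta

PROBATION = timedelta(days=90)  # assumed window for a new test to prove itself

@dataclass
class TestRecord:
    name: str
    added: date
    ever_failed: bool
    high_priority: bool

def should_retain(test: TestRecord, today: date) -> bool:
    """Keep tests that have failed at least once, are high priority,
    or are still in their probation window for new functionality."""
    if test.ever_failed or test.high_priority:
        return True
    return today - test.added < PROBATION  # new tests get a probation window

suite = [
    TestRecord("login_happy_path", date(2020, 1, 1), ever_failed=False, high_priority=False),
    TestRecord("payment_rounding", date(2020, 1, 1), ever_failed=True, high_priority=False),
    TestRecord("new_checkout_flow", date(2023, 5, 1), ever_failed=False, high_priority=False),
]
today = date(2023, 6, 1)
kept = [t.name for t in suite if should_retain(t, today)]
print(kept)  # ['payment_rounding', 'new_checkout_flow'] — the never-failing old test is cut
```

Deciding which retired tests to revive for neighbouring changes would still need the human (and political) judgement the article describes; the filter only handles the mechanical part.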
Other people began to abandon automated functional tests altogether. Component and unit tests were retained and extended and integration tests at component and sometimes system level were run. Testers got far more involved in the design and coverage of these low level tests. Some of these companies only realised how useless their cumbersome automated functional tests had become when they stopped using them and didn’t notice much of a dip in the quality of their output. They found that by improving the quality of unit and component tests and hiring Test Gangs (see below), they could find the business critical bugs quickly, and allow the market and production monitoring tools (see below) to find the non-critical bugs.
A new breed of tester and test agency began to emerge. These were the Test Gangs; small groups of testers with superb bug hunting skills, an ability to very quickly grasp the vulnerabilities and goals of a business and its software, and accomplished penetration testing skills. They would work in small groups of three or four, move into a development team and attack the system as best they could. It was important for these Test Gangs to remain fresh, so a gang typically stayed on site for no more than a month or two before being replaced (with a short overlap) by a fresh gang. A constant stream of fresh eyes, and slightly different test practices meant the software was always being energetically exercised and the quality of the unit and component tests continually improved. Most Test Gangs would operate as a small business and companies would bid for their services. Some of these Test Gangs became superstars and earned astronomical fees from those who could afford them. The role of a test manager in an organisation was often reduced to bidding for Test Gangs and making sure the rotation schedule stayed full.
The growth in popularity of the Test Gangs cemented the already shifting opinion that testing and the automation of tests were two separate things. The automation of tests was simply coding, and was done by coders, whereas the design or audit of those tests was done by test analysts. This shift in attitudes was met with relief by the testing industry. It allowed testers who really wanted to be developers to become bona fide developers, and allowed testers with little interest in writing code to focus on testing.
Users as testers
Increasing use was made of analysis tools running against production software. Now that data storage and data processing costs were so low, it became possible to capture huge amounts of usage data and logs from live systems and analyse them quickly. The results were often cross-referenced against cost and revenue data to work out the actual cost and impact of each issue. Much of the politics of the defect prioritisation process was replaced by cost/benefit software for prioritising defects and fixes.
Some advanced systems were even able to take defects from test environments and apply those to the usage data from live environments, to work out what the projected cost and impact of that defect would be, were it not fixed.
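A rough sketch of that kind of cost/impact projection: combine live usage counts with a failure rate observed in test and a per-incident cost from revenue data. All the paths, rates and figures here are hypothetical, purely to show the shape of the calculation:

```python
# Hypothetical figures: live usage per path, observed failure rate of the
# defect per hit, and estimated cost per failure (in some currency unit).
daily_hits = {"checkout": 120_000, "search": 900_000, "profile_edit": 4_000}
failure_rate = {"checkout": 0.002, "search": 0.0001, "profile_edit": 0.05}
cost_per_failure = {"checkout": 3.50, "search": 0.02, "profile_edit": 0.10}

def projected_daily_cost(path: str) -> float:
    """Projected cost per day of leaving this defect unfixed."""
    return daily_hits[path] * failure_rate[path] * cost_per_failure[path]

# Rank defects by projected impact rather than by gut feel or politics.
ranked = sorted(daily_hits, key=projected_daily_cost, reverse=True)
for path in ranked:
    print(path, round(projected_daily_cost(path), 2))
```

The interesting part, as the article notes, is that the failure rate can come from a test environment while the usage and revenue data come from live, giving a projection for a defect that has not yet escaped.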
Releases to production were so frequent that in the event that severe defects did escape the attentions of the automated unit/component/integration tests and Test Gangs, the defects could typically be identified, prioritised and fixed within hours if not minutes. Additionally, technologies to incrementally release (or Ramp Release) software became commonplace, so new versions of code could easily be trickled out to small user groups and then ramped up to the rest of the market.
The ramp release technologies meant that the steepness of the ramp could be controlled very accurately, with different segments of the market (identified by numerous demographic criteria) targeted at each stage of the roll-out. It also meant that several versions of the software might be in production at one time, perhaps with a tiny group of risk-tolerant early adopters using a very early, partially tested version, larger and expanding groups using the previous version, and the remainder using the version before that.
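A ramp of this kind is often implemented as deterministic hash bucketing: each user lands in a stable bucket, and a version is served once their segment’s ramp covers that bucket. A minimal sketch, with illustrative segment names and percentages (not from the article):

```python
import hashlib

# version -> {segment: percent of that segment receiving it}; newest listed first.
RAMP = {
    "v3": {"early_adopter": 100, "general": 5},
    "v2": {"early_adopter": 100, "general": 100},
}

def bucket(user_id: str) -> int:
    """Stable bucket 0-99 derived from the user id, so a user's version
    doesn't flip between requests as the ramp percentage is raised."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return int(digest, 16) % 100

def version_for(user_id: str, segment: str) -> str:
    for version in ("v3", "v2"):  # try the newest version first
        if bucket(user_id) < RAMP[version].get(segment, 0):
            return version
    return "v1"  # fallback: the oldest version still in production

print(version_for("alice", "early_adopter"))  # v3 — early adopters are at 100%
```

Raising the "general" ramp for "v3" from 5 towards 100 steepens the roll-out without moving any user who has already received it, which is what makes the staged, multi-version picture above workable.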
What’s more, users who experienced problems with newer versions of the software were automatically rolled back or redirected to a previous version that was known not to have the same problem, and were compensated through various loyalty or reward schemes. Customers’ attitudes towards seeing low-level problems in live gradually shifted, to a point where groups of people in less well-off countries would deliberately try to “mine” defects in exchange for rewards, which were then sold for cash. This was a sort of self-organised, crowd-sourced off-shoring process that was entirely unmanageable and highly effective.
Performance testing as we knew it is dead
The notion that in order to test the performance of your application you had to create a full scale production-like environment, filled with full scale production-like data and reproduce peak periods of load with production-like quantities of virtual users behaving in production-like ways became laughable and impossible to reconcile against the speed of development and deployment. It just couldn’t keep up.
At the same time, the idea that a system would degrade or fail completely at a certain level of load became laughable. Why should 100% of the users of a system be denied use of it just because the final 1% of those users exceeded the load the system could handle? (As an analogy: if you know your concert venue has a capacity of 500 people, then once you’ve counted 500 people going through the doors, you stop anyone else going in. You don’t keep letting more and more people squeeze in until there is a stampede or fire and everyone perishes.)
The tolerance of applications or sites being down or unavailable also fell to zero. It was no more acceptable for a website to be unavailable than it was for supermarkets or banks to close unexpectedly in the middle of the day, or for TV to stop broadcasting. Anything other than 100% availability was treated as a disaster-recovery (DR) situation.
So performance testing had to change. Components were performance, load and stress tested in stubbed-out, driver-fed isolation, all the way down to unit level. And crucially, the integration points of components were gated, or throttled, to allow in no more traffic than the component had been proven able to handle, and to gracefully reject any excess. In this way, code-based contracts were established at integration points from component level all the way up to system level. Each component would accept no more traffic than it could handle, and in return it would throttle its outbound messages such that it never sent more traffic to the downstream component or system than it was contracted to. Systems therefore protected themselves and each other from load levels that were known to cause problems.
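The gating idea might be sketched as a wrapper that admits no more than a contracted number of in-flight requests and rejects the excess outright, rather than queueing it until everything degrades. A minimal illustration (the class and capacity figure are invented for the example, not a production throttle):

```python
import threading

class TrafficGate:
    """Admit at most `max_in_flight` concurrent calls; reject the rest,
    mirroring the concert-venue analogy: once full, no one else gets in."""

    def __init__(self, max_in_flight: int):
        self._slots = threading.Semaphore(max_in_flight)

    def call(self, handler, request):
        # Non-blocking acquire: excess traffic is rejected, not queued,
        # so the contracted capacity is never exceeded.
        if not self._slots.acquire(blocking=False):
            return "rejected: over contracted capacity"
        try:
            return handler(request)
        finally:
            self._slots.release()

gate = TrafficGate(max_in_flight=2)  # capacity proven by component-level tests
print(gate.call(lambda r: f"handled {r}", "req-1"))  # handled req-1
```

Testing the "traffic contract" then becomes a matter of proving, in isolation, that the component really can sustain its contracted capacity, and that the gate rejects cleanly at the boundary.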
Performance testing therefore largely became a job of translating peak levels of users into useful information about what that meant for each component in the system, and ensuring component level tests covered this. This was a tough job. It also involved testing and approving the “traffic contracts” between the different components and different parts of the system.
Risk assessment with heat maps
A distinct new testing discipline began to emerge from Risk Based Testing practices. These testers would use tools to generate highly detailed diagrams of the system showing all the components, data flows and integrations. Typically these would be based on automated static analysis of the system. These diagrams could zoom in to show individual pieces of code, or zoom out to show entire systems. The diagrams could then be overlaid with heat maps showing the most commonly used paths through the application (based on actual usage data), the areas that were changed most frequently, or the most sensitive or critical areas. Individual data flows or processes could be highlighted to show their dependencies, at all levels of zoom. All this could also be overlaid with heat maps showing test coverage, and histories of passed and failed tests showing fragile areas. In this way, testing effort could be directed to the most beneficial areas.
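The scoring behind such a heat-map overlay might look something like this sketch, combining live usage, change frequency and test coverage into a single "heat" per component. All the component names and figures are illustrative:

```python
# name: (daily_usage, changes_last_month, test_coverage 0-1) — hypothetical data
components = {
    "payments": (50_000, 12, 0.60),
    "search":   (400_000, 2, 0.90),
    "admin":    (300, 25, 0.30),
}

def risk_score(usage: int, churn: int, coverage: float) -> float:
    """Heavily used, frequently changed, poorly covered code is hottest."""
    return usage * churn * (1.0 - coverage)

hottest = sorted(components, key=lambda c: risk_score(*components[c]), reverse=True)
print(hottest)  # ['payments', 'search', 'admin']
```

The weighting is the judgement call: a different multiplier on churn versus usage would reorder the map, which is one reason the predictive-modelling attempts mentioned below struggled.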
Some people tried to apply predictive data modelling to these tools and predict where and when defects would appear, but without much success.
Some of the same old problems won’t go away
Test environments and test data were the bugbear of many a tester back in 2013, and not much has changed since.
Testers still complain that there aren’t enough test environments, and that the environments they do have aren’t representative of the live environments. Some testers have shifted their focus from testing functionality to testing the configuration and set-up of different environments, and building tools to make the small local test environments mimic the full-scale production environments as closely as possible. This problem hasn’t been solved though, and it’s sadly not uncommon to find that even though all your tests have passed, a simple firewall setting, cache or config item causes problems when you go live.
Test data, and specifically generating sufficient quantities of a sufficient variety of test data is another problem that hasn’t gone away. The dilemma of whether to use artificially created test data meeting exacting criteria for specific tests, which is therefore controllable but not at all representative of live data, or whether to use copies of live data which is uncontrollable and might not meet the needs of specific tests is still the subject of much discussion.
So, in summary
The over-riding theme of the last ten years has been to cut back on the huge costs of test execution and increase the effort spent on defect prevention. Naturally this has been greeted with uncertainty in the industry. How can testers advocate doing less testing? It is still a brave and persuasive IT manager or director who can convince their organisation that you don’t necessarily add value to a product by executing lots and lots of tests against it. You add value by shining a very bright light on the product from all directions and understanding exactly what it is doing and why, illuminating the way ahead for the project team and the software.