How safe is it out there? Zeroing in on the vulnerabilities of application security.
By Moran Surf, Application Security Expert
The article presents a statistical analysis of results obtained from numerous application level penetration tests performed by Imperva experts for various customers over the years 2000 - 2003. The research examines the types of vulnerabilities found, their sources, the risk they incur, and their effects. The institutions whose applications were tested include banks, government institutions, telecommunication firms and even information security vendors. The article presents a unique opportunity to take a peek into the usually secluded data regarding the actual risk posed to web applications. It shows a constant increase in risk level over the years and an overwhelming overall percentage of applications susceptible to information theft (over 57%), direct financial damage (over 22%), denial of service (11%) and execution of arbitrary code (over 8%). The article analyses results of first time penetration tests as well as repeat tests (retests) in order to evaluate the evolution of application security within Web applications over time. Our conclusion is that without proper application security devices and secure software development education, the inherent risk to an application does not decrease and may even increase over time. Taking into consideration that the organizations whose applications are included in this report are considered security-aware (they showed the insight to order costly penetration tests), the results paint a bleak picture of the current state of Web application security.
Application hacking is a security field that contains a vast number of techniques enabling an attacker to compromise the confidentiality, integrity, and availability of an application. One of the common techniques used by application developers and service providers to increase the security of their internal data is to have experts try to hack into their systems in order to locate the numerous security holes within the systems, so that they can be patched. These efforts, known as application penetration tests, are very different from one another and require the testers to constantly develop new techniques and new methods for the tests to continue to be successful. Imagination is the key ingredient required by the tester in order to bypass the fences built by the developers. Application penetration tests are tests focused on the flow of the application, as opposed to network penetration tests, which attack the entire network surrounding the application.
Penetration tests are usually confidential, and for obvious reasons their results are kept within the closed circle of the tester and the developers of the application. As a consequence, statistical analyses of test results are rarely carried out. This report is a first attempt at a thorough analysis of real penetration test results.
Between the years 2000 and 2003, Imperva ADC conducted hundreds of application penetration tests for numerous customers. Some of the penetration tests were first time tests conducted against a system. Others were repeat tests (retests) that took place either shortly after the first test or after a period of a few months. The outcome of each penetration test is a written report submitted to the customer. The report contains a detailed description of identified vulnerabilities. Each vulnerability is classified by technique (e.g. SQL injection, Cross Site Scripting), severity (e.g. Critical, High or Medium) and potential effect (e.g. direct financial damage or denial of service). The analysis presented in this paper is based on the data gathered from 306 such reports, of which 73 were obtained from retests. The raw data used for the analysis is provided in the appendix.
As mentioned earlier we used three types of classifications for the analysis of the data: Technique, Severity and Potential Outcome. For describing severity we used five values: Critical, High, Medium, Low and Informative. Classifying the techniques is a bit more complicated and we were forced to create a scale that combines both technique and purpose.
- Cross-site scripting – an attack that pushes a script tag into a server, which then sends it to an innocent user browsing the Web server, causing the script to run in that user's browser.
- SQL injection – an attack that manipulates input data sent to the server so that the server runs attacker-supplied SQL, which can pull data out of its internal data store or change its contents.
- Parameter tampering – changing the data within a parameter sent from one Web page to another in a way that alters the behavior of the latter page.
- Known vulnerabilities – using known vulnerabilities and exploits on commercial software platforms. This class holds dozens of attacks that are widely known and published.
- Cookie poisoning – changing the contents of a cookie saved on the client's computer in such a way that it changes the normal flow of the application.
- Access to administration area and internal modules – allowing unauthorized access to administrative areas or other internal modules of an application.
- Directory traversal – allowing access to unauthorized server directories.
- Improper management of permissions – improper management of the server's permissions, allowing a non-privileged user to access modules that were not intended to be seen by that user.
- Buffer overflow – data sent as input to the server that overflows the boundaries of the input area, causing the server to misbehave. Buffer overflows can be used to make the server run code sent into the overflowed buffer.
- Forceful browsing – the ability of an attacker to directly access unauthorized Web pages by bypassing the logical flow of the application, possibly avoiding authentication requirements and credentials checking.
- Denial of service – causing the site to stop providing the service it offers, by means such as bandwidth consumption or site defacement.
- Session hijacking – capturing the session of another user, which in effect means being able to impersonate the user in the eyes of the application.
- Brute force – attacks designed to steal passwords or session IDs by enumerating a large number of password or session ID options.
- Information gathering – attacks whose purpose is not to actually perform an attack, but rather to reveal information on the system, which can further assist in other attacks.
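To make the SQL injection entry above concrete, the following is a minimal sketch using Python's standard `sqlite3` module. The `accounts` table, its contents and the function names are hypothetical and chosen only for illustration; they are not taken from any tested application. The first lookup builds the statement by string concatenation, so the input becomes part of the SQL; the second passes the input as a bound parameter, so it is treated purely as data.

```python
import sqlite3


def build_db():
    # In-memory database with a hypothetical accounts table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (owner TEXT, balance INTEGER)")
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)",
        [("alice", 100), ("bob", 250)],
    )
    return conn


def lookup_vulnerable(conn, owner):
    # Vulnerable: user input is concatenated into the SQL statement itself.
    query = "SELECT owner, balance FROM accounts WHERE owner = '" + owner + "'"
    return conn.execute(query).fetchall()


def lookup_parameterized(conn, owner):
    # Safer: the input is bound as a parameter and never parsed as SQL.
    query = "SELECT owner, balance FROM accounts WHERE owner = ?"
    return conn.execute(query, (owner,)).fetchall()


conn = build_db()
malicious = "nobody' OR '1'='1"

leaked = lookup_vulnerable(conn, malicious)   # condition is always true
safe = lookup_parameterized(conn, malicious)  # no owner has this literal name

print(len(leaked), len(safe))
```

With the crafted input, the concatenated query returns every row in the table, while the parameterized query returns none. Parameter binding is one common mitigation; it is shown here as an illustration rather than as a recommendation made by the report.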
Setting Risk Levels within Tests
Each vulnerability found during a penetration test is classified with one of 5 risk levels:
- Critical (5)
- High (4)
- Medium (3)
- Low (2)
- Informative (1)
The risk level takes into consideration several factors. The most important factor is the potential damage of such an attack. Attacks allowing direct financial damage (such as purchasing of stocks at half price, or wire transferring money to a 3rd party in an offshore bank account) or sensitive information theft are naturally of critical or high risk. Potential damage varies according to the nature of the applications. For instance, ecommerce sites usually take denial of service attacks more seriously than an extranet employee portal does.
The second factor is the complexity of the required attack. Vulnerabilities which require exceptional skills to take advantage of are obviously of lower risk than those that can be easily exploited by script kiddies.
Lastly, the source of attack can influence the risk level. If an attack can only be performed by a subset of the company's employees, its risk level is likely to be lower than that of an attack that can be carried out by any hacker coming from an anonymous proxy.
The results are presented for each year separately. The results describe the total number of vulnerabilities of each type found, rather than the number of applications vulnerable to a specific type of attack. While the difference in terminology might seem minor, the difference in the results is significant, because many applications tend to have a specific vulnerability appearing numerous times.
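The distinction between the two ways of counting can be illustrated with a small sketch. The `findings` list and the application names below are hypothetical and are not taken from the report data; they only show how the ranking of vulnerability types can change depending on what is counted.

```python
from collections import Counter

# Hypothetical findings: one (application, vulnerability type) entry
# per vulnerability instance reported.
findings = [
    ("app_a", "SQL injection"),
    ("app_a", "SQL injection"),
    ("app_a", "SQL injection"),
    ("app_a", "Parameter tampering"),
    ("app_b", "Parameter tampering"),
    ("app_c", "Cross-site scripting"),
]

# Counting every instance (the approach used for the results in this paper).
instances_per_type = Counter(vuln for _, vuln in findings)

# Counting each application at most once per vulnerability type.
apps_per_type = Counter(vuln for _, vuln in set(findings))

print(instances_per_type["SQL injection"], apps_per_type["SQL injection"])
print(instances_per_type["Parameter tampering"], apps_per_type["Parameter tampering"])
```

In this made-up data, SQL injection has the most instances (three, all in one application), while parameter tampering affects more applications (two).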
Looking at the entire collection of results from all the years together, we can draw the following conclusions. Parameter tampering is the most common vulnerability, constituting 16% of all the vulnerabilities. The second is improper management of permissions, at 13% of the vulnerabilities. The third is SQL injection at 10%, and the fourth is cross-site scripting at 9%. Figure 5 presents a chart with the distribution of all the vulnerability types.
We also attach a chart illustrating the distribution of risk, which shows that high and critical risk vulnerabilities are the most common: more than 50% of the vulnerabilities fall into one of these two categories.
Figure 1 – Percentage of Attack Types, and risks: 2000 - 2003
The Evolution of Application Security Risks
The first step in our analysis takes a bird's-eye view of the results by looking at the number of vulnerabilities found in each application, and their associated risk level, over time. Figure 1 below compares the number of critical and high risk vulnerabilities to the average number of vulnerabilities per test over time. We find that although the average number of vulnerabilities per tested application tends to decrease with time, the average risk level increases.
Are There Any Secure Applications Out There?
Another look at the results compares the number of tests in which we found critical vulnerabilities with the number of tests in which we found no vulnerabilities. The portion of presumably secure applications is very small and relatively stable. Some of the tests that yielded no vulnerabilities (in fact all of them in 2001) are retests performed after previously uncovered vulnerabilities were fixed. In contrast, the portion of tests yielding critical vulnerabilities grows constantly over the years, to an overwhelming 89% in 2003.
|Year||2000||2001||2002||2003|
|% of reports without any problem:||0%||4% (3)||9% (4)||3% (5)|
|% of reports with CRITICAL problems (regardless of the number of critical problems)|
Vulnerabilities and Their Effects
In order to emphasize the meaning of the results we classify the vulnerabilities by effect on the system. We use four categories which can be stated in simple terms that apply not only to information security experts, but to end users as well.
- Execution of arbitrary code on the server
- Unauthorized arbitrary information retrieval (includes private information theft)
- Direct financial damage
- Denial of Service
Each class may be the consequence of a wide variety of attacks. For example, execution of arbitrary code can be achieved either by taking control of the server using improper permissions management, or by performing a buffer overflow attack that causes a terminal window to be opened at the client side.
As can be seen from the results graphs, execution of arbitrary code is fairly infrequent. However, direct financial loss is a risk we found in almost ¼ of tested applications. Most striking of all is the continuous rise in applications susceptible to information theft and denial of service, where 60% of tested applications were found to be at risk.
It's important to note that while both denial of service and execution of code appear to remain static over time, the actual vulnerabilities changed. In 2000, most of the DoS and arbitrary code execution vulnerabilities were related to the web server platform. Over time, production web server platforms of security-aware organizations became generally more secure, yet with the increase of vulnerabilities in the applications themselves, the numbers remain relatively stable.
Another important issue to notice is the generally low number of denial of service vulnerabilities. This is partially due to the fact that in a large portion of the penetration tests performed, the customer limited the test to non-destructive tests. In other words, the number of denial of service vulnerabilities found would have been higher without these limitations.
Is Penetration Testing a Silver Bullet for Application Security Risks?
It is our experience that many organizations regard penetration tests as an important means of mitigating application security risks. This assumes that once a test report is issued, vulnerabilities are fixed and new vulnerabilities are not introduced. Many organizations, however, do not bother to repeat the penetration test after the problems were allegedly fixed. The information we collected over the years from customers that DO repeat penetration tests indicates that failing to perform a repeat penetration test may lead to a false sense of security.
The vulnerabilities revealed in retests can be categorized into three classes: vulnerabilities that were missed by the first penetration test, vulnerabilities that were uncovered by the first penetration test but appeared again in the retest, and new vulnerabilities that were introduced during the period between penetration tests. Every retest in which vulnerabilities were encountered displayed vulnerabilities of either High or Critical risk level.
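The three classes can be expressed as a simple classification rule. The sketch below is illustrative only: the `Finding` type, the finding identifiers and the `existed_at_first_test` flag are hypothetical, and the sketch assumes that the application's change history can tell whether the vulnerable code already existed when the first test was performed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    identifier: str
    # Assumption: derived from the application's change history.
    existed_at_first_test: bool


def classify_retest(first_report_ids, retest_findings):
    """Sort retest findings into the three classes described above."""
    classes = {"repeated": [], "missed": [], "new": []}
    for finding in retest_findings:
        if finding.identifier in first_report_ids:
            # Reported by the first test and still present in the retest.
            classes["repeated"].append(finding.identifier)
        elif finding.existed_at_first_test:
            # Present during the first test but not reported by it.
            classes["missed"].append(finding.identifier)
        else:
            # Introduced between the two tests.
            classes["new"].append(finding.identifier)
    return classes


# Hypothetical example data.
first_report_ids = {"sqli-login", "xss-search"}
retest_findings = [
    Finding("sqli-login", True),
    Finding("sqli-reports", True),
    Finding("tamper-checkout", False),
]

classes = classify_retest(first_report_ids, retest_findings)
print(classes)
```

A single retest can contribute to more than one class, which is why percentages reported per class need not add up to 100%.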
In one third of all retests we found previously encountered vulnerabilities, half of which were claimed by programmers to have been fixed. These figures indicate that programmers either did not understand the problem, did not know how to fix it, or on many occasions simply tried to hide it (e.g. by disabling detailed error messages on the web server in the hope of avoiding SQL injection attacks). In 10% of the retests we found vulnerabilities that had existed at the time of the first test but had not been discovered by it. This is not due to incompetence of the first penetration testing team. Most of the applications we tested required many man-years of work to construct. In comparison, the calendar time reserved for penetration tests ranged from 4 to 14 days, with at most 2 testers. In a single case of a system that required hundreds of man-years to construct, the calendar time reserved for penetration testing exceeded 2 months and the penetration test team included 3 people at a time. In 60% of the retests we found completely new vulnerabilities, which were introduced either during the fixing of the first group of identified vulnerabilities or in the course of the application's continued development.
The Problem of the Ever Changing Application
Only a small portion of all organizations bother to periodically conduct penetration tests against the application. Most will perform at most one penetration test upon the launch of a new application; some will perform a second penetration test soon after the vulnerabilities from the first penetration test are patched. However, in a few cases we were able to perform periodic application penetration tests at intervals of 1 year, 6 months, and (in one case) once a month. This gave us an opportunity to analyze the behavior of security risks for a single application over time.
It turns out that all retests that were performed with a long period between tests revealed vulnerabilities that were already uncovered by the first penetration test. In 60% of those retests we found vulnerabilities that were actually fixed after the first penetration test and were reintroduced over time. Further research within the tested organizations revealed that those vulnerabilities were introduced during various change cycles that the applications went through during the period between penetration tests. Some of the changes were introduced by programmers who had never seen the report of the first penetration test. Nonetheless, some of the changes that reintroduced old vulnerabilities were performed by the same programmers who introduced the original ones.
In 50 of the 73 applications, the retest was performed immediately after the initial test and we found that the security holes were indeed fixed. In 23 of the immediate retests, we observed the same errors. This is due to one of two reasons: either the customers didn't fix the bugs; or the fix was incorrect.
In the following chart we can see that the risk level of the application established after the periodic penetration test (excluding those that were performed within a month of each other) remained the same or even increased.
By comparing the types of vulnerabilities that were uncovered in the retest to those found in the original penetration tests, we find that in many cases programmers fix a specific instance of a vulnerability rather than eliminate it completely. This was very evident with SQL injection vulnerabilities, which were fixed only in the modules that we explicitly mentioned in our reports. Other modules suffering from the same vulnerability were not fixed. One of the major reasons for this type of behavior is that the penetration test (and the changes its results incurred) is the last stage in an already delayed project. Hence programmers are under tremendous time pressure. Also, in some cases in which application programming is outsourced, the subcontractor is not paid for the time and effort put into patching the vulnerabilities (the subcontractor was bound by the contract to deliver a "secure" application). Hence, it is in the interest of programmers to invest as little time as possible.
First and foremost, the results clearly show that application level hacking does pose a prominent risk to most applications. Also, the risk incurred by application level hacking is on the rise. This is mainly due to the increasing functionality of applications (they provide much more access to sensitive data) and the constant evolution of hacking techniques. Keeping in mind that the sample population used for our analysis consists of security-aware organizations, it is likely that risk levels in the general population are much higher.
Considering the fact that applications tend to undergo many and frequent changes, even an application that went through a thorough penetration test and patching cycle before it was launched is likely to become vulnerable over a time frame of 6 months to a year. Hence, application security cannot be reduced to a singular effort at a single point in time.
Another conclusion is that penetration tests and fixes, however important as an audit mechanism, do not yield completely secure applications. This is not because of a lack of penetration testing skills but rather a lack of resources. No organization would invest an equivalent amount of time in application QA and application penetration testing. Hence no organization can expect the two processes to have the same yield. Moreover, knowing that applications subjected to a thorough QA process still display bugs, one cannot expect applications that went through a much shorter penetration test to display no vulnerabilities. In addition, if an application does not undergo penetration testing very frequently (i.e. with every change that the application undergoes) we can assume that there are periods of time (between the penetration tests) in which the risk level is very high.
It is therefore our conclusion, based on the analysis detailed in this paper, that in order to truly mitigate application security risks over time, organizations must incorporate true application security solutions into their networks. Such solutions, which are the application security equivalent of network access controls (e.g. Firewall, NIDS, Router ACLs), will protect applications against application level attack techniques in a constant manner, providing protection against both known and unknown attacks.
Or, if we try to say how many applications out there suffer from attacks, we can extrapolate these numbers to the following conclusion. (Notice that the numbers don't add up to 100%; rather, each number represents the percentage of attacks that included the attack class.)
Note that information disclosure attacks are not necessarily of 'Informative' risk level. Their risk level is set to informative when the information gathered has no immediate outcome that allows further attacking the system. A source-disclosure vulnerability, for instance, is likely to be classified with a higher risk level, assuming that the source can be later used for identifying other vulnerabilities.
See Appendix for tabular view of the results.
- (3) All of them are retests in which the initial test had found some errors.
- (4) 8% were initial tests without any error, and 1% were retests.
- (5) 2% were initial tests without any error, and 1% were retests.