Templates: https://github.com/OWASP/Top10/tree/master/2024/Data
Contribution Process
There are a few ways that data can be contributed:
- Email a CSV/Excel/JSON file with the dataset(s) to brian.glas@owasp.org
- Upload a CSV/Excel/JSON file to https://bit.ly/OWASPTop10Data
We plan to accept contributions to the Top 10 2024 during Jun-Dec of 2024 for data dating from 2021 to current.
We have both CSV and JSON templates to aid in normalizing contributions: https://github.com/OWASP/Top10/tree/master/2024/Data
The following data elements are either required (marked with *) or optional:
Per DataSet:
- Contributor Name (org or anon)
- Contributor Contact Email
- Time period (2023, 2022, 2021)
- *Number of applications tested
- *CWEs w/ number of applications found in
- Type of testing (TaH: Tool-assisted Human, HaT: Human-assisted Tooling, Tools)
- Primary Language (code)
- Geographic Region (Global, North America, EU, Asia, other)
- Primary Industry (Multiple, Financial, Industrial, Software, ??)
- Whether or not data contains retests or the same applications multiple times (T/F)
If a contributor has two types of datasets, one from HaT and one from TaH sources, then it is recommended to submit them as two separate datasets.
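As a rough illustration of the data elements listed above, a single dataset could be modeled and sanity-checked as sketched below. This is not the official template format (see the templates linked above for that); the class name `Dataset`, the field names, and the `validate` helper are all hypothetical, chosen only to mirror the list of required (*) and optional elements.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Dataset:
    # Required elements (marked * in the list above).
    applications_tested: int
    cwe_app_counts: Dict[str, int]  # CWE id -> number of applications it was found in
    # Optional elements; field names are illustrative, not the template's.
    contributor_name: str = "anon"
    contributor_email: Optional[str] = None
    time_period: Optional[int] = None        # e.g. 2023
    testing_type: Optional[str] = None       # "TaH", "HaT", or "Tools"
    primary_language: Optional[str] = None
    region: Optional[str] = None
    industry: Optional[str] = None
    contains_retests: Optional[bool] = None


def validate(ds: Dataset) -> List[str]:
    """Return a list of problems; an empty list means the record looks usable."""
    problems = []
    if ds.applications_tested <= 0:
        problems.append("applications_tested must be positive")
    if not ds.cwe_app_counts:
        problems.append("at least one CWE count is required")
    for cwe, count in ds.cwe_app_counts.items():
        # A CWE cannot be found in more applications than were tested.
        if count < 0 or count > ds.applications_tested:
            problems.append(f"{cwe}: count {count} outside 0..{ds.applications_tested}")
    if ds.testing_type not in (None, "TaH", "HaT", "Tools"):
        problems.append(f"unknown testing_type {ds.testing_type!r}")
    return problems
```

A contributor with both HaT and TaH results would, under this sketch, build two `Dataset` objects rather than one combined record.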
Analysis
We will analyze the data in a manner similar to 2021, and hope to also include some trending data over both the 2021 and 2024 collection time periods.
Timeline
Data Collection: Jun - Dec
Analysis: Early 2025
Draft: Early 2025
Release: First half of 2025
All told, the data collection drew on thirteen contributors and a grand total of 515k applications represented as non-retests (we have additional data marked as retest; it is not in the initial data for building the Top 10, but will be used to look at trends later).
We asked ourselves whether we wanted to go with a single CWE for each "category" in the OWASP Top 10. Based on the contributed data, this is roughly what it would have looked like:
1. Reachable Assertion
2. Divide by Zero
3. Insufficient Transport Layer Encryption
4. Clickjacking
5. Known Vulns
6. Deployment of the Wrong Handler
7. Infinite Loop
8. Known Vulns
9. File or Dir Externally Accessible
10. Missing Release of Resources
And that is why we aren't doing single CWEs from this data. It's not helpful for awareness, training, baselines, etc. So we confirmed that we are building risk categories of groups of related CWEs. As we categorized CWEs, we ran into a decision point: should we focus more on Root Cause or Symptom?
For example, Sensitive Data Exposure is a symptom, and Cryptographic Failure is a root cause. Cryptographic Failure can likely lead to Sensitive Data Exposure, but not the other way around. Another way to think about it is that a sore arm is a symptom; a broken bone is the root cause of the soreness. Grouping by Root Cause or Symptom isn't a new concept, but we wanted to call it out. Within the CWE hierarchy, there is a mix of Root Cause and Symptom weaknesses. After much thought, we focused on mapping to Root Cause categories wherever possible, understanding that sometimes a category is just going to be a Symptom category because it isn't classified by root cause in the data. A benefit of grouping by Root Cause is that it can help with identification and remediation as well.
We spent a few months grouping and regrouping CWEs by categories and finally stopped. We could have kept going but needed to stop at some point. We have ten categories with an average of almost 20 CWEs per category. The smallest category has one CWE, and the largest category has 40 CWEs. We've received positive feedback related to grouping like this as it can make it easier for training and awareness programs to focus on CWEs that impact a targeted language or framework. Previously we had some Top 10 categories that simply no longer existed in some languages or frameworks, and that would make training a little awkward.
Finding Impact (via Exploit and Impact in CVSS)
In 2017, once we defined Likelihood using incidence rate from the data, we spent a good while discussing the high-level values for Exploitability, Detectability, and Technical Impact. While four of us used decades of experience to agree, we wanted to see if it could be more data-driven this time around. (We also decided that we couldn't get Detectability from data, so we are not going to use it for this iteration.)
We downloaded OWASP Dependency Check and extracted the CVSS Exploit and Impact scores grouped by related CWEs. It took a fair bit of research and effort as all the CVEs have CVSSv2 scores, but there are flaws in CVSSv2 that CVSSv3 should address. After a certain point in time, all CVEs are assigned a CVSSv3 score as well. Additionally, the scoring ranges and formulas were updated between CVSSv2 and CVSSv3.
In CVSSv2, both Exploit and Impact could be up to 10.0, but the base score formula would weight them down to 40% for Exploit and 60% for Impact. In CVSSv3, the theoretical max was limited to about 3.9 for Exploit and 6.0 for Impact. We analyzed the average CVSSv3 scores after the weighting changes were factored in: Impact scoring shifted higher, almost a point and a half on average, and Exploitability moved nearly half a point lower on average.
There are 125k records of a CVE mapped to a CWE in the NVD data extracted from OWASP Dependency Check at the time of extract, and there are 241 unique CWEs mapped to a CVE. 62k CWE maps have a CVSSv3 score, which is approximately half of the population in the data set.
For the Top Ten, we calculated average exploit and impact scores in the following manner. We grouped all the CVEs with CVSS scores by CWE, then weighted both the exploit and impact scores by the percentage of the population that had CVSSv3 scores plus the remaining population with CVSSv2 scores to get an overall average. We mapped these averages to the CWEs in the dataset as Exploit and Impact scoring for the other half of the risk equation.
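One way to read the weighting described above is as a population-weighted blend of the CVSSv3 and CVSSv2 averages for a given CWE. The sketch below assumes that reading; the function name `blended_score` and its parameters are hypothetical, and the text does not say whether CVSSv2 subscores were rescaled before blending, so no rescaling is done here.

```python
def blended_score(avg_v3: float, n_v3: int, avg_v2: float, n_v2: int) -> float:
    """Blend per-CWE average subscores, weighted by how many CVEs carry each version.

    avg_v3 / n_v3: average subscore and count of CVEs that have a CVSSv3 score.
    avg_v2 / n_v2: average subscore and count of the remaining CVEs (CVSSv2 only).
    """
    total = n_v3 + n_v2
    if total == 0:
        raise ValueError("no scored CVEs for this CWE")
    share_v3 = n_v3 / total
    # Weighted average: v3 share times v3 average, plus the remainder times v2 average.
    return share_v3 * avg_v3 + (1.0 - share_v3) * avg_v2
```

The same function would be called once for the exploit subscore and once for the impact subscore of each CWE.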
We agreed that we would use the high watermark of the incidence rate for each grouping to help set the order of the 2021 Top 10. The results of this will be released shortly as our target release date is Sept 24, 2021, to align with the OWASP 20th Anniversary.
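The high-watermark ordering mentioned above can be sketched as follows, assuming that a CWE's incidence rate is the number of applications it was found in divided by the number of applications tested, and that a grouping's high watermark is the largest such rate among its CWEs. Both the function names and that interpretation are assumptions made for illustration, and any input values would be supplied by the caller rather than taken from the contributed data.

```python
from typing import Dict, List, Tuple


def incidence_rate(apps_with_cwe: int, apps_tested: int) -> float:
    """Fraction of tested applications in which the CWE was found."""
    if apps_tested <= 0:
        raise ValueError("apps_tested must be positive")
    return apps_with_cwe / apps_tested


def rank_categories(
    categories: Dict[str, List[str]],
    cwe_rates: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Order categories by the highest incidence rate among their CWEs."""
    peaks = {}
    for name, cwes in categories.items():
        # Only CWEs that actually appear in the data contribute a rate.
        observed = [cwe_rates[c] for c in cwes if c in cwe_rates]
        if observed:
            peaks[name] = max(observed)
    # Highest watermark first.
    return sorted(peaks.items(), key=lambda kv: kv[1], reverse=True)
```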