Problem Space

Cybercrime and security threat research is limited by access to threat intelligence data

What is the problem we are trying to solve? Data and metadata can be used to identify, characterize, locate, and mitigate cybercrimes or security threats that exploit domain names or Internet addresses. Intelligence and data sharing face many challenges, most notably privacy considerations (e.g., data protection regulations) and commercial interests (licensing, cost, ownership, and rights to redistribute). Threat intelligence is often collected and processed using proprietary methods or community submission, which creates uncertainty about the accuracy, completeness, and validity of the data; simply put, only the parties privy to the proprietary methods know the science involved.

To varying degrees, academic and private sector research overcome these challenges. For example, academic research may choose to use data sets that were produced using reproducible scientific methods and released to the public, whereas private research may use data sets that are processed using proprietary methods but are acknowledged as reliable or "high confidence" by virtue of their wide adoption and scrutiny by academic or private researchers. In the latter case, researchers trust that the private researchers who produce the data are using reproducible scientific methods.

Open threat intelligence efforts exist, but they are often short-lived. Novel academic research that results in the publication of valuable data is not typically sustained, either for lack of funding or because researchers and their institutions or funding agencies do not invest in production environments.

Cybercrime and security threat researchers must also contend with a common and more serious problem: much, if not most, of the "best" threat data is designed to be used for real-time threat detection or mitigation. The methods of acquiring this threat data are typically oriented to daily or more frequent access to the most recent criminal or abusive activity. Only in rare cases do parties who provide threat data, commercially or freely, offer access to historical data. Thus, research that calls for longitudinal (historical) study must by necessity compile its own historical repositories.
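
As an illustration of what compiling such a repository involves, the following is a minimal sketch in Python, not a description of any existing system. It assumes a feed is retrieved once per day as a list of (indicator, threat type) pairs. The table layout and the names open_repository and record_snapshot are hypothetical and chosen only for this example. Each snapshot updates the first and last date on which an indicator was listed, which is the minimum needed to ask longitudinal questions of a feed that only ever exposes its current state.

```python
import sqlite3
from datetime import date


def open_repository(path=":memory:"):
    """Open (or create) a hypothetical historical store of feed sightings."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS sightings (
               indicator   TEXT NOT NULL,  -- URL, domain name, hostname, or IP address
               threat_type TEXT NOT NULL,  -- e.g. phishing, malware, spam
               source      TEXT NOT NULL,  -- name of the feed (assumed label)
               first_seen  TEXT NOT NULL,  -- ISO 8601 date
               last_seen   TEXT NOT NULL,  -- ISO 8601 date
               PRIMARY KEY (indicator, threat_type, source)
           )"""
    )
    return conn


def record_snapshot(conn, source, snapshot_date, entries):
    """Merge one day's feed snapshot into the historical store."""
    day = snapshot_date.isoformat()
    with conn:  # commit all rows of the snapshot as one transaction
        for indicator, threat_type in entries:
            key = (indicator, threat_type, source)
            # Create the row the first time this indicator is observed.
            conn.execute(
                "INSERT OR IGNORE INTO sightings VALUES (?, ?, ?, ?, ?)",
                key + (day, day),
            )
            # Widen the observation window; ISO dates compare correctly as text.
            conn.execute(
                """UPDATE sightings
                      SET first_seen = MIN(first_seen, ?),
                          last_seen  = MAX(last_seen, ?)
                    WHERE indicator = ? AND threat_type = ? AND source = ?""",
                (day, day) + key,
            )
```

A daily collection job would call record_snapshot(conn, "feed-a", date.today(), entries) after each retrieval. A production repository would also need to retain the raw snapshots, track gaps in collection, and record the provenance of each feed, none of which this sketch attempts.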

Accumulating repositories for historical threat analysis is a large and complex activity. The ICANN Domain Abuse Activity Reporting (DAAR) System is a noteworthy example. DAAR monthly reports focus on DNS abuse (spam, malware, phishing, and botnet domains) in ICANN TLDs. The COMPASS report from the DNS Abuse Institute (DNSAI) also measures DNS abuse. Both efforts are narrowly scoped to measure domain name abuse, and so provide a partial, often imperfect perspective of the cybercrime landscape. By evaluating URLs, registered domain names, hostnames, and the address space, we are able to accurately identify attacks and study the entire resource “pool” that criminals employ for campaigns, to understand whether relationships exist among these resources and what those relationships are.

Simply put, we dig deeper by accumulating historical threat data and reporting more measurements, more transparently, for both the Internet naming and addressing systems. We will also report on a broader set of security threats, including threats that are classified as cybercrimes in the Council of Europe’s Convention on Cybercrime, as we identify and acquire sources of threat intelligence data for these crimes.