Intelligence Community’s Research ‘Arm’ Building A ‘Machine’ To Predict Cyber Attacks; Not A Panacea, Some Issues To Consider; ‘It Is The Second Mouse That Always Gets The Cheese’
Aliya Sternstein, writing on the DefenseOne website on February 24, 2015, says that “the Intelligence Community is holding a contest to design software that combs open source data to predict cyber attacks before they occur.” “Imagine,” she writes, “if IBM’s Watson, the ‘Jeopardy’ champion, could not only answer trivia questions and forecast the weather, but also predict data breaches…days before they occur.”
“That is the ambitious, long-term goal of a contest being held by the U.S. Intelligence Community,” Ms. Sternstein writes. “Academics and industry scientists are teaming up to build software that can analyze publicly available data and a specific organization’s network activity…to find patterns suggesting the likelihood of an imminent hack.”
“The dream of the future: a White House supercomputer spitting out forecasts on the probability that, say, China will try to intercept Situation Room video that day, or that Russia will eavesdrop on Secretary of State John Kerry’s phone conversations with German Chancellor Angela Merkel,” Ms. Sternstein wrote.
“IBM has even expressed interest in the ‘Cyber-Attack Automated Unconventional Sensor Environment,’ or CAUSE, project. Big Blue officials presented a basic approach at a January 21 proposers’ day.”
Aims To Get To Root Of Cyber Attacks
“CAUSE is the brainchild of the Office for Anticipating Surprise, under the Director of National Intelligence. A ‘Broad Agency Announcement,’ which sets out the competition terms and conditions, is expected to be issued any day now, contest hopefuls say.” “Current plans call for a four-year race to develop a totally new way of detecting cyber incidents, hours to weeks earlier than intrusion-detection systems,” according to the Intelligence Advanced Research Projects Activity (IARPA).
IARPA Program Manager Rob Rahmer points to the hacks at Sony and health insurance provider Anthem as evidence that traditional methods of identifying “indicators” of a hacker afoot have not effectively enabled defenders to get ahead of threats. “This is an industry that has invested heavily in analyzing and mitigating the cause of cyber attacks,” Rahmer, who is running CAUSE, told NextGov in an interview. “Instead of reporting relevant events that happen today, or in previous days, decision-makers will benefit from knowing what is likely to happen tomorrow.”
“The project’s cyber-psychic bots will estimate when an intruder might attempt to break into a system or install malicious code,” Ms. Sternstein writes. “Forecasts will also report when a hacker might flood a network with bogus traffic that freezes operations, a so-called denial-of-service (DoS) attack. Such computer-driven predictions have worked for anticipating the spread of Ebola, other disease outbreaks, and political uprisings. But few researchers have used such technology for cyber attack forecasts.”
At Least 150 People Interested — No Word Yet On The Size Of The Prize Pot
“About 150 would-be participants from the private sector and academia showed up for the January informational workshop,” DefenseOne noted. Rahmer “was tight-lipped about the size of the prize pot, which will be announced later this year. Teams will have to meet various mini-goals to pass on to the next round of competition, such as picking data feeds, creating probability formulas, and forecasting cyber attacks across multiple organizations.”
At the end, “What you are most likely to be able to do is say to a client, ‘Given the state of the world, and given the asset you’re trying to protect, or that you care about, here are the [events] you might want to worry about the most,’” David Burke, an aspiring participant and research lead for machine learning at computer science research firm Galois, said in an interview. “Instead of having to pay attention to every single bulletin that comes across your desk about possible zero days, or previously unknown vulnerabilities, it would be wonderful if some machine said, ‘These are the highest likelihood threats.’” “His research focus is advanced persistent threats, involving well-coordinated hackers who conduct reconnaissance on a system, find a security weakness, wiggle in, and invisibly traverse the network,” DefenseOne noted.
“Imagine CAUSE was all about the real-world analogy of figuring out whether some local teenagers are going to knock over a 7-Eleven,” Ms. Sternstein postulates. “That would be really hard to predict. You probably couldn’t even tie that to any larger goal. But, in the case of APTs — absolutely” you can, Burke said in an interview with DefenseOne. “The fact that APTs are on networks for a long period of time…gives you not only the sociopolitical pieces of data, or clues, but you have all sorts of clues on your network that you can integrate.”
It’s not an exact science. There will be false alarms. And the human brain must provide some support after the machines do their thing. “The goal is not to replace human analysts, but to assist in making sense of the massive amount of information available; and, while it would be ideal to always find the needle in a haystack, CAUSE seeks to significantly reduce the size of the haystack for an analyst,” Rahmer said.
Unclassified Program Will Trawl For Clues On Social Media
“Fortunately, or unfortunately, depending on one’s stance on surveillance, National Security Agency intercepts will not be provided to participants,” DefenseOne said. “Currently, CAUSE is planned to be an unclassified program,” Rahmer said. “We’re going to ask performers to be as creative in identifying the future victim, the method of attack, time of future incident and location of the attacker,” according to IARPA.
“Clues might be found on Twitter, Facebook, and other social media, as well as online discussions, news feeds, Web searches, and many other online platforms. Unconventional sources tapped could include black market storefronts that peddle malware and hacker group-behavior models. AI will do all this work, not people. Machines will try to infer motivations and intentions. Then, mathematical formulas, or algorithms, will parse these systems of data to generate likely hits.” It will be interesting to see how many false positives and false negatives this technique will generate.
“One research thread Burke is pursuing, examines the “nature of deception, and counter-deception, particularly as it applies to the cyber domain,” according to an abstract of his proposers’ day presentation,” Ms. Sternstein wrote.
“Cyber adversaries rely on deceptive attack techniques, and understanding patterns of deception enables accurate predictions and proactive counter-deceptive responses,” the abstract said.
“For example, if you are able to look at every single Facebook post, and you processed everything and ran it through some filter, through the conversations and the little day-to-day things people do, you could actually start to see larger patterns, and you could imagine that is a ton of data,” Burke told DefenseOne. “You would need some sort of big data technology that you’d have to bring to bear to be able to digest all that.”
Still Nailing Down Specifics On Supercomputer Use
“The final rules will indicate whether companies can, or must, use a supercomputer, and whether they can borrow federal computing assets,” Rahmer said. “We definitely want innovation and creativity from the offerors,” he added.
“Researchers at Battelle, a technology development organization, said they might harness fast data processing engines like Hadoop and Apache Spark. They added that the rules and their team partners will ultimately dictate the system used to amp up computing power,” Ms. Sternstein wrote.
“We have already recognized that as both the rate of collection and the connections between data points grow, we will need to move to a high-performance computing environment,” Battelle’s Cyber Innovations Technical Director, Ernest Hampson, said in an email to DefenseOne. “For the CAUSE program, the data from several contractors could push us towards the need for a super-computing infrastructure…using technologies such as IBM’s Watson to support deep learning, or hardware such as a Cray Urika, could provide the power to fuel advanced analytics…at scale.”
According to IBM’s January 2015 briefing, the apparatus currently used to solve similar prediction problems “runs on x86 infrastructure.” However, IBM’s x86 supercomputer hardware was spun off to the Chinese firm Lenovo last year. It remains to be seen what machine IBM might deploy, a company spokesman said.
“In theory, the government could say they are going to own the servers,” IBM spokesman Michael B. Rowniski said. “We don’t know ultimately that we would participate, or what we would even propose.”
“Recorded Future, a six-year-old CIA-backed firm, already knows how to generate hacker behavior models by assimilating public information sources, like Internet traffic, social networks, and news reports. But the company’s analyses do not factor in network activity inside a targeted organization, because such data typically is confidential.”
“Doing this successfully is not simply analysis applied to current flashpoints,” Burke said. “You also have observables on a network: signs possibly of malware, or penetration, because many campaigns that take place go on for weeks, or months. So, you also have a lot of network data that you are going to end up crunching.”
I Applaud The Effort And Thinking — But, I Do Have Some Worries/Concerns
I mentioned one of my areas of concern earlier: false positives and false negatives. To be useful, and to be embraced and believed by analysts as well as senior levels of management, this kind of predictive analysis will have to have a very low rate of both. First impressions and the trust factor are critical in the early stages. If there are too many false negatives or false positives, analysts and senior managers will disregard or question the analysis and information they’re getting, and the project could be seriously undermined. So it will be important not to field these techniques too early, for fear of being fatally undermined right out of the blocks.
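To put rough numbers on that concern, here is a minimal sketch in Python. The figures used (1 percent of forecast windows containing a real attack, a 90 percent detection rate, and a 5 percent false alarm rate) are purely illustrative assumptions, not CAUSE data, and the function name `alert_precision` is my own.

```python
def alert_precision(base_rate: float,
                    true_positive_rate: float,
                    false_positive_rate: float) -> float:
    """Probability that a raised alert corresponds to a real attack.

    Applies Bayes' rule:
        P(attack | alert) = P(alert | attack) * P(attack) / P(alert)
    All three inputs are hypothetical rates between 0 and 1.
    """
    true_alerts = base_rate * true_positive_rate
    false_alerts = (1.0 - base_rate) * false_positive_rate
    total_alerts = true_alerts + false_alerts
    if total_alerts == 0.0:
        return 0.0  # the system never alerts, so there is nothing to trust
    return true_alerts / total_alerts


if __name__ == "__main__":
    # Illustrative assumption: real attacks occur in 1% of forecast windows.
    noisy = alert_precision(0.01, 0.90, 0.05)
    quiet = alert_precision(0.01, 0.90, 0.005)
    print(f"5% false alarm rate:   {noisy:.1%} of alerts are real")
    print(f"0.5% false alarm rate: {quiet:.1%} of alerts are real")
```

Under these assumed numbers, only about 15 percent of alerts would be real at a 5 percent false alarm rate, rising to about 65 percent at a 0.5 percent false alarm rate. Because real attacks are rare relative to quiet periods, even a small false alarm rate can swamp the true alerts, which is why analyst trust is so sensitive to it.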
Secondly, there is the concern that analysts and senior management become too dependent on this kind of effort. Before I retired from the Intelligence Community, I saw the analytical ranks embrace data mining software and techniques. I had a concern, and I still do, that some analysts become too dependent and reliant on data mining, at the expense of good, old-fashioned, insatiable curiosity. Analysts can get stale, and their analytical and critical thinking skills can atrophy, if they become too reliant on data mining. I suspect that in these times, when everything is being reported and provided so fast, there is more than a fair amount of dependency on data mining pulls, and not enough critical thinking and good old-fashioned intellectual war-gaming of what the adversary, or their target, might be up to. I hope I am wrong on that, but I suspect there is some of that going on.
Which brings me to this ‘predictive’ cyber threat intrusion program. Those charged with monitoring for, and discovering, an unauthorized network intrusion or penetration could become ‘lazy’ and too dependent on this technique if it proves to work well enough, and their inquisitive skills and second-guessing about how the adversary may compromise their IT ecosystem could atrophy and become stale. This is not a minor issue. It needs constant oversight by middle and senior management to ensure everyone understands that the CAUSE Program is not a panacea; it is additive, but not a substitute for critical thinking.
I am sure the Program Manager at IARPA has considered these issues and is cognizant that this technique should not be oversold for what it can and cannot do.
I also do not know if this technique and method can aid in discovering that a trusted insider, like an Edward Snowden, is stealing a company’s confidential secrets or inflicting damage because they are disgruntled or unhappy with the way they are being treated, their lack of promotion, etc. Every medieval fortress in Europe was eventually breached; and, more often than not, it was due to a trusted insider providing the adversary with the key vulnerabilities that allowed the adversary to successfully penetrate and defeat the ‘impregnable’ castle.
Forecasting and predicting are great if they can do better than we’re doing now. But the adversary will always be adapting and learning from our methods and techniques, and will find creative and clever ways to deceive us and, sometimes, thwart even our best efforts. Remember, it is the second mouse that always gets the cheese.
Hopefully, IARPA is considering these issues as they move forward in this endeavor. V/R, RCP