Monday, April 14, 2008
The illusion of security
The scenario we present here hinges on the theft of data from a multinational company. The company suffers from the illusion of security—that is, the belief it has implemented more than adequate security measures—only to discover it has not.
A DARK SCENARIO
The Data Mining Corporation (DMC) has an almost perfect business model. It collects data about individuals from hundreds of sources and then sells the aggregated data back to many of those sources. Its principal sources (and clients) include insurance companies, retail chains, media conglomerates, credit-reporting agencies, mobile phone companies, law enforcement agencies, customs and immigration authorities, and intelligence agencies.
Among the ways DMC has managed to sidestep legislative and regulatory constraints on transfers of personal data is through mergers with or acquisitions of companies with their own extensive databases. DMC is headquartered in Miami, but now has major subsidiaries in London and Tokyo. It is listed on the New York and London Stock Exchanges and is considering a listing on the Tokyo Stock Exchange.
Scene 1: Management board meeting. The company secretary stands close to the iris scanner. The door opens and he enters the boardroom. The president, already there, nods a slight greeting to the company secretary who can see his boss is preoccupied. She is watching the boardroom video display, which depicts her vice presidents coming down the corridor toward the boardroom. A few seconds later, the vice presidents enter one by one and take their seats.
“Okay, let’s get on with it,” says the president. “Show the agenda.” The agenda appears on the large wafer-thin video screen on the wall opposite the president. Three items are listed:
* Data from developing countries. (Switzer)
* Theft of data. (Perrier)
* Considerations re: listing on the TSE. (Hausmann)
1 For a detailed version of this scenario and for three other dark scenarios, see D. Wright, S. Gutwirth, M. Friedewald et al., Safeguards in a World of Ambient Intelligence. Springer, Dordrecht, 2008. The project was funded under the European Commission’s Sixth Framework Programme. It had five partner organizations, represented by the authors. The project aimed at identifying a range of safeguards to the threats and vulnerabilities facing privacy, identity, security, trust and inclusiveness from ambient intelligence. The views and opinions expressed in this article are those of the authors alone and in no way are intended to reflect those of the European Commission.
Kevin Switzer, vice president for operations, speaks. “We’ve had complaints from the Customs and Immigration folks about the shortage and reliability of our data on people coming into the States. It mainly concerns people from developing countries. With our profiling technologies, we are able to identify anyone who might be a security risk or disposed to antisocial behavior. Unfortunately, most developing countries have no AmI networks, which makes it impossible to build up the same kind of detailed profiles of individuals like we can here in the U.S., Europe, or Japan. So the immigration authorities have been making threatening noises about refusing entry to people from countries without AmI networks.” Switzer seems concerned, but then smiles. “I think we have a golden opportunity here. We can set up AmI networks in those countries as long as we are the ones to collect and process the data. You’d think most countries would jump at the chance to have networks put in place at virtually no or little cost to them, but some of the countries are quibbling with us.”
“Quibbling?” asks the president, “What do you mean?”
“Quibbling about control of the data. They say if we control the data, it’s tantamount to signing their sovereignty over to us. But we’ve been working on a deal where we copy for them the data we collect... well, some of it, at least. Our intelligence agencies would not want us to hand over everything, nor do we have to. We can offer the raw data to the developing countries, but they won’t know if or how we’ve processed the data, especially since we do the processing here in the U.S. or in the U.K., outside their jurisdiction. They’ll have to settle for what we give them.”
“Okay, that sounds good to me. Any objections?” she asks the others, who remain silent. “No? Okay, then, Jacques, it’s your turn. What’s the latest on the theft at our London office?”
Perrier, vice president for security, shifts uncomfortably in his chair. “Well, as everyone here knows, we have a regular monthly audit of DMC’s data processing activity. From the last audit, we discovered that there had been a second backup of data immediately after the first, but we can’t identify exactly the device to which the data was backed up...”
“But you know who made the second backup?” asks the president.
“Umm... uh... yes. It seems likely that three of my staff were responsible for doing the regular backups that night. We want to ask them about this second backup, of course, but we haven’t been able to contact them. It seems all three left on holidays a few hours after the second backup was made. They were supposed to have returned three days ago, but they haven’t reported for work and they haven’t answered our calls.”
58 March 2008/Vol. 51, No. 3 COMMUNICATIONS OF THE ACM
The president is getting angry. “And why don’t you know where they are? Surely you can track them via their location implants. Everybody has to have a location implant. It’s a condition of employment in our company, just like any critical infrastructure like banks or nuclear power companies.”
“Yes, but their implants are inoperable. They could have been surgically removed,” says Perrier.
“And what about the sensor networks in their homes and cars?”
“Yes,” says Perrier. “Like other employees, they’ve agreed that we can check their home systems and we’ve done that. There’s obviously nobody in their apartments, and their cars have been stationary since they left on holidays...”
“Have you checked the surveillance systems?” asks the president. “You can’t go anywhere in London without being caught by surveillance cameras hundreds of times a day.”
“Yes, we’ve been reviewing the data from the surveillance systems, too,” says Perrier. “But they haven’t shown up on those either. We’ve also checked with the airlines and railways and car rental agencies to see where they might have gone. Now we know they left for Costa Rica, but then the trail goes cold. As Kevin has just pointed out, the developing countries don’t have the kind of AmI infrastructure needed to track people, so they could really be anywhere. We’ve also been checking with the 4G companies, but so far, there’s been no data recovered on use of their mobiles.”
“I don’t understand how they could have got past our own security systems,” says the president. “We have access control to prevent unauthorized employees from copying or manipulation of data.”
“That’s true,” says Perrier. “The snag is that they were authorized. Quite a few employees have partial access, so if three or four with access to different bits collaborate, as these three appear to have done, they are able to get virtually full access to the data.”
“Even so,” says the president, “how did they get the data outside our headquarters?”
“With today’s technology, it’s easy to copy vast amounts of data in seconds onto high-capacity optical storage devices no larger than a deck of playing cards, which makes them easy to conceal on the way out of the building. It’s hard to break into DMC offices, but it’s not hard to get out.”
“If we were exposed, it would be a complete disaster,” says MacDonald, the VP for public affairs. “Among other things, it would show our clients that the profiles of our own employees are not reliable because we weren’t able to predict that these few bad apples were going to abscond with copies of our records.”
Max Court, DMC’s general counsel, speaks up. “If we were exposed? Are you suggesting we should withhold information about this theft from the police and those whose files have been copied?”
“Of course,” says MacDonald. “It’s obvious, isn’t it? I’d hate to imagine what it would do to our share price and our plans for a listing on the Tokyo Stock Exchange.”
Scene 2: The Old Bailey, two years later. BBC1 news anchor: “And now we go to our reporter, Miles Davenport, who’s been at the Old Bailey today, attending the trial involving the Data Mining Corporation and its directors. What’s the latest, Miles? Has the jury returned with a verdict?”
Miles Davenport: “Thanks, Serena. No, the jury hasn’t returned yet, but an announcement is expected in the next few minutes.”
BBC presenter: “Miles, can you just recap for our viewers what this trial’s been all about? And why is it so important?”
Miles: “Sure, Serena. It all started two years ago when The Financial Times broke a story about the theft of personal information on about 16 million people in the U.S. and the U.K. All this data was held by DMC, the world’s largest data miner. DMC discovered that someone had broken into its supercomputers but it didn’t say anything to anybody.2 Then there was a big spike in the number of identity theft cases. People were seeing all kinds of purchases on their monthly statements for things they hadn’t bought. A lot more people and companies were reporting that they were being blackmailed with threats of releases of embarrassing information. The FT got wind of this, and was able to trace the source back to a theft of data from DMC.
“At first, DMC denied everything, and then said they wouldn’t comment on it because the theft was under investigation. When its share price began skydiving on Wall Street and in London, DMC had to call off plans for a listing on the Tokyo Stock Exchange. For a while, it looked like DMC was going bust, but the U.S. government stepped in and propped up the company. They said that national security was involved, and they could not allow the company to go bust.”
2 DMC is not alone. See J. Krim, “Consumers Not Told Of Security Breaches, Data Brokers Admit,” The Washington Post, Apr. 14, 2005. See also D. Stout and T. Zeller Jr., “Agency Delayed Reporting Theft of Veterans’ Data,” The New York Times, May 24, 2006.
BBC presenter: “Personalized services are great, of course; they save us lots of time. And so are the improvements in our security, like knowing when we are near criminals or suicide bombers, but isn’t there a dark side?”
Miles: “Well, yes, there is. We have to trust companies like DMC to keep our data safe, secure, and accurate. But now we know that our data is not secure. DMC not only failed to protect our data, they were actually selling it to governments who were hunting for people with behavioral dysfunctions in case they were likely to commit a serious crime or an act of terrorism. They’ve also been selling the data to other companies who were using the data to spam just about everybody in the U.S. and here in the U.K. DMC claimed they couldn’t be held responsible for what their clients did with the data.”3
BBC presenter: “Thanks for that recap, Miles, but weren’t there some other issues that came out during the trial?”
Miles: “There certainly were, Serena. People are entitled to see their records, but most people didn’t even know about DMC, let alone the fact that they had built up such extensive records on every one of us. So, some consumer activist groups banded together to sue DMC for negligence. People had no idea just how pervasive ambient intelligence had become. We heard that in many instances the data coming from so many different ambient technology networks was often in conflict or didn’t make any sense. DMC countered that its proprietary software contains an algorithm for comparing data from different sources to maximize reliability and its predictive capability, but under intense questioning from the prosecution, they admitted they could never eliminate unreliability nor could their predictions of who might be a terrorist or criminal be 100% certain.”
BBC presenter: “And the DMC directors, what’s going to happen to them?”
Miles: “We’ll find out after the jury comes back with the verdict. The DMC president, however, has already resigned, but she went out with a golden parachute—a severance package worth a cool $100 million—and now she’s apparently living in Costa Rica.”
3 Cf. W. Safire, “Goodbye To Privacy,” The New York Times, April 10, 2005: “Of all the companies in the security-industrial complex, none is more dominant or acquisitive than ChoicePoint of Alpharetta, Ga. This data giant collects, stores, analyzes and sells literally billions of demographic, marketing and criminal records to police departments and government agencies that might otherwise be criticized (or defunded) for building a national identity base to make American citizens prove they are who they say they are.”
ANALYSIS
Here, we present a methodological structure for analyzing this scenario, which could also be applied to the construction and analysis of many technology-oriented scenarios.
Situation. The objective of this scenario is to depict what is called the “illusion of security” in an AmI world a decade from now, when ambient intelligence has become pervasive in developed countries (but not developing countries), when most people embrace the personalization of services and the supposedly enhanced security resulting from the application of AmI. Although AmI offers powerful new technologies for security applications, such technologies can be undermined by determined people.
This dark scenario is a trend or reference scenario because it starts from the present and projects forward on the basis of to-be-expected trends and events. It is intended to be realistic or descriptive rather than, for instance, normative or extreme.
The scenario concerns the theft of personal information held by a data aggregator (DMC) by three rogue employees. Theft of identity occurs now, but the difference between such crimes today and in the future is the scale of the data involved. AmI will make it possible to gather orders of magnitude more information about virtually every person in America, Europe, and Japan. The future is also marked by an increasing concentration in the control of personal data. Thus, the risks to individuals are much greater when something goes wrong.
AmI technologies used in the scenario. The scenario makes reference to several AmI or AmI-related technologies, including:
* Biometrics, such as the iris scanners that grant admission to the boardroom;
* Networked sensors/actuators, such as those that detect human presence in cars or homes;
* Speech recognition and voice activation, such as the system in the boardroom that recognizes the president’s command to show the agenda;
* Surveillance technologies including video cameras, keylogging software, location implants, biometrics, and networked sensors, that are used to monitor where employees are and what they are doing;
* Intelligent software that can analyze past behavior and preferences in order to predict needs and personalize services (which TV program to watch, which products to buy), something Serena, the TV presenter in Scene 2, views as welcome by the market;
* Networked RFIDs, sensors, and actuators for gathering data about people and the products they have or services they use. These and other AmI technologies greatly facilitate profiling of virtually everyone; and
* Fourth-generation mobile phones, which combine today’s PDA capabilities with third-generation mobile technology (and much else). Such multimedia personal devices provide a wide range of services (and collect vast data), but 4G networks are not available everywhere, especially not in developing countries, like Costa Rica, to which the data thieves and, later, the DMC president decide to decamp.
Applications. The AmI technologies referenced in the scenario are used in various applications, including:
* Security: DMC has instituted various security measures, such as access control (to offices and software systems), key logging, proprietary software, employee monitoring and so on, to ensure the security of the personal data it collects and processes.
* Surveillance: Video cameras and other surveillance technologies keep watch on virtually everyone, especially in the streets and shops of London (and other cities), but increasingly in their homes too. Such technologies can be used to detect whether someone exceeds the speed limits or pilfers items from the shops, but also whether they engage in terrorism on the Underground.
* Immigration control, counterterrorism and policing: AmI networks are used to compile personal data and profile would-be visitors and immigrants to help officials assess whether they present a security risk or might behave in a socially dysfunctional way.
* Personalization of services and targeted marketing: With the prevalence of AmI networks, and the vast amount of personal data they generate, service providers can individuate their services to new levels of specificity.
* Critical infrastructure protection: It’s hard to get into the DMC offices (but not so hard to get out). AmI sensors and actuators, biometrics, and other access control measures are used to protect critical infrastructure, such as DMC, banks, public utility networks, government offices.
Drivers. The drivers at work here can largely be derived from the motives and needs of the principal characters in the scenario and/or economic, political or social forces. DMC’s management are primarily driven by the profit motive, a desire for scale (such as to be the market leader, to swallow or overwhelm competitors) and to create a situation where their clients are dependent on DMC services and products.
A second driver must be market demand, that is, there are many companies and governmental agencies that want the processed data that DMC has been supplying.
A third driver, not so dissimilar from the first, is that the data thieves are also impelled by the profit motive.
A fourth driver is respect for the law. This is (partly) indicated when DMC’s general counsel expresses some disbelief at the suggestion that DMC should cover up the data theft from both the police and those whose files have been copied. In Scene 2, respect for and redress through the law is the key driver.
Yet another driver can be identified, such as the media’s desire for a good story, which has the benefit of raising public awareness about the pervasiveness of AmI.
The scenario raises several issues:
Digital divide. The developed countries have AmI networks and the developing countries don’t. There is a risk that this will lead to discrimination against developing countries. Intelligence agencies and immigration authorities may not admit visitors and immigrants from countries without the AmI networks needed to generate detailed profiles and a determination as to whether a person could be a security risk. The digital divide issue radiates in many directions and prompts many questions. Will the quest for perfect security really protect our societies? Recent developments suggest we are as much at risk from homegrown terrorists as from those in developing countries. Also, if immigration is restricted from developing countries without AmI networks, won’t our “developed” societies somehow be impoverished because we will lack the views and experiences of those who know what it’s like to live on both sides of the digital divide? If immigration is restricted, especially on the grounds of a lack of AmI-generated data, won’t we inflame resentment in developing countries?
Concentration of power. DMC is the clear market leader in the aggregation and processing of AmI-generated data. It has a wide range of powerful clients. When there is a risk that DMC might collapse, the government steps in to prop up the company. When governments and client industries are so dependent on a single market player, they are at risk of being held hostage. Even if the company professes respect for the law, there is a distinct risk, whatever its declared intentions, that it will act in a monopolistic way (“Power tends to corrupt.”). High technology companies may fly under the radar screen of competition authorities for a long time before they are noticed, by which time they may have, like DMC, accumulated too much power.
The concentration of power manifests itself in other ways in the scenario. DMC says it is willing to establish AmI networks in some developing countries as long as DMC controls them. Developing countries, concerned about their sovereignty, will “have to settle for what we give them,” says Switzer. Also, employees have “agreed” that DMC can check their home sensor networks, that is, if they want a job at DMC, they must agree. Similarly, employees must bear location implants.
Lack of public awareness. Despite the convenience of personalized services and enhancements in security made possible by AmI, most people have comprehended neither how pervasive AmI has become nor the scale and volume of data being generated about them by AmI networks. In the scenario, public awareness is increased as a result of the investigative reporting and media coverage of the theft of data from DMC, the resulting trial, and the high-level political intervention to stave off DMC’s collapse. Aroused public awareness may force changes in legislative or regulatory oversight. Hence, public awareness and the pressure of public opinion, stoked by the media, have utility as a safeguard against abuse. Unfortunately, such pressure is almost always reactive.
The illusion of security. Most people are willing to trade some of their privacy for better security. The scenario suggests that terrorism has become sufficiently serious that the intelligence agencies and immigration authorities are becoming unwilling to admit foreigners unless they have detailed information on each individual. Similarly, DMC employees seem willing to have location implants and surveillance equipment installed not only in their offices but in their homes and cars. They probably see this as beneficial in security terms.
It is ironic that DMC and its directors face a class action lawsuit on the grounds that they were negligent in securing personal data. Security would seem to be one of DMC’s key strengths, one of its key selling points. DMC can hardly believe that its many security measures—video surveillance, biometrics, keylogging software, access control measures, regular audits, employee implants and so on—could fail. But the question is: have DMC executives done enough? Was their profiling of employees sufficiently rigorous so that they did not need to fear theft by insiders? We are told that it was difficult to get into DMC offices, but not difficult to get out. DMC’s security defenses seemed primarily aimed at preventing breaches at its perimeter.
The company was rather less focused on the enemy within, hence the three employees (who had authorized access to the data) were able to collaborate, to copy the files and exit the premises without being challenged. Further, it seems to have been relatively easy for them to remove their location implants and to disappear without a trace. But the three employee data thieves are not the only miscreants at DMC. The senior executives also behaved unethically and illegally by not informing the police and their customers about the data theft.
Hence, we can conclude that an illusion of security prevailed at DMC and, perhaps, more widely within society as a whole. The illusion is fed by the implicit assumption that various AmI technologies and procedures will form an adequate defense against miscreants. Unfortunately, no matter how strong these technologies and procedures may be, they may still fail, especially against insiders acting in concert (both the employees and the executives).
At the societal level, we may assume that laws and regulations will protect us, but this scenario suggests that even there we suffer from the illusion of security—it takes a class action suit to bring DMC to justice. Market forces that might otherwise punish DMC are undermined because government decides that DMC cannot go to the wall. DMC has managed to acquire so much power—partly through its proprietary technology and partly through its market dominance—and has come to play so big a role in (ironically) national security that government cannot allow it to go under. But if DMC was unable to detect the security risk posed by three of its own employees, isn’t the government’s confidence in DMC technology misplaced?
The illusion of security is also fed by unwarranted trust. The issue of trust is not directly raised in this scenario, but it is not far away. One would think that a data aggregator, processor, and reseller like DMC would have some obligation to inform people whenever it sells data to others or takes over another company with personal data records. But this has not occurred. It seems that DMC clients, the intelligence agencies and immigration authorities, are content that individuals are not informed about what information DMC has on them, even if the law dictates otherwise.
California and a number of other states have strict laws requiring that companies do inform individuals when their data has been compromised, but that does not mean that they will. Compliance will depend as much on corporate culture and, especially, ethics as on legal deterrents. Thus, to some extent, even laws and regulations can instill an illusion of security.
CONCLUSION
The principal conclusion we draw from this article—from the dark scenario and the analysis—is that, although we can expect amazing advances in the development and deployment of ambient technologies, there is a risk that corporate ethics in the year 2018 will not be so different from those prevalent in the year 2008, which is to say that some companies will be good corporate citizens and some won’t. Similarly, some companies will have rogue employees just as they do today who are capable of undermining what might be perceived as strong security (technologically, procedurally, legally). A principal difference between today’s world and that depicted for the year 2018 could be that security concerns about terrorism and antisocial behavior will be such that unless individuals have really detailed profiles compiled from data from AmI networks, they may be barred from entering a developed country. Also, while people may welcome the convenience from personalization of services and the ubiquity of surveillance technologies, they may be lulled into a false sense of security.
As mentioned in the introduction to this article, there have been few “dark” scenarios put forward by AmI experts and aficionados. The SWAMI project has taken a deliberately contrarian position with regard to scenarios that show the “sunny” side of AmI. While the authors are as enthusiastic as anyone about the potential of AmI, advances in surveillance technologies, biometrics, and fourthgeneration mobile systems, they believe the AmI community, policymakers, and society must be alert to possible abuses of the technology. Constructing scenarios and using an analytical structure along the lines as noted in this article offer a useful way of stimulating dialogue about such possible abuses as well as other technology issues.
Identifying possible abuses is the first step in devising safeguards. Almost certainly, a mix of safeguards will be needed: technological, socioeconomic, legal, regulatory, and even cultural safeguards can be envisaged.4 As a minimum, the SWAMI consortium advocates a privacy impact assessment for any projects supported by public funding. Designers of new technology should be required to factor in data protection in any new AmI architectures and networks. Legislation and regulation will probably be necessary, and one can predict that will elicit protests from those in favor of deregulation and getting the government off their backs. So be it.
4 Spielberg’s 2002 film, Minority Report, could be regarded as an example of a cultural safeguard. The film depicts a society, in 2054, embedded with AmI technologies, in which we hear memorable phrases such as “I’m placing you under arrest for the future murder of ...”. For a discussion, see D. Wright, “Alternative futures: AmI scenarios and Minority Report,” Futures 40, 5 (June 2008).
If civil liberty advocates have had concerns about encroachments upon our privacy in the emerging surveillance society, they will be positively apoplectic if AmI, already being implemented in a somewhat piecemeal fashion, becomes as pervasive as its supporters believe it will. To anticipate this future, rather than react to it, appropriate safeguards should be agreed and put in place. Now is not too soon to start. To that end, the authors hope this article will stimulate interesting discussions and constructive debates on the issues it raises, including corporate ethics and privacy in the AmI space; surveillance technologies, from convenience to a false sense of security; the role of horror stories and dark scenarios in ubiquitous computing; and the risks resulting from unwarranted trust. As Thomas Jefferson said, “The price of freedom is eternal vigilance.”
David Wright (david.wright@trilateralresearch.com) is managing partner of Trilateral Research & Consulting LLP, based in London, U.K. Michael Friedewald (m.friedewald@isi.fraunhofer.de) is a senior scientist and project manager in the Department of Emerging Technologies at the Fraunhofer Institute of Systems and Innovation Research, Karlsruhe, Germany. Wim Schreurs (wim.schreurs@vub.ac.be) is a researcher at Vrije Universiteit Brussel in Brussels, Belgium. Michiel Verlinden (michiel.verlinden@gmail.com) is an attorney at the Brussels Bar. Serge Gutwirth (serge.gutwirth@vub.ac.be) is a professor of law at Vrije Universiteit Brussel in Belgium. Yves Punie (Yves.Punie@ec.europa.eu) is senior researcher at the Institute for Prospective Technological Studies (IPTS) in Seville, Spain. The IPTS is part of the European Commission’s Joint Research Centre (JRC). Ioannis Maghiros (Ioannis.Maghiros@ec.europa.eu) is principal IST scientific officer at the IPTS. Elena Vildjiounaite (Elena.Vildjiounaite@vtt.fi) is a researcher at the VTT Technical Research Centre of Finland in Oulu. Petteri Alahuhta (Petteri.Alahuhta@vtt.fi) is a technology manager in the Mobile Interaction Knowledge Centre of VTT Technical Research Centre of Finland.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee
Monday, March 3, 2008
5.1
Command copy_all_right(p,q,s) without flag c
If own in A[p,s]
Then
Enter rc into A[q,s]
Enter wc into A[q,s]
Enter
End
5.2
Command copy_all_right(p,q,s) without flag c
If own in A[p,s]
Then
Enter r into A[q,s]
Enter w into A[q,s]
Enter
End
5.3
When a right R is entered into A[q,s] without the copy flag c, q cannot copy that right to others. If R is entered with the flag c, q gains the ability to copy the right to others.
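The difference between answers 5.1 and 5.2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the access control matrix is modeled as a dictionary from (subject, object) pairs to sets of rights, the copy flag is modeled as a trailing "c" on the right's name, and only the r and w rights shown above are handled (the third, unfinished "Enter" line is not guessed at). The helper `can_copy` is hypothetical and not part of the homework.

```python
from collections import defaultdict

def copy_all_right(A, p, q, s, with_flag):
    """If p owns s, enter r and w for q over s, optionally with the copy flag c."""
    if "own" not in A[(p, s)]:
        return False  # condition "own in A[p,s]" failed; nothing is entered
    suffix = "c" if with_flag else ""
    A[(q, s)].add("r" + suffix)  # "rc" in 5.1, "r" in 5.2
    A[(q, s)].add("w" + suffix)  # "wc" in 5.1, "w" in 5.2
    return True

def can_copy(A, subject, obj, right):
    """Hypothetical check: a right can be passed on only if held with flag c."""
    return right + "c" in A[(subject, obj)]

# Usage: p owns s and grants q the rights without the copy flag (as in 5.2).
A = defaultdict(set)
A[("p", "s")].add("own")
copy_all_right(A, "p", "q", "s", with_flag=False)
print(sorted(A[("q", "s")]), can_copy(A, "q", "s", "r"))
```

Calling the same function with `with_flag=True` corresponds to 5.1, after which `can_copy` would report that q may pass the rights on.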
Google Docs Link:http://docs.google.com/Doc?id=dhdhrj5r_2145gm27hjfc
Monday, February 25, 2008
Homework 01 (my own translation)
1. Introduction
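One of the most challenging tasks in intrusion detection is to build a unified view of events by fusing alerts from multiple, heterogeneous monitoring devices. This alert fusion process can be defined as the correlation of aggregated alert streams. Aggregation is the grouping of alerts that are close together in time and have similar features; it fuses different views of the same event. Alert correlation is concerned with recognizing alerts that are logically linked. “Correlation” does not necessarily mean “statistical correlation,” although methods based on statistical correlation are sometimes used to reveal these relationships.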
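As a rough illustration of aggregation (grouping alerts that are close in time and share similar features), the sketch below groups alerts that name the same target and occur within a fixed window of a group's first alert. The `Alert` fields, the `aggregate` function, and the 5-second window are assumptions made for this example; the text does not specify the actual aggregation criteria.

```python
from dataclasses import dataclass

@dataclass
class Alert:
    time: float   # seconds since the start of the trace (assumed unit)
    sensor: str   # which IDS raised the alert
    target: str   # attacked host, used here as the "similar feature"

def aggregate(alerts, window=5.0):
    """Group alerts on the same target within `window` seconds of a group's first alert."""
    groups = []
    for alert in sorted(alerts, key=lambda a: a.time):
        for group in groups:
            first = group[0]
            if first.target == alert.target and alert.time - first.time <= window:
                group.append(alert)
                break
        else:
            groups.append([alert])  # no matching group: start a new one
    return groups

alerts = [
    Alert(0.0, "nids", "h1"),
    Alert(2.0, "hids", "h1"),
    Alert(3.0, "nids", "h2"),
    Alert(20.0, "hids", "h1"),
]
for group in aggregate(alerts):
    print([(a.sensor, a.time) for a in group])
```

A crisp window like this is the simplest choice; a fuzzy, time-based criterion would replace the hard threshold with a graded measure of closeness.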
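Alert fusion is more complex when anomaly detection systems are involved, because no information about the type or classification of the observed attack is available to the fusion algorithm. Most of the algorithms proposed in the recent correlation literature rely on this information and are therefore not applicable to purely anomaly-based intrusion detection systems.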
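In this work, we explore the use of statistical causality tests, which have been proposed for correlating IDS alerts and which can also be applied to alerts from anomaly-based IDSs. We focus on the Granger Causality Test (GCT) and show that its performance depends strongly on a good choice of one parameter, which proves to be sensitive and difficult to estimate. We redefine the causality problem in terms of a simpler statistical test and try to validate it.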
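For readers unfamiliar with the Granger Causality Test, the following is a minimal sketch of the textbook form of the test, not the implementation evaluated in the paper. It fits two least-squares autoregressive models for y, one using only p past values of y and one that also uses p past values of x, and returns the F statistic comparing them. The function names and the synthetic data are assumptions for illustration; the lag order `p` plays the role of the sensitive parameter mentioned above.

```python
import numpy as np

def lagged(series, p):
    """Matrix whose columns are series[t-1], ..., series[t-p] for t = p .. n-1."""
    n = len(series)
    return np.column_stack([series[p - k : n - k] for k in range(1, p + 1)])

def granger_f(x, y, p):
    """F statistic for the hypothesis 'x Granger-causes y' with lag order p."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    target = y[p:]
    ones = np.ones((len(target), 1))
    restricted = np.hstack([ones, lagged(y, p)])          # past of y only
    unrestricted = np.hstack([restricted, lagged(x, p)])  # past of y and x

    def rss(design):
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    rss_r, rss_u = rss(restricted), rss(unrestricted)
    dof = len(target) - unrestricted.shape[1]
    return ((rss_r - rss_u) / p) / (rss_u / dof)

# Synthetic example: y follows x with a one-step delay, so x should "cause" y.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = np.zeros(500)
y[1:] = 0.8 * x[:-1] + 0.1 * rng.normal(size=499)
f_xy = granger_f(x, y, p=2)
f_yx = granger_f(y, x, p=2)
print(f_xy, f_yx)
```

A large F statistic in one direction and a small one in the other suggests a directed relationship; turning the statistic into a decision requires comparing it against an F distribution, which is omitted here.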
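2 Problem statement and state of the art
The desired output of an alert fusion process is a compact, high-level view of what is happening on a (usually large and complex) network. In this work we use a slightly modified version of the terminology proposed in [1]. Alert streams are collected from different IDS sources, normalized and aggregated; alert correlation is the final step of the process. In [1] the term "fusion" is used for the phase we call "aggregation", whereas we use the former to denote the whole process.
Fig. 1. A diagram illustrating alert fusion terminology as used in this work.
In [2] we proposed a fuzzy time-based aggregation technique and showed that it performs well in reducing false positives. Here we focus on the more challenging correlation phase. Effective and generic correlation algorithms are difficult to design, especially if the goal is to reconstruct complex attack scenarios.
A technique for alert correlation based on state-transition graphs is shown in [3]. Finite state automata allow complex scenarios to be described, but they require known scenario signatures. They are also unsuitable for pure anomaly detectors, which cannot distinguish between different types of events. Similar approaches, with similar strengths and shortcomings but different formalisms, specify the pre- and post-conditions of attacks, sometimes together with time-distance criteria. Scenario rules can also be mined directly from data, in either a supervised or an unsupervised fashion. Both approaches use alert classifications as part of their rules.
None of these techniques would work for anomaly detection systems, because they rely on alert names or classifications. The best examples of algorithms that do not need such features are based on time-series analysis and modeling. For instance, [8] builds time series by counting the number of alerts that fall into each sampling interval; trend and periodicity removal algorithms then filter out the predictable components, leaving only the real alerts as output. This is, however, more a false-positive and noise-suppression approach than a correlation approach. The correlation approach investigated in [9], based on the GCT, also requires no prior knowledge, and it drew our attention as one of the few viable proposals for anomaly detection alert correlation in the earlier literature. We describe and analyze this approach in more depth in Section 4.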
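3 Problems in the evaluation of alert correlation systems
Evaluation techniques for alert fusion systems are still limited to a few proposals, and they are practically and theoretically challenging to develop [2]. In addition, the common lack of reliable data sources for benchmarking also weighs heavily on the evaluation of correlation systems. Ideally, we need both host and network datasets, fully labeled, with complex attack scenarios described in detail. These data should be freely available to the scientific community. These requirements rule out real-world dumps.
The only datasets of this kind that are effectively available are those by DARPA (the IDEVAL datasets). Since these datasets were created to evaluate intrusion detection sensors and not correlation tools, they do not include sensor alerts. The alerts have to be generated by running various sensors on the data. The 1999 dataset [10], which we used for this work, has many known shortcomings. First, it is evidently and hopelessly outdated. Moreover, a number of flaws have been detected and criticized in the network traces [11,12]. More recently, we analyzed the host-based system call traces and showed [13,14] that they are riddled with problems as well.
For this work these basic flaws are not extremely dangerous, since the propagation of attack effects (from network to hosts) is not affected by any of the known flaws of IDEVAL, and in fact we observed it to be realistically present. The problem is rather that the intrusion scenarios are too simple and straightforward. In addition, many attacks are not detectable in both host and network data (which makes the whole point of correlation disappear). Today, networks and attackers are more sophisticated, and attack scenarios are much more complex than in 1999, operating at various layers of the network and application stack.
The work we analyze closely [9] uses both the DEFCON 9 CTF dumps and the DARPA Cyber Panel Correlation Technology Validation (CTV) [15] datasets to evaluate an alert correlation prototype. The former dataset is not labeled and does not contain any background traffic, so (as the authors themselves recognize) it cannot be used for a proper evaluation, only for qualitative analysis. In contrast, the DARPA CTV effort, carried out in 2002, created a complex testbed network, along with background traffic and a set of attack scenarios. The alerts produced by various sensors during these attacks were collected and given as input to the evaluated correlation tools. Unfortunately, this dataset is not available for further experimentation.
For all the previous reasons, in our tests we use the IDEVAL dataset with a simplification: we try to correlate the stream of alerts coming from a single host-based IDS (HIDS) sensor with the corresponding alerts from a single network-based IDS (NIDS) that monitors the whole network. To this end, we ran two anomaly-based IDS prototypes (both described in [13,14,16]) on the whole IDEVAL dataset. We ran the NIDS prototype on the tcpdump data and collected 128 alerts for attacks against the host pascal.eyrie.af.mil [17]. The NIDS also generated 1009 alerts related to other hosts. Using the HIDS prototype, we generated 1070 alerts from the host pascal.eyrie.af.mil. With respect to these alerts, the NIDS was able to detect almost 66% of the attacks with less than 0.03% false positives, while the HIDS achieved a detection rate of 98% with 1.7% false positives.
In the following, we use this shorthand notation: Net is the substream of all the alerts generated by the NIDS. HostP is the substream of all the alerts generated by the HIDS installed on pascal.eyrie.af.mil, while NetP contains all the alerts generated by the NIDS with pascal as a target; finally, NetO = Net \ NetP contains all the alerts generated by the NIDS with any host but pascal as a target.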
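4 The Granger Causality Test
In [9] Qin and Lee propose an interesting alert correlation algorithm that seems suitable for anomaly-based alerts as well. Alerts with the same features are grouped into collections of time-sorted items belonging to the same "type" (following the concept of type in [8]). Frequency time series are then built using a fixed-size sliding window; the result is one time series for each collection of alerts. The prototype then applies the GCT [18], a statistical hypothesis test capable of discovering causality relationships between two time series when they originate from linear, stationary processes. The GCT gives a stochastic measure, called the Granger Causality Index (GCI), of how much of the history of one time series (the supposed cause) is needed to explain the evolution of the other (the supposed consequence, or target). The GCT is based on the estimation of two models. The first is an Auto Regressive (AR) model, in which future samples of the target are modeled as influenced only by past samples of the target itself. The second is an Auto Regressive Moving Average eXogenous (ARMAX) model, which also takes the supposed cause time series into account as an exogenous component. A statistical F-test built on the model estimation errors selects the best-fitting model: if the ARMAX model fits better, the cause effectively influences the target.
The unsupervised identification of "causally related" events is performed by repeating the above procedure for each pair of time series. The advantage of this approach is that it requires no prior knowledge (even though it may use attack probability values, if available, for an optional prioritization phase). However, in previous work we showed that the GCT fails to recognize "meaningful" relationships between IDEVAL attacks.
We tested the sensitivity of the GCT to the choice of two parameters: the sampling time w and the time lag p (that is, the order of the AR model). In our simple experiment, the expected result is that NetP $\rightsquigarrow$ HostP and that HostP $\not\rightsquigarrow$ NetP (where $\rightsquigarrow$ indicates "causality" and $\not\rightsquigarrow$ is its negation). In [9] the sampling time was arbitrarily set to w = 60 s, while the choice of p is not documented. However, our experiments show that the choice of these parameters can strongly influence the results of the test. In Fig. 2 (1-a/b) we plot the p-value and the GCI of the test for different values of p (w = 60 s). In particular, the dashed line corresponds to the test NetP(k) $\rightsquigarrow$ HostP(k), and the solid line to the test HostP(k) $\rightsquigarrow$ NetP(k). Recall that if the p-value is lower than the significance level, the null hypothesis is rejected. Notice how different choices of p can lead to inconclusive or even opposite results. For instance, ... (values)
Fig. 3. The optimal time lag p given by the AIC criterion varies strongly over time.
A possible explanation is the following: the GCT is meaningful only if both linear regression models are optimal, so that the correct residuals are computed. If we use the Akaike Information Criterion to estimate the optimal time lag p over different windows of data, we find that p varies widely over time, as shown in Fig. 3. The fact that there is no stable optimal choice of p, combined with the fact that the test result depends significantly on it, makes us doubt that the GCT is a viable option for general alert correlation. The choice of w seems equally important and is even harder to make, except by guessing.
Of course, our testing is not conclusive: the IDEVAL alert sets may simply be inadequate for showing causal relationships. Another explanation, albeit a less likely one, is that the GCT may not be suitable for anomaly detection alerts: in fact, in [9] it was tested on misuse detection alerts. There are, however, also theoretical reasons to doubt that applying the GCT can lead to stable, good results. First, the test is asymptotic with respect to p, meaning that the reliability of the results decreases as p increases because of the loss of degrees of freedom. Second, it rests on the strong assumption of linearity in the auto-regressive model fitting step, which depends strongly on the observed phenomenon. In the same way, the stationarity assumption of the model does not always hold.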
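5 Modeling alerts as stochastic processes
Instead of interpreting alert streams as time series (as proposed by the GCT-based approach), we propose to change the point of view and use a stochastic model in which alerts are modeled as (random) events in time. This proposal can be seen as a formalized extension of the approach introduced in [1], which correlates alerts if they are fired by different IDSs within a "negligible" time frame, where "negligible" is defined by a crisp, fixed threshold.
For simplicity, we once again describe our technique in the simple case of a single HIDS and a single NIDS that monitors the whole network. The concepts can, however, easily be generalized to more than two alert streams by evaluating them pair by pair. For each alert we have three essential pieces of information: a timestamp, a "target" host (fixed, in the case of the HIDS, to the host itself), and the generating sensor (in our case, a binary value).
We reuse the scenario and data presented in Section 4. With a self-explanatory notation, we define the following random variables: $T_{NetP}$ are the arrival times of the network alerts in NetP ($T_{NetO}$ and $T_{HostP}$ are defined similarly); $\varepsilon_{NetP}$ ($\varepsilon_{NetO}$) are the delays (caused by transmission, processing and different detection granularity) between a specific network-based alert regarding pascal (not pascal) and the corresponding host-based alert. The actual values of each $T_{(\cdot)}$ are simply the set of timestamps extracted from the corresponding alert stream. We reasonably assume that $\varepsilon_{NetP}$ and $T_{NetP}$ are stochastically independent (the same is assumed for $\varepsilon_{NetO}$ and $T_{NetO}$).
In an ideal correlation framework with two equally perfect IDSs with a 100% DR and 0% FPR, if two alert streams are correlated (i.e., they represent independent detections of the same attack occurrences by different IDSs [1]), they are also "close" in time. NetP and HostP should evidently be an example of such a pair of streams. In the real world, of course, some alerts will be missing (because of false negatives, or simply because some attacks are detectable only by a specific type of detector), and the distances between related alerts will therefore show higher variability. To account for this, we can "cut off" alerts that are too far away from a corresponding alert in the other time series, presuming them to be singletons. In our case, knowing that single attacks did not last more than 400 s in the original dataset, we tentatively set the cutoff threshold at this value.
Under the given working assumptions and the proposed stochastic model, we can formalize the correlation problem as a set of two statistical hypothesis tests:
$$H_0: T_{HostP} \neq T_{NetP} + \varepsilon_{NetP} \quad \text{vs.} \quad H_1: T_{HostP} = T_{NetP} + \varepsilon_{NetP} \qquad (1)$$
$$H_0: T_{HostP} \neq T_{NetO} + \varepsilon_{NetO} \quad \text{vs.} \quad H_1: T_{HostP} = T_{NetO} + \varepsilon_{NetO} \qquad (2)$$
Let $\{t_{i,k}\}$ be the observed timestamps of $T_i$, $\forall i \in \{HostP, NetP, NetO\}$. The meaning of the first test is straightforward: within a random amount of time, $\varepsilon_{NetP}$, the occurrence of a host alert, $t_{HostP,k}$, is preceded by a network alert, $t_{NetP,k}$. If this does not happen for a statistically significant number of events, the test result is that the alert stream $T_{NetP}$ is uncorrelated with $T_{HostP}$; in this case we have enough statistical evidence to reject $H_1$ and accept the null hypothesis. Symmetrically, rejecting the null hypothesis of the second test means that the NetO alert stream (regarding all hosts but pascal) is correlated with the alert stream regarding pascal.
Note that the two tests above are strongly related to each other: in an ideal correlation framework it cannot happen that both "NetP is correlated to HostP" and "NetO is correlated to HostP". This would imply that the network activity regarding all hosts but pascal (which raises NetO) is related to the host activity of pascal (which raises HostP) to the same order of magnitude as NetP, which is an intuitively contradictory conclusion. The second test therefore acts as a sort of "robustness" criterion.
From our alerts, we can compute a sample of $\varepsilon_{NetP}$ by simply picking, for each value in NetP, the value in HostP that is closest but greater (applying the threshold defined above). We can do the same for $\varepsilon_{NetO}$, using the alerts in NetO and HostP.
The next step is the choice of the distributions of the random variables defined above. Typical distributions used for modeling random occurrences of timed events belong to the family of exponential Probability Density Functions (PDFs) [20]. In particular, we decided to fit them with Gamma PDFs, because our experiments show that this distribution is a good choice for both $\varepsilon_{NetP}$ and $\varepsilon_{NetO}$.
The PDFs of $\varepsilon_{NetP}$, $f_P := f_{\varepsilon_{NetP}}$, and of $\varepsilon_{NetO}$, $f_O := f_{\varepsilon_{NetO}}$, are estimated with the well-known Maximum Likelihood (ML) technique [21] as implemented in the GNU R software package; the results are summarized in Fig. 4. $f_P$ and $f_O$ are approximated by Gamma[3.0606, 0.0178] and Gamma[1.6301, 0.0105], respectively (standard errors on parameters are 0.7080, 0.0045 for $f_P$ and 0.1288, 0.009 for $f_O$). From now on, the estimator of a given density $f$ is denoted $\hat{f}$.
Fig. 4 shows histograms vs. estimated density (red, dashed line) and quantile-quantile plots (Q-Q plots) for both $\hat{f}_O$ and $\hat{f}_P$. Recall that Q-Q plots are an intuitive graphical tool for comparing data distributions by plotting the quantiles of the first distribution against the quantiles of the other.
Considering that the sample sizes of $\varepsilon_{(\cdot)}$ are around 40, the Q-Q plots empirically confirm our intuition: $\hat{f}_O$ and $\hat{f}_P$ are both able to explain the real data well, within inevitable but negligible estimation errors. Even though $\hat{f}_P$ and $\hat{f}_O$ are both Gamma-shaped, they differ significantly in their parameters; this is a very important result, since it allows us to set up a proper criterion to decide whether or not $\varepsilon_{NetP}$ and $\varepsilon_{NetO}$ are generated by the same phenomenon.
Given the above estimators, a more precise and robust hypothesis test can now be designed. Tests 1 and 2 can be mapped onto two-sided Kolmogorov-Smirnov (KS) tests [20], achieving the same result in terms of decisions:
$$H_0: \varepsilon_{NetP} \sim f_P \quad \text{vs.} \quad H_1: \varepsilon_{NetP} \nsim f_P \qquad (3)$$
$$H_0: \varepsilon_{NetO} \sim f_O \quad \text{vs.} \quad H_1: \varepsilon_{NetO} \nsim f_O \qquad (4)$$
where the symbol $\sim$ means "has the same distribution as". Since we do not know the real PDFs, the estimators are used in their stead. Recall that the KS test is a non-parametric test that compares a sample (or a PDF) against a PDF (or a sample) to check how much they differ from each other. Such tests can be performed, for instance, with ks.test() (a native GNU R procedure): the resulting p-values on IDEVAL 1999 are 0.83 and 0.03, respectively. Clearly, there is significant statistical evidence to accept the null hypothesis of Test 3. The ML estimation seems able to fit a Gamma PDF correctly for $f_P$ (given the $\varepsilon_{NetP}$ samples), which double-checks our intuition about the distribution. The same does not hold for $f_O$: in fact, it cannot be correctly estimated with a Gamma PDF from $\varepsilon_{NetO}$. The low p-value for Test 4 confirms that the distribution of the $\varepsilon_{NetO}$ delays is completely different from that of $\varepsilon_{NetP}$. Therefore, our criterion not only recognizes noisy, delay-based relationships among alert streams when they exist; it is also capable of detecting when such a correlation does not hold.
We also tested our technique on alerts generated by our NIDS/HIDS running on IDEVAL 1998 (limiting our analysis to the first four days of the first week), in order to cross-validate the above results. We prepared and processed the data with the same procedures described above for the 1999 dataset. Starting from almost the same proportion of host/net alerts against either pascal or other hosts, the ML estimation computed the two Gamma densities shown in Fig. 5: $f_P$ and $f_O$ are approximated by Gamma(3.5127, 0.1478) and Gamma(1.3747, 0.0618), respectively (standard errors on estimated parameters are 1.3173, 0.0596 for $f_P$ and 0.1265, 0.0068 for $f_O$). These parameters are very similar to the ones we estimated for the IDEVAL 1999 dataset. Furthermore, with p-values of 0.51 and 0.09, the two KS tests confirm the same statistical discrepancies we observed on the 1999 dataset. The above numerical results show that, by interpreting alert streams as random processes, there are several (stochastic) dissimilarities between net-to-host delays belonging to the same net-host attack session and net-to-host delays belonging to different sessions. By exploiting these dissimilarities, we may find the correlation among streams in an unsupervised manner, without the need to predefine any parameter.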
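6 Conclusions
In this paper we analyzed the use of different types of statistical tests for the correlation of anomaly detection alerts, a problem for which few or no solutions are available today. One of the few correlation proposals that can be applied to anomaly detection is the use of a Granger Causality Test (GCT). After discussing a possible testing methodology, we observed that the datasets traditionally used for evaluation have various shortcomings, which we partially addressed by using the data for a simpler correlation scenario, investigating only the link between a stream of host-based alerts for a specific host and the corresponding stream of alerts from a network-based detector.
We examined the use of a GCT as proposed in earlier works and showed that it relies on the choice of non-obvious configuration parameters that significantly affect the final result. We also showed that one of these parameters (the order of the models) is absolutely critical, but cannot be uniquely estimated for a given system. Instead of the GCT, we proposed a simpler statistical model of alert generation, describing alert streams and timestamps as stochastic variables, and showed that statistical tests can be used to create a reasonable criterion for distinguishing correlated and uncorrelated streams. We showed that our criteria work well on the simplified correlation task we used for testing, without requiring complex configuration parameters.
This is an exploratory work. Further investigation of this approach on real, longer sequences of data, as well as further refinement of the tests and criteria we proposed, is certainly needed. Another possible extension of this work is to investigate how these criteria can be used to correlate anomaly-based and misuse-based alerts together, in order to bridge the gap between the existing paradigms of intrusion detection.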
Homework 01: original text
Abstract. In this paper we analyze the use of different types of statistical tests for the correlation of anomaly detection alerts. We show that the Granger Causality Test, one of the few proposals that can be extended to the anomaly detection domain, strongly depends on good choices of a parameter which proves to be both sensitive and difficult to estimate. We propose a different approach based on a set of simpler statistical tests, and we prove that our criteria work well on a simplified correlation task, without requiring complex configuration parameters.
1 Introduction
One of the most challenging tasks in intrusion detection is to create a unified vision of the events, fusing together alerts from heterogeneous monitoring devices. This alert fusion process can be defined as the correlation of aggregated streams of alerts. Aggregation is the grouping of alerts that both are close in time and have similar features; it fuses together different “views” of the same event. Alert correlation has to do with the recognition of logically linked alerts. “Correlation” does not necessarily imply “statistical correlation”, but statistical correlation based methods are sometimes used to reveal these relationships.
Alert fusion is more complex when taking into account anomaly detection systems, because no information on the type or classification of the observed attack is available to the fusion algorithms. Most of the algorithms proposed in the current literature on correlation make use of such information, and are therefore inapplicable to purely anomaly based intrusion detection systems.
In this work, we explore the use of statistical causality tests, which have been proposed for the correlation of IDS alerts, and which could be applied to anomaly based IDS as well. We focus on the use of Granger Causality Test (GCT), and show that its performance strongly depends on a good choice of a parameter which proves to be sensitive and difficult to estimate. We redefine the causality problem in terms of a simpler statistical test, and experimentally validate it.
2 Problem statement and state of the art
The desired output of an alert fusion process is a compact, high-level view of what is happening on a (usually large and complex) network. In this work we use a slightly modified version of the terminology proposed in [1]. Alert streams are collected from different IDS sources, normalized and aggregated; alert correlation is the final step of the process. In [1] the term “fusion” is used for the phase we name “aggregation”, whereas we use the former to denote the whole process. Fig. 1 summarizes the terminology.
Fig. 1. A diagram illustrating alert fusion terminology as used in this work.
In [2] we propose a fuzzy time-based aggregation technique, showing that it yields good performance in terms of false positive reduction. Here, we focus on the more challenging correlation phase. Effective and generic correlation algorithms are difficult to design, especially if the objective is the reconstruction of complex attack scenarios.
A technique for alert correlation based on state-transition graphs is shown in [3]. The use of finite state automata enables complex scenario descriptions, but it requires known scenario signatures. It is also unsuitable for pure anomaly detectors, which cannot differentiate among different types of events. Similar approaches, with similar strengths and shortcomings but different formalisms, have been tried with the specification of pre- and post-conditions of the attacks [4], sometimes along with time-distance criteria [5]. It is possible to mine scenario rules directly from data, either in a supervised [6] or unsupervised [7] fashion. Both approaches use alert classifications as part of their rules.
None of these techniques would work for anomaly detection systems, as they rely on alert names or classification to work. The best examples of algorithms that do not require such features are based on time-series analysis and modeling. For instance, [8] is based on the construction of time series by counting the number of alerts occurring in each sampling interval; trend and periodicity removal algorithms then filter out the predictable components, leaving only real alerts as the output. This is, however, more a false-positive and noise-suppression approach than a correlation approach. The correlation approach investigated in [9] and based on the GCT also does not require prior knowledge, and it drew our attention as one of the few viable proposals for anomaly detection alert correlation in earlier literature. We will describe and analyze this approach in detail in Section 4.
3 Problems in the evaluation of alert correlation systems
Evaluation techniques for alert fusion systems are still limited to a few proposals, and they are practically and theoretically challenging to develop [2]. Additionally, the common lack of reliable sources of data for benchmarking also weighs heavily on the evaluation of correlation systems. Ideally, we need both host and network datasets, fully labeled, with complex attack scenarios described in detail. These data should be freely available to the scientific community. These requirements rule out real-world dumps.
The only datasets of this kind effectively available are the ones by DARPA (IDEVAL datasets). Of course, since this data set was created to evaluate IDS sensors and not to assess correlation tools, it does not include sensor alerts. The alerts have to be generated by running various sensors on the data. The 1999 dataset [10], which we used for this work, has many known shortcomings. Firstly, it is evidently and hopelessly outdated. Moreover, a number of flaws have been detected and criticized in the network traces [11,12]. More recently, we analyzed the host-based system call traces, and showed [13, 14] that they are ridden with problems as well.
For this work these basic flaws are not extremely dangerous, since the propagation of attack effects (from network to hosts) is not affected by any of the known flaws of IDEVAL, and in fact we observed it to be quite realistically present. What could be a problem is the fact that intrusion scenarios are too simple and extremely straightforward. Additionally, many attacks are not detectable in both network and host data (thus making the whole point of correlation disappear). Nowadays, networks and attackers are more sophisticated and attack scenarios are much more complex than in 1999, operating at various layers of the network and application stack.
The work we analyze closely in the following [9] uses both the DEFCON 9 CTF dumps and the DARPA Cyber Panel Correlation Technology Validation (CTV) [15] datasets for the evaluation of an alert correlation prototype. The former dataset is not labeled and does not contain any background traffic, so in fact (as the authors themselves recognize) it cannot be used for a proper evaluation, but just for qualitative analysis. On the contrary, the DARPA CTV effort, carried out in 2002, created a complex testbed network, along with background traffic and a set of attack scenarios. The alerts produced by various sensors during these attacks were collected and given as an input to the evaluated correlation tools. Unfortunately, this dataset is not available for further experimentation.
For all the previous reasons, in our testing we will use the IDEVAL dataset with the following simplification: we will just try to correlate the stream of alerts coming from a single host-based IDS (HIDS) sensor with the corresponding alerts from a single network-based IDS (NIDS), which is monitoring the whole network. To this end, we ran two anomaly-based IDS prototypes (both described in [13,14,16]) on the whole IDEVAL testing dataset. We ran the NIDS prototype on tcpdump data and collected 128 alerts for attacks against the host pascal.eyrie.af.mil [17]. The NIDS also generated 1009 alerts related to other hosts. Using the HIDS prototype we generated 1070 alerts from the dumps of the host pascal.eyrie.af.mil. With respect to these alerts, the NIDS was capable of detecting almost 66% of the attacks with less than 0.03% of false positives; the HIDS performs even better with a detection rate of 98% and 1.7% of false positives.
Fig. 2. p-value (-a) and GCI (-b) vs. p with w = w1 = 60s (1-) and w = w2 = 1800s (2-). “NetP(k) $\rightsquigarrow$ HostP(k)” (dashed line), “HostP(k) $\rightsquigarrow$ NetP(k)” (solid line).
In the following, we use this shorthand notation: Net is the substream of all the alerts generated by the NIDS. HostP is the substream of all the alerts generated by the HIDS installed on pascal.eyrie.af.mil, while NetP regards all the alerts (with pascal as a target) generated by the NIDS; finally, NetO = Net\NetP indicates all the alerts (with all but pascal as a target) generated by the NIDS.
4 The Granger Causality Test
In [9] Qin and Lee propose an interesting algorithm for alert correlation which seems suitable also for anomaly-based alerts. Alerts with the same feature set are grouped into collections of time-sorted items belonging to the same “type” (following the concept of type of [8]). Subsequently, frequency time series are built, using a fixed size sliding-window: the result is a time-series for each collection of alerts. The prototype then exploits the GCT [18], a statistical hypothesis test capable of discovering causality relationships between two time series when they are originated by linear, stationary processes. The GCT gives a stochastic measure, called Granger Causality Index (GCI), of how much of the history of one time series (the supposed cause) is needed to “explain” the evolution of the other one (the supposed consequence, or target). The GCT is based on the estimation of two models: the first is an Auto Regressive model (AR), in which future samples of the target are modeled as influenced only by past samples of the target itself; the second is an Auto Regressive Moving Average eXogenous (ARMAX) model, which also takes into account the supposed cause time series as an exogenous component. A statistical F-test built upon the model estimation errors selects the best-fitting model: if the ARMAX fits better, the cause effectively influences the target.
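The first step of this pipeline, turning a collection of alert timestamps into a frequency time series, can be sketched as follows. This is a minimal illustration, not the procedure of [9]: it assumes timestamps in seconds and non-overlapping windows of length w, and the function name `alert_count_series` is hypothetical.

```python
def alert_count_series(timestamps, w, start=None, end=None):
    """Count the alerts falling into consecutive windows of w seconds."""
    if not timestamps:
        return []
    start = min(timestamps) if start is None else start
    end = max(timestamps) if end is None else end
    n_windows = int((end - start) // w) + 1
    counts = [0] * n_windows
    for t in timestamps:
        if start <= t <= end:
            counts[int((t - start) // w)] += 1
    return counts


# Seven alerts sampled with w = 60 s.
series = alert_count_series([0, 10, 59, 60, 125, 130, 300], w=60)
print(series)  # [3, 1, 2, 0, 0, 1]
```

Two such series, one per alert collection, would be the inputs to the AR and ARMAX fits; changing w changes the series themselves, which is one reason the test outcome is sensitive to it.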
In [9] the unsupervised identification of “causally related” events is performed by repeating the above procedure for each pair of time series. The advantage of the approach is that it does not require prior knowledge (even if it may use attack probability values, if available, for an optional prioritization phase). However, in a previous work [2] we showed that the GCT fails to recognize “meaningful” relationships between IDEVAL attacks.
We tested the sensitivity of the GCT to the choice of two parameters: the sampling time, w, and the time lag p (that is, the order of the AR). In our simple experiment, the expected result is that NetP $\rightsquigarrow$ HostP, and that HostP $\not\rightsquigarrow$ NetP (where $\rightsquigarrow$ indicates “causality” while $\not\rightsquigarrow$ is its negation). In [9] the sampling time was arbitrarily set to w = 60s, while the choice of p is not documented. However, our experiments show that the choice of these parameters can strongly influence the results of the test. In Fig. 2 (1-a/b) we plotted the p-value and the GCI of the test for different values of p (w = 60s). In particular, the dashed line corresponds to the test NetP(k) $\rightsquigarrow$ HostP(k), and the solid line to the test HostP(k) $\rightsquigarrow$ NetP(k). We recall that if the p-value is lower than the significance level, the null hypothesis is rejected. Notice how different choices of p can lead to inconclusive or even opposite results. For instance, with $\alpha = 0.20$ and with $2 \le p \le 3$, the result is that NetP(k) $\rightsquigarrow$ HostP(k) and that HostP(k) $\not\rightsquigarrow$ NetP(k). As we detailed in [2] (Fig. 2 (2-a/b)), other values of p lead to the awkward result that both HostP(k) $\rightsquigarrow$ NetP(k) and NetP(k) $\rightsquigarrow$ HostP(k).
Fig. 3. The optimal time lag $\hat{p}$ given by the AIC criterion strongly varies over time.
A possible explanation is that the GCT is significant only if both the linear regression models are optimal, in order to calculate the correct residuals. If we use the Akaike Information Criterion (AIC) [19] to estimate the optimal time lag $\hat{p}$ over different windows of data, we find that $\hat{p}$ varies wildly over time, as shown in Fig. 3. The fact that there is no stable optimal choice of p, combined with the fact that the test result significantly depends on it, makes us doubt that the Granger causality test is a viable option for general alert correlation. The choice of w seems equally important and even more difficult to perform, except by guessing.
Of course, our testing is not conclusive: the IDEVAL alert sets may simply not be adequate for showing causal relationships. Another, albeit less likely, explanation is that the Granger causality test may not be suitable for anomaly detection alerts: in fact, in [9] it has been tested on misuse detection alerts. However, there are also theoretical reasons to doubt that the application of the Granger test can lead to stable, good results. First, the test is asymptotic w.r.t. p, meaning that the reliability of the results decreases as p increases because of the loss of degrees of freedom. Second, it is based on the strong assumption of linearity in the auto-regressive model fitting step, which strongly depends on the observed phenomenon. In the same way, the stationarity assumption of the model does not always hold.
5 Modeling alerts as stochastic processes
Instead of interpreting alert streams as time series (as proposed by the GCT-based approach), we propose to change the point of view by using a stochastic model in which alerts are modeled as (random) events in time. This proposal can be seen as a formalized extension of the approach introduced in [1], which correlates alerts if they are fired by different IDSs within a “negligible” time frame, where “negligible” is defined with a crisp, fixed threshold.
For simplicity, once again we describe our technique in the simple case of a single HIDS and a single NIDS which monitors the whole network. The concepts, however, can be easily generalized to take into account more than two alert streams, by evaluating them pair by pair. For each alert, we have three essential pieces of information: a timestamp, a “target” host (fixed, in the case of the HIDS, to the host itself), and the generating sensor (in our case, a binary value).
We reuse the scenario and data we already presented in Section 4 above. With a self-explaining notation, we also define the following random variables: $T_{NetP}$ are the arrival times of network alerts in NetP ($T_{NetO}$ and $T_{HostP}$ are similarly defined); $\varepsilon_{NetP}$ ($\varepsilon_{NetO}$) are the delays (caused by transmission, processing and different granularity in detection) between a specific network-based alert regarding pascal (not pascal) and the corresponding host-based one. The actual values of each $T_{(\cdot)}$ are nothing but the set of timestamps extracted from the corresponding alert stream. We reasonably assume that $\varepsilon_{NetP}$ and $T_{NetP}$ are stochastically independent (the same is assumed for $\varepsilon_{NetO}$ and $T_{NetO}$).
In an ideal correlation framework with two equally perfect IDS with a 100% DR and 0% FPR, if two alert streams are correlated (i.e., they represent independent detections of the same attack occurrences by different IDSes [1]), they also are “close” in time. NetP and HostP should evidently be an example of such a couple of streams. Obviously, in the real world, some alerts will be missing (because of false negatives, or simply because some of the attacks are detectable only by a specific type of detector), and the distances between related alerts will therefore have some higher variability. In order to account for this, we can “cut off” alerts that are too far away from a corresponding alert in the other time series, presuming them to be singletons. In our case, knowing that single attacks did not last more than 400s in the original dataset, we tentatively set a cutoff threshold at this point.
Under the given working assumptions and the proposed stochastic model, we can formalize the correlation problem as a set of two statistical hypothesis tests:
$$H_0: T_{HostP} \neq T_{NetP} + \varepsilon_{NetP} \quad \text{vs.} \quad H_1: T_{HostP} = T_{NetP} + \varepsilon_{NetP} \qquad (1)$$
$$H_0: T_{HostP} \neq T_{NetO} + \varepsilon_{NetO} \quad \text{vs.} \quad H_1: T_{HostP} = T_{NetO} + \varepsilon_{NetO} \qquad (2)$$
Let $\{t_{i,k}\}$ be the observed timestamps of $T_i$, $\forall i \in \{HostP, NetP, NetO\}$. The meaning of the first test is straightforward: within a random amount of time, $\varepsilon_{NetP}$, the occurrence of a host alert, $t_{HostP,k}$, is preceded by a network alert, $t_{NetP,k}$. If this does not happen for a statistically significant number of events, the test result is that alert stream $T_{NetP}$ is uncorrelated with $T_{HostP}$; in this case, we have enough statistical evidence for rejecting $H_1$ and accepting the null hypothesis. Symmetrically, rejecting the null hypothesis of the second test means that the NetO alert stream (regarding all hosts but pascal) is correlated with the alert stream regarding pascal.
Fig. 4. Histograms vs. estimated density (red dashes) and Q-Q plots, for both $\hat{f}_O$ and $\hat{f}_P$.
Note that the above two tests are strongly related to each other: in an ideal correlation framework, it cannot happen that both “NetP is correlated to HostP” and “NetO is correlated to HostP”. This would imply that the network activity regarding all hosts but pascal (which raises NetO) is related to the host activity of pascal (which raises HostP) to the same order of magnitude as NetP, which is an intuitively contradictory conclusion. Therefore, the second test acts as a sort of “robustness” criterion.
From our alerts, we can compute a sample of $\varepsilon_{NetP}$ by simply picking, for each value in NetP, the value in HostP which is closest but greater (applying the cutoff threshold defined above). We can do the same for $\varepsilon_{NetO}$, using the alerts in NetO and HostP.
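The pairing rule just described can be sketched as follows. The function name `delay_sample` and the example timestamps are hypothetical; the sketch assumes timestamps in seconds and uses the 400 s cutoff mentioned earlier as a default.

```python
from bisect import bisect_right


def delay_sample(net_times, host_times, cutoff=400.0):
    """For each network alert, take the delay to the closest later host alert.

    Network alerts with no host alert within `cutoff` seconds are treated as
    singletons and contribute no delay.
    """
    host_sorted = sorted(host_times)
    delays = []
    for t in net_times:
        i = bisect_right(host_sorted, t)  # first host alert strictly after t
        if i < len(host_sorted):
            d = host_sorted[i] - t
            if d <= cutoff:
                delays.append(d)
    return delays


net_p = [100, 500, 2000]
host_p = [130, 450, 520, 3000]
print(delay_sample(net_p, host_p))  # [30, 20]
```

The third network alert is dropped because the next host alert arrives 1000 s later, beyond the cutoff.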
The next step involves the choice of the distributions of the random variables we defined above. Typical distributions used for modeling random occurrences of timed events fall into the family of exponential Probability Density Functions (PDFs) [20]. In particular, we decided to fit them with Gamma PDFs, because our experiments show that such a distribution is a good choice for both $\varepsilon_{NetP}$ and $\varepsilon_{NetO}$.
The estimation of the PDFs of $\varepsilon_{NetP}$, $f_P := f_{\varepsilon_{NetP}}$, and of $\varepsilon_{NetO}$, $f_O := f_{\varepsilon_{NetO}}$, is performed using the well-known Maximum Likelihood (ML) technique [21] as implemented in the GNU R software package; the results are summarized in Fig. 4. $f_P$ and $f_O$ are approximated by Gamma[3.0606, 0.0178] and Gamma[1.6301, 0.0105], respectively (standard errors on parameters are 0.7080, 0.0045 for $f_P$ and 0.1288, 0.009 for $f_O$). From now on, the estimator of a given density $f$ will be indicated as $\hat{f}$.
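For readers without R at hand, the ML fit of a Gamma density can be sketched in plain Python. This is not the R routine used above: it is a standard Newton iteration on the shape parameter, with hand-written series approximations of the digamma and trigamma functions. It assumes the two Gamma parameters are a shape and a rate, which is an interpretation of the notation Gamma[a, b] and not something stated in the text; the function names are hypothetical and the data are synthetic.

```python
import math
import random


def digamma(x):
    """Digamma function via recurrence plus asymptotic expansion (x > 0)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return (result + math.log(x) - 0.5 / x
            - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))


def trigamma(x):
    """Trigamma function via recurrence plus asymptotic expansion (x > 0)."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return (result + inv + 0.5 * inv2
            + inv * inv2 * (1.0 / 6 - inv2 * (1.0 / 30 - inv2 / 42)))


def fit_gamma_ml(sample, iterations=20):
    """Return (shape, rate) maximizing the Gamma likelihood of the sample.

    Requires strictly positive values that are not all identical.
    """
    n = len(sample)
    mean = sum(sample) / n
    s = math.log(mean) - sum(math.log(x) for x in sample) / n
    # Closed-form starting point, then Newton steps on log(k) - digamma(k) = s.
    shape = (3.0 - s + math.sqrt((s - 3.0) ** 2 + 24.0 * s)) / (12.0 * s)
    for _ in range(iterations):
        g = math.log(shape) - digamma(shape) - s
        shape -= g / (1.0 / shape - trigamma(shape))
    return shape, shape / mean


# Synthetic delays: shape 3, scale 50 s (i.e. rate 0.02 per second).
random.seed(1)
delays = [random.gammavariate(3.0, 50.0) for _ in range(20000)]
shape_hat, rate_hat = fit_gamma_ml(delays)
print(shape_hat, rate_hat)
```

With samples of about 40 delays, as in the experiments, the estimates would be much noisier than in this synthetic example, which is consistent with the sizeable standard errors reported above.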
Fig. 4 shows histograms vs. estimated density (red, dashed line) and quantile-quantile plots (Q-Q plots), for both $\hat{f}_O$ and $\hat{f}_P$. We recall that Q-Q plots are an intuitive graphical “tool” for comparing data distributions by plotting the quantiles of the first distribution against the quantiles of the other one.
Considering that the sample sizes of $\varepsilon_{(\cdot)}$ are around 40, the Q-Q plots empirically confirm our intuition: in fact, $\hat{f}_O$ and $\hat{f}_P$ are both able to explain real data well, within inevitable but negligible estimation errors. Even if $\hat{f}_P$ and $\hat{f}_O$ are both Gamma-shaped, it must be noticed that they significantly differ in their parametrization; this is a very important result since it allows us to set up a proper criterion to decide whether or not $\varepsilon_{NetP}$ and $\varepsilon_{NetO}$ are generated by the same phenomenon.
Fig. 5. Histograms vs. estimated density (red dashes) for both $\hat{f}_O$ and $\hat{f}_P$ (IDEVAL 1998).
Given the above estimators, a more precise and robust hypothesis test can now be designed. Tests 1 and 2 can be mapped onto two-sided Kolmogorov-Smirnov (KS) tests [20], achieving the same result in terms of decisions:
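$$H_0: \varepsilon_{NetP} \sim f_P \quad \text{vs.} \quad H_1: \varepsilon_{NetP} \nsim f_P \qquad (3)$$
$$H_0: \varepsilon_{NetO} \sim f_O \quad \text{vs.} \quad H_1: \varepsilon_{NetO} \nsim f_O \qquad (4)$$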
where the symbol $\sim$ means “has the same distribution as”. Since we do not know the real PDFs, estimators are used in their stead. We recall that the KS test is a non-parametric test to compare a sample (or a PDF) against a PDF (or a sample) to check how much they differ from each other (or how well they fit). Such tests can be performed, for instance, with ks.test() (a GNU R native procedure): the resulting p-values on IDEVAL 1999 are 0.83 and 0.03, respectively. Noticeably, there is significant statistical evidence to accept the null hypothesis of Test 3. It seems that the ML estimation is capable of correctly fitting a Gamma PDF for $f_P$ (given $\varepsilon_{NetP}$ samples), which double-checks our intuition about the distribution. The same does not hold for $f_O$: in fact, it cannot be correctly estimated, with a Gamma PDF, from $\varepsilon_{NetO}$. The low p-value for Test 4 confirms that the distribution of $\varepsilon_{NetO}$ delays is completely different from that of $\varepsilon_{NetP}$. Therefore, our criterion not only recognizes noisy delay-based relationships among alert streams if they exist; it is also capable of detecting if such a correlation does not hold.
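A one-sample KS test of delays against a Gamma distribution can be sketched without external libraries. This is an illustration, not a reimplementation of R's ks.test(): the Gamma CDF is computed with the power series of the regularized incomplete gamma function, the p-value uses the asymptotic Kolmogorov series with a common small-sample correction, and the parameters are again read as shape and rate (an assumption). The p-value also ignores the fact that the parameters may have been estimated from the same sample. All function names and data are hypothetical.

```python
import math
import random


def gamma_cdf(x, shape, rate):
    """Regularized lower incomplete gamma P(shape, rate * x), series form.

    Intended for moderate rate * x (well below about 700).
    """
    if x <= 0.0:
        return 0.0
    z = rate * x
    term = 1.0 / shape
    total = term
    n = 1
    while term > total * 1e-15 and n < 10000:
        term *= z / (shape + n)
        total += term
        n += 1
    log_front = -z + shape * math.log(z) - math.lgamma(shape)
    return min(1.0, total * math.exp(log_front))


def ks_statistic(sample, cdf):
    """Largest distance between the empirical CDF and the given CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_pvalue(d, n):
    """Asymptotic two-sided p-value for a KS statistic d on n points."""
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.3:
        return 1.0  # the series converges slowly here; the value is ~1
    total = 0.0
    for j in range(1, 101):
        total += (-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
    return max(0.0, min(1.0, 2.0 * total))


# Synthetic delays drawn from Gamma(shape=3, rate=0.02).
random.seed(7)
delays = [random.gammavariate(3.0, 50.0) for _ in range(2000)]

d_match = ks_statistic(delays, lambda x: gamma_cdf(x, 3.0, 0.02))
d_other = ks_statistic(delays, lambda x: gamma_cdf(x, 1.0, 0.02))
print(d_match, ks_pvalue(d_match, len(delays)))
print(d_other, ks_pvalue(d_other, len(delays)))
```

A small statistic with a large p-value plays the role of Test 3 (the delays are compatible with the fitted density), while a large statistic with a small p-value corresponds to the outcome of Test 4.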
We also tested our technique on alerts generated by our NIDS/HIDS running on IDEVAL 1998 (limiting our analysis to the first four days of the first week), in order to cross-validate the above results. We prepared and processed the data with the same procedures we described above for the 1999 dataset. Starting from almost the same proportion of host/net alerts against either pascal or other hosts, the ML estimation computed the two Gamma densities shown in Fig. 5: $f_P$ and $f_O$ are approximated by Gamma(3.5127, 0.1478) and Gamma(1.3747, 0.0618), respectively (standard errors on estimated parameters are 1.3173, 0.0596 for $f_P$ and 0.1265, 0.0068 for $f_O$). These parameters are very similar to the ones we estimated for the IDEVAL 1999 dataset. Furthermore, with p-values of 0.51 and 0.09, the two KS tests confirm the same statistical discrepancies we observed on the 1999 dataset. The above numerical results show that, by interpreting alert streams as random processes, there are several (stochastic) dissimilarities between net-to-host delays belonging to the same net-host attack session, and net-to-host delays belonging to different sessions. Exploiting these dissimilarities, we may find the correlation among streams in an unsupervised manner, without the need to predefine any parameter.
6 Conclusions
In this paper we analyzed the use of different types of statistical tests for the correlation of anomaly detection alerts, a problem for which few or no solutions are available today. One of the few correlation proposals that can be applied to anomaly detection is the use of a Granger Causality Test (GCT). After discussing a possible testing methodology, we observed that the IDEVAL datasets traditionally used for evaluation have various shortcomings, which we partially addressed by using the data for a simpler correlation scenario, investigating only the link between a stream of host-based alerts for a specific host and the corresponding stream of alerts from a network-based detector.
We examined the usage of a GCT as proposed in earlier works, showing that it relies on the choice of non-obvious configuration parameters which significantly affect the final result. We also showed that one of these parameters (the order of the models) is absolutely critical, but cannot be uniquely estimated for a given system. Instead of the GCT, we proposed a simpler statistical model of alert generation, describing alert streams and timestamps as stochastic variables, and showed that statistical tests can be used to create a reasonable criterion for distinguishing correlated and uncorrelated streams. We showed that our criteria work well on the simplified correlation task we used for testing, without requiring complex configuration parameters.
This is an exploratory work, and further investigations of this approach on real, longer sequences of data, as well as further refinements of the tests and the criteria we proposed, are surely needed. Another possible extension of this work is the investigation of how these criteria can be used to correlate anomaly and misuse-based alerts together, in order to bridge the gap between the existing paradigms of intrusion detection.
Acknowledgments
We need to thank prof. Ilenia Epifani and prof. Giuseppe Serazzi for their helpful comments, as well as the anonymous reviewers.