National Security Agency

From Wikipedia, the free encyclopedia

National Security Agency
Seal of the National Security Agency
Flag of the National Security Agency
NSA Headquarters, Fort Meade, Maryland

Agency overview
Formed: November 4, 1952
Preceding agency:
  • Armed Forces Security Agency
Headquarters: Fort Meade, Maryland, U.S.
Coordinates: 39°6′32″N 76°46′17″W
Motto: "Defending Our Nation. Securing the Future."
Employees: Classified (est. 30,000–40,000)
Annual budget: Classified (estimated $10.8 billion, 2013)
Agency executives
Parent agency: Department of Defense
Website: NSA.gov

The National Security Agency (NSA) is a national-level intelligence agency of the United States Department of Defense, under the authority of the Director of National Intelligence. The NSA is responsible for global monitoring, collection, and processing of information and data for foreign and domestic intelligence and counterintelligence purposes, specializing in a discipline known as signals intelligence (SIGINT). The NSA is also tasked with the protection of U.S. communications networks and information systems. The NSA relies on a variety of measures to accomplish its mission, the majority of which are clandestine.

Originating as a unit to decipher coded communications in World War II, it was officially formed as the NSA by President Harry S. Truman in 1952. Since then, it has become the largest of the U.S. intelligence organizations in terms of personnel and budget. The NSA currently conducts worldwide mass data collection and has been known to physically bug electronic systems as one method to this end. The NSA is also alleged to have been behind such attack software as Stuxnet, which severely damaged Iran's nuclear program. The NSA, alongside the Central Intelligence Agency (CIA), maintains a physical presence in many countries across the globe; the CIA/NSA joint Special Collection Service (a highly classified intelligence team) inserts eavesdropping devices in high value targets (such as presidential palaces or embassies). SCS collection tactics allegedly encompass "close surveillance, burglary, wiretapping, [and] breaking and entering".

Unlike the CIA and the Defense Intelligence Agency (DIA), both of which specialize primarily in foreign human espionage, the NSA does not publicly conduct human-source intelligence gathering. The NSA is entrusted with providing assistance to, and the coordination of, SIGINT elements for other government organizations – which are prevented by law from engaging in such activities on their own. As part of these responsibilities, the agency has a co-located organization called the Central Security Service (CSS), which facilitates cooperation between the NSA and other U.S. defense cryptanalysis components. To further ensure streamlined communication between the signals intelligence community divisions, the NSA Director simultaneously serves as the Commander of the United States Cyber Command and as Chief of the Central Security Service.

The NSA's actions have been a matter of political controversy on several occasions, including its spying on anti–Vietnam War leaders and the agency's participation in economic espionage. In 2013, the NSA had many of its secret surveillance programs revealed to the public by Edward Snowden, a former NSA contractor. According to the leaked documents, the NSA intercepts and stores the communications of over a billion people worldwide, including United States citizens. The documents also revealed the NSA tracks hundreds of millions of people's movements using cellphones' metadata. Internationally, research has pointed to the NSA's ability to surveil the domestic Internet traffic of foreign countries through "boomerang routing".

History

Formation

The origins of the National Security Agency can be traced back to April 28, 1917, three weeks after the U.S. Congress declared war on Germany in World War I. A code and cipher decryption unit was established as the Cable and Telegraph Section, which was also known as the Cipher Bureau. It was headquartered in Washington, D.C. and was part of the war effort under the executive branch without direct Congressional authorization. During the course of the war, it was relocated several times within the army's organizational chart. On July 5, 1917, Herbert O. Yardley was assigned to head the unit. At that point, the unit consisted of Yardley and two civilian clerks. It absorbed the navy's cryptanalysis functions in July 1918. World War I ended on November 11, 1918, and the army cryptographic section of Military Intelligence (MI-8) moved to New York City on May 20, 1919, where it continued intelligence activities as the Code Compilation Company under the direction of Yardley.

The Black Chamber

Black Chamber cryptanalytic work sheet for solving Japanese diplomatic cipher, 1919

After the disbandment of the U.S. Army cryptographic section of military intelligence, known as MI-8, in 1919, the U.S. government created the Cipher Bureau, also known as the Black Chamber. The Black Chamber was the United States' first peacetime cryptanalytic organization. Jointly funded by the Army and the State Department, the Cipher Bureau was disguised as a New York City commercial code company; it actually produced and sold such codes for business use. Its true mission, however, was to break the communications (chiefly diplomatic) of other nations. Its most notable known success was at the Washington Naval Conference, during which it aided American negotiators considerably by providing them with the decrypted traffic of many of the conference delegations, most notably the Japanese. The Black Chamber successfully persuaded Western Union, the largest U.S. telegram company at the time, as well as several other communications companies, to illegally give the Black Chamber access to cable traffic of foreign embassies and consulates. Eventually, these companies publicly discontinued their collaboration.

Despite the Chamber's initial successes, it was shut down in 1929 by U.S. Secretary of State Henry L. Stimson, who defended his decision by stating, "Gentlemen do not read each other's mail".

World War II and its aftermath

During World War II, the Signal Intelligence Service (SIS) was created to intercept and decipher the communications of the Axis powers. When the war ended, the SIS was reorganized as the Army Security Agency (ASA), and it was placed under the leadership of the Director of Military Intelligence.

On May 20, 1949, all cryptologic activities were centralized under a national organization called the Armed Forces Security Agency (AFSA). This organization was originally established within the U.S. Department of Defense under the command of the Joint Chiefs of Staff. The AFSA was tasked to direct Department of Defense communications and electronic intelligence activities, except those of U.S. military intelligence units. However, the AFSA was unable to centralize communications intelligence and failed to coordinate with civilian agencies that shared its interests such as the Department of State, Central Intelligence Agency (CIA) and the Federal Bureau of Investigation (FBI). In December 1951, President Harry S. Truman ordered a panel to investigate how AFSA had failed to achieve its goals. The results of the investigation led to improvements and its redesignation as the National Security Agency.

The National Security Council issued a memorandum of October 24, 1952, that revised National Security Council Intelligence Directive (NSCID) 9. On the same day, Truman issued a second memorandum that called for the establishment of the NSA. The actual establishment of the NSA was done by a November 4 memo by Robert A. Lovett, the Secretary of Defense, changing the name of the AFSA to the NSA, and making the new agency responsible for all communications intelligence. Since President Truman's memo was a classified document, the existence of the NSA was not known to the public at that time. Due to its ultra-secrecy the U.S. intelligence community referred to the NSA as "No Such Agency".

Vietnam War

In the 1960s, the NSA played a key role in expanding U.S. commitment to the Vietnam War by providing evidence of a North Vietnamese attack on the American destroyer USS Maddox during the Gulf of Tonkin incident.

A secret operation, code-named "MINARET", was set up by the NSA to monitor the phone communications of Senators Frank Church and Howard Baker, as well as key leaders of the civil rights movement, including Martin Luther King Jr., and prominent U.S. journalists and athletes who criticized the Vietnam War. However, the project turned out to be controversial, and an internal review by the NSA concluded that its Minaret program was "disreputable if not outright illegal".

The NSA mounted a major effort to secure tactical communications among U.S. forces during the war, with mixed success. The NESTOR family of compatible secure voice systems it developed was widely deployed during the Vietnam War, with about 30,000 NESTOR sets produced. However, a variety of technical and operational problems limited their use, allowing the North Vietnamese to intercept and exploit U.S. communications.

Church Committee hearings

In the aftermath of the Watergate scandal, a congressional hearing in 1975 led by Senator Frank Church revealed that the NSA, in collaboration with Britain's signals intelligence agency, Government Communications Headquarters (GCHQ), had routinely intercepted the international communications of prominent anti-Vietnam War leaders such as Jane Fonda and Dr. Benjamin Spock. The Agency tracked these individuals in a secret filing system that was destroyed in 1974. Following the resignation of President Richard Nixon, there were several investigations of suspected misuse of FBI, CIA and NSA facilities. Senator Frank Church uncovered previously unknown activity, such as a CIA plot (ordered by the administration of President John F. Kennedy) to assassinate Fidel Castro. The investigation also uncovered NSA's wiretaps on targeted U.S. citizens.

After the Church Committee hearings, the Foreign Intelligence Surveillance Act of 1978 was passed into law. This was designed to limit the practice of mass surveillance in the United States.

From 1980s to 1990s

In 1986, the NSA intercepted the communications of the Libyan government during the immediate aftermath of the Berlin discotheque bombing. The White House asserted that the NSA interception had provided "irrefutable" evidence that Libya was behind the bombing, which U.S. President Ronald Reagan cited as a justification for the 1986 United States bombing of Libya.

In 1999, a multi-year investigation by the European Parliament highlighted the NSA's role in economic espionage in a report entitled 'Development of Surveillance Technology and Risk of Abuse of Economic Information'. That year, the NSA founded the NSA Hall of Honor, a memorial at the National Cryptologic Museum in Fort Meade, Maryland. The memorial is a "tribute to the pioneers and heroes who have made significant and long-lasting contributions to American cryptology". NSA employees must be retired for more than fifteen years to qualify for the memorial.

NSA's infrastructure deteriorated in the 1990s as defense budget cuts resulted in maintenance deferrals. On January 24, 2000, NSA headquarters suffered a total network outage for three days caused by an overloaded network. Incoming traffic was successfully stored on agency servers, but it could not be directed and processed. The agency carried out emergency repairs at a cost of $3 million to get the system running again. (Some incoming traffic was also directed instead to Britain's GCHQ for the time being.) Director Michael Hayden called the outage a "wake-up call" for the need to invest in the agency's infrastructure.

In the 1990s the defensive arm of the NSA—the Information Assurance Directorate (IAD)—started working more openly; the first public technical talk by an NSA scientist at a major cryptography conference was J. Solinas' presentation on efficient Elliptic Curve Cryptography algorithms at Crypto 1997. The IAD's cooperative approach to academia and industry culminated in its support for a transparent process for replacing the outdated Data Encryption Standard (DES) by an Advanced Encryption Standard (AES). Cybersecurity policy expert Susan Landau attributes the NSA's harmonious collaboration with industry and academia in the selection of the AES in 2000—and the Agency's support for the choice of a strong encryption algorithm designed by Europeans rather than by Americans—to Brian Snow, who was the Technical Director of IAD and represented the NSA as cochairman of the Technical Working Group for the AES competition, and Michael Jacobs, who headed IAD at the time.

After the terrorist attacks of September 11, 2001, the NSA believed that it had public support for a dramatic expansion of its surveillance activities. According to Neal Koblitz and Alfred Menezes, the period when the NSA was a trusted partner with academia and industry in the development of cryptographic standards started to come to an end when, as part of the change in the NSA in the post-September 11 era, Snow was replaced as Technical Director, Jacobs retired, and IAD could no longer effectively oppose proposed actions by the offensive arm of the NSA.

War on Terror

In the aftermath of the September 11 attacks, the NSA created new IT systems to deal with the flood of information from new technologies like the Internet and cellphones. ThinThread contained advanced data mining capabilities. It also had a "privacy mechanism": surveillance data was stored encrypted, and decryption required a warrant. The research done under this program may have contributed to the technology used in later systems. ThinThread was cancelled when Michael Hayden chose Trailblazer, which did not include ThinThread's privacy system.
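
To make the described "privacy mechanism" concrete, here is a minimal sketch of encrypt-on-collection with decryption gated on a separately held key, assuming Python's third-party cryptography package; the names (warrant_key, collect, unseal) and the workflow are illustrative assumptions, not ThinThread's actual design.

```python
# Minimal sketch of "stored encrypted, decryption requires a warrant".
# Assumes the third-party 'cryptography' package; key handling and names
# (warrant_key, collect, unseal) are illustrative, not ThinThread's design.
from cryptography.fernet import Fernet

warrant_key = Fernet.generate_key()   # held by an authorizing body, not by analysts
sealer = Fernet(warrant_key)

def collect(record):
    """Encrypt each record as it is collected; only ciphertext is stored."""
    return sealer.encrypt(record)

def unseal(token, key):
    """Decryption becomes possible only once the warrant key is released."""
    return Fernet(key).decrypt(token)

stored = collect(b"intercepted metadata record")
# ... later, only after legal authorization releases warrant_key:
print(unseal(stored, warrant_key))
```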

Trailblazer Project ramped up in 2002 and was worked on by Science Applications International Corporation (SAIC), Boeing, Computer Sciences Corporation, IBM, and Litton Industries. Some NSA whistleblowers complained internally about major problems surrounding Trailblazer. This led to investigations by Congress and the NSA and DoD Inspectors General. The project was cancelled in early 2004.

Turbulence started in 2005. It was developed in small, inexpensive "test" pieces, rather than one grand plan like Trailblazer. It also included offensive cyber-warfare capabilities, like injecting malware into remote computers. Congress criticized Turbulence in 2007 for having similar bureaucratic problems as Trailblazer. It was to be a realization of information processing at higher speeds in cyberspace.

Global surveillance disclosures

The massive extent of the NSA's spying, both foreign and domestic, was revealed to the public in a series of detailed disclosures of internal NSA documents beginning in June 2013. Most of the disclosures were leaked by former NSA contractor Edward Snowden. On September 4, 2020, the NSA's surveillance program was ruled unlawful by the US Court of Appeals. The court also found that the US intelligence leaders who had publicly defended it were not telling the truth.

Mission

NSA's eavesdropping mission includes radio broadcasting, both from various organizations and individuals, the Internet, telephone calls, and other intercepted forms of communication. Its secure communications mission includes military, diplomatic, and all other sensitive, confidential or secret government communications.

According to a 2010 article in The Washington Post, "[e]very day, collection systems at the National Security Agency intercept and store 1.7 billion e-mails, phone calls and other types of communications. The NSA sorts a fraction of those into 70 separate databases."

Because of its listening task, NSA/CSS has been heavily involved in cryptanalytic research, continuing the work of predecessor agencies which had broken many World War II codes and ciphers.

In 2004, NSA Central Security Service and the National Cyber Security Division of the Department of Homeland Security (DHS) agreed to expand NSA Centers of Academic Excellence in Information Assurance Education Program.

As part of the National Security Presidential Directive 54/Homeland Security Presidential Directive 23 (NSPD 54), signed on January 8, 2008, by President Bush, the NSA became the lead agency to monitor and protect all of the federal government's computer networks from cyber-terrorism.

Operations

Operations by the National Security Agency can be divided into three types:

  • Collection overseas, which falls under the responsibility of the Global Access Operations (GAO) division.
  • Domestic collection, which falls under the responsibility of the Special Source Operations (SSO) division.
  • Hacking operations, which fall under the responsibility of the Tailored Access Operations (TAO) division.

Collection overseas

Echelon

"Echelon" was created in the incubator of the Cold War. Today it is a legacy system, and several NSA stations are closing.

NSA/CSS, in combination with the equivalent agencies in the United Kingdom (Government Communications Headquarters), Canada (Communications Security Establishment), Australia (Australian Signals Directorate), and New Zealand (Government Communications Security Bureau), otherwise known as the UKUSA group, was reported to be in command of the operation of the so-called ECHELON system. Its capabilities were suspected to include the ability to monitor a large proportion of the world's transmitted civilian telephone, fax and data traffic.

During the early 1970s, the first of what became more than eight large satellite communications dishes were installed at Menwith Hill. Investigative journalist Duncan Campbell reported in 1988 on the "ECHELON" surveillance program, an extension of the UKUSA Agreement on global signals intelligence (SIGINT), and detailed how the eavesdropping operations worked. On November 3, 1999, the BBC reported that they had confirmation from the Australian Government of the existence of a powerful "global spying network" code-named Echelon, that could "eavesdrop on every single phone call, fax or e-mail, anywhere on the planet" with Britain and the United States as the chief protagonists. They confirmed that Menwith Hill was "linked directly to the headquarters of the US National Security Agency (NSA) at Fort Meade in Maryland".

NSA's United States Signals Intelligence Directive 18 (USSID 18) strictly prohibited the interception or collection of information about "... U.S. persons, entities, corporations or organizations...." without explicit written legal permission from the United States Attorney General when the subject is located abroad, or the Foreign Intelligence Surveillance Court when within U.S. borders. Alleged Echelon-related activities, including its use for motives other than national security, including political and industrial espionage, received criticism from countries outside the UKUSA alliance.

Protesters against NSA data mining in Berlin wearing Chelsea Manning and Edward Snowden masks

Other SIGINT operations overseas

The NSA was also involved in planning to blackmail people with "SEXINT", intelligence gained about a potential target's sexual activity and preferences. Those targeted had not committed any apparent crime nor were they charged with one.

In order to support its facial recognition program, the NSA is intercepting "millions of images per day".

The Real Time Regional Gateway is a data collection program introduced in 2005 in Iraq by NSA during the Iraq War that consisted of gathering all electronic communication, storing it, then searching and otherwise analyzing it. It was effective in providing information about Iraqi insurgents who had eluded less comprehensive techniques. This "collect it all" strategy introduced by NSA director, Keith B. Alexander, is believed by Glenn Greenwald of The Guardian to be the model for the comprehensive worldwide mass archiving of communications which NSA is engaged in as of 2013.

A dedicated unit of the NSA locates targets for the CIA for extrajudicial assassination in the Middle East. The NSA has also spied extensively on the European Union, the United Nations and numerous governments including allies and trading partners in Europe, South America and Asia.

In June 2015, WikiLeaks published documents showing that NSA spied on French companies.

In July 2015, WikiLeaks published documents showing that the NSA had spied on federal German ministries since the 1990s. Even the cellphones of Germany's Chancellor Angela Merkel and the phones of her predecessors had been intercepted.

Boundless Informant

Edward Snowden revealed in June 2013 that between February 8 and March 8, 2013, the NSA collected about 124.8 billion telephone data items and 97.1 billion computer data items throughout the world, as was displayed in charts from an internal NSA tool codenamed Boundless Informant. Initially, it was reported that some of these data reflected eavesdropping on citizens in countries like Germany, Spain and France, but later on, it became clear that those data were collected by European agencies during military missions abroad and were subsequently shared with NSA.

Bypassing encryption

In 2013, reporters uncovered a secret memo claiming that the NSA had created the Dual EC DRBG encryption standard, which contained built-in vulnerabilities, and in 2006 had pushed for its adoption by the United States National Institute of Standards and Technology (NIST) and the International Organization for Standardization (ISO). This memo appears to give credence to previous speculation by cryptographers at Microsoft Research. Edward Snowden claims that the NSA often bypasses encryption altogether by lifting information before it is encrypted or after it is decrypted.
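
To see why a rigged random-number generator is so valuable, here is a toy discrete-logarithm analogue of a Dual_EC_DRBG-style trapdoor with deliberately tiny, insecure parameters; it is not the actual elliptic-curve construction, only a sketch of how a hidden relation between two published constants lets its holder recover the generator's internal state from a single output.

```python
# Toy analogue of a trapdoor PRNG (NOT the real Dual_EC_DRBG): each step
# outputs g2**s mod p and advances the state to g1**s mod p. If the constants
# were chosen so that g1 = g2**d for a secret d, anyone holding d can turn a
# single output back into the next internal state and predict the stream.
p = 2**127 - 1          # Mersenne prime modulus (toy-sized, insecure on purpose)
g2 = 5
d = 0xDEADBEEF          # secret trapdoor relating the two published constants
g1 = pow(g2, d, p)      # published alongside g2; the relation stays hidden

def step(state):
    output = pow(g2, state, p)       # what the consumer of the PRNG sees
    next_state = pow(g1, state, p)   # kept internal
    return output, next_state

state = 123456789
out, state = step(state)

# With the trapdoor, one output reveals the new internal state:
recovered = pow(out, d, p)           # (g2**s)**d = (g2**d)**s = g1**s
assert recovered == state
next_out, _ = step(recovered)        # attacker now predicts future output
```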

XKeyscore rules (as specified in a file xkeyscorerules100.txt, sourced by German TV stations NDR and WDR, who claim to have excerpts from its source code) reveal that the NSA tracks users of privacy-enhancing software tools, including Tor; an anonymous email service provided by the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) in Cambridge, Massachusetts; and readers of the Linux Journal.
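
The reported behavior, tagging traffic that touches privacy-tool infrastructure, can be sketched generically as fingerprint matching; the rule names and patterns below are assumptions for illustration and are not XKeyscore's actual rule syntax or rule set.

```python
import re

# Illustrative "fingerprint" selectors; NOT XKeyscore's actual rule syntax.
FINGERPRINTS = {
    "anonymizer/tor": re.compile(r"(^|\.)torproject\.org$"),
    "publications/linuxjournal": re.compile(r"(^|\.)linuxjournal\.com$"),
}

def tag_session(http_host):
    """Return the fingerprint labels a web request to http_host would receive."""
    return [name for name, pattern in FINGERPRINTS.items()
            if pattern.search(http_host)]

print(tag_session("www.linuxjournal.com"))    # ['publications/linuxjournal']
print(tag_session("bridges.torproject.org"))  # ['anonymizer/tor']
```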

Software backdoors

Linus Torvalds, the founder of the Linux kernel, joked during a LinuxCon keynote on September 18, 2013, that the NSA, which founded SELinux, had wanted a backdoor in the kernel. However, Linus's father, a Member of the European Parliament (MEP), later revealed that the NSA had indeed approached his son about this.

When my oldest son was asked the same question: "Has he been approached by the NSA about backdoors?" he said "No", but at the same time he nodded. Then he was sort of in the legal free. He had given the right answer, everybody understood that the NSA had approached him.

— Nils Torvalds, LIBE Committee Inquiry on Electronic Mass Surveillance of EU Citizens – 11th Hearing, 11 November 2013

IBM Notes was the first widely adopted software product to use public key cryptography for client–server and server–server authentication and for encryption of data. Until US laws regulating encryption were changed in 2000, IBM and Lotus were prohibited from exporting versions of Notes that supported symmetric encryption keys that were longer than 40 bits. In 1997, Lotus negotiated an agreement with the NSA that allowed export of a version that supported stronger keys with 64 bits, but 24 of the bits were encrypted with a special key and included in the message to provide a "workload reduction factor" for the NSA. This strengthened the protection for users of Notes outside the US against private-sector industrial espionage, but not against spying by the US government.
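
The effect of the "workload reduction factor" follows directly from the key sizes above: escrowing 24 of the 64 key bits leaves a 40-bit search for the NSA, while anyone without the special key still faces the full 64-bit keyspace. A short calculation, for illustration only:

```python
# Arithmetic behind the Notes "workload reduction factor": with 24 of the
# 64 key bits recoverable via the NSA's special key, the agency's residual
# brute-force search is 2**40, versus 2**64 for an attacker without that key.
full_search = 2 ** 64          # keys to try without the escrowed bits
nsa_search = 2 ** (64 - 24)    # keys left after recovering the 24 escrowed bits

print(f"full 64-bit search space:  {full_search:,}")
print(f"NSA residual search space: {nsa_search:,}")
print(f"workload reduction factor: {full_search // nsa_search:,} (= 2**24)")
```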

Boomerang routing

While it is assumed that foreign transmissions terminating in the U.S. (such as a non-U.S. citizen accessing a U.S. website) subject non-U.S. citizens to NSA surveillance, recent research into boomerang routing has raised new concerns about the NSA's ability to surveil the domestic Internet traffic of foreign countries. Boomerang routing occurs when an Internet transmission that originates and terminates in a single country transits another. Research at the University of Toronto has suggested that approximately 25% of Canadian domestic traffic may be subject to NSA surveillance activities as a result of the boomerang routing of Canadian Internet service providers.
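
A minimal sketch of how such a route can be classified, assuming a per-hop country lookup is already available (real studies derive it from traceroute data and IP-geolocation databases); the function and example paths below are illustrative, not the researchers' actual tooling:

```python
# Sketch of boomerang-route classification: a path whose endpoints lie in the
# same country but which transits another country's infrastructure. The
# hop-to-country values are hardcoded assumptions for illustration.
def is_boomerang(hop_countries):
    origin, destination = hop_countries[0], hop_countries[-1]
    return origin == destination and any(
        country != origin for country in hop_countries[1:-1]
    )

toronto_to_ottawa = ["CA", "US", "US", "CA"]    # transits the U.S.: boomerang
toronto_to_montreal = ["CA", "CA", "CA"]        # stays domestic

print(is_boomerang(toronto_to_ottawa))    # True
print(is_boomerang(toronto_to_montreal))  # False
```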

Hardware implanting

Intercepted packages are opened carefully by NSA employees
 
A "load station" implanting a beacon

A document included in NSA files released with Glenn Greenwald's book No Place to Hide details how the agency's Tailored Access Operations (TAO) and other NSA units gain access to hardware. They intercept routers, servers and other network hardware being shipped to organizations targeted for surveillance and install covert implant firmware onto them before they are delivered. This was described by an NSA manager as "some of the most productive operations in TAO because they preposition access points into hard target networks around the world."

Computers seized by the NSA due to interdiction are often modified with a physical device known as Cottonmouth. Cottonmouth is a device that can be inserted in the USB port of a computer in order to establish remote access to the targeted machine. According to NSA's Tailored Access Operations (TAO) group implant catalog, after implanting Cottonmouth, the NSA can establish a network bridge "that allows the NSA to load exploit software onto modified computers as well as allowing the NSA to relay commands and data between hardware and software implants."

Domestic collection

NSA's mission, as set forth in Executive Order 12333 in 1981, is to collect information that constitutes "foreign intelligence or counterintelligence" while not "acquiring information concerning the domestic activities of United States persons". NSA has declared that it relies on the FBI to collect information on foreign intelligence activities within the borders of the United States, while confining its own activities within the United States to the embassies and missions of foreign nations.

A purported 'Domestic Surveillance Directorate' of the NSA, which appeared in 2013, was soon exposed as a hoax.

NSA's domestic surveillance activities are limited by the requirements imposed by the Fourth Amendment to the U.S. Constitution. The Foreign Intelligence Surveillance Court for example held in October 2011, citing multiple Supreme Court precedents, that the Fourth Amendment prohibitions against unreasonable searches and seizures applies to the contents of all communications, whatever the means, because "a person's private communications are akin to personal papers." However, these protections do not apply to non-U.S. persons located outside of U.S. borders, so the NSA's foreign surveillance efforts are subject to far fewer limitations under U.S. law. The specific requirements for domestic surveillance operations are contained in the Foreign Intelligence Surveillance Act of 1978 (FISA), which does not extend protection to non-U.S. citizens located outside of U.S. territory.

President's Surveillance Program

George W. Bush, president during the 9/11 terrorist attacks, approved the Patriot Act shortly after the attacks to authorize anti-terrorist security measures. Titles 1, 2, and 9 specifically authorized measures that would be taken by the NSA; these titles granted enhanced domestic security against terrorism, surveillance procedures, and improved intelligence, respectively. On March 10, 2004, there was a debate between President Bush and White House Counsel Alberto Gonzales, Attorney General John Ashcroft, and Acting Attorney General James Comey. Ashcroft and Comey questioned whether the NSA's programs could be considered constitutional and threatened to resign over the matter, but ultimately the NSA's programs continued. On March 11, 2004, President Bush signed a new authorization for mass surveillance of Internet records, in addition to the surveillance of phone records. This allowed the president to override laws such as the Foreign Intelligence Surveillance Act, which protected civilians from mass surveillance. In addition, President Bush also signed a declaration that the mass surveillance measures were retroactively in place.

The PRISM program

PRISM: a clandestine surveillance program under which the NSA collects user data from companies like Microsoft and Facebook.

Under the PRISM program, which started in 2007, NSA gathers Internet communications from foreign targets from nine major U.S. Internet-based communication service providers: Microsoft, Yahoo, Google, Facebook, PalTalk, AOL, Skype, YouTube and Apple. Data gathered include email, videos, photos, VoIP chats such as Skype, and file transfers.

Former NSA director General Keith Alexander claimed that in September 2009 the NSA prevented Najibullah Zazi and his friends from carrying out a terrorist attack. However, this claim has been debunked and no evidence has been presented demonstrating that the NSA has ever been instrumental in preventing a terrorist attack.

Hacking operations

Besides the more traditional ways of eavesdropping in order to collect signals intelligence, NSA is also engaged in hacking computers, smartphones and their networks. These operations are conducted by the Tailored Access Operations (TAO) division, which has been active since at least 1998.

According to Foreign Policy magazine, "... the Office of Tailored Access Operations, or TAO, has successfully penetrated Chinese computer and telecommunications systems for almost 15 years, generating some of the best and most reliable intelligence information about what is going on inside the People's Republic of China."

In an interview with Wired magazine, Edward Snowden said the Tailored Access Operations division accidentally caused Syria's internet blackout in 2012.

Organizational structure

Paul M. Nakasone, the director of the NSA.

The NSA is led by the Director of the National Security Agency (DIRNSA), who also serves as Chief of the Central Security Service (CHCSS) and Commander of the United States Cyber Command (USCYBERCOM) and is the highest-ranking military official of these organizations. He is assisted by a Deputy Director, who is the highest-ranking civilian within the NSA/CSS.

NSA also has an Inspector General, who heads the Office of the Inspector General (OIG); a General Counsel, who heads the Office of the General Counsel (OGC); and a Director of Compliance, who heads the Office of the Director of Compliance (ODOC).

Unlike other intelligence organizations such as CIA or DIA, NSA has always been particularly reticent concerning its internal organizational structure.

As of the mid-1990s, the National Security Agency was organized into five Directorates:

  • The Operations Directorate, which was responsible for SIGINT collection and processing.
  • The Technology and Systems Directorate, which developed new technologies for SIGINT collection and processing.
  • The Information Systems Security Directorate, which was responsible for NSA's communications and information security missions.
  • The Plans, Policy and Programs Directorate, which provided staff support and general direction for the Agency.
  • The Support Services Directorate, which provided logistical and administrative support activities.

Each of these directorates consisted of several groups or elements, designated by a letter. There were, for example, the A Group, which was responsible for all SIGINT operations against the Soviet Union and Eastern Europe, and G Group, which was responsible for SIGINT related to all non-communist countries. These groups were divided into units designated by an additional number, like unit A5 for breaking Soviet codes, and G6, the office for the Middle East, North Africa, Cuba, Central and South America.

Directorates

As of 2013, NSA has about a dozen directorates, which are designated by a letter, although not all of them are publicly known. The directorates are divided into divisions and units whose designations start with the letter of the parent directorate, followed by a number for the division, the sub-unit or a sub-sub-unit.
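
The designator scheme can be illustrated with a small helper that expands a unit code into its implied chain of parent units; the example codes are taken from the list below, and the sketch deliberately ignores the directorates that use two-letter prefixes (such as DP and LL).

```python
# Each extra character in a designator steps one level deeper below the
# parent directorate (division, sub-unit, sub-sub-unit). Illustrative only;
# two-letter directorates such as DP and LL are not handled.
def hierarchy(designator):
    """Return the chain of parent units implied by a unit designator."""
    return [designator[:i] for i in range(1, len(designator) + 1)]

print(hierarchy("S31153"))  # ['S', 'S3', 'S31', 'S311', 'S3115', 'S31153']
print(hierarchy("S2D31"))   # ['S', 'S2', 'S2D', 'S2D3', 'S2D31']
```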

The main elements of the organizational structure of the NSA are:

  • DP – Associate Directorate for European Affairs
    • DP1 –
      • DP11 – Office of European Affairs
      • DP12 – Office of Tunisian Affairs
      • DP15 – Office of British Affairs
  • E – Directorate for Education and Training
  • F –
    • F1 –
      • F1A2 – Office of the NSA Representative of US Diplomatic Missions
    • F4 –
      • F406 – Office of Foreign Affairs, Pacific Field
    • F6 – Special Collection Service (SCS), a joint program created by the CIA and NSA in 1978 to facilitate clandestine activities such as bugging computers throughout the world, using the expertise of both agencies.
    • F7 –
  • G – Directorate only known from unit G112, the office that manages the Senior Span platform, attached to the U-2 spy planes. Another known unit is GS2E4, which manages Iranian digital networks
  • H – Only known for joint activities, this directorate specializes in collaboration with other nations
        • H52G – Joint Signal Activity
  • I – Information Assurance Directorate (IAD), which ensures availability, integrity, authentication, confidentiality, and non-repudiation of national security and telecommunications and information systems (national security systems).
  • J – Directorate only known from unit J2, the Cryptologic Intelligence Unit
  • L – Installation and Logistics
    • LL – Services
      • LL1 – Material Management
      • LL2 – Transportation, Asset, and Disposition Services
        • LL23 –
          • LL234 –
            • LL234M – Property Support
  • M – Associate Directorate for Human Resources (ADHRS)
  • Q – Security and Counterintelligence
  • R – Research Directorate, which conducts research on signals intelligence and on information assurance for the U.S. Government.
  • S – Signals Intelligence Directorate (SID), which is responsible for the collection, analysis, production and dissemination of signals intelligence. This directorate is led by a director and a deputy director. The SID consists of the following divisions:
    • S0 – SID Workforce Performance
    • S1 – Customer Relations
      • S11 – Customer Gateway
        • S112 – NGA Accounting Management
      • S12 – known sub-unit: S12C, the Consumer Services unit
      • S17 – Strategic Intelligence Disputes and Issues
    • S2 – Analysis and Production Centers, with the following so-called Product Lines:
      • S211 –
        • S211A – Advanced Analysis Lab
      • S2A: South Asia
      • S2B: China and Korea
      • S2C: International Security
      • S2D – Foreign Counterintelligence
        • S2D3 –
        • S2D31 – Operations Support
      • S2E: Middle East/Asia
          • S2E33: Operational Technologies in Middle East and Asia
      • S2F: International Crime
        • S2F2:
          • S2F21: Transnational Crime
      • S2G: Counter-proliferation
      • S2H: Russia
      • S2I – Counter-terrorism
      • S2I02 – Management Services
      • S2J: Weapons and Space
      • S2T: Current Threats
    • S3 – Data Acquisition, with these divisions for the main collection programs:
      • S31 – Cryptanalysis and Exploitation Services (CES)
        • S311 –
          • S3115 –
            • S31153 – Target Analysis Branch of Network Information Exploitation
      • S32 – Tailored Access Operations (TAO), which hacks into foreign computers to conduct cyber-espionage and reportedly is "the largest and arguably the most important component of the NSA's huge Signal Intelligence (SIGINT) Directorate, consisting of over 1,000 military and civilian computer hackers, intelligence analysts, targeting specialists, computer hardware and software designers, and electrical engineers."
      • S33 – Global Access Operations (GAO), which is responsible for intercepts from satellites and other international SIGINT platforms. A tool which details and maps the information collected by this unit is code-named Boundless Informant.
      • S34 – Collections Strategies and Requirements Center
      • S35 – Special Source Operations (SSO), which is responsible for domestic and compartmented collection programs, like for example the PRISM program. Special Source Operations is also mentioned in connection to the FAIRVIEW collection program.
  • T – Technical Directorate (TD)
    • T1 – Mission Capabilities
    • T2 – Business Capabilities
    • T3 – Enterprise IT Services
  • V – Threat Assessment Directorate, also known as the NTOC (National Threat Operations Center)
  • Directorate for Corporate Leadership
  • Foreign Affairs Directorate, which acts as liaison with foreign intelligence services, counter-intelligence centers and the UKUSA-partners.
  • Acquisitions and Procurement Directorate
  • Information Sharing Services (ISS), led by a chief and a deputy chief.

In the year 2000, a leadership team was formed, consisting of the Director, the Deputy Director, and the directors of the Signals Intelligence Directorate (SID), the Information Assurance Directorate (IAD) and the Technical Directorate (TD). The chiefs of other main NSA divisions became associate directors of the senior leadership team.

After president George W. Bush initiated the President's Surveillance Program (PSP) in 2001, the NSA created a 24-hour Metadata Analysis Center (MAC), followed in 2004 by the Advanced Analysis Division (AAD), with the mission of analyzing content, Internet metadata and telephone metadata. Both units were part of the Signals Intelligence Directorate.

A 2016 proposal would combine the Signals Intelligence Directorate with the Information Assurance Directorate into a Directorate of Operations.

NSANet

Behind the Green Door – Secure communications room with separate computer terminals for access to SIPRNET, GWAN, NSANET, and JWICS

NSANet stands for National Security Agency Network and is the official NSA intranet. It is a classified network carrying information up to the level of TS/SCI, supporting the use and sharing of intelligence data between the NSA and the signals intelligence agencies of the four other nations of the Five Eyes partnership. The management of NSANet has been delegated to the Central Security Service Texas (CSSTEXAS).

NSANet is a highly secured computer network consisting of fiber-optic and satellite communication channels which are almost completely separated from the public Internet. The network allows NSA personnel and civilian and military intelligence analysts anywhere in the world to have access to the agency's systems and databases. This access is tightly controlled and monitored. For example, every keystroke is logged, activities are audited at random and downloading and printing of documents from NSANet are recorded.
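
The controls described here, logging of actions, random audits, and recording of downloads and prints, can be sketched as an append-only audit trail with random sampling for review; everything below (file name, field names, sampling rate) is an illustrative assumption, not NSANet's actual implementation.

```python
# Illustrative sketch of the controls described above: every action is
# appended to an audit log and a random sample is flagged for human review.
# File name, field names and sampling rate are assumptions, not NSANet's design.
import json
import random
import time

AUDIT_LOG = "audit.log"

def record_event(user, action, resource):
    entry = {
        "ts": time.time(),
        "user": user,
        "action": action,                           # e.g. "download", "print"
        "resource": resource,
        "flag_for_review": random.random() < 0.01,  # random audit sampling
    }
    with open(AUDIT_LOG, "a") as log:
        log.write(json.dumps(entry) + "\n")

record_event("analyst42", "download", "report-2013-06.pdf")
```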

In 1998, NSANet, along with NIPRNET and SIPRNET, had "significant problems with poor search capabilities, unorganized data and old information". In 2004, the network was reported to have used over twenty commercial off-the-shelf operating systems. Some universities that do highly sensitive research are allowed to connect to it.

The thousands of Top Secret internal NSA documents that were taken by Edward Snowden in 2013 were stored in "a file-sharing location on the NSA's intranet site"; so, they could easily be read online by NSA personnel. Everyone with a TS/SCI-clearance had access to these documents. As a system administrator, Snowden was responsible for moving accidentally misplaced highly sensitive documents to safer storage locations.

Watch centers

The NSA maintains at least two watch centers:

  • National Security Operations Center (NSOC), which is the NSA's current operations center and focal point for time-sensitive SIGINT reporting for the United States SIGINT System (USSS). This center was established in 1968 as the National SIGINT Watch Center (NSWC) and renamed the National SIGINT Operations Center (NSOC) in 1973. This "nerve center of the NSA" got its current name in 1996.
  • NSA/CSS Threat Operations Center (NTOC), which is the primary NSA/CSS partner for Department of Homeland Security response to cyber incidents. The NTOC establishes real-time network awareness and threat characterization capabilities to forecast, alert, and attribute malicious activity and enable the coordination of Computer Network Operations. The NTOC was established in 2004 as a joint Information Assurance and Signals Intelligence project.

Employees

The number of NSA employees is officially classified but there are several sources providing estimates. In 1961, NSA had 59,000 military and civilian employees, which grew to 93,067 in 1969, of which 19,300 worked at the headquarters at Fort Meade. In the early 1980s NSA had roughly 50,000 military and civilian personnel. By 1989 this number had grown again to 75,000, of which 25,000 worked at the NSA headquarters. Between 1990 and 1995 the NSA's budget and workforce were cut by one third, which led to a substantial loss of experience.

In 2012, the NSA said more than 30,000 employees worked at Fort Meade and other facilities. In 2012, John C. Inglis, the deputy director, said that the total number of NSA employees is "somewhere between 37,000 and one billion" as a joke, and stated that the agency is "probably the biggest employer of introverts." In 2013 Der Spiegel stated that the NSA had 40,000 employees. More widely, it has been described as the world's largest single employer of mathematicians. Some NSA employees form part of the workforce of the National Reconnaissance Office (NRO), the agency that provides the NSA with satellite signals intelligence.

As of 2013 about 1,000 system administrators work for the NSA.

Personnel security

The NSA received criticism early on in 1960 after two agents had defected to the Soviet Union. Investigations by the House Un-American Activities Committee and a special subcommittee of the United States House Committee on Armed Services revealed severe cases of ignorance in personnel security regulations, prompting the former personnel director and the director of security to step down and leading to the adoption of stricter security practices. Nonetheless, security breaches reoccurred only a year later when in an issue of Izvestia of July 23, 1963, a former NSA employee published several cryptologic secrets.

The very same day, an NSA clerk-messenger committed suicide as ongoing investigations disclosed that he had sold secret information to the Soviets on a regular basis. The reluctance of Congressional houses to look into these affairs had prompted a journalist to write, "If a similar series of tragic blunders occurred in any ordinary agency of Government an aroused public would insist that those responsible be officially censured, demoted, or fired." David Kahn criticized the NSA's tactics of concealing its doings as smug and the Congress' blind faith in the agency's right-doing as shortsighted, and pointed out the necessity of surveillance by the Congress to prevent abuse of power.

Edward Snowden's leaking of the existence of PRISM in 2013 caused the NSA to institute a "two-man rule", where two system administrators are required to be present when one accesses certain sensitive information. Snowden claims he suggested such a rule in 2009.

Polygraphing

Defense Security Service (DSS) polygraph brochure given to NSA applicants

The NSA conducts polygraph tests of employees. For new employees, the tests are meant to discover enemy spies who are applying to the NSA and to uncover any information that could make an applicant pliant to coercion. As part of the latter, historically EPQs or "embarrassing personal questions" about sexual behavior had been included in the NSA polygraph. The NSA also conducts five-year periodic reinvestigation polygraphs of employees, focusing on counterintelligence programs. In addition the NSA conducts periodic polygraph investigations in order to find spies and leakers; those who refuse to take them may receive "termination of employment", according to a 1982 memorandum from the director of NSA.

NSA-produced video on the polygraph process

There are also "special access examination" polygraphs for employees who wish to work in highly sensitive areas, and those polygraphs cover counterintelligence questions and some questions about behavior. NSA's brochure states that the average test length is between two and four hours. A 1983 report of the Office of Technology Assessment stated that "It appears that the NSA [National Security Agency] (and possibly CIA) use the polygraph not to determine deception or truthfulness per se, but as a technique of interrogation to encourage admissions." Sometimes applicants in the polygraph process confess to committing felonies such as murder, rape, and selling of illegal drugs. Between 1974 and 1979, of the 20,511 job applicants who took polygraph tests, 695 (3.4%) confessed to previous felony crimes; almost all of those crimes had been undetected.

In 2010 the NSA produced a video explaining its polygraph process. The video, ten minutes long, is titled "The Truth About the Polygraph" and was posted to the Web site of the Defense Security Service. Jeff Stein of The Washington Post said that the video portrays "various applicants, or actors playing them—it's not clear—describing everything bad they had heard about the test, the implication being that none of it is true." AntiPolygraph.org argues that the NSA-produced video omits some information about the polygraph process; it produced a video responding to the NSA video. George Maschke, the founder of the Web site, accused the NSA polygraph video of being "Orwellian".

After Edward Snowden revealed his identity in 2013, the NSA began requiring polygraphing of employees once per quarter.

Arbitrary firing

The number of exemptions from legal requirements has been criticized. When in 1964 the Congress was hearing a bill giving the director of the NSA the power to fire at will any employee, The Washington Post wrote: "This is the very definition of arbitrariness. It means that an employee could be discharged and disgraced on the basis of anonymous allegations without the slightest opportunity to defend himself." Yet, the bill was accepted by an overwhelming majority. Also, every person hired to a job in the US after 2007, at any private organization, state or federal government agency, must be reported to the New Hire Registry, ostensibly to look for child support evaders, except that employees of an intelligence agency may be excluded from reporting if the director deems it necessary for national security reasons.

Facilities

Headquarters

History of headquarters

Headquarters at Fort Meade circa 1950s

When the agency was first established, its headquarters and cryptographic center were in the Naval Security Station in Washington, D.C. The COMINT functions were located in Arlington Hall in Northern Virginia, which served as the headquarters of the U.S. Army's cryptographic operations. Because the Soviet Union had detonated a nuclear bomb and because the facilities were crowded, the federal government wanted to move several agencies, including the AFSA/NSA. A planning committee considered Fort Knox, but Fort Meade, Maryland, was ultimately chosen as NSA headquarters because it was far enough away from Washington, D.C. in case of a nuclear strike and was close enough so its employees would not have to move their families.

Construction of additional buildings began after the agency occupied buildings at Fort Meade in the late 1950s, which they soon outgrew. In 1963 the new headquarters building, nine stories tall, opened. NSA workers referred to the building as the "Headquarters Building" and since the NSA management occupied the top floor, workers used "Ninth Floor" to refer to their leaders. COMSEC remained in Washington, D.C., until its new building was completed in 1968. In September 1986, the Operations 2A and 2B buildings, both copper-shielded to prevent eavesdropping, opened with a dedication by President Ronald Reagan. The four NSA buildings became known as the "Big Four." The NSA director moved to 2B when it opened.

National Security Agency headquarters in Fort Meade, 2013

Headquarters for the National Security Agency is located at 39°6′32″N 76°46′17″W in Fort George G. Meade, Maryland, although it is separate from other compounds and agencies that are based within this same military installation. Fort Meade is about 20 mi (32 km) southwest of Baltimore, and 25 mi (40 km) northeast of Washington, D.C. The NSA has two dedicated exits off the Baltimore–Washington Parkway. The eastbound exit from the Parkway (heading toward Baltimore) is open to the public and provides employee access to its main campus and public access to the National Cryptologic Museum. The westbound exit (heading toward Washington) is labeled "NSA Employees Only". The exit may only be used by people with the proper clearances, and security vehicles parked along the road guard the entrance.

NSA is the largest employer in the state of Maryland, and two-thirds of its personnel work at Fort Meade. Built on 350 acres (140 ha; 0.55 sq mi) of Fort Meade's 5,000 acres (2,000 ha; 7.8 sq mi), the site has 1,300 buildings and an estimated 18,000 parking spaces.

NSA headquarters building in Fort Meade (left), NSOC (right)

The main NSA headquarters and operations building is what James Bamford, author of Body of Secrets, describes as "a modern boxy structure" that appears similar to "any stylish office building." The building is covered with one-way dark glass, which is lined with copper shielding in order to prevent espionage by trapping in signals and sounds. It contains 3,000,000 square feet (280,000 m2), or more than 68 acres (28 ha), of floor space; Bamford said that the U.S. Capitol "could easily fit inside it four times over."

The facility has over 100 watchposts, one of them being the visitor control center, a two-story area that serves as the entrance. At the entrance, a white pentagonal structure, visitor badges are issued to visitors and security clearances of employees are checked. The visitor center includes a painting of the NSA seal.

The OPS2A building, the tallest building in the NSA complex and the location of much of the agency's operations directorate, is accessible from the visitor center. Bamford described it as a "dark glass Rubik's Cube". The facility's "red corridor" houses non-security operations such as concessions and the drug store. The name refers to the "red badge" which is worn by someone without a security clearance. The NSA headquarters includes a cafeteria, a credit union, ticket counters for airlines and entertainment, a barbershop, and a bank. NSA headquarters has its own post office, fire department, and police force.

The employees at the NSA headquarters reside in various places in the Baltimore-Washington area, including Annapolis, Baltimore, and Columbia in Maryland and the District of Columbia, including the Georgetown community. The NSA maintains a shuttle service from the Odenton station of MARC to its Visitor Control Center and has done so since 2005.

Power consumption

Due to massive amounts of data processing, NSA is the largest electricity consumer in Maryland.

Following the major power outage in 2000, The Baltimore Sun reported in 2003, and again in follow-up reporting through 2007, that the NSA was at risk of electrical overload because of insufficient internal electrical infrastructure at Fort Meade to support the amount of equipment being installed. This problem was apparently recognized in the 1990s but not made a priority, and "now the agency's ability to keep its operations going is threatened."

On August 6, 2006, The Baltimore Sun reported that the NSA had completely maxed out the grid, and that Baltimore Gas & Electric (BGE, now Constellation Energy) was unable to sell them any more power. NSA decided to move some of its operations to a new satellite facility.

BGE provided NSA with 65 to 75 megawatts at Fort Meade in 2007, and expected that an increase of 10 to 15 megawatts would be needed later that year. In 2011, the NSA was Maryland's largest consumer of power. In 2007, as BGE's largest customer, NSA bought as much electricity as Annapolis, the capital city of Maryland.

One estimate put the potential for power consumption by the new Utah Data Center at US$40 million per year.

Computing assets

In 1995, The Baltimore Sun reported that the NSA owned the single largest group of supercomputers.

NSA held a groundbreaking ceremony at Fort Meade in May 2013 for its High Performance Computing Center 2, expected to open in 2016. Called Site M, the center has a 150 megawatt power substation, 14 administrative buildings and 10 parking garages. It cost $3.2 billion and covers 227 acres (92 ha; 0.355 sq mi). The center is 1,800,000 square feet (17 ha; 0.065 sq mi) and initially uses 60 megawatts of electricity.

Increments II and III are expected to be completed by 2030, and would quadruple the space, covering 5,800,000 square feet (54 ha; 0.21 sq mi) with 60 buildings and 40 parking garages. Defense contractors are also establishing or expanding cybersecurity facilities near the NSA and around the Washington metropolitan area.

National Computer Security Center

The DoD Computer Security Center was founded in 1981 and renamed the National Computer Security Center (NCSC) in 1985. NCSC was responsible for computer security throughout the federal government. NCSC was part of NSA, and during the late 1980s and the 1990s, NSA and NCSC published the Trusted Computer System Evaluation Criteria in a six-foot high Rainbow Series of books that detailed trusted computing and network platform specifications. However, the Rainbow books were replaced by the Common Criteria in the early 2000s.

Other U.S. facilities

As of 2012, NSA collected intelligence from four geostationary satellites. Satellite receivers were at Roaring Creek Station in Catawissa, Pennsylvania and Salt Creek Station in Arbuckle, California. It operated ten to twenty taps on U.S. telecom switches. NSA had installations in several U.S. states and from them observed intercepts from Europe, the Middle East, North Africa, Latin America, and Asia.

NSA had facilities at Friendship Annex (FANX) in Linthicum, Maryland, which is a 20 to 25-minute drive from Fort Meade; the Aerospace Data Facility at Buckley Air Force Base in Aurora outside Denver, Colorado; NSA Texas in the Texas Cryptology Center at Lackland Air Force Base in San Antonio, Texas; NSA Georgia at Fort Gordon in Augusta, Georgia; NSA Hawaii in Honolulu; the Multiprogram Research Facility in Oak Ridge, Tennessee, and elsewhere.

On January 6, 2011, a groundbreaking ceremony was held to begin construction on NSA's first Comprehensive National Cyber-security Initiative (CNCI) Data Center, known as the "Utah Data Center" for short. The $1.5 billion data center was built at Camp Williams, Utah, located 25 miles (40 km) south of Salt Lake City, to help support the agency's Comprehensive National Cyber-security Initiative. It was originally expected to be operational by September 2013; construction of the Utah Data Center was finished in May 2019.

In 2009, to protect its assets and access more electricity, NSA sought to decentralize and expand its existing facilities in Fort Meade and Menwith Hill, the latter expansion expected to be completed by 2015.

The Yakima Herald-Republic cited Bamford, saying that many of NSA's bases for its Echelon program were a legacy system, using outdated, 1990s technology. In 2004, NSA closed its operations at Bad Aibling Station (Field Station 81) in Bad Aibling, Germany. In 2012, NSA began to move some of its operations at Yakima Research Station, Yakima Training Center, in Washington state to Colorado, planning to leave Yakima closed. As of 2013, NSA also intended to close operations at Sugar Grove, West Virginia.

International stations

RAF Menwith Hill has the largest NSA presence in the United Kingdom.

Following the signing in 1946–1956 of the UKUSA Agreement between the United States, United Kingdom, Canada, Australia and New Zealand, who then cooperated on signals intelligence and ECHELON, NSA stations were built at GCHQ Bude in Morwenstow, United Kingdom; Geraldton, Pine Gap and Shoal Bay, Australia; Leitrim and Ottawa, Ontario, Canada; Misawa, Japan; and Waihopai and Tangimoana, New Zealand.

NSA operates RAF Menwith Hill in North Yorkshire, United Kingdom, which was, according to BBC News in 2007, the largest electronic monitoring station in the world. Planned in 1954, and opened in 1960, the base covered 562 acres (227 ha; 0.878 sq mi) in 1999.

The agency's European Cryptologic Center (ECC), with 240 employees in 2011, is headquartered at a US military compound in Griesheim, near Frankfurt in Germany. A 2011 NSA report indicates that the ECC is responsible for the "largest analysis and productivity in Europe" and focuses on various priorities, including Africa, Europe, the Middle East and counterterrorism operations.

In 2013, a new Consolidated Intelligence Center, also to be used by NSA, was being built at the headquarters of the United States Army Europe in Wiesbaden, Germany. NSA's partnership with the Bundesnachrichtendienst (BND), the German foreign intelligence service, was confirmed by BND president Gerhard Schindler.

Thailand

Thailand is a "3rd party partner" of the NSA along with nine other nations. These are non-English-speaking countries that have made security agreements for the exchange of SIGINT raw material and end product reports.

Thailand is the site of at least two US SIGINT collection stations. One is at the US Embassy in Bangkok, a joint NSA-CIA Special Collection Service (SCS) unit. It presumably eavesdrops on foreign embassies, governmental communications, and other targets of opportunity.

The second installation is a FORNSAT (foreign satellite interception) station in the Thai city of Khon Kaen. It is codenamed INDRA, but has also been referred to as LEMONWOOD. The station is approximately 40 hectares (99 acres) in size and consists of a large 3,700–4,600 m2 (40,000–50,000 ft2) operations building on the west side of the ops compound and four radome-enclosed parabolic antennas. Possibly two of the radome-enclosed antennas are used for SATCOM intercept and two antennas used for relaying the intercepted material back to NSA. There is also a PUSHER-type circularly-disposed antenna array (CDAA) just north of the ops compound.

NSA activated Khon Kaen in October 1979. Its mission was to eavesdrop on the radio traffic of Chinese army and air force units in southern China, especially in and around the city of Kunming in Yunnan Province. Back in the late 1970s the base consisted only of a small CDAA antenna array that was remote-controlled via satellite from the NSA listening post at Kunia, Hawaii, and a small force of civilian contractors from Bendix Field Engineering Corp. whose job it was to keep the antenna array and satellite relay facilities up and running 24/7.

According to the papers of the late General William Odom, the INDRA facility was upgraded in 1986 with a new British-made PUSHER CDAA antenna as part of an overall upgrade of NSA and Thai SIGINT facilities whose objective was to spy on the neighboring communist nations of Vietnam, Laos, and Cambodia.

The base apparently fell into disrepair in the 1990s as China and Vietnam became more friendly towards the US, and by 2002 archived satellite imagery showed that the PUSHER CDAA antenna had been torn down, perhaps indicating that the base had been closed. At some point in the period since 9/11, the Khon Kaen base was reactivated and expanded to include a sizeable SATCOM intercept mission. It is likely that the NSA presence at Khon Kaen is relatively small, and that most of the work is done by civilian contractors.

Research and development

NSA has been involved in debates about public policy, both indirectly as a behind-the-scenes adviser to other departments, and directly during and after Vice Admiral Bobby Ray Inman's directorship. NSA was a major player in the debates of the 1990s regarding the export of cryptography in the United States. Restrictions on export were reduced but not eliminated in 1996.

Its secure government communications work has involved the NSA in numerous technology areas, including the design of specialized communications hardware and software, production of dedicated semiconductors (at the Ft. Meade chip fabrication plant), and advanced cryptography research. For 50 years, NSA designed and built most of its computer equipment in-house, but from the 1990s until about 2003 (when the U.S. Congress curtailed the practice), the agency contracted with the private sector in the fields of research and equipment.

Data Encryption Standard

FROSTBURG was the NSA's first supercomputer, used from 1991 to 1997

NSA was embroiled in some minor controversy concerning its involvement in the creation of the Data Encryption Standard (DES), a standard and public block cipher algorithm used by the U.S. government and banking community. During the development of DES by IBM in the 1970s, NSA recommended changes to some details of the design. There was suspicion that these changes had weakened the algorithm sufficiently to enable the agency to eavesdrop if required, including speculation that a critical component (the so-called S-boxes) had been altered to insert a "backdoor" and that the reduction in key length might have made it feasible for NSA to discover DES keys using massive computing power. It has since been observed that the S-boxes in DES are particularly resilient against differential cryptanalysis, a technique which was not publicly discovered until the late 1980s but which was known to the IBM DES team.
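The key-length concern is easy to quantify. The sketch below compares the 56-bit DES key space with a modern 128-bit key space under a purely hypothetical search rate; the rate is an assumption chosen for illustration, not an estimate of any agency's capability.

```python
# Back-of-the-envelope comparison of brute-force key-search effort for DES
# versus a 128-bit cipher. The keys-per-second figure is a hypothetical
# assumption for illustration only.

DES_KEY_BITS = 56
AES_KEY_BITS = 128
KEYS_PER_SECOND = 1e12            # assumed search rate: one trillion keys per second
SECONDS_PER_YEAR = 365.25 * 24 * 3600

for name, bits in [("DES", DES_KEY_BITS), ("AES-128", AES_KEY_BITS)]:
    keyspace = 2 ** bits
    avg_years = (keyspace / 2) / KEYS_PER_SECOND / SECONDS_PER_YEAR
    print(f"{name}: 2^{bits} = {keyspace:.3e} keys, "
          f"average search time about {avg_years:.3e} years at the assumed rate")
```

Even at this generous assumed rate, the 56-bit key space can be searched in hours, while a 128-bit key space remains far out of reach, which is why the reduction in key length attracted suspicion.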

Advanced Encryption Standard

The involvement of NSA in selecting a successor to Data Encryption Standard (DES), the Advanced Encryption Standard (AES), was limited to hardware performance testing (see AES competition). NSA has subsequently certified AES for protection of classified information when used in NSA-approved systems.

NSA encryption systems

STU-III secure telephones on display at the National Cryptologic Museum

The NSA is responsible for the encryption-related components in these legacy systems:

  • FNBDT Future Narrow Band Digital Terminal
  • KL-7 ADONIS off-line rotor encryption machine (post-WWII – 1980s)
  • KW-26 ROMULUS electronic in-line teletypewriter encryptor (1960s–1980s)
  • KW-37 JASON fleet broadcast encryptor (1960s–1990s)
  • KY-57 VINSON tactical radio voice encryptor
  • KG-84 Dedicated Data Encryption/Decryption
  • STU-III secure telephone unit, phased out by the STE

The NSA oversees encryption in the following systems, which are in use today:

The NSA has specified Suite A and Suite B cryptographic algorithm suites to be used in U.S. government systems; the Suite B algorithms are a subset of those previously specified by NIST and are expected to serve for most information protection purposes, while the Suite A algorithms are secret and are intended for especially high levels of protection.
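As a rough illustration of how a Suite B-style public algorithm is used in practice, the sketch below encrypts a message with AES-256 in GCM mode via the third-party pyca/cryptography package. It is a generic example of a publicly specified algorithm, not NSA code and not an accredited implementation.

```python
# Minimal AES-256-GCM example using the pyca/cryptography package
# (pip install cryptography). AES-256 is among the public algorithms
# grouped under Suite B; this snippet is only a generic illustration.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)    # 256-bit AES key
nonce = os.urandom(12)                       # 96-bit nonce; never reuse with the same key
aad = b"header: illustrative only"           # authenticated but unencrypted data

aesgcm = AESGCM(key)
ciphertext = aesgcm.encrypt(nonce, b"hello, world", aad)
plaintext = aesgcm.decrypt(nonce, ciphertext, aad)
assert plaintext == b"hello, world"
print(len(ciphertext), "bytes of ciphertext (includes the 16-byte GCM tag)")
```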

SHA

The widely used SHA-1 and SHA-2 hash functions were designed by NSA. SHA-1 is a slight modification of the weaker SHA-0 algorithm, also designed by NSA in 1993. This small modification was suggested by NSA two years later, with no justification other than the fact that it provides additional security. An attack for SHA-0 that does not apply to the revised algorithm was indeed found between 1998 and 2005 by academic cryptographers. Because of weaknesses in SHA-1 and limits on its digest length, NIST deprecates its use for digital signatures and, from 2013 on, approves only the newer SHA-2 algorithms for such applications.

A new hash standard, SHA-3, was selected in a competition concluded on October 2, 2012, when Keccak was chosen as the algorithm. The process to select SHA-3 was similar to the one held in choosing the AES, but some doubts have been cast over it, since fundamental modifications were made to Keccak in order to turn it into a standard. These changes potentially undermine the cryptanalysis performed during the competition and reduce the security levels of the algorithm.
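All three hash families mentioned here (the NSA-designed SHA-1 and SHA-2 and the Keccak-based SHA-3) are available in Python's standard hashlib module; the short sketch below simply contrasts their digest lengths and is unrelated to any NSA tooling.

```python
# Compare digests of the NSA-designed SHA-1/SHA-2 families with the
# Keccak-based SHA-3 standard, all from Python's standard library.
import hashlib

message = b"Attack at dawn"

for name in ("sha1", "sha256", "sha384", "sha3_256"):
    digest = hashlib.new(name, message).hexdigest()
    print(f"{name:>8}: {len(digest) * 4:3d}-bit digest  {digest}")
```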

Dual_EC_DRBG random number generator cryptotrojan

NSA promoted the inclusion of a random number generator called Dual EC DRBG in the U.S. National Institute of Standards and Technology's 2007 guidelines. This led to speculation of a backdoor which would allow NSA access to data encrypted by systems using that pseudorandom number generator (PRNG).

This is now deemed plausible because the output of subsequent iterations of the PRNG can provably be determined if the relation between two internal elliptic-curve points is known. Both NIST and RSA now officially recommend against the use of this PRNG.
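The suspected mechanism is easiest to see in a toy analogue. The sketch below is not Dual_EC_DRBG itself; it replaces elliptic-curve points with modular exponentiation, but it preserves the essential property that whoever knows the secret relation between the two public constants can recover the generator's internal state, and hence all future output, from a single observed output. All numbers are arbitrary demo values.

```python
# Toy analogue of the Dual_EC_DRBG backdoor, using modular exponentiation
# instead of elliptic-curve points. P and Q play the role of the two public
# constants; e (equivalently d) is the secret relation a "designer" would hold.
# This is an illustrative sketch, not the real NIST construction.
from math import gcd
import secrets

p = 2**89 - 1                  # a Mersenne prime, small enough for a demo
g = 3                          # base element of the toy group

e = 0x1234567891011            # "designer's" secret exponent linking P and Q
while gcd(e, p - 1) != 1:      # make sure e is invertible mod p-1
    e += 2
d = pow(e, -1, p - 1)          # secret trapdoor: d = e^-1 mod (p-1), Python 3.8+

P = g                          # public constant 1
Q = pow(g, e, p)               # public constant 2; the link e is never published

def drbg_step(state):
    """One toy iteration: emit output derived from Q, advance state with P."""
    output = pow(Q, state, p)
    new_state = pow(P, state, p)
    return output, new_state

state = secrets.randbelow(p - 2) + 1
out1, state = drbg_step(state)             # an eavesdropper observes only out1

# Because out1 = Q^s = (P^s)^e, anyone holding d recovers the new state:
recovered_state = pow(out1, d, p)
assert recovered_state == state

# ...and can therefore predict every future "random" output.
predicted, _ = drbg_step(recovered_state)
actual, _ = drbg_step(state)
assert predicted == actual
print("internal state recovered and next output predicted from one observed output")
```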

Clipper chip

Because of concerns that widespread use of strong cryptography would hamper government use of wiretaps, NSA proposed the concept of key escrow in 1993 and introduced the Clipper chip that would offer stronger protection than DES but would allow access to encrypted data by authorized law enforcement officials. The proposal was strongly opposed and key escrow requirements ultimately went nowhere. However, NSA's Fortezza hardware-based encryption cards, created for the Clipper project, are still used within government, and NSA ultimately declassified and published the design of the Skipjack cipher used on the cards.

Perfect Citizen

Perfect Citizen is a program to perform vulnerability assessment by the NSA on U.S. critical infrastructure. It was originally reported to be a program to develop a system of sensors to detect cyber attacks on critical infrastructure computer networks in both the private and public sector through a network monitoring system named Einstein. It is funded by the Comprehensive National Cybersecurity Initiative and thus far Raytheon has received a contract for up to $100 million for the initial stage.

Academic research

NSA has invested many millions of dollars in academic research under grant code prefix MDA904, resulting in over 3,000 papers as of October 11, 2007. NSA/CSS has, at times, attempted to restrict the publication of academic research into cryptography; for example, the Khufu and Khafre block ciphers were voluntarily withheld in response to an NSA request to do so. In response to a FOIA lawsuit, in 2013 the NSA released the 643-page research paper titled, "Untangling the Web: A Guide to Internet Research," written and compiled by NSA employees to assist other NSA workers in searching for information of interest to the agency on the public Internet.

Patents

NSA has the ability to file for a patent from the U.S. Patent and Trademark Office under gag order. Unlike normal patents, these are not revealed to the public and do not expire. However, if the Patent Office receives an application for an identical patent from a third party, they will reveal NSA's patent and officially grant it to NSA for the full term on that date.

One of NSA's published patents describes a method of geographically locating an individual computer site in an Internet-like network, based on the latency of multiple network connections. Although no public patent exists, NSA is reported to have used a similar locating technology called trilateralization that allows real-time tracking of an individual's location, including altitude from ground level, using data obtained from cellphone towers.
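The patented idea of locating a host from network latency can be sketched as a multilateration problem: convert round-trip times measured from landmarks at known positions into rough distance estimates, then solve for the point that best fits them. The code below is a simplified planar illustration with invented coordinates and a crude latency-to-distance conversion; it is not the method described in the patent.

```python
# Simplified latency-based multilateration sketch (planar coordinates, km).
# Landmark positions and RTTs are invented; the 100 km-per-millisecond
# conversion is a rough rule of thumb for light in fiber, not a real model.
import numpy as np
from scipy.optimize import least_squares

landmarks = np.array([[0.0, 0.0],        # landmark coordinates in km
                      [800.0, 0.0],
                      [0.0, 600.0],
                      [800.0, 600.0]])
rtt_ms = np.array([6.1, 5.2, 4.4, 3.9])  # measured round-trip times
est_dist = rtt_ms / 2.0 * 100.0          # one-way distance estimates, km

def residuals(point):
    """Gap between geometric distances to `point` and latency-derived ones."""
    return np.linalg.norm(landmarks - point, axis=1) - est_dist

fit = least_squares(residuals, x0=np.array([400.0, 300.0]))
print("estimated position (km):", fit.x.round(1))
```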

Insignia and memorials

Seal of the U.S. National Security Agency.svg

The heraldic insignia of NSA consists of an eagle inside a circle, grasping a key in its talons. The eagle represents the agency's national mission. Its breast features a shield with bands of red and white, taken from the Great Seal of the United States and representing Congress. The key is taken from the emblem of Saint Peter and represents security.

When the NSA was created, the agency had no emblem and used that of the Department of Defense. The agency adopted its first of two emblems in 1963. The current NSA insignia has been in use since 1965, when then-Director LTG Marshall S. Carter (USA) ordered the creation of a device to represent the agency.

The NSA's flag consists of the agency's seal on a light blue background.

National Cryptologic Memorial

Crews associated with NSA missions have been involved in a number of dangerous and deadly situations. The USS Liberty incident in 1967 and USS Pueblo incident in 1968 are examples of the losses endured during the Cold War.

The National Security Agency/Central Security Service Cryptologic Memorial honors and remembers the fallen personnel, both military and civilian, of these intelligence missions. It is made of black granite, and has 171 names carved into it, as of 2013. It is located at NSA headquarters. A tradition of declassifying the stories of the fallen was begun in 2001.

Controversy and litigation

In the United States, at least since 2001, there has been legal controversy over what signals intelligence can be used for and how much freedom the National Security Agency has in using it. In 2015, the government made slight changes in how it uses and collects certain types of data, specifically phone records. The government was not analyzing the phone records as of early 2019. The surveillance programs were deemed unlawful in September 2020 in a court of appeals case.

Warrantless wiretaps

On December 16, 2005, The New York Times reported that, under White House pressure and with an executive order from President George W. Bush, the National Security Agency, in an attempt to thwart terrorism, had been tapping phone calls made to persons outside the country, without obtaining warrants from the United States Foreign Intelligence Surveillance Court, a secret court created for that purpose under the Foreign Intelligence Surveillance Act (FISA).

One such surveillance program, authorized by the U.S. Signals Intelligence Directive 18 of President George Bush, was the Highlander Project undertaken for the National Security Agency by the U.S. Army 513th Military Intelligence Brigade. NSA relayed telephone (including cell phone) conversations obtained from ground, airborne, and satellite monitoring stations to various U.S. Army Signal Intelligence Officers, including the 201st Military Intelligence Battalion. Conversations of citizens of the U.S. were intercepted, along with those of other nations.

Proponents of the surveillance program claim that the President has executive authority to order such action, arguing that laws such as FISA are overridden by the President's Constitutional powers. In addition, some argued that FISA was implicitly overridden by a subsequent statute, the Authorization for Use of Military Force, although the Supreme Court's ruling in Hamdan v. Rumsfeld deprecates this view. In the August 2006 case ACLU v. NSA, U.S. District Court Judge Anna Diggs Taylor concluded that NSA's warrantless surveillance program was both illegal and unconstitutional. On July 6, 2007, the 6th Circuit Court of Appeals vacated the decision on the grounds that the ACLU lacked standing to bring the suit.

On January 17, 2006, the Center for Constitutional Rights filed a lawsuit, CCR v. Bush, against the George W. Bush Presidency. The lawsuit challenged the National Security Agency's (NSA's) surveillance of people within the U.S., including the interception of CCR emails without securing a warrant first.

In September 2008, the Electronic Frontier Foundation (EFF) filed a class action lawsuit against the NSA and several high-ranking officials of the Bush administration, charging an "illegal and unconstitutional program of dragnet communications surveillance," based on documentation provided by former AT&T technician Mark Klein.

As a result of the USA Freedom Act passed by Congress in June 2015, the NSA had to shut down its bulk phone surveillance program on November 29 of the same year. The USA Freedom Act forbids the NSA from collecting metadata and content of phone calls unless it has a warrant for a terrorism investigation. In that case the agency has to ask the telecom companies for the records, which are only kept for six months. The NSA's use of large telecom companies to assist it with its surveillance efforts has caused several privacy concerns.

AT&T Internet monitoring

In May 2008, Mark Klein, a former AT&T employee, alleged that his company had cooperated with NSA in installing Narus hardware to replace the FBI Carnivore program, to monitor network communications including traffic between U.S. citizens.

Data mining

NSA was reported in 2008 to use its computing capability to analyze "transactional" data that it regularly acquires from other government agencies, which gather it under their own jurisdictional authorities. As part of this effort, NSA now monitors huge volumes of records of domestic email data, web addresses from Internet searches, bank transfers, credit-card transactions, travel records, and telephone data, according to current and former intelligence officials interviewed by The Wall Street Journal. The sender, recipient, and subject line of emails can be included, but the content of the messages or of phone calls are not.

A 2013 advisory group for the Obama administration, seeking to reform NSA spying programs following the revelations of documents released by Edward J. Snowden, stated in 'Recommendation 30' on page 37 of its report "...that the National Security Council staff should manage an interagency process to review on a regular basis the activities of the US Government regarding attacks that exploit a previously unknown vulnerability in a computer application." Retired cyber security expert Richard A. Clarke was a group member and stated on April 11, 2014 that NSA had no advance knowledge of Heartbleed.

Illegally obtained evidence

In August 2013 it was revealed that a 2005 IRS training document showed that NSA intelligence intercepts and wiretaps, both foreign and domestic, were being supplied to the Drug Enforcement Administration (DEA) and Internal Revenue Service (IRS) and were illegally used to launch criminal investigations of US citizens. Law enforcement agents were directed to conceal how the investigations began and recreate an apparently legal investigative trail by re-obtaining the same evidence by other means.

Barack Obama administration

In the months leading to April 2009, the NSA intercepted the communications of U.S. citizens, including a Congressman, although the Justice Department believed that the interception was unintentional. The Justice Department then took action to correct the issues and bring the program into compliance with existing laws. United States Attorney General Eric Holder resumed the program according to his understanding of the Foreign Intelligence Surveillance Act amendment of 2008, without explaining what had occurred.

Polls conducted in June 2013 found divided results among Americans regarding NSA's secret data collection. Rasmussen Reports found that 59% of Americans disapprove, Gallup found that 53% disapprove, and Pew found that 56% are in favor of NSA data collection.

Section 215 metadata collection

On April 25, 2013, the NSA obtained a court order requiring Verizon's Business Network Services to provide metadata on all calls in its system to the NSA "on an ongoing daily basis" for a three-month period, as reported by The Guardian on June 6, 2013. This information includes "the numbers of both parties on a call ... location data, call duration, unique identifiers, and the time and duration of all calls" but not "[t]he contents of the conversation itself". The order relies on the so-called "business records" provision of the Patriot Act.

In August 2013, following the Snowden leaks, new details about the NSA's data mining activity were revealed. Reportedly, the majority of emails into or out of the United States are captured at "selected communications links" and automatically analyzed for keywords or other "selectors". Emails that do not match are deleted.

The utility of such a massive metadata collection in preventing terrorist attacks is disputed. Many studies have found the dragnet-like system to be ineffective. One such report, released by the New America Foundation, concluded that after an analysis of 225 terrorism cases, the NSA "had no discernible impact on preventing acts of terrorism."

Defenders of the program said that while metadata alone cannot provide all the information necessary to prevent an attack, it assures the ability to "connect the dots" between suspect foreign numbers and domestic numbers with a speed only the NSA's software is capable of. One benefit of this is the ability to quickly determine the difference between suspicious activity and real threats. As an example, NSA director General Keith B. Alexander mentioned at the annual Cybersecurity Summit in 2013 that metadata analysis of domestic phone call records after the Boston Marathon bombing helped determine that rumors of a follow-up attack in New York were baseless.

In addition to doubts about its effectiveness, many people argue that the collection of metadata is an unconstitutional invasion of privacy. As of 2015, the collection process remained legal and grounded in the ruling from Smith v. Maryland (1979). A prominent opponent of the data collection and its legality is U.S. District Judge Richard J. Leon, who issued a ruling in 2013 in which he stated: "I cannot imagine a more 'indiscriminate' and 'arbitrary invasion' than this systematic and high tech collection and retention of personal data on virtually every single citizen for purposes of querying and analyzing it without prior judicial approval...Surely, such a program infringes on 'that degree of privacy' that the founders enshrined in the Fourth Amendment".

On May 7, 2015, the United States Court of Appeals for the Second Circuit ruled that the interpretation of Section 215 of the Patriot Act had been wrong and that the NSA program that had been collecting Americans' phone records in bulk was illegal. It stated that Section 215 could not be clearly interpreted to allow the government to collect national phone data; the provision expired on June 1, 2015. This ruling "is the first time a higher-level court in the regular judicial system has reviewed the N.S.A. phone records program." The replacement law, the USA Freedom Act, enables the NSA to continue to have bulk access to citizens' metadata, with the stipulation that the data are now stored by the companies themselves. This change does not affect other Agency procedures, outside of metadata collection, that have purportedly challenged Americans' Fourth Amendment rights, including Upstream collection, a set of techniques used by the Agency to collect and store Americans' data and communications directly from the Internet backbone.

Under the Upstream collection program, the NSA paid telecommunications companies hundreds of millions of dollars in order to collect data from them. While companies such as Google and Yahoo! claim that they do not provide "direct access" from their servers to the NSA unless under a court order, the NSA had access to users' emails, phone calls, and cellular data. Under this new ruling, telecommunications companies maintain bulk user metadata on their servers for at least 18 months, to be provided upon request to the NSA. This ruling made the mass storage of specific phone records at NSA datacenters illegal, but it did not rule on Section 215's constitutionality.

Fourth Amendment encroachment

In a declassified document it was revealed that 17,835 phone lines were on an improperly permitted "alert list" from 2006 to 2009 in breach of compliance, which tagged these phone lines for daily monitoring. Eleven percent of these monitored phone lines met the agency's legal standard for "reasonably articulable suspicion" (RAS).

The NSA tracks the locations of hundreds of millions of cellphones per day, allowing it to map people's movements and relationships in detail. The NSA has been reported to have access to all communications made via Google, Microsoft, Facebook, Yahoo, YouTube, AOL, Skype, Apple and Paltalk, and collects hundreds of millions of contact lists from personal email and instant messaging accounts each year. It has also managed to weaken much of the encryption used on the Internet (by collaborating with, coercing or otherwise infiltrating numerous technology companies to leave "backdoors" into their systems), so that the majority of encryption is inadvertently vulnerable to different forms of attack.

Domestically, the NSA has been proven to collect and store metadata records of phone calls, including over 120 million US Verizon subscribers, as well as intercept vast amounts of communications via the internet (Upstream). The government's legal standing had been to rely on a secret interpretation of the Patriot Act whereby the entirety of US communications may be considered "relevant" to a terrorism investigation if it is expected that even a tiny minority may relate to terrorism. The NSA also supplies foreign intercepts to the DEA, IRS and other law enforcement agencies, who use these to initiate criminal investigations. Federal agents are then instructed to "recreate" the investigative trail via parallel construction.

The NSA also spies on influential Muslims to obtain information that could be used to discredit them, such as their use of pornography. The targets, both domestic and abroad, are not suspected of any crime but hold religious or political views deemed "radical" by the NSA.

According to a report in The Washington Post in July 2014, relying on information provided by Snowden, 90% of those placed under surveillance in the U.S. are ordinary Americans, and are not the intended targets. The newspaper said it had examined documents including emails, text messages, and online accounts that support the claim.

Congressional oversight

Excerpt of James Clapper's testimony before the Senate Select Committee on Intelligence

Despite White House claims that these programs have congressional oversight, many members of Congress were unaware of the existence of these NSA programs or the secret interpretation of the Patriot Act, and have consistently been denied access to basic information about them. The United States Foreign Intelligence Surveillance Court, the secret court charged with regulating the NSA's activities, is, according to its chief judge, incapable of investigating or verifying how often the NSA breaks even its own secret rules. It has since been reported that the NSA violated its own rules on data access thousands of times a year, many of these violations involving large-scale data interceptions. NSA officers have even used data intercepts to spy on love interests; "most of the NSA violations were self-reported, and each instance resulted in administrative action of termination."

The NSA has "generally disregarded the special rules for disseminating United States person information" by illegally sharing its intercepts with other law enforcement agencies. A March 2009 FISA Court opinion, which the court released, states that protocols restricting data queries had been "so frequently and systemically violated that it can be fairly said that this critical element of the overall ... regime has never functioned effectively." In 2011 the same court noted that the "volume and nature" of the NSA's bulk foreign Internet intercepts was "fundamentally different from what the court had been led to believe". Email contact lists (including those of US citizens) are collected at numerous foreign locations to work around the illegality of doing so on US soil.

Legal opinions on the NSA's bulk collection program have differed. In mid-December 2013, U.S. District Judge Richard Leon ruled that the "almost-Orwellian" program likely violates the Constitution, and wrote, "I cannot imagine a more 'indiscriminate' and 'arbitrary invasion' than this systematic and high-tech collection and retention of personal data on virtually every single citizen for purposes of querying and analyzing it without prior judicial approval. Surely, such a program infringes on 'that degree of privacy' that the Founders enshrined in the Fourth Amendment. Indeed, I have little doubt that the author of our Constitution, James Madison, who cautioned us to beware 'the abridgement of freedom of the people by gradual and silent encroachments by those in power,' would be aghast."

Later that month, U.S. District Judge William Pauley ruled that the NSA's collection of telephone records is legal and valuable in the fight against terrorism. In his opinion, he wrote, "a bulk telephony metadata collection program [is] a wide net that could find and isolate gossamer contacts among suspected terrorists in an ocean of seemingly disconnected data" and noted that a similar collection of data prior to 9/11 might have prevented the attack.

Official responses

At a March 2013 Senate Intelligence Committee hearing, Senator Ron Wyden asked Director of National Intelligence James Clapper, "does the NSA collect any type of data at all on millions or hundreds of millions of Americans?" Clapper replied, "No, sir. ... Not wittingly. There are cases where they could inadvertently perhaps collect, but not wittingly." This statement came under scrutiny months later when, in June 2013, details of the PRISM surveillance program were published, showing that "the NSA apparently can gain access to the servers of nine Internet companies for a wide range of digital data." Wyden said that Clapper had failed to give a "straight answer" in his testimony. Clapper, in response to criticism, said, "I responded in what I thought was the most truthful, or least untruthful manner." Clapper added, "There are honest differences on the semantics of what -- when someone says 'collection' to me, that has a specific meaning, which may have a different meaning to him."

NSA whistle-blower Edward Snowden additionally revealed the existence of XKeyscore, a top secret NSA program that allows the agency to search vast databases of "the metadata as well as the content of emails and other internet activity, such as browser history," with capability to search by "name, telephone number, IP address, keywords, the language in which the internet activity was conducted or the type of browser used." XKeyscore "provides the technological capability, if not the legal authority, to target even US persons for extensive electronic surveillance without a warrant provided that some identifying information, such as their email or IP address, is known to the analyst."

Regarding the necessity of these NSA programs, Alexander stated on June 27, 2013 that the NSA's bulk phone and Internet intercepts had been instrumental in preventing 54 terrorist "events", including 13 in the US, and that in all but one of these cases they had provided the initial tip to "unravel the threat stream". On July 31, NSA Deputy Director John Inglis conceded to the Senate that these intercepts had not been vital in stopping any terrorist attacks, but were "close" to vital in identifying and convicting four San Diego men for sending US$8,930 to Al-Shabaab, a militia that conducts terrorism in Somalia.

The U.S. government has aggressively sought to dismiss and challenge Fourth Amendment cases raised against it, and has granted retroactive immunity to ISPs and telecoms participating in domestic surveillance.

The U.S. military has acknowledged blocking access to parts of The Guardian website for thousands of defense personnel across the country, and blocking the entire Guardian website for personnel stationed throughout Afghanistan, the Middle East, and South Asia.

An October 2014 United Nations report condemned mass surveillance by the United States and other countries as violating multiple international treaties and conventions that guarantee core privacy rights.

Responsibility for international ransomware attack

An exploit dubbed EternalBlue, which the hacker group The Shadow Brokers and whistleblower Edward Snowden claimed had been created by the NSA, was used in the unprecedented worldwide WannaCry ransomware attack in May 2017. The Shadow Brokers had leaked the exploit online nearly a month prior to the attack. A number of experts have pointed the finger at the NSA's non-disclosure of the underlying vulnerability, and its loss of control over the EternalBlue attack tool that exploited it. Edward Snowden said that if the NSA had "privately disclosed the flaw used to attack hospitals when they found it, not when they lost it, [the attack] might not have happened". Wikipedia co-founder Jimmy Wales stated that he joined "with Microsoft and the other leaders of the industry in saying this is a huge screw-up by the government ... the moment the NSA found it, they should have notified Microsoft so they could quietly issue a patch and really chivvy people along, long before it became a huge problem."

 

Data mining

From Wikipedia, the free encyclopedia
 
Data mining is a process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems. Data mining is an interdisciplinary subfield of computer science and statistics with an overall goal to extract information (with intelligent methods) from a data set and transform the information into a comprehensible structure for further use. Data mining is the analysis step of the "knowledge discovery in databases" process, or KDD. Aside from the raw analysis step, it also involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.

The term "data mining" is a misnomer, because the goal is the extraction of patterns and knowledge from large amounts of data, not the extraction (mining) of data itself. It also is a buzzword and is frequently applied to any form of large-scale data or information processing (collection, extraction, warehousing, analysis, and statistics) as well as any application of computer decision support system, including artificial intelligence (e.g., machine learning) and business intelligence. The book Data mining: Practical machine learning tools and techniques with Java (which covers mostly machine learning material) was originally to be named just Practical machine learning, and the term data mining was only added for marketing reasons. Often the more general terms (large scale) data analysis and analytics—or, when referring to actual methods, artificial intelligence and machine learning—are more appropriate.

The actual data mining task is the semi-automatic or automatic analysis of large quantities of data to extract previously unknown, interesting patterns such as groups of data records (cluster analysis), unusual records (anomaly detection), and dependencies (association rule mining, sequential pattern mining). This usually involves using database techniques such as spatial indices. These patterns can then be seen as a kind of summary of the input data, and may be used in further analysis or, for example, in machine learning and predictive analytics. For example, the data mining step might identify multiple groups in the data, which can then be used to obtain more accurate prediction results by a decision support system. Neither the data collection, data preparation, nor result interpretation and reporting is part of the data mining step, but do belong to the overall KDD process as additional steps.
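As a concrete illustration of the cluster-analysis task mentioned above (finding groups of data records), the sketch below runs k-means with scikit-learn on synthetic two-dimensional data; the data set and parameters are arbitrary.

```python
# Minimal cluster-analysis example: k-means on synthetic 2-D data
# (pip install scikit-learn). Data and parameters are arbitrary.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster sizes:", [int((kmeans.labels_ == k).sum()) for k in range(3)])
print("cluster centers:\n", kmeans.cluster_centers_.round(2))
```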

The difference between data analysis and data mining is that data analysis is used to test models and hypotheses on the dataset, e.g., analyzing the effectiveness of a marketing campaign, regardless of the amount of data; in contrast, data mining uses machine learning and statistical models to uncover clandestine or hidden patterns in a large volume of data.

The related terms data dredging, data fishing, and data snooping refer to the use of data mining methods to sample parts of a larger population data set that are (or may be) too small for reliable statistical inferences to be made about the validity of any patterns discovered. These methods can, however, be used in creating new hypotheses to test against the larger data populations.

Etymology

In the 1960s, statisticians and economists used terms like data fishing or data dredging to refer to what they considered the bad practice of analyzing data without an a-priori hypothesis. The term "data mining" was used in a similarly critical way by economist Michael Lovell in an article published in the Review of Economic Studies in 1983. Lovell indicates that the practice "masquerades under a variety of aliases, ranging from 'experimentation' (positive) to 'fishing' or 'snooping' (negative)".

The term data mining appeared around 1990 in the database community, generally with positive connotations. For a short time in the 1980s, the phrase "database mining"™ was used, but because it had been trademarked by HNC, a San Diego-based company, to pitch their Database Mining Workstation, researchers consequently turned to data mining. Other terms used include data archaeology, information harvesting, information discovery, knowledge extraction, etc. Gregory Piatetsky-Shapiro coined the term "knowledge discovery in databases" for the first workshop on the same topic (KDD-1989), and this term became more popular in the AI and machine learning community. However, the term data mining became more popular in the business and press communities. Currently, the terms data mining and knowledge discovery are used interchangeably.

In the academic community, the major forums for research began in 1995 when the First International Conference on Data Mining and Knowledge Discovery (KDD-95) was held in Montreal under AAAI sponsorship. It was co-chaired by Usama Fayyad and Ramasamy Uthurusamy. A year later, in 1996, Usama Fayyad launched the Kluwer journal Data Mining and Knowledge Discovery as its founding editor-in-chief. Later he started the SIGKDD newsletter SIGKDD Explorations. The KDD International Conference became the primary, highest-quality conference in data mining, with an acceptance rate of research paper submissions below 18%. The journal Data Mining and Knowledge Discovery is the primary research journal of the field.

Background

The manual extraction of patterns from data has occurred for centuries. Early methods of identifying patterns in data include Bayes' theorem (1700s) and regression analysis (1800s). The proliferation, ubiquity and increasing power of computer technology have dramatically increased data collection, storage, and manipulation ability. As data sets have grown in size and complexity, direct "hands-on" data analysis has increasingly been augmented with indirect, automated data processing, aided by other discoveries in computer science, especially in the field of machine learning, such as neural networks, cluster analysis, genetic algorithms (1950s), decision trees and decision rules (1960s), and support vector machines (1990s). Data mining is the process of applying these methods with the intention of uncovering hidden patterns in large data sets. It bridges the gap from applied statistics and artificial intelligence (which usually provide the mathematical background) to database management by exploiting the way data is stored and indexed in databases to execute the actual learning and discovery algorithms more efficiently, allowing such methods to be applied to ever-larger data sets.

Process

The knowledge discovery in databases (KDD) process is commonly defined with the stages:

  1. Selection
  2. Pre-processing
  3. Transformation
  4. Data mining
  5. Interpretation/evaluation.

There are, however, many variations on this theme, such as the Cross-industry standard process for data mining (CRISP-DM), which defines six phases:

  1. Business understanding
  2. Data understanding
  3. Data preparation
  4. Modeling
  5. Evaluation
  6. Deployment

or a simplified process such as (1) Pre-processing, (2) Data Mining, and (3) Results Validation.
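A minimal end-to-end sketch of the simplified three-stage process, using pandas and scikit-learn on an invented data set; the stage boundaries are marked in comments and every column name and threshold is an illustrative assumption.

```python
# Sketch of the simplified three-stage process on a synthetic dataset:
# (1) pre-processing, (2) data mining, (3) results validation.
# All data, column names and thresholds are invented for illustration.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# (1) Pre-processing: assemble a target data set and clean it.
rng = np.random.default_rng(0)
df = pd.DataFrame({"age": rng.integers(18, 80, 500).astype(float),
                   "income": rng.normal(50_000, 15_000, 500)})
df["bought"] = ((df["income"] > 55_000) | (df["age"] < 30)).astype(int)
df.loc[rng.choice(500, 20, replace=False), "income"] = np.nan   # simulate gaps
df = df.dropna()                                                # clean

# (2) Data mining: learn a pattern (here, a small classification tree).
X_train, X_test, y_train, y_test = train_test_split(
    df[["age", "income"]], df["bought"], test_size=0.3, random_state=0)
model = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)

# (3) Results validation: check the pattern on data the model never saw.
print("held-out accuracy:", round(accuracy_score(y_test, model.predict(X_test)), 3))
```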

Polls conducted in 2002, 2004, 2007 and 2014 show that the CRISP-DM methodology is the leading methodology used by data miners. The only other data mining standard named in these polls was SEMMA. However, 3–4 times as many people reported using CRISP-DM. Several teams of researchers have published reviews of data mining process models, and Azevedo and Santos conducted a comparison of CRISP-DM and SEMMA in 2008.

Pre-processing

Before data mining algorithms can be used, a target data set must be assembled. As data mining can only uncover patterns actually present in the data, the target data set must be large enough to contain these patterns while remaining concise enough to be mined within an acceptable time limit. A common source for data is a data mart or data warehouse. Pre-processing is essential to analyze the multivariate data sets before data mining. The target set is then cleaned. Data cleaning removes the observations containing noise and those with missing data.
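A small pandas sketch of the cleaning step described here, assembling a target set and dropping observations with missing or noisy values; the column names, sentinel value and records are invented.

```python
# Pre-processing sketch: assemble a target set and drop noisy/missing rows.
# Column names and values are invented for illustration.
import numpy as np
import pandas as pd

raw = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5],
    "monthly_spend": [120.0, np.nan, 85.5, 430.0, -1.0],   # -1.0 = sentinel noise
    "region": ["north", "south", None, "east", "west"],
})

target = (raw
          .replace({"monthly_spend": {-1.0: np.nan}})   # treat the sentinel as missing
          .dropna()                                     # remove incomplete records
          .reset_index(drop=True))
print(target)
```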

Data mining

Data mining involves six common classes of tasks:

  • Anomaly detection (outlier/change/deviation detection) – The identification of unusual data records that might be interesting, or data errors that require further investigation.
  • Association rule learning (dependency modeling) – Searches for relationships between variables. For example, a supermarket might gather data on customer purchasing habits. Using association rule learning, the supermarket can determine which products are frequently bought together and use this information for marketing purposes. This is sometimes referred to as market basket analysis (a minimal sketch follows this list).
  • Clustering – is the task of discovering groups and structures in the data that are in some way or another "similar", without using known structures in the data.
  • Classification – is the task of generalizing known structure to apply to new data. For example, an e-mail program might attempt to classify an e-mail as "legitimate" or as "spam".
  • Regression – attempts to find a function that models the data with the least error; that is, for estimating the relationships among data or datasets.
  • Summarization – providing a more compact representation of the data set, including visualization and report generation.
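The market-basket idea in the association-rule item above can be illustrated with a tiny from-scratch computation of support and confidence over a handful of invented transactions; production systems use algorithms such as Apriori or FP-growth on far larger data.

```python
# Tiny association-rule sketch: compute support and confidence for item pairs
# over invented supermarket transactions (no external libraries needed).
from itertools import permutations

transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"beer", "chips"},
    {"bread", "milk"},
    {"beer", "chips", "bread"},
]
n = len(transactions)
items = set().union(*transactions)

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / n

# Rules of the form {a} -> {b}: confidence = support({a, b}) / support({a}).
for a, b in permutations(items, 2):
    s_ab = support({a, b})
    if s_ab >= 0.4:                      # minimum-support threshold (arbitrary)
        conf = s_ab / support({a})
        print(f"{{{a}}} -> {{{b}}}: support={s_ab:.2f}, confidence={conf:.2f}")
```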

Results validation

An example of data produced by data dredging through a bot operated by statistician Tyler Vigen, apparently showing a close link between the winning word of a spelling bee competition and the number of people in the United States killed by venomous spiders. The similarity in trends is obviously a coincidence.

Data mining can unintentionally be misused and can then produce results that appear to be significant but that do not actually predict future behavior, cannot be reproduced on a new sample of data, and bear little use. Often this results from investigating too many hypotheses and not performing proper statistical hypothesis testing. A simple version of this problem in machine learning is known as overfitting, but the same problem can arise at different phases of the process, and thus a train/test split (when applicable at all) may not be sufficient to prevent this from happening.

The final step of knowledge discovery from data is to verify that the patterns produced by the data mining algorithms occur in the wider data set. Not all patterns found by data mining algorithms are necessarily valid. It is common for data mining algorithms to find patterns in the training set which are not present in the general data set. This is called overfitting. To overcome this, the evaluation uses a test set of data on which the data mining algorithm was not trained. The learned patterns are applied to this test set, and the resulting output is compared to the desired output. For example, a data mining algorithm trying to distinguish "spam" from "legitimate" emails would be trained on a training set of sample e-mails. Once trained, the learned patterns would be applied to the test set of e-mails on which it had not been trained. The accuracy of the patterns can then be measured from how many e-mails they correctly classify. Several statistical methods may be used to evaluate the algorithm, such as ROC curves.

If the learned patterns do not meet the desired standards, it is necessary to re-evaluate and change the pre-processing and data mining steps. If the learned patterns do meet the desired standards, then the final step is to interpret the learned patterns and turn them into knowledge.
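The overfitting check described above can be made concrete: train a deliberately unconstrained model, compare its accuracy on the training set with its accuracy on a held-out test set, and score the held-out predictions with a ROC curve summary (AUC). The data below is synthetic and the parameters are arbitrary.

```python
# Overfitting check: an unconstrained tree memorizes the training set but
# generalizes worse; the held-out set and ROC AUC expose the gap.
# Synthetic data; all parameters are arbitrary.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, roc_auc_score

X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           flip_y=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)   # no depth limit

print("train accuracy:", round(accuracy_score(y_tr, model.predict(X_tr)), 3))
print("test  accuracy:", round(accuracy_score(y_te, model.predict(X_te)), 3))
print("test  ROC AUC :", round(roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]), 3))
```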

Research

The premier professional body in the field is the Association for Computing Machinery's (ACM) Special Interest Group (SIG) on Knowledge Discovery and Data Mining (SIGKDD). Since 1989, this ACM SIG has hosted an annual international conference and published its proceedings, and since 1999 it has published a biannual academic journal titled "SIGKDD Explorations".

Computer science conferences on data mining include:

Data mining topics are also present at many data management/database conferences such as the ICDE Conference, SIGMOD Conference, and the International Conference on Very Large Data Bases.

Standards

There have been some efforts to define standards for the data mining process, for example, the 1999 European Cross Industry Standard Process for Data Mining (CRISP-DM 1.0) and the 2004 Java Data Mining standard (JDM 1.0). Development on successors to these processes (CRISP-DM 2.0 and JDM 2.0) was active in 2006 but has stalled since. JDM 2.0 was withdrawn without reaching a final draft.

For exchanging the extracted models—in particular for use in predictive analytics—the key standard is the Predictive Model Markup Language (PMML), which is an XML-based language developed by the Data Mining Group (DMG) and supported as exchange format by many data mining applications. As the name suggests, it only covers prediction models, a particular data mining task of high importance to business applications. However, extensions to cover (for example) subspace clustering have been proposed independently of the DMG.

Notable uses

Data mining is used wherever there is digital data available today. Notable examples of data mining can be found throughout business, medicine, science, and surveillance.

Privacy concerns and ethics

While the term "data mining" itself may have no ethical implications, it is often associated with the mining of information in relation to peoples' behavior (ethical and otherwise).

The ways in which data mining can be used can in some cases and contexts raise questions regarding privacy, legality, and ethics. In particular, data mining government or commercial data sets for national security or law enforcement purposes, such as in the Total Information Awareness Program or in ADVISE, has raised privacy concerns.

Data mining requires data preparation which can uncover information or patterns that may compromise confidentiality and privacy obligations. A common way for this to occur is through data aggregation. Data aggregation involves combining data together (possibly from various sources) in a way that facilitates analysis (but that also might make identification of private, individual-level data deducible or otherwise apparent). This is not data mining per se, but a result of the preparation of data before, and for the purposes of, the analysis. The threat to an individual's privacy comes into play when the data, once compiled, cause the data miner, or anyone who has access to the newly compiled data set, to be able to identify specific individuals, especially when the data were originally anonymous.

It is recommended to be aware of the following before data are collected:

  • The purpose of the data collection and any (known) data mining projects;
  • How the data will be used;
  • Who will be able to mine the data and use the data and their derivatives;
  • The status of security surrounding access to the data;
  • How collected data can be updated.

Data may also be modified so as to become anonymous, so that individuals may not readily be identified. However, even "anonymized" data sets can potentially contain enough information to allow identification of individuals, as occurred when journalists were able to find several individuals based on a set of search histories that were inadvertently released by AOL.
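One simple way to check whether an "anonymized" table still carries re-identification risk is to count how many records share each combination of quasi-identifiers, a basic k-anonymity measure. The pandas sketch below uses invented records and is a simplistic check, not a privacy guarantee.

```python
# Basic k-anonymity check: how many records share each combination of
# quasi-identifiers? Records are invented; a fuller analysis would also
# consider sensitive-attribute diversity and other risks.
import pandas as pd

records = pd.DataFrame({
    "zip":       ["21240", "21240", "21240", "20850", "20850"],
    "age":       [34, 34, 35, 52, 52],
    "gender":    ["F", "F", "F", "M", "M"],
    "diagnosis": ["flu", "asthma", "flu", "diabetes", "flu"],   # sensitive attribute
})

quasi = ["zip", "age", "gender"]
group_sizes = records.groupby(quasi).size()
k = int(group_sizes.min())
print(group_sizes)
print(f"\nsmallest equivalence class has {k} record(s); "
      f"the table is {k}-anonymous over {quasi}")
```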

The inadvertent revelation of personally identifiable information about the individuals who provided the data violates Fair Information Practices. This indiscretion can cause financial, emotional, or bodily harm to the indicated individual. In one instance of privacy violation, the patrons of Walgreens filed a lawsuit against the company in 2011 for selling prescription information to data mining companies, who in turn provided the data to pharmaceutical companies.

Situation in Europe

Europe has rather strong privacy laws, and efforts are underway to further strengthen the rights of the consumers. However, the U.S.–E.U. Safe Harbor Principles, developed between 1998 and 2000, currently effectively expose European users to privacy exploitation by U.S. companies. As a consequence of Edward Snowden's global surveillance disclosure, there has been increased discussion to revoke this agreement, as in particular the data will be fully exposed to the National Security Agency, and attempts to reach an agreement with the United States have failed.

In the United Kingdom in particular there have been cases of corporations using data mining as a way to target certain groups of customers forcing them to pay unfairly high prices. These groups tend to be people of lower socio-economic status who are not savvy to the ways they can be exploited in digital market places.

Situation in the United States

In the United States, privacy concerns have been addressed by the US Congress via the passage of regulatory controls such as the Health Insurance Portability and Accountability Act (HIPAA). HIPAA requires individuals to give their "informed consent" regarding information they provide and its intended present and future uses. According to an article in Biotech Business Week, "'[i]n practice, HIPAA may not offer any greater protection than the longstanding regulations in the research arena,' says the AAHC. More importantly, the rule's goal of protection through informed consent is approach[ing] a level of incomprehensibility to average individuals." This underscores the necessity for data anonymity in data aggregation and mining practices.

U.S. information privacy legislation such as HIPAA and the Family Educational Rights and Privacy Act (FERPA) applies only to the specific areas that each such law addresses. The use of data mining by the majority of businesses in the U.S. is not controlled by any legislation.

Copyright law

Situation in Europe

Under European copyright and database laws, the mining of in-copyright works (such as by web mining) without the permission of the copyright owner is not legal. Where a database is pure data in Europe, it may be that there is no copyright, but database rights may exist, so data mining becomes subject to intellectual property owners' rights that are protected by the Database Directive. On the recommendation of the Hargreaves review, the UK government amended its copyright law in 2014 to allow content mining as a limitation and exception. The UK was the second country in the world to do so, after Japan, which introduced an exception in 2009 for data mining. However, due to the restrictions of the Information Society Directive (2001), the UK exception only allows content mining for non-commercial purposes. UK copyright law also does not allow this provision to be overridden by contractual terms and conditions.

The European Commission facilitated stakeholder discussion on text and data mining in 2013, under the title Licences for Europe. The focus on licensing, rather than limitations and exceptions, as the solution to this legal issue led representatives of universities, researchers, libraries, civil society groups and open access publishers to leave the stakeholder dialogue in May 2013.

Situation in the United States

US copyright law, and in particular its provision for fair use, upholds the legality of content mining in America, as do the laws of other fair use countries such as Israel, Taiwan and South Korea. As content mining is transformative, that is, it does not supplant the original work, it is viewed as lawful under fair use. For example, as part of the Google Book settlement, the presiding judge on the case ruled that Google's digitization project of in-copyright books was lawful, in part because of the transformative uses that the digitization project displayed, one being text and data mining.

Software

Free open-source data mining software and applications

The following applications are available under free/open-source licenses. Public access to application source code is also available.

  • Carrot2: Text and search results clustering framework.
  • Chemicalize.org: A chemical structure miner and web search engine.
  • ELKI: A university research project with advanced cluster analysis and outlier detection methods written in the Java language.
  • GATE: a natural language processing and language engineering tool.
  • KNIME: The Konstanz Information Miner, a user-friendly and comprehensive data analytics framework.
  • Massive Online Analysis (MOA): A real-time big data stream mining tool with concept-drift support, written in the Java programming language.
  • MEPX: A cross-platform tool for regression and classification problems based on a Genetic Programming variant.
  • ML-Flex: A software package that enables users to integrate with third-party machine-learning packages written in any programming language, execute classification analyses in parallel across multiple computing nodes, and produce HTML reports of classification results.
  • mlpack: a collection of ready-to-use machine learning algorithms written in the C++ language.
  • NLTK (Natural Language Toolkit): A suite of libraries and programs for symbolic and statistical natural language processing (NLP) for the Python language.
  • OpenNN: Open neural networks library.
  • Orange: A component-based data mining and machine learning software suite written in the Python language.
  • R: A programming language and software environment for statistical computing, data mining, and graphics. It is part of the GNU Project.
  • scikit-learn: An open-source machine learning library for the Python programming language.
  • Torch: An open-source deep learning library for the Lua programming language and scientific computing framework with wide support for machine learning algorithms.
  • UIMA: The UIMA (Unstructured Information Management Architecture) is a component framework for analyzing unstructured content such as text, audio and video – originally developed by IBM.
  • Weka: A suite of machine learning software applications written in the Java programming language.

Proprietary data-mining software and applications

The following applications are available under proprietary licenses.

 

Educational data mining

From Wikipedia, the free encyclopedia

Educational data mining (EDM) describes a research field concerned with the application of data mining, machine learning and statistics to information generated from educational settings (e.g., universities and intelligent tutoring systems). At a high level, the field seeks to develop and improve methods for exploring this data, which often has multiple levels of meaningful hierarchy, in order to discover new insights about how people learn in the context of such settings. In doing so, EDM has contributed to theories of learning investigated by researchers in educational psychology and the learning sciences. The field is closely tied to that of learning analytics, and the two have been compared and contrasted.

Definition

Educational data mining refers to techniques, tools, and research designed for automatically extracting meaning from large repositories of data generated by or related to people's learning activities in educational settings. Quite often, this data is extensive, fine-grained, and precise. For example, several learning management systems (LMSs) track information such as when each student accessed each learning object, how many times they accessed it, and how many minutes the learning object was displayed on the user's computer screen. As another example, intelligent tutoring systems record data every time a learner submits a solution to a problem. They may collect the time of the submission, whether or not the solution matches the expected solution, the amount of time that has passed since the last submission, the order in which solution components were entered into the interface, etc. The precision of this data is such that even a fairly short session with a computer-based learning environment (e.g. 30 minutes) may produce a large amount of process data for analysis.

In other cases, the data is less fine-grained. For example, a student's university transcript may contain a temporally ordered list of courses taken by the student, the grade that the student earned in each course, and when the student selected or changed his or her academic major. EDM leverages both types of data to discover meaningful information about different types of learners and how they learn, the structure of domain knowledge, and the effect of instructional strategies embedded within various learning environments. These analyses provide new information that would be difficult to discern by looking at the raw data. For example, analyzing data from an LMS may reveal a relationship between the learning objects that a student accessed during the course and their final course grade. Similarly, analyzing student transcript data may reveal a relationship between a student's grade in a particular course and their decision to change their academic major. Such information provides insight into the design of learning environments, which allows students, teachers, school administrators, and educational policy makers to make informed decisions about how to interact with, provide, and manage educational resources.
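
As a minimal illustration of the LMS example above, the following sketch uses pandas with invented column names and data to relate how often students accessed learning objects to their final course grades through a simple correlation.

import pandas as pd

# Hypothetical LMS export: access counts per (student, learning object),
# plus each student's final grade on a 0-100 scale.
access = pd.DataFrame({
    "student":  ["s01", "s01", "s02", "s02", "s03", "s03"],
    "object":   ["video1", "quiz1", "video1", "quiz1", "video1", "quiz1"],
    "accesses": [5, 3, 1, 0, 8, 6],
})
grades = pd.DataFrame({"student": ["s01", "s02", "s03"],
                       "final_grade": [82, 60, 91]})

# Total accesses per student, joined with grades.
totals = access.groupby("student")["accesses"].sum().reset_index()
merged = totals.merge(grades, on="student")

# A positive correlation would suggest a relationship between engagement
# with learning objects and the course outcome.
print(round(merged["accesses"].corr(merged["final_grade"]), 3))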

History

While the analysis of educational data is not itself a new practice, recent advances in educational technology, including the increase in computing power and the ability to log fine-grained data about students' use of a computer-based learning environment, have led to an increased interest in developing techniques for analyzing the large amounts of data generated in educational settings. This interest translated into a series of EDM workshops held from 2000 to 2007 as part of several international research conferences. In 2008, a group of researchers established what has become an annual international research conference on EDM, the first of which took place in Montreal, Quebec, Canada.

As interest in EDM continued to increase, EDM researchers established an academic journal in 2009, the Journal of Educational Data Mining, for sharing and disseminating research results. In 2011, EDM researchers established the International Educational Data Mining Society to connect EDM researchers and continue to grow the field.

The introduction of public educational data repositories in 2008, such as the Pittsburgh Science of Learning Center's (PSLC) DataShop and the National Center for Education Statistics (NCES), has made educational data mining more accessible and feasible, contributing to its growth.

Goals

Ryan S. Baker and Kalina Yacef identified the following four goals of EDM:

  1. Predicting students' future learning behavior – With the use of student modeling, this goal can be achieved by creating student models that incorporate the learner's characteristics, including detailed information such as their knowledge, behaviors, and motivation to learn. The learner's user experience and overall satisfaction with learning are also measured. (A minimal student-model sketch follows this list.)
  2. Discovering or improving domain models – Through the various methods and applications of EDM, discovery of new and improvements to existing models is possible. Examples include illustrating the educational content to engage learners and determining optimal instructional sequences to support the student's learning style.
  3. Studying the effects of educational support that can be achieved through learning systems.
  4. Advancing scientific knowledge about learning and learners by building and incorporating student models, the field of EDM research and the technology and software used.
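
One widely used family of student models in EDM is Bayesian Knowledge Tracing, which estimates the probability that a learner has mastered a skill from a sequence of correct and incorrect responses. The sketch below is a minimal version with illustrative parameter values; it is not tied to any particular system or study cited here.

def bkt_update(p_know, correct, p_guess=0.2, p_slip=0.1, p_transit=0.15):
    """One Bayesian Knowledge Tracing step: update the mastery estimate after a response."""
    if correct:
        evidence = p_know * (1 - p_slip) + (1 - p_know) * p_guess
        posterior = p_know * (1 - p_slip) / evidence
    else:
        evidence = p_know * p_slip + (1 - p_know) * (1 - p_guess)
        posterior = p_know * p_slip / evidence
    # Allow for the chance that the learner acquires the skill after this practice opportunity.
    return posterior + (1 - posterior) * p_transit

# Trace mastery over a short sequence of responses (True = correct).
p = 0.3  # prior probability that the skill is already known
for correct in [False, True, True, True]:
    p = bkt_update(p, correct)
    print(round(p, 3))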

Users and stakeholders

There are four main users and stakeholders involved with educational data mining. These include:

  • Learners – Learners are interested in understanding student needs and in methods to improve their experience and performance. For example, learners can benefit from the discovered knowledge by using EDM tools that suggest activities and resources based on their interactions with the online learning tool and on insights from past or similar learners. For younger learners, educational data mining can also inform parents about their child's learning progress. It is also necessary to effectively group learners in an online environment; the challenge is using the complex data to learn and interpret these groups by developing actionable models.
  • Educators – Educators attempt to understand the learning process and the methods they can use to improve their teaching. Educators can use EDM applications to determine how to organize and structure the curriculum, the best methods to deliver course information, and the tools to use to engage their learners for optimal learning outcomes. In particular, the distillation of data for human judgment technique gives educators an opportunity to benefit from EDM because it enables them to quickly identify behavioral patterns, which can support their teaching during the course or help improve future courses. Educators can determine indicators that show student satisfaction and engagement with course material, and also monitor learning progress.
  • Researchers – Researchers focus on the development and evaluation of data mining techniques for effectiveness. A yearly international conference for researchers began in 2008, followed by the establishment of the Journal of Educational Data Mining in 2009. Topics in EDM range from using data mining to improve institutional effectiveness to analyzing student performance.
  • Administrators – Administrators are responsible for allocating the resources for implementation in institutions. As institutions are increasingly held responsible for student success, the use of EDM applications is becoming more common in educational settings. Faculty and advisors are becoming more proactive in identifying and addressing at-risk students. However, it is sometimes a challenge to get the information to the decision makers who administer the application in a timely and efficient manner.

Phases

As research in the field of educational data mining has continued to grow, a myriad of data mining techniques have been applied to a variety of educational contexts. In each case, the goal is to translate raw data into meaningful information about the learning process in order to make better decisions about the design and trajectory of a learning environment. Thus, EDM generally consists of four phases:

  1. The first phase of the EDM process (not counting pre-processing) is discovering relationships in data. This involves searching through a repository of data from an educational environment with the goal of finding consistent relationships between variables. Several algorithms for identifying such relationships have been utilized, including classification, regression, clustering, factor analysis, social network analysis, association rule mining, and sequential pattern mining.
  2. Discovered relationships must then be validated in order to avoid overfitting.
  3. Validated relationships are applied to make predictions about future events in the learning environment.
  4. Predictions are used to support decision-making processes and policy decisions.

During phases 3 and 4, data is often visualized or in some other way distilled for human judgment. A large amount of research has been conducted on best practices for visualizing data.
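
A compact illustration of phases 1 to 3, using scikit-learn on synthetic data: fit a classifier that relates simple activity features to a pass/fail outcome, validate it with cross-validation to guard against overfitting, and then apply it to new learners. The feature names and data are invented for the sketch.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Phase 1 input: hypothetical per-student features (logins per week, forum posts, quiz average).
X = rng.normal(size=(200, 3))
# Synthetic outcome loosely driven by the quiz-average feature.
y = (X[:, 2] + 0.3 * rng.normal(size=200) > 0).astype(int)

model = LogisticRegression()

# Phase 2: validate the discovered relationship with cross-validation to avoid overfitting.
scores = cross_val_score(model, X, y, cv=5)
print("cross-validated accuracy:", round(scores.mean(), 3))

# Phase 3: apply the validated model to predict outcomes for new learners.
model.fit(X, y)
new_students = rng.normal(size=(3, 3))
print("predicted pass/fail:", model.predict(new_students))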

Main approaches

Of the general categories of methods mentioned, prediction, clustering, and relationship mining are considered universal methods across all types of data mining; by contrast, discovery with models and distillation of data for human judgment are considered more prominent approaches within educational data mining.

Discovery with models

In the discovery with models method, a model is developed via prediction, clustering, or knowledge engineering (human reasoning), and is then used as a component in another analysis, namely prediction or relationship mining. When used for prediction, the created model's predictions are used to predict a new variable. When used for relationship mining, the created model enables analysis of the relationships between its predictions and additional variables in the study. In many cases, discovery with models uses validated prediction models that have proven generalizability across contexts.

Key applications of this method include discovering relationships between student behaviors, characteristics and contextual variables in the learning environment. Further discovery of broad and specific research questions across a wide range of contexts can also be explored using this method.
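
A minimal sketch of this two-step pattern, with invented data and names: a prediction model first estimates an unobserved construct (here called "engagement") from log features, and its outputs are then related to another variable (final grade) in a second analysis. The engagement labels are assumed to come from human coding or a previously validated detector.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)

# Step 1: train an "engagement" model on a labeled subset of log features.
log_features = rng.normal(size=(300, 4))
engagement_labels = 0.7 * log_features[:, 0] + rng.normal(scale=0.3, size=300)
engagement_model = RandomForestRegressor(n_estimators=100, random_state=0)
engagement_model.fit(log_features[:100], engagement_labels[:100])

# Step 2: use the model's predictions as a variable in a new analysis,
# e.g. relationship mining between predicted engagement and final grades.
predicted_engagement = engagement_model.predict(log_features[100:])
final_grades = 60 + 10 * engagement_labels[100:] + rng.normal(scale=5, size=200)
print("correlation:", round(float(np.corrcoef(predicted_engagement, final_grades)[0, 1]), 3))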

Distillation of data for human judgment

Humans can make inferences about data that are beyond the scope of what automated data mining methods can provide. In educational data mining, data is distilled for human judgment for two key purposes: identification and classification.

For the purpose of identification, data is distilled to enable humans to identify well-known patterns, which may otherwise be difficult to interpret. For example, the learning curve, classic to educational studies, is a pattern that clearly reflects the relationship between learning and experience over time.

Data is also distilled for the purpose of classifying features of the data, which in educational data mining is used to support the development of prediction models. Such classification greatly expedites model development.

The goal of this method is to summarize and present the information in a useful, interactive and visually appealing way in order to understand the large amounts of education data and to support decision making. In particular, this method is beneficial to educators in understanding usage information and effectiveness in course activities. Key applications for the distillation of data for human judgment include identifying patterns in student learning, behavior, opportunities for collaboration and labeling data for future uses in prediction models.
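
As a sketch of distilling data into one of these classic visual forms, the code below plots a learning curve, average error rate against the number of practice opportunities, using fabricated aggregate data. Such a plot lets a human judge at a glance whether learning is occurring.

import numpy as np
import matplotlib.pyplot as plt

# Fabricated aggregate data: error rate tends to fall as practice opportunities increase.
opportunities = np.arange(1, 11)
error_rate = 0.6 * np.exp(-0.3 * opportunities) + 0.05

plt.plot(opportunities, error_rate, marker="o")
plt.xlabel("Practice opportunity")
plt.ylabel("Average error rate")
plt.title("Learning curve (illustrative data)")
plt.show()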

Applications

A list of the primary applications of EDM is provided by Cristobal Romero and Sebastian Ventura. In their taxonomy, the areas of EDM application are:

  • Analysis and visualization of data
  • Providing feedback for supporting instructors
  • Recommendations for students
  • Predicting student performance
  • Student modeling
  • Detecting undesirable student behaviors
  • Grouping students
  • Social network analysis
  • Developing concept maps
  • Constructing courseware – EDM can be applied to course management systems such as the open-source Moodle. Moodle contains usage data that includes various user activities, such as test results, the amount of reading completed, and participation in discussion forums. Data mining tools can be used to customize learning activities for each user and adapt the pace at which the student completes the course; this is particularly beneficial for online courses with varying levels of competency. (A clustering sketch follows this list.)
  • Planning and scheduling
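
As a sketch of the courseware example above (see the "Constructing courseware" item), the code below groups students by hypothetical Moodle-style usage counts with k-means clustering so that pacing or activities could be adapted per group. The column names, the data, and the three-cluster choice are all illustrative assumptions.

import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical usage data exported from a course management system.
usage = pd.DataFrame({
    "quizzes_taken": [10, 2, 9, 1, 7, 3],
    "readings_done": [25, 5, 30, 4, 18, 8],
    "forum_posts":   [12, 0, 15, 1, 6, 2],
}, index=["s01", "s02", "s03", "s04", "s05", "s06"])

# Standardize the features, then group students into activity-level clusters.
scaled = StandardScaler().fit_transform(usage)
usage["cluster"] = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scaled)

# Each cluster could be offered a different pace or a different set of activities.
print(usage)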

New research on mobile learning environments also suggests that data mining can be useful. Data mining can be used to help provide personalized content to mobile users, despite the differences in managing content between mobile devices and standard PCs and web browsers.

New EDM applications will focus on allowing non-technical users to use and engage with data mining tools and activities, making data collection and processing more accessible for all users of EDM. Examples include statistical and visualization tools that analyze social networks and their influence on learning outcomes and productivity.

Courses

  1. In October 2013, Coursera offered a free online course on "Big Data in Education" that taught how and when to use key methods for EDM. This course moved to edX in the summer of 2015, and has continued to run on edX annually since then. A course archive is now available online.
  2. Teachers College, Columbia University offers a MS in Learning Analytics.

Publication venues

Considerable amounts of EDM work are published at the peer-reviewed International Conference on Educational Data Mining, organized by the International Educational Data Mining Society.

EDM papers are also published in the Journal of Educational Data Mining (JEDM).

Many EDM papers are routinely published in related conferences, such as Artificial Intelligence and Education, Intelligent Tutoring Systems, and User Modeling, Adaptation, and Personalization.

In 2011, Chapman & Hall/CRC Press, Taylor and Francis Group published the first Handbook of Educational Data Mining. This resource was created for those who are interested in participating in the educational data mining community.

Contests

In 2010, the Association for Computing Machinery's KDD Cup was conducted using data from an educational setting. The data set was provided by the Pittsburgh Science of Learning Center's DataShop, and it consisted of over 1,000,000 data points from students using a cognitive tutor. Six hundred teams competed for over 8,000 USD in prize money (which was donated by Facebook). The goal for contestants was to design an algorithm that, after learning from the provided data, would make the most accurate predictions from new data. The winners submitted an algorithm that utilized feature generation (a form of representation learning), random forests, and Bayesian networks.
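
In the spirit of that winning approach, though far simpler, the sketch below builds a few hand-crafted features from synthetic tutor-log data and trains a random forest to predict whether a student's next step will be correct. All names and data are invented for illustration.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
n = 1000

# Generated features: prior success rate on the skill, number of prior attempts,
# and seconds spent on the previous step.
X = np.column_stack([
    rng.uniform(0, 1, n),     # prior success rate
    rng.integers(0, 20, n),   # prior attempts
    rng.exponential(30, n),   # time on previous step (seconds)
])
# Synthetic target loosely tied to the prior success rate.
y = (X[:, 0] + 0.2 * rng.normal(size=n) > 0.5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print("held-out accuracy:", round(clf.score(X_test, y_test), 3))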

Costs and challenges

Along with technological advancements are costs and challenges associated with implementing EDM applications. These include the costs to store logged data and the cost associated with hiring staff dedicated to managing data systems. Moreover, data systems may not always integrate seamlessly with one another and even with the support of statistical and visualization tools, creating one simplified version of the data can be difficult. Furthermore, choosing which data to mine and analyze can also be challenging, making the initial stages very time consuming and labor-intensive. From beginning to end, the EDM strategy and implementation requires one to uphold privacy and ethics for all stakeholders involved.

Criticisms

  • Generalizability – Research in EDM may be specific to the particular educational setting and time in which the research was conducted, and as such, may not be generalizable to other institutions. Research also indicates that the field of educational data mining is concentrated in western countries and cultures and subsequently, other countries and cultures may not be represented in the research and findings. Development of future models should consider applications across multiple contexts.
  • Privacy – Individual privacy is a continued concern for the application of data mining tools. With free, accessible and user-friendly tools in the market, students and their families may be at risk from the information that learners provide to the learning system, in hopes to receive feedback that will benefit their future performance. As users become savvy in their understanding of online privacy, administrators of educational data mining tools need to be proactive in protecting the privacy of their users and be transparent about how and with whom the information will be used and shared. Development of EDM tools should consider protecting individual privacy while still advancing the research in this field.
  • Plagiarism – Plagiarism detection is an ongoing challenge for educators and faculty whether in the classroom or online. However, due to the complexities associated with detecting and preventing digital plagiarism in particular, educational data mining tools are not currently sophisticated enough to accurately address this issue. Thus, the development of predictive capability in plagiarism-related issues should be an area of focus in future research.
  • Adoption – It is unknown how widespread the adoption of EDM is and the extent to which institutions have applied and considered implementing an EDM strategy. As such, it is unclear whether there are any barriers that prevent users from adopting EDM in their educational settings.

Operator (computer programming)

From Wikipedia, the free encyclopedia