Accurate, Focused Research on Law, Technology and Knowledge Discovery Since 2002

Category Archives: Search Engines

Investigative Report – NSA created ‘google-like search’ engine – shared access with other agencies

Data available through ICREACH appears to be primarily derived from surveillance of foreigners’ communications, and planning documents show that it draws on a variety of different sources of data maintained by the NSA. Though one 2010 internal paper clearly calls it “the ICREACH database,” a U.S. official familiar with the system disputed that, telling The Intercept that while “it enables the sharing of certain foreign intelligence metadata,” ICREACH is “not a repository [and] does not store events or records.” Instead, it appears to provide analysts with the ability to perform a one-stop search of information from a wide variety of separate databases. In a statement to The Intercept, the Office of the Director of National Intelligence confirmed that the system shares data that is swept up by programs authorized under Executive Order 12333, a controversial Reagan-era presidential directive that underpins several NSA bulk surveillance operations that monitor communications overseas. The 12333 surveillance takes place with no court oversight and has received minimal Congressional scrutiny because it is targeted at foreign, not domestic, communication networks. But the broad scale of 12333 surveillance means that some Americans’ communications get caught in the dragnet as they transit international cables or satellites—and documents contained in the Snowden archive indicate that ICREACH taps into some of that data. Legal experts told The Intercept they were shocked to learn about the scale of the ICREACH system and are concerned that law enforcement authorities might use it for domestic investigations that are not related to terrorism.

“To me, this is extremely troublesome,” said Elizabeth Goitein, co-director of the Liberty and National Security Program at the New York University School of Law’s Brennan Center for Justice. “The myth that metadata is just a bunch of numbers and is not as revealing as actual communications content was exploded long ago—this is a trove of incredibly sensitive information.” Brian Owsley, a federal magistrate judge between 2005 and 2013, said he was alarmed that traditional law enforcement agencies such as the FBI and the DEA were among those with access to the NSA’s surveillance troves. “This is not something that I think the government should be doing,” said Owsley, an assistant professor of law at Indiana Tech Law School. “Perhaps if information is useful in a specific case, they can get judicial authority to provide it to another agency. But there shouldn’t be this buddy-buddy system back-and-forth.”

Google’s fact-checking bots build vast knowledge bank – New Scientist

Hal Hodson, 20 August 2014, New Scientist - The search giant is automatically building Knowledge Vault, a massive database that could give us unprecedented access to the world’s facts: “GOOGLE is building the largest store of knowledge in human history – and it’s doing so without any human help. Instead, Knowledge Vault autonomously gathers and merges information from across ...”

Comparing Google Consumer Surveys to Existing Probability and Non-Probability Based Internet Surveys

“This study compares the responses of a probability based Internet panel, a non-probability based Internet panel and Google Consumer Surveys against several media consumption and health benchmarks. The Consumer Surveys results were found to be more accurate than both the probability and non-probability based Internet panels in three separate measures: average absolute error (distance from ...”

Google Earth expands to the Moon and Mars – Outside

Outside News from the Field: “Google couldn’t celebrate Curiosity’s second anniversary on Mars (in Earth years) with just a doodle. Instead, the California-based gods of the Internet have released two new maps to explore using the Google Earth application—on Mars and the Moon. They were assembled using images taken by various spacecraft as well as data on each body’s elevation ...”

Visualizing language usage in New York Times news coverage throughout its history

Chronicle – Tracking New York Times Language Usage Over Time, Alexis Lloyd: “News publishing is an inherently ephemeral act. A big story will consume public attention for a day, or a month or a year only to fade from memory as quickly as it erupted. But news coverage, aggregated over time, can provide a fascinating “first ...”

Mesa: Geo-Replicated, Near Real-Time, Scalable Data Warehousing

Research at Google – “Mesa is a highly scalable analytic data warehousing system that stores critical measurement data related to Google’s Internet advertising business. Mesa is designed to satisfy a complex and challenging set of user and systems requirements, including near real-time data ingestion and queryability, as well as high availability, reliability, fault tolerance, and ...”

New York Times Launches Enhanced Archive Search and @NYTArchives Twitter Account

PressRun: “Today The New York Times launches search on its interactive digital archive: TimesMachine. With this newly-developed search technology, users can now use both free text and subject headings from the Times Index to search the 11,298,320 Times articles published across 46,592 issues between September 18, 1851 and December 31, 1980. Unlike previous iterations of search ...”

Google News Publisher Center

Google News Blog: “If you are a news publisher, your website has probably evolved and changed over time. Until now, when you made changes to the structure of your site, we might not have discovered them unless you told us. And that meant they might not have shown up in Google News, which in turn ...”

Census State and Local Government Finances Historical Tables

“A new online data tool shows statistics for the finances of state and local governments, such as revenue, expenditures and debt, aggregated to the state level for fiscal years 2004 to 2011. These statistics are also available via the DataFerrett tool.”  

26 Questions EU Regulators Want Google to Answer – WSJ

WSJ.com: “European Union privacy watchdogs grilled Google Inc. and other search engines for two hours on Thursday on how they are implementing the bloc’s new “right to be forgotten” online–and then gave them homework to do by next week, too. The main body that joins together the EU’s national data-protection regulators called the Brussels meeting with ...”

The Right to Be Forgotten in the Google Spain Case

Iglezakis, Ioannis, The Right to Be Forgotten in the Google Spain Case (Case C-131/12): A Clear Victory for Data Protection or an Obstacle for the Internet? (July 26, 2014). Available for download at SSRN: http://ssrn.com/abstract=2472323 “The right to be forgotten is a new right that is introduced in the Draft Proposal for a General Data Protection ...”

ACSI: Customer Satisfaction with E-Business Rebounds as Social Media, Search Engines and News Sites Improve

News release: “Customer satisfaction with social media, search engines and online news and opinion websites is up, according to the E-Business Report released today by the American Customer Satisfaction Index (ACSI). The latest results reveal a 2.9 percent rise in user satisfaction with e-business websites to 73.4 on ACSI’s 100-point scale. “Even with improvements across ...”