Yves here. Note that it’s possible (one would assume probable) that the Administration has been collecting the sort of information it has been getting from Verizon from all major telephone carriers. But if not, then the line of thought below is pertinent.
By Patrick Durusau, who consults on semantic integration and edits standards. Durusau is convener of JTC 1 SC 34/WG 3, co-editor of 13250-1 and 13250-5 (Topic Maps Introduction and Reference Model, respectively), and editor of the OpenDocument Format (ODF) standard at OASIS and ISO (ISO/IEC 26300). Originally published at Another Word for It.
“Why Verizon?” was the first question that came to mind when the Guardian broke the news on NSA-Verizon phone record metadata collection.
Here’s why I ask:
Verizon over 2011-2012 had only 34% of the cell phone market.
Unless terrorists prefer Verizon for ideological reasons, why Verizon?
Choosing only Verizon means the NSA is missing 66% of potential terrorist cell traffic.
That sounds like a bad plan.
What other reason could there be for picking Verizon?
Consider some other known players:
President Barack Obama, candidate for President of the United States, 2012.
“Bundlers” who gathered donations for Barack Obama:
|Min|Max|Name|City|State|Employer|
|---|---|---|---|---|---|
|$200,000|$500,000|Hill, David|Silver Spring|MD|Verizon Communications|
|$200,000|$500,000|Brown, Kathryn|Oakton|VA|Verizon Communications|
|$50,000|$100,000|Milch, Randal|Bethesda|MD|Verizon Communications|
BTW, the Max category means more money may have been given, but that is the top reporting category.
I have informally “identified” the bundlers as follows:
- Kathryn C. Brown Kathryn C. Brown is senior vice president – Public Policy Development and Corporate Responsibility. She has been with the company since June 2002. She is responsible for policy development and issues management, public policy messaging, strategic alliances and public affairs programs, including Verizon Reads.
Ms. Brown is also responsible for federal, state and international public policy development and international government relations for Verizon. In that role she develops public policy positions and is responsible for project management on emerging domestic and international issues. She also manages relations with think tanks as well as consumer, industry and trade groups important to the public policy process.
- David A. Hill. Bloomberg Business Week reports: David A. Hill serves as Director of Verizon Maryland Inc. A LinkedIn profile reports that David A. Hill worked for Verizon as VP & General Counsel (2000 – 2006), Associate General Counsel (March 2006 – 2009), and Vice President & Associate General Counsel (March 2009 – September 2011), and “Served as a liaison between Verizon and the Obama Administration.”
- Randal S. Milch Executive Vice President – Public Policy and General Counsel
What is Verizon making for each data delivery? Is this cash for cash given?
If someone gave you more than $1 million (how much more is unknown), would you talk to them about such a court order?
If you read the “secret” court order, you will notice it was signed on April 23, 2013.
There isn’t a Kathryn C. Brown of Oakton in the White House visitor logs, but I did find this record, where a “Kathryn C. Brown” made an appointment at the White House and was seen two (2) days later on the 17th of January 2013.
BROWN,KATHRYN,C,U69535,,VA,,,,,1/15/13 0:00,,,176,CM,WIN,1/15/13 11:27,CM,,POTUS/FLOTUS,WH,State Floo,MCNAMARALAWDER,CLAUDIA,,,04/26/2013
[The 04/26/2013 date is the date the data was released. –lambert]
I don’t have all the dots connected because I am lacking some unknown number of the players, internal Verizon communications, and Verizon accounting records showing government payments, but it is enough to make you wonder about the purpose of the “secret” court order.
Was it a serious attempt at gathering data for national security reasons?
Or was it gathering data as a pretext for payments to Verizon or other contractors?
My vote goes for “pretext for payments.”
I say that because using data from different sources has always been hard.
In fact, about 60 to 80% of the time of a data analyst is spent “cleaning up data” for further processing.
The phrase “cleaning up data” is the colloquial form of “semantic impedance.”
Semantic impedance happens when the same people are known by different names in different data sets or different people are known by the same names in the same or different data sets.
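To make that definition concrete, here is a minimal sketch in Python. Every record, field name, and the `normalize_name` helper are invented for illustration; none of it reflects any real data set or any agency's method. It shows how a naive name key lumps differently written names together without telling us whether they belong to one person or several.

```python
from collections import defaultdict

# Hypothetical records from three imaginary sources; all values are made up.
records = [
    {"source": "fec", "name": "Brown, Kathryn", "city": "Oakton", "state": "VA"},
    {"source": "employer", "name": "Kathryn C. Brown", "city": None, "state": None},
    {"source": "social", "name": "Kathryn Brown", "city": "Portland", "state": "OR"},
]

def normalize_name(name):
    """Reduce a name to a crude (first, last) key.

    Handles "Last, First" order, lower-cases everything, and drops
    single-letter middle initials. Deliberately naive.
    """
    if "," in name:
        last, first = [part.strip() for part in name.split(",", 1)]
        name = f"{first} {last}"
    tokens = [t.strip(".").lower() for t in name.split()]
    tokens = [t for t in tokens if len(t) > 1]  # drop initials such as "C."
    return (tokens[0], tokens[-1])

# Group the sources by the normalized name key.
groups = defaultdict(list)
for rec in records:
    groups[normalize_name(rec["name"])].append(rec["source"])

for key, sources in groups.items():
    print(key, sources)
```

All three records collapse onto the same key, yet the cities disagree or are missing. The code can report the collision; it cannot decide whether this is one person, two, or three.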
Remember Kathryn Brown, of Oakton, VA? One of the Obama bundlers. Let’s use her as an example of “semantic impedance.”
The FEC has a record for Kathryn Brown of Oakton, VA.
But a search engine found:
Same person? Or different?
I found another Kathryn Brown at Facebook:
And an image of Facebook Kathryn Brown:
And a photo from a vacation she took:
Not to mention the Kathryn Brown that I found at Twitter.
That’s only four (4) data sources and I have at least four (4) different Kathryn Browns.
A quick search shows 227,000 hits for Kathryn Browns.
Remember that is just a personal name. What about different forms of addresses? Or names of employers? Or job descriptions? Or simple errors, like the 20% error rate in credit report records.
Take all the phones, plus names, addresses, employers, job descriptions, errors + other data and multiply that times 311.6 million Americans. (And that’s before we get to Facebook “likes,” twitter hash tags, Google search data, or telephone record linkages, all of which can be “dirty” in themselves, or with respect to each other.)
Can the problem of eliminating that semantic impedance be solved with petabytes of data and teraflops of processing?
Not a chance.
So, the Orwellian Fearists can stop huffing and puffing about the coming eclipse of civil liberties. Those passed from view a short time after 9/11 with the passage of the Patriot Act,* and not because of ineffectual NSA data collection.
Is there some class of problems that the NSA data collection efforts actually have a chance of solving?
Yes. Those that are sanely scoped.
Remember that my identification of Kathryn “bundler” Brown with the Kathryn C. Brown of Verizon was a human judgement, not an automatic rule. Nor would a computer “think” to check the White House visitor logs to see if another, possibly the same Kathryn C. Brown visited the White House before the secret order was signed.
Human judgement is required to eliminate or mitigate semantic impedance because all the data that the NSA has been collecting is “dirty” data, from one perspective or other. Either it is truly “dirty” in the sense of having errors, or it is “dirty” in the sense it doesn’t play well with other data.
For example, a sane scope for preventing terrorist attacks could be starting with a set of known or suspected terrorist phone numbers. Using all phone data (not just from Obama contributors), only numbers contacting or being contacted by those numbers would be subject to further analysis.**
Using that much smaller set of phone numbers as identifiers, we could then collect other data, such as names and addresses associated with that smaller set of phone numbers. That doesn’t make the data any cleaner but it does give us a starting point for mapping “dirty” data sets into our starter set.
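A rough sketch of that scoping step, in Python, might look like the following. The call records, the seed number, the subscriber table, and the `one_hop_contacts` helper are all hypothetical, and real call metadata would carry far more fields. The point is only that the filter starts from known numbers and keeps direct contacts, nothing more.

```python
# Hypothetical call records as (caller, callee) pairs; all numbers are invented.
calls = [
    ("555-0100", "555-0199"),
    ("555-0142", "555-0100"),
    ("555-0150", "555-0151"),
    ("555-0199", "555-0177"),
]

# Assumed starting point: numbers already known or suspected.
seed_numbers = {"555-0100"}

def one_hop_contacts(calls, seeds):
    """Return numbers that called, or were called by, any seed number."""
    contacts = set()
    for caller, callee in calls:
        if caller in seeds:
            contacts.add(callee)
        if callee in seeds:
            contacts.add(caller)
    return contacts - seeds  # the seeds themselves are already known

scoped = one_hop_contacts(calls, seed_numbers)

# Hypothetical subscriber data, itself incomplete ("dirty").
subscribers = {
    "555-0199": {"name": "K. Brown", "address": "unknown"},
}

# Attach whatever name and address data exists to the scoped numbers only.
starter_set = {number: subscribers.get(number) for number in sorted(scoped)}
print(starter_set)
```

Note that `555-0177` is excluded because it is two hops from the seed, and that `555-0142` comes back with no subscriber data at all. The scoped set is smaller, not cleaner.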
The next step would be to create mappings from other data sets. If we say why we have created a mapping, others can evaluate the accuracy of our mappings.
Those tasks would require computer assistance, but they ultimately would be matters of human judgement.
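One way to picture a mapping that carries its own justification is the Python sketch below. The `Mapping` class, its field names, and the sample entries are assumptions made up for this illustration, not a description of any existing tool. The computer stores and retrieves the mappings; a person supplies the reason and another person can review it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Mapping:
    """A claim that two records refer to the same subject, with its rationale."""
    source_record: str   # hypothetical record identifier in one data set
    target_record: str   # hypothetical record identifier in the starter set
    reason: str          # why the analyst believes they match
    asserted_by: str     # who made the judgement

# Invented examples of analyst judgements.
mappings = [
    Mapping("fec:brown-kathryn-oakton", "starter:555-0199",
            "same employer listed in both records", "analyst-a"),
    Mapping("social:kathryn-brown", "starter:555-0199",
            "name match only; city differs", "analyst-b"),
]

def mappings_for(target, mappings):
    """Collect every mapping into a target so reviewers can compare the reasons."""
    return [m for m in mappings if m.target_record == target]

for m in mappings_for("starter:555-0199", mappings):
    print(f"{m.source_record} -> {m.target_record}: {m.reason} ({m.asserted_by})")
```

Because each mapping states its reason, a reviewer can see at a glance that the second one rests on a name match alone and weigh it accordingly.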
Examples of such judgements exist, say for example in the Palantir product line. If you watch Palantir Gotham being used to model biological relationships, take note of the results that were tagged by another analyst. And how the presenter tags additional material that becomes available to other researchers.
Computer assisted? Yes. Computer driven? No.
To be fair, human judgement is also involved in ineffectual NSA data collection efforts.
But it is human judgement that rewards sycophants and supporters rather than serving a public purpose.
NOTE * American voters bear responsibility for the loss of civil liberties by not voting leadership into office that would repeal the Patriot Act.
NOTE ** This frightening example works because it starts from a known phone number that’s already linked to a known identity; it’s sanely scoped. Whoever organized the Eliot Spitzer takedown, for example, also had a sanely scoped project and an excellent proxy for the subject identity for the target in context: Black socks. –lambert
Lambert here: None of this is to say that Bush and Obama’s massive surveillance program isn’t a gross violation of the Fourth Amendment. But there’s no particular reason to think that one segment of our grotesquely engorged National Security state is any less corrupt or ineffectual than any other, and that goes for NSA and DHS as much as for the Pentagon.