At the Philadelphia meetup, I got to chat at some length with a reader who had a considerable high-end IT background, including at some cutting-edge firms, and now has a job in the Beltway where he hangs out with military-surveillance types. He gave me some distressing information on the state of snooping technology, and as we’ll get to shortly, is particularly alarmed about the new “home assistants” like Amazon Echo and Google Home.
He pointed out that surveillance technology is more advanced than most people realize, and that lots of money and “talent” continue to be thrown at it. For instance, some spooky technologies are already decades old. Forgive me if this is old hat to readers:
Edward Snowden has disabled the GPS, camera, and microphone on his cell phone to reduce his exposure. As most readers probably know, both the microphone and the camera can be turned on even when the phone has been turned off. He uses headphones to make calls. This makes the recent phone design trend away from headphone jacks look particularly nefarious.
“Laser microphones” can capture conversations by shining a laser on a window pane and interpreting the vibrations. However, this isn’t really a cause for worry since there are easier ways to spy on meetings.
With a voice recording (think a hostage tape), analysts can determine the room size, number of people in the room, and even make a stab at the size and placement of objects, particularly if they get more than one recording from the same site.
But what really got this reader worked up was Amazon’s Echo, the device that lets users give voice instructions to tell a TV to stream video or audio, order from Amazon or other participating vendors, get answers to simple search queries like “Tell me the weather,” perform simple calculations, and order around smart devices on the home network, like telling your coffee maker to make some coffee. He said, “I’d never take one of them out of the box.”
He was at a party recently with about 15-20 people when the host decided to show off her Echo. She called across the room, “Alexa, tell me the capital of Wisconsin,” and Alexa dutifully responded.
Based on his knowledge of other technologies, here is what he argues was happening:
The Echo was able to pick a voice out of a crowd engaged in conversation. That means it is capable of singling out individual voices. That means it has been identifying individual voices, tagging them as “Unidentified voice 1,” “Unidentified voice 2” and so on. It has already associated the voices of its owners, and if they have set up profiles for other family members, theirs as well, so it knows who goes with those voices.
Those voices may be unidentified now, but as more and more voice data is being collected or provided voluntarily, people will be able to be connected to their voice. And more and more recording is being done in public places.
So now think of that party I was at. At some time in the not too distant future, analysts will be able to make queries like, “Tell me who was within 15 feet of Person X at least eight times in the last six months.” That will produce a reliable list of their family, friends, lovers, and other close associates.
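To make concrete how simple such a query would be once voices are tagged and located, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `Sighting` record, its field names, the `event` identifier for a gathering, and the `close_associates` function are hypothetical and do not describe any system Amazon or an intelligence agency is known to run.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import hypot


@dataclass(frozen=True)
class Sighting:
    """One tagged voice at one gathering (hypothetical record layout)."""
    person: str      # a name or a tag such as "Unidentified voice 1"
    when: datetime   # time of the recording
    x_ft: float      # estimated position in the room, in feet
    y_ft: float
    event: str       # hypothetical id for the gathering or recording session


def close_associates(sightings, target, now,
                     radius_ft=15.0, min_events=8,
                     window=timedelta(days=182)):
    """Return people seen within radius_ft of target at min_events or more
    distinct events inside the time window ending at now."""
    cutoff = now - window
    recent = [s for s in sightings if s.when >= cutoff]

    # Group the target's own sightings by event.
    target_by_event = {}
    for s in recent:
        if s.person == target:
            target_by_event.setdefault(s.event, []).append(s)

    # Count, per person, the number of distinct events where they were nearby.
    counts = Counter()
    for event, target_rows in target_by_event.items():
        nearby = set()
        for s in recent:
            if s.event != event or s.person == target:
                continue
            if any(hypot(s.x_ft - t.x_ft, s.y_ft - t.y_ft) <= radius_ft
                   for t in target_rows):
                nearby.add(s.person)
        counts.update(nearby)

    return sorted(p for p, n in counts.items() if n >= min_events)
```

The point is not the code itself but how little of it is needed: the hard part is collecting and tagging the recordings, and that is exactly what an always-on microphone in the home makes cheap.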
CNET claims that Amazon uploads and retains voice data from the Echo only when it has been activated by calling to it and stops recording when the request ends. But given the Snowden revelations that every camera and microphone in computers and mobile devices can be and are used as viewing and listening devices even when the owner thinks they are off, I would not be so trusting. Even if Amazon isn’t listening and recording at other times, the NSA probably can. CNET adds:
Amazon Echo is always listening. From the moment you wake up Echo to the end of your command, your voice is recorded and transcribed. And then it’s stored on Amazon’s servers….
It’s unclear how long the data is stored, but we do know that it is not anonymized. And, for now, there’s no way to prevent recordings from being saved.
Reread the first paragraph. The Echo has to be listening at all times in order to respond to the “Alexa” command. So the only question is whether Amazon or some friendly member of the surveillance state is recording then too.
This scenario ties into a recent development I find alarming: banks and other retail financial firms relentlessly offering to let you use your voice as your identifier if you wind up calling them. Every time I have called, I have to waste time rejecting their efforts to route me into that system. I’ve told the customer reps I never want that done but there is no way to override that even when I call in from a phone number they recognize as belonging to a customer.
Now let us play devil’s advocate. The Echo is awfully promiscuous in terms of who it seems to think is allowed to place orders. A parrot famously placed an order for some gift boxes:
But the story in the Sun states that the African Grey “Buddy” was imitating his owner:
Buddy activated her £150 Amazon Echo smart speaker, which connects to the internet shopping giant’s artificial intelligence hub.
Users can bark commands at it to control heating, order a takeaway or access a host of other services.
It responds to the name “Alexa” and hilarious footage filmed by South Africa-born Corienne now shows Buddy squawking “Alexa!” in her voice.
Now since on a quick search I didn’t find any videos of Buddy’s owner saying “Alexa,” we have no idea how good a mimic Buddy is (as in, does the Echo allow anyone in a home who says “Alexa” to place orders? One would hope not; imagine the mischief that, say, an angry nanny or plumber or teenager could make).
Some argued that Echo and its ilk are not a threat because speaker recognition isn’t as good as is often claimed. From Scientific American:
Voice recognition has started to feature prominently in intelligence investigations. Examples abound: When ISIS released the video of journalist James Foley being beheaded, experts from all over the world tried to identify the masked terrorist known as Jihadi John by analyzing the sound of his voice. Documents disclosed by Edward Snowden revealed that the U.S. National Security Agency has analyzed and extracted the content of millions of phone conversations. Call centers at banks are using voice biometrics to authenticate users and to identify potential fraud.
But is the science behind voice identification sound? Several articles in the scientific literature have warned about the quality of one of its main applications: forensic phonetic expertise in courts. We have compiled two dozens judicial cases from around the world in which forensic phonetics were controversial. Recent figures published by INTERPOL indicate that half of forensic experts still use audio techniques that have been openly discredited….
The recorded fragments subject to analysis can be phone conversations, voice mail, ransom demands, hoax calls and calls to emergency or police numbers. One of the main hurdles voice analysts have to face is the poor quality of recorded fragments. “The telephone signal does not carry enough information to allow for fine-grained distinctions of speech sounds. You would need a band twice as broad to tell certain consonants apart, such as f and s or m and n,” said Andrea Paoloni, a scientist at the Ugo Bordoni Foundation and the foremost forensic phoneticist in Italy until his death in November 2015. To make things worse, recorded messages are often noisy, short and can be years or even decades old. In some cases, simulating the context of a phone call can be particularly challenging. Imagine recreating a call placed in a crowded movie theater, using an old cell phone or one made by an obscure foreign brand.
In other words, a significant problem is sample contamination, which can also be an impediment in DNA analysis, in that contamination often has occurred at the collection site and sometimes takes place in the lab. However, if you are giving Amazon and whoever else might be interested voice samples again and again, you are giving them the opportunity to get a good recording, indeed many good recordings.
And our concerned reader points out that you don’t need pristine recordings to make useful inferences:
Although voice identification has a margin of error that would make it unacceptable for legal identification and non-repudiation, it still has useful utility for intelligence and “user experience” applications, especially when paired with other available data.
For example, if a sensor captures signature characteristics of a subject’s voice, it may limit the potential matches to, say, 500 people, but if another sensor detects cell phone IMEI signals near by, a match with a high degree of certainty may be predicted. Similarly a facial recognition algorithm may get a match that comes back with dozens of potential matches, but when cross-referenced to the nearby voice signature matches, a high confidence match is possible.
Databases in the cloud are very economical at scale. If persistent collection is stored in a database with proper meta data (e.g. Date/time, GPS, sensor type), then Bayesian algorithms will eventually retag the data for an unknown subject into a known subject (with X probability).
To understand how this may work, consider the TSA backscatter scans performed every day at airports. The first batch will produce piles of scans of unknown persons. If these scans are compared with the boarding pass scans around the same place and time, then each backscatter scan may be considered as potentially matching one of the boarding passes scanned. Now, when the same person is scanned again, the number of potential matches of similar scans and common boarding passes reduces significantly. Eventually, scans can be quickly paired to an individual with a high degree of certainty. This can be further optimized by considering which scans and boarding passes have not already been tagged to someone with sufficient certainty.
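The narrowing our reader describes can be sketched as a simple Bayesian update. This is a toy model under stated assumptions, not a description of how any agency actually does it: the function names `update_posterior` and `retag` are hypothetical, and so are the two probabilities, `p_present` (the chance that the true identity shows up among the co-located records, such as boarding passes or IMEIs) and `p_chance` (the chance that any given unrelated person shows up there by coincidence).

```python
def update_posterior(posterior, co_located, p_present=0.95, p_chance=0.05):
    """One Bayesian update for an unknown signature (voice, scan, face).

    posterior:  dict mapping candidate identity -> current probability
    co_located: set of known identities recorded near the same place and time
    p_present and p_chance are illustrative assumptions, not measured values.
    """
    weights = {}
    for candidate, prob in posterior.items():
        if candidate in co_located:
            # Evidence favours candidates who were recorded nearby.
            ratio = p_present / p_chance
        else:
            # Absence counts against a candidate, but does not rule them out.
            ratio = (1 - p_present) / (1 - p_chance)
        weights[candidate] = prob * ratio
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


def retag(candidates, observations, threshold=0.95, **kwargs):
    """Start from a uniform prior and fold in each observation in turn.

    Returns (identity or None, posterior). The identity is returned only
    once its posterior probability reaches the threshold.
    """
    posterior = {c: 1.0 / len(candidates) for c in candidates}
    for co_located in observations:
        posterior = update_posterior(posterior, co_located, **kwargs)
    best = max(posterior, key=posterior.get)
    tag = best if posterior[best] >= threshold else None
    return tag, posterior
```

With a pool of 500 candidates, a single observation leaves dozens of plausible matches, but the one person who keeps turning up in otherwise unrelated co-located sets pulls ahead after a handful of observations. The same update applies whether the second source is a boarding pass scan, a cell phone IMEI, or a facial recognition shortlist.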
But Echo and Google Home users may argue that they are allowed to erase their data, so what’s the worry? Again per CNET:
For those who don’t take chances, there’s a way to delete all voice data in one fell swoop. Head to www.amazon.com/myx, sign in, and click Your Devices. Select Amazon Echo, then click Manage Voice Recordings.
This is not as reassuring as it might sound. Amazon collects at least your Echo instructions by default. You can wipe them manually. You can’t set the Echo up not to retain your instructions nor to wipe them periodically, say daily.
So Amazon (and whoever else might have access to the data) pretty much always has some voice data to work with. And remember, Amazon is not deleting the voice profile that it has been constructing on you, merely the raw data it has been using to construct and refine that profile. So you can keep wiping your data, but every time you speak to Alexa, and perhaps at other times too, you are giving it more and more information to develop a better and better vocal fingerprint.
Confirming some of the concerns described above, computer scientists at the University of North Carolina depict the “overhearing” of devices like the Echo and Google Home as a hacking risk (while our reader’s and our concern is that the overhearing is a feature, not a bug). From their paper SoundSifter: Mitigating Overhearing of Continuous Listening Devices:
Having reached the milestone of human-level speech understanding by machines, continuous listening devices are now becoming ubiquitous. Today, it is possible for an embedded device to continuously capture, process, and interpret acoustic signals in real-time….Although these devices are activated upon a hot-word, in the process, they are continuously listening to everything. It is not hard to imagine that sooner or later someone will be hacking into these cloud-connected systems and will be listening to every conversation we are having at our home, which is one of our most private places.
Their solution is what amounts to a hardware condom:
Instead of proposing modifications to existing home hubs, we build an independent embedded system that connects to a home hub via its audio input. Considering the aesthetics of home hubs, we envision SoundSifter as a smart sleeve or a cover for these devices.
An indirect confirmation that this security concern is real is that Amazon is giving patently dishonest reassurances to Echo customers, as in technically accurate but utterly misleading. In a Quartz article, Amazon’s vice president and head scientist of Alexa machine learning Rohit Prasad claims there is no reason to worry about the Echo devices because they are “too dumb”. They have almost no memory, a buffer of only a few seconds, and know only four wake words. In other words, he acts as if the potential of intercepting the communication to the cloud does not exist, and worse, diverts consumer attention from the fact that Amazon retains user voice recordings.
One thing that may impede the spread of voice-spying is that the Echo appears to be sufficiently fussy that it does not work very well in a lot of real-world settings. So only partial uptake among customers that fall squarely into its target market (upscale, tech-friendly, servant-loving) would limit how many customer profiles it gathers as well as how many parties it can listen in on.
Plus Amazon seems to have trained its algos on American voices, which means if you have a pronounced accent, you may not be very happy with the Echo.1 From Clive:
Apart from the creepy crawley surveillance aspect (and Google/Amazon bother me far more than the state security apparatus) I bought a couple of Apple Homekit enabled devices for home automation and Siri voice control. Absolutely useless. Works barely 60 percent of the time which is way, way less than tolerable considering the cost premium over conventional equivalents.
With proper microphone kit, a quiet workspace and a few hours training it on your dialect, there’s nothing especially wrong with the principles of computer voice recognition. But it will always struggle in real-world environments and the vagaries of human speech without extensive customisation.
A lot of Silicon Valley’s output is what Japanese firms used to be castigated for — “Galapagos” products which only work in a narrow niche-local market. If you are not an urban hipster in a San Francisco loft apartment with unimpeded WiFi signal strength, reliable low-latency broadband, good acoustic envelope, no street noise and so on, the tech has an embarrassing tendency to fall over in the kinds of environments the rest of us live in.
Even in the US, these kinds of living conditions are atypical. City dwellers may have apartment type accommodation, but room sizes are smaller and reinforced concrete construction means the router in your hallway or kitchen will be patchy in the bedrooms. Suburban housing will be much bigger and you’ll need powerline repeaters to get to the outer edges of the building. CAT5 or 6 cabling isn’t standard on mass built housing and even custom build doesn’t normally specify it for residential development. My house is small by US standards but even I have to have a repeater to get a decent WiFi signal on the first floor.
I move in a tech-y circle and everyone I’ve discussed this with has tried Echo/ Siri Homekit/Google Home and has given up faced with the flakiness and demands to reconfigure their living spaces to accommodate their demands.
So many of the more trusting sort of customer may be put off by the lack of reliability of these “home assistants.” But if you care at all about your security, I wouldn’t get near one.
Update 7:00 AM. By happenstance, a story just out in the Sun confirms the UK “Echo is not ready for prime time” point of view. From Cops raid music fan’s flat after his Alexa Amazon Echo device ‘holds a party on its own’ while he was out:
A music fan has been left with a huge bill after his voice-operated Amazon Echo device threw a house party while he was away.
Cops were forced to break into Oliver Haberstroh’s flat in Hamburg, Germany, after neighbours complained about deafening music blasting from inside – but found the apartment empty.
Mr Haberstroh claims he walked out of his flat to meet friend [sic] on Friday night after checking that the lights and music were switched off.
He wrote on Facebook: “While I was relaxed and enjoying a beer, Alexa managed on her own, without command and without me using my mobile phone, to switch on at full volume and have her own party in my apartment”
“She decided to have it at a very inconvenient time, between 1.50am and 3am. My neighbours called the police.”
1 More from our wary reader:
Although I have never, and will never, own an Echo, when I saw people use it, it was accurate and responsive. I have not been impressed with Siri. I have noticed too a marked improvement in call center voice recognition for processing voice menus and transcribing voicemails. There is a lot of cheap older voice recognition technology in use, but the newer stuff is significantly improved each generation. The venture capital company InQTel, which funds tech for the intelligence sector, is funding lots of tech in voice recognition. The big drivers for investments are: 1) replacement of call center support and marketing workers; 2) expansion of call center services because new workers are not needed; 3) transcription for marketing/business/government intelligence and sentiment analysis; and 4) cooperative and non-cooperative personal identification.