The Center for Data Innovation spoke with Topher White, chief executive officer of Rainforest Connection, a company based in San Francisco that develops acoustic monitoring systems for conservation. White discussed how AI can turn low-tech hardware into powerful conservation tools as well as the value of exposing biologists and ecologists to machine learning.
Joshua New: Rainforest Connection uses modified cell phones dispersed throughout a forest to detect signs of illegal logging and poaching. This seems like a low-tech solution to a very important problem. Are there not more advanced systems already in place?
Topher White: There are a lot of systems in place to fight deforestation and a lot of them have been effective to a certain degree. Probably one of the more advanced ones is using satellite imagery, which has made a really big difference in fighting deforestation over the years, particularly in Brazil, since it gives you a really comprehensive look over remote areas. One of the issues with this approach, despite its potential to allow advanced analysis, such as by using machine learning, is that turnaround time is getting better but it’s still too slow to fight deforestation in a real-time sense. There are also a lot of pretty great systems that rely on camera traps, community-based reporting, drones, and other techniques but at the end of the day, these are all pretty human-intensive and they don’t work in real time. We saw a space for us to set up a 24-hour, comprehensive monitoring system that could operate over decent distances and send real-time alerts to people on the ground who could stop deforestation. This is more of a surgical approach without heavy reliance on human forces.
Using cell phones is, in a certain sense, relatively low-tech, but the way I like to think about it is it’s a very high-end system using very low-end technology on the ground. There’s pretty good cell phone service in a lot of these areas in the periphery of the forest, which is what’s under threat. As we get better at this with better software and better antennas, we’re able to pick up things up to 20 kilometers away from the nearest cell tower, which allows us to go pretty deep into the forest. There are a few things about the way we design our system that are pretty intense and different from how people have built similar systems in the past, but overall yes, I completely agree that this is a low-tech system on the surface. That’s because we try to build low-tech hardware and very high-tech software.
New: After you detect potential signs of illegal logging, how does this data get used to actually put a stop to it?
White: Every single one of the cell-phone devices we use, called Guardians, is up in a tree capturing audio and transmitting it in as close to real time as possible to our servers. We then use AI to analyze this data to spot a number of different things, including chainsaws and logging trucks, which is our forte, as well as species detection, biodiversity monitoring, gunshots, voices, and anything else that’s there.
After this, local partnerships are super important. You could have all the data you want but if nobody is going to do anything about it, it’s not so helpful. A big part of our work is building partnerships with local groups which invite us in to help. These could be indigenous tribes, non-governmental organizations, or for example we’re working on a partnership with the government of Bolivia. Everyone assumes we’re working with law enforcement, but usually it’s local communities that have taken responsibility for protecting their land. So when there’s a detection, there are web interfaces and real-time alerts through text or email. Partners get these alerts and can get a read-out of what we heard. This is all done with AI, so when they get an alert, they can review it to verify it is what we think it is. Then they get the location and can take off. This creates a feedback loop, with them saying yes or no that the alert was correct and yes or no that they did respond. At the current scale we’re operating at, which will get larger soon, there is the opportunity to do follow-ups on a personal basis and keep track of individual cases.
New: You recently began using machine learning to get better at detecting activity related to logging. How much of an improvement has this been?
White: We’ve been using machine learning for quite a few months but have just started talking about it because we’re really excited about what’s coming next. It has been a dramatic, dramatic improvement. In the first couple of years that we built the system, we focused more on harmonic detection. When you’re off in the forest, it’s really noisy all of the time but the sounds of chainsaws and motors have a harmonic pattern. We built a pretty simple harmonic detector that was able to pick out these sounds but that’s about it. As we grew and worked in different ecosystems, the background noise would change a lot: the forest changes very dynamically from hour to hour, certainly between night and day, and picking out these harmonics became a much greater challenge. We started using convolutional neural networks, which were more or less image detection systems, and they became really helpful tools.
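The idea behind a simple harmonic detector can be sketched in a few lines. This is only an illustration, not Rainforest Connection's detector: it assumes the engine's fundamental frequency is already known, and the function names `tone_power` and `harmonic_score` are hypothetical. It measures what fraction of a clip's power sits at a fundamental and its integer multiples, which is high for an engine-like tone and low for broadband noise.

```python
import math


def tone_power(samples, sample_rate, freq):
    """Power of the clip at one frequency, found by correlating with a sinusoid."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * freq * i / sample_rate)
             for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq * i / sample_rate)
             for i, s in enumerate(samples))
    return (re * re + im * im) / (n * n)


def harmonic_score(samples, sample_rate, fundamental, n_harmonics=4):
    """Approximate fraction (0..1) of clip power at the fundamental and its multiples.

    `fundamental` is an assumed, known engine frequency in Hz (hypothetical input).
    """
    total = sum(s * s for s in samples) / len(samples)
    if total == 0:
        return 0.0  # silent clip: nothing to score
    harmonic = sum(tone_power(samples, sample_rate, fundamental * k)
                   for k in range(1, n_harmonics + 1))
    # A sinusoid of amplitude A has mean-square power A^2/2 but tone_power A^2/4,
    # hence the factor of 2 to put both on the same scale.
    return min(1.0, 2 * harmonic / total)
```

A fixed threshold on a score like this tends to break down when the background soundscape shifts, which is consistent with the difficulty described above once the system moved into different ecosystems.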
We switched over to this approach in 2017 and had a pretty dramatic improvement on the detection side. But then of course we encountered other issues that we had to solve. There were a lot fewer false positives, which is super important because it’s crucial that we build faith in the system by always giving our partners good information and the ability to confirm what’s actually there. But in any kind of system, especially one where you’re working with 24 hours of data, you’re going to have a false positive rate. This was particularly high when we were focusing on harmonics, and machine learning cut it down, but we’re still looking at a few false positives per day.
To give you an idea of how the system works, we’re using what we call binary classifiers. The system divides up the entire day into one- or two-second chunks of time that it analyzes constantly throughout the day, and we use classifiers that determine the probability that a chunk of time contains chainsaws or other signs of logging. These are binary, so they either say “yes” or “no.” This has an advantage over other kinds of systems because we could develop hundreds of different classifiers working on the same data as it comes in, allowing us to know the probability of different things occurring at any given point in time across the entire area we’re working. If a certain detection system isn’t perfect, it’s backed up by other classifiers that can help us figure out what it may have missed.
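As a rough sketch of the structure described above, the following splits an audio stream into fixed-length chunks and runs several independent binary classifiers over each one. Everything here is an assumption for illustration: the names `split_into_chunks` and `run_classifiers`, the 0.5 threshold, and the toy loudness-based classifiers stand in for trained models and are not the company's code.

```python
from typing import Callable, Dict, List, Sequence

# A classifier maps one chunk of audio samples to a probability in [0, 1].
Classifier = Callable[[Sequence[float]], float]


def split_into_chunks(samples: Sequence[float], sample_rate: int,
                      chunk_seconds: float = 1.0) -> List[Sequence[float]]:
    """Cut the stream into consecutive, non-overlapping chunks; drop a partial tail."""
    size = int(sample_rate * chunk_seconds)
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]


def run_classifiers(chunks: List[Sequence[float]],
                    classifiers: Dict[str, Classifier],
                    threshold: float = 0.5) -> List[dict]:
    """Run every classifier on every chunk; keep the probability and the yes/no call."""
    results = []
    for index, chunk in enumerate(chunks):
        probabilities = {name: clf(chunk) for name, clf in classifiers.items()}
        detected = {name: p >= threshold for name, p in probabilities.items()}
        results.append({"chunk": index,
                        "probabilities": probabilities,
                        "detected": detected})
    return results


def mean_abs(chunk: Sequence[float]) -> float:
    """Toy stand-in for a trained model: average loudness, clipped to 1.0."""
    return min(1.0, sum(abs(s) for s in chunk) / len(chunk))


# Hypothetical classifier registry; real entries would be trained models.
toy_classifiers: Dict[str, Classifier] = {
    "chainsaw": mean_abs,
    "quiet": lambda chunk: 1.0 - mean_abs(chunk),
}
```

Because each classifier is independent and sees the same chunks, adding a new detection target in this layout means adding one more entry to the dictionary rather than retraining a single multi-class model.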
On top of that, we have an additional layer of AI to look at all of these things together and pull intelligence out of it. This is what we call the cognition layer, which allows us to go from asking “what is in the forest” to “what does it all mean?” This allows us to do some very fancy analysis, like figuring out the sounds of things we can’t hear. For example if a jaguar walks through a forest, it might not be making any noise but the forest’s soundscape changes as other things react to it. As we’re able to tie these correlations back to known entities, we start to detect things that don’t make noise. We’ve been running one experiment recently about how a housecat runs through a backyard. It doesn’t make noise, but the birds change their calls. In this case birds actually make explicit calls about the presence of a certain type of predator and we’re able to detect the presence of a cat in a yard just based on this.
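One very reduced way to picture inferring something silent from other classifiers' output is shown below. It is a deliberately simple stand-in, not the cognition layer itself: it assumes a hypothetical per-chunk "alarm call" probability already exists, and it flags moments where the rolling average of that probability stays high. The function name, window size, and threshold are all illustrative assumptions.

```python
from typing import List, Sequence


def infer_silent_presence(alarm_probs: Sequence[float],
                          window: int = 3,
                          threshold: float = 0.7) -> List[int]:
    """Return indices of chunks where the rolling mean alarm-call probability
    over the last `window` chunks is at least `threshold`.

    `alarm_probs` is assumed to come from an upstream bird alarm-call classifier.
    """
    flagged = []
    for end in range(window, len(alarm_probs) + 1):
        recent = alarm_probs[end - window:end]
        if sum(recent) / window >= threshold:
            flagged.append(end - 1)  # index of the newest chunk in the window
    return flagged
```

Averaging over a window means a single noisy chunk does not trigger an inference; only a sustained change in what the birds are doing does.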
We’d like to be able to take ecology, biomonitoring, and all this other biology and conservation work and apply big data to do things that used to be impossible. Even at the scale we’re operating at now, we’re working with what would be considered pretty “big” data from an ecology point of view, and we want to scale that up 100-fold.
New: Could Rainforest Connection eventually use other kinds of sensors to aid in this effort? For example, if you are already using cell phones, could cameras be useful here?
White: We use every sensor that we could possibly use in the phones themselves. That’s one of the cool things about TensorFlow, which is the machine learning library that we’re using, because it allows us to pull in a lot of different sensor data into our models. We also are able to use the cameras in the phones. But from a conservation point of view, while camera traps can be great, they can only see a small bit of the forest and need light. Our Guardians are also pretty high up in trees, which limits what a camera can see as well. We do use cameras, but we don’t send all that data up to the cloud—we request that a Guardian sends camera data if we want to see something at a particular time.
We prefer audio data because it’s so far-reaching. You see that reflected in the evolution of animals in the forest—most of them communicate vocally because it’s easier to transmit information over long distances that way.
New: I imagine all of the data you’re gathering would have a lot of value outside of a strictly conservation context. Are you sharing this data with researchers?
White: This is an important part of our mission. From an organizational perspective, we’ve found that if partners can show up and stop logging a couple times in a row, the loggers will go away for a long period of time. This is great, but we have to prove that this is the case and keep recording data. Given that we have this growing dataset, we have an imperative to make use of it to protect the forest. Making this available to biologists, ecologists, and anybody else is a fantastic way to help make new discoveries and help forests. We put a lot of resources into making our data accessible.
We found early on when we started working with machine learning that the status quo was to rely on data scientists to be able to build new models. We really quickly found out that we wouldn’t be able to afford more than one or two. We have a great tech team now, but we are learning that if we wanted to protect hundreds of species, we needed a system that didn’t require someone to know much about machine learning to add new species to it. This is what we’ve been working on for the last nine or ten months and plan on releasing it pretty soon. It’s a training system that allows anyone to come in and gain access to our entire dataset. For example, if you want to include a particular bird and you have some examples of it, or if you can find examples of it in our data, we have a spectrometer that can see what you’re looking for and go off and bring back examples from our data. Over the course of a few hours, you can train a model that uses months or years of our data from multiple locations. Once someone does this, this detection is added into the system for everyone to use.
We think this is super important because 90 percent of the biologists and ecologists out there just haven’t had a chance or the interest to use machine learning, and we want to lower the bar to it. Right now, there are so many awesome datasets sitting on people’s hard drives that they got a grant to create and that they are really proud of, but it’s really hard for them to make use of those datasets. They have to learn how to analyze the data and parse through it to find exactly what they want. We automated this process and want to make it available to people as cheaply as possible.
Image: Tyler Roemer.