The Center for Data Innovation spoke to Eyal Toledano, co-founder and chief technology officer of Zebra Medical Vision, an Israeli startup that uses artificial intelligence to identify bone, heart, liver and lung problems in medical scans. Toledano talked about how technology can help meet the increasing demand for radiologists and how algorithms can spot things in medical imagery that humans cannot.
Nick Wallace: How did you come to the idea of applying machine learning to medical imaging and founding the company?
Eyal Toledano: It all started when Eyal Gura, my co-founder, called me when I was around the middle of my master's studies at the Massachusetts Institute of Technology (MIT) Media Lab. He told me his personal story of how he'd been in a diving accident in Mexico and been hospitalized in a small medical center that had excellent scanning facilities, but no one to read the images. He was very frustrated personally—it was his health, after all—but also frustrated intellectually, because at the time he was the chief executive officer of what was then PicScout, an Internet imaging, analysis, comparison, and search company, later sold to Getty Images. So he couldn't send his images to a reference database and understand whether he was worried for nothing, or whether he needed to quickly find a way to get home.
He approached me with the core understanding that in the medical profession, the expertise of radiologists is so scarce that the gap between the demand for and the supply of imaging diagnostics is huge. In the western world, this manifests as attention overload and too much work for radiologists, so they miss things or don't report each and every possible finding. In the developing world, the expertise often doesn't exist at all; diagnostic abilities are scarce. You can build hospitals with imaging facilities, and these machines have improved substantially—not only in resolution and the amount of data generated, they're also easier to operate and to calibrate, and technicians are not difficult to train—but the number of radiologists is essentially a fixed resource; if it increases at all, it increases slowly.
So it's very difficult to bridge the gap between the utilization of medical imaging and the availability of radiologists. You can find more conditions now with medical imaging, since magnetic resonance imaging (MRI) and computerized tomography (CT) scans are used for more conditions than they used to be, but we don't have that many more radiologists.
We believe that technology will help fill this gap. Eyal [Gura] and I had been in an entrepreneurial program together as undergraduates at university, and we'd always wanted to work together. But he went and founded PicScout, and later sold four or five companies, while I went to start Samsung Telecom R&D in Israel and then to study at MIT. He caught me at, I think, a very personal moment, when I had caught the bug of, "if you are trying to do something with your talent, try to do something that is meaningful, where you will be proud to fail miserably." In the MIT Media Lab, the director Joi Ito and my advisors had covered the atrium with huge signs with big worldwide problems written on them. That was when two things clicked: that I must graduate, and that I needed to start Zebra.
The technology started to mature around early 2012, and we started to see signs that there was potential to go beyond traditional computer vision. By that time, I was well educated in traditional computer vision: at Samsung we did augmented reality, in-camera technology, improving picture-taking, and many kinds of augmented communication and imaging. But the rise of deep learning opened a window of opportunity, and improvements in computation and storage, along with the falling price of both, made it possible for startups to play in the big data and machine learning arena.
When I thought about starting Zebra, I knew that the imaging challenges from ImageNet, or from Internet images, don't really fit the radiology domain. If you scan me twice, once with my lungs full of air and my spine straight, and again after I breathe out and relax, the two scans will look very different, even though they're of the same person. Sometimes what you are trying to locate is a specific finding that is focal and localized, and you need to ignore these changes and generalize a pattern to target your algorithm. And the resolution is huge, with 3D volumes. Even 2D mammography images are more than ten times the resolution of the images in ImageNet. The resolution, size, and volumetric data make it challenging.
But the concepts can be generalized and mapped to the radiology domain. Most of the people we talked to at the time didn't get that; they thought we were crazy, two guys coming from the mobile and Internet domain. But luckily Elad Benjamin, our chief executive officer, joined us at the beginning, so we could mitigate the fact that we were these two crazy guys not coming from healthcare, because we were with a well-established executive from the medical imaging domain. He got it immediately. Our investors had an amazing vision that resonated well with ours. That was kind of lucky, I think: startups need luck.
Then it became hard work: establishing all the data, agreements, and partnerships; passing all the regulatory, security, and privacy requirements, which was a big challenge; finding channels to market; and building a bridge between a strong algorithmic team and a medical direction that lets you build more of what's important, rather than what's sexy. I think Zebra built its roadmap around the more important problems to solve, and less around what is sexy and appealing. Just to give an example: more people die from lung diseases such as chronic obstructive pulmonary disease (COPD) than from lung cancer, and the burden on the health system is much bigger. But lung cancer research is more in vogue.
Wallace: What are the benefits of having machines analyze medical images? What can algorithms spot that doctors can’t, and why?
Toledano: There are two answers to this. Two families of algorithms and technologies augment human ability in two ways. One works simply because computers can aggregate and quantify information in a repeatable and accurate way, which allows them to identify things that a human eye cannot. If I show you two spine images, one of an osteoporotic patient and one of a healthy patient, it could be that neither you nor any radiologist would spot the difference.
But if you project that image with a regression map onto thousands and thousands of patients, then the computer has a golden reference and the physiological volumes that generated each score, which means that once you get a new scan, you can map it to the relevant score, which a human eye cannot do. AI and computer vision algorithms excel at tasks that involve quantification and mapping between domains, such as regression.
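The mapping Toledano describes, placing a new scan's measurement against a reference built from many prior patients, can be sketched as a simple percentile lookup. Everything here is illustrative: the bone-density values and the `percentile_score` helper are assumptions for the sketch, not Zebra's actual method.

```python
from bisect import bisect_left

# Hypothetical "golden reference": a bone-density measurement extracted from
# many prior scans, sorted once so new values can be placed against it.
reference = sorted([0.9, 1.1, 1.0, 1.3, 0.8, 1.2, 1.05, 0.95, 1.15, 0.85])

def percentile_score(value, reference):
    """Map a new scan's measurement onto the reference distribution:
    the fraction of the reference population with a lower value."""
    return bisect_left(reference, value) / len(reference)
```

A real system would regress the score directly from image features rather than a single scalar, but the principle is the same: the population, not a single reader's eye, supplies the scale.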
The second family of algorithms works in the domain of risk stratification. Imagine you have two classes, sick and healthy. You have seen 2,000 or 200,000 scans of hearts taken two years before a cardiac event, and another 200,000 from patients who didn't have any cardiac event in the following five or ten years. Just from the statistical nature of these algorithms, they can determine whether a new scan of a heart belongs to one group or the other.
That statistical distance is, in a sense, a risk measure. If you asked radiologists to go over the whole population of a hospital group, or even of a country's health ministry, and identify the people at risk of heart disease—or any other disease—they could not give you a number. For individual cases, they can say "maybe" or "maybe not." And the stability of that impression is a bit low: studies show that the same physician reading the same study may say two different things at two different times, and radiologists agree with each other sometimes as infrequently as 70 or 75 percent of the time, depending on the case.
Generally speaking, algorithms are more stable in their ability to predict and risk-stratify, and you can use them to run workloads that humans cannot. Even if you had all the radiologists available, it would be too expensive and take too much time for them to go over an entire population.
So it's not only the fact that computers have an infinite attention span and can work around the clock; there are two inherent differences, in the way they can infer, quantify, and map results, and in the way they can risk-stratify a condition, that make them, I think, a very valuable tool, and we should see more of them in our future healthcare.
Wallace: If an algorithm turns up something that would never have been spotted otherwise, is it always clear how or why it concluded there was a problem?
Toledano: That needs a difficult and long answer, but I'll try to point to a few things. First of all, this debate about explaining algorithms is really about trust. As a scientist, I trust the scientific method: the rigorous process of experimenting, validating, reproducing results, evaluating them, and having them reproduced externally. This is how I gain trust; this is how scientists gain trust. But even for us at Zebra, we are dealing with radiologists who want to understand; they gain trust by understanding the algorithm.
So all of our algorithms produce what I'll call artifacts, which give deep insight into how and why they reached a specific conclusion. Since most of the algorithms we are dealing with produce diagnoses or findings that can be summarized in a sentence, that sentence alone is not always the best way to deliver information or gain trust.
But since we analyze images, we always have a way to visualize what the algorithm sees and why it identified a specific location: those visual artifacts are not only algorithmic decisions, they are images that were generated along the process of analyzing a sample. With those images, instead of saying, "well, it's a black box," physicians can now say, "Wow, it produced a synthetic image where I can easily see things. If I have that, I know I can trust the algorithm, and it also helps me be sure that I'm accepting an algorithmic decision I can fully back. I don't just know it works; I know why it works."
Other artifacts are just our way to debug: checking that what the algorithm produces matches what we would expect to see. If a condition is highly localized in a particular place, like a tumor or a lump or a mass or a lesion, then an anomaly in the heatmap should be aligned with the area of the pathology. And sometimes a condition is diffuse, spread all over the lungs, like emphysema. So the condition and the pathology also direct us in understanding, debugging, and creating artifacts that improve trust and are convincing to people who rely more on a persuasive argument than on the scientific method.
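For focal findings, the debug check Toledano describes can be sketched as a simple test that the heatmap's hottest pixel falls inside the annotated pathology region. The function and box format below are illustrative assumptions, not Zebra's actual tooling.

```python
def heatmap_peak_in_box(heatmap, box):
    """Debug check for focal findings (tumor, lump, mass, lesion):
    the heatmap's hottest pixel should fall inside the annotated
    pathology bounding box, given as (row0, col0, row1, col1)."""
    rows, cols = len(heatmap), len(heatmap[0])
    # Index of the maximum value, scanning the heatmap as a flat array.
    flat = max(range(rows * cols), key=lambda i: heatmap[i // cols][i % cols])
    r, c = flat // cols, flat % cols
    r0, c0, r1, c1 = box
    return r0 <= r <= r1 and c0 <= c <= c1
```

For a diffuse condition like emphysema this pointwise check would not apply; there one would instead compare the heatmap's spatial extent against the affected region.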
There's another advantage physicians don't have. For example, mammography is very difficult to diagnose. If you look at the scan, it's very difficult to decide whether a distortion that you can barely detect is malignant, or indicative of malignancy, or not. And if we look at the statistics, 50 percent of biopsies come back benign, so the imaging doesn't really give the physician any insight into whether a finding is benign or malignant: if it did, the data would look different.
When we began to train our breast imaging algorithm, we went and looked at the endpoint: we took scans with pathologically proven biopsies and then went back to the last scan, from two or more years before. Now you have two classes: those that were later proven malignant, and all the others. The algorithm can beat human accuracy, because the quality of its insight is based on data. It's like flying back in time after you know what the result should be, so you have a kind of unfair advantage. Algorithms can generate this unfair advantage over humans whenever there is an endpoint or an outcome that we can trace back to the point of decision making. A physician cannot do that; he or she has to rely on feeling, memory, and sometimes on consensus agreement in the medical domain.
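The label-construction step Toledano outlines, tracing a proven biopsy outcome back to scans taken years earlier, can be sketched as a join over two record sets. All patient IDs, dates, and the `label_scans` helper are hypothetical, and the two-year cutoff follows the "two or more years before" in the interview.

```python
from datetime import date

# Hypothetical records: scans and biopsy outcomes for a few patients.
scans = [
    {"patient": "p1", "date": date(2013, 5, 1), "image": "scan_a"},
    {"patient": "p1", "date": date(2016, 2, 1), "image": "scan_b"},
    {"patient": "p2", "date": date(2013, 7, 1), "image": "scan_c"},
]
biopsies = [
    {"patient": "p1", "date": date(2016, 3, 1), "malignant": True},
]

def label_scans(scans, biopsies, min_years=2):
    """Label each scan 1 if the same patient had a proven malignant biopsy
    at least min_years later, else 0. This is the 'flying back in time'
    advantage: the outcome supplies the label at the point of decision."""
    labeled = []
    for s in scans:
        label = 0
        for b in biopsies:
            same_patient = b["patient"] == s["patient"]
            enough_lead = (b["date"] - s["date"]).days >= min_years * 365
            if same_patient and enough_lead and b["malignant"]:
                label = 1
        labeled.append((s["image"], label))
    return labeled
```

Here `scan_a` (taken nearly three years before the malignant biopsy) becomes a positive example, while the scan taken only a month before falls under the cutoff and stays negative in this toy scheme.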
Wallace: Zebra Med recently launched Profound, a website for analyzing scans to be used by patients rather than doctors. Can you tell me a little about the purpose of having patients analyze their own medical scans?
Toledano: We state very strongly, and it is all over the website, that Profound is accessible only in localities where the regulatory authorities allow it. It's not a diagnostic application, and we don't replace physicians. We would like to inform patients so they can have a meaningful discussion with the relevant caregivers. That was the purpose of the site, since in the medical domain things are slow, and some things never reach the patient in the end. We also recognize the trend of empowering patients, giving them more choice and more knowledge and patronizing them less, as well as the rise of the "quantified self." We wanted to create an opportunity for some of the value from medical imaging to be out in the open. So it's a free-to-use website that gives patients something to discuss with their physicians.
Obviously patients need to have already had scans before they can use Profound. Currently, most of the analyses are of CT scans. People are sometimes given a CD with their data on it when they have scans. But who has a CD reader anymore? So that's a limiting factor. In the future this may become some kind of extension for patients that links to an online electronic medical record. There are also services to store healthcare data in the cloud and share it, so that could be another way people engage with the website.
Wallace: What possibilities do you see for combining medical imaging with other types of data, such as genomic data, for example?
Toledano: There are two modes in which I see these combinations providing value.
One is a mode we currently use, which involves combining other data sets in order to select the right imaging for the pathology being diagnosed. If you use a pathology database to select screening images for a radiology database, you get a higher-quality imaging data set. And sometimes you need end results or outcomes in order to build a data set that will be used to train an algorithm. So other clinical data, like admission and discharge information, or data from surgical systems in the hospital that contains end results and outcomes, can be fused with imaging data that often comes earlier, generating a higher-quality data set with a better chance of predicting and localizing the pathology.
Another way of providing value by fusing data sets involves what we like to call "meta-algorithms." You can have a person with very bad lungs who runs five miles a day, and his lung function test will be perfect. The anatomical data analyzed on the CT may differ from that person's functional data: he's used to running, his lung oxygen is very good, and although the lung volume participating in this function is lower, his overall abilities and function are high. So in a sense you can see something, but you need additional information in order to diagnose a condition.
Having other data sets that tell you what kind of medication a person takes, his exercise regime, and what his overall function looks like will, I think, lead to a new generation of algorithms. But this is a second or third wave, and for now we look at how to provide value that will have an impact on routine health care, rather than the more advanced use cases. We believe in these algorithms; we think those meta-algorithms will be the second or third generation of what we're doing today.