The White House Big Data Report identified only two tangible harms of big data.

Published on July 15th, 2014 | by Daniel Castro and Travis Korte

A Catalog of Every “Harm” in the White House Big Data Report

The White House spent three months looking for ways big data was hurting Americans. It found only two cases.

Earlier this year, President Obama ordered a comprehensive review of the privacy implications of big data. The review, which took 90 days, convened a working group of senior administration officials who met with hundreds of stakeholders from a wide array of fields, sponsored conferences at top universities, and received public comment from more than 70 organizations. The review culminated in the report “Big Data: Seizing Opportunities, Preserving Values,” which offered six policy recommendations, the first of which was to draft new consumer privacy legislation. However, for all the concern expressed by some commentators about big data, the report identified almost no concrete examples of how big data is actually causing consumers economic, physical, or social harm. In fact, after reviewing all 37 concerns identified in the report, we found that all but two of them were purely speculative (that is, the authors cited no evidence that the concerns mentioned were occurring today), and many were vague and ill-defined.

This is a crucial distinction. If the White House had identified a series of tangible examples of how big data was presently harming consumers, then it would be justified in calling for policymakers to adopt stronger consumer privacy rules or other protections. But since it did not, the question is whether there is even a compelling need for policy intervention at this stage. After all, many theoretical concerns may never be realized if factors such as market forces, cultural norms, and new technologies intervene. Thus, policymakers should be extremely cautious about regulating on the basis of purely speculative concerns that might never come to pass, especially when doing so might curtail substantial economic and social benefits, many of which are already occurring today.

So what were the two cases where the White House found concrete consumer harms from big data?

The first example came from a 2013 study by researcher Latanya Sweeney, which found that search engine ads for the background check website Instantcheckmate.com were shown more often to users searching for “black-sounding names” than to users searching for “white-sounding names.” The White House report describes the problem as follows (pp. 7-8):

Unfortunately, “perfect personalization” also leaves room for subtle and not-so-subtle forms of discrimination in pricing, services, and opportunities. For example, one study found web searches involving black-identifying names (e.g., “Jermaine”) were more likely to display ads with the word “arrest” in them than searches with white-identifying names (e.g., “Geoffrey”). Outcomes like these, by serving up different kinds of information to different groups, have the potential to cause real harm to individuals.

Clearly, it is harmful for advertisements to reinforce negative stereotypes about marginalized groups. But given that advertisers are frequently accused of racism, sexism, and ageism, this concern may have less to do with big data than with the conduct of advertisers. In addition, changes in privacy laws would not address this harm, since the root of the problem in this example was negative stereotypes in advertisements being linked to certain search terms. Although it is unlikely that Congress would pass a law banning this type of activity (since, for better or worse, the First Amendment gives advertisers wide latitude in crafting public messages), the advertising industry, as well as individual advertisers and ad platforms, can adopt their own self-regulatory guidelines and company policies banning the negative portrayal of a particular race or ethnicity.

The second example concerns retailers offering different consumers different prices for the same goods. In 2012, the Wall Street Journal published an article finding that some retailers varied the prices they showed to online shoppers based on a variety of factors. The White House report describes the problem as follows (pp. 46-47):

Recently, some offline retailers were found to be using an algorithm that generated different discounts for the same product to people based on where they believed the customer was located. While it may be that the price differences were driven by the lack of competition in certain neighborhoods, in practice, people in higher-income areas received higher discounts than people in lower-income areas.

Ironically, one of the solutions to this problem is actually more data. As the original Wall Street Journal article noted, one likely reason for the lower prices in higher-income areas was greater competition in well-off suburban neighborhoods. If low-income shoppers had more access to data about the prices paid by others, they could make better-informed decisions about where to buy and thereby create a more competitive market.

It is worth noting that this finding is an exception to the norm, which is that variable pricing improves consumer welfare. Variable pricing presents a tension between equality (charging everyone the same price) and access (charging each consumer based on willingness to pay). If retailers can only charge one price, then some consumers may not be able to afford the product. However, if retailers can charge low-income consumers a lower price and high-income consumers a higher price, then they will sell at both prices, thereby making both consumers and producers better off overall.
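The arithmetic behind this argument can be illustrated with a toy calculation. The sketch below is not drawn from the White House report or the Wall Street Journal article; the two consumers, their willingness to pay (20 and 8), the unit cost (2), and the tiered prices (19 and 7) are all hypothetical numbers chosen only to show how a single price can shut out one buyer while two prices can serve both.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    units_sold: int
    profit: float
    consumer_surplus: float


def outcome(prices, willingness, unit_cost):
    """Evaluate one pricing scheme.

    prices[i] is the price offered to consumer i; a consumer buys one
    unit only if that price does not exceed their willingness to pay.
    """
    units, profit, surplus = 0, 0, 0
    for price, max_price in zip(prices, willingness):
        if price <= max_price:
            units += 1
            profit += price - unit_cost
            surplus += max_price - price
    return Outcome(units, profit, surplus)


def best_uniform_price_outcome(willingness, unit_cost):
    """Return the outcome at the single price that maximizes profit.

    Only the consumers' willingness-to-pay values need to be tried,
    since any other price sells the same units for less revenue.
    """
    candidates = sorted(set(willingness))
    return max(
        (outcome([p] * len(willingness), willingness, unit_cost) for p in candidates),
        key=lambda o: o.profit,
    )


if __name__ == "__main__":
    # Hypothetical inputs: one high-income and one low-income consumer.
    willingness = [20, 8]
    unit_cost = 2
    print("uniform:", best_uniform_price_outcome(willingness, unit_cost))
    print("tiered: ", outcome([19, 7], willingness, unit_cost))
```

With these assumed numbers, the profit-maximizing single price is 20, so only the high-income consumer buys. Under the two-price scheme both consumers buy, the retailer earns more, and each consumer pays slightly less than the most they would have paid. Different assumptions can reverse the result, which is why the White House example above is best read as an exception rather than a refutation.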

In short, the White House report identified only two concrete consumer harms from big data, neither of which would justify new privacy laws. As Defense Secretary Donald Rumsfeld famously said, “the absence of evidence is not the evidence of absence,” and it is possible that additional problems will arise and more sweeping legislation will be necessary. But until that day arrives, rather than preemptively trying to curtail the use of data, policymakers would be better off narrowly focusing on identifying specific consumer concerns and then constructing targeted remedies to address those particular problems.

Concern | Page
These capabilities, most of which are not visible or available to the average consumer, also create an asymmetry of power between those who hold the data and those who intentionally or inadvertently supply it. | 3
Big data applications may be the driver of America's economic future or a threat to cherished liberties. | 3
Unfortunately, 'perfect personalization' also leaves room for subtle and not-so-subtle forms of discrimination in pricing, services, and opportunities. For example, one study found web searches involving black-identifying names (e.g., 'Jermaine') were more likely to display ads with the word 'arrest' in them than searches with white-identifying names (e.g., 'Geoffrey'). Outcomes like these, by serving up different kinds of information to different groups, have the potential to cause real harm to individuals. | 7
Big data technology could assign people to ideologically or culturally segregated enclaves known as 'filter bubbles' that effectively prevent them from encountering information that challenges their biases or assumptions. | 8
Some of the most profound challenges revealed during this review concern how big data analytics may lead to disparate inequitable treatment, particularly of disadvantaged groups, or create such an opaque decision-making environment that individual autonomy is lost in an impenetrable set of algorithms. | 10
These new technologies...test individual privacy, whether defined as the right to be let alone, the right to control one's identity, or some other variation. | 10
Once information about citizens is compiled for a defined purpose, the temptation to use it for other purposes can be considerable, especially in times of national emergency. | 22
If unchecked, big data could be a tool that substantially expands government power over citizens. | 22
[Children's data] could be used to build an invasive consumer profile of them once they become adults, or otherwise pose problems later in their lives. | 25
Because young people are exactly that--young--they need appropriate freedoms to explore and experiment safely and without the specter of being haunted by mistakes in the future. | 25
It is even possible to discern whether students have learning disabilities or have trouble concentrating for long periods. What time of day and for how long students stay signed in to online tools reveals lifestyle habits. What should educational institutions do with this data to improve learning opportunities for students? How can students who use these platforms, especially those in K-12 education, be confident that their data is safe? | 26
Blending multiple data sources can create a fuller picture of a suspect's activities around the time of a crime, but can also aid in the creation of suspect profiles that focus scrutiny on particular individuals with little or no human intervention. Pattern analysis can reveal how criminal organizations are structured or can be used to make predictions about possible future crimes. Gathering broad datasets can help catch criminals, but can also sweep up detailed personal information about people who are not subjects of an investigation. | 29
The presence and persistence of authority, and the reasonable belief that one's activities, movements, and personal affiliations are being monitored by law enforcement, can have a chilling effect on rights of free speech and association. | 32
The advent of more powerful analytics, which can discern quite a bit from even small and disconnected pieces of data, raises the possibility that data gathered and held by third parties can be amalgamated and analyzed in ways that reveal even more information about individuals. | 34
While big data will be a powerful engine for economic growth and innovation, there remains the potential for a disquieting asymmetry between consumers and the companies that control information about them. | 39
While this precise profiling of consumer attributes yields benefits, it also represents a powerful capacity on the part of the private sector to collect information and use that information to algorithmically profile an individual, possibly without the individual's knowledge or consent. This application of big data technology, if used improperly, irresponsibly, or nefariously, could have significant ramifications for targeted individuals. | 45
Powerful algorithms can unlock value in the vast troves of information available to businesses, and can help empower consumers, but also raise the potential of encoding discrimination in automated decisions. | 45
Because of this lack of transparency and accountability, individuals have little recourse to understand or contest the information that has been gathered about them or what that data, after analysis, suggests. Nor is there an industry-wide portal for consumers to communicate with data services companies, as the online advertising industry voluntarily provides and the Fair Credit Reporting Act requires for regulated entities. This can be particularly harmful to victims of identity theft who have ongoing errors or omissions impacting their scores and, as a result, their ability to engage in commerce. | 46
For all of these reasons, the civil rights community is concerned that such algorithmic decisions raise the specter of 'redlining' in the digital economy: the potential to discriminate against the most vulnerable classes of our society under the guise of neutral algorithms. | 46
Recently, some offline retailers were found to be using an algorithm that generated different discounts for the same product to people based on where they believed the customer was located. While it may be that the price differences were driven by the lack of competition in certain neighborhoods, in practice, people in higher-income areas received higher discounts than people in lower-income areas. | 46
It will also be important to examine how algorithmically-driven decisions might exacerbate existing socio-economic disparities beyond the pricing of goods and services, including in education and workforce settings. | 47
Certain private and public institutions have access to more data and more resources to compute it, potentially heightening asymmetries between institutions and individuals. | 48
But big data tools also unquestionably increase the potential of government power to accrue unchecked. | 49
It is one thing for big data to segment consumers for marketing purposes, thereby providing more tailored opportunities to purchase goods and services. It is another, arguably far more serious, matter if this information comes to figure in decisions about a consumer's eligibility for or the conditions for the provision of employment, housing, health care, credit, or education. | 51
In addition to creating tremendous social good, big data in the hands of government and the private sector can cause many kinds of harms. These harms range from tangible and material harms, such as financial loss, to less tangible harms, such as intrusion into private life and reputational damage. An important conclusion of this study is that big data technologies can cause societal harms beyond damages to privacy, such as discrimination against individuals and groups. This discrimination can be the inadvertent outcome of the way big data technologies are structured and used. It can also be the result of intent to prey on vulnerable classes. | 51
More serious cases of potential discrimination occur when individuals interact with complex databases as they verify their identity. | 52
Left unresolved, technical issues like this could create higher barriers to employment or other critical needs for certain individuals and groups, making imperative the importance of accuracy, transparency, and redress in big data systems. | 52
There is, however, a whole other class that merits concern: the use of big data for deliberate discrimination. | 53
A significant finding of this report is that big data could enable new forms of discrimination and predatory practices. | 53
Data that is socially beneficial in one scenario can cause significant harm in another. | 56
Perhaps most important of all, a shift to focus on responsible uses in the big data context allows us to put our attention more squarely on the hard questions we must reckon with: how to balance the socially beneficial uses of big data with the harms to privacy and other values that can result in a world where more data is inevitably collected about more things. | 56
Big data also introduces many quandaries. By their very nature, many of the sensor technologies deployed on our phones and in our homes, offices, and on lampposts and rooftops across our cities are collecting more and more information. Continuing advances in analytics provide incentives to collect as much data as possible not only for today's uses but also for potential later uses. Technologically speaking, this is driving data collection to become functionally ubiquitous and permanent, allowing the digital traces we leave behind to be collected, analyzed, and assembled to reveal a surprising number of things about ourselves and our lives. These developments challenge longstanding notions of privacy and raise questions about the 'notice and consent' framework, by which a user gives initial permission for their data to be collected. | 58
An important finding of this review is that while big data can be used for great social good, it can also be used in ways that perpetrate social harms or render outcomes that have inequitable impacts, even when discrimination is not intended. Small biases have the potential to become cumulative, affecting a wide range of outcomes for certain disadvantaged groups. | 58
As students begin to share information with educational institutions, they expect that they are doing so in order to develop knowledge and skills, not to have their data used to build extensive profiles about their strengths and weaknesses that could be used to their disadvantage in later years. | 63
Students and their families need robust protection against current and emerging harms, but they also deserve access to the learning advancements enabled by technology that promise to empower all students to reach their full potential. | 64
This combination of circumstances and technology raises difficult questions about how to ensure that discriminatory effects resulting from automated decision processes, whether intended or not, can be detected, measured, and redressed. | 64
To prevent chilling effects to Constitutional rights of free speech and association, the public must be aware of the existence, operation, and efficacy of such programs. | 66

Photo: Flickr user Stephen John Bryde 


About the Author

Daniel Castro is the director of the Center for Data Innovation and vice president of the Information Technology and Innovation Foundation. Mr. Castro writes and speaks on a variety of issues related to information technology and internet policy, including data, privacy, security, intellectual property, internet governance, e-government, and accessibility for people with disabilities. His work has been quoted and cited in numerous media outlets, including The Washington Post, The Wall Street Journal, NPR, USA Today, Bloomberg News, and Businessweek. In 2013, Mr. Castro was named to FedScoop’s list of “Top 25 most influential people under 40 in government and tech.” In 2015, U.S. Secretary of Commerce Penny Pritzker appointed Mr. Castro to the Commerce Data Advisory Council. Mr. Castro previously worked as an IT analyst at the Government Accountability Office (GAO) where he audited IT security and management controls at various government agencies. He contributed to GAO reports on the state of information security at a variety of federal agencies, including the Securities and Exchange Commission (SEC) and the Federal Deposit Insurance Corporation (FDIC). In addition, Mr. Castro was a Visiting Scientist at the Software Engineering Institute (SEI) in Pittsburgh, Pennsylvania where he developed virtual training simulations to provide clients with hands-on training of the latest information security tools. He has a B.S. in Foreign Service from Georgetown University and an M.S. in Information Security Technology and Management from Carnegie Mellon University.


