The Troubled Ethics of Open Data
A screenshot of the Cook County Medical Examiner Maps application, which allows users to access and filter records about primary causes of death in Cook County, Illinois

The white-painted wooden crosses showed up in June, on the front stairs of a
house around the corner from mine in Chicago. Honoring a life and
mourning the loss thereof, they quietly announced a family’s trauma. For
days following, new faces came and went, providing support to relatives
and making the necessary arrangements. Without the tradition of the
crosses, neighbors who didn’t know these people would have just seen the
occasion as another family gathering. In moments like these, privacy
can allow people to process and suffer at a pace that’s healthy for
them.

The week after the crosses appeared, I was in the midst of research about
public health in Chicago’s neighborhoods. Using a tool published by the
Cook County Medical Examiner’s office, I could access and filter records
about primary causes of death that were of interest to my work, like
cardiovascular disease. Before applying filters, though, I generally
liked to use the tool’s mapping features to get a look at overall causes
of death in each of the neighborhoods I analyzed, color-coded to
reflect natural deaths and various categories of human-caused deaths
like accidents and homicides. The big picture helped form an initial
understanding of a neighborhood’s age makeup, nutritional health, and
other contributing factors to mortality. While browsing through my own
neighborhood, I ran across a little dot representing that address with
the crosses around the corner. When I hovered over it, the following was
spelled out in clear, cold letters:
Gender: FEMALE
Age: 12
Manner Of Death: SUICIDE
Primary Cause: HANGING
—
In 2013, the U.S. presidential administration of Barack Obama created an Open Data Policy
for federal agencies, requiring those agencies to “collect or create
information in a way that supports downstream information processing and
dissemination activities”. In other words, data about the daily
operations of the federal government would be made more accessible to
the public than ever before. This policy came into existence as an
outgrowth of a then-decade-old movement toward “open government”,
closely linked to what is now most commonly known as civic tech.

At that time, many national, regional, and local governments around the
world were taking similar steps to establish open data publication
platforms, which were intended to increase government transparency as
well as ease the handoff of data between different offices within a
single level of government. Technology-related civic advocacy
organizations like the U.K.’s mySociety, g0v.tw in Taiwan, and Code for
America combined advocacy for collaboratively built government
technology with pushes for “open by default” policies for
government-held data. The idea at the core of this open data
proliferation was that the data collected and used by our governments
was ultimately “our” data, since it reflected our civic interactions,
our public services, and the overall administration of our societies.

Open data policies have been celebrated by researchers, journalists, and
community activists. Thousands of governments now actively publish data
on topics ranging from lobbyist registrations to property transfers. A
survey conducted in 2018 by Open Knowledge International and Sunlight
Foundation found that 265 U.S. cities now meet some criteria for open data release, and information published on open data portals has been used to reveal corruption, make legislative meetings more accessible, and even create works of civically engaged art.
In many cases, proactive publication of government data has saved
municipal resources, too, since the public often previously had to
obtain that same data through a laborious public records request
process (a process that had to be repeated each time a new individual
or organization needed data access).
—
The usefulness of open data across a wide variety of disciplines has made
it an easy subject for enthusiasm in civic circles. Today, advocacy for
data openness has been embraced wholeheartedly by many groups both
inside and outside of government. However, this enthusiasm has bordered
on fanatical at times, expressed in ways that ignore many inherent
complications of the release of government-held information.

Published data often lacks the local context or domain knowledge that it was
created with. Especially in jurisdictions large enough to hire
administrative staff whose job it is to maintain an open data portal,
the people responsible for data publication are often multiple steps
removed from the people responsible for data creation. When it comes
time to analyze open data in this environment, it can be hard to
establish the trustworthiness of a “raw” dataset. Data about
police-resident interactions very rarely includes information about how
heavily each police precinct is patrolled, for instance, so anybody
drawing conclusions from the “raw” data would see an over-representation
of interactions in heavily patrolled areas. Data about anything related
to municipal complaint systems tends to be skewed toward wealthy, white areas or rapidly gentrifying areas,
giving an outsize impression of a problem in places typically already
well covered by municipal resources. And wealthy civic stakeholders
often have the time and resources to utilize public data in ways that
marginalized communities don’t, so this can become a self-perpetuating
problem when in-the-know residents use skewed data to advocate for
further targeting of resources in their area, draining those same
resources from areas with greater actual need.

Biases of collection and interpretation sometimes lead well-meaning volunteers
and organizations to create data-related solutions that don’t
effectively meet the needs of their targeted communities. Open data is
celebrated as a shared base upon which objective civic plans can be
built, but such a framing ignores that seemingly subjective knowledge
from communities can and should be an important component of planning.
And beyond that, data is always tainted in some way by subjectivity.
This is all well known in civic tech circles, but those circles often
get stuck in a mode of action that sees data publication as an ultimate
end goal, rather than as a starting point for much larger conversations
about how data is used by governments and how it can be leveraged
equitably by communities.
—
These issues aren’t new, nor are they unique to data that is published
publicly. Data held internally by governments doesn’t become more or
less biased upon publication, of course, and existing government uses of
data can be highly biased in their own right.
Data-informed solutions require local context in order to be
effective, but they also require significant work to understand and
mitigate biased decisions with respect to that context. The unique
challenge posed by open data is not that data is released or used in the
first place, but that the act of data release is governed differently
than it is under more established freedom-of-information processes.
These traditional public records requests tend to be steered by criteria
related to the potential harms of records release, with the intent to
provide data to the public after vetting. Freedom-of-information
processes are far from perfect: some agencies perpetually stonewall the release of any data, some are inept at redacting sensitive information
when requests are approved, and so on. But they are generally more
robust than open data publication processes, which are newer and often
not subject to the same legally required oversight. This means more data
is published that perhaps shouldn’t have been, like the home addresses
of public employees. Or, like the gruesome manner of death of a
12-year-old girl who lived around the corner from me.

Fundamentally, tools like the Cook County Medical Examiner Maps demonstrate that
sometimes what the open data community thinks of as “our” data may not
ethically belong to all of us, but instead may belong to a single person
or group of people who have interacted with a civic process. And when
we center openness as the end goal of a movement, we ignore those ugly
realities of injudicious data publication.

I know people on the team that built Cook County Medical Examiner Maps,
and on the teams behind many other open data tools like it that contain
highly personal information. I know that their aims are to provide data that can be made
useful and improve communities, and I know from my own experience that
tools like these can be very helpful to research
efforts or social impact projects. But all of that potential benefit
doesn’t erase the need to think seriously about harms that can come from
publication of certain data, and the need to establish strict data
governance within the teams that manage open datasets. Civic
technologists have a responsibility to fight for transparency in a way
that is cognizant of the dangers that certain forms of transparency can
bring, and we need to do better than just openness for openness’s sake.