Originally published in Online Searcher – Volume 42, Number 6 (November/December 2018)
From the article:
The environment around research data management and open data has become incredibly complex—and the evolution doesn’t appear to be slowing down at all. At the core of many of today’s challenges are machine learning, natural language processing, and predictive analytics—the methods used for processing tremendous quantities of data for a variety of intended purposes.
On a daily basis, the news is full of stories about private-sector companies and government agencies that are mining massive, internally collected sets of data for all sorts of outcomes. Technology is making it easier for organizations to become proactive in response to patterns in data. For example, with early alert systems, it is now possible for universities to identify students who might be on the cusp of dropping out in time for an advisor to intervene. Companies want to mine their customer data to achieve greater profitability and their inventory data to forecast demand for products in a timely manner.
But these types of methods aren’t restricted to closed data. In fact, one of the advantages of open data is that it allows data from disparate datasets to be combined, remixing or merging many “small data” sets to convert them into “big data.” From the perspective of funding agencies, this type of reuse is one of the intended benefits of open data. If datasets use common variables, include well-structured and organized data elements, are deposited into interoperable repositories that can be found by harvesters, and include Creative Commons Attribution (CC-BY) licenses (or another similar license allowing for reuse), other researchers are encouraged to find, access, and reuse these datasets without restrictions.
Reuse without restrictions is what sparks fear in many researchers. Once data has been published and is out in the world, you lose all control over your dataset. It can be used, combined, and repurposed in all sorts of ways—including ways you never considered, ways that could potentially put someone else in harm’s way, or for more morally ambiguous purposes.
Although we’re proponents of open data, it’s useful to know about some incidents in which open data has led to problems.
Read the full article in Online Searcher, available on the Information Today website.