VALA 2014 – Plenary 1 – Christine Borgman – University of California

Christine Borgman (UCLA) – Big data, little data, no data: scholarship in the networked world

Information has come a long way, from oral communication through, cumulatively, to the virtual. We use these forms concurrently and in different ways. We need to think about how we work with both traditional materials and ‘data’, and about new types of infrastructure to manage the morass of data.

The rest of the world is looking to the Australian National Data Service (ANDS) and to the UK for direction and ideas on this. Australia is the only place where the Code of Conduct requires researchers to manage their data well. The US is moving to follow in our footsteps.

Data are not publications and not natural objects; they are representations, and sharing and reuse depend on knowledge infrastructures. We have already begun to see the different ways that data are managed: open access publishing, repositories and more.

Publication legitimises the work – it gets its authority from peer review. It disseminates the work. It provides access and preservation. The only thing that has changed in the digital age is that access is now wider.

Scholarly communication is on average a three-year process, and the public part takes up only a small proportion of that time.

Open access publishing is digital, online, free of charge, and free of most copyright and licensing restrictions. Copyright is owned by the author, and authors write for impact. (Suber, P. Open Access.)

ANDS vision – more Australian researchers reusing research data more often. ANDS is aiming to facilitate this by improving data management.

There is no standard for what makes data open, as information is messy. One of the problems is working out who owns the data. Nobody can agree on what data is in the first place. Data could be anything from a list, but there is no true definition – even the OECD has only a very narrow definition.

Data can be so complex – how do you catalogue it?

Social scientists are having trouble getting data as people aren’t answering their phones or doors, so they are now looking to Twitter.

Data are representations of observations, objects, etc., used as evidence for the purposes of scholarship. As they are representations, they can look different in every place you put them. Just as the MARC format is the boundary around library data, other disciplines are creating their own boundaries around their data, which makes interoperability and sharing all the more difficult.

Data infrastructures are not something you build once and are done with. Building them is an ongoing process of working with the data and the data collectors on protocols, provenance and more. Data management is difficult. Reuse is very low because researchers have problems reusing even their own data, let alone someone else reusing it.

You can release data by contributing it to an archive; attaching it to a journal article (though it may be missing data or be context sensitive); posting it on a local website; licensing it on request; releasing it on request… There are also degrees of reuse: by anyone or by groups, at any time, now or into the future? This needs to be decided before making data available.

Libraries are common pool resources: limited resources which must be governed. Data does not become a common pool resource until the researcher releases it. Open access is trying to turn toll goods (something to be bought or sold) into common pool goods.

It is not only scholars and libraries who have a stake in data. Stakeholders include scholars, students, readers, universities, funders and more. They all need to be involved in the process of creating these knowledge infrastructures. We need to know what to keep, why it is being kept, how to keep it, for how long, who will govern it and what kinds of expertise are required.

Knowledge infrastructure has a technical fabric, a social fabric and a trust fabric.