Data is the new oil – Andreas Weigend – Stanford.
Millions of pieces of data are being collected every hour of every day, from every corner of the world. One of the last areas of global mapping is the oceans, but even now there are robotic vessels covered in sensors exploring our oceans – they can stay underwater for decades.
UK Prime Minister David Cameron announced that the UK’s personal health information would be made available – anonymised – so that everyone can become a health researcher. You can pay $99 to get your personal genome data and then share it with the world. Companies are gearing up to track your retail transactions through your smart phone – Google Wallet.
One in every 5 people on earth is on Facebook – 30 billion pieces of content are shared on it monthly. Flickr gets 3000 images per minute. 450,000 new Twitter accounts daily. Every minute, there are more than 138,000 new tweets. And that’s all data on the airwaves.
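The figures above are quoted on different time bases (per month, per day, per minute), which makes them hard to compare. A minimal sketch that puts them on a common per-day footing, using only the numbers quoted above; the 30-day month and the helper name `per_day` are assumptions made for illustration:

```python
# Figures quoted above, each with the time base in which it was stated.
MINUTES_PER_DAY = 24 * 60
DAYS_PER_MONTH = 30  # assumption: a 30-day month, for a rough comparison only

quoted = {
    "Facebook content items": (30_000_000_000, "month"),
    "Flickr images": (3_000, "minute"),
    "new Twitter accounts": (450_000, "day"),
    "new tweets": (138_000, "minute"),
}


def per_day(count, unit):
    """Convert a quoted rate to an approximate daily rate (hypothetical helper)."""
    if unit == "minute":
        return count * MINUTES_PER_DAY
    if unit == "day":
        return count
    if unit == "month":
        return count // DAYS_PER_MONTH
    raise ValueError(f"unknown unit: {unit}")


daily = {name: per_day(count, unit) for name, (count, unit) in quoted.items()}

# Print the largest daily volume first.
for name, value in sorted(daily.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: about {value:,} per day")
```

On these rough numbers, Facebook sharing comes to about a billion items a day and Twitter to just under 200 million tweets a day.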
Data is the new oil, yes, but it is more like soup – it’s messy and you don’t know what’s in it.
Quantified self movement – self knowledge through numbers. Recording your bodily functions, physiology, moods etc. and using that knowledge to improve your life. The DIY approach to managing data.
The Herculean and Heroic approach to dealing with data includes the search for the God particle. The data is so massive that external teams are being brought into CERN to help filter it.
Crowd-sourced approach, such as amateurs involved in helping discover new planets.
Researchers need help to manage their data, which librarians can provide with a bit of re-engineering.
1. Leadership – getting the attention of the academics is one of the hardest things. Six reasons why you should care about data management.
Risk: where is your data – a fellow UK university lost a lot of data in a tragic fire
Reputation: data access, FOI – since the Climategate case, universities have become reluctant to share data around certain topics
Quality: data gold standard – to prove research assertions, you should be able to replicate the data that underlies them
Scale: an explosion of data – there has been a massive explosion in the amount of genome data, which is costing less and less to produce. Sharing data has led to progress on Alzheimer’s.
Funding: research councils are expecting universities to develop road-maps for research data management that align them with that council – otherwise funding will be cut.
What libraries can offer is some carrots (after the sticks have been imposed):
2. Research Data Management services – providing tools and support
understanding data requirements – what data do you have, its types and its state – can use Data Asset Framework or Cardio to help in these assessments (DCC Tools) (ANDS is Australian equivalent)
data management plans – tools include DMPonline and DMPTool
advocacy and training – informatics, storage etc.
tools to track impact, e.g. Total Impact – can be used on all online output
At Bath, they have a partnership approach. Internally, they work with UKOLN, the Library, IT, Research Support Office and Doctoral training Services. Their research is then often in partnership with external organisations, including commercial enterprises. http://blogs.bath.ac.uk/research360/
Library and institutional stakeholders were identified and tabulated with their responsibilities, requirements and relationships.
3. Developing data informatics capacity and capability (the skills)
These are explored well in “Managing research data” by Sheila Corrall and “Reskilling for research” from RLUK.
Points to consider:
there is a skills shortage for data informatics support in libraries
what is being taught in our LIS curriculum that equips graduates to support today’s researchers?
what backgrounds do people enrolling in LIS courses come from?
do we get credit for informatics work?
A plan for action:
define core components of data informatics – visualisation, workflow and analysis
analyse LIS entry qualifications and increase STEM entrants
International Data Informatics Working Group to explore, promote, recognise and reward
Lots of jobs becoming available for this skill set, internationally. In other sectors, there are already data journalists (The Guardian) and data artists (the New York Times), who tell stories with data, using visualisations.
Lots of implications for big data and data science. McKinsey Global Institute predicts a shortage of 190,000 data scientists by 2019.
Many of the tasks that data scientists carry out have a lot of synergies with what librarians do.
Managing research data effectively will give an organisation a business advantage.
The ability to take data – to be able to understand it, to process it, to extract value from it, to visualise it, to communicate it – that’s going to be a hugely important skill in the next decades, not only at the professional level but even at the educational level for elementary school kids, for high school kids, for college kids. Because now we really do have essentially free and ubiquitous data. So the complementary scarce factor is the ability to understand that data and extract value from it.
I think statisticians are part of it, but it’s just a part. You also want to be able to visualise the data, communicate the data, and utilise it effectively. But I do think those skills – of being able to access, understand, and communicate the insights you get from data analysis – are going to be extremely important. Managers need to be able to access and understand the data themselves.
Hal Varian – Chief Economist – Google
Libraries are on a data journey – the Informatics Transform is a step in a new direction.