Archive for the 'semantic web' Category

Apps and applications – Concurrent Session 12 – VALA 2012

mobile devices, online presence, semantic web, tagging

QR codes: do they provide the missing link between the physical and the digital? – Tristan Badham

Received the VALA Travel Scholarship to see how QR codes are being used in libraries: their implementation, their reception by users and staff, further ideas, etc.

QR codes are two-dimensional barcodes. When one is scanned by a mobile device, you are linked to the resource the creator intended. This could be a website, an email address, a phone number or coordinates on a map. To make use of QR codes, you need a mobile device with a camera, an Internet connection and a QR code reader (an app).
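To make the "website, email address, phone number, coordinates" point concrete: a QR code just stores a short piece of text, and the reader app decides what to do with it based on its form. The minimal sketch below builds such payload strings using the standard URI schemes (`mailto:`, `tel:`, `geo:`). The helper names and sample values are hypothetical, and turning a string into a printed code would need a separate QR-generation library.

```python
from urllib.parse import quote

# Hypothetical helpers that build the text a QR code would encode.
# The reader app acts on the decoded text according to its scheme.

def url_payload(url):
    """A web address is encoded as-is; scanning opens it in a browser."""
    return url

def email_payload(address, subject=""):
    """A mailto: URI, with an optional percent-encoded subject line."""
    payload = f"mailto:{address}"
    if subject:
        payload += "?subject=" + quote(subject)
    return payload

def phone_payload(number):
    """A tel: URI; spaces are stripped so the number is one token."""
    return "tel:" + number.replace(" ", "")

def map_payload(lat, lon):
    """A geo: URI (RFC 5870) holding latitude and longitude."""
    return f"geo:{lat},{lon}"

if __name__ == "__main__":
    print(url_payload("https://www.example.org/library/map"))
    print(email_payload("ask@example.org", "Research help"))
    print(phone_payload("+61 3 9999 0000"))
    print(map_payload(-37.81, 144.96))
```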

They are being used a lot in advertising, on real estate signs, etc.

How they can benefit libraries:

  • act as a bridge between the physical and the digital

  • make access to information and resources easier

Providing information at the point of need:

  • video guide on how to use the print management system

  • a map of the library layout

  • library audio tours

Catalogue records, with links to specific location information

QR codes within the collection, to link to online resources – particularly to the mobile version of them.

Social media – can promote your library’s social media presence.

Contact the library and research help – particularly SMS reference services.

Other non-library uses included:

  • videos by curators talking more about the art work being viewed

  • the Powerhouse Museum has built an app that opens when its QR codes are scanned – each specific piece links to a specific page

Positives:

  • Cheap and relatively easy to implement – the main cost is staff time and sign creation – and they also allow good usage tracking

  • Marketing appeal – makes the library look tech savvy
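On tracking: one common approach (an assumption here, not something the talk specified) is to give each printed code its own URL carrying analytics parameters, so your web statistics show which sign was scanned. This minimal sketch appends Google Analytics-style `utm_*` parameters; the function name, URLs and location labels are hypothetical.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit

def tracked_url(base_url, location, campaign="qr-codes"):
    """Add utm_* analytics parameters recording which sign was scanned."""
    params = urlencode({
        "utm_source": location,   # e.g. which sign or shelf the code is on
        "utm_medium": "qr",
        "utm_campaign": campaign,
    })
    scheme, netloc, path, query, fragment = urlsplit(base_url)
    query = f"{query}&{params}" if query else params  # keep any existing query
    return urlunsplit((scheme, netloc, path, query, fragment))

if __name__ == "__main__":
    print(tracked_url("https://www.example.org/print-guide", "level-2-printers"))
```

The trade-off is length: every extra parameter makes the URL longer, and a longer URL means a denser code, which is where a URL shortener helps.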

Negatives:

  • The device can’t read the code – it is too big or too small, or the lighting is poor

  • Used the wrong way

  • People don’t know what they are or how to use them

  • Devices need a QR code reader, which means downloading an app.

The more data the code has to hold (for example, a long URL), the denser and more complicated the QR code becomes, and the more detail it loses when resized. If that is the case, use a URL shortener first.
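A rough way to see why shortening helps: QR codes come in "versions", and each version up adds 4 modules (squares) per side, so a longer URL needs a bigger grid, and at a fixed printed size each module shrinks. The sketch below finds the smallest version for a given string using the byte-mode capacities at the lowest error-correction level (L) from the published QR capacity tables. It covers versions 1 to 10 only, and real encoders may choose differently (other modes or error-correction levels), so treat it as an approximation.

```python
# Byte-mode capacity (characters) at error-correction level L for
# QR versions 1-10, from the published QR code capacity tables.
BYTE_CAPACITY_L = [17, 32, 53, 78, 106, 134, 154, 192, 230, 271]

def smallest_version(data):
    """Smallest QR version (1-10) able to hold `data` in byte mode at level L."""
    n = len(data.encode("utf-8"))  # byte mode counts bytes, not characters
    for version, capacity in enumerate(BYTE_CAPACITY_L, start=1):
        if n <= capacity:
            return version
    raise ValueError("too long for versions 1-10; shorten the URL first")

def modules_per_side(version):
    """Version 1 is 21 x 21 modules; each later version adds 4 per side."""
    return 17 + 4 * version

if __name__ == "__main__":
    long_url = "https://catalogue.example.org/search?q=" + "x" * 61  # 100 characters
    short_url = "https://example.org/a"                              # 21 characters
    for url in (long_url, short_url):
        v = smallest_version(url)
        side = modules_per_side(v)
        print(len(url), "chars -> version", v, f"({side} x {side})")
```

With these (hypothetical) URLs, the 100-character address needs version 5 (37 × 37 modules), while the 21-character shortened one fits in version 2 (25 × 25).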

So are they really all that good?

  • They are worth exploring as a useful technology, especially within the broader context of the move to mobile technologies

  • one tool among many

  • could complement rather than compete with other technologies

Awareness of the codes is high amongst young people, even if they don’t know what they are called.

QR codes could burst onto the scene, but if they are used in the wrong way, they may disappear (a bit like Brendan Fraser as an actor).

Hacking the nation: Libraryhack and community-created apps – Margaret Warren and Richard Hayward

LibraryHack was created to foster re-use of library data – a direct result of the NSLA Re-imagining libraries vision. QSL was responsible for Project 5 – community created content. It aims to make real the ability to help people find, remix and create new content. LibraryHack had four parts.

  1. Release of library data and digital content for re-use.

Data was to be made available on data.gov.au, to ensure the data was discoverable where other public data was available and to add a presence for cultural data. All ten participating libraries placed their data in this central location. Fifty-three datasets were added, primarily images, but also search transaction logs, music and art. Data was licensed for re-use using Creative Commons. Copyright is an important consideration. They discovered that including geo-spatial data made the data more popular and re-usable, and that most library formats are not re-user friendly. If we want to encourage more photo mash-ups, we need to make high resolution images publicly available.
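To illustrate the geo-spatial point, here is a minimal sketch of turning image records that carry coordinates into GeoJSON, a format that mapping tools readily consume. The record fields (`title`, `lat`, `lon`, `licence`) and sample values are hypothetical, not the actual structure of the LibraryHack datasets.

```python
import json

def records_to_geojson(records):
    """Convert records with coordinates into a GeoJSON FeatureCollection.

    Records lacking a latitude or longitude are skipped. GeoJSON
    (RFC 7946) orders coordinates as [longitude, latitude].
    """
    features = []
    for rec in records:
        lat, lon = rec.get("lat"), rec.get("lon")
        if lat is None or lon is None:
            continue  # no location, so nothing to put on a map
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"title": rec.get("title"), "licence": rec.get("licence")},
        })
    return {"type": "FeatureCollection", "features": features}

if __name__ == "__main__":
    sample = [
        {"title": "Queen Street, c. 1930", "lat": -27.47, "lon": 153.02, "licence": "CC BY"},
        {"title": "Undated portrait", "licence": "CC BY"},  # no coordinates
    ]
    print(json.dumps(records_to_geojson(sample), indent=2))
```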

Interestingly, Ancestry has taken on the public data and made good use of it.

  2. Ideas competition

Discovering the sorts of things that people would be interested in

  3. Hack days

Days for people interested in working with the data, to come and talk to the content specialists and to find out more about the datasets.

  4. Learning

Offered a range of learning opportunities, focused on different topics, including animation and more on how to mash-up this data. Videos are still available at QUT for anyone who is interested.

Received 168 entries for the competition, as well as people creating new apps that were never entered into the competition.

Judging criteria: use of data/digital content, originality, quality, usefulness. Judging panel came from NSLA libraries.

Ideas category winner – Discovery by Diana Iles – included maps, images, manuscripts and map overlay integration. It delivered a visual message, but can be interactive when properly encoded with geo-spatial data.

Apps category winner – Talking Maps by Michael Henderson – a multimedia walking tour of West End (a Brisbane suburb) with a custom-built geographic interface and the talkingmaps.com website, where you can listen to audio and explore images along the walk.

Photo mashups category – Reflection of Time by Andrew Young – historical images into which the artist incorporated a reflection of his own original work: a contemporary version of the same scene.

Digital media mashup category – Glorious image viewer by Mark Balandzic – projection of historical images on a variety of rotation lamps.

Collaboration was the key, both between hackers and between the hackers and the library. Mostly, it was fun.

Also resulted in great staff engagement.

Next: More. Better. Easier. Collaboration.

Harvesting and semantically tagging media releases from political websites using web services – Peter Neish

Why are they interested in media releases?

  • Play an important part in political process

  • establish a party’s position on an issue at a particular time

  • often used in time-urgent reference requests

  • may go back many years (library has database back to 1992)

The number of political media releases issued in Victoria has risen from just over 1000 in 1992 to over 6000 in 2009 and 5000 in 2010. The government puts out a lot more media releases than the opposition. The government keeps its own database of these media releases; where they were online, the library stopped duplicating that work.

Due to the potential loss of this data when a change of government occurs, the decision was made to begin harvesting this data as it was released. The aims of the project were to automate the process, to combine the different databases and to examine the possibility of automatically applying tags to media releases using web services.

Part 1 – Automation

  • Key was RSS

  • Political parties have websites, which had RSS feeds, which were used as a standard input to software.

  • They built a Java servlet which polled the political parties’ websites and retrieved the data, putting the full text and its associated metadata into the library database. It also produced and saved a PDF version of each media release.
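The servlet itself was written in Java and its code wasn't part of the talk. As an illustration of the polling step only, here is a minimal Python sketch: parse an RSS 2.0 document and keep just the items not harvested before, keyed on `<guid>` (or `<link>` as a fallback). Fetching the feed, storing the full text and generating PDFs are left out, and all names and sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

def new_items(rss_xml, seen):
    """Return items from an RSS 2.0 document that are not yet in `seen`.

    Each item is keyed on its <guid>, or its <link> if there is no guid;
    keys of returned items are added to `seen` so the next poll skips them.
    """
    channel = ET.fromstring(rss_xml).find("channel")
    fresh = []
    for item in channel.findall("item"):
        key = item.findtext("guid") or item.findtext("link")
        if key is None or key in seen:
            continue  # already harvested (or impossible to identify)
        seen.add(key)
        fresh.append({
            "title": (item.findtext("title") or "").strip(),
            "link": item.findtext("link"),
            "published": item.findtext("pubDate"),
            "text": item.findtext("description") or "",
        })
    return fresh

if __name__ == "__main__":
    feed = """<rss version="2.0"><channel><title>Media releases</title>
      <item><title>Release A</title><link>http://example.org/a</link>
            <guid>http://example.org/a</guid>
            <pubDate>Thu, 15 Jul 2010 09:30:00 +1000</pubDate></item>
      <item><title>Release B</title><link>http://example.org/b</link></item>
    </channel></rss>"""
    seen = set()
    print(len(new_items(feed, seen)))  # first poll: 2
    print(len(new_items(feed, seen)))  # second poll: 0
```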

It works: it has harvested over 11,000 media releases since July 2010, freeing up two days of staff time per week. Problems include non-standard content in feeds (e.g. dates), which they addressed with Yahoo Pipes, and websites changing their structure or CMS.
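The team used Yahoo Pipes to clean up non-standard dates. As a swapped-in illustration of the same idea (not their method), this sketch first tries the standard RSS `pubDate` format and then falls back to a few other formats, returning an ISO date. The fallback list is a hypothetical example; a real harvester would list whatever formats its feeds actually use.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime

# Hypothetical examples of non-standard date formats a feed might use.
FALLBACK_FORMATS = ["%d/%m/%Y", "%Y-%m-%d", "%d %B %Y"]

def normalise_date(raw):
    """Return an ISO 8601 date (YYYY-MM-DD) from a feed's date string."""
    raw = raw.strip()
    try:
        # Standard RSS pubDate, e.g. "Thu, 15 Jul 2010 09:30:00 +1000"
        return parsedate_to_datetime(raw).date().isoformat()
    except (TypeError, ValueError):
        pass  # older Pythons raise TypeError here, newer ones ValueError
    for fmt in FALLBACK_FORMATS:
        try:
            return datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {raw!r}")
```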

Part 2 – Semantic tagging

Manual tagging was no longer viable. After examining many options, they went with Open Calais from Thomson Reuters. Although business focused, it matched the type of data they had, gave a good number of tags (around 20) with minimal false matches, had good documentation and community, and had generous limits on API calls. Unfortunately, its algorithm is a closely kept secret and not as much development is happening. Check out an example at http://viewer.opencalais.com/.

User Interface – did some useful user testing which helped inform the creation of the interface.

Review of tagging: about 85% of tags were correct, 4% were incorrect, 6% repeated and 5% redundant. One thing it always got wrong was Victoria, which it placed in the Seychelles – very frustrating.
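Some of these errors (repeated tags, a place resolved to the wrong country) lend themselves to a simple post-processing pass after the tagging service returns its results. This is a hypothetical sketch, not part of the project as described: it applies a manual override table and then drops case-insensitive duplicates. The tag strings are made up.

```python
def clean_tags(tags, overrides):
    """Apply manual corrections, then drop repeated tags (case-insensitive).

    `overrides` maps a wrongly resolved tag to its corrected form, e.g. a
    place name that the tagging service keeps putting in the wrong country.
    """
    seen = set()
    cleaned = []
    for tag in tags:
        tag = overrides.get(tag, tag)
        key = tag.casefold()
        if key in seen:
            continue  # a repeat of a tag we already kept
        seen.add(key)
        cleaned.append(tag)
    return cleaned

if __name__ == "__main__":
    fixes = {"Victoria, Seychelles": "Victoria, Australia"}  # hypothetical override
    print(clean_tags(["Victoria, Seychelles", "Premier", "premier", "Water policy"], fixes))
```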

Linked Data – results come back in JSON and RDF. It links to its own ontology, which means there are only limited classes for government.

Media releases are now available as they are released – no backlog. Data is enriched by tagging and in future will link to other databases in the Linked Data ecosystem.

 

The informatics transform: re-engineering libraries for the Data Decade – Liz Lyon

future, future of libraries, online publishing, semantic web, staff, staff training, workforce planning

Data is the new oil – Andreas Weigend – Stanford.

Millions of pieces of data are being collected every hour of every day. Data on every corner of the world is being collected. One of the last areas of global mapping is the oceans, but even now there are robotic vessels covered in sensors exploring our oceans – they can stay underwater for decades.

UK Prime Minister David Cameron announced that the UK’s personal health information will be made available, anonymised, so that everyone can become a health researcher. You can pay $99 to get your personal genome data and then share it with the world. Companies are gearing up to track your retail transactions through your smart phone – Google Wallet.

One in every 5 people on earth is on Facebook – 30 billion pieces of content are shared on it monthly. Flickr gets 3000 images per minute. 450,000 new Twitter accounts daily. Every minute, there are more than 138,000 new tweets. And that’s all data on the airwaves.

Data is the new oil, yes, but it is more like soup – it’s messy and you don’t know what’s in it.

Quantified self movement – self knowledge through numbers. Recording your bodily functions, physiology, moods etc. and using that knowledge to improve your life. The DIY approach to managing data.

The Herculean and Heroic approach to dealing with data includes the search for the God particle. The data is so massive, that external teams are being brought into CERN to help filter it.

Crowd-sourced approach, such as amateurs involved in helping discover new planets.

Researchers need help to manage their data, which librarians can provide with a bit of re-engineering.

1. Leadership – getting the attention of academics is one of the hardest things. Six reasons why you should care about data management:

  • Risk: where is your data – a fellow UK university lost a lot of data in a tragic fire

  • Reputation: data access, FOI – since the Climategate case, universities have become reluctant to share data around certain topics

  • Quality: data gold standard – to prove research assertions, you should be able to replicate the data that underlies them

  • Scale: an explosion of data – there has been a massive explosion in the amount of genome data, which is costing less and less. Sharing data has led to progress on Alzheimer’s.

  • Funding: research councils are expecting universities to develop road-maps for research data management that align with that council’s requirements – otherwise funding will be cut.

What libraries can offer is some carrots (after the sticks being imposed):

2. Research Data Management services – providing tools and support

  • understanding data requirements – what data you have, its types and its state – the Data Asset Framework or CARDIO can help in these assessments (DCC tools; ANDS is the Australian equivalent)

  • data management plans – tools include DMP Online and DMP Tool

  • advocacy and training – informatics, storage etc.

  • data licensing

  • tools to track impact eg. Total Impact – can be used on all online output

At Bath, they have a partnership approach. Internally, they work with UKOLN, the Library, IT, Research Support Office and Doctoral training Services. Their research is then often in partnership with external organisations, including commercial enterprises. http://blogs.bath.ac.uk/research360/

Library and institutional stakeholders were identified, and tables were drawn up of their responsibilities, requirements and relationships.

 3. Developing data informatics capacity and capability (the skills)

These are explored well in “Managing research data” by Sheila Corrall and “Reskilling for research” from RLUK.

 Points to consider:

  • there is a skills shortage for data informatics support in libraries

  • what is being taught in our LIS curriculum that fits to support today’s researchers?

  • people of what background are enrolling in LIS courses?

  • do we get credit for informatics work?

 A plan for action:

  • define core components of data informatics – visualisation, workflow and analysis

  • analyse LIS entry qualifications and increase STEM entrants

  • International Data Informatics Working Group to explore, promote, recognise and reward

Lots of jobs becoming available for this skill set, internationally. In other sectors, there are already data journalists (The Guardian) and data artists (the New York Times), who tell stories with data, using visualisations.

Lots of implications for big data and data science. McKinsey Global Institute predicts a shortage of 190,000 data scientists by 2019.

Many of the tasks that data scientists carry out have a lot of synergies with what librarians do.

Managing research data effectively will give an organisation a business advantage.

The ability to take data – to be able to understand it, to process it, to extract value from it, to visualise it, to communicate it – that’s going to be a hugely important skill in the next decades, not only at the professional level but even at the educational level for elementary school kids, for high school kids, for college kids. Because now we really do have essentially free and ubiquitous data. So the complementary scarce factor is the ability to understand that data and extract value from it.

I think statisticians are part of it, but it’s just a part. You also want to be able to visualise the data, communicate the data, and utilise it effectively. But I do think those skills – of being able to access, understand, and communicate the insights you get from data analysis – are going to be extremely important. Managers need to be able to access and understand the data themselves.

Hal Varian – Chief Economist – Google

Libraries are on a data journey – the Informatics Transform is a step in a new direction.

Linked data: weaving the web of libraries, museums and archives – Eric Miller

future, future of libraries, libraries, semantic web No Comments »

The web is the most successful commerce and communication platform ever conceived. It has become so pervasive in such a short time – no other technology has been as pervasive or as universal. It has quickly become one of the most pervasive data management and integration platforms ever imagined. And no-one owns it.

It has moved from being only a communication tool to being a data tool. Most of the web currently is pages and links – it’s things pointing at other things, via a common platform, which can be accessed from a variety of devices. The Web as a protocol has been a very effective way of wrapping other protocols which are required for specific purposes. It’s a very lightweight infrastructure – a very powerful unifying principle. It has enabled people to make connections on the web, record the connection and make it available for others to follow. And it was done by us!

Most of the web is for humans, but opaque to machines. We understand relationships, but to machines it’s just code. We add the meaning.

Most of the web is connected, but compartmentalised. It’s page-granular – pointing from one page to another. Not much is being done with the underlying data. But there are sites like Expedia.com and retrievr which grab data from other sites.

Remix

  • mix data from different sites to provide added value

  • the mix sources don’t need to be involved

  • hybrid client-server mode

Problems:

  • data is mostly locked up in pages

  • each website is different

  • and keeps changing

  • very blurry lines between use and fair-use

  • even after extraction, data needs to be modeled so that it can be mixed

  • a remixed website looks like another website (so difficult for further mixing)

Remixing is extremely useful, hard and doesn’t cascade well.

Success story: News!

Whether it’s RSS or Atom, a feed describes a chronology of news items: consumers poll and receive new items, items can easily be mixed up by websites and applications, and they cascade. A wide range of applications can also be built on that, e.g. Pulse.

They achieved that by using XML instead of HTML, giving extensibility through XML namespaces and granularity at the news item level.
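A minimal sketch of why feeds remix and cascade: because items are self-contained XML elements, the items from several RSS 2.0 feeds can be merged into a new feed, and that output can itself be merged again. It assumes every item has a `pubDate` with an explicit timezone offset (such as `+1100`) so dates are comparable; the function names and sample feeds are hypothetical.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

def merge_feeds(feeds, title="Merged feed"):
    """Merge the <item>s of several RSS 2.0 documents, newest first, into a new RSS document."""
    items = []
    for xml_text in feeds:
        items.extend(ET.fromstring(xml_text).find("channel").findall("item"))
    # Assumes every item has a pubDate with an explicit offset, so all
    # datetimes are timezone-aware and comparable.
    items.sort(key=lambda it: parsedate_to_datetime(it.findtext("pubDate")), reverse=True)

    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    channel.extend(items)  # the original <item> elements are reused as-is
    return ET.tostring(rss, encoding="unicode")

def feed(*items):
    """Hypothetical helper: build a tiny RSS document from (title, pubDate) pairs."""
    body = "".join(f"<item><title>{t}</title><pubDate>{d}</pubDate></item>" for t, d in items)
    return f'<rss version="2.0"><channel><title>t</title>{body}</channel></rss>'

if __name__ == "__main__":
    a = feed(("Library news", "Mon, 06 Feb 2012 09:00:00 +1100"))
    b = feed(("Museum news", "Tue, 07 Feb 2012 09:00:00 +1100"))
    merged = merge_feeds([a, b])
    # The output is itself a feed, so it can be mixed again (it cascades).
    c = feed(("Archive news", "Wed, 08 Feb 2012 09:00:00 +1100"))
    remixed = merge_feeds([merged, c])
    print([i.findtext("title") for i in ET.fromstring(remixed).iter("item")])
```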

But it’s not enough. Limitations include no standard ways of representing relationships between items (it’s all temporal and chronological), no ways of joining similar items and no standard way to query the web other than polling (you can only get the most recent stuff).

How do we solve these issues? Linked data – ways to integrate data in a huge range of ways. Databases are set up for the types of queries you expect to receive; not knowing what sort of queries were going to be received, linked data had to be built for flexibility.

Linked data is a term used to describe a recommended best practice for exposing, sharing and connecting pieces of data, information and knowledge on the semantic web using URIs and RDF. (Wikipedia) This allows us to get down to the level of relating things, not just pointing to other things.
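To make "using URIs and RDF" concrete: RDF data is a set of subject, predicate, object triples in which things are named by URIs. This sketch serialises a few triples in N-Triples, one of the simplest RDF syntaxes. The predicates are real Dublin Core and FOAF terms, but the subject URIs (under example.org) and the values are hypothetical, and a real project would more likely use an RDF library than hand-rolled serialisation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class URI:
    """Marks a value as a URI (an identifier) rather than a literal string."""
    value: str

def term(t):
    """Serialise one RDF term in N-Triples syntax."""
    if isinstance(t, URI):
        return f"<{t.value}>"
    # Plain literal: escape backslashes, quotes and newlines.
    escaped = str(t).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'

def to_ntriples(triples):
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)

DCT = "http://purl.org/dc/terms/"
photo = URI("http://example.org/photo/42")   # hypothetical identifiers
person = URI("http://example.org/person/7")

triples = [
    (photo, URI(DCT + "title"), "Queen Street, 1930s"),
    (photo, URI(DCT + "creator"), person),    # a link to a thing, not just a string
    (person, URI("http://xmlns.com/foaf/0.1/name"), "A. Photographer"),
]

if __name__ == "__main__":
    print(to_ntriples(triples))
```

The second triple is the point of the exercise: the creator is an identifier that other datasets can also point to, rather than a name string.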

This web of data is about making it easier to publish, remix, cascade this data and empower people to do new and interesting things with this data, at a reduced cost.

Many organisations are looking at this as a framework to expose their data, not just libraries, museums and archives. He showed backstage.bbc, the New York Times, NPR, The World Bank, Data.gov, HM Government and many national libraries.

We are no longer matching on the string, but on the identifier. These organisations are creating identifiers for the concepts that they are concerned about sharing. These identifiers can be reused, rethought or new ones can be created.
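A small illustration of matching on the identifier rather than the string: the two hypothetical datasets below both mention "Victoria", but only some records mean the same place. Joining on the label over-matches; joining on the place URI does not. All records and URIs are invented for the example.

```python
# Hypothetical records from two institutions. "Victoria" is ambiguous as a
# string; the place_id URI says which Victoria is meant.
library = [
    {"item": "Media release 101", "place": "Victoria",
     "place_id": "http://example.org/place/victoria-australia"},
]
museum = [
    {"item": "Tram ticket", "place": "Victoria",
     "place_id": "http://example.org/place/victoria-australia"},
    {"item": "Postcard", "place": "Victoria",
     "place_id": "http://example.org/place/victoria-seychelles"},
]

def join_on(key, left, right):
    """Pair up records whose values for `key` are equal."""
    return [(a["item"], b["item"]) for a in left for b in right if a[key] == b[key]]

if __name__ == "__main__":
    print(join_on("place", library, museum))     # string match: both museum items
    print(join_on("place_id", library, museum))  # identifier match: only the tram ticket
```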

Integration is not achieved by heaping it all into centralised repositories or apps, but by leaving data where it naturally resides and making it easy to connect to.

There is power in human computing – OCR correction, captchas. The power of identifiers – Creative Commons – the licences are identifiers. We are assigning these relationships, making it easier for the search engines to bring back things that we can re-use.

Power of recombinant data – Lego works. Lego can be recombined to create new things. It works for Eric’s kids and it has its own meaning, which is understood and done quickly.

RDF (Resource Description Framework) – a common model for identifying and linking data. It can link a wide variety of types of data that we didn’t traditionally see as linkable. If the data can be surfaced, it doesn’t matter what format it’s in; it can be referenced and linked.

What’s the catch? It takes the big step of fundamentally rethinking applications and their integration: not applications on the web, but in the web, using the web’s existing architecture. I want your data, in my way!

Example: where to stay? He asked for accommodation recommendations and was sent a website which listed local hotels and motels. He was able to scrape the data, encode it as addresses, prices etc. and then display it on a map. He built wrappers and scrapers to extract data from his calendar, to then match up where his meetings were to be held in relation to potential accommodation.
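The final matching step in this example amounts to ranking places by distance. A minimal sketch, assuming the scraped hotel records and the meeting location already have latitude and longitude: compute great-circle (haversine) distances and sort. The record layout, names and coordinates are hypothetical; the approach Eric actually used isn't described in these notes.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def rank_by_distance(places, meeting):
    """Sort scraped accommodation records by distance from a meeting venue."""
    return sorted(
        places,
        key=lambda p: haversine_km(p["lat"], p["lon"], meeting["lat"], meeting["lon"]),
    )

if __name__ == "__main__":
    meeting = {"name": "Conference venue", "lat": -37.8255, "lon": 144.9570}  # hypothetical
    hotels = [
        {"name": "Motel A", "lat": -37.8400, "lon": 144.9800},
        {"name": "Hotel B", "lat": -37.8260, "lon": 144.9580},
    ]
    for h in rank_by_distance(hotels, meeting):
        d = haversine_km(h["lat"], h["lon"], meeting["lat"], meeting["lon"])
        print(f'{h["name"]}: {d:.1f} km')
```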

LOC Digital Preservation Program:

  • 180+ partners (NDIIPP)

  • Located across the globe

  • each with different charters, goals, budgets

  • benefits for sharing and connecting their data

  • but it exists in disconnected silos

In order to facilitate the sharing, they created “ViewShare – interfaces to our heritage”. http://www.viewshare.org

Using identifiers, we can specify data and then contribute more data – e.g. once a field is assigned the address type, latitude and longitude can then be added. He was able to search the Powerhouse collection and narrow the results by the height of the title, as the Powerhouse surfaces this data.

The solution is to empower users to create their own views of data and build a community around the data.

Linked data gives us simple conventions for expressing context, a mechanism for collaborating despite different points of view and a mechanism for recording agreements as they evolve. It’s about building on how people communicate to mature the way systems interact.

Adoption: the Google, Microsoft and Yahoo schema.org effort, and the LOC MARC efforts.
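For a sense of what the schema.org effort looks like in practice, this sketch builds a schema.org `Book` description as JSON-LD, one of the syntaxes schema.org supports; it would be embedded in a web page inside a `<script type="application/ld+json">` element. The title, author, ISBN and catalogue URL are hypothetical.

```python
import json

def book_jsonld(title, author, isbn, url):
    """Describe a book using schema.org's Book type, as a JSON-LD dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Book",
        "name": title,
        "author": {"@type": "Person", "name": author},
        "isbn": isbn,
        "url": url,  # the catalogue record's own web address
    }

if __name__ == "__main__":
    record = book_jsonld("An Example Title", "Jane Citizen", "9780000000000",
                         "https://catalogue.example.org/record/123")
    print('<script type="application/ld+json">')
    print(json.dumps(record, indent=2))
    print("</script>")
```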

Libraries have the opportunity to use our trust, brand and skills to be involved in making these connections. It’s not far from where we are to where we need to go. We need to expose what we have, build the policies that enable this and empower our users to build off it.

 

 

 

VALA 2010: a reflection

blogging, conference, mashups, metadata, open source software, presentations, semantic web

I can’t believe it’s been 3 weeks since VALA 2010 finished. But it has, and in the wake of all my notes from the conference, and inspired by some excellent summary blog and twitter posts from fellow conference attendees, here are my key reflections from VALA 2010.

1. Discovery layers

It doesn’t matter what vendor you use these days, a discovery layer will sit over pretty much every library system and open your content to your users in a new and exciting way. Academic and State Libraries have already implemented this software and public libraries are starting to. And it sits on top of your website to give the integration between the website and catalogue that our users expect and that librarians have been seeking.

I never realised the range of offerings available until I chaired the Vendor session which demonstrated a wide range of the offerings available from different companies. If you don’t already have a discovery layer in place or in process, you need to be looking at them now.

2. Metadata

I have heard talk about metadata for well over a decade. Until now, I thought it was the domain of repositories, archives and the like. After VALA 2010 I can finally see its relevance for my own library’s web content, which is neither archival nor related to repositories in any form.

So add another thing to the list of things to do.

3. Semantic Web

Linked data and the whole concept of the semantic web are moving from concept to reality in small ways. It’s fascinating to watch this evolution from concept to working tools. It’s early days yet, but there will be a lot more interesting developments in these areas in coming years, which I will be watching for with continued interest.

4. Mashups and APIs

I always thought that APIs really belonged to the realm of programmers or those with some programming knowledge/skill, of which I have a minuscule amount. After listening to Paul Hagon at the L-Plate Series at VALA, that misconception has been corrected. I have already been playing with APIs without realising it (it’s only Google Maps, but hey, it’s still an API) and Paul pointed out some great tools to help us get into some more serious stuff. It’s time to play! Thanks Paul.

5. Trove

This new service from the National Library of Australia is very cool and I look forward to learning more about it and seeing how we can better utilise it and promote it to our users. There were several papers on Trove, so check them out to find out more about how it was created and exactly what it can do.

6. Open source

It is more widespread than I had ever thought. But when I did think about it, I realised that we are using so much open source software already – it runs our Internet servers and our browsers, as well as much of our communications. Is it that big a step for us, then, to start using open source software for other purposes? It’s already proven its worth in those areas.

7. Twitter and Blogging

Twitter was the new kid on the block at the last VALA conference.  This year, it made its presence felt big time.  It was a great back channel to what was going on in other sessions, a guide to what was worth checking out and a great way to network with other librarians, both at the conference and following along from outside.

Much to our delight, the hash tag #vala2010 was in the top 5 twitter tags in Australia the week of the conference, hitting number 1 on the Thursday – the last day.  It was also a great delight to finally meet all those twitterers I had only known online before then and to meet and start following twitterers that I met there. I think that I have started following at least another 20 people since the start of the conference.

Keep up the good work all – you make working on computers all day all the more interesting, and what you share is entertaining, informative and useful in turn.

Twitter probably outdid blogging in terms of content sharing this VALA, but it still had its place for the detail on content. Being a conference blogger myself, I really appreciate the depth that I can get from a blogger’s reports. They are also a great teaser for the papers that I may want to go and read in full. The papers BTW are freely available from the VALA website – well worth checking out.

8. Networking

It was the best conference ever, for just spending time with other like-minded library staff.  The social events were great for this, but it was even happening whilst waiting for sessions to start, or during the breaks. It was wonderful sharing thoughts, ideas, feedback and what you’re up to, with other enthusiastic librarians (and others), who speak the same language.

9. Presenting

I was fortunate enough to present two papers, and get away with it, lol. Both my papers, presented with two different co-authors, were well received, much to my amazement and relief. I have had several people follow me up with questions on both papers since, much to my delight.

Writing a paper is a difficult enough process to begin with, but then trying to present that paper in a snapshot presentation is even more so. I learnt a lot from other presenters at VALA about how to engage the audience and even how to present so that you retain their interest.

10. VALA Conference Committee

I was a member of the conference program committee this year, but the role we played was so small, compared to all the work put in by the VALA committee in general. These guys all have regular jobs and real lives, yet put everything into getting this conference off the ground, running as well as it did and responding to issues quickly and efficiently as they arose.

Alyson Kosina, the backbone of VALA is an amazing lady, who you should take a moment to meet and chat with. You will walk away enriched. David Feighan and Bart Rutherford, the Conference Chair and VALA president respectively, were endlessly everywhere, managing, listening, participating, anticipating and in Bart’s case, presenting one paper when the speakers couldn’t get here in time. Dedication personified.

I really enjoyed working with them in the small role I played and learnt a lot. I very much look forward to more opportunities to be involved with VALA.

And amazingly, this blog post has ended up with 10 reflections. That was not my intention; it just developed that way.

Thanks to all my co-conference attendees for helping to make it the best conference I have ever attended.  Bring on #VALA2012!

Information Online 2007 – Day 1 First session.

mobile web, Online 2007, Online conference, semantic web, sensor web, Web 2.0

Well here I am in Sydney attending one of Australia’s premier library conferences, the Information Online conference 2007. It was a big day today, with 3 keynotes as well as other sessions. I will do my best to summarise here what I have taken down in lots of written notes. And to save everyone eye strain, I have split at least the first day into morning and afternoon sessions.

Special Minister of State Gary Nairn officially opened the proceedings, with some interesting information and a reasonable insight into what librarians are on about. Of most interest was the e-government strategy and the www.australia.gov.au portal, which is the gateway to all federal government websites. At present, 13% of people dealing with government do so only online, with Minister Nairn anticipating that a figure of 30-40% will really redefine how government offers services. Blogs are also on their radar, as are other Web 2.0 applications and mashups.

The opening keynote was from Ross Ackland, Director of the Australian office of the W3C and the CSIRO ICT Centre. He gave a very interesting take on Where the web is heading, from both the W3C perspective and his own experiences.

W3C’s long term goals are: the web for everyone; the web on everything; a knowledge base, which is advanced data searching and sharing; and trust and confidence, where there is collaboration, accountability, security, confidence and confidentiality. The next step for us as consumers is to use our portable devices as our purchasing power, moving on from credit cards. Although only 3 organisations in Australia are W3C members (CSIRO, Vision Australia and AGIMO), there has been significant technical input from Australia on W3C standards.

So from here it’s the Semantic Web – where the meaning of information is understood by machines, making searching more successful. Although it is not there yet, much work has been done on the foundations on which this will rest (i.e. XML, ontologies etc.). He believes that there is another 5 years before it is ready for market adoption.

In the meantime, he believes that Web 2.0 is providing great complementary interfaces. They pave the way to the eventual rise of the semantic web, by getting users accustomed to collaboration, open interfaces and applications that can leverage multiple services.

The mobile web has a W3C initiative (2005) behind it, which has also been fully backed by all the major telecommunication companies. End user acceptance is the catalyst needed now. Libraries need to seriously think about delivering information to devices that are no longer sitting on a desktop. Think phones, PDAs and more.

Sensor web is the streaming data coming from wireless devices that sense their environment, used in environmental monitoring, home automation, security, personal health monitoring and entertainment. The monitoring devices are cheap, but how do we manage the streamed data that they will generate? Issues also arise in the searching, integration, translation and storage of such data.

He also spoke about how the Internet has a role to play in Australia’s Water Crisis, including bringing all water data together from very diverse sources, so that the best decisions can be made on how to proceed. (check out http://wron.net.au)

His predictions are that the Web will accelerate in development, with Web 2.0 being only the tip of the iceberg; that libraries have to stop building traditional websites; that mobile will become equal to the desktop; and that anyone will be able to build web applications. Wow, it sounds like it’s going to get even more interesting.