Archive for the 'archives' Category

ALIA Dreaming 08 – Weds PM Concurrent Session – Edgar Crook

archives, conference, digital library, library conferences, virtual services

Web archiving in a Web 2.0 world – Edgar Crook – NLA

NLA has 3 main methodologies for web archiving.

Pandora Archive has developed a world class archive of Australian websites, using PANDAS, their digital archiving system. PANDAS is a distributed system, so their partners can also use it. Other international library archiving systems are based on or similar to PANDAS. They have developed a persistent naming scheme and have arrangements with archiving and indexing agencies. As of 1st July 2008, it contained 19,307 titles, comprising over 53 million files and adding up to 2.2 TB of data (now over 2.4 TB). A title can be a single PDF page or an entire website. Over 50% of their files are government publications, but they also archive academic journals, blogs, podcasts and more. The archive is selective because of restrictions on staff resources etc. They choose their titles carefully and try to choose sustainable sources.

Domain name harvests – run once a year, for between 3 and 6 weeks, in conjunction with the Internet Archive. In 2008, they are looking at crawling a billion files. Copyright is a major drawback. The websites are crawled by the Internet Archive and the files are then sent to the NLA. There are gaps in the domain harvests where the website publisher bans bots, and because the crawler can't follow embedded links. There are also issues with Australian websites without .au in their name. The data is not publicly available at this time, although it is being used by researchers.
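The two kinds of gap mentioned above (publishers banning bots, and Australian sites without .au in their name) can be illustrated with a rough sketch. This is not the NLA's or the Internet Archive's actual crawler logic. The function names, the `example-harvester` user agent and the example URLs are all made up for illustration, and it assumes "banning bots" means a standard robots.txt file.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def is_au_domain(url):
    """Return True if the host name ends in .au (hypothetical helper)."""
    host = urlparse(url).hostname or ""
    return host.endswith(".au")


def crawler_allowed(robots_lines, user_agent, url):
    """Check a URL against robots.txt rules supplied as a list of lines."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


# Assumed robots.txt content for an imaginary site.
rules = ["User-agent: *", "Disallow: /private/"]

for url in [
    "http://example.com.au/public/index.html",
    "http://example.com.au/private/report.pdf",
    "http://example.com/about.html",
]:
    in_scope = is_au_domain(url)
    allowed = crawler_allowed(rules, "example-harvester", url)
    print(url, "| .au domain:", in_scope, "| robots.txt allows:", allowed)
```

A harvest that only follows .au host names would skip the third URL even if it is an Australian site, and a polite crawler would skip the second because the publisher has disallowed it.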

Archive-It – an Internet Archive product, where you can pay to have websites archived. Sites archived using this process include the PNG governmental and research institute websites, the 2007 general election (including content from YouTube and MySpace), the Cambodian election 2008, the Burmese monk uprising 2007 and more. There are restrictions, in that you can't recapture missed files and can't present the archive the way you want.
Still working on arrangements with other Web 2.0 content, e.g. Bebo, Flickr, Facebook etc.

Librarians should tell Pandora about resources it should be archiving. Take responsibility for your web presence, and make sure it remains or is archived elsewhere.

The NLA will not be making a version 4 of PANDAS, but in future will be working with international partners to develop a new backbone to the system.

ALIA Dreaming 08 – PM Concurrent Sessions – Jason M Gibson

archives, local history, virtual services

Unpacking the indigenous knowledge centre concept – Jason M Gibson

The idea of a national indigenous knowledge centre was flagged at the 2020 Summit, and the idea of regional centres in support was favourably received.

Inspiration has come from Mexico and other countries. Suspicion has been aroused by these centres as they seem to appear in countries where indigenous culture has been exploited or neglected.

In Central Australia alone there are 5 regions, with up to 20 languages in each region. Such a centre has to cater to them all.

NTL started testing this idea out nine years ago. Three remote communities were chosen to trial the knowledge centre concept. The vision was of a physical space which would be an interpretive centre, a keeping place, a museum, a library etc, with the aim of improving access to information relevant to local communities and the ability to assist in creating and hosting new content.

Several pilot services were launched but have not been sustained. In 2004, the Our Story database was launched and this has been successful. Research showed that the Our Story database had stimulated communities to conduct further research, including through the use of digital resources.

Tea Tree Gully has had quite a successful result, with stories, place names, oral histories and much more. Internet access, books and information are available in a centre open 5 days a week. The community has taken ownership of their centre.

Indigenous knowledge had not been acknowledged as a legitimate structure until the 1980s. Indigenous peoples persisted in its maintenance and creation regardless. The need is now for improved access to information in its many and varied formats.

(session ran over time, so had to leave to get to next session)

Information Online 2007 – Day 3 – Session 2

Online 2007, Online conference, archives, copyright, digital right management, online publishing, publishing

I was getting tired by now at the conference, like I am now with these writeups, so the notes are getting briefer – hang in there with me now!

Shauna Hicks from the Public Record Office Victoria (PROV) spoke on “Archives in the 21st century”. In the new PROV reading room, each desk has power and computer outlets, enabling researchers to research online, take and upload photos of archival items and more. Their help desk uses a 1800 phone number, which users can call to preplan their trip or even to avoid their trip altogether. Records can be located through the website and copies ordered for mailout. Alternatively, users can find out exactly what they need first and only have to make one trip instead of several.

Derek Whitehead from Swinburne presented “Publish and perish – the meaning of publication in the online world”, and what a can of worms that is!

Can something be “accessed, read and used” and not be published? Yes! Copyright, defamation, legal deposit and online content laws all have different definitions of published. Book publishing is different again and includes editing control, review, acceptance as a publication and commercial distribution.

Web publishing is putting information or transactions online – accessible on a web server. Published to the web (not “on”). Can we be online and published? Much debate about this. Theses are available through online depositories, but they are still not published. These are now running into copyright issues, with regard to cleared content, but the only thing that has changed is the delivery mechanism.
Archives and scholarly communication also fall into these grey areas. Is YouTube a publisher?

There is confusion over the broad and specialised meanings of copyright. Is everything now published because of the web? Online is more than a publishing medium. Think conversation, dialogue……

Questions/Thoughts:
- Do we need a word for online but unpublished?
- How will we determine ownership? (mashups, sharing etc)
- Online is not a digital version of analog. What rules apply?
- Copyright applies fully to online as a default. There is no Copyright 2.0.
- Metaphors are dangerous.
- Web helps capture and fix activities for commercial purposes – need to watch this.

What to do?
- Paper days laws threaten the online world.
- 3 actions:
– law reform
– need a new word for online but unpublished
– use the appropriate copyright licensing (e.g. Creative Commons, all rights reserved etc).

The final paper this session was Jim Alexander from CAL on “Copyright and the Online Library”. Access to content is changing through: changes to the traditional supply chain, the entry of new intermediaries (search engines), a culture of free use and the rise of free content repositories.

Digital Rights Management comes in 2 forms. Technological: including passwords, encryption, hardware/software controls. Rights Management Information: copyright, watermarks, digital signatures, metadata and now Digital Object Identifiers (DOIs), which are growing in the publishing industry. 3 key principles of DRM are:
- identification of works and copyright of owners
- monitoring of access to and use of works
- facilitating payment
DRM must be of minimal burden to rights owners and users.
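As an aside on the Digital Object Identifiers mentioned above: a DOI is made of a prefix starting with "10." and a suffix, separated by a slash. The snippet below is only a loose shape check under that assumption, not anything CAL presented. The function names are made up, the 4 to 9 digit registrant code in the pattern is a common convention rather than a formal rule, and it does not check whether a DOI actually resolves.

```python
import re

# Assumption: a loose shape check only ("10." + registrant code + "/" + suffix).
# Real DOIs can be more varied than this pattern allows.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def looks_like_doi(text):
    """Return True if the text has the basic prefix/suffix shape of a DOI."""
    return DOI_PATTERN.match(text.strip()) is not None


def split_doi(doi):
    """Split a DOI into its prefix (registrant) and suffix (item) parts."""
    prefix, suffix = doi.strip().split("/", 1)
    return prefix, suffix


example = "10.1000/182"  # example value used for illustration
print(looks_like_doi(example))
print(split_doi(example))
```

The prefix identifies who registered the identifier and the suffix identifies the item, which is what makes a DOI useful for the first DRM principle above, identifying works.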

CAL is working on DRM, offering new services such as Digital Course Material (DCM), an online custom publication system for course support. It provides licensed content from over 40 publishers and can also incorporate institutions' own licensed content.
Also Document Delivery Service – aimed at health/medical industry, giving access to content with rights cleared, quickly and conveniently.

Future: interoperable DRM for international online content access
- common rights management infrastructure
- choice for creators and quality for consumers