Woodblock printing was in use in China by 220 AD, lithography was invented in 1796 and modern-day screen printing was introduced in 1907. We’ve all had a printing history lesson before, but how are we taking the printed documents of the past and preserving them for the future? How are products printed throughout the ages being digitized and archived so that future generations can enjoy them as we have?
Too often we see print as a disposable means of disseminating information (magazines, newspapers, flyers), but what about the printed pieces that are meant to last? Print advocates and old book lovers the world over have always faced the challenge of preserving books, but digital technology now offers a viable way to capture their content long into the foreseeable future. It’s as though books used to have an expiry date, and we’ve found a way for their content to outlive the paper it was printed on.
Why Digitization is Necessary
Due to the nature of paper, deterioration will happen over time no matter how well it is preserved. Temperature, relative humidity and light all play a role in deterioration of paper, parchment, leather, adhesives and other materials used in bookbinding.
High temperature and high relative humidity can promote the growth of fungi, which can damage the book. Conversely, low relative humidity can over-dry paper, leather and other bookbinding materials, leaving them brittle and weak. Because paper is hygroscopic (it absorbs and releases moisture from the surrounding air), extreme swings in temperature and relative humidity make it expand and contract repeatedly, which causes rapid deterioration.
Light, whether fluorescent, ultraviolet or natural sunlight, can damage books by fading their pigments. Different coloured dyes and pigments fade at different rates, which can distort the original contrast of the document. In paper itself, light promotes the oxidation of cellulose, further weakening the material. Light also reacts with lignin in the paper, turning it yellowish brown over time; the more lignin a paper contains (newsprint, for example, is high in lignin), the more it yellows. Beyond these visible effects, subtler changes are happening within the paper: the fibres that make up its structure break down into shorter and shorter segments, leaving a structurally unstable document that may fall apart when handled.
What’s important to understand is that paper and books don’t last forever. No matter how well paper is preserved, organic and inorganic compounds will break down over time, which is why there is such a need to employ technology as a preservation tool. Deterioration impacts both the accessibility and longevity of printed collections, and digitization is an excellent avenue to archive important documents for years to come.
Public libraries, teachers, researchers and other citizens use archived content for educational and genealogical purposes. I had the pleasure of speaking with Kimberly Taylor, a former Digital Archivist for a regional museum who is now a secondary school History Teacher. Having been an archivist and now using archived documents in her teaching, Kimberly has a unique viewpoint. She values the historical records available online, whether free or for a fee, but points out that relative to the number of old printed documents in existence, very few are available in digital format.
Before Kimberly’s digitization project at the museum, anyone interested in viewing the old documents and photos in the museum’s possession had to make an appointment to see them. The biggest obstacle was that very few people knew the materials existed or were available for viewing.
Kimberly points out that many documents are available for people to view, but because people don’t know where to find them, they often go unseen. Many are not yet online and exist only as hard copies: “You can come to Library and Archives Canada where we have the document and you can photocopy it… but you have to drive six hours to Ottawa to get it.”
As part of her work, Kimberly digitized photographs by scanning them in RAW file format and then saving the files as TIFFs. She emphasizes that “digitizing is only the first step”: a portal must then be created so people can access the information. When grant money is scarce, museums are turning to online services such as Flickr as their portal for sharing archived documents instead of building proprietary systems.
An important consideration for a sustainable digital asset management workflow is the rapid rate of technological change: archivists must keep up to ensure their storage formats do not become obsolete. Although paper won’t last forever, it can last a very long time if properly preserved (which is why it’s important not to discard the original after digitizing it!). For example, the regional museum had film reels containing valuable historical information, but it lacked the equipment needed to view them. Digital Archivists have to be aware of the medium used to store the information, whether a hard drive or a DVD, because in a few short years the technology needed to read and recover that data may no longer be available. The data must therefore be actively managed and migrated to new platforms as technology changes.
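The article doesn’t say how archivists confirm that files survive a move to a new platform intact. One widely used practice, assumed here rather than taken from the museum’s workflow, is a fixity check: record a checksum for every file before migration and compare it afterwards. Below is a minimal Python sketch using SHA-256 from the standard library; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in 1 MiB chunks so large TIFFs need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path, relative to root, to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_migration(old_root: Path, new_root: Path) -> list[str]:
    """Return files from old_root that are missing or altered under new_root."""
    before = build_manifest(old_root)
    after = build_manifest(new_root)
    return [name for name, checksum in before.items() if after.get(name) != checksum]
```

Any file whose checksum changes or that goes missing is listed for follow-up; an empty list suggests the copy is bit-for-bit identical. Keeping the manifest alongside the collection lets the same check be repeated periodically, not only at migration time.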
Digitization & Preservation Technology
Book scanning technology most commonly converts book pages into one of two file formats: Portable Document Format (PDF) or Tagged Image File Format (TIFF). Although 300 dpi is a common baseline resolution for scanning, high-quality book scanners capture image data at much higher resolutions in order to preserve fine details. The scanned images can then be processed with Optical Character Recognition (OCR) software, which converts images of text into machine-readable text that can be searched and processed by other applications and individuals. An advanced form of OCR, Intelligent Character Recognition (ICR), can learn the characteristics of specific fonts and handwriting styles during processing to improve recognition rates.
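To make the resolution trade-off concrete, here is a rough calculation in Python. The 6 by 9 inch page size and uncompressed 24-bit colour are assumptions for illustration; real TIFF files add a small header and may be compressed. Because resolution applies to both dimensions, doubling the dpi quadruples the pixel count and the file size.

```python
def scan_dimensions(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a page scanned at the given dots per inch."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_bytes(width_in: float, height_in: float, dpi: int,
                       bytes_per_pixel: int = 3) -> int:
    """Raw image size, assuming 24-bit colour (3 bytes per pixel) and no compression."""
    width_px, height_px = scan_dimensions(width_in, height_in, dpi)
    return width_px * height_px * bytes_per_pixel


# Hypothetical 6 x 9 inch book page.
for dpi in (300, 600):
    w, h = scan_dimensions(6, 9, dpi)
    megabytes = uncompressed_bytes(6, 9, dpi) / 1_000_000
    print(f"{dpi} dpi: {w} x {h} px, about {megabytes:.1f} MB uncompressed")
# 300 dpi: 1800 x 2700 px, about 14.6 MB uncompressed
# 600 dpi: 3600 x 5400 px, about 58.3 MB uncompressed
```

Multiplied across thousands of pages, this is why higher capture resolutions carry a real storage cost.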
The book scanning process can be manual or automated, depending on the equipment. On a commercial flatbed scanner, the book is placed face down on a pane of glass and an optical array moves across the glass, capturing the image. Other book scanners hold the book face up in a V-shaped cradle with glass pressed against the pages and photograph the pages from above. Pages can be turned either by hand or by an automated device; higher-end models use vacuum suction or static charge to turn pages automatically while scanning proceeds. Another method, which is not always viable for old books, is to remove the book’s spine to create loose pages and run them through an automated page-feeding scanner. This saves money but damages the original book.
After scanning, software aligns the pages with one another and crops and edits the images. The sheer volume of historical books awaiting scanning challenges archivists, even though commercial-grade book scanners can scan thousands of pages per hour.
After digitizing is complete, the original should be kept safe from environmental hazards in an archive, but what if the original is not in good condition? In some cases, libraries, archives and museums need book restoration specialists to restore old texts to their former glory. These experts can repair, restore or rebind a text while maintaining the overall integrity of the book. Full or partial spine restorations are possible depending on a book’s condition, and loose pages can be carefully sewn back in.
Book restoration specialists can also restore paper. They can replace missing portions of paper documents or use a process called “deacidification”, which neutralizes the acid in paper and infuses a magnesium buffer into the fibres, stopping acid decay for up to 75 years.
Globally Preserving Our Records
Historical document services such as Ancestry.ca make genealogical research easier by offering literally billions of digitized and indexed documents via the Internet. Ancestry.ca brings original records from around the world online. Its content acquisition team seeks out historical records and then works with governments, archives and content owners to make these records available in digital format. Once access is obtained, the documents are digitized as high-resolution scans. Next, an indexing team converts the information into a format that is easy to read and easy to search. The documents are then uploaded to the web, where users can search them manually or receive them as “Ancestry Hints”, which automatically search key collections for records that may be relevant to the user.
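The article doesn’t describe how Ancestry.ca’s search works internally. A common way to make indexed records searchable, shown here purely as an assumed illustration, is an inverted index that maps each word to the records containing it. The records and function names below are invented for the example.

```python
import re
from collections import defaultdict

# Hypothetical transcribed records; real indexes hold far richer fields.
records = {
    1: {"name": "Margaret Campbell", "year": "1881", "place": "Halifax, Nova Scotia"},
    2: {"name": "John Campbell", "year": "1891", "place": "Ottawa, Ontario"},
    3: {"name": "Margaret Leblanc", "year": "1891", "place": "Moncton, New Brunswick"},
}


def tokenize(text: str) -> list[str]:
    """Lower-case a field and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(recs: dict) -> dict[str, set[int]]:
    """Map every word in every field to the set of record ids containing it."""
    index = defaultdict(set)
    for rec_id, fields in recs.items():
        for value in fields.values():
            for token in tokenize(value):
                index[token].add(rec_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return ids of records that contain every word in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


index = build_index(records)
print(sorted(search(index, "Margaret 1891")))  # [3]
print(sorted(search(index, "campbell")))  # [1, 2]
```

Looking up each query word and intersecting the results avoids scanning every record, which matters when a collection runs to millions of entries.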
Crowdsourcing and collaboration tools are also used by organizations like Ancestry.ca to better understand how sets of digital documents relate to one another and create meaning for users. Crowdsourcing is defined as “the act of taking a job traditionally performed by a designated agent (usually an employee) and outsourcing it to an undefined, generally large group of people in the form of an open call”. These tools pool information to create a more complete picture of historical documents. The World Archives Project is Ancestry.ca’s current community project for global collaboration: anyone in the world with Internet access can help index historical documents and records. Once records are digitally photographed, volunteers access them online at Ancestry.ca and enter the names, dates and facts that make the documents searchable.
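The article doesn’t say how the World Archives Project handles volunteers who read the same entry differently. One common approach in crowdsourced transcription generally, assumed here for illustration only, is to have several people key the same field and accept a value only when enough of them agree. The helper names and the agreement threshold are hypothetical.

```python
from collections import Counter
from typing import Optional


def normalize(entry: str) -> str:
    """Collapse whitespace and ignore case so trivial differences don't split the vote."""
    return " ".join(entry.split()).casefold()


def reconcile(entries: list[str], min_agreement: int = 2) -> Optional[str]:
    """Return the reading most volunteers agree on, or None if there is no clear consensus."""
    readings = [e for e in entries if e.strip()]
    if not readings:
        return None
    ranked = Counter(normalize(e) for e in readings).most_common(2)
    winner, votes = ranked[0]
    tied = len(ranked) > 1 and ranked[1][1] == votes
    if votes < min_agreement or tied:
        return None  # no consensus: send the entry to a reviewer
    # Return the first volunteer's spelling of the winning reading, tidied up.
    return next(" ".join(e.split()) for e in readings if normalize(e) == winner)


print(reconcile(["John Smith", "john  smith", "Jon Smith"]))  # John Smith
print(reconcile(["John Smith", "Jon Smith"]))  # None
```

Entries without consensus are not guessed at; in a sketch like this they would go to an experienced reviewer, which keeps uncertain readings out of the searchable index.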
Library and Archives Canada: A New Direction For the Future
Library and Archives Canada (LAC) is our national memory institution supporting historical research about Canada. LAC’s mandate is to ensure “the best possible account of Canadian life is captured through acquiring, preserving and making accessible Canada’s documentary heritage for present and future generations.” The documentation includes published and unpublished works, private and public documents and portraits. Every second of every day, new information is created or discovered somewhere in the world that may contribute to Canada’s continuing memory. LAC recognizes the unique challenges it faces in a digitally driven society whose members expect immediate access to information via the Internet.
LAC affirms that: “the face of information has changed substantially in the last decade: superabundance; rapid creation, sharing and remixing by individuals; multiple formats; unprecedented access; ever-present and expanding user influence, points of view, skills and engagement. This picture is in direct contrast to that of the past… We must find ways to grow closer to citizens and society and embrace evolving technologies.”
LAC is adapting to new technological access points, including virtual vaults, podcasts, expert demonstrations and webcasts. It recognizes that the digital asset management framework it establishes must reflect our changing technological landscape in order to stay relevant.
Collaboration with other memory institutions in Canada is one way to achieve broad accessibility. LAC recognizes the opportunities that shared information and joint projects, involving both physical and digital documents, create for broadening access. Collective catalogues advance its mandate of accessibility by providing archived printed material regardless of geographic location. LAC acknowledges that “broad accessibility must remain at the forefront of our thinking…” A shared approach is the key to future success.
LAC has also identified important questions relating to its core mandate that it needs to address moving forward: How do we best engage citizens and professionals from all domains in our efforts? How do we reach out to the new generation of “born-digital” consumers? How do we ensure they can find the documentary heritage we have acquired and described? How do we work in collaboration with stakeholders?
LAC believes that its traditional archival techniques for historical printed matter must be maintained alongside new digital archiving techniques. Metadata, formatted according to each document’s description, enables user-friendly searches. “The future success of our institution, both in the short and long term, will be largely dependant on the modernization of these processes.”
An entirely new discipline has emerged from the convergence of literature, printing and digitization: the Digital Humanities. In an age where technology and the humanities have come face to face, Digital Humanists are creating new possibilities for archiving printed matter and making it accessible to more people than ever before.
I had the pleasure of listening to Cara Leitch and Julie Meloni discuss “The Future of the History of the Book” at the University of Victoria, British Columbia. Cara is a Researcher in UVic’s Electronic Textual Cultures Lab (ETCL), a facility where data harvesting, textual content analysis and document encoding take place. Cara’s focus is on digitizing 19th-century texts and on social networking. Julie has also worked in the ETCL, with a focus on Information Management. Both are book lovers who consider themselves “Digital Humanists”. In their work, they rediscover the meaning of texts by using technology to compare them with other texts and to annotate them. Using OCR software as well as crowdsourcing, Digital Humanists preserve texts of the past for future generations to enjoy.
During Cara and Julie’s session, we used technology to create a word cloud that visually reveals the true focus of a specific text. Using the website www.wordle.net, we imported the full text and the software generated a word cloud based on how often each word appears: the most frequently mentioned words are the largest. This helps Digital Humanists uncover patterns in a text in a very non-traditional way. We also had the unique opportunity of viewing a scan of a 19th-century document and helping to index it by deciphering handwriting more than 100 years old. Once indexing was complete, the document was searchable and available on the Internet.
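Wordle’s own sizing and layout algorithm isn’t described here, so the following Python sketch shows only the idea behind it as explained in the session: count how often each word appears and scale its display size accordingly. The short stopword list and the linear 10 to 60 point scale are assumptions, and arranging the words on the canvas is not shown.

```python
import re
from collections import Counter

# Small illustrative stopword list; real tools use much longer ones.
STOPWORDS = {"a", "an", "and", "are", "as", "be", "by", "for", "in",
             "is", "it", "of", "on", "that", "the", "to", "with"}


def word_frequencies(text: str) -> Counter:
    """Count lower-cased words, skipping common stopwords."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def font_sizes(freqs: Counter, top_n: int = 20,
               min_pt: int = 10, max_pt: int = 60) -> dict[str, int]:
    """Map the top_n words to point sizes scaled linearly by frequency."""
    top = freqs.most_common(top_n)
    if not top:
        return {}
    most, least = top[0][1], top[-1][1]
    span = (most - least) or 1  # avoid dividing by zero when all counts are equal
    return {word: round(min_pt + (count - least) / span * (max_pt - min_pt))
            for word, count in top}


text = "Paper fades. Paper yellows. Digital copies of paper last longer than the paper itself."
sizes = font_sizes(word_frequencies(text))
print(sizes["paper"], sizes["digital"])  # 60 10
```

Dropping stopwords matters: without it, words like “the” and “of” would dominate almost any cloud and hide the text’s actual focus.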
It is important to understand how the printed documents and books of the past will be preserved for future generations. Environmental conditions such as temperature, relative humidity and light all contribute to the deterioration of paper and printed matter. Digitization increases the longevity, accessibility and broader use of a given document, giving print advocates and old book lovers a viable way to preserve information long into the foreseeable future. Even our national memory institution, Library and Archives Canada, recognizes the importance of this momentous shift in archiving processes and of keeping up with technology to stay relevant. Digitization of old printed matter has also created new jobs and research opportunities, as Digital Humanists use technology to learn about printed documents in entirely new ways.
Crowdsourcing and collaboration projects such as the World Archives Project enable global collaboration, increasing our understanding of the connections between historical texts. Additionally, by working with governments, archives and content owners, sites like Ancestry.ca make records accessible, furthering our learning and understanding of years past.
Although digital technology is an excellent solution for preserving printed pieces of the past, forward-thinking digital asset managers must keep pace with changing technology and manage their collections so they do not become obsolete. From analog to digital and from paper to computer screen, historical documents are moving full speed ahead in the 21st century.