Shaping the Values of Youth Project History
"Shaping the Values of Youth: Sunday School Books in Nineteenth Century America" was funded by a grant from the Library of Congress/Ameritech National Digital Library Competition. The period of the grant was September 1999 to February 2001.
The works digitized for this electronic archive are owned by the Special Collections Division of the Michigan State University Libraries and the Clarke Historical Library at Central Michigan University. Additional titles may be digitized as they are acquired by MSU or CMU.
Digitization was performed by staff of the Digital & Multimedia Center at the Michigan State University Libraries. The electronic copies are available both as page images (.jpg files) and as text transcriptions in HTML and XML. Scanning was performed without disbinding the original books. Text transcriptions were marked up using the Text Encoding Initiative (TEI Lite) tagset before conversion to HTML/XML for web browser display.
To learn more about this project, please select one of the following links:
Digitization was performed by the Digital & Multimedia Center, Michigan
State University Libraries. The electronic editions were produced in two
formats: page images and text transcriptions.
All books were scanned without being disbound. The original grant proposal specified that tightly bound books would be scanned with an overhead scanner. This turned out not to be necessary; it was possible to scan all items face-down without causing further damage. The majority were scanned on a Hewlett-Packard color flatbed ScanJet 5100C with a 9x12 inch scanner bed. A few large-size titles were scanned on a UMAX Mirage D-16L color flatbed scanner with a 12x17 inch scanner bed. Scans on both machines were done at 400 dpi, 256-bit color, and saved in compressed .TIF format.
The .TIF images were resized and saved as .JPGs using JASC Image Robot, version 1.1. Both versions were burned to CD-ROMs for storage. Copies of the archival .TIF images are available on request; the .JPG images are used for web delivery.
Typing copies of the books were produced on a black & white Minolta
overhead scanner with output to a photocopier, to eliminate the need for
further handling of the originals.
The majority of texts were typed using the freeware program Note Tab Light, version 4.6a, available from http://www.notetab.com/ntl.htm. Note Tab was used instead of a conventional word processing program because it produces only ASCII text and has tag libraries which can be customized by editing one of the .clb files accompanying the program. A miniature tag library was created for the Sunday school books project, containing only page breaks, paragraphs, three font styles (italic, bold, and smallcaps), and an ISO-LAT1 character menu.
Undergraduate typists followed a set of transcription guidelines, described in Editorial Interventions. The typists inserted page break and paragraph tags, noted font changes (from roman to italic, bold, or smallcaps), and inserted codes for ISO-LAT1 characters. Other special characters, illustrations, and any other non-textual material were noted for later attention.
The majority of the books were typed twice by different students or teams
of students, and proofread using a file comparison program. The freeware
program ExamDiff was used at first, and later replaced by the Compare Documents
function in Microsoft Word 97. Single versions of a few texts were corrected
manually by a staff member with professional proofreading experience.
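The double-keying comparison described above can be approximated with any line-based comparison tool. The following is a minimal Python sketch using the standard difflib module; it is not the ExamDiff or Word 97 procedure the project used, and the function name and sample strings are hypothetical.

```python
import difflib


def keying_differences(first_typing: str, second_typing: str) -> list[tuple[list[str], list[str]]]:
    """Return (lines_from_first, lines_from_second) for every place
    where two independently typed copies of a text disagree."""
    a_lines = first_typing.splitlines()
    b_lines = second_typing.splitlines()
    matcher = difflib.SequenceMatcher(None, a_lines, b_lines, autojunk=False)
    disagreements = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # replace, delete, or insert
            disagreements.append((a_lines[i1:i2], b_lines[j1:j2]))
    return disagreements


if __name__ == "__main__":
    copy_a = "<p>It was a dark night.</p>\n<p>The end.</p>"
    copy_b = "<p>It was a dark nihgt.</p>\n<p>The end.</p>"
    for from_a, from_b in keying_differences(copy_a, copy_b):
        print("typist A:", from_a)
        print("typist B:", from_b)
```

Each reported pair still needs a proofreader to check it against the typing copy; the comparison only shows where the two typings diverge, not which one is right.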
The corrected texts were then encoded in TEI.2 conformant markup using the TEILITE DTD. See the Encoding Guidelines for details. SoftQuad's Author/Editor was used at the beginning of the project and replaced by SoftQuad's XMetal 1.2, later 2.0. The texts were created and saved in SGML format, then converted to XML documents in XMetal.
An XSL style sheet was created to display the transcriptions in Internet
Explorer 5.x. XMLwriter, version 1.21, was used to produce HTML copies for
display in Netscape and earlier versions of Internet Explorer.
The following editorial interventions have been made in the transcriptions of the Sunday school books.
Archaic spelling and punctuation have been retained. Typographic errors have also been retained, and have been tagged as <sic> with the value of the "corr" attribute containing the correct spelling of the word. An exception has been made for missing quotation marks because the resulting tag <sic corr="""></sic>, with three consecutive sets of quotation marks, will not validate against the DTD. Missing quotation marks have been added silently.
Line breaks have not been retained. Ambiguous end-of-line hyphens have been retained, even across a page break. Unambiguous end-of-line hyphens have been dropped. When an unambiguous hyphen divides a word across two pages, the hyphen has been dropped and the trailing part of the word moved from the top of the second page and joined to the leading part of the word at the bottom of the first page. The page break is then inserted before the first full word of the second page.
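The page-break rule for unambiguous hyphens can be illustrated with a short Python sketch. It assumes an editor has already decided that the hyphen is unambiguous; the function name and the sample text are hypothetical and are not part of the project's tooling.

```python
def join_across_page_break(page_end: str, next_page_start: str, pb_tag: str) -> str:
    """Join a word split by an unambiguous hyphen across a page break.

    The hyphen is dropped, the trailing part of the word is moved up from
    the second page, and the page break tag is placed before the first
    full word of the second page.
    """
    head = page_end.rstrip()
    if not head.endswith("-"):
        raise ValueError("first page does not end with a hyphen")
    parts = next_page_start.lstrip().split(None, 1)
    if not parts:
        raise ValueError("second page has no text to join")
    tail = parts[0]                          # trailing part of the divided word
    rest = parts[1] if len(parts) > 1 else ""
    return f"{head[:-1]}{tail} {pb_tag}{rest}"


if __name__ == "__main__":
    print(join_across_page_break("The mis-", "sionary sailed for Burma.", '<pb n="13">'))
```

An ambiguous hyphen would simply be left in place, so a function like this should only be called after that editorial decision has been made.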
Words which differ in type style from the surrounding text (italic and small caps, generally for reasons of emphasis) have been noted. The occasional use of a decorative type in places such as the column heads in a table of contents has not been noted.
Initial capitals which are simply larger in size than the surrounding text have been transcribed without special tagging. Decorative initial capitals have been treated twice: first tagged as a <figure> and then rendered as part of the transcription. This ensures that the text is still readable if illustrations are not displayed as inline graphics.
Running heads and page numbers have not been transcribed. Page numbers or page identifications have been used as the "n" attribute of the <pb> tag. (For example, <pb n="72"> or <pb n="title page">) Signature marks (inserted by the printer to correctly assemble the folded and gathered pages) have not been transcribed.
Inscriptions have been transcribed if they appear to be contemporary with the original publication of the book. Book plates have also been transcribed, but inscriptions such as prices, call numbers and inventory numbers which appear to have been added later by librarians or bookdealers have been ignored. Random pencil marks in a few books, apparently made by young children, have also been ignored.
Footnotes whose text extends across more than one page have been consolidated into a single note.
Illustrations have been noted with the <figure> tag. Decorative typographic devices have also been noted if they can be said to depict some recognizable object. For example, a small row of flowers would be tagged as a figure, but a plain horizontal rule would not, though either might have been used to signal the end of a chapter.
Blank pages have been ignored in the transcriptions. Images of the blank pages were created as part of the preservation copy of each book but have been dropped from the sequence of page images for the convenience of the reader.
Seven 19th century hymnals were added near the end of the project. While the hymnals are a natural complement to the rest of the collection, they required some special editorial treatment. Most significantly, it was beyond the scope of this project to mark up the musical notation. However, the musical scores are still available to users by viewing the page images. The words to the hymns were transcribed verse by verse: that is, the order in which they would be sung, not the order in which they appear on the page. The chorus for each hymn is given only once, although it would normally be repeated after each verse when the hymn is sung aloud.
- Coding of repetitive elements performed by typists
- Editorial interventions
- Small scale elements
- What not to code
- How to note problems
- Line breaks are not retained.
- Ambiguous end-of-line hyphens are retained, even across a page break.
- Unambiguous end-of-line hyphens are deleted. If they occur across a page break, the trailing part of the word is moved from the top of the second page to the bottom of the first page. The page break is inserted immediately before the first full word of the second page.
- Footnotes are moved to the end of the <div> in which they occur. Each <note> is assigned a unique id= attribute, with a corresponding target= attribute at the reference point in the main text. Footnotes spanning more than one page are consolidated into a single note.
- Do not use <div0>; always start with <div1>. This is because of an anomaly in the TEI DTD which allows <div0> in <body> but not in <front> or <back>. We will follow the "Best Practices" guidelines which recommend not using <div0> at all.
- Always use <div1> with the attribute type="body" to enclose the entire main body of a book (everything except the frontmatter and backmatter). This means that <div1> tags should always appear immediately within <body> tags. In a way, this makes the <body> tag redundant, but in many cases the tag would be needed as a wrapper for elements that either cannot or should not appear directly within <body>, such as <pb>. To be consistent, we'll use it always.
- Names (unless part of title pages, bylines, citations, and the like)
- June 2000: first draft completed.
- Sept. 15, 2000:
- added section on coding of repetitive elements performed by typists
- added section on editorial declarations
- added link to TEI header template for SSBs
- Sept. 20, 2000:
- added section on <titlePage>
- added link to training document on overall TEI.2 structure
- clarified description of footnote coding in Editorial Declarations section
- Nov. 1, 2000:
- added section on coding letters
Coding of repetitive elements performed by typists
Because the Sunday school book texts are typed in-house by student staff, it is possible to have repetitive elements inserted as part of the typing process. This streamlines the work of the staff trained to do SGML coding. At the typing stage the following elements are added to the text:
|<p> </p>||Paragraph tags are added throughout the body of the text. Some of these tags will be changed to <q> or <lg> or similar paragraph-equivalent tags by the coders.|
|<pb>||Page break tags are added throughout the document. The value of the n= attribute is either the Arabic or Roman numeral printed on the page, or "unnumbered".|
|<emph>||Within the body of the work, text set off in italics, boldface, or small caps is indicated with <emph rend="italics"> (or "bold" or "smallcaps"). During the coding some instances of <emph> may be changed to <hi>.|
The following editorial interventions are to be made as appropriate. These are declared in the standard TEI header for the project.
Use the following tags as appropriate.
|<byline>||This will typically include the element <docAuthor> along with some statement about the author, such as: <byline>Written by a long-time missionary to Burma, <docAuthor>The Reverend Claudius Buchanan</docAuthor></byline>|
|<docAuthor>||Alternately, the <docAuthor> may be used by itself (without being enclosed in <byline>) if the author's name appears with no additional information.|
|<docEdition>||Used to tag information about a book's edition, such as: <docEdition>Third edition, revised.</docEdition> (Does not occur frequently in the Sunday school books.)|
|<docDate>||Date of publication. May occur on its own within <titlePage> or within <docImprint>.|
|<docImprint>||Imprint information. May include the name of the publisher <publisher>, the place of publication <pubPlace> and the date of publication <docDate>.|
|<docTitle>||The title of the work. This element must include one or more <titlePart> elements.|
This can be the entire title of the work,
if appropriate, or portions of the title with type= attributes "main" and "sub" for
main title and subtitle. Examples:
<docTitle><titlePart type="main">The Green Mountain Annals.</titlePart></docTitle>
<docTitle><titlePart type="main">Kindness to Animals:</titlePart><titlePart type="sub">Or, The Sin of Cruelty, Exposed and Rebuked.</titlePart></docTitle>
|<epigraph>||A quotation relevant to the subject matter of the book. (This may also occur at the beginning or end of a chapter.)|
Tables of Contents
In most cases, tables of contents in Sunday School books are presented with the chapter names on the left side of the page and the page numbers on the right side.
Tag the list of chapter names/page numbers as a list, with each chapter name/page number pair as an item in the list. Tag the page number as a <ref> with the attribute rend="align(right)". If there is a heading such as "Table of Contents" tag it as a <head>.
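As an illustration of the table of contents tagging described above, the following Python sketch builds the markup from a list of chapter name and page number pairs. The <list>, <item>, <ref>, and <head> tags and the rend="align(right)" value come from the guideline; placing <head> inside the <list> is an assumption, and the function name and sample data are hypothetical.

```python
def toc_markup(entries: list[tuple[str, str]], heading: str = "") -> str:
    """Build table of contents markup: one <item> per chapter,
    with the page number tagged as a right-aligned <ref>."""
    lines = ["<list>"]
    if heading:
        # Assumption: the heading is placed inside the list.
        lines.append(f"<head>{heading}</head>")
    for chapter_name, page in entries:
        lines.append(f'<item>{chapter_name} <ref rend="align(right)">{page}</ref></item>')
    lines.append("</list>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(toc_markup([("The Lost Lamb", "5"), ("A Lesson Learned", "19")],
                     heading="Contents"))
```

The sketch does not escape characters such as & or <, so chapter names containing them would need entity references added by hand.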
Major Divisions of the Text
Within the <body> of a TEI document, the major portions of the text are marked with <div> tags. The TEI rules allow you to use either unnumbered divisions <div> or numbered divisions <div0>, <div1>, <div2> ... all the way to <div7>.
In the Sunday School books, always use numbered divisions.
The <div> level stays the same for each unit of a book at the same structural level. The n= attribute is used to indicate that more than one unit exists at that level.
Correct:
<div1 n="1"> Chapter 1
<div1 n="2"> Chapter 2
<div1 n="3"> Chapter 3
Incorrect:
<div1> Chapter 1
<div2> Chapter 2
<div3> Chapter 3
If you use more than one level of numbered <divs> they must be nested within each other, each level representing a smaller unit of the book. <div2> can only occur within <div1>, <div3> can only occur within <div2>, and so on.
<div1 n="1"> Chapter 1: First
<div2 n="2"> Section 2: Methodology
<div2 n="3"> Section 3: Results
<div2 n="4"> Section 4: Conclusion
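The nesting rule for numbered divisions can be checked mechanically. The Python sketch below is a hypothetical helper, not part of the project's workflow: it takes the levels of the opening <div> tags in document order and reports any level that skips a step, or any use of <div0>, which these guidelines avoid.

```python
def check_div_nesting(levels: list[int]) -> list[str]:
    """Report numbered <div> levels that break the nesting rule.

    `levels` lists the numbers of the opening tags in document order,
    e.g. [1, 2, 2] for <div1>, <div2>, <div2>.
    """
    problems = []
    previous = 0  # nothing is open before the first division
    for position, level in enumerate(levels, start=1):
        if level < 1:
            problems.append(f"division {position}: <div{level}> is not used in this project")
        elif level > previous + 1:
            problems.append(
                f"division {position}: <div{level}> cannot occur directly within <div{previous}>"
            )
        previous = level
    return problems


if __name__ == "__main__":
    print(check_div_nesting([1, 2, 2, 2]))  # a chapter with three sections
    print(check_div_nesting([1, 3]))        # <div3> directly inside <div1>
```

A validating SGML or XML editor enforces the same rule against the DTD; the sketch only makes the logic explicit.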
In the Sunday School books:
|A book with no smaller units within the body (often seen in short works like The Deadly Cigarette or The African Woman)||<body>
<p> At least one set of paragraph tags is required!</p>
|A book divided into chapters||<body>
Sections within a chapter would be <div3>, subsections would be <div4>, etc. (Unlikely to occur in the Sunday School books.)
Identify the location of illustrations with <figure> tags. The <figure> element can contain a <head>, one or more <p>, and a <figDesc> or figure description. At a later date, we will add entity references for images of the illustrations themselves.
<figure> can occur within <p> but there are occasions when illustrations appear outside the boundaries of a paragraph: for example, in a frontispiece. In the latter situation, the <figure> would need to be wrapped in a pair of <seg> (segment) tags. For the sake of consistency, in the Sunday School books we will always wrap <seg> tags around <figure> tags. This will validate whether the <seg> appears within the boundaries of a paragraph or between paragraphs.
Determining whether or not a certain non-textual element is really an illustration can be a gray area. The rule of thumb for this project is: if the non-textual element is any kind of representative image, tag it as an illustration, even if it appears to be merely decorative or unrelated to the text. If the non-textual element is merely an abstract image, ignore it.
For example, if the title page of a book has the title separated from the imprint by a hairline rule or a design made by repeating characters like this ~~~~~~ or this ======= or this ++++++++ , ignore the illustration. It's not a picture of anything you can identify. But, if the title is separated from the imprint by a small row of flowers, tag it as an illustration, even though the purpose of the illustration is still essentially decorative.
|Full-page illustration falling within the boundaries of a paragraph: that is, the paragraph started on the previous page continues on the following page:||...text text text.
<p>Further text following caption, if there is any.</p>
<figDesc>A 1-2 sentence description of the illustration.</figDesc>
text text text.</p>
Full-page illustration not falling within the
boundaries of a paragraph: that is, the previous page ends with a </p> tag
and the following page will start with a <p> tag.
<seg> needs to be within the container <p>.
<head>Caption, if there is one.</head>
<p>Further text following caption, if there is any.</p>
<figDesc>A 1-2 sentence description of the illustration.</figDesc>
Quotations; Poems and Hymns
In-line quotations set off with quotation marks do not need any special tagging. Block quotations set off from the surrounding text do need to be marked with <q> and </q>.
In the Sunday School books, the vast majority of block quotations are hymns and poetry. In addition to being identified as quotations, they must be marked with <lg> or "line group." <lg> tags should include the attribute type= with the value "poem", "hymn", or "other". (Curiously, it's extremely uncommon in the SSBs for Bible verses to be treated as block quotes, so I'm not including that as a value for <lg type=>.)
Within the <lg>, each individual line should be marked with <l> and </l>. This is a little time-consuming but allows us to indicate which lines have been indented. Within the <l> tag, use the attribute rend="indent1" for an indentation of one tabstop, rend="indent2" for two tabstops, etc. Don't worry about whether "indent1" is exactly equal to 5 or 10 characters; it simply stands for the smallest amount of indentation, while "indent2" is the second smallest amount of indentation, and so on.
<l rend="indent1">Second line of poem, indented in the original.</l>
<l>Third line of poem, not indented in original</l>
<l rend="indent1">Fourth line of poem, indented in the original.</l>
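The ranking of indentation levels described above ("indent1" for the smallest indentation, "indent2" for the next smallest, and so on) can be sketched in Python. This is an illustration only: it assumes the typed lines use leading spaces for indentation, and the function name and sample verse are hypothetical.

```python
def tag_line_group(lines: list[str], lg_type: str = "poem") -> list[str]:
    """Wrap typed verse lines in <lg>/<l> tags, ranking indentation widths.

    The smallest non-zero indentation becomes rend="indent1", the next
    smallest rend="indent2", and so on, regardless of the exact widths.
    """
    def indent_width(line: str) -> int:
        return len(line) - len(line.lstrip(" "))

    widths = sorted({indent_width(line) for line in lines} - {0})
    rank = {width: number for number, width in enumerate(widths, start=1)}

    tagged = [f'<lg type="{lg_type}">']
    for line in lines:
        width = indent_width(line)
        text = line.strip()
        if width:
            tagged.append(f'<l rend="indent{rank[width]}">{text}</l>')
        else:
            tagged.append(f"<l>{text}</l>")
    tagged.append("</lg>")
    return tagged


if __name__ == "__main__":
    verse = [
        "First line, flush left,",
        "    Second line, indented;",
        "Third line, flush left,",
        "        Fourth line, indented further.",
    ]
    print("\n".join(tag_line_group(verse, lg_type="hymn")))
```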
Footnotes which extend across more than one page in the original text should be consolidated into a single note. In the SGML file, they will appear at the end of the current <div>. SSB typists will continue to type notes at the bottom of the page they appear on (or begin on, if they extend over more than one page.) Coders will need to move them to the end of the <div>.
Each note will need corresponding targets and ids. The first time you encounter a note in a particular book, start a list and keep it in the book folder. Keeping a list is especially important for the SSBs because the notes are usually unnumbered.
Number the notes sequentially. These numbers will be the values for the target/id pairs. The values must begin with a letter. There is no need to distinguish between multiple notes appearing on the same page or to identify notes that were on more than one page in the original text. (The list kept for the book folder should resemble the first two columns of the table below.)
|n1||p. 16||first note in the book, appearing on page 16|
|n2||p. 23-24||second note in the book, beginning on page 23 and continuing on page 24|
|n3||p. 45, 1/2||third note in the book; first of 2 notes on page 45|
|n4||p. 45, 2/2||fourth note in the book; second of 2 notes on page 45|
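The note list kept in the book folder can be generated rather than numbered by hand. The Python sketch below is a hypothetical helper that assigns sequential values beginning with a letter, as required above, and pairs each with its page location; the function names are not part of the project's tooling.

```python
def number_notes(page_locations: list[str]) -> list[tuple[str, str]]:
    """Assign sequential target/id values (n1, n2, ...) to a book's notes.

    `page_locations` lists each note's location in reading order,
    e.g. "p. 16" or "p. 23-24" for a note that continues onto a second page.
    """
    return [(f"n{number}", location)
            for number, location in enumerate(page_locations, start=1)]


def note_markup(note_id: str, text: str) -> str:
    """Build the consolidated <note> that is moved to the end of the <div>."""
    return f'<note id="{note_id}">{text}</note>'


if __name__ == "__main__":
    log = number_notes(["p. 16", "p. 23-24", "p. 45, 1/2", "p. 45, 2/2"])
    for note_id, location in log:
        print(f"|{note_id}||{location}|")
    print(note_markup(log[0][0], "Text of the first note."))
```

The same value then goes in the target= attribute at the reference point in the main text.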
Epigraphs and Arguments
A common practice in 19th century works was to precede chapters of novels with a summary of the action to follow or with a relevant quotation.
The summary is called an argument. The quotation is called an epigraph. An epigraph can appear at the end of a chapter as well as at the beginning.
<div2 type="chapter" n="3">
<head>Chapter 3</head>
<argument>In which Catherine makes an awful discovery about the family at Northanger Abbey.</argument>
<p>It was a dark and stormy night...
<div2 type="chapter" n="5">
<head>Chapter 5: The Picnic in the Park</head>
<epigraph>"O! What is so lovely as a day in June!" </epigraph>
<p>The day of the picnic came at last...
Some of the Sunday school books include quotations from or transcriptions of letters, whether they are real correspondence (in non-fiction works) or letters used as a storytelling device in a novel.
Inline quotations (usually only a line or two) set off with quotation marks do not need to be tagged, following our guideline for quotations of poems and hymns.
If a letter is shown as a block quotation, you should be able to determine from the text if only a portion of the letter is being quoted or if an entire letter is being reproduced or transcribed. An 'entire' letter will typically have such elements as a salutation, the date the letter was written, and a signature, and may also include where the letter was written, a closing salutation, one or more postscripts, etc.
If the block quotation appears to be only a portion of a letter and lacks any element except one or more paragraphs, then tag it as simply <q type="other"> </q>.
If the block quotation appears to be an entire letter or has any text beyond what can be tagged as <p>, then use the following tags as appropriate:
|<div>||Use a numbered <div> to identify the extent of the letter. This will mean that the remaining portion of the <div> (typically a chapter) the letter appears in will need to be tagged with the smaller <div> level as well. For the letter itself, the type= attribute should be "letter". For the remaining portion of the chapter, the type= attribute should use "portion".|
|<opener>||Use this to group together all the elements preceding the body of the letter. May include <dateline> and <salute> and (occasionally) <signed>.|
|<closer>||Use this to group together all the elements following the body of the letter. May include <dateline> and <signed>, and (rarely) <salute>.|
|<dateline>||Use this to tag items such as the date, time, and/or place that a letter was written. (Further subdivisions are available, such as <date>, <time>, and <address>, but please don't use them in the SSBs. <dateline> is sufficient.)|
|<signed>||Use this to tag the text identifying the writer of a letter. This might be only a name or pseudonym, or a longer phrase: <signed>From your fond aunt, Cornelia</signed> (Further subdivisions are available, such as <name>, but please don't use them in the SSBs. <signed> is sufficient.)|
|<salute>||Use this to tag a greeting or salutation at the beginning or end of a letter: <salute>To Whom It May Concern:</salute> At the end of a letter, it can be harder to decide whether to tag something as <salute> or <signed>. Is "I remain your humble and obedient servant" more of a "salute" than "From your fond aunt"? I would err on the side of simplicity in tagging and use <signed> to enclose an entire phrase like "I remain your humble and obedient servant, John Witherspoon" instead of dividing it into <salute> and <signed>.|
|<p>||Use <p> in the body of the letter. The body is everything between the <opener> and the <closer>.|
|<trailer>||Use this for postscripts, if any. (<p> cannot be used after a <closer>.)|
Italicized Words and Other Font Changes
Words and phrases may be italicized, or shown with some other font change, in a variety of situations. The tagging depends on the reason for the font change.
|Italics are often used to denote a word or phrase in a language other than that of the surrounding text. The tag for this is <foreign>. You must include the lang= attribute. For the value, use the 3-letter codes in the ISO 639 list.||
She was a fascinating woman. She had
a certain je ne sais quoi.
She was a fascinating woman. She had a certain <foreign lang="fre">je ne sais quoi.</foreign>
|Titles of books are usually indicated with italics or underscores. Tag as <title>.||
I am reading War and Peace.
I am reading <title type="book">War and Peace.</title>
|Font changes may be used for linguistic or rhetorical emphasis. Tag as <emph>.||
It is a very dull book.
It is a <emph rend="italics">very</emph> dull book.
|Font changes may be used for some typographic or decorative effect: for example, footnote numbers are usually printed above or below the type line (superscript or subscript) or the first word of a new chapter may be in boldface or all caps. Tag as <hi> (for highlighting).||
Acceptable values for rend= attribute
<hi rend="bold">
<hi rend="italics">
<hi rend="underline">
<hi rend="smallcaps">
<hi rend="sub"> (for subscript)
<hi rend="sup"> (for superscript)
Obvious misspellings and typographic errors should be coded. However, don't tag 19th-century spellings or usage that differ from current standards. If in doubt, attempt to find the same word used elsewhere in the text, or ask Ruth Ann.
For misspellings and typographic errors, use the <sic> tag with the correct spelling as the value of the corr= attribute.
<sic corr="Michigan">Mihcigan</sic>
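Because <sic> keeps the printed error and records the correction in corr=, either reading can be recovered from the tagged text. The Python sketch below shows one way to do this with a regular expression; it assumes the corr= value is written in double quotes and that <sic> elements are not nested, and the function names are hypothetical.

```python
import re

# Matches <sic corr="correct">printed</sic>; group 1 is the correction,
# group 2 is the text as printed in the original book.
SIC_PATTERN = re.compile(r'<sic corr="([^"]*)">\s*(.*?)\s*</sic>', re.DOTALL)


def corrected_reading(text: str) -> str:
    """Replace each <sic> element with the value of its corr= attribute."""
    return SIC_PATTERN.sub(lambda match: match.group(1), text)


def original_reading(text: str) -> str:
    """Replace each <sic> element with the text as printed, errors included."""
    return SIC_PATTERN.sub(lambda match: match.group(2), text)


if __name__ == "__main__":
    sample = 'Printed in Lansing, <sic corr="Michigan">Mihcigan</sic>.'
    print(corrected_reading(sample))
    print(original_reading(sample))
```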
However, if the typographic error consists merely of missing punctuation, add the missing character and tag it as <corr>.
"What time is it" can be tagged as <p>"What time is it<corr>?</corr>"</p>
What Not to Code
Although tags exist for the following elements (you will see them in the Author/Editor or XMetal menus), do not tag them:
How to Note Problems
There may be occasions when you want to skip a problematic, partially-tagged section until you can get advice on the correct way to code it. You can use "comment" tags to isolate the problem text and continue validating the rest of your work as you go. The text that is "commented out" will be ignored in rules checking.
<!-- This text is wrapped with comment tags. -->
Creation and revision history of this document
- Peter Berg,
Head, Special Collections Division, Michigan State University Libraries
- Michael Seadle,
Head, Digital & Multimedia Center, Michigan State University Libraries
- Ruth Ann Jones,
Digital Projects Coordinator, Digital & Multimedia Center, Michigan State University Libraries
Introductory Essays and Biographical Notes
- Stephen Rachman,
Department of English, Michigan State University
- Ruth Ann Jones
- Noel Allende Goitía
- Amin Maredia
- Mark Spano
- Amy Vance
Data Entry Workflow Management
- Amy Vance
- Stephanie Bour
Image Production & Conversion Workflow Management
- Mark Spano
- Stephanie Bour
eXtensible Style Language Programming
- Erica Olsen
- Edward J. N. Roberts
Original Website Design
- Erica Olsen
- Edward J. N. Roberts
- Andrea McVittie
Updated Website Design
- Shelby Kroske
- Jenny Brandon
Scanning and Data Entry
- Janet Baldwin
- Jennie Carmona García
- Anne Tracy
- Joshua Moon
- Carrie Preston
- Titles from the Russel B. Nye Popular Culture Collection, Michigan State University Libraries, and the Clarke Historical Library, Central Michigan University
- Digital archive created by the Digital & Multimedia Center, Michigan State University Libraries, East Lansing.
- Introductory essays and historical commentary by Professor Stephen Rachman, Department of English, Michigan State University.
- This site was made possible in part by funding from a Library of Congress / Ameritech National Digital Library Award.