Text Encoding


The full text of each of the cookbooks in MSU's digitization project "Feeding America: The Historic American Cookbook Project" was transcribed by hand and encoded as structured text. Visit the Feeding America data download page to download the encoded transcriptions of all the Feeding America books.

Also available is the text encoding instruction manual, which was written to train the cookbook coders. It explains the coding process in more depth than the outline on this page and defines XML-related terms that may be unfamiliar. Download the Encoding Guidelines instruction manual (some links inside the PDF may be broken). The DTD used to validate the XHTML encoding for Feeding America texts is also available in text or PDF.

The following editorial interventions have been made in the transcriptions of the cookbooks.

  • Archaic spelling and punctuation have been retained. Alternate spellings have been supplied to facilitate search; each affected word is tagged as <alt>, with the "synonym" attribute containing the alternate spelling of the word.
  • Line breaks have not been retained. Ambiguous end-of-line hyphens have been retained, even across a page break. Unambiguous end-of-line hyphens have been dropped. When an unambiguous hyphen divides a word across two pages, the hyphen has been dropped and the trailing part of the word moved from the top of the second page and joined to the leading part of the word at the bottom of the first page. The page break is then inserted before the first full word of the second page.
  • Words which differ in type style from the surrounding text (bold, italic, or ornate scripts) have been noted.
  • Initial capitals which are simply larger in size than the surrounding text have been transcribed without special tagging. Decorative initial capitals have been treated twice: first tagged as an <illustration> and then rendered as part of the transcription. This ensures that the text is still readable if illustrations are not displayed as inline graphics.
  • Running heads and page numbers have not been transcribed. Page numbers or page identifications have been used as the "n" attribute of the <pb> tag (for example, <pb n="72"> or <pb n="title page">). Signature marks (inserted by the printer to correctly assemble the folded and gathered pages) have not been transcribed.
  • Inscriptions have been transcribed if they appear to be contemporary with the original publication of the book. Book plates have also been transcribed, but inscriptions such as prices, call numbers and inventory numbers which appear to have been added later by librarians or bookdealers have been ignored.
  • Footnotes whose text extends across more than one page have been consolidated into a single note.
  • Illustrations have been noted with the <illustration> tag. Decorative typographic devices have also been noted if they can be said to depict some recognizable object. For example, a small row of flowers would be tagged as a figure, but a plain horizontal rule would not, though either might have been used to signal the end of a chapter.
  • Blank pages have been ignored in the transcriptions. Images of the blank pages were created as part of the preservation copy of each book but have been dropped from the sequence of page images for the convenience of the reader.
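
As a rough illustration of how the <pb> and <alt> conventions above might be read back out of a transcription, here is a minimal Python sketch. Only the element names <pb> and <alt> and the attribute names "n" and "synonym" come from the description above. The <text> and <p> wrappers, the sample sentences, the self-closing form of <pb>, and the function names are assumptions made for the example; consult the DTD and the Encoding Guidelines for the actual document structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment. The <text>/<p> wrappers and the wording are invented;
# only <pb n="..."> and <alt synonym="..."> follow the conventions described above.
SAMPLE = """<text>
  <pb n="title page"/>
  <p>Receipts for the kitchen.</p>
  <pb n="72"/>
  <p>Add <alt synonym="ketchup">catchup</alt> as you <alt synonym="choose">chuse</alt>.</p>
</text>"""


def page_labels(root):
    """Return the "n" value of every <pb> element, in document order."""
    return [pb.get("n") for pb in root.iter("pb")]


def spellings_by_page(root):
    """Return (page label, original spelling, alternate spelling) triples.

    Walks the tree in document order and remembers the most recent <pb>,
    so each <alt> is reported with the page it appears on.
    """
    current_page = None
    found = []
    for element in root.iter():
        if element.tag == "pb":
            current_page = element.get("n")
        elif element.tag == "alt":
            original = (element.text or "").strip()
            found.append((current_page, original, element.get("synonym")))
    return found


root = ET.fromstring(SAMPLE)
print(page_labels(root))
print(spellings_by_page(root))
```

A search index built this way could match a query against either the original spelling or the "synonym" value while still displaying the archaic form that the transcription retains.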