(Written January 4, 2014.)
It’s Saturday night, and I’m at the end of the third day of the four-day American Historical Association conference. I’m having a great time but I’m absolutely exhausted. Like a five-year-old, I’m so overstimulated that I haven’t been sleeping and I can’t believe that I have to start back at work on Monday. There has been a lot of talk at the conference about digital history, which is great. I don’t think of myself as a digital historian, but as a historian who doesn’t like using handwritten note cards for research. So I used the technology available to me to organize my research, which turned out to mean creating a digital archive. It seemed as though most of the people at the DH presentations I attended are doing large collaborative projects, and my stuff is just mine right now. But for people who are also lone-wolf PhD candidates or researchers, I’m happy to share what I do.
Caveats first:
1) I had a really strong idea for my methodology before I started. I knew how I wanted everything to be organized and it worked the way I expected it to.
2) It took a year of gathering and digitizing before I even started to go through everything. If I were dependent on university financing or grants, I probably would have had a much more difficult time, since I would have had to justify the sunk costs. I knew my project would be bigger than a typical dissertation, and I was prepared for this.
I am writing a chronological history of the War Refugee Board, a government agency created in 1944 which was tasked with the relief and rescue of persecuted minorities in Nazi-occupied Europe. The Board has mostly been relegated to the afterword or throw-away end of books on the American response to the Holocaust. In part, this is likely because the Board’s records are really difficult to navigate. They were microfilmed in the original 1945 order, which was two large (60+ boxes each) sets of documents: one set of correspondence organized alphabetically by author, and one set organized by topic. There is no order within any of the folders and the whole thing is really arbitrary. A secretary in 1944 could have (and in some cases did) put one part of a letter in a folder by author, but the second part of the letter by topic. Because of this, scholarly work about the Board has been largely based on the agency’s own summary of its activities, written in 1945.
So the first thing I did was to digitize my archival material. For some of the records, I used microfilm scanners, which are available at many archives, including the USHMM and LOC. At NARA, I took photographs of microfilm using a handheld camera to avoid having to pay the printing fees. I also took a lot of photographs of original archival documents that hadn’t been microfilmed. I would guess I gathered about 100,000 images.
After I scanned and photographed everything, I put all the images I collected into the same structure I found in the archives. I saved all my images as pdfs, one pdf for each folder I scanned or photographed. I converted groups of jpegs of the things I had photographed into large pdfs using doPDF, a free pdf printer.
Since my ultimate goal was to digitally reorganize the entire collection into something more usable (in my case, I wanted everything to sort chronologically), I needed to break out these folder-level pdfs into document-level pdfs. So I went through each large pdf, noted where each document began and ended, and printed each one out using doPDF (so a 20-page document would become a 20-page pdf). I needed to name these pdfs so that they would sort chronologically, and I put a lot of thought into how to do this (it was actually part of my prospectus defense).
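I did this step by hand with doPDF, but for anyone who would rather script the bookkeeping, here is a minimal sketch of the idea. It assumes you have written down the page on which each document starts within a folder-level pdf and that you know the total page count. The function name document_ranges is hypothetical, and the code only computes page ranges; it does not touch any pdf files.

```python
from typing import List, Tuple


def document_ranges(start_pages: List[int], total_pages: int) -> List[Tuple[int, int]]:
    """Turn the 1-based start page of each document into inclusive
    (first_page, last_page) ranges covering one folder-level pdf.

    Assumption: every page from the first start page to the end of the
    folder belongs to some document, so each document ends on the page
    before the next one starts.
    """
    if not start_pages:
        return []
    if any(b <= a for a, b in zip(start_pages, start_pages[1:])):
        raise ValueError("start pages must be strictly increasing")
    if start_pages[0] < 1 or start_pages[-1] > total_pages:
        raise ValueError("start pages must fall inside the folder")

    # Pair each start page with the start of the following document;
    # the last document runs to the final page of the folder.
    next_starts = start_pages[1:] + [total_pages + 1]
    return [(start, nxt - 1) for start, nxt in zip(start_pages, next_starts)]


# A 25-page folder whose first document is 20 pages long.
ranges = document_ranges([1, 21], 25)
```

Each range can then be handed to whatever tool does the actual splitting, whether that is a pdf printer or a script.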
A sample document is named something like 440606WRBR47F23D408-409. From that, I can tell that the document was dated June 6, 1944 (440606), and that I got it from the War Refugee Board records, Reel 47, Folder 23, Document 408-409. Not only did everything sort chronologically when I finally put all the pdfs (about 40,000) into one folder, but I also retained all the metadata I needed to properly cite each document.
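To show how much information the name carries, here is a minimal sketch that takes one of these names apart again. It is an illustration rather than anything I actually run. It assumes a two-digit year always means 19xx (true for these 1940s records), that the collection code is a run of capital letters, and that a document number may be a single number or a range. NAME_PATTERN, DocumentName, and parse_name are hypothetical names.

```python
import re
from dataclasses import dataclass
from datetime import date

# YYMMDD, collection code, R<reel>, F<folder>, D<first>[-<last>]
NAME_PATTERN = re.compile(
    r"^(?P<yy>\d{2})(?P<mm>\d{2})(?P<dd>\d{2})"
    r"(?P<collection>[A-Z]+)R(?P<reel>\d+)F(?P<folder>\d+)"
    r"D(?P<first>\d+)(?:-(?P<last>\d+))?$"
)


@dataclass(frozen=True)
class DocumentName:
    doc_date: date
    collection: str
    reel: int
    folder: int
    first: int
    last: int


def parse_name(name: str) -> DocumentName:
    """Recover the date and archival location encoded in a file name."""
    m = NAME_PATTERN.match(name)
    if m is None:
        raise ValueError(f"not a recognized document name: {name!r}")
    first = int(m["first"])
    # Assumption: a name without a range refers to a single item.
    last = int(m["last"]) if m["last"] else first
    return DocumentName(
        # Assumption: all documents date from the 1900s.
        doc_date=date(1900 + int(m["yy"]), int(m["mm"]), int(m["dd"])),
        collection=m["collection"],
        reel=int(m["reel"]),
        folder=int(m["folder"]),
        first=first,
        last=last,
    )


doc = parse_name("440606WRBR47F23D408-409")

# Because the zero-padded date comes first, an ordinary alphabetical
# sort of the names is also a chronological sort.
names = ["450115WRBR3F2D10", "440606WRBR47F23D408-409", "440122WRBR1F1D1-3"]
in_date_order = sorted(names)
```

The zero-padded date at the front is what does the work: any file browser that sorts by name will put the documents in date order with no extra software.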
Once I got everything into pdfs by document, I could import them into my database. I use Papers, which cost about $50 with a student discount. It’s software (basically a CMS, or content management system) that was really designed for managing scholarly articles, but I didn’t need to make any hacks to it for my stuff. It imports the image of the document and the title which I’ve given it.
This is what it looks like in Papers. You can see that it’s basically an iTunes interface for documents.
So now I’m at the point in my research where I’m going through and actually analyzing the documents. I assign each document an author (which the software stores as an authority file), a title, and keywords (also drawn from an authority list I’ve created), and I can take free-text notes on each document, all while seeing the image of the document. I can rank them, assign them colors, and organize them in different ways. All of this is searchable, either through a single search-everything box or by field (title, keyword, etc.). I can also export all the information I’ve entered into fielded XML files, which will potentially let me manipulate the data even further if I want to.
So it took me about a year after I progressed to candidacy before I did anything related to the content of the documents. That’s a big deal, BUT, this is what I have to show for it. I’ve been writing for about 14 months now and have about 250 (good) pages, which isn’t bad.
The most important thing I can tell you, though, is that I have not had a frantic search for something I remember seeing but now can’t find. I can find EVERYTHING I need while I’m writing as soon as I think of it or want it. As you can imagine, that is invaluable. The database has allowed me to be authoritative in my writing, make connections I wouldn’t have otherwise, and ultimately, will make my work better. I didn’t set out to do anything in digital history, but my project wouldn’t have been possible without this technology.
I’m happy to email or show the live database to anyone who might be interested. Just contact me!