The Digital China Lab Pilot was a collaborative workspace designed to encourage scholars in Chinese studies to learn basic programming skills in service of their research. The lab met every other week during the spring 2015 semester in three-hour sessions. During each session, I (along with mathematician Anthony Ruozzi) guided participants through programming tutorials and helped them adapt the scripts to their own uses. We also discussed the results from the previous session. Between sessions, participants applied the tools to their own research in preparation for the next session.
The lab not only equipped participants with digital skills tailored to their own projects, but also gave them an understanding of why and when certain tools are appropriate. Everyone learned each tool, even when it was not strictly relevant to their own project. Part of each session explored what each tool was doing and why it was or was not appropriate. We emphasized proper interpretation of results to ensure that participants understood the limitations of their data, in addition to its potential.
(A paper using an ad hoc version of this software is forthcoming in 2016. This description was taken from the project proposal.)
Over the last several decades, significant amounts of bibliographic information have been made available online. Academic and non-academic libraries alike have digitized metadata on their collections to facilitate access by their users. Services have formed to aggregate this information so people can search multiple libraries at once. More recently, digital libraries have sprung up out of both commercial and academic ventures. The result is a large body of information that can be leveraged for scholarly inquiry.
To date, there has been no easy, unified way for scholars to ethically access this information and study it in aggregate. Most libraries, both to avoid having their servers overwhelmed with requests and to ensure that their data is not used inappropriately, have asked search engines and scripts alike not to scrape their sites. This keeps the websites responsive, so that the average user can find books of interest, but it prevents scholars from performing all manner of interesting analysis.
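An ethical scraper respects these requests by consulting a site's robots.txt before fetching anything. As a minimal sketch (the hostname and the robots.txt rules below are hypothetical, not those of any real catalog), Python's standard library can check such a policy directly:

```python
from urllib import robotparser

# A hypothetical robots.txt of the kind many library catalogs publish:
# it disallows automated crawling of search pages but permits the rest.
robots_txt = """\
User-agent: *
Disallow: /search
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

# Consult the policy before requesting any page.
print(rp.can_fetch("ResearchBot", "https://catalog.example.org/record/123"))      # True
print(rp.can_fetch("ResearchBot", "https://catalog.example.org/search?q=ming"))   # False
```

In a real deployment one would call `rp.set_url(...)` and `rp.read()` to fetch the live robots.txt, and skip any URL the parser disallows.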
This software package is designed to work with aggregated sources of online library catalog records, allowing scholars to move beyond the one-record-at-a-time (or, often, at most one-hundred-record) results of standard library searches and to perform statistical analysis on large numbers of records at once. These online records include important information, from the commonly seen (author, publisher, date) to the less commonly seen (number of characters per line and number of lines per page for a large set of premodern Chinese texts). When information exists at this scale, it is possible to use it for scholarly analysis and visualization in a way that tens or even hundreds of records would not allow.
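The kind of aggregate question this enables can be sketched in a few lines. The records and field names below are illustrative placeholders, not the schema of any actual catalog service:

```python
from collections import Counter

# Hypothetical catalog records in the flattened shape an aggregator
# might return after parsing; field names are assumptions for illustration.
records = [
    {"author": "Sima Qian", "publisher": "Zhonghua Shuju", "date": 1959},
    {"author": "Ban Gu", "publisher": "Zhonghua Shuju", "date": 1962},
    {"author": "Fan Ye", "publisher": "Shangwu Yinshuguan", "date": 1965},
]

# A simple aggregate question: how many records per publisher?
by_publisher = Counter(r["publisher"] for r in records)
print(by_publisher.most_common())
# → [('Zhonghua Shuju', 2), ('Shangwu Yinshuguan', 1)]
```

With thousands of records instead of three, the same pattern supports distributions over dates, publishers, or page-layout features, which can then feed standard visualization tools.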
This software, in its wider deployments, will use open-source information provided through several different library services. I will also work with several providers of proprietary information in order to broaden the impact of the software.
The software will go into alpha deployment in January/February. Scholars interested in using this software should contact me. Depending on reception and further development, it will see general deployment by mid-2015.