Digital Humanities

Academic Year: 2019/2020
Language of instruction: English
ECTS credits: 4
Course type: Elective course
When: Year 1, modules 3 and 4

Course Syllabus

Abstract

Digital humanities is an umbrella term for a number of fields of inquiry dealing with the application of computers in humanities scholarship. Traditional areas of concentration include historical data analysis, corpus linguistics, text mining, and online publication of sources. The course comprises several modules dealing with (1) data analysis and data visualisation, including elements of social network analysis, computer cartography, and computer text analysis; and (2) data pre-processing, including cleaning the data, preparing them for use in data analysis environments, and text markup for research purposes. The course focuses both on studying examples of digital humanities scholarship and on acquiring specific skills. The practical study of analysis is based on both training datasets and real-life historical datasets. A special section of the course deals with the basics of GIS. The emphasis is placed on the development of practical skills.
Learning Objectives

  • Learn how to use the R environment for data analysis, and simple Perl scripts and regular expressions for data collection and pre-processing
Expected Learning Outcomes

  • To understand the principles of data analysis in the humanities and to possess a sufficient command of tools for data pre-processing, analysis, and visualisation (basics of R, Perl and regular expressions, elements of GIS) to implement individual research projects and to mediate between humanities scholars and computer scientists in digital humanities projects
Course Contents

  • Visualization and its role in data analysis. Standard tasks of data analysis, their graphical and analytic implementation.
    Visualization and its role in data analysis. Anscombe’s quartet. Standard tasks of data analysis, their graphical and analytic implementation. Frequency distributions (histograms, barplots, and descriptive statistics). Bivariate models (four-field classification of bivariate interactions, scatter plots, multiple boxplots, structured barplots, correlations, regression, t-test, and analysis of variance). Non-parametric analogues of the widespread parametric methods. Why and where non-parametric methods are important. A special case of the bivariate model: temporal dynamics. Less trivial visualisations. Visualising the interaction of three or more variables. Sankey diagrams, maps, networks, and animated graphs.
  • R basics: command-line interface, objects, functions, and file management
    Command-line interface. Possible responses of the interpreter, > and + prompts. Command history. Interrupting calculations when something goes wrong (Ctrl+C, Escape). Objects in R. Vectors, matrices, data frames, and lists. Data types: numeric, character, factor. Some constants: TRUE, FALSE, NA, and NULL. Arithmetic in R, assignment operator <-. Functions in R. Possible values for function arguments: constants, objects, products of other functions. Creating vectors: c(), rep(), and seq(). The default order of arguments and argument names. Some elementary math functions: log(), log10(), sqrt(). Help function: help(). Creating more complex objects: data.frame() and list() functions. Adding rows and columns, rbind(), cbind(), and their allies. Data type transformations with as.character(), as.factor(), and as.numeric(). The dimensions of objects: length(), dim(), nrow(), and ncol(). The preview functions: str(), head(), tail(), and summary(). Addressing elements of vectors, data frames, and lists. Addressing by number and addressing by name. The names(), colnames(), and rownames() functions, and their use for previewing and assigning names. File management in R: getwd(), setwd(), and dir(). Absolute and relative paths. Reading and saving data: read.table() and write.table(). Saving scripts: savehistory() and loadhistory(). The use of scripts in batch mode: source().
  • Descriptive statistics and data transformation in R
    Extracting summary statistics: summary(), min(), max(), mean(), median(), IQR(), sd(), fivenum(), table(). How to handle NA values. Subsetting: subset(). Simple and complex conditions, Boolean operators: & (AND) and | (OR). Factor levels and subsetting, droplevels(). Loops and creation of lists. The while(){} and for(){} loops. Loops and data aggregation (including loops vs. the apply() function family).
  • Basic R graphics
    The simplest plot, a histogram: hist(). Textual elements of the plot: main, xlab, ylab. Rough adjustment of axes (xlim and ylim). Histogram bins, the breaks argument. Numeric output of the hist() function. The connection between hist() and plot(). Colours: numbers, names, RGB codes (#RRGGBB), and the rgb() function. Black and white plots, dashing (angle and density). Border and background fill. The use of plot() for printing bar- and boxplots. Specialised barplot() and boxplot() functions and their peculiarities. The trouble with axis labels in barplot(), the use of strwidth() for setting the margin widths. Saving a graph to a file. General principles of working with the basic R graphical system. Plotting devices, turning them on and off with R functions. Why it is absolutely necessary to close the plotting devices properly, the dev.off() function. Plotting graphics to files: troubleshooting. Raster graphics in R: the png() function. The use of variable texts in plots and variable filenames when saving them. Creation of arrays of illustrations. The paste() function and loops. Scatter plots: plot() and its arguments. The type argument. The size and shape of data points. What to do if the data points overlap (2D histograms). Adding elements to the plot: lines(), points(), text(), and abline(). The legend() function. Working with axes: the axis() function. The preparation of illustrations for on-screen presentations and for academic press. The issue of pixel size and resolution. General parameters of the plotting device, the par() function. Juxtaposition and highlighting of graphs. Graphical primitives in R: segments(), arrows(), rect(), polygon(). Vector graphics in R: pdf(), postscript(), and svg().
  • Implementation of basic analytic procedures in R
    An overview of bivariate models. Causality and formal connection. Different classifications of variables and scales. Null hypothesis, p-values, false positives and false negatives. The case of two quantitative variables, correlation and linear regression: cor(), lm(), and reading their output. The case of an ‘independent’ qualitative and a ‘dependent’ quantitative variable: t.test() and aov(). Post-hoc analysis. The case of two qualitative variables: contingency tables and chisq.test(). The case of an ‘independent’ quantitative and a ‘dependent’ qualitative variable. Basics of nonparametric statistics in R (wilcox.test(), fisher.test(), Theil–Sen estimator: mblm()).
  • Data pre-processing
    Basics of regular expressions. The use of RegEx for search and replace in text editors. A generalised scheme of data pre-processing. The use of RegEx in Perl scripts. Data fetchers and data parsers. Cleaning the data (s///) and extracting the variables for data restructuring (m//). General principles of dataset organisation and different strategies of data parsing.
  • R: Some advanced chapters: Basics of network analysis in R. Text analysis in R. Basics of R cartography.
    Basics of network analysis in R. The libraries sna and network, network graphs and extraction of network metrics. Text analysis in R. From frequency lists and wordclouds to more complex models. Basics of R cartography, R as a GIS analytic tool. The libraries maps and rgdal.
  • Elements of computer cartography and GIS
    Basic organisational principles of GIS. Geocoded data, layered representation. Vector map elements: points, (poly)lines, (multi)polygons. QGIS and its basic features. Problems of historical cartography. Shape of the Earth and cartographic projections. A brief history of cartography and geodesy. The longitude problem and map accuracy. Retrieving historical map data: pictorial and textual sources. Vectorisation of a raster map, export of shapefiles. Data exchange between QGIS and R.
  • Digital humanities applications
    Quantitative history and historical demography. Historical scientometrics. Historical digital cartography. Text mining. Public digital collections. Text Encoding Initiative.
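The pre-processing scheme named under "Data pre-processing" (clean the raw text with a substitution, then extract variables with a match) can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's re module, where re.sub() plays the role of Perl's s/// and a compiled pattern's match() the role of m//, whereas the course itself works with Perl and R. The sample records, the record layout, and the field names are hypothetical.

```python
import re

# Hypothetical raw records, e.g. lines transcribed from a digitised register.
RAW_RECORDS = [
    "  Ivanov,   Pyotr ;  b. 1823 ; Moscow  ",
    "Smith, John; b. 1790; London",
    "Unknown entry without a year",
]

# Assumed record layout: "Surname, Given; b. YYYY; Place".
RECORD_RE = re.compile(
    r"^(?P<surname>[^,;]+), (?P<given>[^;]+); b\. (?P<year>\d{4}); (?P<place>.+)$"
)


def clean(line):
    """Cleaning step (the role of s/// in Perl): normalise spacing."""
    line = re.sub(r"\s+", " ", line).strip()       # collapse runs of whitespace
    line = re.sub(r"\s*([,;])\s*", r"\1 ", line)   # one space after ',' and ';'
    return line.strip()


def parse(line):
    """Extraction step (the role of m// in Perl): return a dict, or None if no match."""
    match = RECORD_RE.match(clean(line))
    if match is None:
        return None
    record = match.groupdict()
    record["year"] = int(record["year"])  # store the year as a number
    return record


rows = [parse(line) for line in RAW_RECORDS]
for row in rows:
    print(row)
```

Records that do not fit the assumed layout come back as None instead of being dropped silently, so they can be inspected by hand before the dataset is passed on to the analysis environment.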
Assessment Elements

  • Project work (non-blocking)
  • Standard performance control tasks (non-blocking)
  • Class attendance and engagement (non-blocking)
Interim Assessment

  • Interim assessment (4 module)
    0.4 * Class attendance and engagement + 0.4 * Project work + 0.2 * Standard performance control tasks
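As a worked example of the formula above, the sketch below computes the weighted sum for one set of scores. The weights come from the formula; the scores are hypothetical, the 10-point scale is an assumption, and the syllabus does not say how the result is rounded, so the function returns the unrounded value.

```python
# Weights taken from the interim assessment formula.
WEIGHTS = {
    "class_attendance_and_engagement": 0.4,
    "project_work": 0.4,
    "standard_performance_control_tasks": 0.2,
}


def interim_grade(scores):
    """Return the unrounded weighted sum of the three assessment elements."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores on an assumed 10-point scale.
example = {
    "class_attendance_and_engagement": 8,
    "project_work": 7,
    "standard_performance_control_tasks": 9,
}

# 0.4 * 8 + 0.4 * 7 + 0.2 * 9 = 3.2 + 2.8 + 1.8
print(round(interim_grade(example), 2))
```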
Bibliography

Recommended Core Bibliography

  • Hai-Jew, S. (2017). Data Analytics in Digital Humanities. Cham, Switzerland: Springer. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=1514614
  • Sabharwal, A. (2015). Digital Curation in the Digital Humanities : Preserving and Promoting Archival and Special Collections. Waltham, MA: Chandos Publishing. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=979489

Recommended Additional Bibliography

  • Brughmans, T. (2013). Thinking Through Networks: A Review of Formal Network Methods in Archaeology. Journal of Archaeological Method & Theory, 20(4), 623–662. https://doi.org/10.1007/s10816-012-9133-8
  • Schreibman, S., Siemens, R. G., & Unsworth, J. (2004). A Companion to Digital Humanities. Malden, MA: Wiley-Blackwell. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=231516
  • Terras, M. M., Nyhan, J., & Vanhoutte, E. (2013). Defining Digital Humanities : A Reader. Farnham, Surrey, England: Routledge. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edsebk&AN=608888