Week: 21/03 – 25/03

Lecture Room: Building 1 – Room 1.38

Schedule: 09:30 – 12:30  |  14:00 – 17:00

Teachers:

 

Contents:

Topics covered in this module include:

1. Foundations of corpus linguistics

  • principles and methods of corpus analysis
  • applications of corpus data in lexicography
  • types of corpora, overview of existing corpora
  • corpus design, representativeness, data sources, metadata

2. Corpus compilation

  • building corpora from online data: web scraping etc.
  • boilerplate removal, normalization, metadata extraction
  • representation and exchange formats
  • online and stand-alone tools for web corpus compilation
  • automatic linguistic annotation (POS, lemma, NER, parsing, …)
  • online and stand-alone tools for linguistic annotation

3. Searching corpora

  • regular expressions
  • character encodings and the Unicode standard
  • CQP query language for lexico-grammatical patterns
  • practical exercises with Sketch Engine and CQPweb

4. Quantitative analysis

  • frequency lists and metadata distribution
  • collocations and word sketches
  • keyword analysis
  • lexicographic interpretation of results
  • foundations of statistical inference

5. Reproducibility

  • research methodology and documentation
  • data management, sustainability of corpus resources

 

Please see the module description for further information.