Building a large corpus based on newspapers from the web

The Norwegian Newspaper Corpus (NNC) is an initiative to create a large monitor corpus representing contemporary Norwegian in both of its written standards, Bokmål and Nynorsk. The corpus is compiled by harvesting and processing, on a daily basis, the texts published in the web editions of Norwegian newspapers. This introductory chapter surveys the corpus building, tool development and research carried out in connection with the NNC project. It gives an overview of the corpus and its system architecture, and describes the workflow, tools and methods used in data processing. The chapter also presents the individual research contributions to this volume.
