Unsupervised Learning of Linguistic Structure: An Empirical Evaluation

David Powers

doi:10.1075/ijcl.2.1.06pow

ISSN 1384-6655
E-ISSN: 1569-9811

Author(s): David Powers ¹

Affiliations:
¹ Flinders University of SA
Source: International Journal of Corpus Linguistics, Volume 2, Issue 1, Jan 1997, p. 91 - 131
DOI: https://doi.org/10.1075/ijcl.2.1.06pow

Abstract

Computational Linguistics and Natural Language have long been targets for Machine Learning, and a variety of learning paradigms and techniques have been employed with varying degrees of success. In this paper, we review approaches which have adopted an unsupervised learning paradigm, explore the assumptions which underlie the techniques used, and develop an approach to empirical evaluation. We concentrate on a statistical framework based on N-grams, although we seek to maintain neurolinguistic plausibility. The model we adopt places putative linguistic units in focus and associates them with a characteristic vector of statistics derived from occurrence frequency. These vectors are treated as defining a hyperspace, within which we demonstrate a technique for examining the empirical utility of the various metrics and normalization, visualization, and clustering techniques proposed in the literature. We conclude with an evaluation of the relative utility of a large array of different metrics and processing techniques in relation to our defined performance criteria.
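
As a rough illustration of the idea of associating each unit with a characteristic vector of occurrence statistics, the following sketch builds bigram context vectors for the tokens of a toy corpus and compares them in the resulting space. It is not the paper's procedure: the function names `context_vectors` and `cosine`, the toy corpus, the restriction to immediate left and right neighbours, and the choice of cosine similarity as the metric are all assumptions made for this example.

```python
from math import sqrt


def context_vectors(tokens):
    """Map each unit to counts of its immediate left and right neighbours.

    Each vector has 2 * V entries (V = number of distinct units):
    the first V count left neighbours, the last V count right neighbours.
    """
    units = sorted(set(tokens))
    index = {u: i for i, u in enumerate(units)}
    n = len(units)
    vectors = {u: [0] * (2 * n) for u in units}
    for left, right in zip(tokens, tokens[1:]):
        vectors[right][index[left]] += 1       # `left` precedes `right`
        vectors[left][n + index[right]] += 1   # `right` follows `left`
    return vectors


def cosine(a, b):
    """Cosine similarity between two count vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical toy corpus, used only to exercise the functions.
corpus = "the cat sat the dog sat the cat ran the dog ran".split()
vectors = context_vectors(corpus)

# Units that occur in the same contexts end up close together in the space.
print(cosine(vectors["cat"], vectors["dog"]))
print(cosine(vectors["cat"], vectors["the"]))
```

In this toy corpus "cat" and "dog" share all of their contexts while "cat" and "the" share none, so the first similarity is much higher than the second. A different metric, such as the Spearman rank correlation listed among the keywords, could be substituted for `cosine` without changing how the vectors are built.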

Article Type: Research Article

Keyword(s): Classification; Feature Maps; Multidimensional Scaling; Orthography; Phonology; Self-Organization; Singular Valued Decomposition; Spearman Rank Correlation; Syntax; Tagging; Unsupervised Learning

