
The present study deals with the pre-processing of texts, which is performed in three steps: segmenting the texts into textual units (sentences), rewriting contracted forms into their standard forms, and tagging unambiguous compounds. Here we describe two of these steps: text segmentation and the rewriting of contracted forms. Segmentation into textual units is performed by the transducer Sentence, and contracted forms are rewritten into their standard forms by the transducer Normalisation. We describe in detail the steps involved in developing both transducers.
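The two steps described above can be illustrated with a minimal Python sketch. It is not the authors' Sentence or Normalisation transducers, which are finite-state graphs. Instead, it approximates them with regular expressions. The text does not state the language of the corpus or the actual rewriting rules, so the English contraction table and the function names `segment`, `normalise` and `preprocess` are hypothetical.

```python
import re

# Hypothetical table of contracted forms and their standard forms (English,
# for illustration only; the paper's language and rules are not given here).
CONTRACTIONS = {
    "don't": "do not",
    "can't": "cannot",
    "won't": "will not",
    "i'm": "I am",
}

# Sentence boundary: '.', '!' or '?' followed by whitespace and an uppercase
# letter. Abbreviations such as "Dr." are not handled by this simplification.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")

# Matches any contracted form from the table as a whole word, in any case.
_CONTRACTION_RE = re.compile(
    r"\b(" + "|".join(re.escape(c) for c in CONTRACTIONS) + r")\b",
    re.IGNORECASE,
)


def segment(text: str) -> list[str]:
    """Step 1 (stand-in for Sentence): split text into sentence-like units."""
    return [s for s in _BOUNDARY.split(text.strip()) if s]


def _expand(match: re.Match) -> str:
    """Return the standard form of one contracted form, keeping an initial capital."""
    form = match.group(0)
    standard = CONTRACTIONS[form.lower()]
    if form[0].isupper():
        standard = standard[0].upper() + standard[1:]
    return standard


def normalise(sentence: str) -> str:
    """Step 2 (stand-in for Normalisation): rewrite contracted forms."""
    return _CONTRACTION_RE.sub(_expand, sentence)


def preprocess(text: str) -> list[str]:
    """Segment first, then normalise each textual unit."""
    return [normalise(s) for s in segment(text)]


print(preprocess("I'm here. Don't wait! We can't stay."))
```

Running it prints `['I am here.', 'Do not wait!', 'We cannot stay.']`. Segmentation is applied before normalisation so that each rewrite operates inside a single textual unit. A finite-state transducer would additionally let the rules be composed and applied in one pass over the text.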