University of Twente

Abstract Krahmer

James Joyce's Ulysses (1922) is commonly regarded as one of the most important novels of the 20th century. It is also arguably one of the most unreadable. One cause of its alleged unreadability is the final, 18th chapter of the book, commonly referred to as Penelope. It contains the "sustained stream of consciousness running through Molly [Bloom]'s lurid, vulgar, and hectic mind, the mind of a rather hysterical woman, with commonplace ideas, more or less morbidly sensual, with a rich strain of music in her and with the quite abnormal capacity of reviewing her whole life in an uninterrupted verbal flow" (Nabokov 1980:362). To model this uninterrupted verbal flow, Joyce left out all forms of punctuation from the Penelope chapter, resulting in a completely unstructured sequence of no fewer than 24,180 words.

In this talk we report on an exploratory study in which memory-based learning techniques were used to segment the Penelope chapter automatically. First, all 18 chapters were tokenized and tagged with the Memory-Based Tagger of Daelemans et al. (1996). Then a 17-fold cross-validation experiment was performed on the 17 punctuated chapters. In each fold, the learner was trained on 16 segmented and tagged chapters and tested on the held-out punctuated chapter. The resulting segmentation has an accuracy of 83.3% and an F-beta score of 51.3%, which is significantly better than the baseline. In each fold the trained learner was also applied to the 18th chapter, yielding 17 segmented versions of Penelope. A voting experiment was then performed to combine these into a single segmented version of Penelope. The result will be discussed and compared with Nabokov's annotation.
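The abstract does not specify the learner's features, its distance metric or the voting rule. The following is therefore only a minimal Python sketch of the overall pipeline under stated assumptions, not the study's implementation. It makes these choices:

- Segmentation is treated as per-token classification. The label is 1 if a boundary follows the token (an assumed encoding).
- Each token is described by a window of POS tags.
- Tokens are classified by nearest neighbours under the unweighted overlap metric. The actual Memory-Based Tagger and TiMBL use feature weighting and richer features.

The function names `windows`, `overlap_distance`, `classify`, `scores`, `run_folds` and `majority_vote`, the window width, k, beta = 1 and strict-majority voting are all hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Features = Tuple[str, ...]
# Hypothetical chapter encoding: (POS tags, labels), label 1 = boundary after token, else 0.
Chapter = Tuple[List[str], List[int]]


def windows(tags: Sequence[str], width: int = 2) -> List[Features]:
    """One feature vector per token: the POS tags in a window around it, padded with '_'."""
    padded = ["_"] * width + list(tags) + ["_"] * width
    return [tuple(padded[i:i + 2 * width + 1]) for i in range(len(tags))]


def overlap_distance(a: Features, b: Features) -> int:
    """Unweighted overlap metric: number of feature positions on which two vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def classify(memory: List[Tuple[Features, int]], x: Features, k: int = 1) -> int:
    """Memory-based (k-NN) classification: majority label of the k closest stored instances."""
    nearest = sorted(memory, key=lambda inst: overlap_distance(inst[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def scores(gold: Sequence[int], pred: Sequence[int], beta: float = 1.0) -> Tuple[float, float]:
    """Token accuracy and F-beta on the boundary class (beta = 1 is an assumption)."""
    tp = sum(g == 1 and p == 1 for g, p in zip(gold, pred))
    fp = sum(g == 0 and p == 1 for g, p in zip(gold, pred))
    fn = sum(g == 1 and p == 0 for g, p in zip(gold, pred))
    accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    b2 = beta * beta
    denom = b2 * precision + recall
    f_beta = (1 + b2) * precision * recall / denom if denom else 0.0
    return accuracy, f_beta


def run_folds(chapters: List[Chapter], penelope_tags: List[str], width: int = 2, k: int = 1):
    """Leave-one-chapter-out: train on all other chapters, score the held-out one,
    and segment the unpunctuated final chapter with the same memory in every fold."""
    fold_scores, penelope_versions = [], []
    for held_out in range(len(chapters)):
        # Memory-based learning: "training" just stores every instance of the other chapters.
        memory = [inst for j, (tags, labels) in enumerate(chapters) if j != held_out
                  for inst in zip(windows(tags, width), labels)]
        test_tags, test_labels = chapters[held_out]
        pred = [classify(memory, x, k) for x in windows(test_tags, width)]
        fold_scores.append(scores(test_labels, pred))
        penelope_versions.append([classify(memory, x, k) for x in windows(penelope_tags, width)])
    return fold_scores, penelope_versions


def majority_vote(versions: Sequence[Sequence[int]]) -> List[int]:
    """Keep a boundary only where more than half of the segmented versions place one."""
    return [int(2 * sum(column) > len(versions)) for column in zip(*versions)]
```

Called with the 17 punctuated chapters and the tags of Penelope, `run_folds` would return 17 score pairs and 17 segmented versions. `majority_vote` then combines these into one segmentation. With an odd number of binary votes, a strict majority can never tie. The linear scan in `classify` is deliberately naive. With whole chapters as memory it would be slow, and dedicated memory-based learners index the instance base instead.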

Last modified 2001/10/04 13:39:46 by Parlevink Webmaster