Saturday, April 7, 2012
print words
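import java.util.regex.Matcher;
import java.util.regex.Pattern;

String input = "Input text, with words, punctuation, etc. Well, it's rather short.";
// A word is a run of word characters (\w) or apostrophes, so "it's" stays whole.
Pattern p = Pattern.compile("[\\w']+");
Matcher m = p.matcher(input);
while (m.find()) {
    // group() returns the text of the current match.
    System.out.println(m.group());
}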
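For comparison, here is a rough Python sketch of the same regular-expression idea. The names `WORD_PATTERN` and `sample` are my own for illustration; the only library call is the standard `re.findall`.

```python
import re

# Same character class as the Java pattern: word characters or apostrophes.
WORD_PATTERN = r"[\w']+"

sample = "Input text, with words, punctuation, etc. Well, it's rather short."
regex_words = re.findall(WORD_PATTERN, sample)
for word in regex_words:
    print(word)
```

One thing to keep in mind with this pattern: a hyphen is not a word character, so a hyphenated word such as "re-use" would come out as two separate words.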
create words, a list of words, by splitting the input on whitespace
for every word, strip whitespace and punctuation from its left and right ends
# Characters to strip from both ends of each word; must be defined before use.
PUNCTUATION = ",. \n\t\"'][#*:"
words = input.split()
words = [word.strip(PUNCTUATION) for word in words]
>>> print words[:100]
['Project', "Gutenberg's", 'Manual', 'of', 'Surgery', 'by', 'Alexis',
'Thomson', 'and', 'Alexander', 'Miles', 'This', 'eBook', 'is', 'for',
'the', 'use', 'of', 'anyone', 'anywhere', 'at', 'no', 'cost', 'and',
'with', 'almost', 'no', 'restrictions', 'whatsoever', 'You', 'may',
'copy', 'it', 'give', 'it', 'away', 'or', 're-use', 'it', 'under',
... etc etc.
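To try the split-and-strip approach without downloading a book, here is a small self-contained sketch that runs it on the short sample sentence from the Java snippet. The function name `split_words` is hypothetical, and dropping empty strings (pieces that were nothing but punctuation) is an extra step I am assuming is wanted; the snippet above does not do it.

```python
# Same strip set as in the snippet above.
PUNCTUATION = ",. \n\t\"'][#*:"


def split_words(text):
    """Split on whitespace, then strip punctuation from both ends of each piece."""
    stripped = (piece.strip(PUNCTUATION) for piece in text.split())
    # Assumption: pieces that were punctuation only are now empty and get dropped.
    return [word for word in stripped if word]


sample = "Input text, with words, punctuation, etc. Well, it's rather short."
words = split_words(sample)
print(words)
```

Because `strip` only touches the two ends of each piece, the apostrophe inside "it's" and the hyphen inside a word like "re-use" are left alone.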