CS21 Lab 10: Text Filtering

Due 11:59pm Saturday, April 12th

For this lab you will write one program, filter.py, that filters a text file by removing words entered by the user or by removing the most popular words in the text.

First, run update21, if you haven't already, to create the cs21/labs/10 directory. Then cd into your cs21/labs/10 directory to begin working on your program.

This lab is not specifically focused on design or testing methods, but you should continue to use good practices such as top-down design, incremental testing, and writing well-documented code.

Introduction

In this lab you will write a program that filters text in a file by removing words from it. The English language is quite redundant, so even after removing a number of popular words, the text is often quite understandable.
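As a rough illustration of what "removing words" can mean, here is a minimal sketch. It assumes, based on the sample run in this lab, that matching ignores case and punctuation and that the remaining words are rejoined with single spaces. The function names normalize and filter_line are hypothetical, not part of the lab's requirements.

```python
import string


def normalize(word):
    """Lowercase a word and drop punctuation, so "Snow." becomes "snow"."""
    return "".join(ch for ch in word.lower() if ch not in string.punctuation)


def filter_line(line, banned):
    """Return line without the words whose normalized form is in banned.

    Assumption: banned holds lowercase words with no punctuation.
    """
    kept = []
    for word in line.split():
        if normalize(word) not in banned:
            kept.append(word)
    return " ".join(kept)


if __name__ == "__main__":
    print(filter_line("To watch his woods fill up with snow.",
                      ["snow", "woods", "sleep"]))
```

Note that a removed word takes its attached punctuation with it, which is why the period after "snow" disappears in this sketch.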

When the program starts up, you will display a welcome message to the user and then ask the user to enter a file to filter. If the file doesn't exist, you will prompt them again to enter a file until a valid file is entered.
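One possible way to handle the repeated file prompt is sketched below. The function name get_filename, the ask parameter, and the wording of the error message are assumptions made for illustration; the lab only requires that you keep asking until a valid file is entered.

```python
import os


def get_filename(ask=input):
    """Keep prompting until the user names a file that exists.

    ask is a hypothetical parameter that defaults to input; passing a
    different function makes the loop easy to test without typing.
    """
    name = ask("Which file would you like to filter? ")
    while not os.path.isfile(name):
        print("Sorry, I could not find the file %s." % name)
        name = ask("Which file would you like to filter? ")
    return name
```

Checking with os.path.isfile is one choice; wrapping open in a try/except block would work as well.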

Then you will present a menu with four options. The first option is to filter the file with words entered by the user. The second option is to go through the file, count how many times each word appears, and show the user the list of words and their counts. The third option is to use the word counts to filter the file by its most popular words. For this option you will prompt the user to select how many of the top words to use as the filter. The fourth option is to quit the program, giving a goodbye message.
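The counting and "top words" steps behind the second and third options could be sketched roughly as follows. This is only one approach: it assumes a Python dictionary is an acceptable way to store the counts, and it breaks ties alphabetically because that is how the sample run appears to order them. The names count_words, top_words, and show_counts are hypothetical.

```python
import string


def count_words(text):
    """Return a dictionary mapping each lowercase, punctuation-free word
    in text to the number of times it appears."""
    counts = {}
    for raw in text.split():
        word = "".join(ch for ch in raw.lower()
                       if ch not in string.punctuation)
        if word != "":
            counts[word] = counts.get(word, 0) + 1
    return counts


def top_words(counts, n):
    """Return the n most frequent words, highest count first.

    Assumption: ties are broken alphabetically.
    """
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [word for word, count in ranked[:n]]


def show_counts(counts):
    """Print the words and counts in a two-column table."""
    print("%-15s %s" % ("Word", "Count"))
    print("-" * 21)
    for word in top_words(counts, len(counts)):
        print("%-15s %d" % (word, counts[word]))
```

The list returned by top_words can then serve as the filter for the third option, in the same way the user's typed words serve as the filter for the first.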

Sample Output
Here's a sample run showing output from the poem "Stopping by Woods on a Snowy Evening".
$ python filter.py
Welcome to the text filtering program.

Which file would you like to filter? /usr/local/doc/text/frost.txt
What would you like to do?
1. Filter file with selected words
2. Show the word counts of the file
3. Filter file by the most popular words in the file
4. Quit
Choice? 1
Enter the words you would like filtered from the text.
Words: snow woods sleep
Here's your filtered text:

Whose these are I think I know.
His house is in the village though;
He will not see me stopping here
To watch his fill up with

My little horse must think it queer
To stop without a farmhouse near
Between the and frozen lake
The darkest evening of the year.

He gives his harness bells a shake
To ask if there is some mistake.
The only other sound's the sweep
Of easy wind and downy flake.

The are lovely, dark and deep,
But I have promises to keep,
And miles to go before I
And miles to go before I

What would you like to do?
1. Filter file with selected words
2. Show the word counts of the file
3. Filter file by the most popular words in the file
4. Quit
Choice? 2

Word            Count
---------------------
the             7
to              6
and             5
i               5
woods           4
his             3
a               2
are             2
before          2
go              2
he              2
is              2
miles           2
of              2
sleep           2
think           2
ask             1
bells           1
between         1
but             1
dark            1
darkest         1
deep            1
downy           1
easy            1
evening         1
farmhouse       1
fill            1
flake           1
frozen          1
gives           1
harness         1
have            1
here            1
horse           1
house           1
if              1
in              1
it              1
keep            1
know            1
lake            1
little          1
lovely          1
me              1
mistake         1
must            1
my              1
near            1
not             1
only            1
other           1
promises        1
queer           1
see             1
shake           1
snow            1
some            1
sounds          1
stop            1
stopping        1
sweep           1
there           1
these           1
though          1
up              1
village         1
watch           1
whose           1
will            1
wind            1
with            1
without         1
year            1

What would you like to do?
1. Filter file with selected words
2. Show the word counts of the file
3. Filter file by the most popular words in the file
4. Quit
Choice? 3
How many of the top words to filter? 15
Here's your filtered text:

Whose these think know.
house in village though;
will not see me stopping here
watch fill up with snow.

My little horse must think it queer
stop without farmhouse near
Between frozen lake
darkest evening year.

gives harness bells shake
ask if there some mistake.
only other sound's sweep
easy wind downy flake.

lovely, dark deep,
But have promises keep,



What would you like to do?
1. Filter file with selected words
2. Show the word counts of the file
3. Filter file by the most popular words in the file
4. Quit
Choice? 4
Goodbye!
More sample output.
Requirements
For this lab we will leave the design and implementation up to you, but be sure to use good top-down design to write well-organized code. The requirements for your program are shown below.
Implementation Tips
Hacker Challenge

Here are some ideas for you to extend the lab.

Submit
Once you are satisfied with your program, hand it in by typing handin21 at the Linux prompt.

You may run handin21 as many times as you like, and only the most recent submission will be recorded. This is useful if you realize, after handing in some programs, that you'd like to make a few more changes to them.