The goal for this assignment is for you to gain more experience creating libraries and using files, while also learning how to use structures and linked lists. You may use the book's simpio.h library to read input, if you'd like. You may work with a partner for this assignment. Again, be sure to add a comment to the top of your programs with both partners' names. Only one partner needs to turn in the code.
Suppose that the Office of Homeland Security is interested in developing an automatic process for scanning through text files (such as email) looking for potential plots against the United States. To accomplish this, further suppose that this agency has produced a list of potentially dangerous words and given each word a rating. Higher ratings equate to more dangerous words. For example, a portion of this list might include the following:
plot 5
plan 5
secret 25
destruction 40
devastate 40
devastation 40
revolt 50
revolution 50
bomb 100
crash 100
For this assignment you will develop a program that will create a dictionary of potentially dangerous words, scan files of text looking for these words, and then return a total rating of potential danger for the entire file.
For example, consider the following three files, text1, text2, and text3:
Let me tell you a little secret. I have a plan to take over the world as we know it. Do you want to join my REVOLUTION? Maybe you have a plan of your own. Either way, let me know as soon as you can.
This is an innocuous piece of text. I am not saying anything of importance here. Goodbye.
That party was the bomb! Thanks for letting me crash at your place afterwards. I hope we didn't cause too much destruction at your friend's place.
Below is a sample run of the scanner program on these three files. The dictionary in this run was loaded from a longer list of dangerous words, of which the words above are only a portion.
This program scans text files searching for possible plots against the government. It returns an overall danger rating for each file scanned.

Dictionary contains 16 entries:
bomb 100
crash 100
destruction 40
devastate 40
devastation 40
frighten 80
frightening 80
hunt 60
hunting 60
kill 250
molest 150
plan 5
plot 5
revolt 50
revolution 50
secret 25

Enter file to be scanned> text1
Found: secret Score: 25
Found: plan Score: 5
Found: revolution Score: 50
Found: plan Score: 5
TOTAL: 85
Continue scanning files (y/n)? y
Enter file to be scanned> tetx2
Unable to open file, try again.
Enter file to be scanned> text2
TOTAL: 0
Continue scanning files (y/n)? y
Enter file to be scanned> text3
Found: bomb Score: 100
Found: crash Score: 100
Found: destruction Score: 40
TOTAL: 240
Continue scanning files (y/n)? n
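The prompting in this sample run could be handled by two small helpers. One helper asks again for a file name when the file cannot be opened. The other asks whether to continue.

This is only a sketch, and several details are assumptions:
- The names OpenWithRetry and AskContinue are hypothetical.
- Both helpers read their answers from a FILE * parameter. The real program would pass stdin, and a test can supply a file instead of a keyboard.
- File names are assumed to contain no spaces.

```c
#include <stdio.h>

/* Hypothetical helper: prompt for a file name (read from in) until
   fopen succeeds. Returns the open file, or NULL if in runs out of
   input. Assumes names have no spaces and at most 255 characters. */
FILE *OpenWithRetry(FILE *in)
{
    char name[256];

    for (;;) {
        printf("Enter file to be scanned> ");
        fflush(stdout);                    /* show the prompt before reading */
        if (fscanf(in, "%255s", name) != 1)
            return NULL;                   /* end of input */
        FILE *fp = fopen(name, "r");
        if (fp != NULL)
            return fp;
        printf("Unable to open file, try again.\n");
    }
}

/* Hypothetical helper: ask whether to keep going.
   Returns 1 for 'y' or 'Y', and 0 for anything else or end of input. */
int AskContinue(FILE *in)
{
    char answer;

    printf("Continue scanning files (y/n)? ");
    fflush(stdout);
    if (fscanf(in, " %c", &answer) != 1)   /* leading space skips the newline */
        return 0;
    return answer == 'y' || answer == 'Y';
}
```

In fileScanner.c, a main loop might do the following:
1. Call OpenWithRetry(stdin) to get an open file.
2. Scan the file and close it.
3. Repeat while AskContinue(stdin) returns 1.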
Notice that this technique of simply rating words can be helpful in some situations. For instance, it rates the file text1 as more dangerous than the file text2. However, it rates text3, a harmless thank-you note about a party, as the most dangerous of the three. The problem is that this technique considers only the words a text contains, not their meaning, and a word can mean very different things depending on its context. Despite this problem, it is a fast way to identify text files that a human reader, who can interpret the context, should review more carefully.
This program will consist of several files.
FILE | PURPOSE |
dangerWords.dat | The list of dangerous words and their associated ratings |
dict.h | The interface for the dictionary library |
dict.c | The implementation of the dictionary library |
fileScanner.c | The main program for performing file scanning |
In many applications, it is useful to be able to associate a string with a definition in much the same way that a dictionary does. At one point in the program, you can enter a definition for a particular string; at a later point, you can look up that string to find its definition. More formally, what you need is a package that allows you to associate a value with a particular key (the Define operation), and later, to retrieve any value associated with that key (the Lookup operation).
Notice that dict.c contains the following line:
static dictionary *dict;
Because this declaration is not inside any function, it creates a file-scope variable called dict, which is a pointer to a dictionary. This variable exists for the whole run of the program. The word static should be interpreted to mean "private": it limits the variable dict to this one file, so no other file can refer to it. Any function inside dict.c can use this variable to refer to the linked list that stores the dictionary. In this way, we avoid having to pass the dictionary around as an argument. This is useful within a library because we want to hide implementation details from the user of the library.
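Here is one possible sketch of dict.c built around that static pointer. It rests on several assumptions:
- dict.h is not shown here, so the prototypes below are only a guess based on how the test program calls InitDictionary, Define, Lookup and PrintDictionary.
- The node fields key, value and next are hypothetical.
- When a word is defined a second time, its rating is simply replaced. This policy is also an assumption.

Entries are kept in alphabetical order, which matches the sorted listing in the sample output. Lookup returns -1 for a missing word.

```c
/* Sketch of dict.c: a dictionary kept as an alphabetically sorted
   linked list. The prototypes are assumed, not copied from dict.h. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cell layout: one word and its rating per node. */
typedef struct dictionary {
    char *key;
    int value;
    struct dictionary *next;
} dictionary;

/* File scope + static: only functions in this file can see it. */
static dictionary *dict;

/* Start with an empty dictionary (this sketch does not free an old one). */
void InitDictionary(void)
{
    dict = NULL;
}

/* Insert key in alphabetical position, or replace its value if the
   key is already present (the replace policy is an assumption). */
void Define(const char *key, int value)
{
    dictionary **link = &dict;   /* points at the link we may rewrite */

    while (*link != NULL && strcmp((*link)->key, key) < 0)
        link = &(*link)->next;

    if (*link != NULL && strcmp((*link)->key, key) == 0) {
        (*link)->value = value;
        return;
    }

    dictionary *cell = malloc(sizeof *cell);
    char *copy = malloc(strlen(key) + 1);
    if (cell == NULL || copy == NULL) {
        fprintf(stderr, "Define: out of memory\n");
        exit(EXIT_FAILURE);
    }
    strcpy(copy, key);           /* keep a private copy of the word */
    cell->key = copy;
    cell->value = value;
    cell->next = *link;          /* splice the new cell in before *link */
    *link = cell;
}

/* Return the rating stored for key, or -1 if key is not defined. */
int Lookup(const char *key)
{
    for (dictionary *p = dict; p != NULL; p = p->next) {
        int cmp = strcmp(p->key, key);
        if (cmp == 0)
            return p->value;
        if (cmp > 0)
            break;               /* sorted list: key cannot appear later */
    }
    return -1;
}

/* Print the entry count followed by one "word rating" line per entry. */
void PrintDictionary(void)
{
    int count = 0;
    for (dictionary *p = dict; p != NULL; p = p->next)
        count++;
    printf("Dictionary contains %d entries:\n", count);
    for (dictionary *p = dict; p != NULL; p = p->next)
        printf("%s %d\n", p->key, p->value);
}
```

Define walks a pointer to the link field (dictionary **link). Because of this, the same code can insert at the head, in the middle, or at the tail of the list. In the test program below, each Define splices its word in front of the first key that sorts after it. So PrintDictionary lists apple, banana, orange, pear, even though pear was defined before orange.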
To test your dictionary, create a simple main program in fileScanner.c. It should create a new dictionary, define a few words, print the dictionary, and look up some words that are in the dictionary and some that are not. Below is an example program that uses the dictionary interface.
#include <stdio.h>
#include "dict.h"

int main(void)
{
    InitDictionary();
    Define("apple", 10);
    Define("banana", 20);
    Define("pear", 5);
    Define("orange", 15);
    PrintDictionary();
    printf("Looking up apple: %d\n", Lookup("apple"));
    printf("Looking up pear: %d\n", Lookup("pear"));
    printf("Looking up grape: %d\n", Lookup("grape"));
    return 0;
}

Here is the output this should produce:
Dictionary contains 4 entries:
apple 10
banana 20
orange 15
pear 5
Looking up apple: 10
Looking up pear: 5
Looking up grape: -1
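Notice two things about this output. First, PrintDictionary lists the entries in alphabetical order, even though they were defined in a different order. Second, Lookup returns -1 for a word that is not in the dictionary. You should perform more extensive tests of your dictionary implementation to be sure that it is working properly. Only go on to the next part once your dictionary is complete.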
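There are two major parts to scanning text files in search of dangerous words.

First, read the words and ratings in dangerWords.dat into the dictionary. Each line of that file holds one word followed by its rating. A loop such as the following can read the file one line at a time; termch receives the character that ends each line:
while (fscanf(input,"%s %d%c",word,&value,&termch) != EOF) { ... }

Second, read the text file being scanned one string at a time, and look up each string in the dictionary:
while (fscanf(text, "%s", str) != EOF) { ... }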
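Both loops can be made a little more robust in two ways. First, compare fscanf's return value with the number of conversions you expect. Second, limit %s to the size of the buffer.

The sketch below rests on these assumptions:
- Words fit in a 100-character buffer. MAX_WORD is a hypothetical name.
- The define and lookup operations are passed in as function pointers, so the sketch compiles on its own.
- The small array-based StubDefine/StubLookup pair is only a stand-in for the real dict.c library.
- The sketch skips the word conversion described next. It therefore only matches strings that appear exactly as they do in the dictionary.

```c
#include <stdio.h>
#include <string.h>

#define MAX_WORD 100  /* hypothetical buffer size; %99s leaves room for '\0' */

/* Stand-in dictionary so this sketch is self-contained; the real
   program would call Define and Lookup from dict.c instead. */
#define STUB_SIZE 20
static char stubKeys[STUB_SIZE][MAX_WORD];
static int stubValues[STUB_SIZE];
static int stubCount = 0;

void StubDefine(const char *key, int value)
{
    if (stubCount < STUB_SIZE && strlen(key) < MAX_WORD) {
        strcpy(stubKeys[stubCount], key);
        stubValues[stubCount] = value;
        stubCount++;
    }
}

int StubLookup(const char *key)
{
    for (int i = 0; i < stubCount; i++)
        if (strcmp(stubKeys[i], key) == 0)
            return stubValues[i];
    return -1;
}

/* Part 1: read "word rating" pairs. %s skips leading whitespace, so the
   newline ending the previous line is passed over automatically.
   Requiring 2 conversions stops at a malformed line instead of
   defining a word with a stale or uninitialized rating. */
int LoadPairs(FILE *input, void (*define)(const char *, int))
{
    char word[MAX_WORD];
    int value, count = 0;

    while (fscanf(input, "%99s %d", word, &value) == 2) {
        define(word, value);
        count++;
    }
    return count;
}

/* Part 2: read one whitespace-delimited string at a time and add up
   the ratings of the strings found in the dictionary. */
int ScanText(FILE *text, int (*lookup)(const char *))
{
    char str[MAX_WORD];
    int total = 0;

    while (fscanf(text, "%99s", str) == 1) {
        int score = lookup(str);
        if (score != -1) {
            printf("Found: %s Score: %d\n", str, score);
            total += score;
        }
    }
    printf("TOTAL: %d\n", total);
    return total;
}
```

In the full program, the stand-in functions would be replaced by calls to Define and Lookup.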
There are a couple of issues you need to deal with as you read in each string from the text file. Remember that the %s conversion reads any sequence of non-whitespace characters as one string, so punctuation stays attached to the neighboring word. For example, suppose you read one string at a time from the text "Hello, I live at 12 Maple Lane." You would get the following strings:
"Hello," "I" "live" "at" "12" "Maple" "Lane."
You should write functions to help you convert each string into a usable word that you can look up in the dictionary. The C library ctype.h contains functions, such as isalpha and tolower, that you will need to implement them. After conversion, the words are lowercase with no punctuation, and "12", which contains no letters, is dropped:
"hello" "i" "live" "at" "maple" "lane"
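One way to perform this conversion is a single helper that trims non-letters from both ends of the string and lowercases what remains. This is a sketch using only isalpha and tolower from ctype.h. The name NormalizeWord and its return convention are hypothetical. They are not the specific functions the assignment asks for.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: converts a raw string such as "Hello," or
   "REVOLUTION?" in place into a lowercase word by dropping leading
   and trailing non-letters. Returns 1 if a word remains, or 0 if the
   string held no letters (for example "12"). */
int NormalizeWord(char *s)
{
    size_t len = strlen(s);
    size_t start = 0;

    /* Trim non-letters from the end. */
    while (len > 0 && !isalpha((unsigned char)s[len - 1]))
        len--;
    /* Skip non-letters at the front. */
    while (start < len && !isalpha((unsigned char)s[start]))
        start++;
    /* Shift the remaining characters down, lowercasing as we go. */
    for (size_t i = start; i < len; i++)
        s[i - start] = (char)tolower((unsigned char)s[i]);
    s[len - start] = '\0';

    return len > start;
}
```

In the scanning loop, you would call Lookup only when NormalizeWord returns 1. Trimming only the ends keeps interior apostrophes, so "didn't" stays "didn't". Removing every non-letter instead would give "didnt". Either choice produces the word list above.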
Use cs21handin to turn in the following files. You do not need to turn in dict.h because it should not be modified.