Last Updated: 2017-07-10 Mon 14:49

CS 211 Project 3: Class Inhairytense

CODE DISTRIBUTION:

CHANGELOG:

Wed Mar 1 14:08:48 EST 2017
Additional guidance on how correctWord(..) should use the ignoreCase field has been added to the implementation notes for the AutomaticSC. These may clarify the overall behavior of the method.
Wed Mar 1 09:52:02 EST 2017
Minor update to the tests: a reference to "small-dict.txt" in SpellCheckerTests.java should be "short-dict.txt".
Wed Mar 1 01:34:21 EST 2017
Test cases are now linked at the top of the spec.
Tue Feb 28 09:24:37 EST 2017
Several manual inspection criteria have been added to SpellChecker in response to student questions about whether Document can be used during readAllLines(..): it cannot. The new manual inspection criteria are linked here.

1 Overview

A primary feature of object-oriented languages like Java is inheritance, the ability to relate a new class to a previously written class in order to exploit previously written code. Existing classes can be extended to create modified versions of their old methods and introduce entirely new functionality. This power comes with a cost though: inheritance is difficult to understand at first and it is often difficult to recognize an opportune moment to employ it.

In this project, we will develop a small class hierarchy of spell checkers, classes which are designed to enable the correction of misspelled words in documents. Each spell checker will share some structure and functionality with the basic spell checker which makes the collection of classes as a whole a good candidate for an inheritance hierarchy.

2 Spell Checker Functionality and Hierarchy

There are 4 required spell checkers to be implemented. Each of them implements the same basic functionality.

Constructor
Construct a spell checker by providing it with a file containing correct dictionary words and indicating whether upper/lower case letters should be ignored while checking spelling.
boolean isCorrect(String word)
Return true if the given word is considered correct by the spell checker and false otherwise.
String correctWord(String word)
Regardless of whether the word is correct or not, provide an alternative word to replace it.
void correctDocument(Document doc)
Modify the provided document to replace all incorrect words with a correction provided by the spell checker. The Document class is provided in the project pack.

The different versions of the spell checker specialize these methods to tailor the class to a specific use case. However, all of the spell checkers share a considerable amount of functionality making the collection of classes a good candidate for an inheritance hierarchy.

The hierarchy is given below.

SpellChecker : Basic spell check which highlights incorrect words
|
+--AutomaticSC extends SpellChecker : Automatic check which corrects words based on edit distance
|
+--InteractiveSC extends SpellChecker : Interactive checker which prompts users for corrections
   |
   +--PersonalSC extends InteractiveSC : Interactive checker with a personalizable dictionary

Document : Editable document class
StringComparison : Contains editDistance(..) method for use in AutomaticSC

A few notes

  • The top of the hierarchy is the SpellChecker class which has various descendants such as AutomaticSC and PersonalSC.
  • The descendant classes will inherit the fields and methods of parent classes and in some cases override/specialize methods to operate differently from the parent class.
  • The Document and StringComparison classes are not part of the hierarchy of spell checkers. They are both provided in the project code pack.

Below is a brief overview of how each of the spell checkers behaves differently as they go about correcting documents. The demonstration is given via an interactive session in DrJava.

Welcome to DrJava.
> // Create a document with the provided Document class
> String content = "One potatoe, two tumatoes, three potatoes, four. I misunderestimated how many potatoes."
> Document doc = new Document(content);
> doc.toString()
One potatoe, two tumatoes, three potatoes, four. I misunderestimated how many potatoes.

> // Highlight misppeled words
> doc = new Document(content);
> SpellChecker sc = new SpellChecker("english-dict.txt",true);
> sc.correctDocument(doc)
> doc.toString()
One **potatoe**, two **tumatoes**, three potatoes, four. I **misunderestimated** how many potatoes.

> // Automatic spell correcting
> doc = new Document(content);
> AutomaticSC asc = new AutomaticSC("english-dict.txt",true);
> asc.correctDocument(doc)
> doc.toString()
One potato, two tomatoes, three potatoes, four. I underestimated how many potatoes.

> // Set up input / output classes 
> import java.util.*; import java.io.*;
> Scanner stdin = new Scanner(System.in);
> PrintWriter stdout = new PrintWriter(System.out,true);

> // Interactive spell checking
> doc = new Document(content);
> SpellChecker isc = new InteractiveSC("english-dict.txt",true,stdin,stdout);
> isc.correctDocument(doc)
@ MISSPELLING in: One **potatoe**, two tumatoes, three potatoes, four. I misunderestimated how many potatoes.
@- Correction for **potatoe**:
potato
@ Corrected to: potato
@ MISSPELLING in: One potato, two **tumatoes**, three potatoes, four. I misunderestimated how many potatoes.
@- Correction for **tumatoes**:
tomatoes
@ Corrected to: tomatoes
@ MISSPELLING in: One potato, two tomatoes, three potatoes, four. I **misunderestimated** how many potatoes.
@- Correction for **misunderestimated**:
misunderstood
@ Corrected to: misunderstood
> doc.toString()
One potato, two tomatoes, three potatoes, four. I misunderstood how many potatoes.

> // Use interactive with a personal dictionary
> doc = new Document(content);
> PersonalSC psc = new PersonalSC("english-dict.txt",true,stdin,stdout,"personal-dict.txt");
> psc.correctDocument(doc)
@ MISSPELLING in: One **potatoe**, two tumatoes, three potatoes, four. I misunderestimated how many potatoes.
@- **potatoe** not in dictionary add it? (yes / no)
yes
@ MISSPELLING in: One potatoe, two **tumatoes**, three potatoes, four. I misunderestimated how many potatoes.
@- **tumatoes** not in dictionary add it? (yes / no)
no
@- Correction for **tumatoes**:
tomatoes
@ Corrected to: tomatoes
@ MISSPELLING in: One potatoe, two tomatoes, three potatoes, four. I **misunderestimated** how many potatoes.
@- **misunderestimated** not in dictionary add it? (yes / no)
yes
> doc.toString()
One potatoe, two tomatoes, three potatoes, four. I misunderestimated how many potatoes.
> psc.getAllPersonalDictWords()
potatoe
misunderestimated

3 Project Files

Files that are "provided" are in the project code pack. Tests may be posted after the initial release of the project spec.

| File                  | State    | Notes                                                          |
|-----------------------+----------+----------------------------------------------------------------|
| SpellChecker.java     | create   | Basic spell check which highlights incorrect words             |
| AutomaticSC.java      | create   | Automatic check which corrects words based on edit distance    |
| InteractiveSC.java    | create   | Interactive checker which prompts users for corrections        |
| PersonalSC.java       | create   | Interactive checker with a personalizable dictionary           |
| Document.java         | provided | Editable document class                                        |
| StringComparison.java | provided | Contains editDistance(..) method for use in AutomaticSC        |
| english-dict.txt      | data     | Dictionary of 119095 English words in ASCII, one word per line |
| junit-cs211.jar       | provided | JUnit library for command line testing                         |
| ID.txt                | create   | Create in setup to identify yourself                           |
| Tests                 | tests    | Will be posted later                                           |

4 Setup and Submission

The submission procedure is identical to previous projects.

  • Keep all project files in a directory named after the pattern ckauffm2-205-p3
  • Include an ID.txt file with your identification details in it.
  • When finished, create a .zip file of your project directory
  • Verify that all your project files are in the zip
  • Submit your zipped project file to Blackboard under the appropriate project link.
  • You may submit as many times as desired; only the most recent submission will be graded.

5 Grading Breakdown

Grading for this project will be divided into two distinct parts: Automated tests, and Manual Inspection.

5.1 Automated Tests (50%)

  • JUnit test cases will be provided to detect errors in your code.
  • Tests may not be available on initial release but will be posted at a later time.
  • Tests may be expanded, changed, and corrected as the deadline approaches.
  • It is your responsibility to get and use the freshest set of tests available.
  • Tests will be provided in source form so that you will know what tests are doing and where you are failing.
  • It is up to you to run the tests to determine whether you are passing or not. If your code fails to compile against the tests, little to no credit will be garnered for this section.
  • Most of the credit will be divided evenly among the tests; e.g. 50% / 25 tests = 2% per test. However, the teaching staff reserves the right to adjust the weight of test cases after the fact if deemed necessary.
  • Code that does not compile and run tests according to the specified command line invocation may lose all automated testing credit. Graders will usually try to fix small compilation errors such as bad directory structures or improper use of packages. Such corrections typically result in a loss of 5-10% credit on automated testing. However, if more than a small amount of effort seems required to fix problems, no credit will be given.

5.2 Manual Inspection (50%)

  • Teaching staff (GTAs) will manually inspect your work looking for a specific set of features. They are generally listed throughout the document next to the relevant project features.
  • Credit will also be awarded/deducted based on adherence to good coding style, which includes:
    • Good indentation and curly brace placement (be consistent and follow a common convention)
    • Comments describing each field and method
    • Comments describing a complex section of code and invariants which must be maintained for classes
    • Use of internal private methods to decompose the problem beyond what is required in the spec, as needed
  • Some credit will be assigned for designing your program according to the given specification, for instance using a designated algorithm, structuring your program in a certain fashion, or utilizing a required programming element.

6 Provided Class: Document

This class is provided and does not need to be implemented. You will need to familiarize yourself with its methods, however, as a primary functionality of all spell checkers is to correct misspellings that appear in a document.

Document provides a simple way to convert text in a string into a streamable, semi-editable format. After constructing a Document, one can repeatedly ask for the next word via String nextWord() or ask whether any words remain via boolean hasNextWord(). These are similar in spirit to the methods of Scanner and allow one to process the document from beginning to end. Unlike Scanner, a Document has a method void replaceLastWord(String correction) which will alter the last word returned by nextWord() to be the provided correction. A Document can also be reset to its beginning with void rewind() for additional passes through it.
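
For comparison, the Scanner idiom that Document resembles can be sketched as follows. This is only an illustration using java.util.Scanner; the class name ScannerLoopDemo and the method tokens(..) are hypothetical and not part of the project code pack. Note that a Scanner splits on whitespace and so keeps punctuation attached to its tokens, whereas Document's nextWord() ignores punctuation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class, not part of the project code pack
class ScannerLoopDemo {
  // Collect whitespace-separated tokens using the hasNext()/next() loop
  static List<String> tokens(String text) {
    List<String> result = new ArrayList<>();
    Scanner in = new Scanner(text);
    while (in.hasNext()) {        // any tokens left?
      result.add(in.next());      // consume the next token
    }
    in.close();
    return result;
  }

  public static void main(String[] args) {
    // prints [They, misunderestimated, me.]
    System.out.println(tokens("They misunderestimated me."));
  }
}
```

The same "ask, then consume" shape applies to hasNextWord() and nextWord() in the demo session of the next section.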

The remaining sections provide a demonstration use of Document in a DrJava interactive loop and a summary of its public methods. You are free to examine the contents of Document.java and may learn a few new tricks but the class should not be altered to complete the project.

6.1 Demo Usage

Welcome to DrJava.
> // Create a document with the specified contents
> Document doc = new Document("They misunderestimated me.");
> doc.toString()                  // show contents
They misunderestimated me.
> doc.hasNextWord()               // any words left?
true
> String word;
> word = doc.nextWord()           // capture next word as a string
They
> doc.hasNextWord()               // any words left?
true
> word = doc.nextWord()           // capture next word as a string
misunderestimated
> doc.hasNextWord()               // and so on...
true
> word = doc.nextWord()           
me
> doc.hasNextWord()               // until no words are left
false
> word = doc.nextWord()           // at which point exceptions are raised
java.lang.RuntimeException: No words remain in the document
	at Document.nextWord(Document.java:90)
> 
> // Fresh document
> doc = new Document("They misunderestimated me. That is an incorrect analyzation of the situation.");
> 
> // Read 8 words
> for(int i=0; i<8; i++){ word = doc.nextWord(); }
> word                             // Last word read
analyzation
> doc.debugString()                // Show the doc with marks around the last word 
They misunderestimated me. That is an incorrect >>analyzation<< of the situation.
> doc.replaceLastWord("analysis")  // Replace a misspelling
> doc.debugString()                // Show the doc with marks around the last word
They misunderestimated me. That is an incorrect >>analysis<< of the situation.
> doc.toString()                   // Show contents
They misunderestimated me. That is an incorrect analysis of the situation.
> 
> doc.rewind()                     // back to the beginning of the document
> word = doc.nextWord()            // produces the first word
They
> doc.debugString()                // show first word marked
>>They<< misunderestimated me. That is an incorrect analysis of the situation.

6.2 Class Architecture

public class Document{
// Simple, editable document. Contents are initialized with a
// string. Allows scanning through the document by word with calls to
// nextWord() and hasNextWord() with rewind() resetting back to the
// beginning of the document.  Words can be replaced via calls to
// replaceLastWord(str). Display contents with toString() and
// debugString().

  public Document(String contents);
  // Construct a document with the given contents

  public Document(File file) throws Exception;
  // Construct a document with contents initialized from the given file

  public String toString();
  // Returns a string representation of the entire document

  public String debugString();
  // Returns a string representation of the document with the contents
  // modified to mark the word selected by nextWord()

  public String nextWord();
  // Return the next word in the document starting with the first
  // word. Ignores punctuation and numbers. Throws an exception if
  // there are no words remaining in the document.

  public boolean hasNextWord();
  // Return true if the document contains any more words so that a
  // call to nextWord() would succeed. Returns false if no words
  // remain in the document.  Punctuation and numbers do not count as
  // words.

  public void rewind();
  // Reset the internal position of the document so that a subsequent
  // call to nextWord() will return the first word in the document

  public void replaceLastWord(String correction);
  // Replace the last word returned by nextWord() with the given
  // correction.  Internal positioning will be adjusted so that
  // subsequent calls to nextWord() will move beyond the supplied
  // correction. Throws an exception if nextWord() has not been called
  // appropriately (ex: immediately after construction or after a call
  // to rewind())

  public String currentLine();
  // Return a string showing the line of the document which contains
  // the last word returned by nextWord().  Returns the first line of
  // the document if called after construction or a call to rewind().

}

7 Basic Spell Checker

The root of the spell checker class hierarchy is the class SpellChecker. Its purpose is simply to identify misspelled words and mark them with asterisks as in the following:

This is a **mispeled** word. So is **ths**.

As mentioned in the section on spell checker functionality, the SpellChecker class provides four basic methods to accomplish this task: constructor, isCorrect(word), correctWord(word), and correctDocument(doc). Additional support methods are also provided which are shown in the demo section below and outlined in the class architecture later.

7.1 Demo Usage

Below is a demonstration of several of the capabilities of the SpellChecker.

Welcome to DrJava.
> // Demonstrate spell checker capabilities

> // Construct a spell checker with the provided english dictionary ignoring case
> SpellChecker sc = new SpellChecker("english-dict.txt",true);

> sc.dictSize()                   // show # words in dictionary
119095
> sc.isCorrect("potatoes")        // is word in the dictionary
true
> sc.correctWord("potatoes")      // provide a correction for word
**potatoes**

> sc.isCorrect("potatoe")         // is word in the dictionary
false
> sc.correctWord("potatoe")       // provide a correction for word
**potatoe**
> sc.correctWord("tumato")        // provide a correction for word
**tumato**
> sc.correctWord("misunderestimated")
**misunderestimated**

> // Create a document
> String content = "One potatoe, two tumatoes, three potatoes, four. I misunderestimated how many potatoes."
> Document doc = new Document(content)

> // "Correct" misspellings in the document by highlighting them
> sc.correctDocument(doc)
> doc.toString()
One **potatoe**, two **tumatoes**, three potatoes, four. I **misunderestimated** how many potatoes.

> // Read all lines from a file using a static method
> String [] lines = SpellChecker.readAllLines("english-dict.txt");
> lines[0]
A
> lines[5]
aah
> lines.length
119095

7.2 Class Architecture

public class SpellChecker{
// A class to do spell checking. This version only marks misspelled words with
// asterisks as in **mispeling**.  It serves as a parent class for other spell
// checkers, which inherit its functionality and add features by overriding
// methods.

  protected String [] dictWords;      
  // Array of words considered correct by spell checker

  protected boolean ignoreCase;
  // If true, ignore case when checking the spelling of words; otherwise
  // capitalization differences will be counted as misspellings

  public static String [] readAllLines(String filename);
  // Utility which reads all lines from a file and returns them as an array of
  // strings. If problems are encountered during reading, return a string array
  // of length 0 (empty).  See implementation notes for discussion of how to
  // handle exceptions and use two-pass scanning to allocate an appropriately
  // sized array.

  public SpellChecker(String dictFilename, boolean ignoreCase);
  // Construct a spellchecker. dictFilename is the name of a file containing all
  // words that are considered correct, one on each line; english-dict.txt is
  // commonly used.  ignoreCase indicates whether case should be ignored or used
  // when checking for word correctness against dictionary words.

  public int dictSize();
  // Return the size of the dictionary used by this spellchecker which is the
  // number of words read from the dictionary file and stored in the
  // dictWords array.

  public boolean isCorrect(String word);
  // Return true if the provided word is considered correct by the spell checker
  // and false otherwise. A word is correct if it is equal to a word in the
  // dictWords array. It is also correct if case is being ignored and it is
  // equal ignoring case to some word in the dictWords array.

  public String correctWord(String word);
  // Create a correction for the given word.  Return the word surrounded by
  // asterisks which mark it as incorrect as in the word "misunderestimated"
  // should become "**misunderestimated**".  This method produces a correction
  // for the given word even if it is in the dictionary: it is to be used in
  // conjunction with isCorrect(word) to transform only words not in the
  // dictionary. That means correctWord("apple") returns "**apple**".

  public void correctDocument(Document doc);
  // From the beginning of the document, apply corrections to all words in the
  // document. Each misspelled word will be marked with asterisks according to
  // the correctWord() method.  Methods of Document such as nextWord(),
  // hasNextWord(), and replaceLastWord(w) are used to modify the provided
  // document.

}

7.3 Demo Main Method using SpellChecker

// Demonstration of various features of basic spell checkers
public class SpellCheckerMain{
  // Utility to print by typing less
  public static void print(Object s){
    System.out.println(s);
  }

  public static void main(String args[]){
    // Construct a spell checker which uses the english dictionary
    // provided and ignores case
    SpellChecker checker = new SpellChecker("english-dict.txt",true);
    print( checker.dictSize() );                 // 119095 - words in english-dict.txt
    print( checker.isCorrect("case") );          // true
    print( checker.correctWord("case") );        // "**case**" - always put on asterisks
    print( checker.isCorrect("analyzation") );   // false
    print( checker.correctWord("analyzation") ); // "**analyzation**"
    
    Document doc = new Document("They misunderestimated me.");
    print( doc.toString() ); // They misunderestimated me.

    checker.correctDocument(doc);
    print( doc.toString() ); // They **misunderestimated** me.
  }    
}

7.4 Implementation Notes

Reading all lines from a file

SpellChecker must read its dictionary from a file during construction so it makes sense for it to provide a static method to read the contents of a file that can also be used by its child classes. The method readAllLines(filename) extracts the contents of a file and returns them as an array of lines. This is useful for reading dictionary files like the provided english-dict.txt which simply has one correct word per line as in:

abalone
abalone's
abalones
abandon
abandoned
abandoning
abandonment
abandonment's
abandons
...

Completing readAllLines(filename) eases the task of initializing the spell checker as the method is used to read all correct words to populate the protected field dictWords.

To implement readAllLines(filename), the easiest set of classes to use are

  • File to indicate input will be read from a file
  • Scanner to read contents, one line at a time

Importantly, use the following strategy to efficiently read in all words. It is sometimes referred to as a "two-pass" approach.

  • First Pass
    • Open the scanner
    • Read lines from the scanner but ignore those lines
    • Read lines until no more exist, counting lines until the end of the file and close the scanner
  • Allocate an array sized to the number of lines in the file
  • Second Pass
    • Recreate the scanner which will start it back at the beginning of the file
    • Read the previously calculated number of lines from the scanner
    • Each time a line is read assign it to an element of the array
  • Return the array
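
The steps above can be sketched roughly as follows. This is a minimal illustration rather than the required implementation: the class name TwoPassReader and method name readLines(..) are hypothetical (the project places this logic in SpellChecker.readAllLines(..)), and the choice to return an empty array on failure follows the class architecture comment for readAllLines(..).

```java
import java.io.File;
import java.util.Scanner;

// Hypothetical helper class; names are illustrative only
class TwoPassReader {
  // Return all lines of the named file, or an empty array on failure
  static String[] readLines(String filename) {
    // First pass: count lines without storing them
    Scanner counter;
    try {
      counter = new Scanner(new File(filename));
    } catch (Exception e) {
      return new String[0];
    }
    int count = 0;
    while (counter.hasNextLine()) {
      counter.nextLine();
      count++;
    }
    counter.close();

    // Single allocation sized to the number of lines
    String[] lines = new String[count];

    // Second pass: a fresh scanner starts again at the top of the file
    Scanner reader;
    try {
      reader = new Scanner(new File(filename));
    } catch (Exception e) {
      return new String[0];
    }
    for (int i = 0; i < count; i++) {
      lines[i] = reader.nextLine();
    }
    reader.close();
    return lines;
  }
}
```

Counting first means the array is allocated exactly once, at the cost of reading the file twice.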

Catching Exceptions while Creating Scanners

Creating a scanner from a file can go wrong: the file might not exist or might not be readable with the permissions available to the program. An exception will result from this which must then either (1) be handled or (2) be acknowledged in the method signature as a possible outcome of running the method. readAllLines() takes approach (1). Handling exceptions will be covered later in the course but for now, the following basic code pattern suffices to illustrate how this works in the situation at hand.

Scanner input;
try{
  input = new Scanner(...);     // this could go wrong
}
catch(Exception e){             // when something goes wrong
  return ...;                   // return some default value
}

// Nothing went wrong so start using Scanner input
...

This pattern may appear in a couple places in readAllLines(..).

Spell checking ignoring case and accounting for it

Spell checkers should honor the ignoreCase option handed to them when they are created. To demonstrate, below are two spell checkers created with different values for ignoreCase.

> // ignoreCase is false
> SpellChecker useCase = new SpellChecker("english-dict.txt",false);
> useCase.isCorrect("mellifluous")
true
> useCase.isCorrect("Mellifluous")
false
> useCase.isCorrect("MELLIFLUOUS")
false

> // ignoreCase is true
> SpellChecker ignoreCase = new SpellChecker("english-dict.txt",true);
> ignoreCase.isCorrect("mellifluous")
true
> ignoreCase.isCorrect("Mellifluous")
true
> ignoreCase.isCorrect("MELLIFLUOUS")
true

This complicates the isCorrect(word) method somewhat. The String class contains a method that compares two strings for equality considering case and another that compares them ignoring case; both should be located and employed.
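
As a small sketch of the two comparisons, the standard String methods equals(..) and equalsIgnoreCase(..) can be selected with a boolean flag. The class CaseCompareDemo and the method sameWord(..) are hypothetical names used only for illustration; how this fits into isCorrect(word) and its search of dictWords is left to the implementation.

```java
// Hypothetical demo class showing the two String comparisons
class CaseCompareDemo {
  // Compare a candidate against a dictionary entry, optionally ignoring case
  static boolean sameWord(String candidate, String entry, boolean ignoreCase) {
    if (ignoreCase) {
      return candidate.equalsIgnoreCase(entry);
    }
    return candidate.equals(entry);
  }

  public static void main(String[] args) {
    System.out.println(sameWord("Mellifluous", "mellifluous", false)); // false
    System.out.println(sameWord("Mellifluous", "mellifluous", true));  // true
  }
}
```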

Correcting Documents

The correctDocument(doc) method is meant to scan through an entire document to correct all misspelled words. In the case of the SpellChecker, this correction is simply to identify misspelled words with asterisks.

It is tempting to do this manually by replacing words with explicitly constructed asterisked strings in a call such as

doc.replaceLastWord("**"+incorrectWord+"**");

and while it will get the job done for the moment, it is somewhat short-sighted for the following reasons.

  1. The class already knows how to produce a corrected version of the word by invoking its correctWord(..) method so this is a good chance to make use of existing code. If later the misspelled word markers are changed to !!mispeled!!, there is only one place in code that needs alteration.
  2. By making use of the correctWord(..) method, one opens the possibility for child classes to adjust that method's behavior by overriding it and have the effect seen wherever correctWord(..) is used in parent methods. This is exactly the tack that will be taken by child class AutomaticSC.
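
Both points can be seen in a small sketch. The classes StarMarker and BangMarker below are hypothetical stand-ins rather than project classes: the parent's markAll(..) never builds a marked string itself but delegates to mark(..), so a child that overrides mark(..) changes the behavior of the inherited markAll(..) without rewriting it.

```java
// Hypothetical classes; names are illustrative only
class StarMarker {
  // Mark a single word
  String mark(String word) {
    return "**" + word + "**";
  }

  // Mark every word by delegating to mark(..) rather than building
  // the marked string directly
  String markAll(String[] words) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < words.length; i++) {
      if (i > 0) {
        sb.append(' ');
      }
      sb.append(mark(words[i]));
    }
    return sb.toString();
  }
}

class BangMarker extends StarMarker {
  // Only the single-word behavior is redefined; markAll(..) is inherited
  @Override
  String mark(String word) {
    return "!!" + word + "!!";
  }
}
```

Calling markAll(..) on a BangMarker runs the parent's loop but the child's mark(..), which is the same relationship intended between correctDocument(doc) and correctWord(word).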

7.5 (20%) Manual Inspection Criteria for SpellChecker

  • The readAllLines(filename) method uses an efficient input strategy such as the one suggested in the spec to avoid repeatedly re-allocating larger arrays while reading input. Only a single array allocation should be required.
  • Exception handling is incorporated into readAllLines(filename) to return an empty array if no file is found.
  • Effective use of String methods is made to honor the ignoreCase option during isCorrect(word) calls.
  • The implementation of correctDocument(doc) makes effective use of the public methods of the Document class to replace misspelled words with the results of a call to correctWord(word) which will be important for descendant classes.
  • Standard arrays are employed for the dictWords rather than other more advanced data structures such as ArrayList
  • The Document class is NOT used in readAllLines(..); a Scanner is instead used to read from the file
  • No instances of Document are used in the SpellChecker except during the correctDocument(doc) method
  • The SpellChecker has only the two fields specified in the class architecture:
    protected String [] dictWords;      
    protected boolean ignoreCase;
    

8 Provided Class: StringComparison

StringComparison houses a single method, editDistance(x,y) which is used to measure the "distance" between two strings. The method employs an interesting technique often called dynamic programming which constructs a table of values to efficiently compute the answer to a set of recurrence relations describing a problem, in this case how many operations are required to transform one string into another. Edit distance is also referred to as Levenshtein Distance after the first researcher to publish on the problem.

As with the Document class, you may use the methods of StringComparison freely without modification. It is not essential that you know how they work, only that you know how to put them to use.
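
For readers curious about the table-based technique, a minimal sketch of the full-matrix Levenshtein computation follows. It is an independent illustration under the usual unit-cost assumptions (each insertion, deletion, or substitution costs 1); the class name EditDistanceSketch is hypothetical and the provided StringComparison.editDistance(..) may differ in its details.

```java
// Hypothetical sketch; not the provided StringComparison class
class EditDistanceSketch {
  static int distance(String x, String y) {
    // d[i][j] = edits needed to turn the first i chars of x
    // into the first j chars of y
    int[][] d = new int[x.length() + 1][y.length() + 1];
    for (int i = 0; i <= x.length(); i++) {
      d[i][0] = i;                       // delete i characters
    }
    for (int j = 0; j <= y.length(); j++) {
      d[0][j] = j;                       // insert j characters
    }
    for (int i = 1; i <= x.length(); i++) {
      for (int j = 1; j <= y.length(); j++) {
        int cost = (x.charAt(i - 1) == y.charAt(j - 1)) ? 0 : 1;
        int deletion = d[i - 1][j] + 1;
        int insertion = d[i][j - 1] + 1;
        int substitution = d[i - 1][j - 1] + cost;
        d[i][j] = Math.min(Math.min(deletion, insertion), substitution);
      }
    }
    return d[x.length()][y.length()];
  }

  public static void main(String[] args) {
    System.out.println(distance("potatoe", "potato")); // 1
    System.out.println(distance("kitten", "sitting")); // 3
  }
}
```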

8.1 Class Architecture

public class StringComparison {
// Class which contains some utility methods to compare strings

  public static int editDistance(String x, String y);
  // Compute the edit distance (Levenshtein Distance) between strings x
  // and y; returns a non-negative number indicating the minimum number
  // of character insertions, deletions, or substitutions required to
  // transform x into y.  Smaller numbers mean x and y are "closer" to
  // each other.
  // Uses dynamic programming to solve this task as per the algorithm
  // at
  // https://en.wikipedia.org/wiki/Levenshtein_distance#Iterative_with_full_matrix.
  // 
  // Possible to optimize the performance of this using the two-row
  // approach or a global matrix though both would introduce
  // complications.

}

9 Automatic Spell Checking

While highlighting spelling errors is nice, automatic spelling correction is generally considered a very useful feature of many computing systems (though it is not without its own set of pitfalls). The AutomaticSC provides a way to automatically correct spelling without the need for user interaction.

A simple means of doing automatic spell correction is to search the dictionary for the "closest" word to one not in the dictionary. Closeness here requires a distance measure that is provided by the StringComparison.editDistance(x,y) method.
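
The search for a closest word can be sketched as a single pass that remembers the best candidate so far. In this hedged illustration the distance measure is passed in as a parameter so the example stands alone; ClosestWordSketch and closest(..) are hypothetical names, and the strict less-than comparison is one way to favor the earlier dictionary word on ties, as the AutomaticSC class architecture requires.

```java
import java.util.function.ToIntBiFunction;

// Hypothetical helper; the distance measure is supplied by the caller
class ClosestWordSketch {
  // Return the entry of dict with the smallest distance to word, or
  // null if dict is empty. Strict < keeps the earliest entry on ties.
  static String closest(String word, String[] dict,
                        ToIntBiFunction<String, String> dist) {
    String best = null;
    int bestDist = Integer.MAX_VALUE;
    for (String entry : dict) {
      int d = dist.applyAsInt(word, entry);
      if (d < bestDist) {
        bestDist = d;
        best = entry;
      }
    }
    return best;
  }
}
```

In the project itself the measure would be StringComparison.editDistance(x,y) applied to the words in dictWords, and the loop could call it directly instead of taking a parameter.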

9.1 Demo Usage

Welcome to DrJava.
> // Demonstrate automatic spell checker capabilities

> // Construct an automatic spell checker with the provided english dictionary ignoring case
> AutomaticSC asc = new AutomaticSC("english-dict.txt",true);

> asc.dictSize()                   // show # words in dictionary
119095
> asc.isCorrect("potatoes")        // is word in the dictionary
true
> asc.correctWord("potatoes")      // provide a correction for word
potatoes

> asc.isCorrect("potatoe")         // is word in the dictionary
false
> asc.correctWord("potatoe")       // provide a correction for word
potato
> asc.correctWord("tumato")        // provide a correction for word
tomato
> asc.correctWord("misunderestimated")
underestimated

> // Create a document
> String content = "One potatoe, two tumatoes, three potatoes, four. I misunderestimated how many potatoes."
> Document doc = new Document(content)

> // Correct misspellings in the document by replacing with closest dictionary word
> asc.correctDocument(doc)
> doc.toString()
One potato, two tomatoes, three potatoes, four. I underestimated how many potatoes.

9.2 Class Architecture

public class AutomaticSC extends SpellChecker{
// A spell checker which automatically selects a correction for a
// misspelled word.  It inherits most functionality from its parent
// class but adjusts how correctWord(..) performs.

  public AutomaticSC(String dictFilename, boolean ignoreCase);
  // Construct an automatic spell checker. Pass the parameters to the
  // parent class constructor.

  @Override
  public String correctWord(String word);
  // Return a correction for the given word. The correction is the
  // word in the dictionary which has the smallest edit distance from
  // the given word. If there are ties, favor whichever word appears
  // earlier in the dictionary. Make use of the methods of the
  // provided StringComparison to find the closest word in the
  // dictionary. Make sure to honor the ignoreCase option which may
  // lead you to convert words to all upper or lower case.

  public static String matchCase(String model, String source);
  // Utility method to handle case matching between words. Check if
  // parameter model is all caps or only the first character is
  // capitalized and transform source to match the capitalization. In
  // the event that the model is neither all caps nor capitalized
  // followed by all lower case, return the source string as
  // is. Examples are given below.
  // 
  // | Situation   | model  | source | return |
  // |-------------+--------+--------+--------|
  // | All Caps    | BANANA | apple  | APPLE  |
  // | All Caps    | PEAR   | orange | ORANGE |
  // | Capitalized | Banana | orange | Orange |
  // | Capitalized | Apple  | pear   | Pear   |
  // | Neither     | banana | apple  | apple  |
  // | Neither     | banana | Apple  | Apple  |
  // | Neither     | BaNaNa | aPPle  | aPPle  |
  // | Neither     | peaR   | Orange | Orange |

}

9.3 Implementation Notes

Exploiting Inheritance

As most of the behavior of the automatic spell checker is identical to its parent, very little code needs to be written. Note in the class overview that only a constructor and two methods are present.

  • A class must always provide its own constructors. However, the AutomaticSC class requires exactly the same initialization as SpellChecker. Employing the parent constructor super(..) here will make the AutomaticSC constructor short and sweet.
  • AutomaticSC behaves identically to SpellChecker on all methods except correctWord(word). Thus, this is the method that needs a new definition as indicated by the @Override annotation. This means AutomaticSC will be a short class: so long as methods in SpellChecker are written correctly, they can be inherited and used without modification and so require no code in AutomaticSC.
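
As a rough illustration of the two points above, the sketch below uses hypothetical stand-in classes (BaseChecker and ShortChecker are not part of the project) to show a child constructor that defers to super(..) and a single overridden method. The fields and method bodies are placeholders and do not reflect the required SpellChecker behavior.

```java
// Hypothetical stand-in for a parent class; NOT the project's SpellChecker.
class BaseChecker {
  protected String dictFilename;
  protected boolean ignoreCase;

  public BaseChecker(String dictFilename, boolean ignoreCase) {
    this.dictFilename = dictFilename;
    this.ignoreCase = ignoreCase;
  }

  // Inherited unchanged by the child class below.
  public String describe() {
    return dictFilename + " ignoreCase=" + ignoreCase;
  }

  // Placeholder behavior that the child replaces.
  public String correctWord(String word) {
    return word;
  }
}

// Hypothetical child: a one-line constructor plus a single override.
class ShortChecker extends BaseChecker {
  public ShortChecker(String dictFilename, boolean ignoreCase) {
    super(dictFilename, ignoreCase);   // parent performs all initialization
  }

  @Override
  public String correctWord(String word) {
    return "[" + word + "]";           // placeholder for the new behavior
  }
}
```

Calling describe() on a ShortChecker runs the parent's code even though the child never mentions it, while correctWord(..) dispatches to the child's version.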

Finding the Closest Word while Ignoring/Accounting for Case

Like its parent class, AutomaticSC is initialized to either ignore case or account for it. Handling this in the correctWord(word) method is somewhat tricky and requires some consideration. The most common situation is to ignore case so that edit distance is computed between two completely lower-case words. However, this leads to potential problems with capitalization for misspelled words if care is not taken. Examine the corrections below carefully, particularly the first section which ignores case.

Welcome to DrJava.
> // Ignore case is true
> AutomaticSC asc = new AutomaticSC("english-dict.txt",true);
> asc.correctWord("inhairytense")
inheritance

> // Match capitalization in result
> asc.correctWord("Inhairytense")
Inheritance

> // Match all caps in result
> asc.correctWord("INHAIRYTENSE")
INHERITANCE

> // Don't match weird case mixtures
> asc.correctWord("InhairyTENSE")
inheritance
> asc.correctWord("InHAIRYTENSE")
inheritance

> // Accounting for case leads to interesting results in edit distance
> AutomaticSC asc = new AutomaticSC("english-dict.txt",false);
> asc.correctWord("inhairytense")
inheritance
> asc.correctWord("Inhairytense")
carotene
> asc.correctWord("INHAIRYTENSE")
NYSE
> asc.correctWord("InHAIRYTENSE")
AIDS
> asc.correctWord("InhairyTENSE")
hairy

To ease the task of matching case, define a public helper method

public static String matchCase(String model, String source)

The intent of this method is to help honor a few common letter case patterns: capitalized words and words in all caps. The model parameter should be checked for being one of

  • All capital letters
  • One capital letter followed by all lower case
  • Neither of the above

The source word should be transformed to match the pattern established by model and returned. Below are examples of the different situations along with examples of parameters and expected return value.

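| Situation   | model  | source | return |
|-------------+--------+--------+--------|
| All Caps    | BANANA | apple  | APPLE  |
| All Caps    | PEAR   | orange | ORANGE |
| Capitalized | Banana | orange | Orange |
| Capitalized | Apple  | pear   | Pear   |
| Neither     | banana | apple  | apple  |
| Neither     | banana | Apple  | Apple  |
| Neither     | BaNaNa | aPPle  | aPPle  |
| Neither     | peaR   | Orange | Orange |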
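
One possible way to perform the three-way case analysis is sketched below. It is only a sketch under stated assumptions: the class name MatchCaseSketch is hypothetical, empty strings are returned unchanged, and in the capitalized case the remainder of source is lower-cased (the examples above only show lower-case sources, so that choice is a guess).

```java
// Hypothetical holder class; in the project this logic would live in AutomaticSC.
class MatchCaseSketch {
  public static String matchCase(String model, String source) {
    // Assumption: nothing sensible can be done with empty strings.
    if (model.isEmpty() || source.isEmpty()) {
      return source;
    }
    String upper = model.toUpperCase();
    String lower = model.toLowerCase();

    // Case 1: model is all caps. Note that a model with no letters at all
    // (e.g. "123") also lands here because upper-casing leaves it unchanged.
    if (model.equals(upper)) {
      return source.toUpperCase();
    }

    // Case 2: one capital letter followed by all lower case.
    String capitalized = upper.substring(0, 1) + lower.substring(1);
    if (model.equals(capitalized)) {
      return source.substring(0, 1).toUpperCase()
           + source.substring(1).toLowerCase();
    }

    // Case 3: neither pattern, so return the source as is.
    return source;
  }
}
```

Comparing the model against transformed copies of itself avoids a character-by-character loop, at the cost of building a few temporary strings.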

The overarching correctWord(word) behavior for AutomaticSC is the following.

  • If ignoreCase is false, use the parameter word and dictionary words as is with editDistance(..). Return the closest word and don't use matchCase(..)
  • If ignoreCase is true, convert the parameter word and dictionary words to a uniform case (upper or lower will work) before passing them into editDistance(..). The closest word returned should then be run through matchCase(..) to produce the results.
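
The search itself might look like the sketch below. It is not the required implementation: the class ClosestWordSketch and its closest(..) method are hypothetical, the dictionary is passed in as a plain array, and editDistance(..) here is a textbook Levenshtein distance standing in for the provided StringComparison method, whose exact name and scoring may differ. The sketch returns the dictionary word as stored; when ignoreCase is true the caller would still pass that result through matchCase(..).

```java
// Hypothetical sketch of the closest-word search; not the project's API.
class ClosestWordSketch {

  // Stand-in for the provided edit distance (assumption: plain Levenshtein).
  static int editDistance(String a, String b) {
    int[] prev = new int[b.length() + 1];
    for (int j = 0; j <= b.length(); j++) {
      prev[j] = j;
    }
    for (int i = 1; i <= a.length(); i++) {
      int[] cur = new int[b.length() + 1];
      cur[0] = i;
      for (int j = 1; j <= b.length(); j++) {
        int substitute = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
        int delete = prev[j] + 1;
        int insert = cur[j - 1] + 1;
        cur[j] = Math.min(substitute, Math.min(delete, insert));
      }
      prev = cur;
    }
    return prev[b.length()];
  }

  // Returns the dictionary word closest to word, or null for an empty dictionary.
  static String closest(String word, String[] dict, boolean ignoreCase) {
    String target = ignoreCase ? word.toLowerCase() : word;
    String best = null;
    int bestDist = Integer.MAX_VALUE;
    for (String entry : dict) {
      String candidate = ignoreCase ? entry.toLowerCase() : entry;
      int dist = editDistance(target, candidate);
      if (dist < bestDist) {   // strict < keeps the earlier word on ties
        bestDist = dist;
        best = entry;
      }
    }
    return best;
  }
}
```

Using a strict less-than comparison is what makes ties resolve in favor of the word that appears earlier in the dictionary.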

9.4 (20%) Manual Inspection Criteria for AutomaticSC   grading

  • The constructor is very short by employing the initialization that is performed in the parent class through use of super(..)
  • Methods are not replicated from the parent class. Only the required correctWord(word) method overrides parent behavior while other methods are inherited by leaving them unspecified.
  • The matchCase(..) method does clean case analysis to determine if the model parameter is all caps, capitalized, or neither and simple transformations to the source to cause it to match.
  • correctWord(word) clearly incorporates the ignoreCase field, makes use of matchCase(..), and is specified in a clean and readable fashion.

10 Interactive Spell Checking

Most interactive document editors (MS Word, Open Office, Google Docs) provide an interactive spell checker which searches the document from beginning to end, presenting the user with misspelled words and prompting for corrections. This is the purpose of the InteractiveSC class which extends SpellChecker.

  • isCorrect(word) functions identically to SpellChecker.
  • Calling correctWord(word) will prompt the user for a correction (even if the word is correct) which will then be returned.
  • correctDocument(doc) will identify misspelled words, show the line on which they appear using the doc.currentLine() method, but also "highlight" the word with asterisks to show it is incorrect, then prompt for a correction via correctWord(word)

Since correctWord(word) and correctDocument(doc) behave differently from the parent methods, they should be overridden.
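
The asterisk highlighting can be done with simple substring operations once the position of the word within the line is known. The sketch below assumes a hypothetical helper highlight(line, start, length) in which start is the index of the word's first character; how that index is obtained from Document is not shown here and depends on the Document class.

```java
// Hypothetical helper; the name and parameters are illustrative assumptions.
class HighlightSketch {
  // Surround the length characters beginning at index start with "**".
  static String highlight(String line, int start, int length) {
    int end = start + length;
    return line.substring(0, start)
         + "**" + line.substring(start, end) + "**"
         + line.substring(end);
  }
}
```

For instance, highlight("One potatoe, two potatoe", 4, 7) marks only the first occurrence, which a plain String.replace(..) on the word would not do.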

Since a user will be communicating with instances of the InteractiveSC class, the constructor for the class takes two additional parameters aside from the dictionary file and whether to ignore case.

  • A Scanner which will provide input to the spell checker. The provided scanner should be used as given, not re-initialized. The source for the scanner may be System.in which will read from what the user types or it may be from a string source to facilitate testing. Whenever the spell checker requires input such as for a word correction, read it from the scanner provided in the constructor.
  • A PrintWriter which allows output to be created by the spell checker. Whenever the spell checker wishes to print a prompt, it should employ the PrintWriter methods such as println(..) and printf(..) to do so. It is tempting to use System.out.println(..) for all output but there are cases in which output should be re-directed so it doesn't appear immediately on the screen such as while running tests. Use of the PrintWriter for output allows that to happen here.

Both these arguments are stored in fields of the class mentioned in its architecture.
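
To see why passing in the Scanner and PrintWriter helps with testing, consider the sketch below. The class ScriptedIoSketch and its promptForWord(..) method are hypothetical and only mimic the prompt formats used in this spec. Because the method reads from whatever Scanner it is handed and writes to whatever PrintWriter it is handed, a test can supply a Scanner over a fixed string and a PrintWriter over a StringWriter, then inspect the captured output.

```java
import java.io.PrintWriter;
import java.util.Scanner;

// Hypothetical sketch; not the InteractiveSC class itself.
class ScriptedIoSketch {
  // Print a prompt, read one word from input, echo it, and return it.
  static String promptForWord(String word, Scanner input, PrintWriter output) {
    output.printf("@- Correction for **%s**:\n", word);
    String correction = input.next();            // next whitespace-delimited token
    output.printf("@ Corrected to: %s\n", correction);
    output.flush();                              // harmless if autoflush is on
    return correction;
  }
}
```

In interactive use the same method would be called with new Scanner(System.in) and new PrintWriter(System.out,true); in a test, new Scanner("potato") and a PrintWriter wrapped around a java.io.StringWriter play the same roles without touching the screen or keyboard.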

10.1 Demo Usage

Note carefully how the Scanner stdin and PrintWriter stdout are initialized and provided as arguments to the InteractiveSC constructor to allow the user to directly interact with the spell checker.

Also note the format of information associated with the interactive spell checker: it is preceded by the @ symbol to distinguish it from other prompts.

Welcome to DrJava.
> import java.util.*;
> import java.io.*;

> // Make initialization easy
> Scanner stdin = new Scanner(System.in);
> PrintWriter stdout = new PrintWriter(System.out,true);

> // Create an interactive spell checker using english-dict.txt
> InteractiveSC sc = new InteractiveSC("english-dict.txt",true,stdin,stdout);
> sc.isCorrect("dork")         // in the english dictionary
true


> String s;
> sc.isCorrect("dorkus")          // not in the english dictionary
false
> s = sc.correctWord("dorkus");   // prompt for a correction
@- Correction for **dorkus**:
dork
@ Corrected to: dork
> s
dork
> sc.isCorrect("cheese")          // in english dictionary
true
> sc.isCorrect("cheeze")          // not in the english dictionary
false
> s = sc.correctWord("cheeze");   // prompt for a correction
@- Correction for **cheeze**:
cheese
@ Corrected to: cheese
> s
cheese

> // Correct a whole document interactively
> Document doc = new Document("One potatoe, two potatoe, three potatoe, four.");
> sc.correctDocument(doc);
@ MISSPELLING in: One **potatoe**, two potatoe, three potatoe, four.
@- Correction for **potatoe**:
potato
@ Corrected to: potato
@ MISSPELLING in: One potato, two **potatoe**, three potatoe, four.
@- Correction for **potatoe**:
potatoes
@ Corrected to: potatoes
@ MISSPELLING in: One potato, two potatoes, three **potatoe**, four.
@- Correction for **potatoe**:
potatoes
@ Corrected to: potatoes
> doc.toString()
One potato, two potatoes, three potatoes, four.

10.2 Class Architecture

public class InteractiveSC extends SpellChecker{
// A spell checker which interactively prompts users for spelling
// corrections.  It inherits much of its functionality from
// SpellChecker but the behavior of correctWord(w) and
// correctDocument(d) is modified from the parent version.

  protected Scanner input;
  // Scanner to read input from a user. The scanner should be provided
  // in the constructor and should not be created. It may be connected
  // to System.in for true interactive use or may be fixed input
  // from a string used for testing.

  protected PrintWriter output;
  // PrintWriter used to write output for a user. It should be
  // provided in the constructor and should not be created. It may be
  // connected to System.out to write to the screen or may write to a
  // temporary buffer during tests.

  public InteractiveSC(String dictFilename, boolean ignoreCase, Scanner input, PrintWriter output);
  // Constructor for the interactive spell checker.  Arguments
  // dictFilename and ignoreCase should be used to invoke the super
  // class constructor.  The input and output parameters should be
  // stored in the associated fields of this class.

  @Override
  public String correctWord(String word);
  // Prompt the user for a correction using a prompt with the format:
  //
  // @- Correction for **potatoe**:
  //
  // where "potatoe" is replaced with the misspelled word.  Read input
  // from the user and return the provided correction.  Before
  // returning, print the correction in a message formatted:
  //
  // @ Corrected to: potato
  // 
  // Note that this method overrides the version of correctWord(w)
  // from the parent class. Like the parent version, it will produce
  // corrections irrespective of whether the given word is in the
  // dictionary.

  @Override
  public void correctDocument(Document doc);
  // Starting from the beginning of the document, apply corrections to
  // all misspelled words. When a misspelled word is found, print a
  // message and line of the document with the misspelled word
  // highlighted as in
  //
  // @ MISSPELLING in: One **potatoe**, two potatoe, three potatoe, four.
  //
  // Then prompt the user for a correction as in
  //
  // @- Correction for **potatoe**:
  // 
  // Print newlines at the end of both messages.  Printing to the
  // screen should use the output PrintWriter tracked by the
  // interactive spell checker.  Reading input should use the input
  // Scanner tracked by the spell checker.  Note that this method
  // overrides the version of correctDocument(d) from the parent
  // class.

}

10.3 Implementation Notes

Prompts

The convention for prompts in the InteractiveSC is as follows.

@ This is information and requires no input

@- This is a prompt and input should be entered on the next line
userInput

Both types of prompts start with the @ symbol but prompts which require user responses start with @-.

Initialization of Input and Output

From the demo above the following initialization

Scanner stdin = new Scanner(System.in);
PrintWriter stdout = new PrintWriter(System.out,true);
InteractiveSC sc = new InteractiveSC("english-dict.txt",true,stdin,stdout);

is a good way to get an InteractiveSC up and running. The extra parameter to the PrintWriter constructor ensures that every print flushes output to the screen so that it appears immediately. Without this option, one may need to manually flush output via

stdout.flush();

in order to see anything printed to the screen.
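
The effect of the autoflush argument can be observed without a screen by pointing the PrintWriter at an in-memory stream. The sketch below is only a demonstration; the class FlushSketch and its method are hypothetical names. It counts how many bytes have actually reached the underlying stream immediately after one println(..) call.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintWriter;

// Hypothetical demonstration class; not part of the project.
class FlushSketch {
  // Number of bytes visible in the underlying stream right after a println.
  static int bytesVisibleAfterPrintln(boolean autoFlush) {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    PrintWriter out = new PrintWriter(sink, autoFlush);
    out.println("@ hello");
    return sink.size();   // no explicit flush() has been called yet
  }
}
```

With autoFlush set to false the text is still sitting in the writer's internal buffer, so the count is zero until flush() or close() is called; with true, println(..) pushes it through right away.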

Input Reading

It is assumed that corrections for words will always be another single word. While this is not too general, it suits the needs of the moment without complicating input for the Scanner. Use the next() method of Scanner to read corrected words.

10.4 (5%) Manual Inspection Criteria for InteractiveSC   grading

  • Only the required methods are overridden from the parent class; isCorrect(word) does not require modification
  • Only the required new fields for the InteractiveSC are defined; the dictionary and treatment of case fields are inherited and do not need to be redefined.

11 Personal Dictionary Checker

Most spell check systems have a system dictionary which is not changed (english-dict.txt in our case) but also allow users to define a personal dictionary with words that they consider correct. This is important for many domains to accommodate technical terms specific to the style of writing such as "sith," "kyber," and "lightsaber."

To facilitate this functionality, the PersonalSC extends the InteractiveSC by allowing a personal dictionary of words to be used. This is a file similar to english-dict.txt except that it may be altered by the user, either by editing it directly or through the methods of the PersonalSC class.

Most methods for PersonalSC are identical to InteractiveSC with the exception of correctWord(word) which operates as follows.

  • If the word is considered correct, act exactly as the parent method prompting for a potential correction
  • Otherwise, prompt the user for a yes/no answer on whether the word should be added to the personal dictionary
  • If no, act exactly as the parent version of correctWord(word) does
  • If yes, expand the array associated with the personal dictionary and add the new word on. Return the word unaltered
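
Expanding a plain array by one slot can be done with java.util.Arrays.copyOf(..). The sketch below shows one way to do it; the class ArrayAppendSketch and method append(..) are hypothetical names, and a real implementation would assign the returned array back to the personalDictWords field.

```java
import java.util.Arrays;

// Hypothetical helper; illustrates appending to a fixed-size array.
class ArrayAppendSketch {
  // Return a new array one element longer than words with word at the end.
  static String[] append(String[] words, String word) {
    String[] bigger = Arrays.copyOf(words, words.length + 1);  // copies old contents
    bigger[words.length] = word;                               // fill the new last slot
    return bigger;
  }
}
```

Since arrays cannot grow in place, the original array is left untouched and the caller must keep the returned one.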

PersonalSC instances can be asked about the size and contents of their personal dictionary and ultimately have their contents written back to the file from which they were read via the savePersonalDict() method.

11.1 Demo Usage

Welcome to DrJava.
> import java.util.*;
> import java.io.*;

> // Make initialization easy
> Scanner stdin = new Scanner(System.in);
> PrintWriter stdout = new PrintWriter(System.out,true);

> // Create a spell checker with a personal dictionary from personal-dict.txt
> PersonalSC sc = new PersonalSC("english-dict.txt",true,stdin,stdout,"personal-dict.txt");
> String s;
> sc.isCorrect("dork")            // in the english dictionary
true
> sc.isCorrect("dorkus")          // not in the english or personal dictionary
false
> s = sc.correctWord("dorkus");   // prompt for adding during correction
@- **dorkus** not in dictionary add it? (yes / no)
yes
> s
dorkus
> sc.isCorrect("dorkus")          // new word in personal dictionary
true
> sc.getAllPersonalDictWords()    // show personal dictionary words
dorkus

> sc.isCorrect("cheese")          // in english dictionary
true
> sc.isCorrect("cheeze")          // not in english or personal dictionary
false
> s = sc.correctWord("cheeze");   // prompt for adding during correction
@- **cheeze** not in dictionary add it? (yes / no)
yes
> s
cheeze
> sc.isCorrect("cheeze")          // new word in personal dictionary
true
> sc.getAllPersonalDictWords()    // show personal dictionary words
dorkus
cheeze

> sc.savePersonalDict()           // save personal dictionary words to file personal-dict.txt
@ Personal dictionary written to file personal-dict.txt

> // Create a new spell checker which is initialized with the personal dictionary
> PersonalSC sc2 = new PersonalSC("english-dict.txt",true,stdin,stdout,"personal-dict.txt");
> sc2.getAllPersonalDictWords()   // show personal dictionary words
dorkus
cheeze

> sc2.isCorrect("dorkus")         // already in personal dictionary
true
> sc2.isCorrect("cheeze")         // already in personal dictionary
true

> // Spell checkers are independent of one another
> s = sc.correctWord("potatoe");
@- **potatoe** not in dictionary add it? (yes / no)
yes
> s
potatoe
> sc.getAllPersonalDictWords()    // new word in personal dict
dorkus
cheeze
potatoe

> sc2.getAllPersonalDictWords()   // same personal dict as before
dorkus
cheeze


> // Adding to the personal dictionary means later words in a doc may be correct
> Document doc = new Document("One potatoe, two potatoe, three potatoe, four.");
> sc.correctDocument(doc);
> doc.toString()
One potatoe, two potatoe, three potatoe, four.
> sc.getAllPersonalDictWords()
dorkus
cheeze
potatoe


> // Correct first misspellings but accept the remainder
> Document doc = new Document("One tumato, two tumato, three tumato, four.")
> sc.correctDocument(doc);
@ MISSPELLING in: One **tumato**, two tumato, three tumato, four.
@- **tumato** not in dictionary add it? (yes / no)
no
@- Correction for **tumato**:
potato
@ Corrected to: potato
@ MISSPELLING in: One potato, two **tumato**, three tumato, four.
@- **tumato** not in dictionary add it? (yes / no)
yes
> doc.toString()
One potato, two tumato, three tumato, four.
> sc.getAllPersonalDictWords()
dorkus
cheeze
potatoe
tumato

11.2 Class Architecture

public class PersonalSC extends InteractiveSC {
// A spell checker which allows use of a personal dictionary. The
// personal dictionary is initially read from a file though the file
// may be non-existent in which case the personal dictionary is empty to
// begin with.  When checking for correctness of words, both the
// system dictionary and personal dictionary are checked. If a
// misspelled word is to be corrected, the user is interactively
// prompted as to whether the word should instead be added to the
// personal dictionary.  The class can save the personal dictionary
// back to the file from which it was read.

  protected String personalDictFilename;
  // Name of the file for the personal dictionary

  protected String [] personalDictWords;      
  // Personal dictionary words

  public PersonalSC(String dictFilename, boolean ignoreCase, Scanner input, PrintWriter output, String personalDictFilename);
  // Construct a spell checker with a personal, modifiable dictionary.
  // Arguments are identical to InteractiveSC except for the final
  // argument which is a file containing the personal dictionary
  // words. This file should be formatted in the same way as a normal
  // dictionary and the contents used to initially fill in the
  // personalDictWords field.

  public int personalDictSize();
  // Return the size of the personal dictionary used by this
  // spellchecker which is the size of the personalDictWords array.

  @Override
  public boolean isCorrect(String word);
  // Check if the word is correct according to the same methodology as
  // the parent class.  If not, check whether the word appears in the
  // personal dictionary associated with this spell checker. Honor the
  // ignoreCase setting when checking the personal dictionary.

  @Override
  public String correctWord(String word);
  // If the parameter word is not in the system or personal
  // dictionary, prompt the user on whether they would like to add it
  // to the dictionary as in
  //
  // @- **tumato** not in dictionary add it? (yes / no)
  //
  // If the response is "yes" (read using the spell checker's scanner),
  // append it to the personalDictWords. You may use library methods
  // from java.util.Arrays to make the append easier.  After
  // appending, return the word as it is now considered correct.
  // 
  // If the answer on whether to add is not "yes" (e.g. "no"), prompt
  // the user for a correction in the same way that the parent class does. 

  public String getAllPersonalDictWords();
  // Return a string showing all words currently in the spell checker's
  // personal dictionary, one word per line.

  public void savePersonalDict() throws Exception;
  // Write the contents of personalDictWords to the file from which
  // they were initially read (personalDictFilename).  Write one word
  // per line.  Print a message to the screen indicating the
  // dictionary has been saved in the format:
  //
  // @ Personal dictionary written to file personal-dict.txt
  //
  // where the last word on the line is the name of the file where the
  // contents are saved.

}

11.3 Implementation Notes

  • It is likely that you will want a helper method to expand the personalDictWords array. This will make the code in correctWord(word) a bit shorter and cleaner. You are free to use library methods such as those in java.util.Arrays for this purpose.
  • It is possible that the personal dictionary file is empty or non-existent. Use the readAllLines(filename) method to handle this: it should return an empty array if the file does not exist.
  • Make sure to retain the name of the personal dictionary file as the contents of the personalDictWords array must be written back to that same file on a call to savePersonalDict(). The PrintWriter class is useful for file writing.
  • Whenever writing files, always make sure to invoke the close() method of PrintWriter to ensure contents are actually written.
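
A minimal sketch of writing an array of words to a file, one per line, is given below. The class SaveWordsSketch and method saveWords(..) are hypothetical; the sketch relies on try-with-resources so that close() is invoked automatically, which is one way (not the only way) to satisfy the last point above. It does not print the "@ Personal dictionary written to file ..." message that savePersonalDict() requires.

```java
import java.io.PrintWriter;

// Hypothetical helper; shows file writing with PrintWriter and close().
class SaveWordsSketch {
  // Overwrite filename with the given words, one word per line.
  static void saveWords(String filename, String[] words) throws Exception {
    try (PrintWriter out = new PrintWriter(filename)) {  // creates or truncates the file
      for (String w : words) {
        out.println(w);
      }
    }   // out.close() happens here, flushing the contents to disk
  }
}
```

Forgetting to close (or at least flush) the writer is a common reason for ending up with an empty file.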

11.4 (5%) Manual Inspection Criteria for PersonalSC   grading

  • Make effective use of the parent class version of correctWord(word) during implementation of that method
  • Employ the readAllLines(filename) method to ease the task of reading in the personal dictionary.
  • Clean code is present to enlarge the personalDictWords array to accommodate new words. Library methods are encouraged to facilitate this process.
  • Standard arrays are employed for the personalDictWords rather than other more advanced data structures such as ArrayList

12 Honors Problem

The honors problem will be posted at a later time.


Author: Mark Snyder and Chris Kauffman (msnyde14@gmu.edu, kauffman@cs.gmu.edu)
Date: 2017-07-10 Mon 14:49