CS 211 Project 3: Class Inhairytense
- Due: Sunday 3/5/2017 by 11:59 pm
- Submit to Blackboard
CODE DISTRIBUTION:
- Provided code: p3pack.zip
- Test files: p3tests.zip
CHANGELOG:
- Wed Mar 1 14:08:48 EST 2017
  - Additional guidance on how correctWord(..) should use the ignoreCase field has been added to the implementation notes for the AutomaticSC. These may clarify the overall behavior of the method.
- Wed Mar 1 09:52:02 EST 2017
  - Minor update to the tests: a reference to "small-dict.txt" in SpellCheckerTests.java should be "short-dict.txt".
- Wed Mar 1 01:34:21 EST 2017
  - Test cases are now linked at the top of the spec.
- Tue Feb 28 09:24:37 EST 2017
  - Several manual inspection criteria have been added to SpellChecker in response to student questions about whether Document can be used during readAllLines(..): it cannot. The new criteria appear in the manual inspection criteria for SpellChecker (Section 7.5).
Table of Contents
- 1. Overview
- 2. Spell Checker Functionality and Hierarchy
- 3. Project Files
- 4. Setup and Submission
- 5. Grading Breakdown
- 6. Provided Class: Document
- 7. Basic Spell Checker
- 8. Provided Class: StringComparison
- 9. Automatic Spell Checking
- 10. Interactive Spell Checking
- 11. Personal Dictionary Checker
- 12. Honors Problem
1 Overview
A primary feature of object-oriented languages like Java is inheritance, the ability to relate a new class to a previously written class in order to reuse existing code. Existing classes can be extended to create modified versions of their old methods and introduce entirely new functionality. This power comes with a cost though: inheritance is difficult to understand at first and it is often difficult to recognize an opportune moment to employ it.
In this project, we will develop a small class hierarchy of spell checkers, classes which are designed to enable the correction of misspelled words in documents. Each spell checker will share some structure and functionality with the basic spell checker which makes the collection of classes as a whole a good candidate for an inheritance hierarchy.
2 Spell Checker Functionality and Hierarchy
There are 4 required spell checkers to be implemented. Each of them implements the same basic functionality.
- Constructor
  - Construct a spell checker by providing it with a file containing correct dictionary words and indicating whether upper/lower case letters should be ignored while checking spelling.
- boolean isCorrect(String word)
  - Return true if the given word is considered correct by the spell checker and false otherwise.
- String correctWord(String word)
  - Regardless of whether the word is correct or not, provide an alternative word to replace it.
- void correctDocument(Document doc)
  - Modify the provided document to replace all incorrect words with a correction provided by the spell checker. The Document class is provided in the project pack.
The different versions of the spell checker specialize these methods to tailor the class to a specific use case. However, all of the spell checkers share a considerable amount of functionality making the collection of classes a good candidate for an inheritance hierarchy.
The hierarchy is given below.
SpellChecker                           : Basic spell check which highlights incorrect words
|
+--AutomaticSC extends SpellChecker    : Automatic check which corrects words based on edit distance
|
+--InteractiveSC extends SpellChecker  : Interactive checker which prompts users for corrections
   |
   +--PersonalSC extends InteractiveSC : Interactive checker with a personalizable dictionary

Document         : Editable document class
StringComparison : Contains editDistance(..) method for use in AutomaticSC
A few notes
- The top of the hierarchy is the SpellChecker class which has various descendants such as AutomaticSC and PersonalSC.
- The descendant classes will inherit the fields and methods of parent classes and in some cases override/specialize methods to operate differently from the parent class.
- The Document and StringComparison classes are not part of the hierarchy of spell checkers. They are both provided in the project code pack.
Below is a brief overview of how each of the spell checkers behaves differently as they go about correcting documents. The demonstration is given via an interactive session in DrJava.
Welcome to DrJava.
> // Create a document with the provided Document class
> String content = "One potatoe, two tumatoes, three potatoes, four. I misunderestimated how many potatoes."
> Document doc = new Document(content);
> doc.toString()
One potatoe, two tumatoes, three potatoes, four. I misunderestimated how many potatoes.
> // Highlight misppeled words
> doc = new Document(content);
> SpellChecker sc = new SpellChecker("english-dict.txt",true);
> sc.correctDocument(doc)
> doc.toString()
One **potatoe**, two **tumatoes**, three potatoes, four. I **misunderestimated** how many potatoes.
> // Automatic spell correcting
> doc = new Document(content);
> AutomaticSC asc = new AutomaticSC("english-dict.txt",true);
> asc.correctDocument(doc)
> doc.toString()
One potato, two tomatoes, three potatoes, four. I underestimated how many potatoes.
> // Set up input / output classes
> import java.util.*; import java.io.*;
> Scanner stdin = new Scanner(System.in);
> PrintWriter stdout = new PrintWriter(System.out,true);
> // Interactive spell checking
> doc = new Document(content);
> SpellChecker isc = new InteractiveSC("english-dict.txt",true,stdin,stdout);
> isc.correctDocument(doc)
@ MISSPELLING in: One **potatoe**, two tumatoes, three potatoes, four. I misunderestimated how many potatoes.
@- Correction for **potatoe**: potato
@ Corrected to: potato
@ MISSPELLING in: One potato, two **tumatoes**, three potatoes, four. I misunderestimated how many potatoes.
@- Correction for **tumatoes**: tomatoes
@ Corrected to: tomatoes
@ MISSPELLING in: One potato, two tomatoes, three potatoes, four. I **misunderestimated** how many potatoes.
@- Correction for **misunderestimated**: misunderstood
@ Corrected to: misunderstood
> doc.toString()
One potato, two tomatoes, three potatoes, four. I misunderstood how many potatoes.
> // Use interactive with a personal dictionary
> doc = new Document(content);
> PersonalSC psc = new PersonalSC("english-dict.txt",true,stdin,stdout,"personal-dict.txt");
> psc.correctDocument(doc)
@ MISSPELLING in: One **potatoe**, two tumatoes, three potatoes, four. I misunderestimated how many potatoes.
@- **potatoe** not in dictionary add it? (yes / no) yes
@ MISSPELLING in: One potatoe, two **tumatoes**, three potatoes, four. I misunderestimated how many potatoes.
@- **tumatoes** not in dictionary add it? (yes / no) no
@- Correction for **tumatoes**: tomatoes
@ Corrected to: tomatoes
@ MISSPELLING in: One potatoe, two tomatoes, three potatoes, four. I **misunderestimated** how many potatoes.
@- **misunderestimated** not in dictionary add it? (yes / no) yes
> doc.toString()
One potatoe, two tomatoes, three potatoes, four. I misunderestimated how many potatoes.
> psc.getAllPersonalDictWords()
potatoe misunderestimated
3 Project Files
Files that are "provided" are in the project code pack. Tests may be posted after the initial release of the project spec.
File | State | Notes |
---|---|---|
SpellChecker.java | create | Basic spell check which highlights incorrect words |
AutomaticSC.java | create | Automatic check which corrects words based on edit distance |
InteractiveSC.java | create | Interactive checker which prompts users for corrections |
PersonalSC.java | create | Interactive checker with a personalizable dictionary |
Document.java | provided | Editable document class |
StringComparison.java | provided | Contains editDistance(..) method for use in AutomaticSC |
english-dict.txt | data | Dictionary of 119095 English words in ASCII, one word per line |
junit-cs211.jar | provided | JUnit library for command line testing |
ID.txt | create | Create in setup to identify yourself |
Tests | tests | Will be posted later |
4 Setup and Submission
The submission procedure is identical to previous projects.
- Keep all project files in a directory named after the pattern ckauffm2-205-p3
- Include an ID.txt file with your identification details in it.
- When finished, create a .zip file of your project directory
- Verify that all your project files are in the zip
- Submit your zipped project file to Blackboard under the appropriate project link.
- You may submit as many times as desired; only the most recent submission will be graded.
5 Grading Breakdown
Grading for this project will be divided into two distinct parts: Automated tests, and Manual Inspection.
5.1 Automated Tests (50%)
- JUnit test cases will be provided to detect errors in your code.
- Tests may not be available on initial release but will be posted at a later time.
- Tests may be expanded, changed, and corrected as the deadline approaches.
- It is your responsibility to get and use the freshest set of tests available.
- Tests will be provided in source form so that you will know what tests are doing and where you are failing.
- It is up to you to run the tests to determine whether you are passing or not. If your code fails to compile against the tests, little to no credit will be garnered for this section.
- Most of the credit will be divided evenly among the tests; e.g. 50% / 25 tests = 2% per test. However, the teaching staff reserves the right to adjust the weight of test cases after the fact if deemed necessary.
- Code that does not compile and run tests according to the specified command line invocation may lose all automated testing credit. Graders will usually try to fix small compilation errors such as bad directory structures or improper use of packages. Such corrections typically result in a loss of 5-10% credit on automated testing. However, if more than a small amount of effort seems required to fix the problems, no credit will be given.
5.2 Manual Inspection (50%)
- Teaching staff (GTAs) will manually inspect your work looking for a specific set of features. They are generally listed throughout the document next to the relevant project features.
- Credit will also be awarded/deducted based on adherence to good
coding style, which includes:
- Good indentation and curly brace placement (be consistent and follow a common convention)
- Comments describing each field and method
- Comments describing a complex section of code and invariants which must be maintained for classes
- Use of internal private methods to decompose the problem beyond what is required in the spec, as needed
- Some credit will be assigned for designing your program according to the given specification, for instance using a designated algorithm, structuring your program in a certain fashion, or utilizing a required programming element.
6 Provided Class: Document
This class is provided and does not need to be implemented. You will need to familiarize yourself with its methods, however, as a primary functionality of all spell checkers is to correct misspellings that appear in a document.
Document provides a simple way to convert text in a string into a streamable, semi-editable format. After constructing a Document, one can repeatedly ask for the String nextWord() or check whether the document boolean hasNextWord(). This is similar in spirit to the methods of Scanner and allows one to process the document from beginning to end. Unlike Scanner, a Document has a method void replaceLastWord(String correction) which will alter the last word returned by nextWord() to be the provided correction. Documents can also void rewind() back to the beginning for additional passes through them.
The remaining sections provide a demonstration use of Document in a DrJava interactive loop and a summary of its public methods. You are free to examine the contents of Document.java and may learn a few new tricks but the class should not be altered to complete the project.
6.1 Demo Usage
Welcome to DrJava.
> // Create a document with the specified contents
> Document doc = new Document("They misunderestimated me.");
> doc.toString()                  // show contents
They misunderestimated me.
> doc.hasNextWord()               // any words left?
true
> String word;
> word = doc.nextWord()           // capture next word as a string
They
> doc.hasNextWord()               // any words left?
true
> word = doc.nextWord()           // capture next word as a string
misunderestimated
> doc.hasNextWord()               // and so on...
true
> word = doc.nextWord()
me
> doc.hasNextWord()               // until no words are left
false
> word = doc.nextWord()           // at which point exceptions are raised
java.lang.RuntimeException: No words remain in the document
  at Document.nextWord(Document.java:90)
>
> // Fresh document
> doc = new Document("They misunderestimated me. That is an incorrect analyzation of the situation.");
>
> // Read 8 words
> for(int i=0; i<8; i++){ word = doc.nextWord(); }
> word                            // Last word read
analyzation
> doc.debugString()               // Show the doc with marks around the last word
They misunderestimated me. That is an incorrect >>analyzation<< of the situation.
> doc.replaceLastWord("analysis") // Replace a misspelling
> doc.debugString()               // Show the doc with marks around the last word
They misunderestimated me. That is an incorrect >>analysis<< of the situation.
> doc.toString()                  // Show contents
They misunderestimated me. That is an incorrect analysis of the situation.
>
> doc.rewind()                    // back to the beginning of the document
> word = doc.nextWord()           // produces the first word
They
> doc.debugString()               // show first word marked
>>They<< misunderestimated me. That is an incorrect analysis of the situation.
6.2 Class Architecture
public class Document{
  // Simple, editable document. Contents are initialized with a
  // string. Allows scanning through the document by word with calls to
  // nextWord() and hasNextWord() with rewind() resetting back to the
  // beginning of the document. Words can be replaced via calls to
  // replaceLastWord(str). Display contents with toString() and
  // debugString().

  public Document(String contents);
  // Construct a document with the given contents

  public Document(File file) throws Exception;
  // Construct a document contents initialized from the given file

  public String toString();
  // Returns a string representation of the entire document

  public String debugString();
  // Returns a string representation of the document with the contents
  // modified to mark the word selected by nextWord()

  public String nextWord();
  // Return the next word in the document starting with the first
  // word. Ignores punctuation and numbers. Throws an exception if
  // there are no words remaining in the document.

  public boolean hasNextWord();
  // Return true if the document contains any more words so that a
  // call to nextWord() would succeed. Returns false if no words
  // remain in the document. Punctuation and numbers do not count as
  // words.

  public void rewind();
  // Reset the internal position of the document so that a subsequent
  // call to nextWord() will return the first word in the document

  public void replaceLastWord(String correction);
  // Replace the last word returned by nextWord() with the given
  // correction. Internal positioning will be adjusted so that
  // subsequent calls to nextWord() will move beyond the supplied
  // correction. Throws an exception if nextWord() has not been called
  // appropriately (ex: immediately after construction or after a call
  // to rewind())

  public String currentLine();
  // Return a string showing the line of the document which contains
  // the last word returned by nextWord(). Returns the first line of
  // the document if called after construction or a call to rewind().
}
7 Basic Spell Checker
The root of the spell checker class hierarchy is the class SpellChecker. Its purpose is simply to identify misspelled words and mark them with asterisks as in the following:
This is a **mispeled** word. So is **ths**.
As mentioned in the section on spell checker functionality, the SpellChecker class provides four basic methods to accomplish this task: constructor, isCorrect(word), correctWord(word), and correctDocument(doc). Additional support methods are also provided which are shown in the demo section below and outlined in the class architecture later.
7.1 Demo Usage
Below is a demonstration of several of the capabilities of the SpellChecker.
Welcome to DrJava.
> // Demonstrate spell checker capabilities
> // Construct a spell checker with the provided english dictionary ignoring case
> SpellChecker sc = new SpellChecker("english-dict.txt",true);
> sc.dictSize()                   // show # words in dictionary
119095
> sc.isCorrect("potatoes")        // is word in the dictionary
true
> sc.correctWord("potatoes")      // provide a correction for word
**potatoes**
> sc.isCorrect("potatoe")         // is word in the dictionary
false
> sc.correctWord("potatoe")       // provide a correction for word
**potatoe**
> sc.correctWord("tumato")        // provide a correction for word
**tumato**
> sc.correctWord("misunderestimated")
**misunderestimated**
> // Create a document
> String content = "One potatoe, two tumatoes, three potatoes, four. I misunderestimated how many potatoes."
> Document doc = new Document(content)
> // "Correct" misspellings in the document by highlighting them
> sc.correctDocument(doc)
> doc.toString()
One **potatoe**, two **tumatoes**, three potatoes, four. I **misunderestimated** how many potatoes.
> // Read all lines from a file using a static method
> String [] lines = SpellChecker.readAllLines("english-dict.txt");
> lines[0]
A
> lines[5]
aah
> lines.length
119095
7.2 Class Architecture
public class SpellChecker{
  // A class to do spell checking. This version only marks misspelled words with
  // asterisks as in **mispeling**. It serves as a parent class for other spell
  // checkers to inherit functionality to add features by overriding methods.

  protected String [] dictWords;
  // Array of words considered correct by spell checker

  protected boolean ignoreCase;
  // If true, ignore case when checking the spelling of words; otherwise
  // capitalization differences will be counted as misspellings

  public static String [] readAllLines(String filename);
  // Utility which reads all lines from a file and returns them as an array of
  // strings. If problems are encountered during reading, return a string array
  // of length 0 (empty). See implementation notes for discussion of how to
  // handle exceptions and use two-pass scanning to allocate an appropriately
  // sized array.

  public SpellChecker(String dictFilename, boolean ignoreCase);
  // Construct a spellchecker. dictFilename is the name of a file containing all
  // words that are considered correct, one on each line; english-dict.txt is
  // commonly used. ignoreCase indicates whether case should be ignored or used
  // when checking for word correctness against dictionary words.

  public int dictSize();
  // Return the size of the dictionary used by this spellchecker which is the
  // number of words read from the dictionary file and stored in the
  // dictWords array.

  public boolean isCorrect(String word);
  // Return true if the provided word is considered correct by the spell checker
  // and false otherwise. A word is correct if it is equal to a word in the
  // dictWords array. It is also correct if case is being ignored and is
  // equal ignoring case to some word in the dictWords array.

  public String correctWord(String word);
  // Create a correction for the given word. Return the word surrounded by
  // asterisks which mark it as incorrect as in the word "misunderestimated"
  // should become "**misunderestimated**". This method produces a correction
  // for the given word even if it is in the dictionary: it is to be used in
  // conjunction with isCorrect(word) to transform only words not in the
  // dictionary. That means correctWord("apple") returns "**apple**".

  public void correctDocument(Document doc);
  // From the beginning of the document, apply corrections to all words in the
  // document. Each misspelled word will be marked with asterisks according to
  // the correctWord() method. Methods of Document such as nextWord(),
  // hasNextWord(), and replaceLastWord(w) are used to modify the provided
  // document.
}
7.3 Demo Main Method using SpellChecker
// Demonstration of various features of basic spell checkers
public class SpellCheckerMain{
  // Utility to print by typing less
  public static void print(Object s){
    System.out.println(s);
  }

  public static void main(String args[]){
    // Construct a spell checker which uses the english dictionary
    // provided and ignores case
    SpellChecker checker = new SpellChecker("english-dict.txt",true);
    print( checker.dictSize() );                 // 119095 - words in english-dict.txt
    print( checker.isCorrect("case") );          // true
    print( checker.correctWord("case") );        // "**case**" - always put on asterisks
    print( checker.isCorrect("analyzation") );   // false
    print( checker.correctWord("analyzation") ); // "**analyzation**"

    Document doc = new Document("They misunderestimated me.");
    print( doc.toString() );                     // They misunderestimated me.
    checker.correctDocument(doc);
    print( doc.toString() );                     // They **misunderestimated** me.
  }
}
7.4 Implementation Notes
Reading all lines from a file
SpellChecker must read its dictionary from a file during construction so it makes sense for it to provide a static method to read the contents of a file that can also be used by its child classes. The method readAllLines(filename) extracts the contents of a file and returns them as an array of lines. This is useful for reading dictionary files like the provided english-dict.txt which simply has one correct word per line as in:
abalone
abalone's
abalones
abandon
abandoned
abandoning
abandonment
abandonment's
abandons
...
Completing readAllLines(filename) eases the task of initializing the spell checker as the method is used to read all correct words to populate the protected field dictWords.
To implement readAllLines(filename), the easiest classes to use are
- File to indicate input will be read from a file
- Scanner to read contents, one line at a time
Importantly, use the following strategy to efficiently read in all words. It is sometimes referred to as a "two-pass" approach.
- First Pass
  - Open the scanner
  - Read lines from the scanner but ignore those lines
  - Read lines until no more exist, counting lines until the end of the file and close the scanner
- Allocate an array sized to the number of lines in the file
- Second Pass
  - Recreate the scanner which will start it back at the beginning of the file
  - Read the previously calculated number of lines from the scanner
  - Each time a line is read assign it to an element of the array
- Return the array
Catching Exceptions while Creating Scanners
Creating a scanner from a file can go wrong: the file might not exist or might not be readable with the permissions available to the program. An exception will result from this which must then either (1) be handled or (2) be acknowledged in the method signature as a possible outcome of running the method. readAllLines() takes approach (1). Handling exceptions will be covered later in the course but for now, the following basic code pattern suffices to illustrate how this works in the situation at hand.
Scanner input;
try{
  input = new Scanner(...);   // this could go wrong
}
catch(Exception e){           // when something goes wrong
  return ...;                 // return some default value
}
// Nothing went wrong so start using Scanner input
...
This pattern may appear in a couple of places in readAllLines(..).
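The two-pass strategy and the exception pattern above can be combined roughly as follows. This is a minimal sketch and not the required implementation: the class TwoPassReadDemo and the helpers readLinesTwoPass and writeTempFile are hypothetical names introduced only for illustration, and the sketch assumes the file does not change between the two passes.

```java
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Scanner;

class TwoPassReadDemo {
    // Hypothetical helper (not part of the spec): returns every line of the
    // named file, or an empty array if the file cannot be opened.
    static String[] readLinesTwoPass(String filename) {
        Scanner input;
        try {
            input = new Scanner(new File(filename)); // fails if the file is missing
        } catch (Exception e) {
            return new String[0];                    // default value on failure
        }

        // First pass: count the lines and discard their contents
        int count = 0;
        while (input.hasNextLine()) {
            input.nextLine();
            count++;
        }
        input.close();

        // One allocation, sized exactly to the file
        String[] lines = new String[count];

        // Second pass: a new Scanner starts again at the top of the file
        try {
            input = new Scanner(new File(filename));
        } catch (Exception e) {
            return new String[0];
        }
        for (int i = 0; i < count; i++) {
            lines[i] = input.nextLine();
        }
        input.close();
        return lines;
    }

    // Hypothetical helper for the demo: writes the given lines to a temp file
    // and returns its path.
    static String writeTempFile(String... lines) {
        try {
            File tmp = File.createTempFile("words", ".txt");
            tmp.deleteOnExit();
            try (PrintWriter out = new PrintWriter(tmp)) {
                for (String s : lines) {
                    out.println(s);
                }
            }
            return tmp.getPath();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String path = writeTempFile("abalone", "abandon", "abandoned");
        String[] lines = readLinesTwoPass(path);
        System.out.println(lines.length);                                // 3
        System.out.println(lines[0]);                                    // abalone
        System.out.println(readLinesTwoPass("no-such-file.txt").length); // 0
    }
}
```

Counting first costs a second read of the file but means the result array never has to be grown or copied.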
Spell checking ignoring case and accounting for it
Spell checkers should honor the ignoreCase option handed to them when they are created. To demonstrate, below are two spell checkers created with different values for ignoreCase.
> // ignoreCase is false
> SpellChecker useCase = new SpellChecker("english-dict.txt",false);
> useCase.isCorrect("mellifluous")
true
> useCase.isCorrect("Mellifluous")
false
> useCase.isCorrect("MELLIFLUOUS")
false
> // ignoreCase is true
> SpellChecker ignoreCase = new SpellChecker("english-dict.txt",true);
> ignoreCase.isCorrect("mellifluous")
true
> ignoreCase.isCorrect("Mellifluous")
true
> ignoreCase.isCorrect("MELLIFLUOUS")
true
This complicates the isCorrect(word) method somewhat. The String class contains a method that compares two strings for equality considering case and one that compares them ignoring case; both should be located and employed.
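As a rough illustration of the idea, the sketch below scans an array with the standard String methods equals(..) and equalsIgnoreCase(..). The class CaseCheckDemo and the helper contains are hypothetical names; the real isCorrect(word) would work on the dictWords and ignoreCase fields instead of parameters.

```java
class CaseCheckDemo {
    // Hypothetical helper: true if word matches some entry of words,
    // optionally ignoring letter case.
    static boolean contains(String[] words, String word, boolean ignoreCase) {
        for (String w : words) {
            boolean same = ignoreCase ? w.equalsIgnoreCase(word) : w.equals(word);
            if (same) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] words = {"mellifluous", "potato"};
        System.out.println(contains(words, "Mellifluous", false)); // false
        System.out.println(contains(words, "Mellifluous", true));  // true
        System.out.println(contains(words, "potatoe", true));      // false
    }
}
```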
Correcting Documents
The correctDocument(doc) method is meant to scan through an entire document to correct all misspelled words. In the case of the SpellChecker, this correction is simply to identify misspelled words with asterisks.
It is tempting to do this manually by replacing words with explicitly constructed asterisked strings in a call such as
doc.replaceLastWord("**"+incorrectWord+"**");
and while it will get the job done for the moment, it is somewhat short-sighted for the following reasons.
- The class already knows how to produce a corrected version of the word by invoking its correctWord(..) method so this is a good chance to make use of existing code. If later the misspelled word markers are changed to !!mispeled!!, there is only one place in code that needs alteration.
- By making use of the correctWord(..) method, one opens the possibility for child classes to adjust that method's behavior by overriding it and have the effect seen wherever correctWord(..) is used in parent methods. This is exactly the tack that will be taken by child class AutomaticSC.
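The second point can be seen in a small, self-contained sketch. The classes Marker and ShoutingMarker and the methods fix and fixAll are hypothetical stand-ins, not project classes: fixAll plays the role of a parent method such as correctDocument(doc), and fix plays the role of correctWord(..).

```java
class OverrideDemo {
    public static void main(String[] args) {
        String[] words = {"mispeled", "ths"};
        System.out.println(new Marker().fixAll(words));         // **mispeled** **ths**
        System.out.println(new ShoutingMarker().fixAll(words)); // !!mispeled!! !!ths!!
    }
}

class Marker {
    // Produce a replacement for one word; child classes may override this
    public String fix(String word) {
        return "**" + word + "**";
    }

    // Written once in the parent, entirely in terms of fix(..)
    public String fixAll(String[] words) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(fix(words[i])); // dispatches to the overriding version if any
        }
        return sb.toString();
    }
}

class ShoutingMarker extends Marker {
    // Only this method changes; fixAll(..) is inherited as is
    @Override
    public String fix(String word) {
        return "!!" + word + "!!";
    }
}
```

Because fixAll calls fix rather than building the marked string itself, the child class changes the result of fixAll without redefining it.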
7.5 (20%) Manual Inspection Criteria for SpellChecker
- The readAllLines(filename) method uses an efficient input strategy such as the one suggested in the spec to avoid repeatedly re-allocating larger arrays while reading input. Only a single array allocation should be required.
- Exception handling is incorporated into readAllLines(filename) to return an empty array if no file is found.
- Effective use of the String methods is made to honor the ignoreCase option during isCorrect(word) calls.
- The implementation of correctDocument(doc) makes effective use of the public methods of the Document class to replace misspelled words with the results of a call to correctWord(word) which will be important for descendant classes.
- Standard arrays are employed for the dictWords rather than other more advanced data structures such as ArrayList
- The Document class is NOT used in readAllLines(..); a Scanner is instead used to read from the file
- No instances of Document are used in the SpellChecker except during the correctDocument(doc) method
- The SpellChecker has only the two fields specified in the class architecture: protected String [] dictWords; protected boolean ignoreCase;
8 Provided Class: StringComparison
StringComparison houses a single method, editDistance(x,y), which is used to measure the "distance" between two strings. The method employs an interesting technique often called dynamic programming which constructs a table of values to efficiently compute the answer to a set of recurrence relations describing a problem, in this case how many operations are required to transform one string into another. Edit distance is also referred to as Levenshtein Distance after the first researcher to publish on the problem.
As with the Document class, you may use the methods of StringComparison freely without modification. It is not essential that you know how they work, only that you know how to put them to use.
8.1 Class Architecture
public class StringComparison {
  // Class which contains some utility methods to compare strings

  public static int editDistance(String x, String y);
  // Compute the edit distance (Levenshtein Distance) between strings x
  // and y; returns a non-negative number indicating the minimum character
  // insertions, deletions, or substitutions required to transform x
  // into y. Smaller numbers mean x and y are "closer" to each other.
  // Uses dynamic programming to solve this task as per the algorithm
  // at
  // https://en.wikipedia.org/wiki/Levenshtein_distance#Iterative_with_full_matrix.
  //
  // Possible to optimize the performance of this using the two-row
  // approach or a global matrix though both would introduce
  // complications.
}
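For readers curious about the table-filling idea, below is an independent sketch of the full-matrix Levenshtein computation. It is not the source of the provided StringComparison class, which is not reproduced here; the class EditDistanceDemo and the method levenshtein are hypothetical names used only for illustration.

```java
class EditDistanceDemo {
    // Full-matrix Levenshtein distance: d[i][j] holds the distance between
    // the first i characters of x and the first j characters of y.
    static int levenshtein(String x, String y) {
        int[][] d = new int[x.length() + 1][y.length() + 1];
        for (int i = 0; i <= x.length(); i++) {
            d[i][0] = i;                 // delete all i characters
        }
        for (int j = 0; j <= y.length(); j++) {
            d[0][j] = j;                 // insert all j characters
        }
        for (int i = 1; i <= x.length(); i++) {
            for (int j = 1; j <= y.length(); j++) {
                int cost = (x.charAt(i - 1) == y.charAt(j - 1)) ? 0 : 1;
                int deletion = d[i - 1][j] + 1;
                int insertion = d[i][j - 1] + 1;
                int substitution = d[i - 1][j - 1] + cost;
                d[i][j] = Math.min(Math.min(deletion, insertion), substitution);
            }
        }
        return d[x.length()][y.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("potatoe", "potato")); // 1
        System.out.println(levenshtein("kitten", "sitting")); // 3
        System.out.println(levenshtein("same", "same"));      // 0
    }
}
```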
9 Automatic Spell Checking
While highlighting spelling errors is nice, automatic spelling correction is generally considered a very useful feature of many computing systems (though it is not without its own set of pitfalls). The AutomaticSC provides a way to automatically correct spelling without the need for user interaction.
A simple means of doing automatic spell correction is to search the dictionary for the "closest" word to one not in the dictionary. Closeness here requires a distance measure that is provided by the StringComparison.editDistance(x,y) method.
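The search itself is a minimum scan over the dictionary. The sketch below keeps the distance measure as a parameter so that it stays self-contained; in the project that role would be played by StringComparison.editDistance(x,y). The names ClosestWordDemo, closest, and mismatchCount are hypothetical, and mismatchCount is only a simple stand-in measure, not edit distance.

```java
import java.util.function.ToIntBiFunction;

class ClosestWordDemo {
    // Return the first word with the smallest distance to target,
    // or null if words is empty. The strict < keeps the earliest
    // word when several are tied.
    static String closest(String[] words, String target,
                          ToIntBiFunction<String, String> distance) {
        String best = null;
        int bestDist = Integer.MAX_VALUE;
        for (String w : words) {
            int d = distance.applyAsInt(w, target);
            if (d < bestDist) {
                bestDist = d;
                best = w;
            }
        }
        return best;
    }

    // Stand-in measure for the demo (NOT edit distance): positions that
    // differ plus the difference in length.
    static int mismatchCount(String a, String b) {
        int shared = Math.min(a.length(), b.length());
        int diff = Math.abs(a.length() - b.length());
        for (int i = 0; i < shared; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                diff++;
            }
        }
        return diff;
    }

    public static void main(String[] args) {
        String[] words = {"pear", "bear", "apple"};
        // "pear" and "bear" are both one mismatch from "tear"; the earlier wins
        System.out.println(closest(words, "tear", ClosestWordDemo::mismatchCount)); // pear
    }
}
```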
9.1 Demo Usage
Welcome to DrJava.
> // Demonstrate automatic spell checker capabilities
> // Construct an automatic spell checker with the provided english dictionary ignoring case
> AutomaticSC asc = new AutomaticSC("english-dict.txt",true);
> asc.dictSize()                  // show size of dictionary
119095
> asc.isCorrect("potatoes")       // is word in the dictionary
true
> asc.correctWord("potatoes")     // provide a correction for word
potatoes
> asc.isCorrect("potatoe")        // is word in the dictionary
false
> asc.correctWord("potatoe")      // provide a correction for word
potato
> asc.correctWord("tumato")       // provide a correction for word
tomato
> asc.correctWord("misunderestimated")
underestimated
> // Create a document
> String content = "One potatoe, two tumatoes, three potatoes, four. I misunderestimated how many potatoes."
> Document doc = new Document(content)
> // Correct misspellings in the document by replacing with closest dictionary word
> asc.correctDocument(doc)
> doc.toString()
One potato, two tomatoes, three potatoes, four. I underestimated how many potatoes.
9.2 Class Architecture
public class AutomaticSC extends SpellChecker{
  // A spell checker which automatically selects a correction for a
  // misspelled word. It inherits most functionality from its parent
  // class but adjusts how correctWord(..) performs.

  public AutomaticSC(String dictFilename, boolean ignoreCase);
  // Construct an automatic spell checker. Pass the parameters to the
  // parent class constructor.

  @Override
  public String correctWord(String word);
  // Return a correction for the given word. The correction is the
  // word in the dictionary which has the smallest edit distance from
  // the given word. If there are ties, favor whichever word appears
  // earlier in the dictionary. Make use of the methods of the
  // provided StringComparison to find the closest word in the
  // dictionary. Make sure to honor the ignoreCase option which may
  // lead you to convert words to all upper or lower case.

  public static String matchCase(String model, String source);
  // Utility method to handle case matching between words. Check if
  // parameter model is all caps or only the first character is
  // capitalized and transform source to match the capitalization. In
  // the event that the model is neither all caps nor capitalized
  // followed by all lower case, return the source string as
  // is. Examples are given below.
  //
  // | Situation   | model  | source | return |
  // |-------------+--------+--------+--------|
  // | All Caps    | BANANA | apple  | APPLE  |
  // | All Caps    | PEAR   | orange | ORANGE |
  // | Capitalized | Banana | orange | Orange |
  // | Capitalized | Apple  | pear   | Pear   |
  // | Neither     | banana | apple  | apple  |
  // | Neither     | banana | Apple  | Apple  |
  // | Neither     | BaNaNa | aPPle  | aPPle  |
  // | Neither     | peaR   | Orange | Orange |
}
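One way the matchCase table could be realized is sketched below. The class MatchCaseDemo and the method matchCaseSketch are hypothetical names, and two choices go beyond what the table specifies and are assumptions: empty strings are returned unchanged, and in the capitalized case the rest of source is lowered.

```java
class MatchCaseDemo {
    // Sketch of the case-matching rules in the table above.
    static String matchCaseSketch(String model, String source) {
        if (model.isEmpty() || source.isEmpty()) {
            return source;               // assumption: nothing to match
        }
        // All caps: upper-casing changes nothing, and at least one
        // letter is present (lower-casing does change something)
        boolean allCaps = model.equals(model.toUpperCase())
                       && !model.equals(model.toLowerCase());
        if (allCaps) {
            return source.toUpperCase();
        }
        // Capitalized: first character upper case, the rest lower case
        String rest = model.substring(1);
        boolean capitalized = Character.isUpperCase(model.charAt(0))
                           && rest.equals(rest.toLowerCase());
        if (capitalized) {
            // assumption: the rest of source is lowered as well
            return Character.toUpperCase(source.charAt(0))
                 + source.substring(1).toLowerCase();
        }
        return source;                   // neither pattern: leave as is
    }

    public static void main(String[] args) {
        System.out.println(matchCaseSketch("BANANA", "apple"));  // APPLE
        System.out.println(matchCaseSketch("Banana", "orange")); // Orange
        System.out.println(matchCaseSketch("BaNaNa", "aPPle"));  // aPPle
        System.out.println(matchCaseSketch("peaR", "Orange"));   // Orange
    }
}
```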
9.3 Implementation Notes
Exploiting Inheritance
As most of the behavior of the automatic spell checker is identical to its parent, very little code needs to be written. Note in the class overview that only two method are present.
- A class must always provide its own constructors. However, the AutomaticSC class requires exactly the same initialization as SpellChecker. Employing the parent constructor super(..) here will make the AutomaticSC constructor short and sweet.
- AutomaticSC behaves identically to SpellChecker on all methods except correctWord(word). Thus, this is the method that needs a new definition as indicated by the @Override annotation. This means AutomaticSC will be a short class: so long as methods in SpellChecker are written correctly, they can be inherited and used without modification and so require no code in AutomaticSC.
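As a rough illustration of these two points, the sketch below uses made-up stand-in classes (BaseChecker and ShoutingChecker are hypothetical and are not part of the project) to show a child constructor that only forwards to super(..) and a single overridden method, with everything else inherited.

```java
// Hypothetical stand-in classes, NOT the project's SpellChecker/AutomaticSC.
class BaseChecker {
    protected String[] words;
    protected boolean ignoreCase;

    public BaseChecker(String[] words, boolean ignoreCase) {
        this.words = words;
        this.ignoreCase = ignoreCase;
    }

    // Inherited unchanged by the child class.
    public boolean isCorrect(String word) {
        for (String w : words) {
            if (ignoreCase ? w.equalsIgnoreCase(word) : w.equals(word)) {
                return true;
            }
        }
        return false;
    }

    // The one method the child redefines; this toy version returns the word as is.
    public String correctWord(String word) {
        return word;
    }
}

class ShoutingChecker extends BaseChecker {
    // All initialization is delegated to the parent constructor.
    public ShoutingChecker(String[] words, boolean ignoreCase) {
        super(words, ignoreCase);
    }

    // Only this method gets a new definition.
    @Override
    public String correctWord(String word) {
        return word.toUpperCase();
    }
}

class InheritanceSketch {
    public static void main(String[] args) {
        BaseChecker c = new ShoutingChecker(new String[]{"apple"}, true);
        System.out.println(c.isCorrect("APPLE"));    // inherited method: true
        System.out.println(c.correctWord("appel"));  // overridden method: APPEL
    }
}
```

The child class contains no copy of isCorrect(..); calling it on a ShoutingChecker runs the parent's code against the fields the parent constructor filled in.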
Finding the Closest Word while Ignoring/Accounting for Case
Like its parent class, AutomaticSC is initialized to either ignore
case or account for it. This is somewhat tricky to handle in the
correctWord(word) method and requires some consideration. The most
common situation is to ignore case so that edit distance is
computed between two completely lower-case words. However, this
leads to potential problems with capitalization for misspelled words
if care is not taken. Examine the corrections below carefully,
particularly the first section which ignores case.
Welcome to DrJava.
> // Ignore case is true
> AutomaticSC asc = new AutomaticSC("english-dict.txt",true);
> asc.correctWord("inhairytense")
inheritance
> // Match capitalization in result
> asc.correctWord("Inhairytense")
Inheritance
> // Match all caps in result
> asc.correctWord("INHAIRYTENSE")
INHERITANCE
> // Don't match weird case mixtures
> asc.correctWord("InhairyTENSE")
inheritance
> asc.correctWord("InHAIRYTENSE")
inheritance
> // Accounting for case leads to interesting results in edit distance
> AutomaticSC asc = new AutomaticSC("english-dict.txt",false);
> asc.correctWord("inhairytense")
inheritance
> asc.correctWord("Inhairytense")
carotene
> asc.correctWord("INHAIRYTENSE")
NYSE
> asc.correctWord("InHAIRYTENSE")
AIDS
> asc.correctWord("InhairyTENSE")
hairy
To ease the task of matching case, define a public helper method
public static String matchCase(String model, String source)
The intent of this method is to help honor a few common letter case
patterns: capitalized words and words in all caps. The model
parameter should be checked for being one of
- All capital letters
- One capital letter followed by all lower case
- Neither of the above
The source
word should be transformed to match the pattern
established by model
and returned. Below are examples of the
different situations along with examples of parameters and expected
return value.
Situation | model | source | return |
---|---|---|---|
All Caps | BANANA | apple | APPLE |
All Caps | PEAR | orange | ORANGE |
Capitalized | Banana | orange | Orange |
Capitalized | Apple | pear | Pear |
Neither | banana | apple | apple |
Neither | banana | Apple | Apple |
Neither | BaNaNa | aPPle | aPPle |
Neither | peaR | Orange | Orange |
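One possible way to express the three cases in the table is sketched below. This is only a sketch under stated assumptions: the class name MatchCaseSketch is made up, empty strings are returned untouched, and in the Capitalized case the rest of source is lower-cased, a detail the table does not settle because all of its Capitalized sources are already lower case.

```java
class MatchCaseSketch {
    // Sketch of the case-matching rules from the table above.
    public static String matchCase(String model, String source) {
        if (model.isEmpty() || source.isEmpty()) {
            return source;  // assumption: nothing to match, return as is
        }
        String upper = model.toUpperCase();
        String lower = model.toLowerCase();

        // All Caps: model equals its upper-case form and contains letters.
        if (model.equals(upper) && !model.equals(lower)) {
            return source.toUpperCase();
        }

        // Capitalized: one capital letter followed by all lower case.
        String capitalized = upper.substring(0, 1) + lower.substring(1);
        if (model.equals(capitalized) && !model.equals(lower)) {
            // Assumption: the remainder of source is lower-cased.
            return source.substring(0, 1).toUpperCase()
                 + source.substring(1).toLowerCase();
        }

        // Neither: return source unchanged.
        return source;
    }

    public static void main(String[] args) {
        System.out.println(matchCase("BANANA", "apple"));   // APPLE
        System.out.println(matchCase("Banana", "orange"));  // Orange
        System.out.println(matchCase("BaNaNa", "aPPle"));   // aPPle
        System.out.println(matchCase("peaR", "Orange"));    // Orange
    }
}
```

Comparing model against transformed copies of itself avoids a character-by-character loop; a loop using Character.isUpperCase(..) would be an equally reasonable design.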
The overarching correctWord(word)
behavior for AutomaticSC
is the following.
- If ignoreCase is false, use the parameter word and dictionary words as is with editDistance(..). Return the closest word and don't use matchCase(..).
- If ignoreCase is true, convert the parameter word and dictionary words to a uniform case (upper or lower will work) before passing them into editDistance(..). The closest word returned should then be run through matchCase(..) to produce the results.
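The search for the closest word described in the two bullets above might be sketched as follows. Several things here are assumptions: the class and method names are made up, the editDistance(..) shown is a plain Levenshtein stand-in rather than the provided StringComparison version (whose exact signature and costs are not given in this section), and the dictionary is passed in as an array.

```java
class ClosestWordSketch {
    // Stand-in Levenshtein distance with unit costs; the provided
    // StringComparison.editDistance(..) may behave differently.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            int[] cur = new int[b.length() + 1];
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int sub = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                cur[j] = Math.min(sub, Math.min(prev[j] + 1, cur[j - 1] + 1));
            }
            prev = cur;
        }
        return prev[b.length()];
    }

    // Returns the dictionary entry closest to word, or null for an empty
    // dictionary. Assumption: the entry is returned as stored in dict.
    static String closestWord(String word, String[] dict, boolean ignoreCase) {
        String target = ignoreCase ? word.toLowerCase() : word;
        String best = null;
        int bestDist = Integer.MAX_VALUE;
        for (String entry : dict) {
            String candidate = ignoreCase ? entry.toLowerCase() : entry;
            int d = editDistance(target, candidate);
            if (d < bestDist) {      // strict < keeps the earlier word on ties
                bestDist = d;
                best = entry;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[] dict = {"pear", "peak", "apple"};
        System.out.println(closestWord("peat", dict, true));   // pear (ties with peak)
        System.out.println(closestWord("APPEL", dict, true));  // apple
    }
}
```

When ignoreCase is true, the value returned by a search like this would then be passed through matchCase(word, result) before being returned from correctWord(word).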
9.4 (20%) Manual Inspection Criteria for AutomaticSC grading
- The constructor is very short by employing the initialization that is performed in the parent class through use of super(..)
- Methods are not replicated from the parent class. Only the required correctWord(word) method overrides parent behavior while other methods are inherited by leaving them unspecified.
- The matchCase(..) method does clean case analysis to determine if the model parameter is all caps, capitalized, or neither and simple transformations to the source to cause it to match.
- correctWord(word) clearly incorporates the ignoreCase field, makes use of matchCase(..), and is specified in a clean and readable fashion.
10 Interactive Spell Checking
Most interactive document editors (MS Word, Open Office, Google Docs)
provide an interactive spell checker which searches the
document from beginning to end, presenting the user with misspelled
words and prompting for corrections. This is the purpose of the
InteractiveSC class which extends SpellChecker.
- isCorrect(word) functions identically to SpellChecker.
- Calling correctWord(word) will prompt the user for a correction (even if the word is correct) which will then be returned.
- correctDocument(doc) will identify misspelled words, show the line on which they appear using the doc.currentLine() method with the word "highlighted" by asterisks to show it is incorrect, then prompt for a correction via correctWord(word).
Since correctWord(word) and correctDocument(doc) behave differently
from the parent methods, they should be overridden.
Since a user will be communicating with instances of the
InteractiveSC
class, the constructor for the class takes two
additional parameters aside from the dictionary file and whether to
ignore case.
- A Scanner which will provide input to the spell checker. The provided scanner should be used as given, not re-initialized. The source for the scanner may be System.in which will read from what the user types or it may be a string source to facilitate testing. Whenever the spell checker requires input such as for a word correction, read it from the scanner provided in the constructor.
- A PrintWriter which allows output to be created by the spell checker. Whenever the spell checker wishes to print a prompt, it should employ the PrintWriter methods such as println(..) and printf(..) to do so. It is tempting to use System.out.println(..) for all output but there are cases in which output should be re-directed so it doesn't appear immediately on the screen such as while running tests. Use of the PrintWriter for output allows that to happen here.
Both these arguments are stored in fields of the class mentioned in its architecture.
10.1 Demo Usage
Note carefully how the Scanner stdin and PrintWriter stdout are
initialized and provided as arguments to the InteractiveSC
constructor to allow the user to directly interact with the spell checker.
Also note the format of information printed by the interactive
spell checker: it is preceded by the @ symbol to distinguish it from
other prompts.
Welcome to DrJava.
> import java.util.*;
> import java.io.*;
> // Make initialization easy
> Scanner stdin = new Scanner(System.in);
> PrintWriter stdout = new PrintWriter(System.out,true);
> // Create an interactive spell checker using english-dict.txt
> InteractiveSC sc = new InteractiveSC("english-dict.txt",true,stdin,stdout);
> sc.isCorrect("dork") // in the english dictionary
true
> String s;
> sc.isCorrect("dorkus") // not in the english dictionary
false
> s = sc.correctWord("dorkus"); // prompt for adding during correction
@- Correction for **dorkus**:
dork
@ Corrected to: dork
> s
dork
> sc.isCorrect("cheese") // in english dictionary
true
> sc.isCorrect("cheeze") // not in english or personal dictionary
false
> s = sc.correctWord("cheeze"); // prompt for adding during correction
@- Correction for **cheeze**:
cheese
@ Corrected to: cheese
> s
cheese
> // Correct a whole document interactively
> Document doc = new Document("One potatoe, two potatoe, three potatoe, four.");
> sc.correctDocument(doc);
@ MISSPELLING in: One **potatoe**, two potatoe, three potatoe, four.
@- Correction for **potatoe**:
potato
@ Corrected to: potato
@ MISSPELLING in: One potato, two **potatoe**, three potatoe, four.
@- Correction for **potatoe**:
potatoes
@ Corrected to: potatoes
@ MISSPELLING in: One potato, two potatoes, three **potatoe**, four.
@- Correction for **potatoe**:
potatoes
@ Corrected to: potatoes
> doc.toString()
One potato, two potatoes, three potatoes, four.
10.2 Class Architecture
public class InteractiveSC extends SpellChecker{
  // A spell checker which interactively prompts users for spelling
  // corrections. It inherits much of its functionality from
  // SpellChecker but the behavior of correctWord(w) and
  // correctDocument(d) is modified from the parent version.

  protected Scanner input;
  // Scanner to read input from a user. The scanner should be provided
  // in the constructor and should not be created. It may be connected
  // to System.in for true interactive use or may be fixed input
  // from a string used for testing.

  protected PrintWriter output;
  // PrintWriter used to write output for a user. It should be
  // provided in the constructor and should not be created. It may be
  // connected to System.out to write to the screen or may write to a
  // temporary buffer during tests.

  public InteractiveSC(String dictFilename, boolean ignoreCase, Scanner input, PrintWriter output);
  // Constructor for the interactive spell checker. Arguments
  // dictFilename and ignoreCase should be used to invoke the super
  // class constructor. The input and output parameters should be set
  // to the associated fields of this class.

  @Override
  public String correctWord(String word);
  // Prompt the user for a correction using a prompt with the format:
  //
  // @- Correction for **potatoe**:
  //
  // where "potatoe" is replaced with the misspelled word. Read input
  // from the user and return the provided correction. Before
  // returning, print the correction in a message formatted:
  //
  // @ Corrected to: potato
  //
  // Note that this method overrides the version of correctWord(w)
  // from the parent class. Like the parent version, it will produce
  // corrections irrespective of whether the given word is in the
  // dictionary.

  @Override
  public void correctDocument(Document doc);
  // Starting from the beginning of the document, apply corrections to
  // all misspelled words. When a misspelled word is found, print a
  // message and line of the document with the misspelled word
  // highlighted as in
  //
  // @ MISSPELLING in: One **potatoe**, two potatoe, three potatoe, four.
  //
  // Then prompt the user for a correction as in
  //
  // @- Correction for **potatoe**:
  //
  // Print newlines at the end of both messages. Printing to the
  // screen should use the output PrintWriter tracked by the
  // interactive spell checker. Reading input should use the input
  // Scanner tracked by the spell checker. Note that this method
  // overrides the version of correctDocument(d) from the parent
  // class.
}
10.3 Implementation Notes
Prompts
The convention for prompts in the InteractiveSC
is as follows.
@ This is information and requires no input
@- This is a prompt and input should be entered on the next line
userInput
Both types of prompts start with the @
symbol but prompts which
require user responses start with @-
.
Initialization of Input and Output
From the demo above the following initialization
Scanner stdin = new Scanner(System.in);
PrintWriter stdout = new PrintWriter(System.out,true);
InteractiveSC sc = new InteractiveSC("english-dict.txt",true,stdin,stdout);
is a good way to get an InteractiveSC
up and running. The extra
parameter to the PrintWriter
constructor ensures that every print
flushes output to the screen so that it appears immediately.
Without this option, one may need to manually flush output via
stdout.flush();
in order to see anything printed to the screen.
Input Reading
It is assumed that corrections for words will always be another single
word. While this is not too general, it suits the needs of the moment
without complicating input for the Scanner
. Use the next()
method
of Scanner
to read corrected words.
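To make the role of the Scanner and PrintWriter concrete, here is a minimal sketch of a prompt-and-read step driven by a string-backed Scanner and a buffer-backed PrintWriter, roughly as a test might do. The class PromptSketch and the helper promptForCorrection(..) are hypothetical names, and the exact whitespace in the messages is an assumption; only the @- and @ message formats come from this spec.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Scanner;

class PromptSketch {
    // Hypothetical helper: print a prompt on output, read one word from
    // input with next(), echo the correction, and return it.
    static String promptForCorrection(String word, Scanner input, PrintWriter output) {
        output.println("@- Correction for **" + word + "**:");
        String correction = input.next();
        output.println("@ Corrected to: " + correction);
        return correction;
    }

    public static void main(String[] args) {
        // Fixed input from a string instead of System.in.
        Scanner input = new Scanner("potato");
        // Output captured in a buffer instead of going to the screen.
        StringWriter buffer = new StringWriter();
        PrintWriter output = new PrintWriter(buffer, true);

        String result = promptForCorrection("potatoe", input, output);

        System.out.println("returned: " + result);
        System.out.print(buffer.toString());
    }
}
```

Because the helper only touches the Scanner and PrintWriter it was handed, the same code works unchanged with new Scanner(System.in) and new PrintWriter(System.out,true) for real interactive use.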
10.4 (5%) Manual Inspection Criteria for InteractiveSC grading
- Only the required methods are overridden from the parent class; isCorrect(word) does not require modification
- Only the required new fields for the InteractiveSC are defined; the dictionary and treatment of case fields are inherited and do not need to be redefined.
11 Personal Dictionary Checker
Most spell check systems have a system dictionary which is not changed
(english-dict.txt
in our case) but also allow users to define a
personal dictionary with words that they consider correct. This is
important for many domains to accommodate technical terms specific to
the style of writing such as "sith," "kyber," and "lightsaber."
To facilitate this functionality, the PersonalSC
extends the
InteractiveSC
by allowing a personal dictionary of words to be
used. This is a file similar to english-dict.txt
except that it
may be altered by the user, either by editing it directly or through
the methods of the PersonalSC
class.
Most methods for PersonalSC
are identical to InteractiveSC
with
the exception of correctWord(word)
which operates as follows.
- If the word is considered correct, act exactly as the parent method, prompting for a potential correction
- Otherwise, prompt the user for a yes/no answer on whether the word should be added to the personal dictionary
  - If no, act exactly as the parent version of correctWord(word) does
  - If yes, expand the array associated with the personal dictionary and add the new word on. Return the word unaltered
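The decision flow above leans on calling the parent's version of an overridden method through super. The sketch below shows that pattern with toy stand-in classes; PromptingChecker and AddingChecker are hypothetical names with a one-word dictionary, and the step that actually grows the personal dictionary is left as a comment.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Scanner;

// Hypothetical stand-ins; the real classes carry more state and behavior.
class PromptingChecker {
    protected Scanner input;
    protected PrintWriter output;

    PromptingChecker(Scanner input, PrintWriter output) {
        this.input = input;
        this.output = output;
    }

    public boolean isCorrect(String word) {
        return word.equals("potato");   // toy one-word dictionary
    }

    public String correctWord(String word) {
        output.println("@- Correction for **" + word + "**:");
        String fix = input.next();
        output.println("@ Corrected to: " + fix);
        return fix;
    }
}

class AddingChecker extends PromptingChecker {
    AddingChecker(Scanner input, PrintWriter output) {
        super(input, output);
    }

    @Override
    public String correctWord(String word) {
        if (!isCorrect(word)) {
            output.println("@- **" + word + "** not in dictionary add it? (yes / no)");
            if (input.next().equals("yes")) {
                // A real implementation would append word to the
                // personal dictionary array here.
                return word;
            }
        }
        // Correct word, or the user declined: reuse the parent's prompt.
        return super.correctWord(word);
    }
}

class SuperCallSketch {
    public static void main(String[] args) {
        Scanner in = new Scanner("no potato yes");
        PrintWriter out = new PrintWriter(new StringWriter(), true);
        AddingChecker c = new AddingChecker(in, out);
        System.out.println(c.correctWord("tumato"));  // potato (declined, then corrected)
        System.out.println(c.correctWord("tumato"));  // tumato (accepted as is)
    }
}
```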
PersonalSC
instances can be asked about the size and contents of
their personal dictionary and ultimately have their contents written
back to the file from which they were read via the
savePersonalDict()
method.
11.1 Demo Usage
Welcome to DrJava.
> import java.util.*;
> import java.io.*;
> // Make initialization easy
> Scanner stdin = new Scanner(System.in);
> PrintWriter stdout = new PrintWriter(System.out,true);
> // Create a spell checker with a personal dictionary from personal-dict.txt
> PersonalSC sc = new PersonalSC("english-dict.txt",true,stdin,stdout,"personal-dict.txt");
> String s;
> sc.isCorrect("dork") // in the english dictionary
true
> sc.isCorrect("dorkus") // not in the english or personal dictionary
false
> s = sc.correctWord("dorkus"); // prompt for adding during correction
@- **dorkus** not in dictionary add it? (yes / no)
yes
> s
dorkus
> sc.isCorrect("dorkus") // new word in personal dictionary
true
> sc.getAllPersonalDictWords() // show personal dictionary words
dorkus
> sc.isCorrect("cheese") // in english dictionary
true
> sc.isCorrect("cheeze") // not in english or personal dictionary
false
> s = sc.correctWord("cheeze"); // prompt for adding during correction
@- **cheeze** not in dictionary add it? (yes / no)
yes
> s
cheeze
> sc.isCorrect("cheeze") // new word in personal dictionary
true
> sc.getAllPersonalDictWords() // show personal dictionary words
dorkus
cheeze
> sc.savePersonalDict() // save personal dictionary words to file personal-dict.txt
@ Personal dictionary written to file personal-dict.txt
> // Create a new spell checker which is initialized with the personal dictionary
> PersonalSC sc2 = new PersonalSC("english-dict.txt",true,stdin,stdout,"personal-dict.txt");
> sc2.getAllPersonalDictWords() // show personal dictionary words
dorkus
cheeze
> sc2.isCorrect("dorkus") // already in personal dictionary
true
> sc2.isCorrect("cheeze") // already in personal dictionary
true
> // Spell checkers are independent of one another
> s = sc.correctWord("potatoe");
@- **potatoe** not in dictionary add it? (yes / no)
yes
> s
potatoe
> sc.getAllPersonalDictWords() // new word in personal dict
dorkus
cheeze
potatoe
> sc2.getAllPersonalDictWords() // same personal dict as before
dorkus
cheeze
> // Adding to the personal dictionary means later words in a doc may be correct
> Document doc = new Document("One potatoe, two potatoe, three potatoe, four.");
> sc.correctDocument(doc);
> doc.toString()
One potatoe, two potatoe, three potatoe, four.
> sc.getAllPersonalDictWords()
dorkus
cheeze
potatoe
> // Correct first misspellings but accept the remainder
> Document doc = new Document("One tumato, two tumato, three tumato, four.")
> sc.correctDocument(doc);
@ MISSPELLING in: One **tumato**, two tumato, three tumato, four.
@- **tumato** not in dictionary add it? (yes / no)
no
@- Correction for **tumato**:
potato
@ Corrected to: potato
@ MISSPELLING in: One potato, two **tumato**, three tumato, four.
@- **tumato** not in dictionary add it? (yes / no)
yes
> doc.toString()
One potato, two tumato, three tumato, four.
> sc.getAllPersonalDictWords()
dorkus
cheeze
potatoe
tumato
11.2 Class Architecture
public class PersonalSC extends InteractiveSC {
  // A spell checker which allows use of a personal dictionary. The
  // personal dictionary is initially read from a file though the file
  // may be non-existent in which case the personal dictionary is empty to
  // begin with. When checking for correctness of words, both the
  // system dictionary and personal dictionary are checked. If a
  // misspelled word is to be corrected, the user is interactively
  // prompted as to whether the word should instead be added to the
  // personal dictionary. The class can save the personal dictionary
  // back to the file from which it was read.

  protected String personalDictFilename;
  // Name of the file for the personal dictionary

  protected String [] personalDictWords;
  // Personal dictionary words

  public PersonalSC(String dictFilename, boolean ignoreCase, Scanner input, PrintWriter output, String personalDictFilename);
  // Construct a spell checker with a personal, modifiable dictionary.
  // Arguments are identical to InteractiveSC except for the final
  // argument which is a file containing the personal dictionary
  // words. This file should be formatted in the same way as a normal
  // dictionary and the contents used to initially fill in the
  // personalDictWords field.

  public int personalDictSize();
  // Return the size of the personal dictionary used by this
  // spellchecker which is the size of the personalDictWords array.

  @Override
  public boolean isCorrect(String word);
  // Check if the word is correct according to the same methodology as
  // the parent class. If not, check whether the word appears in the
  // personal dictionary associated with this spell checker. Honor the
  // ignoreCase setting when checking the personal dictionary.

  @Override
  public String correctWord(String word);
  // If the parameter word is not in the system or personal
  // dictionary, prompt the user on whether they would like to add it
  // to the dictionary as in
  //
  // @- **tumato** not in dictionary add it? (yes / no)
  //
  // If the response is "yes" (read using the spell checker's scanner),
  // append it to the personalDictWords. You may use library methods
  // from java.util.Arrays to make the append easier. After
  // appending, return the word as it is now considered correct.
  //
  // If the answer on whether to add is not "yes" (e.g. "no"), prompt
  // the user for a correction in the same way that the parent class does.

  public String getAllPersonalDictWords();
  // Return a string showing all words currently in the spell checker's
  // personal dictionary, one word per line.

  public void savePersonalDict() throws Exception;
  // Write the contents of personalDictWords to the file from which
  // they were initially read (personalDictFilename). Write one word
  // per line. Print a message to the screen indicating the
  // dictionary has been saved in the format:
  //
  // @ Personal dictionary written to file personal-dict.txt
  //
  // where the last word on the line is the name of the file where the
  // contents are saved.
}
11.3 Implementation Notes
- It is likely that you will want a helper method to expand the array associated with personalDictWords. This will make the code in correctWord(word) a bit shorter and cleaner. You are free to use library methods such as those in java.util.Arrays for this purpose.
- It is possible that the personal dictionary file is empty or non-existent. Use the readAllLines(filename) method to aid with this which should return an empty array if the file does not exist.
- Make sure to retain the name of the personal dictionary file as the contents of the personalDictWords array must be written back to that same file on a call to savePersonalDict(). The PrintWriter class is useful for file writing.
- Whenever writing files, always make sure to invoke the close() method of PrintWriter to ensure contents are actually written.
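The array-growing and file-writing notes above could look roughly like the following sketch. The class PersonalDictSketch, the helpers append(..) and save(..), and the file name sketch-personal-dict.txt are all made up for illustration; Arrays.copyOf(..) and PrintWriter are standard library.

```java
import java.io.PrintWriter;
import java.util.Arrays;

class PersonalDictSketch {
    // Hypothetical helper: return a new array one slot longer than
    // words, with word stored in the final slot.
    static String[] append(String[] words, String word) {
        String[] bigger = Arrays.copyOf(words, words.length + 1);
        bigger[words.length] = word;
        return bigger;
    }

    // Hypothetical helper: write one word per line, always closing the
    // PrintWriter so the contents actually reach the file.
    static void save(String[] words, String filename) throws Exception {
        PrintWriter out = new PrintWriter(filename);
        try {
            for (String w : words) {
                out.println(w);
            }
        } finally {
            out.close();
        }
    }

    public static void main(String[] args) throws Exception {
        String[] dict = new String[0];          // empty personal dictionary
        dict = append(dict, "dorkus");
        dict = append(dict, "cheeze");
        System.out.println(Arrays.toString(dict));  // [dorkus, cheeze]
        save(dict, "sketch-personal-dict.txt");
    }
}
```

Since arrays have a fixed length, the helper returns a new array and the caller must store it back into the field that holds the dictionary.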
11.4 (5%) Manual Inspection Criteria for PersonalSC grading
- Make effective use of the parent class version of correctWord(word) during implementation of that method
- Employ the readAllLines(filename) method to ease the task of reading in the personal dictionary.
- Clean code is present to enlarge the personalDictWords array to accommodate new words. Library methods are encouraged to facilitate this process.
- Standard arrays are employed for the personalDictWords rather than other more advanced data structures such as ArrayList
12 Honors Problem
The honors problem will be posted at a later time.