We have been using the String data type since the first week of this term. Now that we have also covered the basics of objects, it is time to revisit the String data type. In Java, the String data type is a class. A String object is essentially a container for an array of characters, and provides a set of useful methods for manipulating that array of characters.
We haven't really noticed that Strings are objects yet, because Java provides a number of shortcut forms for use with Strings that mask the fact that Strings are really objects. For example, up to this point we have been writing code like
String result = "The value of x is " + x + ".";
This code uses a collection of shortcuts to hide some of the details. Removing the shortcuts reveals the object nature of Strings:
String result = new String("The value of x is ");
String xStr = Integer.toString(x);
result = result.concat(xStr);
result = result.concat(".");
The static method toString in the Integer class is used to convert integers to Strings. The concat method used here is a method in the String class that returns the result of concatenating a String onto the end of a String object.
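One detail worth seeing in action is that concat does not change the String it is called on; it hands back a new String. The following short sketch illustrates this. The class name ConcatDemo and the helper method label are made up for this illustration.

```java
class ConcatDemo {
    // Appends a number and a period to a prefix using explicit method calls
    static String label(String prefix, int x) {
        String result = prefix.concat(Integer.toString(x));
        return result.concat(".");
    }

    public static void main(String[] args) {
        String start = "The value of x is ";
        String message = label(start, 42);
        System.out.println(message);  // The value of x is 42.
        // concat returned a new String, so start still holds the original text
        System.out.println(start);
    }
}
```

This is why the code above assigns the result of each concat call back to result. Calling result.concat(xStr) without the assignment would build the longer String and then throw it away.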
The String class contains a long list of useful methods for doing manipulations with Strings. In these lecture notes I will discuss some of the more important methods of the String class. You will also find a discussion of many String methods in chapter 8 of the text.
We have already seen that we can create a Scanner designed to read data from a text file. Scanners are quite useful in situations where we have to read a mixture of text and other data types from a text file. In situations where we have to read only text from a text file and intend to read the text one line at a time, the BufferedReader class is a better choice. The following short program, ReadLines.java, demonstrates how to use a BufferedReader to open a text file, read the contents of the text file one line at a time and print each line to the output window.
import java.io.*;
public class ReadLines {
    // Opens the named text file, or exits the program if it cannot be opened
    public static BufferedReader makeReader(String fileName) {
        BufferedReader theReader = null;
        try {
            theReader = new BufferedReader(new FileReader(fileName));
        } catch (Exception ex) {
            System.out.println("Unable to open file " + fileName);
            System.out.println(ex.toString());
            System.exit(1); // a nonzero exit status signals failure
        }
        return theReader;
    }
    public static void main(String[] args) throws Exception {
        BufferedReader input = makeReader("test.txt");
        // readLine returns null once the end of the file is reached
        String nextLine = input.readLine();
        while (nextLine != null) {
            System.out.println(nextLine);
            nextLine = input.readLine();
        }
        input.close();
    }
}
The BufferedReader class contains a method readLine that will return the next line of text from the file. When we reach the end of the text file, readLine will return the special value null to indicate that no more lines are available to read from the text file.
One complication in working with the BufferedReader class, or any class designed to work with files, is that methods in these classes frequently have the potential to throw exceptions. The code in the two methods above demonstrates two different approaches to handling methods that may throw an exception.
The makeReader method demonstrates the use of a try-catch block. This construct has the following form
try {
    // Code that may throw an exception goes here
} catch (Exception ex) {
    // Code to respond to an exception goes here
}
Whenever you write code containing a method call that may throw a checked exception, such as the exceptions thrown by the file classes used here, you have to do something to respond to that potential exception. If you do nothing, the compiler will report an error about an unhandled exception. The most common approach is to place the offending code in the try section of a try-catch block. If the code in the try block throws an exception, execution shifts to the code in the catch portion of the construct. The catch block typically issues an error message, perhaps prints some further details about the exception, and then attempts to recover from it.
In cases where the exception makes it difficult or impossible to continue the program successfully, the catch block can also include code that exits the program. The example shown above uses this approach: if we are unable to open the file containing the text we want to read, the System.exit() method call causes the program to terminate.
Another approach to dealing with exceptions is to not handle them. Any method that contains code that may throw an exception can essentially pass the buck to its caller by letting the exception propagate. To let any exceptions generated in a method pass up to the caller, simply add throws Exception to the method declaration. The main method in the example above uses this approach. The readLine method of the BufferedReader class may throw an exception when we call it, so we either have to enclose that method call in a try-catch construct or add the throws Exception declaration to the enclosing method.
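To see the two approaches side by side, here is a small sketch that counts the lines in some text both ways. It is an illustration only: the class name CountLines and its method names are made up, the text comes from a StringReader rather than a file so that the example runs without test.txt, and it names the more specific IOException where the examples above use Exception.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class CountLines {
    // Approach 1: handle the exception here with try-catch.
    // Returns -1 if reading fails.
    static int countLinesHandled(BufferedReader input) {
        int count = 0;
        try {
            while (input.readLine() != null) {
                count++;
            }
        } catch (IOException ex) {
            System.out.println("Unable to read: " + ex);
            return -1;
        }
        return count;
    }

    // Approach 2: declare the exception and let the caller deal with it.
    static int countLinesDeclared(BufferedReader input) throws IOException {
        int count = 0;
        while (input.readLine() != null) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        String text = "first line\nsecond line\nthird line";
        // Both calls print 3
        System.out.println(countLinesHandled(new BufferedReader(new StringReader(text))));
        System.out.println(countLinesDeclared(new BufferedReader(new StringReader(text))));
    }
}
```

Notice that main has to carry its own throws clause because it calls countLinesDeclared. The first method needs no such clause, at the price of having to decide for itself what to return when something goes wrong.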
The next example program we are going to look at, BreakLines.java, reads some text from a text file and prints it to the console window in a neatly formatted form. Specifically, the text file contains several paragraphs of text arranged one paragraph per line. Since the lines are typically quite long, we want to break each paragraph into one or more lines of more manageable length and then print the lines to the console window. Here is the code for a method that can take a paragraph, break it into lines of a specified length, and then print those lines one at a time to the console window.
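public static void breakAndPrintLines(String paragraph, int lineWidth) {
    String rest = paragraph;
    while (rest.length() > lineWidth) {
        // Scan backward from the last allowed position to find a space
        int pos = lineWidth - 1;
        while (pos >= 0 && rest.charAt(pos) != ' ') {
            pos--;
        }
        if (pos < 0) {
            // No space found: break in the middle of the long word
            pos = lineWidth;
            System.out.println(rest.substring(0, pos));
            rest = rest.substring(pos);
        } else {
            String first = rest.substring(0, pos);
            System.out.println(first);
            rest = rest.substring(pos + 1); // skip over the space
        }
    }
    System.out.println(rest);
}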
As mentioned earlier, a String is really an array of characters. As in all arrays, the characters are numbered with indices from 0 up to the length of the string minus one. The method substring(start,end) in the String class returns a new String containing the characters copied from the range from index start to index end-1 of the original character array. The method substring(start) returns a String with a copy of the characters from position start to the end of the original String. The code above uses the substring method to break the original paragraph into a line of text with length less than or equal to the required line length and the rest of the paragraph.
To determine where to break the string into lines, we use the charAt method. Given an index in the String's character array, charAt returns the character at that index. The code above uses the test
rest.charAt(pos) != ' '
in a loop to scan backward from a starting point until a space is encountered. When we have found the location of a space, we can use the substring methods to break the text into substrings before and after the space.
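Here is a small sketch of the same backward scan applied to one short string, which makes the index arithmetic easier to follow. The class name SubstringDemo and the helper lastSpaceAtOrBefore are made up for this illustration.

```java
class SubstringDemo {
    // Returns the index of the last space at or before position start,
    // or -1 if there is none, by scanning backward with charAt
    static int lastSpaceAtOrBefore(String text, int start) {
        int pos = start;
        while (pos >= 0 && text.charAt(pos) != ' ') {
            pos--;
        }
        return pos;
    }

    public static void main(String[] args) {
        String text = "Hello big world";              // indices 0 through 14
        int pos = lastSpaceAtOrBefore(text, 12);
        System.out.println(pos);                      // 9
        System.out.println(text.substring(0, pos));   // Hello big
        System.out.println(text.substring(pos + 1));  // world
    }
}
```

The call substring(0, pos) stops just before the space, and substring(pos + 1) starts just after it, so the space itself ends up in neither piece.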
A more convenient alternative to the charAt method for some applications is the indexOf method. indexOf(character) returns the index of the first location in the String where character is found. indexOf(character,position) returns the index of the first location at or after position at which character is found. Both forms return -1 if the character is not found.
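The following sketch shows both forms of indexOf at work, including the -1 that comes back when the search fails. The class name IndexOfDemo and the helper countChar are made up for this illustration.

```java
class IndexOfDemo {
    // Counts the occurrences of ch in text by repeated calls to indexOf
    static int countChar(String text, char ch) {
        int count = 0;
        int pos = text.indexOf(ch);
        while (pos != -1) {
            count++;
            // Resume the search just past the occurrence we found
            pos = text.indexOf(ch, pos + 1);
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "one two three";
        System.out.println(text.indexOf(' '));     // 3
        System.out.println(text.indexOf(' ', 4));  // 7
        System.out.println(text.indexOf('z'));     // -1
        System.out.println(countChar(text, ' '));  // 2
    }
}
```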
The next example program we are going to look at, ReplaceTags.java, performs a search and replace task on the text in an input file. The input file contains source text containing tags that will have to be replaced with appropriate text. The tags are in the form <...>, so the program will have to search for the tags, prompt the user for text to replace the tags, and then print the original text to the console with the tags replaced by the user's input values.
To find the tags in the original text, the program will start by using the String class's split method to split the input text into an array of individual words. split takes a single String parameter containing a regular expression that describes the delimiters, which are the characters that mark the points at which we want the String broken. For example, the pattern "[.!?]" matches any single period, exclamation point, or question mark, so if the String paragraph contains a paragraph of text we can break the paragraph into sentences with the method call
String[] sentences = paragraph.split("[.!?]");
The only problem with the split method is that it discards the delimiters. Thus, in the example above we would get the paragraph broken into sentences, but the punctuation at the end of each sentence would be discarded.
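A quick sketch makes the effect visible. The class name SplitDemo and the sample paragraph are made up for this illustration; the brackets in the output are only there to show exactly where each piece begins and ends.

```java
class SplitDemo {
    // Breaks a paragraph at each period, exclamation point, or question mark
    static String[] sentences(String paragraph) {
        return paragraph.split("[.!?]");
    }

    public static void main(String[] args) {
        String paragraph = "It works. Does it? Yes!";
        for (String s : sentences(paragraph)) {
            System.out.println("[" + s + "]");
        }
        // Prints:
        // [It works]
        // [ Does it]
        // [ Yes]
    }
}
```

Besides losing the punctuation, the second and third pieces keep the space that followed the previous delimiter, since that space is not itself a delimiter.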
To isolate the tags in the input text, the program will start by calling split with a parameter of " " to split the input text into an array of individual words. To pick out the words that are actually tags, we use the String class's matches method. matches takes a single String parameter containing a regular expression pattern and returns true only if the entire String fits that pattern. The pattern string "<.*>.*" matches any text that consists of '<' followed by a sequence of 0 or more characters, the '>' character, and 0 or more further characters. The trailing .* allows for punctuation attached to the end of a tag, as in "<item>,". You can read more about matches and patterns in section 8.2.7 of the text or in the author's supplement on regular expressions.
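Because matches tests the whole String against the pattern, this particular pattern accepts only words that begin with a tag. The sketch below tries it on a few sample words. The class name MatchesDemo and the helper startsWithTag are made up for this illustration.

```java
class MatchesDemo {
    // True if the word begins with a tag of the form <...>
    static boolean startsWithTag(String word) {
        return word.matches("<.*>.*");
    }

    public static void main(String[] args) {
        System.out.println(startsWithTag("<item>"));   // true
        System.out.println(startsWithTag("<item>,"));  // true
        System.out.println(startsWithTag("item"));     // false
        // The pattern must fit the entire word, so a tag that
        // does not come first is not recognized
        System.out.println(startsWithTag("a<item>"));  // false
    }
}
```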
Once we have split the text into individual words and filtered out and replaced the tags, we will want to reassemble the converted text back into paragraphs to pass to the method that breaks the paragraphs into lines. One way to reassemble the words into paragraphs is to use simple String concatenation. For cases like this, where we will have to do a large number of concatenations, it makes more sense to use a more specialized method of building the result string from its component words. Java has a class called StringBuilder that is designed for precisely this situation. The StringBuilder class contains an append method that we can use to append strings onto a result string. After appending as many strings as we want with append, we can then ask the StringBuilder to hand us the result string by calling the StringBuilder's toString method. This is generally more efficient than using a long sequence of String concatenations.
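As a minimal sketch of the append-then-toString pattern on its own, the following method glues an array of words back together with single spaces between them. The class name JoinWords and the method join are made up for this illustration.

```java
class JoinWords {
    // Joins the words with a single space between neighbors
    static String join(String[] words) {
        StringBuilder builder = new StringBuilder();
        for (int n = 0; n < words.length; n++) {
            if (n > 0) {
                builder.append(' ');  // separator goes before every word but the first
            }
            builder.append(words[n]);
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        String[] words = { "Dear", "Ms.", "Smith," };
        System.out.println(join(words));  // Dear Ms. Smith,
    }
}
```

Each append adds to the same StringBuilder, whereas each + on Strings creates a whole new String containing everything accumulated so far.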
Here now is the code for a couple of methods that use the strategies outlined above to break paragraphs into words, find and replace the tags, and then glue the words back together to make paragraphs.
// console is a Scanner on System.in declared as a static member of the class
public static String processWord(String word) {
    String result = word;
    // Does the word start with a tag?
    if (word.matches("<.*>.*")) {
        // Find the position where the tag ends
        int endPos = word.indexOf('>');
        String tag = word.substring(0, endPos + 1);
        System.out.println("Enter a value for " + tag + ": ");
        result = console.nextLine();
        // If there is any text after the tag, append a copy of it
        // to the result.
        if (endPos < word.length() - 1) {
            String remainder = word.substring(endPos + 1);
            result = result + remainder;
        }
    }
    return result;
}
public static String processParagraph(String paragraph) {
    StringBuilder builder = new StringBuilder();
    String[] fragments = paragraph.split(" ");
    // split can return an empty array (for example, when the paragraph
    // is a single space), so we do not assume fragments[0] exists
    for (int n = 0; n < fragments.length; n++) {
        if (n > 0) {
            builder.append(' ');
        }
        builder.append(processWord(fragments[n]));
    }
    return builder.toString();
}
One problem with the program ReplaceTags.java shown above is that it prompts the user for a replacement for each tag it encounters, even if the user has already entered a replacement for that tag when it appeared earlier. Clearly, what we need here is a mechanism whereby the program can remember tags and replacement text it has seen before.
To make it possible for the program to remember tag/replacement pairs, we now introduce a Dictionary class:
public class Dictionary {
    private String[] originals;
    private String[] replacements;
    // Unused slots hold the empty string
    public Dictionary(int size) {
        originals = new String[size];
        replacements = new String[size];
        String empty = "";
        for (int n = 0; n < size; n++) {
            originals[n] = empty;
            replacements[n] = empty;
        }
    }
    // Does the dictionary contain this word?
    public boolean contains(String word) {
        int n = 0;
        // Stop at the first unused slot or at the end of the array
        while (n < originals.length && originals[n].length() > 0) {
            if (originals[n].equals(word)) {
                return true;
            }
            n++;
        }
        return false;
    }
    // Add a new word/alt pair to the Dictionary
    public void addEntry(String word, String alt) {
        int n = 0;
        while (n < originals.length && originals[n].length() > 0) {
            if (originals[n].equals(word)) {
                return;
            }
            n++;
        }
        // If the dictionary is full, the new pair is silently dropped
        if (n < originals.length) {
            originals[n] = word;
            replacements[n] = alt;
        }
    }
    // Find the replacement for the given word,
    // or return the empty string if the word is not present
    public String getReplacement(String word) {
        int n = 0;
        while (n < originals.length && originals[n].length() > 0) {
            if (originals[n].equals(word)) {
                return replacements[n];
            }
            n++;
        }
        return "";
    }
}
The Dictionary class has two internal arrays, originals to hold tags, and replacements to hold the corresponding replacement text for the tags. The class contains methods to determine whether or not a given tag is in the dictionary, add tag/replacement pairs, and fetch the replacement for a given tag.
The final version of our program, MailMerge.java, uses the Dictionary class to remember replacement text for tags it has already seen and thus prompt the user just once for each unique tag in the source file. Here is the code for the method that replaces the tags.
public static String processWord(String word) {
    String result = word;
    // Does the word start with a tag?
    if (word.matches("<.*>.*")) {
        // Find the position where the tag ends
        int endPos = word.indexOf('>');
        String tag = word.substring(0, endPos + 1);
        if (!dictionary.contains(tag)) {
            // First time we have seen this tag: ask the user and remember the answer
            System.out.println("Enter a value for " + tag + ": ");
            String alt = console.nextLine();
            dictionary.addEntry(tag, alt);
            result = alt;
        } else {
            result = dictionary.getReplacement(tag);
        }
        // If there is any text after the tag, append a copy of it
        // to the result.
        if (endPos < word.length() - 1) {
            String remainder = word.substring(endPos + 1);
            result = result + remainder;
        }
    }
    return result;
}
The dictionary and console variables that this method uses are declared and initialized as static member variables of the class containing the processWord method.
private static Scanner console = new Scanner(System.in);
private static Dictionary dictionary = new Dictionary(100);
This is a convenience feature that saves us from having to pass these two around as parameters.
The program MailMerge.java uses the split(), matches(), and substring() methods to separate the input paragraphs into words, search for words containing tags, and separate the tags from the rest of the word as needed. This works just fine, but there is a simpler, more direct approach we could use. We could use the indexOf() method in combination with substring() to break the input paragraphs into chunks of text and tags. For example, given the input paragraph
To order your copy of <item>, send <price> to <mailingAddress> by <date>.
we could split that paragraph into the strings
"To order your copy of " "<item>" ", send " "<price>" " to " "<mailingAddress>" " by " "<date>" "."
Replace the code in the processParagraph and processWord methods of the MailMerge class with methods that use this strategy to break the paragraphs into chunks and then replace the tags. Your code should use only indexOf() and substring() to do its work; it should not have to use split() or matches() at all.
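As a possible starting point, here is a sketch of the chunking step alone; it leaves the tag replacement and the reassembly to you. It is one way to do it, not the required solution. The class name ChunkDemo and the method chunks are made up for this illustration, and the sketch collects the chunks in an ArrayList, which is an assumption on my part since the examples above use plain arrays.

```java
import java.util.ArrayList;
import java.util.List;

class ChunkDemo {
    // Breaks a paragraph into alternating chunks of plain text and <...> tags
    static List<String> chunks(String paragraph) {
        List<String> result = new ArrayList<>();
        int pos = 0;
        while (pos < paragraph.length()) {
            int open = paragraph.indexOf('<', pos);
            int close = (open == -1) ? -1 : paragraph.indexOf('>', open);
            if (close == -1) {
                // No complete tag remains: the rest is plain text
                result.add(paragraph.substring(pos));
                break;
            }
            if (open > pos) {
                result.add(paragraph.substring(pos, open));    // text before the tag
            }
            result.add(paragraph.substring(open, close + 1));  // the tag itself
            pos = close + 1;
        }
        return result;
    }

    public static void main(String[] args) {
        String paragraph =
            "To order your copy of <item>, send <price> to <mailingAddress> by <date>.";
        for (String chunk : chunks(paragraph)) {
            System.out.println("\"" + chunk + "\"");
        }
    }
}
```

Run on the sample paragraph, this prints the nine strings listed above, one per line. A chunk is a tag exactly when it starts with '<', which you can check with charAt(0).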