MCSD Certification Toolkit (Exam 70-483): Programming in C# (2013)

Chapter 11

Input Validation, Debugging, and Instrumentation

What You Will Learn in This Chapter

·        Understanding input validation

·        Using regular expressions

·        Managing data integrity

·        Using preprocessor directives and symbols

·        Using the Debug and Trace classes

·        Tracing, logging, and profiling

WROX.COM CODE DOWNLOADS FOR THIS CHAPTER

You can find the code downloads for this chapter at www.wrox.com/remtitle.cgi?isbn=1118612094 on the Download Code tab. The code is in the chapter11 download and individually named according to the names throughout the chapter.

This chapter explains several topics that at first may seem unrelated. It starts by discussing input validation techniques that a program can use to protect itself from bad values entered by the user.

After the data has entered the program, you still need to manage the data so that it doesn’t become corrupted by incorrect calculations. A useful method to do this is to use preprocessor directives to include testing code during debug builds, but to exclude it from release builds.

That same technique of using preprocessor directives is also useful for studying the program’s activities and performance. It enables you to include or exclude code that traces execution, logs progress, and profiles the application’s performance during the appropriate releases.

This chapter introduces all these topics: input validation, managing data integrity, tracing, logging, profiling, and preprocessor directives that you can use to determine which of those activities occur in various program builds.

Table 11-1 introduces you to the exam objectives covered in this chapter.

Table 11-1: 70-483 Exam Objectives Covered in This Chapter

Objective

Content Covered

Debug applications and implement security

Validate application input. This includes using string methods and regular expressions to validate inputs.
Debug an application. This includes creating and using preprocessor directives, and using the Debug and Trace classes to follow program execution and watch for unexpected values.
Implement diagnostics. This includes tracing, using the profiler, writing to event logs, and using performance counters.

Input Validation

It’s hard enough debugging an application when it contains correct data. It’s even harder if the data it works with is incorrect. If the data is wrong, how can you tell whether invalid results are caused by buggy code or incorrect data?

As is the case with bugs in code, mistakes in data are easiest to correct if you detect them quickly. Ideally you can catch incorrect data as soon as the user enters it. At that point if you can figure out why the data is incorrect, then you can ask the user to fix it.

The following sections describe methods you can use to validate inputs to detect incorrect data.

Avoiding Validation

Before discussing ways to validate data, it’s worth taking a moment to discuss ways to avoid validation.

If the user cannot enter an incorrect value, you don’t need to write code to validate the value. C# programs can use lots of different kinds of controls, and many of those let you restrict the user’s input to valid values.

For example, suppose the program needs the user to enter an integer value between 1 and 10. You could let the user type the value into a TextBox. In that case, the code would need to check the value to make sure the user didn’t enter 100, 0, –8, or ten. However, if the program makes the user select the value from a TrackBar, the user cannot select an invalid value. You can set the TrackBar’s Minimum and Maximum properties and the control does all the validation work for you.

Not only does this save you the trouble of writing and debugging the validation code, it also saves you the trouble of figuring out what to do if the user enters an invalid number. You don’t need to display a message box to the user, and the user’s work flow isn’t interrupted by the message box.

Many controls enable the user to select values of specific types such as colors, dates, files, folders, fonts, numbers, a single item from a list, and multiple items from a list. Whenever you build a program and plan to let the user type something into a TextBox, you should ask yourself whether there is some control that would let the user select the value instead of typing it.

Triggering Validations

Realistically controls can’t help the user select every possible value. To select an arbitrary phone number, a program would probably need to let the user select each digit one at a time, a painfully tedious process. Making the user select an arbitrary name would be even worse.

In cases like these a program should let the user type data but then it must validate the result. The program should verify that the phone number follows the format required by the program’s locale. It can’t do too much validation for a name but can at least verify that it is not blank.

All this raises the question, “When should the program validate data entered by the user?” Should it validate each keystroke? Should it validate when the user enters a value and then moves to a new field? Or should it validate all the values on a form after the user enters all the data and clicks OK?

The answer depends on how often a particular event occurs and how intrusive the validation is. For example, when the user types a value in a TextBox, many keystrokes occur. It would be extremely annoying if the program interrupted the user between every keystroke with an error message.

Instead the program may simply ignore invalid keystrokes. For example, the MaskedTextBox control enables you to specify a mask that the text must match. In the United States you could set the mask to (999)000-0000 to require that the text match a 10-digit phone number format. The user must enter a digit or space for places corresponding to the 9s and must enter a digit for places corresponding to the 0s. The parentheses and dash character are displayed by the control but the user cannot change them. If the user types any other character, the control ignores it.

The MaskedTextBox can’t prevent all invalid inputs. For example, the user could enter (000)000-0000, which is not a valid phone number.

For another example, suppose the user must enter a floating-point value such as 1.73. The program can’t use a TrackBar, ScrollBar, or other control to let the user select a value because those controls select only integers, so you can let the user type a value into a TextBox. While the user types a floating-point value, there are times when the text is not yet a valid value but could be the beginning of one. For example, the value “–.” is not a valid floating-point number but “–.1” is. In that case, the program shouldn’t interrupt the user with an error message after “–.” has been typed. Instead it must wait until the user finishes typing.

When the user moves to a new field, the program can validate the user’s input more fully. In a Windows Forms application, you can use a TextBox’s Validating event to validate its value when focus moves to another control that has its CausesValidation property set to true. The Validating event provides a parameter of type CancelEventArgs that has a Cancel property. If you set this property to true, the program cancels the event that moved focus away from this control. This traps the user in the field that set e.Cancel to true until the user fixes the input problem.

NOTE The form refuses to close as long as a control’s Validating event sets e.Cancel to true, although the form closes if focus never reaches that control. The fact that the form sometimes closes and sometimes doesn’t makes using e.Cancel even more confusing for the user.

Trapping the user in a field disrupts the user’s workflow and forces the user to take immediate action. If the user types “head down” without looking at the screen, it could be a while before the user notices that focus is stuck in the control with the problem.

A better approach, and one taken by many websites these days, is to mark the control so that users can see that it contains an error but to let users continue using other parts of the form until they decide to fix the problem. For example, the program might change a TextBox’s background color to yellow, change its foreground color to red, or display an asterisk next to it.

Later when users click the OK button or otherwise try to accept the values on the form, the code can revalidate the values and display error messages if appropriate.

The following list summarizes the three stages of input validation ranging from most frequent and least intrusive to least frequent and most intrusive:

1. Keystroke validation: The program can ignore any keystrokes that don’t make sense, but be sure to allow values that could turn into something that makes sense as “–.” could turn into “–.123”. Optionally, you could mark the field as containing an invalid value as long as you don’t interrupt the user.

2. Field validation: When focus leaves a field, the program can validate its contents and flag invalid values. Now if the field contains “–.” it is invalid. The program should display an indicator that the value is invalid but should not force the user to correct it yet.

3. Form validation: When the user tries to accept the values on the form, the program should validate all values and display error messages if appropriate. This is the only place where the program should force the user to fix values.

NOTE Form validation is also the only place where the program can perform cross-field validations where the value of one field depends on the values in other fields. For example, the ZIP code 02138 is in Cambridge, MA. If the user enters this ZIP code but the state AZ, something is wrong. Either the ZIP code is incorrect or the state is incorrect (or both).
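The cross-field check described in this note can be sketched in a few lines. This is only an illustration: the CrossFieldValidator class, the ZipMatchesState method, and the one-entry lookup table are hypothetical names invented for this example, and a real program would need a complete ZIP code database.

```csharp
using System;
using System.Collections.Generic;

public static class CrossFieldValidator
{
    // Hypothetical lookup table with a single entry taken from the text.
    // A real program would load a complete ZIP code database instead.
    private static readonly Dictionary<string, string> StateByZip =
        new Dictionary<string, string>
        {
            { "02138", "MA" }   // Cambridge, MA
        };

    // Returns false only if the ZIP code is known and belongs to a different state.
    // Unknown ZIP codes return true because this table cannot say anything about them.
    public static bool ZipMatchesState(string zip, string state)
    {
        string expectedState;
        if (!StateByZip.TryGetValue(zip, out expectedState)) return true;
        return string.Equals(expectedState, state, StringComparison.OrdinalIgnoreCase);
    }
}
```

A form's OK button handler could call ZipMatchesState after each field has passed its own validation and then flag both fields if the result is false, because the program cannot tell which of the two values is wrong.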

The following section describes specific methods for validating data to ensure that it meets required formats.

Validating Data

After you decide when to validate the user’s inputs, you still need to know how to perform the actual validation. The following sections describe two approaches: using built-in functions and using regular expressions.

Using Built-in Functions

One of the most basic data validations is to verify that the user entered a required value. If the value should be entered in a TextBox, the program can simply check its length. For example, the following code checks the emailTextBox control to see if its contents are blank:

if (emailTextBox.Text.Length == 0)

{

    // The email field is blank. Display an error message.

    ...

}

For some other types of controls, the program must look at different control properties to see if the user has made a selection. For example, a ListBox or ComboBox uses its SelectedIndex and SelectedItem properties to indicate the user’s selection. If SelectedIndex == -1 or SelectedItem == null, the user has not made a selection.

NOTE In normal selection mode, after a ComboBox or ListBox has selected an item, the user cannot deselect all items. The user can select a different item but cannot deselect all items. You can ensure that the user makes a selection by selecting a default value when the form loads. Then you don’t need to verify that the user has made a selection because you know there must be a selection.

If the field is not blank, the program may need to perform additional validation to determine whether the field makes sense. For example, the value test isn’t a valid e-mail address.

C# and the .NET Framework provide several built-in methods for performing additional data validation. Perhaps the most useful of these are the TryParse methods provided by built-in data types such as int, float, and DateTime. The TryParse methods attempt to parse a string value and return true if they succeed. The following code checks whether the costTextBox contains a valid currency value:

decimal cost;

if (!decimal.TryParse(costTextBox.Text,

    NumberStyles.Currency,

    CultureInfo.CurrentCulture,

    out cost))

{

    // Cost is not a valid currency value. Display an error message.

    ...

}

The NumberStyles enumeration and the CultureInfo class are in the System.Globalization namespace, so this code assumes the program has included that namespace with a using statement.

Note that some values may not make sense now, but they must still be allowed because later they may make sense. For example, as was mentioned earlier, the value “–.” is not a valid floating point number but “–.1” is, so the program must allow “–.” while the user is typing.

However, the value “– –” is not a legal part of a floating-point value, so you don’t need to allow that. Most programs just ignore the issue and don’t try to validate the entry until the user accepts the form.

If you want to validate partial values, however, you may turn a partial entry into a full entry. In this example, you can add “0” to the end of the string. Then the text “–.0” is a valid floating point value, but the text “– –0” is not.
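The following is a minimal sketch of that trick, assuming a hypothetical helper named CouldBecomeNumber. It uses the invariant culture and the ASCII minus sign so that the results do not depend on the machine's settings; a form that parses user input would more likely pass CultureInfo.CurrentCulture.

```csharp
using System.Globalization;

public static class PartialNumberValidator
{
    // Returns true if the text is a valid floating-point value or could still
    // become one as the user keeps typing. The check appends "0" to the text
    // and then tries to parse the result.
    public static bool CouldBecomeNumber(string text)
    {
        double ignored;
        return double.TryParse(
            text + "0",
            NumberStyles.Float,
            CultureInfo.InvariantCulture,
            out ignored);
    }
}
```

With this helper, "-." is accepted because "-.0" parses, whereas "--" is rejected because "--0" does not. A keystroke-level check could use the result to mark the field without interrupting the user.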

Using String Methods

A program can use the string data type’s Length property to determine whether the string is blank. That lets you easily decide whether the user has left a required field blank on a form.

The string class also provides several methods that are useful for performing more complicated string validations. Table 11-2 summarizes the most useful of those methods.

Table 11-2: Useful String Validation Methods

Method

Purpose

Contains

Returns true if the string contains a specified substring. For example, you could use this to determine whether an e-mail address contains the @ character.

EndsWith

Returns true if the string ends with a specified substring.

IndexOf

Returns the location of a specified substring within the string, optionally starting the search at a particular position.

IndexOfAny

Returns the location of any of a specified set of characters within the string, optionally starting at a particular position.

IsNullOrEmpty

Returns true if the string is null or blank.

IsNullOrWhiteSpace

Returns true if the string is null, blank, or contains only whitespace characters such as spaces and tabs.

LastIndexOf

Returns the last location of a specified substring within the string, optionally starting the search at a particular position.

LastIndexOfAny

Returns the last location of any of a specified set of characters within the string, optionally starting at a particular position.

Remove

Removes a specified number of characters from the string, starting at a specified position.

Replace

Replaces instances of a character or substring with a new value. For example, you can replace the – characters in the phone number 234-567-8901 with empty strings and then examine the result to see if it makes sense.

Split

Returns an array containing substrings delimited by a given set of characters. For example, you could split the phone number 234-567-8901 into its pieces and examine them separately.

StartsWith

Returns true if the string starts with a specified substring.

Substring

Returns a substring at a specified location.

ToLower

Returns the string converted to lowercase. This can be useful if you want to ignore the string’s case.

ToUpper

Returns the string converted to uppercase. This can be useful if you want to ignore the string’s case.

Trim

Returns the string with leading and trailing whitespace characters removed.

TrimEnd

Returns the string with trailing whitespace characters removed.

TrimStart

Returns the string with leading whitespace characters removed.

With enough work, you can use these string methods to perform all sorts of validations. For example, suppose the user enters a phone number of the format (234)567-8901. You could use the Split method, with (, ), and - as delimiters and empty entries removed, to break this into the pieces 234, 567, and 8901. You can then verify that Split returned three pieces and that the pieces have the necessary lengths.
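The Split approach might look something like the following sketch. The PhoneSplitValidator class and its method names are hypothetical. The code passes StringSplitOptions.RemoveEmptyEntries because the leading ( would otherwise produce an empty first piece.

```csharp
using System;

public static class PhoneSplitValidator
{
    // Returns true if the text splits into three all-digit pieces
    // with lengths 3, 3, and 4, as in (234)567-8901.
    public static bool LooksLikePhoneNumber(string text)
    {
        char[] delimiters = { '(', ')', '-' };
        string[] pieces = text.Split(delimiters, StringSplitOptions.RemoveEmptyEntries);

        if (pieces.Length != 3) return false;
        return IsDigits(pieces[0], 3) && IsDigits(pieces[1], 3) && IsDigits(pieces[2], 4);
    }

    // Returns true if the piece has the required length and contains only 0-9.
    private static bool IsDigits(string piece, int length)
    {
        if (piece.Length != length) return false;
        foreach (char c in piece)
        {
            if (c < '0' || c > '9') return false;
        }
        return true;
    }
}
```

This check is deliberately loose. It looks only at the pieces, not at where the delimiters appear, so it also accepts 234-567-8901 and oddly punctuated values such as )234(567-8901. Checking delimiter positions as well takes more string code, which is part of the reason regular expressions are attractive for this kind of validation.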

Although you can use the string methods to perform just about any validation, sometimes that can be hard because the validations can be complex. For example, (234)567-8901 is not the only possible U.S. phone number format. You might also want the program to allow 234-567-8901, 1(234)567-8901, 1-234-567-8901, +1-234-567-8901, 234.567.8901, and other formats.

Phone numbers for other countries, e-mail addresses, postal codes, serial numbers, and other values can also require complicated validations. You can perform all these by using the string class’s methods, but sometimes it can be difficult. In those cases you can often use the regular expressions described in the following section to validate the more complex structure that these entities hold.

Using Regular Expressions

Regular expressions provide a flexible language for describing patterns in strings. They let a program determine whether an entire string matches a pattern, find pieces of a string that match a pattern, and replace parts of a string with new values.

The System.Text.RegularExpressions.Regex class provides the objects that a program can use to work with regular expressions. Using the Regex class is fairly complicated, so this section describes only its most common and useful operations.

Table 11-3 summarizes the Regex class’s most useful methods.

Table 11-3: Useful Regex Methods

Method

Purpose

IsMatch

Returns true if a regular expression matches a string.

Match

Searches a string for the first part of it that matches a regular expression.

Matches

Returns a collection giving information about all parts of a string that match a regular expression.

Replace

Replaces some or all the parts of the string that match a pattern with a new value. This is more powerful than the string class’s Replace method.

Split

Splits a string into an array of substrings delimited by pieces of the string that match a regular expression.

Many of the methods described in Table 11-3 have multiple overloaded versions. In particular, the instance versions take the input string as a parameter and match it against the regular expression you passed to the object’s constructor.

The Regex class also provides static versions of these methods that take both a string and a regular expression as parameters. For example, the following code validates the text in a TextBox and sets its background color to give the user a hint about whether the value matches a regular expression. (Don’t worry about the regular expression just yet. Regular expressions are described shortly.)

private void ValidateTextBox(TextBox txt, bool allowBlank, string pattern)

{

    // Assume it's invalid.

    bool valid = false;

    // If the text is blank, allow it.

    string text = txt.Text;

    if (allowBlank && (text.Length == 0)) valid = true;

    // If the regular expression matches the text, allow it.

    if (Regex.IsMatch(text, pattern)) valid = true;

    // Display the appropriate background color.

    if (valid) txt.BackColor = SystemColors.Window;

    else txt.BackColor = Color.Yellow;

}

The code assumes the value is invalid, so it sets the variable valid to false.

Next, if the allowBlank parameter is true and the text is blank, the code sets valid to true.

The code then uses the Regex class’s static IsMatch method to determine whether the regular expression matches the text. If the expression matches the text, the code sets valid to true.

Finally, the code sets the TextBox’s background color to SystemColors.Window if the text is valid or yellow if the text is invalid. This gives the user a visible indication when the text is invalid without interrupting the user.
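To show the difference between the static and instance forms of IsMatch without involving any controls, here is a small sketch of the same decision logic as a pair of plain methods. The PatternValidator class and its IsValid overloads are hypothetical names used only for illustration.

```csharp
using System.Text.RegularExpressions;

public static class PatternValidator
{
    // Static form: the pattern is passed on every call.
    public static bool IsValid(string text, bool allowBlank, string pattern)
    {
        if (allowBlank && text.Length == 0) return true;
        return Regex.IsMatch(text, pattern);
    }

    // Instance form: the pattern was passed to the Regex constructor,
    // so the same object can be reused for many inputs.
    public static bool IsValid(string text, bool allowBlank, Regex regex)
    {
        if (allowBlank && text.Length == 0) return true;
        return regex.IsMatch(text);
    }
}
```

Creating a Regex object once and reusing it can be convenient when the same pattern validates many values, whereas the static form keeps one-off checks short.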

Table 11-3 lists only five methods, whereas Table 11-2 lists 18 methods provided by the string class, so you might think the Regex class isn’t as useful. The power of the Regex class comes from the flexibility of the regular expression language.

NOTE There are a few different regular expression languages used by different programming languages and environments. These languages are similar but not identical, so it’s easy to be confused and use the wrong syntax. If you find that an expression doesn’t do what you think it does, be sure you’re using the right syntax for C#.

In particular, if you use the Internet to find an expression to match some standard format such as UK phone numbers or Canadian postal codes, be sure the website where you found the expressions uses the syntax required by C#.

A regular expression is a combination of literal characters and characters that have special meanings. For example, the sequence [a-z] means the Regex object should match any single character in the range “a” through “z.”

Regular expressions can also include special character sequences called escape sequences that represent some special value. For example, the sequence \b matches a word boundary and \d matches any digit 0 through 9.

Sometimes a program needs to use a character as itself even though it looks like a special character. For example, the [ character normally begins a range of characters. If you want to use [ itself, you “escape” it by adding a \ in front of it as in \[. For example, the somewhat confusing sequence [\[-\]] matches the range of characters between [ and ].

BEST PRACTICES: Avoiding Too Many \ Characters

Remember that strings inside C# code also treat \ as a special character. For example, \t represents a tab and \n represents newline.

To add a \ inside a string in C#, you must escape it by adding another \ in front of it as in \\. This can become maddeningly confusing. For example, to put the already confusing regular expression pattern [\[-\]] in C# code, you would need to use [\\[-\\]].

A much simpler method is to use a string literal that starts with the @ character. For example, the following code creates a string named pattern that contains the text [\[-\]].

string pattern = @"[\[-\]]";
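As a quick check, the following sketch compares the two ways of writing the pattern and tries it against a few characters. The EscapeDemo class and its member names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // The same regular expression written two ways.
    public const string Escaped = "[\\[-\\]]";   // Regular literal: each \ is doubled.
    public const string Verbatim = @"[\[-\]]";   // Verbatim literal: \ is not special.

    // Returns true because both literals contain the text [\[-\]].
    public static bool SameText()
    {
        return Escaped == Verbatim;
    }

    // Returns true if the text contains a character in the range [ through ].
    public static bool MatchesBracketRange(string text)
    {
        return Regex.IsMatch(text, Verbatim);
    }
}
```

The range between [ and ] contains only three characters: [, \, and ]. That means MatchesBracketRange returns true for a string containing a backslash as well as for one containing either bracket.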

The most useful pieces of a regular expression can be divided into character escapes, character classes, anchors, grouping constructs, quantifiers, and alternation constructs. The following sections describe each of these. See the links in the “Additional Reading and Resources” section later in this chapter for information about other features of regular expressions.

Character Escapes

Table 11-4 lists the most useful regular expression character escapes.

Table 11-4: Useful Character Escapes

Character

Meaning

\t

Matches tab

\r

Matches return

\n

Matches newline

\nnn

Matches a character with ASCII code given by the two or three octal digits nnn

\xnn

Matches a character with ASCII code given by the two hexadecimal digits nn

\unnnn

Matches a character with Unicode representation given by the four hexadecimal digits nnnn

Character Classes

A character class matches any one of a set of characters. Table 11-5 describes the most useful character class constructs.

Table 11-5: Useful Character Class Constructs

Construct

Meaning

[chars]

Matches a single character inside the brackets. For example, [aeiou] matches a single lowercase vowel.

[^chars]

Matches a single character that is not inside the brackets. For example, [^aeiouAEIOU] matches a single nonvowel character such as x, 7, or &.

[first-last]

Matches a single character between the character first and the character last. For example, [a-zA-Z] matches any lowercase or uppercase letter.

.

A wildcard that matches any single character except \n. (To match a period, use the \. escape sequence.)

\w

Matches a single word character. Normally this is equivalent to [a-zA-Z_0-9].

\W

Matches a single nonword character. Normally this is equivalent to [^a-zA-Z_0-9].

\s

Matches a single whitespace character. Normally this includes space, form feed, newline, return, tab, and vertical tab.

\S

Matches a single nonwhitespace character. Normally this matches everything except space, form feed, newline, return, tab, and vertical tab.

\d

Matches a single decimal digit. Normally this is equivalent to [0-9].

\D

Matches a single character that is not a decimal digit. Normally this is equivalent to [^0-9].
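The following sketch exercises a few of these character classes. The CharacterClassDemo class and its method names are hypothetical, and the patterns combine the classes with the ^ and $ anchors and the + quantifier, which are described later in this chapter.

```csharp
using System.Text.RegularExpressions;

public static class CharacterClassDemo
{
    // True if the text is exactly one character that is not a vowel.
    public static bool IsSingleNonVowel(string text)
    {
        return Regex.IsMatch(text, "^[^aeiouAEIOU]$");
    }

    // True if the text consists of one or more word characters.
    public static bool IsWord(string text)
    {
        return Regex.IsMatch(text, @"^\w+$");
    }

    // True if the text contains at least one whitespace character.
    public static bool ContainsWhitespace(string text)
    {
        return Regex.IsMatch(text, @"\s");
    }
}
```

For example, IsSingleNonVowel accepts x and 7 but rejects e, and IsWord accepts order_42 but rejects order 42 because the space is not a word character.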

Anchors

An anchor, or atomic zero-width assertion, represents a state that the string must be in at a certain point. Anchors do not consume characters. For example, the ^ and $ characters represent the beginning and ending of a line or the string, depending on whether the Regex object is working in single-line or multiline mode.

Table 11-6 describes the most useful anchors.

Table 11-6: Useful Anchors

Anchor

Meaning

^

Matches the beginning of the line or string

$

Matches the end of the string or before the \n at the end of the line or string

\A

Matches the beginning of the string

\z

Matches the end of the string

\Z

Matches the end of the string or before the \n at the end of the string

\G

Matches where the previous match ended

\B

Matches a nonword boundary
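The difference between $ and \z is easy to overlook, so here is a small sketch that compares them on a five-digit value. The AnchorDemo class and its method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // $ matches at the end of the string or just before a final \n.
    public static bool MatchesWithDollar(string text)
    {
        return Regex.IsMatch(text, @"^\d{5}$");
    }

    // \z matches only at the very end of the string.
    public static bool MatchesWithZ(string text)
    {
        return Regex.IsMatch(text, @"^\d{5}\z");
    }

    // Without anchors, the pattern can match anywhere inside the string.
    public static bool ContainsFiveDigits(string text)
    {
        return Regex.IsMatch(text, @"\d{5}");
    }
}
```

With the input 12345 followed by a newline, MatchesWithDollar returns true but MatchesWithZ returns false. If a trailing newline should count as invalid input, \z is the stricter choice.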

Regex Options

A program can specify regular expression options in three ways.

First, it can pass an options parameter to the Regex constructor or to the static versions of pattern-matching methods such as IsMatch. The options are defined by the RegexOptions enumeration.

Second, it can use the syntax (?options) to include inline options in a regular expression. Here, options can include any of the values i, m, n, s, or x. (These are described shortly.) If the list begins with a - character, the following options are turned off. The options remain in effect until a new set of inline options reset their values.

Third, it can use the syntax (?options:subexpression) in a regular expression. In this case, options is as before and subexpression is part of a regular expression during which the options should apply.

Table 11-7 describes the available options.

Table 11-7: Regular Expression Options

Option

Meaning

i

Ignore case.

m

Multiline. Here ^ and $ match the beginning and ending of lines.

s

Single line. Here, . matches all characters including \n. (See the entry for . in Table 11-5.)

n

Explicit capture. Do not capture unnamed groups. See the section “Grouping Constructs” for more information on groups.

x

Ignore unescaped whitespace in the pattern and enable comments after the # character.

For more information on these options, see “Regular Expression Options” at http://msdn.microsoft.com/library/yd1hzczs.aspx.
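As a rough illustration, the following sketch applies the ignore-case option in each of the three ways. The OptionsDemo class, its method names, and the hello world pattern are made up for this example.

```csharp
using System.Text.RegularExpressions;

public static class OptionsDemo
{
    // 1. Options passed as a RegexOptions value.
    public static bool WithEnum(string text)
    {
        return Regex.IsMatch(text, "^hello world$", RegexOptions.IgnoreCase);
    }

    // 2. Inline option that applies to the rest of the pattern.
    public static bool WithInline(string text)
    {
        return Regex.IsMatch(text, "(?i)^hello world$");
    }

    // 3. Option that applies only to the subexpression hello.
    public static bool WithScoped(string text)
    {
        return Regex.IsMatch(text, "^(?i:hello) world$");
    }
}
```

The first two methods ignore case for the whole pattern. The third ignores case only for hello, so it accepts HELLO world but rejects HELLO WORLD.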

Grouping Constructs

Grouping constructs let you define groups of matching pieces of a string. For example, in a U.S. phone number with the format 234-567-8901, you could define groups to hold the pieces 234, 567, and 8901. The program can later refer to those groups either with code or later inside the same regular expression.

For example, consider the expression (\w)\1. The parentheses create a numbered group that, in this example, matches a single word character. The \1 refers to numbered group 1. That means this regular expression matches a word character followed by itself. If the string is “book,” this pattern would match the “oo” in the middle.

There are several kinds of groups, some of which are fairly specialized and confusing. The two most common are numbered and named groups.

To create a numbered group, simply enclose a regular subexpression in parentheses, as shown in the previous example (\w)\1. Note that the group numbering starts with 1, not 0.

To create a named group, use the syntax (?<name>subexpression) where name is the name you want to assign to the group and subexpression is a regular subexpression. For example, the expression (?<twice>\w)\k<twice> is similar to the previous version except the group is named twice. Here, the \k makes the expression match the substring matched by the named group that follows, in this case twice.
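The following sketch shows how code might read groups after a match. The GroupDemo class, its method names, and the group names area, exchange, and line are hypothetical choices for this example.

```csharp
using System.Text.RegularExpressions;

public static class GroupDemo
{
    // Uses numbered group 1 and the backreference \1 to find the first
    // doubled word character. Returns null if there is none.
    public static string FirstDoubledCharacter(string text)
    {
        Match match = Regex.Match(text, @"(\w)\1");
        return match.Success ? match.Value : null;
    }

    // Uses named groups to pull the area code out of a number
    // with the format 234-567-8901. Returns null if the format is wrong.
    public static string AreaCode(string phone)
    {
        Match match = Regex.Match(
            phone,
            @"^(?<area>\d{3})-(?<exchange>\d{3})-(?<line>\d{4})$");
        return match.Success ? match.Groups["area"].Value : null;
    }
}
```

Named groups tend to make this kind of code easier to read than numbered groups because match.Groups["area"] says what the piece means, whereas match.Groups[1] does not.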

Quantifiers

Quantifiers make the regular expression engine match the previous element a certain number of times. For example, the expression \d{3} matches exactly three digits in a row. Table 11-8 describes regular expression quantifiers.

Table 11-8: Quantifiers

Quantifier

Meaning

*

Matches the previous element 0 or more times

+

Matches the previous element 1 or more times

?

Matches the previous element 0 or 1 times

{n}

Matches the previous element exactly n times

{n,}

Matches the previous element n or more times

{n,m}

Matches the previous element between n and m times (inclusive)

If you follow one of these with ?, the pattern matches as few times as possible. For example, the pattern bo+ matches b followed by 1 or more occurrences of the letter o, so it would match the “boo” in “book.” The pattern bo+? also matches b followed by 1 or more occurrences of the letter o, but it matches as few letters as possible, so it would match the “bo” in “book.”
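The book example can be checked with a short sketch. The QuantifierDemo class and its method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // Greedy: matches as many o characters as possible.
    public static string Greedy(string text)
    {
        return Regex.Match(text, "bo+").Value;
    }

    // Lazy: matches as few o characters as possible.
    public static string Lazy(string text)
    {
        return Regex.Match(text, "bo+?").Value;
    }
}
```

For the input book, Greedy returns boo and Lazy returns bo. If the pattern does not match at all, Match.Value is an empty string.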

Alternation Constructs

Alternation constructs use the | character to allow a pattern to match either of two subexpressions. For example, the expression (yes|no) matches either yes or no.
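One detail worth seeing in code is how alternation interacts with anchors. In the following sketch (the AlternationDemo class and its method names are hypothetical), the parentheses limit the | to the two words. Without them, the | splits the whole pattern into the alternatives ^yes and no$.

```csharp
using System.Text.RegularExpressions;

public static class AlternationDemo
{
    // Matches only the complete strings yes and no.
    public static bool IsYesOrNo(string text)
    {
        return Regex.IsMatch(text, "^(yes|no)$");
    }

    // Without parentheses this means "starts with yes" OR "ends with no",
    // which is probably not what was intended.
    public static bool UngroupedYesOrNo(string text)
    {
        return Regex.IsMatch(text, "^yes|no$");
    }
}
```

For example, IsYesOrNo rejects piano, but UngroupedYesOrNo accepts it because the text ends with no.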

Useful Regular Expressions

The following code shows one way you could validate a TextBox to see if it contains a 7-digit U.S. phone number:

// Perform simple validation for a 7-digit US phone number.

private void phone7TextBox_TextChanged(object sender, EventArgs e)

{

    const string pattern = @"^\d{3}-\d{4}$";

    bool valid = false;

    string text = phone7TextBox.Text;

    if (text.Length == 0) valid = true;

    if (Regex.IsMatch(text, pattern)) valid = true;

    if (valid) phone7TextBox.BackColor = SystemColors.Control;

    else phone7TextBox.BackColor = Color.Yellow;

}

The code first defines a constant named pattern to hold the regular expression that the text should match. This pattern’s pieces mean the following:

Piece of Pattern

Description

^

Matches the start of the string, so the phone number must start at the beginning of the string.

\d

Match any digit.

{3}

Repeat the previous (match any digit) three times. In other words, match three digits.

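-

Match the - character.

\d

Match any digit.

{4}

Repeat the previous (match any digit) four times. In other words, match four digits.

$

Matches the end of the string, so the phone number must end at the end of the string.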

This is a simple 7-digit phone number format and allows several illegal phone numbers such as 111-1111 and 000-0000.

The code then initializes the boolean variable valid to false.

If the text entered by the user is blank, the code sets valid to true. Next if the text matches the pattern, the code sets valid to true.

After it has performed those checks, the code sets the TextBox’s background color to the system’s control color if the value is valid or to yellow if the value is invalid.

The following list describes several useful regular expressions.

·        ^[2-9][0-9]{2}-\d{4}$: This pattern matches a 7-digit U.S. phone number more rigorously. The exchange code at the beginning must match the pattern NXX where N is a digit 2–9 and X is any digit 0–9.

·        ^[2-9][0-8]\d-[2-9][0-9]{2}-\d{4}$: This pattern matches a 10-digit U.S. phone number with the format NPA-NXX-XXXX where N is a digit 2–9, P is a digit 0–8, A is any digit 0–9, and X is any digit 0–9.

·        ^([2-9][0-8]\d-)?[2-9][0-9]{2}-\d{4}$: This pattern matches a U.S. phone number with an optional area code. The part of the pattern ([2-9][0-8]\d-)? matches an area code. The question mark at the end means the part inside the parentheses can appear 0 or 1 times. The rest of the pattern is similar to the earlier pattern that matches a 7-digit U.S. phone number.

·        ^\d{5}(-\d{4})?$: This pattern matches a US ZIP code with optional +4 as in 12345 or 12345-6789.

·        ^[A-Z]\d[A-Z] \d[A-Z]\d$: This pattern matches a Canadian postal code with the format A#A #A# where A is any capital letter and # is any digit.

·        ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9._%+-]+\.[a-zA-Z]{2,4}$: This pattern matches an e-mail address. The sequence [a-zA-Z0-9._%+-] matches letters, digits, periods, underscores, %, +, and -. The plus sign after it means one of those characters must appear one or more times. Next, the pattern matches the @ symbol. The pattern then matches one or more characters from the same set, followed by a ., and then between 2 and 4 letters. For example, this matches RodStephens@CSharpHelper.com. This pattern isn’t perfect but it matches most valid e-mail addresses.

Notice that all these patterns begin with the beginning-of-line anchor ^ and end with the end-of-line anchor $. That makes the pattern match the entire string, not just part of it. For example, the pattern ^\d{5}(-\d{4})?$ matches complete strings that look like ZIP codes such as 12345. Without the ^ and $, it would also match strings that merely contain something that looks like a ZIP code, such as test12345value.
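As a minimal sketch of the effect of the anchors, the following console program tests a few strings against the ZIP code and Canadian postal code patterns from the list above. The class name PatternDemo, the unanchored variant of the ZIP pattern, and the sample strings are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternDemo
{
    // Patterns copied from the list above.
    public const string ZipPattern = @"^\d{5}(-\d{4})?$";
    public const string PostalCodePattern = @"^[A-Z]\d[A-Z] \d[A-Z]\d$";

    // The ZIP pattern with its anchors removed, for comparison only.
    public const string UnanchoredZipPattern = @"\d{5}(-\d{4})?";

    public static void Main()
    {
        string[] samples = { "12345", "12345-6789", "1234", "test12345value" };
        foreach (string sample in samples)
        {
            // Show whether each version of the pattern accepts the sample.
            Console.WriteLine("{0,-16} anchored: {1,-5} unanchored: {2}",
                sample,
                Regex.IsMatch(sample, ZipPattern),
                Regex.IsMatch(sample, UnanchoredZipPattern));
        }

        // The postal code pattern is case-sensitive.
        Console.WriteLine(Regex.IsMatch("K1A 0B1", PostalCodePattern));  // True
        Console.WriteLine(Regex.IsMatch("k1a 0b1", PostalCodePattern));  // False
    }
}
```

Only the anchored pattern rejects test12345value. The unanchored version accepts it because the five digits in the middle satisfy the pattern on their own.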

Using Sanity Checks

After the program verifies that a value has a reasonable format, it can perform basic sanity checks to see whether it makes sense. If the user enters a cost of $1 trillion for a notebook, wants to order 1 million pencils, or has the e-mail address a@a.com, something may be wrong. The program can look for this kind of suspicious value and display a message asking the user if the value is correct.

Sometimes these unusual values are correct, so the program should give the user a way to allow them if possible. Your company may not be able to provide 1 million pencils, but if it can, that would probably be a lucrative sale.

NOTE Names can be particularly tricky because they can contain almost anything. You can’t even count on a name to have a minimum number of letters or to contain both consonants and vowels. Some tricky first names include Sy, Ly, Su, and Gd, and some difficult last names include Ng, Bt, and O. You may even encounter people who use only a single name, although in that case some programs just ask the user to make something up for a last name, even if it’s only Nothing.

REAL-WORLD CASE SCENARIO: New order form

Make a new order form similar to the one shown in Figure 11-1.

Figure 11-1: A new order form contains many validations.

Give the form the following field validations:

·     First and last name: These are required. For a sanity check, these should consist of an uppercase letter followed by any number of lowercase letters. That can be followed by a hyphen, apostrophe, or space, and then the whole thing can repeat. For example, O’Neil, Mary Ann, and Jones-Smythe should all be allowed.

·     Street: This is required but has no other validation.

·     City and ZIP: This is a small local business, so the program should allow only the three cities with specific ZIP codes: Programmeria (13370, 13371, and 13372), Bugsville (13375 and 13376), and Abend (13376, 13377, 13378, and 13379).

·     State: The program should allow only FL.

·     Items: If any field in a row is present, all are required. As a sanity check, description should contain at least six characters. Unit cost is currency and quantity is an integer. If unit cost and quantity are both present, calculate the row’s total. As sanity checks, unit cost should be between $0.01 and $1,000.00, and quantity should be between 1 and 100.

·     Grand total: If any row has a total, calculate the grand total.

Only enable the OK button if the order is complete with all contact fields filled in and at least one row filled in.

When the user clicks OK, make the user confirm any values that violate their sanity checks.

Solution

This example has three kinds of validation. First, as the user types, it changes each field’s background color to indicate whether the current value is valid. It doesn’t try to restrict the user so, for example, it allows the user to type invalid characters in a numeric field. It just flags the value as invalid.

Second, when the user clicks OK, the program checks the fields and refuses to close the form if any values are invalid.

Finally, when the user clicks OK and all the fields contain valid values, it checks for unusual values such as prices greater than $1,000.00 or unusually short names. If it finds unusual values, the program warns the user and asks if it should continue.

The following steps walk through the solution:

1. Build the form as shown in Figure 11-1. Because there are only three choices for city, it should be a ComboBox. Given a choice for city, there are only a few choices for ZIP code so that should also be a ComboBox. By using ComboBoxes, the program prevents the user from entering invalid values.

2. All the other fields are TextBoxes. Those that the user doesn’t enter (the row totals and the grand total) are read-only. Because the state must be FL, it is also read-only. Set the ReadOnly property for those TextBoxes to true.

3. The program begins with the following setup code:

using System.Globalization;

using System.Text.RegularExpressions;

...

// Regular expressions for validation.

private const string namePattern = @"^([A-Z][a-z]*[-' ]?)+$";

// Sanity check bounds.

private const int minNameLength = 3;

private const int minDescrLength = 6;

private const decimal minUnitCost = 0.01m;

private const decimal maxUnitCost = 1000;

private const int minQuantity = 1;

private const int maxQuantity = 100;

// ZIP codes for different cities.

private string[][] zips =

{

    new string[] { "13370", "13371", "13372" },             // City 0

    new string[] { "13375", "13376" },                      // City 1

    new string[] { "13376", "13377", "13378", "13379" },    // City 2

};

// Colors for valid and invalid fields.

private Color validColor = SystemColors.Window;

private Color invalidColor = Color.Yellow;

// An array holding the item row TextBoxes.

private TextBox[,] rowTextBoxes;

The code starts by including the System.Globalization namespace it needs to parse currency values and then including the System.Text.RegularExpressions namespace it needs to use regular expressions easily.

It then defines several values that it uses in validations. The first is a regular expression pattern to validate names. The pattern matches an uppercase letter followed by any number of lowercase letters, optionally followed by a hyphen, apostrophe, or space. It repeats this group at least one time. The pattern as a whole is anchored to the beginning and ending of the text, so all the text must be matched by the pattern.
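To see what the name pattern accepts and rejects, the following sketch runs it against a few sample names. The class NamePatternDemo, the helper IsValidName, and the sample names are hypothetical and are not part of the form’s code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamePatternDemo
{
    // The name pattern from the setup code.
    public const string NamePattern = @"^([A-Z][a-z]*[-' ]?)+$";

    // Return true if the whole name matches the pattern.
    public static bool IsValidName(string name)
    {
        return Regex.IsMatch(name, NamePattern);
    }

    public static void Main()
    {
        string[] names =
            { "O'Neil", "Mary Ann", "Jones-Smythe", "McDonald", "smith", "Smith-", "" };
        foreach (string name in names)
            Console.WriteLine("\"{0}\": {1}", name, IsValidName(name));
    }
}
```

Because the separator is optional, the pattern also accepts names with internal capitals such as McDonald and names that end with a separator such as Smith-. It rejects smith and the empty string because each repetition of the group must begin with an uppercase letter and the group must appear at least once.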

The code defines some constants to use for sanity checking. It then creates an array of arrays that holds the allowed ZIP codes for the three cities that the program allows.

The last pieces of initialization code define colors to use for fields containing valid and invalid values, and an array to hold the TextBoxes that hold information about the items in the order.

The following code shows the program’s Load event handler and the event handler that executes when the user clicks Cancel:

// Get ready.

private void Form1_Load(object sender, EventArgs e)

{

    // Select a city so there's always a selection.

    cityComboBox.SelectedIndex = 0;

    // Initialize the array of item row TextBoxes.

    rowTextBoxes = new TextBox[,]

    {

        { descrTextBox1, unitCostTextBox1, quantityTextBox1, totalTextBox1 },

        { descrTextBox2, unitCostTextBox2, quantityTextBox2, totalTextBox2 },

        { descrTextBox3, unitCostTextBox3, quantityTextBox3, totalTextBox3 },

        { descrTextBox4, unitCostTextBox4, quantityTextBox4, totalTextBox4 },

    };

}

// Just close the form.

private void cancelButton_Click(object sender, EventArgs e)

{

    Close();

}

The Load event handler selects the first choice in the city ComboBox, so a city is always selected. Because a city is always selected and all the selections are valid, the program never needs to validate this entry.

The event handler also initializes the array of TextBoxes representing the order items.

The Cancel button simply closes the form.

The following code shows some of the program’s validation methods. They are enclosed in #region and #endregion directives, so it’s easy to hide the validation code.

#region Field Validation Methods

// Validate a TextBox.

private void ValidateTextBoxPattern(TextBox txt, bool allowBlank, string pattern)

{

    // Assume it's invalid.

    bool valid = false;

    // Check for allowed blank.

    string text = txt.Text;

    if (allowBlank && (text.Length == 0)) valid = true;

    // If the regular expression matches the text, allow it.

    if (Regex.IsMatch(text, pattern)) valid = true;

    // Display the appropriate background color.

    if (valid) txt.BackColor = validColor;

    else txt.BackColor = invalidColor;

}

// Validate a TextBox containing a currency value.

// Return true if the TextBox's value is valid.

private bool ValidateTextBoxCurrency(TextBox txt, bool allowBlank,

    out decimal value)

{

    // Assume it's invalid.

    bool valid = false;

    // Check for allowed blank.

    string text = txt.Text;

    if (allowBlank && (text.Length == 0)) valid = true;

    // If it contains a currency value, allow it.

    if (decimal.TryParse(text, NumberStyles.Currency,

            CultureInfo.CurrentCulture, out value))

        valid = true;

    // Display the appropriate background color.

    if (valid) txt.BackColor = validColor;

    else txt.BackColor = invalidColor;

    return valid;

}

// Validate a TextBox containing an integer.

// Return true if the TextBox's value is valid.

private bool ValidateTextBoxInteger(TextBox txt, bool allowBlank, out int value)

{

    // Assume it's invalid.

    bool valid = false;

    // If the text is blank and blank is allowed, allow it.

    string text = txt.Text;

    if (allowBlank && (text.Length == 0)) valid = true;

    // If it contains an integer value, allow it.

    if (int.TryParse(text, out value)) valid = true;

    // Display the appropriate background color.

    if (valid) txt.BackColor = validColor;

    else txt.BackColor = invalidColor;

    return valid;

}

// Validate the entries for a row.

// Return true if the row is valid.

private bool ValidateRow(int row)

{

    // If every field is blank, it's valid.

    if ((rowTextBoxes[row, 0].Text.Length == 0) &&

        (rowTextBoxes[row, 1].Text.Length == 0) &&

        (rowTextBoxes[row, 2].Text.Length == 0))

    {

        rowTextBoxes[row, 0].BackColor = validColor;

        rowTextBoxes[row, 1].BackColor = validColor;

        rowTextBoxes[row, 2].BackColor = validColor;

        // Clear the total.

        rowTextBoxes[row, 3].Clear();

        return true;

    }

    // Some fields are non-blank so all are required.

    bool descrIsValid = (rowTextBoxes[row, 0].Text.Length > 0);

    if (descrIsValid) rowTextBoxes[row, 0].BackColor = validColor;

    else rowTextBoxes[row, 0].BackColor = invalidColor;

    // Validate unit cost.

    decimal unitCost;

    bool unitCostIsValid =

        ValidateTextBoxCurrency(rowTextBoxes[row, 1], false, out unitCost);

    // Validate quantity.

    int quantity;

    bool quantityIsValid =

        ValidateTextBoxInteger(rowTextBoxes[row, 2], false, out quantity);

    // If unit cost and quantity are present, calculate total cost.

    if (unitCostIsValid && quantityIsValid)

    {

        decimal total = unitCost * quantity;

        rowTextBoxes[row, 3].Text = total.ToString("C");

    }

    else rowTextBoxes[row, 3].Clear();

    // Enable or disable the OK button.

    EnableOkButton();

    // Return true if all fields are valid.

    return (descrIsValid && unitCostIsValid && quantityIsValid);

}

#endregion Field Validation Methods

The ValidateTextBoxPattern method examines a TextBox and sets its background color to validColor or invalidColor depending on whether it matches a regular expression.

The ValidateTextBoxCurrency method validates a TextBox to see if it contains a currency value. It sets the control’s background color appropriately and returns true if the TextBox contains a valid currency value. If it contains a valid value, the method also returns the value through the parameter value.

The ValidateTextBoxInteger method is similar to ValidateTextBoxCurrency except it determines whether a TextBox contains an integer instead of a currency value.
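The parsing at the heart of these two methods can be tried without a form. The following sketch assumes a fixed en-US culture so that it behaves the same on every machine, whereas the form’s code uses CultureInfo.CurrentCulture. The class ParseDemo and its method names are hypothetical.

```csharp
using System;
using System.Globalization;

public static class ParseDemo
{
    // Assumption: a fixed culture, so "$" and "," are interpreted consistently.
    private static readonly CultureInfo UsCulture = new CultureInfo("en-US");

    // Try to parse text such as "$1,234.50" as a currency value.
    public static bool TryParseCurrency(string text, out decimal value)
    {
        return decimal.TryParse(text, NumberStyles.Currency, UsCulture, out value);
    }

    // Try to parse text such as "12" as a whole number.
    public static bool TryParseQuantity(string text, out int value)
    {
        return int.TryParse(text, NumberStyles.Integer, UsCulture, out value);
    }

    public static void Main()
    {
        decimal cost;
        Console.WriteLine(TryParseCurrency("$1,234.50", out cost));   // True
        Console.WriteLine(TryParseCurrency("12 dollars", out cost));  // False

        int quantity;
        Console.WriteLine(TryParseQuantity("12", out quantity));      // True
        Console.WriteLine(TryParseQuantity("12.5", out quantity));    // False
    }
}
```

TryParse returns false instead of throwing an exception when the text is not valid, which suits validation code that runs on every keystroke.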

The ValidateRow method validates a row of TextBoxes that represents an order item. If all the TextBoxes are blank, the row is valid.

If any field in the row is nonblank, every field is required and the method validates each appropriately. If the unit cost and quantity are both valid, the method calculates and displays the row’s total.

The method finishes by calling the EnableOkButton method, which is described next.

The TextChanged event handlers for the TextBoxes on each row call the ValidateRow method to determine whether the values in their rows are valid.

The following code shows the EnableOkButton method:

// Enable the OK button if all fields are okay.

private void EnableOkButton()

{

    // Assume all fields are okay.

    bool valid = true;

    // See if the contact fields are okay.

    if (firstNameTextBox.Text.Length == 0) valid = false;

    if (lastNameTextBox.Text.Length == 0) valid = false;

    if (streetTextBox.Text.Length == 0) valid = false;

    // See if all item fields are okay.

    foreach (TextBox txt in rowTextBoxes)

        if (txt.BackColor == invalidColor)

        {

            valid = false;

            break;

        }

    // Calculate the grand total.

    CalculateGrandTotal();

    // Make sure at least one item row has values in it.

    if (grandTotalTextBox.Text.Length == 0) valid = false;

    // Enable or disable the button.

    okButton.Enabled = valid;

}

This method enables or disables the form’s OK button depending on whether all the form’s fields hold valid values. It begins by checking that the first name, last name, and street fields are not blank. (The city and ZIP code ComboBoxes always have valid selections, so it doesn’t need to validate them.)

The code checks the TextBoxes in the item rows. If any of those TextBoxes has the invalid background color, the form’s entries are invalid.

The code then calls the CalculateGrandTotal method (which is described next) to calculate a grand total if possible. If the grand total value is blank, there is no row with a valid unit cost and quantity, so the form’s fields are not valid.

The method enables the OK button if all the field values are valid.

The following code shows the CalculateGrandTotal method:

// Calculate the grand total if possible.

private void CalculateGrandTotal()

{

    // See if any row has a total.

    if ((totalTextBox1.Text.Length == 0) &&

        (totalTextBox2.Text.Length == 0) &&

        (totalTextBox3.Text.Length == 0) &&

        (totalTextBox4.Text.Length == 0))

    {

        grandTotalTextBox.Clear();

        return;

    }

    // Add up the item totals.

    decimal grandTotal = 0;

    for (int row = 0; row < 4; row++)

    {

        if (rowTextBoxes[row, 3].Text.Length > 0)

            grandTotal += decimal.Parse(

                rowTextBoxes[row, 3].Text, NumberStyles.Currency);

    }

    // Display the grand total.

    grandTotalTextBox.Text = grandTotal.ToString("C");

}

If all the item rows have a blank total, none of them have valid unit costs and quantities. In that case, the method blanks the grand total TextBox.

If any item row has a total, the method adds up the totals and displays the grand total.

Each of the controls has an appropriate event handler to set its background color to indicate whether the control holds a valid value. For example, the following code shows how the first name TextBox validates changes:

private void firstNameTextBox_TextChanged(object sender, EventArgs e)

{

    ValidateTextBoxPattern(firstNameTextBox, false, namePattern);

    EnableOkButton();

}

This code uses ValidateTextBoxPattern to set the control’s background color appropriately. It then calls EnableOkButton to enable or disable the OK button.

The other fields perform similar but appropriate validation. For example, the quantity TextBoxes use ValidateTextBoxInteger to determine whether they contain valid integer values.

The city ComboBox is somewhat different from the other fields. When the user selects a city, the following code executes:

private void cityComboBox_SelectedIndexChanged(object sender, EventArgs e)

{

    zipComboBox.Items.Clear();

    foreach (string zip in zips[cityComboBox.SelectedIndex])

        zipComboBox.Items.Add(zip);

    zipComboBox.SelectedIndex = 0;

}

This code copies the ZIP codes for the selected city into the ZIP code ComboBox. It then selects the first ZIP code, so there is always one selected.
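The same array-of-arrays lookup can be sketched outside the form. The example below assumes the cities appear in the order Programmeria, Bugsville, Abend, matching the order of the ZIP code lists. The class ZipLookupDemo, the Cities array, and the GetZips and IsValidZip helpers are hypothetical.

```csharp
using System;

public static class ZipLookupDemo
{
    // Assumption: city names are stored in the same order as their ZIP code lists.
    private static readonly string[] Cities = { "Programmeria", "Bugsville", "Abend" };

    private static readonly string[][] Zips =
    {
        new string[] { "13370", "13371", "13372" },            // City 0
        new string[] { "13375", "13376" },                     // City 1
        new string[] { "13376", "13377", "13378", "13379" },   // City 2
    };

    // Return the ZIP codes for a city, or an empty array for an unknown city.
    public static string[] GetZips(string city)
    {
        int index = Array.IndexOf(Cities, city);
        if (index < 0) return new string[0];
        return Zips[index];
    }

    // Return true if the ZIP code is allowed for the city.
    public static bool IsValidZip(string city, string zip)
    {
        return Array.IndexOf(GetZips(city), zip) >= 0;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", GetZips("Bugsville")));
        Console.WriteLine(IsValidZip("Abend", "13379"));         // True
        Console.WriteLine(IsValidZip("Programmeria", "13379"));  // False
    }
}
```

A check like IsValidZip is useful when the city and ZIP code arrive from somewhere other than the ComboBoxes, such as a file or a web request, where the user interface cannot restrict the choices.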

When the user clicks the OK button, the following code performs final validations:

// Make sure the form is complete.

private void okButton_Click(object sender, EventArgs e)

{

    // Perform validations that require fixing.

    string message = "";

    TextBox focusTextBox = null;

    if (firstNameTextBox.Text.Length == 0)

    {

        message += "First name cannot be blank.\n";

        if (focusTextBox == null) focusTextBox = firstNameTextBox;

    }

    if (lastNameTextBox.Text.Length == 0)

    {

        message += "Last name cannot be blank.\n";

        if (focusTextBox == null) focusTextBox = lastNameTextBox;

    }

    if (streetTextBox.Text.Length == 0)

    {

        message += "Street cannot be blank.\n";

        if (focusTextBox == null) focusTextBox = streetTextBox;

    }

    if (grandTotalTextBox.Text.Length == 0)

    {

        message += "At least one item row must be entered.\n";

        if (focusTextBox == null) focusTextBox = descrTextBox1;

    }

    // See if any of these failed.

    if (message.Length > 0)

    {

        // Remove the final \n.

        message = message.Substring(0, message.Length - 1);

        // Display the message.

        MessageBox.Show(message, "Invalid Data",

            MessageBoxButtons.OK, MessageBoxIcon.Error);

        focusTextBox.Focus();

        return;

    }

    // Perform sanity checks.

    if (firstNameTextBox.Text.Length < minNameLength)

    {

        message += "The first name is unusually short.\n";

        if (focusTextBox == null) focusTextBox = firstNameTextBox;

    }

    if (lastNameTextBox.Text.Length < minNameLength)

    {

        message += "The last name is unusually short.\n";

        if (focusTextBox == null) focusTextBox = lastNameTextBox;

    }

    if (streetTextBox.Text.Length < minNameLength)

    {

        message += "The street name is unusually short.\n";

        if (focusTextBox == null) focusTextBox = streetTextBox;

    }           

    for (int row = 0; row < 4; row++)

    {

        SanityCheckRow(row, ref message, ref focusTextBox);

    }

    // See if any sanity checks failed.

    if (message.Length > 0)

    {

        // Compose the question.

        message = "Some fields contain unusual values.\n\n" +

            message + "\nDo you want to continue anyway?";

        // Display the message and let the user decide whether to continue.

        if (MessageBox.Show(message, "Continue?",

            MessageBoxButtons.YesNo, MessageBoxIcon.Question) == DialogResult.Yes)

        {

            // Continue anyway.

            Close();

        }

        else

        {

            // Let the user edit the data.

            focusTextBox.Focus();

        }

    }

    else

    {

        // No sanity checks failed, so accept the order.

        Close();

    }

}

The code first performs mandatory validations. It verifies that the first name, last name, and street are not blank and that there is a grand total. As it checks these conditions, it builds a message describing any problems. If any of these conditions are not met, the program displays the message, sets focus to the first control that had a problem, and returns.

If the form passes its mandatory checks, the code performs sanity checks. It verifies that the first name, last name, and street have certain minimum lengths. For each item row, the code calls the SanityCheckRow method to see if its values make sense. As with the mandatory checks, the code builds a message describing any problems it finds. If any of the sanity checks finds problems, the method displays the messages describing them and asks the user if it should continue anyway. If the user clicks Yes, the form closes. (In a real application, the program would probably save the order information to a database.)
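The two-stage idea, errors that must be fixed followed by warnings that the user may confirm, can be separated from the user interface. The following is a minimal sketch under that assumption. The class OrderChecks, its methods, and the sample values are hypothetical, and the message text mirrors the messages used by the form.

```csharp
using System;
using System.Collections.Generic;

public static class OrderChecks
{
    // Hypothetical bound, similar to the form's minNameLength constant.
    private const int MinNameLength = 3;

    // Mandatory checks: problems the user must fix.
    public static List<string> FindErrors(string firstName, string lastName, string street)
    {
        List<string> errors = new List<string>();
        if (string.IsNullOrEmpty(firstName)) errors.Add("First name cannot be blank.");
        if (string.IsNullOrEmpty(lastName)) errors.Add("Last name cannot be blank.");
        if (string.IsNullOrEmpty(street)) errors.Add("Street cannot be blank.");
        return errors;
    }

    // Sanity checks: unusual values the user may confirm.
    // Call this only after FindErrors reports no errors.
    public static List<string> FindWarnings(string firstName, string lastName)
    {
        List<string> warnings = new List<string>();
        if (firstName.Length < MinNameLength) warnings.Add("The first name is unusually short.");
        if (lastName.Length < MinNameLength) warnings.Add("The last name is unusually short.");
        return warnings;
    }

    public static void Main()
    {
        string firstName = "Ng", lastName = "Stephens", street = "1 Main St";

        List<string> errors = FindErrors(firstName, lastName, street);
        if (errors.Count > 0)
        {
            // Errors block the order, so report them and stop.
            Console.WriteLine(string.Join(Environment.NewLine, errors));
            return;
        }

        // Warnings do not block the order; a form would ask the user to confirm.
        List<string> warnings = FindWarnings(firstName, lastName);
        if (warnings.Count > 0)
            Console.WriteLine(string.Join(Environment.NewLine, warnings));
    }
}
```

Collecting the messages in lists and joining them with string.Join avoids the need to trim a trailing newline, and it keeps the checks testable without displaying a form.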

The following code shows the SanityCheckRow method:

// Perform sanity checks for a row.

// If a field fails its checks, add a message to the message string

// and set focus to that field (if focus isn't already set somewhere else).

private void SanityCheckRow(int row, ref string message, ref TextBox focusTextBox)

{

    // Either every field is present or every field is blank.

    // If the description is blank, the whole row is blank, so return.

    if (rowTextBoxes[row, 0].Text.Length == 0) return;

    // Check the description.

    if (rowTextBoxes[row, 0].Text.Length < minDescrLength)

    {

        message += "Description " + (row + 1) + " is unusually short.\n";

        if (focusTextBox == null) focusTextBox = rowTextBoxes[row, 0];

    }

    // Check the unit price.

    decimal price = decimal.Parse(

        rowTextBoxes[row, 1].Text, NumberStyles.Currency);

    if ((price < minUnitCost) || (price > maxUnitCost))

    {

        message += "Unit price " + (row + 1) + " is unusual.\n";

        if (focusTextBox == null) focusTextBox = rowTextBoxes[row, 1];

    }

    // Check the quantity.

    int quantity = int.Parse(rowTextBoxes[row, 2].Text);

    if ((quantity < minQuantity) || (quantity > maxQuantity))

    {

        message += "Quantity " + (row + 1) + " is unusual.\n";

        if (focusTextBox == null) focusTextBox = rowTextBoxes[row, 2];

    }

}

This method checks the row’s description, unit cost, and quantity to see if they make sense. If any value is suspicious, the code adds a message to the string that will be displayed to the user.
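Range tests like these are easy to get wrong when one is copied and edited to make the next. One possible way to reduce that risk, shown here only as a sketch, is a small generic helper that names the value and its bounds once. The class RangeChecks and the method IsInRange are hypothetical.

```csharp
using System;

public static class RangeChecks
{
    // Return true if min <= value <= max.
    public static bool IsInRange<T>(T value, T min, T max) where T : IComparable<T>
    {
        return value.CompareTo(min) >= 0 && value.CompareTo(max) <= 0;
    }

    public static void Main()
    {
        // A unit cost of $0.50 is within $0.01 to $1,000.00.
        Console.WriteLine(IsInRange(0.50m, 0.01m, 1000m));  // True

        // A quantity of 250 is outside 1 to 100.
        Console.WriteLine(IsInRange(250, 1, 100));           // False
    }
}
```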

This may seem like a lot of code, but the form contains quite a few fields that have different requirements and sanity checks. Giving the user the best possible experience sometimes takes work.

Managing Data Integrity

Some programmers validate the user’s inputs and then assume that the data is correct forevermore. This is usually a big mistake. Even if the user enters correct data, the data can later be corrupted by incorrect calculations as it passes through the system. If the program isn’t constantly on the lookout for invalid data, mistakes can sneak in and go unnoticed until long after they were introduced, making it extremely hard to figure out what part of the system caused the mistake.

There are a couple of actions you can take to help prevent this kind of data corruption. Two of these are using database validations and using assertions.

Using Database Validations

If a program uses a database, you can add checks and constraints to the database to prevent it from allowing invalid data. For example, if the database requires that an address has a ZIP code that includes exactly five decimal digits, there is no way the program can insert a record without a ZIP code or with the ZIP code 2812H.

Modern databases can ensure that a field isn’t blank, has a certain format, has a unique value, and even has certain relationships with other fields in the same or other records. If these kinds of standard database validations aren’t sufficient, you can write custom validation code that the database can execute when a value is created or modified.

Making the database validate its data can prevent the program from saving invalid data and is important, but in some sense it’s also a last resort. Many programs perform a considerable amount of work with data before it is saved to a database, so there are opportunities for the data to become corrupted between the user’s input and the database.

Programs also use data retrieved from the database to perform calculations later, and that provides other opportunities for the data to become corrupted. Finally, some programs don’t use databases at all.

Using Assertions

Another precaution you can take to manage the data’s integrity as it passes through the system is to use assertions. An assertion is a piece of code that makes a particular claim about the data and that interrupts execution if that claim is false.

One way to make assertions is to simply use an if statement to test the data and then throw an exception if the data seems invalid. The following code shows an example:

if (!Regex.IsMatch(zip, @"^\d{5}$"))

    throw new FormatException("ZIP code has an invalid format.");

This code uses the Regex class’s static IsMatch method to determine whether the string variable zip contains a value that matches a five-digit ZIP code format. If the zip contains an invalid value, the code throws a FormatException.
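Wrapped in a method, the same test might look like the following sketch. The class ZipValidator and the method RequireValidZip are hypothetical. The null test is an addition, included because Regex.IsMatch throws an ArgumentNullException when its input is null.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ZipValidator
{
    // Throw a FormatException unless zip is exactly five digits.
    public static void RequireValidZip(string zip)
    {
        if (zip == null || !Regex.IsMatch(zip, @"^\d{5}$"))
            throw new FormatException("ZIP code has an invalid format.");
    }

    public static void Main()
    {
        RequireValidZip("28120");  // Valid, so nothing happens.

        try
        {
            RequireValidZip("2812H");
        }
        catch (FormatException ex)
        {
            Console.WriteLine(ex.Message);  // ZIP code has an invalid format.
        }
    }
}
```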

To make this kind of assertion easier, the .NET Framework provides the System.Diagnostics.Debug class. This class’s Assert method tests a boolean value and, in a debug build, interrupts the program with an assertion failure dialog if the value is false. The following code is roughly equivalent to the previous code that uses an if statement:

Debug.Assert(Regex.IsMatch(zip, @"^\d{5}$"));

If an assertion fails, the program displays a dialog similar to the one shown in Figure 11-2.

Figure 11-2: If an assertion fails, the Debug.Assert method displays a dialog that includes a stack trace showing where the assertion failed.

If you click Abort, the program ends. If you click Retry, Visual Studio pauses the program’s execution at the Assert statement, so you can try to figure out what went wrong. If you click Ignore, the program continues executing after the Assert statement.

Overloaded versions of the Assert method let you indicate a message that the dialog should display in addition to the stack trace.

The dialog shown in Figure 11-2 is one big difference between throwing your own exceptions and using the Assert statement. Another big difference is that the Assert statement executes only in debug builds of a program. In release builds, the Assert statement is completely ignored. While you are testing the program, the assertion will help you locate bugs. When you compile a release build and give it to end users, the users won’t see the intimidating dialog shown in Figure 11-2.
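The Debug class’s methods are marked with the Conditional("DEBUG") attribute, which is why the compiler drops calls to them, including the evaluation of their arguments, when the DEBUG symbol is not defined. The following sketch illustrates the effect with an assertion that passes, so no dialog appears. The class AssertDemo and its members are hypothetical.

```csharp
using System;
using System.Diagnostics;

public static class AssertDemo
{
    // Counts how many times the check actually runs.
    public static int CallCount = 0;

    // A check with a visible side effect, for demonstration only.
    public static bool CountAndCheck(int quantity)
    {
        CallCount++;
        return quantity <= 100;
    }

    // Calls to this method are compiled only when DEBUG is defined,
    // the same mechanism that Debug.Assert relies on.
    [Conditional("DEBUG")]
    public static void DebugOnlyLog(string message)
    {
        Console.WriteLine("DEBUG: " + message);
    }

    public static void Main()
    {
        DebugOnlyLog("Checking the quantity.");

        // This overload takes a message to show if the assertion fails.
        Debug.Assert(CountAndCheck(50), "Quantity is larger than 100.");

        // A debug build reports 1 call; a release build reports 0 calls
        // because the whole Assert call, including its argument, is omitted.
        Console.WriteLine("CountAndCheck ran {0} time(s).", CallCount);
    }
}
```

Because the argument expression is not evaluated in a release build, an assertion should never contain code that the program depends on, such as a method call that updates data.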

NOTE To select a debug or release build, open the Build menu and select Configuration Manager. In the Active Solution Configuration drop-down, select Debug or Release.

This means the program must be prepared to continue running even if an assertion fails. The program must have a way to work around invalid data or else it will fail in a release build.

If a failed assertion means the program cannot reasonably continue, the program should throw an exception and stop instead of using an assertion that will be skipped in release builds.

For example, suppose a retailer sales program needs to print a customer invoice and a customer has ordered 1,000 pairs of sunglasses. That is a suspiciously large order, so the program can flag it by using an assertion. For instance, it might assert that the number of items ordered is less than 100.

In debug builds, the assertion fails, so you can examine the data to see if the order actually does need 1,000 pairs of sunglasses or if the data has become corrupted. In release builds, this assertion is skipped, so the program prints the invoice for 1,000 pairs of sunglasses. This is an unusual order, but it could actually happen—and you’ll make a fair amount of profit.

In contrast, suppose the order doesn’t contain a customer address. In that case, the program cannot print a meaningful invoice, so it should not try. The invoice printing code could either catch this error, display an error message, and stop trying to print the invoice, or it could throw an exception and let code higher up in the call stack deal with the problem. Unlike the case of the unusual order size, the program cannot successfully print the invoice, so it may as well give up.

You can use assertions anywhere in the program where you think the data might become corrupted. One particularly good location for assertions is at the beginning of any method that uses the data. For example, the following code shows how an invoice printing method might validate its inputs:

private void PrintInvoice(string customerName, string customerAddress,

    List<OrderItem> items)

{

    // Validate inputs.

    // Validate customer name.

    if (string.IsNullOrWhiteSpace(customerName))

        throw new ArgumentNullException("customerName", "Customer name is missing.");

    // Validate customer address.

    if (string.IsNullOrWhiteSpace(customerAddress))

        throw new ArgumentNullException("customerAddress", "Customer address is missing.");

    // Validate item quantities and unit prices.

    foreach (OrderItem item in items)

    {

        Debug.Assert(item.Quantity <= 100,

            "OrderItem " + item.Description +

            ", quantity is larger than 100.");

        Debug.Assert(item.UnitPrice <= 100,

            "OrderItem " + item.Description +

            ", unit price is larger than $100.00.");

    }

    // Print the invoice.

    ...

}

This method starts with data validation code. First, it verifies that the customer’s name and address are not blank. If either of those values is blank, the method cannot print a useful invoice, so it throws an exception.

Next, the method loops through the order’s items and validates their quantities and unit prices. For each item, the code asserts that the item’s quantity is at most 100 and its unit price is at most $100.00. If either of those assertions fails in a debug build, execution stops, so you can try to determine if the data has been corrupted or if this is just an unusual order. In a release build, these assertions are ignored, so the program prints the invoice even if an item’s quantity or unit price is unusually large.

Another good place for assertions is at the end of any method that manipulates the data. At that point the code can verify that the changes made by the method make sense.

For example, suppose a method sorts a list of customer records, so they are ordered with those having the largest delinquent balances coming first. Before it returns the newly ordered list, the method can run through the list and verify that the customers are in their proper order.
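A check of that kind might look like the following sketch. The Customer class, its fields, and the CustomerSorter methods are hypothetical and stand in for whatever record type and sorting code a real program uses.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical customer record.
public class Customer
{
    public string Name;
    public decimal DelinquentBalance;
}

public static class CustomerSorter
{
    // Return true if balances never increase from one customer to the next.
    public static bool IsSortedDescending(List<Customer> customers)
    {
        for (int i = 1; i < customers.Count; i++)
            if (customers[i - 1].DelinquentBalance < customers[i].DelinquentBalance)
                return false;
        return true;
    }

    // Sort so the largest delinquent balances come first.
    public static void SortByDelinquentBalance(List<Customer> customers)
    {
        customers.Sort((a, b) => b.DelinquentBalance.CompareTo(a.DelinquentBalance));

        // Verify the result before returning (debug builds only).
        Debug.Assert(IsSortedDescending(customers),
            "Customers are not sorted by delinquent balance.");
    }

    public static void Main()
    {
        List<Customer> customers = new List<Customer>
        {
            new Customer { Name = "A", DelinquentBalance = 10m },
            new Customer { Name = "B", DelinquentBalance = 250m },
            new Customer { Name = "C", DelinquentBalance = 75m },
        };

        SortByDelinquentBalance(customers);

        // Prints B, C, A on separate lines.
        foreach (Customer customer in customers)
            Console.WriteLine(customer.Name);
    }
}
```

The verification loop costs one pass through the list, and because it runs inside Debug.Assert it disappears from release builds.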

NOTE Programmers sometimes resist putting data validation code at the end of their methods because they can’t visualize the code making a mistake. That’s natural because they just wrote the code, and if it contained a mistake, they would have fixed it.

Of course, if programmers were right and none of their modules contained bugs, the program as a whole wouldn’t contain bugs, and that’s rarely the case for nontrivial programs. One way to encourage programmers to add these sorts of validations is to write the validation code before writing the rest of the method. At that point, the programmer doesn’t have the preconception that the code is perfect so is more likely to validate the data thoroughly.

Because assertions are ignored in release builds, the program’s performance doesn’t suffer even if you add a lot of assertions to a method. Even if the validations never detect an error, at least you’ll have some reason to believe the code is correct. Validation code is worth the effort if for no other reason than peace of mind.

Debugging

Visual Studio provides good tools for interactively debugging an application. Breakpoints, watches, and the ability to step through the code let you study the application as it runs. You can set breakpoint conditions, hit counts, and filters to further refine how breakpoints work.

These are important techniques that every programmer should know, but they are not part of the C# language, so they aren’t covered here. Instead the sections that follow describe techniques you can use to make your C# code help debug a program. They explain how to use compiler directives to determine which pieces of code are executed and which are ignored.

Preprocessor Directives

Preprocessor directives tell the C# compiler how to process pieces of code. They let a program exclude pieces of code from compilation, define symbols to use in managing compiled code, and group pieces of code for convenience.

The following sections describe the C# preprocessor directives.

#define and #undef

The #define directive defines a preprocessor symbol or conditional compilation symbol for the module that contains the directive. Later you can use the #if and #elif preprocessor directives to see if the symbol is defined.

Note that you cannot assign a value to the symbol, so it isn’t comparable to a C# variable or constant. All you can do is define or undefine the symbol and see if it has been defined.

You can also use Visual Studio to define symbols for an entire project. To do so, open the Project menu and select Properties. In the project’s property pages, select the Build tab and enter the names of any symbols you want to define in the Conditional Compilation Symbols text box.

The #undef directive undefines a previously defined preprocessor symbol.

Both the #define and #undef directives must appear in a module before any nondirective statement. Because these directives must go at the beginning of the file, you normally don’t use #undef to undefine a symbol that you just created with #define. Usually #undef is more useful for undefining a symbol that you created by using the project’s property pages.
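The following sketch shows one way the two directives might look at the top of a file. The VERBOSE_LOGGING symbol and the SymbolDemo class are hypothetical names used only for illustration.

```csharp
// Directives must come before any other code in the file, including using directives.
#define VERBOSE_LOGGING   // Hypothetical symbol defined for this file only.
#undef TRACE              // Undefine a symbol that the project settings may have defined.

public static class SymbolDemo
{
    // Describe which of the two symbols are defined in this file.
    public static string Describe()
    {
        string result = "";
#if VERBOSE_LOGGING
        result += "VERBOSE_LOGGING is defined. ";
#endif
#if TRACE
        result += "TRACE is defined.";
#else
        result += "TRACE is not defined.";
#endif
        return result;
    }
}
```

Because the directives apply only to the file that contains them, other files in the project still see whatever symbols the project settings define.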

#if, #elif, #else, and #endif

The #if, #elif, #else, and #endif directives work much like the C# if, else, and else if statements do, but they test preprocessor symbols instead of boolean expressions. Because you cannot assign a value to a preprocessor symbol, these statements merely test whether a symbol exists.

The #if and #elif directives determine whether a symbol exists and include their code in the compilation if it does. If the symbol does not exist, the code following the directive is completely omitted from the compilation.

If none of the symbols in a series of #if and #elif directives exists, the code following the #else directive (if it exists) is included in the compilation.

For example, suppose a module begins with the following #define statements:

// Debug levels. Level 2 gives the most information.

#define DEBUG1

//#define DEBUG2

This code contains #define directives for two preprocessor symbols: DEBUG1 and DEBUG2. The second is commented out, so only DEBUG1 is defined.

Now suppose the module later includes the following method:

        private void VerifyInternetConnections()

        {

#if DEBUG2

            // Display lots of debugging information.

            ...

#elif DEBUG1

            // Display some debugging information.

            ...

#else

            // Display minimal debugging information.

            ...

#endif

            // Verify the connections.

            ...

        }

The #if directive looks for the symbol DEBUG2. That symbol’s definition is commented out, so the following code is ignored.

Next, the #elif directive looks for the symbol DEBUG1. That symbol’s definition is not commented out, so the symbol exists. The code following the #elif directive is included in the compilation, and the program displays some debugging information.

If neither DEBUG2 nor DEBUG1 were defined, the #else directive would include its code, and the program would display minimal debugging information.

You can use the logical operators !, &&, and || to combine symbols in an expression. For example, the directive #if DEBUG1 && DEBUG2 includes its code if both of the symbols DEBUG1 and DEBUG2 are defined.

You can also use the relational operators != and == to compare the existence of two symbols. The directive #if DEBUG1 == DEBUG2 includes its code if both DEBUG1 and DEBUG2 are defined or both are undefined.

The special values true and false represent a symbol’s existence. For example, the following statements are all equivalent:

#if DEBUG1

#if DEBUG1 == true

#if DEBUG1 != false

Similarly, the following statements are equivalent:

#if !DEBUG1

#if DEBUG1 != true

#if DEBUG1 == false

Finally, you can use parentheses to group symbols and make complex expressions easier to understand.
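As a rough illustration of these operators, the following sketch combines the DEBUG1 and DEBUG2 symbols from the earlier example. The DebugLevel class, its methods, and the TESTING symbol are hypothetical.

```csharp
#define DEBUG1
//#define DEBUG2

public static class DebugLevel
{
    // Report how many of the two symbols are defined.
    public static string Current()
    {
#if DEBUG1 && DEBUG2
        return "both";
#elif DEBUG1 || DEBUG2
        return "one";
#else
        return "none";
#endif
    }

    // Return true if the symbols are both defined or both undefined.
    public static bool SameState()
    {
#if DEBUG1 == DEBUG2
        return true;
#else
        return false;
#endif
    }

    // Parentheses group the symbols. TESTING is a hypothetical third symbol.
    public static bool OnlyLevelOneOrTesting()
    {
#if (DEBUG1 && !DEBUG2) || TESTING
        return true;
#else
        return false;
#endif
    }
}
```

With only DEBUG1 defined, Current returns "one", SameState returns false, and OnlyLevelOneOrTesting returns true.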

NOTE Visual Studio immediately grays out any code that is not included in the current compilation. For example, if a piece of code follows the #if DEBUG1 directive and DEBUG1 is not defined, the code is grayed out. This lets you easily see which code will be included in the compilation and which will not.

#warning and #error

The #warning directive generates a warning that appears in Visual Studio’s Error List. Visual Studio’s code editor also highlights the warning by default with a squiggly green underline.

One use for this is to flag code that is included in an #if directive but that is obsolete. For example, consider the following code:

#if OLD_METHOD

#warning Using obsolete method to calculate fees.

    ...

#else

    ...

#endif

If the symbol OLD_METHOD is defined, the code adds the warning to the Error List and includes whatever code is appropriate. If the symbol is not defined, the program includes the code after the #else directive and does not include the warning.

The #error directive is somewhat similar to the #warning directive except it generates an error instead of a warning. Like a warning, the error appears in Visual Studio’s Error List. Unlike a warning, the error prevents Visual Studio from successfully building the program. Visual Studio’s code editor also highlights the error by default with a squiggly red underline.
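One possible use for #error is to stop the build when symbols are defined in a combination that makes no sense. The following sketch assumes two hypothetical symbols, USE_SQL_STORE and USE_FILE_STORE, exactly one of which should be defined.

```csharp
#define USE_SQL_STORE
//#define USE_FILE_STORE

// Refuse to build if the symbols are combined incorrectly.
#if USE_SQL_STORE && USE_FILE_STORE
#error Define only one of USE_SQL_STORE and USE_FILE_STORE.
#elif !USE_SQL_STORE && !USE_FILE_STORE
#error Define either USE_SQL_STORE or USE_FILE_STORE.
#endif

public static class StoreSelector
{
    // Return the name of the store selected at compile time.
    public static string Name()
    {
#if USE_SQL_STORE
        return "sql";
#else
        return "file";
#endif
    }
}
```

If you uncomment the second #define, the first #error directive is compiled and the build fails with its message.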

#line

The #line directive enables you to change the program’s line number and optionally the name of the file that is reported in warnings, errors, and stack traces. For example, the following code displays a stack trace with a modified line number and filename:

#line 10000 "Geometry Methods"

            Console.WriteLine("********** " + Environment.StackTrace);

This stack trace would indicate that the Console.WriteLine statement is at line 10000 in the file “Geometry Methods.”

Changing the line numbering information in this way can be confusing, so you should use this method sparingly. One reason you might want to do this is if you want to keep a section of code’s line numbering even if you insert other lines of code before it.

The #line default directive restores the lines that follow to their original numbering. In that case, the lines that were renumbered by a previous #line directive are counted normally.

The #line hidden directive hides the following lines from the debugger until the next #line directive. If you step through the code, the debugger jumps over those lines.
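The following sketch puts the three forms of the directive together. The LineDirectiveDemo class and the “Generated Section” filename are hypothetical and are used only for illustration.

```csharp
using System;

public static class LineDirectiveDemo
{
    public static void Run()
    {
        Console.WriteLine("Reported with the file's real name and line number.");

#line 500 "Generated Section"
        Console.WriteLine("Reported as line 500 of \"Generated Section\".");

#line hidden
        // The debugger steps over the following statement.
        Console.WriteLine("Hidden from the debugger.");

#line default
        Console.WriteLine("Reported with the real name and line number again.");
    }
}
```

The directives change only what the compiler and debugger report. The program’s output is the same with or without them.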

#region and #endregion

As their names imply, the #region and #endregion directives create a region in the code that you can expand or collapse to hide code in the code editor. Figure 11-3 shows the code editor displaying a piece of code that defines three regions. The first two, which are named Sales Routines and Billing Routines, are expanded. The third, which is named Graphical Routines, is collapsed to hide the code it contains. Click the + or - sign to the left to expand or collapse a region.

Figure 11-3: You can use regions to hide blocks of code to make a file easier to read.

Every #region must end with a corresponding #endregion. You can nest a region inside #if directives, but you cannot make a region that overlaps part of an #if directive.

A region can, however, overlap part of a method. For example, a region could start outside of the LocateCustomer method and end in the middle of it. If you collapsed that region, the code would be confusing to read, so you should probably not make a region that overlaps methods in that way.

You can follow the #region and #endregion directives with an optional name. If you follow the #region directive with a name, the code editor displays it when you collapse the region. In Figure 11-3 the Graphical Routines region is collapsed, and the code editor is displaying its name.

The names following the #region and #endregion directives are just strings that the compiler ignores, so they can contain any characters. The text after an #endregion directive doesn’t need to match the text after the corresponding #region directive, although you may want to make them the same to keep the code as readable as possible.
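A class that uses regions might look something like the following sketch. The region names echo those described for Figure 11-3, but the OrderProcessor class and its methods are hypothetical.

```csharp
public class OrderProcessor
{
    #region Sales Routines

    // Calculate the total price for one order item.
    public decimal CalculateTotal(decimal unitPrice, int quantity)
    {
        return unitPrice * quantity;
    }

    #endregion Sales Routines

    #region Billing Routines

    // Add sales tax to an order total.
    public decimal AddTax(decimal total, decimal taxRate)
    {
        return total * (1 + taxRate);
    }

    #endregion   // The name after #endregion is optional.
}
```

Each region here contains whole methods, so collapsing a region never hides part of a method.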

#pragma warning

The #pragma directive gives special instructions to the compiler, potentially enabling you to create new preprocessor instructions. The C# compiler supports two #pragma directives: #pragma warning and #pragma checksum.

The #pragma warning directive can enable and disable specific warnings. For example, consider the following class definition:

private class OrderItem

{

    public string Description;

    public int Quantity = 0;

    public decimal UnitPrice = 0;

}

This code defines three public fields: Description, Quantity, and UnitPrice. It initializes Quantity and UnitPrice but not Description, so when you try to build this program, Visual Studio flags the line that declares Description with the following warning:

Field ‘WindowsFormsApplication1.Form1.OrderItem.Description’ is never assigned to, and will always have its default value null.

The following code shows how you can use a #pragma directive to hide that warning:

        private class OrderItem

        {

#pragma warning disable 0649

            public string Description;

#pragma warning restore 0649

            public int Quantity = 0;

            public decimal UnitPrice = 0;

        }

The first #pragma directive disables warning number 0649, which is the “never assigned to” warning. (More on how to find the warning number shortly.)

The second #pragma directive re-enables the warning. Warnings are displayed for a reason, so it’s not a good idea to turn them off without a good reason. Re-enabling the warning lets Visual Studio flag other uninitialized variables so that you can fix them.

Before you can disable a warning, you need to figure out its number. Unfortunately, the Error List displays the warning message, but not its number. To find the number, build the program and then look in the Output window. Somewhere buried in the copious compilation output you should find the warning and its number.

Figure 11-4 shows the Output window after building a program that contains several warnings. The first warning, Using Obsolete Method to Calculate Fees, was created by a #warning directive and has number 1030.

Figure 11-4: The Output window displays information about warnings including their numbers.

The next three warnings are about variables that are initialized but never used. Their warning number is 0219.

The final warning, which says the Description field is never assigned, has warning number 0649.

#pragma checksum

The #pragma checksum directive generates a checksum for a file. Normally, the compiler generates a checksum for a file and puts it in the program database (PDB) file, so the debugger can compare the file it is debugging to the source file. For ASP.NET applications, however, the checksum represents the generated source file rather than the original .aspx file, so this solution doesn’t work.

The #pragma checksum directive enables you to explicitly specify a checksum for the file. The following shows the syntax.

#pragma checksum "filename" "{guid}" "bytes"

Here filename is the name of the file, guid is the file’s globally unique identifier (GUID), and bytes is a string giving an even number of hexadecimal digits specifying the checksum.

This is a specialized directive, so it is not covered further here. For more information, see the online C# reference page http://msdn.microsoft.com/library/ms173226.aspx.

Predefined Compiler Constants

Earlier this chapter explained how you can use the #define and #undef directives to define and undefine conditional compilation symbols. By default, Visual Studio also predefines two other symbols: DEBUG and TRACE. You can use these symbols and the #if, #elif, #else, and #endif directives to include or exclude code from a build just as you can with symbols that you define.

Normally, DEBUG is defined in debug builds and TRACE is defined in both debug and release builds, but you can change that behavior.

To determine which kind of build Visual Studio will create, open the Build menu, and select Configuration Manager to display the Configuration Manager, as shown in Figure 11-5. Use the Active Solution Configuration drop-down on the upper left to select a debug or release build. The drop-down also includes a New option that lets you define a new build type.

Figure 11-5: Use the Configuration Manager to select a debug or release build.

Next, open the Project menu and select Properties to display the project’s property pages. On the Build tab, as shown in Figure 11-6, check or uncheck the Define DEBUG Constant and Define TRACE Constant boxes to determine whether those constants are defined. You can also add new constants of your own in the Conditional Compilation Symbols text box. (You can also use the Build page’s Configuration drop-down to select a configuration to modify, although that doesn’t change the currently active configuration, so it can be a bit confusing.)

Figure 11-6: You can use the project’s Build property page to define conditional compilation constants.

The DEBUG and TRACE symbols, and symbols that you define in the Build property page, are saved with the current configuration. Later if you use the Configuration Manager to select a different configuration, its settings will apply.

For example, suppose you want to make a special configuration for weekly builds that defines the TRACE and WEEKLY_BUILD symbols but not the DEBUG symbol. To do that, you would use the Configuration Manager to define and select a new configuration named WeeklyBuild. Then on the project’s Build property page, you would uncheck the Define DEBUG Constant box and add WEEKLY_BUILD to the Conditional Compilation Symbols text box. Now whenever you select the WeeklyBuild configuration, Visual Studio defines the TRACE and WEEKLY_BUILD symbols.
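Code can then test those symbols just as it tests symbols created with #define. The following sketch reports which of the three symbols are defined in the current configuration. The BuildInfo class is a hypothetical name, and the result depends on the configuration you select.

```csharp
public static class BuildInfo
{
    // List the symbols that are defined in the active configuration.
    public static string Describe()
    {
        string description = "Symbols:";
#if DEBUG
        description += " DEBUG";
#endif
#if TRACE
        description += " TRACE";
#endif
#if WEEKLY_BUILD
        description += " WEEKLY_BUILD";
#endif
        return description;
    }
}
```

In the WeeklyBuild configuration described above, the method would return a string that mentions TRACE and WEEKLY_BUILD but not DEBUG.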

Debug and Trace

Earlier this chapter explained that a program can use the Debug.Assert method to test assertions in the code and that this method is ignored in release builds. Actually, the Debug class and the closely related Trace class do more than merely verify assertions. They provide services that send messages to listener objects. By default, the only listener is an instance of the DefaultTraceListener class, which sends messages to the Output window.

The following section says more about the Debug and Trace classes. The section after that one describes listeners in greater detail.

Debug and Trace Objects

Earlier this chapter said that the Debug class’s methods are ignored in release builds. It’s actually not the build that controls this behavior but the predefined DEBUG symbol. By default, that symbol is defined in debug builds and not in release builds, although you can use the Build property page to change that behavior. You can also create your own configurations that may or may not define the DEBUG symbol.

You can also use the #define and #undef directives to define or undefine the DEBUG symbol. For example, you can define the symbol in a particular module to make the program execute the Debug.Assert method even in release builds for that module only.

The Trace class, which is also defined in the System.Diagnostics namespace, is similar to the Debug class except its behavior is controlled by the TRACE symbol. By default, both the debug and release builds define the TRACE symbol.

The Debug and Trace classes provide many of the same methods. Table 11-9 summarizes the classes’ most useful methods.

Table 11-9: Useful Debug and Trace Methods

Method

Purpose

Assert

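Checks a boolean condition and, if it is not true, sends a failure message to the object’s listeners. With the default listener, the program displays an assertion failure dialog that shows the call stack.

Fail

Emits a failure message to the object’s listeners. Normally, the effect is similar to that of a failed Assert.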

Flush

Flushes output to the listeners.

Indent

Increases the indent level by 1. This lets you make output displayed by the Write and WriteLine methods easier to read. For example, you could have a recursive method indent the output, so you can tell which method call is displaying different messages.

Unindent

Decreases the indent level by 1. If a method indents its output, it should probably unindent the output when it finishes.

Write

Writes a message to the object’s listeners.

WriteIf

If an indicated boolean expression is true, this method writes a message to the object’s listeners.

WriteLine

Writes a message followed by a new line to the object’s listeners.

WriteLineIf

If an indicated boolean expression is true, this method writes a message followed by a new line to the object’s listeners.
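To give an idea of how several of these methods work together, the following sketch traces a recursive method. The FactorialTracer class is hypothetical, and the #define TRACE line is an assumption added so that the Trace calls are compiled in regardless of the project settings. The TraceFactorial helper temporarily adds a TextWriterTraceListener (described in the next section) so that it can return the trace text as a string.

```csharp
#define TRACE   // Assumption: defined here so the Trace calls are compiled in.

using System.Diagnostics;
using System.IO;

public static class FactorialTracer
{
    // Calculate n! while tracing each call at a deeper indent level.
    public static long Factorial(int n)
    {
        Trace.WriteLine("Factorial(" + n + ")");
        Trace.Indent();
        long result = (n <= 1) ? 1 : n * Factorial(n - 1);
        Trace.WriteLineIf(n <= 1, "Reached the base case");
        Trace.Unindent();
        Trace.WriteLine("Factorial(" + n + ") = " + result);
        return result;
    }

    // Run Factorial and return the trace text it produced.
    public static string TraceFactorial(int n)
    {
        StringWriter writer = new StringWriter();
        TextWriterTraceListener listener = new TextWriterTraceListener(writer);
        Trace.Listeners.Add(listener);
        try
        {
            Factorial(n);
            Trace.Flush();
        }
        finally
        {
            Trace.Listeners.Remove(listener);
        }
        return writer.ToString();
    }
}
```

Each recursive call writes its messages one level deeper than its caller, so the output shows which call produced which message.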

Listeners

Both the Debug and Trace classes have a Listeners collection that holds references to listener objects. Initially, these collections hold a reference to a DefaultTraceListener object, but you can change that if you like. To remove the DefaultTraceListener, call the Listeners collection’s Remove method, passing it the DefaultTraceListener object.

To direct output to other locations, add an appropriate trace listener object to the Listeners collection. The following list describes some of the other trace listener classes that you might use:

·        ConsoleTraceListener: Sends output to the Console window.

·        EventLogTraceListener: Sends output to an event log.

·        TextWriterTraceListener: Sends output to a stream such as a FileStream. This lets you write output into any file.
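As a minimal sketch, the following code replaces the default listener with a ConsoleTraceListener. The ListenerSetup class is hypothetical. This version removes the default listener by name by using the overload of Remove that takes a string, and the #define TRACE line is an assumption added so that the Trace calls are compiled in.

```csharp
#define TRACE   // Assumption: defined here so the Trace calls are compiled in.

using System.Diagnostics;

public static class ListenerSetup
{
    // Replace the default listener with one that writes to the console.
    public static void UseConsoleListener()
    {
        // "Default" is the name that the DefaultTraceListener gives itself.
        Trace.Listeners.Remove("Default");
        Trace.Listeners.Add(new ConsoleTraceListener());
        Trace.WriteLine("Trace output now goes to the console.");
    }

    // Return true if a DefaultTraceListener is still registered.
    public static bool HasDefaultListener()
    {
        foreach (TraceListener listener in Trace.Listeners)
            if (listener is DefaultTraceListener) return true;
        return false;
    }
}
```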

For example, the following code shows how a program might create a TextWriterTraceListener to log Trace output to the file TraceFile.txt.

using System.IO;

using System.Diagnostics;

private void Form1_Load(object sender, EventArgs e)

{

    // Create the trace output file.

    Stream traceStream = File.Create("TraceFile.txt");

    // Create a TextWriterTraceListener for the trace output file.

    TextWriterTraceListener traceListener =

        new TextWriterTraceListener(traceStream);

    Trace.Listeners.Add(traceListener);

    // Write a startup note into the trace file.

    Trace.WriteLine("Trace started " + DateTime.Now.ToString());

}

When the form loads, this code creates a stream associated with the file TraceFile.txt. It then uses that stream to create a TextWriterTraceListener that will write into the file. The Load event handler finishes by writing a message into the file indicating the time the trace started.

As the program works, it can write other messages into the file. The following code shows how the program might add trace information while processing an order:

private void processOrderButton_Click(object sender, EventArgs e)

{

    // Log an order processing message.

    Trace.WriteLine("Processing order");

    // Log the order's data.

    Trace.Indent();

    Trace.WriteLine("CustomerId: " + CustomerId);

    Trace.WriteLine("OrderId: " + OrderId);

    Trace.WriteLine("OrderItems:");

    Trace.Indent();

    foreach (OrderItem item in OrderItems)

        Trace.WriteLine(item.ToString());

    Trace.Unindent();

    Trace.WriteLine("ShippingAddress: " + ShippingAddress);

    Trace.Unindent();

    // Process the order.

    ...

}

The code starts by adding a message saying that it is processing an order. It then indents the trace output and displays the order’s information. It displays the customer and order IDs. It indents the trace again and displays the order’s items. After displaying the items, the code unindents to get back to the main order level of indentation, and displays the order’s shipping address. Finally, after displaying all the order information, the code unindents again to return to the original indentation level. It then does whatever is necessary to process the order.

When the program stops, it can use the following code to flush any buffered text to the trace file:

private void Form1_FormClosing(object sender, FormClosingEventArgs e)

{

    // Flush the trace output.

    Trace.WriteLine("Trace stopped " + DateTime.Now.ToString());

    Trace.Flush();

}

This code writes the current date and time into the file and flushes the output. If the program doesn’t flush the output before ending, any buffered output will be lost.

The following text shows some sample output describing a single order:

Trace started 4/1/2014 10:43:19 AM

Processing order

    CustomerId: 1310

    OrderId: 112980

    OrderItems:

        6 x White copy paper, ream

        1 x Pencils, dozen box

        6 x White copy paper, ream

    ShippingAddress: 123 Main St, East Zephyr NH 01293

Trace stopped 4/1/2014 10:47:19 AM

By using a TextWriterTraceListener, you can make a program keep a complete log of its activities.

There are a couple of useful modifications you can make to this technique. First, you can open the file for appending instead of creating it as in the previous example. That lets the trace file keep records of past program runs instead of overwriting the file each time.

You can also allow sharing when you open the file. That lets other programs such as Microsoft Word open the file in read-only mode, so you can take a peek at the file while the program is still running. If you do this, you should call the Trace or Debug object’s Flush method each time you write something into the file that you may want to peek at. In the previous example, you would probably want to flush the output after opening the file and after writing an order’s information into it.

NOTE If you set the Trace or Debug object’s AutoFlush property to true, then the object automatically flushes its output after every write.

The following code shows one way you could open the trace file so that new text is appended at the end of the file if it exists and so that other programs can read the file.

Stream traceStream = File.Open("TraceFile.txt",

    FileMode.Append, FileAccess.Write, FileShare.Read);
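Putting these modifications together, a program might set up its trace file with code along the following lines. This is only a sketch: the TraceFileSetup class and its method names are hypothetical, and the #define TRACE line is an assumption added so that the Trace calls are compiled in.

```csharp
#define TRACE   // Assumption: defined here so the Trace calls are compiled in.

using System.Diagnostics;
using System.IO;

public static class TraceFileSetup
{
    // Open the file for appending, allow other programs to read it,
    // and flush automatically after every write.
    public static TextWriterTraceListener AttachTraceFile(string path)
    {
        Stream traceStream = File.Open(path,
            FileMode.Append, FileAccess.Write, FileShare.Read);
        TextWriterTraceListener listener =
            new TextWriterTraceListener(traceStream);
        Trace.Listeners.Add(listener);
        Trace.AutoFlush = true;
        return listener;
    }

    // Stop logging to the file and close it.
    public static void DetachTraceFile(TextWriterTraceListener listener)
    {
        Trace.Listeners.Remove(listener);
        listener.Close();
    }
}
```

Setting AutoFlush costs some performance because every write goes to the file immediately, but it means you never need to remember to call Flush.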

For information on building your own listener class, see “TraceListener Class” at http://msdn.microsoft.com/library/system.diagnostics.tracelistener.aspx.

Program Database Files

When you create a debug build, Visual Studio creates a program database (PDB) file that contains debugging information about the program. The debugger uses this information to let you debug the application.

You can use the project’s property pages to control the amount of information that Visual Studio includes in the PDB file. To do that, select the Project menu’s Properties command to open the project’s property pages. On the Build tab, click Advanced. On the Advanced Build Settings dialog, use the Debug Info drop-down to select full, pdb-only, or none.

The “full” setting, which is the default for debug builds, creates a fully debuggable program.

The “pdb-only” setting, which is the default for release builds, creates a PDB file so exception messages can include information about where the error occurred. Visual Studio doesn’t include the Debuggable attribute that makes the code debuggable, however.

The “none” setting makes Visual Studio not create the PDB file.

If you create a compiled executable, you can still debug it if you have the correct PDB file available. Note that the PDB file is tied to a specific build, and will let you debug only that particular build. Due to the way builds are created, a PDB file cannot debug an executable from a different build even if you haven’t changed the code.

To debug a compiled executable, place the PDB file in the same directory as the executable. Next, use Visual Studio to open the executable file and run it. If a Debug or Trace object’s Assert or Fail method displays a failure dialog, you can click the Retry button on that dialog to debug the program.

The moral of the story is, if you want to debug an executable program after you build it, save its PDB file, and be sure you can figure out which PDB file goes with which version of the executable program.

Instrumenting Applications

Instrumenting a program means adding features to it to study the program itself. Usually that means adding code to monitor performance, record errors, and track program execution. With good instrumentation, you can tell what an application is doing and identify performance bottlenecks without stepping through the code in the debugger.

The following sections discuss some ways you can instrument a program to understand its behavior and performance.

Tracing

Tracing is the process of instrumenting a program so that you can track what it is doing. Earlier sections in this chapter explained how to use the Debug and Trace classes to add tracing to a program. By placing calls to Debug and Trace methods in key routines, you can follow the program’s execution through those routines. In addition to making the program tell you what it is doing, you can add the current time to messages to get an idea of how fast the program is running at different points.

Logging and Event Logs

Logging is the process of instrumenting a program so that it records key events. For example, in an order processing system, you might want to keep a record of an order’s major steps such as order creation, fulfillment, shipping, billing, and payment received.

As earlier sections explained, you can add listeners to the Debug or Trace classes to write messages into files. You could use that technique to log important events into files. The Debug and Trace classes are usually used for tracing not logging, however. In particular, most developers want logging to occur even if the DEBUG and TRACE symbols are not defined.

Instead of using Debug and Trace to log events, the program can write event records into an ordinary text file. This has the advantage of simplicity, and anyone can easily read a text file.

NOTE Often it is useful to make the amount of logging information recorded configurable, either by using preprocessor symbols or through configuration files. Then if the program is having problems, you can increase the amount of information it saves, so you can study the problem.

For example, when it creates a new customer order, a program might normally record only the new order’s ID. If you set a configuration flag, it might also log the customer’s contact information. If you set a different flag, it might also log all the order’s data including information about the order items.

When things are running smoothly, you can omit most of this information to save space in the log file, but you can increase the amount of information saved when necessary, so you can troubleshoot problems.
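The following sketch shows one way a program might write log entries to a text file with a configurable level of detail. The OrderLogDetail enumeration and OrderLogger class are hypothetical. How the program decides which level to use, for example by reading a configuration file, is left as an assumption.

```csharp
using System;
using System.IO;

// Hypothetical detail levels, from least to most information.
public enum OrderLogDetail { OrderIdOnly = 0, Contact = 1, Full = 2 }

public class OrderLogger
{
    private readonly string path;

    // The most detailed kind of message this logger records.
    public OrderLogDetail Detail { get; set; }

    public OrderLogger(string path, OrderLogDetail detail)
    {
        this.path = path;
        Detail = detail;
    }

    // Append the message to the log file if the configured
    // detail level is high enough to include it.
    public void Log(OrderLogDetail required, string message)
    {
        if (required > Detail) return;
        File.AppendAllText(path,
            DateTime.Now.ToString("u") + " " + message + Environment.NewLine);
    }
}
```

A program using this class would tag each message with the level it requires. Raising the Detail property then makes the more detailed messages appear in the log without any other code changes.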

Another option is to write event information into the system log files. The WriteToEventLog program, which is shown in Figure 11-7 and available for download, demonstrates writing into the system event logs. Enter an event source name, event ID number, and event description of your choosing. The log name should be one of Application, Security, or System. When you have entered the values, click Write to create the log entry.

Figure 11-7: The WriteToEventLog program writes messages into the system event logs.

The following code shows how the program works:

using System.Diagnostics;

...

// Write an event log entry.

private void writeButton_Click(object sender, EventArgs e)

{

    string source = sourceTextBox.Text;

    string log = logTextBox.Text;

    string message = eventTextBox.Text;

    int id = int.Parse(idTextBox.Text);

    // Create the source if necessary. (Requires admin privileges.)

    if (!EventLog.SourceExists(source))

        EventLog.CreateEventSource(source, log);

    // Write the log entry.

    EventLog.WriteEntry(source, message,

        EventLogEntryType.Information, id);

    MessageBox.Show("OK");

}

The code first gets the values you entered on the form. It then uses the EventLog class’s SourceExists method to see if the source you entered is defined. If the source has not yet been defined, the code uses the CreateEventSource method to create it. Note that CreateEventSource requires administrative privilege.

Next, the code uses the WriteEntry method to create the event log entry. This method has several overloaded versions. The one used here takes as parameters the source name, entry description, entry type, and ID number.

Figure 11-8 shows the Event Viewer displaying some log entries created by this program. In this figure the third entry is selected, so the General tab at the bottom of the viewer displays that entry’s message text, “Created New Order 120193.”

Figure 11-8: The system’s Event Viewer displays log entries.

The system event logs provide a central place to view messages, so they can be particularly handy if you want to monitor several applications all in one place.

Profiling

Profiling is the process of gathering information about a program to study its speed, memory, disk usage, or other performance characteristics. There are two basic approaches to profiling a program: using a profiler and instrumenting the code by hand.

Using a Profiler

Automatic profilers can take several approaches to profiling an application. Some profilers instrument the source code to add timing statements to some or all the program’s methods. Others instrument the compiled code. Still others use CPU sampling, periodically peeking at the program’s state of execution and building up a statistical model of the amount of time the program spends in each method.

Visual Studio’s Premium and Ultimate editions include profiling tools that you can use to measure an application’s performance.

Some of the profiler’s features require elevated privileges, so when you’re ready to use it, start Visual Studio as an administrator. (One way to do that is to right-click the Visual Studio application and select Run As Administrator.)

To start, load a project, open the Analyze menu, and select Launch Performance Wizard to see the wizard shown in Figure 11-9. The CPU Sampling method periodically checks the program’s state to see what it is doing. This provides an idea of which routines are using the most CPU time without slowing the program down too badly. Instrumented code may provide more accurate information but adds instrumentation in the compiled code, so it slows the program down. The .NET Memory Allocation option uses sampling to gather information about memory usage. The Resource Contention Data option is used to study concurrency issues in multithreaded applications. For now, just pick CPU Sampling and click Next.

Figure 11-9: The Performance Wizard lets you study an application’s memory or CPU usage.

The wizard’s next page lets you pick the application that you will profile. Leave the currently loaded project selected and click Next.

The wizard’s final page says it is ready to collect performance information. Leave the Launch Profiling After the Wizard Finishes box checked and click Finish.

After the wizard closes, the program launches with the profiler running. (It may take several seconds for the profiler to start, so be patient.) When the program appears, exercise the features that you want to profile. Because the sampling method takes data periodically, it may miss some fast method calls. To get the best data, exercise the features you’re studying several times.

When you finish, close the program normally and the profiler analyzes the data and presents results similar to those shown in Figure 11-10. The panels shown in Figure 11-10 show the hot path, the most often used call path, and the methods that were sampled the most often.

Figure 11-10: The profiler’s output lets you see the most active call path, the methods sampled most often, and other statistics.

Other views of the report let you see information for modules, methods, all call paths, lines of code, and other categories. See “Beginners Guide to Performance Profiling” at http://msdn.microsoft.com/library/ms182372.aspx for a more comprehensive introduction to using the profiler.

Profiling by Hand

You can profile a program by hand by inserting statements into the source code that record the program’s state and the current or elapsed time. For example, the following code shows how a method can use the Stopwatch class provided by the .NET Framework to time itself:

private void PerformCalculations()
{
    Stopwatch stopwatch = new Stopwatch();
    stopwatch.Start();

    // Perform the calculations here.
    ...

    Console.WriteLine("Time: " +
        stopwatch.Elapsed.TotalSeconds.ToString("0.00") +
        " seconds");
}

When the method starts, it creates a Stopwatch object and calls its Start method to begin timing. The method does whatever it needs to do and then uses the Stopwatch’s Elapsed property to see how much time has passed. The code converts the elapsed time into seconds and displays the result.

This technique is effective if you need to study only one or two key methods, but it has some drawbacks. If you don’t know where the program is spending most of its time, it’s hard to know where to put the profiling code. You can use preprocessor symbols to include this code only in the builds where you need it, but instrumenting many methods this way could still require a lot of code.
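As a rough illustration of that idea, the following sketch wraps the timing statements in a preprocessor symbol so that they can be compiled out. The PROFILE symbol and the ProfiledWork class are hypothetical names chosen for this example, and the summing loop merely stands in for real calculations.

```csharp
#define PROFILE   // Hypothetical symbol; remove this line to compile the timing code out.

using System;
using System.Diagnostics;

public static class ProfiledWork
{
    // Sums the integers 1..count; this stands in for the real calculations.
    public static long PerformCalculations(int count)
    {
#if PROFILE
        Stopwatch stopwatch = Stopwatch.StartNew();
#endif
        long total = 0;
        for (int i = 1; i <= count; i++) total += i;

#if PROFILE
        stopwatch.Stop();
        Console.WriteLine("Time: " +
            stopwatch.Elapsed.TotalSeconds.ToString("0.00") + " seconds");
#endif
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(PerformCalculations(1000000));
    }
}
```

With the #define line removed, the method still returns the same result but contains no Stopwatch code at all.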

Another approach to profiling by hand is to use performance counters.

Using Performance Counters

Performance counters track operations system-wide to give you an idea of the computer’s activity.

For example, suppose an image processing program scans a directory every minute. It takes any image files it finds in that directory, processes them somehow, and then moves them into a different directory. You could make the program use a performance counter to keep track of each file it processed. Then you can use the system’s Performance Monitor tool to see the counter changing as the program executes.

Before you can use a custom performance counter, you need to make one. If you run Visual Studio with administrator privileges, you can use the Server Explorer built in to Visual Studio to create new performance counters. Open the View menu and select Server Explorer.

NOTE You can also use C# code to create performance counters. For instructions, see the article “How to: Create Custom Performance Counters” at http://msdn.microsoft.com/library/5e3s61wf.aspx.

Expand your computer’s entry, right-click Performance Counters, and select Create New Category. Figure 11-11 shows the Server Explorer on the left and the Performance Counter Builder dialog on the right.

Enter a new category name and description. Then use the New button to add new counters to the category. When you finish, click OK.

Figure 11-11 shows two performance counters being created. The first has type NumberOfItems32, which represents the total number of some event that is counted by a program. The second counter has type RateOfCountsPerSecond32, which tracks the current number of items per second. When a program increments this counter, the counter automatically updates the counts per second value.
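If you would rather create the same category in code than in Server Explorer, something along the following lines may work. This is only a sketch: the CounterSetup class, the help strings, and the choice of a single-instance category are assumptions made for illustration, and the Create call needs Windows and administrator privileges.

```csharp
using System;
using System.Diagnostics;

public static class CounterSetup
{
    private const string CategoryName = "ImageProcessor";

    // Describes the two counters used by the ImageProcessor example.
    public static CounterCreationDataCollection BuildCounterDefinitions()
    {
        CounterCreationDataCollection counters = new CounterCreationDataCollection();
        counters.Add(new CounterCreationData(
            "Images processed",
            "Total number of images processed",        // Help text is an assumption.
            PerformanceCounterType.NumberOfItems32));
        counters.Add(new CounterCreationData(
            "Images per second",
            "Images processed per second",             // Help text is an assumption.
            PerformanceCounterType.RateOfCountsPerSecond32));
        return counters;
    }

    // Creates the category and its counters if the category does not exist yet.
    // Requires Windows and administrator privileges.
    public static void EnsureCategory()
    {
        if (PerformanceCounterCategory.Exists(CategoryName)) return;

        PerformanceCounterCategory.Create(
            CategoryName,
            "Counters for the ImageProcessor example", // Assumed description.
            PerformanceCounterCategoryType.SingleInstance,
            BuildCounterDefinitions());
    }

    public static void Main()
    {
        EnsureCategory();
        Console.WriteLine("Category exists: " +
            PerformanceCounterCategory.Exists(CategoryName));
    }
}
```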

To use a performance counter in a program, create a System.Diagnostics.PerformanceCounter object for the counter. The following code shows how a program could create PerformanceCounter objects for the two performance counters created in Figure 11-11:

private PerformanceCounter totalImages, imagesPerSecond;

private void Form1_Load(object sender, EventArgs e)
{
    totalImages = new PerformanceCounter();
    totalImages.CategoryName = "ImageProcessor";
    totalImages.CounterName = "Images processed";
    totalImages.MachineName = ".";
    totalImages.ReadOnly = false;

    imagesPerSecond = new PerformanceCounter();
    imagesPerSecond.CategoryName = "ImageProcessor";
    imagesPerSecond.CounterName = "Images per second";
    imagesPerSecond.MachineName = ".";
    imagesPerSecond.ReadOnly = false;
}

Figure 11-11: You can use Visual Studio’s Server Explorer to create new performance counters.

This code first creates a PerformanceCounter object. It sets the object’s CategoryName and CounterName to the values used to create the counter in Figure 11-11. The code sets the MachineName to “.” to indicate the local computer. It then sets ReadOnly to false to allow the program to modify the counter’s value.

The code then repeats those steps to create a second PerformanceCounter object.

Having created the PerformanceCounter objects, the program can increment them when it performs whatever action you want to count.

Suppose the ImageProcessor program periodically examines a directory to see if it contains image files. When it finds a file, the program calls the following ProcessImageFile method:

private void ProcessImageFile(string filename)
{
    // Process the file.
    ...

    // Increment the performance counters.
    totalImages.Increment();
    imagesPerSecond.Increment();
}

This method does whatever it needs to do to the file. It then calls the performance counter objects’ Increment methods to increment the counters.
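If a program handles several files at once, it could add the whole batch to a counter in one call instead of incrementing repeatedly. The following sketch assumes the ImageProcessor category and its Images processed counter already exist on the local computer; the BatchCounterDemo class and RecordBatch method are hypothetical names. It uses the PerformanceCounter constructor that takes the category name, counter name, and read-only flag, together with the IncrementBy method and RawValue property.

```csharp
using System;
using System.Diagnostics;

public static class BatchCounterDemo
{
    // Adds a whole batch to the counter in one call and returns the new raw value.
    public static long RecordBatch(PerformanceCounter counter, int imagesInBatch)
    {
        return counter.IncrementBy(imagesInBatch);
    }

    public static void Main()
    {
        // Assumes the category and counter already exist; if they do not,
        // creating or using the counter fails with an exception.
        using (PerformanceCounter totalImages =
            new PerformanceCounter("ImageProcessor", "Images processed", false))
        {
            long newValue = RecordBatch(totalImages, 5);
            Console.WriteLine("Raw value after batch: " + newValue);
            Console.WriteLine("RawValue property:     " + totalImages.RawValue);
        }
    }
}
```

The using block disposes the counter object when the program is finished with it, which releases the resources the object holds.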

That’s all you need to do to create and use the performance counters. Now you need to use the system’s Performance Monitor tool to see the results.

To start the Performance Monitor in Windows 8, open the Control Panel, and use the navigation bar to go to Control Panel ⇒ All Control Panel Items ⇒ Performance Information And Tools. Click the Advanced Tools link and then click Open Performance Monitor.

In the tree view on the Performance Monitor’s left, expand Monitoring Tools and select Performance Monitor. In the graph that appears on the right, click the + sign to add performance counters to the graph. Select the performance counters that you want to view and click Add. After you have selected the counters, click OK.

Figure 11-12 shows the Performance Monitor displaying the two counters used by this example. The steadily increasing curve that wraps around from the right to left edge of the graph represents the Images Processed counter. The curve that wiggles up and down represents the Images Per Second counter.

The graph’s Y-axis ranges from 0 to 100, so you will often want to scale the counters’ values. In Figure 11-12, the program has been running for a while, so Images Processed is scaled by a factor of 0.1 to make its value fit on the graph. The Images per Second value is scaled by a factor of 10, so the program is actually processing between 0 and approximately 6 image files per second.

To scale a counter, right-click the graph and select Properties. On the properties dialog, select the Data tab, click the counter that you want to scale, and set its Scale value to the scale factor that you want. When you finish, click OK.

Performance counters are fairly complicated and NumberOfItems32 and RateOfCountsPerSecond32 are only two of many counter types. For more information on performance counters, see the “Additional Reading and Resources” listed later in this chapter.

Figure 11-12: The Performance Monitor lets you view performance counters graphically.

Summary

This chapter described ways you can protect a program from incorrect data and study a program’s behavior.

Input validation techniques enable you to validate the user’s input before processing. Useful techniques include using methods provided by the string class and using regular expressions. Of course, you can avoid validation entirely if you use controls such as ComboBox and DateTimePicker, so the user cannot select invalid values.

Even after the program reads the user’s inputs, it must manage the data’s integrity as it moves through the program. The Debug.Assert statement lets the program detect unexpected or incorrect values within the program.

Preprocessor symbols and directives enable you to determine what code is included in a program compilation. Using them, you can compile extensive data validation code only in debug builds, much as calls to Debug.Assert are ignored except in debug builds.

The chapter finished by discussing different ways you can instrument applications and study a program’s performance. These include using the Debug and Trace classes, profilers, hand-coded instrumentation, and performance counters.

By using all these techniques, you can protect the program from invalid user inputs and watch for unexpected changes in the data during processing. You can also monitor the program’s performance to see what it is doing and how efficiently it is running.

Chapter Test Questions

Read each question carefully and select the answer or answers that represent the best solution to the problem. You can find the answers in Appendix A, “Answers to Chapter Test Questions.”

1. If the user is typing data into a TextBox and types an invalid character, which of the following actions would be inappropriate for the program to take?

a. Change the TextBox’s background color to indicate the error.

b. Silently discard the character.

c. Display an asterisk next to the TextBox to indicate the error.

d. Display a message box telling the user that there is an error.

2. If the user types an invalid value into a TextBox and moves focus to another TextBox, which of the following actions would be inappropriate for the program to take?

a. Force focus back into the TextBox that contains the error.

b. Change the first TextBox’s background color to indicate the error.

c. Change the first TextBox’s font to indicate the error.

d. Display an asterisk next to the first TextBox to indicate the error.

3. If the user enters some invalid data on a form and then clicks the form’s Accept button, which of the following actions would be appropriate for the program to take?

a. Change the background color of TextBoxes containing invalid values to indicate the errors.

b. Display a message box telling the user that there is an error.

c. Do not close the form until the user corrects all the errors.

d. All the above.

4. Which of the following methods returns true if a regular expression matches a string?

a. Regex.Matches

b. Regex.IsMatch

c. Regexp.Matches

d. String.Matches

5. Which of the following regular expressions matches the Social Security number format ###-##-#### where # is any digit?

a. ^###-##-####$

b. ^\d3-\d2-\d4$

c. ^\d{3}-\d{2}-\d{4}$

d. ^[0-9]3-[0-9]2-[0-9]4$

6. Which of the following regular expressions matches a username that must include between 6 and 16 letters, numbers, and underscores?

a. ^[a-zA-Z0-9_]?{6}$

b. ^[a-zA-Z0-9_]{6,16}$

c. ^[A-Z0-9a-z_]?$

d. ^\w{16}?$

7. Which of the following regular expressions matches license plate values that must include three uppercase letters followed by a space and three digits, or three digits followed by a space and three uppercase letters?

a. (^\d{3} [A-Z]{3}$)|(^[A-Z]{3} \d{3}$)

b. ^\d{3} [A-Z]{3} [A-Z]{3} \d{3}$

c. ^\w{3} \d{3}|\d{3} \w{3}$

d. ^(\d{3} [A-Z]{3})?$

8. Which of the following statements about assertions is true?

a. The Debug.Assert method is ignored in release builds.

b. The program must continue running even if a Debug.Assert method stops the program.

c. When an assertion fails in debug builds, the Debug.Assert method lets you halt, debug the program, or continue running.

d. All the above.

9. Which of the following statements about the Debug and Trace classes is true?

a. The Debug class generates messages if DEBUG is defined. The Trace class generates messages if both DEBUG and TRACE are defined.

b. The Debug class generates messages if DEBUG is defined. The Trace class generates messages if TRACE is defined.

c. The Debug and Trace classes both generate messages if DEBUG is defined.

d. The Debug and Trace classes both generate messages if TRACE is defined.

10. Which of the following statements about builds is true by default?

a. Debug builds define the DEBUG symbol.

b. Debug builds define the TRACE symbol.

c. Release builds define the DEBUG symbol.

d. Release builds define the TRACE symbol.

e. Release builds define the RELEASE symbol.

11. Which of the following statements about PDB files is false?

a. You need a PDB file to debug a compiled executable.

b. You can use a PDB file to debug any version of a compiled executable.

c. The “full” PDB file contains more information than a “pdb-only” PDB file.

d. If you set the PDB file type to None, Visual Studio doesn’t create a PDB file.

12. Which of the following statements about tracing and logging is false?

a. Tracing is the process of instrumenting a program to track what it is doing.

b. Logging is the process of making the program record key events in a log file.

c. You can use DEBUG and TRACE statements to trace or log a program’s execution.

d. A program cannot write events into the system’s event logs, so you can see them in the Event Viewer.

13. Which of the following methods would probably be the easiest way to find bottlenecks in a program if you had no idea where to look?

a. Use an automatic profiler.

b. Instrument the code by hand.

c. Use performance counters.

d. Set breakpoints throughout the code and step through execution.

14. Which of the following is the best use of performance counters?

a. To determine which of a program’s methods use the most CPU time.

b. To determine how often a particular operation is occurring on the system as a whole.

c. To determine how often a particular operation is occurring in a particular executing instance of a program.

d. To find the deepest path of execution in a program’s call tree.

Additional Reading and Resources

Here are some additional useful resources to help you understand the topics presented in this chapter:

.NET Framework Regular Expressions http://msdn.microsoft.com/library/hs600312.aspx

Regular Expression Language - Quick Reference http://msdn.microsoft.com/library/az24scfc.aspx

Character Classes in Regular Expressions http://msdn.microsoft.com/library/20bw873z.aspx

Regular Expression Options http://msdn.microsoft.com/library/yd1hzczs.aspx

C# Preprocessor Directives http://msdn.microsoft.com/library/ed8yd1ha.aspx

John Robbins’ Blog, PDB Files: What Every Developer Must Know http://www.wintellect.com/CS/blogs/jrobbins/archive/2009/05/11/pdb-files-what-every-developer-must-know.aspx

TraceListener Class http://msdn.microsoft.com/library/system.diagnostics.tracelistener.aspx

Tracing and Instrumenting Applications in Visual Basic and Visual C# http://msdn.microsoft.com/library/aa984115.aspx

Beginners Guide to Performance Profiling http://msdn.microsoft.com/library/ms182372.aspx

Find Application Bottlenecks with Visual Studio Profiler http://msdn.microsoft.com/magazine/cc337887.aspx

An Introduction To Performance Counters http://www.codeproject.com/Articles/8590/An-Introduction-To-Performance-Counters

How to: Create Custom Performance Counters http://msdn.microsoft.com/library/5e3s61wf.aspx

PerformanceCounter Class http://msdn.microsoft.com/library/system.diagnostics.performancecounter.aspx

Cheat Sheet

This cheat sheet is designed as a way for you to quickly study the key points of this chapter.

Input validation

·        Use TrackBar, ComboBox, ListBox, DateTimePicker, FolderBrowserDialog, and other controls to avoid validation if possible.

·        Make frequent validations (such as during keystrokes) provide nonintrusive feedback (such as changing the field’s background color).

·        Do not trap the user in a field until its value is entered correctly.

·        Remember that some values (such as “–.”) may be invalid but may be part of a valid value (such as “–.0”).

·        When the user tries to accept a form, validate all fields. Refuse to accept the form if there are invalid values. Warn the user if there are unusual values.

Validating data—built-in validation functions

·        Use string length to check for missing values.

·        Initialize a ComboBox or ListBox so that it always has a valid selection.

·        Use TryParse to validate data types such as int or decimal.

·        String methods that can help with validation include Contains, EndsWith, IndexOf, IndexOfAny, IsNullOrEmpty, IsNullOrWhiteSpace, LastIndexOf, LastIndexOfAny, Remove, Replace, Split, StartsWith, Substring, ToLower, ToUpper, Trim, TrimEnd, and TrimStart.
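The built-in checks listed above can be combined in a small helper. The following is a minimal sketch; the InputChecks class, the TryGetQuantity method, and the 1 to 100 range are hypothetical and chosen only to illustrate the pattern.

```csharp
using System;

public static class InputChecks
{
    // Returns true and sets quantity if text holds a whole number from 1 to 100.
    // On failure, quantity is set to 0.
    public static bool TryGetQuantity(string text, out int quantity)
    {
        int parsed;
        if (string.IsNullOrWhiteSpace(text) ||      // Missing value.
            !int.TryParse(text, out parsed) ||      // Not a valid int.
            parsed < 1 || parsed > 100)             // Outside the allowed range.
        {
            quantity = 0;
            return false;
        }

        quantity = parsed;
        return true;
    }

    public static void Main()
    {
        int quantity;
        Console.WriteLine(TryGetQuantity("42", out quantity));   // True
        Console.WriteLine(TryGetQuantity("ten", out quantity));  // False
    }
}
```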

Validating data—regular expressions

·        Table 11-3 summarizes the useful Regex methods IsMatch, Matches, Replace, and Split.

·        Use verbatim string literals (beginning with the @ character) to make it easier to write regular expressions that contain backslash escape sequences.

·        For example, the following code checks whether the variable phone contains a value that matches a 7-digit U.S. phone number pattern:

if (Regex.IsMatch(phone, @"^\d{3}-\d{4}$")) ...

·        Table 11-10 summarizes some of the most useful regular expression components.

Table 11-10: Useful Regular Expression Components

Item            Purpose
\               Begins a special symbol such as \n or escapes the following character
^               Matches the beginning of string or line
$               Matches the end of string or line
\A              Matches the beginning of string (even if in multiline mode)
\Z              Matches the end of string or before a final newline (even if in multiline mode)
*               Matches the preceding item 0 or more times
+               Matches the preceding item 1 or more times
?               Matches the preceding item 0 or 1 times
.               Matches any character except a newline (unless the single-line option is set)
[abc]           Matches any one of the characters inside the brackets
[^abc]          Matches one character that is not inside the brackets
[a-z]           Matches one character in the range of characters
[^a-z]          Matches one character that is not in the range of characters
x|y             Matches x or y
(pattern)       Makes a numbered match group
(?<name>expr)   Makes a named match group
\2              Refers to previously defined group number 2
\k<name>        Refers to previously defined group named name
{n}             Matches exactly n occurrences
{n,}            Matches n or more occurrences
{n,m}           Matches between n and m occurrences
\b              Matches a word boundary
\B              Matches a nonword boundary
\d              Matches a digit
\D              Matches a nondigit
\f              Matches a form-feed
\n              Matches a newline
\r              Matches a carriage return
\s              Matches whitespace (space, tab, form-feed, and so on)
\S              Matches nonwhitespace
\t              Matches a tab
\v              Matches a vertical tab
\w              Matches a word character (includes underscore)
\W              Matches a nonword character

·        Use sanity checks to look for unusual values.
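To tie the regular expression points together, the following sketch validates the 7-digit phone pattern shown earlier and uses a named group to pull out part of the match. The PhoneChecks class, its method names, and the group names exchange and number are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhoneChecks
{
    // A verbatim string (@"...") passes the backslashes to the regex engine unchanged.
    private const string PhonePattern = @"^(?<exchange>\d{3})-(?<number>\d{4})$";

    // Returns true if the value looks like ###-####.
    public static bool IsSevenDigitPhone(string phone)
    {
        return Regex.IsMatch(phone, PhonePattern);
    }

    // Returns the first three digits, or null if the value does not match.
    public static string GetExchange(string phone)
    {
        Match match = Regex.Match(phone, PhonePattern);
        return match.Success ? match.Groups["exchange"].Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(IsSevenDigitPhone("555-1234"));  // True
        Console.WriteLine(IsSevenDigitPhone("5551234"));   // False
        Console.WriteLine(GetExchange("555-1234"));        // 555
    }
}
```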

Managing data integrity

·        After you validate user inputs, the code must still protect the data as it is processed.

·        Use Debug.Assert statements to validate data as it moves through the program.
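As a small illustration of guarding data inside the program, the following sketch asserts its assumptions before and after a calculation. The OrderMath class and its Total method are hypothetical; the assertions run only when the DEBUG symbol is defined.

```csharp
using System;
using System.Diagnostics;

public static class OrderMath
{
    // Computes an order total. The assertions document assumptions that
    // earlier input validation should already have enforced.
    public static decimal Total(int quantity, decimal unitPrice)
    {
        Debug.Assert(quantity > 0, "Quantity should be positive");
        Debug.Assert(unitPrice >= 0m, "Unit price should not be negative");

        decimal total = quantity * unitPrice;

        Debug.Assert(total >= 0m, "Total should not be negative");
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(Total(3, 2.50m));
    }
}
```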

Debugging

·        Use the #define and #undef directives to define and undefine preprocessor symbols.

·        Use the #if, #elif, #else, and #endif directives to determine what code is included in the program depending on which preprocessor symbols are defined.

·        Use #warning and #error to add warnings and errors to the Error List.

·        Use #line to change a line number and optionally the name of the file as reported in errors.

·        Use #region and #endregion to make collapsible code regions.

·        Use #pragma warning disable number and #pragma warning restore number to disable and restore warnings.

·        The DEBUG and TRACE compiler constants are predefined. Normally, DEBUG is defined in debug builds, and TRACE is defined in debug and release builds.

·        Calls to Debug and Trace class methods are ignored if the symbols DEBUG and TRACE are not defined, respectively.

·        Useful Debug and Trace methods include Assert, Fail, Flush, Indent, Unindent, Write, WriteIf, WriteLine, and WriteLineIf.

·        You can add listeners to the Debug and Trace objects. Standard listeners write messages to the Output window, event logs, and text files.
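Several of these points can be seen together in a short sketch. The TraceDemo class and LogTo method are hypothetical names. The #define TRACE line is included only so that the example also works when compiled outside Visual Studio, where the symbol may not be defined automatically.

```csharp
#define TRACE   // Ensures Trace calls are compiled in for this sketch.

using System;
using System.Diagnostics;
using System.IO;

public static class TraceDemo
{
    // Sends one Trace message to the given writer through a temporary listener.
    public static void LogTo(TextWriter writer, string message)
    {
        TextWriterTraceListener listener = new TextWriterTraceListener(writer);
        Trace.Listeners.Add(listener);
        try
        {
            Trace.WriteLine(message);
            Trace.Flush();
        }
        finally
        {
            // Remove the listener so later Trace calls do not write to this writer.
            Trace.Listeners.Remove(listener);
        }
    }

    public static void Main()
    {
#if DEBUG
        Console.WriteLine("DEBUG is defined");
#else
        Console.WriteLine("DEBUG is not defined");
#endif
        LogTo(Console.Out, "Starting work");
    }
}
```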

Program Database files

·        You need a PDB file to debug a compiled executable.

Instrumenting applications

·        Tracing means instrumenting the program to trace its progress. You can use Debug and Trace for tracing.

·        Logging means recording key events. Methods for logging include writing into a text file, using Debug and Trace with a listener that writes into a text file, and writing in an event log.

·        Profiling means gathering information about a program to study characteristics such as speed and memory usage. Methods for profiling include using a profiler, instrumenting the code by hand, and using performance counters.
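The simplest of the logging methods mentioned above, writing into a text file, might look something like the following sketch. The SimpleLog class, the timestamp format, and the app.log file name are assumptions made for illustration.

```csharp
using System;
using System.IO;

public static class SimpleLog
{
    // Appends one timestamped line to the log file, creating the file if needed.
    public static void Write(string path, string message)
    {
        string line = DateTime.Now.ToString("yyyy-MM-dd HH:mm:ss") + " " + message;
        File.AppendAllText(path, line + Environment.NewLine);
    }

    public static void Main()
    {
        // Hypothetical log location in the user's temporary directory.
        string path = Path.Combine(Path.GetTempPath(), "app.log");
        Write(path, "Program started");
        Write(path, "Program finished");
        Console.WriteLine("Wrote log entries to " + path);
    }
}
```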

Review of Key Terms

assertion A piece of code that makes a particular claim about the data and that throws an exception if that claim is false. In C# you can use the System.Diagnostics.Debug.Assert method to make assertions.

character class A regular expression construction that represents a set of characters to match.

conditional compilation constant A predefined symbol created by Visual Studio that you can use with the #if, #elif, #else, and #endif directives to determine what code is included in the program. These include DEBUG, which is normally defined only in debug builds, and TRACE, which is normally defined in both debug and release builds.

data validation Program code that verifies that a data value such as a string entered by the user makes sense. For example, the program might require that a value be nonblank, that a monetary value be a valid value such as $12.34 not “ten,” or that an e-mail address contain the @ symbol.

escape sequence A sequence of characters that have special meaning, for example, in a regular expression.

inline options Options set in a regular expression by using the syntax (?imnsx).

instrumenting Adding features to a program to study the program itself.

logging The process of instrumenting a program, so it records key events.

pattern A regular expression used for matching parts of a string.

performance counter A system-wide counter used to track some type of activity on the computer.

profiler An automated tool that gathers performance data for a program by instrumenting its code or by sampling.

profiling The process of instrumenting a program to study its speed, memory, disk usage, or other performance characteristics.

regular expression An expression in a regular expression language that defines a pattern to match. Regular expressions let a program match patterns and make replacements in strings.

sanity check A test on data to see if the data makes sense. For example, if a user enters the cost of a ream of paper as $1e10.00, that might be a typographical error, and the user may have meant $100.00. Sometimes the user might actually have intended an unusual value, so the program must decide whether to reject the value or ask the user whether the value is correct.

tracing The process of instrumenting a program so that you can track what it is doing.

EXAM TIPS AND TRICKS

The Review of Key Terms and the Cheat Sheet for this chapter can be printed off to help you study. You can find these files in the ZIP file for this chapter at www.wrox.com/remtitle.cgi?isbn=1118612094 on the Download Code tab.