C# tutorials
22. Some useful text methods
Adding and removing whitespace

Note: While reading this page, bear in mind that I'm no computer expert and that the text below may be partly inaccurate. If you find errors or have proposals for improvements, please send me a message and help make this a better page for the benefit of future visitors.


To complete this tutorial, follow these instructions:

1. Open Visual C# 2010 Express.

2. Click on New Project in the File menu.

3. Choose Windows Forms Application if that option isn't already chosen, change the name if you like and click on OK.

4. Point at Toolbox in the left margin and click on the Auto Hide icon at the top of the window (to keep the window open and make it easier to work with).

5. Drag five Buttons and a RichTextBox to Form1.

6. Double-click on button1 and write or paste this code where the caret is:

richTextBox1.Text = mkCompact(richTextBox1.Text);

7. Double-click on button2 and write or paste this code where the caret is:

richTextBox1.Text = mkSigns(richTextBox1.Text);

8. Double-click on button3 and write or paste this code where the caret is:

richTextBox1.Text = mkWords(richTextBox1.Text);

9. Double-click on button4 and write or paste this code where the caret is:

richTextBox1.Text = mkLines(richTextBox1.Text);

10. Double-click on button5 and write or paste this code where the caret is:

richTextBox1.Text = mkLinesInd(richTextBox1.Text);

11. Write or paste this code right below using System.Windows.Forms;:

using System.Text.RegularExpressions;

12. Write or paste this code right above private void button1_Click(object sender, EventArgs e):

private string mkRegexReplace(string str, string[] Old, string New)
{
  for (int i = 0; i < Old.Length; i++)
  {
    str = Regex.Replace(str, Old[i], New);
  }
  return str;
}

private string mkCompact(string str)
{
  str = str.Trim();
  return Regex.Replace(str, @"\s+", "");
}

private string mkSigns(string str)
{
  str = Regex.Replace(str, @"\s+", "");
  str = Regex.Replace(str, @"(\S)", "$1 ");
  return str.Trim();
}

private string mkWords(string str)
{
  str = str.Trim();
  return Regex.Replace(str, @"\s+", " ");
}

private string mkLines(string str)
{
  string New = Environment.NewLine;
  string[] Old = { "\f", "\n", "\r", "\v", "\x0085", "\x2028", "\x2029", @"\s+" + New, New + @"\s+" };
  str = str.Trim();
  return mkRegexReplace(str, Old, New);
}

private string mkLinesInd(string str)
{
  string New = Environment.NewLine;
  string[] Old = { "\f", "\n", "\r", "\v", "\x0085", "\x2028", "\x2029", @"\s+" + New, New + @"\s" };
  str = str.TrimEnd();
  return mkRegexReplace(str, Old, New);
}

13. Press F5 to start debugging the program.

14. Write or paste this text to the textbox:

   A poem by Mats Kristiansson   
       
   Form constellation:   
      an early attempt   
         that in its pure simplicity has failed   
      or a timeless masterpiece?   

Keep the whitespace to see how the methods work.

15. Click on button1. The text should change to:

ApoembyMatsKristianssonFormconstellation:anearlyattemptthatinitspuresimplicityhasfailedoratimelessmasterpiece?

16. Clear the textbox, write or paste the text again and click on button2. The text should change to:

A p o e m b y M a t s K r i s t i a n s s o n F o r m c o n s t e l l a t i o n : a n e a r l y a t t e m p t t h a t i n i t s p u r e s i m p l i c i t y h a s f a i l e d o r a t i m e l e s s m a s t e r p i e c e ?

17. Clear the textbox, write or paste the text again and click on button3. The text should change to:

A poem by Mats Kristiansson Form constellation: an early attempt that in its pure simplicity has failed or a timeless masterpiece?

18. Clear the textbox, write or paste the text again and click on button4. The text should change to:

A poem by Mats Kristiansson
Form constellation:
an early attempt
that in its pure simplicity has failed
or a timeless masterpiece?

19. Clear the textbox, write or paste the text again and click on button5. The text should change to:

   A poem by Mats Kristiansson
   Form constellation:
      an early attempt
         that in its pure simplicity has failed
      or a timeless masterpiece?

General comments

I developed these methods some years ago in PHP and find them very useful. The C# and PHP source code for the methods differ slightly. As far as I know the functionality is the same, but there may be some slight differences.

Some people may argue that these methods aren't that useful and that you could just as easily write these short pieces of code again whenever you need them. I'm not one of those people, obviously. I think it's much easier and much more reliable to reuse a reasonably well-tested piece of code, like one of these methods, every time I need the functionality, instead of trying to recall or look up how to write some regular expression or the like.

Comment on code snippets one to five

Each snippet makes a button run one of the five methods on the text in the textbox when the button is clicked.

Comment on the sixth code snippet

A Regex needs the namespace System.Text.RegularExpressions to work.

Comment on the seventh code snippet

Line 1-8: See the C# tutorials 14. Replace with two arrays and 15. Replace with array and string.

Line 10-14: Remove all whitespace.

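Line 16-21: Remove all whitespace and add a space after each sign (each remaining character); the space after the last sign is trimmed away.

Line 23-27: Replace each run of whitespace between words with a single space, and remove whitespace at the beginning and end of the string.

Line 29-35: Remove all empty lines and all whitespace at the beginning and end of each line.

Line 37-43: Remove all empty lines and all whitespace at the end of each line, but keep the whitespace at the beginning of each line.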
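The first three methods summarised above can also be tried without a form. The following is only a minimal console sketch and not part of the tutorial project: the class name TextMethodsDemo and the sample string are made up for illustration, and the three method bodies are copied from the seventh code snippet but declared public static so that Main can call them.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TextMethodsDemo
{
    // Same body as mkCompact in the tutorial: remove all whitespace.
    public static string mkCompact(string str)
    {
        str = str.Trim();
        return Regex.Replace(str, @"\s+", "");
    }

    // Same body as mkSigns: remove all whitespace, then add a space
    // after each character and trim the last space away.
    public static string mkSigns(string str)
    {
        str = Regex.Replace(str, @"\s+", "");
        str = Regex.Replace(str, @"(\S)", "$1 ");
        return str.Trim();
    }

    // Same body as mkWords: trim, then turn each run of whitespace
    // into a single space.
    public static string mkWords(string str)
    {
        str = str.Trim();
        return Regex.Replace(str, @"\s+", " ");
    }

    public static void Main()
    {
        // Hypothetical sample text with spaces, a tab and a line feed.
        string sample = "  to be \t or\n  not  ";

        // The brackets only make the start and end of each result visible.
        Console.WriteLine("[" + mkCompact(sample) + "]"); // [tobeornot]
        Console.WriteLine("[" + mkSigns(sample) + "]");   // [t o b e o r n o t]
        Console.WriteLine("[" + mkWords(sample) + "]");   // [to be or not]
    }
}
```

The two line-based methods could be tried the same way, but their output depends on Environment.NewLine, so they are left out of this sketch.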


Line 12, 20, 25 and 33: Trim() removes whitespace at the beginning and end of the string.

Line 13, 18, 26, 32 and 40: \s is any whitespace character. The + means 1 or more.

Line 19: \S is any character that's not a whitespace character, in this case any character in the string, because all whitespace characters have been removed in line 18. $1 in the replacement string is a reference to the first group in the pattern, in this case (\S).
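To see how a group reference such as $1 behaves on its own, here is a small console sketch. The class and method names (GroupReferenceDemo, Bracket, Swap) and the sample strings are made up for illustration; only Regex.Replace and the $1 and $2 references come from .NET.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupReferenceDemo
{
    // Puts square brackets around every non-whitespace character.
    // $1 stands for whatever the group (\S) matched.
    public static string Bracket(string str)
    {
        return Regex.Replace(str, @"(\S)", "[$1]");
    }

    // Swaps the words on both sides of an equals sign.
    // $1 is the first group in the pattern, $2 the second.
    public static string Swap(string str)
    {
        return Regex.Replace(str, @"(\w+)=(\w+)", "$2=$1");
    }

    public static void Main()
    {
        Console.WriteLine(Bracket("a b"));    // [a] [b]
        Console.WriteLine(Swap("key=value")); // value=key
    }
}
```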

Line 31 and 39: Environment.NewLine gets the newline string for the current environment. According to the page Environment.NewLine Property on msdn.microsoft.com, Environment.NewLine is a string containing \r\n for non-Unix platforms, or a string containing \n for Unix platforms.
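Because newline characters are invisible, it can help to print Environment.NewLine in escaped form. This is a minimal sketch; the class name NewLineDemo and the helper Escape are made up for illustration.

```csharp
using System;

public static class NewLineDemo
{
    // Returns the string with CR and LF written out as the visible
    // two-character escapes \r and \n.
    public static string Escape(string str)
    {
        return str.Replace("\r", "\\r").Replace("\n", "\\n");
    }

    public static void Main()
    {
        // The output depends on the platform the program runs on:
        // \r\n is expected on Windows and \n on Unix platforms.
        Console.WriteLine(Escape(Environment.NewLine));
    }
}
```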

Line 32 and 40: \f, \n, \r, \v, \x0085, \x2028 and \x2029 are all newline characters listed on the page Unicode Newline Guidelines on unicode.org (besides \r\n, which is covered in the array by \r and \n). \f is form feed (FF), \n is new line a.k.a. line feed (LF), \r is carriage return (CR), \v is vertical tab (VT), \x0085 is next line (NEL), \x2028 is line separator (LS), and \x2029 is paragraph separator (PS). \x0085, \x2028 and \x2029 are Unicode characters in hexadecimal notation. It seems there's no shorthand for these three characters that you can use in Visual C# 2010 Express; I may be wrong, though. This is how these seven characters are rendered in some text editors:

Editor                 \f    \n    \r    \v    \x0085  \x2028  \x2029
mkNotepad 2010         eol   eol   eol   eol   space   eol     eol
Notepad 2              FF    eol   eol   VT    ---     ·       ---
Notepad++              FF    eol   eol   VT    ---     ·       ---
Programmer's Notepad   FF    eol   eol   VT    ---     ·       ---
PSPad                  eoleol---eol---
TED Notepad            ---------·---
tsWebEditor            eoleol---eol---
Windows Notepad        ------------

Codes used in the table: eol = rendered as a physical linebreak; space = rendered as a space; FF, VT = rendered as a symbol; --- = invisible.
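As a possible alternative to replacing the seven characters one at a time, they can be matched by a single pattern. This is only a sketch under stated assumptions, not the method used in this tutorial: the names NewlineNormalizer and NormalizeNewlines are made up, the pattern treats \r\n as one line break, and unlike mkLines it neither removes empty lines nor trims whitespace. The pattern uses \u0085, \u2028 and \u2029, which .NET regular expressions read as four-digit hexadecimal character codes.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NewlineNormalizer
{
    // \r\n comes first so that it counts as one line break, not two.
    // The character class holds the seven single newline characters.
    private const string AnyNewline = @"\r\n|[\f\n\r\v\u0085\u2028\u2029]";

    // Replaces every newline sequence in str with the given newline string.
    // The newline string is used as a regex replacement, so it should not
    // contain a dollar sign.
    public static string NormalizeNewlines(string str, string newline)
    {
        return Regex.Replace(str, AnyNewline, newline);
    }

    public static void Main()
    {
        // Hypothetical sample with three different kinds of line break.
        string mixed = "one\r\ntwo\rthree\u2028four";
        Console.WriteLine(NormalizeNewlines(mixed, Environment.NewLine));
    }
}
```

With this sample, each of the four words should end up on its own line, whatever the newline string of the current environment is.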

Line 34: Replace the seven newline characters with the newline string for the current environment, and remove all empty lines and all whitespace at the beginning and end of each line.

Line 41: TrimEnd() removes whitespace at the end of the string.

Line 42: Replace the seven newline characters with the newline string for the current environment, and remove all empty lines and all whitespace at the end of each line, but keep the whitespace at the beginning of each line.
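The difference between Trim() and TrimEnd() decides whether the indentation at the very beginning of the text survives. A minimal console sketch makes it visible; the class name TrimDemo, the helper Show and the sample string are made up for illustration, and TrimStart() is included only for comparison.

```csharp
using System;

public static class TrimDemo
{
    // Puts brackets around a string so that spaces at the beginning
    // and end become visible.
    public static string Show(string str)
    {
        return "[" + str + "]";
    }

    public static void Main()
    {
        string line = "   indented   ";
        Console.WriteLine(Show(line.Trim()));      // [indented]
        Console.WriteLine(Show(line.TrimEnd()));   // [   indented]
        Console.WriteLine(Show(line.TrimStart())); // [indented   ]
    }
}
```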

