Converting a MatchCollection to string array


Try:

    var arr = Regex.Matches(strText, @"\b[A-Za-z-']+\b")
        .Cast<Match>()
        .Select(m => m.Value)
        .ToArray();


Dave Bish's answer is good and works properly.

It's worth noting, though, that replacing Cast<Match>() with OfType<Match>() can speed things up.

The code would become:

    var arr = Regex.Matches(strText, @"\b[A-Za-z-']+\b")
        .OfType<Match>()
        .Select(m => m.Groups[0].Value)
        .ToArray();

The result is exactly the same (and addresses the OP's issue in the same way), but for huge strings it was faster in my test.
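The two calls return the same array here because every element of a MatchCollection is a Match. They only behave differently on collections with mixed element types: OfType<T>() silently skips elements that are not T (and nulls), while Cast<T>() throws an InvalidCastException on the first element it cannot cast. A minimal sketch of that difference, using hypothetical helper names (CastVersusOfType and its methods are illustrative, not part of .NET):

```csharp
using System;
using System.Collections;
using System.Linq;

// Hypothetical helper class, only here to illustrate the semantics.
public static class CastVersusOfType
{
    // Keeps the string elements; anything else (including null) is skipped.
    public static string[] StringsViaOfType(IEnumerable items) =>
        items.OfType<string>().ToArray();

    // Casts every element; throws InvalidCastException if one is not a string.
    public static string[] StringsViaCast(IEnumerable items) =>
        items.Cast<string>().ToArray();
}
```

For example, with `new ArrayList { "a", 1, "b" }`, StringsViaOfType returns the two strings and StringsViaCast throws. With a MatchCollection neither case arises, so the choice between them is about speed only.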

Test code:

    // Put this in a console application. Requires:
    // using System; using System.Diagnostics; using System.Linq;
    // using System.Text; using System.Text.RegularExpressions;
    static void Test()
    {
        Stopwatch sw = new Stopwatch();
        StringBuilder sb = new StringBuilder();
        string strText = "this will become a very long string after my code has done appending it to the stringbuilder ";

        // Build a very long input by repeating the sentence 100,000 times.
        Enumerable.Range(1, 100000).ToList().ForEach(i => sb.Append(strText));
        strText = sb.ToString();

        // Time the OfType<Match>() version.
        sw.Start();
        var arr = Regex.Matches(strText, @"\b[A-Za-z-']+\b")
                  .OfType<Match>()
                  .Select(m => m.Groups[0].Value)
                  .ToArray();
        sw.Stop();
        Console.WriteLine("OfType: " + sw.ElapsedMilliseconds.ToString());

        // Time the Cast<Match>() version.
        sw.Reset();
        sw.Start();
        var arr2 = Regex.Matches(strText, @"\b[A-Za-z-']+\b")
                  .Cast<Match>()
                  .Select(m => m.Groups[0].Value)
                  .ToArray();
        sw.Stop();
        Console.WriteLine("Cast: " + sw.ElapsedMilliseconds.ToString());
    }

Output follows:

    OfType: 6540
    Cast: 8743

For very long strings, Cast<Match>() is therefore slower.


I ran the exact same benchmark that Alex posted and found that sometimes Cast was faster and sometimes OfType was faster, but the difference between the two was negligible. However, while ugly, a plain for loop is consistently faster than either of them.

    Stopwatch sw = new Stopwatch();
    StringBuilder sb = new StringBuilder();
    string strText = "this will become a very long string after my code has done appending it to the stringbuilder ";
    Enumerable.Range(1, 100000).ToList().ForEach(i => sb.Append(strText));
    strText = sb.ToString();

    // First two benchmarks

    sw.Start();
    MatchCollection mc = Regex.Matches(strText, @"\b[A-Za-z-']+\b");
    var matches = new string[mc.Count];
    for (int i = 0; i < matches.Length; i++)
    {
        matches[i] = mc[i].ToString();
    }
    sw.Stop();

Results:

    OfType: 3462
    Cast: 3499
    For: 2650
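If the loop feels too ugly to repeat at every call site, one option is to hide it behind an extension method. This is only a sketch: ToStringArray and MatchCollectionExtensions are hypothetical names, not part of .NET, and the method uses Match.Value, which returns the same text as ToString() on a Match.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical extension class; the name is illustrative, not a .NET API.
public static class MatchCollectionExtensions
{
    // Copies the matched text of every Match into a new string array
    // using a plain for loop (no LINQ).
    public static string[] ToStringArray(this MatchCollection matches)
    {
        var result = new string[matches.Count];
        for (int i = 0; i < result.Length; i++)
        {
            result[i] = matches[i].Value;
        }
        return result;
    }
}
```

The call site then reads `Regex.Matches(strText, @"\b[A-Za-z-']+\b").ToStringArray()`, which keeps the loop in one place. I have not benchmarked the wrapper separately, so treat any speed claim for it as an assumption carried over from the loop above.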