Extract Substring from String using Regex


We can use re.findall and then slice the result to get the second and fourth matches:

import re
string = 'Hello, "How" are "you" What "are" you "doing?"'
result = re.findall('".+?"', string)[1::2]
print(result)

Here, the regex matches one or more characters enclosed in double quote marks, but matches as few as possible (a non-greedy match). With a greedy match we would end up with one single match, "How" are "you" What "are" you "doing?".

Output:

['"you"', '"doing?"']

If you want to combine them without the quote marks, you can use str.strip along with str.join:

print(' '.join(string.strip('"') for string in result))

Output:

you doing?
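To see why the non-greedy quantifier matters in the regex above, the two variants can be compared side by side. This is only a small sketch; the variable names `text`, `greedy`, and `lazy` are chosen here for illustration and are not part of the original answer.

```python
import re

text = 'Hello, "How" are "you" What "are" you "doing?"'

# Greedy: .+ consumes as much as possible, so everything from the
# first quote mark to the last one becomes a single match.
greedy = re.findall(r'".+"', text)

# Non-greedy: .+? stops at the earliest possible closing quote mark,
# so each quoted term is matched separately.
lazy = re.findall(r'".+?"', text)

print(greedy)  # ['"How" are "you" What "are" you "doing?"']
print(lazy)    # ['"How"', '"you"', '"are"', '"doing?"']
```

Slicing the non-greedy result with [1::2] then keeps the matches at indices 1 and 3, which are the second and fourth quoted terms.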

An alternative method would be to just split on ":

result = string.split('"')[1::2][1::2]
print(result)

Output:

['you', 'doing?']

This works because, if you separate the string by double quote marks, then the output will be as follows:

  1. Everything before the first double quote
  2. Everything after the first double quote and before the second
  3. Everything after the second double quote and before the third...

This means that we can take every even-numbered element of that list (indices 1, 3, 5, ... in Python's zero-based indexing, hence [1::2]) to get the ones that are in quotes. We can then just slice the result again to get the 2nd and 4th results.
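The split-based reasoning above can be checked by printing each piece with its index. This is a minimal sketch that assumes the quote marks in the input are balanced; the names `parts`, `quoted`, and `second_and_fourth` are illustrative only.

```python
text = 'Hello, "How" are "you" What "are" you "doing?"'

# Splitting on the quote mark alternates between text outside quotes
# (even indices) and text inside quotes (odd indices).
parts = text.split('"')
for index, part in enumerate(parts):
    label = "quoted" if index % 2 == 1 else "outside"
    print(index, label, repr(part))

# First slice: keep only the quoted pieces.
quoted = parts[1::2]

# Second slice: keep every second quoted piece (the 2nd, 4th, ...).
second_and_fourth = quoted[1::2]
print(second_and_fourth)  # ['you', 'doing?']
```

If the input had an unmatched quote mark, the odd-index pieces would no longer all be quoted text, so this approach relies on well-formed input.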


This is a regex-only solution. It is not fully general, since it captures every second quoted term rather than only the 2nd and 4th, but it works for the example.

"[^"]+"[^"]+("[^"]+")

Demonstration in JS:

var str = 'Hello, "How" are "you" What "are" you "doing?"';
var regex = /"[^"]+"[^"]+("[^"]+")/g;
var match = regex.exec(str);
while (match != null) {
  // matched text: match[0]
  // match start: match.index
  // capturing group n: match[n]
  console.log(match[1]);
  match = regex.exec(str);
}
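For comparison with the Python answers, the same pattern can be tried with re.findall, which returns the contents of the capturing group for each match. The sketch below also illustrates the caveat mentioned above; the second input string, `longer`, is a made-up example and not from the original question.

```python
import re

# Skip one quoted term, skip the text after it, then capture the next quoted term.
pattern = re.compile(r'"[^"]+"[^"]+("[^"]+")')

text = 'Hello, "How" are "you" What "are" you "doing?"'
example_matches = pattern.findall(text)
print(example_matches)  # ['"you"', '"doing?"']

# Hypothetical longer input: the pattern keeps capturing every second
# quoted term, so the 6th one is returned as well.
longer = 'a "1" b "2" c "3" d "4" e "5" f "6"'
longer_matches = pattern.findall(longer)
print(longer_matches)  # ['"2"', '"4"', '"6"']
```

Note that the pattern requires at least one character between consecutive quoted terms, which holds for both inputs here.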


We can use re.findall to extract all quoted terms, then build a string from every second entry in the resulting list (the 2nd, 4th, and so on):

import re
text = "Hello, \"How\" are \"you\" What \"are\" you \"doing?\""
matches = re.findall(r'"([^"]+)"', text)
matches = matches[1::2]
output = " ".join(matches)
print(output)

Output:

you doing?
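If only specific terms are wanted (say, exactly the 2nd and 4th, even when the string contains more quoted terms), one possible variation is to index into the list of matches instead of slicing. The helper `pick_quoted` below is hypothetical, written only to illustrate the idea, and the longer input string is a made-up example.

```python
import re

def pick_quoted(text, positions):
    """Return the quoted terms at the given 1-based positions.

    Positions that do not exist in the text are skipped silently.
    """
    terms = re.findall(r'"([^"]+)"', text)
    return [terms[p - 1] for p in positions if 1 <= p <= len(terms)]

# Made-up input with six quoted terms.
text = 'Hello, "How" are "you" What "are" you "doing?" and "more" or "less"'
picked = pick_quoted(text, [2, 4])
print(" ".join(picked))  # you doing?
```

With the [1::2] slice, the same input would also include the 6th term, so explicit positions are the safer choice when the number of quoted terms is not known in advance.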