.net - How can I specify an optional capture group in this RegEx?


How can I fix this regular expression so that it optionally captures a file extension?

I am trying to match a string that has an optional component, but something appears to be wrong. (The strings being matched come from a printer log.)


My RegEx (.NET flavor) is as follows:

    .*(header_\d{10,11}_).*(_.*_\d{8}).*(\.\w{3,4}).*
    -------------------------------------------------
    .*               # Ignore some garbage at the front
    (header_         # Match the start of the file name,
      \d{10,11}_)    #   including the ID (10 - 11 digits)
    .*               # Ignore the type code in the middle
    (_.*_\d{8})      # Match some random characters, then an 8-digit date
    .*               # Ignore anything between this and the file extension
    (\.\w{3,4})      # Match the file extension, 3 or 4 characters long
    .*               # Ignore the rest of the string


I expect it to match strings like these:

    str1 = "header_0000000602_t_mc2e1nrobr1a3s55niyrrqvy_20081212[1].doc [Compatibility Mode]"
    str2 = "Microsoft PowerPoint - header_00000000076_d_al41zguyvgqfj2454jki5l55_20071203[1].txt"
    str3 = "header_00000000076_d_al41zguyvgqfj2454jki5l55_20071203[1]"


The capture groups should return something like:

    $1 = header_0000000602_
    $2 = _mc2e1nrobr1a3s55niyrrqvy_20081212
    $3 = .doc


$3 can be empty if no file extension is found. $3 is the optional part, as you can see in str3 above.

If I add "?" to the end of the third capture group, "(\.\w{3,4})?", the RegEx no longer captures $3 for any string. If I add "+" instead, "(\.\w{3,4})+", the RegEx no longer matches str3, which is to be expected.

I feel that using "?" at the end of the third capture group is the right thing to do, but it does not work as I expected. I am probably being too naive with the ".*" sections that I use to ignore parts of the string.


Does not work as expected:

    .*(header_\d*_).*(_.*_.{8}).*(\.\w{3,4})?.*

One possibility is that the last .* before the extension group is greedy. You could try making it lazy:

    .*(header_\d*_).*(_.*_.{8}).*?(\.\w{3,4})?.*
                                 ^ Added that

That was not correct. The following will match the input you provided, but it assumes that the first . it encounters is the start of the file extension:

    .*(header_\d*_).*(_.*_.{8})[^.]*(\.\w{3,4})?.*

Edit: Removed the escaping I had in the negated character class.
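To see the difference between the three variants, here is a small sketch that runs them against the sample strings from the question. It uses Python's `re` module as a stand-in for the .NET engine, on the assumption that both handle these particular constructs (greedy and lazy quantifiers, optional groups, negated character classes) the same way. The helper `capture` and the variable names are made up for illustration.

```python
import re

# Sample strings from the question.
str1 = "header_0000000602_t_mc2e1nrobr1a3s55niyrrqvy_20081212[1].doc [Compatibility Mode]"
str2 = "Microsoft PowerPoint - header_00000000076_d_al41zguyvgqfj2454jki5l55_20071203[1].txt"
str3 = "header_00000000076_d_al41zguyvgqfj2454jki5l55_20071203[1]"

# Shared front part of the pattern: header plus ID, then the code and date.
PREFIX = r".*(header_\d*_).*(_.*_.{8})"

# Greedy: .* swallows the rest of the string, so the optional group matches nothing.
greedy = PREFIX + r".*(\.\w{3,4})?.*"
# Lazy: .*? matches nothing, the optional group fails at "[" and is skipped.
lazy = PREFIX + r".*?(\.\w{3,4})?.*"
# Negated class: [^.]* stops right before the first ".", so the group can match there.
negated = PREFIX + r"[^.]*(\.\w{3,4})?.*"


def capture(pattern, text):
    """Return the three capture groups, or None if the pattern does not match."""
    m = re.match(pattern, text)
    return m.groups() if m else None


for name, pattern in [("greedy", greedy), ("lazy", lazy), ("negated", negated)]:
    for text in (str1, str2, str3):
        print(name, capture(pattern, text))
```

With these inputs, only the negated-character-class variant fills the third group for str1 and str2; all three leave it unset for str3. Python reports an unset group as `None`, while in .NET the equivalent check would be `match.Groups[3].Success`.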

