
FAQs - Java Regular Expressions: Taming the java.util.regex Engine


FAQs

Q:

What does the pattern <((?i)TITLE>)(.*?)</\1 break down to?


Q:

How do I know if my regex is too complex?


Answers

A:

The answer is given in Table 4-1. Of particular interest is the subgroup (.*?). Notice that it uses a reluctant quantifier, so it matches as little as possible before reaching the next closing </title> tag. The difference matters: given <title>first title</title><title>second title</title>, the pattern extracts only first title. Without the reluctant quantifier, it would extract first title</title><title>second title.
Table 4-1: The Pattern <((?i)TITLE>)(.*?)</\1 *

Regex    Description
<        The character < followed by
(        A group consisting of
(?i)     A case-insensitive comparison of
T        The character T followed by
I        The character I followed by
T        The character T followed by
L        The character L followed by
E        The character E followed by
>        The character >
)        Close group
(        Followed by a group consisting of
.        Any character
*        Repeated any number of times
?        Matched reluctantly
)        Close group, followed by
<        The character < followed by
/        The character / followed by
\1       The first group, which matched (?i)TITLE>

* In English: Extract the contents of the first occurrence of the TITLE element, and be willing to match any case version of TITLE, including Title, title, and so on.
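To see the difference between the reluctant and greedy forms in practice, here is a minimal sketch. The class name TitleExtractor and its two helper methods are hypothetical names chosen for illustration; only the reluctant pattern itself comes from the table above, and the greedy variant is included purely for comparison.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TitleExtractor {
    // The pattern from Table 4-1: group 2 is matched reluctantly.
    private static final Pattern RELUCTANT =
            Pattern.compile("<((?i)TITLE>)(.*?)</\\1");

    // Same pattern with a greedy group 2, for comparison only.
    private static final Pattern GREEDY =
            Pattern.compile("<((?i)TITLE>)(.*)</\\1");

    // Returns the contents of the first TITLE element, or null if none is found.
    public static String extractReluctant(String html) {
        return firstGroupTwo(RELUCTANT, html);
    }

    // Returns what the greedy variant captures, or null if nothing matches.
    public static String extractGreedy(String html) {
        return firstGroupTwo(GREEDY, html);
    }

    private static String firstGroupTwo(Pattern pattern, String html) {
        Matcher matcher = pattern.matcher(html);
        return matcher.find() ? matcher.group(2) : null;
    }

    public static void main(String[] args) {
        String html = "<title>first title</title><title>second title</title>";
        System.out.println(extractReluctant(html)); // first title
        System.out.println(extractGreedy(html));    // first title</title><title>second title
    }
}
```

Note that the (?i) flag is scoped to the first group, so the opening tag may be in any case, while the back reference \1 requires the closing tag to repeat exactly the text the first group matched.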

A:

The first goal of any regex pattern is, of course, that it works accurately and efficiently enough. The second goal is that it be legible. How do you know if it's legible? My advice is to comment it with as much detail as you feel it needs, and then pass it to a few developers who are likely to have to decipher it. If they can follow it (or better yet, if they're able to modify it), then it's probably clear enough. If not, then you may want to consider refactoring.
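One possible way to comment a pattern in detail is the Pattern.COMMENTS flag, which tells the engine to ignore whitespace in the pattern and to treat everything from # to the end of the line as a comment. The sketch below applies it to the TITLE pattern from the previous question. The class name CommentedTitlePattern and the method firstTitle are hypothetical, and this is only one commenting style among several.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommentedTitlePattern {
    // With Pattern.COMMENTS, whitespace in the pattern is ignored and
    // '#' starts a comment that runs to the end of the line.
    private static final Pattern TITLE = Pattern.compile(
            "<              # an opening angle bracket\n"
          + "((?i)TITLE>)   # group 1: TITLE in any case, then a closing bracket\n"
          + "(.*?)          # group 2: the contents, matched reluctantly\n"
          + "</\\1          # the closing tag, repeating the text of group 1\n",
            Pattern.COMMENTS);

    // Returns the contents of the first TITLE element, or null if none is found.
    public static String firstTitle(String html) {
        Matcher matcher = TITLE.matcher(html);
        return matcher.find() ? matcher.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstTitle("<TITLE>Hello</TITLE><p>body</p>")); // Hello
    }
}
```

A pattern laid out this way is easier to hand to another developer for review, which is the legibility test suggested above.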

