espressione regolare (di un linguaggio)
e' un procedimento applicabile ad ogni stringa, con un risultato binario si/no, interpretato come: la stringa ha la struttura descritta dall'espressione, oppure no.
alfabeto di un'espressione regolare i caratteri con cui viene formulata.
linguaggio di un'espressione regolare l'insieme delle stringhe che la soddisfano.
regex | matches, generate |
---|---|
a|b | a b |
b* | ε "b" "bb" "bbb" ... |
b+ | "b" "bb" "bbb" ... |
a.b | axb x is any character |
ab*c | "ac" "abc" "abbc" "abbbc" ... |
ab+c | "abc", "abbc" "abbbc" ... |
a*bc* | the set of all strings consisting of: arbitrarily many "a"s,
followed by a single "b", followed by arbitrarily many "c"s |
(ab)* | ε ab abab ababab ... |
(a|b)(a|b) | aa, ab, ba, bb |
(a|b)* | ε, a, b, aa, ab, ba, bb, aaa, aab, aba, abb, baa, bab, bba, bbb, … |
Most formalisms provide
A vertical bar separates alternatives. For example, gray|grey can match "gray" or "grey".
Parentheses are used to define the scope and precedence of the operators (among other uses). For example, gray|grey and gr(a|e)y are equivalent patterns which both describe the set of "gray" or "grey".
A quantifier after an element (such as a token, character, or group) specifies how many times the preceding element is allowed to repeat.
? | question mark indicates 0 or 1 occurrences of the preceding element. ex: colou?r matches both "color" and "colour". |
---|---|
* | asterisk (derived from the Kleene star) indicates 0 or more
occurrences of the preceding element.
ex: ab*c matches "ac", "abc", "abbc", "abbbc", and so on. |
+ | plus sign (Kleene plus) indicates 1 or more occurrences of the
preceding element. ex: ab+c matches "abc", "abbc", "abbbc", ... , but not "ac". |
. | full stop is wildcard ex: a.b matches any string that contains an "a", and then any character and then "b" ex: a.*b matches any string that contains an "a", and then the character "b" at some later point. |
{n} | The preceding item is matched exactly n times |
1968 Regular expressions entered popular use in pattern matching in a text editor.
“Regular expressions aren’t random jumbles of punctuation—they’re carefully thought-out jumbles of punctuation!” -- The Perl Cookbook
puo' essere definito in 3 modi
Grammar | Grammar |
---|---|
A → ε A → aA |
A → a A → aA |
Productions | Productions |
ε aA → a aA → aaA → aa aA → aaA → aaaA → aaa aA → aaA → aaaA → aaaaA → aaaa ecc ... |
a aA → aa aA → aaA → aaa aA → aaA → aaaA → aaaa aA → aaA → aaaA → aaaaA → aaaaa ecc ... |
Language | Language |
L= {an | n≥0 } | L= {an | n>0 } |
regex | regex |
a* | a+ |
quello che voglio notare e': la produzione A → aA
A → ε
A → a
A → aB
parte sx: solo 1 non terminale, come per Context-free grammar.
parte dx: max 1 non terminale, sempre a dx; o 1 terminale, o stringa vuota.
Grammar | Grammar |
---|---|
A → a A → aA A → b A → bA |
A → a A → aA A → bB B → bB B → ε |
Productions | Productions |
a b
aA → aa aA → ab bA → ba bA → bb
aA → aaA → aaa aA → abA → aba bA → baA → baa bA → bbA → bba
aA → aaA → aab aA → abA → abb bA → baA → bab bA → bbA → bbb
ecc ... |
A → a A → bB → b
A → aA → aa A → aA → abB → ab A → bB → bbB → bb
A → aA → aaA → aaa A → aA → aaA → aabB → aab A → aA → abB → abbB → abb A → bB → bbB → bbbB → bbb
A → aA → aaA → aaaA → aaaa A → aA → aaA → aaaA → aaabB → aaab A → aA → aaA → aabB → aabbB → aabb A → aA → abB → abbB → abbbB → abbb A → bB → bbB → bbbB → bbbbB → bbbb
ecc ... |
regex | regex |
(a|b)+ | a*b* |
o un nr qualsiasi di volte
Grammar | Grammar |
---|---|
A → xB B → yC C → z |
A → xB B → yC C → zA C → z |
Productions | Productions |
A → xB → xyC → xyz | A → xB → xyC → xyzA →
xyzxB → xyzxyC → xyzxyz |
regex | regex |
xyz | (xyz)+ |
per poter scrivere la grammatica in modo piu' compatto, si introduce una notazione piu' compatta-potente, che pero' genera gli stessi linguaggi. Analogo a BNF Backus–Naur Form . EBNF Extended BNF.
2 regole
A → w A, B non-terminal, w∈Σ*
A → wB
In particolare e' possibile scrivere una produzione del tipo
A → B
Grammar | Grammar | Grammar | ||||||||
---|---|---|---|---|---|---|---|---|---|---|
A → a A → aA
|
|
|
||||||||
Language | Language | Language | ||||||||
L= {an
| n>0 } |
L= {anbm | n,m>0 } |
L= {aibjck | i,j,k>0 } |
||||||||
regex | regex | regex | ||||||||
a+ | a+b+ | a+b+c+ |
Progressivo
Grammar | Language | regex |
---|---|---|
A → ε | L= {ε} | |
A → aA | L= {an | n≥0 } | a* |
A → B
B → ε |
idem | |
B → bB | L= {anbm | n,m≥0 } | a*b* |
B → C
C → ε |
idem | |
C → cC | L= {aibjck | i,j,k≥0 } | a*b*c* |
Grammar | Grammar | Language | regex |
---|---|---|---|
A → ε A → aA B → bC |
A → ε A → aA B → bC |
L= {aibck | i,k≥0 } |
a*bc* |
Grammar | Language | regex |
---|---|---|
A1 → aA2 A2 → aA3 A3 → a |
L= {a3} | a{3} |
A1 → aA2 A2 → aB1 B1 → bB2 B3 → bC1 C1 → c |
L= {a2b3c} | a{2}b{3}c |
wp/Regular_expression#Formal_definition
Pattern recognition algorithms generally aim to provide a reasonable answer for all possible inputs and to perform "most likely" matching of the inputs, taking into account their statistical variation.
es: riconoscere la foto di un cane.
pattern matching algorithms look for exact matches in the input with pre-existing patterns.
es: pattern-matching by regex.
espressione regolare (di un linguaggio) descrive un tipo di struttura di stringhe tramite un procedimento che permette di calcolare se una qualsiasi stringa ha o no tale struttura.
A → ε A → aA L= {an |n≥0 }
regex a* |
A → ε A → aA
B → ε L= {anbm
regex a*b*
|
A → ε A → aA A → B B → ε B → C C → ε L= {aibjck
regex a*b*c* |
Grammar | Grammar | Grammar |
---|---|---|
A → a A → aA L= {an | n>0 }
regex a+ |
A → a A → aA
B → b L= {anbm
regex a+b+
|
A → a A → aA A → B B → b B → C C → c L= {aibjck
regex a+b+c+ |