Discussion 12: Regular Expressions & BNF

Files: disc12.pdf

This is an online worksheet that you can work on during discussions. Your work is not graded and you do not need to submit anything.

BNF

Backus-Naur Form (BNF) is a syntax for describing a context-free grammar. It was invented for describing the syntax of programming languages, and is still commonly used in documentation and language parsers. EBNF is a dialect of BNF which contains some convenient shorthands.

An EBNF grammar contains symbols and a set of recursive production rules. In 61A, we are using the Python Lark library to write EBNF grammars, which has a few specific rules for grammar writing.

There are two types of symbols: Non-terminal symbols can expand into non-terminals (including themselves) or terminals. In the Python Lark library, non-terminal symbols are always lowercase. Terminal symbols can be strings or regular expressions. In Lark, terminals are always uppercase.

Consider these two production rules:

numbers: INTEGER | numbers "," INTEGER
INTEGER: /-?\d+/

The symbol numbers is a non-terminal with a recursive production rule. It corresponds to either an INTEGER terminal or to the numbers symbol (itself) plus a comma plus an INTEGER terminal. The INTEGER terminal is defined using a regular expression which matches any number of digits with an optional - sign in front.

This grammar can describe strings like:

10
10,-11
10,-11,12

And so on, with any number of integers in front.

A grammar should also specify a start symbol, which corresponds to the whole expression being parsed (or the whole sentence, for a spoken language).

For the simple example of comma-separated numbers, the start symbol could just be the numbers terminal itself:

?start: numbers
numbers: numbers "," INTEGER | INTEGER
INTEGER: /-?\d+/

EBNF grammars can use these shorthand notations for specifying how many symbols to match:

EBNF Notation Meaning Pure BNF Equivalent
item\* Zero or more items items: \| items item
item+ One or more items items: item \| items item
[item]
item?
Optional item optitem: \| item

Lark also includes a few handy features:

  • You can specify tokens to complete ignore by using the ignore directive at the bottom of a grammar. For example, %ignore /\s+/ ignores all whitespace (tabs/spaces/new lines).
  • You can import pre-defined terminals for common types of data to match. For example, %import common.NUMBER imports a terminal that matches any integer or decimal number.

Q1: lambda BNF

We’ve written a simple BNF grammar to handle lambda expressions. The body of our lambda has to consist of a single expression, which can be a number, word, or another lambda expression.

?start: lambda_expression
lambda_expression: "lambda " arguments ":" body
arguments: WORD ("," WORD)*
body: expression
?expression: value | lambda_expression
?value: WORD | NUMBER

%import common.WORD
%import common.NUMBER
%ignore /\s+/

For each of the given examples, draw the resulting tree created by this BNF.

lark> lambda x: 5
lark> lambda x, y: x
lark> lambda x: lambda y: x

Regular Expressions

Q6: Email Domain Validator

Create a regular expression that makes sure a given string email is a valid email address and that its domain name is in the provided list of domains.

An email address is valid if it contains letters, number, or underscores, followed by an @ symbol, then a domain.

All domains will have a 3 letter extension following the period.

Hint: For this problem, you will have to make a regex pattern based on the inputs domains. A for loop can help with that.

Extra: There is a particularly elegant solution that utilizes join and replace instead of a for loop.

Note: The skeleton code is just a suggestion; feel free to use your own structure if you prefer.

import re
def email_validator(email, domains):
"""
>>> email_validator("oski@berkeley.edu", ["berkeley.edu", "gmail.com"])
True
>>> email_validator("oski@gmail.com", ["berkeley.edu", "gmail.com"])
True
>>> email_validator("oski@berkeley.com", ["berkeley.edu", "gmail.com"])
False
>>> email_validator("oski@berkeley.edu", ["yahoo.com"])
False
>>> email_validator("xX123_iii_OSKI_iii_123Xx@berkeley.edu", ["berkeley.edu", "gmail.com"])
True
>>> email_validator("oski@oski@berkeley.edu", ["berkeley.edu", "gmail.com"])
False
>>> email_validator("oski@berkeleysedu", ["berkeley.edu", "gmail.com"])
False
"""

pattern = _____________________
for _____________________:
'Use as many lines as necessary'
return bool(re.search(pattern, email))

© 2023 Brigham Young University, All Rights Reserved