Well lets start from the basic of Bottom up Parsing
Consider the grammar
- $S \to XX$
- $X \to aX \mid b$
Now, consider a string in $L(S)$ say aabb. We can parse it as follows by left most derivation – replacing the left most non-terminal in each step, or right most derivation – replacing the rightmost non-terminal in each step.
$S \to XX$ $\quad \to aX X$ $\quad \to aaXX$ $\quad \to aabX$ $\quad \to aabb$ |
$S \to XX$ $\quad \to Xb$ $\quad \to aXb$ $\quad \to aaXb$ $\quad \to aabb$ |
Here, the rightmost-derivation is used in bottom-up parsing. We will soon come back to it.
- Any string derivable from the start symbol is a sentential form — it becomes a sentence if it contains only terminals
- A sentential form that occurs in the leftmost derivation of some sentence is called left-sentential form
- A sentential form that occurs in the rightmost derivation of some sentence is called right-sentential form
Consider the string aabb again. We will see a method to parse this:
- Scan the input from left to right and see if any substring matches the RHS of any production. If so, replace that substring by the LHS of the production.
So, for “aabb” we do as follows
- ‘a’ “abb” – No RHS matches ‘a’
- ‘aa’ “bb” – No RHS matches ‘a’ or ‘aa’
- ‘aab’ “b” – RHS of $X \to b$ matches ‘b’ (first b from left) and so we write as
- aaXb
- ‘aaX’ “b” – RHS of $X \to aX$ matches “aX” and so we write as
- aXb – Again RHS of $X \to aX$ matches “aX” and we get
- Xb – RHS of $X \to b$ matches “b” and we get
- XX – RHS of $S \to XX$ matches XX and we get
- S – the start symbol.
Now what we did here is nothing but a bottom-up parsing. Bottom-up because we started from the string and not from the grammar. Here, we applied a sequence of reductions, which are as follows:
- $aabb \to aaXb \to aXb \to Xb \to XX \to S$
If we go up and see the Rightmost derivation of the string “aabb”, what we got is the same but in REVERSE order. i.e., our bottom up parsing is doing reverse of the RIGHTMOST derivation. So, we can call it an $LR$ parser – $L$ for scanning the input from Left side and $R$ for doing a Rightmost derivation.
In our parsing we substituted the RHS of a production at each step. This substituted “substring” is called a HANDLE and are shown in BOLD below.
- aabb $\to$ aaXb $\to$ aXb $\to$ Xb $\to$ XX $\to$ S
Formally a handle is defined as (Greek letters used to denote a string of terminals and non-terminals)
A handle of a right sentential form ‘$\gamma$’ $(\gamma = \alpha \delta \beta$) is a production $E \to \delta$ and a position in $\gamma$ where $\delta$ can be found and substituted by $E$ to get the previous step in the right most derivation of $\gamma$ — previous and not “next” because we are doing rightmost derivation in REVERSE. Handle can be given as a production or just the RHS of a production.
The handle is not necessarily starting from the left most position as clear from the above example. There is importance to the input string which occurs to the left of the handle. For example for the handles of “aabb”, we can have the following set of prefixes
aabb | $\{a, aa, aab\}$ |
aaXb | $\{a, aa, aaX\}$ |
aXb | $\{a, aX\}$ |
Xb | $\{X, Xb\}$ |
XX | $\{X, XX\}$ |
These set of prefixes are called Viable Prefixes. Formally
Viable prefixes are the prefixes of right sentential forms that do not extend beyond the end of its handle.
i.e., a viable prefix either has no handle or just one possible handle on the extreme RIGHT which can be reduced.
We will see later that viable prefixes can also be defined as the set of prefixes of the right-sentential form that can appear on the stack of a shift-reduce parser. Also, the set of all viable prefixes of the right sentential forms of a grammar is a REGULAR LANGUAGE. i.e., viable prefixes can be recognized by using a FINITE AUTOMATA. Using this FINITE AUTOMATA and a stack we get the power of a Push Down Automata and that is how we can parse context-free languages.
In the next part we will see how to construct the above FINITE AUTOMATA and how our stack works for different LR parsers.
Reference: Bottom Up Parsing
Compiler Design Preparation Resources for GATE CSE