Just revising on the previous part we have the following relation. Here $Lang$ denote the set of languages defined by the given set of grammars.
- $LL(0) \subset LL(1) \subset LL(2) \dots LL(k) \subset LL(k+1)\dots$
- $Lang(LL(0)) \subset Lang(LL(1)) \subset Lang(LL(2)) \dots Lang(LL(k)) \subset Lang(LL(k+1))\dots$
- $LR(0) \subset LR(1) \subset LR(2) \dots LR(k) \subset LR(k+1)\dots$
- $Lang(LR(0)) \subset Lang(LR(1)) = Lang(LR(2)) \dots Lang(LR(k)) = Lang(LR(k+1))\dots$ ($Lang(LR(0)$ is DCFL with prefix property and $Lang(LR(k)); k \geq 1$ is DCFL (with or without prefix property))
- $Lang(LR(0)\$) = Lang(LR(1)) = Lang(LR(2)) \dots Lang(LR(k)) = Lang(LR(k+1))\dots$ ($Lang(LR(0)\$$ is DCFL as when we add an end delimiter no string becomes a prefix of another)
Coming to LR parsing we have $3$ main types:
- Simple LR or SLR(1) – Can parse a superset of LR(0) grammars but a proper subset of LR(1) grammars
- LALR – Can parse a superset of grammars that can be parsed by SLR(1) parsers but a proper subset of LR(1) grammars
- LR(1) or Canonical LR(1) – Can parse any LR(1) grammar
Now we will see more about these:
LR(0) parser is quite uncommon because it is quite restrictive (not in terms of the language but interms of the grammars it can parse). By adding an end delimiter we can remove the prefix-free restriction of $LR(0)$ set of languages (when an end delimiter is added, no string becomes a prefix of any other string) but even then it does not allow $\epsilon$ production for a symbol if that symbol is dervining anything else. Still $LR(0)$ parser is the basic LR parser and will help us understand how LR parsers work in general. SLR(1), CLR(1) and LALR(1) are all extensions to $LR(0)$ parser.
Lets consider a grammar
- $E’ \to E$
- $E \to E + T $
- $E \to T $
- $T \to (E) $
- $T \to id$
Now, when we parse a string say “id + id”, at each step we will either Shift or Reduce. For example:
$id + id$ (Shift “id”)
$\quad => T + id$ Reduce $(T \to id)$
$\quad => E + id$ Reduce $(T \to E)$ Shift “+”
$\quad=> E + T$ Reduce $(T \to id)$
$\quad=> E$ Reduce $(E \to E + T)$
$\quad=> E’$ Reduce $(E’ \to E)$
Now, to get these steps programmatically lets construct a DFA. The states of the DFA are the LR(0) items and they are made from the Grammar productions. The formal procedure is given here but the as you can see below, the construction is pretty intuitive. For the start production add a “$\bullet$” to the leftmost point on RHS. Then, if a non-terminal follows, add its production too to the item. Rest of the procedure can even be inferred from the below diagram.
As you can see in the diagram
- the states of the DFA are the LR(0) items
- the transitions are for each grammar symbol (both terminals and non-terminals)
- If we consider all states as accepting, the DFA above is accepting the set of all viable prefixes
- Any item with a $\bullet$ at end of the production is accepting handles.
Now, how we will use this DFA to parse a given string? Say “id+id”. It is not the case that a given string will take the DFA to accepting state – because the given grammar is context-free and not necessarily regular. So, we will need something EXTRA which is a stack. Lets see how this STACK helps.
- We always start in the $I_0$ state. Since “id” comes in the input we
- GOTO $I_4$ (see DFA)
- PUSH $I_4.$ (Pushing on Stack which is used later for coming back when we have a REDUCE move)
- SHIFT “id”.
- Stack becomes $I_0 I_4$
- In $I_4$ $\bullet$ is at the end of the production and so we
- REDUCE by $T \to id.$
$\bullet$ is at end of a production in $I_4$. So, in the DFA we go back (going back requires a stack) one step (POP one item from stack which is $I_4$ and we get to $I_0$ (since RHS of production has one symbol we POP only one symbol from stack)
- Since we did a REDUCE, we will move to the state given by $GOTO (I_0, T)$ $(T$ is the LHS of the REDUCE move) which is $I_2$ (the transition on $T$ from $I_0$ is $I_2.$)
- Stack is $I_0 I_2$
- REDUCE by $T \to id.$
- Again in $I_2$ we have a $\bullet$ at end of the production and so we reduce by $E \to T$ and go back to $I_0$ and then to $I_1$ as the transition on $E$ from $I_0$ is $I_1.$
- Stack goes to $I_0 I_1$
- PS: In $I_1$ $E’$ being the start symbol, $E’ \to E \bullet$ will cause a reduce move only for the end symbol $\$.$ And so there is no Shift-Reduce conflict here.
- Stack is $I_0I_1$ only
- Next input symbol is “+” and so we SHIFT “+” and move to $I_5.$
- Stack is $I_0I_1I_5$
- Next input symbol is “id” and we go to $I_4$ by shifting “id”.
- Stack is $I_0I_1I_5I_4$
- In $I_4$ $\bullet$ is at the end of the production and so we reduce by $T \to id.$
- We come back to $I_5$ and on $T$ (the LHS of production used for Reduce) we move to $I_8.$
- Stack is $I_0I_1I_5I_8$
- In $I_8$ $\bullet$ is at the end of the production $E \to E + T \bullet$ and so we reduce by $E \to E+ T \bullet $
- Here the RHS of the production HAS 3 symbols and so we need to pop $3$ states from the stack (basically traverse back 3 steps in the DFA along the path we came)
- Stack becomes $I_0$
- Since we reduced by $E \to E + T$ we have to go to $GOTO(I_0, E)$ which is $I_1$
- Stack becomes $I_0I_1$
- In $I_1$ “$\$$” comes and we reduce by the Start production $E’ \to E$ and that signals “Accept”.
- That’s it.
If this part is thorough, the next $3$ parsers $SLR(1), CLR(1)$ and $LALR(1)$ will be quite straightforward.