antlr4: syntax ambiguity, left recursion, all?
Kode Charlie
My syntax looks like below and it doesn't compile. The error returned (from the antlr4 maven plugin) is:
[INFO] --- antlr4-maven-plugin:4.3:antlr4 (default-cli) @ beebell ---
[INFO] ANTLR 4: Processing source directory /Users/kodecharlie/workspace/beebell/src/main/antlr4
[INFO] Processing grammar: DateRange.g4
org\antlr\v4\parse\GrammarTreeVisitor.g: node from line 13:87 mismatched tree node: startTime expecting <UP>
org\antlr\v4\parse\GrammarTreeVisitor.g: node from after line 13:87 mismatched tree node: RULE expecting <UP>
org\antlr\v4\parse\GrammarTreeVisitor.g: node from line 13:87 mismatched tree node: startTime expecting <UP>
org\antlr\v4\parse\GrammarTreeVisitor.g: node from after line 13:87 mismatched tree node: RULE expecting <UP>
org\antlr\v4\parse\GrammarTreeVisitor.g: node from line 13:87 mismatched tree node: startTime expecting <UP>
org\antlr\v4\parse\GrammarTreeVisitor.g: node from after line 13:87 mismatched tree node: RULE expecting <UP>
org\antlr\v4\parse\GrammarTreeVisitor.g: node from line 13:87 mismatched tree node: startTime expecting <UP>
org\antlr\v4\parse\GrammarTreeVisitor.g: node from after line 13:87 mismatched tree node: RULE expecting <UP>
[ERROR] error(20): internal error: Rule HOUR undefined
[ERROR] error(20): internal error: Rule MINUTE undefined
[ERROR] error(20): internal error: Rule SECOND undefined
[ERROR] error(20): internal error: Rule HOUR undefined
[ERROR] error(20): internal error: Rule MINUTE undefined
I can see that the syntax can be confused - for example, is the 2 digit number MINUTE, SECOND or HOUR (or maybe the start of the year). But some articles suggest that this error is caused by left recursion.
Can you tell me what happened?
thanks. Here is the syntax:
grammar DateRange;
range : startDate (THRU endDate)? | 'Every' LONG_DAY 'from' startDate THRU endDate ;
startDate : dateTime ;
endDate : dateTime ;
dateTime : GMTOFF | SHRT_MDY | YYYYMMDD | (WEEK_DAY)? LONG_MDY ;
// Dates.
GMTOFF : YYYYMMDD 'T' HOUR ':' MINUTE ':' SECOND ('-'|'+') HOUR ':' MINUTE ;
YYYYMMDD : YEAR '-' MOY '-' DOM ;
SHRT_MDY : MOY ('/' | '-') DOM ('/' | '-') YEAR ;
LONG_MDY : (SHRT_MNTH '.'? | LONG_MNTH) WS DOM ','? (WS YEAR (','? WS TIMESPAN)? | WS startTime)? ;
YEAR : DIGIT DIGIT DIGIT DIGIT ; // year
MOY : (DIGIT | DIGIT DIGIT) ; // month of year.
DOM : (DIGIT | DIGIT DIGIT) ; // day of month.
TIMESPAN : startTime (WS THRU WS endTime)? ;
// Time-of-day.
startTime : TOD ;
endTime : TOD ;
TOD : NOON | HOUR2 (':' MINUTE)? WS? MERIDIAN ;
NOON : 'noon' ;
HOUR2 : (DIGIT | DIGIT DIGIT) ;
MERIDIAN : 'AM' | 'am' | 'PM' | 'pm' ;
// 24-hour clock. Sanity-check range in listener.
HOUR : DIGIT DIGIT ;
MINUTE : DIGIT DIGIT ;
SECOND : DIGIT DIGIT ;
// Range verb.
THRU : WS ('-'|'to') WS -> skip ;
// Weekdays.
WEEK_DAY : (SHRT_DAY | LONG_DAY) ','? WS ;
SHRT_DAY : 'Sun' | 'Mon' | 'Tue' | 'Wed' | 'Thu' | 'Fri' | 'Sat' -> skip ;
LONG_DAY : 'Sunday' | 'Monday' | 'Tuesday' | 'Wednesday' | 'Thursday' | 'Friday' | 'Saturday' -> skip ;
// Months.
SHRT_MNTH : 'Jan' | 'Feb' | 'Mar' | 'Apr' | 'May' | 'Jun' | 'Jul' | 'Aug' | 'Sep' | 'Oct' | 'Nov' | 'Dec' ;
LONG_MNTH : 'January' | 'February' | 'March' | 'April' | 'May' | 'June' | 'July' | 'August' | 'September' | 'October' | 'November' | 'December' ;
DIGIT : [0-9] ;
WS : [ \t\r\n]+ -> skip ;
Kode Charlie
I solved this problem by setting a unique production rule for each sequence of numbers (length 1, 2, 3 or 4). Again, I've simplified several rules - in fact, trying to make the alternative to production rules more straightforward. Anyway, here is the final result of the compilation:
grammar DateRange;
range : 'Every' WS longDay WS 'from' WS startDate THRU endDate
| startDate THRU endDate
| startDate
;
startDate : dateTime ; endDate : dateTime ; dateTime : utc
| shrtMdy
| yyyymmdd
| longMdy
| weekDay ','? WS longMdy
;
// Dates.
utc : yyyymmdd 'T' hour ':' minute ':' second ('-'|'+') hour ':' minute ;
yyyymmdd : year '-' moy '-' dom ;
shrtMdy : moy ('/' | '-') dom ('/' | '-') year ;
longMdy : longMonth WS dom ','? optYearAndOrTime?
| shrtMonth '.'? WS dom ','? optYearAndOrTime?
;
optYearAndOrTime : WS year ','? WS timespan
| WS year
| WS timespan
;
fragment DIGIT : [0-9] ;
ONE_DIGIT : DIGIT ;
TWO_DIGITS : DIGIT ONE_DIGIT ;
THREE_DIGITS : DIGIT TWO_DIGITS ;
FOUR_DIGITS : DIGIT THREE_DIGITS ;
year : FOUR_DIGITS ; // year
moy : ONE_DIGIT | TWO_DIGITS ; // month of year.
dom : ONE_DIGIT | TWO_DIGITS ; // day of month.
timespan : (tod THRU tod) | tod ;
// Time-of-day.
tod : noon | (hour2 (':' minute)? WS? meridian?) ;
noon : 'noon' ; hour2 : ONE_DIGIT | TWO_DIGITS ;
meridian : ('AM' | 'am' | 'PM' | 'pm' | 'a.m.' | 'p.m.') ;
// 24-hour clock. Sanity-check range in listener.
hour : TWO_DIGITS ;
minute : TWO_DIGITS ;
second : TWO_DIGITS ; // we do not use seconds.
// Range verb.
THRU : WS? ('-'|'–'|'to') WS? ;
// Weekdays.
weekDay : shrtDay | longDay ; shrtDay : 'Sun' | 'Mon' | 'Tue' | 'Wed' | 'Thu' | 'Fri' | 'Sat' ; longDay : 'Sunday' | 'Monday' | 'Tuesday' | 'Wednesday' | 'Thursday' | 'Friday' | 'Saturday' ;
// Months.
shrtMonth : 'Jan' | 'Feb' | 'Mar' | 'Apr' | 'May' | 'Jun' | 'Jul' | 'Aug' | 'Sep' | 'Oct' | 'Nov' | 'Dec' ;
longMonth : 'January' | 'February' | 'March' | 'April' | 'May' | 'June' | 'July' | 'August' | 'September' | 'October' | 'November' | 'December' ;
WS : ~[a-zA-Z0-9,.:]+ ;