FYI: default %printer/%destructor
Joel E. Denny
2006-07-29 05:53:55 UTC
I committed the following, which implements default %printer and
%destructor declarations. I'll get to the per-type ones later.

Joel
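
For readers skimming the patch, the lookup rule that symbol_destructor_get
adds in src/symtab.c can be sketched in standalone C roughly as follows. This
is only an illustration under stated assumptions: the sketch_* names and the
trimmed-down struct are hypothetical, strcmp stands in for Bison's
UNIQSTR_EQ comparison, and the NULL check on the argument is not in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for Bison's struct symbol.  */
struct sketch_symbol
{
  const char *tag;        /* Symbol name, e.g. "STRING1" or "$end".  */
  const char *destructor; /* Symbol-specific %destructor code, or NULL.  */
};

/* The single default %destructor, or NULL if none was declared.  */
static const char *sketch_default_destructor = NULL;

/* Return the %destructor code that applies to SYM, or NULL if none.  */
const char *
sketch_destructor_get (const struct sketch_symbol *sym)
{
  assert (sym != NULL); /* Sketch-only precondition.  */

  /* The Bison-generated end token never gets a %destructor.  */
  if (strcmp (sym->tag, "$end") == 0)
    return NULL;

  /* A symbol-specific declaration wins over the default.  */
  if (sym->destructor != NULL)
    return sym->destructor;
  return sketch_default_destructor;
}
```

With a default declared, a symbol without its own %destructor resolves to the
default, a symbol with its own keeps it, and "$end" resolves to NULL. A
user-declared end token (for example `%token END 0`) has a different tag, so
it falls through to the default like any other symbol.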

Index: ChangeLog
===================================================================
RCS file: /sources/bison/bison/ChangeLog,v
retrieving revision 1.1539
diff -p -u -r1.1539 ChangeLog
--- ChangeLog 29 Jul 2006 04:11:32 -0000 1.1539
+++ ChangeLog 29 Jul 2006 05:34:37 -0000
@@ -1,5 +1,49 @@
2006-07-29 Joel E. Denny <***@ces.clemson.edu>

+ Enable declaration of default %printer/%destructor. Make the parser
+ use these for all user-declared grammar symbols for which the user does
+ not declare a specific %printer/%destructor. Thus, the parser uses it
+ for token 0 if the user declares it but not if Bison generates it as
+ $end. Discussed starting at
+ <http://lists.gnu.org/archive/html/bison-patches/2006-02/msg00064.html>,
+ <http://lists.gnu.org/archive/html/bison-patches/2006-06/msg00091.html>,
+ and
+ <http://lists.gnu.org/archive/html/bison-patches/2006-07/msg00019.html>.
+ * NEWS (2.3+): Mention.
+ * doc/bison.texinfo (Actions in Mid-Rule): It's no longer impossible to
+ declare a %destructor for a mid-rule's semantic value. It's just
+ impossible to declare one specific to it.
+ (Freeing Discarded Symbols): Mention that @$ can be used in %destructor
+ code. Describe default %destructor form.
+ * src/parse-gram.y (grammar_declaration): Parse default
+ %printer/%destructor declarations.
+ * src/output.c (symbol_destructors_output): Use symbol_destructor_get
+ and symbol_destructor_location_get rather than accessing the destructor
+ and destructor_location members of struct symbol.
+ (symbol_printers_output): Likewise but for %printer's.
+ * src/reader.c (symbol_should_be_used): Likewise but for %destructor's
+ again.
+ * src/symtab.c (default_destructor, default_destructor_location,
+ default_printer, default_printer_location): New static global
+ variables to record the default %destructor and %printer.
+ (symbol_destructor_get, symbol_destructor_location_get,
+ symbol_printer_get, symbol_printer_location_get): New functions to
+ compute the appropriate %destructor and %printer for a symbol.
+ (default_destructor_set, default_printer_set): New functions to set the
+ default %destructor and %printer.
+ * src/symtab.h: Prototype all those new functions.
+ * tests/actions.at (Default %printer and %destructor): New test to
+ check that the right %printer and %destructor are called, that they're
+ not called for $end, and that $$ and @$ work correctly.
+ (Default %printer and %destructor for user-declared end token): New
+ test to check that the default %printer and %destructor are called for
+ a user-declared end token.
+ * tests/input.at (Default %printer and %destructor redeclared, Unused
+ values with default %destructor): New tests to check related grammar
+ warnings and errors.
+
+2006-07-29 Joel E. Denny <***@ces.clemson.edu>
+
Clean up handling of %destructor for the end token (token 0).
Discussed starting at
<http://lists.gnu.org/archive/html/bison-patches/2006-07/msg00019.html>
Index: NEWS
===================================================================
RCS file: /sources/bison/bison/NEWS,v
retrieving revision 1.155
diff -p -u -r1.155 NEWS
--- NEWS 9 Jul 2006 20:36:32 -0000 1.155
+++ NEWS 29 Jul 2006 05:34:37 -0000
@@ -12,6 +12,23 @@ Changes in version 2.3+:
* Locations columns and lines start at 1.
In accordance with the GNU Coding Standards and Emacs.

+* You may now declare a default %destructor and %printer:
+
+ For example:
+
+ %union { char *string; }
+ %token <string> STRING1
+ %token <string> STRING2
+ %type <string> string1
+ %type <string> string2
+ %destructor { free ($$); }
+ %destructor { free ($$); printf ("%d", @$.first_line); } STRING1 string1
+
+ guarantees that, when the parser discards any user-declared symbol, it passes
+ its semantic value to `free'. However, when the parser discards a `STRING1'
+ or a `string1', it also prints its line number to `stdout'. It performs only
+ the second `%destructor' in this case, so it invokes `free' only once.
+
* Except for LALR(1) parsers in C with POSIX Yacc emulation enabled (with `-y',
`--yacc', or `%yacc'), Bison no longer generates #define statements for
associating token numbers with token names. Removing the #define statements
Index: doc/bison.texinfo
===================================================================
RCS file: /sources/bison/bison/doc/bison.texinfo,v
retrieving revision 1.198
diff -p -u -r1.198 bison.texinfo
--- doc/bison.texinfo 10 Jul 2006 00:37:25 -0000 1.198
+++ doc/bison.texinfo 29 Jul 2006 05:34:41 -0000
@@ -3348,8 +3348,8 @@ it might discard the previous semantic c
restoring it.
Thus, @code{$<context>5} needs a destructor (@pxref{Destructor Decl, , Freeing
Discarded Symbols}).
-However, Bison currently provides no means to declare a destructor for a
-mid-rule action's semantic value.
+However, Bison currently provides no means to declare a destructor specific to
+a particular mid-rule action's semantic value.

One solution is to bury the mid-rule action inside a nonterminal symbol and to
declare a destructor for that symbol:
@@ -4007,26 +4007,40 @@ symbol is automatically discarded.
Invoke the braced @var{code} whenever the parser discards one of the
@var{symbols}.
Within @var{code}, @code{$$} designates the semantic value associated
-with the discarded symbol. The additional parser parameters are also
-available (@pxref{Parser Function, , The Parser Function
-@code{yyparse}}).
+with the discarded symbol, and @code{@@$} designates its location.
+The additional parser parameters are also available (@pxref{Parser Function, ,
+The Parser Function @code{yyparse}}).
+@end deffn
+
+@deffn {Directive} %destructor @{ @var{code} @}
+@cindex default %destructor
+Invoke the braced @var{code} whenever the parser discards any user-declared
+grammar symbol for which the user has not specifically declared any
+@code{%destructor}.
+This is known as the default @code{%destructor}.
+As in the previous form, @code{$$}, @code{@@$}, and the additional parser
+parameters are available.
@end deffn

For instance:

@smallexample
-%union
-@{
- char *string;
-@}
-%token <string> STRING
-%type <string> string
-%destructor @{ free ($$); @} STRING string
+%union @{ char *string; @}
+%token <string> STRING1
+%token <string> STRING2
+%type <string> string1
+%type <string> string2
+%destructor @{ free ($$); @}
+%destructor @{ free ($$); printf ("%d", @@$.first_line); @} STRING1 string1
@end smallexample

@noindent
-guarantees that when a @code{STRING} or a @code{string} is discarded,
-its associated memory will be freed.
+guarantees that, when the parser discards any user-declared symbol, it passes
+its semantic value to @code{free}.
+However, when the parser discards a @code{STRING1} or a @code{string1}, it also
+prints its line number to @code{stdout}.
+It performs only the second @code{%destructor} in this case, so it invokes
+@code{free} only once.

@sp 1

Index: src/output.c
===================================================================
RCS file: /sources/bison/bison/src/output.c,v
retrieving revision 1.251
diff -p -u -r1.251 output.c
--- src/output.c 9 Jul 2006 20:36:33 -0000 1.251
+++ src/output.c 29 Jul 2006 05:34:41 -0000
@@ -390,7 +390,7 @@ symbol_destructors_output (FILE *out)

fputs ("m4_define([b4_symbol_destructors], \n[", out);
for (i = 0; i < nsyms; ++i)
- if (symbols[i]->destructor)
+ if (symbol_destructor_get (symbols[i]))
{
symbol *sym = symbols[i];

@@ -399,10 +399,12 @@ symbol_destructors_output (FILE *out)
destructor, optional typename. */
fprintf (out, "%s[", sep);
sep = ",\n";
- escaped_output (out, sym->destructor_location.start.file);
- fprintf (out, ", %d, ", sym->destructor_location.start.line);
+ escaped_output (out, symbol_destructor_location_get (sym).start.file);
+ fprintf (out, ", %d, ",
+ symbol_destructor_location_get (sym).start.line);
escaped_output (out, sym->tag);
- fprintf (out, ", %d, [[%s]]", sym->number, sym->destructor);
+ fprintf (out, ", %d, [[%s]]", sym->number,
+ symbol_destructor_get (sym));
if (sym->type_name)
fprintf (out, ", [[%s]]", sym->type_name);
fputc (']', out);
@@ -423,7 +425,7 @@ symbol_printers_output (FILE *out)

fputs ("m4_define([b4_symbol_printers], \n[", out);
for (i = 0; i < nsyms; ++i)
- if (symbols[i]->printer)
+ if (symbol_printer_get (symbols[i]))
{
symbol *sym = symbols[i];

@@ -432,10 +434,10 @@ symbol_printers_output (FILE *out)
printer, optional typename. */
fprintf (out, "%s[", sep);
sep = ",\n";
- escaped_output (out, sym->printer_location.start.file);
- fprintf (out, ", %d, ", sym->printer_location.start.line);
+ escaped_output (out, symbol_printer_location_get (sym).start.file);
+ fprintf (out, ", %d, ", symbol_printer_location_get (sym).start.line);
escaped_output (out, sym->tag);
- fprintf (out, ", %d, [[%s]]", sym->number, sym->printer);
+ fprintf (out, ", %d, [[%s]]", sym->number, symbol_printer_get (sym));
if (sym->type_name)
fprintf (out, ", [[%s]]", sym->type_name);
fputc (']', out);
Index: src/parse-gram.y
===================================================================
RCS file: /sources/bison/bison/src/parse-gram.y,v
retrieving revision 1.87
diff -p -u -r1.87 parse-gram.y
--- src/parse-gram.y 13 Jul 2006 20:05:34 -0000 1.87
+++ src/parse-gram.y 29 Jul 2006 05:34:43 -0000
@@ -261,6 +261,10 @@ grammar_declaration:
{
grammar_start_symbol_set ($2, @2);
}
+| "%destructor" "{...}"
+ {
+ default_destructor_set (translate_symbol_action ($2, @2), @2);
+ }
| "%destructor" "{...}" symbols.1
{
symbol_list *list;
@@ -277,6 +281,10 @@ grammar_declaration:
symbol_printer_set (list->sym, action, @2);
symbol_list_free ($3);
}
+| "%printer" "{...}"
+ {
+ default_printer_set (translate_symbol_action ($2, @2), @2);
+ }
| "%default-prec"
{
default_prec = true;
Index: src/reader.c
===================================================================
RCS file: /sources/bison/bison/src/reader.c,v
retrieving revision 1.266
diff -p -u -r1.266 reader.c
--- src/reader.c 29 Jul 2006 04:11:33 -0000 1.266
+++ src/reader.c 29 Jul 2006 05:34:43 -0000
@@ -259,7 +259,7 @@ grammar_current_rule_begin (symbol *lhs,
static bool
symbol_should_be_used (symbol_list const *s)
{
- return (s->sym->destructor
+ return (symbol_destructor_get (s->sym)
|| (s->midrule && s->midrule->used));
}

Index: src/symtab.c
===================================================================
RCS file: /sources/bison/bison/src/symtab.c,v
retrieving revision 1.73
diff -p -u -r1.73 symtab.c
--- src/symtab.c 7 Jul 2006 21:25:03 -0000 1.73
+++ src/symtab.c 29 Jul 2006 05:34:43 -0000
@@ -41,6 +41,15 @@ symbol *accept = NULL;
symbol *startsymbol = NULL;
location startsymbol_location;

+/*-----------------------------------.
+| Default %destructor and %printer. |
+`-----------------------------------*/
+
+static const char *default_destructor = NULL;
+static location default_destructor_location;
+static const char *default_printer = NULL;
+static location default_printer_location;
+
/*---------------------------------.
| Create a new symbol, named TAG. |
`---------------------------------*/
@@ -147,6 +156,33 @@ symbol_destructor_set (symbol *sym, cons
}
}

+/*---------------------------------------.
+| Get the computed %destructor for SYM. |
+`---------------------------------------*/
+
+const char *
+symbol_destructor_get (symbol *sym)
+{
+ /* Token 0 cannot have a %destructor unless the user renames it. */
+ if (UNIQSTR_EQ (sym->tag, uniqstr_new ("$end")))
+ return NULL;
+
+ if (sym->destructor != NULL)
+ return sym->destructor;
+ return default_destructor;
+}
+
+/*---------------------------------------------------------------.
+| Get the grammar location of the %destructor computed for SYM. |
+`---------------------------------------------------------------*/
+
+location
+symbol_destructor_location_get (symbol *sym)
+{
+ if (sym->destructor != NULL)
+ return sym->destructor_location;
+ return default_destructor_location;
+}

/*---------------------------------------------------------------.
| Set the PRINTER associated with SYM. Do nothing if passed 0. |
@@ -164,6 +200,34 @@ symbol_printer_set (symbol *sym, const c
}
}

+/*------------------------------------.
+| Get the computed %printer for SYM. |
+`------------------------------------*/
+
+const char *
+symbol_printer_get (symbol *sym)
+{
+ /* Token 0 cannot have a %printer unless the user renames it. */
+ if (UNIQSTR_EQ (sym->tag, uniqstr_new ("$end")))
+ return NULL;
+
+ if (sym->printer != NULL)
+ return sym->printer;
+ return default_printer;
+}
+
+/*------------------------------------------------------------.
+| Get the grammar location of the %printer computed for SYM. |
+`------------------------------------------------------------*/
+
+location
+symbol_printer_location_get (symbol *sym)
+{
+ if (sym->printer != NULL)
+ return sym->printer_location;
+ return default_printer_location;
+}
+

/*-----------------------------------------------------------------.
| Set the PRECEDENCE associated with SYM. Does nothing if invoked |
@@ -666,3 +730,32 @@ symbols_pack (void)
_("the start symbol %s is a token"),
startsymbol->tag);
}
+
+
+/*-----------------------------------.
+| Set default %destructor/%printer. |
+`-----------------------------------*/
+
+void
+default_destructor_set (const char *destructor, location loc)
+{
+ if (default_destructor != NULL)
+ {
+ complain_at (loc, _("redeclaration for default %%destructor"));
+ complain_at (default_destructor_location, _("previous declaration"));
+ }
+ default_destructor = destructor;
+ default_destructor_location = loc;
+}
+
+void
+default_printer_set (const char *printer, location loc)
+{
+ if (default_printer != NULL)
+ {
+ complain_at (loc, _("redeclaration for default %%printer"));
+ complain_at (default_printer_location, _("previous declaration"));
+ }
+ default_printer = printer;
+ default_printer_location = loc;
+}
Index: src/symtab.h
===================================================================
RCS file: /sources/bison/bison/src/symtab.h,v
retrieving revision 1.62
diff -p -u -r1.62 symtab.h
--- src/symtab.h 27 Jun 2006 14:09:54 -0000 1.62
+++ src/symtab.h 29 Jul 2006 05:34:43 -0000
@@ -61,16 +61,35 @@ struct symbol
/** The location of its first occurrence. */
location location;

- /** Its %type and associated printer and destructor. */
+ /** Its \c \%type. */
uniqstr type_name;
+ /** Its \c \%type's location. */
location type_location;

- /** Does not own the memory. */
+ /** Any \c \%destructor declared specifically for this symbol.
+
+ Access this field only through <tt>symbol</tt>'s interface functions. For
+ example, if <tt>symbol::destructor = NULL</tt>, the default
+ \c \%destructor or a per-type \c \%destructor might be appropriate, and
+ \c symbol_destructor_get will compute the correct one. */
const char *destructor;
+
+ /** The location of \c symbol::destructor.
+
+ Access this field only through <tt>symbol</tt>'s interface functions.
+ \sa symbol::destructor */
location destructor_location;

- /** Printer. */
+ /** Any \c \%printer declared specifically for this symbol.
+
+ Access this field only through <tt>symbol</tt>'s interface functions.
+ \sa symbol::destructor */
const char *printer;
+
+ /** The location of \c symbol::printer.
+
+ Access this field only through <tt>symbol</tt>'s interface functions.
+ \sa symbol::destructor */
location printer_location;

symbol_number number;
@@ -125,9 +144,25 @@ void symbol_type_set (symbol *sym, uniqs
/** Set the \c destructor associated with \c sym. */
void symbol_destructor_set (symbol *sym, const char *destructor, location loc);

+/** Get the computed \c \%destructor for \c sym, or \c NULL if none. */
+const char *symbol_destructor_get (symbol *sym);
+
+/** Get the grammar location of the computed \c \%destructor for \c sym.
+
+ \pre <tt>symbol_destructor_get (sym) != NULL</tt> */
+location symbol_destructor_location_get (symbol *sym);
+
/** Set the \c printer associated with \c sym. */
void symbol_printer_set (symbol *sym, const char *printer, location loc);

+/** Get the computed \c \%printer for \c sym, or \c NULL if none. */
+const char *symbol_printer_get (symbol *sym);
+
+/** Get the grammar location of the computed \c \%printer for \c sym.
+
+ \pre <tt>symbol_printer_get (sym) != NULL</tt> */
+location symbol_printer_location_get (symbol *sym);
+
/* Set the \c precedence associated with \c sym.

Ensure that \a symbol is a terminal.
@@ -155,7 +190,7 @@ extern symbol *accept;

/** The user start symbol. */
extern symbol *startsymbol;
-/** The location of the \c %start declaration. */
+/** The location of the \c \%start declaration. */
extern location startsymbol_location;


@@ -181,4 +216,15 @@ void symbols_check_defined (void);
#token_translations. */
void symbols_pack (void);

+
+/*-----------------------------------.
+| Default %destructor and %printer. |
+`-----------------------------------*/
+
+/** Set the default \c \%destructor. */
+void default_destructor_set (const char *destructor, location loc);
+
+/** Set the default \c \%printer. */
+void default_printer_set (const char *printer, location loc);
+
#endif /* !SYMTAB_H_ */
Index: tests/actions.at
===================================================================
RCS file: /sources/bison/bison/tests/actions.at,v
retrieving revision 1.61
diff -p -u -r1.61 actions.at
--- tests/actions.at 29 Jul 2006 04:11:33 -0000 1.61
+++ tests/actions.at 29 Jul 2006 05:34:43 -0000
@@ -575,3 +575,202 @@ AT_CHECK_PRINTER_AND_DESTRUCTOR([%define

AT_CHECK_PRINTER_AND_DESTRUCTOR([%glr-parser])
AT_CHECK_PRINTER_AND_DESTRUCTOR([%glr-parser], [with union])
+
+
+
+
+
+## --------------------------------- ##
+## Default %printer and %destructor. ##
+## --------------------------------- ##
+
+# Check that the right %printer and %destructor are called, that they're not
+# called for $end, and that $$ and @$ work correctly.
+
+AT_SETUP([Default %printer and %destructor])
+
+AT_DATA_GRAMMAR([[input.y]],
+[[%error-verbose
+%debug
+%locations
+%initial-action {
+ @$.first_line = @$.last_line = 1;
+ @$.first_column = @$.last_column = 1;
+}
+
+%{
+# include <stdio.h>
+# include <stdlib.h>
+ static void yyerror (const char *msg);
+ static int yylex (void);
+# define USE(SYM)
+%}
+
+%printer {
+ fprintf (yyoutput, "Default printer for '%c' @ %d", $$, @$.first_column);
+}
+%destructor {
+ fprintf (stdout, "Default destructor for '%c' @ %d.\n", $$, @$.first_column);
+}
+
+%printer {
+ fprintf (yyoutput, "'b'/'c' printer for '%c' @ %d", $$, @$.first_column);
+} 'b' 'c'
+%destructor {
+ fprintf (stdout, "'b'/'c' destructor for '%c' @ %d.\n", $$, @$.first_column);
+} 'b' 'c'
+
+%%
+
+start: 'a' 'b' 'c' 'd' 'e' { $$ = 'S'; USE(($1, $2, $3, $4, $5)); } ;
+
+%%
+
+static int
+yylex (void)
+{
+ static const char *input = "abcd";
+ static int column = 1;
+ yylval = *input++;
+ yylloc.first_line = yylloc.last_line = 1;
+ yylloc.first_column = yylloc.last_column = column++;
+ return yylval;
+}
+
+static void
+yyerror (const char *msg)
+{
+ fprintf (stderr, "%s\n", msg);
+}
+
+int
+main (void)
+{
+ yydebug = 1;
+ return yyparse ();
+}
+]])
+
+AT_CHECK([bison -o input.c input.y])
+AT_COMPILE([input])
+AT_PARSER_CHECK([./input], 1,
+[[Default destructor for 'd' @ 4.
+'b'/'c' destructor for 'c' @ 3.
+'b'/'c' destructor for 'b' @ 2.
+Default destructor for 'a' @ 1.
+]],
+[[Starting parse
+Entering state 0
+Reading a token: Next token is token 'a' (1.1-1.1: Default printer for 'a' @ 1)
+Shifting token 'a' (1.1-1.1: Default printer for 'a' @ 1)
+Entering state 1
+Reading a token: Next token is token 'b' (1.2-1.2: 'b'/'c' printer for 'b' @ 2)
+Shifting token 'b' (1.2-1.2: 'b'/'c' printer for 'b' @ 2)
+Entering state 3
+Reading a token: Next token is token 'c' (1.3-1.3: 'b'/'c' printer for 'c' @ 3)
+Shifting token 'c' (1.3-1.3: 'b'/'c' printer for 'c' @ 3)
+Entering state 5
+Reading a token: Next token is token 'd' (1.4-1.4: Default printer for 'd' @ 4)
+Shifting token 'd' (1.4-1.4: Default printer for 'd' @ 4)
+Entering state 6
+Reading a token: Now at end of input.
+syntax error, unexpected $end, expecting 'e'
+Error: popping token 'd' (1.4-1.4: Default printer for 'd' @ 4)
+Stack now 0 1 3 5
+Error: popping token 'c' (1.3-1.3: 'b'/'c' printer for 'c' @ 3)
+Stack now 0 1 3
+Error: popping token 'b' (1.2-1.2: 'b'/'c' printer for 'b' @ 2)
+Stack now 0 1
+Error: popping token 'a' (1.1-1.1: Default printer for 'a' @ 1)
+Stack now 0
+Cleanup: discarding lookahead token $end (1.5-1.5: )
+Stack now 0
+]])
+
+AT_CLEANUP
+
+
+
+
+
+## ------------------------------------------------------------- ##
+## Default %printer and %destructor for user-declared end token. ##
+## ------------------------------------------------------------- ##
+
+AT_SETUP([Default %printer and %destructor for user-declared end token])
+
+AT_DATA_GRAMMAR([[input.y]],
+[[%error-verbose
+%debug
+%locations
+%initial-action {
+ @$.first_line = @$.last_line = 1;
+ @$.first_column = @$.last_column = 1;
+}
+
+%{
+# include <stdio.h>
+# include <stdlib.h>
+ static void yyerror (const char *msg);
+ static int yylex (void);
+# define USE(SYM)
+%}
+
+%token END 0
+%printer {
+ fprintf (yyoutput, "Default printer for '%c' @ %d", $$, @$.first_column);
+}
+%destructor {
+ fprintf (stdout, "Default destructor for '%c' @ %d.\n", $$, @$.first_column);
+}
+
+%%
+
+start: { $$ = 'S'; } ;
+
+%%
+
+static int
+yylex (void)
+{
+ yylval = 'E';
+ yylloc.first_line = yylloc.last_line = 1;
+ yylloc.first_column = yylloc.last_column = 1;
+ return 0;
+}
+
+static void
+yyerror (const char *msg)
+{
+ fprintf (stderr, "%s\n", msg);
+}
+
+int
+main (void)
+{
+ yydebug = 1;
+ return yyparse ();
+}
+]])
+
+AT_CHECK([bison -o input.c input.y])
+AT_COMPILE([input])
+AT_PARSER_CHECK([./input], 0,
+[[Default destructor for 'E' @ 1.
+Default destructor for 'S' @ 1.
+]],
+[[Starting parse
+Entering state 0
+Reducing stack by rule 1 (line 37):
+-> $$ = nterm start (1.1-1.1: Default printer for 'S' @ 1)
+Stack now 0
+Entering state 1
+Reading a token: Now at end of input.
+Shifting token END (1.1-1.1: Default printer for 'E' @ 1)
+Entering state 2
+Stack now 0 1 2
+Cleanup: popping token END (1.1-1.1: Default printer for 'E' @ 1)
+Cleanup: popping nterm start (1.1-1.1: Default printer for 'S' @ 1)
+]])
+
+AT_CLEANUP
Index: tests/input.at
===================================================================
RCS file: /sources/bison/bison/tests/input.at,v
retrieving revision 1.50
diff -p -u -r1.50 input.at
--- tests/input.at 9 Jul 2006 20:36:33 -0000 1.50
+++ tests/input.at 29 Jul 2006 05:34:43 -0000
@@ -168,6 +168,65 @@ AT_CHECK_UNUSED_VALUES([1])
AT_CLEANUP


+## --------------------------------------------- ##
+## Default %printer and %destructor redeclared. ##
+## --------------------------------------------- ##
+
+AT_SETUP([Default %printer and %destructor redeclared])
+
+AT_DATA([[input.y]],
+[[%destructor { destroy ($$); }
+%printer { destroy ($$); }
+
+%destructor { destroy ($$); }
+%printer { destroy ($$); }
+
+%%
+
+start: ;
+
+%destructor { destroy ($$); };
+%printer { destroy ($$); };
+]])
+
+AT_CHECK([bison input.y], [1], [],
+[[input.y:4.13-29: redeclaration for default %destructor
+input.y:1.13-29: previous declaration
+input.y:5.10-26: redeclaration for default %printer
+input.y:2.10-26: previous declaration
+input.y:11.13-29: redeclaration for default %destructor
+input.y:4.13-29: previous declaration
+input.y:12.10-26: redeclaration for default %printer
+input.y:5.10-26: previous declaration
+]])
+
+AT_CLEANUP
+
+
+## ---------------------------------------- ##
+## Unused values with default %destructor. ##
+## ---------------------------------------- ##
+
+AT_SETUP([Unused values with default %destructor])
+
+AT_DATA([[input.y]],
+[[%destructor { destroy ($$); }
+
+%%
+
+start: end end { $1; } ;
+end: { } ;
+]])
+
+AT_CHECK([bison input.y], [0], [],
+[[input.y:5.8-22: warning: unset value: $$
+input.y:5.8-22: warning: unused value: $2
+input.y:6.6-8: warning: unset value: $$
+]])
+
+AT_CLEANUP
+
+
## ---------------------- ##
## Incompatible Aliases. ##
## ---------------------- ##
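
The redeclaration test in tests/input.at above expects each "previous
declaration" note to point at the most recent default declaration, not the
first one. A minimal sketch of why, assuming a plain line number in place of
Bison's location type and hypothetical sketch_* names (the non-NULL check and
the return value are additions for illustration, not part of the patch):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Currently recorded default %destructor and the line that declared it.  */
static const char *sketch_default = NULL;
static int sketch_default_line = 0;

/* Record CODE, declared at LINE, as the default %destructor.
   Return the line of the declaration it replaces, or 0 if it is the
   first one.  */
int
sketch_default_set (const char *code, int line)
{
  int previous = 0;
  assert (code != NULL); /* Sketch-only precondition.  */

  if (sketch_default != NULL)
    {
      previous = sketch_default_line;
      fprintf (stderr, "%d: redeclaration for default %%destructor\n", line);
      fprintf (stderr, "%d: previous declaration\n", previous);
    }

  /* The new declaration always replaces the old one, so the next
     complaint refers to this line.  */
  sketch_default = code;
  sketch_default_line = line;
  return previous;
}
```

Calling this for declarations on lines 1, 4, and 11 reports line 1 as the
previous declaration for line 4, and line 4 as the previous declaration for
line 11, which matches the order of the expected diagnostics.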
Akim Demaille
2006-09-03 14:45:50 UTC
> 2006-07-29 Joel E. Denny <***@ces.clemson.edu>
>
> + Enable declaration of default %printer/%destructor. Make the parser
> + use these for all user-declared grammar symbols for which the user does
> + not declare a specific %printer/%destructor. Thus, the parser uses it
> + for token 0 if the user declares it but not if Bison generates it as
> + $end.

Nice work!
Joel E. Denny
2006-09-03 20:46:41 UTC
On Sun, 3 Sep 2006, Akim Demaille wrote:

>
> > 2006-07-29 Joel E. Denny <***@ces.clemson.edu>
> >
> > + Enable declaration of default %printer/%destructor. Make the parser
> > + use these for all user-declared grammar symbols for which the user does
> > + not declare a specific %printer/%destructor. Thus, the parser uses it
> > + for token 0 if the user declares it but not if Bison generates it as
> > + $end.
>
> Nice work!

Thanks.

I'm beginning to think the declaration is a bit ugly though. How about
something like this instead:

%destructor { free ($$); } %symbol-default
%printer { fprintf (yyoutput, "%s", $$); } %symbol-default

I think that's clearer than just an empty list. Unfortunately, it is yet
another % declaration, but it may have other uses: see here where I called
it %any instead:

http://lists.gnu.org/archive/html/bison-patches/2006-08/msg00033.html

What do you think?

Joel
Joel E. Denny
2006-09-04 19:29:32 UTC
On Sun, 3 Sep 2006, Joel E. Denny wrote:

> I'm beginning to think the declaration is a bit ugly though. How about
> something like this instead:
>
> %destructor { free ($$); } %symbol-default
> %printer { fprintf (yyoutput, "%s", $$); } %symbol-default
>
> I think that's clearer than just an empty list.

I committed the following to implement %symbol-default. It shouldn't be
tough to revert if people find it objectionable.

Joel
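
The ChangeLog entry in the patch below says a list node may now represent an
ordinary symbol, a semantic type, or %symbol-default, with "switched"
functions for setting %destructor's and %printer's. One way such a node and
setter could look is sketched here. Only the `content.sym` member name comes
from the patch; the enum, the other field names, and the decision to ignore
semantic types are assumptions made for illustration, not the actual
src/symlist.c code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tag saying what a list node holds.  */
enum sketch_kind { SKETCH_SYM, SKETCH_TYPE, SKETCH_DEFAULT };

/* Hypothetical, trimmed-down stand-in for Bison's struct symbol.  */
struct sketch_symbol
{
  const char *tag;
  const char *destructor;
};

/* A list node that holds a symbol, a semantic type, or nothing at all
   (for %symbol-default).  */
struct sketch_node
{
  enum sketch_kind kind;
  union
  {
    struct sketch_symbol *sym; /* Valid when kind == SKETCH_SYM.  */
    const char *type_name;     /* Valid when kind == SKETCH_TYPE.  */
  } content;
  struct sketch_node *next;
};

static const char *sketch_default_destructor = NULL;

/* Apply CODE as the %destructor for whatever NODE represents.  */
void
sketch_node_destructor_set (struct sketch_node *node, const char *code)
{
  assert (node != NULL); /* Sketch-only precondition.  */
  switch (node->kind)
    {
    case SKETCH_SYM:
      node->content.sym->destructor = code;
      break;
    case SKETCH_TYPE:
      /* Per-type declarations are only being started in this patch;
         the sketch deliberately does nothing here.  */
      break;
    case SKETCH_DEFAULT:
      sketch_default_destructor = code;
      break;
    }
}
```

The point of the switch is that the grammar action can walk one list and call
one setter per node, without caring whether the user wrote a symbol name, a
<type> tag, or %symbol-default.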

Index: ChangeLog
===================================================================
RCS file: /sources/bison/bison/ChangeLog,v
retrieving revision 1.1555
diff -p -u -r1.1555 ChangeLog
--- ChangeLog 24 Aug 2006 01:26:07 -0000 1.1555
+++ ChangeLog 4 Sep 2006 19:25:47 -0000
@@ -1,3 +1,30 @@
+2006-09-04 Joel E. Denny <***@ces.clemson.edu>
+
+ Require default %destructor/%printer to be declared using
+ %symbol-default instead of an empty symbol list, and start working on
+ new per-type %destructor/%printer. Discussed at
+ <http://lists.gnu.org/archive/html/bison-patches/2006-09/msg00007.html>.
+ * NEWS (2.3+): Add %symbol-default to example.
+ * bison.texinfo (Freeing Discarded Symbols): Likewise.
+ (Bison Symbols): Add entry for %symbol-default.
+ * src/parse-gram.y (PERCENT_SYMBOL_DEFAULT): New token.
+ (generic_symlist, generic_symlist_item): New nonterminals for creating
+ a list in which each item is a symbol, semantic type, or
+ %symbol-default.
+ (grammar_declaration): Use generic_symlist in %destructor and %printer
+ declarations instead of symbols.1 or an empty list.
+ (symbol_declaration, precedence_declaration, symbols.1): Update actions
+ for changes to symbol_list.
+ * src/reader.c: Update for changes to symbol_list.
+ * src/scan-code.l: Likewise.
+ * src/scan-gram.l: Scan new PERCENT_SYMBOL_DEFAULT token.
+ * src/symlist.c, src/symlist.h: Extend such that a list node may
+ represent a semantic type or a %symbol-default in addition to just an
+ ordinary symbol. Add switched functions for setting %destructor's and
+ %printer's.
+ * tests/actions.at, tests/input.at: Add %symbol-default to all default
+ %destructor/%printer declarations.
+
2006-08-23 Joel E. Denny <***@ces.clemson.edu>

Whether the default %destructor/%printer applies to a particular symbol
Index: NEWS
===================================================================
RCS file: /sources/bison/bison/NEWS,v
retrieving revision 1.157
diff -p -u -r1.157 NEWS
--- NEWS 24 Aug 2006 01:26:07 -0000 1.157
+++ NEWS 4 Sep 2006 19:25:47 -0000
@@ -21,7 +21,7 @@ Changes in version 2.3+:
%token <string> STRING2
%type <string> string1
%type <string> string2
- %destructor { free ($$); }
+ %destructor { free ($$); } %symbol-default
%destructor { free ($$); printf ("%d", @$.first_line); } STRING1 string1

guarantees that, when the parser discards any user-defined symbol, it passes
Index: doc/bison.texinfo
===================================================================
RCS file: /sources/bison/bison/doc/bison.texinfo,v
retrieving revision 1.201
diff -p -u -r1.201 bison.texinfo
--- doc/bison.texinfo 24 Aug 2006 01:26:07 -0000 1.201
+++ doc/bison.texinfo 4 Sep 2006 19:25:47 -0000
@@ -3986,6 +3986,7 @@ For instance, if your locations use a fi
@subsection Freeing Discarded Symbols
@cindex freeing discarded symbols
@findex %destructor
+@findex %symbol-default

During error recovery (@pxref{Error Recovery}), symbols already pushed
on the stack and tokens coming from the rest of the file are discarded
@@ -4012,8 +4013,9 @@ The additional parser parameters are als
The Parser Function @code{yyparse}}).
@end deffn

-@deffn {Directive} %destructor @{ @var{code} @}
+@deffn {Directive} %destructor @{ @var{code} @} %symbol-default
@cindex default %destructor
+@findex %symbol-default
Invoke the braced @var{code} whenever the parser discards any user-defined
grammar symbol for which the user has not specifically declared any
@code{%destructor}.
@@ -4030,7 +4032,7 @@ For instance:
%token <string> STRING2
%type <string> string1
%type <string> string2
-%destructor @{ free ($$); @}
+%destructor @{ free ($$); @} %symbol-default
%destructor @{ free ($$); printf ("%d", @@$.first_line); @} STRING1 string1
@end smallexample

@@ -8505,6 +8507,11 @@ Bison declaration to specify the start s
Start-Symbol}.
@end deffn

+@deffn {Directive} %symbol-default
+Used to declare a default @code{%destructor} or default @code{%printer}.
+@xref{Destructor Decl, , Freeing Discarded Symbols}.
+@end deffn
+
@deffn {Directive} %token
Bison declaration to declare token(s) without specifying precedence.
@xref{Token Decl, ,Token Type Names}.
Index: src/parse-gram.y
===================================================================
RCS file: /sources/bison/bison/src/parse-gram.y,v
retrieving revision 1.89
diff -p -u -r1.89 parse-gram.y
--- src/parse-gram.y 13 Aug 2006 03:18:15 -0000 1.89
+++ src/parse-gram.y 4 Sep 2006 19:25:47 -0000
@@ -116,6 +116,8 @@ static int current_prec = 0;
%token PERCENT_TYPE "%type"
%token PERCENT_DESTRUCTOR "%destructor"
%token PERCENT_PRINTER "%printer"
+%token PERCENT_SYMBOL_DEFAULT
+ "%symbol-default"

%token PERCENT_LEFT "%left"
%token PERCENT_RIGHT "%right"
@@ -199,7 +201,7 @@ static int current_prec = 0;
%printer { fprintf (stderr, "%s:", $$->tag); } id_colon

%type <assoc> precedence_declarator
-%type <list> symbols.1
+%type <list> symbols.1 generic_symlist generic_symlist_item
%%

input:
@@ -262,30 +264,22 @@ grammar_declaration:
{
grammar_start_symbol_set ($2, @2);
}
-| "%destructor" "{...}"
- {
- default_destructor_set (translate_symbol_action ($2, @2), @2);
- }
-| "%destructor" "{...}" symbols.1
+| "%destructor" "{...}" generic_symlist
{
symbol_list *list;
const char *action = translate_symbol_action ($2, @2);
for (list = $3; list; list = list->next)
- symbol_destructor_set (list->sym, action, @2);
+ symbol_list_destructor_set (list, action, @2);
symbol_list_free ($3);
}
-| "%printer" "{...}" symbols.1
+| "%printer" "{...}" generic_symlist
{
symbol_list *list;
const char *action = translate_symbol_action ($2, @2);
for (list = $3; list; list = list->next)
- symbol_printer_set (list->sym, action, @2);
+ symbol_list_printer_set (list, action, @2);
symbol_list_free ($3);
}
-| "%printer" "{...}"
- {
- default_printer_set (translate_symbol_action ($2, @2), @2);
- }
| "%default-prec"
{
default_prec = true;
@@ -352,7 +346,7 @@ symbol_declaration:
tag_seen = true;
symbol_list *list;
for (list = $3; list; list = list->next)
- symbol_type_set (list->sym, $2, @2);
+ symbol_type_set (list->content.sym, $2, @2);
symbol_list_free ($3);
}
;
@@ -364,8 +358,8 @@ precedence_declaration:
++current_prec;
for (list = $3; list; list = list->next)
{
- symbol_type_set (list->sym, current_type, @2);
- symbol_precedence_set (list->sym, current_prec, $1, @1);
+ symbol_type_set (list->content.sym, current_type, @2);
+ symbol_precedence_set (list->content.sym, current_prec, $1, @1);
}
symbol_list_free ($3);
current_type = NULL;
@@ -383,10 +377,23 @@ type.opt:
| TYPE { current_type = $1; tag_seen = true; }
;

-/* One or more nonterminals to be %typed. */
+/* One or more symbols to be %typed. */
symbols.1:
- symbol { $$ = symbol_list_new ($1, @1); }
-| symbols.1 symbol { $$ = symbol_list_prepend ($1, $2, @2); }
+ symbol
+ { $$ = symbol_list_sym_new ($1, @1); }
+| symbols.1 symbol
+ { $$ = symbol_list_prepend ($1, symbol_list_sym_new ($2, @2)); }
+;
+
+generic_symlist:
+ generic_symlist_item { $$ = $1; }
+| generic_symlist generic_symlist_item { $$ = symbol_list_prepend ($1, $2); }
+;
+
+generic_symlist_item:
+ symbol { $$ = symbol_list_sym_new ($1, @1); }
+| TYPE { $$ = symbol_list_type_new ($1, @1); }
+| "%symbol-default" { $$ = symbol_list_default_new (@1); }
;

/* One token definition. */
Index: src/reader.c
===================================================================
RCS file: /sources/bison/bison/src/reader.c,v
retrieving revision 1.270
diff -p -u -r1.270 reader.c
--- src/reader.c 20 Aug 2006 03:10:18 -0000 1.270
+++ src/reader.c 4 Sep 2006 19:25:47 -0000
@@ -196,7 +196,7 @@ static symbol_list *grammar_end = NULL;
static void
grammar_symbol_append (symbol *sym, location loc)
{
- symbol_list *p = symbol_list_new (sym, loc);
+ symbol_list *p = symbol_list_sym_new (sym, loc);

if (grammar_end)
grammar_end->next = p;
@@ -252,7 +252,7 @@ grammar_current_rule_begin (symbol *lhs,
static bool
symbol_should_be_used (symbol_list const *s)
{
- return (symbol_destructor_get (s->sym)
+ return (symbol_destructor_get (s->content.sym)
|| (s->midrule && s->midrule->used));
}

@@ -271,13 +271,13 @@ grammar_rule_check (const symbol_list *r

Don't worry about the default action if $$ is untyped, since $$'s
value can't be used. */
- if (!r->action && r->sym->type_name)
+ if (!r->action && r->content.sym->type_name)
{
- symbol *first_rhs = r->next->sym;
+ symbol *first_rhs = r->next->content.sym;
/* If $$ is being set in default way, report if any type mismatch. */
if (first_rhs)
{
- char const *lhs_type = r->sym->type_name;
+ char const *lhs_type = r->content.sym->type_name;
const char *rhs_type =
first_rhs->type_name ? first_rhs->type_name : "";
if (!UNIQSTR_EQ (lhs_type, rhs_type))
@@ -295,7 +295,7 @@ grammar_rule_check (const symbol_list *r
{
symbol_list const *l = r;
int n = 0;
- for (; l && l->sym; l = l->next, ++n)
+ for (; l && l->content.sym; l = l->next, ++n)
if (! (l->used
|| !symbol_should_be_used (l)
/* The default action, $$ = $1, `uses' both. */
@@ -341,7 +341,7 @@ grammar_midrule_action (void)
action. Create the MIDRULE. */
location dummy_location = current_rule->action_location;
symbol *dummy = dummy_symbol_get (dummy_location);
- symbol_list *midrule = symbol_list_new (dummy, dummy_location);
+ symbol_list *midrule = symbol_list_sym_new (dummy, dummy_location);

/* Make a new rule, whose body is empty, before the current one, so
that the action just read can belong to it. */
@@ -362,7 +362,7 @@ grammar_midrule_action (void)
grammar = midrule;

/* End the dummy's rule. */
- midrule->next = symbol_list_new (NULL, dummy_location);
+ midrule->next = symbol_list_sym_new (NULL, dummy_location);
midrule->next->next = current_rule;

previous_rule_end = midrule->next;
@@ -461,11 +461,11 @@ packgram (void)
{
int rule_length = 0;
symbol *ruleprec = p->ruleprec;
- record_merge_function_type (p->merger, p->sym->type_name,
+ record_merge_function_type (p->merger, p->content.sym->type_name,
p->merger_declaration_location);
rules[ruleno].user_number = ruleno;
rules[ruleno].number = ruleno;
- rules[ruleno].lhs = p->sym;
+ rules[ruleno].lhs = p->content.sym;
rules[ruleno].rhs = ritem + itemno;
rules[ruleno].prec = NULL;
rules[ruleno].dprec = p->dprec;
@@ -487,7 +487,7 @@ packgram (void)
if (p != grammar)
grammar_rule_check (p);

- for (p = p->next; p && p->sym; p = p->next)
+ for (p = p->next; p && p->content.sym; p = p->next)
{
++rule_length;

@@ -498,11 +498,12 @@ packgram (void)

/* item_number = symbol_number.
But the former needs to contain more: negative rule numbers. */
- ritem[itemno++] = symbol_number_as_item_number (p->sym->number);
+ ritem[itemno++] =
+ symbol_number_as_item_number (p->content.sym->number);
/* A rule gets by default the precedence and associativity
of its last token. */
- if (p->sym->class == token_sym && default_prec)
- rules[ruleno].prec = p->sym;
+ if (p->content.sym->class == token_sym && default_prec)
+ rules[ruleno].prec = p->content.sym;
}

/* If this rule has a %prec,
@@ -605,16 +606,17 @@ check_and_convert_grammar (void)
{
symbol_list *node;
for (node = grammar;
- node != NULL && symbol_is_dummy (node->sym);
+ node != NULL && symbol_is_dummy (node->content.sym);
node = node->next)
{
for (node = node->next;
- node != NULL && node->sym != NULL;
+ node != NULL && node->content.sym != NULL;
node = node->next)
;
}
assert (node != NULL);
- grammar_start_symbol_set (node->sym, node->sym->location);
+ grammar_start_symbol_set (node->content.sym,
+ node->content.sym->location);
}

/* Insert the initial rule, whose line is that of the first rule
@@ -622,11 +624,11 @@ check_and_convert_grammar (void)

accept: %start EOF. */
{
- symbol_list *p = symbol_list_new (accept, empty_location);
+ symbol_list *p = symbol_list_sym_new (accept, empty_location);
p->location = grammar->location;
- p->next = symbol_list_new (startsymbol, empty_location);
- p->next->next = symbol_list_new (endtoken, empty_location);
- p->next->next->next = symbol_list_new (NULL, empty_location);
+ p->next = symbol_list_sym_new (startsymbol, empty_location);
+ p->next->next = symbol_list_sym_new (endtoken, empty_location);
+ p->next->next->next = symbol_list_sym_new (NULL, empty_location);
p->next->next->next->next = grammar;
nrules += 1;
nritems += 3;
Index: src/scan-code.l
===================================================================
RCS file: /sources/bison/bison/src/scan-code.l,v
retrieving revision 1.11
diff -p -u -r1.11 scan-code.l
--- src/scan-code.l 10 Aug 2006 04:53:04 -0000 1.11
+++ src/scan-code.l 4 Sep 2006 19:25:47 -0000
@@ -282,10 +282,10 @@ handle_action_dollar (symbol_list *rule,
_("$$ for the midrule at $%d of `%s'"
" has no declared type"),
rule->midrule_parent_rhs_index,
- effective_rule->sym->tag);
+ effective_rule->content.sym->tag);
else
complain_at (dollar_loc, _("$$ of `%s' has no declared type"),
- rule->sym->tag);
+ rule->content.sym->tag);
}
else
untyped_var_seen = true;
@@ -313,7 +313,7 @@ handle_action_dollar (symbol_list *rule,
{
if (union_seen | tag_seen)
complain_at (dollar_loc, _("$%d of `%s' has no declared type"),
- n, effective_rule->sym->tag);
+ n, effective_rule->content.sym->tag);
else
untyped_var_seen = true;
type_name = "";
Index: src/scan-gram.l
===================================================================
RCS file: /sources/bison/bison/src/scan-gram.l,v
retrieving revision 1.101
diff -p -u -r1.101 scan-gram.l
--- src/scan-gram.l 14 Aug 2006 22:40:33 -0000 1.101
+++ src/scan-gram.l 4 Sep 2006 19:25:47 -0000
@@ -193,6 +193,7 @@ splice (\\[ \f\t\v]*\n)*
"%skeleton" return PERCENT_SKELETON;
"%start" return PERCENT_START;
"%start-header" return PERCENT_START_HEADER;
+ "%symbol-default" return PERCENT_SYMBOL_DEFAULT;
"%term" return PERCENT_TOKEN;
"%token" return PERCENT_TOKEN;
"%token"[-_]"table" return PERCENT_TOKEN_TABLE;
Index: src/symlist.c
===================================================================
RCS file: /sources/bison/bison/src/symlist.c,v
retrieving revision 1.19
diff -p -u -r1.19 symlist.c
--- src/symlist.c 9 Jul 2006 19:55:15 -0000 1.19
+++ src/symlist.c 4 Sep 2006 19:25:47 -0000
@@ -31,11 +31,12 @@
`--------------------------------------*/

symbol_list *
-symbol_list_new (symbol *sym, location loc)
+symbol_list_sym_new (symbol *sym, location loc)
{
symbol_list *res = xmalloc (sizeof *res);

- res->sym = sym;
+ res->content_type = SYMLIST_SYMBOL;
+ res->content.sym = sym;
res->location = loc;

res->midrule = NULL;
@@ -55,39 +56,73 @@ symbol_list_new (symbol *sym, location l
}


-/*------------------.
-| Print this list. |
-`------------------*/
+/*--------------------------------------------.
+| Create a list containing TYPE_NAME at LOC. |
+`--------------------------------------------*/
+
+symbol_list *
+symbol_list_type_new (uniqstr type_name, location loc)
+{
+ symbol_list *res = xmalloc (sizeof *res);
+
+ res->content_type = SYMLIST_TYPE;
+ res->content.type_name = type_name;
+ res->location = loc;
+ res->next = NULL;
+
+ return res;
+}
+
+
+/*----------------------------------------------------.
+| Create a list containing a %symbol-default at LOC. |
+`----------------------------------------------------*/
+
+symbol_list *
+symbol_list_default_new (location loc)
+{
+ symbol_list *res = xmalloc (sizeof *res);
+
+ res->content_type = SYMLIST_DEFAULT;
+ res->location = loc;
+ res->next = NULL;
+
+ return res;
+}
+
+
+/*-----------------------------------------------------------------------.
+| Print this list, for which every content_type must be SYMLIST_SYMBOL. |
+`-----------------------------------------------------------------------*/

void
-symbol_list_print (const symbol_list *l, FILE *f)
+symbol_list_syms_print (const symbol_list *l, FILE *f)
{
- for (/* Nothing. */; l && l->sym; l = l->next)
+ for (/* Nothing. */; l && l->content.sym; l = l->next)
{
- symbol_print (l->sym, f);
+ symbol_print (l->content.sym, f);
fprintf (stderr, l->used ? " used" : " unused");
- if (l && l->sym)
+ if (l && l->content.sym)
fprintf (f, ", ");
}
}


-/*---------------------------------.
-| Prepend SYM at LOC to the LIST. |
-`---------------------------------*/
+/*---------------------------.
+| Prepend NODE to the LIST. |
+`---------------------------*/

symbol_list *
-symbol_list_prepend (symbol_list *list, symbol *sym, location loc)
+symbol_list_prepend (symbol_list *list, symbol_list *node)
{
- symbol_list *res = symbol_list_new (sym, loc);
- res->next = list;
- return res;
+ node->next = list;
+ return node;
}


-/*-------------------------------------------------.
-| Free the LIST, but not the symbols it contains. |
-`-------------------------------------------------*/
+/*-----------------------------------------------.
+| Free the LIST, but not the items it contains. |
+`-----------------------------------------------*/

void
symbol_list_free (symbol_list *list)
@@ -104,15 +139,17 @@ int
symbol_list_length (symbol_list const *l)
{
int res = 0;
- for (/* Nothing. */; l && l->sym; l = l->next)
+ for (/* Nothing. */;
+ l && !(l->content_type == SYMLIST_SYMBOL && l->content.sym == NULL);
+ l = l->next)
++res;
return res;
}


-/*--------------------------------.
-| Get symbol N in symbol list L. |
-`--------------------------------*/
+/*------------------------------.
+| Get item N in symbol list L. |
+`------------------------------*/

symbol_list *
symbol_list_n_get (symbol_list *l, int n)
@@ -125,7 +162,8 @@ symbol_list_n_get (symbol_list *l, int n
for (i = 0; i < n; ++i)
{
l = l->next;
- if (l == NULL || l->sym == NULL)
+ if (l == NULL
+ || (l->content_type == SYMLIST_SYMBOL && l->content.sym == NULL))
return NULL;
}

@@ -147,13 +185,14 @@ symbol_list_n_type_name_get (symbol_list
complain_at (loc, _("invalid $ value: $%d"), n);
return NULL;
}
- return l->sym->type_name;
+ assert (l->content_type == SYMLIST_SYMBOL);
+ return l->content.sym->type_name;
}


-/*----------------------------------------.
-| The symbol N in symbol list L is USED. |
-`----------------------------------------*/
+/*--------------------------------------.
+| The item N in symbol list L is USED. |
+`--------------------------------------*/

void
symbol_list_n_used_set (symbol_list *l, int n, bool used)
@@ -162,3 +201,38 @@ symbol_list_n_used_set (symbol_list *l,
if (l)
l->used = used;
}
+
+void
+symbol_list_destructor_set (symbol_list *node, const char *destructor,
+ location loc)
+{
+ switch (node->content_type)
+ {
+ case SYMLIST_SYMBOL:
+ symbol_destructor_set (node->content.sym, destructor, loc);
+ break;
+ case SYMLIST_TYPE:
+ /* FIXME: */
+ break;
+ case SYMLIST_DEFAULT:
+ default_destructor_set (destructor, loc);
+ break;
+ }
+}
+
+void
+symbol_list_printer_set (symbol_list *node, const char *printer, location loc)
+{
+ switch (node->content_type)
+ {
+ case SYMLIST_SYMBOL:
+ symbol_printer_set (node->content.sym, printer, loc);
+ break;
+ case SYMLIST_TYPE:
+ /* FIXME: */
+ break;
+ case SYMLIST_DEFAULT:
+ default_printer_set (printer, loc);
+ break;
+ }
+}
Index: src/symlist.h
===================================================================
RCS file: /sources/bison/bison/src/symlist.h,v
retrieving revision 1.17
diff -p -u -r1.17 symlist.h
--- src/symlist.h 9 Jul 2006 19:55:15 -0000 1.17
+++ src/symlist.h 4 Sep 2006 19:25:47 -0000
@@ -28,8 +28,17 @@
/* A list of symbols, used during the parsing to store the rules. */
typedef struct symbol_list
{
- /* The symbol. */
- symbol *sym;
+ /**
+ * Whether this node contains a symbol, a semantic type, or a
+ * \c \%symbol-default.
+ */
+ enum { SYMLIST_SYMBOL, SYMLIST_TYPE, SYMLIST_DEFAULT } content_type;
+ union {
+ /** The symbol or \c NULL iff <tt>content_type = SYMLIST_SYMBOL</tt>. */
+ symbol *sym;
+ /** The semantic type iff <tt>content_type = SYMLIST_TYPE</tt>. */
+ uniqstr type_name;
+ } content;
location location;

/* If this symbol is the generated lhs for a midrule but this is the rule in
@@ -61,31 +70,46 @@ typedef struct symbol_list
} symbol_list;


-/* Create a list containing SYM at LOC. */
-symbol_list *symbol_list_new (symbol *sym, location loc);
+/** Create a list containing \c sym at \c loc. */
+symbol_list *symbol_list_sym_new (symbol *sym, location loc);

-/* Print it. */
-void symbol_list_print (const symbol_list *l, FILE *f);
+/** Create a list containing \c type_name at \c loc. */
+symbol_list *symbol_list_type_new (uniqstr type_name, location loc);

-/* Prepend SYM at LOC to the LIST. */
-symbol_list *symbol_list_prepend (symbol_list *l,
- symbol *sym,
- location loc);
+/** Create a list containing a \c \%symbol-default at \c loc. */
+symbol_list *symbol_list_default_new (location loc);

-/* Free the LIST, but not the symbols it contains. */
-void symbol_list_free (symbol_list *l);
+/** Print this list.

-/* Return its length. */
+ \pre For every node \c n in the list, <tt>n->content_type =
+ SYMLIST_SYMBOL</tt>. */
+void symbol_list_syms_print (const symbol_list *l, FILE *f);
+
+/** Prepend \c node to \c list. */
+symbol_list *symbol_list_prepend (symbol_list *list, symbol_list *node);
+
+/** Free \c list, but not the items it contains. */
+void symbol_list_free (symbol_list *list);
+
+/** Return the length of \c l. */
int symbol_list_length (symbol_list const *l);

-/* Get symbol N in symbol list L. */
+/** Get item \c n in symbol list \c l. */
symbol_list *symbol_list_n_get (symbol_list *l, int n);

/* Get the data type (alternative in the union) of the value for
symbol N in rule RULE. */
uniqstr symbol_list_n_type_name_get (symbol_list *l, location loc, int n);

-/* The symbol N in symbol list L is USED. */
+/** The item \c n in symbol list \c l is \c used. */
void symbol_list_n_used_set (symbol_list *l, int n, bool used);

+/** Set the \c \%destructor for \c node as \c destructor at \c loc. */
+void symbol_list_destructor_set (symbol_list *node, const char *destructor,
+ location loc);
+
+/** Set the \c \%printer for \c node as \c printer at \c loc. */
+void symbol_list_printer_set (symbol_list *node, const char *printer,
+ location loc);
+
#endif /* !SYMLIST_H_ */
Index: tests/actions.at
===================================================================
RCS file: /sources/bison/bison/tests/actions.at,v
retrieving revision 1.64
diff -p -u -r1.64 actions.at
--- tests/actions.at 24 Aug 2006 01:26:07 -0000 1.64
+++ tests/actions.at 4 Sep 2006 19:25:47 -0000
@@ -606,10 +606,10 @@ AT_DATA_GRAMMAR([[input.y]],

%printer {
fprintf (yyoutput, "Default printer for '%c' @ %d", $$, @$.first_column);
-}
+} %symbol-default
%destructor {
fprintf (stdout, "Default destructor for '%c' @ %d.\n", $$, @$.first_column);
-}
+} %symbol-default

%printer {
fprintf (yyoutput, "'b'/'c' printer for '%c' @ %d", $$, @$.first_column);
@@ -715,10 +715,10 @@ AT_DATA_GRAMMAR([[input.y]],
%token END 0
%printer {
fprintf (yyoutput, "Default printer for '%c' @ %d", $$, @$.first_column);
-}
+} %symbol-default
%destructor {
fprintf (stdout, "Default destructor for '%c' @ %d.\n", $$, @$.first_column);
-}
+} %symbol-default

%%

@@ -800,10 +800,10 @@ AT_DATA_GRAMMAR([[input.y]],

%printer {
fprintf (yyoutput, "'%c'", $$);
-}
+} %symbol-default
%destructor {
fprintf (stderr, "DESTROY '%c'\n", $$);
-}
+} %symbol-default

%%

@@ -911,11 +911,11 @@ AT_DATA_GRAMMAR([[input.y]],
%printer {
char chr = $$;
fprintf (yyoutput, "'%c'", chr);
-}
+} %symbol-default
%destructor {
char chr = $$;
fprintf (stderr, "DESTROY '%c'\n", chr);
-}
+} %symbol-default

%union { char chr; }
%type <chr> start
Index: tests/input.at
===================================================================
RCS file: /sources/bison/bison/tests/input.at,v
retrieving revision 1.54
diff -p -u -r1.54 input.at
--- tests/input.at 20 Aug 2006 03:10:18 -0000 1.54
+++ tests/input.at 4 Sep 2006 19:25:47 -0000
@@ -175,18 +175,18 @@ AT_CLEANUP
AT_SETUP([Default %printer and %destructor redeclared])

AT_DATA([[input.y]],
-[[%destructor { destroy ($$); }
-%printer { destroy ($$); }
+[[%destructor { destroy ($$); } %symbol-default
+%printer { destroy ($$); } %symbol-default

-%destructor { destroy ($$); }
-%printer { destroy ($$); }
+%destructor { destroy ($$); } %symbol-default
+%printer { destroy ($$); } %symbol-default

%%

start: ;

-%destructor { destroy ($$); };
-%printer { destroy ($$); };
+%destructor { destroy ($$); } %symbol-default;
+%printer { destroy ($$); } %symbol-default;
]])

AT_CHECK([bison input.y], [1], [],
@@ -210,7 +210,7 @@ AT_CLEANUP
AT_SETUP([Unused values with default %destructor])

AT_DATA([[input.y]],
-[[%destructor { destroy ($$); }
+[[%destructor { destroy ($$); } %symbol-default

%%

@@ -533,7 +533,7 @@ input.y:4.10-5.0: missing `'' at end of
input.y:14.11-15.0: missing `'' at end of line
input.y:16.11-17.0: missing `"' at end of line
input.y:19.13-20.0: missing `}' at end of file
-input.y:20.1: syntax error, unexpected end of file, expecting ;
+input.y:20.1: syntax error, unexpected end of file
]])

AT_CLEANUP
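
The symlist changes in the patch above turn each list node into a tagged
union and make symbol_list_prepend take a ready-made node. A minimal
standalone sketch of that shape follows. It is only an illustration: the
names item, item_kind, item_new, item_prepend, item_dispatch and item_free
are hypothetical stand-ins, not Bison's identifiers, and plain strings
replace the symbol and uniqstr types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the three content kinds in the patch
   (SYMLIST_SYMBOL, SYMLIST_TYPE, SYMLIST_DEFAULT).  */
enum item_kind { ITEM_SYMBOL, ITEM_TYPE, ITEM_DEFAULT };

struct item
{
  enum item_kind kind;
  union
  {
    const char *sym;        /* Meaningful only if kind == ITEM_SYMBOL.  */
    const char *type_name;  /* Meaningful only if kind == ITEM_TYPE.  */
  } content;
  struct item *next;
};

/* Create a one-node list.  TEXT is ignored for ITEM_DEFAULT.  */
static struct item *
item_new (enum item_kind kind, const char *text)
{
  struct item *res = malloc (sizeof *res);
  if (!res)
    abort ();
  res->kind = kind;
  res->content.sym = NULL;
  if (kind == ITEM_SYMBOL)
    res->content.sym = text;
  else if (kind == ITEM_TYPE)
    res->content.type_name = text;
  res->next = NULL;
  return res;
}

/* Prepend an already-built NODE to LIST and return the new head.  */
static struct item *
item_prepend (struct item *list, struct item *node)
{
  assert (node != NULL);
  node->next = list;
  return node;
}

/* Say which kind of setter a node would be routed to, in the style of
   the switch in symbol_list_destructor_set.  */
static const char *
item_dispatch (const struct item *node)
{
  switch (node->kind)
    {
    case ITEM_SYMBOL:
      return "per-symbol";
    case ITEM_TYPE:
      return "per-type";
    case ITEM_DEFAULT:
      return "default";
    }
  return "unknown";
}

/* Free the nodes, but not the strings they point to.  */
static void
item_free (struct item *list)
{
  while (list)
    {
      struct item *next = list->next;
      free (list);
      list = next;
    }
}
```

Because the caller builds the node before prepending it, one prepend
function serves all three kinds, which is why the grammar can share
generic_symlist between %destructor and %printer.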
Akim Demaille
2006-09-14 08:43:32 UTC
>>> "Joel" == Joel E Denny <***@ces.clemson.edu> writes:

> I'm beginning to think the declaration is a bit ugly though. How
> about something like this instead:

> %destructor { free ($$); } %symbol-default
> %printer { fprintf (yyoutput, "%s", $$); } %symbol-default

> I think that's clearer than just an empty list.

I don't like the empty list either, and I meant to introduce $empty-word,
$epsilon, or something similar for an empty rhs too. Maybe we should use a
single token for all these uses?

Alternatively, we can introduce %default-printer {};.

> Unfortunately, it is yet another % declaration, but it may have
> other uses: see here where I called it %any instead:

> http://lists.gnu.org/archive/html/bison-patches/2006-08/msg00033.html

> What do you think?

How about taking this opportunity to treat differently symbols with a
value from those without?

%printer { cerr << @$ << ": " << $$; } %valued-symbols
%printer { cerr << @$; } %valueless-symbols

Or

%default-valued-printer { cerr << @$ << ": " << $$; }
%default-valueless-printer { cerr << @$; }
Akim Demaille
2006-09-14 09:05:13 UTC
>>> "Akim" == Akim Demaille <***@lrde.epita.fr> writes:

> How about taking this opportunity to treat differently symbols with a
> value from those without?

> %printer { cerr << @$ << ": " << $$; } %valued-symbols
> %printer { cerr << @$; } %valueless-symbols

> Or

> %default-valued-printer { cerr << @$ << ": " << $$; }
> %default-valueless-printer { cerr << @$; }

Or better yet (?), no %symbol-default, but:

%printer { cerr << @$ << ": " << $$; } <*>
%printer { cerr << @$; } <->

using "special type tags".
Joel E. Denny
2006-09-15 01:22:10 UTC
On Thu, 14 Sep 2006, Akim Demaille wrote:

> Or better yet (?), no %symbol-default, but:
>
> %printer { cerr << @$ << ": " << $$; } <*>

So, this means all symbols with types, right? I was actually thinking of
this syntax before %symbol-default, but then I rejected it because I
couldn't figure out what to do about type-less symbols....

> %printer { cerr << @$; } <->

I like this. Small difference though: what about <!>? In my mind, "!" =
"not", and it looks slightly odd, which is what we mean to imply, I think.

In the case of no %union and no <...> usage (that is, the user defines his
own YYSTYPE or uses the default int), are all symbols considered typed or
untyped? That is, does <*> or <!> apply? I'm leaning toward untyped and
<!>. That way, when you refactor the last <...> out of your grammar, the
<!> %destructor won't suddenly quit applying to all other grammar symbols.
So maybe <*> means tagged symbols and <!> means tagless symbols (rather
than typed and untyped since someone might argue that all symbols are
typed with #define YYSTYPE int).

Bison would warn about no $$ in:

%printer { cerr << @$; } <!>

because these symbols might actually have values. But this would be fine:

%printer(!) { cerr << @$; } <!>

I talk about (!) in the `no $$ in %destructor = no unset value warnings?'
thread.

By the way, currently, the default destructor applies to midrule values.
I think it should only apply to midrule values for which $$ is actually
used. I don't think even <!> should apply if $$ isn't used in this case.
That is, when $$ isn't used, not only does the midrule not have a type, it
doesn't even have a value; it's just a midrule.
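
The lookup order proposed in this message can be sketched in a few lines
of C. This is a minimal sketch of the proposal only, not Bison's code:
struct sym, struct defaults and destructor_for are hypothetical names, and
per-type declarations are left out. A symbol's own %destructor wins;
otherwise a tagged symbol falls back to <*> and a tagless one to <!>.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of a grammar symbol.  */
struct sym
{
  const char *tag;         /* Symbol name.  */
  const char *type_name;   /* Declared <...> tag, or NULL if tagless.  */
  const char *destructor;  /* Per-symbol %destructor code, or NULL.  */
};

/* Hypothetical holder for the two proposed defaults.  */
struct defaults
{
  const char *tagged;   /* Code declared for <*>, or NULL.  */
  const char *tagless;  /* Code declared for <!>, or NULL.  */
};

/* Return the destructor code that applies to S, or NULL if none does.  */
static const char *
destructor_for (const struct sym *s, const struct defaults *d)
{
  assert (s != NULL && d != NULL);
  if (s->destructor)
    return s->destructor;
  return s->type_name ? d->tagged : d->tagless;
}
```

Under this reading, a grammar with no %union and no <...> usage has only
tagless symbols, so <!> keeps applying to every symbol after the last tag
is refactored away.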
Joel E. Denny
2006-10-21 04:53:16 UTC
On Thu, 14 Sep 2006, Joel E. Denny wrote:

> By the way, currently, the default destructor applies to midrule values.
> I think it should only apply to midrule values for which $$ is actually
> used. I don't think even <!> should apply if $$ isn't used in this case.
> That is, when $$ isn't used, not only does the midrule not have a type, it
> doesn't even have a value; it's just a midrule.

I committed the following.

Index: ChangeLog
===================================================================
RCS file: /sources/bison/bison/ChangeLog,v
retrieving revision 1.1592
diff -p -u -r1.1592 ChangeLog
--- ChangeLog 21 Oct 2006 02:31:50 -0000 1.1592
+++ ChangeLog 21 Oct 2006 04:50:33 -0000
@@ -1,3 +1,19 @@
+2006-10-21 Joel E. Denny <***@ces.clemson.edu>
+
+ Don't apply the default %destructor/%printer to an unreferenced midrule
+ value. Mentioned at
+ <http://lists.gnu.org/archive/html/bison-patches/2006-09/msg00104.html>.
+ * src/symtab.c (dummy_symbol_get): Name all dummy symbols initially
+ like $@n instead of just @n so that the default %destructor/%printer
+ logic doesn't see them as user-defined symbols.
+ (symbol_is_dummy): Check for both forms of the name.
+ * src/reader.c (packgram): Remove the `$' from each midrule symbol
+ name for which the midrule value is referenced in any action.
+ * tests/actions.at (Default %printer and %destructor for mid-rule
+ values): New test.
+ * tests/regression.at (Rule Line Numbers, Web2c Report): Update output
+ for change to dummy symbol names.
+
2006-10-20 Joel E. Denny <***@ces.clemson.edu>

Warn about unset midrule $$ if the corresponding $n is used.
Index: src/reader.c
===================================================================
RCS file: /sources/bison/bison/src/reader.c,v
retrieving revision 1.274
diff -p -u -r1.274 reader.c
--- src/reader.c 21 Oct 2006 02:31:50 -0000 1.274
+++ src/reader.c 21 Oct 2006 04:50:33 -0000
@@ -459,11 +459,12 @@ packgram (void)

rules = xnmalloc (nrules, sizeof *rules);

- /* Before invoking grammar_rule_check on any rule, make sure
- all actions have already been scanned in order to set `used' flags.
- Otherwise, checking that a midrule's $$ is set will not always work
- properly because the midrule check must forward-reference the midrule's
- parent rule. */
+ /* Before invoking grammar_rule_check on any rule, make sure all actions have
+ already been scanned in order to set `used' flags. Otherwise, checking
+ that a midrule's $$ should be set will not always work properly because
+ the check must forward-reference the midrule's parent rule. For the same
+ reason, all the `used' flags must be set before checking whether to remove
+ `$' from any midrule symbol name. */
while (p)
{
if (p->action)
@@ -492,6 +493,15 @@ packgram (void)
rules[ruleno].action = p->action;
rules[ruleno].action_location = p->action_location;

+ /* If the midrule's $$ is set or its $n is used, remove the `$' from the
+ symbol name so that it's a user-defined symbol so that the default
+ %destructor and %printer apply. */
+ if (p->midrule_parent_rule
+ && (p->used
+ || symbol_list_n_get (p->midrule_parent_rule,
+ p->midrule_parent_rhs_index)->used))
+ p->content.sym->tag += 1;
+
/* Don't check the generated rule 0. It has no action, so some rhs
symbols may appear unused, but the parsing algorithm ensures that
%destructor's are invoked appropriately. */
Index: src/symtab.c
===================================================================
RCS file: /sources/bison/bison/src/symtab.c,v
retrieving revision 1.82
diff -p -u -r1.82 symtab.c
--- src/symtab.c 15 Sep 2006 16:34:48 -0000 1.82
+++ src/symtab.c 21 Oct 2006 04:50:33 -0000
@@ -769,7 +769,7 @@ dummy_symbol_get (location loc)

symbol *sym;

- sprintf (buf, "@%d", ++dummy_count);
+ sprintf (buf, "$@%d", ++dummy_count);
sym = symbol_get (buf, loc);
sym->class = nterm_sym;
sym->number = nvars++;
@@ -779,7 +779,7 @@ dummy_symbol_get (location loc)
bool
symbol_is_dummy (const symbol *sym)
{
- return sym->tag[0] == '@';
+ return sym->tag[0] == '@' || (sym->tag[0] == '$' && sym->tag[1] == '@');
}

/*-------------------.
Index: tests/actions.at
===================================================================
RCS file: /sources/bison/bison/tests/actions.at,v
retrieving revision 1.72
diff -p -u -r1.72 actions.at
--- tests/actions.at 15 Oct 2006 12:37:07 -0000 1.72
+++ tests/actions.at 21 Oct 2006 04:50:33 -0000
@@ -1096,3 +1096,108 @@ AT_CHECK([bison -o input.c input.y])
AT_COMPILE([input])

AT_CLEANUP
+
+
+
+## ------------------------------------------------------ ##
+## Default %printer and %destructor for mid-rule values. ##
+## ------------------------------------------------------ ##
+
+AT_SETUP([Default %printer and %destructor for mid-rule values])
+
+AT_DATA_GRAMMAR([[input.y]],
+[[%debug /* So that %printer is actually compiled. */
+
+%{
+# include <stdio.h>
+# include <stdlib.h>
+ static void yyerror (const char *msg);
+ static int yylex (void);
+# define USE(SYM)
+# define YYLTYPE int
+# define YYLLOC_DEFAULT(Current, Rhs, N)
+# define YY_LOCATION_PRINT(File, Loc)
+%}
+
+%printer { fprintf (yyoutput, "%d", @$); } %symbol-default
+%destructor { fprintf (stderr, "DESTROY %d\n", @$); } %symbol-default
+
+%%
+
+start:
+ { @$ = 1; } // Not set or used.
+ { USE ($$); @$ = 2; } // Both set and used.
+ { USE ($$); @$ = 3; } // Only set.
+ { @$ = 4; } // Only used.
+ 'c'
+ { USE (($$, $2, $4, $5)); @$ = 0; }
+ ;
+
+%%
+
+static int
+yylex (void)
+{
+ static int called;
+ if (called++)
+ abort ();
+ return 0;
+}
+
+static void
+yyerror (const char *msg)
+{
+ fprintf (stderr, "%s\n", msg);
+}
+
+int
+main (void)
+{
+ yydebug = 1;
+ return yyparse ();
+}
+]])
+
+AT_CHECK([bison -o input.c input.y], 0,,
+[[input.y:31.3-23: warning: unset value: $$
+input.y:28.3-33.37: warning: unused value: $3
+]])
+
+AT_COMPILE([input])
+AT_PARSER_CHECK([./input], 1,,
+[[Starting parse
+Entering state 0
+Reducing stack by rule 1 (line 28):
+-> $$ = nterm $@1 (: )
+Stack now 0
+Entering state 2
+Reducing stack by rule 2 (line 29):
+-> $$ = nterm @2 (: 2)
+Stack now 0 2
+Entering state 4
+Reducing stack by rule 3 (line 30):
+-> $$ = nterm @3 (: 3)
+Stack now 0 2 4
+Entering state 5
+Reducing stack by rule 4 (line 31):
+-> $$ = nterm @4 (: 4)
+Stack now 0 2 4 5
+Entering state 6
+Reading a token: Now at end of input.
+syntax error
+Error: popping nterm @4 (: 4)
+DESTROY 4
+Stack now 0 2 4 5
+Error: popping nterm @3 (: 3)
+DESTROY 3
+Stack now 0 2 4
+Error: popping nterm @2 (: 2)
+DESTROY 2
+Stack now 0 2
+Error: popping nterm $@1 (: )
+Stack now 0
+Cleanup: discarding lookahead token $end (: )
+Stack now 0
+]])
+
+AT_CLEANUP
Index: tests/regression.at
===================================================================
RCS file: /sources/bison/bison/tests/regression.at,v
retrieving revision 1.111
diff -p -u -r1.111 regression.at
--- tests/regression.at 15 Oct 2006 00:02:21 -0000 1.111
+++ tests/regression.at 21 Oct 2006 04:50:33 -0000
@@ -252,13 +252,13 @@ AT_CHECK([cat input.output], [],

0 $accept: expr $end

- 1 @1: /* empty */
+ 1 $@1: /* empty */

- 2 expr: 'a' @1 'b'
+ 2 expr: 'a' $@1 'b'

- 3 @2: /* empty */
+ 3 $@2: /* empty */

- 4 expr: @2 'c'
+ 4 expr: $@2 'c'


Terminals, with rules where they appear
@@ -276,9 +276,9 @@ $accept (6)
on left: 0
expr (7)
on left: 2 4, on right: 0
-@1 (8)
+$@1 (8)
on left: 1, on right: 2
-@2 (9)
+$@2 (9)
on left: 3, on right: 4


@@ -288,19 +288,19 @@ state 0

'a' shift, and go to state 1

- $default reduce using rule 3 (@2)
+ $default reduce using rule 3 ($@2)

expr go to state 2
- @2 go to state 3
+ $@2 go to state 3


state 1

- 2 expr: 'a' . @1 'b'
+ 2 expr: 'a' . $@1 'b'

- $default reduce using rule 1 (@1)
+ $default reduce using rule 1 ($@1)

- @1 go to state 4
+ $@1 go to state 4


state 2
@@ -312,14 +312,14 @@ state 2

state 3

- 4 expr: @2 . 'c'
+ 4 expr: $@2 . 'c'

'c' shift, and go to state 6


state 4

- 2 expr: 'a' @1 . 'b'
+ 2 expr: 'a' $@1 . 'b'

'b' shift, and go to state 7

@@ -333,14 +333,14 @@ state 5

state 6

- 4 expr: @2 'c' .
+ 4 expr: $@2 'c' .

$default reduce using rule 4 (expr)


state 7

- 2 expr: 'a' @1 'b' .
+ 2 expr: 'a' $@1 'b' .

$default reduce using rule 2 (expr)
]])
@@ -553,9 +553,9 @@ AT_CHECK([cat input.output], 0,
2 CONST_DEC_LIST: CONST_DEC
3 | CONST_DEC_LIST CONST_DEC

- 4 @1: /* empty */
+ 4 $@1: /* empty */

- 5 CONST_DEC: @1 undef_id_tok '=' const_id_tok ';'
+ 5 CONST_DEC: $@1 undef_id_tok '=' const_id_tok ';'


Terminals, with rules where they appear
@@ -578,7 +578,7 @@ CONST_DEC_LIST (9)
on left: 2 3, on right: 1 3
CONST_DEC (10)
on left: 5, on right: 2 3
-@1 (11)
+$@1 (11)
on left: 4, on right: 5


@@ -586,12 +586,12 @@ state 0

0 $accept: . CONST_DEC_PART $end

- $default reduce using rule 4 (@1)
+ $default reduce using rule 4 ($@1)

CONST_DEC_PART go to state 1
CONST_DEC_LIST go to state 2
CONST_DEC go to state 3
- @1 go to state 4
+ $@1 go to state 4


state 1
@@ -606,11 +606,11 @@ state 2
1 CONST_DEC_PART: CONST_DEC_LIST .
3 CONST_DEC_LIST: CONST_DEC_LIST . CONST_DEC

- undef_id_tok reduce using rule 4 (@1)
+ undef_id_tok reduce using rule 4 ($@1)
$default reduce using rule 1 (CONST_DEC_PART)

CONST_DEC go to state 6
- @1 go to state 4
+ $@1 go to state 4


state 3
@@ -622,7 +622,7 @@ state 3

state 4

- 5 CONST_DEC: @1 . undef_id_tok '=' const_id_tok ';'
+ 5 CONST_DEC: $@1 . undef_id_tok '=' const_id_tok ';'

undef_id_tok shift, and go to state 7

@@ -643,28 +643,28 @@ state 6

state 7

- 5 CONST_DEC: @1 undef_id_tok . '=' const_id_tok ';'
+ 5 CONST_DEC: $@1 undef_id_tok . '=' const_id_tok ';'

'=' shift, and go to state 8


state 8

- 5 CONST_DEC: @1 undef_id_tok '=' . const_id_tok ';'
+ 5 CONST_DEC: $@1 undef_id_tok '=' . const_id_tok ';'

const_id_tok shift, and go to state 9


state 9

- 5 CONST_DEC: @1 undef_id_tok '=' const_id_tok . ';'
+ 5 CONST_DEC: $@1 undef_id_tok '=' const_id_tok . ';'

';' shift, and go to state 10


state 10

- 5 CONST_DEC: @1 undef_id_tok '=' const_id_tok ';' .
+ 5 CONST_DEC: $@1 undef_id_tok '=' const_id_tok ';' .

$default reduce using rule 5 (CONST_DEC)
]])
Joel E. Denny
2006-10-21 10:03:41 UTC
On Thu, 14 Sep 2006, Joel E. Denny wrote:

> On Thu, 14 Sep 2006, Akim Demaille wrote:
>
> > Or better yet (?), no %symbol-default, but:
> >
> > %printer { cerr << @$ << ": " << $$; } <*>
>
> So, this means all symbols with types, right? I was actually thinking of
> this syntax before %symbol-default, but then I rejected it because I
> couldn't figure out what to do about type-less symbols....
>
> > %printer { cerr << @$; } <->
>
> I like this. Small difference though: what about <!>? In my mind, "!" =
> "not", and it looks slightly odd, which is what we mean to imply, I think.
>
> In the case of no %union and no <...> usage (that is, the user defines his
> own YYSTYPE or uses the default int), are all symbols considered typed or
> untyped? That is, does <*> or <!> apply? I'm leaning toward untyped and
> <!>. That way, when you refactor the last <...> out of your grammar, the
> <!> %destructor won't suddenly quit applying to all other grammar symbols.
> So maybe <*> means tagged symbols and <!> means tagless symbols (rather
> than typed and untyped since someone might argue that all symbols are
> typed with #define YYSTYPE int).

I committed the following.

Index: ChangeLog
===================================================================
RCS file: /sources/bison/bison/ChangeLog,v
retrieving revision 1.1593
diff -p -u -r1.1593 ChangeLog
--- ChangeLog 21 Oct 2006 04:52:43 -0000 1.1593
+++ ChangeLog 21 Oct 2006 09:58:21 -0000
@@ -1,5 +1,51 @@
2006-10-21 Joel E. Denny <***@ces.clemson.edu>

+ Split the default %destructor/%printer into two kinds: <*> and <!>.
+ Discussed starting at
+ <http://lists.gnu.org/archive/html/bison-patches/2006-09/msg00060.html>.
+ * NEWS (2.3a+): Mention.
+ * doc/bison.texinfo (Freeing Discarded Symbols): Document this and the
+ previous change today related to mid-rules.
+ (Bison Symbols): Remove %symbol-default and add <*> and <!>.
+ * src/parse-gram.y (PERCENT_SYMBOL_DEFAULT): Remove.
+ (TYPE_TAG_ANY): Add as <*>.
+ (TYPE_TAG_NONE): Add as <!>.
+ (generic_symlist_item): Remove RHS for %symbol-default and add RHS's
+ for <*> and <!>.
+ * src/scan-gram.l (PERCENT_SYMBOL_DEFAULT): Remove.
+ (TYPE_TAG_ANY, TYPE_TAG_NONE): Add.
+ * src/symlist.c (symbol_list_default_new): Split into tagged and
+ tagless versions.
+ (symbol_list_destructor_set, symbol_list_printer_set): Split
+ SYMLIST_DEFAULT case into SYMLIST_DEFAULT_TAGGED and
+ SYMLIST_DEFAULT_TAGLESS.
+ * src/symlist.h: Update symbol_list_default*_new prototypes.
+ (symbol_list.content_type): Split enum value SYMLIST_DEFAULT into
+ SYMLIST_DEFAULT_TAGGED and SYMLIST_DEFAULT_TAGLESS.
+ * src/symtab.c (default_destructor, default_destructor_location,
+ default_printer, default_printer_location): Split each into tagged and
+ tagless versions.
+ (symbol_destructor_get, symbol_destructor_location_get,
+ symbol_printer_get, symbol_printer_location_get): Implement tagged
+ default and tagless default cases.
+ (default_destructor_set, default_printer_set): Split each into tagged
+ and tagless versions.
+ * src/symtab.h: Update prototypes.
+ * tests/actions.at (Default %printer and %destructor): Rename to...
+ (Default tagless %printer and %destructor): ... this, and extend.
+ (Per-type %printer and %destructor): Rename to...
+ (Default tagged and per-type %printer and %destructor): ... this, and
+ extend.
+ (Default %printer and %destructor for user-defined end token): Extend.
+ (Default %printer and %destructor are not for error or $undefined):
+ Update.
+ (Default %printer and %destructor are not for $accept): Update.
+ (Default %printer and %destructor for mid-rule values): Extend.
+ * tests/input.at (Default %printer and %destructor redeclared): Extend.
+ (Unused values with default %destructor): Extend.
+
+2006-10-21 Joel E. Denny <***@ces.clemson.edu>
+
Don't apply the default %destructor/%printer to an unreferenced midrule
value. Mentioned at
<http://lists.gnu.org/archive/html/bison-patches/2006-09/msg00104.html>.
Index: NEWS
===================================================================
RCS file: /sources/bison/bison/NEWS,v
retrieving revision 1.163
diff -p -u -r1.163 NEWS
--- NEWS 20 Oct 2006 22:10:50 -0000 1.163
+++ NEWS 21 Oct 2006 09:58:22 -0000
@@ -6,6 +6,25 @@ Changes in version 2.3a+ (????-??-??):
* The -g and --graph options now output graphs in Graphviz DOT format,
not VCG format.

+* Bison now recognizes two separate kinds of default %destructor's and
+ %printer's:
+
+ 1. Place `<*>' in a %destructor/%printer symbol list to define a default
+ %destructor/%printer for all grammar symbols for which you have formally
+ declared semantic type tags.
+
+ 2. Place `<!>' in a %destructor/%printer symbol list to define a default
+ %destructor/%printer for all grammar symbols without declared semantic
+ type tags.
+
+ Bison no longer supports the `%symbol-default' notation from Bison 2.3a.
+ `<*>' and `<!>' combined achieve the same effect with one exception: Bison no
+ longer applies any %destructor to a mid-rule value if that mid-rule value is
+ not actually ever referenced using either $$ or $n in a semantic action.
+
+ See the section `Freeing Discarded Symbols' in the Bison manual for further
+ details.
+
* The Yacc prologue alternatives from Bison 2.3a have been rewritten as the
following directives:

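As a rough illustration of the NEWS entry above (not part of the patch), the choice between the two defaults can be modelled as below. The struct, the enum, and the function name are hypothetical and far simpler than Bison's own data structures; the sketch also assumes that no per-symbol or per-type declaration matches the symbol.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type: which default, if any, applies.  */
enum default_kind { KIND_NONE, KIND_TAGGED, KIND_TAGLESS };

/* Hypothetical, simplified view of a grammar symbol; this is not
   Bison's struct symbol.  */
struct sym_info
{
  const char *type_tag;  /* Tag from %token <tag> or %type <tag>, else NULL.  */
  int is_midrule;        /* Nonzero for a mid-rule action's symbol.  */
  int value_referenced;  /* Mid-rule only: nonzero if $$ or $n uses its value.  */
};

/* Return which default %destructor/%printer would apply to S, assuming
   no per-symbol or per-type declaration matches it.  */
enum default_kind
default_kind_for (const struct sym_info *s)
{
  assert (s != NULL);
  /* An unreferenced mid-rule value gets no %destructor at all.  */
  if (s->is_midrule && !s->value_referenced)
    return KIND_NONE;
  /* <*> covers symbols with a declared tag, <!> covers the rest.  */
  return s->type_tag ? KIND_TAGGED : KIND_TAGLESS;
}
```

A referenced mid-rule value has no declared tag, so in this model it falls through to the tagless case, which is what the manual text in this patch describes.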
Index: doc/bison.texinfo
===================================================================
RCS file: /sources/bison/bison/doc/bison.texinfo,v
retrieving revision 1.210
diff -p -u -r1.210 bison.texinfo
--- doc/bison.texinfo 20 Oct 2006 22:10:50 -0000 1.210
+++ doc/bison.texinfo 21 Oct 2006 09:58:25 -0000
@@ -4236,8 +4236,8 @@ For instance, if your locations use a fi
@subsection Freeing Discarded Symbols
@cindex freeing discarded symbols
@findex %destructor
-@findex %symbol-default
-
+@findex <*>
+@findex <!>
During error recovery (@pxref{Error Recovery}), symbols already pushed
on the stack and tokens coming from the rest of the file are discarded
until the parser falls on its feet. If the parser runs out of memory,
@@ -4265,21 +4265,26 @@ The Parser Function @code{yyparse}}).
When a symbol is listed among @var{symbols}, its @code{%destructor} is called a
per-symbol @code{%destructor}.
You may also define a per-type @code{%destructor} by listing a semantic type
-among @var{symbols}.
+tag among @var{symbols}.
In that case, the parser will invoke this @var{code} whenever it discards any
-grammar symbol that has that semantic type unless that symbol has its own
+grammar symbol that has that semantic type tag unless that symbol has its own
per-symbol @code{%destructor}.

-Finally, you may define a default @code{%destructor} by placing
-@code{%symbol-default} in the @var{symbols} list of exactly one
-@code{%destructor} declaration in your grammar file.
-In that case, the parser will invoke the associated @var{code} whenever it
-discards any user-defined grammar symbol for which there is no per-type or
-per-symbol @code{%destructor}.
+Finally, you can define two different kinds of default @code{%destructor}s.
+You can place each of @code{<*>} and @code{<!>} in the @var{symbols} list of
+exactly one @code{%destructor} declaration in your grammar file.
+The parser will invoke the @var{code} associated with one of these whenever it
+discards any user-defined grammar symbol that has no per-symbol and no per-type
+@code{%destructor}.
+The parser uses the @var{code} for @code{<*>} in the case of such a grammar
+symbol for which you have formally declared a semantic type tag (@code{%type}
+counts as such a declaration, but @code{$<tag>$} does not).
+The parser uses the @var{code} for @code{<!>} in the case of such a grammar
+symbol that has no declared semantic type tag.
@end deffn

@noindent
-For instance:
+For example:

@smallexample
%union @{ char *string; @}
@@ -4290,35 +4295,52 @@ For instance:
%union @{ char character; @}
%token <character> CHR
%type <character> chr
-%destructor @{ free ($$); @} %symbol-default
-%destructor @{ free ($$); printf ("%d", @@$.first_line); @} STRING1 string1
+%token TAGLESS
+
%destructor @{ @} <character>
+%destructor @{ free ($$); @} <*>
+%destructor @{ free ($$); printf ("%d", @@$.first_line); @} STRING1 string1
+%destructor @{ printf ("Discarding tagless symbol.\n"); @} <!>
@end smallexample

@noindent
guarantees that, when the parser discards any user-defined symbol that has a
semantic type tag other than @code{<character>}, it passes its semantic value
-to @code{free}.
+to @code{free} by default.
However, when the parser discards a @code{STRING1} or a @code{string1}, it also
prints its line number to @code{stdout}.
It performs only the second @code{%destructor} in this case, so it invokes
@code{free} only once.
+Finally, the parser merely prints a message whenever it discards any symbol,
+such as @code{TAGLESS}, that has no semantic type tag.

-Notice that a Bison-generated parser invokes the default @code{%destructor}
-only for user-defined as opposed to Bison-defined symbols.
-For example, the parser will not invoke it for the special Bison-defined
-symbols @code{$accept}, @code{$undefined}, or @code{$end} (@pxref{Table of
-Symbols, ,Bison Symbols}), none of which you can reference in your grammar.
-It also will not invoke it for the @code{error} token (@pxref{Table of Symbols,
-,error}), which is always defined by Bison regardless of whether you reference
-it in your grammar.
-However, it will invoke it for the end token (token 0) if you redefine it from
-@code{$end} to, for example, @code{END}:
+A Bison-generated parser invokes the default @code{%destructor}s only for
+user-defined as opposed to Bison-defined symbols.
+For example, the parser will not invoke either kind of default
+@code{%destructor} for the special Bison-defined symbols @code{$accept},
+@code{$undefined}, or @code{$end} (@pxref{Table of Symbols, ,Bison Symbols}),
+none of which you can reference in your grammar.
+It also will not invoke either for the @code{error} token (@pxref{Table of
+Symbols, ,error}), which is always defined by Bison regardless of whether you
+reference it in your grammar.
+However, it may invoke one of them for the end token (token 0) if you
+redefine it from @code{$end} to, for example, @code{END}:

@smallexample
%token END 0
@end smallexample

+@cindex actions in mid-rule
+@cindex mid-rule actions
+Finally, Bison will never invoke a @code{%destructor} for an unreferenced
+mid-rule semantic value (@pxref{Mid-Rule Actions,,Actions in Mid-Rule}).
+That is, Bison does not consider a mid-rule to have a semantic value if you do
+not reference @code{$$} in the mid-rule's action or @code{$@var{n}} (where
+@var{n} is the RHS symbol position of the mid-rule) in any later action in that
+rule.
+However, if you do reference either, the Bison-generated parser will invoke the
+@code{<!>} @code{%destructor} whenever it discards the mid-rule symbol.
+
@ignore
@noindent
In the future, it may be possible to redefine the @code{error} token as a
@@ -8544,6 +8566,18 @@ Separates alternate rules for the same r
@xref{Rules, ,Syntax of Grammar Rules}.
@end deffn

+@deffn {Directive} <*>
+Used to define a default tagged @code{%destructor} or default tagged
+@code{%printer}.
+@xref{Destructor Decl, , Freeing Discarded Symbols}.
+@end deffn
+
+@deffn {Directive} <!>
+Used to define a default tagless @code{%destructor} or default tagless
+@code{%printer}.
+@xref{Destructor Decl, , Freeing Discarded Symbols}.
+@end deffn
+
@deffn {Symbol} $accept
The predefined nonterminal whose only rule is @samp{$accept: @var{start}
$end}, where @var{start} is the start symbol. @xref{Start Decl, , The
@@ -8776,11 +8810,6 @@ Bison declaration to specify the start s
Start-Symbol}.
@end deffn

-@deffn {Directive} %symbol-default
-Used to declare a default @code{%destructor} or default @code{%printer}.
-@xref{Destructor Decl, , Freeing Discarded Symbols}.
-@end deffn
-
@deffn {Directive} %token
Bison declaration to declare token(s) without specifying precedence.
@xref{Token Decl, ,Token Type Names}.
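For readers who want the documented precedence in one place, here is a minimal sketch in C of how a discarded symbol could be matched to a %destructor: per-symbol first, then per-type, then one of the two defaults, with Bison-defined symbols excluded from the defaults. Everything in it (struct decls, destructor_origin, the string comparison against "error") is an assumption made for illustration and is not Bison's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: where the chosen %destructor came from.  */
enum origin
{
  ORIGIN_NONE, ORIGIN_PER_SYMBOL, ORIGIN_PER_TYPE,
  ORIGIN_DEFAULT_TAGGED, ORIGIN_DEFAULT_TAGLESS
};

/* Hypothetical summary of the %destructor declarations in a grammar.  */
struct decls
{
  const char *const *per_symbol;  /* NULL-terminated list of symbol names.  */
  const char *const *per_type;    /* NULL-terminated list of type tags.  */
  int has_tagged_default;         /* Nonzero if <*> was declared.  */
  int has_tagless_default;        /* Nonzero if <!> was declared.  */
};

/* Return nonzero if the NULL-terminated LIST contains S.  */
static int
contains (const char *const *list, const char *s)
{
  for (; *list != NULL; list++)
    if (strcmp (*list, s) == 0)
      return 1;
  return 0;
}

/* Pick the %destructor for symbol NAME, whose declared tag is TAG
   (NULL if the symbol is tagless).  */
enum origin
destructor_origin (const struct decls *d, const char *name, const char *tag)
{
  assert (d != NULL && name != NULL);
  if (contains (d->per_symbol, name))
    return ORIGIN_PER_SYMBOL;
  if (tag != NULL && contains (d->per_type, tag))
    return ORIGIN_PER_TYPE;
  /* Defaults never apply to Bison-defined symbols such as $end or the
     error token.  Comparing against "error" is a simplification.  */
  if (name[0] == '$' || strcmp (name, "error") == 0)
    return ORIGIN_NONE;
  if (tag != NULL)
    return d->has_tagged_default ? ORIGIN_DEFAULT_TAGGED : ORIGIN_NONE;
  return d->has_tagless_default ? ORIGIN_DEFAULT_TAGLESS : ORIGIN_NONE;
}
```

Fed the declarations from the manual's example (per-symbol for STRING1 and string1, per-type for character, plus both defaults), this model sends STRING1 to its per-symbol code, CHR to the per-type code, and TAGLESS to the tagless default.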
Index: src/parse-gram.y
===================================================================
RCS file: /sources/bison/bison/src/parse-gram.y,v
retrieving revision 1.94
diff -p -u -r1.94 parse-gram.y
--- src/parse-gram.y 16 Oct 2006 05:25:36 -0000 1.94
+++ src/parse-gram.y 21 Oct 2006 09:58:26 -0000
@@ -116,8 +116,6 @@ static int current_prec = 0;
%token PERCENT_TYPE "%type"
%token PERCENT_DESTRUCTOR "%destructor"
%token PERCENT_PRINTER "%printer"
-%token PERCENT_SYMBOL_DEFAULT
- "%symbol-default"

%token PERCENT_LEFT "%left"
%token PERCENT_RIGHT "%right"
@@ -177,6 +175,8 @@ static int current_prec = 0;
%token PROLOGUE "%{...%}"
%token SEMICOLON ";"
%token TYPE "type"
+%token TYPE_TAG_ANY "<*>"
+%token TYPE_TAG_NONE "<!>"

%type <character> CHAR
%printer { fputs (char_name ($$), stderr); } CHAR
@@ -395,7 +395,8 @@ generic_symlist:
generic_symlist_item:
symbol { $$ = symbol_list_sym_new ($1, @1); }
| TYPE { $$ = symbol_list_type_new ($1, @1); }
-| "%symbol-default" { $$ = symbol_list_default_new (@1); }
+| "<*>" { $$ = symbol_list_default_tagged_new (@1); }
+| "<!>" { $$ = symbol_list_default_tagless_new (@1); }
;

/* One token definition. */
Index: src/scan-gram.l
===================================================================
RCS file: /sources/bison/bison/src/scan-gram.l,v
retrieving revision 1.105
diff -p -u -r1.105 scan-gram.l
--- src/scan-gram.l 16 Oct 2006 05:25:36 -0000 1.105
+++ src/scan-gram.l 21 Oct 2006 09:58:27 -0000
@@ -194,7 +194,6 @@ splice (\\[ \f\t\v]*\n)*
"%right" return PERCENT_RIGHT;
"%skeleton" return PERCENT_SKELETON;
"%start" return PERCENT_START;
- "%symbol-default" return PERCENT_SYMBOL_DEFAULT;
"%term" return PERCENT_TOKEN;
"%token" return PERCENT_TOKEN;
"%token"[-_]"table" return PERCENT_TOKEN_TABLE;
@@ -210,6 +209,8 @@ splice (\\[ \f\t\v]*\n)*
"=" return EQUAL;
"|" return PIPE;
";" return SEMICOLON;
+ "<*>" return TYPE_TAG_ANY;
+ "<!>" return TYPE_TAG_NONE;

{id} {
val->uniqstr = uniqstr_new (yytext);
Index: src/symlist.c
===================================================================
RCS file: /sources/bison/bison/src/symlist.c,v
retrieving revision 1.22
diff -p -u -r1.22 symlist.c
--- src/symlist.c 15 Sep 2006 16:34:48 -0000 1.22
+++ src/symlist.c 21 Oct 2006 09:58:27 -0000
@@ -74,16 +74,33 @@ symbol_list_type_new (uniqstr type_name,
}


-/*----------------------------------------------------.
-| Create a list containing a %symbol-default at LOC. |
-`----------------------------------------------------*/
+/*----------------------------------------.
+| Create a list containing a <*> at LOC. |
+`----------------------------------------*/

symbol_list *
-symbol_list_default_new (location loc)
+symbol_list_default_tagged_new (location loc)
{
symbol_list *res = xmalloc (sizeof *res);

- res->content_type = SYMLIST_DEFAULT;
+ res->content_type = SYMLIST_DEFAULT_TAGGED;
+ res->location = loc;
+ res->next = NULL;
+
+ return res;
+}
+
+
+/*----------------------------------------.
+| Create a list containing a <!> at LOC. |
+`----------------------------------------*/
+
+symbol_list *
+symbol_list_default_tagless_new (location loc)
+{
+ symbol_list *res = xmalloc (sizeof *res);
+
+ res->content_type = SYMLIST_DEFAULT_TAGLESS;
res->location = loc;
res->next = NULL;

@@ -215,8 +232,11 @@ symbol_list_destructor_set (symbol_list
semantic_type_destructor_set (
semantic_type_get (node->content.type_name), destructor, loc);
break;
- case SYMLIST_DEFAULT:
- default_destructor_set (destructor, loc);
+ case SYMLIST_DEFAULT_TAGGED:
+ default_tagged_destructor_set (destructor, loc);
+ break;
+ case SYMLIST_DEFAULT_TAGLESS:
+ default_tagless_destructor_set (destructor, loc);
break;
}
}
@@ -233,8 +253,11 @@ symbol_list_printer_set (symbol_list *no
semantic_type_printer_set (
semantic_type_get (node->content.type_name), printer, loc);
break;
- case SYMLIST_DEFAULT:
- default_printer_set (printer, loc);
+ case SYMLIST_DEFAULT_TAGGED:
+ default_tagged_printer_set (printer, loc);
+ break;
+ case SYMLIST_DEFAULT_TAGLESS:
+ default_tagless_printer_set (printer, loc);
break;
}
}
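The dispatch that the symlist.c changes above introduce can be pictured with the standalone sketch below. It reuses the enum value names from the patch, but the node type, the destructor_targets record, and list_destructor_set are hypothetical stand-ins; in particular, walking the whole list inside one function is a choice made here for brevity and is not a claim about how Bison's callers iterate.

```c
#include <assert.h>
#include <stddef.h>

/* Same four cases as symbol_list.content_type after the patch.  */
enum content_type
{
  SYMLIST_SYMBOL, SYMLIST_TYPE,
  SYMLIST_DEFAULT_TAGGED, SYMLIST_DEFAULT_TAGLESS
};

/* Hypothetical, stripped-down list node.  */
struct list_node
{
  enum content_type content_type;
  struct list_node *next;
};

/* Hypothetical record of where a %destructor's code ended up.  */
struct destructor_targets
{
  int per_symbol_count;
  int per_type_count;
  const char *tagged_default;
  const char *tagless_default;
};

/* Attach CODE to every item of LIST, recording the result in OUT.  */
void
list_destructor_set (const struct list_node *list, const char *code,
                     struct destructor_targets *out)
{
  assert (code != NULL && out != NULL);
  for (; list != NULL; list = list->next)
    switch (list->content_type)
      {
      case SYMLIST_SYMBOL:
        out->per_symbol_count++;
        break;
      case SYMLIST_TYPE:
        out->per_type_count++;
        break;
      case SYMLIST_DEFAULT_TAGGED:
        out->tagged_default = code;
        break;
      case SYMLIST_DEFAULT_TAGLESS:
        out->tagless_default = code;
        break;
      }
}
```

A declaration list such as `<*> 'e' <field2>` from the test suite would then touch the tagged default, one symbol, and one type tag, and leave the tagless default alone.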
Index: src/symlist.h
===================================================================
RCS file: /sources/bison/bison/src/symlist.h,v
retrieving revision 1.18
diff -p -u -r1.18 symlist.h
--- src/symlist.h 4 Sep 2006 19:29:29 -0000 1.18
+++ src/symlist.h 21 Oct 2006 09:58:27 -0000
@@ -29,10 +29,13 @@
typedef struct symbol_list
{
/**
- * Whether this node contains a symbol, a semantic type, or a
- * \c \%symbol-default.
+ * Whether this node contains a symbol, a semantic type, a \c <*>, or a
+ * \c <!>.
*/
- enum { SYMLIST_SYMBOL, SYMLIST_TYPE, SYMLIST_DEFAULT } content_type;
+ enum {
+ SYMLIST_SYMBOL, SYMLIST_TYPE,
+ SYMLIST_DEFAULT_TAGGED, SYMLIST_DEFAULT_TAGLESS
+ } content_type;
union {
/** The symbol or \c NULL iff <tt>node_type = SYMLIST_SYMBOL</tt>. */
symbol *sym;
@@ -76,8 +79,10 @@ symbol_list *symbol_list_sym_new (symbol
/** Create a list containing \c type_name at \c loc. */
symbol_list *symbol_list_type_new (uniqstr type_name, location loc);

-/** Create a list containing a \c \%symbol-default at \c loc. */
-symbol_list *symbol_list_default_new (location loc);
+/** Create a list containing a \c <*> at \c loc. */
+symbol_list *symbol_list_default_tagged_new (location loc);
+/** Create a list containing a \c <!> at \c loc. */
+symbol_list *symbol_list_default_tagless_new (location loc);

/** Print this list.

Index: src/symtab.c
===================================================================
RCS file: /sources/bison/bison/src/symtab.c,v
retrieving revision 1.83
diff -p -u -r1.83 symtab.c
--- src/symtab.c 21 Oct 2006 04:52:43 -0000 1.83
+++ src/symtab.c 21 Oct 2006 09:58:27 -0000
@@ -41,14 +41,19 @@ symbol *accept = NULL;
symbol *startsymbol = NULL;
location startsymbol_location;

-/*-----------------------------------.
-| Default %destructor and %printer. |
-`-----------------------------------*/
-
-static const char *default_destructor = NULL;
-static location default_destructor_location;
-static const char *default_printer = NULL;
-static location default_printer_location;
+/*---------------------------------------.
+| Default %destructor's and %printer's. |
+`---------------------------------------*/
+
+static const char *default_tagged_destructor = NULL;
+static location default_tagged_destructor_location;
+static const char *default_tagless_destructor = NULL;
+static location default_tagless_destructor_location;
+
+static const char *default_tagged_printer = NULL;
+static location default_tagged_printer_location;
+static const char *default_tagless_printer = NULL;
+static location default_tagless_printer_location;

/*---------------------------------.
| Create a new symbol, named TAG. |
@@ -220,10 +225,13 @@ symbol_destructor_get (symbol *sym)
return type->destructor;
}

- /* Apply the default %destructor only to user-defined symbols. */
+ /* Apply default %destructor's only to user-defined symbols. */
if (sym->tag[0] == '$' || sym == errtoken)
return NULL;
- return default_destructor;
+
+ if (sym->type_name)
+ return default_tagged_destructor;
+ return default_tagless_destructor;
}

/*---------------------------------------------------------------.
@@ -240,8 +248,9 @@ symbol_destructor_location_get (symbol *
semantic_type *type = semantic_type_get (sym->type_name);
if (type->destructor)
return type->destructor_location;
+ return default_tagged_destructor_location;
}
- return default_destructor_location;
+ return default_tagless_destructor_location;
}

/*---------------------------------------------------------------.
@@ -300,7 +309,10 @@ symbol_printer_get (symbol *sym)
/* Apply the default %printer only to user-defined symbols. */
if (sym->tag[0] == '$' || sym == errtoken)
return NULL;
- return default_printer;
+
+ if (sym->type_name)
+ return default_tagged_printer;
+ return default_tagless_printer;
}

/*------------------------------------------------------------.
@@ -317,8 +329,9 @@ symbol_printer_location_get (symbol *sym
semantic_type *type = semantic_type_get (sym->type_name);
if (type->printer)
return type->printer_location;
+ return default_tagged_printer_location;
}
- return default_printer_location;
+ return default_tagless_printer_location;
}


@@ -924,30 +937,58 @@ symbols_pack (void)
}


-/*-----------------------------------.
-| Set default %destructor/%printer. |
-`-----------------------------------*/
+/*--------------------------------------------------.
+| Set default tagged/tagless %destructor/%printer. |
+`--------------------------------------------------*/
+
+void
+default_tagged_destructor_set (const char *destructor, location loc)
+{
+ if (default_tagged_destructor != NULL)
+ {
+ complain_at (loc, _("redeclaration for default tagged %%destructor"));
+ complain_at (default_tagged_destructor_location,
+ _("previous declaration"));
+ }
+ default_tagged_destructor = destructor;
+ default_tagged_destructor_location = loc;
+}
+
+void
+default_tagless_destructor_set (const char *destructor, location loc)
+{
+ if (default_tagless_destructor != NULL)
+ {
+ complain_at (loc, _("redeclaration for default tagless %%destructor"));
+ complain_at (default_tagless_destructor_location,
+ _("previous declaration"));
+ }
+ default_tagless_destructor = destructor;
+ default_tagless_destructor_location = loc;
+}

void
-default_destructor_set (const char *destructor, location loc)
+default_tagged_printer_set (const char *printer, location loc)
{
- if (default_destructor != NULL)
+ if (default_tagged_printer != NULL)
{
- complain_at (loc, _("redeclaration for default %%destructor"));
- complain_at (default_destructor_location, _("previous declaration"));
+ complain_at (loc, _("redeclaration for default tagged %%printer"));
+ complain_at (default_tagged_printer_location,
+ _("previous declaration"));
}
- default_destructor = destructor;
- default_destructor_location = loc;
+ default_tagged_printer = printer;
+ default_tagged_printer_location = loc;
}

void
-default_printer_set (const char *printer, location loc)
+default_tagless_printer_set (const char *printer, location loc)
{
- if (default_printer != NULL)
+ if (default_tagless_printer != NULL)
{
- complain_at (loc, _("redeclaration for default %%printer"));
- complain_at (default_printer_location, _("previous declaration"));
+ complain_at (loc, _("redeclaration for default tagless %%printer"));
+ complain_at (default_tagless_printer_location,
+ _("previous declaration"));
}
- default_printer = printer;
- default_printer_location = loc;
+ default_tagless_printer = printer;
+ default_tagless_printer_location = loc;
}
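The four setters above share one pattern: complain if the slot is already filled, then overwrite it anyway. A minimal sketch of that pattern follows, assuming a hypothetical default_slot record with a plain line number in place of Bison's location type, and a return value in place of complain_at.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical slot for one default (<*> or <!>) of one kind
   (%destructor or %printer).  */
struct default_slot
{
  const char *code;  /* NULL until a declaration has been seen.  */
  int line;          /* Line of the most recent declaration.  */
};

/* Store CODE in SLOT.  Return 1 if this was a redeclaration, which is
   where the real setters call complain_at twice, else 0.  As in the
   patch, the newer declaration replaces the older one either way.  */
int
default_slot_set (struct default_slot *slot, const char *code, int line)
{
  int redeclared;
  assert (slot != NULL && code != NULL);
  redeclared = slot->code != NULL;
  slot->code = code;
  slot->line = line;
  return redeclared;
}
```

Because the tagged and tagless defaults live in separate slots, declaring one `<*>` and one `<!>` %destructor raises no complaint in this model; only a second declaration of the same kind does.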
Index: src/symtab.h
===================================================================
RCS file: /sources/bison/bison/src/symtab.h,v
retrieving revision 1.65
diff -p -u -r1.65 symtab.h
--- src/symtab.h 4 Sep 2006 22:20:52 -0000 1.65
+++ src/symtab.h 21 Oct 2006 09:58:27 -0000
@@ -69,8 +69,8 @@ struct symbol
/** Any \c \%destructor declared specifically for this symbol.

Access this field only through <tt>symbol</tt>'s interface functions. For
- example, if <tt>symbol::destructor = NULL</tt>, the default
- \c \%destructor or a per-type \c \%destructor might be appropriate, and
+ example, if <tt>symbol::destructor = NULL</tt>, a default \c \%destructor
+ or a per-type \c \%destructor might be appropriate, and
\c symbol_destructor_get will compute the correct one. */
const char *destructor;

@@ -255,14 +255,18 @@ void symbols_check_defined (void);
void symbols_pack (void);


-/*-----------------------------------.
-| Default %destructor and %printer. |
-`-----------------------------------*/
-
-/** Set the default \c \%destructor. */
-void default_destructor_set (const char *destructor, location loc);
-
-/** Set the default \c \%printer. */
-void default_printer_set (const char *printer, location loc);
+/*---------------------------------------.
+| Default %destructor's and %printer's. |
+`---------------------------------------*/
+
+/** Set the default \c \%destructor for tagged values. */
+void default_tagged_destructor_set (const char *destructor, location loc);
+/** Set the default \c \%destructor for tagless values. */
+void default_tagless_destructor_set (const char *destructor, location loc);
+
+/** Set the default \c \%printer for tagged values. */
+void default_tagged_printer_set (const char *printer, location loc);
+/** Set the default \c \%printer for tagless values. */
+void default_tagless_printer_set (const char *printer, location loc);

#endif /* !SYMTAB_H_ */
Index: tests/actions.at
===================================================================
RCS file: /sources/bison/bison/tests/actions.at,v
retrieving revision 1.73
diff -p -u -r1.73 actions.at
--- tests/actions.at 21 Oct 2006 04:52:43 -0000 1.73
+++ tests/actions.at 21 Oct 2006 09:58:27 -0000
@@ -583,14 +583,14 @@ AT_CHECK_PRINTER_AND_DESTRUCTOR([%glr-pa



-## --------------------------------- ##
-## Default %printer and %destructor. ##
-## --------------------------------- ##
+## ----------------------------------------- ##
+## Default tagless %printer and %destructor. ##
+## ----------------------------------------- ##

# Check that the right %printer and %destructor are called, that they're not
# called for $end, and that $$ and @$ work correctly.

-AT_SETUP([Default %printer and %destructor])
+AT_SETUP([Default tagless %printer and %destructor])

AT_DATA_GRAMMAR([[input.y]],
[[%error-verbose
@@ -610,11 +610,15 @@ AT_DATA_GRAMMAR([[input.y]],
%}

%printer {
- fprintf (yyoutput, "Default printer for '%c' @ %d", $$, @$.first_column);
-} %symbol-default
+ fprintf (yyoutput, "<*> printer should not be called.\n");
+} <*>
+
+%printer {
+ fprintf (yyoutput, "<!> printer for '%c' @ %d", $$, @$.first_column);
+} <!>
%destructor {
- fprintf (stdout, "Default destructor for '%c' @ %d.\n", $$, @$.first_column);
-} %symbol-default
+ fprintf (stdout, "<!> destructor for '%c' @ %d.\n", $$, @$.first_column);
+} <!>

%printer {
fprintf (yyoutput, "'b'/'c' printer for '%c' @ %d", $$, @$.first_column);
@@ -623,6 +627,10 @@ AT_DATA_GRAMMAR([[input.y]],
fprintf (stdout, "'b'/'c' destructor for '%c' @ %d.\n", $$, @$.first_column);
} 'b' 'c'

+%destructor {
+ fprintf (yyoutput, "<*> destructor should not be called.\n");
+} <*>
+
%%

start: 'a' 'b' 'c' 'd' 'e' { $$ = 'S'; USE(($1, $2, $3, $4, $5)); } ;
@@ -659,15 +667,15 @@ main (void)
AT_CHECK([bison -o input.c input.y])
AT_COMPILE([input])
AT_PARSER_CHECK([./input], 1,
-[[Default destructor for 'd' @ 4.
+[[<!> destructor for 'd' @ 4.
'b'/'c' destructor for 'c' @ 3.
'b'/'c' destructor for 'b' @ 2.
-Default destructor for 'a' @ 1.
+<!> destructor for 'a' @ 1.
]],
[[Starting parse
Entering state 0
-Reading a token: Next token is token 'a' (1.1-1.1: Default printer for 'a' @ 1)
-Shifting token 'a' (1.1-1.1: Default printer for 'a' @ 1)
+Reading a token: Next token is token 'a' (1.1-1.1: <!> printer for 'a' @ 1)
+Shifting token 'a' (1.1-1.1: <!> printer for 'a' @ 1)
Entering state 1
Reading a token: Next token is token 'b' (1.2-1.2: 'b'/'c' printer for 'b' @ 2)
Shifting token 'b' (1.2-1.2: 'b'/'c' printer for 'b' @ 2)
@@ -675,18 +683,18 @@ Entering state 3
Reading a token: Next token is token 'c' (1.3-1.3: 'b'/'c' printer for 'c' @ 3)
Shifting token 'c' (1.3-1.3: 'b'/'c' printer for 'c' @ 3)
Entering state 5
-Reading a token: Next token is token 'd' (1.4-1.4: Default printer for 'd' @ 4)
-Shifting token 'd' (1.4-1.4: Default printer for 'd' @ 4)
+Reading a token: Next token is token 'd' (1.4-1.4: <!> printer for 'd' @ 4)
+Shifting token 'd' (1.4-1.4: <!> printer for 'd' @ 4)
Entering state 6
Reading a token: Now at end of input.
syntax error, unexpected $end, expecting 'e'
-Error: popping token 'd' (1.4-1.4: Default printer for 'd' @ 4)
+Error: popping token 'd' (1.4-1.4: <!> printer for 'd' @ 4)
Stack now 0 1 3 5
Error: popping token 'c' (1.3-1.3: 'b'/'c' printer for 'c' @ 3)
Stack now 0 1 3
Error: popping token 'b' (1.2-1.2: 'b'/'c' printer for 'b' @ 2)
Stack now 0 1
-Error: popping token 'a' (1.1-1.1: Default printer for 'a' @ 1)
+Error: popping token 'a' (1.1-1.1: <!> printer for 'a' @ 1)
Stack now 0
Cleanup: discarding lookahead token $end (1.5-1.5: )
Stack now 0
@@ -696,11 +704,11 @@ AT_CLEANUP



-## ----------------------------------- ##
-## Per-type %printer and %destructor. ##
-## ----------------------------------- ##
+## ------------------------------------------------------ ##
+## Default tagged and per-type %printer and %destructor. ##
+## ------------------------------------------------------ ##

-AT_SETUP([Per-type %printer and %destructor])
+AT_SETUP([Default tagged and per-type %printer and %destructor])

AT_DATA_GRAMMAR([[input.y]],
[[%error-verbose
@@ -714,16 +722,20 @@ AT_DATA_GRAMMAR([[input.y]],
# define USE(SYM)
%}

+%printer {
+ fprintf (yyoutput, "<!> printer should not be called.\n");
+} <!>
+
%union { int field0; int field1; int field2; }
%type <field0> start 'a' 'g'
%type <field1> 'e'
%type <field2> 'f'
%printer {
- fprintf (yyoutput, "%%symbol-default/<field2>/e printer");
-} %symbol-default 'e' <field2>
+ fprintf (yyoutput, "<*>/<field2>/e printer");
+} <*> 'e' <field2>
%destructor {
- fprintf (stdout, "%%symbol-default/<field2>/e destructor.\n");
-} %symbol-default 'e' <field2>
+ fprintf (stdout, "<*>/<field2>/e destructor.\n");
+} <*> 'e' <field2>

%type <field1> 'b'
%printer { fprintf (yyoutput, "<field1> printer"); } <field1>
@@ -737,6 +749,10 @@ AT_DATA_GRAMMAR([[input.y]],
%printer { fprintf (yyoutput, "'d' printer"); } 'd'
%destructor { fprintf (stdout, "'d' destructor.\n"); } 'd'

+%destructor {
+ fprintf (yyoutput, "<!> destructor should not be called.\n");
+} <!>
+
%%

start:
@@ -776,17 +792,17 @@ main (void)
AT_CHECK([bison -o input.c input.y])
AT_COMPILE([input])
AT_PARSER_CHECK([./input], 1,
-[[%symbol-default/<field2>/e destructor.
-%symbol-default/<field2>/e destructor.
+[[<*>/<field2>/e destructor.
+<*>/<field2>/e destructor.
'd' destructor.
'c' destructor.
<field1> destructor.
-%symbol-default/<field2>/e destructor.
+<*>/<field2>/e destructor.
]],
[[Starting parse
Entering state 0
-Reading a token: Next token is token 'a' (%symbol-default/<field2>/e printer)
-Shifting token 'a' (%symbol-default/<field2>/e printer)
+Reading a token: Next token is token 'a' (<*>/<field2>/e printer)
+Shifting token 'a' (<*>/<field2>/e printer)
Entering state 1
Reading a token: Next token is token 'b' (<field1> printer)
Shifting token 'b' (<field1> printer)
@@ -797,17 +813,17 @@ Entering state 5
Reading a token: Next token is token 'd' ('d' printer)
Shifting token 'd' ('d' printer)
Entering state 6
-Reading a token: Next token is token 'e' (%symbol-default/<field2>/e printer)
-Shifting token 'e' (%symbol-default/<field2>/e printer)
+Reading a token: Next token is token 'e' (<*>/<field2>/e printer)
+Shifting token 'e' (<*>/<field2>/e printer)
Entering state 7
-Reading a token: Next token is token 'f' (%symbol-default/<field2>/e printer)
-Shifting token 'f' (%symbol-default/<field2>/e printer)
+Reading a token: Next token is token 'f' (<*>/<field2>/e printer)
+Shifting token 'f' (<*>/<field2>/e printer)
Entering state 8
Reading a token: Now at end of input.
syntax error, unexpected $end, expecting 'g'
-Error: popping token 'f' (%symbol-default/<field2>/e printer)
+Error: popping token 'f' (<*>/<field2>/e printer)
Stack now 0 1 3 5 6 7
-Error: popping token 'e' (%symbol-default/<field2>/e printer)
+Error: popping token 'e' (<*>/<field2>/e printer)
Stack now 0 1 3 5 6
Error: popping token 'd' ('d' printer)
Stack now 0 1 3 5
@@ -815,7 +831,7 @@ Error: popping token 'c' ('c' printer)
Stack now 0 1 3
Error: popping token 'b' (<field1> printer)
Stack now 0 1
-Error: popping token 'a' (%symbol-default/<field2>/e printer)
+Error: popping token 'a' (<*>/<field2>/e printer)
Stack now 0
Cleanup: discarding lookahead token $end ()
Stack now 0
@@ -826,12 +842,19 @@ AT_CLEANUP


## ------------------------------------------------------------- ##
-## Default %printer and %destructor for user-defined end token. ##
+## Default %printer and %destructor for user-defined end token. ##
## ------------------------------------------------------------- ##

AT_SETUP([Default %printer and %destructor for user-defined end token])

-AT_DATA_GRAMMAR([[input.y]],
+# _AT_CHECK_DEFAULT_PRINTER_AND_DESTRUCTOR_FOR_END_TOKEN(TYPED)
+# -----------------------------------------------------------------------------
+m4_define([_AT_CHECK_DEFAULT_PRINTER_AND_DESTRUCTOR_FOR_END_TOKEN],
+[m4_if($1, 0,
+ [m4_pushdef([kind], [!]) m4_pushdef([not_kind], [*])],
+ [m4_pushdef([kind], [*]) m4_pushdef([not_kind], [!])])
+
+AT_DATA_GRAMMAR([[input]]$1[[.y]],
[[%error-verbose
%debug
%locations
@@ -848,13 +871,26 @@ AT_DATA_GRAMMAR([[input.y]],
# define USE(SYM)
%}

+%destructor {
+ fprintf (yyoutput, "<]]not_kind[[> destructor should not be called.\n");
+} <]]not_kind[[>
+
%token END 0
%printer {
- fprintf (yyoutput, "Default printer for '%c' @ %d", $$, @$.first_column);
-} %symbol-default
+ fprintf (yyoutput, "<]]kind[[> for '%c' @ %d", $$, @$.first_column);
+} <]]kind[[>
%destructor {
- fprintf (stdout, "Default destructor for '%c' @ %d.\n", $$, @$.first_column);
-} %symbol-default
+ fprintf (stdout, "<]]kind[[> for '%c' @ %d.\n", $$, @$.first_column);
+} <]]kind[[>
+
+%printer {
+ fprintf (yyoutput, "<]]not_kind[[> printer should not be called.\n");
+} <]]not_kind[[>
+
+]]m4_if($1, 0, [[[
+]]],
+[[[%union { char tag; }
+%type <tag> start END]]])[[

%%

@@ -868,7 +904,7 @@ yylex (void)
static int called;
if (called++)
abort ();
- yylval = 'E';
+ yylval]]m4_if($1, 0,, [[[.tag]]])[[ = 'E';
yylloc.first_line = yylloc.last_line = 1;
yylloc.first_column = yylloc.last_column = 1;
return 0;
@@ -888,26 +924,33 @@ main (void)
}
]])

-AT_CHECK([bison -o input.c input.y])
-AT_COMPILE([input])
-AT_PARSER_CHECK([./input], 0,
-[[Default destructor for 'E' @ 1.
-Default destructor for 'S' @ 1.
+AT_CHECK([bison -o input$1.c input$1.y])
+AT_COMPILE([input$1])
+AT_PARSER_CHECK([./input$1], 0,
+[[<]]kind[[> for 'E' @ 1.
+<]]kind[[> for 'S' @ 1.
]],
[[Starting parse
Entering state 0
-Reducing stack by rule 1 (line 35):
--> $$ = nterm start (1.1-1.1: Default printer for 'S' @ 1)
+Reducing stack by rule 1 (line 46):
+-> $$ = nterm start (1.1-1.1: <]]kind[[> for 'S' @ 1)
Stack now 0
Entering state 1
Reading a token: Now at end of input.
-Shifting token END (1.1-1.1: Default printer for 'E' @ 1)
+Shifting token END (1.1-1.1: <]]kind[[> for 'E' @ 1)
Entering state 2
Stack now 0 1 2
-Cleanup: popping token END (1.1-1.1: Default printer for 'E' @ 1)
-Cleanup: popping nterm start (1.1-1.1: Default printer for 'S' @ 1)
+Cleanup: popping token END (1.1-1.1: <]]kind[[> for 'E' @ 1)
+Cleanup: popping nterm start (1.1-1.1: <]]kind[[> for 'S' @ 1)
]])

+m4_popdef([kind])
+m4_popdef([not_kind])
+])
+
+_AT_CHECK_DEFAULT_PRINTER_AND_DESTRUCTOR_FOR_END_TOKEN(0)
+_AT_CHECK_DEFAULT_PRINTER_AND_DESTRUCTOR_FOR_END_TOKEN(1)
+
AT_CLEANUP


@@ -940,10 +983,10 @@ AT_DATA_GRAMMAR([[input.y]],

%printer {
fprintf (yyoutput, "'%c'", $$);
-} %symbol-default
+} <!> <*>
%destructor {
fprintf (stderr, "DESTROY '%c'\n", $$);
-} %symbol-default
+} <!> <*>

%%

@@ -1055,11 +1098,11 @@ AT_DATA_GRAMMAR([[input.y]],
%printer {
char chr = $$;
fprintf (yyoutput, "'%c'", chr);
-} %symbol-default
+} <!> <*>
%destructor {
char chr = $$;
fprintf (stderr, "DESTROY '%c'\n", chr);
-} %symbol-default
+} <!> <*>

%union { char chr; }
%type <chr> start
@@ -1119,8 +1162,10 @@ AT_DATA_GRAMMAR([[input.y]],
# define YY_LOCATION_PRINT(File, Loc)
%}

-%printer { fprintf (yyoutput, "%d", @$); } %symbol-default
-%destructor { fprintf (stderr, "DESTROY %d\n", @$); } %symbol-default
+%printer { fprintf (yyoutput, "%d", @$); } <!>
+%destructor { fprintf (stderr, "DESTROY %d\n", @$); } <!>
+%printer { fprintf (yyoutput, "<*> printer should not be called"); } <*>
+%destructor { fprintf (yyoutput, "<*> destructor should not be called"); } <*>

%%

@@ -1159,27 +1204,27 @@ main (void)
]])

AT_CHECK([bison -o input.c input.y], 0,,
-[[input.y:31.3-23: warning: unset value: $$
-input.y:28.3-33.37: warning: unused value: $3
+[[input.y:33.3-23: warning: unset value: $$
+input.y:30.3-35.37: warning: unused value: $3
]])

AT_COMPILE([input])
AT_PARSER_CHECK([./input], 1,,
[[Starting parse
Entering state 0
-Reducing stack by rule 1 (line 28):
+Reducing stack by rule 1 (line 30):
-> $$ = nterm $@1 (: )
Stack now 0
Entering state 2
-Reducing stack by rule 2 (line 29):
+Reducing stack by rule 2 (line 31):
-> $$ = nterm @2 (: 2)
Stack now 0 2
Entering state 4
-Reducing stack by rule 3 (line 30):
+Reducing stack by rule 3 (line 32):
-> $$ = nterm @3 (: 3)
Stack now 0 2 4
Entering state 5
-Reducing stack by rule 4 (line 31):
+Reducing stack by rule 4 (line 33):
-> $$ = nterm @4 (: 4)
Stack now 0 2 4 5
Entering state 6
Index: tests/input.at
===================================================================
RCS file: /sources/bison/bison/tests/input.at,v
retrieving revision 1.59
diff -p -u -r1.59 input.at
--- tests/input.at 21 Oct 2006 02:31:50 -0000 1.59
+++ tests/input.at 21 Oct 2006 09:58:27 -0000
@@ -178,33 +178,54 @@ AT_CLEANUP
AT_SETUP([Default %printer and %destructor redeclared])

AT_DATA([[input.y]],
-[[%destructor { destroy ($$); } %symbol-default %symbol-default
-%printer { destroy ($$); } %symbol-default %symbol-default
+[[%destructor { destroy ($$); } <*> <*>
+%printer { destroy ($$); } <*> <*>

-%destructor { destroy ($$); } %symbol-default
-%printer { destroy ($$); } %symbol-default
+%destructor { destroy ($$); } <*>
+%printer { destroy ($$); } <*>
+
+%destructor { destroy ($$); } <!> <!>
+%printer { destroy ($$); } <!> <!>
+
+%destructor { destroy ($$); } <!>
+%printer { destroy ($$); } <!>

%%

start: ;

-%destructor { destroy ($$); } %symbol-default;
-%printer { destroy ($$); } %symbol-default;
+%destructor { destroy ($$); } <*>;
+%printer { destroy ($$); } <*>;
+
+%destructor { destroy ($$); } <!>;
+%printer { destroy ($$); } <!>;
]])

AT_CHECK([bison input.y], [1], [],
-[[input.y:1.13-29: redeclaration for default %destructor
+[[input.y:1.13-29: redeclaration for default tagged %destructor
input.y:1.13-29: previous declaration
-input.y:2.10-26: redeclaration for default %printer
+input.y:2.10-26: redeclaration for default tagged %printer
input.y:2.10-26: previous declaration
-input.y:4.13-29: redeclaration for default %destructor
+input.y:4.13-29: redeclaration for default tagged %destructor
input.y:1.13-29: previous declaration
-input.y:5.10-26: redeclaration for default %printer
+input.y:5.10-26: redeclaration for default tagged %printer
input.y:2.10-26: previous declaration
-input.y:11.13-29: redeclaration for default %destructor
+input.y:7.13-29: redeclaration for default tagless %destructor
+input.y:7.13-29: previous declaration
+input.y:8.10-26: redeclaration for default tagless %printer
+input.y:8.10-26: previous declaration
+input.y:10.13-29: redeclaration for default tagless %destructor
+input.y:7.13-29: previous declaration
+input.y:11.10-26: redeclaration for default tagless %printer
+input.y:8.10-26: previous declaration
+input.y:17.13-29: redeclaration for default tagged %destructor
input.y:4.13-29: previous declaration
-input.y:12.10-26: redeclaration for default %printer
+input.y:18.10-26: redeclaration for default tagged %printer
input.y:5.10-26: previous declaration
+input.y:20.13-29: redeclaration for default tagless %destructor
+input.y:10.13-29: previous declaration
+input.y:21.10-26: redeclaration for default tagless %printer
+input.y:11.10-26: previous declaration
]])

AT_CLEANUP
@@ -260,18 +281,36 @@ AT_CLEANUP
AT_SETUP([Unused values with default %destructor])

AT_DATA([[input.y]],
-[[%destructor { destroy ($$); } %symbol-default
+[[%destructor { destroy ($$); } <!>
+%type <tag> tagged

%%

-start: end end { $1; } ;
-end: { } ;
+start: end end tagged tagged { $<tag>1; $3; } ;
+end: { } ;
+tagged: { } ;
+]])
+
+AT_CHECK([bison input.y], [0], [],
+[[input.y:6.8-45: warning: unset value: $$
+input.y:6.8-45: warning: unused value: $2
+input.y:7.6-8: warning: unset value: $$
+]])
+
+AT_DATA([[input.y]],
+[[%destructor { destroy ($$); } <*>
+%type <tag> tagged
+
+%%
+
+start: end end tagged tagged { $<tag>1; $3; } ;
+end: { } ;
+tagged: { } ;
]])

AT_CHECK([bison input.y], [0], [],
-[[input.y:5.8-22: warning: unset value: $$
-input.y:5.8-22: warning: unused value: $2
-input.y:6.6-8: warning: unset value: $$
+[[input.y:6.8-45: warning: unused value: $4
+input.y:8.9-11: warning: unset value: $$
]])

AT_CLEANUP
Paolo Bonzini
2006-10-24 01:41:47 UTC
Joel E. Denny wrote:
> On Thu, 14 Sep 2006, Joel E. Denny wrote:
>
>> On Thu, 14 Sep 2006, Akim Demaille wrote:
>>
>>> Or better yet (?), no %symbol-default, but:
>>>
>>> %printer { cerr << @$ << ": " << $$; } <*>
>> So, this means all symbols with types, right? I was actually thinking of
>> this syntax before %symbol-default, but then I rejected it because I
>> couldn't figure out what to do about type-less symbols....
>>
>>> %printer { cerr << @$; } <->
>> I like this. Small difference though: what about <!>? In my mind, "!" =
>> "not", and it looks slightly odd, which is what we mean to imply, I think.

What about <UNTAGGED>? As in

| TYPE {
if (!strcmp ($1, "UNTAGGED"))
$$ = symbol_list_default_tagless_new (@1);
else
$$ = symbol_list_type_new ($1, @1);
}

Thoughts? * is universally known as "everything", ! looks a bit weird
to me...

Paolo
Joel E. Denny
2006-10-24 01:44:41 UTC
On Tue, 24 Oct 2006, Paolo Bonzini wrote:

> What about <UNTAGGED>? As in

An existing grammar might have a union field named UNTAGGED.
Paolo Bonzini
2006-10-24 08:54:27 UTC
Joel E. Denny wrote:
> On Tue, 24 Oct 2006, Paolo Bonzini wrote:
>
>> What about <UNTAGGED>? As in
>
> An existing grammar might have a union field named UNTAGGED.

Can't deny that, but it seems unlikely... <!> is really too
hieroglyphic for me. If anything, I would prefer <> or a trailing
%untagged...

Paolo
Joel E. Denny
2006-10-24 22:09:26 UTC
On Tue, 24 Oct 2006, Paolo Bonzini wrote:

> Joel E. Denny wrote:
> > On Tue, 24 Oct 2006, Paolo Bonzini wrote:
> >
> > > What about <UNTAGGED>? As in
> >
> > An existing grammar might have a union field named UNTAGGED.
>
> Can't deny that, but it seems unlikely...

Maybe, but with all the Yacc and Bison grammars out there, I wouldn't bet
on it.

> <!> is really too hieroglyphic for
> me.

I realize <!> looks odd when considered in isolation, but I'm trying to be
consistent with a couple of other proposals....

First, named semantic values:

exp(sum): exp(term1) '+' exp(term2) {
$sum = $term1 + $term2;
}
;

grammar(): defs() rules() epilogue(!) {
$grammar = new_grammar ($defs, $rules);
}
;

Here, () = unspecified value name = default name = the symbol name. That
seems logical to me. (!) = no value is used at all. The ! conveys a
sense of caution, which I think is appropriate given that its purpose
would be to disable any Bison warning about unused $3.

Second:

%destructor(!) { printf ("A SYM was discarded.\n"); } SYM

Again, (!) = no value and conveys a sense of caution. Again, it would
disable Bison's warning about unused $$.

I don't want to create another notation for the same concept. That is,
while (!) would indicate no value, <!> would indicate no type tag.
Avoiding Bison's static type system for some symbols often still warrants
some caution, so that ! makes sense to me.

All of this really comes together nicely when imagining a grammar in which
you want to use <!> but you still have some other symbols with type tags.
You would likely need to write:

%destructor(!) { printf ("A symbol was discarded.\n"); } <!>

Here, symbols with no type tags have no values but still have a
%destructor.

Of course, if no symbols in your grammar have type tags, or if you plan to
use $<tag>$ extensively for untagged symbols, it might be reasonable to
have <!> without (!) in a %destructor.

> If anything, I would prefer <>

In my opinion, <> = unspecified tag = default tag, and I'm not sure what
that is. This interpretation is consistent with the () proposal.

> or a trailing %untagged...

%untagged seems reasonable in isolation. However, I don't like the lack
of symmetry with respect to <*> and (!).

I think any alternate proposal in this area should maintain the
consistency of (), (value-name), <tag>, (!), <!>, and <*>. (I don't know
what <> or (*) would be, but maybe one day.)

What do you think?
Paolo Bonzini
2006-10-25 01:36:39 UTC
[Joel, I added some clarifications at the bottom]
> grammar(): defs() rules() epilogue(!) {
> $grammar = new_grammar ($defs, $rules);
> }
> ;
Here you won't break grammar source compatibility by omitting the ()
altogether.
> %destructor(!) { printf ("A symbol was discarded.\n"); } <!>
>
> Here, symbols with no type tags have no values but still have a
> %destructor.
>
> Of course, if no symbols in your grammar have type tags, or if you
> plan to use $<tag>$ extensively for untagged symbols, it might be
> reasonable to have <!> without (!) in a %destructor.
>
I still don't see much similarity with (!) and Then, why not having

%destructor BLOCK

implement a <*> destruction, and something like

%destructor(!) BLOCK
%destructor BLOCK %pragma(unused-value)

implement a <!> destruction? Going for the latter, of course, would
imply the possibility to do

%destructor BLOCK %pragma(unused-value) <foo>

even if foo is not untagged.

For now, this would mean having only the semantics of <*> available.
But besides debugging code, why would <!> functionality be useful?

Paolo
Joel E. Denny
2006-10-25 01:52:24 UTC
On Wed, 25 Oct 2006, Paolo Bonzini wrote:

> > grammar(): defs() rules() epilogue(!) {
> > $grammar = new_grammar ($defs, $rules);
> > }
> > ;
> Here you won't break grammar source compatibility by omitting the ()
> altogether.

If Bison were to complain about value name conflicts (and I think it
should), you would have trouble with:

exp: exp '+' exp

> > %destructor(!) { printf ("A symbol was discarded.\n"); } <!>
> >
> > Here, symbols with no type tags have no values but still have a %destructor.
> >
> > Of course, if no symbols in your grammar have type tags, or if you plan to
> > use $<tag>$ extensively for untagged symbols, it might be reasonable to have
> > <!> without (!) in a %destructor.
> >
> I still don't see much similarity with (!)

<!> = no tag. (!) = no value. Both cases warrant caution. Those are the
similarities. Maybe I don't understand what you're saying.

> and Then, why not having
>
> %destructor BLOCK
>
> implement a <*> destruction, and something like

At one time I implemented the empty symbol list notation to indicate a
default destructor. Akim and I agreed that was ugly. To me, it reads
like a syntactic mistake. Moreover, one might want the <*> and <!>
destructor to be the same, and it's nice not to have to repeat code:

%destructor(!) { printf ("A symbol was discarded.\n"); } <!> <*>

> %destructor(!) BLOCK
> %destructor BLOCK %pragma(unused-value)
>
> implement a <!> destruction? Going for the latter, of course, would imply the
> possibility to do
>
> %destructor BLOCK %pragma(unused-value) <foo>
>
> even if foo is not untagged.

It doesn't make sense to me that (!) should mean "no value" in rules but
"no type tag" in %destructor. Maybe I don't understand what you're
saying.

> For now, this would mean having only the semantics of <*> available.

I'm a bit lost. I thought you just proposed an empty symbol list with (!)
as a replacement for <!>. Did I misunderstand?

> But besides debugging code, why would <!> functionality be useful?

You might use @$ in the %destructor to report the locations of discarded
symbols to the user.

More importantly, if you don't use a union, you might do something like
this:

#define YYSTYPE struct node*
%destructor { node_free ($$); } <!>
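
For illustration only, here is a minimal sketch of the C side such a grammar might pair with that tagless %destructor. The thread names only `struct node` and `node_free`; the node layout, the `node_new` helper, and the `nodes_freed` counter are assumptions made up for this example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical tree node; only the names `struct node' and `node_free'
   come from the message above.  */
struct node
{
  struct node *left;
  struct node *right;
  int value;
};

/* Illustration only: counts freed nodes so the effect is observable.  */
int nodes_freed = 0;

/* Hypothetical constructor, as a semantic action might call it.  */
struct node *
node_new (int value, struct node *left, struct node *right)
{
  struct node *n = malloc (sizeof *n);
  assert (n != NULL);
  n->value = value;
  n->left = left;
  n->right = right;
  return n;
}

/* What the %destructor body would invoke with $$: release the whole
   subtree owned by a discarded symbol.  NULL is accepted so that
   symbols whose value was never set do no harm.  */
void
node_free (struct node *n)
{
  if (n == NULL)
    return;
  node_free (n->left);
  node_free (n->right);
  free (n);
  ++nodes_freed;
}
```

With YYSTYPE defined as a pointer like this, no symbol carries a type tag, so a tagless default is the only place a single cleanup routine could be attached for every symbol.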
Joel E. Denny
2006-11-17 22:07:46 UTC
On Tue, 24 Oct 2006, Joel E. Denny wrote:

> On Wed, 25 Oct 2006, Paolo Bonzini wrote:
>
> > > grammar(): defs() rules() epilogue(!) {
> > > $grammar = new_grammar ($defs, $rules);
> > > }
> > > ;
> > Here you won't break grammar source compatibility by omitting the ()
> > altogether.
>
> If Bison were to complain about value name conflicts (and I think it
> should), you would have trouble with:
>
> exp: exp '+' exp

It occurs to me now that Bison wouldn't necessarily need to complain about
conflicting value name declarations. Bison could just complain about
ambiguous value name uses. That is, let's assume your proposal: omitting
the `(name)' from `exp(name)' would request the default value name $exp.
The following would be fine because it doesn't use $exp:

exp : exp '+' exp { $$ = $1 + $3; }

Every $exp in the following would be ambiguous, and Bison would complain
about each:

exp: exp '+' exp { $exp = $exp + $exp; }

To use $exp at all, you'd need to override the default value name for at
least two exp's:

exp: exp(term1) '+' exp(term2) { $exp = $term1 + $term2; }
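
The "complain only about ambiguous uses" rule above can be sketched in C roughly as follows. This is not Bison code: the array-of-names representation and both function names are assumptions chosen for the example. The idea is simply that a use of $name is ambiguous when more than one symbol in the rule carries that value name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count how many symbols of one rule carry the value name NAME.
   names[0] stands for the LHS, the rest for the RHS symbols.  A NULL
   entry marks a symbol with no value name (e.g. the literal '+').  */
size_t
value_name_count (const char *const names[], size_t n, const char *name)
{
  size_t count = 0;
  size_t i;
  assert (name != NULL);
  for (i = 0; i < n; ++i)
    if (names[i] != NULL && strcmp (names[i], name) == 0)
      ++count;
  return count;
}

/* A use of $NAME is ambiguous when several symbols share NAME.
   Declaring the same name twice is not an error by itself; only a
   use of that name would be.  */
int
value_name_use_is_ambiguous (const char *const names[], size_t n,
                             const char *name)
{
  return value_name_count (names, n, name) > 1;
}
```

Under this check, `exp: exp '+' exp` alone raises nothing, every `$exp` in it would be flagged, and renaming the two RHS symbols to term1 and term2 leaves `$exp` referring unambiguously to the LHS.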

On Tue, 24 Oct 2006, Paolo Bonzini wrote:

> <!> is really too hieroglyphic for
> me. If anything, I would prefer <> or a trailing %untagged...

As a reminder, I've been proposing:

1. (name) = a value name, and <tag> = a type tag.

2. () = default value name. Maybe <> = default type tag, but I know of no
use for that right now.

3. (!) = no value name, <!> = no type tag, and both suppress some
warnings. In other words, `!' = void.

As much as I like the cautionary look of (!) and <!>, I now see a problem
with the above scheme. Bison may one day grow another kind of construct
that would also need some sort of expression, default, and void forms.
Let's say these are [expression], [], and [!], respectively. While (name)
and <tag> cannot contain a !, ! might prove to be a reasonable character
in [expression] with some other expected meaning. Since [!] would already
have a meaning, void might have to be something like [-] instead.
However, that would be inconsistent with (!) and <!>. In the worst case,
there would be no legible character to put in [expression] to mean void
because all candidates have some other expected meaning.

Following your proposal instead, if () and <> would mean void, then it
would be the empty string we'd have to worry about in []. My guess is
that the empty string is less likely to be a problem than ! is. (This is
not pure conjecture. Although I'm not planning to use `[' and `]', I have
something in mind, the ! bothers it, and the empty string does not.)

In summary, I'm now thinking the notation for value names and type tags
would go like this:

%destructor() {
/* $$ is unused. */
printf ("A tagless symbol was discarded.\n");
} <>

%%

grammar: defs rules epilogue() {
/* epilogue somehow handled elsewhere. */
$grammar = new_grammar ($defs, $rules);
}
;

exp(sum): exp(term1) '+' exp(term2) {
$sum = $term1 + $term2;
}
;

Those ()'s look awfully innocent to be removing warnings, but I guess
they're ok.

<*> would still mean any tag.
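
To make the intended lookup order concrete, here is a rough C sketch of how a parser generator might pick the %destructor for a discarded symbol once <*> and <> both exist. The struct layouts and the `destructor_for` name are assumptions for this example and do not mirror Bison's actual symtab code. The order follows the documentation: per-symbol first, then per-type, then <*> for symbols with a declared tag, or <> for symbols without one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of a user-defined grammar symbol.  */
struct sym
{
  const char *type_tag;        /* declared tag, or NULL if tagless */
  const char *own_destructor;  /* per-symbol %destructor code, or NULL */
};

/* Hypothetical holder for the two default %destructor's.  */
struct defaults
{
  const char *tagged;   /* code declared for <*>, or NULL */
  const char *tagless;  /* code declared for <>, or NULL */
};

/* PER_TYPE is the %destructor declared for the symbol's tag, or NULL
   if there is none.  Returns the code to run, or NULL for no code.  */
const char *
destructor_for (const struct sym *s, const char *per_type,
                const struct defaults *d)
{
  assert (s != NULL && d != NULL);
  if (s->own_destructor != NULL)
    return s->own_destructor;
  if (s->type_tag != NULL)
    return per_type != NULL ? per_type : d->tagged;
  return d->tagless;
}
```

Listing both `<>` and `<*>` after one code block would correspond to both fields of `struct defaults` holding the same code.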

Opinions?
Paolo Bonzini
2006-11-18 16:50:49 UTC
> [snip long description and rationale]

I'm pretty sure I won't have to teach most of these features, but I like
the new proposal much more. :-)

Paolo
Joel E. Denny
2006-11-18 18:34:52 UTC
On Sat, 18 Nov 2006, Paolo Bonzini wrote:

> > [snip long description and rationale]
>
> I'm pretty sure I won't have to teach most of these features, but I like the
> new proposal much more. :-)

Thanks for reviewing it. I'm hoping we can make these features consistent
and intuitive enough that the learning curve will be minimal.

Does anyone else have comments? If not, I think I'll soon rename <!> to
<>.
Joel E. Denny
2006-11-21 00:43:58 UTC
On Sat, 18 Nov 2006, Joel E. Denny wrote:

> Does anyone else have comments? If not, I think I'll soon rename <!> to
> <>.

I committed this.

Index: ChangeLog
===================================================================
RCS file: /sources/bison/bison/ChangeLog,v
retrieving revision 1.1608
diff -p -u -r1.1608 ChangeLog
--- ChangeLog 17 Nov 2006 20:07:07 -0000 1.1608
+++ ChangeLog 21 Nov 2006 00:34:01 -0000
@@ -1,3 +1,23 @@
+2006-11-20 Joel E. Denny <***@ces.clemson.edu>
+
+ Rename <!> to <>. Discussed starting at
+ <http://lists.gnu.org/archive/html/bison-patches/2006-11/msg00039.html>.
+ * NEWS (2.3a+): Update.
+ * doc/bison.texinfo (Freeing Discarded Symbols, Bison Symbols):
+ Update.
+ * src/parse-gram.y (TYPE_TAG_NONE, generic_symlist_item): Implement.
+ * src/scan-gram.l (INITIAL): Implement.
+ * src/symlist.c (symbol_list_default_tagless_new): Update comment.
+ * src/symlist.h (symbol_list, symbol_list_default_tagless_new): Update
+ comment.
+ * tests/actions.at (Default tagless %printer and %destructor,
+ Default tagged and per-type %printer and %destructor,
+ Default %printer and %destructor are not for error or $undefined,
+ Default %printer and %destructor are not for $accept,
+ Default %printer and %destructor for mid-rule values): Update.
+ * tests/input.at (Default %printer and %destructor redeclared,
+ Unused values with default %destructor): Update.
+
2006-11-17 Joel E. Denny <***@ces.clemson.edu>

Don't let %prec take a nonterminal.
@@ -160,7 +180,7 @@
* doc/bison.texinfo (Freeing Discarded Symbols): Document this and the
previous change today related to mid-rules.
(Bison Symbols): Remove %symbol-default and add <*> and <!>.
- * src/parser-gram.y (PERCENT_SYMBOL_DEFAULT): Remove.
+ * src/parse-gram.y (PERCENT_SYMBOL_DEFAULT): Remove.
(TYPE_TAG_ANY): Add as <*>.
(TYPE_TAG_NONE): Add as <!>.
(generic_symlist_item): Remove RHS for %symbol-default and add RHS's
Index: NEWS
===================================================================
RCS file: /sources/bison/bison/NEWS,v
retrieving revision 1.165
diff -p -u -r1.165 NEWS
--- NEWS 1 Nov 2006 06:09:40 -0000 1.165
+++ NEWS 21 Nov 2006 00:34:02 -0000
@@ -33,12 +33,12 @@ Changes in version 2.3a+ (????-??-??):
%destructor/%printer for all grammar symbols for which you have formally
declared semantic type tags.

- 2. Place `<!>' in a %destructor/%printer symbol list to define a default
+ 2. Place `<>' in a %destructor/%printer symbol list to define a default
%destructor/%printer for all grammar symbols without declared semantic
type tags.

Bison no longer supports the `%symbol-default' notation from Bison 2.3a.
- `<*>' and `<!>' combined achieve the same effect with one exception: Bison no
+ `<*>' and `<>' combined achieve the same effect with one exception: Bison no
longer applies any %destructor to a mid-rule value if that mid-rule value is
not actually ever referenced using either $$ or $n in a semantic action.

Index: doc/bison.texinfo
===================================================================
RCS file: /sources/bison/bison/doc/bison.texinfo,v
retrieving revision 1.211
diff -p -u -r1.211 bison.texinfo
--- doc/bison.texinfo 21 Oct 2006 10:03:35 -0000 1.211
+++ doc/bison.texinfo 21 Nov 2006 00:34:05 -0000
@@ -4237,7 +4237,7 @@ For instance, if your locations use a fi
@cindex freeing discarded symbols
@findex %destructor
@findex <*>
-@findex <!>
+@findex <>
During error recovery (@pxref{Error Recovery}), symbols already pushed
on the stack and tokens coming from the rest of the file are discarded
until the parser falls on its feet. If the parser runs out of memory,
@@ -4271,7 +4271,7 @@ grammar symbol that has that semantic ty
per-symbol @code{%destructor}.

Finally, you can define two different kinds of default @code{%destructor}s.
-You can place each of @code{<*>} and @code{<!>} in the @var{symbols} list of
+You can place each of @code{<*>} and @code{<>} in the @var{symbols} list of
exactly one @code{%destructor} declaration in your grammar file.
The parser will invoke the @var{code} associated with one of these whenever it
discards any user-defined grammar symbol that has no per-symbol and no per-type
@@ -4279,7 +4279,7 @@ discards any user-defined grammar symbol
The parser uses the @var{code} for @code{<*>} in the case of such a grammar
symbol for which you have formally declared a semantic type tag (@code{%type}
counts as such a declaration, but @code{$<tag>$} does not).
-The parser uses the @var{code} for @code{<!>} in the case of such a grammar
+The parser uses the @var{code} for @code{<>} in the case of such a grammar
symbol that has no declared semantic type tag.
@end deffn

@@ -4300,7 +4300,7 @@ For example:
%destructor @{ @} <character>
%destructor @{ free ($$); @} <*>
%destructor @{ free ($$); printf ("%d", @@$.first_line); @} STRING1 string1
-%destructor @{ printf ("Discarding tagless symbol.\n"); @} <!>
+%destructor @{ printf ("Discarding tagless symbol.\n"); @} <>
@end smallexample

@noindent
@@ -4339,7 +4339,7 @@ not reference @code{$$} in the mid-rule'
@var{n} is the RHS symbol position of the mid-rule) in any later action in that
rule.
However, if you do reference either, the Bison-generated parser will invoke the
-@code{<!>} @code{%destructor} whenever it discards the mid-rule symbol.
+@code{<>} @code{%destructor} whenever it discards the mid-rule symbol.

@ignore
@noindent
@@ -8572,7 +8572,7 @@ Used to define a default tagged @code{%d
@xref{Destructor Decl, , Freeing Discarded Symbols}.
@end deffn

-@deffn {Directive} <!>
+@deffn {Directive} <>
Used to define a default tagless @code{%destructor} or default tagless
@code{%printer}.
@xref{Destructor Decl, , Freeing Discarded Symbols}.
Index: src/parse-gram.y
===================================================================
RCS file: /sources/bison/bison/src/parse-gram.y,v
retrieving revision 1.98
diff -p -u -r1.98 parse-gram.y
--- src/parse-gram.y 12 Nov 2006 07:39:37 -0000 1.98
+++ src/parse-gram.y 21 Nov 2006 00:34:06 -0000
@@ -176,7 +176,7 @@ static int current_prec = 0;
%token SEMICOLON ";"
%token TYPE "type"
%token TYPE_TAG_ANY "<*>"
-%token TYPE_TAG_NONE "<!>"
+%token TYPE_TAG_NONE "<>"

%type <character> CHAR
%printer { fputs (char_name ($$), stderr); } CHAR
@@ -397,7 +397,7 @@ generic_symlist_item:
symbol { $$ = symbol_list_sym_new ($1, @1); }
| TYPE { $$ = symbol_list_type_new ($1, @1); }
| "<*>" { $$ = symbol_list_default_tagged_new (@1); }
-| "<!>" { $$ = symbol_list_default_tagless_new (@1); }
+| "<>" { $$ = symbol_list_default_tagless_new (@1); }
;

/* One token definition. */
Index: src/scan-gram.l
===================================================================
RCS file: /sources/bison/bison/src/scan-gram.l,v
retrieving revision 1.110
diff -p -u -r1.110 scan-gram.l
--- src/scan-gram.l 12 Nov 2006 07:39:37 -0000 1.110
+++ src/scan-gram.l 21 Nov 2006 00:34:06 -0000
@@ -210,7 +210,7 @@ splice (\\[ \f\t\v]*\n)*
"|" return PIPE;
";" return SEMICOLON;
"<*>" return TYPE_TAG_ANY;
- "<!>" return TYPE_TAG_NONE;
+ "<>" return TYPE_TAG_NONE;

{id} {
val->uniqstr = uniqstr_new (yytext);
Index: src/symlist.c
===================================================================
RCS file: /sources/bison/bison/src/symlist.c,v
retrieving revision 1.25
diff -p -u -r1.25 symlist.c
--- src/symlist.c 12 Nov 2006 07:39:37 -0000 1.25
+++ src/symlist.c 21 Nov 2006 00:34:06 -0000
@@ -91,9 +91,9 @@ symbol_list_default_tagged_new (location
}


-/*----------------------------------------.
-| Create a list containing a <!> at LOC. |
-`----------------------------------------*/
+/*---------------------------------------.
+| Create a list containing a <> at LOC. |
+`---------------------------------------*/

symbol_list *
symbol_list_default_tagless_new (location loc)
Index: src/symlist.h
===================================================================
RCS file: /sources/bison/bison/src/symlist.h,v
retrieving revision 1.22
diff -p -u -r1.22 symlist.h
--- src/symlist.h 12 Nov 2006 07:39:37 -0000 1.22
+++ src/symlist.h 21 Nov 2006 00:34:06 -0000
@@ -30,7 +30,7 @@ typedef struct symbol_list
{
/**
* Whether this node contains a symbol, a semantic type, a \c <*>, or a
- * \c <!>.
+ * \c <>.
*/
enum {
SYMLIST_SYMBOL, SYMLIST_TYPE,
@@ -81,7 +81,7 @@ symbol_list *symbol_list_type_new (uniqs

/** Create a list containing a \c <*> at \c loc. */
symbol_list *symbol_list_default_tagged_new (location loc);
-/** Create a list containing a \c <!> at \c loc. */
+/** Create a list containing a \c <> at \c loc. */
symbol_list *symbol_list_default_tagless_new (location loc);

/** Print this list.
Index: tests/actions.at
===================================================================
RCS file: /sources/bison/bison/tests/actions.at,v
retrieving revision 1.74
diff -p -u -r1.74 actions.at
--- tests/actions.at 21 Oct 2006 10:03:35 -0000 1.74
+++ tests/actions.at 21 Nov 2006 00:34:06 -0000
@@ -614,11 +614,11 @@ AT_DATA_GRAMMAR([[input.y]],
} <*>

%printer {
- fprintf (yyoutput, "<!> printer for '%c' @ %d", $$, @$.first_column);
-} <!>
+ fprintf (yyoutput, "<> printer for '%c' @ %d", $$, @$.first_column);
+} <>
%destructor {
- fprintf (stdout, "<!> destructor for '%c' @ %d.\n", $$, @$.first_column);
-} <!>
+ fprintf (stdout, "<> destructor for '%c' @ %d.\n", $$, @$.first_column);
+} <>

%printer {
fprintf (yyoutput, "'b'/'c' printer for '%c' @ %d", $$, @$.first_column);
@@ -667,15 +667,15 @@ main (void)
AT_CHECK([bison -o input.c input.y])
AT_COMPILE([input])
AT_PARSER_CHECK([./input], 1,
-[[<!> destructor for 'd' @ 4.
+[[<> destructor for 'd' @ 4.
'b'/'c' destructor for 'c' @ 3.
'b'/'c' destructor for 'b' @ 2.
-<!> destructor for 'a' @ 1.
+<> destructor for 'a' @ 1.
]],
[[Starting parse
Entering state 0
-Reading a token: Next token is token 'a' (1.1-1.1: <!> printer for 'a' @ 1)
-Shifting token 'a' (1.1-1.1: <!> printer for 'a' @ 1)
+Reading a token: Next token is token 'a' (1.1-1.1: <> printer for 'a' @ 1)
+Shifting token 'a' (1.1-1.1: <> printer for 'a' @ 1)
Entering state 1
Reading a token: Next token is token 'b' (1.2-1.2: 'b'/'c' printer for 'b' @ 2)
Shifting token 'b' (1.2-1.2: 'b'/'c' printer for 'b' @ 2)
@@ -683,18 +683,18 @@ Entering state 3
Reading a token: Next token is token 'c' (1.3-1.3: 'b'/'c' printer for 'c' @ 3)
Shifting token 'c' (1.3-1.3: 'b'/'c' printer for 'c' @ 3)
Entering state 5
-Reading a token: Next token is token 'd' (1.4-1.4: <!> printer for 'd' @ 4)
-Shifting token 'd' (1.4-1.4: <!> printer for 'd' @ 4)
+Reading a token: Next token is token 'd' (1.4-1.4: <> printer for 'd' @ 4)
+Shifting token 'd' (1.4-1.4: <> printer for 'd' @ 4)
Entering state 6
Reading a token: Now at end of input.
syntax error, unexpected $end, expecting 'e'
-Error: popping token 'd' (1.4-1.4: <!> printer for 'd' @ 4)
+Error: popping token 'd' (1.4-1.4: <> printer for 'd' @ 4)
Stack now 0 1 3 5
Error: popping token 'c' (1.3-1.3: 'b'/'c' printer for 'c' @ 3)
Stack now 0 1 3
Error: popping token 'b' (1.2-1.2: 'b'/'c' printer for 'b' @ 2)
Stack now 0 1
-Error: popping token 'a' (1.1-1.1: <!> printer for 'a' @ 1)
+Error: popping token 'a' (1.1-1.1: <> printer for 'a' @ 1)
Stack now 0
Cleanup: discarding lookahead token $end (1.5-1.5: )
Stack now 0
@@ -723,8 +723,8 @@ AT_DATA_GRAMMAR([[input.y]],
%}

%printer {
- fprintf (yyoutput, "<!> printer should not be called.\n");
-} <!>
+ fprintf (yyoutput, "<> printer should not be called.\n");
+} <>

%union { int field0; int field1; int field2; }
%type <field0> start 'a' 'g'
@@ -750,8 +750,8 @@ AT_DATA_GRAMMAR([[input.y]],
%destructor { fprintf (stdout, "'d' destructor.\n"); } 'd'

%destructor {
- fprintf (yyoutput, "<!> destructor should not be called.\n");
-} <!>
+ fprintf (yyoutput, "<> destructor should not be called.\n");
+} <>

%%

@@ -851,8 +851,8 @@ AT_SETUP([Default %printer and %destruct
# -----------------------------------------------------------------------------
m4_define([_AT_CHECK_DEFAULT_PRINTER_AND_DESTRUCTOR_FOR_END_TOKEN],
[m4_if($1, 0,
- [m4_pushdef([kind], [!]) m4_pushdef([not_kind], [*])],
- [m4_pushdef([kind], [*]) m4_pushdef([not_kind], [!])])
+ [m4_pushdef([kind], []) m4_pushdef([not_kind], [*])],
+ [m4_pushdef([kind], [*]) m4_pushdef([not_kind], [])])

AT_DATA_GRAMMAR([[input]]$1[[.y]],
[[%error-verbose
@@ -983,10 +983,10 @@ AT_DATA_GRAMMAR([[input.y]],

%printer {
fprintf (yyoutput, "'%c'", $$);
-} <!> <*>
+} <> <*>
%destructor {
fprintf (stderr, "DESTROY '%c'\n", $$);
-} <!> <*>
+} <> <*>

%%

@@ -1098,11 +1098,11 @@ AT_DATA_GRAMMAR([[input.y]],
%printer {
char chr = $$;
fprintf (yyoutput, "'%c'", chr);
-} <!> <*>
+} <> <*>
%destructor {
char chr = $$;
fprintf (stderr, "DESTROY '%c'\n", chr);
-} <!> <*>
+} <> <*>

%union { char chr; }
%type <chr> start
@@ -1162,8 +1162,8 @@ AT_DATA_GRAMMAR([[input.y]],
# define YY_LOCATION_PRINT(File, Loc)
%}

-%printer { fprintf (yyoutput, "%d", @$); } <!>
-%destructor { fprintf (stderr, "DESTROY %d\n", @$); } <!>
+%printer { fprintf (yyoutput, "%d", @$); } <>
+%destructor { fprintf (stderr, "DESTROY %d\n", @$); } <>
%printer { fprintf (yyoutput, "<*> printer should not be called"); } <*>
%destructor { fprintf (yyoutput, "<*> destructor should not be called"); } <*>

Index: tests/input.at
===================================================================
RCS file: /sources/bison/bison/tests/input.at,v
retrieving revision 1.62
diff -p -u -r1.62 input.at
--- tests/input.at 17 Nov 2006 20:07:08 -0000 1.62
+++ tests/input.at 21 Nov 2006 00:34:06 -0000
@@ -188,11 +188,11 @@ AT_DATA([[input.y]],
%destructor { destroy ($$); } <*>
%printer { destroy ($$); } <*>

-%destructor { destroy ($$); } <!> <!>
-%printer { destroy ($$); } <!> <!>
+%destructor { destroy ($$); } <> <>
+%printer { destroy ($$); } <> <>

-%destructor { destroy ($$); } <!>
-%printer { destroy ($$); } <!>
+%destructor { destroy ($$); } <>
+%printer { destroy ($$); } <>

%%

@@ -201,8 +201,8 @@ start: ;
%destructor { destroy ($$); } <*>;
%printer { destroy ($$); } <*>;

-%destructor { destroy ($$); } <!>;
-%printer { destroy ($$); } <!>;
+%destructor { destroy ($$); } <>;
+%printer { destroy ($$); } <>;
]])

AT_CHECK([bison input.y], [1], [],
@@ -285,7 +285,7 @@ AT_CLEANUP
AT_SETUP([Unused values with default %destructor])

AT_DATA([[input.y]],
-[[%destructor { destroy ($$); } <!>
+[[%destructor { destroy ($$); } <>
%type <tag> tagged

%%
Hans Aberg
2006-11-18 20:38:09 UTC
On 25 Oct 2006, at 00:09, Joel E. Denny wrote:

>> <!> is really too hieroglyphic for
>> me.
>
> I realize <!> looks odd when considered in isolation, but I'm
> trying to be
> consistent with a couple of other proposals....
>
> First, named semantic values:
>
> exp(sum): exp(term1) '+' exp(term2) {
> $sum = $term1 + $term2
> }
> ;
>
> grammar(): defs() rules() epilogue(!) {
> $grammar = new_grammar ($defs, $rules);
> }
> ;
>
> Here, () = unspecified value name = default name = the symbol
> name. That
> seems logical to me. (!) = no value is used at all. The ! conveys a
> sense of caution, which I think this is appropriate given that its
> purpose
> would be to disable any Bison warning about unused $3.

I have looked at this, and tried to find examples of other similar
uses, though I failed. One variation might be
exp/sum: exp/term1 '+' exp/term2 {...}
Instead of the proposed
exp/sum -> exp/term1 '+' exp/term2 {...}
One might use UTF-8 input files, and the symbol U+2192 RIGHTWARDS
ARROW "→"; thus:
exp/sum → exp/term1 '+' exp/term2 {...}

This should solve the parsing problem of the .y files, I think.

Hans Aberg
Joel E. Denny
2006-11-18 23:03:51 UTC
On Sat, 18 Nov 2006, Hans Aberg wrote:

> I have looked at this, and tried to find examples of other similar uses,
> though I failed.

Thanks for responding. I think Akim got this from Lemon. There are some
examples here:

http://www.hwaci.com/sw/lemon/lemon.html

> One variation might be
> exp/sum: exp/term1 '+' exp/term2 {...}
> Instead of the proposed
> exp/sum -> exp/term1 '+' exp/term2 {...}

Parentheses are easier on my eyes than slash or colon is. (Colon is
discussed in TODO.) For example, these look ambiguous to me:

a: b/ c
a -> c :b

In either case, is b's value named c? Or is c another RHS symbol and the
value for b is not used? Yes, the space between them probably indicates
the latter interpretation, but it still looks ugly to me. Compare with:

a: b() c
a: c b()

In TODO, I just noticed a couple of other issues for this area that I had
forgotten. I don't much care for making values regular variables as
suggested by this example:

r:exp -> a:exp '+' b:exp { r = a + b; } ;

Bison currently detects uses of values for the sake of warnings, and I
think searching for variables will make parsing the actions difficult
especially when faced with new target languages.

On the other hand, what if $ is special in some target language?
Declaring $ as part of the value name might help:

exp($sum): exp($term1) '+' exp($term2)

That is, one could potentially choose something other than $. There's an
example of this in TODO as well. For now, we could require that the first
character be $ since searching for any arbitrary sequence in an action
will take some work. I'd just like to get the notation right for now.

As TODO notes, there's still the issue of how to handle locations.
Perhaps Bison could automatically append `_loc' to the value name. So,
exp($sum) would have a value of $sum and a location of $sum_loc.

Or maybe it should be possible to name these separately:

exp($sum, @sum): exp($term1, @term1) '+' exp($term2, @term2)

The notation is getting verbose, but how many sym locations do you
typically use in a semantic action anyway? You'd only need to specify the
ones you use.

This would also make it possible to declare a value as unused but still
access the location. The only value used in the following is the LHS
value:

exp($sum): exp(, @term1) '+' exp()
Hans Aberg
2006-11-19 13:21:14 UTC
On 19 Nov 2006, at 00:03, Joel E. Denny wrote:

>> I have looked at this, and tried to find examples of other
>> similar uses,
>> though I failed.
>
> Thanks for responding. I think Akim got this from Lemon. There
> are some
> examples here:
>
> http://www.hwaci.com/sw/lemon/lemon.html

If there would be a syntax change/extension, I would favor it to
approach standard notation, as in the book by Waite & Goos, "Compiler
Construction". Rules are written a -> b. It does not treat semantic
values, though. In an attribute grammar, though, they use a ".".

>
>> One variation might be
>> exp/sum: exp/term1 '+' exp/term2 {...}
>> Instead of the proposed
>> exp/sum -> exp/term1 '+' exp/term2 {...}
>
> Parentheses are easier on my eyes than slash or colon is. (Colon is
> discussed in TODO.)

Parentheses should not be overused; they become hard for humans to read.
And ";" is already overused, which may make a grammar hard to
implement. This led me to try something else.

Perhaps a "." would do. Then one would get:
exp.sum: exp.term1 '+' exp.term2 {...}

> For example, these look ambiguous to me:
>
> a: b/ c

Does "/" have another use?

> a -> c :b

And "->" would just be an alternative to ":" - the grammar the same,
only a different lexer, returning the same token.

> In either case, is b's value named c? Or is c another RHS symbol
> and the
> value for b is not used? Yes, the space between them probably
> indicates
> the latter interpretation, but it still looks ugly to me.

I favor a grammar where spaces do not affect the grammar, only as a
delimiter between tokens. In the first case, it would then be the
same as
a: b/c
And in the second case
a : c : b
which I think generates an error, as Bison first sees a rule ":". If one
wants to make a difference between ":" and "->", then the latter
might require rules to have a terminator. In the theory book above
it is a ".".

> Compare with:
>
> a: b() c
> a: c b()

So what would these empty parentheses mean?

Also see the EBNF proposal I made on the Bug-Bison list (subject
"EBNF"). It is relatively easy to make such an extension, and it
should then not clash with any other extension.
>
> In TODO, I just noticed a couple of other issues for this area that
> I had
> forgotten. I don't much care for making values regular variables as
> suggested by this example:
>
> r:exp -> a:exp '+' b:exp { r = a + b; } ;

My hunch is that the grammar variable should come first - easier both
for humans and for making a grammar.

As for not having $, one will still need to have a symbol for
locations. And I think of making a proposal for token numbers (used
when making definitions in a language) and token names (apart from
language implementation, can be used for better error messages).

> Bison currently detects uses of values for the sake of warnings, and I
> think searching for variables will make parsing the actions difficult
> especially when faced with new target languages.

I think one should settle for something that is not too difficult to
implement.

> On the other hand, what if $ is special in some target language?
> Declaring $ as part of the value name might help:
>
> exp($sum): exp($term1) '+' exp($term2)
>
> That is, one could potentially choose something other than $.
> There's an
> example of this in TODO as well. For now, we could require that
> the first
> character be $ since searching for any arbitrary sequence in an action
> will take some work. I'd just like to get the notation right for now.
>
> As TODO notes, there's still the issue of how to handle locations.
> Perhaps Bison could automatically append `_loc' to the value name.
> So,
> exp($sum) would have a value of $sum and a location of $sum_loc.
>
> Or maybe it should be possible to name these separately:
>
> exp($sum, @sum): exp($term1, @term1) '+' exp($term2, @term2)
>
> The notation is getting verbose, but how many sym locations do you
> typically use in a semantic action anyway? You'd only need to
> specify the
> ones you use.

I am not sure this gains anything because one is essentially
referring to the same location on the stack, but different
subcomponents. Another variation might be
exp.sum: exp.term1 '+' exp.term2 {...}
And then in the action use x.value, x.location, x.token, x.name,
where x in {sum, term1, term2}.

- I am essentially here illustrating what is going on.

> This would also make it possible to declare a value as unused but
> still
> access the location. The only value used in the following is the LHS
> value:
>
> exp($sum): exp(, @term1) '+' exp()

So some such OO notation might simplify such notational problems.

Hans Aberg
Joel E. Denny
2006-11-19 20:17:17 UTC
On Sun, 19 Nov 2006, Hans Aberg wrote:

> If there would be a syntax change/extension, I would favor it to approach
> standard notation, as in the book by Waite & Goos, "Compiler Construction".
> Rules are written a -> b.

As long as we don't use `:' in the value name notation, then the issue of
whether to provide `->' as an alternative to `:' is separate from the
value name notation issue.

> Perhaps a "." would do. Then one would get:
> exp.sum: exp.term1 '+' exp.term2 {...}

You later seem to take the view that sum is an instance of exp rather than
an attribute. I prefer viewing it as an instance, so I don't like the
above notation.

> > For example, these look ambiguous to me:
> >
> > a: b/ c
>
> Does "/" have another use?

Ok, I think I miscommunicated. The ambiguity I meant is whether `b/c' is
the same as `b/ c' as discussed below. `c :b' and `c:b' have the same
issue....

> > Compare with:
> >
> > a: b() c
> > a: c b()
>
> So what would these empty parentheses mean?

We've been proposing that they tell Bison that the semantic value of b is
intentionally unused. This would prevent a warning when b has a
%destructor.

If we use `:' for value names (and thus `->' after the LHS) or `/' for
values names, I guess the following would mean that b has no value:

a -> c :b
a: b/ c

But I think that looks ugly. (Maybe I was unjustified in calling it
ambiguous though.) The parenthetical notation looks better to me.

> Also see the EBNF proposal I made on the Bug-Bison list (subject "EBNF"). It is
> relatively easy to make such an extension, and it should then not clash with
> any other extension.

I have no desire to work on an EBNF extension. However, I see your point
that parentheses sometimes have a different purpose in grammars, and I
guess Bison could one day grow such a purpose for them.

> > In TODO, I just noticed a couple of other issues for this area that I had
> > forgotten. I don't much care for making values regular variables as
> > suggested by this example:
> >
> > r:exp -> a:exp '+' b:exp { r = a + b; } ;
>
> My hunch is that the grammar variable should come first - easier both to
> humans and make a grammar.

You mean exp:r? I don't see how that's easier. Also, I'm used to the
instance:kind notation in UML, so r:exp makes sense to me.

> And then in the action use x.value, x.location

That's an interesting possibility, but I think people are too infatuated
with $ and @.

> , x.token, x.name

Why are these necessary? The grammar author knows the token number and
name when he writes the rule. On the other hand, the value and location
are computed dynamically.

> > This would also make it possible to declare a value as unused but still
> > access the location. The only value used in the following is the LHS
> > value:
> >
> > exp($sum): exp(, @term1) '+' exp()
>
> So some such OO notation might simplify such notational problems.

Your OO notation doesn't address the issue I was trying to address here.
How do you declare that the semantic value is unused but still use the
location? Will we make the gamble that this is never necessary?
Hans Aberg
2006-11-19 21:35:37 UTC
On 19 Nov 2006, at 21:17, Joel E. Denny wrote:

>> If there would be a syntax change/extension, I would favor it to
>> approach
>> standard notation, as in the book by Waite & Goos, "Compiler
>> Construction".
>> Rules are written a -> b.
>
> As long as we don't use `:' in the value name notation, ...

I haven't followed this one

> ...then the issue of
> whether to provide `->' as an alternative to `:' is separate from the
> value name notation issue.
>
>> Perhaps a "." would do. Then one would get:
>> exp.sum: exp.term1 '+' exp.term2 {...}
>
> You later seem to take the view that sum is an instance of exp
> rather than
> an attribute. I prefer viewing it as an instance, so I don't like the
> above notation.

I think of it as a rule
exp -> exp '+' exp
with other attachments: semantic values, etc. The grammar exists
without the latter, the attachments, but not vice versa.

>>> For example, these look ambiguous to me:
>>>
>>> a: b/ c
>>
>> Does "/" have another use?
>
> Ok, I think I miscommunicated. The ambiguity I meant is whether
> `b/c' is the same as `b/ c' as discussed below. `c :b' and `c:b' have
> the same issue....

I think there should be normal tokenization in the Bison .l file, with a
general rule, stripping out space and newlines. I think this leads to
the most intuitive grammar.

>>> Compare with:
>>>
>>> a: b() c
>>> a: c b()
>>
>> So what would these empty parentheses mean?
>
> We've been proposing that they tell Bison that the semantic value
> of b is
> intentionally unused. This would prevent a warning when b has a
> %destructor.

The %destructor is only needed with output language C. So perhaps it
is not worthwhile, or even prudent, to let the whole grammar be affected
by that: I think that the input grammar and the output language
should be separated as much as possible, to simplify the addition of
other output languages. Just a hunch.

> If we use `:' for value names ...

What do you mean by value names here? Are those the semantic value
variables?

> (and thus `->' after the LHS) or `/' for
> values names, I guess the following would mean that b has no value:
>
> a -> c :b
> a: b/ c
>
> But I think that looks ugly. (Maybe I was unjustified in calling it
> ambiguous though.) The parenthetical notation looks better to me.

I am thinking of finding symbols that simplify the Bison .y grammar.
Stuff like ";" that optionally can be omitted, which is there for
legacy, causes problems.

>
>> Also see the EBNF proposal I made on the Bug-Bison list (subject
>> "EBNF"). It is relatively easy to make such an extension, and it
>> should then not clash with any other extension.
>
> I have no desire to work on an EBNF extension. However, I see your
> point
> that parentheses sometimes have a different purpose in grammars, and I
> guess Bison could one day grow such a purpose for them.
>
>>> In TODO, I just noticed a couple of other issues for this area
>>> that I had
>>> forgotten. I don't much care for making values regular variables as
>>> suggested by this example:
>>>
>>> r:exp -> a:exp '+' b:exp { r = a + b; } ;
>>
>> My hunch is that the grammar variable should come first - easier
>> both to
>> humans and make a grammar.
>
> You mean exp:r? I don't see how that's easier.

It depends what comes first in the definition. It is easier
implementing with defined names first, and then the definee.

> Also, I'm used to the
> instance:kind notation in UML, so r:exp makes sense to me.

I think of C, which had something that wasn't successful, relative to
the functional notation. But the functional notation on each grammar
variable might become too heavy, so I am striving for something
simpler. This is my line of thought.

>> And then in the action use x.value, x.location
>
> That's an interesting possibility, but I think people are too
> infatuated
> with $ and @.

At this point, I just mention it as a way to think in OO terms.

>> , x.token, x.name
>
> Why are these necessary? The grammar author knows the token number
> and
> name when he writes the rule.

If the token number should be used with type, it has been put into the
parser class, making a C++ forward declaration impossible. If Bison
should by default write out the token name in errors, there needs
to be a standard way for that. If both of these are moved over to Bison
standard features, then a polymorphic semantic type needs only hold
one object, simplifying the choice between different models (under
C++). So these are some of the motivations I play around with.

> On the other hand, the value and location
> are computed dynamically.
>
>>> This would also make it possible to declare a value as unused but
>>> still
>>> access the location. The only value used in the following is the
>>> LHS
>>> value:
>>>
>>> exp($sum): exp(, @term1) '+' exp()
>>
>> So some such OO notation might simplify such notational problems.
>
> Your OO notation doesn't address the issue I was trying to address
> here.
> How do you declare that the semantic value is unused but still use the
> location? Will we make the gamble that this is never necessary?

I haven't considered this. Now that you mention this, I think some
special notation might be the prudent way. The location value will
probably be used more often with Bison's improved parser diagnostics.
This suggests that one indeed needs some special way to indicate it.

Hans Aberg
Joel E. Denny
2006-11-19 22:04:23 UTC
On Sun, 19 Nov 2006, Hans Aberg wrote:

> > Ok, I think I miscommunicated. The ambiguity I meant is whether `b/c' is
> > the same as `b/ c' as discussed below. `c :b' and `c:b' have the same
> > issue....
>
> I think there should be normal tokenization in the Bison .l file, with a general
> rule, stripping out space and newlines. I think this leads to the most
> intuitive grammar.

I was thinking of the human reader not Bison's scanner.

> The %destructor is only needed with output language C.

Currently, it's also useful for C++. There could be other target
languages some day that need it as well.

> > If we use `:' for value names ...
>
> What do you mean by value names here? Are those the semantic value variables?

Yes.

> > You mean exp:r? I don't see how that's easier.
>
> It depends what comes first in the definition. It is easier implementing with
> defined names first, and then the definee.

I don't see how either is easier to implement than the other.

> > How do you declare that the semantic value is unused but still use the
> > location? Will we make the gamble that this is never necessary?
>
> I haven't considered this. Now that you mention this, I think some special
> notation might be the prudent way. The location value will probably be used
> more often with Bison improved parser diagnostics. This suggests that one
> indeed needs some special way to indicate it.

Ok, I hope you'll respond to my other email where I propose a more concise
alternative to exp($sum, @sum).
Hans Aberg
2006-11-19 22:28:47 UTC
On 19 Nov 2006, at 23:04, Joel E. Denny wrote:

>>> Ok, I think I miscommunicated. The ambiguity I meant is whether
>>> `b/c' is
>>> the same as `b/ c' as discussed below. `c :b' and `c:b' have the
>>> same
>>> issue....
>>
>> I think there should be normal tokenization in the Bison .l file,
>> with a general
>> rule, stripping out space and newlines. I think this leads to the
>> most
>> intuitive grammar.
>
> I was thinking of the human reader not Bison's scanner.

Me too.

>> The %destructor is only needed with output language C.
>
> Currently, it's also useful for C++.

You mean, if one writes code in the common C/C++ subset. :-)

> There could be other target
> languages some day that need it as well

Perhaps, but I can't really think of any.

>>> If we use `:' for value names ...
>>
>> What do you mean by value names here? Are those the semantic value
>> variables?
>
> Yes.

Then I think ":" should be avoided.

>>> You mean exp:r? I don't see how that's easier.
>>
>> It depends what comes first in the definition. It is easier
>> implementing with
>> defined names first, and then the definee.
>
> I don't see how either is easier to implement than the other.

I guess this is something that shows when tried. :-) I don't really know.

>>> How do you declare that the semantic value is unused but still
>>> use the
>>> location? Will we make the gamble that this is never necessary?
>>
>> I haven't considered this. Now that you mention this, I think some
>> special
>> notation might be the prudent way. The location value will
>> probably be used
>> more often with Bison improved parser diagnostics. This suggests
>> that one
>> indeed needs some special way to indicate it.
>
> Ok, I hope you'll respond to my other email where I propose a more
> concise
> alternative to exp($sum, @sum).

I will look at this, but tomorrow - it is getting late here. :-)

Hans Aberg
Joel E. Denny
2006-11-19 22:36:58 UTC
On Sun, 19 Nov 2006, Hans Aberg wrote:

> > > The %destructor is only needed with output language C.
> >
> > Currently, it's also useful for C++.
>
> You mean, if one writes code in the common C/C++ subset. :-)

I think you, Akim, and I have all agreed that semantic types cannot have
C++ destructors given Bison's current C++ skeletons. You need %destructor
instead. That's what I'm referring to.

> > Ok, I hope you'll respond to my other email where I propose a more concise
> > alternative to exp($sum, @sum).
>
> I will look at this, but tomorrow - it is getting late here. :-)

Thanks.
Hans Aberg
2006-11-19 22:54:06 UTC
On 19 Nov 2006, at 23:36, Joel E. Denny wrote:

>>>> The %destructor is only needed with output language C.
>>>
>>> Currently, it's also useful for C++.
>>
>> You mean, if one writes code in the common C/C++ subset. :-)
>
> I think you, Akim, and I have all agreed that semantic types cannot
> have C++ destructors given Bison's current C++ skeletons. You need
> %destructor instead. That's what I'm referring to.

Not anymore:

I think that the %define tweak that enables one to include code
enables me to use my polymorphic C++ type, which I have used for
several years with my own Bison tweaks. I am looking at this these
days, though reality prevents me from doing much work on it. I still
need a tweaked skeleton file, but it needs just some code placement
macros. This is for untyped .y Bison; to type it, I need another tweak.
Basically, a way to trigger the Bison type system, plus a macro to
select the correct type in the runtime object.

Hans Aberg
Joel E. Denny
2006-11-19 22:54:43 UTC
On Sun, 19 Nov 2006, Joel E. Denny wrote:

> On Sun, 19 Nov 2006, Hans Aberg wrote:
>
> > > > The %destructor is only needed with output language C.
> > >
> > > Currently, it's also useful for C++.
> >
> > You mean, if one writes code in the common C/C++ subset. :-)
>
> I think you, Akim, and I have all agreed that semantic types cannot have
> C++ destructors given Bison's current C++ skeletons. You need %destructor
> instead. That's what I'm referring to.

I should be more careful. I'm thinking of when you use %union with the
C++ skeletons. I have not tried the #define YYSTYPE approach, but I
recall someone reporting using C++ destructors then.

Anyway, my point is that %destructor can be useful with Bison's current
C++ skeletons.
Hans Aberg
2006-11-20 15:02:06 UTC
On 19 Nov 2006, at 23:54, Joel E. Denny wrote:

>> On Sun, 19 Nov 2006, Hans Aberg wrote:
>>
>>>>> The %destructor is only needed with output language C.
>>>>
>>>> Currently, it's also useful for C++.
>>>
>>> You mean, if one writes code in the common C/C++ subset. :-)
>>
>> I think you, Akim, and I have all agreed that semantic types
>> cannot have C++ destructors given Bison's current C++ skeletons.
>> You need %destructor instead. That's what I'm referring to.
>
> I should be more careful. I'm thinking of when you use %union with
> the
> C++ skeletons. I have not tried the #define YYSTYPE approach, but I
> recall someone reporting using C++ destructors then.

The problem with %union is that it implements a 'union', which in C++
cannot be used with non-PODs (essentially, classes with non-trivial
or user-defined destructors). So, instead, use something
else, plus a method to select dynamic objects.

> Anyway, my point is that %destructor can be useful with Bison's
> current
> C++ skeletons.

If one really does not want to use one of the main advantages of C++,
classes with constructors and destructors. I am not sure why one is
not using C in such a case. :-)

Hans Aberg
Joel E. Denny
2006-11-20 22:41:31 UTC
On Mon, 20 Nov 2006, Hans Aberg wrote:

> If one really does not want to use one of the main advantages of C++, classes
> with constructors and destructors. I am not sure why one is not using C in
> such a case. :-)

C++ offers much more than just constructors and destructors for semantic
types in parsers.
Hans Aberg
2006-11-20 22:56:41 UTC
On 20 Nov 2006, at 23:41, Joel E. Denny wrote:

>> If one really does not want to use one of the main advantages of
>> C++, classes with constructors and destructors. I am not sure why
>> one is not using C in such a case. :-)
>
> C++ offers much more than just constructors and destructors for
> semantic
> types in parsers.

I am just not sure exactly what, I mean, if one is not going to use
C++ OO. I mean, the template system does not seem worth the effort. :-)

Hans Aberg
Joel E. Denny
2006-11-20 23:08:04 UTC
On Mon, 20 Nov 2006, Hans Aberg wrote:

> > C++ offers much more than just constructors and destructors for semantic
> > types in parsers.
>
> I am not just sure exactly what, I mean, if one is not going to use C++ OO. I
> mean, the template system does not seem worth the effort. :-)

Why can't you use C++ OO? Pointers to non-POD's work fine. Polymorphism
works fine. The rest of your program (besides the parser) can use
non-POD's without restraint. There's also the C++ standard library.
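
The point about pointers can be sketched in a few lines of C++. This is
only an illustration under assumptions: the union below stands in for what
a declaration such as `%union { int ival; Expr *expr; }` would produce, and
Expr, Num, Add, SemanticValue and eval_sum are hypothetical names, not
anything generated by Bison or taken from a real grammar.

```cpp
#include <cassert>

// Hypothetical AST classes; none of these names come from Bison.
struct Expr {
  virtual ~Expr() {}
  virtual int eval() const = 0;
};

struct Num : Expr {
  int value;
  explicit Num(int v) : value(v) {}
  int eval() const { return value; }
};

struct Add : Expr {
  Expr *lhs;
  Expr *rhs;
  Add(Expr *l, Expr *r) : lhs(l), rhs(r) { assert(l && r); }
  ~Add() { delete lhs; delete rhs; }  // an Add owns its operands
  int eval() const { return lhs->eval() + rhs->eval(); }
};

// Stand-in for the semantic type: every member is a POD (an int or a raw
// pointer), even though Expr itself is a non-POD class.
union SemanticValue {
  int ival;
  Expr *expr;
};

// Mimics what the action of a rule like  exp: exp '+' exp  might do.
int eval_sum(int a, int b) {
  SemanticValue lhs, rhs, sum;
  lhs.expr = new Num(a);
  rhs.expr = new Num(b);
  sum.expr = new Add(lhs.expr, rhs.expr);  // ownership moves into the Add
  int result = sum.expr->eval();           // virtual dispatch through the union
  delete sum.expr;                         // cleanup by hand; frees operands too
  return result;
}
```

The by-hand `delete` at the end is the part that a %destructor would cover
for values the parser discards during error recovery.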
Hans Aberg
2006-11-21 12:05:50 UTC
On 21 Nov 2006, at 00:08, Joel E. Denny wrote:

>>> C++ offers much more than just constructors and destructors for
>>> semantic
>>> types in parsers.
>>
>> I am not just sure exactly what, I mean, if one is not going to
>> use C++ OO. I
>> mean, the template system does not seem worth the effort. :-)
>
> Why can't you use C++ OO? Pointers to non-POD's work fine.
> Polymorphism
> works fine. The rest of your program (besides the parser) can use
> non-POD's without restraint. There's also the C++ standard library.

If one should program in what essentially is the C-subset, why not
use C instead? :-) In fact, if a GC, other than a reference count
should be implemented, C++ OO is just a bother (though it may change
in a later version of C++).

As for my program, the parser builds dynamic iterated polymorphic
objects, and in combination with a reference count, that is very
convenient.
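
A rough sketch of what such a reference-counted polymorphic semantic type
could look like follows. It is an assumption-laden illustration, not the
code described above: it uses std::shared_ptr from C++11 in place of a
hand-written reference count, and Object, Value, Leaf, Sum and
build_and_eval are hypothetical names.

```cpp
#include <cassert>
#include <memory>

// Hypothetical node hierarchy; `live` exists only to observe the cleanup.
struct Object {
  static int live;
  Object() { ++live; }
  virtual ~Object() { --live; }
  virtual int eval() const = 0;
};
int Object::live = 0;

// One reference-counted handle serves as the semantic type of every symbol.
typedef std::shared_ptr<Object> Value;

struct Leaf : Object {
  int n;
  explicit Leaf(int v) : n(v) {}
  int eval() const { return n; }
};

struct Sum : Object {
  Value lhs, rhs;
  Sum(const Value &l, const Value &r) : lhs(l), rhs(r) { assert(l && r); }
  int eval() const { return lhs->eval() + rhs->eval(); }
};

// Builds the tree for a + b and evaluates it. No explicit cleanup: when the
// handles go out of scope the counts drop to zero and all nodes are freed.
int build_and_eval(int a, int b) {
  Value lhs = std::make_shared<Leaf>(a);
  Value rhs = std::make_shared<Leaf>(b);
  Value sum = std::make_shared<Sum>(lhs, rhs);
  return sum->eval();
}
```

After build_and_eval returns, Object::live is back to zero without any
destructor code being written for the parser, which is the convenience
being claimed. The cost is a common base class for all semantic values.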

Hans Aberg
Joel E. Denny
2006-11-21 18:23:07 UTC
On Tue, 21 Nov 2006, Hans Aberg wrote:

> > Why can't you use C++ OO? Pointers to non-POD's work fine. Polymorphism
> > works fine. The rest of your program (besides the parser) can use
> > non-POD's without restraint. There's also the C++ standard library.
>
> If one should program in what essentially is the C-subset

Why are you assuming this?

> As for my program, the parser builds dynamic iterated polymorphic objects,
> and in combination with a reference count, that is very convenient.

I have programs that do this too. Inside the union, pointers to non-POD's
work fine.
Hans Aberg
2006-11-21 18:31:14 UTC
On 21 Nov 2006, at 19:23, Joel E. Denny wrote:

>> As for my program, the parser builds dynamic iterated polymorphic
>> objects,
>> and in combination with a reference count, that is very convenient.
>
> I have programs that do this too. Inside the union, pointers to
> non-POD's
> work fine.

If you want to implement the cleanup by hand when you do not have to,
that is fine with me, as long as I do not have to do it. :-)

Hans Aberg
Joel E. Denny
2006-11-21 18:48:02 UTC
On Tue, 21 Nov 2006, Hans Aberg wrote:

> On 21 Nov 2006, at 19:23, Joel E. Denny wrote:
>
> > > As for my program, the parser builds dynamic iterated polymorphic
> > > objects,
> > > and in combination with a reference count, that is very convenient.
> >
> > I have programs that do this too. Inside the union, pointers to non-POD's
> > work fine.
>
> If you want to implement the cleanup by hand when you do not have to, that is
> fine with me, as long as I do not have to do it. :-)

I don't find the cleanup to be that tough especially now that we have
per-type %destructor. With the union, I find it convenient that some of
my semantic types can be primitives, and I don't have to define a common
base class for all semantic types.

It's fine if you want to do it your way. My point is simply that
%destructor is useful for some grammars using the current C++ skeletons,
and thus %destructor is not a C only feature.
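
As a sketch of the union-plus-%destructor style, the C++ below imitates by
hand what per-type destructors amount to: a switch that runs cleanup code
for the members that need it and nothing for primitives. All names (Node,
SemanticValue, Tag, destroy_value, discard_stack) are hypothetical. In a
real parser the dispatch is on the symbol, and a per-type %destructor
applies to the symbols declared with that tag; the Tag enum here is a
simplification.

```cpp
#include <cassert>
#include <cstring>

struct Node { int payload; };  // hypothetical non-primitive semantic type

// Stand-in for  %union { int ival; char *sval; Node *node; }
union SemanticValue {
  int ival;
  char *sval;
  Node *node;
};

// Which union member a stacked value uses (simplified; see the text above).
enum Tag { TAG_IVAL, TAG_SVAL, TAG_NODE };

// One case per tag that has cleanup code. Returns true if memory was freed.
bool destroy_value(Tag tag, SemanticValue *v) {
  assert(v != 0);
  switch (tag) {
    case TAG_SVAL:  // like  %destructor { delete[] $$; } <sval>
      delete[] v->sval;
      v->sval = 0;
      return true;
    case TAG_NODE:  // like  %destructor { delete $$; } <node>
      delete v->node;
      v->node = 0;
      return true;
    case TAG_IVAL:  // primitive member: no %destructor needed
      return false;
  }
  return false;
}

// Simulates error recovery discarding three stacked values and returns how
// many of them needed cleanup.
int discard_stack() {
  SemanticValue stack[3];
  Tag tags[3] = { TAG_IVAL, TAG_SVAL, TAG_NODE };
  stack[0].ival = 42;
  stack[1].sval = new char[4];
  std::strcpy(stack[1].sval, "abc");
  stack[2].node = new Node();
  int freed = 0;
  for (int i = 2; i >= 0; --i)  // pop from the top of the stack
    if (destroy_value(tags[i], &stack[i]))
      ++freed;
  return freed;
}
```

The int member passes through untouched, which is the convenience
mentioned above: primitives sit in the union next to pointers, and only
the pointer members carry cleanup code.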
Hans Aberg
2006-11-21 20:09:36 UTC
On 21 Nov 2006, at 19:48, Joel E. Denny wrote:

> I don't find the cleanup to be that tough especially now that we have
> per-type %destructor. With the union, I find it convenient that
> some of
> my semantic types can be primitives, and I don't have to define a
> common
> base class for all semantic types.

It sounds as though it is fine with you. :-)

> It's fine if you want to do it your way.

Thank you.

> My point is simply that
> %destructor is useful for some grammars using the current C++
> skeletons,
> and thus %destructor is not a C only feature.

C++ does not have a way to force one to use its OO features.

Hans Aberg
Joel E. Denny
2006-11-21 20:45:00 UTC
On Tue, 21 Nov 2006, Hans Aberg wrote:

> > It's fine if you want to do it your way.
>
> Thank you.

In other words, I wasn't arguing that your way is wrong... only that there
are other valid ways that require %destructor. But anyway, you're
welcome.

> C++ does not have a way to force one to use its OO features.

Fortunately. When pure OO is desirable, C++ is the wrong language in my
opinion.
Hans Aberg
2006-11-21 21:13:26 UTC
On 21 Nov 2006, at 21:45, Joel E. Denny wrote:

>> C++ does not have a way to force one to use its OO features.
>
> Fortunately. When pure OO is desirable, C++ is the wrong language
> in my
> opinion.

Actually, I am thinking of designing a C++ successor, or, at least,
as a thought experiment. :-)

Hans Aberg
Paolo Bonzini
2006-11-19 15:49:42 UTC
> Declaring $ as part of the value name might help:
>
> exp($sum): exp($term1) '+' exp($term2)

Not that much. Yes, this makes a Perl back-end for Bison even more
difficult, but given Perl 6 has grammars this is not a likely event
anyway (unlike the Java back-end -- shameless plug for my %language patch).

> That is, one could potentially choose something other than $. There's an
> example of this in TODO as well. For now, we could require that the first
> character be $ since searching for any arbitrary sequence in an action
> will take some work. I'd just like to get the notation right for now.
>
> As TODO notes, there's still the issue of how to handle locations.
> Perhaps Bison could automatically append `_loc' to the value name. So,
> exp($sum) would have a value of $sum and a location of $sum_loc.

Please, let's keep $sum and @sum.

> Or maybe it should be possible to name these separately:
>
> exp($sum, @sum): exp($term1, @term1) '+' exp($term2, @term2)

Please don't.

Paolo
Hans Aberg
2006-11-19 18:23:02 UTC
Permalink
On 19 Nov 2006, at 16:49, Paolo Bonzini wrote:

>> Declaring $ as part of the value name might help:
>> exp($sum): exp($term1) '+' exp($term2)
>
> Not that much. Yes, this makes a Perl back-end for Bison even more
> difficult, but given Perl 6 has grammars this is not a likely event
> anyway (unlike the Java back-end -- shameless plug for my %language
> patch).

If Bison moves that far, to multilingual output support, I expect the
parsing in each {...} to be output-language specific anyway. So there is
no point in worrying about the fact that, say, "$" does not work
in this or that output language.

> Please, let's keep $sum and @sum.

So this seems simplest.

>> Or maybe it should be possible to name these separately:
>> exp($sum, @sum): exp($term1, @term1) '+' exp($term2, @term2)
>
> Please don't.

I think too, that such ideas will make the grammars rather unreadable
to humans. :-)

I think the best way is to think of it as one defines objects named
"sum", "term1", etc., from which value, location, and perhaps more
can be extracted. Right now, this is done using "$" and "@".

Hans Aberg
Joel E. Denny
2006-11-19 20:17:25 UTC
Permalink
On Sun, 19 Nov 2006, Hans Aberg wrote:

> If Bison moves that far, to multilingual output support, I expect the parsing
> in each {...} to be output-language specific anyway. So there is no point in
> worrying about the fact that, say, "$" does not work in this or that output
> language.

I can buy that. Also, it's probably best that Bison developers come up
with the $ and @ alternatives for each output language rather than leaving
it up to every user to do it his own way.

> > > Or maybe it should be possible to name these separately:
> > > exp($sum, @sum): exp($term1, @term1) '+' exp($term2, @term2)
> >
> > Please don't.
>
> I think too, that such ideas will make the grammars rather unreadable to
> humans. :-)

Ok, here's a less verbose alternative:

a(name1): b c() d(@name2)

For a, the value and location are $name1 and @name2.
For b, they are $b and @b.
For c, the value is declared unused.

So far, these are old ideas I've discussed at length before. What's new
is that, for d, the value is also declared unused, but the location can be
referenced with @name2. This last usage may not be common, but I think it
ought to be possible.

> I think the best way is to think of it as one defines objects named "sum",
> "term1", etc., from which value, location, plus perhaps some, can be
> extracted. Right now, this is done using "$" and "@".

Yes, I like that view.
Joel E. Denny
2006-11-19 21:14:36 UTC
Permalink
On Sun, 19 Nov 2006, Joel E. Denny wrote:

> Ok, here's a less verbose alternative:
>
> a(name1): b c() d(@name2)
>
> For a, the value and location are $name1 and @name2.

I meant $name1 and @name1, of course.

> For b, they are $b and @b.
> For c, the value is declared unused.
>
> So far, these are old ideas I've discussed at length before. What's new
> is that, for d, the value is also declared unused, but the location can be
> referenced with @name2. This last usage may not be common, but I think it
> ought to be possible.
Hans Aberg
2006-11-21 12:24:49 UTC
Permalink
On 19 Nov 2006, at 21:17, Joel E. Denny wrote (corrected according to
later post):

> Ok, here's a less verbose alternative:
>
> a(name1): b c() d(@name2)
>
> For a, the value and location are $name1 and @name1.
> For b, they are $b and @b.
> For c, the value is declared unused.
>
> So far, these are old ideas I've discussed at length before.
> What's new
> is that, for d, the value is also declared unused, but the location
> can be
> referenced with @name2. This last usage may not be common, but I
> think it
> ought to be possible.

One of my worries is what happens if one adds variables for token
values and names. Suppose, just to focus on something, the latter
have symbols #, %. Then one would end up with combinations like:
a(name1): b c() d(@#name2) e($#%name3)
The use of token values is rather rare, but essential in
definitions. So the default could be that they are not used. Token
names, if used in error messages, will be as frequent as locations.
So these two might use the same default.

Hans Aberg
Joel E. Denny
2006-11-21 18:30:09 UTC
Permalink
On Tue, 21 Nov 2006, Hans Aberg wrote:

> One of my worries, is what happens if one adds variables for token values and
> names.

To be more general, some day Bison might possibly need symbol attributes
other than semantic value and location.

> Suppose, just to focus on something, the latter have symbols #, %.
> Then one would end up on combinations like:
> a(name1): b c() d(@#name2) e($#%name3)

I would prefer:

a(name1): b c() d(@name2, #name2) e($name3, #name3, %name3)

That is, (name1) makes all attributes available. If you want a specific
subset, list the items in that subset.

Yes, we're back to the previous verbosity, but I think it's not terrible,
I think it's better than ($#%name), and we appear to have no need for it
any time soon anyway.
Hans Aberg
2006-11-21 18:46:47 UTC
Permalink
On 21 Nov 2006, at 19:30, Joel E. Denny wrote:

>> One of my worries, is what happens if one adds variables for token
>> values and
>> names.
>
> To be more general, some day Bison might possibly need symbol
> attributes
> other than semantic value and location.

Right. After reality prevented me from doing this stuff, I was just
fiddling with getting my C++ parser working without having to tweak
Bison itself. Then I arrived at the idea of perhaps putting in the
token values and names as parser features, alongside location.
But my plan was to take up this idea later; it came up rather
prematurely. So something else might come up in the future, though I
do not know exactly what.

>> Suppose, just to focus on something, the latter have symbols #, %.
>> Then one would end up on combinations like:
>> a(name1): b c() d(@#name2) e($#%name3)
>
> I would prefer:
>
> a(name1): b c() d(@name2, #name2) e($name3, #name3, %name3)
>
> That is, (name1) makes all attributes available. If you want a
> specific
> subset, list the items in that subset.
>
> Yes, we're back to the previous verbosity, but I think it's not
> terrible,
> I think it's better than ($#%name), and we appear to have no need
> for it
> any time soon anyway.

I think that all that makes the grammar rules harder to read should be
avoided. They often come out pretty dirty as it is. Though I do not
know what the best method might be. Also, it is good if the user
as far as possible can avoid writing repetitions. So I think the idea
above is good if one normally would use different names for the same
stack position semantic, location, etc., values. But I suspect that
is not going to be the case. So therefore, I think it is probably
best to think OO: giving the parser stack position a name, and from
that extract subfields, whatever the syntax may be.

Hans Aberg
Joel E. Denny
2006-11-21 18:59:22 UTC
Permalink
On Tue, 21 Nov 2006, Hans Aberg wrote:

> > > Suppose, just to focus on something, the latter have symbols #, %.
> > > Then one would end up on combinations like:
> > > a(name1): b c() d(@#name2) e($#%name3)
> >
> > I would prefer:
> >
> > a(name1): b c() d(@name2, #name2) e($name3, #name3, %name3)
> >
> > That is, (name1) makes all attributes available. If you want a specific
> > subset, list the items in that subset.
> >
> > Yes, we're back to the previous verbosity, but I think it's not terrible,
> > I think it's better than ($#%name), and we appear to have no need for it
> > any time soon anyway.
>
> I think that all that makes the grammar rules harder to read should be
> avoided.

Yes, more information will always do that.

> Also, it is good if the user as far as possible
> can avoid writing repetitions.

So, something like:

a(name1): b c() d(name2[@,#]) e(name3[$,#,%])

Still ugly, but I like it better than ($#%name), which looks like one long
cryptic variable.

Well, I think we're getting ahead of ourselves. The notation I've
proposed seems to cleanly accommodate what we need now. I think there are
several possible ways to extend it if necessary in the future.
Hans Aberg
2006-11-21 20:23:22 UTC
Permalink
On 21 Nov 2006, at 19:59, Joel E. Denny wrote:

>> I think that all that makes the grammar rules harder to read should be
>> avoided.
>
> Yes, more information will always do that.
>
>> Also, it is good if the user as far as possible
>> can avoid writing repetitions.
>
> So, something like:
>
> a(name1): b c() d(name2[@,#]) e(name3[$,#,%])
>
> Still ugly, but I like it better than ($#%name), which looks like
> one long
> cryptic variable.

I do not know. Perhaps use "{}" instead of "[]", to indicate it is
sets, if not conflicting with action "{}".

One idea I am playing around with is a header/source breakup. The
actions would then be put in the source (or in a separate segment),
and the rules in the header. One motivation is to make the grammar
readable in the case of complicated actions. Another is that if only
the actions are altered, Bison need not recompute the parser states.

The main problem is how to avoid having to write the grammar rules
more than once.

> Well, I think we're getting ahead of ourselves. The notation I've
> proposed seems to cleanly accommodate what we need now. I think
> there are
> several possible ways to extend it if necessary in the future.

As long as it does not close any doors, if some of the other stuff
needs to be implemented, whatever it may be.

Hans Aberg
Joel E. Denny
2006-11-21 20:42:36 UTC
Permalink
On Tue, 21 Nov 2006, Hans Aberg wrote:

> > So, something like:
> >
> > a(name1): b c() d(name2[@,#]) e(name3[$,#,%])
> >
> > Still ugly, but I like it better than ($#%name), which looks like one long
> > cryptic variable.
>
> I do not know. Perhaps use "{}" instead of "[]", to indicate it is sets, if
> not conflicting with action "{}".

Yeah, something like that.

> As long as it does not close any doors, if some of the other stuff needs to be
> implemented, whatever it may be.

I've been thinking about your comment about parentheses. It's conceivable
that Bison may one day want to use parentheses to group symbols as in EBNF
syntax. More importantly, that's the customary usage of parentheses that
I suspect many users are familiar with in a grammar, so our proposed usage
may be misleading. Why not brackets instead?

a[name1]: b c[] d[@name2]

This would also encourage your braces suggestion should we ever need
attributes in addition to value and location:

a[name1]: b c[] d[name2{@,#}] e[name3{$,#,%}]

I hope we don't need that.
Hans Aberg
2006-11-21 21:11:54 UTC
Permalink
On 21 Nov 2006, at 21:42, Joel E. Denny wrote:

>> As long as it does not close any doors, if some of the other stuff
>> needs to be
>> implemented, whatever it may be.
>
> I've been thinking about your comment about parentheses. It's
> conceivable
> that Bison may one day want to use parentheses to group symbols as
> in EBNF
> syntax. More importantly, that's the customary usage of
> parentheses that
> I suspect many users are familiar with in a grammar, so our
> proposed usage
> may be misleading.

I made my EBNF proposal (in Bug-Bison, "EBNF"), because Akim seemed
to want to have it in Bison, and the notation I took straight out of
the book:
> Waite, Goos, "Compiler Construction", Appendix A, p. 383, gives
> some EBNF to BNF translation rules. I rewrite in local notation,
> suitable for implementation in Bison:
> 1. a(b)c := axc, x: b.
> 2. a[b]c := ac | a(b)c.
> 3. au+ c := axc, x: u | xu.
> 4. au* c := a[u+]c.
> 5. a || t := a(ta)*.
> where a, b, c are arbitrary RHS rules, x a unique non-terminal, u a
> single or parenthesized grammar symbol, and t a terminal.
For more details, see this post.
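
The translation rules quoted above can be sketched in Python. This is only an illustration under assumptions that are not in the post: the tuple encoding ("group", ...), ("opt", ...), ("plus", ...), ("star", ...), the function name desugar, and the generated non-terminal names x1, x2, ... are all hypothetical, and rule 5 (a || t) is left out.

```python
from itertools import count

def desugar(lhs, rhs, fresh=None):
    """Rewrite one EBNF rule into a list of plain BNF rules (lhs, [symbols]).

    rhs is a list whose items are either symbol names (str) or tuples:
    ("group", [items]), ("opt", [items]), ("plus", item), ("star", item).
    """
    if fresh is None:
        fresh = count(1)                    # source of unique non-terminal names
    for i, item in enumerate(rhs):
        if isinstance(item, str):
            continue                        # plain symbol, nothing to rewrite
        kind, body = item
        a, c = rhs[:i], rhs[i + 1:]
        if kind == "star":                  # rule 4: a u* c := a [u+] c
            return desugar(lhs, a + [("opt", [("plus", body)])] + c, fresh)
        if kind == "opt":                   # rule 2: a [b] c := a c | a (b) c
            return (desugar(lhs, a + c, fresh)
                    + desugar(lhs, a + [("group", body)] + c, fresh))
        x = f"x{next(fresh)}"               # the "unique non-terminal" x
        rules = desugar(lhs, a + [x] + c, fresh)
        if kind == "group":                 # rule 1: a (b) c := a x c, x: b
            return rules + desugar(x, body, fresh)
        if kind == "plus":                  # rule 3: a u+ c := a x c, x: u | x u
            return rules + desugar(x, [body], fresh) + desugar(x, [x, body], fresh)
        raise ValueError(f"unknown EBNF construct: {kind}")
    return [(lhs, list(rhs))]               # no EBNF constructs left
```

For example, desugar("s", ["A", ("opt", ["B"]), "C"]) yields the two alternatives for s plus a helper rule x1: B. Because rule 2 copies the trailing context c into both alternatives, any EBNF constructs in c are expanded twice with different helper names; a real implementation would probably want to share them.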

But here, the LHSs should somehow be adapted to Bison notation. As
you note, the symbols "( ) [ ] + * ||" become occupied. Though there
are EBNF variations, it must be synced with the variable notation.

> Why not brackets instead?
>
> a[name1]: b c[] d[@name2]
>
> This would also encourage your braces suggestion should we ever need
> attributes in addition to value and location:
>
> a[name1]: b c[] d[name2{@,#}] e[name3{$,#,%}]
>
> I hope we don't need that.

So the problem is that EBNF might use up "[ ]". I think "[ ]" will
work much better than "(...)?", which is sometimes used - one needs
to attach actions to these EBNF constructs as well, and postfix
operators like "?" may cause problems.

Another thing that might come up is the "grammars with constraints"
method I wrote about to prohibit certain rule expansions, as
generalization, or grammar proper implementation of operator
precedences. (The current token precedences are parser algorithm
dependent.) Clearly, there is no standard notation for this :-), but
it illustrates the problems that might come up.

Hans Aberg
Joel E. Denny
2006-11-21 21:39:03 UTC
Permalink
On Tue, 21 Nov 2006, Hans Aberg wrote:

> But here, the LHSs should somehow be adapted to Bison notation.

I think this is the key. We can't hope to avoid collisions with every
notation Bison might possibly want to use in the future or that users
might already be familiar with. There are only so many characters on the
keyboard. Bison will have to develop its own alternatives in some cases.
However, using () in a grammar to group symbols is pervasive in my
experience, and so we might ought to avoid using it for other purposes.
I suspect that using [] in a grammar to mean optional is not nearly as
common.

> I think "[ ]" will work much
> better than "(...)?", which is sometimes used - one needs to attach actions to
> these EBNF constructs as well, and postfix operators like "?" may cause
> problems.

I don't follow.
Paul Eggert
2006-11-21 21:58:33 UTC
Permalink
"Joel E. Denny" <***@ces.clemson.edu> writes:

> I suspect that using [] in a grammar to mean optional is not nearly as
> common.

It's pretty common, I'm afraid. It's in the ISO standard for EBNF,
for example.

http://www.cl.cam.ac.uk/~mgk25/iso-ebnf.html

Characters unused by ISO EBNF include .:!+_%@&#$<>/\^`~.

I sort of liked the notation a.b to denote the b component of a, as
it's a common notation in many languages. But I admit I haven't been
following this discussion too closely.
Joel E. Denny
2006-11-21 22:04:53 UTC
Permalink
On Tue, 21 Nov 2006, Paul Eggert wrote:

> "Joel E. Denny" <***@ces.clemson.edu> writes:
>
> > I suspect that using [] in a grammar to mean optional is not nearly as
> > common.
>
> It's pretty common, I'm afraid. It's in the ISO standard for EBNF,
> for example.
>
> http://www.cl.cam.ac.uk/~mgk25/iso-ebnf.html
>
> Characters unused by ISO EBNF include .:!+_%@&#$<>/\^`~.

Ok, thanks. Nevertheless, surely it's not as common as () used for
grouping. And it's still not clear to me what's wrong with "(...)?"
instead of "[...]".

> I sort of liked the notation a.b to denote the b component of a, as
> it's a common notation in many languages. But I admit I haven't been
> following this discussion too closely.

Are you referring to this?

exp.sum: exp.term1 '+' exp.term2
Joel E. Denny
2006-11-22 01:33:20 UTC
Permalink
On Tue, 21 Nov 2006, Joel E. Denny wrote:

> Ok, thanks. Nevertheless, surely it's not as common as () used for
> grouping. And it's still not clear to me what's wrong with "(...)?"
> instead of "[...]".

I now see that EBNF has a use for ?.

I have to wonder why we should bother trying to make the notation for
named semantic values compatible with EBNF. Is anyone ever actually going
to contribute EBNF support to Bison? When optionality and repetitions are
possible, do named semantic values even make sense? Should we really
sacrifice the quality of our current notation to accommodate a feature
that may never happen and that may not make sense anyway? Instead, why
not let the person who tries to implement EBNF figure out another notation
for named semantic values? He can put Bison in an %ebnf mode if
necessary.

I'm guessing that you and Hans are telling me that users will have just as
much of a preconceived notion for the meaning of [] as they will for the
meaning of (). At least () for named semantic values has some precedent
in Lemon.

With all that in mind, I say we stick with parentheses:

a(name1): b c() d(@name2)

For a, the value and location are $name1 and @name1.
For b, they are $b and @b.
For c and d, the values are declared unused.
For d, the location is @name2.
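
To make the four cases concrete, here is a small Python sketch that reads a rule in the proposed parenthesis notation and reports which names the action could use for each symbol. It is not Bison code: the function name bindings and the regular expression are hypothetical, quoted literals such as '+' are not handled, and treating the location of c() as unnamed is an assumption, since the proposal only says that its value is declared unused.

```python
import re

# A symbol name, optionally followed by "(name)", "()" or "(@name)".
SYMBOL = re.compile(r"([A-Za-z_]\w*)(?:\(\s*(@?)([A-Za-z_]\w*)?\s*\))?")

def bindings(rule):
    """Return (symbol, value_name, location_name) for each symbol in rule.

    None means that no name is available for that attribute.
    """
    result = []
    for m in SYMBOL.finditer(rule):
        symbol, at, name = m.groups()
        if at is None:                  # no parentheses: $b and @b
            result.append((symbol, symbol, symbol))
        elif name is None:              # c(): value declared unused
            result.append((symbol, None, None))
        elif at == "@":                 # d(@name2): only the location is named
            result.append((symbol, None, name))
        else:                           # a(name1): $name1 and @name1
            result.append((symbol, name, name))
    return result

for row in bindings("a(name1): b c() d(@name2)"):
    print(row)
```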
Paul Eggert
2006-11-22 19:28:20 UTC
Permalink
"Joel E. Denny" <***@ces.clemson.edu> writes:

> When optionality and repetitions are
> possible, do named semantic values even make sense?

Sure. You can give names to the optional or repeated parts. The
types associated with these parts are the equivalent of the ML "'a
option" and "'a list" parameterized types.

> Should we really sacrifice the quality of our current notation to
> accommodate a feature that may never happen and that may not make
> sense anyway?

No, but if the quality of our notation isn't much affected either way,
we might as well do something that is compatible.

> At least () for named semantic values has some precedent in Lemon.
>
> With all that in mind, I say we stick with parentheses:
>
> a(name1): b c() d(@name2)
>
> For a, the value and location are $name1 and @name1.
> For b, they are $b and @b.
> For c and d, the values are declared unused.
> For d, the location is @name2.

I like the rule for 'b' -- that's simple. The rest works, but I'd
like something simpler if possible.

First, I forget: why is it important that values can be declared
"unused"? That's extra complexity -- is it really worth it? After
all, Bison and the human reader should be able to determine easily
whether a value is used by reading the corresponding action. If the
action is so complicated that this is difficult for the human reader,
the action (or grammar) should probably be rewritten anyway.

I do see the need for declaring a different name for the value and
location than the nonterminal's name, since one can have two different
instances of the same nonterminal. But parentheses are overkill for
that, since we don't need brackets just to declare a different name.

I sort-of liked ".", but it isn't really component selection, and
anyway that collides with existing practice, which allows "." in
identifiers (POSIX requires support for this), so that's out. Of the
other characters unused by ISO EBNF, ":", "%", and "_" are also
reserved by POSIX. "$", "\", and "<" have special meanings in Solaris
10 /usr/ccs/bin/yacc (for example, "$" is a valid character in
identifiers), so it's better not to use them in case we ever want to
support those special meanings compatibly. That leaves "!+@&#>/^`~".

How about this notation, which is in something of a different and
more-restricted style?

a#1: a#2 b c d;

The '#1' and '#2' serve to disambiguate the 'a's. In the action, one
writes $a#1 to get the value for the first a, and $a#2 for the second.
One writes @a#1 and @a#2 for locations. One writes $b and @b for b's
value and location, and similarly for c and d.
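
One way to read this proposal is as a purely textual mapping from tagged names back to today's positional references. The Python sketch below shows that reading; it is an illustration only, the function name to_positional is hypothetical, %prec and mid-rule actions are not handled, and letting a right-hand-side name shadow an identical left-hand-side name is an assumption made so that "exp: '(' exp ')'" can use $exp for the inner expression.

```python
import re

# "$" or "@" followed by an identifier and an optional "#tag".
REF = re.compile(r"([$@])([A-Za-z_]\w*(?:#\w+)?)")

def to_positional(rule, action):
    """Rewrite $name/@name references in action as $$, $1, @2, ..."""
    lhs, rhs = rule.rstrip(";").split(":", 1)
    position = {lhs.strip(): "$"}           # the left-hand side is $$ / @$
    for n, symbol in enumerate(rhs.split(), start=1):
        position[symbol] = str(n)           # RHS names shadow the LHS name
    # An unknown name raises KeyError, which is where a real tool would
    # report an error to the user.
    return REF.sub(lambda m: m.group(1) + position[m.group(2)], action)

print(to_positional("a#1: a#2 b c d;", "$a#1 = f($a#2, $b, @d);"))
print(to_positional("exp: exp#1 '+' exp#2", "$$ = $exp#1 + $exp#2;"))
```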

For example, instead of this:

exp(result):
NUM
{ $result = $NUM; }
| '-' exp(subtrahend) %prec NEG
{ $result = - $subtrahend; }
| '(' exp(subexpression) ')'
{ $result = $subexpression; }
| exp(augend) '+' exp(addend)
{ $result = $augend + $addend; }
| exp(minuend) '-' exp(subtrahend)
{ $result = $minuend - $subtrahend; }
| exp(multiplicand) '*' exp(multiplier)
{ $result = $multiplicand * $multiplier; }
| exp(numerator) '/' exp(denominator)
{ $result = $numerator / $denominator; }
| exp(base) '^' exp(exponent)
{ $result = pow ($base, $exponent); }

you write this:

exp:
NUM { $$ = $NUM; }
| '-' exp %prec NEG { $$ = - $exp; }
| '(' exp ')' { $$ = $exp; }
| exp#1 '+' exp#2 { $$ = $exp#1 + $exp#2; }
| exp#1 '-' exp#2 { $$ = $exp#1 - $exp#2; }
| exp#1 '*' exp#2 { $$ = $exp#1 * $exp#2; }
| exp#1 '/' exp#2 { $$ = $exp#1 / $exp#2; }
| exp#1 '^' exp#2 { $$ = pow ($exp#1, $exp#2); }

It's not entirely a fair comparison, but I hope you see the idea.
Often the alternative names are weird or arbitrary. The English word
"augend" hasn't been important since we stopped using Roman numerals
to do arithmetic, and I just now fixed
<http://en.wikipedia.org/wiki/Multiplication> because it misstated the
distinction between "multiplier" and "multiplicand". In cases like
these, "exp#1" is just as good.

If it's important to say a value can be unused, that could be written
"a#-", or something like that.
Hans Aberg
2006-11-22 19:55:57 UTC
Permalink
On 22 Nov 2006, at 20:28, Paul Eggert wrote:

> I sort-of liked ".", but it isn't really component selection,

I took it from attribute grammars in Waite & Goos.

> and
> anyway that collides with existing practice, which allows "." in
> identifiers (POSIX requires support for this), so that's out. Of the
> other characters unused by ISO EBNF, ":", "%", and "_" are also
> reserved by POSIX. "$", "\", and "<" have special meanings in Solaris
> 10 /usr/ccs/bin/yacc (for example, "$" is a valid character in
> identifiers), so it's better not to use them in case we ever want to
> support those special meanings compatibly. That leaves "!+@&#>/^`~".

I suggested "/".

> How about this notation, which is in something of a different and
> more-restricted style?
>
> a#1: a#2 b c d;
>
> The '#1' and '#2' serve to disambiguate the 'a's. In the action, one
> writes $a#1 to get the value for the first a, and $a#2 for the second.
> One writes @a#1 and @a#2 for locations. One writes $b and @b for b's
> value and location, and similarly for c and d.

The idea is to avoid renumbering problems, and therefore use names
alone. But any symbol would be OK.


> For example, instead of this:
>
> exp(result):
> NUM
> { $result = $NUM; }
> | '-' exp(subtrahend) %prec NEG
> { $result = - $subtrahend; }
> | '(' exp(subexpression) ')'
> { $result = $subexpression; }
> | exp(augend) '+' exp(addend)
> { $result = $augend + $addend; }
> | exp(minuend) '-' exp(subtrahend)
> { $result = $minuend - $subtrahend; }
> | exp(multiplicand) '*' exp(multiplier)
> { $result = $multiplicand * $multiplier; }
> | exp(numerator) '/' exp(denominator)
> { $result = $numerator / $denominator; }
> | exp(base) '^' exp(exponent)
> { $result = pow ($base, $exponent); }
>
> you write this:
>
> exp:
> NUM { $$ = $NUM; }
> | '-' exp %prec NEG { $$ = - $exp; }
> | '(' exp ')' { $$ = $exp; }
> | exp#1 '+' exp#2 { $$ = $exp#1 + $exp#2; }
> | exp#1 '-' exp#2 { $$ = $exp#1 - $exp#2; }
> | exp#1 '*' exp#2 { $$ = $exp#1 * $exp#2; }
> | exp#1 '/' exp#2 { $$ = $exp#1 / $exp#2; }
> | exp#1 '^' exp#2 { $$ = pow ($exp#1, $exp#2); }

This can become cumbersome when writing:

exponential:
INTEGRAL_NUMBER { $$ = $INTEGRAL_NUMBER; }
| '-' exponential %prec NEGATION { $$ = - $exponential; }
| '(' exponential ')' { $$ = $exponential; }
| exponential#1 '+' exponential#2 { $$ = $exponential#1 +
$exponential#2; }
| exponential#1 '-' exponential#2 { $$ = $exponential#1 -
$exponential#2; }
| exponential#1 '*' exponential#2 { $$ = $exponential#1 *
$exponential#2; }
| exponential#1 '/' exponential#2 { $$ = $exponential#1 /
$exponential#2; }
| exponential#1 '^' exponential#2 { $$ = pow ($exponential#1,
$exponential#2); }

Instead of the simpler:

exponential:
INTEGRAL_NUMBER#x { $$ = $x; }
| '-' exponential#x %prec NEGATION { $$ = - $x; }
| '(' exponential#x ')' { $$ = $x; }
| exponential#x '+' exponential#y { $$ = $x + $y; }
| exponential#x '-' exponential#y { $$ = $x - $y; }
| exponential#x '*' exponential#y { $$ = $x * $y; }
| exponential#x '/' exponential#y { $$ = $x / $y; }
| exponential#x '^' exponential#y { $$ = pow($x, $y); }

> If it's important to say a value can be unused, that could be written
> "a#-", or something like that.

I think this sounds interesting.

Hans Aberg
Hans Aberg
2006-11-22 21:52:33 UTC
Permalink
On 22 Nov 2006, at 20:55, Hans Aberg wrote:

> This can become cumbersome when writing:
>
> exponential:
...
It should have been:

expression:
INTEGRAL_NUMBER { $$ = $INTEGRAL_NUMBER; }
| '-' expression %prec NEGATION { $$ = - $expression; }
| '(' expression ')' { $$ = $expression; }
| expression#1 '+' expression#2 { $$ = $expression#1 +
$expression#2; }
| expression#1 '-' expression#2 { $$ = $expression#1 -
$expression#2; }
| expression#1 '*' expression#2 { $$ = $expression#1 *
$expression#2; }
| expression#1 '/' expression#2 { $$ = $expression#1 /
$expression#2; }
| expression#1 '^' expression#2 { $$ = pow ($expression#1,
$expression#2); }

And the simpler:

expression:
INTEGRAL_NUMBER#x { $$ = $x; }
| '-' expression#x %prec NEGATION { $$ = -$x; }
| '(' expression#x ')' { $$ = $x; }
| expression#x '+' expression#y { $$ = $x + $y; }
| expression#x '-' expression#y { $$ = $x - $y; }
| expression#x '*' expression#y { $$ = $x * $y; }
| expression#x '/' expression#y { $$ = $x / $y; }
| expression#x '^' expression#y { $$ = pow($x, $y); }

Another idea (which I mentioned earlier) might be to allow lifting the
action definitions out of the grammar definition and putting them
elsewhere, say in another file. Grammars with complicated actions might
then become readable, and the grammar need not be recompiled when
only the actions are altered (if put into another file).

>> If it's important to say a value can be unused, that could be written
>> "a#-", or something like that.
>
> I think this sounds interesting.

Warnings against unused variables are usually a compiler feature, not
a language feature. So this suggestion breaks that principle, if it
is worth holding onto now. :-)

Otherwise, I think getting variable names is quite important, as it
is easy to make mistakes with the numbering when altering a rule. I
would have less use for the unused variable warnings. I think a use
of such a feature might be to turn it on sometimes, to check for
problems, but otherwise have it off.

Hans Aberg
Paul Eggert
2006-11-22 23:13:49 UTC
Permalink
Hans Aberg <***@math.su.se> writes:

> This can become cumbersome when writing:

OK, thanks, I now understand the problem better: the nonterminal names
can be quite long, and you want a shorter name in the action (which is
typically small, so it's ok to have short local names). In that case,
the syntax could have an identifier after the '#', as you suggested,
and this identifier would supersede the nonterminal's identifier
within the action.
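
A minimal Python sketch of the "supersede" rule described here, assuming the '#' separator from this sub-thread: once a symbol carries an identifier after '#', only that identifier is usable in the action, while untagged symbols keep their own name. The function name action_names is hypothetical, and a quoted literal that itself contains '#' is not handled.

```python
import re

def action_names(rhs):
    """Map each name usable in the action to its 1-based RHS position."""
    names = {}
    for n, item in enumerate(rhs.split(), start=1):
        symbol, _, alias = item.partition("#")
        if alias:
            names[alias] = n        # the alias supersedes the symbol's own name
        elif re.fullmatch(r"[A-Za-z_]\w*", symbol):
            names[symbol] = n       # untagged identifiers keep their own name
        # quoted literals such as '+' occupy a position but get no name
    return names

print(action_names("expression#x '+' expression#y"))
```

Note that "expression" itself is absent from the result for the tagged rule, so $expression would be an error there under this reading.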
Hans Aberg
2006-11-23 12:19:32 UTC
Permalink
On 23 Nov 2006, at 00:13, Paul Eggert wrote:

> Hans Aberg <***@math.su.se> writes:
>
>> This can become cumbersome when writing:
>
> OK, thanks, I now understand the problem better: the nonterminal names
> can be quite long, and you want a shorter name in the action (which is
> typically small, so it's ok to have short local names). In that case,
> the syntax could have an identifier after the '#', as you suggested,
> and this identifier would supersede the nonterminal's identifier
> within the action.

One is really identifying a parser stack location, from which
semantic value, location, and possibly more are extracted, like
perhaps token names (for, among other things, error messages) and
token values (needed when implementing language definitions). This is
why I am thinking in terms of OO like a 'struct' or 'class'.

One might extend the # notation to indicate actions. Then one might
simplify
expression:
INTEGRAL_NUMBER#x { $$ = $x; }
| '-' expression#x %prec NEGATION { $$ = -$x; }
| '(' expression#x ')' { $$ = $x; }
| expression#x '+' expression#y { $$ = $x + $y; }
| expression#x '-' expression#y { $$ = $x - $y; }
| expression#x '*' expression#y { $$ = $x * $y; }
| expression#x '/' expression#y { $$ = $x / $y; }
| expression#x '^' expression#y { $$ = pow($x, $y); }
;
to
expression:
INTEGRAL_NUMBER#x ##identity
| '-' expression#x %prec NEGATION ##neg
| '(' expression#x ')' ##identity
| expression#x '+' expression#y ##add
| expression#x '-' expression#y ##sub
| expression#x '*' expression#y ##mul
| expression#x '/' expression#y ##div
| expression#x '^' expression#y ##pow
;

identity { $$ = $x; }
neg { $$ = -$x; }
add { $$ = $x + $y; }
sub { $$ = $x - $y; }
mul { $$ = $x * $y; }
div { $$ = $x / $y; }
pow { $$ = pow($x, $y); }

The idea is to put the action definitions, at need, somewhere where
they do not make the grammar rules hard to read. In addition, action
definitions used more than once, like "identity" above, need not be
repeated in the grammar.

One must use a double hash "##", or another token than "#", as
a # b
is ambiguous: it can mean either a symbol with a named parser stack
(semantic) variable, or a symbol with no variable followed by a named
action.
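
The named-action idea amounts to a lookup table applied before the grammar is processed as usual. The Python sketch below expands a trailing ##name back into an inline action; it is an illustration of the proposal rather than anything Bison does, and the function name inline_actions and the dictionary layout are hypothetical.

```python
def inline_actions(alternatives, actions):
    """Replace a trailing ##name on each alternative with its action body."""
    expanded = []
    for alt in alternatives:
        rule, sep, name = alt.partition("##")
        if sep:
            body = actions[name.strip()]    # KeyError if the action is undefined
            expanded.append(f"{rule.rstrip()} {{ {body} }}")
        else:
            expanded.append(alt)            # no named action on this alternative
    return expanded

actions = {"identity": "$$ = $x;", "add": "$$ = $x + $y;"}
rules = [
    "INTEGRAL_NUMBER#x ##identity",
    "'(' expression#x ')' ##identity",
    "expression#x '+' expression#y ##add",
]
for line in inline_actions(rules, actions):
    print(line)
```

Since "##" is searched for as a unit, the single "#" in expression#x is left alone, which is the reason given above for not reusing "#" to introduce actions.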

Hans Aberg
Joel E. Denny
2006-11-23 06:02:39 UTC
Permalink
On Wed, 22 Nov 2006, Paul Eggert wrote:

> "Joel E. Denny" <***@ces.clemson.edu> writes:
>
> > When optionality and repetitions are
> > possible, do named semantic values even make sense?
>
> Sure. You can give names to the optional or repeated parts. The
> types associated with these parts are the equivalent of the ML "'a
> option" and "'a list" parameterized types.

Sorry, I have no experience with ML. Would you show me how this might
look in a Bison rule?

> No, but if the quality of our notation isn't much affected either way,

There's the rub.

> > At least () for named semantic values has some precedent in Lemon.
> >
> > With all that in mind, I say we stick with parentheses:
> >
> > a(name1): b c() d(@name2)
> >
> > For a, the value and location are $name1 and @name1.
> > For b, they are $b and @b.
> > For c and d, the values are declared unused.
> > For d, the location is @name2.
>
> I like the rule for 'b' -- that's simple.

Great.

> The rest works, but I'd
> like something simpler if possible.
>
> First, I forget: why is it important that values can be declared
> "unused"?

To suppress Bison warnings about unset/unused values.

> That's extra complexity -- is it really worth it?

I think so. Akim originally suggested the empty parentheses so the user
doesn't have to define the empty `#define USE(e)' we've been using in the
test suite. I like his suggestion because it puts more information into
Bison's hands than USE($$) does. For example, we can extend the
parentheses notation to %destructor, so you can write:

%destructor() { printf ("A TOK was discarded.\n"); } TOK

As for a rule action, the () would suppress warnings about the missing $$
in the %destructor action. Unlike USE($$), Bison would be able to
recognize the () as distinct from a mere use of $$. For a %destructor
then, () could also disable all %destructor-based warnings for TOK values
in rule actions. The logic is that, if the user has intentionally omitted
$$ from the %destructor action, then apparently there's no need to destroy
the value, so there's no need to warn about unset/unused TOK values in
rule actions.

> exp:
> NUM { $$ = $NUM; }
> | '-' exp %prec NEG { $$ = - $exp; }
> | '(' exp ')' { $$ = $exp; }
> | exp#1 '+' exp#2 { $$ = $exp#1 + $exp#2; }
> | exp#1 '-' exp#2 { $$ = $exp#1 - $exp#2; }
> | exp#1 '*' exp#2 { $$ = $exp#1 * $exp#2; }
> | exp#1 '/' exp#2 { $$ = $exp#1 / $exp#2; }
> | exp#1 '^' exp#2 { $$ = pow ($exp#1, $exp#2); }

If you prefer numbers, don't even use named values. Use $1 and $3. I
believe we'll continue to support them for backward compatibility
regardless of what happens with named values.

Well known, unevolving grammars with very short RHS's as above don't
really show how worthwhile named semantic values are. Instead, imagine a
very long RHS that keeps evolving. Renumbering is a maintenance problem.

Maybe it's just me, but I prefer Hans' suggestion:

exp/sum: exp/term1 '+' exp/term2

over

exp#sum: exp#term1 '+' exp#term2

I can't think of any semantic reason to prefer one over the other. The
slash is just a little easier on my eyes.

> If it's important to say a value can be unused, that could be written
> "a#-", or something like that.

I don't much like the `-' to mean nothing. I had originally suggested `!'
instead, but I found a reason why I don't like it either, and that reason
also applies to `-'. This one post might help you catch up:

http://lists.gnu.org/archive/html/bison-patches/2006-11/msg00039.html
Paul Eggert
2006-11-23 08:47:09 UTC
Permalink
"Joel E. Denny" <***@ces.clemson.edu> writes:

> Sorry, I have no experience with ML. Would you show me how this might
> look in a Bison rule?

Not offhand. C isn't ML, and we'd have to construct types or
something like that. It'd take some thinking. But the basic idea is
that the EBNF X* has type "list of whatever X returns", and X? maps to
"either an X-type value, or a null pointer".

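To make the typing idea concrete, here is a minimal C sketch of what "list of whatever X returns" and "an X-type value or a null pointer" could look like. Everything in it is an assumption for illustration: `x_value` stands in for X's semantic type, and `x_list`, `x_opt`, and the helper functions are hypothetical names, not anything Bison provides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical semantic type of X; an int keeps the sketch small. */
typedef int x_value;

/* X* : a singly linked list of whatever X returns. */
typedef struct x_list {
  x_value value;
  struct x_list *next;
} x_list;

/* X? : either a pointer to an X-type value, or a null pointer. */
typedef x_value *x_opt;

/* Append VALUE at the end of the list *HEAD, as a reduction of one more
   X might do.  Returns 0 on success, -1 if allocation fails. */
static int x_list_append(x_list **head, x_value value)
{
  assert(head != NULL);
  x_list *node = malloc(sizeof *node);
  if (node == NULL)
    return -1;
  node->value = value;
  node->next = NULL;
  while (*head != NULL)      /* walk to the final link */
    head = &(*head)->next;
  *head = node;
  return 0;
}

/* Number of X values collected so far; the empty repetition is NULL. */
static size_t x_list_length(const x_list *head)
{
  size_t n = 0;
  for (; head != NULL; head = head->next)
    n++;
  return n;
}

/* Release every node of the list. */
static void x_list_free(x_list *head)
{
  while (head != NULL) {
    x_list *next = head->next;
    free(head);
    head = next;
  }
}
```

An empty repetition is then simply a null `x_list *`, and an absent optional is a null `x_opt`.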
> Maybe it's just me, but I prefer Hans' suggestion:
>
> exp/sum: exp/term1 '+' exp/term2
>
> over
>
> exp#sum: exp#term1 '+' exp#term2
>
> I can't think of any semantic reason to prefer one over the other. The
> slash is just a little easier on my eyes.

I think "/" bugs me because it means "or" in ABNF, which is the
standard grammatical notation used in Internet RFCs; see
<http://www.ietf.org/rfc/rfc4234>. I could live with "/",
I suppose.

> I don't much like the `-' to mean nothing. I had originally suggested `!'
> instead, but I found a reason why I don't like it either, and that reason
> also applies to `-'. This one post might help you catch up:
>
> http://lists.gnu.org/archive/html/bison-patches/2006-11/msg00039.html

Sorry, I don't follow the argument there. How does it apply to "/"
(or "#" or whatever)?
Joel E. Denny
2006-11-24 17:30:18 UTC
Permalink
On Thu, 23 Nov 2006, Paul Eggert wrote:

> > I still wonder if ISO EBNF is the right language. Aren't most
> > Lex and Yacc users more familiar with notations like "(...)*", "(...)?",
> > and "(...)+"?
>
> Yes, quite likely. I wouldn't be a slave to ISO EBNF (particularly
> since we're already incompatible with it :-), but it can't hurt to be
> inspired by it.

Then we need not fuss over [] being for options anymore.

> > The argument there isn't about the choice of "/" or "#" or "()" or "[]".
> > It's about the choice of "!" (or "-" in the current discussion) to mean
> > nothing. I prefer the empty string to mean nothing.
>
> OK, how about this idea? If rules use the syntax S$A to mean that the
> symbol S has a value that can be called $A within an action, then
> let's use plain S to mean the symbol doesn't have a value.

What about default names? Must the user write?

grammar$grammar: rules$rules decls$decls epilogue$epilogue {
$grammar = new_grammar ($rules, $decls, $epilogue);
}
;

I prefer:

grammar: rules decls epilogue {
$grammar = new_grammar ($rules, $decls, $epilogue);
}
;

Also, why $ now instead of #? $ makes it look like it works for values
and not locations.
Joel E. Denny
2006-11-24 19:49:07 UTC
Permalink
On Fri, 24 Nov 2006, Joel E. Denny wrote:

> > OK, how about this idea? If rules use the syntax S$A to mean that the
> > symbol S has a value that can be called $A within an action, then
> > let's use plain S to mean the symbol doesn't have a value.
>
> What about default names?

Another problem is that all existing grammars would implicitly declare all
values to be unused even when they're not. For example:

exp: exp '+' exp { $$ = $1; } ;

There's no warning about $3.
Hans Aberg
2006-11-24 20:35:15 UTC
Permalink
On 24 Nov 2006, at 20:49, Joel E. Denny wrote:

>> What about default names?
>
> Another problem is that all existing grammars would implicitly
> declare all
> values to be unused even when they're not. For example:
>
> exp: exp '+' exp { $$ = $1; } ;
>
> There's no warning about $3.

You would need a rule telling, in the case of multiple occurrences,
when the default names can be used. One rule could be that only the
leftmost occurrence gets the default. So the above could be written
exp: exp '+' exp { $exp = $1; };
Or if the LHS does not get a default name:
exp: exp '+' exp { $$ = $exp; };
Alternatively, if there are multiple occurrences, none can be used.

Write the rule above properly:
exp: exp '+' exp { $$ = $1 + $3; };
with alternate forms (depending on which default rule is used):
exp: exp '+' exp { $exp = $1 + $3; };
exp: exp '+' exp { $$ = $exp + $3; };
Perhaps these forms are confusing: then none should be usable.
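
The two candidate rules for default names (leftmost occurrence wins, or no default when a symbol occurs more than once) can be sketched in C as follows. This is only an illustration under stated assumptions: it looks at the RHS alone and ignores whether the LHS takes the default name, and `resolve_default_name` and the `policy` enum are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* The two policies discussed above (hypothetical names). */
enum policy {
  LEFTMOST_WINS,      /* only the leftmost occurrence gets the default */
  AMBIGUOUS_IS_ERROR  /* with multiple occurrences, none can be used   */
};

/* Resolve the default name NAME against the N right-hand side symbols
   in RHS.  Returns the 1-based position the name refers to, 0 if no
   RHS symbol has that name, or -1 if the name is ambiguous under
   POLICY. */
static int resolve_default_name(const char *const *rhs, int n,
                                const char *name, enum policy policy)
{
  assert(rhs != NULL && name != NULL && n >= 0);
  int found = 0;
  for (int i = 0; i < n; i++) {
    if (strcmp(rhs[i], name) != 0)
      continue;
    if (found == 0)
      found = i + 1;              /* remember the leftmost match */
    else if (policy == AMBIGUOUS_IS_ERROR)
      return -1;                  /* second match: reject the name */
  }
  return found;
}
```

For the RHS `exp '+' exp`, the name `exp` resolves to position 1 under the first policy and is rejected under the second.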

As for warnings: isn't it best to treat them as compiler options? After
all, tokens can produce values as well. If one worries about unused
variables, one could turn these options temporarily on.

Hans Aberg
Joel E. Denny
2006-11-25 02:00:12 UTC
Permalink
On Fri, 24 Nov 2006, Hans Aberg wrote:

> You would need a rule telling, in the case of multiple occurrences, when the
> default names can be used.

I addressed the problem of ambiguous names here:

http://lists.gnu.org/archive/html/bison-patches/2006-11/msg00039.html
Hans Aberg
2006-11-25 12:18:12 UTC
Permalink
On 25 Nov 2006, at 03:00, Joel E. Denny wrote:

>> You would need a rule telling, in the case of multiple
>> occurrences, when the
>> default names can be used.
>
> I addressed the problem of ambiguous names here:
>
> http://lists.gnu.org/archive/html/bison-patches/2006-11/msg00039.html

I looked at it, before my post, now that the mail list server is up
again.

You are into the idea that the unused variables should be expressed
via the grammar language. It is a wholly unusual way to do it via a
computer language: normally it is a compiler option.

So I am working along the lines of a compiler option that can be
temporarily turned on in the case one needs to check unused
variables. This would then simplify the implementation of new grammar
language features, as the past discussion has shown.

I have myself little use of an unused variables feature, making it
difficult for me to focus on it - so I leave it to you to decide it
(which you, as developer, will do anyway :-)). My main concern is
that it should not conflict with anything else that might be
implemented later, like an eventual EBNF, or something else that
might come up.

Hans Aberg
Hans Aberg
2006-11-25 12:37:35 UTC
Permalink
On 25 Nov 2006, at 03:00, Joel E. Denny wrote:

>> You would need a rule telling, in the case of multiple
>> occurrences, when the
>> default names can be used.
>
> I addressed the problem of ambiguous names here:
>
> http://lists.gnu.org/archive/html/bison-patches/2006-11/msg00039.html

If you split up the problem of naming variables - language feature,
from the unused variables - a compiler option, then it might simplify
implementation. For example, one might write
exp : exp '+' exp { $$ = $1; }
%unused @2, $3
;
if now the location value of variable 2 and semantic value of
variable 3 are unused.

Then one is freed from restraints of the grammar language itself.

Hans Aberg
Hans Aberg
2006-11-25 15:37:18 UTC
Permalink
On 25 Nov 2006, at 03:00, Joel E. Denny wrote:

>> You would need a rule telling, in the case of multiple
>> occurrences, when the
>> default names can be used.
>
> I addressed the problem of ambiguous names here:
>
> http://lists.gnu.org/archive/html/bison-patches/2006-11/msg00039.html

The question of default names is somewhat similar to that one of
functions. What about:
int f(int) {
return f + 1;
}
Here I use the function name as default for the variable name. I do
not write this as polemics - the situation is not exactly the same,
but similar.

But if "/" (say) is used to indicate a variable, the default naming
will need another symbol to indicate the absence of a variable name.
It is rather tight to use another symbol than "/", so perhaps "//"
then. Thus:
expression:
INTEGRAL_NUMBER// #identity_INTEGRAL_NUMBER
| '-' expression/x %prec NEGATION #neg
| '(' expression// ')' #identity_expression
| expression// '+' expression/y #add
| expression/x '-' expression// #sub
| expression// '*' expression/y #mul
| expression/x '/' expression// #div
| expression// '^' expression/y #pow
;

identity_INTEGRAL_NUMBER { $$ = $INTEGRAL_NUMBER; }
neg { $$ = -$expression ; }
identity_expression { $$ = $expression ; }
add { $$ = $expression + $y; }
sub { $$ = $x - $expression; }
mul { $$ = $expression * $y; }
div { $$ = $x / $expression; }
pow { $$ = pow($expression, $y); }

Another idea is to admit say "0" for the empty expansion: it is
normal to have a symbol for it in grammars, and the flexible grammar
of Bison makes mistakes easy. Otherwise, one might have:

expression:
INTEGRAL_NUMBER/0 #identity_INTEGRAL_NUMBER
| '-' expression/x %prec NEGATION #neg
| '(' expression/0 ')' #identity_expression
| expression/0 '+' expression/y #add
| expression/x '-' expression/0 #sub
| expression/0 '*' expression/y #mul
| expression/x '/' expression/0 #div
| expression/0 '^' expression/y #pow
;
...

Hans Aberg
Hans Aberg
2006-11-25 17:37:40 UTC
Permalink
On 25 Nov 2006, at 03:00, Joel E. Denny wrote:

>> You would need a rule telling, in the case of multiple
>> occurrences, when the
>> default names can be used.
>
> I addressed the problem of ambiguous names here:
>
> http://lists.gnu.org/archive/html/bison-patches/2006-11/msg00039.html

Here is another attempt at analyzing what is going on:

Put in variable defaults in a C-function:
int add(int(), int y) {
return add + y;
}

int() add(int x, int y) {
add = x + y;
}
Only the last one might have some acceptance.

Now, in this case, what is the name of the function? If a CFG rule is
written
x -> x_1 ... x_k
then it seems that the name is "->", i.e., one has in reality
->(x, x_1, ..., x_k)
In other words, the values, semantic, location, possibly more are not
functions of the grammar variables in the rules, but really of the
rule position. The grammar variables already act as plugin values in
these rules.

This last principle might explain why it is so hard to get a good way
of naming the default variables. It makes me worry whether the idea
is right.

Therefore I end up with an outline of a proposal as follows:

* Rule variables can be named using a "/".
* Actions can be named using a "#".
* The empty rule can be given the name "0". If the compiler should
require this use (resp. absence of it), there are options
%empty-named (resp. %empty-not-named).
* The Bison compiler has an option where warnings against unused
variables can be turned on. When on, the behavior of this option can
be fine tuned using option %unused, %used, (or %unused-value,
%used-value) applicable to rule variables and their subcomponents, as
well as on tokens.

So it would look like

%empty-named
%used-value INTEGRAL_NUMBER

expression-sequence:
0
| expression-sequence ',' expression
;

expression:
INTEGRAL_NUMBER/x #identity
| '-' expression/x %prec NEGATION #neg
| '(' expression/x ')' #identity
| expression/x '+' expression/y #add
| expression/x '-' expression/y #sub
| expression/x '*' expression/y #mul
| expression/x '/' expression/y #div
| expression/x '^' expression/y #pow
;

identity { $$ = $x; }
neg { $$ = -$x; }
add { $$ = $x + $y; }
sub { $$ = $x - $y; }
mul { $$ = $x * $y; }
div { $$ = $x / $y; }
pow { $$ = pow($x, $y); }

Etc.

Hans Aberg
Hans Aberg
2006-11-26 12:30:12 UTC
Permalink
On 25 Nov 2006, at 03:00, Joel E. Denny wrote:

>> You would need a rule telling, in the case of multiple
>> occurrences, when the
>> default names can be used.
>
> I addressed the problem of ambiguous names here:
>
> http://lists.gnu.org/archive/html/bison-patches/2006-11/msg00039.html

Here is another way of analyzing what is going on.

Suppose the grammar and action language identifiers have different
syntax. If the former, but not the latter, accepts "-" in
identifiers, then one could write:

arithmetic-expression:
INTEGRAL_NUMBER// #identity_INTEGRAL_NUMBER
| '-' arithmetic-expression/x %prec NEGATION #neg
| '(' arithmetic-expression// ')' #identity_expression
| arithmetic-expression// '+' arithmetic-expression/y #add
| arithmetic-expression/x '-' arithmetic-expression// #sub
| arithmetic-expression// '*' arithmetic-expression/y #mul
| arithmetic-expression/x '/' arithmetic-expression// #div
| arithmetic-expression// '^' arithmetic-expression/y #pow
;

identity_INTEGRAL_NUMBER { $$ = $INTEGRAL_NUMBER; }
neg { $$ = -$arithmetic-expression ; }
identity_expression { $$ = $arithmetic-expression; }
add { $$ = $arithmetic-expression + $y; }
sub { $$ = $x - $arithmetic-expression; }
mul { $$ = $arithmetic-expression * $y; }
div { $$ = $x / $arithmetic-expression; }
pow { $$ = pow($arithmetic-expression, $y); }

Here, it does not work in the actions, because the variable names
contain "-". The reason is that syntax of two different languages
(i.e., grammar and action languages) are mixed indiscriminately.
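
The clash can be seen by scanning the name after `$` with C identifier rules. The sketch below is an illustration only, assuming the action scanner accepts letters, digits, and underscores in names; `c_identifier_length` is a hypothetical helper, not a Bison function.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Length of the longest C-style identifier at the start of S, or 0 if
   S does not begin with a letter or an underscore. */
static size_t c_identifier_length(const char *s)
{
  assert(s != NULL);
  if (!(isalpha((unsigned char) s[0]) || s[0] == '_'))
    return 0;
  size_t n = 1;
  while (isalnum((unsigned char) s[n]) || s[n] == '_')
    n++;
  return n;
}
```

Applied to `arithmetic-expression`, the scan stops at the `-`, so `$arithmetic-expression` would read as `$arithmetic` minus `expression`, whereas `INTEGRAL_NUMBER` is consumed whole.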

So, strictly speaking, the explicitly indicated variables should
follow the action language syntax, though that may not be fully
possible. One way to ensure this, though, might be to somehow admit
C-strings as variable names. For example:
expression:
...
| expression/"#1" '+' expression/"#2" { $#1 + $#2 }
...
;
if there is an action variable that admits "#" in the variable names.

Hans Aberg
Hans Aberg
2006-11-24 19:07:28 UTC
Permalink
On 24 Nov 2006, at 18:30, Joel E. Denny wrote:

>>> I still wonder if ISO EBNF is the right language. Aren't most
>>> Lex and Yacc users more familiar with notations like "(...)*",
>>> "(...)?",
>>> and "(...)+"?
>>
>> Yes, quite likely. I wouldn't be a slave to ISO EBNF (particularly
>> since we're already incompatible with it :-), but it can't hurt to be
>> inspired by it.
>
> Then we need not fuss over [] being for options anymore.

Not for the sake of this standard.

But otherwise, it seems good to do what Flex is already doing. It
might be confusing switching syntax.

>>> The argument there isn't about the choice of "/" or "#" or "()"
>>> or "[]".
>>> It's about the choice of "!" (or "-" in the current discussion)
>>> to mean
>>> nothing. I prefer the empty string to mean nothing.
>>
>> OK, how about this idea? If rules use the syntax S$A to mean that
>> the
>> symbol S has a value that can be called $A within an action, then
>> let's use plain S to mean the symbol doesn't have a value.
>
> What about default names? Must the user write?
>
> grammar$grammar: rules$rules decls$decls epilogue$epilogue {
> $grammar = new_grammar ($rules, $decls, $epilogue);
> }
> ;
>
> I prefer:
>
> grammar: rules decls epilogue {
> $grammar = new_grammar ($rules, $decls, $epilogue);
> }
> ;

If you think those are needed.

> Also, why $ now instead of #? $ makes it look like it works for
> values
> and not locations.

This is why I think OO: The variables indicated in the grammar rules
really are parser stack locations.
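
As an illustration of the stack-location view, here is a minimal C sketch. It assumes a yacc-style parser in which the last RHS symbol of the rule being reduced sits on top of the stack, so the K-th of N symbols is found at offset K - N from the top. The names `slot`, `rhs_stack_offset`, and `rhs_slot` are hypothetical, and keeping the value and the location in one struct is a simplification made for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* One parser stack entry: a semantic value and a location (here just a
   line number).  Both belong to the stack position, not to the grammar
   symbol's name. */
typedef struct {
  int value;
  int line;
} slot;

/* Offset from the top of the stack of the K-th RHS symbol (1-based) in
   a rule with N RHS symbols.  The last symbol is at offset 0. */
static int rhs_stack_offset(int k, int n)
{
  assert(1 <= k && k <= n);
  return k - n;
}

/* Stack entry of the K-th RHS symbol, given a pointer TOP to the
   topmost entry. */
static const slot *rhs_slot(const slot *top, int k, int n)
{
  assert(top != NULL);
  return top + rhs_stack_offset(k, n);
}
```

Under this view a name, whether written with `$` or `@`, would only select a position; the value and the location are both read from the entry at that position.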
Joel E. Denny
2006-11-24 00:42:29 UTC
Permalink
On Thu, 23 Nov 2006, Paul Eggert wrote:

> "Joel E. Denny" <***@ces.clemson.edu> writes:
>
> > Sorry, I have no experience with ML. Would you show me how this might
> > look in a Bison rule?
>
> Not offhand. C isn't ML, and we'd have to construct types or
> something like that. It'd take some thinking. But the basic idea is
> that the EBNF X* has type "list of whatever X returns", and X? maps to
> "either an X-type value, or a null pointer".

And what if X is an alternation of symbols with different types?

Well, this reminds me of the proposal I made for treating the error token
as a nonterminal. (I posted it to Bison patches on Aug. 21, but the
archives are down, so I can't get the URL right now.) I suppose it would
be possible for the user to designate a special ebnf symbol and type and
then provide generic rules similar to the rules I was proposing for the
error token:

%ebnf-sym ebnf;
%type <list> ebnf;
ebnf:
/* empty */ { /* Init $$. */ }
| ebnf sym1 { /* Append sym1 to $$. */ }
| ebnf SYM2 { /* Append SYM2 to $$. */ }
| ebnf <field1> { /* Automatic rules for syms of type <field1>. */ }
| ebnf <field2> { /* Automatic rules for syms of type <field2>. */ }
| ebnf <list> { /* Automatic rules for syms of type <list>. */ }
| ebnf <*> { /* Default rule for tagged symbol? */ }
| ebnf <> { /* Default rule for tagless symbol? */ }
;

Bison would not allow rules of any other form when the LHS is the ebnf
symbol.

Now, borrowing a part of Akim's example, the alternations and repetitions
below would be constructed using the generic rules above:

options: (opt1 | opt2) (',' (opt1 | opt2))*

There are two groups on the RHS of this rule. In this rule's semantic
action, each group would have the type of the ebnf symbol, <list>. The
values are $1 and $2, but if you want to name the values:

options: (opt1 | opt2)[a] (',' (opt1 | opt2))*[b]

Now, $a and $b will work.

Ok, I guess I'm on board with EBNF... as a very distant prospect.
However, I still wonder if ISO EBNF is the right language. Aren't most
Lex and Yacc users more familiar with notations like "(...)*", "(...)?",
and "(...)+"?

> I think "/" bugs me because it means "or" in ABNF, which is the
> standard grammatical notation used in Internet RFCs; see
> <http://www.ietf.org/rfc/rfc4234>.

That's fine.

> > I don't much like the `-' to mean nothing. I had originally suggested `!'
> > instead, but I found a reason why I don't like it either, and that reason
> > also applies to `-'. This one post might help you catch up:
> >
> > http://lists.gnu.org/archive/html/bison-patches/2006-11/msg00039.html
>
> Sorry, I don't follow the argument there. How does it apply to "/"
> (or "#" or whatever)?

The argument there isn't about the choice of "/" or "#" or "()" or "[]".
It's about the choice of "!" (or "-" in the current discussion) to mean
nothing. I prefer the empty string to mean nothing.
Paul Eggert
2006-11-24 07:13:56 UTC
Permalink
"Joel E. Denny" <***@ces.clemson.edu> writes:

> And what if X is an alternation of symbols with different types?

Its type would be their discriminated union.
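
In C, a discriminated union is usually spelled as a tag plus a union. The sketch below assumes, purely for illustration, an alternation `(opt1 | opt2)` where opt1 carries an int and opt2 a string; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Which alternative of (opt1 | opt2) was matched. */
typedef enum { ALT_OPT1, ALT_OPT2 } alt_tag;

/* Value of the alternation: the tag says which union member is valid. */
typedef struct {
  alt_tag tag;
  union {
    int opt1;          /* assumed type of opt1 */
    const char *opt2;  /* assumed type of opt2 */
  } u;
} alt_value;

/* Wrap an opt1 value. */
static alt_value alt_from_opt1(int v)
{
  alt_value a;
  a.tag = ALT_OPT1;
  a.u.opt1 = v;
  return a;
}

/* Wrap an opt2 value. */
static alt_value alt_from_opt2(const char *s)
{
  alt_value a;
  a.tag = ALT_OPT2;
  a.u.opt2 = s;
  return a;
}

/* Read the opt1 member; the tag must say it is the valid one. */
static int alt_get_opt1(const alt_value *a)
{
  assert(a != NULL && a->tag == ALT_OPT1);
  return a->u.opt1;
}

/* Read the opt2 member; the tag must say it is the valid one. */
static const char *alt_get_opt2(const alt_value *a)
{
  assert(a != NULL && a->tag == ALT_OPT2);
  return a->u.opt2;
}
```

An action receiving such a value would have to check the tag before touching either member.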

> I still wonder if ISO EBNF is the right language. Aren't most
> Lex and Yacc users more familiar with notations like "(...)*", "(...)?",
> and "(...)+"?

Yes, quite likely. I wouldn't be a slave to ISO EBNF (particularly
since we're already incompatible with it :-), but it can't hurt to be
inspired by it.

> The argument there isn't about the choice of "/" or "#" or "()" or "[]".
> It's about the choice of "!" (or "-" in the current discussion) to mean
> nothing. I prefer the empty string to mean nothing.

OK, how about this idea? If rules use the syntax S$A to mean that the
symbol S has a value that can be called $A within an action, then
let's use plain S to mean the symbol doesn't have a value. That's
even shorter, and simpler. So, something like this:

exp:
NUM$a { $$ = $a; }
| '-' exp$a %prec NEG { $$ = - $a; }
| '(' exp$a ')' { $$ = $a; }
| exp$a '+' exp$b { $$ = $a + $b; }
| exp$a '-' exp$b { $$ = $a - $b; }
| exp$a '*' exp$b { $$ = $a * $b; }
| exp$a '/' exp$b { $$ = $a / $b; }
| exp$a '^' exp$b { $$ = pow ($a, $b); }

We issue a diagnostic if the user attempts to combine this new
notation with the old $1, $2, $3 notation.
Hans Aberg
2006-11-23 12:59:41 UTC
Permalink
On 23 Nov 2006, at 09:47, Paul Eggert wrote:

>> Maybe it's just me, but I prefer Hans' suggestion:
>>
>> exp/sum: exp/term1 '+' exp/term2
>>
>> over
>>
>> exp#sum: exp#term1 '+' exp#term2
>>
>> I can't think of any semantic reason to prefer one over the
>> other. The
>> slash is just a little easier on my eyes.
>
> I think "/" bugs me because it means "or" in ABNF, which is the
> standard grammatical notation used in Internet RFCs; see
> <http://www.ietf.org/rfc/rfc4234>. I could live with "/",
> I suppose.

Here are the two variations, for comparisons:

expression:
INTEGRAL_NUMBER/x { $$ = $x; }
| '-' expression/x %prec NEGATION { $$ = -$x; }
| '(' expression/x ')' { $$ = $x; }
| expression/x '+' expression/y { $$ = $x + $y; }
| expression/x '-' expression/y { $$ = $x - $y; }
| expression/x '*' expression/y { $$ = $x * $y; }
| expression/x '/' expression/y { $$ = $x / $y; }
| expression/x '^' expression/y { $$ = pow($x, $y); }
;

expression:
INTEGRAL_NUMBER#x { $$ = $x; }
| '-' expression#x %prec NEGATION { $$ = -$x; }
| '(' expression#x ')' { $$ = $x; }
| expression#x '+' expression#y { $$ = $x + $y; }
| expression#x '-' expression#y { $$ = $x - $y; }
| expression#x '*' expression#y { $$ = $x * $y; }
| expression#x '/' expression#y { $$ = $x / $y; }
| expression#x '^' expression#y { $$ = pow($x, $y); }
;

One advantage of the first one, using "/", is that it frees "#" for
indicating actions, if it should be combined with my other proposal.
Then it will look like:

expression:
INTEGRAL_NUMBER/x #identity
| '-' expression/x %prec NEGATION #neg
| '(' expression/x ')' #identity
| expression/x '+' expression/y #add
| expression/x '-' expression/y #sub
| expression/x '*' expression/y #mul
| expression/x '/' expression/y #div
| expression/x '^' expression/y #pow
;

identity { $$ = $x; }
neg { $$ = -$x; }
add { $$ = $x + $y; }
sub { $$ = $x - $y; }
mul { $$ = $x * $y; }
div { $$ = $x / $y; }
pow { $$ = pow($x, $y); }

I think this last variation makes the grammar stand out quite
clearly.

One problem though is that the variable definitions end up at
different places. Then one might combine the different ideas to arrive
at:

expression:
INTEGRAL_NUMBER/x #identity(x)
| '-' expression/x %prec NEGATION #neg(x)
| '(' expression/x ')' #identity(x)
| expression/x '+' expression/y #add(x, y)
| expression/x '-' expression/y #sub(x, y)
| expression/x '*' expression/y #mul(x, y)
| expression/x '/' expression/y #div(x, y)
| expression/x '^' expression/y #pow(x, y)
;

identity(x) { $$ = $x; }
neg(x) { $$ = -$x; }
add(x, y) { $$ = $x + $y; }
sub(x, y) { $$ = $x - $y; }
mul(x, y) { $$ = $x * $y; }
div(x, y) { $$ = $x / $y; }
pow(x, y) { $$ = pow($x, $y); }

But I am not sure it adds anything. I put in this example to see
where things are heading. Apparently, the actions act as implicit
functions, and the grammar variables insert values into those functions.

Hans Aberg
Hans Aberg
2006-11-23 12:31:39 UTC
Permalink
On 23 Nov 2006, at 09:47, Paul Eggert wrote:

>> Maybe it's just me, but I prefer Hans' suggestion:
>>
>> exp/sum: exp/term1 '+' exp/term2
>>
>> over
>>
>> exp#sum: exp#term1 '+' exp#term2
>>
>> I can't think of any semantic reason to prefer one over the
>> other. The
>> slash is just a little easier on my eyes.
>
> I think "/" bugs me because it means "or" in ABNF, which is the
> standard grammatical notation used in Internet RFCs; see
> <http://www.ietf.org/rfc/rfc4234>.

As for BNF extensions, I think Bison will have to get its own
variation. These standards were cooked up with other objectives in
mind.

> I could live with "/",
> I suppose.

I think "/" could be used in math, but there one is used to overloading
symbols. I thought at first "#" might look too heavy, but in the
calculator example, it looks fine. And "#" is used in TeX in connection
with variables.

>> I don't much like the `-' to mean nothing. I had originally
>> suggested `!'
>> instead, but I found a reason why I don't like it either, and that
>> reason
>> also applies to `-'. This one post might help you catch up:
>>
>> http://lists.gnu.org/archive/html/bison-patches/2006-11/msg00039.html
>
> Sorry, I don't follow the argument there. How does it apply to "/"
> (or "#" or whatever)?

And when I tried to look, the server was down.

Hans Aberg
Hans Aberg
2006-11-21 22:28:25 UTC
Permalink
On 21 Nov 2006, at 22:39, Joel E. Denny wrote:

>> But here, the LHSs should somehow be adapted to Bison notation.
>
> I think this is the key. We can't hope to avoid collisions with every
> notation Bison might possibly want to use in the future or that users
> might already be familiar with.

The problem is that EBNF is so fundamental, it is important to find a
good notation of it.

> There are only so many characters on the
> keyboard.

Well, Unicode has some 100000 plus characters, and one might
substitute poor-mans-ASCII for some of them.

> Bison will have to develop its own alternatives in some cases.
> However, using () in a grammar to group symbols is pervasive in my
> experience, and so we might ought to avoid using it for other
> purposes.

So it seems.

> I suspect that using [] in a grammar to mean optional is not nearly as
> common.

I think it is quite common, in fact.

>> I think "[ ]" will work much
>> better than "(...)?", which is sometimes used - one needs to
>> attach actions to
>> these EBNF constructs as well, and postfix operators like "?" may
>> cause
>> problems.
>
> I don't follow.

It might cause a long lookahead, though I do not know if it is needed
in this case. I had to implement the implicit set notation used in a
metamath book
{f|P}_x
where x is the variable bound by the set construct. Then one does not
know what variable is bound until after the "_x" subscript has been
read, making it hard to parse f and P. So I changed it to:
{_x f|P}

Perhaps not a problem here, as "?" will be a key symbol. Just came to
my mind.

Hans Aberg
Joel E. Denny
2006-11-21 22:41:53 UTC
Permalink
On Tue, 21 Nov 2006, Hans Aberg wrote:

> The problem is that EBNF is so fundamental, it is important to find a good
> notation of it.

Bison will already need an alternative for {}.

> > There are only so many characters on the
> > keyboard.
>
> Well, Unicode has some 100000 plus characters, and one might substitute
> poor-mans-ASCII for some of them.

Do you have a specific proposal in mind?

> > I suspect that using [] in a grammar to mean optional is not nearly as
> > common.
>
> I think it is quite common, in fact.

I don't mean to say it isn't. I mean to say that it's less common than
(), and we have to pick something.
Hans Aberg
2006-11-22 12:25:00 UTC
Permalink
On 21 Nov 2006, at 23:41, Joel E. Denny wrote:

>> The problem is that EBNF is so fundamental, it is important to
>> find a good
>> notation of it.
>
> Bison will already need an alternative for {}.

The proposal I made does not use "{ }", except in Bison actions.

>>> There are only so many characters on the
>>> keyboard.
>>
>> Well, Unicode has some 100000 plus characters, and one might
>> substitute
>> poor-mans-ASCII for some of them.
>
> Do you have a specific proposal in mind?

I already gave one: using U+2192 '→', which in ASCII looks like "->".

>>> I suspect that using [] in a grammar to mean optional is not
>>> nearly as
>>> common.
>>
>> I think it is quite common, in fact.
>
> I don't mean to say it isn't. I mean to say that it's less common
> than
> (), and we have to pick something.

I think "[ ]" for optional looks good. You seem to have made your
mind about these variables, and want to adapt the other stuff around
it. I wonder if this is wise: if clashes can be avoided that way, the
combination might be cumbersome.

Hans Aberg
Joel E. Denny
2006-11-22 14:44:41 UTC
Permalink
On Wed, 22 Nov 2006, Hans Aberg wrote:

> The proposal I made does not use "{ }", except in Bison actions.

Sorry, I was thinking of ISO EBNF.

> > Do you have a specific proposal in mind?
>
> I already gave one: using U+2192 '→', which in ASCII looks like "->".

First, given you've still chosen an ASCII representation, I don't see the
benefit of Unicode here. Second, I don't see how this addresses the issue
we're discussing anyway: names for semantic values and locations.

> You seem to have made your mind about
> these variables, and want to adapt the other stuff around it. I wonder if this
> is wise: if clashes can be avoided that way, the combination might be
> cumbersome.

I tried considering alternatives, but I think it may be a lost cause.
More importantly, when it comes to EBNF, I'm not sure it's worthwhile.
Do you know of someone who's actually going to contribute EBNF support to
Bison?
Hans Aberg
2006-11-22 15:44:46 UTC
Permalink
On 22 Nov 2006, at 15:44, Joel E. Denny wrote:

>> The proposal I made does not use "{ }", except in Bison actions.
>
> Sorry, I was thinking of ISO EBNF.

I looked at that one a long time ago, and I think it was strange in
some respects, though I do not immediately recall what. Better to
design a grammar of one's own.

>>> Do you have a specific proposal in mind?
>>
>> I already gave one: using U+2192 '→', which in ASCII looks like
>> "->".
>
> First, given you've still chosen an ASCII representation, I don't
> see the
> benefit of Unicode here.

Or vice versa: choosing a Unicode representation makes ASCII
unnecessary, except for those poor guys that do not have a UTF-8
editor. But then for the latter, one might make separate
Unicode-ASCII translators.

> Second, I don't see how this addresses the issue
> we're discussing anyway: names for semantic values and locations.

Overuse of tokens may cause grammar conflicts.

>> You seem to have made your mind about
>> these variables, and want to adapt the other stuff around it. I
>> wonder if this
>> is wise: if clashes can be avoided that way, the combination might be
>> cumbersome.
>
> I tried considering alternatives, but I think it may be a lost cause.
> More importantly, when it comes to EBNF, I'm not sure it's worthwhile.
> Do you know of someone who's actually going to contribute EBNF
> support to
> Bison?

It has popped up from time to time in Help-Bison, and the last time Akim
seemed to be interested, given that Bison already has all the
features making an implementation easy: a .y grammar, and implicit
grammar variables, already used for the implementation of mid-rule
actions. Therefore, I wrote this EBNF proposal in Bug-Bison.

Hans Aberg
Joel E. Denny
2006-11-22 16:27:02 UTC
Permalink
On Wed, 22 Nov 2006, Hans Aberg wrote:

> Or vice versa: choosing a Unicode representation makes ASCII unnecessary,
> except for those poor guys that do not have a UTF-8 editor. But then for this
> latter, one might make separate Unicode-ASCII translators.

If the ASCII looks fine and is necessary anyway, why bother with Unicode?

> > Second, I don't see how this addresses the issue
> > we're discussing anyway: names for semantic values and locations.
>
> Overuse of tokens may cause grammar conflicts.

That sounds nice in general, but I don't see a solution to our problem
developing.

> It has popped up from time to time in Help-Bison, and the last time Akim seemed to be
> interested

If you're referring to the discussion in March, I see that he asked how it
would work, which I still don't understand. I saw no expression of
interest afterwards. Did I miss a message?
Hans Aberg
2006-11-22 18:04:57 UTC
Permalink
On 22 Nov 2006, at 17:27, Joel E. Denny wrote:

>> Or vice versa: choosing a Unicode representation makes ASCII
>> unnecessary,
>> except for those poor guys that do not have a UTF-8 editor. But
>> then for this
>> latter, one might make separate Unicode-ASCII translators.
>
> If the ASCII looks fine and is necessary anyway, why bother with
> Unicode?

Don't know. I suggest to use Unicode when the ASCII isn't necessary
and looks awful.

>>> Second, I don't see how this addresses the issue
>>> we're discussing anyway: names for semantic values and locations.
>>
>> Overuse of tokens may cause grammar conflicts.
>
> That sounds nice in general, but I don't see a solution to our problem
> developing.

You insist on using "( )", etc.

>> It has popped up from time [to time] in Help-Bison, and the last
>> time Akim seemed to be
>> interested
>
> If you're referring to the discussion in March, I see that he asked
> how it
> would work, which I still don't understand. I saw no expression of
> interest afterwards. Did I miss a message?

I saw no reaction after my EBNF proposal. This is why I cc him and
Paul Eggert, to see if there can be clarification. Should Bison have
EBNF sometime in the future or not, that is the question.

Hans Aberg
Joel E. Denny
2006-11-22 19:22:31 UTC
On Wed, 22 Nov 2006, Hans Aberg wrote:

> You insist on using "( )", etc.

I'm not insisting. However, no one has proposed an alternative that
works. As far as I can tell, you have agreed that there are problems with
"exp.sum", "exp/sum", and "sum:exp". You keep suggesting Unicode but you
won't make a specific proposal other than something related to "->", which
is not what we're talking about.
Hans Aberg
2006-11-22 19:35:42 UTC
On 22 Nov 2006, at 20:22, Joel E. Denny wrote:

>> You insist on using "( )", etc.
>
> I'm not insisting. However, no one has proposed an alternative that
> works.

I think it is a tricky question. I would not want to rush ahead with it.

> As far as I can tell, you have agreed that there are problems with
> "exp.sum", "exp/sum", and "sum:exp".

No. Only the last one. You claimed there were some tokenization
problems with the first two, but I didn't see it.

> You keep suggesting Unicode but you
> won't make a specific proposal other than something related to "->",
> which is not what we're talking about.

One would throw them in as needed, when sitting down and trying to
compile the grammar.

Hans Aberg
Joel E. Denny
2006-11-23 06:03:34 UTC
On Wed, 22 Nov 2006, Hans Aberg wrote:

> On 22 Nov 2006, at 20:22, Joel E. Denny wrote:
>
> > > You insist on using "( )", etc.
> >
> > I'm not insisting. However, no one has proposed an alternative that
> > works.
>
> I think it is a tricky question. I would not want to rush ahead with it.

For the record, we've been discussing these issues off and on for roughly
a year now. The named value discussion appears to have started at least
several years before that. I wasn't working on Bison then though.

> > As far as I can tell, you have agreed that there are problems with
> > "exp.sum", "exp/sum", and "sum:exp".
>
> No. Only the last one. You claimed there were some tokenization problems with
> the first two, but I didn't see it.

I'm guessing it'll come up as I catch Paul up, so I'll explain it again
then.
Hans Aberg
2006-11-23 12:00:24 UTC
On 23 Nov 2006, at 07:03, Joel E. Denny wrote:

>> I think it is a tricky question. I would not want to rush ahead
>> with it.
>
> For the record, we've been discussing these issues off and on for
> roughly
> a year now. The named value discussion appears to have started at
> least
> several years before that. I wasn't working on Bison then though.

So it is tricky then.

>>> As far as I can tell, you have agreed that there are problems with
>>> "exp.sum", "exp/sum", and "sum:exp".
>>
>> No. Only the last one. You claimed there were some tokenization
>> problems with
>> the first two, but I didn't see it.
>
> I'm guessing it'll come up as I catch Paul up, so I'll explain it
> again
> then.

I like Paul's suggestion of using "#" to indicate variable names, so
that one could be used, in my opinion.

Hans Aberg