This program reads C source code from a file and writes RISC-V assembly to another file.
The input file is expected to be preprocessed ANSI C, also known as C89 or C90.
A lexer is a tool that processes an input sequence of characters (usually source code) and converts it into a sequence of tokens. Each token represents a meaningful element, such as a keyword (`while`), an identifier (`x`), an integer constant (`42`), or an operator (`+=`). These tokens are then passed to the parser, which is the next stage of compilation.
In this project, we use Flex (Fast Lexical Analyzer Generator), a tool that generates a lexer from a list of rules. Each rule pairs a pattern (a quoted string or a regular expression) with an action that runs when the pattern matches. In the rules below, each action calls `count()`, stores the token's value in `yylval` where one is needed (identifiers and integer constants), and returns a token code. A sample of the rules is shown below:

```lex
"sizeof" { count(); return(SIZEOF); }
"static" { count(); return(STATIC); }
"struct" { count(); return(STRUCT); }
"switch" { count(); return(SWITCH); }
"typedef" { count(); return(TYPEDEF); }
"union" { count(); return(UNION); }
"unsigned" { count(); return(UNSIGNED); }
"void" { count(); return(VOID); }
"volatile" { count(); return(VOLATILE); }
"while" { count(); return(WHILE); }
{L}({L}|{D}|[_])* { count(); yylval.my_string = new std::string(yytext); return(IDENTIFIER); }
[0-9]+ { count(); yylval.my_int = std::stoi(yytext); return(INTNUM); }
"..." { count(); return(ELLIPSIS); }
">>=" { count(); return(RIGHT_ASSIGN); }
"<<=" { count(); return(LEFT_ASSIGN); }
"+=" { count(); return(ADD_ASSIGN); }
"-=" { count(); return(SUB_ASSIGN); }
"*=" { count(); return(MUL_ASSIGN); }
"/=" { count(); return(DIV_ASSIGN); }
"%=" { count(); return(MOD_ASSIGN); }
"&=" { count(); return(AND_ASSIGN); }
"^=" { count(); return(XOR_ASSIGN); }
```
A parser takes the sequence of tokens generated by the lexer and analyzes its grammatical structure according to the rules of a formal grammar. It constructs a parse tree (or syntax tree) that represents the hierarchical structure of the input. Later stages, such as semantic analysis and code generation, depend on this tree. In this project, code generation produces the RISC-V assembly.
For this project, we use Bison, a tool that takes a grammar specification and automatically generates a parser in C or C++. An excerpt of the grammar is shown below:

```yacc
%type <my_program_list> PROGRAM_LIST
%type <my_enum_list> enum_list
%type <my_enum_assign> enum_assign
%type <my_struct_list> struct_list
%type <my_struct_assign> struct_assign
%start ROOT
%%
ROOT
: PROGRAM_LIST {cout<<"program->root"<<endl; g_root = $1; }
// | function_definition {g_root = $1;std::cout << "root reached" << std::endl;}
;
PROGRAM_LIST
: PROGRAM {$$ = new ProgramList(); $$->addProgram($1);cout << "program_list" << std::endl;}
| PROGRAM_LIST PROGRAM {$$ = $1; $$->addProgram($2);cout << "program_list_add" << std::endl;}
;
PROGRAM
: function_definition {$$ = $1;}
| enum_declaration {$$ = $1;}
| struct_declaration {$$ = $1;}
| declaration_statement {$$ = $1;}
| array {$$ = $1;}
| TYPEDEF type IDENTIFIER ';' {$$ = new Typedef(*$2, *$3);}
;
function_definition
: type IDENTIFIER '(' ')' compound_statement {$$ = new FunctionDefination(*$1, *$2, $5); cout<<"function_definition no argument"<<endl;}
| type IDENTIFIER '(' parameter_list ')' compound_statement {$$ = new FunctionDefination(*$1, *$2, $6, $4); cout<<"function_definition with argument"<<endl;}
| type IDENTIFIER '(' parameter_list ')' ';' {$$ = new FunctionDefination(*$1, *$2, NULL, $4); cout<<"empty_function_definition"<<endl;}
| type IDENTIFIER '(' ')' ';' {$$ = new FunctionDefination(*$1, *$2);}
;
struct_reference
: STRUCT IDENTIFIER IDENTIFIER ';' {$$ = new StructReference(*$2, *$3);}
;
```