A compiler for the C language

This program reads C source code from a file and writes RISC-V assembly to another file.

The input is assumed to be pre-processed ANSI C (also called C89 or C90).

Lexer (Lexical Analyzer)

A lexer reads an input sequence of characters (usually source code) and converts it into a sequence of tokens. Each token represents a meaningful element of the language, such as:

Keywords
Identifiers
Operators
Delimiters

These tokens are then used in subsequent steps of compilation.

This project uses Flex (Fast Lexical Analyzer Generator), a tool that generates a scanner function, yylex(), from a set of regular-expression rules and their associated actions.

An excerpt of the lexer rules is shown below. In each rule's action, count() is called and a token code is returned to the parser; identifiers and integer constants also pass their value to the parser through yylval.

"sizeof"		{ count(); return(SIZEOF); }
"static"		{ count(); return(STATIC); }
"struct"		{ count(); return(STRUCT); }
"switch"		{ count(); return(SWITCH); }
"typedef"		{ count(); return(TYPEDEF); }
"union"			{ count(); return(UNION); }
"unsigned"		{ count(); return(UNSIGNED); }
"void"			{ count(); return(VOID); }
"volatile"		{ count(); return(VOLATILE); }
"while"			{ count(); return(WHILE); }

{L}({L}|{D}|[_])*		{ count(); yylval.my_string = new std::string(yytext); return(IDENTIFIER); }

[0-9]+			{ count(); yylval.my_int = std::stoi(yytext); return(INTNUM); }

"..."			{ count(); return(ELLIPSIS); }
">>="			{ count(); return(RIGHT_ASSIGN); }
"<<="			{ count(); return(LEFT_ASSIGN); }
"+="			{ count(); return(ADD_ASSIGN); }
"-="			{ count(); return(SUB_ASSIGN); }
"*="			{ count(); return(MUL_ASSIGN); }
"/="			{ count(); return(DIV_ASSIGN); }
"%="			{ count(); return(MOD_ASSIGN); }
"&="			{ count(); return(AND_ASSIGN); }
"^="			{ count(); return(XOR_ASSIGN); }

Parser

A parser takes the sequence of tokens produced by the lexer and analyzes its grammatical structure according to the rules of a formal grammar.

While parsing, it builds a parse tree (or syntax tree) that represents the hierarchical structure of the input. Later processing works on this tree, for example:

Code generation
Interpretation

This project uses Bison, a parser generator that reads a grammar specification and generates a parser for that grammar in C or C++.

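An excerpt of the grammar is shown below. The %type declarations set the semantic-value type of several nonterminals, %start makes ROOT the start symbol, and the rules describe a program as a list of top-level items: function definitions and prototypes, enum and struct declarations, declaration statements, arrays and typedefs.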

%type <my_program_list>	PROGRAM_LIST
%type <my_enum_list>	enum_list
%type <my_enum_assign>	enum_assign
%type <my_struct_list>	struct_list
%type <my_struct_assign>	struct_assign


%start ROOT
%%

ROOT 
	: PROGRAM_LIST	{cout<<"program->root"<<endl; g_root = $1; }
	// | function_definition	{g_root = $1;std::cout << "root reached" << std::endl;}
	;

PROGRAM_LIST
	: PROGRAM				{$$ = new ProgramList(); $$->addProgram($1);cout << "program_list" << std::endl;}
	| PROGRAM_LIST PROGRAM	{$$ = $1; $$->addProgram($2);cout << "program_list_add" << std::endl;}
	;

PROGRAM
	: function_definition	{$$ = $1;}
	| enum_declaration		{$$ = $1;}
	| struct_declaration	{$$ = $1;}
	| declaration_statement	{$$ = $1;}
	| array 				{$$ = $1;}
	| TYPEDEF type IDENTIFIER ';'	{$$ = new Typedef(*$2, *$3);}
	;

function_definition
	: type IDENTIFIER '(' ')' compound_statement	{$$ = new FunctionDefination(*$1, *$2, $5); cout<<"function_definition no argument"<<endl;}
	| type IDENTIFIER '(' parameter_list ')' compound_statement	{$$ = new FunctionDefination(*$1, *$2, $6, $4); cout<<"function_definition with argument"<<endl;}
	| type IDENTIFIER '(' parameter_list ')' ';'	{$$ = new FunctionDefination(*$1, *$2, NULL, $4); cout<<"empty_function_definition"<<endl;} 
	| type IDENTIFIER '(' ')' ';' {$$ = new FunctionDefination(*$1, *$2);}
	;

struct_reference
	: STRUCT IDENTIFIER IDENTIFIER ';' {$$ = new StructReference(*$2, *$3);}
	;