Daeryabaar.com | I.T. Crash Course | Part 1

A jump into the boiling water

C++ is the English of Information Technology. C is outdated and C# is ... well ... a botched concept that entirely misunderstands what a modern Programming Language should be like. Its differences to C++ are evidently 'gross' - whereas the differences between C++ and C are practically negligible; Aside from the 'additions' that C++ ... 'adds' to it.
In that sense I'll be trying to "teach" you C++; And next to C++ I'll try to give you a central piece of understanding regarding 'Assembler', the bare-bones programming Language of the X86 CPU architecture.

To really get started with programming I suggest one of two routes. If you use Linux, you already have GNU C++ (almost) directly available (else get it via the package manager; look for g++ or gcc, which are two front ends of the same compiler collection). If you're using Windows, I suggest you download and install MinGW. (For Mac users, I honestly don't know if GCC is available for Mac!)
As assembler of choice I'm using NASM, which is, simply put, not GAS (the GNU Assembler) - and GAS is to me the single greatest example of where Linux derails from a common standard in a, simply put, odd way!

With that out of the way, here's some basic knowledge: 'Programming' (to Code) is first done in a Text-Editor (I recommend Sublime-Text (semi free), Kate (Linux, free) or Notepad++ (Windows, free) (in that order)), and the files created therein are "passed" to a compiler. These tools (gcc and nasm) can either produce object files (.obj, or .o on Linux) or binaries. Object files are a mid-stage, half way between source-code (the created files) and binary (executable).
Using gcc and nasm we can create object files; And finally link them together, producing the final executable. For simplicity we'll usually compile binaries/executables while linking object files into that compile process; Instead of using 'inline assembly' (assembler code embedded into C++ code).

The First basic Coding Structure

The way an application works is that the Operating System (or the BIOS/Bootloader where the Application IS the Operating System) first loads it into RAM. The CPU has so called 'Registers' - the width of these registers depends on the CPU "type". A 16bit CPU has 16bit registers, a 32bit CPU has 32bit registers. This means that a register holds X bits - and each higher generation extends the lower generation's logic by a bit. 16bit registers are 'addressable' as either one 16 bit value or two 8 bit values. 32 bit registers are 'addressable' as either one 32 bit value, or (their lower half) as one 16 bit or two 8 bit registers.
Some registers are 'general purpose registers' used to perform the most basic tasks; That is because any operation on any value of the program is done by the CPU. In order to manipulate a value it first has to be loaded into the CPU. There are exceptions, such as some floating point operations, but these also need some value inside of a register; While further exceptions are more hardware specific.
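
To make the 'one register, several views' idea a bit more tangible, here is a rough C++ sketch. It only mimics the behaviour with masks and shifts; the struct Register32 and its function names are made up for this illustration and have nothing to do with how a CPU is really built.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one 32 bit general purpose register (think EAX).
struct Register32 {
    std::uint32_t value;  // the full 32 bit view (EAX)

    // Lower 16 bits (AX).
    std::uint16_t low16() const { return static_cast<std::uint16_t>(value & 0xFFFFu); }
    // Upper byte of the lower 16 bits (AH).
    std::uint8_t high8() const { return static_cast<std::uint8_t>((value >> 8) & 0xFFu); }
    // Lowest byte (AL).
    std::uint8_t low8() const { return static_cast<std::uint8_t>(value & 0xFFu); }

    // Writing the low byte leaves the other 24 bits untouched.
    void set_low8(std::uint8_t byte) { value = (value & 0xFFFFFF00u) | byte; }
};

// Small self-check that can be called from main.
void check_register_views() {
    Register32 eax{0x12345678u};
    assert(eax.low16() == 0x5678);
    assert(eax.high8() == 0x56);
    assert(eax.low8() == 0x78);
    eax.set_low8(0xAB);
    assert(eax.value == 0x123456ABu);
}
```

The point is only that AX, AH and AL are not separate storage: change the lowest byte and the 32 bit value changes with it.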

The classic Harddrive I/O interrupt for instance requires a few registers of the CPU to be set, while one of them holds a reference to a piece of RAM wherein more specific data regarding the data-transfer is stored.
Because of that, some registers have a specific purpose.

The first pair of registers we need to know of is the 'Code Segment' register and the 'Instruction Pointer'. After the Operating System has loaded the Program into RAM, that chunk of Memory is referred to via the Code Segment Register. How these segment registers work differs between 16 and 32 bits - and it is mostly nothing we need to be bothered with; That is because since the 32bit architecture the programmer isn't supposed to touch them, and doing so will usually cause an error since they are protected in order to prevent programs from accessing memory they aren't supposed to!
Imagine the Code Segment as a piece of paper - somewhere on that paper the program begins, and there the Instruction Pointer is finally set to the Program's "in-point". The CPU uses it to fetch an instruction from RAM and then increments it. The most essential instruction to any CPU is the 'goto' (C++, 'jmp' in assembler) command.

goto LABEL_XYZ;
...
LABEL_XYZ:

As should be obvious: jmp manipulates the Instruction Pointer so that the CPU continues fetching the code from another location. A close relative to jmp, perhaps even more essential, is the 'call' command. Call is like jmp, but before it "jumps" it "pushes" the address of the 'next' instruction onto the stack, so once the 'ret' command is used, this address can be loaded back into the Instruction Pointer and the CPU continues executing where call left off.
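
How jmp, call and ret play together can be sketched with a toy machine written in C++. Mind that this is a made-up miniature instruction set (Op, Instruction and run are names invented for this example), not real X86; it just shows the bookkeeping around the pointer that tells the CPU where to fetch from ('ip' in the code).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy instruction set, only loosely inspired by X86.
enum class Op { Jmp, Call, Ret, Inc, Halt };

struct Instruction {
    Op op;
    std::size_t target;  // used by Jmp and Call only
};

// Runs the program and returns how often Inc was executed.
int run(const std::vector<Instruction>& program) {
    std::size_t ip = 0;              // the Instruction Pointer
    std::vector<std::size_t> stack;  // holds return addresses
    int counter = 0;

    for (;;) {
        assert(ip < program.size());
        const Instruction current = program[ip];
        ++ip;  // fetch, then point at the next instruction

        switch (current.op) {
        case Op::Jmp:
            ip = current.target;  // simply continue somewhere else
            break;
        case Op::Call:
            stack.push_back(ip);  // remember where to continue later
            ip = current.target;
            break;
        case Op::Ret:
            assert(!stack.empty());
            ip = stack.back();    // resume right after the matching Call
            stack.pop_back();
            break;
        case Op::Inc:
            ++counter;
            break;
        case Op::Halt:
            return counter;
        }
    }
}
```

A program like { Call 3, Inc, Halt, Inc, Ret } first jumps to position 3, counts once there, returns to position 1, counts again and halts; so run reports 2.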

The Big Information Technological Epiphany

It may just be a personal thing, but still, it is the most essential 'Coding Mystery' that any beginner has to face at some point before 'really' understanding what he/she/it is doing! '"Where is my Memory?"'.

C++ knows three types of Memory: Static, Stack and Heap. As static memory we understand the memory on the Code Segment. 'The Stack' is a different Segment, referred to by the 'Stack Segment' Register and the 'Stack Pointer' respectively. This stack is something the Operating System has to create for itself, while assigning one to the application. Last time I checked (and that was some years ago) the OS assigned about 1 MB to the application; the exact default differs between systems. The stack essentially is background stuff. The X86 architecture further has two instructions that are generally used to work with it: push and pop. 'push register' decrements the Stack Pointer (on X86 the stack grows downwards, towards lower addresses) and stores the value at the new position in the stack segment. Pop reverses this: it moves the value from the stack into the designated register and increments the Stack Pointer again.
In C++ the 'call' command isn't used anywhere. Instead 'functions' are written, and calling a function is done by simply 'writing it out':
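
As a minimal sketch of what push and pop do to the Stack Pointer, here is a miniature stack in C++ that grows downwards the way the X86 one does. TinyStack and its 16 slots are assumptions made up for this example; the real stack lives in RAM and is of course much bigger.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical miniature stack that grows downwards like the X86 one.
struct TinyStack {
    static constexpr std::size_t kSlots = 16;
    std::uint32_t memory[kSlots];
    std::size_t sp = kSlots;  // stack pointer, one past the top means 'empty'

    // push: move the stack pointer down first, then store the value there.
    void push(std::uint32_t value) {
        assert(sp > 0);  // otherwise: stack overflow
        --sp;
        memory[sp] = value;
    }

    // pop: read the value at the stack pointer, then move it back up.
    std::uint32_t pop() {
        assert(sp < kSlots);  // otherwise: nothing to pop
        const std::uint32_t value = memory[sp];
        ++sp;
        return value;
    }
};
```

Pushing 1 and then 2 moves sp from 16 down to 14; popping hands the values back in reverse order (2 first, then 1) and leaves sp at 16 again.
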

int FunctionName (int param1,int param2) {
    return param1+param2;
}

...
int anotherFunction (void) {
    return FunctionName(1,2) + 3;
}

What happens here, exactly, is that in 'anotherFunction' first 2 and 1 are pushed onto the stack (e.g. mov eax,1; push eax; in the classic 32bit calling convention the parameters go in from right to left), then FunctionName is called. There the two values are read from the stack (no pop), added, stored in the 'a' register (the standard register to contain return values), and then the function 'ret'urns. anotherFunction then either pops twice or manipulates the stack pointer directly; And then adds 3 to the 'a' register (AX (16bit), EAX (32bit), RAX (64bit)) before 'ret'urning.

int value1;//global variables are static memory variables

int main (int argc,char**argv) {
    int value2;//local variables are stack variables
    return 0;
}

This is the 'main' function, the standard 'in-point' of any program. (Windows has a slightly different Main function, but under the hood it still uses this kind of main function, does a few things and then expects its own kind of main function). What this function "does" is that it 'reserves' a variable on the stack (value2). Going further into stack variables gets a little bit complicated; And that is a detail we can safely ignore for now.
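
The difference between a static and a stack variable can be made visible with a small experiment. The following is just a sketch; call_count, nested_locals_differ and check_stack_and_static are names invented for it. The global exists once for the whole run, while every call of the function gets its own fresh local on the stack.

```cpp
#include <cassert>

int call_count = 0;  // global: lives in static memory for the whole run

// Each call gets its own 'depth_marker' on the stack; 'call_count' is shared.
// Returns true if the local of the nested call sits at another address
// than the local of the call that invoked it.
bool nested_locals_differ(const int* caller_local = nullptr) {
    int depth_marker = 0;  // stack variable, created anew per call
    ++call_count;
    if (caller_local == nullptr) {
        return nested_locals_differ(&depth_marker);  // go one level deeper
    }
    return caller_local != &depth_marker;
}

// Small self-check that can be called from main.
void check_stack_and_static() {
    call_count = 0;
    assert(nested_locals_differ());  // two calls, two different stack slots
    assert(call_count == 2);         // but only one shared counter
}
```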

In order to have access to more than just 1 MB of stack, we need 'heap' Memory. The thing there is: As a matter of standards, Operating Systems do not create a heap for any application. So, if the program needs any more Memory at all; And most often it does; It will need to ask for it. And here I want to 'break' with today's standards; And look a little closer into Stack Variables by telling you a bit more about the 'Base Pointer'.

Each function that holds stack variables requires a 'stack frame' - and that is because 'each push' changes the stack pointer and hence the function itself "has a hard time" keeping track of where its variables are. Thus the stack pointer is moved (on X86: decremented) by the amount of space required for the function's own stack variables; And the base pointer is there to hold the initial stack position, so the stack pointer may change freely. Each function that creates such a 'stack frame' also pushes the caller's base pointer onto the stack before creating its own frame; So that once it returns to the calling function, that one has its own base pointer "unharmed".
For 'bleeding edge performance' the presence of stack frames is idiotic! (And so is paging, but that is another topic!) However, writing a program without unnecessary stack-frames is correspondingly more demanding. So for this crash course I'll 'use' the Base Pointer for our Heap; Although in the 64bit Architecture there are plenty of additional registers that can be used instead! And I'll do it this way here because that's how I've done it in my early beginnings - where I also didn't have a 64bit PC (neither were there a lot, I guess).

To acquire heap, there 'was' the 'malloc' function; Though I don't know the exact details, the most frequently used way is to use 'new':

char * memory_handle = new char [16*1024*1024];
...
delete [] memory_handle;

or in "our case":

__EBP = new char [16*1024*1024];
...
delete [] (char*)__EBP;
/*(Note: __EBP isn't a term "understood" by default by most (any other than Borland) compilers, but can/has to be created manually. With GCC that is a global definition of a register variable: register void* __EBP asm("ebp");)*/

Which acquires 16*1024*1024 'char sized' memory units; So: 16 Megabytes.
The main reason why I "push" this way of doing things is mostly to not even get you started to use new each and every time you need a few bytes for something. No, instead use it only once or twice for big chunks; And satisfy all of your further memory needs from there! 'Dynamic Memory' is one of those 'Honey Traps' - in my book - it is deceptively convenient; But all you do is that you push the management requirements on the OS and in turn end up having 'fragmented Memory' which may just be the reason why your machine gets slower and slower the more you use it!
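
The 'ask once, then serve yourself' idea can be sketched as a very simple hand-out allocator. Tank, take and reset are names made up for this example, and the rounding to 16 bytes is just one possible choice; the constructor and destructor are only there to keep the sketch leak free.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical "tank": one big block requested once, handed out piece by piece.
struct Tank {
    char* base;            // start of the big block
    std::size_t capacity;  // its total size in bytes
    std::size_t used;      // how many bytes are already handed out

    explicit Tank(std::size_t bytes)
        : base(new char[bytes]), capacity(bytes), used(0) {}
    ~Tank() { delete[] base; }

    Tank(const Tank&) = delete;
    Tank& operator=(const Tank&) = delete;

    // Hands out 'bytes' bytes, or nullptr once the tank can't serve them.
    // Sizes are rounded up to a multiple of 16, so every piece starts
    // at a multiple of 16 bytes behind 'base'.
    char* take(std::size_t bytes) {
        assert(used <= capacity);
        if (bytes > capacity - used) return nullptr;
        const std::size_t rounded = (bytes + 15u) & ~static_cast<std::size_t>(15u);
        if (rounded > capacity - used) return nullptr;
        char* piece = base + used;
        used += rounded;
        return piece;
    }

    // Gives everything back at once; single pieces are never freed.
    void reset() { used = 0; }
};
```

With a Tank of 64 bytes, two requests of 10 bytes each end up 16 bytes apart, and the OS is asked for memory exactly once no matter how many pieces are taken.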

My first go-to solution for 'stack-frame-less programming' was to "split" the acquired heap into three sections. The first section was something like 512 bytes of "pseudo stack", then "sizeof(MAIN_CLASS)" for the main class; And the rest for general memory requirements.
Keeping it this way will also 'require you' to stick to the limits - and this way keep the OS from sweating in case the program wants more RAM than there is. And this is why I will vastly avoid the topic of class constructors and destructors also!
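
The three-section split could look roughly like this. It is a sketch under assumptions: MainClass with its two members is a stand-in, HeapLayout and split_heap are invented names, and the block is expected to come from 'new char[...]' so that its start is suitably aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Stand-in for the real main class; its members are made up.
struct MainClass {
    int screen_width;
    int screen_height;
};

// Hypothetical layout of the one big block: pseudo stack, main class, rest.
struct HeapLayout {
    char* pseudo_stack;        // first 512 bytes
    MainClass* main_class;     // directly behind the pseudo stack
    char* general;             // everything that is left
    std::size_t general_size;  // size of that rest in bytes
};

const std::size_t kPseudoStackSize = 512;

// Splits 'block' (of 'size' bytes) into the three sections.
HeapLayout split_heap(char* block, std::size_t size) {
    assert(size >= kPseudoStackSize + sizeof(MainClass));
    char* main_spot = block + kPseudoStackSize;
    // The main class must start at an address that fits its alignment.
    assert(reinterpret_cast<std::uintptr_t>(main_spot) % alignof(MainClass) == 0);

    HeapLayout layout;
    layout.pseudo_stack = block;
    layout.main_class = new (main_spot) MainClass;  // placement new, no extra allocation
    layout.general = main_spot + sizeof(MainClass);
    layout.general_size = size - kPseudoStackSize - sizeof(MainClass);
    return layout;
}
```

Everything is plain pointer arithmetic on the one block; nothing here asks the OS for more memory after the initial 'new'.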

To properly 'use' the pseudo-stack, one however has to be aware of where and when to use it. The general rule of course is: One shouldn't use it and then call a function that uses it in another way. You can however see that instead of assigning EBP each time we call a function, we assign it once and never change it. The real stack can roll up and down however - in which sense the so proclaimed ideal is to use the stack for function calling purposes only.
This way we also avoid 'global variables' - where simply: The Main Class 'is' the global Memory!
A good way to think about it is, that 'the runtime' shouldn't need any Stack Frames at all. Anything that doesn't belong into the runtime may as well use stack-frames; Simply enough because these are 'one time' functions. But this of course is just a suggestion and whether or not 512 bytes of Pseudo Stack are enough ... is another thing.
However - as to get away from this topic: One might also assign 'Pseudo-Stack Levels' to various functions - where the Pseudo-Stack as any other stack does always only have temporary meaning.

Some "Hello World!" C++ shenanigans

"Hello World!" in I.T. is a string usually used to showcase the way a Programming Language works by showing how the string "Hello World!" is printed to the screen.

.bat or .sh: echo "Hello World!"
QBasic: print "Hello World!"
C++:

printf("Hello World!");

Where instead of showcasing how it's done in assembler (ugh ... @@*&*(@$^$@())) - I want to change this function to look a little more like it would look in assembler (C++ practically translates into assembler before compiling into a binary):

#include <stdio.h>

char helloWorld [] = "Hello World!";

int main(int argc,char**argv) {
    printf(&helloWorld[0]);//or: printf(helloWorld);
}

The big thing is that strings in C++ need to be a part of the Binary if coded in that way; As any other variable. So if you for instance compiled the example above, you could still open the executable and find the string somewhere in there. The thing I want to show you here is how different 'the feel' of the code is without a big difference in the final product. While I wouldn't really advise against taking the easier path where a greater effort isn't really justified; I want you to see the 'good' of knowing "where things come from". In Information Technology that is either Hardware Stuff or Memory; Usually. Generally one would use files to get data into memory; And while it may seem wasteful to always have Buffer Memory available at runtime, Games like GTA wouldn't really get far without it!

Well, after all this first part is just one more botched writing of mine; With an abysmally lucky twist in the end to somehow save the day. As of that, here's a class 'stub' I'm generally using for all sorts of memory issues:

typedef char text;//because why not?
typedef unsigned char dega;
typedef unsigned long long int wide;/*it happened to me once that 'just' long or long int didn't result in a full CPU wide variable!*/
//...
class datajet { public:

};
/*usage: if(datajet::__text__::compare(str1,str2) == 0) //... not entirely sure yet!*/

With it, the whole 'design' revolves around organizing "Levels" of Memory usage - while generally just using them as "tanks" that are 'supposed' to become obsolete once the system's runtime (memory needs) is(/are) initialized. Further, once the __mem__ and __text__ functions are extended, writing a filesystem 'path' class is a breeze. Try:

...
int main (int argc,char**argv) {

};

If you however don't have any coding experience yet; And really aren't interested in learning any programming now already, just wait ... uhm, I guess! ...
