A Clunky Way to Describe Operating Systems and Programming Languages

By @MAGICCARPET
POSTED: 25 Jul 2014 03:12
CATEGORY: Educational
FEATURED: Yes (@Haruspex)

Whenever you launch an executable, often a file ending in .EXE or an ELF file, the device must load the file into an area of memory. The Central Processing Unit, whatever it may be, then takes in the values stored there one after another and processes each accordingly. However, not every CPU can execute the same file. The design of the hardware itself, the physical connections inside the CPU, causes different CPUs to react differently to the same values.

Let's pretend there exist two CPUs, the X1 and X2, which by design do not process data in exactly the same way. Let's feed X1 the value 00000001 (1) in binary, followed by 00000011 (3) and then 00000110 (6). To this CPU, that means MOVE 6 into memory address 00000011.

Congratulations, you caused the CPU to store a value into memory. CPUs would use extra data to differentiate between an integer and an address, but for simplicity we will not go into lengthy code just yet.

If you gave the same code to the X2 CPU, it might interpret 00000001 00000011 00000110 as ADD 00000011 00000110, in which case it would add the numbers 3 and 6 to get 9 (00001001) and probably do nothing with the result.
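Here is a rough Python sketch of the same idea, just to make it concrete. The X1 and X2 are the made-up CPUs from above, and the opcode tables, the run function, and the 16-slot memory are all invented for illustration; no real CPU works from a Python dictionary.

```python
# Hypothetical instruction tables for the two made-up CPUs in this article.
X1_OPCODES = {0b00000001: "MOVE"}   # X1 reads opcode 1 as MOVE
X2_OPCODES = {0b00000001: "ADD"}    # X2 reads the same opcode as ADD


def run(opcodes, program, memory):
    """Decode one three-byte instruction and carry it out."""
    opcode, first, second = program
    name = opcodes[opcode]
    if name == "MOVE":
        # MOVE <address> <value>: store the value at the address.
        memory[first] = second
        return f"MOVE {second} into address {first}"
    if name == "ADD":
        # ADD <a> <b>: compute the sum; nothing is stored anywhere.
        return f"ADD {first} + {second} = {first + second}"
    raise ValueError(f"unknown opcode {opcode}")


program = [0b00000001, 0b00000011, 0b00000110]  # 1, 3, 6

x1_memory = [0] * 16
x2_memory = [0] * 16
x1_result = run(X1_OPCODES, program, x1_memory)
x2_result = run(X2_OPCODES, program, x2_memory)
print("X1:", x1_result)
print("X2:", x2_result)
```

The same three bytes leave a 6 in slot 3 of the X1's memory, while the X2 computes 9 and leaves its memory untouched.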

This is what gives CPUs an architecture: their design directly influences how a programmer must construct programs for them. Right now, we are using MACHINE CODE, the lowest-level programming method, and scientists back in the day used to labor for hours with pen and paper writing these lines (usually in hexadecimal). To make the process easier and faster, ASSEMBLY was developed. Assembly takes machine code up just one level and allows us to represent values as words and phrases. That is what I showed alongside the machine code above (MOVE and ADD). Just as these two fictional CPUs each have their own architecture, each has its own instruction set, and so its own assembly language.

When you write in assembly, you can't just feed that text to the processor and expect results. You have to assemble the code (the assembly equivalent of compiling), which transforms each line into a binary sequence that the CPU will use correctly. The same is true for the next level of programming, high-level languages, which have to be compiled.
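The translation from assembly text to bytes can be sketched in a few lines of Python. This is a toy for the fictional X1 only: the mnemonic table and the assemble function are assumptions made up for this example, and the operand order (address first, then value) simply follows the 1, 3, 6 example from earlier.

```python
# Made-up mnemonic table for the fictional X1 CPU (an assumption for illustration).
X1_MNEMONICS = {"MOVE": 0b00000001}


def assemble(source):
    """Turn lines such as 'MOVE 3 6' into a flat list of byte values."""
    machine_code = []
    for line in source.strip().splitlines():
        mnemonic, *operands = line.split()
        machine_code.append(X1_MNEMONICS[mnemonic])
        machine_code.extend(int(operand) for operand in operands)
    return machine_code


machine_code = assemble("MOVE 3 6")
binary_text = " ".join(f"{byte:08b}" for byte in machine_code)
print(binary_text)
```

Real assemblers also handle labels, addressing modes, and instructions of different sizes, which this sketch leaves out.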

These include the common C++. This level adds complexity, but makes the code even more understandable to humans. The resulting programs can also take more memory to run. When you compile them, you typically do so for the CPU you are using. But this poses a puzzle: why can I compile this code for my Windows machine, move it to my friend's PC with a different CPU, and still run it?

There are two possible answers. One, your friend's CPU may still be compatible with the architecture the executable was compiled for. Two, you compiled the program for the Win32 layer. What this means is that the program asks a layer in Windows to do things on its behalf, and Windows in turn works out how to make the hardware in that particular PC do them. (The program's own instructions are still machine code for one CPU architecture, so the first answer has to hold as well.)

This helps describe the complexity of Operating Systems such as the Windows NT family. Before we can get to this, let's try to understand the machine by itself.

When you boot up your PC, the first thing that loads is the Basic Input Output System, BIOS. This is a program, typically in machine code, that tests the CPU and peripherals like the HDD or SSD, and then looks for some higher-level code to hand access to the hardware over to. The testing stage is called the Power On Self Test, or POST, and it makes sure the main devices will function. Graphics Processing Units have their own BIOS and POST, and the main POST will usually both initiate it and get a value back. If all is a go, you might hear a beep or nothing at all. Different beep patterns indicate specific problems detected by the POST.

When the POST is complete, the BIOS will look for the main files of an OS to load. Once those are found, and they are usually in machine code, the OS takes over. The OS has two functions: to provide an interface for communication between software and hardware (the kernel), and to provide an understandable interface through which users can send recognizable input and receive logical output (the User Interface).

So the PC at this point is software upon software upon hardware. There is a layer in between, for Win32 applications. Win32 applications are executables compiled for the APIs Windows provides. In other words, the layer is almost like its own machine: it takes in the program's requests and sends signals to other programs so that Windows acts the way the primary application wants. These requests are passed to the kernel, which uses drivers to determine how to send the same signal to the hardware, and vice versa. (We talk as if these layers are abstract entities in their own right, even though it's all still running on the CPU, and how it all truly converts is beyond me.) In this simplified picture, the BIOS is still commonly used as the final transition.

So the User Interface is a way to interact with the computer in a sort of natural way, the Kernel is a transition between the UI and the hardware, and the hardware plus the BIOS gets the real work done, sending it back up the ladder to you.

Let's name five people: You, Uileam, Kenny, Bob, and Hank. If you want to get Hank to do something specific, like add two numbers and show the result to you, you first need to consult Uileam. Uileam is your User Interface, and at this point he's so advanced you can ask him, "Add two and three, and then show me the result." So Uileam tells Kenny, the Kernel, in a somewhat clunkier way, "Print two plus three." Kenny goes to Bob, the BIOS, to find out how to present the command to Hank. After they decide how, Bob tells Hank, "F0, 02, 03, 0A, 3B, 0B, 12, 0A." (This is a very, very simplified way to show it.) If we could see this as words, they would be something like "ADD TWO THREE TEN HARDWARE 11 INITIALIZE TEN." This just means "Add two and three, save the result in memory address ten, then send the data in memory address ten to hardware eleven (in this case the screen, which will process it by itself)."

It's important to note that the kernel decided where the CPU would save the result of 2+3, and that it also sent the command to print to the screen. The kernel could instead have told the CPU to save 2+3 to address ten, and then told the GPU to load the value in address ten and present it to us. In actuality that amounts to the same thing, since the kernel still lives on the CPU, but again that is very complex to explain.
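To see Hank's side of the conversation, here is a small Python sketch that acts out that byte sequence. Everything about it is hypothetical: the opcode values come straight from the made-up sequence above, the fixed four-byte layout of each instruction is my own assumption, and "hardware 11" is represented by nothing more than a list.

```python
# Hypothetical opcode values taken from the byte sequence in the story above.
ADD, HARDWARE, INITIALIZE = 0xF0, 0x3B, 0x12


def run(program):
    memory = {}   # address -> value
    sent = []     # (device number, value) pairs delivered to hardware
    i = 0
    while i < len(program):
        if program[i] == ADD:
            # ADD a b dest: store a + b at memory address dest.
            a, b, dest = program[i + 1:i + 4]
            memory[dest] = a + b
            i += 4
        elif program[i] == HARDWARE:
            # HARDWARE device INITIALIZE addr: send memory[addr] to the device.
            device, marker, addr = program[i + 1:i + 4]
            if marker != INITIALIZE:
                raise ValueError("expected INITIALIZE after the device number")
            sent.append((device, memory[addr]))
            i += 4
        else:
            raise ValueError(f"unknown opcode {program[i]:#04x}")
    return memory, sent


program = [0xF0, 0x02, 0x03, 0x0A, 0x3B, 0x0B, 0x12, 0x0A]
memory, sent = run(program)
print(memory)
print(sent)
```

After the run, address ten holds 5, and hardware eleven has been sent that same 5.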

Now on a side note, there is one more level of programming: interpreted languages. These include Java and Python, in which the code is compiled into bytecode rather than machine code. The JVM, Java Virtual Machine, is a program that takes bytecode as input and processes it like a piece of hardware would, so that the bytecode becomes machine code for the actual CPU on-the-fly. This is an especially portable method because the JVM can be made for almost any architecture and the bytecode doesn't have to change to accommodate it.
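Python lets you peek at its own bytecode through the standard dis module, which makes for a quick illustration. This sketch assumes CPython, the usual Python implementation; its virtual machine traditionally interprets the bytecode step by step rather than turning it into machine code the way the JVM is described above, and the exact instruction names differ between Python versions. The add function is just an example.

```python
import dis


def add(a, b):
    return a + b


# Every Python function carries its compiled bytecode with it.
raw_bytes = add.__code__.co_code
opnames = [instruction.opname for instruction in dis.get_instructions(add)]

print(len(raw_bytes), "bytes of bytecode")
print(opnames)
print(add(2, 3))
```

The list of names is the bytecode's version of assembly: readable words standing in for the raw byte values in raw_bytes.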


@HullBreach Premium64

25 Jul 2014 11:57

That was a great summary of all the basic aspects of software execution, except one glaring topic that's worth mentioning when discussing low-level languages: CPU registers.

For the reader... A CPU register is analogous to RAM, but it is super fast and resides inside the core of the CPU itself. Modern processors have dozens of these, and they generally have specialized purposes, such as memory access, arithmetic, error states, code execution, etc.

When software is loaded into memory, the CPU sets an instruction pointer (IP) register, which advances as the code executes. When another program needs to temporarily take over, or that same program needs to jump to another block of code for a moment, that IP gets "pushed" onto a memory stack for safe-keeping. When everything is done, the value gets "popped" off the stack so that the CPU knows where it left off. This can all be compared to sliding a finger across a book while reading and putting in a bookmark to check another chapter for a moment.
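The bookmark idea can be sketched in Python. The four operations here (NOP, CALL, RET, HALT) and the run function are invented for this example and do not belong to any real instruction set; the saved value is the position just after the CALL, which is where execution should resume.

```python
def run(program):
    """Run a list of (operation, argument) pairs; return the order the IP visited."""
    ip = 0        # instruction pointer: index of the next instruction
    stack = []    # saved return addresses
    visited = []
    while True:
        visited.append(ip)
        operation, argument = program[ip]
        if operation == "CALL":
            stack.append(ip + 1)   # push the bookmark: where to resume later
            ip = argument          # jump to the other block of code
        elif operation == "RET":
            ip = stack.pop()       # pop the bookmark and pick up where we left off
        elif operation == "HALT":
            return visited
        else:                      # "NOP": nothing to do, just move along
            ip += 1


program = [
    ("NOP", None),   # 0
    ("CALL", 4),     # 1: jump to the block at index 4
    ("NOP", None),   # 2: execution resumes here after RET
    ("HALT", None),  # 3
    ("NOP", None),   # 4: start of the other block
    ("RET", None),   # 5
]
visited = run(program)
print(visited)
```

The IP visits 0, 1, 4, 5, 2, 3: it detours to the other block and then returns to the spot right after the CALL.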

Entire textbooks have been written on just the arithmetic possible with registers (MMX, SIMD, etc.), so I won't get into those, but it is worth saying that this can be where the majority of time is spent optimizing code, since very small changes can make vast differences in the speed of time-critical actions in software.

@Haruspex
25 Jul 2014 03:15

@Star Shadow
25 Jul 2014 04:53
In reply to Haruspex

It takes at least 5 people just to get a command to go through a computer.
