What is Assembly Language?
Assembly language is a low-level programming language for programmable device. It is an alphanumeric representation of machine code. In contrast to high-level programming language which are generally portable across multiple systems, assembly language is very architecture specific. Assembly language is very strong correspondence between the language and the architecture’s machine code instruction.
Assembly consists of a list of instructions that are in no way comparable to anything you might know from C, Basic or Pascal. It does not have structure like high-level language does (like for or while)
Let’s see the example below. The code below is an example of a AVR assembly code written in assembly language. Each line of the code is an instruction telling the micro controller to carry out a task.
ADD R16, R17 ; Add value in R16 to value in R17
DEC R17 ; Minus 1 from the value contained in R17
MOV R18, R16 ; Copy the value in R16 to R18
END: JMP END ; Jump to the label END
As we stated before, the instruction or mnemonic are note general but specific. Each company provides a set of instructions for their micro controllers. Also note that not all instructions are available to all micro controllers. We should consult datasheet for each AVR microcontroller.
Assembler or other languages, that is the question. Why should I learn another language, if I already learned other programming languages? The best argument: while we live in France we are able to get through by speaking English, but we will never feel at home then, and life remains complicated. We can get through with this, but it is rather inappropriate. If things need a hurry, we should use the country’s language.
Many people who dwelling in higher-level language start diving into AVR assembly. The reasons are sometimes similar to people who come to assembly language in x86, such as:
- to know the architecture better
- analyze bug
- use hardware features which aren’t supported by higher-level language
- need time-critical code
- just for fun
It is necessary to know assembly language, e.g. to understand what the higher-level language compiler produced. Without understanding assembly language we do not have a chance to proceed further in these cases.
High Level Languages and Assembly
High level languages insert additional nontransparent separation levels between the CPU and the source code. An example for such an nontransparent concept are variables. These variables are storages that can store a number, a text string or a single Boolean value. In the source code, a variable name represents a place where the variable is located, and, by declaring variables, the type (numbers and their format, strings and their length, etc.).
For learning assembler, just forget the high level language concept of variables. Assembler only knows bits, bytes, registers and SRAM bytes. The term “variable” has no meaning in assembler. Also, related terms like “type” are useless and do not make any sense here.
High level languages require us to declare variables prior to their first use in the source code, e. g. as Byte (8-bit), double word (16-bit), integer (15-bit plus 1 sign bit). Compilers for that language place such declared variables somewhere in the available storage space, including the 32 registers. If this placement is selected rather blind by the compiler or if there is some priority rule used, like the assembler programmer carefully does it, is depending more from the price of the compiler. The programmer can only try to understand what the compiler “thought” when he placed the variable. The power to decide has been given to the compiler. That “relieves” the programmer from the trouble of that decision, but makes him a slave of the compiler.
The instruction “A = A + B” is now type-proofed: if A is defined as a character and B a number (e. g. = 2), the formulation isn’t accepted because character codes cannot be added with numbers. Programmers in high level languages believe that this type check prevents them from programming nonsense. The protection, that the compiler provides in this case by prohibiting your type error, is rather useless: adding 2 to the character “F” of course should yield a “H” as result, what else? Assembler allows us to do that, but not a compiler.
Assembler allows us to add numbers like 7 or 48 to add and subtract to every byte storage, no matter what type of thing is in the byte storage. What is in that storage, is a matter of decision by the programmer, not by a compiler. If an operation with that content makes sense is a matter of decision by the programmer, not by the compiler. If four registers represent a 32-bit-value or four ASCII characters, if those four bytes are placed low-to-high, high-to-low or completely mixed, is just up to the programmer. He is the master of placement, no one else. Types are unknown, all consists of bits and bytes somewhere in the available storage place. The programmer has the task of organizing, but also the chance of optimizing.
Of a similar effect are all the other rules, that the high level programmer is limited to. It is always claimed that it is saver and of a better overview to program anything in subroutines, to not jump around in the code, to hand over variables as parameters, and to give back results from functions. Forget most of those rules in assembler, they don’t make much sense. Good assembler programming requires some rules, too, but very different ones. And, what’s the best: most of them have to be created by yourself to help yourself. So: welcome in the world of freedom to do what we want, not what the compiler decides for us or what theoretical professors think would be good programming rules.
High level programmers are addicted to a number of concepts that stand in the way of learning assembler: separation in different access levels, in hardware, drivers and other interfaces. In assembler this separation is complete nonsense, separation would urge us to numerous workarounds, if we want to solve your problem in an optimal way.
Because most of the high level programming rules don’t make sense, and because even puristic high level programmers break their own rules, whenever appropriate, see those rules as a nice cage, preventing us from being creative. Those questions don’t play a role here. Everything is direct access, every storage is available at any time, nothing prevents your access to hardware, anything can be changed – and even can be corrupted. Responsibility remains by the programmer only, that has to use his brain to avoid conflicts when accessing hardware.
The other side of missing protection mechanisms is the freedom to do everything at any time. So, smash your ties away to start learning assembler. We will develop your own ties later on to prevent yourself from running into errors.
What is Really Easier in Assembly?
All words and concepts that the assembler programmer needs is in the datasheet of the processor: the instruction and the port table. Done! With the words found there anything can be constructed. No other documents necessary. How the timer is started (is writing “Timer.Start(8)” somehow easier to understand than “LDI R16,0x02” and “OUT TCCR0,R16”?), how the timer is restarted at zero (“CLR R16” and “OUT TCCR0,R16”), it is all in the data sheet. No need to consult a more or less good documentation on how a compiler defines this or that. No special, compiler-designed words and concepts to be learned here, all is in the datasheet. If we want to use a certain timer in the processor for a certain purpose in a certain mode of the 15 different possible modes, nothing is in the way to access the timer, to stop and start it, etc.
What is in a high level language easier to write “A = A * B” instead of “MUL R16,R17”? Not much. If A and B aren’t defined as bytes or if the processor type is tiny and doesn’t understand MUL, the simple MUL has to be exchanged with some other source code, as designed by the assembler programmer or copy/pasted and adapted to the needs. No reason to import an nontransparent library instead, just because you’re to lazy to start your brain and learn.
Assembler teaches us directly how the processor works. Because no compiler takes over your tasks, we are completely the master of the processor. The reward for doing this work, we are granted full access to everything. If we want, we can program a baud-rate of 45.45 bps on the UART. A speed setting that no Windows PC allows, because the operating system allows only multiples of 75 (Why? Because some historic mechanical teletype writers had those special mechanical gear boxes, allowing quick selection of either 75 or 300 bps.). If, in addition, we want 1 and a half stop bytes instead of either 1 or 2, why not programming your own serial device with assembler software. No reason to give things up.
The Instruction Set
Instruction Set or Instruction Set Architecture (ISA) is set of instruction for computer architecture, including native data types, instructions, registers, addressing modes, memory architecture, interrupt and exception handling, and external I/O. In simple word, it is the set of instruction we use to program the device.
Every instruction is actually a number. These numbers are stored in flash memory and when the chip is powered, the CPU then start to fetch instruction from storage and executed what the instruction want. The instructions are 16-bit number. But there is no need for us to learn that binary code and pattern, because in assembler those codes are mapped to human-readable instruction. The so called mnemonics are machine instruction represented as human-readable text.
Still, the CPU understands instruction expressed in 16-bit word. The mnemonics only represents those instruction.
Instruction Set can be divided into some categories:
- Data handling and memory operations: set register to a fixed value, move data from memory location to register or vice.
- Arithmetic and logic operations: add, subtract, multiply, divide, bitwise operations, comparison.
- Control flow operations: branch, conditionally branch, function call.
You can see the complete AVR instruction set here: Download
Apart from the microcontroller instruction set that is used in writing an AVR assembly code the AVR Assembler support the use of assembler directives. Assembler Directives are used as instruction to the AVR assembler and are in no way translated to machine language during the assembly process.
Assembler Directives are use by the programmer to assist in adjusting the location of the program in memory, defining macros and so fort. A list of the AVR assembler directives available to the AVR assembler is given below.
|BYTE||Reserve a Byte for a Variable|
|CSEG||Define the Code Segment Section|
|DB||Define Constant Byte(s)|
|DEF||Define a Symbolic Name|
|DEVICE||Define which Micro controller to Assemble for|
|DSEG||Define the Data Segment Section|
|DW||Define Constant Word(s)|
|ENDMACRO||Signifies the End of a Macro|
|EQU||Set a Symbol Equal to an Expression|
|ESEG||Define the EEPROM Segment Section|
|Exit||Exit from a File|
|INCLUDE||Read Source Code from a File|
|LIST||Turn List Generation ON|
|LISTMAC||Turn Macro Expression ON|
|MACRO||Signifies the Beginning of a Macro|
|NOLIST||Turn List Generation OFF|
|ORG||Set Program Origin|
|SET||Set Symbol to an Expression|
Each assembler directives in an AVR assembly program must be preceded by a period “.” as shown below where the directives INCLUDE and ORG are used.
begin: ADD R16, R17 ; Add R17 to R16
MOV R17, R18 ; Copy R18 to R17