Immediate Operand

Architecture

David Money Harris, Sarah L. Harris, in Digital Design and Computer Architecture (2nd Edition), 2013

Constants/Immediates

Load word and store word, lw and sw, also illustrate the use of constants in MIPS instructions. These constants are called immediates, because their values are immediately available from the instruction and do not require a register or memory access. Add immediate, addi, is another common MIPS instruction that uses an immediate operand. addi adds the immediate specified in the instruction to a value in a register, as shown in Code Example 6.9.

Code Example 6.9

Immediate Operands

High-Level Code

a = a + 4;

b = a − 12;

MIPS Assembly Code

#   $s0 = a, $s1 = b

  addi $s0, $s0, 4   # a = a + 4

  addi $s1, $s0, −12   # b = a − 12

The immediate specified in an instruction is a 16-bit two's complement number in the range [−32,768, 32,767]. Subtraction is equivalent to adding a negative number, so, in the interest of simplicity, there is no subi instruction in the MIPS architecture.
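The way a 16-bit immediate field is interpreted as a two's complement value can be sketched in Python (an illustrative model, not part of any MIPS toolchain):

```python
def sign_extend_16(imm16):
    """Interpret a 16-bit field as a two's complement number."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

# The 16-bit field covers [-32768, 32767].
print(sign_extend_16(0x0004))   # 4
print(sign_extend_16(0xFFF4))   # -12, the encoding behind addi $s1, $s0, -12
```

Any bit pattern with bit 15 set decodes to a negative value, which is why a separate subi is unnecessary.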

Recall that the add and sub instructions use three register operands. But the lw, sw, and addi instructions use two register operands and a constant. Because the instruction formats differ, lw and sw instructions violate design principle 1: simplicity favors regularity. However, this issue allows us to introduce the final design principle:

Design Principle 4: Good design demands good compromises.

A single instruction format would be simple but not flexible. The MIPS instruction set makes the compromise of supporting three instruction formats. One format, used for instructions such as add and sub, has three register operands. Another, used for instructions such as lw and addi, has two register operands and a 16-bit immediate. A third, to be discussed later, has a 26-bit immediate and no registers. The next section discusses the three MIPS instruction formats and shows how they are encoded into binary.

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780123944245000069

Architecture

Sarah L. Harris, David Money Harris, in Digital Design and Computer Architecture, 2016

Constants/Immediates

In addition to register operations, ARM instructions can use constant or immediate operands. These constants are called immediates, because their values are immediately available from the instruction and do not require a register or memory access. Code Example 6.6 shows the ADD instruction adding an immediate to a register. In assembly code, the immediate is preceded by the # symbol and can be written in decimal or hexadecimal. Hexadecimal constants in ARM assembly language start with 0x, as they do in C. Immediates are unsigned 8- to 12-bit numbers with a peculiar encoding described in Section 6.4.
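That encoding can be sketched in Python, assuming the classic A32 data-processing scheme in which an immediate is an 8-bit value rotated right by an even amount (the chapter's Section 6.4 gives the full details):

```python
def encode_arm_immediate(value):
    """Try to express a 32-bit value as an 8-bit constant rotated right
    by an even amount. Returns (rotation/2, 8-bit value), or None if the
    value cannot be encoded as an ARM data-processing immediate."""
    value &= 0xFFFFFFFF
    for rot in range(0, 32, 2):
        # Undo a right-rotation by rot: rotate the target left by rot.
        v = ((value << rot) | (value >> (32 - rot))) & 0xFFFFFFFF
        if v < 256:
            return rot // 2, v
    return None

print(encode_arm_immediate(4))      # encodable with no rotation
print(encode_arm_immediate(0xFF0))  # 0xFF rotated, as in MOV R5, #0xFF0
print(encode_arm_immediate(0x101))  # None: bits too far apart to encode
```

Constants such as 0x101, whose set bits do not fit in any rotated 8-bit window, must be built some other way (e.g., from a literal pool).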

Code Example 6.6

Immediate Operands

High-Level Code

a = a + 4;

b = a − 12;

ARM Assembly Code

; R7 = a, R8 = b

  ADD R7, R7, #4   ; a = a + 4

  SUB R8, R7, #0xC   ; b = a − 12

The move instruction (MOV) is a useful way to initialize register values. Code Example 6.7 initializes the variables i and x to 0 and 4080, respectively. MOV can also take a register source operand. For example, MOV R1, R7 copies the contents of register R7 into R1.

Code Example 6.7

Initializing Values Using Immediates

High-Level Code

i = 0;

x = 4080;

ARM Assembly Code

; R4 = i, R5 = x

  MOV R4, #0   ; i = 0

  MOV R5, #0xFF0   ; x = 4080

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780128000564000066

Architecture

Sarah L. Harris, David Harris, in Digital Design and Computer Architecture, 2022

Constants/Immediates

In addition to register operations, RISC-V instructions can use constant or immediate operands. These constants are called immediates because their values are immediately available from the instruction and do not require a register or memory access. Code Example 6.6 shows the add immediate instruction, addi, that adds an immediate to a register. In assembly code, the immediate can be written in decimal, hexadecimal, or binary. Hexadecimal constants in RISC-V assembly language start with 0x and binary constants start with 0b, as they do in C. Immediates are 12-bit two's complement numbers, so they are sign-extended to 32 bits. The addi instruction is a useful way to initialize register values with small constants. Code Example 6.7 initializes the variables i, x, and y to 0, 2032, and −78, respectively.

Code Example 6.6

Immediate Operands

High-Level Code

a = a + 4;

b = a − 12;

RISC-V Assembly Code

# s0 = a, s1 = b

  addi s0, s0, 4   # a = a + 4

  addi s1, s0, −12   # b = a − 12

Code Example 6.7

Initializing Values Using Immediates

High-Level Code

i = 0;

x = 2032;

y = −78;

RISC-V Assembly Code

# s4 = i, s5 = x, s6 = y

  addi s4, zero, 0   # i = 0

  addi s5, zero, 2032   # x = 2032

  addi s6, zero, −78   # y = −78

Immediates can be written in decimal, hexadecimal, or binary. For example, the following instructions all put the decimal value 109 into s5:

addi s5,x0,0b1101101

addi s5,x0,0x6D

addi s5,x0,109
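The three notations really are the same constant; a language with the same literal prefixes, such as Python, shows this directly:

```python
# The binary, hexadecimal, and decimal literals from the addi examples
# above all denote the same value.
immediates = [0b1101101, 0x6D, 109]
print(immediates)                    # [109, 109, 109]
print(len(set(immediates)) == 1)     # True
```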

To create larger constants, use a load upper immediate instruction (lui) followed by an add immediate instruction (addi), as shown in Code Example 6.8. The lui instruction loads a 20-bit immediate into the most significant 20 bits of the register and places zeros in the least significant bits.

Code Example 6.8

32-Bit Constant Example

High-Level Code

int a = 0xABCDE123;

RISC-V Assembly Code

lui   s2, 0xABCDE   # s2 = 0xABCDE000

addi s2, s2, 0x123   # s2 = 0xABCDE123

When creating large immediates, if the 12-bit immediate in addi is negative (i.e., bit 11 is 1), the upper immediate in the lui must be incremented by one. Remember that addi sign-extends the 12-bit immediate, so a negative immediate will have all 1's in its upper 20 bits. Because all 1's is −1 in two's complement, adding all 1's to the upper immediate results in subtracting 1 from the upper immediate. Code Example 6.9 shows such a case where the desired immediate is 0xFEEDA987. lui s2, 0xFEEDB puts 0xFEEDB000 into s2. The desired 20-bit upper immediate, 0xFEEDA, is incremented by 1. 0x987 is the 12-bit representation of −1657, so addi s2, s2, −1657 adds s2 and the sign-extended 12-bit immediate (0xFEEDB000 + 0xFFFFF987 = 0xFEEDA987) and places the result in s2, as desired.
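This adjustment can be sketched in Python; the helper below is an illustration of the rule, not compiler code:

```python
def lui_addi_parts(value):
    """Split a 32-bit constant into (upper 20 bits for lui, signed 12-bit
    immediate for addi) such that (upper << 12) + imm == value mod 2**32."""
    value &= 0xFFFFFFFF
    imm = value & 0xFFF
    if imm >= 0x800:                           # bit 11 set: addi sign-extends,
        imm -= 0x1000                          # so use a negative immediate
        upper = ((value >> 12) + 1) & 0xFFFFF  # and bump the upper part by 1
    else:
        upper = value >> 12
    return upper, imm

print(lui_addi_parts(0xABCDE123))  # upper 0xABCDE, imm 0x123 (bit 11 clear)
print(lui_addi_parts(0xFEEDA987))  # upper 0xFEEDB, imm -1657 (bit 11 set)
```

Both cases reconstruct the original constant: the lui value shifted left by 12, plus the sign-extended addi immediate, wraps back to the desired 32-bit value.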

Code Example 6.9

32-Bit Constant with a One in Bit 11

High-Level Code

int a = 0xFEEDA987;

RISC-V Assembly Code

lui   s2, 0xFEEDB   # s2 = 0xFEEDB000

addi s2, s2, −1657   # s2 = 0xFEEDA987

The int data type in C represents a signed number, that is, a two's complement integer. The C specification requires that int be at least 16 bits wide but does not require a particular size. Most modern compilers (including those for RV32I) use 32 bits, so an int represents a number in the range [−2^31, 2^31 − 1]. C also defines int32_t as a 32-bit two's complement integer, but this is more cumbersome to type.

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780128200643000064

Embedded Processor Architecture

Peter Barry, Patrick Crowley, in Modern Embedded Computing, 2012

Immediate Operands

Some instructions use data encoded in the instruction itself as a source operand. These operands are called immediate operands. For example, the following instruction loads the EAX register with zero.

MOV   EAX, 00

The maximum value of an immediate operand varies among instructions, but it can never be greater than 2^32. The maximum size of an immediate on a RISC architecture is much lower; for example, on the ARM architecture the maximum size of an immediate is 12 bits, as the instruction size is fixed at 32 bits. The concept of a literal pool is commonly used on RISC processors to get around this limitation. In this case the 32-bit value to be stored into a register is a data value held as part of the code section (in an area set aside for literals, often at the end of the object file). The RISC instruction loads the register with a program-counter-relative load operation to read the 32-bit data value into the register.

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780123914903000059

PIC Microcontroller Systems

Martin P. Bates, in Programming 8-bit PIC Microcontrollers in C, 2008

Program Execution

The chip has 8 K (8192 × 14 bits) of flash ROM program memory, which has to be programmed via the serial programming pins PGM, PGC, and PGD. The fixed-length instructions contain both the operation code and operand (immediate data, register address, or jump address). The mid-range PIC has a limited number of instructions (35) and is therefore classified as a RISC (reduced instruction set computer) processor.

Looking at the internal architecture, we can identify the blocks involved in program execution. The program memory ROM contains the machine code, in locations numbered from 0000h to 1FFFh (8 K). The program counter holds the address of the current instruction and is incremented or modified after each step. On reset or power up, it is reset to zero and the first instruction at address 0000 is loaded into the instruction register, decoded, and executed. The program then proceeds in sequence, operating on the contents of the file registers (000–1FFh), executing data move instructions to transfer data between ports and file registers or arithmetic and logic instructions to process it. The CPU has one main working register (W), through which all the data must pass.

If a branch instruction (conditional jump) is decoded, a bit test is carried out; and if the result is true, the destination address included in the instruction is loaded into the program counter to force the jump. If the result is false, the execution sequence continues unchanged. In assembly language, when CALL and RETURN are used to implement subroutines, a similar process occurs. The stack is used to store return addresses, so that the program can return automatically to the original program position. However, this mechanism is not used by the CCS C compiler, as it limits the number of levels of subroutine (or C functions) to eight, which is the depth of the stack. Instead, a simple GOTO instruction is used for function calls and returns, with the return address computed by the compiler.

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780750689601000018

HPC Architecture 1

Thomas Sterling, ... Maciej Brodowicz, in High Performance Computing, 2018

2.7.1 Single-Instruction, Multiple Data Architecture

The SIMD array form of parallel computer architecture consists of a very large number of relatively simple PEs, each operating on its own data memory (Fig. 2.13). The PEs are all controlled by a shared sequencer or sequence controller that broadcasts instructions in order to all the PEs. At any point in time all the PEs are doing the same operation but on their respective dedicated memory blocks. An interconnection network provides data paths for concurrent transfers of data between PEs, also managed by the sequence controller. I/O channels provide high bandwidth (in many cases) to the system as a whole or directly to the PEs for rapid postsensor processing. SIMD array architectures have been employed as standalone systems or integrated with other computer systems as accelerators.

Figure 2.13. The SIMD array form of parallel computer architecture.

The PE of the SIMD array is highly replicated to deliver potentially dramatic performance gain through this level of parallelism. The canonical PE consists of fundamental internal functional components, including the following.

Memory block—provides part of the system total memory which is directly accessible to the individual PE. The resulting system-wide memory bandwidth is very high, with each memory block read from and written to by its own PE.

ALU—performs operations on contents of data in local memory, possibly via local registers, with additional immediate operand values within broadcast instructions from the sequence controller.

Local registers—hold current working data values for operations performed by the PE. For load/store architectures, registers are direct interfaces to the local memory block. Local registers may serve as intermediate buffers for nonlocal data transfers from the system-wide network and remote PEs as well as external I/O channels.

Sequencer controller—accepts the stream of instructions from the system instruction sequencer, decodes each instruction, and generates the necessary local PE control signals, possibly as a sequence of microoperations.

Instruction interface—a port to the broadcast network that distributes the instruction stream from the sequence controller.

Data interface—a port to the system data network for exchanging data among PE memory blocks.

External I/O interface—for those systems that associate individual PEs with system external I/O channels, the PE includes a direct interface to the dedicated port.

The SIMD array sequence controller determines the operations performed by the set of PEs. It is also responsible for some of the computational work itself. The sequence controller may take various forms and is itself a target for new designs even today. But in the most general sense, a set of features and subcomponents unify most variations.

As a first approximation, Amdahl's law may be used to estimate the performance gain of a classical SIMD array computer. Assume that in a given instruction cycle either all the array processor cores, p_n, perform their respective operations simultaneously or only the control sequencer performs a serial operation with the array processor cores idle; also assume that the fraction of cycles, f, can take advantage of the array processor cores. Then using Amdahl's law (see Section 2.7.2) the speedup, S, can be determined as:

(2.11) S = 1 / [(1 − f) + (f / p_n)]
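A short Python sketch of Eq. (2.11), with illustrative values of f and p_n, shows how the serial fraction limits the gain even for a large PE array:

```python
def simd_speedup(f, p):
    """Speedup from Eq. (2.11): serial fraction 1 - f, parallel
    fraction f spread over p array processor cores."""
    return 1.0 / ((1.0 - f) + f / p)

# Even with 1024 PEs, speedup is capped near 1 / (1 - f).
print(round(simd_speedup(0.9, 1024), 2))   # 9.91
print(round(simd_speedup(0.99, 1024), 2))  # 91.18
```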

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780124201583000022

MPUs for Medical Networks

Syed V. Ahamed, in Intelligent Networks, 2013

11.4.3 Object Processor Units

The architectural framework of typical object processor units (OPUs) is consistent with the typical representation of CPUs. Design of the object operation code (Oopc) plays an important role in the design of the OPU and object-oriented machine. In an elementary sense, this role is comparable to the role of the 8-bit opc in the design of the IAS machine during the 1944–1945 period. For this (IAS) machine, the opc length was 8 bits in the 20-bit instructions, and the memory of 4096 words of 40-bit memory corresponds to an address space of 12 binary bits. The design experience of the game processors and the modern graphical processor units will serve as a platform for the design of the OPUs and hardware-based object machines.

The intermediate generations of machines (such as the IBM 7094 and 360 series) provide a rich array of guidelines to derive the instruction sets for the OPUs. If a set of object registers or an object cache can be envisioned in the OPU, then the instructions corresponding to register instructions (R-series), register-storage (RS-series), storage (SS), immediate operand (I-series), and I/O series instructions for the OPU can also be designed. The instruction set will need an expansion to suit the application. It is logical to foresee the need of control object memories to replace the control memories of the microprogrammable computers.

The instruction set of the OPU is derived from the most frequent object functions such as (i) single-object instructions, (ii) multiobject instructions, (iii) object to object memory instructions, (iv) internal object–external object instructions, and (v) object relationship instructions. The separation of logical, numeric, seminumeric, alphanumeric, and convolution functions between objects will also be necessary. Hardware, firmware, or brute-force software (compiler power) can accomplish these functions. The need for the next-generation object and knowledge machines (discussed in Section 11.5) should provide an economic incentive to develop these architectural improvements beyond the basic OPU configuration shown in Figure 11.2.

Figure 11.2. Schematic of a hardwired object processor unit (OPU). Processing n objects with m (maximum) attributes generates an n×m matrix. The common, interactive, and overlapping attributes are thus reconfigured to establish primary and secondary relationships between objects. DMA, direct memory access; IDBMS, Intelligent, data, object, and attribute base(s) management system(s); KB, knowledge base(s). Many variations can be derived.

The designs of the OPU can be as diversified as the designs of a CPU: CPUs combine I/O device interfaces, different memory units, and direct memory access hardware units for high-speed data exchange between main memory units and large secondary memories. Over the decades, numerous CPU architectures (single bus, multibus, hardwired, micro- and nanoprogrammed, multicontrol memory-based systems) have come and gone.

Some of the microprogrammable and RISC architectures still exist. Efficient and optimal performance from the CPUs also needs combined SISD, SIMD, MISD, and MIMD (Stone, 1980) and/or pipeline architectures. Combined CPU designs can utilize different clusters of architecture for their subfunctions. Some formats (e.g., array processors, matrix manipulators) are in active use. Two concepts that have survived many generations of CPUs are (i) the algebra of functions (i.e., opcodes) that is well delineated, accepted, and documented and (ii) the operands that undergo dynamic changes as the opcode is executed in the CPU(s).

An architectural consonance exists between CPUs and OPUs. In pursuing the similarities, the five variations (SISD, SIMD, MISD, MIMD, and/or pipeline) designs established for CPUs can be mapped into five corresponding designs: single process single object (SPSO), single process multiple objects (SPMO), multiple process single object (MPSO), multiple process multiple objects (MPMO), and/or partial process pipeline, respectively (Ahamed, 2003).

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B978012416630100011X

Demultiplexing

George Varghese, in Network Algorithmics, 2005

8.6 DYNAMIC PACKET FILTER: COMPILERS TO THE RESCUE

The Pathfinder story ends with an appeal to hardware to handle demultiplexing at high speeds. Since it is unlikely that most workstations and PCs today can afford dedicated demultiplexing hardware, it appears that implementors must choose between the flexibility afforded by early demultiplexing and the limited performance of a software classifier. Thus it is hardly surprising that high-performance TCP [CJRS89], active messages [vCGS92], and Remote Procedure Call (RPC) [TNML93] implementations use hand-crafted demultiplexing routines.

Dynamic packet filter [EK96] (DPF) attempts to have its cake (gain flexibility) and eat it (obtain performance) at the same time. DPF starts with the Pathfinder trie idea. However, it goes on to eliminate indirections and extra checks inherent in cell processing by recompiling the classifier into machine code each time a filter is added or deleted. In effect, DPF produces separate, optimized code for each cell in the trie, as opposed to generic, unoptimized code that can parse any cell in the trie.

DPF is based on dynamic code generation technology [Eng96], which allows code to be generated at run time instead of when the kernel is compiled. DPF is an application of Principle P2, shifting computation in time. Note that by run time we mean classifier update time and not packet processing time.

This is fortunate because it implies that DPF need only recompile code fast enough so as not to slow down a classifier update. For example, it may take milliseconds to set up a connection, which in turn requires adding a filter to identify the endpoint in the same time. By contrast, it can take a few microseconds to receive a minimum-size packet at gigabit rates. Despite this leeway, submillisecond compile times are still challenging.

To understand why using specialized code per cell is useful, it helps to understand two generic causes of cell-processing inefficiency in Pathfinder:

Interpretation Overhead: Pathfinder code is indeed compiled into machine instructions when kernel code is compiled. However, the code does, in some sense, "interpret" a generic Pathfinder cell. To see this, consider a generic Pathfinder cell C that specifies a 4-tuple: offset, length, mask, value. When a packet P arrives, idealized machine code to check whether the cell matches the packet is as follows:

LOAD R1, C(offset); (* load offset specified in cell into register R1 *)

LOAD R2, C(length); (* load length specified in cell into register R2 *)

LOAD R3, P(R1, R2); (* load packet field specified by offset and length into R3 *)

LOAD R1, C(mask); (* load mask specified in cell into register R1 *)

AND R3, R1; (* mask packet field as specified in cell *)

LOAD R2, C(value); (* load value specified in cell into register R2 *)

BNE R2, R3; (* branch if masked packet field is not equal to value *)

Notice the extra instructions and extra memory references in Lines 1, 2, 4, and 6 that are used to load parameters from a generic cell in order to be available for later comparison.

Safety-Checking Overhead: Because packet filters written by users cannot be trusted, all implementations must perform checks to guard against errors. For example, every reference to a packet field must be checked at run time to ensure that it stays within the current packet being demultiplexed. Similarly, references need to be checked in real time for memory alignment; on many machines, a memory reference that is not aligned to a multiple of a word size can cause a trap. After these additional checks, the code fragment shown earlier is more complicated and contains even more instructions.

By specializing code for each cell, DPF can eliminate these two sources of overhead by exploiting information known when the cell is added to the Pathfinder graph.

Exterminating Interpretation Overhead: Since DPF knows all the cell parameters when the cell is created, DPF can generate code in which the cell parameters are directly encoded into the machine code as immediate operands. For example, the earlier code fragment to parse a generic Pathfinder cell collapses to the more compact cell-specific code:

LOAD R3, P(offset, length); (* load packet field into R3 *)

AND R3, mask; (* mask packet field using mask in instruction *)

BNE R3, value; (* branch if field not equal to value *)

Notice that the extra instructions and (more importantly) extra memory references to load parameters have disappeared, because the parameters are directly placed as immediate operands within the instructions.
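The same collapse can be sketched in Python. This is a loose analogy only: real DPF emits machine code, whereas here the cell constants are baked into a generated lambda, and the two example cells (EtherType and IP protocol) are illustrative values, not DPF's own filters:

```python
# A Pathfinder-style cell is a 4-tuple: (offset, length, mask, value).
cells = [(12, 2, 0xFFFF, 0x0800),   # EtherType field == IPv4
         (23, 1, 0xFF, 0x06)]       # IP protocol field == TCP

def interpret(cells, pkt):
    """Generic interpreter: reloads each cell's parameters per packet."""
    for off, ln, mask, val in cells:
        field = int.from_bytes(pkt[off:off + ln], "big")
        if field & mask != val:
            return False
    return True

def compile_cells(cells):
    """'Compile' the classifier once: constants become literals in the
    generated source, loosely mimicking DPF's immediate operands."""
    checks = [f"(int.from_bytes(pkt[{o}:{o + l}], 'big') & {m}) == {v}"
              for o, l, m, v in cells]
    return eval("lambda pkt: " + " and ".join(checks))

match = compile_cells(cells)
pkt = bytearray(64)
pkt[12:14] = b"\x08\x00"   # IPv4 EtherType
pkt[23] = 0x06             # TCP protocol
print(interpret(cells, pkt), match(bytes(pkt)))  # True True
```

The interpreter re-fetches offset, length, mask, and value on every packet; the compiled predicate carries them as literals, which is the point of the DPF optimization.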

Mitigating Safety-Checking Overhead: Alignment checking can be reduced in the expected case (P11) by inferring at compile time that most references are word aligned. This can be done by examining the complete filter. If the initial reference is word aligned and the current reference (offset plus length of all previous headers) is a multiple of the word length, then the reference is word aligned. Real-time alignment checks need only be used when the compile-time inference fails, for example, when indirect loads are performed (e.g., a variable-size IP header). Similarly, at compile time the largest offset used in any cell can be determined and a single check can be placed (before packet processing) to ensure that the largest offset is within the length of the current packet.

Once one is onto a good thing, it pays to push it for all it is worth. DPF goes on to exploit compile-time knowledge to perform further optimizations as follows. A first optimization is to combine small accesses to adjacent fields into a single large access. Other optimizations are explored in the exercises.

DPF has the following potential disadvantages that are made manageable through careful design.

Recompilation Time: Recall that when a filter is added to the Pathfinder trie (Figure 8.6), only cells that were not present in the original trie need to be created. DPF optimizes this expected case (P11) by caching the code for existing cells and copying this code directly (without recreating it from scratch) to the new classifier code block. New code must be emitted only for the newly created cells. Similarly, when a new value is added to a hash table (e.g., the new TCP port added in Figure 8.6), unless the hash function changes, the code is reused and only the hash table is updated.

Code Bloat: One of the standard advantages of interpretation is more compact code. Generating specialized code per cell appears to create excessive amounts of code, particularly for large numbers of filters. A large code footprint can, in turn, result in degraded instruction cache performance. However, a careful examination shows that the number of distinct code blocks generated by DPF is only proportional to the number of distinct header fields examined by all filters. This should scale much better than the number of filters. Consider, for example, 10,000 simultaneous TCP connections, for which DPF may emit only three specialized code blocks: one for the Ethernet header, one for the IP header, and one hash table for the TCP header.

The final performance numbers for DPF are impressive. DPF demultiplexes messages 13–26 times faster than Pathfinder on a comparable platform [EK96]. The time to add a filter, however, is only three times slower than Pathfinder. Dynamic code generation accounts for only 40% of this increased insertion overhead.

In any case, the larger insertion costs appear to be a reasonable price to pay for faster demultiplexing. Finally, DPF demultiplexing routines appear to rival or beat hand-crafted demultiplexing routines; for example, a DPF routine to demultiplex IP packets takes 18 instructions, compared to an earlier value, reported in Clark [Cla85], of 57 instructions. While the two implementations were on different machines, the numbers provide some indication of DPF quality.

The final message of DPF is twofold. First, DPF indicates that one can obtain both performance and flexibility. Just as compiler-generated code is often faster than hand-crafted code, DPF code appears to make hand-crafted demultiplexing no longer necessary. Second, DPF indicates that hardware support for demultiplexing at line rates may not be necessary. In fact, it may be difficult to allow dynamic code generation on filter creation in a hardware implementation. Software demultiplexing allows cheaper workstations; it also allows demultiplexing code to benefit from processor speed improvements.

Technology Changes Can Invalidate Design Assumptions

There are several examples of innovations in architecture and operating systems that were discarded after initial use and then returned to be used again. While this may seem like the whims of fashion ("collars are frilled again in 1995") or reinventing the wheel ("there is nothing new under the sun"), it takes a careful understanding of current technology to know when to dust off an old idea, possibly even in a new guise.

Take, for example, the core of the telephone network used to send voice calls via analog signals. With the advent of fiber optics and the transistor, much of the core phone network now transmits voice signals in digital formats using the T1 and SONET hierarchies. However, with the advent of wavelength-division multiplexing in optical fiber, there is at least some talk of returning to analog transmission.

Thus the good system designer must constantly monitor available technology to check whether the system design assumptions have been invalidated. The idea of using dynamic compilation was mentioned by the CSPF designers in Mogul et al. [MRA87] but was not considered further. The CSPF designers assumed that tailoring code to specific sets of filters (by recompiling the classifier code whenever a filter was added) was too "complicated."

Dynamic compilation at the time of the CSPF design was probably slow and also not portable across systems; the gains at that time would also have been marginal because of other bottlenecks. However, by the time DPF was being designed, a number of systems, including VCODE [Eng96], had built fairly fast and portable dynamic compilation infrastructure. The other classifier implementations in DPF's lineage had also eliminated other bottlenecks, which allowed the benefits of dynamic compilation to stand out more clearly.

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780120884773500102

Early Intel® Architecture

In Power and Performance, 2015

1.1.4 Machine Code Format

One of the more complex aspects of x86 is the encoding of instructions into machine codes, that is, the binary format expected by the processor for instructions. Typically, developers write assembly using the instruction mnemonics and let the assembler select the proper instruction format; however, that isn't always feasible. An engineer might want to bypass the assembler and manually encode the desired instructions, in order to use a newer instruction on an older assembler, which doesn't support that instruction, or to precisely control the encoding utilized, in order to control code size.

8086 instructions, and their operands, are encoded into a variable length, ranging from 1 to 6 bytes. To accommodate this, the decoding unit parses the earlier bits in order to determine what bits to expect in the future, and how to interpret them. Utilizing a variable length encoding format trades an increase in decoder complexity for improved code density. This is because very common instructions can be given short sequences, while less common and more complex instructions can be given longer sequences.

The first byte of the machine code represents the instruction's opcode. An opcode is simply a fixed number corresponding to a specific form of an instruction. Different forms of an instruction, such as one form that operates on a register operand and one form that operates on an immediate operand, may have different opcodes. This opcode forms the initial decoding state that determines the decoder's next actions. The opcode for a given instruction format can be found in Volume 2, the Instruction Set Reference, of the Intel SDM.

Some very common instructions, such as the stack manipulating PUSH and POP instructions in their register form, or instructions that use implicit registers, can be encoded with only one byte. For example, consider the PUSH instruction, which places the value located in the register operand on the top of the stack, and which has an opcode of 01010b. Note that this opcode is only 5 bits. The remaining three least significant bits are the encoding of the register operand. In the modern instruction reference, this instruction format, "PUSH r16," is expressed as "0x50 + rw" (Intel Corporation, 2013). The rw entry refers to a register code specifically designated for single byte opcodes. Table 1.3 provides a list of these codes. For example, using this table and the reference above, the binary encoding for PUSH AX is 0x50, for PUSH BP is 0x55, and for PUSH DI is 0x57. As an aside, in later processor generations the 32- and 64-bit versions of the PUSH instruction, with a register operand, are also encoded as 1 byte.

Table 1.3. Register Codes for Single Byte Opcodes "+rw" (Intel Corporation, 2013)

rw Register
0 AX
1 CX
2 DX
3 BX
4 SP
5 BP
6 SI
7 DI
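Using the codes in Table 1.3, the "0x50 + rw" rule can be sketched in a few lines of Python (the dictionary and helper name here are illustrative, not from the SDM):

```python
# Register codes for single-byte opcodes ("+rw"), per Table 1.3.
RW_CODES = {"AX": 0, "CX": 1, "DX": 2, "BX": 3,
            "SP": 4, "BP": 5, "SI": 6, "DI": 7}

def encode_push_r16(reg):
    """Encode the one-byte "PUSH r16" form: opcode base 0x50 plus the rw code."""
    return bytes([0x50 + RW_CODES[reg]])

print(encode_push_r16("AX").hex())  # 50
print(encode_push_r16("BP").hex())  # 55
print(encode_push_r16("DI").hex())  # 57
```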

If the format is longer than 1 byte, the second byte, referred to as the Mod R/M byte, describes the operands. This byte is composed of three different fields: MOD, bits 7 and 6; REG, bits 5 through 3; and R/M, bits 2 through 0.

The MOD field encodes whether one of the operands is a memory address, and if so, the size of the memory offset the decoder should expect. This memory offset, if present, immediately follows the Mod R/M byte. Table 1.4 lists the meanings of the MOD field.

Table 1.4. Values for the MOD Field in the Mod R/M Byte (Intel Corporation, 2013)

Value Memory Operand Offset Size
00 Yes 0
01 Yes 1 Byte
10 Yes 2 Bytes
11 No 0

The REG field encodes one of the register operands, or, in the case where there are no register operands, is combined with the opcode for a special instruction-specific meaning. Table 1.5 lists the various register encodings. Notice how the high and low byte accesses to the data group registers are encoded, with the byte access to the pointer/index classification of registers actually accessing the high byte of the data group registers.

Table 1.5. Register Encodings in Mod R/M Byte (Intel Corporation, 2013)

Value Register (16/8)
000 AX/AL
001 CX/CL
010 DX/DL
011 BX/BL
100 SP/AH
101 BP/CH
110 SI/DH
111 DI/BH

In the case where MOD = 3, that is, where there are no memory operands, the R/M field encodes the second register operand, using the encodings from Table 1.5. Otherwise, the R/M field specifies how the memory operand's address should be calculated.
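The field layout described above (MOD in bits 7 and 6, REG in bits 5 through 3, R/M in bits 2 through 0) can be sketched as a pair of small Python helpers (the function names are my own):

```python
def pack_modrm(mod, reg, rm):
    """Pack MOD (2 bits), REG (3 bits), and R/M (3 bits) into a Mod R/M byte."""
    assert 0 <= mod <= 3 and 0 <= reg <= 7 and 0 <= rm <= 7
    return (mod << 6) | (reg << 3) | rm

def unpack_modrm(byte):
    """Split a Mod R/M byte back into its (MOD, REG, R/M) fields."""
    return byte >> 6, (byte >> 3) & 0b111, byte & 0b111

# MOD = 3 is the register-to-register form: no memory operand, no offset.
print(hex(pack_modrm(3, 7, 2)))  # 0xfa
print(unpack_modrm(0x7E))        # (1, 7, 6)
```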

The 8086, and its other 16-bit successors, had some limitations on which registers and forms could be used for addressing. These restrictions were removed once the architecture expanded to 32 bits, so it doesn't make too much sense to document them here.

For an example of the REG field extending the opcode, consider the CMP instruction in the form that compares a 16-bit immediate against a 16-bit register. In the SDM, this form, "CMP r16, imm16," is described as "81 /7 iw" (Intel Corporation, 2013), which means an opcode byte of 0x81, then a Mod R/M byte with MOD = 11₂, REG = 7 = 111₂, and the R/M field containing the 16-bit register to test. The iw entry specifies that a 16-bit immediate value will follow the Mod R/M byte, providing the immediate to test the register against. Therefore, "CMP DX, 0xABCD" will be encoded as: 0x81, 0xFA, 0xCD, 0xAB. Notice that 0xABCD is stored byte-reversed because x86 is little-endian.
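That encoding can be reproduced mechanically. The sketch below packs the Mod R/M byte and appends the little-endian immediate; the register codes come from Table 1.5, while the function and dictionary names are illustrative:

```python
import struct

# 16-bit register encodings from Table 1.5.
REG16 = {"AX": 0, "CX": 1, "DX": 2, "BX": 3,
         "SP": 4, "BP": 5, "SI": 6, "DI": 7}

def encode_cmp_r16_imm16(reg, imm):
    """Encode "CMP r16, imm16" (81 /7 iw): opcode, Mod R/M, little-endian word."""
    modrm = (0b11 << 6) | (7 << 3) | REG16[reg]  # MOD=11, REG=/7, R/M=register
    return bytes([0x81, modrm]) + struct.pack("<H", imm)

print(encode_cmp_r16_imm16("DX", 0xABCD).hex())  # 81facdab
```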

Consider another example, this time performing a CMP of a 16-bit immediate against a memory operand. For this example, the memory operand is encoded as an offset from the base pointer, BP + 8. The CMP encoding format is the same as before; the difference will be in the Mod R/M byte. The MOD field will be 01₂, although 10₂ could be used equally well but would waste an extra byte. Similar to the last example, the REG field will be 7, 111₂. Finally, the R/M field will be 110₂. This leaves us with the first byte, the opcode 0x81, and the second byte, the Mod R/M byte 0x7E. Thus, "CMP [BP + 8], 0xABCD" will be encoded as 0x81, 0x7E, 0x08, 0xCD, 0xAB.
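The memory form differs only in the Mod R/M byte and the extra displacement byte that follows it. A minimal sketch, assuming the 16-bit addressing convention that R/M = 110₂ with MOD = 01₂ selects [BP + disp8] (the helper name is my own):

```python
import struct

def encode_cmp_mem_bp_imm16(disp8, imm16):
    """Encode "CMP [BP + disp8], imm16": opcode 0x81, Mod R/M with MOD=01,
    REG=/7, R/M=110, one displacement byte, then the little-endian word."""
    modrm = (0b01 << 6) | (7 << 3) | 0b110   # 0x7E
    return bytes([0x81, modrm, disp8]) + struct.pack("<H", imm16)

print(encode_cmp_mem_bp_imm16(8, 0xABCD).hex())  # 817e08cdab
```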
