OneSwap Series 4 - ABI is not the invisible man

Sep 27, 2020

Throughout the history of programming languages, the ABI (Application Binary Interface) has played a vital role. Function call syntax in source code can be understood by programmers, but not by machines, which only understand binary data. Whether parameters are passed in registers or on the stack, in what order they are passed, and how the members of a structure parameter are laid out: all of this must be clearly defined when one function calls another. That definition is the ABI. Under the same ABI, functions in binary files generated by compilers from different vendors, or even functions written in different languages such as C and Pascal, can call each other.

Despite its importance, most programmers are unfamiliar with ABI. Why? Because the compiler encapsulates the ABI so well that programmers hardly need to care about its details by themselves. For them, ABI is just the invisible man.

ABI is also very important for both Solidity and EVM. For example, the following important matters are defined by the ABI: how to pass the parameters when calling a contract’s functions; how to pass the return values of the contract (Solidity supports multiple return values); and how to pass the parameters of the Event.

Unlike C language programmers, Solidity programmers find ABI a big headache. What forces us to care about ABI in the Solidity language? There are two main reasons:

Solidity is not yet mature, and it does not encapsulate the underlying ABI well enough, so programmers have to deal with ABI details themselves
The ABI affects Gas consumption. Without a good understanding of it, you may waste a lot of Gas unintentionally

Next, we will briefly describe the details of Solidity’s ABI first, and then introduce some scenarios where we need to pay attention to the details of ABI.

Solidity’s ABI

A contract is always executed in an EVM instance. The EVM is relatively isolated from the outside: its stack and memory cannot be read or written from outside. For input and output, it provides two dedicated areas, the calldata area and the returndata area, each of which can hold a byte string of any length. The interaction mechanism works as follows:

When an external account calls the contract, the calldata data in the transaction is copied to the calldata area of the called EVM as input
When a contract calls another contract, a fragment in the memory of the EVM needs to be specified, which is copied to the calldata area of the called EVM as input
When a contract finishes execution, a fragment in the memory of the EVM needs to be specified, and this fragment will be copied to the returndata area of the caller

EVM provides special instructions to:

Copy the data in the calldata area to the stack (CALLDATALOAD) or to memory (CALLDATACOPY), so that the contract can learn what parameters were passed in from outside
Copy the data in the returndata area to memory (RETURNDATACOPY), so that the caller can learn what data the called contract returned
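As a minimal sketch of the calldata instructions mentioned above, the following hypothetical contract (CalldataPeek and peek are illustrative names, not part of OneSwap) reads its own calldata with inline assembly and checks the result against Solidity's built-in msg.sig and msg.data. It assumes a 0.8.x compiler.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical demo contract; not taken from the OneSwap code base.
contract CalldataPeek {
    function peek(uint256 x)
        external
        pure
        returns (bytes4 selector, uint256 firstArg, uint256 size)
    {
        assembly {
            // CALLDATALOAD reads one 32-byte word; keep only its top 4 bytes.
            selector := and(calldataload(0), shl(224, 0xffffffff))
            // The first parameter starts right after the 4-byte selector.
            firstArg := calldataload(4)
            // CALLDATASIZE gives the total length of the calldata area.
            size := calldatasize()
        }
        // The hand-read values agree with what Solidity exposes itself.
        assert(selector == msg.sig);
        assert(firstArg == x);
        assert(size == msg.data.length);
    }
}
```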

The Solidity ABI specifies how parameters are arranged to form a byte string. How the input parameters are arranged in the calldata area, how the output parameters are arranged in the returndata area, and how Event parameters are encoded as log data are all specified by this ABI. For the full details, refer to the Contract ABI Specification in the Solidity documentation. Here is a brief introduction:

Encode the parameters in the order they appear. The encoding result is divided into two parts: head (the content and offset of the fixed-length data) and tail (the content of the variable-length data)
The encoded data are in units of 32 bytes, and every 32 bytes of data is counted as a slot
Integers and bool values occupy one slot. A value shorter than 32 bytes is padded on the high-order (left) side: with zeros for unsigned integers and bool values, and by sign extension for negative signed integers
An integer array or bool array of length N occupies N slots, and each member occupies 1 slot
Variable-length data types need to be represented by three parts: offset, length, and content

The offset is placed in the head part, as a pointer, pointing to the starting position of the variable-length data content
The length occupies 1 slot, which represents the length of the content; the content occupies an integer number of slots, and they are all placed in the tail part
The length of string and bytes types is their number of bytes. The content is densely arranged in N slots. If it cannot occupy an integer number of slots, the last slot needs to be filled with zeros.
The length of variable-length integer arrays or bool arrays is the number of their members, and each member occupies a slot
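The rules above can be checked with Solidity's own abi.encode. The sketch below is illustrative only (AbiLayoutDemo and its functions are hypothetical names, and a 0.8.x compiler is assumed); the assertions spell out where each slot lands.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical demo contract; not taken from the OneSwap code base.
contract AbiLayoutDemo {
    // Fixed-length values: one 32-byte slot each, zero-padded on the left.
    function staticExample() external pure returns (bytes memory enc) {
        enc = abi.encode(uint32(7), true);
        assert(enc.length == 64);     // 2 slots
        assert(uint8(enc[31]) == 7);  // last byte of slot 0
        assert(uint8(enc[63]) == 1);  // last byte of slot 1 (true)
    }

    // Variable-length value: offset in the head, length and content in the tail.
    function dynamicExample() external pure returns (bytes memory enc) {
        string memory s = "abc";
        enc = abi.encode(uint256(5), s);
        // slot 0: 5 | slot 1: offset 0x40 | slot 2: length 3 | slot 3: "abc" + 29 zero bytes
        assert(enc.length == 128);
        assert(uint8(enc[63]) == 0x40);  // offset of the tail, counted from the start
        assert(uint8(enc[95]) == 3);     // length in bytes
        assert(uint8(enc[96]) == 0x61);  // 'a', first byte of the content
        assert(uint8(enc[99]) == 0);     // padding begins after "abc"
    }
}
```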

For the calldata for the called contract, in addition to the parameters, the first four bytes will also include a “function selector”, which will be described in detail below.

If the calldata actually passed to a contract is shorter than expected, the calldata parsing logic inserted by Solidity will find that and revert execution. What if it is longer than expected? Solidity just ignores these redundant data and will not report errors.
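A minimal sketch of this behavior, assuming a 0.8.x compiler: the hypothetical contract below (MemoDemo, double, and callWithMemo are illustrative names) calls itself with extra bytes appended after the ABI-encoded parameter, and the call still succeeds.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical demo contract; not taken from the OneSwap code base.
contract MemoDemo {
    function double(uint256 x) external pure returns (uint256) {
        return 2 * x;
    }

    function callWithMemo() external view returns (uint256) {
        // Well-formed calldata (selector + one slot) followed by trailing bytes.
        bytes memory data = abi.encodePacked(
            abi.encodeWithSelector(this.double.selector, uint256(21)),
            "hello memo"
        );
        (bool ok, bytes memory ret) = address(this).staticcall(data);
        require(ok, "call failed");
        // The trailing bytes were ignored by the parsing logic: 2 * 21 = 42.
        return abi.decode(ret, (uint256));
    }
}
```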

This is a very interesting feature. In blockchains like EOS and CoinEx Chain, a memo can be added to a transaction. It does not change the semantics of the transaction; it only attaches extra information, which is recorded on chain permanently together with the transaction. Ethereum’s transaction format has no explicit memo field, but we can append harmless information at the end of calldata to achieve a memo-like effect.

Gas Consumption When External Accounts Call Contracts

Solidity’s ABI raises an obvious question: aren’t there too many zeros? A 32-bit integer parameter is padded with 28 zero bytes, and a bool value with 31. As a result, most of the bytes in a transaction that calls a contract are padding.

That is true. To reduce the Gas spent on these zero bytes, Ethereum charges less for them: a non-zero byte in transaction data costs 68 Gas and a zero byte costs 4 Gas, a 17-fold difference. (The Istanbul upgrade, through EIP-2028, lowered the non-zero price to 16 Gas, which narrows the ratio to 4; the figures below use the 68 Gas price.) So the zero padding hurts less than you might expect.

Still, how the parameters of an external function are defined matters a great deal for Gas consumption. For example, passing 256 bool values as a bool[256] costs at least 4*32*256=32768 Gas (when every value is false), while passing them as a uint256 bit mask costs at most 68*32=2176 Gas.
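The arithmetic above can be reproduced with a small helper. This is only a sketch: CalldataGasDemo, dataCost, and compare are hypothetical names, the per-byte prices are passed in as parameters (4 and 68 match the figures in the text; 16 could be passed instead for the post-Istanbul price), and the fixed base cost of a transaction is not included.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical demo contract; not taken from the OneSwap code base.
contract CalldataGasDemo {
    // Prices every byte of `data` as transaction data.
    function dataCost(bytes memory data, uint256 zeroPrice, uint256 nonZeroPrice)
        public
        pure
        returns (uint256 cost)
    {
        for (uint256 i = 0; i < data.length; i++) {
            cost += data[i] == bytes1(0) ? zeroPrice : nonZeroPrice;
        }
    }

    function compare() external pure returns (uint256 asBoolArray, uint256 asBitMask) {
        bool[256] memory flags; // all false: 256 slots of 32 zero bytes
        asBoolArray = dataCost(abi.encode(flags), 4, 68);
        assert(asBoolArray == 32768); // 4 * 32 * 256

        // Worst case for the bit mask: all 32 bytes are non-zero.
        asBitMask = dataCost(abi.encode(type(uint256).max), 4, 68);
        assert(asBitMask == 2176); // 68 * 32
    }
}
```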

In the OneSwap contracts, parameters are compressed when the number of bytes for zero filling may be relatively large. For example, the function of OneSwapToken to transfer tokens to multiple addresses:

function multiTransfer(uint256[] calldata mixedAddrVal) public override returns (bool) {
    for (uint i = 0; i < mixedAddrVal.length; i++) {
        address to = address(mixedAddrVal[i]>>96); // upper 160 bits: recipient
        uint256 value = mixedAddrVal[i]&(2**96-1); // lower 96 bits: amount
        _transfer(msg.sender, to, value);
    }
    return true;
}

In the mixedAddrVal array here, the upper 160 bits of each member are the recipient address, and the lower 96 bits are the amount to transfer. The function does not take two variable-length arrays, one for recipient addresses and one for amounts, because that would lead to a lot of zero-filled bytes.
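A caller has to build each array member in this packed form. The sketch below shows one way to do that and to reverse it; PackAddrVal, pack, and unpack are hypothetical helpers that are not part of OneSwap, and the explicit uint160 casts assume a 0.8.x compiler.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical helper; not taken from the OneSwap code base.
contract PackAddrVal {
    // Address in the upper 160 bits, amount in the lower 96 bits.
    function pack(address to, uint96 value) public pure returns (uint256) {
        return (uint256(uint160(to)) << 96) | uint256(value);
    }

    // Mirrors the unpacking done inside multiTransfer.
    function unpack(uint256 mixed) public pure returns (address to, uint256 value) {
        to = address(uint160(mixed >> 96));
        value = mixed & (2**96 - 1);
    }

    function roundTrip(address to, uint96 value) external pure returns (bool) {
        (address to2, uint256 value2) = unpack(pack(to, value));
        return to2 == to && value2 == value;
    }
}
```

Because the amount type is uint96, an amount that does not fit into 96 bits cannot be passed to pack at all, which avoids silently corrupting the address bits.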

For another example, the function of trading pairs for deleting orders in batches is defined as follows:

function removeOrders(uint[] calldata rmList) external override lock {
      uint[5] memory proxyData;
      uint expectedCallDataSize = 4+32*(ProxyData.COUNT+2+rmList.length);
      ProxyData.fill(proxyData, expectedCallDataSize);
      for(uint i = 0; i < rmList.length; i++) {
          uint rmInfo = rmList[i];
          bool isBuy = uint8(rmInfo) != 0;
          uint32 id = uint32(rmInfo>>8);
          uint72 prevKey = uint72(rmInfo>>40);
          _removeOrder(isBuy, id, prevKey, proxyData);
      }
  }

In the rmList array, each member contains a bool value, a 32-bit integer, and a 72-bit integer, for a total of at most 14 non-zero bytes (1+4+9). That still leaves at least 18 zero bytes per member. To compress it further, each array member could describe two orders to be deleted, but this was not adopted for the sake of code readability.
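The bit layout read by removeOrders can be sketched as a pack/unpack pair. PackRmInfo and its functions are hypothetical names used for illustration only; the shifts mirror the ones in the function above, and a 0.8.x compiler is assumed.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical helper; not taken from the OneSwap code base.
contract PackRmInfo {
    // bits 0..7: isBuy | bits 8..39: id | bits 40..111: prevKey
    function pack(bool isBuy, uint32 id, uint72 prevKey) public pure returns (uint256) {
        return (uint256(prevKey) << 40)
            | (uint256(id) << 8)
            | (isBuy ? uint256(1) : uint256(0));
    }

    // Same extraction steps as in removeOrders.
    function unpack(uint256 rmInfo)
        public
        pure
        returns (bool isBuy, uint32 id, uint72 prevKey)
    {
        isBuy = uint8(rmInfo) != 0;
        id = uint32(rmInfo >> 8);
        prevKey = uint72(rmInfo >> 40);
    }
}
```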

Keyword calldata

In the two examples above, you may have noticed that the parameters are marked with the calldata attribute. In Solidity, every parameter passed by reference, including bytes, string, and fixed-length and variable-length arrays, must be marked with its data location: storage, memory, or calldata. A storage parameter comes from persistent storage, a memory parameter from memory, and a calldata parameter from the calldata area. A function with the external attribute cannot receive storage parameters, because an outside caller cannot pass a reference into the contract’s storage. The following code segment fails to compile:

function try2(uint[] storage aList) external returns (uint) {
    return aList[0]+aList[1];
}
// Error: Data location must be "memory" or "calldata" for parameter in external function, but "storage" was given.

If a parameter of an internal function is marked memory, other functions may have modified the data in memory before the call; if it is marked calldata, the parameter is read-only, because no function can modify the calldata area.

Whether the parameter of an external function is marked as memory or calldata does not seem to make a big difference, as both can do the same job. In fact, they differ in Gas consumption. Before an external function is executed, ABI parsing logic inserted by Solidity runs first and copies data out of the calldata area. Specifically:

Ordinary value-type data (integer, bool, address) need to be copied to the stack
Memory-type data need to be copied to memory
Calldata-type data is not copied beforehand. Instead, during the execution, the external function will decide when to copy and which to copy.

For external functions, it is advised to set the parameters to calldata instead of memory, so that it can be copied on demand and save Gas.
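A minimal illustration of the difference, assuming a 0.8.x compiler (DataLocationDemo and its two functions are hypothetical names): both functions return the same result, but only the memory version copies the entire array before its body runs.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical demo contract; not taken from the OneSwap code base.
contract DataLocationDemo {
    // The whole array is copied from calldata into memory up front,
    // even though only one member is ever used.
    function firstFromMemory(uint256[] memory a) external pure returns (uint256) {
        return a[0];
    }

    // Nothing is copied in advance; a[0] is read from calldata on demand.
    function firstFromCalldata(uint256[] calldata a) external pure returns (uint256) {
        return a[0];
    }
}
```

The longer the array passed in, the larger the gap between the two should be, since the copying cost of the memory version grows with the array length.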

The first “copy-to-stack” approach mentioned above sometimes leads to a headache for programmers. For example:

function try1(uint a1, uint a2, uint a3, uint a4, uint a5, uint a6, uint a7, uint a8, uint a9) public pure returns (uint) {
    return try2(a1, a2, a3, a4, a5, a6, a7, a8, a9);
}

function try2(uint a1, uint a2, uint a3, uint a4, uint a5, uint a6, uint a7, uint a8, uint a9) public pure returns (uint) {
    return a1+a2+a3+a4+a5+a6+a7+a8+a9;
}

These two functions seem perfect. In other programming languages, similar codes can be compiled and executed smoothly. But not in Solidity. An error is reported:

Compiler error: Stack too deep, try removing local variables.

That is because the EVM’s DUP and SWAP instructions can only reach the top 16 slots of the stack, so Solidity cannot keep more than 16 active local variables on the stack at any point in the code. Variables declared inside the function, incoming parameters, and the parameters for calling other functions all count as local variables. The try1 function above has 9+9=18>16 local variables, which causes the error.

Therefore, when too many input parameters cause this kind of trouble, consider packing some of them into an array of calldata type, which takes up no stack space. For example, the following Solidity code compiles and runs:

function try3(uint[9] calldata a) public pure returns (uint) {
    return a[0]+a[1]+a[2]+a[3]+a[4]+a[5]+a[6]+a[7]+a[8];
}

Gas Consumption of Event Parameters

To be specific, ABI-encoded data may consume Gas in the following five scenarios:

An external account calls a contract, producing calldata
A contract calls another contract, producing calldata
A contract is called by an external account, producing returndata
A contract is called by another contract, producing returndata
Gas consumed by Event parameters

According to the Ethereum specification, scenarios 2, 3, and 4 above incur no per-byte Gas charge. Of course, Gas is consumed when calldata and returndata are read (and the memory that holds them must be paid for), but producing them costs nothing by itself.

We have already covered the first scenario. As for the fifth, note that after the Event parameters are encoded as log data, 8 Gas is charged per byte, for zero bytes and non-zero bytes alike. A single bool parameter may therefore consume 32*8=256 Gas. To save Gas, Event parameters need to be compressed, which we will discuss in subsequent articles.

Understand the ABI Before You Make a Low-level Call

In the OneSwap contract, there are some strange constant definitions:

bytes4 private constant _SELECTOR = bytes4(keccak256(bytes("transfer(address,uint256)")));
bytes4 private constant _SELECTOR2 = bytes4(keccak256(bytes("transferFrom(address,address,uint256)")));
bytes4 private constant _APPROVE_SELECTOR = bytes4(keccak256(bytes("approve(address,uint256)")));

What are these bytes4 constants? They are the first four bytes of calldata mentioned above, that is, function selectors. The definitions show how a function selector is generated: the function signature (the function name followed by the list of parameter types) is passed as a string to the hash function keccak256, and the first 4 bytes of the resulting hash value form the function selector.
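The generation rule can be verified against the selectors that the compiler computes itself. In the sketch below, IERC20Like and SelectorDemo are hypothetical names introduced for illustration, and a 0.8.x compiler is assumed; 0xa9059cbb is the well-known selector of transfer(address,uint256).

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical interface declaring the three ERC20 functions used in the text.
interface IERC20Like {
    function transfer(address to, uint256 value) external returns (bool);
    function transferFrom(address from, address to, uint256 value) external returns (bool);
    function approve(address spender, uint256 value) external returns (bool);
}

// Hypothetical demo contract; not taken from the OneSwap code base.
contract SelectorDemo {
    function check() external pure returns (bytes4 fromHash) {
        // bytes4(...) keeps the first (leftmost) four bytes of the 32-byte hash.
        fromHash = bytes4(keccak256(bytes("transfer(address,uint256)")));
        assert(fromHash == IERC20Like.transfer.selector);
        assert(fromHash == bytes4(0xa9059cbb));

        assert(
            bytes4(keccak256(bytes("transferFrom(address,address,uint256)")))
                == IERC20Like.transferFrom.selector
        );
        assert(
            bytes4(keccak256(bytes("approve(address,uint256)")))
                == IERC20Like.approve.selector
        );
    }
}
```

Note that the signature string uses canonical type names with no spaces and no parameter names; writing uint instead of uint256 would produce a different hash and therefore a different selector.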

Unlike the JVM and WebAssembly, the EVM has no concept of functions. A Solidity function is nothing more than a fragment of bytecode. When the EVM starts executing a contract, function-selection logic inserted by Solidity runs first: it examines the four-byte function selector and, through a series of if-else comparisons, jumps to the start of the matching function’s bytecode fragment.

In the OneSwap contract, the function selector is used in this way:

function _safeTransferToMe(address token, address from, uint value) internal {
      (bool success, bytes memory data) = token.call(abi.encodeWithSelector(_SELECTOR2, from, address(this), value));
      require(success && (data.length == 0 || abi.decode(data, (bool))), "LockSend: TRANSFER_TO_ME_FAILED");
  }
  function _safeTransfer(address token, address to, uint value) internal {
      (bool success, bytes memory data) = token.call(abi.encodeWithSelector(_SELECTOR, to, value));
      require(success && (data.length == 0 || abi.decode(data, (bool))), "LockSend: TRANSFER_FAILED");
  }
  function _safeApprove(address token, address spender, uint value) internal {
      token.call(abi.encodeWithSelector(_APPROVE_SELECTOR, spender, value)); // result does not matter
  }

We can apply the call function to the address of a contract; its parameter is the calldata for calling this contract. We use abi.encodeWithSelector to generate the calldata. Its first parameter is a function selector, and the subsequent parameters are encoded as a byte string according to the ABI specification and appended after the selector to form the calldata. The call returns two values: one is a bool indicating whether the called contract finished successfully, and the other is the contract’s return value as bytes, that is, the returndata. This returndata can be decoded into one or more values using abi.decode.

This way of calling a contract is quite low-level and exposes both the EVM’s contract call mechanism and the details of the ABI specification. You may be curious: Solidity does provide a user-friendly syntax for calling functions of other contracts, which looks very similar to calling functions inside the current contract, for example uint256 govOnes = IERC20(ones).balanceOf(address(this));. So why bother with the low-level call function?

The reason is that the user-friendly mechanism is not flexible. In _safeApprove above, for example, we completely ignore the two values returned by call. In other words, nothing happens even if the call fails (that is, the bool result is false). This is what the business logic requires. With the user-friendly syntax, however, a failed call makes the current transaction fail as well. As another example, _safeTransfer above treats a call as successful as long as the returndata is empty (data.length == 0) or decodes to a bool value of true (abi.decode(data, (bool))); with the user-friendly syntax, only the latter case counts as successful execution.

If you cannot confirm in advance whether the called contract is safe, it is advised to use the call function to make a low-level call, and then carefully analyze its two return values according to the needs of the business logic to determine the state of the call before deciding how to respond.

Use Assembly to Construct Variable-length Arrays

Solidity has very limited support for variable-length arrays in memory. Unlike variable-length arrays in storage, they cannot be resized dynamically with push and pop. The following function does not compile:

function copyArr(uint[] calldata a) public pure returns (uint[] memory b) {
      for(uint i=0; i<a.length; i++) b.push(a[i]);
      return b;
  }
  //Error: Member "push" is not available in uint256[] memory outside of storage.

We must determine its length when creating a variable-length array. Modify the code this way and it will compile and execute:

function copyArr(uint[] calldata a) public pure returns (uint[] memory b) {
      b = new uint[](a.length);
      for(uint i=0; i<a.length; i++) b[i] = a[i];
      return b;
  }

Nor do memory arrays support slicing, unlike variable-length arrays in calldata. In the code below, sliceArrMemory does not compile, but sliceArrCalldata does:

function sliceArrCalldata(uint[] calldata a) public pure returns (uint[] calldata b) {
      return a[:1];
  }
  function sliceArrMemory(uint[] memory a) public pure returns (uint[] memory b) {
      return a[:1];
  }

But sometimes we do want to return a variable-length array of exactly the right length. For example, in the getOrderList function for querying the order book, each member of the return value corresponds to an order (except the zeroth one). Can we allocate a large enough memory array in advance and then slice it to obtain the valid part? Not in Solidity. How about allocating a small memory array and then growing it dynamically? Solidity does not support that, either.

Therefore, the only way is to implement a function with a variable-length array as the return value in assembly, according to the ABI specification. This is how OneSwap finally implements the getOrderList function:

// Get the orderbook's content, starting from id, to get no more than maxCount orders
  function getOrderList(bool isBuy, uint32 id, uint32 maxCount) external override view returns (uint[] memory) {
      if(id == 0) {
          if(isBuy) {
              id = uint32(_bookedStockAndMoneyAndFirstBuyID>>224);
          } else {
              id = uint32(_reserveStockAndMoneyAndFirstSellID>>224);
          }
      }
      uint[1<<22] storage orderbook;
      if(isBuy) {
          orderbook = _buyOrders;
      } else {
          orderbook = _sellOrders;
      }
      //record block height at the first entry
      uint order = (block.number<<24) | id;
      uint addrOrig; // start of returned data
      uint addrLen; // the slice's length is written at this address
      uint addrStart; // the address of the first entry of returned slice
      uint addrEnd; // ending address to write the next order
      uint count = 0; // the slice's length
      assembly {
          addrOrig := mload(0x40) // There is a “free memory pointer” at address 0x40 in memory
          mstore(addrOrig, 32) //the meaningful data start after offset 32
      }
      addrLen = addrOrig + 32;
      addrStart = addrLen + 32;
      addrEnd = addrStart;
      while(count < maxCount) {
          assembly {
              mstore(addrEnd, order) //write the order
          }
          addrEnd += 32;
          count++;
          if(id == 0) {break;}
          order = orderbook[id];
          require(order!=0, "OneSwap: INCONSISTENT_BOOK");
          id = uint32(order&_MAX_ID);
      }
      assembly {
          mstore(addrLen, count) // record the returned slice's length
          let byteCount := sub(addrEnd, addrOrig)
          return(addrOrig, byteCount)
      }
  }

Our goal is to construct, in the EVM’s memory, a variable-length array that conforms to the ABI specification and to use it as the return value. As described earlier, a variable-length array is encoded in three parts: the offset, the number of members, and the array content. The return value here consists of a single variable-length array and nothing else, so the offset is fixed at 32, meaning that the second part (the number of members) starts at byte offset 32.

Solidity stores the end position of all allocated memory at memory location 0x40. Memory above the address stored there has not been allocated, so it is available for constructing the returndata of the variable-length array. The first uint256 in this space stores the offset 32 (mstore(addrOrig, 32)), the second uint256 stores the number of members (mstore(addrLen, count)), and the following uint256 values store the array members. The variable addrEnd points to the location where the next array member should be saved (mstore(addrEnd, order)). Every time a new array member is saved, addrEnd is incremented by 32 (addrEnd += 32;).
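To see that this hand-built layout really is what the ABI expects, the same three parts can be laid out byte by byte and handed to abi.decode. This is only an illustrative sketch under a 0.8.x compiler; ManualArrayLayout and decodeManual are hypothetical names, and the values 7 and 9 are arbitrary sample members.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical demo contract; not taken from the OneSwap code base.
contract ManualArrayLayout {
    function decodeManual() external pure returns (uint256[] memory arr) {
        // Four consecutive 32-byte slots, in the order getOrderList writes them.
        bytes memory raw = abi.encodePacked(
            uint256(32), // offset: the length slot starts at byte 32
            uint256(2),  // number of members
            uint256(7),  // member 0
            uint256(9)   // member 1
        );

        // The standard decoder accepts the hand-built bytes as a uint256[].
        arr = abi.decode(raw, (uint256[]));
        assert(arr.length == 2 && arr[0] == 7 && arr[1] == 9);

        // Encoding the array the normal way yields exactly the same bytes.
        assert(keccak256(raw) == keccak256(abi.encode(arr)));
    }
}
```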

Conclusion

In this article, we first briefly introduced the Solidity ABI, and then described several issues and tricks that can only be understood and applied with knowledge of the ABI’s details.