OneSwap Series 7 — Basic Data Structures

11 min readOct 10, 2020

Every smart contract on Ethereum can read and write a dedicated KVStore, and both Key and Value are 256 bits long. Of course, you can also regard the KVStore as a huge array with a length of 2256, and each element must be a 256-bit-long byte sequence. Such a large KVStore/array has a cost. At present, it takes 800 Gas to access KVStore once (using the SLOADinstruction) and 20,000 Gas for the first writing (using the SSTOREinstruction). If you access a key/index that has never been written, 0 is returned; if you write 0, it is equivalent to deleting the value at the key/index (if there is a non-zero value here). To encourage smart contracts to delete useless KV pairs and save storage space, the system will refund 15,000 Gas when KV pairs are deleted (but the total Gas refunded for a transaction cannot exceed half of the total Gas consumption).

When we write smart contracts using Solidity, we don’t need to directly consider the SLOADand SSTOREinstructions. We can define state variables in Solidity contracts, which will be stored in KVStore in a certain order based on some rules (to be determined by the compiler). In addition to simple value variables (such as addresses, booleans, and various types of integers), Solidity also supports arrays and mapping. On that basis, we can also construct data structures such as Set, List, Stack, and Queue. This article will discuss the various storage variables supported by Solidity and their implementation details as well as the principles and applications of various data structures with OneSwap as a reference.

How State Variables Work?

Let’s first discuss the state variables supported by the Solidity language and see how these variables occupy storage space. We will take a simple contract as an example to analyze its compiled runtime bytecode (for an introduction to the runtime bytecode, please refer to the second article in this series). Here is the complete code for this example contract:

pragma solidity =0.6.12;

contract StateDemo {

  address                                         private v1;
  uint64                                          private v2;
  uint256                                         private v3;
  uint256[0x2000]                                 private a1;
  uint256[]                                       private a2;
  mapping(address => uint256)                     private m1;
  mapping(address => mapping(address => uint256)) private m2;

  function setV123() external {
      v1 = address(0xADD0);
      v2 = 0x1234;
      v3 = 0x5678;
  }

  function setA1() external returns (uint256) {
      a1[0x1001] = 0x1234;
      a1[0x1002] = 0x5678;
      return a1.length;
  }
  function setA2() external returns (uint256) {
      a2[0x1001] = 0xABAB;
      a2[0x1002] = 0xCDCD;
      return a2.length;
  }

  function setM1() external {
      m1[address(0xADD1)] = 0x1234;
      m1[address(0xADD2)] = 0x5678;
  }
  function setM2() external {
      m2[address(0xADD3)][address(0xADD4)] = 0x1234;
      m2[address(0xADD5)][address(0xADD6)] = 0x5678;
  }

}

Simple Value Variables

If a smart contract only uses state variables of value type and each state variable occupies one slot (256 bits), then these state variables will be stored in the KVStore in the defined order. In other words, the Nth state variable is in the Nth slot (N starts from 0). As mentioned earlier, the reading and writing of storage space are very Gas-consuming, so the compiler needs to try its best to pack multiple small-length value variables into one slot. Currently, the Solidity compiler has been optimized for this purpose but it will not rearrange these state variables. Therefore, to optimize Gas consumption to the utmost, the programmer must carefully arrange the state variables of the contract by himself.

In the example contract above, the first three state variables are value types, and the first two state variables can be placed in one slot. So these three state variables occupy a total of two slots, and the indexes are 0 and 1. This can be confirmed through the analysis of the runtime bytecode after the contract is compiled. To better observe the bytecode, we can use the online disassemblerto disassemble the code. The following is the disassembled result of the setV123()function:

function FUNC_4DEB3804() public return () {
  sstore(0x0, uint160(0xADD0));
  sstore(0x0, ((uint64(0x1234) * 0x10000000000000000000000000000000000000000) | (~0xFFFFFFFFFFFFFFFF0000000000000000000000000000000000000000 & sload(0x0))));
  sstore(0x1, 0x5678);
  return();
}

The above logic can be expressed in pseudo code as:

store[0x00] = (0x1234 << 160) | 0xADD0; // v1 = address(0xADD0); v2 = 0x1234;
store[0x01] = 0x5678;                   // v3 = 0x5678;

Almost all contracts use state variables of the value type, so there is no need to take examples one by one here. In addition, if the state variable is a simple structure type (the fields are all the value type), it will be implemented similarly: the compiler will try to put as many consecutive fields as possible into one slot, and reserve enough slots for the entire structure. The state variables of the structure type are not to be introduced in detail, and you can analyze the variables using the method described in this article.

Fixed-length Array

Solidity supports two types of arrays: fixed-length arrays and variable-length arrays. The length of the fixed-length array is known at compile time, so the Solidity compiler will reserve enough slots for the array. In other words, if the length of the fixed-length array is L, the compiler will reserve L slot(s). Assuming that other variables have used N slots before a fixed-length array, then the Mth element of this fixed-length array will be placed in the N+Mth slot (L, M, N all start from 0).

In the above example contract, a1 is a fixed-length array with a length of 0x2000. Two slots have been used before, so the compiler reserves the following 0x2000 slots for a1. We can confirm this by observing the bytecodes of the setA1() and setA2() functions. The following is the disassembled result of the setA1() function:

function FUNC_6E7E996E() public return (var0) {
  assert((0x1001 < 0x2000));
  sstore(0x1003, 0x1234);
  assert((0x1002 < 0x2000));
  sstore(0x1004, 0x5678);
  return(0x2000);
}

As you can see, since the length of the fixed-length array is determined at compile time, the compiler also checks the validity of the index by the way. The above logic can be expressed in pseudo code as:

store[0x1003] = 0x1234; // a1[0x1001] = 0x1234;
store[0x1004] = 0x5678; // a1[0x1002] = 0x5678;
return 0x2000;

OneSwap has implemented an order book using two fixed-length arrays in the OneSwapPaircontract. The key code is as follows (the contract will be introduced again when we discuss the linked list later):

contract OneSwapPair is OneSwapPool, IOneSwapPair {
  // the orderbooks. Gas is saved when using array to store them instead of mapping
  uint[1<<22] private _sellOrders;
  uint[1<<22] private _buyOrders;
  ... // Other codes omitted
}

Variable-length Array

Unlike fixed-length arrays, the length of variable-length arrays can be known only at runtime and can change dynamically. Since the length is variable, slots cannot be reserved like in fixed-length arrays. Solidity reserves one slot for the fixed-length array to record the actual length of the array. The starting slot of the array is calculated by the hash algorithm (keccak256) on the basis of the slot reserved for the array. Assuming that N slots have been used before a variable-length array, the length of the array is recorded in the Nth slot; the Mth element of the array is stored in the hash(N) + Mth slot (M and N both start from 0).

In the above example contract, a2is a variable-length array, and 0x2002slots have been used before it, so the Solidity compiler records the length of a2in the 0x2002th slot and the starting index of a2is hash(0x2002). We can confirm this by observing the bytecode of the setA2()function. The following is the disassembled result of the function:

function FUNC_498E6857() public return (var0) {
  assert((0x1001 < sload(0x2002)));
  mstore(0x0, 0x2002);
  temp0 = keccak256(0x0, 0x20);
  sstore((temp0 + 0x1001), 0xABAB);
  assert((0x1002 < sload(0x2002)));
  mstore(0x0, 0x2002);
  temp1 = keccak256(0x0, 0x20);
  sstore((temp1 + 0x1002), 0xCDCD);
  return(sload(0x2002));
}

As you can see, the compiler also checks the validity of the array index. The above logic can be expressed in pseudo code as:

store[hash(0x2002) + 0x1001] = 0xABAB; // a2[0x1001] = 0xABAB;
store[hash(0x2002) + 0x1002] = 0xCDCD; // a2[0x1001] = 0xCDCD;
return store[0x2002];

OneSwap uses a variable-length array (allPairsfield) in the OneSwapFactorycontract to record the addresses of all trading pairs it creates. The key code is as follows:

contract OneSwapFactory is IOneSwapFactory {
  struct TokensInPair {
      address stock;
      address money;
  }

  address public override feeTo;
  address public override feeToSetter;
  address public immutable gov;
  address public immutable ones;
  uint32 public override feeBPS = 50;
  address public override pairLogic;
  mapping(address => TokensInPair) private _pairWithToken;
  mapping(bytes32 => address) private _tokensToPair;
  address[] public allPairs;
  ... // Other codes omitted
}

Simple Mapping

In Solidity, the mapped Key must be a built-in value type, but Value can be a value type or a complex type such as structure, array, and mapping. The fundamental principle remains the same. Let’s first look at how the simplest mapping (Value is a value type) works. Assuming that N slots has been used before a mapping, the compiler will reserve the Nth slot for the mapping (instead of storing the data). And the slot corresponding to a Key K of this mapping is the hash result of the concatenating of K and N, which is hash(K, N).

In the above example contract, m1 is a simple mapping, its Key is an address type, and Value is an integer of uint256. Before m1 0x2003 slots have been used, so for m1, the slot corresponding to Key K is hash(K, 0x2003). We can confirm this by observing the bytecode of the setM1() function. The following is the disassembled result of the function (manually generated by the author):

function FUNC_D216F61F() public return () {
  mstore(0x00, 0xadd1 & 0xffffffffffffffffffffffffffffffffffffffff);
  mstore(0x20, 0x2003);
  sstore(keccak256(0x00, 0x40), 0x1234);

  mstore(0x00, 0xadd2 & 0xffffffffffffffffffffffffffffffffffffffff);
  mstore(0x20, 0x2003);
  sstore(keccak256(0x00, 0x40), 0x5678);
  return();
}

The above logic can be expressed in pseudo code as:

store[hash(0xadd1, 0x2003)] = 0x1234; // m1[address(0xADD1)] = 0x1234;
store[hash(0xadd2, 0x2003)] = 0x5678; // m1[address(0xADD2)] = 0x5678;

Many contracts in OneSwap use mapping. For example, the OneSwapFactory contract mentioned above defines two mappings (_tokensToPair and _pairWithToken). It is worth noting that if we use the mapping alone, we cannot iterate through the KV pairs. To obtain the iteration capability, the mapping often needs to be used with an array. For example, the allPairs array in the OneSwapFactory contract records addresses of all trading pairs, making it possible to iterate through all trading pairs created by the factory.

Complex Mapping

As mentioned earlier, the mapped Value can also be a complex type, such as structure, array, and mapping. Here we only analyze the most commonly used case where Value is a simple mapping. You can use the method described in this article to analyze other cases by yourself. In the above example contract, m2 defines a "mapping of mapping", and the setM2() function operates on the mapping. This time, let’s first take a look at the decompilation result of the setM2() function (manually generated by the author):

function FUNC_3ACBD4FB() public return () {
  mstore(0x00, 0xadd3 & 0xffffffffffffffffffffffffffffffffffffffff);
  mstore(0x20, 0x2004);
  hash1 = keccak256(0x00, 0x40);
  mstore(0x00, 0xadd4 & 0xffffffffffffffffffffffffffffffffffffffff);
  mstore(0x20, hash1);
  hash2 = keccak256(0x00, 0x40);
  sstore(hash2, 0x1234);

  mstore(0x00, 0xadd5 & 0xffffffffffffffffffffffffffffffffffffffff);
  mstore(0x20, 0x2004);
  hash1 = keccak256(0x00, 0x40);
  mstore(0x00, 0xadd6 & 0xffffffffffffffffffffffffffffffffffffffff);
  mstore(0x20, hash1);
  hash2 = keccak256(0x00, 0x40);
  sstore(hash2, 0x5678);
  return();
}

It is clear from the above code that the compiler also reserves one slot for m1. Through twice hashes on the basis of the slot index and the actual key, we can get the final slot index of Value. Assuming that the slot occupied by the mapping is N, the compiler will reserve the Nth slot. Assuming that the two Keys are K1and K2respectively, the slot occupied by Value is hash(K2, hash(K1, N)). The logic of the above function can be expressed in pseudo code as:

store[hash(0xADD4, hash(0xADD3, 0x2004))] = 0x1234; // m2[address(0xADD3)][address(0xADD4)] = 0x1234;
store[hash(0xADD6, hash(0xADD5, 0x2004))] = 0x1234; // m2[address(0xADD5)][address(0xADD6)] = 0x5678;

In OneSwap, the OneSwapTokencontract uses the "mapping of mapping" to implement the "approve" function of ERC20 token. The key states of the contract is given below:

contract OneSwapToken is IOneSwapToken, OneSwapBlackList {

  using SafeMath256 for uint256;

  mapping (address => uint256) private _balances;
  mapping (address => mapping (address => uint256)) private _allowances;
  uint256 private _totalSupply;
  string private _name;
  string private _symbol;
  uint8 private immutable _decimals;
  ... // Other codes omitted

}

Note that despite the huge space of KVStorage provided by EVM and the strong collision resistance of the keccak256 hash algorithm, collisions remain possible theoretically, but the possibility is too small to be considered. However, the use of variable length arrays will increase the possibility of collisions and may bring safety hazards, so prudent treatment is required. For more discussion on Ethereum storage hash collision, please refer to this article.

Data Structure

Arrays and mappings are the basic data structures provided by the Solidity language. On this basis, it is easy to construct data structures such as Sets, Linked Lists, Stacks, and Queues. Next, we will introduce in brief the implementation of these data structures and their application in OneSwap.

Set

The implementation of the Set is very simple. Just define a mapping where Value is a Boolean type. For example, the OneSwapBlackList abstract contract uses a collection to record blacklisted users. The key code is as follows:

abstract contract OneSwapBlackList is IOneSwapBlackList {

  address private _owner;
  address private _newOwner;
  mapping(address => bool) private _isBlackListed;
  ... // Other codes omitted
}

Like mapping, the set is not iterable. If you want to implement a iterable set, you need the help of an additional array. For example, the OneSwapBuybackcontract uses mapping and arrays to implement a iterable list of mainstream tokens. The key code is as follows:

contract OneSwapBuyback is IOneSwapBuyback {

  mapping(address => bool) private _mainTokens;
  address[] private _mainTokenArr;
  ... // Other codes omitted
}

Linked List

The linked list can be implemented by mapping + struct, or array + struct. If it is a single-linked list, then we only need to record the previous (or next) node in the structure; if it is a double-linked list, both the previous and next nodes need to be recorded. For example, the OneSwapPair contract uses a fixed-length array + struct to construct single-linked list to keep an order book. The key code is as follows:

struct Order { // total 256 bits
  address sender; // 160 bits, sender creates this order
  uint32 price;   // 32-bit decimal floating point number
  uint64 amount; // 42 bits are used, the stock amount to be sold or bought
  uint32 nextID; // 22 bits are used
}

contract OneSwapPair is OneSwapPool, IOneSwapPair {
  // the orderbooks. Gas is saved when using array to store them instead of mapping
  uint[1<<22] private _sellOrders;
  uint[1<<22] private _buyOrders;
  ... // Other codes omitted
}

Since the order list may be very long, OneSwap applies a technique of “off-chain query and on-chain confirmation”. For example, when placing a limit order, you can query the position of the order on the chain beforehand, and then directly confirm the position and insert the order when the transaction is executed. Thanks to these optimization techniques and other various optimization methods, OneSwap keeps the gas consumption of various operations at the same low level as UniSwap while supporting limit orders.

Stacks and Queues

Solidity variable-length arrays provide push() and pop() functions, so they can be used directly as stacks. The queue can be implemented with a variable length array or a mapping. OneSwap does not directly use stacks and queues, so we will not introduce them in detail in this article. You can refer to this article to know more about the implementation of these two data structures.

Conclusion

Ethereum provides a huge KVStore for each smart contract deployed on it, and the state of the contract is stored in the KVStore. The underlying EVM provides SSTORE and SLOAD instructions to read and write this KVStore, and these two instructions are relatively Gas-consuming. In the Solidity contract, we can define various types of state variables, including value types, structure types, fixed-length or variable-length arrays, and mappings. Whatever type of state variable, it must be stored in the same KVStore. To use this storage space properly and efficiently, the Solidity compiler has been optimized a lot. However, the contract author must also carefully arrange various state variables.

This article introduces the underlying mechanisms of various state variables in Solidity language, how to construct data structures such as Set and Linked List based on arrays and mappings, and the application of these data structures in the OneSwap project. For more information about OneSwap, please follow our subsequent articles.