In every Cryptocurrency, each transaction included in the ledger needs to be uniquely identified by an id, which is generally a hash digest of the transaction data. A cryptocurrency provides transaction signature segregation if it excludes the transaction signature when computing the transaction id. This has two benefits: it allows signatures to be removed from storage to save hard-disk space and it prevents transaction id malleability. Malleability arises because signatures can sometimes be modified but still be valid. The original Bitcoin protocol did not segregate the signatures. Signature segregation was introduced recently to Bitcoin with a soft-fork, by the name of Segregated Witness or Segwit. With Segwit not only the signatures, but all authentication data that is required to verify a scriptPub (or “witness” data) is segregated. The segregated data includes selectors for OP_IF script conditionals, public keys for P2PKH addresses or entire scripts for P2WSH addresses. The designers of Segwit decided to segregate the whole scriptSig because it simplified the transaction format, however there are very good reasons not to mix signatures with other components of a scriptSig. Back in 2014 I proposed to add to Bitcoin the OP_PUSHSIG opcode, in order to segregate the signatures in a signature-only stack. On the other hand, in RSK and Ethereum, signatures were segregated from the start. We’ll present in this post one strong reason why it’s good to segregate signatures, and why it’s good to keep them separated from everything else.
In the LTCP whitepaper we showed how the segregation of the signature allows compression of transactions up to 20x (shrinking to 5% of the original size) for use cases where the same user account performs frequent payments. The highest compression ratios are reached when the destination also repeats, as is the case with payment channels of the Lumino network, a lighting-network like layer for the RSK blockchain. If don’t know about LTCP, this is what LTCP does in a nutshell: LTCP enables network nodes to remove from the blockchain old transaction signatures when a new signed transaction from the same source also signs the old transactions. This can be defined as chained (or Merkelized) signature aggregation.
As previously said, an additional compression is reached by reusing the same destination addresses for many transactions, avoiding storing them them explicitly in each transaction. it can be argued that reusing the same account and destination reduces user privacy. We argue that when LTCP is used to top up or settle payment channels of the Lumino network this reduction in privacy is minimal, because on-chain transactions are only indicators of the intermediate transaction volume, but final destination and amounts are still handled by off-chain transactions. Also the user can use different payment channels for different privacy domains, and use transaction tumblers between privacy domains. Last, because Lumino transactions are off-chain and multi-hop, it’s possible to create paths that do not disclose to a single node information about origin and destination of a transaction even if the same payment channel is reused.
From Same-origin to Multi-origin
LTCP’s same-origin compression is only one of the compression opportunities. Another is multi-origin compression. Multi-origin compression would, in theory, take transactions from different origins and compress them by reducing redundant information. We have three possibilities here: using standard data compression techniques, aggregating signatures and removing intermediate transfer hops. The highest compression can be obtained from aggregating signatures.
In RSK/Ethereum, a transaction signature is about 64 bytes in length, and this represents about 60% of the transaction size. By aggregating signatures we could shrink the blockchain considerably, and also reduce the computation time for new nodes syncing old parts of the blockchain because less signatures must be verified. However since RSK recovers the source account address (and public key) of a transaction from the signature, and since an aggregated signature cannot contain the information of all aggregated public keys, an aggregated transaction would need to specify the public keys involved. As a public key is 32 bytes in length, the 60% signature overhead cannot be fully eliminated, but only reduced to 30%. To improve this, the platform could store the account public keys in account records, and specify signatories with an unique address prefix. This would enable a reduction of about 50% of the transaction transaction size, on average. But how do we aggregate signatures from different sources? By using more cryptography, of course.
Cryptographic Signature Aggregation
Cryptographic signature aggregation is an old, well studied subject. In Bitcoin, the first use case of this technique was to cryptographically aggregate the signatures of a single transaction. Standard signature aggregation protocols have one drawback: all parties that want to aggregate signatures must take part in an multi-round off-chain protocol to build a special shared transaction from individual inputs and outputs. Aggregating the signatures of transaction inputs oneself controls is easy, but does not lead to high space savings, because standard transactions have few inputs. Creating transactions that aggregate inputs and outputs for hundreds of parties opens up the opportunity to also aggregate signatures. Space savings now become relevant, possibly about 30%. However multi-party aggregation brings several new problems:
(1) Users generally want to have their transactions confirmed ASAP. Delaying the confirmation because the inputs must be co-signed, mixed or aggregated may become annoying, unless there is a huge amount of parties with pending transactions willing to participate. For example, some cryptocurrency exchanges delay user transactions to aggregate their inputs in a single Bitcoin transaction and pay lower fees. This also results in uncertainties for the user, because the transaction id is not available immediately.
(2) Executing a co-signing multi-party protocol generally requires disclosing machine IPs, hurting user privacy. Therefore users should execute the protocol over Tor or other IP de-anonymization networks every time a transactions has to be broadcast.
(3) The co-signing protocol must be protected from DoS attacks: a malicious party may interfere with the protocol or drop out in the middle to prevent the rest from finishing it. Therefore participants need to have pseudonymous or provide a PoW-based identity.
(4) Reputation scores and punitive measures like IP/identity banning must be used against malicious parties. Tor exit nodes are problematic because they should not be banned, as they may serve multiple users.
It easy to see that solving the signature aggregation problem introduces many additional risks. However signature aggregation can be applied in other forms. The NimbleWimble design can aggregate signatures from different transactions, and remove intermediate hops. This is a huge benefit for scaling and for privacy, but the drawback is that many other scripting features must be abandoned. There was a proposal for Bitcoin to partially aggregate Schnorr signatures non-interactively. The aggregation is performed by the block miner, after transactions have been constructed and signed. The original proposal was later improved to prevent a known vulnerability, but the space and computation savings of this technique are limited. Still, non-interactive aggregation has enormous technical benefits.
Basic Signature Aggregation in RSK
RSK can be easily extended to allow signature aggregation by extending the transaction format, allowing a “batch transaction” to contain multiple sub-transactions, and a single aggregated signature for all sub-transactions. But as previously stated, the ECDSA public key recovery method is used to extract the source address from a transaction signature. It’s not possible to recover all signer’s addresses from an aggregated signature. Therefore each sub-transaction must specify the sender (as an address or public key), and the space savings are halved.
A detailed analysis shows that with basic signature aggregation using batch transactions the gas cost of each sub-transaction could be reduced from 21K to 15.1K: a modest 28% reduction. The cost of the off-chain protocol execution is not taken into account. Yet, batch transactions have the drawbacks of the off-chain co-signing protocol: privacy loss, transaction delays, and cheap DoS attacks.
We propose an innovative solution for delayed interactive full signature aggregation that has most of the benefits of a non-interactive full signature aggregation protocol, even if it is interactive.
Delayed Signature Aggregation (DSAG)
LTCP has the nice property of providing amortized (delayed) space savings. The first transaction does not benefit from compression, but as soon as a second transaction is created the system can strip the signature from the first and leave only the second. The second transaction pays less fees, because the size of the first transaction signature is subtracted from the size of the second transaction.
LTCP requires the blockchain synchronization method to be modified so that blocks remain not fully verified until the delayed signature appears in a late block.
We’ll adapt LTCP idea of delayed signature removal to multiple originators for RSK. Instead of creating a transaction with segregated signatures initially, we let users transact normally and afterwards, for a limited period of time, we let users create a special transaction with an aggregated signature. This transaction, if valid, will allow the removal of previous signatures. We call this special transaction the Delayed Signature Aggregation Transaction or DSAT.
Delayed Signature Aggregation for Bitcoin
Delayed Signature Aggregation in Bitcoin does not yield any benefit because of Bitcoin’s UTXO model. If the reimbursement to an originator for the space of a single signature is performed by creating a new UTXO, the output amount would be tiny, comparable to the cost of 100 bytes of transaction data. But the created output would consume at least 30 additional bytes and to consume the output the user needs to add to a transaction an input consuming more than 100 bytes. Therefore it’s uneconomical to create the reimbursements as UTXOs. However, in RSK account model, we can reimburse efficiently.
Delayed Signature Aggregation for RSK
To apply Delayed Signature Aggregation to RSK, we must make two modifications to the platform. First we must enable accounts controlled by Schnorr signatures. Second, we introduce a new kind of transaction: the DSAT. The DSAT contains three fields:
- sigIndexList: A list of tuples (blockNum, txNum). Each tuple uniquely specifies a transaction that originates from an account controlled by a Schnorr signature.
- aggregatedSignature: A Schnorr signature of a last block hash, where the signature corresponds to a private key that results from the addition of all de-linearized private keys of all signatures in the sigIndexList.
- A gasPrice
To prevent that a DSAT transaction becomes invalid too often because of a reorganization of the blockchain, nodes will only broadcast DSAT transactions that reference in sigIndexList other blocks which are not the last 30. Therefore transactions in a window of 30 blocks can be aggregated.
Using standard delta compression techniques, each of the two pointer elements in a SegIndexList tuple could be as short as 2 bytes, and even less using arithmetic or Huffman coding.
When a user Alice wants to enable signature aggregation for an account with address A, a first transaction T originating from A must be confirmed in a block. The transaction T must have the null address as destination, and no data. When executed, T stores A’s public key in A’s account record. This enables aggregation for the account A. At a later time, Alice will create and broadcast a normal transaction Tx1 originating from the account A. The transaction Tx1, having a signature S, will be included in a block and the block will be broadcast as normal. Later on, Alice will collaborate to create and broadcast the transaction DSAT. This transaction will also be confirmed by the miners. The DSAT transaction cannot be included later than M blocks from the oldest transaction to aggregate. We will use M=60 here (approximately 10 minutes in RSK). When a full node receives a block at height N containing a DSAT transaction, it will:
- Scan all transactions referenced by the DSAT and collect all public keys from the addresses involved.
- Associate each transaction referenced in the DSAT with its respective account addresses (E.g. A is associated with transaction Tx1).
- Verify the aggregated signature.
- If the block at height N-60 was not previously executed in SPV mode, it is executed now.
- After the block containing the DSAT receives enough confirmations (e.g. more than 1000), the signatures referenced by this DSAT are permanently removed (e.g. the signature S).
After 1000 blocks have past since a DSAT transaction, the signatures that were aggregated ar removed from the transactions. Therefore blocks containing removed transactions will be incomplete. Transactions whose signatures have been removed will be called incomplete transactions. Blocks with incomplete transactions will be called incomplete blocks. Incomplete blocks can not be executed until the DSAT transaction is received. Therefore nodes syncing old parts of the blockchain must have a “look-ahead buffer” of 60 blocks.
Fees and Reimbursements
For each transaction whose signature is stripped, a reimbursement will be paid to the transaction originator account. The amount corresponds to the gas price of the bytes saved (68 each) multiplied by the gas price paid. All available reimbursements for the set of transactions in a block B must be temporarily held in a locked account, and only paid to the the miner of B after the 60 block period has elapsed. This prevents the available reimbursements to be spent by the miner before the DSAT transaction is published. Alternatively, the whole block reward payment can be delayed 60 blocks and then paid subtracting the reimburstments that became effective. This delay of payments to miners is similar to Bitcoin coinbase maturity period.
Finally, since all the originators are accounts (not smart-contracts), the DSAT transaction cost can be paid evenly by all accounts.
Execution Costs of the DSAT
Now we analyze the cost savings provided by DSAT. We have to decompose RSK transaction cost into its components. A standard RSK transaction costs 21K gas, which is the composition of the following approximate costs:
- Signature verification costs: 3000 (because ECRECOVER charges 3000 gas, and we assume each full node verifies each signature once). A transaction does not need to be re-verified as long as a signature cache is maintained.
- Bandwidth consumption, opportunity, and disk storage costs: 6800 (because each non-zero data byte cost is 68, and a standard transaction consumes approximately 100 bytes). We can assume 50% of this cost is bandwidth consumption (3400), while the other 50% is disk storage (3400).
- Updating the source and destination balances costs in a data structure hosted in an SSD drive: 9000 (because value transfer in CALL costs 9000). From this we can derive that updating a each balance costs 4500.
- In Ethereum, the balance of the coinbase account must also be updated after each transaction, and therefore has an additional 2200 cost. In RSK, the payment to the miner is delayed and processed at the end of the block, so the transaction cost could be 2200 gas lower.
- Total cost: 21000 gas.
You can see that the DSAT needs to lookup and update all source accounts (3000 cost each), plus reference all aggregated signatures (~136 cost each). This is to save the cost of a signature (4352 cost each). We can see that processing the DSAT transaction in a standard RSK network only saves 6% of the transaction cost, which is insignificant. The DSAG method presented is only beneficial to the network when additional changes to the full node inner workings are applied. The main change is delaying the commitment to disk of the world state changes occurring in the last 60 block. All operations in the last 60 blocks are performed in a RAM cache. This leads to a saving of about 20% of the transaction cost.
To Delay or not to Delay Aggregation
Standard (non-delayed) aggregation is simpler. However there is a one huge advantage of the delayed method over the standard one: the off-chain aggregation protocol is much less susceptible to DoS attacks. This is because the transaction signatures that can be aggregated are the ones used in transactions in the last 60 confirmed blocks, so the participants have already payed a cost to participate in the aggregation protocol in terms of transaction fees. In the case of standard transaction aggregation, a participant may drop out of the protocol before signing the batch message and it cannot be easily penalized.
We have shown two techniques that can reduce transaction costs: standard signature aggregation and the new delayed signature aggregation method (DSAG). We show that standard signature aggregation increases scalability due to bandwidth and signature computation reductions, lowering transaction fees 28% in the RSK blockchain. However, the multi-party signing protocol may be subject to DoS attacks. Contrary, we show that DSAG has the the benefit of reducing the DoS attack risks in the signature aggregation protocol, and can lower transaction fees 20%, if the RSK full node is slightly optimized.