
For Layer 1s to function, they rely on transaction data, smart contract data, program instructions, transaction history, etc. Ensuring the recoverability of this data is key for blockchains to remain functional and immutable. Every blockchain supports data recovery; however, the method of data recovery differs:
RSEC Scheme aids Solana in achieving fast data (shred) propagation throughout the network via the tree structure of Turbine. In this article, we explore what Reed-Solomon codes are and how they play an important role in Solana.
Let’s dive in
The show doesn’t go on because it’s ready; it goes on because it’s 11:30. — Lorne Michael.
Reed-Solomon Erasure Coding Scheme encodes shreds before they are propagated through the Turbine. It is a specific type of Forward Error Correction (FEC) algorithm that uses a polynomial-based error detection and correction scheme. Reed-Solomon-based error correction works by adding extra bits called parity at the end of the actual data and using those extra bits to recover the corrupted data.
The original data, which gets added parity, is called Reed-Solomon-based error correction code and is made up of two components—encoder and decoder.
The architecture of the Reed-Solomon-based error correction system.
The encoder takes data in the form of symbols, where each symbol represents a fixed number of bits (b). It then adds parity symbols to these data symbols. Together, the data and parity symbols form a codeword.
If we have d data symbols in a codeword with a symbol size of b bits, the number of parity symbols that can be added is n−dn - d, where n is the total number of symbols in the codeword. The number of symbol errors that can be corrected depends on how many parity symbols are included.
With n-dn-d parity bits, the decoder can correct up to n-dn-d symbol erasures. If the total number of erasures is less than or equal to n−dn-d, the decoder can successfully recover the original transmitted or stored data. However, if the number of erasures exceeds n−dn - d, the decoder will either indicate that it cannot recover the data or, in some cases, perform incorrect decoding, returning erroneous data without notifying the end user.
In the example of Reed-Solomon erasure coding, we started with three data symbols (D₁ = 3, D₂ = 5, D₃ = 2) and used a coding method to generate two parity symbols (P₁ = 3, P₂ = 5). The parity symbols were calculated based on the data symbols, creating a complete set of five symbols: (3, 5, 2, 3, 5). This encoding ensures that even if some symbols are lost, the original data can still be recovered. In the scenario where D₂ and P₂ were lost, we demonstrated how the remaining symbols, D₁, D₃, and P₁, could be used to recover the missing data, showing how Reed-Solomon codes provide error correction and data recovery capabilities.
During block propagation, the Solana Turbine relies on packet retransmission by validators, who could be malicious by rebroadcasting incomplete data due to network packet loss. Due to the retransmission tree structure of Turbine, any network-wide packet loss is compounded, and the probability of the packet failing to reach its destination increases on each hop.
To counter this, Reed-Solomon erasure coding enables effective error correction, ensuring that data can still be reconstructed correctly even if parts of the data are missing. During the shredding process, it generates recovery shreds that protect the original block data, allowing for its recovery even if some parts are lost or corrupted during transmission.
In this report, we explored the Reed-Solomon Erasure Coding scheme, including its encoder and decoder, and its role in Solana's Turbine block propagation. Additionally, we compared Solana's approach to Ethereum's data recovery mechanism, which utilizes KZG commitments and Data Availability Sampling (DAS).