What is Protobuf?
Protocol Buffers (protobuf) is a language-neutral, binary serialization format developed by Google. You define your data structures in .proto files using a schema language, then generate code in your target language from that schema. The generated code handles serialization (converting structured data into bytes) and deserialization (converting bytes back into structured data).
A simple protobuf message looks like this:
Why the Cosmos SDK uses protobuf
The Cosmos SDK uses protobuf for a fundamental reason: consensus requires determinism. Every validator in the network independently executes each block. After execution, each validator computes the app hash, a cryptographic hash of the application state. For validators to agree on the app hash, they must all produce exactly the same bytes for every piece of state they write. Protobuf alone does not guarantee this.
The Cosmos SDK uses protobuf with additional deterministic encoding rules formalized in ADR-027 (Deterministic Protobuf Serialization). ADR-027 specifies constraints such as requiring fields to appear in ascending field-number order and varint encodings to be as short as possible. The SDK validates incoming transactions against these rules before processing them, so a non-deterministically encoded transaction is rejected rather than producing divergent state. Every validator encoding the same data under these rules produces an identical byte sequence.
Beyond determinism, protobuf provides:
- Compact encoding: binary wire format is smaller than JSON or XML, which matters for transaction throughput and block size.
- Schema evolution: fields can be added or deprecated without breaking existing clients, which is critical for chain upgrades.
- Code generation: .proto files generate Go structs, gRPC service stubs, and REST gateway handlers automatically.
- Cross-language support: clients in any language can interact with the chain by generating code from the same .proto files.
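Returning to the determinism rules, the two ADR-027 constraints named above (field order and shortest-possible varints) can be sketched as a byte-level check. This is a simplified illustration, not the SDK's validation code: `CheckCanonical` and `readMinimalUvarint` are hypothetical names, only varint and length-delimited fields are handled, and field numbers are allowed to repeat so that repeated fields are not rejected.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// readMinimalUvarint decodes a varint and rejects encodings that are longer
// than necessary (a trailing 0x00 byte adds nothing to the value).
func readMinimalUvarint(b []byte) (uint64, int, error) {
	v, n := binary.Uvarint(b)
	if n <= 0 {
		return 0, 0, errors.New("truncated or overflowing varint")
	}
	if n > 1 && b[n-1] == 0 {
		return 0, 0, errors.New("varint is not minimally encoded")
	}
	return v, n, nil
}

// CheckCanonical walks the top-level fields of a message and rejects
// out-of-order field numbers and non-minimal varints.
func CheckCanonical(b []byte) error {
	var prevField uint64
	for len(b) > 0 {
		tag, n, err := readMinimalUvarint(b)
		if err != nil {
			return err
		}
		b = b[n:]
		field, wireType := tag>>3, tag&7
		if field < prevField {
			return fmt.Errorf("field %d appears after field %d", field, prevField)
		}
		prevField = field

		switch wireType {
		case 0: // varint value
			_, m, err := readMinimalUvarint(b)
			if err != nil {
				return err
			}
			b = b[m:]
		case 2: // length-delimited value
			size, m, err := readMinimalUvarint(b)
			if err != nil {
				return err
			}
			if uint64(len(b)-m) < size {
				return errors.New("truncated length-delimited field")
			}
			b = b[m+int(size):]
		default:
			return fmt.Errorf("wire type %d is not handled in this sketch", wireType)
		}
	}
	return nil
}

func main() {
	canonical := []byte{0x0a, 0x02, 0x68, 0x69, 0x10, 0x96, 0x01}
	reordered := []byte{0x10, 0x96, 0x01, 0x0a, 0x02, 0x68, 0x69} // field 2 before field 1
	padded := []byte{0x10, 0x96, 0x81, 0x00}                      // 150 encoded in 3 bytes instead of 2

	fmt.Println(CheckCanonical(canonical)) // accepted
	fmt.Println(CheckCanonical(reordered)) // rejected
	fmt.Println(CheckCanonical(padded))    // rejected
}
```

All three inputs decode to valid protobuf under a lenient parser, which is exactly why plain protobuf is not enough for consensus: the same logical data can have several byte representations unless extra rules pick one.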
Binary and JSON encoding
The Cosmos SDK uses protobuf in two encoding modes:
Binary encoding is the default for everything that participates in consensus: transactions written to blocks, state stored in KV stores, and genesis data. Binary encoding is compact and deterministic. When a transaction is broadcast to the network, it travels as protobuf binary. When a module writes state, it serializes values to protobuf binary before calling Set on the store.
JSON encoding is used for human-readable output: the CLI, gRPC-gateway REST endpoints, and off-chain tooling. The Cosmos SDK uses protobuf’s JSON encoding (ProtoMarshalJSON) rather than standard Go JSON, which preserves field names from the .proto schema and handles special types like Any correctly.
Binary encoding is consensus-critical: two validators must produce identical binary bytes for identical data. JSON is used only where humans or external clients need to read the data; it never influences the AppHash.
Genesis state is distributed as JSON in genesis.json, but during chain initialization InitGenesis deserializes that JSON into protobuf structs and writes them to the KV store as binary. The KV store (and therefore the AppHash) only ever contains the binary form.
Transaction encoding
Transactions are protobuf messages defined in cosmos.tx.v1beta1. A transaction is composed of three parts:
- TxBody contains the messages to execute, serialized as repeated google.protobuf.Any messages.
- AuthInfo contains signer information (including the per-signer sequence number) and fee.
- signatures contains the cryptographic signatures, one per signer.
Messages are packed as google.protobuf.Any values so that a single transaction can contain multiple message types from different modules.
When a user submits a transaction, the SDK encodes it as a TxRaw—a flat structure with the TxBody bytes, AuthInfo bytes, and signatures already serialized. It then broadcasts that binary representation over the network.
Transaction signing and SignDoc
Transactions are not signed directly. Instead, the SDK constructs a deterministic structure called a SignDoc, which defines exactly what bytes the signer commits to:
SignDoc is serialized to protobuf binary and then signed with the user’s private key:
Because SignDoc is serialized deterministically, all validators verify the exact same bytes when checking transaction signatures. The per-signer sequence number lives in AuthInfo.SignerInfo.sequence and is included in auth_info_bytes, which is part of SignDoc; this is what prevents replay attacks.
Sign modes
A sign mode determines what bytes a signer commits to when signing a transaction. The SDK supports multiple sign modes to accommodate different clients and hardware:
- SIGN_MODE_DIRECT (default): the signer signs over the protobuf-binary-serialized SignDoc described above. This is compact, deterministic, and the correct choice for all new development.
- SIGN_MODE_LEGACY_AMINO_JSON: the signer signs over an Amino JSON-encoded StdSignDoc instead of the protobuf SignDoc. This exists for backward compatibility with hardware wallets (e.g., older Ledger firmware) and client tooling that predates protobuf. New modules and chains should not depend on it.
- SIGN_MODE_TEXTUAL: the signer signs over a human-readable CBOR-encoded representation of the transaction, designed to display legibly on hardware wallet screens (introduced in v0.50, see ADR-050). This is the SDK’s newer direction for human-readable signing on hardware wallets, intended to replace SIGN_MODE_LEGACY_AMINO_JSON over time. Its specification is versioned and has evolved across SDK releases.
- SIGN_MODE_DIRECT_AUX: allows N-1 signers in a multi-signer transaction to sign over only TxBody and their own SignerInfo, without specifying fees. The designated fee payer signs last using SIGN_MODE_DIRECT. This simplifies multi-signature UX.
Sign modes are defined in signing.proto.
For module developers:
SIGN_MODE_DIRECT requires no extra work. If you want your module’s messages to be signable on Ledger hardware wallets using SIGN_MODE_LEGACY_AMINO_JSON, register your message types with the Amino codec via RegisterLegacyAminoCodec in your module’s codec.go.
How protobuf is used in modules
Most public and persisted data types in modern SDK modules are defined in .proto files and serialized with protobuf. This covers the core API surface: transaction messages, query request/response types, stored state values, and genesis state.
Messages and transactions
Each module defines its transaction messages in a tx.proto file. The MsgSend definition above is an example. When a user submits a transaction, the SDK serializes the transaction body (including its messages) to binary using protobuf before broadcasting it.
[todo: link to tutorial tx.proto section]
Queries
Modules define their query services in query.proto. Request and response types are protobuf messages. The SDK uses gRPC for queries, and gRPC uses protobuf as its default serialization format.
[todo: link to tutorial query proto section]
State types
Data stored in the KV store is protobuf-encoded. A module that stores a custom struct first marshals it to bytes using the codec, then writes those bytes to the store. When reading, it unmarshals the bytes back into the struct. Note that only values are protobuf-encoded; keys are manually constructed byte sequences, not protobuf. Key layout is covered in the State, Storage, and Genesis section.
Genesis
Genesis state is defined in genesis.proto. InitGenesis and ExportGenesis use protobuf to deserialize genesis state from genesis.json and serialize it back.
A concrete example shows how a module reads and writes typed state as bytes:
The codec (k.cdc) is the protobuf codec described in the next section.
The codec and interface registry
The Cosmos SDK wraps protobuf in a codec that modules use for marshaling and unmarshaling. The primary implementation is ProtoCodec, which calls protobuf’s Marshal and Unmarshal under the hood.
Interface types and Any
Protobuf is strongly typed. You cannot store a field as “some implementation of an interface” directly in a protobuf message. The Cosmos SDK solves this using protobuf’s google.protobuf.Any, which wraps an arbitrary message type alongside a URL that identifies what type it contains.
Any is used anywhere the SDK needs to serialize a value whose concrete type is not known at compile time. The most common example is public keys. An account might use a secp256k1 key, an ed25519 key, or a multisig key. The BaseAccount stores the public key as Any:
The Any field holds the serialized public key bytes plus a type URL like /cosmos.crypto.secp256k1.PubKey. When the SDK reads the account, it uses the type URL to look up the concrete Go type, then unmarshals the bytes into that type.
Messages inside transactions
Transaction messages are the most common use of Any in the SDK. A transaction can carry multiple message types from different modules (bank.MsgSend, staking.MsgDelegate, gov.MsgVote) in a single TxBody. Because protobuf requires concrete types at the field level, each message is packed into an Any before being placed inside the transaction:
When the SDK decodes the transaction, it reads each message’s type_url, looks up the concrete type in the interface registry, and unmarshals the bytes into the correct message struct. This is why every sdk.Msg implementation must be registered with RegisterInterfaces before the application starts.
This lookup is handled by the interface registry.
Interface registry
The InterfaceRegistry is a runtime map from type URLs to Go types. When the SDK encounters an Any value, it queries the registry with the type URL to find the concrete Go type, then uses protobuf to unmarshal the bytes.
The registry is the only place the SDK can resolve the type URL of an Any value to a Go type. This is why types must be explicitly registered before they can be deserialized.
Registering interface implementations
Because the interface registry is a runtime lookup table, every concrete type that implements an SDK interface must be registered before the application starts. This is done with RegisterInterfaces:
A registration tells the registry, in effect, “the PubKey interface can be a secp256k1.PubKey or an ed25519.PubKey.” If a type is used in an Any field anywhere in the application and is not registered, the codec will fail to unmarshal it and return an error.
Each module calls RegisterInterfaces during app initialization, and app.go calls these registration functions through the module manager when building the app. Custom types that implement SDK interfaces must follow the same pattern.
codec.go
By convention, modules collect all codec registration in a single file: x/mymodule/types/codec.go. This file typically contains two functions:
RegisterInterfaces is required for every module that defines message types. Without it, the SDK cannot decode those messages from transactions. RegisterLegacyAminoCodec is optional and only needed for Ledger hardware wallet support via SIGN_MODE_LEGACY_AMINO_JSON.
Proto-to-code generation workflow
Writing .proto files produces .pb.go files through a code generation step. The generated Go code contains struct definitions, marshal/unmarshal methods, and gRPC service stubs. You never edit these generated files directly.
The workflow is:
1. Write the .proto file
Proto files for a module live in the proto/ directory at the repository root:
2. Generate the Go code
The generation step runs buf (or protoc with plugins) against the .proto files and produces Go code under the module’s types/ directory:
3. Use the generated types
The generated types implement proto.Message and can be passed directly to the codec for marshaling, registered with the interface registry, and used in keeper methods and message handlers:
Legacy Amino encoding
Before protobuf, the Cosmos SDK used a custom serialization format called Amino for transaction encoding, JSON signing documents, and interface serialization. Protobuf has replaced it in all of those roles. The LegacyAmino codec still exists for backward compatibility, but is not used in the consensus-critical path.
Some legacy components still reference it:
- LegacyAmino is still present in the codec package for backward compatibility
- LegacyAminoPubKey (multisig) is registered alongside protobuf public key types
- Some older chains, hardware wallets, and client tooling depend on Amino JSON signing
Encoding in context
Every layer of the Cosmos SDK depends on encoding:sdk.Context, gas metering, and events.