Storage and Retrieval

Log-structured storage engine

To write, append the new data (e.g. a key-value pair) to the end of the file.

To read, scan the file and return the last occurrence of the key.
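The write and read paths above can be sketched in a few lines. This is only an illustration, not a real engine: the function names `db_set` and `db_get`, the comma-separated line format, and the temporary file path are all assumptions made for the example, and keys are assumed not to contain commas or newlines.

```python
import os
import tempfile

def db_set(path, key, value):
    """Append one 'key,value' line to the end of the log file."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"{key},{value}\n")

def db_get(path, key):
    """Scan the whole log and return the value from the last matching line."""
    result = None
    with open(path, encoding="utf-8") as f:
        for line in f:
            k, _, v = line.rstrip("\n").partition(",")
            if k == key:
                result = v  # a later occurrence replaces an earlier one
    return result

# Hypothetical log location, used only for this example.
log_path = os.path.join(tempfile.mkdtemp(), "data.log")
db_set(log_path, "42", "San Francisco")
db_set(log_path, "7", "London")
db_set(log_path, "42", "Berlin")
print(db_get(log_path, "42"))  # the most recently written value for key "42"
```

Writes are cheap because they only append, but every read scans the whole file, which is what motivates an index.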

Hash Index

Maintain an in-memory hash table. The key is the index key (e.g. the key of a key-value pair), and the value is the byte offset in the file where the latest data for that key is stored. To read, look up the offset in the hash table, seek to that position in the file, and read the value from there. Every write appends to the file and updates the hash table with the new offset.
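A minimal sketch of this idea, assuming the same line-based `key,value` format as a plain text log. The class name `HashIndexedLog` is hypothetical, and for simplicity the index stores the offset of the start of the whole record rather than of the value alone.

```python
import os
import tempfile

class HashIndexedLog:
    """Append-only log plus an in-memory dict mapping key -> byte offset."""

    def __init__(self, path):
        self.path = path
        self.index = {}           # key -> offset of the latest record for that key
        open(path, "ab").close()  # make sure the file exists

    def set(self, key, value):
        record = f"{key},{value}\n".encode("utf-8")
        with open(self.path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)  # position where the new record starts
            f.write(record)
        self.index[key] = offset

    def get(self, key):
        offset = self.index.get(key)
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)        # jump straight to the record, no scan
            line = f.readline().decode("utf-8").rstrip("\n")
        _, _, value = line.partition(",")
        return value

db = HashIndexedLog(os.path.join(tempfile.mkdtemp(), "data.log"))
db.set("a", "1")
db.set("b", "2")
db.set("a", "3")
print(db.index)     # "a" now points at the third record
print(db.get("a"))
```

A read now costs one hash lookup and one seek, regardless of how long the log is.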

Log file segment and compaction

If the data grows too large for a single log file, break the log into segments: close the current segment file once it reaches a certain size and send subsequent writes to a new segment. Compaction then discards duplicate keys within a segment, keeping only the most recent value for each key. Because compaction makes segments smaller, several small segments can be merged into one new segment file. Merging can run in the background: the old segments keep serving reads, and writes are appended to the newest segment. Once the merge is done, reads switch to the merged segment and the old segments can be safely deleted.

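Each segment has its own in-memory hash index. To read a key, check the most recent segment's index first, then the next most recent, and so on.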
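Compaction and merging can be sketched with segments modelled as in-memory lists of `(key, value)` records in write order. This is a simplification for illustration: real segments are files, and the function names `compact`, `merge`, and `lookup` are assumptions. The `lookup` helper scans records linearly instead of using a per-segment hash index, to keep the example short.

```python
def compact(segment):
    """Keep only the last value written for each key in one segment."""
    latest = {}
    for key, value in segment:  # records are in write order
        latest[key] = value     # a later record replaces an earlier one
    return list(latest.items())

def merge(segments):
    """Merge segments (ordered oldest first) into one compacted segment."""
    latest = {}
    for segment in segments:
        for key, value in segment:
            latest[key] = value  # values from newer segments win
    return list(latest.items())

def lookup(segments, key):
    """Read path: check the newest segment first, then older ones."""
    for segment in reversed(segments):
        for k, v in reversed(segment):
            if k == key:
                return v
    return None

old = [("mew", 1078), ("purr", 2103), ("purr", 2104)]
new = [("mew", 1079), ("yawn", 511)]

merged = merge([old, new])
print(merged)
print(lookup([old, new], "purr"), lookup([merged], "purr"))
```

Reads return the same answer before and after the merge, which is why the old segments can keep serving reads until the merged segment is ready.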

Things to consider

  • File format: a binary format is better than text (encode the length of a string in bytes, followed by the raw string).

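  • Deleting records: append a special deletion record (tombstone). When segments are merged, the tombstone tells the merge to discard any earlier values for that key.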

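  • Crash recovery: store a snapshot of each segment's in-memory hash index on disk, so that after a restart it can be loaded directly instead of being rebuilt by rescanning the segment files.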

  • Partially written data: include a checksum in each record, so that corrupted or partially written records can be detected and ignored.

  • Concurrency control: a single writer (to keep appends in strict sequence) and multiple readers.
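Several of the points above (length-prefixed binary format, tombstones, checksums) can be combined in one record layout. The layout below is an assumption made for illustration, not a standard format: a CRC32 checksum, a one-byte tombstone flag, the key and value lengths, then the raw bytes. The names `encode_record` and `read_records` are hypothetical.

```python
import io
import struct
import zlib

# Assumed layout: crc32 (4 bytes), tombstone flag (1), key length (4), value length (4)
HEADER = struct.Struct(">IBII")

def encode_record(key, value=None):
    """Encode one record; value=None writes a tombstone (deletion marker)."""
    k = key.encode("utf-8")
    v = b"" if value is None else value.encode("utf-8")
    flag = 1 if value is None else 0
    body = struct.pack(">BII", flag, len(k), len(v)) + k + v
    return struct.pack(">I", zlib.crc32(body)) + body  # checksum covers the body

def read_records(stream):
    """Yield (key, value) pairs; stop at the first truncated or corrupt record."""
    while True:
        header = stream.read(HEADER.size)
        if len(header) < HEADER.size:
            return  # end of log, or a partially written header
        crc, flag, klen, vlen = HEADER.unpack(header)
        payload = stream.read(klen + vlen)
        if len(payload) < klen + vlen or zlib.crc32(header[4:] + payload) != crc:
            return  # partially written or corrupted record: ignore it
        key = payload[:klen].decode("utf-8")
        value = None if flag else payload[klen:].decode("utf-8")
        yield key, value

log = encode_record("a", "1") + encode_record("a") + encode_record("b", "2")
torn = log[:-1]  # simulate a crash in the middle of the last write
print(list(read_records(io.BytesIO(torn))))
```

Here a tombstone is returned as a `None` value, and the reader drops the torn final record instead of returning garbage. Stopping at the first bad record is one possible policy; a real engine might handle corruption differently.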

Pros of append-only over updating values in place

  • Appending and segment merging are sequential writes, which are generally much faster than random writes.

  • Crash recovery is much easier: old records are never overwritten, so a crash cannot leave a value that is half old and half new, and the log still holds all previous records.

  • Merging old segments keeps the data files from becoming fragmented over time.

Cons of hash index

  • The hash index table must fit in memory, which limits the number of distinct keys.

  • Range queries are inefficient: each key in the range has to be looked up individually in the hash table.

Page-oriented storage engine

Last updated 5 years ago