Data models and Query Languages

Document model

The data model for document-oriented information, such as data formatted as JSON or XML.

{
    "user_id": "1",
    "first_name": "Daniel",
    "last_name": "Guo",
    "education": [
        {
            "school_name": "Carnegie Mellon University",
            "start": "2011",
            "end": "2013"
        },
        {
            "school_name": "Beijing Jiaotong University",
            "start": "2007",
            "end": "2011"
        }
    ]
}

Pros:

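  • Schema flexibility (schema on read)

    As a NoSQL data model, it does not require a schema to be declared up front. Documents can have different structures, and the structure is only interpreted when the data is read.

  • Better performance due to locality

    In the JSON document above, all of the user's information, including the education history, is stored together, so a single query fetches everything needed. In the relational model, the same data is split across tables and needs multiple queries or a join.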
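The two pros above can be sketched in Python, assuming a hypothetical "users" collection whose documents were written in two different shapes (the second user and its single `name` field are made up for illustration). The reader function applies the structure at read time, and each parsed document already carries its nested education list.

```python
import json

# Two documents in the same hypothetical "users" collection, written at different
# times. The older one uses a single "name" field; the newer one splits it.
# Nothing checks their structure when they are written.
raw_documents = [
    '{"user_id": "1", "first_name": "Daniel", "last_name": "Guo", '
    '"education": [{"school_name": "Carnegie Mellon University", "start": "2011", "end": "2013"}]}',
    '{"user_id": "2", "name": "Jane Doe"}',
]


def read_user(raw: str) -> dict:
    """Schema on read: interpret the document's structure at read time."""
    user = json.loads(raw)
    # Older documents only have "name"; derive first/last name while reading.
    if "name" in user and "first_name" not in user:
        first, _, last = user["name"].partition(" ")
        user["first_name"], user["last_name"] = first, last
    # A missing nested list is treated as empty rather than as a schema error.
    user.setdefault("education", [])
    return user


users = [read_user(doc) for doc in raw_documents]

# Locality: each parsed document already holds its nested education entries,
# so no second query or join is needed to get them.
for user in users:
    schools = [e["school_name"] for e in user["education"]]
    print(user["first_name"], user["last_name"], schools)
```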

Cons:

  • It is hard to represent many-to-one and many-to-many relationships, for example, many people living in the same city.

  • Document databases generally have weaker support for joins than relational databases.
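A minimal sketch of the many-to-one problem, using hypothetical user documents that share a city. Copying the city name into every document means an update has to touch every copy; storing only a city ID keeps a single copy but, without join support in the database, pushes the join into application code. `rename_city` and `users_with_city` are illustrative helpers, not part of any database API.

```python
# Hypothetical data: several users live in the same city (many-to-one).

# Option 1: denormalize by copying the city name into every user document.
users_denormalized = [
    {"user_id": "1", "first_name": "Daniel", "city": "Beijing"},
    {"user_id": "2", "first_name": "Jane", "city": "Beijing"},
]


def rename_city(users, old, new):
    """Renaming a city means rewriting every document that embeds it."""
    updated = 0
    for user in users:
        if user["city"] == old:
            user["city"] = new
            updated += 1
    return updated


# Option 2: normalize by storing only a city ID in each user document.
cities = {"c1": {"name": "Beijing"}}
users_normalized = [
    {"user_id": "1", "first_name": "Daniel", "city_id": "c1"},
    {"user_id": "2", "first_name": "Jane", "city_id": "c1"},
]


def users_with_city(users, cities):
    """Without join support, the join is emulated in application code."""
    return [
        {**user, "city": cities[user["city_id"]]["name"]}  # one extra lookup per user
        for user in users
    ]


renamed = rename_city(users_denormalized, "Beijing", "Peking")
joined = users_with_city(users_normalized, cities)
print(renamed)                         # 2
print([u["city"] for u in joined])     # ['Beijing', 'Beijing']
```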

Relational model

The data model for relational information: data is organized into tables (relations) made of rows, as in a relational database.

Pros:

  • Better support for joins

  • Better support for many-to-one and many-to-many relationships
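To illustrate joins and many-to-one/many-to-many relationships, here is a sketch using Python's built-in `sqlite3` module with an in-memory database. The table layout is one possible normalization of the JSON document above; the second user, the cities table, and Jane's education row are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cities  (city_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE schools (school_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE users (
        user_id    INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        city_id    INTEGER REFERENCES cities(city_id)   -- many-to-one
    );
    -- Many-to-many: a user attends many schools, a school has many users.
    CREATE TABLE education (
        user_id    INTEGER REFERENCES users(user_id),
        school_id  INTEGER REFERENCES schools(school_id),
        start_year TEXT,
        end_year   TEXT
    );
    INSERT INTO cities  VALUES (1, 'Beijing');
    INSERT INTO schools VALUES (1, 'Carnegie Mellon University'),
                               (2, 'Beijing Jiaotong University');
    INSERT INTO users   VALUES (1, 'Daniel', 1), (2, 'Jane', 1);
    INSERT INTO education VALUES (1, 1, '2011', '2013'), (1, 2, '2007', '2011'),
                                 (2, 1, '2015', '2017');
""")

# Many-to-one: each user row points at one shared city row; the join resolves it.
residents = conn.execute("""
    SELECT u.first_name, c.name
    FROM users u JOIN cities c ON u.city_id = c.city_id
    ORDER BY u.user_id
""").fetchall()

# Many-to-many: join through the education table.
cmu_alumni = conn.execute("""
    SELECT u.first_name
    FROM users u
    JOIN education e ON e.user_id = u.user_id
    JOIN schools s   ON s.school_id = e.school_id
    WHERE s.name = 'Carnegie Mellon University'
    ORDER BY u.user_id
""").fetchall()

# The city name is stored once, so renaming it updates a single row.
renamed_rows = conn.execute(
    "UPDATE cities SET name = 'Peking' WHERE city_id = 1"
).rowcount

print(residents)     # [('Daniel', 'Beijing'), ('Jane', 'Beijing')]
print(cmu_alumni)    # [('Daniel',), ('Jane',)]
print(renamed_rows)  # 1
```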

Cons:

  • Object-relational mismatch. Developers usually have to write a translation layer between the objects in application code and the tables, rows, and columns of the database, although frameworks such as Hibernate can do much of that work.

  • The database schema has to be defined in advance (schema on write), and it is not easy to change later.
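The object-relational mismatch can be sketched as the translation layer an ORM would otherwise generate: flat rows, shaped like the output of a join between a users table and an education table, are folded back into nested objects. The `User`/`Education` classes and the row layout are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Education:
    school_name: str
    start: str
    end: str


@dataclass
class User:
    user_id: int
    first_name: str
    last_name: str
    education: list[Education] = field(default_factory=list)


# Rows as a LEFT JOIN of users and education might return them:
# the user columns repeat on every row, one row per school.
rows = [
    (1, "Daniel", "Guo", "Carnegie Mellon University", "2011", "2013"),
    (1, "Daniel", "Guo", "Beijing Jiaotong University", "2007", "2011"),
]


def rows_to_users(rows):
    """Hand-written translation layer: flat rows -> nested objects.

    ORM frameworks such as Hibernate automate this kind of mapping.
    """
    users: dict[int, User] = {}
    for user_id, first, last, school, start, end in rows:
        user = users.get(user_id)
        if user is None:
            user = users[user_id] = User(user_id, first, last)
        if school is not None:  # a LEFT JOIN yields NULLs for users without education
            user.education.append(Education(school, start, end))
    return list(users.values())


users = rows_to_users(rows)
print(len(users), [e.school_name for e in users[0].education])
```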

Graph model

Pros:

  • Any kind of vertex can connect to any other kind of vertex.

  • Traversal is efficient in both directions, since each vertex keeps both its incoming and outgoing edges.

  • With edge labels, you can store different kinds of vertices and relationships in a single graph and still tell the relationships apart.
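A small sketch of these properties, assuming a hypothetical graph that mixes people, cities, and countries, with each edge stored as (tail, label, head) and indexed both by its tail and by its head. Following incoming edges backwards answers "who lives somewhere within this region?" even when places are nested several levels deep.

```python
from collections import defaultdict

# Hypothetical graph mixing vertex kinds (people, cities, countries).
# Each edge is (tail, label, head).
edges = [
    ("daniel", "lives_in", "beijing"),
    ("jane", "lives_in", "pittsburgh"),
    ("beijing", "within", "china"),
    ("pittsburgh", "within", "usa"),
]

outgoing = defaultdict(list)  # tail -> [(label, head)]
incoming = defaultdict(list)  # head -> [(label, tail)]
for tail, label, head in edges:
    outgoing[tail].append((label, head))
    incoming[head].append((label, tail))


def places_within(region):
    """Walk 'within' edges backwards: every place (transitively) inside region."""
    found, stack = {region}, [region]
    while stack:
        for label, tail in incoming[stack.pop()]:
            if label == "within" and tail not in found:
                found.add(tail)
                stack.append(tail)
    return found


def people_living_in(region):
    """Backward traversal again: who has a 'lives_in' edge into one of those places."""
    return sorted(
        tail
        for place in places_within(region)
        for label, tail in incoming[place]
        if label == "lives_in"
    )


print(people_living_in("usa"))  # ['jane']
print(outgoing["daniel"])       # forward traversal: [('lives_in', 'beijing')]
```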

Cons:

  • TBA

Property Graphs

# Vertex
{
    id             Int;
    outgoing_edges Set<Edge>;
    incoming_edges Set<Edge>;
    properties     Map<key, value>;
}

# Edge
{
    id           Int;
    start_vertex Vertex;  # tail vertex, where the edge starts
    end_vertex   Vertex;  # head vertex, where the edge ends
    label        String;  # kind of relationship between the two vertices
    properties   Map<key, value>;
}
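The pseudocode above maps fairly directly onto Python dataclasses. This sketch keeps its field names; the global ID counter and the `add_edge` helper are assumptions added so that every edge is recorded on both of its endpoints. The sample vertices reuse the lucy and alain example from the triple-store section below.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Any

_next_id = count(1)  # simple global ID generator for this sketch


@dataclass(eq=False)  # identity-based hashing, so vertices and edges can live in sets
class Vertex:
    properties: dict[str, Any] = field(default_factory=dict)
    id: int = field(default_factory=lambda: next(_next_id))
    # repr=False keeps printing a vertex from dumping its whole neighbourhood.
    outgoing_edges: set["Edge"] = field(default_factory=set, repr=False)
    incoming_edges: set["Edge"] = field(default_factory=set, repr=False)


@dataclass(eq=False)
class Edge:
    start_vertex: Vertex  # tail vertex
    end_vertex: Vertex    # head vertex
    label: str            # kind of relationship between the two vertices
    properties: dict[str, Any] = field(default_factory=dict)
    id: int = field(default_factory=lambda: next(_next_id))


def add_edge(tail: Vertex, label: str, head: Vertex, **properties: Any) -> Edge:
    """Create an edge and index it on both endpoints so it can be traversed either way."""
    edge = Edge(tail, head, label, properties)
    tail.outgoing_edges.add(edge)
    head.incoming_edges.add(edge)
    return edge


lucy = Vertex({"name": "Lucy", "age": 33})
alain = Vertex({"name": "Alain"})
add_edge(lucy, "marriedTo", alain)

spouses = [e.end_vertex.properties["name"] for e in lucy.outgoing_edges if e.label == "marriedTo"]
married_by = [e.start_vertex.properties["name"] for e in alain.incoming_edges]
print(spouses, married_by)  # ['Alain'] ['Lucy']
```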

Triple store

In a triple-store, all information is stored in the form of very simple three-part statements: (subject, predicate, object). For example, in the triple (Jim, likes, bananas), Jim is the subject, likes is the predicate (verb), and bananas is the object.

The subject of a triple is equivalent to a vertex in a graph. The object is one of two things:

  • A value in a primitive datatype, such as a string or a number. In that case, the predicate and object of the triple are equivalent to the key and value of a property on the subject vertex. For example, (lucy, age, 33) is like a vertex lucy with properties {"age": 33}.

  • Another vertex in the graph. In that case, the predicate is an edge in the graph, the subject is the tail vertex, and the object is the head vertex. For example, in (lucy, marriedTo, alain) the subject and object lucy and alain are both vertices, and the predicate marriedTo is the label of the edge that connects them.
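The two cases for the object can be sketched as a conversion from triples into vertex properties and labeled edges. The sketch assumes the set of vertex identifiers is known in advance (including treating bananas as a vertex) and treats any object in that set as a vertex; real triple stores such as RDF databases distinguish resource identifiers from literal values explicitly instead.

```python
# Triples taken from the examples in the text.
triples = [
    ("Jim", "likes", "bananas"),
    ("lucy", "age", 33),
    ("lucy", "marriedTo", "alain"),
]

# Assumption: we already know which identifiers name vertices.
vertices = {"Jim", "bananas", "lucy", "alain"}


def to_property_graph(triples, vertices):
    """Split triples into vertex properties and labeled edges."""
    properties = {v: {} for v in vertices}
    edges = []
    for subject, predicate, obj in triples:
        if obj in vertices:
            # Object is another vertex: the predicate labels an edge subject -> object.
            edges.append((subject, predicate, obj))  # (tail, label, head)
        else:
            # Object is a primitive value: predicate/object become a key/value property.
            properties[subject][predicate] = obj
    return properties, edges


properties, edges = to_property_graph(triples, vertices)
print(properties["lucy"])  # {'age': 33}
print(edges)               # [('Jim', 'likes', 'bananas'), ('lucy', 'marriedTo', 'alain')]
```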


Last updated 5 years ago
