Big Data

yueyuan
Jan 15, 2021

NameNode: keeps track of what is stored on all the DataNodes. The individual DataNodes hold the actual data and are ultimately what your client application talks to.
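The split between metadata and data can be illustrated with a toy model. This is only a sketch under simple assumptions: the class and function names (`NameNode`, `DataNode`, `read_file`) are hypothetical and are not the real HDFS client API, and replication is left out.

```python
# Toy model of an HDFS read; names here are illustrative, not the real HDFS API.

class DataNode:
    """Holds the actual block contents."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes

    def write_block(self, block_id, data):
        self.blocks[block_id] = data

    def read_block(self, block_id):
        return self.blocks[block_id]


class NameNode:
    """Holds only metadata: which blocks make up a file and where they live."""

    def __init__(self):
        self.block_map = {}  # path -> list of (block_id, DataNode)

    def register(self, path, block_id, datanode):
        self.block_map.setdefault(path, []).append((block_id, datanode))

    def locate(self, path):
        return list(self.block_map[path])


def read_file(namenode, path):
    # Step 1: ask the NameNode where the blocks are.
    # Step 2: fetch each block directly from the DataNode that holds it.
    return b"".join(dn.read_block(bid) for bid, dn in namenode.locate(path))


nn = NameNode()
dn1, dn2 = DataNode("dn1"), DataNode("dn2")
dn1.write_block("blk_1", b"hello ")
dn2.write_block("blk_2", b"world")
nn.register("/demo.txt", "blk_1", dn1)
nn.register("/demo.txt", "blk_2", dn2)

content = read_file(nn, "/demo.txt")
print(content)
```

Note that the NameNode never stores file contents in this sketch; it only answers "where is it?", and the bytes come from the DataNodes.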

Using HDFS: through a UI (Ambari)

Why MapReduce?

  • Distribute the processing of data on your cluster
  • Divide your data up into partitions that are mapped (transformed) and reduced (aggregated) by mapper and reducer functions you define
  • Resilient to failure: an application master monitors your mappers and reducers on each partition

Flow: mapper emits <k, v> pairs → shuffle and sort → reducer aggregates (e.g. count)
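The mapper, shuffle-and-sort, and reducer stages can be sketched in plain Python with a word count. This is a single-process illustration, not Hadoop code: the function names and the sample input are assumptions made for the example, and a real cluster would run the mappers and reducers in parallel across partitions.

```python
from collections import defaultdict


def mapper(line):
    # Transform one input line into <word, 1> pairs.
    for word in line.lower().split():
        yield word, 1


def shuffle_and_sort(pairs):
    # Group all values by key, then sort by key, as the framework would do
    # between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reducer(key, values):
    # Aggregate all values for one key; here, a simple count.
    return key, sum(values)


lines = ["big data is big", "data is distributed"]  # made-up sample input
mapped = [pair for line in lines for pair in mapper(line)]
grouped = shuffle_and_sort(mapped)
counts = dict(reducer(key, values) for key, values in grouped)
print(counts)  # {'big': 2, 'data': 2, 'distributed': 1, 'is': 2}
```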

Hive is not suitable for online transaction processing (OLTP). It is not designed to be hit with a large number of queries all at once, for example from a website. For that kind of workload, use HBase instead.

NoSQL:

  • large-scale data
  • fast transactions

HBase

  • built on HDFS
  • accessible as a web service
  • high transaction rate
  • <key, value> storage
  • sparse data -> column families
  • horizontal scalability
  • each cell can hold many versions, identified by timestamp
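The key-value, sparse, versioned storage described above can be pictured with a small in-memory model. This is a toy sketch, not the HBase client API: `ToyTable`, its methods, and the sample rows are hypothetical, and columns are written as `family:qualifier` strings for illustration.

```python
class ToyTable:
    """In-memory stand-in for a sparse, versioned <key, value> table."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        # row_key -> {"family:qualifier" -> [(timestamp, value), ...], newest first}
        self.rows = {}

    def put(self, row_key, column, value, timestamp):
        versions = self.rows.setdefault(row_key, {}).setdefault(column, [])
        versions.append((timestamp, value))
        versions.sort(key=lambda cell: cell[0], reverse=True)
        del versions[self.max_versions:]  # drop versions beyond the limit

    def get(self, row_key, column, versions=1):
        # Missing rows or columns simply return nothing: sparse data costs no space.
        return self.rows.get(row_key, {}).get(column, [])[:versions]


table = ToyTable(max_versions=2)
table.put("user1", "info:name", "Ann", timestamp=1)
table.put("user1", "info:name", "Anna", timestamp=2)
table.put("user1", "info:name", "Annie", timestamp=3)
table.put("user2", "contact:email", "b@example.com", timestamp=1)

latest = table.get("user1", "info:name")
history = table.get("user1", "info:name", versions=5)
missing = table.get("user2", "info:name")
print(latest, history, missing)
```

Here `user2` has no `info:name` column at all, which is what "sparse" means in practice: rows only store the columns they actually use.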

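MongoDB is a good fit when:

  • There is no need for cross-document or cross-table transactions or complex join queries. (Note: MongoDB now supports transactions, and its join support keeps improving.)
  • The business iterates quickly, requirements change often, and the data model cannot be fixed in advance
  • The stored data has a flexible, non-fixed format, or is semi-structured
  • Concurrent traffic is heavy, requiring thousands of QPS
  • Massive data storage is needed, at TB scale or above, with data volume continually growing
  • Stored data must be durable and must not be lost
  • 99.999% high availability of data is required
  • Many geospatial queries and text queries are needed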
