etcd api client request retry logic

When using Consul as a configuration center, the official docs say there is no need to run a Consul client agent; you can simply connect to the Consul servers directly.

Running an agent is not required for discovering other services or getting/setting key/value data. The agent is responsible for health checking the services on the node as well as the node itself.

https://www.consul.io/intro/index.html#basic-architecture-of-consul

The Consul api client (https://github.com/hashicorp/consul/tree/master/api) currently accepts only a single server address, so that one address has to be made highly available somehow.
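
As a reference point, here is a minimal Go sketch of using the Consul api client with its single-address configuration; the address is hypothetical, and in practice it would have to point at something highly available (e.g. a load balancer or DNS name in front of the servers).

package main

import (
    "log"

    consulapi "github.com/hashicorp/consul/api"
)

func main() {
    cfg := consulapi.DefaultConfig()
    // Only one address can be set here (made-up value), so high availability
    // has to be provided in front of it, e.g. by a load balancer or DNS.
    cfg.Address = "consul.example.com:8500"

    c, err := consulapi.NewClient(cfg)
    if err != nil {
        log.Fatal(err)
    }

    // Read a key from the KV store via the single configured address.
    pair, _, err := c.KV().Get("config/foo", nil)
    if err != nil {
        log.Fatal(err)
    }
    if pair != nil {
        log.Printf("%s = %s", pair.Key, pair.Value)
    }
}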

The etcd api client (https://github.com/etcd-io/etcd/tree/master/client), on the other hand, accepts multiple server addresses. Let's see how etcd does it.

etcd api client

It creates an httpClusterClient.

../../images/image-20200417151856806.png
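
For comparison, a minimal Go sketch of constructing the etcd v2 client with several endpoints (the endpoint addresses and the key are made up, and the import path may differ by etcd version). The httpClusterClient shown above sits behind this configuration and lets a request be retried against the remaining endpoints when one of them fails.

package main

import (
    "context"
    "log"
    "time"

    "go.etcd.io/etcd/client"
)

func main() {
    cfg := client.Config{
        // Multiple endpoints (made-up addresses): the httpClusterClient keeps
        // this list and can fail over to another endpoint on error.
        Endpoints: []string{
            "http://10.0.0.1:2379",
            "http://10.0.0.2:2379",
            "http://10.0.0.3:2379",
        },
        Transport:               client.DefaultTransport,
        HeaderTimeoutPerRequest: 3 * time.Second,
    }

    c, err := client.New(cfg)
    if err != nil {
        log.Fatal(err)
    }

    kapi := client.NewKeysAPI(c)
    resp, err := kapi.Get(context.Background(), "/config/foo", nil)
    if err != nil {
        log.Fatal(err)
    }
    log.Printf("%s = %s", resp.Node.Key, resp.Node.Value)
}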

Cortex: a deployment solution for online machine learning inference

Introduction to Cortex

Official website: Deploy machine learning models in production https://www.cortex.dev/

GitHub https://github.com/cortexlabs/cortex

The CLI sends configuration and code to the cluster every time you run cortex deploy. Each model is loaded into a Docker container, along with any Python packages and request handling code. The model is exposed as a web service using Elastic Load Balancing (ELB), TensorFlow Serving, and ONNX Runtime. The containers are orchestrated on Elastic Kubernetes Service (EKS) while logs and metrics are streamed to CloudWatch.

Collecting application logs in Kubernetes

Kubernetes log collection architecture

The following is a fairly common, general-purpose architecture. For more details, see the Kubernetes logging architecture docs: https://kubernetes.io/zh/docs/concepts/cluster-administration/logging/

../../images/logging-with-node-agent.png

  1. The containerized application writes its logs to stdout/stderr.
  2. The Docker container engine redirects the stdout/stderr streams to a logging driver, for example the default json-file.
  3. The json-file logging driver writes the logs to a file (on the host machine).
  4. A log collection agent is installed on every node as a DaemonSet.
  5. The log collection agent watches those files for changes and ships the logs to a central logging service (see the sketch after this list).
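
As a rough illustration of steps 2 to 5, here is a minimal Go sketch (not the code of any real agent) that reads one of the host-side log files produced by the json-file driver and decodes its per-line JSON records; the container ID in the path is made up.

package main

import (
    "bufio"
    "encoding/json"
    "log"
    "os"
)

// dockerLogLine mirrors the record format written by Docker's json-file driver:
// {"log":"...","stream":"stdout","time":"..."}
type dockerLogLine struct {
    Log    string `json:"log"`
    Stream string `json:"stream"`
    Time   string `json:"time"`
}

func main() {
    // Made-up container ID; the json-file driver stores logs under
    // /var/lib/docker/containers/<container-id>/<container-id>-json.log
    f, err := os.Open("/var/lib/docker/containers/abc123/abc123-json.log")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    scanner := bufio.NewScanner(f)
    for scanner.Scan() {
        var rec dockerLogLine
        if err := json.Unmarshal(scanner.Bytes(), &rec); err != nil {
            continue // skip malformed lines
        }
        // A real agent would enrich the record with pod/container metadata
        // and forward it to the central logging service instead of printing it.
        log.Printf("[%s] %s", rec.Stream, rec.Log)
    }
}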

Kubernetes log collection details

Hands-on

You can follow the tutorial below directly: a Kubernetes cluster is created with minikube, Fluentd collects the logs and stores them in Elasticsearch, and Kibana is used to view them. A typical EFK stack.

  1. Logging in Kubernetes with Elasticsearch, Kibana, and Fluentd https://mherman.org/blog/logging-in-kubernetes-with-elasticsearch-Kibana-fluentd/

Looking at the collected logs in Kibana, you can see that the log agent also captured information about the container, image, and Pod. This contextual information makes it possible to tell which application produced a given log line.

../../images/image-20200414104511399.png

How fluentd collects context information

The Docker json-file logging driver writes to a file and does not record any context information. https://docs.docker.com/config/containers/logging/json-file/

{"log":"Log line is here\n","stream":"stdout","time":"2019-01-01T11:11:11.111111111Z"}

The log collection image used above is fluent/fluentd-kubernetes-daemonset:v1.3-debian-elasticsearch.

The corresponding code lives here: https://github.com/fluent/fluentd-kubernetes-daemonset/tree/master/docker-image/v1.3/debian-elasticsearch

It collects the logs under the container log directory.

../../images/image-20200414105706563.png

It uses the third-party plugin kubernetes_metadata to obtain container-related context information; the metadata is fetched by querying the API server.
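
To illustrate the underlying idea (this is not the plugin's actual implementation), here is a minimal Go sketch, based on client-go, of fetching pod metadata from the API server; the namespace and pod name are made up, whereas the real plugin derives them from the log file name.

package main

import (
    "context"
    "fmt"
    "log"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/rest"
)

func main() {
    // Running inside the cluster, the DaemonSet pod authenticates with its service account.
    cfg, err := rest.InClusterConfig()
    if err != nil {
        log.Fatal(err)
    }
    clientset, err := kubernetes.NewForConfig(cfg)
    if err != nil {
        log.Fatal(err)
    }

    // Made-up namespace/pod name; the plugin parses these out of the log file path.
    pod, err := clientset.CoreV1().Pods("default").Get(context.Background(), "my-app-pod", metav1.GetOptions{})
    if err != nil {
        log.Fatal(err)
    }

    // This is the kind of context that gets attached to every log record.
    fmt.Println("namespace:", pod.Namespace)
    fmt.Println("node:", pod.Spec.NodeName)
    fmt.Println("labels:", pod.Labels)
}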

Metabase + Spark SQL

This is the second step of building a big-data BI platform: setting up the BI tool. It assumes the Spark SQL JDBC server is already configured, with Kerberos enabled. See https://xujiahua.github.io/posts/20200410-spark-thrift-server-cdh/

Here we chose the open source product Metabase.

In the end, the big-data BI platform consists of 1) Metabase as the BI visualization layer, and 2) an OLAP database built from HDFS (distributed file storage) + Parquet (columnar storage format) + Hive metastore (SQL table schema management) + Spark SQL (batch processing engine).

About Metabase

Metabase is the easy, open source way for everyone in your company to ask questions and learn from data.

https://www.metabase.com/

Supported databases

  • BigQuery
  • Druid
  • Google Analytics
  • H2
  • MongoDB
  • MySQL/MariaDB
  • PostgreSQL
  • Presto
  • Amazon Redshift
  • Snowflake
  • Spark SQL
  • SQLite
  • SQL Server

https://www.metabase.com/docs/latest/faq/setup/which-databases-does-metabase-support.html

Enabling the Spark Thrift Server on CDH6

Unfortunately, the CDH build of Spark leaves out the Thrift Server (possibly because it competes with Cloudera's own product, Impala).

See https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/spark.html#spark__d99299e107

# ll /opt/cloudera/parcels/CDH/lib/spark/sbin/
total 84
-rwxr-xr-x 1 root root 2803 Nov  9 00:05 slaves.sh
-rwxr-xr-x 1 root root 1429 Nov  9 00:05 spark-config.sh
-rwxr-xr-x 1 root root 5689 Nov  9 00:05 spark-daemon.sh
-rwxr-xr-x 1 root root 1262 Nov  9 00:05 spark-daemons.sh
-rwxr-xr-x 1 root root 1190 Nov  9 00:05 start-all.sh
-rwxr-xr-x 1 root root 1274 Nov  9 00:05 start-history-server.sh
-rwxr-xr-x 1 root root 2050 Nov  9 00:05 start-master.sh
-rwxr-xr-x 1 root root 1877 Nov  9 00:05 start-mesos-dispatcher.sh
-rwxr-xr-x 1 root root 1423 Nov  9 00:05 start-mesos-shuffle-service.sh
-rwxr-xr-x 1 root root 1279 Nov  9 00:05 start-shuffle-service.sh
-rwxr-xr-x 1 root root 3151 Nov  9 00:05 start-slave.sh
-rwxr-xr-x 1 root root 1527 Nov  9 00:05 start-slaves.sh
-rwxr-xr-x 1 root root 1478 Nov  9 00:05 stop-all.sh
-rwxr-xr-x 1 root root 1056 Nov  9 00:05 stop-history-server.sh
-rwxr-xr-x 1 root root 1080 Nov  9 00:05 stop-master.sh
-rwxr-xr-x 1 root root 1227 Nov  9 00:05 stop-mesos-dispatcher.sh
-rwxr-xr-x 1 root root 1084 Nov  9 00:05 stop-mesos-shuffle-service.sh
-rwxr-xr-x 1 root root 1067 Nov  9 00:05 stop-shuffle-service.sh
-rwxr-xr-x 1 root root 1557 Nov  9 00:05 stop-slave.sh
-rwxr-xr-x 1 root root 1064 Nov  9 00:05 stop-slaves.sh

As you can see, there is no start script for the Thrift Server (upstream Apache Spark ships it as sbin/start-thriftserver.sh).

A quick summary of Docker logging drivers

docker logs and kubectl logs let you see a Docker container's stdout and stderr, which is handy for troubleshooting. The reason these logs commands can show that output at all is that stdout and stderr are stored in a log file unique to each container.

Once the log volume grows, though, docker logs is not well suited for browsing historical data, so we need to consider shipping logs to a central logging service.

Docker supports the following logging drivers out of the box. Some write directly to files, others send logs to cloud services. A brief introduction follows.

../../images/Screen-Shot-2017-09-11-at-3.08.50-PM.png

credit: https://jaxenter.com/docker-logging-gotchas-137049.html

Official docs: https://docs.docker.com/config/containers/logging/configure/

The default driver: json-file

The default logging driver is json-file; you can check it with docker info. The global logging driver setting can be changed in the daemon configuration file /etc/docker/daemon.json.
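
For example (the values below are illustrative; the log-driver and log-opts keys are from the Docker docs), a daemon.json that keeps the json-file driver but caps the per-container log size might look like this:

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}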

The log format written to the file looks like this: {"log":"Log line is here\n","stream":"stdout","time":"2019-01-01T11:11:11.111111111Z"}. Each line is a JSON object, and the log field holds one line of the container's original output.

With the default configuration, the data for each created container lives under this directory: /var/lib/docker/containers

Experiment

root@ubuntu-parallel:~# docker run --name default_logging_driver hello-world

root@ubuntu-parallel:~# cd /var/lib/docker/containers/$(docker ps --no-trunc -aqf "name=default_logging_driver")

root@ubuntu-parallel:~# cat $(docker ps --no-trunc -aqf "name=default_logging_driver")-json.log
{"log":"\n","stream":"stdout","time":"2020-04-02T01:46:54.096347888Z"}
{"log":"Hello from Docker!\n","stream":"stdout","time":"2020-04-02T01:46:54.096377382Z"}
{"log":"This message shows that your installation appears to be working correctly.\n","stream":"stdout","time":"2020-04-02T01:46:54.096381118Z"}
{"log":"\n","stream":"stdout","time":"2020-04-02T01:46:54.096383725Z"}

https://docs.docker.com/config/containers/logging/json-file/