StarRocks is an MPP database
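StarRocks is a next-generation, high-speed MPP (Massively Parallel Processing) database built for high-performance analytics across a wide range of scenarios. Its architecture combines ideas from MPP databases and distributed systems, and has the following characteristics:
1. **High-performance queries**: StarRocks uses a fully vectorized engine and a cost-based optimizer (CBO) to reach sub-second query latency, and performs especially well on multi-table joins.
2. **Real-time analytics**: Data can be updated in real time and queried efficiently, which suits real-time data warehouses and real-time metric monitoring.
3. **Flexible data modeling**: Several modeling styles are supported, including wide tables, star schemas, and snowflake schemas, to cover complex analytical needs.
4. **Lakehouse**: It combines the flexibility of a data lake with the analytical power of a data warehouse, giving one platform for storing, processing, and analyzing data.
5. **High-concurrency queries**: Query scheduling and resource allocation are optimized so that the system stays stable and responsive when many users query at once.
6. **Compatibility**: It speaks the MySQL protocol and supports standard SQL, so it integrates easily with common BI tools such as Tableau and Power BI.
StarRocks aims to make data analysis simpler and more agile. It targets enterprise workloads such as OLAP multidimensional analysis, real-time analytics, and high-concurrency queries. For more, see the [official documentation](https://docs.starrocks.io/zh/docs/introduction/what_is_starrocks/) or the [community resources](https://docs.starrocks.io/zh/docs/introduction/StarRocks_intro/).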
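Shared-data (storage-compute separation)
The StarRocks shared-data architecture separates storage from compute to improve resource utilization, elasticity, and scalability. Its main characteristics are:
- **Storage and compute are separated:**
  - Data lives in remote storage, for example Amazon S3, Google Cloud Storage, Azure Blob Storage, or S3-compatible storage such as MinIO.
  - Compute nodes (CN) execute queries and do not persist data.
- **Local cache:**
  - Hot data is cached on local disk. When a query hits the cache, performance is comparable to the shared-nothing architecture.
  - Cache warmup is supported, so data can be loaded ahead of time to speed up queries.
- **Elastic scaling:**
  - Compute nodes can be added or removed on demand, within seconds.
  - Storage is cheaper, and resource isolation is preserved.
- **Multiple storage backends:**
  - HDFS, as well as object storage services such as Azure Blob and AWS S3.
- **Where it fits:**
  - The shared-data architecture is particularly suited to cloud environments, where it lowers storage cost and improves resource isolation.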
Let's create a shared-data deployment that is started with docker-compose:
```yaml
services:
  minio:
    container_name: minio
    environment:
      MINIO_ROOT_USER: miniouser
      MINIO_ROOT_PASSWORD: miniopassword
    image: minio/minio:latest
    ports:
      - "9001:9001"
      - "9000:9000"
    entrypoint: sh
    command: '-c ''mkdir -p /minio_data/starrocks && minio server /minio_data --console-address ":9001"'''
    healthcheck:
      test: ["CMD", "mc", "ready", "local"]
      interval: 5s
      timeout: 5s
      retries: 5

  minio_mc:
    # This service is short lived, it does this:
    # - starts up
    # - checks to see if the MinIO service `minio` is ready
    # - creates a MinIO Access Key that the StarRocks services will use
    # - exits
    image: minio/mc:latest
    entrypoint:
      - sh
      - -c
      - |
        until mc ls minio > /dev/null 2>&1; do
          sleep 0.5
        done
        mc alias set myminio http://minio:9000 miniouser miniopassword
        mc admin user svcacct add --access-key AAAAAAAAAAAAAAAAAAAA \
        --secret-key BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB \
        myminio \
        miniouser
    depends_on:
      minio:
        condition: service_healthy

  starrocks-fe:
    image: starrocks/fe-ubuntu:3.3-latest
    hostname: starrocks-fe
    container_name: starrocks-fe
    user: root
    command:
      - /bin/bash
      - -c
      - |
        echo "# enable shared data, set storage type, set endpoint" >> /opt/starrocks/fe/conf/fe.conf
        echo "run_mode = shared_data" >> /opt/starrocks/fe/conf/fe.conf
        echo "cloud_native_storage_type = S3" >> /opt/starrocks/fe/conf/fe.conf
        echo "aws_s3_endpoint = minio:9000" >> /opt/starrocks/fe/conf/fe.conf
        echo "# set the path in MinIO" >> /opt/starrocks/fe/conf/fe.conf
        echo "aws_s3_path = starrocks" >> /opt/starrocks/fe/conf/fe.conf
        echo "# credentials for MinIO object read/write" >> /opt/starrocks/fe/conf/fe.conf
        echo "aws_s3_access_key = AAAAAAAAAAAAAAAAAAAA" >> /opt/starrocks/fe/conf/fe.conf
        echo "aws_s3_secret_key = BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB" >> /opt/starrocks/fe/conf/fe.conf
        echo "aws_s3_use_instance_profile = false" >> /opt/starrocks/fe/conf/fe.conf
        echo "aws_s3_use_aws_sdk_default_behavior = false" >> /opt/starrocks/fe/conf/fe.conf
        echo "# Set this to false if you do not want default" >> /opt/starrocks/fe/conf/fe.conf
        echo "# storage created in the object storage using" >> /opt/starrocks/fe/conf/fe.conf
        echo "# the details provided above" >> /opt/starrocks/fe/conf/fe.conf
        echo "enable_load_volume_from_conf = true" >> /opt/starrocks/fe/conf/fe.conf
        /opt/starrocks/fe/bin/start_fe.sh --host_type FQDN
    ports:
      - 8030:8030
      - 9020:9020
      - 9030:9030
    healthcheck:
      test: 'mysql -u root -h starrocks-fe -P 9030 -e "show frontends\G" |grep "Alive: true"'
      interval: 10s
      timeout: 5s
      retries: 3
    depends_on:
      minio:
        condition: service_healthy

  starrocks-cn:
    image: starrocks/cn-ubuntu:3.3-latest
    command:
      - /bin/bash
      - -c
      - |
        sleep 15s;
        ulimit -u 65535;
        ulimit -n 65535;
        mysql --connect-timeout 2 -h starrocks-fe -P9030 -uroot -e "ALTER SYSTEM ADD COMPUTE NODE \"starrocks-cn:9050\";"
        /opt/starrocks/cn/bin/start_cn.sh
    environment:
      - HOST_TYPE=FQDN
    ports:
      - 8040:8040
    hostname: starrocks-cn
    container_name: starrocks-cn
    user: root
    depends_on:
      starrocks-fe:
        condition: service_healthy
        restart: true
      minio:
        condition: service_healthy
    healthcheck:
      test: 'mysql -u root -h starrocks-fe -P 9030 -e "SHOW COMPUTE NODES\G" |grep "Alive: true"'
      interval: 10s
      timeout: 5s
      retries: 3
```
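The compose file relies on health checks and a fixed `sleep 15s` to order startup. When scripting against the stack from the host, a small retry loop can play the same role. The sketch below is only an illustration: `retry` is a hypothetical helper name, and the host address `127.0.0.1` in the commented usage assumes the FE query port 9030 is published locally as in the file above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command until it succeeds,
# giving up after a fixed number of attempts.
#   $1 = maximum attempts, $2 = delay in seconds, rest = command to run
retry() {
    local attempts=$1 delay=$2
    shift 2
    local i=1
    while ! "$@"; do
        if [ "$i" -ge "$attempts" ]; then
            return 1    # still failing after the last attempt
        fi
        i=$((i + 1))
        sleep "$delay"
    done
}

# Usage sketch (needs Docker and a mysql client, so it is left commented out):
# docker compose up -d
# retry 30 2 sh -c 'mysql -h 127.0.0.1 -P 9030 -uroot -e "SHOW COMPUTE NODES\G" | grep -q "Alive: true"'
```

The check inside the commented `retry` call is the same one the `starrocks-cn` health check runs.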
First, start a CN on its own:

```shell
nerdctl run -p 9060:9060 -p 8040:8040 -p 9050:9050 -p 8060:8060 -p 9070:9070 -it --name cn -e "TZ=Asia/Shanghai" starrocks/cn-ubuntu:3.4-latest
```
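Enter the CN container and set `priority_networks`:

```shell
nerdctl exec -it cn /bin/bash
cd cn/conf
# Replace 10.7.10.190/24 with your own network.
# To verify: a StarRocks CN normally reads conf/cn.conf; if cn.properties
# does not exist in your image, append to cn.conf instead.
echo "priority_networks = 10.7.10.190/24" >>cn.properties
```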
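Appending with `echo ... >>` adds a duplicate line each time the step is repeated. A minimal sketch of an idempotent alternative follows; `set_conf_key` is a hypothetical helper, not part of StarRocks, and it assumes the simple `key = value` format shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set "key = value" in a conf file.
# Rewrites the existing line for the key if there is one, otherwise appends.
set_conf_key() {
    local file=$1 key=$2 value=$3
    if grep -q "^[[:space:]]*${key}[[:space:]]*=" "$file" 2>/dev/null; then
        local tmp
        tmp=$(mktemp)
        # "|" is the sed delimiter because CIDR values contain "/"
        sed "s|^[[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" "$file" > "$tmp"
        # write back in place so the file keeps its permissions
        cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        printf '%s = %s\n' "$key" "$value" >> "$file"
    fi
}
```

For example, `set_conf_key cn.properties priority_networks 10.7.10.190/24` can be run any number of times and leaves a single `priority_networks` line in the file.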
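Next, restart the service:

```shell
# First kill the running CN process, then start it again as a daemon
# (run from the cn directory, one level above conf).
bin/start_cn.sh --daemon
```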
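Next, start the FE in IDEA.
The build needs Python 3, so switch `python` to Python 3.
Protobuf must also be installed.
Once these steps are done, build with `mvn clean install -DskipTests=true`. The build should finish without errors.
Now start the FE locally.
Run the following commands in the starrocks directory: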
```shell
cp -r conf fe/conf
cp -r bin fe/bin
cp -r webroot fe/webroot
cd fe
mkdir log
mkdir meta
```
The main class is `com.starrocks.StarRocksFE`. Add the following environment variables to the run configuration:

```shell
# Change these to your own directories
export PID_DIR=/Users/hxf/CodeSpace/starrocks/fe/bin
export STARROCKS_HOME=/Users/hxf/CodeSpace/starrocks/fe
export LOG_DIR=/Users/hxf/CodeSpace/starrocks/fe/log
```
Next, edit fe.conf:
```properties
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

#####################################################################
## The uppercase properties are read and exported by bin/start_fe.sh.
## To see all Frontend configurations,
## see fe/fe-core/src/main/java/com/starrocks/common/Config.java

# the output dir of stderr/stdout/gc
LOG_DIR = ${STARROCKS_HOME}/log
DATE = "$(date +%Y%m%d-%H%M%S)"
JAVA_OPTS="-Dlog4j2.formatMsgNoLookups=true -Xmx8192m -XX:+UseG1GC -Xlog:gc*:${LOG_DIR}/fe.gc.log.$DATE:time -XX:ErrorFile=${LOG_DIR}/hs_err_pid%p.log -Djava.security.policy=${STARROCKS_HOME}/conf/udf_security.policy"

##
## the lowercase properties are read by main program.
##

# DEBUG, INFO, WARN, ERROR, FATAL
sys_log_level = INFO

# store metadata, create it if it is not exist.
# Default value is ${STARROCKS_HOME}/meta
# meta_dir = ${STARROCKS_HOME}/meta

http_port = 8030
rpc_port = 9020
query_port = 9030
edit_log_port = 9010
mysql_service_nio_enabled = true

# Enable jaeger tracing by setting jaeger_grpc_endpoint
# jaeger_grpc_endpoint = http://localhost:14250

run_mode = shared_data
cloud_native_storage_type = S3
aws_s3_endpoint = 10.7.10.190:9000
# set the path in MinIO
aws_s3_path = starrocks
# credentials for MinIO object read/write
# the keys here are the access key and secret key created earlier
aws_s3_access_key = AAAAAAAAAAAAAAAAAAAA
aws_s3_secret_key = BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
aws_s3_use_instance_profile = false
aws_s3_use_aws_sdk_default_behavior = false
# Set this to false if you do not want default
# storage created in the object storage using
# the details provided above
enable_load_volume_from_conf = true

# Choose one if there are more than one ip except loopback address.
# Note that there should at most one ip match this list.
# If no ip match this rule, will choose one randomly.
# use CIDR format, e.g. 10.10.10.0/24
# Default value is empty.
priority_networks = 10.7.10.190/24

# Advanced configurations
# log_roll_size_mb = 1024
# sys_log_dir = ${STARROCKS_HOME}/log
# sys_log_roll_num = 10
# sys_log_verbose_modules =
# audit_log_dir = ${STARROCKS_HOME}/log
# audit_log_modules = slow_query, query
# audit_log_roll_num = 10
# meta_delay_toleration_second = 10
# qe_max_connection = 1024
# max_conn_per_user = 100
# qe_query_timeout_second = 300
# qe_slow_log_ms = 5000
```
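Start the FE in IDEA and check the console for the startup output.
Next, try connecting to the service from DBeaver.
Then connect the CN node to the FE.
It is enough to see a value in LastStartTime for the node.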
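Registering the CN can also be done from a terminal rather than DBeaver. The sketch below builds the same `ALTER SYSTEM ADD COMPUTE NODE` statement used in the docker-compose file. `add_cn_sql` is a hypothetical helper, and the commented usage assumes the FE is reachable at `127.0.0.1:9030` (the `query_port` in fe.conf) and the CN at `10.7.10.190:9050`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the statement that registers a CN with the FE.
#   $1 = CN host or IP, $2 = CN heartbeat port (9050 if omitted)
add_cn_sql() {
    local host=$1 port=${2:-9050}
    printf 'ALTER SYSTEM ADD COMPUTE NODE "%s:%s";' "$host" "$port"
}

# Usage sketch (needs a running FE and a mysql client, so it is left commented out):
# mysql -h 127.0.0.1 -P 9030 -uroot -e "$(add_cn_sql 10.7.10.190)"
# mysql -h 127.0.0.1 -P 9030 -uroot -e 'SHOW COMPUTE NODES\G'
```

In the `SHOW COMPUTE NODES` output, `Alive: true` is the condition the compose health check tests for.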
Now let's run a test:
```sql
create database test;
use test;
admin set frontend config("tablet_create_timeout_second"="100");
CREATE TABLE IF NOT EXISTS par_tbl1
(
    datekey DATETIME,
    k1 INT,
    item_id STRING,
    v2 INT
)PRIMARY KEY (`datekey`,`k1`)
PARTITION BY date_trunc('day', `datekey`)
PROPERTIES (
    "compression" = "LZ4",
    "datacache.enable" = "true",
    "enable_async_write_back" = "false",
    "enable_persistent_index" = "true",
    "persistent_index_type" = "LOCAL",
    "replication_num" = "1",
    "storage_volume" = "builtin_storage_volume"
);
```
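Note
`admin set frontend config("tablet_create_timeout_second"="100")` is there so that the CREATE TABLE statement can complete. Without it, table creation fails with a timeout error.
Once the statement succeeds, the new table is visible.
Let's insert a row manually to check.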
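A minimal sketch of such a manual insert, run through the mysql client. The row values are invented for illustration only, `insert_row_sql` is a hypothetical helper, and the commented usage assumes the FE is listening on `127.0.0.1:9030`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a test INSERT for test.par_tbl1 followed by a read-back.
# The values are illustrative; any row matching the table's columns will do.
insert_row_sql() {
    cat <<'SQL'
INSERT INTO test.par_tbl1 (datekey, k1, item_id, v2)
VALUES ('2025-02-26 10:00:00', 1, 'item-1', 100);
SELECT * FROM test.par_tbl1;
SQL
}

# Usage sketch (needs a running FE and CN, so it is left commented out):
# insert_row_sql | mysql -h 127.0.0.1 -P 9030 -uroot
```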

Reference:
https://crossoverjie.top/2025/02/26/ob/StarRocks-dev-shard-data-build/