1 Killer Feature
2 Spark & Iceberg
2.1 Step1: Create a shared network
```bash
# Create a network to be used by both spark and hadoop
docker network create iceberg-ns
```
2.2 Step2: Start Hadoop
Start a single-node Hadoop cluster that joins the shared network.
```bash
SHARED_NS=iceberg-ns
```
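The snippet above only pins the shared network name; the rest of the command block is truncated. A minimal sketch of this step, assuming the apache/hadoop image, a container name of `hadoop`, and a keep-alive command (all assumptions):

```bash
# Minimal sketch -- image tag and container name are assumptions
SHARED_NS=iceberg-ns
HADOOP_CONTAINER_NAME=hadoop

docker run -d \
  --name ${HADOOP_CONTAINER_NAME} \
  --hostname ${HADOOP_CONTAINER_NAME} \
  --network ${SHARED_NS} \
  apache/hadoop:3.3.6 \
  tail -f /dev/null  # keep the container alive; the MapReduce test below runs via docker exec
```

The essential flag is `--network ${SHARED_NS}`: it lets Spark, Hive, and Trino reach this container by its name.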
Test:
```bash
docker exec ${HADOOP_CONTAINER_NAME} bash -c 'hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 10 100'
```
2.3 Step3: Start Spark
Start a Spark container that joins the shared network.
```bash
SHARED_NS=iceberg-ns
```
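Again only the network name survives from this block. A minimal sketch, assuming the apache/spark image and that the HDFS from Step2 is reachable at `hdfs://hadoop:8020` (image tag, container name, and namenode host/port are all assumptions); the catalog name `iceberg_spark_demo` matches the test below:

```bash
# Minimal sketch -- image tag, container name, and HDFS host/port are assumptions
SHARED_NS=iceberg-ns
SPARK_CONTAINER_NAME=spark

docker run -d \
  --name ${SPARK_CONTAINER_NAME} \
  --network ${SHARED_NS} \
  apache/spark:3.5.1 \
  tail -f /dev/null

# Open a spark-sql shell with the Iceberg runtime and a hadoop-type catalog
# whose warehouse lives on the HDFS started in Step2
docker exec -it ${SPARK_CONTAINER_NAME} /opt/spark/bin/spark-sql \
  --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.2 \
  --conf spark.sql.catalog.iceberg_spark_demo=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.iceberg_spark_demo.type=hadoop \
  --conf spark.sql.catalog.iceberg_spark_demo.warehouse=hdfs://hadoop:8020/iceberg/warehouse
```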
Test:
```sql
CREATE TABLE iceberg_spark_demo.db.table (id bigint, data string) USING iceberg;
```
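To verify writes as well as DDL, a hedged follow-up reusing the assumed container name and catalog settings from the sketch above:

```bash
# Write one row and read it back through the same catalog (names assumed as above)
docker exec -it ${SPARK_CONTAINER_NAME} /opt/spark/bin/spark-sql \
  --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.2 \
  --conf spark.sql.catalog.iceberg_spark_demo=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.iceberg_spark_demo.type=hadoop \
  --conf spark.sql.catalog.iceberg_spark_demo.warehouse=hdfs://hadoop:8020/iceberg/warehouse \
  -e "INSERT INTO iceberg_spark_demo.db.table VALUES (1, 'a'); SELECT * FROM iceberg_spark_demo.db.table;"
```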
3 Trino & Iceberg
Trino supports only metastore-backed Iceberg catalogs (e.g. Hive Metastore), not the raw Hadoop-filesystem-based catalog, so we need to put a Hive Metastore in front of HDFS before Trino can query Iceberg tables.
3.1 Step1 & Step2
We can reuse the shared network created in Step1: Create a shared network and the Hadoop container from Step2: Start Hadoop; HDFS again serves as Iceberg's storage.
3.2 Step3: Start Hive
Start a Hive container that joins the shared network.
- Tez tip: don't use apache-tez-0.10.3-bin.tar.gz directly; uncompress it and deploy the inner share/tez.tar.gz instead, otherwise you hit: Error: Could not find or load main class org.apache.tez.dag.app.DAGAppMaster
```bash
SHARED_NS=iceberg-ns
```
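Once more only the network name survives from this block. A minimal sketch using the stock apache/hive image (image tag, container names, and the metastore URI are assumptions; the post's actual image was evidently rebuilt with Tez, per the tip above):

```bash
# Minimal sketch -- image tag, container names, and metastore URI are assumptions
SHARED_NS=iceberg-ns
HIVE_METASTORE_CONTAINER_NAME=hive-metastore
HIVE_SERVER_CONTAINER_NAME=hive-server

# Standalone metastore on port 9083 -- Trino's Iceberg catalog will point here
docker run -d \
  --name ${HIVE_METASTORE_CONTAINER_NAME} \
  --network ${SHARED_NS} \
  -e SERVICE_NAME=metastore \
  apache/hive:4.0.0

# HiveServer2 on port 10000, wired to the metastore above -- used by the beeline test
docker run -d \
  --name ${HIVE_SERVER_CONTAINER_NAME} \
  --network ${SHARED_NS} \
  -e SERVICE_NAME=hiveserver2 \
  -e SERVICE_OPTS="-Dhive.metastore.uris=thrift://${HIVE_METASTORE_CONTAINER_NAME}:9083" \
  apache/hive:4.0.0
```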
Test:
```bash
docker exec -it ${HIVE_SERVER_CONTAINER_NAME} beeline -u 'jdbc:hive2://localhost:10000/' -e "
```
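The statement passed to -e is cut off above. A hypothetical stand-in (the database and table names are invented here) that creates an Iceberg table through HiveServer2:

```bash
# Hypothetical smoke test: create an Iceberg table via HiveServer2 (Hive 4 syntax)
docker exec -it ${HIVE_SERVER_CONTAINER_NAME} beeline -u 'jdbc:hive2://localhost:10000/' -e "
CREATE DATABASE IF NOT EXISTS db;
CREATE TABLE db.hive_iceberg_demo (id bigint, data string) STORED BY ICEBERG;
SHOW TABLES IN db;
"
```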
3.3 Step4: Start Trino
```bash
SHARED_NS=iceberg-ns
```
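The rest of this block is missing as well. A minimal sketch, assuming the trinodb/trino image and the metastore container name from the Hive sketch above; the catalog file name iceberg matches the --catalog flag used in the test below:

```bash
# Minimal sketch -- image tag, container name, and metastore host are assumptions
SHARED_NS=iceberg-ns
TRINO_CONTAINER_NAME=trino

# Iceberg catalog backed by the Hive Metastore from Step3
mkdir -p trino-catalog
cat > trino-catalog/iceberg.properties <<'EOF'
connector.name=iceberg
hive.metastore.uri=thrift://hive-metastore:9083
# On recent Trino releases, HDFS support must be enabled explicitly
fs.hadoop.enabled=true
EOF

docker run -d \
  --name ${TRINO_CONTAINER_NAME} \
  --network ${SHARED_NS} \
  -v "$(pwd)/trino-catalog/iceberg.properties:/etc/trino/catalog/iceberg.properties" \
  trinodb/trino:latest
```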
Test:
```bash
docker exec -it ${TRINO_CONTAINER_NAME} trino --catalog iceberg --execute "
```
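The query passed to --execute is cut off too. A hypothetical stand-in that lists schemas and reads the table created through Hive above (visible to Trino because both go through the same metastore):

```bash
# Hypothetical smoke test: read the metastore-registered Iceberg table from Trino
docker exec -it ${TRINO_CONTAINER_NAME} trino --catalog iceberg --execute "
SHOW SCHEMAS;
SELECT * FROM db.hive_iceberg_demo;
"
```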
4 API-Demo
```xml
<properties>
```
```java
package org.byconity.iceberg;
```
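Only the opening lines of the pom.xml and the Java source are shown above. Assuming a standard Maven layout and a hypothetical main class org.byconity.iceberg.Main, the demo can be compiled and run with:

```bash
# Build and run the Java API demo; the main class name is hypothetical.
# The pom needs the Iceberg Java API on the classpath, e.g.
# org.apache.iceberg:iceberg-core (plus iceberg-hive-metastore for a HiveCatalog).
mvn -q compile exec:java -Dexec.mainClass=org.byconity.iceberg.Main
```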
5 Tips
5.1 Reserved Field Ids
Refer to the Reserved Field IDs section of the Iceberg table spec (https://iceberg.apache.org/spec/) for details.