(Trino) "Read-only file system; nested exception is java.sql.SQLException" error

One Who Seeks Wisdom and Essence, 2024. 12. 27. 20:31

Overview

An application that uses Trino failed with the following error:

/tmp/trino-s3-13154018856034091017.tmp: Read-only file system; nested exception is java.sql.SQLException

Let's walk through how we troubleshot this failure.

Details

What is the /tmp directory used for?

To deploy Trino as a cluster on Kubernetes, we use the Helm chart from the following repository.

GitHub - joshuarobinson/trino-on-k8s: Setup for running Trino with Hive Metastore on Kubernetes

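To work around memory limits, Trino can spill to disk: it temporarily writes intermediate data to disk and later reads it back into memory for processing. The Helm chart sets this spill directory to /tmp, and /tmp is the value the chart ships with by default.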

trino-on-k8s/trino-cfgs.yaml at master · joshuarobinson/trino-on-k8s

---
kind: ConfigMap 
apiVersion: v1 
metadata:
  name: trino-configs
data:
  jvm.config: |-
    -server
    -Xmx16G
    -XX:-UseBiasedLocking
    -XX:+UseG1GC
    -XX:G1HeapRegionSize=32M
    -XX:+ExplicitGCInvokesConcurrent
    -XX:+ExitOnOutOfMemoryError
    -XX:+UseGCOverheadLimit
    -XX:+HeapDumpOnOutOfMemoryError
    -XX:ReservedCodeCacheSize=512M
    -Djdk.attach.allowAttachSelf=true
    -Djdk.nio.maxCachedBufferSize=2000000
  config.properties.coordinator: |-
    coordinator=true
    node-scheduler.include-coordinator=false
    http-server.http.port=8080
    query.max-memory=200GB
    query.max-memory-per-node=8GB
    query.max-total-memory-per-node=10GB
    query.max-stage-count=200
    task.writer-count=4
    discovery-server.enabled=true
    discovery.uri=http://trino:8080
  config.properties.worker: |-
    coordinator=false
    http-server.http.port=8080
    query.max-memory=200GB
    query.max-memory-per-node=10GB
    query.max-total-memory-per-node=10GB
    query.max-stage-count=200
    task.writer-count=4
    discovery.uri=http://trino:8080
  node.properties: |-
    node.environment=test
    spiller-spill-path=/tmp
    max-spill-per-node=4TB
    query-max-spill-per-node=1TB
  hive.properties: |-
    connector.name=hive-hadoop2
    hive.metastore.uri=thrift://metastore:9083
    hive.allow-drop-table=true
    hive.max-partitions-per-scan=1000000
    hive.s3.endpoint=10.62.64.200
    hive.s3.path-style-access=true
    hive.s3.ssl.enabled=false
    hive.s3.max-connections=100
  iceberg.properties: |-
    connector.name=iceberg
    hive.metastore.uri=thrift://metastore:9083
    hive.max-partitions-per-scan=1000000
    hive.s3.endpoint=10.62.64.200
    hive.s3.path-style-access=true
    hive.s3.ssl.enabled=false
    hive.s3.max-connections=100
  mysql.properties: |-
    connector.name=mysql
    connection-url=jdbc:mysql://metastore-db.default.svc.cluster.local:13306
    connection-user=root
    connection-password=mypass

[Figure 1] Spilling properties as defined in the official Trino documentation
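The spill mechanism described above can be illustrated with a toy Python sketch. SpillingBuffer, its max_in_memory limit and the spill- file prefix are hypothetical names chosen for illustration; this is not how Trino's spiller is implemented. It only shows why the spill directory must be writable: every spill creates a new file in that directory, and on a read-only mount that file creation is exactly what fails.

```python
import os
import pickle
import tempfile
from typing import Any, Iterator, List


class SpillingBuffer:
    """Toy buffer that offloads items to disk once an in-memory limit is hit.

    Hypothetical illustration only; not Trino's spiller implementation.
    """

    def __init__(self, spill_dir: str, max_in_memory: int) -> None:
        self.spill_dir = spill_dir
        self.max_in_memory = max_in_memory
        self.in_memory: List[Any] = []
        self.spill_files: List[str] = []

    def add(self, item: Any) -> None:
        self.in_memory.append(item)
        # Once the in-memory batch reaches the limit, move it to disk.
        if len(self.in_memory) >= self.max_in_memory:
            self._spill()

    def _spill(self) -> None:
        # Creating this file raises OSError (errno EROFS, "Read-only file system")
        # when spill_dir sits on a read-only mount.
        fd, path = tempfile.mkstemp(prefix="spill-", suffix=".tmp", dir=self.spill_dir)
        with os.fdopen(fd, "wb") as f:
            pickle.dump(self.in_memory, f)
        self.spill_files.append(path)
        self.in_memory = []

    def __iter__(self) -> Iterator[Any]:
        # Read spilled batches back from disk in the order they were written,
        # then yield whatever is still held in memory.
        for path in self.spill_files:
            with open(path, "rb") as f:
                yield from pickle.load(f)
        yield from self.in_memory

    def close(self) -> None:
        # Delete the spill files once the data is no longer needed.
        for path in self.spill_files:
            os.remove(path)
        self.spill_files = []
```

With max_in_memory=3 and ten items added, three batches end up on disk and the last item stays in memory; iterating still returns all ten items in order.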

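Putting the error message together with what /tmp is used for, the cause can be summarized as follows:

Trino temporarily writes intermediate data to disk and then reads it back into memory for processing.
That requires write access to the disk, but the PVC was mounted read-only, so the write failed.

Note that the trino-s3- prefix in the file name suggests this particular file may not be spill data but the Hive connector's S3 staging file, which is written to hive.s3.staging-directory (by default the JVM's java.io.tmpdir, normally /tmp). Either way the conclusion is the same: Trino needs write access to /tmp.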
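One way to confirm this diagnosis is to try creating a file in the directory, much as Trino does when it writes a temp file there. The sketch below is a hypothetical helper (probe_writable is an illustrative name, and it assumes a Python 3 interpreter is available wherever you run it). It reports EROFS, the errno behind "Read-only file system", separately from other failures such as a missing directory.

```python
import errno
import os
import tempfile
from typing import Tuple


def probe_writable(directory: str) -> Tuple[bool, str]:
    """Try to create and delete a small temp file in `directory`.

    Hypothetical helper for illustration; returns (ok, message).
    """
    try:
        fd, path = tempfile.mkstemp(prefix="write-probe-", suffix=".tmp", dir=directory)
    except OSError as e:
        # EROFS is the errno reported as "Read-only file system".
        if e.errno == errno.EROFS:
            return False, f"{directory}: read-only file system"
        return False, f"{directory}: {e.strerror}"
    # Creation succeeded, so the directory is writable; clean up the probe file.
    os.close(fd)
    os.remove(path)
    return True, f"{directory}: writable"
```

While the volume on trino-worker-2 was mounted ro, probe_writable("/tmp") run inside that pod would be expected to return False with the "read-only file system" message, while the other workers would report /tmp as writable.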

Was /tmp really mounted read-only?

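When we checked whether the mount was really read-only, we found that 1 of the 11 worker nodes did have /tmp mounted read-only. The mount on trino-worker-2 and the /tmp mount options of every worker are shown below.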

[trino@trino-worker-2 /]$ mount | grep /tmp
/dev/mapper/fb8aa4b6_f68f_40fe_bb02_a2cecfdc72ee-ntnxLV on /tmp type ext4 (ro,relatime,stripe=96,data=ordered)

No.	Pod	Device	/tmp mount options
0	trino-worker-0	/dev/mapper/b3bcc969_a5e5_4cb7_833d_bb56fdba73b0-ntnxLV	rw, relatime, stripe=96, data=ordered
1	trino-worker-1	/dev/mapper/743dbfa4_472f_47a8_9fc6_670b972f8b3c-ntnxLV	rw, relatime, stripe=96, data=ordered
2	trino-worker-2	/dev/mapper/fb8aa4b6_f68f_40fe_bb02_a2cecfdc72ee-ntnxLV	ro, relatime, stripe=96, data=ordered
3	trino-worker-3	/dev/mapper/e2b9d824_b68f_4d93_8a46_800882b0ca15-ntnxLV	rw, relatime, stripe=96, data=ordered
4	trino-worker-4	/dev/mapper/26568506_966d_43ea_a7d6_41ddda0cc8f5-ntnxLV	rw, relatime, stripe=96, data=ordered
5	trino-worker-5	/dev/mapper/d792d209_4c44_46a6_8e41_9bcc08c7620a-ntnxLV	rw, relatime, stripe=96, data=ordered
6	trino-worker-6	/dev/mapper/af6abe94_8258_4dc5_9300_3a6999c0614e-ntnxLV	rw, relatime, stripe=96, data=ordered
7	trino-worker-7	/dev/mapper/ec47e1fe_6592_426c_8028_3bd5ba70d4d3-ntnxLV	rw, relatime, stripe=96, data=ordered
8	trino-worker-8	/dev/mapper/dd682469_21b6_4b16_beb7_d01d3d347ed8-ntnxLV	rw, relatime, stripe=96, data=ordered
9	trino-worker-9	/dev/mapper/a60a6418_eb6f_4b22_8b08_0a5414f71be0-ntnxLV	rw, relatime, stripe=96, data=ordered
10	trino-worker-10	/dev/mapper/492b1204_3503_4b3c_8f44_5a6de07baf17-ntnxLV	rw, relatime, stripe=96, data=ordered
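Checking workers one by one does not scale well to 11 pods. The sketch below sweeps them with kubectl exec and reads /proc/mounts inside each pod. The pod names (trino-worker-0 through trino-worker-10) come from this setup; the default namespace and the presence of cat in the container image are assumptions, so adjust them for your cluster. options_for and find_read_only are pure functions; tmp_mount_options shells out to kubectl.

```python
import subprocess
from typing import Dict, List


def options_for(proc_mounts: str, mountpoint: str) -> List[str]:
    """Return the mount options for `mountpoint` from /proc/mounts text."""
    options: List[str] = []
    for line in proc_mounts.splitlines():
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mountpoint:
            # Keep the last match: a later entry is mounted on top of earlier ones.
            options = fields[3].split(",")
    return options


def find_read_only(options_by_pod: Dict[str, List[str]]) -> List[str]:
    """Return the pods whose mount options contain the 'ro' flag."""
    return sorted(pod for pod, opts in options_by_pod.items() if "ro" in opts)


def tmp_mount_options(pod: str, namespace: str = "default") -> List[str]:
    """Read /proc/mounts inside `pod` via kubectl exec and return /tmp's options."""
    result = subprocess.run(
        ["kubectl", "exec", "-n", namespace, pod, "--", "cat", "/proc/mounts"],
        check=True,
        capture_output=True,
        text=True,
    )
    return options_for(result.stdout, "/tmp")
```

Usage for this cluster: build pods = [f"trino-worker-{i}" for i in range(11)] and call find_read_only({p: tmp_mount_options(p) for p in pods}). For the state shown in the table above, this would be expected to return ["trino-worker-2"].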

Why was it mounted read-only?

It turned out that the infrastructure team, while replacing certificates, had switched the PVC on one particular Kubernetes cluster node to ro (read-only).

Resolution

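We fixed it by switching the mount on the affected Trino worker node back to rw with Linux commands, then recovered by re-triggering every job that had been triggered during the outage.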

References

Spilling properties — Trino 468 Documentation
