Python: JSONL

itsven, 2026/1/25, about 1 minute read

1 Read line by line into a list

import json

def read_jsonl(path):
    data = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            data.append(json.loads(line))
    return data


# Usage
records = read_jsonl("data.jsonl")
print(len(records))
print(records[0])

The result in memory is a list[dict]
Suitable for small and medium-sized data
Can be indexed, iterated over, and processed freely

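2 Recommended for large files: generator version (low memory use)

If you are working with JSONL files on the order of millions of lines, such as in a RAG / PathRAG / KG data pipeline, this version is the better fit 👇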

import json

def iter_jsonl(path):
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


# Usage
for record in iter_jsonl("data.jsonl"):
    print(record)

tqdm version

from tqdm import tqdm
import json

def iter_jsonl(path, show_progress=True):
    total = None
    if show_progress:
        # First pass: count lines so the progress bar knows its total
        with open(path, "r", encoding="utf-8") as f:
            total = sum(1 for _ in f)

    # Second pass: parse the records
    with open(path, "r", encoding="utf-8") as f:
        it = f
        if show_progress:
            it = tqdm(f, total=total, desc="Reading JSONL")

        for line in it:
            line = line.strip()
            if line:
                yield json.loads(line)

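Processes one line at a time
Very low memory footprint
Well suited to pipeline-style processing / batch writes to a vector store / graph construction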
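
For the batch-write case mentioned above, the generator can be grouped into fixed-size batches without loading the whole file. The following is a minimal sketch, not a definitive implementation: iter_batches and its batch_size parameter are hypothetical names introduced here for illustration, and the actual write to a vector store is left out because it depends on the client you use.

```python
import json
from itertools import islice


def iter_jsonl(path):
    """Yield one parsed record per non-empty line."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def iter_batches(records, batch_size):
    """Group any iterable of records into lists of at most batch_size items.

    Hypothetical helper: only one batch is held in memory at a time.
    """
    it = iter(records)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch
```

A typical call would be for batch in iter_batches(iter_jsonl("data.jsonl"), 1000): followed by your own bulk-insert call on batch. The last batch may be shorter than batch_size.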

3 With exception handling (recommended for production use)

import json

def read_jsonl_safe(path):
    data = []
    with open(path, "r", encoding="utf-8") as f:
        for i, line in enumerate(f, 1):
            line = line.strip()
            if not line:
                continue
            try:
                data.append(json.loads(line))
            except json.JSONDecodeError as e:
                print(f"Line {i} JSON parse error: {e}")
    return data
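
The same protection can be combined with the generator approach from section 2, so that a large file with a few bad lines can still be streamed. This is a sketch under assumptions: iter_jsonl_safe and its optional errors list are hypothetical names, and collecting (line number, message) pairs instead of printing is just one possible design.

```python
import json


def iter_jsonl_safe(path, errors=None):
    """Yield parsed records and skip lines that fail to parse.

    If `errors` is a list, a (line_number, message) tuple is appended
    to it for every line that raises json.JSONDecodeError.
    """
    with open(path, "r", encoding="utf-8") as f:
        for i, line in enumerate(f, 1):
            line = line.strip()
            if not line:
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError as e:
                if errors is not None:
                    errors.append((i, str(e)))
```

Passing a list for errors lets the caller decide afterwards whether to log the bad lines, retry them, or abort. The list is only complete once the generator has been fully consumed.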

4 Pandas DataFrame

import pandas as pd

df = pd.read_json("data.jsonl", lines=True)
print(df.head())

A .jsonl file is not one large JSON array, so it cannot be read directly with json.load()
With pandas, lines=True is required to read it
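
A small illustration of the first point, using an in-memory string instead of a real file (the sample records are made up for the demonstration): parsing the whole content as one document fails, while parsing line by line works.

```python
import io
import json

# Two records, one per line: valid JSONL, but not a single JSON document
jsonl_text = '{"id": 1}\n{"id": 2}\n'

try:
    json.load(io.StringIO(jsonl_text))
    whole_file_ok = True
except json.JSONDecodeError:
    # The parser finishes the first object, then finds extra data after it
    whole_file_ok = False

# Parsing each non-empty line on its own works
records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
```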

Last updated: 2026/1/30 19:51
