Splitting a data text file into multiple text files for MySQL using Python

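I have a data txt file, in the following format (simplified for illustration), that gets loaded into a database (MySQL):

data.txt

```text
name   age  profession  datestamp
John   23   engineer    2020-03-01
Amy    17   doctor      2020-02-27
Gordon 19   artist      2020-02-27
Kevin  25   chef        2020-03-01
```

Once data.txt has been created, it is loaded with the following command, executed through Python:

```sql
LOAD DATA LOCAL INFILE '/home/sample_data/data.txt' REPLACE INTO TABLE person_professions FIELDS TERMINATED BY 0x01 OPTIONALLY ENCLOSED BY '\"' LINES TERMINATED BY '\n' (name,age,profession,datestamp)
```

However, data.txt is far too large to insert into the database in one go (there is an insert limit of about 200 MB). I would like to split the data into several chunks (data_1.txt, data_2.txt, data_3.txt, etc.) and insert them one at a time so that I don't hit the insert size limit.

I know you can go through the data line by line and slice it on some condition, for example:

```python
with open('data.txt', 'r') as f:  # 'r', not 'w': 'w' would truncate data.txt
    data = f.read().split('\n')
    if some_condition:  # what condition should start a new file?
        with open('data_1.txt', 'w') as f2:
            ...  # insert data
```

but I'm not quite sure how to come up with a breakpoint condition that makes it start writing to a new txt file, unless there is a better way to do this.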
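The plan above ends with loading each chunk file in turn. As a minimal sketch of that last step, assuming the chunks sit in one directory and are named data_1.txt, data_2.txt, and so on, the LOAD DATA statement from the question can be generated once per file. `LOAD_TEMPLATE` and `load_statements` are hypothetical names; actually running the statements (for example with a DB-API `cursor.execute()` call from whichever MySQL driver is in use) is left out, since it depends on the driver and on LOCAL INFILE being allowed on both client and server.

```python
from pathlib import Path

# The LOAD DATA statement from the question, with the file path left as a placeholder.
# Assumes chunk paths contain no single quotes (they are pasted into a SQL string literal).
LOAD_TEMPLATE = (
    "LOAD DATA LOCAL INFILE '{path}' REPLACE INTO TABLE person_professions "
    "FIELDS TERMINATED BY 0x01 OPTIONALLY ENCLOSED BY '\"' "  # SQL sees '"'
    "LINES TERMINATED BY '\\n' (name,age,profession,datestamp)"  # SQL sees '\n'
)


def load_statements(chunk_dir, pattern="data_*.txt"):
    """Hypothetical helper: one LOAD DATA statement per chunk file, in chunk order."""
    chunks = sorted(
        Path(chunk_dir).glob(pattern),
        # Sort on the number after the last "_" so data_10.txt comes after data_9.txt;
        # assumes every matching file name ends in _<integer>.txt.
        key=lambda p: int(p.stem.rsplit("_", 1)[1]),
    )
    return [LOAD_TEMPLATE.format(path=p.as_posix()) for p in chunks]
```

Sorting numerically rather than by plain string order matters once there are ten or more chunks, because "data_10.txt" sorts before "data_2.txt" as a string.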


1 Answer

繁花不似锦


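I wrote a function that does the job; adjust the number of lines per file to the size of your file. Explanations are in the code comments.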

```python
def split_file(file_name, lines_per_file=100000):
    # Open large file to be read in UTF-8
    # (newline='' keeps the original line endings, so LINES TERMINATED BY '\n' still matches)
    with open(file_name, 'r', encoding='utf-8', newline='') as rf:
        # Read all lines in file
        lines = rf.readlines()
    print(str(len(lines)) + ' LINES READ.')

    # Set variables to count file number and count of lines written
    file_no = 0
    wlines_count = 0

    # For x from 0 to length of lines read, stepping by the number of lines written to each file
    for x in range(0, len(lines), lines_per_file):
        # Open new "split" file for writing in UTF-8
        with open('data' + '-' + str(file_no) + '.txt', 'w', encoding='utf-8', newline='') as wf:
            # Write lines
            wf.writelines(lines[x:x + lines_per_file])
        # Update the written lines count
        wlines_count += len(lines[x:x + lines_per_file])
        # Update new "split" file count, mainly for naming
        file_no += 1

    print(str(wlines_count) + " LINES WRITTEN IN " + str(file_no) + " FILES.")


# Split data.txt into files containing 100000 lines
split_file('data.txt', 100000)
```
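The function above reads the whole file into memory with `readlines()` and breaks on a fixed line count. Because the limit mentioned in the question is a size (about 200 MB), an alternative is to make the breakpoint condition itself size-based: start a new file whenever the next line would push the current one past a byte limit. The sketch below streams the input line by line and names the chunks data_1.txt, data_2.txt, and so on, as in the question. `split_by_size`, `max_bytes`, `prefix` and the 100 MB default are assumptions chosen for illustration, and since it only ever cuts between lines it assumes no quoted field contains an embedded newline.

```python
def split_by_size(file_name, max_bytes=100 * 1024 * 1024, prefix="data_"):
    """Hypothetical helper: split file_name into data_1.txt, data_2.txt, ...

    Each chunk holds at most max_bytes. It only cuts between lines, so a single
    line longer than max_bytes ends up alone in its own, oversized, chunk.
    Returns the list of chunk file names in order.
    """
    names = []
    out = None      # currently open chunk file
    written = 0     # bytes written to the current chunk
    try:
        # Binary mode: sizes are real byte counts and line endings are copied untouched.
        with open(file_name, "rb") as rf:
            for line in rf:  # streams one line at a time instead of readlines()
                # Breakpoint condition: no chunk yet, or this line would overflow the current one.
                if out is None or (written > 0 and written + len(line) > max_bytes):
                    if out is not None:
                        out.close()
                    names.append(f"{prefix}{len(names) + 1}.txt")
                    out = open(names[-1], "wb")
                    written = 0
                out.write(line)
                written += len(line)
    finally:
        if out is not None:
            out.close()
    return names
```

For example, `split_by_size('data.txt', max_bytes=150 * 1024 * 1024)` returns the chunk file names, which can then be loaded one at a time with the LOAD DATA command from the question.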

2022-08-16
