
Wrong output when creating a manifest-generation script (Python) for Rail-RNA: files don't match

Python

慕森卡 2021-09-02 14:44:25

What I want: to create a "manifest" for running Rail-RNA (http://rail.bio/) on multiple FASTQ files in a directory, in the following format:

FASTQ URL 1 (tab) optional MD5 1 (tab) FASTQ URL 2 (tab) optional MD5 2 (tab) sample label

For example:

/home/data/10080-17_r1.fastq 0 /home/data/10080-17_r2.fastq 0 10080-17_r1
/home/data/10300-25_r1.fastq 0 /home/data/10300-25_r2.fastq 0 10300-25_r1
/home/data/40500-72_r1.fastq 0 /home/data/40500-72_r2.fastq 0 10300-25_r2
... and so on.

What I did: I wrote a Python script to generate the manifest from the FASTQ files in a specific directory:

```python
#!/usr/bin/python
import os
import csv

thisdir = os.getcwd()

# Create empty lists
forward = []
reverse = []
zero = []
names = []

# Extract file-paths for files ending with .fastq and append them to "forward" and "reverse"
for r, d, f in os.walk(thisdir):  # r=root, d=directories, f = files
    for file in f:
        if "_1.fastq" in file:
            forward.append(os.path.join(r, file))
        if "_2.fastq" in file:
            reverse.append(os.path.join(r, file))

# make a list containing 0 with the length of the forward list
for i in range(len(forward)):
    zero.append('0')

# extract filenames without extensions:
l = os.listdir(thisdir)
li = [x.split('.')[0] for x in l]
for name in li:
    if "_1" in name:
        names.append(name)
names = [s.strip('_1') for s in names]

# write the output to a file
with open('manifest.txt', 'w') as f:
    writer = csv.writer(f, delimiter='\t')
    for path in zip(forward, zero, reverse, zero, names):
        writer.writerow(list(path))
```

What's wrong: I get a manifest.txt in the correct format, but it doesn't pair the matching *_r1.fastq and *_r2.fastq files. It produces something like this (the r1 file in the first column doesn't match the r2 file in the third column):

/home/data/10080-17_r1.fastq 0 /home/data/40500-72_r2.fastq 0 10080-17_r1
/home/data/10300-25_r1.fastq 0 /home/data/10080-17_r2.fastq 0 10300-25_r1
/home/data/40500-72_r1.fastq 0 /home/data/10300-25_r2.fastq 0 10300-25_r2

Do any of you more experienced Python users have a solution for this? It would be greatly appreciated!


2 Answers

小怪兽爱吃肉


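In the code you posted, the rows get mismatched because each CSV row is built purely by list index and the file names are never compared. os.walk() returns files in arbitrary order, so the forward and reverse lists are not guaranteed to line up, and if the number of *_r1.fastq files differs from the number of *_r2.fastq files, every later row is shifted as well.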
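To illustrate matching by name instead of by position, here is a minimal sketch (not the answerer's code) that keys each file by its sample label in a dict, so the order in which os.walk() lists the files no longer matters. sample_key and pair_by_name are hypothetical helpers, and the sketch assumes file names of the form <sample>_r1.fastq / <sample>_r2.fastq; a missing mate is written as '-', the same convention this answer uses.

```python
import os


def sample_key(path, mate_tag):
    """Return the sample label if `path` ends with '<mate_tag>.fastq', else None.

    Hypothetical helper: assumes names like /home/data/10080-17_r1.fastq.
    """
    suffix = mate_tag + ".fastq"
    name = os.path.basename(path)
    if name.endswith(suffix):
        return name[:-len(suffix)]
    return None


def pair_by_name(paths):
    """Pair _r1/_r2 files by their shared sample label, regardless of input order."""
    forward = {}
    reverse = {}
    for p in paths:
        key = sample_key(p, "_r1")
        if key is not None:
            forward[key] = p
            continue
        key = sample_key(p, "_r2")
        if key is not None:
            reverse[key] = p
    # one row per forward file, sorted by label; '-' marks a missing reverse mate
    return [(forward[k], reverse.get(k, "-"), k) for k in sorted(forward)]


# deliberately scrambled, as a directory listing may be
files = [
    "/home/data/40500-72_r2.fastq",
    "/home/data/10080-17_r1.fastq",
    "/home/data/10300-25_r2.fastq",
    "/home/data/10300-25_r1.fastq",
    "/home/data/10080-17_r2.fastq",
    "/home/data/40500-72_r1.fastq",
]
for fwd, rev, label in pair_by_name(files):
    print(fwd, rev, label, sep="\t")
```

Because the pairing key is the sample label, shuffling the input list does not change which files end up on the same row.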
I updated that solution. Check your file names; they should look like this:

/home/data/10080-17_r1.fastq
/home/data/10080-17_r2.fastq

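First we collect all forward files (*_r1.fastq), then for each one we try to find the matching reverse file (*_r2.fastq) in the same directory. If we don't find it, we write "-" instead of the reverse file name.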
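One caveat with deriving the reverse path via replace("_r1", "_r2"): str.replace changes every occurrence, so a directory name that happens to contain "_r1" would be rewritten too. A minimal sketch, assuming the <sample>_r1.fastq naming, that swaps only the file-name suffix and checks membership in a set rather than looping over a list. mate_path and find_mate are hypothetical helpers.

```python
import os

R1_SUFFIX = "_r1.fastq"
R2_SUFFIX = "_r2.fastq"


def mate_path(forward_path):
    """Return the expected _r2 path for an _r1 path, changing only the file-name suffix."""
    directory, name = os.path.split(forward_path)
    if not name.endswith(R1_SUFFIX):
        raise ValueError("not a forward (_r1) FASTQ file: " + forward_path)
    return os.path.join(directory, name[:-len(R1_SUFFIX)] + R2_SUFFIX)


def find_mate(forward_path, reverse_paths):
    """Return the matching _r2 path if it is in `reverse_paths` (ideally a set), else '-'."""
    candidate = mate_path(forward_path)
    return candidate if candidate in reverse_paths else "-"
```

Building the reverse collection once with set(reverse_arr) makes each lookup constant time on average, which only starts to matter with many samples.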
Please check the code and read the comments:

```python
#!/usr/bin/python
import csv
import os

this_dir = os.getcwd()

forward_arr = []
reverse_arr = []

for r, d, f in os.walk(this_dir):  # r=root, d=directories, f = files
    for file in f:
        if "_r1.fastq" in file:
            forward_arr.append(os.path.join(r, file))
        if "_r2.fastq" in file:
            reverse_arr.append(os.path.join(r, file))

# collect the result rows in this list
csv_arr = []

# for each file in forward_arr
for forward_file in forward_arr:
    # get the sample label from the full file path:
    # 1. take the file name: /home/data/10080-17_r1.fastq -> 10080-17_r1.fastq
    # 2. split by '_r' and take the first element: 10080-17_r1.fastq -> 10080-17
    sample_label = os.path.basename(forward_file).split('_r')[0]

    # search reverse_arr for the reverse file that matches this forward file;
    # if we don't find it, put '-' instead of the path to the reverse file
    reverse_file_result = "-"

    # we are looking for a file with the same name and in the same location,
    # but it should be a reverse file with '_r2' instead of '_r1' in its name
    reverse_file_for_search = forward_file.replace("_r1", "_r2")

    # search for that reverse file in reverse_arr
    for reverse_file in reverse_arr:
        # if we found that file
        if reverse_file_for_search == reverse_file:
            # use the reverse file name instead of '-'
            reverse_file_result = reverse_file
            # stop searching and go on to the next forward_file
            break

    # here we could compute the MD5 for the FORWARD file
    md5_url_1 = 0
    # here we could compute the MD5 for the REVERSE file
    md5_url_2 = 0

    # append the result row to csv_arr
    csv_arr.append((forward_file, md5_url_1, reverse_file_result,
                    md5_url_2, sample_label))

# write all rows to the manifest once, after the loop
with open('manifest.txt', 'w') as f:
    # tab-separated, with Unix line endings instead of the csv module's default '\r\n'
    writer = csv.writer(f, delimiter='\t', lineterminator='\n')
    writer.writerows(csv_arr)
```
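The code above leaves md5_url_1 and md5_url_2 as 0. If you want to fill the optional MD5 columns, here is a minimal sketch using the standard hashlib module; file_md5 is a hypothetical helper, and whether Rail-RNA expects a lowercase hex digest in these columns is an assumption worth checking against its manifest documentation.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks.

    Hypothetical helper for the optional MD5 columns. FASTQ files can be
    large, so the file is read piece by piece instead of all at once.
    """
    md5 = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() keeps calling fh.read() until it returns b"" at end of file
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()
```

Inside the loop you would then write md5_url_1 = file_md5(forward_file), and compute md5_url_2 only when reverse_file_result is not '-'.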

2021-09-02

宝慕林4294392


I think this should do what you need. It's hard to tell for sure, because this line:

names = [s.strip('_1') for s in names]

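looks wrong: str.strip('_1') removes any leading and trailing '_' and '1' characters, not the suffix "_1". I suspect it was meant to be "_r1", as in the first loop, which I changed accordingly.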
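A small demonstration of that pitfall: str.strip() treats its argument as a set of characters, so it can also eat the leading "1" of a sample ID such as 10080-17. The same applies to any call like strip("_r1.fastq"). drop_suffix is a hypothetical helper that removes an exact suffix; on Python 3.9+ str.removesuffix does the same.

```python
def drop_suffix(name, suffix):
    """Remove `suffix` from the end of `name` if present (like str.removesuffix in 3.9+)."""
    if suffix and name.endswith(suffix):
        return name[:-len(suffix)]
    return name


# str.strip() trims any of the given characters from both ends of the string
print("10080-17_r1".strip("_1"))                      # 0080-17_r
print("10080-17_r1.fastq".strip("_r1.fastq"))         # 0080-17
print(drop_suffix("10080-17_r1.fastq", "_r1.fastq"))  # 10080-17
```

The "if suffix" guard matters because name[:-0] would return an empty string.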

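```python
import csv
import os

thisdir = os.getcwd()

# Create empty lists
forward = []
reverse = []

for r, d, f in os.walk(thisdir):  # r=root, d=directories, f = files
    for file in f:
        if file.endswith("_r1.fastq"):
            forward.append(os.path.join(r, file))
        elif file.endswith("_r2.fastq"):
            reverse.append(os.path.join(r, file))

# os.walk() returns files in arbitrary order, so sort both lists;
# this pairs them correctly only if every _r1 file has an _r2 partner
forward.sort()
reverse.sort()

# sample names: the file name minus the "_r1.fastq" suffix
# (str.strip() removes a set of characters, not a suffix, so slice instead)
names = [os.path.basename(p)[:-len("_r1.fastq")] for p in forward]

# write the output to a file
with open('manifest.txt', 'w') as f:
    writer = csv.writer(f, delimiter='\t', lineterminator='\n')
    for fwd, rev, nam in zip(forward, reverse, names):
        writer.writerow([fwd, 0, rev, 0, nam])
```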
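Whichever script you use, it can help to verify the manifest after writing it. A minimal sketch of a per-row check, assuming the 5-column tab-separated layout from the question and mates named <sample>_r1.fastq / <sample>_r2.fastq in the same directory; check_manifest_row is a hypothetical helper, not part of Rail-RNA.

```python
import os

R1_SUFFIX = "_r1.fastq"
R2_SUFFIX = "_r2.fastq"


def check_manifest_row(line):
    """Return True if one tab-separated manifest row pairs matching mates.

    Hypothetical sanity check. Assumes the columns
    FASTQ 1, MD5 1, FASTQ 2, MD5 2, sample label.
    """
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 5:
        return False
    dir_1, name_1 = os.path.split(fields[0])
    dir_2, name_2 = os.path.split(fields[2])
    if not (name_1.endswith(R1_SUFFIX) and name_2.endswith(R2_SUFFIX)):
        return False
    # both mates must be in the same directory and share the sample prefix
    return dir_1 == dir_2 and name_1[:-len(R1_SUFFIX)] == name_2[:-len(R2_SUFFIX)]
```

Running it over each line of a tab-separated manifest.txt and listing the rows where it returns False would flag all three rows of the mismatched output shown in the question. Rows with '-' as the reverse file are flagged too, which is usually what you want to notice.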

2021-09-02
