I'm trying to build an application that "checks out" a cell (a square covering a portion of land in a geographic database) and analyzes the features inside that cell. Since I have many cells to process, I'm using multiprocessing. I have it working inside my object, roughly like this:

    import multiprocessing as mp
    from time import sleep

    import psycopg2 as pg2


    class DistributedGeographicConstraintProcessor:
        ...

        def _process_cell(self, conn_string):
            conn = pg2.connect(conn_string)
            try:
                with conn.cursor() as cur:  # the cursor is closed on exit
                    cell_id = self._check_out_cell(cur)
                    conn.commit()
                    print(f"processing cell_id {cell_id}...")
                    for constraint in self.constraints:
                        # print(f"processing {constraint.name()}...")
                        query = constraint.prepare_distributed_query(self.job, self.grid)
                        cur.execute(query, {
                            "buffer": constraint.buffer(),
                            "cell_id": cell_id,
                            "name": constraint.name(),
                            "simplify_tolerance": constraint.simplify_tolerance(),
                        })
                    # TODO: do a final race condition check to further suppress duplicates
                    self._check_in_cell(cur, cell_id)
                    conn.commit()
            finally:
                conn.close()
            return None

        def run(self):
            while True:
                if not self._job_finished():
                    params = [self.conn_string] * self.num_cores
                    processes = []
                    for param in params:
                        process = mp.Process(target=self._process_cell, args=(param,))
                        processes.append(process)
                        sleep(0.1)  # Prevent multiple processes from checking out the same grid square
                        process.start()
                    # join() blocks until every process in this batch has finished
                    for process in processes:
                        process.join()
                else:
                    self._finalize_job()
                    break

The problem is that it starts only four processes and waits until all of them have finished before starting four new ones. I want each process to start on the next cell as soon as it finishes its current one, even if the other processes are still working. I'm not sure how to implement this. I tried using a pool like this:

    def run(self):
        pool = mp.Pool(self.num_cores)
        unprocessed_cells = self._unprocessed_cells()
        for i in pool.imap(self._process_cell, unprocessed_cells):
            print(i)
1 Answer
拉风的咖菲猫
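My guess is that you are attaching some connection object to self (a Pool pickles everything it sends to its workers, including self when you pass a bound method such as self._process_cell, and an open database connection cannot be pickled). Try rewriting your solution using only functions, with no classes or methods.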
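The reasoning behind this guess can be checked without a database. A multiprocessing.Pool pickles each task before sending it to a worker, and a bound method is pickled together with its whole instance. The sketch below is a minimal illustration under stated assumptions: HoldsConnection, HoldsConnString and can_pickle are hypothetical names, and a threading.Lock stands in for an open psycopg2 connection because neither can be pickled.

```python
import pickle
import threading


class HoldsConnection:
    """Hypothetical class that keeps a live resource on self."""

    def __init__(self, conn_string):
        self.conn_string = conn_string
        # Placeholder for an open database connection: like a real
        # connection, a lock cannot be pickled.
        self.conn = threading.Lock()

    def process_cell(self, cell_id):
        return cell_id


class HoldsConnString:
    """Hypothetical class that keeps only plain data on self."""

    def __init__(self, conn_string):
        self.conn_string = conn_string

    def process_cell(self, cell_id):
        # A real worker would open its own connection here from
        # self.conn_string, e.g. with psycopg2.connect(...).
        return cell_id


def can_pickle(obj):
    """Return True if obj can be pickled, as every Pool task must be."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


if __name__ == "__main__":
    # Pickling a bound method also pickles the instance it is bound to.
    print(can_pickle(HoldsConnection("dbname=gis").process_cell))  # False
    print(can_pickle(HoldsConnString("dbname=gis").process_cell))  # True
```

Keeping only plain data such as the connection string on self, and opening the connection inside the worker, is what makes a method safe to hand to a Pool; plain module-level functions avoid the question entirely.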
Here is a simplified version of a single-producer/multiple-worker solution I used some time ago:
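    import time
    from multiprocessing import Pool

    NUM_PROC = 4  # number of worker processes

    def worker(param):
        # connect to pg
        # do work
        pass

    def main(params):
        pool = Pool(processes=NUM_PROC)
        tasks = []
        for param in params:
            # apply_async returns immediately; each task goes to the next idle worker
            t = pool.apply_async(worker, args=(param,))
            tasks.append(t)
        pool.close()  # no more tasks will be submitted

        # poll once a second until every task has completed
        finished = False
        while not finished:
            finished = True
            for t in tasks:
                if not t.ready():
                    finished = False
                    break
            time.sleep(1)
        pool.join()

    if __name__ == "__main__":
        main(range(10))  # placeholder inputs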
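The polling loop can also be replaced by Pool.imap_unordered; this is a swapped-in technique, not the answerer's. A minimal sketch, assuming the cell ids are known up front (as with the question's _unprocessed_cells()), so no database checkout is needed to hand out work. process_cell and run are hypothetical module-level functions, and the sleep stands in for the constraint queries.

```python
import multiprocessing as mp
import time


def process_cell(args):
    """Hypothetical worker: handles one cell and returns its id.

    A real worker would open its own connection from conn_string, run the
    constraint queries for cell_id, commit and close. Here it only sleeps.
    """
    conn_string, cell_id = args
    time.sleep(0.01 * (cell_id % 3))  # uneven amount of work per cell
    return cell_id


def run(conn_string, cell_ids, num_cores=4):
    """Process every cell while keeping all num_cores workers busy.

    With chunksize=1, imap_unordered hands the next cell to whichever worker
    becomes idle first and yields results in completion order, so a slow
    cell never holds up the others.
    """
    done = []
    with mp.Pool(num_cores) as pool:
        tasks = ((conn_string, cell_id) for cell_id in cell_ids)
        for cell_id in pool.imap_unordered(process_cell, tasks, chunksize=1):
            print(f"finished cell_id {cell_id}")
            done.append(cell_id)
    return done


if __name__ == "__main__":
    finished = run("dbname=gis", range(12))
    print(sorted(finished))
```

Because the worker is a plain function that receives only a connection string and a cell id, nothing unpicklable is sent to the pool, and all results are consumed before the with block terminates it.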