AWS DMS: using DatabaseMigrationService.Client. with large result sets

Python

蝴蝶不菲 2021-12-21 10:42:41

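I am using describe_table_statistics to retrieve the list of tables in a given DMS task, and I loop conditionally, passing response['Marker'] back into describe_table_statistics. When I use no filter, I get the correct number of records, 13k+. When I use a filter, or a combination of filters, whose result set is smaller than MaxRecords, I also get the correct number of records. However, when I pass in a filter whose result set is larger than MaxRecords, I get far fewer records than I should.

Here is my function for retrieving the set of tables:

    def get_dms_task_tables(account, region, task_name, schema_name=None, table_state=None):
        tables = []
        max_records = 500

        filters = []
        if schema_name:
            filters.append({'Name': 'schema-name', 'Values': [schema_name]})
        if table_state:
            filters.append({'Name': 'table-state', 'Values': [table_state]})

        task_arn = get_dms_task_arn(account, region, task_name)

        session = boto3.Session(profile_name=account, region_name=region)
        client = session.client('dms')

        response = client.describe_table_statistics(
            ReplicationTaskArn=task_arn,
            Filters=filters,
            MaxRecords=max_records)
        tables += response['TableStatistics']

        while len(response['TableStatistics']) == max_records:
            response = client.describe_table_statistics(
                ReplicationTaskArn=task_arn,
                Filters=filters,
                MaxRecords=max_records,
                Marker=response['Marker'])
            tables += response['TableStatistics']

        return tables

To troubleshoot, I loop over the tables and print one line per table:

    print(', '.join((
        t['SchemaName'],
        t['TableName'],
        t['TableState'])))

When I pass no filters and grep the output for the table state "Table completed", I get 12k+ records, which is the correct count according to the console. So, at least on the surface, the response loop works.

When I pass in both the schema name and the table state filter, I get the correct count, as confirmed by the console, but this count is smaller than MaxRecords.

When I pass in only the table state filter for "Table completed", I get just 949 records, so I am missing 11k records.

I tried omitting the Filters parameter from the describe_table_statistics call inside the loop, but I get the same results in all cases.

I suspect something is wrong with my call to describe_table_statistics inside the loop, but I have been unable to find an example in Amazon's documentation to confirm this.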

1 Answer

长风秋雁

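When a filter is applied, describe_table_statistics does not honor the MaxRecords limit.

In fact, what it appears to do is retrieve (2 x MaxRecords) records, apply the filter, and return that set. Or it may retrieve MaxRecords records, apply the filter, and continue until the result set is larger than MaxRecords. Either way, the length of a filtered page is not a reliable sign that more pages remain, so my while condition was the problem.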

I replaced

    while len(response['TableStatistics']) == max_records:

with

    while 'Marker' in response:

and now the function returns the correct number of records.
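To illustrate why the length test stops early, here is a minimal simulation. `FakeDmsClient` is a hypothetical stand-in, not the real boto3 client, and it assumes one of the behaviors guessed at above: the service scans MaxRecords rows per call, filters afterwards, and returns a Marker while unscanned rows remain. The real service may behave differently.

```python
class FakeDmsClient:
    """Hypothetical stand-in for the DMS client (not part of boto3).

    Assumption: each call scans up to MaxRecords rows, applies the filter
    afterwards, and returns a Marker while unscanned rows remain.
    """

    def __init__(self, rows):
        self.rows = rows

    def describe_table_statistics(self, Filters=None, MaxRecords=500, Marker=None):
        start = int(Marker) if Marker else 0
        end = start + MaxRecords
        page = self.rows[start:end]
        for f in Filters or []:
            if f["Name"] == "table-state":
                page = [r for r in page if r["TableState"] in f["Values"]]
        response = {"TableStatistics": page}
        if end < len(self.rows):
            response["Marker"] = str(end)
        return response


def fetch_by_length(client, filters, max_records):
    """Original loop: continue only while a page comes back full."""
    response = client.describe_table_statistics(Filters=filters, MaxRecords=max_records)
    tables = list(response["TableStatistics"])
    while len(response["TableStatistics"]) == max_records:
        response = client.describe_table_statistics(
            Filters=filters, MaxRecords=max_records, Marker=response["Marker"])
        tables += response["TableStatistics"]
    return tables


def fetch_by_marker(client, filters, max_records):
    """Fixed loop: continue while the response carries a Marker."""
    response = client.describe_table_statistics(Filters=filters, MaxRecords=max_records)
    tables = list(response["TableStatistics"])
    while "Marker" in response:
        response = client.describe_table_statistics(
            Filters=filters, MaxRecords=max_records, Marker=response["Marker"])
        tables += response["TableStatistics"]
    return tables


# 2000 made-up rows, half of them in the "Table completed" state.
rows = [
    {"TableName": f"t{i}",
     "TableState": "Table completed" if i % 2 == 0 else "Table error"}
    for i in range(2000)
]
filters = [{"Name": "table-state", "Values": ["Table completed"]}]
client = FakeDmsClient(rows)

short = fetch_by_length(client, filters, 500)
full = fetch_by_marker(client, filters, 500)
print(len(short), len(full))  # 250 1000
```

In this simulation the first filtered page holds only 250 rows, so the length test exits after one call even though a Marker was returned.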

Incidentally, my first attempt was

    while len(response['TableStatistics']) >= 1:

but on the last iteration of the loop it threw this error:

    KeyError: 'Marker'
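The KeyError arises because the last page still contains records but no 'Marker' key. One way to sidestep it, sketched below, is to read the marker with `dict.get`. The helper `iter_pages` and its `fetch_page` callable are hypothetical names introduced only for this illustration, and the `pages` dict is made-up data standing in for real responses.

```python
def iter_pages(fetch_page):
    """Yield responses from a marker-paginated call until no Marker is returned.

    `fetch_page` is a hypothetical callable: it takes the marker (None for
    the first call) and returns a response dict.
    """
    marker = None
    while True:
        response = fetch_page(marker)
        yield response
        # .get() returns None on the last page instead of raising KeyError.
        marker = response.get("Marker")
        if not marker:
            break


# Made-up responses keyed by the marker that requests them.
pages = {
    None: {"TableStatistics": ["a", "b"], "Marker": "m1"},
    "m1": {"TableStatistics": ["c"]},  # last page: records, but no Marker key
}

tables = []
for page in iter_pages(lambda marker: pages[marker]):
    tables += page["TableStatistics"]
print(tables)  # ['a', 'b', 'c']
```

With the real client, `fetch_page` would wrap the describe_table_statistics call and pass Marker only when it is not None.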

The finished, working function now looks like this:

    import boto3

    def get_dms_task_tables(account, region, task_name, schema_name=None, table_state=None):
        tables = []
        max_records = 500

        # Build the optional filters.
        filters = []
        if schema_name:
            filters.append({'Name': 'schema-name', 'Values': [schema_name]})
        if table_state:
            filters.append({'Name': 'table-state', 'Values': [table_state]})

        # get_dms_task_arn is my own helper (not shown here).
        task_arn = get_dms_task_arn(account, region, task_name)

        session = boto3.Session(profile_name=account, region_name=region)
        client = session.client('dms')

        response = client.describe_table_statistics(
            ReplicationTaskArn=task_arn,
            Filters=filters,
            MaxRecords=max_records)
        tables += response['TableStatistics']

        # Keep paging for as long as the service returns a Marker.
        while 'Marker' in response:
            response = client.describe_table_statistics(
                ReplicationTaskArn=task_arn,
                Filters=filters,
                MaxRecords=max_records,
                Marker=response['Marker'])
            tables += response['TableStatistics']

        return tables
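As an alternative to following the Marker by hand, boto3 paginators can do it for you. The sketch below assumes your boto3 version exposes a paginator named describe_table_statistics on the DMS client. The function name `get_tables_with_paginator` is made up for this example. The client is passed in, so the block itself needs no imports.

```python
def get_tables_with_paginator(client, task_arn, filters=None, page_size=500):
    """Collect all TableStatistics entries via a boto3 paginator.

    Assumes `client` is a boto3 DMS client whose version provides a
    'describe_table_statistics' paginator.
    """
    paginator = client.get_paginator("describe_table_statistics")
    pages = paginator.paginate(
        ReplicationTaskArn=task_arn,
        Filters=filters or [],
        # PageSize is the per-request page size the paginator asks for.
        PaginationConfig={"PageSize": page_size},
    )
    tables = []
    for page in pages:
        tables += page["TableStatistics"]
    return tables
```

It would be called with the same client, task_arn and filters that the function above builds, for example `get_tables_with_paginator(client, task_arn, filters)`.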

Answered 2021-12-21
