# Python Data Structure Tips

Tags: Python

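## I. Filtering data in lists, dictionaries, and sets by condition
The examples below are written for Python 2. Their data is generated with the random module, so the results differ from run to run.

### 1. Filtering negative numbers out of a list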
# -*- coding:utf-8 -*-
from random import randint

data = [randint(-10,10) for _ in xrange(10)]

# Method 1: the filter function
print filter(lambda x:x>=0, data)
# Method 2: a list comprehension
print [x for x in data if x>=0]

Output:
[8, 5, 0, 3]
[8, 5, 0, 3]
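
The listing above targets Python 2. As a minimal sketch of the same two methods under Python 3 (an assumption about the reader's interpreter, not something the original covers): filter returns a lazy iterator there, so it has to be wrapped in list() to get the same result. The fixed sample values are made up so that the output is reproducible.

```python
# Python 3 sketch; the sample values are illustrative, not from the original run.
data = [8, -3, 5, -7, 0, -1, 3, -10]

# Method 1: filter returns an iterator in Python 3, so materialize it with list().
by_filter = list(filter(lambda x: x >= 0, data))

# Method 2: a list comprehension builds the list directly.
by_comprehension = [x for x in data if x >= 0]

print(by_filter)         # [8, 5, 0, 3]
print(by_comprehension)  # [8, 5, 0, 3]
```

Both forms keep the original order of the surviving elements; the comprehension is usually the easier one to read.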
### 2. Selecting dictionary items whose value is above 90
# -*- coding:utf-8 -*-
from random import randint

d = {x: randint(60,100) for x in xrange(1,21)}

print { k:v for k,v in d.iteritems() if v >90}

Output:

{10: 97}
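
For readers on Python 3, a rough equivalent might look like the sketch below. It assumes only that iteritems() is replaced by items(); the scores dictionary is invented for illustration.

```python
# Python 3 sketch; the scores dictionary is a made-up example.
scores = {1: 72, 2: 95, 3: 88, 4: 91, 5: 90}

# dict.items() replaces Python 2's iteritems(); keep values strictly above 90.
high = {k: v for k, v in scores.items() if v > 90}

print(high)  # {2: 95, 4: 91}
```

Note that the condition is strict, so the entry with exactly 90 is left out.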
### 3. Selecting set elements divisible by 3
# -*- coding:utf-8 -*-

from random import randint

data = [randint(-10,10) for _ in xrange(10)]

s = set(data)

print {x for x in s if x % 3 ==0}

Output:
set([0, 9])
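
The same set comprehension works unchanged in Python 3 apart from the print call. The sketch below uses a fixed, made-up list so the result is predictable, and shows that negative multiples of 3 pass the test as well.

```python
# Python 3 sketch; the sample values are illustrative.
data = [-9, -4, 0, 3, 3, 7, 9, 10]

s = set(data)  # duplicates (the second 3) collapse here

# x % 3 == 0 also holds for 0 and for negative multiples such as -9.
divisible = {x for x in s if x % 3 == 0}

# Sets are unordered, so sort before printing for a stable display.
print(sorted(divisible))  # [-9, 0, 3, 9]
```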
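## II. Naming, counting, and dictionaries
### 1. Naming the elements of a tuple to improve readability

Approach 1:

Define a series of numeric constants to use as indexes, much like the enumeration types of other languages.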
# -*- coding:utf-8 -*-
NAME = 0
AGE = 1
SEX = 2
EMAIL = 3

student = ('jim', 16, 'male', 'jim@gmail.com')

if student[AGE] > 18:
  pass

if student[SEX] == 'male':
  pass
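
Two shorter ways to write such constants are sketched below for Python 3. Variant A unpacks range() so the indexes stay consecutive; variant B uses enum.IntEnum from the standard library (Python 3.4+). The class name Field is a hypothetical name chosen for this example and does not appear in the original.

```python
from enum import IntEnum

# Variant A: unpack range() so the indexes are numbered automatically.
NAME, AGE, SEX, EMAIL = range(4)

# Variant B: an IntEnum whose members behave like ints when used as indexes.
# "Field" is a hypothetical name used only for this sketch.
class Field(IntEnum):
    NAME = 0
    AGE = 1
    SEX = 2
    EMAIL = 3

student = ('jim', 16, 'male', 'jim@gmail.com')

print(student[AGE])          # 16
print(student[Field.EMAIL])  # jim@gmail.com
```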
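Approach 2:

Use collections.namedtuple from the standard library in place of the built-in tuple.
namedtuple works as a class factory. The resulting s is still a tuple, so its fields can be accessed either by index or by attribute name.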
# -*- coding:utf-8 -*-
from collections import namedtuple
student = namedtuple('Student',['name', 'age', 'sex', 'email'])

s = student('jim', 16, 'male', 'jim@gmail.com')

print s
print s.name

Output:

Student(name='jim', age=16, sex='male', email='jim@gmail.com')
jim
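
A few more things a namedtuple instance can do are sketched below in Python 3. The field names follow the student example, with the last field called email; everything used here (_replace, _asdict, unpacking) is part of the standard collections.namedtuple API.

```python
from collections import namedtuple

Student = namedtuple('Student', ['name', 'age', 'sex', 'email'])
s = Student('jim', 16, 'male', 'jim@gmail.com')

print(isinstance(s, tuple))  # True: it is still a tuple
print(s[1], s.age)           # 16 16: index and attribute access agree

# Ordinary tuple unpacking keeps working.
name, age, sex, email = s

# Tuples are immutable, so _replace returns a new instance.
older = s._replace(age=17)
print(older.age)             # 17

# _asdict gives a field-name-to-value mapping.
print(s._asdict()['email'])  # jim@gmail.com
```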
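### 2. Counting how often elements occur in a sequence
Example 1: In a random sequence, find the 3 elements that occur most often and how many times each occurs.

Method 1:
# -*- coding:utf-8 -*-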
from random import randint

data = [randint(0,20) for _ in xrange(30)]

c = dict.fromkeys(data, 0)
print c
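for x in data:
    c[x] = c[x] + 1
print c.items()
# after sorting by count, the last three items are the most frequent ones
print sorted(c.items(), key=lambda d:d[1])[-3:]

Output (the all-zero dictionary printed by the first print is omitted):
[(0, 1), (1, 2), (2, 3), (3, 2), (4, 1), (5, 2), (6, 1), (7, 2), (8, 3), (9, 3), (11, 2), (12, 2), (15, 4), (19, 1), (20, 1)]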
[(8, 3), (9, 3), (15, 4)]
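Method 2: use a collections.Counter object

Pass the sequence to the Counter constructor. The resulting Counter object is a dictionary that maps each element to its frequency, and Counter.most_common(n) returns a list of the n most frequent elements together with their counts.
# -*- coding:utf-8 -*-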
from random import randint
from collections import Counter

data = [randint(0,20) for _ in xrange(30)]
c2 = Counter(data)

print c2.most_common(3)

Output:
[(18, 4), (5, 3), (14, 3)]
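
To make the behaviour of Counter easier to follow, here is a small Python 3 sketch with fixed, made-up data chosen so that no two elements share the same count.

```python
from collections import Counter

# Illustrative data: 5 occurs 4 times, 9 three times, 1 twice, 2 once.
data = [5, 9, 1, 5, 9, 2, 5, 1, 9, 5]

c = Counter(data)

# most_common(n) returns (element, count) pairs, highest count first.
top3 = c.most_common(3)
print(top3)   # [(5, 4), (9, 3), (1, 2)]

# A Counter returns 0 for elements it has never seen instead of raising KeyError.
print(c[42])  # 0
```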
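Example 2: Count word frequencies in an English article and find the 10 most frequent words and their counts (the listing below prints only the top 3).
> Split the file content on runs of non-word characters, that is, anything other than letters, digits, and underscores.
# -*- coding:utf-8 -*-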
from collections import Counter
import re

txt = open('test.txt').read()
c3 = Counter(re.split('\W+', txt))
print c3.most_common(3)

Output:
[('openhpc', 26), ('resource', 17), ('queue', 16)]
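
The sketch below shows the same idea in Python 3 on an inline string, so it runs without test.txt. It swaps re.split for re.findall, which is a substitution on my part: re.split(r'\W+', ...) can produce an empty string when the text starts or ends with punctuation, while re.findall(r'\w+', ...) returns only the words. Lower-casing the text is also an added assumption so that "The" and "the" count as one word.

```python
import re
from collections import Counter

# Made-up sample text standing in for the contents of a file.
text = "The queue holds jobs. The scheduler reads the queue; the queue drains."

# findall collects runs of word characters and never yields empty strings.
words = re.findall(r'\w+', text.lower())

counts = Counter(words)
top2 = counts.most_common(2)
print(top2)  # [('the', 4), ('queue', 3)]
```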
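### 3. Sorting dictionary items by value

Solutions:

1. Use zip to turn the dictionary into (value, key) tuples, which sorted then orders by value.

2. Pass a key argument to the sorted function.
# -*- coding:utf-8 -*-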
from random import randint

d = {x:randint(60,100) for x in 'xyzabc' }
print sorted(zip(d.itervalues(),d.iterkeys()))

Output:
[(80, 'y'), (89, 'x'), (91, 'b'), (94, 'a'), (94, 'z'), (99, 'c')]
# -*- coding:utf-8 -*-
from random import randint

d = {x:randint(60,100) for x in 'xyzabc' }
print sorted(d.items(), key=lambda x: x[1])

Output:

[('x', 67), ('y', 71), ('c', 72), ('a', 75), ('z', 88), ('b', 89)]
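
Both solutions carry over to Python 3 with values()/keys()/items() in place of the iter* methods. The sketch below uses a fixed dictionary (the values are illustrative) and adds a descending sort via reverse=True; operator.itemgetter(1) is used as an alternative to the lambda.

```python
from operator import itemgetter

# Illustrative scores.
scores = {'x': 67, 'y': 71, 'c': 72, 'a': 75, 'z': 88, 'b': 89}

# Solution 1: (value, key) tuples sort by value first, then by key on ties.
by_zip = sorted(zip(scores.values(), scores.keys()))

# Solution 2: sort the (key, value) items by their second element.
ascending = sorted(scores.items(), key=itemgetter(1))

# reverse=True puts the highest values first.
descending = sorted(scores.items(), key=itemgetter(1), reverse=True)

print(by_zip[0])       # (67, 'x')
print(ascending[0])    # ('x', 67)
print(descending[:3])  # [('b', 89), ('z', 88), ('a', 75)]
```

One difference worth keeping in mind: with the zip form, equal values are ordered by key, whereas the key= form is a stable sort that leaves equal values in their original order.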
## III. Common keys

### 1. Quickly finding the keys shared by several dictionaries
# -*- coding:utf-8 -*-
from random import randint, sample

s1 = {x: randint(1,4) for x in sample('abcdefg', randint(3,6))}
s2 = {x: randint(1,4) for x in sample('abcdefg', randint(3,6))}
s3 = {x: randint(1,4) for x in sample('abcdefg', randint(3,6))}

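# With only a few dictionaries, intersect their key views directly
print s1.viewkeys() & s2.viewkeys() & s3.viewkeys()

# Step 1: use the dictionary's viewkeys() method to get a set-like view of its keys.
# Step 2: use map to get the key views of all the dictionaries.
# Step 3: use reduce to take the intersection of all the key views.

# With many dictionaries, use the following approach
print reduce(lambda a,b:a&b, map(dict.viewkeys, [s1,s2,s3]))

Output: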

set(['c', 'd'])
set(['c', 'd'])
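
In Python 3 the viewkeys() method is gone: keys() itself returns a set-like view, and reduce has moved to functools. A minimal sketch of both approaches under those assumptions, with fixed made-up dictionaries:

```python
from functools import reduce

# Illustrative dictionaries; only 'c' and 'd' appear in all three.
s1 = {'a': 1, 'c': 2, 'd': 3, 'f': 4}
s2 = {'c': 1, 'd': 4, 'e': 2}
s3 = {'b': 3, 'c': 3, 'd': 1, 'g': 2}

# Few dictionaries: intersect the key views directly.
common_direct = s1.keys() & s2.keys() & s3.keys()

# Many dictionaries: map each one to its key view, then fold with &.
common_reduce = reduce(lambda a, b: a & b, map(dict.keys, [s1, s2, s3]))

print(sorted(common_direct))  # ['c', 'd']
print(sorted(common_reduce))  # ['c', 'd']
```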
## IV. Keeping a dictionary ordered
### 1. Using collections.OrderedDict
from time import time
from random import randint
from collections import OrderedDict

d = OrderedDict()
players = list('ABCDEFGH')
start = time()

for i in xrange(8):
    raw_input()  # wait for Enter, then let a random remaining player finish
    p = players.pop(randint(0,7-i))
    end = time()
    print i+1,p, end - start
    d[p] = (i+1, end - start)  # record the rank and the elapsed time

print '*' * 20
for k in d:
    print k, d[k]

Output:
The final for loop walks the dictionary in the order in which the items were inserted.

1 C 0.934000015259

2 D 1.40899991989

3 F 1.67999982834

4 A 1.95599985123

5 E 2.16599988937

6 H 2.37599992752

7 B 2.60699987411

8 G 2.99799990654

C (1, 0.9340000152587891)
D (2, 1.4089999198913574)
F (3, 1.679999828338623)
A (4, 1.9559998512268066)
E (5, 2.1659998893737793)
H (6, 2.375999927520752)
B (7, 2.6069998741149902)
G (8, 2.997999906539917)
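
A non-interactive Python 3 sketch of the same idea follows. The finishing order is hard-coded (an illustrative assumption) instead of being drawn at random on each key press. It also shows two OrderedDict methods, move_to_end and popitem(last=False), that reorder or consume entries. Since Python 3.7 a plain dict also preserves insertion order, but it lacks move_to_end.

```python
from collections import OrderedDict

# Illustrative finishing order, standing in for the random draws.
finish_order = ['C', 'D', 'F', 'A']

d = OrderedDict()
for rank, player in enumerate(finish_order, start=1):
    d[player] = rank

# Iteration follows insertion order.
items = list(d.items())
print(items)  # [('C', 1), ('D', 2), ('F', 3), ('A', 4)]

# Move an existing key to the end, then pop from the front.
d.move_to_end('C')
first = d.popitem(last=False)
print(first)    # ('D', 2)
print(list(d))  # ['F', 'A', 'C']
```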
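## V. History records
### 1. Implementing a user history feature (at most n entries)

Store the history in a queue with a capacity of n.

Use deque from the standard library's collections module. It is a double-ended circular queue: once it holds n items, appending a new one discards the oldest. Before the program exits, the queue object can be saved to a file with pickle and loaded again the next time the program runs.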
from random import randint
from collections import deque
N = randint(0, 100)
history = deque([], 5)

def guess(k):
    if k == N:
        print 'right'
        return True

    if k < N:
        print '%s is less than N' % k
    else:
        print '%s is greater than N' % k
    return False

while True:
    line = raw_input("please input a number: ")
    if line.isdigit():
        k = int(line)
        history.append(k)
        if guess(k):
            break
    elif line == 'history' or line =='h?':
        print list(history)
In [1]: import pickle

In [2]: from collections import deque

In [3]: q = deque([],5)

In [4]: q.append(1)

In [5]: q.append(2)

In [6]: q.append(3)

In [7]: q.append(4)

In [8]: q.append(5)

In [9]: q.append(6)

In [10]: q
Out[10]: deque([2, 3, 4, 5, 6])

In [11]: pickle.dump(q,open('history','w'))

In [12]: pickle.load(open('history'))
Out[12]: deque([2, 3, 4, 5, 6])
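
In Python 3, pickle needs the file opened in binary mode ('wb' and 'rb'). The sketch below repeats the round trip under that assumption; the temporary directory and the file name history are placeholders chosen for the example, and with blocks are used so the files get closed.

```python
import os
import pickle
import tempfile
from collections import deque

# A bounded history: appending the sixth item drops the oldest one.
history = deque(maxlen=5)
for k in range(1, 7):
    history.append(k)

# Placeholder location for the saved history.
path = os.path.join(tempfile.mkdtemp(), 'history')

# Save before exit: pickle requires a binary file in Python 3.
with open(path, 'wb') as f:
    pickle.dump(history, f)

# Load on the next run.
with open(path, 'rb') as f:
    restored = pickle.load(f)

print(restored)         # deque([2, 3, 4, 5, 6], maxlen=5)
print(restored.maxlen)  # 5
```

The capacity limit survives the round trip, so the restored queue keeps discarding old entries as new guesses arrive.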

P.S. One last gripe: I wrote this post in Markdown and the preview looked fine, so why did the formatting change completely once it was published? @慕女神

Author: 孤独的小猪 (full-stack engineer)