
## Reducing memory consumption with Python 3's iterative generator yield

Lu Yanjun 2021-05-04 19:42:30

# Technical background

In Python, when a `for` loop traverses a task, all of the values to be traversed are normally loaded into memory first. This is often unnecessary: the values are frequently used only once, and in many scenarios they never need to be held in memory at the same time. For such cases we can use the iterative generator introduced in this article, `yield`.

# Basic use

Let's start with an example of the basic usage of the generator keyword `yield`. The goal is to build a function that generates the array of squares \({0^2, 1^2, 2^2, \ldots}\). The ordinary approach is to construct an empty list, append the result of each calculation, and finally `return` the list; that is what `square_number` does below. The second function, `square_number_yield`, demonstrates the same task written with `yield`. Its syntax looks just like `return`, except that it hands back only one value at a time:

```python
# test_yield.py
def square_number(length):
    s = []
    for i in range(length):
        s.append(i ** 2)
    return s

def square_number_yield(length):
    for i in range(length):
        yield i ** 2

if __name__ == '__main__':
    length = 10
    sn1 = square_number(length)
    sn2 = square_number_yield(length)
    for i in range(length):
        print(sn1[i], '\t', end='')
        print(next(sn2))
```

In the `__main__` block we compare the results of the two methods by printing them on the same line; the `end=''` argument replaces the newline at the end of the first print. The result is as follows:

```
[dechin@dechin-manjaro yield]$ python3 test_yield.py
0 	0
1 	1
4 	4
9 	9
16 	16
25 	25
36 	36
49 	49
64 	64
81 	81
```

You can see that the two methods give identical results. In some scenarios the values returned by the function do need to be stored persistently; `yield` can handle this too, as the following example shows:

```python
# test_yield.py
def square_number(length):
    s = []
    for i in range(length):
        s.append(i ** 2)
    return s

def square_number_yield(length):
    for i in range(length):
        yield i ** 2

if __name__ == '__main__':
    length = 10
    sn1 = square_number(length)
    sn2 = square_number_yield(length)
    sn3 = list(square_number_yield(length))
    for i in range(length):
        print(sn1[i], '\t', end='')
        print(next(sn2), '\t', end='')
        print(sn3[i])
```

Here the object generated by `yield` is converted directly into list form. Writing `sn3 = [i for i in square_number_yield(length)]` works just as well, and there should be little difference in performance. The code above produces:

```
[dechin@dechin-manjaro yield]$ python3 test_yield.py
0 	0 	0
1 	1 	1
4 	4 	4
9 	9 	9
16 	16 	16
25 	25 	25
36 	36 	36
49 	49 	49
64 	64 	64
81 	81 	81
```
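Note that `sn2` and `sn3` above come from two separate calls to `square_number_yield`: a generator can only be traversed once. After it is exhausted, converting it again gives an empty list, and a further `next()` raises `StopIteration`. A minimal sketch:

```python
def square_number_yield(length):
    # same generator function as in the text
    for i in range(length):
        yield i ** 2

g = square_number_yield(3)
print(list(g))  # [0, 1, 4]
print(list(g))  # [] -- the generator is already exhausted
try:
    next(g)
except StopIteration:
    print('StopIteration: nothing left to yield')
```

This is why the example calls the generator function twice instead of reusing one generator object for both `sn2` and `sn3`.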

As mentioned in the previous section, `yield` can save memory. Here we test the sum of squares of a random array of 100000 elements. With ordinary logic the program looks like this (for methods of tracking memory footprint in Python, see an earlier post on this blog):

```python
# square_sum.py
import tracemalloc
import time
import numpy as np
tracemalloc.start()
start_time = time.time()
ss_list = np.random.randn(100000)
s = 0
for ss in ss_list:
    s += ss ** 2
end_time = time.time()
print('Time cost is: {}s'.format(end_time - start_time))
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')
for stat in top_stats[:5]:
    print(stat)
```

This program uses `time` to measure the execution time and `tracemalloc` to track the program's memory changes. Here `np.random.randn()` generates all 100000 random numbers up front, and they all have to be kept in memory during the calculation, which costs a corresponding amount of memory. With the `yield` method, only one random number is generated per step; and, following the usage from the previous section, the iteratively generated values could still be converted into a complete list if needed:

```python
# yield_square_sum.py
import tracemalloc
import time
import numpy as np
tracemalloc.start()
start_time = time.time()
def ss_list(length):
    for i in range(length):
        yield np.random.random()
s = 0
ss = ss_list(100000)
for i in range(100000):
    s += next(ss) ** 2
end_time = time.time()
print('Time cost is: {}s'.format(end_time - start_time))
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')
for stat in top_stats[:5]:
    print(stat)
```

The results of the two programs are shown below so they can be compared side by side:

```
[dechin@dechin-manjaro yield]$ python3 square_sum.py
Time cost is: 0.24723434448242188s
square_sum.py:9: size=781 KiB, count=2, average=391 KiB
square_sum.py:12: size=24 B, count=1, average=24 B
square_sum.py:11: size=24 B, count=1, average=24 B
[dechin@dechin-manjaro yield]$ python3 yield_square_sum.py
Time cost is: 0.23023390769958496s
yield_square_sum.py:9: size=136 B, count=1, average=136 B
yield_square_sum.py:14: size=112 B, count=1, average=112 B
yield_square_sum.py:11: size=79 B, count=2, average=40 B
yield_square_sum.py:10: size=76 B, count=2, average=38 B
yield_square_sum.py:15: size=28 B, count=1, average=28 B
```

The comparison shows that the two methods take almost the same time, but in memory footprint `yield` has a clear advantage. This example is admittedly not perfect, but the focus of this article is how to use `yield` and its application scenarios.
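The memory difference can also be observed directly with `sys.getsizeof`: a list stores references to all of its elements, while a generator object only stores its paused frame state, so its size does not grow with `length`. A minimal check (exact byte counts vary between Python versions):

```python
import sys

def square_number(length):
    # eager version: builds the full list in memory
    return [i ** 2 for i in range(length)]

def square_number_yield(length):
    # lazy version: produces one square at a time
    for i in range(length):
        yield i ** 2

print(sys.getsizeof(square_number(100000)))        # grows with length (hundreds of KiB here)
print(sys.getsizeof(square_number_yield(100000)))  # small and constant, whatever the length
```

Note that `getsizeof` on the list does not even count the integer objects themselves, so the real gap is larger still.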

# Infinite iterators

One use mentioned in reference link 1 is infinite iterators, for example returning all prime numbers in order. Using `return` to compute and store all the elements in a list would be extremely uneconomical (indeed impossible for an infinite sequence), so `yield` is used to generate them iteratively. The source code from reference link 1 is as follows:

```python
def get_primes(number):
    while True:
        if is_prime(number):
            yield number
        number += 1
```
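The snippet above assumes an `is_prime` helper that the reference does not show. A self-contained version might look like this, with a simple trial-division `is_prime` filled in as an assumption:

```python
def is_prime(n):
    # trial division; sufficient for a small demonstration
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True

def get_primes(number):
    # infinite iterator: yields primes >= number, one at a time
    while True:
        if is_prime(number):
            yield number
        number += 1

primes = get_primes(2)
print([next(primes) for _ in range(5)])  # [2, 3, 5, 7, 11]
```

The generator never computes further ahead than the caller asks, which is what makes an "infinite" sequence practical.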

Similarly, using `while True` we can show a simple case of our own: returning all even numbers:

```python
# yield_iter.py
def yield_range2(i):
    while True:
        yield i
        i += 2

gen = yield_range2(0)  # named gen to avoid shadowing the built-in iter()
for i in range(10):
    print(next(gen))
```

Because we limit the loop to 10 iterations, the program ends up printing the first 10 even numbers:

```
[dechin@dechin-manjaro yield]$ python3 yield_iter.py
0
2
4
6
8
10
12
14
16
18
```
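Rather than counting `next()` calls by hand, the standard library's `itertools.islice` is the idiomatic way to take a finite slice of an infinite generator such as `yield_range2`:

```python
from itertools import islice

def yield_range2(i):
    # infinite even-number generator from the text
    while True:
        yield i
        i += 2

print(list(islice(yield_range2(0), 10)))  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

`islice` stops requesting values after the tenth, so the infinite loop inside the generator is never a problem.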

# Summary

This article introduced the Python generator keyword `yield`, which can be loosely understood as a `return` that hands back a single element at a time. Besides a first look at the usage syntax of `yield`, we also got a general sense of its advantage: during the calculation only one element occupies memory at any moment, instead of a large number of elements being stored in memory the whole time.
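As a closing remark, for simple cases like the squares above, Python also offers generator expressions: an inline form of a `yield`-based function with the same one-element-at-a-time behavior:

```python
# generator expression: written like a list comprehension, but lazy
squares = (i ** 2 for i in range(10))
print(sum(squares))  # 285

# the 100000-element sum of squares, without materializing any array
print(sum(i ** 2 for i in range(100000)))
```

Swapping the parentheses for square brackets would build the whole list first, giving up exactly the memory advantage discussed in this article.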