
Batch-crawling comments on all Weibo posts published by a user

The moon is small and the water is long, 2021-11-25 20:33:05


This is the 95th original technical article from The moon is small and the water is long.


The CSV files saved by the Weibo topic crawler or the Weibo user crawler contain Weibo post IDs. Batch comment crawling reads these IDs and fetches the comments on each post.
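As a rough illustration of that idea, the sketch below pulls the unique post IDs out of a saved CSV using only the standard library. The column name 微博id (post ID) is an assumption based on the user-crawler output, and skipping repeated header rows mirrors what the batch script later in this article does; extract_post_ids is a hypothetical helper, not part of the crawler.

```python
import csv
import io
from typing import Iterable, List


def extract_post_ids(lines: Iterable[str], id_column: str = "微博id") -> List[str]:
    """Return the unique post IDs in first-seen order.

    Assumes the first row is a header containing `id_column`. Empty ID cells
    and rows whose ID equals the column name (a header repeated when the
    crawler appended to an existing file) are skipped.
    """
    reader = csv.DictReader(lines)
    seen = set()
    ids = []
    for row in reader:
        mid = (row.get(id_column) or "").strip()
        if not mid or mid == id_column or mid in seen:
            continue
        seen.add(mid)
        ids.append(mid)
    return ids


if __name__ == "__main__":
    # Made-up sample rows, including a duplicate and a repeated header
    sample = "微博id,微博正文\nKCAqH0IpS,a\n微博id,微博正文\nKCAqH0IpS,a\nABCdef123,b\n"
    print(extract_post_ids(io.StringIO(sample)))  # ['KCAqH0IpS', 'ABCdef123']
```

With a real file, open it with newline='' and encoding='utf-8-sig' so that the byte-order mark written by the script below does not end up glued to the first column name.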

Batch-crawling comments on topic posts from the topic crawler's results has already been explained in detail in the Bilibili video tutorial "Crawling a user's Weibo posts and batch-crawling their comments". This time we use the results saved by the user crawler to batch-crawl and save the comments on a user's posts.

If you don't yet know how to crawl a specific user's posts, see the article "A crawler that crawls all of a user's Weibo posts and can resume after a network interruption". The step-by-step process is shown in the accompanying Bilibili video tutorial.

Suppose the previous step crawled the posts of singer Li Jian (user ID 1744395855) and saved them to a CSV file.

From the earlier article on the Weibo comment crawler, "2021 new Weibo comment and sub-comment crawler released", we know that the comment crawler needs a configuration file like the following:

```json
{
  "cookie": "Replace it with your cookie",
  "comments": [
    {
      "mid": "KCAqH0IpS",
      "uid": "1744395855",
      "limit": 10000,
      "desc": "Repost reason: This song records that summer, and will also begin to record your days ahead. May you always keep that innate inner strength."
    }
  ]
}
```

uid is Li Jian's user ID, mid is the ID of one of the posts Li Jian published, limit is the maximum number of comments to crawl for that post (default 10,000), and desc describes the post, defaulting here to the post text. Of the four, mid and uid are the ones that really matter.
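To catch mistakes before starting a long crawl, one could check each entry against the fields just described. This is a minimal sketch under stated assumptions: the field names come from the example configuration, but the type rules (non-empty string mid and uid, positive integer limit, string desc) and the function name check_comment_entry are hypothetical; the article does not say that the comment crawler enforces them.

```python
from typing import Any, Dict, List


def check_comment_entry(entry: Dict[str, Any]) -> List[str]:
    """Return the problems found in one "comments" entry (empty list if none).

    Hypothetical rules inferred from the sample config: mid and uid are
    non-empty strings, limit is a positive int, desc is a string.
    """
    problems = []
    for key in ("mid", "uid"):
        value = entry.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{key} must be a non-empty string")
    limit = entry.get("limit", 10000)  # the article gives 10,000 as the default
    # bool is a subclass of int, so reject it explicitly
    if isinstance(limit, bool) or not isinstance(limit, int) or limit <= 0:
        problems.append("limit must be a positive integer")
    if not isinstance(entry.get("desc", ""), str):
        problems.append("desc must be a string")
    return problems


if __name__ == "__main__":
    good = {"mid": "KCAqH0IpS", "uid": "1744395855", "limit": 10000, "desc": "..."}
    print(check_comment_entry(good))  # []
    print(check_comment_entry({"mid": "", "uid": 1744395855, "limit": 0}))
```

A check like this would, for example, flag a desc that pandas read as NaN from an empty CSV cell, since NaN is a float rather than a string.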

The example configuration only crawls the comments on a single post (mid=KCAqH0IpS). To batch-crawl, append one dictionary containing the four fields mid, uid, limit and desc to the comments list for every post. For the topic crawler's results there was already a script that generates this configuration automatically, and there is one for batch-crawling comments on a user's posts as well:

```python
# -*- coding: utf-8 -*-
# author: inspurer (The moon is small and the water is long)
# create_time: 2021/10/17 10:31
# Requires Python 3.6+
# github https://github.com/inspurer
# WeChat official account: The moon is small and the water is long
import json

import pandas as pd

limit = 10000  # maximum number of comments to crawl per post
config_path = 'mac_comment_config.json'  # existing comment-crawler config (holds your cookie)
# CSV saved by the user crawler, named <uid>_<nickname>.csv
input_file = './1744395855_歌手李健.csv'

# Recover the user ID and nickname from the file name
if '/' in input_file:
    user_id = input_file[input_file.rindex('/') + 1:input_file.rindex('_')]
else:
    user_id = input_file[:input_file.rindex('_')]
user_name = input_file[input_file.rindex('_') + 1:input_file.rindex('.')]


def drop_duplicate(path, col_index=0):
    df = pd.read_csv(path)
    first_column = df.columns.tolist()[col_index]
    # Remove duplicate rows, keyed on the first column (the post ID)
    df.drop_duplicates(keep='first', inplace=True, subset=[first_column])
    # Remove any header row repeated inside the data
    df = df[~df[first_column].isin([first_column])]
    df.to_csv(path, encoding='utf-8-sig', index=False)


drop_duplicate(input_file)

with open(config_path, 'r', encoding='utf-8-sig') as f:
    config_json = json.loads(f.read())

df = pd.read_csv(input_file, encoding='utf-8-sig')
# Clear the existing comment entries; comment this line out to keep them
config_json['comments'].clear()
for index, row in df.iterrows():
    print(f'{index + 1}/{df.shape[0]}')
    # Column names must match the header written by the user crawler:
    # '微博id' is the post ID, '微博正文' is the post text
    text = row['微博正文']
    config_json['comments'].append({
        'mid': str(row['微博id']),
        'uid': user_id,
        'limit': limit,
        # An empty text cell is read as NaN, which json.dumps would emit as non-standard JSON
        'desc': '' if pd.isna(text) else str(text),
    })

with open(config_path, 'w', encoding='utf-8-sig') as f:
    f.write(json.dumps(config_json, indent=2, ensure_ascii=False))
```
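The script recovers the user ID and nickname by slicing the file name around the last underscore. A slightly sturdier variant, sketched below with pathlib, splits the file stem at the first underscore instead, so a nickname that itself contains underscores stays intact. It assumes the user crawler names its files <uid>_<nickname>.csv with a purely numeric uid, which is inferred from the example path rather than documented; parse_user_csv_name is a hypothetical helper.

```python
from pathlib import Path
from typing import Tuple


def parse_user_csv_name(path: str) -> Tuple[str, str]:
    """Split '<uid>_<nickname>.csv' into (uid, nickname).

    Assumes the uid is all digits and appears before the first underscore.
    """
    stem = Path(path).stem  # file name without directory and final suffix
    uid, sep, nickname = stem.partition("_")
    if not sep or not uid.isdigit():
        raise ValueError(f"unexpected file name: {path!r}")
    return uid, nickname


if __name__ == "__main__":
    print(parse_user_csv_name("./1744395855_歌手李健.csv"))  # ('1744395855', '歌手李健')
    print(parse_user_csv_name("data/123_a_b.csv"))  # ('123', 'a_b')
```

Raising ValueError on an unexpected name stops the script early instead of silently writing a wrong uid into every config entry.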

Then run the comment crawler. For the full step-by-step process, see the accompanying Bilibili video tutorial.

This article is shared from the WeChat official account The moon is small and the water is long (inspurer); author: BuyiXiao.


Originally published: 2021-11-01


Copyright statement: this article is by The moon is small and the water is long. Please include a link to the original when reprinting.
