

How to grab red envelopes scientifically: strike it rich at year's end by writing a red-envelope-grabbing program

Posted on 2/13/2015 10:44:10 PM
0×00 Background

What are red envelopes like? To borrow the old snow anecdote: the nephew Hu'er offered, "Scattering money in the air might compare," and the niece Daoyun replied, "Not as good as catkins rising on the wind." Everyone knows the background: it's Chinese New Year, the days when red envelopes fly all over the sky. It just so happens that I picked up Python two days ago and was feeling ambitious, so I dug into grabbing Weibo red envelopes. Why Weibo red envelopes rather than Alipay ones? Because I only understand the Web; if I have the energy later, I may also study the whack-a-mole algorithm.
Since I'm a beginner at Python — this is only the third program I've written since learning the language — please don't call me out in person over any pits in the code; the focus is on the idea. And if there are pits in the idea, please don't call me out in person either: IE has the face to set itself as the default browser, so my writing a scrappy article is also acceptable......
I'm using Python 2.7, and Python 2 and Python 3 are said to differ considerably.
0×01 Ideas
I was too lazy to describe it in words, so I drew a sketch, and everyone should be able to understand it.
First, the usual routine: import a pile of libraries — some you may not know what they're for, but you can't do without them:
[mw_shl_code=python,true]import re
import urllib
import urllib2
import cookielib
import base64
import binascii
import os
import json
import sys
import cPickle as p
import rsa[/mw_shl_code]

Then declare some other variables that will be needed later:

[mw_shl_code=python,true]reload(sys)
sys.setdefaultencoding('utf-8')  # set the default string encoding to utf-8
luckyList = []  # list of red envelopes
lowest = 10  # lowest acceptable top prize in an envelope's claim record[/mw_shl_code]

The rsa library used here is not included with Python by default and needs to be installed: https://pypi.python.org/pypi/rsa/

After downloading it, run setup.py install, and then we can start development.
0×02 Weibo login
Grabbing a red envelope can only happen after logging in, so there has to be a login routine. The login itself is not the key part; the key is preserving cookies, and for that we need cookielib:
[mw_shl_code=python,true]cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
urllib2.install_opener(opener)[/mw_shl_code]

This way, every network operation that goes through opener will carry and update the cookie state. I don't fully understand the internals, but it feels like magic.
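For anyone following along on Python 3, cookielib became http.cookiejar and urllib2 became urllib.request. Here is a minimal sketch of the same pattern — the FakeResponse class and the SESSION cookie are invented purely so the jar can be exercised without touching the network:

```python
import urllib.request
import http.cookiejar
from email.parser import Parser
from http.client import HTTPMessage

# Python 3 spelling of the cookielib pattern: every request sent through
# `opener` reads and writes the same CookieJar, so login state persists.
cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))

# To show the jar at work offline, hand it a fabricated response that
# carries a Set-Cookie header (cookie name and value are made up).
class FakeResponse:
    def __init__(self, raw_headers):
        self._msg = Parser(_class=HTTPMessage).parsestr(raw_headers)
    def info(self):  # CookieJar only needs .info() on the response object
        return self._msg

req = urllib.request.Request("http://example.com/")
resp = FakeResponse("Set-Cookie: SESSION=abc123; Path=/\n\n")
cj.extract_cookies(resp, req)
print(sorted(c.name for c in cj))  # ['SESSION']
```

After this, any request made through `opener` to the same host would automatically send the captured cookie back — which is exactly why the article funnels all traffic through one opener.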
Next, we need to encapsulate two modules, one is the data acquisition module, which is used to simply GET data, and the other is used to POST data.
[mw_shl_code=python,true]def getData(url):
    try:
        req = urllib2.Request(url)
        result = opener.open(req)
        text = result.read()
        text = text.decode("utf-8").encode("gbk", 'ignore')
        return text
    except Exception, e:
        print u'request exception, url: ' + url
        print e

def postData(url, data, header):
    try:
        data = urllib.urlencode(data)
        req = urllib2.Request(url, data, header)
        result = opener.open(req)
        text = result.read()
        return text
    except Exception, e:
        print u'request exception, url: ' + url[/mw_shl_code]

With these two helpers we can GET and POST data. The reason getData decodes and then re-encodes is that under Win7 my debug output was always garbled, so I added some encoding handling — that isn't the point. The login function below is the core of Weibo login.
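On Python 3 the same two helpers collapse onto urllib.request and urllib.parse. A rough sketch follows; the data: URL in the demo is just a stand-in so the GET path can be exercised offline (data: URLs are handled by urllib's built-in DataHandler):

```python
import urllib.parse
import urllib.request

def get_data(url):
    """GET a URL and return its body decoded as UTF-8."""
    try:
        with urllib.request.urlopen(url) as result:
            return result.read().decode("utf-8", "ignore")
    except Exception as e:
        print("request exception, url: " + url)
        print(e)

def post_data(url, data, header):
    """POST form-encoded data with custom headers, return the body."""
    try:
        body = urllib.parse.urlencode(data).encode("utf-8")
        req = urllib.request.Request(url, body, header)
        with urllib.request.urlopen(req) as result:
            return result.read().decode("utf-8", "ignore")
    except Exception as e:
        print("request exception, url: " + url)
        print(e)

# Exercise the GET helper without a server:
print(get_data("data:text/plain;charset=utf-8,hello"))  # hello
```

Note that in Python 3 the POST body must be bytes, hence the extra `.encode("utf-8")` after urlencode — a frequent stumbling block when porting Python 2 urllib code.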
[mw_shl_code=python,true]def login(nick, pwd):
    print u"----------login----------"
    print "----------......----------"
    prelogin_url = 'http://login.sina.com.cn/sso/prelogin.php?entry=weibo&callback=sinaSSOController.preloginCallBack&su=%s&rsakt=mod&checkpin=1&client=ssologin.js(v1.4.15)&_=1400822309846' % nick
    preLogin = getData(prelogin_url)
    servertime = re.findall('"servertime":(.+?),', preLogin)[0]
    pubkey = re.findall('"pubkey":"(.+?)",', preLogin)[0]
    rsakv = re.findall('"rsakv":"(.+?)",', preLogin)[0]
    nonce = re.findall('"nonce":"(.+?)",', preLogin)[0]
    su = base64.b64encode(urllib.quote(nick))
    rsaPublickey = int(pubkey, 16)
    key = rsa.PublicKey(rsaPublickey, 65537)
    message = str(servertime) + '\t' + str(nonce) + '\n' + str(pwd)
    sp = binascii.b2a_hex(rsa.encrypt(message, key))
    header = {'User-Agent': 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)'}
    param = {
        'entry': 'weibo',
        'gateway': '1',
        'from': '',
        'savestate': '7',
        'userticket': '1',
        'ssosimplelogin': '1',
        'vsnf': '1',
        'vsnval': '',
        'su': su,
        'service': 'miniblog',
        'servertime': servertime,
        'nonce': nonce,
        'pwencode': 'rsa2',
        'sp': sp,
        'encoding': 'UTF-8',
        'url': 'http://weibo.com/ajaxlogin.php?framelogin=1&callback=parent.sinaSSOController.feedBackUrlCallBack',
        'returntype': 'META',
        'rsakv': rsakv,
    }
    s = postData('http://login.sina.com.cn/sso/login.php?client=ssologin.js(v1.4.15)', param, header)
    try:
        urll = re.findall("location\.replace\(\'(.+?)\'\);", s)[0]
        login = getData(urll)
        print u"---------login successful!---------"
        print "----------......----------"
    except Exception, e:
        print u"---------login failed!---------"
        print "----------......----------"
        exit(0)[/mw_shl_code]

The parameters and the encryption steps here are copied from the Internet and I don't fully understand them. Roughly: first request a timestamp and a public key, RSA-encrypt the password material, then submit everything to the Sina login interface. After Sina reports success it returns a Weibo URL, which must also be requested so the login state fully takes effect; from then on, subsequent requests carry the current user's cookie.
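To make the first step above concrete: the prelogin endpoint answers with a JSONP wrapper around a JSON object, and the four values can be pulled out in one go instead of with four separate regexes. A sketch in Python 3 — the payload below is a fabricated sample merely shaped like the real response, not live data:

```python
import json
import re

# Fabricated sample of the prelogin JSONP response; a real response carries
# live values for servertime, nonce, pubkey and rsakv.
prelogin = ('sinaSSOController.preloginCallBack({"retcode":0,'
            '"servertime":1423800000,"pcid":"xx","nonce":"ABC123",'
            '"pubkey":"10001abc","rsakv":"1330428213","exectime":1})')

# Strip the callback wrapper once, then let the json module parse the rest.
payload = json.loads(re.search(r'\((.*)\)', prelogin).group(1))

servertime = payload["servertime"]
nonce = payload["nonce"]
print(servertime, nonce)  # 1423800000 ABC123

# The string that login() then RSA-encrypts is assembled like this:
message = "%s\t%s\n%s" % (servertime, nonce, "my-password")
```

The tab/newline layout of `message` matches what the article's login() builds before calling rsa.encrypt; only the parsing strategy differs.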
0×03 Drawing a specified red envelope
After successfully logging in to Weibo, I couldn't wait to find a red envelope to try — in the browser first, of course. I eventually found a page with a red envelope button, pressed F12 to summon the debugger, and looked at what the request packet contained.

You can see that the request goes to http://huodong.weibo.com/aj_hongbao/getlucky with two main parameters: ouid, the red envelope id, which is visible in the URL, and share, which determines whether the draw is shared to Weibo. There is also a _t whose purpose I don't know.
Okay — in theory you can now draw a red envelope by submitting these three parameters to that URL, but when you actually submit them, the server magically returns a string like this:
[mw_shl_code=python,true]{"code":303403,"msg":"Sorry, you don't have permission to access this page","data":[]}[/mw_shl_code]

Don't panic. My years of web development experience say the other side's programmer is checking the Referer. The fix is simple: copy over all the headers from a real browser request.
[mw_shl_code=python,true]def getLucky(id):  # draw a red envelope
    print u"---draw red envelope from: " + str(id) + "---"
    print "----------......----------"
    if checkValue(id) == False:  # doesn't meet the criteria; checkValue comes later
        return
    luckyUrl = "http://huodong.weibo.com/aj_hongbao/getlucky"
    param = {
        'ouid': id,
        'share': 0,
        '_t': 0
    }
    header = {
        'Cache-Control': 'no-cache',
        'Content-Type': 'application/x-www-form-urlencoded',
        'Origin': 'http://huodong.weibo.com',
        'Pragma': 'no-cache',
        'Referer': 'http://huodong.weibo.com/hongbao/' + str(id),
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.146 BIDUBrowser/6.x Safari/537.36',
        'X-Requested-With': 'XMLHttpRequest'
    }
    res = postData(luckyUrl, param, header)[/mw_shl_code]

With this there is no problem in theory, and in practice there isn't either. After the draw completes we need to check the status: the returned res is a JSON string in which code 100000 means success and 90114 means today's draw limit has been reached; any other value is also a failure. So:
[mw_shl_code=python,true]hbRes = json.loads(res)
if hbRes["code"] == '90114':  # today's draws are used up
    print u"---------daily limit reached---------"
    print "----------......----------"
    log('lucky', str(id) + '---' + str(hbRes["code"]) + '---' + hbRes["data"]["title"])
    exit(0)
elif hbRes["code"] == '100000':  # success
    print u"---------wishing you prosperity---------"
    print "----------......----------"
    log('success', str(id) + '---' + res)
    exit(0)

if hbRes["data"] and hbRes["data"]["title"]:
    print hbRes["data"]["title"]
    print "----------......----------"
    log('lucky', str(id) + '---' + str(hbRes["code"]) + '---' + hbRes["data"]["title"])
else:
    print u"---------request error---------"
    print "----------......----------"
    log('lucky', str(id) + '---' + res)[/mw_shl_code]

Here log is a small function I wrote to record logs:
[mw_shl_code=python,true]def log(type, text):
    fp = open(type + '.txt', 'a')
    fp.write(text)
    fp.write('\r\n')
    fp.close()[/mw_shl_code]
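The status branching above can be isolated into a tiny helper and exercised against hand-written sample payloads. The JSON strings below are fabricated, merely shaped like the responses discussed; codes 100000 and 90114 are the ones the article names:

```python
import json

def classify(res):
    """Map a getlucky JSON response to a short status string."""
    hb = json.loads(res)
    code = int(hb["code"])
    if code == 100000:
        return "win"      # got a prize
    if code == 90114:
        return "limit"    # today's draw quota is used up
    # many failures still carry a human-readable title in data
    if isinstance(hb.get("data"), dict) and hb["data"].get("title"):
        return "miss: " + hb["data"]["title"]
    return "error"

# Fabricated sample payloads for each branch:
print(classify('{"code":100000,"msg":"","data":[]}'))                        # win
print(classify('{"code":90114,"msg":"","data":{"title":"limit reached"}}'))  # limit
print(classify('{"code":303403,"msg":"no permission","data":[]}'))           # error
```

Coercing the code through int() sidesteps the string-vs-number ambiguity that bites the article's code, where the JSON number 303403 would never equal the string '90114'.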





OP | Posted on 2/13/2015 10:46:37 PM
0×04 Crawl the list of red envelopes
With a single envelope draw tested successfully, we come to the core module of the program: crawling the red envelope list. There are probably many ways in — various Weibo keyword searches and so on — but here I use the simplest: crawling the official red envelope activity pages.
Starting from the activity homepage (http://huodong.weibo.com/hongbao) and clicking through the various "more" links, you can observe that although there are a lot of lists, they boil down to two kinds (not counting the richest-envelope list): themed lists and leaderboards.
Keep F12 open and analyse the format of the two page types, starting with a themed list, for example: http://huodong.weibo.com/hongbao/special_quyu
You can see that each envelope's information lives in a div named info_wrap, so we only need to fetch the page source, grab all the info_wrap blocks, and do a little processing to get the envelope list for this page. That calls for some regular expressions:
[mw_shl_code=python,true]def getThemeList(url, p):  # themed red envelope lists
    print u"---------page " + str(p) + "---------"
    print "----------......----------"
    html = getData(url + '?p=' + str(p))
    pWrap = re.compile(r'<div class="info_wrap">(.+?)<span class="rob_txt"></span>', re.DOTALL)  # regex for the info_wrap blocks
    pInfo = re.compile(r'.+<em class="num">(.+)</em>.+<em class="num">(.+)</em>.+<em class="num">(.+).</em>.+href="(.+)" class="btn"', re.DOTALL)  # regex for an envelope's details
    List = pWrap.findall(html)
    n = len(List)
    if n == 0:
        return
    for i in range(n):  # walk every info_wrap div
        s = pInfo.match(List[i])  # pull out the envelope details
        info = list(s.groups(0))
        info[0] = float(info[0].replace('\xcd\xf2', '0000'))  # cash; '\xcd\xf2' is GBK for wan (x10,000)
        try:
            info[1] = float(info[1].replace('\xcd\xf2', '0000'))  # gift value
        except Exception, e:
            info[1] = float(info[1].replace('\xd2\xda', '00000000'))  # gift value; '\xd2\xda' is GBK for yi (x100,000,000)
        info[2] = float(info[2].replace('\xcd\xf2', '0000'))  # already sent
        if info[2] == 0:
            info[2] = 1  # avoid dividing by zero
        if info[1] == 0:
            info[1] = 1  # avoid dividing by zero
        info.append(info[0] / (info[2] + info[1]))  # envelope value = cash / (recipients + gift value)
        luckyList.append(info)
    if 'class="page"' in html:  # a next page exists
        p = p + 1
        getThemeList(url, p)  # recurse to crawl the next page[/mw_shl_code]

It sounds harder than it is — though it isn't exactly simple either; those two regular expressions alone took me a long while to write.
There is an extra element appended here — info[4] — which is my rough algorithm for judging an envelope's worth. Why bother? Because there are far more envelopes than our four daily draws, so in this vast sea of envelopes we must find the most valuable ones and draw those. Three figures are available for reference: cash amount, gift value, and number of recipients. Clearly, if the cash is small and either the crowd is huge or the gift value is absurdly inflated (some are even quoted in units of hundreds of millions), the envelope isn't worth grabbing. After holding it in for a long while, I finally arrived at a formula for weighing an envelope: envelope value = cash / (number of recipients + gift value).
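The formula can be sanity-checked on a few hypothetical envelopes (all numbers invented): high cash with few recipients should rank first, while a tiny cash pool paired with a huge crowd or an inflated gift value should sink to the bottom.

```python
# Hypothetical envelopes (all numbers invented): cash, gift value, recipients.
envelopes = {
    "big cash, few takers":   (50000.0, 1.0, 200.0),
    "small cash, huge crowd": (1000.0, 1.0, 90000.0),
    "inflated gift value":    (20000.0, 100000000.0, 500.0),
}

def value(cash, gift, recipients):
    # envelope value = cash / (recipients + gift value), guarding the
    # divisor against zeros the way the article's code does
    return cash / ((recipients or 1) + (gift or 1))

ranked = sorted(envelopes, key=lambda k: value(*envelopes[k]), reverse=True)
print(ranked[0])   # big cash, few takers
print(ranked[-1])  # inflated gift value
```

Adding recipients and gift value in one denominator mixes units (people and yuan), but as a cheap heuristic it does punish both overcrowded envelopes and ones padded with inflated prize valuations.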
The leaderboard pages work on the same principle: find the key tags and match them with regular expressions.
[mw_shl_code=python,true]def getTopList(url, daily, p):  # leaderboard red envelopes
    print u"---------page " + str(p) + "---------"
    print "----------......----------"
    html = getData(url + '?daily=' + str(daily) + '&p=' + str(p))
    pWrap = re.compile(r'<div class="list_info">(.+?)<span class="list_btn"></span>', re.DOTALL)  # regex for the list_info blocks
    pInfo = re.compile(r'.+<em class="num">(.+)</em>.+<em class="num">(.+)</em>.+<em class="num">(.+).</em>.+href="(.+)" class="btn rob_btn"', re.DOTALL)  # regex for an envelope's details
    List = pWrap.findall(html)
    n = len(List)
    if n == 0:
        return
    for i in range(n):  # walk every list_info div
        s = pInfo.match(List[i])  # pull out the envelope details
        topinfo = list(s.groups(0))
        info = list(topinfo)
        info[0] = topinfo[1].replace('\xd4\xaa', '')  # strip the GBK yuan character
        info[0] = float(info[0].replace('\xcd\xf2', '0000'))  # cash; wan -> x10,000
        info[1] = topinfo[2].replace('\xd4\xaa', '')
        try:
            info[1] = float(info[1].replace('\xcd\xf2', '0000'))  # gift value
        except Exception, e:
            info[1] = float(info[1].replace('\xd2\xda', '00000000'))  # gift value; yi -> x100,000,000
        info[2] = topinfo[0].replace('\xb8\xf6', '')  # strip the GBK counter character ge
        info[2] = float(info[2].replace('\xcd\xf2', '0000'))  # already sent
        if info[2] == 0:
            info[2] = 1  # avoid dividing by zero
        if info[1] == 0:
            info[1] = 1  # avoid dividing by zero
        info.append(info[0] / (info[2] + info[1]))  # envelope value = cash / (recipients + gift value)
        luckyList.append(info)
    if 'class="page"' in html:  # a next page exists
        p = p + 1
        getTopList(url, daily, p)  # recurse to crawl the next page[/mw_shl_code]

Good — now we can crawl both kinds of special pages. The next step is the list of lists, that is, the collection of all these list addresses, which we then crawl one by one:
[mw_shl_code=python,true]def getList():
    print u"---------finding targets---------"
    print "----------......----------"
    themeUrl = {  # themed lists
        'theme': 'http://huodong.weibo.com/hongbao/theme',
        'pinpai': 'http://huodong.weibo.com/hongbao/special_pinpai',
        'daka': 'http://huodong.weibo.com/hongbao/special_daka',
        'youxuan': 'http://huodong.weibo.com/hongbao/special_youxuan',
        'qiye': 'http://huodong.weibo.com/hongbao/special_qiye',
        'quyu': 'http://huodong.weibo.com/hongbao/special_quyu',
        'meiti': 'http://huodong.weibo.com/hongbao/special_meiti',
        'hezuo': 'http://huodong.weibo.com/hongbao/special_hezuo'
    }
    topUrl = {  # leaderboard lists
        'mostmoney': 'http://huodong.weibo.com/hongbao/top_mostmoney',
        'mostsend': 'http://huodong.weibo.com/hongbao/top_mostsend',
        'mostsenddaka': 'http://huodong.weibo.com/hongbao/top_mostsenddaka',
        'mostsendpartner': 'http://huodong.weibo.com/hongbao/top_mostsendpartner',
        'cate': 'http://huodong.weibo.com/hongbao/cate?type=',
        'clothes': 'http://huodong.weibo.com/hongbao/cate?type=clothes',
        'beauty': 'http://huodong.weibo.com/hongbao/cate?type=beauty',
        'fast': 'http://huodong.weibo.com/hongbao/cate?type=fast',
        'life': 'http://huodong.weibo.com/hongbao/cate?type=life',
        'digital': 'http://huodong.weibo.com/hongbao/cate?type=digital',
        'other': 'http://huodong.weibo.com/hongbao/cate?type=other'
    }
    for (theme, url) in themeUrl.items():
        print "----------" + theme + "----------"
        print url
        print "----------......----------"
        getThemeList(url, 1)
    for (top, url) in topUrl.items():
        print "----------" + top + "----------"
        print url
        print "----------......----------"
        getTopList(url, 0, 1)
        getTopList(url, 1, 1)[/mw_shl_code]
OP | Posted on 2/13/2015 10:47:13 PM
0×05 Judge the availability of red envelopes
This part is relatively simple: first search the page source for the keyword marking the grab button, then look at the claim leaderboard to see the highest recorded prize — if the best anyone got was only a few yuan, goodbye......
The address to view the collection record is http://huodong.weibo.com/aj_hongbao/detailmore?page=1&type=2&_t=0&__rnd=1423744829265&uid=Red Envelope ID

[mw_shl_code=python,true]def checkValue(id):
    infoUrl = 'http://huodong.weibo.com/hongbao/' + str(id)
    html = getData(infoUrl)
    if 'action-type="lottery"' in html:  # the grab button exists
        logUrl = "http://huodong.weibo.com/aj_hongbao/detailmore?page=1&type=2&_t=0&__rnd=1423744829265&uid=" + id  # claim-record data
        param = {}
        header = {
            'Cache-Control': 'no-cache',
            'Content-Type': 'application/x-www-form-urlencoded',
            'Pragma': 'no-cache',
            'Referer': 'http://huodong.weibo.com/hongbao/detail?uid=' + str(id),
            'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.146 BIDUBrowser/6.x Safari/537.36',
            'X-Requested-With': 'XMLHttpRequest'
        }
        res = postData(logUrl, param, header)
        pMoney = re.compile(r'<span class="money">(\d+?.+?)\xd4\xaa</span>', re.DOTALL)  # regex for the claimed amounts; '\xd4\xaa' is GBK for yuan
        luckyLog = pMoney.findall(res)
        if len(luckyLog) == 0:
            maxMoney = 0
        else:
            maxMoney = float(luckyLog[0])
        if maxMoney < lowest:  # the biggest recorded prize is below our threshold
            return False
    else:
        print u"---------too slow, it's gone---------"
        print "----------......----------"
        return False
    return True[/mw_shl_code]

0×06 Finishing work
The main modules are already in place, and now all the steps need to be connected in series:
[mw_shl_code=python,true]def start(username, password, low, fromFile):
    global lowest  # update the module-level threshold used by checkValue
    gl = False
    lowest = low
    login(username, password)
    if fromFile == 'y':
        if os.path.exists('luckyList.txt'):
            try:
                f = file('luckyList.txt')
                newList = p.load(f)
                print u'---------list loaded---------'
                print "----------......----------"
            except Exception, e:
                print u'failed to parse the local list; crawling the online pages.'
                print "----------......----------"
                gl = True
        else:
            print u'no local luckyList.txt; crawling the online pages.'
            print "----------......----------"
            gl = True
    else:
        gl = True
    if gl == True:
        getList()
        from operator import itemgetter
        newList = sorted(luckyList, key=itemgetter(4), reverse=True)
        f = file('luckyList.txt', 'w')
        p.dump(newList, f)  # save the crawled list to a file so the next run can skip the crawl
        f.close()
    for lucky in newList:
        if not 'http://huodong.weibo.com' in lucky[3]:  # not a red envelope
            continue
        print lucky[3]
        id = re.findall(r'(\w*[0-9]+)\w*', lucky[3])
        getLucky(id[0])[/mw_shl_code]

Because re-crawling the whole list on every test run is a pain, I added a bit of code that dumps the full list to a file; later runs can read the local list and go straight to grabbing.
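The caching step is easy to exercise in isolation. In Python 3, cPickle is simply pickle, and the file must be opened in binary mode. A sketch with invented list entries, mirroring the sort-by-value-then-dump flow above:

```python
import os
import pickle
import tempfile
from operator import itemgetter

# Invented list entries shaped like the crawler's: cash, gift value,
# recipients, URL, computed value (index 4).
lucky_list = [
    [100.0, 1.0, 50.0, "http://huodong.weibo.com/hongbao/1", 100.0 / 51.0],
    [900.0, 1.0, 10.0, "http://huodong.weibo.com/hongbao/2", 900.0 / 11.0],
]
ranked = sorted(lucky_list, key=itemgetter(4), reverse=True)

path = os.path.join(tempfile.mkdtemp(), "luckyList.txt")
with open(path, "wb") as f:   # pickle requires binary mode in Python 3
    pickle.dump(ranked, f)

with open(path, "rb") as f:   # a later run can reload instead of re-crawling
    reloaded = pickle.load(f)
print(reloaded[0][3])  # the highest-value envelope's URL
```

Pickle is convenient here because the list is written and read by the same script; if the cache ever needed to be inspected by hand, JSON would be the friendlier choice.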
[mw_shl_code=python,true]if __name__ == "__main__":
    print u"------------------Weibo Red Envelope Assistant------------------"
    print "---------------------v0.0.1---------------------"
    print u"-------------by @***----------------"
    print "-------------------------------------------------"
    try:
        uname = raw_input(u"Weibo account: ".decode('utf-8').encode('gbk'))
        pwd = raw_input(u"Weibo password: ".decode('utf-8').encode('gbk'))
        low = int(raw_input(u"only draw envelopes whose top recorded cash exceeds n: ".decode('utf-8').encode('gbk')))
        fromfile = raw_input(u"use the envelope list in luckyList.txt? (y/n): ".decode('utf-8').encode('gbk'))
    except Exception, e:
        print u"parameter error"
        print "----------......----------"
        print e
        exit(0)
    print u"---------program start---------"
    print "----------......----------"
    start(uname, pwd, low, fromfile)
    print u"---------program end---------"
    print "----------......----------"
    os.system('pause')[/mw_shl_code]
0×07 Let's go!

0×08 Summary
The basic crawler skeleton is essentially complete. There is still plenty of room to play with the details — batch login support, say, or a better envelope-value algorithm — and the code itself surely has many places worth optimizing; but with my ability, this is about as far as I get.
And you've seen the result: hundreds of lines of code, thousands of words of writing, and all my hard work earned me was a double-color-ball lottery ticket. What a rip-off — how could it be a lottery ticket!! (Narrator: the author grew more and more agitated as he spoke and actually started crying. The people around consoled him: "Brother, it's not that bad — it's only a Weibo red envelope. Yesterday I shook my phone until my hands were sore and didn't get a single WeChat red envelope.")

Posted on 2/14/2015 7:14:33 AM
How much money did the OP grab in the end?
Posted on 3/2/2016 1:48:27 PM
Looks really impressive
Posted on 3/7/2016 12:51:02 PM
Can't tell whether this is for real