Python hands-on project: search your own site's keywords, using proxies to simulate clicks

Posted: 2018-06-12 23:06:47 | Category: Python

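In an earlier post, "Python hands-on project: get a site's Baidu ranking for given keywords as SEO reference data", we looked at where a site ranks. This time, partly to satisfy my vanity and partly because I've heard that clicks can raise a site's keyword rankings, let's simulate clicks. There is a catch: if we simulate clicks directly, the search engine will certainly see our IP. Whether or not this raises the ranking at all, a search engine that notices one IP repeatedly clicking through to our site will surely treat the site as cheating, and the effort backfires. So our Python click simulation should go through proxies to fool the search engine.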

How to simulate a click on a website with Python

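We have already covered this many times: simulating a click on a website with Python just means requesting the site's content once. Take a simulated click on my blog, https://www.xrkzn.cn, as an example. The code can be very simple: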

#encoding=utf8
import requests

url = "https://www.xrkzn.cn"
# A single GET shows up as one visit in the site's access log
requests.get(url, timeout=10)
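If no headers are passed, requests identifies itself with its own default User-Agent string. That string is what the server ends up logging. You can inspect the default locally without sending any request. The exact version number depends on the installed requests release:

```python
import requests

# requests' built-in identity, e.g. "python-requests/2.x.y" (version varies);
# it is sent as the User-Agent whenever you don't supply your own.
ua = requests.utils.default_user_agent()
print(ua)
```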

In my blog's admin panel, I saw the visit record:

We can also see directly that the visit came from python-requests. A simulated click should not reveal this. The fix is to add headers to the request. Straight to the code:

#encoding=utf8
import requests
url = "https://www.xrkzn.cn"
# Replace the default "python-requests/x.y" User-Agent with a browser string
header = {
        'User-agent': 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)',
}
requests.get(url, headers=header)
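You can confirm which User-Agent will actually go out without contacting the site. Let a requests Session prepare the request the same way session.get() does, then inspect the merged headers. The sketch below reuses the URL and header from the code above. Header names are matched case-insensitively, so the 'User-agent' spelling still replaces the default:

```python
import requests

url = "https://www.xrkzn.cn"
header = {'User-agent': 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)'}

session = requests.Session()
# Build the request without sending it; prepare_request merges the session's
# default headers with ours, and our (case-insensitive) key wins.
prepared = session.prepare_request(requests.Request("GET", url, headers=header))

print(prepared.headers["User-Agent"])  # our browser string, not python-requests
print(prepared.headers["Accept"])      # session default, still present
```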

This time, the visitor record became:

Perfect now? Not quite. If the visit supposedly came from a Baidu search, it should carry referrer information. Let's add it:

#encoding=utf8
import requests
url = "https://www.xrkzn.cn"
header = {
        'User-agent': 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)',
        # Referer tells the server which page the visitor came from
        'Referer': 'http://www.baidu.com'
}
requests.get(url, headers=header)

Getting proxy IPs with Python

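Proxy sources fall into two categories: free and paid. I have always used the free route, which means scraping sites that publish proxy IPs. I assumed a paid service would be better and more convenient, so I paid for xx proxy, and it turned out I'd been ripped off... Below is how to get proxy IPs from a free proxy site with Python. Straight to the code: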

# coding: utf-8
import random

import requests
from bs4 import BeautifulSoup

# Browser User-Agent strings; one is picked at random for each request
User_AgentArray = ["Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50",
                   "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50",
                   "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)",
                   "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0)",
                   "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)",
                   "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
                   "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
                   "Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
                   "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Maxthon 2.0)"]

def GetProxies(url):
    """Scrape "ip:port" strings from a free proxy listing page."""
    header = {"User-Agent": random.choice(User_AgentArray)}
    data = requests.get(url, headers=header, timeout=10)
    soup = BeautifulSoup(data.content, "lxml")
    rows = soup.find_all('tr')

    proxies = []
    # Row 0 is the table header; in this site's layout the IP is in
    # column 1 and the port in column 2
    for row in rows[1:]:
        tds = row.find_all('td')
        if len(tds) < 3:  # skip rows that are not proxy entries
            continue
        ip = tds[1].get_text(strip=True)
        port = tds[2].get_text(strip=True)
        proxies.append("%s:%s" % (ip, port))

    return proxies

url = "http://www.xicidaili.com/nt/1"   # Please don't scrape other people's sites abusively
proxies = GetProxies(url)
for item in proxies:
    print(item)
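GetProxies depends on the listing site's column layout: the IP is in the second cell, the port is in the third, and the header row is skipped. The sketch below runs the same parsing logic on a small hard-coded table, so you can check the column indices without hitting the site. The sample HTML and the `parse_proxy_table` name are made up for illustration. It uses BeautifulSoup's built-in "html.parser", so lxml isn't required:

```python
from bs4 import BeautifulSoup

# Made-up sample mimicking a listing with a leading (empty) flag column
SAMPLE_HTML = """
<table>
  <tr><th>Country</th><th>IP</th><th>Port</th><th>Location</th><th>Type</th></tr>
  <tr><td></td><td>192.0.2.1</td><td>8080</td><td>X</td><td>HTTP</td></tr>
  <tr><td></td><td>192.0.2.2</td><td>3128</td><td>Y</td><td>HTTPS</td></tr>
</table>
"""

def parse_proxy_table(html):
    """Return "ip:port" strings from every data row of the table."""
    soup = BeautifulSoup(html, "html.parser")
    proxies = []
    for row in soup.find_all("tr")[1:]:   # skip the header row
        tds = row.find_all("td")
        if len(tds) < 3:                  # ignore rows without enough cells
            continue
        ip = tds[1].get_text(strip=True)
        port = tds[2].get_text(strip=True)
        proxies.append("%s:%s" % (ip, port))
    return proxies

proxies = parse_proxy_table(SAMPLE_HTML)
print(proxies)
```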

The output is shown below:

Now we have plenty of proxy IPs.

Finding usable proxy IPs with Python

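The proxy IPs scraped above from the free site include many that don't work. Before using them, we need another program to check whether each proxy IP is usable. The method: access http://2017.ip138.com/ic.asp through the proxy. If the page correctly shows an IP, we treat that proxy as usable.

The code can be written like this: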

# coding: utf-8
import random

import requests
from bs4 import BeautifulSoup

# Browser User-Agent strings; one is picked at random for each request
User_AgentArray = ["Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50",
                   "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50",
                   "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)",
                   "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0)",
                   "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)",
                   "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
                   "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
                   "Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
                   "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Maxthon 2.0)"]

def GetProxies(url):
    """Scrape "ip:port" strings from a free proxy listing page."""
    header = {"User-Agent": random.choice(User_AgentArray)}
    data = requests.get(url, headers=header, timeout=10)
    soup = BeautifulSoup(data.content, "lxml")
    rows = soup.find_all('tr')

    proxies = []
    # Row 0 is the table header; IP is in column 1, port in column 2
    for row in rows[1:]:
        tds = row.find_all('td')
        if len(tds) < 3:  # skip rows that are not proxy entries
            continue
        ip = tds[1].get_text(strip=True)
        port = tds[2].get_text(strip=True)
        proxies.append("%s:%s" % (ip, port))

    return proxies

url = "http://www.xicidaili.com/nt/1"   # Please don't scrape other people's sites abusively
testUrl = "http://2017.ip138.com/ic.asp"

proxies = GetProxies(url)
for item in proxies:
    # Only the "http" key is needed here because testUrl is plain HTTP
    proxy = {"http": "http://%s" % item}
    try:
        res = requests.get(testUrl, proxies=proxy, timeout=3)
        # The page prints "您的IP是：" ("Your IP is:") when it loads correctly
        if "您的IP是：" in res.content.decode("gb2312"):
            print("Found a usable proxy: %s" % proxy)
    except Exception:
        # Timeouts, connection errors and undecodable pages all mean "unusable"
        pass
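Checking proxies one at a time with a 3-second timeout can take minutes for a long list. One way to speed this up is to run the checks in a thread pool. The sketch below uses only the standard library. `check_proxy` and `filter_usable` are hypothetical names. `check_proxy` keeps the article's ip138 test URL and Chinese marker text as assumptions, and any page that echoes the caller's IP could stand in for it:

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor

TEST_URL = "http://2017.ip138.com/ic.asp"   # assumption: taken from the article
MARKER = "您的IP是："                        # "Your IP is:" as printed by that page

def check_proxy(proxy, timeout=3):
    """Return True if TEST_URL loads through proxy ("ip:port") and contains MARKER."""
    handler = urllib.request.ProxyHandler({"http": "http://%s" % proxy})
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(TEST_URL, timeout=timeout) as res:
            return MARKER in res.read().decode("gb2312", errors="ignore")
    except Exception:
        # Any failure (timeout, refused connection, HTTP error) means unusable
        return False

def filter_usable(proxies, checker=check_proxy, workers=20):
    """Run checker on every proxy in parallel; keep those that pass, in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(checker, proxies))
    return [p for p, ok in zip(proxies, results) if ok]
```

With the scraper from above, usage would be `usable = filter_usable(GetProxies(url))`. The `checker` parameter is there so the filtering logic can be tested with a stub instead of real network calls.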

Running it gives the following output:

This picks the usable proxies out of a pile of unusable ones, ready to be used for simulating website clicks with Python.
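One detail matters when reusing these proxies: requests chooses the proxy by the target URL's scheme. The check above only sets the "http" key, so a request to https://www.xrkzn.cn would bypass the proxy and go out directly. The sketch below builds a dict that covers both schemes. `make_proxies` is a hypothetical helper. The sketch assumes the proxy also accepts HTTPS traffic (via CONNECT), which many free HTTP proxies do not. It calls `select_proxy`, an internal helper in `requests.utils`, only to show which proxy requests would pick:

```python
from requests.utils import select_proxy  # internal helper requests uses to pick a proxy

def make_proxies(item):
    """Build a requests-style proxies dict from an "ip:port" string."""
    address = "http://%s" % item
    # Keys are matched against the *target* URL's scheme; the value says how
    # to reach the proxy itself (plain HTTP in both cases here).
    return {"http": address, "https": address}

only_http = {"http": "http://192.0.2.1:8080"}
both = make_proxies("192.0.2.1:8080")

print(select_proxy("https://www.xrkzn.cn", only_http))  # no match: request goes direct
print(select_proxy("https://www.xrkzn.cn", both))       # the proxy is used
```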
