---
layout: default
title: Companies using Scrapy
permalink: companies/
---

<h2>Companies using Scrapy</h2>

<table class="companieslist">
    <tr>
        <th><a href="http://parsely.com/">Parsely</a></th>
        <td>uses Scrapy to scrape articles from hundreds of news sites. Its CTO talks about Scrapy in <a href="https://speakerdeck.com/amontalenti/web-crawling-and-metadata-extraction-in-python">this talk</a>.</td>
    </tr>
    <tr>
        <th><a href="http://directemployersfoundation.org/">Direct Employers Foundation</a></th>
        <td>uses Scrapy to scrape job postings from many websites, which are published on the <a href="http://my.jobs/">My.jobs site</a>.</td>
    </tr>
    <tr>
        <th><a href="http://www.weotta.com/">Weotta</a></th>
        <td>uses Scrapy to crawl data for post-processing (<a href="http://twitter.com/japerk/status/79304855486865408">tweet</a>)</td>
    </tr>
    <tr>
        <th><a href="http://mydeco.com">Mydeco</a></th>
        <td>uses Scrapy to scrape more than 4,000 furniture retailer sites daily</td>
    </tr>
    <tr>
        <th><a href="http://medialab.sciences-po.fr/">Médialab Sciences Po</a></th>
        <td>in Paris is using Scrapy to develop a web mining tool for social science researchers (<a href="https://groups.google.com/d/topic/scrapy-users/ApfTGGokSKo/discussion">announcement here</a>)</td>
    </tr>
    <tr>
        <th><a href="http://www.bitehunter.com">BiteHunter</a></th>
        <td>uses Scrapy to crawl deal sites and restaurant directory sites to provide a real-time aggregator and search engine for dining</td>
    </tr>
    <tr>
        <th><a href="http://www.innerballoons.com/">Inner Balloons</a></th>
        <td>uses Scrapy to scrape data for products including local search content generation, reputation management, web presence management and marketing analytics.</td>
    </tr>
    <tr>
        <th><a href="http://www.flax.co.uk/">Flax</a></th>
        <td>is a search consulting company based in Cambridge (UK) that uses Scrapy to power the crawling needs of its solutions (<a href="http://www.flax.co.uk/blog/2013/02/22/cambridge-search-meetup-a-night-of-crawling-and-scraping/">blog post</a>).</td>
    </tr>
    <tr>
        <th><a href="http://www.jetrank.com">JetRank</a></th>
        <td>uses Scrapy to crawl websites and detect SEO issues</td>
    </tr>
    <tr>
        <th><a href="http://www.lyst.com">Lyst</a></th>
        <td>uses Scrapy to crawl and scrape the fashion websites they index</td>
    </tr>
    <tr>
        <th><a href="https://scraperwiki.com/">ScraperWiki</a></th>
        <td>is a data services company based in Liverpool providing bespoke solutions for data scraping and aggregation using Scrapy as a core technology (<a href="http://blog.scraperwiki.com/2013/03/14/tools-of-the-trade/">blog post</a>).</td>
    </tr>
    <tr>
        <th><a href="http://www.data.gov.uk/">World Government Data</a></th>
        <td>is a UK government data aggregation site (<a href="http://twitter.com/bfirsh/status/8025368963">tweet</a>)</td>
    </tr>
    <tr>
        <th><a href="http://www.oposicionesaldia.com/">Oposicionesaldia</a></th>
        <td>uses Scrapy to collect data on job postings, scholarships and free online courses in Spain.</td>
    </tr>
    <tr>
        <th><a href="http://www.iberstudios.com">Iberestudios</a></th>
        <td>uses Scrapy to collect data on master's degrees, doctorates and postgraduate degrees in Spain.</td>
    </tr>
    <tr>
        <th><a href="http://www.usedaywatch.com/">DayWatch</a></th>
        <td>is an internet market intelligence tool that uses Scrapy to retrieve real-time business information from daily deal sites.</td>
    </tr>
    <tr>
        <th><a href="http://www.marketmaters.com/">MarketMate RS</a></th>
        <td>uses Scrapy to scrape job offers from many sites</td>
    </tr>
    <tr>
        <th><a href="http://www.pricewiki.com/">Pricewiki.com</a></th>
        <td>uses Scrapy to scrape various websites for cost of living information</td>
    </tr>
    <tr>
        <th><a href="http://dealshelve.com/">Dealshelve</a></th>
        <td>uses Scrapy to scrape daily deals from many sites</td>
    </tr>
    <tr>
        <th><a href="http://www.zinmoo.es/">Zinmoo</a></th>
        <td>uses Scrapy to scrape thousands of real estate properties per day (<a href="http://blog.sophilabs.com/2011/02/introducing-zinmoo-spain-real-state-searcher/">blog post</a>)</td>
    </tr>
    <tr>
        <th><a href="http://www.careerbuilder.com/">CareerBuilder.com</a></th>
        <td>uses Scrapy to scrape job offers from many sites</td>
    </tr>
    <tr>
        <th><a href="http://grablab.org/">GrabLab</a></th>
        <td>is a Russian company that specializes in web scraping, data collection and web automation tasks.</td>
    </tr>
    <tr>
        <th><a href="http://www.simplespot.it">SimpleSpot</a></th>
        <td>uses Scrapy to build their geolocalized information service</td>
    </tr>
    <tr>
        <th><a href="http://www.monetate.com">Monetate</a></th>
        <td>uses Scrapy daily to collect catalog information from their clients</td>
    </tr>
    <tr>
        <th><a href="http://www.clanslots.com">ClanSlots</a></th>
        <td>uses Scrapy daily to collect levels and plugins for games they host</td>
    </tr>
    <tr>
        <th><a href="http://www.alisverisrobotu.com/">Alışveriş Robotu</a></th>
        <td>is a Turkish price comparison site that uses Scrapy to collect data from hundreds of retailers every day</td>
    </tr>
    <tr>
        <th><a href="http://www.tothego.com/">ToTheGo</a></th>
        <td>is an aggregator of ads for homes, jobs and car sales. They use Scrapy to scrape listings every day from the biggest portals and organize them in their database.</td>
    </tr>
    <tr>
        <th><a href="http://jobmistral.com/">JobMistral</a></th>
        <td>is an aggregator of classified job ads. They use Scrapy to scrape job ads hourly from many sites</td>
    </tr>
    <tr>
        <th><a href="http://www.tuvalabs.com/">TuvaLabs</a></th>
        <td>uses Scrapy to scrape RSS feeds of major newspapers from all over the world. They scrape the web to find the most interesting articles on significant news stories taking place around the world and transform them into interactive math learning units.</td>
    </tr>
    <tr>
        <th><a href="http://www.alistek.com/">Alistek</a></th>
        <td>uses Scrapy to update partner-related information in their OpenERP-based back-office system by scraping various data sources, both on the web and offline.</td>
    </tr>
    <tr>
        <th><a href="http://www.zhitongba.com">Zhitongba</a></th>
        <td>is a company trying to help people commute better within big cities in China. They use Scrapy to scrape ride-sharing information from multiple sources.</td>
    </tr>
    <tr>
        <th><a href="http://www.offertazo.com/">Offertazo</a></th>
        <td>uses Scrapy to scrape offers from many Spanish websites.</td>
    </tr>
    <tr>
        <th><a href="http://www.lionseek.com/">LionSeek</a></th>
        <td>is a search engine that uses Scrapy to find items for sale in forums.</td>
    </tr>
    <tr>
        <th><a href="http://www.stilivo.com/">Stilivo</a></th>
        <td>is a discovery shopping site that uses Scrapy to collect product information from e-commerce sites.</td>
    </tr>
    <tr>
        <th><a href="http://www.mapado.com/">Mapado</a></th>
        <td>uses Scrapy to find local activities on the web</td>
    </tr>
    <tr>
        <th><a href="http://oony.com/">Oony</a></th>
        <td>is a deal aggregator in more than 16 countries. They currently have more than 500 Scrapy spiders running to gather their information.</td>
    </tr>
    <tr>
        <th><a href="http://woppu.my/">Woppu</a></th>
        <td>is using Scrapy to collect product information from online shopping malls in Malaysia and Singapore.</td>
    </tr>
    <tr>
        <th><a href="http://jobuzu.co.uk/">Jobuzu</a></th>
        <td>uses Scrapy to scrape over 100,000 jobs daily from UK job boards</td>
    </tr>
    <tr>
        <th><a href="http://www.reviews42.com/">Reviews42</a></th>
        <td>uses Scrapy to crawl hundreds of e-commerce portals and classify the products sold online</td>
    </tr>
    <tr>
        <th><a href="http://www.coenterprise.com/solutions/applications/bookradar">BookRadar</a> (from CoEnterprise)</th>
        <td>uses Scrapy to scrape information from many book retailers</td>
    </tr>
    <tr>
        <th><a href="http://pazar.org/">Pazar</a></th>
        <td>is a price comparison website that uses Scrapy daily to collect data from many websites</td>
    </tr>
    <tr>
        <th><a href="http://wp-rocket.me/">WP Rocket</a></th>
        <td>uses Scrapy to preload the cache of all its customers' sites</td>
    </tr>
    <tr>
        <th><a href="http://www.cavucador.com.br/">Cavucador - Ache seu Imóvel</a></th>
        <td>uses Scrapy to collect real estate data from many different websites in Brazil. They scrape more than 100,000 property listings daily and use them to build a search engine and a statistics database</td>
    </tr>
    <tr>
        <th><a href="http://competera.net/">Competera</a></th>
        <td>is a price intelligence service that uses Scrapy to collect price, availability and promo data from over a million product pages every day</td>
    </tr>
</table>

<p class="note">If you use Scrapy, we'd like to include you on the list. Email <a href="mailto:info@scrapinghub.com">info@scrapinghub.com</a> and request to be included, or just fork <a href="https://github.com/scrapy/scrapy.github.io">this GitHub repo</a>, add yourself, and send a pull request.</p>