Patreon is a delight to scrape. Actually, scraping is the wrong word for it – the Patreon frontend is a React application that calls a number of very sensibly designed JSON endpoints. Call the same endpoints and you get delightfully clean JSON that exactly matches what gets displayed on the site.
A disclaimer – as far as I can tell this is undocumented; the publicly documented API (a JS implementation) is targeted at creators and provides access to private information only visible to creators. All of this could change at any point.
I was all ready to parse HTML, but looking at the source I found a beautiful JS object containing all the data needed to display most pages:
"data": {
"attributes": {
"created_at": "2016-04-30T13:58:22+00:00",
"creation_name": "Entertainment",
"display_patron_goals": false,
"earnings_visibility": null,
...
Even better, at the tail end of the long object is the call to fetch just the JSON:
"links": {
<span id="line535"></span> "self": "https://api.patreon.com/campaigns/355645"
}
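A minimal sketch of hitting that endpoint with Python's requests library; the fetch_campaign name and the printed field are my own choices, and the live API may well require extra headers or the json-api-version parameter that shows up in the explore URL below:

import requests

API_ROOT = "https://api.patreon.com"

def fetch_campaign(campaign_id):
    """Fetch the raw JSON for a single campaign from the undocumented endpoint."""
    response = requests.get(f"{API_ROOT}/campaigns/{campaign_id}")
    response.raise_for_status()
    return response.json()

# The campaign ID taken from the "self" link above.
campaign = fetch_campaign(355645)
print(campaign["data"]["attributes"]["creation_name"])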
So as long as I can get the ID of a campaign, I can get all the information about it in an easily processed format. The explore pages, plus a bit of network monitoring, reveal calls to the following URL:
https://api.patreon.com/explore/category/12?include=creator.null&fields[user]=full_name,image_url,url&fields[campaign]=creation_name,patron_count,pledge_sum,is_monthly,earnings_visibility&page[count]=20&json-api-version=1.0
Inspecting the different category pages shows that the number after ‘/category/’ runs from 1 through 14, with 99 for the ‘All’ category. This way, I can fetch the top campaigns in each category and then use the campaign API to retrieve detailed information about each one.
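A sketch of how one might walk those category pages and collect campaign IDs, assuming the response follows the usual JSON:API shape (a top-level "data" list whose entries carry an "id"); the helper name and the set-based bookkeeping are mine:

import requests

EXPLORE_URL = "https://api.patreon.com/explore/category/{}"
EXPLORE_PARAMS = {
    "include": "creator.null",
    "fields[user]": "full_name,image_url,url",
    "fields[campaign]": "creation_name,patron_count,pledge_sum,is_monthly,earnings_visibility",
    "page[count]": 20,
    "json-api-version": "1.0",
}

# Categories 1-14 plus 99 ('All'), as observed on the explore pages.
CATEGORY_IDS = list(range(1, 15)) + [99]

def top_campaign_ids():
    """Collect campaign IDs from each explore category listing."""
    ids = set()
    for category in CATEGORY_IDS:
        response = requests.get(EXPLORE_URL.format(category), params=EXPLORE_PARAMS)
        response.raise_for_status()
        for campaign in response.json().get("data", []):
            ids.add(campaign["id"])
    return ids

Each ID collected this way can then be fed to the campaign endpoint above to pull the detailed record.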
An interesting note – the data structures reveal a lot about how the site has been designed and where complexity can be added later: multiple campaigns per user, links between campaigns, and so on.
Full code for my scraper is after the break – I’ll be diving into the analysis next.