Introducing edgarWebR

There are plenty of packages for R that allow for fetching and manipulation of companies’ financial data, often fetching it directly from public filings with the SEC. All of these packages have the goal of getting to the XBRL data containing financial statements, typically in annual (10-K) or quarterly (10-Q) filings.

SEC filings, however, contain far more information. edgarWebR is the first step in accessing that data by providing an interface to the SEC EDGAR search tools and the metadata they provide.

Current Features

  • Search for companies and mutual funds.
  • List filings for a company or mutual fund.
  • Get all information associated with a particular filing.

Simple Use Case

Using information about filings, we can use edgarWebR to see how long after the close of a financial period it takes for a company to make a filing. We can also see how that time has changed.

Get Data

First, we’ll fetch a bunch of 10-K and 10-Q filings for our given company using company_filings(). To make sure we’re going far enough back, we’ll take a peek at the tail of our results:

library(edgarWebR)

ticker <- "EA"

filings <- company_filings(ticker, type = "10-", count = 100)
initial_count <- nrow(filings)
# Specifying the type returns all forms that start with "10-", so we need to
# filter manually to keep only 10-K and 10-Q filings.
filings <- filings[filings$type == "10-K" | filings$type == "10-Q", ]

Note that explicitly filtering caused us to go from 93 to 89 rows.
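A prefix search on "10-" can also match related form types, such as amendments (10-K/A, 10-Q/A) or the older 10-K405; that is presumably where the extra rows came from, though the dropped rows aren't shown, so treat it as an assumption. A minimal base-R sketch of the same filter on a made-up set of form types, using %in% rather than chained == comparisons, and reporting which types were removed:

```r
# Hypothetical results of a prefix search on "10-": besides plain 10-K and
# 10-Q filings, amendments and variant forms can also match the prefix.
filings <- data.frame(
  type = c("10-K", "10-Q", "10-Q/A", "10-K405", "10-Q", "10-K/A"),
  stringsAsFactors = FALSE
)

# Keep only exact matches on the wanted form types
wanted <- c("10-K", "10-Q")
kept <- filings[filings$type %in% wanted, , drop = FALSE]

# Report what the exact-match filter removed
dropped <- setdiff(unique(filings$type), wanted)
cat("Kept", nrow(kept), "of", nrow(filings), "rows; dropped types:",
    paste(dropped, collapse = ", "), "\n")
```

With %in%, adding another form type to keep is a one-word change to `wanted` instead of another `|` clause.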

filings$md_href <- paste0("[Link](", filings$href, ")")
knitr::kable(tail(filings)[, c("type", "filing_date", "accession", "size", "md_href")],
             col.names = c("Type", "Filing Date", "Accession No.", "Size", "Link"),
             digits = 2,
             format.args = list(big.mark = ","))
|    | Type | Filing Date | Accession No.        | Size   | Link |
|----|------|-------------|----------------------|--------|------|
| 87 | 10-Q | 1996-08-14  | 0000950005-96-000615 | 72 KB  | Link |
| 88 | 10-K | 1996-07-01  | 0000912057-96-013563 | 197 KB | Link |
| 89 | 10-Q | 1996-02-14  | 0000912057-96-002551 | 85 KB  | Link |
| 90 | 10-Q | 1995-11-14  | 0000912057-95-009843 | 83 KB  | Link |
| 91 | 10-Q | 1995-08-10  | 0000912057-95-006218 | 142 KB | Link |
| 93 | 10-Q | 1995-02-13  | 0000912057-95-000644 | 83 KB  | Link |

We’ve received filings dating back to 1995 which seems good enough for our purposes, so next we’ll get the filing information for each filing.

So far we’ve done everything in base R, but now we’ll use some useful functions from dplyr and purrr to make things a bit easier.

library(dplyr)
library(purrr)

# This can take a while: we're fetching ~100 HTML files!
filing_infos <- map_df(filings$href, filing_information)

filings <- bind_cols(filings, filing_infos)
filings$filing_delay <- filings$filing_date - filings$period_date

# Take a peek at the data
knitr::kable(head(filings) %>% select(type, filing_date, period_date,
                                      filing_delay, documents, filing_bytes) %>% 
             mutate(filing_delay = as.numeric(filing_delay)),
             col.names=c("Type", "Filing Date", "Period Date", "Delay",
                         "Documents", "Size (B)"),
             digits = 2,
             format.args = list(big.mark=","))
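| Type | Filing Date | Period Date | Delay | Documents | Size (B)   |
|------|-------------|-------------|------:|----------:|-----------:|
| 10-K | 2017-05-24  | 2017-03-31  | 54.00 | 127       | 14,944,223 |
| 10-Q | 2017-02-07  | 2016-12-31  | 38.00 | 89        | 10,322,322 |
| 10-Q | 2016-11-08  | 2016-09-30  | 39.04 | 89        | 10,233,460 |
| 10-Q | 2016-08-09  | 2016-06-30  | 40.00 | 89        | 9,207,799  |
| 10-K | 2016-05-27  | 2016-03-31  | 57.00 | 133       | 15,865,077 |
| 10-Q | 2016-02-08  | 2015-12-31  | 39.00 | 97        | 11,389,536 |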
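The 39.04-day delay in the third row is odd for a difference of two calendar dates. One plausible explanation, and it is only an assumption about how the dates come back, is that they are date-times in a US time zone: 2016-09-30 falls in daylight-saving time and 2016-11-08 does not (clocks went back on 2016-11-06), so the gap is 39 days plus one hour, about 39.04 days. This base-R sketch reproduces that and shows that converting to Date first yields whole days:

```r
# Assumption: filing and period dates are POSIXct date-times in US Eastern
# time rather than plain Date values.
tz <- "America/New_York"
period_date <- as.POSIXct("2016-09-30", tz = tz)
filing_date <- as.POSIXct("2016-11-08", tz = tz)

# Subtracting date-times gives a difftime; the DST change adds one hour.
delay_dt <- as.numeric(difftime(filing_date, period_date, units = "days"))

# Converting to Date (in the same time zone) counts whole calendar days.
delay_days <- as.numeric(as.Date(filing_date, tz = tz) - as.Date(period_date, tz = tz))

round(delay_dt, 2)  # 39.04
delay_days          # 39
```

If that assumption holds, wrapping both columns in as.Date() before subtracting would give integer delays throughout.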

Basic Analysis

Now that our data is arranged, we can run some summary statistics and plot using ggplot2.

knitr::kable(filings %>%
             group_by(type) %>% summarize(
               count = n(),
               avg_delay = as.numeric(mean(filing_delay)),
               median_delay = as.numeric(median(filing_delay)),
               avg_size = mean(filing_bytes / 1024),
               avg_docs = mean(documents)
             ),
             col.names = c("Type", "Count", "Avg. Delay", "Median Delay",
                           "Avg. Size", "Avg. Docs"),
             digits = 2,
             format.args = list(big.mark = ","))
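| Type | Count | Avg. Delay | Median Delay | Avg. Size | Avg. Docs |
|------|------:|-----------:|-------------:|----------:|----------:|
| 10-K | 22    | 67.98      | 62.48        | 6,848.91  | 22.18     |
| 10-Q | 67    | 40.04      | 40.00        | 4,357.80  | 13.12     |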
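The first part of the post stuck to base R; the same per-type summary can be sketched without dplyr using tapply(). To keep it self-contained, this runs on just the six rows printed earlier (Delay column as shown), so its numbers are not the full-sample figures in the table above:

```r
# Delays from the first six rows printed earlier
delays <- data.frame(
  type = c("10-K", "10-Q", "10-Q", "10-Q", "10-K", "10-Q"),
  delay = c(54, 38, 39.04, 40, 57, 39),
  stringsAsFactors = FALSE
)

# One tapply() per statistic; groups come out in sorted order ("10-K", "10-Q")
summary_tbl <- data.frame(
  type = sort(unique(delays$type)),
  count = as.vector(tapply(delays$delay, delays$type, length)),
  avg_delay = as.vector(tapply(delays$delay, delays$type, mean)),
  median_delay = as.vector(tapply(delays$delay, delays$type, median))
)
print(summary_tbl)
```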

No surprise: yearly filings (10-Ks) are larger and take more time than quarterly filings (10-Qs). Let's see how the delays are distributed:

library(ggplot2)

ggplot(filings, aes(x = factor(type), y = filing_delay)) +
  geom_violin() + geom_jitter(height = 0, width = 0.1) +
  labs(x = "Filing Type", y = "Filing delay (days)")


We can also examine how the filing delay has changed over time:

ggplot(filings, aes(x = filing_date, y = filing_delay, group = type, color = type)) +
  geom_point() + geom_line() +
  labs(x = "Filing Date", y = "Filing delay (days)")

Whether because of an internal change or a change to SEC rules, the time between the end of the fiscal year and the 10-K filing has dropped considerably, though some variation remains.

An interesting extension would be to compare the filing delays to the company’s stock price, overall market performance and other companies to see if there are particular drivers to the pattern.

ggplot(filings, aes(x = filing_date, y = filing_bytes / 1024, group = type, color = type)) +
  geom_point() + geom_line() +
  labs(x = "Filing Date", y = "Filing Size (KB)")

The jump in size around 2010 reflects the requirement to include financial data files, which started around then, but there are still interesting variations.

More to come

SEC filings contain a wealth of information, and we’re only scratching the surface. edgarWebR itself has a lot more functionality than what we’ve explored here, and there is more coming.

How to Download

edgarWebR is not yet on CRAN, as the API hasn’t stabilized yet. In the meantime, you can get a copy from GitHub using devtools:

# install.packages("devtools")

Fetching Patreon Data

Patreon is a delight to scrape. Actually, scraping is the wrong word for it: the frontend of Patreon is a React application that calls a number of very sensibly designed JSON endpoints. Call the same endpoints and you get delightfully clean JSON that exactly matches what gets displayed on the site.

A disclaimer: as far as I can tell, this is undocumented. The publicly documented API (JS Implementation) is targeted at creators and provides access to private information visible only to creators. All of this could change at any point.

I was all ready to parse HTML, but looking at the source there was a beautiful JS object containing all the data needed to display most pages.

"data": {
  "attributes": {
    "created_at": "2016-04-30T13:58:22+00:00",
    "creation_name": "Entertainment",
    "display_patron_goals": false,
    "earnings_visibility": null,

Even better, at the tail end of the long object is the call to fetch just the JSON:

"links": {
  "self": ""

So as long as I can get the ID of a campaign, I can get all the information about it in an easily processed format. Browsing the explore pages with a bit of network monitoring reveals calls to the following URLs:

[user]=full_name,image_url,url&fields[campaign]=creation_name,patron_count,pledge_sum,is_monthly,earnings_visibility&page[count]=20&json-api-version=1.0';

Inspection of the different category pages reveals the number after ‘/category/’ runs from 1 through 14, and 99 for the ‘All’ category. This way, I can fetch all the top campaigns then use the campaign API to retrieve detailed information.
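Enumerating the category pages described above is a one-liner. The real explore URL isn't reproduced in this post, so the template below is a hypothetical placeholder; only the category numbers (1 through 14, plus 99 for "All") come from the text:

```r
# Category IDs as described above: 1 through 14, plus 99 for "All".
category_ids <- c(1:14, 99L)

# Hypothetical URL template: "%d" marks where the category number goes.
# Substitute the actual explore endpoint observed in the network monitor.
category_url <- function(id, template = "https://example.invalid/explore/category/%d") {
  sprintf(template, id)
}

urls <- category_url(category_ids)
length(urls)  # 15
```

Each of these pages would then be fetched in turn, and the campaign IDs it lists passed on to the campaign API.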

An interesting note: the data structures reveal a lot about how the site has been designed and where complexity can be added later (multiple campaigns per user, links between campaigns, etc.).
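Pulling fields out of a campaign object shaped like the excerpt earlier is straightforward in R. This is a sketch assuming the jsonlite package is installed; the JSON is a cut-down, hypothetical object built from the fields in that excerpt, and the "id" and "type" members are illustrative assumptions:

```r
library(jsonlite)

# Hypothetical campaign object modelled on the excerpt earlier in the post
raw <- '{
  "data": {
    "attributes": {
      "created_at": "2016-04-30T13:58:22+00:00",
      "creation_name": "Entertainment",
      "display_patron_goals": false,
      "earnings_visibility": null
    },
    "id": "12345",
    "type": "campaign"
  }
}'

campaign <- fromJSON(raw)
attrs <- campaign$data$attributes

# The timestamp is in UTC ("+00:00"); strptime ignores the trailing offset text.
created <- as.POSIXct(attrs$created_at, format = "%Y-%m-%dT%H:%M:%S", tz = "UTC")

attrs$creation_name  # "Entertainment"
```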

Full code for my scraper is after the break; I’ll be diving into analysis next.

Quick Analysis – Patreon Funding

Inspired by the launch yesterday of the Patreon funding campaign for Movies with Mikey, a movie analysis YouTube channel, I’ve performed some rudimentary analysis of how Patreon donors fit into donation tiers.

Movies with Mikey Patreon

A quick primer: Patreon allows creators to collect donations from supporters on an ongoing basis, as opposed to a one-time engagement as with Kickstarter. Donations can be by month or by produced work. While various donation levels provide perks or recognition, Patreon tends to be more focused on “support” than on perks compared to other funding platforms. As a result, tiers tend to be less about “value for money” and therefore more interesting to analyze.

Patreon, as with Twitch, limits the publicly available information to summary figures, but that is still enough to assess the donation breakdown given some reasonable assumptions. For Movies with Mikey (MWM), we are given the total donations and how those are broken down by tier.

While we would like to know the specific distribution of donations, we can estimate the average donation per tier to get an idea of how donations break down. One caveat is that there is some missing data – the sum of donors in tiers only adds up to 744, so 39 supporters or about 5% didn’t select a tier and we have no idea where they fall. To analyze the breakdown, I made a simple spreadsheet to allow easy estimation of donation amounts.

This estimate falls short of the reported total by 1.8%, which, given the ‘missing’ donations, suggests a fair degree of accuracy. A few hypotheses come out of this:

  • Donors give the minimum to be in a given tier.
  • There are likely a few high-end donations that pull the average of the top tier up.
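The spreadsheet itself isn't reproduced here, but the method described (multiply each tier's minimum by its patron count, sum, and compare against the reported total) can be sketched in R. All tier figures below are made up for illustration and are not the Movies with Mikey numbers; only the 744 and 783 supporter counts come from the text:

```r
# Hypothetical tiers: minimum pledge (USD) and number of patrons in each.
tiers <- data.frame(
  min_pledge = c(1, 5, 10, 25),
  patrons = c(120, 60, 25, 5)
)
reported_total <- 900  # hypothetical pledge total shown on the campaign page

# Estimate under the "donors give the tier minimum" hypothesis
estimate <- sum(tiers$min_pledge * tiers$patrons)
pct_error <- 100 * (estimate - reported_total) / reported_total

# Share of supporters who picked no tier, using the counts quoted above
unassigned_share <- (783 - 744) / 783

estimate  # 795
```

A negative pct_error means the minimum-pledge estimate undershoots the total; a small shortfall fits the hypotheses above, while a large one would point to many donors giving above their tier minimum.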

I’d stress that this is exploratory work that has uncovered some reasonable hypotheses. Confirming them and making any recommendations will require checking whether this pattern holds for other Patreon campaigns, which I hope to do in the next few days by looking across categories and campaign sizes.

One last point I plan on returning to is the nature of sites that provide only summary statistics. By limiting the information available, they create a market for third-party scrapers and lose control over the data. For Patreon, there is Graphtreon, and Kickstarter has Kicktraq. I’m torn on how the platforms and their users should feel about these, particularly as personal finances are often involved, but I’ll delve more into the issue later.

Amazon’s Twitch Prime Move – How can we see the result?

When Amazon made its move to create Twitch Prime (in what I called the best marketing move of all time), it had all the data: cross-usage rates, demographics, subscription rates, etc. Now that we are looking from the outside, and particularly as we’ve developed a number of alternate hypotheses, can we see and evaluate the success of the program?

As outsiders, while we don’t have direct access to the data, there are a number of points we can observe.

Market Embrace

Amazon is leveraging market forces, expecting that the benefits to streamers and viewers are enough to drive adoption. We can watch what streamers do to see if they are seeing the benefits. If Twitch Prime is creating value for them in the form of incremental subscribers, successful streamers should be heavily promoting it to their viewers.

We can certainly see some adoption today in on-stream pitches, stream titles, channel pages and social media mentions.


At the moment, I would hypothesize that there is a lot of excitement over Twitch Prime, so current promotions aren’t necessarily market-tested. The real proof will come in a month or two, when we can see how normalized the messaging has become.


We can also watch what Amazon does. They may opt to drop or modify the program if it isn’t delivering the results they are looking for.

Increased Viewers

It is a few steps removed, but we could predict that because of Twitch Prime, more people will participate on the platform, driving incremental views. Similarly, if viewers subscribe to more channels, they are more likely to watch regularly, boosting total hours watched and views.

Total views and current viewers are the only numbers available, but watching them over time may provide some insight into overall platform growth, though there are many non-Prime factors affecting these metrics.

Watching Chat Participants

Twitch provides a chat interface where viewers interact with each other and the streamer. A couple of features make the chat worth watching. Users have “badges” that indicate whether they are Prime members, subscribers, moderators, or one of a number of other statuses. When viewers subscribe, a message is also sent to the chat.

Twitch Chat showing badges and subscription messages.

What to look for in the example:

  • Sirlexon – the ‘POW’ is the badge that indicates subscribers while the crown indicates Prime members
  • You can see that Bline_Ophtalmologist subscribed using Twitch Prime
  • EQWashu has a sword that indicates they are a moderator
  • Franch1s3 has no badges

This means we can test our hypotheses by counting users in the Twitch chat to see how many are subscribers and/or Prime members.
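The counting step can be sketched in base R. Two assumptions here: badges arrive as a comma-separated "name/version" tag (the format Twitch's IRC interface uses), and "premium" is the badge name for the Prime crown. The users mirror the example above, with the badge versions made up:

```r
# Hypothetical chat users with badge tags in the assumed "name/version" format
chat <- data.frame(
  user = c("Sirlexon", "EQWashu", "Franch1s3"),
  badges = c("subscriber/6,premium/1", "moderator/1", ""),
  stringsAsFactors = FALSE
)

# Split one badges tag into bare badge names, dropping the "/version" suffix
badge_names <- function(tag) {
  if (is.na(tag) || tag == "") return(character(0))
  sub("/.*$", "", strsplit(tag, ",", fixed = TRUE)[[1]])
}

# Tally subscribers, Prime members, both, and neither across all users
summarize_badges <- function(badges) {
  parsed <- lapply(badges, badge_names)
  is_sub <- vapply(parsed, function(b) "subscriber" %in% b, logical(1))
  is_prime <- vapply(parsed, function(b) "premium" %in% b, logical(1))
  c(users = length(badges),
    subscribers = sum(is_sub),
    prime = sum(is_prime),
    both = sum(is_sub & is_prime),
    neither = sum(!is_sub & !is_prime))
}

counts <- summarize_badges(chat$badges)
counts
```

A real tally would also need to de-duplicate users who post more than once; that step is left out of this sketch.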

Twitch Prime: Alternate Hypothesis

I proposed yesterday that Twitch Prime is an amazing move by Amazon to help turn streamers into salespeople for Amazon Prime. Clearly, people love the new program and perks: it is showing up all over Twitch in channels I follow. For example:

Twitch Prime Sign-up banner for The Attack

If you want to hear 10 minutes of Amazon love, here’s Day9 and friends discussing Twitch Prime and how great Amazon Prime is (beyond just the Twitch offerings…). It also touches on the oh-so-important community aspects of Twitch and subscriptions.

Even when streamers aren’t giving direct calls to action to join Amazon Prime, they can’t resist discussing the other positives to Prime. Day9 is a big enough deal in the streaming world that he’s heard from many other subscribers and raises a couple of interesting points based on what he’d heard –

  • Some streamers reported their subscribers quadrupling over the weekend when Twitch Prime was announced.
  • Instead of hanging out watching for 6 months before becoming a subscriber, with Prime viewers are more likely to dive right in to test the community waters.
  • The first subscription is the highest hurdle – so the motivation may be to drive those incremental subscriptions as people are more likely to make a subsequent full-cost subscription.

That last point could serve as an alternate (or additional) motivation for Twitch Prime. Tomorrow, I’ll explore how these hypotheses could be tested.