RStats
I recently realized that edgarWebR 1.0 was released a while ago without much fanfare. Version 1.0 is a major milestone for the library, bringing the full set of initially planned functionality along with some bonus features.
Headline features:
- 100% coverage of SEC search tools
- Parsing of submissions into component files, and of 10-X filings into items and parts
- A dataset of SIC mappings

What's Next:
- Bugfixes: corner cases keep popping up that need fixing
- Parsing improvements: I have some ideas about table handling that will help anyone interested in getting data out of older filings

EDGAR Tools
The EDGAR System provides a number of tools for filing and entity lookup and examination.
Last time we made contour maps of densities of points on a globe; now it is time to take another step and make heatmaps. We created all the data we needed when creating the contours, but heatmaps add the new challenge of dealing with large amounts of raster and polygon data. Let's get to it.
DISCLAIMER: While I know a thing or two, there’s a reasonable chance I got some things wrong or at very least there are certainly more efficient ways to go about things.
It always happens… I get interested in what I think will be a small data project to scratch some itch and end up down a deep rabbit hole. In this case, a passing interest in the geographic distribution of some samples (more on that in a future post) led to a deep dive into spherical distributions and densities.
PDS3 is a data standard, maintained by JPL, that NASA uses extensively for archiving data from science missions. Although it is being replaced by PDS4, PDS3 still covers all currently active missions as well as the missions covering the history of US space exploration.
The R pds3 package provides tools for parsing PDS3 data, particularly the ODL label format, which describes the metadata of a data collection. Want to plot a heatmap of Mars showing where all the images were taken?
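To give a feel for what parsing an ODL label involves, here is a minimal Python sketch. It is not the pds3 package's R code. The function name `parse_odl` is hypothetical, and the sketch assumes a simplified label: one `KEY = VALUE` pair per line, `OBJECT`/`GROUP` blocks closed by `END_OBJECT`/`END_GROUP`, and a final `END`. Multi-line values, comments and units are deliberately not handled.

```python
def parse_odl(text):
    """Parse a simplified ODL (PDS3) label into nested dicts.

    Hypothetical sketch: handles single-line KEY = VALUE pairs,
    OBJECT/GROUP blocks and the END marker only.
    """
    root = {}
    stack = [root]  # innermost open OBJECT/GROUP is stack[-1]
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "END":
            break
        key, sep, value = line.partition("=")
        key, value = key.strip(), value.strip().strip('"')
        if key in ("END_OBJECT", "END_GROUP"):
            # END_OBJECT may appear with or without "= NAME"
            if len(stack) > 1:
                stack.pop()
        elif key in ("OBJECT", "GROUP"):
            child = {}
            # Repeated block names (e.g. several TABLEs) are kept in a list
            stack[-1].setdefault(value, []).append(child)
            stack.append(child)
        elif sep:
            stack[-1][key] = value  # values are kept as raw strings
    return root


label = """PDS_VERSION_ID = PDS3
TARGET_NAME = "MARS"
OBJECT = IMAGE
  LINES = 1024
END_OBJECT = IMAGE
END"""
print(parse_odl(label))
```

Nested blocks become lists of dicts keyed by the block name. This keeps repeated objects such as multiple tables without any of them overwriting the others.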
New in edgarWebR 0.2.0 are functions for parsing SEC documents. While there are good R packages for XBRL processing, there is a gap in extracting information from the other document types available via the site. edgarWebR currently provides functions for two of those:
parse_submission() - Processes a raw SGML filing into component documents. These are the ‘Complete submission text file’ on filing pages. Like zip files, they contain all the files included in a particular submission.
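The idea behind splitting a complete submission text file can be sketched in a few lines of Python. This is not edgarWebR's implementation, and `split_submission` is a hypothetical name. The sketch assumes the usual EDGAR layout, where each component file is wrapped in `<DOCUMENT>…</DOCUMENT>`. It also assumes that metadata tags such as `<TYPE>` and `<FILENAME>` sit on their own lines ahead of a `<TEXT>…</TEXT>` body.

```python
import re

# Each component file is wrapped in <DOCUMENT> ... </DOCUMENT>
DOC_RE = re.compile(r"<DOCUMENT>(.*?)</DOCUMENT>", re.DOTALL)
# The file contents sit between <TEXT> and </TEXT>
TEXT_RE = re.compile(r"<TEXT>(.*?)</TEXT>", re.DOTALL)


def split_submission(raw):
    """Split a raw SGML submission into a list of document dicts (sketch)."""
    docs = []
    for block in DOC_RE.findall(raw):
        meta = {}
        # Header tags are unclosed: the value runs to the end of the line
        for tag in ("TYPE", "SEQUENCE", "FILENAME", "DESCRIPTION"):
            m = re.search(rf"<{tag}>([^\n<]*)", block)
            meta[tag.lower()] = m.group(1).strip() if m else None
        body = TEXT_RE.search(block)
        meta["text"] = body.group(1).strip() if body else ""
        docs.append(meta)
    return docs
```

A real filing has more corner cases than this handles. Examples include uuencoded binary attachments and tags that carry attributes. Treat it as an outline of the approach, not a drop-in parser.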
Nothing in this article should be considered investment advice.
Overview
In the US, publicly traded companies are required to publish an annual report, called a 10-K. In addition to basic financial information, these reports include management commentary and a disclosure of perceived risks. While the financials are typically where analysts focus, some attention is also given to reading between the lines of the typically bland and risk-averse narrative sections.
I got interested in using R to automate the process of grabbing the 10-K from the SEC website, parsing out the narrative sections, and applying basic sentiment and text analysis.
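Parsing out the narrative sections mostly comes down to finding the "Item N." headings that divide a 10-K. The post's own workflow is in R. What follows is only a minimal Python sketch of the idea, using a hypothetical `split_items` helper. It assumes the filing has already been converted to plain text, with each item heading (e.g. "Item 1A. Risk Factors") at the start of a line.

```python
import re

# "Item 1.", "Item 1A.", "ITEM 7." etc. at the start of a line
ITEM_RE = re.compile(r"^\s*item\s+(\d+[a-z]?)\.", re.IGNORECASE | re.MULTILINE)


def split_items(text):
    """Split plain 10-K text into {item_number: section_text} (sketch)."""
    matches = list(ITEM_RE.finditer(text))
    items = {}
    for i, m in enumerate(matches):
        # A section runs until the next item heading, or the end of the text
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        # Later matches overwrite earlier ones, so a table-of-contents entry
        # is replaced by the body section with the same item number
        items[m.group(1).upper()] = text[m.end():end].strip()
    return items
```

Letting later matches win is a cheap way to skip the table of contents that most 10-Ks open with. It assumes the table of contents comes before the body. Filings that cross-reference items mid-sentence at the start of a line would need a stricter pattern.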
There are plenty of packages for R that allow for fetching and manipulating companies' financial data, often fetching it directly from public filings with the SEC. All of these packages aim to get at the XBRL data, which contains the financial statements, typically from annual (10-K) or quarterly (10-Q) filings.
SEC filings, however, contain far more information. edgarWebR is the first step in accessing that data, providing an interface to the SEC EDGAR search tools and the metadata they provide.