Title: | SEC Filings Access |
---|---|
Description: | A set of methods to access and parse live filing information from the U.S. Securities and Exchange Commission (SEC - <https://www.sec.gov/>) including company and fund filings along with all associated metadata. |
Authors: | Micah J Waldstein [aut, cre] |
Maintainer: | Micah J Waldstein <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.1.0 |
Built: | 2025-03-08 03:31:25 UTC |
Source: | https://github.com/mwaldstein/edgarwebr |
Provides access to the SEC CIK search tool.
cik_search(company)
company |
Search term to search for CIK |
A dataframe with one row per company, including the following columns:
cik
company_href
company_name
try(cik_search("cloudera"))
For a given company, either by ticker, CIK, or pre-fetched page, we extract 2 sets of information:
Filing date, accepted date, etc.
Companies included in the filing
company_details( x, ownership = FALSE, type = "", before = "", count = 40, page = 1 )
x |
either a stock ticker, CIK number, or XML document for a company page |
ownership |
boolean for inclusion of company change filings |
type |
Type of filing to fetch. NOTE: due to the way the SEC EDGAR system works, this is actually a 'starts-with' search, so, for instance, specifying type = "10-K" will also return "10-K/A" and "10-K405" filings. To ensure you only get the type you want, filter the results afterwards. |
before |
yyyymmdd format of latest filing to fetch |
count |
Number of filings to fetch per page. Valid options are 10, 20, 40, 80, or 100. Other values will result in the closest count. |
page |
Which page of results to return. |
A list with the following components
data.frame as returned by company_information
data.frame as returned by company_filings
try(company_details("AAPL", before = "20170810"))
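The `before` argument takes a yyyymmdd string. If the cutoff is held as a Date, a small conversion like the following may help; `as_before` is a hypothetical helper name used only for this sketch and is not part of the package.

```r
# Convert a Date (or an ISO "yyyy-mm-dd" string) into the yyyymmdd form
# described for the `before` argument.
# NOTE: `as_before` is a hypothetical helper, not a package function.
as_before <- function(date) {
  format(as.Date(date), "%Y%m%d")
}

cutoff <- as_before("2017-08-10")
print(cutoff)

# The result can then be passed on, e.g.:
# try(company_details("AAPL", before = cutoff))
```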
SEC Company Filings
company_filings( x, ownership = FALSE, type = "", before = "", count = 40, page = 1 )
x |
either a stock ticker, CIK number, or XML document for a company page |
ownership |
boolean for inclusion of company change filings |
type |
Type of filing to fetch. NOTE: due to the way the SEC EDGAR system works, this is actually a 'starts-with' search, so, for instance, specifying type = "10-K" will also return "10-K/A" and "10-K405" filings. To ensure you only get the type you want, filter the results afterwards. |
before |
yyyymmdd format of latest filing to fetch |
count |
Number of filings to fetch per page. Valid options are 10, 20, 40, 80, or 100. Other values will result in the closest count. |
page |
Which page of results to return. |
A dataframe of company filings
try(company_filings("AAPL", before = "20170810"))
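Because `type` behaves as a 'starts-with' search, results usually need a follow-up filter. The sketch below uses a made-up stand-in data frame rather than a live result, and it assumes the result has a column named `type`; check `names()` on a real result before relying on that.

```r
# Made-up stand-in for a company_filings() result. The `type` column name
# is an assumption for this sketch; the rows are illustrative only.
filings <- data.frame(
  type = c("10-K", "10-K/A", "10-K405", "10-K"),
  note = c("annual", "amendment", "variant", "annual"),
  stringsAsFactors = FALSE
)

# Keep only exact matches, dropping amendments and variant forms.
only_10k <- filings[filings$type == "10-K", , drop = FALSE]
print(only_10k)
```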
Given a CIK, provide a link to the company information page.
company_href(cik, ownership = FALSE, atom = FALSE)
cik |
Company code |
ownership |
(default: FALSE) boolean for inclusion of company change filings |
atom |
(default: FALSE) if the link should be to the atom XML feed |
A string with URL requested
company_href("0000037912")
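The example passes the CIK as a zero-padded, 10-character string. Whether `company_href` requires that form is not stated here, so treat the following as a convenience sketch; `pad_cik` is a hypothetical helper, not a package function.

```r
# Pad a numeric CIK to the 10-character form used in the example above.
# NOTE: `pad_cik` is a hypothetical helper. as.integer() limits this
# sketch to values that fit in a 32-bit integer.
pad_cik <- function(cik) {
  sprintf("%010d", as.integer(cik))
}

print(pad_cik(37912))
```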
Fetches basic information on a given company from the SEC site
company_information(x)
x |
Either a stock symbol (for the 10,000 largest companies) or CIK code |
a dataframe with all SEC company information
try(company_information("INTC"))
Provides access to the SEC Company Name Search using a company's formal name rather than its common name.
company_search( x, match = "start", file_number = FALSE, state = "", country = "", sic = "", ownership = FALSE, type = "", count = 40, page = 1 )
x |
Name of company to search or file number |
match |
(default: 'start') Either 'start' or 'contains' for where in the company name to search |
file_number |
(default: FALSE) if set to TRUE, x is treated as a file number |
state |
(default: "") Limit to a specific state of registration using 2-letter state abbreviations. Special values:
|
country |
2-character country code. The mapping is non-obvious, so unfortunately the best way to find it is to examine the company search page. |
sic |
SIC Code |
ownership |
boolean for inclusion of company change filings |
type |
Limit to companies with a given filing type - e.g. 'N-PX' |
count |
Number of filings to fetch per page. Valid options are 10, 20, 40, 80, or 100. Other values will result in the closest count. |
page |
Which page of results to return. |
Note on 'Fast Search': The SEC Company Search page also includes a 'Fast Search' function to "search" by CIK or stock ticker. This doesn't actually search, but rather goes directly to the company details page if found. If you have a company's CIK or ticker, use the company_information, company_filings, or company_details functions.
A dataframe of companies
cik
company_href
name
location
location_href
formerly
sic
sic_description
sic_href
try(company_search("Intel"))
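The `count` argument only accepts 10, 20, 40, 80, or 100, with other values resolved to the closest one. To make the page size explicit on the client side, one option is to snap the value before calling. This is a sketch of that idea; `closest_count` is a hypothetical helper, and how the SEC site itself resolves ties is not documented here.

```r
# Valid page sizes as listed for the `count` argument.
valid_counts <- c(10, 20, 40, 80, 100)

# NOTE: `closest_count` is a hypothetical helper. On a tie, which.min()
# picks the smaller option, which may differ from the server's choice.
closest_count <- function(n) {
  valid_counts[which.min(abs(valid_counts - n))]
}

print(closest_count(50))
```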
Provides access to the SEC Current Events search tool.
current_events(day, form)
day |
(0-5) Day to search for current forms. e.g. '2' returns forms from 2 business days ago. |
form |
Form to return filings (e.g. '10-K') |
A dataframe with one row per company, including the following columns:
cik
type
href
company_name
company_href
filing_date
try(current_events(0, "10-K")[1:5,])
Returns the current Notices of Effectiveness for the most recently completed business day.
effectiveness()
You can also see the same filings going further back by calling latest_filings() with type = "EFFECT".
a data.frame with each row as a submission with the following columns:
try(effectiveness())
The SEC generates an HTML page as an index for every filing it receives, containing all the meta-information about the filing. We extract 4 main types of information:
Filing date, accepted date, etc.
All the documents included in the filing
Companies included in the filing
Funds included in the filing
filing_details(x)
## S3 method for class 'character'
filing_details(x)
## S3 method for class 'xml_node'
filing_details(x)
x |
URL to a SEC filing index page |
For a company, there is typically a single filer and no funds, but many filings for funds get more complicated, e.g. 400+ funds with hundreds of companies.
NOTE: This can get process-intensive for large fund pages. If you don't need all components, try just using filing_information.
A list with the following components:
A data.frame as returned by filing_information
A data.frame as returned by filing_documents
A data.frame as returned by filing_filers
A data.frame as returned by filing_funds
# Typically you'd get the URL from one of the search functions
x <- paste0("https://www.sec.gov/Archives/edgar/data/",
            "712515/000071251517000063/0000712515-17-000063-index.htm")
try(filing_details(x))
If you know you're going to want all the details of a filing, including documents, funds, and filers, look at 'filing_details'.
filing_documents(x)
## S3 method for class 'character'
filing_documents(x)
## S3 method for class 'xml_node'
filing_documents(x)
x |
URL or xml_document for a SEC filing index page |
Information returned:
seq
description
document
href
type
size
A dataframe with all the documents in the filing along with their meta info
# Typically you'd get the URL from one of the search functions
x <- paste0("https://www.sec.gov/Archives/edgar/data/",
            "712515/000071251517000063/0000712515-17-000063-index.htm")
try(filing_documents(x))
SEC Filing Included Filers
filing_filers(x)
## S3 method for class 'character'
filing_filers(x)
## S3 method for class 'xml_node'
filing_filers(x)
x |
URL to a SEC filing index page |
A dataframe with all the filers in the filing along with their info
# Typically you'd get the URL from one of the search functions
x <- paste0("https://www.sec.gov/Archives/edgar/data/",
            "712515/000071251517000063/0000712515-17-000063-index.htm")
try(filing_filers(x))
SEC Filing Funds
filing_funds(x)
## S3 method for class 'character'
filing_funds(x)
## S3 method for class 'xml_node'
filing_funds(x)
x |
URL to a SEC filing index page |
A dataframe with all the funds associated with a given filing
# Typically you'd get the URL from one of the search functions
x <- paste0("https://www.sec.gov/Archives/edgar/data/",
            "933691/000119312517247698/0001193125-17-247698-index.htm")
try(filing_funds(x))
The SEC generates an HTML page as an index for every filing it receives, containing all the meta-information about the filing.
filing_information(x)
## S3 method for class 'character'
filing_information(x)
## S3 method for class 'xml_node'
filing_information(x)
x |
URL or xml_document for a SEC filing index page |
Information returned:
type
description
accession_number
filing_date
accepted_date
documents
period_date
changed_date
effective_date
filing_bytes
Not all details are valid for all filings, but every column will always be present.
If you know you're going to want all the details of a filing, including documents, funds, and filers, look at 'filing_details'.
A dataframe with all the parsed meta-info on the filing
# Typically you'd get the URL from one of the search functions
x <- paste0("https://www.sec.gov/Archives/edgar/data/",
            "933691/000119312517247698/0001193125-17-247698-index.htm")
try(filing_information(x))
Provides access to the SEC filings full-text search tool.
full_text( q = "*", type = "", reverse_order = FALSE, count = 100, page = 1, stemming = TRUE, name = "", cik = "", sic = "", from = "", to = "", location = "", incorporated_location = FALSE )
q |
Search query. For details on special formatting, see the FAQ. |
type |
Type of forms to search - e.g. '10-K'. Can also be a list of types - e.g. c("10-K", "10-Q") |
reverse_order |
[DEP] If true, order by oldest first instead of newest first |
count |
[DEP] Number of results to return - will always try to return 100 |
page |
Which page of results to return |
stemming |
[DEP] Search by base words (default) or exactly as entered |
name |
Company name OR individual's name. Cannot be combined with 'cik' or 'sic'. |
cik |
Company code to search. Cannot be combined with 'name' or 'sic' |
sic |
[DEP] Standard Industrial Classification of filer to search for. Cannot be combined with 'cik' or 'name'. Special options - 1: all, 0: Unspecified. |
from |
Start date. Must be in the form of 'mm/dd/yyyy'. Must also specify 'to' |
to |
End date. Must be in the form of 'mm/dd/yyyy'. Must also specify 'from' |
location |
Filter based on company's location |
incorporated_location |
boolean to use location of incorporation rather than location of HQ |
A dataframe of results, including the following columns:
filing_date
name
href
company_name
cik
sic
content
parent_href
index_href
try(full_text('intel'))
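The `from` and `to` arguments must be given together, in 'mm/dd/yyyy' form. A minimal sketch of building such a pair from Date values follows; `as_full_text_date` is a hypothetical helper name, not part of the package.

```r
# Format a Date (or ISO "yyyy-mm-dd" string) as mm/dd/yyyy, the form
# described for the `from` and `to` arguments.
# NOTE: `as_full_text_date` is a hypothetical helper.
as_full_text_date <- function(date) {
  format(as.Date(date), "%m/%d/%Y")
}

from <- as_full_text_date("2017-01-01")
to <- as_full_text_date("2017-03-31")
print(c(from = from, to = to))

# Both values would then be supplied together, e.g.:
# try(full_text("intel", type = "10-K", from = from, to = to))
```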
Provides access to the results of the SEC's Mutual Fund search tool.
fund_search(term)
fund_fast_search(identifier)
term |
Search term to search for in a fund name |
identifier |
A Series, Class/Contract ID, Ticker Symbol or CIK |
NOTE: This is really a specific version of the Variable Insurance search tool.
A dataframe of funds found including the following columns -
class_id
class_filings_href
class_name
class_ticker
series_id
series_filings_href
series_name
series_funds_href
cik
cik_name
cik_filings_href
cik_funds_href
fund_fast_search: Performs a 'Fast Search' based on a fund identifier
try(fund_search("precious metals"))
try(fund_fast_search("VMFVX"))
Searches filing headers going back to 1994, excluding the most recent day.
header_search(q, page = 1, from = 1994, to = 2017)
q |
The search string. Documentation here |
page |
Which results page to return (default: 1) |
from |
Start year (default: 1994) |
to |
End year (default: Current year) |
A dataframe of filings found, including the following columns:
company_name
filing_href
form
filing_date
size
try(header_search("company-name = Apple"))
Provides access to the latest SEC filings.
latest_filings( name = "", cik = "", type = "", owner = "include", count = 40, page = 1 )
name |
Optional company name to limit filing results |
cik |
Optional company cik to limit filing results |
type |
Optional form type to limit filing results |
owner |
How to include ownership filings. Options are
|
count |
Number of results to return |
page |
Which page of results to return |
A dataframe of recent results, ordered by descending accepted date. Includes the following columns:
type
href
company_name
company_type
cik
filing_date
accepted_date
accession_number
size
try(latest_filings())
Given a link to filing document (e.g. the 10-K, 8-K) in HTML, process the file into parts and items. This enables follow-up processing of a desired section - e.g. just the Risk Factors. 'item.name' and 'part.name' are taken directly from the document without any attempt to normalize.
parse_filing(x, strip = TRUE, include.raw = FALSE, fix.errors = TRUE)
x |
- URL to a filing HTML document, html text or xml_document |
strip |
- Should non-text elements be removed? Default: true |
include.raw |
- Include unprocessed nodes in result? Default: false |
fix.errors |
- Try to fix document errors (e.g. missing part labels). WIP. Default: true |
NOTE: This has been tested on a range of documents, but formatting differences could cause failures. Please report an issue for any document that isn't parsed correctly.
FURTHER NOTE: Not all filings are well formed - missing headings, bad spacing, etc. These can all throw the parsing off!
a dataframe with one row per paragraph
Detected name of the Part
Detected name of the Item
Text of the paragraph / node
Raw HTML of the node if include.raw = TRUE
try(head(parse_filing(paste0('https://www.sec.gov/Archives/edgar/data/', '712515/000071251517000010/ea12312016-q3fy1710qdoc.htm')), 6))
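Since the result has one row per paragraph, pulling out a single section such as the Risk Factors comes down to filtering on 'item.name'. The sketch below runs on a made-up stand-in data frame; 'part.name' and 'item.name' are named in the description above, while the `text` column name is an assumption for illustration. Because names are not normalized, a case-insensitive pattern match is used instead of an exact comparison.

```r
# Made-up stand-in for a parse_filing() result. The `text` column name is
# an assumption for this sketch; the rows are illustrative only.
parsed <- data.frame(
  part.name = c("PART I", "PART I", "PART II", "PART II"),
  item.name = c("Item 1. Financial Statements",
                "Item 2. Management's Discussion",
                "Item 1A. Risk Factors",
                "ITEM 1A. RISK FACTORS"),
  text = c("paragraph a", "paragraph b", "paragraph c", "paragraph d"),
  stringsAsFactors = FALSE
)

# Item names come straight from the document, so match loosely.
is_risk <- grepl("risk factors", parsed$item.name, ignore.case = TRUE)
risk_factors <- parsed[is_risk, , drop = FALSE]
print(risk_factors$text)
```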
Raw SEC filings are sent as an SGML file. This parses that master submission into component documents, with content lines in list column 'TEXT'.
parse_submission(x, include.binary = T, include.content = T)
x |
- Input submission to parse. May be one of the following:
|
include.binary |
- Default TRUE, determines if the content of binary documents is returned. |
include.content |
- Default TRUE, determines if the content of documents is returned. |
Most of the time the information you need, along with the specific files, will be available by using filing_documents, but there are scenarios where you may want to access the full contents of the master submission:
Older submissions are not parsed into component documents by the SEC so access requires parsing the main filing
The SEC only provides what it considers the relevant documents, but filings often include many more ancillary files
If you're fetching many documents from a filing over many filings, there can be efficiency gains from just downloading a single file.
NOTE: non-text documents are uuencoded and need a separate decoder to be viewed.
a dataframe with one row per document. For the metadata (TYPE, DESCRIPTION, FILENAME) it is important to note that these are provided by the filer and have little standardization or enforcement.
SEQUENCE: Sequence number of the file
TYPE: The type of document, e.g. 10-K, EX-99, GRAPHIC
DESCRIPTION: The filer-provided description of the document
FILENAME: The document's filename
TEXT: The text representation of the document. For text-based documents (txt, html) this is the actual file contents. For binary files (graphics, pdfs) this contains the uuencoded contents.
try( parse_submission(paste0('https://www.sec.gov/Archives/edgar/data/', '37996/000003799617000084/0000037996-17-000084.txt'))[ , c('SEQUENCE', 'TYPE', 'DESCRIPTION', 'FILENAME')] )
Given a link to a filing document (e.g. the 10-K, 8-K) in TXT, process the file into parts and items. This enables follow-up processing of a desired section - e.g. just the Risk Factors. 'item.name' and 'part.name' are taken directly from the document without any attempt to normalize.
parse_text_filing(x, strip = TRUE, include.raw = FALSE, fix.errors = TRUE)
x |
- URL to a filing text document or actual text |
strip |
- Should non-text elements be removed? Default: true |
include.raw |
- Include unprocessed nodes in result? Default: false |
fix.errors |
- Try to fix document errors (e.g. missing part labels). WIP. Default: true |
NOTE: This has been tested on a range of documents, but formatting differences could cause failures. Please report an issue for any document that isn't parsed correctly.
FURTHER NOTE: Not all filings are well formed - missing headings, bad spacing, etc. These can all throw the parsing off!
a dataframe with one row per paragraph
Detected name of the Part
Detected name of the Item
Text of the paragraph / node
Raw HTML of the node if include.raw = TRUE
try(head(parse_text_filing( "https://www.sec.gov/Archives/edgar/data/37996/000003799602000015/v7.txt" )))
SIC code table with structure.
sic_codes
A data frame with 1005 rows and 6 variables:
Standard Industrial Classification
Name of industry
Letter code for the division
Name of the division
Name of the major group, identified by 1st 2 digits of the sic
Name of the group, identified by the 1st 3 digits of the sic
https://www.osha.gov/data/sic-manual
https://www.sec.gov/info/edgar/siccodes.htm
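The major group and group are identified by the first 2 and first 3 digits of the SIC code. A minimal sketch of deriving those prefixes follows. It assumes the codes are available as 4-character strings; if they are stored as integers, pad them first (e.g. with `sprintf("%04d", sic)`) so leading zeros are kept.

```r
# Example 4-character SIC codes, written as strings so leading zeros survive.
sic <- c("3674", "7372", "0100")

# First 2 digits identify the major group, first 3 digits the group.
major_group_prefix <- substr(sic, 1, 2)
group_prefix <- substr(sic, 1, 3)

print(data.frame(sic, major_group_prefix, group_prefix))
```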
EDGAR submissions are organized fairly regularly. These functions help to find the URL to submission components.
submission_index_href(cik, accession)
submission_href(cik, accession)
submission_file_href(cik, accession, filename)
cik |
Company code |
accession |
accession number for a filing |
filename |
filename provided in a submission |
A string with URL requested
submission_href: Creates a link to the master SGML submission file.
submission_file_href: Provides the link to a given file within a particular submission.
submission_index_href("0000712515", "0000712515-17-000090")
submission_href("0000712515", "0000712515-17-000090")
submission_file_href("0000712515", "0000712515-17-000090", "pressrelease-ueberroth.htm")
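To illustrate the regular layout these functions rely on, the sketch below rebuilds the index URL shown in the filing_details example from a CIK and an accession number. `sketch_index_href` is a hypothetical name, and this only mirrors the URL pattern visible in this document; the package's own implementation may differ.

```r
# Rebuild a filing index URL from its parts, following the pattern of the
# index URLs used in the examples in this document.
# NOTE: `sketch_index_href` is a hypothetical helper, not a package function.
sketch_index_href <- function(cik, accession) {
  paste0(
    "https://www.sec.gov/Archives/edgar/data/",
    sub("^0+", "", cik), "/",                       # CIK without leading zeros
    gsub("-", "", accession, fixed = TRUE), "/",    # accession without dashes
    accession, "-index.htm"                         # dashed accession + suffix
  )
}

print(sketch_index_href("0000712515", "0000712515-17-000063"))
```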
Provides access to the results of the SEC's Variable Insurance Product search tool.
variable_insurance_search(term)
variable_insurance_fast_search(identifier)
term |
Search term to search for in a company, fund or contract name |
identifier |
A Series, Class/Contract ID, Ticker Symbol or CIK |
A dataframe of products found including the following columns -
class_id
class_filings_href
class_name
class_ticker
series_id
series_filings_href
series_name
series_funds_href
cik
cik_name
cik_filings_href
cik_funds_href
variable_insurance_fast_search: Performs a 'Fast Search' based on an identifier
try(variable_insurance_search("precious metals"))
try(variable_insurance_fast_search("VMFVX"))