Kensho Extract

Bulk text, table and key-value extraction made easy.

Access the Data Stored Inside Your Documents

Kensho Extract has been trained on millions of financial documents to make it easier to access the data hidden inside your files. Whether you’re looking to make your financial documents machine-readable, trying to tie table data into your proprietary database, or looking for specific data points across multiple documents, Kensho Extract can help.

See for yourself!

Document Segmentation & Layout Analysis

Kensho Extract is a fundamental machine learning capability that allows users to get access to the data stored inside their financial documents in a simple-to-use format for further analysis and action. Kensho Extract can be used independently or in conjunction with other services offered by Kensho.

Document being processed by Kensho Extract

Combining our document layout analysis and table structure recognition models, Kensho Extract allows users to quickly transform their unstructured documents into a machine-readable format that organizes the headers, titles, paragraphs, tables and footers detected within the document in their natural reading order. Our extraction capability interprets messy page layouts, structuring text into cohesive paragraphs that can be effectively analyzed and searched.
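
To make the idea of a machine-readable, reading-order output more concrete, here is a minimal Python sketch. The `Segment` type, its field names, and the sample data are assumptions made purely for illustration; they are not Kensho Extract's actual output schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    kind: str   # hypothetical label: "header", "title", "paragraph", "table" or "footer"
    order: int  # hypothetical field: position in natural reading order
    text: str   # extracted text for this segment


def body_text(segments: List[Segment]) -> str:
    """Join titles and paragraphs in reading order, skipping headers, footers and tables."""
    ordered = sorted(segments, key=lambda s: s.order)
    return "\n".join(s.text for s in ordered if s.kind in {"title", "paragraph"})


# Sample segments, deliberately out of order to show the sort.
sample = [
    Segment("footer", 3, "Page 1"),
    Segment("paragraph", 2, "Revenue grew year over year."),
    Segment("title", 1, "Annual Report"),
    Segment("header", 0, "ACME Corp"),
]

print(body_text(sample))
```

With segments labeled this way, downstream search or analysis can work on clean body text without page furniture such as headers and footers.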

Kensho Extract will work with you!

Kensho Extract can be accessed in one of three ways:

A simple, easy-to-use API for fast, programmatic, high-throughput extraction.
An intuitive UI for your team to review extraction results, make corrections and, optionally, train our machine learning models.
A full-service human-in-the-loop solution, providing you with the best possible data extraction quality.

Human-In-The-Loop (HITL) Services

Automated extraction services are never 100% perfect, but in partnership with S&P Global, Kensho provides you with the best possible experience. The human-in-the-loop service can be staffed on your end with access to our UI, or by Kensho, to allow a more hands-off approach to achieving the highest possible data extraction quality for your unique specifications.

Kensho Extract Use Cases

Text Extraction
Parse your documents and turn them into an easy-to-consume, machine-readable format.
Table Extraction
Find and extract the tables you care about for easy analysis or database updates.
Extract text and tables while maintaining page structure for easy translation to other languages.
Key-Value Extraction
Find specific values in your documents to reduce your manual data operations efforts.
Augment your documents by pairing Kensho Extract with our NERD and LINK services.
Standardize the data you extract across different document types, companies, industries or geographies into a unified dictionary of terms.
Natural Language Processing (NLP)
Make it easy to run your own NLP models on documents without having to deal with data extraction or structuring yourself.
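
As an illustration of the key-value standardization mentioned above, the following sketch maps differently worded labels onto one unified vocabulary. The synonym table, label names, and values are all hypothetical; in practice the mapping would come from your own data dictionary, and nothing here reflects how Kensho Extract does this internally.

```python
from typing import Dict

# Hypothetical mapping from labels as they appear in documents to unified terms.
UNIFIED_TERMS: Dict[str, str] = {
    "total revenue": "revenue",
    "net sales": "revenue",
    "turnover": "revenue",
    "net income": "net_income",
    "profit for the year": "net_income",
}


def standardize(pairs: Dict[str, str]) -> Dict[str, str]:
    """Rename extracted keys to unified terms; unknown keys are kept in normalized form."""
    result: Dict[str, str] = {}
    for key, value in pairs.items():
        normalized = " ".join(key.lower().split())  # lowercase and collapse whitespace
        result[UNIFIED_TERMS.get(normalized, normalized)] = value
    return result


# Made-up key-value pairs, as they might be extracted from two different filings.
extracted = {"Net  Sales": "1,200", "Profit for the Year": "310", "Employees": "42"}
print(standardize(extracted))
```

A lookup table like this is the simplest possible approach; it only shows the goal of ending up with the same key for the same concept across document types.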

Frequently Asked Questions (FAQs)

Do you support any type of document?

We support any type of document that contains readable text, though poorly formatted documents are likely to result in lower extraction quality. Kensho Extract performs best with PDF files.

Do you support languages beyond English?

Yes, we support extraction in any language, although performance will be better for left-to-right languages.

Do you support table extraction?

Yes! You can extract tables and text from documents in their correct reading order.

Do you ever miss tables?

No extraction model is perfect, which is why our intuitive UI lets you draw a boundary around any table we missed and quickly resolve the error.

I only care about a single table in each document (e.g. the income statement). Can you automate its extraction?

With some training, Kensho Extract will be able to identify and send back just the table(s) or section(s) that interest you, leaving out everything else.
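
The effect of this kind of selection can be pictured as a filter over the extracted tables. The sketch below matches on a table title by keyword, which is a deliberately simple stand-in and not the trained approach described above; the `Table` type and its fields are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Table:
    title: str             # hypothetical: caption or heading nearest the table
    rows: List[List[str]]  # cell text, row by row


def select_tables(tables: List[Table], keyword: str) -> List[Table]:
    """Return only the tables whose title contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [t for t in tables if needle in t.title.lower()]


# Made-up tables standing in for the output of one document.
tables = [
    Table("Consolidated Income Statement", [["Revenue", "1,200"]]),
    Table("Balance Sheet", [["Total assets", "5,000"]]),
]

for table in select_tables(tables, "income statement"):
    print(table.title)
```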

Why Kensho Extract?

Structured data is valuable. Whether you’re…

...adding data to your CRM
...looking to invest in a new startup
...trying to better understand your financial operations
...starting a new consulting project

The fundamental building block for all of these initiatives is access to clean, structured data.

Unfortunately, the data most companies have is neither structured nor clean. Whether hidden in slide decks, PDFs, or a database that has mutated a dozen times since inception, data is frequently all but inaccessible. That is, unless you’re willing to invest a lot of incredibly valuable expert time in trying to understand the information and then structuring it via liberal use of spreadsheets.

We feel your pain.

S&P Global employs thousands of trained analysts who process more than 5 million pages of financial content each year. Luckily, all that effort has created one of the largest sets of machine learning training data for corporate financial documents, allowing us to speed up our internal operations by anywhere from 50% to 100%, depending on the task at hand.

Let us help you too