GoVertical presents

Vertical ML/AI Startup Creation Weekend

Hosted by Madrona Venture Labs & TiE Seattle

As a free benefit for participants, we would like to extend an invitation to the Amazon SageMaker workshop on Feb 14 from 1p-5p.


Insurance resources

Welcome to the Insurance vertical page! To make the most of your time during the event weekend, please review our key educational materials and data sets.

Be prepared! Start thinking through what types of data could power your business and product ideas. Often a combination of multiple, disparate data sets yields the most ingenious ideas and solutions!

Panel videos

The following videos were recorded during the April 19 Panel event. You may wish to reference them in preparation for the weekend ML event.

ML Panel moderated by Dan Weld. Panelists: Xin Luna Dong, Yejin Choi & Kevin Jamieson


VC Panel moderated by Jay Bartot. Panelists: Tim Porter, Mike Miller, Pradeep Rathinam & Ankur Teredesai


Sector analysis

Vertical description

Insurance is a contract, represented by a policy, in which an individual or entity receives financial protection or reimbursement against losses from an insurance company. The company pools clients' risks to make payments more affordable for the insured. Insurance policies are used to hedge against the risk of financial losses, both big and small, that may result from damage to the insured or their property, or from liability for damage or injury caused to a third party. The main theme for this vertical is using new technology, particularly ML/AI, to improve decision making, decrease costs, and improve the experience for consumers. There are opportunities to disrupt the sector in many ways, from smartphone insurance to business insurance.

How big an opportunity space is this, how is it growing, and what’s driving that growth?  

What are the segments/pockets?

The consumer insurance market can be segmented into Life, P&C (property and casualty), and Health insurance. The US insurance markets are all growing at single-digit rates and are driven by the same trends: consumers embracing technology, new potential customers, and a willingness to change (e-visits, more sensors, data collection, InsurTech).

What is the technology spend and trend in this category, or the revenue growth rate of companies in the category?

IT spend is estimated to go up by 4.7% globally in the insurance industry. Over the last 2 years, IT spend in the US insurance industry has gone up ~7%. Global revenue growth is about the same at 4.6% YoY, and much higher in emerging areas.

Insurance Industry IT Spending

What has been the VC investing trend in this category?  

What are the proof points that success may be rewarded?

At a high level, what problems are there to be solved using technology?  

What current trends are driving change in this category?  

How specifically can ML/AI change the game in this category?  

Investment hypothesis / rationale

Potential vulnerabilities

What adverse conditions / headwinds are there for a play in this space? What makes it difficult?

Data sets

Your novel business idea should be grounded in real-world data with plausible machine-learning/analytics on top. We've compiled a collection of datasets from which to gain inspiration. Note that you are not restricted to basing your idea on the data sets below. You may discover other open source data sets that inspire your creativity or you may bring your own proprietary data sets if you wish.

Many of the datasets below are from Kaggle, Figure-Eight (Crowdflower), Data.World, etc. The advantage of these datasets is that many have been cleaned and normalized and are ready to be explored with ML and data science tools. Note that these datasets are often intended for research purposes only. Be sure to read any associated license agreements to understand whether there are commercial restrictions if you plan to continue using the data after the workshop is over.

Sample Data Sets

Can computer vision spot distracted drivers?

Fatal car crashes for 2015-2016

Prediction of the charges of insurance based on information given by the people

In this dataset, you are provided over a hundred variables describing attributes of life insurance applicants. The task is to predict the "Response" variable for each Id in the test set. "Response" is an ordinal measure of risk that has 8 levels.

This data set, used in the CoIL 2000 Challenge, contains information on customers of an insurance company. The data consists of 86 variables and includes product usage data and socio-demographic data.

This folder contains data behind the story Dear Mona, Which State Has The Worst Drivers?

Misc US Census data sets
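
For the life insurance applicant data set above, where "Response" is an ordinal risk measure with 8 levels, plain accuracy treats a miss by one level the same as a miss by seven. One common way to score ordinal predictions is quadratic weighted kappa, which penalizes larger misses more heavily. The sketch below is only an illustration: the function name is hypothetical, and it assumes the levels are coded as the integers 1 through 8, which the description above does not state.

```python
from collections import Counter


def quadratic_weighted_kappa(actual, predicted, num_levels=8):
    """Score agreement between two ordinal ratings (1.0 means perfect agreement).

    Assumption: levels are integers 1..num_levels.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(actual)
    observed = Counter(zip(actual, predicted))  # counts of (actual, predicted) pairs
    actual_hist = Counter(actual)
    predicted_hist = Counter(predicted)

    numerator = 0.0    # weighted disagreement actually observed
    denominator = 0.0  # weighted disagreement expected by chance
    for i in range(1, num_levels + 1):
        for j in range(1, num_levels + 1):
            # Penalty grows with the squared distance between levels.
            weight = (i - j) ** 2 / (num_levels - 1) ** 2
            expected = actual_hist[i] * predicted_hist[j] / n
            numerator += weight * observed[(i, j)]
            denominator += weight * expected

    if denominator == 0:
        # Both rating lists sit on one identical level, so they agree fully.
        return 1.0
    return 1.0 - numerator / denominator


if __name__ == "__main__":
    truth = [1, 2, 3, 8, 8, 5]
    guess = [1, 2, 4, 8, 7, 5]
    print(round(quadratic_weighted_kappa(truth, guess), 3))
```

A score near 1 means predictions track the true risk level closely, a score near 0 means no better than chance, and negative scores mean systematic disagreement.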