Big Mountain Data Spring Event Sessions


Data Science

Intro to Deep Learning
  14  

This session is for anyone exploring the exciting field of deep learning. Topics include: a brief overview of deep learning and the current state of the field (separating the hype from the reality), a discussion of how deep learning fits in the data scientist's toolbox, an outline of pragmatic steps for getting your first deep learning project off the ground, and a live demo. Attendees should have an understanding of machine learning and some familiarity with neural networks.

Level 300 - (Intermediate): Basic knowledge of subject matter is suggested
Duration: Hour
Presenter: Devin Didericksen


Model Text Like A Rockstar
  12  

Unstructured text is a hot topic when building models for predictive value. Most discussions have focused on the text itself; however, many secondary features can be extracted using IBM's Watson, NLTK, and other libraries to boost the predictive value of your text data. This talk will also show you how to build better models than typical Bayesian approaches. Other topics include feature reduction when dealing with hundreds of thousands of text inputs for a model, as well as text feature insight. The talk is advanced, but for those entering this space I plan to include enough foundation for you to follow along.

Level 400 - (Advanced): Experience with subject matter is strongly recommended
Duration: Hour
Presenter: Ben Taylor


Python Pandas
  11  

Pandas is a tool for manipulating tabular data. It is one of the core "data science" tools in the Python stack. Come see the basic operations and how it is used by companies big and small for:

* Munging data
* Basic statistics
* Plotting
* Handling CSV files
* Pivoting data
* Feeding data into ML

Level 100 - Introduction
Duration: Half Hour
Presenter: Matt Harrison
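
As a rough illustration of several operations this abstract lists (CSV handling, munging, basic statistics, and pivoting), here is a minimal pandas sketch. The sample data, column names, and variable names are invented for illustration and are not taken from the session.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a CSV file on disk.
CSV_TEXT = """region,quarter,sales
West,Q1,100
West,Q2,150
East,Q1,80
East,Q2,120
"""

# Handling CSV files: read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(CSV_TEXT))

# Munging: add a derived column (sales in thousands).
df["sales_k"] = df["sales"] / 1000

# Basic statistics: mean sales per region.
mean_by_region = df.groupby("region")["sales"].mean()

# Pivoting: one row per region, one column per quarter.
pivot = df.pivot_table(
    index="region", columns="quarter", values="sales", aggfunc="sum"
)

print(mean_by_region)
print(pivot)
```

Plotting is left out here because it needs a plotting backend such as matplotlib.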


Lessons Learned from Taylor Swift Competition
  7  

A recent predictive modeling competition highlighted key takeaways to remember when building models. This session will cover the logic behind two submissions (one finished second, one sixth) and conclude with points to remember when doing this in anger.

Level 300 - (Intermediate): Basic knowledge of subject matter is suggested
Duration: Hour
Presenter: Anthony Power


Examining Customer Behavior Over Time
  6  

Do you want to describe or predict customer performance? Do you have row-by-row transaction tables for your customers? We'll start by walking through distilling summary information from transactional data. We'll develop summaries by customer and rank them. Next we'll add simple tests of significance comparing across two time periods. Finally, we'll take a brief look at repeated measures modeling (using linear mixed models). Demos are in R and SAS, with code provided. Most of this assumes only some familiarity with data transformation. The concluding bits assume a modest background in statistics.

Level 300 - (Intermediate): Basic knowledge of subject matter is suggested
Duration: Hour
Presenter: Doug Tharp
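
The first two steps in this abstract (distilling per-customer summaries from transaction rows, then ranking them) might look roughly like the sketch below. The session's demos are in R and SAS; Python is used here only as a stand-in, and the data, field layout, and function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical transactions: (customer_id, period, amount).
transactions = [
    ("alice", "2014", 120.0),
    ("bob", "2014", 40.0),
    ("alice", "2015", 200.0),
    ("bob", "2015", 35.0),
    ("carol", "2015", 90.0),
]


def summarize(rows):
    """Collapse row-by-row transactions into one summary per customer."""
    summary = defaultdict(lambda: {"count": 0, "total": 0.0})
    for customer, _period, amount in rows:
        summary[customer]["count"] += 1
        summary[customer]["total"] += amount
    return dict(summary)


def rank_by_total(summary):
    """Return customer ids ordered from highest to lowest total spend."""
    return sorted(summary, key=lambda c: summary[c]["total"], reverse=True)


summary = summarize(transactions)
ranking = rank_by_total(summary)
print(ranking)
```

The significance tests and mixed models mentioned in the abstract are not covered by this sketch.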



Data Operations

Pragmatic Steps to Implement Big Data Analytics
  15  

How to approach and deliver on a big data or analytics project. Do you have a tough problem that begs a tough solution? We discuss why and how to complete the project without rockstars, ninjas, or unicorns. A few case studies will be presented for discussion. Attendee participation is encouraged as we discuss different vendor capabilities and technologies.

Level 200 - (Beginner): Introductory / fast moving
Duration: Hour
Presenter: Alton Alexander


Hadoop 101
  14  

This presentation is for experienced database professionals who are new to Hadoop. Attendees will learn about the design principles behind Hadoop and its basic components, such as HDFS and MapReduce. Modern components of the Hadoop ecosystem will also be covered, such as Hive, Pig, Spark, Oozie, Sqoop, etc. Various real-time SQL implementations will also be reviewed. Attendees will discover some of the typical use cases and blueprints for Hadoop deployments, such as data-warehouse offload, event and log processing, and data exploration platforms.

Level 200 - (Beginner): Introductory / fast moving
Duration: 75 Min
Presenter: Alex Gorbachev
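
For readers new to the MapReduce model named in this abstract, the sketch below imitates its three phases (map, shuffle, reduce) in a single Python process using a word count. It does not use any Hadoop API; the function names and the sample input are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the grouped values for one key into a total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data big mountain", "Data science"])
print(counts)
```

In a real cluster the map and reduce calls would run in parallel across many machines, with the framework handling the shuffle between them.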


Hands on Spark - Workshop
  8  

This is a hands-on workshop for professionals who are experienced with databases, query patterns, etc. and are looking to get started on Spark. Seats are limited and available on a first-come, first-served basis. Attendees will be given VMs that run a Hadoop/Spark cluster. We will go over the basics of ingesting data into Hadoop and using the MapReduce paradigm. We will discuss Spark internals at a high level and then dive into hands-on uses such as ETL, SQL on Spark, running simple clustering algorithms, etc.

Level 200 - (Beginner): Introductory / fast moving
Duration: 75 Min
Presenter: Anant Asthana


Integrating a RDBMS with Hadoop and Big Data Technologies
  7  

Our company built a system mixing Big Data technologies (Hadoop/Elasticsearch) with SQL Server/RDBMS to make a system that is both highly scalable and cost-effective. In this session I’ll walk you through the ETL process of pulling data through Sqoop, transforming data in Hive, and presenting a denormalized table in Hive. If you are looking to understand how to get data from a relational database (RDBMS) into Hadoop and leverage its parallel architecture, this is the session for you.

Level 100 - Introduction
Duration: Hour
Presenter: Pat Wright


Systems Architecture Solutions for Big Data: From Cloud to Bare Metal
  7  

This session begins with a brief introduction to the Big Data processing problem, followed by a comparison of different approaches that organizations have used to meet Big Data processing needs. We will compare cloud solutions, including AWS, Google, Rackspace, and MS Azure. For private cloud deployments, we'll cover solutions from Oracle, EMC/Pivotal, NetApp, Dell, and SuperMicro. Additionally, relative compute power and cost will be addressed using documented benchmarks for these architectures. We'll wrap up with a forward-looking view of breakthrough systems technologies for Big Data and some of the ridiculous processing feats that organizations have pulled off.

Level 300 - (Intermediate): Basic knowledge of subject matter is suggested
Duration: Hour
Presenter: Brett Weninger


NoSQL & Hadoop
  7  

Hadoop and NoSQL. They are similar only in that they are both classified as software and do not use SQL. When learning about new technology, it often helps to put two things side by side to better understand the differences, the job each is trying to get done, and how they often fit together in the wild. This talk is an architect’s perspective on what these technologies solve, tools that make them easier to live with, and suggestions for how to fit them together within your business.

Level 200 - (Beginner): Introductory / fast moving
Duration: Hour
Presenter: Randy Secrist




The unconference track is designed to create sessions and ideas on the fly. We want to allow our attendees to use a room throughout the day for open conversations and concepts. Everyone can vote on the ideas they want to see, and then we will publish them in the schedule. Feel free to submit any topic or idea for discussion.
