Some Thoughts on Data Science

by Prof. Seungchul Lee
Industrial AI Lab

1. Get Involved in Data Science Now

  • Most researchers are interested in how data science and machine learning techniques can be applied to their own domains
  • However, you will still need to spend substantial time learning the domain itself

Data Problems We Would Like to Solve

Solving with Deep Learning

  • When you face a machine learning problem with “traditional” features (i.e., human-interpretable characteristics of the data),
    • do not reach for deep learning methods first
    • Instead, use
      • linear regression/classification,
      • linear regression/classification with non-linear features, or
      • gradient boosting methods
  • If you then need to squeeze out a further 1-2% improvement in performance, you can apply deep learning
    • That said, deep learning has undeniably made remarkable progress on data with spatial or sequential structure, such as images, audio, or text
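
As a rough illustration of the baseline-first advice above, the sketch below compares plain linear regression with linear regression on hand-built non-linear features, using only NumPy. The synthetic quadratic dataset, the helper names (`fit_least_squares`, `predict`, `quadratic_features`), and the train/test split are all assumptions made for this example, not material from the lecture. Gradient boosting is left out to keep the sketch free of extra dependencies.

```python
import numpy as np


def fit_least_squares(features, targets):
    """Return least-squares weights for a design matrix with a bias column added."""
    design = np.column_stack([np.ones(len(features)), features])
    weights, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return weights


def predict(features, weights):
    """Apply the fitted weights (bias first) to a feature matrix."""
    design = np.column_stack([np.ones(len(features)), features])
    return design @ weights


def mean_squared_error(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))


def quadratic_features(x):
    """Hand-built non-linear features [x, x^2] for a 1-D input."""
    return np.column_stack([x, x**2])


# Synthetic data (an assumption for illustration): y depends quadratically on x.
rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, size=200)
y = 0.5 * x**2 - x + 2.0 + rng.normal(scale=0.3, size=200)

# Hold out the last 50 points for evaluation.
x_train, x_test = x[:150], x[150:]
y_train, y_test = y[:150], y[150:]

# Baseline 1: plain linear regression on the raw feature.
w_linear = fit_least_squares(x_train[:, None], y_train)
mse_linear = mean_squared_error(y_test, predict(x_test[:, None], w_linear))

# Baseline 2: linear regression on the non-linear features.
w_quad = fit_least_squares(quadratic_features(x_train), y_train)
mse_quad = mean_squared_error(y_test, predict(quadratic_features(x_test), w_quad))

print(f"linear MSE: {mse_linear:.3f}, quadratic-feature MSE: {mse_quad:.3f}")
```

On data like this, the model with non-linear features should show a much lower held-out error than the raw linear fit. A cheap comparison of this kind is worth running before moving to a deep network.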

What About “Superhuman” Machine Learning?

  • It’s a common misconception that machine learning will outperform human experts on most tasks
    • No, most machine learning in practice is supervised learning
    • A supervised model generally cannot be better than its training data
  • In most cases, the benefit of machine learning doesn’t come from superhuman performance;
    • it comes from the ability to scale up expert-level performance extremely quickly

Dealing with Impossible Problems

  • Suppose you’ve built a tool to manually classify examples, run through many cases (or had a domain expert run through them), and still gotten poor performance
  • What do you do?
    • Do not throw more, or bigger, machine learning algorithms at the problem
  • Instead, change the problem by:
    • 1) changing the input (i.e., the features),
    • 2) changing the output (i.e., decomposing it into smaller sub-problems)

Machine Learning vs. Deep Learning

  • State-of-the-art until 2012

  • Deep supervised learning

  • Hyperparameters

    • Learning rate
    • Number of iterations
    • Number of hidden layers
    • Number of hidden units
    • Choice of activation functions
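
To make the list above concrete, here is a minimal sketch of where each hyperparameter enters a small fully connected network trained by full-batch gradient descent. Everything in it is an assumption for illustration: the `Hyperparameters` dataclass, the `train_mlp` helper, the default values, and the XOR toy dataset. It is not a recommended configuration.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Hyperparameters:
    """Illustrative defaults only; none of these values come from the lecture."""
    learning_rate: float = 0.5
    num_iterations: int = 2000
    num_hidden_layers: int = 1
    num_hidden_units: int = 8
    activation: str = "tanh"  # "tanh" or "relu"


# Each entry: (activation, derivative written in terms of the activation output).
ACTIVATIONS = {
    "tanh": (np.tanh, lambda a: 1.0 - a**2),
    "relu": (lambda z: np.maximum(z, 0.0), lambda a: (a > 0).astype(float)),
}


def train_mlp(X, y, hp, seed=0):
    """Train a binary classifier; return weights, biases, and per-iteration loss."""
    rng = np.random.default_rng(seed)
    act, act_grad = ACTIVATIONS[hp.activation]
    sizes = [X.shape[1]] + [hp.num_hidden_units] * hp.num_hidden_layers + [1]
    weights = [rng.normal(scale=1.0 / np.sqrt(m), size=(m, n))
               for m, n in zip(sizes[:-1], sizes[1:])]
    biases = [np.zeros(n) for n in sizes[1:]]
    losses = []

    for _ in range(hp.num_iterations):
        # Forward pass: hidden layers use the chosen activation, output is a sigmoid.
        layer_outputs = [X]
        for W, b in zip(weights[:-1], biases[:-1]):
            layer_outputs.append(act(layer_outputs[-1] @ W + b))
        logits = layer_outputs[-1] @ weights[-1] + biases[-1]
        probs = 1.0 / (1.0 + np.exp(-logits))

        clipped = np.clip(probs, 1e-12, 1.0 - 1e-12)
        loss = -np.mean(y * np.log(clipped) + (1 - y) * np.log(1 - clipped))
        losses.append(float(loss))

        # Backward pass: sigmoid plus cross-entropy gives (probs - y) at the output.
        delta = (probs - y) / len(X)
        for i in reversed(range(len(weights))):
            grad_W = layer_outputs[i].T @ delta
            grad_b = delta.sum(axis=0)
            if i > 0:
                # Propagate the error before this layer's weights are updated.
                delta = (delta @ weights[i].T) * act_grad(layer_outputs[i])
            weights[i] -= hp.learning_rate * grad_W
            biases[i] -= hp.learning_rate * grad_b

    return weights, biases, losses


# XOR toy problem (an assumption for illustration).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [0.0]])

hp = Hyperparameters()
weights, biases, losses = train_mlp(X, y, hp)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Changing a single field, for example `Hyperparameters(activation="relu", num_hidden_layers=2)`, and comparing the loss curves is a simple way to see how sensitive training is to each choice.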

2. Study Materials

How Should You Study Artificial Intelligence?

  • Do not begin your study of artificial intelligence with deep learning.
  • Linear algebra, Optimization, Statistics, Probability, Machine Learning
    • Then deep learning
  • (Numerical or Scientific) Computer Programming
    • MATLAB or Python
    • Concepts, equations, code

Most of the lecture content was selectively compiled from the materials of the researchers listed below.

1) Linear Algebra

2) Optimization and Linear Systems

3) Machine Learning

4) Deep Learning

University Lectures on Deep Learning

  • CMU

  • NYU

  • MIT

  • Toronto


Korean-Language Lectures
