What is Data Science?

PSTAT 234 (Fall 2025)

Sang-Yun Oh

University of California, Santa Barbara

Data Science Example: Google Ads

  • Backbone of Google’s revenue model

  • 80.20% share of the pay-per-click (PPC) ad market ($237.8 billion in revenue in 2023)

  • Precision targeting: search intent, demographics, location, time

  • Scalable for all businesses: $50 to $5M budgets

  • Global reach with local control

  • “Leveling the Playing Field: Great for Small Businesses”

  • 82% of small businesses attribute revenue growth to digital ads and 79% say these tools help them compete with larger companies.

  • Has faced accusations of anti-competitive practices, market dominance, and potentially favoring its own services in search results.

How Google Ads Work

For this example, assume we are looking at search ads.

  • Advertisers bid for ads to appear during searches.
  • An auction determines which ads are shown based on bid and estimated user interest.
  • The highest-placing ads are shown to users.
  • Advertisers pay Google if users click on the ads.
  • Enhanced Campaigns allow customized bidding based on user context (e.g., time, device, location).
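
The auction step above can be sketched in a few lines. This is only an illustration, not Google's actual auction: the scoring rule (bid times estimated click probability), the `Ad` class, the `est_ctr` field, and the example numbers are all assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    advertiser: str
    bid: float      # amount the advertiser offers per click
    est_ctr: float  # hypothetical estimate of user interest (click probability)


def rank_ads(ads, slots):
    """Return the top `slots` ads, ordered by bid * estimated interest."""
    # Assumption: the score is a simple product. The slides only say that
    # both the bid and the estimated user interest enter the auction.
    ranked = sorted(ads, key=lambda ad: ad.bid * ad.est_ctr, reverse=True)
    return ranked[:slots]


ads = [
    Ad("A", bid=2.00, est_ctr=0.02),
    Ad("B", bid=1.00, est_ctr=0.05),
    Ad("C", bid=3.00, est_ctr=0.01),
]
shown = rank_ads(ads, slots=2)
print([ad.advertiser for ad in shown])  # ['B', 'A']
```

In this toy example the highest bidder (C) is not shown, because its estimated user interest is low.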

Internet Advertising Data

Advertising data often have the following variables:

Variable  Description
impr      Number of ad impressions (ads shown)
click     Number of clicks
cost      What advertisers paid
conv      Number of conversions (purchases)
value     Value of conversions as reported by advertisers
cpm       Cost per impression (cost/impr)
cpc       Cost per click (cost/click)
cpa       Cost per conversion (cost/conv) (or 0 if conv is 0)
cpr       Cost per return (cost/value) (or 0 if value is 0)
roi       Return on investment (value/cost)
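
As a minimal sketch, the ratio variables can be computed from the five raw quantities as follows. The function name `ad_metrics` and the example numbers are hypothetical. The table specifies the zero-denominator convention only for cpa and cpr; applying it to cpm, cpc, and roi as well is an assumption.

```python
def ad_metrics(impr, click, cost, conv, value):
    """Compute cpm, cpc, cpa, cpr, and roi from raw counts and amounts."""

    def ratio(num, den):
        # Return 0 when the denominator is 0, as the table does for
        # cpa and cpr (extended to the other ratios by assumption).
        return num / den if den else 0.0

    return {
        "cpm": ratio(cost, impr),   # cost per impression
        "cpc": ratio(cost, click),  # cost per click
        "cpa": ratio(cost, conv),   # cost per conversion
        "cpr": ratio(cost, value),  # cost per return
        "roi": ratio(value, cost),  # return on investment
    }


# Made-up numbers for one advertiser
m = ad_metrics(impr=1000, click=50, cost=100.0, conv=5, value=400.0)
print(m["cpc"], m["cpa"], m["cpr"], m["roi"])  # 2.0 20.0 0.25 4.0
```

Note that cpr and roi are reciprocals of each other whenever both cost and value are nonzero.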

Data variations

Prefix indicates platform
m.*     Mobile, e.g. m.impr
d.*     Desktop/tablet, e.g. d.impr

Suffix indicates when
*_pre   Before experiment, e.g. m.impr_pre
*_post  In experiment, e.g. m.impr_post
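
The naming convention can be decoded mechanically. The sketch below splits a column name into platform, variable, and period; the helper `parse_column` and the two lookup dictionaries are hypothetical names introduced for illustration.

```python
PLATFORMS = {"m": "mobile", "d": "desktop/tablet"}
PERIODS = {"pre": "before experiment", "post": "in experiment"}


def parse_column(name):
    """Split a name like 'm.impr_pre' into (platform, variable, period)."""
    prefix, _, rest = name.partition(".")      # 'm', 'impr_pre'
    variable, _, suffix = rest.partition("_")  # 'impr', 'pre'
    # The period is None when the name has no suffix, e.g. 'm.impr'.
    # An unknown prefix raises KeyError.
    return PLATFORMS[prefix], variable, PERIODS.get(suffix)


print(parse_column("m.impr_pre"))   # ('mobile', 'impr', 'before experiment')
print(parse_column("d.cost_post"))  # ('desktop/tablet', 'cost', 'in experiment')
```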

From the data, derive new features for analysis:

Derived features
error.cpr*   m.cpr - d.cpr (pre, post)
mult.change  Change in mobile multiplier
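
A minimal sketch of the first derived feature, error.cpr, computed separately for the pre and post periods. The column names (e.g. `m.cost_pre`) follow the prefix/suffix convention above but are assumed, as are the example numbers. mult.change is not sketched because the slides do not define how the mobile multiplier is computed.

```python
def cpr(cost, value):
    """Cost per return (cost/value), or 0 if value is 0."""
    return cost / value if value else 0.0


def error_cpr(row, period):
    """Mobile cpr minus desktop cpr for one period ('pre' or 'post')."""
    m_cpr = cpr(row[f"m.cost_{period}"], row[f"m.value_{period}"])
    d_cpr = cpr(row[f"d.cost_{period}"], row[f"d.value_{period}"])
    return m_cpr - d_cpr


# Made-up record for one advertiser
row = {
    "m.cost_pre": 50.0, "m.value_pre": 100.0,
    "d.cost_pre": 100.0, "d.value_pre": 400.0,
    "m.cost_post": 60.0, "m.value_post": 240.0,
    "d.cost_post": 100.0, "d.value_post": 400.0,
}
print(error_cpr(row, "pre"))   # 0.25
print(error_cpr(row, "post"))  # 0.0
```

A positive value means the advertiser spends more per dollar of value on mobile than on desktop in that period.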

Common Questions from Advertisers

  • How much am I paying for ads? On desktop? On mobile?
  • How much am I paying for clicks?
  • How much am I paying for conversions (purchases)?
  • How much was spent for each dollar of value generated (cost per return)?
  • How much am I getting back in comparison to what I spent? (return on investment)

Data Science is …

  • A multi-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data (reference)

  • Merger of statistics, data analysis, machine learning and their related methods in order to understand and analyze actual phenomena with data (reference)

  • Composed of techniques and theories drawn from many fields within the context of mathematics, statistics, computer science, and information science. (reference)

[International Statistical Review]

  • Domain expertise: data analysis collaborations in subject matter areas.

  • Mathematics/Statistics: models, estimation, and distribution based on probabilistic inference.

  • Computing: hardware and software; computational algorithms

  • Theory: foundations of data science; mathematical investigations of models and methods

Data Science Approach

[UC Berkeley, School of Information]

Statistics and Data Science

  • Many statistical methods make (optimistic) assumptions.

  • Data science often focuses on practical benefits (predictions).

  • Data science often iterates to make improvements.

  • The data science process emphasizes the entire data science lifecycle.

  • Data science develops processes (often custom built)

Data Science Diagram

Drew Conway: The Data Science Venn Diagram
