Projects · Data Storytelling
The Hip-Hop Database
138 artists. 40 variables. A from-scratch dataset built to make measurement theory concrete — and to start arguments in the best possible way.
<div class="hh-stat">
<div class="hh-stat-num">138</div>
<div class="hh-stat-label">Artists</div>
</div>
<div class="hh-stat">
<div class="hh-stat-num">40</div>
<div class="hh-stat-label">Variables</div>
</div>
<div class="hh-stat">
<div class="hh-stat-num">6</div>
<div class="hh-stat-label">Eras</div>
</div>
<div class="hh-stat">
<div class="hh-stat-num">8</div>
<div class="hh-stat-label">Lyrical Styles</div>
</div>
<div class="hh-stat">
<div class="hh-stat-num">19</div>
<div class="hh-stat-label">Deceased</div>
</div>
<a href="hiphop_periodic_table.html" class="hh-btn-primary">Open Periodic Table →</a>
<a href="hiphop_dashboard.html" class="hh-btn-secondary">Open Dashboard →</a>
Explore
What’s in this project
138 artists arranged as chemical elements — grouped by lyrical style,
ordered by era, assigned a unique atomic weight. Click any element
for the full artist profile. Filter by region, era, or confidence tier.
Six interactive tabs walking through construct validity, sampling bias,
the aggregation problem, uncertainty communication, and visualization
design — all using the same underlying dataset.
The living source file: 40 columns including five lyricism dimension scores,
Billboard chart data, Grammy and award counts, career span, platinum albums,
atomic weight formula, deceased status, and confidence flags.
Methodology
How the data was built
This dataset was built in three layers. The first layer is subjective: five lyricism dimensions scored 1–10 using published critical analysis, academic literature, and lyrical scholarship as source material. The second layer is objective: verifiable commercial and biographical data drawn from Billboard, RIAA, and award body records. The third layer is derived: composite scores and atomic weights calculated from the first two, plus confidence flags recording how well documented each artist's lyricism scores are.
<div class="method-block-title">Rhyme Density</div>
<div class="method-block-text">Internal rhyme complexity, multisyllabic patterns, and technical flow construction.</div>
<div class="method-block-title">Vocab Breadth</div>
<div class="method-block-text">Lexical diversity, unique word usage, and linguistic range across recorded output.</div>
<div class="method-block-title">Storytelling</div>
<div class="method-block-text">Narrative coherence, character construction, and scene-building across tracks.</div>
<div class="method-block-title">Metaphor / Imagery</div>
<div class="method-block-text">Figurative language density, originality, and conceptual reach.</div>
<div class="method-block-title">Conceptual Depth</div>
<div class="method-block-text">Thematic ambition, philosophical weight, and consistency of ideas across a body of work.</div>
"Lyricism" is a single-word construct that we decomposed into five measurable dimensions.
That choice is not neutral — it reflects assumptions about what counts as lyricism and
whose critical tradition we used to define it. This is the construct validity question
that underlies every KPI, survey instrument, and performance rating in your organization.
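To see why the decomposition is not neutral, consider a minimal Python sketch. It assumes the composite is a weighted mean of the five dimensions, which this page does not actually specify; the function name, the weights, and both score profiles are invented for illustration and are not rows from the dataset.

```python
# Hypothetical sketch: the real composite formula is not documented here.
DIMENSIONS = (
    "rhyme_density",
    "vocab_breadth",
    "storytelling",
    "metaphor_imagery",
    "conceptual_depth",
)


def composite(scores, weights=None):
    """Weighted mean of the five dimension scores (each on a 1-10 scale)."""
    if weights is None:
        weights = {d: 1.0 for d in DIMENSIONS}  # equal weighting by default
    total_weight = sum(weights[d] for d in DIMENSIONS)
    return sum(scores[d] * weights[d] for d in DIMENSIONS) / total_weight


# Two invented profiles, not real artists.
technician = {"rhyme_density": 10, "vocab_breadth": 9, "storytelling": 6,
              "metaphor_imagery": 7, "conceptual_depth": 6}
storyteller = {"rhyme_density": 6, "vocab_breadth": 7, "storytelling": 10,
               "metaphor_imagery": 8, "conceptual_depth": 9}

# A critical tradition that prizes technique might weight it three times as heavily.
technique_heavy = {"rhyme_density": 3, "vocab_breadth": 3, "storytelling": 1,
                   "metaphor_imagery": 1, "conceptual_depth": 1}

print(composite(technician), composite(storyteller))
print(composite(technician, technique_heavy), composite(storyteller, technique_heavy))
```

With equal weights the storyteller profile scores higher; with the technique-heavy weights the order flips. The scores never changed, only the definition of lyricism did.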
Formula
The atomic weight
Every artist has a unique atomic weight — a continuous value that works like an element’s atomic number, but is calculated rather than assigned. It can accommodate new artists without renumbering everyone else.
Atomic Weight = (Career Span × Composite Score) + Total Awards
+ (Birth Year ÷ 100,000) + (Hot 100 Entries ÷ 10,000,000)
— The last two terms are tiebreakers only: the birth-year term adds about 0.02, and the Hot 100 term is invisible at 4 decimal places.
— Career Span = Last album year − Debut year (minimum 1).
— Total Awards = Grammy wins + BET Hip-Hop wins + MTV VMA wins + AMA wins.
The formula rewards sustained artistic output over raw peak performance. Rakim (#1 at 281.62) has no Grammys and modest chart presence — he ranks first because a 32-year career multiplied by an 8.8 composite accumulates more weight than a short brilliant run. Biggie, with a 9.0 composite and only a 4-year career, sits at 38.02. That tension is the point.
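The formula can be sketched in a few lines of Python. The function and parameter names are illustrative, not the project's own code. For the Rakim check, the career span and composite come from the paragraph above; the awards total of 0 and the Hot 100 count of 0 are assumptions chosen to match the published figure, and 1968 is his publicly documented birth year.

```python
def career_span(debut_year, last_album_year):
    """Last album year minus debut year, with a floor of 1."""
    return max(last_album_year - debut_year, 1)


def atomic_weight(span, composite, total_awards, birth_year, hot100_entries):
    """Atomic weight as defined in the formula above."""
    return (
        span * composite
        + total_awards
        + birth_year / 100_000          # tiebreaker, roughly 0.02
        + hot100_entries / 10_000_000   # tiebreaker, below 0.0001
    )


# Rakim: 32-year span and 8.8 composite are from the text.
# total_awards=0 and hot100_entries=0 are assumptions, not dataset values.
rakim = atomic_weight(span=32, composite=8.8, total_awards=0,
                      birth_year=1968, hot100_entries=0)
print(round(rakim, 2))
```

Here 32 × 8.8 contributes 281.6 and the birth-year term supplies the remaining 0.02, which reproduces the published 281.62. A single-album career still gets a span of 1, so its composite is never multiplied by zero.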
Transparency
Documented uncertainty
Every artist carries a confidence flag — High, Medium, or Low — reflecting the quality of critical documentation available when the lyricism scores were assigned.
| Flag | Meaning | Example artists |
|---|---|---|
| H | Substantial academic and critical documentation. Scores are well-grounded. | Kendrick Lamar, Rakim, Nas, Eminem, OutKast |
| M | Moderate documentation. Scores are reasonable but provisional. | Cam’ron, J.I.D, Doechii, Smino, Maxo Kream |
| L | Thin critical record. Scores are estimates only — treat with caution. | GloRilla, Mavi, Zach Fox, Yaya Bey, Dana Dane |
Most dashboards present numbers without confidence levels. This one attaches a confidence flag to every score.
Where in your organization's reporting does uncertainty get hidden?
What decisions are being made on L-confidence data presented as H-confidence?
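One way to keep uncertainty from getting hidden is to make it impossible to display a score without its flag. The sketch below is a hypothetical helper, not the dashboard's actual code; the label wording is paraphrased from the table above, and the low-confidence example uses a placeholder artist with an invented score.

```python
# Short labels paraphrased from the confidence table; the wording is illustrative.
CONFIDENCE_LABELS = {
    "H": "well-grounded",
    "M": "provisional",
    "L": "estimate only",
}


def format_score(artist, score, flag):
    """Render a score together with its confidence flag, or refuse to render it."""
    if flag not in CONFIDENCE_LABELS:
        raise ValueError(f"missing or unknown confidence flag for {artist!r}: {flag!r}")
    return f"{artist}: {score:.1f} [{flag}: {CONFIDENCE_LABELS[flag]}]"


print(format_score("Rakim", 8.8, "H"))       # composite and flag as given on this page
print(format_score("Artist X", 7.0, "L"))    # placeholder artist, invented score
```

Raising an error on a missing flag is a deliberate design choice: a report that fails loudly is easier to fix than one that quietly presents an estimate as a measurement.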
Built with R · Quarto · readxl · ggplot2 · plotly · reactable