Take a Bunch of Empty Words and Make Them Mean Something

Jean Stagarescu · May 2026

Aggregate profiling applied to Bladee (284 songs) and Ecco2k (45 songs) using Warriner et al. VAD norms and Brysbaert et al. concreteness ratings yields nearly identical profiles across all four dimensions (max gap 1.8 percentage points [pp]) between the two artists. A mainstream rap baseline sample (476 songs over 10 artists) shows a similarly small separation. Weighted log-odds analysis (Monroe et al.) applied to the vocabulary reveals a divergence the aggregate suppressed: a 11.5pp valence gap and 7.7pp arousal gap between the two artists' distinct lexica, with Ecco2k's vocabulary exhibiting a bimodal valence distribution not seen in Bladee's.

Data and Methods

Corpus

Lyrics were gathered using the Genius API. The final corpus comprised 284 Bladee songs and 45 Ecco2k songs after filtering out songs with insufficient data. A mainstream rap baseline was compiled from 50 songs each across 10 artists (Drake, Kendrick, J. Cole, Travis Scott, Future, 21 Savage, Kanye West, Lil Wayne, Uzi Vert, Young Thug), yielding 476 total songs.

Bladee:Ecco2k:
Valid songs:24845
Total tokens:56,9547,199
Mean tokens/song:201160
VAD coverage:51.5%51.1%
Concreteness coverage:90.8%91.5%

The VAD coverage of approximately 50% reflects the expected proportion of content words in running text while function words and proper nouns account for most uncovered tokens. Concreteness coverage above 90% reflects the larger Brysbaert vocabulary.

Aggregate profiling

For each song, mean VAD and concreteness scores were computed over all tokens present in the respective database. Artist-level profiles were computed as unweighted means across songs. The scores were reported as percentage of scale:

raw − scaleminscalemax − scalemin × 100

where VAD uses a 1-9 scale and concreteness uses a 1-5 scale.

Distinctive Vocabulary

Aggregate profiling assigns equal weight to all tokens; this causes the large shared pool of natural function words to dominate and suppress artist-specific signals. I applied the weighted log-odds ratio developed by Monroe et al. in order to isolate lexical preference. This procedure produces a z-score for each word type reflecting the degree to which it is over-represented in one artist's corpus relative to the other, penalized for high-variance estimates arising from low counts. Only word types with >= 4 total occurrences across both corpora were included (1147 types). Positive z-scores indicate Bladee's distinctive usage while negative z-scores indicate Ecco2k's distinctive usage.

Results

Aggregate profiles

Aggregate VAD and concreteness profiles for both artists and the mainstream baseline:

Bladee:Ecco2k:Baseline:B-E gap:B-Base gap:E-Base gap:
Valence:58.8%60.6%57.8%1.8pp+1.0pp+2.8pp
Arousal:39.4%39.1%41.0%0.3pp-1.5pp-1.9pp
Dominance:56.1%56.8%55.8%0.7pp+0.3pp-1.0pp
Concreteness:48.8%51.1%47.6%2.3pp+1.2pp+3.5pp

Bladee and Ecco2k are not meaningfully separated across any dimension. The max gap of 1.8pp is within the range of sampling noise when taking into account the corpus size disparity. Separation from the baseline is small as well; the largest deviation being Ecco2k's concreteness score.

Bladee's Career Arc

Profiles across Bladee's 23-album discography reveal high arousal stability (5.3pp total range: 36.6% to 41.9%) alongside a wider spread in valence (13pp range: 53.9% to 66.9%). EXETER is an outlier at 66.9% valence (8.1pp over Bladee's mean) and 36.7% arousal (2.7pp under Bladee's mean). This is the only album that stands clearly outside the main cluster on both dimensions simultaneously. Evil World occupies the opposite extreme (53.9% valence, 41.9% arousal). The near-constant arousal range suggests the sedated register is a stable production effort rather than a varying emotional state.

Distinctive Vocabulary

Mean affective scores for top-12 distinctive words per artist:

Bladee top-12:Ecco2k top-12:
Mean valence:49.7%61.3%
Mean arousal:42.9%35.2%
Valence gap:11.5pp (Ecco2k greater)
Arousal gap:7.7pp (Bladee greater)

Bladee's most distinctive content words (die, blood, drain, cold, shit, fuck) concentrate in low-valence, high-arousal space. The highest arousal distinctive words in Bladee's vocabulary, fuck (7.14/9), die (6.27/9), blood (5.76/9), are among the most activating items in the Earriner database. Ecco2k's most distinctive words (heart, crystal, arm, rock, card, ask) occupy a moderate to high valence and low arousal space.

The major structural difference between the artists concerns the distribution rather than the mean. Ecco2k's distinctive vocabulary contains both the highest valence words in the dataset (happy [8.47/9], kiss [7.78/9], peace [7.75/9], romance [7.28/9]) and a concentrated somatic distress cluster (nauseous [1.95/9], afraid [2.25/9], bleed [2.47/9], bruise [3.24/9]). Bladee's distinctive vocabulary has no equivalent high-valence pole; it is more uniformly distributed in the low valence and high arousal region. The bimodal distribution in Ecco2k's distinctive vocabulary is entirely suppressed in the aggregate mean, explaining the near-zero valence gap at the aggregate level.

Ecco2k's floor valence song (BLEEEEEDDD, valence 5.021) exceeds the valence of almost all of Bladee's bottom decile. This indicates Ecco2k's vocabulary systematically avoids the very low valence register which Bladee consistently reaches.

Discussion

Aggregate profiling is not sensitive enough to detect stylistic divergence between artists who share a large neutral vocabulary substrate. The Monroe et al. weighted log-odds approach recovers a signal that aggregate means suppress, producing effect sizes an order of magnitude greater than the aggregate gaps. This is a general limitation of unweighted aggregate profiling applied to corpora with high overlap rather than a property of Ecco2k and Bladee in particular.

Ecco2k's characteristic vocabulary being substantially calmer than Bladee's despite a vocal register often perceived as more emotionally heightened is the finding least recoverable from listening alone and most clearly demonstrated by the norms. The bimodal valence distribution in Ecco2k's distinctive vocabulary is consistent with a lyrical mode organized around the coexistence of somatic vulnerability and tenderness, rather than the uniform affective intensity characterizing Bladee's distinctive lexicon.

References

Warriner et al. (2013) - Valence, arousal, and dominance norms

Brysbaert et al. (2014) - Concreteness ratings

Monroe et al. (2017) - Fightin' Words

Drain Gang

Genius API

Relevant Python Scripts