Saturday, July 4, 2026

Cohort baseline for the pre-removal download figures (v2)

MINTED · AXN:041F.EMPIRICAL.⚡🌗🎩🕌⌛👈 · deposit #1043 · canonical-text sha256 130685296eb8d2be64c54c93e7b2cc1599341732f0bf846c410e2b3d0d8715f3 · record · minted 2026-07-04 by direct MANUS mint

 


Erratum (v2, 2026-07-04, same day). v1 framed the captured 666 as "~2 days" of accumulation for v7.2. A platform timestamp obtained the same day (TikTok Post analysis: image posted Jun 15, 2026, 4:42 PM — the day of the v7.2 upload) supersedes the recollection that produced that framing, and the 666 is best read as the record's cumulative counter at that moment. The corrected anchors: 666 cumulative at 2026-06-15 16:42 (platform-timestamped) and 1,000+ at removal, 2026-06-19 (contemporaneous issue body, unrefuted) — a delta of >=334 downloads in <=4 days. The cohort table below is unchanged; the Reading and Conclusion are restated on the corrected anchors. v1 is preserved in repository history.

Measured 2026-07-04, unauthenticated Zenodo REST API. Raw sample: zenodo-baseline-sample.json (checkable; re-runnable from the query below).

Question. Is 666 downloads in ~2 days (captured, this set; v1 framing, since corrected in the Erratum above) / 1,000+ in ~4 days (asserted contemporaneously in zenodo/zenodo#2606, day of removal) "a lot by Zenodo's standards"?

Method. Cohort = all Zenodo records of type dataset with publication_date:[2026-06-14 TO 2026-06-16], the same publication window as the registry's v7.2 (June 15). Cohort population: 1,791. Sample: the first 200 records sorted by mostrecent within the window (the unauthenticated API caps pages at 25 records, so 8 pages). Stats were read 2026-07-04, i.e. after ~19 days of accumulation for the cohort, versus ~2 days for the registry's captured figure under the v1 framing, or at most 4 days for the >=334 delta under the corrected anchors. The comparison is therefore conservative in accumulation time, in the cohort's favor: by roughly a factor of ten under the v1 framing and by roughly a factor of five (19/4) under the corrected one.
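For readers who want to re-run the sample, the Method above can be sketched roughly as follows. This is a reconstruction under stated assumptions, not the exact script behind zenodo-baseline-sample.json: the parameter names (q, type, sort, size, page) follow the public Zenodo records search endpoint, while the helper names build_params, fetch_page, and fetch_sample are hypothetical.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://zenodo.org/api/records"  # public Zenodo records search endpoint
PAGE_SIZE = 25  # unauthenticated page cap cited in the Method
N_PAGES = 8     # 8 pages x 25 records = 200-record sample


def build_params(page: int) -> dict:
    """Query parameters for one page of the cohort (assumed reconstruction)."""
    return {
        "q": "publication_date:[2026-06-14 TO 2026-06-16]",
        "type": "dataset",
        "sort": "mostrecent",
        "size": PAGE_SIZE,
        "page": page,
    }


def fetch_page(page: int) -> list:
    """Fetch one page of search hits. Performs a live network request."""
    url = API_URL + "?" + urllib.parse.urlencode(build_params(page))
    with urllib.request.urlopen(url, timeout=30) as resp:
        payload = json.load(resp)
    return payload["hits"]["hits"]


def fetch_sample() -> list:
    """Collect the first N_PAGES pages (up to 200 records)."""
    records = []
    for page in range(1, N_PAGES + 1):
        records.extend(fetch_page(page))
    return records
```

Calling fetch_sample() today will not reproduce the 2026-07-04 figures exactly, since download counters keep accumulating and the mostrecent ordering can shift; the preserved raw sample remains the reference.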

Findings (n=200).

statistic               this-version downloads   all-versions downloads
median                  7                        6
mean                    43.0                     21.1
p90                     53                       43
p95                     116                      77
p99                     804                      397
max                     3,087                    561
≥ 666 after ~19 days    2/200 (1.0%)             0/200 (0.0%)
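The summary rows above can be recomputed from any list of per-record download counts with a short sketch like the one below. The note does not state which percentile method was used, so the nearest-rank rule here is an assumption (other methods can give a different p99 at n=200), and the toy list is illustrative data, not the cohort sample.

```python
import statistics


def nearest_rank(sorted_vals, pct):
    """Nearest-rank percentile for an integer pct (assumed method)."""
    n = len(sorted_vals)
    k = (pct * n + 99) // 100  # integer ceiling of pct * n / 100
    return sorted_vals[max(k, 1) - 1]


def summarize(counts, threshold=666):
    """Return the table's statistics for one column of download counts."""
    vals = sorted(counts)
    at_or_above = sum(1 for v in vals if v >= threshold)
    return {
        "median": statistics.median(vals),
        "mean": round(statistics.mean(vals), 1),
        "p90": nearest_rank(vals, 90),
        "p95": nearest_rank(vals, 95),
        "p99": nearest_rank(vals, 99),
        "max": vals[-1],
        "share_at_or_above": at_or_above / len(vals),
    }


# Toy input only; the real counts live in zenodo-baseline-sample.json.
toy = [0, 1, 2, 3, 5, 7, 9, 12, 40, 700]
summary = summarize(toy)
```

To apply this to the preserved sample, one would extract the two counters from each record's stats and call summarize once per column.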

Reading (v2). The four-day delta alone (>=334 downloads, Jun 15->19) is ~48x the cohort's nineteen-day median (7) and exceeds its nineteen-day p95 (116) at the version level. The cumulative 1,000+ at removal exceeds the cohort's version-level p99 (804) and the all-versions maximum observed (561). On any field mapping, the removed dataset's usage sat at the extreme top of its publication-week cohort.
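The arithmetic behind this Reading can be checked in a few lines. The figures are taken from the Erratum and the table; treating "1,000+" as exactly 1,000 is a lower-bound assumption, and the variable names are illustrative.

```python
from datetime import date

cumulative_at_capture = 666   # 2026-06-15 16:42, platform-timestamped
cumulative_at_removal = 1000  # "1,000+" treated as a lower bound

delta = cumulative_at_removal - cumulative_at_capture    # lower bound on the delta
days = (date(2026, 6, 19) - date(2026, 6, 15)).days      # upper bound on elapsed days

cohort_version_level = {"median": 7, "p95": 116, "p99": 804}  # this-version column
all_versions_max = 561

ratio_to_median = delta / cohort_version_level["median"]

checks = {
    "delta_exceeds_p95": delta > cohort_version_level["p95"],
    "removal_exceeds_p99": cumulative_at_removal > cohort_version_level["p99"],
    "removal_exceeds_all_versions_max": cumulative_at_removal > all_versions_max,
}
```

Because the delta is a lower bound and the elapsed time an upper bound, each comparison can only become more favorable to the Reading if the true figures differ.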

Limitations, on the record. (1) The sample is the window's 200 most recent records, not a random draw; the window is only three days wide, which limits ordering bias, but this remains a convenience sample. (2) The Zenodo UI counter captured at 666 is not field-labeled in the interface; given the API naming (downloads = this-version, version_downloads = all-versions), the captured figure maps to one of the two columns above. Under the this-version mapping, only 2 of 200 cohort records (1.0%) reached 666 after nineteen days, although 666 itself sits below the tabulated p99 (804), which the 1,000+ figure at removal exceeds; under the all-versions mapping, 666 exceeds the observed maximum (561). (3) Bot and crawler traffic is unmodeled on both sides of the comparison. (4) Cohort stats were measured post hoc on 2026-07-04 and will drift; the raw sample is preserved for exactly that reason.

Conclusion, bounded (v2). By the platform's own contemporaneous cohort, the removed dataset's usage was in the top ~1% of same-week datasets, and its final four days alone outpaced what 95% of that cohort accumulated in nineteen. "A lot by Zenodo's standards" is hereby a measurement, not an impression — and this note's own correction trail is part of the measurement's warrant.
