Library term · FinTech & data science
Puppeteer Pipelines for Central Bank & Regulator Document Harvesting
Headless-browser ethics, respect for rate limits, DOM brittleness, PDF extraction, and legal terms-of-use (TOU) compliance for macro data feeds.
Authored by · Editorially reviewed
Onur Erkan Yıldız, Founder, Financial Engineer · CMB-licensed
Higher education in Financial Engineering and Money & Capital Markets. SPK (Turkey CMB) licence. 16 years across institutional markets, research, and quant-driven analytics.
When APIs fail
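A minimal TypeScript sketch of the polite-fetching rule this section describes, not a definitive implementation. The helper names `isPathAllowed` and `fetchPolitely` are hypothetical, and the default delays (2 seconds plus up to 1 second of jitter) are illustrative assumptions rather than values taken from any regulator's policy. The robots.txt check is deliberately simplified: it honours only `Disallow` prefixes in `User-agent: *` groups.

```typescript
// Simplified robots.txt check (assumption: only "User-agent: *" groups and
// plain Disallow prefixes are honoured; Allow, wildcards and Crawl-delay are ignored).
export function isPathAllowed(robotsTxt: string, path: string): boolean {
  let inWildcardGroup = false;
  let prevWasAgent = false;
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // drop comments
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === "user-agent") {
      // Consecutive User-agent lines belong to the same group.
      inWildcardGroup = (prevWasAgent && inWildcardGroup) || value === "*";
      prevWasAgent = true;
    } else {
      prevWasAgent = false;
      if (field === "disallow" && inWildcardGroup && value !== "" && path.startsWith(value)) {
        return false;
      }
    }
  }
  return true;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Fetches URLs strictly one at a time, pausing between requests.
export async function fetchPolitely<T>(
  urls: string[],
  fetchOne: (url: string) => Promise<T>,
  minDelayMs = 2000, // illustrative default
  jitterMs = 1000, // illustrative default
): Promise<T[]> {
  const results: T[] = [];
  let first = true;
  for (const url of urls) {
    if (!first) await sleep(minDelayMs + Math.random() * jitterMs);
    first = false;
    results.push(await fetchOne(url));
  }
  return results;
}
```

In a Puppeteer pipeline, `fetchOne` would typically wrap a `page.goto(url)` call, and URLs rejected by `isPathAllowed` would be filtered out before the loop starts. A vetted robots.txt parser is a safer choice than this sketch for production use.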
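Some macroeconomic documents are published only as PDFs with no clean API behind them, so a headless-browser fetch becomes the fallback. Even then, respect robots.txt and the site's terms of service, and throttle requests politely.
Brittleness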
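One way to cope with selector breakage is to try an ordered list of selectors, most stable first, and raise an alert whenever the primary one stops matching. The sketch below assumes a hypothetical helper `findWithFallback` and a small `QueryablePage` interface that stands in for the one Puppeteer method it relies on (`page.$`, which resolves to an element handle or `null`). The alert callback and any selectors passed in are placeholders, not taken from a real regulator site.

```typescript
// Structural stand-in for the part of a Puppeteer Page used here (assumption):
// $(selector) resolves to a handle when the element exists, otherwise null.
interface QueryablePage {
  $(selector: string): Promise<unknown>;
}

interface SelectorHit {
  selector: string; // the selector that matched
  handle: unknown; // whatever the page returned for it
  usedFallback: boolean; // true when the primary selector did not match
}

export async function findWithFallback(
  page: QueryablePage,
  selectors: string[], // ordered from most to least preferred
  alert: (message: string) => void,
): Promise<SelectorHit | null> {
  let index = 0;
  for (const selector of selectors) {
    const handle = await page.$(selector);
    if (handle !== null && handle !== undefined) {
      if (index > 0) {
        alert(`Primary selector failed; matched fallback "${selector}"`);
      }
      return { selector, handle, usedFallback: index > 0 };
    }
    index++;
  }
  // Nothing matched: the caller can switch to another route, such as OCR.
  alert(`No selector matched (${selectors.join(", ")}); the page layout may have changed`);
  return null;
}
```

A `null` result is the point at which a pipeline could hand the page over to an OCR fallback, while the alert gives maintainers an early signal that the selectors need updating.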
CSS selectors break when a site is redesigned. Prefer stable anchors, or fall back to vendor OCR, and pair either approach with alerting so that failures surface quickly.
Normalisation pipeline
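A rough sketch of checksum-based versioning, under stated assumptions: the record shape `HarvestedDocument` and the helpers `toRecord` and `isNewVersion` are hypothetical, and SHA-256 over whitespace-normalised text is one reasonable choice rather than a prescribed scheme. The hash comes from Node's built-in `crypto` module.

```typescript
import { createHash } from "node:crypto";

// Hypothetical record shape for a harvested document.
interface HarvestedDocument {
  source: string; // issuing institution
  url: string;
  retrievedAt: string; // ISO 8601 timestamp
  text: string; // normalised extracted text
  sha256: string; // checksum of the normalised text
}

// Unifies line endings and collapses runs of spaces or tabs, so purely
// cosmetic differences do not produce a new checksum.
function normaliseText(raw: string): string {
  return raw.replace(/\r\n/g, "\n").replace(/[ \t]+/g, " ").trim();
}

export function toRecord(
  source: string,
  url: string,
  rawText: string,
  retrievedAt: Date = new Date(),
): HarvestedDocument {
  const text = normaliseText(rawText);
  const sha256 = createHash("sha256").update(text, "utf8").digest("hex");
  return { source, url, retrievedAt: retrievedAt.toISOString(), text, sha256 };
}

// A new version is recorded only when the checksum differs from the last stored one.
export function isNewVersion(record: HarvestedDocument, lastSha256?: string): boolean {
  return record.sha256 !== lastSha256;
}
```

Serialising the record with `JSON.stringify` gives the structured output, and comparing `sha256` against the last stored value lets a downstream consumer skip documents whose content has not changed.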
Convert the extracted content to structured JSON, versioned by checksum, for downstream factor libraries.
Finvestopia
Our news layering philosophy: primary sources trump social rewrites, and scraping must mirror that ethos within legal bounds.
Related entries
Rate Limiting Market Data APIs — Consumer & Provider Perspectives
Token bucket, exponential backoff with jitter, per-tenant fairness and abuse containment for websocket farms.
Missing Data in Financial Time Series — Imputation Without Lookahead
MCAR/MAR/MNAR taxonomy, forward-only fills, Kalman-style smoothers vs naive ffill dangers in backtests.
Educational content authored by our team — informational only, not investment advice.
