Welcome to Analytics-Data-Where-House Docs!

Table of Contents

  1. System setup
  2. User's Guide
  3. Developer's Guide
  4. Data Sources

Platform Overview

This platform automates curating a local data warehouse of interesting, up-to-date public data sets. It enables users (well, mainly one user, me) to easily add data sets to the warehouse, build analyses that explore and answer questions with current data, and discover existing assets to accelerate exploring new questions.

At present, it uses Docker to provision and run:

  • a PostgreSQL + PostGIS database as the data warehouse,
  • Apache Superset for:

    • Interactive Data Visualization and EDA
    • Dashboarding and Reporting

    (screenshots: Geospatial Data Analysis, Time Series Analysis, Dashboarding)

  • a pgAdmin4 database administration interface,

    (screenshot: Sample Exploration of a DWH table)

  • Airflow components to orchestrate execution of tasks,

    (screenshot: Airflow DagBag for the Cook County tag)

  • dbt to:

    • manage sequential data transformation + cleaning tasks,
    • serve data documentation and data lineage graphs, and
    • facilitate search of the data dictionary and data catalog

    (screenshots: dbt Data Lineage Graph; All Data Tables' Lineage Graphs; One Data Set's Lineage Graph)

  • great_expectations for anomaly detection and data monitoring, and

    (screenshot: great_expectations Data Docs after a checkpoint run)

  • custom Python code that makes it easy to implement an ELT pipeline for any other table hosted by Socrata and to automate as much pipeline development as possible

    (screenshots: data-loading TaskGroups in the load_data_tg TaskGroup; the load_data_tg TaskGroup at a high level)
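The custom Socrata ELT code itself isn't shown on this page, but the request pattern it relies on can be sketched in a few lines. This is a minimal, illustrative sketch of building a paged SODA v2 JSON endpoint URL, not the platform's actual helper code; the domain and dataset id below are placeholders.

```python
# Hypothetical sketch of the Socrata SODA request pattern an ELT pipeline pages through.
# `build_socrata_url` is an illustrative name, not a function from this repo.
from urllib.parse import urlencode

def build_socrata_url(domain: str, dataset_id: str, limit: int = 1000, offset: int = 0) -> str:
    """Return a SODA v2 JSON endpoint URL with $limit/$offset paging parameters."""
    # ":id" gives a stable sort order, so successive offset pages don't overlap.
    params = urlencode({"$limit": limit, "$offset": offset, "$order": ":id"})
    return f"https://{domain}/resource/{dataset_id}.json?{params}"

# Placeholder domain and dataset id, for illustration only.
url = build_socrata_url("data.cityofchicago.org", "abcd-1234", limit=500)
print(url)
```

A pipeline would keep incrementing `offset` by `limit` and fetching pages until a request returns fewer rows than `limit`, then load the accumulated records into the warehouse.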
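The great_expectations component above validates data at checkpoints by running expectations against tables. The plain-Python sketch below mimics the shape of one such check (values falling inside a range) to show the idea; it is NOT the great_expectations API, and the function and field names are invented for illustration.

```python
# Illustrative only: a hand-rolled range expectation, conceptually similar to the
# checks great_expectations runs at a checkpoint. Not the great_expectations API.
from typing import Iterable

def expect_values_between(values: Iterable[float], min_v: float, max_v: float) -> dict:
    """Report whether every value falls within [min_v, max_v]."""
    vals = list(values)
    unexpected = [v for v in vals if not (min_v <= v <= max_v)]
    return {
        "success": not unexpected,          # True only if nothing fell out of range
        "element_count": len(vals),
        "unexpected_count": len(unexpected),
    }

result = expect_values_between([3.1, 4.0, 5.2], 0.0, 10.0)
print(result["success"])  # True: all three values are within the range
```

In the real platform, failed checks surface in the great_expectations Data Docs after a checkpoint run, which is what makes anomalies visible without manual inspection.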