This repository is my personal knowledge base and practice space for learning Docker and Podman, with a focus on data engineering use cases.
It contains notes, examples, Dockerfiles, and mini-projects that I can revisit anytime.
```text
docker-data-engineering-notes/
├── README.md                # Main introduction (this file)
├── concepts/                # Core concepts and theory
│   ├── docker_basics.md
│   ├── podman_basics.md
│   ├── docker_vs_podman.md
│   └── cheatsheet.md
├── dockerfiles/             # Example Dockerfiles
│   ├── Dockerfile_Guide.md
│   ├── python_app/
│   ├── postgres/
│   └── airflow/
├── compose-examples/        # Docker Compose examples
│   ├── DockerCompose_Guide.md
│   ├── postgres.yml
│   ├── airflow.yml
│   └── kafka_spark.yml
├── mini-projects/           # Hands-on projects for practice
│   ├── 01_docker_postgres/
│   ├── 02_docker_airflow/
│   └── 03_kafka_pipeline/
└── notes/                   # Extra notes and troubleshooting
    ├── networking.md
    ├── volumes.md
    ├── security.md
    └── troubleshooting.md
```
Goals:

- Learn Docker basics (images, containers, volumes, networks).
- Compare Docker and Podman and understand when to use each.
- Practice data engineering workflows using Docker:
  - PostgreSQL in containers
  - Apache Airflow for ETL pipelines
  - Kafka + Spark for streaming
- Build a personal cheatsheet for quick reference.
- Document troubleshooting tips I encounter along the way.
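Tools covered:

- Docker: The core containerization tool for building images and running containers.
- Podman: A daemonless alternative to Docker that can also run containers rootless (without root privileges).
- Docker Compose: Defines and runs multi-container setups from a single YAML file.
- Data engineering tools in containers:
  - PostgreSQL
  - Apache Airflow
  - Apache Kafka
  - Apache Spark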
Getting started:

- Clone this repo:

  ```bash
  git clone https://github.com/<your-username>/docker-data-engineering-notes.git
  cd docker-data-engineering-notes
  ```
- Explore notes in the `concepts/` and `notes/` folders.
- Run examples from `dockerfiles/` or `compose-examples/`.
- Try the `mini-projects/` for hands-on practice.
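A minimal sketch of the "run examples" step, assuming you are in the repo root and that `compose-examples/postgres.yml` is a valid Compose file (its contents aren't shown here). `pick_engine` and `run_compose` are hypothetical helper names for illustration, not scripts in this repo. It also assumes Docker's Compose v2 plugin (`docker compose`) or, on Podman, a Compose provider such as `podman-compose` that `podman compose` can delegate to.

```bash
#!/bin/sh
# Minimal sketch with hypothetical helpers (not scripts shipped in this repo):
# start one of the Compose examples with whichever engine is installed.

# Print "docker" or "podman", preferring Docker when both are on PATH.
# Returns non-zero if neither is installed.
pick_engine() {
  if command -v docker >/dev/null 2>&1; then
    echo docker
  elif command -v podman >/dev/null 2>&1; then
    echo podman
  else
    return 1
  fi
}

# Start the services defined in a Compose file in the background (-d).
# Assumes Docker's Compose v2 plugin, or a provider behind `podman compose`.
run_compose() {
  compose_file=$1
  if [ ! -f "$compose_file" ]; then
    echo "No such Compose file: $compose_file" >&2
    return 1
  fi
  engine=$(pick_engine) || {
    echo "Install Docker or Podman first." >&2
    return 1
  }
  "$engine" compose -f "$compose_file" up -d
}
```

Usage: load the functions into your shell (for example with `. ./run_compose.sh`, a hypothetical file name), then run `run_compose compose-examples/postgres.yml`. Docker wins when both engines are installed; swap the two checks in `pick_engine` if you prefer Podman by default.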
Roadmap:

- Set up repo structure
- Add Docker basics notes
- Add Podman basics notes
- Create PostgreSQL mini-project
- Add Airflow mini-project
- Add Kafka + Spark pipeline
This repo is a living document. I'll keep updating it as I learn more about Docker, Podman, and data engineering workflows.