A comprehensive collection of 22 production-ready skills for Genie Code, designed to empower data professionals working on the Databricks platform. This repository teaches Genie Code domain-specific expertise in data engineering, analytics, ML operations, platform optimization, and governance.
Repository Stats: 22 skills | 14,841 lines of code | 8 categories | Production-tested patterns

- Overview
- Repository Structure
- Quick Start
- Using Skills with Genie Code
- Creating Custom Skills
- Contributing
- Show Your Support
- Support & Feedback
- Connect & Collaborate
- Additional Resources
- Learning Paths
- License
- Acknowledgments
This repository enhances Genie Code's capabilities through four powerful configuration layers:
1. Personal, persistent instructions for Genie Code: your coding style, preferences, and project context.
2. Organization-wide guidelines configured by workspace admins for consistency across teams.
3. Custom skills you create to teach Genie Code new capabilities specific to your workflows.
4. Enterprise-level domain expertise available to all users in the workspace.
```
databricks-genie-code-skills/
├── README.md                     # This file
├── SKILLS_INVENTORY.md           # Detailed skill catalog
├── .assistant_instructions.md    # Pre-configured assistant instructions template
├── examples/
│   ├── user-instructions.md      # Template for personal config
│   └── workspace-instructions.md # Example workspace config
├── skills/                       # 22 production skills
│   ├── auto-loader/              # 593 lines
│   ├── cost-optimization/        # 825 lines
│   ├── data-contracts/           # 824 lines
│   ├── data-modeling/            # 489 lines
│   ├── data-quality-checks/      # 552 lines
│   ├── delta-lake-optimization/  # 681 lines
│   ├── documentation-practices/  # 893 lines
│   ├── error-handling/           # 677 lines
│   ├── feature-engineering/      # 822 lines
│   ├── incremental-processing/   # 593 lines
│   ├── job-scheduling/           # 631 lines
│   ├── medallion-architecture/   # 497 lines
│   ├── mlflow-tracking/          # 702 lines
│   ├── model-deployment/         # 758 lines
│   ├── monitoring-observability/ # 427 lines
│   ├── performance-tuning/       # 819 lines
│   ├── spark-optimization/       # 960 lines
│   ├── sql-best-practices/       # 826 lines
│   ├── streaming-pipelines/      # 644 lines
│   ├── testing-strategies/       # 530 lines
│   ├── unity-catalog-governance/ # 348 lines
│   └── workflow-orchestration/   # 750 lines
└── docs/                         # Additional documentation
    ├── installation.md
    ├── creating-skills.md
    └── best-practices.md
```
Option 1: Direct Copy (Recommended)
```bash
# 1. Navigate to your Databricks workspace
# 2. Clone or download this repository to your workspace folder

# 3. Copy skills to Genie Code's skills folder
cp -r /Workspace/Users/your.email@company.com/databricks-genie-code-skills/skills/* ~/.assistant/skills/

# 4. (Recommended) Copy the pre-configured assistant instructions
#    This includes all 22 skill references for automatic skill loading
cp /Workspace/Users/your.email@company.com/databricks-genie-code-skills/.assistant_instructions.md ~/.assistant_instructions.md
```

Option 2: Git Clone
```bash
# Clone into your workspace
git clone <repository-url> /Workspace/Users/your.email@company.com/databricks-genie-code-skills

# Link skills to Genie Code
ln -s /Workspace/Users/your.email@company.com/databricks-genie-code-skills/skills/* ~/.assistant/skills/

# Copy assistant instructions to your home directory
cp /Workspace/Users/your.email@company.com/databricks-genie-code-skills/.assistant_instructions.md ~/.assistant_instructions.md
```

Option 3: Selective Skills
```bash
# Copy only specific skills you need
cp -r /Workspace/Users/your.email@company.com/databricks-genie-code-skills/skills/delta-lake-optimization ~/.assistant/skills/
cp -r /Workspace/Users/your.email@company.com/databricks-genie-code-skills/skills/spark-optimization ~/.assistant/skills/

# Still recommended to copy assistant instructions
cp /Workspace/Users/your.email@company.com/databricks-genie-code-skills/.assistant_instructions.md ~/.assistant_instructions.md
```

Verify the installation:

```bash
# List installed skills
ls -la ~/.assistant/skills/
# Each skill should have a SKILL.md file
cat ~/.assistant/skills/delta-lake-optimization/SKILL.md
# Verify assistant instructions
cat ~/.assistant_instructions.md
```

If you prefer to start from the examples template instead:

```bash
cp /Workspace/Users/your.email@company.com/databricks-genie-code-skills/examples/user-instructions.md ~/.assistant_instructions.md

# Edit with your preferences
databricks workspace edit ~/.assistant_instructions.md
```

Once installed, skills are automatically available to Genie Code. Simply interact naturally:
"Help me optimize this Delta table for better query performance"
β Genie Code uses: delta-lake-optimization, spark-optimization
"Set up a medallion architecture pipeline with data quality checks"
β Genie Code uses: medallion-architecture, data-quality-checks, incremental-processing
"Deploy this ML model with monitoring and versioning"
β Genie Code uses: model-deployment, mlflow-tracking, monitoring-observability
Genie Code automatically selects relevant skills based on:
- Keywords in your question (Delta, Spark, MLflow, Unity Catalog, etc.)
- Context from your notebook (existing code, table references, imports)
- Task type (optimization, deployment, governance, orchestration)
- Best practices (automatically applies patterns from multiple skills)
```
# Reference specific skills explicitly
"Using the spark-optimization skill, improve this join operation"

# Combine multiple skills
"Apply sql-best-practices and performance-tuning to optimize this query"

# Request comprehensive solutions
"Build a complete streaming pipeline following best practices"
→ Uses: streaming-pipelines, auto-loader, data-quality-checks,
        error-handling, monitoring-observability
```

Extend this repository with your own domain-specific skills:
# Skill Name

## Purpose
Brief description of what this skill teaches Genie Code

## When to Use
Situations where this skill applies

## Key Concepts
1. Core principle 1
2. Core principle 2
3. Core principle 3

## Examples

### Example 1: Basic Usage
```python
# Code example with detailed comments
# Explain the pattern and why it works
```

### Example 2: Real-World Usage
```python
# More complex real-world example
# Include error handling and edge cases
```

## Best Practices
- Practice 1 with rationale
- Practice 2 with rationale
- Practice 3 with rationale

## Common Pitfalls
- ❌ What to avoid
- ✅ What to do instead
- 💡 Why it matters
### Adding New Skills

1. **Create the skill folder:**
   ```bash
   mkdir -p skills/your-skill-name
   cd skills/your-skill-name
   ```
2. Create a `SKILL.md` file using the template above
3. Add comprehensive examples (aim for 500+ lines)
4. Test with Genie Code:
   ```bash
   cp -r skills/your-skill-name ~/.assistant/skills/
   ```
5. Update this README in the Key Skills section
We welcome and encourage contributions! This repository thrives on community knowledge and real-world patterns.
Share New Skills
- Industry-specific patterns (healthcare, finance, retail, etc.)
- Cloud-specific optimizations (AWS, Azure, GCP)
- Advanced MLOps workflows
- Real-time analytics patterns
- Security and compliance patterns
Improve Existing Skills
- Add more examples and use cases
- Update with latest Databricks features
- Fix errors or clarify explanations
- Add edge cases and troubleshooting tips
Enhance Documentation
- Better installation guides
- Video tutorials or notebooks
- Architecture diagrams
- Performance benchmarks
- Fork this repository
- Create a feature branch (`git checkout -b feature/your-skill-name`)
- Add your skill following the template structure
- Test with Genie Code to ensure it works
- Submit a Pull Request with clear description
- Share your use case - help others understand when to use it
- ✅ Production-tested patterns (real-world usage)
- ✅ Comprehensive examples (500+ lines recommended)
- ✅ Clear explanations (why, not just how)
- ✅ Best practices & pitfalls (learn from experience)
- ✅ Related skills (help Genie Code connect concepts)
If this repository helps you or your team:
- ⭐ Star this repository to help others discover it
- Share with colleagues working on Databricks
- Provide feedback on what works and what doesn't
- Contribute your own skills from production use cases
Your stars and contributions help grow this knowledge base for the entire Databricks community!
This is a living repository designed to evolve with the Databricks platform, Genie Code capabilities, and community needs.
- Questions? Open an issue with the `question` label
- Bug found? Report with detailed reproduction steps
- Feature request? Share your use case and requirements
- What's working well? Tell us which skills save you the most time
- What's missing? Suggest new skill categories or topics
- What needs improvement? Help us enhance existing content
- Production examples? Share your real-world usage patterns
- Participate in discussions
- Review pull requests
- Share your learning journey
- Help other users
We read every issue and pull request. Your feedback directly shapes this repository's evolution.
I'm excited to discuss:
- Scaling patterns: how you're using these skills in production
- Training programs: onboarding teams to Databricks with these skills
- Enterprise adoption: deploying skills across large organizations
- Partnerships: collaborating on industry-specific skill libraries
- Content creation: writing blogs, tutorials, or courses together
- Speaking opportunities: presenting at conferences or meetups

Have an idea or opportunity? Open a discussion, reach out through issues, or connect with me on LinkedIn!
- Databricks Documentation
- Delta Lake Guide
- Unity Catalog Documentation
- MLflow Documentation
- Databricks SQL
Whether you're just starting with Databricks or optimizing production workloads, follow these curated learning paths:
Goal: Build foundation in data engineering and analytics
1. **Start:** SQL Best Practices (826 lines)
   - Learn Databricks SQL syntax and optimization patterns
   - Understand query execution and performance
2. **Next:** Delta Lake Optimization (681 lines)
   - Master the lakehouse storage layer
   - Learn OPTIMIZE, VACUUM, and Z-ORDER (see the sketch after this path)
3. **Then:** Medallion Architecture (497 lines)
   - Understand Bronze/Silver/Gold patterns
   - Build multi-hop data pipelines
4. **Practice:** Data Quality Checks (552 lines)
   - Implement validation in your pipelines
   - Use Lakehouse Monitoring
Time Investment: 2-3 weeks | Lines to Study: 2,556 lines
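As a taste of the Delta Lake Optimization step above, here is a minimal maintenance sketch using the delta-spark Python API (2.0+). The table and column names are placeholders, not objects defined by this repository:

```python
# Placeholder table and column names; assumes a Databricks notebook where
# `spark` is the ambient SparkSession and delta-spark >= 2.0 is available.
from delta.tables import DeltaTable

tbl = DeltaTable.forName(spark, "sales.orders")  # an existing Delta table

# Compact small files and co-locate rows by a frequently filtered column
tbl.optimize().executeZOrderBy("customer_id")

# Delete files no longer referenced by the table, keeping 7 days for time travel
tbl.vacuum(168)
```

Z-ordering by a commonly filtered column improves file skipping at query time; keep the VACUUM retention window at or above your time-travel needs.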
Goal: Create reliable, scalable data pipelines
1. **Foundation:** Incremental Processing (593 lines)
   - Efficient batch and streaming patterns
   - Change Data Feed and merge operations
2. **Ingestion:** Auto Loader (593 lines)
   - Schema inference and evolution (see the sketch after this path)
   - Unity Catalog Volumes integration
3. **Real-time:** Streaming Pipelines (644 lines)
   - Structured Streaming and DLT
   - Exactly-once processing guarantees
4. **Governance:** Unity Catalog Governance (348 lines)
   - Access control and lineage
   - Data discovery and audit logging
5. **Quality:** Data Contracts (824 lines)
   - Schema enforcement and validation
   - Contract-driven development
6. **Operations:** Workflow Orchestration (750 lines)
   - Multi-task job dependencies
   - Error handling and monitoring
Time Investment: 4-6 weeks | Lines to Study: 3,752 lines
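For the Auto Loader step above, a minimal ingestion sketch; the volume paths and table name are placeholder examples:

```python
# Placeholder paths and table name; Auto Loader is the Databricks "cloudFiles" source.
stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    # Persist the inferred schema so it can evolve safely across runs
    .option("cloudFiles.schemaLocation", "/Volumes/main/raw/_schemas/orders")
    .load("/Volumes/main/raw/orders")
)

(stream.writeStream
    .option("checkpointLocation", "/Volumes/main/raw/_checkpoints/orders")
    .trigger(availableNow=True)  # drain the current backlog incrementally, then stop
    .toTable("main.bronze.orders"))
```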
Goal: Optimize for cost, performance, and scale
1. **Performance:** Spark Optimization (960 lines)
   - Partitioning strategies and caching
   - Broadcast joins and shuffle optimization (see the sketch after this path)
   - Photon engine best practices
2. **Tuning:** Performance Tuning (819 lines)
   - Query plan analysis
   - Memory and resource tuning
   - Bottleneck identification
3. **Cost:** Cost Optimization (825 lines)
   - Cluster sizing and autoscaling
   - Serverless vs. classic compute
   - Spot instance strategies
4. **Reliability:** Error Handling (677 lines)
   - Retry strategies and circuit breakers
   - Failure recovery patterns
   - Alert configuration
5. **Monitoring:** Monitoring & Observability (427 lines)
   - Metrics collection and dashboards
   - System audit logging
   - SLO tracking
Time Investment: 3-4 weeks | Lines to Study: 3,708 lines
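For the broadcast-join topic above, a minimal PySpark sketch; the table names are placeholders:

```python
# Placeholder table names; shows the explicit broadcast hint.
from pyspark.sql.functions import broadcast

orders = spark.table("main.silver.orders")    # large fact table
regions = spark.table("main.silver.regions")  # small dimension table

# Broadcasting the small side avoids shuffling the large one
joined = orders.join(broadcast(regions), "region_id")
joined.explain()  # expect BroadcastHashJoin in the physical plan
```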
Goal: Deploy and manage production ML systems
1. **Features:** Feature Engineering (822 lines)
   - Feature Store implementation
   - Real-time feature serving
   - Feature transformation pipelines
2. **Tracking:** MLflow Tracking (702 lines)
   - Experiment tracking and comparison (see the sketch after this path)
   - Model registry and versioning
   - Artifact management
3. **Deployment:** Model Deployment (758 lines)
   - Batch vs. real-time inference
   - REST API endpoints
   - Serverless model serving
4. **Scheduling:** Job Scheduling (631 lines)
   - Training job orchestration
   - Retraining triggers
   - Model monitoring jobs
5. **Quality:** Testing Strategies (530 lines)
   - Model validation tests
   - Integration testing
   - A/B testing patterns
Time Investment: 3-4 weeks | Lines to Study: 3,443 lines
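For the MLflow Tracking step above, a minimal experiment-logging sketch; the experiment path, parameter, and metric values are placeholders:

```python
# Placeholder experiment path, params, and metrics.
import mlflow

mlflow.set_experiment("/Shared/churn-model")

with mlflow.start_run(run_name="baseline"):
    mlflow.log_param("max_depth", 8)
    mlflow.log_metric("auc", 0.91)
    # mlflow.<flavor>.log_model(model, "model") would attach the trained model artifact
```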
Goal: Establish organizational standards and practices
1. **Documentation:** Documentation Practices (893 lines)
   - Code documentation standards
   - Notebook organization
   - Knowledge management
2. **Modeling:** Data Modeling (489 lines)
   - Dimensional modeling (star/snowflake)
   - SCD Type 2 patterns (see the sketch after this path)
   - Normalization techniques
3. **Testing:** Testing Strategies (530 lines)
   - Unit and integration testing
   - Data validation frameworks
   - CI/CD integration
4. **Monitoring:** Monitoring & Observability (427 lines)
   - Team dashboards and alerts
   - Audit logging and compliance
   - Performance tracking
Time Investment: 2-3 weeks | Lines to Study: 2,339 lines
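For the SCD Type 2 topic above, a deliberately simplified two-step merge sketch against a Delta dimension table; all table and column names are placeholders, and a production version would insert only new or changed rows in step 2:

```python
# Simplified SCD Type 2 upsert; placeholder names, delta-spark API assumed.
from delta.tables import DeltaTable
from pyspark.sql import functions as F

updates = spark.table("staging.customers")
dim = DeltaTable.forName(spark, "gold.dim_customers")

# Step 1: close out current rows whose tracked attributes changed
(dim.alias("d")
 .merge(updates.alias("u"),
        "d.customer_id = u.customer_id AND d.is_current = true")
 .whenMatchedUpdate(
     condition="d.address <> u.address",
     set={"is_current": "false", "end_date": "current_date()"})
 .execute())

# Step 2: append incoming rows as the new current versions
(updates
 .withColumn("is_current", F.lit(True))
 .withColumn("start_date", F.current_date())
 .withColumn("end_date", F.lit(None).cast("date"))
 .write.format("delta").mode("append").saveAsTable("gold.dim_customers"))
```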
Goal: Comprehensive expertise across all domains
Complete all 22 skills in recommended order by category:
- Data Engineering (5 skills, 3,063 lines)
- Governance & Architecture (3 skills, 1,669 lines)
- Analytics & Data Modeling (2 skills, 1,315 lines)
- Performance & Optimization (3 skills, 2,604 lines)
- Orchestration & Workflow (3 skills, 2,058 lines)
- ML Operations (3 skills, 2,282 lines)
- Quality & Monitoring (2 skills, 957 lines)
- Best Practices (1 skill, 893 lines)
Time Investment: 12-16 weeks | Total Content: 14,841 lines
This repository is provided as-is for educational and productivity purposes.
- ✅ Free to use, modify, and distribute
- ✅ Customize freely for your organization's needs
- ✅ Share with your team and community
- ✅ Build upon and create derivatives
No warranty provided. Use at your own discretion and always test in development before production.
Chakradhar Dodda
Senior Data Engineer | Azure Data Engineer | Databricks
Chakradhar is a Senior Data Engineer with 8 years of experience designing enterprise-scale data solutions on Azure Cloud and Databricks, specializing in ETL/ELT pipelines, Delta Lake architecture, and Medallion data lakehouse patterns. This repository collects production-tested patterns and best practices from real-world implementations across the Telecom and Energy domains.
Certifications:
- Databricks Certified Data Engineer Associate
- Microsoft Certified: Azure Data Engineer Associate (DP-203)
- Microsoft Certified: Azure AI Engineer Associate (AI-102)
Core Expertise:
Azure Databricks • Delta Lake • PySpark • Azure Data Factory • Azure Synapse Analytics • Data Lakehouse Architecture • Medallion Architecture • Unity Catalog • ETL/ELT Pipelines
- The Databricks community for sharing knowledge and best practices
- Contributors who enhance these skills with real-world experience
- Genie Code team for building an incredible AI assistant
- Users who provide feedback and star this repository
Built with ❤️ for the Databricks community
Last Updated: 2026-04-09
Repository Version: 1.0
Total Skills: 22 | Total Lines: 14,841
Created by: Chakradhar Dodda
