
Databricks Genie Code Skills Repository

A comprehensive collection of 22 production-ready skills for Genie Code, designed to empower data professionals working on the Databricks platform. This repository teaches Genie Code domain-specific expertise in data engineering, analytics, ML operations, platform optimization, and governance.

📊 Repository Stats: 22 skills | 14,841 lines of code | 8 categories | Production-tested patterns



📚 Overview

This repository enhances Genie Code's capabilities through four powerful configuration layers:

1. User Instructions (~/.assistant_instructions.md)

Personal, persistent instructions for Genie Code - your coding style, preferences, and project context.

2. Workspace Instructions (Admin-configured)

Organization-wide guidelines configured by workspace admins for consistency across teams.

3. User Skills (~/.assistant/skills/)

Custom skills you create to teach Genie Code new capabilities specific to your workflows.

4. Workspace Skills (Admin-configured)

Enterprise-level domain expertise available to all users in the workspace.


🗂️ Repository Structure

databricks-genie-code-skills/
├── README.md                               # This file
├── SKILLS_INVENTORY.md                     # Detailed skill catalog
├── .assistant_instructions.md              # Pre-configured assistant instructions template
├── examples/
│   ├── user-instructions.md                # Template for personal config
│   └── workspace-instructions.md           # Example workspace config
├── skills/                                 # 22 production skills
│   ├── auto-loader/                        # 593 lines
│   ├── cost-optimization/                  # 825 lines
│   ├── data-contracts/                     # 824 lines
│   ├── data-modeling/                      # 489 lines
│   ├── data-quality-checks/                # 552 lines
│   ├── delta-lake-optimization/            # 681 lines
│   ├── documentation-practices/            # 893 lines
│   ├── error-handling/                     # 677 lines
│   ├── feature-engineering/                # 822 lines
│   ├── incremental-processing/             # 593 lines
│   ├── job-scheduling/                     # 631 lines
│   ├── medallion-architecture/             # 497 lines
│   ├── mlflow-tracking/                    # 702 lines
│   ├── model-deployment/                   # 758 lines
│   ├── monitoring-observability/           # 427 lines
│   ├── performance-tuning/                 # 819 lines
│   ├── spark-optimization/                 # 960 lines
│   ├── sql-best-practices/                 # 826 lines
│   ├── streaming-pipelines/                # 644 lines
│   ├── testing-strategies/                 # 530 lines
│   ├── unity-catalog-governance/           # 348 lines
│   └── workflow-orchestration/             # 750 lines
└── docs/                                   # Additional documentation
    ├── installation.md
    ├── creating-skills.md
    └── best-practices.md
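
The per-skill line counts in the tree can be re-checked after you edit or add skills. The following is a minimal sketch, assuming the counts refer to each skill's SKILL.md file (the README does not say exactly how they were measured); `count_skill_lines` and `print_report` are hypothetical helper names, not part of this repository.

```python
from pathlib import Path


def count_skill_lines(skills_dir):
    """Return {skill_name: line_count} for every <skills_dir>/<skill>/SKILL.md."""
    counts = {}
    root = Path(skills_dir)
    if not root.is_dir():
        # Nothing to count if the folder is missing.
        return counts
    for skill in sorted(root.iterdir()):
        skill_file = skill / "SKILL.md"
        if skill_file.is_file():
            with skill_file.open(encoding="utf-8") as handle:
                counts[skill.name] = sum(1 for _ in handle)
    return counts


def print_report(skills_dir="skills"):
    """Print one row per skill plus a total row."""
    counts = count_skill_lines(skills_dir)
    for name, lines in counts.items():
        print(f"{name:<30} {lines:>6}")
    print(f"{'TOTAL':<30} {sum(counts.values()):>6} ({len(counts)} skills)")
```

Run `print_report("skills")` from the repository root; folders without a SKILL.md are skipped rather than counted as zero.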

🚀 Quick Start

Installation

Option 1: Direct Copy (Recommended)

# 1. Navigate to your Databricks workspace
# 2. Clone or download this repository to your workspace folder
# 3. Create Genie Code's skills folder if it does not exist yet, then copy the skills into it
mkdir -p ~/.assistant/skills
cp -r /Workspace/Users/your.email@company.com/databricks-genie-code-skills/skills/* ~/.assistant/skills/

# 4. (Recommended) Copy the pre-configured assistant instructions
#    This includes all 22 skill references for automatic skill loading
cp /Workspace/Users/your.email@company.com/databricks-genie-code-skills/.assistant_instructions.md ~/.assistant_instructions.md

Option 2: Git Clone

# Clone into your workspace
git clone <repository-url> /Workspace/Users/your.email@company.com/databricks-genie-code-skills

# Link skills to Genie Code (create the skills folder first if needed)
mkdir -p ~/.assistant/skills
ln -s /Workspace/Users/your.email@company.com/databricks-genie-code-skills/skills/* ~/.assistant/skills/

# Copy assistant instructions to your home directory
cp /Workspace/Users/your.email@company.com/databricks-genie-code-skills/.assistant_instructions.md ~/.assistant_instructions.md

Option 3: Selective Skills

# Copy only the specific skills you need (create the skills folder first if needed)
mkdir -p ~/.assistant/skills
cp -r /Workspace/Users/your.email@company.com/databricks-genie-code-skills/skills/delta-lake-optimization ~/.assistant/skills/
cp -r /Workspace/Users/your.email@company.com/databricks-genie-code-skills/skills/spark-optimization ~/.assistant/skills/

# Still recommended to copy assistant instructions
cp /Workspace/Users/your.email@company.com/databricks-genie-code-skills/.assistant_instructions.md ~/.assistant_instructions.md

Verify Installation

# List installed skills
ls -la ~/.assistant/skills/

# Each skill should have a SKILL.md file
cat ~/.assistant/skills/delta-lake-optimization/SKILL.md

# Verify assistant instructions
cat ~/.assistant_instructions.md
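
The manual check above can also be scripted. This is a minimal sketch, assuming every installed skill is a folder (or a symlink to one) directly under the skills directory; `find_incomplete_skills` is a hypothetical helper name, not something Genie Code provides.

```python
from pathlib import Path


def find_incomplete_skills(skills_dir):
    """Return the names of skill folders that lack a SKILL.md file."""
    root = Path(skills_dir).expanduser()
    if not root.is_dir():
        raise FileNotFoundError(f"Skills folder not found: {root}")
    return sorted(
        entry.name
        for entry in root.iterdir()
        # is_dir() follows symlinks, so linked skills (Option 2) are checked too.
        if entry.is_dir() and not (entry / "SKILL.md").is_file()
    )
```

Calling `find_incomplete_skills("~/.assistant/skills")` should return an empty list when every skill folder contains a SKILL.md.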

(Optional) Customize User Instructions

# If you prefer to start from the examples template instead
# (this overwrites any ~/.assistant_instructions.md you copied earlier)
cp /Workspace/Users/your.email@company.com/databricks-genie-code-skills/examples/user-instructions.md ~/.assistant_instructions.md

# Edit with your preferences in any text editor (the workspace file editor also works)
"${EDITOR:-vi}" ~/.assistant_instructions.md

💡 Using Skills with Genie Code

Once installed, skills are automatically available to Genie Code. Simply interact naturally:

Natural Language Queries

"Help me optimize this Delta table for better query performance"
→ Genie Code uses: delta-lake-optimization, spark-optimization

"Set up a medallion architecture pipeline with data quality checks"
→ Genie Code uses: medallion-architecture, data-quality-checks, incremental-processing

"Deploy this ML model with monitoring and versioning"
→ Genie Code uses: model-deployment, mlflow-tracking, monitoring-observability

Skill Selection

Genie Code automatically selects relevant skills based on:

  • Keywords in your question (Delta, Spark, MLflow, Unity Catalog, etc.)
  • Context from your notebook (existing code, table references, imports)
  • Task type (optimization, deployment, governance, orchestration)
  • Best practices (automatically applies patterns from multiple skills)
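
This README does not document how Genie Code actually ranks skills. Purely to illustrate the keyword criterion above, here is a toy sketch that can help you decide which skill names to mention explicitly in a prompt. The keyword map and the `suggest_skills` function are invented for this example and are not Genie Code's selection logic.

```python
import re

# Hypothetical keyword map for illustration only; extend it for your own skills.
SKILL_KEYWORDS = {
    "delta-lake-optimization": {"delta", "optimize", "vacuum"},
    "spark-optimization": {"spark", "join", "shuffle"},
    "mlflow-tracking": {"mlflow", "experiment"},
    "unity-catalog-governance": {"unity", "catalog", "lineage"},
}


def suggest_skills(question):
    """Return skills whose keywords appear as whole words in the question."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    return sorted(
        name for name, keywords in SKILL_KEYWORDS.items() if words & keywords
    )


print(suggest_skills("Help me optimize this Delta table"))
```

The real assistant also considers notebook context and task type, which a plain keyword match like this ignores.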

Advanced Usage

# Reference specific skills explicitly
"Using the spark-optimization skill, improve this join operation"

# Combine multiple skills
"Apply sql-best-practices and performance-tuning to optimize this query"

# Request comprehensive solutions
"Build a complete streaming pipeline following best practices"
→ Uses: streaming-pipelines, auto-loader, data-quality-checks,
        error-handling, monitoring-observability

✍️ Creating Custom Skills

Extend this repository with your own domain-specific skills:

Skill Template Structure

# Skill Name

## Purpose
Brief description of what this skill teaches Genie Code

## When to Use
Situations where this skill applies

## Key Concepts
1. Core principle 1
2. Core principle 2
3. Core principle 3

## Examples

### Example 1: Basic Usage
```python
# Code example with detailed comments
# Explain the pattern and why it works
```

### Example 2: Advanced Pattern
```python
# More complex real-world example
# Include error handling and edge cases
```

## Best Practices
- Practice 1 with rationale
- Practice 2 with rationale
- Practice 3 with rationale

## Common Pitfalls
- ❌ What to avoid
- ✅ What to do instead
- 💡 Why it matters

## Related Skills

Adding New Skills

  1. Create skill folder:

    mkdir -p skills/your-skill-name
    cd skills/your-skill-name
  2. Create SKILL.md file using the template above

  3. Add comprehensive examples (aim for 500+ lines)

  4. Test with Genie Code:

    cp -r skills/your-skill-name ~/.assistant/skills/
  5. Update this README in the Key Skills section
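
Before testing a new skill with Genie Code, it can help to confirm that its SKILL.md follows the template. This is a minimal sketch, assuming the template's top-level sections are written as `##` headings; `REQUIRED_SECTIONS` and `missing_sections` are hypothetical names introduced here, and nothing in this README says Genie Code enforces these sections.

```python
import re

# Section names taken from the skill template in this README.
REQUIRED_SECTIONS = (
    "Purpose",
    "When to Use",
    "Key Concepts",
    "Examples",
    "Best Practices",
    "Common Pitfalls",
    "Related Skills",
)


def missing_sections(markdown_text):
    """Return the template sections that have no matching '## ' heading."""
    headings = {
        match.group(1).strip()
        for match in re.finditer(
            r"^##[ \t]+(.+?)[ \t]*$", markdown_text, flags=re.MULTILINE
        )
    }
    return [section for section in REQUIRED_SECTIONS if section not in headings]
```

For example, `missing_sections(open("skills/your-skill-name/SKILL.md", encoding="utf-8").read())` returns an empty list when all seven sections are present. Deeper headings such as `### Example 1` are ignored.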


🀝 Contributing

We welcome and encourage contributions! This repository thrives on community knowledge and real-world patterns.

How to Contribute

✨ Share New Skills

  • Industry-specific patterns (healthcare, finance, retail, etc.)
  • Cloud-specific optimizations (AWS, Azure, GCP)
  • Advanced MLOps workflows
  • Real-time analytics patterns
  • Security and compliance patterns

🐛 Improve Existing Skills

  • Add more examples and use cases
  • Update with latest Databricks features
  • Fix errors or clarify explanations
  • Add edge cases and troubleshooting tips

📚 Enhance Documentation

  • Better installation guides
  • Video tutorials or notebooks
  • Architecture diagrams
  • Performance benchmarks

Contribution Process

  1. Fork this repository
  2. Create a feature branch (git checkout -b feature/your-skill-name)
  3. Add your skill following the template structure
  4. Test with Genie Code to ensure it works
  5. Submit a Pull Request with clear description
  6. Share your use case - help others understand when to use it

What Makes a Great Contribution?

✅ Production-tested patterns (real-world usage)
✅ Comprehensive examples (500+ lines recommended)
✅ Clear explanations (why, not just how)
✅ Best practices & pitfalls (learn from experience)
✅ Related skills (help Genie Code connect concepts)


⭐ Show Your Support

If this repository helps you or your team:

  • ⭐ Star this repository to help others discover it
  • 🔗 Share with colleagues working on Databricks
  • 💬 Provide feedback on what works and what doesn't
  • 🤝 Contribute your own skills from production use cases

Your stars and contributions help grow this knowledge base for the entire Databricks community!


💬 Support & Feedback

This is a living repository designed to evolve with the Databricks platform, Genie Code capabilities, and community needs.

Get Help

  • Questions? Open an issue with the question label
  • Bug found? Report with detailed reproduction steps
  • Feature request? Share your use case and requirements

Share Feedback

  • What's working well? Tell us which skills save you the most time
  • What's missing? Suggest new skill categories or topics
  • What needs improvement? Help us enhance existing content
  • Production examples? Share your real-world usage patterns

Join the Community

  • Participate in discussions
  • Review pull requests
  • Share your learning journey
  • Help other users

We read every issue and pull request. Your feedback directly shapes this repository's evolution.


🌐 Connect & Collaborate

Open to Discussion

I'm happy to discuss:

  • 📈 Scaling patterns - How you're using these skills in production
  • 🎓 Training programs - Onboarding teams to Databricks with these skills
  • 🏢 Enterprise adoption - Deploying skills across large organizations
  • 🤝 Partnerships - Collaborating on industry-specific skill libraries
  • 📝 Content creation - Writing blogs, tutorials, or courses together
  • 🎤 Speaking opportunities - Presenting at conferences or meetups

Have an idea or opportunity? Open a discussion, reach out through issues, or connect with me on LinkedIn!


📚 Additional Resources

Databricks Documentation

Community Resources

Learning Platforms


🎓 Learning Paths

Whether you're just starting with Databricks or optimizing production workloads, follow these curated learning paths:

🌱 Beginner Path (New to Databricks)

Goal: Build foundation in data engineering and analytics

  1. Start: SQL Best Practices (826 lines)

    • Learn Databricks SQL syntax and optimization patterns
    • Understand query execution and performance
  2. Next: Delta Lake Optimization (681 lines)

    • Master the lakehouse storage layer
    • Learn OPTIMIZE, VACUUM, and Z-ordering (ZORDER BY)
  3. Then: Medallion Architecture (497 lines)

    • Understand Bronze/Silver/Gold patterns
    • Build multi-hop data pipelines
  4. Practice: Data Quality Checks (552 lines)

    • Implement validation in your pipelines
    • Use Lakehouse Monitoring

Time Investment: 2-3 weeks | Lines to Study: 2,556 lines


🚀 Intermediate Path (Building Production Pipelines)

Goal: Create reliable, scalable data pipelines

  1. Foundation: Incremental Processing (593 lines)

    • Efficient batch and streaming patterns
    • Change Data Feed and merge operations
  2. Ingestion: Auto Loader (593 lines)

    • Schema inference and evolution
    • Unity Catalog Volumes integration
  3. Real-time: Streaming Pipelines (644 lines)

    • Structured Streaming and DLT
    • Exactly-once processing guarantees
  4. Governance: Unity Catalog Governance (348 lines)

    • Access control and lineage
    • Data discovery and audit logging
  5. Quality: Data Contracts (824 lines)

    • Schema enforcement and validation
    • Contract-driven development
  6. Operations: Workflow Orchestration (750 lines)

    • Multi-task job dependencies
    • Error handling and monitoring

Time Investment: 4-6 weeks | Lines to Study: 3,752 lines


⚡ Advanced Path (Performance & Scale Optimization)

Goal: Optimize for cost, performance, and scale

  1. Performance: Spark Optimization (960 lines)

    • Partitioning strategies and caching
    • Broadcast joins and shuffle optimization
    • Photon engine best practices
  2. Tuning: Performance Tuning (819 lines)

    • Query plan analysis
    • Memory and resource tuning
    • Bottleneck identification
  3. Cost: Cost Optimization (825 lines)

    • Cluster sizing and autoscaling
    • Serverless vs. classic compute
    • Spot instance strategies
  4. Reliability: Error Handling (677 lines)

    • Retry strategies and circuit breakers
    • Failure recovery patterns
    • Alert configuration
  5. Monitoring: Monitoring & Observability (427 lines)

    • Metrics collection and dashboards
    • System audit logging
    • SLO tracking

Time Investment: 3-4 weeks | Lines to Study: 3,708 lines


🤖 ML Engineering Path (End-to-End ML Operations)

Goal: Deploy and manage production ML systems

  1. Features: Feature Engineering (822 lines)

    • Feature Store implementation
    • Real-time feature serving
    • Feature transformation pipelines
  2. Tracking: MLflow Tracking (702 lines)

    • Experiment tracking and comparison
    • Model registry and versioning
    • Artifact management
  3. Deployment: Model Deployment (758 lines)

    • Batch vs. real-time inference
    • REST API endpoints
    • Serverless model serving
  4. Scheduling: Job Scheduling (631 lines)

    • Training job orchestration
    • Retraining triggers
    • Model monitoring jobs
  5. Quality: Testing Strategies (530 lines)

    • Model validation tests
    • Integration testing
    • A/B testing patterns

Time Investment: 3-4 weeks | Lines to Study: 3,443 lines


🏒 Enterprise Path (Team Standards & Best Practices)

Goal: Establish organizational standards and practices

  1. Documentation: Documentation Practices (893 lines)

    • Code documentation standards
    • Notebook organization
    • Knowledge management
  2. Modeling: Data Modeling (489 lines)

    • Dimensional modeling (star/snowflake)
    • SCD Type 2 patterns
    • Normalization techniques
  3. Testing: Testing Strategies (530 lines)

    • Unit and integration testing
    • Data validation frameworks
    • CI/CD integration
  4. Monitoring: Monitoring & Observability (427 lines)

    • Team dashboards and alerts
    • Audit logging and compliance
    • Performance tracking

Time Investment: 2-3 weeks | Lines to Study: 2,339 lines


📊 Complete Mastery Path (All Skills)

Goal: Comprehensive expertise across all domains

Complete all 22 skills in recommended order by category:

  1. Data Engineering (5 skills, 3,063 lines)
  2. Governance & Architecture (3 skills, 1,669 lines)
  3. Analytics & Data Modeling (2 skills, 1,315 lines)
  4. Performance & Optimization (3 skills, 2,604 lines)
  5. Orchestration & Workflow (3 skills, 2,058 lines)
  6. ML Operations (3 skills, 2,282 lines)
  7. Quality & Monitoring (2 skills, 957 lines)
  8. Best Practices (1 skill, 893 lines)

Time Investment: 12-16 weeks | Total Content: 14,841 lines
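
As a quick sanity check, the category figures listed above add up to the repository totals. The sketch below only restates numbers from this README; the variable names are arbitrary.

```python
# (category, skills, lines) exactly as listed in the Complete Mastery Path.
CATEGORIES = [
    ("Data Engineering", 5, 3063),
    ("Governance & Architecture", 3, 1669),
    ("Analytics & Data Modeling", 2, 1315),
    ("Performance & Optimization", 3, 2604),
    ("Orchestration & Workflow", 3, 2058),
    ("ML Operations", 3, 2282),
    ("Quality & Monitoring", 2, 957),
    ("Best Practices", 1, 893),
]

total_skills = sum(skills for _, skills, _ in CATEGORIES)
total_lines = sum(lines for _, _, lines in CATEGORIES)

print(total_skills, total_lines)  # prints: 22 14841
```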


📄 License

This repository is provided as-is for educational and productivity purposes.

  • ✅ Free to use, modify, and distribute
  • ✅ Customize freely for your organization's needs
  • ✅ Share with your team and community
  • ✅ Build upon and create derivatives

No warranty provided. Use at your own discretion and always test in development before production.


🙏 Acknowledgments

Created By

Chakradhar Dodda
Senior Data Engineer | Azure Data Engineer | Databricks

LinkedIn

A Senior Data Engineer with 8 years of experience designing enterprise-scale data solutions on Azure Cloud and Databricks. Specializing in ETL/ELT pipelines, Delta Lake architecture, and Medallion data lakehouse patterns. This repository represents production-tested patterns and best practices from real-world implementations across Telecom and Energy domains.

Certifications:
🏅 Databricks Certified Data Engineer Associate
🏅 Microsoft Certified: Azure Data Engineer Associate (DP-203)
🏅 Microsoft Certified: Azure AI Engineer Associate (AI-102)

Core Expertise:
Azure Databricks • Delta Lake • PySpark • Azure Data Factory • Azure Synapse Analytics • Data Lakehouse Architecture • Medallion Architecture • Unity Catalog • ETL/ELT Pipelines


Special Thanks

  • The Databricks community for sharing knowledge and best practices
  • Contributors who enhance these skills with real-world experience
  • Genie Code team for building an incredible AI assistant
  • Users who provide feedback and star this repository

Built with ❤️ for the Databricks community

Last Updated: 2026-04-09
Repository Version: 1.0
Total Skills: 22 | Total Lines: 14,841
Created by: Chakradhar Dodda

About

Databricks Genie Code Skills 🧞‍♂️⚡ is a curated collection of code resources, utilities, and best practices to power up your data workflow on the Databricks platform. This repo offers skills for 🚀 Data engineering automation 🤖 Machine learning workflows 📊 Data analytics solutions 🛠️ Productivity tools for Databricks users
