drateberry/cms-detector

Universal CMS Detector

A comprehensive Python script that automatically detects Content Management Systems (CMS) and identifies their version numbers from a list of URLs.

Supported CMS Platforms

The script can detect 25+ popular CMS and web platforms:

Content Management Systems

  • WordPress - with version detection
  • Joomla - with version detection
  • Drupal - with version detection
  • TYPO3 - with version detection
  • Craft CMS - with version detection
  • Ghost - with version detection
  • Django CMS
  • Bitrix - with version detection

E-Commerce Platforms

  • Shopify
  • Magento - with version detection
  • PrestaShop - with version detection
  • OpenCart - with version detection
  • WooCommerce (detected as WordPress)

Website Builders

  • Wix
  • Squarespace
  • Weebly
  • Webflow
  • HubSpot CMS

Static Site Generators

  • Jekyll - with version detection
  • Hugo - with version detection
  • Gatsby - with version detection
  • GitHub Pages

Other Platforms

  • Blogger
  • Contentful
  • ASP.NET
  • Node.js/Express
  • PHP-based CMS (generic detection)

Features

  • 🔍 Multi-CMS Detection: Identifies 25+ different CMS platforms
  • 📊 Version Discovery: Extracts version numbers when available
  • 🎯 Confidence Scoring: Provides confidence levels (High/Medium/Low) for detections
  • Parallel Processing: Checks multiple sites simultaneously
  • 📈 Smart Detection: Uses multiple detection methods per CMS
  • 🛡️ Error Handling: Gracefully handles timeouts and connection issues
  • 📝 Detailed Reporting: CSV output with CMS, version, and confidence levels

Requirements

  • Python 3.6 or higher
  • requests library

Installation

  1. Clone or download the script cms_detector.py

  2. Install the required Python package:

    pip install requests

Usage

Basic Usage

  1. Create an input CSV file with URLs (one per line):

    example.com
    https://wordpress-site.org
    shopify-store.com
    joomla-website.net

    Note: The script automatically adds https:// if no protocol is specified.

  2. Run the script:

    python cms_detector.py input_urls.csv

  3. The script will create cms_detection_results.csv with the results.
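
The automatic https:// prefix mentioned in step 1 could work roughly as in the sketch below. The function name normalize_url is hypothetical and the script's own implementation may differ; this only illustrates the behavior described in the note.

```python
def normalize_url(raw: str) -> str:
    """Return the URL with https:// prepended when no scheme is given.

    Hypothetical helper: the name and exact rules are assumptions,
    not taken from cms_detector.py.
    """
    url = raw.strip()
    # Leave URLs that already carry an explicit scheme untouched.
    if not url.lower().startswith(("http://", "https://")):
        url = "https://" + url
    return url


for line in ["example.com", "https://wordpress-site.org", "shopify-store.com"]:
    print(normalize_url(line))
```

An explicit http:// in the input file is preserved, which matters for sites that do not serve HTTPS.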

Advanced Usage

Customize the behavior with command-line options:

python cms_detector.py input_urls.csv -o results.csv -t 15 -w 10

Command Line Options

Option  Long Form   Description                 Default
-o      --output    Output CSV filename         cms_detection_results.csv
-t      --timeout   Request timeout in seconds  10
-w      --workers   Number of parallel workers  5
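
For readers extending the script, the options above map naturally onto argparse. The following is a minimal sketch under the assumption that the script uses argparse; the positional argument name input_file and the help strings are illustrative, while the flags and defaults come from the table.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (illustrative sketch)."""
    parser = argparse.ArgumentParser(
        description="Detect CMS platforms for a list of URLs"
    )
    # 'input_file' is an assumed name for the positional CSV argument.
    parser.add_argument("input_file", help="CSV file with one URL per line")
    parser.add_argument("-o", "--output", default="cms_detection_results.csv",
                        help="Output CSV filename")
    parser.add_argument("-t", "--timeout", type=int, default=10,
                        help="Request timeout in seconds")
    parser.add_argument("-w", "--workers", type=int, default=5,
                        help="Number of parallel workers")
    return parser


# Parse the same arguments as the Advanced Usage example, minus -o.
args = build_parser().parse_args(["input_urls.csv", "-t", "15", "-w", "10"])
print(args.output, args.timeout, args.workers)
```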

Examples

Quick scan with default settings:

python cms_detector.py sites.csv

Scan with longer timeout for slow servers:

python cms_detector.py sites.csv -t 30

Fast parallel scanning of large list:

python cms_detector.py large_list.csv -w 20 -o bulk_results.csv

Output Format

The script generates a CSV file with four columns:

Column      Description                 Example Values
URL         The original URL            example.com
CMS         Detected CMS platform       WordPress, Shopify, Unknown
Version     CMS version (if available)  6.4.2, unknown, N/A
Confidence  Detection confidence level  High, Medium, Low, N/A

Sample Output

URL,CMS,Version,Confidence
techblog.com,WordPress,6.4.2,High
store.example.com,Shopify,unknown,High
news-site.org,Joomla,4.3.1,Medium
company.com,Drupal,10.1.5,High
portfolio.net,Wix,unknown,High
unknown-site.com,Unknown,N/A,N/A
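
A file in this format can be produced with Python's standard csv module. The sketch below is not taken from the script; the helper name results_to_csv is hypothetical, and it writes to an in-memory buffer so it can be run without touching disk.

```python
import csv
import io

FIELDNAMES = ["URL", "CMS", "Version", "Confidence"]


def results_to_csv(rows) -> str:
    """Serialize result dictionaries to CSV text with the documented header."""
    buffer = io.StringIO()
    # lineterminator="\n" keeps the output identical across platforms.
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    {"URL": "techblog.com", "CMS": "WordPress", "Version": "6.4.2", "Confidence": "High"},
    {"URL": "unknown-site.com", "CMS": "Unknown", "Version": "N/A", "Confidence": "N/A"},
]
print(results_to_csv(rows), end="")
```

Using csv.DictWriter rather than manual string joins ensures that URLs containing commas are quoted correctly.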

Understanding Confidence Levels

  • High: Multiple strong indicators found (3+ indicators, specific headers, or known hosting patterns)
  • Medium: Some indicators found (2 indicators or moderate certainty)
  • Low: Few indicators found (1 indicator, might be false positive)
  • N/A: No CMS detected or error occurred
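
The thresholds above can be expressed as a small function. This is a sketch of the rule as described in the list, not the script's actual code; the function name and the strong_signal flag (standing in for "specific headers or known hosting patterns") are assumptions.

```python
def confidence_level(indicator_count: int, strong_signal: bool = False) -> str:
    """Map an indicator count to a confidence label.

    strong_signal is an assumed flag for a CMS-specific header or a
    known hosting pattern, which the list above treats as High.
    """
    if indicator_count <= 0 and not strong_signal:
        return "N/A"
    if strong_signal or indicator_count >= 3:
        return "High"
    if indicator_count == 2:
        return "Medium"
    return "Low"


for count in (0, 1, 2, 3):
    print(count, confidence_level(count))
```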

Detection Methods

The script combines five detection techniques:

1. HTML Analysis

  • Searches for CMS-specific paths (/wp-content/, /components/com_, etc.)
  • Identifies unique HTML structures and class names
  • Detects CMS-specific JavaScript files and variables
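
Path-based matching of this kind can be sketched as a substring count per platform. The signature table below is illustrative: /wp-content/ and /components/com_ come from the list above, while /wp-includes/ and /media/system/js/ are additional well-known paths added for the example, and the names SIGNATURES and count_indicators are hypothetical.

```python
# Illustrative signatures only; the script's real table is larger.
SIGNATURES = {
    "WordPress": ["/wp-content/", "/wp-includes/"],
    "Joomla": ["/components/com_", "/media/system/js/"],
}


def count_indicators(html: str, signatures=SIGNATURES) -> dict:
    """Return {cms_name: number_of_matching_indicators} for matches > 0."""
    haystack = html.lower()
    counts = {}
    for cms, indicators in signatures.items():
        hits = sum(1 for indicator in indicators if indicator in haystack)
        if hits:
            counts[cms] = hits
    return counts


page = (
    '<link rel="stylesheet" href="/wp-content/themes/demo/style.css">'
    '<script src="/wp-includes/js/jquery/jquery.min.js"></script>'
)
print(count_indicators(page))
```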

2. HTTP Headers

  • Examines X-Powered-By headers
  • Checks for CMS-specific headers (X-Shopify, X-Wix, etc.)
  • Analyzes server response patterns
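
Header inspection might look like the sketch below. It assumes that platform headers can be recognized by a name prefix (x-shopify, x-wix) and that X-Powered-By carries values such as ASP.NET or Express; the table and function names are hypothetical, and header names are compared case-insensitively because HTTP header names are not case-sensitive.

```python
# Assumed prefix table; real platform headers vary and may change.
HEADER_PREFIXES = {"x-shopify": "Shopify", "x-wix": "Wix"}
POWERED_BY_HINTS = {"asp.net": "ASP.NET", "express": "Node.js/Express"}


def detect_from_headers(headers: dict) -> list:
    """Return platform names suggested by the response headers."""
    lowered = {name.lower(): str(value) for name, value in headers.items()}
    found = []
    # Any header whose name starts with a known prefix counts as a hit.
    for name in lowered:
        for prefix, cms in HEADER_PREFIXES.items():
            if name.startswith(prefix) and cms not in found:
                found.append(cms)
    # X-Powered-By often names the framework or runtime directly.
    powered_by = lowered.get("x-powered-by", "").lower()
    for hint, platform in POWERED_BY_HINTS.items():
        if hint in powered_by and platform not in found:
            found.append(platform)
    return found


print(detect_from_headers({"X-Shopify-Stage": "production"}))
print(detect_from_headers({"X-Powered-By": "Express"}))
```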

3. Meta Tags

  • Looks for generator meta tags with version info
  • Checks for CMS-specific meta properties
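
Reading a generator meta tag can be done with the standard library alone. The sketch below uses html.parser to find the tag and a regular expression to split off a trailing version; the class and function names are hypothetical, and the script itself may well use plain regular expressions instead.

```python
import re
from html.parser import HTMLParser


class GeneratorParser(HTMLParser):
    """Collect the content of the first <meta name="generator"> tag."""

    def __init__(self):
        super().__init__()
        self.generator = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.generator is not None:
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "generator":
            self.generator = attr.get("content")


def parse_generator(html: str):
    """Return (name, version); either may be None when not present."""
    parser = GeneratorParser()
    parser.feed(html)
    if not parser.generator:
        return None, None
    text = parser.generator.strip()
    # A version is a dotted number that starts the string or follows whitespace,
    # so the digit in a name such as "TYPO3" is not mistaken for one.
    match = re.search(r"(?:^|\s)v?(\d+(?:\.\d+)*)", text)
    if not match:
        return text, None
    return text[: match.start()].strip(), match.group(1)


print(parse_generator('<meta name="generator" content="WordPress 6.4.2" />'))
```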

4. Special Files

  • Tests for CMS-specific files (readme.html, CHANGELOG.txt)
  • Checks admin panel URLs (/wp-admin/, /administrator/)
  • Probes API endpoints (/wp-json/, /jsonapi/)
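
Probe URLs are best built with urllib.parse.urljoin so that paths resolve against the site root regardless of which page was scanned. In the sketch below the paths are the ones listed above, but their assignment to a particular CMS and the names PROBE_PATHS and probe_urls are assumptions made for illustration. The sketch only constructs URLs and sends no requests.

```python
from urllib.parse import urljoin

# Assumed mapping of the documented probe paths to platforms.
PROBE_PATHS = {
    "WordPress": ["/readme.html", "/wp-admin/", "/wp-json/"],
    "Joomla": ["/administrator/"],
    "Drupal": ["/CHANGELOG.txt", "/jsonapi/"],
}


def probe_urls(base_url: str, cms: str) -> list:
    """Return absolute probe URLs for the given platform."""
    # A leading slash makes urljoin resolve against the host root.
    return [urljoin(base_url, path) for path in PROBE_PATHS.get(cms, [])]


for url in probe_urls("https://example.com/blog/", "WordPress"):
    print(url)
```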

5. Version Detection

  • Parses version from generator tags
  • Extracts from changelog files
  • Reads from manifest/configuration files
  • Checks RSS/Atom feeds

Performance Tips

For Large Lists (100+ URLs)

  • Increase workers: -w 15 or -w 20
  • Use reasonable timeout: -t 10 (default is usually fine)
  • Consider splitting very large lists (1000+) into batches

For Slow/International Sites

  • Increase timeout: -t 20 or -t 30
  • Reduce workers to avoid overwhelming the connection: -w 3

For Fast Local Scanning

  • Reduce timeout for known fast sites: -t 5
  • Increase workers for local network: -w 10
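
The effect of the -w option can be pictured with a thread pool. The following sketch assumes the script uses concurrent.futures, which this README does not confirm; scan_all and check_site are hypothetical names, and a stand-in function replaces the real network check so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def scan_all(urls, check_site, workers: int = 5) -> dict:
    """Run check_site(url) for every URL using a bounded thread pool."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(check_site, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                # One failing site should not abort the whole scan.
                results[url] = {"URL": url, "CMS": "Error", "detail": str(exc)}
    return results


def fake_check(url):
    """Stand-in for the real detection call; makes no network request."""
    if "slow" in url:
        raise TimeoutError("request timed out")
    return {"URL": url, "CMS": "WordPress"}


print(scan_all(["a.example", "slow.example"], fake_check, workers=2))
```

Because the work is dominated by waiting on network responses, threads are usually sufficient here, and raising the worker count mainly trades scan time against load on the local connection and the target servers.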

Troubleshooting

Common Issues and Solutions

"Unknown" for most sites:

  • Sites may use a custom CMS or static HTML
  • Heavy caching/CDN might hide CMS signatures
  • Try accessing sites directly to verify they're using a CMS

"Error" results:

  • Check if URLs are accessible in browser
  • Increase timeout with -t 30 for slow sites
  • Verify internet connection stability

Version shows "unknown" despite correct CMS detection:

  • Many sites hide version numbers for security
  • Hosted platforms (Wix, Shopify) don't expose versions
  • Security plugins may remove version information

SSL/Certificate errors:

  • The site may have an invalid, expired, or self-signed certificate
  • Specify https:// or http:// explicitly in the URL

Testing the Script

Test with known CMS sites:

echo "wordpress.org" > test.csv
echo "joomla.org" >> test.csv
echo "drupal.org" >> test.csv
python cms_detector.py test.csv

Limitations

  • Authentication Required: Cannot detect CMS on sites requiring login
  • Security Plugins: Some security tools hide CMS signatures
  • CDN/WAF: Cloudflare and similar services may block detection
  • Custom CMS: Proprietary or heavily modified CMS may not be detected
  • Headless CMS: API-based CMS without frontend markers are harder to detect
  • Version Hiding: Many sites remove version info for security

Best Practices

Ethical Usage

  • Only scan sites you have permission to analyze
  • Respect robots.txt and rate limits
  • Don't use for malicious reconnaissance
  • Consider adding delays for large-scale scanning

Security Considerations

  • This tool performs read-only operations (GET requests)
  • It makes no authentication attempts and performs no vulnerability scanning
  • Results should be used for legitimate administrative purposes

How It Works

  1. Initial Request: Fetches the homepage HTML and headers
  2. Pattern Matching: Compares against signatures of 25+ CMS platforms
  3. Confidence Scoring: Rates detection based on number of indicators
  4. Version Extraction: Attempts multiple methods to find version
  5. Special Probes: Makes additional requests for version files if needed
  6. Result Compilation: Outputs findings with confidence levels

Contributing

Contributions are welcome! To add support for a new CMS:

  1. Add detection patterns to self.cms_patterns dictionary
  2. Include version extraction patterns if available
  3. Add special URL endpoints if applicable
  4. Test with known sites using that CMS

When reporting issues, please include:

  • Python version (python --version)
  • Error messages
  • Sample URLs (if not sensitive)
  • Command used

Output Summary

After processing, the script displays:

  • Real-time progress for each URL
  • CMS detection results with confidence
  • Summary statistics showing CMS distribution

Example summary output:

=== SUMMARY ===
WordPress: 45
Shopify: 23
Unknown: 12
Drupal: 8
Joomla: 5
Wix: 3
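
A summary like this can be produced with collections.Counter. The sketch below is an illustration rather than the script's code; the function name summarize is hypothetical, and it assumes each result is a dictionary with a "CMS" key.

```python
from collections import Counter


def summarize(results) -> str:
    """Format a CMS distribution, most frequent platform first."""
    counts = Counter(row["CMS"] for row in results)
    lines = ["=== SUMMARY ==="]
    lines += [f"{cms}: {count}" for cms, count in counts.most_common()]
    return "\n".join(lines)


results = [
    {"CMS": "WordPress"},
    {"CMS": "Shopify"},
    {"CMS": "WordPress"},
    {"CMS": "Unknown"},
]
print(summarize(results))
```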

Advanced Features

Custom Detection Rules

You can extend the script by adding custom CMS patterns:

'CustomCMS': {
    'indicators': ['custom-signature', '/custom-path/'],
    'version_patterns': [
        (r'CustomCMS v([\d.]+)', 'html'),
    ],
    'special_urls': ['/custom-admin/']
}
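
To show how such an entry might be consumed, here is a minimal sketch that applies one rule to a page. The function apply_rule is hypothetical, and the reading of the second tuple element ('html' meaning "search the page body") is an assumption based on the example entry; special_urls is not used in this sketch.

```python
import re

CUSTOM_PATTERNS = {
    'CustomCMS': {
        'indicators': ['custom-signature', '/custom-path/'],
        'version_patterns': [
            (r'CustomCMS v([\d.]+)', 'html'),
        ],
        'special_urls': ['/custom-admin/'],
    }
}


def apply_rule(name: str, rule: dict, html: str):
    """Return a result dict if any indicator matches, otherwise None."""
    hits = sum(1 for indicator in rule['indicators'] if indicator in html)
    if hits == 0:
        return None
    version = 'unknown'
    for pattern, source in rule['version_patterns']:
        # Assumption: 'html' means the pattern is applied to the page body.
        if source == 'html':
            match = re.search(pattern, html)
            if match:
                version = match.group(1)
                break
    return {'CMS': name, 'Version': version, 'indicators': hits}


page = '<div class="custom-signature">Powered by CustomCMS v2.1</div>'
print(apply_rule('CustomCMS', CUSTOM_PATTERNS['CustomCMS'], page))
```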

Batch Processing

For processing multiple CSV files:

mkdir -p results
for file in *.csv; do
    # Write outputs to a separate directory so reruns do not pick them up as inputs
    python cms_detector.py "$file" -o "results/${file}"
done

Filtering Results

Post-process results to find specific CMS:

# Find all WordPress sites
grep ",WordPress," cms_detection_results.csv

# Count sites by CMS (skip the header row)
tail -n +2 cms_detection_results.csv | cut -d',' -f2 | sort | uniq -c

Version History

  • 2.0.0 - Complete rewrite as universal CMS detector

    • Added support for 25+ CMS platforms
    • Implemented confidence scoring
    • Enhanced version detection methods
    • Improved parallel processing

  • 1.0.0 - Initial WordPress-only version

License

This script is provided as-is for educational and administrative purposes. Users are responsible for ensuring their use complies with applicable laws and website terms of service.

Disclaimer

This tool is designed for legitimate website administration and analysis. It should not be used for:

  • Unauthorized security scanning
  • Building databases of vulnerable sites
  • Competitive intelligence without permission
  • Any activity that violates laws or terms of service

Always ensure you have proper authorization before scanning websites you don't own.

Support

For questions, issues, or feature requests, please create an issue in the project repository. Include relevant details such as error messages, Python version, and example URLs (if appropriate).

Acknowledgments

This script uses publicly available CMS signatures and detection methods. It does not exploit any vulnerabilities or use any proprietary detection techniques.
