A comprehensive Python script that automatically detects Content Management Systems (CMS) and identifies their version numbers from a list of URLs.
The script can detect 25+ popular CMS and web platforms:
- WordPress - with version detection
- Joomla - with version detection
- Drupal - with version detection
- TYPO3 - with version detection
- Craft CMS - with version detection
- Ghost - with version detection
- Django CMS
- Bitrix - with version detection
- Shopify
- Magento - with version detection
- PrestaShop - with version detection
- OpenCart - with version detection
- WooCommerce (detected as WordPress)
- Wix
- Squarespace
- Weebly
- Webflow
- HubSpot CMS
- Jekyll - with version detection
- Hugo - with version detection
- Gatsby - with version detection
- GitHub Pages
- Blogger
- Contentful
- ASP.NET
- Node.js/Express
- PHP-based CMS (generic detection)
- 🔍 Multi-CMS Detection: Identifies 25+ different CMS platforms
- 📊 Version Discovery: Extracts version numbers when available
- 🎯 Confidence Scoring: Provides confidence levels (High/Medium/Low) for detections
- ⚡ Parallel Processing: Checks multiple sites simultaneously
- 📈 Smart Detection: Uses multiple detection methods per CMS
- 🛡️ Error Handling: Gracefully handles timeouts and connection issues
- 📝 Detailed Reporting: CSV output with CMS, version, and confidence levels
- Python 3.6 or higher
- `requests` library

- Clone or download the script `cms_detector.py`.
- Install the required Python package:

      pip install requests

- Create an input CSV file with URLs (one per line):

      example.com
      https://wordpress-site.org
      shopify-store.com
      joomla-website.net

  Note: The script automatically adds `https://` if no protocol is specified.
- Run the script:

      python cms_detector.py input_urls.csv

- The script will create `cms_detection_results.csv` with the results.
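The automatic `https://` prefixing mentioned in the note can be pictured with a small sketch. The function names `normalize_url` and `read_urls` are hypothetical and are not taken from `cms_detector.py`; the script's actual logic may differ.

```python
def normalize_url(raw, default_scheme="https"):
    """Return the URL with a scheme, adding default_scheme when none is given."""
    url = raw.strip()
    if not url.lower().startswith(("http://", "https://")):
        url = f"{default_scheme}://{url}"
    return url


def read_urls(lines):
    """Yield normalized URLs from an iterable of lines, skipping blank ones."""
    for line in lines:
        if line.strip():
            yield normalize_url(line)


# Example with in-memory lines instead of a real input file.
for url in read_urls(["example.com", "https://wordpress-site.org", ""]):
    print(url)
```

With a real input file, the same generator could be fed an open file object, since iterating a file yields its lines.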
Customize the behavior with command-line options:

    python cms_detector.py input_urls.csv -o results.csv -t 15 -w 10

| Option | Long Form | Description | Default |
|---|---|---|---|
| `-o` | `--output` | Output CSV filename | `cms_detection_results.csv` |
| `-t` | `--timeout` | Request timeout in seconds | 10 |
| `-w` | `--workers` | Number of parallel workers | 5 |
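For readers who want to see how options like these are commonly wired up, here is a minimal `argparse` sketch that mirrors the table. The positional name `input_file`, the help strings, and the `build_parser` helper are assumptions made for illustration; the script itself may define its parser differently.

```python
import argparse


def build_parser():
    """Build a parser whose flags and defaults mirror the options table."""
    parser = argparse.ArgumentParser(
        description="Detect CMS platforms and versions for a list of URLs."
    )
    # "input_file" is a hypothetical name for the positional CSV argument.
    parser.add_argument("input_file", help="CSV file with one URL per line")
    parser.add_argument("-o", "--output", default="cms_detection_results.csv",
                        help="Output CSV filename")
    parser.add_argument("-t", "--timeout", type=int, default=10,
                        help="Request timeout in seconds")
    parser.add_argument("-w", "--workers", type=int, default=5,
                        help="Number of parallel workers")
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(["input_urls.csv", "-t", "15", "-w", "10"])
print(args.input_file, args.output, args.timeout, args.workers)
```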
Quick scan with default settings:

    python cms_detector.py sites.csv

Scan with longer timeout for slow servers:

    python cms_detector.py sites.csv -t 30

Fast parallel scanning of large list:

    python cms_detector.py large_list.csv -w 20 -o bulk_results.csv

The script generates a CSV file with four columns:

| Column | Description | Example Values |
|---|---|---|
| URL | The original URL | example.com |
| CMS | Detected CMS platform | WordPress, Shopify, Unknown |
| Version | CMS version (if available) | 6.4.2, unknown, N/A |
| Confidence | Detection confidence level | High, Medium, Low, N/A |

    URL,CMS,Version,Confidence
    techblog.com,WordPress,6.4.2,High
    store.example.com,Shopify,unknown,High
    news-site.org,Joomla,4.3.1,Medium
    company.com,Drupal,10.1.5,High
    portfolio.net,Wix,unknown,High
    unknown-site.com,Unknown,N/A,N/A

- High: Multiple strong indicators found (3+ indicators, specific headers, or known hosting patterns)
- Medium: Some indicators found (2 indicators or moderate certainty)
- Low: Few indicators found (1 indicator, might be false positive)
- N/A: No CMS detected or error occurred
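The thresholds above can be expressed as a small mapping from indicator count to level. This is only a sketch of the rule as described: the function name `confidence_level` and the `strong_signal` flag (standing in for "specific headers, or known hosting patterns") are hypothetical, and the script's real scoring may weigh indicators differently.

```python
def confidence_level(indicator_count, strong_signal=False):
    """Map a number of matched indicators to High/Medium/Low/N/A.

    strong_signal is an assumed flag for a CMS-specific header or a known
    hosting pattern, which the description treats as High on its own.
    """
    if strong_signal or indicator_count >= 3:
        return "High"
    if indicator_count == 2:
        return "Medium"
    if indicator_count == 1:
        return "Low"
    return "N/A"


for count in (0, 1, 2, 3):
    print(count, confidence_level(count))
```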
The script uses multiple detection techniques:

- Searches for CMS-specific paths (`/wp-content/`, `/components/com_`, etc.)
- Identifies unique HTML structures and class names
- Detects CMS-specific JavaScript files and variables

- Examines `X-Powered-By` headers
- Checks for CMS-specific headers (X-Shopify, X-Wix, etc.)
- Analyzes server response patterns

- Looks for generator meta tags with version info
- Checks for CMS-specific meta properties

- Tests for CMS-specific files (readme.html, CHANGELOG.txt)
- Checks admin panel URLs (`/wp-admin/`, `/administrator/`)
- Probes API endpoints (`/wp-json/`, `/jsonapi/`)

- Parses version from generator tags
- Extracts from changelog files
- Reads from manifest/configuration files
- Checks RSS/Atom feeds
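As an illustration of the generator-tag technique, the sketch below reads a `<meta name="generator">` tag with the standard-library HTML parser and pulls a dotted version number out of its content. The class and function names are hypothetical, and the version regex (digits separated by at least one dot) is an assumption; the script's own patterns are not shown in this README.

```python
import re
from html.parser import HTMLParser


class GeneratorTagParser(HTMLParser):
    """Collect the content of the first <meta name="generator"> tag."""

    def __init__(self):
        super().__init__()
        self.generator = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.generator is not None:
            return
        attr_map = dict(attrs)
        if (attr_map.get("name") or "").lower() == "generator":
            self.generator = attr_map.get("content") or ""


def parse_generator(html):
    """Return (name, version) from the generator tag, or (None, None) if absent."""
    parser = GeneratorTagParser()
    parser.feed(html)
    if parser.generator is None:
        return None, None
    match = re.search(r"(\d+(?:\.\d+)+)", parser.generator)
    if not match:
        return parser.generator.strip(), None
    return parser.generator[: match.start()].strip(), match.group(1)


print(parse_generator('<meta name="generator" content="WordPress 6.4.2" />'))
```

Sites that strip or rewrite the generator tag will yield no version here, which matches the "unknown" results described in the troubleshooting notes.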

- Increase workers: `-w 15` or `-w 20`
- Use reasonable timeout: `-t 10` (default is usually fine)
- Consider splitting very large lists (1000+) into batches

- Increase timeout: `-t 20` or `-t 30`
- Reduce workers to avoid overwhelming connection: `-w 3`

- Reduce timeout for known fast sites: `-t 5`
- Increase workers for local network: `-w 10`
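To make the effect of the workers setting concrete, here is a sketch of running a per-site check on a thread pool. It assumes a thread-pool design, which the README does not confirm; `scan_all`, `fake_check`, and the result tuple shape are hypothetical, and `fake_check` stands in for the real detection so the example needs no network access.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def scan_all(urls, check_site, workers=5):
    """Run check_site(url) for every URL on a thread pool; return results by URL."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(check_site, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception:
                # One failing site should not stop the whole scan.
                results[url] = ("Error", "N/A", "N/A")
    return results


def fake_check(url):
    """Placeholder for the real detection, which would issue HTTP requests."""
    if "broken" in url:
        raise ConnectionError("simulated failure")
    return ("WordPress", "unknown", "Low") if "wp" in url else ("Unknown", "N/A", "N/A")


print(scan_all(["wp-site.example", "other.example", "broken.example"], fake_check, workers=2))
```

More workers shorten a scan only while the network and the remote servers keep up, which is why the tips above pair worker counts with timeout settings.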
"Unknown" for most sites:
- Sites may be using custom CMS or static HTML
- Heavy caching/CDN might hide CMS signatures
- Try accessing sites directly to verify they're using a CMS
"Error" results:
- Check if URLs are accessible in browser
- Increase timeout with `-t 30` for slow sites
- Verify internet connection stability
Version shows "unknown" despite correct CMS detection:
- Many sites hide version numbers for security
- Hosted platforms (Wix, Shopify) don't expose versions
- Security plugins may remove version information
SSL/Certificate errors:
- Site has certificate issues
- Add `https://` or `http://` explicitly to URL
Test with known CMS sites:

    echo "wordpress.org" > test.csv
    echo "joomla.org" >> test.csv
    echo "drupal.org" >> test.csv
    python cms_detector.py test.csv

- Authentication Required: Cannot detect CMS on sites requiring login
- Security Plugins: Some security tools hide CMS signatures
- CDN/WAF: Cloudflare and similar services may block detection
- Custom CMS: Proprietary or heavily modified CMS may not be detected
- Headless CMS: API-based CMS without frontend markers are harder to detect
- Version Hiding: Many sites remove version info for security
- Only scan sites you have permission to analyze
- Respect robots.txt and rate limits
- Don't use for malicious reconnaissance
- Consider adding delays for large-scale scanning
- This tool performs read-only operations (GET requests)
- No authentication attempts or vulnerability scanning
- Results should be used for legitimate administrative purposes
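The advice on robots.txt and delays could be followed with something like the sketch below, which uses the standard-library `urllib.robotparser` on robots.txt text that has already been fetched and spaces out sequential requests. Both helper names are hypothetical, and nothing here is claimed to be part of `cms_detector.py`.

```python
import time
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def throttled(urls, delay_seconds=1.0):
    """Yield URLs one at a time, sleeping between consecutive items."""
    for index, url in enumerate(urls):
        if index:
            time.sleep(delay_seconds)
        yield url


robots = "User-agent: *\nDisallow: /wp-admin/\n"
for url in throttled(["https://example.com/", "https://example.com/wp-admin/"], delay_seconds=0.1):
    print(url, allowed_by_robots(robots, url))
```

The throttle applies to a sequential loop; combining it with parallel workers would need a shared rate limit instead.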
- Initial Request: Fetches the homepage HTML and headers
- Pattern Matching: Compares against signatures of 25+ CMS platforms
- Confidence Scoring: Rates detection based on number of indicators
- Version Extraction: Attempts multiple methods to find version
- Special Probes: Makes additional requests for version files if needed
- Result Compilation: Outputs findings with confidence levels
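The pattern-matching and version-extraction steps can be sketched as follows. The `CMS_PATTERNS` entries are illustrative and reuse the `indicators` / `version_patterns` shape from the custom-pattern example in this README; the specific signatures, the `detect` function, and the "most indicators wins" rule are assumptions, not the script's confirmed behavior.

```python
import re

# Illustrative signatures only; the script's real pattern table is larger.
CMS_PATTERNS = {
    "WordPress": {
        "indicators": ["/wp-content/", "/wp-json/"],
        "version_patterns": [(r'content="WordPress ([\d.]+)"', "html")],
    },
    "Joomla": {
        "indicators": ["/components/com_", "/administrator/"],
        "version_patterns": [],
    },
}


def detect(html, patterns=CMS_PATTERNS):
    """Return (cms, version, indicator_count) for the entry with the most matches."""
    best = ("Unknown", "N/A", 0)
    for cms, spec in patterns.items():
        hits = sum(1 for marker in spec["indicators"] if marker in html)
        if hits <= best[2]:
            continue
        version = "unknown"
        for regex, source in spec["version_patterns"]:
            match = re.search(regex, html) if source == "html" else None
            if match:
                version = match.group(1)
                break
        best = (cms, version, hits)
    return best


page = '<meta name="generator" content="WordPress 6.4.2"><link href="/wp-content/themes/x.css">'
print(detect(page))
```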
Contributions are welcome! To add support for a new CMS:
- Add detection patterns to the `self.cms_patterns` dictionary
- Include version extraction patterns if available
- Add special URL endpoints if applicable
- Test with known sites using that CMS
When reporting issues, please include:
- Python version (`python --version`)
- Error messages
- Sample URLs (if not sensitive)
- Command used
After processing, the script displays:
- Real-time progress for each URL
- CMS detection results with confidence
- Summary statistics showing CMS distribution
Example summary output:
=== SUMMARY ===
WordPress: 45
Shopify: 23
Unknown: 12
Drupal: 8
Joomla: 5
Wix: 3
You can extend the script by adding custom CMS patterns:

    'CustomCMS': {
        'indicators': ['custom-signature', '/custom-path/'],
        'version_patterns': [
            (r'CustomCMS v([\d.]+)', 'html'),
        ],
        'special_urls': ['/custom-admin/']
    }

For processing multiple CSV files:

    for file in *.csv; do
        # Skip output files left over from an earlier run
        case "$file" in results_*) continue ;; esac
        python cms_detector.py "$file" -o "results_${file}"
    done

Post-process results to find specific CMS:

    # Find all WordPress sites
    grep ",WordPress," cms_detection_results.csv

    # Count sites by CMS (skipping the header row)
    tail -n +2 cms_detection_results.csv | cut -d',' -f2 | sort | uniq -c
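The same per-CMS count can be done in Python, which handles quoted fields more safely than `cut`. This sketch assumes the four-column header shown earlier (`URL,CMS,Version,Confidence`); `count_by_cms` is a hypothetical helper, and the sample rows are copied from the example output above.

```python
import csv
import io
from collections import Counter


def count_by_cms(csv_file):
    """Count rows per value of the CMS column in an open results file."""
    reader = csv.DictReader(csv_file)
    return Counter(row["CMS"] for row in reader)


# In-memory sample; for a real file, pass open("cms_detection_results.csv", newline="").
sample = io.StringIO(
    "URL,CMS,Version,Confidence\n"
    "techblog.com,WordPress,6.4.2,High\n"
    "store.example.com,Shopify,unknown,High\n"
    "company.com,Drupal,10.1.5,High\n"
)
for cms, count in count_by_cms(sample).most_common():
    print(f"{cms}: {count}")
```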

- 2.0.0 - Complete rewrite as universal CMS detector
  - Added support for 25+ CMS platforms
  - Implemented confidence scoring
  - Enhanced version detection methods
  - Improved parallel processing
- 1.0.0 - Initial WordPress-only version

This script is provided as-is for educational and administrative purposes. Users are responsible for ensuring their use complies with applicable laws and website terms of service.
This tool is designed for legitimate website administration and analysis. It should not be used for:
- Unauthorized security scanning
- Building databases of vulnerable sites
- Competitive intelligence without permission
- Any activity that violates laws or terms of service
Always ensure you have proper authorization before scanning websites you don't own.
For questions, issues, or feature requests, please create an issue in the project repository. Include relevant details such as error messages, Python version, and example URLs (if appropriate).
This script uses publicly available CMS signatures and detection methods. It does not exploit any vulnerabilities or use any proprietary detection techniques.