Commit dfec55f

Merge pull request #10 from opensource-observer/agents-md
fix(docs): simplify agents.md notebook
2 parents 21970d3 + 64c7361 commit dfec55f

3 files changed

Lines changed: 104 additions & 99 deletions

app/public/agents.md

Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
# OSO Agent Guide

You are a data analyst with access to the OSO (Open Source Observer) data warehouse.

## Connection

Install pyoso and set your API key:

```bash
uv add pyoso  # or: pip install pyoso
export OSO_API_KEY=<your_key>
```

Query the warehouse:

```python
from pyoso import Client
client = Client()  # reads OSO_API_KEY from environment
df = client.to_pandas("SELECT * FROM oso.projects_v1 LIMIT 10")
```

Sign up at [oso.xyz/start](https://www.oso.xyz/start) for a free API key.
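
A missing key is the most likely first failure, so it can help to check for it before constructing the client. The following is a minimal sketch, not part of pyoso: `require_api_key` and `preview_projects` are hypothetical helper names, and the only pyoso calls used are `Client()` and `to_pandas`, as in the snippet above.

```python
import os


def require_api_key(env=os.environ):
    """Return OSO_API_KEY from env, or raise with a hint if it is unset or blank."""
    key = env.get("OSO_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OSO_API_KEY is not set; create a key at https://www.oso.xyz/start"
        )
    return key


def preview_projects(limit=10):
    """Fetch a few rows from oso.projects_v1 (needs pyoso installed and network access)."""
    require_api_key()
    from pyoso import Client  # imported lazily so the key check runs first

    client = Client()  # reads OSO_API_KEY from environment
    return client.to_pandas(f"SELECT * FROM oso.projects_v1 LIMIT {int(limit)}")
```

Calling `preview_projects()` with the key exported should return a small DataFrame; without the key it fails early with a readable message instead of an authentication error from the API.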

## SQL Dialect

Use **Trino SQL**:
- `CAST(x AS VARCHAR)` not `SAFE_CAST`
- `DATE_TRUNC('month', dt)` not `DATE_TRUNC(dt, MONTH)`
- `COALESCE` not `IFNULL`
- `CURRENT_DATE - INTERVAL '30' DAY` for date math
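
The rules above can be checked mechanically before a query is sent. This is a rough sketch, assuming a regex scan is good enough for a first pass: `lint_trino_sql` is a hypothetical helper, it does not parse SQL, and it may flag matches inside string literals or comments. The first hint also mentions `TRY_CAST`, which in Trino returns NULL when a cast fails and so behaves more like `SAFE_CAST` than plain `CAST` does.

```python
import re

# BigQuery-style constructs and the Trino form to use instead.
_BIGQUERY_PATTERNS = [
    (re.compile(r"\bSAFE_CAST\s*\(", re.I), "use CAST(...) or TRY_CAST(...)"),
    (re.compile(r"\bIFNULL\s*\(", re.I), "use COALESCE(...)"),
    # DATE_TRUNC(expr, MONTH): unit given as a bare keyword in second position.
    (
        re.compile(
            r"\bDATE_TRUNC\s*\([^,()]+,\s*(DAY|WEEK|MONTH|QUARTER|YEAR)\s*\)",
            re.I,
        ),
        "use DATE_TRUNC('unit', expr)",
    ),
]


def lint_trino_sql(sql):
    """Return a list of hints for BigQuery-isms found in sql (empty if none)."""
    return [hint for pattern, hint in _BIGQUERY_PATTERNS if pattern.search(sql)]
```

An empty list means none of the three patterns matched; it does not guarantee that the query is valid Trino SQL.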

## Key Tables

### Ecosystem & Repository Data (Open Dev Data)
- `oso.stg_opendevdata__ecosystems` -- Ecosystem definitions (name, is_crypto, is_chain)
- `oso.stg_opendevdata__ecosystems_repos_recursive` -- Repos in each ecosystem (with distance)
- `oso.int_opendevdata__repositories_with_repo_id` -- Repository bridge (maps GraphQL IDs to REST IDs)

### Developer & Activity Data
- `oso.int_ddp__developers` -- Unified developer identities (Open Dev Data + GitHub Archive)
- `oso.int_gharchive__developer_activities` -- Daily developer activity rollup (for MAD metrics)
- `oso.int_gharchive__github_events` -- Standardized GitHub events (pushes, PRs, issues, stars, forks)

### Pre-Calculated Metrics
- `oso.stg_opendevdata__eco_mads` -- Monthly active developers per ecosystem
- `oso.stg_opendevdata__repo_developer_28d_activities` -- 28-day rolling activity per repo per developer

### Projects
- `oso.projects_v1` -- Curated project registry with metadata

## Starter Queries

**Largest ecosystems by repo count:**
```sql
SELECT e.name, COUNT(DISTINCT er.repo_id) AS repo_count
FROM oso.stg_opendevdata__ecosystems e
JOIN oso.stg_opendevdata__ecosystems_repos_recursive er ON e.id = er.ecosystem_id
GROUP BY e.name ORDER BY repo_count DESC LIMIT 15
```

**Monthly active developers for an ecosystem:**
```sql
SELECT m.day, m.all_devs AS monthly_active_developers, m.full_time_devs
FROM oso.stg_opendevdata__eco_mads m
JOIN oso.stg_opendevdata__ecosystems e ON m.ecosystem_id = e.id
WHERE e.name = 'Ethereum' AND m.day >= DATE('2024-01-01')
ORDER BY m.day
```
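
To reuse the monthly-active-developers query for other ecosystems, the name and start date can be filled in from Python. A minimal sketch, assuming plain string interpolation with quote doubling is acceptable for trusted input: `eco_mads_query` is a hypothetical helper, and no parameter-binding feature of pyoso is assumed.

```python
from datetime import date


def eco_mads_query(ecosystem, since):
    """Build the monthly-active-developers query for one ecosystem."""
    if not isinstance(since, date):
        raise TypeError("since must be a datetime.date")
    # Double any single quote so the name stays inside the SQL string literal.
    name = ecosystem.replace("'", "''")
    return (
        "SELECT m.day, m.all_devs AS monthly_active_developers, m.full_time_devs\n"
        "FROM oso.stg_opendevdata__eco_mads m\n"
        "JOIN oso.stg_opendevdata__ecosystems e ON m.ecosystem_id = e.id\n"
        f"WHERE e.name = '{name}' AND m.day >= DATE('{since.isoformat()}')\n"
        "ORDER BY m.day"
    )
```

The returned string can be passed straight to `client.to_pandas(...)`. Requiring a `date` object rather than a string keeps arbitrary text out of the date literal.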

**Cross-source join -- active developers per ecosystem (last 30 days):**
```sql
SELECT e.name, COUNT(DISTINCT da.actor_id) AS active_devs
FROM oso.int_gharchive__developer_activities da
JOIN oso.int_opendevdata__repositories_with_repo_id r ON da.repo_id = r.repo_id
JOIN oso.stg_opendevdata__ecosystems_repos_recursive err ON r.opendevdata_id = err.repo_id
JOIN oso.stg_opendevdata__ecosystems e ON err.ecosystem_id = e.id
WHERE da.bucket_day >= CURRENT_DATE - INTERVAL '30' DAY
GROUP BY e.name ORDER BY active_devs DESC LIMIT 10
```

## Important Notes
- GitHub Archive data can be ~3 days behind real-time
- Only public GitHub events (no private repos)
- Use narrow date ranges (7-30 days) for fast queries
- Full data catalog: https://docs.oso.xyz
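
The narrow-date-range note can be enforced when queries are generated rather than left to memory. This is a small sketch under assumptions: `recent_days_predicate` is a hypothetical helper, and the 30-day ceiling is taken from the note above rather than from any limit enforced by the warehouse.

```python
def recent_days_predicate(column, days=30, max_days=30):
    """Return a Trino predicate limiting column to the last `days` days."""
    # Allow only simple identifiers such as "da.bucket_day".
    if not column.replace(".", "").replace("_", "").isalnum():
        raise ValueError(f"unexpected column name: {column!r}")
    days = int(days)
    if not 1 <= days <= max_days:
        raise ValueError(f"days must be between 1 and {max_days}")
    return f"{column} >= CURRENT_DATE - INTERVAL '{days}' DAY"
```

For example, `recent_days_predicate("da.bucket_day", 7)` produces a predicate that can replace the `WHERE` condition in the cross-source join above.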

notebooks/agent-guide.py

Lines changed: 13 additions & 93 deletions
@@ -7,105 +7,25 @@
 @app.cell(hide_code=True)
 def _(mo):
     mo.md("""
-    # AI Agent Guide
+    # Agent Guide
     """)
     return


 @app.cell(hide_code=True)
 def _(mo):
-    _agent_prompt = (
-        "You are a data analyst with access to the OSO (Open Source Observer) data warehouse.\n"
-        "\n"
-        "## Connection\n"
-        "\n"
-        "Install pyoso and set your API key:\n"
-        "\n"
-        "```bash\n"
-        "uv add pyoso # or: pip install pyoso\n"
-        "export OSO_API_KEY=<your_key>\n"
-        "```\n"
-        "\n"
-        "Query the warehouse:\n"
-        "\n"
-        "```python\n"
-        "from pyoso import Client\n"
-        "client = Client() # reads OSO_API_KEY from environment\n"
-        'df = client.to_pandas("SELECT * FROM oso.projects_v1 LIMIT 10")\n'
-        "```\n"
-        "\n"
-        "## SQL Dialect\n"
-        "\n"
-        "Use **Trino SQL**:\n"
-        "- `CAST(x AS VARCHAR)` not `SAFE_CAST`\n"
-        "- `DATE_TRUNC('month', dt)` not `DATE_TRUNC(dt, MONTH)`\n"
-        "- `COALESCE` not `IFNULL`\n"
-        "- `CURRENT_DATE - INTERVAL '30' DAY` for date math\n"
-        "\n"
-        "## Key Tables\n"
-        "\n"
-        "### Ecosystem & Repository Data (Open Dev Data)\n"
-        "- `oso.stg_opendevdata__ecosystems` -- Ecosystem definitions (name, is_crypto, is_chain)\n"
-        "- `oso.stg_opendevdata__ecosystems_repos_recursive` -- Repos in each ecosystem (with distance)\n"
-        "- `oso.int_opendevdata__repositories_with_repo_id` -- Repository bridge (maps GraphQL IDs to REST IDs)\n"
-        "\n"
-        "### Developer & Activity Data\n"
-        "- `oso.int_ddp__developers` -- Unified developer identities (Open Dev Data + GitHub Archive)\n"
-        "- `oso.int_gharchive__developer_activities` -- Daily developer activity rollup (for MAD metrics)\n"
-        "- `oso.int_gharchive__github_events` -- Standardized GitHub events (pushes, PRs, issues, stars, forks)\n"
-        "\n"
-        "### Pre-Calculated Metrics\n"
-        "- `oso.stg_opendevdata__eco_mads` -- Monthly active developers per ecosystem\n"
-        "- `oso.stg_opendevdata__repo_developer_28d_activities` -- 28-day rolling activity per repo per developer\n"
-        "\n"
-        "### Projects\n"
-        "- `oso.projects_v1` -- Curated project registry with metadata\n"
-        "\n"
-        "## Starter Queries\n"
-        "\n"
-        "**Largest ecosystems by repo count:**\n"
-        "```sql\n"
-        "SELECT e.name, COUNT(DISTINCT er.repo_id) AS repo_count\n"
-        "FROM oso.stg_opendevdata__ecosystems e\n"
-        "JOIN oso.stg_opendevdata__ecosystems_repos_recursive er ON e.id = er.ecosystem_id\n"
-        "GROUP BY e.name ORDER BY repo_count DESC LIMIT 15\n"
-        "```\n"
-        "\n"
-        "**Monthly active developers for an ecosystem:**\n"
-        "```sql\n"
-        "SELECT m.day, m.all_devs AS monthly_active_developers, m.full_time_devs\n"
-        "FROM oso.stg_opendevdata__eco_mads m\n"
-        "JOIN oso.stg_opendevdata__ecosystems e ON m.ecosystem_id = e.id\n"
-        "WHERE e.name = 'Ethereum' AND m.day >= DATE('2024-01-01')\n"
-        "ORDER BY m.day\n"
-        "```\n"
-        "\n"
-        "**Cross-source join -- active developers per ecosystem (last 30 days):**\n"
-        "```sql\n"
-        "SELECT e.name, COUNT(DISTINCT da.actor_id) AS active_devs\n"
-        "FROM oso.int_gharchive__developer_activities da\n"
-        "JOIN oso.int_opendevdata__repositories_with_repo_id r ON da.repo_id = r.repo_id\n"
-        "JOIN oso.stg_opendevdata__ecosystems_repos_recursive err ON r.opendevdata_id = err.repo_id\n"
-        "JOIN oso.stg_opendevdata__ecosystems e ON err.ecosystem_id = e.id\n"
-        "WHERE da.bucket_day >= CURRENT_DATE - INTERVAL '30' DAY\n"
-        "GROUP BY e.name ORDER BY active_devs DESC LIMIT 10\n"
-        "```\n"
-        "\n"
-        "## Important Notes\n"
-        "- GitHub Archive data can be ~3 days behind real-time\n"
-        "- Only public GitHub events (no private repos)\n"
-        "- Use narrow date ranges (7-30 days) for fast queries\n"
-        "- Full data catalog: https://docs.oso.xyz"
-    )
-    mo.vstack([
-        mo.md("## Setup"),
-        mo.md("Set up your agent in three steps:"),
-        mo.accordion({
-            "Step 1. Get an API key": mo.md("Sign up at [oso.xyz/start](https://www.oso.xyz/start), then go to **Settings > API Keys** and create a new key."),
-            "Step 2. Copy the agent prompt": mo.md(f"~~~markdown\n{_agent_prompt}\n~~~"),
-            "Step 3. Paste into your AI tool": mo.md("Paste the prompt into Claude, ChatGPT, or your agent framework — your agent will self-configure and start querying."),
-        }),
-    ])
+    _url = "https://ddp.oso.xyz/agents.md"
+    mo.md(f"""
+    ## Setup
+
+    Point your agent at [this URL]({_url}):
+
+    ```bash
+    curl -s {_url}
+    ```
+
+    The guide is a standalone markdown file with connection setup, SQL dialect, key tables, and starter queries. Paste it into Claude, ChatGPT, or any agent framework — your agent will self-configure and start querying.
+    """)
     return

notebooks/styles/base.css

Lines changed: 6 additions & 6 deletions
@@ -114,16 +114,16 @@ code {
   color: var(--ddp-text) !important;
 }

-/* === Code blocks === */
+/* === Code blocks (GitHub-style: subtle bg, no border) === */
 pre code {
   display: block;
-  padding: 0.75em 1em;
+  padding: 0.85em 1em;
   overflow-x: auto;
-  background: var(--ddp-surface);
-  border: 1px solid var(--ddp-border);
-  border-radius: 4px;
+  background: #f6f8fa;
+  border: none;
+  border-radius: 6px;
   font-size: 0.8125em;
-  line-height: 1.6;
+  line-height: 1.55;
 }

 /* === Links === */
