-
- **Zero dependencies** — never add to go.mod. stdlib only (`math`, `sort`).
+
- **Zero dependencies** — never add to go.mod. stdlib only: `math`, `sort`, `encoding/gob`, `encoding/json`, `os`, `strings`, `unicode`, `math/rand`.
- **Vector = []float32** — no struct, no interface, just a slice.
-
- **Mismatched lengths → zero** — Dot/Norm/Cosine/Euclidean/Manhattan all return 0 for mismatched-length inputs rather than panicking. Add/Sub return nil.
-
- **Clone on output** — Get() and Search() return copies. Store.Add() clones on insert. No internal backing arrays ever leak.
-
- **Zero-alloc distance** — Cosine and Euclidean compute in a single pass without intermediate allocations. All distance functions have 0 allocs in benchmarks.
-
- **Tests in `_test.go` files** — package `vector`, no separate test package.
-
- **Benchmarks** — `bench_test.go` covers all operations at 768/1536 dims. Run with `-benchmem`.
+
- **Mismatched lengths → zero** — return zero/nil rather than panicking.
+
- **Clone on output** — Get() and Search() return copies. Store.Add() clones on insert.
+
- **Zero-alloc distance** — Cosine and Euclidean compute in a single pass.
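
As a rough illustration of the "mismatched lengths → zero" and "single pass" conventions above, the sketch below computes cosine similarity in one loop with no intermediate slices. It is a standalone approximation, not the repository's actual `Cosine`: the local `Vector` alias and the choice to return 0 for zero vectors are assumptions made for the example.

```go
package main

import (
	"fmt"
	"math"
)

// Vector is a local alias mirroring the "Vector = []float32" convention.
type Vector = []float32

// Cosine accumulates the dot product and both squared norms in a single
// loop, so it allocates nothing. Mismatched lengths and zero vectors
// return 0 instead of panicking (the zero-vector case is an assumption).
func Cosine(a, b Vector) float32 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float32
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / float32(math.Sqrt(float64(na)*float64(nb)))
}

func main() {
	fmt.Println(Cosine(Vector{1, 0}, Vector{1, 0}))    // identical direction
	fmt.Println(Cosine(Vector{1, 0}, Vector{0, 1}))    // orthogonal
	fmt.Println(Cosine(Vector{1, 2, 3}, Vector{1, 2})) // mismatched lengths
}
```

The allocation claim can be checked with `go test -bench . -benchmem`, as the conventions suggest.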
3. Add test file with Fit/Embed/determinism/similarity tests
+
4. Document in README under Text Embedding section

-
## Performance Rules
+
## Adding a New Metric

-
- Distance functions must be zero-allocation (verified by `-benchmem`)
-
- Single-pass computation where possible (Cosine, Euclidean already are)
-
- Brute-force search is O(n·d) — acceptable for the zero-dep target
-
- Benchmark before and after any metric or store change
+
1. Add constant to `Metric` enum in `similarity.go`
+
2. Add case to `Distance()` switch, update `Ascending()` if needed
+
3. Add direct function — zero-alloc, single-pass
+
4. Add test + benchmark
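
The four steps above might look roughly like this for a hypothetical Chebyshev (L∞) metric. Everything here is a self-contained stand-in: `ChebyshevDistance`, `Chebyshev`, and the shape of the `Metric` enum and `Distance` switch are assumptions for illustration, since the real `similarity.go` is not shown in this diff.

```go
package main

import "fmt"

// Vector is a local alias mirroring the library's []float32 convention.
type Vector = []float32

// Metric stands in for the enum in similarity.go (assumed shape).
type Metric int

const (
	EuclideanDistance Metric = iota // placeholder for an existing metric
	ChebyshevDistance               // step 1: the new, hypothetical constant
)

// Chebyshev is the direct function (step 3): the largest absolute
// coordinate difference, computed in one pass without allocating.
// Mismatched lengths return 0 rather than panicking.
func Chebyshev(a, b Vector) float32 {
	if len(a) != len(b) {
		return 0
	}
	var maxDiff float32
	for i := range a {
		d := a[i] - b[i]
		if d < 0 {
			d = -d
		}
		if d > maxDiff {
			maxDiff = d
		}
	}
	return maxDiff
}

// Distance dispatches on the metric (step 2). Only the new case is shown.
func Distance(a, b Vector, m Metric) float32 {
	switch m {
	case ChebyshevDistance:
		return Chebyshev(a, b)
	default:
		return 0 // existing cases elided in this sketch
	}
}

func main() {
	a, b := Vector{1, 5, 2}, Vector{4, 1, 2}
	fmt.Println(Distance(a, b, ChebyshevDistance)) // prints 4
}
```

Because Chebyshev is a distance (lower means more similar), it would sort ascending, so under these assumptions `Ascending()` would only need a change if it enumerates metrics explicitly.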

-
## Landing Page
+
## Security

-
The `docs/` directory is deployed via GitHub Pages (Settings → Pages → Source: main / /docs). Uses the standard BackendStack21 dark theme: Outfit for body, Monaspace Neon for code. No build step — just raw HTML + CSS.
+
- No panics. Return zero/nil for invalid input.
+
- No unsafe. Pure Go, no CGo, no syscalls.
+
- Clone hygiene. Internal state never exposed.
+
- Persistence: `Load` replaces all data. Caller handles atomicity.
README.md: 84 additions & 42 deletions (+84 −42)
@@ -1,6 +1,6 @@
# go-vector

-
Zero-dependency vector similarity library for Go. Pure Go `[]float32` vectors, four distance metrics, brute-force nearest-neighbor search. No CGo, no BLAS, no third-party imports — just `math` and `sort`.
+
Zero-dependency vector similarity library for Go. Pure Go `[]float32` vectors, four distance metrics, text embedding via random projections, and disk-backed persistence. No CGo, no BLAS, no third-party imports.

## Install

@@ -36,6 +36,45 @@ func main() {
}
```

+
## Text Embedding
+

+
```go
+
// Fit a random projection embedder on your corpus
+
rp := vector.NewRandomProjections(256)
+
rp.Fit([]string{
+
	"machine learning is fascinating",
+
	"deep neural networks transform AI",
+
	"the weather today is sunny",
+
})
+

+
// Embed text into a 256-dim vector
+
v, _ := rp.Embed("learning about machine intelligence")
+
// v is a normalized Vector suitable for cosine similarity search
+

+
// Use with the store
+
store := vector.NewStore(vector.CosineDistance)
+
store.Add("doc1", v)
+
store.Search(rp.MustEmbed("AI and learning"), 5)
+
```
+

+
The `Embedder` interface lets you swap backends: bring your own OpenAI, Ollama, or sentence-transformers adapter. The built-in `RandomProjections` is zero-dependency and deterministic.
+

+
## Persistence
+

+
```go
+
// Save to disk (gob — compact binary)
+
store.Save("/data/vectors.db")
+
store.SaveJSON("/data/vectors.json") // human-readable alternative
+

+
// Restore later
+
restored := vector.NewStore(vector.CosineDistance)
+
restored.Load("/data/vectors.db")
+

+
// Full roundtrip — metric and all data preserved
+
```
+

+
Gob-encoded stores are compact (~4 bytes per float32 + overhead). For a 10K × 1536d store, expect ~60 MB on disk and ~200ms save/load times.
+

## API

### Vector Type
@@ -44,18 +83,16 @@ func main() {

### Core Operations

-
| Function | Returns | Description |
-
|----------|---------|-------------|
-
| `Dims(v Vector) int` | `int` | Dimensionality |
-
| `Dot(a, b Vector) float32` | `float32` | Dot product (0 if lengths differ) |
-
| `Normalize(v Vector) Vector` | `Vector` | Unit vector (nil for zero vector) |
-
| `Add(a, b Vector) Vector` | `Vector` | Element-wise sum (nil if lengths differ) |
-
| `Sub(a, b Vector) Vector` | `Vector` | Element-wise difference (nil if lengths differ) |
-
| `Scale(v Vector, s float32) Vector` | `Vector` | Scalar multiplication |
-
| `Equal(a, b Vector) bool` | `bool` | Approximate equality (ε = 1e-6) |
-
| `EqualEps(a, b Vector, eps float32) bool` | `bool` | Custom epsilon equality |
-
| `Clone(v Vector) Vector` | `Vector` | Deep copy |
+
- `Dims(v Vector) int` — dimensionality
+
- `Dot(a, b Vector) float32` — dot product (0 if lengths differ)
+
- `Norm(v Vector) float32` — L2 norm
+
- `Normalize(v Vector) Vector` — unit vector (nil for zero vector)
+
- `Add(a, b Vector) Vector` — element-wise sum (nil if lengths differ)
+
- `Sub(a, b Vector) Vector` — element-wise difference (nil if lengths differ)
+
- `Scale(v Vector, s float32) Vector` — scalar multiplication
+
- `Equal(a, b Vector) bool` — approximate equality (ε = 1e-6)
+
- `EqualEps(a, b Vector, eps float32) bool` — custom epsilon
+
- `Clone(v Vector) Vector` — deep copy

### Distance Metrics

@@ -66,34 +103,46 @@ vector.ManhattanDistance // L1 distance → [0, ∞), lower = more similar
vector.DotProductSimilarity // dot product → (−∞, ∞), higher = more similar
```

-
Direct functions available:
-
- `Cosine(a, b)` — cosine similarity [−1, 1]
-
- `CosineDist(a, b)` — 1 − cosine [0, 2]
-
- `Euclidean(a, b)` — L2 distance (zero-alloc)
-
- `Manhattan(a, b)` — L1 distance
-
- `Distance(a, b, metric)` — metric-dispatch version
+
Direct functions: `Cosine`, `CosineDist`, `Euclidean`, `Manhattan`, `Distance`.

### Vector Store

```go
store := vector.NewStore(vector.CosineDistance)

-
store.Add(id string, v Vector) // insert (clones input)
-
store.Search(query Vector, k int) // top-k nearest neighbors
-
store.Get(id string) Vector // lookup by id (clone)
-
store.Remove(id string) bool // remove by id
-
store.Len() int // count
+
store.Add(id, v) // insert (clones input)
+
store.Search(query, k) // top-k nearest neighbors
+
store.Get(id) // lookup by id (clone)
+
store.Remove(id) // remove by id
+
store.Len() // count
+
store.Save(path) // gob-encode to file
+
store.Load(path) // restore from gob file
+
store.SaveJSON(path) // JSON export
+
store.LoadJSON(path) // JSON import
+
```
+

+
### Text Embedding
+

+
```go
+
type Embedder interface {
+
	Embed(text string) (Vector, error)
+
	Dims() int
+
}
```

-
`Search()` returns `[]SearchResult` sorted by distance:
-
- Distance metrics (Cosine, Euclidean, Manhattan): ascending order
-
- DotProductSimilarity: descending order
+
**Built-in: `RandomProjections`**

-
Results include **cloned** vectors — mutations won't corrupt store state.
+
Johnson-Lindenstrauss sparse random projection (Achlioptas 2003). Projects tokenized text into a fixed-size normalized vector. Deterministic (fixed seed), zero dependencies, ~10µs per embed.
| Thread safety | 🟡 Store is read-safe but not write-safe — guard with `sync.Mutex`|

-
`SearchResult.Vector` is a `[]float32` — it's a clone from the store, but if your application mutates float32 slices returned by Search, clone them again. The store's internal state is never exposed.
-

-
### Float32 Precision
-

-
Dot products on high-dimensional vectors with large magnitudes (>1e19) can overflow float32 (±3.4e38). For typical embedding use (normalized vectors, dims < 10K), this is not a concern. If your vectors have large unnormalized magnitudes, normalize before insertion.
-

## Design

-
- **Zero dependencies** — `go.mod` has no `require` block. `math` + `sort` only.
+
- **Zero dependencies** — `go.mod` has no `require` block
- **Type alias** — `Vector` is `[]float32`, interoperable with any `[]float32` data
- **Brute-force search** — O(n·d) per query; pair with an approximate index for n > 100K
-
- **Clone safety** — `Get()`, `Search()`, and `Add()` all clone — no accidental mutation
+
- **Clone safety** — `Get()`, `Search()`, and `Add()` all clone
- **Graceful degradation** — mismatched lengths return zero/nil, never panic
-
- **Single-pass** — Cosine and Euclidean compute in one pass without intermediate allocations
+
- **Deterministic embeddings** — fixed seed (42) for reproducible results
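
For readers wondering what the O(n·d) brute-force search in the Design list amounts to, the sketch below scores every stored vector against the query and keeps the k closest. It is an illustration under assumptions, not the store's code: `Result`, `Search`, the map-backed storage, the Euclidean-only scoring, and the decision to skip vectors whose length differs from the query are all choices made for this example.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

type Vector = []float32

// Result is a hypothetical pairing of an id with its distance to the query.
type Result struct {
	ID       string
	Distance float32
}

// euclidean computes the L2 distance in a single pass.
func euclidean(a, b Vector) float32 {
	var sum float32
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return float32(math.Sqrt(float64(sum)))
}

// Search scores all n vectors of dimension d (O(n·d)), sorts ascending
// by distance with ids as a tie-breaker, and returns at most k results.
func Search(data map[string]Vector, query Vector, k int) []Result {
	results := make([]Result, 0, len(data))
	for id, v := range data {
		if len(v) != len(query) {
			continue // skipped here rather than scored as 0
		}
		results = append(results, Result{ID: id, Distance: euclidean(query, v)})
	}
	sort.Slice(results, func(i, j int) bool {
		if results[i].Distance != results[j].Distance {
			return results[i].Distance < results[j].Distance
		}
		return results[i].ID < results[j].ID
	})
	if k < 0 {
		k = 0
	}
	if k > len(results) {
		k = len(results)
	}
	return results[:k]
}

func main() {
	data := map[string]Vector{"a": {0, 0}, "b": {3, 4}, "c": {1, 0}}
	for _, r := range Search(data, Vector{0, 0}, 2) {
		fmt.Println(r.ID, r.Distance)
	}
}
```

A similarity metric such as dot product would sort descending instead. The full sort costs O(n log n) on top of the scoring pass, and a bounded heap of size k is a common refinement when n is large.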