DSL Hybrid Search

Beyond Keywords and Vectors: DSL-Based Hybrid Search for Enterprise Code Discovery

Traditional code search forces teams to choose between fuzzy semantics and brittle exact matches. CoderSwap’s DSL layer blends both signals, plus recency, at query time so developers surface the right snippet, error, or pattern far more often on the first try.

October 15, 2025 • 12 min read • Oracle 23ai • Secure by Design

Why Traditional Search Fails for Code

Engineering teams sit on millions of lines of code, RFCs, and runbooks. Keyword-only systems miss synonyms and architectural concepts; semantic-only systems ignore literal identifiers like `ORA-12154` or `getInvoiceStatus`. The downstream effect is endless context switching, duplicate implementations, and tribal knowledge that never scales.

  • Pure semantic search understands concepts but overlooks exact error codes, function names, or API endpoints.
  • Pure keyword search finds literals yet fails whenever authors write “exception handling” while you search for “error handling”.
  • Most hybrid systems assume fixed weights, so the blend never adapts to the developer’s real intent.

DSL-Powered Hybrid Intelligence

CoderSwap generates a domain-specific policy at deployment time. The DSL encodes regex-based intent detectors alongside weight budgets for semantic, lexical, and recency signals. At query time the policy fires the first matching intent and streams safe bind parameters into Oracle—no dynamic SQL required.

Sample DSL Snippet

{
  "intents": [
    {
      "pattern": "error|exception|ora-\\d+",
      "weights": { "semantic": 0.3, "lexical": 0.6, "recency": 0.1 }
    },
    {
      "pattern": "implement|create|build|develop",
      "weights": { "semantic": 0.7, "lexical": 0.2, "recency": 0.1 }
    }
  ],
  "defaults": { "semantic": 0.5, "lexical": 0.4, "recency": 0.1 }
}
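As a rough illustration of how a policy like the one above could be applied at query time, the sketch below walks the intents in order and returns the weights of the first pattern that matches. The function name `select_weights` and the case-insensitive matching are assumptions made for this example, not documented CoderSwap behavior.

```python
import re

# Policy copied from the sample snippet above.
POLICY = {
    "intents": [
        {
            "pattern": r"error|exception|ora-\d+",
            "weights": {"semantic": 0.3, "lexical": 0.6, "recency": 0.1},
        },
        {
            "pattern": r"implement|create|build|develop",
            "weights": {"semantic": 0.7, "lexical": 0.2, "recency": 0.1},
        },
    ],
    "defaults": {"semantic": 0.5, "lexical": 0.4, "recency": 0.1},
}


def select_weights(query, policy=POLICY):
    """Return the weights of the first intent whose pattern matches the query."""
    for intent in policy["intents"]:
        # Assumption: case-insensitive search, so "ORA-12154" hits "ora-\d+".
        if re.search(intent["pattern"], query, flags=re.IGNORECASE):
            return intent["weights"]
    # No intent matched: fall back to the policy defaults.
    return policy["defaults"]


print(select_weights("ORA-12154 connection handling"))
print(select_weights("implement retry logic"))
print(select_weights("payment service overview"))
```

Because the first match wins, the order of intents in the policy matters: a query such as "build an error handler" matches both patterns but receives the error-oriented weights.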

Dense vectors

Semantic understanding

All-MiniLM-L12-v2 embeddings (384-dimension) run inside Oracle ADW 23ai so concept-level matches like “auth” → “authentication” land instantly.

Sparse vectors

Lexical precision

Oracle Text indexes (with graceful LIKE fallback) capture exact identifiers such as `ORA-12154`, REST endpoints, or class names with O(log n) lookups.

Recency scoring

Temporal relevance

A 36-month half-life decay keeps recently merged code ahead of decade-old utilities while still surfacing long-lived reference docs.
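To make the half-life concrete, here is a minimal sketch of an exponential decay with a 36-month half-life. The function name and the handling of future-dated rows are assumptions for illustration only.

```python
def recency_score(age_months, half_life_months=36.0):
    """Exponential decay: the score halves every `half_life_months`."""
    # Treat future-dated rows (negative age) as brand new.
    age = max(age_months, 0.0)
    return 0.5 ** (age / half_life_months)


for age in (0, 12, 36, 120):
    print(age, round(recency_score(age), 3))
```

Under this curve a three-year-old chunk scores 0.5, and a ten-year-old utility still keeps a small non-zero score (roughly 0.1), which is why long-lived reference docs can continue to surface when their semantic or lexical match is strong.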

Query-to-Result Architecture

Every search runs the same deterministic pipeline: detect intent, pull weights, execute vector and keyword branches in parallel, then fuse the scores with an intent-aware budget. A diversity filter keeps results from repeating the same file.

User Query
  ↓
Intent Detection (Regex)
  ↓
DSL Weight Selection
  ↓
Parallel Retrieval
    ├─ Vector Search (HNSW)
    └─ Keyword Search (Oracle TEXT)
  ↓
Weighted Fusion & Diversity Filter
  ↓
Results
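The last two stages of the pipeline can be sketched in a few lines. This is an illustrative version only: the `Candidate` fields, the `file_path` attribute, and the one-chunk-per-file cap are assumptions, since the article does not spell out how the diversity filter is implemented.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    chunk_id: str
    file_path: str  # hypothetical field used for the diversity filter
    semantic: float
    lexical: float
    recency: float


def fuse(candidates, weights, k=10, max_per_file=1):
    """Rank by weighted score, then cap how many chunks one file contributes."""
    def score(c):
        return (weights["semantic"] * c.semantic
                + weights["lexical"] * c.lexical
                + weights["recency"] * c.recency)

    ranked = sorted(candidates, key=score, reverse=True)
    results, per_file = [], {}
    for c in ranked:
        seen = per_file.get(c.file_path, 0)
        if seen >= max_per_file:
            continue  # diversity filter: this file is already represented
        per_file[c.file_path] = seen + 1
        results.append((c.chunk_id, round(score(c), 3)))
        if len(results) == k:
            break
    return results


candidates = [
    Candidate("a1", "db/connect.py", semantic=0.62, lexical=0.95, recency=0.80),
    Candidate("a2", "db/connect.py", semantic=0.60, lexical=0.90, recency=0.80),
    Candidate("b1", "docs/tns.md", semantic=0.75, lexical=0.40, recency=0.30),
]
error_weights = {"semantic": 0.3, "lexical": 0.6, "recency": 0.1}
print(fuse(candidates, error_weights))
```

With the error-oriented weights, the second chunk from `db/connect.py` is skipped even though it outscores the docs page, so the result list covers two different files.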

Hybrid SQL Outline

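WITH q AS (
  -- Embed the query text once with the in-database model
  SELECT VECTOR_EMBEDDING(all_minilm_l12_v2 USING :query AS data) AS v FROM dual
),
scored AS (
  SELECT
    c.chunk_id,
    c.text,
    c.metadata,
    1 - VECTOR_DISTANCE(c.embedding, q.v, COSINE) AS semantic_sim,
    SCORE(1) / 100 AS lexical_sim,  -- Oracle Text scores range from 0 to 100
    -- 36-month half-life; LEAST caps future-dated rows at 1
    LEAST(1, POWER(0.5, MONTHS_BETWEEN(SYSDATE, c.created_at) / 36)) AS recency_score
  FROM CS_PROJECT_CHUNKS c
  CROSS JOIN q
  WHERE CONTAINS(c.text, :lexical_query, 1) > 0
     OR VECTOR_DISTANCE(c.embedding, q.v, COSINE) < 0.8
)
-- Column aliases cannot be reused in the SELECT list that defines them,
-- so the weighted fusion runs over the scored subquery.
SELECT
  scored.*,
  (:w_semantic * semantic_sim +
   :w_lexical  * lexical_sim +
   :w_recency  * recency_score) AS final_score
FROM scored
ORDER BY final_score DESC
FETCH FIRST :k ROWS ONLY;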
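One practical detail when feeding `:lexical_query`: Oracle Text treats characters such as `-` as query operators, so a raw string like `ORA-12154` would be parsed as an expression rather than a literal. The sketch below shows one possible way to prepare the bind value by wrapping each token in braces, which Oracle Text uses as an escape sequence. The helper name, the choice of `OR` between terms, and the decision to simply drop braces from the input are assumptions for this example.

```python
import re


def to_text_query(user_query):
    """Turn free text into an Oracle Text expression that is safe to bind."""
    terms = []
    for token in user_query.split():
        # Simplification: drop braces instead of escaping them inside a group.
        token = re.sub(r"[{}]", "", token)
        if token:
            # {...} makes Oracle Text treat operators such as "-" literally.
            terms.append("{" + token + "}")
    # An empty result means the caller should skip the lexical branch.
    return " OR ".join(terms)


print(to_text_query("ORA-12154 connection handling"))
```

The returned string is still passed as a bind parameter, so escaping here is about query semantics, not about SQL injection.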

DSL Compilation Pipeline

The configuration lives entirely inside Oracle and is versioned for rollback. Anthropic Claude analyzes a 20K-token corpus sample, proposes a DSL policy, and CoderSwap validates weights, regex specificity, and schema before committing it.

  1. Corpus sampling – A uniformly random 50-chunk slice gives Claude representative data.
  2. Haiku DSL generation – Tool invocation returns JSON only; no SQL ever leaves the model boundary.
  3. Guardrails & validation – Weight normalization, regex specificity checks, and a shared DSLValidator enforce safety.
  4. Versioned storage – Policies persist in `CS_CONFIGURATION_VERSIONS` with automatic history and rollback.
  5. Query-time fusion – Search APIs load the active DSL and feed Oracle bind parameters on every request.
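The guardrail step could look something like the following. This is a hedged sketch, not the actual DSLValidator: the function names, the minimum pattern length, and the "must not match the empty string" rule are hypothetical stand-ins for whatever specificity checks the real validator applies.

```python
import re

SIGNALS = ("semantic", "lexical", "recency")


def normalize_weights(weights):
    """Clamp negatives to zero and rescale so the three weights sum to 1."""
    if set(weights) != set(SIGNALS):
        raise ValueError(f"weights must define exactly {SIGNALS}")
    clamped = {name: max(float(weights[name]), 0.0) for name in SIGNALS}
    total = sum(clamped.values())
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return {name: value / total for name, value in clamped.items()}


def validate_policy(policy, min_pattern_length=3):
    """Return a cleaned copy of the policy or raise ValueError."""
    intents = []
    for intent in policy.get("intents", []):
        pattern = intent["pattern"]
        try:
            compiled = re.compile(pattern, re.IGNORECASE)
        except re.error as exc:
            raise ValueError(f"invalid regex {pattern!r}: {exc}") from exc
        # Hypothetical specificity check: reject very short patterns and
        # patterns that also match the empty string (they fire on anything).
        if len(pattern) < min_pattern_length or compiled.search(""):
            raise ValueError(f"pattern {pattern!r} is too broad")
        intents.append({"pattern": pattern,
                        "weights": normalize_weights(intent["weights"])})
    return {"intents": intents,
            "defaults": normalize_weights(policy["defaults"])}


proposed = {
    "intents": [{"pattern": "error|exception",
                 "weights": {"semantic": 3, "lexical": 6, "recency": 1}}],
    "defaults": {"semantic": 0.5, "lexical": 0.4, "recency": 0.1},
}
print(validate_policy(proposed)["intents"][0]["weights"])
```

Rejecting a policy outright, rather than silently repairing a bad regex, keeps the previously stored version active until a valid proposal arrives.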

Safe scoring expression

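def compile_weights(selected_weights):
    """Build a scoring expression and bind values from whitelisted signals only."""
    # Signal name -> score column in the hybrid query; anything else is dropped.
    ALLOWED_SIGNALS = {
        "semantic": "semantic_sim",
        "lexical": "lexical_sim",
        "recency": "recency_score",
    }

    terms, binds = [], {}
    for signal, weight in selected_weights.items():
        if signal in ALLOWED_SIGNALS:
            param = f"w_{signal}"
            # Only whitelisted names reach the SQL text; the weight travels as a bind.
            terms.append(f":{param} * {ALLOWED_SIGNALS[signal]}")
            # Clamp to [0, 1] so a bad policy cannot produce runaway scores.
            binds[param] = min(max(float(weight), 0.0), 1.0)

    return " + ".join(terms), binds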

Future Directions

Cross-encoder re-ranking

Add a lightweight second stage that re-sorts the top 100 hybrid candidates with a neural cross-encoder for precision-critical queries.

Learned sparse embeddings (SPLADE)

Blend semantic and lexical cues in one model, with a target nDCG lift of 15–20%, while shrinking the orchestration footprint.

Multi-modal search

Index diagrams, screenshots, and transcripts next to code so solution discovery spans every artifact developers touch.

Why It Matters

Developers

  • Find the right code 3× faster.
  • Stay in flow with precise, intent-aware results.
  • Spot patterns and reuse implementations instantly.

Enterprises

  • Accelerate onboarding for new engineers.
  • Expose existing solutions and reduce duplication.
  • Enforce compliance checks across massive repos.

Security Teams

  • Oracle residency keeps code inside the customer’s tenant.
  • Soft deletes and backups protect against accidental loss.
  • Whitelist-based DSL compilation removes SQL injection risk.

Get Started in Minutes

Upload your repo, let CoderSwap build a DSL profile, and query your codebase with enterprise-grade safeguards.

# Upload your codebase
curl -X POST https://api.coderswap.ai/v1/upload \
  -H "X-API-Key: your-key" \
  -F "files=@src.zip"

# Generate search profile
curl -X POST https://api.coderswap.ai/v1/settings:generate \
  -H "X-API-Key: your-key" \
  -H "Content-Type: application/json" \
  -d '{"project_id": "your-project"}'

# Search with intelligence
curl -X POST https://api.coderswap.ai/v1/search \
  -H "X-API-Key: your-key" \
  -H "Content-Type: application/json" \
  -d '{"query": "ORA-12154 connection handling", "k": 10}'
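For teams scripting against the API from Python, the search call above might be assembled like this using only the standard library. The helper name is hypothetical, the JSON content type is an assumption, and the sketch stops at building the request because the response format is not described here.

```python
import json
import urllib.request

API_BASE = "https://api.coderswap.ai/v1"


def build_search_request(api_key, query, k=10):
    """Build (but do not send) the POST request for the search endpoint."""
    body = json.dumps({"query": query, "k": k}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/search",
        data=body,
        headers={
            "X-API-Key": api_key,
            # Assumption: the API expects a JSON body.
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_search_request("your-key", "ORA-12154 connection handling")
print(req.get_method(), req.full_url)
# Sending it would be: urllib.request.urlopen(req)
```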

Integrations ready today

  • VS Code extension for in-IDE discovery
  • Slack slash command (`/search error handling in payment service`)
  • CI/CD hooks to surface related code during reviews

About CoderSwap

We are building the future of enterprise code search. Every deployment ships with intent-aware DSL policies, Oracle-native storage, and automation that keeps your developers focused on shipping features.