This document consolidates the project goals, architecture overview, implementation plan, and feature matrix for the sikuli-go port.
go.mod: root GoLang modulepkg/sikuli: public API surface and compatibility-facing typesinternal/core: shared contracts and primitives (SearchRequest, Matcher, resize helpers)internal/cv: concrete matching engine implementationinternal/ocr: OCR backend adapters and hOCR parsing helpersinternal/input: input automation backend adaptersinternal/observe: observe/event backend adaptersinternal/app: app/window backend adaptersinternal/testharness: golden corpus loading and parity comparatorsThe matcher boundary is fixed behind core.Matcher:
type Matcher interface {
Find(req SearchRequest) ([]MatchCandidate, error)
}
This keeps pkg/sikuli stable while allowing alternate implementations (e.g., gocv) later.
pkg/sikuli objects| Type | Kind | Role | ||
|---|---|---|---|---|
Point |
object | coordinate pair | ✅ | |
Location |
object | parity-friendly coordinate object | ✅ | |
Offset |
object | parity-friendly offset object | ✅ | |
Rect |
object | geometry primitive | ✅ | |
Region |
object | geometry + search defaults container | ✅ | |
Screen |
object | screen identity/bounds abstraction | ✅ | |
Image |
object | grayscale image holder | ✅ | |
Pattern |
object | matching intent/configuration | ✅ | |
Match |
object | match result payload | ✅ | |
TextMatch |
object | OCR text match payload | ✅ | |
OCRParams |
object | OCR request option payload | ✅ | |
InputOptions |
object | input action option payload | ✅ | |
InputController |
object | input automation orchestrator | ✅ | |
ObserveOptions |
object | observe operation option payload | ✅ | |
ObserveEventType |
object | observe event enum | ✅ | |
ObserveEvent |
object | observe event payload | ✅ | |
ObserverController |
object | observe orchestration controller | ✅ | |
AppOptions |
object | app operation option payload | ✅ | |
Window |
object | app/window payload | ✅ | |
AppController |
object | app/window orchestration controller | ✅ | |
Finder |
object | user-facing matching orchestrator | ✅ | |
RuntimeSettings |
object | global runtime behavior values | ✅ | |
Options |
object | typed string-map options wrapper | ✅ |
pkg/sikuli interfaces| Interface | Contract | ||
|---|---|---|---|
ImageAPI |
image surface | ✅ | Signature and tests are in place |
PatternAPI |
pattern surface | ✅ | Signature and tests are in place |
FinderAPI |
finder surface | ✅ | Signature and tests are in place |
RegionAPI |
region surface | ✅ | Signature and tests are in place |
InputAPI |
input automation surface | ✅ | Signature and tests are in place |
ObserveAPI |
observe/event surface | ✅ | Signature and tests are in place |
AppAPI |
app/window surface | ✅ | Signature and tests are in place |
internal/core protocol objects| Type | Kind | Role | ||
|---|---|---|---|---|
SearchRequest |
protocol object | match request | ✅ | Stable request contract |
MatchCandidate |
protocol object | match response item | ✅ | Stable response contract |
Matcher |
protocol interface | matcher boundary | ✅ | Used by finder protocol |
OCRRequest |
protocol object | OCR request | ✅ | Stable OCR request contract |
OCRWord |
protocol object | OCR word payload | ✅ | Stable OCR word contract |
OCRResult |
protocol object | OCR response payload | ✅ | Stable OCR response contract |
OCR |
protocol interface | OCR boundary | ✅ | Used by finder OCR protocol |
InputAction |
protocol object | input action enum | ✅ | Stable input action contract |
InputRequest |
protocol object | input request | ✅ | Stable input request contract |
Input |
protocol interface | input boundary | ✅ | Used by input controller |
ObserveEventType |
protocol object | observe event enum | ✅ | Stable observe event contract |
ObserveRequest |
protocol object | observe request | ✅ | Stable observe request contract |
ObserveEvent |
protocol object | observe event payload | ✅ | Stable observe payload contract |
Observer |
protocol interface | observe boundary | ✅ | Used by observer controller |
AppAction |
protocol object | app action enum | ✅ | Stable app action contract |
AppRequest |
protocol object | app request | ✅ | Stable app request contract |
WindowInfo |
protocol object | window payload | ✅ | Stable window payload contract |
AppResult |
protocol object | app response payload | ✅ | Stable app response contract |
App |
protocol interface | app boundary | ✅ | Used by app controller |
internal/cv protocol implementation| Type | Kind | Role | ||
|---|---|---|---|---|
NCCMatcher |
protocol implementer | default matcher | ✅ | Primary backend in use |
SADMatcher |
protocol implementer | alternate matcher | ✅ | Conformance-tested alternate |
internal/ocr protocol implementation| Type | Kind | Role | ||
|---|---|---|---|---|
unsupportedBackend |
protocol implementer | default OCR behavior | ✅ | returns unsupported unless gosseract tag is enabled |
gosseractBackend |
protocol implementer | OCR adapter | ✅ | enabled with -tags gosseract |
internal/input protocol implementation| Type | Kind | Role | ||
|---|---|---|---|---|
darwinBackend |
protocol implementer | concrete input for macOS | ✅ | supports move/click/type/hotkey dispatch |
linuxBackend |
protocol implementer | concrete input for Linux | ✅ | command-driven move/click/type/hotkey via xdotool |
windowsBackend |
protocol implementer | concrete input for Windows | ✅ | PowerShell-driven move/click/type/hotkey |
unsupportedBackend |
protocol implementer | non-target fallback input behavior | ✅ | returns unsupported on !darwin && !linux && !windows builds |
internal/observe protocol implementation| Type | Kind | Role | ||
|---|---|---|---|---|
pollingBackend |
protocol implementer | deterministic observe behavior | ✅ | matcher-driven interval polling for appear/vanish/change |
internal/app protocol implementation| Type | Kind | Role | ||
|---|---|---|---|---|
darwinBackend |
protocol implementer | concrete app/window for macOS | ✅ | supports open/focus/close/is-running/list-windows |
linuxBackend |
protocol implementer | concrete app/window for Linux | ✅ | command-driven open/focus/close/is-running/list-windows |
windowsBackend |
protocol implementer | concrete app/window for Windows | ✅ | PowerShell-driven open/focus/close/is-running/list-windows |
unsupportedBackend |
protocol implementer | non-target fallback behavior | ✅ | returns unsupported for non-darwin/linux/windows builds |
internal/testharness protocol objects| Type | Kind | Role | ||
|---|---|---|---|---|
GoldenCase |
protocol object | serialized test case schema | ✅ | Active fixture schema |
ExpectedMatch |
protocol object | expected match schema | ✅ | Active fixture schema |
CompareOptions |
protocol object | comparator tolerance schema | ✅ | Active comparator contract |
Image, Pattern, Match, Finder, Region, ScreenStatus: ✅ Completed (baseline implemented)
Current extension state: Region geometry/runtime helper surface, Finder wait/vanish helpers, Region-scoped search/wait parity scaffolding, and Location/Offset parity objects are implemented and covered by unit tests.
Current extension state additionally includes Options typed configuration helpers, sorted FindAll parity helpers, OCR text-search APIs (ReadText/FindText) with optional gosseract backend integration, input automation scaffolding, observe/event scaffolding, and app/window scaffolding.
go test ./... from repo root as the regression baseline.Status: ✅ Completed (baseline implemented)
| Workstream | Baseline scaffold | Concrete backend | |
|---|---|---|---|
| Workstream 5: OCR and text-search parity | ✅ | ✅ | gosseract module version is pinned and enabled with -tags gosseract |
| Workstream 6: Input automation and hotkey parity | ✅ | ✅ | concrete darwin/linux/windows backends implemented |
| Workstream 7: Observe/event subsystem parity | ✅ | ✅ | deterministic polling backend implemented in internal/observe |
| Workstream 8: App/window/process control parity | ✅ | ✅ | concrete darwin/linux/windows backends implemented |
pkg/sikuli to include additional parity objects and behaviors (location/offset aliases, broader region/finder helpers, options surfaces).Status: ✅ Completed
Completed scope:
core.Matcher protocol.Status: ✅ Completed
Completed scope:
NCCMatcher and SADMatcher.internal/core.Finder.ReadText/FindText and region-scoped text operations.gosseract module version.Status (Baseline scaffold): ✅ Completed
Status (Concrete backend): ✅ Completed (pinned gosseract backend with tagged tests)
internal/core.InputController with move/click/type/hotkey APIs.Status (Baseline scaffold): ✅ Completed
Status (Concrete backend): ✅ Completed (darwin + linux + windows backends implemented)
internal/core.ObserverController with appear/vanish/change APIs.Status (Baseline scaffold): ✅ Completed Status (Concrete backend): ✅ Completed (deterministic polling backend implemented)
internal/core.AppController with open/focus/close/is-running/list-window APIs.darwin, linux, windows).Status (Baseline scaffold): ✅ Completed
Status (Concrete backend): ✅ Completed (darwin + linux + windows backends implemented)
| Area | Scope | Priority | ||
|---|---|---|---|---|
| Geometry primitives | Point, Rect, Region construction and transforms |
P0 | ✅ | includes region union/intersection/containment and runtime setters |
| Location/offset parity types | Location, Offset value objects |
P0 | ✅ | supports parity-friendly coordinate APIs |
| Screen abstraction | Screen id/bounds object |
P1 | ✅ | add monitor discovery later |
| Image model | Image constructors, copy, dimensions |
P0 | ✅ | add advanced image utilities later |
| Pattern semantics | similarity, exact, offset, resize, mask | P0 | ✅ | currently fully covered by default table |
| Match result model | score, target, index, geometry | P0 | ✅ | extend with comparator helpers if needed |
| Finder single target | Find + fail semantics |
P0 | ✅ | includes Exists and Has helper semantics |
| Finder wait/vanish semantics | Wait and WaitVanish timeout polling |
P0 | ✅ | global wait scan rate polling |
| Finder multi-target | FindAll ordering + indexing |
P0 | ✅ | deterministic order defined |
| Finder sorted multi-target helpers | FindAllByRow / FindAllByColumn |
P0 | ✅ | helper sorting + reindexing behavior |
| Region-scoped search | Region.Find/Exists/Has/Wait with timeout polling |
P0 | ✅ | uses source crop + finder backend |
| Region sorted multi-target helpers | FindAll / FindAllByRow / FindAllByColumn |
P0 | ✅ | region-scoped delegation |
| Image crop protocol | Image.Crop(rect) absolute-coordinate crop behavior |
P0 | ✅ | enables region-scoped search protocol |
| Finder protocol swappability | SetMatcher(core.Matcher) |
P0 | ✅ | enables backend evolution |
| Global settings | RuntimeSettings get/update/reset |
P1 | ✅ | expand settings map as parity grows |
| Options/config object | typed get/set/delete/clone/merge | P1 | ✅ | string-map compatibility helper |
| Signature compatibility layer | ImageAPI, PatternAPI, FinderAPI, RegionAPI, InputAPI, ObserveAPI, AppAPI |
P0 | ✅ | compatibility documented in API docs |
| Core matcher protocol | SearchRequest, MatchCandidate, Matcher |
P0 | ✅ | strict boundary maintained |
| Core image protocol util | ResizeGrayNearest |
P1 | ✅ | may add interpolation variants later |
| CV backend implementation | NCCMatcher |
P0 | ✅ | first backend |
| Alternate matcher backend | SADMatcher |
P1 | ✅ | enables multi-backend protocol checks |
| Golden parity protocol | corpus loader + comparator + tests | P0 | ✅ | active in CI/local tests |
| Backend conformance protocol | ordering/threshold/mask/resize assertions | P0 | ✅ | active tests in internal/cv |
| CI test visibility | race tests + vet + tidy diff enforcement | P0 | ✅ | workflow publishes strict signal |
| End-to-end parity flows | app + input + observe + OCR chained behavior | P1 | ✅ | dedicated parity e2e tests for default and -tags gosseract builds |
| OCR/text search | read text/find text parity | P1 | ✅ | finder/region OCR APIs with optional gosseract backend |
| OCR backend swappability | core.OCR protocol + backend selection |
P1 | ✅ | unsupported default + pinned gosseract build-tag backend |
| OCR conformance tests | confidence filtering + ordering + backend behavior | P1 | ✅ | includes unsupported backend and tagged hOCR parser conformance tests |
| Input automation | mouse/keyboard parity | P1 | ✅ | InputController scaffolding with protocol boundary and tests |
| Input backend swappability | core.Input protocol + backend selection |
P1 | ✅ | concrete darwin/linux/windows backends + non-target fallback |
| Observe/events | appear/vanish/change parity | P1 | ✅ | ObserverController + concrete deterministic polling backend |
| Observe backend swappability | core.Observer protocol + backend selection |
P1 | ✅ | concrete default polling backend via internal/observe |
| App/window/process | focus/open/close/window parity | P2 | ✅ | AppController protocol with concrete darwin/linux/windows backends |
| App backend swappability | core.App protocol + backend selection |
P2 | ✅ | concrete backends for major desktop OS targets |
Each existing object/interface/protocol is considered feature-complete when:
docs/guides/default-behavior-table.md.docs/guides/default-behavior-table.mddocs/guides/backend-capability-matrix.md