Commit d445b68

looks like it works!
1 parent 4e4c885 commit d445b68

9 files changed (+491 / -2 lines)

CLAUDE.md

Lines changed: 17 additions & 1 deletion
@@ -41,7 +41,8 @@ The following libraries should be preferred to alternatives when their functiona
 - RDF handling : rdf-ext, grapoi, @rdfjs/data-model @rdfjs/namespace @rdfjs/parser-n3
 - code editing : codemirror
 - templating : nunjucks
-- markdown : marked
+- markdown : marked (markdown→HTML conversion)
+- HTML parsing : cheerio (DOM manipulation and HTML→markdown conversion)
 
 ## Transmissions Framework
 
@@ -80,6 +81,7 @@ Transmissions is a message-driven pipeline framework where:
 - **SPARQL:** SPARQLSelect, SPARQLUpdate - interact with RDF stores
 - **Transform:** Restructure, PathOps - modify message structure
 - **I/O:** FileReader, FileWriter, HttpClient - external interactions
+- **Markup:** MarkdownToHTML, HTMLToMarkdown, MarkdownToLinks - content format conversion
 - **Note:** Many more processors exist in `src/processors/`. Always search for existing processors before creating new ones. Use `Glob` to find processors: `src/processors/**/*.js`
 
 **Creating New Processors:**
@@ -93,7 +95,21 @@ Transmissions is a message-driven pipeline framework where:
 - Use `super.getProperty(ns.trn.propertyName, defaultValue)` for configuration
 - Emit processed message: `this.emit('message', message)`
 
+**Common Application Patterns:**
+- **SPARQL Query → ForEach → Process → SPARQL Update** - Process multiple items from store
+  - Example: `bookmark-get` fetches HTML for bookmarks, converts to markdown, stores back
+    - SPARQLSelect with FILTER NOT EXISTS to skip already-processed items
+    - ForEach iterates over query results
+    - Restructure extracts fields from SPARQL result bindings
+    - Processing steps (HttpClient, conversion, etc.)
+    - SPARQLUpdate stores processed data
+- **File → Process → SPARQL Store** - Ingest content from files
+  - Example: `md-to-store` reads markdown, creates entries, stores in SPARQL
+- **SPARQL Query → ForEach → Template → File** - Export from store to files
+  - Example: `sparqlstore-to-html` generates HTML pages from stored content
+
 **Debugging:**
 - Run with `-v` flag for verbose output: `./trans -v app-name`
 - Use `LOG_LEVEL=debug` for detailed logging
 - Add `:SM` (ShowMessage) processor in pipeline to inspect messages
+- Check message fields with Restructure to ensure correct paths between processors
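
The debugging advice about checking message fields between processors can be illustrated with a small helper. This is a minimal sketch, assuming messages are plain JavaScript objects whose fields are addressed by dotted paths such as `currentItem.target.value` (the notation the bookmark-get config uses); `getByPath` and `missingPaths` are hypothetical names, not part of Transmissions.

```javascript
// Hypothetical debugging helpers for illustration; not Transmissions APIs.

// Resolve a dotted path like "currentItem.target.value" against a message object.
function getByPath(obj, path) {
  return path.split('.').reduce(
    (node, key) => (node != null && typeof node === 'object' ? node[key] : undefined),
    obj
  );
}

// Return the paths a downstream processor expects but the message does not provide.
function missingPaths(message, expectedPaths) {
  return expectedPaths.filter((path) => getByPath(message, path) === undefined);
}

// Example: a message after a fetch-preparation step, checked against later needs.
const message = { url: 'https://example.org/', bookmarkURI: 'http://example.org/bm/1' };
console.log(missingPaths(message, ['url', 'bookmark', 'content']));
// prints [ 'bookmark', 'content' ]
```

A check like this could be dropped in next to a `:SM` stage while wiring a new pipe, then removed once the paths line up.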

src/apps/bookmark-get/about.md

Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
+# bookmark-get Application
+
+## Description
+
+Fetches HTML content for bookmarks with HTTP status 200, converts it to Markdown, and stores it in a SPARQL graph.
+
+## Runner
+
+```sh
+cd ~/hyperdata/transmissions
+./trans bookmark-get
+```
+
+## Workflow
+
+1. **SPARQLSelect** - Query bookmarks with status 200 that don't have content yet
+2. **ForEach** - Iterate over each bookmark
+3. **Restructure** - Extract target URL and bookmark URI
+4. **HttpClient** - Fetch HTML page
+5. **HTMLToMarkdown** - Convert HTML to Markdown
+6. **Restructure** - Prepare data for update
+7. **SPARQLUpdate** - Store markdown content in graph
+8. **ShowMessage** - Debug output
+
+## SPARQL Data Model
+
+**Input Query:**
+```sparql
+PREFIX bm: <http://purl.org/stuff/bm/>
+SELECT ?bookmark ?target ?title
+WHERE {
+  ?bookmark a bm:Bookmark ;
+            bm:target ?target ;
+            bm:status 200 .
+  FILTER NOT EXISTS { ?bookmark bm:content ?content }
+}
+```
+
+**Output Update:**
+```sparql
+PREFIX bm: <http://purl.org/stuff/bm/>
+INSERT DATA {
+  GRAPH <http://hyperdata.it/content> {
+    <bookmark-uri> bm:content "markdown content" ;
+                   bm:contentType "text/markdown" ;
+                   bm:fetched "2025-09-30T12:00:00Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> .
+  }
+}
+```
+
+## Verification Query
+
+To verify the pipeline has worked, check which bookmarks have content:
+
+```sparql
+PREFIX bm: <http://purl.org/stuff/bm/>
+
+SELECT ?target ?title (SUBSTR(?content, 1, 100) AS ?preview) ?fetched
+FROM <http://hyperdata.it/content>
+WHERE {
+  ?bookmark a bm:Bookmark ;
+            bm:target ?target ;
+            bm:content ?content ;
+            bm:fetched ?fetched .
+  OPTIONAL { ?bookmark bm:title ?title }
+}
+ORDER BY DESC(?fetched)
+```
+
+Or via curl:
+
+```sh
+curl -s -H "Accept: text/plain" --user admin:admin123 \
+  "http://localhost:3030/test/query?query=PREFIX%20bm%3A%20%3Chttp%3A%2F%2Fpurl.org%2Fstuff%2Fbm%2F%3E%0ASELECT%20%3Ftarget%20%28EXISTS%7B%3Fbookmark%20bm%3Acontent%20%3Fcontent%7D%20AS%20%3FhasContent%29%20WHERE%20%7B%20GRAPH%20%3Chttp%3A%2F%2Fhyperdata.it%2Fcontent%3E%20%7B%20%3Fbookmark%20a%20bm%3ABookmark%20%3B%20bm%3Atarget%20%3Ftarget%20%7D%20%7D"
+```
+
+## Notes
+
+- Only processes bookmarks with status 200 (successful HTTP)
+- Skips bookmarks that already have content
+- Removes script, style, nav, footer, aside, noscript and header elements before conversion
+- Preserves links, images, formatting, lists, code blocks, tables
+- Handles nested HTML structures
+- Second run will skip bookmarks that already have content
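
The percent-encoded `curl` URL above is hard to read or edit. Below is a sketch that builds an equivalent GET request from a readable query, assuming Node 18+ (global `URL`, `btoa`, `fetch`) and the local Fuseki endpoint and test credentials from this commit; `buildQueryRequest` and `VERIFY_QUERY` are hypothetical names. One deliberate difference: the `EXISTS` test is moved inside the `GRAPH` block via `BIND`. As we read the SPARQL 1.1 spec, an `EXISTS` in the `SELECT` clause is evaluated against the default graph, which in a default Fuseki dataset does not include the named content graph, so the original form may report `false` for every bookmark.

```javascript
// Sketch only: readable version of the verification query, scoped to the content graph.
const VERIFY_QUERY = `PREFIX bm: <http://purl.org/stuff/bm/>
SELECT ?target ?hasContent
WHERE {
  GRAPH <http://hyperdata.it/content> {
    ?bookmark a bm:Bookmark ;
              bm:target ?target .
    BIND(EXISTS { ?bookmark bm:content ?content } AS ?hasContent)
  }
}`;

// Hypothetical helper: returns the URL and headers for a SPARQL GET query.
function buildQueryRequest(endpoint, query, user, password) {
  const url = new URL(endpoint);
  // URLSearchParams URL-encodes the query string for us.
  url.searchParams.set('query', query);
  return {
    url: url.toString(),
    headers: {
      Accept: 'application/sparql-results+json',
      Authorization: 'Basic ' + btoa(`${user}:${password}`),
    },
  };
}

// Usage (needs a running Fuseki):
// const { url, headers } = buildQueryRequest('http://localhost:3030/test/query', VERIFY_QUERY, 'admin', 'admin123');
// const res = await fetch(url, { headers });
// console.log(JSON.stringify(await res.json(), null, 2));
```

Keeping the query as a template string means it can be edited and diffed like the other `.sparql` examples, with encoding left to the standard `URL` API.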

src/apps/bookmark-get/config.ttl

Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
+# src/apps/bookmark-get/config.ttl
+
+@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
+@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
+
+@prefix : <http://purl.org/stuff/transmissions/> .
+
+:selectBookmarks a :ConfigSet ;
+    :templateFilename "data/select-html-bookmarks.njk" ;
+    :endpointSettings "data/endpoints.json" ;
+    :graph <http://hyperdata.it/content> .
+
+:iterateBookmarks a :ConfigSet ;
+    :remove "true" ;
+    :forEach "queryResults.results.bindings" .
+
+:prepFetch a :ConfigSet ;
+    :rename (:pf1 :pf2) .
+    :pf1 :pre "currentItem.target.value" ;
+        :post "url" .
+    :pf2 :pre "currentItem.bookmark.value" ;
+        :post "bookmarkURI" .
+
+:httpSettings a :ConfigSet ;
+    :url "message.url" ;
+    :method "GET" .
+
+:htmlToMd a :ConfigSet ;
+    :inputField "http.data" ;
+    :outputField "content" ;
+    :cleanSelectors "script,style,nav,footer,aside,noscript,header" .
+
+:prepUpdate a :ConfigSet ;
+    :rename (:pu1 :pu2) .
+    :pu1 :pre "bookmarkURI" ;
+        :post "bookmark" .
+    :pu2 :pre "content" ;
+        :post "content" .
+
+:updateContent a :ConfigSet ;
+    :dataBlock "message" ;
+    :templateFilename "data/update-bookmark-content.njk" ;
+    :endpointSettings "data/endpoints.json" ;
+    :graph <http://hyperdata.it/content> .
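
The `:iterateBookmarks` and `:prepFetch` settings describe a fan-out over SPARQL results followed by field renames. The sketch below models that data flow under stated assumptions: the query results sit at `message.queryResults` in the standard SPARQL 1.1 JSON results format (which is why the paths end in `.value`), Restructure copies each `:pre` path to its `:post` path, and `:remove "true"` drops the source array from each per-item message. The last two are guesses about Transmissions behaviour, and `forEach`/`rename` here are hypothetical functions, not the framework's processors.

```javascript
// Hypothetical sketch of the ForEach + Restructure data flow; not Transmissions code.

// Resolve a dotted path such as "currentItem.target.value".
function getByPath(obj, path) {
  return path.split('.').reduce(
    (node, key) => (node != null && typeof node === 'object' ? node[key] : undefined),
    obj
  );
}

// Set a dotted path, creating intermediate objects as needed.
function setByPath(obj, path, value) {
  const keys = path.split('.');
  const last = keys.pop();
  let node = obj;
  for (const key of keys) {
    if (node[key] == null || typeof node[key] !== 'object') node[key] = {};
    node = node[key];
  }
  node[last] = value;
}

// ForEach: one message per array element, carried as `currentItem`.
// removeSource models :remove "true" (assumed to drop the array from each copy).
function forEach(message, arrayPath, removeSource = false) {
  const items = getByPath(message, arrayPath) ?? [];
  return items.map((item) => {
    const copy = structuredClone(message);
    if (removeSource) {
      const keys = arrayPath.split('.');
      const last = keys.pop();
      const parent = keys.length ? getByPath(copy, keys.join('.')) : copy;
      if (parent != null && typeof parent === 'object') delete parent[last];
    }
    copy.currentItem = item;
    return copy;
  });
}

// Restructure: copy each `pre` path to its `post` path (whether the real
// processor also deletes `pre` is not shown in this commit).
function rename(message, pairs) {
  const out = structuredClone(message);
  for (const { pre, post } of pairs) setByPath(out, post, getByPath(message, pre));
  return out;
}

// Sample message in SPARQL 1.1 JSON results format (values are placeholders).
const message = {
  queryResults: {
    head: { vars: ['bookmark', 'target', 'title'] },
    results: {
      bindings: [
        {
          bookmark: { type: 'uri', value: 'http://example.org/bm/1' },
          target: { type: 'uri', value: 'https://example.org/page' },
        },
      ],
    },
  },
};

const fetchReady = forEach(message, 'queryResults.results.bindings', true).map((m) =>
  rename(m, [
    { pre: 'currentItem.target.value', post: 'url' },
    { pre: 'currentItem.bookmark.value', post: 'bookmarkURI' },
  ])
);
console.log(fetchReady[0].url); // https://example.org/page
```

Copying with `structuredClone` keeps each per-item message independent, so removing the source array from one copy cannot affect the original query result.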

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+[
+  {
+    "name": "local Fuseki",
+    "type": "query",
+    "url": "http://localhost:3030/test/query",
+    "credentials": {
+      "user": "admin",
+      "password": "admin123"
+    }
+  },
+  {
+    "name": "local Fuseki",
+    "type": "update",
+    "url": "http://localhost:3030/test/update",
+    "credentials": {
+      "user": "admin",
+      "password": "admin123"
+    }
+  }
+]
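
The endpoints file pairs a `query` and an `update` endpoint with HTTP Basic credentials. A sketch of how a client might use such a file, assuming the first entry with a matching `type` is the one to use (the commit does not show how SPARQLSelect and SPARQLUpdate choose); the update request follows the SPARQL 1.1 Protocol's direct POST form with `Content-Type: application/sparql-update`. `selectEndpoint`, `authHeaders` and `buildUpdateRequest` are hypothetical names.

```javascript
// Hypothetical helpers; the selection rule (first entry of the given type) is an assumption.
function selectEndpoint(endpoints, type) {
  const endpoint = endpoints.find((e) => e.type === type);
  if (!endpoint) throw new Error(`No endpoint of type "${type}"`);
  return endpoint;
}

// HTTP Basic auth header from the entry's credentials, if any.
function authHeaders(endpoint) {
  const { user, password } = endpoint.credentials ?? {};
  return user ? { Authorization: 'Basic ' + btoa(`${user}:${password}`) } : {};
}

// SPARQL 1.1 Protocol "update via POST directly": the body is the update string.
function buildUpdateRequest(endpoints, update) {
  const endpoint = selectEndpoint(endpoints, 'update');
  return {
    url: endpoint.url,
    options: {
      method: 'POST',
      headers: { ...authHeaders(endpoint), 'Content-Type': 'application/sparql-update' },
      body: update,
    },
  };
}

// Usage (Node 18+ fetch, running Fuseki, endpoints parsed from this JSON file):
// const { url, options } = buildUpdateRequest(endpoints, 'INSERT DATA { ... }');
// await fetch(url, options);
```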

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+PREFIX bm: <http://purl.org/stuff/bm/>
+SELECT ?bookmark ?target ?title
+FROM <{{graph}}>
+WHERE {
+  ?bookmark a bm:Bookmark ;
+            bm:target ?target ;
+            bm:status 200 .
+  OPTIONAL { ?bookmark bm:title ?title }
+  FILTER NOT EXISTS { ?bookmark bm:content ?content }
+}
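
The template's `OPTIONAL { ?bookmark bm:title ?title }` means some solutions have no `title`. In the SPARQL 1.1 JSON results format an unbound variable is simply absent from that solution's binding object, so code reading the results should not assume every variable is present. A minimal sketch that flattens results into plain rows; `bindingsToRows` is a hypothetical helper and the sample data is made up.

```javascript
// Flatten SPARQL 1.1 JSON results ({ head: { vars }, results: { bindings } })
// into plain objects of string values. Unbound variables become undefined.
function bindingsToRows(json) {
  const vars = json.head.vars;
  return json.results.bindings.map((binding) =>
    Object.fromEntries(vars.map((name) => [name, binding[name]?.value]))
  );
}

// Placeholder results: the second bookmark has no title.
const sample = {
  head: { vars: ['bookmark', 'target', 'title'] },
  results: {
    bindings: [
      {
        bookmark: { type: 'uri', value: 'http://example.org/bm/1' },
        target: { type: 'uri', value: 'https://example.org/a' },
        title: { type: 'literal', value: 'Page A' },
      },
      {
        bookmark: { type: 'uri', value: 'http://example.org/bm/2' },
        target: { type: 'uri', value: 'https://example.org/b' },
      },
    ],
  },
};

for (const row of bindingsToRows(sample)) {
  console.log(row.target, row.title ?? '(no title)');
}
// https://example.org/a Page A
// https://example.org/b (no title)
```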

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+PREFIX bm: <http://purl.org/stuff/bm/>
+PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
+
+INSERT DATA {
+  GRAPH <{{graph}}> {
+    <{{bookmark}}> bm:content """{{content | safe}}""" ;
+                   bm:contentType "text/markdown" ;
+                   bm:fetched "{{fetched}}"^^xsd:dateTime .
+  }
+}
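
In the update template, `| safe` only turns off Nunjucks HTML autoescaping; it does not make the markdown safe inside a SPARQL `"""..."""` literal, so content containing `"""`, a trailing `"`, or a backslash would give a malformed or altered update. The template also reads `{{fetched}}`, which no step in this commit's config visibly sets. A minimal sketch of preparing these values before rendering, assuming the message can be adjusted in JavaScript ahead of the SPARQLUpdate step; `escapeSparqlString`, `checkIri` and `updateContext` are hypothetical names. The escaped values are meant to be rendered with `| safe`, since HTML autoescaping (if enabled) would otherwise turn quotes and `&` into entities.

```javascript
// Hypothetical helpers for preparing the update template's values; not Transmissions code.

// Escape text for a SPARQL string literal (also valid inside """..."""):
// backslashes first, then double quotes, so no unescaped quote remains.
function escapeSparqlString(text) {
  return String(text).replace(/\\/g, '\\\\').replace(/"/g, '\\"');
}

// Reject characters that SPARQL does not allow inside <...> (IRIREF).
function checkIri(iri) {
  if (/[\u0000-\u0020<>"{}|^`\\]/.test(iri)) {
    throw new Error(`Not usable inside <...>: ${iri}`);
  }
  return iri;
}

// Build the values the template reads: graph, bookmark, content, fetched.
function updateContext({ graph, bookmark, content }, now = new Date()) {
  return {
    graph: checkIri(graph),
    bookmark: checkIri(bookmark),
    content: escapeSparqlString(content),
    // ISO 8601 in UTC, e.g. "2025-09-30T12:00:00.000Z": a valid xsd:dateTime lexical form
    fetched: now.toISOString(),
  };
}

const ctx = updateContext(
  { graph: 'http://hyperdata.it/content', bookmark: 'http://example.org/bm/1', content: 'Say "hi" \\o/' },
  new Date('2025-09-30T12:00:00Z')
);
console.log(ctx.content); // Say \"hi\" \\o/
console.log(ctx.fetched); // 2025-09-30T12:00:00.000Z
```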

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
+# src/apps/bookmark-get/transmissions.ttl
+
+@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
+@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
+
+@prefix : <http://purl.org/stuff/transmissions/> .
+
+##################################################################
+# Utility Processors : insert into pipe for debugging #
+# #
+:SM a :ShowMessage . # verbose report, continues pipe #
+:SC a :ShowConfig . # verbose report, continues pipe #
+:DE a :DeadEnd . # ends the current pipe quietly #
+:H a :Halt . # kills everything #
+:N a :NOP . # no operation (except for showing stage in pipe) #
+:UF a :Unfork . # collapses all pipes but one #
+##################################################################
+
+:bookmark-get a :Transmission ;
+    :pipe (:p10 :p20 :p30 :p40 :p50 :p60 :p70 :p80) .
+
+:p10 a :SPARQLSelect ;
+    :settings :selectBookmarks .
+
+:p20 a :ForEach ;
+    :settings :iterateBookmarks .
+
+:p30 a :Restructure ;
+    :settings :prepFetch .
+
+:p40 a :HttpClient ;
+    :settings :httpSettings .
+
+:p50 a :HTMLToMarkdown ;
+    :settings :htmlToMd .
+
+:p60 a :Restructure ;
+    :settings :prepUpdate .
+
+:p70 a :SPARQLUpdate ;
+    :settings :updateContent .
+
+:p80 a :ShowMessage .
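
The `:pipe` list fixes the order p10 → p80, with ForEach turning one message into one message per bookmark. A stand-alone sketch of that data flow, with the store, HTTP fetch and HTML conversion replaced by in-memory stubs and query results simplified to plain strings. It is not how Transmissions executes pipes (processors there emit messages as events; this runs stage by stage over arrays), and `runPipe`/`makeStages` are hypothetical names. Running it twice shows the effect of the `FILTER NOT EXISTS` guard: the second run finds nothing to fetch.

```javascript
// Hypothetical sketch; not the Transmissions runtime.
// Each stage maps one message to an array of messages (ForEach may return many, or none).
async function runPipe(stages, message) {
  let messages = [message];
  for (const stage of stages) {
    const next = [];
    for (const m of messages) next.push(...(await stage(m)));
    messages = next;
  }
  return messages;
}

// Stub stages mirroring p10..p70 (p80 ShowMessage omitted).
// `store` maps bookmark IRI -> markdown and stands in for the content graph.
function makeStages(store, bookmarks) {
  return [
    // p10 SPARQLSelect: only bookmarks without stored content (FILTER NOT EXISTS)
    async (m) => [{ ...m, items: bookmarks.filter((b) => !store.has(b.bookmark)) }],
    // p20 ForEach: one message per result
    async (m) => m.items.map((item) => ({ currentItem: item })),
    // p30 Restructure: pick out the fields the fetch needs
    async (m) => [{ url: m.currentItem.target, bookmarkURI: m.currentItem.bookmark }],
    // p40 HttpClient (stubbed: no network access)
    async (m) => [{ ...m, http: { data: `<h1>Page at ${m.url}</h1>` } }],
    // p50 HTMLToMarkdown (toy conversion that only handles <h1>)
    async (m) => [{ ...m, content: m.http.data.replace(/<h1>(.*?)<\/h1>/, '# $1') }],
    // p60 Restructure + p70 SPARQLUpdate: store the markdown
    async (m) => {
      store.set(m.bookmarkURI, m.content);
      return [m];
    },
  ];
}

const bookmarks = [
  { bookmark: 'http://example.org/bm/1', target: 'https://example.org/a' },
  { bookmark: 'http://example.org/bm/2', target: 'https://example.org/b' },
];
const store = new Map();
const stages = makeStages(store, bookmarks);

runPipe(stages, {})
  .then((out) => console.log('first run:', out.length)) // first run: 2
  .then(() => runPipe(stages, {}))
  .then((out) => console.log('second run:', out.length)); // second run: 0
```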
