Discussion:
Meta: a usenet server just for sci.math
Ross A. Finlayson
2016-12-02 04:24:39 UTC
I have an idea here to build a usenet server
only for sci.math and sci.logic. The idea is
to find archives of sci.math and sci.logic and
to populate a store of the articles in a more
or less enduring form (say, "on the cloud"),
then to offer the usual news server access
with, say, 1 month, 3 month, or 6 month retention,
and then some cumulative retention (with a goal
of unlimited retention of sci.math and sci.logic
articles). The idea would be to have various
server names reflect those retentions for various
uses: a read-only archival server, a read-only
daily server, and a read-and-write posting server.
I'm willing to invest the time and effort to write
the necessary software, gather existing archives,
and integrate with existing usenet providers to
put these things together.

Then, where it's in part an exercise in vanity,
I've been cultivating various notions of how to
generate summaries or reports of various posts,
articles, threads, and authors, toward cultivating
summaries for reporting and research purposes.


So, I wonder what others think of such a thing and
whether they might see it as reasonably fruitful,
basically for the enjoyment and the most direct
purposes of the authors of the posts.


I invite comment, as I have begun to carry this out.
Ross A. Finlayson
2016-12-02 19:19:17 UTC
On Thursday, December 1, 2016 at 8:24:47 PM UTC-8, Ross A. Finlayson wrote:
> I have an idea here to build a usenet server
> only for sci.math and sci.logic. The idea is
> to find archives of sci.math and sci.logic and
> to populate a store of the articles in a more
> or less enduring form (say, "on the cloud"),
> then to offer some usual news server access
> then to, say, 1 month 3 month 6 month retention,
> and then some cumulative retention (with a goal
> of unlimited retention of sci.math and sci.logic
> articles). The idea would be to have basically
> various names of servers then reflect those
> retentions for various uses for a read-only
> archival server and a read-only daily server
> and a read-and-write posting server. I'm willing
> to invest time and effort to write the necessary
> software and gather existing archives and integrate
> with existing usenet providers to put together these
> things.
>
> Then, where basically it's in part an exercise
> in vanity, I've been cultivating some various
> notions of how to generate some summaries or
> reports of various post, articles, threads, and
> authors, toward the specialization of the cultivation
> of summary for reporting and research purposes.
>
>
> So, I wonder others' idea about such a thing and
> how they might see it as a reasonably fruitful
> thing, basically for the enjoyment and for the
> most direct purposes of the authors of the posts.
>
>
> I invite comment, as I have begun to carry this out.

So far I've read through the NNTP specs and looked
a bit at the INND code. The general idea is
to define a filesystem layout convention that would
be used for the articles, then to have those on
virtual disks (e.g., "EBS volumes") or cloud storage
(e.g., "S3") in an essentially Write-Once-Read-Many
configuration, where the goal is to implement data
structures with a forward state machine so that
they remain consistent on unreliable computing
resources (e.g., "runtimes on EC2 hosts"), and that
are readily cacheable (and horizontally scalable).

Then, the runtimes are for the collection and maintenance
of posts ("infeeds" and "outfeeds", backfills), for
summary generation (overview, metadata, key extraction,
information content, working up auto-correlation), then
reader servers, then some maintenance and admin. As a
usual software design principle there is a goal of both
"stack-on-a-box" and "abstraction of resources",
and a usual separation of domain, library, routine, and
runtime logic.

So basically it looks like:
1) gather mbox files of sci.math and sci.logic
2) copy those to archive inputs
3) break those out into a filesystem layout for each article
(there are various filesystems that support this many files
these days; a sketch of such a layout follows this list)
4) generate partition and overview summaries
5) generate various revisioning schemes (the "article numbers"
of the various servers)
6) figure out the incremental addition and periodic truncation
7) establish a low-cost but high-availability endpoint runtime
8) make elastic/auto-scaling service routine behind that
9) have opportunistic / low cost periodic maintenance
10) emit that as a configuration that anybody can run
as "stack-on-a-box" or with usual "free tier" cloud accounts
Ross A. Finlayson
2016-12-05 02:44:58 UTC
On Friday, December 2, 2016 at 11:19:23 AM UTC-8, Ross A. Finlayson wrote:
> On Thursday, December 1, 2016 at 8:24:47 PM UTC-8, Ross A. Finlayson wrote:
> > I have an idea here to build a usenet server
> > only for sci.math and sci.logic. The idea is
> > to find archives of sci.math and sci.logic and
> > to populate a store of the articles in a more
> > or less enduring form (say, "on the cloud"),
> > then to offer some usual news server access
> > then to, say, 1 month 3 month 6 month retention,
> > and then some cumulative retention (with a goal
> > of unlimited retention of sci.math and sci.logic
> > articles). The idea would be to have basically
> > various names of servers then reflect those
> > retentions for various uses for a read-only
> > archival server and a read-only daily server
> > and a read-and-write posting server. I'm willing
> > to invest time and effort to write the necessary
> > software and gather existing archives and integrate
> > with existing usenet providers to put together these
> > things.
> >
> > Then, where basically it's in part an exercise
> > in vanity, I've been cultivating some various
> > notions of how to generate some summaries or
> > reports of various post, articles, threads, and
> > authors, toward the specialization of the cultivation
> > of summary for reporting and research purposes.
> >
> >
> > So, I wonder others' idea about such a thing and
> > how they might see it as a reasonably fruitful
> > thing, basically for the enjoyment and for the
> > most direct purposes of the authors of the posts.
> >
> >
> > I invite comment, as I have begun to carry this out.
>
> So far I've read through the NNTP specs and looked
> a bit at the INND code. Then, the general idea is
> to define a filesystem layout convention, that then
> would be used for articles, then for having those
> on virtual disks (eg, "EBS volumes") or cloud storage
> (eg, "S3") in essentially a Write-Once-Read-Many
> configuration, where the goal is to implement data
> structures that have a forward state machine so that
> they remain consistent with unreliable computing
> resources (eg, "runtimes on EC2 hosts"), and that
> are readily cacheable (and horizontally scaleable).
>
> Then, the runtimes are of the collection and maintenance
> of posts ("infeeds" and "outfeeds", backfills), about
> summary generation (overview, metadata, key extraction,
> information content, working up auto-correlation), then
> reader servers, then some maintenance and admin. As a
> usual software design principle there is a goal of the
> both "stack-on-a-box" and also "abstraction of resources"
> and a usual separation of domain, library, routine, and
> runtime logic.
>
> So basically it looks like:
> 1) gather mbox files of sci.math and sci.logic
> 2) copy those to archive inputs
> 3) break those out into a filesystem layout for each article
> (there are various filesystems that support this many files
> these days)
> 4) generate partition and overview summaries
> 5) generate various revisioning schemes (the "article numbers"
> of the various servers)
> 6) figure out the incremental addition and periodic truncation
> 7) establish a low-cost but high-availability endpoint runtime
> 8) make elastic/auto-scaling service routine behind that
> 9) have opportunistic / low cost periodic maintenance
> 10) emit that as a configuration that anybody can run
> as "stack-on-a-box" or with usual "free tier" cloud accounts



I've looked into this a bit more and the implementation is
starting to look along these lines.

First there's the ingestion side, or "infeed", basically
the infeed connects and pushes articles. Here then the
basic store of the articles will be an object store (or
here "S3" as an example object store). This is durable
and the object keys are the article's "unique" message-id.

If the message-id already exists in the store, then the
infeed just continues.

The article is stored under its message-id, noting the
body offset and counting the lines, and storing those
with the object. Then the message-id is pushed to a
queue, along with the headers extracted from the article
that are relevant to the article and overview, and the
arrival date or effective arrival date. The slow-and-
steady database worker (or, distributed data structure
on "Dynamo tables") then retrieves a queue item, at some
metered rate, and gets an article number (by some
conditional update that might starve a thread) for each
group in the article's newsgroups and for some "all"
newsgroup, so that each article also has a (sequential) number.

Assigning a sequence is a bit of a sticky wicket, because here
there's basically "eventual consistency" and "forward safe"
operations. Any of the threads, connections, or boxes
could die at any time, so the primary concern is "no
drops, then, no dupes". There isn't really a transactional
context in which to make atomic "for each group, give it the
next sequence value, doing that together for each group's
numbering of articles in an atomic transaction". Luckily,
while NNTP requires strictly increasing values, it allows
gaps in the sequences. So, here, when mapping article-number
to message-id and message-id to article-number, if some other
thread has already stored a value for that article-number, it
can be re-tried until there is an unused article-number.
Updating the high-water mark can fail if another thread has
already updated it, in which case it is re-tried against the
new value, which could lead to starvation.

(There's a notion then, when an article-number is assigned, to
toss that back onto the queue for the rest of the transaction
to be carried out.)
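
A sketch of that forward-safe numbering, against a hypothetical
table with conditional writes (the NumberTable interface is a
placeholder for, say, conditional updates on a table, not a real
client API); no drops, no dupes, and gaps are allowed:

// Sketch only: conditional puts that only succeed for an unused
// number, retried on contention; losing a race leaves a gap, never a dupe.
interface NumberTable {
    long readHighWater(String group);
    // conditional put: true only if this number had no message-id yet
    boolean putIfAbsent(String group, long number, String messageId);
    // conditional update: true only if the high-water mark still equals 'expected'
    boolean advanceHighWater(String group, long expected, long next);
}

final class Numberer {
    private final NumberTable table;

    Numberer(NumberTable table) { this.table = table; }

    // assigns the next free number in 'group' to 'messageId'
    long assign(String group, String messageId) {
        while (true) {
            long candidate = table.readHighWater(group) + 1;
            if (table.putIfAbsent(group, candidate, messageId)) {
                // best effort: failure here only leaves a gap for a later retry
                table.advanceHighWater(group, candidate - 1, candidate);
                return candidate;
            }
            // another thread took that number first; retry with a fresh read
        }
    }
}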

Then, this having established a data structure for the message
store, these are basically the live data structures: distributed,
highly available, fault-tolerant, and maintenance-free. This
implements the basic function of taking feeds (or new articles)
and also the reader capability, which is basically a protocol
listener that maintains the reader's current group and article.

To implement some further features of NNTP, there's an idea
to store the article numbers for each group and for "all" in
a bucket for each time period (e.g., 1 day), so that a scan
over the articles by their numbers finds those buckets as the
partitions, and the rest then follow sequentially (or rather,
increasingly).

Omitting or removing articles, or expiring them for no-archive,
is basically ignored here, but the idea is to maintain, for the
"all" group, series of 1000 or 10000 articles and then record
which offsets in those series are cancelled. Basically the
object store is write-once, immutable, and flat, where it's yet
to be determined how to backfill the article store from archive
files or from suck feeds off live servers with long retention.
Then there's an idea to start the numbering at 1 000 000 or so
and then have plenty of ranges in which to fill in articles as
archived or according to their receipt date header.

Then, as the primary data stores would basically just implement
a simple news server, there are two main priorities: to
implement posting and to implement summaries and reports.

Then, as far as I can tell, this pretty much fits within the
"free tier", so it's pretty economical.
David Melik
2016-12-05 02:57:14 UTC
On 12/01/2016 08:24 PM, Ross A. Finlayson wrote:
>
>
> I have an idea here to build a usenet server
> only for sci.math and sci.logic. The idea is
> to find archives of sci.math and sci.logic and
> to populate a store of the articles in a more
> or less enduring form (say, "on the cloud"),
> then to offer some usual news server access
> then to, say, 1 month 3 month 6 month retention,
> and then some cumulative retention (with a goal
> of unlimited retention of sci.math and sci.logic
> articles). The idea would be to have basically
> various names of servers then reflect those
> retentions for various uses for a read-only
> archival server and a read-only daily server
> and a read-and-write posting server. I'm willing
> to invest time and effort to write the necessary
> software and gather existing archives and integrate
> with existing usenet providers to put together these
> things.
>
> Then, where basically it's in part an exercise
> in vanity, I've been cultivating some various
> notions of how to generate some summaries or
> reports of various post, articles, threads, and
> authors, toward the specialization of the cultivation
> of summary for reporting and research purposes.
>
>
> So, I wonder others' idea about such a thing and
> how they might see it as a reasonably fruitful
> thing, basically for the enjoyment and for the
> most direct purposes of the authors of the posts.
>
>
> I invite comment, as I have begun to carry this out.

What about including all the sci.math.* and alt.math.*?
Ross A. Finlayson
2016-12-05 04:05:03 UTC
On Sunday, December 4, 2016 at 6:57:19 PM UTC-8, David Melik wrote:
> On 12/01/2016 08:24 PM, Ross A. Finlayson wrote:
> >
> >
> > I have an idea here to build a usenet server
> > only for sci.math and sci.logic. The idea is
> > to find archives of sci.math and sci.logic and
> > to populate a store of the articles in a more
> > or less enduring form (say, "on the cloud"),
> > then to offer some usual news server access
> > then to, say, 1 month 3 month 6 month retention,
> > and then some cumulative retention (with a goal
> > of unlimited retention of sci.math and sci.logic
> > articles). The idea would be to have basically
> > various names of servers then reflect those
> > retentions for various uses for a read-only
> > archival server and a read-only daily server
> > and a read-and-write posting server. I'm willing
> > to invest time and effort to write the necessary
> > software and gather existing archives and integrate
> > with existing usenet providers to put together these
> > things.
> >
> > Then, where basically it's in part an exercise
> > in vanity, I've been cultivating some various
> > notions of how to generate some summaries or
> > reports of various post, articles, threads, and
> > authors, toward the specialization of the cultivation
> > of summary for reporting and research purposes.
> >
> >
> > So, I wonder others' idea about such a thing and
> > how they might see it as a reasonably fruitful
> > thing, basically for the enjoyment and for the
> > most direct purposes of the authors of the posts.
> >
> >
> > I invite comment, as I have begun to carry this out.
>
> What about including all the sci.math.* and alt.math.*?

It's a matter of scale and configuration.

It should scale quite well enough, though at some point
it would involve some money. In rough terms, it looks
like storing 1MM messages is ~$25/month, and supporting
readers is a few cents a day but copying it would be
twenty or thirty dollars. (I can front that.)

I'm for it where it might be useful, and I hope to
establish an archive with the goal of indefinite retention,
basically to present an archive and, for my own
purposes, to generate narratives and timelines.

The challenge will be to get copies of archives of these
newsgroups. Somebody out of news.admin.peering might
have some insight into who has the Dejanews CDs or what
there might be in the Internet Archive Usenet Archive,
then in terms of today's news servers which claim about
ten years retention. Basically I'm looking for twenty
plus years of retention.

Now, some development is underway, in no real hurry.
Basically I'm looking at the runtimes and a software
library to be written (i.e., interfaces for the components
above and local file-system versions for stack-on-a-box,
implementing a subset of NNTP, in a simple service runtime
that idles really low).

Then, as above, it's kind of a vanity project or author-centric,
about making it so that custom servers could be stood up with
whatever newsgroups you want with the articles filtered
however you'd so care, rendered variously.
Yuri Kreaton
2016-12-05 05:19:30 UTC
On 12/4/2016 10:05 PM, Ross A. Finlayson wrote:
> On Sunday, December 4, 2016 at 6:57:19 PM UTC-8, David Melik wrote:
>> On 12/01/2016 08:24 PM, Ross A. Finlayson wrote:
>>>

>>> I have an idea here to build a usenet server
>>> only for sci.math and sci.logic. The idea is
>>> to find archives of sci.math and sci.logic and
>>> to populate a store of the articles in a more
>>> or less enduring form (say, "on the cloud"),
>>> then to offer some usual news server access
>>> then to, say, 1 month 3 month 6 month retention,
>>> and then some cumulative retention (with a goal
>>> of unlimited retention of sci.math and sci.logic
>>> articles). The idea would be to have basically
>>> various names of servers then reflect those
>>> retentions for various uses for a read-only
>>> archival server and a read-only daily server
>>> and a read-and-write posting server. I'm willing
>>> to invest time and effort to write the necessary
>>> software and gather existing archives and integrate
>>> with existing usenet providers to put together these
>>> things.

>>> Then, where basically it's in part an exercise
>>> in vanity, I've been cultivating some various
>>> notions of how to generate some summaries or
>>> reports of various post, articles, threads, and
>>> authors, toward the specialization of the cultivation
>>> of summary for reporting and research purposes.
>>>

>>> So, I wonder others' idea about such a thing and
>>> how they might see it as a reasonably fruitful
>>> thing, basically for the enjoyment and for the
>>> most direct purposes of the authors of the posts.
>>>
>>>
>>> I invite comment, as I have begun to carry this out.

>> What about including all the sci.math.* and alt.math.*?
>
> It's a matter of scale and configuration.
>
> It should scale quite well enough, though at some point
> it would involve some money. In rough terms, it looks
> like storing 1MM messages is ~$25/month, and supporting
> readers is a few cents a day but copying it would be
> twenty or thirty dollars. (I can front that.)

> I'm for it where it might be useful, where I hope to
> establish an archive with the goal of indefinite retention,
> and basically to present an archive and for my own
> purposes to generate narratives and timelines.

> The challenge will be to get copies of archives of these
> newsgroups. Somebody out of news.admin.peering might
> have some insight into who has the Dejanews CDs or what
> there might be in the Internet Archive Usenet Archive,
> then in terms of today's news servers which claim about
> ten years retention. Basically I'm looking for twenty
> plus years of retention.

> Now, some development is underway, and in no real hurry.
> Basically I'm looking at the runtimes and a software
> library to be written, (i.e., interfaces for the components
> above and local file-system versions for stack-on-a-box,
> implementing a subset of NNTP, in a simple service runtime
> that idles really low).

> Then, as above, it's kind of a vanity project or author-centric,
> about making it so that custom servers could be stood up with
> whatever newsgroups you want with the articles filtered
> however you'd so care, rendered variously.
>
>

Talk to the other newsgroup server admins out there, read their web
information, call the dudes or email them; some will hook you up free.
You attach, and you get all the sci.math they have and get updates, I
think for free (the linkups are called suck feeds). They may also want
to link up with you for redundancy.

Not much software to write either; look for newsgroups with "server"
in the name for info, some were very good 5 years ago.
Ross A. Finlayson
2016-12-07 00:40:15 UTC
On Sunday, December 4, 2016 at 9:19:39 PM UTC-8, Yuri Kreaton wrote:
> On 12/4/2016 10:05 PM, Ross A. Finlayson wrote:
> > On Sunday, December 4, 2016 at 6:57:19 PM UTC-8, David Melik wrote:
> >> On 12/01/2016 08:24 PM, Ross A. Finlayson wrote:
> >>>
>
> >>> I have an idea here to build a usenet server
> >>> only for sci.math and sci.logic. The idea is
> >>> to find archives of sci.math and sci.logic and
> >>> to populate a store of the articles in a more
> >>> or less enduring form (say, "on the cloud"),
> >>> then to offer some usual news server access
> >>> then to, say, 1 month 3 month 6 month retention,
> >>> and then some cumulative retention (with a goal
> >>> of unlimited retention of sci.math and sci.logic
> >>> articles). The idea would be to have basically
> >>> various names of servers then reflect those
> >>> retentions for various uses for a read-only
> >>> archival server and a read-only daily server
> >>> and a read-and-write posting server. I'm willing
> >>> to invest time and effort to write the necessary
> >>> software and gather existing archives and integrate
> >>> with existing usenet providers to put together these
> >>> things.
>
> >>> Then, where basically it's in part an exercise
> >>> in vanity, I've been cultivating some various
> >>> notions of how to generate some summaries or
> >>> reports of various post, articles, threads, and
> >>> authors, toward the specialization of the cultivation
> >>> of summary for reporting and research purposes.
> >>>
>
> >>> So, I wonder others' idea about such a thing and
> >>> how they might see it as a reasonably fruitful
> >>> thing, basically for the enjoyment and for the
> >>> most direct purposes of the authors of the posts.
> >>>
> >>>
> >>> I invite comment, as I have begun to carry this out.
>
> >> What about including all the sci.math.* and alt.math.*?
> >
> > It's a matter of scale and configuration.
> >
> > It should scale quite well enough, though at some point
> > it would involve some money. In rough terms, it looks
> > like storing 1MM messages is ~$25/month, and supporting
> > readers is a few cents a day but copying it would be
> > twenty or thirty dollars. (I can front that.)
>
> > I'm for it where it might be useful, where I hope to
> > establish an archive with the goal of indefinite retention,
> > and basically to present an archive and for my own
> > purposes to generate narratives and timelines.
>
> > The challenge will be to get copies of archives of these
> > newsgroups. Somebody out of news.admin.peering might
> > have some insight into who has the Dejanews CDs or what
> > there might be in the Internet Archive Usenet Archive,
> > then in terms of today's news servers which claim about
> > ten years retention. Basically I'm looking for twenty
> > plus years of retention.
>
> > Now, some development is underway, and in no real hurry.
> > Basically I'm looking at the runtimes and a software
> > library to be written, (i.e., interfaces for the components
> > above and local file-system versions for stack-on-a-box,
> > implementing a subset of NNTP, in a simple service runtime
> > that idles really low).
>
> > Then, as above, it's kind of a vanity project or author-centric,
> > about making it so that custom servers could be stood up with
> > whatever newsgroups you want with the articles filtered
> > however you'd so care, rendered variously.
> >
> >
>
> talk to the other news group server admins out there, read their web
> information, call the dudes or email them, some will hook you up free.
> you attach,and you get all sci.math they have and get updates, I think
> for free, suck, feed the linkup are called. they may also want to link
> up with you for redundancy
>
> not much sw to write either, look for newsgroups w server in the name
> for info, some were very good 5 years ago



I've been studying this a bit more.

I set up a Linux development environment
by installing Ubuntu on a stick PC, then
installing vim, gcc, java, mvn, and git. While
Ubuntu is a Debian distribution and Amazon
Linux (a designated target) is instead along
the lines of RedHat/Yellowdog (yum, formerly rpm,
instead of apt-get, for package management),
I'm pretty familiar with these tools.

Looking to the available components, basically
the algorithm is being designed with data
structures that can be local or remote. These
are usually that much more complicated than
just the local or just the remote, because
besides the routine or state machine there is
also the exception or error handling, and
queues everywhere for both throttling and
delay-retries (besides the usual inline
retries and circuit breakers). So, this is
along the lines of "this is an object/octet
store" (and AWS has an offering, "Elastic File
System", an NFS network file system that looks
quite a bit more economical than S3 for this
purpose), "this is a number allocator" (without
sequence.nextVal in an RDBMS; the requirements
allow some gaps in the sequence, here using some
DynamoDB table attribute's "atomic counter"),
then along the lines of "this is a queue" and
separately "I push to queues" and "I pop queues",
and about "queue this for right now" versus
"queue this for later". Then there are various
mappings, like id-to-number and number-to-id,
where again for no-drops / no-dupes / Murphy's
law the state of the mappings is basically
"forward-safe", and retries make the system
robust and "self-healing". Other mappings
include a removed/deleted bag; this basically
looks like a subset of a series or range of the
assigned numbers, of the all-table and each
group-table, where numbers are added as
attributes to the item for the series or range.

Octet Store
Queue
Mapping
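
As Java interfaces those three might look about like this, with
the local (in-memory or file) and remote (distributed) editions as
implementations behind them; the names and methods are illustrative
only, not settled:

import java.io.IOException;
import java.io.InputStream;
import java.util.Optional;

// Sketch only: the three abstractions; concrete classes would be the
// local editions or the remote (object store, queue, table) editions.
interface OctetStore {
    void write(String key, InputStream octets) throws IOException;
    Optional<InputStream> read(String key) throws IOException;
}

interface MessageQueue<T> {
    void push(T item);                    // "queue this for right now"
    void pushLater(T item, long delayMs); // "queue this for later"
    Optional<T> pop();
}

interface Mapping<K, V> {
    Optional<V> get(K key);
    // forward-safe: only ever sets an absent entry, true if it won
    boolean putIfAbsent(K key, V value);
}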

Then, as noted above, with Murphy's law any of the
edges of the flowgraph can break at any time, at
each request/response that defines the boundary
(and a barrier). So there is defined an abstract
generic exception, "TryableException", that has only
two subclasses, "Retryable" and "Nonretryable". The
various implementations of the data structures, in
the patterns of their use, variously throw these while
puking back the stack trace, which then drives inline
re-tries, delay re-tries, and fails. Here there's
usually a definition of "idempotence" for methods that
are re-tryable, besides exceptions that might go away
on their own. The idea is to build this into the
procedure, so that the correctness of the composition
of the steps of the flowgraph of the procedure is
established at compile time.
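
A sketch of that exception scheme, plus an inline-retry helper for
idempotent calls (a delay-retry would instead re-queue the work item):

// Sketch only: one abstract TryableException with exactly two
// subclasses, so every failure is already classified for the caller.
abstract class TryableException extends Exception {
    TryableException(String message, Throwable cause) { super(message, cause); }
}

class RetryableException extends TryableException {
    RetryableException(String message, Throwable cause) { super(message, cause); }
}

class NonretryableException extends TryableException {
    NonretryableException(String message, Throwable cause) { super(message, cause); }
}

final class Retries {
    interface IdempotentCall<T> { T call() throws TryableException; }

    // inline retries for idempotent calls; anything else propagates
    static <T> T withRetries(int attempts, IdempotentCall<T> op) throws TryableException {
        RetryableException last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return op.call();
            } catch (RetryableException e) {
                last = e;       // exception that might go away: try again inline
            }
        }
        throw last;             // exhausted: hand off to delay-retry or fail
    }
}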


Then, for the runtime, it will basically be some Java
container on the host or in a container, with a cheap,
simple watchdog/heartbeat that uses signals on
unix (POSIX) to keep the service/routine nodes
(which can fail) up, to bounce (restart) them with
signals, and to reasonably fail and alarm if the
child process of the watchdog/nanny is thrashing,
with maybe some timer update up to the watchdog/
heartbeat. Then this runner basically executes the
routine/workflow logic in the jar; besides that, a
mount of the NFS is the only admin on the box,
everything else being run up out of the
environment from the build artifact.
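
A sketch of such a nanny, restarting the child when it exits and
giving up with an alarm if it thrashes; the jar name and the
thresholds here are just placeholders:

// Sketch only: restart-on-exit with a simple thrash detector.
final class Nanny {
    public static void main(String[] args) throws Exception {
        final long thrashWindowMs = 60_000;
        final int maxRestartsInWindow = 5;
        long windowStart = System.currentTimeMillis();
        int restarts = 0;

        while (true) {
            Process child = new ProcessBuilder("java", "-jar", "nntp.jar")
                    .inheritIO()
                    .start();
            int exit = child.waitFor();         // child bounced or was signalled

            long now = System.currentTimeMillis();
            if (now - windowStart > thrashWindowMs) {
                windowStart = now;
                restarts = 0;
            }
            if (++restarts > maxRestartsInWindow) {
                System.err.println("child thrashing (exit " + exit + "), giving up");
                System.exit(1);                 // leave it to the alarm/operator
            }
        }
    }
}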

For the build artifact, it looks like I'd use Spring for
wiring a container and also configuration profiles and
maybe Spring AOP and this kind of thing, i.e., just
spring-core (toward avoiding "all" of spring-boot).

Then, with local (in-memory and file) and remote
(distributed) implementations, the design is
basically to the distributed components, making
those patterns abstract, then implementing the
usual local implementation as standard containers
and the usual remote implementation as transactions
and defined behavior built over the network.
Ross A. Finlayson
2016-12-09 21:46:49 UTC
On Tuesday, December 6, 2016 at 4:40:26 PM UTC-8, Ross A. Finlayson wrote:
> On Sunday, December 4, 2016 at 9:19:39 PM UTC-8, Yuri Kreaton wrote:
> > On 12/4/2016 10:05 PM, Ross A. Finlayson wrote:
> > > On Sunday, December 4, 2016 at 6:57:19 PM UTC-8, David Melik wrote:
> > >> On 12/01/2016 08:24 PM, Ross A. Finlayson wrote:
> > >>>
> >
> > >>> I have an idea here to build a usenet server
> > >>> only for sci.math and sci.logic. The idea is
> > >>> to find archives of sci.math and sci.logic and
> > >>> to populate a store of the articles in a more
> > >>> or less enduring form (say, "on the cloud"),
> > >>> then to offer some usual news server access
> > >>> then to, say, 1 month 3 month 6 month retention,
> > >>> and then some cumulative retention (with a goal
> > >>> of unlimited retention of sci.math and sci.logic
> > >>> articles). The idea would be to have basically
> > >>> various names of servers then reflect those
> > >>> retentions for various uses for a read-only
> > >>> archival server and a read-only daily server
> > >>> and a read-and-write posting server. I'm willing
> > >>> to invest time and effort to write the necessary
> > >>> software and gather existing archives and integrate
> > >>> with existing usenet providers to put together these
> > >>> things.
> >
> > >>> Then, where basically it's in part an exercise
> > >>> in vanity, I've been cultivating some various
> > >>> notions of how to generate some summaries or
> > >>> reports of various post, articles, threads, and
> > >>> authors, toward the specialization of the cultivation
> > >>> of summary for reporting and research purposes.
> > >>>
> >
> > >>> So, I wonder others' idea about such a thing and
> > >>> how they might see it as a reasonably fruitful
> > >>> thing, basically for the enjoyment and for the
> > >>> most direct purposes of the authors of the posts.
> > >>>
> > >>>
> > >>> I invite comment, as I have begun to carry this out.
> >
> > >> What about including all the sci.math.* and alt.math.*?
> > >
> > > It's a matter of scale and configuration.
> > >
> > > It should scale quite well enough, though at some point
> > > it would involve some money. In rough terms, it looks
> > > like storing 1MM messages is ~$25/month, and supporting
> > > readers is a few cents a day but copying it would be
> > > twenty or thirty dollars. (I can front that.)
> >
> > > I'm for it where it might be useful, where I hope to
> > > establish an archive with the goal of indefinite retention,
> > > and basically to present an archive and for my own
> > > purposes to generate narratives and timelines.
> >
> > > The challenge will be to get copies of archives of these
> > > newsgroups. Somebody out of news.admin.peering might
> > > have some insight into who has the Dejanews CDs or what
> > > there might be in the Internet Archive Usenet Archive,
> > > then in terms of today's news servers which claim about
> > > ten years retention. Basically I'm looking for twenty
> > > plus years of retention.
> >
> > > Now, some development is underway, and in no real hurry.
> > > Basically I'm looking at the runtimes and a software
> > > library to be written, (i.e., interfaces for the components
> > > above and local file-system versions for stack-on-a-box,
> > > implementing a subset of NNTP, in a simple service runtime
> > > that idles really low).
> >
> > > Then, as above, it's kind of a vanity project or author-centric,
> > > about making it so that custom servers could be stood up with
> > > whatever newsgroups you want with the articles filtered
> > > however you'd so care, rendered variously.
> > >
> > >
> >
> > talk to the other news group server admins out there, read their web
> > information, call the dudes or email them, some will hook you up free.
> > you attach,and you get all sci.math they have and get updates, I think
> > for free, suck, feed the linkup are called. they may also want to link
> > up with you for redundancy
> >
> > not much sw to write either, look for newsgroups w server in the name
> > for info, some were very good 5 years ago
>
>
>
> I've been studying this a bit more.
>
> I set up a linux development environment
> by installing ubuntu to a stick PC, then
> installing vim, gcc, java, mvn, git. While
> ubuntu is a debian distribution and Amazon
> Linux (a designated target) is instead along
> the lines of RedHat/Yellowdog (yum, was rpm,
> instead of apt-get, for component configuration),
> then I'm pretty familiar with these tools.
>
> Looking to the available components, basically
> the algorithm is being designed with data
> structures that can be local or remote. Then,
> these are usually that much more complicated
> than just the local or just the remote, and
> here also besides the routine or state machine
> also the exception or error handling and the
> having of the queues everywhere for both
> throttling and delay-retries (besides the
> usual inline re-tries and about circuit
> breaker). So, this is along the lines of
> "this is an object/octet store" (and AWS
> has an offering "Elastic File System" which
> is an NFS Networked File System that looks
> quite the bit more economical than S3 for
> this purpose), "this is a number allocator"
> (without sequence.nextVal in an RDBMS, the
> requirements allow some gaps in the sequence,
> here to use some DynamoDB table attribute's
> "atomic counter"), then along the lines of
> "this is a queue" and separately "I push to
> queues" and "I pop queues", and about "queue
> this for right now" and "queue this for later".
> Then, there's various mappings, like id to number
> and number to id, where again for no-drops / no-dupes
> / Murphy's-law that the state of the mappings is
> basically "forward-safe" and that retries make
> the system robust and "self-healing". Other mappings
> include a removed/deleted bag, this basically looks
> like a subset of a series or range of the assigned
> numbers, of the all-table and each group-table,
> basically numbers are added as attributes to the
> item for the series or range.
>
> Octet Store
> Queue
> Mapping
>
> Then, as noted above, with Murphy's law, any of the
> edges of the flowgraph can break at any time, about
> the request/response each that defines the boundary
> (and a barrier), there is basically defined an abstract
> generic exception "TryableException" that has only two
> subclasses, "Retryable" and "Nonretryable". Then, the
> various implementations of the data structures in the
> patterns of their use variously throw these in puking
> back the stack trace, then for inline re-tries, delay
> re-tries, and fails. Here there's usually a definition
> of "idempotence" for methods that are re-tryable besides
> exceptions that might go away. The idea is to build
> this into the procedure, so it's all built at compile-
> time the correctness of the composition of the steps
> of the flowgraph of the procedure.
>
>
> Then, for the runtime, basically it will be some Java
> container on the host or in a container, with basically
> a cheap simple watchdog/heartbeat that uses signals on
> unix (posix) to be keeping the service/routine nodes
> (that can fail) up, to bounce (restart) them with signals,
> and to reasonably fail and alarm if thrashing of the
> child process of the watchdog/nanny, with maybe some
> timer update up to the watchdog/heartbeat. Then basically
> this runner executes the routine/workflow logic in the jar,
> besides that then a mount of the NFS being the only admin
> on the box, everything else being run up out of the
> environment from the build artifact.
>
> The build artifact then looks that I'd use Spring for
> wiring a container and also configuration profiles and
> maybe Spring AOP and this kind of thing, i.e., just
> spring-core (toward avoiding "all" of spring-boot).
>
> Then, with local (in-memory and file) and remote
> (distributed) implementations, basically the
> design is to the distributed components, making
> abstract those patterns then implementing for the
> usual local implementation as standard containers
> and usual remote implementation as building transactions
> and defined behavior over the network.

Having been researching this a bit more, and
tapping at the code, I've written out most of
the commands and then a state machine of
the results, and, having analyzed the algorithm
of article ingestion and of group and session state,
have defined interfaces suitable for either local
or remote operation, with the notion that local
operation would be self-contained (with a quite
simple file backing) while remote operation would
be quite usually durable and horizontally scalable.

I've written up a message reader/writer interface
("Scanner" and "Printer") for non-blocking I/O,
implementing reading Commands and writing Results
via non-blocking I/O. This should allow connection
scaling, with threads for the accepter/closer and
reader/writer and an execution pool for the commands.
The Scanner and Printer use some BufferPool (basically
about 4*1024 or 4K buffers), with the idea that that's
pretty much all the I/O usage of RAM and is reasonably
efficient, and that if RAM is hogged it's simple enough
to self-throttle the reader against the writer to balance
out.
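
A rough sketch of the reader side of that, with one selector thread
reading into 4K buffers and handing the bytes to the scanner; the
CommandScanner here is a stand-in for the Scanner described above:

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Sketch only: single-threaded reader loop; command execution would
// happen on a separate pool, and a Printer would handle the writes.
final class ReaderLoop {
    interface CommandScanner { void feed(ByteBuffer data, SocketChannel replyTo); }

    static void run(int port, CommandScanner scanner) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(port));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        while (true) {
            selector.select();
            for (SelectionKey key : selector.selectedKeys()) {
                if (key.isAcceptable()) {
                    SocketChannel client = server.accept();
                    if (client == null) continue;
                    client.configureBlocking(false);
                    // one 4K buffer per connection, per the BufferPool idea
                    client.register(selector, SelectionKey.OP_READ,
                                    ByteBuffer.allocate(4 * 1024));
                } else if (key.isReadable()) {
                    SocketChannel client = (SocketChannel) key.channel();
                    ByteBuffer buffer = (ByteBuffer) key.attachment();
                    if (client.read(buffer) < 0) { key.cancel(); client.close(); continue; }
                    buffer.flip();
                    scanner.feed(buffer, client);   // scan what has arrived so far
                    buffer.compact();
                }
            }
            selector.selectedKeys().clear();
        }
    }
}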

About the runtime, basically the idea is to have it
installable as a "well-known service" for "socket
activation" via inetd or systemd. The runtime is
really rather lean and starts quickly, here on-demand,
so that it can be configured as "on-demand" or "long-running".
For some container without systemd or the equivalent,
it could have a rather lean nanny. There's some notion
of integrating a heartbeat or status around Main.main(),
then that it runs as "java -jar nntp.jar".

Where the remote backing store or article file system
is some network file system, it also seems that the
runtime would declare its dependency on that file system
resource with quite usual system configuration tools,
for a fault-tolerant and graceful box that reboots as activable.

It interests me that SMTP is quite similar to NNTP. With
an idea of an on-demand server, which is quite usual,
these service nodes run on the smallest cloud instances
(here the "t2.nano") and scale to traffic, with a very low
idle or simply the "on-demand" (then for "containerized").


About usenet then, I've been studying what it would mean to
be compliant, for example what to do with the "control" or
"junk" (sideband) groups, and otherwise what it would mean
and take to make a horizontally scalable, elastic cloud
usenet server (and persistent store). This is where the
service node is quite lean while the file store and database
(here of horizontally scalable "tables") are basically unbounded.
Ross A. Finlayson
2016-12-12 00:19:22 UTC
On Friday, December 9, 2016 at 1:46:54 PM UTC-8, Ross A. Finlayson wrote:
> On Tuesday, December 6, 2016 at 4:40:26 PM UTC-8, Ross A. Finlayson wrote:
> > On Sunday, December 4, 2016 at 9:19:39 PM UTC-8, Yuri Kreaton wrote:
> > > On 12/4/2016 10:05 PM, Ross A. Finlayson wrote:
> > > > On Sunday, December 4, 2016 at 6:57:19 PM UTC-8, David Melik wrote:
> > > >> On 12/01/2016 08:24 PM, Ross A. Finlayson wrote:
> > > >>>
> > >
> > > >>> I have an idea here to build a usenet server
> > > >>> only for sci.math and sci.logic. The idea is
> > > >>> to find archives of sci.math and sci.logic and
> > > >>> to populate a store of the articles in a more
> > > >>> or less enduring form (say, "on the cloud"),
> > > >>> then to offer some usual news server access
> > > >>> then to, say, 1 month 3 month 6 month retention,
> > > >>> and then some cumulative retention (with a goal
> > > >>> of unlimited retention of sci.math and sci.logic
> > > >>> articles). The idea would be to have basically
> > > >>> various names of servers then reflect those
> > > >>> retentions for various uses for a read-only
> > > >>> archival server and a read-only daily server
> > > >>> and a read-and-write posting server. I'm willing
> > > >>> to invest time and effort to write the necessary
> > > >>> software and gather existing archives and integrate
> > > >>> with existing usenet providers to put together these
> > > >>> things.
> > >
> > > >>> Then, where basically it's in part an exercise
> > > >>> in vanity, I've been cultivating some various
> > > >>> notions of how to generate some summaries or
> > > >>> reports of various post, articles, threads, and
> > > >>> authors, toward the specialization of the cultivation
> > > >>> of summary for reporting and research purposes.
> > > >>>
> > >
> > > >>> So, I wonder others' idea about such a thing and
> > > >>> how they might see it as a reasonably fruitful
> > > >>> thing, basically for the enjoyment and for the
> > > >>> most direct purposes of the authors of the posts.
> > > >>>
> > > >>>
> > > >>> I invite comment, as I have begun to carry this out.
> > >
> > > >> What about including all the sci.math.* and alt.math.*?
> > > >
> > > > It's a matter of scale and configuration.
> > > >
> > > > It should scale quite well enough, though at some point
> > > > it would involve some money. In rough terms, it looks
> > > > like storing 1MM messages is ~$25/month, and supporting
> > > > readers is a few cents a day but copying it would be
> > > > twenty or thirty dollars. (I can front that.)
> > >
> > > > I'm for it where it might be useful, where I hope to
> > > > establish an archive with the goal of indefinite retention,
> > > > and basically to present an archive and for my own
> > > > purposes to generate narratives and timelines.
> > >
> > > > The challenge will be to get copies of archives of these
> > > > newsgroups. Somebody out of news.admin.peering might
> > > > have some insight into who has the Dejanews CDs or what
> > > > there might be in the Internet Archive Usenet Archive,
> > > > then in terms of today's news servers which claim about
> > > > ten years retention. Basically I'm looking for twenty
> > > > plus years of retention.
> > >
> > > > Now, some development is underway, and in no real hurry.
> > > > Basically I'm looking at the runtimes and a software
> > > > library to be written, (i.e., interfaces for the components
> > > > above and local file-system versions for stack-on-a-box,
> > > > implementing a subset of NNTP, in a simple service runtime
> > > > that idles really low).
> > >
> > > > Then, as above, it's kind of a vanity project or author-centric,
> > > > about making it so that custom servers could be stood up with
> > > > whatever newsgroups you want with the articles filtered
> > > > however you'd so care, rendered variously.
> > > >
> > > >
> > >
> > > talk to the other news group server admins out there, read their web
> > > information, call the dudes or email them, some will hook you up free.
> > > you attach,and you get all sci.math they have and get updates, I think
> > > for free, suck, feed the linkup are called. they may also want to link
> > > up with you for redundancy
> > >
> > > not much sw to write either, look for newsgroups w server in the name
> > > for info, some were very good 5 years ago
> >
> >
> >
> > I've been studying this a bit more.
> >
> > I set up a linux development environment
> > by installing ubuntu to a stick PC, then
> > installing vim, gcc, java, mvn, git. While
> > ubuntu is a debian distribution and Amazon
> > Linux (a designated target) is instead along
> > the lines of RedHat/Yellowdog (yum, was rpm,
> > instead of apt-get, for component configuration),
> > then I'm pretty familiar with these tools.
> >
> > Looking to the available components, basically
> > the algorithm is being designed with data
> > structures that can be local or remote. Then,
> > these are usually that much more complicated
> > than just the local or just the remote, and
> > here also besides the routine or state machine
> > also the exception or error handling and the
> > having of the queues everywhere for both
> > throttling and delay-retries (besides the
> > usual inline re-tries and about circuit
> > breaker). So, this is along the lines of
> > "this is an object/octet store" (and AWS
> > has an offering "Elastic File System" which
> > is an NFS Networked File System that looks
> > quite the bit more economical than S3 for
> > this purpose), "this is a number allocator"
> > (without sequence.nextVal in an RDBMS, the
> > requirements allow some gaps in the sequence,
> > here to use some DynamoDB table attribute's
> > "atomic counter"), then along the lines of
> > "this is a queue" and separately "I push to
> > queues" and "I pop queues", and about "queue
> > this for right now" and "queue this for later".
> > Then, there's various mappings, like id to number
> > and number to id, where again for no-drops / no-dupes
> > / Murphy's-law that the state of the mappings is
> > basically "forward-safe" and that retries make
> > the system robust and "self-healing". Other mappings
> > include a removed/deleted bag, this basically looks
> > like a subset of a series or range of the assigned
> > numbers, of the all-table and each group-table,
> > basically numbers are added as attributes to the
> > item for the series or range.
> >
> > Octet Store
> > Queue
> > Mapping
> >
> > Then, as noted above, with Murphy's law, any of the
> > edges of the flowgraph can break at any time, about
> > the request/response each that defines the boundary
> > (and a barrier), there is basically defined an abstract
> > generic exception "TryableException" that has only two
> > subclasses, "Retryable" and "Nonretryable". Then, the
> > various implementations of the data structures in the
> > patterns of their use variously throw these in puking
> > back the stack trace, then for inline re-tries, delay
> > re-tries, and fails. Here there's usually a definition
> > of "idempotence" for methods that are re-tryable besides
> > exceptions that might go away. The idea is to build
> > this into the procedure, so it's all built at compile-
> > time the correctness of the composition of the steps
> > of the flowgraph of the procedure.
> >
> >
> > Then, for the runtime, basically it will be some Java
> > container on the host or in a container, with basically
> > a cheap simple watchdog/heartbeat that uses signals on
> > unix (posix) to be keeping the service/routine nodes
> > (that can fail) up, to bounce (restart) them with signals,
> > and to reasonably fail and alarm if thrashing of the
> > child process of the watchdog/nanny, with maybe some
> > timer update up to the watchdog/heartbeat. Then basically
> > this runner executes the routine/workflow logic in the jar,
> > besides that then a mount of the NFS being the only admin
> > on the box, everything else being run up out of the
> > environment from the build artifact.
> >
> > The build artifact then looks that I'd use Spring for
> > wiring a container and also configuration profiles and
> > maybe Spring AOP and this kind of thing, i.e., just
> > spring-core (toward avoiding "all" of spring-boot).
> >
> > Then, with local (in-memory and file) and remote
> > (distributed) implementations, basically the
> > design is to the distributed components, making
> > abstract those patterns then implementing for the
> > usual local implementation as standard containers
> > and usual remote implementation as building transactions
> > and defined behavior over the network.
>
> Having been researching this a bit more, and
> tapping at the code, I've written out most of
> the commands then to build a state machine of
> the results, and, having analyze the algorithm
> of article ingestion and group and session state,
> have defined interfaces suitable either for local
> or remote operation, with the notion that local
> operation would be self-contained (with a quite
> simple file backing) while remote operation would
> be quite usually durable and horizontally scalable.
>
> I've written up a message reader/writer interface
> or ("Scanner" and "Printer") for non-blocking I/O
> and implementing reading Commands and writing Results
> via non-blocking I/O. This should allow connection
> scaling, with threads on accepter/closer and reader/
> writer and an execution pool for the commands. The
> Scanner and Printer use some BufferPool (basically
> abut 4*1024 or 4K buffers), with an idea that that's
> pretty much all the I/O usage of RAM and is reasonably
> efficient, and that if RAM is hogged it's simple enough
> to self-throttle the reader for the writer to balance
> out.
>
> About the runtime, basically the idea is to have it
> installable as a "well-known service" for "socket
> activation" as via inetd or systemd. The runtime is
> really rather lean and starts quickly, here on-demand,
> that it can be configured as "on-demand" or "long-running".
> For some container without systemd or the equivalent,
> it could have a rather lean nanny. There's some notion
> of integrating heartbeat or status about Main.main(),
> then that it runs as "java -jar nntp.jar".
>
> Where the remote backing store or article file system
> is some network file system, it also seems that the
> runtime would so configure dependency on its file system
> resource with quite usual system configuration tools,
> for a fault-tolerant and graceful box that reboots as activable.
>
> It interests me that SMTP is quite similar to NNTP. With
> an idea of an on-demand server, which is quite rather usual,
> these service nodes run on the smallest cloud instances
> (here the "t2.nano") and scale to traffic, with a very low
> idle or simply the "on-demand" (then for "containerized").
>
>
> About usenet them I've been studying what it would mean to
> be compliant and example what to do with some "control" or
> "junk" (sideband) groups and otherwise what it would mean
> and take to make a horizontally scalable elastic cloud
> usenet server (and persistent store). This is where the
> service node is quite lean, the file store and database
> (here of horizontally scalable "tables") is basically unbounded.

I've collected what RFCs or specs there are for usenet and,
having surveyed most of the specified use cases, have
cataloged descriptions of the commands such that each command
has a self-contained description within the protocol. Then,
where there is the protocol and perhaps any exchange or change
of the protocol, for example for TLS, that is also being worked
into the state machine of sorts (simply enough, a loop over the
input buffer that generates command values from the input given
the command descriptions), so that as commands are generated
(and maintained in their order) the results (computed, say, in
parallel) are returned back in that order.
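
A sketch of what those self-contained command descriptions could
look like as a table driving the scan loop; the particular fields
(argument bounds, whether a dot-terminated body follows) are
assumptions for illustration:

import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: each command described by name, argument bounds, and
// whether a multi-line, dot-terminated body is part of the exchange.
final class CommandDescription {
    final String name;
    final int minArgs, maxArgs;
    final boolean multiLineBodyFollows;   // e.g. POST/IHAVE involve a dot-terminated body

    CommandDescription(String name, int minArgs, int maxArgs, boolean body) {
        this.name = name; this.minArgs = minArgs; this.maxArgs = maxArgs;
        this.multiLineBodyFollows = body;
    }

    static final Map<String, CommandDescription> TABLE = new LinkedHashMap<>();
    static {
        TABLE.put("GROUP",   new CommandDescription("GROUP", 1, 1, false));
        TABLE.put("ARTICLE", new CommandDescription("ARTICLE", 0, 1, false));
        TABLE.put("POST",    new CommandDescription("POST", 0, 0, true));
        TABLE.put("IHAVE",   new CommandDescription("IHAVE", 1, 1, true));
        // ... the rest of the catalog, one self-contained entry per command
    }

    // the scan loop peels one CRLF-terminated line and looks it up here
    static CommandDescription forLine(String line) {
        String verb = line.split(" ", 2)[0].toUpperCase();
        return TABLE.get(verb);   // null: unknown command, answer 500
    }
}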

Then, within the protocol, encryption and compression are
established within the protocol itself instead of, for example,
externally to the protocol. So, there is basically a filter
between the I/O reader and I/O writer and the scanner and the
printer, as it were, that scans input data into commands and
writes command results to output data. This is again with the
"non-blocking I/O", where for the blocks or buffers I've
basically settled on 4 kibibyte (4 KiB) buffers. Otherwise an
entire input or output in the protocol (here a message body, or
perhaps a list of up to all the article numbers) would be
buffered in RAM, so I'm looking to spool that off to disk if it
turns out that essentially unbounded inputs and outputs are to be
handled gracefully in the limited CPU, RAM, I/O, and disk resources
of the usually quite reliable but formally unreliable computing node
(and at cost).

The data structures for access and persistence evolve as the in-memory
and file-based local editions and the networked or cloud remote editions.
The semantics are built out to the remote editions, and can then be
pared down, in the difference, for the efficiencies of the local editions.
The in-memory structures (with the article bodies themselves still
actually written to a file store) are quite efficient and bounded
by RAM or the heap. The file-based structures, which make use of
memory-mapped files (which, as you may well know, let "free" RAM
cache the disk files), may be mostly persistent with a structure
that can be bounded by disk size. Then the remote, network-based
structures here have a usual expectation of being highly reliable
(i.e., the remote files, queues, and records have a higher reliability
than any given component in their distributed design, at the corresponding
cost in efficiency and direct performance, but of course this is design
for correctness).

So, that said, I'm tapping away at the implementation of a queue of
byte buffers, or the I/O RAM convention. Basically, there is some I/O,
and it may or may not be a complete datum or event in the protocol, which
is 1-client-1-server, or a stateful protocol. What is read off the
I/O buffer (so the I/O controller can service that and other I/O lines)
is copied to a byte buffer. Then this is filtered as above as
necessary, and copied to a list of byte buffers (a double-ended
queue or linked list). These buffers maintain their current position
and limit, from their beginning; the "buffer" is these pointers and the
data itself. So that's their concrete type already. Then the scanner
or printer also maintains its scan or print position, so that the buffer
can be filled and hold some data; as the scan pointer moves past
a buffer boundary, that buffer can be reclaimed, with the scan
pointer moving only when a complete datum is read (here as defined for
the scanner, in small constant terms, by the command descriptions as above).
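
A minimal sketch of that queue-of-buffers convention, with the
scanner probing ahead and only consuming (and reclaiming buffers)
once a complete CRLF-terminated datum is present:

import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch only: filled 4K buffers are appended at the tail, the scanner
// consumes from the head, and a head buffer is reclaimed once passed.
final class BufferQueue {
    private final Deque<ByteBuffer> buffers = new ArrayDeque<>();

    // called by the I/O side with a filled (flipped) buffer
    void append(ByteBuffer filled) {
        buffers.addLast(filled);
    }

    // called by the scanner; returns one line if a complete datum is queued, else null
    String nextLine() {
        StringBuilder line = new StringBuilder();
        for (ByteBuffer buffer : buffers) {
            ByteBuffer view = buffer.duplicate();      // probe without moving positions
            while (view.hasRemaining()) {
                char c = (char) (view.get() & 0xff);
                line.append(c);
                if (c == '\n') {
                    consume(line.length());            // datum complete: now move for real
                    return line.toString();
                }
            }
        }
        return null;                                   // incomplete: leave positions alone
    }

    private void consume(int count) {
        while (count > 0) {
            ByteBuffer head = buffers.peekFirst();
            int take = Math.min(count, head.remaining());
            head.position(head.position() + take);
            count -= take;
            if (!head.hasRemaining()) {
                buffers.removeFirst();                 // reclaim buffer behind the scan pointer
            }
        }
    }
}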

So, that is pretty much sorted out, and with that it should
ingest articles just fine and be a mostly compliant NNTP server.

Then, generating the overview and such is another bit to get figured out,
which is the summary side.

Another thing in this design to get figured out is how to implement the
queue and database action for the remote case, where the cost efficiency
of the (managed, durable, redundant) remote database is in having a more-
or-less constant (and small) rate of reads and writes. The distributed
queue will hold the backlog, but the queue consumer is to be constant-
rate not for the node but for the fleet, so I'm looking at how to implement
some leader election (for fault-tolerance) or otherwise to have loaner
threads of the runtime for any service of the queue. This is where
ingestion is de-coupled from inbox, so there's an idea of having a
sentinel queue consumer (because this data might be high volume or low
or zero) on a publish/subscribe: it listens to the queue and if it gets
an item it refuses it and wakes up the constant-rate (or spiking) queue
consumer workers, which then proceed with the workflow items and retire
themselves if and when traffic drops to zero again, standing the sentinel
consumer back up.
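
A sketch of that sentinel arrangement, with hypothetical stand-ins
for the notification and the worker fleet (nothing here is a real
queue or pub/sub API):

// Sketch only: the sentinel is a single cheap listener; the item that
// wakes it stays on (is refused back to) the queue for the workers.
interface WorkerFleet {
    void wake();                // start/resume the constant-rate, fleet-wide consumers
    void standUpSentinel();     // re-arm the cheap listener when traffic is zero
}

final class SentinelConsumer {
    private final WorkerFleet fleet;

    SentinelConsumer(WorkerFleet fleet) { this.fleet = fleet; }

    // invoked by the publish/subscribe notification when an item arrives
    void onNotify() {
        fleet.wake();           // workers then drain the backlog at their metered rate
    }

    // called as the last worker retires, once traffic drops to zero
    void onFleetRetired() {
        fleet.standUpSentinel();
    }
}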


Anyways, that's just about how to handle variable load, but here
it's OK for the protocol to separate ingestion and inbox; otherwise,
establishing the completion of the workflow item from the initial request
involves the usual asynchronous completion considerations.


So, that said, the design is seeming pretty flexible, then, about
what extension commands might be suitable. Here the idea is about article
transfer and which articles to transfer to other servers. The idea is to
add some X-RETRANSFER-TO command, or along these lines,

X-RETRANSFER-TO host [group [dateBegin [dateEnd]]]

then that this simply has the host open a connection to the other host
and offer via IHAVE/CHECK/TAKETHIS all the articles in the range,
or until the connection is closed. This way then, for example, if this
NNTP system was running, and someone wanted a subset of the articles,
then this command would have them sent out-of-band, or, "automatic out-feed".
Figuring out how to re-distribute, or message routing besides simple
message store and retrieval, is its own problem.
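
An illustrative exchange for that hypothetical extension (the
response code is only a placeholder, nothing standardized):

  [client] X-RETRANSFER-TO news.example.org sci.logic 20100101 20101231
  [server] 290 retransfer to news.example.org queued
  ... this server then opens its own connection to news.example.org
      and offers the matching articles via IHAVE/CHECK/TAKETHIS ...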

Another issue is expiry: I don't really intend to delete anything, because
the purpose is archival, but people still use usenet in some corners of
the internet for daily news; again, that's its own problem. Handling
out-of-order ingestion, with the backfilling of archives as they can be
discovered, is another issue, that basically being about filling a
corpus of the messages, then trying to organize them so that the message
date is effectively the original injection date.


Anyways, it proceeds along these lines.
Ross A. Finlayson
2016-12-13 08:05:02 UTC
On Sunday, December 11, 2016 at 4:19:27 PM UTC-8, Ross A. Finlayson wrote:
> On Friday, December 9, 2016 at 1:46:54 PM UTC-8, Ross A. Finlayson wrote:
> > On Tuesday, December 6, 2016 at 4:40:26 PM UTC-8, Ross A. Finlayson wrote:
> > > On Sunday, December 4, 2016 at 9:19:39 PM UTC-8, Yuri Kreaton wrote:
> > > > On 12/4/2016 10:05 PM, Ross A. Finlayson wrote:
> > > > > On Sunday, December 4, 2016 at 6:57:19 PM UTC-8, David Melik wrote:
> > > > >> On 12/01/2016 08:24 PM, Ross A. Finlayson wrote:
> > > > >>>
> > > >
> > > > >>> I have an idea here to build a usenet server
> > > > >>> only for sci.math and sci.logic. The idea is
> > > > >>> to find archives of sci.math and sci.logic and
> > > > >>> to populate a store of the articles in a more
> > > > >>> or less enduring form (say, "on the cloud"),
> > > > >>> then to offer some usual news server access
> > > > >>> then to, say, 1 month 3 month 6 month retention,
> > > > >>> and then some cumulative retention (with a goal
> > > > of unlimited retention of sci.math and sci.logic
> > > > >>> articles). The idea would be to have basically
> > > > >>> various names of servers then reflect those
> > > > >>> retentions for various uses for a read-only
> > > > >>> archival server and a read-only daily server
> > > > >>> and a read-and-write posting server. I'm willing
> > > > >>> to invest time and effort to write the necessary
> > > > >>> software and gather existing archives and integrate
> > > > >>> with existing usenet providers to put together these
> > > > >>> things.
> > > >
> > > > >>> Then, where basically it's in part an exercise
> > > > >>> in vanity, I've been cultivating some various
> > > > >>> notions of how to generate some summaries or
> > > > >>> reports of various post, articles, threads, and
> > > > >>> authors, toward the specialization of the cultivation
> > > > >>> of summary for reporting and research purposes.
> > > > >>>
> > > >
> > > > >>> So, I wonder others' idea about such a thing and
> > > > >>> how they might see it as a reasonably fruitful
> > > > >>> thing, basically for the enjoyment and for the
> > > > >>> most direct purposes of the authors of the posts.
> > > > >>>
> > > > >>>
> > > > >>> I invite comment, as I have begun to carry this out.
> > > >
> > > > >> What about including all the sci.math.* and alt.math.*?
> > > > >
> > > > > It's a matter of scale and configuration.
> > > > >
> > > > > It should scale quite well enough, though at some point
> > > > > it would involve some money. In rough terms, it looks
> > > > > like storing 1MM messages is ~$25/month, and supporting
> > > > > readers is a few cents a day but copying it would be
> > > > > twenty or thirty dollars. (I can front that.)
> > > >
> > > > > I'm for it where it might be useful, where I hope to
> > > > > establish an archive with the goal of indefinite retention,
> > > > > and basically to present an archive and for my own
> > > > > purposes to generate narratives and timelines.
> > > >
> > > > > The challenge will be to get copies of archives of these
> > > > > newsgroups. Somebody out of news.admin.peering might
> > > > > have some insight into who has the Dejanews CDs or what
> > > > > there might be in the Internet Archive Usenet Archive,
> > > > > then in terms of today's news servers which claim about
> > > > > ten years retention. Basically I'm looking for twenty
> > > > > plus years of retention.
> > > >
> > > > > Now, some development is underway, and in no real hurry.
> > > > > Basically I'm looking at the runtimes and a software
> > > > > library to be written, (i.e., interfaces for the components
> > > > > above and local file-system versions for stack-on-a-box,
> > > > > implementing a subset of NNTP, in a simple service runtime
> > > > > that idles really low).
> > > >
> > > > > Then, as above, it's kind of a vanity project or author-centric,
> > > > > about making it so that custom servers could be stood up with
> > > > > whatever newsgroups you want with the articles filtered
> > > > > however you'd so care, rendered variously.
> > > > >
> > > > >
> > > >
> > > > talk to the other news group server admins out there, read their web
> > > > information, call the dudes or email them, some will hook you up free.
> > > > you attach,and you get all sci.math they have and get updates, I think
> > > > for free, suck, feed the linkup are called. they may also want to link
> > > > up with you for redundancy
> > > >
> > > > not much sw to write either, look for newsgroups w server in the name
> > > > for info, some were very good 5 years ago
> > >
> > >
> > >
> > > I've been studying this a bit more.
> > >
> > > I set up a linux development environment
> > > by installing ubuntu to a stick PC, then
> > > installing vim, gcc, java, mvn, git. While
> > > ubuntu is a debian distribution and Amazon
> > > Linux (a designated target) is instead along
> > > the lines of RedHat/Yellowdog (yum, was rpm,
> > > instead of apt-get, for component configuration),
> > > then I'm pretty familiar with these tools.
> > >
> > > Looking to the available components, basically
> > > the algorithm is being designed with data
> > > structures that can be local or remote. Then,
> > > these are usually that much more complicated
> > > than just the local or just the remote, and
> > > here also besides the routine or state machine
> > > also the exception or error handling and the
> > > having of the queues everywhere for both
> > > throttling and delay-retries (besides the
> > > usual inline re-tries and about circuit
> > > breaker). So, this is along the lines of
> > > "this is an object/octet store" (and AWS
> > > has an offering "Elastic File System" which
> > > is an NFS Networked File System that looks
> > > quite the bit more economical than S3 for
> > > this purpose), "this is a number allocator"
> > > (without sequence.nextVal in an RDBMS, the
> > > requirements allow some gaps in the sequence,
> > > here to use some DynamoDB table attribute's
> > > "atomic counter"), then along the lines of
> > > "this is a queue" and separately "I push to
> > > queues" and "I pop queues", and about "queue
> > > this for right now" and "queue this for later".
> > > Then, there's various mappings, like id to number
> > > and number to id, where again for no-drops / no-dupes
> > > / Murphy's-law that the state of the mappings is
> > > basically "forward-safe" and that retries make
> > > the system robust and "self-healing". Other mappings
> > > include a removed/deleted bag, this basically looks
> > > like a subset of a series or range of the assigned
> > > numbers, of the all-table and each group-table,
> > > basically numbers are added as attributes to the
> > > item for the series or range.
> > >
> > > Octet Store
> > > Queue
> > > Mapping
> > >
> > > Then, as noted above, with Murphy's law, any of the
> > > edges of the flowgraph can break at any time, about
> > > the request/response each that defines the boundary
> > > (and a barrier), there is basically defined an abstract
> > > generic exception "TryableException" that has only two
> > > subclasses, "Retryable" and "Nonretryable". Then, the
> > > various implementations of the data structures in the
> > > patterns of their use variously throw these in puking
> > > back the stack trace, then for inline re-tries, delay
> > > re-tries, and fails. Here there's usually a definition
> > > of "idempotence" for methods that are re-tryable besides
> > > exceptions that might go away. The idea is to build
> > > this into the procedure, so it's all built at compile-
> > > time the correctness of the composition of the steps
> > > of the flowgraph of the procedure.
> > >
> > >
> > > Then, for the runtime, basically it will be some Java
> > > container on the host or in a container, with basically
> > > a cheap simple watchdog/heartbeat that uses signals on
> > > unix (posix) to be keeping the service/routine nodes
> > > (that can fail) up, to bounce (restart) them with signals,
> > > and to reasonably fail and alarm if thrashing of the
> > > child process of the watchdog/nanny, with maybe some
> > > timer update up to the watchdog/heartbeat. Then basically
> > > this runner executes the routine/workflow logic in the jar,
> > > besides that then a mount of the NFS being the only admin
> > > on the box, everything else being run up out of the
> > > environment from the build artifact.
> > >
> > > The build artifact then looks that I'd use Spring for
> > > wiring a container and also configuration profiles and
> > > maybe Spring AOP and this kind of thing, i.e., just
> > > spring-core (toward avoiding "all" of spring-boot).
> > >
> > > Then, with local (in-memory and file) and remote
> > > (distributed) implementations, basically the
> > > design is to the distributed components, making
> > > abstract those patterns then implementing for the
> > > usual local implementation as standard containers
> > > and usual remote implementation as building transactions
> > > and defined behavior over the network.
> >
> > Having been researching this a bit more, and
> > tapping at the code, I've written out most of
> > the commands then to build a state machine of
> > the results, and, having analyze the algorithm
> > of article ingestion and group and session state,
> > have defined interfaces suitable either for local
> > or remote operation, with the notion that local
> > operation would be self-contained (with a quite
> > simple file backing) while remote operation would
> > be quite usually durable and horizontally scalable.
> >
> > I've written up a message reader/writer interface
> > or ("Scanner" and "Printer") for non-blocking I/O
> > and implementing reading Commands and writing Results
> > via non-blocking I/O. This should allow connection
> > scaling, with threads on accepter/closer and reader/
> > writer and an execution pool for the commands. The
> > Scanner and Printer use some BufferPool (basically
> > abut 4*1024 or 4K buffers), with an idea that that's
> > pretty much all the I/O usage of RAM and is reasonably
> > efficient, and that if RAM is hogged it's simple enough
> > to self-throttle the reader for the writer to balance
> > out.
> >
> > About the runtime, basically the idea is to have it
> > installable as a "well-known service" for "socket
> > activation" as via inetd or systemd. The runtime is
> > really rather lean and starts quickly, here on-demand,
> > that it can be configured as "on-demand" or "long-running".
> > For some container without systemd or the equivalent,
> > it could have a rather lean nanny. There's some notion
> > of integrating heartbeat or status about Main.main(),
> > then that it runs as "java -jar nntp.jar".
> >
> > Where the remote backing store or article file system
> > is some network file system, it also seems that the
> > runtime would so configure dependency on its file system
> > resource with quite usual system configuration tools,
> > for a fault-tolerant and graceful box that reboots as activable.
> >
> > It interests me that SMTP is quite similar to NNTP. With
> > an idea of an on-demand server, which is quite rather usual,
> > these service nodes run on the smallest cloud instances
> > (here the "t2.nano") and scale to traffic, with a very low
> > idle or simply the "on-demand" (then for "containerized").
> >
> >
> > About usenet them I've been studying what it would mean to
> > be compliant and example what to do with some "control" or
> > "junk" (sideband) groups and otherwise what it would mean
> > and take to make a horizontally scalable elastic cloud
> > usenet server (and persistent store). This is where the
> > service node is quite lean, the file store and database
> > (here of horizontally scalable "tables") is basically unbounded.
>
> I've collected what RFC's or specs there are for usenet,
> then having surveyed the most of the specified use cases,
> have cataloged descriptions of the commands about the protocol
> that they are self-contained descriptions within the protocol
> of each command. Then, for where there is the protocol and
> perhaps any exchange or change of the protocol, for example
> for TLS, then that is also being worked into the state machine
> of sorts (simply enough a loop over the input buffer to generate
> command values from the input given the command descriptions),
> for that then as commands are generated (and maintained in their
> order) that the results (eg, in the parallel) are thus computed
> and returned (again back in the order).
>
> Then, within the protocol, and basically for encryption and
> compression, these are established within the protocol instead
> of, for example, externally to the protocol. So, there is
> basically a filter between the I/O reader and I/O writer and
> the scanner and the printer, as it were, that scans input data
> to commands and writes command results to output data. This is
> again with the "non-blocking I/O" then about that the blocks or
> buffers I've basically settled to 4 kibibyte (4KB) buffers, where,
> basically an entire input or output in the protocol (here a message
> body or perhaps a list of up to all the article numbers) would be
> buffered (in RAM), so I'm looking to spool that off to disk if it
> so results that essentially unbounded inputs and outputs are to be
> handled gracefully in the limited CPU, RAM, I/O, and disk resources
> of the usually quite reliable but formally unreliable computing node
> (and at cost).
>
> The data structures for access and persistence evolve as the in-memory
> and file-based local editions and networked or cloud remote editions.
> The semantics are built out to the remote editions, as then they can be
> erased in the difference for efficiencies of the local editions.
> The in-memory structures (with the article bodies themselves yet
> actually written to a file store) are quite efficient and bounded
> by RAM or the heap, the file-based structures which makes use of the
> memory-mapped files as you may well know comprise all the content of
> "free" RAM caching the disk files may be mostly persistent with
> a structure that can be bounded by disk size, then the remote network-
> based structures here have a usual expectation of being highly reliable
> (i.e., that the remote files, queues, and records have a higher reliability
> than any given component in their distributed design, at the corresponding
> cost in efficiency and direct performance, but of course, this is design
> for correctness).
>
> So, that said, then I'm tapping away at the implementation of a queue of
> byte buffers, or the I/O RAM convention. Basically, there is some I/O,
> and it may or may not be a complete datum or event in the protocol, which
> is 1-client-1-server or a stateful protocol. So, what is read off the
> I/O buffer, so the I/O controller can service that and other I/O lines,
> is copied to a byte buffer. Then, this is to be filtered as above as
> necessary, that it is copied to a list of byte buffers (a double ended
> queue or linked list). These buffers maintain their current position
> and limit, from their beginning, the "buffer" is these pointers and the
> data itself. So, that's their concrete type already, then the scanner
> or printer also maintains its scan or print position, that the buffer can
> be filled and holds some data, then that as the scan pointer moves past
> a buffer boundary, that buffer can be reclaimed, with only moving the
> scan pointer when a complete datum is read (here as defined for the scanner
> in small constant terms by the command descriptions as above).
>
> So, that is pretty much sorted out, then about that basically it should
> ingest articles just fine and be a mostly compliant NNTP server.
>
> Then, generating the overview and such is another bit to get figured out,
> which is summary.
>
> Another thing in this design to get figured out is how to implement the
> queue and database action for the remote, where, the cost efficiency of
> the (managed, durable, redundant) remote database, is on having a more-or-
> less constant (and small) rate of reads and writes. Then the distributed
> queue will hold the backlog, but, the queue consumer is to be constant
> rate not for the node but for the fleet, so I'm looking at how to implement
> some leader election (fault-tolerance) or otherwise to have loaner threads
> of the runtime for any service of the queue. This is where, ingestion is
> de-coupled from inbox, so, there's an idea of having a sentinel queue consumer
> (because this data might be high volume or low or zero) on a publish/subscribe,
> it listens to the queue and if it gets an item it refuses it and wakes up
> the constant-rate (or spiking) queue consumer workers, that then proceed
> with the workflow items and then retire themselves if and when traffic drops
> to zero again, standing back up the sentinel consumer.
>
>
> Anyways that's just about how to handle variable load but here there's
> that it's OK for the protocol to separate ingestion and inbox, otherwise
> establishing the completion of the workflow item from the initial request
> involves usual asynchronous completion considerations.
>
>
> So, that said, then, the design is seeming pretty flexible, then about,
> what extension commands might be suitable. Here the idea is about article
> transfer and which articles to transfer to other servers. The idea is to
> add some X-RETRANSFER-TO command or along these lines,
>
> X-RETRANSFER-TO host [group [dateBegin [dateEnd]]]
>
> then that this simply has the host open a connection to the other host
> and offer via IHAVE/CHECK/TAKETHIS all the articles so in the range
> or until the connection is closed. This way then, for example, if this
> NNTP system was running, and, someone wanted a subset of the articles,
> then this command would have them sent out-of-band, or, "automatic out-feed".
> Figuring out how to re-distribute or message routing besides simple
> message store and retrieval is its own problem.
>
> Another issue is expiry, I don't really intend to delete anything, because
> the purpose is archival, but people still use usenet in some corners of
> the internet for daily news, again that's its own problem. Handling
> out-of-order ingestion with the backfilling or archives as they can be
> discovered is another issue, with that basically being about filling a
> corpus of the messages, then trying to organize them that the message
> date is effectively the original injection date.
>
>
> Anyways, it proceeds along these lines.



One of the challenges of writing this kind of system
is vending the article-id's (or article numbers) for
each newsgroup of each message-id. The message-id is
received with the article as headers and body, or set
as part of the injection info when the article is posted.
So, vending a number means that a previous number is known
from which to give the next. Now, this is clear and simple
in a stand-alone environment, with integer increment or
"x = i++". It's not so simple in a distributed environment,
where the queuing system does not "absolutely guarantee"
no dupes (the priority being no drops), and also, the
independent workers A and B can't know the shared value of
x to make and take atomic increments without establishing
a synchronization barrier, here over the network, which is
to be avoided (eg, blocking and locking on a database's
critical transactional atomic sequence.nextval, with, say,
a higher guarantee of no gaps). So, there is a database
for vending strictly increasing numbers: each group of
an article has a current number, and there's an "atomic
increment" feature, thus that A working on A' will get
i+1 and B working on B' will get i+2 (or maybe i+3, if,
for example, the previous edition of B died). If A working
on A' and B working on A', duplicated from the queue, get
i+1 and i+2, then there is, as mentioned above, a conditional
update to make sure the article number always increases,
so there is a gap from the queue dupe or a gap from the
worker drop, but then A or B has a consistent view of the
article-id of A' or B'.
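
A sketch of that contract (this is a local, in-memory stand-in for the
remote table's atomic counter and conditional update; the class and
method names are illustrative only):

  import java.util.Map;
  import java.util.concurrent.ConcurrentHashMap;
  import java.util.concurrent.atomic.AtomicLong;

  // Strictly increasing per-group numbers, gaps allowed (a worker may die
  // after vending, or a queue dupe may vend twice for the same article).
  final class GroupCounters {
      private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();
      private final Map<String, AtomicLong> highWater = new ConcurrentHashMap<>();

      // Vend the next article number for a group ("atomic increment").
      long vend(String group) {
          return counters.computeIfAbsent(group, g -> new AtomicLong(0)).incrementAndGet();
      }

      // Conditional update of the group's high-water mark: it only ever
      // moves forward, so a late or duplicated worker can't roll it back.
      void advanceHighWater(String group, long articleNumber) {
          highWater.computeIfAbsent(group, g -> new AtomicLong(0))
                   .accumulateAndGet(articleNumber, Math::max);
      }
  }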

So, then with having the number, once that's established,
then all's well and good to associate the message-id, and
the article-id.

group: article-id -> message-id
message: groups -> article-ids

Then, looking at the performance, this logical association
is neatly maintainable in the DB tables, with consistent
views for A and B. But it's a limited resource: in this
implementation, there are actually only so many reads and
writes per period. So, workers can steadily chew away at the
intake queue, assigning numbers, but then querying for the
numbers also comes at a cost, which is primarily what the
reader connections do.

Then, the idea is to maintain the logical associations of
the message-id <-> article-id also in a growing file, a
write-once read-many file on the NFS file system. There's
no file locking, and writes to the file that are disordered
or contentious could (and by Murphy's law, would) write corrupt
entries to the file. There are various notions of leader election
or straw-pulling for exactly one of A or B to collect the numbers
in order and write them to the article-ids file, one "row" (or 64-
byte fixed-length record) per number, at the offset 64*number
(as from some 0, or the offset from the first number). But,
consensus and locking for serialization of tasks couples A and B,
which are otherwise running entirely independently. So, then,
the idea is to identify the next offset for the article-ids file,
and collect a batch of numbers to make a block-sized block of
the NFS implementation (eg 4KB or 8KB, and hopefully configurable,
not 1MB, which would be about 16K records of 64B each). So, as
A and B each collect the numbers (and detect now if there were gaps),
then either (or both) completes a segment to append to the
file. There aren't append modes of the NFS files, which is fine
because the block is actually written to the computed offset,
which is the same for A and B. In the off chance A and B both
make writes, file corruption doesn't follow because it's the
same content, it's block-sized, and it's at an absolute offset.
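
A sketch of that block write (illustrative names; it assumes the batch
starts on a record number whose offset is block-aligned):

  import java.io.IOException;
  import java.nio.ByteBuffer;
  import java.nio.channels.FileChannel;
  import java.nio.file.Path;
  import java.nio.file.StandardOpenOption;

  // Write path: a worker that has collected a full block of 64-byte
  // records writes it at the absolute offset computed from the first
  // article number in the batch; a duplicate writer computes the same
  // offset and writes the same bytes, so the write is harmless either way.
  final class ArticleIdFileWriter {
      static final int RECORD = 64;     // one message-id entry
      static final int BLOCK  = 4096;   // one write-size block

      static void writeBlock(Path articleIdsFile, long firstNumber, ByteBuffer block)
              throws IOException {
          if (block.remaining() != BLOCK) throw new IllegalArgumentException("need a full block");
          long offset = firstNumber * RECORD;   // assumed block-aligned (firstNumber % 64 == 0)
          try (FileChannel ch = FileChannel.open(articleIdsFile,
                  StandardOpenOption.WRITE, StandardOpenOption.CREATE)) {
              ch.write(block, offset);          // positional write, no append mode needed
          }
      }
  }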

So, in this way, it seems that over time the contents of the DB
are written out as the sequence, by article-id, of message-id for
each group

group: article-id -> message-id

besides that the message-id folder contains the article-ids

message-id: groups -> article-id

the content of which is known when the article-id numbers for
the groups of the message are vended.


Then, in the usual routine of looking up the message-id or
article-id given the group, the DB table is authoritative,
but the NFS file is also correct where a value exists.
(Also it's immutable or constant, and conveniently a file.)
So, readers can map the file into memory and consult the
offset in the file to find the message-id for the requested
article-id; if that's not found, then the DB table, where it
would surely be, as the message-id had vended an article-id
before the group's article-id range was set to include the
new article.
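
A sketch of that reader side (illustrative; it assumes 64-byte ASCII
records padded with zero bytes, and a file small enough for a single
mapping):

  import java.io.IOException;
  import java.nio.ByteBuffer;
  import java.nio.MappedByteBuffer;
  import java.nio.channels.FileChannel;
  import java.nio.charset.StandardCharsets;
  import java.nio.file.Path;
  import java.nio.file.StandardOpenOption;

  // Read path: map the group's article-ids file and read the 64-byte
  // record at article-id * 64; a blank record means "not written yet,
  // fall back to the DB table".
  final class ArticleIdFileReader {
      static final int RECORD = 64;
      private final MappedByteBuffer map;

      ArticleIdFileReader(Path articleIdsFile) throws IOException {
          try (FileChannel ch = FileChannel.open(articleIdsFile, StandardOpenOption.READ)) {
              // Single mapping, so this sketch assumes the file stays under 2GB.
              this.map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
          }
      }

      // Returns the message-id for the article number, or null to fall back to the DB.
      String lookup(long articleId) {
          long offset = articleId * RECORD;
          if (offset < 0 || offset + RECORD > map.limit()) return null;
          byte[] record = new byte[RECORD];
          ByteBuffer view = map.duplicate();
          view.position((int) offset);
          view.get(record);
          String id = new String(record, StandardCharsets.US_ASCII).trim();
          return id.isEmpty() ? null : id;
      }
  }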

When a range of the article numbers is passed, then effectively
the lookup will always be satisfied by the file lookup instead
of the DB table lookup, so there won't be the cost of the DB
table lookup. In the off chance the open files of the NFS
(also a limited resource, say 32K) are all exhausted, there's
still a DB table to read, which is a limited and expensive
resource, but also elastic and autoscalable.

Anyways, this design issue also has the benefit of keeping it
so that the file system has a convention whereby all the data
remains in the file system, with the usual convenience then for
backup and durability concerns, while still keeping it correct
and horizontally scalable, basically with the notion of then
even being able to truncate the database in any lull of traffic,
given that the entire state is consistent on the file system.

It remains to be figured out whether NFS is OK with writing duplicate
copies of a file block, toward having this highly reliable workflow
system.


That is basically the design issue then, I'm tapping away on this.
Ross A. Finlayson
2016-12-15 04:31:52 UTC
Reply
Permalink
On Tuesday, December 13, 2016 at 12:05:13 AM UTC-8, Ross A. Finlayson wrote:
> On Sunday, December 11, 2016 at 4:19:27 PM UTC-8, Ross A. Finlayson wrote:
> > On Friday, December 9, 2016 at 1:46:54 PM UTC-8, Ross A. Finlayson wrote:
> > > On Tuesday, December 6, 2016 at 4:40:26 PM UTC-8, Ross A. Finlayson wrote:
> > > > On Sunday, December 4, 2016 at 9:19:39 PM UTC-8, Yuri Kreaton wrote:
> > > > > On 12/4/2016 10:05 PM, Ross A. Finlayson wrote:
> > > > > > On Sunday, December 4, 2016 at 6:57:19 PM UTC-8, David Melik wrote:
> > > > > >> On 12/01/2016 08:24 PM, Ross A. Finlayson wrote:
> > > > > >>>
> > > > >
> > > > > >>> I have an idea here to build a usenet server
> > > > > >>> only for sci.math and sci.logic. The idea is
> > > > > >>> to find archives of sci.math and sci.logic and
> > > > > >>> to populate a store of the articles in a more
> > > > > >>> or less enduring form (say, "on the cloud"),
> > > > > >>> then to offer some usual news server access
> > > > > >>> then to, say, 1 month 3 month 6 month retention,
> > > > > >>> and then some cumulative retention (with a goal
> > > > > of unlimited retention of sci.math and sci.logic
> > > > > >>> articles). The idea would be to have basically
> > > > > >>> various names of servers then reflect those
> > > > > >>> retentions for various uses for a read-only
> > > > > >>> archival server and a read-only daily server
> > > > > >>> and a read-and-write posting server. I'm willing
> > > > > >>> to invest time and effort to write the necessary
> > > > > >>> software and gather existing archives and integrate
> > > > > >>> with existing usenet providers to put together these
> > > > > >>> things.
> > > > >
> > > > > >>> Then, where basically it's in part an exercise
> > > > > >>> in vanity, I've been cultivating some various
> > > > > >>> notions of how to generate some summaries or
> > > > > >>> reports of various post, articles, threads, and
> > > > > >>> authors, toward the specialization of the cultivation
> > > > > >>> of summary for reporting and research purposes.
> > > > > >>>
> > > > >
> > > > > >>> So, I wonder others' idea about such a thing and
> > > > > >>> how they might see it as a reasonably fruitful
> > > > > >>> thing, basically for the enjoyment and for the
> > > > > >>> most direct purposes of the authors of the posts.
> > > > > >>>
> > > > > >>>
> > > > > >>> I invite comment, as I have begun to carry this out.
> > > > >
> > > > > >> What about including all the sci.math.* and alt.math.*?
> > > > > >
> > > > > > It's a matter of scale and configuration.
> > > > > >
> > > > > > It should scale quite well enough, though at some point
> > > > > > it would involve some money. In rough terms, it looks
> > > > > > like storing 1MM messages is ~$25/month, and supporting
> > > > > > readers is a few cents a day but copying it would be
> > > > > > twenty or thirty dollars. (I can front that.)
> > > > >
> > > > > > I'm for it where it might be useful, where I hope to
> > > > > > establish an archive with the goal of indefinite retention,
> > > > > > and basically to present an archive and for my own
> > > > > > purposes to generate narratives and timelines.
> > > > >
> > > > > > The challenge will be to get copies of archives of these
> > > > > > newsgroups. Somebody out of news.admin.peering might
> > > > > > have some insight into who has the Dejanews CDs or what
> > > > > > there might be in the Internet Archive Usenet Archive,
> > > > > > then in terms of today's news servers which claim about
> > > > > > ten years retention. Basically I'm looking for twenty
> > > > > > plus years of retention.
> > > > >
> > > > > > Now, some development is underway, and in no real hurry.
> > > > > > Basically I'm looking at the runtimes and a software
> > > > > > library to be written, (i.e., interfaces for the components
> > > > > > above and local file-system versions for stack-on-a-box,
> > > > > > implementing a subset of NNTP, in a simple service runtime
> > > > > > that idles really low).
> > > > >
> > > > > > Then, as above, it's kind of a vanity project or author-centric,
> > > > > > about making it so that custom servers could be stood up with
> > > > > > whatever newsgroups you want with the articles filtered
> > > > > > however you'd so care, rendered variously.
> > > > > >
> > > > > >
> > > > >
> > > > > talk to the other news group server admins out there, read their web
> > > > > information, call the dudes or email them, some will hook you up free.
> > > > > you attach,and you get all sci.math they have and get updates, I think
> > > > > for free, suck, feed the linkup are called. they may also want to link
> > > > > up with you for redundancy
> > > > >
> > > > > not much sw to write either, look for newsgroups w server in the name
> > > > > for info, some were very good 5 years ago
> > > >
> > > >
> > > >
> > > > I've been studying this a bit more.
> > > >
> > > > I set up a linux development environment
> > > > by installing ubuntu to a stick PC, then
> > > > installing vim, gcc, java, mvn, git. While
> > > > ubuntu is a debian distribution and Amazon
> > > > Linux (a designated target) is instead along
> > > > the lines of RedHat/Yellowdog (yum, was rpm,
> > > > instead of apt-get, for component configuration),
> > > > then I'm pretty familiar with these tools.
> > > >
> > > > Looking to the available components, basically
> > > > the algorithm is being designed with data
> > > > structures that can be local or remote. Then,
> > > > these are usually that much more complicated
> > > > than just the local or just the remote, and
> > > > here also besides the routine or state machine
> > > > also the exception or error handling and the
> > > > having of the queues everywhere for both
> > > > throttling and delay-retries (besides the
> > > > usual inline re-tries and about circuit
> > > > breaker). So, this is along the lines of
> > > > "this is an object/octet store" (and AWS
> > > > has an offering "Elastic File System" which
> > > > is an NFS Networked File System that looks
> > > > quite the bit more economical than S3 for
> > > > this purpose), "this is a number allocator"
> > > > (without sequence.nextVal in an RDBMS, the
> > > > requirements allow some gaps in the sequence,
> > > > here to use some DynamoDB table attribute's
> > > > "atomic counter"), then along the lines of
> > > > "this is a queue" and separately "I push to
> > > > queues" and "I pop queues", and about "queue
> > > > this for right now" and "queue this for later".
> > > > Then, there's various mappings, like id to number
> > > > and number to id, where again for no-drops / no-dupes
> > > > / Murphy's-law that the state of the mappings is
> > > > basically "forward-safe" and that retries make
> > > > the system robust and "self-healing". Other mappings
> > > > include a removed/deleted bag, this basically looks
> > > > like a subset of a series or range of the assigned
> > > > numbers, of the all-table and each group-table,
> > > > basically numbers are added as attributes to the
> > > > item for the series or range.
> > > >
> > > > Octet Store
> > > > Queue
> > > > Mapping
> > > >
> > > > Then, as noted above, with Murphy's law, any of the
> > > > edges of the flowgraph can break at any time, about
> > > > the request/response each that defines the boundary
> > > > (and a barrier), there is basically defined an abstract
> > > > generic exception "TryableException" that has only two
> > > > subclasses, "Retryable" and "Nonretryable". Then, the
> > > > various implementations of the data structures in the
> > > > patterns of their use variously throw these in puking
> > > > back the stack trace, then for inline re-tries, delay
> > > > re-tries, and fails. Here there's usually a definition
> > > > of "idempotence" for methods that are re-tryable besides
> > > > exceptions that might go away. The idea is to build
> > > > this into the procedure, so it's all built at compile-
> > > > time the correctness of the composition of the steps
> > > > of the flowgraph of the procedure.
> > > >
> > > >
> > > > Then, for the runtime, basically it will be some Java
> > > > container on the host or in a container, with basically
> > > > a cheap simple watchdog/heartbeat that uses signals on
> > > > unix (posix) to be keeping the service/routine nodes
> > > > (that can fail) up, to bounce (restart) them with signals,
> > > > and to reasonably fail and alarm if thrashing of the
> > > > child process of the watchdog/nanny, with maybe some
> > > > timer update up to the watchdog/heartbeat. Then basically
> > > > this runner executes the routine/workflow logic in the jar,
> > > > besides that then a mount of the NFS being the only admin
> > > > on the box, everything else being run up out of the
> > > > environment from the build artifact.
> > > >
> > > > The build artifact then looks that I'd use Spring for
> > > > wiring a container and also configuration profiles and
> > > > maybe Spring AOP and this kind of thing, i.e., just
> > > > spring-core (toward avoiding "all" of spring-boot).
> > > >
> > > > Then, with local (in-memory and file) and remote
> > > > (distributed) implementations, basically the
> > > > design is to the distributed components, making
> > > > abstract those patterns then implementing for the
> > > > usual local implementation as standard containers
> > > > and usual remote implementation as building transactions
> > > > and defined behavior over the network.
> > >
> > > Having been researching this a bit more, and
> > > tapping at the code, I've written out most of
> > > the commands then to build a state machine of
> > > the results, and, having analyze the algorithm
> > > of article ingestion and group and session state,
> > > have defined interfaces suitable either for local
> > > or remote operation, with the notion that local
> > > operation would be self-contained (with a quite
> > > simple file backing) while remote operation would
> > > be quite usually durable and horizontally scalable.
> > >
> > > I've written up a message reader/writer interface
> > > or ("Scanner" and "Printer") for non-blocking I/O
> > > and implementing reading Commands and writing Results
> > > via non-blocking I/O. This should allow connection
> > > scaling, with threads on accepter/closer and reader/
> > > writer and an execution pool for the commands. The
> > > Scanner and Printer use some BufferPool (basically
> > > abut 4*1024 or 4K buffers), with an idea that that's
> > > pretty much all the I/O usage of RAM and is reasonably
> > > efficient, and that if RAM is hogged it's simple enough
> > > to self-throttle the reader for the writer to balance
> > > out.
> > >
> > > About the runtime, basically the idea is to have it
> > > installable as a "well-known service" for "socket
> > > activation" as via inetd or systemd. The runtime is
> > > really rather lean and starts quickly, here on-demand,
> > > that it can be configured as "on-demand" or "long-running".
> > > For some container without systemd or the equivalent,
> > > it could have a rather lean nanny. There's some notion
> > > of integrating heartbeat or status about Main.main(),
> > > then that it runs as "java -jar nntp.jar".
> > >
> > > Where the remote backing store or article file system
> > > is some network file system, it also seems that the
> > > runtime would so configure dependency on its file system
> > > resource with quite usual system configuration tools,
> > > for a fault-tolerant and graceful box that reboots as activable.
> > >
> > > It interests me that SMTP is quite similar to NNTP. With
> > > an idea of an on-demand server, which is quite rather usual,
> > > these service nodes run on the smallest cloud instances
> > > (here the "t2.nano") and scale to traffic, with a very low
> > > idle or simply the "on-demand" (then for "containerized").
> > >
> > >
> > > About usenet them I've been studying what it would mean to
> > > be compliant and example what to do with some "control" or
> > > "junk" (sideband) groups and otherwise what it would mean
> > > and take to make a horizontally scalable elastic cloud
> > > usenet server (and persistent store). This is where the
> > > service node is quite lean, the file store and database
> > > (here of horizontally scalable "tables") is basically unbounded.
> >
> > I've collected what RFC's or specs there are for usenet,
> > then having surveyed the most of the specified use cases,
> > have cataloged descriptions of the commands about the protocol
> > that they are self-contained descriptions within the protocol
> > of each command. Then, for where there is the protocol and
> > perhaps any exchange or change of the protocol, for example
> > for TLS, then that is also being worked into the state machine
> > of sorts (simply enough a loop over the input buffer to generate
> > command values from the input given the command descriptions),
> > for that then as commands are generated (and maintained in their
> > order) that the results (eg, in the parallel) are thus computed
> > and returned (again back in the order).
> >
> > Then, within the protocol, and basically for encryption and
> > compression, these are established within the protocol instead
> > of, for example, externally to the protocol. So, there is
> > basically a filter between the I/O reader and I/O writer and
> > the scanner and the printer, as it were, that scans input data
> > to commands and writes command results to output data. This is
> > again with the "non-blocking I/O" then about that the blocks or
> > buffers I've basically settled to 4 kibibyte (4KB) buffers, where,
> > basically an entire input or output in the protocol (here a message
> > body or perhaps a list of up to all the article numbers) would be
> > buffered (in RAM), so I'm looking to spool that off to disk if it
> > so results that essentially unbounded inputs and outputs are to be
> > handled gracefully in the limited CPU, RAM, I/O, and disk resources
> > of the usually quite reliable but formally unreliable computing node
> > (and at cost).
> >
> > The data structures for access and persistence evolve as the in-memory
> > and file-based local editions and networked or cloud remote editions.
> > The semantics are built out to the remote editions, as then they can be
> > erased in the difference for efficiencies of the local editions.
> > The in-memory structures (with the article bodies themselves yet
> > actually written to a file store) are quite efficient and bounded
> > by RAM or the heap, the file-based structures which makes use of the
> > memory-mapped files as you may well know comprise all the content of
> > "free" RAM caching the disk files may be mostly persistent with
> > a structure that can be bounded by disk size, then the remote network-
> > based structures here have a usual expectation of being highly reliable
> > (i.e., that the remote files, queues, and records have a higher reliability
> > than any given component in their distributed design, at the corresponding
> > cost in efficiency and direct performance, but of course, this is design
> > for correctness).
> >
> > So, that said, then I'm tapping away at the implementation of a queue of
> > byte buffers, or the I/O RAM convention. Basically, there is some I/O,
> > and it may or may not be a complete datum or event in the protocol, which
> > is 1-client-1-server or a stateful protocol. So, what is read off the
> > I/O buffer, so the I/O controller can service that and other I/O lines,
> > is copied to a byte buffer. Then, this is to be filtered as above as
> > necessary, that it is copied to a list of byte buffers (a double ended
> > queue or linked list). These buffers maintain their current position
> > and limit, from their beginning, the "buffer" is these pointers and the
> > data itself. So, that's their concrete type already, then the scanner
> > or printer also maintains its scan or print position, that the buffer can
> > be filled and holds some data, then that as the scan pointer moves past
> > a buffer boundary, that buffer can be reclaimed, with only moving the
> > scan pointer when a complete datum is read (here as defined for the scanner
> > in small constant terms by the command descriptions as above).
> >
> > So, that is pretty much sorted out, then about that basically it should
> > ingest articles just fine and be a mostly compliant NNTP server.
> >
> > Then, generating the overview and such is another bit to get figured out,
> > which is summary.
> >
> > Another thing in this design to get figured out is how to implement the
> > queue and database action for the remote, where, the cost efficiency of
> > the (managed, durable, redundant) remote database, is on having a more-or-
> > less constant (and small) rate of reads and writes. Then the distributed
> > queue will hold the backlog, but, the queue consumer is to be constant
> > rate not for the node but for the fleet, so I'm looking at how to implement
> > some leader election (fault-tolerance) or otherwise to have loaner threads
> > of the runtime for any service of the queue. This is where, ingestion is
> > de-coupled from inbox, so, there's an idea of having a sentinel queue consumer
> > (because this data might be high volume or low or zero) on a publish/subscribe,
> > it listens to the queue and if it gets an item it refuses it and wakes up
> > the constant-rate (or spiking) queue consumer workers, that then proceed
> > with the workflow items and then retire themselves if and when traffic drops
> > to zero again, standing back up the sentinel consumer.
> >
> >
> > Anyways that's just about how to handle variable load but here there's
> > that it's OK for the protocol to separate ingestion and inbox, otherwise
> > establishing the completion of the workflow item from the initial request
> > involves usual asynchronous completion considerations.
> >
> >
> > So, that said, then, the design is seeming pretty flexible, then about,
> > what extension commands might be suitable. Here the idea is about article
> > transfer and which articles to transfer to other servers. The idea is to
> > add some X-RETRANSFER-TO command or along these lines,
> >
> > X-RETRANSFER-TO host [group [dateBegin [dateEnd]]]
> >
> > then that this simply has the host open a connection to the other host
> > and offer via IHAVE/CHECK/TAKETHIS all the articles so in the range
> > or until the connection is closed. This way then, for example, if this
> > NNTP system was running, and, someone wanted a subset of the articles,
> > then this command would have them sent out-of-band, or, "automatic out-feed".
> > Figuring out how to re-distribute or message routing besides simple
> > message store and retrieval is its own problem.
> >
> > Another issue is expiry, I don't really intend to delete anything, because
> > the purpose is archival, but people still use usenet in some corners of
> > the internet for daily news, again that's its own problem. Handling
> > out-of-order ingestion with the backfilling or archives as they can be
> > discovered is another issue, with that basically being about filling a
> > corpus of the messages, then trying to organize them that the message
> > date is effectively the original injection date.
> >
> >
> > Anyways, it proceeds along these lines.
>
>
>
> One of the challenges of writing this kind of system
> is vending the article-id's (or article numbers) for
> each newsgroup of each message-id. The message-id is
> received with the article as headers and body, or set
> as part of the injection info when the article is posted.
> So, vending a number means that there is known a previous
> number to give the next. Now, this is clear and simple
> in a stand-alone environment, with integer increment or
> "x = i++". It's not so simple in a distributed environment,
> with that the queuing system does not "absolutely guarantee"
> no dupes, with the priority being no drops, and also, the
> independent workers A and B can't know the shared value of
> x to make and take atomic increments, without establishing
> a synchronization barrier, here over the network, which is
> to be avoided (eg, blocking and locking on a database's
> critical transactional atomic sequence.nextval, with, say,
> a higher guarantee of no gaps). So, there is a database
> for vending strictly increasing numbers, each group of
> an article has a current number and there's an "atomic
> increment" feature thus that A working on A' will get
> i+1 and B working on B' will get i+2 (or maybe i+3, if
> for example the previous edition of B died). If A working
> on A' and B working on A' duplicated from the queue get
> i+1 and i+2, then, there is as mentioned above a conditional
> update to make sure the article number always increases,
> so there is a gap from the queue dupe or a gap from the
> worker drop, but then A or B has a consistent view of the
> article-id of A' or B'.
>
> So, then with having the number, once that's established,
> then all's well and good to associate the message-id, and
> the article-id.
>
> group: article-id -> message-id
> message: groups -> article-ids
>
> Then, looking at the performance, this logical association
> is neatly maintainable in the DB tables, with consistent
> views for A and B. But it's a limited resource, in this
> implementation, there are actually only so many reads and
> writes per period. So, workers can steadily chew away the
> intake queue, assigning numbers, but then querying for the
> numbers is also at a cost, which is primarily what the
> reader connections do.
>
> Then, the idea is to maintain the logical associations, of
> the message-id <-> article-id, also in a growing file, with
> a write-once read-many file about the NFS file system. There's
> no file locking, and, writes to the file that are disordered
> or contentious could (and by Murphy's law, would) write corrupt
> entries to the file. There are various notions of leader election
> or straw-pulling for exactly one of A or B to collect the numbers
> in order and write them to the article-ids file, one "row" (or 64
> byte fixed length record) per number, at the offset 64*number
> (as from some 0 or the offset from the first number). But,
> consensus and locking for serialization of tasks couples A and B
> which are otherwise running entirely independently. So, then
> the idea is to identify the next offset for the article-ids file,
> and collect a batch of numbers as make a block-sized block of
> the NFS implementation (eg 4Kb or 8Kb and hopefully configurably
> and not 1Mb which is about 64Kb records of 64b each). So, as
> A and B each collect the numbers (and detect if there were gaps
> now) then either (or both) completes a segment to append to the
> file. There aren't append modes of the NFS files, which is fine
> because actually the block now is written to the computed offset,
> which is the same for A and B. In the off chance A and B both
> make writes, file corruption doesn't follow because it's the
> same content, and it's block size, and it's an absolute offset.
>
> So, in this way, it seems that over time, the contents of the DB
> are written out to the sequence by article-id of message-id for
> each group
>
> group: article-id -> message-id
>
> besides that the message-id folder contains the article-ids
>
> message-id: groups -> article-id
>
> the content of which is known when the article-id numbers for
> the groups of the message are vended.
>
>
> Then, in the usual routine of looking up the message-id or
> article-id given the group, the DB table is authoritative,
> but, the NFS file is also correct, where a value exists.
> (Also it's immutable or constant and conveniently a file.)
> So, readers can map into memory the file, and consult the
> offset in the file, to find the message-id for the requested
> article-id, if that's not found, then the DB table, where it
> would surely be, as the message-id had vended an article-id,
> before the groups article-id range was set to include the
> new article.
>
> When a range of the article numbers is passed, then effectively,
> the lookup will always be satisfied by the file lookup instead
> of the DB table lookup, so there won't be the cost of the DB
> table lookup. In some off chance the open files of the NFS
> (also a limited resource, say 32K) are all exhausted, there's
> still a DB table to read, that is a limited and expensive
> resource, but also elastic and autoscalable.
>
> Anyways, this design issue also has the benefit of keeping it
> so that the file system has a convention with that all the data
> remains in the file system, with then usual convenience in
> backup and durability concerns, while still keeping it correct
> and horizontally scalable, basically with the notion of then
> even being able to truncate the database in any lull of traffic,
> for that the entire state is consistent on the file system.
>
> It remains to be figured out that NFS is OK with writing duplicate
> copies of a file block, toward having this highly reliable workflow
> system.
>
>
> That is basically the design issue then, I'm tapping away on this.



Tapping away at this idea of a usenet server system,
I've written much of the read routine that is the
non-blocking I/O with the buffer passing, for the
externally coded data and any differently coded data,
like the unencrypted or uncompressed. I've quite
settled on 4KiB (2^12B) as the usual buffer page,
and it looks like the NFS offering can be tuned so
that its wsize (write size) is 4096, and, with an
async NFS write option, that page size will have it
that writes are incorruptible (though for
whatever reason they may be lost), and that a 4096B
page, or 64 entries of 64B (2^6B) each for a message-id or oversize-
message-id entry, will spool off the message-id's of
the group's articles at an offset in the file that
is article-id * (1 << 6). The MTU of Ethernet packets
is often 1500, so having a wsize of 1KiB is not
nonsensible, as many of the writes are of this
granularity; the MTU might be 9001, or jumbo, which
would carry 2 4KiB NFS packets in one Ethernet packet.
Having the NFS rsize (read size) at, say, 32KiB seems not
unreasonable, with that the reads will be pages of the
article-id's, or the article contents themselves (split
into headers, xrefs, body) from the filesystem, which are
mostly a few KiB and rarely altogether more than 32 KiB,
which is quite a lot considering that's less than a JPEG
the size of "this". (99+% of Internet traffic was JPEG
and these days is audio/video traffic, often courtesy JPEG.)
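
For instance (illustrative values only, not the actual deployment),
the client mount under consideration would be tuned along these lines:

  mount -t nfs -o rw,async,noatime,wsize=4096,rsize=32768 fileserver:/export/news /var/spool/news

with the small wsize matching the 4KiB record blocks and the larger
rsize favoring whole-article and page reads.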

Writing the read routine is amusing me, with training the
buffers, and it amuses me to write code with quite the
few +1 and -1 in the offsets. Usually having +-1 in
the offset computations is a good or a bad thing, rarely
good, with that often it's a sign that the method signature
just isn't being used quite right in terms of the locals,
if not quite as bad as "build a fence a mile then move it
a foot". When +-1 offsets are a good thing, as here, the operations
on the content of the buffers are rather agnostic of the bounds
and amount of the buffers, thus that I/O should be quite
expedient in the routine.

(Written in Java, it should run quite the same on any
runtime with Java 1.4+.)

That said, next I'm looking to implement the Executor pool.

Acceptor -> Reader -> Scanner -> Executor -> Printer -> Writer

The idea of the Executor pool is that there are many connections
or sessions (the protocol is stateful), then that for one session,
its commands' results are returned in order, but that doesn't say
that the commands are executed in order, just that their results
are returned in order. (Some commands, which affect the state
of the session, like current group or current article, and that being
pretty much it, also have to be executed sequentially for
consistency's sake.) So, I'm looking to have the commands
executed in any possible order, for the usual idea of saturating
the bandwidth of the horizontally scalable backend. (Yeah, I
know NFS has limits, but it's unbounded and durable, and there's
overall a consistent, non-blocking toward lock-free view.)
Anyways, basically the Session has a data structure of its
outstanding commands as they're enqueued to the task executor,
then whether each can go into the out-of-order pool or must stay
in the serial pool. Then, as the commands complete, or for
example time out after retries on some network burp, those are
queued back up as the FIFO of the Results, and as those arrive
the Writer is re-registered with the SocketChannel's Selector
for I/O notifications and proceeds to fill the socket's output
buffer and retire the Command and Result. One aspect of this
is that the Printer/Writer doesn't necessarily get the data on
the heap: the output, for example an article, is composed from
the FileChannels of the message-id's header, xref, body. Now,
these days the system doesn't have much of a limit on open
file handles, but as mentioned above there are limits on NFS
file handles. Basically, then, the data is retrieved as from the
object store (or here an octet store, but the entire contents of
the files are written to the output with filesystem transfer
direct to memory or the I/O channel). Then, releasing the
NFS file handles expeditiously basically is to be figured out
with caching the contents, for any retransmission or simply
serving copies of the current articles to any number of
connections. As all these are read-only, it looks like the
filesystem's built-in I/O caching, with, for example, a read-only
client view and no timeout, basically turns the box into a file
cache, because that is what it is.
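
A rough sketch of that per-session bookkeeping (local and in-memory,
names illustrative; the serial pool for state-changing commands is
left out):

  import java.util.ArrayDeque;
  import java.util.Deque;
  import java.util.concurrent.CompletableFuture;
  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;
  import java.util.function.Consumer;

  // Per-session bookkeeping: commands may complete in any order on the
  // pool, but results are drained strictly from the head of the FIFO,
  // so the writer emits them in the order the commands arrived.
  final class Session {
      private final ExecutorService pool = Executors.newWorkStealingPool();
      private final Deque<CompletableFuture<String>> outstanding = new ArrayDeque<>();

      // Stateless commands go to the out-of-order pool; state-changing ones
      // (GROUP, NEXT, ...) would instead be chained serially (not shown).
      synchronized void submit(String command) {
          outstanding.addLast(CompletableFuture.supplyAsync(() -> execute(command), pool));
      }

      // Called when the socket is writable: emit only results whose
      // predecessors have already been written.
      synchronized void drainCompleted(Consumer<String> writer) {
          while (!outstanding.isEmpty() && outstanding.peekFirst().isDone()) {
              writer.accept(outstanding.pollFirst().join());
          }
      }

      private String execute(String command) { return ""; /* run the command against the store */ }
  }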

Then, it looks like there is a case for separate reader and
writer implementations altogether of the NFS or octet store
(that here is an object store for the articles and their
sections, and an octet store for the pages of the tables).
This is with the goal of minimizing network access while
maintaining the correct view. But, an NFS export can't
be mounted twice from the same client (one for reads and
one for writes), and, while ingesting the message can be
done apart from the client, intake has to occur from the
client; then, what with a usual distributed cloud queue
implementation having size and content limits, it seems
like it'll be OK.
Ross A. Finlayson
2016-12-17 22:58:10 UTC
Reply
Permalink
On Tuesday, December 13, 2016 at 12:05:13 AM UTC-8, Ross A. Finlayson wrote:
>
>
> That is basically the design issue then, I'm tapping away on this.

The next thing I'm looking at is how to describe the "range",
as a data structure or in algorithms.

Here a "range" class in the runtime library is usually a
"bounds" class. I'm talking about a range, basically a
1-D range, about basically a subset of the integers,
then that the range is iterating over the subset in order,
about how to maintain that in the most maintainable and
accessible terms (in computational complexity's space and time
terms).

So, I'm looking to define a reasonable algebra of individuals,
subsets, segments, and rays (and their complements) that
naturally compose to objects with linear maintenance and linear
iteration and constant access of linear partitions of time-
series data, dense or sparse, with patterns and scale.

This then is to define data structures that so compose: given a
series of items and a predicate, establish the
subset of items as a "range", which then composes as
above (and also has translations and otherwise
is a fungible iterator).

I don't have one of those already in the runtime library.
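
So, roughly the shape of the thing I'm after (a sketch only, with
illustrative names, not an existing library type):

  import java.util.Iterator;
  import java.util.PrimitiveIterator;

  // A first-class Range: a subset of the integers that iterates its
  // members in order, composes with other ranges, and complements
  // within the bounds of its space.
  interface Range extends Iterable<Long> {
      long lower();                 // bounds of the underlying space (the "natural range")
      long upper();
      boolean contains(long n);     // constant-time membership
      Range union(Range other);     // pair-wise composition, ideally constant-time
      Range intersect(Range other);
      Range complement();           // complement within the space's bounds
      PrimitiveIterator.OfLong members();  // forward iteration over members only

      @Override
      default Iterator<Long> iterator() { return members(); }
  }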

punch-out <- punches have shapes, patterns? eg 1010
knock-out <- knocks have area
pin-out <- just one
drop-out <-
fall-out <- range is out

Then basically there's a coalescence of all these,
in that they have iterators or mark bounds over the
iterator of the natural range or sequence, these then
being applied in order

push-up <- basically a prioritization
fill-in <- for a "sparse" range, like the complement upside-down
pin-in
punch-in
knock-in

Then all these have the basic expectation that a range
is the combination of each of these, as expressions,
and that they are expressions only of the value of the
iterator of a natural range.

Then, for the natural range being time, then there is about
the granularity or fine-ness of the time, then that there is
a natural range either over or under the time range.

Then, for the natural range having some natural indices,
the current and effective indices are basically one and
zero based, that all the features of the range are shiftable
or expressed in terms of these offsets.

0 - history

a - z

-m,n

Whether there are pin-outs or knock-outs rather depends on
whether removals are one-off or half-off.

Then, pin-outs might build a punch-out,
while knock-outs might build a scaled punch-out.

Here the idea of scale then is to apply the notion
of stride (stripe, stribe, striqe) to the range: where
the range is for example 0, 1, .., 4, 5, .., 8, 9, it
is like 1, 3, 5, 7 scaled out.

Then, "Range" becomes quite a first-class data structure,
in terms of linear ranges, to implement usual iterators
like forward ranges (iterators).
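
To make the stride/scale idea concrete, a small sketch along
the same lines; the StridedRange name and its parameters
(start, blockWidth, stride, blockCount) are illustrative
assumptions, with start=0, blockWidth=2, stride=4
reproducing the 0, 1, 4, 5, 8, 9 example above.

import java.util.PrimitiveIterator;
import java.util.stream.IntStream;

// Strided block range: blocks of blockWidth consecutive integers repeating
// every stride.  With start=0, blockWidth=2, stride=4 it yields 0,1,4,5,8,9.
class StridedRange {
    final int start, blockWidth, stride, blockCount;

    StridedRange(int start, int blockWidth, int stride, int blockCount) {
        this.start = start; this.blockWidth = blockWidth;
        this.stride = stride; this.blockCount = blockCount;
    }

    boolean contains(int i) {
        int off = i - start;
        return off >= 0 && off / stride < blockCount && off % stride < blockWidth;
    }

    PrimitiveIterator.OfInt iterator() {
        return IntStream.range(0, blockCount * blockWidth)
                .map(k -> start + (k / blockWidth) * stride + (k % blockWidth))
                .iterator();
    }

    public static void main(String[] args) {
        new StridedRange(0, 2, 4, 3).iterator()
                .forEachRemaining((int i) -> System.out.print(i + " "));
        // prints: 0 1 4 5 8 9
    }
}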

Then, for time-forward searches, or to compose results in
ranges from time-forward searches, without altogether loading
the individuals into memory, then sorting them, and then
detecting their ranges, it has to be defined how ranges
compose. So, the Range includes a reference to its space
and the Bounds of the Space (in integers, then in extended-
precision integers).

"Constructed via range, slices, ..." (gslices), ....



Then, basically I want the time series to be a range, with
expressions matching elements dispatched to partitions in
the range, the returned or referenced composable elements
being ranges themselves, and the ranges composing basically
pair-wise in constant time, thus linearly over the time
series; iteration over the elements is then linear in the
elements of the range, not in the time series. So, it's at
worst linear in the time series, but typically sub-linear
in the time series, also in space terms.

Here, sparse or dense ranges should have the same small
linear space terms, with maintenance on the ranges and some
hysteresis or "worst-case 50/50" (basically some inertia
about whether a range is "dense" or "sparse" when it has
greater or less than half its elements, and about where
it's just organized that way because there has been a re-
organization).

So, besides composing, the elements should have very
natural complements, basically complementing the range by
taking the complement of the range's parts, so that each
sub-structure has a natural complement.
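
As a toy illustration of the dense/sparse threshold and the
natural complement over a bounded space, here is a sketch
assuming java.util.BitSet as the backing store; the
BoundedSubset name and the 50% cutoff are only illustrative.

import java.util.BitSet;

// Bounded subset of [0, span) kept as a BitSet; reports dense/sparse by a
// 50% threshold and complements by flipping within the bounds.
class BoundedSubset {
    final int span;
    final BitSet bits;

    BoundedSubset(int span) { this.span = span; this.bits = new BitSet(span); }

    void add(int i)  { bits.set(i); }
    boolean dense()  { return bits.cardinality() * 2 > span; }

    // The complement within the same space: the sub-structure (here the
    // whole bit vector) has a natural complement.
    BoundedSubset complement() {
        BoundedSubset c = new BoundedSubset(span);
        c.bits.or(bits);
        c.bits.flip(0, span);
        return c;
    }

    public static void main(String[] args) {
        BoundedSubset s = new BoundedSubset(10);
        s.add(1); s.add(3); s.add(5);
        System.out.println(s.dense());              // false (3 of 10)
        System.out.println(s.complement().bits);    // {0, 2, 4, 6, 7, 8, 9}
    }
}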

Then, pattern and scale are rather related, about figuring
that out some more, and leaving the general purpose, while
identifying the true primitives of these.

Then eventually there is attachment or reference to values
under the range, and general-purpose expressions to return
an iteration or build a range, with collectors that
establish where range conditions are met and then collapse
after the iteration is done, as possible.

So, there is the function of the range, to iterate, and then
there is the building of the range, by iterating. The
default of the range and the space is its bounds (or, in
the extended case, that there are none). Then, segments are
identified by beginning and end (and perhaps a scale, as
regards rigid translations, and with the space being
unsigned, though spaces unbounded both left and right see
some use). These are dense ranges, as for whether the
range is "naturally" or initially dense or sparse. (The
usual notion is "dense/full", but perhaps that's as
"complement of sparse/empty".) Then, as elements are
added or removed in the space, if they are added range-wise
that goes to a stack of ranges that any forward
iterator checks before it iterates: whether the
natural space's next is in or out, whether there is
a skip or jump, or a flip, then to look for the next item
that is in instead of out.

This is where the usual enough organization of the data,
as collected in time series, will be bucketed or partitioned
or sharded into some segment of the space of the range,
so that building a range or reading a range has affinity to
the relevant bucket, partition, or shard. (This is all
1-D time series data, no need to make things complicated.)

Then, the interface basically "builds" or "reads" ranges,
building given an expression and reading as a read-out
(or forward iteration); the implementation then is to
compose the ranges from these various elements, via a
topological sort over the bounds/segments, scale/patterns,
and individuals.

https://en.wikipedia.org/wiki/Allen%27s_interval_algebra

This is interesting, as an algebra of intervals, or
segments; here, so far, I'd been taking it that
segments of contiguous individuals eventually are
just segments themselves, but composing those would
see a description in terms of this algebra. Clearly the
goal is an algebra of the contents of sets of integers
in the integer spaces.

An algebra of sets and segments of integers in integer spaces

An integer space defines elements of a type that are ordered.

An individual integer is an element of this space.

A set of integers is any collection of elements of the space.
A segment of integers is a set containing a least and a
greatest element and all elements between. A ray of integers
is a set containing a least element and all greater elements,
or containing a greatest element and all lesser elements.

The complement of an individual is all the other individuals;
the complement of a set is all the elements of the space not
in the set; the complement of a segment is all the elements
of the ray less than and the ray greater than all individuals
of the segment.

What are the usual algebras of the compositions of individuals,
sets, segments, and rays?
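
A minimal sketch of types for this algebra, under the
assumption that segments and rays are the primitives; the
SegmentAlgebra, Segment, and Ray names are hypothetical.

// Illustrative types: a segment [lo, hi] and rays unbounded on one side;
// the complement of a segment is the pair of rays on either side of it.
class SegmentAlgebra {
    static class Segment {
        final long lo, hi;
        Segment(long lo, long hi) { this.lo = lo; this.hi = hi; }
    }
    static class Ray {
        final long bound; final boolean upward;  // upward: [bound, +inf)
        Ray(long bound, boolean upward) { this.bound = bound; this.upward = upward; }
        public String toString() {
            return upward ? "[" + bound + ", +inf)" : "(-inf, " + bound + "]";
        }
    }

    // Complement of [lo, hi] in the integers: (-inf, lo-1] and [hi+1, +inf).
    static Ray[] complement(Segment s) {
        return new Ray[] { new Ray(s.lo - 1, false), new Ray(s.hi + 1, true) };
    }

    public static void main(String[] args) {
        for (Ray r : complement(new Segment(3, 7))) System.out.println(r);
        // prints: (-inf, 2] then [8, +inf)
    }
}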

https://en.wikipedia.org/wiki/Region_connection_calculus



Then basically all kinds of things that are about subsets
of things in a topological or ordered space should
have a first-class representation as (various kinds of)
elements in the range algebra.

So, I'm wondering what there is already for
"range algebra" and "range calculus".
Ross A. Finlayson
2016-12-19 01:48:10 UTC
Reply
Permalink
On Saturday, December 17, 2016 at 2:58:16 PM UTC-8, Ross A. Finlayson wrote:
> On Tuesday, December 13, 2016 at 12:05:13 AM UTC-8, Ross A. Finlayson wrote:
> >
> >
> > That is basically the design issue then, I'm tapping away on this.
>
> The next thing I'm looking at is how to describe the "range",
> as a data structure or in algorithms.
>
> Here a "range" class in the runtime library is usually a
> "bounds" class. I'm talking about a range, basically a
> 1-D range, about basically a subset of the integers,
> then that the range is iterating over the subset in order,
> about how to maintain that in the most maintainable and
> accessible terms (in computational complexity's space and time
> terms).
>
> So, I'm looking to define a reasonable algebra of individuals,
> subsets, segments, and rays (and their complements) that
> naturally compose to objects with linear maintenance and linear
> iteration and constant access of linear partitions of time-
> series data, dense or sparse, with patterns and scale.
>
> This then is to define data structures as so compose that
> given a series of items and a predicate, establish the
> subset of items as a "range", that then so compose as
> above (and also that it has translations and otherwise
> is a fungible iterator).
>
> I don't have one of those already in the runtime library.
>
> punch-out <- punches have shapes, patterns? eg 1010
> knock-out <- knocks have area
> pin-out <- just one
> drop-out <-
> fall-out <- range is out
>
> Then basically there's a coalescence of all these,
> that they have iterators or mark bounds, of the
> iterator of the natural range or sequence, for then
> these being applied in order
>
> push-up <- basically a prioritization
> fill-in <- for a "sparse" range, like the complement upside-down
> pin-in
> punch-in
> knock-in
>
> Then all these have the basic expectation that a range
> is the combination of each of these that are expressions
> then that they are expressions only of the value of the
> iterator, of a natural range.
>
> Then, for the natural range being time, then there is about
> the granularity or fine-ness of the time, then that there is
> a natural range either over or under the time range.
>
> Then, for the natural range having some natural indices,
> the current and effective indices are basically one and
> zero based, that all the features of the range are shiftable
> or expressed in terms of these offsets.
>
> 0 - history
>
> a - z
>
> -m,n
>
> Whether there are pin-outs or knock-outs rather varies on
> whether removals are one-off or half-off.
>
> Then, pin-outs might build a punch-out,
> While knock-outs might build a scaled punch-out
>
> Here the idea of scale then is to apply the notions
> of stride (stripe, stribe, striqe) to the range, about
> where the range is for example 0, 1, .., 4, 5 .., 8, 9
> that it is like 1, 3, 5, 7 scaled out.
>
> Then, "Range" becomes quite a first-class data structure,
> in terms of linear ranges, to implement usual iterators
> like forward ranges (iterators).
>
> Then, for time-forward searches, or to compose results in
> ranges from time-forward searches, without altogether loading
> into memory the individuals and then sorting them and then
> detecting their ranges, there is to be defined how ranges
> compose. So, the Range includes a reference to its space
> and the Bounds of the Space (in integers then extended
> precision integers).
>
> "Constructed via range, slices, ..." (gslices), ....
>
>
>
> Then, basically I want that the time series is a range,
> that expressions matching elements are dispatched to
> partitions in the range, that the returned or referenced
> composable elements are ranges, that the ranges compose
> basically pair-wise in constant time, thus linearly over
> the time series, then that iteration over the elements
> is linear in the elements in the range, not in the time
> series. Then, it's still linear in the time series,
> but sub-linear in the time series, also in space terms.
>
> Here, sparse or dense ranges should have the same small-
> linear space terms, with there being maintenance on the
> ranges, about there being hysteresis or "worst-case 50/50"
> (then basically some inertia for where a range is "dense"
> or "sparse" when it has gt or lt .5 elements, then about
> where it's just organized that way because there is a re-
> organization).
>
> So, besides composing, then the elements should have very
> natural complements, basically complementing the range by
> taking the complement of the ranges parts, that each
> sub-structure has a natural complement.
>
> Then, pattern and scale are rather related, about figuring
> that out some more, and leaving the general purpose, while
> identifying the true primitives of these.
>
> Then eventually there attachment or reference to values
> under the range, and general-purpose expressions to return
> an iteration or build a range, about the collectors that
> establish where range conditions are met and then collapse
> after the iteration is done, as possible.
>
> So, there is the function of the range, to iterate, then
> there is the building of the range, by iterating. The
> default of the range and the space is its bounds (or, in
> the extended, that there are none). Then, segments are
> identified by beginning and end (and perhaps a scale, about
> rigid translations and about then that the space is
> unsigned, though unbounded both left and right see
> some use). These are dense ranges, then for whether the
> range is "naturally" or initially dense or sparse. (The
> usual notion is "dense/full" but perhaps that's as
> "complement of sparse/empty".) Then, as elements are
> added or removed in the space, if they are added range-wise
> then that goes to a stack of ranges that any forward
> iterator checks before it iterators, about whether the
> natural space's next is in or out, or, whether there is
> a skip or jump, or a flip then to look for the next item
> that is in instead of out.
>
> This is where, the usual enough organization of the data
> as collected in time series will be bucketed or partitioned
> or sharded into some segment of the space of the range,
> that buiding range or reading range has the affinity to
> the relevant bucket, partition, or shard. (This is all
> 1-D time series data, no need to make things complicated.)
>
> Then, the interface basically "builds" or "reads" ranges,
> building given an expression and reading as a read-out
> (or forward iteration), about that then the implementation
> is to compose the ranges of these various elements of a
> topological sort about the bounds/segments and scale/patterns
> and individuals.
>
> https://en.wikipedia.org/wiki/Allen%27s_interval_algebra
>
> This is interesting, for an algebra of intervals, or
> segments, but here so far I'd been having that the
> segments of contiguous individuals are eventually
> just segments themselves, but composing those would
> see the description as of this algebra. Clearly the
> goal is the algebra of the contents of sets of integers
> in the integer spaces.
>
> An algebra of sets and segments of integers in integer spaces
>
> An integer space defines elements of a type that are ordered.
>
> An individual integer is an element of this space.
>
> A set of integers is a set of integers, a segment of integers
> is a set containing a least and greatest element and all elements
> between. A ray of integers of a set containing a least element
> and all greater elements or containing a greatest element and
> all lesser elements.
>
> A complement of an individual is all the other individuals,
> a complement of a set is the intersection of all other sets,
> a complement of a segment is all the elements of the ray less
> than and the ray greater than all individuals of the segment.
>
> What are the usual algebras of the compositions of individuals,
> sets, segments, and rays?
>
> https://en.wikipedia.org/wiki/Region_connection_calculus
>
>
>
> Then basically all kinds of things that are about subsets
> of thing in a topological or ordered space should basically
> have a first-class representation as (various kinds of)
> elements in the range algebra.
>
> So, I'm wondering what there is already for
> "range algebra" and "range calculus".

Some of the features of these subsets of a
range of integers are available as a usual
bit vector, eg with ffs ("find-first-set")
memory-scan instructions,
and as well the usual notions of compressed bitmap
indices, with some notion of random access to
the value of a bit by its index and of variously
iterating over the elements. Various schemes
that compress the bitmaps down to uncompressed
regions representing words' worths of bits
may suit parts of the implementation, but I'm
looking for a "pyramidal" or "multi-resolution"
organization of efficient bits, and also flags,
for associating various channels of bits with
the items or messages.

https://en.wikipedia.org/wiki/Bitmap_index
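
For example, java.util.BitSet already gives the bit-vector
part of this: nextSetBit() is the standard-library analogue
of an ffs scan, and one BitSet per flag channel associates a
channel of bits with article numbers. The channel names
below are only illustrative.

import java.util.BitSet;

// A bit vector per "channel" of flags over article numbers; nextSetBit()
// scans forward for the next set bit, like an ffs instruction.
class FlagChannels {
    final BitSet read = new BitSet();
    final BitSet flagged = new BitSet();

    public static void main(String[] args) {
        FlagChannels ch = new FlagChannels();
        ch.read.set(10); ch.read.set(12); ch.read.set(40);

        // Iterate the set members in order, like iterating a sparse range.
        for (int i = ch.read.nextSetBit(0); i >= 0; i = ch.read.nextSetBit(i + 1)) {
            System.out.println("read article #" + i);
        }

        // Random access to one bit by its index.
        System.out.println(ch.read.get(12));   // true
    }
}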

Then, having narrowed down the design for what syntax to
cover, and mostly selected the data structures for the
innards, I've been looking at the data throughput, and then
at some idea of support for client features.

Throughput is basically about how to keep the
commands moving through. For this, there's a
single thread that reads off the network interface's
I/O buffers; it was also driving the scanner, but,
with encryption and compression layers being added,
there's also a separate thread added to drive the
scanner, so that the network interface is serviced
on demand. Designing a concurrent data structure here
basically means a novel selector (as of the non-blocking
I/O) that then picks off a thread from the pool to run
the scanner. Then, on the "printer" side, writing
off to the network interface, it is similar, with
the session or connection's resources running
the compression and encryption, and the I/O
thread servicing the network interface. Basically
this amounts to putting a collator/relay thread between
the I/O threads and the scanner/printer threads
(where the commands are run by the executor pool).
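
A skeleton of that arrangement with java.nio: one selector
(I/O) thread fills buffers and hands them to a small scanner
pool. The port number and pool size are arbitrary, and the
actual command handling is elided.

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// One I/O thread services the selector; parsing/command work ("the
// scanner") is handed off to a pool so the interface is serviced on demand.
class NioSkeleton {
    public static void main(String[] args) throws IOException {
        ExecutorService scanners = Executors.newFixedThreadPool(4);
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(1190));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        while (true) {
            selector.select();
            Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
            while (keys.hasNext()) {
                SelectionKey key = keys.next();
                keys.remove();
                if (key.isAcceptable()) {
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    SocketChannel client = (SocketChannel) key.channel();
                    ByteBuffer buf = ByteBuffer.allocate(4096);
                    int n = client.read(buf);
                    if (n < 0) { client.close(); continue; }
                    buf.flip();
                    // Hand the filled buffer to the scanner pool; the I/O
                    // thread goes straight back to servicing the selector.
                    scanners.submit(() -> System.out.println(
                            "scan " + buf.remaining() + " bytes"));
                }
            }
        }
    }
}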


Then, a second notion has been the support of TLS.
It looks like I would simply self-sign a certificate and
expect users to check and install it themselves in their
trust store for SSL/TLS. That said, it isn't really
a great solution, because, if someone compromises any
of the CAs (certificate authorities) in the trust
store (any of them), then a man-in-the-middle could
sign a cert, and it would be on the server to check
that the content hash reflected the server cert from
the handshake. What might be better would be to have
each client sign its own certificate, for the
server to present. This way, the client and server
each sign a cert, and those are exchanged. When the
server gets the client cert, it restarts the negotiation,
now using the client-signed cert as the server
cert. This way, there's only a trust anchor of depth
1, and the trust anchors are never exchanged and can
not be cross-signed nor would they otherwise ever share
a trust root. Similarly, the server gets the server-
signed cert back from the client, so that TLS could
proceed with a session ticket and that otherwise there
would be stronger protection from compromised CA
certs. Then, this could be pretty automatic with
a simple enough browser interface or link to set up TLS.
Then the server and client would only trust themselves
and each other (and keep their secrets private).
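
A minimal sketch of the client side of that "trust only the
one exchanged cert" idea, assuming the peer's self-signed
certificate was already obtained out of band; the file name
is hypothetical, and the cert-exchange/renegotiation steps
described above are not shown.

import java.io.FileInputStream;
import java.security.KeyStore;
import java.security.cert.Certificate;
import java.security.cert.CertificateFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocketFactory;
import javax.net.ssl.TrustManagerFactory;

// Build an SSLContext whose only trust anchor is one previously exchanged
// self-signed certificate, so no public CA in the system trust store is
// ever consulted.
class PinnedTrust {
    static SSLSocketFactory forPinnedCert(String pemPath) throws Exception {
        Certificate peer = CertificateFactory.getInstance("X.509")
                .generateCertificate(new FileInputStream(pemPath));

        KeyStore anchors = KeyStore.getInstance(KeyStore.getDefaultType());
        anchors.load(null, null);                  // empty in-memory keystore
        anchors.setCertificateEntry("peer", peer); // the single trust anchor

        TrustManagerFactory tmf = TrustManagerFactory.getInstance(
                TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(anchors);

        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, tmf.getTrustManagers(), null);
        return ctx.getSocketFactory();
    }

    public static void main(String[] args) throws Exception {
        SSLSocketFactory f = forPinnedCert("peer-cert.pem");
        System.out.println("socket factory trusting exactly one cert: " + f);
    }
}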

Then, for browsing, a reading of IMAP, the Internet
Message Access Protocol, shows a strong affinity with
the organization of Usenet messages, with newsgroups
as mailboxes. As well, implementing an IMAP server
that is backed by the NNTP server means that the
search artifacts and so on (and this was largely
a reason why I need this improved "range" pattern)
would build toward otherwise making deterministic date-
oriented searches over the messages in the NNTP server.
IMAP has a strong affinity with NNTP: it is a very
similar protocol and is implemented much the same
way. Then it would be convenient for users with
an IMAP client to simply point it at "usenet.science"
or the like and get usenet through their email client.
Ross A. Finlayson
2016-12-24 06:21:09 UTC
Reply
Permalink
About implementing usenet with reasonably
modern runtimes and an eye toward
unlimited retention: basically looking
into "microtasks" for the routine or
workflow instances, as driven with
non-blocking I/O throughout, and basically
looking to memoize the steps as through
a finite state machine, for restarts as
of a thread, so as to go from "service
oriented" to "message oriented".


This involves writing a bit of an
HTTP client for rather usual web
service calls, but with high-speed
non-blocking I/O (fewer threads, more
connections). Also this involves a
sufficient abstraction.
Ross A. Finlayson
2017-01-06 21:56:50 UTC
Reply
Permalink
On Friday, December 23, 2016 at 10:21:16 PM UTC-8, Ross A. Finlayson wrote:
> About implementing usenet with reasonably
> modern runtimes and an eye toward
> unlimited retention, basically looking
> into "microtasks" for the routine or
> workflow instances, as are driven with
> non-blocking I/O throughout, basically
> looking to memoize the steps as through
> a finite state machine, for restarts as
> of a thread, then to go from "service
> oriented" to "message oriented".
>
>
> This involves writing a bit of an
> HTTP client for rather usual web
> service calls, but with high speed
> non-blocking I/O (less threads, more
> connections). Also this involves a
> sufficient abstraction.



This writing of some software for usenet service
is coming along with the idea of how to implement
the fundamentally asynchronous non-blocking routine.
This is crystallizing in pattern as a "re-routine",
in reference to computing's usual co-routine.

The idea of the re-routine is that there are only
so many workers, threads, of the runtime. The usual
runtimes (and this one, Java, say) support preemptive
multithreading as a means of implementing cooperative
multithreading, with the maintenance of separate stacks
(of the stack machine of usual C-like procedural runtimes)
and some thread-per-connection model. This is somewhat
reasonable for the composition of blocking APIs, but
not so much for the composition of non-blocking APIs,
nor for how not to have many thread-per-connection
resources with essentially zero duty cycle that instead
could maintain for themselves the state machine of their
routine (with simplified forward states and a general
exception and error routine), for cooperative multi-threading.

The idea of this re-routine then is to connect functions:
there's a scope for variables, there is execution of the
functions (or here the routines, as the "re-routines"),
and the instance of the re-routine is re-entrant in the
sense that, as partial results are accumulated, the trace
of the routine is marked out, leaving in the scope the
current or partial or intermediate results. Then, the
asynchronous workers that fulfill each routine (eg, with
a lookup, a system call, or a network call) are separate
worker units dedicated to their domain (of the routine,
not the re-routine, and they can be blocking, polling for
their fleet, or calling back with the ticket).

Then, this is basically a network machine and protocol,
here about NNTP and IMAP, and its resources are often
themselves network machines and protocols (eg networked
file systems, web services). Then, with these "machines"
of the "re-routine" being built (basically for the
streaming model instead of the batch model, if you
know what I'm talking about), defining the logical
outcomes of the composition of the inputs and the
resulting outputs in terms of scopes as a model of
cooperative multithreading, these re-routines see,
for the pattern, that the source template implicitly
establishes the scope and the passing and calling
convention (without a bunch of boilerplate, "callback
confusion", or "async hell"). This is where, when
a routine worker fills in a partial result and resubmits
the re-routine (with the responsibility/ownership of
the re-routine), the re-routine is re-evaluated from the
beginning, which is constant or linear in reading forward
the state of its overall routine, thus implicit,
without having to build a state machine, as it is
declaratively the routine.

So, I am looking at this as my solution for how to
establish a very efficient (in resource and performance
terms), formally correct protocol implementation (and
with very simple declarative semantics of usual forward,
linear routines).

This "re-routine" pattern then as a model of cooperative
multithreading sees the complexity and work into the
catalog of blocking, polling, and callback support,
then for usual resource injection of those as all
supported with references to usual sequential processes
(composition of routine).
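
A toy model of the re-routine in Java, to make the
re-execution-with-memoization idea concrete; the ReRoutine,
await, and Suspend names are invented here, error handling
is elided, and this is a sketch of the pattern, not the
actual implementation.

import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// The routine() body is plain linear code, re-executed from the top
// whenever an awaited result arrives; completed steps are memoized, so
// each re-run just reads forward to the next gap.
abstract class ReRoutine implements Runnable {
    static class Suspend extends RuntimeException { }

    private final Map<String, Object> memo = new ConcurrentHashMap<>();
    private final Set<String> launched = ConcurrentHashMap.newKeySet();
    protected final ExecutorService originating = Executors.newSingleThreadExecutor();

    // Either returns the memoized value, or launches the async step once,
    // arranges re-submission of the whole routine on completion, and suspends.
    @SuppressWarnings("unchecked")
    protected <T> T await(String step, Supplier<CompletableFuture<T>> launch) {
        Object value = memo.get(step);
        if (value != null) return (T) value;
        if (launched.add(step)) {
            launch.get().thenAccept(result -> {
                memo.put(step, result);
                originating.submit(this);   // re-run the routine from the top
            });
        }
        throw new Suspend();
    }

    protected abstract void routine();

    public final void run() {
        try { routine(); originating.shutdown(); }   // completed (demo exit)
        catch (Suspend s) { /* dropped; the completion will re-submit it */ }
    }

    public static void main(String[] args) {
        ReRoutine fetchAndPrint = new ReRoutine() {
            protected void routine() {
                // Plain declarative flow of control; each await may suspend.
                Integer article = await("lookup", () ->
                        CompletableFuture.supplyAsync(() -> 12345));
                String body = await("fetch", () ->
                        CompletableFuture.supplyAsync(() -> "article " + article));
                System.out.println(body);
            }
        };
        fetchAndPrint.originating.submit(fetchAndPrint);
    }
}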
Ross A. Finlayson
2017-01-21 22:33:14 UTC
Reply
Permalink
On Friday, January 6, 2017 at 1:57:00 PM UTC-8, Ross A. Finlayson wrote:
> On Friday, December 23, 2016 at 10:21:16 PM UTC-8, Ross A. Finlayson wrote:
> > About implementing usenet with reasonably
> > modern runtimes and an eye toward
> > unlimited retention, basically looking
> > into "microtasks" for the routine or
> > workflow instances, as are driven with
> > non-blocking I/O throughout, basically
> > looking to memoize the steps as through
> > a finite state machine, for restarts as
> > of a thread, then to go from "service
> > oriented" to "message oriented".
> >
> >
> > This involves writing a bit of an
> > HTTP client for rather usual web
> > service calls, but with high speed
> > non-blocking I/O (less threads, more
> > connections). Also this involves a
> > sufficient abstraction.
>
>
>
> This writing some software for usenet service
> is coming along with the idea of how to implement
> the fundamentally asynchronous non-blocking routine.
> This is crystallizing in pattern as a: re-routine,
> in reference to computing's usual: co-routine.
>
> The idea of the re-routine is that there are only
> so many workers, threads, of the runtime. The usual
> runtimes (and this one, Java, say) support preemptive
> multithreading as a means of implementing cooperative
> multithreading, with the maintenance of separate stacks
> (of, the stack machine of usual C-like procedural runtimes)
> and some thread-per-connection model. This is somewhat
> reasonable for the composition of blocking APIs, but
> not so much for the composition of non-blocking APIs
> and about how to not have many thread-per-connection
> resources with essentially zero duty cycle that instead
> could maintain for themselves the state machine of their
> routine (with simplified forward states and a general
> exception and error routine), for cooperative multi-threading.
>
> The idea of this re-routine then is to connect functions,
> there's a scope for variables in the scope, there is
> execution of the functions (or here the routines, as
> the "re-routines") then the instance of the re-routine
> is re-entrant in the sense that as partial results are
> accumulated the trace of the routine is marked out, with
> leaving in the scope the current or partial or intermediate
> results. Then, the asynchronous workers that fulfill each
> routine (eg, with a lookup, a system call, or a network
> call) are separate worker units dedicated to their domain
> (of the routine, not the re-routine, and they can be blocking,
> polling for their fleet, or callback with the ticket).
>
> Then, this is basically a network machine and protocol,
> here about NNTP and IMAP, and its resources are often
> then of network machines and protocols (eg networked
> file systems, web services). Then, these "machines"
> of the "re-routine" being built (basically for the
> streaming model instead of the batch model if you
> know what I'm talking about) defining the logical
> outcomes of the composition of the inputs and the
> resulting outputs in terms of scopes as a model of
> the cooperative multithreading, these re-routines
> then are seeing for the pattern then that the
> source template is about implicitly establishing
> the scope and the passing and calling convention
> (without a bunch of boilerplate or "callback confusion",
> "async hell"). This is where the re-routine, when
> a routine worker fills in a partial result and resubmits
> the re-routine (with the responsibility/ownership of
> the re-routine) that it is re-evaluated from the beginning,
> because it is constant linear in reading forward for the
> item the state of its overall routine, thusly implicit
> without having to build a state machine, as it is
> declaratively the routine.
>
> So, I am looking at this as my solution as to how to
> establish a very efficient (in resource and performance
> terms) formally correct protocol implementation (and
> with very simple declarative semantics of usual forward,
> linear routines).
>
> This "re-routine" pattern then as a model of cooperative
> multithreading sees the complexity and work into the
> catalog of blocking, polling, and callback support,
> then for usual resource injection of those as all
> supported with references to usual sequential processes
> (composition of routine).

I've about sorted out how to implement the re-routine.

Basically a re-routine is a suspendable composite
operation, with normal declarative flow-of-control
syntax, that memoizes its partial results and then
re-executes the same block of statements to arrive
at its pause, completion, or exit.

Then, the command and executor are passed to the
implementation, which has its own (or maybe the
same) execution resources, eg a thread or connection
pool. This resolves the value of the asynchronous
operation, and then re-submits the re-routine to
its originating executor. The re-routine re-runs
(it runs through the branching or flow-of-control
each time, but that's a small linear cost and all
the intermediate products are already computed,
and the syntax is usual and in the language).
The re-routine then either re-suspends (as it
launches the next task) or completes or exits (errors).
Whether it suspends, completes, or exits, the
re-routine just returns, and the executor, being
specialized, just checks the re-routine:
whether it's suspended (in which case it just drops it;
the newly launched responsible task will re-submit it),
or whether it's completed or errored (in which case it
calls back to the originating commander with the result
of the command).
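
And a sketch of that specialized executor's check after each
pass, with an invented Status/Routine shape: suspended
routines are simply dropped, while completed or errored ones
are reported back to the commander.

import java.util.concurrent.CompletableFuture;

// After one pass of a re-routine, dispatch on its status.  The names here
// (RoutineDispatcher, Status, Routine) are illustrative only.
class RoutineDispatcher {
    enum Status { SUSPENDED, COMPLETED, ERRORED }

    interface Routine {
        Status runOnce();            // one pass over the declarative body
        Object result();             // valid when COMPLETED
        Throwable error();           // valid when ERRORED
    }

    static void dispatch(Routine routine, CompletableFuture<Object> commander) {
        switch (routine.runOnce()) {
            case SUSPENDED: /* drop it; re-submission happens elsewhere */ break;
            case COMPLETED: commander.complete(routine.result());         break;
            case ERRORED:   commander.completeExceptionally(routine.error()); break;
        }
    }

    public static void main(String[] args) {
        CompletableFuture<Object> commander = new CompletableFuture<>();
        dispatch(new Routine() {
            public Status runOnce()  { return Status.COMPLETED; }
            public Object result()   { return "220 article follows"; }
            public Throwable error() { return null; }
        }, commander);
        System.out.println(commander.join());
    }
}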


In this manner, it seems like a neat way to basically
establish the continuation, for this "non-blocking
asynchronous operation", while at the same time
the branching and flow of control are all in the
language, with the usual unsurprising syntax and
semantics, for cooperative multi-threading. The
cost is in wrapping the functional callers of the
routine and setting up their factories and otherwise,
as via injection (and they can block the calling
thread, or have their own threads and block, or
be asynchronous, without changing the definition
of the routine).

So, having sorted this mostly out, the usual
work of implementing the routines for the protocol
can proceed, with a usual notion of a framework
of support for both the simple declaration of routine
and the high performance (and low resource usage) of
the delegation of routine, and support for injection
for test and environment, all in the language,
with minimal clutter, no byte-code modification,
and a ready wrapper for libraries of arbitrary
run-time characteristics.

This solves some problems.
j4n bur53
2017-01-22 20:49:57 UTC
Reply
Permalink
Try this one maybe:
https://www.discourse.org/about/

Ross A. Finlayson schrieb:
> On Friday, January 6, 2017 at 1:57:00 PM UTC-8, Ross A. Finlayson wrote:
>> On Friday, December 23, 2016 at 10:21:16 PM UTC-8, Ross A. Finlayson wrote:
>>> About implementing usenet with reasonably
>>> modern runtimes and an eye toward
>>> unlimited retention, basically looking
>>> into "microtasks" for the routine or
>>> workflow instances, as are driven with
>>> non-blocking I/O throughout, basically
>>> looking to memoize the steps as through
>>> a finite state machine, for restarts as
>>> of a thread, then to go from "service
>>> oriented" to "message oriented".
John Gabriel
2017-01-22 22:01:16 UTC
Reply
Permalink
On Sunday, 22 January 2017 12:50:02 UTC-8, j4n bur53 wrote:
> Try this one maybe:
> https://www.discourse.org/about/
>
> Ross A. Finlayson schrieb:
> > On Friday, January 6, 2017 at 1:57:00 PM UTC-8, Ross A. Finlayson wrote:
> >> On Friday, December 23, 2016 at 10:21:16 PM UTC-8, Ross A. Finlayson wrote:
> >>> About implementing usenet with reasonably
> >>> modern runtimes and an eye toward
> >>> unlimited retention, basically looking
> >>> into "microtasks" for the routine or
> >>> workflow instances, as are driven with
> >>> non-blocking I/O throughout, basically
> >>> looking to memoize the steps as through
> >>> a finite state machine, for restarts as
> >>> of a thread, then to go from "service
> >>> oriented" to "message oriented".

Yes please. Take all your fellow cranks with you to a new usenet. The sooner the better.
Ross A. Finlayson
2017-01-23 01:01:00 UTC
Reply
Permalink
On Sunday, January 22, 2017 at 12:50:02 PM UTC-8, j4n bur53 wrote:
> Try this one maybe:
> https://www.discourse.org/about/
>
> Ross A. Finlayson schrieb:
> > On Friday, January 6, 2017 at 1:57:00 PM UTC-8, Ross A. Finlayson wrote:
> >> On Friday, December 23, 2016 at 10:21:16 PM UTC-8, Ross A. Finlayson wrote:
> >>> About implementing usenet with reasonably
> >>> modern runtimes and an eye toward
> >>> unlimited retention, basically looking
> >>> into "microtasks" for the routine or
> >>> workflow instances, as are driven with
> >>> non-blocking I/O throughout, basically
> >>> looking to memoize the steps as through
> >>> a finite state machine, for restarts as
> >>> of a thread, then to go from "service
> >>> oriented" to "message oriented".

No, thanks, that does not appear to meet my requirements.
j4n bur53
2017-01-23 01:35:52 UTC
Reply
Permalink
Something else, what are your hardware specs?

An Inferno on the Head of a Pin
https://blog.codinghorror.com/an-inferno-on-the-head-of-a-pin/

Ross A. Finlayson schrieb:
> On Sunday, January 22, 2017 at 12:50:02 PM UTC-8, j4n bur53 wrote:
>> Try this one maybe:
>> https://www.discourse.org/about/
>>
>> Ross A. Finlayson schrieb:
>>> On Friday, January 6, 2017 at 1:57:00 PM UTC-8, Ross A. Finlayson wrote:
>>>> On Friday, December 23, 2016 at 10:21:16 PM UTC-8, Ross A. Finlayson wrote:
>>>>> About implementing usenet with reasonably
>>>>> modern runtimes and an eye toward
>>>>> unlimited retention, basically looking
>>>>> into "microtasks" for the routine or
>>>>> workflow instances, as are driven with
>>>>> non-blocking I/O throughout, basically
>>>>> looking to memoize the steps as through
>>>>> a finite state machine, for restarts as
>>>>> of a thread, then to go from "service
>>>>> oriented" to "message oriented".
>
> No, thanks, that does not appear to meet my requirements.
>
Ross A. Finlayson
2017-01-23 01:57:38 UTC
Reply
Permalink
On Sunday, January 22, 2017 at 5:35:57 PM UTC-8, j4n bur53 wrote:
> Something else, what are your hardware specs?
>
> An Inferno on the Head of a Pin
> https://blog.codinghorror.com/an-inferno-on-the-head-of-a-pin/
>
> Ross A. Finlayson schrieb:
> > On Sunday, January 22, 2017 at 12:50:02 PM UTC-8, j4n bur53 wrote:
> >> Try this one maybe:
> >> https://www.discourse.org/about/
> >>
> >> Ross A. Finlayson schrieb:
> >>> On Friday, January 6, 2017 at 1:57:00 PM UTC-8, Ross A. Finlayson wrote:
> >>>> On Friday, December 23, 2016 at 10:21:16 PM UTC-8, Ross A. Finlayson wrote:
> >>>>> About implementing usenet with reasonably
> >>>>> modern runtimes and an eye toward
> >>>>> unlimited retention, basically looking
> >>>>> into "microtasks" for the routine or
> >>>>> workflow instances, as are driven with
> >>>>> non-blocking I/O throughout, basically
> >>>>> looking to memoize the steps as through
> >>>>> a finite state machine, for restarts as
> >>>>> of a thread, then to go from "service
> >>>>> oriented" to "message oriented".
> >
> > No, thanks, that does not appear to meet my requirements.
> >

Thanks for your interest; if you read the thread,
I'm talking about an implementation of usenet
with modern languages and runtimes, but with
a filesystem convention, a distributed redundant
store, and otherwise very limited hardware and
distributed software resources, or the "free tier"
of cloud computing (or any box).

When it comes to message formats, usenet isn't
limited to plain text; it's simply the usual
MIME multimedia. (The user-agent can render
text however it cares to.)

A reputation system is pretty simply implemented by
forwarding posts to various statistics groups that over
time build profiles of authors, which readers may adopt.

Putting an IMAP interface in front of a NNTP gateway
makes it pretty simple to have cross-platform user
interfaces from any IMAP (eg, email) client.

Then, my requirements include backfilling a store
with the groups of interest for implementing summary
and search for archival and research purposes.
Ross A. Finlayson
2017-01-23 02:03:41 UTC
Reply
Permalink
On Sunday, January 22, 2017 at 5:35:57 PM UTC-8, j4n bur53 wrote:
> Something else, what are your hardware specs?
>
> An Inferno on the Head of a Pin
> https://blog.codinghorror.com/an-inferno-on-the-head-of-a-pin/
>
> Ross A. Finlayson schrieb:
> > On Sunday, January 22, 2017 at 12:50:02 PM UTC-8, j4n bur53 wrote:
> >> Try this one maybe:
> >> https://www.discourse.org/about/
> >>
> >> Ross A. Finlayson schrieb:
> >>> On Friday, January 6, 2017 at 1:57:00 PM UTC-8, Ross A. Finlayson wrote:
> >>>> On Friday, December 23, 2016 at 10:21:16 PM UTC-8, Ross A. Finlayson wrote:
> >>>>> About implementing usenet with reasonably
> >>>>> modern runtimes and an eye toward
> >>>>> unlimited retention, basically looking
> >>>>> into "microtasks" for the routine or
> >>>>> workflow instances, as are driven with
> >>>>> non-blocking I/O throughout, basically
> >>>>> looking to memoize the steps as through
> >>>>> a finite state machine, for restarts as
> >>>>> of a thread, then to go from "service
> >>>>> oriented" to "message oriented".
> >
> > No, thanks, that does not appear to meet my requirements.
> >

(About the 2nd law of thermodynamics, Moore's
law, and the copper process with regard to the
cross-talk about VLSI or "ultra" VLSI or
the epoch these days, and burning bits: what
you might find of interest is the development of
"reversible computing", which basically
recycles the bits; and then also, besides
the usual electronic transistor, and besides that
today there can be free-form 3-D ICs or "custom
logic" instead of just the planar systolic clock-
driven chip, there are also "systems on chip" with
regard to electron, photon, and heat pipes, as
about the photo-electric and Seebeck/Peltier effects,
with various remarkably high-efficiency models
of computation, this besides the very novel
serial and parallel computational units and
logical machines afforded by 3-D ICs and optics.

About "reasonably simple declaration of routine
in commodity languages on commodity hardware
for commodity engineers for enduring systems",
at cost, see above.)
Ross A. Finlayson
2017-02-07 08:16:07 UTC
Reply
Permalink
On Saturday, January 21, 2017 at 10:33:23 PM UTC, Ross A. Finlayson wrote:
> On Friday, January 6, 2017 at 1:57:00 PM UTC-8, Ross A. Finlayson wrote:
> > On Friday, December 23, 2016 at 10:21:16 PM UTC-8, Ross A. Finlayson wrote:
> > > About implementing usenet with reasonably
> > > modern runtimes and an eye toward
> > > unlimited retention, basically looking
> > > into "microtasks" for the routine or
> > > workflow instances, as are driven with
> > > non-blocking I/O throughout, basically
> > > looking to memoize the steps as through
> > > a finite state machine, for restarts as
> > > of a thread, then to go from "service
> > > oriented" to "message oriented".
> > >
> > >
> > > This involves writing a bit of an
> > > HTTP client for rather usual web
> > > service calls, but with high speed
> > > non-blocking I/O (less threads, more
> > > connections). Also this involves a
> > > sufficient abstraction.
> >
> >
> >
> > This writing some software for usenet service
> > is coming along with the idea of how to implement
> > the fundamentally asynchronous non-blocking routine.
> > This is crystallizing in pattern as a: re-routine,
> > in reference to computing's usual: co-routine.
> >
> > The idea of the re-routine is that there are only
> > so many workers, threads, of the runtime. The usual
> > runtimes (and this one, Java, say) support preemptive
> > multithreading as a means of implementing cooperative
> > multithreading, with the maintenance of separate stacks
> > (of, the stack machine of usual C-like procedural runtimes)
> > and some thread-per-connection model. This is somewhat
> > reasonable for the composition of blocking APIs, but
> > not so much for the composition of non-blocking APIs
> > and about how to not have many thread-per-connection
> > resources with essentially zero duty cycle that instead
> > could maintain for themselves the state machine of their
> > routine (with simplified forward states and a general
> > exception and error routine), for cooperative multi-threading.
> >
> > The idea of this re-routine then is to connect functions,
> > there's a scope for variables in the scope, there is
> > execution of the functions (or here the routines, as
> > the "re-routines") then the instance of the re-routine
> > is re-entrant in the sense that as partial results are
> > accumulated the trace of the routine is marked out, with
> > leaving in the scope the current or partial or intermediate
> > results. Then, the asynchronous workers that fulfill each
> > routine (eg, with a lookup, a system call, or a network
> > call) are separate worker units dedicated to their domain
> > (of the routine, not the re-routine, and they can be blocking,
> > polling for their fleet, or callback with the ticket).
> >
> > Then, this is basically a network machine and protocol,
> > here about NNTP and IMAP, and its resources are often
> > then of network machines and protocols (eg networked
> > file systems, web services). Then, these "machines"
> > of the "re-routine" being built (basically for the
> > streaming model instead of the batch model if you
> > know what I'm talking about) defining the logical
> > outcomes of the composition of the inputs and the
> > resulting outputs in terms of scopes as a model of
> > the cooperative multithreading, these re-routines
> > then are seeing for the pattern then that the
> > source template is about implicitly establishing
> > the scope and the passing and calling convention
> > (without a bunch of boilerplate or "callback confusion",
> > "async hell"). This is where the re-routine, when
> > a routine worker fills in a partial result and resubmits
> > the re-routine (with the responsibility/ownership of
> > the re-routine) that it is re-evaluated from the beginning,
> > because it is constant linear in reading forward for the
> > item the state of its overall routine, thusly implicit
> > without having to build a state machine, as it is
> > declaratively the routine.
> >
> > So, I am looking at this as my solution as to how to
> > establish a very efficient (in resource and performance
> > terms) formally correct protocol implementation (and
> > with very simple declarative semantics of usual forward,
> > linear routines).
> >
> > This "re-routine" pattern then as a model of cooperative
> > multithreading sees the complexity and work into the
> > catalog of blocking, polling, and callback support,
> > then for usual resource injection of those as all
> > supported with references to usual sequential processes
> > (composition of routine).
>
> I've about sorted out how to implement the re-routine.
>
> Basically a re-routine is a suspendable composite
> operation, with normal declarative flow-of-control
> syntax, that memo-izes its partial results, and
> re-executes the same block of statements then to
> arrive at its pause, completion, or exit.
>
> Then, the command and executor are passed to the
> implementation that has its own (or maybe the
> same) execution resources, eg a thread or connection
> pool. This resolves the value of the asynchronous
> operation, and then re-submits the re-routine to
> its originating executor. The re-routine re-runs
> (it runs through the branching or flow-of-control
> each time, but that's small in the linear and all
> the intermediate products are already computed,
> and the syntax is usual and in the language).
> The re-routine then either re-suspends (as it
> launches the next task) or completes or exits (errors).
> Whether it suspends, completes or exits, the
> re-routine just returns, and the executor then
> is specialized and just checks the re-routine
> whether it's suspended (and just drops it, the
> new responsible launched will re-submit it),
> or whether it's completed or errored (to call
> back to the originating commander the result of
> the command).
>
>
> In this manner, it seems like a neat way to basically
> establish the continuation, for this "non-blocking
> asynchronous operation", while at the same time
> the branching and flow of control is all in the
> language, with the usual un-suprising syntax and
> semantics, for cooperative multi-threading. The
> cost is in wrapping the functional callers of the
> routine and setting up their factories and otherwise
> as via injection (and they can block the calling
> thread, or have their own threads and block, or
> be asynchronous, without changing the definition
> of the routine).
>
> So, having sorted this mostly out, then the usual
> work as of implementing the routines for the protocol
> can so proceed then with a usual notion of a framework
> of support for both the simple declaration of routine
> and the high performance (and low resource usage) of
> the delegation of routine, and support for injection
> for test and environment, and all in the language
> with minimal clutter, no byte-code modification,
> and a ready wrapper for libraries of arbitrary
> run-time characteristic.
>
> This solves some problems.

Not _too_ much progress; it has basically seen the adaptation
of this re-routine pattern to the command implementations,
with basically the usual linear procedural logic and then the
automatic and agnostic composition of the asynchronous
tasks in the usual declarative syntax, so that the
pooled (and to-be-metered) threads are possibly by
design entirely non-blocking and asynchronous, and
possibly by design blocking or otherwise agnostic of
implementation, with the design of the state
machine of the routine then as "eventually consistent"
or forward, making efficient use of the computational
and synchronization resources.

The next part has been about implementing a client "machine"
as complement to the server "machine", where a machine here
is an assembly, as it were, of threads and executors around
the "reactive" (or functional, event-driven) handling of the
abstract system resources (small pojos, file names, and
linked lists of 4K buffers). The server basically starts
up listening on a port, then accepts and starts a session
for any connection; a reader then fills and moves buffers
to each of the connections' sessions and signals the
relay for the scanning of the inputs, then for composing
the commands and executing them as these re-routines; as
they complete, the results of the commands are
printed out to buffers (eg, encoded, compressed, encrypted),
and the writer sends that back on the wire. The client
machine then is basically a model of asynchronous and
probably serial computation, or a "web service call", these
days often and probably on pooled HTTP connections. This
then is pretty simple, with the callbacks and the addressing/
routing of the response back to the re-routine's executor,
which then re-submits the re-routine to completion.
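
A sketch of that client-machine step, assuming a Java 11+
HttpClient for the non-blocking web-service call; the URL is
made up, and the only point shown is the routing of the
response back onto the originating executor, where the
suspended re-routine would be re-submitted.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Non-blocking call whose continuation is routed back onto the
// originating executor rather than the HTTP client's own threads.
class ClientMachine {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService originating = Executors.newSingleThreadExecutor();
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("https://example.org/articles/12345")).build();

        client.sendAsync(request, HttpResponse.BodyHandlers.ofString())
              // The continuation runs on the originating executor, which is
              // where the suspended re-routine would be re-submitted.
              .thenAcceptAsync(response ->
                      System.out.println("status " + response.statusCode()),
                      originating);

        Thread.sleep(2000);        // crude wait for the demo, then shut down
        originating.shutdown();
    }
}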

I've been looking at other examples of continuations, the
"reactive" programming or these days' "streaming model"
(where the challenge is much in the aggregations), where
otherwise non-blocking or asynchronous programming is
often rather ... recursively ... rolled out. Here the
re-routine gains even though the flow-of-control is
re-executed over the memoized contents of the re-routines
as they are so composed declaratively: this makes
what would be "linear" at worst "n squared", but only
in how many commands there are in the procedure,
not compounded over their execution, because all the
intermediate results are memoized (as needed, because
if the implementation is local or a mock instead, the
re-routine is agnostic of asynchronicity and just runs
through linearly; the relevant point is that the
number of composable units is a small constant, thus
its square is a small constant, particularly
as this is otherwise a free model of cooperative multi-
threading, here toward a lock-free design). All the
live objects remain on the heap, but just the objects,
and not, for example, the stack as a serialized continuation.
(This could work out to singleton literals or "coding",
but basically it will have to auto-throttle off heap-max.)
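
A small worked illustration of that cost argument: with k
steps, the body is re-entered at most k times, the re-reads
are cheap memo lookups (about k squared over 2 of them), and
each expensive step still runs exactly once. The simulation
below is only illustrative.

import java.util.HashMap;
import java.util.Map;

// Each pass stops at the first un-memoized step, "launches" it, and the
// next pass resumes from the top; memoized steps are only cheap lookups.
class ReRunCost {
    public static void main(String[] args) {
        int k = 4;
        Map<Integer, Integer> memo = new HashMap<>();
        int reEntries = 0, memoReads = 0, expensiveRuns = 0;

        while (memo.size() < k) {
            reEntries++;
            for (int step = 0; step < k; step++) {
                if (memo.containsKey(step)) { memoReads++; continue; }
                expensiveRuns++;
                memo.put(step, step * step);   // stand-in for the real work
                break;                          // suspend until next pass
            }
        }
        System.out.printf("re-entries=%d memo reads=%d expensive runs=%d%n",
                reEntries, memoReads, expensiveRuns);
        // prints: re-entries=4 memo reads=6 expensive runs=4
    }
}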

So, shuffling and juggling the identifiers and organizations
around, and sifting and sorting which elements of the standard
concurrency and functional libraries (of the "Java" language)
to settle on for the usual neat and concise (and re-usable and
temporally agnostic) declarative flow-of-control (i.e., with
"Future"s everywhere, with reasonable or least-surprising
semantics, if any, and with usual and plain code also being
"in the convention"), it is settling on a style.

Well, thanks for reading; it's a rather stream-of-consciousness
narrative, here about the design of pretty re-usable software.
Julio Di Egidio
2017-02-07 09:05:42 UTC
Reply
Permalink
On Tuesday, February 7, 2017 at 9:16:14 AM UTC+1, Ross A. Finlayson wrote:

> Not _too_ much progress, has basically seen the adaptation
> of this re-routine pattern to the command implementations,

I do not understand what you are trying to achieve here. As far as Usenet the
protocol goes, it is fine per se, and the technical problem at least is already
solved, i.e. there is plenty of Usenet server software available... OTOH, the
"problem with Usenet" such that one would want to build an entirely new network
seems to me to be more of a socio-cybernetic kind, so I'd rather find it
interesting to discuss, say, the merits but also the limitations of moderation
as an approach, and maybe even what better could be done. But, again, the
technical problem is not really a problem; in fact that is the easy part....

(Also, I do not see why discuss this in sci.math. Maybe comp.ai.philosophy, as
for collective intelligence?)

Julio
Ross A. Finlayson
2017-02-07 19:18:02 UTC
Reply
Permalink
On Tuesday, February 7, 2017 at 9:05:47 AM UTC, Julio Di Egidio wrote:
> On Tuesday, February 7, 2017 at 9:16:14 AM UTC+1, Ross A. Finlayson wrote:
>
> > Not _too_ much progress, has basically seen the adaptation
> > of this re-routine pattern to the command implementations,
>
> I do not understand what you are trying to achieve here. As long as Usenet the
> protocol is fine per se, the technical problem at least is already solved, i.e.
> there is plenty of Usenet server software available... OTOH, the "problem with
> Usenet" such that one would want to build an entirely new network seems to me
> is more of a socio-cybernetic kind, so I'd rather find interesting discussing,
> say, the merits but also the limitations of moderation as an approach, and maybe
> even what better could be done. But, again, the technical problem is not really
> a problem, in fact that is the easy part....
>
> (Also, I do not see why discuss this in sci.math. Maybe comp.ai.philosophy, as
> for collective intelligence?)
>
> Julio

Sure, I'll limit this.

There is plenty of usenet server software, but it is mostly
INND or BNews/CNews, or a few commercial cousins. The design
of those systems is tied to various economies that don't so much
apply these days. (The use-case, of durable distributed message-
passing, is still quite relevant, and there are many ecosystems
and regimes, small and large, around it.) In these days of managed
commodity network and compute resources, or "cloud computing", and
given the requirements above, a modernization is relevant, and,
for some developers with the skills, not so distant.

Another point is that the eventual goal is archival: my goal isn't
to start an offshoot, but instead to build the system as a working
model of an archive, basically from the author's view as a working
store for extracting material, and from the developer's view as
an example in design with low or no required maintenance and
"scalable" operation for a long time.


You mention comp.ai.philosophy; these days there's a lot more
automated reasoning (or, mockingbird generators), as computing
and development afford more and different forms of automated
reasoning. Here again the point is for an archival setting to
give them something to read.

Thanks, then, I'll limit this.
Julio Di Egidio
2017-02-09 07:00:20 UTC
Reply
Permalink
On Tuesday, February 7, 2017 at 8:18:07 PM UTC+1, Ross A. Finlayson wrote:

> There is plenty of usenet server software, but it is mostly
> INND or BNews/CNews, or a few commercial cousins.

There is plenty of free and open news server software:
<https://www.dmoz.org/Computers/Software/Internet/Servers/Usenet>
<https://en.wikipedia.org/wiki/News_server>

> Another point is that the eventual goal is archival, my goal isn't
> to start an offshoot, instead to build the system as a working
> model of an archive, basically from the author's view as a working
> store for extracting material,

I'd have qualms as to what the degree-zero is, namely, I'd think more of hyper-
texts hence a Wiki (or, in the larger, the web itself) as the basic structure.
OTOH, Usenet is a conversational model, for discussions, not even forums.

Regardless, even at that most basic level, you already face the fundamental
problem of the "quality" of the content (for some to-be-properly-defined notion
of quality). For one thing, consider that garbage is garbage even under the
best microscope...

> You mention comp.ai.philosophy, these days there's a lot more

I mentioned comp.ai.philosophy partly because I do not have a better reference,
partly because, for how basic you want to keep it (and I am all for building
incrementally), I would think it is only considerations at that level that can
provide the fundamental requirements.

Julio
Ross A. Finlayson
2017-03-21 23:10:09 UTC
Reply
Permalink
On Tuesday, February 7, 2017 at 12:16:14 AM UTC-8, Ross A. Finlayson wrote:
> On Saturday, January 21, 2017 at 10:33:23 PM UTC, Ross A. Finlayson wrote:
> > On Friday, January 6, 2017 at 1:57:00 PM UTC-8, Ross A. Finlayson wrote:
> > > On Friday, December 23, 2016 at 10:21:16 PM UTC-8, Ross A. Finlayson wrote:
> > > > About implementing usenet with reasonably
> > > > modern runtimes and an eye toward
> > > > unlimited retention, basically looking
> > > > into "microtasks" for the routine or
> > > > workflow instances, as are driven with
> > > > non-blocking I/O throughout, basically
> > > > looking to memoize the steps as through
> > > > a finite state machine, for restarts as
> > > > of a thread, then to go from "service
> > > > oriented" to "message oriented".
> > > >
> > > >
> > > > This involves writing a bit of an
> > > > HTTP client for rather usual web
> > > > service calls, but with high speed
> > > > non-blocking I/O (less threads, more
> > > > connections). Also this involves a
> > > > sufficient abstraction.
> > >
> > >
> > >
> > > This writing some software for usenet service
> > > is coming along with the idea of how to implement
> > > the fundamentally asynchronous non-blocking routine.
> > > This is crystallizing in pattern as a: re-routine,
> > > in reference to computing's usual: co-routine.
> > >
> > > The idea of the re-routine is that there are only
> > > so many workers, threads, of the runtime. The usual
> > > runtimes (and this one, Java, say) support preemptive
> > > multithreading as a means of implementing cooperative
> > > multithreading, with the maintenance of separate stacks
> > > (of, the stack machine of usual C-like procedural runtimes)
> > > and some thread-per-connection model. This is somewhat
> > > reasonable for the composition of blocking APIs, but
> > > not so much for the composition of non-blocking APIs
> > > and about how to not have many thread-per-connection
> > > resources with essentially zero duty cycle that instead
> > > could maintain for themselves the state machine of their
> > > routine (with simplified forward states and a general
> > > exception and error routine), for cooperative multi-threading.
> > >
> > > The idea of this re-routine then is to connect functions,
> > > there's a scope for variables in the scope, there is
> > > execution of the functions (or here the routines, as
> > > the "re-routines") then the instance of the re-routine
> > > is re-entrant in the sense that as partial results are
> > > accumulated the trace of the routine is marked out, with
> > > leaving in the scope the current or partial or intermediate
> > > results. Then, the asynchronous workers that fulfill each
> > > routine (eg, with a lookup, a system call, or a network
> > > call) are separate worker units dedicated to their domain
> > > (of the routine, not the re-routine, and they can be blocking,
> > > polling for their fleet, or callback with the ticket).
> > >
> > > Then, this is basically a network machine and protocol,
> > > here about NNTP and IMAP, and its resources are often
> > > then of network machines and protocols (eg networked
> > > file systems, web services). Then, these "machines"
> > > of the "re-routine" being built (basically for the
> > > streaming model instead of the batch model if you
> > > know what I'm talking about) defining the logical
> > > outcomes of the composition of the inputs and the
> > > resulting outputs in terms of scopes as a model of
> > > the cooperative multithreading, these re-routines
> > > then are seeing for the pattern then that the
> > > source template is about implicitly establishing
> > > the scope and the passing and calling convention
> > > (without a bunch of boilerplate or "callback confusion",
> > > "async hell"). This is where the re-routine, when
> > > a routine worker fills in a partial result and resubmits
> > > the re-routine (with the responsibility/ownership of
> > > the re-routine) that it is re-evaluated from the beginning,
> > > because it is constant linear in reading forward for the
> > > item the state of its overall routine, thusly implicit
> > > without having to build a state machine, as it is
> > > declaratively the routine.
> > >
> > > So, I am looking at this as my solution as to how to
> > > establish a very efficient (in resource and performance
> > > terms) formally correct protocol implementation (and
> > > with very simple declarative semantics of usual forward,
> > > linear routines).
> > >
> > > This "re-routine" pattern then as a model of cooperative
> > > multithreading sees the complexity and work into the
> > > catalog of blocking, polling, and callback support,
> > > then for usual resource injection of those as all
> > > supported with references to usual sequential processes
> > > (composition of routine).
> >
> > I've about sorted out how to implement the re-routine.
> >
> > Basically a re-routine is a suspendable composite
> > operation, with normal declarative flow-of-control
> > syntax, that memo-izes its partial results, and
> > re-executes the same block of statements then to
> > arrive at its pause, completion, or exit.
> >
> > Then, the command and executor are passed to the
> > implementation that has its own (or maybe the
> > same) execution resources, eg a thread or connection
> > pool. This resolves the value of the asynchronous
> > operation, and then re-submits the re-routine to
> > its originating executor. The re-routine re-runs
> > (it runs through the branching or flow-of-control
> > each time, but that's small in the linear and all
> > the intermediate products are already computed,
> > and the syntax is usual and in the language).
> > The re-routine then either re-suspends (as it
> > launches the next task) or completes or exits (errors).
> > Whether it suspends, completes or exits, the
> > re-routine just returns, and the executor then
> > is specialized and just checks the re-routine
> > whether it's suspended (and just drops it, the
> > new responsible launched will re-submit it),
> > or whether it's completed or errored (to call
> > back to the originating commander the result of
> > the command).
> >
> >
> > In this manner, it seems like a neat way to basically
> > establish the continuation, for this "non-blocking
> > asynchronous operation", while at the same time
> > the branching and flow of control is all in the
> > language, with the usual un-suprising syntax and
> > semantics, for cooperative multi-threading. The
> > cost is in wrapping the functional callers of the
> > routine and setting up their factories and otherwise
> > as via injection (and they can block the calling
> > thread, or have their own threads and block, or
> > be asynchronous, without changing the definition
> > of the routine).
> >
> > So, having sorted this mostly out, then the usual
> > work as of implementing the routines for the protocol
> > can so proceed then with a usual notion of a framework
> > of support for both the simple declaration of routine
> > and the high performance (and low resource usage) of
> > the delegation of routine, and support for injection
> > for test and environment, and all in the language
> > with minimal clutter, no byte-code modification,
> > and a ready wrapper for libraries of arbitrary
> > run-time characteristic.
> >
> > This solves some problems.
>
> Not _too_ much progress, has basically seen the adaptation
> of this re-routine pattern to the command implementations,
> with basically usual linear procedural logic then the
> automatic and agnostic composition of the asynchronous
> tasks in the usual declarative syntax that then the
> pooled (and to be metered) threads are possibly by
> design entirely non-blocking and asynchronous, and
> possibly by design blocking or otherwise agnostic of
> implementation, with then the design of the state
> machine of the routine as "eventually consistent"
> or forward and making efficient use of the computational
> and synchronization resources.
>
> The next part has been about implementing a client "machine"
> as complement to the server "machine", where a machine here
> is an assembly as it were of threads and executors about the
> "reactive" (or functional, event-driven) handling of the
> abstract system resources (small pojos, file name, and
> linked lists of 4K buffers). The server basically starts
> up listening on a port then accepts and starts a session
> for any connection and then a reader fills and moves buffers
> to each of the sessions of the connections, and signals the
> relay then for the scanning of the inputs and then composing
> the commands and executing those as these re-routines, that
> as they complete, then the results of the commands are then
> printed out to buffers (eg, encoded, compressed, encrypted)
> then the writer sends that back on the wire. The client
> machine then is basically a model of asynchronous and
> probably serial computation or a "web service call", these
> days often and probably on a pooled HTTP connections. This
> then is pretty simple with the callbacks and the addressing/
> routing of the response back to the re-routine's executor
> to then re-submit the re-routine to completion.
>
> I've been looking at other examples of continuations, the
> "reactive" programming or these days' "streaming model"
> (where the challenge is much in the aggregations), that
> otherwise non-blocking or asynchronous programming is
> often rather ... recursively ... rolled out where this
> re-routine gains even though the flow-of-control is
> re-executed over the memoized contents of the re-routines
> as they are so composed declaratively, that this makes
> what would be "linear" at worst "n squared", but that is
> only on how many commands there are in the procedure,
> not combined over their execution because all the
> intermediate results are memoized (as needed, because
> if the implementation is local or a mock instead, the
> re-routine is agnostic of asychronicity and just runs
> through linearly, but the relevant point is that the
> number of composable units is a small constant thus
> that it's square is a small constant, particularly
> as otherwise being a free model of cooperative multi-
> threading, here toward a lock-free design). All the
> live objects remain on the heap, but just the objects
> and not for example the stack as a serialized continuation.
> (This could work out to singleton literals or "coding"
> but basically it will have to auto-throttle off heap-max.)
>
> So, shuffling and juggling the identifiers and organizations
> around and sifting and sorting what elements of the standard
> concurrency and functional libraries (of, the "Java" language)
> to settle on for usual neat and concise (and re-usable and
> temporally agnostic) declarative flow-of-control (i.e., with
> "Future"'s everywhere and as about reasonable or least-surprising
> semantics, if any, with usual and plain code also being "in
> the convention"), then it is settling on a style.
>
> Well, thanks for reading, it's a rather stream-of-consciousness
> narrative, here about the design of pretty re-usable software.

I continued tapping away at this.

The re-routines now sit behind a module or domain definition.
This basically defines the modules' value types, like session,
message, article, group, content, and wildmat. It also defines
a service layer about the relations of the elements of the
domain, so that the otherwise simple value types have natural
methods relating them, all implemented behind a service layer
that, being implemented with these re-routines, is agnostic of
synchronous or asynchronous convention, and is non-blocking
throughout with cooperative multithreading. This has a
factory-of-factories or "industry" pattern that provides the
object graph wiring and dynamic proxying to the routine
implementations, which are then defined as traits that the
re-routine composes as mixins (of the domain's services).

(This is all "in the language" in Java, with no external dependencies.)

The transport mechanism basically abstracts the attachment of a
usual non-blocking I/O framework for the transport types, as of
scattering/gathering or vector I/O, as the interface between
transport and protocol (here NNTP, but generally). Basically, in
a land of 4K byte buffers, those are fed from the Reader/Writer
that is the endpoint to a Feeder/Scanner that is implemented for
the protocol and the usual features like encryption and
compression, making Commands and Results out of those (and
modelling transactions or command sequences as state machines,
which are otherwise absent), those systolically carrying out as
primitive or transport types to a Printer/Hopper, which also
writes the response (or rather, consumes the buffers in a highly
concurrent, highly efficient event and selection hammering).
The selector is another bounded resource, so the
SelectorAssignment is configurable and there might be a thread
for each group of selectors about FD_SETSIZE; that's not really
at issue since select went to epoll, but it provides an option
for that eventuality.
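
For the transport side, a minimal sketch of that kind of selector
loop in plain java.nio (a single selector thread, a made-up port,
and the hand-off to the session's Feeder/Scanner left as a comment):

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.util.Iterator;

// One selector thread feeding 4K buffers from the wire toward a scanner.
public class TransportLoop {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(11119));           // hypothetical test port
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        ByteBuffer buffer = ByteBuffer.allocateDirect(4096); // the 4K unit
        while (true) {
            selector.select();
            Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
            while (keys.hasNext()) {
                SelectionKey key = keys.next();
                keys.remove();
                if (key.isAcceptable()) {
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    SocketChannel client = (SocketChannel) key.channel();
                    buffer.clear();
                    int n = client.read(buffer);
                    if (n < 0) { key.cancel(); client.close(); continue; }
                    buffer.flip();
                    // here the buffer would be handed to the session's Feeder/Scanner
                }
            }
        }
    }
}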

The transport and protocol routines are pretty well decoupled this
way, and then the protocol domain, modules, and routines are as
well so decoupled (and fall together pretty naturally), much using
quite usual software design patterns (if not necessarily so formally,
quite directly).

The protocol (here NNTP) is then basically in a few files:
detailing the semantics of the commands to the scanner as
overriding methods of a Command class, and implementing the
action in the domain by extending the TraitedReRoutine, for a
single definition in the NNTP domain that is implemented in
various modules or as collections of services.
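
Roughly, the shape of such a command declaration (hypothetical
names here, not the actual classes):

// Hypothetical shape of the per-command declaration: the scanner learns
// the syntax from the Command subclass, and the action is a linear routine
// over the domain services (names are illustrative, not the real code).
abstract class Command {
    abstract String name();                       // e.g. "GROUP"
    abstract String execute(String arguments);    // runs as a re-routine step
}

final class GroupCommand extends Command {
    String name() { return "GROUP"; }
    String execute(String arguments) {
        // would call the domain's group service; a canned reply here
        return "211 0 0 0 " + arguments.trim();
    }
}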
Ross A. Finlayson
2017-04-10 03:20:45 UTC
Reply
Permalink
On Tuesday, March 21, 2017 at 4:10:21 PM UTC-7, Ross A. Finlayson wrote:
> [...]


I'm still tapping away at this, if rather more slowly (or, more sporadically).

The "re-routine" async completion pattern is more than less
figured out (toward high concurrency as a model of cooperative
multi-threading, behind also a pattern of a domain layer, with mix-in
nyms that is also some factory logic); a simple non-blocking I/O socket
service routine is more than less figured out (the server, not the client,
toward again high concurrency and flexible, efficient use of machine
or virtualized resources as they are); the commands and their bodies are
pretty much typed up; and then I've been trying to figure out some data
structures, basically in I/O (Input/Output), or here mostly throughput,
as it is about the streams.

I/O datum FIFOs and holders:

buffer queue
handles queue
buffer+handles queue
buffer/buffer[] or buffer[]/buffer in loops
byte[]/byte[] in steps
Input/Output in Streams

Basically any of the filters or adapters is specialized to these input/output
data holders. Then, there are logically as many queues or FIFOs as there
really are, implicitly, between any communicating sequential processes that
are rate-limited or otherwise non-systolic ("real-time"); here that suggests
some ideas about data structures, as either implementing or adapting
unbounded single-producer/single-consumer (SPSC) queues.

One idea is making the linked container with sentinel nodes
and otherwise making it thread-safe (for a single producer and single
consumer). This is where the queue (or, "monohydra" or "slique") is
rather generally a container, and here iterations usually consume
the queue, but sometimes there are aggregates collected that then
go over the queue. The idea then is that the producer and consumer
have separate views of the queue: the producer does an atomic swap
on the tail of the queue, and a consumer's iterator of elements (as
an iterable and not just a queue, for using the queue as a holder and
not just a FIFO) returns a marker to the end of the iteration, for
example in computing bounds over the buffers, then re-iterating and
flipping the buffers, then given the bounds moving the buffers'
references to an output array, thus consuming the FIFO.
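
A minimal sketch of that SPSC idea, assuming exactly one producer
thread and one consumer thread ("Slique" is just an illustrative
name here):

import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Unbounded SPSC linked queue: the producer swaps the tail atomically,
// and the consumer drains up to the tail it observed (a marker for the
// end of this iteration).
final class Slique<T> {
    private static final class Node<T> {
        final T item;
        volatile Node<T> next;
        Node(T item) { this.item = item; }
    }

    private final Node<T> sentinel = new Node<>(null);
    private final AtomicReference<Node<T>> tail = new AtomicReference<>(sentinel);
    private Node<T> head = sentinel;          // consumer-private cursor

    /** Producer side: append one element (atomic swap on the tail). */
    void add(T item) {
        Node<T> node = new Node<>(item);
        Node<T> prev = tail.getAndSet(node);  // claim the new tail
        prev.next = node;                     // link it in; consumer sees it via next
    }

    /** Consumer side: drain everything appended up to the observed tail. */
    void drainTo(Consumer<T> sink) {
        Node<T> mark = tail.get();            // end-of-iteration marker
        while (head != mark) {
            Node<T> next;
            while ((next = head.next) == null) { Thread.onSpinWait(); } // link may lag the swap
            head = next;
            sink.accept(head.item);
        }
    }
}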

This then combines with the tasks: the tasks driving the I/O (as events
drive the tasks) are basically constant tasks or runnables (constant to the
session or attachment) that just increment a count of times to run, so
that there's always a service of the FIFO after the atomic append.
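
Sketched, that constant task per session might look like this (the
executor and the service pass are whatever the session already has;
the names are illustrative):

import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicInteger;

// Appends bump a counter, and the same runnable is (re)submitted so
// there is always one service pass pending after an atomic append,
// without a thread parked per connection.
final class SessionTask implements Runnable {
    private final AtomicInteger pending = new AtomicInteger();
    private final Executor executor;
    private final Runnable servicePass;      // e.g. drain the slique and parse

    SessionTask(Executor executor, Runnable servicePass) {
        this.executor = executor;
        this.servicePass = servicePass;
    }

    /** Called by the producer right after appending to the queue. */
    void signal() {
        if (pending.getAndIncrement() == 0) {
            executor.execute(this);           // first signal schedules the task
        }
    }

    @Override public void run() {
        int n;
        do {
            n = pending.get();
            servicePass.run();                // serve everything seen so far
        } while (pending.addAndGet(-n) != 0); // more signals arrived: go again
    }
}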

Another idea is this hybrid or serial mix-and-match (SPSC FIFO) of buffers
and handles. This is where the buffer is the data in-line and the handle is
a reference to the data. This is about passing through the handles where
the channels support their transfer, and converting them to inline data
where they don't. That's then about all the combined cases as the above
I/O datum FIFOs and holders, with adapting them so the filter chain blasts
(eg a specialized operation), loops (transferring in and out of buffers), steps
(statefully filling and levelling data), or moves (copying the references, the
data in or out or on or off, then to perform the I/O operations) over them.

It seems rather simpler to just adapt the data types to the boundary I/O data
types, which are byte buffers (here 4K pooled memory buffers), and for
that the domain shouldn't know concrete types so much as interfaces; but
the buffers and handles (file handles) and arrays as they are are pretty much
fungible to the serialization of the elements of the domain, which can then
specialize how they build the logical inputs and outputs of the commands.
b***@gmail.com
2017-04-10 12:17:55 UTC
Reply
Permalink
You could use Camel.

Camel is a rule-based routing and mediation engine that provides an
object-based implementation of the Enterprise Integration Patterns,
using an application programming interface (or declarative domain-
specific language) to configure routing and mediation rules.

Its name is derived from the camel humps, since the packets
might take flippy-floppy routes. It also provides automatic
integration of the Gamma Functions, so that Archie's post could
be automatically verified as to whether he computes the factorial
correctly.
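
For illustration, a Camel route in its Java DSL reads roughly like
this (the endpoints here are invented, not a worked integration):

import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.impl.DefaultCamelContext;

public class ArticleRoute {
    public static void main(String[] args) throws Exception {
        DefaultCamelContext context = new DefaultCamelContext();
        context.addRoutes(new RouteBuilder() {
            @Override
            public void configure() {
                from("file:articles/in")                   // hypothetical drop directory
                    .filter(body().contains("sci.math"))   // keep only sci.math articles
                    .to("file:articles/out");              // hand off to the store
            }
        });
        context.start();
        Thread.sleep(10_000);                              // let the route run briefly
        context.stop();
    }
}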

On Monday, April 10, 2017 at 05:20:50 UTC+2, Ross A. Finlayson wrote:
> On Tuesday, March 21, 2017 at 4:10:21 PM UTC-7, Ross A. Finlayson wrote:
Ross A. Finlayson
2017-05-19 05:40:16 UTC
Reply
Permalink
On Monday, April 10, 2017 at 5:18:09 AM UTC-7, ***@gmail.com wrote:
> You could use camel.
>
> Camel is a rule-based routing and mediation engine that provides a
> object-based implementation of the Enterprise Integration Patterns
> using an application programming interface (or declarative domain-
> specific language) to configure routing and mediation rules.
>
> Its name is derived from the camel humps, since the pakets
> might take flippy-floppy routes. It also provides automatic
> integration of the Gamma Functions, so that Archies post could
> be automatically verified whether he
>
> computes the factorial correctly.
>
> Am Montag, 10. April 2017 05:20:50 UTC+2 schrieb Ross A. Finlayson:
> > On Tuesday, March 21, 2017 at 4:10:21 PM UTC-7, Ross A. Finlayson wrote:

I haven't much worked on this.
b***@gmail.com
2017-07-16 19:35:04 UTC
Reply
Permalink
Doing an NNTP server could be complicated; on the other
hand, what to do seems to be specified, so you might not
need to invent much by yourself. You could start with a
WILDMAT matcher (a sketch follows the example below); the
rest doesn't look so difficult:

[C] NEWNEWS news.*,sci.* 19990624 000000 GMT
[S] 230 list of new articles by message-id follows
[S] <***@example.com>
[S] <***@example.com>
[S] .
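
A minimal sketch of such a wildmat-style matcher, covering only
'*', '?', and comma-separated patterns with '!' negation (not the
full RFC 3977 grammar):

// Comma-separated patterns, later patterns override earlier ones,
// '!' negates, '*' and '?' are wildcards. Only a subset of wildmat.
public class Wildmat {
    public static boolean matches(String wildmat, String name) {
        boolean matched = false;
        for (String pattern : wildmat.split(",")) {
            boolean negate = pattern.startsWith("!");
            String p = negate ? pattern.substring(1) : pattern;
            if (glob(p, name)) {
                matched = !negate;   // last matching pattern wins
            }
        }
        return matched;
    }

    // Straightforward recursive glob over '*' and '?'.
    private static boolean glob(String p, String s) {
        if (p.isEmpty()) return s.isEmpty();
        char c = p.charAt(0);
        if (c == '*') {
            for (int i = 0; i <= s.length(); i++) {
                if (glob(p.substring(1), s.substring(i))) return true;
            }
            return false;
        }
        return !s.isEmpty() && (c == '?' || c == s.charAt(0))
                && glob(p.substring(1), s.substring(1));
    }

    public static void main(String[] args) {
        System.out.println(matches("news.*,sci.*", "sci.math"));      // true
        System.out.println(matches("sci.*,!sci.logic", "sci.logic")); // false
    }
}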

Except for the storage & concurrency problem...
https://tools.ietf.org/html/rfc3977#section-7.4
But the vocabulary of the server should be reflected
somewhere, so that we can do the next step:

Call by Meaning Hesam Samimi, Chris Deaton,
Yoshiki Ohshima, Alessandro Warth, Todd Millstein
VPRI Technical Report TR-2014-003
http://www.vpri.org/pdf/tr2014003_callbymeaning.pdf

The goal would be to have a server that we could
ask "when did BKK the last time hold a torch for
Pythagoras" or ask "is it the first time that AP
reinvents Maxwell equations". Etc...

On Friday, May 19, 2017 at 07:40:27 UTC+2, Ross A. Finlayson wrote:
> I haven't much worked on this.
Ross A. Finlayson
2017-07-16 22:10:29 UTC
Reply
Permalink
On Sunday, July 16, 2017 at 12:35:17 PM UTC-7, ***@gmail.com wrote:
> [...]

Implementing search is rather a challenge.

Besides accepter/rejector and usual notions of matching
(eg the superscalar on closed categories), find and query
seem to be where, besides usual notions of object hashes
as indices, there are to be built up from the accepter/
rejector all sorts of indices, as do/don't/don't-matter the
machines of the accepters and rejectors, vis-a-vis going
over the input data and the corpus and finding relations
(to the input, or here the space of inputs) of the corpus.

That's where, after finding an event for AP, it matters whether
you're interested in the next one for him or the first
for someone else. There are quite various ways to
achieve those quite various goals, besides computing
the first goal. Just as an example, that's the first
reasonable AP Maxwell equation (or reference), or, for
everybody else, who knows about the Maxwell
equation(s).

Search is a challenge; NNTP rather puts it off to IMAP, first
for free-text search, then for the concept search or
"call by meaning" you reference, basically refining
estimates of the scope of what it takes to find out
what that is.

Then for events in time-series data there's a usual general
model for things as they occur. That could be rather
rich, and there causal is separate from associative
(though of course causality is associative).

With the idea of NNTP as a corpus, a usual line
for establishing tractability of search is to associate
its contents to some document and semantic model, i.e.,
then to generate and maintain that, besides otherwise
that the individual items or posts, and their references
in the meta-data besides the data, are made tractable
for general ideas of things.

I'm to get to this; the re-routine particularly amuses
me as a programming idiom in the design of a more-or-less
detached service routine from the corpus, then about
what body of data more-than-less naturally results,
with rather default and usual semantics.


Such "natural language" meaning as can be compiled for
efficiency to the very direct in storage and reference,
almost then asks "what will AP come up with, next".
b***@gmail.com
2017-07-16 22:24:50 UTC
Reply
Permalink
The search could be a nice benchmark for next
generation commodity i9 CPUs with 20 Logical Cores:
http://ark.intel.com/products/123613

The key for search is a nice index; I recently
experimented with an n-gram index, though I did only trigrams.
If your text contains foobar, you make an inverted
index that lists your document under the following keys:

foo
oob
oba
bar
ar
r

When you search foobar, you look up "foo" and "bar". When
you search a pattern like *ooba*, you look up "oob" and "a".
Works quite well. I dunno what Elasticsearch, Solr, etc.
exactly do; they are open source, but the stuff is still
obfuscated for me. But they are quite popular. They might
do the same, i.e. an index that doesn't need word boundaries,
and that works similarly for Chinese and German, and could
also query mathematical symbols etc. Some confused marketing
guys recently called these text indexes already databases:
"Elasticsearch moved into the top 10 most popular database
management systems":
https://db-engines.com/en/blog_post/70
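
A small sketch of that kind of trigram index (illustrative only;
the query is chunked into trigrams as described above, and
candidates are verified against the stored text):

import java.util.*;

// Every 3-character window of a document is a key pointing back at the
// document; a query is answered by intersecting the posting sets of the
// query's own trigram chunks, then verifying against the text.
public class TrigramIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final List<String> documents = new ArrayList<>();

    public void add(String document) {
        int id = documents.size();
        documents.add(document);
        for (int i = 0; i < document.length(); i++) {   // "foo","oob",... plus short tail grams
            String gram = document.substring(i, Math.min(i + 3, document.length()));
            postings.computeIfAbsent(gram, k -> new HashSet<>()).add(id);
        }
    }

    public List<String> search(String term) {
        Set<Integer> candidates = null;
        for (int i = 0; i < term.length(); i += 3) {    // "foo","bar" for "foobar"
            String gram = term.substring(i, Math.min(i + 3, term.length()));
            Set<Integer> hit = postings.getOrDefault(gram, Collections.emptySet());
            if (candidates == null) candidates = new HashSet<>(hit);
            else candidates.retainAll(hit);
        }
        List<String> results = new ArrayList<>();
        if (candidates != null) {
            for (int id : candidates) {
                if (documents.get(id).contains(term)) results.add(documents.get(id)); // verify
            }
        }
        return results;
    }

    public static void main(String[] args) {
        TrigramIndex index = new TrigramIndex();
        index.add("foobar and friends");
        index.add("completely unrelated");
        System.out.println(index.search("foobar"));    // [foobar and friends]
    }
}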

RF, I noticed you mentioned search in some of your past
posts about a server yourself.

On Monday, July 17, 2017 at 00:12:38 UTC+2, Ross A. Finlayson wrote:
> [...]
Ross A. Finlayson
2020-06-30 04:24:46 UTC
Reply
Permalink
On Sunday, July 16, 2017 at 3:25:00 PM UTC-7, Mostowski Collapse wrote:
> [...]

I haven't much worked on this. The idea of the industry
pattern and of the re-routine makes for quite a bit: simply
the modules, in memory or distributed, and a default
free-threaded machine.

You mentioned search, and for example HTTP is adding the SEARCH verb;
for example, simple associative conditions naturally only combine
and run in parallel. There are of course any number of whatever
HTTP SEARCH implementations one might consider; here usenet's is
rudimentary, where for example IMAP over it is improved, what with
contextual search and content representation.

Information retrieval and pattern recognition and all that is
plenty huge, here that terms define the corpus.

My implementation of the high-performance selector routine,
the networking I/O selector, with this slique I implemented,
runs up fine up to thousands of connections, but it seems
like running the standard I/O and the non-blocking I/O in the
same actual container makes it that I implemented the
selecting, hammering non-blocking I/O toward the 10KC,
though in small blocks because here the messages are small,
then for under what conditions it runs server-class.

With the non-blocking networking I/O, the scanning and parsing
that assembles messages off the I/O (that's after compression
and encryption in the layers; it's implemented in Java and
Java does that), then inside that all the commands in the protocol
have their implementations in the re-routine, all of it
non-blocking itself and free-threaded; this makes sense for
co-operative multithreading, for an efficient server runtime
with here the notion of a durable back-end (or running in memory).
Mostowski Collapse
2020-06-30 17:00:44 UTC
Reply
Permalink
NNTP is not HTTP. I was using bare-metal access to
usenet, not using Google Groups, via:

news.albasani.net, unfortunately dead since Corona

So I was looking for an alternative, and found this
alternative, which seems fine:

news.solani.org

Have Fun!

P.S.: Technical spec of news.solani.org:

Intel Core i7, 4x 3.4 GHz, 32 GB RAM, 500 GB SSD RAID1
Intel Core i7, 4x 3.4 GHz, 32 GB RAM, 2 TB HDD RAID1
Intel Xeon VM, 1.8 GHz, 1 GB RAM, 20 GB
Location: 2x Falkenstein, 1x New York

The advantage of bare-metal usenet:
you see all the headers of a message.

On Tuesday, June 30, 2020 at 06:24:53 UTC+2, Ross A. Finlayson wrote:
> Search you mentioned and for example HTTP is adding the SEARCH verb,
Mostowski Collapse
2020-06-30 17:05:42 UTC
Reply
Permalink
And you don't have posting maximums
and wait times dictated by Google.

On Tuesday, June 30, 2020 at 19:00:52 UTC+2, Mostowski Collapse wrote:
> [...]
Mostowski Collapse
2020-06-30 17:17:38 UTC
Reply
Permalink
If you made an HTTP front end, you would not
necessarily need a new HTTP method, SEARCH.
You can use the usual GET with a query part,
as in this fictive example:

http://myhost.com/myfacade/mysearch?myparameter=<search term>
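
Such a facade could be sketched with the JDK's built-in HTTP
server; the path and parameter name just mirror the fictive URL
above, and the search itself is a placeholder:

import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class SearchFacade {
    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/myfacade/mysearch", exchange -> {
            String query = exchange.getRequestURI().getQuery();  // e.g. "myparameter=foobar"
            String term = query == null ? "" : query.replaceFirst("^myparameter=", "");
            byte[] body = ("results for: " + term).getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) { out.write(body); }
        });
        server.start();
    }
}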

Mostowski Collapse wrote:
> [...]
Mostowski Collapse
2020-06-30 17:20:44 UTC
Reply
Permalink
Maybe as a backend, you could use something
from this project:

Apache James, a.k.a. Java Apache Mail Enterprise
Server or some variation thereof, is an open source
SMTP and POP3 mail transfer agent and NNTP news
server written entirely in Java.
https://en.wikipedia.org/wiki/Apache_James

The project seems to be still alive:

Version 3.3.0 was released on March 26, 2019.

I dunno whether Apache James already delivers
some added value, in that it provides some
search. If not, you need a second backend
for the search.

On Tuesday, June 30, 2020 at 19:17:47 UTC+2, Mostowski Collapse wrote:
> [...]
Mostowski Collapse
2020-06-30 17:27:32 UTC
Reply
Permalink
"Apache James Server 3.1.0 and following versions require
Java 1.8. A migration guide for users willing to upgrade
from 2.3 to 3.4.0 is available. If relying on Guice wiring,
you can use some additional components
(Cassandra, **ElasticSearch**, ...)."
http://james.apache.org/server/index.html

On Tuesday, June 30, 2020 at 19:20:53 UTC+2, Mostowski Collapse wrote:
> [...]
Ross A. Finlayson
2020-11-17 01:00:41 UTC
Reply
Permalink
On Tuesday, June 30, 2020 at 10:00:52 AM UTC-7, Mostowski Collapse wrote:
> [...]

In traffic there are two kinds of usenet users:
viewers and traffic through Google Groups,
and USENET (USENET traffic).

Here now Google has turned on login to view their
Google Groups - effectively closing the Google Groups
to anyone without a Google login.

I suppose if they're used at work or whatever, though,
they'd be open.



Where I got with the C10K non-blocking I/O for a usenet server:
it scales up, though then I think in the runtime there is a situation
where it only runs epoll or kqueue as the test scales up, then at the
end or in sockets there is a drop, or it fell off the driver. I've
implemented the code this far, which has all of NNTP in a file and then
the "re-routine, industry-pattern back-end" in memory, then for that
running usually.

(Cooperative multithreading on top of non-blocking I/O.)

Implementing the serial queue or "monohydra", or slique,
makes it that when the parser is constantly parsing,
it seems a usual queue-like data structure, with parsing
returning its bounds and consuming the queue.

Having the file buffers all down small on 4K pages
has it that a next usual page size is the megabyte.

Here though it seems to make sense to have a natural
4K alignment in the file system representation, then
that it is moving files.

So, then with the new modern Java, which runs in its own
Java server runtime environment, it seems I would also
need to see whether the cloud virt supports the I/O model
or not, or whether the cooperative multi-threading for example
would be single-threaded. (Blocking abstractly.)

Then besides, I suppose that could be neat, with basically
the program model, and its file model, being well-defined;
then for NNTP with IMAP organization, search, and extensions,
those being standardized, it seems to make sense for an efficient
news file organization.

Here then it seems, for serving the NNTP, and for example
the file bodies under the storage, with the fixed headers,
the variable header or XREF, and the message body, then under
content it's the same as storage.

NNTP has "OVERVIEW" then from it is built search.

Let's see here then: if I get the load test running, or
just put a limit under the load while there are no load-test
errors, it seems the algorithm then scales under load, usually
making the algorithm serial in CPU, with encryption
and compression (traffic). (Block ciphers instead of serial transfer.)

Then, the industry pattern with re-routines has it that the
re-routines are naturally co-operative in the blocking,
and in the language, including flow-of-control and exception scope.
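
As a minimal sketch of that re-routine idea (not the actual
implementation; names and shapes here are illustrative): the body
is re-run from the top on every pass, completed steps return
memoized values, and the first incomplete step launches its task
and suspends the pass, to be re-submitted on completion.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

class ReRoutine<T> {
    private static final RuntimeException SUSPEND = new RuntimeException("suspended");

    private final List<Object> memo = new ArrayList<>();   // step results, in call order
    private final Executor executor;                        // originating executor
    private final Function<ReRoutine<T>, T> body;           // plain, linear routine
    private final Consumer<T> onComplete;
    private int cursor;

    ReRoutine(Executor executor, Function<ReRoutine<T>, T> body, Consumer<T> onComplete) {
        this.executor = executor;
        this.body = body;
        this.onComplete = onComplete;
    }

    /** One pass over the body: either completes or suspends and returns. */
    synchronized void resume() {
        cursor = 0;
        try {
            onComplete.accept(body.apply(this));
        } catch (RuntimeException e) {
            if (e != SUSPEND) throw e;   // real errors go back to the commander
        }
    }

    /** Every asynchronous call site in the body goes through here, in a fixed order. */
    @SuppressWarnings("unchecked")
    synchronized <R> R step(Supplier<CompletableFuture<R>> launch) {
        int i = cursor++;
        if (i < memo.size()) return (R) memo.get(i);   // computed on an earlier pass
        launch.get().thenAccept(result -> {            // fill in the partial result,
            synchronized (this) { memo.add(result); }  // then re-submit the re-routine
            executor.execute(this::resume);
        });
        throw SUSPEND;                                 // unwind this pass
    }
}

// usage sketch: the body reads as ordinary linear code; fetchGroup,
// fetchArticle, and reply are hypothetical placeholders returning
// CompletableFuture<String> or accepting the final result:
//   new ReRoutine<String>(pool,
//       self -> self.step(() -> fetchGroup("sci.math"))
//             + self.step(() -> fetchArticle(1)),
//       result -> reply(result)).resume();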


So, I have a high-performance implementation here.
Ross A. Finlayson
2020-11-17 01:38:59 UTC
Reply
Permalink
On Monday, November 16, 2020 at 5:00:51 PM UTC-8, Ross A. Finlayson wrote:
> [...]


It seems then, for NFS, with the client's reads and writes kept separate,
that a default filesystem is an idea for the system facility: mirroring the
mounted files locally, and providing the read view from that via a different route.


A next idea then, for the organization, is that the client views themselves
organize over the durable and available file system representation; this
provides anyone a view over the protocol with a group file convention.

I.e., while the usual continuous traffic is surfing, individual reads over group
files could have independent views, for example collating contents.

Then, extracting requests from traffic and threads seems usual.

(For example a specialized object transfer view.)

Making protocols for implementing internet protocols in groups and
so on here makes for giving usenet, as an example, views onto content generally.

So, I have designed a protocol node and implemented it mostly,
and have about designed an object transfer protocol; here the idea
is how to make it so people can extract data, for example their own
data, from a large durable store of all the usenet messages,
making views of usenet running on usenet, e.g. "Feb. 2016: AP's
Greatest Hits".

Here the point is to figure that usenet, these days, can be operated
in cooperation with usenet, and really for its own sake, for leaving
messages in usenet, and here for usenet protocol stores, as there's
no reason the content need only be plain text, while the protocol supports more.

Building a personal view, for example, is a simple matter, with very many
service providers, any of which sells usenet access all day for a good deal.

Let's see here: at $25/MM, storage on the cloud last year for about
a million messages for a month is about $25. Outbound traffic is
usually the metered cloud traffic; here, for example, CDN traffic
supports the universal share convention, under metering. That the
algorithm is effectively tunable in CPU and RAM makes it, under I/O,
"unobtrusive", or cooperative in routine, for CPU, I/O, and RAM;
then what would otherwise be Network Store or Database Time for seeking
effectively becomes File I/O time, which may be faster,
and more durable. There's a faster database time for scaling the ingestion
here, with the file view being eventually consistent. (And reliable.)

Checking the files would happen over time, for example with "last checked"
and "last dropped" records, something along the lines of finding wrong offsets,
basically having to make it so that the store survives corruption neatly
(by being more-or-less stored in-place).
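
A sketch of that bookkeeping (the record and its fields are assumptions for
illustration): each stored offset is re-verified over time, noting when it was
last checked and when a bad entry was last dropped.

    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.nio.charset.StandardCharsets;
    import java.time.Instant;

    class CheckRecord {
        Instant lastChecked;
        Instant lastDropped;

        // Verify one stored offset: the in-place article there should name its message-id.
        boolean verify(RandomAccessFile groupFile, long offset, String messageId)
                throws IOException {
            lastChecked = Instant.now();
            if (offset < 0 || offset >= groupFile.length()) {
                lastDropped = Instant.now();   // wrong offset, drop the entry
                return false;
            }
            groupFile.seek(offset);
            byte[] head = new byte[512];
            int n = Math.max(0, groupFile.read(head));
            String start = new String(head, 0, n, StandardCharsets.US_ASCII);
            if (!start.contains(messageId)) {
                lastDropped = Instant.now();
                return false;
            }
            return true;
        }
    }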

Content catalog and such, catalog.
Ross A. Finlayson
2022-03-01 21:09:17 UTC
Reply
Permalink
On Monday, December 6, 2021 at 4:35:08 PM UTC-8, Reese Page wrote:
> Ross A. Finlayson wrote:
>
> > Here for the re-routine, the industry factory pattern,
> > and the commands in the protocols in the templates,
> > and the memory module, with the algorithm interface,
> > in the high-performance computer resource, it is here that this simple
> > kind of "writing Internet software"
> > makes pretty rapidly for adding resources.
> Agenda 21 Exposed Depopulation & FEMA Camps. Because absolute privileges
> and power corrupts absolutely. If your government isn't fear the people,
> your ass is sold for capitalist profit.


Take a hike, troll.
Duane Hume
2022-03-01 21:18:37 UTC
Reply
Permalink
R̶o̶s̶s̶ A̶. F̶i̶n̶l̶a̶y̶s̶o̶n̶ w̶r̶o̶t̶e̶:

> O̶n̶ M̶o̶n̶d̶a̶y̶, D̶e̶c̶e̶m̶b̶e̶r̶ 6, 2021 a̶t̶ 4:35:08 P̶M̶ U̶T̶C̶-8, R̶e̶e̶s̶e̶ P̶a̶g̶e̶ w̶r̶o̶t̶e̶:
>> R̶o̶s̶s̶ A̶. F̶i̶n̶l̶a̶y̶s̶o̶n̶ w̶r̶o̶t̶e̶:
>>
>> > H̶e̶r̶e̶ f̶o̶r̶ t̶h̶e̶ r̶e̶-r̶o̶u̶t̶i̶n̶e̶, t̶h̶e̶ i̶n̶d̶u̶s̶t̶r̶y̶ f̶a̶c̶t̶o̶r̶y̶ p̶a̶t̶t̶e̶r̶n̶,
>> > a̶n̶d̶ t̶h̶e̶ c̶o̶m̶m̶a̶n̶d̶s̶ i̶n̶ t̶h̶e̶ p̶r̶o̶t̶o̶c̶o̶l̶s̶ i̶n̶ t̶h̶e̶ t̶e̶m̶p̶l̶a̶t̶e̶s̶,
>> > a̶n̶d̶ t̶h̶e̶ m̶e̶m̶o̶r̶y̶ m̶o̶d̶u̶l̶e̶, w̶i̶t̶h̶ t̶h̶e̶ a̶l̶g̶o̶r̶i̶t̶h̶m̶ i̶n̶t̶e̶r̶f̶a̶c̶e̶,
>> > i̶n̶ t̶h̶e̶ h̶i̶g̶h̶-p̶e̶r̶f̶o̶r̶m̶a̶n̶c̶e̶ c̶o̶m̶p̶u̶t̶e̶r̶ r̶e̶s̶o̶u̶r̶c̶e̶, i̶t̶ i̶s̶ h̶e̶r̶e̶ t̶h̶a̶t̶ t̶h̶i̶s̶
>> > s̶i̶m̶p̶l̶e̶ k̶i̶n̶d̶ o̶f̶ "w̶r̶i̶t̶i̶n̶g̶ I̶n̶t̶e̶r̶n̶e̶t̶ s̶o̶f̶t̶w̶a̶r̶e̶"
>> > m̶a̶k̶e̶s̶ p̶r̶e̶t̶t̶y̶ r̶a̶p̶i̶d̶l̶y̶ f̶o̶r̶ a̶d̶d̶i̶n̶g̶ r̶e̶s̶o̶u̶r̶c̶e̶s̶
>>
>> Agenda 21 Exposed Depopulation & FEMA Camps. Because absolute
>> privileges and power corrupts absolutely. If your government isn't fear
>> the people, your ass is sold for capitalist profit.
>
> T̶a̶k̶e̶ a̶ h̶i̶k̶e̶, t̶r̶o̶l̶l̶.

Watch this, you extreme uneducated troll

A Reality Check on the NWO Lies About Russia vs Ukraine.
https://www.bitchute.com/video/2GmMZqZ97fD3/

Zelensky Actor_Dancer - At least Reagan Just Appeared With a Monkey
https://www.bitchute.com/video/cKx0h5uejQHb/
Ross A. Finlayson
2022-03-05 13:00:10 UTC
Reply
Permalink
On Tuesday, March 1, 2022 at 1:18:55 PM UTC-8, Duane Hume wrote:
> R̶o̶s̶s̶ A̶. F̶i̶n̶l̶a̶y̶s̶o̶n̶ w̶r̶o̶t̶e̶:
>
> > O̶n̶ M̶o̶n̶d̶a̶y̶, D̶e̶c̶e̶m̶b̶e̶r̶ 6, 2021 a̶t̶ 4:35:08 P̶M̶ U̶T̶C̶-8, R̶e̶e̶s̶e̶ P̶a̶g̶e̶ w̶r̶o̶t̶e̶:
> >> R̶o̶s̶s̶ A̶. F̶i̶n̶l̶a̶y̶s̶o̶n̶ w̶r̶o̶t̶e̶:
> >>
> >> > H̶e̶r̶e̶ f̶o̶r̶ t̶h̶e̶ r̶e̶-r̶o̶u̶t̶i̶n̶e̶, t̶h̶e̶ i̶n̶d̶u̶s̶t̶r̶y̶ f̶a̶c̶t̶o̶r̶y̶ p̶a̶t̶t̶e̶r̶n̶,
> >> > a̶n̶d̶ t̶h̶e̶ c̶o̶m̶m̶a̶n̶d̶s̶ i̶n̶ t̶h̶e̶ p̶r̶o̶t̶o̶c̶o̶l̶s̶ i̶n̶ t̶h̶e̶ t̶e̶m̶p̶l̶a̶t̶e̶s̶,
> >> > a̶n̶d̶ t̶h̶e̶ m̶e̶m̶o̶r̶y̶ m̶o̶d̶u̶l̶e̶, w̶i̶t̶h̶ t̶h̶e̶ a̶l̶g̶o̶r̶i̶t̶h̶m̶ i̶n̶t̶e̶r̶f̶a̶c̶e̶,
> >> > i̶n̶ t̶h̶e̶ h̶i̶g̶h̶-p̶e̶r̶f̶o̶r̶m̶a̶n̶c̶e̶ c̶o̶m̶p̶u̶t̶e̶r̶ r̶e̶s̶o̶u̶r̶c̶e̶, i̶t̶ i̶s̶ h̶e̶r̶e̶ t̶h̶a̶t̶ t̶h̶i̶s̶
> >> > s̶i̶m̶p̶l̶e̶ k̶i̶n̶d̶ o̶f̶ "w̶r̶i̶t̶i̶n̶g̶ I̶n̶t̶e̶r̶n̶e̶t̶ s̶o̶f̶t̶w̶a̶r̶e̶"
> >> > m̶a̶k̶e̶s̶ p̶r̶e̶t̶t̶y̶ r̶a̶p̶i̶d̶l̶y̶ f̶o̶r̶ a̶d̶d̶i̶n̶g̶ r̶e̶s̶o̶u̶r̶c̶e̶s̶
> >>
> >> Agenda 21 Exposed Depopulation & FEMA Camps. Because absolute
> >> privileges and power corrupts absolutely. If your government isn't fear
> >> the people, your ass is sold for capitalist profit.
> >
> > T̶a̶k̶e̶ a̶ h̶i̶k̶e̶, t̶r̶o̶l̶l̶.
>
> Watch this, you extreme uneducated troll
>
> A Reality Check on the NWO Lies About Russia vs Ukraine.
> https://www.bitchute.com/video/2GmMZqZ97fD3/
>
> Zelensky Actor_Dancer - At least Reagan Just Appeared With a Monkey
> https://www.bitchute.com/video/cKx0h5uejQHb/

No, I don't follow links from usenet.
Mostowski Collapse
2022-03-05 19:33:45 UTC
Reply
Permalink
So NATO will not nuke or assasinate Putin.
So who might then do it? This is left as an exercise...

Duane Hume schrieb am Dienstag, 1. März 2022 um 22:18:55 UTC+1:
> Zelensky Actor_Dancer - At least Reagan Just Appeared With a Monkey
> https://www.bitchute.com/video/cKx0h5uejQHb/
Mostowski Collapse
2022-03-05 20:48:19 UTC
Reply
Permalink
Ha Ha, crowdfounding...

Diener des Volkes - Trailer mit deutschen Untertiteln
https://www.youtube.com/watch?v=XI8q82Pkyng

Mostowski Collapse schrieb am Samstag, 5. März 2022 um 20:33:58 UTC+1:
> So NATO will not nuke or assasinate Putin.
> So who might then do it? This is left as an exercise...
> Duane Hume schrieb am Dienstag, 1. März 2022 um 22:18:55 UTC+1:
> > Zelensky Actor_Dancer - At least Reagan Just Appeared With a Monkey
> > https://www.bitchute.com/video/cKx0h5uejQHb/
Ross A. Finlayson
2022-03-05 22:26:44 UTC
Reply
Permalink
On Saturday, March 5, 2022 at 12:48:32 PM UTC-8, Mostowski Collapse wrote:
> Ha Ha, crowdfounding...
>
> Diener des Volkes - Trailer mit deutschen Untertiteln
> https://www.youtube.com/watch?v=XI8q82Pkyng
> Mostowski Collapse schrieb am Samstag, 5. März 2022 um 20:33:58 UTC+1:
> > So NATO will not nuke or assasinate Putin.
> > So who might then do it? This is left as an exercise...
> > Duane Hume schrieb am Dienstag, 1. März 2022 um 22:18:55 UTC+1:
> > > Zelensky Actor_Dancer - At least Reagan Just Appeared With a Monkey
> > > https://www.bitchute.com/video/cKx0h5uejQHb/


In more primitive cultures it's usually matters of 5's and 20's.

Feeling more rarified these days?
Darin Herr
2022-03-05 21:04:38 UTC
Reply
Permalink
Mostowski Collapse wrote:

> So NATO will not nuke or assasinate Putin.
> So who might then do it? This is left as an exercise...
>
> Duane Hume schrieb am Dienstag, 1. März 2022 um 22:18:55 UTC+1:
>> Zelensky Actor_Dancer - At least Reagan Just Appeared With a Monkey
>> https://www.bitchute.com/video/cKx0h5uejQHb/

How on earth going to assassinate an elected president, and for what,
saving your sorry ass from the *nuclear_nazis*, friend?? You don't know
mathematics.

Anyway, as I see it, after a winning army the Nuremberg 2.0 will be
started, because the toxic depopulation injections, said "vaccines".

Then money would be Gold coupled, so your fake money shithole country will
likely disappear. You have to return all Gold to the countries it belongs.

RUSSIA SAVES EUROPE FROM NUCLEAR DISASTER
https://www.bitchute.com/video/fl86YAfeJobU/

Overcoming The Fake News Narrative With The Truth On Vladimir Putin vs The
New-Nazi World Order https://www.bitchute.com/video/faSkGT9Ybnnf/
Ross A. Finlayson
2022-03-05 22:25:09 UTC
Reply
Permalink
On Saturday, March 5, 2022 at 1:04:56 PM UTC-8, Darin Herr wrote:
> Mostowski Collapse wrote:
>
> > So NATO will not nuke or assasinate Putin.
> > So who might then do it? This is left as an exercise...
> >
> > Duane Hume schrieb am Dienstag, 1. März 2022 um 22:18:55 UTC+1:
> >> Zelensky Actor_Dancer - At least Reagan Just Appeared With a Monkey
> >> https://www.bitchute.com/video/cKx0h5uejQHb/
> How on earth going to assassinate an elected president, and for what,
> saving your sorry ass from the *nuclear_nazis*, friend?? You don't know
> mathematics.
>
> Anyway, as I see it, after a winning army the Nuremberg 2.0 will be
> started, because the toxic depopulation injections, said "vaccines".
>
> Then money would be Gold coupled, so your fake money shithole country will
> likely disappear. You have to return all Gold to the countries it belongs.
>
> RUSSIA SAVES EUROPE FROM NUCLEAR DISASTER
> https://www.bitchute.com/video/fl86YAfeJobU/
>
> Overcoming The Fake News Narrative With The Truth On Vladimir Putin vs The
> New-Nazi World Order https://www.bitchute.com/video/faSkGT9Ybnnf/

https://en.wikipedia.org/wiki/Godwin%27s_law
Ross Finlayson
2023-03-09 04:51:53 UTC
Reply
Permalink
On Monday, December 6, 2021 at 7:32:16 AM UTC-8, Ross A. Finlayson wrote:
> On Monday, November 16, 2020 at 5:39:08 PM UTC-8, Ross A. Finlayson wrote:
> > On Monday, November 16, 2020 at 5:00:51 PM UTC-8, Ross A. Finlayson wrote:
> > > On Tuesday, June 30, 2020 at 10:00:52 AM UTC-7, Mostowski Collapse wrote:
> > > > NNTP is not HTTP. I was using bare metal access to
> > > > usenet, not using Google group, via:
> > > >
> > > > news.albasani.net, unfortunately dead since Corona
> > > >
> > > > So was looking for an alternative. And found this
> > > > alternative, which seems fine:
> > > >
> > > > news.solani.org
> > > >
> > > > Have Fun!
> > > >
> > > > P.S.: Technical spec of news.solani.org:
> > > >
> > > > Intel Core i7, 4x 3.4 GHz, 32 GB RAM, 500 GB SSD RAID1
> > > > Intel Core i7, 4x 3.4 GHz, 32 GB RAM, 2 TB HDD RAID1
> > > > Intel Xeon VM, 1.8 GHz, 1 GB RAM, 20 GB
> > > > Standort: 2x Falkenstein, 1x New York
> > > >
> > > > advantage of bare metal usenet,
> > > > you see all headers of message.
> > > > Am Dienstag, 30. Juni 2020 06:24:53 UTC+2 schrieb Ross A. Finlayson:
> > > > > Search you mentioned and for example HTTP is adding the SEARCH verb,
> > > In traffic there are two kinds of usenet users,
> > > viewers and traffic through Google Groups,
> > > and, USENET. (USENET traffic.)
> > >
> > > Here now Google turned on login to view their
> > > Google Groups - effectively closing the Google Groups
> > > without a Google login.
> > >
> > > I suppose if they're used at work or whatever though
> > > they'd be open.
> > >
> > >
> > >
> > > Where I got with the C10K non-blocking I/O for a usenet server,
> > > it scales up though then I think in the runtime is a situation where
> > > it only runs epoll or kqueue that the test scale ups, then at the end
> > > or in sockets there is a drop, or it fell off the driver. I've implemented
> > > the code this far, what has all of NNTP in a file and then the "re-routine,
> > > industry-pattern back-end" in memory, then for that running usually.
> > >
> > > (Cooperative multithreading on top of non-blocking I/O.)
> > >
> > > Implementing the serial queue or "monohydra", or slique,
> > > makes for that then when the parser is constantly parsing,
> > > it seems a usual queue like data structure with parsing
> > > returning its bounds, consuming the queue.
> > >
> > > Having the file buffers all down small on 4K pages,
> > > has that a next usual page size is the megabyte.
> > >
> > > Here though it seems to make sense to have a natural
> > > 4K alignment the file system representation, then that
> > > it is moving files.
> > >
> > > So, then with the new modern Java, it that runs in its own
> > > Java server runtime environment, it seems I would also
> > > need to see whether the cloud virt supported the I/O model
> > > or not, or that the cooperative multi-threading for example
> > > would be single-threaded. (Blocking abstractly.)
> > >
> > > Then besides I suppose that could be neatly with basically
> > > the program model, and its file model, being well-defined,
> > > then for NNTP with IMAP organization search and extensions,
> > > those being standardized, seems to make sense for an efficient
> > > news file organization.
> > >
> > > Here then it seems for serving the NNTP, and for example
> > > their file bodies under the storage, with the fixed headers,
> > > variable header or XREF, and the message body, then under
> > > content it's same as storage.
> > >
> > > NNTP has "OVERVIEW" then from it is built search.
> > >
> > > Let's see here then, if I get the load test running, or,
> > > just put a limit under the load while there are no load test
> > > errors, it seems the algorithm then scales under load to be
> > > making usually the algorithm serial in CPU, with: encryption,
> > > and compression (traffic). (Block ciphers instead of serial transfer.)
> > >
> > > Then, the industry pattern with re-routines, has that the
> > > re-routines are naturally co-operative in the blocking,
> > > and in the language, including flow-of-control and exception scope.
> > >
> > >
> > > So, I have a high-performance implementation here.
> > It seems like for NFS, then, and having the separate read and write of the client,
> > a default filesystem, is an idea for the system facility: mirroring the mounted file
> > locally, and, providing the read view from that via a different route.
> >
> >
> > A next idea then seems for the organization, the client views themselves
> > organize over the durable and available file system representation, this
> > provides anyone a view over the protocol with a group file convention.
> >
> > I.e., while usual continuous traffic was surfing, individual reads over group
> > files could have independent views, for example collating contents.
> >
> > Then, extracting requests from traffic and threads seems usual.
> >
> > (For example a specialized object transfer view.)
> >
> > Making protocols for implementing internet protocols in groups and
> > so on, here makes for giving usenet example views to content generally.
> >
> > So, I have designed a protocol node and implemented it mostly,
> > then about designed an object transfer protocol, here the idea
> > is how to make it so people can extract data, for example their own
> > data, from a large durable store of all the usenet messages,
> > making views of usenet running on usenet, eg "Feb. 2016: AP's
> > Greatest Hits".
> >
> > Here the point is to figure that usenet, these days, can be operated
> > in cooperation with usenet, and really for its own sake, for leaving
> > messages in usenet and here for usenet protocol stores as there's
> > no reason it's plain text the content, while the protocol supports it.
> >
> > Building personal view for example is a simple matter of very many
> > service providers any of which sells usenet all day for a good deal.
> >
> > Let's see here, $25/MM, storage on the cloud last year for about
> > a million messages for a month is about $25. Outbound traffic is
> > usually the metered cloud traffic, here for example that CDN traffic
> > support the universal share convention, under metering. What that
> > the algorithm is effectively tunable in CPU and RAM, makes for under
> > I/O that's it's "unobtrusive" or the cooperative in routine, for CPI I/O and
> > RAM, then that there is for seeking that Network Store or Database Time
> > instead effectively becomes File I/O time, as what may be faster,
> > and more durable. There's a faster database time for scaling the ingestion
> > here with that the file view is eventually consistent. (And reliable.)
> >
> > Checking the files would be over time for example with "last checked"
> > and "last dropped" something along the lines of, finding wrong offsets,
> > basically having to make it so that it survives neatly corruption of the
> > store (by being more-or-less stored in-place).
> >
> > Content catalog and such, catalog.
> Then I wonder and figure the re-routine can scale.
>
> Here for the re-routine, the industry factory pattern,
> and the commands in the protocols in the templates,
> and the memory module, with the algorithm interface,
> in the high-performance computer resource, it is here
> that this simple kind of "writing Internet software"
> makes pretty rapidly for adding resources.
>
> Here the design is basically of a file I/O abstraction,
> that the computer reads data files with mmap to get
> their handlers, what results that for I/O map the channels
> result transferring the channels in I/O for what results,
> in mostly the allocated resource requirements generally,
> and for the protocol and algorithm, it results then that
> the industry factory pattern and making for interfaces,
> then also here the I/O routine as what results that this
> is an implementation, of a network server, mostly is making
> for that the re-routine, results very neatly a model of
> parallel cooperation.
>
> I think computers still have file systems and file I/O but
> in abstraction just because PAGE_SIZE is still relevant for
> the network besides or I/O, if eventually, here is that the
> value types are in the commands and so on, it is besides
> that in terms of the resources so defined it still is in a filesystem
> convention that a remote and unreliable view of it suffices.
>
> Here then the source code also being "this is only 20-50k",
> lines of code, with basically an entire otherwise library stack
> of the runtime itself, only the network and file abstraction,
> this makes for also that modularity results. (Factory Industry
> Pattern Modules.)
>
> For a network server, here, that, mostly it is high performance
> in the sense that this is about the most direct handle on the channels
> and here mostly for the text layer in the I/O order, or protocol layer,
> here is that basically encryption and compression usually in the layer,
> there is besides a usual concern where encryption and compression
> are left out, there is that text in the layer itself is commands.
>
> Then, those being constants under the resources for the protocol,
> it's what results usual protocols like NNTP and HTTP and other protocols
> with usually one server and many clients, here is for that these protocols
> are defined in these modules, mostly there NNTP and IMAP, ..., HTTP.
>
> These are here defined "all Java" or "Pure Java", i.e. let's be clear that
> in terms of the reference abstraction layer, I think computers still use
> the non-blocking I/O and filesystems and network to RAM, so that as
> the I/O is implemented in those it actually has those besides instead for
> example defaulting to byte-per-channel or character I/O. I.e. the usual
> semantics for servicing the I/O in the accepter routine and what makes
> for that the platform also provides a reference encryption implementation,
> if not so relevant for the block encoder chain, besides that for example
> compression has a default implementation, here the I/O model is as simply
> in store for handles, channels, ..., that it results that data especially delivered
> from a constant store can anyways be mostly compressed and encrypted
> already or predigested to serve, here that it's the convention, here is for
> resulting that these client-server protocols, with usually reads > postings
> then here besides "retention", basically here is for what it is.
>
> With the re-routine and the protocol layer besides, having written the
> routines in the re-routine, what there is to write here is this industry
> factory, or a module framework, implementing the re-routines, as they're
> built from the linear description a routine, makes for as the routine progresses
> that it's "in the language" and that more than less in the terms, it makes for
> implementing the case of logic for values, in the logic's flow-of-control's terms.
>
> Then, there is that actually running the software is different than just
> writing it, here in the sense that as a server runtime, it is to be made a
> thing, by giving it a name, and giving it an authority, to exist on the Internet.
>
> There is basically that for BGP and NAT and so on, and, mobile fabric networks,
> IP and TCP/IP, of course IPv4 and IPv6 are the coarse fabric main space, with
> respect to what are CIDR and 24 bits rule and what makes for TCP/IP, here
> entirely the course is using the TCP/IP stack and Java's TCP/IP stack, with
> respect to that TCP/IP is so provided or in terms of process what results
> ports mostly and connection models where it is exactly the TCP after the IP,
> the Transport Control Protocol and Internet Protocol, have here both this
> socket and datagram connection orientation, or stateful and stateless or
> here that in terms of routing it's defined in addresses, under that names
> and routing define sources, routes, destinations, ..., that routine numeric
> IP addresses result in the usual sense of the network being behind an IP
> and including IPv4 network fabric with respect to local routers.
>
> I.e., here to include a service framework is "here besides the routine, let's
> make it clear that in terms of being a durable resource, there needs to be
> some lockbox filled with its sustenance that in some locked or constant
> terms results that for the duration of its outlay, say five years, it is held
> up, then, it will be so again, or, let down to result the carry-over that it
> invested to archive itself, I won't have to care or do anything until then".
>
>
> About the service activation and the idea that, for a port, the routine itself
> needs only run under load, i.e. there is effectively little traffic on the old archives,
> and usually only the some other archive needs any traffic. Here the point is
> that for the Java routine there is the system port that was accepted for the
> request, that inetd or the systemd or means the network service was accessed,
> made for that much as for HTTP the protocol is client-server also for IP the
> protocol is client-server, while the TCP is packets. This is a general idea for
> system integration while here mostly the routine is that being a detail:
> the filesystem or network resource that results that the re-routines basically
> make very large CPU scaling.
>
> Then, it is basically containerized this sense of "at some domain name, there
> is a service, it's HTTP and NNTP and IMAP besides, what cares the world".
>
> I.e. being built on connection oriented protocols like the socket layer,
> HTTP(S) and NNTP(S) and IMAP(S) or with the TLS orientation to certificates,
> it's more than less sensible that most users have no idea of installing some
> NNTP browser or pointing their email to IMAP so that the email browser
> browses the newsgroups and for postings, here this is mostly only talk
> about implementing NNTP then IMAP and HTTP that happens to look like that,
> besides for example SMTP or NNTP posting.
>
> I.e., having "this IMAP server, happens to be this NNTP module", or
> "this HTTP server, happens to be a real simple mailbox these groups",
> makes for having partitions and retentions of those and that basically
> NNTP messages in the protocol can be more or less the same content
> in media, what otherwise is of a usual message type.
>
> Then, the NNTP server-server routine is the progation of messages
> besides "I shall hire ten great usenet retention accounts and gently
> and politely draw them down and back-fill Usenet, these ten groups".
>
> By then I would have to have made for retention in storage, such contents,
> as have a reference value, then for besides making that independent in
> reference value, just so that it suffices that it basically results "a usable
> durable filesystem that happens you can browse it like usenet". I.e. as
> the pieces to make the backfill are dug up, they get assigned reference numbers
> of their time to make for what here is that in a grand schema of things,
> they have a reference number in numerical order (and what's also the
> server's "message-number" besides its "message-id") as noted above this
> gets into the storage for retention of a file, while, most services for this
> are instead for storage and serving, not necessarily or at all retention.
>
> I.e., the point is that as the groups are retained from retention, there is an
> approach what makes for an orderly archeology, as for what convention
> some data arrives, here that this server-server routine is besides the usual
> routine which is "here are new posts, propagate them", it's "please deliver
> as of a retention scan, and I'll try not to repeat it, what results as orderly
> as possible a proof or exercise of what we'll call afterward entire retention",
> then will be for as of writing a file that "as of the date, from start to finish,
> this site certified these messages as best-effort retention".
>
> It seems then besides there is basically "here is some mbox file, serve it
> like it was an NNTP group or an IMAP mailbox", ingestion, in terms of that
> what is ingestion, is to result for the protocol that "for this protocol,
> there is actually a normative filesystem representation that happens to
> be pretty much also altogether definede by the protocol", the point is
> that ingestion would result in command to remain in the protocol,
> that a usual file type that "presents a usual abstraction, of a filesystem,
> as from the contents of a file", here with the notion of "for all these
> threaded discussions, here this system only cares some approach to
> these ten particular newgroups that already have mostly their corpus
> though it's not in perhaps their native mbox instead consulted from services".
>
> Then, there's for storing and serving the files, and there is the usual
> notion that moving the data, is to result, that really these file organizations
> are not so large in terms of resources, being "less than gigabytes" or so,
> still there's a notion that as a durable resource they're to be made
> fungible here the networked file approach in the native filesystem,
> then that with respect to it's a backing store, it's to make for that
> the entire enterprise is more or less to made in terms of account,
> that then as a facility on the network then a service in the network,
> it's basically separated the facility and service, while still of course
> that the service is basically defined by its corpus.
>
>
> Then, to make that fungible in a world of account, while with an exit
> strategy so that the operation isn't not abstract, is mostly about the
> domain name, then that what results the networking, after trusted
> network naming and connections for what result routing, and then
> the port, in terms of that there are usual firewalls in ports though that
> besides usually enough client ports are ephemeral, here the point is
> that the protocols and their well-known ports, here it's usually enough
> that the Internet doesn't concern itself so much protocols but with
> respect to proxies, here that for example NNTP and IMAP don't have
> so much anything so related that way after startTLS. For the world of
> account, is basically to have for a domain name, an administrator, and,
> an owner or representative. These are to establish authority for changes
> and also accountability for usage.
>
> Basically they're to be persons and there is a process to get to be an
> administrator of DNS, most always there are services that a usual person
> implementing the system might use, besides for example the numerical.
>
> More relevant though to DNS is getting servers on the network, with respect
> to listening ports and that they connect to clients what so discover them as
> via DNS or configuration, here as above the usual notion that these are
> standard services and run on well-known ports for inetd or systemd.
> I.e. there is basically that running a server and dedicated networking,
> and power and so on, and some notion of the limits of reliability, is then
> as very much in other aspects of the organization of the system, i.e. its name,
> while at the same time, the point that a module makes for that basically
> the provision of a domain name or well-known or ephemeral host, is the
> usual notion that static IP addresses are a limited resource and as about
> the various networks in IPv4 and how they route traffic, is for that these
> services have well-known sections in DNS for at least that the most usual
> configuration is none.
>
> For a usual global reliability and availability, is some notion basically that
> each region and zone has a service available on the IP address, for that
> "hostname" resolves to the IP addresses. As well, in reverse, for the IP
> address and about the hostname, it should resolve reverse to hostname.
>
> About certificates mostly for identification after mapping to port, or
> multi-home Internet routing, here is the point that whether the domain
> name administration is "epochal" or "regular", is that epochs are defined
> by the ports behind the numbers and the domain name system as well,
> where in terms of the registrar, the domain names are epochal to the
> registrar, with respect to owners of domain names.
>
> Then if DNS is a datagram or UDP service is for ICMP as for TCP/IP,
> and also BGP and NAT and routing and what are local and remote
> addresses, here is for not-so-much "implement DNS the protocol
> also while you're at it", rather for what results that there is a durable
> and long-standing and proper doorman, for some usenet.science.
>
> Here then the notion seems to be whether the doorman basically
> knows well-known services, is a multi-homing router, or otherwise
> what is the point that it starts the lean runtime, with respect to that
> it's a container and having enough sense of administration its operation
> as contained. I.e. here given a port and a hostname and always running
> makes for that as long as there is the low (preferable no) idle for services
> running that have no clients, is here also for the cheapest doorman that
> knows how to standup the client sentinel. (And put it back away.)
>
> Probably the most awful thing in the cloud services is the cost for
> data ingress and egress. What that means is that for example using
> a facility that is bound by that as a cost instead of under some constant
> cost, is basically why there is the approach that the containers needs a
> handle to the files, and they're either local files or network files, here
> with the some convention above in archival a shared consistent view
> of all the files, or abstractly consistent, is for making that the doorman
> can handle lots of starting and finishing connections, while it is out of
> the way when usually it's client traffic and opening and closing connections,
> and the usual abstraction is that the client sentinel is never off and doorman
> does nothing, here is for attaching the one to some lower constant cost,
> where for example any long-running cost is more than some low constant cost.
>
> Then, this kind of service is often represented by nodes, in the usual sense
> "here is an abstract container with you hope some native performance under
> the hypervisor where it lives on the farm on its rack, it basically is moved the
> image to wherever it's requested from and lives there, have fun, the meter is on".
> I.e. that's just "this Jar has some config conventions and you can make the
> container associate it and watchdog it with systemd for example and use the
> cgroups while you're at it and make for tempfs quota and also the best network
> file share, which you might be welcome to cache if you care just in the off-chance
> that this file-mapping is free or constant cost as long as it doesn't egress the
> network", is for here about the facilities that work, to get a copy of the system
> what with respect to its usual operation is a piece of the Internet.
>
> For the different reference modules (industry factories) in their patterns then
> and under combined configuration "file + process + network + fare", is that
> the fare of the service basically reflects a daily coin, in the sense that it
> represents an annual or epochal fee, what results for the time there is
> what is otherwise all defined the "file + process + network + name",
> what results it perpetuates in operation more than less simply and automatically.
>
> Then, the point though is to get it to where "I can go to this service, and
> administer it more or less by paying an account, that it thus lives in its
> budget and quota in its metered world".
>
> That though is very involved with identity, that in terms of "I the account
> as provided this sum make this sum paid with respect to an agreement",
> is that authority to make agreements must make that it results that the
> operation of the system, is entirely transparent, and defined in terms of
> the roles and delegation, conventions in operation.
>
> I.e., I personally don't want to administer a copy of usenet, but, it's here
> pretty much sorted out that I can administer one once then that it's to
> administer itself in the following, in terms of it having resources to allocate
> and resources to disburse. Also if nobody's using it it should basically work
> itself out to dial its lights down (while maintaining availability).
>
> Then a point seems "maintain and administer the operation in effect,
> what arrangement sees via delegation, that a card number and a phone
> number and an email account and more than less a responsible entity,
> is so indicated for example in cryptographic identity thus that the operation
> of this system as a service, effectively operates itself out of a kitty,
> what makes for administration and overhead, an entirely transparent
> model of a miniature business the system as a service".
>
> "... and a mailing address and mail service."
>
> Then, for accounts and accounts, for example is the provision of the component
> as simply an image in cloud algorithms, where basically as above here it's configured
> that anybody with any cloud account could basically run it on their own terms,
> there is for here sorting out "after this delegation to some business entity what
> results a corporation in effect, the rest is business-in-a-box and more-than-less
> what makes for its administration in state, is for how it basically limits and replicates
> its service, in terms of its own assets here as what administered is abstractly
> "durable forever mailboxes with private ownership if on public or managed resources".
>
> A usual notion of a private email and usenet service offering and business-in-a-box,
> here what I'm looking at is that besides archiving sci.math and copying out its content
> under author line, is to make such an industry for example here that "once having
> implemented an Internet service, an Internet service of them results Internet".
>
> I.e. here the point is to make a corporation and a foundation in effect, what in terms
> of then about the books and accounts, is about accounts for the business accounts
> that reflect a persistent entity, then what results in terms of computing, networking,
> and internetworking, with a regular notion of "let's never change this arrangement
> but it's in monthly or annual terms", here for that in overall arrangements,
> it results what the entire system more than less runs in ways then to either
> run out its limits or make itself a sponsored effort, about more-or-less a simple
> and responsible and accountable set of operations what effect the business
> (here that in terms of service there is basically the realm of agreement)
> that basically this sort of business-in-a-box model, is then besides itself of
> accounts, toward the notion as pay-as-you-go and "usual credits and their limits".
>
> Then for a news://usenet.science, or for example sci.math.usenet.science,
> is the idea that the entity is "some assemblage what is so that in DNS, and,
> in the accounts payable and receivable, and, in the material matters of
> arrangement and authority for administration, of DNS and resources and
> accounts what result durably persisting the business, is basically for a service
> then of what these are usual enough tasks, as that are interactive workflows
> and for mechanical workflows.
>
> I.e. the point is for having the service than an on/off button and more or less
> what is for a given instance of the operation, what results from some protocol
> that provides a "durable store" of a sort of the business, that at any time basically
> some re-routine or "eventually consistent" continuance of the operation of the
> business, results basically a continuity in its operations, what is entirely granular,
> that here for example the point is to "pick a DNS name, attach an account service,
> go" it so results that in the terms, basically there are the placeholders of the
> interactive workflows in that, and as what in terms are often for example simply
> card and phone number terms, account terms.
>
> I.e. a service to replenish accounts as kitties for making accounts only and
> exactly limited to the one service, its transfers, basically results that there
> is the notion of an email address, a phone number, a credit card's information,
> here a fixed limit debit account that works as of a kitty, there is a regular workflow
> service that will read out the durable stores and according to the timeliness of
> their events, affect the configuration and reconciliation of payments for accounts
> (closed loop scheduling/receiving).
>
> https://datatracker.ietf.org/doc/draft-flanagan-regext-datadictionary/
> https://www.rfc-editor.org/rfc/rfc9022.txt
>
> Basically for dailies, monthlies, and annuals, what make weeklies,
> is this idea of Internet-from-a- account, what is services.


After implementing a store, and the protocol for getting messages, what seems relevant here in the
context of the SEARCH command is a fungible file format, derived from the body of the message
in a normal form: a data structure that represents an index, catalog, dictionary, and summary
of the message, a form of "search index".

These types of files should naturally compose, and result in a data structure that, according to some normal
forms of search and summary algorithms, makes for efficient
search of sections of the corpus for information retrieval, here in the sense that "information retrieval is the science
of search algorithms".

Now, what and how people search, or the specification of a search, is in terms of queries, say,
here for some brief forms of queries that advise what's definitely included in the search, what's excluded,
and then perhaps what's maybe included: yes/no/maybe. That makes for a predicate that can be built
and applied to results, and that composes to build the terms of a filter with yes/no/maybe or
sure/no/yes, with predicates over values.
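
A small sketch of such a filter (names here are only illustrative): each query
term is a tri-state predicate, and composing terms keeps the same three values.

    import java.util.List;
    import java.util.function.Function;

    enum Match { YES, NO, MAYBE }

    class TriFilter<T> {
        private final List<Function<T, Match>> terms;

        TriFilter(List<Function<T, Match>> terms) { this.terms = terms; }

        // Any NO excludes outright; any MAYBE demotes a YES to MAYBE.
        Match apply(T item) {
            Match out = Match.YES;
            for (Function<T, Match> term : terms) {
                Match m = term.apply(item);
                if (m == Match.NO) return Match.NO;
                if (m == Match.MAYBE) out = Match.MAYBE;
            }
            return out;
        }
    }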

Here there is basically "free text search" and "matching summaries", where text is the text and summary is
a data structure, with attributes as paths the leaves of the tree of which match.

Then, the message has text, its body, and headers, key-value pairs or collections thereof, where as well
there are default summaries like "a histogram of words by occurrence", or for example default text like "the
MIME body of this message has a default text representation".
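
The histogram default summary, for instance, is just a frequency map over the
message's default text representation (a minimal sketch):

    import java.util.Map;
    import java.util.TreeMap;

    class WordHistogram {
        // Histogram of words by occurrence over the default text of the body.
        static Map<String, Integer> of(String bodyText) {
            Map<String, Integer> counts = new TreeMap<>();
            for (String word : bodyText.toLowerCase().split("\\W+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
            return counts;
        }
    }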

So, the idea developing here is to define "normal" forms of data structures, with "normal"
forms of encoding, such that these "normalizing", after "normative", data structures have well-behaved
algorithms defined upon them, which provide well-defined bounds in resources and return some quantification of results,
like any/each/every/all, "hits".

This is where search engines, or collected search algorithms ("find"), usually enough have these
de-facto forms "under the hood", as it were. The point is to make it first-class: for a given message and body
there is a normal form of a "catalog summary index", which can be compiled to a constant when the message
is ingested, so that basically any filestore of these messages has alongside it the filestore of the "catsums"
(or builds them on demand), and then any algorithm has at least well-defined behavior under partitions or collections
or selections of these messages, or items, for various standard algorithms that separate "to find" from
"to serve to find".

So, ..., what I'm wondering is what would be sufficient normal forms, in brief, such that there is
defined, for a given corpus of messages, basically at the granularity of messages, a normal form for each
message, its "catsum"; that catsums have a natural algebra, so that a
concatenation of catsums is a catsum; and that some standard algorithms naturally have well-defined
results on their predicates and quantifiers of matching, in serial and parallel, with the results
combining in serial and parallel.
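
One way to pin that down is to ask that the catsum be a monoid under merge:
then a concatenation of catsums is a catsum, and serial and parallel folds over
any partition of the corpus agree. A sketch, with a word histogram standing in
for whatever the summary actually carries:

    import java.util.List;
    import java.util.Map;
    import java.util.TreeMap;

    class Catsum {
        final Map<String, Integer> termCounts = new TreeMap<>();

        static Catsum of(String text) {
            Catsum c = new Catsum();
            for (String w : text.toLowerCase().split("\\W+")) {
                if (!w.isEmpty()) c.termCounts.merge(w, 1, Integer::sum);
            }
            return c;
        }

        // Associative merge with the empty catsum as identity.
        Catsum merge(Catsum other) {
            Catsum out = new Catsum();
            out.termCounts.putAll(this.termCounts);
            other.termCounts.forEach((k, v) -> out.termCounts.merge(k, v, Integer::sum));
            return out;
        }

        // Parallel fold gives the same result as a serial reduce.
        static Catsum combineAll(List<String> bodies) {
            return bodies.parallelStream()
                         .map(Catsum::of)
                         .reduce(new Catsum(), Catsum::merge);
        }
    }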

The results should be applicable to any kind of data but here it's more or less about usenet groups.
Ross Finlayson
2023-03-09 06:23:00 UTC
Reply
Permalink
On Wednesday, March 8, 2023 at 8:51:58 PM UTC-8, Ross Finlayson wrote:
> On Monday, December 6, 2021 at 7:32:16 AM UTC-8, Ross A. Finlayson wrote:
> > On Monday, November 16, 2020 at 5:39:08 PM UTC-8, Ross A. Finlayson wrote:
> > > On Monday, November 16, 2020 at 5:00:51 PM UTC-8, Ross A. Finlayson wrote:
> > > > On Tuesday, June 30, 2020 at 10:00:52 AM UTC-7, Mostowski Collapse wrote:
> > > > > NNTP is not HTTP. I was using bare metal access to
> > > > > usenet, not using Google group, via:
> > > > >
> > > > > news.albasani.net, unfortunately dead since Corona
> > > > >
> > > > > So was looking for an alternative. And found this
> > > > > alternative, which seems fine:
> > > > >
> > > > > news.solani.org
> > > > >
> > > > > Have Fun!
> > > > >
> > > > > P.S.: Technical spec of news.solani.org:
> > > > >
> > > > > Intel Core i7, 4x 3.4 GHz, 32 GB RAM, 500 GB SSD RAID1
> > > > > Intel Core i7, 4x 3.4 GHz, 32 GB RAM, 2 TB HDD RAID1
> > > > > Intel Xeon VM, 1.8 GHz, 1 GB RAM, 20 GB
> > > > > Standort: 2x Falkenstein, 1x New York
> > > > >
> > > > > advantage of bare metal usenet,
> > > > > you see all headers of message.
> > > > > Am Dienstag, 30. Juni 2020 06:24:53 UTC+2 schrieb Ross A. Finlayson:
> > > > > > Search you mentioned and for example HTTP is adding the SEARCH verb,
> > > > In traffic there are two kinds of usenet users,
> > > > viewers and traffic through Google Groups,
> > > > and, USENET. (USENET traffic.)
> > > >
> > > > Here now Google turned on login to view their
> > > > Google Groups - effectively closing the Google Groups
> > > > without a Google login.
> > > >
> > > > I suppose if they're used at work or whatever though
> > > > they'd be open.
> > > >
> > > >
> > > >
> > > > Where I got with the C10K non-blocking I/O for a usenet server,
> > > > it scales up though then I think in the runtime is a situation where
> > > > it only runs epoll or kqueue that the test scale ups, then at the end
> > > > or in sockets there is a drop, or it fell off the driver. I've implemented
> > > > the code this far, what has all of NNTP in a file and then the "re-routine,
> > > > industry-pattern back-end" in memory, then for that running usually.
> > > >
> > > > (Cooperative multithreading on top of non-blocking I/O.)
> > > >
> > > > Implementing the serial queue or "monohydra", or slique,
> > > > makes for that then when the parser is constantly parsing,
> > > > it seems a usual queue like data structure with parsing
> > > > returning its bounds, consuming the queue.
> > > >
> > > > Having the file buffers all down small on 4K pages,
> > > > has that a next usual page size is the megabyte.
> > > >
> > > > Here though it seems to make sense to have a natural
> > > > 4K alignment the file system representation, then that
> > > > it is moving files.
> > > >
> > > > So, then with the new modern Java, it that runs in its own
> > > > Java server runtime environment, it seems I would also
> > > > need to see whether the cloud virt supported the I/O model
> > > > or not, or that the cooperative multi-threading for example
> > > > would be single-threaded. (Blocking abstractly.)
> > > >
> > > > Then besides I suppose that could be neatly with basically
> > > > the program model, and its file model, being well-defined,
> > > > then for NNTP with IMAP organization search and extensions,
> > > > those being standardized, seems to make sense for an efficient
> > > > news file organization.
> > > >
> > > > Here then it seems for serving the NNTP, and for example
> > > > their file bodies under the storage, with the fixed headers,
> > > > variable header or XREF, and the message body, then under
> > > > content it's same as storage.
> > > >
> > > > NNTP has "OVERVIEW" then from it is built search.
> > > >
> > > > Let's see here then, if I get the load test running, or,
> > > > just put a limit under the load while there are no load test
> > > > errors, it seems the algorithm then scales under load to be
> > > > making usually the algorithm serial in CPU, with: encryption,
> > > > and compression (traffic). (Block ciphers instead of serial transfer.)
> > > >
> > > > Then, the industry pattern with re-routines, has that the
> > > > re-routines are naturally co-operative in the blocking,
> > > > and in the language, including flow-of-control and exception scope.
> > > >
> > > >
> > > > So, I have a high-performance implementation here.
> > > It seems like for NFS, then, and having the separate read and write of the client,
> > > a default filesystem, is an idea for the system facility: mirroring the mounted file
> > > locally, and, providing the read view from that via a different route.
> > >
> > >
> > > A next idea then seems for the organization, the client views themselves
> > > organize over the durable and available file system representation, this
> > > provides anyone a view over the protocol with a group file convention.
> > >
> > > I.e., while usual continuous traffic was surfing, individual reads over group
> > > files could have independent views, for example collating contents.
> > >
> > > Then, extracting requests from traffic and threads seems usual.
> > >
> > > (For example a specialized object transfer view.)
> > >
> > > Making protocols for implementing internet protocols in groups and
> > > so on, here makes for giving usenet example views to content generally.
> > >
> > > So, I have designed a protocol node and implemented it mostly,
> > > then about designed an object transfer protocol, here the idea
> > > is how to make it so people can extract data, for example their own
> > > data, from a large durable store of all the usenet messages,
> > > making views of usenet running on usenet, eg "Feb. 2016: AP's
> > > Greatest Hits".
> > >
> > > Here the point is to figure that usenet, these days, can be operated
> > > in cooperation with usenet, and really for its own sake, for leaving
> > > messages in usenet and here for usenet protocol stores as there's
> > > no reason it's plain text the content, while the protocol supports it.
> > >
> > > Building personal view for example is a simple matter of very many
> > > service providers any of which sells usenet all day for a good deal.
> > >
> > > Let's see here, $25/MM, storage on the cloud last year for about
> > > a million messages for a month is about $25. Outbound traffic is
> > > usually the metered cloud traffic, here for example that CDN traffic
> > > support the universal share convention, under metering. What that
> > > the algorithm is effectively tunable in CPU and RAM, makes for under
> > > I/O that's it's "unobtrusive" or the cooperative in routine, for CPI I/O and
> > > RAM, then that there is for seeking that Network Store or Database Time
> > > instead effectively becomes File I/O time, as what may be faster,
> > > and more durable. There's a faster database time for scaling the ingestion
> > > here with that the file view is eventually consistent. (And reliable.)
> > >
> > > Checking the files would be over time for example with "last checked"
> > > and "last dropped" something along the lines of, finding wrong offsets,
> > > basically having to make it so that it survives neatly corruption of the
> > > store (by being more-or-less stored in-place).
> > >
> > > Content catalog and such, catalog.
> > Then I wonder and figure the re-routine can scale.
> >
> > Here for the re-routine, the industry factory pattern,
> > and the commands in the protocols in the templates,
> > and the memory module, with the algorithm interface,
> > in the high-performance computer resource, it is here
> > that this simple kind of "writing Internet software"
> > makes pretty rapidly for adding resources.
> >
> > Here the design is basically of a file I/O abstraction,
> > that the computer reads data files with mmap to get
> > their handlers, what results that for I/O map the channels
> > result transferring the channels in I/O for what results,
> > in mostly the allocated resource requirements generally,
> > and for the protocol and algorithm, it results then that
> > the industry factory pattern and making for interfaces,
> > then also here the I/O routine as what results that this
> > is an implementation, of a network server, mostly is making
> > for that the re-routine, results very neatly a model of
> > parallel cooperation.
> >
> > I think computers still have file systems and file I/O but
> > in abstraction just because PAGE_SIZE is still relevant for
> > the network besides or I/O, if eventually, here is that the
> > value types are in the commands and so on, it is besides
> > that in terms of the resources so defined it still is in a filesystem
> > convention that a remote and unreliable view of it suffices.
> >
> > Here then the source code also being "this is only 20-50k",
> > lines of code, with basically an entire otherwise library stack
> > of the runtime itself, only the network and file abstraction,
> > this makes for also that modularity results. (Factory Industry
> > Pattern Modules.)
> >
> > For a network server, here, that, mostly it is high performance
> > in the sense that this is about the most direct handle on the channels
> > and here mostly for the text layer in the I/O order, or protocol layer,
> > here is that basically encryption and compression usually in the layer,
> > there is besides a usual concern where encryption and compression
> > are left out, there is that text in the layer itself is commands.
> >
> > Then, those being constants under the resources for the protocol,
> > it's what results usual protocols like NNTP and HTTP and other protocols
> > with usually one server and many clients, here is for that these protocols
> > are defined in these modules, mostly there NNTP and IMAP, ..., HTTP.
> >
> > These are here defined "all Java" or "Pure Java", i.e. let's be clear that
> > in terms of the reference abstraction layer, I think computers still use
> > the non-blocking I/O and filesystems and network to RAM, so that as
> > the I/O is implemented in those it actually has those besides instead for
> > example defaulting to byte-per-channel or character I/O. I.e. the usual
> > semantics for servicing the I/O in the accepter routine and what makes
> > for that the platform also provides a reference encryption implementation,
> > if not so relevant for the block encoder chain, besides that for example
> > compression has a default implementation, here the I/O model is as simply
> > in store for handles, channels, ..., that it results that data especially delivered
> > from a constant store can anyways be mostly compressed and encrypted
> > already or predigested to serve, here that it's the convention, here is for
> > resulting that these client-server protocols, with usually reads > postings
> > then here besides "retention", basically here is for what it is.
> >
> > With the re-routine and the protocol layer besides, having written the
> > routines in the re-routine, what there is to write here is this industry
> > factory, or a module framework, implementing the re-routines, as they're
> > built from the linear description a routine, makes for as the routine progresses
> > that it's "in the language" and that more than less in the terms, it makes for
> > implementing the case of logic for values, in the logic's flow-of-control's terms.
> >
> > Then, there is that actually running the software is different than just
> > writing it, here in the sense that as a server runtime, it is to be made a
> > thing, by giving it a name, and giving it an authority, to exist on the Internet.
> >
> > There is basically that for BGP and NAT and so on, and, mobile fabric networks,
> > IP and TCP/IP, of course IPv4 and IPv6 are the coarse fabric main space, with
> > respect to what are CIDR and 24 bits rule and what makes for TCP/IP, here
> > entirely the course is using the TCP/IP stack and Java's TCP/IP stack, with
> > respect to that TCP/IP is so provided or in terms of process what results
> > ports mostly and connection models where it is exactly the TCP after the IP,
> > the Transport Control Protocol and Internet Protocol, have here both this
> > socket and datagram connection orientation, or stateful and stateless or
> > here that in terms of routing it's defined in addresses, under that names
> > and routing define sources, routes, destinations, ..., that routine numeric
> > IP addresses result in the usual sense of the network being behind an IP
> > and including IPv4 network fabric with respect to local routers.
> >
> > I.e., here to include a service framework is "here besides the routine, let's
> > make it clear that in terms of being a durable resource, there needs to be
> > some lockbox filled with its sustenance that in some locked or constant
> > terms results that for the duration of its outlay, say five years, it is held
> > up, then, it will be so again, or, let down to result the carry-over that it
> > invested to archive itself, I won't have to care or do anything until then".
> >
> >
> > About the service activation, the idea is that, for a port, the routine itself
> > needs only run under load, i.e. there is effectively little traffic on the old
> > archives, and usually only some other, current archive sees any traffic. Here
> > the point is that for the Java routine there is the system port on which the
> > request was accepted, i.e. inetd or systemd or whatever means the network
> > service was accessed by, much as for HTTP the protocol is client-server and
> > for IP the protocol is client-server, while TCP is packets. This is a general
> > idea for system integration, while here the routine is mostly a detail:
> > the filesystem or network resource that results that the re-routines basically
> > make for very large CPU scaling.
> >
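For the inetd/systemd part, the JDK does expose the inherited socket, so a
sketch of "only run under load" looks like the following; note systemd would
need its inetd-compatible Accept=yes mode for this, since native socket
activation hands over the listener differently:

    import java.nio.channels.Channel;
    import java.nio.channels.ServerSocketChannel;
    import java.nio.channels.SocketChannel;

    public class ActivatedService {
        public static void main(String[] args) throws Exception {
            Channel inherited = System.inheritedChannel();
            if (inherited instanceof SocketChannel) {
                serve((SocketChannel) inherited);        // launched per connection
            } else if (inherited instanceof ServerSocketChannel) {
                ServerSocketChannel server = (ServerSocketChannel) inherited;
                while (true) serve(server.accept());     // we own the listener for a while
            } else {
                // not activated: bind a port ourselves and run as a plain daemon (not shown)
            }
        }
        static void serve(SocketChannel client) { /* hand off to the protocol routine */ }
    }
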
> > Then, it is basically containerized this sense of "at some domain name, there
> > is a service, it's HTTP and NNTP and IMAP besides, what cares the world".
> >
> > I.e. being built on connection oriented protocols like the socket layer,
> > HTTP(S) and NNTP(S) and IMAP(S) or with the TLS orientation to certificates,
> > it's more than less sensible that most users have no idea of installing some
> > NNTP browser or pointing their email to IMAP so that the email browser
> > browses the newsgroups and for postings, here this is mostly only talk
> > about implementing NNTP then IMAP and HTTP that happens to look like that,
> > besides for example SMTP or NNTP posting.
> >
> > I.e., having "this IMAP server, happens to be this NNTP module", or
> > "this HTTP server, happens to be a real simple mailbox these groups",
> > makes for having partitions and retentions of those and that basically
> > NNTP messages in the protocol can be more or less the same content
> > in media, what otherwise is of a usual message type.
> >
> > Then, the NNTP server-server routine is the propagation of messages,
> > besides "I shall hire ten great usenet retention accounts and gently
> > and politely draw them down and back-fill Usenet, these ten groups".
> >
> > By then I would have had to make for retention in storage such contents as
> > have a reference value, and then besides make that independent in reference
> > value, just so that it suffices that the result is basically "a usable,
> > durable filesystem that you happen to be able to browse like usenet". I.e., as
> > the pieces to make the backfill are dug up, they get assigned reference numbers
> > of their time, so that in the grand schema of things they have a reference
> > number in numerical order (which is also the server's "message-number",
> > besides its "message-id"); as noted above this gets into the storage for
> > retention of a file, while most services for this are instead for storage
> > and serving, not necessarily or at all retention.
> >
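As a sketch of such a layout, with the numerical order and the message-id
lookup side by side; the directory names are only illustrative, not a decided
convention:

    import java.nio.file.Path;

    public class ArticleStore {
        private final Path root;
        public ArticleStore(Path root) { this.root = root; }

        // e.g. root/sci.math/000/001/000001234 for article 1234 of sci.math:
        // the server's message-number in numerical order, fanned out by prefix
        Path articlePath(String group, long number) {
            String n = String.format("%09d", number);
            return root.resolve(group).resolve(n.substring(0, 3)).resolve(n.substring(3, 6)).resolve(n);
        }

        // e.g. root/.byid/ab/ab12cd34 holding "sci.math 1234", so a message-id
        // resolves to the same file without a second copy of the article
        Path idIndexPath(String messageId) {
            String h = String.format("%08x", messageId.hashCode());
            return root.resolve(".byid").resolve(h.substring(0, 2)).resolve(h);
        }
    }
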
> > I.e., the point is that as the groups are retained from retention, there is an
> > approach what makes for an orderly archeology, as for what convention
> > some data arrives, here that this server-server routine is besides the usual
> > routine which is "here are new posts, propagate them", it's "please deliver
> > as of a retention scan, and I'll try not to repeat it, what results as orderly
> > as possible a proof or exercise of what we'll call afterward entire retention",
> > then will be for as of writing a file that "as of the date, from start to finish,
> > this site certified these messages as best-effort retention".
> >
> > It seems then besides there is basically "here is some mbox file, serve it
> > like it was an NNTP group or an IMAP mailbox", i.e. ingestion. What ingestion
> > is to result in, for the protocol, is that "for this protocol, there is
> > actually a normative filesystem representation that happens to be pretty much
> > altogether defined by the protocol"; the point is that ingestion would result
> > in a command that remains in the protocol, for a usual file type that
> > "presents a usual abstraction, of a filesystem, as from the contents of a
> > file". Here the notion is that "for all these threaded discussions, this
> > system only cares about some approach to these ten particular newsgroups,
> > which already have mostly their corpus, though perhaps not in their native
> > mbox, instead consulted from services".
> >
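A sketch of that ingestion step, splitting an mbox on its "From " separator
lines into articles that can then be laid down under the filesystem convention;
mbox variants differ, this is only the common mboxo case:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.ArrayList;
    import java.util.List;

    public class MboxIngest {
        static List<String> splitMessages(Path mbox) throws IOException {
            List<String> messages = new ArrayList<>();
            StringBuilder current = null;
            try (BufferedReader r = Files.newBufferedReader(mbox, StandardCharsets.ISO_8859_1)) {
                String line;
                while ((line = r.readLine()) != null) {
                    if (line.startsWith("From ")) {               // mbox message separator
                        if (current != null) messages.add(current.toString());
                        current = new StringBuilder();
                        continue;
                    }
                    if (current == null) continue;                // junk before the first "From "
                    if (line.startsWith(">From ")) line = line.substring(1); // undo mbox quoting
                    current.append(line).append("\r\n");          // NNTP wants CRLF
                }
            }
            if (current != null) messages.add(current.toString());
            return messages;
        }
    }
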
> > Then, there's storing and serving the files, and there is the usual notion
> > that moving the data is to result that really these file organizations are
> > not so large in terms of resources, being "less than gigabytes" or so. Still
> > there's a notion that as a durable resource they're to be made fungible, here
> > the networked-file approach in the native filesystem; then, with respect to
> > its being a backing store, it's to make for that the entire enterprise is
> > more or less to be made in terms of account, so that, as a facility on the
> > network and then a service in the network, the facility and the service are
> > basically separated, while still of course the service is basically defined
> > by its corpus.
> >
> >
> > Then, to make that fungible in a world of account, while with an exit
> > strategy so that the operation isn't left abstract, is mostly about the
> > domain name, then that what results the networking, after trusted
> > network naming and connections for what result routing, and then
> > the port, in terms of that there are usual firewalls in ports though that
> > besides usually enough client ports are ephemeral, here the point is
> > that the protocols and their well-known ports, here it's usually enough
> > that the Internet doesn't concern itself so much protocols but with
> > respect to proxies, here that for example NNTP and IMAP don't have
> > so much anything so related that way after startTLS. For the world of
> > account, is basically to have for a domain name, an administrator, and,
> > an owner or representative. These are to establish authority for changes
> > and also accountability for usage.
> >
> > Basically they're to be persons and there is a process to get to be an
> > administrator of DNS, most always there are services that a usual person
> > implementing the system might use, besides for example the numerical.
> >
> > More relevant though to DNS is getting servers on the network, with respect
> > to listening ports and that they connect to clients what so discover them as
> > via DNS or configuration, here as above the usual notion that these are
> > standard services and run on well-known ports for inetd or systemd.
> > I.e. there is basically that running a server and dedicated networking,
> > and power and so on, and some notion of the limits of reliability, is then
> > as very much in other aspects of the organization of the system, i.e. its name,
> > while at the same time, the point that a module makes for that basically
> > the provision of a domain name or well-known or ephemeral host, is the
> > usual notion that static IP addresses are a limited resource and as about
> > the various networks in IPv4 and how they route traffic, is for that these
> > services have well-known sections in DNS for at least that the most usual
> > configuration is none.
> >
> > For a usual global reliability and availability, the notion is basically that
> > each region and zone has a service available on the IP address, such that the
> > "hostname" resolves to those IP addresses. As well, in reverse, the IP
> > address should resolve back to the hostname.
> >
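That forward-and-reverse rule is easy to check with the JDK resolver; the
hostname here is just the example name from this thread:

    import java.net.InetAddress;

    public class DnsCheck {
        public static void main(String[] args) throws Exception {
            String host = args.length > 0 ? args[0] : "usenet.science";
            for (InetAddress addr : InetAddress.getAllByName(host)) {
                // getCanonicalHostName does the reverse (PTR) lookup,
                // falling back to the literal address if none is set
                System.out.println(host + " -> " + addr.getHostAddress()
                        + " -> " + addr.getCanonicalHostName());
            }
        }
    }
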
> > About certificates mostly for identification after mapping to port, or
> > multi-home Internet routing, here is the point that whether the domain
> > name administration is "epochal" or "regular", is that epochs are defined
> > by the ports behind the numbers and the domain name system as well,
> > where in terms of the registrar, the domain names are epochal to the
> > registrar, with respect to owners of domain names.
> >
> > Then, whether DNS is a datagram or UDP service is as for ICMP and TCP/IP,
> > and also BGP and NAT and routing and what are local and remote addresses;
> > here the point is not so much "implement DNS, the protocol, also while
> > you're at it", rather that what results is a durable and long-standing and
> > proper doorman, for some usenet.science.
> >
> > Here then the notion seems to be whether the doorman basically
> > knows well-known services, is a multi-homing router, or otherwise
> > what is the point that it starts the lean runtime, with respect to that
> > it's a container and having enough sense of administration its operation
> > as contained. I.e., here, given a port and a hostname and always running,
> > makes for that as long as there is low (preferably no) idle for services
> > running that have no clients, is here also for the cheapest doorman that
> > knows how to stand up the client sentinel. (And put it back away.)
> >
> > Probably the most awful thing in the cloud services is the cost for
> > data ingress and egress. What that means is that for example using
> > a facility that is bound by that as a cost instead of under some constant
> > cost, is basically why there is the approach that the container needs a
> > handle to the files, and they're either local files or network files, here
> > with the some convention above in archival a shared consistent view
> > of all the files, or abstractly consistent, is for making that the doorman
> > can handle lots of starting and finishing connections, while it is out of
> > the way when usually it's client traffic and opening and closing connections,
> > and the usual abstraction is that the client sentinel is never off and doorman
> > does nothing, here is for attaching the one to some lower constant cost,
> > where for example any long-running cost is more than some low constant cost.
> >
> > Then, this kind of service is often represented by nodes, in the usual sense
> > "here is an abstract container with you hope some native performance under
> > the hypervisor where it lives on the farm on its rack, it basically is moved the
> > image to wherever it's requested from and lives there, have fun, the meter is on".
> > I.e. that's just "this Jar has some config conventions and you can make the
> > container associate it and watchdog it with systemd for example and use the
> > cgroups while you're at it and make for tmpfs quota and also the best network
> > file share, which you might be welcome to cache if you care just in the off-chance
> > that this file-mapping is free or constant cost as long as it doesn't egress the
> > network", is for here about the facilities that work, to get a copy of the system
> > what with respect to its usual operation is a piece of the Internet.
> >
> > For the different reference modules (industry factories) in their patterns then
> > and under combined configuration "file + process + network + fare", is that
> > the fare of the service basically reflects a daily coin, in the sense that it
> > represents an annual or epochal fee, what results for the time there is
> > what is otherwise all defined the "file + process + network + name",
> > what results it perpetuates in operation more than less simply and automatically.
> >
> > Then, the point though is to get it to where "I can go to this service, and
> > administer it more or less by paying an account, that it thus lives in its
> > budget and quota in its metered world".
> >
> > That though is very involved with identity, that in terms of "I the account
> > as provided this sum make this sum paid with respect to an agreement",
> > is that authority to make agreements must make that it results that the
> > operation of the system, is entirely transparent, and defined in terms of
> > the roles and delegation, conventions in operation.
> >
> > I.e., I personally don't want to administer a copy of usenet, but, it's here
> > pretty much sorted out that I can administer one once then that it's to
> > administer itself in the following, in terms of it having resources to allocate
> > and resources to disburse. Also if nobody's using it it should basically work
> > itself out to dial its lights down (while maintaining availability).
> >
> > Then a point seems "maintain and administer the operation in effect,
> > what arrangement sees via delegation, that a card number and a phone
> > number and an email account and more than less a responsible entity,
> > is so indicated for example in cryptographic identity thus that the operation
> > of this system as a service, effectively operates itself out of a kitty,
> > what makes for administration and overhead, an entirely transparent
> > model of a miniature business the system as a service".
> >
> > "... and a mailing address and mail service."
> >
> > Then, for accounts and accounts, for example is the provision of the component
> > as simply an image in cloud algorithms, where basically as above here it's configured
> > that anybody with any cloud account could basically run it on their own terms,
> > there is for here sorting out "after this delegation to some business entity, what
> > results is a corporation in effect; the rest is business-in-a-box, and more-than-less
> > what makes for its administration in state is how it basically limits and replicates
> > its service", in terms of its own assets, here where what is administered is abstractly
> > "durable forever mailboxes with private ownership, if on public or managed resources".
> >
> > A usual notion of a private email and usenet service offering and business-in-a-box,
> > here what I'm looking at is that besides archiving sci.math and copying out its content
> > under author line, is to make such an industry for example here that "once having
> > implemented an Internet service, an Internet service of them results Internet".
> >
> > I.e. here the point is to make a corporation and a foundation in effect, what in terms
> > of then about the books and accounts, is about accounts for the business accounts
> > that reflect a persistent entity, then what results in terms of computing, networking,
> > and internetworking, with a regular notion of "let's never change this arrangement
> > but it's in monthly or annual terms", here for that in overall arrangements,
> > it results what the entire system more than less runs in ways then to either
> > run out its limits or make itself a sponsored effort, about more-or-less a simple
> > and responsible and accountable set of operations what effect the business
> > (here that in terms of service there is basically the realm of agreement)
> > that basically this sort of business-in-a-box model, is then besides itself of
> > accounts, toward the notion as pay-as-you-go and "usual credits and their limits".
> >
> > Then for a news://usenet.science, or for example sci.math.usenet.science,
> > is the idea that the entity is "some assemblage what is so that in DNS, and,
> > in the accounts payable and receivable, and, in the material matters of
> > arrangement and authority for administration, of DNS and resources and
> > accounts what result durably persisting the business, is basically for a service
> > then of what are usual enough tasks, such as interactive workflows
> > and mechanical workflows".
> >
> > I.e. the point is for having the service than an on/off button and more or less
> > what is for a given instance of the operation, what results from some protocol
> > that provides a "durable store" of a sort of the business, that at any time basically
> > some re-routine or "eventually consistent" continuance of the operation of the
> > business, results basically a continuity in its operations, what is entirely granular,
> > that here for example the point is to "pick a DNS name, attach an account service,
> > go" it so results that in the terms, basically there are the placeholders of the
> > interactive workflows in that, and as what in terms are often for example simply
> > card and phone number terms, account terms.
> >
> > I.e. a service to replenish accounts as kitties for making accounts only and
> > exactly limited to the one service, its transfers, basically results that there
> > is the notion of an email address, a phone number, a credit card's information,
> > here a fixed limit debit account that works as of a kitty, there is a regular workflow
> > service that will read out the durable stores and according to the timeliness of
> > their events, affect the configuration and reconciliation of payments for accounts
> > (closed loop scheduling/receiving).
> >
> > https://datatracker.ietf.org/doc/draft-flanagan-regext-datadictionary/
> > https://www.rfc-editor.org/rfc/rfc9022.txt
> >
> > Basically for dailies, monthlies, and annuals, what make weeklies,
> > is this idea of Internet-from-an-account, what is services.
> After implementing a store, and the protocol for getting messages, then what seems relevant here in the
> context of the SEARCH command, is a fungible file-format, that is derived from the body of the message
> in a normal form, that is a data structure that represents an index and catalog and dictionary and summary
> of the message, a form of a data structure of a "search index".
>
> These types of files should naturally compose, so that under some normal
> forms of search and summary algorithms a data structure results that makes for efficient
> search of sections of the corpus for information retrieval, here that "information retrieval is the science
> of search algorithms".
>
> Now, for what and how people search, or what is the specification of a search, is in terms of queries, say,
> here for some brief forms of queries that advise what's definitely included in the search, what's excluded,
> then perhaps what's maybe included, or yes/no/maybe, which makes for a predicate that can be built,
> that can be applied to results that compose and build for the terms of a filter with yes/no/maybe or
> sure/no/yes, with predicates in values.
>
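A sketch of that yes/no/maybe filter over a message's term set; "maybe" terms
don't constrain the match, they would only feed ranking, so the predicate
ignores them here (the class and method names are just for the sketch):

    import java.util.Set;
    import java.util.function.Predicate;

    public class YesNoMaybe {
        static Predicate<Set<String>> filter(Set<String> yes, Set<String> no) {
            return terms -> terms.containsAll(yes)                   // everything definitely included
                         && no.stream().noneMatch(terms::contains);  // nothing excluded
        }
        // e.g. filter(Set.of("ordinal"), Set.of("spam")).test(termsOfMessage)
    }
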
> Here there is basically "free text search" and "matching summaries", where text is the text and summary is
> a data structure, with attributes as paths, the leaves of the tree of which match.
>
> Then, the message has text, its body, and headers, key-value pairs or collections thereof, where as well
> there are default summaries like "a histogram of words by occurrence" or for example default text like "the
> MIME body of this message has a default text representation".
>
> So, the idea developing here is to define what are "normal" forms of data structures that have some "normal"
> forms of encoding that result that these "normalizing" after "normative" data structures define well-behaved
> algorithms upon them, which provide well-defined bounds in resources that return some quantification of results,
> like any/each/every/all, "hits".
>
> This is where search engines' or collected search algorithms ("find") usually enough have these
> de-facto forms "under the hood", as it were; the idea is to make it first-class that for a given message and body
> there is a normal form of a "catalog summary index" which can be compiled to a constant when the message
> is ingested, that then basically any filestore of these messages has alongside it the filestore of the "catsums"
> or as on-demand, then that any algorithm has at least well-defined behavior under partitions or collections
> or selections of these messages, or items, for various standard algorithms that separate "to find" from
> "to serve to find".
>
> So, ..., what I'm wondering is what would be sufficient normal forms, in brief, such that there is
> defined, for a given corpus of messages, basically at the granularity of messages,
> a normal form for each message, its "catsum"; that catsums have a natural algebra, where a
> concatenation of catsums is a catsum; and that some standard algorithms naturally have well-defined
> results on their predicates and quantifiers of matching, in serial and parallel, and that the results
> combine in serial and parallel.
>
> The results should be applicable to any kind of data but here it's more or less about usenet groups.
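
For the algebra, the simplest normal form that behaves that way seems to be a
term histogram plus a message count, where merge is associative and
commutative, so the same answer comes out serially or in parallel; a sketch,
not a decided format:

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public final class CatSum {
        final Map<String, Long> termCounts = new HashMap<>();
        long messages;

        static CatSum ofMessage(List<String> terms) {
            CatSum c = new CatSum();
            c.messages = 1;
            for (String t : terms) c.termCounts.merge(t, 1L, Long::sum);
            return c;
        }

        CatSum merge(CatSum other) {                      // associative, order-independent
            CatSum out = new CatSum();
            out.messages = this.messages + other.messages;
            out.termCounts.putAll(this.termCounts);
            other.termCounts.forEach((t, n) -> out.termCounts.merge(t, n, Long::sum));
            return out;
        }
    }

Then a partition's catsum is just a reduce over its messages' term lists, e.g.
termLists.parallelStream().map(CatSum::ofMessage).reduce(new CatSum(), CatSum::merge),
and concatenating partitions is one more merge.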


So I start browsing the Information Retrieval section in Wikipedia and more or less get to reading
Luhn's 1958 "automatic coding of document summaries" or "The Automatic Creation of Literature
Abstracts". Then, what I figure, is that the histogram, is an associative array of keys to counts,
and what I figure is to compute both the common terms, and, the rare terms, so that there's both
"common-weight" and "rare-weight" computed, off of the count of the terms, and the count of
distinct terms, where it is working up that besides catsums, it would result a relational
algebra of terms in, ..., terms, of counts and densities and these types of things. This is where, first I
would figure the catsum would be deterministic before it's at all probabilistic, because the goal is
match-find not match-guess, while still it's to support the less deterministic but more opportunistic
at the same time.
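
The formulas aren't pinned down above, so as an assumed reading: "common-weight"
as a term's share of all occurrences, "rare-weight" growing as the count falls
toward one, both read straight off the histogram and the distinct-term count:

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class TermWeights {
        public static void main(String[] args) {
            List<String> terms = List.of("the", "the", "the", "ordinal", "ordinal", "transfinite");
            Map<String, Long> histogram = new HashMap<>();   // associative array of keys to counts
            for (String t : terms) histogram.merge(t, 1L, Long::sum);

            long total = terms.size();
            long distinct = histogram.size();
            histogram.forEach((term, count) -> {
                double commonWeight = (double) count / total;                 // assumed form
                double rareWeight = Math.log((double) distinct / count) + 1;  // assumed form
                System.out.printf("%-12s count=%d common=%.3f rare=%.3f%n",
                        term, count, commonWeight, rareWeight);
            });
        }
    }

Both are plain deterministic read-outs of the counts, which fits match-find
before match-guess.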

Then, the "index" is basically like a usual book's index, for each term that's not a common term in
the language but is a common term in the book, what page it's on, here that is a read-out of
a histogram of the terms to pages. Then, compound terms, basically get into grammar, and in terms
of terms, I don't so much care to parse glossolalia as what result mostly well-defined compound terms
in usual natural languages, for the utility of a dictionary and technical dictionaries. Here "pages" are
both according to common message threads, and also the surround of messages in the same time
period, where a group is a common message thread and a usenet is a common message thread.
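
A sketch of that read-out: given a histogram per "page" (a page being a thread,
or a time period, as above), invert it into term -> pages, skipping a stop list
of language-common words; what exactly counts as a page is the open question:

    import java.util.Map;
    import java.util.Set;
    import java.util.SortedSet;
    import java.util.TreeMap;
    import java.util.TreeSet;

    public class IndexReadout {
        static Map<String, SortedSet<String>> invert(Map<String, Map<String, Long>> histogramByPage,
                                                     Set<String> stopWords) {
            Map<String, SortedSet<String>> index = new TreeMap<>();
            histogramByPage.forEach((page, histogram) ->
                histogram.keySet().stream()
                         .filter(term -> !stopWords.contains(term))
                         .forEach(term -> index.computeIfAbsent(term, t -> new TreeSet<>()).add(page)));
            return index;
        }
    }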

(I've had a copy of "the information retrieval book" before, also borrowed one "data logic".)

"Spelling mistakes considered adversarial."

https://en.wikipedia.org/wiki/Subject_indexing#Indexing_theory

Then, there's lots to be said for "summary" and "summary in statistic".


A first usual data structure for efficiency is the binary tree or bounding tree. Then, there's
also what makes for divide-and-conquer or linear speedup.


About the same time as Luhn's monograph or 1956, there was published a little book
called "Logic and Language", Huppe and Kaminsky. It details how according to linguistics
there are certain usual regular patterns of words after phonemes and morphology what
result then for stems and etymology that then for vocabulary that grammar or natural
language results above. Then there are also gentle introductions to logic. It's very readable
and quite brief.
> > > For a network server, here, that, mostly it is high performance
> > > in the sense that this is about the most direct handle on the channels
> > > and here mostly for the text layer in the I/O order, or protocol layer,
> > > here is that basically encryption and compression usually in the layer,
> > > there is besides a usual concern where encryption and compression
> > > are left out, there is that text in the layer itself is commands.
> > >
> > > Then, those being constants under the resources for the protocol,
> > > it's what results usual protocols like NNTP and HTTP and other protocols
> > > with usually one server and many clients, here is for that these protocols
> > > are defined in these modules, mostly there NNTP and IMAP, ..., HTTP.
> > >
> > > These are here defined "all Java" or "Pure Java", i.e. let's be clear that
> > > in terms of the reference abstraction layer, I think computers still use
> > > the non-blocking I/O and filesystems and network to RAM, so that as
> > > the I/O is implemented in those it actually has those besides instead for
> > > example defaulting to byte-per-channel or character I/O. I.e. the usual
> > > semantics for servicing the I/O in the accepter routine and what makes
> > > for that the platform also provides a reference encryption implementation,
> > > if not so relevant for the block encoder chain, besides that for example
> > > compression has a default implementation, here the I/O model is as simply
> > > in store for handles, channels, ..., that it results that data especially delivered
> > > from a constant store can anyways be mostly compressed and encrypted
> > > already or predigested to serve, here that it's the convention, here is for
> > > resulting that these client-server protocols, with usually reads > postings
> > > then here besides "retention", basically here is for what it is.
> > >
> > > With the re-routine and the protocol layer besides, having written the
> > > routines in the re-routine, what there is to write here is this industry
> > > factory, or a module framework, implementing the re-routines, as they're
> > > built from the linear description a routine, makes for as the routine progresses
> > > that it's "in the language" and that more than less in the terms, it makes for
> > > implementing the case of logic for values, in the logic's flow-of-control's terms.
> > >
> > > Then, there is that actually running the software is different than just
> > > writing it, here in the sense that as a server runtime, it is to be made a
> > > thing, by giving it a name, and giving it an authority, to exist on the Internet.
> > >
> > > There is basically that for BGP and NAT and so on, and, mobile fabric networks,
> > > IP and TCP/IP, of course IPv4 and IPv6 are the coarse fabric main space, with
> > > respect to what are CIDR and 24 bits rule and what makes for TCP/IP, here
> > > entirely the course is using the TCP/IP stack and Java's TCP/IP stack, with
> > > respect to that TCP/IP is so provided or in terms of process what results
> > > ports mostly and connection models where it is exactly the TCP after the IP,
> > > the Transport Control Protocol and Internet Protocol, have here both this
> > > socket and datagram connection orientation, or stateful and stateless or
> > > here that in terms of routing it's defined in addresses, under that names
> > > and routing define sources, routes, destinations, ..., that routine numeric
> > > IP addresses result in the usual sense of the network being behind an IP
> > > and including IPv4 network fabric with respect to local routers.
> > >
> > > I.e., here to include a service framework is "here besides the routine, let's
> > > make it clear that in terms of being a durable resource, there needs to be
> > > some lockbox filled with its sustenance that in some locked or constant
> > > terms results that for the duration of its outlay, say five years, it is held
> > > up, then, it will be so again, or, let down to result the carry-over that it
> > > invested to archive itself, I won't have to care or do anything until then".
> > >
> > >
> > > About the service activation and the idea that, for a port, the routine itself
> > > needs only run under load, i.e. there is effectively little traffic on the old archives,
> > > and usually only the some other archive needs any traffic. Here the point is
> > > that for the Java routine there is the system port that was accepted for the
> > > request, that inetd or the systemd or means the network service was accessed,
> > > made for that much as for HTTP the protocol is client-server also for IP the
> > > protocol is client-server, while the TCP is packets. This is a general idea for
> > > system integration while here mostly the routine is that being a detail:
> > > the filesystem or network resource that results that the re-routines basically
> > > make very large CPU scaling.
> > >
> > > Then, it is basically containerized this sense of "at some domain name, there
> > > is a service, it's HTTP and NNTP and IMAP besides, what cares the world".
> > >
> > > I.e. being built on connection oriented protocols like the socket layer,
> > > HTTP(S) and NNTP(S) and IMAP(S) or with the TLS orientation to certificates,
> > > it's more than less sensible that most users have no idea of installing some
> > > NNTP browser or pointing their email to IMAP so that the email browser
> > > browses the newsgroups and for postings, here this is mostly only talk
> > > about implementing NNTP then IMAP and HTTP that happens to look like that,
> > > besides for example SMTP or NNTP posting.
> > >
> > > I.e., having "this IMAP server, happens to be this NNTP module", or
> > > "this HTTP server, happens to be a real simple mailbox these groups",
> > > makes for having partitions and retentions of those and that basically
> > > NNTP messages in the protocol can be more or less the same content
> > > in media, what otherwise is of a usual message type.
> > >
> > > Then, the NNTP server-server routine is the progation of messages
> > > besides "I shall hire ten great usenet retention accounts and gently
> > > and politely draw them down and back-fill Usenet, these ten groups".
> > >
> > > By then I would have to have made for retention in storage, such contents,
> > > as have a reference value, then for besides making that independent in
> > > reference value, just so that it suffices that it basically results "a usable
> > > durable filesystem that happens you can browse it like usenet". I.e. as
> > > the pieces to make the backfill are dug up, they get assigned reference numbers
> > > of their time to make for what here is that in a grand schema of things,
> > > they have a reference number in numerical order (and what's also the
> > > server's "message-number" besides its "message-id") as noted above this
> > > gets into the storage for retention of a file, while, most services for this
> > > are instead for storage and serving, not necessarily or at all retention.
> > >
> > > I.e., the point is that as the groups are retained from retention, there is an
> > > approach what makes for an orderly archeology, as for what convention
> > > some data arrives, here that this server-server routine is besides the usual
> > > routine which is "here are new posts, propagate them", it's "please deliver
> > > as of a retention scan, and I'll try not to repeat it, what results as orderly
> > > as possible a proof or exercise of what we'll call afterward entire retention",
> > > then will be for as of writing a file that "as of the date, from start to finish,
> > > this site certified these messages as best-effort retention".
> > >
> > > It seems then besides there is basically "here is some mbox file, serve it
> > > like it was an NNTP group or an IMAP mailbox", ingestion, in terms of that
> > > what is ingestion, is to result for the protocol that "for this protocol,
> > > there is actually a normative filesystem representation that happens to
> > > be pretty much also altogether definede by the protocol", the point is
> > > that ingestion would result in command to remain in the protocol,
> > > that a usual file type that "presents a usual abstraction, of a filesystem,
> > > as from the contents of a file", here with the notion of "for all these
> > > threaded discussions, here this system only cares some approach to
> > > these ten particular newgroups that already have mostly their corpus
> > > though it's not in perhaps their native mbox instead consulted from services".
> > >
> > > Then, there's for storing and serving the files, and there is the usual
> > > notion that moving the data, is to result, that really these file organizations
> > > are not so large in terms of resources, being "less than gigabytes" or so,
> > > still there's a notion that as a durable resource they're to be made
> > > fungible here the networked file approach in the native filesystem,
> > > then that with respect to it's a backing store, it's to make for that
> > > the entire enterprise is more or less to made in terms of account,
> > > that then as a facility on the network then a service in the network,
> > > it's basically separated the facility and service, while still of course
> > > that the service is basically defined by its corpus.
> > >
> > >
> > > Then, to make that fungible in a world of account, while with an exit
> > > strategy so that the operation isn't not abstract, is mostly about the
> > > domain name, then that what results the networking, after trusted
> > > network naming and connections for what result routing, and then
> > > the port, in terms of that there are usual firewalls in ports though that
> > > besides usually enough client ports are ephemeral, here the point is
> > > that the protocols and their well-known ports, here it's usually enough
> > > that the Internet doesn't concern itself so much protocols but with
> > > respect to proxies, here that for example NNTP and IMAP don't have
> > > so much anything so related that way after startTLS. For the world of
> > > account, is basically to have for a domain name, an administrator, and,
> > > an owner or representative. These are to establish authority for changes
> > > and also accountability for usage.
> > >
> > > Basically they're to be persons and there is a process to get to be an
> > > administrator of DNS, most always there are services that a usual person
> > > implementing the system might use, besides for example the numerical.
> > >
> > > More relevant though to DNS is getting servers on the network, with respect
> > > to listening ports and that they connect to clients what so discover them as
> > > via DNS or configuration, here as above the usual notion that these are
> > > standard services and run on well-known ports for inetd or systemd.
> > > I.e. there is basically that running a server and dedicated networking,
> > > and power and so on, and some notion of the limits of reliability, is then
> > > as very much in other aspects of the organization of the system, i.e. its name,
> > > while at the same time, the point that a module makes for that basically
> > > the provision of a domain name or well-known or ephemeral host, is the
> > > usual notion that static IP addresses are a limited resource and as about
> > > the various networks in IPv4 and how they route traffic, is for that these
> > > services have well-known sections in DNS for at least that the most usual
> > > configuration is none.
> > >
> > > For a usual global reliability and availability, is some notion basically that
> > > each region and zone has a service available on the IP address, for that
> > > "hostname" resolves to the IP addresses. As well, in reverse, for the IP
> > > address and about the hostname, it should resolve reverse to hostname.
> > >
> > > About certificates mostly for identification after mapping to port, or
> > > multi-home Internet routing, here is the point that whether the domain
> > > name administration is "epochal" or "regular", is that epochs are defined
> > > by the ports behind the numbers and the domain name system as well,
> > > where in terms of the registrar, the domain names are epochal to the
> > > registrar, with respect to owners of domain names.
> > >
> > > Then if DNS is a datagram or UDP service is for ICMP as for TCP/IP,
> > > and also BGP and NAT and routing and what are local and remote
> > > addresses, here is for not-so-much "implement DNS the protocol
> > > also while you're at it", rather for what results that there is a durable
> > > and long-standing and proper doorman, for some usenet.science.
> > >
> > > Here then the notion seems to be whether the doorman basically
> > > knows well-known services, is a multi-homing router, or otherwise
> > > what is the point that it starts the lean runtime, with respect to that
> > > it's a container and having enough sense of administration its operation
> > > as contained. I.e. here given a port and a hostname and always running
> > > makes for that as long as there is the low (preferable no) idle for services
> > > running that have no clients, is here also for the cheapest doorman that
> > > knows how to standup the client sentinel. (And put it back away.)
> > >
> > > Probably the most awful thing in the cloud services is the cost for
> > > data ingress and egress. What that means is that for example using
> > > a facility that is bound by that as a cost instead of under some constant
> > > cost, is basically why there is the approach that the containers needs a
> > > handle to the files, and they're either local files or network files, here
> > > with the some convention above in archival a shared consistent view
> > > of all the files, or abstractly consistent, is for making that the doorman
> > > can handle lots of starting and finishing connections, while it is out of
> > > the way when usually it's client traffic and opening and closing connections,
> > > and the usual abstraction is that the client sentinel is never off and doorman
> > > does nothing, here is for attaching the one to some lower constant cost,
> > > where for example any long-running cost is more than some low constant cost.
> > >
> > > Then, this kind of service is often represented by nodes, in the usual sense
> > > "here is an abstract container with you hope some native performance under
> > > the hypervisor where it lives on the farm on its rack, it basically is moved the
> > > image to wherever it's requested from and lives there, have fun, the meter is on".
> > > I.e. that's just "this Jar has some config conventions and you can make the
> > > container associate it and watchdog it with systemd for example and use the
> > > cgroups while you're at it and make for tempfs quota and also the best network
> > > file share, which you might be welcome to cache if you care just in the off-chance
> > > that this file-mapping is free or constant cost as long as it doesn't egress the
> > > network", is for here about the facilities that work, to get a copy of the system
> > > what with respect to its usual operation is a piece of the Internet.
> > >
> > > For the different reference modules (industry factories) in their patterns then
> > > and under combined configuration "file + process + network + fare", is that
> > > the fare of the service basically reflects a daily coin, in the sense that it
> > > represents an annual or epochal fee, what results for the time there is
> > > what is otherwise all defined the "file + process + network + name",
> > > what results it perpetuates in operation more than less simply and automatically.
> > >
> > > Then, the point though is to get it to where "I can go to this service, and
> > > administer it more or less by paying an account, that it thus lives in its
> > > budget and quota in its metered world".
> > >
> > > That though is very involved with identity, that in terms of "I the account
> > > as provided this sum make this sum paid with respect to an agreement",
> > > is that authority to make agreements must make that it results that the
> > > operation of the system, is entirely transparent, and defined in terms of
> > > the roles and delegation, conventions in operation.
> > >
> > > I.e., I personally don't want to administer a copy of usenet, but, it's here
> > > pretty much sorted out that I can administer one once then that it's to
> > > administer itself in the following, in terms of it having resources to allocate
> > > and resources to disburse. Also if nobody's using it it should basically work
> > > itself out to dial its lights down (while maintaining availability).
> > >
> > > Then a point seems "maintain and administer the operation in effect,
> > > what arrangement sees via delegation, that a card number and a phone
> > > number and an email account and more than less a responsible entity,
> > > is so indicated for example in cryptographic identity thus that the operation
> > > of this system as a service, effectively operates itself out of a kitty,
> > > what makes for administration and overhead, an entirely transparent
> > > model of a miniature business the system as a service".
> > >
> > > "... and a mailing address and mail service."
> > >
> > > Then, for accounts and accounts, for example is the provision of the component
> > > as simply an image in cloud algorithms, where basically as above here it's configured
> > > that anybody with any cloud account could basically run it on their own terms,
> > > there is for here sorting out "after this delegation to some business entity what
> > > results a corporation in effect, the rest is business-in-a-box and more-than-less
> > > what makes for its administration in state, is for how it basically limits and replicates
> > > its service, in terms of its own assets here as what administered is abstractly
> > > "durable forever mailboxes with private ownership if on public or managed resources".
> > >
> > > A usual notion of a private email and usenet service offering and business-in-a-box,
> > > here what I'm looking at is that besides archiving sci.math and copying out its content
> > > under author line, is to make such an industry for example here that "once having
> > > implemented an Internet service, an Internet service of them results Internet".
> > >
> > > I.e. here the point is to make a corporation and a foundation in effect, what in terms
> > > of then about the books and accounts, is about accounts for the business accounts
> > > that reflect a persistent entity, then what results in terms of computing, networking,
> > > and internetworking, with a regular notion of "let's never change this arrangement
> > > but it's in monthly or annual terms", here for that in overall arrangements,
> > > it results what the entire system more than less runs in ways then to either
> > > run out its limits or make itself a sponsored effort, about more-or-less a simple
> > > and responsible and accountable set of operations what effect the business
> > > (here that in terms of service there is basically the realm of agreement)
> > > that basically this sort of business-in-a-box model, is then besides itself of
> > > accounts, toward the notion as pay-as-you-go and "usual credits and their limits".
> > >
> > > Then for a news://usenet.science, or for example sci.math.usenet.science,
> > > is the idea that the entity is "some assemblage what is so that in DNS, and,
> > > in the accounts payable and receivable, and, in the material matters of
> > > arrangement and authority for administration, of DNS and resources and
> > > accounts what result durably persisting the business, is basically for a service
> > > then of what these are usual enough tasks, as that are interactive workflows
> > > and for mechanical workflows.
> > >
> > > I.e. the point is for having the service than an on/off button and more or less
> > > what is for a given instance of the operation, what results from some protocol
> > > that provides a "durable store" of a sort of the business, that at any time basically
> > > some re-routine or "eventually consistent" continuance of the operation of the
> > > business, results basically a continuity in its operations, what is entirely granular,
> > > that here for example the point is to "pick a DNS name, attach an account service,
> > > go" it so results that in the terms, basically there are the placeholders of the
> > > interactive workflows in that, and as what in terms are often for example simply
> > > card and phone number terms, account terms.
> > >
> > > I.e. a service to replenish accounts as kitties for making accounts only and
> > > exactly limited to the one service, its transfers, basically results that there
> > > is the notion of an email address, a phone number, a credit card's information,
> > > here a fixed limit debit account that works as of a kitty, there is a regular workflow
> > > service that will read out the durable stores and according to the timeliness of
> > > their events, affect the configuration and reconciliation of payments for accounts
> > > (closed loop scheduling/receiving).
> > >
> > > https://datatracker.ietf.org/doc/draft-flanagan-regext-datadictionary/
> > > https://www.rfc-editor.org/rfc/rfc9022.txt
> > >
> > > Basically for dailies, monthlies, and annuals, what make weeklies,
> > > is this idea of Internet-from-a- account, what is services.
> > After implementing a store, and the protocol for getting messages, then what seems relevant here in the
> > context of the SEARCH command, is a fungible file-format, that is derived from the body of the message
> > in a normal form, that is a data structure that represents an index and catalog and dictionary and summary
> > of the message, a form of a data structure of a "search index".
> >
> > These types files should naturally compose, and result a data structure that according to some normal
> > forms of search and summary algorithms, result that a data structure results, that makes for efficient
> > search of sections of the corpus for information retrieval, here that "information retrieval is the science
> > of search algorithms".
> >
> > Now, for what and how people search, or what is the specification of a search, is in terms of queries, say,
> > here for some brief forms of queries that advise what's definitely included in the search, what's excluded,
> > then perhaps what's maybe included, or yes/no/maybe, which makes for a predicate that can be built,
> > that can be applied to results that compose and build for the terms of a filter with yes/no/maybe or
> > sure/no/yes, with predicates in values.
> >
> > Here there is basically "free text search" and "matching summaries", where text is the text and summary is
> > a data structure, with attributes as paths the leaves of the tree of which match.
> >
> > Then, the message has text, its body, and and headers, key-value pairs or collections thereof, where as well
> > there are default summaries like "a histogram of words by occurrence" or for example default text like "the
> > MIME body of this message has a default text representation".
> >
> > So, the idea developing here is to define what are "normal" forms of data structures that have some "normal"
> > forms of encoding that result that these "normalizing" after "normative" data structures define well-behaved
> > algorithms upon them, which provide well-defined bounds in resources that return some quantification of results,
> > like any/each/every/all, "hits".
> >
> > This is where usually enough search engines' or collected search algorithms ("find") usually enough have these
> > de-facto forms, "under the hood", as it were, to make it first-class that for a given message and body that
> > there is a normal form of a "catalog summary index" which can be compiled to a constant when the message
> > is ingested, that then basically any filestore of these messages has alongside it the filestore of the "catsums"
> > or as on-demand, then that any algorithm has at least well-defined behavior under partitions or collections
> > or selections of these messages, or items, for various standard algorithms that separate "to find" from
> > "to serve to find".
> >
> > So, ..., what I'm wondering are what would be sufficient normal forms in brief that result that there are
> > defined for a given corpus of messages, basically at the granularity of messages, how is defined how
> > there is a normal form for each message its "catsum", that catsums have a natural algebra that a
> > concatenation of catsums is a catsum and that some standard algorithms naturally have well-defined
> > results on their predicates and quantifiers of matching, in serial and parallel, and that the results
> > combine in serial and parallel.
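Reading "a concatenation of catsums is a catsum" as saying the per-message summaries merge associatively with an empty identity, here is a sketch under that assumption, with the catsum reduced to just its term histogram:

    import java.util.*;

    class Catsum {
        final Map<String, Integer> termCounts;
        Catsum(Map<String, Integer> termCounts) { this.termCounts = termCounts; }

        static Catsum empty() { return new Catsum(new HashMap<>()); }

        // Merging two catsums gives another catsum: associative, with an identity,
        // so partitions of the corpus can be reduced in any order.
        Catsum merge(Catsum other) {
            Map<String, Integer> out = new HashMap<>(termCounts);
            other.termCounts.forEach((t, n) -> out.merge(t, n, Integer::sum));
            return new Catsum(out);
        }

        // Serial or parallel reduction over per-message catsums gives the same
        // group-level catsum, because merge is associative with an identity.
        static Catsum ofAll(List<Catsum> perMessage) {
            return perMessage.parallelStream().reduce(Catsum.empty(), Catsum::merge);
        }
    }

Because the merge is associative, the same reduction works over a partition, a selection, or the whole corpus, in serial or in parallel.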
> >
> > The results should be applicable to any kind of data but here it's more or less about usenet groups.
> So I start browsing the Information Retrieval section in Wikipedia and more or less get to reading
> Luhn's 1958 "automatic coding of document summaries" or "The Automatic Creation of Literature
> Abstracts". Then, what I figure, is that the histogram, is an associative array of keys to counts,
> and what I figure is to compute both the common terms, and, the rare terms, so that there's both
> "common-weight" and "rare-weight" computed, off of the count of the terms, and the count of
> distinct terms, where it is working up that besides catsums, it would result a relational
> algebra of terms in, ..., terms, of counts and densities and these type things. This is where, first I
> would figure the catsum would be deterministic before it's at all probabilistic, because the goal is
> match-find not match-guess, while still it's to support the less deterministic but more opportunistic
> at the same time.
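The "common-weight" and "rare-weight" aren't pinned down above, so as an assumption take them as term-frequency-like and inverse-document-frequency-like quantities off the counts and the count of messages containing a term; an illustrative sketch:

    import java.util.*;

    class TermWeights {
        // commonWeight: how common the term is within this one message's histogram.
        static double commonWeight(Map<String, Integer> histogram, String term) {
            int total = histogram.values().stream().mapToInt(Integer::intValue).sum();
            return total == 0 ? 0.0 : histogram.getOrDefault(term, 0) / (double) total;
        }

        // rareWeight: how rare the term is across messages, here an IDF-like
        // log of (messages / messages containing the term). Illustrative only.
        static double rareWeight(int messagesTotal, int messagesContainingTerm) {
            if (messagesContainingTerm == 0) return 0.0;
            return Math.log(messagesTotal / (double) messagesContainingTerm);
        }
    }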
>
> Then, the "index" is basically like a usual book's index, for each term that's not a common term in
> the language but is a common term in the book, what page it's on, here that this is a read-out of
> a histogram of the terms to pages. Then, compound terms, basically get into grammar, and in terms
> of terms, I don't so much care to parse glossolalia as what result mostly well-defined compound terms
> in usual natural languages, for the utility of a dictionary and technical dictionaries. Here "pages" are
> both according to common message threads, and also the surround of messages in the same time
> period, where a group is a common message thread and a usenet is a common message thread.
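A book-style index read out of the histograms is then an inverted index from term to "pages", where a page may be a thread or a time bucket; a minimal sketch with illustrative names:

    import java.util.*;

    class InvertedIndex {
        // term -> the set of "pages" (thread ids, or period buckets) it occurs in
        final Map<String, SortedSet<String>> postings = new TreeMap<>();

        void add(String page, Collection<String> termsOnPage) {
            for (String term : termsOnPage) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(page);
            }
        }

        Set<String> pagesFor(String term) {
            return postings.getOrDefault(term, Collections.emptySortedSet());
        }
    }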
>
> (I've had a copy of "the information retrieval book" before, also borrowed one "data logic".)
>
> "Spelling mistakes considered adversarial."
>
> https://en.wikipedia.org/wiki/Subject_indexing#Indexing_theory
>
> Then, there's lots to be said for "summary" and "summary in statistic".
>
>
> A first usual data structure for efficiency is the binary tree or bounding tree. Then, there's
> also what makes for divide-and-conquer or linear speedup.
>
>
> About the same time as Luhn's monograph or 1956, there was published a little book
> called "Logic and Language", Huppe and Kaminsky. It details how according to linguistics
> there are certain usual regular patterns of words after phonemes and morphology what
> result then for stems and etymology that then for vocabulary that grammar or natural
> language results above. Then there are also gentle introductions to logic. It's very readable
> and quite brief.
Ross Finlayson
2023-04-29 21:54:21 UTC
Reply
Permalink
I haven't much been tapping away at this,
but it's pretty simple to stand up a usenet peer,
and pretty simple to slurp a copy,
of the "Big 8" usenet text groups, for example,
or particularly just for a few.
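As a sketch of the "slurp a copy" step, a plain reader connection per RFC 3977 is enough to walk a group's article numbers and pull the articles; the host below is a placeholder, it fetches only a handful of articles, and error handling is left out:

    import java.io.*;
    import java.net.Socket;

    public class Slurp {
        public static void main(String[] args) throws IOException {
            try (Socket s = new Socket("news.example.org", 119);   // placeholder host
                 BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream(), "ISO-8859-1"));
                 PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(s.getOutputStream(), "ISO-8859-1"), true)) {
                System.out.println(in.readLine());          // 200/201 greeting
                out.print("GROUP sci.math\r\n"); out.flush();
                String[] group = in.readLine().split(" ");  // "211 count low high group"
                long low = Long.parseLong(group[2]), high = Long.parseLong(group[3]);
                for (long n = low; n <= Math.min(low + 9, high); n++) {
                    out.print("ARTICLE " + n + "\r\n"); out.flush();
                    String status = in.readLine();
                    if (!status.startsWith("220")) continue;   // e.g. 423: no such article
                    String line;
                    while (!(line = in.readLine()).equals("."))   // dot terminates the article
                        System.out.println(line.startsWith("..") ? line.substring(1) : line);
                }
                out.print("QUIT\r\n"); out.flush();
            }
        }
    }

A real slurp would page through the whole low-high range (or take OVER/XOVER first) and write each article into the store under the filesystem convention rather than printing it.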
Ross Finlayson
2023-12-22 08:36:33 UTC
Reply
Permalink
On Saturday, April 29, 2023 at 2:54:26 PM UTC-7, Ross Finlayson wrote:
> On Wednesday, March 8, 2023 at 10:23:04 PM UTC-8, Ross Finlayson wrote:
> > On Wednesday, March 8, 2023 at 8:51:58 PM UTC-8, Ross Finlayson wrote:
> > > On Monday, December 6, 2021 at 7:32:16 AM UTC-8, Ross A. Finlayson wrote:
> > > > On Monday, November 16, 2020 at 5:39:08 PM UTC-8, Ross A. Finlayson wrote:
> > > > > On Monday, November 16, 2020 at 5:00:51 PM UTC-8, Ross A. Finlayson wrote:
> > > > > > On Tuesday, June 30, 2020 at 10:00:52 AM UTC-7, Mostowski Collapse wrote:
> > > > > > > NNTP is not HTTP. I was using bare metal access to
> > > > > > > usenet, not using Google group, via:
> > > > > > >
> > > > > > > news.albasani.net, unfortunately dead since Corona
> > > > > > >
> > > > > > > So was looking for an alternative. And found this
> > > > > > > alternative, which seems fine:
> > > > > > >
> > > > > > > news.solani.org
> > > > > > >
> > > > > > > Have Fun!
> > > > > > >
> > > > > > > P.S.: Technical spec of news.solani.org:
> > > > > > >
> > > > > > > Intel Core i7, 4x 3.4 GHz, 32 GB RAM, 500 GB SSD RAID1
> > > > > > > Intel Core i7, 4x 3.4 GHz, 32 GB RAM, 2 TB HDD RAID1
> > > > > > > Intel Xeon VM, 1.8 GHz, 1 GB RAM, 20 GB
> > > > > > > Location: 2x Falkenstein, 1x New York
> > > > > > >
> > > > > > > advantage of bare metal usenet,
> > > > > > > you see all headers of message.
> > > > > > > On Tuesday, June 30, 2020 at 06:24:53 UTC+2, Ross A. Finlayson wrote:
> > > > > > > > Search you mentioned and for example HTTP is adding the SEARCH verb,
> > > > > > In traffic there are two kinds of usenet users,
> > > > > > viewers and traffic through Google Groups,
> > > > > > and, USENET. (USENET traffic.)
> > > > > >
> > > > > > Here now Google turned on login to view their
> > > > > > Google Groups - effectively closing the Google Groups
> > > > > > without a Google login.
> > > > > >
> > > > > > I suppose if they're used at work or whatever though
> > > > > > they'd be open.
> > > > > >
> > > > > >
> > > > > >
> > > > > > Where I got with the C10K non-blocking I/O for a usenet server,
> > > > > > it scales up though then I think in the runtime is a situation where
> > > > > > it only runs epoll or kqueue that the test scales up, then at the end
> > > > > > or in sockets there is a drop, or it fell off the driver. I've implemented
> > > > > > the code this far, what has all of NNTP in a file and then the "re-routine,
> > > > > > industry-pattern back-end" in memory, then for that running usually.
> > > > > >
> > > > > > (Cooperative multithreading on top of non-blocking I/O.)
> > > > > >
> > > > > > Implementing the serial queue or "monohydra", or slique,
> > > > > > makes for that then when the parser is constantly parsing,
> > > > > > it seems a usual queue like data structure with parsing
> > > > > > returning its bounds, consuming the queue.
> > > > > >
> > > > > > Having the file buffers all down small on 4K pages,
> > > > > > has that a next usual page size is the megabyte.
> > > > > >
> > > > > > Here though it seems to make sense to have a natural
> > > > > > 4K alignment the file system representation, then that
> > > > > > it is moving files.
> > > > > >
> > > > > > So, then with the new modern Java, that runs in its own
> > > > > > Java server runtime environment, it seems I would also
> > > > > > need to see whether the cloud virt supported the I/O model
> > > > > > or not, or that the cooperative multi-threading for example
> > > > > > would be single-threaded. (Blocking abstractly.)
> > > > > >
> > > > > > Then besides I suppose that could be neatly with basically
> > > > > > the program model, and its file model, being well-defined,
> > > > > > then for NNTP with IMAP organization search and extensions,
> > > > > > those being standardized, seems to make sense for an efficient
> > > > > > news file organization.
> > > > > >
> > > > > > Here then it seems for serving the NNTP, and for example
> > > > > > their file bodies under the storage, with the fixed headers,
> > > > > > variable header or XREF, and the message body, then under
> > > > > > content it's same as storage.
> > > > > >
> > > > > > NNTP has "OVERVIEW" then from it is built search.
> > > > > >
> > > > > > Let's see here then, if I get the load test running, or,
> > > > > > just put a limit under the load while there are no load test
> > > > > > errors, it seems the algorithm then scales under load to be
> > > > > > making usually the algorithm serial in CPU, with: encryption,
> > > > > > and compression (traffic). (Block ciphers instead of serial transfer.)
> > > > > >
> > > > > > Then, the industry pattern with re-routines, has that the
> > > > > > re-routines are naturally co-operative in the blocking,
> > > > > > and in the language, including flow-of-control and exception scope.
> > > > > >
> > > > > >
> > > > > > So, I have a high-performance implementation here.
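As a rough sketch of that shape, and not of the implementation described above, a single-threaded selector loop can append reads onto a per-connection queue while the parser consumes complete CRLF-terminated commands off its front; the port and the handling here are placeholders:

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.*;

    public class NntpAcceptLoop {
        public static void main(String[] args) throws IOException {
            Selector selector = Selector.open();
            ServerSocketChannel server = ServerSocketChannel.open();
            server.bind(new InetSocketAddress(1190));          // placeholder port
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);
            while (true) {
                selector.select();
                for (SelectionKey key : selector.selectedKeys()) {
                    if (key.isAcceptable()) {
                        SocketChannel c = server.accept();
                        c.configureBlocking(false);
                        // attach a growable command queue per connection
                        c.register(selector, SelectionKey.OP_READ, new StringBuilder());
                    } else if (key.isReadable()) {
                        SocketChannel c = (SocketChannel) key.channel();
                        StringBuilder queue = (StringBuilder) key.attachment();
                        ByteBuffer buf = ByteBuffer.allocate(4096);   // 4K page-sized reads
                        if (c.read(buf) < 0) { key.cancel(); c.close(); continue; }
                        buf.flip();
                        queue.append(new String(buf.array(), 0, buf.limit(), "ISO-8859-1"));
                        // the parser consumes complete commands off the front of the queue,
                        // returning their bounds and leaving any partial command behind
                        int end;
                        while ((end = queue.indexOf("\r\n")) >= 0) {
                            String command = queue.substring(0, end);
                            queue.delete(0, end + 2);
                            System.out.println("command: " + command);  // placeholder hand-off
                        }
                    }
                }
                selector.selectedKeys().clear();
            }
        }
    }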
> > > > > It seems like for NFS, then, and having the separate read and write of the client,
> > > > > a default filesystem, is an idea for the system facility: mirroring the mounted file
> > > > > locally, and, providing the read view from that via a different route.
> > > > >
> > > > >
> > > > > A next idea then seems for the organization, the client views themselves
> > > > > organize over the durable and available file system representation, this
> > > > > provides anyone a view over the protocol with a group file convention.
> > > > >
> > > > > I.e., while usual continuous traffic was surfing, individual reads over group
> > > > > files could have independent views, for example collating contents.
> > > > >
> > > > > Then, extracting requests from traffic and threads seems usual.
> > > > >
> > > > > (For example a specialized object transfer view.)
> > > > >
> > > > > Making protocols for implementing internet protocols in groups and
> > > > > so on, here makes for giving usenet example views to content generally.
> > > > >
> > > > > So, I have designed a protocol node and implemented it mostly,
> > > > > then about designed an object transfer protocol, here the idea
> > > > > is how to make it so people can extract data, for example their own
> > > > > data, from a large durable store of all the usenet messages,
> > > > > making views of usenet running on usenet, eg "Feb. 2016: AP's
> > > > > Greatest Hits".
> > > > >
> > > > > Here the point is to figure that usenet, these days, can be operated
> > > > > in cooperation with usenet, and really for its own sake, for leaving
> > > > > messages in usenet and here for usenet protocol stores as there's
> > > > > no reason it's plain text the content, while the protocol supports it.
> > > > >
> > > > > Building personal view for example is a simple matter of very many
> > > > > service providers any of which sells usenet all day for a good deal.
> > > > >
> > > > > Let's see here, $25/MM, storage on the cloud last year for about
> > > > > a million messages for a month is about $25. Outbound traffic is
> > > > > usually the metered cloud traffic, here for example that CDN traffic
> > > > > support the universal share convention, under metering. What that
> > > > > the algorithm is effectively tunable in CPU and RAM, makes for under
> > > > > > I/O that it's "unobtrusive" or the cooperative in routine, for CPU I/O and
> > > > > RAM, then that there is for seeking that Network Store or Database Time
> > > > > instead effectively becomes File I/O time, as what may be faster,
> > > > > and more durable. There's a faster database time for scaling the ingestion
> > > > > here with that the file view is eventually consistent. (And reliable.)
> > > > >
> > > > > Checking the files would be over time for example with "last checked"
> > > > > and "last dropped" something along the lines of, finding wrong offsets,
> > > > > basically having to make it so that it survives neatly corruption of the
> > > > > store (by being more-or-less stored in-place).
> > > > >
> > > > > Content catalog and such, catalog.
> > > > Then I wonder and figure the re-routine can scale.
> > > >
> > > > Here for the re-routine, the industry factory pattern,
> > > > and the commands in the protocols in the templates,
> > > > and the memory module, with the algorithm interface,
> > > > in the high-performance computer resource, it is here
> > > > that this simple kind of "writing Internet software"
> > > > makes pretty rapidly for adding resources.
> > > >
> > > > Here the design is basically of a file I/O abstraction,
> > > > that the computer reads data files with mmap to get
> > > > their handlers, what results that for I/O map the channels
> > > > result transferring the channels in I/O for what results,
> > > > in mostly the allocated resource requirements generally,
> > > > and for the protocol and algorithm, it results then that
> > > > the industry factory pattern and making for interfaces,
> > > > then also here the I/O routine as what results that this
> > > > is an implementation, of a network server, mostly is making
> > > > for that the re-routine, results very neatly a model of
> > > > parallel cooperation.
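A sketch of that file-and-channel shape in the Java abstraction, with placeholder names: a stored article is memory-mapped to get a handle for parsing, and is served by transferring channel to channel.

    import java.io.IOException;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.channels.WritableByteChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    class ArticleFile {
        // Map the stored article read-only; the mapping is the "handle".
        static MappedByteBuffer map(Path article) throws IOException {
            try (FileChannel ch = FileChannel.open(article, StandardOpenOption.READ)) {
                return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            }
        }

        // Serve the article by channel-to-channel transfer, avoiding a copy
        // through user space where the platform supports it.
        static void serve(Path article, WritableByteChannel client) throws IOException {
            try (FileChannel ch = FileChannel.open(article, StandardOpenOption.READ)) {
                long position = 0, size = ch.size();
                while (position < size) {
                    position += ch.transferTo(position, size - position, client);
                }
            }
        }
    }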
> > > >
> > > > I think computers still have file systems and file I/O but
> > > > in abstraction just because PAGE_SIZE is still relevant for
> > > > the network besides or I/O, if eventually, here is that the
> > > > value types are in the commands and so on, it is besides
> > > > that in terms of the resources so defined it still is in a filesystem
> > > > convention that a remote and unreliable view of it suffices.
> > > >
> > > > Here then the source code also being "this is only 20-50k",
> > > > lines of code, with basically an entire otherwise library stack
> > > > of the runtime itself, only the network and file abstraction,
> > > > this makes for also that modularity results. (Factory Industry
> > > > Pattern Modules.)
> > > >
> > > > For a network server, here, that, mostly it is high performance
> > > > in the sense that this is about the most direct handle on the channels
> > > > and here mostly for the text layer in the I/O order, or protocol layer,
> > > > here is that basically encryption and compression usually in the layer,
> > > > there is besides a usual concern where encryption and compression
> > > > are left out, there is that text in the layer itself is commands.
> > > >
> > > > Then, those being constants under the resources for the protocol,
> > > > it's what results usual protocols like NNTP and HTTP and other protocols
> > > > with usually one server and many clients, here is for that these protocols
> > > > are defined in these modules, mostly there NNTP and IMAP, ..., HTTP.
> > > >
> > > > These are here defined "all Java" or "Pure Java", i.e. let's be clear that
> > > > in terms of the reference abstraction layer, I think computers still use
> > > > the non-blocking I/O and filesystems and network to RAM, so that as
> > > > the I/O is implemented in those it actually has those besides instead for
> > > > example defaulting to byte-per-channel or character I/O. I.e. the usual
> > > > semantics for servicing the I/O in the accepter routine and what makes
> > > > for that the platform also provides a reference encryption implementation,
> > > > if not so relevant for the block encoder chain, besides that for example
> > > > compression has a default implementation, here the I/O model is as simply
> > > > in store for handles, channels, ..., that it results that data especially delivered
> > > > from a constant store can anyways be mostly compressed and encrypted
> > > > already or predigested to serve, here that it's the convention, here is for
> > > > resulting that these client-server protocols, with usually reads > postings
> > > > then here besides "retention", basically here is for what it is.
> > > >
> > > > With the re-routine and the protocol layer besides, having written the
> > > > routines in the re-routine, what there is to write here is this industry
> > > > factory, or a module framework, implementing the re-routines, as they're
> > > > built from the linear description a routine, makes for as the routine progresses
> > > > that it's "in the language" and that more than less in the terms, it makes for
> > > > implementing the case of logic for values, in the logic's flow-of-control's terms.
> > > >
> > > > Then, there is that actually running the software is different than just
> > > > writing it, here in the sense that as a server runtime, it is to be made a
> > > > thing, by giving it a name, and giving it an authority, to exist on the Internet.
> > > >
> > > > There is basically that for BGP and NAT and so on, and, mobile fabric networks,
> > > > IP and TCP/IP, of course IPv4 and IPv6 are the coarse fabric main space, with
> > > > respect to what are CIDR and 24 bits rule and what makes for TCP/IP, here
> > > > entirely the course is using the TCP/IP stack and Java's TCP/IP stack, with
> > > > respect to that TCP/IP is so provided or in terms of process what results
> > > > ports mostly and connection models where it is exactly the TCP after the IP,
> > > > the Transmission Control Protocol and Internet Protocol, have here both this
> > > > socket and datagram connection orientation, or stateful and stateless or
> > > > here that in terms of routing it's defined in addresses, under that names
> > > > and routing define sources, routes, destinations, ..., that routine numeric
> > > > IP addresses result in the usual sense of the network being behind an IP
> > > > and including IPv4 network fabric with respect to local routers.
> > > >
> > > > I.e., here to include a service framework is "here besides the routine, let's
> > > > make it clear that in terms of being a durable resource, there needs to be
> > > > some lockbox filled with its sustenance that in some locked or constant
> > > > terms results that for the duration of its outlay, say five years, it is held
> > > > up, then, it will be so again, or, let down to result the carry-over that it
> > > > invested to archive itself, I won't have to care or do anything until then".
> > > >
> > > >
> > > > About the service activation and the idea that, for a port, the routine itself
> > > > needs only run under load, i.e. there is effectively little traffic on the old archives,
> > > > and usually only the some other archive needs any traffic. Here the point is
> > > > that for the Java routine there is the system port that was accepted for the
> > > > request, that inetd or the systemd or means the network service was accessed,
> > > > made for that much as for HTTP the protocol is client-server also for IP the
> > > > protocol is client-server, while the TCP is packets. This is a general idea for
> > > > system integration while here mostly the routine is that being a detail:
> > > > the filesystem or network resource that results that the re-routines basically
> > > > make very large CPU scaling.
> > > >
> > > > Then, it is basically containerized this sense of "at some domain name, there
> > > > is a service, it's HTTP and NNTP and IMAP besides, what cares the world".
> > > >
> > > > I.e. being built on connection oriented protocols like the socket layer,
> > > > HTTP(S) and NNTP(S) and IMAP(S) or with the TLS orientation to certificates,
> > > > it's more than less sensible that most users have no idea of installing some
> > > > NNTP browser or pointing their email to IMAP so that the email browser
> > > > browses the newsgroups and for postings, here this is mostly only talk
> > > > about implementing NNTP then IMAP and HTTP that happens to look like that,
> > > > besides for example SMTP or NNTP posting.
> > > >
> > > > I.e., having "this IMAP server, happens to be this NNTP module", or
> > > > "this HTTP server, happens to be a real simple mailbox these groups",
> > > > makes for having partitions and retentions of those and that basically
> > > > NNTP messages in the protocol can be more or less the same content
> > > > in media, what otherwise is of a usual message type.
> > > >
> > > > Then, the NNTP server-server routine is the propagation of messages
> > > > besides "I shall hire ten great usenet retention accounts and gently
> > > > and politely draw them down and back-fill Usenet, these ten groups".
> > > >
> > > > By then I would have to have made for retention in storage, such contents,
> > > > as have a reference value, then for besides making that independent in
> > > > reference value, just so that it suffices that it basically results "a usable
> > > > durable filesystem that happens you can browse it like usenet". I.e. as
> > > > the pieces to make the backfill are dug up, they get assigned reference numbers
> > > > of their time to make for what here is that in a grand schema of things,
> > > > they have a reference number in numerical order (and what's also the
> > > > server's "message-number" besides its "message-id") as noted above this
> > > > gets into the storage for retention of a file, while, most services for this
> > > > are instead for storage and serving, not necessarily or at all retention.
> > > >
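To make that reference-number-alongside-message-id idea a little more concrete, here is a sketch of one possible write-once filesystem convention: the group and arrival number give the article path, and a hashed message-id directory points back at it. The fan-out shape and the "msgid" index name are illustrative only, not the layout described above.

// Sketch: one possible article layout on a write-once store.
import java.nio.file.Path;

public class ArticleLayout {
    private final Path root;

    public ArticleLayout(Path root) { this.root = root; }

    // e.g. sci.math article 12345 -> <root>/sci/math/000/012/345
    public Path articlePath(String group, long number) {
        String n = String.format("%09d", number);
        return root.resolve(group.replace('.', '/'))
                   .resolve(n.substring(0, 3))
                   .resolve(n.substring(3, 6))
                   .resolve(n.substring(6));
    }

    // e.g. <abc@example> -> <root>/msgid/1a/2b/1a2b3c4d, a small file naming group:number
    public Path messageIdPath(String messageId) {
        String h = String.format("%08x", messageId.hashCode());
        return root.resolve("msgid")
                   .resolve(h.substring(0, 2))
                   .resolve(h.substring(2, 4))
                   .resolve(h);
    }
}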
> > > > I.e., the point is that as the groups are retained from retention, there is an
> > > > approach what makes for an orderly archeology, as for what convention
> > > > some data arrives, here that this server-server routine is besides the usual
> > > > routine which is "here are new posts, propagate them", it's "please deliver
> > > > as of a retention scan, and I'll try not to repeat it, what results as orderly
> > > > as possible a proof or exercise of what we'll call afterward entire retention",
> > > > then will be for as of writing a file that "as of the date, from start to finish,
> > > > this site certified these messages as best-effort retention".
> > > >
> > > > It seems then besides there is basically "here is some mbox file, serve it
> > > > like it was an NNTP group or an IMAP mailbox", ingestion, in terms of that
> > > > what is ingestion, is to result for the protocol that "for this protocol,
> > > > there is actually a normative filesystem representation that happens to
> > > > be pretty much also altogether defined by the protocol", the point is
> > > > that ingestion would result in command to remain in the protocol,
> > > > that a usual file type that "presents a usual abstraction, of a filesystem,
> > > > as from the contents of a file", here with the notion of "for all these
> > > > threaded discussions, here this system only cares some approach to
> > > > these ten particular newsgroups that already have mostly their corpus
> > > > though it's not in perhaps their native mbox instead consulted from services".
> > > >
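For the "here is some mbox file, serve it like it was an NNTP group" ingestion, here is a deliberately small sketch: split the mbox on its "From " separator lines and write one numbered article file per message. It glosses over "From "-unescaping, headers, and the fan-out layout; it is only the shape of the step.

// Sketch: ingest an mbox into per-article files with arrival numbers assigned in order.
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class MboxIngest {
    public static long ingest(Path mbox, Path groupDir) throws IOException {
        Files.createDirectories(groupDir);
        long number = 0;
        StringBuilder article = new StringBuilder();
        try (BufferedReader in = Files.newBufferedReader(mbox, StandardCharsets.ISO_8859_1)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith("From ")) {          // mbox message separator
                    if (article.length() > 0) {
                        write(groupDir, ++number, article);
                        article.setLength(0);
                    }
                } else {
                    article.append(line).append("\r\n");
                }
            }
            if (article.length() > 0) write(groupDir, ++number, article);
        }
        return number; // highest article number ingested
    }

    private static void write(Path dir, long number, StringBuilder body) throws IOException {
        Files.write(dir.resolve(String.format("%09d", number)),
                    body.toString().getBytes(StandardCharsets.ISO_8859_1));
    }
}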
> > > > Then, there's for storing and serving the files, and there is the usual
> > > > notion that moving the data, is to result, that really these file organizations
> > > > are not so large in terms of resources, being "less than gigabytes" or so,
> > > > still there's a notion that as a durable resource they're to be made
> > > > fungible here the networked file approach in the native filesystem,
> > > > then that with respect to it's a backing store, it's to make for that
> > > > the entire enterprise is more or less to be made in terms of account,
> > > > that then as a facility on the network then a service in the network,
> > > > it's basically separated the facility and service, while still of course
> > > > that the service is basically defined by its corpus.
> > > >
> > > >
> > > > Then, to make that fungible in a world of account, while with an exit
> > > > strategy so that the operation isn't not abstract, is mostly about the
> > > > domain name, then that what results the networking, after trusted
> > > > network naming and connections for what result routing, and then
> > > > the port, in terms of that there are usual firewalls in ports though that
> > > > besides usually enough client ports are ephemeral, here the point is
> > > > that the protocols and their well-known ports, here it's usually enough
> > > > that the Internet doesn't concern itself so much protocols but with
> > > > respect to proxies, here that for example NNTP and IMAP don't have
> > > > so much anything so related that way after startTLS. For the world of
> > > > account, is basically to have for a domain name, an administrator, and,
> > > > an owner or representative. These are to establish authority for changes
> > > > and also accountability for usage.
> > > >
> > > > Basically they're to be persons and there is a process to get to be an
> > > > administrator of DNS, most always there are services that a usual person
> > > > implementing the system might use, besides for example the numerical.
> > > >
> > > > More relevant though to DNS is getting servers on the network, with respect
> > > > to listening ports and that they connect to clients what so discover them as
> > > > via DNS or configuration, here as above the usual notion that these are
> > > > standard services and run on well-known ports for inetd or systemd.
> > > > I.e. there is basically that running a server and dedicated networking,
> > > > and power and so on, and some notion of the limits of reliability, is then
> > > > as very much in other aspects of the organization of the system, i.e. its name,
> > > > while at the same time, the point that a module makes for that basically
> > > > the provision of a domain name or well-known or ephemeral host, is the
> > > > usual notion that static IP addresses are a limited resource and as about
> > > > the various networks in IPv4 and how they route traffic, is for that these
> > > > services have well-known sections in DNS for at least that the most usual
> > > > configuration is none.
> > > >
> > > > For a usual global reliability and availability, is some notion basically that
> > > > each region and zone has a service available on the IP address, for that
> > > > "hostname" resolves to the IP addresses. As well, in reverse, for the IP
> > > > address and about the hostname, it should resolve reverse to hostname.
> > > >
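That forward-and-reverse check is about two calls in the JDK; a sketch, with "example.org" standing in for whatever hostname the service ends up under.

// Sketch: check that the hostname resolves to addresses and that each address
// resolves back (PTR) to a hostname.
import java.net.InetAddress;

public class DnsCheck {
    public static void main(String[] args) throws Exception {
        String hostname = args.length > 0 ? args[0] : "example.org"; // placeholder
        for (InetAddress addr : InetAddress.getAllByName(hostname)) {
            String reverse = addr.getCanonicalHostName(); // reverse lookup, or the literal IP if none
            System.out.printf("%s -> %s -> %s%n", hostname, addr.getHostAddress(), reverse);
        }
    }
}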
> > > > About certificates mostly for identification after mapping to port, or
> > > > multi-home Internet routing, here is the point that whether the domain
> > > > name administration is "epochal" or "regular", is that epochs are defined
> > > > by the ports behind the numbers and the domain name system as well,
> > > > where in terms of the registrar, the domain names are epochal to the
> > > > registrar, with respect to owners of domain names.
> > > >
> > > > Then if DNS is a datagram or UDP service is for ICMP as for TCP/IP,
> > > > and also BGP and NAT and routing and what are local and remote
> > > > addresses, here is for not-so-much "implement DNS the protocol
> > > > also while you're at it", rather for what results that there is a durable
> > > > and long-standing and proper doorman, for some usenet.science.
> > > >
> > > > Here then the notion seems to be whether the doorman basically
> > > > knows well-known services, is a multi-homing router, or otherwise
> > > > what is the point that it starts the lean runtime, with respect to that
> > > > it's a container and having enough sense of administration its operation
> > > > as contained. I.e. here given a port and a hostname and always running
> > > > makes for that as long as there is the low (preferable no) idle for services
> > > > running that have no clients, is here also for the cheapest doorman that
> > > > knows how to standup the client sentinel. (And put it back away.)
> > > >
> > > > Probably the most awful thing in the cloud services is the cost for
> > > > data ingress and egress. What that means is that for example using
> > > > a facility that is bound by that as a cost instead of under some constant
> > > > cost, is basically why there is the approach that the containers need a
> > > > handle to the files, and they're either local files or network files, here
> > > > with the some convention above in archival a shared consistent view
> > > > of all the files, or abstractly consistent, is for making that the doorman
> > > > can handle lots of starting and finishing connections, while it is out of
> > > > the way when usually it's client traffic and opening and closing connections,
> > > > and the usual abstraction is that the client sentinel is never off and doorman
> > > > does nothing, here is for attaching the one to some lower constant cost,
> > > > where for example any long-running cost is more than some low constant cost.
> > > >
> > > > Then, this kind of service is often represented by nodes, in the usual sense
> > > > "here is an abstract container with you hope some native performance under
> > > > the hypervisor where it lives on the farm on its rack, it basically is moved the
> > > > image to wherever it's requested from and lives there, have fun, the meter is on".
> > > > I.e. that's just "this Jar has some config conventions and you can make the
> > > > container associate it and watchdog it with systemd for example and use the
> > > > cgroups while you're at it and make for tempfs quota and also the best network
> > > > file share, which you might be welcome to cache if you care just in the off-chance
> > > > that this file-mapping is free or constant cost as long as it doesn't egress the
> > > > network", is for here about the facilities that work, to get a copy of the system
> > > > what with respect to its usual operation is a piece of the Internet.
> > > >
> > > > For the different reference modules (industry factories) in their patterns then
> > > > and under combined configuration "file + process + network + fare", is that
> > > > the fare of the service basically reflects a daily coin, in the sense that it
> > > > represents an annual or epochal fee, what results for the time there is
> > > > what is otherwise all defined the "file + process + network + name",
> > > > what results it perpetuates in operation more than less simply and automatically.
> > > >
> > > > Then, the point though is to get it to where "I can go to this service, and
> > > > administer it more or less by paying an account, that it thus lives in its
> > > > budget and quota in its metered world".
> > > >
> > > > That though is very involved with identity, that in terms of "I the account
> > > > as provided this sum make this sum paid with respect to an agreement",
> > > > is that authority to make agreements must make that it results that the
> > > > operation of the system, is entirely transparent, and defined in terms of
> > > > the roles and delegation, conventions in operation.
> > > >
> > > > I.e., I personally don't want to administer a copy of usenet, but, it's here
> > > > pretty much sorted out that I can administer one once then that it's to
> > > > administer itself in the following, in terms of it having resources to allocate
> > > > and resources to disburse. Also if nobody's using it it should basically work
> > > > itself out to dial its lights down (while maintaining availability).
> > > >
> > > > Then a point seems "maintain and administer the operation in effect,
> > > > what arrangement sees via delegation, that a card number and a phone
> > > > number and an email account and more than less a responsible entity,
> > > > is so indicated for example in cryptographic identity thus that the operation
> > > > of this system as a service, effectively operates itself out of a kitty,
> > > > what makes for administration and overhead, an entirely transparent
> > > > model of a miniature business the system as a service".
> > > >
> > > > "... and a mailing address and mail service."
> > > >
> > > > Then, for accounts and accounts, for example is the provision of the component
> > > > as simply an image in cloud algorithms, where basically as above here it's configured
> > > > that anybody with any cloud account could basically run it on their own terms,
> > > > there is for here sorting out "after this delegation to some business entity what
> > > > results a corporation in effect, the rest is business-in-a-box and more-than-less
> > > > what makes for its administration in state, is for how it basically limits and replicates
> > > > its service, in terms of its own assets here as what administered is abstractly
> > > > "durable forever mailboxes with private ownership if on public or managed resources".
> > > >
> > > > A usual notion of a private email and usenet service offering and business-in-a-box,
> > > > here what I'm looking at is that besides archiving sci.math and copying out its content
> > > > under author line, is to make such an industry for example here that "once having
> > > > implemented an Internet service, an Internet service of them results Internet".
> > > >
> > > > I.e. here the point is to make a corporation and a foundation in effect, what in terms
> > > > of then about the books and accounts, is about accounts for the business accounts
> > > > that reflect a persistent entity, then what results in terms of computing, networking,
> > > > and internetworking, with a regular notion of "let's never change this arrangement
> > > > but it's in monthly or annual terms", here for that in overall arrangements,
> > > > it results what the entire system more than less runs in ways then to either
> > > > run out its limits or make itself a sponsored effort, about more-or-less a simple
> > > > and responsible and accountable set of operations what effect the business
> > > > (here that in terms of service there is basically the realm of agreement)
> > > > that basically this sort of business-in-a-box model, is then besides itself of
> > > > accounts, toward the notion as pay-as-you-go and "usual credits and their limits".
> > > >
> > > > Then for a news://usenet.science, or for example sci.math.usenet.science,
> > > > is the idea that the entity is "some assemblage what is so that in DNS, and,
> > > > in the accounts payable and receivable, and, in the material matters of
> > > > arrangement and authority for administration, of DNS and resources and
> > > > accounts what result durably persisting the business, is basically for a service
> > > > then of what these are usual enough tasks, as that are interactive workflows
> > > > and for mechanical workflows.
> > > >
> > > > I.e. the point is for having the service than an on/off button and more or less
> > > > what is for a given instance of the operation, what results from some protocol
> > > > that provides a "durable store" of a sort of the business, that at any time basically
> > > > some re-routine or "eventually consistent" continuance of the operation of the
> > > > business, results basically a continuity in its operations, what is entirely granular,
> > > > that here for example the point is to "pick a DNS name, attach an account service,
> > > > go" it so results that in the terms, basically there are the placeholders of the
> > > > interactive workflows in that, and as what in terms are often for example simply
> > > > card and phone number terms, account terms.
> > > >
> > > > I.e. a service to replenish accounts as kitties for making accounts only and
> > > > exactly limited to the one service, its transfers, basically results that there
> > > > is the notion of an email address, a phone number, a credit card's information,
> > > > here a fixed limit debit account that works as of a kitty, there is a regular workflow
> > > > service that will read out the durable stores and according to the timeliness of
> > > > their events, affect the configuration and reconciliation of payments for accounts
> > > > (closed loop scheduling/receiving).
> > > >
> > > > https://datatracker.ietf.org/doc/draft-flanagan-regext-datadictionary/
> > > > https://www.rfc-editor.org/rfc/rfc9022.txt
> > > >
> > > > Basically for dailies, monthlies, and annuals, what make weeklies,
> > > > is this idea of Internet-from-an-account, what is services.
> > > After implementing a store, and the protocol for getting messages, then what seems relevant here in the
> > > context of the SEARCH command, is a fungible file-format, that is derived from the body of the message
> > > in a normal form, that is a data structure that represents an index and catalog and dictionary and summary
> > > of the message, a form of a data structure of a "search index".
> > >
> > > These types of files should naturally compose, and result a data structure that according to some normal
> > > forms of search and summary algorithms, result that a data structure results, that makes for efficient
> > > search of sections of the corpus for information retrieval, here that "information retrieval is the science
> > > of search algorithms".
> > >
> > > Now, for what and how people search, or what is the specification of a search, is in terms of queries, say,
> > > here for some brief forms of queries that advise what's definitely included in the search, what's excluded,
> > > then perhaps what's maybe included, or yes/no/maybe, which makes for a predicate that can be built,
> > > that can be applied to results that compose and build for the terms of a filter with yes/no/maybe or
> > > sure/no/yes, with predicates in values.
> > >
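Here is a sketch of that yes/no/maybe filter as a composable three-valued predicate; the combining rule (any "no" excludes, otherwise "yes" beats "maybe") is one reasonable reading, and the class names are made up.

// Sketch: a three-valued (yes/no/maybe) match that composes like a filter.
import java.util.List;
import java.util.function.Function;

public class TriFilter {
    public enum Match { YES, NO, MAYBE }

    // combine two verdicts: any NO excludes, otherwise YES wins over MAYBE
    public static Match and(Match a, Match b) {
        if (a == Match.NO || b == Match.NO) return Match.NO;
        if (a == Match.YES || b == Match.YES) return Match.YES;
        return Match.MAYBE;
    }

    // build one predicate from many query terms; it applies to a message, a summary, ...
    public static <T> Function<T, Match> all(List<Function<T, Match>> terms) {
        return item -> terms.stream().map(t -> t.apply(item))
                            .reduce(Match.MAYBE, TriFilter::and);
    }
}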
> > > Here there is basically "free text search" and "matching summaries", where text is the text and summary is
> > > a data structure, with attributes as paths the leaves of the tree of which match.
> > >
> > > Then, the message has text, its body, and headers, key-value pairs or collections thereof, where as well
> > > there are default summaries like "a histogram of words by occurrence" or for example default text like "the
> > > MIME body of this message has a default text representation".
> > >
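The "histogram of words by occurrence" default summary is small enough to show inline; tokenization here is deliberately naive (lower-case, split on non-letters), where a fuller catsum would also carry header fields and compound terms.

// Sketch: the default word-occurrence histogram of a message's text.
import java.util.Map;
import java.util.TreeMap;

public class WordHistogram {
    public static Map<String, Long> of(String bodyText) {
        Map<String, Long> counts = new TreeMap<>();
        for (String token : bodyText.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) counts.merge(token, 1L, Long::sum);
        }
        return counts;
    }
}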
> > > So, the idea developing here is to define what are "normal" forms of data structures that have some "normal"
> > > forms of encoding that result that these "normalizing" after "normative" data structures define well-behaved
> > > algorithms upon them, which provide well-defined bounds in resources that return some quantification of results,
> > > like any/each/every/all, "hits".
> > >
> > > This is where usually enough search engines' or collected search algorithms ("find") usually enough have these
> > > de-facto forms, "under the hood", as it were, to make it first-class that for a given message and body that
> > > there is a normal form of a "catalog summary index" which can be compiled to a constant when the message
> > > is ingested, that then basically any filestore of these messages has alongside it the filestore of the "catsums"
> > > or as on-demand, then that any algorithm has at least well-defined behavior under partitions or collections
> > > or selections of these messages, or items, for various standard algorithms that separate "to find" from
> > > "to serve to find".
> > >
> > > So, ..., what I'm wondering are what would be sufficient normal forms in brief that result that there are
> > > defined for a given corpus of messages, basically at the granularity of messages, how is defined how
> > > there is a normal form for each message its "catsum", that catsums have a natural algebra that a
> > > concatenation of catsums is a catsum and that some standard algorithms naturally have well-defined
> > > results on their predicates and quantifiers of matching, in serial and parallel, and that the results
> > > combine in serial and parallel.
> > >
> > > The results should be applicable to any kind of data but here it's more or less about usenet groups.
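To make the "a concatenation of catsums is a catsum" algebra concrete, here is a sketch where a catsum is just a document count plus term counts, and merge is associative with an empty identity, so catsums over partitions of the corpus can be computed independently and combined in any order, in serial or parallel. The field choice is illustrative, not a proposed normal form.

// Sketch: a "catsum" with an associative merge and an identity element.
import java.util.HashMap;
import java.util.Map;

public final class Catsum {
    public final long documents;
    public final Map<String, Long> termCounts;

    public Catsum(long documents, Map<String, Long> termCounts) {
        this.documents = documents;
        this.termCounts = termCounts;
    }

    public static Catsum empty() { return new Catsum(0, new HashMap<>()); }

    public static Catsum ofMessage(Map<String, Long> histogram) {
        return new Catsum(1, new HashMap<>(histogram));
    }

    // associative, with empty() as identity: merge(a, merge(b, c)) equals merge(merge(a, b), c)
    public static Catsum merge(Catsum a, Catsum b) {
        Map<String, Long> merged = new HashMap<>(a.termCounts);
        b.termCounts.forEach((term, count) -> merged.merge(term, count, Long::sum));
        return new Catsum(a.documents + b.documents, merged);
    }
}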
> > So I start browsing the Information Retrieval section in Wikipedia and more or less get to reading
> > Luhn's 1958 "automatic coding of document summaries" or "The Automatic Creation of Literature
> > Abstracts". Then, what I figure, is that the histogram, is an associative array of keys to counts,
> > and what I figure is to compute both the common terms, and, the rare terms, so that there's both
> > "common-weight" and "rare-weight" computed, off of the count of the terms, and the count of
> > distinct terms, where it is working up that besides catsums, it would result a relational
> > algebra of terms in, ..., terms, of counts and densities and these type things. This is where, first I
> > would figure the catsum would be deterministic before it's at all probabilistic, because the goal is
> > match-find not match-guess, while still it's to support the less deterministic but more opportunistic
> > at the same time.
> >
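One way to read "common-weight" and "rare-weight" off counts of the kind a catsum would carry, in the spirit of Luhn's frequency cutoffs; the formulas below (a plain ratio and an inverse-document-frequency-style log) are placeholders for whatever normal form gets settled on.

// Sketch: common-weight and rare-weight of a term from corpus counts.
public class TermWeights {
    // how common the term is within a partition: its occurrences per total occurrences
    public static double commonWeight(long termCount, long totalTermCount) {
        return totalTermCount == 0 ? 0.0 : (double) termCount / totalTermCount;
    }

    // how rare the term is across documents: log(total documents / documents containing the term)
    public static double rareWeight(long documentsWithTerm, long totalDocuments) {
        if (documentsWithTerm == 0 || totalDocuments == 0) return 0.0;
        return Math.log((double) totalDocuments / documentsWithTerm);
    }
}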
> > Then, the "index" is basically like a usual book's index, for each term that's not a common term in
> > the language but is a common term in the book, what page it's on, here that that is a read-out of
> > a histogram of the terms to pages. Then, compound terms, basically get into grammar, and in terms
> > of terms, I don't so much care to parse glossolalia as what result mostly well-defined compound terms
> > in usual natural languages, for the utility of a dictionary and technical dictionaries. Here "pages" are
> > both according to common message threads, and also the surround of messages in the same time
> > period, where a group is a common message thread and a usenet is a common message thread.
> >
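The book-style index reads out of the per-message histograms as a map from term to the "pages" it occurs on, here thread identifiers, skipping terms common in the language; the stop list below is a stand-in for a real one.

// Sketch: an index from term to the threads ("pages") it appears in.
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class SubjectIndex {
    private static final Set<String> STOP = Set.of("the", "and", "of", "to", "a", "in"); // stand-in

    private final Map<String, Set<String>> termToThreads = new TreeMap<>();

    public void add(String threadId, Map<String, Long> histogram) {
        for (String term : histogram.keySet()) {
            if (!STOP.contains(term)) {
                termToThreads.computeIfAbsent(term, t -> new TreeSet<>()).add(threadId);
            }
        }
    }

    public Set<String> threadsFor(String term) {
        return termToThreads.getOrDefault(term, Set.of());
    }
}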
> > (I've had a copy of "the information retrieval book" before, also borrowed one "data logic".)
> >
> > "Spelling mistakes considered adversarial."
> >
> > https://en.wikipedia.org/wiki/Subject_indexing#Indexing_theory
> >
> > Then, there's lots to be said for "summary" and "summary in statistic".
> >
> >
> > A first usual data structure for efficiency is the binary tree or bounding tree. Then, there's
> > also what makes for divide-and-conquer or linear speedup.
> >
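And the divide-and-conquer or linear-speedup remark is exactly why the merge being associative matters: summarizing the corpus is then a parallel reduce over its partitions. A sketch, assuming the Catsum class sketched earlier.

// Sketch: divide-and-conquer summary as a parallel reduce over partition catsums.
import java.util.List;

public class CorpusSummary {
    public static Catsum summarize(List<Catsum> partitionSummaries) {
        return partitionSummaries.parallelStream()
                                 .reduce(Catsum.empty(), Catsum::merge);
    }
}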
> >
> > About the same time as Luhn's monograph, in 1956, there was published a little book
> > called "Logic and Language", Huppe and Kaminsky. It details how according to linguistics
> > there are certain usual regular patterns of words after phonemes and morphology what
> > result then for stems and etymology that then for vocabulary that grammar or natural
> > language results above. Then there are also gentle introductions to logic. It's very readable
> > and quite brief.
> I haven't much been tapping away at this,
> but it's pretty simple to stand up a usenet peer,
> and pretty simple to slurp a copy,
> of the "Big 8" usenet text groups, for example,
> or particularly just for a few.


Well, I've been thinking about this, and there are some ideas.

One is about a system of reputation, the idea being New/Old/Off/Bad/Bot/Non,
basically figuring that reputation is established by action.

Figuring how to categorize spam, UCE, vice, crime, and call that Bad, then
gets into basically two editions, with a common backing, Cur (curated) and Raw,
with Old and New in curated, and Off and Bot a filter off that, and Bad and Non
excluded, though in the raw feed. Then there's only to forward what's curated,
or current.

Here the idea is that New graduates to Old, Non might be a false-negative New,
but is probably a negative Bad or Off, and then Bot is a sort of honor system, and
Old might wander to Off and vice-versa, then that Off and Old can vacillate.
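As a sketch of that scheme as data, here are the buckets with the curated edition and the graduations just described; the transition table is only my reading of the paragraph above, not a settled policy.

// Sketch: New/Old/Off/Bad/Bot/Non reputation buckets; Cur(ated) carries Old and New,
// Off and Bot are filtered off of that, Bad and Non appear only in the raw feed.
import java.util.Map;
import java.util.Set;

public enum Reputation {
    NEW, OLD, OFF, BAD, BOT, NON;

    public static final Set<Reputation> CURATED = Set.of(NEW, OLD);

    // New graduates to Old; Old and Off can vacillate; Non may prove a false-negative New,
    // or be confirmed Bad or Off.
    private static final Map<Reputation, Set<Reputation>> NEXT = Map.of(
            NEW, Set.of(OLD),
            OLD, Set.of(OFF),
            OFF, Set.of(OLD),
            NON, Set.of(NEW, BAD, OFF),
            BOT, Set.of(),
            BAD, Set.of());

    public boolean mayBecome(Reputation next) {
        return NEXT.getOrDefault(this, Set.of()).contains(next);
    }
}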

Then for renditions, is basically that the idea is that it's the same content
behind NNTP, with IMAP, then also an HTTP gateway, Atom/RDF feed, ....

(It's pretty usually text-only but here is MIME.)

There are various ways to make for posting: basically Old can post what they
want, and Off too, then for something like New, they get an email in reply
to their post, which they reply to, to round-trip a post.

(Also mail-to-news and news-to-mail are pretty usual. Also there are
notions of humanitarian inputs.)

Similarly there are the notions above about using certificates and TLS to
use technology and protocol to solve technology protocol abuse problems.

For surfacing the items then is about technologies like robots.txt and
Dublin Core metadata, and similar notions with respect to uniqueness.
If you have other ideas about this, please chime in.

Then for having a couple sorts of organizations of both the domain name
and the URL's as resources, makes for example for sub-domains for groups,
for example then with certificate conventions in that, then usual sorts of
URL's that are, you know, URL's, and URN's, then, about URL's, URI's, and URN's.

Luckily it's all quite standardized so quite stock NNTP, IMAP, and HTTP browsers,
and about SMTP and IMAP, and with TLS, make of course a fungible sort of system.


How to pay for it all? At about $500 a year for all text usenet,
about a day's golf foursome and a few beers can stand up a new Usenet peer.
Ross Finlayson
2024-01-23 04:38:38 UTC
Reply
Permalink
On Friday, December 22, 2023 at 12:36:40 AM UTC-8, Ross Finlayson wrote:
> On Saturday, April 29, 2023 at 2:54:26 PM UTC-7, Ross Finlayson wrote:
> > On Wednesday, March 8, 2023 at 10:23:04 PM UTC-8, Ross Finlayson wrote:
> > > On Wednesday, March 8, 2023 at 8:51:58 PM UTC-8, Ross Finlayson wrote:
> > > > On Monday, December 6, 2021 at 7:32:16 AM UTC-8, Ross A. Finlayson wrote:
> > > > > On Monday, November 16, 2020 at 5:39:08 PM UTC-8, Ross A. Finlayson wrote:
> > > > > > On Monday, November 16, 2020 at 5:00:51 PM UTC-8, Ross A. Finlayson wrote:
> > > > > > > On Tuesday, June 30, 2020 at 10:00:52 AM UTC-7, Mostowski Collapse wrote:
> > > > > > > > NNTP is not HTTP. I was using bare metal access to
> > > > > > > > usenet, not using Google group, via:
> > > > > > > >
> > > > > > > > news.albasani.net, unfortunately dead since Corona
> > > > > > > >
> > > > > > > > So was looking for an alternative. And found this
> > > > > > > > alternative, which seems fine:
> > > > > > > >
> > > > > > > > news.solani.org
> > > > > > > >
> > > > > > > > Have Fun!
> > > > > > > >
> > > > > > > > P.S.: Technical spec of news.solani.org:
> > > > > > > >
> > > > > > > > Intel Core i7, 4x 3.4 GHz, 32 GB RAM, 500 GB SSD RAID1
> > > > > > > > Intel Core i7, 4x 3.4 GHz, 32 GB RAM, 2 TB HDD RAID1
> > > > > > > > Intel Xeon VM, 1.8 GHz, 1 GB RAM, 20 GB
> > > > > > > > Location: 2x Falkenstein, 1x New York
> > > > > > > >
> > > > > > > > advantage of bare metal usenet,
> > > > > > > > you see all headers of message.
> > > > > > > > On Tuesday, June 30, 2020 at 06:24:53 UTC+2, Ross A. Finlayson wrote:
> > > > > > > > > Search you mentioned and for example HTTP is adding the SEARCH verb,
> > > > > > > In traffic there are two kinds of usenet users,
> > > > > > > viewers and traffic through Google Groups,
> > > > > > > and, USENET. (USENET traffic.)
> > > > > > >
> > > > > > > Here now Google turned on login to view their
> > > > > > > Google Groups - effectively closing the Google Groups
> > > > > > > without a Google login.
> > > > > > >
> > > > > > > I suppose if they're used at work or whatever though
> > > > > > > they'd be open.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Where I got with the C10K non-blocking I/O for a usenet server,
> > > > > > > it scales up though then I think in the runtime is a situation where
> > > > > > > it only runs epoll or kqueue that the test scale ups, then at the end
> > > > > > > or in sockets there is a drop, or it fell off the driver. I've implemented
> > > > > > > the code this far, what has all of NNTP in a file and then the "re-routine,
> > > > > > > industry-pattern back-end" in memory, then for that running usually.
> > > > > > >
> > > > > > > (Cooperative multithreading on top of non-blocking I/O.)
> > > > > > >
> > > > > > > Implementing the serial queue or "monohydra", or slique,
> > > > > > > makes for that then when the parser is constantly parsing,
> > > > > > > it seems a usual queue like data structure with parsing
> > > > > > > returning its bounds, consuming the queue.
> > > > > > >
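A rough sketch of that parse-returning-its-bounds shape over a byte buffer, just to fix the idea; this is not the implementation being described, only the smallest version of a queue the parser consumes.

// Sketch: a serial command queue over a ByteBuffer; the scanner returns the bounds
// of each complete CRLF-terminated command and consuming it compacts the buffer.
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class CommandQueue {
    private final ByteBuffer buffer = ByteBuffer.allocate(4096); // one 4K page

    public ByteBuffer bufferForRead() { return buffer; }   // hand this to channel.read(...)

    // returns the next complete command line (without CRLF), or null if more bytes are needed
    public String nextCommand() {
        buffer.flip();
        for (int i = buffer.position(); i + 1 < buffer.limit(); i++) {
            if (buffer.get(i) == '\r' && buffer.get(i + 1) == '\n') {
                byte[] line = new byte[i - buffer.position()];
                buffer.get(line);                           // consume up to the bounds found
                buffer.position(buffer.position() + 2);     // skip the CRLF
                buffer.compact();
                return new String(line, StandardCharsets.US_ASCII);
            }
        }
        buffer.compact();                                   // no full command yet; keep accumulating
        return null;
    }
}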
> > > > > > > Having the file buffers all down small on 4K pages,
> > > > > > > has that a next usual page size is the megabyte.
> > > > > > >
> > > > > > > Here though it seems to make sense to have a natural
> > > > > > > 4K alignment the file system representation, then that
> > > > > > > it is moving files.
> > > > > > >
> > > > > > > So, then with the new modern Java, it that runs in its own
> > > > > > > Java server runtime environment, it seems I would also
> > > > > > > need to see whether the cloud virt supported the I/O model
> > > > > > > or not, or that the cooperative multi-threading for example
> > > > > > > would be single-threaded. (Blocking abstractly.)
> > > > > > >
> > > > > > > Then besides I suppose that could be neatly with basically
> > > > > > > the program model, and its file model, being well-defined,
> > > > > > > then for NNTP with IMAP organization search and extensions,
> > > > > > > those being standardized, seems to make sense for an efficient
> > > > > > > news file organization.
> > > > > > >
> > > > > > > Here then it seems for serving the NNTP, and for example
> > > > > > > their file bodies under the storage, with the fixed headers,
> > > > > > > variable header or XREF, and the message body, then under
> > > > > > > content it's same as storage.
> > > > > > >
> > > > > > > NNTP has "OVERVIEW" then from it is built search.
> > > > > > >
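For reference, the overview record that search gets built from is a tab-separated line per article; a sketch of emitting one, with the field order of the default overview format (number, Subject, From, Date, Message-ID, References, byte count, line count) and tabs stripped from values.

// Sketch: format one OVER/XOVER overview line for an article.
public class OverviewLine {
    public static String format(long number, String subject, String from, String date,
                                String messageId, String references, long bytes, long lines) {
        return String.join("\t",
                Long.toString(number),
                clean(subject), clean(from), clean(date),
                clean(messageId), clean(references),
                Long.toString(bytes), Long.toString(lines));
    }

    // header values in overview lines must not contain TAB, CR, or LF
    private static String clean(String value) {
        return value == null ? "" : value.replaceAll("[\\t\\r\\n]+", " ");
    }
}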
> > > > > > > Let's see here then, if I get the load test running, or,
> > > > > > > just put a limit under the load while there are no load test
> > > > > > > errors, it seems the algorithm then scales under load to be
> > > > > > > making usually the algorithm serial in CPU, with: encryption,
> > > > > > > and compression (traffic). (Block ciphers instead of serial transfer.)
> > > > > > >
> > > > > > > Then, the industry pattern with re-routines, has that the
> > > > > > > re-routines are naturally co-operative in the blocking,
> > > > > > > and in the language, including flow-of-control and exception scope.
> > > > > > >
> > > > > > >
> > > > > > > So, I have a high-performance implementation here.
> > > > > > It seems like for NFS, then, and having the separate read and write of the client,
> > > > > > a default filesystem, is an idea for the system facility: mirroring the mounted file
> > > > > > locally, and, providing the read view from that via a different route.
> > > > > >
> > > > > >
> > > > > > A next idea then seems for the organization, the client views themselves
> > > > > > organize over the durable and available file system representation, this
> > > > > > provides anyone a view over the protocol with a group file convention.
> > > > > >
> > > > > > I.e., while usual continuous traffic was surfing, individual reads over group
> > > > > > files could have independent views, for example collating contents.
> > > > > >
> > > > > > Then, extracting requests from traffic and threads seems usual.
> > > > > >
> > > > > > (For example a specialized object transfer view.)
> > > > > >
> > > > > > Making protocols for implementing internet protocols in groups and
> > > > > > so on, here makes for giving usenet example views to content generally.
> > > > > >
> > > > > > So, I have designed a protocol node and implemented it mostly,
> > > > > > then about designed an object transfer protocol, here the idea
> > > > > > is how to make it so people can extract data, for example their own
> > > > > > data, from a large durable store of all the usenet messages,
> > > > > > making views of usenet running on usenet, eg "Feb. 2016: AP's
> > > > > > Greatest Hits".
> > > > > >
> > > > > > Here the point is to figure that usenet, these days, can be operated
> > > > > > in cooperation with usenet, and really for its own sake, for leaving
> > > > > > messages in usenet and here for usenet protocol stores as there's
> > > > > > no reason it's plain text the content, while the protocol supports it.
> > > > > >
> > > > > > Building personal view for example is a simple matter of very many
> > > > > > service providers any of which sells usenet all day for a good deal.
> > > > > >
> > > > > > Let's see here, $25/MM, storage on the cloud last year for about
> > > > > > a million messages for a month is about $25. Outbound traffic is
> > > > > > usually the metered cloud traffic, here for example that CDN traffic
> > > > > > support the universal share convention, under metering. What that
> > > > > > the algorithm is effectively tunable in CPU and RAM, makes for under
> > > > > > I/O that it's "unobtrusive" or the cooperative in routine, for CPU I/O and
> > > > > > RAM, then that there is for seeking that Network Store or Database Time
> > > > > > instead effectively becomes File I/O time, as what may be faster,
> > > > > > and more durable. There's a faster database time for scaling the ingestion
> > > > > > here with that the file view is eventually consistent. (And reliable.)
> > > > > >
> > > > > > Checking the files would be over time for example with "last checked"
> > > > > > and "last dropped" something along the lines of, finding wrong offsets,
> > > > > > basically having to make it so that it survives neatly corruption of the
> > > > > > store (by being more-or-less stored in-place).
> > > > > >
> > > > > > Content catalog and such, catalog.
> > > > > Then I wonder and figure the re-routine can scale.
> > > > >
> > > > > Here for the re-routine, the industry factory pattern,
> > > > > and the commands in the protocols in the templates,
> > > > > and the memory module, with the algorithm interface,
> > > > > in the high-performance computer resource, it is here
> > > > > that this simple kind of "writing Internet software"
> > > > > makes pretty rapidly for adding resources.
> > > > >
> > > > > Here the design is basically of a file I/O abstraction,
> > > > > that the computer reads data files with mmap to get
> > > > > their handlers, what results that for I/O map the channels
> > > > > result transferring the channels in I/O for what results,
> > > > > in mostly the allocated resource requirements generally,
> > > > > and for the protocol and algorithm, it results then that
> > > > > the industry factory pattern and making for interfaces,
> > > > > then also here the I/O routine as what results that this
> > > > > is an implementation, of a network server, mostly is making
> > > > > for that the re-routine, results very neatly a model of
> > > > > parallel cooperation.
> > > > >
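In plain JDK terms that mmap-for-handles idea is FileChannel.map for a read-only view, or transferTo for a zero-copy move from the file store into the client channel; a sketch only, with no claim about the actual I/O layer described here.

// Sketch: serve an article body from the store via a read-only mapping, or
// let the kernel move file bytes straight to the socket with transferTo.
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ArticleServer {
    // map the whole article read-only; the OS page cache does the caching
    public static MappedByteBuffer map(Path article) throws IOException {
        try (FileChannel ch = FileChannel.open(article, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    // or skip user space and transfer file -> socket directly
    public static void send(Path article, SocketChannel client) throws IOException {
        try (FileChannel ch = FileChannel.open(article, StandardOpenOption.READ)) {
            long sent = 0, size = ch.size();
            while (sent < size) {
                sent += ch.transferTo(sent, size - sent, client);
            }
        }
    }
}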
> > > > > I think computers still have file systems and file I/O but
> > > > > in abstraction just because PAGE_SIZE is still relevant for
> > > > > the network besides or I/O, if eventually, here is that the
> > > > > value types are in the commands and so on, it is besides
> > > > > that in terms of the resources so defined it still is in a filesystem
> > > > > convention that a remote and unreliable view of it suffices.
> > > > >
> > > > > Here then the source code also being "this is only 20-50k",
> > > > > lines of code, with basically an entire otherwise library stack
> > > > > of the runtime itself, only the network and file abstraction,
> > > > > this makes for also that modularity results. (Factory Industry
> > > > > Pattern Modules.)
> > > > >
> > > > > For a network server, here, that, mostly it is high performance
> > > > > in the sense that this is about the most direct handle on the channels
> > > > > and here mostly for the text layer in the I/O order, or protocol layer,
> > > > > here is that basically encryption and compression usually in the layer,
> > > > > there is besides a usual concern where encryption and compression
> > > > > are left out, there is that text in the layer itself is commands.
> > > > >
> > > > > Then, those being constants under the resources for the protocol,
> > > > > it's what results usual protocols like NNTP and HTTP and other protocols
> > > > > with usually one server and many clients, here is for that these protocols
> > > > > are defined in these modules, mostly there NNTP and IMAP, ..., HTTP.
> > > > >
> > > > > These are here defined "all Java" or "Pure Java", i.e. let's be clear that
> > > > > in terms of the reference abstraction layer, I think computers still use
> > > > > the non-blocking I/O and filesystems and network to RAM, so that as
> > > > > the I/O is implemented in those it actually has those besides instead for
> > > > > example defaulting to byte-per-channel or character I/O. I.e. the usual
> > > > > semantics for servicing the I/O in the accepter routine and what makes
> > > > > for that the platform also provides a reference encryption implementation,
> > > > > if not so relevant for the block encoder chain, besides that for example
> > > > > compression has a default implementation, here the I/O model is as simply
> > > > > in store for handles, channels, ..., that it results that data especially delivered
> > > > > from a constant store can anyways be mostly compressed and encrypted
> > > > > already or predigested to serve, here that it's the convention, here is for
> > > > > resulting that these client-server protocols, with usually reads > postings
> > > > > then here besides "retention", basically here is for what it is.
> > > > >
> > > > > With the re-routine and the protocol layer besides, having written the
> > > > > routines in the re-routine, what there is to write here is this industry
> > > > > factory, or a module framework, implementing the re-routines, as they're
> > > > > built from the linear description a routine, makes for as the routine progresses
> > > > > that it's "in the language" and that more than less in the terms, it makes for
> > > > > implementing the case of logic for values, in the logic's flow-of-control's terms.
> > > > >
> > > > > Then, there is that actually running the software is different than just
> > > > > writing it, here in the sense that as a server runtime, it is to be made a
> > > > > thing, by giving it a name, and giving it an authority, to exist on the Internet.
> > > > >
> > > > > There is basically that for BGP and NAT and so on, and, mobile fabric networks,
> > > > > IP and TCP/IP, of course IPv4 and IPv6 are the coarse fabric main space, with
> > > > > respect to what are CIDR and 24 bits rule and what makes for TCP/IP, here
> > > > > entirely the course is using the TCP/IP stack and Java's TCP/IP stack, with
> > > > > respect to that TCP/IP is so provided or in terms of process what results
> > > > > ports mostly and connection models where it is exactly the TCP after the IP,
> > > > > the Transport Control Protocol and Internet Protocol, have here both this
> > > > > socket and datagram connection orientation, or stateful and stateless or
> > > > > here that in terms of routing it's defined in addresses, under that names
> > > > > and routing define sources, routes, destinations, ..., that routine numeric
> > > > > IP addresses result in the usual sense of the network being behind an IP
> > > > > and including IPv4 network fabric with respect to local routers.
> > > > >
> > > > > I.e., here to include a service framework is "here besides the routine, let's
> > > > > make it clear that in terms of being a durable resource, there needs to be
> > > > > some lockbox filled with its sustenance that in some locked or constant
> > > > > terms results that for the duration of its outlay, say five years, it is held
> > > > > up, then, it will be so again, or, let down to result the carry-over that it
> > > > > invested to archive itself, I won't have to care or do anything until then".
> > > > >
> > > > >
> > > > > About the service activation and the idea that, for a port, the routine itself
> > > > > needs only run under load, i.e. there is effectively little traffic on the old archives,
> > > > > and usually only the other archive needs any traffic. Here the point is
> > > > > that for the Java routine there is the system port that was accepted for the
> > > > > request, that inetd or the systemd or means the network service was accessed,
> > > > > made for that much as for HTTP the protocol is client-server also for IP the
> > > > > protocol is client-server, while the TCP is packets. This is a general idea for
> > > > > system integration while here mostly the routine is that being a detail:
> > > > > the filesystem or network resource that results that the re-routines basically
> > > > > make very large CPU scaling.
> > > > >
> > > > > Then, it is basically containerized this sense of "at some domain name, there
> > > > > is a service, it's HTTP and NNTP and IMAP besides, what cares the world".
> > > > >
> > > > > I.e. being built on connection oriented protocols like the socket layer,
> > > > > HTTP(S) and NNTP(S) and IMAP(S) or with the TLS orientation to certificates,
> > > > > it's more than less sensible that most users have no idea of installing some
> > > > > NNTP browser or pointing their email to IMAP so that the email browser
> > > > > browses the newsgroups and for postings, here this is mostly only talk
> > > > > about implementing NNTP then IMAP and HTTP that happens to look like that,
> > > > > besides for example SMTP or NNTP posting.
> > > > >
> > > > > I.e., having "this IMAP server, happens to be this NNTP module", or
> > > > > "this HTTP server, happens to be a real simple mailbox these groups",
> > > > > makes for having partitions and retentions of those and that basically
> > > > > NNTP messages in the protocol can be more or less the same content
> > > > > in media, what otherwise is of a usual message type.
> > > > >
> > > > > Then, the NNTP server-server routine is the propagation of messages
> > > > > besides "I shall hire ten great usenet retention accounts and gently
> > > > > and politely draw them down and back-fill Usenet, these ten groups".
> > > > >
> > > > > By then I would have to have made for retention in storage, such contents,
> > > > > as have a reference value, then for besides making that independent in
> > > > > reference value, just so that it suffices that it basically results "a usable
> > > > > durable filesystem that happens you can browse it like usenet". I.e. as
> > > > > the pieces to make the backfill are dug up, they get assigned reference numbers
> > > > > of their time to make for what here is that in a grand schema of things,
> > > > > they have a reference number in numerical order (and what's also the
> > > > > server's "message-number" besides its "message-id") as noted above this
> > > > > gets into the storage for retention of a file, while, most services for this
> > > > > are instead for storage and serving, not necessarily or at all retention.
> > > > >
> > > > > I.e., the point is that as the groups are retained from retention, there is an
> > > > > approach what makes for an orderly archeology, as for what convention
> > > > > some data arrives, here that this server-server routine is besides the usual
> > > > > routine which is "here are new posts, propagate them", it's "please deliver
> > > > > as of a retention scan, and I'll try not to repeat it, what results as orderly
> > > > > as possible a proof or exercise of what we'll call afterward entire retention",
> > > > > then will be for as of writing a file that "as of the date, from start to finish,
> > > > > this site certified these messages as best-effort retention".
> > > > >
> > > > > It seems then besides there is basically "here is some mbox file, serve it
> > > > > like it was an NNTP group or an IMAP mailbox", ingestion, in terms of that
> > > > > what is ingestion, is to result for the protocol that "for this protocol,
> > > > > there is actually a normative filesystem representation that happens to
> > > > > be pretty much also altogether defined by the protocol", the point is
> > > > > that ingestion would result in command to remain in the protocol,
> > > > > that a usual file type that "presents a usual abstraction, of a filesystem,
> > > > > as from the contents of a file", here with the notion of "for all these
> > > > > threaded discussions, here this system only cares some approach to
> > > > > these ten particular newsgroups that already have mostly their corpus
> > > > > though it's not in perhaps their native mbox instead consulted from services".
> > > > >
> > > > > Then, there's for storing and serving the files, and there is the usual
> > > > > notion that moving the data, is to result, that really these file organizations
> > > > > are not so large in terms of resources, being "less than gigabytes" or so,
> > > > > still there's a notion that as a durable resource they're to be made
> > > > > fungible here the networked file approach in the native filesystem,
> > > > > then that with respect to it's a backing store, it's to make for that
> > > > > the entire enterprise is more or less to be made in terms of account,
> > > > > that then as a facility on the network then a service in the network,
> > > > > it's basically separated the facility and service, while still of course
> > > > > that the service is basically defined by its corpus.
> > > > >
> > > > >
> > > > > Then, to make that fungible in a world of account, while with an exit
> > > > > strategy so that the operation isn't not abstract, is mostly about the
> > > > > domain name, then that what results the networking, after trusted
> > > > > network naming and connections for what result routing, and then
> > > > > the port, in terms of that there are usual firewalls in ports though that
> > > > > besides usually enough client ports are ephemeral, here the point is
> > > > > that the protocols and their well-known ports, here it's usually enough
> > > > > that the Internet doesn't concern itself so much protocols but with
> > > > > respect to proxies, here that for example NNTP and IMAP don't have
> > > > > so much anything so related that way after startTLS. For the world of
> > > > > account, is basically to have for a domain name, an administrator, and,
> > > > > an owner or representative. These are to establish authority for changes
> > > > > and also accountability for usage.
> > > > >
> > > > > Basically they're to be persons and there is a process to get to be an
> > > > > administrator of DNS, most always there are services that a usual person
> > > > > implementing the system might use, besides for example the numerical.
> > > > >
> > > > > More relevant though to DNS is getting servers on the network, with respect
> > > > > to listening ports and that they connect to clients what so discover them as
> > > > > via DNS or configuration, here as above the usual notion that these are
> > > > > standard services and run on well-known ports for inetd or systemd.
> > > > > I.e. there is basically that running a server and dedicated networking,
> > > > > and power and so on, and some notion of the limits of reliability, is then
> > > > > as very much in other aspects of the organization of the system, i.e. its name,
> > > > > while at the same time, the point that a module makes for that basically
> > > > > the provision of a domain name or well-known or ephemeral host, is the
> > > > > usual notion that static IP addresses are a limited resource and as about
> > > > > the various networks in IPv4 and how they route traffic, is for that these
> > > > > services have well-known sections in DNS for at least that the most usual
> > > > > configuration is none.
> > > > >
> > > > > For a usual global reliability and availability, is some notion basically that
> > > > > each region and zone has a service available on the IP address, for that
> > > > > "hostname" resolves to the IP addresses. As well, in reverse, for the IP
> > > > > address and about the hostname, it should resolve reverse to hostname.
> > > > >
> > > > > About certificates mostly for identification after mapping to port, or
> > > > > multi-home Internet routing, here is the point that whether the domain
> > > > > name administration is "epochal" or "regular", is that epochs are defined
> > > > > by the ports behind the numbers and the domain name system as well,
> > > > > where in terms of the registrar, the domain names are epochal to the
> > > > > registrar, with respect to owners of domain names.
> > > > >
> > > > > Then if DNS is a datagram or UDP service is for ICMP as for TCP/IP,
> > > > > and also BGP and NAT and routing and what are local and remote
> > > > > addresses, here is for not-so-much "implement DNS the protocol
> > > > > also while you're at it", rather for what results that there is a durable
> > > > > and long-standing and proper doorman, for some usenet.science.
> > > > >
> > > > > Here then the notion seems to be whether the doorman basically
> > > > > knows well-known services, is a multi-homing router, or otherwise
> > > > > what is the point that it starts the lean runtime, with respect to that
> > > > > it's a container and having enough sense of administration its operation
> > > > > as contained. I.e. here given a port and a hostname and always running
> > > > > makes for that as long as there is the low (preferable no) idle for services
> > > > > running that have no clients, is here also for the cheapest doorman that
> > > > > knows how to standup the client sentinel. (And put it back away.)
> > > > >
> > > > > Probably the most awful thing in the cloud services is the cost for
> > > > > data ingress and egress. What that means is that for example using
> > > > > a facility that is bound by that as a cost instead of under some constant
> > > > > cost, is basically why there is the approach that the containers need a
> > > > > handle to the files, and they're either local files or network files, here
> > > > > with the some convention above in archival a shared consistent view
> > > > > of all the files, or abstractly consistent, is for making that the doorman
> > > > > can handle lots of starting and finishing connections, while it is out of
> > > > > the way when usually it's client traffic and opening and closing connections,
> > > > > and the usual abstraction is that the client sentinel is never off and doorman
> > > > > does nothing, here is for attaching the one to some lower constant cost,
> > > > > where for example any long-running cost is more than some low constant cost.
> > > > >
> > > > > Then, this kind of service is often represented by nodes, in the usual sense
> > > > > "here is an abstract container with you hope some native performance under
> > > > > the hypervisor where it lives on the farm on its rack, it basically is moved the
> > > > > image to wherever it's requested from and lives there, have fun, the meter is on".
> > > > > I.e. that's just "this Jar has some config conventions and you can make the
> > > > > container associate it and watchdog it with systemd for example and use the
> > > > > cgroups while you're at it and make for tempfs quota and also the best network
> > > > > file share, which you might be welcome to cache if you care just in the off-chance
> > > > > that this file-mapping is free or constant cost as long as it doesn't egress the
> > > > > network", is for here about the facilities that work, to get a copy of the system
> > > > > what with respect to its usual operation is a piece of the Internet.
> > > > >
> > > > > For the different reference modules (industry factories) in their patterns then
> > > > > and under combined configuration "file + process + network + fare", is that
> > > > > the fare of the service basically reflects a daily coin, in the sense that it
> > > > > represents an annual or epochal fee, what results for the time there is
> > > > > what is otherwise all defined the "file + process + network + name",
> > > > > what results it perpetuates in operation more than less simply and automatically.
> > > > >
> > > > > Then, the point though is to get it to where "I can go to this service, and
> > > > > administer it more or less by paying an account, that it thus lives in its
> > > > > budget and quota in its metered world".
> > > > >
> > > > > That though is very involved with identity, that in terms of "I the account
> > > > > as provided this sum make this sum paid with respect to an agreement",
> > > > > is that authority to make agreements must make that it results that the
> > > > > operation of the system, is entirely transparent, and defined in terms of
> > > > > the roles and delegation, conventions in operation.
> > > > >
> > > > > I.e., I personally don't want to administer a copy of usenet, but, it's here
> > > > > pretty much sorted out that I can administer one once then that it's to
> > > > > administer itself in the following, in terms of it having resources to allocate
> > > > > and resources to disburse. Also if nobody's using it it should basically work
> > > > > itself out to dial its lights down (while maintaining availability).
> > > > >
> > > > > Then a point seems "maintain and administer the operation in effect,
> > > > > what arrangement sees via delegation, that a card number and a phone
> > > > > number and an email account and more than less a responsible entity,
> > > > > is so indicated for example in cryptographic identity thus that the operation
> > > > > of this system as a service, effectively operates itself out of a kitty,
> > > > > what makes for administration and overhead, an entirely transparent
> > > > > model of a miniature business the system as a service".
> > > > >
> > > > > "... and a mailing address and mail service."
> > > > >
> > > > > Then, for accounts and accounts, for example is the provision of the component
> > > > > as simply an image in cloud algorithms, where basically as above here it's configured
> > > > > that anybody with any cloud account could basically run it on their own terms,
> > > > > there is for here sorting out "after this delegation to some business entity what
> > > > > results a corporation in effect, the rest is business-in-a-box and more-than-less
> > > > > what makes for its administration in state, is for how it basically limits and replicates
> > > > > its service, in terms of its own assets here as what administered is abstractly
> > > > > "durable forever mailboxes with private ownership if on public or managed resources".
> > > > >
> > > > > A usual notion of a private email and usenet service offering and business-in-a-box,
> > > > > here what I'm looking at is that besides archiving sci.math and copying out its content
> > > > > under author line, is to make such an industry for example here that "once having
> > > > > implemented an Internet service, an Internet service of them results Internet".
> > > > >
> > > > > I.e. here the point is to make a corporation and a foundation in effect, what in terms
> > > > > of then about the books and accounts, is about accounts for the business accounts
> > > > > that reflect a persistent entity, then what results in terms of computing, networking,
> > > > > and internetworking, with a regular notion of "let's never change this arrangement
> > > > > but it's in monthly or annual terms", here for that in overall arrangements,
> > > > > it results what the entire system more than less runs in ways then to either
> > > > > run out its limits or make itself a sponsored effort, about more-or-less a simple
> > > > > and responsible and accountable set of operations what effect the business
> > > > > (here that in terms of service there is basically the realm of agreement)
> > > > > that basically this sort of business-in-a-box model, is then besides itself of
> > > > > accounts, toward the notion as pay-as-you-go and "usual credits and their limits".
> > > > >
> > > > > Then for a news://usenet.science, or for example sci.math.usenet.science,
> > > > > is the idea that the entity is "some assemblage what is so that in DNS, and,
> > > > > in the accounts payable and receivable, and, in the material matters of
> > > > > arrangement and authority for administration, of DNS and resources and
> > > > > accounts what result durably persisting the business, is basically for a service
> > > > > then of what these are usual enough tasks, as that are interactive workflows
> > > > > and for mechanical workflows.
> > > > >
> > > > > I.e. the point is for having the service than an on/off button and more or less
> > > > > what is for a given instance of the operation, what results from some protocol
> > > > > that provides a "durable store" of a sort of the business, that at any time basically
> > > > > some re-routine or "eventually consistent" continuance of the operation of the
> > > > > business, results basically a continuity in its operations, what is entirely granular,
> > > > > that here for example the point is to "pick a DNS name, attach an account service,
> > > > > go" it so results that in the terms, basically there are the placeholders of the
> > > > > interactive workflows in that, and as what in terms are often for example simply
> > > > > card and phone number terms, account terms.
> > > > >
> > > > > I.e. a service to replenish accounts as kitties for making accounts only and
> > > > > exactly limited to the one service, its transfers, basically results that there
> > > > > is the notion of an email address, a phone number, a credit card's information,
> > > > > here a fixed limit debit account that works as of a kitty, there is a regular workflow
> > > > > service that will read out the durable stores and according to the timeliness of
> > > > > their events, affect the configuration and reconciliation of payments for accounts
> > > > > (closed loop scheduling/receiving).
> > > > >
> > > > > https://datatracker.ietf.org/doc/draft-flanagan-regext-datadictionary/
> > > > > https://www.rfc-editor.org/rfc/rfc9022.txt
> > > > >
> > > > > Basically for dailies, monthlies, and annuals, what make weeklies,
> > > > > is this idea of Internet-from-a- account, what is services.
> > > > After implementing a store, and the protocol for getting messages, then what seems relevant here in the
> > > > context of the SEARCH command, is a fungible file-format, that is derived from the body of the message
> > > > in a normal form, that is a data structure that represents an index and catalog and dictionary and summary
> > > > of the message, a form of a data structure of a "search index".
> > > >
> > > > These types of files should naturally compose, and result a data structure that according to some normal
> > > > forms of search and summary algorithms, result that a data structure results, that makes for efficient
> > > > search of sections of the corpus for information retrieval, here that "information retrieval is the science
> > > > of search algorithms".
> > > >
> > > > Now, for what and how people search, or what is the specification of a search, is in terms of queries, say,
> > > > here for some brief forms of queries that advise what's definitely included in the search, what's excluded,
> > > > then perhaps what's maybe included, or yes/no/maybe, which makes for a predicate that can be built,
> > > > that can be applied to results that compose and build for the terms of a filter with yes/no/maybe or
> > > > sure/no/yes, with predicates in values.
> > > >
> > > > Here there is basically "free text search" and "matching summaries", where text is the text and summary is
> > > > a data structure, with attributes as paths the leaves of the tree of which match.
> > > >
> > > > Then, the message has text, its body, and headers, key-value pairs or collections thereof, where as well
> > > > there are default summaries like "a histogram of words by occurrence" or for example default text like "the
> > > > MIME body of this message has a default text representation".
> > > >
> > > > So, the idea developing here is to define what are "normal" forms of data structures that have some "normal"
> > > > forms of encoding that result that these "normalizing" after "normative" data structures define well-behaved
> > > > algorithms upon them, which provide well-defined bounds in resources that return some quantification of results,
> > > > like any/each/every/all, "hits".
> > > >
> > > > This is where usually enough search engines' or collected search algorithms ("find") usually enough have these
> > > > de-facto forms, "under the hood", as it were, to make it first-class that for a given message and body that
> > > > there is a normal form of a "catalog summary index" which can be compiled to a constant when the message
> > > > is ingested, that then basically any filestore of these messages has alongside it the filestore of the "catsums"
> > > > or as on-demand, then that any algorithm has at least well-defined behavior under partitions or collections
> > > > or selections of these messages, or items, for various standard algorithms that separate "to find" from
> > > > "to serve to find".
> > > >
> > > > So, ..., what I'm wondering are what would be sufficient normal forms in brief that result that there are
> > > > defined for a given corpus of messages, basically at the granularity of messages, how is defined how
> > > > there is a normal form for each message its "catsum", that catums have a natural algebra that a
> > > > concatenation of catums is a catsum and that some standard algorithms naturally have well-defined
> > > > results on their predicates and quantifiers of matching, in serial and parallel, and that the results
> > > > combine in serial and parallel.
> > > >
> > > > The results should be applicable to any kind of data but here it's more or less about usenet groups.
> > > So I start browsing the Information Retrieval section in Wikipedia and more or less get to reading
> > > Luhn's 1958 "automatic coding of document summaries" or "The Automatic Creation of Literature
> > > Abstracts". Then, what I figure, is that the histogram, is an associative array of keys to counts,
> > > and what I figure is to compute both the common terms, and, the rare terms, so that there's both
> > > "common-weight" and "rare-weight" computed, off of the count of the terms, and the count of
> > > distinct terms, where it is working up that besides catums, or catsums, it would result a relational
> > > algebra of terms in, ..., terms, of counts and densities and these type things. This is where, first I
> > > would figure the catsum would be deterministic before it's at all probabilistic, because the goal is
> > > match-find not match-guess, while still it's to support the less deterministic but more opportunistic
> > > at the same time.
> > >
> > > Then, the "index" is basically like a usual book's index, for each term that's not a common term in
> > > the language but is a common term in the book, what page it's on, here that that is a read-out of
> > > a histogram of the terms to pages. Then, compound terms, basically get into grammar, and in terms
> > > of terms, I don't so much care to parse glossolalia as what result mostly well-defined compound terms
> > > in usual natural languages, for the utility of a dictionary and technical dictionaries. Here "pages" are
> > > both according to common message threads, and also the surround of messages in the same time
> > > period, where a group is a common message thread and a usenet is a common message thread.
> > >
> > > (I've had a copy of "the information retrieval book" before, also borrowed one "data logic".)
> > >
> > > "Spelling mistakes considered adversarial."
> > >
> > > https://en.wikipedia.org/wiki/Subject_indexing#Indexing_theory
> > >
> > > Then, there's lots to be said for "summary" and "summary in statistic".
> > >
> > >
> > > A first usual data structure for efficiency is the binary tree or bounding tree. Then, there's
> > > also what makes for divide-and-conquer or linear speedup.
> > >
> > >
> > > About the same time as Luhn's monograph or 1956, there was published a little book
> > > called "Logic and Language", Huppe and Kaminsky. It details how according to linguistics
> > > there are certain usual regular patterns of words after phonemes and morphology what
> > > result then for stems and etymology that then for vocabulary that grammar or natural
> > > language results above. Then there are also gentle introductions to logic. It's very readable
> > > and quite brief.
> > I haven't much been tapping away at this,
> > but it's pretty simple to stand up a usenet peer,
> > and pretty simple to slurp a copy,
> > of the "Big 8" usenet text groups, for example,
> > or particularly just for a few.
> Well, I've been thinking about this, and there are some ideas.
>
> One is about a system of reputation, the idea being New/Old/Off/Bad/Bot/Non,
> basically figuring that reputation is established by action.
>
> Figuring how to categorize spam, UCE, vice, crime, and call that Bad, then
> gets into basically two editions, with a common backing, Cur (curated) and Raw,
> with Old and New in curated, and Off and Bot a filter off that, and Bad and Non
> excluded, though in the raw feed. Then there's only to forward what's curated,
> or current.
>
> Here the idea is that New graduates to Old, Non might be a false-negative New,
> but is probably a negative Bad or Off, and then Bot is a sort of honor system, and
> Old might wander to Off and vice-versa, then that Off and Old can vacillate.
>
> Then for renditions, is basically that the idea is that it's the same content
> behind NNTP, with IMAP, then also an HTTP gateway, Atom/RDF feed, ....
>
> (It's pretty usually text-only but here is MIME.)
>
> There are various ways to make for posting that's basically for that Old
> can post what they want, and Off, then for something like that New,
> gets an email in reply to their post, that they reply to that, to round-trip a post.
>
> (Also mail-to-news and news-to-mail are pretty usual. Also there are
> notions of humanitarian inputs.)
>
> Similarly there are the notions above about using certificates and TLS to
> use technology and protocol to solve technology protocol abuse problems.
>
> For surfacing the items then is about technologies like robots.txt and
> Dublin Core metadata, and similar notions with respect to uniqueness.
> If you have other ideas about this, please chime in.
>
> Then for having a couple sorts of organizations of both the domain name
> and the URL's as resources, makes for example for sub-domains for groups,
> for example then with certificate conventions in that, then usual sorts of
> URL's that are, you know, URL's, and URN's, then, about URL's, URI's, and URN's.
>
> Luckily it's all quite standardized so quite stock NNTP, IMAP, and HTTP browsers,
> and about SMTP and IMAP, and with TLS, make of course a fungible sort of system.
>
>
> How to pay for it all? At about $500 a year for all text usenet,
> about a day's golf foursome and a few beers can stand up a new Usenet peer.





Basically thinking about a "backing file format convention".

The message-IDs are universally unique. File-systems support various counts and depths
of sub-directories. The message-IDs aren't necessarily legal or opaque, structurally, as file-names.
So, the first thing is a function that, given a message-ID, results a message-ID-file-name.

Then, as it's figured that groups are separable, there's the question of whether to have all the
messages in one store, or to split them out by groups. Either way the idea is to convert the
message-ID-file-name to a given depth of directories, also legal in file names, so it
results that the messages get uniformly distributed in sub-directories of approximately
equal count and depth.

A....D...G <- message-ID

ABCDEFG <- message-ID-file-name

/A/B/C/D/E/F/ABCDEFG <- message-ID-directory-path
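
Here's a minimal sketch of that mapping in Java, where the SHA-256 digest naming and the
six-level fan-out are only assumptions for illustration (another take is simply escaping
the message-ID's illegal characters and keeping it readable):

    // Sketch: map a message-ID to a filesystem-legal name and a fan-out directory path.
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Path;
    import java.security.MessageDigest;

    public final class BffPaths {
        // A digest gives a uniform, legal, fixed-length file name regardless of
        // what characters the message-ID contains, and distributes names evenly.
        static String messageIdFileName(String messageId) throws Exception {
            byte[] d = MessageDigest.getInstance("SHA-256")
                    .digest(messageId.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : d) sb.append(String.format("%02x", b & 0xff));
            return sb.toString();
        }

        // Six single-character levels, /a/b/c/d/e/f/<full-name>, as the message's folder.
        static Path messageIdDirectoryPath(Path root, String messageId) throws Exception {
            String name = messageIdFileName(messageId);
            Path p = root;
            for (int i = 0; i < 6; i++) p = p.resolve(name.substring(i, i + 1));
            return p.resolve(name);
        }
    }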

So, the idea is that the backing file format convention basically results uniform lookup
of a file's existence, then, for ingestion, constructing a message off to the side, then moving
that directory as a link in the filesystem, so it results atomicity in the file system that
supports that the existence of a message-ID-directory-path is a function of message-ID,
and usual filesystem guarantees.



About the storage of the files, basically each message is only "header + body". Then,
when the message is served, it has appended to its header the message numbers
according to the group, "header + numbers + body".

So, the idea is to store the header and body compressed with deflate, then that there's
a pretty simple implementation of a first-class treatment of deflated data, to compute
the deflated "numbers" on demand, and result that concatenation results "header + numbers
+ body". It's figured that clients would either support deflated, compressed data natively,
or, that the server would instead decompress the data if compression's not supported, then
figuring that otherwise the data's stored at-rest as compressed. There's an idea that the
entire backing could be stored partially encrypted also, at-rest, but that would be special-purpose.
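
As a sketch of that concatenation, assuming the pieces are stored as gzip members (a gzip
stream is allowed to consist of several members back-to-back per RFC 1952, so gzip-capable
clients can inflate the concatenation, and the server inflates it itself otherwise):

    // Sketch: serve "header + numbers + body" as three gzip members, two of them copied
    // straight from the files at rest and the "numbers" member compressed on demand.
    import java.io.*;
    import java.nio.file.*;
    import java.util.zip.GZIPOutputStream;

    final class BffServe {
        static void serveCompressed(Path msgDir, byte[] numbers, OutputStream out) throws IOException {
            Files.copy(msgDir.resolve("header.gz"), out);          // stored member, copied as-is
            try (GZIPOutputStream gz = new NonClosingGzip(out)) {  // on-demand member for the numbers
                gz.write(numbers);
            }
            Files.copy(msgDir.resolve("body.gz"), out);            // stored member, copied as-is
        }

        // GZIPOutputStream.close() would close the underlying connection; only finish the member.
        static final class NonClosingGzip extends GZIPOutputStream {
            NonClosingGzip(OutputStream out) throws IOException { super(out); }
            @Override public void close() throws IOException { finish(); }
        }
    }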

The usual idea is that the backing-file-format convention is a physical interface for all access,
and also that tar'ing it up to a file results a transport file also, and that, simply,
backing-file-format trees can be overlaid or made into symlink farms together and such.


There's an idea then to make metadata of the message-date, basically to have partitions
by day, where Jan 1 2020 = Jan 1 1970 + 18262 days,

YYYY/MM/DD/A/B/C/D/E/F/ABCDEFG -> symlink to /A/B/C/D/E/F/ABCDEFG/


This is where the groups' file, which relates their message-numbers to message-IDs, only
has the message-numbers, vis-a-vis browsing by date, in terms of taking the intersection
of the message-numbers' message-IDs and the time-partitions' message-IDs.
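
A minimal sketch of laying down those by-day symlinks, where the YYYY/MM/DD layout
mirroring the fan-out path is an assumption of this convention:

    // Sketch: add a date-partition symlink pointing at a message's folder.
    import java.nio.file.*;
    import java.time.LocalDate;

    final class BffDatePartition {
        static Path linkByDate(Path datesRoot, LocalDate injected, Path bffRoot, Path messageDir)
                throws Exception {
            // Sanity check on the arithmetic above: LocalDate.of(2020, 1, 1).toEpochDay() == 18262.
            Path day = datesRoot.resolve(String.format("%04d/%02d/%02d",
                    injected.getYear(), injected.getMonthValue(), injected.getDayOfMonth()));
            Path link = day.resolve(bffRoot.relativize(messageDir));
            Files.createDirectories(link.getParent());
            return Files.createSymbolicLink(link, messageDir);
        }
    }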


Above, the idea of the groups file is that message-IDs have a bounded length, and that the groups file
would have a fixed-size or fixed-length record, with the index or message-number being the offset,
and the record being the message-ID, then its header and body accessed via the message-ID-directory-path.
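
A sketch of that lookup, where the 256-byte, space-padded record is an assumed size
(message-IDs are bounded in practice, commonly treated as at most 250 octets):

    // Sketch: groups file of fixed-length records, message-number -> message-ID,
    // so the message-number is just an offset into the file.
    import java.io.RandomAccessFile;
    import java.nio.charset.StandardCharsets;

    final class GroupsFile {
        static final int RECORD = 256;

        static String messageIdFor(RandomAccessFile groups, long messageNumber) throws Exception {
            byte[] rec = new byte[RECORD];
            groups.seek(messageNumber * RECORD);
            groups.readFully(rec);
            return new String(rec, StandardCharsets.US_ASCII).trim();
        }
    }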

So, toward working out a BFF convention, the point is to make it possible that file-operation tools
like tar and cp and gzip/deflate and other usual command line tools, or facilities, work on it directly,
so that then, while there should be a symlink-free approach, there's also the question of how to employ
symlinks, with regards to usual indexes from axes of access to enumeration.

As above then I'm wondering how to make it so that for something like a mailbox format,
there's a round-trip from BFF format, but mostly how to make it so that any given collection
of messages, given each has a unique ID, and according to its headers its groups and injection date,
results an automatic sort of building or rebuilding of the groups files.
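
A sketch of that rebuild, assuming the usual Message-ID, Newsgroups, and Injection-Date
(falling back to Date) headers are what's consulted; the lexical date sort here is only to
keep the sketch short, a real rebuild would parse the dates:

    // Sketch: from any collection of parsed headers, gather per-group lists in date order;
    // message-numbers then follow from each list's positions.
    import java.util.*;

    final class RebuildGroups {
        record Entry(String messageId, String date) {}

        static Map<String, List<Entry>> rebuild(List<Map<String, String>> parsedHeaders) {
            Map<String, List<Entry>> groups = new TreeMap<>();
            for (Map<String, String> h : parsedHeaders) {
                String id = h.get("Message-ID");
                String date = h.getOrDefault("Injection-Date", h.getOrDefault("Date", ""));
                for (String g : h.getOrDefault("Newsgroups", "").split(","))
                    if (!g.isBlank())
                        groups.computeIfAbsent(g.trim(), k -> new ArrayList<>()).add(new Entry(id, date));
            }
            groups.values().forEach(list -> list.sort(Comparator.comparing(Entry::date)));
            return groups;
        }
    }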

Another key sort of thing is the threading. Also, there is to be considered the multi-post or cross-post.


Then, for metadata, the idea is basically to support the protocol's overview and wildmat,
then the affinity with IMAP, then up into usual notions of key-attribute filtering, and, with
regards to full content search, a sort of "search file format", or indices, again with the goal
of that being fungible variously, and constructible according to simple bounds, and with the
goal of reducing the size of the files at rest, figuring mostly the files at rest aren't accessed,
or when they are, they're just served raw as compressed, because messages once authored are static.
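
For the overview side of that, a sketch of one OVER/XOVER line in the usual field order
(article number, then Subject, From, Date, Message-ID, References, byte count, line count),
built from a message's parsed headers:

    // Sketch: a tab-separated overview record for one message in a group.
    import java.util.Map;

    final class Overview {
        static String overLine(long number, Map<String, String> headers, long bytes, long lines) {
            return String.join("\t",
                    Long.toString(number),
                    headers.getOrDefault("Subject", ""),
                    headers.getOrDefault("From", ""),
                    headers.getOrDefault("Date", ""),
                    headers.getOrDefault("Message-ID", ""),
                    headers.getOrDefault("References", ""),
                    Long.toString(bytes),
                    Long.toString(lines));
        }
    }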

That said, the groups' contents grow over time, and also there are notions of no-archive
and retention, basically about how to consider, in those use cases, employing symlinks,
which result natural partitions, then to have usual rotation or truncation as deleting a folder,
invalidating all the symlinks to it, then a usual handler of ignoring broken symlinks, or deleting them,
so that maintenance is simple along the lines of "rm group" or "rm year".
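
A sketch of that handler, sweeping a partition tree for symlinks whose targets have been
removed (the tree layout itself is whatever the convention above settles on):

    // Sketch: after an "rm year"-style truncation, drop the symlinks left dangling.
    import java.io.IOException;
    import java.nio.file.*;
    import java.util.stream.Stream;

    final class Sweep {
        static void dropBrokenSymlinks(Path root) throws IOException {
            try (Stream<Path> paths = Files.walk(root)) {
                paths.filter(Files::isSymbolicLink)
                     .filter(p -> !Files.exists(p))   // exists() follows the link; false means broken
                     .forEach(p -> { try { Files.delete(p); } catch (IOException ignored) {} });
            }
        }
    }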

So, there's some thinking involved to make it so the messages each have their own folders,
and then parts in those, as above; this is the thinking here along the lines of "BFF/SFF",
then for setting up C10K+ servers in front of that for NNTP, IMAP, and a simple service
mechanism for surfacing HTTP, these kinds of things. Then, the idea is that metadata
gets accumulated next to the messages in their folders, those also to be concatenable,
to result that then for search, corpora are built off those intermediate data,
for usual searches and specialized searches and these kinds of things.
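
A sketch of that concatenation of intermediate data, along the lines of the per-message
"catsums" above: word histograms that merge by addition, so a partition's summary is just
the merge of its messages' summaries (the word-splitting here is deliberately naive):

    // Sketch: per-message histograms compose into partition- and corpus-level histograms.
    import java.util.*;

    final class Catsum {
        static Map<String, Long> histogram(String text) {
            Map<String, Long> h = new HashMap<>();
            for (String w : text.toLowerCase().split("\\W+"))
                if (!w.isBlank()) h.merge(w, 1L, Long::sum);
            return h;
        }

        static Map<String, Long> merge(Map<String, Long> a, Map<String, Long> b) {
            Map<String, Long> out = new HashMap<>(a);
            b.forEach((k, v) -> out.merge(k, v, Long::sum));
            return out;
        }
    }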

Then, the idea is to make for this BFF/SFF convention, then to start gathering "certified corpora"
of groups over time, making for those then being pretty simply distributable like the old
idea of an mbox mailbox format, with regards to that being one file that results the entire thing.

Then, there are threads and the message numbers, where threading by message number is the

header + numbers + body

the numbers part; that sort of is for open and closed threads, here though of course threads
are formally always open, or it's about composing threads of those as over them being partitioned
in usual reasonable times, for transient threads and long-winded threads and recurring threads.



Then, besides "control" and "junk" and such or relating administration, is here for the sort
of minimal administration that results this NOOBNB curation. This and matters of relay
ingestion and authoring ingestion and ingestion as concatenation of BFF files,
is about these kinds of things.
Ross Finlayson
2024-01-23 07:09:32 UTC
Reply
Permalink
On Monday, January 22, 2024 at 8:38:45 PM UTC-8, Ross Finlayson wrote:
> On Friday, December 22, 2023 at 12:36:40 AM UTC-8, Ross Finlayson wrote:
> > On Saturday, April 29, 2023 at 2:54:26 PM UTC-7, Ross Finlayson wrote:
> > > On Wednesday, March 8, 2023 at 10:23:04 PM UTC-8, Ross Finlayson wrote:
> > > > On Wednesday, March 8, 2023 at 8:51:58 PM UTC-8, Ross Finlayson wrote:
> > > > > On Monday, December 6, 2021 at 7:32:16 AM UTC-8, Ross A. Finlayson wrote:
> > > > > > On Monday, November 16, 2020 at 5:39:08 PM UTC-8, Ross A. Finlayson wrote:
> > > > > > > On Monday, November 16, 2020 at 5:00:51 PM UTC-8, Ross A. Finlayson wrote:
> > > > > > > > On Tuesday, June 30, 2020 at 10:00:52 AM UTC-7, Mostowski Collapse wrote:
> > > > > > > > > NNTP is not HTTP. I was using bare metal access to
> > > > > > > > > usenet, not using Google group, via:
> > > > > > > > >
> > > > > > > > > news.albasani.net, unfortunately dead since Corona
> > > > > > > > >
> > > > > > > > > So was looking for an alternative. And found this
> > > > > > > > > alternative, which seems fine:
> > > > > > > > >
> > > > > > > > > news.solani.org
> > > > > > > > >
> > > > > > > > > Have Fun!
> > > > > > > > >
> > > > > > > > > P.S.: Technical spec of news.solani.org:
> > > > > > > > >
> > > > > > > > > Intel Core i7, 4x 3.4 GHz, 32 GB RAM, 500 GB SSD RAID1
> > > > > > > > > Intel Core i7, 4x 3.4 GHz, 32 GB RAM, 2 TB HDD RAID1
> > > > > > > > > Intel Xeon VM, 1.8 GHz, 1 GB RAM, 20 GB
> > > > > > > > > Standort: 2x Falkenstein, 1x New York
> > > > > > > > >
> > > > > > > > > advantage of bare metal usenet,
> > > > > > > > > you see all headers of message.
> > > > > > > > > Am Dienstag, 30. Juni 2020 06:24:53 UTC+2 schrieb Ross A. Finlayson:
> > > > > > > > > > Search you mentioned and for example HTTP is adding the SEARCH verb,
> > > > > > > > In traffic there are two kinds of usenet users,
> > > > > > > > viewers and traffic through Google Groups,
> > > > > > > > and, USENET. (USENET traffic.)
> > > > > > > >
> > > > > > > > Here now Google turned on login to view their
> > > > > > > > Google Groups - effectively closing the Google Groups
> > > > > > > > without a Google login.
> > > > > > > >
> > > > > > > > I suppose if they're used at work or whatever though
> > > > > > > > they'd be open.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Where I got with the C10K non-blocking I/O for a usenet server,
> > > > > > > > it scales up though then I think in the runtime is a situation where
> > > > > > > > it only runs epoll or kqueue that the test scale ups, then at the end
> > > > > > > > or in sockets there is a drop, or it fell off the driver. I've implemented
> > > > > > > > the code this far, what has all of NNTP in a file and then the "re-routine,
> > > > > > > > industry-pattern back-end" in memory, then for that running usually.
> > > > > > > >
> > > > > > > > (Cooperative multithreading on top of non-blocking I/O.)
> > > > > > > >
> > > > > > > > Implementing the serial queue or "monohydra", or slique,
> > > > > > > > makes for that then when the parser is constantly parsing,
> > > > > > > > it seems a usual queue like data structure with parsing
> > > > > > > > returning its bounds, consuming the queue.
> > > > > > > >
> > > > > > > > Having the file buffers all down small on 4K pages,
> > > > > > > > has that a next usual page size is the megabyte.
> > > > > > > >
> > > > > > > > Here though it seems to make sense to have a natural
> > > > > > > > 4K alignment the file system representation, then that
> > > > > > > > it is moving files.
> > > > > > > >
> > > > > > > > So, then with the new modern Java, it that runs in its own
> > > > > > > > Java server runtime environment, it seems I would also
> > > > > > > > need to see whether the cloud virt supported the I/O model
> > > > > > > > or not, or that the cooperative multi-threading for example
> > > > > > > > would be single-threaded. (Blocking abstractly.)
> > > > > > > >
> > > > > > > > Then besides I suppose that could be neatly with basically
> > > > > > > > the program model, and its file model, being well-defined,
> > > > > > > > then for NNTP with IMAP organization search and extensions,
> > > > > > > > those being standardized, seems to make sense for an efficient
> > > > > > > > news file organization.
> > > > > > > >
> > > > > > > > Here then it seems for serving the NNTP, and for example
> > > > > > > > their file bodies under the storage, with the fixed headers,
> > > > > > > > variable header or XREF, and the message body, then under
> > > > > > > > content it's same as storage.
> > > > > > > >
> > > > > > > > NNTP has "OVERVIEW" then from it is built search.
> > > > > > > >
> > > > > > > > Let's see here then, if I get the load test running, or,
> > > > > > > > just put a limit under the load while there are no load test
> > > > > > > > errors, it seems the algorithm then scales under load to be
> > > > > > > > making usually the algorithm serial in CPU, with: encryption,
> > > > > > > > and compression (traffic). (Block ciphers instead of serial transfer.)
> > > > > > > >
> > > > > > > > Then, the industry pattern with re-routines, has that the
> > > > > > > > re-routines are naturally co-operative in the blocking,
> > > > > > > > and in the language, including flow-of-control and exception scope.
> > > > > > > >
> > > > > > > >
> > > > > > > > So, I have a high-performance implementation here.
> > > > > > > It seems like for NFS, then, and having the separate read and write of the client,
> > > > > > > a default filesystem, is an idea for the system facility: mirroring the mounted file
> > > > > > > locally, and, providing the read view from that via a different route.
> > > > > > >
> > > > > > >
> > > > > > > A next idea then seems for the organization, the client views themselves
> > > > > > > organize over the durable and available file system representation, this
> > > > > > > provides anyone a view over the protocol with a group file convention.
> > > > > > >
> > > > > > > I.e., while usual continuous traffic was surfing, individual reads over group
> > > > > > > files could have independent views, for example collating contents.
> > > > > > >
> > > > > > > Then, extracting requests from traffic and threads seems usual.
> > > > > > >
> > > > > > > (For example a specialized object transfer view.)
> > > > > > >
> > > > > > > Making protocols for implementing internet protocols in groups and
> > > > > > > so on, here makes for giving usenet example views to content generally.
> > > > > > >
> > > > > > > So, I have designed a protocol node and implemented it mostly,
> > > > > > > then about designed an object transfer protocol, here the idea
> > > > > > > is how to make it so people can extract data, for example their own
> > > > > > > data, from a large durable store of all the usenet messages,
> > > > > > > making views of usenet running on usenet, eg "Feb. 2016: AP's
> > > > > > > Greatest Hits".
> > > > > > >
> > > > > > > Here the point is to figure that usenet, these days, can be operated
> > > > > > > in cooperation with usenet, and really for its own sake, for leaving
> > > > > > > messages in usenet and here for usenet protocol stores as there's
> > > > > > > no reason it's plain text the content, while the protocol supports it.
> > > > > > >
> > > > > > > Building personal view for example is a simple matter of very many
> > > > > > > service providers any of which sells usenet all day for a good deal.
> > > > > > >
> > > > > > > Let's see here, $25/MM, storage on the cloud last year for about
> > > > > > > a million messages for a month is about $25. Outbound traffic is
> > > > > > > usually the metered cloud traffic, here for example that CDN traffic
> > > > > > > support the universal share convention, under metering. What that
> > > > > > > the algorithm is effectively tunable in CPU and RAM, makes for under
> > > > > > > I/O that's it's "unobtrusive" or the cooperative in routine, for CPI I/O and
> > > > > > > RAM, then that there is for seeking that Network Store or Database Time
> > > > > > > instead effectively becomes File I/O time, as what may be faster,
> > > > > > > and more durable. There's a faster database time for scaling the ingestion
> > > > > > > here with that the file view is eventually consistent. (And reliable.)
> > > > > > >
> > > > > > > Checking the files would be over time for example with "last checked"
> > > > > > > and "last dropped" something along the lines of, finding wrong offsets,
> > > > > > > basically having to make it so that it survives neatly corruption of the
> > > > > > > store (by being more-or-less stored in-place).
> > > > > > >
> > > > > > > Content catalog and such, catalog.
> > > > > > Then I wonder and figure the re-routine can scale.
> > > > > >
> > > > > > Here for the re-routine, the industry factory pattern,
> > > > > > and the commands in the protocols in the templates,
> > > > > > and the memory module, with the algorithm interface,
> > > > > > in the high-performance computer resource, it is here
> > > > > > that this simple kind of "writing Internet software"
> > > > > > makes pretty rapidly for adding resources.
> > > > > >
> > > > > > Here the design is basically of a file I/O abstraction,
> > > > > > that the computer reads data files with mmap to get
> > > > > > their handlers, what results that for I/O map the channels
> > > > > > result transferring the channels in I/O for what results,
> > > > > > in mostly the allocated resource requirements generally,
> > > > > > and for the protocol and algorithm, it results then that
> > > > > > the industry factory pattern and making for interfaces,
> > > > > > then also here the I/O routine as what results that this
> > > > > > is an implementation, of a network server, mostly is making
> > > > > > for that the re-routine, results very neatly a model of
> > > > > > parallel cooperation.
> > > > > >
> > > > > > I think computers still have file systems and file I/O but
> > > > > > in abstraction just because PAGE_SIZE is still relevant for
> > > > > > the network besides or I/O, if eventually, here is that the
> > > > > > value types are in the commands and so on, it is besides
> > > > > > that in terms of the resources so defined it still is in a filesystem
> > > > > > convention that a remote and unreliable view of it suffices.
> > > > > >
> > > > > > Here then the source code also being "this is only 20-50k",
> > > > > > lines of code, with basically an entire otherwise library stack
> > > > > > of the runtime itself, only the network and file abstraction,
> > > > > > this makes for also that modularity results. (Factory Industry
> > > > > > Pattern Modules.)
> > > > > >
> > > > > > For a network server, here, that, mostly it is high performance
> > > > > > in the sense that this is about the most direct handle on the channels
> > > > > > and here mostly for the text layer in the I/O order, or protocol layer,
> > > > > > here is that basically encryption and compression usually in the layer,
> > > > > > there is besides a usual concern where encryption and compression
> > > > > > are left out, there is that text in the layer itself is commands.
> > > > > >
> > > > > > Then, those being constants under the resources for the protocol,
> > > > > > it's what results usual protocols like NNTP and HTTP and other protocols
> > > > > > with usually one server and many clients, here is for that these protocols
> > > > > > are defined in these modules, mostly there NNTP and IMAP, ..., HTTP.
> > > > > >
> > > > > > These are here defined "all Java" or "Pure Java", i.e. let's be clear that
> > > > > > in terms of the reference abstraction layer, I think computers still use
> > > > > > the non-blocking I/O and filesystems and network to RAM, so that as
> > > > > > the I/O is implemented in those it actually has those besides instead for
> > > > > > example defaulting to byte-per-channel or character I/O. I.e. the usual
> > > > > > semantics for servicing the I/O in the accepter routine and what makes
> > > > > > for that the platform also provides a reference encryption implementation,
> > > > > > if not so relevant for the block encoder chain, besides that for example
> > > > > > compression has a default implementation, here the I/O model is as simply
> > > > > > in store for handles, channels, ..., that it results that data especially delivered
> > > > > > from a constant store can anyways be mostly compressed and encrypted
> > > > > > already or predigested to serve, here that it's the convention, here is for
> > > > > > resulting that these client-server protocols, with usually reads > postings
> > > > > > then here besides "retention", basically here is for what it is.
> > > > > >
> > > > > > With the re-routine and the protocol layer besides, having written the
> > > > > > routines in the re-routine, what there is to write here is this industry
> > > > > > factory, or a module framework, implementing the re-routines, as they're
> > > > > > built from the linear description a routine, makes for as the routine progresses
> > > > > > that it's "in the language" and that more than less in the terms, it makes for
> > > > > > implementing the case of logic for values, in the logic's flow-of-control's terms.
> > > > > >
> > > > > > Then, there is that actually running the software is different than just
> > > > > > writing it, here in the sense that as a server runtime, it is to be made a
> > > > > > thing, by giving it a name, and giving it an authority, to exist on the Internet.
> > > > > >
> > > > > > There is basically that for BGP and NAT and so on, and, mobile fabric networks,
> > > > > > IP and TCP/IP, of course IPv4 and IPv6 are the coarse fabric main space, with
> > > > > > respect to what are CIDR and 24 bits rule and what makes for TCP/IP, here
> > > > > > entirely the course is using the TCP/IP stack and Java's TCP/IP stack, with
> > > > > > respect to that TCP/IP is so provided or in terms of process what results
> > > > > > ports mostly and connection models where it is exactly the TCP after the IP,
> > > > > > the Transport Control Protocol and Internet Protocol, have here both this
> > > > > > socket and datagram connection orientation, or stateful and stateless or
> > > > > > here that in terms of routing it's defined in addresses, under that names
> > > > > > and routing define sources, routes, destinations, ..., that routine numeric
> > > > > > IP addresses result in the usual sense of the network being behind an IP
> > > > > > and including IPv4 network fabric with respect to local routers.
> > > > > >
> > > > > > I.e., here to include a service framework is "here besides the routine, let's
> > > > > > make it clear that in terms of being a durable resource, there needs to be
> > > > > > some lockbox filled with its sustenance that in some locked or constant
> > > > > > terms results that for the duration of its outlay, say five years, it is held
> > > > > > up, then, it will be so again, or, let down to result the carry-over that it
> > > > > > invested to archive itself, I won't have to care or do anything until then".
> > > > > >
> > > > > >
> > > > > > About the service activation and the idea that, for a port, the routine itself
> > > > > > needs only run under load, i.e. there is effectively little traffic on the old archives,
> > > > > > and usually only the some other archive needs any traffic. Here the point is
> > > > > > that for the Java routine there is the system port that was accepted for the
> > > > > > request, that inetd or the systemd or means the network service was accessed,
> > > > > > made for that much as for HTTP the protocol is client-server also for IP the
> > > > > > protocol is client-server, while the TCP is packets. This is a general idea for
> > > > > > system integration while here mostly the routine is that being a detail:
> > > > > > the filesystem or network resource that results that the re-routines basically
> > > > > > make very large CPU scaling.
> > > > > >
> > > > > > Then, it is basically containerized this sense of "at some domain name, there
> > > > > > is a service, it's HTTP and NNTP and IMAP besides, what cares the world".
> > > > > >
> > > > > > I.e. being built on connection oriented protocols like the socket layer,
> > > > > > HTTP(S) and NNTP(S) and IMAP(S) or with the TLS orientation to certificates,
> > > > > > it's more than less sensible that most users have no idea of installing some
> > > > > > NNTP browser or pointing their email to IMAP so that the email browser
> > > > > > browses the newsgroups and for postings, here this is mostly only talk
> > > > > > about implementing NNTP then IMAP and HTTP that happens to look like that,
> > > > > > besides for example SMTP or NNTP posting.
> > > > > >
> > > > > > I.e., having "this IMAP server, happens to be this NNTP module", or
> > > > > > "this HTTP server, happens to be a real simple mailbox these groups",
> > > > > > makes for having partitions and retentions of those and that basically
> > > > > > NNTP messages in the protocol can be more or less the same content
> > > > > > in media, what otherwise is of a usual message type.
> > > > > >
> > > > > > Then, the NNTP server-server routine is the propagation of messages
> > > > > > besides "I shall hire ten great usenet retention accounts and gently
> > > > > > and politely draw them down and back-fill Usenet, these ten groups".
> > > > > >
> > > > > > By then I would have to have made for retention in storage, such contents,
> > > > > > as have a reference value, then for besides making that independent in
> > > > > > reference value, just so that it suffices that it basically results "a usable
> > > > > > durable filesystem that happens you can browse it like usenet". I.e. as
> > > > > > the pieces to make the backfill are dug up, they get assigned reference numbers
> > > > > > of their time to make for what here is that in a grand schema of things,
> > > > > > they have a reference number in numerical order (and what's also the
> > > > > > server's "message-number" besides its "message-id") as noted above this
> > > > > > gets into the storage for retention of a file, while, most services for this
> > > > > > are instead for storage and serving, not necessarily or at all retention.
> > > > > >
> > > > > > I.e., the point is that as the groups are retained from retention, there is an
> > > > > > approach what makes for an orderly archeology, as for what convention
> > > > > > some data arrives, here that this server-server routine is besides the usual
> > > > > > routine which is "here are new posts, propagate them", it's "please deliver
> > > > > > as of a retention scan, and I'll try not to repeat it, what results as orderly
> > > > > > as possible a proof or exercise of what we'll call afterward entire retention",
> > > > > > then will be for as of writing a file that "as of the date, from start to finish,
> > > > > > this site certified these messages as best-effort retention".
> > > > > >
> > > > > > It seems then besides there is basically "here is some mbox file, serve it
> > > > > > like it was an NNTP group or an IMAP mailbox", ingestion, in terms of that
> > > > > > what is ingestion, is to result for the protocol that "for this protocol,
> > > > > > there is actually a normative filesystem representation that happens to
> > > > > > be pretty much also altogether defined by the protocol", the point is
> > > > > > that ingestion would result in command to remain in the protocol,
> > > > > > that a usual file type that "presents a usual abstraction, of a filesystem,
> > > > > > as from the contents of a file", here with the notion of "for all these
> > > > > > threaded discussions, here this system only cares some approach to
> > > > > > these ten particular newgroups that already have mostly their corpus
> > > > > > though it's not in perhaps their native mbox instead consulted from services".
> > > > > >
> > > > > > Then, there's for storing and serving the files, and there is the usual
> > > > > > notion that moving the data, is to result, that really these file organizations
> > > > > > are not so large in terms of resources, being "less than gigabytes" or so,
> > > > > > still there's a notion that as a durable resource they're to be made
> > > > > > fungible here the networked file approach in the native filesystem,
> > > > > > then that with respect to it's a backing store, it's to make for that
> > > > > > the entire enterprise is more or less to made in terms of account,
> > > > > > that then as a facility on the network then a service in the network,
> > > > > > it's basically separated the facility and service, while still of course
> > > > > > that the service is basically defined by its corpus.
> > > > > >
> > > > > >
> > > > > > Then, to make that fungible in a world of account, while with an exit
> > > > > > strategy so that the operation isn't not abstract, is mostly about the
> > > > > > domain name, then that what results the networking, after trusted
> > > > > > network naming and connections for what result routing, and then
> > > > > > the port, in terms of that there are usual firewalls in ports though that
> > > > > > besides usually enough client ports are ephemeral, here the point is
> > > > > > that the protocols and their well-known ports, here it's usually enough
> > > > > > that the Internet doesn't concern itself so much protocols but with
> > > > > > respect to proxies, here that for example NNTP and IMAP don't have
> > > > > > so much anything so related that way after startTLS. For the world of
> > > > > > account, is basically to have for a domain name, an administrator, and,
> > > > > > an owner or representative. These are to establish authority for changes
> > > > > > and also accountability for usage.
> > > > > >
> > > > > > Basically they're to be persons and there is a process to get to be an
> > > > > > administrator of DNS, most always there are services that a usual person
> > > > > > implementing the system might use, besides for example the numerical.
> > > > > >
> > > > > > More relevant though to DNS is getting servers on the network, with respect
> > > > > > to listening ports and that they connect to clients what so discover them as
> > > > > > via DNS or configuration, here as above the usual notion that these are
> > > > > > standard services and run on well-known ports for inetd or systemd.
> > > > > > I.e. there is basically that running a server and dedicated networking,
> > > > > > and power and so on, and some notion of the limits of reliability, is then
> > > > > > as very much in other aspects of the organization of the system, i.e. its name,
> > > > > > while at the same time, the point that a module makes for that basically
> > > > > > the provision of a domain name or well-known or ephemeral host, is the
> > > > > > usual notion that static IP addresses are a limited resource and as about
> > > > > > the various networks in IPv4 and how they route traffic, is for that these
> > > > > > services have well-known sections in DNS for at least that the most usual
> > > > > > configuration is none.
> > > > > >
> > > > > > For a usual global reliability and availability, is some notion basically that
> > > > > > each region and zone has a service available on the IP address, for that
> > > > > > "hostname" resolves to the IP addresses. As well, in reverse, for the IP
> > > > > > address and about the hostname, it should resolve reverse to hostname.
> > > > > >
> > > > > > About certificates mostly for identification after mapping to port, or
> > > > > > multi-home Internet routing, here is the point that whether the domain
> > > > > > name administration is "epochal" or "regular", is that epochs are defined
> > > > > > by the ports behind the numbers and the domain name system as well,
> > > > > > where in terms of the registrar, the domain names are epochal to the
> > > > > > registrar, with respect to owners of domain names.
> > > > > >
> > > > > > Then if DNS is a datagram or UDP service is for ICMP as for TCP/IP,
> > > > > > and also BGP and NAT and routing and what are local and remote
> > > > > > addresses, here is for not-so-much "implement DNS the protocol
> > > > > > also while you're at it", rather for what results that there is a durable
> > > > > > and long-standing and proper doorman, for some usenet.science.
> > > > > >
> > > > > > Here then the notion seems to be whether the doorman basically
> > > > > > knows well-known services, is a multi-homing router, or otherwise
> > > > > > what is the point that it starts the lean runtime, with respect to that
> > > > > > it's a container and having enough sense of administration its operation
> > > > > > as contained. I.e. here given a port and a hostname and always running
> > > > > > makes for that as long as there is the low (preferable no) idle for services
> > > > > > running that have no clients, is here also for the cheapest doorman that
> > > > > > knows how to standup the client sentinel. (And put it back away.)
> > > > > >
> > > > > > Probably the most awful thing in the cloud services is the cost for
> > > > > > data ingress and egress. What that means is that for example using
> > > > > > a facility that is bound by that as a cost instead of under some constant
> > > > > > cost, is basically why there is the approach that the containers needs a
> > > > > > handle to the files, and they're either local files or network files, here
> > > > > > with the some convention above in archival a shared consistent view
> > > > > > of all the files, or abstractly consistent, is for making that the doorman
> > > > > > can handle lots of starting and finishing connections, while it is out of
> > > > > > the way when usually it's client traffic and opening and closing connections,
> > > > > > and the usual abstraction is that the client sentinel is never off and doorman
> > > > > > does nothing, here is for attaching the one to some lower constant cost,
> > > > > > where for example any long-running cost is more than some low constant cost.
> > > > > >
> > > > > > Then, this kind of service is often represented by nodes, in the usual sense
> > > > > > "here is an abstract container with you hope some native performance under
> > > > > > the hypervisor where it lives on the farm on its rack, it basically is moved the
> > > > > > image to wherever it's requested from and lives there, have fun, the meter is on".
> > > > > > I.e. that's just "this Jar has some config conventions and you can make the
> > > > > > container associate it and watchdog it with systemd for example and use the
> > > > > > cgroups while you're at it and make for tempfs quota and also the best network
> > > > > > file share, which you might be welcome to cache if you care just in the off-chance
> > > > > > that this file-mapping is free or constant cost as long as it doesn't egress the
> > > > > > network", is for here about the facilities that work, to get a copy of the system
> > > > > > what with respect to its usual operation is a piece of the Internet.
> > > > > >
> > > > > > For the different reference modules (industry factories) in their patterns then
> > > > > > and under combined configuration "file + process + network + fare", is that
> > > > > > the fare of the service basically reflects a daily coin, in the sense that it
> > > > > > represents an annual or epochal fee, what results for the time there is
> > > > > > what is otherwise all defined the "file + process + network + name",
> > > > > > what results it perpetuates in operation more than less simply and automatically.
> > > > > >
> > > > > > Then, the point though is to get it to where "I can go to this service, and
> > > > > > administer it more or less by paying an account, that it thus lives in its
> > > > > > budget and quota in its metered world".
> > > > > >
> > > > > > That though is very involved with identity, that in terms of "I the account
> > > > > > as provided this sum make this sum paid with respect to an agreement",
> > > > > > is that authority to make agreements must make that it results that the
> > > > > > operation of the system, is entirely transparent, and defined in terms of
> > > > > > the roles and delegation, conventions in operation.
> > > > > >
> > > > > > I.e., I personally don't want to administer a copy of usenet, but, it's here
> > > > > > pretty much sorted out that I can administer one once then that it's to
> > > > > > administer itself in the following, in terms of it having resources to allocate
> > > > > > and resources to disburse. Also if nobody's using it it should basically work
> > > > > > itself out to dial its lights down (while maintaining availability).
> > > > > >
> > > > > > Then a point seems "maintain and administer the operation in effect,
> > > > > > what arrangement sees via delegation, that a card number and a phone
> > > > > > number and an email account and more than less a responsible entity,
> > > > > > is so indicated for example in cryptographic identity thus that the operation
> > > > > > of this system as a service, effectively operates itself out of a kitty,
> > > > > > what makes for administration and overhead, an entirely transparent
> > > > > > model of a miniature business the system as a service".
> > > > > >
> > > > > > "... and a mailing address and mail service."
> > > > > >
> > > > > > Then, for accounts and accounts, for example is the provision of the component
> > > > > > as simply an image in cloud algorithms, where basically as above here it's configured
> > > > > > that anybody with any cloud account could basically run it on their own terms,
> > > > > > there is for here sorting out "after this delegation to some business entity what
> > > > > > results a corporation in effect, the rest is business-in-a-box and more-than-less
> > > > > > what makes for its administration in state, is for how it basically limits and replicates
> > > > > > its service, in terms of its own assets here as what administered is abstractly
> > > > > > "durable forever mailboxes with private ownership if on public or managed resources".
> > > > > >
> > > > > > A usual notion of a private email and usenet service offering and business-in-a-box,
> > > > > > here what I'm looking at is that besides archiving sci.math and copying out its content
> > > > > > under author line, is to make such an industry for example here that "once having
> > > > > > implemented an Internet service, an Internet service of them results Internet".
> > > > > >
> > > > > > I.e. here the point is to make a corporation and a foundation in effect, what in terms
> > > > > > of then about the books and accounts, is about accounts for the business accounts
> > > > > > that reflect a persistent entity, then what results in terms of computing, networking,
> > > > > > and internetworking, with a regular notion of "let's never change this arrangement
> > > > > > but it's in monthly or annual terms", here for that in overall arrangements,
> > > > > > it results what the entire system more than less runs in ways then to either
> > > > > > run out its limits or make itself a sponsored effort, about more-or-less a simple
> > > > > > and responsible and accountable set of operations what effect the business
> > > > > > (here that in terms of service there is basically the realm of agreement)
> > > > > > that basically this sort of business-in-a-box model, is then besides itself of
> > > > > > accounts, toward the notion as pay-as-you-go and "usual credits and their limits".
> > > > > >
> > > > > > Then for a news://usenet.science, or for example sci.math.usenet.science,
> > > > > > is the idea that the entity is "some assemblage what is so that in DNS, and,
> > > > > > in the accounts payable and receivable, and, in the material matters of
> > > > > > arrangement and authority for administration, of DNS and resources and
> > > > > > accounts what result durably persisting the business, is basically for a service
> > > > > > then of what these are usual enough tasks, as that are interactive workflows
> > > > > > and for mechanical workflows.
> > > > > >
> > > > > > I.e. the point is for having the service than an on/off button and more or less
> > > > > > what is for a given instance of the operation, what results from some protocol
> > > > > > that provides a "durable store" of a sort of the business, that at any time basically
> > > > > > some re-routine or "eventually consistent" continuance of the operation of the
> > > > > > business, results basically a continuity in its operations, what is entirely granular,
> > > > > > that here for example the point is to "pick a DNS name, attach an account service,
> > > > > > go" it so results that in the terms, basically there are the placeholders of the
> > > > > > interactive workflows in that, and as what in terms are often for example simply
> > > > > > card and phone number terms, account terms.
> > > > > >
> > > > > > I.e. a service to replenish accounts as kitties for making accounts only and
> > > > > > exactly limited to the one service, its transfers, basically results that there
> > > > > > is the notion of an email address, a phone number, a credit card's information,
> > > > > > here a fixed limit debit account that works as of a kitty, there is a regular workflow
> > > > > > service that will read out the durable stores and according to the timeliness of
> > > > > > their events, affect the configuration and reconciliation of payments for accounts
> > > > > > (closed loop scheduling/receiving).
> > > > > >
> > > > > > https://datatracker.ietf.org/doc/draft-flanagan-regext-datadictionary/
> > > > > > https://www.rfc-editor.org/rfc/rfc9022.txt
> > > > > >
> > > > > > Basically for dailies, monthlies, and annuals, what make weeklies,
> > > > > > is this idea of Internet-from-a- account, what is services.
> > > > > After implementing a store, and the protocol for getting messages, then what seems relevant here in the
> > > > > context of the SEARCH command, is a fungible file-format, that is derived from the body of the message
> > > > > in a normal form, that is a data structure that represents an index and catalog and dictionary and summary
> > > > > of the message, a form of a data structure of a "search index".
> > > > >
> > > > > These types of files should naturally compose, and result a data structure that according to some normal
> > > > > forms of search and summary algorithms, result that a data structure results, that makes for efficient
> > > > > search of sections of the corpus for information retrieval, here that "information retrieval is the science
> > > > > of search algorithms".
> > > > >
> > > > > Now, for what and how people search, or what is the specification of a search, is in terms of queries, say,
> > > > > here for some brief forms of queries that advise what's definitely included in the search, what's excluded,
> > > > > then perhaps what's maybe included, or yes/no/maybe, which makes for a predicate that can be built,
> > > > > that can be applied to results that compose and build for the terms of a filter with yes/no/maybe or
> > > > > sure/no/yes, with predicates in values.
> > > > >
> > > > > Here there is basically "free text search" and "matching summaries", where text is the text and summary is
> > > > > a data structure, with attributes as paths the leaves of the tree of which match.
> > > > >
> > > > > Then, the message has text, its body, and headers, key-value pairs or collections thereof, where as well
> > > > > there are default summaries like "a histogram of words by occurrence" or for example default text like "the
> > > > > MIME body of this message has a default text representation".
> > > > >
> > > > > So, the idea developing here is to define what are "normal" forms of data structures that have some "normal"
> > > > > forms of encoding that result that these "normalizing" after "normative" data structures define well-behaved
> > > > > algorithms upon them, which provide well-defined bounds in resources that return some quantification of results,
> > > > > like any/each/every/all, "hits".
> > > > >
> > > > > This is where usually enough search engines' or collected search algorithms ("find") usually enough have these
> > > > > de-facto forms, "under the hood", as it were, to make it first-class that for a given message and body that
> > > > > there is a normal form of a "catalog summary index" which can be compiled to a constant when the message
> > > > > is ingested, that then basically any filestore of these messages has alongside it the filestore of the "catsums"
> > > > > or as on-demand, then that any algorithm has at least well-defined behavior under partitions or collections
> > > > > or selections of these messages, or items, for various standard algorithms that separate "to find" from
> > > > > "to serve to find".
> > > > >
> > > > > So, ..., what I'm wondering are what would be sufficient normal forms in brief that result that there are
> > > > > defined for a given corpus of messages, basically at the granularity of messages, how is defined how
> > > > > there is a normal form for each message its "catsum", that catums have a natural algebra that a
> > > > > concatenation of catums is a catsum and that some standard algorithms naturally have well-defined
> > > > > results on their predicates and quantifiers of matching, in serial and parallel, and that the results
> > > > > combine in serial and parallel.
> > > > >
> > > > > The results should be applicable to any kind of data but here it's more or less about usenet groups.
> > > > So I start browsing the Information Retrieval section in Wikipedia and more or less get to reading
> > > > Luhn's 1958 "automatic coding of document summaries" or "The Automatic Creation of Literature
> > > > Abstracts". Then, what I figure, is that the histogram, is an associative array of keys to counts,
> > > > and what I figure is to compute both the common terms, and, the rare terms, so that there's both
> > > > "common-weight" and "rare-weight" computed, off of the count of the terms, and the count of
> > > > distinct terms, where it is working up that besides catums, or catsums, it would result a relational
> > > > algebra of terms in, ..., terms, of counts and densities and these type things. This is where, first I
> > > > would figure the catsum would be deterministic before it's at all probabilistic, because the goal is
> > > > match-find not match-guess, while still it's to support the less deterministic but more opportunistic
> > > > at the same time.
> > > >
> > > > Then, the "index" is basically like a usual book's index, for each term that's not a common term in
> > > > the language but is a common term in the book, what page it's on, here that that is a read-out of
> > > > a histogram of the terms to pages. Then, compound terms, basically get into grammar, and in terms
> > > > of terms, I don't so much care to parse glossolalia as what result mostly well-defined compound terms
> > > > in usual natural languages, for the utility of a dictionary and technical dictionaries. Here "pages" are
> > > > both according to common message threads, and also the surround of messages in the same time
> > > > period, where a group is a common message thread and a usenet is a common message thread.
> > > >
> > > > (I've had a copy of "the information retrieval book" before, also borrowed one "data logic".)
> > > >
> > > > "Spelling mistakes considered adversarial."
> > > >
> > > > https://en.wikipedia.org/wiki/Subject_indexing#Indexing_theory
> > > >
> > > > Then, there's lots to be said for "summary" and "summary in statistic".
> > > >
> > > >
> > > > A first usual data structure for efficiency is the binary tree or bounding tree. Then, there's
> > > > also what makes for divide-and-conquer or linear speedup.
> > > >
> > > >
> > > > About the same time as Luhn's monograph or 1956, there was published a little book
> > > > called "Logic and Language", Huppe and Kaminsky. It details how according to linguistics
> > > > there are certain usual regular patterns of words after phonemes and morphology what
> > > > result then for stems and etymology that then for vocabulary that grammar or natural
> > > > language results above. Then there are also gentle introductions to logic. It's very readable
> > > > and quite brief.
> > > I haven't much been tapping away at this,
> > > but it's pretty simple to stand up a usenet peer,
> > > and pretty simple to slurp a copy,
> > > of the "Big 8" usenet text groups, for example,
> > > or particularly just for a few.
> > Well, I've been thinking about this, and there are some ideas.
> >
> > One is about a system of reputation, the idea being New/Old/Off/Bad/Bot/Non,
> > basically figuring that reputation is established by action.
> >
> > Figuring how to categorize spam, UCE, vice, crime, and call that Bad, then
> > gets into basically two editions, with a common backing, Cur (curated) and Raw,
> > with Old and New in curated, and Off and Bot a filter off that, and Bad and Non
> > excluded, though in the raw feed. Then there's only to forward what's curated,
> > or current.
> >
> > Here the idea is that New graduates to Old, Non might be a false-negative New,
> > but is probably a negative Bad or Off, and then Bot is a sort of honor system, and
> > Old might wander to Off and vice-versa, then that Off and Old can vacillate.
> >
> > Then for renditions, is basically that the idea is that it's the same content
> > behind NNTP, with IMAP, then also an HTTP gateway, Atom/RDF feed, ....
> >
> > (It's pretty usually text-only but here is MIME.)
> >
> > There are various ways to make for posting that's basically for that Old
> > can post what they want, and Off, then for something like that New,
> > gets an email in reply to their post, that they reply to that, to round-trip a post.
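
A small sketch of that round-trip, holding a New author's article under a token until the token
comes back by mail; the in-memory map is just a stand-in here, and the mailing itself is out of scope:

  import java.util.Map;
  import java.util.UUID;
  import java.util.concurrent.ConcurrentHashMap;

  // Sketch of round-trip posting for "New" authors: hold the post under a
  // token, mail the token to the claimed address (not shown), and release
  // the post only when the token comes back in a reply.
  class RoundTrip {
      private final Map<String, String> held = new ConcurrentHashMap<>();

      String hold(String article) {
          String token = UUID.randomUUID().toString();
          held.put(token, article);       // mail this token to the author
          return token;
      }

      String confirm(String token) {
          return held.remove(token);      // non-null means the post is released
      }
  }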
> >
> > (Also mail-to-news and news-to-mail are pretty usual. Also there are
> > notions of humanitarian inputs.)
> >
> > Similarly there are the notions above about using certificates and TLS to
> > use technology and protocol to solve technology protocol abuse problems.
> >
> > For surfacing the items then is about technologies like robots.txt and
> > Dublin Core metadata, and similar notions with respect to uniqueness.
> > If you have other ideas about this, please chime in.
> >
> > Then for having a couple sorts of organizations of both the domain name
> > and the URL's as resources, makes for example for sub-domains for groups,
> > for example then with certificate conventions in that, then usual sorts of
> > URL's that are, you know, URL's, and URN's, then, about URL's, URI's, and URN's.
> >
> > Luckily it's all quite standardized so quite stock NNTP, IMAP, and HTTP browsers,
> > and about SMTP and IMAP, and with TLS, make of course a fungible sort of system.
> >
> >
> > How to pay for it all? At about $500 a year for all text usenet,
> > about a day's golf foursome and a few beers can stand up a new Usenet peer.
> Basically thinking about a "backing file format convention".
>
> The message ID's are universally unique. File-systems support various counts and depths
> of sub-directories. The message ID's aren't necessarily opaque structurally as file-names.
> So, the first thing is a function that given a message-ID, results a message-ID-file-name.
>
> Then, as it's figured that groups, are, separable, is about how, to, either have all the
> messages in one store, or, split it out by groups. Either way the idea is to convert the
> message-ID-file-name, to a given depth of directories, also legal in file names, so it
> results that the messages get uniformly distributed in sub-directories of approximately
> equal count and depth.
>
> A....D...G <- message-ID
>
> ABCDEFG <- message-ID-file-name
>
> /A/B/C/D/E/F/ABCDEFG <- message-ID-directory-path
>
> So, the idea is that the backing file format convention, basically results uniform lookup
> of a file's existence, then about ingestion and constructing a message, then, moving
> that directory as a link in the filesystem, so it results atomicity in the file system that
> supports that the existence of a message-ID-directory-path is a function of message-ID,
> and usual filesystem guarantees.
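
A sketch of those two functions and the atomic move, in Java; the hashing to hex and the
directory depth of 6 are assumptions of mine, not part of the convention as stated, and the
atomic move only holds within one filesystem:

  import java.nio.charset.StandardCharsets;
  import java.nio.file.*;
  import java.security.MessageDigest;

  // Sketch of message-ID -> message-ID-file-name -> message-ID-directory-path.
  class Bff {
      static String fileName(String messageId) throws Exception {
          byte[] d = MessageDigest.getInstance("SHA-256")
                  .digest(messageId.getBytes(StandardCharsets.UTF_8));
          StringBuilder hex = new StringBuilder();
          for (byte b : d) hex.append(String.format("%02x", b & 0xff));
          return hex.toString();              // file-name-safe, uniformly distributed
      }

      static Path directoryPath(Path root, String fileName) {
          Path p = root;
          for (int i = 0; i < 6; i++)          // /a/b/c/d/e/f/
              p = p.resolve(fileName.substring(i, i + 1));
          return p.resolve(fileName);          // .../abcdef...
      }

      // Ingest by building the message in a scratch directory, then moving it
      // into place atomically, so existence of the path implies a whole message.
      static void ingest(Path scratchDir, Path finalDir) throws Exception {
          Files.createDirectories(finalDir.getParent());
          Files.move(scratchDir, finalDir, StandardCopyOption.ATOMIC_MOVE);
      }
  }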
>
>
>
> About the storage of the files, basically each message is only "header + body". Then,
> when the message is served, then it has appended to its header the message numbers
> according to the group, "header + numbers + body".
>
> So, the idea is to store the header and body compressed with deflate, then that there's
> a pretty simple implementation of a first-class treatment of deflated data, to compute
> the deflated "numbers" on demand, and result that concatenation results "header + numbers
> + body". It's figured that clients would either support deflated, compressed data natively,
> or, that the server would instead decompress data if compression's not supported, then
> figuring that otherwise the data's stored at-rest as compressed. There's an idea that the
> entire backing could be stored partially encrypted also, at-rest, but that would be special-purpose.
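
One way to keep that concatenation well-formed, as a sketch: store each part as its own gzip
member, since a stream of gzip members is itself a valid gzip stream; the above says deflate,
so take the exact framing here as an assumption:

  import java.io.*;
  import java.nio.file.*;
  import java.util.zip.GZIPOutputStream;

  // Sketch: header and body compressed at rest, the per-group "numbers"
  // computed and compressed on demand, the reply served as the plain
  // concatenation of the three compressed parts.
  class CompressedParts {
      static byte[] gzip(byte[] plain) throws IOException {
          ByteArrayOutputStream out = new ByteArrayOutputStream();
          try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
              gz.write(plain);
          }
          return out.toByteArray();
      }

      static void serve(Path headerGz, byte[] numbersPlain, Path bodyGz,
                        OutputStream client) throws IOException {
          client.write(Files.readAllBytes(headerGz));   // already at rest
          client.write(gzip(numbersPlain));             // computed on demand
          client.write(Files.readAllBytes(bodyGz));     // already at rest
      }
  }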
>
> The usual idea that the backing-file-format-convention, is a physical interface for all access,
> and also results that tar'ing that up to a file results a transport file also, and that, simply
> the backing-file-formats can be overlaid or make symlink farms together and such.
>
>
> There's an idea then to make metadata, of, the, message-date, basically to have partitions
> by day, where Jan 1 2020 = Jan 1 1970 + 18262 days,
>
> YYYY/MM/DD/A/B/C/D/E/F/ABCDEFG -> symlink to /A/B/C/D/E/F/ABCDEFG/
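
A small sketch of the day-partition symlink; LocalDate.toEpochDay() gives that day count since
Jan 1 1970 (18262 for Jan 1 2020), and the layout below assumes a POSIX filesystem with symlinks:

  import java.nio.file.*;
  import java.time.LocalDate;

  // Sketch: a per-day partition that symlinks back to the message's canonical directory.
  class DatePartition {
      static void link(Path root, LocalDate date, Path messageDir) throws Exception {
          Path day = root.resolve(String.format("%04d/%02d/%02d",
                  date.getYear(), date.getMonthValue(), date.getDayOfMonth()));
          Files.createDirectories(day);
          Path link = day.resolve(messageDir.getFileName());
          Files.createSymbolicLink(link, messageDir);   // YYYY/MM/DD/... -> /A/B/C/...
      }
  }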
>
>
> This is where, the groups' file, which relate their message-numbers to message-ID's, only
> has the message-numbers, vis-a-vis, browsing by date, in terms of, taking the intersection
> of message-numbers' message-ID's and time-partitions' message-ID's.
>
>
> Above, the idea of the groups file, is that message-ID's have a limit, and that, the groups file,
> would have a fixed-size or fixed-length record, with the index and message-number being the offset,
> and the record being the message-ID, then its header and body accessed as the message-ID-directory-path.
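
A sketch of such a groups file with fixed-length records, where the message-number is just the
record offset; RECORD_LEN = 256 is an assumed bound on message-ID length:

  import java.io.RandomAccessFile;
  import java.nio.charset.StandardCharsets;
  import java.util.Arrays;

  // Sketch: one fixed-length record per message-number, each record holding
  // the message-ID padded to RECORD_LEN bytes.
  class GroupsFile {
      static final int RECORD_LEN = 256;

      static String messageIdAt(RandomAccessFile f, long messageNumber) throws Exception {
          byte[] record = new byte[RECORD_LEN];
          f.seek(messageNumber * RECORD_LEN);   // number -> offset, directly
          f.readFully(record);
          return new String(record, StandardCharsets.UTF_8).trim();
      }

      static void append(RandomAccessFile f, String messageId) throws Exception {
          byte[] record = Arrays.copyOf(
                  messageId.getBytes(StandardCharsets.UTF_8), RECORD_LEN);
          f.seek(f.length());
          f.write(record);
      }
  }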
>
> So, toward working out a BFF convention is to make it possible that file operation tools
> like tar and cp and deflate and other usual command line tools, or facilities, make it so that
> then while there should be a symlink free approach, also then as to how to employ symlinks,
> with regards to usual indexes from axes of access to enumeration.
>
> As above then I'm wondering to figure out how to make it so, that for something like a mailbox format,
> then to have that round-trip from BFF format, but mostly how to make it so that any given collection
> of messages, given each has a unique ID, and according to its headers its groups and injection date,
> it results an automatic sort of building or rebuilding then the groups files.
>
> Another key sort of thing is the threading. Also, there is the multi-post or cross-post to be considered.
>
>
> Then, for metadata, is the idea of basically into supporting the protocol's overview and wildmat,
> then for the affinity with IMAP, then up into usual notions of key-attribute filtering, and as with
> regards to full content search, about a sort of "search file format", or indices, again with the goal
> of that being fungible variously, and constructible according to simple bounds, and, resulting
> that the goal is to reduce the size of the files at rest, figuring mostly the files at rest aren't accessed,
> or when they are, they're just served raw as compressed, because messages once authored are static.
>
> That said, the groups their contents grow over time, and also there is for notions of no-archive
> and retention, basically about how to consider that in those use cases, to employ symlinks,
> which result natural partitions, then to have usual rotation of truncation as deleting a folder,
> invalidating all the symlinks to it, then a usual handler of ignoring broken symlinks, or deleting them,
> so that maintenance is simple along the lines of "rm group" or "rm year".
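
A sketch of that handler, sweeping a partition for symlinks whose targets were removed by an
"rm year" and deleting them:

  import java.nio.file.*;
  import java.util.stream.Stream;

  // Sketch: after "rm year" invalidates a partition, delete the symlinks
  // whose targets no longer exist.
  class SymlinkSweep {
      static void sweep(Path dir) throws Exception {
          try (Stream<Path> paths = Files.walk(dir)) {
              paths.filter(Files::isSymbolicLink)
                   .filter(p -> !Files.exists(p))     // broken: target is gone
                   .forEach(p -> p.toFile().delete());
          }
      }
  }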
>
> So, there's some thinking involved to make it so the messages each, have their own folders,
> and then parts in those, as above, this is the thinking here along the lines of "BFF/SFF",
> then for setting up C10+K servers in front of that for NNTP, IMAP, and a simple service
> mechanism for surfacing HTTP, these kinds of things. Then, the idea is that metadata
> gets accumulated next to the messages in their folders, then those also to be concatenable,
> to result that then for search, that corpuses or corpi are built off those intermediate data,
> for usual searches and specialized searches and these kinds of things.
>
> Then, the idea is to make for this BFF/SFF convention, then to start gathering "certified corpi"
> of groups over time, making for those then being pretty simply distributable like the old
> idea of an mbox mailbox format, with regards to that being one file that results the entire thing.
>
> Then, threads and the message numbers, where threading by message number is the
>
> header + numbers + body
>
> the numbers part, sort of is for open and closed threads, here though of course that threads
> are formally always open, or about composing threads of those as over them being partitioned
> in usual reasonable times, for transient threads and long-winded threads and recurring threads.
>
>
>
> Then, besides "control" and "junk" and such or relating administration, is here for the sort
> of minimal administration that results this NOOBNB curation. This and matters of relay
> ingestion and authoring ingestion and ingestion as concatenation of BFF files,
> is about these kinds of things.




The idea of "NOOBNB curation" seems a reasonable sort of simple-enough
yet full-enough way to start building a usual common open messaging system,
with as well the omission of the overall unwanted and illicit.

The idea of NOOBNB curation, is that it's like "Noob NB: Nota Bene for Noobs",
with splitting New/Old/Off or "NOO" and Bot/Non/Bad or BNB, so that the curation
delivers NOO, or Nu, while the raw includes be-not-back, BNB.

So, the idea for New/Old/Off, is that there is Off traffic, but, "caveat lector",
reader be aware, figuring that people can still client-side "kill file" the curated feed.

Then, Bot/Non/Bad, basically includes that Bot would include System Bot, and Free Bot,
sort of with the idea of that if Bots want feed then they get raw, while System Bot can
post metadata of what's Bot/Non/Bad and it gets simply excluded from the curated.

Then, for this it seems the axis of metadata is the Author, about the relation of Authors
to posts. I.e. it's the principal metadata axis of otherwise posts, individual messages.

Here the idea is that generally, once some author's established as "Old", then
they always go into NOO, as either Old or Off, while "New" is the establishment
of this maturity, to at least follow the charter and otherwise for take-it-or-leave-it.


Then, "Non" is basically that "New", according to Author, basically either gets accepted,
or not, according to what must be some "objective standards of topicality and etiquette".

Then "Bad" is pretty much that anybody who results Bad basically gets marked Bad.

Now, it's a temporal thing, and it's possible that attacks would result false positives
and false negatives, a.k.a. Type I and Type II errors. There's a general idea to attenuate
"Off" and even "Bad", figuring "Off" reverts to "Old" and "Bad" reverts to "Non", according
to Author, or for example "Injection Site".
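
A sketch of those states and moves as a little Java enum; the transition names are mine, while
the states and the directions of the moves are as described above:

  // Sketch of the NOOBNB states: New graduates to Old, Old and Off can
  // vacillate, and attenuation reverts Off toward Old and Bad toward Non,
  // per Author or for example Injection Site.
  enum Reputation {
      NEW, OLD, OFF, BAD, BOT, NON;

      Reputation graduate()  { return this == NEW ? OLD : this; }
      Reputation wander()    { return this == OLD ? OFF : this == OFF ? OLD : this; }
      Reputation attenuate() { return this == OFF ? OLD : this == BAD ? NON : this; }

      boolean curated() { return this == NEW || this == OLD || this == OFF; } // "NOO"
      boolean rawOnly() { return !curated(); }                                // "BNB"
  }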


Then, for the posting side, there are some things involved. There are legal things involved:
illicit content or contraband have some safe-harbor provisions in usual first-world countries,
vis-a-vis, for example, the copyright claim. Responsiveness to copyright claims would basically
be marking spammers of warez as Bad, and not including them in the curated, that being figured
the extent of responsibility.

There's otherwise a usual good-faith expectation of fair-use, intellectual-property wise.


Otherwise then it's that "Usenet the protocol relies on email identity". So, the idea to implement
that posts round-trip through email, is considered the bar.

Here then furthermore is considered how to make a general sort of Injection-Site algorithm,
in terms of peering or peerages, and compeering, as with respect to Sites, their policies, and then
here with respect to the dual outfeeds, curated and raw, figuring curated is good-faith and raw,
includes garbage, or for example to just pipe raw to /dev/null, and for automatically curating in-feed.

The idea is to support establishment of association of an e-mail identity, so that a usual sort
of general-purpose responsible algorithm, can work up various factors authentication, in
the usual notions of authentication AuthN and authorization AuthZ, with respect to
login and "posting allowed", or as via delegation in what's called Federated identity,
that resulting being the responsibility of peers, their hosts, and so on.

Then, about that for humanitarian and free-press sorts of reasons, "anonymity", well, first
off, anonymity is not part of the charter, and indeed the charter says to use
your real name and your real e-mail address. I.e., anonymity on the one hand has a reasonable
sort of misdirection from miscreants attacking anybody, on the other hand those same
sorts of miscreants abuse anonymity, so, here it's basically the idea that "NOOBNB" is a very
brief system of reputation as of the vouched identity of an author by email address,
or the opaque value that results gets posted in the sender field by whatever peer injects whatever.
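
A sketch of one way to get that opaque value: an HMAC of the vouched e-mail address under a
per-site key, stable per author but not exposing the address; the exact scheme is an assumption here:

  import java.nio.charset.StandardCharsets;
  import javax.crypto.Mac;
  import javax.crypto.spec.SecretKeySpec;

  // Sketch: derive an opaque but stable sender value from the vouched
  // e-mail identity, so the sender field carries reputation without
  // exposing the address.
  class OpaqueSender {
      static String token(byte[] siteKey, String email) throws Exception {
          Mac mac = Mac.getInstance("HmacSHA256");
          mac.init(new SecretKeySpec(siteKey, "HmacSHA256"));
          byte[] d = mac.doFinal(email.toLowerCase().getBytes(StandardCharsets.UTF_8));
          StringBuilder hex = new StringBuilder();
          for (byte b : d) hex.append(String.format("%02x", b & 0xff));
          return hex.toString();
      }
  }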

How then to automatically characterize spam and the illicit is sort of a thing,
while that the off-topic but otherwise according to charter including the spirit
of the charter as free press, with anonymity to protect while not anonymity to attack,
these are the kinds of things that help make for that "NOOBNB curation", is to result
a sort of addendum to Usenet charter, that results though same as the old Usenet charter.

Characterization could include for example "MIME banned", "glyph ranges banned",
"subjects banned", "injection sites banned", these being open then so that legitimate
posters run not afoul, that while bad actors could adapt, then they would get funneled
into "automatic semantic text characterization bans".

The idea then is that responsible injection sites will have measures in place to prevent
"Non" authors from becoming "New" authors, those maturing, "Old" and "Off" post freely,
that among "Bot" is "System Bot" and "Tag Bot", then that according to algorithms in
data in the raw Bot feed, is established relations that attenuate to Bad and Non,
so that it's a self-describing sort of data set, and peers pick up either or both.


Then the other key notion is to reflect an ID generator, so that, every post, gets
exactly and uniquely, one ID, identifier, a global and universally unique identifier.
This was addressed as above and it's a usual notion of a common facility, UUID dispenser.
The idea of identifying those over times, is for that over the corpus, is established
a sort of digit-by-digit stamp generator, to check for IDs over the entire corpus,
or here a compact and efficient representation of same, then for issuing ranges,
for usual expectations of the order of sites, on the order of posters, on the order of posts.
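
A sketch of such a dispenser, here just a random UUID scoped by the injecting site's domain;
"usenet.example" is a placeholder, not a real site:

  import java.util.UUID;

  // Sketch: a common facility that dispenses exactly one globally unique
  // identifier per post.
  class MessageIdDispenser {
      private final String site;

      MessageIdDispenser(String site) { this.site = site; }

      String next() {
          return "<" + UUID.randomUUID() + "@" + site + ">";
      }
  }

  // e.g. new MessageIdDispenser("usenet.example").next()
  //      -> "<0b6f2e7c-....@usenet.example>"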

Luckily it's sort of already the case that all the messages already do have unique ID's.

"Usenet: it has a charter."
Ross Finlayson
2024-01-24 02:54:46 UTC
Reply
Permalink
On Monday, January 22, 2024 at 11:09:38 PM UTC-8, Ross Finlayson wrote:
> On Monday, January 22, 2024 at 8:38:45 PM UTC-8, Ross Finlayson wrote:
> > On Friday, December 22, 2023 at 12:36:40 AM UTC-8, Ross Finlayson wrote:
> > > On Saturday, April 29, 2023 at 2:54:26 PM UTC-7, Ross Finlayson wrote:
> > > > On Wednesday, March 8, 2023 at 10:23:04 PM UTC-8, Ross Finlayson wrote:
> > > > > On Wednesday, March 8, 2023 at 8:51:58 PM UTC-8, Ross Finlayson wrote:
> > > > > > On Monday, December 6, 2021 at 7:32:16 AM UTC-8, Ross A. Finlayson wrote:
> > > > > > > On Monday, November 16, 2020 at 5:39:08 PM UTC-8, Ross A. Finlayson wrote:
> > > > > > > > On Monday, November 16, 2020 at 5:00:51 PM UTC-8, Ross A. Finlayson wrote:
> > > > > > > > > On Tuesday, June 30, 2020 at 10:00:52 AM UTC-7, Mostowski Collapse wrote:
> > > > > > > > > > NNTP is not HTTP. I was using bare metal access to
> > > > > > > > > > usenet, not using Google group, via:
> > > > > > > > > >
> > > > > > > > > > news.albasani.net, unfortunately dead since Corona
> > > > > > > > > >
> > > > > > > > > > So was looking for an alternative. And found this
> > > > > > > > > > alternative, which seems fine:
> > > > > > > > > >
> > > > > > > > > > news.solani.org
> > > > > > > > > >
> > > > > > > > > > Have Fun!
> > > > > > > > > >
> > > > > > > > > > P.S.: Technical spec of news.solani.org:
> > > > > > > > > >
> > > > > > > > > > Intel Core i7, 4x 3.4 GHz, 32 GB RAM, 500 GB SSD RAID1
> > > > > > > > > > Intel Core i7, 4x 3.4 GHz, 32 GB RAM, 2 TB HDD RAID1
> > > > > > > > > > Intel Xeon VM, 1.8 GHz, 1 GB RAM, 20 GB
> > > > > > > > > > Location: 2x Falkenstein, 1x New York
> > > > > > > > > >
> > > > > > > > > > advantage of bare metal usenet,
> > > > > > > > > > you see all headers of message.
> > > > > > > > > > On Tuesday, June 30, 2020 at 06:24:53 UTC+2, Ross A. Finlayson wrote:
> > > > > > > > > > > Search you mentioned and for example HTTP is adding the SEARCH verb,
> > > > > > > > > In traffic there are two kinds of usenet users,
> > > > > > > > > viewers and traffic through Google Groups,
> > > > > > > > > and, USENET. (USENET traffic.)
> > > > > > > > >
> > > > > > > > > Here now Google turned on login to view their
> > > > > > > > > Google Groups - effectively closing the Google Groups
> > > > > > > > > without a Google login.
> > > > > > > > >
> > > > > > > > > I suppose if they're used at work or whatever though
> > > > > > > > > they'd be open.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Where I got with the C10K non-blocking I/O for a usenet server,
> > > > > > > > > it scales up though then I think in the runtime is a situation where
> > > > > > > > > it only runs epoll or kqueue that the test scale ups, then at the end
> > > > > > > > > or in sockets there is a drop, or it fell off the driver. I've implemented
> > > > > > > > > the code this far, what has all of NNTP in a file and then the "re-routine,
> > > > > > > > > industry-pattern back-end" in memory, then for that running usually.
> > > > > > > > >
> > > > > > > > > (Cooperative multithreading on top of non-blocking I/O.)
> > > > > > > > >
> > > > > > > > > Implementing the serial queue or "monohydra", or slique,
> > > > > > > > > makes for that then when the parser is constantly parsing,
> > > > > > > > > it seems a usual queue like data structure with parsing
> > > > > > > > > returning its bounds, consuming the queue.
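
A sketch of that queue in Java: reads append to one serial buffer, the parser scans for the
bound of a complete CRLF-terminated line, and consuming compacts the buffer; the size and the
framing are assumptions:

  import java.nio.ByteBuffer;
  import java.nio.charset.StandardCharsets;

  // Sketch of the serial queue idea: parsing returns its bounds, and the
  // parsed bytes are consumed from the queue.
  class Slique {
      private final ByteBuffer buf = ByteBuffer.allocate(4096);

      void append(byte[] data) { buf.put(data); }

      // Returns the next complete CRLF-terminated command, or null if the
      // parser has not yet seen its bound; consumed bytes leave the queue.
      String nextCommand() {
          for (int i = 1; i < buf.position(); i++) {
              if (buf.get(i - 1) == '\r' && buf.get(i) == '\n') {
                  byte[] line = new byte[i + 1];
                  buf.flip();
                  buf.get(line);       // take the parsed bounds...
                  buf.compact();       // ...and consume them from the queue
                  return new String(line, 0, i - 1, StandardCharsets.US_ASCII);
              }
          }
          return null;
      }
  }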
> > > > > > > > >
> > > > > > > > > Having the file buffers all down small on 4K pages,
> > > > > > > > > has that a next usual page size is the megabyte.
> > > > > > > > >
> > > > > > > > > Here though it seems to make sense to have a natural
> > > > > > > > > 4K alignment the file system representation, then that
> > > > > > > > > it is moving files.
> > > > > > > > >
> > > > > > > > > So, then with the new modern Java, that runs in its own
> > > > > > > > > Java server runtime environment, it seems I would also
> > > > > > > > > need to see whether the cloud virt supported the I/O model
> > > > > > > > > or not, or that the cooperative multi-threading for example
> > > > > > > > > would be single-threaded. (Blocking abstractly.)
> > > > > > > > >
> > > > > > > > > Then besides I suppose that could be neatly with basically
> > > > > > > > > the program model, and its file model, being well-defined,
> > > > > > > > > then for NNTP with IMAP organization search and extensions,
> > > > > > > > > those being standardized, seems to make sense for an efficient
> > > > > > > > > news file organization.
> > > > > > > > >
> > > > > > > > > Here then it seems for serving the NNTP, and for example
> > > > > > > > > their file bodies under the storage, with the fixed headers,
> > > > > > > > > variable header or XREF, and the message body, then under
> > > > > > > > > content it's same as storage.
> > > > > > > > >
> > > > > > > > > NNTP has "OVERVIEW" then from it is built search.
> > > > > > > > >
> > > > > > > > > Let's see here then, if I get the load test running, or,
> > > > > > > > > just put a limit under the load while there are no load test
> > > > > > > > > errors, it seems the algorithm then scales under load to be
> > > > > > > > > making usually the algorithm serial in CPU, with: encryption,
> > > > > > > > > and compression (traffic). (Block ciphers instead of serial transfer.)
> > > > > > > > >
> > > > > > > > > Then, the industry pattern with re-routines, has that the
> > > > > > > > > re-routines are naturally co-operative in the blocking,
> > > > > > > > > and in the language, including flow-of-control and exception scope.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > So, I have a high-performance implementation here.
> > > > > > > > It seems like for NFS, then, and having the separate read and write of the client,
> > > > > > > > a default filesystem, is an idea for the system facility: mirroring the mounted file
> > > > > > > > locally, and, providing the read view from that via a different route.
> > > > > > > >
> > > > > > > >
> > > > > > > > A next idea then seems for the organization, the client views themselves
> > > > > > > > organize over the durable and available file system representation, this
> > > > > > > > provides anyone a view over the protocol with a group file convention.
> > > > > > > >
> > > > > > > > I.e., while usual continuous traffic was surfing, individual reads over group
> > > > > > > > files could have independent views, for example collating contents.
> > > > > > > >
> > > > > > > > Then, extracting requests from traffic and threads seems usual.
> > > > > > > >
> > > > > > > > (For example a specialized object transfer view.)
> > > > > > > >
> > > > > > > > Making protocols for implementing internet protocols in groups and
> > > > > > > > so on, here makes for giving usenet example views to content generally.
> > > > > > > >
> > > > > > > > So, I have designed a protocol node and implemented it mostly,
> > > > > > > > then about designed an object transfer protocol, here the idea
> > > > > > > > is how to make it so people can extract data, for example their own
> > > > > > > > data, from a large durable store of all the usenet messages,
> > > > > > > > making views of usenet running on usenet, eg "Feb. 2016: AP's
> > > > > > > > Greatest Hits".
> > > > > > > >
> > > > > > > > Here the point is to figure that usenet, these days, can be operated
> > > > > > > > in cooperation with usenet, and really for its own sake, for leaving
> > > > > > > > messages in usenet and here for usenet protocol stores as there's
> > > > > > > > no reason it's plain text the content, while the protocol supports it.
> > > > > > > >
> > > > > > > > Building personal view for example is a simple matter of very many
> > > > > > > > service providers any of which sells usenet all day for a good deal.
> > > > > > > >
> > > > > > > > Let's see here, $25/MM, storage on the cloud last year for about
> > > > > > > > a million messages for a month is about $25. Outbound traffic is
> > > > > > > > usually the metered cloud traffic, here for example that CDN traffic
> > > > > > > > support the universal share convention, under metering. What that
> > > > > > > > the algorithm is effectively tunable in CPU and RAM, makes for under
> > > > > > > > I/O that it's "unobtrusive" or the cooperative in routine, for CPU, I/O, and
> > > > > > > > RAM, then that there is for seeking that Network Store or Database Time
> > > > > > > > instead effectively becomes File I/O time, as what may be faster,
> > > > > > > > and more durable. There's a faster database time for scaling the ingestion
> > > > > > > > here with that the file view is eventually consistent. (And reliable.)
> > > > > > > >
> > > > > > > > Checking the files would be over time for example with "last checked"
> > > > > > > > and "last dropped" something along the lines of, finding wrong offsets,
> > > > > > > > basically having to make it so that it survives neatly corruption of the
> > > > > > > > store (by being more-or-less stored in-place).
> > > > > > > >
> > > > > > > > Content catalog and such, catalog.
> > > > > > > Then I wonder and figure the re-routine can scale.
> > > > > > >
> > > > > > > Here for the re-routine, the industry factory pattern,
> > > > > > > and the commands in the protocols in the templates,
> > > > > > > and the memory module, with the algorithm interface,
> > > > > > > in the high-performance computer resource, it is here
> > > > > > > that this simple kind of "writing Internet software"
> > > > > > > makes pretty rapidly for adding resources.
> > > > > > >
> > > > > > > Here the design is basically of a file I/O abstraction,
> > > > > > > that the computer reads data files with mmap to get
> > > > > > > their handlers, what results that for I/O map the channels
> > > > > > > result transferring the channels in I/O for what results,
> > > > > > > in mostly the allocated resource requirements generally,
> > > > > > > and for the protocol and algorithm, it results then that
> > > > > > > the industry factory pattern and making for interfaces,
> > > > > > > then also here the I/O routine as what results that this
> > > > > > > is an implementation, of a network server, mostly is making
> > > > > > > for that the re-routine, results very neatly a model of
> > > > > > > parallel cooperation.
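
For the mmap part, a sketch of getting the handle in Java, mapping a backing file read-only so
serving it is mostly the page cache rather than copies:

  import java.nio.MappedByteBuffer;
  import java.nio.channels.FileChannel;
  import java.nio.file.*;

  // Sketch: read a data file with mmap to get its handle for transfer.
  class MappedMessage {
      static MappedByteBuffer map(Path file) throws Exception {
          try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
              return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
          }
      }
  }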
> > > > > > >
> > > > > > > I think computers still have file systems and file I/O but
> > > > > > > in abstraction just because PAGE_SIZE is still relevant for
> > > > > > > the network besides or I/O, if eventually, here is that the
> > > > > > > value types are in the commands and so on, it is besides
> > > > > > > that in terms of the resources so defined it still is in a filesystem
> > > > > > > convention that a remote and unreliable view of it suffices.
> > > > > > >
> > > > > > > Here then the source code also being "this is only 20-50k",
> > > > > > > lines of code, with basically an entire otherwise library stack
> > > > > > > of the runtime itself, only the network and file abstraction,
> > > > > > > this makes for also that modularity results. (Factory Industry
> > > > > > > Pattern Modules.)
> > > > > > >
> > > > > > > For a network server, here, that, mostly it is high performance
> > > > > > > in the sense that this is about the most direct handle on the channels
> > > > > > > and here mostly for the text layer in the I/O order, or protocol layer,
> > > > > > > here is that basically encryption and compression usually in the layer,
> > > > > > > there is besides a usual concern where encryption and compression
> > > > > > > are left out, there is that text in the layer itself is commands.
> > > > > > >
> > > > > > > Then, those being constants under the resources for the protocol,
> > > > > > > it's what results usual protocols like NNTP and HTTP and other protocols
> > > > > > > with usually one server and many clients, here is for that these protocols
> > > > > > > are defined in these modules, mostly there NNTP and IMAP, ..., HTTP.
> > > > > > >
> > > > > > > These are here defined "all Java" or "Pure Java", i.e. let's be clear that
> > > > > > > in terms of the reference abstraction layer, I think computers still use
> > > > > > > the non-blocking I/O and filesystems and network to RAM, so that as
> > > > > > > the I/O is implemented in those it actually has those besides instead for
> > > > > > > example defaulting to byte-per-channel or character I/O. I.e. the usual
> > > > > > > semantics for servicing the I/O in the accepter routine and what makes
> > > > > > > for that the platform also provides a reference encryption implementation,
> > > > > > > if not so relevant for the block encoder chain, besides that for example
> > > > > > > compression has a default implementation, here the I/O model is as simply
> > > > > > > in store for handles, channels, ..., that it results that data especially delivered
> > > > > > > from a constant store can anyways be mostly compressed and encrypted
> > > > > > > already or predigested to serve, here that it's the convention, here is for
> > > > > > > resulting that these client-server protocols, with usually reads > postings
> > > > > > > then here besides "retention", basically here is for what it is.
> > > > > > >
> > > > > > > With the re-routine and the protocol layer besides, having written the
> > > > > > > routines in the re-routine, what there is to write here is this industry
> > > > > > > factory, or a module framework, implementing the re-routines, as they're
> > > > > > > built from the linear description a routine, makes for as the routine progresses
> > > > > > > that it's "in the language" and that more than less in the terms, it makes for
> > > > > > > implementing the case of logic for values, in the logic's flow-of-control's terms.
> > > > > > >
> > > > > > > Then, there is that actually running the software is different than just
> > > > > > > writing it, here in the sense that as a server runtime, it is to be made a
> > > > > > > thing, by giving it a name, and giving it an authority, to exist on the Internet.
> > > > > > >
> > > > > > > There is basically that for BGP and NAT and so on, and, mobile fabric networks,
> > > > > > > IP and TCP/IP, of course IPv4 and IPv6 are the coarse fabric main space, with
> > > > > > > respect to what are CIDR and 24 bits rule and what makes for TCP/IP, here
> > > > > > > entirely the course is using the TCP/IP stack and Java's TCP/IP stack, with
> > > > > > > respect to that TCP/IP is so provided or in terms of process what results
> > > > > > > ports mostly and connection models where it is exactly the TCP after the IP,
> > > > > > > the Transport Control Protocol and Internet Protocol, have here both this
> > > > > > > socket and datagram connection orientation, or stateful and stateless or
> > > > > > > here that in terms of routing it's defined in addresses, under that names
> > > > > > > and routing define sources, routes, destinations, ..., that routine numeric
> > > > > > > IP addresses result in the usual sense of the network being behind an IP
> > > > > > > and including IPv4 network fabric with respect to local routers.
> > > > > > >
> > > > > > > I.e., here to include a service framework is "here besides the routine, let's
> > > > > > > make it clear that in terms of being a durable resource, there needs to be
> > > > > > > some lockbox filled with its sustenance that in some locked or constant
> > > > > > > terms results that for the duration of its outlay, say five years, it is held
> > > > > > > up, then, it will be so again, or, let down to result the carry-over that it
> > > > > > > invested to archive itself, I won't have to care or do anything until then".
> > > > > > >
> > > > > > >
> > > > > > > About the service activation and the idea that, for a port, the routine itself
> > > > > > > needs only run under load, i.e. there is effectively little traffic on the old archives,
> > > > > > > and usually only the some other archive needs any traffic. Here the point is
> > > > > > > that for the Java routine there is the system port that was accepted for the
> > > > > > > request, that inetd or the systemd or means the network service was accessed,
> > > > > > > made for that much as for HTTP the protocol is client-server also for IP the
> > > > > > > protocol is client-server, while the TCP is packets. This is a general idea for
> > > > > > > system integration while here mostly the routine is that being a detail:
> > > > > > > the filesystem or network resource that results that the re-routines basically
> > > > > > > make very large CPU scaling.
> > > > > > >
> > > > > > > Then, it is basically containerized this sense of "at some domain name, there
> > > > > > > is a service, it's HTTP and NNTP and IMAP besides, what cares the world".
> > > > > > >
> > > > > > > I.e. being built on connection oriented protocols like the socket layer,
> > > > > > > HTTP(S) and NNTP(S) and IMAP(S) or with the TLS orientation to certificates,
> > > > > > > it's more than less sensible that most users have no idea of installing some
> > > > > > > NNTP browser or pointing their email to IMAP so that the email browser
> > > > > > > browses the newsgroups and for postings, here this is mostly only talk
> > > > > > > about implementing NNTP then IMAP and HTTP that happens to look like that,
> > > > > > > besides for example SMTP or NNTP posting.
> > > > > > >
> > > > > > > I.e., having "this IMAP server, happens to be this NNTP module", or
> > > > > > > "this HTTP server, happens to be a real simple mailbox these groups",
> > > > > > > makes for having partitions and retentions of those and that basically
> > > > > > > NNTP messages in the protocol can be more or less the same content
> > > > > > > in media, what otherwise is of a usual message type.
> > > > > > >
> > > > > > > Then, the NNTP server-server routine is the propagation of messages
> > > > > > > besides "I shall hire ten great usenet retention accounts and gently
> > > > > > > and politely draw them down and back-fill Usenet, these ten groups".
> > > > > > >
> > > > > > > By then I would have to have made for retention in storage, such contents,
> > > > > > > as have a reference value, then for besides making that independent in
> > > > > > > reference value, just so that it suffices that it basically results "a usable
> > > > > > > durable filesystem that happens you can browse it like usenet". I.e. as
> > > > > > > the pieces to make the backfill are dug up, they get assigned reference numbers
> > > > > > > of their time to make for what here is that in a grand schema of things,
> > > > > > > they have a reference number in numerical order (and what's also the
> > > > > > > server's "message-number" besides its "message-id") as noted above this
> > > > > > > gets into the storage for retention of a file, while, most services for this
> > > > > > > are instead for storage and serving, not necessarily or at all retention.
> > > > > > >
> > > > > > > I.e., the point is that as the groups are retained from retention, there is an
> > > > > > > approach what makes for an orderly archeology, as for what convention
> > > > > > > some data arrives, here that this server-server routine is besides the usual
> > > > > > > routine which is "here are new posts, propagate them", it's "please deliver
> > > > > > > as of a retention scan, and I'll try not to repeat it, what results as orderly
> > > > > > > as possible a proof or exercise of what we'll call afterward entire retention",
> > > > > > > then will be for as of writing a file that "as of the date, from start to finish,
> > > > > > > this site certified these messages as best-effort retention".
> > > > > > >
> > > > > > > It seems then besides there is basically "here is some mbox file, serve it
> > > > > > > like it was an NNTP group or an IMAP mailbox", ingestion, in terms of that
> > > > > > > what is ingestion, is to result for the protocol that "for this protocol,
> > > > > > > there is actually a normative filesystem representation that happens to
> > > > > > > be pretty much also altogether defined by the protocol", the point is
> > > > > > > that ingestion would result in command to remain in the protocol,
> > > > > > > that a usual file type that "presents a usual abstraction, of a filesystem,
> > > > > > > as from the contents of a file", here with the notion of "for all these
> > > > > > > threaded discussions, here this system only cares some approach to
> > > > > > > these ten particular newsgroups that already have mostly their corpus
> > > > > > > though it's not in perhaps their native mbox instead consulted from services".
> > > > > > >
> > > > > > > Then, there's for storing and serving the files, and there is the usual
> > > > > > > notion that moving the data, is to result, that really these file organizations
> > > > > > > are not so large in terms of resources, being "less than gigabytes" or so,
> > > > > > > still there's a notion that as a durable resource they're to be made
> > > > > > > fungible here the networked file approach in the native filesystem,
> > > > > > > then that with respect to it's a backing store, it's to make for that
> > > > > > > the entire enterprise is more or less to made in terms of account,
> > > > > > > that then as a facility on the network then a service in the network,
> > > > > > > it's basically separated the facility and service, while still of course
> > > > > > > that the service is basically defined by its corpus.
> > > > > > >
> > > > > > >
> > > > > > > Then, to make that fungible in a world of account, while with an exit
> > > > > > > strategy so that the operation isn't not abstract, is mostly about the
> > > > > > > domain name, then that what results the networking, after trusted
> > > > > > > network naming and connections for what result routing, and then
> > > > > > > the port, in terms of that there are usual firewalls in ports though that
> > > > > > > besides usually enough client ports are ephemeral, here the point is
> > > > > > > that the protocols and their well-known ports, here it's usually enough
> > > > > > > that the Internet doesn't concern itself so much protocols but with
> > > > > > > respect to proxies, here that for example NNTP and IMAP don't have
> > > > > > > so much anything so related that way after startTLS. For the world of
> > > > > > > account, is basically to have for a domain name, an administrator, and,
> > > > > > > an owner or representative. These are to establish authority for changes
> > > > > > > and also accountability for usage.
> > > > > > >
> > > > > > > Basically they're to be persons and there is a process to get to be an
> > > > > > > administrator of DNS, most always there are services that a usual person
> > > > > > > implementing the system might use, besides for example the numerical.
> > > > > > >
> > > > > > > More relevant though to DNS is getting servers on the network, with respect
> > > > > > > to listening ports and that they connect to clients what so discover them as
> > > > > > > via DNS or configuration, here as above the usual notion that these are
> > > > > > > standard services and run on well-known ports for inetd or systemd.
> > > > > > > I.e. there is basically that running a server and dedicated networking,
> > > > > > > and power and so on, and some notion of the limits of reliability, is then
> > > > > > > as very much in other aspects of the organization of the system, i.e. its name,
> > > > > > > while at the same time, the point that a module makes for that basically
> > > > > > > the provision of a domain name or well-known or ephemeral host, is the
> > > > > > > usual notion that static IP addresses are a limited resource and as about
> > > > > > > the various networks in IPv4 and how they route traffic, is for that these
> > > > > > > services have well-known sections in DNS for at least that the most usual
> > > > > > > configuration is none.
> > > > > > >
> > > > > > > For a usual global reliability and availability, is some notion basically that
> > > > > > > each region and zone has a service available on the IP address, for that
> > > > > > > "hostname" resolves to the IP addresses. As well, in reverse, for the IP
> > > > > > > address and about the hostname, it should resolve reverse to hostname.
> > > > > > >
> > > > > > > About certificates mostly for identification after mapping to port, or
> > > > > > > multi-home Internet routing, here is the point that whether the domain
> > > > > > > name administration is "epochal" or "regular", is that epochs are defined
> > > > > > > by the ports behind the numbers and the domain name system as well,
> > > > > > > where in terms of the registrar, the domain names are epochal to the
> > > > > > > registrar, with respect to owners of domain names.
> > > > > > >
> > > > > > > Then if DNS is a datagram or UDP service is for ICMP as for TCP/IP,
> > > > > > > and also BGP and NAT and routing and what are local and remote
> > > > > > > addresses, here is for not-so-much "implement DNS the protocol
> > > > > > > also while you're at it", rather for what results that there is a durable
> > > > > > > and long-standing and proper doorman, for some usenet.science.
> > > > > > >
> > > > > > > Here then the notion seems to be whether the doorman basically
> > > > > > > knows well-known services, is a multi-homing router, or otherwise
> > > > > > > what is the point that it starts the lean runtime, with respect to that
> > > > > > > it's a container and having enough sense of administration its operation
> > > > > > > as contained. I.e. here given a port and a hostname and always running
> > > > > > > makes for that as long as there is the low (preferable no) idle for services
> > > > > > > running that have no clients, is here also for the cheapest doorman that
> > > > > > > knows how to standup the client sentinel. (And put it back away.)
> > > > > > >
> > > > > > > Probably the most awful thing in the cloud services is the cost for
> > > > > > > data ingress and egress. What that means is that for example using
> > > > > > > a facility that is bound by that as a cost instead of under some constant
> > > > > > > cost, is basically why there is the approach that the containers needs a
> > > > > > > handle to the files, and they're either local files or network files, here
> > > > > > > with the some convention above in archival a shared consistent view
> > > > > > > of all the files, or abstractly consistent, is for making that the doorman
> > > > > > > can handle lots of starting and finishing connections, while it is out of
> > > > > > > the way when usually it's client traffic and opening and closing connections,
> > > > > > > and the usual abstraction is that the client sentinel is never off and doorman
> > > > > > > does nothing, here is for attaching the one to some lower constant cost,
> > > > > > > where for example any long-running cost is more than some low constant cost.
> > > > > > >
> > > > > > > Then, this kind of service is often represented by nodes, in the usual sense
> > > > > > > "here is an abstract container with you hope some native performance under
> > > > > > > the hypervisor where it lives on the farm on its rack, it basically is moved the
> > > > > > > image to wherever it's requested from and lives there, have fun, the meter is on".
> > > > > > > I.e. that's just "this Jar has some config conventions and you can make the
> > > > > > > container associate it and watchdog it with systemd for example and use the
> > > > > > > cgroups while you're at it and make for tempfs quota and also the best network
> > > > > > > file share, which you might be welcome to cache if you care just in the off-chance
> > > > > > > that this file-mapping is free or constant cost as long as it doesn't egress the
> > > > > > > network", is for here about the facilities that work, to get a copy of the system
> > > > > > > what with respect to its usual operation is a piece of the Internet.
> > > > > > >
> > > > > > > For the different reference modules (industry factories) in their patterns then
> > > > > > > and under combined configuration "file + process + network + fare", is that
> > > > > > > the fare of the service basically reflects a daily coin, in the sense that it
> > > > > > > represents an annual or epochal fee, what results for the time there is
> > > > > > > what is otherwise all defined the "file + process + network + name",
> > > > > > > what results it perpetuates in operation more than less simply and automatically.
> > > > > > >
> > > > > > > Then, the point though is to get it to where "I can go to this service, and
> > > > > > > administer it more or less by paying an account, that it thus lives in its
> > > > > > > budget and quota in its metered world".
> > > > > > >
> > > > > > > That though is very involved with identity, that in terms of "I the account
> > > > > > > as provided this sum make this sum paid with respect to an agreement",
> > > > > > > is that authority to make agreements must make that it results that the
> > > > > > > operation of the system, is entirely transparent, and defined in terms of
> > > > > > > the roles and delegation, conventions in operation.
> > > > > > >
> > > > > > > I.e., I personally don't want to administer a copy of usenet, but, it's here
> > > > > > > pretty much sorted out that I can administer one once then that it's to
> > > > > > > administer itself in the following, in terms of it having resources to allocate
> > > > > > > and resources to disburse. Also if nobody's using it it should basically work
> > > > > > > itself out to dial its lights down (while maintaining availability).
> > > > > > >
> > > > > > > Then a point seems "maintain and administer the operation in effect,
> > > > > > > what arrangement sees via delegation, that a card number and a phone
> > > > > > > number and an email account and more than less a responsible entity,
> > > > > > > is so indicated for example in cryptographic identity thus that the operation
> > > > > > > of this system as a service, effectively operates itself out of a kitty,
> > > > > > > what makes for administration and overhead, an entirely transparent
> > > > > > > model of a miniature business the system as a service".
> > > > > > >
> > > > > > > "... and a mailing address and mail service."
> > > > > > >
> > > > > > > Then, for accounts and accounts, for example is the provision of the component
> > > > > > > as simply an image in cloud algorithms, where basically as above here it's configured
> > > > > > > that anybody with any cloud account could basically run it on their own terms,
> > > > > > > there is for here sorting out "after this delegation to some business entity what
> > > > > > > results a corporation in effect, the rest is business-in-a-box and more-than-less
> > > > > > > what makes for its administration in state, is for how it basically limits and replicates
> > > > > > > its service, in terms of its own assets here as what administered is abstractly
> > > > > > > "durable forever mailboxes with private ownership if on public or managed resources".
> > > > > > >
> > > > > > > A usual notion of a private email and usenet service offering and business-in-a-box,
> > > > > > > here what I'm looking at is that besides archiving sci.math and copying out its content
> > > > > > > under author line, is to make such an industry for example here that "once having
> > > > > > > implemented an Internet service, an Internet service of them results Internet".
> > > > > > >
> > > > > > > I.e. here the point is to make a corporation and a foundation in effect, what in terms
> > > > > > > of then about the books and accounts, is about accounts for the business accounts
> > > > > > > that reflect a persistent entity, then what results in terms of computing, networking,
> > > > > > > and internetworking, with a regular notion of "let's never change this arrangement
> > > > > > > but it's in monthly or annual terms", here for that in overall arrangements,
> > > > > > > it results what the entire system more than less runs in ways then to either
> > > > > > > run out its limits or make itself a sponsored effort, about more-or-less a simple
> > > > > > > and responsible and accountable set of operations what effect the business
> > > > > > > (here that in terms of service there is basically the realm of agreement)
> > > > > > > that basically this sort of business-in-a-box model, is then besides itself of
> > > > > > > accounts, toward the notion as pay-as-you-go and "usual credits and their limits".
> > > > > > >
> > > > > > > Then for a news://usenet.science, or for example sci.math.usenet.science,
> > > > > > > is the idea that the entity is "some assemblage what is so that in DNS, and,
> > > > > > > in the accounts payable and receivable, and, in the material matters of
> > > > > > > arrangement and authority for administration, of DNS and resources and
> > > > > > > accounts what result durably persisting the business, is basically for a service
> > > > > > > then of what these are usual enough tasks, as that are interactive workflows
> > > > > > > and for mechanical workflows.
> > > > > > >
> > > > > > > I.e. the point is for having the service than an on/off button and more or less
> > > > > > > what is for a given instance of the operation, what results from some protocol
> > > > > > > that provides a "durable store" of a sort of the business, that at any time basically
> > > > > > > some re-routine or "eventually consistent" continuance of the operation of the
> > > > > > > business, results basically a continuity in its operations, what is entirely granular,
> > > > > > > that here for example the point is to "pick a DNS name, attach an account service,
> > > > > > > go" it so results that in the terms, basically there are the placeholders of the
> > > > > > > interactive workflows in that, and as what in terms are often for example simply
> > > > > > > card and phone number terms, account terms.
> > > > > > >
> > > > > > > I.e. a service to replenish accounts as kitties for making accounts only and
> > > > > > > exactly limited to the one service, its transfers, basically results that there
> > > > > > > is the notion of an email address, a phone number, a credit card's information,
> > > > > > > here a fixed limit debit account that works as of a kitty, there is a regular workflow
> > > > > > > service that will read out the durable stores and according to the timeliness of
> > > > > > > their events, affect the configuration and reconciliation of payments for accounts
> > > > > > > (closed loop scheduling/receiving).
> > > > > > >
> > > > > > > https://datatracker.ietf.org/doc/draft-flanagan-regext-datadictionary/
> > > > > > > https://www.rfc-editor.org/rfc/rfc9022.txt
> > > > > > >
> > > > > > > Basically for dailies, monthlies, and annuals, what make weeklies,
> > > > > > > is this idea of Internet-from-a- account, what is services.
> > > > > > After implementing a store, and the protocol for getting messages, then what seems relevant here in the
> > > > > > context of the SEARCH command, is a fungible file-format, that is derived from the body of the message
> > > > > > in a normal form, that is a data structure that represents an index and catalog and dictionary and summary
> > > > > > of the message, a form of a data structure of a "search index".
> > > > > >
> > > > > > These types files should naturally compose, and result a data structure that according to some normal
> > > > > > forms of search and summary algorithms, result that a data structure results, that makes for efficient
> > > > > > search of sections of the corpus for information retrieval, here that "information retrieval is the science
> > > > > > of search algorithms".
> > > > > >
> > > > > > Now, for what and how people search, or what is the specification of a search, is in terms of queries, say,
> > > > > > here for some brief forms of queries that advise what's definitely included in the search, what's excluded,
> > > > > > then perhaps what's maybe included, or yes/no/maybe, which makes for a predicate that can be built,
> > > > > > that can be applied to results that compose and build for the terms of a filter with yes/no/maybe or
> > > > > > sure/no/yes, with predicates in values.
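
A sketch of that yes/no/maybe predicate over a document's terms, where "yes" terms must all be
present, "no" terms must be absent, and "maybe" terms only affect a simple rank; the exact
semantics are my reading of the above:

  import java.util.Set;

  // Sketch of a yes/no/maybe query as a predicate with values.
  class YesNoMaybe {
      final Set<String> yes, no, maybe;

      YesNoMaybe(Set<String> yes, Set<String> no, Set<String> maybe) {
          this.yes = yes; this.no = no; this.maybe = maybe;
      }

      boolean matches(Set<String> docTerms) {
          return docTerms.containsAll(yes)
              && no.stream().noneMatch(docTerms::contains);
      }

      long rank(Set<String> docTerms) {
          return maybe.stream().filter(docTerms::contains).count();
      }
  }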
> > > > > >
> > > > > > Here there is basically "free text search" and "matching summaries", where text is the text and summary is
> > > > > > a data structure, with attributes as paths the leaves of the tree of which match.
> > > > > >
> > > > > > Then, the message has text, its body, and headers, key-value pairs or collections thereof, where as well
> > > > > > there are default summaries like "a histogram of words by occurrence" or for example default text like "the
> > > > > > MIME body of this message has a default text representation".
> > > > > >
> > > > > > So, the idea developing here is to define what are "normal" forms of data structures that have some "normal"
> > > > > > forms of encoding that result that these "normalizing" after "normative" data structures define well-behaved
> > > > > > algorithms upon them, which provide well-defined bounds in resources that return some quantification of results,
> > > > > > like any/each/every/all, "hits".
> > > > > >
> > > > > > This is where usually enough search engines' or collected search algorithms ("find") usually enough have these
> > > > > > de-facto forms, "under the hood", as it were, to make it first-class that for a given message and body that
> > > > > > there is a normal form of a "catalog summary index" which can be compiled to a constant when the message
> > > > > > is ingested, that then basically any filestore of these messages has alongside it the filestore of the "catsums"
> > > > > > or as on-demand, then that any algorithm has at least well-defined behavior under partitions or collections
> > > > > > or selections of these messages, or items, for various standard algorithms that separate "to find" from
> > > > > > "to serve to find".
> > > > > >
> > > > > > So, ..., what I'm wondering are what would be sufficient normal forms in brief that result that there are
> > > > > > defined for a given corpus of messages, basically at the granularity of messages, how is defined how
> > > > > > there is a normal form for each message its "catsum", that catums have a natural algebra that a
> > > > > > concatenation of catums is a catsum and that some standard algorithms naturally have well-defined
> > > > > > results on their predicates and quantifiers of matching, in serial and parallel, and that the results
> > > > > > combine in serial and parallel.
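
A sketch of that algebra where a catsum is just a term-to-count map and concatenation is
pointwise addition; since that addition is associative and commutative, serial and parallel
combination agree:

  import java.util.Map;
  import java.util.concurrent.ConcurrentHashMap;

  // Sketch: concatenation of catsums is a catsum, here pointwise addition of counts.
  class Catsum {
      final Map<String, Long> counts = new ConcurrentHashMap<>();

      Catsum add(Catsum other) {
          Catsum out = new Catsum();
          out.counts.putAll(this.counts);
          other.counts.forEach((term, n) -> out.counts.merge(term, n, Long::sum));
          return out;
      }
  }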
> > > > > >
> > > > > > The results should be applicable to any kind of data but here it's more or less about usenet groups.
> > > > > So I start browsing the Information Retrieval section in Wikipedia and more or less get to reading
> > > > > Luhn's 1958 "automatic coding of document summaries" or "The Automatic Creation of Literature
> > > > > Abstracts". Then, what I figure, is that the histogram, is an associative array of keys to counts,
> > > > > and what I figure is to compute both the common terms, and, the rare terms, so that there's both
> > > > > "common-weight" and "rare-weight" computed, off of the count of the terms, and the count of
> > > > > distinct terms, where it is working up that besides catums, or catsums, it would result a relational
> > > > > algebra of terms in, ..., terms, of counts and densities and these type things. This is where, first I
> > > > > would figure the catsum would be deterministic before it's at all probabilistic, because the goal is
> > > > > match-find not match-guess, while still it's to support the less deterministic but more opportunistic
> > > > > at the same time.
> > > > >
> > > > > Then, the "index" is basically like a usual book's index, for each term that's not a common term in
> > > > > the language but is a common term in the book, what page it's on, here that that is a read-out of
> > > > > a histogram of the terms to pages. Then, compound terms, basically get into grammar, and in terms
> > > > > of terms, I don't so much care to parse glossolalia as what result mostly well-defined compound terms
> > > > > in usual natural languages, for the utility of a dictionary and technical dictionaries. Here "pages" are
> > > > > both according to common message threads, and also the surround of messages in the same time
> > > > > period, where a group is a common message thread and a usenet is a common message thread.
> > > > >
> > > > > (I've had a copy of "the information retrieval book" before, also borrowed one "data logic".)
> > > > >
> > > > > "Spelling mistakes considered adversarial."
> > > > >
> > > > > https://en.wikipedia.org/wiki/Subject_indexing#Indexing_theory
> > > > >
> > > > > Then, there's lots to be said for "summary" and "summary in statistic".
> > > > >
> > > > >
> > > > > A first usual data structure for efficiency is the binary tree or bounding tree. Then, there's
> > > > > also what makes for divide-and-conquer or linear speedup.
> > > > >
> > > > >
> > > > > About the same time as Luhn's monograph or 1956, there was published a little book
> > > > > called "Logic and Language", Huppe and Kaminsky. It details how according to linguistics
> > > > > there are certain usual regular patterns of words after phonemes and morphology what
> > > > > result then for stems and etymology that then for vocabulary that grammar or natural
> > > > > language results above. Then there are also gentle introductions to logic. It's very readable
> > > > > and quite brief.
> > > > I haven't much been tapping away at this,
> > > > but it's pretty simple to stand up a usenet peer,
> > > > and pretty simple to slurp a copy,
> > > > of the "Big 8" usenet text groups, for example,
> > > > or particularly just for a few.
> > > Well, I've been thinking about this, and there are some ideas.
> > >
> > > One is about a system of reputation, the idea being New/Old/Off/Bad/Bot/Non,
> > > basically figuring that reputation is established by action.
> > >
> > > Figuring how to categorize spam, UCE, vice, crime, and call that Bad, then
> > > gets into basically two editions, with a common backing, Cur (curated) and Raw,
> > > with Old and New in curated, and Off and Bot a filter off that, and Bad and Non
> > > excluded, though in the raw feed. Then there's only to forward what's curated,
> > > or current.
> > >
> > > Here the idea is that New graduates to Old, Non might be a false-negative New,
> > > but is probably a negative Bad or Off, and then Bot is a sort of honor system, and
> > > Old might wander to Off and vice-versa, then that Off and Old can vacillate.
> > >
> > > Then for renditions, is basically that the idea is that it's the same content
> > > behind NNTP, with IMAP, then also an HTTP gateway, Atom/RDF feed, ....
> > >
> > > (It's pretty usually text-only but here is MIME.)
> > >
> > > There are various ways to make for posting that's basically for that Old
> > > can post what they want, and Off, then for something like that New,
> > > gets an email in reply to their post, that they reply to that, to round-trip a post.
> > >
> > > (Also mail-to-news and news-to-mail are pretty usual. Also there are
> > > notions of humanitarian inputs.)
> > >
> > > Similarly there are the notions above about using certificates and TLS to
> > > use technology and protocol to solve technology protocol abuse problems.
> > >
> > > For surfacing the items then is about technologies like robots.txt and
> > > Dublin Core metadata, and similar notions with respect to uniqueness.
> > > If you have other ideas about this, please chime in.
> > >
> > > Then for having a couple sorts of organizations of both the domain name
> > > and the URL's as resources, makes for example for sub-domains for groups,
> > > for example then with certificate conventions in that, then usual sorts of
> > > URL's that are, you know, URL's, and URN's, then, about URL's, URI's, and URN's.
> > >
> > > Luckily it's all quite standardized so quite stock NNTP, IMAP, and HTTP browsers,
> > > and about SMTP and IMAP, and with TLS, make of course a fungible sort of system.
> > >
> > >
> > > How to pay for it all? At about $500 a year for all text usenet,
> > > about a day's golf foursome and a few beers can stand up a new Usenet peer.
> > Basically thinking about a "backing file format convention".
> >
> > The message ID's are universally unique. File-systems support various counts and depths
> > of sub-directories. The message ID's aren't necessarily opaque structurally as file-names.
> > So, the first thing is a function that given a message-ID, results a message-ID-file-name.
> >
> > Then, as it's figured that groups, are, separable, is about how, to, either have all the
> > messages in one store, or, split it out by groups. Either way the idea is to convert the
> > message-ID-file-name, to a given depth of directories, also legal in file names, so it
> > results that the messages get uniformly distributed in sub-directories of approximately
> > equal count and depth.
> >
> > A....D...G <- message-ID
> >
> > ABCDEFG <- message-ID-file-name
> >
> > /A/B/C/D/E/F/ABCDEFG <- message-ID-directory-path
> >
> > So, the idea is that the backing file format convention, basically results uniform lookup
> > of a file's existence, then about ingestion and constructing a message, then, moving
> > that directory as a link in the filesystem, so it results atomicity in the file system that
> > supports that the existence of a message-ID-directory-path is a function of message-ID,
> > and usual filesystem guarantees.
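
To make the above concrete, here's a minimal sketch in Python of the message-ID
to directory-path mapping; the hash-based escaping and the depth of six
directories are illustrative assumptions, not part of the convention.

import hashlib
import os

DEPTH = 6  # illustrative: number of single-character directory levels

def message_id_file_name(message_id):
    # Make the name filesystem-legal and collision-resistant; hashing is one
    # easy way, an escaping scheme preserving the original ID would also work.
    return hashlib.sha256(message_id.encode("utf-8")).hexdigest()

def message_id_directory_path(root, message_id):
    # Fan out into DEPTH single-character subdirectories, as in
    # /A/B/C/D/E/F/ABCDEFG, so messages distribute about uniformly.
    name = message_id_file_name(message_id)
    return os.path.join(root, *name[:DEPTH], name)

def ingest(root, staged_dir, message_id):
    # Build the message directory elsewhere, then rename it into place;
    # rename is atomic on one filesystem, so the path's existence is a
    # yes/no function of the message-ID.
    dest = message_id_directory_path(root, message_id)
    if os.path.exists(dest):
        return False  # duplicate, already have it
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    os.rename(staged_dir, dest)
    return True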
> >
> >
> >
> > About the storage of the files, basically each message is only "header + body". Then,
> > when the message is served, then it has appended to its header the message numbers
> > according to the group, "header + numbers + body".
> >
> > So, the idea is to store the header and body compressed with deflate, then that there's
> > a pretty simple implementation of a first-class treatment of deflated data, to compute
> > the deflated "numbers" on demand, and result that concatenation results "header + numbers
> > + body". It's figured that clients would either support deflated, compressed data natively,
> > or, that the server would instead decompress data if compression's not supported, then
> > figuring that otherwise the data's stored at-rest as compressed. There's an idea that the
> > entire backing could be stored partially encrypted also, at-rest, but that would be special-purpose.
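
A small sketch of that at-rest compression, with the caveat that it substitutes
gzip members for raw deflate, since the gzip format explicitly allows members
to be concatenated into one stream; the function names are just illustrative.

import gzip
import io

def store_parts(header, body):
    # Compress header and body once, at rest, as independent gzip members.
    return gzip.compress(header), gzip.compress(body)

def serve(header_gz, body_gz, numbers):
    # Compress only the small, per-group "numbers" on demand and concatenate,
    # so the wire form is "header + numbers + body" in one valid stream.
    return header_gz + gzip.compress(numbers) + body_gz

def decompress_stream(blob):
    # What a client (or the server, for clients without compression support)
    # would do: read all members back out as the whole message.
    return gzip.GzipFile(fileobj=io.BytesIO(blob)).read()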
> >
> > The usual idea that the backing-file-format-convention, is a physical interface for all access,
> > and also results that tar'ing that up to a file results a transport file also, and that, simply
> > the backing-file-formats can be overlaid or make symlinks farms together and such.
> >
> >
> > There's an idea then to make metadata, of, the, message-date, basically to have partitions
> > by day, where Jan 1 2020 = Jan 1 1970 + 18262 days,
> >
> > YYYY/MM/DD/A/B/C/D/E/F/ABCDEFG -> symlink to /A/B/C/D/E/F/ABCDEFG/
> >
> >
> > This is where, the groups' file, which relate their message-numbers to message-ID's, only
> > has the message-numbers, vis-a-vis, browsing by date, in terms of, taking the intersection
> > of message-numbers' message-ID's and time-partitions' message-ID's.
> >
> >
> > Above, the idea of the groups file, is that message-ID's have a limit, and that, the groups file,
> > would have a fixed-size or fixed-length record, with the index and message-number being the offset,
> > and the record being the message-ID, then its header and body accessed as the message-ID-directory-path.
> >
> > So, toward working out a BFF convention is to make it possible that file operation tools
> > like tar and cp and deflate and other usual command line tools, or facilities, make it so that
> > then while there should be a symlink free approach, also then as to how to employ symlinks,
> > with regards to usual indexes from axes of access to enumeration.
> >
> > As above then I'm wondering to figure out how to make it so, that for something like a mailbox format,
> > then to have that round-trip from BFF format, but mostly how to make it so that any given collection
> > of messages, given each has a unique ID, and according to its headers its groups and injection date,
> > it results an automatic sort of building or rebuilding then the groups files.
> >
> > Another key sort of thing is the threading. Also, there is to be considered the multi-post or cross-post.
> >
> >
> > Then, for metadata, is the idea of basically into supporting the protocol's overview and wildmat,
> > then for the affinity with IMAP, then up into usual notions of key-attribute filtering, and as with
> > regards to full content search, about a sort of "search file format", or indices, again with the goal
> > of that being fungible variously, and constructible according to simple bounds, and, resulting
> > that the goal is to reduce the size of the files at rest, figuring mostly the files at rest aren't accessed,
> > or when they are, they're just served raw as compressed, because messages once authored are static.
> >
> > That said, the groups their contents grow over time, and also there is for notions of no-archive
> > and retention, basically about how to consider that in those use cases, to employ symlinks,
> > which result natural partitions, then to have usual rotation of truncation as deleting a folder,
> > invalidating all the symlinks to it, then a usual handler of ignoring broken symlinks, or deleting them,
> > so that maintenance is simple along the lines of "rm group" or "rm year".
> >
> > So, there's some thinking involved to make it so the messages each, have their own folders,
> > and then parts in those, as above, this is the thinking here along the lines of "BFF/SFF",
> > then for setting up C10+K servers in front of that for NNTP, IMAP, and a simple service
> > mechanism for surfacing HTTP, these kinds of things. Then, the idea is that metadata
> > gets accumulated next to the messages in their folders, then those also to be concatenable,
> > to result that then for search, that corpuses or corpora are built off those intermediate data,
> > for usual searches and specialized searches and these kinds of things.
> >
> > Then, the idea is to make for this BFF/SFF convention, then to start gathering "certified corpi"
> > of groups over time, making for those then being pretty simply distributable like the old
> > idea of an mbox mailbox format, with regards to that being one file that results the entire thing.
> >
> > Then, threads and the message numbers, where threading by message number is the
> >
> > header + numbers + body
> >
> > the numbers part, sort of is for open and closed threads, here though of course that threads
> > are formally always open, or about composing threads of those as over them being partitioned
> > in usual reasonable times, for transient threads and long-winded threads and recurring threads.
> >
> >
> >
> > Then, besides "control" and "junk" and such or relating administration, is here for the sort
> > of minimal administration that results this NOOBNB curation. This and matters of relay
> > ingestion and authoring ingestion and ingestion as concatenation of BFF files,
> > is about these kinds of things.
> The idea of "NOOBNB curation" seems a reasonable sort of simple-enough
> yet full-enough way to start building a usual common open messaging system,
> with as well the omission of the overall un-wanted and illicit.
>
> The idea of NOOBNB curation, is that it's like "Noob NB: Nota Bene for Noobs",
> with splitting New/Old/Off or "NOO" and Bot/Non/Bad or BNB, so that the curation
> delivers NOO, or Nu, while the raw includes be-not-back, BNB.
>
> So, the idea for New/Old/Off, is that there is Off traffic, but, "caveat lector",
> reader be aware, figuring that people can still client-side "kill file" the curated feed.
>
> Then, Bot/Non/Bad, basically includes that Bot would include System Bot, and Free Bot,
> sort of with the idea of that if Bots want feed then they get raw, while System Bot can
> post metadata of what's Bot/Non/Bad and it gets simply excluded from the curated.
>
> Then, for this it seems the axis of metadata is the Author, about the relation of Authors
> to posts. I.e. it's the principal metadata axis of otherwise posts, individual messages.
>
> Here the idea is that generally that once some author's established as "Old", then
> they always go into NOO, as either Old or Off, while "New" is the establishment
> of this maturity, to at least follow the charter and otherwise for take-it-or-leave-it.
>
>
> Then, "Non" is basically that "New", according to Author, basically either gets accepted,
> or not, according to what must be some "objective standards of topicality and etiquette".
>
> Then "Bad" is pretty much that anybody who results Bad basically gets marked Bad.
>
> Now, it's a temporal thing, and it's possible that attacks would result false positives
> and false negatives, a.k.a. Type I and Type II errors. There's a general idea to attenuate
> "Off" and even "Bad", figuring "Off" reverts to "Old" and "Bad" reverts to "Non", according
> to Author, or for example "Injection Site".
>
>
> Then, for the posting side, there are some things involved. There are legal things involved,
> illicit content or contraband, have some safe harbor provisions in usual first-world countries,
> vis-a-vis, for example, the copyright claim. Responsiveness to copyright claims, would basically
> be marking spammers of warez as Bad, and not including them in the curated, that being figured
> the extent of responsibility.
>
> There's otherwise a usual good-faith expectation of fair-use, intellectual-property wise.
>
>
> Otherwise then it's that "Usenet the protocol relies on email identity". So, the idea to implement
> that posts round-trip through email, is considered the bar.
>
> Here then furthermore is considered how to make a general sort of Injection-Site algorithm,
> in terms of peering or peerages, and compeering, as with respect to Sites, their policies, and then
> here with respect to the dual outfeeds, curated and raw, figuring curated is good-faith and raw,
> includes garbage, or for example to just pipe raw to /dev/null, and for automatically curating in-feed.
>
> The idea is to support establishment of association of an e-mail identity, so that a usual sort
> of general-purpose responsible algorithm, can work up various factors authentication, in
> the usual notions of authentication AuthN and authorization AuthZ, with respect to
> login and "posting allowed", or as via delegation in what's called Federated identity,
> that resulting being the responsibility of peers, their hosts, and so on.
>
> Then, about that for humanitarian and free-press sorts reasons, "anonymity", well first
> off there's anonymity is not part of the charter, and indeed the charter says to use
> your real name and your real e-mail address. I.e., anonymity on the one hand has a reasonable
> sort of misdirection from miscreants attacking anybody, on the other hand those same
> sorts of miscreants abuse anonymity, so, here it's basically the idea that "NOOBNB" is a very
> brief system of reputation as of the vouched identity of an author by email address,
> or the opaque value that results gets posted in the sender field by whatever peer injects whatever.
>
> How then to automatically characterize spam and the illicit is sort of a thing,
> while that the off-topic but otherwise according to charter including the spirit
> of the charter as free press, with anonymity to protect while not anonymity to attack,
> these are the kinds of things that help make for that "NOOBNB curation", is to result
> a sort of addendum to Usenet charter, that results though same as the old Usenet charter.
>
> Characterization could include for example "MIME banned", "glyph ranges banned",
> "subjects banned", "injection sites banned", these being open then so that legitimate
> posters run not afoul, that while bad actors could adapt, then they would get funneled
> into "automatic semantic text characterization bans".
>
> The idea then is that responsible injection sites will have measures in place to prevent
> "Non" authors from becoming "New" authors, those maturing, "Old" and "Off" post freely,
> that among "Bot" is "System Bot" and "Tag Bot", then that according to algorithms in
> data in the raw Bot feed, is established relations that attenuate to Bad and Non,
> so that it's a self-describing sort of data set, and peers pick up either or both.
>
>
> Then the other key notion is to reflect an ID generator, so that, every post, gets
> exactly and uniquely, one ID, identifier, a global and universally unique identifier.
> This was addressed as above and it's a usual notion of a common facility, UUID dispenser.
> The idea of identifying those over times, is for that over the corpus, is established
> a sort of digit-by-digit stamp generator, to check for IDs over the entire corpus,
> or here a compact and efficient representation of same, then for issuing ranges,
> for usual expectations of the order of sites, the order of posters, and the order of posts.
>
> Luckily it's sort of already the case that all the messages already do have unique ID's.
>
> "Usenet: it has a charter."


About build-time and run-time, here the idea is to make some specifications
that reflect the BFF/SFF filesystem and file-format conventions, then to
make it so that algorithms and servers run on those, as then with respect
to reference implementations, and specification conformance, of the client
protocols, and the server and service protocols, which are all pretty much
standardized, inside and outside, as usual sorts of Internet text protocols
and usual sorts of data facilities.

I figure the usual sort of milieu these days for common, open systems,
is something like "Git Web", or otherwise in terms of git hosting,
in terms of the idea that setting up a git server makes it
pretty simple to clone code and so on. I'm most familiar with this
tooling compared to RCS, CVS, svn, hg, tla, arch, or other sorts of usual
"source control" systems. Most people might know: git.


So, the idea is to make reference implementations in various editions of tooling,
that result the establishment of the common backing, this filesystem convention
or BFF the backing file-format, best friends forever, then basically about making
for there being cataloged archives of groups and their messages in time-series data,
then to simply start a Usenet archive by concatenating those together as overlaying
them, then as to generating the article numbers, where the article numbers are
specific to the installation, where there are globally unique IDs, the message-IDs,
then article numbers indicate the server's handles to messages by group.
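
As a sketch of that numbering, assuming a flat list of (message-ID, injection
date, groups) read out of BFF archives; the file name and layout here are
placeholders, not part of the convention.

def rebuild_group_file(messages, group, path):
    # messages: iterable of (message_id, injection_date, set_of_groups).
    # Append the group's message-IDs in injection-date order; the line's
    # index in this file is this installation's article number for it.
    selected = sorted((m for m in messages if group in m[2]), key=lambda m: m[1])
    with open(path, "a", encoding="utf-8") as f:
        for message_id, _date, _groups in selected:
            f.write(message_id + "\n")
    return len(selected)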

The sources of reference implementations of services and algorithms go in source
control, but the archives, fungibly in BFF files, represent static assets,
where a given corpus of a month's messages basically represents the entirety,
or what "25 million messages" is, vis-a-vis low-volume groups like Big 8 text
Usenet, and here curated and raw feeds after NOOBNB.

So, there's a general idea to surface the archive files, those being fungible anyways,
then some bootstrap scripts in terms of data-install and code-install, for config/code/data,
so that anybody can rent a node, clone these scripts, download a year's Usenet,
run some scripts to set up SFF files, then launch a Usenet service.

So, that is about common sources and provisioning of code and data.

The compeering then is the other idea, about the usual notions of pull and push feeds,
and suck feeds, where NNTP is mostly push feeds, and compeers are expected to
be online and accept CHECK, IHAVE, and TAKETHIS, and these kinds of use-cases of
ingestion, of the propagation of posts.
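
As a minimal sketch, here's offering one article by IHAVE over a plain socket,
with the RFC 3977 response codes in the comments; the host, port, and article
bytes are placeholders, and a real compeer would use MODE STREAM with
CHECK/TAKETHIS (RFC 4644), plus TLS and authentication.

import socket

def offer_article(host, port, message_id, article):
    # 335 = send it, 435 = not wanted, 436 = try again later;
    # after the article: 235 accepted, 436 retry, 437 rejected.
    with socket.create_connection((host, port)) as s:
        f = s.makefile("rwb")
        f.readline()  # 200/201 greeting
        f.write(("IHAVE %s\r\n" % message_id).encode()); f.flush()
        reply = f.readline().decode().strip()
        if not reply.startswith("335"):
            return reply  # peer declined the offer
        if not article.endswith(b"\r\n"):
            article += b"\r\n"
        # dot-stuff lines that begin with "." and terminate with ".\r\n"
        f.write(article.replace(b"\r\n.", b"\r\n..") + b".\r\n"); f.flush()
        return f.readline().decode().strip()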

There's a notion of a sort of compeering topology, basically in terms of "the lot of us
will hire each some introductory resources, and use them up, passing around the routing
according to DNS, what serves making ingress and egress, from a named Internet protocol port".

https://datatracker.ietf.org/doc/html/rfc3977
https://datatracker.ietf.org/doc/html/rfc4644


(Looking at WILDMAT, it's cool that it's a sort of yes/no/maybe or sure/no/yes, which
is a very composable sort of filtering. I sort of invented one of those for rich front-end
data tables since looking at the specs here, "filterPredicate", composable, front-end/back-end,
yes/no/maybe.)
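
For instance, a sketch of that composable yes/no/maybe, using fnmatch as an
approximation of wildmat pattern syntax; treating "no pattern matched" as
maybe (None) is my reading of the composition, not wording from the RFC.

from fnmatch import fnmatchcase

def wildmat(expr, name):
    # Patterns are tried left to right and the last matching one wins:
    # a plain pattern votes yes (True), a "!pattern" votes no (False);
    # if nothing matches, return None ("maybe"), so filters compose.
    verdict = None
    for pat in expr.split(","):
        pat = pat.strip()
        negate = pat.startswith("!")
        if negate:
            pat = pat[1:]
        if fnmatchcase(name, pat):
            verdict = not negate
    return verdict

# e.g. wildmat("sci.*,!sci.math.*", "sci.logic")    -> True
#      wildmat("sci.*,!sci.math.*", "sci.math.foo") -> False
#      wildmat("sci.*,!sci.math.*", "comp.lang.c")  -> None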

I.e., NNTP has a static (network) topology, expecting peers to be online usually, while here
the idea is that "compeering", will include push and pull, about the "X-RETRANSFER-TO",
and along the lines of the Message Transfer Agent, queuing messages for opportunistic
delivery, and in-line with the notions of e-mail traditionally and the functions of DNS and
the Internet protocols.

https://datatracker.ietf.org/doc/html/rfc4642
https://datatracker.ietf.org/doc/html/rfc1036
https://datatracker.ietf.org/doc/html/rfc2980
https://datatracker.ietf.org/doc/html/rfc4644
https://datatracker.ietf.org/doc/html/rfc4643

This idea of compeering sort of results that as peers come online, then to start
in the time-series data of the last transmission, then to launch a push feed
up to currency. It's similar with that simply being periodic in real-time (clock time),
or message-driven, pushing messages as they arrive.
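
One way to sketch that catch-up, assuming the date-partition symlinks floated
above (YYYY/MM/DD directories of message-ID symlinks) and a stored per-peer
high-water date; the paths and layout are placeholders.

import os
from datetime import date, timedelta

def pending_since(partition_root, last_sent):
    # Yield message-ID-file-names from partitions newer than the peer's
    # high-water date, oldest first, so the push feed can run up to currency.
    day = last_sent + timedelta(days=1)
    while day <= date.today():
        part = os.path.join(partition_root, day.strftime("%Y/%m/%d"))
        if os.path.isdir(part):
            for root, dirs, _files in os.walk(part):
                for d in dirs:
                    if os.path.islink(os.path.join(root, d)):
                        yield d  # leaf symlink named by message-ID-file-name
        day += timedelta(days=1)

Each pending ID then gets offered with CHECK/TAKETHIS, and the stored
high-water date only advances once the peer acknowledges.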

The message feeds, in-feeds and out-feeds, reflect sorts of system accounts
or peering agreements, then for the compeering to establish what are the
topologies, then for something like a message transfer agent, to fill a basket
with the contents, for caches or a sort of lock-box approach, as well aligned
with SMTP, POP3, IMAP, and other Internet text protocols of messaging.

The idea is to implement some use-cases of compeering, with e-mail,
news2mail and mail2news, as the Internet protocols have high affinity
for each other, and are widely implemented.

So, besides the runtime (code and data, config), then is also involved the infrastructure,
resources of the runtime and resources of the networking. It's pretty simple to write
code and not very difficult to get data, then infrastructure gets into cost. This was
described above as the idea of "business-in-a-box".

Well, tapping away at this, ....
Mild Shock
2024-01-24 12:46:07 UTC
Reply
Permalink
Ha Ha, Rossy Boy will be the first moron not able to post
anymore when they shut down Google Groups posting channel.
The rubbish he posts here is just copy pasta nonsense from
a cheap Archimedes Plutonium copy.

Ross Finlayson schrieb am Mittwoch, 24. Januar 2024 um 03:54:52 UTC+1:

> Well, tapping away at this, ....
Mild Shock
2024-01-24 12:51:03 UTC
Reply
Permalink
How do I know, well his header contains:

Injection-Info: google-groups.googlegroups.com;
posting-host=97.126.97.251; posting-account=WH2DoQoAAADZe3cdQWvJ9HKImeLRniYW

Means he is even using google groups right now.
From his cabin in the woods:

$ whois 97.126.97.251

OrgName: CenturyLink Communications, LLC
OrgId: CCL-534
Address: 100 CENTURYLINK DR
City: Monroe
StateProv: LA
PostalCode: 71201

Mild Shock schrieb:
>
> Ha Ha, Rossy Boy will be the first moron not able to post
> anymore when they shut down Google Groups posting channel.
> The rubbish he posts here is just copy pasta nonsense from
> a cheap Archimedes Plutonium copy.
>
> Ross Finlayson schrieb am Mittwoch, 24. Januar 2024 um 03:54:52 UTC+1:
>
>> Well, tapping away at this, ....
Alan Mackenzie
2024-01-24 19:41:15 UTC
Reply
Permalink
Ross Finlayson <***@gmail.com> wrote:

[ .... ]

> Basically thinking about a "backing file format convention".

> The message ID's are universally unique. File-systems support various
> counts and depths of sub-directories. The message ID's aren't
> necessarily opaque structurally as file-names. So, the first thing is
> a function that given a message-ID, results a message-ID-file-name.

> Then, as it's figured that groups, are, separable, is about how, to,
> either have all the messages in one store, or, split it out by groups.
> Either way the idea is to convert the message-ID-file-name, to a given
> depth of directories, also legal in file names, so it results that the
> message's get uniformly distributed in sub-directories of approximately
> equal count and depth.

> A....D...G <- message-ID

> ABCDEFG <- message-ID-file-name

> /A/B/C/D/E/F/ABCDEFG <- message-ID-directory-path

> So, the idea is that the backing file format convention, basically
> results uniform lookup of a file's existence, then about ingestion and
> constructing a message, then, moving that directory as a link in the
> filesystem, so it results atomicity in the file system that supports
> that the existence of a message-ID-directory-path is a function of
> message-ID, and usual filesystem guarantees.

[ .... ]

> Then, threads and the message numbers, where threading by message
> number is the

> header + numbers + body

> the numbers part, sort of is for open and closed threads, here though
> of course that threads are formally always open, or about composing
> threads of those as over them being partitioned in usual reasonable
> times, for transient threads and long-winded threads and recurring
> threads.

> Then, besides "control" and "junk" and such or relating administration,
> is here for the sort of minimal administration that results this NOOBNB
> curation. This and matters of relay ingestion and authoring ingestion
> and ingestion as concatenation of BFF files, is about these kinds of
> things.

What you've described is known as a news server, and several well
established ones exist. You're trying to reinvent the wheel.

--
Alan Mackenzie (Nuremberg, Germany).
Ross Finlayson
2024-01-25 01:21:02 UTC
Reply
Permalink
On Wednesday, January 24, 2024 at 11:41:22 AM UTC-8, Alan Mackenzie wrote:
> Ross Finlayson <***@gmail.com> wrote:
>
> [ .... ]
> > Basically thinking about a "backing file format convention".
>
> > The message ID's are universally unique. File-systems support various
> > counts and depths of sub-directories. The message ID's aren't
> > necessarily opaque structurally as file-names. So, the first thing is
> > a function that given a message-ID, results a message-ID-file-name.
>
> > Then, as it's figured that groups, are, separable, is about how, to,
> > either have all the messages in one store, or, split it out by groups.
> > Either way the idea is to convert the message-ID-file-name, to a given
> > depth of directories, also legal in file names, so it results that the
> > message's get uniformly distributed in sub-directories of approximately
> > equal count and depth.
>
> > A....D...G <- message-ID
>
> > ABCDEFG <- message-ID-file-name
>
> > /A/B/C/D/E/F/ABCDEFG <- message-ID-directory-path
>
> > So, the idea is that the backing file format convention, basically
> > results uniform lookup of a file's existence, then about ingestion and
> > constructing a message, then, moving that directory as a link in the
> > filesystem, so it results atomicity in the file system that supports
> > that the existence of a message-ID-directory-path is a function of
> > message-ID, and usual filesystem guarantees.
> [ .... ]
> > Then, threads and the message numbers, where threading by message
> > number is the
>
> > header + numbers + body
>
> > the numbers part, sort of is for open and closed threads, here though
> > of course that threads are formally always open, or about composing
> > threads of those as over them being partitioned in usual reasonable
> > times, for transient threads and long-winded threads and recurring
> > threads.
>
> > Then, besides "control" and "junk" and such or relating administration,
> > is here for the sort of minimal administration that results this NOOBNB
> > curation. This and matters of relay ingestion and authoring ingestion
> > and ingestion as concatenation of BFF files, is about these kinds of
> > things.
> What you've described is known as a news server, and several well
> established ones exist. You're trying to reinvent the wheel.
>
> --
> Alan Mackenzie (Nuremberg, Germany).




Yeah, when there's a single point of ingress, it's pretty much simpler than
when there's federated ingress, or here NNTP peerage, vis-a-vis a site's
own postings.

Here it's uncomplicated when all messages get propagated to all peers,
with the idea that NOOBNB pattern is going to ingest raw and result curated
(curated, cured, cur).


How to figure out for each incoming item, whether to have System Tag Bot
result appending another item marking it, or, just storing a stub for the
item as excluded, gets into "deep inspection", or as related to the things.

Because Usenet is already an ongoing concern, it's sort of easy to identify
old posters already, then about the issue of handling New/Non, and as
with regards to identifying Bad, as what it results Cur is New/Old/Off
and Raw includes Bot/Non/Bad, or rather that it excludes Bot/Non/Bad,
with regards to whether the purpose of Bot is to propagate Bans.


It's sort of expected that the Author field makes for a given Author,
but some posters for example mutilate the e-mail address or result
something non-unique. Disambiguating those, then, is for the idea
that either the full contents of the Author field make a thing or that
otherwise Authors would need to make some way to disambiguate Sender.

About propagation and stubbing, the idea is that propagation should
generally result, then that presence of articles or stubs either way
results the relevant response code, as with regards to either
"propagating raw including Non and Bad" or just "propagating Raw
only Non-Tag and Bad-Tag Tag-Bot, generated messages", basically
with the idea of semantics of "control" and "junk", or "just ignore it".


The use case of lots of users of Usenet isn't a copy of Usenet, just
a few relevant groups. Others for example appreciate all the _belles lettres_
of text, and nothing from binaries. Lots of users of Usenet have it
as mostly a suck-feed of warez and vice. Here I don't much care about
except _belles lettres_.


So, here NOOBNB is a sort of white-list approach, because Authors is
much less than messages, to relate incoming messages, to Authors,
per group, here that ingestion is otherwise constant-rate for assigning
numbers in the groups a message is in, then as with regards to threading
and bucketing, about how to result these sorts ideas sort of building up
from "the utility of bitmaps" to this "patterns in range" and "region calculus",
here though what's to result partially digested intermediate results for an
overall concatenation strategy then for selection and analysis,
all entirely write-once-read-many.

It's figured that Authors will write and somebody will eventually read them,
with regards to that readings and replies result the Author born as New
and then maturing to Old, what results after Author infancy, to result
a usual sort of idea that Authors that read Bad are likely enough Bad themselves.

I.e., there's a sort of hysteresis to arrive at born as New, in a group,
then a sort of gentle infancy to result Old, or Off, in a group, as
with regards to the purgatory of Non or banishment of Bad.

happy case:
Non -> New -> Old (good)
Non -> Bad (bad)

Old -> Off
Off -> Old


The idea's that nobody's a moderator, but anybody's a reviewer,
and correspondent, then that correspondents to spam or Bad get
the storage of a signed quantity, about the judgment, of what
is spam, in the error modes.

error modes:
Non -> false New
Non -> false not Bad


New -> Bad
Old -> Bad

(There's that reviewers and correspondents
Old <-> Old
Off <-> Old
Old <-> Off
Off <-> Off
result those are all same O <-> O.)

The idea's that nobody's a moderator, and furthermore then all
the rules of the ignorance of Non and banishment of Bad,
then though are as how to arrive at that Non's, get a chance
to be reviewed by Old/Off and New, with respect to New and New
resulting also the conditions of creation, of a group, vis-a-vis,
the conditions of continuity, of a group.


I.e. the relations should so arise that creating a group and posting
to it, should result "Originator" or a sort of class of Old, about these
ideas of the best sort of reasonable performance and long-lived scalability
and horizontal scalability, that results interpreting any usual sort of
messaging with message-ID's and authors, in a reference algorithm
and error-detection and error-correction, "NOOBNB".

There's an idea that Bot replies to new posters, "the Nota Bene",
but, another that Bot replies to Non and Bad, and another that
there's none of that at all, or not guaranteed.


Then, the idea is that this is matters of convention and site policy,
what it results exactly the same as a conformant Usenet peer,
in "NOOBNB compeering: slightly less crap".


Then, getting into relating readings (reviews) and correspondence
as a matter of site policy in readings or demonstration in correspondence,
results largely correspondence discriminates Old from Bad, and New from Non.

Then as "un-moderated" there's still basically "site-policy",
basically in layers that result "un-abuse", "dis-abuse".

I.e. the disabusement of abuse, is of this Old <-> Off for the venial,
and about the ceremony of infancy via some kind of interaction
or the author's own origination, about gating New, then figuring
that New matures to Old and then the compute cost is on News,
that long-running conversations result constants, called stability.
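
Pulling the happy case and error modes above into one table, a minimal sketch;
the event names ("graduate", "mature", and so on) are labels of my own here,
not fixed terminology.

# Author classes: New, Old, Off (curated) and Bot, Non, Bad (excluded).
TRANSITIONS = {
    ("Non", "graduate"):  "New",  # reviewed or corresponded into the group
    ("Non", "ban"):       "Bad",
    ("New", "mature"):    "Old",
    ("New", "ban"):       "Bad",
    ("Old", "drift"):     "Off",  # Old and Off vacillate
    ("Off", "return"):    "Old",
    ("Old", "ban"):       "Bad",
    ("Bad", "attenuate"): "Non",  # Bad reverts to Non over time
}

def next_class(current, event):
    # Apply one reputation event; unknown combinations leave the class alone.
    return TRANSITIONS.get((current, event), current)

# e.g. next_class("Non", "graduate") -> "New"; there's no one-step path out
#      of Bad except attenuation back to Non.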

Well I'm curious your opinion of this sort of approach, it's basically as of
defining conventions of common messaging, what result a simplest
and most-egalitarian common resource of correspondents in _belles lettres_.
Ross Finlayson
2024-01-25 03:12:23 UTC
Reply
Permalink
On Wednesday, January 24, 2024 at 5:21:09 PM UTC-8, Ross Finlayson wrote:
> On Wednesday, January 24, 2024 at 11:41:22 AM UTC-8, Alan Mackenzie wrote:
> > Ross Finlayson <***@gmail.com> wrote:
> >
> > [ .... ]
> > > Basically thinking about a "backing file format convention".
> >
> > > The message ID's are universally unique. File-systems support various
> > > counts and depths of sub-directories. The message ID's aren't
> > > necessarily opaque structurally as file-names. So, the first thing is
> > > a function that given a message-ID, results a message-ID-file-name.
> >
> > > Then, as it's figured that groups, are, separable, is about how, to,
> > > either have all the messages in one store, or, split it out by groups.
> > > Either way the idea is to convert the message-ID-file-name, to a given
> > > depth of directories, also legal in file names, so it results that the
> > > message's get uniformly distributed in sub-directories of approximately
> > > equal count and depth.
> >
> > > A....D...G <- message-ID
> >
> > > ABCDEFG <- message-ID-file-name
> >
> > > /A/B/C/D/E/F/ABCDEFG <- message-ID-directory-path
> >
> > > So, the idea is that the backing file format convention, basically
> > > results uniform lookup of a file's existence, then about ingestion and
> > > constructing a message, then, moving that directory as a link in the
> > > filesystem, so it results atomicity in the file system that supports
> > > that the existence of a message-ID-directory-path is a function of
> > > message-ID, and usual filesystem guarantees.
> > [ .... ]
> > > Then, threads and the message numbers, where threading by message
> > > number is the
> >
> > > header + numbers + body
> >
> > > the numbers part, sort of is for open and closed threads, here though
> > > of course that threads are formally always open, or about composing
> > > threads of those as over them being partitioned in usual reasonable
> > > times, for transient threads and long-winded threads and recurring
> > > threads.
> >
> > > Then, besides "control" and "junk" and such or relating administration,
> > > is here for the sort of minimal administration that results this NOOBNB
> > > curation. This and matters of relay ingestion and authoring ingestion
> > > and ingestion as concatenation of BFF files, is about these kinds of
> > > things.
> > What you've described is known as a news server, and several well
> > established ones exist. You're trying to reinvent the wheel.
> >
> > --
> > Alan Mackenzie (Nuremberg, Germany).
> Yeah, when there's a single point of ingress, is pretty much simpler than
> when there's federated ingress, or here NNTP peerage, vis-a-vis a site's
> own postings.
>
> Here it's uncomplicated when all messages get propagated to all peers,
> with the idea that NOOBNB pattern is going to ingest raw and result curated
> (curated, cured, cur).
>
>
> How to figure out for each incoming item, whether to have System Tag Bot
> result appending another item marking it, or, just storing a stub for the
> item as excluded, gets into "deep inspection", or as related to the things.
>
> Because Usenet is already an ongoing concern, it's sort of easy to identify
> old posters already, then about the issue of handling New/Non, and as
> with regards to identifying Bad, as what it results Cur is New/Old/Off
> and Raw includes Bot/Non/Bad, or rather that it excludes Bot/Non/Bad,
> with regards to whether the purpose of Bot is to propagate Bans.
>
>
> It's sort of expected that the Author field makes for a given Author,
> but some posters for example mutilate the e-mail address or result
> something non-unique. Disambiguating those, then, is for the idea
> that either the full contents of the Author field make a thing or that
> otherwise Authors would need to make some way to disambiguate Sender.
>
> About propagation and stubbing, the idea is that propagation should
> generally result, then that presence of articles or stubs either way
> results the relevant response code, as with regards to either
> "propagating raw including Non and Bad" or just "propagating Raw
> only Non-Tag and Bad-Tag Tag-Bot, generated messages", basically
> with the idea of semantics of "control" and "junk", or "just ignore it".
>
>
> The use case of lots of users of Usenet isn't a copy of Usenet, just
> a few relevant groups. Others for example appreciate all the _belles lettres_
> of text, and nothing from binaries. Lots of users of Usenet have it
> as mostly a suck-feed of warez and vice. Here I don't much care about
> except _belles lettres_.
>
>
> So, here NOOBNB is a sort of white-list approach, because Authors is
> much less than messages, to relate incoming messages, to Authors,
> per group, here that ingestion is otherwise constant-rate for assigning
> numbers in the groups a message is in, then as with regards to threading
> and bucketing, about how to result these sorts ideas sort of building up
> from "the utility of bitmaps" to this "patterns in range" and "region calculus",
> here though what's to result partially digested intermediate results for an
> overall concatenation strategy then for selection and analysis,
> all entirely write-once-read-many.
>
> It's figured that Authors will write and somebody will eventually read them,
> with regards to that readings and replies result the Author born as New
> and then maturing to Old, what results after Author infancy, to result
> a usual sort of idea that Authors that read Bad are likely enough Bad themselves.
>
> I.e., there's a sort of hysteresis to arrive at born as New, in a group,
> then a sort of gentle infancy to result Old, or Off, in a group, as
> with regards to the purgatory of Non or banishment of Bad.
>
> happy case:
> Non -> New -> Old (good)
> Non -> Bad (bad)
>
> Old -> Off
> Off -> Old
>
>
> The idea's that nobody's a moderator, but anybody's a reviewer,
> and correspondent, then that correspondents to spam or Bad get
> the storage of a signed quantity, about the judgment, of what
> is spam, in the error modes.
>
> error modes:
> Non -> false New
> Non -> false not Bad
>
>
> New -> Bad
> Old -> Bad
>
> (There's that reviewers and correspondents
> Old <-> Old
> Off <-> Old
> Old <-> Off
> Off <-> Off
> result those are all same O <-> O.)
>
> The idea's that nobody's a moderator, and furthermore then all
> the rules of the ignorance of Non and banishment of Bad,
> then though are as how to arrive at that Non's, get a chance
> to be reviewed by Old/Off and New, with respect to New and New
> resulting also the conditions of creation, of a group, vis-a-vis,
> the conditions of continuity, of a group.
>
>
> I.e. the relations should so arise that creating a group and posting
> to it, should result "Originator" or a sort of class of Old, about these
> ideas of the best sort of reasonable performance and long-lived scalability
> and horizontal scalability, that results interpreting any usual sort of
> messaging with message-ID's and authors, in a reference algorithm
> and error-detection and error-correction, "NOOBNB".
>
> There's an idea that Bot replies to new posters, "the Nota Bene",
> but, another that Bot replies to Non and Bad, and another that
> there's none of that at all, or not guaranteed.
>
>
> Then, the idea is that this is matters of convention and site policy,
> what it results exactly the same as a conformant Usenet peer,
> in "NOOBNB compeering: slightly less crap".
>
>
> Then, getting into relating readings (reviews) and correspondence
> as a matter of site policy in readings or demonstration in correspondence,
> results largely correspondence discriminates Old from Bad, and New from Non.
>
> Then as "un-moderated" there's still basically "site-policy",
> basically in layers that result "un-abuse", "dis-abuse".
>
> I.e. the disabusement of abuse, is of this Old <-> Off for the venial,
> and about the ceremony of infancy via some kind of interaction
> or the author's own origination, about gating New, then figuring
> that New matures to Old and then the compute cost is on News,
> that long-running conversations result constants, called stability.
>
> Well I'm curious your opinion of this sort of approach, it's basically as of
> defining conventions of common messaging, what result a simplest
> and most-egalitarian common resource of correspondents in _belles lettres_.






Then it seems the idea is to have _three_ editions,

Cur: current, curated, New/Old/Off
Pur: purgatory, Non/New/Old/Off
Raw: raw, Non/New/Old/Off/Bot/Bad
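
As a small sketch of routing by author class into those editions (the set and
function names are illustrative):

CUR = {"New", "Old", "Off"}                       # current, curated
PUR = {"Non", "New", "Old", "Off"}                # purgatory
RAW = {"Non", "New", "Old", "Off", "Bot", "Bad"}  # everything: Raw >= Pur >= Cur

def editions_for(author_class):
    # Which edition feeds a message joins, given its author's class.
    return [name for name, members in (("cur", CUR), ("pur", PUR), ("raw", RAW))
            if author_class in members]

# e.g. editions_for("Old") -> ["cur", "pur", "raw"]
#      editions_for("Non") -> ["pur", "raw"]
#      editions_for("Bad") -> ["raw"]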

Then, the idea for bot, seems to be for system, to have delegations,
of Bot to Old, with respect to otherwise usually the actions of Old,
to indicate correspondence.

Then, with regards to review, it would sort of depend on some Old
or Off authors reviewing Pur, with regards to review and/or correspondence,
what results graduating Non to New, then that it results that
there's exactly a sort of usual write-once-read-many, common
backing store well-defined by presence in access (according to filesystem).



Then, for the groups files, it's figured there's the main message-Id's,
as with respect to cur/pur/raw, then with regards to author's on the
groups, presence in the authors files indicating Old, then with regards
to graduation Non to New and New to Old.

Keeping things simple, then the idea is to make it so that usual New
have a way to graduate from Non, where there is or isn't much traffic
or is or isn't much attention paid to Pur.

The idea is that newbies log on to Pur, then post there on their own
or in replies to New/Old/Off, that thus far this is entirely of a monadic
or pure function the routine, which is thusly compile-able and parallelizable,
and about variables in effect, what result site policy, and error modes.


There's an idea that Non's could reply to their own posts,
as to eventually those graduating altogether, or for example
just that posting is allowed, to Pur, until marked either New or Bad.


The ratio of Bad+Non+Bot to Old+Off+New, it's figured, would be non-zero,
due to attacks like the one currently underway from Google Groups.
The idea then is whether to grow the groups file,
in the sequence of all message-IDs, and whether to maintain one edition
of the groups file, and ever modify it in place, where here the goal is instead
growing files of write-once-read-many, and because propagation is permanent.

Raw >= Pur >= Cur

I.e., every message-id gets a line in the raw feed, so that there is one, then as
with regards to whether the line has reserved characters, where otherwise
it's a fixed-length record sized up above the maximum length of a message-id:
the line of the groups file, whose index is its message-number.
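
A sketch of that growing, write-once groups file with fixed-length records,
where the record size of 256 bytes is an assumption sized above the practical
message-id length:

RECORD = 256  # assumed record size, above the usual message-id length limit

def append_article(path, message_id):
    # Append one fixed-length record; the file only grows, so it stays
    # write-once-read-many.  Returns the 1-based article number.
    rec = message_id.encode("utf-8").ljust(RECORD, b" ")
    if len(rec) != RECORD:
        raise ValueError("message-id longer than record size")
    with open(path, "ab") as f:
        f.write(rec)
        return f.tell() // RECORD

def lookup_article(path, number):
    # Article number -> message-id by seeking to a fixed offset.
    with open(path, "rb") as f:
        f.seek((number - 1) * RECORD)
        return f.read(RECORD).decode("utf-8").rstrip(" ")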


See, the idea here is a sort of reference implementation, and a normative implementation,
in what are fungible and well-defined resources, here files, with reasonable performance
and horizontal scale-ability and long-time performance with minimal or monotone maintenance.

Then the files are sort of defined as either write-once and final or write-once and growing,
given that pretty much unbounded file resources result a quite most usual runtime.



Don't they already have one of these somewhere?
Ross Finlayson
2024-01-26 20:37:46 UTC
Reply
Permalink
On Wednesday, January 24, 2024 at 7:12:29 PM UTC-8, Ross Finlayson wrote:
> On Wednesday, January 24, 2024 at 5:21:09 PM UTC-8, Ross Finlayson wrote:
> > On Wednesday, January 24, 2024 at 11:41:22 AM UTC-8, Alan Mackenzie wrote:
> > > Ross Finlayson <***@gmail.com> wrote:
> > >
> > > [ .... ]
> > > > Basically thinking about a "backing file format convention".
> > >
> > > > The message ID's are universally unique. File-systems support various
> > > > counts and depths of sub-directories. The message ID's aren't
> > > > necessarily opaque structurally as file-names. So, the first thing is
> > > > a function that given a message-ID, results a message-ID-file-name.
> > >
> > > > Then, as it's figured that groups, are, separable, is about how, to,
> > > > either have all the messages in one store, or, split it out by groups.
> > > > Either way the idea is to convert the message-ID-file-name, to a given
> > > > depth of directories, also legal in file names, so it results that the
> > > > message's get uniformly distributed in sub-directories of approximately
> > > > equal count and depth.
> > >
> > > > A....D...G <- message-ID
> > >
> > > > ABCDEFG <- message-ID-file-name
> > >
> > > > /A/B/C/D/E/F/ABCDEFG <- message-ID-directory-path
> > >
> > > > So, the idea is that the backing file format convention, basically
> > > > results uniform lookup of a file's existence, then about ingestion and
> > > > constructing a message, then, moving that directory as a link in the
> > > > filesystem, so it results atomicity in the file system that supports
> > > > that the existence of a message-ID-directory-path is a function of
> > > > message-ID, and usual filesystem guarantees.
> > > [ .... ]
> > > > Then, threads and the message numbers, where threading by message
> > > > number is the
> > >
> > > > header + numbers + body
> > >
> > > > the numbers part, sort of is for open and closed threads, here though
> > > > of course that threads are formally always open, or about composing
> > > > threads of those as over them being partitioned in usual reasonable
> > > > times, for transient threads and long-winded threads and recurring
> > > > threads.
> > >
> > > > Then, besides "control" and "junk" and such or relating administration,
> > > > is here for the sort of minimal administration that results this NOOBNB
> > > > curation. This and matters of relay ingestion and authoring ingestion
> > > > and ingestion as concatenation of BFF files, is about these kinds of
> > > > things.
> > > What you've described is known as a news server, and several well
> > > established ones exist. You're trying to reinvent the wheel.
> > >
> > > --
> > > Alan Mackenzie (Nuremberg, Germany).
> > Yeah, when there's a single point of ingress, is pretty much simpler than
> > when there's federated ingress, or here NNTP peerage, vis-a-vis a site's
> > own postings.
> >
> > Here it's uncomplicated when all messages get propagated to all peers,
> > with the idea that NOOBNB pattern is going to ingest raw and result curated
> > (curated, cured, cur).
> >
> >
> > How to figure out for each incoming item, whether to have System Tag Bot
> > result appending another item marking it, or, just storing a stub for the
> > item as excluded, gets into "deep inspection", or as related to the things.
> >
> > Because Usenet is already an ongoing concern, it's sort of easy to identify
> > old posters already, then about the issue of handling New/Non, and as
> > with regards to identifying Bad, as what it results Cur is New/Old/Off
> > and Raw includes Bot/Non/Bad, or rather that it excludes Bot/Non/Bad,
> > with regards to whether the purpose of Bot is to propagate Bans.
> >
> >
> > It's sort of expected that the Author field makes for a given Author,
> > but some posters for example mutilate the e-mail address or result
> > something non-unique. Disambiguating those, then, is for the idea
> > that either the full contents of the Author field make a thing or that
> > otherwise Authors would need to make some way to disambiguate Sender.
> >
> > About propagation and stubbing, the idea is that propagation should
> > generally result, then that presence of articles or stubs either way
> > results the relevant response code, as with regards to either
> > "propagating raw including Non and Bad" or just "propagating Raw
> > only Non-Tag and Bad-Tag Tag-Bot, generated messages", basically
> > with the idea of semantics of "control" and "junk", or "just ignore it".
> >
> >
> > The use case of lots of users of Usenet isn't a copy of Usenet, just
> > a few relevant groups. Others for example appreciate all the _belles lettres_
> > of text, and nothing from binaries. Lots of users of Usenet have it
> > as mostly a suck-feed of warez and vice. Here I don't much care about
> > except _belles lettres_.
> >
> >
> > So, here NOOBNB is a sort of white-list approach, because Authors is
> > much less than messages, to relate incoming messages, to Authors,
> > per group, here that ingestion is otherwise constant-rate for assigning
> > numbers in the groups a message is in, then as with regards to threading
> > and bucketing, about how to result these sorts ideas sort of building up
> > from "the utility of bitmaps" to this "patterns in range" and "region calculus",
> > here though what's to result partially digested intermediate results for an
> > overall concatenation strategy then for selection and analysis,
> > all entirely write-once-read-many.
> >
> > It's figured that Authors will write and somebody will eventually read them,
> > with regards to that readings and replies result the Author born as New
> > and then maturing to Old, what results after Author infancy, to result
> > a usual sort of idea that Authors that read Bad are likely enough Bad themselves.
> >
> > I.e., there's a sort of hysteresis to arrive at born as New, in a group,
> > then a sort of gentle infancy to result Old, or Off, in a group, as
> > with regards to the purgatory of Non or banishment of Bad.
> >
> > happy case:
> > Non -> New -> Old (good)
> > Non -> Bad (bad)
> >
> > Old -> Off
> > Off -> Old
> >
> >
> > The idea's that nobody's a moderator, but anybody's a reviewer,
> > and correspondent, then that correspondents to spam or Bad get
> > the storage of a signed quantity, about the judgment, of what
> > is spam, in the error modes.
> >
> > error modes:
> > Non -> false New
> > Non -> false not Bad
> >
> >
> > New -> Bad
> > Old -> Bad
> >
> > (There's that reviewers and correspondents
> > Old <-> Old
> > Off <-> Old
> > Old <-> Off
> > Off <-> Off
> > result those are all same O <-> O.)
> >
> > The idea's that nobody's a moderator, and furthermore then all
> > the rules of the ignorance of Non and banishment of Bad,
> > then though are as how to arrive at that Non's, get a chance
> > to be reviewed by Old/Off and New, with respect to New and New
> > resulting also the conditions of creation, of a group, vis-a-vis,
> > the conditions of continuity, of a group.
> >
> >
> > I.e. the relations should so arise that creating a group and posting
> > to it, should result "Originator" or a sort of class of Old, about these
> > ideas of the best sort of reasonable performance and long-lived scalability
> > and horizontal scalability, that results interpreting any usual sort of
> > messaging with message-ID's and authors, in a reference algorithm
> > and error-detection and error-correction, "NOOBNB".
> >
> > There's an idea that Bot replies to new posters, "the Nota Bene",
> > but, another that Bot replies to Non and Bad, and another that
> > there's none of that at all, or not guaranteed.
> >
> >
> > Then, the idea is that this is matters of convention and site policy,
> > what it results exactly the same as a conformant Usenet peer,
> > in "NOOBNB compeering: slightly less crap".
> >
> >
> > Then, getting into relating readings (reviews) and correspondence
> > as a matter of site policy in readings or demonstration in correspondence,
> > results largely correspondence discriminates Old from Bad, and New from Non.
> >
> > Then as "un-moderated" there's still basically "site-policy",
> > basically in layers that result "un-abuse", "dis-abuse".
> >
> > I.e. the disabusement of abuse, is of this Old <-> Off for the venial,
> > and about the ceremony of infancy via some kind of interaction
> > or the author's own origination, about gating New, then figuring
> > that New matures to Old and then the compute cost is on News,
> > that long-running conversations result constants, called stability.
> >
> > Well I'm curious your opinion of this sort of approach, it's basically as of
> > defining conventions of common messaging, what result a simplest
> > and most-egalitarian common resource of correspondents in _belles lettres_.
> Then it seems the idea is to have _three_ editions,
>
> Cur: current, curated, New/Old/Off
> Pur: purgatory, Non/New/Old/Off
> Raw: raw, Non/New/Old/Off/Bot/Bad
>
> Then, the idea for bot, seems to be for system, to have delegations,
> of Bot to Old, with respect to otherwise usually the actions of Old,
> to indicate correspondence.
>
> Then, with regards to review, it would sort of depend on some Old
> or Off authors reviewing Pur, with regards to review and/or correspondence,
> what results graduating Non to New, then that it results that
> there's exactly a sort of usual write-once-read-many, common
> backing store well-defined by presence in access (according to filesystem).
>
>
>
> Then, for the groups files, it's figured there's the main message-Id's,
> as with respect to cur/pur/raw, then with regards to author's on the
> groups, presence in the authors files indicating Old, then with regards
> to graduation Non to New and New to Old.
>
> Keeping things simple, then the idea is to make it so that usual New
> have a way to graduate from Non, where there is or isn't much traffic
> or is or isn't much attention paid to Pur.
>
> The idea is that newbies log on to Pur, then post there on their own
> or in replies to New/Old/Off, that thus far this is entirely of a monadic
> or pure function the routine, which is thusly compile-able and parallelizable,
> and about variables in effect, what result site policy, and error modes.
>
>
> There's an idea that Non's could reply to their own posts,
> as to eventually those graduating altogether, or for example
> just that posting is allowed, to Pur, until marked either New or Bad.
>
>
> The ratio of Bad+Non+Bot to Old+Off+New, basically has that it's figured
> that due to attacks like the one currently underway from Google Groups,
> would be non-zero. The idea then is whether to grow the groups file,
> in the sequence of all message-IDs, and whether to maintain one edition
> of the groups file, and ever modify it in place, that here the goal is instead
> growing files of write-once-read-many, and because propagation is permanent.
>
> Raw >= Pur >= Cur
>
> I.e., every message-id gets a line in the raw feed, that there is one, then as
> with regards to whether the line has reserved characters, where otherwise
> it's a fixed-length record up above the maximum length of message-id,
> the line, of the groups file, the index of its message-numbers.
>
>
> See, the idea here is a sort of reference implementation, and a normative implementation,
> in what are fungible and well-defined resources, here files, with reasonable performance
> and horizontal scale-ability and long-time performance with minimal or monotone maintenance.
>
> Then the files are sort of defined as either write-once and final or write-once and growing,
> given that pretty much unbounded file resources result a quite most usual runtime.
>
>
>
> Don't they already have one of these somewhere?


I suppose the idea is to have that Noobs post to alt.test, then as with regards to
various forms to follow, like:

I read the charter
I demonstrated knowledge of understanding the charter's definitions and intent
I intend to follow the charter

How I do or don't is my own business, how others do or don't is their own business

I can see the exclusion rules
I understand not to post against the exclusion rules
I understand that the exclusion rules are applied unconditionally to all

... is basically for a literacy test and an etiquette assertion.


Basically making for shepherding Noobs through alt.test, or that people who post
in alt.test aren't Noobs, yet still I'm not quite sure how to make it for usual first-time
posters, how to get them out of Purgatory to New. (Or ban them to Bad.)

This is where federated ingestion basically will have that in-feeds are either

these posts are good,
these posts are mixed,
these posts are bad,

with regards then to putting them variously in Cur, Pur, Raw.

Then, there are sorts of exclusions and bans, with regards to posts, and authors.
This is that posts are omitted by exclusion, authors' posts are omitted by ban.

Then, trying to associate all the authors of a mega-nym, in this case
the Google spam flood that makes a barrier-to-entry against open communications,
is basically attributing those authors, as a class, to a banned mega-nym.

Yet, then there is the use case of identity fraud's abuses, disabusing an innocent dupe,
where logins basically got hacked or the path to return to innocence.


This sort of results a yes/no/maybe for authors, sort of like:

yes, it's a known author, it's unlikely they are really bad
(... these likely frauds are Non's?)

no, it's a known excluded post, open rules
no, it's a known excluded author, criminal or a-topical solicitation
no, it's a new excluded author, associated with an abstract criminal or a-topical solicitation

maybe (yes), no reason why not

that a "rules engine" is highly efficient deriving decisions yes/no/maybe,
in both execution and maintenance of the rules (data plane / control plane).
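
A sort of minimal sketch of that data-plane decision, in Java, where the class
name and the particular sets are only illustration and not any part of the
convention:

import java.util.Set;

public class AuthorRules {
    public enum Decision { YES, NO, MAYBE }

    private final Set<String> knownAuthors;        // Old/Off/New: unlikely to be really bad
    private final Set<String> bannedAuthors;       // known excluded authors
    private final Set<String> excludedMessageIds;  // known excluded posts, open rules

    public AuthorRules(Set<String> known, Set<String> banned, Set<String> excluded) {
        this.knownAuthors = known;
        this.bannedAuthors = banned;
        this.excludedMessageIds = excluded;
    }

    public Decision decide(String author, String messageId) {
        if (excludedMessageIds.contains(messageId)) return Decision.NO;   // known excluded post
        if (bannedAuthors.contains(author))         return Decision.NO;   // known excluded author
        if (knownAuthors.contains(author))          return Decision.YES;  // known author
        return Decision.MAYBE;                                            // no reason why not
    }
}

The control plane, maintaining those sets, would then be the growing
write-once-read-many files, loaded or appended as they grow.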

Groups like sci.math have a very high bar to participation, literacy
in mostly English and the language of mathematics. Groups have
a very low bar to pollution, all else.

So, figuring out a common "topicality standard": here the idea is to associate
concepts with the charter and with topicality, then of course for a very loose and
egalitarian approach to participation, otherwise free.

(Message integrity, non-repudiation, free expression, free press, free speech,
not inconsequence, nor the untrammeled.)
Ross Finlayson
2024-01-28 17:21:40 UTC
Reply
Permalink
On Friday, January 26, 2024 at 12:37:52 PM UTC-8, Ross Finlayson wrote:
> On Wednesday, January 24, 2024 at 7:12:29 PM UTC-8, Ross Finlayson wrote:
> > On Wednesday, January 24, 2024 at 5:21:09 PM UTC-8, Ross Finlayson wrote:
> > > On Wednesday, January 24, 2024 at 11:41:22 AM UTC-8, Alan Mackenzie wrote:
> > > > Ross Finlayson <***@gmail.com> wrote:
> > > >
> > > > [ .... ]
> > > > > Basically thinking about a "backing file format convention".
> > > >
> > > > > The message ID's are universally unique. File-systems support various
> > > > > counts and depths of sub-directories. The message ID's aren't
> > > > > necessarily opaque structurally as file-names. So, the first thing is
> > > > > a function that given a message-ID, results a message-ID-file-name.
> > > >
> > > > > Then, as it's figured that groups, are, separable, is about how, to,
> > > > > either have all the messages in one store, or, split it out by groups.
> > > > > Either way the idea is to convert the message-ID-file-name, to a given
> > > > > depth of directories, also legal in file names, so it results that the
> > > > > message's get uniformly distributed in sub-directories of approximately
> > > > > equal count and depth.
> > > >
> > > > > A....D...G <- message-ID
> > > >
> > > > > ABCDEFG <- message-ID-file-name
> > > >
> > > > > /A/B/C/D/E/F/ABCDEFG <- message-ID-directory-path
> > > >
> > > > > So, the idea is that the backing file format convention, basically
> > > > > results uniform lookup of a file's existence, then about ingestion and
> > > > > constructing a message, then, moving that directory as a link in the
> > > > > filesystem, so it results atomicity in the file system that supports
> > > > > that the existence of a message-ID-directory-path is a function of
> > > > > message-ID, and usual filesystem guarantees.
> > > > [ .... ]
> > > > > Then, threads and the message numbers, where threading by message
> > > > > number is the
> > > >
> > > > > header + numbers + body
> > > >
> > > > > the numbers part, sort of is for open and closed threads, here though
> > > > > of course that threads are formally always open, or about composing
> > > > > threads of those as over them being partitioned in usual reasonable
> > > > > times, for transient threads and long-winded threads and recurring
> > > > > threads.
> > > >
> > > > > Then, besides "control" and "junk" and such or relating administration,
> > > > > is here for the sort of minimal administration that results this NOOBNB
> > > > > curation. This and matters of relay ingestion and authoring ingestion
> > > > > and ingestion as concatenation of BFF files, is about these kinds of
> > > > > things.
> > > > What you've described is known as a news server, and several well
> > > > established ones exist. You're trying to reinvent the wheel.
> > > >
> > > > --
> > > > Alan Mackenzie (Nuremberg, Germany).
> > > Yeah, when there's a single point of ingress, is pretty much simpler than
> > > when there's federated ingress, or here NNTP peerage, vis-a-vis a site's
> > > own postings.
> > >
> > > Here it's uncomplicated when all messages get propagated to all peers,
> > > with the idea that NOOBNB pattern is going to ingest raw and result curated
> > > (curated, cured, cur).
> > >
> > >
> > > How to figure out for each incoming item, whether to have System Tag Bot
> > > result appending another item marking it, or, just storing a stub for the
> > > item as excluded, gets into "deep inspection", or as related to the things.
> > >
> > > Because Usenet is already an ongoing concern, it's sort of easy to identify
> > > old posters already, then about the issue of handling New/Non, and as
> > > with regards to identifying Bad, as what it results Cur is New/Old/Off
> > > and Raw includes Bot/Non/Bad, or rather that it excludes Bot/Non/Bad,
> > > with regards to whether the purpose of Bot is to propagate Bans.
> > >
> > >
> > > It's sort of expected that the Author field makes for a given Author,
> > > but some posters for example mutilate the e-mail address or result
> > > something non-unique. Disambiguating those, then, is for the idea
> > > that either the full contents of the Author field make a thing or that
> > > otherwise Authors would need to make some way to disambiguate Sender.
> > >
> > > About propagation and stubbing, the idea is that propagation should
> > > generally result, then that presence of articles or stubs either way
> > > results the relevant response code, as with regards to either
> > > "propagating raw including Non and Bad" or just "propagating Raw
> > > only Non-Tag and Bad-Tag Tag-Bot, generated messages", basically
> > > with the idea of semantics of "control" and "junk", or "just ignore it".
> > >
> > >
> > > The use case of lots of users of Usenet isn't a copy of Usenet, just
> > > a few relevant groups. Others for example appreciate all the _belles lettres_
> > > of text, and nothing from binaries. Lots of users of Usenet have it
> > > as mostly a suck-feed of warez and vice. Here I don't much care about
> > > except _belles lettres_.
> > >
> > >
> > > So, here NOOBNB is a sort of white-list approach, because Authors is
> > > much less than messages, to relate incoming messages, to Authors,
> > > per group, here that ingestion is otherwise constant-rate for assigning
> > > numbers in the groups a message is in, then as with regards to threading
> > > and bucketing, about how to result these sorts ideas sort of building up
> > > from "the utility of bitmaps" to this "patterns in range" and "region calculus",
> > > here though what's to result partially digested intermediate results for an
> > > overall concatenation strategy then for selection and analysis,
> > > all entirely write-once-read-many.
> > >
> > > It's figured that Authors will write and somebody will eventually read them,
> > > with regards to that readings and replies result the Author born as New
> > > and then maturing to Old, what results after Author infancy, to result
> > > a usual sort of idea that Authors that read Bad are likely enough Bad themselves.
> > >
> > > I.e., there's a sort of hysteresis to arrive at born as New, in a group,
> > > then a sort of gentle infancy to result Old, or Off, in a group, as
> > > with regards to the purgatory of Non or banishment of Bad.
> > >
> > > happy case:
> > > Non -> New -> Old (good)
> > > Non -> Bad (bad)
> > >
> > > Old -> Off
> > > Off -> Old
> > >
> > >
> > > The idea's that nobody's a moderator, but anybody's a reviewer,
> > > and correspondent, then that correspondents to spam or Bad get
> > > the storage of a signed quantity, about the judgment, of what
> > > is spam, in the error modes.
> > >
> > > error modes:
> > > Non -> false New
> > > Non -> false not Bad
> > >
> > >
> > > New -> Bad
> > > Old -> Bad
> > >
> > > (There's that reviewers and correspondents
> > > Old <-> Old
> > > Off <-> Old
> > > Old <-> Off
> > > Off <-> Off
> > > result those are all same O <-> O.)
> > >
> > > The idea's that nobody's a moderator, and furthermore then all
> > > the rules of the ignorance of Non and banishment of Bad,
> > > then though are as how to arrive at that Non's, get a chance
> > > to be reviewed by Old/Off and New, with respect to New and New
> > > resulting also the conditions of creation, of a group, vis-a-vis,
> > > the conditions of continuity, of a group.
> > >
> > >
> > > I.e. the relations should so arise that creating a group and posting
> > > to it, should result "Originator" or a sort of class of Old, about these
> > > ideas of the best sort of reasonable performance and long-lived scalability
> > > and horizontal scalability, that results interpreting any usual sort of
> > > messaging with message-ID's and authors, in a reference algorithm
> > > and error-detection and error-correction, "NOOBNB".
> > >
> > > There's an idea that Bot replies to new posters, "the Nota Bene",
> > > but, another that Bot replies to Non and Bad, and another that
> > > there's none of that at all, or not guaranteed.
> > >
> > >
> > > Then, the idea is that this is matters of convention and site policy,
> > > what it results exactly the same as a conformant Usenet peer,
> > > in "NOOBNB compeering: slightly less crap".
> > >
> > >
> > > Then, getting into relating readings (reviews) and correspondence
> > > as a matter of site policy in readings or demonstration in correspondence,
> > > results largely correspondence discriminates Old from Bad, and New from Non.
> > >
> > > Then as "un-moderated" there's still basically "site-policy",
> > > basically in layers that result "un-abuse", "dis-abuse".
> > >
> > > I.e. the disabusement of abuse, is of this Old <-> Off for the venial,
> > > and about the ceremony of infancy via some kind of interaction
> > > or the author's own origination, about gating New, then figuring
> > > that New matures to Old and then the compute cost is on News,
> > > that long-running conversations result constants, called stability.
> > >
> > > Well I'm curious your opinion of this sort of approach, it's basically as of
> > > defining conventions of common messaging, what result a simplest
> > > and most-egalitarian common resource of correspondents in _belles lettres_.
> > Then it seems the idea is to have _three_ editions,
> >
> > Cur: current, curated, New/Old/Off
> > Pur: purgatory, Non/New/Old/Off
> > Raw: raw, Non/New/Old/Off/Bot/Bad
> >
> > Then, the idea for bot, seems to be for system, to have delegations,
> > of Bot to Old, with respect to otherwise usually the actions of Old,
> > to indicate correspondence.
> >
> > Then, with regards to review, it would sort of depend on some Old
> > or Off authors reviewing Pur, with regards to review and/or correspondence,
> > what results graduating Non to New, then that it results that
> > there's exactly a sort of usual write-once-read-many, common
> > backing store well-defined by presence in access (according to filesystem).
> >
> >
> >
> > Then, for the groups files, it's figured there's the main message-Id's,
> > as with respect to cur/pur/raw, then with regards to author's on the
> > groups, presence in the authors files indicating Old, then with regards
> > to graduation Non to New and New to Old.
> >
> > Keeping things simple, then the idea is to make it so that usual New
> > have a way to graduate from Non, where there is or isn't much traffic
> > or is or isn't much attention paid to Pur.
> >
> > The idea is that newbies log on to Pur, then post there on their own
> > or in replies to New/Old/Off, that thus far this is entirely of a monadic
> > or pure function the routine, which is thusly compile-able and parallelizable,
> > and about variables in effect, what result site policy, and error modes.
> >
> >
> > There's an idea that Non's could reply to their own posts,
> > as to eventually those graduating altogether, or for example
> > just that posting is allowed, to Pur, until marked either New or Bad.
> >
> >
> > The ratio of Bad+Non+Bot to Old+Off+New, basically has that it's figured
> > that due to attacks like the one currently underway from Google Groups,
> > would be non-zero. The idea then is whether to grow the groups file,
> > in the sequence of all message-IDs, and whether to maintain one edition
> > of the groups file, and ever modify it in place, that here the goal is instead
> > growing files of write-once-read-many, and because propagation is permanent.
> >
> > Raw >= Pur >= Cur
> >
> > I.e., every message-id gets a line in the raw feed, that there is one, then as
> > with regards to whether the line has reserved characters, where otherwise
> > it's a fixed-length record up above the maximum length of message-id,
> > the line, of the groups file, the index of its message-numbers.
> >
> >
> > See, the idea here is a sort of reference implementation, and a normative implementation,
> > in what are fungible and well-defined resources, here files, with reasonable performance
> > and horizontal scale-ability and long-time performance with minimal or monotone maintenance.
> >
> > Then the files are sort of defined as either write-once and final or write-once and growing,
> > given that pretty much unbounded file resources result a quite most usual runtime.
> >
> >
> >
> > Don't they already have one of these somewhere?
> I suppose the idea is to have that Noobs post to alt.test, then as with regards to
> various forms to follow, like:
>
> I read the charter
> I demonstrated knowledge of understanding the charter's definitions and intent
> I intend to follow the charter
>
> How I do or don't is my own business, how others do or don't is their own business
>
> I can see the exclusion rules
> I understand not to post against the exclusion rules
> I understand that the exclusion rules are applied unconditionally to all
>
> ... is basically for a literacy test and an etiquette assertion.
>
>
> Basically making for shepherding Noobs through alt.test, or that people who post
> in alt.test aren't Noobs, yet still I'm not quite sure how to make it for usual first-time
> posters, how to get them out of Purgatory to New. (Or ban them to Bad.)
>
> This is where federated ingestion basically will have that in-feeds are either
>
> these posts are good,
> these posts are mixed,
> these posts are bad,
>
> with regards then to putting them variously in Cur, Pur, Raw.
>
> Then, there's sort exclusions and bans, with regards to posts, and authors.
> This is that posts are omitted by exclusion, authors' posts are omitted by ban.
>
> Then, trying to associate all the author's of a mega-nym, in this case
> the Google's spam flood to make a barrier-to-entry of having open communications,
> is basically attributing those as a class those authors to a banned mega-nym.
>
> Yet, then there is the use case of identity fraud's abuses, disabusing an innocent dupe,
> where logins basically got hacked or the path to return to innocence.
>
>
> This sort of results a yes/no/maybe for authors, sort of like:
>
> yes, it's a known author, it's unlikely they are really bad
> (... these likely frauds are Non's?)
>
> no, it's a known excluded post, open rules
> no, it's a known excluded author, criminal or a-topical solicitation
> no, it's a new excluded author, associated with an abstract criminal or a-topical solicitation
>
> maybe (yes), no reason why not
>
> that a "rules engine" is highly efficient deriving decisions yes/no/maybe,
> in both execution and maintenance of the rules (data plane / control plane).
>
> Groups like sci.math have a very high bar to participation, literacy
> in mostly English and the language of mathematics. Groups have
> a very low bar to pollution, all else.
>
> So, figuring out a common "topicality standard", here is the idea to associate
> concepts with charter with topicality, then for of course a very loose and
> egalitarian approach to participation, otherwise free.
>
> (Message integrity, irrepudiability, free expression, free press, free speech,
> not inconsequence, nor the untrammeled.)


Well, "what is spam", then, I suppose sort of follows from the
"spam is a word coined on Usenet for unsolicated a-topical posts",
then the ideas about how to find spam, basically make for that
there are some ways to identify these things.

The ideas of
cohort: a group, a thread, a poster
cliques: a group, posts that reply to each other

Then
content: words and such
clicks: links

Here the idea is to categorize content according to cohorts and cliques,
and content and clicks.

It's figured that all spam has clicks in it, though of course clicks
are the greatest sort of thing for hypertext, with regards to

duplicate links
duplicate domains

and these sorts of things.

The idea is that it costs resources to categorize content according
to the content, versus the original idea that "spam must be identified by
its subject header alone", vis-a-vis the maintenance of related data,
and the indicators of matching various aspects of relations in the data.

So, clicks seem the first way to identify spam: basically, a histogram
of links by their domain and path results that duplicates are spam, vis-a-vis
that clicks in a poster's sig, or repeated many times in a long thread, are not.

In this sense there's that posts are collections of their context,
about how to make an algorithm in best effort to relate context
to the original posts, usually according to threading.

The idea here is that Non's can be excluded when first of all they
have links, then figuring that each group has usual sites that
aren't spam, like their youtube links or their doc repo links or their
wiki links or their arxiv or sep or otherwise, the usual sorts of good links,
while mostly it's the multiplicity of links that represents a spam attack,
then just to leave all those in Purgatory.
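
A sort of minimal sketch of that in Java, counting link domains over a window
of Purgatory posts and letting the group's usual good domains pass; the domain
list and the threshold here are only assumptions for illustration:

import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkHistogram {
    // assumed per-group allow-list of the usual good sites, just for illustration
    private static final Set<String> USUAL_GOOD =
            Set.of("arxiv.org", "plato.stanford.edu", "en.wikipedia.org");
    private static final Pattern LINK = Pattern.compile("https?://([^/\\s]+)[^\\s]*");
    private static final int DUPLICATE_THRESHOLD = 3;   // assumed; above this looks like a campaign

    private final Map<String, Integer> domainCounts = new HashMap<>();

    /** Returns true if the post's links look like a spam campaign and it should stay in Purgatory. */
    public boolean looksLikeLinkSpam(String body) {
        boolean suspicious = false;
        Matcher m = LINK.matcher(body);
        while (m.find()) {
            String domain = m.group(1).toLowerCase();
            if (USUAL_GOOD.contains(domain)) continue;            // the group's usual good links pass
            int count = domainCounts.merge(domain, 1, Integer::sum);
            if (count >= DUPLICATE_THRESHOLD) suspicious = true;  // multiplicity of the same domain
        }
        return suspicious;
    }
}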

It's figured then that good posters, when they reach Old, are pretty much
past spamming, then that posters are New for quite a while,
and have some readings or otherwise mature into Old, so that
simply Old and Off posters' posts go right through, New posters' posts
go right through, and then it's to go about categorizing for spam, excluding spam.


I.e., the "what is spam", predicate, is to be an open-rules sort of composition,
that basically makes it so that spamverts would be ineffective because
spammers exploit lazy and if their links don't go through, get nothing.

Then, there's still "what is spam" with regards to just link-less spam,
where mostly it would be about "repeated junk", that "spam is not unique".
This is the usual notion of "signal to noise", basically finding whether
it's just noise in Purgatory, where signal in Purgatory is a good sign of New.
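
A sort of minimal sketch of the "spam is not unique" idea, in Java: normalize
the body, hash it, and count repeats over the Purgatory window, where the
threshold is only an assumption:

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

public class NoiseCheck {
    private final Map<String, Integer> seen = new HashMap<>();
    private static final int NOISE_THRESHOLD = 5;   // assumed; tune per group's traffic

    /** Returns true when the (normalized) body has repeated enough to count as noise. */
    public boolean isNoise(String body) throws Exception {
        String normalized = body.toLowerCase().replaceAll("\\s+", " ").trim();
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(normalized.getBytes(StandardCharsets.UTF_8));
        String key = Base64.getEncoder().encodeToString(digest);
        return seen.merge(key, 1, Integer::sum) >= NOISE_THRESHOLD;   // repeated junk, not signal
    }
}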

So, "what is spam" is sort of "what is not noise". Again, the goal is open-rules
normative algorithms that operate on write-once-read-many graduated feeds,
what result that the Usenet compeering, curates its federated ingress, then
as for feeding its out-feed, with regards to other Usenet compeers following
the same algorithm, then would get the same results.

Then, the file-store might still have copies of all the spams, with the idea then
that it's truncatable, because spam campaigns are not long-running for archival,
then to drop the partitions of Purgatory and Raw according to retention.
This then also is for fishing out what are Type I / Type II errors, about promoting
from Non to New, or also about the banishment of Non to Bad, or Off to Bad.
I.e., there's not so much "cancel", yet there's still "no-archive", about how
to make it open and normative how these kinds of things are.
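
A sort of minimal sketch of that truncation, in Java, assuming the Purgatory
and Raw partitions are laid out as date-named bucket directories (e.g.
pur/2024-01-15); the layout and the retention period are assumptions here:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDate;
import java.util.Comparator;
import java.util.stream.Stream;

public class RetentionSweep {
    /** Drops whole date buckets older than the retention window; assumes every
     *  entry under the partition root is a date-named bucket. */
    public static void sweep(Path partitionRoot, int retentionDays) throws IOException {
        LocalDate cutoff = LocalDate.now().minusDays(retentionDays);
        try (Stream<Path> buckets = Files.list(partitionRoot)) {
            for (Path bucket : (Iterable<Path>) buckets::iterator) {
                LocalDate day = LocalDate.parse(bucket.getFileName().toString()); // "YYYY-MM-DD"
                if (day.isBefore(cutoff)) {
                    // drop the whole bucket: spam campaigns aren't long-running for archival
                    try (Stream<Path> files = Files.walk(bucket)) {
                        files.sorted(Comparator.reverseOrder()).forEach(p -> {
                            try { Files.delete(p); } catch (IOException ignored) { }
                        });
                    }
                }
            }
        }
    }
}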

Luckily the availability of filesystems unbounded in size is pretty good these days,
and implementing things write-once-read-many makes for pretty simple
maintenance routines.


It's like "whuh how do I monetize that?" and it's like "you don't", and "you figure
that people will buy into free speech, free association, and free press".
You can make your own front-end and decorate it with what spam you want,
it just won't get federated back in the ingress of this Usenet Compeerage.

Then it's like "well I want to only see Archimedes Plutonium and his co-horts"
then there's the idea that there's to be generated some files with relations,
the summaries and histrograms, then for those to be according to time-series
buckets, making tractable sorts metadata partially digested, then for making
digests of those, again according to normative algorithms with well-defined
access patternry and run-times, according to here pretty a hierarchical file-system.
Again it's sort of a front-end thing, with surfacing either the back-end files
or the summaries and digests, for making search tractable in many dimensions.

So, for the cohort, it seems to be a sort of accumulated acceptance and rejection,
about accepters and rejectors and the formal language of hierarchical data
that's established by its presence and maintenance, about "what is spam"
according to the entire cohort, and cliques, then with regards to Old/Off
and spam or Non, with regards to spam and Bad.

So, "what is spam" is basically that whatever results excluded was spam.
Ross Finlayson
2024-02-03 17:25:50 UTC
Reply
Permalink
On 01/28/2024 09:21 AM, Ross Finlayson wrote:
> On Friday, January 26, 2024 at 12:37:52 PM UTC-8, Ross Finlayson wrote:
>> On Wednesday, January 24, 2024 at 7:12:29 PM UTC-8, Ross Finlayson wrote:
>>> On Wednesday, January 24, 2024 at 5:21:09 PM UTC-8, Ross Finlayson wrote:
>>>> On Wednesday, January 24, 2024 at 11:41:22 AM UTC-8, Alan Mackenzie wrote:
>>>>> Ross Finlayson <***@gmail.com> wrote:
>>>>>
>>>>> [ .... ]
>>>>>> Basically thinking about a "backing file format convention".
>>>>>
>>>>>> The message ID's are universally unique. File-systems support various
>>>>>> counts and depths of sub-directories. The message ID's aren't
>>>>>> necessarily opaque structurally as file-names. So, the first thing is
>>>>>> a function that given a message-ID, results a message-ID-file-name.
>>>>>
>>>>>> Then, as it's figured that groups, are, separable, is about how, to,
>>>>>> either have all the messages in one store, or, split it out by groups.
>>>>>> Either way the idea is to convert the message-ID-file-name, to a given
>>>>>> depth of directories, also legal in file names, so it results that the
>>>>>> message's get uniformly distributed in sub-directories of approximately
>>>>>> equal count and depth.
>>>>>
>>>>>> A....D...G <- message-ID
>>>>>
>>>>>> ABCDEFG <- message-ID-file-name
>>>>>
>>>>>> /A/B/C/D/E/F/ABCDEFG <- message-ID-directory-path
>>>>>
>>>>>> So, the idea is that the backing file format convention, basically
>>>>>> results uniform lookup of a file's existence, then about ingestion and
>>>>>> constructing a message, then, moving that directory as a link in the
>>>>>> filesystem, so it results atomicity in the file system that supports
>>>>>> that the existence of a message-ID-directory-path is a function of
>>>>>> message-ID, and usual filesystem guarantees.
>>>>> [ .... ]
>>>>>> Then, threads and the message numbers, where threading by message
>>>>>> number is the
>>>>>
>>>>>> header + numbers + body
>>>>>
>>>>>> the numbers part, sort of is for open and closed threads, here though
>>>>>> of course that threads are formally always open, or about composing
>>>>>> threads of those as over them being partitioned in usual reasonable
>>>>>> times, for transient threads and long-winded threads and recurring
>>>>>> threads.
>>>>>
>>>>>> Then, besides "control" and "junk" and such or relating administration,
>>>>>> is here for the sort of minimal administration that results this NOOBNB
>>>>>> curation. This and matters of relay ingestion and authoring ingestion
>>>>>> and ingestion as concatenation of BFF files, is about these kinds of
>>>>>> things.
>>>>> What you've described is known as a news server, and several well
>>>>> established ones exist. You're trying to reinvent the wheel.
>>>>>
>>>>> --
>>>>> Alan Mackenzie (Nuremberg, Germany).
>>>> Yeah, when there's a single point of ingress, is pretty much simpler than
>>>> when there's federated ingress, or here NNTP peerage, vis-a-vis a site's
>>>> own postings.
>>>>
>>>> Here it's uncomplicated when all messages get propagated to all peers,
>>>> with the idea that NOOBNB pattern is going to ingest raw and result curated
>>>> (curated, cured, cur).
>>>>
>>>>
>>>> How to figure out for each incoming item, whether to have System Tag Bot
>>>> result appending another item marking it, or, just storing a stub for the
>>>> item as excluded, gets into "deep inspection", or as related to the things.
>>>>
>>>> Because Usenet is already an ongoing concern, it's sort of easy to identify
>>>> old posters already, then about the issue of handling New/Non, and as
>>>> with regards to identifying Bad, as what it results Cur is New/Old/Off
>>>> and Raw includes Bot/Non/Bad, or rather that it excludes Bot/Non/Bad,
>>>> with regards to whether the purpose of Bot is to propagate Bans.
>>>>
>>>>
>>>> It's sort of expected that the Author field makes for a given Author,
>>>> but some posters for example mutilate the e-mail address or result
>>>> something non-unique. Disambiguating those, then, is for the idea
>>>> that either the full contents of the Author field make a thing or that
>>>> otherwise Authors would need to make some way to disambiguate Sender.
>>>>
>>>> About propagation and stubbing, the idea is that propagation should
>>>> generally result, then that presence of articles or stubs either way
>>>> results the relevant response code, as with regards to either
>>>> "propagating raw including Non and Bad" or just "propagating Raw
>>>> only Non-Tag and Bad-Tag Tag-Bot, generated messages", basically
>>>> with the idea of semantics of "control" and "junk", or "just ignore it".
>>>>
>>>>
>>>> The use case of lots of users of Usenet isn't a copy of Usenet, just
>>>> a few relevant groups. Others for example appreciate all the _belles lettres_
>>>> of text, and nothing from binaries. Lots of users of Usenet have it
>>>> as mostly a suck-feed of warez and vice. Here I don't much care about
>>>> except _belles lettres_.
>>>>
>>>>
>>>> So, here NOOBNB is a sort of white-list approach, because Authors is
>>>> much less than messages, to relate incoming messages, to Authors,
>>>> per group, here that ingestion is otherwise constant-rate for assigning
>>>> numbers in the groups a message is in, then as with regards to threading
>>>> and bucketing, about how to result these sorts ideas sort of building up
>>>> from "the utility of bitmaps" to this "patterns in range" and "region calculus",
>>>> here though what's to result partially digested intermediate results for an
>>>> overall concatenation strategy then for selection and analysis,
>>>> all entirely write-once-read-many.
>>>>
>>>> It's figured that Authors will write and somebody will eventually read them,
>>>> with regards to that readings and replies result the Author born as New
>>>> and then maturing to Old, what results after Author infancy, to result
>>>> a usual sort of idea that Authors that read Bad are likely enough Bad themselves.
>>>>
>>>> I.e., there's a sort of hysteresis to arrive at born as New, in a group,
>>>> then a sort of gentle infancy to result Old, or Off, in a group, as
>>>> with regards to the purgatory of Non or banishment of Bad.
>>>>
>>>> happy case:
>>>> Non -> New -> Old (good)
>>>> Non -> Bad (bad)
>>>>
>>>> Old -> Off
>>>> Off -> Old
>>>>
>>>>
>>>> The idea's that nobody's a moderator, but anybody's a reviewer,
>>>> and correspondent, then that correspondents to spam or Bad get
>>>> the storage of a signed quantity, about the judgment, of what
>>>> is spam, in the error modes.
>>>>
>>>> error modes:
>>>> Non -> false New
>>>> Non -> false not Bad
>>>>
>>>>
>>>> New -> Bad
>>>> Old -> Bad
>>>>
>>>> (There's that reviewers and correspondents
>>>> Old <-> Old
>>>> Off <-> Old
>>>> Old <-> Off
>>>> Off <-> Off
>>>> result those are all same O <-> O.)
>>>>
>>>> The idea's that nobody's a moderator, and furthermore then all
>>>> the rules of the ignorance of Non and banishment of Bad,
>>>> then though are as how to arrive at that Non's, get a chance
>>>> to be reviewed by Old/Off and New, with respect to New and New
>>>> resulting also the conditions of creation, of a group, vis-a-vis,
>>>> the conditions of continuity, of a group.
>>>>
>>>>
>>>> I.e. the relations should so arise that creating a group and posting
>>>> to it, should result "Originator" or a sort of class of Old, about these
>>>> ideas of the best sort of reasonable performance and long-lived scalability
>>>> and horizontal scalability, that results interpreting any usual sort of
>>>> messaging with message-ID's and authors, in a reference algorithm
>>>> and error-detection and error-correction, "NOOBNB".
>>>>
>>>> There's an idea that Bot replies to new posters, "the Nota Bene",
>>>> but, another that Bot replies to Non and Bad, and another that
>>>> there's none of that at all, or not guaranteed.
>>>>
>>>>
>>>> Then, the idea is that this is matters of convention and site policy,
>>>> what it results exactly the same as a conformant Usenet peer,
>>>> in "NOOBNB compeering: slightly less crap".
>>>>
>>>>
>>>> Then, getting into relating readings (reviews) and correspondence
>>>> as a matter of site policy in readings or demonstration in correspondence,
>>>> results largely correspondence discriminates Old from Bad, and New from Non.
>>>>
>>>> Then as "un-moderated" there's still basically "site-policy",
>>>> basically in layers that result "un-abuse", "dis-abuse".
>>>>
>>>> I.e. the disabusement of abuse, is of this Old <-> Off for the venial,
>>>> and about the ceremony of infancy via some kind of interaction
>>>> or the author's own origination, about gating New, then figuring
>>>> that New matures to Old and then the compute cost is on News,
>>>> that long-running conversations result constants, called stability.
>>>>
>>>> Well I'm curious your opinion of this sort of approach, it's basically as of
>>>> defining conventions of common messaging, what result a simplest
>>>> and most-egalitarian common resource of correspondents in _belles lettres_.
>>> Then it seems the idea is to have _three_ editions,
>>>
>>> Cur: current, curated, New/Old/Off
>>> Pur: purgatory, Non/New/Old/Off
>>> Raw: raw, Non/New/Old/Off/Bot/Bad
>>>
>>> Then, the idea for bot, seems to be for system, to have delegations,
>>> of Bot to Old, with respect to otherwise usually the actions of Old,
>>> to indicate correspondence.
>>>
>>> Then, with regards to review, it would sort of depend on some Old
>>> or Off authors reviewing Pur, with regards to review and/or correspondence,
>>> what results graduating Non to New, then that it results that
>>> there's exactly a sort of usual write-once-read-many, common
>>> backing store well-defined by presence in access (according to filesystem).
>>>
>>>
>>>
>>> Then, for the groups files, it's figured there's the main message-Id's,
>>> as with respect to cur/pur/raw, then with regards to author's on the
>>> groups, presence in the authors files indicating Old, then with regards
>>> to graduation Non to New and New to Old.
>>>
>>> Keeping things simple, then the idea is to make it so that usual New
>>> have a way to graduate from Non, where there is or isn't much traffic
>>> or is or isn't much attention paid to Pur.
>>>
>>> The idea is that newbies log on to Pur, then post there on their own
>>> or in replies to New/Old/Off, that thus far this is entirely of a monadic
>>> or pure function the routine, which is thusly compile-able and parallelizable,
>>> and about variables in effect, what result site policy, and error modes.
>>>
>>>
>>> There's an idea that Non's could reply to their own posts,
>>> as to eventually those graduating altogether, or for example
>>> just that posting is allowed, to Pur, until marked either New or Bad.
>>>
>>>
>>> The ratio of Bad+Non+Bot to Old+Off+New, basically has that it's figured
>>> that due to attacks like the one currently underway from Google Groups,
>>> would be non-zero. The idea then is whether to grow the groups file,
>>> in the sequence of all message-IDs, and whether to maintain one edition
>>> of the groups file, and ever modify it in place, that here the goal is instead
>>> growing files of write-once-read-many, and because propagation is permanent.
>>>
>>> Raw >= Pur >= Cur
>>>
>>> I.e., every message-id gets a line in the raw feed, that there is one, then as
>>> with regards to whether the line has reserved characters, where otherwise
>>> it's a fixed-length record up above the maximum length of message-id,
>>> the line, of the groups file, the index of its message-numbers.
>>>
>>>
>>> See, the idea here is a sort of reference implementation, and a normative implementation,
>>> in what are fungible and well-defined resources, here files, with reasonable performance
>>> and horizontal scale-ability and long-time performance with minimal or monotone maintenance.
>>>
>>> Then the files are sort of defined as either write-once and final or write-once and growing,
>>> given that pretty much unbounded file resources result a quite most usual runtime.
>>>
>>>
>>>
>>> Don't they already have one of these somewhere?
>> I suppose the idea is to have that Noobs post to alt.test, then as with regards to
>> various forms to follow, like:
>>
>> I read the charter
>> I demonstrated knowledge of understanding the charter's definitions and intent
>> I intend to follow the charter
>>
>> How I do or don't is my own business, how others do or don't is their own business
>>
>> I can see the exclusion rules
>> I understand not to post against the exclusion rules
>> I understand that the exclusion rules are applied unconditionally to all
>>
>> ... is basically for a literacy test and an etiquette assertion.
>>
>>
>> Basically making for shepherding Noobs through alt.test, or that people who post
>> in alt.test aren't Noobs, yet still I'm not quite sure how to make it for usual first-time
>> posters, how to get them out of Purgatory to New. (Or ban them to Bad.)
>>
>> This is where federated ingestion basically will have that in-feeds are either
>>
>> these posts are good,
>> these posts are mixed,
>> these posts are bad,
>>
>> with regards then to putting them variously in Cur, Pur, Raw.
>>
>> Then, there's sort exclusions and bans, with regards to posts, and authors.
>> This is that posts are omitted by exclusion, authors' posts are omitted by ban.
>>
>> Then, trying to associate all the author's of a mega-nym, in this case
>> the Google's spam flood to make a barrier-to-entry of having open communications,
>> is basically attributing those as a class those authors to a banned mega-nym.
>>
>> Yet, then there is the use case of identity fraud's abuses, disabusing an innocent dupe,
>> where logins basically got hacked or the path to return to innocence.
>>
>>
>> This sort of results a yes/no/maybe for authors, sort of like:
>>
>> yes, it's a known author, it's unlikely they are really bad
>> (... these likely frauds are Non's?)
>>
>> no, it's a known excluded post, open rules
>> no, it's a known excluded author, criminal or a-topical solicitation
>> no, it's a new excluded author, associated with an abstract criminal or a-topical solicitation
>>
>> maybe (yes), no reason why not
>>
>> that a "rules engine" is highly efficient deriving decisions yes/no/maybe,
>> in both execution and maintenance of the rules (data plane / control plane).
>>
>> Groups like sci.math have a very high bar to participation, literacy
>> in mostly English and the language of mathematics. Groups have
>> a very low bar to pollution, all else.
>>
>> So, figuring out a common "topicality standard", here is the idea to associate
>> concepts with charter with topicality, then for of course a very loose and
>> egalitarian approach to participation, otherwise free.
>>
>> (Message integrity, irrepudiability, free expression, free press, free speech,
>> not inconsequence, nor the untrammeled.)
>
>
> Well, "what is spam", then, I suppose sort of follows from the
> "spam is a word coined on Usenet for unsolicated a-topical posts",
> then the ideas about how to find spam, basically make for that
> there are some ways to identify these things.
>
> The ideas of
> cohort: a group, a thread, a poster
> cliques: a group, posts that reply to each other
>
> Then
> content: words and such
> clicks: links
>
> Here the idea is to categorize content according to cohorts and cliques,
> and content and clicks,
>
> It's figured that all spam has clicks in it, then though that of course clicks
> are the greatest sort of thing for hypertext, with regards to
>
> duplicate links
> duplicate domains
>
> and these sorts of things.
>
> The idea is that it costs resources to categorize content, is according
> to the content, or the original idea that "spam must be identified by
> its subject header alone", vis-a-vis the maintenance of related data,
> and the indicator of matching various aspects of relations in data.
>
> So, clicks seem the first way to identify spam, basically that a histogram
> of links by their domain and path, results duplicates are spam, vis-a-vis,
> that clicks in a poster's sig or repeated many times in a long thread, are not.
>
> In this sense there's that posts are collections of their context,
> about how to make an algorithm in best effort to relate context
> to the original posts, usually according to threading.
>
> The idea here is that Non's can be excluded when first of all they
> have links, then for figuring that each group has usual sites that
> aren't spam, like their youtube links or their doc repo links or their
> wiki links or their arxiv or sep or otherwise, usual sorts good links,
> while that mostly it's the multiplicity of links that represent a spam attack,
> then just to leave all those in Purgatory.
>
> It's figured then that good posters when they reach Old, pretty much
> are past spamming, then about that posters are New for quite a while,
> and have some readings or otherwise mature into Old, about that
> simply Old and Off posters posts go right through, New posters posts
> go right through, then to go about categorizing for spam, excluding spam.
>
>
> I.e., the "what is spam", predicate, is to be an open-rules sort of composition,
> that basically makes it so that spamverts would be ineffective because
> spammers exploit lazy and if their links don't go through, get nothing.
>
> Then, there's still "what is spam" with regards to just link-less spam,
> about that mostly it would be about "repeated junk", that "spam is not unique".
> This is the usual notion of "signal to noise", basically finding whether
> it's just noise in Purgatory, that signal in Purgatory is a good sign of New.
>
> So, "what is spam" is sort of "what is not noise". Again, the goal is open-rules
> normative algorithms that operate on write-once-read-many graduated feeds,
> what result that the Usenet compeering, curates its federated ingress, then
> as for feeding its out-feed, with regards to other Usenet compeers following
> the same algorithm, then would get the same results.
>
> Then, the file-store might still have copies of all the spams, with the idea then
> that it's truncatable, because spam-campaigns are not long-running for archival,
> then to drop the partitions of Purgatory and Raw, according to retention.
> This then also is for fishing out what are Type I / Type II errors, about promoting
> from Non to New or also about the banishment of Non to Bad, or, Off to Bad.
> I.e., there's not so much "cancel", yet there's still for "no-archive", about how
> to make it open and normative how these kinds of things are.
>
> Luckily the availability of unbounded in size filesystems is pretty large these days,
> and, implementing things write-once-read-many, makes for pretty simple routines
> that make maintenance.
>
>
> It's like "whuh how do I monetize that?" and it's like "you don't", and "you figure
> that people will buy into free speech, free association, and free press".
> You can make your own front-end and decorate it with what spam you want,
> it just won't get federated back in the ingress of this Usenet Compeerage.
>
> Then it's like "well I want to only see Archimedes Plutonium and his co-horts"
> then there's the idea that there's to be generated some files with relations,
> the summaries and histrograms, then for those to be according to time-series
> buckets, making tractable sorts metadata partially digested, then for making
> digests of those, again according to normative algorithms with well-defined
> access patternry and run-times, according to here pretty a hierarchical file-system.
> Again it's sort of a front-end thing, with surfacing either the back-end files
> or the summaries and digests, for making search tractable in many dimensions.
>
> So, for the cohort, seems for sort of accumulated acceptance and rejection,
> about accepters and rejectors and the formal language of hierarchical data
> that's established by its presence and maintenance, about "what is spam"
> according to the entire cohort, and cliques, then with regards to Old/Off
> and spam or Non, with regards to spam and Bad.
>
> So, "what is spam" is basically that whatever results excluded was spam.
>
>
>
>
>
>
>

Well, with the great spam-walling of 2024 well underway, it's a bit too late
to set up very easy personal Internet, but the Internet text protocols are still
pretty simple, and so is implementing standards-based network-interoperable
systems, and there are still even some places where you can plug into the
network and run your own code.

So anyways the problem with the Internet today is that anything that's
public-facing can expect to get mostly not-want-traffic, where the general
idea is to only get want-traffic.

So, it looks like any sort of public-facing port is in question. TCP/IP sockets,
for connection-oriented protocols like the Internet protocols here, are basically
for the concept that the two participants in a client-server or two-way
communication are each "host" and "port", then as for protocol, and as with
respect to the binding of ports and sockets, or the 7-layer ISO model of
networking abstraction. Here it's hosts and ports, or what result IP addresses
and packets destined for ports, those multiplexed and reassembled by the
TCP/IP protocol stacks on the usual commodity hardware's operating systems,
otherwise with respect to network devices, their addresses in accord with the
network topology's connection and routing logic. Otherwise a connection-oriented
protocol is in terms of listening and ephemeral ports, its sockets or Address
Family UNIX sockets, and packets and the TCP/IP protocol semantics of the NICs
and their UARTs, as with regards to the usual intrusive middleware like PPP,
NAT, BGP, and other stuff in the way of IP, IPv4, and IPv6.


Thus, for implementing a server, the basic idea is that, beyond simply accepting
connections, the framework is to have at least enough knowledge of the semantics
of TCP/IP, and of the origin of requests, to implement a sort of "Load Shed"
or "Load Hold", where Load Shedding is to dump not-want-traffic and Load
Holding is to feed it very small packets at very infrequent intervals within
socket timeouts, while dropping immediately anything it sends and using
absolutely minimal resources otherwise in the TCP/IP stack, to basically give
unwanted traffic a connection that never completes, as a sort of
passive-aggressive response to unwanted traffic. "This light never changes."
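
A sort of minimal sketch of that in Java NIO, where the drop-list, the port,
and the dribble interval are only assumptions for illustration: shed closes an
unwanted connection outright, hold parks it and feeds it a byte now and then
so it never completes:

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.util.*;

public class ShedHoldServer {
    static final Set<String> DROP_LIST = Set.of("203.0.113.7");  // assumed not-want hosts
    static final boolean HOLD = true;                            // policy: hold instead of shed
    static final List<SocketChannel> held = new ArrayList<>();

    public static void main(String[] args) throws IOException {
        Selector sel = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(1119));
        server.configureBlocking(false);
        server.register(sel, SelectionKey.OP_ACCEPT);
        ByteBuffer dribble = ByteBuffer.wrap(new byte[]{' '});
        long lastDribble = 0;

        while (true) {
            sel.select(1000);
            for (Iterator<SelectionKey> it = sel.selectedKeys().iterator(); it.hasNext(); ) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isAcceptable()) {
                    SocketChannel ch = server.accept();
                    if (ch == null) continue;
                    ch.configureBlocking(false);
                    String host = ((InetSocketAddress) ch.getRemoteAddress()).getHostString();
                    if (DROP_LIST.contains(host)) {
                        if (HOLD) held.add(ch);                  // hold: never read, never complete
                        else ch.close();                         // shed: dump it immediately
                    } else {
                        ch.register(sel, SelectionKey.OP_READ);  // want-traffic: handle normally
                    }
                } else if (key.isReadable()) {
                    key.interestOps(0);  // parked; a real server hands this to the protocol Reader
                }
            }
            if (System.currentTimeMillis() - lastDribble > 30_000) {   // every 30 seconds
                lastDribble = System.currentTimeMillis();
                for (Iterator<SocketChannel> hi = held.iterator(); hi.hasNext(); ) {
                    SocketChannel ch = hi.next();
                    try { dribble.rewind(); ch.write(dribble); }       // tiny packet, no progress
                    catch (IOException e) {
                        hi.remove();
                        try { ch.close(); } catch (IOException ignored) { }
                    }
                }
            }
        }
    }
}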


So, for Linux it's sockets, for Windows it's like WSASocket, and for Java it's
java.nio.channels.SocketChannel, where the socket basically has responsibilities
for happy-case want-traffic, and enemy-case not-want-traffic.


Then, in my general design for Internet protocol network interfaces, what I have
filled in here is basically this sort of

Reader -> Scanner -> Executor -> Printer -> Writer

where the notion of the "home office equipment", like the multi-function device,
has in metaphor that the throughput is considered like a combination
scanner/printer fax-machine. Then the idea is that there needs to be some sort
of protection mostly on the front, basically that the "Hopper" has the infeed
and outfeed hoppers, or with the Stamper at the end, figuring the Hopper does
Shed/Hold, or Shed/Fold/Hold, while the Stamper does the encryption and
compression, where Encryption and Compression are simply regular concerns that
result plain Internet protocol text (and, binary) commands in the middle.

Hopper -> Reader -> Scanner -> Executor -> Printer -> Writer
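
A sort of minimal sketch of those stages as Java interfaces, where the names
follow the pipeline above and the signatures are only assumptions:

import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

interface Hopper   { boolean admit(String remoteHost); }        // Shed/Fold/Hold at the front
interface Reader   { ByteBuffer read(SocketChannel ch) throws Exception; }
interface Scanner  { String scan(ByteBuffer raw); }             // frame one protocol command
interface Executor { String execute(String command); }          // run it against the backing store
interface Printer  { ByteBuffer print(String response); }       // format the protocol response
interface Writer   { void write(SocketChannel ch, ByteBuffer out) throws Exception; }
interface Stamper  { ByteBuffer stamp(ByteBuffer out); }        // compression/encryption at the end

Encryption and compression then sit in the Stamper, leaving plain protocol
text (and binary) commands in the middle, as described.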

Then, for Internet protocols like SMTP, NNTP, IMAP, HTTP, the usual sorts of
request/response client/server protocols, I suppose I should wonder about
multiplexing connections, though HTTP/2 really is just about multiple calls with
pretty much the same session. Getting into the affinity of sessions, about
client/server protocols, logins, requests/responses, and sessions, here the idea
is pretty much implementing a machine, for implementing protocol, for the
half-dozen usual messaging and web-service protocols mentioned above, and a
complement of their usual options, implementing a sort of usual process designed
to be exposed on its own port, resulting a sort of shatter-proof protocol
implementation, figuring the Internet is an ugly place and the Hopper is
regularly clearing the shit out of the front.

So anyways, then, how to go about implementing a want-traffic feed is basically
the white-list approach, from the notion that there is want and not-want, but
not to be racist, basically a want-list approach, and a drop-list. The idea is
that you expect to get email from people you've sent email to, or their domain,
and then, sometimes when you plan to expect an email, the idea is to just
maintain a window and put in terms what you expect to get or expect to have
recently gotten, then to fish those out from all the trash, basically over time
putting in the matches for the account, so that, for messages to the account,
given matches the messages surface, otherwise pretty much just maintaining a
rotating queue of junk that dumps off the junk when it rotates, while basically
keeping a copy of the incoming junk, for as necessary looking through the junk
for the valuable message.
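
A sort of minimal sketch of that in Java: matches on the want-list surface the
message, everything else goes into a bounded rotating junk queue that drops the
oldest as it rotates; the names and the capacity are only assumptions:

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;

public class WantList {
    private final Set<String> wantedSenders;      // addresses or domains you've written to
    private final Deque<String> junk = new ArrayDeque<>();
    private final int junkCapacity;

    public WantList(Set<String> wantedSenders, int junkCapacity) {
        this.wantedSenders = wantedSenders;
        this.junkCapacity = junkCapacity;
    }

    /** Returns true if the message should surface; otherwise it's parked as junk for a while. */
    public boolean ingest(String fromAddress, String messageId) {
        String domain = fromAddress.substring(fromAddress.indexOf('@') + 1);  // whole string if no '@'
        if (wantedSenders.contains(fromAddress) || wantedSenders.contains(domain)) {
            return true;                          // ham: surface it
        }
        if (junk.size() >= junkCapacity) {
            junk.removeFirst();                   // rotate: oldest junk falls off
        }
        junk.addLast(messageId);                  // keep a copy findable for a while
        return false;
    }

    public Deque<String> junkWindow() { return junk; }
}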


The Internet protocols then, for what they are at the messaging level or
user land of the user-agents, have a great affinity and common implementation.

SMTP -> POP|IMAP

IMAP -> NNTP

NNTP
HTTP -> NNTP
HTTP -> IMAP -> NNTP

SMTP -> NNTP
NNTP -> SMTP


I'm really quite old-fashioned, and sort of rely on natural written language,
while, still, there's the idea that messages are arbitrarily large and of
arbitrary format and of arbitrary volume over an arbitrary amount of time, or
'unbounded' if '-trary' sounds too much like 'betrayedly', with the notion that
there's basically small storage and large storage, small buffers and large
buffers, and bounds, called quota or limits, so to result that usual functional
message-passing systems among small groups of people using modest amounts of
resources can distance themselves from absolute buffoons HDTV'ing themselves
picking their noses.

So, back to the Hopper, or Bouncer: the idea is that everything gets in an
input queue, because spam-walls can't necessarily be depended on to let in the
want-traffic. Then the want-list (guest-list) is used to bring those in, to sort
of again what results this "NOOBNB" layout, so it sort of results again a common
sort of "NOOBNB BFF/SFF" layout, so that the layout can be serialized and torn
down and set back up and commenced the same, serialized.

Then, this sort of "yes/no/maybe" (sure/no/yes, "wildmat"), has the idea
of that still there
can be consulted any sorts accepters/rejectors, and it builds a sort of
easy way to make
for the implementation, that it can result an infeed and conformant
agent, on the network,
while both employing opt-in sort spam-wall baggage, or, just winging it
and picking ham deliberately.
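
A sort of minimal sketch of wildmat-style matching in Java, where a
comma-separated list of glob patterns is consulted, "!" negates, and the last
pattern that matches decides; this is a simplification of the full NNTP
wildmat rules:

import java.util.regex.Pattern;

public class Wildmat {
    /** Returns true if name is accepted by the wildmat expression, e.g. "sci.*,!sci.crypt". */
    public static boolean matches(String wildmat, String name) {
        boolean accepted = false;
        for (String pat : wildmat.split(",")) {
            boolean negate = pat.startsWith("!");
            String glob = negate ? pat.substring(1) : pat;
            // translate the glob: '*' -> '.*', '?' -> '.', everything else literal
            String regex = Pattern.quote(glob).replace("*", "\\E.*\\Q").replace("?", "\\E.\\Q");
            if (Pattern.matches(regex, name)) {
                accepted = !negate;               // last matching pattern wins
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        System.out.println(matches("sci.*,!sci.crypt", "sci.math"));   // true
        System.out.println(matches("sci.*,!sci.crypt", "sci.crypt"));  // false
    }
}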

In this manner NOOBNB is sort of settling into the idea of the physical layout,
then the idea of this Load: Roll/Fold/Shed/Hold is for sorts of policies of
"expect happy case", "expect usual case", "forget about it", and "let them
think about it".

The idea here is sort of to design modes of the implementation of the protocols,
in simple and easy-to-remember terms like "NOOBNB", "BFF/SFF",
"Roll/Fold/Shed/Hold", which result pragmatic and usual happy-case Internet
protocols, on an Internet full of fat-cats spam-walling each other, getting in
the way of the ham. (That "want" is ham, and "not-want" is spam.)
"Ham is not spam, spam is spiced canned ham."


Then, after the Internet protocols sitting behind a port on a host with an
address, where the address is static or dynamic in the usual sense, but every
host has one, vis-a-vis networks and routing, the next thing to figure out is
DNS, the name of the host, with respect to the overall infrastructure of the
implementation of agents, in the protocols, on the network, in the world.

Then, I don't know too much about DNS, with respect to the fact that in the old
days it was sort of easy to register in DNS, while these days becoming a
registrar is pretty involved. So after hiring some CPU+RAM+DISK+NET sitting on
a single port (then for its ephemeral connections as up above that, but ports
entirely in the protocol), with an address, the question is how to get traffic
pointed at the address, by surfacing its address in DNS, or just making an
intermediary service for the discovery of addresses and ports and configuring
one's own DNS resolver, but here of course to keep things simple for
publicly-facing services that are good actors on the network and in Internet
protocols.

So I don't know too much about DNS, and it deserves some more study. Basically
the DNS resolver algorithm makes lookups into a file, the zone file (or the
local hosts file), and thusly a DNS resolver results addresses, or hosts to look
up for addresses, and sorts of DNS records, like the "Mail Exchanger" record,
or "the A record", "the CNAME", "various text attributes", "various
special-purpose attributes", then DNS resolvers will mostly look those up to
point at them the proxies they insert, then present those as addresses at the
DNS resolver. (Like I said, the Internet protocols are pretty simple.)

So, for service discovery, pretty much it looks like the DNS "authoritative name
server" is basically to be designed for the idea that there are two user-agents
that want to connect over the Internet and they're happy, and then anything else
that connects is the usual. So there's basically the idea that the authoritative
name server is to work itself up in the DNS protocols, so it results that
anybody using the addresses of its names will have found itself with some
reverse lookups or something like that, helping meet in the middle.

https://en.wikipedia.org/wiki/Domain_Name_System

RR Resource Records
SOA Start of Authority
A, AAAA IP addresses
MX, Mail Exchanger
NS, Name Server
PTR, Reverse DNS Lookups
CNAME, domain name aliases

RP Responsible Person
DNSSEC
TXT ...
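
A sort of minimal sketch of looking up a few of those record types from Java,
via the JDK's JNDI DNS provider; the domain here is only an example:

import javax.naming.Context;
import javax.naming.directory.Attributes;
import javax.naming.directory.InitialDirContext;
import java.util.Hashtable;

public class DnsLookup {
    public static void main(String[] args) throws Exception {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.dns.DnsContextFactory");
        env.put(Context.PROVIDER_URL, "dns:");   // use the system's configured resolvers

        InitialDirContext ctx = new InitialDirContext(env);
        Attributes attrs = ctx.getAttributes("example.com", new String[]{"A", "MX", "NS", "TXT"});
        System.out.println("A:   " + attrs.get("A"));
        System.out.println("MX:  " + attrs.get("MX"));
        System.out.println("NS:  " + attrs.get("NS"));
        System.out.println("TXT: " + attrs.get("TXT"));
        ctx.close();
    }
}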

("Unsolicited email"? You mean lawyers and whores won't even touch them?)

So, DNS runs over both UDP and TCP, so there's making the Name Server basically
so that, for anybody who comes looking for a domain, it should result that
there's the high-availability Name Server, special-purpose for managing address
resolution, within the context of name caching, with regards to personal
Internet services designed to run reliably and correctly in a more-or-less very
modest and ad-hoc fashion. (Of primary importance in any Internet protocol
implementation is to remain a good actor on the network, of course among other
important things like protecting the users, the agents, their persons.)

https://en.wikipedia.org/wiki/BIND

"BIND 9 is intended to be fully compliant with the IETF DNS standards
and draft standards."

https://datatracker.ietf.org/wg/dnsop/documents/

Here the point seems to be to make it mostly so that responses fit in a single
user datagram or packet, with regards to the UDP implementation, while the TCP
implementation is according to this sort of "HRSEPW" (Hopper, Reader, Scanner,
Executor, Printer, Writer) throughput model.

I.e. mostly the role here is for personal Internet services, not surfacing a
vended layer of a copy of the Internet for a wide proxy all snuffling the host.
(Though, that also has its role, for example creating wide and deep traffic
sniffing, and for example buddy-checking equivalent views of the network,
twisting up TLS exercises and such. If you've read the manuals, ....)


Lots of the DNS standards these days are designed to aid the giants, keeping
them from clobbering each other; here the goal mostly is effective industrious
ants, effective industrious and idealistic ants, dedicated to their gents.


So, "dnsops" is way too much specifications to worry about, instead just
reading through those to arrive at what's functionally correct,
and peels away to be correct backwards.

https://datatracker.ietf.org/doc/draft-ietf-dnsop-rfc8499bis/

"The Domain Name System (DNS) is defined in literally dozens of
different RFCs."

Wow, imagine the reading, ....

"This document updates RFC 2308 by clarifying the definitions of
"forwarder" and "QNAME"."


"In this document, the words "byte" and "octet" are used interchangably. "


"Any path of a directed acyclic graph can be
represented by a domain name consisting of the labels of its
nodes, ordered by decreasing distance from the root(s) (which is
the normal convention within the DNS, including this document)."

The goal seems to be implementation of a Name Server with quite correct caching
and currency semantics, TTLs, and particularly with regards to the Mail
Exchanger, reflecting on a usual case of mostly receiving in a spam-filled,
spam-walled world, while occasionally sending or posting in a modest and
personal fashion, in accord with what protocols result well-received ham.

"The header of a DNS message is its first 12 octets."

"There is no formal definition of "DNS server", but RFCs generally
assume that it is an Internet server that listens for queries and
sends responses using the DNS protocol defined in [RFC1035] and its
successors."

So, it seems that for these sorts of personal Internet services, the idea is
that a DNS Name Server is the sort of long-running and highly-available thing
to provision, with regards to it being exceedingly small and fast, and brief in
implementation, then as with regards to it tenanting the lookups for the various
and varying, running on-demand or under expectations. (E.g., with the sentinel
pattern, accepting a very small amount of traffic while starting up a larger
dedicated handler, or making for the sort of sentinel-to-wakeup or
wakeup-on-service pattern.)

https://en.wikipedia.org/wiki/DNS_Certification_Authority_Authorization
https://en.wikipedia.org/wiki/Incident_Object_Description_Exchange_Format


Then it looks like I'm supposed to implement Session Initiation Protocol, and
have it do service discovery and relation or Dynamic DNS, but I sort of despise
Session Initiation Protocol as it's so abused and twisted; yet, there's some
idea to make a localhost server that fronts personal Internet agents, that could
drive off either SIP or DDNS, vis-a-vis starting up the agents on demand, as
with respect to running the agents essentially locally and making them
peer-to-peer.

https://en.wikipedia.org/wiki/Zero-configuration_networking#DNS-based_service_discovery

But, it's simplest to just have a static IP and then run the agents as an MTA,
here given that the resources are so cheap that personal Internet agents are
economical, or, where anything resolves to a host and a well-known port, to
virtualize that to well-known ports at an address.

PIA: in the interests of PII.
Ross Finlayson
2024-02-08 21:04:11 UTC
Reply
Permalink
On 03/08/2023 08:51 PM, Ross Finlayson wrote:
> On Monday, December 6, 2021 at 7:32:16 AM UTC-8, Ross A. Finlayson wrote:
>> On Monday, November 16, 2020 at 5:39:08 PM UTC-8, Ross A. Finlayson wrote:
>>> On Monday, November 16, 2020 at 5:00:51 PM UTC-8, Ross A. Finlayson wrote:
>>>> On Tuesday, June 30, 2020 at 10:00:52 AM UTC-7, Mostowski Collapse wrote:
>>>>> NNTP is not HTTP. I was using bare metal access to
>>>>> usenet, not using Google group, via:
>>>>>
>>>>> news.albasani.net, unfortunately dead since Corona
>>>>>
>>>>> So was looking for an alternative. And found this
>>>>> alternative, which seems fine:
>>>>>
>>>>> news.solani.org
>>>>>
>>>>> Have Fun!
>>>>>
>>>>> P.S.: Technical spec of news.solani.org:
>>>>>
>>>>> Intel Core i7, 4x 3.4 GHz, 32 GB RAM, 500 GB SSD RAID1
>>>>> Intel Core i7, 4x 3.4 GHz, 32 GB RAM, 2 TB HDD RAID1
>>>>> Intel Xeon VM, 1.8 GHz, 1 GB RAM, 20 GB
>>>>> Standort: 2x Falkenstein, 1x New York
>>>>>
>>>>> advantage of bare metal usenet,
>>>>> you see all headers of message.
>>>>> Am Dienstag, 30. Juni 2020 06:24:53 UTC+2 schrieb Ross A. Finlayson:
>>>>>> Search you mentioned and for example HTTP is adding the SEARCH verb,
>>>> In traffic there are two kinds of usenet users,
>>>> viewers and traffic through Google Groups,
>>>> and, USENET. (USENET traffic.)
>>>>
>>>> Here now Google turned on login to view their
>>>> Google Groups - effectively closing the Google Groups
>>>> without a Google login.
>>>>
>>>> I suppose if they're used at work or whatever though
>>>> they'd be open.
>>>>
>>>>
>>>>
>>>> Where I got with the C10K non-blocking I/O for a usenet server,
>>>> it scales up though then I think in the runtime is a situation where
>>>> it only runs epoll or kqueue that the test scale ups, then at the end
>>>> or in sockets there is a drop, or it fell off the driver. I've implemented
>>>> the code this far, what has all of NNTP in a file and then the "re-routine,
>>>> industry-pattern back-end" in memory, then for that running usually.
>>>>
>>>> (Cooperative multithreading on top of non-blocking I/O.)
>>>>
>>>> Implementing the serial queue or "monohydra", or slique,
>>>> makes for that then when the parser is constantly parsing,
>>>> it seems a usual queue like data structure with parsing
>>>> returning its bounds, consuming the queue.
>>>>
>>>> Having the file buffers all down small on 4K pages,
>>>> has that a next usual page size is the megabyte.
>>>>
>>>> Here though it seems to make sense to have a natural
>>>> 4K alignment the file system representation, then that
>>>> it is moving files.
>>>>
>>>> So, then with the new modern Java, it that runs in its own
>>>> Java server runtime environment, it seems I would also
>>>> need to see whether the cloud virt supported the I/O model
>>>> or not, or that the cooperative multi-threading for example
>>>> would be single-threaded. (Blocking abstractly.)
>>>>
>>>> Then besides I suppose that could be neatly with basically
>>>> the program model, and its file model, being well-defined,
>>>> then for NNTP with IMAP organization search and extensions,
>>>> those being standardized, seems to make sense for an efficient
>>>> news file organization.
>>>>
>>>> Here then it seems for serving the NNTP, and for example
>>>> their file bodies under the storage, with the fixed headers,
>>>> variable header or XREF, and the message body, then under
>>>> content it's same as storage.
>>>>
>>>> NNTP has "OVERVIEW" then from it is built search.
>>>>
>>>> Let's see here then, if I get the load test running, or,
>>>> just put a limit under the load while there are no load test
>>>> errors, it seems the algorithm then scales under load to be
>>>> making usually the algorithm serial in CPU, with: encryption,
>>>> and compression (traffic). (Block ciphers instead of serial transfer.)
>>>>
>>>> Then, the industry pattern with re-routines, has that the
>>>> re-routines are naturally co-operative in the blocking,
>>>> and in the language, including flow-of-control and exception scope.
>>>>
>>>>
>>>> So, I have a high-performance implementation here.
>>> It seems like for NFS, then, and having the separate read and write of the client,
>>> a default filesystem, is an idea for the system facility: mirroring the mounted file
>>> locally, and, providing the read view from that via a different route.
>>>
>>>
>>> A next idea then seems for the organization, the client views themselves
>>> organize over the durable and available file system representation, this
>>> provides anyone a view over the protocol with a group file convention.
>>>
>>> I.e., while usual continuous traffic was surfing, individual reads over group
>>> files could have independent views, for example collating contents.
>>>
>>> Then, extracting requests from traffic and threads seems usual.
>>>
>>> (For example a specialized object transfer view.)
>>>
>>> Making protocols for implementing internet protocols in groups and
>>> so on, here makes for giving usenet example views to content generally.
>>>
>>> So, I have designed a protocol node and implemented it mostly,
>>> then about designed an object transfer protocol, here the idea
>>> is how to make it so people can extract data, for example their own
>>> data, from a large durable store of all the usenet messages,
>>> making views of usenet running on usenet, eg "Feb. 2016: AP's
>>> Greatest Hits".
>>>
>>> Here the point is to figure that usenet, these days, can be operated
>>> in cooperation with usenet, and really for its own sake, for leaving
>>> messages in usenet and here for usenet protocol stores as there's
>>> no reason it's plain text the content, while the protocol supports it.
>>>
>>> Building personal view for example is a simple matter of very many
>>> service providers any of which sells usenet all day for a good deal.
>>>
>>> Let's see here, $25/MM, storage on the cloud last year for about
>>> a million messages for a month is about $25. Outbound traffic is
>>> usually the metered cloud traffic, here for example that CDN traffic
>>> support the universal share convention, under metering. What that
>>> the algorithm is effectively tunable in CPU and RAM, makes for under
>> I/O that's it's "unobtrusive" or the cooperative in routine, for CPU I/O and
>>> RAM, then that there is for seeking that Network Store or Database Time
>>> instead effectively becomes File I/O time, as what may be faster,
>>> and more durable. There's a faster database time for scaling the ingestion
>>> here with that the file view is eventually consistent. (And reliable.)
>>>
>>> Checking the files would be over time for example with "last checked"
>>> and "last dropped" something along the lines of, finding wrong offsets,
>>> basically having to make it so that it survives neatly corruption of the
>>> store (by being more-or-less stored in-place).
>>>
>>> Content catalog and such, catalog.
>> Then I wonder and figure the re-routine can scale.
>>
>> Here for the re-routine, the industry factory pattern,
>> and the commands in the protocols in the templates,
>> and the memory module, with the algorithm interface,
>> in the high-performance computer resource, it is here
>> that this simple kind of "writing Internet software"
>> makes pretty rapidly for adding resources.
>>
>> Here the design is basically of a file I/O abstraction,
>> that the computer reads data files with mmap to get
>> their handlers, what results that for I/O map the channels
>> result transferring the channels in I/O for what results,
>> in mostly the allocated resource requirements generally,
>> and for the protocol and algorithm, it results then that
>> the industry factory pattern and making for interfaces,
>> then also here the I/O routine as what results that this
>> is an implementation, of a network server, mostly is making
>> for that the re-routine, results very neatly a model of
>> parallel cooperation.
>>
>> I think computers still have file systems and file I/O but
>> in abstraction just because PAGE_SIZE is still relevant for
>> the network besides or I/O, if eventually, here is that the
>> value types are in the commands and so on, it is besides
>> that in terms of the resources so defined it still is in a filesystem
>> convention that a remote and unreliable view of it suffices.
>>
>> Here then the source code also being "this is only 20-50k",
>> lines of code, with basically an entire otherwise library stack
>> of the runtime itself, only the network and file abstraction,
>> this makes for also that modularity results. (Factory Industry
>> Pattern Modules.)
>>
>> For a network server, here, that, mostly it is high performance
>> in the sense that this is about the most direct handle on the channels
>> and here mostly for the text layer in the I/O order, or protocol layer,
>> here is that basically encryption and compression usually in the layer,
>> there is besides a usual concern where encryption and compression
>> are left out, there is that text in the layer itself is commands.
>>
>> Then, those being constants under the resources for the protocol,
>> it's what results usual protocols like NNTP and HTTP and other protocols
>> with usually one server and many clients, here is for that these protocols
>> are defined in these modules, mostly there NNTP and IMAP, ..., HTTP.
>>
>> These are here defined "all Java" or "Pure Java", i.e. let's be clear that
>> in terms of the reference abstraction layer, I think computers still use
>> the non-blocking I/O and filesystems and network to RAM, so that as
>> the I/O is implemented in those it actually has those besides instead for
>> example defaulting to byte-per-channel or character I/O. I.e. the usual
>> semantics for servicing the I/O in the accepter routine and what makes
>> for that the platform also provides a reference encryption implementation,
>> if not so relevant for the block encoder chain, besides that for example
>> compression has a default implementation, here the I/O model is as simply
>> in store for handles, channels, ..., that it results that data especially delivered
>> from a constant store can anyways be mostly compressed and encrypted
>> already or predigested to serve, here that it's the convention, here is for
>> resulting that these client-server protocols, with usually reads > postings
>> then here besides "retention", basically here is for what it is.
>>
>> With the re-routine and the protocol layer besides, having written the
>> routines in the re-routine, what there is to write here is this industry
>> factory, or a module framework, implementing the re-routines, as they're
>> built from the linear description a routine, makes for as the routine progresses
>> that it's "in the language" and that more than less in the terms, it makes for
>> implementing the case of logic for values, in the logic's flow-of-control's terms.
>>
>> Then, there is that actually running the software is different than just
>> writing it, here in the sense that as a server runtime, it is to be made a
>> thing, by giving it a name, and giving it an authority, to exist on the Internet.
>>
>> There is basically that for BGP and NAT and so on, and, mobile fabric networks,
>> IP and TCP/IP, of course IPv4 and IPv6 are the coarse fabric main space, with
>> respect to what are CIDR and 24 bits rule and what makes for TCP/IP, here
>> entirely the course is using the TCP/IP stack and Java's TCP/IP stack, with
>> respect to that TCP/IP is so provided or in terms of process what results
>> ports mostly and connection models where it is exactly the TCP after the IP,
>> the Transport Control Protocol and Internet Protocol, have here both this
>> socket and datagram connection orientation, or stateful and stateless or
>> here that in terms of routing it's defined in addresses, under that names
>> and routing define sources, routes, destinations, ..., that routine numeric
>> IP addresses result in the usual sense of the network being behind an IP
>> and including IPv4 network fabric with respect to local routers.
>>
>> I.e., here to include a service framework is "here besides the routine, let's
>> make it clear that in terms of being a durable resource, there needs to be
>> some lockbox filled with its sustenance that in some locked or constant
>> terms results that for the duration of its outlay, say five years, it is held
>> up, then, it will be so again, or, let down to result the carry-over that it
>> invested to archive itself, I won't have to care or do anything until then".
>>
>>
>> About the service activation and the idea that, for a port, the routine itself
>> needs only run under load, i.e. there is effectively little traffic on the old archives,
>> and usually only the some other archive needs any traffic. Here the point is
>> that for the Java routine there is the system port that was accepted for the
>> request, that inetd or the systemd or means the network service was accessed,
>> made for that much as for HTTP the protocol is client-server also for IP the
>> protocol is client-server, while the TCP is packets. This is a general idea for
>> system integration while here mostly the routine is that being a detail:
>> the filesystem or network resource that results that the re-routines basically
>> make very large CPU scaling.
>>
>> Then, it is basically containerized this sense of "at some domain name, there
>> is a service, it's HTTP and NNTP and IMAP besides, what cares the world".
>>
>> I.e. being built on connection oriented protocols like the socket layer,
>> HTTP(S) and NNTP(S) and IMAP(S) or with the TLS orientation to certificates,
>> it's more than less sensible that most users have no idea of installing some
>> NNTP browser or pointing their email to IMAP so that the email browser
>> browses the newsgroups and for postings, here this is mostly only talk
>> about implementing NNTP then IMAP and HTTP that happens to look like that,
>> besides for example SMTP or NNTP posting.
>>
>> I.e., having "this IMAP server, happens to be this NNTP module", or
>> "this HTTP server, happens to be a real simple mailbox these groups",
>> makes for having partitions and retentions of those and that basically
>> NNTP messages in the protocol can be more or less the same content
>> in media, what otherwise is of a usual message type.
>>
>> Then, the NNTP server-server routine is the propagation of messages
>> besides "I shall hire ten great usenet retention accounts and gently
>> and politely draw them down and back-fill Usenet, these ten groups".
>>
>> By then I would have to have made for retention in storage, such contents,
>> as have a reference value, then for besides making that independent in
>> reference value, just so that it suffices that it basically results "a usable
>> durable filesystem that happens you can browse it like usenet". I.e. as
>> the pieces to make the backfill are dug up, they get assigned reference numbers
>> of their time to make for what here is that in a grand schema of things,
>> they have a reference number in numerical order (and what's also the
>> server's "message-number" besides its "message-id") as noted above this
>> gets into the storage for retention of a file, while, most services for this
>> are instead for storage and serving, not necessarily or at all retention.
>>
>> I.e., the point is that as the groups are retained from retention, there is an
>> approach what makes for an orderly archeology, as for what convention
>> some data arrives, here that this server-server routine is besides the usual
>> routine which is "here are new posts, propagate them", it's "please deliver
>> as of a retention scan, and I'll try not to repeat it, what results as orderly
>> as possible a proof or exercise of what we'll call afterward entire retention",
>> then will be for as of writing a file that "as of the date, from start to finish,
>> this site certified these messages as best-effort retention".
>>
>> It seems then besides there is basically "here is some mbox file, serve it
>> like it was an NNTP group or an IMAP mailbox", ingestion, in terms of that
>> what is ingestion, is to result for the protocol that "for this protocol,
>> there is actually a normative filesystem representation that happens to
>> be pretty much also altogether definede by the protocol", the point is
>> that ingestion would result in command to remain in the protocol,
>> that a usual file type that "presents a usual abstraction, of a filesystem,
>> as from the contents of a file", here with the notion of "for all these
>> threaded discussions, here this system only cares some approach to
>> these ten particular newsgroups that already have mostly their corpus
>> though it's not in perhaps their native mbox instead consulted from services".
>>
>> Then, there's for storing and serving the files, and there is the usual
>> notion that moving the data, is to result, that really these file organizations
>> are not so large in terms of resources, being "less than gigabytes" or so,
>> still there's a notion that as a durable resource they're to be made
>> fungible here the networked file approach in the native filesystem,
>> then that with respect to it's a backing store, it's to make for that
>> the entire enterprise is more or less to be made in terms of account,
>> that then as a facility on the network then a service in the network,
>> it's basically separated the facility and service, while still of course
>> that the service is basically defined by its corpus.
>>
>>
>> Then, to make that fungible in a world of account, while with an exit
>> strategy so that the operation isn't not abstract, is mostly about the
>> domain name, then that what results the networking, after trusted
>> network naming and connections for what result routing, and then
>> the port, in terms of that there are usual firewalls in ports though that
>> besides usually enough client ports are ephemeral, here the point is
>> that the protocols and their well-known ports, here it's usually enough
>> that the Internet doesn't concern itself so much protocols but with
>> respect to proxies, here that for example NNTP and IMAP don't have
>> so much anything so related that way after startTLS. For the world of
>> account, is basically to have for a domain name, an administrator, and,
>> an owner or representative. These are to establish authority for changes
>> and also accountability for usage.
>>
>> Basically they're to be persons and there is a process to get to be an
>> administrator of DNS, most always there are services that a usual person
>> implementing the system might use, besides for example the numerical.
>>
>> More relevant though to DNS is getting servers on the network, with respect
>> to listening ports and that they connect to clients what so discover them as
>> via DNS or configuration, here as above the usual notion that these are
>> standard services and run on well-known ports for inetd or systemd.
>> I.e. there is basically that running a server and dedicated networking,
>> and power and so on, and some notion of the limits of reliability, is then
>> as very much in other aspects of the organization of the system, i.e. its name,
>> while at the same time, the point that a module makes for that basically
>> the provision of a domain name or well-known or ephemeral host, is the
>> usual notion that static IP addresses are a limited resource and as about
>> the various networks in IPv4 and how they route traffic, is for that these
>> services have well-known sections in DNS for at least that the most usual
>> configuration is none.
>>
>> For a usual global reliability and availability, is some notion basically that
>> each region and zone has a service available on the IP address, for that
>> "hostname" resolves to the IP addresses. As well, in reverse, for the IP
>> address and about the hostname, it should resolve reverse to hostname.
>>
>> About certificates mostly for identification after mapping to port, or
>> multi-home Internet routing, here is the point that whether the domain
>> name administration is "epochal" or "regular", is that epochs are defined
>> by the ports behind the numbers and the domain name system as well,
>> where in terms of the registrar, the domain names are epochal to the
>> registrar, with respect to owners of domain names.
>>
>> Then if DNS is a datagram or UDP service is for ICMP as for TCP/IP,
>> and also BGP and NAT and routing and what are local and remote
>> addresses, here is for not-so-much "implement DNS the protocol
>> also while you're at it", rather for what results that there is a durable
>> and long-standing and proper doorman, for some usenet.science.
>>
>> Here then the notion seems to be whether the doorman basically
>> knows well-known services, is a multi-homing router, or otherwise
>> what is the point that it starts the lean runtime, with respect to that
>> it's a container and having enough sense of administration its operation
>> as contained. I.e. here given a port and a hostname and always running
>> makes for that as long as there is the low (preferable no) idle for services
>> running that have no clients, is here also for the cheapest doorman that
>> knows how to standup the client sentinel. (And put it back away.)
>>
>> Probably the most awful thing in the cloud services is the cost for
>> data ingress and egress. What that means is that for example using
>> a facility that is bound by that as a cost instead of under some constant
>> cost, is basically why there is the approach that the containers needs a
>> handle to the files, and they're either local files or network files, here
>> with the some convention above in archival a shared consistent view
>> of all the files, or abstractly consistent, is for making that the doorman
>> can handle lots of starting and finishing connections, while it is out of
>> the way when usually it's client traffic and opening and closing connections,
>> and the usual abstraction is that the client sentinel is never off and doorman
>> does nothing, here is for attaching the one to some lower constant cost,
>> where for example any long-running cost is more than some low constant cost.
>>
>> Then, this kind of service is often represented by nodes, in the usual sense
>> "here is an abstract container with you hope some native performance under
>> the hypervisor where it lives on the farm on its rack, it basically is moved the
>> image to wherever it's requested from and lives there, have fun, the meter is on".
>> I.e. that's just "this Jar has some config conventions and you can make the
>> container associate it and watchdog it with systemd for example and use the
>> cgroups while you're at it and make for tempfs quota and also the best network
>> file share, which you might be welcome to cache if you care just in the off-chance
>> that this file-mapping is free or constant cost as long as it doesn't egress the
>> network", is for here about the facilities that work, to get a copy of the system
>> what with respect to its usual operation is a piece of the Internet.
>>
>> For the different reference modules (industry factories) in their patterns then
>> and under combined configuration "file + process + network + fare", is that
>> the fare of the service basically reflects a daily coin, in the sense that it
>> represents an annual or epochal fee, what results for the time there is
>> what is otherwise all defined the "file + process + network + name",
>> what results it perpetuates in operation more than less simply and automatically.
>>
>> Then, the point though is to get it to where "I can go to this service, and
>> administer it more or less by paying an account, that it thus lives in its
>> budget and quota in its metered world".
>>
>> That though is very involved with identity, that in terms of "I the account
>> as provided this sum make this sum paid with respect to an agreement",
>> is that authority to make agreements must make that it results that the
>> operation of the system, is entirely transparent, and defined in terms of
>> the roles and delegation, conventions in operation.
>>
>> I.e., I personally don't want to administer a copy of usenet, but, it's here
>> pretty much sorted out that I can administer one once then that it's to
>> administer itself in the following, in terms of it having resources to allocate
>> and resources to disburse. Also if nobody's using it it should basically work
>> itself out to dial its lights down (while maintaining availability).
>>
>> Then a point seems "maintain and administer the operation in effect,
>> what arrangement sees via delegation, that a card number and a phone
>> number and an email account and more than less a responsible entity,
>> is so indicated for example in cryptographic identity thus that the operation
>> of this system as a service, effectively operates itself out of a kitty,
>> what makes for administration and overhead, an entirely transparent
>> model of a miniature business the system as a service".
>>
>> "... and a mailing address and mail service."
>>
>> Then, for accounts and accounts, for example is the provision of the component
>> as simply an image in cloud algorithms, where basically as above here it's configured
>> that anybody with any cloud account could basically run it on their own terms,
>> there is for here sorting out "after this delegation to some business entity what
>> results a corporation in effect, the rest is business-in-a-box and more-than-less
>> what makes for its administration in state, is for how it basically limits and replicates
>> its service, in terms of its own assets here as what administered is abstractly
>> "durable forever mailboxes with private ownership if on public or managed resources".
>>
>> A usual notion of a private email and usenet service offering and business-in-a-box,
>> here what I'm looking at is that besides archiving sci.math and copying out its content
>> under author line, is to make such an industry for example here that "once having
>> implemented an Internet service, an Internet service of them results Internet".
>>
>> I.e. here the point is to make a corporation and a foundation in effect, what in terms
>> of then about the books and accounts, is about accounts for the business accounts
>> that reflect a persistent entity, then what results in terms of computing, networking,
>> and internetworking, with a regular notion of "let's never change this arrangement
>> but it's in monthly or annual terms", here for that in overall arrangements,
>> it results what the entire system more than less runs in ways then to either
>> run out its limits or make itself a sponsored effort, about more-or-less a simple
>> and responsible and accountable set of operations what effect the business
>> (here that in terms of service there is basically the realm of agreement)
>> that basically this sort of business-in-a-box model, is then besides itself of
>> accounts, toward the notion as pay-as-you-go and "usual credits and their limits".
>>
>> Then for a news://usenet.science, or for example sci.math.usenet.science,
>> is the idea that the entity is "some assemblage what is so that in DNS, and,
>> in the accounts payable and receivable, and, in the material matters of
>> arrangement and authority for administration, of DNS and resources and
>> accounts what result durably persisting the business, is basically for a service
>> then of what these are usual enough tasks, as that are interactive workflows
>> and for mechanical workflows.
>>
>> I.e. the point is for having the service than an on/off button and more or less
>> what is for a given instance of the operation, what results from some protocol
>> that provides a "durable store" of a sort of the business, that at any time basically
>> some re-routine or "eventually consistent" continuance of the operation of the
>> business, results basically a continuity in its operations, what is entirely granular,
>> that here for example the point is to "pick a DNS name, attach an account service,
>> go" it so results that in the terms, basically there are the placeholders of the
>> interactive workflows in that, and as what in terms are often for example simply
>> card and phone number terms, account terms.
>>
>> I.e. a service to replenish accounts as kitties for making accounts only and
>> exactly limited to the one service, its transfers, basically results that there
>> is the notion of an email address, a phone number, a credit card's information,
>> here a fixed limit debit account that works as of a kitty, there is a regular workflow
>> service that will read out the durable stores and according to the timeliness of
>> their events, affect the configuration and reconciliation of payments for accounts
>> (closed loop scheduling/receiving).
>>
>> https://datatracker.ietf.org/doc/draft-flanagan-regext-datadictionary/
>> https://www.rfc-editor.org/rfc/rfc9022.txt
>>
>> Basically for dailies, monthlies, and annuals, what make weeklies,
>> is this idea of Internet-from-an-account, what is services.
>
>
> After implementing a store, and the protocol for getting messages, then what seems relevant here in the
> context of the SEARCH command, is a fungible file-format, that is derived from the body of the message
> in a normal form, that is a data structure that represents an index and catalog and dictionary and summary
> of the message, a form of a data structure of a "search index".
>
> These types files should naturally compose, and result a data structure that according to some normal
> forms of search and summary algorithms, result that a data structure results, that makes for efficient
> search of sections of the corpus for information retrieval, here that "information retrieval is the science
> of search algorithms".
>
> Now, for what and how people search, or what is the specification of a search, is in terms of queries, say,
> here for some brief forms of queries that advise what's definitely included in the search, what's excluded,
> then perhaps what's maybe included, or yes/no/maybe, which makes for a predicate that can be built,
> that can be applied to results that compose and build for the terms of a filter with yes/no/maybe or
> sure/no/yes, with predicates in values.
>
> Here there is basically "free text search" and "matching summaries", where text is the text and summary is
> a data structure, with attributes as paths the leaves of the tree of which match.
>
> Then, the message has text, its body, and headers, key-value pairs or collections thereof, where as well
> there are default summaries like "a histogram of words by occurrence" or for example default text like "the
> MIME body of this message has a default text representation".
>
> So, the idea developing here is to define what are "normal" forms of data structures that have some "normal"
> forms of encoding that result that these "normalizing" after "normative" data structures define well-behaved
> algorithms upon them, which provide well-defined bounds in resources that return some quantification of results,
> like any/each/every/all, "hits".
>
> This is where usually enough search engines' or collected search algorithms ("find") usually enough have these
> de-facto forms, "under the hood", as it were, to make it first-class that for a given message and body that
> there is a normal form of a "catalog summary index" which can be compiled to a constant when the message
> is ingested, that then basically any filestore of these messages has alongside it the filestore of the "catsums"
> or as on-demand, then that any algorithm has at least well-defined behavior under partitions or collections
> or selections of these messages, or items, for various standard algorithms that separate "to find" from
> "to serve to find".
>
> So, ..., what I'm wondering are what would be sufficient normal forms in brief that result that there are
> defined for a given corpus of messages, basically at the granularity of messages, how is defined how
> there is a normal form for each message its "catsum", that catsums have a natural algebra that a
> concatenation of catsums is a catsum and that some standard algorithms naturally have well-defined
> results on their predicates and quantifiers of matching, in serial and parallel, and that the results
> combine in serial and parallel.
>
> The results should be applicable to any kind of data but here it's more or less about usenet groups.
>


So, if you know all about old-fashioned
Internet protocols like DNS, then NNTP,
IMAP, SMTP, HTTP, and so on, then where
it's at is figuring out these various sorts of
conventions, to result in the sensible, fungible,
and tractable conventions of the data structures
and algorithms in the protocols, what results in
keeping things simple and standing up a usual
Internet messaging agentry.


BFF: backing-file formats, "Best friends forever"

Message files
Group files

Thread link files
Date link files

SFF: search-file formats, "partially digested metadata"



NOOBNB: Noob Nota Bene: Cur/Pur/Raw

Load Roll/Fold/Shed/Hold: throughput/offput



Then, the idea is to make it so that by constructing
the files, with a logical/physical sort of distinction,
there results a neat tape archive, such that those
can just be laid down together and result in a corpus,
or be filtered on down and result in a corpus, where
the existing standard is the "mailbox" or "mbox"
format, with the idea mostly of
"converting mbox to BFF".


Then, for enabling search, a basic design principle
of the file formats is that they're concatenable, or
just overlaid, and all write-once-read-many; then,
with regards to things like merges, those too should
result as some sort of algorithm in the tools, so
that of course the usual sorts of tools like textutils,
working on these files, would make it so that the
usual extant tools are native on the files.
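
As a minimal sketch of that composability, assuming an SFF entry that is
just a word-occurrence histogram per message (one "word<TAB>count" line
per term, which plain textutils can also sort and join), two or more such
files merge by summing counts:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.TreeMap;

// Illustrative "partially digested metadata": merge word-histogram files
// (lines of "word<TAB>count") by summing counts, so that SFF files for
// two partitions of a corpus combine into one for the union.
public class SffMerge {
    public static void main(String[] args) throws IOException {
        TreeMap<String, Long> merged = new TreeMap<>();
        for (String arg : args) {                         // any number of SFF histogram files
            for (String line : Files.readAllLines(Path.of(arg))) {
                String[] parts = line.split("\t");
                if (parts.length != 2) continue;
                merged.merge(parts[0], Long.parseLong(parts[1]), Long::sum);
            }
        }
        StringBuilder out = new StringBuilder();
        merged.forEach((word, count) ->
                out.append(word).append('\t').append(count).append('\n'));
        System.out.print(out);                            // merged histogram to stdout
    }
}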

So for metadata, the idea is that there are standard
metadata attributes like the closed categories of
headers and so on, where the primary attributes sort
of look like

message-id
author

delivery-path
delivery-metadata (account, GUID, ...)

destinations

subject
size
content

hash-raw-id <- after message-id
hash-invariant-id <- after removing inconstants
hash-uncoded-id <- after uncoding out to full

Because messages are supposed to be unique,
there's an idea to sort of detect differences.
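
A minimal sketch of those three identifiers, under the assumption that
"inconstants" means transit headers like Path:, Xref:, and NNTP-Posting-*
(that choice, and the class name, are illustrative):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Illustrative message hashes: raw bytes, bytes with transit ("inconstant")
// headers removed, and the decoded/unfolded text, each digested with SHA-256.
public class MessageHashes {
    public static String hashRawId(String rawMessage) {
        return sha256(rawMessage);
    }

    public static String hashInvariantId(String rawMessage) {
        // Drop headers that vary per server/transit; the exact set is an
        // assumption, and folded continuation lines are not handled here.
        String invariant = rawMessage.replaceAll(
                "(?im)^(Path|Xref|NNTP-Posting-Host|NNTP-Posting-Date):.*\\r?\\n", "");
        return sha256(invariant);
    }

    public static String hashUncodedId(String decodedBodyText) {
        // Caller decodes MIME/transfer-encoding "out to full" first; hash the result.
        return sha256(decodedBodyText);
    }

    private static String sha256(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return Base64.getEncoder().encodeToString(
                    md.digest(text.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}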


The idea is to sort of implement NNTP's OVERVIEW
and WILDMAT, then there's IMAP, figuring that the
first goal of SFF is to implement the normative
commands, then with regards to implementations,
basically working up toward HTTP SEARCH, a sort of
normative representation of messages, groups,
threads, and so on, what results in a neat sort of
standard system for all sorts of purposes, these "posts".
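
For reference, an OVERVIEW response is one tab-separated line per article
with the default fields Subject, From, Date, Message-ID, References, byte
count, and line count; a minimal sketch of building such a line from
parsed headers (field order per RFC 3977, class name illustrative):

import java.util.Map;

// Illustrative builder for one NNTP OVER/XOVER line: article number then
// Subject, From, Date, Message-ID, References, :bytes, :lines, tab-separated.
public class OverviewLine {
    public static String build(long articleNumber, Map<String, String> headers,
                               long bytes, long lines) {
        return String.join("\t",
                Long.toString(articleNumber),
                clean(headers.getOrDefault("Subject", "")),
                clean(headers.getOrDefault("From", "")),
                clean(headers.getOrDefault("Date", "")),
                clean(headers.getOrDefault("Message-ID", "")),
                clean(headers.getOrDefault("References", "")),
                Long.toString(bytes),
                Long.toString(lines));
    }

    // Overview fields must not contain TAB, CR, or LF.
    private static String clean(String value) {
        return value.replaceAll("[\\t\\r\\n]+", " ").trim();
    }
}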


Anybody know any "normative RFC emails in HTTP"?
Here the idea is basically that a naive server
simply gets pointed at BFF files for message-id
and loads any message there as an HTTP representation,
with regards to HTTP, HTML, and so on, about these
sorts of "sensible, fungible, tractable" conventions.
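
A minimal sketch of such a naive server, assuming the per-message BFF
layout from the mbox sketch above and serving each message as text/plain
under /message/<id> (paths, port, and class name are illustrative):

import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative naive HTTP front-end over a BFF message store: GET
// /message/<sanitized-message-id> returns the stored article as text/plain.
public class BffHttpServer {
    public static void main(String[] args) throws IOException {
        Path messages = Path.of(args.length > 0 ? args[0] : "groups/sci.math/messages");
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/message/", exchange -> {
            String id = URLDecoder.decode(
                    exchange.getRequestURI().getPath().substring("/message/".length()),
                    StandardCharsets.UTF_8);
            Path file = messages.resolve(id).normalize();
            if (!file.startsWith(messages) || !Files.isRegularFile(file)) {
                exchange.sendResponseHeaders(404, -1);      // not found / path escape
                exchange.close();
                return;
            }
            byte[] body = Files.readAllBytes(file);
            exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        System.out.println("Serving BFF messages on http://localhost:8080/message/<id>");
    }
}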


It's been a while since I studied the standards,
so I'm looking to get back tapping at the C10K server
here, basically with hi-po full throughput then with
regards to the sentinel/doorman bit (Load R/F/S/H).

So, I'll be looking for "partially digested and
composable search metadata formats" and "informative
and normative standards-based message and content".

They already have one of those, it's called "Internet".
Ross Finlayson
2024-02-10 06:37:33 UTC
Reply
Permalink
On 02/08/2024 01:04 PM, Ross Finlayson wrote:
> On 03/08/2023 08:51 PM, Ross Finlayson wrote:
>> On Monday, December 6, 2021 at 7:32:16 AM UTC-8, Ross A. Finlayson wrote:
>>> On Monday, November 16, 2020 at 5:39:08 PM UTC-8, Ross A. Finlayson
>>> wrote:
>>>> On Monday, November 16, 2020 at 5:00:51 PM UTC-8, Ross A. Finlayson
>>>> wrote:
>>>>> On Tuesday, June 30, 2020 at 10:00:52 AM UTC-7, Mostowski Collapse
>>>>> wrote:
>>>>>> NNTP is not HTTP. I was using bare metal access to
>>>>>> usenet, not using Google group, via:
>>>>>>
>>>>>> news.albasani.net, unfortunately dead since Corona
>>>>>>
>>>>>> So was looking for an alternative. And found this
>>>>>> alternative, which seems fine:
>>>>>>
>>>>>> news.solani.org
>>>>>>
>>>>>> Have Fun!
>>>>>>
>>>>>> P.S.: Technical spec of news.solani.org:
>>>>>>
>>>>>> Intel Core i7, 4x 3.4 GHz, 32 GB RAM, 500 GB SSD RAID1
>>>>>> Intel Core i7, 4x 3.4 GHz, 32 GB RAM, 2 TB HDD RAID1
>>>>>> Intel Xeon VM, 1.8 GHz, 1 GB RAM, 20 GB
>>>>>> Location: 2x Falkenstein, 1x New York
>>>>>>
>>>>>> advantage of bare metal usenet,
>>>>>> you see all headers of message.
>>>>>> On Tuesday, June 30, 2020 at 06:24:53 UTC+2, Ross A. Finlayson wrote:
>>>>>>> Search you mentioned and for example HTTP is adding the SEARCH verb,
>>>>> In traffic there are two kinds of usenet users,
>>>>> viewers and traffic through Google Groups,
>>>>> and, USENET. (USENET traffic.)
>>>>>
>>>>> Here now Google turned on login to view their
>>>>> Google Groups - effectively closing the Google Groups
>>>>> without a Google login.
>>>>>
>>>>> I suppose if they're used at work or whatever though
>>>>> they'd be open.
>>>>>
>>>>>
>>>>>
>>>>> Where I got with the C10K non-blocking I/O for a usenet server,
>>>>> it scales up though then I think in the runtime is a situation where
>>>>> it only runs epoll or kqueue that the test scale ups, then at the end
>>>>> or in sockets there is a drop, or it fell off the driver. I've
>>>>> implemented
>>>>> the code this far, what has all of NNTP in a file and then the
>>>>> "re-routine,
>>>>> industry-pattern back-end" in memory, then for that running usually.
>>>>>
>>>>> (Cooperative multithreading on top of non-blocking I/O.)
>>>>>
>>>>> Implementing the serial queue or "monohydra", or slique,
>>>>> makes for that then when the parser is constantly parsing,
>>>>> it seems a usual queue like data structure with parsing
>>>>> returning its bounds, consuming the queue.
>>>>>
>>>>> Having the file buffers all down small on 4K pages,
>>>>> has that a next usual page size is the megabyte.
>>>>>
>>>>> Here though it seems to make sense to have a natural
>>>>> 4K alignment the file system representation, then that
>>>>> it is moving files.
>>>>>
>>>>> So, then with the new modern Java, it that runs in its own
>>>>> Java server runtime environment, it seems I would also
>>>>> need to see whether the cloud virt supported the I/O model
>>>>> or not, or that the cooperative multi-threading for example
>>>>> would be single-threaded. (Blocking abstractly.)
>>>>>
>>>>> Then besides I suppose that could be neatly with basically
>>>>> the program model, and its file model, being well-defined,
>>>>> then for NNTP with IMAP organization search and extensions,
>>>>> those being standardized, seems to make sense for an efficient
>>>>> news file organization.
>>>>>
>>>>> Here then it seems for serving the NNTP, and for example
>>>>> their file bodies under the storage, with the fixed headers,
>>>>> variable header or XREF, and the message body, then under
>>>>> content it's same as storage.
>>>>>
>>>>> NNTP has "OVERVIEW" then from it is built search.
>>>>>
>>>>> Let's see here then, if I get the load test running, or,
>>>>> just put a limit under the load while there are no load test
>>>>> errors, it seems the algorithm then scales under load to be
>>>>> making usually the algorithm serial in CPU, with: encryption,
>>>>> and compression (traffic). (Block ciphers instead of serial transfer.)
>>>>>
>>>>> Then, the industry pattern with re-routines, has that the
>>>>> re-routines are naturally co-operative in the blocking,
>>>>> and in the language, including flow-of-control and exception scope.
>>>>>
>>>>>
>>>>> So, I have a high-performance implementation here.
>>>> It seems like for NFS, then, and having the separate read and write
>>>> of the client,
>>>> a default filesystem, is an idea for the system facility: mirroring
>>>> the mounted file
>>>> locally, and, providing the read view from that via a different route.
>>>>
>>>>
>>>> A next idea then seems for the organization, the client views
>>>> themselves
>>>> organize over the durable and available file system representation,
>>>> this
>>>> provides anyone a view over the protocol with a group file convention.
>>>>
>>>> I.e., while usual continuous traffic was surfing, individual reads
>>>> over group
>>>> files could have independent views, for example collating contents.
>>>>
>>>> Then, extracting requests from traffic and threads seems usual.
>>>>
>>>> (For example a specialized object transfer view.)
>>>>
>>>> Making protocols for implementing internet protocols in groups and
>>>> so on, here makes for giving usenet example views to content generally.
>>>>
>>>> So, I have designed a protocol node and implemented it mostly,
>>>> then about designed an object transfer protocol, here the idea
>>>> is how to make it so people can extract data, for example their own
>>>> data, from a large durable store of all the usenet messages,
>>>> making views of usenet running on usenet, eg "Feb. 2016: AP's
>>>> Greatest Hits".
>>>>
>>>> Here the point is to figure that usenet, these days, can be operated
>>>> in cooperation with usenet, and really for its own sake, for leaving
>>>> messages in usenet and here for usenet protocol stores as there's
>>>> no reason it's plain text the content, while the protocol supports it.
>>>>
>>>> Building personal view for example is a simple matter of very many
>>>> service providers any of which sells usenet all day for a good deal.
>>>>
>>>> Let's see here, $25/MM, storage on the cloud last year for about
>>>> a million messages for a month is about $25. Outbound traffic is
>>>> usually the metered cloud traffic, here for example that CDN traffic
>>>> support the universal share convention, under metering. What that
>>>> the algorithm is effectively tunable in CPU and RAM, makes for under
>>>> I/O that's it's "unobtrusive" or the cooperative in routine, for CPU
>>>> I/O and
>>>> RAM, then that there is for seeking that Network Store or Database Time
>>>> instead effectively becomes File I/O time, as what may be faster,
>>>> and more durable. There's a faster database time for scaling the
>>>> ingestion
>>>> here with that the file view is eventually consistent. (And reliable.)
>>>>
>>>> Checking the files would be over time for example with "last checked"
>>>> and "last dropped" something along the lines of, finding wrong offsets,
>>>> basically having to make it so that it survives neatly corruption of
>>>> the
>>>> store (by being more-or-less stored in-place).
>>>>
>>>> Content catalog and such, catalog.
>>> Then I wonder and figure the re-routine can scale.
>>>
>>> Here for the re-routine, the industry factory pattern,
>>> and the commands in the protocols in the templates,
>>> and the memory module, with the algorithm interface,
>>> in the high-performance computer resource, it is here
>>> that this simple kind of "writing Internet software"
>>> makes pretty rapidly for adding resources.
>>>
>>> Here the design is basically of a file I/O abstraction,
>>> that the computer reads data files with mmap to get
>>> their handlers, what results that for I/O map the channels
>>> result transferring the channels in I/O for what results,
>>> in mostly the allocated resource requirements generally,
>>> and for the protocol and algorithm, it results then that
>>> the industry factory pattern and making for interfaces,
>>> then also here the I/O routine as what results that this
>>> is an implementation, of a network server, mostly is making
>>> for that the re-routine, results very neatly a model of
>>> parallel cooperation.
>>>
>>> I think computers still have file systems and file I/O but
>>> in abstraction just because PAGE_SIZE is still relevant for
>>> the network besides or I/O, if eventually, here is that the
>>> value types are in the commands and so on, it is besides
>>> that in terms of the resources so defined it still is in a filesystem
>>> convention that a remote and unreliable view of it suffices.
>>>
>>> Here then the source code also being "this is only 20-50k",
>>> lines of code, with basically an entire otherwise library stack
>>> of the runtime itself, only the network and file abstraction,
>>> this makes for also that modularity results. (Factory Industry
>>> Pattern Modules.)
>>>
>>> For a network server, here, that, mostly it is high performance
>>> in the sense that this is about the most direct handle on the channels
>>> and here mostly for the text layer in the I/O order, or protocol layer,
>>> here is that basically encryption and compression usually in the layer,
>>> there is besides a usual concern where encryption and compression
>>> are left out, there is that text in the layer itself is commands.
>>>
>>> Then, those being constants under the resources for the protocol,
>>> it's what results usual protocols like NNTP and HTTP and other protocols
>>> with usually one server and many clients, here is for that these
>>> protocols
>>> are defined in these modules, mostly there NNTP and IMAP, ..., HTTP.
>>>
>>> These are here defined "all Java" or "Pure Java", i.e. let's be clear
>>> that
>>> in terms of the reference abstraction layer, I think computers still use
>>> the non-blocking I/O and filesystems and network to RAM, so that as
>>> the I/O is implemented in those it actually has those besides instead
>>> for
>>> example defaulting to byte-per-channel or character I/O. I.e. the usual
>>> semantics for servicing the I/O in the accepter routine and what makes
>>> for that the platform also provides a reference encryption
>>> implementation,
>>> if not so relevant for the block encoder chain, besides that for example
>>> compression has a default implementation, here the I/O model is as
>>> simply
>>> in store for handles, channels, ..., that it results that data
>>> especially delivered
>>> from a constant store can anyways be mostly compressed and encrypted
>>> already or predigested to serve, here that it's the convention, here
>>> is for
>>> resulting that these client-server protocols, with usually reads >
>>> postings
>>> then here besides "retention", basically here is for what it is.
>>>
>>> With the re-routine and the protocol layer besides, having written the
>>> routines in the re-routine, what there is to write here is this industry
>>> factory, or a module framework, implementing the re-routines, as they're
>>> built from the linear description a routine, makes for as the routine
>>> progresses
>>> that it's "in the language" and that more than less in the terms, it
>>> makes for
>>> implementing the case of logic for values, in the logic's
>>> flow-of-control's terms.
>>>
>>> Then, there is that actually running the software is different than just
>>> writing it, here in the sense that as a server runtime, it is to be
>>> made a
>>> thing, by giving it a name, and giving it an authority, to exist on
>>> the Internet.
>>>
>>> There is basically that for BGP and NAT and so on, and, mobile fabric
>>> networks,
>>> IP and TCP/IP, of course IPv4 and IPv6 are the coarse fabric main
>>> space, with
>>> respect to what are CIDR and 24 bits rule and what makes for TCP/IP,
>>> here
>>> entirely the course is using the TCP/IP stack and Java's TCP/IP
>>> stack, with
>>> respect to that TCP/IP is so provided or in terms of process what
>>> results
>>> ports mostly and connection models where it is exactly the TCP after
>>> the IP,
>>> the Transport Control Protocol and Internet Protocol, have here both
>>> this
>>> socket and datagram connection orientation, or stateful and stateless or
>>> here that in terms of routing it's defined in addresses, under that
>>> names
>>> and routing define sources, routes, destinations, ..., that routine
>>> numeric
>>> IP addresses result in the usual sense of the network being behind an IP
>>> and including IPv4 network fabric with respect to local routers.
>>>
>>> I.e., here to include a service framework is "here besides the
>>> routine, let's
>>> make it clear that in terms of being a durable resource, there needs
>>> to be
>>> some lockbox filled with its sustenance that in some locked or constant
>>> terms results that for the duration of its outlay, say five years, it
>>> is held
>>> up, then, it will be so again, or, let down to result the carry-over
>>> that it
>>> invested to archive itself, I won't have to care or do anything until
>>> then".
>>>
>>>
>>> About the service activation and the idea that, for a port, the
>>> routine itself
>>> needs only run under load, i.e. there is effectively little traffic
>>> on the old archives,
>>> and usually only the some other archive needs any traffic. Here the
>>> point is
>>> that for the Java routine there is the system port that was accepted
>>> for the
>>> request, that inetd or the systemd or means the network service was
>>> accessed,
>>> made for that much as for HTTP the protocol is client-server also for
>>> IP the
>>> protocol is client-server, while the TCP is packets. This is a
>>> general idea for
>>> system integration while here mostly the routine is that being a detail:
>>> the filesystem or network resource that results that the re-routines
>>> basically
>>> make very large CPU scaling.
>>>
>>> Then, it is basically containerized this sense of "at some domain
>>> name, there
>>> is a service, it's HTTP and NNTP and IMAP besides, what cares the
>>> world".
>>>
>>> I.e. being built on connection oriented protocols like the socket layer,
>>> HTTP(S) and NNTP(S) and IMAP(S) or with the TLS orientation to
>>> certificates,
>>> it's more than less sensible that most users have no idea of
>>> installing some
>>> NNTP browser or pointing their email to IMAP so that the email browser
>>> browses the newsgroups and for postings, here this is mostly only talk
>>> about implementing NNTP then IMAP and HTTP that happens to look like
>>> that,
>>> besides for example SMTP or NNTP posting.
>>>
>>> I.e., having "this IMAP server, happens to be this NNTP module", or
>>> "this HTTP server, happens to be a real simple mailbox these groups",
>>> makes for having partitions and retentions of those and that basically
>>> NNTP messages in the protocol can be more or less the same content
>>> in media, what otherwise is of a usual message type.
>>>
>>> Then, the NNTP server-server routine is the propagation of messages
>>> besides "I shall hire ten great usenet retention accounts and gently
>>> and politely draw them down and back-fill Usenet, these ten groups".
>>>
>>> By then I would have to have made for retention in storage, such
>>> contents,
>>> as have a reference value, then for besides making that independent in
>>> reference value, just so that it suffices that it basically results
>>> "a usable
>>> durable filesystem that happens you can browse it like usenet". I.e. as
>>> the pieces to make the backfill are dug up, they get assigned
>>> reference numbers
>>> of their time to make for what here is that in a grand schema of things,
>>> they have a reference number in numerical order (and what's also the
>>> server's "message-number" besides its "message-id") as noted above this
>>> gets into the storage for retention of a file, while, most services
>>> for this
>>> are instead for storage and serving, not necessarily or at all
>>> retention.
>>>
>>> I.e., the point is that as the groups are retained from retention,
>>> there is an
>>> approach what makes for an orderly archeology, as for what convention
>>> some data arrives, here that this server-server routine is besides
>>> the usual
>>> routine which is "here are new posts, propagate them", it's "please
>>> deliver
>>> as of a retention scan, and I'll try not to repeat it, what results
>>> as orderly
>>> as possible a proof or exercise of what we'll call afterward entire
>>> retention",
>>> then will be for as of writing a file that "as of the date, from
>>> start to finish,
>>> this site certified these messages as best-effort retention".
>>>
>>> It seems then besides there is basically "here is some mbox file,
>>> serve it
>>> like it was an NNTP group or an IMAP mailbox", ingestion, in terms of
>>> that
>>> what is ingestion, is to result for the protocol that "for this
>>> protocol,
>>> there is actually a normative filesystem representation that happens to
>>> be pretty much also altogether definede by the protocol", the point is
>>> that ingestion would result in command to remain in the protocol,
>>> that a usual file type that "presents a usual abstraction, of a
>>> filesystem,
>>> as from the contents of a file", here with the notion of "for all these
>>> threaded discussions, here this system only cares some approach to
>>> these ten particular newsgroups that already have mostly their corpus
>>> though it's not in perhaps their native mbox instead consulted from
>>> services".
>>>
>>> Then, there's for storing and serving the files, and there is the usual
>>> notion that moving the data, is to result, that really these file
>>> organizations
>>> are not so large in terms of resources, being "less than gigabytes"
>>> or so,
>>> still there's a notion that as a durable resource they're to be made
>>> fungible here the networked file approach in the native filesystem,
>>> then that with respect to it's a backing store, it's to make for that
>>> the entire enterprise is more or less to be made in terms of account,
>>> that then as a facility on the network then a service in the network,
>>> it's basically separated the facility and service, while still of course
>>> that the service is basically defined by its corpus.
>>>
>>>
>>> Then, to make that fungible in a world of account, while with an exit
>>> strategy so that the operation isn't not abstract, is mostly about the
>>> domain name, then that what results the networking, after trusted
>>> network naming and connections for what result routing, and then
>>> the port, in terms of that there are usual firewalls in ports though
>>> that
>>> besides usually enough client ports are ephemeral, here the point is
>>> that the protocols and their well-known ports, here it's usually enough
>>> that the Internet doesn't concern itself so much protocols but with
>>> respect to proxies, here that for example NNTP and IMAP don't have
>>> so much anything so related that way after startTLS. For the world of
>>> account, is basically to have for a domain name, an administrator, and,
>>> an owner or representative. These are to establish authority for changes
>>> and also accountability for usage.
>>>
>>> Basically they're to be persons and there is a process to get to be an
>>> administrator of DNS, most always there are services that a usual person
>>> implementing the system might use, besides for example the numerical.
>>>
>>> More relevant though to DNS is getting servers on the network, with
>>> respect
>>> to listening ports and that they connect to clients what so discover
>>> them as
>>> via DNS or configuration, here as above the usual notion that these are
>>> standard services and run on well-known ports for inetd or systemd.
>>> I.e. there is basically that running a server and dedicated networking,
>>> and power and so on, and some notion of the limits of reliability, is
>>> then
>>> as very much in other aspects of the organization of the system, i.e.
>>> its name,
>>> while at the same time, the point that a module makes for that basically
>>> the provision of a domain name or well-known or ephemeral host, is the
>>> usual notion that static IP addresses are a limited resource and as
>>> about
>>> the various networks in IPv4 and how they route traffic, is for that
>>> these
>>> services have well-known sections in DNS for at least that the most
>>> usual
>>> configuration is none.
>>>
>>> For a usual global reliability and availability, is some notion
>>> basically that
>>> each region and zone has a service available on the IP address, for that
>>> "hostname" resolves to the IP addresses. As well, in reverse, for the IP
>>> address and about the hostname, it should resolve reverse to hostname.
>>>
>>> About certificates mostly for identification after mapping to port, or
>>> multi-home Internet routing, here is the point that whether the domain
>>> name administration is "epochal" or "regular", is that epochs are
>>> defined
>>> by the ports behind the numbers and the domain name system as well,
>>> where in terms of the registrar, the domain names are epochal to the
>>> registrar, with respect to owners of domain names.
>>>
>>> Then if DNS is a datagram or UDP service is for ICMP as for TCP/IP,
>>> and also BGP and NAT and routing and what are local and remote
>>> addresses, here is for not-so-much "implement DNS the protocol
>>> also while you're at it", rather for what results that there is a
>>> durable
>>> and long-standing and proper doorman, for some usenet.science.
>>>
>>> Here then the notion seems to be whether the doorman basically
>>> knows well-known services, is a multi-homing router, or otherwise
>>> what is the point that it starts the lean runtime, with respect to that
>>> it's a container and having enough sense of administration its operation
>>> as contained. I.e. here given a port and a hostname and always running
>>> makes for that as long as there is the low (preferable no) idle for
>>> services
>>> running that have no clients, is here also for the cheapest doorman that
>>> knows how to standup the client sentinel. (And put it back away.)
>>>
>>> Probably the most awful thing in the cloud services is the cost for
>>> data ingress and egress. What that means is that for example using
>>> a facility that is bound by that as a cost instead of under some
>>> constant
>>> cost, is basically why there is the approach that the containers needs a
>>> handle to the files, and they're either local files or network files,
>>> here
>>> with the some convention above in archival a shared consistent view
>>> of all the files, or abstractly consistent, is for making that the
>>> doorman
>>> can handle lots of starting and finishing connections, while it is
>>> out of
>>> the way when usually it's client traffic and opening and closing
>>> connections,
>>> and the usual abstraction is that the client sentinel is never off
>>> and doorman
>>> does nothing, here is for attaching the one to some lower constant cost,
>>> where for example any long-running cost is more than some low
>>> constant cost.
>>>
>>> Then, this kind of service is often represented by nodes, in the
>>> usual sense
>>> "here is an abstract container with you hope some native performance
>>> under
>>> the hypervisor where it lives on the farm on its rack, it basically
>>> is moved the
>>> image to wherever it's requested from and lives there, have fun, the
>>> meter is on".
>>> I.e. that's just "this Jar has some config conventions and you can
>>> make the
>>> container associate it and watchdog it with systemd for example and
>>> use the
>>> cgroups while you're at it and make for tempfs quota and also the
>>> best network
>>> file share, which you might be welcome to cache if you care just in
>>> the off-chance
>>> that this file-mapping is free or constant cost as long as it doesn't
>>> egress the
>>> network", is for here about the facilities that work, to get a copy
>>> of the system
>>> what with respect to its usual operation is a piece of the Internet.
>>>
>>> For the different reference modules (industry factories) in their
>>> patterns then
>>> and under combined configuration "file + process + network + fare",
>>> is that
>>> the fare of the service basically reflects a daily coin, in the sense
>>> that it
>>> represents an annual or epochal fee, what results for the time there is
>>> what is otherwise all defined the "file + process + network + name",
>>> what results it perpetuates in operation more than less simply and
>>> automatically.
>>>
>>> Then, the point though is to get it to where "I can go to this
>>> service, and
>>> administer it more or less by paying an account, that it thus lives
>>> in its
>>> budget and quota in its metered world".
>>>
>>> That though is very involved with identity, that in terms of "I the
>>> account
>>> as provided this sum make this sum paid with respect to an agreement",
>>> is that authority to make agreements must make that it results that the
>>> operation of the system, is entirely transparent, and defined in
>>> terms of
>>> the roles and delegation, conventions in operation.
>>>
>>> I.e., I personally don't want to administer a copy of usenet, but,
>>> it's here
>>> pretty much sorted out that I can administer one once then that it's to
>>> administer itself in the following, in terms of it having resources
>>> to allocate
>>> and resources to disburse. Also if nobody's using it it should
>>> basically work
>>> itself out to dial its lights down (while maintaining availability).
>>>
>>> Then a point seems "maintain and administer the operation in effect,
>>> what arrangement sees via delegation, that a card number and a phone
>>> number and an email account and more than less a responsible entity,
>>> is so indicated for example in cryptographic identity thus that the
>>> operation
>>> of this system as a service, effectively operates itself out of a kitty,
>>> what makes for administration and overhead, an entirely transparent
>>> model of a miniature business the system as a service".
>>>
>>> "... and a mailing address and mail service."
>>>
>>> Then, for accounts and accounts, for example is the provision of the
>>> component
>>> as simply an image in cloud algorithms, where basically as above here
>>> it's configured
>>> that anybody with any cloud account could basically run it on their
>>> own terms,
>>> there is for here sorting out "after this delegation to some business
>>> entity what
>>> results a corporation in effect, the rest is business-in-a-box and
>>> more-than-less
>>> what makes for its administration in state, is for how it basically
>>> limits and replicates
>>> its service, in terms of its own assets here as what administered is
>>> abstractly
>>> "durable forever mailboxes with private ownership if on public or
>>> managed resources".
>>>
>>> A usual notion of a private email and usenet service offering and
>>> business-in-a-box,
>>> here what I'm looking at is that besides archiving sci.math and
>>> copying out its content
>>> under author line, is to make such an industry for example here that
>>> "once having
>>> implemented an Internet service, an Internet service of them results
>>> Internet".
>>>
>>> I.e. here the point is to make a corporation and a foundation in
>>> effect, what in terms
>>> of then about the books and accounts, is about accounts for the
>>> business accounts
>>> that reflect a persistent entity, then what results in terms of
>>> computing, networking,
>>> and internetworking, with a regular notion of "let's never change
>>> this arrangement
>>> but it's in monthly or annual terms", here for that in overall
>>> arrangements,
>>> it results what the entire system more than less runs in ways then to
>>> either
>>> run out its limits or make itself a sponsored effort, about
>>> more-or-less a simple
>>> and responsible and accountable set of operations what effect the
>>> business
>>> (here that in terms of service there is basically the realm of
>>> agreement)
>>> that basically this sort of business-in-a-box model, is then besides
>>> itself of
>>> accounts, toward the notion as pay-as-you-go and "usual credits and
>>> their limits".
>>>
>>> Then for a news://usenet.science, or for example
>>> sci.math.usenet.science,
>>> is the idea that the entity is "some assemblage what is so that in
>>> DNS, and,
>>> in the accounts payable and receivable, and, in the material matters of
>>> arrangement and authority for administration, of DNS and resources and
>>> accounts what result durably persisting the business, is basically
>>> for a service
>>> then of what these are usual enough tasks, as that are interactive
>>> workflows
>>> and for mechanical workflows.
>>>
>>> I.e. the point is for having the service than an on/off button and
>>> more or less
>>> what is for a given instance of the operation, what results from some
>>> protocol
>>> that provides a "durable store" of a sort of the business, that at
>>> any time basically
>>> some re-routine or "eventually consistent" continuance of the
>>> operation of the
>>> business, results basically a continuity in its operations, what is
>>> entirely granular,
>>> that here for example the point is to "pick a DNS name, attach an
>>> account service,
>>> go" it so results that in the terms, basically there are the
>>> placeholders of the
>>> interactive workflows in that, and as what in terms are often for
>>> example simply
>>> card and phone number terms, account terms.
>>>
>>> I.e. a service to replenish accounts as kitties for making accounts
>>> only and
>>> exactly limited to the one service, its transfers, basically results
>>> that there
>>> is the notion of an email address, a phone number, a credit card's
>>> information,
>>> here a fixed limit debit account that works as of a kitty, there is a
>>> regular workflow
>>> service that will read out the durable stores and according to the
>>> timeliness of
>>> their events, affect the configuration and reconciliation of payments
>>> for accounts
>>> (closed loop scheduling/receiving).
>>>
>>> https://datatracker.ietf.org/doc/draft-flanagan-regext-datadictionary/
>>> https://www.rfc-editor.org/rfc/rfc9022.txt
>>>
>>> Basically for dailies, monthlies, and annuals, what make weeklies,
>>> is this idea of Internet-from-an-account, what is services.
>>
>>
>> After implementing a store, and the protocol for getting messages,
>> then what seems relevant here in the
>> context of the SEARCH command, is a fungible file-format, that is
>> derived from the body of the message
>> in a normal form, that is a data structure that represents an index
>> and catalog and dictionary and summary
>> of the message, a form of a data structure of a "search index".
>>
>> These types files should naturally compose, and result a data
>> structure that according to some normal
>> forms of search and summary algorithms, result that a data structure
>> results, that makes for efficient
>> search of sections of the corpus for information retrieval, here that
>> "information retrieval is the science
>> of search algorithms".
>>
>> Now, for what and how people search, or what is the specification of a
>> search, is in terms of queries, say,
>> here for some brief forms of queries that advise what's definitely
>> included in the search, what's excluded,
>> then perhaps what's maybe included, or yes/no/maybe, which makes for a
>> predicate that can be built,
>> that can be applied to results that compose and build for the terms of
>> a filter with yes/no/maybe or
>> sure/no/yes, with predicates in values.
>>
>> Here there is basically "free text search" and "matching summaries",
>> where text is the text and summary is
>> a data structure, with attributes as paths the leaves of the tree of
>> which match.
>>
>> Then, the message has text, its body, and headers, key-value pairs
>> or collections thereof, where as well
>> there are default summaries like "a histogram of words by occurrence"
>> or for example default text like "the
>> MIME body of this message has a default text representation".
>>
>> So, the idea developing here is to define what are "normal" forms of
>> data structures that have some "normal"
>> forms of encoding that result that these "normalizing" after
>> "normative" data structures define well-behaved
>> algorithms upon them, which provide well-defined bounds in resources
>> that return some quantification of results,
>> like any/each/every/all, "hits".
>>
>> This is where usually enough search engines' or collected search
>> algorithms ("find") usually enough have these
>> de-facto forms, "under the hood", as it were, to make it first-class
>> that for a given message and body that
>> there is a normal form of a "catalog summary index" which can be
>> compiled to a constant when the message
>> is ingested, that then basically any filestore of these messages has
>> alongside it the filestore of the "catsums"
>> or as on-demand, then that any algorithm has at least well-defined
>> behavior under partitions or collections
>> or selections of these messages, or items, for various standard
>> algorithms that separate "to find" from
>> "to serve to find".
>>
>> So, ..., what I'm wondering are what would be sufficient normal forms
>> in brief that result that there are
>> defined for a given corpus of messages, basically at the granularity
>> of messages, how is defined how
>> there is a normal form for each message its "catsum", that catsums have
>> a natural algebra that a
>> concatenation of catsums is a catsum and that some standard algorithms
>> naturally have well-defined
>> results on their predicates and quantifiers of matching, in serial and
>> parallel, and that the results
>> combine in serial and parallel.
>>
>> The results should be applicable to any kind of data but here it's
>> more or less about usenet groups.
>>
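As a concrete little sketch of that "natural algebra", and not any normative SFF format, here is the sort of thing meant: a word-histogram "catsum" per message, in Java, where the merge is associative, so catsums computed over any partition of a corpus combine to the corpus catsum, and simple yes/no predicates read straight off it. The class and method names are just for illustration.

import java.util.HashMap;
import java.util.Map;

// A minimal sketch (not the normative SFF format): a "catsum" as a word
// histogram whose merge is associative and commutative, so that catsums
// computed over any partition of a corpus combine to the corpus catsum.
public final class Catsum {
    private final Map<String, Long> counts = new HashMap<>();

    // Digest one message body into a catsum.
    public static Catsum of(String body) {
        Catsum c = new Catsum();
        for (String w : body.toLowerCase().split("\\W+")) {
            if (!w.isEmpty()) c.counts.merge(w, 1L, Long::sum);
        }
        return c;
    }

    // Concatenation of catsums is a catsum: merge histograms by summing.
    public Catsum merge(Catsum other) {
        Catsum out = new Catsum();
        out.counts.putAll(this.counts);
        other.counts.forEach((w, n) -> out.counts.merge(w, n, Long::sum));
        return out;
    }

    // A predicate usable as a yes/no filter over messages or partitions.
    public boolean contains(String word) {
        return counts.containsKey(word.toLowerCase());
    }

    public long hits(String word) {
        return counts.getOrDefault(word.toLowerCase(), 0L);
    }
}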
>
>
> So, if you know all about old-fashioned
> Internet protocols like DNS, then NNTP,
> IMAP, SMTP, HTTP, and so on, then where
> it's at is figuring out these various sorts
> conventions then to result a sort-of, the
> sensible, fungible, and tractable, conventions
> of the data structures and algorithms, in
> the protocols, what result keeping things
> simple and standing up a usual Internet
> messaging agentry.
>
>
> BFF: backing-file formats, "Best friends forever"
>
> Message files
> Group files
>
> Thread link files
> Date link files
>
> SFF: search-file formats, "partially digested metadata"
>
>
>
> NOOBNB: Noob Nota Bene: Cur/Pur/Raw
>
> Load Roll/Fold/Shed/Hold: throughput/offput
>
>
>
> Then, the idea is to make it so that by constructing
> the files or a logical/physical sort of distinction,
> that then results a neat tape archive then that
> those can just be laid down together and result
> a corpus, or filtered on down and result a corpus,
> where the existence standard is sort of called "mailbox"
> or "mbox" format, with the idea muchly of
> "converting mbox to BFF".
>
>
> Then, for enabling search, basically the idea or a
> design principle of the FF is that they're concatenable
> or just overlaid and all write-once-read-many, then
> with regards to things like merges, which also should
> result as some sort of algorithm in tools, what results,
> that of course usual sorts tools like textutils, working
> on these files, would make it so that usual extant tools,
> are native on the files.
>
> So for metadata, the idea is that there are standard
> metadata attributes like the closed categories of
> headers and so on, where the primary attributes sort
> of look like
>
> message-id
> author
>
> delivery-path
> delivery-metadata (account, GUID, ...)
>
> destinations
>
> subject
> size
> content
>
> hash-raw-id <- after message-id
> hash-invariant-id <- after removing inconstants
> hash-uncoded-id <- after uncoding out to full
>
> Because messages are supposed to be unique,
> there's an idea to sort of detect differences.
>
>
> The idea is to sort of implement NNTP's OVERVIEW
> and WILDMAT, then there's IMAP, figuring that the
> first goals of SFF is to implement the normative
> commands, then with regards to implementations,
> basically working up for HTTP SEARCH, a sort of
> normative representation of messages, groups,
> threads, and so on, sort of what results a neat sort
> standard system for all sorts purposes these, "posts".
>
>
> Anybody know any "normative RFC email's in HTTP"?
> Here the idea is basically that a naive server
> simply gets pointed at BFF files for message-id
> and loads any message there as an HTTP representation,
> with regards to HTTP, HTML, and so on, about these
> sorts "sensible, fungible, tractable" conventions.
>
>
> It's been a while since I studied the standards,
> so I'm looking to get back tapping at the C10K server
> here, basically with hi-po full throughput then with
> regards to the sentinel/doorman bit (Load R/F/S/H).
>
> So, I'll be looking for "partially digested and
> composable search metadata formats" and "informative
> and normative standards-based message and content".
>
> They already have one of those, it's called "Internet".
>
>



Reading up on anti-spam, it seems that Usenet messages have
a pretty simple format; then, with regards to all of Internet
messages, or Email and MIME and so on, it gets into basically
the nitty-gritty of the Internet protocols like SMTP, IMAP, NNTP,
and HTTP, about figuring out what's needful for things
like Netnews messages, Email messages, HTTP messages,
and these kinds of things, basically for message multi-part.

https://en.wikipedia.org/wiki/MIME

(DANE, DKIM, DMARC, ....)

It's kind of complicated to implement correctly the parsing
of Internet messages, so, it should be done up right.

The compeering would involve the conventions of INND.
The INND software is very usual, vis-a-vis Tornado or some
commercial cousins, these days.

The idea seems to be "run INND with cleanfeed", in terms
of control and junk and the blood/brain barrier, or here
the text/binaries barrier; I'm only interested in setting up
for text and then maybe some "richer text", as with
regards to Internet protocols for messaging and messages.

Then the idea is to implement this "clean-room", so it results in
a plain description of the data structures, logical/physical,
and then a reference implementation.

The groups then accepted/rejected for compeering basically
follow the WILDMAT format, which is pretty reasonable
in terms of yes/no/maybe or sure/no/yes sorts of filters.
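For illustration, a minimal sketch of wildmat evaluation in Java, assuming just the RFC 3977 flavor: comma-separated patterns, '*' and '?' wildcards, a leading '!' for negation, last match wins. (INND's newsfeeds patterns layer more on top of this; the class name here is hypothetical.)

import java.util.regex.Pattern;

// Minimal sketch of wildmat evaluation (RFC 3977 style): patterns are
// comma-separated, may start with '!' for negation, use '*' and '?' as
// wildcards, and the last pattern that matches decides the outcome.
public final class Wildmat {
    public static boolean matches(String wildmat, String group) {
        boolean accepted = false;
        for (String pat : wildmat.split(",")) {
            boolean negate = pat.startsWith("!");
            if (negate) pat = pat.substring(1);
            if (glob(pat).matcher(group).matches()) {
                accepted = !negate;          // last match wins
            }
        }
        return accepted;
    }

    // Translate a glob pattern into a regex: '*' -> '.*', '?' -> '.'.
    private static Pattern glob(String pat) {
        StringBuilder re = new StringBuilder();
        for (char c : pat.toCharArray()) {
            switch (c) {
                case '*': re.append(".*"); break;
                case '?': re.append('.'); break;
                default:  re.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(re.toString());
    }
}

// e.g. Wildmat.matches("sci.*,!sci.crypt.*", "sci.math")         -> true
//      Wildmat.matches("sci.*,!sci.crypt.*", "sci.crypt.random") -> false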

https://www.eyrie.org/~eagle/software/inn/docs-2.6/newsfeeds.html

https://www.eyrie.org/~eagle/software/inn/docs-2.6/libstorage.html

https://www.eyrie.org/~eagle/software/inn/docs-2.6/storage.conf.html#S2

It refers to the INND storage-API token, so I'll be curious about
that and BFF. Where the tradspool format partitions under
groups, BFF instead partitions under message-ID, and
then the groups files have pointers into those.

message-id/

id <- "id"

hd <- "head"
bd <- "body"

td <- "thread", reference, references
rd <- "replied to", touchfile

ad <- "author directory", ... (author id)
yd <- "year to date" (date)

xd <- "expired", no-archive, ...
dd <- "dead", "soft-delete"
ud <- "undead", ...

The files here basically indicate by presence then content,
what's in the message, and what's its state. Then, the idea
is that some markers basically indicate any "inconsistent" state.

The idea is that the message-id folder should be exactly on
the order of the message size, only. I.e. besides head and body,
the other files are only presence indicators or fixed size.
And, the presence files should be limited to fit in the range
of the alphabet, as above, resulting in single-letter file names.

Then the idea is that the message-id folder is created on the
side with id,hd,bd then just moved/renamed into its place,
then by its presence the rest follows. (That it's well-formed.)
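A minimal sketch of that ingestion step in Java, under the layout above: stage "id", "hd", "bd" off to the side, then one rename publishes the folder, so its presence implies it's well-formed. The staging path, the sanitizing of the message-id into a folder name, and the class name are assumptions for illustration.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;

// Minimal sketch of BFF ingestion: build the message-id folder off to the
// side with its "id", "hd", "bd" files, then move/rename it into place so
// that the folder's presence implies it is well-formed.
public final class BffIngest {
    public static void ingest(Path spool, String messageId,
                              byte[] head, byte[] body) throws IOException {
        String dirName = sanitize(messageId);          // e.g. strip '<', '>', '/'
        Path staging = spool.resolve(".staging").resolve(dirName);
        Files.createDirectories(staging);

        Files.write(staging.resolve("id"), messageId.getBytes(StandardCharsets.US_ASCII));
        Files.write(staging.resolve("hd"), head);
        Files.write(staging.resolve("bd"), body);

        Path dest = spool.resolve(dirName);
        if (Files.exists(dest)) {
            // Same message-id already ingested: write-once, keep the first copy.
            deleteRecursively(staging);
            return;
        }
        // One rename publishes the whole folder; readers never see it half-built.
        Files.move(staging, dest, StandardCopyOption.ATOMIC_MOVE);
    }

    private static String sanitize(String id) {
        return id.replaceAll("[<>/\\\\]", "_");
    }

    private static void deleteRecursively(Path dir) throws IOException {
        try (var walk = Files.walk(dir)) {
            walk.sorted(java.util.Comparator.reverseOrder()).forEach(p -> {
                try { Files.delete(p); } catch (IOException ignored) { }
            });
        }
    }
}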

The idea here again is that the storage is just stored deflated already.
Then, as the message is served up with threading, there is the question
of where to litter the thread links, and whether to only litter the
referring post's folder with the referenced post's ID. Otherwise
there's this idea that it's a poor-man's sort of write-once-read-many
organization that's horizontally scalable, so that any assemblage
of messages can be overlaid together, then groups files can be created
on demand, and, as far as files go, the natural file-system cache
caches access to the files.

The idea that the message is stored compressed is that many messages
aren't much read, most clients support compressed delivery,
and the common deflate format allows "stitching" together in
a reference algorithm, what results in the header + glue + body.
This will save much space and not be too complicated to assemble,
where compression and encryption take a lot of the time
in Internet protocols.
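The exact deflate "stitching" reference algorithm isn't pinned down here, so as a stand-in sketch in Java: store the head and body as separate gzip members ("hd.gz" and "bd.gz" are hypothetical names alongside the "hd"/"bd" files above), then serving to a gzip-capable client is just streaming the two members back-to-back, since RFC 1952 allows multi-member streams, while other clients get inflate-on-demand. It assumes the head member already ends with the blank separator line, and multi-member handling is worth verifying against the target clients.

import java.io.*;
import java.nio.file.*;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Minimal sketch of storing the head and body pre-compressed. Because a
// concatenation of gzip members is itself a valid gzip stream (RFC 1952),
// "hd.gz" + "bd.gz" can be streamed back-to-back to a client that accepts
// gzip, or inflated here for clients that do not. (The deflate
// "header + glue + body" stitching is a finer-grained variant of this.)
public final class CompressedStore {
    public static void writeCompressed(Path file, byte[] data) throws IOException {
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(file))) {
            out.write(data);
        }
    }

    // Pass-through for gzip-capable clients: no recompression on the serve path.
    public static void serveCompressed(Path msgDir, OutputStream clientOut) throws IOException {
        Files.copy(msgDir.resolve("hd.gz"), clientOut);
        Files.copy(msgDir.resolve("bd.gz"), clientOut);
    }

    // Fallback: inflate on demand for clients without compression.
    public static byte[] readPlain(Path gzFile) throws IOException {
        try (InputStream in = new GZIPInputStream(Files.newInputStream(gzFile))) {
            return in.readAllBytes();
        }
    }
}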

The message-id is part of the message, so there's some idea that
it's also related to de-duplication under the path; otherwise,
when two messages with the same message-id arrive but with
otherwise different content, something is wrong, and there's the
question of what to do when there are conflicts in content.

All the groups files basically live in one folder, then with regards
to their overviews, it sort of results in just a growing file,
where the idea is that "fixed length records" pretty directly relate
the simplest sort of addressing, in a world where storage has grown
to be unbounded, if slow, and that it also works well with caches and
mmap and all the usual facilities of the usual general-purpose
scheduler and such.
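A minimal sketch of the fixed-length-record idea in Java, with an assumed record width and a padded message-id as the record content: article number n lives at offset (n - lo) * RECORD_LEN, so lookup is a positional read and the file only grows.

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Minimal sketch of a group file as fixed-length records: article number n
// lives at offset (n - lo) * RECORD_LEN, so lookup is a seek, the file only
// grows, and the OS page cache / mmap do the rest.
public final class GroupFile {
    static final int RECORD_LEN = 256;            // padded message-id, one per article

    public static void append(Path groupFile, String messageId) throws IOException {
        byte[] rec = new byte[RECORD_LEN];
        byte[] id = messageId.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(id, 0, rec, 0, Math.min(id.length, RECORD_LEN));
        try (FileChannel ch = FileChannel.open(groupFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            ch.write(ByteBuffer.wrap(rec));
        }
    }

    public static String read(Path groupFile, long lo, long articleNumber) throws IOException {
        try (FileChannel ch = FileChannel.open(groupFile, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(RECORD_LEN);
            ch.read(buf, (articleNumber - lo) * RECORD_LEN);
            return new String(buf.array(), StandardCharsets.US_ASCII).trim();
        }
    }
}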

Relating that to time-series data and currency is a key sort
of thing. Here the idea is to make the time-series
organization usually hierarchical, YYYYMMDD,
or for example YYMMDD if for example this system's epoch
is Jan 1 2000, with a usual sort of idea then to either have
a list of message ID's, or indices that are offsets into the group
file, or otherwise as to how to implement access in partition
to relations of the items, for browsing and searching by date.
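For the date partitioning, a small sketch in Java of mapping a message's Date header to a hierarchical YY/MM/DD folder under a "dates" root, with Jan 1 2000 as the system epoch as above; it assumes the Date header is already normalized to RFC 1123 form, and the names are illustrative.

import java.nio.file.Path;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Minimal sketch: map a parsed Date header to a hierarchical date partition,
// YY/MM/DD under a "dates" root, for messages on or after the Jan 1 2000 epoch.
public final class DatePartition {
    private static final ZonedDateTime EPOCH =
            ZonedDateTime.of(2000, 1, 1, 0, 0, 0, 0, ZoneOffset.UTC);

    public static Path partitionFor(Path datesRoot, String dateHeader) {
        ZonedDateTime dt = ZonedDateTime.parse(dateHeader, DateTimeFormatter.RFC_1123_DATE_TIME)
                                        .withZoneSameInstant(ZoneOffset.UTC);
        if (dt.isBefore(EPOCH)) {
            dt = EPOCH;                           // or a separate pre-epoch partition
        }
        return datesRoot.resolve(String.format("%02d", dt.getYear() % 100))
                        .resolve(String.format("%02d", dt.getMonthValue()))
                        .resolve(String.format("%02d", dt.getDayOfMonth()));
    }
}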

Then it seems for authors there's a sort of "author-id" to get
sorted, so that, basically like threads, it's for making the
set-associativity of messages and threads, and groups, to authors,
then also with regards to NOOBNB that there are
New/Old/Off authors and Bot/Non/Bad authors,
keeping things simple.

Here the idea is that authors, who reply to other authors,
are related variously, people they reply to and people who
reply to them, and also the opposite, people who they
don't reply to and people who don't reply to them.
The idea is that common interest is reflected in replies,
and that can be read off the messages, then also as
for "direct" and "indirect" replies, either down the chain
or on the same thread, or same group.

(Cliques after Kudos and "Frenemies" after "Jabber",
are about same, in "tendered response" and "tendered reserve",
in groups, their threads, then into the domain of context.)
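A minimal sketch in Java of reading those relations off the messages: map message-id to author, then for each message tally its author against the author of the direct parent (the last References entry); the indirect relations follow the same pattern over the rest of the References chain. The Msg record and the method names are just for illustration.

import java.util.*;

// Minimal sketch of reading reply relations off the messages themselves:
// map message-id -> author, then for each message count its author against
// the author of the direct parent (the last References entry). Indirect
// relations follow the same pattern over the rest of the chain.
public final class ReplyGraph {
    public record Msg(String messageId, String author, List<String> references) { }

    public static Map<String, Map<String, Integer>> directReplies(Collection<Msg> corpus) {
        Map<String, String> authorOf = new HashMap<>();
        for (Msg m : corpus) authorOf.put(m.messageId(), m.author());

        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (Msg m : corpus) {
            if (m.references().isEmpty()) continue;          // not a reply
            String parentId = m.references().get(m.references().size() - 1);
            String parentAuthor = authorOf.get(parentId);
            if (parentAuthor == null) continue;              // parent not in this corpus
            counts.computeIfAbsent(m.author(), a -> new HashMap<>())
                  .merge(parentAuthor, 1, Integer::sum);
        }
        return counts;                                       // replier -> replied-to -> count
    }
}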

So, the first part of SFF seems to be making OVERVIEW,
which is the usual key attributes, then relating authorships,
then content. As well, for supporting NNTP and IMAP,
there is some default SFF supporting summary and retrieval.

groups/group-id/

ms <- messages

<- overview ?
<- thread heads/tails ?
<- authors ?
<- date ranges ?
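For the OVERVIEW part, a minimal sketch in Java of deriving one overview line from a message's stored headers, per the standard field order (article number, Subject, From, Date, Message-ID, References, byte count, line count), tab-separated, with TAB/CR/LF in values collapsed to spaces; how the byte and line counts are obtained is left out here, and the class name is illustrative.

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Minimal sketch of producing one NNTP OVERVIEW line (RFC 3977, 8.3/8.4):
// article number, then Subject, From, Date, Message-ID, References, byte
// count, and line count, tab-separated, with any TAB/CR/LF in header values
// replaced by spaces.
public final class Overview {
    private static final List<String> FIELDS =
            List.of("Subject", "From", "Date", "Message-ID", "References");

    public static String line(long articleNumber, Map<String, String> headers,
                              long byteCount, long lineCount) {
        String fields = FIELDS.stream()
                .map(h -> clean(headers.getOrDefault(h, "")))
                .collect(Collectors.joining("\t"));
        return articleNumber + "\t" + fields + "\t" + byteCount + "\t" + lineCount;
    }

    private static String clean(String value) {
        return value.replaceAll("[\\t\\r\\n]+", " ");
    }
}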

It's a usual idea that BFF, the backing file-format, and
SFF, the search file-format, are distinct
and that SFF is just derived from BFF, and on demand,
so that it works out that search algorithms are implemented
on BFF files, naively, then with regards to those making
their own plans and building their own index files
for search and pointing those back to groups, messages,
threads, authors, and so on.


The basic idea of expiry or time-to-live is that there isn't
one yet; rather, it's to result that
the message-id folders get tagged in usual rotations
over the folders in the arrival and date partitions,
then marked out or expunged or what, as with regards
to the write-once-read-many or regenerated groups
files, and the presence or absence of messages by their ID.
(And the state of authors, in time and date ranges.)
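Following the "xd" presence file named in the layout above, a minimal sketch of expiry as a marker in Java: a sweep over an arrival or date partition touches "xd" in a message-id folder, readers treat its presence as expired, and the write-once id/hd/bd stay put until some later expunge pass removes them. The class name is illustrative.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Minimal sketch of expiry as a presence marker: a sweep over a date or
// arrival partition touches "xd" in each message-id folder; readers treat
// the marker's presence as "expired" while the write-once id/hd/bd stay put
// until some later expunge pass actually removes them.
public final class Expiry {
    public static void markExpired(Path messageDir) throws IOException {
        Path marker = messageDir.resolve("xd");
        if (!Files.exists(marker)) {
            Files.createFile(marker);          // fixed-size (empty) presence file
        }
    }

    public static boolean isExpired(Path messageDir) {
        return Files.exists(messageDir.resolve("xd"));
    }
}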
Ross Finlayson
2024-02-10 19:49:42 UTC
Reply
Permalink
On 02/09/2024 10:37 PM, Ross Finlayson wrote:
> On 02/08/2024 01:04 PM, Ross Finlayson wrote:
>> On 03/08/2023 08:51 PM, Ross Finlayson wrote:
>>> On Monday, December 6, 2021 at 7:32:16 AM UTC-8, Ross A. Finlayson
>>> wrote:
>>>> On Monday, November 16, 2020 at 5:39:08 PM UTC-8, Ross A. Finlayson
>>>> wrote:
>>>>> On Monday, November 16, 2020 at 5:00:51 PM UTC-8, Ross A. Finlayson
>>>>> wrote:
>>>>>> On Tuesday, June 30, 2020 at 10:00:52 AM UTC-7, Mostowski Collapse
>>>>>> wrote:
>>>>>>> NNTP is not HTTP. I was using bare metal access to
>>>>>>> usenet, not using Google group, via:
>>>>>>>
>>>>>>> news.albasani.net, unfortunately dead since Corona
>>>>>>>
>>>>>>> So was looking for an alternative. And found this
>>>>>>> alternative, which seems fine:
>>>>>>>
>>>>>>> news.solani.org
>>>>>>>
>>>>>>> Have Fun!
>>>>>>>
>>>>>>> P.S.: Technical spec of news.solani.org:
>>>>>>>
>>>>>>> Intel Core i7, 4x 3.4 GHz, 32 GB RAM, 500 GB SSD RAID1
>>>>>>> Intel Core i7, 4x 3.4 GHz, 32 GB RAM, 2 TB HDD RAID1
>>>>>>> Intel Xeon VM, 1.8 GHz, 1 GB RAM, 20 GB
>>>>>>> Location: 2x Falkenstein, 1x New York
>>>>>>>
>>>>>>> advantage of bare metal usenet,
>>>>>>> you see all headers of message.
>>>>>>> On Tuesday, June 30, 2020 at 06:24:53 UTC+2, Ross A. Finlayson wrote:
>>>>>>>> Search you mentioned and for example HTTP is adding the SEARCH
>>>>>>>> verb,
>>>>>> In traffic there are two kinds of usenet users,
>>>>>> viewers and traffic through Google Groups,
>>>>>> and, USENET. (USENET traffic.)
>>>>>>
>>>>>> Here now Google turned on login to view their
>>>>>> Google Groups - effectively closing the Google Groups
>>>>>> without a Google login.
>>>>>>
>>>>>> I suppose if they're used at work or whatever though
>>>>>> they'd be open.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Where I got with the C10K non-blocking I/O for a usenet server,
>>>>>> it scales up though then I think in the runtime is a situation where
>>>>>> it only runs epoll or kqueue that the test scale ups, then at the end
>>>>>> or in sockets there is a drop, or it fell off the driver. I've
>>>>>> implemented
>>>>>> the code this far, what has all of NNTP in a file and then the
>>>>>> "re-routine,
>>>>>> industry-pattern back-end" in memory, then for that running usually.
>>>>>>
>>>>>> (Cooperative multithreading on top of non-blocking I/O.)
>>>>>>
>>>>>> Implementing the serial queue or "monohydra", or slique,
>>>>>> makes for that then when the parser is constantly parsing,
>>>>>> it seems a usual queue like data structure with parsing
>>>>>> returning its bounds, consuming the queue.
>>>>>>
>>>>>> Having the file buffers all down small on 4K pages,
>>>>>> has that a next usual page size is the megabyte.
>>>>>>
>>>>>> Here though it seems to make sense to have a natural
>>>>>> 4K alignment the file system representation, then that
>>>>>> it is moving files.
>>>>>>
>>>>>> So, then with the new modern Java, it that runs in its own
>>>>>> Java server runtime environment, it seems I would also
>>>>>> need to see whether the cloud virt supported the I/O model
>>>>>> or not, or that the cooperative multi-threading for example
>>>>>> would be single-threaded. (Blocking abstractly.)
>>>>>>
>>>>>> Then besides I suppose that could be neatly with basically
>>>>>> the program model, and its file model, being well-defined,
>>>>>> then for NNTP with IMAP organization search and extensions,
>>>>>> those being standardized, seems to make sense for an efficient
>>>>>> news file organization.
>>>>>>
>>>>>> Here then it seems for serving the NNTP, and for example
>>>>>> their file bodies under the storage, with the fixed headers,
>>>>>> variable header or XREF, and the message body, then under
>>>>>> content it's same as storage.
>>>>>>
>>>>>> NNTP has "OVERVIEW" then from it is built search.
>>>>>>
>>>>>> Let's see here then, if I get the load test running, or,
>>>>>> just put a limit under the load while there are no load test
>>>>>> errors, it seems the algorithm then scales under load to be
>>>>>> making usually the algorithm serial in CPU, with: encryption,
>>>>>> and compression (traffic). (Block ciphers instead of serial
>>>>>> transfer.)
>>>>>>
>>>>>> Then, the industry pattern with re-routines, has that the
>>>>>> re-routines are naturally co-operative in the blocking,
>>>>>> and in the language, including flow-of-control and exception scope.
>>>>>>
>>>>>>
>>>>>> So, I have a high-performance implementation here.
>>>>> It seems like for NFS, then, and having the separate read and write
>>>>> of the client,
>>>>> a default filesystem, is an idea for the system facility: mirroring
>>>>> the mounted file
>>>>> locally, and, providing the read view from that via a different route.
>>>>>
>>>>>
>>>>> A next idea then seems for the organization, the client views
>>>>> themselves
>>>>> organize over the durable and available file system representation,
>>>>> this
>>>>> provides anyone a view over the protocol with a group file convention.
>>>>>
>>>>> I.e., while usual continuous traffic was surfing, individual reads
>>>>> over group
>>>>> files could have independent views, for example collating contents.
>>>>>
>>>>> Then, extracting requests from traffic and threads seems usual.
>>>>>
>>>>> (For example a specialized object transfer view.)
>>>>>
>>>>> Making protocols for implementing internet protocols in groups and
>>>>> so on, here makes for giving usenet example views to content
>>>>> generally.
>>>>>
>>>>> So, I have designed a protocol node and implemented it mostly,
>>>>> then about designed an object transfer protocol, here the idea
>>>>> is how to make it so people can extract data, for example their own
>>>>> data, from a large durable store of all the usenet messages,
>>>>> making views of usenet running on usenet, eg "Feb. 2016: AP's
>>>>> Greatest Hits".
>>>>>
>>>>> Here the point is to figure that usenet, these days, can be operated
>>>>> in cooperation with usenet, and really for its own sake, for leaving
>>>>> messages in usenet and here for usenet protocol stores as there's
>>>>> no reason it's plain text the content, while the protocol supports it.
>>>>>
>>>>> Building personal view for example is a simple matter of very many
>>>>> service providers any of which sells usenet all day for a good deal.
>>>>>
>>>>> Let's see here, $25/MM, storage on the cloud last year for about
>>>>> a million messages for a month is about $25. Outbound traffic is
>>>>> usually the metered cloud traffic, here for example that CDN traffic
>>>>> support the universal share convention, under metering. What that
>>>>> the algorithm is effectively tunable in CPU and RAM, makes for under
>>>>> I/O that it's "unobtrusive" or the cooperative in routine, for CPU,
>>>>> I/O, and
>>>>> RAM, then that there is for seeking that Network Store or Database
>>>>> Time
>>>>> instead effectively becomes File I/O time, as what may be faster,
>>>>> and more durable. There's a faster database time for scaling the
>>>>> ingestion
>>>>> here with that the file view is eventually consistent. (And reliable.)
>>>>>
>>>>> Checking the files would be over time for example with "last checked"
>>>>> and "last dropped" something along the lines of, finding wrong
>>>>> offsets,
>>>>> basically having to make it so that it survives neatly corruption of
>>>>> the
>>>>> store (by being more-or-less stored in-place).
>>>>>
>>>>> Content catalog and such, catalog.
>>>> Then I wonder and figure the re-routine can scale.
>>>>
>>>> Here for the re-routine, the industry factory pattern,
>>>> and the commands in the protocols in the templates,
>>>> and the memory module, with the algorithm interface,
>>>> in the high-performance computer resource, it is here
>>>> that this simple kind of "writing Internet software"
>>>> makes pretty rapidly for adding resources.
>>>>
>>>> Here the design is basically of a file I/O abstraction,
>>>> that the computer reads data files with mmap to get
>>>> their handlers, what results that for I/O map the channels
>>>> result transferring the channels in I/O for what results,
>>>> in mostly the allocated resource requirements generally,
>>>> and for the protocol and algorithm, it results then that
>>>> the industry factory pattern and making for interfaces,
>>>> then also here the I/O routine as what results that this
>>>> is an implementation, of a network server, mostly is making
>>>> for that the re-routine, results very neatly a model of
>>>> parallel cooperation.
>>>>
>>>> I think computers still have file systems and file I/O but
>>>> in abstraction just because PAGE_SIZE is still relevant for
>>>> the network besides or I/O, if eventually, here is that the
>>>> value types are in the commands and so on, it is besides
>>>> that in terms of the resources so defined it still is in a filesystem
>>>> convention that a remote and unreliable view of it suffices.
>>>>
>>>> Here then the source code also being "this is only 20-50k",
>>>> lines of code, with basically an entire otherwise library stack
>>>> of the runtime itself, only the network and file abstraction,
>>>> this makes for also that modularity results. (Factory Industry
>>>> Pattern Modules.)
>>>>
>>>> For a network server, here, that, mostly it is high performance
>>>> in the sense that this is about the most direct handle on the channels
>>>> and here mostly for the text layer in the I/O order, or protocol layer,
>>>> here is that basically encryption and compression usually in the layer,
>>>> there is besides a usual concern where encryption and compression
>>>> are left out, there is that text in the layer itself is commands.
>>>>
>>>> Then, those being constants under the resources for the protocol,
>>>> it's what results usual protocols like NNTP and HTTP and other
>>>> protocols
>>>> with usually one server and many clients, here is for that these
>>>> protocols
>>>> are defined in these modules, mostly there NNTP and IMAP, ..., HTTP.
>>>>
>>>> These are here defined "all Java" or "Pure Java", i.e. let's be clear
>>>> that
>>>> in terms of the reference abstraction layer, I think computers still
>>>> use
>>>> the non-blocking I/O and filesystems and network to RAM, so that as
>>>> the I/O is implemented in those it actually has those besides instead
>>>> for
>>>> example defaulting to byte-per-channel or character I/O. I.e. the usual
>>>> semantics for servicing the I/O in the accepter routine and what makes
>>>> for that the platform also provides a reference encryption
>>>> implementation,
>>>> if not so relevant for the block encoder chain, besides that for
>>>> example
>>>> compression has a default implementation, here the I/O model is as
>>>> simply
>>>> in store for handles, channels, ..., that it results that data
>>>> especially delivered
>>>> from a constant store can anyways be mostly compressed and encrypted
>>>> already or predigested to serve, here that it's the convention, here
>>>> is for
>>>> resulting that these client-server protocols, with usually reads >
>>>> postings
>>>> then here besides "retention", basically here is for what it is.
>>>>
>>>> With the re-routine and the protocol layer besides, having written the
>>>> routines in the re-routine, what there is to write here is this
>>>> industry
>>>> factory, or a module framework, implementing the re-routines, as
>>>> they're
>>>> built from the linear description a routine, makes for as the routine
>>>> progresses
>>>> that it's "in the language" and that more than less in the terms, it
>>>> makes for
>>>> implementing the case of logic for values, in the logic's
>>>> flow-of-control's terms.
>>>>
>>>> Then, there is that actually running the software is different than
>>>> just
>>>> writing it, here in the sense that as a server runtime, it is to be
>>>> made a
>>>> thing, by giving it a name, and giving it an authority, to exist on
>>>> the Internet.
>>>>
>>>> There is basically that for BGP and NAT and so on, and, mobile fabric
>>>> networks,
>>>> IP and TCP/IP, of course IPv4 and IPv6 are the coarse fabric main
>>>> space, with
>>>> respect to what are CIDR and 24 bits rule and what makes for TCP/IP,
>>>> here
>>>> entirely the course is using the TCP/IP stack and Java's TCP/IP
>>>> stack, with
>>>> respect to that TCP/IP is so provided or in terms of process what
>>>> results
>>>> ports mostly and connection models where it is exactly the TCP after
>>>> the IP,
>>>> the Transport Control Protocol and Internet Protocol, have here both
>>>> this
>>>> socket and datagram connection orientation, or stateful and
>>>> stateless or
>>>> here that in terms of routing it's defined in addresses, under that
>>>> names
>>>> and routing define sources, routes, destinations, ..., that routine
>>>> numeric
>>>> IP addresses result in the usual sense of the network being behind
>>>> an IP
>>>> and including IPv4 network fabric with respect to local routers.
>>>>
>>>> I.e., here to include a service framework is "here besides the
>>>> routine, let's
>>>> make it clear that in terms of being a durable resource, there needs
>>>> to be
>>>> some lockbox filled with its sustenance that in some locked or constant
>>>> terms results that for the duration of its outlay, say five years, it
>>>> is held
>>>> up, then, it will be so again, or, let down to result the carry-over
>>>> that it
>>>> invested to archive itself, I won't have to care or do anything until
>>>> then".
>>>>
>>>>
>>>> About the service activation and the idea that, for a port, the
>>>> routine itself
>>>> needs only run under load, i.e. there is effectively little traffic
>>>> on the old archives,
>>>> and usually only the some other archive needs any traffic. Here the
>>>> point is
>>>> that for the Java routine there is the system port that was accepted
>>>> for the
>>>> request, that inetd or the systemd or means the network service was
>>>> accessed,
>>>> made for that much as for HTTP the protocol is client-server also for
>>>> IP the
>>>> protocol is client-server, while the TCP is packets. This is a
>>>> general idea for
>>>> system integration while here mostly the routine is that being a
>>>> detail:
>>>> the filesystem or network resource that results that the re-routines
>>>> basically
>>>> make very large CPU scaling.
>>>>
>>>> Then, it is basically containerized this sense of "at some domain
>>>> name, there
>>>> is a service, it's HTTP and NNTP and IMAP besides, what cares the
>>>> world".
>>>>
>>>> I.e. being built on connection oriented protocols like the socket
>>>> layer,
>>>> HTTP(S) and NNTP(S) and IMAP(S) or with the TLS orientation to
>>>> certificates,
>>>> it's more than less sensible that most users have no idea of
>>>> installing some
>>>> NNTP browser or pointing their email to IMAP so that the email browser
>>>> browses the newsgroups and for postings, here this is mostly only talk
>>>> about implementing NNTP then IMAP and HTTP that happens to look like
>>>> that,
>>>> besides for example SMTP or NNTP posting.
>>>>
>>>> I.e., having "this IMAP server, happens to be this NNTP module", or
>>>> "this HTTP server, happens to be a real simple mailbox these groups",
>>>> makes for having partitions and retentions of those and that basically
>>>> NNTP messages in the protocol can be more or less the same content
>>>> in media, what otherwise is of a usual message type.
>>>>
>>>> Then, the NNTP server-server routine is the propagation of messages
>>>> besides "I shall hire ten great usenet retention accounts and gently
>>>> and politely draw them down and back-fill Usenet, these ten groups".
>>>>
>>>> By then I would have to have made for retention in storage, such
>>>> contents,
>>>> as have a reference value, then for besides making that independent in
>>>> reference value, just so that it suffices that it basically results
>>>> "a usable
>>>> durable filesystem that happens you can browse it like usenet". I.e. as
>>>> the pieces to make the backfill are dug up, they get assigned
>>>> reference numbers
>>>> of their time to make for what here is that in a grand schema of
>>>> things,
>>>> they have a reference number in numerical order (and what's also the
>>>> server's "message-number" besides its "message-id") as noted above this
>>>> gets into the storage for retention of a file, while, most services
>>>> for this
>>>> are instead for storage and serving, not necessarily or at all
>>>> retention.
>>>>
>>>> I.e., the point is that as the groups are retained from retention,
>>>> there is an
>>>> approach what makes for an orderly archeology, as for what convention
>>>> some data arrives, here that this server-server routine is besides
>>>> the usual
>>>> routine which is "here are new posts, propagate them", it's "please
>>>> deliver
>>>> as of a retention scan, and I'll try not to repeat it, what results
>>>> as orderly
>>>> as possible a proof or exercise of what we'll call afterward entire
>>>> retention",
>>>> then will be for as of writing a file that "as of the date, from
>>>> start to finish,
>>>> this site certified these messages as best-effort retention".
>>>>
>>>> It seems then besides there is basically "here is some mbox file,
>>>> serve it
>>>> like it was an NNTP group or an IMAP mailbox", ingestion, in terms of
>>>> that
>>>> what is ingestion, is to result for the protocol that "for this
>>>> protocol,
>>>> there is actually a normative filesystem representation that happens to
>>>> be pretty much also altogether defined by the protocol", the point is
>>>> that ingestion would result in command to remain in the protocol,
>>>> that a usual file type that "presents a usual abstraction, of a
>>>> filesystem,
>>>> as from the contents of a file", here with the notion of "for all these
>>>> threaded discussions, here this system only cares some approach to
>>>> these ten particular newsgroups that already have mostly their corpus
>>>> though it's not in perhaps their native mbox instead consulted from
>>>> services".
>>>>
>>>> Then, there's for storing and serving the files, and there is the usual
>>>> notion that moving the data, is to result, that really these file
>>>> organizations
>>>> are not so large in terms of resources, being "less than gigabytes"
>>>> or so,
>>>> still there's a notion that as a durable resource they're to be made
>>>> fungible here the networked file approach in the native filesystem,
>>>> then that with respect to it's a backing store, it's to make for that
>>>> the entire enterprise is more or less to made in terms of account,
>>>> that then as a facility on the network then a service in the network,
>>>> it's basically separated the facility and service, while still of
>>>> course
>>>> that the service is basically defined by its corpus.
>>>>
>>>>
>>>> Then, to make that fungible in a world of account, while with an exit
>>>> strategy so that the operation isn't not abstract, is mostly about the
>>>> domain name, then that what results the networking, after trusted
>>>> network naming and connections for what result routing, and then
>>>> the port, in terms of that there are usual firewalls in ports though
>>>> that
>>>> besides usually enough client ports are ephemeral, here the point is
>>>> that the protocols and their well-known ports, here it's usually enough
>>>> that the Internet doesn't concern itself so much protocols but with
>>>> respect to proxies, here that for example NNTP and IMAP don't have
>>>> so much anything so related that way after startTLS.
>
>
>
> Reading up on anti-spam, it seems that Usenet messages have
> a pretty simple format, then with regards to all of Internet
> messages, or Email and MIME and so on, gets into basically
> the nitty-gritty of the Internet Protocols like SMTP, IMAP, NNTP,
> and HTTP, about figuring out what's the needful then for things
> like Netnews messages, Email messages, HTTP messages,
> and these kinds of things, basically for message multi-part.
>
> https://en.wikipedia.org/wiki/MIME
>
> (DANE, DKIM, DMARC, ....)
>
> It's kind of complicated to implement correctly the parsing
> of Internet messages, so, it should be done up right.
>
> The compeering would involve the conventions of INND.
> The INND software is very usual, vis-a-vis Tornado or some
> commercial cousins, these days.
>
> The idea seems to be "run INND with cleanfeed", in terms
> of control and junk and the blood/brain barrier or here
> the text/binaries barrier, I'm only interested in setting up
> for text and then maybe some "richer text" or as with
> regards to Internet protocols for messaging and messages.
>
> Then the idea is to implement this "clean-room", so it results
> a sort of plain description of data structures logical/physical
> then a reference implementation.
>
> The groups then accepted/rejected for compeering basically
> follow the WILDMAT format, which is pretty reasonable
> in terms of yes/no/maybe or sure/no/yes sorts of filters.
>
> https://www.eyrie.org/~eagle/software/inn/docs-2.6/newsfeeds.html
>
> https://www.eyrie.org/~eagle/software/inn/docs-2.6/libstorage.html
>
> https://www.eyrie.org/~eagle/software/inn/docs-2.6/storage.conf.html#S2
>
> It refers to the INND storageApi token so I'll be curious about
> that and BFF. The tradspool format, here as it partitions under
> groups, is that BFF instead partitions under message-ID, that
> then groups files have pointers into those.
>
> message-id/
>
> id <- "id"
>
> hd <- "head"
> bd <- "body"
>
> td <- "thread", reference, references
> rd <- "replied to", touchfile
>
> ad <- "author directory", ... (author id)
> yd <- "year to date" (date)
>
> xd <- "expired", no-archive, ...
> dd <- "dead", "soft-delete"
> ud <- "undead", ...
>
> The files here basically indicate by presence then content,
> what's in the message, and what's its state. Then, the idea
> is that some markers basically indicate any "inconsistent" state.
>
> The idea is that the message-id folder should be exactly on
> the order of the message size, only. I.e. besides head and body,
> the other files are only presence indicators or fixed size.
> And, the presence files should be limited to fit in the range
> of the alphabet, as above it results single-letter named files.
>
> Then the idea is that the message-id folder is created on the
> side with id,hd,bd then just moved/renamed into its place,
> then by its presence the rest follows. (That it's well-formed.)
>
> The idea here again is that the storage is just stored deflated already,
> with the idea that then as the message is served up with threading,
> where to litter the thread links, and whether to only litter the
> referring post's folder with the referenced post's ID, or that otherwise
> there's this idea that it's a poor-man's sort of write-once-read-many
> organization, that's horizontally scalable, then that any assemblage
> of messages can be overlaid together, then groups files can be created
> on demand, then that as far as files go, the natural file-system cache,
> caches access to the files.
>
> The idea that the message is stored compressed is that many messages
> aren't much read, and most clients support compressed delivery,
> and the common deflate format allows "stitching" together in
> a reference algorithm, what results the header + glue + body.
> This will save much space and not be too complicated to assemble,
> where compression and encryption are a lot of the time,
> in Internet protocols.
>
> The message-id is part of the message, so there's some idea that
> it's also related to de-duplication under path, then that otherwise
> when two messages with the same message-id arrive, but different
> otherwise content, is wrong, about what to do when there are conflicts
> in content.
>
> All the groups files basically live in one folder, then with regards
> to their overviews, as that it sort of results just a growing file,
> where the idea is that "fixed length records" pretty directly relate
> a simplest sort of addressing, in a world where storage has grown
> to be unbounded, if slow, that it also works well with caches and
> mmap and all the usual facilities of the usual general purpose
> scheduler and such.
>
> Relating that to time-series data then and currency, is a key sort
> of thing, about here that the idea is to make for time-series
> organization that it's usually enough hierarchical YYYYMMDD,
> or for example YYMMDD, if for example this system's epoch
> is Jan 1 2000, with a usual sort of idea then to either have
> a list of message ID's, or, indices that are offsets to the group
> file, or, otherwise as to how to implement access in partition
> to relations of the items, for browsing and searching by date.
>
> Then it seems for authors there's a sort of "author-id" to get
> sorted, so that basically like threads is for making the
> set-associativity of messages and threads, and groups, to authors,
> then also as with regards to NOOBNB that there are
> New/Old/Off authors and Bot/Non/Bad authors,
> keeping things simple.
>
> Here the idea is that authors, who reply to other authors,
> are related variously, people they reply to and people who
> reply to them, and also the opposite, people who they
> don't reply to and people who don't reply to them.
> The idea is that common interest is reflected in replies,
> and that can be read off the messages, then also as
> for "direct" and "indirect" replies, either down the chain
> or on the same thread, or same group.
>
> (Cliques after Kudos and "Frenemies" after "Jabber",
> are about same, in "tendered response" and "tendered reserve",
> in groups, their threads, then into the domain of context.)
>
> So, the first part of SFF seems to be making OVERVIEW,
> which is usual key attributes, then relating authorships,
> then as about content. As well for supporting NNTP and IMAP,
> is for some default SFF supporting summary and retrieval.
>
> groups/group-id/
>
> ms <- messages
>
> <- overview ?
> <- thread heads/tails ?
> <- authors ?
> <- date ranges ?
>
> It's a usual idea that BFF, the backing file-format, and
> SFF, the search file-format, has that they're distinct
> and that SFF is just derived from BFF, and on-demand,
> so that it works out that search algorithms are implemented
> on BFF files, naively, then as with regards to those making
> their own plans and building their own index files as then
> for search and pointing those back to groups, messages,
> threads, authors, and so on.
>
>
> The basic idea of expiry or time-to-live is basically
> that there isn't one, yet, it's basically to result that
> the message-id folders get tagged in usual rotations
> over the folders in the arrival and date partitions,
> then marked out or expunged or what, as with regards
> to the write-once-read-many or regenerated groups
> files, and the presence or absence of messages by their ID.
> (And the state of authors, in time and date ranges.)
>
>
>

About TLS again: one of the biggest costs
of serving data in time (CPU time) is encryption,
the other usually being compression, here with
regard to static assets that are already generated
and more or less digested.

So, looking at the ciphersuites of TLS, is basically
that after the handshake and negotiation, and
as above there's the notion of employing
renegotiation in 1.2 to share "closer certificates",
that 1.3 cut out, that after negotiation then is
the shared secret of the session that along in
the session the usual sort of symmetric block-cipher
converts the plain- or compressed-data, to,
the encrypted and what results the wire data.
(In TLS the client and server use the same
"master secret" for the symmetric block/stream
cipher both ways.)

So what I'm wondering is about how to make it
so, that the data is stored first compressed at
rest, and in pieces, with the goal to make it so
that usual tools like zcat and zgrep work on
the files at rest, and for example inflate them
for use with textutils. Then, I also wonder about
what usual ciphersuites result, to make it so that
there's scratch/discardable/ephemeral/ad-hoc/
opportunistic derived data, that's at least already
"partially encrypted", so that then serving it for
the TLS session, results a sort of "block-cipher's
simpler-finishing encryption".

Looking at ChaCha algorithm, it employs
"addition, complement, and rotate".
(Most block and streaming ciphers aim to
have the same size of the output as the input
with respect to otherwise a usual idea that
padding output reduces available information.)

https://en.wikipedia.org/wiki/Block_cipher
https://en.wikipedia.org/wiki/Stream_cipher

So, as you can imagine, block-ciphers are
a very minimal subset of ciphers altogether.

There's a basic idea that the server just always
uses the same symmetric keys so that then
it can just encrypt the data at rest with those,
and, serve them right up. But, it's a matter of
the TLS Handshake establishing the "PreMaster
secret" (or, lack thereof) and it's "pesudo-random function",
what with regards to the server basically making
for contriving its "random number" earlier in
the handshake to arrive at some "predetermined
number".

Then the idea is for example just to make it
so for each algorithm that the data's stored
encrypted then that it kind of goes in and out
of the block cipher, so that then it sort of results
that it's already sort of encrypted and takes less
rounds to line up with the session secret.

https://datatracker.ietf.org/doc/html/rfc8446

"All the traffic keying material is recomputed
whenever the underlying Secret changes
(e.g., when changing from the handshake
to Application Data keys or upon a key update)."

TLS 1.3: "The key derivation functions have
been redesigned. The new design allows
easier analysis by cryptographers due to
their improved key separation properties.
The HMAC-based Extract-and-Expand Key
Derivation Function (HKDF) is used as an
underlying primitive."

https://en.wikipedia.org/wiki/HKDF

So, the idea is "what goes into HKDF so
that it results a known value, then
having the data already encrypted for that."

I'm not much interested in actual _strength_
of encryption, just making it real simple in
the protocol to have static data ready to
send right over the wire according to the
server indicating in the handshake how it will be.

And that that can change on demand, ....

"Values are defined in Appendix B.4."

https://datatracker.ietf.org/doc/html/rfc8446#appendix-B.4

So, I'm looking at GCM, CCM, and POLY1305,
with respect to how to compute values such that
the HKDF comes out to a given value.

https://en.wikipedia.org/wiki/Cipher_suite

Then also there's for basically TLS 1.2, just
enough backward and forward that the server
can indicate the ciphersuite, and the input to
the key derivation function, for which its data is
already ready.

Arriving at what inputs will make a given hash
algorithm arrive at a given hash is a preimage
problem, which is practically intractable.
Here though it would
allow this weak encryption (and caching of them)
the static assets, then serving them in protocol,
figuring that man-in-the-middle is already broken
anyways, with regards to the usual 100's of
"root CAs" bundled with usual User-Agentry.

I.e., the idea here is just to conform with TLS,
while, having the least cost to serve it, while, using
standard algorithms, and not just plain-text,
then, being effectively weak, and, not really
expecting any forward privacy, but, saving
the environment by using less watts.

Then what it seems results is that the server just
indicates ciphersuites that have that the resulting
computed key can be made so for its hash,
putting the cost on the handshake, then
that the actual block cipher is a no-op.


You like ...?
Ross Finlayson
2024-02-11 22:18:16 UTC
Reply
Permalink
So I'm looking at my hi-po C10K low-load/constant-load
Internet text protocol server, then with respect to
encryption and compression as usual, then I'm looking
to make that in the framework, to have those basically
be out-of-band, with respect to things like
encryption and compression, or things like
transport and HTTP or "upgrade".

I.e., the idea here is to implement the servers first
in "TLS-terminated" or un-encrypted, then as with
respect to having enough aware in the protocol,
to make for adapting to encrypting and compressing
and upgrading front-ends, with regards to the
publicly-facing endpoints and the internally-facing
endpoints, which you would know about if you're
usually enough familiar with client-server frameworks
and server-oriented architecture and these kinds of
things.

The idea then is to offload the TLS-termination
to a sort of dedicated layer, then as with regards
to a generic sort of "out-of-band" state machine
the establishment and maintenance of the connections,
where still I'm mostly interested in "stateful" protocols
or "connection-oriented" vis-a-vis the "datagram"
protocols, or about endpoints and sockets vis-a-vis
endpoints and datagrams, those usually enough sharing
an address family while variously their transport (packets).

Then there's sort of whether to host TLS-termination
inside the runtime as usually, or next to it as sort of
either in-process or out-of-process, similarly with
compression, and including for example concepts
of cache-ing, and upgrade, and these sorts things,
while keeping it so that the "protocol module" is
all self-contained and behaves according to protocol,
for the great facility of the standardization and deployment
of Internet protocols in a friendly sort of environment,
vis-a-vis the DMZ to the wider Internet, as basically with
the idea of only surfacing one well-known port and otherwise
abstracting away the rest of the box altogether,
to reduce the attack surface and its vectors, for
the usual goal of threat-modeling: reducing it.


So people would usually enough just launch a proxy,
but I'm mostly interested only in supporting TLS and
perhaps compression in the protocol as only altogether
a pass-through layer, then as with regards to connecting
that in-process as possible, so passing I/O handles,
otherwise with a usual notion of domain sockets
or just plain Address Family UNIX sockets.

There's basically whether the publicly-facing actually
just serves on the usual un-encrypted port, for the
insensitive types of things, and the usual encrypted
port, or whether it's mostly in the protocol that
STARTTLS or "upgrade" occurs, "in-band" or "out-of-band",
and with respect to usually there's no notion at all
of STREAMS or "out-of-band" in STREAMS, sockets,
Address Family UNIX.


The usual notion here is making it like so:

NNTP
IMAP -> NNTP
HTTP -> IMAP -> NNTP

for a Usenet service, then as with respect to
that there's such high affinity of SMTP, then
as with regards to HTTP more generally as
the most usual fungible de facto client-server
protocol, is connecting those locally after
TLS-termination, while still having TLS-layer
between the Internet and the server.

So in this high-performance implementation it
sort of relies directly on the commonly implemented
and ubiquitously available non-blocking I/O of
the runtime, here as about keeping it altogether
simple, with respect to the process model,
and the runtime according to the OS/virt/scheduler's
login and quota and bindings, and back-end,
that in some runtimes like an app-container,
that's supposed to live all in-process, while with
respect to off-loading load to right-sized resources,
it's sort of general.

Then I've written this mostly in Java and plan to
keep it this way, where the Direct Memory for
the service of non-blocking I/O, is pretty well
understood, vis-a-vis actually just writing this
closer to the user-space libraries, here as with
regards to usual notions of cross-compiling and
so on. Here it's kind of simplified because this
entire stack has no dependencies outside the
usual Virtual Machine, it compiles and runs
without a dependency manager at all, then
though it gets involved when parsing the content,
while simply the framework of ingesting, storing,
and moving blobs is just damn fast, and
very well-behaved in the resources of the runtime.

So, setting up TLS termination for these sorts
protocols where the protocol either does or
doesn't have an explicit STARTTLS up front
or always just opens with the handshake,
basically has where I'm looking at how to
instrument and connect that for the Hopper
as above and how besides passing native
file and I/O handles and buffers, what least
needful results a useful approach for TLS on/off.

So, this is a sort of approach, figuring for
"nesting the protocols", where similarly is
the goal of having the fronting of the backings,
sort of like so, ...

NNTP
IMAP -> NNTP
HTTP -> NNTP
HTTP -> IMAP -> NNTP

with the front being in the protocol, then
that HTTP has a sort of normative protocol
for IMAP and NNTP protocols, and IMAP
has as for NNTP protocols, treating groups
like mailboxes, and commands as under usual
sorts HTTP verbs and resources.

Similarly the same server can just serve each
the relevant protocols on each the relevant ports.

If you know these things, ....
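
For what "relies directly on the commonly implemented
and ubiquitously available non-blocking I/O of the runtime"
looks like in Java, here's a minimal selector-loop sketch,
only the shape of the thing and not the actual server:
it accepts connections on an arbitrary port and echoes
bytes back where the protocol module would otherwise
take over.

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.util.Iterator;

// Minimal non-blocking selector loop: accept, read, echo. The real server
// would hand the buffers to a protocol module instead of echoing them.
public final class SelectorLoop {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(11119));         // arbitrary port for the sketch
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        ByteBuffer buf = ByteBuffer.allocateDirect(4096);   // "Direct Memory" for non-blocking I/O
        while (true) {
            selector.select();
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isAcceptable()) {
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    SocketChannel client = (SocketChannel) key.channel();
                    buf.clear();
                    int n = client.read(buf);
                    if (n < 0) { client.close(); continue; }
                    buf.flip();
                    client.write(buf);   // echo; a protocol module would parse and respond here
                }
            }
        }
    }
}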
Ross Finlayson
2024-02-12 18:55:13 UTC
Reply
Permalink
Looking at how Usenet moderated groups operate,
well first there's PGP and control messages then
later it seems there's this sort of Stump/Webstump
setup, or as with regards to moderators.isc.org,
what is usual with regards to control messages
and usual notions of control and cancel messages
and as with regards to newsgroups that actually
want to employ Usenet moderation sort of standardly.

(Usenet trust is mostly based on PGP, or
'Philip Zimmermann's Pretty Good Privacy',
though there are variations and over time.)

http://tools.ietf.org/html/rfc5537

http://wiki.killfile.org/projects/usenet/faqs/nam/


Reading into RFC5537 gets into some detail like
limits in the headers field with respect to References
or Threads:

https://datatracker.ietf.org/doc/html/rfc5537#section-3.4.4

https://datatracker.ietf.org/doc/html/rfc5537#section-3.5.1

So, the agents are described as

Posting
Injecting
Relaying
Serving
Reading

Moderator
Gateway

then with respect to these sorts of separations of duties,
the usual notions of Internet protocols their agents
and behavior in the protocol, old IETF MUST/SHOULD/MAY
and so on.

So, the goal here seems to be to define a
profile of "connected core services" of sorts
of Internet protocol messaging, then this
"common central storage" of this BFF/SFF
and then reference implementations then
for reference editions, these sorts things.

Of course there already is one, it's called
"Internet mail and news".
Ross Finlayson
2024-02-14 19:03:46 UTC
Reply
Permalink
On 02/11/2024 02:18 PM, Ross Finlayson wrote:
>
> So I'm looking at my hi-po C10K low-load/constant-load
> Internet text protocol server, then with respect to
> encryption and compression as usual, then I'm looking
> to make that in the framework, to have those basically
> be out-of-band, with respect to things like
> encryption and compression, or things like
> transport and HTTP or "upgrade".
>
> I.e., the idea here is to implement the servers first
> in "TLS-terminated" or un-encrypted, then as with
> respect to having enough aware in the protocol,
> to make for adapting to encrypting and compressing
> and upgrading front-ends, with regards to the
> publicly-facing endpoints and the internally-facing
> endpoints, which you would know about if you're
> usually enough familiar with client-server frameworks
> and server-oriented architecture and these kinds of
> things.
>
> The idea then is to offload the TLS-termination
> to a sort of dedicated layer, then as with regards
> to a generic sort of "out-of-band" state machine
> the establishment and maintenance of the connections,
> where still I'm mostly interested in "stateful" protocols
> or "connection-oriented" vis-a-vis the "datagram"
> protocols, or about endpoints and sockets vis-a-vis
> endpoints and datagrams, those usually enough sharing
> an address family while variously their transport (packets).
>
> Then there's sort of whether to host TLS-termination
> inside the runtime as usually, or next to it as sort of
> either in-process or out-of-process, similarly with
> compression, and including for example concepts
> of cache-ing, and upgrade, and these sorts things,
> while keeping it so that the "protocol module" is
> all self-contained and behaves according to protocol,
> for the great facility of the standardization and deployment
> of Internet protocols in a friendly sort of environment,
> vis-a-vis the DMZ to the wider Internet, as basically with
> the idea of only surfacing one well-known port and otherwise
> abstracting away the rest of the box altogether,
> to reduce the attack surface its vectors, for
> a usual goal of thread-modeling, reducing it.
>
>
> So people would usually enough just launch a proxy,
> but I'm mostly interested only in supporting TLS and
> perhaps compression in the protocol as only altogether
> a pass-through layer, then as with regards to connecting
> that in-process as possible, so passing I/O handles,
> otherwise with a usual notion of domain sockets
> or just plain Address Family UNIX sockets.
>
> There's basically whether the publicly-facing actually
> just serves on the usual un-encrypted port, for the
> insensitive types of things, and the usual encrypted
> port, or whether it's mostly in the protocol that
> STARTTLS or "upgrade" occurs, "in-band" or "out-of-band",
> and with respect to usually there's no notion at all
> of STREAMS or "out-of-band" in STREAMS, sockets,
> Address Family UNIX.
>
>
> The usual notion here is making it like so:
>
> NNTP
> IMAP -> NNTP
> HTTP -> IMAP -> NNTP
>
> for a Usenet service, then as with respect to
> that there's such high affinity of SMTP, then
> as with regards to HTTP more generally as
> the most usual fungible de facto client-server
> protocol, is connecting those locally after
> TLS-termination, while still having TLS-layer
> between the Internet and the server.
>
> So in this high-performance implementation it
> sort of relies directly on the commonly implemented
> and ubiquitously available non-blocking I/O of
> the runtime, here as about keeping it altogether
> simple, with respect to the process model,
> and the runtime according to the OS/virt/scheduler's
> login and quota and bindings, and back-end,
> that in some runtimes like an app-container,
> that's supposed to live all in-process, while with
> respect to off-loading load to right-sized resources,
> it's sort of general.
>
> Then I've written this mostly in Java and plan to
> keep it this way, where the Direct Memory for
> the service of non-blocking I/O, is pretty well
> understood, vis-a-vis actually just writing this
> closer to the user-space libraries, here as with
> regards to usual notions of cross-compiling and
> so on. Here it's kind of simplified because this
> entire stack has no dependencies outside the
> usual Virtual Machine, it compiles and runs
> without a dependency manager at all, then
> though that it gets involved the parsing the content,
> while simply the framework of ingesting, storing,
> and moving blobs is just damn fast, and
> very well-behaved in the resources of the runtime.
>
> So, setting up TLS termination for these sorts
> protocols where the protocol either does or
> doesn't have an explicit STARTTLS up front
> or always just opens with the handshake,
> basically has where I'm looking at how to
> instrument and connect that for the Hopper
> as above and how besides passing native
> file and I/O handles and buffers, what least
> needful results a useful approach for TLS on/off.
>
> So, this is a sort of approach, figuring for
> "nesting the protocols", where similarly is
> the goal of having the fronting of the backings,
> sort of like so, ...
>
> NNTP
> IMAP -> NNTP
> HTTP -> NNTP
> HTTP -> IMAP -> NNTP
>
> with the front being in the protocol, then
> that HTTP has a sort of normative protocol
> for IMAP and NNTP protocols, and IMAP
> has as for NNTP protocols, treating groups
> like mailboxes, and commands as under usual
> sorts HTTP verbs and resources.
>
> Similarly the same server can just serve each
> the relevant protocols on each the relevant ports.
>
> If you know these things, ....
>
>



So one thing I want here is to make it so that data can
be encrypted very weakly at rest, and then that SSL/TLS,
for TLS 1.2 or TLS 1.3, results in the symmetric key bits
for the records always being the same as that very-weak key.

This way pretty much the entire CPU load of TLS is
eliminated, while the data is still encrypted very weakly,
which at least naively looks inscrutable on the wire.

The idea is that in TLS 1.2 there's this

client random cr ->
<- server random sr
client premaster cpm ->

these going into PRF(cpm, 'blah', cr + sr, [48]), then
whether renegotiation keeps the same client random
and client premaster, then that the server can choose
its server random so that the derived key comes out
as the very-weak key, or whatever results in least effort.

Maybe not, sort of depends.

Then TLS 1.3 has this HKDF, the HMAC-based Key Derivation
Function; the server can again provide a salt or server random,
then as with regards to filling that out in the algorithm to
result in the very-weak key, for a least-effort block cipher
that's also zero-effort, being a pass-through no-op, so the
block cipher stays out of the way of data that's already
concatenably compressed and very weakly encrypted at rest.


Then it looks like I'd be trying to make hash collisions which
is practically intractable, about what goes into the seeds
whether it can result things like "the server random is
zero minus the client random, their sum is zero" and
this kind of thing.


I suppose it would be demonstrative to setup a usual
sort of "TLS man-in-the-middle" Mitm just to demonstrate
that given the client trusts any of Mitm's CAs and the
server trusts any of Mitm's CAs that Mitm sits in the middle
and can intercept all traffic.

So, the TLS 1.2, PRF or pseudo-random function, is as of
"a secret, a seed, and an identifying label". It's all SHA-256
in TLS 1.2. Then it's iterative over the seed, that the
secret is hashed with the seed-hashed secret so many times,
each round of that concatenated ++ until there's enough bytes
to result the key material. Then in TLS the seed is defined
as "blah' ++ seed, so, to figure out how to figure to make it
so that 'blah' ++ (client random + server random) makes it
possible to make a spigot of the hash algorithm, of zeros,
or an initial segment long enough for all key sizes,
to split out of that the server write MAC and encryption keys,
then to very-weakly encrypt the data at rest with that.
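
To make the shape of that concrete, here's a minimal Java
sketch of the TLS 1.2 PRF from RFC 5246 section 5, P_SHA256
over label + seed, using the standard HmacSHA256; the inputs
in main are all-zero placeholders, not a real handshake.

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;

// TLS 1.2 PRF (RFC 5246, section 5): PRF(secret, label, seed) = P_SHA256(secret, label + seed),
// where P_SHA256 iterates A(i) = HMAC(secret, A(i-1)) and emits HMAC(secret, A(i) + seed).
public final class Tls12Prf {
    static byte[] hmac(byte[] key, byte[]... parts) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(key, "HmacSHA256"));
        for (byte[] p : parts) mac.update(p);
        return mac.doFinal();
    }

    static byte[] prf(byte[] secret, String label, byte[] seed, int length) throws Exception {
        byte[] labelSeed = concat(label.getBytes(StandardCharsets.US_ASCII), seed);
        byte[] out = new byte[length];
        byte[] a = labelSeed;                       // A(0) = label + seed
        int filled = 0;
        while (filled < length) {
            a = hmac(secret, a);                    // A(i) = HMAC(secret, A(i-1))
            byte[] block = hmac(secret, a, labelSeed);
            int n = Math.min(block.length, length - filled);
            System.arraycopy(block, 0, out, filled, n);
            filled += n;
        }
        return out;
    }

    static byte[] concat(byte[] x, byte[] y) {
        byte[] r = new byte[x.length + y.length];
        System.arraycopy(x, 0, r, 0, x.length);
        System.arraycopy(y, 0, r, x.length, y.length);
        return r;
    }

    public static void main(String[] args) throws Exception {
        byte[] premaster = new byte[48];            // placeholder, all zeros
        byte[] randoms = new byte[64];              // placeholder client_random + server_random
        byte[] master = prf(premaster, "master secret", randoms, 48);
        System.out.println("master secret bytes: " + master.length);
    }
}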

Then the client would still be sending up with the client
MAC and encryption keys, about whether it's possible
to setup part of the master key or the whole thing.
Whether a client could fabricate the premaster secret
so that the data resulted very-weakly encrypted on its
own terms, doesn't seem feasible as the client random
is sent first, but cooperating could help make it so,
with regards to the client otherwise picking a weak
random secret overall.

(Figuring TLS interception is all based on Mitm,
not "cryptanalysis and the enigma cipher", and
even the very-weakly just look like 0's and 1's.)

So, P_SHA256 is being used to generate 48 bytes,
so that's two rounds, where the first round gives
32 bytes and the second another 32 with half of those dropped;
then if the client/server MAC/encryption keys
are split up out of those, ..., or rather out of only the first
32 bytes, then only the first SHA-256 round occurs,
if the Initialization Vector IVs are un-used, ...,
which results in whether it's possible to figure out
whether "master secret" ++ (client random + server random)
makes for any way, for such a round of SHA-256
given an arbitrary input, to result in a contrived value.

Hm..., reading the Web suggests that "label + seed"
is the concatenation of the 'blah' and the digits of
client random + server random, as character digits.

Let's see, a random then looks like so,

struct {
    uint32 gmt_unix_time;
    opaque random_bytes[28];
} Random;

thus that's quite a bit to play with, but I'm
not sure at all how to make it so that round after
round of SHA-256, settles on down to a constant,
given that 28 bytes' decimal digits worth of seed
can be contrived, while the first 4 bytes of the
resulting 32 bytes is a gmt_unix_time, with the
idea that they may be scrambled, as it's not mentioned
anywhere else to check the time in the random.

"Clocks are not required to be set correctly
by the basic TLS protocol; higher-level or
application protocols may define additional
requirements."

So, the server-random can be contrived,
what it results the 13 + 32 bytes that are
the seed for the effectively 1-round SHA-256
hash of an arbitrary input, that the 32 bytes
can be contrived, then is for wondering
about how to make it so that results a
contrived very-weakly SHA-256 output.

So the premaster secret is decrypted with
the server's private key, or as with respect
to the exponents of DH or what, then that's
padded to 64 bytes, which is also the SHA-256
chunk size, then the output of the first round
the used keys and second the probably un-used
initialization vectors, ...

https://en.wikipedia.org/wiki/SHA-2#Pseudocode


"The SHA-256 hash algorithm produces hash values
that are hard to predict from the input."
--

https://datatracker.ietf.org/doc/html/rfc2104

So with client-random from ClientHello,
and server-random from ServerHello,
then ClientKeyExchange sends 48 bytes
premaster secret, then

SHA256_Hmac(premaster[48], blahrandom[13+32])

is then taking two rounds and the first only is
the 32 bytes of 8 bytes each:

client write MAC key
server write MAC key
client write encryption key
server write encryption key
client write IV
server write IV

according to SecurityParameters, ...,
https://www.ietf.org/rfc/rfc5246.html#section-6.1 ,


enum { null, rc4, 3des, aes }
BulkCipherAlgorithm;


So, figuring TLS certificates are standard RSA,
then setting up to serve that up, on the handshakes,

CipherSuite / KeyExchange / Cipher / Mac
TLS_NULL_WITH_NULL_NULL NULL NULL NULL
TLS_RSA_WITH_NULL_MD5 RSA NULL MD5
TLS_RSA_WITH_NULL_SHA RSA NULL SHA
TLS_RSA_WITH_NULL_SHA256 RSA NULL SHA256
TLS_RSA_WITH_RC4_128_MD5 RSA RC4_128 MD5
TLS_RSA_WITH_RC4_128_SHA RSA RC4_128 SHA
TLS_RSA_WITH_3DES_EDE_CBC_SHA RSA 3DES_EDE_CBC SHA
TLS_RSA_WITH_AES_128_CBC_SHA RSA AES_128_CBC SHA
TLS_RSA_WITH_AES_256_CBC_SHA RSA AES_256_CBC SHA
TLS_RSA_WITH_AES_128_CBC_SHA256 RSA AES_128_CBC SHA256
TLS_RSA_WITH_AES_256_CBC_SHA256 RSA AES_256_CBC SHA256

figuring the client will support at least one of those
while for example perhaps not supporting any
with "null" or "rc4" or "3des", ..., is though the
idea that if the very-weakly bulk key can be contrived,
then to make at-rest editions of each of those,
though they're unlikely to be supported,
when stronger ciphersuites are available.

Cipher / Type / Key Material Size / IV Size / Block Size
NULL Stream 0 0 N/A
RC4_128 Stream 16 0 N/A
3DES_EDE_CBC Block 24 8 8
AES_128_CBC Block 16 16 16
AES_256_CBC Block 32 16 16

Key Material
The number of bytes from the key_block that are used for
generating the write keys.

Ah, then this makes for the section 6.3, Key Calculation,
https://www.ietf.org/rfc/rfc5246.html#section-6.3 ,
generating the key_block is another pseudo-random function,
but it says that blah is 'key expansion'[13], where the
relevant sizes here are the Key Material sizes, these
lengths:

client_write_MAC_key[SecurityParameters.mac_key_length]
server_write_MAC_key[SecurityParameters.mac_key_length]
client_write_key[SecurityParameters.enc_key_length]
server_write_key[SecurityParameters.enc_key_length]

Then I'd be bummed to try and contrive 64, 96, or 128 bytes
output, with the 13 + 32 many bytes into the HMAC, 32 contrived,
given arbitrary input the master secret [48], where 1 round @32
is more simple than 2, 4, or 6 rounds input. (SHA-256 makes @32,
PRF makes rounds.)

So here the hash function is SHA-256, the master secret is the input[48],
and the hash secret is blah++contrived[13+32].

HMac(SHA-256, blah++contrived[13+32], input[48])

So, SHA-256 has (input[64], output[32]), thus the
input[48] will be padded to input[64], ..., where the
padding is a 1 bit then the rest 0 bits. Well that kind of
simplifies things: the first round input ends with 0's,
then to get those 0's propagating and contrive a
key what results 0's.

So for HMac for SHA-256, https://datatracker.ietf.org/doc/html/rfc2104 ,
input B=padround[64]
output L=nextround[32]
key K=blah[13+32]

The Hmac has these inner and outer masks of 0x36, 0x5C,
like 00110110b and 01011100b, ....

So, the first SHA-256 chunk will be Kinner, then padround,
then that's digested to inround, then the first SHA-256 chunk
will be Kouter, then inround, the output of that results nextround.
So, the contrivance the 64 bytes of K, with first 13 fixed,
32 variable, and 19 zeros, then gets involved with how to
go about resulting any kind of contrivance of nextround.

The simplest would be zeros, with the idea that K is 13 bytes
fixed, then 51 zeros.


Then, really though it's about contriving the master secret,
because, then of course the key derivation is derived from
that and would be a constant, if client-random + server-random,
is also a constant. Otherwise the idea would be to try to
contrive the 'key expansion' instead of the 'master secret',
because only server-random can be contrived.

So, the only thing I can figure is to contrive it so most
the 'blah' is just the initial SHA-256 seeds so they zero
out, but then, that would only reduce the possible values
and not much help make for "very-weakly encrypted at rest".

It's a good dog - but it won't hunt.

Looking into implementing TLS, then, basically for
the server side has that usually CA certificates are
either installed in system stores, or, a keystore is
particular for virtual machines or runtimes, with
respect to certificate generation and distribution
and rotation.

The algorithms, ..., aren't so many, ..., though it gets
involved the ASN.1 and the OID's and the algorithms,
the contents and constants of the PKCS files, here
though as above is a sort of run-through of
the TLS protocol, then as with regards to how to
keep it out the way of the I/O, where this otherwise
very low CPU-intensive runtime, spends most its
time flipping and unflipping bits.

There's a world of cryptographic algorithms,
but there are only so many in use in basically
only TLS 1.2 and TLS 1.3 and without TLS 1.2
compression, making for that for encryption
and compression, to be making a reference layer
for that, what's otherwise a very plain sort
of data-moving I/O machine.

Yeah it looks like RSA, then Diffie-Hellman,
with a bit of ASN.1 or OSI the usual sorts
of X.400/X.500 bits, then various hash algorithms,
pseudorandom functions for those, then
some various block ciphers, with regards to
PSK (pre-shared key, not phase-shift keying),
RC4 and 3DES and AES the block ciphers,
then about Elliptic Curve, hmm....

(It's pretty clear that any Mitm that can
sign as any of the CAs in client's trust store
has keys-to-the-kingdom.)

Now I remember following Elliptic Curve a
bit when it was still IEEE working group on
same, but I don't like that it's not just plain
IETF RFC's, expecting to achieve interoperability
largely from IETF RFC's.

TLS 1.3 (RFC 8446):

"A TLS-compliant application MUST implement the TLS_AES_128_GCM_SHA256
[GCM] cipher suite and SHOULD implement the TLS_AES_256_GCM_SHA384
[GCM] and TLS_CHACHA20_POLY1305_SHA256 [RFC8439] cipher suites (see
Appendix B.4).

A TLS-compliant application MUST support digital signatures with
rsa_pkcs1_sha256 (for certificates), rsa_pss_rsae_sha256 (for
CertificateVerify and certificates), and ecdsa_secp256r1_sha256. A
TLS-compliant application MUST support key exchange with secp256r1
(NIST P-256) and SHOULD support key exchange with X25519 [RFC7748]."
-- https://datatracker.ietf.org/doc/html/rfc8446

Implementing a pretty reasonable default application
profile of TLS, or basically 1.2 and 1.3 support, it's usually
enough considered one of those involved things, but
it can be a good idea to have one, when the goals for
the reference implementation include being that
it's repurposable to various runtimes.
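
As a sketch of the plainest server-side starting point with
the standard library, assuming a PKCS12 keystore named
server.p12 with password "changeit" (both placeholders),
here's loading the keystore and standing up an SSLEngine;
the JDK's default provider already covers the mandatory
TLS 1.2/1.3 suites, and the engine's wrap/unwrap then fits
a non-blocking loop.

import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.KeyStore;

// Server-side TLS setup from a keystore: the SSLEngine then does the handshake
// and record protection over buffers, which suits a non-blocking selector loop.
public final class TlsServerSetup {
    public static void main(String[] args) throws Exception {
        char[] password = "changeit".toCharArray();                      // placeholder password
        KeyStore ks = KeyStore.getInstance("PKCS12");
        try (InputStream in = Files.newInputStream(Path.of("server.p12"))) {  // placeholder file
            ks.load(in, password);
        }
        KeyManagerFactory kmf =
            KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
        kmf.init(ks, password);

        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(kmf.getKeyManagers(), null, null);

        SSLEngine engine = ctx.createSSLEngine();
        engine.setUseClientMode(false);                                  // server side
        System.out.println("enabled protocols: " + String.join(", ", engine.getEnabledProtocols()));
    }
}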

https://datatracker.ietf.org/doc/html/rfc6655
https://datatracker.ietf.org/doc/html/rfc8439

The whole idea that TLS 1.3 makes every payload
wrapped in AEAD sort of seems like getting in the way,
not to mention having plaintext. ("It's already
on the wire", "pick it up".) The whole idea of having
to keep I/O sequence when before it's just "that's
its write key and MAC", and be always changing it up,
seems a bit too involved. Or, I guess it was Fragmentation
and Compression in TLS 1.2, TLS 1.2 "All Records are
compressed", TLS 1.3 "No compression, all records are AEAD."

"A 64-bit sequence number ...."

https://datatracker.ietf.org/doc/html/rfc5116

Hmm....

https://www.ietf.org/rfc/rfc5246.html#section-6.2

"The TLS record layer receives uninterpreted data
from higher layers in non-empty blocks of arbitrary size."

So, in these otherwise kind of simple Internet protocols,
TLS seems about the most involved, the other protocols
being all stable, yet, it is used on everything, so, there's that, ....

Now, there's still lots of software that was implemented
with TLS 1.1. TLS 1.0 is just too old, and, SSLv3 is right out,
though there's something to be said for that also they
have ways to confound Mitm. (Which here is contrived
as about PSK otherwise randoms, which just get replayed
anyways.) So anyways the idea is to make for a gentle
sort of common application profile of TLS, since 1.0,
then with regards to making for it that it's fungible.

https://www.ietf.org/rfc/rfc4346.html TLS 1.1
https://datatracker.ietf.org/doc/rfc8996/ 1.0 and 1.1 deprecated

Then, looking back to the hi-po I/O idea, basically has
that each connection's context then has that fragmentation
is about the most "off by one" bit to get figured. Even if
the data's not very-weakly encrypted at rest, gets into
fragmenting it at rest, then that at least the encryption
is just filling in and flipping bits, not changing layout,
at the composition of the message layer.
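
A minimal sketch of that fragmentation bookkeeping, under
the assumption of plain TLS 1.2 application-data records:
RFC 5246 caps a record's plaintext fragment at 2^14 bytes,
so a stored blob can be cut into record-sized pieces ahead
of time, each behind the five-byte header of type, version,
and length, and the cipher later only has to fill in bits
per piece.

import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Cut a payload into TLS-1.2-sized plaintext fragments (max 2^14 bytes each) and
// prefix each with the 5-byte record header: type, version major/minor, length.
public final class RecordFragments {
    static final int MAX_FRAGMENT = 1 << 14;               // RFC 5246, section 6.2.1
    static final byte APPLICATION_DATA = 23;

    static List<byte[]> frame(byte[] payload) {
        List<byte[]> records = new ArrayList<>();
        for (int off = 0; off < payload.length; off += MAX_FRAGMENT) {
            int len = Math.min(MAX_FRAGMENT, payload.length - off);
            ByteArrayOutputStream rec = new ByteArrayOutputStream();
            rec.write(APPLICATION_DATA);
            rec.write(3); rec.write(3);                     // TLS 1.2 record version
            rec.write((len >>> 8) & 0xff); rec.write(len & 0xff);
            rec.write(payload, off, len);
            records.add(rec.toByteArray());
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] payload = new byte[40000];                   // placeholder blob
        System.out.println(frame(payload).size() + " records");   // 3 records for 40000 bytes
    }
}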

So, looking at this with respect to "implementing the
required TLS algorithms neatly can make for a usual
sort of unintrusive reference routine", you know,
vis-a-vis "a huge clunking cludge of smudge of pudge".

Not that there's anything necessarily wrong with that, ....
Ross Finlayson
2024-02-17 19:38:33 UTC
Reply
Permalink
"Search", then, here the idea is to facilitate search, variously.

SEARCH: it's an HTTP verb, with an indicated request body.
What are its semantics? It's undefined, just a request/response
with a request body.

SEARCH: it's an IMAP command.

WILDMAT: sometimes "find" is exactly the command that's
running on file systems, and its predicates are similar with
WILDMAT, as with regards to match/dont/match/dont/...,
about what is "accepter/rejector networks", for the usual
notions of formal automata of the accepter and rejector,
and the binary propositions what result match/dont,
with regards usually to the relation called "match".

After BFF a sort of "a normative file format with the
properties of being concatenable resulting set-like
semantics", is the idea that "SFF" or "search file format"
is for _summaries_ and _digests_ and _intermediate_
forms, what result data that's otherwise derived from
"the data", derived on demand or cached opportunistically,
about the language of "Information Retrieval", after the
language of "summary" and "digest".

The word "summary" basically reflects on statistics,
that a "summary statistic" in otherwise the memoryless,
like a mean, is for histograms and for "match", about
making what is summary data.

For some people the search corpus is indices, for
something like the open-source search engines,
which are just runtimes that have usual sorts
binary data structures for log N lookups,
here though the idea is a general form as for
"summary", that is tractable as files, then what
can be purposed to being inputs to usual sorts
"key-value" or "content", "hits", in documents.

For some people the search corpus is the fully-normalized
database, then all sorts usual queries what result
denormalized data and summaries and the hierarchical
and these kinds things.

So, here the sort of approach is for the "Library/Museum",
about the "Browse, Exhibits, Tours, Carrels", that search
and summary and digest and report is a lot of different
things, with the idea that "SFF" files, generally, make it
sensible, fungible, and tractable, how to deal with all this.

It's not really part of "NNTP, IMAP, HTTP", yet at the same
time, it's a very generic sort of thing, here with the idea
that by designing some reference algorithms that result
making partially digested summary with context,
those just being concatenable, then that the usual
idea of the Search Query being Yes/No/Maybe or Sure/No/Yes,
that being about same as Wildmat, for variously attributes
and content, and the relations in documents and among them,
gets into these ideas about how tooling generally results,
making for files what then have simple algorithms that
work on them, variously repurposable to compiled indices
for usual "instant gratification" types.
Ross Finlayson
2024-02-18 20:14:49 UTC
Reply
Permalink
On 02/17/2024 11:38 AM, Ross Finlayson wrote:
> "Search", then, here the idea is to facilitate search, variously.
>
> SEARCH: it's an HTTP verb, with an indicate request body.
> What are its semantics? It's undefined, just a request/response
> with a request body.
>
> SEARCH: it's an IMAP command.
>
> WILDMAT: sometimes "find" is exactly the command that's
> running on file systems, and its predicates are similar with
> WILDMAT, as with regards to match/dont/match/dont/...,
> about what is "accepter/rejector networks", for the usual
> notions of formal automata of the accepter and rejector,
> and the binary propositions what result match/dont,
> with regards usually to the relation called "match".
>
> After BFF a sort of "a normative file format with the
> properties of being concatenable resulting set-like
> semantics", is the idea that "SFF" or "search file format"
> is for _summaries_ and _digests_ and _intermediate_
> forms, what result data that's otherwise derived from
> "the data", derived on demand or cached opportunistically,
> about the language of "Information Retrieval", after the
> language of "summary" and "digest".
>
> The word "summary" basically reflects on statistics,
> that a "summary statistic" in otherwise the memoryless,
> like a mean, is for histograms and for "match", about
> making what is summary data.
>
> For some people the search corpus is indices, for
> something like the open-source search engines,
> which are just runtimes that have usual sorts
> binary data structures for log N lookups,
> here though the idea is a general form as for
> "summary", that is tractable as files, then what
> can be purposed to being inputs to usual sorts
> "key-value" or "content", "hits", in documents.
>
> For some people the search corpus is the fully-normalized
> database, then all sorts usual queries what result
> denormalized data and summaries and the hierarchical
> and these kinds things.
>
> So, here the sort of approach is for the "Library/Museum",
> about the "Browse, Exhibits, Tours, Carrels", that search
> and summary and digest and report is a lot of different
> things, with the idea that "SFF" files, generally, make it
> sensible, fungible, and tractable, how to deal with all this.
>
> It's not really part of "NNTP, IMAP, HTTP", yet at the same
> time, it's a very generic sort of thing, here with the idea
> that by designing some reference algorithms that result
> making partially digested summary with context,
> those just being concatenable, then that the usual
> idea of the Search Query being Yes/No/Maybe or Sure/No/Yes,
> that being about same as Wildmat, for variously attributes
> and content, and the relations in documents and among them,
> gets into these ideas about how tooling generally results,
> making for files what then have simple algorithms that
> work on them, variously repurposable to compiled indices
> for usual "instant gratification" types.
>
>



Well, for extraction and segmentation, what's
involved is a model of messages and then a
sort of model of MIME, with regards to
"access-patternry", then for extraction and
characterization and segmentation and elision,
these kinds of things what result the things.

Extraction is sort of after messages attributes
or the headers, then the content encoding and
such, then as with regards to then embedding
of documents in otherwise the document.

Characterization here really reflects on character
encodings, with the idea that a corpus of words
has a range of an alphabet and that these days
of all the code pages and glyph-maps of the world,
what it reflects that members of alphabets indicate
for any given textual representation as character data,
that it matches the respective code-pages or planes
or regions of the Unicode, these days, with respect
to legacy encodings and such.

So, for extraction and characterization, then gets
into quite usual patterns of language, with things
like punctuation and syntax, bracketing and groupings,
commas and joiners and separators, the parenthetical,
comments, quoting, and these kinds of things, in
quite most all usual languages.

For message formats and MIME, then, and content-encoding
then extraction, in characterization after alphabet and
punctuation, then gets pretty directly into the lexical,
syntax, and grammar, with regards to texts.

"Theory saturation ...."
Ross Finlayson
2024-02-19 03:00:20 UTC
Reply
Permalink
On 02/18/2024 12:14 PM, Ross Finlayson wrote:
> On 02/17/2024 11:38 AM, Ross Finlayson wrote:
>> "Search", then, here the idea is to facilitate search, variously.
>>
>> SEARCH: it's an HTTP verb, with an indicate request body.
>> What are its semantics? It's undefined, just a request/response
>> with a request body.
>>
>> SEARCH: it's an IMAP command.
>>
>> WILDMAT: sometimes "find" is exactly the command that's
>> running on file systems, and its predicates are similar with
>> WILDMAT, as with regards to match/dont/match/dont/...,
>> about what is "accepter/rejector networks", for the usual
>> notions of formal automata of the accepter and rejector,
>> and the binary propositions what result match/dont,
>> with regards usually to the relation called "match".
>>
>> After BFF a sort of "a normative file format with the
>> properties of being concatenable resulting set-like
>> semantics", is the idea that "SFF" or "search file format"
>> is for _summaries_ and _digests_ and _intermediate_
>> forms, what result data that's otherwise derived from
>> "the data", derived on demand or cached opportunistically,
>> about the language of "Information Retrieval", after the
>> language of "summary" and "digest".
>>
>> The word "summary" basically reflects on statistics,
>> that a "summary statistic" in otherwise the memoryless,
>> like a mean, is for histograms and for "match", about
>> making what is summary data.
>>
>> For some people the search corpus is indices, for
>> something like the open-source search engines,
>> which are just runtimes that have usual sorts
>> binary data structures for log N lookups,
>> here though the idea is a general form as for
>> "summary", that is tractable as files, then what
>> can be purposed to being inputs to usual sorts
>> "key-value" or "content", "hits", in documents.
>>
>> For some people the search corpus is the fully-normalized
>> database, then all sorts usual queries what result
>> denormalized data and summaries and the hierarchical
>> and these kinds things.
>>
>> So, here the sort of approach is for the "Library/Museum",
>> about the "Browse, Exhibits, Tours, Carrels", that search
>> and summary and digest and report is a lot of different
>> things, with the idea that "SFF" files, generally, make it
>> sensible, fungible, and tractable, how to deal with all this.
>>
>> It's not really part of "NNTP, IMAP, HTTP", yet at the same
>> time, it's a very generic sort of thing, here with the idea
>> that by designing some reference algorithms that result
>> making partially digested summary with context,
>> those just being concatenable, then that the usual
>> idea of the Search Query being Yes/No/Maybe or Sure/No/Yes,
>> that being about same as Wildmat, for variously attributes
>> and content, and the relations in documents and among them,
>> gets into these ideas about how tooling generally results,
>> making for files what then have simple algorithms that
>> work on them, variously repurposable to compiled indices
>> for usual "instant gratification" types.
>>
>>
>
>
>
> Well, for extraction and segmentation, there's
> what's involved is a model of messages and
> then as of a sort of model of MIME, with
> regards to "access-patternry", then for
> extraction and characterization and
> segmentation and ellision, these kinds of
> things what result the things.
>
> Extraction is sort of after messages attributes
> or the headers, then the content encoding and
> such, then as with regards to then embedding
> of documents in otherwise the document.
>
> Characterization here really reflects on character
> encodings, with the idea that a corpus of words
> has a range of an alphabet and that these days
> of all the code pages and glyph-maps of the world,
> what it reflects that members of alphabets indicate
> for any given textual representation as character data,
> that it matches the respective code-pages or planes
> or regions of the Unicode, these days, with respect
> to legacy encodings and such.
>
> So, for extraction and characterization, then gets
> into quite usual patterns of language, with things
> like punctuation and syntax, bracketing and groupings,
> commas and joiners and separators, the parenthetical,
> comments, quoting, and these kinds of things, in
> quite most all usual languages.
>
> For message formats and MIME, then, and content-encoding
> then extraction, in characterization after alphabet and
> punctuation, then gets pretty directly into the lexical,
> syntax, and grammar, with regards to texts.
>
> "Theory saturation ...."
>
>




It seems like Gert Webelhuth has a good book called
"Principles and Parameters of Syntactic Saturation",
which discusses linguistics pretty thoroughly.

global.oup.com/academic/product/principles-and-parameters-of-syntactic-saturation-9780195070415?cc=us&lang=en&
books.google.com/books?id=nXboTBXbhwAC

Reading about this notion of "saturation", on the one
hand it seems to indicate lack of information, on the
other hand it seems to be capricious selective ignorance.

www.tandfonline.com/doi/full/10.1080/23311886.2020.1838706
doi.org/10.1080/23311886.2020.1838706
Saturation controversy in qualitative research: Complexities and
underlying assumptions. A literature review
Favourate Y. Sebele-Mpofu

Here it's called "censoring samples", which is often enough
with respect to "outliers". Here it's also called "retro-finitist".
The author details it's a big subjective mess and from a
statistical design sort of view it's, not saying much.


Here this is starting a bit simpler with for example a sort of
goal to understand annotated and threaded plain text
conversations, in the usual sort of way of establishing
sequence, about the idea for relational algebra, to be
relating posts and conversations in threads, in groups
in time, as with regards to simple fungible BFF's, as
with regards to simple fungible SFF's, what result highly
repurposable presentation, via storage-neutral means.

It results sort of bulky to start making the in-place
summary file formats, with regards to, for example,
the resulting size of larger summaries, yet at the same
time, the extraction and segmentation, after characterization,
and elision:

extraction: headers and body
characterization: content encoding
extraction: text extraction
segmentation: words are atoms, letters are atoms, segments are atoms
elision: hyphen-ization, 1/*comment*/2

then has for natural sorts bracketing and grouping,
here for example as with paragraphs and itemizations,
for the plainest sort of text having default characterization.

In this context it's particularly attribution which is a content
convention, the "quoting depth" character, for example,
in a world of spaces and tabs, with regards to enumerating
branches, what result relations what are to summarize
together, and apart. I.e. there's a notion with the document,
that often enough the posts bring their own context,
for being self-contained, in the threaded organization,
how to best guess attribution, given good faith attribution,
in the most usual sorts of contexts, of plain text extraction.
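
A minimal sketch of that best guess, reading the quoting
depth off plain text by counting the leading '>' markers
per line, tolerating the usual space between them: depth 0
is the post's own text, depth 1 the parent, and so on.

// Best-guess quote depth per line: count leading '>' markers, tolerating a
// single space after each one, as in the common "> > text" conventions.
public final class QuoteDepth {
    static int depth(String line) {
        int d = 0, i = 0;
        while (i < line.length()) {
            if (line.charAt(i) == '>') { d++; i++; }
            else if (line.charAt(i) == ' ' && i + 1 < line.length() && line.charAt(i + 1) == '>') { i++; }
            else break;
        }
        return d;
    }

    public static void main(String[] args) {
        System.out.println(depth("no quoting here"));        // 0
        System.out.println(depth("> quoted once"));          // 1
        System.out.println(depth(">> quoted twice"));        // 2
        System.out.println(depth("> > also quoted twice"));  // 2
    }
}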


Then, SEARCH here is basically that "search finds hits",
or what matches, according to WILDMAT and IMAP SEARCH
and variously Yes/No/Maybe as a sort of WILDMAT search,
then for _where_ it finds hits, here in the groups', the threads',
the authors', and the dates', for browsing into those variously.

That speaks to a usual form of relation for navigation,

group -> threads
thread -> authors
author -> threads
date -> threads

and these kinds of things, about the many relations that
in summary are all derivable from the above described BFF
files, which are plain messages files with dates linked in from
the side, threading indicated in the message files, and authors
linked out from the messages.
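
As a rough sketch of deriving those relations in Java, from
per-message summary records (the record fields here are
assumptions about what the BFF/SFF files surface, not their
actual layout):

    import java.time.LocalDate;
    import java.util.*;

    record Summary(String messageId, String group, String threadRoot,
                   String author, LocalDate date) {}

    static final Map<String, Set<String>> groupToThreads  = new TreeMap<>();
    static final Map<String, Set<String>> threadToAuthors = new TreeMap<>();
    static final Map<String, Set<String>> authorToThreads = new TreeMap<>();
    static final Map<LocalDate, Set<String>> dateToThreads = new TreeMap<>();

    static void relate(Summary s) {
        groupToThreads.computeIfAbsent(s.group(), k -> new TreeSet<>()).add(s.threadRoot());
        threadToAuthors.computeIfAbsent(s.threadRoot(), k -> new TreeSet<>()).add(s.author());
        authorToThreads.computeIfAbsent(s.author(), k -> new TreeSet<>()).add(s.threadRoot());
        dateToThreads.computeIfAbsent(s.date(), k -> new TreeSet<>()).add(s.threadRoot());
    }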

I.e., here the idea then for content, is that, specific mentions
of technical words, basically relate to "tag cloud", about
finding related messages, authors, threads, groups,
among the things.
Ross Finlayson
2024-02-21 03:47:07 UTC
Reply
Permalink
About a "dedicated little OS" to run a "dedicated little service".


"Critix"

1) some boot code
power on self test, EFI/UEFI, certificates and boot, boot

2) a virt model / a machine model
maybe running in a virt
maybe running on metal

3) a process/scheduler model
it's processes, a process model
goal is, "some of POSIX"

Resources

Drivers

RAM
Bus
USB, ... serial/parallel, device connections, ....
DMA
framebuffer
audio dac/adc


Disk

hard
memory
network


Login

identity
resources



Networking

TCP/IP stack
UDP, ...
SCTP, ...
raw, ...

naming


Windowing

"video memory and what follows SVGA"
"Java, a plain windowing VM"



PCI <-> PCIe

USB 1/2 USB 3/4

MMU <-> DMA

Serial ATA

NIC / IEEE 802

"EFI system partition"

virtualization model
emulator

clock-accurate / bit-accurate
clock-inaccurate / voltage


mainboard / motherboard
circuit summary

emulator environment

CPU
main memory
host adapters

PU's
bus

I^2C

clock model / timing model
interconnect model / flow model
insertion model / removal model
instruction model
Ross Finlayson
2024-03-01 03:55:37 UTC
Reply
Permalink
On 02/20/2024 07:47 PM, Ross Finlayson wrote:
> About a "dedicated little OS" to run a "dedicated little service".
>
>
> "Critix"
>
> 1) some boot code
> power on self test, EFI/UEFI, certificates and boot, boot
>
> 2) a virt model / a machine model
> maybe running in a virt
> maybe running on metal
>
> 3) a process/scheduler model
> it's processes, a process model
> goal is, "some of POSIX"
>
> Resources
>
> Drivers
>
> RAM
> Bus
> USB, ... serial/parallel, device connections, ....
> DMA
> framebuffer
> audio dac/adc
>
>
> Disk
>
> hard
> memory
> network
>
>
> Login
>
> identity
> resources
>
>
>
> Networking
>
> TCP/IP stack
> UDP, ...
> SCTP, ...
> raw, ...
>
> naming
>
>
> Windowing
>
> "video memory and what follows SVGA"
> "Java, a plain windowing VM"
>
>
>
> PCI <-> PCIe
>
> USB 1/2 USB 3/4
>
> MMU <-> DMA
>
> Serial ATA
>
> NIC / IEEE 802
>
> "EFI system partition"
>
> virtualization model
> emulator
>
> clock-accurate / bit-accurate
> clock-inaccurate / voltage
>
>
> mainboard / motherboard
> circuit summary
>
> emulator environment
>
> CPU
> main memory
> host adapters
>
> PU's
> bus
>
> I^2C
>
> clock model / timing model
> interconnect model / flow model
> insertion model / removal model
> instruction model
>
>




I got looking into PC architecture wondering
how it was since I studied internals and it really
seems it's stabilized a lot.

UEFI ACPI SMBIOS

DRAM
DMA
virtualized addressing

CPU

System Bus

Intel CSI QPI UPI
AMD HyperTransport
ARM CoreLink


PCI
PCIe

Host Adapters
ATA
NVMe
USB
NIC

So I'm wondering to myself, well first I wonder
about writing UEFI plugins to sort of enumerate
the setup and for example print it out and for
example see what keys are in the TPM and for
example the partition table and what goes in
in terms of the device tree and basically for
diagnostic, boot services then runtime services
after UEFI exits after having loaded into memory
the tables of the "runtime services" which are
mostly sort of a table in memory with offsets
of the things and maybe how they're ID's as
with regards to the System Bus the Host Adapters.


Then it's a pretty simplified model and gets
into things like wondering what all else is
going on in the device tree and I2C the
blinking lights and perhaps the beep, or bell.

A lot of times it looks like the video is onboard
out the CPU, vis-a-vis the UEFI video output
or what appears to be going on, I'm wondering
about it.


So I'm wondering how to make a simulator,
an emulator, uh, of these things above,
and then basically the low-speed things
and the high-speed things, and, their logical
protocols vis-a-vis the voltage and the
bit-and-clock accurate and the voltage as
symbols vis-a-vis symbolically the protocols,
how to make it so to have a sort of simulator
or emulator of this sort of usual system,
with a usual idea to target code to it to
that kind of system or a virt over the virtualized
system to otherwise exactly that kind of system, ....
Ross Finlayson
2024-03-07 16:09:25 UTC
Reply
Permalink
On 02/29/2024 07:55 PM, Ross Finlayson wrote:
> On 02/20/2024 07:47 PM, Ross Finlayson wrote:
>> About a "dedicated little OS" to run a "dedicated little service".
>>
>>
>> "Critix"
>>
>> 1) some boot code
>> power on self test, EFI/UEFI, certificates and boot, boot
>>
>> 2) a virt model / a machine model
>> maybe running in a virt
>> maybe running on metal
>>
>> 3) a process/scheduler model
>> it's processes, a process model
>> goal is, "some of POSIX"
>>
>> Resources
>>
>> Drivers
>>
>> RAM
>> Bus
>> USB, ... serial/parallel, device connections, ....
>> DMA
>> framebuffer
>> audio dac/adc
>>
>>
>> Disk
>>
>> hard
>> memory
>> network
>>
>>
>> Login
>>
>> identity
>> resources
>>
>>
>>
>> Networking
>>
>> TCP/IP stack
>> UDP, ...
>> SCTP, ...
>> raw, ...
>>
>> naming
>>
>>
>> Windowing
>>
>> "video memory and what follows SVGA"
>> "Java, a plain windowing VM"
>>
>>
>>
>> PCI <-> PCIe
>>
>> USB 1/2 USB 3/4
>>
>> MMU <-> DMA
>>
>> Serial ATA
>>
>> NIC / IEEE 802
>>
>> "EFI system partition"
>>
>> virtualization model
>> emulator
>>
>> clock-accurate / bit-accurate
>> clock-inaccurate / voltage
>>
>>
>> mainboard / motherboard
>> circuit summary
>>
>> emulator environment
>>
>> CPU
>> main memory
>> host adapters
>>
>> PU's
>> bus
>>
>> I^2C
>>
>> clock model / timing model
>> interconnect model / flow model
>> insertion model / removal model
>> instruction model
>>
>>
>
>
>
>
> I got looking into PC architecture wondering
> how it was since I studied internals and it really
> seems it's stabilized a lot.
>
> UEFI ACPI SMBIOS
>
> DRAM
> DMA
> virtualized addressing
>
> CPU
>
> System Bus
>
> Intel CSI QPI UPI
> AMD HyperTransport
> ARM CoreLink
>
>
> PCI
> PCIe
>
> Host Adapters
> ATA
> NVMe
> USB
> NIC
>
> So I'm wondering to myself, well first I wonder
> about writing UEFI plugins to sort of enumerate
> the setup and for example print it out and for
> example see what keys are in the TPM and for
> example the partition table and what goes in
> in terms of the device tree and basically for
> diagnostic, boot services then runtime services
> after UEFI exits after having loaded into memory
> the tables of the "runtime services" which are
> mostly sort of a table in memory with offsets
> of the things and maybe how they're ID's as
> with regards to the System Bus the Host Adapters.
>
>
> Then it's a pretty simplified model and gets
> into things like wondering what all else is
> going on in the device tree and I2C the
> blinking lights and perhaps the beep, or bell.
>
> A lot of times it looks like the video is onboard
> out the CPU, vis-a-vis the UEFI video output
> or what appears to be going on, I'm wondering
> about it.
>
>
> So I'm wondering how to make a simulator,
> an emulator, uh, of these things above,
> and then basically the low-speed things
> and the high-speed things, and, their logical
> protocols vis-a-vis the voltage and the
> bit-and-clock accurate and the voltage as
> symbols vis-a-vis symbolically the protocols,
> how to make it so to have a sort of simulator
> or emulator of this sort of usual system,
> with a usual idea to target code to it to
> that kind of system or a virt over the virtualized
> system to otherwise exactly that kind of system, ....
>
>
>


Critix

boot protocols

UEFI ACPI SMBIOS

CPU and instruction model

bus protocols

low-speed protocols
high-speed protocols



Looking at the instructions, it looks pretty much
that the kernel code is involved inside the system
instructions, to support the "bare-metal" and then
also the "virt-guests", then that communication
is among the nodes in AMD, then, the HyperTransport
basically is indicated as, IO, then for there to be figured
out that the guest virts get a sort of view of the "hardware
abstraction layer", then with regards to the segments and
otherwise the mappings, for the guest virts, vis-a-vis,
the mappings to the memory and I/O, getting figured
out these kinds of things as an example of what gets
into a model of a sort of machine, as a sort of emulator,
basically figuring to be bit-accurate and ignore being
clock-accurate.

The "BIOS and kernel guide" gets into the order of
system initialization and the links, and DRAM.
It looks that there are nodes basically being parallel
processors, and on those cores, being CPUs or
processors.

Then each of the processors has its control and status
registers, then with regards to tables, and with regards
to memory and cache, about those the segments,
figuring to model the various interconnections this
way in a little model of a mainboard CPU. "Using L2
Cache as General Storage During Boot".

Then it gets into enumerating and building the links,
and setting up the buffers, to figure out what's going
on the DRAM and DMA, and, PCI and PCIe, and, then
about what's ATA, NVMe, and USB, these kinds of things.

Nodes' cores share registers or "software must ensure...",
with statics and scopes. Then it seems the cache lines
and then the interrupt vectors or APIC IDs get enumerated,
setting up the routes and tables.

Then various system and operating modes proceed,
where there's an idea that the basic difference
among executive, scheduler, and operating system,
basically is in with respect to the operating mode,
with respect to old real, protected, and, "unreal",
I suppose, modes, here that basically it's all really
simplified about protected mode and guest virts.

"After storing the save state, execution starts ...."

Then there's described "spring-boarding" into SMM
that the BSP and BSM, a quick protocol then that
all the live nodes enter SMM, basically according
to ACPI and the APIC.

"The processor supports many power management
features in a variety of systems."

This gets into voltage proper, here though that
what results is bit-accurate events.

"P-states are operational performance states
characterized by a unique frequency and voltage."

The idea here is to support very-low-power operation
vis-a-vis modest, usual, and full (P0). Then besides
consumption, is also reducing heat, or dialing down
according to temperature. Then there are C-states
and S-states, then mostly these would be as by
the BIOS, what gets surfaced as ACPI to the kernel.

There are some more preliminaries, the topology
gets setup, then gets involved the DCT DIMM DRAM
frequency and for DRAM, lighting up RAM, that
basically to be constant rate, about the DCT and DDR.

There are about 1000 model-specific registers what
seem to be for the BIOS to inspect and figure out
the above pretty much and put the system into a
state for regular operation.

Then it seems like an emulator would be setting
that up, then as with regards to usually enough
"known states" and setting up for simulating the
exercise of execution and I/O.

instructions


system-purpose


interrupt

CLGI CLI STI STGI
HLT
IRET IRETD IRETQ
LIDT SIDT
MONITOR MWAIT
RSM
SKINIT

privileges

ARPL
LAR
RDPKRU WRPKRU
VERR VERW

alignment

CLAC STAC

jump/routine

SYSCALL SYSRET
SYSENTER SYSEXIT

task, stack, tlb, gdt, ldt, cache

CLTS
CLRSSBSY SETSSBSY
INCSSP
INVD
INVLPG INVLPGA INVLPGB INVPCID TLBSYNC
LGDT SGDT
LLDT SLDT
LMSW
LSL
LTR STR
RDSSP
RSTORSSP SAVEPREVSSP
WBINVD WBNOINVD
WRSS WRUSS


load/store
MOV CRn MOV DRn
RDMSR WRMSR
SMSW
SWAPGS

virtual

PSMASH PVALIDATE
RMPADJUST RMPUPDATE
RMPQUERY
VMLOAD VMSAVE
VMMCALL VMGEXIT
VMRUN


perf

RDPMC
RDTSC RDTSCP


debug

INT 3




general-purpose

context
CPUID
LLWPCB LWPINS LWPVAL SLWPCB
NOP
PAUSE

RDFSBASE

RDPID
RDPRU

UD0 UD1 UD2

jump/routine
CALL RET
ENTER LEAVE
INT
INTO
Jcc
JCXZ JECXZ JRCXZ
JMP

register
BOUND
BT BTC BTR BTS
CLC CLD CMC
LAHF SAHF
STC STD
WRFSBASE WRGSBASE

compare
cmp
CMP
CMPS CMPSB CMPSW CMPSD CMPSQ
CMPXCHG CMPXCHG8B CMPXCHG16B
SCAS SCASB SCASW SCASD SCASQ
SETcc
TEST
branch
LOOP LOOPE LOOPNE LOOPNZ LOOPZ


input/output
IN
INS INSB INSW INSD
OUT
OUTS OUTSB OUTSW OUTSD

memory/cache
CLFLUSH CLFLUSHOPT
CLWB
CLZERO
LFENCE MCOMMIT MFENCE SFENCE
MONITORX MWAITX
PREFETCH PREFETCHW PREFETCHlevel

memory/stack
POP
POPA POPAD
POPF POPFD POPFQ
PUSH
PUSHA PUSHAD
PUSHF PUSHFD PUSHFQ

memory/segment
XLAT XLATB

load/store
BEXTR
BLCFILL BLCI BLCIC BLCMSK BLCS BLSFILL BLSI BLSMSK BLSR
BSF BSR
BSWAP
BZHI
CBW CWDE CDQE CWD CDQ CQO
CMOVcc
LDS LES LFS LGS LSS
LEA
LODS LODSB LODSW LODSQ
MOV
MOVBE
MOVD
MOVMSKPD MOVMSKPS
MOVNTI
MOVS MOVSB MOVSW MOVSD MOVSQ
MOVSX MOVSXD MOVZX
PDEP PEXT
RDRAND RDSEED
STOS STOSB STOSW STOSD STOSQ
XADD XCHG




bitwise/math
and or nand nor
complement
roll
AND ANDN
LZCNT TZCNT
NOT
OR XOR
POPCNT
RCL RCR ROL ROR RORX
SAL SHL SAR SARX SHL SHLD SHLX SHR SHRD SHRX
T1MSKC TZMSK
math
plus minus mul div muldiv
ADC ADCX ADD
DEC INC
DIV IDIV IMUL MUL MULX
NEG
SBB SUB





ignored / unimplemented

bcd binary coded decimal
AAA AAD AAM AAS
DAA DAS

CRC32




instruction

opprefixes opcode operands opeffects

opcode: the op-code
operands:
implicits, explicits
inputs, outputs
opeffects: register effects

operations
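
For instance, a minimal Java sketch of that instruction model
(all the names here are illustrative assumptions, not any
actual emulator's types):

    import java.util.*;

    record Operand(String name, boolean implicit, boolean input, boolean output) {}

    record Instruction(List<String> opPrefixes,   // e.g. operand-size, segment overrides
                       int opcode,                // the op-code proper
                       List<Operand> operands,    // implicits/explicits, inputs/outputs
                       Set<String> opEffects) {}  // register and flag effects, e.g. "CF"
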
Ross Finlayson
2024-03-12 17:08:30 UTC
Reply
Permalink
On 03/07/2024 08:09 AM, Ross Finlayson wrote:
> On 02/29/2024 07:55 PM, Ross Finlayson wrote:
>> On 02/20/2024 07:47 PM, Ross Finlayson wrote:
>>> About a "dedicated little OS" to run a "dedicated little service".
>>>
>>>
>>> "Critix"
>>>
>>> 1) some boot code
>>> power on self test, EFI/UEFI, certificates and boot, boot
>>>
>>> 2) a virt model / a machine model
>>> maybe running in a virt
>>> maybe running on metal
>>>
>>> 3) a process/scheduler model
>>> it's processes, a process model
>>> goal is, "some of POSIX"
>>>
>>> Resources
>>>
>>> Drivers
>>>
>>> RAM
>>> Bus
>>> USB, ... serial/parallel, device connections, ....
>>> DMA
>>> framebuffer
>>> audio dac/adc
>>>
>>>
>>> Disk
>>>
>>> hard
>>> memory
>>> network
>>>
>>>
>>> Login
>>>
>>> identity
>>> resources
>>>
>>>
>>>
>>> Networking
>>>
>>> TCP/IP stack
>>> UDP, ...
>>> SCTP, ...
>>> raw, ...
>>>
>>> naming
>>>
>>>
>>> Windowing
>>>
>>> "video memory and what follows SVGA"
>>> "Java, a plain windowing VM"
>>>
>>>
>>>
>>> PCI <-> PCIe
>>>
>>> USB 1/2 USB 3/4
>>>
>>> MMU <-> DMA
>>>
>>> Serial ATA
>>>
>>> NIC / IEEE 802
>>>
>>> "EFI system partition"
>>>
>>> virtualization model
>>> emulator
>>>
>>> clock-accurate / bit-accurate
>>> clock-inaccurate / voltage
>>>
>>>
>>> mainboard / motherboard
>>> circuit summary
>>>
>>> emulator environment
>>>
>>> CPU
>>> main memory
>>> host adapters
>>>
>>> PU's
>>> bus
>>>
>>> I^2C
>>>
>>> clock model / timing model
>>> interconnect model / flow model
>>> insertion model / removal model
>>> instruction model
>>>
>>>
>>
>>
>>
>>
>> I got looking into PC architecture wondering
>> how it was since I studied internals and it really
>> seems it's stabilized a lot.
>>
>> UEFI ACPI SMBIOS
>>
>> DRAM
>> DMA
>> virtualized addressing
>>
>> CPU
>>
>> System Bus
>>
>> Intel CSI QPI UPI
>> AMD HyperTransport
>> ARM CoreLink
>>
>>
>> PCI
>> PCIe
>>
>> Host Adapters
>> ATA
>> NVMe
>> USB
>> NIC
>>
>> So I'm wondering to myself, well first I wonder
>> about writing UEFI plugins to sort of enumerate
>> the setup and for example print it out and for
>> example see what keys are in the TPM and for
>> example the partition table and what goes in
>> in terms of the device tree and basically for
>> diagnostic, boot services then runtime services
>> after UEFI exits after having loaded into memory
>> the tables of the "runtime services" which are
>> mostly sort of a table in memory with offsets
>> of the things and maybe how they're ID's as
>> with regards to the System Bus the Host Adapters.
>>
>>
>> Then it's a pretty simplified model and gets
>> into things like wondering what all else is
>> going on in the device tree and I2C the
>> blinking lights and perhaps the beep, or bell.
>>
>> A lot of times it looks like the video is onboard
>> out the CPU, vis-a-vis the UEFI video output
>> or what appears to be going on, I'm wondering
>> about it.
>>
>>
>> So I'm wondering how to make a simulator,
>> an emulator, uh, of these things above,
>> and then basically the low-speed things
>> and the high-speed things, and, their logical
>> protocols vis-a-vis the voltage and the
>> bit-and-clock accurate and the voltage as
>> symbols vis-a-vis symbolically the protocols,
>> how to make it so to have a sort of simulator
>> or emulator of this sort of usual system,
>> with a usual idea to target code to it to
>> that kind of system or a virt over the virtualized
>> system to otherwise exactly that kind of system, ....
>>
>>
>>
>
>
> Critix
>
> boot protocols
>
> UEFI ACPI SMBIOS
>
> CPU and instruction model
>
> bus protocols
>
> low-speed protocols
> high-speed protocols
>
>
>
> Looking at the instructions, it looks pretty much
> that the kernel code is involved inside the system
> instructions, to support the "bare-metal" and then
> also the "virt-guests", then that communication
> is among the nodes in AMD, then, the HyperTransport
> basically is indicated as, IO, then for there to be figured
> out that the guest virts get a sort of view of the "hardware
> abstraction layer", then with regards to the segments and
> otherwise the mappings, for the guest virts, vis-a-vis,
> the mappings to the memory and I/O, getting figured
> out these kinds of things as an example of what gets
> into a model of a sort of machine, as a sort of emulator,
> basically figuring to be bit-accurate and ignore being
> clock-accurate.
>
> The "BIOS and kernel guide" gets into the order of
> system initializaiton and the links, and DRAM.
> It looks that there are nodes basically being parallel
> processors, and on those cores, being CPUs or
> processors.
>
> Then each of the processors has its control and status
> registers, then with regards to tables, and with regards
> to memory and cache, about those the segments,
> figuring to model the various interconnections this
> way in a little model of a mainboard CPU. "Using L2
> Cache as General Storage During Boot".
>
> Then it gets into enumerating and building the links,
> and setting up the buffers, to figure out what's going
> on the DRAM and DMA, and, PCI and PCIe, and, then
> about what's ATA, NVMe, and USB, these kinds things.
>
> Nodes' cores share registers or "software must ensure...",
> with statics and scopes. Then it seems the cache lines
> and then the interrupt vectors or APIC IDs get enumerated,
> setting up the routes and tables.
>
> Then various system and operating modes proceed,
> where there's an idea that the basic difference
> among executive, scheduler, and operating system,
> basically is in with respect to the operating mode,
> with respect to old real, protected, and, "unreal",
> I suppose, modes, here that basically it's all really
> simplified about protected mode and guest virts.
>
> "After storing the save state, execution starts ...."
>
> Then the's described "spring-boarding" into SMM
> that the BSP and BSM, a quick protocol then that
> all the live nodes enter SMM, basically according
> to ACPI and the APIC.
>
> "The processor supports many power management
> features in a variety of systems."
>
> This gets into voltage proper, here though that
> what results is bit-accurate events.
>
> "P-states are operational performance states
> characterized by a unique frequency and voltage."
>
> The idea here is to support very-low-power operation
> vis-a-vis modest, usual, and full (P0). Then besides
> consumption, is also reducing heat, or dialing down
> according to temperature. Then there are C-states
> and S-states, then mostly these would be as by
> the BIOS, what gets surfaced as ACPI to the kernel.
>
> There are some more preliminaries, the topology
> gets setup, then gets involved the DCT DIMM DRAM
> frequency and for DRAM, lighting up RAM, that
> basically to be constant rate, about the DCT and DDR.
>
> There are about 1000 model-specific registers what
> seem to be for the BIOS to inspect and figure out
> the above pretty much and put the system into a
> state for regular operation.
>
> Then it seems like an emulator would be setting
> that up, then as with regards to usually enough
> "known states" and setting up for simulating the
> exercise of execution and I/O.
>
> instructions
>
>
> system-purpose
>
>
> interrupt
>
> CLGI CLI STI STGI
> HLT
> IRET IRETD IRETQ
> LIDT SIDT
> MONITOR MWAIT
> RSM
> SKINIT
>
> privileges
>
> ARPL
> LAR
> RDPKRU WRPKRU
> VERR VERW
>
> alignment
>
> CLAC STAC
>
> jump/routine
>
> SYSCALL SYSRET
> SYSENTER SYSEXIT
>
> task, stack, tlb, gdt, ldt, cache
>
> CLTS
> CLRSSBSY SETSSBSY
> INCSSP
> INVD
> INVLPG INVLPGA INVLPGB INVPCID TLBSYNC
> LGDT SGDT
> LLDT SLDT
> LMSW
> LSL
> LTR STR
> RDSSP
> RSTORSSP SAVEPREVSSP
> WBINVD WBNOINVD
> WRSS WRUSS
>
>
> load/store
> MOV CRn MOV DRn
> RDMSR WRMSR
> SMSW
> SWAPGS
>
> virtual
>
> PSMASH PVALIDATE
> RMPADJUST RMPUPDATE
> RMPQUERY
> VMLOAD VMSAVE
> VMMCALL VMGEXIT
> VMRUN
>
>
> perf
>
> RDPMC
> RDTSC RDTSCP
>
>
> debug
>
> INT 3
>
>
>
>
> general-purpose
>
> context
> CPUID
> LLWPCB LWPINS LWPVAL SLWPCB
> NOP
> PAUSE
>
> RDFSBASE
>
> RDPID
> RPPRU
>
> UD0 UD1 UD2
>
> jump/routine
> CALL RET
> ENTER LEAVE
> INT
> INTO
> Jcc
> JCXZ JECXZ JRCXZ
> JMP
>
> register
> BOUND
> BT BTC BTR BTS
> CLC CLD CMC
> LAHF SAHF
> STC STD
> WRFSBASE WRGSBASE
>
> compare
> cmp
> CMP
> CMPS CMPSB CMPSW CMPSD CMPSQ
> CMPXCHG CMPXCHG8B CMPXCHG16B
> SCAS SCASB SCASW SCASD SCASQ
> SETcc
> TEST
> branch
> LOOP LOOPE LOOPNE LOOPNZ LOOPZ
>
>
> input/output
> IN
> INS INSB INSW INSD
> OUT
> OUTS OUTSB OUTSW OUTSD
>
> memory/cache
> CLFLUSH CLFLUSHOPT
> CLWB
> CLZERO
> LFENCE MCOMMIT MFENCE SFENCE
> MONITORX MWAITX
> PREFETCH PREFETCHW PREFETCHlevel
>
> memory/stack
> POP
> POPA POPAD
> POPF POPFD POPFQ
> PUSH
> PUSHA PUSHAD
> PUSHF PUSHFD PUSHFQ
>
> memory/segment
> XLAT XLATB
>
> load/store
> BEXTR
> BLCFILL BLCI BLCIC BLCMSK BLCS BLCIC BLCMSK BLSFILL BLSI BLSMSK BLSR
> BSF BSR
> BSWAP
> BZHI
> CBW CWDE CDQE CWD CDQ CQO
> CMOVcc
> LDS LES LFS LGS LSS
> LEA
> LODS LODSB LODSW LODSQ
> MOV
> MOVBE
> MOVD
> MOVMSKPD MOVMSKPS
> MOVNTI
> MOVS MOVSB MOVSW MOVSD MOVSQ
> MOVSX MOVSXD MOVZX
> PDEP PEXT
> RDRAND RDSEED
> STOD STOSB STOSW STOSD STODQ
> XADD XCHG
>
>
>
>
> bitwise/math
> and or nand nor
> complement
> roll
> AND ANDN
> LZCNT TZCNT
> NOT
> OR XOR
> POPCNT
> RCL RCR ROL ROR RORX
> SAL SHL SAR SARX SHL SHLD SHLX SHR SHRD SHRX
> T1MSKC TZMSK
> math
> plus minus mul div muldiv
> ADC ADCX ADD
> DEC INC
> DIV IDIV IMUL MUL MULX
> NEG
> SBB SUB
>
>
>
>
>
> ignored / unimplemented
>
> bcd binary coded decimal
> AAA AAD AAM AAS
> DAA DAS
>
> CRC32
>
>
>
>
> instruction
>
> opprefixes opcode operands opeffects
>
> opcode: the op-code
> operands:
> implicits, explicits
> inputs, outputs
> opeffects: register effects
>
> operations
>
>


Ethernet and IEEE 802
https://en.wikipedia.org/wiki/IEEE_802.3
TCP, TCP/IP

packets

Unicast and multicast

datagrams
sockets
SCTP



v4 ARP IP->MAC
NAT

v6 Neighbor IP->MAC


DNS and domain name resolvers
domain names and IP addresses
IP addresses and MAC addresses

packet construction and emission
packet receipt and deconstruction

packet routing
routes and packets

Gateway
Local Network
DHCP
PPPoE

NICs
I/O
routing
built-ins


NICs and the bus
NICs and DMA


The runtime, basically has memory and the bus,
in terms of that all transport is on the bus and
all state is in the memory.

At the peripherals or "outside the box", basically
has that the simulator model has only as whatever
of those are effects, either in protocols and thus
synchronously, with the modeling of the asynchronous
request/response as synchronously, as what results
the "out-of-band" then with respect to the interrupts,
the service of the interrupts, and otherwise usually
the service of the bus, with regards to the service of
the memory, modes of the synchronous routine,
among independently operating units.
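
A very rough Java sketch of that shape, with the asynchronous
effects queued and then serviced in-band by the synchronous
routine (everything named here is an illustrative assumption):

    import java.util.*;

    interface Unit { void service(); }   // an independently operating unit

    static final ArrayDeque<Runnable> outOfBand = new ArrayDeque<>();

    static void step(List<Unit> units) {
        for (Unit u : units) u.service();        // synchronous service on the bus
        while (!outOfBand.isEmpty())
            outOfBand.poll().run();              // interrupts, serviced in-band
    }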


Power over Ethernet / Wake-on-LAN
https://en.wikipedia.org/wiki/Energy-Efficient_Ethernet

https://en.wikipedia.org/wiki/Physical_layer#PHY


Now, this isn't really related necessarily to the
idea of implementing Usenet and other text-based
Internet Message protocols in the application layer,
yet, there's sort of an idea, that a model machine
as a simulator, results how to implement an entire
operating system whose only purpose is to implement
text-based Internet Message protocols.

https://en.wikipedia.org/wiki/Link_Layer_Discovery_Protocol

One nice thing about IETF RFC's is that they're available
largely gratis, while when getting into IEEE recommendations,
it results they're money.

It helps that mostly though all the needful is in the RFC's.

https://en.wikipedia.org/wiki/Network_interface_controller

So, the NIC or LAN adapter, basically it's to get figured
whether it sort of supports a stack already, or otherwise it's
to get figured how it results packets vis-a-vis the service
of the I/O's, and how to implement the buffers and how
to rotate the buffers as the buffers are serviced, by the
synchronous routine.
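
A minimal Java sketch of rotating the receive buffers as
they're serviced (the names and the ring size are assumptions):

    static final int RING = 8;
    static final byte[][] ring = new byte[RING][];
    static int produce = 0, consume = 0;

    // called as frames arrive from the NIC
    static boolean deliver(byte[] frame) {
        if (produce - consume == RING) return false;   // ring full: drop
        ring[produce++ % RING] = frame;
        return true;
    }

    // the synchronous routine drains the ring in order
    static void serviceRing(java.util.function.Consumer<byte[]> handle) {
        while (consume < produce) handle.accept(ring[consume++ % RING]);
    }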

https://en.wikipedia.org/wiki/TCP_offload_engine

Then there's sort of a goal "the application protocols
sit directly on that", vis-a-vis, "the operating system
asynchronous and vector-I/O facility sits directly on
that, and the application protocol sits directly on that".


This is where, for the protocols, basically involves any
matters of packet handling like firewalls and this kind
of thing, vis-a-vis the application or presentation layer
or session layer, about the control plane and data plane.



The idea that specialized units handle protocols,
reminds me one time, I was working at this place,
and one of the product, was a daughterboard,
the purpose of which was to sort data, a sorter unit.
Here the idea that the NIC knows protocol and results
bus traffic, gets into variously whether it matters.



Two key notions of the thing are, "affinity", and "coherency".

The "coherency" is sort of an "expanding wave" of consistency,
while, "affinity", is sort of a "directed edge", of consistency.

Basically affinity indicates caring about coherency,
and coherency indicates consistency of affinity.

This way the "locality" and "coherency" and "affinity" then
make for topology for satisfying the locality's affinities
of coherency, that being the definition of "behavior, defined".

"Communicating sequential processes" is a very usual metaphor,
with regards to priority and capacity and opportunity and compulsion.

https://en.wikipedia.org/wiki/Communicating_sequential_processes

There are _affinities_ in the various layers, of _affinities_
in the various layers, here for examples "packets and streams",
and "messages and threads", for example.


Much then gets involved in implementing the finite-state-machines,
with regards to the modes, the protocols, the finite-state-machines,
each a process in communicating sequential processes, in
communicating coherent independent processes.

Co-unicating, ....

So, the idea of "open-finite-state-machine" is that there
is defined behavior the expected and unexpected, with
regards to resets, and defined behavior the known and
unknown, with regards to restarts, then the keeping and
the loss of state, what exist in the configuration space
the establishment of state and change and the state of change
and the changes of state, the open-finite-state-machine.

https://en.wikipedia.org/wiki/Unbounded_nondeterminism



https://en.wikipedia.org/wiki/X86-64#Microarchitecture_levels

When I studied the IA-32 and I studied Itanium and IA-64 a lot,
and I studied RISC, and these kinds of things, with regards to x86
and how Itanium is kind of like RISC and RISC and ring registers
and these kinds of things, the modes, and so on, that was mostly
looking at assembler instructions with regards to image CODEC
code. So anyways these days it seems like this whole x86-64 has
really simplified a lot of things, that the co-operation on the bus
still seems a lot about the IDT the Interrupt Descriptor Table,
which has 256 entries, then with regards to the tags that go
into those, and service vectors about those. I'm wondering
about basically whether those are fixed from the get-go or
whether they can be blocked in and out, with regards to status
on the bus, vis-a-vis otherwise sort of funneling exceptions into
as few as possible, figuring those are few and far between
or that when they come mostly get dumped out.

I'm not very interested in peripherals and mostly interested
in figuring out hi-po I/O in minimal memory, then with regards
to the CPU and RAM for compute tasks, but mostly for scatter/gather.
Ross Finlayson
2024-02-21 04:38:35 UTC
Reply
Permalink
Alright then, about the SFF, "summary" file-format,
"sorted" file-format, "search" file-format, the idea
here is to figure out normal forms of summary,
that go with the posts, with the idea that "a post's
directory is on the order of contained size of the
size of the post", while, "a post's directory is on
a constant order of entries", here is for sort of
summarizing what a post's directory looks like
in "well-formed BFF", then as with regards to
things like Intermediate file-formats as mentioned
above here with the goal of "very-weakly-encrypted
at rest as constant contents", then here for
"SFF files, either in the post's-directory or
on the side, and about how links to them get
collected to directories in a filesystem structure
for the conventions of the concatenation of files".

So, here the idea so far is that BFF has a normative
form for each post, which has a particular opaque
globally-universal unique identifier, the Message-ID,
then that the directory looks like MessageId/ and its
contents are these files.

id hd bd yd td rd ad dd ud xd
id, header, body, year-to-date, thread, referenced, authored, dead,
undead, expired

or just files named

i h b y t r a d u x

which according to the presence of the files and
their contents, indicate that the presence of the
MessageId/ directory indicates the presence of
a well-formed message, contingent not being expired.
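
A small Java sketch of that check, using the single-letter
names above (the helper name, and which files count as
required, are assumptions):

    import java.nio.file.*;

    static boolean wellFormed(Path messageIdDir) {
        return Files.isRegularFile(messageIdDir.resolve("h"))    // header
            && Files.isRegularFile(messageIdDir.resolve("b"))    // body
            && !Files.exists(messageIdDir.resolve("x"));         // not expired
    }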

... Where hd bd are the message split into its parts,
with regards to the composition of messages by
concatenating those back together with the computed
message numbers and this kind of thing, with regards to
the site, and the idea that they're stored at-rest pre-compressed,
then knowledge of the compression algorithm makes for
concatenating them in message-composition as compressed.

Then, there are variously already relations of the
posts, according to groups, then here as above that
there's perceived required for date, and author.
I.e. these are files on the order of the counts of posts,
or span in time, or count of authors.

(About threading and relating posts, is the idea of
matching subjects not-so-much but employing the
References header, then as with regards to IMAP and
parity as for IMAP's THREADS extension, ...,
www.rfc-editor.org/rfc/rfc5256.html , cf SORT and THREAD.
There's a usual sort of notion that sorted, threaded
enumeration is either in date order or thread-tree
traversal order, usually more sensibly date order,
with regards to breaking out sub-threads, variously.
"It's all one thread." IMAP: "there is an implicit sort
criterion of sequence number".)
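
A minimal Java sketch of the References side of that (names
assumed): the parent is the last Message-ID in References,
and the thread root is found by walking up.

    import java.util.*;

    static String parentOf(Map<String, List<String>> referencesById, String id) {
        List<String> refs = referencesById.getOrDefault(id, List.of());
        return refs.isEmpty() ? null : refs.get(refs.size() - 1);
    }

    static String threadRootOf(Map<String, List<String>> referencesById, String id) {
        Set<String> seen = new HashSet<>();   // guard against reference loops
        String cur = id, parent;
        while ((parent = parentOf(referencesById, cur)) != null && seen.add(cur))
            cur = parent;
        return cur;
    }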


Then, similarly is for defining models for the sort, summary,
search, SFF, that it sort of (ha) rather begins with sort,
about the idea that it's sort of expected that there will
be a date order partition either as symlinks or as an index file,
or as with regards to that messages date is also stored in
the yd file, then as with regards to "no file-times can be
assumed or reliable", with regards to "there's exactly one
file named YYYY-MM-DD-HH-MM-SS in MessageId/", these
kinds of things. There's a real goal that it works easy
with shell built-ins and text-utils, or "command line",
to work with the files.


So, sort pretty well goes with filtering.
If you're familiar with the context, of, "data tables",
with a filter-predicate and a sort-predicate,
they're different things but then go together.
It's figured that they get front-ended according
to the quite most usual "column model" of the
"table model" then "yes/no/maybe" row filtering
and "multi-sort" row sorting. (In relational algebra, ...,
or as rather with 'relational algebra with rows and nulls',
this most usual sort of 'composable filtering' and 'multi-sort').
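
For example, a small Java sketch of composable filtering and
multi-sort over rows (the Row fields are illustrative):

    import java.util.*;
    import java.util.function.Predicate;
    import java.util.stream.Collectors;

    record Row(String group, String author, java.time.Instant date, String subject) {}

    static List<Row> filterAndSort(List<Row> rows,
                                   List<Predicate<Row>> filters,
                                   List<Comparator<Row>> sorts) {
        Predicate<Row> filter = filters.stream().reduce(r -> true, (a, b) -> a.and(b));
        Comparator<Row> order = sorts.stream().reduce((a, b) -> 0, (a, b) -> a.thenComparing(b));
        return rows.stream().filter(filter).sorted(order).collect(Collectors.toList());
    }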

Then in IMAP, the THREAD command is "a variant of
SEARCH with threading semantics for the results".
This is where both posts and emails work off the
References header, but it looks like in the wild there
is something like "a vendor does poor-man's subject
threading for you and stuffs in a X-References",
this kind of thing, here with regards to that
instead of concatenation, is that intermediate
results get sorted and threaded together,
then those, get interleaved and stably sorted
together, that being sort of the idea, with regards
to search results in or among threads.

(Cf www.jwz.org/doc/threading.html as
via www.rfc-editor.org/rfc/rfc5256.html ,
with regards to In-Reply-To and References.
There are some interesting articles there
about "mailbox summarization".)

About the summary of posts, one way to start
as for example an interesting article about mailbox
summarization gets into, is, all the necessary text-encodings
to result UTF-8, of Unicode, after UCS-2 or UCS-4 or ASCII,
or CP-1252, in the base of BE or LE BOMs, or anything to
do with summarizing the character data, of any of the
headers, or the body of the text, figuring of course
that everything's delivered as it arrives, as with regards
to the opacity usually of everything vis-a-vis its inspection.

This could be a normative sort of file that goes in the messageId/
folder.

cd: character-data, a summary of whatever form of character
encoding or requirements of unfolding or unquoting or in
the headers or the body or anywhere involved indicating
a stamp indicating each of the encodings or character sets.

Then, the idea is that it's a pretty deep inspection to
figure out how the various attributes, what are their
encodings, and the body, and the contents, with regards
to a sort of, "a normalized string indicating the necessary
character encodings necessary to extract attributes and
given attributes and the body and given sections", for such
matters of indicating the needful for things like sort,
and collation, in internationalization and localization,
aka i18n and l10n. (Given that the messages are stored
as they arrived and undisturbed.)

The idea is that "the cd file doesn't exist for messages
in plain ASCII7, but for anything anywhere else, breaks
out what results how to get it out". This is where text
is often in a sort of format like this.

Ascii: it's keyboard characters
ISO8859-1/ISO8859-15/CP-1252: it's Latin1 often though with the Windows guys
Sideout: it's Ascii with 0-127 gigglies or upper glyphs
Wideout: it's 0-256 with any 256 wide characters in upper Unicode planes
Unicode: it's Unicode

Then there are all sorts of encodings, this is according to
the rules of Messages with regards to header and body
and content and transfer-encoding and all these sorts
of things, it's Unicode.
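
A minimal Java sketch of the first determination (the name is
illustrative): the cd file is only needed when the raw bytes
aren't plain printable ASCII.

    static boolean needsCd(byte[] raw) {
        for (byte b : raw) {
            int v = b & 0xFF;
            boolean plain = (v >= 0x20 && v <= 0x7E) || v == '\r' || v == '\n' || v == '\t';
            if (!plain) return true;   // something beyond printable 7-bit ASCII
        }
        return false;                  // 7-bit clean: no cd file written
    }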

Then, another thing to get figured out is lengths,
the size of contents or counts or lengths, figuring
that it's a great boon to message-composition to
allocate exactly what it needs for when, as a sum
of invariant lengths.

Then the MessageId/ files still have un-used 'l' and 's',
then though that 'l' looks too close to '1', here it's
sort of unambiguous.

ld: lengthed, the coded and uncoded lengths of attributes and parts

The idea here is to make it easiest for something like
"consult the lengths and allocate it raw, concatenate
the message into it, consult the lengths and allocate
it uncoded, uncode the message into it".
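
A small Java sketch of that use of ld, assuming a
one-line-per-part layout like "h 412 398" (part name, coded
length, uncoded length), so composition can allocate the wire
buffer exactly once:

    import java.nio.ByteBuffer;
    import java.nio.file.*;

    static ByteBuffer allocateWireBuffer(Path ldFile) throws java.io.IOException {
        int total = 0;
        for (String line : Files.readAllLines(ldFile)) {
            String[] f = line.trim().split("\\s+");
            if (f.length >= 2) total += Integer.parseInt(f[1]);   // sum the coded lengths
        }
        return ByteBuffer.allocate(total);
    }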

So, getting into the SFF, is that basically
"BFF indicates well-formed messages or their expiry",
"SFF is derived via a common algorithm for all messages",
and "some SFF lives next to BFF and is also write-once-read-many",
vis-a-vis that "generally SFF is discardable because it's derivable".
Ross Finlayson
2024-02-22 20:11:07 UTC
Reply
Permalink
Then, it seems that cd and ld should be part of the BFF,
the backing file-format, or as so generated on demand,
that with regards to the structural content of the messages,
and the composition of the wire forms of the messages,
they're intermediate values which indicate sort of a validation.
Of course they'd have to be validated in a sense, for the idea
that otherwise routines can rely on them.

Here for the character determination, is basically for a
specification, after validation, of text encodings, what's
to result, that such a specification starts in "closed categories",
as with regards to the names of things or a registry of them,
associated with specific normative algorithms,
that result a common text encoding.

So, here cd starts with, "7-bit clean ASCII". Then as above
there are the most usual character sets involved, as what
these days fall into Unicode, with respect to all the character
encodings in the world, and their normalized names and
glyphs and codes as these days fall into the great effort
what is, "Unicode", and the ubiquitous encoding, UTF-8,
for UCS-2, or UTF-16 or UTF-32 and other such notions,
and their variants sometimes when UTF-8 for example
in some settings has an encoding, here that it's mostly
entirely tractable everywhere, "printable ASCII" or
"UTF-8, excluding non-printable characters".

So, the idea for the contents of the specification,
gets into here dealing with messages. The messages
have headers, they have bodies, there are overall
or default or implicit or specific or self-declaring
sorts textual data, the code-pages, the representations,
the encodings, and the forms. This is all called, "textual",
data.

Then here the usual idea for messages, is that, while
Usenet messages are particularly simple, with regards
to Email messages, or the usual serialization of HTTP messages,
it's a header with a multi-set of attributes and a body,
the interpretation as by the relevant content headers
or defaultly or implicitly, with respect to the system encoding
and locale, and other usual expectations of defaults,
vis-a-vis, explicits.

So, the idea of BFF's cd, is to be a specification, of
all the normative character encodings' textual,
for a given edition or revision of all the character
encodings, here as simplified being "Internet Messages".
This is associated with the headers, overall, the headers,
apiece, or their segmented values, apiece, the body,
overall, the parts of the body, apiece, or their segment
values, apiece, and the message, altogether.


Then, the lengths, or BFF's ld, is also after following
a particular normative reading of "the bytes" or "the wire",
and "the characters" and "in their character encoding",
and it must be valid, to be reliable to allocate the buffer
for the wire data, filling the buffer exactly, according
to the lengths, the sizes. The mal-formed or the ambiguous
or the mistaken or any ways otherwise the invalid, is
basically that for the summary to follow, that the contents
of otherwise the opaque at-rest transport format,
get the extraction to result the attributes, in scalars,
the values, for locale and collation.


Then, I know quite well all the standards of the textual,
now to learn enough about the Internet Message,
for Email and Usenet and MIME and HTTP's usual,
for example like "Usenet messages end on the wire
with a dot that's in an escapement otherwise, erm",
these kinds of things, resulting for this sort of BFF
message format, though it does give an entire directory
on the file system to each message in the representation,
with a write-once-read-many expectation as is pretty usual,
and soft-delete, and for operations message-wise,
here is getting into the particulars of "cd" and "ld",
these data derived from the Message, what results
a usual means for the validity and the transparency
of the textual in the content of the message.


This is of course, "Meta", to sci.math, and
humor is irrelevant to sci.math, but it's an
exercise in the study of Internet Protocols.
Ross Finlayson
2024-02-24 18:09:38 UTC
Reply
Permalink
IETF RFC

NNTP

3977 https://datatracker.ietf.org/doc/html/rfc3977
8054 https://www.rfc-editor.org/rfc/rfc8054

SMTP

5321 https://datatracker.ietf.org/doc/html/rfc5321
2821 https://www.ietf.org/rfc/rfc2821.txt
2822 https://datatracker.ietf.org/doc/html/rfc2822 <- Internet Message
Format

IMAP

3501 https://datatracker.ietf.org/doc/html/rfc3501
2683 https://datatracker.ietf.org/doc/html/rfc2683
4978 https://datatracker.ietf.org/doc/html/rfc4978
3516 https://datatracker.ietf.org/doc/html/rfc3516

POP3

1939 https://www.ietf.org/rfc/rfc1939.txt


MIME

2045 https://datatracker.ietf.org/doc/html/rfc2045
2049 https://datatracker.ietf.org/doc/html/rfc2049
2046 https://datatracker.ietf.org/doc/html/rfc2046

DEFLATE

1950 https://datatracker.ietf.org/doc/html/rfc1950
1951 https://datatracker.ietf.org/doc/html/rfc1951

HTTP

7231 https://datatracker.ietf.org/doc/html/rfc7231
7230 https://datatracker.ietf.org/doc/html/rfc7230

"dot-stuffing":

https://datatracker.ietf.org/doc/html/rfc3977#section-6.3.1.2


If posting is permitted, the article MUST be in the format specified
in Section 3.6 and MUST be sent by the client to the server as a
multi-line data block (see Section 3.1.1). Thus a single dot (".")
on a line indicates the end of the text, and lines starting with a
dot in the original text have that dot doubled during transmission.

https://datatracker.ietf.org/doc/html/rfc3977#section-6.3.2.2

If transmission of the article is requested, the client MUST send the
entire article, including headers and body, to the server as a
multi-line data block (see Section 3.1.1). Thus, a single dot (".")
on a line indicates the end of the text, and lines starting with a
dot in the original text have that dot doubled during transmission.



Well I was under the impression that there was something of
the dynamic in the headers, vis-a-vis the body, and that often
enough it's always ARTICLE not HEAD, BODY, or STAT, which is
why having hd and bd as separate files is a thing.
Still though it can be nice to have them separate.

Then, for the message content at rest, there's "dot-stuffing",
this is basically an artifact of "dot alone on a line ends a post,
in a terminal window telnet'ed to an NNTP server", here with
regards to that POST and IHAVE and so on are supposed to deliver
it, and it's supposed to be returned both as part of the end of
the ARTICLE and also BODY but also HEAD, but it's supposed to
not be counted in :bytes, while though the spec says not to rely
on "bytes" because for example it's not ignored.

I.e. this is about "the NNTP of the thing" vis-a-vis, that as just a
message store, here is for studying SMTP and seeing what Email
says about it.

SMTP: SMTP indicates the end of the mail data by sending a
line containing only a "." (period or full stop). A transparency
procedure is used to prevent this from interfering with the user's
text (see section 4.5.2).

- Before sending a line of mail text, the SMTP client checks the
first character of the line. If it is a period, one additional
period is inserted at the beginning of the line.

- When a line of mail text is received by the SMTP server, it checks
the line. If the line is composed of a single period, it is
treated as the end of mail indicator. If the first character is a
period and there are other characters on the line, the first
character is deleted.



So here it's like dot-stuffing in NNTP, is sort of different than
dot-stuffing in SMTP, with regards to that I want the data to
be a constant at rest, then here about though then there's
also for having a text edition at rest, i.e. that "uncompressed"
makes for that it's the same for any kind of messages, vis-a-vis
the "end of data" or "dot-stuffing", ....


POP3: When all lines of the response have been sent, a
final line is sent, consisting of a termination octet (decimal code
046, ".") and a CRLF pair. If any line of the multi-line response
begins with the termination octet, the line is "byte-stuffed" by
pre-pending the termination octet to that line of the response.
Hence a multi-line response is terminated with the five octets
"CRLF.CRLF".

POP3 RETR: "After the initial +OK, the
POP3 server sends the message corresponding to the given
message-number, being careful to byte-stuff the termination
character (as with all multi-line responses)."

I don't mind just concatenating the termination sequence at
the end, it's a constant of fixed size, but I want the content
to be un-stuffed at rest, ....
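
A minimal Java sketch of that: keep the body un-stuffed at
rest, then dot-stuff and terminate it only on the way to the
wire, in the transparency style the RFC excerpts above describe.

    static String toWire(String atRest) {
        // assumes CRLF line endings; a trailing CRLF at rest is tolerated
        String body = atRest.endsWith("\r\n")
                ? atRest.substring(0, atRest.length() - 2) : atRest;
        StringBuilder out = new StringBuilder();
        for (String line : body.split("\r\n", -1)) {
            if (line.startsWith(".")) out.append('.');   // double a leading dot
            out.append(line).append("\r\n");
        }
        return out.append(".\r\n").toString();           // the constant terminator
    }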

"In order to simplify parsing, all POP3 servers are
required to use a certain format for scan listings. A
scan listing consists of the message-number of the
message, followed by a single space and the exact size of
the message in octets. Methods for calculating the exact
size of the message are described in the "Message Format"
section below. "

https://datatracker.ietf.org/doc/html/rfc2822#section-3.5
"Lines in a message MUST be a maximum of 998 characters
excluding the CRLF, but it is RECOMMENDED that lines be limited to 78
characters excluding the CRLF."


Hmm..., what I'm trying to figure out is how to store the data
at rest, in its pieces, that just concatenate back together to
form message composition, here variously that parts are
compressible or already compressed, and about the uncompressed,
whether to have dot-stuffing in the compressed and not-dot-stuffing
in the otherwise plain-text at rest, with regards to Usenet and Email
messages, and other usual bodies like HTTP with respect to MIME
and MIME multipart and so on. This is where there's something
like "oh about three and a half terabytes, uncompressed, a copy
of text Usenet", and figuring out how to have it so that it all fits
exploded all out on a modern filesystem, in this write-once-read-many
approach, (or, often enough, write-once-read-never), and that
ingesting the data is expeditious and it's very normative and tractable
at rest.

It gets into ideas like this, "name the files that are fragments of
deflate/gzip to something like h7/b7, where 7 is almost Z",
and "build the Huffman tables over sort of the whole world
as it's figured that they're sort of constant over time, for lots
of repeated constants in the headers", this kind of thing.
Mostly though it's the idea of having the file fragments
being concatenable with some reference files to stream them.
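
A small Java sketch of that streaming concatenation (names
assumed): retrieval is just copying the at-rest fragments, in
order, out to the connection.

    import java.io.*;
    import java.nio.file.*;
    import java.util.List;

    static void streamConcatenated(List<Path> fragments, OutputStream out)
            throws IOException {
        for (Path fragment : fragments)
            Files.copy(fragment, out);   // e.g. headers, separator, body, terminator
    }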

Then, as this is sort of an aside from the cd and ld, the
characters and lengths, of the summary metadata, as well
is about the extraction of the data, vis-a-vis the data at rest.
The idea is that whole extraction is "stream a concatenation
of the data at rest", while there's usually for overview and
search to be extracting attributes' values and resulting those
populate overviews, or for example renditions of threads,
and about the idea here of basically having NNTP, and then
IMAP sitting in front of that, and then also HTTP variously
in front of that, with that NNTP and IMAP and HTTP have
a very high affinity with respect to the usual operation of
their protocols, and also the content, here then with regards
to MIME, and for "MIME at rest", and this kind of thing.




One thing about summary, then, is that there's
derived data what is to make for extraction and summary,
sort, and search, then about access, which gets into values
that, stored as files, are not write-once-read-many. Then,
whether to have this in the same directory as MessageId,
or to have the volatiles as they are, gets into the write-once-read-many
and about object stores and this kind of thing, with regards
to atomicity and changes, and this kind of thing. Basically
the idea for access is that that's IMAP and the status of
messages apiece for the login, for example, and then
hit counters, here with head-hits and body-hits for article-hits,
to help get an idea of hits to help establish relevance
of articles by accesses or hits, views. This would feed back
into the NOOBNB idea, with regards to figuring out views,
and some way to indicate like by viewing a related item,
to validate a view, this kind of thing.

It's sort of figured that the author-article pair is the
datum, then for those to get aggregated, with respect
to calling the login an author, here that all logins are
authors. Basically the idea with that is that the client
requesting the article would make it so, then for things
like "the IMAP fronting the NNTP and delegating the
author on down into the NNTP", and these kinds of things.


For MIME the idea seems to actually be to break the
parts on out into files into a subdirectory, that something
like "bm" indicates "body-MIME", then that MIME bodies
have a natural enough filesystem-representation,
where it results a good idea to make their transfer
and content encoding for the various transfer and
content encodings, and for delivering parts, ....
Then the usual idea of the MIME body as the
single-part MIME object, binary, basically is
for blobs, ..., then as with regards to those prepared
also "b7-at-rest" for delivering any kind of object,
here with its routing as a message besides as just
a usual kind of object-store.
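
A rough Java sketch of breaking parts out under the message's
directory (the "bm" name is from above; the per-part file
naming is an assumption):

    import java.nio.file.*;
    import java.util.List;

    static void storeParts(Path messageIdDir, List<byte[]> partsAtRest)
            throws java.io.IOException {
        Path bm = Files.createDirectories(messageIdDir.resolve("bm"));
        int n = 0;
        for (byte[] part : partsAtRest)
            Files.write(bm.resolve(++n + ".b"), part);   // one file per part
    }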


https://datatracker.ietf.org/doc/html/rfc2046#section-5


The idea here is that it's great that messages, usually,
can just be considered exactly as they arrive, the
ingestion having added a Path element, say,
serialized and stored as they arrived from the wire,
and retrieved and returned back to it. Then,
messages in various structures, eventually have
parts and entities and messages in them and
transfer and content encodings that were applied
and data that is or isn't compressible and will or won't
by served as textual or as binary, or as reference, in
getting into the linked-content and "Content-ID",
the idea that large blobs of data are also aside.

Then, this idea is to store the entities and parts
and contained messages and blobs, at rest, as
where their content encoding and transfer encoding,
make for the repurposable and constant representations
at-rest, then that when it result either extraction, or,
retrieval, that the point here is that extraction is
"inside the envelope", then with the idea that
message-composition, should have it so that
largely the server just spews retrievals as
concatenating the parts at rest, or putting them
in content and transfer encodings, with regards
to eventually the transfer encoding, then the compression
layer as here is pretty usual, then the encryption and
compression layers on out, the idea being to make
those modular, factorizable, in terms of message-composition,
that it gets pretty involved yet then results handling
any kinds of Internet message content like this at all.


Hmm, ..., "quoted-printable".

https://datatracker.ietf.org/doc/html/rfc2049#section-4

"he process of composing a MIME entity can be modeled as being done
in a number of steps. Note that these steps are roughly similar to
those steps used in PEM [RFC-1421] ..."

(PEM, "Privacy Enhanced Mail", ....)


So, it's being kind of sorted out mostly how to get
the messages flowing pass-through, as much as possible,
this still being the BFF, with regards then to extraction,
and use cases for SFF.


About "the three and a half terabytes uncompressed
the Usenet archive", ....
Ross Finlayson
2024-02-26 05:25:37 UTC
Reply
Permalink
On 02/24/2024 10:09 AM, Ross Finlayson wrote:
> IETF RFC
>
> NNTP
>
> 3977 https://datatracker.ietf.org/doc/html/rfc3977
> 8054 https://www.rfc-editor.org/rfc/rfc8054
>
> SMTP
>
> 5321 https://datatracker.ietf.org/doc/html/rfc5321
> 2821 https://www.ietf.org/rfc/rfc2821.txt
> 2822 https://datatracker.ietf.org/doc/html/rfc2822 <- Internet Message
> Format
>
> IMAP
>
> 3501 https://datatracker.ietf.org/doc/html/rfc3501
> 2683 https://datatracker.ietf.org/doc/html/rfc2683
> 4978 https://datatracker.ietf.org/doc/html/rfc4978
> 3516 https://datatracker.ietf.org/doc/html/rfc3516
>
> POP3
>
> 1725 https://www.ietf.org/rfc/rfc1939.txt
>
>
> MIME
>
> 2045 https://datatracker.ietf.org/doc/html/rfc2045
> 2049 https://datatracker.ietf.org/doc/html/rfc2049
> 2046 https://datatracker.ietf.org/doc/html/rfc2046
>
> DEFLATE
>
> 1950 https://datatracker.ietf.org/doc/html/rfc1950
> 1951 https://datatracker.ietf.org/doc/html/rfc1951
>
> HTTP
>
> 7231 https://datatracker.ietf.org/doc/html/rfc7231
> 7230 https://datatracker.ietf.org/doc/html/rfc7230
>
> "dot-stuffing":
>
> https://datatracker.ietf.org/doc/html/rfc3977#section-6.3.1.2
>
>
> If posting is permitted, the article MUST be in the format specified
> in Section 3.6 and MUST be sent by the client to the server as a
> multi-line data block (see Section 3.1.1). Thus a single dot (".")
> on a line indicates the end of the text, and lines starting with a
> dot in the original text have that dot doubled during transmission.
>
> https://datatracker.ietf.org/doc/html/rfc3977#section-6.3.2.2
>
> If transmission of the article is requested, the client MUST send the
> entire article, including headers and body, to the server as a
> multi-line data block (see Section 3.1.1). Thus, a single dot (".")
> on a line indicates the end of the text, and lines starting with a
> dot in the original text have that dot doubled during transmission.
>
>
>
> Well I was under the impression that there was something of
> the dynamic in the headers, vis-a-vis the body, and that often
> enough it's always ARTICLE not HEAD, BODY, or STAT, why with
> regards to having hd and bd being separate files, is a thing.
> Still though it can be nice to have them separate.
>
> Then, for the message content at rest, there's "dot-stuffing",
> this is basically an artifact of "dot alone on a line ends a post,
> in a terminal window telnet'ed to an NNTP server", here with
> regards to that POST and IHAVE and so on are supposed to deliver
> it, and it's supposed to be returned both as part of the end of
> the ARTICLE and also BODY but also HEAD, but it's supposed to
> not be counted in :bytes, while though the spec says not to rely
> on "bytes" because for example it's not ignored.
>
> I.e. this is about "the NNTP of the thing" vis-a-vis, that as just a
> message store, here is for studying SMTP and seeing what Email
> says about it.
>
> SMTP: SMTP indicates the end of the mail data by sending a
> line containing only a "." (period or full stop). A transparency
> procedure is used to prevent this from interfering with the user's
> text (see section 4.5.2).
>
> - Before sending a line of mail text, the SMTP client checks the
> first character of the line. If it is a period, one additional
> period is inserted at the beginning of the line.
>
> - When a line of mail text is received by the SMTP server, it checks
> the line. If the line is composed of a single period, it is
> treated as the end of mail indicator. If the first character is a
> period and there are other characters on the line, the first
> character is deleted.
>
>
>
> So here it's like dot-stuffing in NNTP, is sort of different than
> dot-stuffing in SMTP, with regards to that I want the data to
> be a constant at rest, then here about though then there's
> also for having a text edition at rest, i.e. that "uncompressed"
> makes for that it's the same for any kind of messages, vis-a-vis
> the "end of data" or "dot-stuffing", ....
>
>
> POP3: When all lines of the response have been sent, a
> final line is sent, consisting of a termination octet (decimal code
> 046, ".") and a CRLF pair. If any line of the multi-line response
> begins with the termination octet, the line is "byte-stuffed" by
> pre-pending the termination octet to that line of the response.
> Hence a multi-line response is terminated with the five octets
> "CRLF.CRLF".
>
> POP3 RETR: "After the initial +OK, the
> POP3 server sends the message corresponding to the given
> message-number, being careful to byte-stuff the termination
> character (as with all multi-line responses)."
>
> I don't mind just concatenating the termination sequence at
> the end, it's a constant of fixed size, but I want the content
> to be un-stuffed at rest, ....
>
> "In order to simplify parsing, all POP3 servers are
> required to use a certain format for scan listings. A
> scan listing consists of the message-number of the
> message, followed by a single space and the exact size of
> the message in octets. Methods for calculating the exact
> size of the message are described in the "Message Format"
> section below. "
>
> https://datatracker.ietf.org/doc/html/rfc2822#section-3.5
> "Lines in a message MUST be a maximum of 998 characters
> excluding the CRLF, but it is RECOMMENDED that lines be limited to 78
> characters excluding the CRLF."
>
>
> Hmm..., what I'm trying to figure out is how to store the data
> at rest, in its pieces, that just concatenate back together to
> form message composition, here variously that parts are
> compressible or already compressed, and about the uncompressed,
> whether to have dot-stuffing in the compressed and not-dot-stuffing
> in the otherwise plain-text at rest, with regards to Usenet and Email
> messages, and other usual bodies like HTTP with respect to MIME
> and MIME multipart and so on. This is where there's something
> like "oh about three and a half terabytes, uncompressed, a copy
> of text Usenet", and figuring out how to have it so that it all fits
> exploded all out on a modern filesystem, in this write-once-read-many
> approach, (or, often enough, write-once-read-never), and that
> ingesting the data is expeditious and it's very normative and tractable
> at rest.
>
> It gets into ideas like this, "name the files that are fragments of
> deflate/gzip to something like h7/b7, where 7 is almost Z",
> and "build the Huffman tables over sort of the whole world
> as it's figured that they're sort of constant over time, for lots
> of repeated constants in the headers", this kind of thing.
> Mostly though it's the idea of having the file fragments
> being concatenable with some reference files to stream them.
>
> Then, as this is sort of an aside from the cd and ld, the
> characters and lengths, of the summary metadata, as well
> is about the extraction of the data, vis-a-vis the data at rest.
> The idea is that whole extraction is "stream a concatenation
> of the data at rest", while there's usually for overview and
> search to be extracting attributes' values and resulting those
> populate overviews, or for example renditions of threads,
> and about the idea here of basically having NNTP, and then
> IMAP sitting in front of that, and then also HTTP variously
> in front of that, with that NNTP and IMAP and HTTP have
> a very high affinity with respect to the usual operation of
> their protocols, and also the content, here then with regards
> to MIME, and for "MIME at rest", and this kind of thing.
>
>
>
>
> One thing about summary, then is about, that's there's
> derived data what is to make for extraction and summary,
> sort, and search, then about access, which gets into values
> that stored as files, are not write-once-read-many. Then,
> whether to have this in the same directory as MessageId,
> or to have the volatiles as they are, gets into the write-once-read-many
> and about object stores and this kind of thing, with regards
> to atomicity and changes, and this kind of thing. Basically
> the idea for access is that that's IMAP and the status of
> messages apiece for the login, for example, and then
> hit counters, here with head-hits and body-hits for article-hits,
> to help get an idea of hits to help establish relevance
> of articles by accesses or hits, views. This would feed back
> into the NOOBNB idea, with regards to figuring out views,
> and some way to indicate like by viewing a related item,
> to validate a view, this kind of thing.
>
> It's sort of figured that the author-article pair is the
> datum, then for those to get aggregated, with respect
> to calling the login an author, here that all logins are
> authors. Basically the idea with that is that the client
> requesting the article would make it so, then for things
> like "the IMAP fronting the NNTP and delegating the
> author on down into the NNTP", and these kinds of things.
>
>
> For MIME the idea seems to actually be to break the
> parts on out into files into a subdirectory, that something
> like "bm" indicates "body-MIME", then that MIME bodies
> have a natural enough filesystem-representation,
> where it results a good idea to make their transfer
> and content encoding for the various transfer and
> content encodings, and for delivering parts, ....
> Then the usual idea of the MIME body as the
> single-part MIME object, binary, basically is
> for blobs, ..., then as with regards to those prepared
> also "b7-at-rest" for delivering any kind of object,
> here with its routing as a message besides as just
> a usual kind of object-store.
>
>
> https://datatracker.ietf.org/doc/html/rfc2046#section-5
>
>
> The idea here is that it's great that messages, usually,
> can just be considered exactly as they arrive, the
> ingestion having added a Path element, say,
> serialized and stored as they arrived from the wire,
> and retrieved and returned as back to it. Then,
> messages in various structures, eventually have
> parts and entities and messages in them and
> transfer and content encodings that were applied
> and data that is or isn't compressible and will or won't
> by served as textual or as binary, or as reference, in
> getting into the linked-content and "Content-ID",
> the idea that large blobs of data are also aside.
>
> Then, this idea is to store the entities and parts
> and contained messages and blobs, at rest, as
> where their content encoding and transfer encoding,
> make for the repurposable and constant representations
> at-rest, then that when it result either extraction, or,
> retrieval, that the point here is that extraction is
> "inside the envelope", then with the idea that
> message-composition, should have it so that
> largely the server just spews retrievals as
> concatenating the parts at rest, or putting them
> in content and transfer encodings, with regards
> to eventually the transfer encoding, then the compression
> layer as here is pretty usual, then the encryption and
> compression layers on out, the idea being to make
> those modular, factorizable, in terms of message-composition,
> that it gets pretty involved yet then results handling
> any kinds of Internet message content like this at all.
>
>
> Hmm, ..., "quoted-printable".
>
> https://datatracker.ietf.org/doc/html/rfc2049#section-4
>
> "he process of composing a MIME entity can be modeled as being done
> in a number of steps. Note that these steps are roughly similar to
> those steps used in PEM [RFC-1421] ..."
>
> (PEM, "Privacy Enhanced Mail", ....)
>
>
> So, it's being kind of sorted out mostly how to get
> the messages flowing pass-through, as much as possible,
> this still being the BFF, with regards then to extraction,
> and use cases for SFF.
>
>
> About "the three and a half terabytes uncompressed
> the Usenet archive", ....
>
>
>




https://en.wikipedia.org/wiki/Maildir

"Supported mailbox formats are Maildir, mbox, MH, Babyl, and MMDF."
https://docs.python.org/3/library/mailbox.html


Wow, technology's arrived at 3-D C-D's that store
an entire petabit, hundreds of thousands of gigabytes,
on one 3-D C-D.

So big it's like "yeah it's only bits not bytes,
but it's more than a quadrillion bits, on one 3-D C-D".

Not sure if petabits or pebibits, ....

Here the idea is that maildir has /tmp, /new, /cur,
in that just being files apiece with the contents,
that the idea is that BFF has directories apiece,
then that it seems needful to have at least one
file that is the message itself, and perhaps a
compressed edition, then that software that
expects a maildir, could just have symlinks
built for it, then figuring maildir apps could
move symlinks from /new to /cur, while the
BFF just sits at rest.

These days a usual notion of a store is an object-store,
or a volume that is like ext3 or ext4 filesystem, say.

Then, for sort of making it so that BFF is designed
so that other "one message one file" organizations
can sit next to it, basically involves watching the
/new folder, and having the BFF folders have a sort
of ingestion program, ...

bff-drop/
bff-depo/
bff-repo/

figuring that bff-depo is where BFF-aware inputs
deposit their messages, then for moving the MessageId/
folder ensuite into bff-repo, then for the idea that
basically a helper app makes symlinks from the maildir layout
into bff-repo, where one of the files in MessageId/
is the "plain message", and the symlinks build the conventions
of the maildir and this kind of thing.

The idea then is that tools that use maildir, basically
"don't maintain the maildir" in this kind of setup,
and that instead of /tmp -> /new -> ingestion, there's
instead a BFF file-watch on /tmp, that copies it to bff-drop/,
and a file-watch on bff-repo/, that builds a symlink in /new.

(What this may entail for this one message one directory
approach, is to have one message one directory two donefiles,
for a usual sort of touchfile convention to watch, for,
and delete, after the first donefile, indicates readiness.)

Or, the idea would be that procmail, or what drops mail
into maildir, would be configured that its /new is simply
pointed at bff-drop/, while other IMAP and so applications
using maildir, would point at a usual /new and /cur, in maildir,
that is just symlinks that a BFF file-watch on bff-drop,
maintains in the same convention.

Then it's various whether applications using maildir also accept
the files at rest being compressed; here most of the
idea of bff-depo is to deposit and decompose the messages
into the MessageId/ folder, then to move that up, then to
touch the MessageId/id file, which is the touchfile convention:
when it exists, the folder is fully-formed.

The idea here of decomposing the messages is that basically
the usual idea is to just deliver them exactly as they arrive,
but that parts variously would have different modes
of compression, or encryption, to decompose them "to rest",
then to move them altogether to bff-repo, "at rest".
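
As a minimal sketch of that file-watch, collapsing bff-drop/ and
bff-depo/ into one pass for brevity, polling instead of using inotify,
and assuming the plain-message file inside MessageId/ is named "m"
(that name is illustrative; only the "id" touchfile is from the
convention above):

import os, shutil, time

DROP = "bff-drop"            # where BFF-aware inputs deposit MessageId/ folders
REPO = "bff-repo"            # the at-rest store
NEW  = "maildir/new"         # maildir view maintained as symlinks

def ingest_once():
    for name in os.listdir(DROP):
        src = os.path.join(DROP, name)
        # only act on MessageId/ folders whose "id" touchfile says fully-formed
        if not os.path.isdir(src) or not os.path.exists(os.path.join(src, "id")):
            continue
        dst = os.path.join(REPO, name)
        shutil.move(src, dst)
        # point a maildir-style symlink at the plain-message file ("m" is illustrative)
        link = os.path.join(NEW, name)
        if not os.path.lexists(link):
            os.symlink(os.path.abspath(os.path.join(dst, "m")), link)

if __name__ == "__main__":
    while True:
        ingest_once()
        time.sleep(5)

The maildir-using applications then only ever see the symlinks,
while the BFF just sits at rest.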

Ext3 supports about 32K sub-directories per directory. So, where this
setup is "one message one directory", vis-a-vis "one message
one file", while it can work out that there's a sort of
object-store view that's basically flat because MessageId's
are unique, still it calls for a hierarchical directory partitioning,
figuring that a good uniformizing hash-code will balance
those out. Here the idea is to run md5sum, resulting 128 bits,
then just split that into parts and xor them together.

Let's see, each hexadecimal character covers 2^4 values, so four
of them already cover (2^4)^4 = 2^16, more than the 32K = 2^15 limit,
meaning a densely-filled level can only afford a few hex characters
per directory name. An md5sum is 32 hex characters, 4 bits apiece,
so one plan is splitting the md5 sum into 4-many 8-hexchar segments,
putting the MessageId/ folders under those,
figuring messages would be sparse in those buckets, then though
that as they approach about 4 billion, is for figuring out
what is reaching the limits of the file system, about PATH_MAX,
NAME_MAX, according to symlinks, max directories, max files,
filesystem limits, and filesystem access times, these kinds of things.
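
As a sketch of that partitioning, assuming the bucket is just the md5
of the Message-ID split into 8-hexchar segments (the xor-folding
variant mentioned above would be a small change), with the number of
nesting levels a tunable:

import hashlib, os

def bucket_path(message_id, levels=2):
    # md5 of the Message-ID: 128 bits as 32 hex characters
    digest = hashlib.md5(message_id.encode("utf-8")).hexdigest()
    # split into four 8-hexchar segments, use the first `levels` as directories
    segments = [digest[i:i+8] for i in range(0, 32, 8)]
    return os.path.join(*segments[:levels], message_id)

# e.g. bucket_path("<abc123@example.invalid>")
#   -> something like "xxxxxxxx/yyyyyyyy/<abc123@example.invalid>"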

Then, for filesystems though that support it, is basically
for either nesting subdirectories, or having a flat directory
where various modern filesystems or object-stores result
as many sub-directories as until they fill the disk.

The idea is that filesystems and object-stores have their
various guarantees, and limits, here getting into the
"write once read many" and "write once read never"
usual files, then about the entirely various use cases
of the ephemeral data what's derived and discardable,
that BFF always has a complete message in the various
renditions, then to work the extraction and updates,
at any later date.



IETF RFC

NNTP

https://datatracker.ietf.org/wg/nntpext/documents/

3977 https://datatracker.ietf.org/doc/html/rfc3977
8054 https://www.rfc-editor.org/rfc/rfc8054
6048 https://datatracker.ietf.org/doc/html/rfc6048

SMTP

5321 https://datatracker.ietf.org/doc/html/rfc5321
2821 https://www.ietf.org/rfc/rfc2821.txt
2822 https://datatracker.ietf.org/doc/html/rfc2822 <- Internet Message
Format
3030 https://www.ietf.org/rfc/rfc3030.txt

IMAP

3501 https://datatracker.ietf.org/doc/html/rfc3501
2683 https://datatracker.ietf.org/doc/html/rfc2683
4978 https://datatracker.ietf.org/doc/html/rfc4978
3516 https://datatracker.ietf.org/doc/html/rfc3516

POP3

1939 https://www.ietf.org/rfc/rfc1939.txt


Message Encapsulation / PEM

934 https://datatracker.ietf.org/doc/html/rfc934
1421 https://datatracker.ietf.org/doc/html/rfc1421
1422 https://datatracker.ietf.org/doc/html/rfc1422
1423 https://datatracker.ietf.org/doc/html/rfc1423
1424 https://datatracker.ietf.org/doc/html/rfc1424
7468 https://datatracker.ietf.org/doc/html/rfc7468

Language

4646 https://datatracker.ietf.org/doc/html/rfc4646
4647 https://datatracker.ietf.org/doc/html/rfc4647

MIME

2045 https://datatracker.ietf.org/doc/html/rfc2045
2049 https://datatracker.ietf.org/doc/html/rfc2049
2046 https://datatracker.ietf.org/doc/html/rfc2046
2047 https://datatracker.ietf.org/doc/html/rfc2047
4288 https://datatracker.ietf.org/doc/html/rfc4288
4289 https://datatracker.ietf.org/doc/html/rfc4289
1521 https://datatracker.ietf.org/doc/html/rfc1521
1522 https://datatracker.ietf.org/doc/html/rfc1522
2231 https://datatracker.ietf.org/doc/html/rfc2231

BASE64

4648 https://datatracker.ietf.org/doc/html/rfc4648

DEFLATE

1950 https://datatracker.ietf.org/doc/html/rfc1950
1951 https://datatracker.ietf.org/doc/html/rfc1951

HTTP

7231 https://datatracker.ietf.org/doc/html/rfc7231
7230 https://datatracker.ietf.org/doc/html/rfc7230
Ross Finlayson
2024-03-01 03:43:01 UTC
Reply
Permalink
On 02/25/2024 09:25 PM, Ross Finlayson wrote:
> On 02/24/2024 10:09 AM, Ross Finlayson wrote:



So, thinking about the file-system layout of
the backing file format, it seems sure that
the compressed edition is stored, to save space,
while, if it results that enough time is spent decompressing
it, then to also store the uncompressed edition,
to save time.

Then, with regards to storing the head and body
separately, message = head + CRLF + body +- dot-stuffing,
it seems the idea is to have it so that the splitting is
a varia concern, and the dot-stuffing is a varia concern,
among "write-once-read-many", "write-once-read-never",
and "wire format". (Message length in :bytes is generally
considered not including dot-stuffing, which is only
relevant to NNTP and POP3.) There's a perceived requirement
that wire data at rest in files greatly facilitates vector I/O
from disk-controller to DMA to NIC, yet as above in the
discussion, when TLS or SASL get involved, encryption
is for figuring out the "very-weak encryption at rest",
vis-a-vis "the nanny, watchdog, sentinel, and doorman".
The other main idea is "compress the data at rest".
It's an idea that open file handles are a limited resource
and that opening and closing files is slow, yet files are general
purpose and "tractable to tooling".

Then, the wire format including TLS, seems to just leave
space for that in the "wire-ready" files, then load those
into direct memory, which is a limited resource, then
to act upon those buffers as with that, without resizing
them, then to write those fully on out.

SASL

https://datatracker.ietf.org/doc/html/rfc4643


So, it pretty much seems the idea that the default
store should be the compressed message splits,
head and body, then these sorts derived.

head
body
message
body dot-stuffed
message dot-stuffed

compressed, uncompressed
compressed and encryption blocked
uncompressed and encryption blocked

Here it's with the idea that whatever rendition
of the head + body results being made wire data
gets written as a file, then over time, to save space,
whatever isn't the compressed reference just gets deleted.
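
A minimal sketch of that composition, assuming the head and body sit
uncompressed and un-stuffed at rest in the hd and bd files of
MessageId/ (the compressed case would decompress, or concatenate
deflate members, instead), with dot-stuffing applied only on the way
to the wire:

import os

CRLF = b"\r\n"

def dot_stuff(block):
    # double a leading dot on each line, per the RFC 3977 / RFC 5321 transparency rule
    lines = block.split(CRLF)
    return CRLF.join(b"." + ln if ln.startswith(b".") else ln for ln in lines)

def compose_article(msg_dir):
    # message = head + CRLF + body, dot-stuffed, terminated with CRLF "." CRLF
    with open(os.path.join(msg_dir, "hd"), "rb") as f:
        head = f.read()
    with open(os.path.join(msg_dir, "bd"), "rb") as f:
        body = f.read()
    wire = dot_stuff(head + CRLF + body)
    if not wire.endswith(CRLF):
        wire += CRLF
    return wire + b"." + CRLF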

Then still this doesn't really design what to do
for the MIME parts with incompressible data,
and how to avoid trying to recompress them, and
for the external data, about MIME references
to external data or "larger blobs" or just system entities,
it doesn't really say.
Ross Finlayson
2024-03-02 21:44:02 UTC
Reply
Permalink
On 02/20/2024 08:38 PM, Ross Finlayson wrote:
>
>
> Alright then, about the SFF, "summary" file-format,
> "sorted" file-format, "search" file-format, the idea
> here is to figure out normal forms of summary,
> that go with the posts, with the idea that "a post's
> directory is on the order of contained size of the
> size of the post", while, "a post's directory is on
> a constant order of entries", here is for sort of
> summarizing what a post's directory looks like
> in "well-formed BFF", then as with regards to
> things like Intermediate file-formats as mentioned
> above here with the goal of "very-weakly-encrypted
> at rest as constant contents", then here for
> "SFF files, either in the post's-directory or
> on the side, and about how links to them get
> collected to directories in a filesystem structure
> for the conventions of the concatenation of files".
>
> So, here the idea so far is that BFF has a normative
> form for each post, which has a particular opaque
> globally-universal unique identifier, the Message-ID,
> then that the directory looks like MessageId/ then its
> contents were as these files.
>
> id hd bd yd td rd ad dd ud xd
> id, header, body, year-to-date, thread, referenced, authored, dead,
> undead, expired
>
> or just files named
>
> i h b y t r a d u x
>
> which according to the presence of the files and
> their contents, indicate that the presence of the
> MessageId/ directory indicates the presence of
> a well-formed message, contingent not being expired.
>
> ... Where hd bd are the message split into its parts,
> with regards to the composition of messages by
> concatenating those back together with the computed
> message numbers and this kind of thing, with regards to
> the site, and the idea that they're stored at-rest pre-compressed,
> then knowledge of the compression algorithm makes for
> concatenating them in message-composition as compressed.
>
> Then, there are variously already relations of the
> posts, according to groups, then here as above that
> there's perceived required for date, and author.
> I.e. these are files on the order the counts of posts,
> or span in time, or count of authors.
>
> (About threading and relating posts, is the idea of
> matching subjects not-so-much but employing the
> References header, then as with regards to IMAP and
> parity as for IMAP's THREADS extension, ...,
> www.rfc-editor.org/rfc/rfc5256.html , cf SORT and THREAD.
> There's a usual sort of notion that sorted, threaded
> enumeration is either in date order or thread-tree
> traversal order, usually more sensibly date order,
> with regards to breaking out sub-threads, variously.
> "It's all one thread." IMAP: "there is an implicit sort
> criterion of sequence number".)
>
>
> Then, similarly is for defining models for the sort, summary,
> search, SFF, that it sort of (ha) rather begins with sort,
> about the idea that it's sort of expected that there will
> be a date order partition either as symlinks or as an index file,
> or as with regards to that messages date is also stored in
> the yd file, then as with regards to "no file-times can be
> assumed or reliable", with regards to "there's exactly one
> file named YYYY-MM-DD-HH-MM-SS in MessageId/", these
> kinds of things. There's a real goal that it works easy
> with shell built-ins and text-utils, or "command line",
> to work with the files.
>
>
> So, sort pretty well goes with filtering.
> If you're familiar with the context, of, "data tables",
> with a filter-predicate and a sort-predicate,
> they're different things but then go together.
> It's figured that they get front-ended according
> to the quite most usual "column model" of the
> "table model" then "yes/no/maybe" row filtering
> and "multi-sort" row sorting. (In relational algebra, ...,
> or as rather with 'relational algebra with rows and nulls',
> this most usual sort of 'composable filtering' and 'multi-sort').
>
> Then in IMAP, the THREAD command is "a variant of
> SEARCH with threading semantics for the results".
> This is where both posts and emails work off the
> References header, but it looks like in the wild there
> is something like "a vendor does poor-man's subject
> threading for you and stuffs in a X-References",
> this kind of thing, here with regards to that
> instead of concatenation, is that intermediate
> results get sorted and threaded together,
> then those, get interleaved and stably sorted
> together, that being sort of the idea, with regards
> to search results in or among threads.
>
> (Cf www.jwz.org/doc/threading.html as
> via www.rfc-editor.org/rfc/rfc5256.html ,
> with regards to In-Reply-To and References.
> There are some interesting articles there
> about "mailbox summarization".)
>
> About the summary of posts, one way to start
> as for example an interesting article about mailbox
> summarization gets into, is, all the necessary text-encodings
> to result UTF-8, of Unicode, after UCS-2 or UCS-4 or ASCII,
> or CP-1252, in the base of BE or LE BOMs, or anything to
> do with summarizing the character data, of any of the
> headers, or the body of the text, figuring of course
> that everything's delivered as it arrives, as with regards
> to the opacity usually of everything vis-a-vis its inspection.
>
> This could be a normative sort of file that goes in the messageId/
> folder.
>
> cd: character-data, a summary of whatever form of character
> encoding or requirements of unfolding or unquoting or in
> the headers or the body or anywhere involved indicating
> a stamp indicating each of the encodings or character sets.
>
> Then, the idea is that it's a pretty deep inspection to
> figure out how the various attributes, what are their
> encodings, and the body, and the contents, with regards
> to a sort of, "a normalized string indicating the necessary
> character encodings necessary to extract attributes and
> given attributes and the body and given sections", for such
> matters of indicating the needful for things like sort,
> and collation, in internationalization and localization,
> aka i18n and l10n. (Given that the messages are stored
> as they arrived and undisturbed.)
>
> The idea is that "the cd file doesn't exist for messages
> in plain ASCII7, but for anything anywhere else, breaks
> out what results how to get it out". This is where text
> is often in a sort of format like this.
>
> Ascii
> it's keyboard characters
> ISO8859-1/ISO8859-15/CP-1252
> it's Latin1 often though with the Windows guys
> Sideout
> it's Ascii with 0-127 gigglies or upper glyphs
> Wideout
> it's 0-256 with any 256 wide characters in upper Unicode planes
> Unicode
> it's Unicode
>
> Then there are all sorts of encodings, this is according to
> the rules of Messages with regards to header and body
> and content and transfer-encoding and all these sorts
> things, it's Unicode.
>
> Then, another thing to get figured out is lengths,
> the size of contents or counts or lengths, figuring
> that it's a great boon to message-composition to
> allocate exactly what it needs for when, as a sum
> of invariant lengths.
>
> Then the MessageId/ files still has un-used 'l' and 's',
> then though that 'l' looks too close to '1', here it's
> sort of unambiguous.
>
> ld: lengthed, the coded and uncoded lengths of attributes and parts
>
> The idea here is to make it easiest for something like
> "consult the lengths and allocate it raw, concatenate
> the message into it, consult the lengths and allocate
> it uncoded, uncode the message into it".
>
> So, getting into the SFF, is that basically
> "BFF indicates well-formed messages or their expiry",
> "SFF is derived via a common algorithm for all messages",
> and "some SFF lives next to BFF and is also write-once-read-many",
> vis-a-vis that "generally SFF is discardable because it's derivable".
>
>



So, figuring that BFF then is about designed,
basically for storing Internet messages with
regards to MessageId, then about ContentId
and external resources separately, then here
the idea again becomes how to make for
the SFF files, what results, intermediate, tractable,
derivable, discardable, composable data structures,
in files of a format with regards to write-once-read-many,
write-once-read-never, and, "partition it", in terms of
natural partitions like time intervals and categorical attributes.


There are some various great open-source search
engines, here with respect to something like Lucene
or SOLR or ElasticSearch.

The idea is that there are attributes searches,
and full-text searches, those resulting hits,
to documents apiece, or sections of their content,
then backward along their attributes, like
threads and related threads, and authors and
their cliques, while across groups and periods
of time.

There's not much of a notion of "semantic search",
though, it's expected to sort of naturally result,
here as for usually enough least distance, as for
"the terms of matching", and predicates from what
results a filter predicate, here with what I call,
"Yes/No/Maybe".

Now, what is, "yes/no/maybe", one might ask.
Well, it's the query specification, of the world
of results, to filter to the specified results.
The idea is that there's an accepter network
for "Yes" and a rejector network for "No"
and an accepter network for "Maybe" and
then rest are rejected.

The idea is that the search, is a combination
of a bunch of yes/no/maybe terms, or,
sure/no/yes, to indicate what's definitely
included, what's not, and what is, then that
the term, results that it's composable, from
sorting the terms, to result a filter predicate
implementation, that can run anywhere along
the way, from the backend to the frontend,
this way being a, "search query specification".


There are notions like, "*", and single match
and multimatch, about basically columns and
a column model, of documents, that are
basically rows.


The idea of course is to build an arithmetic expression,
that also is exactly a natural expression,
for "matches" and "ranges".

"AP"|Archimedes|Plutonium in first|last

Here, there is a search, for various names, that
it composes this way.

AP first
AP last
Archimedes first
Archimedes last
Plutonium first
Plutonium last

As you can see, these "match terms" just naturally
break out; then what gets into negations
breaks out and doubles, and what gets into ranges,
well, that involves partitions and ranges,
duplicating and breaking those out.

It results though a very fungible and normal form
of a search query specification, that rebuilds the
filter predicate according to sorting those, then
has very well understood runtime according to
yes/no/maybe and the multimatch, across and
among multiple attributes, multiple terms.
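
A minimal sketch of that break-out, where the alternated values and
fields just cross-multiply into the match terms listed above:

from itertools import product

values = ["AP", "Archimedes", "Plutonium"]   # "AP"|Archimedes|Plutonium
fields = ["first", "last"]                   # in first|last

match_terms = [(value, field) for value, field in product(values, fields)]
# -> [('AP', 'first'), ('AP', 'last'), ('Archimedes', 'first'),
#     ('Archimedes', 'last'), ('Plutonium', 'first'), ('Plutonium', 'last')]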


This sort of enriches a usual sort of query
"exact full hit", with this sort "ranges and conditions,
exact full hits".

So, the Yes/No/Maybe is the generic search query
specification, overall, just reflecting an accepter/rejector
network, with a bit on the front to reflect keep/toss,
so that it's very practical and of course totally commonplace
and easily written broken out as find or wildmat specs.

For then these the objects and the terms relating
the things, there's about maintaining this, while
refining it, that basically there's an ownership
and a reference count of the filter objects, so
that various controls according to the syntax of
the normal form of the expression itself, with
most usual English terms like "is" and "in" and
"has" and "between", and "not", with & for "and"
and | for "or", makes that this should be the kind
of filter query specification that one would expect
to be general purpose on all such manners of
filter query specifications and their controls.

So, a normal form for these filter objects, then
gets relating them to the SFF files, because, an
SFF file of a given input corpus, satisfies some
of these specifications, the queries, or for example
doesn't, about making the language and files
first of the query, then the content, then just
mapping those to the content, which are built
off extractors and summarizers.

I already thought about this a lot. It results
that it sort of has its own little theory,
thus what can result its own little normal forms,
for making a fungible SFF description, what
results for any query, going through those,
running the same query or as so filtered down
the query for the partition already, from the
front-end to the back-end and back, a little
noisy protocol, that delivers search results.
Ross Finlayson
2024-03-04 19:23:40 UTC
Reply
Permalink




The document is an element of the corpus; here each
message is a document. Now, there's a convention
in Internet messages, not always followed (the ignorant,
those lacking etiquette, or the just plain different
don't follow it or break it): the convention of attributing,
in Internet messages, the content that's replied to,
and this quoting is variously "block" or "inline".

From the outside though, the document here
has the "overview" attributes, the key-value
pairs of the headers those being, and the
"body" or "document" itself, which can as
well have extracted attributes, vis-a-vis
otherwise its, "full text".

https://en.wikipedia.org/wiki/Search_engine_indexing


The key thing here for partitioning is to
make for date-range partitioning, while the
organization of the messages by ID is
essentially flat: constant-rate to access one
but linear to trawl through them, although parallelizable,
for example with a parallelizable filter predicate
like yes/no/maybe. Before getting into the
inter-document of terms, here the idea is that
there's basically
date partition
group partition

then as with regards to

threads
authors

that these are each having their own linear organization,
or as with respect to time-series partitions, and the serial.

Then, there are two sorts of data structures
to build with:

binary trees,
bit-maps.

So, the idea is to build indexes for date ranges
and then just search separately, either linear
or from an in-memory currency, the current.

I'm not too interested in "rapid results" as
much as "thoroughly parallelizable and
effectively indexed", and "providing
incremental results" and "full hits".

The idea here is to relate date ranges,
to an index file for the groups files,
then to just search the date ranges,
and for example as maybe articles expire,
which here they don't as it's archival,
to relate dropping old partitions with
updating the groups indexes.

For NNTP and IMAP then there's,
OVERVIEW and SEARCH. So, the
key attributes relevant those protocols,
are here to make it so that messages
have an abstraction of an extraction,
those being fixed as what results,
then those being very naively composable,
with regards to building data structures
of those, what with regards to match terms,
evaluate matches in ranges on those.

Now, NNTP is basically write-once-read-many,
though I suppose it's mostly write-once-read-
maybe-a-few-times-then-never, while IMAP
basically adds to the notion of the session,
what's read and un-read, and, otherwise
with regards to flags, IMAP flags. I.e. flags
are variables, all this other stuff being constants.


So, there's an idea to build a sort of, top-down,
or onion-y, layered, match-finder. This is where
it's naively composable to concatenate the
world of terms, in attributes, of documents,
in date ranges and group partitions, to find
"there is a hit" then to dive deeper into it,
figuring the idea is to horizontally scale
by refining date partitions and serial collections,
then parallelize those, where as well that serial
algorithms work the same on those, eg, by
concatenating those and working on that.

This is where a group and a date partition
each have a relatively small range, of overview
attributes, and their values, then that for
noisy values, like timestamps, to detect those
and work out what are small cardinal categories
and large cardinal ergodic identifiers.

It's sort of like, "Why don't you check out the
book Information Retrieval and read that again",
and, in a sense, it's because I figure that Google
has littered all their no-brainer patterns with junk patents
that instead I expect to clean-room and prior-art this.
Maybe that's not so, I just wonder sometimes how
they've arrived at monopolizing what's a totally
usual sort of "fetch it" routine.


So, the goal is to find hits, in conventions of
documents, inside the convention of quoting,
with regards to
bidirectional relations of correspondence, and,
unidirectional relations of nesting, those
being terms for matching, and building matching,
then that the match document, is just copied
and sent to each partition in parallel, each
resulting its hits.

The idea is to show a sort of search plan, over
the partitions, then that there's incremental
progress and expected times displayed, and
incremental results gathered, digging it up.

There's basically for partitions "has-a-hit" and
"hit-count", "hit-list", "hit-stream". That might
sound sort of macabre, but it means search hits
not mob hits, then for the keep/toss and yes/no/maybe,
that partitions are boundaries of sorts, on down
to ideas of "document-level" and "attribute-level"
aspects of, "intromissive and extromissive visibility".


https://lucene.apache.org/core/3_5_0/fileformats.html

https://solr.apache.org/guide/solr/latest/configuration-guide/index-location-format.html

It seems sort of sensible to adapt to Lucene's index file format,
or, it's pretty sensible, then with regards to default attributes
and this kind of thing, and the idea that threads are
documents for searching in threads and finding the
content actually aside from the quotes.

Lucene's index file format isn't a data structure itself,
in terms of a data structure built for b-tree/b-map, where
the idea is to result a file, that's a serialization of a data
structure, within it, the pointer relations as to offsets
in the file, so that, it can be loaded into memory and
run, or that, I/O can seek through it and run, but especially
that, it can be mapped into memory and run.

I.e., "implementing the lookup" as following pointer offsets
in files, vis-a-vis a usual idea that the pointers are just links
in the tree or off the map, is one of these "SFF" files.
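
Here's a toy sketch of that in Java, with an invented two-field
layout (a header offset pointing at a length-prefixed record),
just to show following pointer offsets in a memory-mapped file
rather than object links on the heap:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Toy layout: [offset:long][...][len:int][bytes], record at byte 16.
public final class MappedLookup {
    public static void main(String[] args) throws IOException {
        Path path = Path.of("toy-index.sff");

        // Write the toy file: a header offset, then the record it points at.
        ByteBuffer out = ByteBuffer.allocate(64);
        out.putLong(16L);                       // header: offset of the record
        out.position(16);
        byte[] term = "archimedes".getBytes(StandardCharsets.UTF_8);
        out.putInt(term.length).put(term);
        out.flip();
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            ch.write(out);
        }

        // Map the file, follow the offset, decode the record in place.
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            long offset = map.getLong(0);       // the "pointer" is a file offset
            int len = map.getInt((int) offset);
            byte[] dst = new byte[len];
            map.position((int) offset + 4);
            map.get(dst);
            System.out.println(new String(dst, StandardCharsets.UTF_8));
        }
    }
}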

So, for an "index", it's really sort of only the terms then
that they're inverted from the documents that contain
them, to point back to them.

Then, because there are going to be index files for each
partition, there are terms and there are partitions,
with the idea that the query's broken out by organization,
so that search proceeds only when there's matching partitions,
then into matching terms.

AP 2020-2023

* AP
!afore(2020)
!after(2023)

AP 2019, 2024

* AP
!afore(2019)
!after(2019)

* AP
!afore(2024)
!after(2024)


Here for example the idea is to search the partitions
according to whether they match "natural" date terms, vis-a-vis,
referenced dates, and matching the term in any fields,
then that the range terms result either one query or
two, in the sense of breaking those out and resulting
that then their results get concatenated.
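
A small sketch of that break-out, in Java with invented names:
each (afore, after) pair selects the overlapping date partitions,
and disjoint ranges just become separate passes whose results
concatenate.

import java.util.ArrayList;
import java.util.List;

// Sketch: break range terms out into separate, concatenable sub-queries.
public final class RangeBreakout {
    // A range in the !afore(x) .. !after(y) sense: not before x, not after y.
    public record YearRange(int afore, int after) { }

    public static List<Integer> matchingPartitions(List<Integer> partitionYears,
                                                   List<YearRange> ranges) {
        List<Integer> selected = new ArrayList<>();
        for (YearRange r : ranges) {                  // one pass per sub-query
            for (int year : partitionYears) {
                if (year >= r.afore() && year <= r.after()) {
                    selected.add(year);               // results concatenate
                }
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<Integer> partitions = List.of(2018, 2019, 2020, 2021, 2022, 2023, 2024);
        // "AP 2019, 2024" breaks out into two disjoint ranges:
        List<YearRange> ranges = List.of(new YearRange(2019, 2019),
                                         new YearRange(2024, 2024));
        System.out.println(matchingPartitions(partitions, ranges)); // [2019, 2024]
    }
}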

You can see that "in", here, as "between", for example
in terms of range, is implemented as "not out", for
that this way the Yes/No/Maybe, Sure/No/Yes, runs

match _any_ Sure: yes
match _any_ No: no
match _all_ Yes: yes
otherwise: no

I.e. it's not a "Should/Must/MustNot Boolean" query.

What happens is that this way everything sort
of "or's" together as "any"; then when no's are
introduced, those double about; when between's
are introduced, those are no's; and when disjoint
between's are introduced, those break out otherwise
redundant, but separately partitionable, queries.

AP not subject|body AI

not subject AI
not body AI
AP

Then the filter objects have these attributes:
owner, refcount, sure, not, operand, match term.
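
A minimal sketch of those filter objects and the Sure/No/Yes
evaluation, in Java with illustrative names (owner and refcount
are carried as plain fields here since their management isn't
shown, and treating "no Yes terms at all" as a default reject
is an assumption):

import java.util.List;
import java.util.Map;

public final class FilterTerm {
    public enum Kind { SURE, NO, YES }

    final String owner;      // who holds this filter term (e.g. a session)
    int refcount;            // shared-use counting, managed elsewhere
    final Kind kind;         // sure / no / yes
    final String operand;    // which attribute to match against, e.g. "subject"
    final String matchTerm;  // the substring to look for ("in" semantics)

    public FilterTerm(String owner, Kind kind, String operand, String matchTerm) {
        this.owner = owner;
        this.kind = kind;
        this.operand = operand;
        this.matchTerm = matchTerm;
        this.refcount = 1;
    }

    boolean matches(Map<String, String> doc) {
        String value = doc.get(operand);
        return value != null && value.contains(matchTerm);
    }

    // match _any_ Sure: yes; match _any_ No: no; match _all_ Yes: yes; otherwise: no.
    public static boolean accept(Map<String, String> doc, List<FilterTerm> terms) {
        for (FilterTerm t : terms) {
            if (t.kind == Kind.SURE && t.matches(doc)) return true;
        }
        for (FilterTerm t : terms) {
            if (t.kind == Kind.NO && t.matches(doc)) return false;
        }
        boolean anyYes = false;
        for (FilterTerm t : terms) {
            if (t.kind == Kind.YES) {
                anyYes = true;
                if (!t.matches(doc)) return false;
            }
        }
        return anyYes;
    }
}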

This is a fundamental sort of accepter/rejector that
I wrote up quite a bit on sci.logic, and here a bit.

Then, besides terms, a given file relates the
partitions it covers in terms of dates, so as to
skip those that don't apply, having that inside
the file, vis-a-vis having it alongside the file,
pulling it from a file. Basically a search is to
identify SFF files as they're found going along,
then search through those.

The term frequency / inverse document frequency,
gets into summary statistics of terms in the
documents of the corpus, here about those building up out
of partitions, and summing the summaries
with either concatenation or categorical closures.
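
For instance, a small Java sketch of per-partition term statistics
that compose by summation, with a smoothed IDF as one arbitrary
choice of formula:

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: per-partition document counts and per-term document frequencies
// compose by plain summation, so a corpus-wide IDF falls out after
// concatenating partitions.
public final class PartitionTermStats {
    public long docCount;
    public final Map<String, Long> docFrequency = new HashMap<>();

    public static PartitionTermStats sum(List<PartitionTermStats> parts) {
        PartitionTermStats total = new PartitionTermStats();
        for (PartitionTermStats p : parts) {
            total.docCount += p.docCount;
            p.docFrequency.forEach(
                (term, df) -> total.docFrequency.merge(term, df, Long::sum));
        }
        return total;
    }

    public double idf(String term) {
        long df = docFrequency.getOrDefault(term, 0L);
        return Math.log((double) (docCount + 1) / (df + 1)); // smoothed IDF
    }
}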

So, about the terms, and the content, here it's
plainly text content, and there is a convention,
the quoting convention. This is where a reference
is quoted in part or in full, then the content is
either after-article (the article convention), afore-article
(the email convention) or "amidst-article", inline,
interspersed, or combinations thereof.

afore-article: reference follows
amidst-article: article split
after-article: reference is quoted

The idea in the quoting convention, is that
nothing changes in the quoted content,
which is indicated by the text convention.

This gets into the idea of sorting the hits for
relevance, and origin, about threads, or references,
when terms are introduced into threads, then
to follow those references, returning threads,
that have terms for hits.

The idea is to implement a sort of article-diff,
according to discovering quoting character
conventions, about what would be fragments,
of articles as documents, and documents,
their fragments by quoting, referring to
references, as they introduce terms.

The references thread then as a data structure,
has at least two ways to look at it. The reference
itself is indicated by a directed-acyclic-graph or
tree built as links, it's a primary attribute, then
there's time-series data, then there's matching
of the subject attribute, and even as that search
results are a sort of thread.

In this sense then a thread, is abstractly of threads,
threads have heads, about that hits on articles,
are also hits on their threads, with each article
being head of a thread.
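
A minimal sketch of the links view of that, in Java, taking the
parent to be the last Message-ID of the References header and
assuming the links are acyclic, as References should be:

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: the references thread as parent links from each article's
// Message-ID to the last entry of its References header; walking up
// finds the thread's head. A missing parent just ends the walk.
public final class ThreadLinks {
    private final Map<String, String> parent = new HashMap<>();

    // references: the References header split into Message-IDs, oldest first.
    public void add(String messageId, List<String> references) {
        if (!references.isEmpty()) {
            parent.put(messageId, references.get(references.size() - 1));
        }
    }

    // The head of the thread an article belongs to; each article heads the
    // sub-thread of its own followups, the root heads the thread overall.
    public String headOf(String messageId) {
        String current = messageId;
        while (parent.containsKey(current)) {   // assumes acyclic references
            current = parent.get(current);
        }
        return current;
    }
}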


About common words, basically gets into language.
These are the articles (the definite and indefinite
articles of language), the usual copulas, the usual
prepositions, and all such words of parts-of-speech
that are syntactical and implement referents, and
about how they connect meaningful words, and
into language, in terms of sentences, paragraphs,
fragments, articles, and documents.

The idea is that a long enough article will eventually
contain all the common words. It's much structurally
about language, though, and usual match terms of
Yes/No/Maybe or the match terms of the Boolean,
are here for firstly exact match then secondarily
into "fuzzy" match and about terms that comprise
phrases, that the goal is that SFF makes data that
can be used to relate these things, when abstractly
each document is in a vacuum of all the languages
and is just an octet stream or character stream.

Multi-lingual content, then, basically figures to have
either common words of multiple languages,
and be multi-lingual, or meaningful words from
multiple languages, those then being loanwords.

So, back to NNTP WILDMAT and IMAP SEARCH, ....

https://www.rfc-editor.org/rfc/rfc2980.html#section-3.3
https://datatracker.ietf.org/doc/html/rfc3977#section-4.2

If you've ever spent a lot of time making regexes
and running find to match files, wildmat is sort
of sensible and indeed a lot like Yes/No/Maybe.
Kind of like, sed accepts a list of commands,
and sometimes tr, when find, sed, and tr are the tools.
Anyways, WILDMAT is to be implemented
according to SFF backing it, then a reference algorithm.
The match terms of Yes/No/Maybe, don't really have
wildcards. They match substrings. For example
"equals" is equals and "in" is substring and "~" for
"relates" is by default "in". Then, there's either adding
wildcards, or adding anchors, to those, where the
anchors would be "^" for front and "$" for end.
Basically though WILDMAT is a sequence of (Yes|No) terms,
indicated by Yes terms not starting with '!' and No
terms marked with '!'; then, scanning in reverse order,
i.e., right-to-left, the first matching term decides:
a Yes match is yes, a No match is no, and the default
is no. So, in Yes/No/Maybe,
it's a stack of Yes/No/Maybe's.
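
A reference-style sketch of that evaluation, in Java, handling
only the '*' and '?' wildcards and treating the right-most (last)
matching term as deciding:

import java.util.regex.Pattern;

// Sketch: wildmat as a comma-separated list of patterns, '!' marking
// negated terms; the last matching pattern decides (equivalently, the
// first match scanning right-to-left), and the default is no match.
public final class Wildmat {
    public static boolean matches(String wildmat, String name) {
        boolean result = false;
        for (String term : wildmat.split(",")) {
            boolean negated = term.startsWith("!");
            String pattern = negated ? term.substring(1) : term;
            if (globMatches(pattern, name)) {
                result = !negated;   // last matching term wins
            }
        }
        return result;
    }

    private static boolean globMatches(String glob, String name) {
        StringBuilder regex = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*') regex.append(".*");
            else if (c == '?') regex.append('.');
            else regex.append(Pattern.quote(String.valueOf(c)));
        }
        return Pattern.matches(regex.toString(), name);
    }

    public static void main(String[] args) {
        System.out.println(matches("sci.*,!sci.logic", "sci.math"));  // true
        System.out.println(matches("sci.*,!sci.logic", "sci.logic")); // false
    }
}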

Mostly though NNTP doesn't have SEARCH, though,
so, .... And, wildmat is as much a match term, as
an accepter/rejector, for accepter/rejector algorithms,
that compose as queries.

https://datatracker.ietf.org/doc/html/rfc3501#section-6.4.4

IMAP defines "keys", these being the language of
the query, then as for expressions in those. Then
most of those get into the flags, counters, and
with regards to the user, session, that get into
the general idea that NNTP's session is just a
notion of "current group and current article",
whereas IMAP's user and session have flags and counters
applied to each message.

Search, then, basically is into search and selection,
and accumulating selection, and refining search,
that basically Sure accumulates as the selection
and No/Yes is the search. This gets relevant in
the IMAP extensions of SEARCH for selection,
then with the idea of commands on the selection.



Relevance: gets into "signal, and noise". That is
to say, back-and-forth references that don't
introduce new terms, are noise, and it's the
introduction of terms, and following that
their reference, that's relevance.

For attributes, this basically is for determining
low cardinality and high cardinality attributes,
that low cardinality attributes are categories,
and high cardinality attributes are identifiers.

This gets into "distance", and relation, then to
find close relations in near distances, helping
to find the beginnings and ends of things.


So, I figure BFF is about designed, so to carry
it out, and then get into SFF, that to have in
the middle something MFF metadata file-format
or session and user-wise, and the collection documents
and the query documents, yet, the layout of
the files and partitions, should be planned about
that it will grow, either the number of directories
or files, or the depth thereof, and it should be
partitionable, so that it results being able to add
or drop partitions by moving folders or making
links, about that mailbox is a file and maildir is
a directory and here the idea is "unbounded
retention and performant maintenance".

It involves read/write, instead of write-once-read-many:
growing files, and the critical transactionality of
serializing parallel routines, vis-a-vis the semantics
of atomic move.
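
A minimal sketch of that atomic-move discipline, in Java NIO:
write the grown or updated file to a temporary name in the same
directory (so the rename stays on one filesystem), then move it
into place, so readers only ever see the old version or the
complete new one.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public final class AtomicReplace {
    // Write-then-rename: readers never observe a partially written file.
    public static void replace(Path target, byte[] newContents) throws IOException {
        Path temp = Files.createTempFile(target.getParent(), "sff-", ".tmp");
        Files.write(temp, newContents);
        Files.move(temp, target, StandardCopyOption.ATOMIC_MOVE);
    }
}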

Then, for, "distance", is the distances of relations,
about how to relate things, and how to find
regions, that result a small distance among them,
like words and roots and authors and topics
and these kinds of things, to build summary statistics
that are discrete and composable, then that those
naturally build both summaries as digests and also
histograms, not so much "data mining" as "towers of relation".

So, for a sort of notion of, "network distance",
is that basically there is time-series data and
auto-association of equality.
Ross Finlayson
2024-03-07 16:10:01 UTC
Reply
Permalink
On 03/04/2024 11:23 AM, Ross Finlayson wrote:

Then, it's sort of figured out what is a sort
of BFF that results then a "normal physical
store with atomic file semantics".

The partitioning seems essentially date-ranged,
with regards to then figuring out how to
have the groups and overview files made to
deliver the files.

The SFF seems to make for author->words
and thread->words, author<->thread, and
about making intermediate files that result in
running longer searches in the unbounded,
while also making for the usual sorts of simple
composable queries.


Then, with that making for the data, then
is again to the consideration of the design
of the server runtime, basically about that
there's to be the layers of protocols, such that
the layers indicate the at-rest formats,
i.e. compressed or padded for encryption,
then to make it so that the protocols per
connection mostly get involved with the
"attachment" per connection, which is
basically the private data structure.

This is where the attachment has for
the protocol as much there is of the
session, about what results that
according to the composability of protocols,
in terms of their message composition
and transport in commands, is to result
that the state-machine of the protocol
layering is to result a sort of stack of
protocols in the attachment, here for
that the attachment is a minimal amount
of data associated with a connection,
and would be the same in a sort of
thread-per-connection model, for
a sort of
intra-protocol,
inter-protocol,
infra-protocol,
that the intra-protocol reflects the
command layer, the inter-protocols
reflect message composition and transport,
and the infra-protocol reflects changes
in protocol.
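
A rough sketch of that attachment as a small stack of protocol
layers, in Java, with invented interfaces: the intra-protocol
(commands) sits at the top, the inter-protocol layers (composition
and transport, e.g. compression or TLS framing) below it, and an
infra-protocol event just pushes or pops layers.

import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

// Sketch: the per-connection attachment as a stack of protocol layers.
public final class ConnectionAttachment {
    public interface ProtocolLayer {
        ByteBuffer onRead(ByteBuffer in);    // unwrap inbound bytes for the layer above
        ByteBuffer onWrite(ByteBuffer out);  // wrap outbound bytes for the layer below
    }

    private final Deque<ProtocolLayer> layers = new ArrayDeque<>();

    public void push(ProtocolLayer layer) { layers.push(layer); }   // infra: change protocol
    public void pop()                     { layers.pop(); }

    // Inbound: bottom-most transport layer first, command layer last.
    public ByteBuffer inbound(ByteBuffer wire) {
        ByteBuffer b = wire;
        for (Iterator<ProtocolLayer> it = layers.descendingIterator(); it.hasNext(); ) {
            b = it.next().onRead(b);
        }
        return b;
    }

    // Outbound: command layer first, then each transport layer wraps it.
    public ByteBuffer outbound(ByteBuffer message) {
        ByteBuffer b = message;
        for (ProtocolLayer layer : layers) {
            b = layer.onWrite(b);
        }
        return b;
    }
}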

It's similar then with the connection itself,
intra, inter, infra, with regards to the
semantics of flows, and session, with
regards to single connections and their
flows, and multiple connections and
their session.

Then, the layering of protocol seems
much about one sort of command set,
and various sorts of transport encoding,
as relates to the session, then another
notion of layering of protocol involves
when one protocol is used to fulfill
another protocol directly, figuring
that instead that's "inside" what reflects
usually upstream/downstream, or request/
response, here about IMAP backed by NNTP
and mail2news and this kind of thing.
Ross Finlayson
2024-03-12 17:09:01 UTC
Reply
Permalink
On 03/07/2024 08:10 AM, Ross Finlayson wrote:


Then, with sort of a good idea on the backing store,
figuring that it represents a fungible sort of
representation, gets to that it's wasteful to have
small files, though here the limits are more in
the count of entries than in the overall size.


What I've been wondering about is how to
design the run-time, then. First it seems there's
to be made an abstraction of the I/O, as the
implementation I tapped out so far or a while
back does just fine with non-blocking I/O and
up to thousands of connections, but the idea
is that then adding protocols at the end like
TLS (encryption), SASL (authentication), and
compression, keeps with the whole idea of hi-po
I/O, that largely the data is all at rest, and
that to move it along it's better pass-through
or pass-along than that each layer does its
own message composition, in the message
transport.

So I've been thinking about how to define
the interfaces, that pretty much look exactly
like the I/O model, while dealing with the
messages as they are composing and composed,
the header and body of the message as a
sequence of handles, then with buffers and
whether they need be heap memory or direct
memory, and with the session-of-connection
as about the overall session-of-lifetime and
session-of-history, about the attachment
that is the datum associated with the connection,
and then with regards to the multiplexing
for things like IMAP and HTTP/2, hi-po I/O layers.

That sort of sums it up, that connections
arrive and negotiate and upgrade their
layers in the protocol, then when those
result being fixed, that the ingress parses
messages until complete messages arrive,
execute the resulting commands, then that
the message in composition and transport
is ideally a list of handles that get directly
sent out, and otherwise, with the least sort
of resources, rotate and fill buffers in
the handles, that then result getting sent,
or multiplexed, concluding the request's response.
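Just to pin down the shape of that, here's a minimal sketch, say in Java
since that's the sort of NIO runtime in mind, where the names Attachment,
Handle, Protocol, and Command are only mine for illustration, not any
settled interface:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.GatheringByteChannel;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Queue;

// A parsed command, whatever a complete message amounts to at the command layer.
interface Command { }

// A handle is a readable region, either a filled buffer or a slice of an at-rest file,
// so a response can be passed along without each layer re-composing the bytes.
interface Handle {
    long remaining();
    long transferTo(GatheringByteChannel out) throws IOException;
}

// One protocol layer: parse arriving bytes into commands, compose responses as handles.
interface Protocol {
    Command ingress(ByteBuffer arrived, Attachment att);   // null until a message is complete
    List<Handle> compose(Command command, Attachment att); // header and body as a list of handles
}

// The per-connection attachment: the protocol stack and the minimal private state.
final class Attachment {
    final Deque<Protocol> layers = new ArrayDeque<>();          // intra-/inter-/infra-protocol stack
    final ByteBuffer partial = ByteBuffer.allocateDirect(8192); // bytes of an incomplete message
    final Queue<List<Handle>> egress = new ArrayDeque<>();      // composed responses awaiting write
}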

Handling streaming in the message composition
is its own kind of idea, and then handling
"large" transports is its own kind of case,
where streaming is the idea of setting up
an own sort of segmenter for a connection,
that then has to coordinate with the back-end
or the executor, so it results that when the
back-end results an unbounded stream,
that through some logic or piece-by-piece,
those get "appended" to the message. The
idea of "large" or "jumbo" messages, is about
either large files up or large files down, with
regards to basically spooling those off,
or having "stuffing" to the back-end as a
complement to "streaming" from the back-end,
these kinds of things.

So, the usual abstraction of request/response,
and the usual abstraction of header and body,
and the usual abstraction of composition and transport,
and the usual abstraction of multiplexing mux/demux,
and the usual abstraction of streaming and stuffing,
and the usual abstraction of handles and layers,
in the usual abstraction of connections and resources,
of a usual context of attachments and sessions,
in the usual abstraction of route links and handles,
makes for a usual abstraction of protocol,
for connection-oriented architectures.
Ross Finlayson
2024-03-14 19:41:18 UTC
Reply
Permalink
On 03/12/2024 10:09 AM, Ross Finlayson wrote:

> So, the usual abstraction of request/response,
> and the usual abstraction of header and body,
> and the usual abstraction of composition and transport,
> and the usual abstraction of multiplexing mux/demux,
> and the usual abstraction of streaming and stuffing,
> and the usual abstraction of handles and layers,
> in the usual abstraction of connections and resources,
> of a usual context of attachments and sessions,
> in the usual abstraction of route links and handles,
> makes for a usual abstraction of protocol,
> for connection-oriented architectures.
>
>

Hipoio

"Protocol" and "Negotiation"

The usual sort of framework, for request/response or
message-oriented protocols, often has a serialization
layer, which means from the wire to an object representation,
and from an object to a wire representation.

So, deserializing involves parsing the contents as they arrive
on the wire, and resultingly constructing an object. Then,
serializing is the complementary converse notion, iterating
over the content of the object and emitting it to the wire.

Here the wire is an octet-sequence, for a connection that's
bi-directional there is the request or client wire and response
or server wire, then that usual matters of protocol, are
communicating sequential processes, either taking turns
talking on the wire, "half-duplex", or, multiplexing events
independently, "full-duplex".

So, the message deserialization and message composition,
result in the protocol, as about those get nested, what's
generally called "header and body". So, a command or
request, it's got a header and body, then in some protocols
that's all there is to it, while for example in other protocols,
the command is its own sort of header then its body is the
header and body of a contained message, treating messages
first class, and basically how that results all sorts of notions
of header and body, and the body and payload, these are the
usual kinds of ideas and words, that apply to pretty much all
these kinds of things, and, it's usually simplified as much as
possible, so that frameworks implement all this and then
people implementing a single function don't need to know
anything about it at all, instead just in terms of objects.

Protocol usually also involves the stateful, or session,
anything that's static or "more global" with respect to
the scope, the state, the content, the completions,
the protocol, the session, the state.

The idea then I've been getting into is a sort of framework,
which more or less supports the protocol in its terms, and,
the wire in its terms, and, the resources in their terms, where
here, "the resources" usually refers to one of two things,
the "logical resource" that is a business object or has an identifier,
and the "physical" or "computational resource" which is of
the resources that fulfill transfer or changes of the state of
the "logical resources". So, usually when I say "resources"
I mean capacity and when I say "objects" it means what's
often called "business objects" or the stateful representations
of identified logical values and their lifecycle of being, objects.


So, one of the things that happens in the frameworks,
is the unbounded, and what happens when messages
or payloads get large, in terms of the serial action that
reads or writes them off the wire, into an object, about
that it fills all the "ephemeral" resources, vis-a-vis
the "durable" resources, where the goal is to pass the
"streaming" of these, by coordinating the (de)serialization
and (de)composition, what makes it like so.

start ... end

start ... first ... following ... end
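A small sketch of that framing convention, assuming the chunk kinds are
just an enum and the limit is whatever counts as "under available
ephemeral resources" (the Segmenter name here is only for illustration):

import java.util.ArrayList;
import java.util.List;

// The least convention of streaming: a small message is START..END,
// a large one is START, FIRST, FOLLOWING..., END.
enum Chunk { START, FIRST, FOLLOWING, END }

final class Segmenter {
    private final int ephemeralLimit; // what still fits "under available ephemeral resources"

    Segmenter(int ephemeralLimit) { this.ephemeralLimit = ephemeralLimit; }

    // Plan the chunk kinds for a payload of the given length (batch-to-streaming).
    List<Chunk> plan(long payloadLength) {
        List<Chunk> plan = new ArrayList<>();
        plan.add(Chunk.START);
        if (payloadLength > ephemeralLimit) {
            long pieces = (payloadLength + ephemeralLimit - 1) / ephemeralLimit; // ceiling
            plan.add(Chunk.FIRST);
            for (long i = 1; i < pieces; i++) plan.add(Chunk.FOLLOWING);
        }
        plan.add(Chunk.END);
        return plan;
    }
}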

Then another usual notion besides "streaming", a large
item broken into smaller, is "batching", small items
gathered into larger.


So what I'm figuring for the framework and the protocols
and the negotiation, is what results a first-class sort of
abstraction of serialization and composition as together,
in terms of composing the payload and serializing the message,
of the message's header and body, that the payload is the message.

This might be familiar in packets, as, nested packets,
and, collected packets, with regards to that in the model
of the Ethernet network, packets are finite and small,
and that a convention of sockets, for example, establishes
a connection-oriented protocol, for example, that then
either the packets have external organization of their
reassembly, or internal organization of their reassembly,
their sequencing, their serialization.


Of course the entire usual idea of encapsulation is to
keep these things ignorant of each other, as it results
making a coupling of the things, and things that are
coupled must be de-coupled and re-coupled, as sequential
must be serialized and deserialized or even scattered and
gathered, about then the idea of the least sort of
"protocol or streaming" or "convention of streaming",
that the parsing picks up start/first/following/end,
vis-a-vis that when it fits in start/end, then that's
"under available ephemeral resources", and that when
the message as it starts getting parsed gets large,
then makes for "over available ephemeral resources",
that it's to be coordinated with its receiver or handler,
whether there's enough context, to go from batch-to-streaming
or streaming-to-batch, or to spool it off in what results
anything other than an ephemeral resource, so it doesn't
block the messages that do fit, "under ephemeral resources".


So, it gets into the whole idea of the difference between
"request/response" of a command invocation in a protocol,
and, "commence/complete", of an own sort of protocol,
within otherwise the wire protocol, of the receives and
handlers, either round-tripping or one-way in the half-duplex
or full-duplex, with mux/demux both sides of request/response
and commence/complete.


This then becomes a matter relevant to protocol usually,
how to define, that within the protocol command + payload,
within the protocol header + body, with a stream-of-sequences
being a batch-of-bytes, and vice-versa, that for the conventions
and protocols of the utilization and disposition of resources,
computational and business, results defining how to implement
streaming and batching as conventions inside protocols,
according to the inner and outer bodies and payloads.


The big deal with that is implementing that in the (de)serializers,
the (de)composers, then about that a complete operation can
exit as of start -> success/fail, while commence might start but
it can fail while then it's underway, vis-a-vis that it's "well-formed".

So, what this introduces, is a sort of notion, of, "well-formedness",
which is pretty usual, "well-formed", "valid", these being the things,
then "well-flowing", "viable", or "versed" or these automatic sorts
of notions of batching and streaming, with regards to all-or-none and
goodrows/badrows.


Thusly, getting into the framework and the protocols, and the
layers and granular and smooth or discrete and indiscrete,
I've been studying request/response and the stateful in session
and streaming and batching and the computational and business
for a long time, basically that any protocol has a wire protocol,
and a logical protocol above that, then that streaming or batching,
is either "in the protocol" or "beneath the protocol", (or, "over the
protocol", of course the most usual notion of event streams and their
batches), is that here the idea is to fill out according to message
composition, what then can result "under the protocol", a simplest
definition of (de)serialization and (de)composition,
for the well-formedness and well-flowingness the valid and versed,
that for half-duplex and full-duplex protocols or the (de)multiplexer,
makes it so possible to have a most usual means to declare
under strong types, "implement streaming", in otherwise
a very simple framework, that has a most usual adapter,
the receiver or handler, when the work is "within available
ephemeral resources", and falls back to the valid/versed
when not, all through the same layers and multiplexers,
for pretty much any usual sort of connection-oriented protocol.


Hi-Po I/O
Ross Finlayson
2024-03-23 04:30:45 UTC
Reply
Permalink
On 03/02/2024 01:44 PM, Ross Finlayson wrote:
> On 02/20/2024 08:38 PM, Ross Finlayson wrote:
>>
>>
>> Alright then, about the SFF, "summary" file-format,
>> "sorted" file-format, "search" file-format, the idea
>> here is to figure out normal forms of summary,
>> that go with the posts, with the idea that "a post's
>> directory is on the order of contained size of the
>> size of the post", while, "a post's directory is on
>> a constant order of entries", here is for sort of
>> summarizing what a post's directory looks like
>> in "well-formed BFF", then as with regards to
>> things like Intermediate file-formats as mentioned
>> above here with the goal of "very-weakly-encrypted
>> at rest as constant contents", then here for
>> "SFF files, either in the post's-directory or
>> on the side, and about how links to them get
>> collected to directories in a filesystem structure
>> for the conventions of the concatenation of files".
>>
>> So, here the idea so far is that BFF has a normative
>> form for each post, which has a particular opaque
>> globally-universal unique identifier, the Message-ID,
>> then that the directory looks like MessageId/ then its
>> contents were as these files.
>>
>> id hd bd yd td rd ad dd ud xd
>> id, header, body, year-to-date, thread, referenced, authored, dead,
>> undead, expired
>>
>> or just files named
>>
>> i h b y t r a d u x
>>
>> which according to the presence of the files and
>> their contents, indicate that the presence of the
>> MessageId/ directory indicates the presence of
>> a well-formed message, contingent not being expired.
>>
>> ... Where hd bd are the message split into its parts,
>> with regards to the composition of messages by
>> concatenating those back together with the computed
>> message numbers and this kind of thing, with regards to
>> the site, and the idea that they're stored at-rest pre-compressed,
>> then knowledge of the compression algorithm makes for
>> concatenating them in message-composition as compressed.
>>
>> Then, there are variously already relations of the
>> posts, according to groups, then here as above that
>> there's perceived required for date, and author.
>> I.e. these are files on the order the counts of posts,
>> or span in time, or count of authors.
>>
>> (About threading and relating posts, is the idea of
>> matching subjects not-so-much but employing the
>> References header, then as with regards to IMAP and
>> parity as for IMAP's THREADS extension, ...,
>> www.rfc-editor.org/rfc/rfc5256.html , cf SORT and THREAD.
>> There's a usual sort of notion that sorted, threaded
>> enumeration is either in date order or thread-tree
>> traversal order, usually more sensibly date order,
>> with regards to breaking out sub-threads, variously.
>> "It's all one thread." IMAP: "there is an implicit sort
>> criterion of sequence number".)
>>
>>
>> Then, similarly is for defining models for the sort, summary,
>> search, SFF, that it sort of (ha) rather begins with sort,
>> about the idea that it's sort of expected that there will
>> be a date order partition either as symlinks or as an index file,
>> or as with regards to that messages date is also stored in
>> the yd file, then as with regards to "no file-times can be
>> assumed or reliable", with regards to "there's exactly one
>> file named YYYY-MM-DD-HH-MM-SS in MessageId/", these
>> kinds of things. There's a real goal that it works easy
>> with shell built-ins and text-utils, or "command line",
>> to work with the files.
>>
>>
>> So, sort pretty well goes with filtering.
>> If you're familiar with the context, of, "data tables",
>> with a filter-predicate and a sort-predicate,
>> they're different things but then go together.
>> It's figured that they get front-ended according
>> to the quite most usual "column model" of the
>> "table model" then "yes/no/maybe" row filtering
>> and "multi-sort" row sorting. (In relational algebra, ...,
>> or as rather with 'relational algebra with rows and nulls',
>> this most usual sort of 'composable filtering' and 'multi-sort').
>>
>> Then in IMAP, the THREAD command is "a variant of
>> SEARCH with threading semantics for the results".
>> This is where both posts and emails work off the
>> References header, but it looks like in the wild there
>> is something like "a vendor does poor-man's subject
>> threading for you and stuffs in a X-References",
>> this kind of thing, here with regards to that
>> instead of concatenation, is that intermediate
>> results get sorted and threaded together,
>> then those, get interleaved and stably sorted
>> together, that being sort of the idea, with regards
>> to search results in or among threads.
>>
>> (Cf www.jwz.org/doc/threading.html as
>> via www.rfc-editor.org/rfc/rfc5256.html ,
>> with regards to In-Reply-To and References.
>> There are some interesting articles there
>> about "mailbox summarization".)
>>
>> About the summary of posts, one way to start
>> as for example an interesting article about mailbox
>> summarization gets into, is, all the necessary text-encodings
>> to result UTF-8, of Unicode, after UCS-2 or UCS-4 or ASCII,
>> or CP-1252, in the base of BE or LE BOMs, or anything to
>> do with summarizing the character data, of any of the
>> headers, or the body of the text, figuring of course
>> that everything's delivered as it arrives, as with regards
>> to the opacity usually of everything vis-a-vis its inspection.
>>
>> This could be a normative sort of file that goes in the messageId/
>> folder.
>>
>> cd: character-data, a summary of whatever form of character
>> encoding or requirements of unfolding or unquoting or in
>> the headers or the body or anywhere involved indicating
>> a stamp indicating each of the encodings or character sets.
>>
>> Then, the idea is that it's a pretty deep inspection to
>> figure out how the various attributes, what are their
>> encodings, and the body, and the contents, with regards
>> to a sort of, "a normalized string indicating the necessary
>> character encodings necessary to extract attributes and
>> given attributes and the body and given sections", for such
>> matters of indicating the needful for things like sort,
>> and collation, in internationalization and localization,
>> aka i18n and l10n. (Given that the messages are stored
>> as they arrived and undisturbed.)
>>
>> The idea is that "the cd file doesn't exist for messages
>> in plain ASCII7, but for anything anywhere else, breaks
>> out what results how to get it out". This is where text
>> is often in a sort of format like this.
>>
>> Ascii
>> it's keyboard characters
>> ISO8859-1/ISO8859-15/CP-1252
>> it's Latin1 often though with the Windows guys
>> Sideout
>> it's Ascii with 0-127 gigglies or upper glyphs
>> Wideout
>> it's 0-256 with any 256 wide characters in upper Unicode planes
>> Unicode
>> it's Unicode
>>
>> Then there are all sorts of encodings, this is according to
>> the rules of Messages with regards to header and body
>> and content and transfer-encoding and all these sorts
>> things, it's Unicode.
>>
>> Then, another thing to get figured out is lengths,
>> the size of contents or counts or lengths, figuring
>> that it's a great boon to message-composition to
>> allocate exactly what it needs for when, as a sum
>> of invariant lengths.
>>
>> Then the MessageId/ files still has un-used 'l' and 's',
>> then though that 'l' looks too close to '1', here it's
>> sort of unambiguous.
>>
>> ld: lengthed, the coded and uncoded lengths of attributes and parts
>>
>> The idea here is to make it easiest for something like
>> "consult the lengths and allocate it raw, concatenate
>> the message into it, consult the lengths and allocate
>> it uncoded, uncode the message into it".
>>
>> So, getting into the SFF, is that basically
>> "BFF indicates well-formed messages or their expiry",
>> "SFF is derived via a common algorithm for all messages",
>> and "some SFF lives next to BFF and is also write-once-read-many",
>> vis-a-vis that "generally SFF is discardable because it's derivable".
>>
>>
>
>
>
> So, figuring that BFF then is about designed,
> basically for storing Internet messages with
> regards to MessageId, then about ContentId
> and external resources separately, then here
> the idea again becomes how to make for
> the SFF files, what results, intermediate, tractable,
> derivable, discardable, composable data structures,
> in files of a format with regards to write-once-read-many,
> write-once-read-never, and, "partition it", in terms of
> natural partitions like time intervals and categorical attributes.
>
>
> There are some various great open-source search
> engines, here with respect to something like Lucene
> or SOLR or ElasticSearch.
>
> The idea is that there are attributes searches,
> and full-text searches, those resulting hits,
> to documents apiece, or sections of their content,
> then backward along their attributes, like
> threads and related threads, and authors and
> their cliques, while across groups and periods
> of time.
>
> There's not much of a notion of "semantic search",
> though, it's expected to sort of naturally result,
> here as for usually enough least distance, as for
> "the terms of matching", and predicates from what
> results a filter predicate, here with what I call,
> "Yes/No/Maybe".
>
> Now, what is, "yes/no/maybe", one might ask.
> Well, it's the query specification, of the world
> of results, to filter to the specified results.
> The idea is that there's an accepter network
> for "Yes" and a rejector network for "No"
> and an accepter network for "Maybe" and
> then rest are rejected.
>
> The idea is that the search, is a combination
> of a bunch of yes/no/maybe terms, or,
> sure/no/yes, to indicate what's definitely
> included, what's not, and what is, then that
> the term, results that it's composable, from
> sorting the terms, to result a filter predicate
> implementation, that can run anywhere along
> the way, from the backend to the frontend,
> this way being a, "search query specification".
>
>
> There are notions like, "*", and single match
> and multimatch, about basically columns and
> a column model, of documents, that are
> basically rows.
>
>
> The idea of course is to built an arithmetic expression,
> that also is exactly a natural expression,
> for "matches", and "ranges".
>
> "AP"|Archimedes|Plutonium in first|last
>
> Here, there is a search, for various names, that
> it composes this way.
>
> AP first
> AP last
> Archimedes first
> Archimedes last
> Plutonium first
> Plutonium last
>
> As you can see, these "match terms", just naturally
> break out, then that what's gets into negations,
> break out and double, and what gets into ranges,
> then, well that involves for partitions and ranges,
> duplicating and breaking that out.
>
> It results though a very fungible and normal form
> of a search query specification, that rebuilds the
> filter predicate according to sorting those, then
> has very well understood runtime according to
> yes/no/maybe and the multimatch, across and
> among multiple attributes, multiple terms.
>
>
> This sort of enriches a usual sort of query
> "exact full hit", with this sort "ranges and conditions,
> exact full hits".
>
> So, the Yes/No/Maybe, is the generic search query
> specification, overall, just reflecting an accepter/rejector
> network, with a bit on the front to reflect keep/toss,
> that's it's very practical and of course totally commonplace
> and easily written broken out as find or wildmat specs.
>
> For then these the objects and the terms relating
> the things, there's about maintaining this, while
> refining it, that basically there's an ownership
> and a reference count of the filter objects, so
> that various controls according to the syntax of
> the normal form of the expression itself, with
> most usual English terms like "is" and "in" and
> "has" and "between", and "not", with & for "and"
> and | for "or", makes that this should be the kind
> of filter query specification that one would expect
> to be general purpose on all such manners of
> filter query specifications and their controls.
>
> So, a normal form for these filter objects, then
> gets relating them to the SFF files, because, an
> SFF file of a given input corpus, satisifies some
> of these specifications, the queries, or for example
> doesn't, about making the language and files
> first of the query, then the content, then just
> mapping those to the content, which are built
> off extractors and summarizers.
>
> I already thought about this a lot. It results
> that it sort of has its own little theory,
> thus what can result its own little normal forms,
> for making a fungible SFF description, what
> results for any query, going through those,
> running the same query or as so filtered down
> the query for the partition already, from the
> front-end to the back-end and back, a little
> noisy protocol, that delivers search results.
>
>



Wondering about how to implement SFF or summary
and search, the idea seems "well you just use Lucene
like everybody else", and it's like, well, I sort of have
this idea about a query language already, and there's
that I might or might not have the use case of cluster
computing a whole Internet, and pretty much figure
that it's just some partitions and then there's not much
call for usually having massive-memory on-line clusters,
vis-a-vis, low or no traffic, then for the usual idea
that the implementation should auto-scale, be
elastic as it were, and that it should even fall back
to just looking through files or naive search, vis-a-vis
indices. The idea of partitions is that they indicate
the beginning through the end of changes to data,
that archive partitions can have enduring search indices,
while active partitions have growing search indices.


So, the main idea is that searches make matches make
hits, then the idea that there's a partitions concordance,
then with regards to the index of a document its terms,
then with regards to the most usual sorts of the fungible
forms the inverse document frequency setup, in the middle.

https://en.wikipedia.org/wiki/Concordance


What this gets into then is "growing file / compacting file".
The idea is that occurrences accumulate in the growing
file, forward, and (linear) searches of the growing file
are backward, though what it entails, is that the entries
get accumulated, then compacting is to deduplicate those,
or just pick off the last, then put that into binary tree
or lexicographic, or about the associations of the terms.

"The quick brown fox jumped over the lazy dog."

This is a usual example sentence, "The quick brown
fox jumped over the lazy dog", vis-a-vis, "Lorem ipsum".

https://en.wikipedia.org/wiki/Lorem_ipsum

Ah, it's, "the quick brown fox jumps over the lazy dog",
specifically as a, "pangram", a sentence containing each
letter of the alphabet.

https://en.wikipedia.org/wiki/The_quick_brown_fox_jumps_over_the_lazy_dog

So, the idea is basically to write lines, appending those,
that basically there's a serial appender, then that search
on the active partition, searches backward so it can find
the last-most full line, which the appender can also do,
with regards to a corresponding "reverse line reader",
with regards to a line-index file, fixed-length offsets
to each line, with regards to memory-mapping the
file, and forward and reverse iterators.
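A minimal sketch of that, assuming the line-index file is just fixed-length
8-byte offsets alongside the growing file (the LineLog name and the file
layout here are only illustrative):

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Growing file: lines are appended forward; the index file records a fixed-length
// (8-byte) offset per line, so the last full lines can be read backward.
final class LineLog implements AutoCloseable {
    private final RandomAccessFile data;
    private final RandomAccessFile index;

    LineLog(String dataPath, String indexPath) throws IOException {
        this.data = new RandomAccessFile(dataPath, "rw");
        this.index = new RandomAccessFile(indexPath, "rw");
    }

    // Serial appender: write the line at the end, then its offset at the end of the index.
    synchronized void append(String line) throws IOException {
        long offset = data.length();
        data.seek(offset);
        data.write((line + "\n").getBytes(StandardCharsets.UTF_8));
        index.seek(index.length());
        index.writeLong(offset);
    }

    // Reverse line reader: the i-th line from the end, 0 being the last full line.
    String lineFromEnd(long i) throws IOException {
        long count = index.length() / Long.BYTES;
        if (i < 0 || i >= count) return null;
        index.seek((count - 1 - i) * Long.BYTES);
        long start = index.readLong();
        long end = (i == 0) ? data.length() : readOffset(count - i);
        byte[] bytes = new byte[(int) (end - start)];
        data.seek(start);
        data.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8).stripTrailing();
    }

    private long readOffset(long n) throws IOException {
        index.seek(n * Long.BYTES);
        return index.readLong();
    }

    public void close() throws IOException { data.close(); index.close(); }
}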

document 1 See Spot Run
document 2 See Spot Run

See: 1
Spot: 1
Run: 1
See: 1,2
Spot: 1,2
Run: 1,2

That for individual terms, blows up very quickly. Yet,
the idea is that most terms are in archive partitions,
where then those would be stored in a format
basically with lexicographic or phone-book sorting,
seems for something like, "anagram phonebook",

ees: see 1,2
nru: run 1,2
post: spot 1,2

vis-a-vis "plain phone-book",

run: 1,2
see: 1,2
spot: 1,2

the idea that to look up a word, to look up its letters,
or for example its distinct letters,

es: see 1,2
nru: run 1,2
post: spot 1,2

with regards to a pretty agnostic setting of words, by letters.
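For instance, a little sketch of that keying, where the key is just the
word's letters sorted, or its distinct letters sorted; note that sorting
the letters of "spot" gives "opst" rather than "post" as written above,
though either works so long as it's applied consistently:

import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

final class AnagramPhonebook {
    // Key a word by its letters in sorted order, e.g. "spot" -> "opst", "see" -> "ees".
    static String lettersKey(String word) {
        char[] letters = word.toLowerCase().toCharArray();
        Arrays.sort(letters);
        return new String(letters);
    }

    // Or by its distinct letters in sorted order, e.g. "see" -> "es".
    static String distinctLettersKey(String word) {
        TreeSet<Character> set = new TreeSet<>();
        for (char c : word.toLowerCase().toCharArray()) set.add(c);
        StringBuilder key = new StringBuilder();
        for (char c : set) key.append(c);
        return key.toString();
    }

    public static void main(String[] args) {
        Map<String, String> phonebook = new TreeMap<>(); // lexicographic by letters-key
        for (String word : new String[] { "see", "run", "spot" })
            phonebook.put(lettersKey(word), word + " 1,2");
        phonebook.forEach((k, v) -> System.out.println(k + ": " + v));
        // ees: see 1,2
        // nru: run 1,2
        // opst: spot 1,2
    }
}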

Getting into etymology and stemming, and roots and
the whole shebang of parts-of-speech and synonymity,
would seem to get involved, vis-a-vis symbols and terms,
that in terms of letters like ideograms, results that ideograms
work out about the same, as with regards to contents of
single- and multiple-letter, or glyph, words, and these
kinds of things, and for example emojis and the range.

Then another idea that gets involved for close matches
and these kinds of things, is a distance between the minimal
letters, though with regards to hits and misses.

e
es: see 1,2
n
nr
nru: run 1,2
p
po
pos
post: spot 1,2


e 12
es 2
n 345
nr 45
nru 5
p 6789
po 789
pos 89
post 9

https://en.wikipedia.org/wiki/Nonparametric_statistics
https://en.wikipedia.org/wiki/Summary_statistics

The idea for statistics is to help result when it's
possible for "found the hits", vis-a-vis, "not found
the hits", then also as that search queries and search
results also, become "growing files / compacting files"
in the "active partition / archive partition", of search
results, then with regards to "matching queries /
matching hits", with regards to duplicated queries,
and usual and ordinary queries having compiled hits
for their partitions. (Active query hits for each
partition.) This gets into MRU, LRU, this kind of
thing, usual notions of cache affinity and coherency.

https://en.wikipedia.org/wiki/Frecency

Now that's a new one, I never heard of "frecency" before,
but the idea of combining MRU and MFU, most-recently
and most-frequently, makes a lot of sense.

Then this idea for search queries, is to break it down,
or to have a default sort of plan, what results then
the terms search in the sub-query, get composable,
vis-a-vis, building the results.

https://en.wikipedia.org/wiki/Indexed_file
https://en.wikipedia.org/wiki/Inverted_index


The idea for binary tree, seems to find the
beginning and end of ranges, then search
the linear part inside that with two or
alternated iterators, that "exact-match
is worst-case", or middle of the range,
yet it works out that most aren't that bad.

I.e., average case.
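A sketch of that, assuming a sorted array of unique terms stands in for
the tree: binary-search the two ends of the range, then scan linearly
between them (the helper names are only mine):

import java.util.Arrays;
import java.util.function.Predicate;

final class RangeScan {
    // First index whose value is >= low (terms assumed unique here).
    static int lowerBound(String[] sorted, String low) {
        int i = Arrays.binarySearch(sorted, low);
        return (i >= 0) ? i : -i - 1; // insertion point when not found
    }

    // First index whose value is > high (exclusive upper bound).
    static int upperBound(String[] sorted, String high) {
        int i = Arrays.binarySearch(sorted, high);
        if (i >= 0) { while (i < sorted.length && sorted[i].equals(high)) i++; return i; }
        return -i - 1;
    }

    // Binary search finds the ends of the range; the part inside is scanned linearly.
    static void scanRange(String[] sorted, String low, String high, Predicate<String> hit) {
        for (int i = lowerBound(sorted, low), end = upperBound(sorted, high); i < end; i++) {
            if (hit.test(sorted[i])) System.out.println("hit: " + sorted[i]);
        }
    }

    public static void main(String[] args) {
        String[] terms = { "run", "see", "spot", "spotted", "spun" };
        scanRange(terms, "see", "spot", t -> true); // prints see, spot
    }
}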

https://en.wikipedia.org/wiki/Bag-of-words_model

So, this seems sort of a bag-of-letters model,
about things like common letters and words,
and usual means of reducing words to unambiguous
representations removing "redundant" letters,
about rdndnt lttrs though litters. I.e. it would
be dictionariological, dictionarial, with here that
being secondary, and after stemming and etymology.

https://en.wikipedia.org/wiki/Shorthand
https://en.wikipedia.org/wiki/Stemming


(As far as stemming goes, I'm still trying to
figure out plurals, or plural forms.)

https://en.wikipedia.org/wiki/Z39.50

Huh, haven't heard of Z39.50 in a while.

So, it's like, "well this isn't the usual idea of
making Lucene-compatible input files and
making a big old data structure in memory
and a bit of a multi-cast topology and scaling
by exploding" and it isn't, this is much more
of a "modestly accommodate indices to implement
search with growing and compacting files
and natural partitions with what results
sort of being readable and self-describing".


The query format is this idea of "Sure/No/Yes"
which makes for that the match terms,
and the Boolean, or conjunctive and disjunctive,
of course has a sort of natural language
representation into what queries may be,
then about the goals of results of surveying
the corpus for matching the query.

So, part of surveying the corpus, is hits,
direct deep hits to matches. The other,
is prompts, that given a query term that
matches many, to then refine those.
Then the idea is to select of among those
putting the result into "Sure", then refine
the query, that the query language, supports
a sort of query session, then to result bulk
actions on the selections.

The query language then, is about as simple
and associative as it can be, for example,
by example, then with regards to that there
are attribute-limited searches, or as with
respect to "columns", about rows and columns,
and then usually with regards to the front-end
doing selection and filtering, and sorting,
and the back-end doing this sort of accumulation
of the query session in terms of the refinements
or iterations of the query, to what should result
the idea that then the query is decomposable,
to reflect that then over the partitions over
the SFF files, as it were, the summary and search
data, and then into the documents themselves,
or as with regards to the concordance the
sections, making for a model of query as
both search and selection, and filtering and sorting,
front-end and back-end, that it's pretty usual
in all sorts of "data table" and "search and browse"
type use-cases, or applications.

Archimedes Plutonium

Name Plutonium?
Subject Plutonium?
Body Plutonium?

The usual idea with prompts is to fill the suggestion
bar with question marks, then to use space
to toggle into those, but that gets involved
with "smart search" and "smart bar" implementations.

Name is Archimedes or Plutonium
Subject has Archimedes or Plutonium
Body has Archimedes or Plutonium

bob not carol joan mark

bob joan mark
not carol

bob
not carol joan mark

bob -carol joan mark

Name is Bob, Role is Job

Archimedes Plutonium

* Archimedes * Plutonium

* *

*

See, the idea is that each term is "column*, term*",
then that those are "or" inside, and "and" outside.

Name bob carol joan mark Role job

Then the various ideas of "or" as combining and
"and" as "excluding outside the or", make and
keep things simple, then also as that when
there are ambiguities, then ambiguities can
be presented as alternatives, then those picked out.

cell|desk 206|415 address local <- different columns, "and", implicit
phone is local, or, address local <- different columns, "or", explicit

The idea is that for a corpus, there are only so
many column names, all else being values,
or term-match-predicate inputs.
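Here's a minimal sketch of one reading of that composition, where a term
is a set of columns and a set of values, "or" inside the term and "and"
across terms, and Yes/No/Maybe front the keep/toss; the precedence of the
rejector over the accepters here is my assumption:

import java.util.List;
import java.util.Map;
import java.util.Set;

// One term: any of the named columns matching any of the values ("or" inside the term).
record Term(Set<String> columns, Set<String> values) {
    boolean matches(Map<String, String> row) {
        for (String column : columns) {
            String cell = row.get(column);
            if (cell != null && values.contains(cell)) return true;
        }
        return false;
    }
}

// The query specification: terms combine with "and" outside; Yes accepts,
// No rejects, Maybe accepts, and the rest are rejected.
record Query(List<Term> yes, List<Term> no, List<Term> maybe) {
    boolean accepts(Map<String, String> row) {
        if (!no.isEmpty() && no.stream().allMatch(t -> t.matches(row))) return false;
        if (!yes.isEmpty() && yes.stream().allMatch(t -> t.matches(row))) return true;
        if (!maybe.isEmpty() && maybe.stream().allMatch(t -> t.matches(row))) return true;
        return false;
    }
}

So "Name bob carol joan mark Role job" becomes two terms, one over Name
with four values and one over Role with one, both in the same accepter list.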

2010- Archimedes Plutonium

It's figured that "between" gets involved in
ranges, either time ranges or lexicographic/alphabetic
ranges, that it's implemented this "not less than"
and "not greater" than, that the _expression_,
get parsed down to these simpler sorts
match terms, so that then those all combine
then for the single and multiple column cases,
with multiplicity in disjoint ranges, this is sort
of how it is when I designed this and implemented
much of a smart search bit for all the usual use-cases.

"Yes No Maybe", ..., with reference-counted search
control owners in a combined selection, search,
and filtering model, for the front-end and back-end,
both the same data structure, "query session",
then mostly about usual match terms and operators.

It's so sensible that it should be pretty much standard,
basically as follows being defined by a column model.
I.e., it's tabular data.


"Prompts" then is figuring out prompts and tops,
column-tops in a column model, then as with
regards to "Excerpts", is that in this particular
use case, messages almost always include both
references in their threads, and, excerpts in
the replies, to associate the excerpts with their
sources, that being as well a sort of matching,
though that it's helped by the convention,
the so-many-deep so-many-back block-quoting
convention, which though is subject to
not following the convention.

Here then this is for one of the BFF files, if
you might recall or it's here in this thread,
about that block-quoting is a convention,
vis-a-vis the usual top-posting and bottom-posting
and the usual full-excerpt or partial-excerpt
and the usual convention and the destroyed,
that the search hit goes to the source, only
falling back to the excerpt, when the source
doesn't exist, or that it sticks out as "broken"
the 'misquoted out of context', bit.

Yet, the BFF is mostly agnostic and that means
ignorant of anything but "message contents,
one item". So how the BFF and SFF are co-located,
gets into these things, where there's sort of
1-SFF, that's derivative one message, 2-SFF,
that's pairwise two messages, then as with
regards to n-SFF, is about the relations of
those, with regards to N-SFF the world of those,
then though P-SFF particularly, the partition
of those, and the pair-wise relations which
explode, and the partition summaries which enclose.


These kinds of things, ....
Ross Finlayson
2024-03-27 01:04:26 UTC
Reply
Permalink
arithmetic hash searches

take a hashcode, split it up

invert each arithmetically, find intersection in 64 bits

fill in those

detect misses when the bits don't intersect the search

when all hits, then "refine", next double range,

compose those naturally by union

when definite misses excluded then go find matching partition

arithmetic partition hash

So, the idea is, that, each message ID, has applied a uniform
hash, then that it fills a range, of so many bits.

Then, its hash is split into smaller chunks the same 1/2/3/4
of the paths, then those are considered a fixed-point fraction,
of the bits set of the word width, plus one.

Then, sort of pyramidally, is that in increasing words, or doubling,
is that a bunch of those together, mark those words,
uniformly in the range.

For example 0b00001111, would mark 0b00001000, then
0b0000000010000000, and so on, for detecting whether
the hash code's integer value, is in the range 15/16 - 16/16.

The idea is that the ranges this way compose with binary OR,
then that a given integer, then that the integer, can be
detected to be out of the range, if its bit is zero, and then
otherwise that it may or may not be in the range.

0b00001111 number N1
0b00001000 range R1
0b00000111 number N2
0b00000100 range R2

0b00001100 union range UR = R1 | R2 | ....


missing(N) {
return (UR & N == 0);
}


This sort of helps where, in a usual hash map, determining
that an item doesn't exist, is worst case, while the usual
finding the item that exists is log 2, then that usually its value
is associated with that, besides.
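A concrete little sketch of those ranges and the union, following the
worked example above where a number marks its highest set bit:

final class ArithmeticHashRange {
    // One reading of the scheme above: a number marks its highest set bit,
    // e.g. 0b00001111 marks 0b00001000 and 0b00000111 marks 0b00000100.
    static long rangeBit(long hashChunk) {
        return Long.highestOneBit(hashChunk);
    }

    // Ranges compose with binary OR into a union range for the whole partition.
    static long union(long[] hashChunks) {
        long ur = 0L;
        for (long chunk : hashChunks) ur |= rangeBit(chunk);
        return ur;
    }

    // Definite miss when the candidate's bit doesn't intersect the union range;
    // otherwise it may or may not be in the partition (a possible dig-up).
    static boolean missing(long hashChunk, long unionRange) {
        return (unionRange & rangeBit(hashChunk)) == 0L;
    }

    public static void main(String[] args) {
        long ur = union(new long[] { 0b00001111L, 0b00000111L }); // UR = 0b00001100
        System.out.println(Long.toBinaryString(ur));               // 1100
        System.out.println(missing(0b00000010L, ur));              // true: definite miss
        System.out.println(missing(0b00001010L, ur));              // false: possible hit
    }
}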

Then, when there are lots of partitions, and they're about
uniform, it's expected the message ID to be found in only
one of the partitions, is that the partitions can be organized
according to their axes of partitions, composing the ranges
together, then that search walks down those, until it's either
a definite miss, or an ambiguous hit, then to search among
those.

It seems then for each partition (group x date), then those
can be composed together (group x month, group x year,
groups x year, all), so that looking to find the group x date
where a message ID is, results that it's a constant-time
operation to check each of those, and the data structure
is not very large, with regards to computing the integers'
offset in each larger range, either giving up when it's
an unambiguous miss or fully searching when it's an
ambiguous hit.

This is where, the binary-tree that searches in log 2 n,
worst-case, where it's balanced and uniform, though
it's not to be excluded that a usual hashmap implementation
is linear in hash collisions, is for excluding partitions,
in about constant time and space given that it's just a
function of the number of partitions and the eventual
size of the pyramidal range, that instead of having a
binary tree with space n^2, the front of it has size L r
for L the levels of the partition pyramid and r the size
of the range stamp.

Then, searching in the partitions, seems it essentially
results, that there's an ordering of the message IDs,
so there's the "message IDs" file, either fixed-length-records
or with an index file with fixed-length-records or otherwise
for reading out the groups' messages, then another one
with the message ID's sorted, figuring there's a natural
enough binary search of those with value identity, or bsearch
after qsort, as it were.
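That part is about as plain as it sounds; a minimal sketch, assuming the
sorted Message-ID list is just held in memory as an array (on disk it's
the fixed-length-record index file next to the Message-ID file, searched
the same way):

import java.util.Arrays;

final class MessageIdIndex {
    private final String[] sortedIds;

    // "bsearch after qsort": sort the group's Message-IDs once, then search by value.
    MessageIdIndex(String[] messageIds) {
        this.sortedIds = messageIds.clone();
        Arrays.sort(this.sortedIds);
    }

    boolean contains(String messageId) {
        return Arrays.binarySearch(sortedIds, messageId) >= 0;
    }
}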

So, the idea is that there's a big grid of group X date archives,
each one of those a zip file, with being sort of contrived the
zip files, so that each entry is self-contained, and it sort of
results that concatenating them results another. So
anyways, the idea then is for each of those, for each of
their message IDs, to compute its four integers, W_i,
then allocate a range, and zero it, then saturate each
bit, in each range for each integer. So, that's like, say,
for fitting the range into 4K, for each partition, with
there being 2^8 of those in a megabyte, or that many
partitions (512), or about a megabyte in space for each
partition, but really where these are just variables,
because it's opportunistic, and the ranges can start
with just 32 or 64 bits figuring that most partitions
are sparse, also, in this case, though usually it would
be expected they are half-full.

There are as many of these ranges as the hash is split
into numbers, is the idea.

Then the idea is that these ranges are pyramidal in the
sense, that when doing lookup for the ID, is starting
from the top of the pyramid, projecting the hash number
into the range bit string, with one bit for each sub-range,
so it's branchless, and'ing the number bits and the partition
range together, and if any of the hash splits isn't in the
range, a branch, dropping the partition pyramid, else,
descending into the partition pyramid.

(Code without branches can go a lot faster than
code with lots of branches, if/then.)

At each level of the pyramid, it's figured that only one
of the partitions will not be excluded, except for hash
collisions, then if it's a base level to commence bsearch,
else to drop the other partition pyramids, and continue
with the reduced set of ranges in RAM, and the projected
bits of the ID's hash integer.

The ranges don't even really have to be constant if it's
so that there's a limit so they're under a constant, then
according to uniformity they only have so many, eg,
just projecting out their 1's, so the partition pyramid
digging sort of always finds one or more partitions
with possible matches, those being hash collisions or
messages duplicated across groups, and mostly finds
those with exclusions, so that it results reducing, for
example that empty groups are dropped right off
though not being skipped, while full groups then
get into needing more than constant space and
constant time to search.
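A sketch of that descent, assuming each node of the pyramid carries one
union-range stamp per hash split and its children below, with the leaf
dig-up left abstract (the PyramidNode name is just for illustration):

import java.util.ArrayList;
import java.util.List;

// One node of the partition pyramid: a union-range stamp per hash split,
// children for the next level down, and at the base the partition itself.
final class PyramidNode {
    final long[] unionRanges;            // one stamp per hash split (the 1/2/3/4 chunks)
    final List<PyramidNode> children = new ArrayList<>();
    final String partitionName;          // e.g. a group x date leaf; null above the leaves

    PyramidNode(long[] unionRanges, String partitionName) {
        this.unionRanges = unionRanges;
        this.partitionName = partitionName;
    }

    // True when every split's bit intersects this node's stamps (a possible hit).
    boolean possible(long[] hashSplitBits) {
        for (int i = 0; i < unionRanges.length; i++)
            if ((unionRanges[i] & hashSplitBits[i]) == 0L) return false; // definite miss, drop it
        return true;
    }

    // Walk down: drop excluded partitions, collect the leaf partitions left to dig up.
    static List<String> lookup(PyramidNode root, long[] hashSplitBits) {
        List<String> digUps = new ArrayList<>();
        descend(root, hashSplitBits, digUps);
        return digUps;
    }

    private static void descend(PyramidNode node, long[] hashSplitBits, List<String> digUps) {
        if (!node.possible(hashSplitBits)) return;          // whole subtree excluded
        if (node.children.isEmpty()) {                      // base level: commence bsearch there
            digUps.add(node.partitionName);
            return;
        }
        for (PyramidNode child : node.children) descend(child, hashSplitBits, digUps);
    }
}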

Of course if all the partitions miss then it's
also a fast exit that none have the ID.

So, this, "partition pyramid hash filter", with basically,
"constant and configurable space and time", basically
has that because Message Id's will only exist in one or
a few partitions, and for a single group and not across
about all groups, exactly one, and the hash is uniform, so
that hash collisions are low, and the partitions aren't
overfilled, so that hash collisions are low, then it sort
of results all the un-used partitions at rest, don't fill
up in n^2 space the log 2 n hash-map search. Then,
they could, if there was spare space, and it made sense
that in the write-once-read-many world it was somehow
many instead of never, a usual case, or, just using a
list of sorted message Id's in the partition and bsearch,
this can map the file without loading its contents in
space, except as ephemerally, or the usual disk controller's
mmap space, or "ready-time" and "ephemeral-space".

In this sort of way there's no resident RAM for the partitions
except each one with a fixed-size arithmetic hash stamp,
while lookups have a fixed or constant cost, plus then
also a much smaller usual log 2 time / n^2 space trade-off,
while memory-mapping active files automatically caches.


So, the idea is to combine the BFF backing file format
and LFF library file format ideas, with that the group x date
partitions make for the archive and active partitions,
then to have constant-time/constant-space partition
pyramid arithmetic hash range for lookup, then
ready-time/ephemeral-space lookup in partitions,
then that the maintenance of the pyramid tree,
happens with dropping partitions, while just
accumulating with adding partitions.

Yeah, I know that a usual idea is just to make a hash map
after an associative array with log 2 n lookup in n^2 space,
that maintenance is in adding and removing items,
here the idea is to have partitions above items,
and sort of naturally to result "on startup, find
the current partitions, compose their partition pyramid,
then run usually constant-time/constant-space in that
then ready-time/ephemeral-space under that,
maintenance free", then that as active partitions
being written roll over to archive partitions being
finished, then they just get added to the pyramid
and their ranges or'ed up into the pyramid.
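The startup composition and the roll-over are then just OR-ing stamps up
the levels, something like this, with a stamp as a plain array of words,
one entry per hash split:

import java.util.List;

final class PyramidRollOver {
    // A node's stamp is one word per hash split; a parent's stamp is just
    // the binary OR of its children's stamps.
    static long[] composeParent(List<long[]> childStamps, int splits) {
        long[] parent = new long[splits];
        for (long[] child : childStamps)
            for (int i = 0; i < splits; i++) parent[i] |= child[i];
        return parent;
    }

    // Roll-over: when an active partition finishes and becomes archive, its stamp
    // is OR'd up into each ancestor, group x date through group x month,
    // group x year, and the root.
    static void rollOver(long[] finishedStamp, List<long[]> ancestorStamps) {
        for (long[] ancestor : ancestorStamps)
            for (int i = 0; i < ancestor.length; i++) ancestor[i] |= finishedStamp[i];
    }
}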

Hmm... 32K or 2^15 groups, 16K or 2^14 days, or
about 40 years of Usenet in partitions, 2^29,
about 2^8 per megabyte or about 2^20 or one
gigabyte RAM, or, just a file, then memory-mapping
the partition pyramid file, figuring again that
most partitions are not resident in RAM,
this seems a sort of good simple idea to
implement lookup by Message ID over 2^30 many.

I mean if "text Usenet for all time is about a billion messages",
it seems around that size.
Ross Finlayson
2024-03-28 04:05:44 UTC
Reply
Permalink
On 03/26/2024 06:04 PM, Ross Finlayson wrote:
> arithmetic hash searches
>
> take a hashcode, split it up
>
> invert each arithmetically, find intersection in 64 bits
>
> fill in those
>
> detect misses when the bits don't intersect the search
>
> when all hits, then "refine", next double range,
>
> compose those naturally by union
>
> when definite misses excluded then go find matching partition
>
> arithmetic partition hash
>
> So, the idea is, that, each message ID, has applied a uniform
> hash, then that it fills a range, of so many bits.
>
> Then, its hash is split into smaller chunks the same 1/2/3/4
> of the paths, then those are considered a fixed-point fraction,
> of the bits set of the word width, plus one.
>
> Then, sort of pyramidally, is that in increasing words, or doubling,
> is that a bunch of those together, mark those words,
> uniformly in the range.
>
> For example 0b00001111, would mark 0b00001000, then
> 0b0000000010000000, and so on, for detecting whether
> the hash code's integer value, is in the range 15/16 - 16/16.
>
> The idea is that the ranges this way compose with binary OR,
> then that a given integer, then that the integer, can be
> detected to be out of the range, if its bit is zero, and then
> otherwise that it may or may not be in the range.
>
> 0b00001111 number N1
> 0b00001000 range R1
> 0b00000111 number N2
> 0b00000100 range R2
>
> 0b00001100 union range UR = R1 | R2 | ....
>
>
> missing(N) {
> return (UR & RN == 0);
> }
>
>
> This sort of helps where, in a usual hash map, determining
> that an item doesn't exist, is worst case, while the usual
> finding the item that exists is log 2, then that usually its value
> is associated with that, besides.
>
> Then, when there are lots of partitions, and they're about
> uniform, it's expected the message ID to be found in only
> one of the partitions, is that the partitions can be organized
> according to their axes of partitions, composing the ranges
> together, then that search walks down those, until it's either
> a definite miss, or an ambiguous hit, then to search among
> those.
>
> It seems then for each partition (group x date), then those
> can be composed together (group x month, group x year,
> groups x year, all), so that looking to find the group x date
> where a message ID is, results that it's a constant-time
> operation to check each of those, and the data structure
> is not very large, with regards to computing the integers'
> offset in each larger range, either giving up when it's
> an unambiguous miss or fully searching when it's an
> ambiguous hit.
>
> This is where, the binary-tree that searches in log 2 n,
> worst-case, where it's balanced and uniform, though
> it's not to be excluded that a usual hashmap implementation
> is linear in hash collisions, is for excluding partitions,
> in about constant time and space given that it's just a
> function of the number of partitions and the eventual
> size of the pyramidal range, that instead of having a
> binary tree with space n^2, the front of it has size L r
> for L the levels of the partition pyramid and r the size
> of the range stamp.
>
> Then, searching in the partitions, seems it essentially
> results, that there's an ordering of the message IDs,
> so there's the "message IDs" file, either fixed-length-records
> or with an index file with fixed-length-records or otherwise
> for reading out the groups' messages, then another one
> with the message ID's sorted, figuring there's a natural
> enough binary search of those with value identity, or bsearch
> after qsort, as it were.
>
> So, the idea is that there's a big grid of group X date archives,
> each one of those a zip file, with being sort of contrived the
> zip files, so that each entry is self-contained, and it sort of
> results that concatenating them results another. So
> anyways, the idea then is for each of those, for each of
> their message IDs, to compute its four integers, W_i,
> then allocate a range, and zero it, then saturate each
> bit, in each range for each integer. So, that's like, say,
> for fitting the range into 4K, for each partition, with
> there being 2^8 of those in a megabyte, or that many
> partitions (512), or about a megabyte in space for each
> partition, but really where these are just variables,
> because it's opportunistic, and the ranges can start
> with just 32 or 64 bits figuring that most partitions
> are sparse, also, in this case, though usually it would
> be expected they are half-full.
>
> There are as many of these ranges as the hash is split
> into numbers, is the idea.
>
> Then the idea is that these ranges are pyramidal in the
> sense, that when doing lookup for the ID, is starting
> from the top of the pyramid, projecting the hash number
> into the range bit string, with one bit for each sub-range,
> so it's branchless, and'ing the number bits and the partition
> range together, and if any of the hash splits isn't in the
> range, a branch, dropping the partition pyramid, else,
> descending into the partition pyramid.
>
> (Code without branches can go a lot faster than
> code with lots of branches, if/then.)
>
> At each level of the pyramid, it's figured that only one
> of the partitions will not be excluded, except for hash
> collisions, then if it's a base level to commence bsearch,
> else to drop the other partition pyramids, and continue
> with the reduced set of ranges in RAM, and the projected
> bits of the ID's hash integer.
>
> The ranges don't even really have to be constant if it's
> so that there's a limit so they're under a constant, then
> according to uniformity they only have so many, eg,
> just projecting out their 1's, so the partition pyramid
> digging sort of always finds one or more partitions
> with possible matches, those being hash collisions or
> messages duplicated across groups, and mostly finds
> those with exclusions, so that it results reducing, for
> example that empty groups are dropped right off
> though not being skipped, while full groups then
> get into needing more than constant space and
> constant time to search.
>
> Of course if all the partitions miss then it's
> also a fast exit that none have the ID.
>
> So, this, "partition pyramid hash filter", with basically,
> "constant and configurable space and time", basically
> has that because Message Id's will only exist in one or
> a few partitions, and for a single group and not across
> about all groups, exactly one, and the hash is uniform, so
> that hash collisions are low, and the partitions aren't
> overfilled, so that hash collisions are low, then it sort
> of results all the un-used partitions at rest, don't fill
> up in n^2 space the log 2 n hash-map search. Then,
> they could, if there was spare space, and it made sense
> that in the write-once-read-many world it was somehow
> many instead of never, a usual case, or, just using a
> list of sorted message Id's in the partition and bsearch,
> this can map the file without loading its contents in
> space, except as ephemerally, or the usual disk controller's
> mmap space, or "ready-time" and "ephemeral-space".
>
> In this sort of way there's no resident RAM for the partitions
> except each one with a fixed-size arithmetic hash stamp,
> while lookups have a fixed or constant cost, plus then
> also a much smaller usual log 2 time / n^2 space trade-off,
> while memory-mapping active files automatically caches.
>
>
> So, the idea is to combine the BFF backing file format
> and LFF library file format ideas, with that the group x date
> partitions make the for archive and active partitions,
> then to have constant-time/constant-space partition
> pyramid arithmetic hash range for lookup, then
> ready-time/ephemeral-space lookup in partitions,
> then that the maintenance of the pyramid tree,
> happens with dropping partitions, while just
> accumulating with adding partitions.
>
> Yeah, I know that a usual idea is just to make a hash map
> after an associative array with log 2 n lookup in n^2 space,
> that maintenance is in adding and removing items,
> here the idea is to have partitions above items,
> and sort of naturally to result "on startup, find
> the current partitions, compose their partition pyramid,
> then run usually constant-time/constant-space in that
> then ready-time/ephemeral-space under that,
> maintenance free", then that as active partitions
> being written roll over to archive partitions being
> finished, then they just get added to the pyramid
> and their ranges or'ed up into the pyramid.
>
> Hmm... 32K or 2^15 groups, 16K or 2^14 days, or
> about 40 years of Usenet in partitions, 2^29,
> about 2^8 per megabyte or about 2^20 or one
> gigabyte RAM, or, just a file, then memory-mapping
> the partition pyramid file, figuring again that
> most partitions are not resident in RAM,
> this seems a sort of good simple idea to
> implement lookup by Message ID over 2^30 many.
>
> I mean if "text Usenet for all time is about a billion messages",
> it seems around that size.
>
>



So, trying to figure out if this "arithmetic hash range
pyramidal partition" data structure is actually sort of
reasonable, gets into that it involves finding a balance
in what's otherwise a very well-understood trade-off,
in terms of the cost of a lookup, over time, and then
especially as whether an algorithm is "scale-able",
that even a slightly lesser algorithm might be better
if it results "scale-able", especially if it breaks down
to a very, very minimal set of resources, in time,
and in various organizations of space, or distance,
which everybody knows as CPU, RAM, and DISK,
in terms of time, those of lookups per second,
and particularly where parallelizable as with
regards to both linear speed-up and also immutable
data structures, or, clustering. ("Scale.")


Then it's probably so that the ranges are pretty small,
because they double, and whether it's best just to
have an overall single range, or, refinements of it,
according to a "factor", a "factor" that represents
how likely it is that hashes don't collide in the range,
or that they do.

This is a different way of looking at hash collisions,
besides that two objects have the same hash,
just that they're in the same partition of the range
of their integer value, for fixed-length uniform hashes.

I.e., a hash collision proper would always be a
redundant or order-dependent dig-up, of a sort,
where the idea is that the lookup first results
searching the pyramid plan for possibles, then
digging up each of those and checking for match.

The idea that group x date sort of has that those
are about on the same order is a thing, then about
the idea that "category" and "year" are similarly
about so,

Big8 x year
group x date

it's very contrived to have those be on the same
order, in terms of otherwise partitioning, or about
what it results that "partitions are organized so that
their partitions are tuples and the tuples are about
on the same order, so it goes, thus that uniformity
of hashes, results being equi-distributed in those,
so that it results the factor is good and that arithmetic
hash ranges filter out most of the partitions, and,
especially that there aren't many false-positive dig-up
partitions".

It's sort of contrived, but then it does sort of make
it so that also other search concerns like "only these
groups or only these years anyways", naturally get
dropped out at the partition layer, and, right in the
front of the lookup algorithm.

It's pretty much expected though that there would
be non-zero false-positive dig-ups, where here a dig-up
is that the arithmetic hash range matched, but it's
actually a different Message ID's hash in the range,
and not the lookup value(s).

Right, so just re-capping here a bit, the idea is that
there are groups, and dates, and for each is a zip file,
which is a collection of files in a file-system entry file
with about random access on the zip file each entry,
and compressed, and the entries include Messages,
by their Message ID's, then that the entries are
maybe in sub-directories, that reflect components
of the Message ID's hash, where a hash, is a fixed-length
value, like 64 bytes or 128 bytes, or a power of two
and usually an even power of two thus a multiple of four,
thus that a 64 byte hash has 2^(64*8) many possible
values, then that a range, of length R bits, has R many
partitions, in terms of the hash size and the range size,
whether the factor is low enough, that most partitions
will naturally be absent most ranges, because hashes
can only be computed from Message ID's, not by their
partitions or other information like the group or date.

So, if there are 2^30 or a billion messages, then a
32 bit hash, would have a fair expectation that
unused values would be not dense, then for
what gets into "birthday problem" or otherwise
how "Dirichlet principle" makes for how often
are hash collisions, for how often are range collisions,
either making redundant dig-ups, in the way this
sort of algorithm services look-ups.

The 32 bits is quite a bit less than 64 * 8, though,
about whether it would also result, that, splitting
that into subdirectories, results different organizations
here about "tuned to Usenet-scale and organization",
vis-a-vis, "everybody's email" or something like that.
That said, it shouldn't just fall apart if the size or
count blows up, though it might be expect then
a various sorts of partitioning, to keep the partition
tuple orders square, or on the same orders.


The md5 is widely available, "md5sum", it's 128 bits,
its output is hexadecimal characters, 32-many.

https://en.wikipedia.org/wiki/MD5
https://en.wikipedia.org/wiki/Partition_(database)
https://en.wikipedia.org/wiki/Hash_function#Uniformity

Otherwise the only goal of the hash is to be uniform,
and also to have "avalanche criterion", so that near Message-Id's
will still be expected to have different hashes, as it's not
necessarily expected that they're the same group and
date, though that would be a thing, yet Message ID's
should be considered opaque and not seated together.

Then MD5 is about the most usual hash utility lying
around, if not SHA-1 or SHA-256. Hmm..., in the
interests of digital preservation there's "the tools for
any algorithms should also be around forever",
one of those things.

So anyways, then each group x date has its Message ID's,
each of those has its hash, each of those fits in a range,
indicating one bit in the range where it is, then those are
OR'd together to result a bit-mask of the range, then
that a lookup can check its hash's bit against the range,
and dig-up the partition if it's in, or, skip the partition
if it's not, with the idea that the range is big enough
and the resulting group x date is small enough, that
the "pyramidal partition", is mostly sparse, at the lower
levels, that it's mostly "look-arounds" until finally the
"dig-ups", in the leaf nodes of the pyramidal partitions.

I.e., the dig-ups will eventually include spurious or
redundant false-positives, so that the algorithm will
sometimes access leaf partitions, at about uniform random,
that turn out not to hold the value.

The "pyramidal" then also get into both the empties,
like rec.calm with zero posts ten years running,
or alt.spew which any given day exceeds zip files
or results a lot of "zip format, but the variously
packaged, not-recompressed binaries", the various
other use cases than mostly at-rest and never-read
archival purposes. The idea of the "arithmetic hash
range pyramidal partition" is that mostly the
leaf partitions are quite small and sparse, and
mostly the leveling of the pyramid into year/month/date
and big8/middle/group, as it were, winnows those
down in what's a constant-rate constant-space scan
on the immutable data structure of the partition pyramid.

Yeah, I know, "numbers", here though the idea is
that about 30K groups at around 18K days = 50 years
makes about 30 * 18 * million or less than a billion
zip files, which would all fit on a volume
that supports up to four billion-many files, or an
object-store, then with regards to that most of
those would be quite small or even empty,
then with regards to "building the pyramid",
the levels big8/middle/group X year/month/date,
the data structure of the hashes marking the ranges,
then those themselves resulting a file, which are
basically the entire contents of allocated RAM,
or for that matter a memory-mapped file, with
the idea that everything else is ephemeral RAM.
Ross Finlayson
2024-04-14 15:36:01 UTC
Reply
Permalink
On 03/27/2024 09:05 PM, Ross Finlayson wrote:
> On 03/26/2024 06:04 PM, Ross Finlayson wrote:
>> arithmetic hash searches
>>
>> take a hashcode, split it up
>>
>> invert each arithmetically, find intersection in 64 bits
>>
>> fill in those
>>
>> detect misses when the bits don't intersect the search
>>
>> when all hits, then "refine", next double range,
>>
>> compose those naturally by union
>>
>> when definite misses excluded then go find matching partition
>>
>> arithmetic partition hash
>>
>> So, the idea is, that, each message ID, has applied a uniform
>> hash, then that it fills a range, of so many bits.
>>
>> Then, its hash is split into smaller chunks the same 1/2/3/4
>> of the paths, then those are considered a fixed-point fraction,
>> of the bits set of the word width, plus one.
>>
>> Then, sort of pyramidally, is that in increasing words, or doubling,
>> is that a bunch of those together, mark those words,
>> uniformly in the range.
>>
>> For example 0b00001111, would mark 0b00001000, then
>> 0b0000000010000000, and so on, for detecting whether
>> the hash code's integer value, is in the range 15/16 - 16/16.
>>
>> The idea is that the ranges this way compose with binary OR,
>> then that a given integer, then that the integer, can be
>> detected to be out of the range, if its bit is zero, and then
>> otherwise that it may or may not be in the range.
>>
>> 0b00001111 number N1
>> 0b00001000 range R1
>> 0b00000111 number N2
>> 0b00000100 range R2
>>
>> 0b00001100 union range UR = R1 | R2 | ....
>>
>>
>> missing(N) {
>> return (UR & RN == 0);
>> }
>>
>>
>> This sort of helps where, in a usual hash map, determining
>> that an item doesn't exist, is worst case, while the usual
>> finding the item that exists is log 2, then that usually its value
>> is associated with that, besides.
>>
>> Then, when there are lots of partitions, and they're about
>> uniform, it's expected the message ID to be found in only
>> one of the partitions, is that the partitions can be organized
>> according to their axes of partitions, composing the ranges
>> together, then that search walks down those, until it's either
>> a definite miss, or an ambiguous hit, then to search among
>> those.
>>
>> It seems then for each partition (group x date), then those
>> can be composed together (group x month, group x year,
>> groups x year, all), so that looking to find the group x date
>> where a message ID is, results that it's a constant-time
>> operation to check each of those, and the data structure
>> is not very large, with regards to computing the integers'
>> offset in each larger range, either giving up when it's
>> an unambiguous miss or fully searching when it's an
>> ambiguous hit.
>>
>> This is where, the binary-tree that searches in log 2 n,
>> worst-case, where it's balanced and uniform, though
>> it's not to be excluded that a usual hashmap implementation
>> is linear in hash collisions, is for excluding partitions,
>> in about constant time and space given that it's just a
>> function of the number of partitions and the eventual
>> size of the pyramidal range, that instead of having a
>> binary tree with space n^2, the front of it has size L r
>> for L the levels of the partition pyramid and r the size
>> of the range stamp.
>>
>> Then, searching in the partitions, seems it essentially
>> results, that there's an ordering of the message IDs,
>> so there's the "message IDs" file, either fixed-length-records
>> or with an index file with fixed-length-records or otherwise
>> for reading out the groups' messages, then another one
>> with the message ID's sorted, figuring there's a natural
>> enough binary search of those with value identity, or bsearch
>> after qsort, as it were.
>>
>> So, the idea is that there's a big grid of group X date archives,
>> each one of those a zip file, with being sort of contrived the
>> zip files, so that each entry is self-contained, and it sort of
>> results that concatenating them results another. So
>> anyways, the idea then is for each of those, for each of
>> their message IDs, to compute its four integers, W_i,
>> then allocate a range, and zero it, then saturate each
>> bit, in each range for each integer. So, that's like, say,
>> for fitting the range into 4K, for each partition, with
>> there being 2^8 of those in a megabyte, or that many
>> partitions (512), or about a megabyte in space for each
>> partition, but really where these are just variables,
>> because it's opportunistic, and the ranges can start
>> with just 32 or 64 bits figuring that most partitions
>> are sparse, also, in this case, though usually it would
>> be expected they are half-full.
>>
>> There are as many of these ranges as the hash is split
>> into numbers, is the idea.
>>
>> Then the idea is that these ranges are pyramidal in the
>> sense, that when doing lookup for the ID, is starting
>> from the top of the pyramid, projecting the hash number
>> into the range bit string, with one bit for each sub-range,
>> so it's branchless, and'ing the number bits and the partition
>> range together, and if any of the hash splits isn't in the
>> range, a branch, dropping the partition pyramid, else,
>> descending into the partition pyramid.
>>
>> (Code without branches can go a lot faster than
>> code with lots of branches, if/then.)
>>
>> At each level of the pyramid, it's figured that only one
>> of the partitions will not be excluded, except for hash
>> collisions, then if it's a base level to commence bsearch,
>> else to drop the other partition pyramids, and continue
>> with the reduced set of ranges in RAM, and the projected
>> bits of the ID's hash integer.
>>
>> The ranges don't even really have to be constant if it's
>> so that there's a limit so they're under a constant, then
>> according to uniformity they only have so many, eg,
>> just projecting out their 1's, so the partition pyramid
>> digging sort of always finds one or more partitions
>> with possible matches, those being hash collisions or
>> messages duplicated across groups, and mostly finds
>> those with exclusions, so that it results reducing, for
>> example that empty groups are dropped right off
>> though not being skipped, while full groups then
>> get into needing more than constant space and
>> constant time to search.
>>
>> Of course if all the partitions miss then it's
>> also a fast exit that none have the ID.
>>
>> So, this, "partition pyramid hash filter", with basically,
>> "constant and configurable space and time", basically
>> has that because Message Id's will only exist in one or
>> a few partitions, and for a single group and not across
>> about all groups, exactly one, and the hash is uniform, so
>> that hash collisions are low, and the partitions aren't
>> overfilled, so that hash collisions are low, then it sort
>> of results all the un-used partitions at rest, don't fill
>> up in n^2 space the log 2 n hash-map search. Then,
>> they could, if there was spare space, and it made sense
>> that in the write-once-read-many world it was somehow
>> many instead of never, a usual case, or, just using a
>> list of sorted message Id's in the partition and bsearch,
>> this can map the file without loading its contents in
>> space, except as ephemerally, or the usual disk controller's
>> mmap space, or "ready-time" and "ephemeral-space".
>>
>> In this sort of way there's no resident RAM for the partitions
>> except each one with a fixed-size arithmetic hash stamp,
>> while lookups have a fixed or constant cost, plus then
>> also a much smaller usual log 2 time / n^2 space trade-off,
>> while memory-mapping active files automatically caches.
>>
>>
>> So, the idea is to combine the BFF backing file format
>> and LFF library file format ideas, with that the group x date
>> partitions make the for archive and active partitions,
>> then to have constant-time/constant-space partition
>> pyramid arithmetic hash range for lookup, then
>> ready-time/ephemeral-space lookup in partitions,
>> then that the maintenance of the pyramid tree,
>> happens with dropping partitions, while just
>> accumulating with adding partitions.
>>
>> Yeah, I know that a usual idea is just to make a hash map
>> after an associative array with log 2 n lookup in n^2 space,
>> that maintenance is in adding and removing items,
>> here the idea is to have partitions above items,
>> and sort of naturally to result "on startup, find
>> the current partitions, compose their partition pyramid,
>> then run usually constant-time/constant-space in that
>> then ready-time/ephemeral-space under that,
>> maintenance free", then that as active partitions
>> being written roll over to archive partitions being
>> finished, then they just get added to the pyramid
>> and their ranges or'ed up into the pyramid.
>>
>> Hmm... 32K or 2^15 groups, 16K or 2^14 days, or
>> about 40 years of Usenet in partitions, 2^29,
>> about 2^8 per megabyte or about 2^20 or one
>> gigabyte RAM, or, just a file, then memory-mapping
>> the partition pyramid file, figuring again that
>> most partitions are not resident in RAM,
>> this seems a sort of good simple idea to
>> implement lookup by Message ID over 2^30 many.
>>
>> I mean if "text Usenet for all time is about a billion messages",
>> it seems around that size.
>>
>>
>
>
>
> So, trying to figure out if this "arithmetic hash range
> pyramidal partition" data structure is actually sort of
> reasonable, gets into that it involves finding a balance
> in what's otherwise a very well-understood trade-off,
> in terms of the cost of a lookup, over time, and then
> especially as whether an algorithm is "scale-able",
> that even a slightly lesser algorithm might be better
> if it results "scale-able", especially if it breaks down
> to a very, very minimal set of resources, in time,
> and in various organizations of space, or distance,
> which everybody knows as CPU, RAM, and DISK,
> in terms of time, those of lookups per second,
> and particularly where parallelizable as with
> regards to both linear speed-up and also immutable
> data structures, or, clustering. ("Scale.")
>
>
> Then it's probably so that the ranges are pretty small,
> because they double, and whether it's best just to
> have an overall single range, or, refinements of it,
> according to a "factor", a "factor" that represents
> how likely it is that hashes don't collide in the range,
> or that they do.
>
> This is a different way of looking at hash collisions,
> besides that two objects have the same hash,
> just that they're in the same partition of the range
> their integer value, for fixed-length uniform hashes.
>
> I.e., a hash collision proper would always be a
> redundant or order-dependent dig-up, of a sort,
> where the idea is that the lookup first results
> searching the pyramid plan for possibles, then
> digging up each of those and checking for match.
>
> The idea that group x date sort of has that those
> are about on the same order is a thing, then about
> the idea that "category" and "year" are similarly
> about so,
>
> Big8 x year
> group x date
>
> it's very contrived to have those be on the same
> order, in terms of otherwise partitioning, or about
> what it results that "partitions are organized so that
> their partitions are tuples and the tuples are about
> on the same order, so it goes, thus that uniformity
> of hashes, results being equi-distributed in those,
> so that it results the factor is good and that arithmetic
> hash ranges filter out most of the partitions, and,
> especially that there aren't many false-positive dig-up
> partitions.
>
> It's sort of contrived, but then it does sort of make
> it so that also other search concerns like "only these
> groups or only these years anyways", naturally get
> dropped out at the partition layer, and, right in the
> front of the lookup algorithm.
>
> It's pretty much expected though that there would
> be non-zero false-positive dig-ups, where here a dig-up
> is that the arithmetic hash range matched, but it's
> actually a different Message ID's hash in the range,
> and not the lookup value(s).
>
> Right, so just re-capping here a bit, the idea is that
> there are groups, and dates, and for each is a zip file,
> which is a collection of files in a file-system entry file
> with about random access on the zip file each entry,
> and compressed, and the entries include Messages,
> by their Message ID's, then that the entries are
> maybe in sub-directories, that reflect components
> of the Message ID's hash, where a hash, is a fixed-length
> value, like 64 bytes or 128 bytes, or a power of two
> and usually an even power of two thus a multiple of four,
> thus that a 64 byte hash has 2^64 * 2^8 many possible
> values, then that a range, of length R bits, has R many
> partitions, in terms of the hash size and the range size,
> whether the factor is low enough, that most partitions
> will naturally be absent most ranges, because hashes
> can only be computed from Message ID's, not by their
> partitions or other information like the group or date.
>
> So, if there are 2^30 or a billion messages, then a
> 32 bit hash, would have a fair expectation that
> unused values would be not dense, then for
> what gets into "birthday problem" or otherwise
> how "Dirichlet principle" makes for how often
> are hash collisions, for how often are range collisions,
> either making redundant dig-ups, in the way this
> sort of algorithm services look-ups.
>
> The 32 bits is quite a bit less than 64 * 8, though,
> about whether it would also result, that, splitting
> that into subdirectories, results different organizations
> here about "tuned to Usenet-scale and organization",
> vis-a-vis, "everybody's email" or something like that.
> That said, it shouldn't just fall apart if the size or
> count blows up, though it might be expect then
> a various sorts of partitioning, to keep the partition
> tuple orders square, or on the same orders.
>
>
> The md5 is widely available, "md5sum", it's 128 bits,
> its output is hexadecimal characters, 32-many.
>
> https://en.wikipedia.org/wiki/MD5
> https://en.wikipedia.org/wiki/Partition_(database)
> https://en.wikipedia.org/wiki/Hash_function#Uniformity
>
> Otherwise the only goal of the hash is to be uniform,
> and also to have "avalanche criterion", so that near Message-Id's
> will still be expected to have different hashes, as it's not
> necessarily expected that they're the same group and
> date, though that would be a thing, yet Message ID's
> should be considered opaque and not seated together.
>
> Then MD5 is about the most usual hash utility laying
> around, if not SHA-1, or SHA-256. Hmm..., in the
> interests of digital preservation is "the tools for
> any algorithms should also be around forever",
> one of those things.
>
> So anyways, then each group x date has its Message ID's,
> each of those has its hash, each of those fits in a range,
> indicating one bit in the range where it is, then those are
> OR'd together to result a bit-mask of the range, then
> that a lookup can check its hash's bit against the range,
> and dig-up the partition if it's in, or, skip the partition
> if it's not, with the idea that the range is big enough
> and the resulting group x date is small enough, that
> the "pyramidal partition", is mostly sparse, at the lower
> levels, that it's mostly "look-arounds" until finally the
> "dig-ups", in the leaf nodes of the pyramidal partitions.
>
> I.e., the dig-ups will eventually include spurious or
> redundant false-positives, that the algorithm will
> access the leaf partitions at uniform random.
>
> The "pyramidal" then also get into both the empties,
> like rec.calm with zero posts ten years running,
> or alt.spew which any given day exceeds zip files
> or results a lot of "zip format, but the variously
> packaged, not-recompressed binaries", the various
> other use cases than mostly at-rest and never-read
> archival purposes. The idea of the "arithmetic hash
> range pyramidal partition" is that mostly the
> leaf partitions are quite small and sparse, and
> mostly the leveling of the pyramid into year/month/date
> and big8/middle/group, as it were, winnows those
> down in what's a constant-rate constant-space scan
> on the immutable data structure of the partition pyramid.
>
> Yeah, I know, "numbers", here though the idea is
> that about 30K groups at around 18K days = 50 years
> makes about 30 * 20 * million or less than a billion
> files the zip files, which would all fit on a volume
> that supports up to four billion-many files, or an
> object-store, then with regards to that most of
> those would be quite small or even empty,
> then with regards to "building the pyramid",
> the levels big8/middle/group X year/month/date,
> the data structure of the hashes marking the ranges,
> then those themselves resulting a file, which are
> basically the entire contents of allocated RAM,
> or for that matter a memory-mapped file, with
> the idea that everything else is ephemeral RAM.
>
>
>



Wondering about the pyramidal partition arithmetic range hash
some more, figuring out how to make it so that
the group x date grid of buckets has a reasonably
well-defined run-time, while using a minimal amount
of memory, or a tunable amount trading space for performance,
for a well-defined constant resource, one that's constant
and fully re-entrant with regards to parallel lookups.

The idea is to implement the lookup by message-id,
where messages are in buckets or partitions basically
according to group x date,

a.b.c/yyyy/mmdd/0.zip
a.b.c/yyyy/mmdd/0.pyr

with the idea of working up so that the groups,
on the order of 30K or so, and days, on the order
of 15K or so, have that mostly the posts are
pretty sparse over all the groups and dates,
with the idea that absence and presence in
the file-system or object-store serve the usual
sorts of lookups, that search hits would be associated
with a message-id, then to look it up in any group
it was posted to, then across those or concomitantly,
with the idea that cross-posts exist as duplicate data
in each partition.

a/b.c/yyyy/mmdd

yyyy/mmdd/a/b.c

The idea is that yyyy is on the order of 40 or 50,
while mmdd is 365, with the idea of having "0000"
for example as a placeholder for otherwise dateless
posts sort of found in the order, and that 'a' is about
on the order of 30 or 40, all beyond the Big 8, then
that after finding matches in those, which would
be expected to be pretty dense in those, the
message-id is hashed, then split into four pieces,
each of those a smaller uniform hash, then
its value in the range is simply OR'd into the range
bits, then diving into the next level of the pyramid,
and those that match, and those that match, and
so on, serially yet parallelizably, until finding the
group's date files to dig, then actually looking
into the file of message-ids (a lookup sketch
follows the listing below).

a/b.c/yyyy/mmdd/0.zip
a/b.c/yyyy/mmdd/0.pyr
a/b.c/yyyy/mmdd/0.ids

a/b.c/yyyy/mmdd.pyr
a/b.c/yyyy.pyr
a/b.c.pyr
a.pyr

yyyy/mmdd/a/b.c.pyr
yyyy/mmdd/a.pyr
yyyy/mmdd.pyr
yyyy.pyr
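
As a minimal sketch of that descent, in the group-first layout above,
assuming the .pyr files hold the OR'd range bit-masks per level, and
using a single bit rather than the four split hashes, for brevity;
the file names and paths here are only placeholders.

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;

// Sketch: walk the pyramid of .pyr bit-mask files top-down, only
// descending where the message-id's bit is set, ending with the leaf
// partitions that still need an actual dig-up of the ids file.
class PyramidLookup {

    // one bit position for the message-id, in a mask of rangeBits bits
    static int bit(String messageId, int rangeBits) throws Exception {
        byte[] h = MessageDigest.getInstance("MD5")
            .digest(messageId.getBytes(StandardCharsets.UTF_8));
        int v = ((h[0] & 0xFF) << 24) | ((h[1] & 0xFF) << 16)
              | ((h[2] & 0xFF) << 8) | (h[3] & 0xFF);
        return Integer.remainderUnsigned(v, rangeBits);
    }

    // a definite miss when the bit is clear; a "maybe" when it's set
    static boolean maybeIn(Path pyr, String messageId) throws Exception {
        byte[] mask = Files.readAllBytes(pyr);
        int b = bit(messageId, mask.length * 8);
        return (mask[b / 8] & (1 << (b % 8))) != 0;
    }

    public static void main(String[] args) throws Exception {
        String id = args[0];
        Path group = Paths.get("a", "b.c.pyr");
        if (!maybeIn(group, id)) return;            // whole group excluded
        Path year = Paths.get("a", "b.c", "1999.pyr");
        if (!maybeIn(year, id)) return;             // whole year excluded
        Path day = Paths.get("a", "b.c", "1999", "0101.pyr");
        if (maybeIn(day, id)) {
            // dig up a/b.c/1999/0101/0.ids and bsearch the sorted ids
        }
    }
}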

One can see here that "building the pyramid" is
pretty simple, it's a depth-first sort of traversal
to just "or" together the lower level's .pyr files,
then usually for the active or recent besides the
archival or older, those just being checked for
when usually lookups are for recent. The maintenance
or re-building of the pyramid has a basic invalidation
routine, where lastModifiedTime is reliable, or
for example a signature or even just a checksum,
or that anyways rebuilding the data structure's
file backing is just a filesystem operation of a usual sort.
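
A minimal sketch of that depth-first build, assuming each directory's
.pyr is just the byte-wise OR of its children's .pyr files; again, the
layout and names are only illustrative.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Sketch: rebuild a parent .pyr by OR'ing together the child
// directories' masks, depth-first, so each level of the pyramid is
// the union of the level below it.
class PyramidBuild {

    static byte[] buildPyr(Path dir, int maskBytes) throws IOException {
        byte[] mask = new byte[maskBytes];
        try (Stream<Path> children = Files.list(dir)) {
            for (Path child : (Iterable<Path>) children::iterator) {
                byte[] childMask;
                if (Files.isDirectory(child)) {
                    childMask = buildPyr(child, maskBytes);    // recurse depth-first
                } else if (child.getFileName().toString().endsWith(".pyr")) {
                    childMask = Files.readAllBytes(child);     // leaf partition's mask
                } else {
                    continue;                                  // zips, ids files, etc.
                }
                for (int i = 0; i < maskBytes && i < childMask.length; i++) {
                    mask[i] |= childMask[i];                   // union of the level below
                }
            }
        }
        // persist this level's mask alongside the directory, e.g. a/b.c -> a/b.c.pyr;
        // a real build would first invalidate stale sibling .pyr files
        Files.write(dir.resolveSibling(dir.getFileName() + ".pyr"), mask);
        return mask;
    }
}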

Then, with like a 16KiB or so range, it's basically
about 4KiB for each of the 4 hashes, so any hash-miss
results a drop, then that's about 2^14 bytes,
about as above a usual or default hash for
the message-id's, where it's also designed that
/h1/h2/h3/h4/message-id results a file-system
depth that keeps the directory size within usual
limits of filesystems and archival package files,
of all the files, apiece.

Then, a megabyte of RAM or so, 2^20 bytes, with
regards to 2^10 * 2^4 bytes apiece, is about 2^6 = 64 of those
per megabyte.

30K groups x 15K days ~ 450M group days, hmm, ...,
not planning on fitting that into RAM.

2 groups x 18262 days, 36K, that should fit,
or, 32768 = 2^15, say, over 2^6 per megabyte is about 2^9 or
512 megabytes RAM, hmm..., figuring a linear scan of
that, at about 1 GHz over 512 MiB, would be
about half a second, ....

The idea is that messages by group-number are just
according to the partitions adding up counts,
then lookup by message-id is that whatever
search results return a message-id for hits,
then there's some reasonable lookup for the message-id.
Ross Finlayson
2024-04-20 18:24:49 UTC
Reply
Permalink
Well I've been thinking about the re-routine as a model of cooperative
multithreading,
then thinking about the flow-machine of protocols

NNTP
IMAP <-> NNTP
HTTP <-> IMAP <-> NNTP

Both IMAP and NNTP are session-oriented on the connection, while,
HTTP, in terms of session, has various approaches in terms of HTTP 1.1
and connections, and the session ID shared client/server.


The re-routine idea is this: that each kind of method is memoizable,
and it memoizes, by object identity as the key, for the method, all
its callers, like so.

interface Reroutine1 {

Result1 rr1(String a1) {

Result2 r2 = reroutine2.rr2(a1);

Result3 r3 = reroutine3.rr3(r2);

return result(r2, r3);
}

}


The idea is that the executor, when it's submitted a reroutine,
when it runs the re-routine, in a thread, then it puts the re-routine
in a ThreadLocal, so that when a re-routine it calls returns null, as it
starts an asynchronous computation for the input, then when
that completes, it submits the re-routine to the executor again.

Then rr1 runs through again, retrieving r2 which is memoized,
and invokes rr3, which throws after queuing to memoize and
resubmit rr1; when that calls back to resubmit rr1, then rr1
runs through to completion, signaling the original invoker.

Then it seems each re-routine basically has an instance part
and a memoized part, and that it's to flush the memo
after it finishes, in terms of memoizing the inputs.


Result1 rr1(String a1) {
// if a1 is in the memo, return the memoized result for it
// else queue the computation for a1 and carry on (return null)

}


What is a re-routine?

It's a pattern for cooperative multithreading.

It's sort of a functional approach to functions and flow.

It has a declarative syntax in the language with usual flow-of-control.

So, it's cooperative multithreading so it yields?

No, it just quits, and expects to be called back.

So, if it quits, how does it complete?

The entry point to re-routine provides a callback.

Re-routines only return results to other re-routines;
it's the default callback. Otherwise they just call back.

So, it just quits?

If a re-routine gets called with a null, it throws.

If a re-routine gets a null back from a re-routine it calls, it just continues.

If a re-routine completes, it calls back.

So, can a re-routine call any regular code?

Yeah, there are some issues, though.

So, it's got callbacks everywhere?

Well, it's just got callbacks implicitly everywhere.

So, how does it work?

Well, you build a re-routine with an input and a callback,
you call it, then when it completes, it calls the callback.

Then, re-routines call other re-routines with the argument,
and the callback's in a ThreadLocal, and the re-routine memoizes
all of its return values according to the object identity of the inputs,
then when a re-routine completes, it calls again with another ThreadLocal
indicating to delete the memos, following the exact same flow-of-control
only deleting the memos going along, until it results all the memos in
the re-routines for the interned or ref-counted input are deleted,
then the state of the re-routine is de-allocated.

So, it's sort of like a monad and all in pure and idempotent functions?

Yeah, it's sort of like a monad and all in pure and idempotent functions.

So, it's a model of cooperative multithreading, though with no yield,
and callbacks implicitly everywhere?

Yeah, it's sort of figured that a called re-routine always has a
callback in the ThreadLocal, because the runtime has pre-emptive
multithreading anyways, that the thread runs through its re-routines in
their normal declarative flow-of-control with exception handling, and
whatever re-routines or other pure monadic idempotent functions it
calls, throw when they get null inputs.

Also it sort of doesn't have primitive types, Strings must always be
interned, all objects must have a distinct identity w.r.t. ==, and null
is never an argument or return value.

So, what does it look like?

interface Reroutine1 {

Result1 rr1(String a1) {

Result2 r2 = reroutine2.rr2(a1);

Result3 r3 = reroutine3.rr3(r2);

return result(r2, r3);
}

}

So, I expect that to return "result(r2, r3)".

Well, that's synchronous, and maybe blocking, the idea is that it calls
rr2, which gets a1, and rr2 constructs with the callback of rr1 and its
own callback, and a1, and makes a memo for a1, and invokes whatever is
its implementation, and returns null, then rr1 continues and invokes rr3
with r2, which is null, so that throws a NullPointerException, and rr1
quits.

So, ..., that's cooperative multithreading?

Well you see what happens is that rr2 invoked another re-routine or end
routine, and at some point it will get called back, and that will happen
over and over again until rr2 has an r2, then rr2 will memoize (a1, r2),
and then it will callback rr1.

Then rr1 had quit, it runs again, this time it gets r2 from the (a1,
r2) memo in the monad it's building, then it passes a non-null r2 to
rr3, which proceeds in much the same way, while rr1 quits again until
rr3 calls it back.

So, ..., it's non-blocking, because it just quits all the time, then
happens to run through the same paces filling in?

That's the idea, that re-routines are responsible to build the monad
and call-back.
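
As a rough, self-contained sketch of that mechanism, with a single
executor, a memo keyed by input, and the NullPointerException-as-quit
convention described above; all the names here are made up for
illustration, and the callback is passed explicitly rather than through
a ThreadLocal, for brevity.

import java.util.Map;
import java.util.concurrent.*;

// Sketch: a re-routine is plain flow-of-control code; a call whose
// result isn't memoized yet starts the async work and returns null,
// the caller's next use of that null throws, which quits the run, and
// when the work completes the routine is resubmitted and runs through
// again, this time finding the memoized value.
public class ReRoutineSketch {
    static final ExecutorService executor = Executors.newFixedThreadPool(2);
    static final Map<String, String> memo = new ConcurrentHashMap<>();

    // an "inner" re-routine: if not memoized, start the work, return null
    static String rr2(String a1, Runnable recall) {
        String r2 = memo.get("rr2:" + a1);
        if (r2 != null) return r2;
        CompletableFuture.runAsync(() -> {
            memo.put("rr2:" + a1, a1.toUpperCase());  // the "slow" work
            executor.submit(recall);                  // call back: resubmit caller
        });
        return null;
    }

    // the "outer" re-routine: written as plain synchronous flow-of-control
    static void rr1(String a1, CompletableFuture<String> callback) {
        try {
            String r2 = rr2(a1, () -> rr1(a1, callback));
            String result = "result(" + r2.length() + "," + r2 + ")"; // NPE if r2 == null
            callback.complete(result);
        } catch (NullPointerException quit) {
            // not an error: the re-routine just quits, waiting to be re-called
        }
    }

    public static void main(String[] args) throws Exception {
        CompletableFuture<String> callback = new CompletableFuture<>();
        executor.submit(() -> rr1("hello", callback));
        System.out.println(callback.get());           // eventually result(5,HELLO)
        executor.shutdown();
    }
}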

So, can I just implement rr2 and rr3 as synchronous and blocking?

Sure, they're interfaces, their implementation is separate. If they
don't know re-routine semantics then they're just synchronous and
blocking. They'll get called every time though when the re-routine gets
called back, and actually they need to know the semantics of returning
an Object or value by identity, because, calling equals() to implement
Memo usually would be too much, where the idea is to actually function
only monadically, and that given same Object or value input, must return
same Object or value output.

So, it's sort of an approach as a monadic pure idempotency?

Well, yeah, you can call it that.

So, what's the point of all this?

Well, the idea is that there are 10,000 connections, and any time one
of them demultiplexes off the connection an input command message, then
it builds one of these with the response input to the demultiplexer on
its protocol on its connection, on the multiplexer to all the
connections, with a callback to itself. Then the re-routine is launched
and when it returns, it calls-back to the originator by its
callback-number, then the output command response writes those back out.

The point is that there are only as many Threads as cores so the goal is
that they never block,
and that the memos make for interning Objects by value, then the goal is
mostly to receive command objects and handles to request bodies and
result objects and handles to response bodies, then to call-back with
those in whatever serial order is necessary, or not.

So, won't this run through each of these re-routines umpteen times?

Yeah, you figure that the runtime of the re-routine is on the order of
n^2 in the number of statements in the re-routine.

So, isn't that terrible?

Well, it doesn't block.

So, it sounds like a big mess.

Yeah, it could be. That's why, to avoid blocking and callback
semantics, the move is to make monadic idempotency semantics, so then the
re-routines are just written in normal synchronous flow-of-control, and
their well-defined behavior is exactly according to flow-of-control
including exception-handling.

There's that and there's basically it only needs one Thread, so, less
Thread x stack size, for a deep enough thread call-stack. Then the idea
is about one Thread per core, figuring for the thread to always be
running and never be blocking.

So, it's just normal flow-of-control.

Well yeah, you expect to write the routine in normal flow-of-control,
and to test it with synchronous and in-memory editions that just run
through synchronously, and that if you don't much care if it blocks,
then it's the same code and has no semantics about the asynchronous or
callbacks actually in it. It just returns when it's done.
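
For instance, keeping the Reroutine2 shape above, a synchronous
in-memory edition for tests might just be, with Result2 only
illustrative:

// a synchronous, in-memory edition of a re-routine for testing: same
// interface, but it just computes and returns, never null, never queues
interface Reroutine2 {
    Result2 rr2(String a1);
}

class Result2 {
    final String value;
    Result2(String value) { this.value = value; }
}

class SynchronousReroutine2 implements Reroutine2 {
    public Result2 rr2(String a1) {
        return new Result2(a1.toUpperCase());   // computes inline, then returns
    }
}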


So what's the requirements of one of these again?

Well, the idea is, that, for a given instance of a re-routine, it's an
Object, that implements an interface, and it has arguments, and it has a
return value. The expectation is that the re-routine gets called with
the same arguments, and must return the same return value. This way
later calls to re-routines can match the same expectation, same/same.

Also, if it gets different arguments, by Object identity or primitive
value, the re-routine must return a different return value, those being
same/same.

The re-routine memoizes its arguments by its argument list, Object or
primitive value, and a given argument list is same if the order and
types and values of those are same, and it must return the same return
value by type and value.

So, how is this cooperative multithreading unobtrusively in
flow-of-control again?

Here for example the idea would be, rr2 quits and rr1 continues, rr3
quits and rr1 continues, then reaching rr4, rr4 throws and rr1 quits.
When rr2's or rr3's memo-callback completes, then it calls back rr1. As
those come in, at some point rr4 will be fulfilled, and thus rr4 will
quit and rr1 will quit. When rr4's callback completes, then it will
call back rr1, which will finally complete, and then call back whatever
called rr1. Then rr1 runs itself through one more time to
delete or decrement all its memos.

interface Reroutine1 {

Result1 rr1(String a1) {

Result2 r2 = reroutine2.rr2(a1);

Result3 r3 = reroutine3.rr3(a1);

Result4 r4 = reroutine4.rr4(a1, r2, r3);

return Result1.r4(a1, r4);
}

}

The idea is that it doesn't block when it launches rr2 and rr3, until
such time as it just quits when it tries to invoke rr4 and gets a
resulting NullPointerException, then eventually rr4 will complete and be
memoized and call-back rr1, then rr1 will be called-back and then
complete, then run itself through to delete or decrement the ref-count
of all its memo-ized fragmented monad respectively.

Thusly it's cooperative multithreading by never blocking and always just
launching callbacks.

There's this System.identityHashCode() method, and then there's a notion
of Object pools and interning Objects, then as for this way, it's about
numeric identity instead of value identity, so that when making memos
it's always "==", and a HashMap keyed on System.identityHashCode()
instead of ever calling equals(), since calling equals() is more
expensive than calling ==, and the same/same memo-ization is about
Object numeric identity or the primitive scalar value, those being
same/same.

https://docs.oracle.com/javase/8/docs/api/java/lang/System.html#identityHashCode-java.lang.Object-
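
The standard library actually has a map with exactly these semantics,
java.util.IdentityHashMap, which keys on == and System.identityHashCode()
rather than equals()/hashCode(); a minimal sketch of a memo built on it,
with the Memo class itself just illustrative:

import java.util.IdentityHashMap;
import java.util.Map;

// sketch of an identity-keyed memo: the same argument Object (by ==)
// maps to the same result Object, and equals() is never consulted
class Memo<A, R> {
    private final Map<A, R> memos = new IdentityHashMap<>();

    synchronized R get(A arg) { return memos.get(arg); }

    synchronized void put(A arg, R result) { memos.put(arg, result); }

    synchronized void flush(A arg) { memos.remove(arg); }
}

This only works if arguments are interned or otherwise canonical, as
above: two equal-but-distinct Strings would get two different memo
entries.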

So, you figure to return Objects to these connections by their session
and connection and mux/demux in these callbacks and then write those out?

Well, the idea is to make it so that according to the protocol, the
back-end sort of knows what makes a handle to a datum of the sort, given
the protocol and the protocol and the protocol, and the callback is just
these handles, about what goes in the outer callbacks or outside the
re-routine, those can be different/same. Then the single writer thread
servicing the network I/O just wants to transfer those handles, or, as
necessary through the compression and encryption codecs, then write
those out, well making use of the java.nio for scatter/gather and vector
I/O in the non-blocking and asynchronous I/O as much as possible.
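
For the scatter/gather part, java.nio's gathering writes do take an
array of buffers in one call; a minimal sketch of the writer side, with
connection handling elided and the names only illustrative:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

// Sketch: the writer hands the channel an array of buffers (say, a
// coded header plus a body handle) in one gathering write, rather than
// copying them into a single contiguous buffer first.
class GatheringWriter {
    static void writeResponse(SocketChannel channel,
            ByteBuffer header, ByteBuffer body) throws IOException {
        ByteBuffer[] handles = { header, body };
        while (header.hasRemaining() || body.hasRemaining()) {
            // vectored I/O; in non-blocking mode this may write partially,
            // and a real writer would register OP_WRITE instead of spinning
            channel.write(handles);
        }
    }
}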


So, that seems a lot of effort to just passing the handles, ....

Well, I don't want to write any code except normal flow-of-control.

So, this same/same bit seems onerous, as long as different/same has a
ref-count and thus the memo-ized monad-fragment is maintained when all
sorts of requests fetch the same thing.

Yeah, maybe you're right. There's much to be gained by re-using monadic
pure idempotent functions yet only invoking them once. That gets into
value equality besides numeric equality, though, with regards to going
into re-routines and interning all Objects by value, so that inside and
through it's all "==" and System.identityHashCode, the memos, then about
the ref-counting in the memos.


So, I suppose you know HTTP, and about HTTP/2 and IMAP and NNTP here?

Yeah, it's a thing.

So, I think this needs a much cleaner and well-defined definition, to
fully explore its meaning.

Yeah, I suppose. There's something to be said for reading it again.
Ross Finlayson
2024-04-22 17:06:02 UTC
Reply
Permalink
On 04/20/2024 11:24 AM, Ross Finlayson wrote:
>
>
> Well I've been thinking about the re-routine as a model of cooperative
> multithreading,
> then thinking about the flow-machine of protocols
>
> NNTP
> IMAP <-> NNTP
> HTTP <-> IMAP <-> NNTP
>
> Both IMAP and NNTP are session-oriented on the connection, while,
> HTTP, in terms of session, has various approaches in terms of HTTP 1.1
> and connections, and the session ID shared client/server.
>
>
> The re-routine idea is this, that each kind of method, is memoizable,
> and, it memoizes, by object identity as the key, for the method, all
> its callers, how this is like so.
>
> interface Reroutine1 {
>
> Result1 rr1(String a1) {
>
> Result2 r2 = reroutine2.rr2(a1);
>
> Result3 r3 = reroutine3.rr3(r2);
>
> return result(r2, r3);
> }
>
> }
>
>
> The idea is that the executor, when it's submitted a reroutine,
> when it runs the re-routine, in a thread, then it puts in a ThreadLocal,
> the re-routine, so that when a re-routine it calls, returns null as it
> starts an asynchronous computation for the input, then when
> it completes, it submits to the executor the re-routine again.
>
> Then rr1 runs through again, retrieving r2 which is memoized,
> invokes rr3, which throws, after queuing to memoize and
> resubmit rr1, when that calls back to resubmit r1, then rr1
> routines, signaling the original invoker.
>
> Then it seems each re-routine basically has an instance part
> and a memoized part, and that it's to flush the memo
> after it finishes, in terms of memoizing the inputs.
>
>
> Result 1 rr(String a1) {
> // if a1 is in the memo, return for it
> // else queue for it and carry on
>
> }
>
>
> What is a re-routine?
>
> It's a pattern for cooperative multithreading.
>
> It's sort of a functional approach to functions and flow.
>
> It has a declarative syntax in the language with usual
> flow-of-control.
>
> So, it's cooperative multithreading so it yields?
>
> No, it just quits, and expects to be called back.
>
> So, if it quits, how does it complete?
>
> The entry point to re-routine provides a callback.
>
> Re-routines only return results to other re-routines,
> It's the default callback. Otherwise they just callback.
>
> So, it just quits?
>
> If a re-routine gets called with a null, it throws.
>
> If a re-routine gets a null, it just continues.
>
> If a re-routine completes, it callbacks.
>
> So, can a re-routine call any regular code?
>
> Yeah, there are some issues, though.
>
> So, it's got callbacks everywhere?
>
> Well, it's just got callbacks implicitly everywhere.
>
> So, how does it work?
>
> Well, you build a re-routine with an input and a callback,
> you call it, then when it completes, it calls the callback.
>
> Then, re-routines call other re-routines with the argument,
> and the callback's in a ThreadLocal, and the re-routine memoizes
> all of its return values according to the object identity of the
> inputs,
> then when a re-routine completes, it calls again with another
> ThreadLocal
> indicating to delete the memos, following the exact same
> flow-of-control
> only deleting the memos going along, until it results all the memos in
> the re-routines for the interned or ref-counted input are deleted,
> then the state of the re-routine is de-allocated.
>
> So, it's sort of like a monad and all in pure and idempotent functions?
>
> Yeah, it's sort of like a monad and all in pure and idempotent
> functions.
>
> So, it's a model of cooperative multithreading, though with no yield,
> and callbacks implicitly everywhere?
>
> Yeah, it's sort of figured that a called re-routine always has a
> callback in the ThreadLocal, because the runtime has pre-emptive
> multithreading anyways, that the thread runs through its re-routines in
> their normal declarative flow-of-control with exception handling, and
> whatever re-routines or other pure monadic idempotent functions it
> calls, throw when they get null inputs.
>
> Also it sort of doesn't have primitive types, Strings must always
> be interned, all objects must have a distinct identity w.r.t. ==, and
> null is never an argument or return value.
>
> So, what does it look like?
>
> interface Reroutine1 {
>
> Result1 rr1(String a1) {
>
> Result2 r2 = reroutine2.rr2(a1);
>
> Result3 r3 = reroutine3.rr3(r2);
>
> return result(r2, r3);
> }
>
> }
>
> So, I expect that to return "result(r2, r3)".
>
> Well, that's synchronous, and maybe blocking, the idea is that it
> calls rr2, gets a1, and rr2 constructs with the callback of rr1 and it's
> own callback, and a1, and makes a memo for a1, and invokes whatever is
> its implementation, and returns null, then rr1 continues and invokes rr3
> with r2, which is null, so that throws a NullPointerException, and rr1
> quits.
>
> So, ..., that's cooperative multithreading?
>
> Well you see what happens is that rr2 invoked another re-routine or
> end routine, and at some point it will get called back, and that will
> happen over and over again until rr2 has an r2, then rr2 will memoize
> (a1, r2), and then it will callback rr1.
>
> Then rr1 had quit, it runs again, this time it gets r2 from the
> (a1, r2) memo in the monad it's building, then it passes a non-null r2
> to rr3, which proceeds in much the same way, while rr1 quits again until
> rr3 calls it back.
>
> So, ..., it's non-blocking, because it just quits all the time, then
> happens to run through the same paces filling in?
>
> That's the idea, that re-routines are responsible to build the
> monad and call-back.
>
> So, can I just implement rr2 and rr3 as synchronous and blocking?
>
> Sure, they're interfaces, their implementation is separate. If
> they don't know re-routine semantics then they're just synchronous and
> blocking. They'll get called every time though when the re-routine gets
> called back, and actually they need to know the semantics of returning
> an Object or value by identity, because, calling equals() to implement
> Memo usually would be too much, where the idea is to actually function
> only monadically, and that given same Object or value input, must return
> same Object or value output.
>
> So, it's sort of an approach as a monadic pure idempotency?
>
> Well, yeah, you can call it that.
>
> So, what's the point of all this?
>
> Well, the idea is that there are 10,000 connections, and any time
> one of them demultiplexes off the connection an input command message,
> then it builds one of these with the response input to the demultiplexer
> on its protocol on its connection, on the multiplexer to all the
> connections, with a callback to itself. Then the re-routine is launched
> and when it returns, it calls-back to the originator by its
> callback-number, then the output command response writes those back out.
>
> The point is that there are only as many Theads as cores so the
> goal is that they never block,
> and that the memos make for interning Objects by value, then the goal is
> mostly to receive command objects and handles to request bodies and
> result objects and handles to response bodies, then to call-back with
> those in whatever serial order is necessary, or not.
>
> So, won't this run through each of these re-routines umpteen times?
>
> Yeah, you figure that the runtime of the re-routine is on the order
> of n^2 the order of statements in the re-routine.
>
> So, isn't that terrible?
>
> Well, it doesn't block.
>
> So, it sounds like a big mess.
>
> Yeah, it could be. That's why to avoid blocking and callback
> semantics, is to make monadic idempotency semantics, so then the
> re-routines are just written in normal synchronous flow-of-control, and
> they're well-defined behavior is exactly according to flow-of-control
> including exception-handling.
>
> There's that and there's basically it only needs one Thread, so,
> less Thread x stack size, for a deep enough thread call-stack. Then the
> idea is about one Thread per core, figuring for the thread to always be
> running and never be blocking.
>
> So, it's just normal flow-of-control.
>
> Well yeah, you expect to write the routine in normal
> flow-of-control, and to test it with synchronous and in-memory editions
> that just run through synchronously, and that if you don't much care if
> it blocks, then it's the same code and has no semantics about the
> asynchronous or callbacks actually in it. It just returns when it's done.
>
>
> So what's the requirements of one of these again?
>
> Well, the idea is, that, for a given instance of a re-routine, it's
> an Object, that implements an interface, and it has arguments, and it
> has a return value. The expectation is that the re-routine gets called
> with the same arguments, and must return the same return value. This
> way later calls to re-routines can match the same expectation, same/same.
>
> Also, if it gets different arguments, by Object identity or
> primitive value, the re-routine must return a different return value,
> those being same/same.
>
> The re-routine memoizes its arguments by its argument list, Object
> or primitive value, and a given argument list is same if the order and
> types and values of those are same, and it must return the same return
> value by type and value.
>
> So, how is this cooperative multithreading unobtrusively in
> flow-of-control again?
>
> Here for example the idea would be, rr2 quits and rr1 continues, rr3
> quits and rr1 continues, then reaching rr4, rr4 throws and rr1 quits.
> When rr2's or rr3's memo-callback completes, then it calls-back rr1. as
> those come in, at some point rr4 will be fulfilled, and thus rr4 will
> quit and rr1 will quit. When rr4's callback completes, then it will
> call-back rr1, which will finally complete, and then call-back whatever
> called r1. Then rr1 runs itself through one more time to
> delete or decrement all its memos.
>
> interface Reroutine1 {
>
> Result1 rr1(String a1) {
>
> Result2 r2 = reroutine2.rr2(a1);
>
> Result3 r3 = reroutine3.rr3(a1);
>
> Result4 r4 = reroutine4.rr4(a1, r2, r3);
>
> return Result1.r4(a1, r4);
> }
>
> }
>
> The idea is that it doesn't block when it launchs rr2 and rr3, until
> such time as it just quits when it tries to invoke rr4 and gets a
> resulting NullPointerException, then eventually rr4 will complete and be
> memoized and call-back rr1, then rr1 will be called-back and then
> complete, then run itself through to delete or decrement the ref-count
> of all its memo-ized fragmented monad respectively.
>
> Thusly it's cooperative multithreading by never blocking and always just
> launching callbacks.
>
> There's this System.identityHashCode() method and then there's a notion
> of Object pools and interning Objects then as for about this way that
> it's about numeric identity instead of value identity, so that when
> making memo's that it's always "==" and for a HashMap with
> System.identityHashCode() instead of ever calling equals(), when calling
> equals() is more expensive than calling == and the same/same
> memo-ization is about Object numeric value or the primitive scalar
> value, those being same/same.
>
> https://docs.oracle.com/javase/8/docs/api/java/lang/System.html#identityHashCode-java.lang.Object-
>
>
> So, you figure to return Objects to these connections by their session
> and connection and mux/demux in these callbacks and then write those out?
>
> Well, the idea is to make it so that according to the protocol, the
> back-end sort of knows what makes a handle to a datum of the sort, given
> the protocol and the protocol and the protocol, and the callback is just
> these handles, about what goes in the outer callbacks or outside the
> re-routine, those can be different/same. Then the single writer thread
> servicing the network I/O just wants to transfer those handles, or, as
> necessary through the compression and encryption codecs, then write
> those out, well making use of the java.nio for scatter/gather and vector
> I/O in the non-blocking and asynchronous I/O as much as possible.
>
>
> So, that seems a lot of effort to just passing the handles, ....
>
> Well, I don't want to write any code except normal flow-of-control.
>
> So, this same/same bit seems onerous, as long as different/same has a
> ref-count and thus the memo-ized monad-fragment is maintained when all
> sorts of requests fetch the same thing.
>
> Yeah, maybe you're right. There's much to be gained by re-using monadic
> pure idempotent functions yet only invoking them once. That gets into
> value equality besides numeric equality, though, with regards to going
> into re-routines and interning all Objects by value, so that inside and
> through it's all "==" and System.identityHashCode, the memos, then about
> the ref-counting in the memos.
>
>
> So, I suppose you know HTTP, and about HTTP/2 and IMAP and NNTP here?
>
> Yeah, it's a thing.
>
> So, I think this needs a much cleaner and well-defined definition, to
> fully explore its meaning.
>
> Yeah, I suppose. There's something to be said for reading it again.
>
>
>
>
>
>





ReRoutines: monadic functional non-blocking asynchrony in the language


Implementing a sort of Internet protocol server, it sort of has three or
four kinds of machines.

flow-machine: select/epoll hardware driven I/O events

protocol-establishment: setting up and changing protocol (commands,
encryption/compression)

protocol-coding: block coding in encryption/compression and wire/object
commands/results

routine: inside the objects of the commands of the protocol,
commands/results

Then, it often looks sort of like

flow <-> protocol <-> routine <-> protocol <-> flow


On either outer side of the flow is a connection, it's a socket or the
receipt or sending of a datagram, according to the network interface and
select/epoll.

The establishment of a protocol looks like
connection/configuration/commencement/conclusion, or setup/teardown.
Protocols get involved in renegotiation within a protocol, and for example
upgrade among protocols. Then the protocol is set up and established.

The idea is that a protocol's coding is in three parts for
coding/decoding, compression/decompression, and (en)cryption/decryption,
or as it gets set up.

flow->decrypt->decomp->decod->routine->cod->comp->crypt->flow-v
flow<-crypt<-comp<-cod<-routine<-decod<-decomp<-decrypt<-flow<-



Whenever data arrives, the idea goes, is that the flow is interpreted
according to the protocol, resulting commands, then the routine derives
results from the commands, as by issuing others, in their protocols, to
the backend flow. Then, the results get sent back out through the
protocol, to the frontend, to the clients of what the server serves in
the protocol.

The idea is that there are about 10,000 connections at a time, or more
or less.

flow <-> protocol <-> routine <-> protocol <-> flow
flow <-> protocol <-> routine <-> protocol <-> flow
flow <-> protocol <-> routine <-> protocol <-> flow
...




Then, the routine in the middle, has that there's one processor, and on
the processor are a number of cores, each one independent. Then, the
operating system establishes that each of the cores, has any number of
threads-of-control or threads, and each thread has the state of where it
is in the callstack of routines, and the threads are preempted so that
multithreading, that a core runs multiple threads, gives each thread
some running from the entry to the exit of the thread, in any given
interval of time. Each thread-of-control is thusly independent, while it
must synchronize with any other thread-of-control, to establish common
or mutual state, and threads establish taking turns by mutual exclusion,
called "mutex".

Into and out of the protocol, coding, is either a byte-sequence or
block, or otherwise the flow is a byte-sequence, that being serial,
however the protocol multiplexes and demultiplexes messages, the
commands and their results, to and from the flow.

Then the idea is that what arrives to/from the routine, is objects in
the protocol, or handles to the transport of byte sequences, in the
protocol, to the flow.

A usual idea is that there's a thread that services the flow, where, how
it works is that a thread blocks waiting for there to be any I/O,
input/output, reading input from the flow, and writing output to the
flow. So, mostly, there's one thread that blocks on input, and when
there's any input, then it reads or transfers the bytes from the input,
into buffers. That's its only job, and only one thread can block on a
given select/epoll selector, which covers any given number of ports, the
connections, the idea being that it just blocks until select returns for
its keys of interest; it services each of the I/O's by copying from the
network interface's buffers into the program's buffers, then other
threads do the rest.
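
In Java that single blocking I/O thread is typically a java.nio
Selector loop, roughly like the following sketch, with error handling
and the hand-off to the other threads elided, and the port number just
a placeholder:

import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.util.Iterator;

// Sketch: one thread blocks in select(), and only copies bytes from
// ready channels into buffers; parsing, tasks, and writes are other
// threads' jobs.
class SelectorLoop {
    public static void main(String[] args) throws Exception {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(11911));  // placeholder port
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        ByteBuffer buffer = ByteBuffer.allocateDirect(64 * 1024);
        while (true) {
            selector.select();                      // the only blocking point
            Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
            while (keys.hasNext()) {
                SelectionKey key = keys.next();
                keys.remove();
                if (key.isAcceptable()) {
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    buffer.clear();
                    int n = ((SocketChannel) key.channel()).read(buffer);
                    // hand the buffered bytes to the decoding/task threads here
                    if (n < 0) key.channel().close();
                }
            }
        }
    }
}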

So, if a thread results waiting at all for any other action to complete
or be ready, it's said to "block". While a thread is blocked, the CPU or
core just skips it in scheduling the preemptive multithreading, yet it
still takes some memory and other resources and is in the scheduler of
the threads.

The idea that the I/O thread, ever blocks, is that it's a feature of
select/epoll that hardware results waking it up, with the idea that
that's the only thread that ever blocks.

So, for the other threads, in the decryption/decompression/decoding and
coding/compression/cryption, the idea is that a thread, runs through
those, then returns what it's doing, and joins back to a limited pool of
threads, with a usual idea of there being 1 core : 1 thread, so that
multithreading is sort of simplified, because as far as the system
process is concerned, it has a given number of cores and the system
preemptively multithreads it, and as far as the virtual machine is
concerned, it has a given number of cores and the virtual machine
preemptively multithreads its threads, about the thread-of-control, in
the flow-of-control, of the thing.

A usual way that the routine multiplexes and demultiplexes objects in the
protocol from a flow's input back to a flow's output, has that the
thread-per-connection model has that a single thread carries out the
entire task through the backend flow, blocking along the way, until it
results joining after writing back out to its connection. Yet, that has
a thread per each connection, and threads use scheduling and heap
resources. So, here thread-per-connection is being avoided.

Then, a usual idea of the tasks, is that as I/O is received and flows
into the decryption/decompression/decoding, then what's decoded results
the specification of a task, the command, and the connection where to
return its result. The specification is a data structure, so it's an
object or Object, then. This is added to a queue of tasks, where
"buffers" represent the ephemeral storage of content in transport, the
byte-sequences, while the queue is, as usual, a first-in/first-out
(FIFO) queue of tasks.

Then, the idea is that each of the cores consumes task specifications
from the task queue, performs them according to the task specification,
then the results are written out, as coded/compressed/crypted, in the
protocol.
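
A minimal sketch of that task queue and its per-core consumers, with
the standard java.util.concurrent pieces, and the Task shape here only
illustrative:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch: the I/O side enqueues decoded task specifications, and one
// consumer thread per core takes them off the FIFO queue and performs
// them, handing results back toward the coding and the writer.
class TaskQueueSketch {
    static class Task {
        final Object command;
        final Object connection;
        Task(Object command, Object connection) {
            this.command = command;
            this.connection = connection;
        }
    }

    static final BlockingQueue<Task> tq = new LinkedBlockingQueue<>();

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        for (int i = 0; i < cores; i++) {
            Thread consumer = new Thread(() -> {
                while (true) {
                    try {
                        Task task = tq.take();  // FIFO; blocks only when empty
                        // perform task.command, write result for task.connection
                    } catch (InterruptedException e) {
                        return;
                    }
                }
            });
            consumer.start();
        }
    }
}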

So, to avoid the threads blocking at all, introduces the idea of
"asynchrony" or callbacks, where the idea is that the "blocking" and
"synchronous" has that anywhere in the threads' thread-of-control
flow-of-control, according to the program or the routine, it is current
and synchronous, the value that it has, then with regards to what it
returns or writes, as the result. So, "asynchrony" is the idea that
there's established a callback, or a place to pause and continue, then a
specification of the task in the protocol is put to an event queue and
executed, or from servicing the O/I's of the backend flow, that what
results from that, has the context of the callback and returns/writes to
the relevant connection, its result.

I -> flow -> protocol -> routine -> protocol -> flow -> O -v
O <- flow <- protocol <- routine <- protocol <- flow <- I <-


The idea of non-blocking then, is that a routine either provides a
result immediately available, and is non-blocking, or, queues a task
what results a callback that provides the result eventually, and is
non-blocking, and never invokes any other routine that blocks, so is
non-blocking.

This way a thread, executing tasks, always runs through a task, and thus
services the task queue or TQ, so that the cores' threads are always
running and never blocking. (Besides the I/O and O/I threads which block
when there's no traffic, and usually would be constantly woken up and
not waiting blocked.) This way, the TQ threads, only block when there's
nothing in the TQ, or are just deconstructed, and reconstructed, in a
"pool" of threads, the TQ's executor pool.

Enter the ReRoutine

The idea of a ReRoutine, a re-routine, is that it is a usual procedural
implementation as if it were synchronous, and agnostic of callbacks.

It is named after "routine" and "co-routine". It is a sort of
co-routine that builds a monad and is aware of its originating caller,
re-caller, and callback, or, its re-routine caller, re-caller, and
callback.

The idea is that there are callbacks implicitly at each method boundary,
and that nulls are reserved values to indicate the result or lack
thereof of re-routines, so that the code has neither callbacks nor any
nulls.

The originating caller has that the TQ has a task specification, the
session+attachment of the client in the protocol where to write the
output, and the command, then the state of the monad of the task, that
lives on the heap with the task specification and task object. The TQ
consumers or executors or the executor, when a thread picks up the
task, it picks up or builds ("originates") the monad state, which is
the partial state of the re-routine and a memo of the partial state of
the re-routine, and installs this in the thread local storage or
ThreadLocal, for the duration of the invocation of the re-routine. Then
the thread enters the re-routine, which proceeds until it would block,
where instead it queues a command/task with a callback to re-call it
and re-launch it, then throws a NullPointerException and quits/returns.
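
A sketch of that pickup, the NullPointerException being the quit
(ReRoutine, originate, and the Consumer callback are hypothetical names
here; ThreadGlobals and MonadMemo are as sketched further down):

import java.util.function.Consumer;

interface ReRoutine<A, R> {
    R apply(A argument);   // plain synchronous flow-of-control, no callbacks, no nulls
}

class OriginatorSketch {
    static <A, R> void originate(ReRoutine<A, R> rr, A arg, MonadMemo memo, Consumer<R> callback) {
        ThreadGlobals.monadMemo.set(memo);      // install the partial monad state for this invocation
        try {
            R result = rr.apply(arg);           // proceeds until it completes, or until it would block
            callback.accept(result);            // completed: the original callback gets the result
        } catch (NullPointerException pending) {
            // the quit: a task with a callback to re-call and re-launch has already been queued
        } finally {
            ThreadGlobals.monadMemo.remove();   // clear the ThreadLocal before taking the next task
        }
    }
}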

This happens recursively and iteratively in the re-routine implemented
as re-routines, each re-routine updates the partial state of the monad,
then that as a re-routine completes, it re-launches the calling
re-routine, until the original re-routine completes, and it calls the
original callback with the result.

This way the re-routine's method body is written as plain declarative
procedural code; the flow-of-control is exactly as if it were
synchronous code, written in the language with no callbacks and never
nulls, and with exception-handling exactly as defined by the language.

As the re-routine accumulates the partial results, they live on the
heap, in the monad, as a member of the originating task's object, the
task in the task queue. This is always added back to the queue as one
of the pending results of a re-routine, so it stays referenced as an
object on the heap, then that as it is completed and the original
re-routine returns, then it's no longer referenced and the
garbage-collector can reclaim it from the heap or the allocator can
delete it.







Well, for the re-routine, I sort of figure there's a Callstack and a
Callback type

class Callstack {
    Stack<Callback> callstack;
}

interface Callback {
    void callback() throws Exception;
}

and then a placeholder sort of type for Callflush

class Callflush {
    Callstack callstack;
}

with the idea that the presence in ThreadLocals is to be sorted out,
about a kind of ThreadLocal static pretty much.

With not returning null and for memoizing call-graph dependencies,
there's basically a need for an "unvoid" type.

class unvoid {

}

Then it's sort of figured that there's an interface with some
defaults, with the idea that some boilerplate gets involved in the
Memoization.

interface Caller {}

interface Callee {}

interface Callmemo {
    void memoize(Caller caller, Object[] args);
    void flush(Caller caller);
}


Then it seems that the Callstack should instead be of a Callgraph, and
then what's maintained from call to call is a Callpath, and then what's
memoized is all kept with the Callgraph, then with regards to objects on
the heap and their distinctness, only being reachable from the
Callgraph, leaving less work for the garbage collector, to maintain the
heap.

The interning semantics would still be on the class level, or for
constructor semantics, as with regards to either interning Objects for
uniqueness, or that otherwise they'd be memoized, with the key being the
Callpath, and the initial arguments into the Callgraph.

Then the idea seems that the ThreaderCaller, establishes the Callgraph
with respect to the Callgraph of an object, installing it on the thread,
otherwise attached to the Callgraph, with regards to the ReRoutine.



About the ReRoutine, it's starting to come together as an idea, what
is the apparatus for invoking re-routines: that they build the monad of
the IOE's (inputs, outputs, exceptions) of the re-routines in their
call-graph, in terms of ThreadLocals, some ThreadLocals that callers of
the re-routines maintain, with the idea of the memoized monad along the
way, and each original re-routine.

class IOE<O, E extends Exception> {
    Object[] input;
    O output;
    E exception;
}

So the idea is that there are some ThreadLocals in a static ThreadGlobals

public class ThreadGlobals {
    public static ThreadLocal<MonadMemo> monadMemo;
}

where callers or originators or ReRoutines, keep a map of the Runnables
or Callables they have, to the MonadMemo's,

class Originator {
    Map<? extends ReRoutineMapKey, MonadMemo> monadMemoMap;
}

then when it's about to invoke a Runnable, if it's a ReRoutine, then it
either retrieves the MonadMemo or makes a new one, and sets it on the
ThreadLocal, then invokes the Runnable, then clears the ThreadLocal.

Then a MonadMemo, pretty simply, is a List of IOE's, that when the
ReRoutine runs through the callgraph, the callstack is indicated by a
tree of integers, the stack path in the ReRoutine, so that any
ReRoutine that calls ReRoutines A/B/C points to an IOE that it finds in
the thing; then its default behavior is to return its memo-ized value,
and otherwise to make the callback that fills its memo and re-invokes
all the way back the original routine, or just its own entry point.
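
A minimal sketch of that, assuming the call path down the callgraph is
kept as a list of integers:

import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MonadMemo {
    // the memo of IOE's, keyed by the path of integers down the callgraph/callstack
    final Map<List<Integer>, IOE<Object, Exception>> memo = new HashMap<>();

    // count of callbacks still outstanding, to no-op later re-launches
    int pending;
}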

This is basically that the Originator, when the ReRoutine quits out,
sort of has that any ReRoutine it originates, also gets filled up by the
Originator.

So, then the Originator sort of has a map to a ReRoutine, then for any
Path, the Monad, so that when it sets the ThreadLocal with the
MonadMemo, it also sets the Path for the callee, launches it again when
its callback returns to set its memo and relaunch it, then back up the
path stack to the original re-routine.

One of the issues here is "automatic parallelization". What I mean by
that is that the re-routine just goes along and when it gets nulls
meaning "pending" it just continues along, then expects
NullPointerExceptions as "UnsatisfiedInput", to quit, figuring it gets
relaunched when its input is satisfied.

This way, when routines don't serially depend on each other's outputs,
they all get launched apiece, parallelizing.
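
For example, in a re-routine body like this (the names are
hypothetical), rr2 and rr3 don't depend on each other, so both get
launched before the quit at rr4:

Result1 rr1(String a1) {
    Result2 r2 = reroutine2.rr2(a1);      // pending: returns null, launches asynchronously
    Result3 r3 = reroutine3.rr3(a1);      // doesn't need r2: also launches, in parallel
    Result4 r4 = reroutine4.rr4(r2, r3);  // null arguments: throws NullPointerException, quit
    return Result1.of(r4);                // reached on a later re-launch, once all are satisfied
}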

Then, I wonder about usual library code, basically about Collections
and Streams, and the usual sorts of routines that are applied to the
arguments, and how to basically establish that the rule of re-routine
code is that anything that gets a null must throw a
NullPointerException, so the re-routine will quit until the arguments
are satisfied, the inputs to library code. Then with the Memo being
stored in the MonadMemo, it's figured that will work out regardless of
the Objects' or primitives' values, with regards to Collections and
Stream code and the usual flow-of-control in Iterables for the for
loops, or whatever other application library code: they will be run
each time the re-routine passes their section with satisfied arguments.
Then, with regards to that, the Memo is just whatever serial order the
re-routine passes, not needing to look up by Object identity, which is
otherwise part of an interning pattern.

Map<String, String> rr1(String s1) {

    List<String> l1 = rr2.get(s1);

    Map<String, String> m1 = new LinkedHashMap<>();

    l1.stream().forEach(s -> m1.put(s, rr3.get(s)));

    return m1;
}

See, what I figure is that the order of the invocations to rr3.get()
is serial, so it really only needs to memoize its OE,
Output|Exception. Then there's the matter of putting null values in the
Map, and having to check the values in the Map for null values, and
otherwise making it so that the semantics of null and
NullPointerException result that satisfying inputs result calls, and
unsatisfying inputs result quits, figuring those unsatisfying inputs
are results of unsatisfied outputs, that will be satisfied when the
callee gets its memo populated and makes the callback.

If the order of invocations is out-of-order, that gets again into
whether the Object/primitive by value needs to be the same each time,
IOE, about the library code in Collections, Streams, parallelStream,
and Iterables, and basically otherwise that any kind of library code
should throw NullPointerException if it gets an "unexpected" null or
what doesn't fulfill it.

The idea though that rr3 will get invoked say 1000 times with the rr2's
result, those each make their call, then re-launch 1000 times, has that
it's figured that the Executor, or Originator, when it looks up and
loads the "ReRoutineMapKey", is to have the count of those and whether
the count is fulfilled, then to no-op later re-launches of the
call-backs, after all the results are populated in the partial monad memo.

Then, there's perhaps instead as that each re-routine just checks its
input or checks its return value for nulls, those being unsatisfied.

(The exception handling, thoroughly, or what happens when rr3 throws,
and this kind of thing, is involved thoroughly in library code.)

The idea is it remains correct if the worst thing nulls do is throw
NullPointerException, because that's just a usual quit and means
another re-launch is coming up, and it automatically queues each of the
derivations for asynchronous parallel invocation while never blocking.

It's figured that re-routines check their inputs for nulls, and throw
quit, and check their inputs for library container types, checking any
member of a library container collection for null, to throw quit, and
then it will result that the automatic asynchronous parallelization
proceeds, while the re-routines are never blocking, there's only as
much memory on the heap of the monad as would be in the lifetime of the
original re-routine, and whatever re-calls or re-launches of the
re-routine established as local state in local variables and library
code, would come in and out of scope according to plain stack
unwinding.
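
A sketch of that checking, as helpers a re-routine calls on its inputs
(hypothetical names), where throwing is the quit:

import java.util.Collection;

class Unsatisfied {
    // throws the quit if the value is pending (null), otherwise passes it through
    static <T> T require(T value) {
        if (value == null) {
            throw new NullPointerException("unsatisfied input: pending");
        }
        return value;
    }

    // same for a library container: any null member means the input isn't yet satisfied
    static <T> Collection<T> requireAll(Collection<T> values) {
        for (T value : require(values)) {
            require(value);
        }
        return values;
    }
}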

Then there's still the perceived deficiency that the re-routine's
method body will be run many times, yet it's only run as many times as
result in throwing-quit, when it reaches where its argument to a
re-routine or result value isn't yet satisfied, yet is pending.

It would re-run the library code any number of times, until it results
all non-nulls, then the resulting satisfied argument to the following
re-routines, would be memo-ized in the monad, and the return value of
the re-routine thus returning immediately its value on the partial monad.

This way each re-call of the re-routine, mostly encounters its own monad
results in constant time, and throws-quit or gets thrown-quit only when
it would be unsatisfying, with the expectation that whatever
throws-quit, either NullPointerException or extending
NullPointerException, will have a pending callback, that will queue on a
TQ, the task specification to re-launch and re-enter the original or
derived, re-routine.

The idea is sort of that it's sort of, Java with non-blocking I/O and
ThreadLocal (1.7+, not 17+), or you know, C/C++ with non-blocking I/O
and thread local storage, then for the abstract or interface of the
re-routines, how it works out that it's a usual sort of model of
co-operative multithreading, the re-routine, the routine "in the language".


Then it's great that the routine can be stubbed or implemented
agnostic of asynchrony, and declared in the language with standard
libraries, basically using the semantics of exception handling and the
convention of re-launching callbacks to implement thread-of-control
flow-of-control, and that it can be implemented as synchronous and
blocking for unit tests and modules of the routine, making a great
abstraction of flow-of-control.
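
For instance, a unit test just plugs in a synchronous, blocking,
in-memory edition of the interface (against the hypothetical ReRoutine
interface sketched above), and the calling code reads the same either
way:

import java.util.Arrays;
import java.util.List;

// a blocking, in-memory edition for unit tests: it never returns null, so the
// caller runs straight through in ordinary synchronous flow-of-control
class SynchronousGroups implements ReRoutine<String, List<String>> {
    public List<String> apply(String group) {
        return Arrays.asList("message-1", "message-2");   // computed directly, possibly blocking
    }
}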


Basically anything that _does_ block then makes for having its own
thread, whose only job is to block and when it unblocks, throw-toss the
re-launch toward the origin of the re-routine, and consume the next
blocking-task off the TQ. Yet, the re-routines and their servicing the
TQ only need one thread and never block. (And scale in core count and
automatically parallelize asynchronous requests according to satisfied
inputs.)


Mostly the idea of the re-routine is "in the language, it's just plain,
ordinary, synchronous routine".
Ross Finlayson
2024-04-25 17:46:48 UTC
Reply
Permalink
On 04/22/2024 10:06 AM, Ross Finlayson wrote:


Protocol Establishment

Each of these protocols is a combined sort of protocol: according to
different modes, there's established a protocol, then data flows in the
protocol (in time).


stream-based (connections)
    sockets, TCP/IP
    sctp, SCTP
message-based (datagrams)
    datagrams, UDP

The idea is that connections can have state and session state, while,
messages do not.

Abstractly then there's just that connections make for reading from the
connection, or writing to the connection, byte-by-byte,
while messages make for receiving a complete message, or writing a
complete message. SCTP is sort of both.

A bit more concretely, the non-blocking or asynchronous or vector I/O,
means that when some bytes arrive the connection is readable, and while
the output buffer is not full a connection is writeable.

For messages it's that when messages arrive messages are readable, and
while the output buffer is not full messages are writeable.

Otherwise, bytes or messages that arrive while not readable/writeable
pile up, and in cases of limited resources get lost.

So, the idea is that when bytes arrive, whatever's servicing the I/O's
has that the connection has data to read, and, data to write.
The usual idea is that an abstract Reader thread, will give any or all
of the connections something to read, in an arbitrary order,
at an arbitrary rate, then the role of the protocol, is to consume the
bytes to read, thus releasing the buffers, that the Reader, writes to.
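
A sketch of that single Reader over java.nio's Selector, assuming the
connections' channels are registered for reads and that each key's
attachment is that connection's input ByteBuffer:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

class ReaderThread implements Runnable {
    final Selector selector;

    ReaderThread(Selector selector) { this.selector = selector; }

    public void run() {
        try {
            while (true) {
                selector.select();   // the only blocking call: wakes when any connection is readable
                for (SelectionKey key : selector.selectedKeys()) {
                    if (key.isValid() && key.isReadable()) {
                        SocketChannel channel = (SocketChannel) key.channel();
                        ByteBuffer buffer = (ByteBuffer) key.attachment();   // per-connection buffer
                        channel.read(buffer);   // copy from the network interface into the program's buffer
                        // hand the buffer off to the protocol layers; other threads do the rest
                    }
                }
                selector.selectedKeys().clear();
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}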

Inputting/Reading
Writing/Outputting

The most usual idea of client-server is that
client writes to server then reads from server, while,
server reads from client then writes to client.

Yet, that is just a mode, reads and writes are peer-peer,
reads and writes in any order, while serial according to
that bytes in the octet stream arrive in an order.

There isn't much consideration here of the out-of-band, about sockets
and the STREAMS protocol, whereby bytes can arrive out-of-band.


So, the layers of the protocol result that some layers of the protocol
don't know anything about the protocol; all they know is sequences of
bytes, and whatever session state is involved to implement the codec of
the layers of the protocol. All they need to know is that, given that
all previous bytes are read/written, the connection's state is
synchronized, and everything after is read/written through the layer.
Mostly, once encryption or compression is set up, it's never torn down.

Encryption, TLS
Compression, LZ77 (Deflate, gzip)

The layers of the protocol, result that some layers of the protocol,
only indicate state or conditions of the session.

SASL, Login, AuthN/AuthZ

So, for NNTP, a connection usually enough starts with no layers, then
in the various protocols and layers, combinations of the protocols and
layers get negotiated and get established. Other protocols expect to
start with layers, or not, it varies.

Layering, then, either is in the protocol, to synchronize the session,
then establish the layer in the layer protocol, then maintain the layer
in the main protocol. TLS makes a handshake to establish an encryption
key for all the data, then the TLS layer only needs to encrypt and
decrypt the data by that key, while for Deflate, it's usually the only
option, then after it's set up as a layer, then everything either way,
reads/writes, gets compressed.


client -> REQUEST
RESPONSE <- server

In some protocols these interleave

client -> REQUEST1
client -> REQUEST2

RESPONSE1A <- server
RESPONSE2A <- server
RESPONSE1B <- server
RESPONSE2B <- server

This then is called multiplexing/demultiplexing, for protocols like IMAP
and HTTP/2,
and another name for multiplexer/demultiplexer is mux/demux.




So, for TLS, the idea is that usually most or all of the connections
will be using the same algorithms with different keys, and each
connection will have its own key, so the idea is to completely separate
TLS establishment from the TLS cryptec (crypt/decrypt), so the layer
need only key up the bytes by the connection's key, in their TLS
frames.
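
In Java that separation maps roughly onto javax.net.ssl.SSLEngine,
where the handshake ("TLS establishment") gets driven elsewhere and the
per-connection cryptec is then just wrap/unwrap; a sketch:

import java.nio.ByteBuffer;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLEngineResult;
import javax.net.ssl.SSLException;

class Cryptec {
    final SSLEngine engine;   // one engine, and thus one negotiated key, per connection

    Cryptec(SSLContext context) {
        engine = context.createSSLEngine();
        engine.setUseClientMode(false);   // the server side of the connection
    }

    SSLEngineResult crypt(ByteBuffer plain, ByteBuffer wire) throws SSLException {
        return engine.wrap(plain, wire);     // key up the outgoing bytes into TLS frames
    }

    SSLEngineResult decrypt(ByteBuffer wire, ByteBuffer plain) throws SSLException {
        return engine.unwrap(wire, plain);   // and the incoming TLS frames back into bytes
    }
}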

Then, most of the connections will use compression, then the idea is
that the data is stored at rest compressed already and in a form that
can be concatenated, and that similarly, as constants are a bunch of
the textual content of the text-based protocol, they have compressed
and concatenable constants, with the idea that the Deflate compec
(comp/decomp) just passes those along concatenating them, or actively
compresses/decompresses buffers of bytes or as of sequences of bytes.
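
A plain buffer-at-a-time compec over java.util.zip (not the
concatenable pre-compressed constants, just the active
compress/decompress), as a sketch:

import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class Compec {
    static byte[] comp(byte[] plain) {
        Deflater deflater = new Deflater();
        deflater.setInput(plain);
        deflater.finish();
        byte[] out = new byte[plain.length * 2 + 64];   // sized generously for a one-shot sketch
        int n = deflater.deflate(out);
        deflater.end();
        return Arrays.copyOf(out, n);
    }

    static byte[] decomp(byte[] compressed, int plainLength) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] out = new byte[plainLength];
        int n = inflater.inflate(out);
        inflater.end();
        return Arrays.copyOf(out, n);
    }
}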

The idea is that Readers and Writers deal with bytes at a time,
arbitrarily many, then that what results being passed around as the
data is, as much as possible, handles to the data. So, the protocol and
layers indicate the types that the command routines get and return, so
that the command routines can get specialized, when the data at rest is
already layerized, and otherwise adapt to the more concrete abstraction
of the non-blocking, asynchronous, and vector I/O, of what results the
flow-machine.


When the library of the runtime of the framework of the language
provides the cryptec or compec, then there are issues when it doesn't
make it so for something like "I will read and write you the bytes as
of making a TLS handshake, then return the algorithm and the key and
that will implement the cryptec", or, "compec, here's either some data
or handles of various types, send them through"; it's to be figured
out. The idea for the TLS handshake is basically to sit in the middle,
i.e. to read and write bytes as of what the client and server send,
then figuring out what is the algorithm and key and then just using
that as the cryptec. Then after the TLS algorithm and key are
established the rest is sort of discarded, though there's some idea
about state and session, for the session key feature in TLS. TLS 1.2
also includes comp/decomp, though it's figured that instead it's a
feature of the protocol whether it supports compression, the point
being that's combining layers, and to be implemented about these
byte-sequences/handles.


mux/demux
crypt/decrypt
comp/decomp
cod/decod

codec


So, the idea is to implement toward the concrete abstraction of
nonblocking vector I/O, while remaining agnostic of that, so that all
sorts of the usual test routines, and particularly the composition of
layers and the establishment and upgrade of protocols, can happen.


Then, from the byte sequences or messages as byte sequences, or
handles of byte sequences, results that in the protocol, the protocol
either way in/out has a given expected set of alternatives that it can
read, then as derivative of those, what it will write.

So, after the layers, which are agnostic of anything but byte-sequences,
and their buffers and framing and chunking and so on, then is the
protocol, or protocols, of the command-set and request/response
semantics, and ordering/session statefulness, and lack thereof.

Then, a particular machine in the flow-machine is as of the
"Recognizer" and "Parser", then what results "Annunciators" and
"Legibilizers", as it were, of what's usually enough called
"Deserialization", reading off from a serial byte-sequence, and
"Serialization", writing off to a serial byte-sequence: first the text
of the commands or the structures in these text-based protocols, the
commands and their headers/bodies/payloads, then the Objects in the
object types of the languages of the runtime, where then the routines
of the servicing of the protocol are defined in types according to the
domain types of the protocol (and their representations as
byte-sequences and handles).

As packets and bytes arrive in the byte-sequence, the Recognizer/Parser
detects when there's a fully-formed command, and its payload, after the
Mux/Demux Demultiplexer, has that the Demultiplexer represents any
given number of separate byte-sequences, then according to the protocol
anything of their statefulness/session or orderedness/unorderedness.

So, the Demultiplexer is to Recognize/Parse from the combined input
byte-stream its chunks, that now the connection has any number of
ordered/unordered byte-sequences, then usually that those are ephemeral
or come and go, while the connection endures, with the most usual
notion that there's only one stream and it's ordered in requests and
ordered in responses, then whether commands get pipelined and requests
need not await their responses (they're ordered), and whether commands
are numbered and their responses get associated with their command
sequence numbers (they're unordered and the client has its own
mux/demux to relate them).

So, the Recognizer/Parser, theoretically only gets a byte at a time, or
even none, and may get an entire fully-formed message (command), or not,
and may get more bytes than a fully-formed message, or not, and the
bytes may be a well-formed message, or not, and valid, or not.

Then the job of the Recognizer/Parser is, from the beginning of the
byte-sequence, to Recognize a fully-formed message, then to create an
instance of the command object related to the handle back through the
mux/demux to the multiplexer, called the attachment to the connection,
or the return address according to the attachment representing any
routed response, usually meaning that the attachment is the user-data
and any session data attached to the connection, and here of the
mux/demux of the connection. The job of the Recognizer/Parser is to
work any time input is received, then to recognize and parse any number
of fully-formed messages from the input, create those Commands
according to the protocol, such that the attachment includes the return
destination, and thusly release those buffers or advance the marker on
the Input byte-sequence, so that the resources are freed, and later
Recognizings/Parsings start where they left off.

The idea is that bytes arrive, the Recognizer/Parser has to determine
when there's a fully-formed message, consume that, and service the
buffers of the byte-sequence, having created the derived command.
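
For a text-based protocol like NNTP, where a command is a
CRLF-terminated line, a sketch of that Recognizer over the
per-connection input buffer (assumed left in write mode between reads):

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class Recognizer {
    // returns the next fully-formed command line, or null when none has fully arrived yet
    static String recognize(ByteBuffer buf) {
        buf.flip();   // read mode: position 0, limit at the end of the received bytes
        for (int i = buf.position(); i + 1 < buf.limit(); i++) {
            if (buf.get(i) == '\r' && buf.get(i + 1) == '\n') {
                byte[] line = new byte[i - buf.position()];
                buf.get(line);                       // consume the command's bytes
                buf.position(buf.position() + 2);    // and its CRLF
                buf.compact();                       // release what was consumed, keep the rest
                return new String(line, StandardCharsets.US_ASCII);
            }
        }
        buf.compact();   // nothing fully-formed yet: keep everything for the next arrival
        return null;
    }
}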

Now, commands are small, only so many words, then the
headers/body/payload basically get larger and later unboundedly
large. Then, the idea is that the protocol has certain modes or
sub-protocols, about "switching protocols", or modes, where basically
the service of the routine changes from recognizing and servicing the
beginning-to-ending of a command, to recognizing and servicing an
arbitrarily large payload, or, for example, entering a mode where
streamed data of whatever sort arrives, according to the length or
content of the sub-protocol format. The Recognizer's job includes
these sub-protocol-streaming modes, where a "sub-protocol" is a sort
of "switching protocols", the only idea though being going into the
sub-protocol then back out to the main protocol, while "switching
protocols" is involved in basically any establishment or upgrade of
the protocol, with regards to the stateful connection (and not
stateless messages, which are always according to their established,
or simply some fixed, protocol).
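
A minimal sketch of those modes, for NNTP's multi-line data blocks
that end with a line containing just "." (the names here are only
illustrative):

// The Recognizer's modes: either recognizing command lines, or streaming
// an arbitrarily large payload through until its terminator.
enum Mode { COMMAND, PAYLOAD }

class ModalRecognizer {
    private Mode mode = Mode.COMMAND;

    // Feed one recognized line; returns to COMMAND mode when a payload's
    // terminating "." line is seen (NNTP multi-line data block).
    void onLine(String line) {
        switch (mode) {
            case COMMAND:
                if (startsPayload(line)) {
                    mode = Mode.PAYLOAD;   // e.g. after POST, the article follows
                }
                // ... otherwise dispatch the command ...
                break;
            case PAYLOAD:
                if (line.equals(".")) {
                    mode = Mode.COMMAND;   // end of the multi-line block
                } else {
                    // ... stream the payload line onward, don't buffer it all ...
                }
                break;
        }
    }

    private boolean startsPayload(String line) {
        return line.regionMatches(true, 0, "POST", 0, 4);  // illustrative only
    }
}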

This way unboundedly large inputs don't actually live in the buffers
of the Recognizers that service the buffers of the Inputters/Readers
and Multiplexers/Demultiplexers; instead they define modes where they
will be streaming through arbitrarily large payloads.

Here for NNTP and so on, the payloads are not considered arbitrarily
large, though it's sort of a thing that sending or receiving the
payload of each message can be defined this way, so that with very,
very limited buffer resources, the flow-machine keeps flowing.


Then, here, the idea is that these commands and their payloads have
their outputs derived as a function of the inputs. Abstractly,
however that so occurs is however it occurs. The idea here is that
the attachment+command+payload makes a re-routine task, and is pushed
onto a task queue (TQ). Then it's figured that the TQ represents
abstractly the execution of all the commands. Then, however many Task
Workers or TW, or the TQ that runs itself, get the oldest task from
the queue (FIFO) and run it. When it's complete, then there's a
response ready in byte-sequences or handles, and these are returned
to the attachment.

(The "attachment" usually just means a user or private datum associated
with the connection to identify its session with the connection
according to non-blocking I/O, here it also means the mux/demux
"remultiplexer" attachment, it's the destination of any response
associated with a stream of commands over the connection.)
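
A minimal sketch of that TQ and its TW's (names illustrative; the
real thing wants the re-routine semantics rather than a blocking
take()):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// The task: the attachment (where the response goes), the command, and
// its payload handle, pushed onto the task queue as one unit.
class Task {
    final Object attachment;   // return destination through the mux/demux
    final Object command;      // the parsed command object
    final Object payload;      // handle to the body, if any

    Task(Object attachment, Object command, Object payload) {
        this.attachment = attachment;
        this.command = command;
        this.payload = payload;
    }
}

class TaskQueue {
    private final BlockingQueue<Task> tq = new LinkedBlockingQueue<>();

    void submit(Task task) { tq.add(task); }

    // A Task Worker: oldest task first (FIFO), run it, return the
    // response to the task's attachment.
    void workerLoop() throws InterruptedException {
        while (true) {
            Task task = tq.take();
            Object response = service(task.command, task.payload);
            respond(task.attachment, response);
        }
    }

    Object service(Object command, Object payload) { return command; } // placeholder
    void respond(Object attachment, Object response) { /* write back out */ }
}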

So, here then the TQ basically has the idea of the re-routine, that
is non-blocking and involves the asynchronous fulfillment of the
routine in the domain types of the domain of object types that the
protocol adapts as an adapter, that the domain types fulfill as
adapted. Then for NNTP that's like groups and messages and summaries
and such, the objects. For IMAP it's mailboxes and messages to read,
for SMTP it's emails to send, with various protocols in SMTP being
separate protocols like DKIM or what, for all these sorts of
protocols. For HTTP and HTTP/2 it's the usual HTTP verbs, usually
HTTP 1.1 serial and pipelined requests over a connection, in HTTP/2
multiplexed requests over a connection. Then "session" means broadly
that it may be across connections, what gets into the attachment and
the establishment and upgrade of protocol, that sessions are stateful
thusly, yet granularly, as to connections yet as to each request.
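
For NNTP, a minimal sketch of such domain types (the names and the
water-mark accessors here are just illustrative):

// Illustrative domain types the NNTP protocol adapts to: groups and
// articles looked up by message-ID, returned as handles.
interface Group {
    String name();
    long lowWaterMark();
    long highWaterMark();
}

interface Article {
    String messageId();
    Object headersHandle();   // handle to the headers' byte-sequence
    Object bodyHandle();      // handle to the body's byte-sequence
}

interface Groups {
    Group group(String name);
    Article article(String messageId);
}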


Then, the same sort of thing applies to the back-end, whatever makes
for adapters, to domain types, that have their protocols, and what
results the O/I side to the I/O side: the I/O side is the server's
client-facing side, while the O/I side is the
server-as-a-client-to-the-backend's side.

Then, the O/I side is just the same sort of idea: in the
flow-machine, the protocols get established in their layers, so that
all through the routine, the domain types are to get specialized to
when byte-sequences and handles are known well-formed in compatible
protocols, so that the domain and protocol come together in their
definition. Basically it results that from the back-end are retrieved
messages by their message-ID that are stored compressed at rest, to
result passing back handles to those, for example a memory-map range
offset into an open handle of a zip file that has the concatenable
entry of the message-ID from the group's day's messages, or a list of
those for a range of messages, then the re-routine results passing
the handles back out to the attachment, which sends them right out.
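
A minimal sketch of that back-end lookup, assuming, just for
illustration, one zip archive per group per day with entries named by
message-ID, and returning a handle rather than copying bytes:

import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Retrieve a message stored compressed at rest, returning a handle
// (here just an InputStream over the entry) instead of the bytes.
class MessageStore {
    private final ZipFile daysMessages;   // e.g. the group's day's archive

    MessageStore(ZipFile daysMessages) {
        this.daysMessages = daysMessages;
    }

    InputStream open(String messageId) throws IOException {
        ZipEntry entry = daysMessages.getEntry(messageId);
        if (entry == null) {
            throw new IOException("no such message: " + messageId);
        }
        return daysMessages.getInputStream(entry);
    }
}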

So, this way, besides the TQ and its TW's, which are to never block
or be long-running, anything that's long-running is on the O/I side,
and has its own resources, buffers, and so on, where of course all
the resources here of this flow-machine are shared by all the
flow-machines in the flow-machine, in the sense that they are not
shared yet come from a common resource altogether, and are exclusive.
(This gets into the definition of "share" as with regards to "free to
share, or copy" and "exclusive to share, a.k.a. taking turns, not
cutting in line, and not stealing nor hoarding".)


Then on the O/I side or the backend side, it's figured the backend
is any kind of adapters, like DB adapters or FS adapters or WS
adapters, database or filesystem or webservice, where object-stores
are considered filesystem adapters. What that gets into is "pools",
like client pools, connection pools, resource pools, where a pool is
usually enough according to a session and the establishment of
protocol. Then, with regards to servicing the adapter, and according
to the protocol and the domain objects that thusly implement the
protocol, the backend side has its own dedicated routines and TW's,
or threads of execution: the backend side basically gets a
callback+request, and the job is to invoke the adapter with the
request, and invoke the callback with the response, then whether for
example the callback is actually the original attachment, or it
involves "bridging the unbounded sub-protocol", is what it means for
the adapter to service the command.

Then the adapter is usually either provided as with intermediate or
domain types, or, for example, it's just another protocol
flow-machine, and according to the connections or messaging or
mux/demux or establishing and upgrading layers and protocols, it
basically works the same way as above, in reverse.

Here "to service" is the usual infinitive that for the noun means "this
machine provides a service" yet as a verb that service means to operate
according to the defined behavior of the machine in the resources of the
machine to meet the resource needs of the machine's actions in the
capabilities and limits of the resources of the machine, where this "I/O
flow-machine: a service" is basically one "node" or "process" in a usual
process model, allocated its own quota of resources according to the
process and its environment model in the runtime in the system, and
that's it. So, there's servicing as the main routine, then also what it
means the maintenance servicing or service of the extended routine.
Then, for protocols it's "implement this protocol according to its
standards according to the resources in routine".


You know, I don't know where they have one of these anywhere, ....
Ross Finlayson
2024-04-27 16:01:43 UTC
Reply
Permalink
On 04/25/2024 10:46 AM, Ross Finlayson wrote:
> On 04/22/2024 10:06 AM, Ross Finlayson wrote:
>> On 04/20/2024 11:24 AM, Ross Finlayson wrote:
>>>
>>>
>>> Well I've been thinking about the re-routine as a model of cooperative
>>> multithreading,
>>> then thinking about the flow-machine of protocols
>>>
>>> NNTP
>>> IMAP <-> NNTP
>>> HTTP <-> IMAP <-> NNTP
>>>
>>> Both IMAP and NNTP are session-oriented on the connection, while,
>>> HTTP, in terms of session, has various approaches in terms of HTTP 1.1
>>> and connections, and the session ID shared client/server.
>>>
>>>
>>> The re-routine idea is this, that each kind of method, is memoizable,
>>> and, it memoizes, by object identity as the key, for the method, all
>>> its callers, how this is like so.
>>>
>>> interface Reroutine1 {
>>>
>>> Result1 rr1(String a1) {
>>>
>>> Result2 r2 = reroutine2.rr2(a1);
>>>
>>> Result3 r3 = reroutine3.rr3(r2);
>>>
>>> return result(r2, r3);
>>> }
>>>
>>> }
>>>
>>>
>>> The idea is that the executor, when it's submitted a reroutine,
>>> when it runs the re-routine, in a thread, then it puts in a ThreadLocal,
>>> the re-routine, so that when a re-routine it calls returns null, as it
>>> starts an asynchronous computation for the input, then when
>>> it completes, it submits the re-routine to the executor again.
>>>
>>> Then rr1 runs through again, retrieving r2 which is memoized,
>>> invokes rr3, which throws, after queuing to memoize and
>>> resubmit rr1, when that calls back to resubmit rr1, then rr1
>>> completes, signaling the original invoker.
>>>
>>> Then it seems each re-routine basically has an instance part
>>> and a memoized part, and that it's to flush the memo
>>> after it finishes, in terms of memoizing the inputs.
>>>
>>>
>>> Result1 rr(String a1) {
>>> // if a1 is in the memo, return for it
>>> // else queue for it and carry on
>>>
>>> }
>>>
>>>
>>> What is a re-routine?
>>>
>>> It's a pattern for cooperative multithreading.
>>>
>>> It's sort of a functional approach to functions and flow.
>>>
>>> It has a declarative syntax in the language with usual
>>> flow-of-control.
>>>
>>> So, it's cooperative multithreading so it yields?
>>>
>>> No, it just quits, and expects to be called back.
>>>
>>> So, if it quits, how does it complete?
>>>
>>> The entry point to re-routine provides a callback.
>>>
>>> Re-routines only return results to other re-routines;
>>> it's the default callback. Otherwise they just callback.
>>>
>>> So, it just quits?
>>>
>>> If a re-routine gets called with a null, it throws.
>>>
>>> If a re-routine gets a null, it just continues.
>>>
>>> If a re-routine completes, it callbacks.
>>>
>>> So, can a re-routine call any regular code?
>>>
>>> Yeah, there are some issues, though.
>>>
>>> So, it's got callbacks everywhere?
>>>
>>> Well, it's just got callbacks implicitly everywhere.
>>>
>>> So, how does it work?
>>>
>>> Well, you build a re-routine with an input and a callback,
>>> you call it, then when it completes, it calls the callback.
>>>
>>> Then, re-routines call other re-routines with the argument,
>>> and the callback's in a ThreadLocal, and the re-routine memoizes
>>> all of its return values according to the object identity of the
>>> inputs,
>>> then when a re-routine completes, it calls again with another
>>> ThreadLocal
>>> indicating to delete the memos, following the exact same
>>> flow-of-control
>>> only deleting the memos going along, until it results all the
>>> memos in
>>> the re-routines for the interned or ref-counted input are deleted,
>>> then the state of the re-routine is de-allocated.
>>>
>>> So, it's sort of like a monad and all in pure and idempotent functions?
>>>
>>> Yeah, it's sort of like a monad and all in pure and idempotent
>>> functions.
>>>
>>> So, it's a model of cooperative multithreading, though with no yield,
>>> and callbacks implicitly everywhere?
>>>
>>> Yeah, it's sort of figured that a called re-routine always has a
>>> callback in the ThreadLocal, because the runtime has pre-emptive
>>> multithreading anyways, that the thread runs through its re-routines in
>>> their normal declarative flow-of-control with exception handling, and
>>> whatever re-routines or other pure monadic idempotent functions it
>>> calls, throw when they get null inputs.
>>>
>>> Also it sort of doesn't have primitive types, Strings must always
>>> be interned, all objects must have a distinct identity w.r.t. ==, and
>>> null is never an argument or return value.
>>>
>>> So, what does it look like?
>>>
>>> interface Reroutine1 {
>>>
>>> Result1 rr1(String a1) {
>>>
>>> Result2 r2 = reroutine2.rr2(a1);
>>>
>>> Result3 r3 = reroutine3.rr3(r2);
>>>
>>> return result(r2, r3);
>>> }
>>>
>>> }
>>>
>>> So, I expect that to return "result(r2, r3)".
>>>
>>> Well, that's synchronous, and maybe blocking, the idea is that it
>>> calls rr2, gets a1, and rr2 constructs with the callback of rr1 and its
>>> own callback, and a1, and makes a memo for a1, and invokes whatever is
>>> its implementation, and returns null, then rr1 continues and invokes rr3
>>> with r2, which is null, so that throws a NullPointerException, and rr1
>>> quits.
>>>
>>> So, ..., that's cooperative multithreading?
>>>
>>> Well you see what happens is that rr2 invoked another re-routine or
>>> end routine, and at some point it will get called back, and that will
>>> happen over and over again until rr2 has an r2, then rr2 will memoize
>>> (a1, r2), and then it will callback rr1.
>>>
>>> Then rr1 had quit, it runs again, this time it gets r2 from the
>>> (a1, r2) memo in the monad it's building, then it passes a non-null r2
>>> to rr3, which proceeds in much the same way, while rr1 quits again until
>>> rr3 calls it back.
>>>
>>> So, ..., it's non-blocking, because it just quits all the time, then
>>> happens to run through the same paces filling in?
>>>
>>> That's the idea, that re-routines are responsible to build the
>>> monad and call-back.
>>>
>>> So, can I just implement rr2 and rr3 as synchronous and blocking?
>>>
>>> Sure, they're interfaces, their implementation is separate. If
>>> they don't know re-routine semantics then they're just synchronous and
>>> blocking. They'll get called every time though when the re-routine gets
>>> called back, and actually they need to know the semantics of returning
>>> an Object or value by identity, because, calling equals() to implement
>>> Memo usually would be too much, where the idea is to actually function
>>> only monadically, and that given same Object or value input, must return
>>> same Object or value output.
>>>
>>> So, it's sort of an approach as a monadic pure idempotency?
>>>
>>> Well, yeah, you can call it that.
>>>
>>> So, what's the point of all this?
>>>
>>> Well, the idea is that there are 10,000 connections, and any time
>>> one of them demultiplexes off the connection an input command message,
>>> then it builds one of these with the response input to the demultiplexer
>>> on its protocol on its connection, on the multiplexer to all the
>>> connections, with a callback to itself. Then the re-routine is launched
>>> and when it returns, it calls-back to the originator by its
>>> callback-number, then the output command response writes those back out.
>>>
>>> The point is that there are only as many Threads as cores so the
>>> goal is that they never block,
>>> and that the memos make for interning Objects by value, then the goal is
>>> mostly to receive command objects and handles to request bodies and
>>> result objects and handles to response bodies, then to call-back with
>>> those in whatever serial order is necessary, or not.
>>>
>>> So, won't this run through each of these re-routines umpteen times?
>>>
>>> Yeah, you figure that the runtime of the re-routine is on the order
>>> of n^2 the order of statements in the re-routine.
>>>
>>> So, isn't that terrible?
>>>
>>> Well, it doesn't block.
>>>
>>> So, it sounds like a big mess.
>>>
>>> Yeah, it could be. That's why to avoid blocking and callback
>>> semantics, is to make monadic idempotency semantics, so then the
>>> re-routines are just written in normal synchronous flow-of-control, and
>>> their well-defined behavior is exactly according to flow-of-control
>>> including exception-handling.
>>>
>>> There's that and there's basically it only needs one Thread, so,
>>> less Thread x stack size, for a deep enough thread call-stack. Then the
>>> idea is about one Thread per core, figuring for the thread to always be
>>> running and never be blocking.
>>>
>>> So, it's just normal flow-of-control.
>>>
>>> Well yeah, you expect to write the routine in normal
>>> flow-of-control, and to test it with synchronous and in-memory editions
>>> that just run through synchronously, and that if you don't much care if
>>> it blocks, then it's the same code and has no semantics about the
>>> asynchronous or callbacks actually in it. It just returns when it's
>>> done.
>>>
>>>
>>> So what's the requirements of one of these again?
>>>
>>> Well, the idea is, that, for a given instance of a re-routine, it's
>>> an Object, that implements an interface, and it has arguments, and it
>>> has a return value. The expectation is that the re-routine gets called
>>> with the same arguments, and must return the same return value. This
>>> way later calls to re-routines can match the same expectation,
>>> same/same.
>>>
>>> Also, if it gets different arguments, by Object identity or
>>> primitive value, the re-routine must return a different return value,
>>> those being same/same.
>>>
>>> The re-routine memoizes its arguments by its argument list, Object
>>> or primitive value, and a given argument list is same if the order and
>>> types and values of those are same, and it must return the same return
>>> value by type and value.
>>>
>>> So, how is this cooperative multithreading unobtrusively in
>>> flow-of-control again?
>>>
>>> Here for example the idea would be, rr2 quits and rr1 continues, rr3
>>> quits and rr1 continues, then reaching rr4, rr4 throws and rr1 quits.
>>> When rr2's or rr3's memo-callback completes, then it calls-back rr1. As
>>> those come in, at some point rr4 will be fulfilled, and thus rr4 will
>>> quit and rr1 will quit. When rr4's callback completes, then it will
>>> call-back rr1, which will finally complete, and then call-back whatever
>>> called rr1. Then rr1 runs itself through one more time to
>>> delete or decrement all its memos.
>>>
>>> interface Reroutine1 {
>>>
>>> Result1 rr1(String a1) {
>>>
>>> Result2 r2 = reroutine2.rr2(a1);
>>>
>>> Result3 r3 = reroutine3.rr3(a1);
>>>
>>> Result4 r4 = reroutine4.rr4(a1, r2, r3);
>>>
>>> return Result1.r4(a1, r4);
>>> }
>>>
>>> }
>>>
>>> The idea is that it doesn't block when it launches rr2 and rr3, until
>>> such time as it just quits when it tries to invoke rr4 and gets a
>>> resulting NullPointerException, then eventually rr4 will complete and be
>>> memoized and call-back rr1, then rr1 will be called-back and then
>>> complete, then run itself through to delete or decrement the ref-count
>>> of all its memo-ized fragmented monad respectively.
>>>
>>> Thusly it's cooperative multithreading by never blocking and always just
>>> launching callbacks.
>>>
>>> There's this System.identityHashCode() method and then there's a notion
>>> of Object pools and interning Objects then as for about this way that
>>> it's about numeric identity instead of value identity, so that when
>>> making memo's that it's always "==" and for a HashMap with
>>> System.identityHashCode() instead of ever calling equals(), when calling
>>> equals() is more expensive than calling == and the same/same
>>> memo-ization is about Object numeric value or the primitive scalar
>>> value, those being same/same.
>>>
>>> https://docs.oracle.com/javase/8/docs/api/java/lang/System.html#identityHashCode-java.lang.Object-
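>>>
>>> A minimal sketch of such a memo, keyed by reference identity, is
>>> just a java.util.IdentityHashMap, which hashes by
>>> System.identityHashCode and compares keys by ==, never calling
>>> equals():
>>>
>>> import java.util.IdentityHashMap;
>>> import java.util.Map;
>>>
>>> // Memo keyed by object identity: same Object in, same Object out.
>>> class Memo {
>>> private final Map<Object, Object> memo = new IdentityHashMap<>();
>>>
>>> Object get(Object input) { return memo.get(input); }
>>>
>>> void put(Object input, Object output) { memo.put(input, output); }
>>>
>>> // flush after the original re-routine completes
>>> void flush() { memo.clear(); }
>>> }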
>>>
>>>
>>>
>>>
>>> So, you figure to return Objects to these connections by their session
>>> and connection and mux/demux in these callbacks and then write those
>>> out?
>>>
>>> Well, the idea is to make it so that according to the protocol, the
>>> back-end sort of knows what makes a handle to a datum of the sort, given
>>> the protocol and the protocol and the protocol, and the callback is just
>>> these handles, about what goes in the outer callbacks or outside the
>>> re-routine, those can be different/same. Then the single writer thread
>>> servicing the network I/O just wants to transfer those handles, or, as
>>> necessary through the compression and encryption codecs, then write
>>> those out, well making use of the java.nio for scatter/gather and vector
>>> I/O in the non-blocking and asynchronous I/O as much as possible.
>>>
>>>
>>> So, that seems a lot of effort to just passing the handles, ....
>>>
>>> Well, I don't want to write any code except normal flow-of-control.
>>>
>>> So, this same/same bit seems onerous, as long as different/same has a
>>> ref-count and thus the memo-ized monad-fragment is maintained when all
>>> sorts of requests fetch the same thing.
>>>
>>> Yeah, maybe you're right. There's much to be gained by re-using monadic
>>> pure idempotent functions yet only invoking them once. That gets into
>>> value equality besides numeric equality, though, with regards to going
>>> into re-routines and interning all Objects by value, so that inside and
>>> through it's all "==" and System.identityHashCode, the memos, then about
>>> the ref-counting in the memos.
>>>
>>>
>>> So, I suppose you know HTTP, and about HTTP/2 and IMAP and NNTP here?
>>>
>>> Yeah, it's a thing.
>>>
>>> So, I think this needs a much cleaner and well-defined definition, to
>>> fully explore its meaning.
>>>
>>> Yeah, I suppose. There's something to be said for reading it again.
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>>
>> ReRoutines: monadic functional non-blocking asynchrony in the language
>>
>>
>> Implementing a sort of Internet protocol server, it sort of has three or
>> four kinds of machines.
>>
>> flow-machine: select/epoll hardware driven I/O events
>>
>> protocol-establishment: setting up and changing protocol (commands,
>> encryption/compression)
>>
>> protocol-coding: block coding in encryption/compression and wire/object
>> commands/results
>>
>> routine: inside the objects of the commands of the protocol,
>> commands/results
>>
>> Then, it often looks sort of like
>>
>> flow <-> protocol <-> routine <-> protocol <-> flow
>>
>>
>> On either outer side of the flow is a connection, it's a socket or the
>> receipt or sending of a datagram, according to the network interface and
>> select/epoll.
>>
>> The establishment of a protocol looks like
>> connection/configuration/commencement/conclusion, or setup/teardown.
>> Protocols get involved renegotiation within a protocol, and for example
>> upgrade among protocols. Then the protocol is setup and established.
>>
>> The idea is that a protocol's coding is in three parts for
>> coding/decoding, compression/decompression, and (en)cryption/decryption,
>> or as it gets set up.
>>
>> flow->decrypt->decomp->decod->routine->cod->comp->crypt->flow-v
>> flow<-crypt<-comp<-cod<-routine<-decod<-decomp<-decrypt<-flow<-
>>
>>
>>
>> Whenever data arrives, the idea goes, is that the flow is interpreted
>> according to the protocol, resulting commands, then the routine derives
>> results from the commands, as by issuing others, in their protocols, to
>> the backend flow. Then, the results get sent back out through the
>> protocol, to the frontend, the clients of what it serves the protocol
>> the server.
>>
>> The idea is that there are about 10,000 connections at a time, or more
>> or less.
>>
>> flow <-> protocol <-> routine <-> protocol <-> flow
>> flow <-> protocol <-> routine <-> protocol <-> flow
>> flow <-> protocol <-> routine <-> protocol <-> flow
>> ...
>>
>>
>>
>>
>> Then, the routine in the middle, has that there's one processor, and on
>> the processor are a number of cores, each one independent. Then, the
>> operating system establishes that each of the cores, has any number of
>> threads-of-control or threads, and each thread has the state of where it
>> is in the callstack of routines, and the threads are preempted so that
>> multithreading, that a core runs multiple threads, gives each thread
>> some running from the entry to the exit of the thread, in any given
>> interval of time. Each thread-of-control is thusly independent, while it
>> must synchronize with any other thread-of-control, to establish common
>> or mutual state, and threads establish taking turns by mutual exclusion,
>> called "mutex".
>>
>> Into and out of the protocol, coding, is either a byte-sequence or
>> block, or otherwise the flow is a byte-sequence, that being serial,
>> however the protocol multiplexes and demultiplexes messages, the
>> commands and their results, to and from the flow.
>>
>> Then the idea is that what arrives to/from the routine, is objects in
>> the protocol, or handles to the transport of byte sequences, in the
>> protocol, to the flow.
>>
>> A usual idea is that there's a thread that services the flow, where, how
>> it works is that a thread blocks waiting for there to be any I/O,
>> input/output, reading input from the flow, and writing output to the
>> flow. So, mostly the thread that blocks has that there's one thread that
>> blocks on input, and when there's any input, then it reads or transfers
>> the bytes from the input, into buffers. That's its only job, and only
>> one thread can block on a given select/epoll selector, which is any
>> given number of ports, the connections, the idea being that it just
>> blocks until select returns for its keys of interest, it services each
>> of the I/O's by copying from the network interface's buffers into the
>> program's buffers, then other threads do the rest.
>>
>> So, if a thread results waiting at all for any other action to complete
>> or be ready, it's said to "block". While a thread is blocked, the CPU or
>> core just skips it in scheduling the preemptive multithreading, yet it
>> still takes some memory and other resources and is in the scheduler of
>> the threads.
>>
>> The idea that the I/O thread, ever blocks, is that it's a feature of
>> select/epoll that hardware results waking it up, with the idea that
>> that's the only thread that ever blocks.
>>
>> So, for the other threads, in the decryption/decompression/decoding and
>> coding/compression/cryption, the idea is that a thread, runs through
>> those, then returns what it's doing, and joins back to a limited pool of
>> threads, with a usual idea of there being 1 core : 1 thread, so that
>> multithreading is sort of simplified, because as far as the system
>> process is concerned, it has a given number of cores and the system
>> preemptively multithreads it, and as far as the virtual machine is
>> concerned, it has a given number of cores and the virtual machine
>> preemptively multithreads its threads, about the thread-of-control, in
>> the flow-of-control, of the thing.
>>
>> A usual way that the routine multiplexes and demultiplexes objects in the
>> protocol from a flow's input back to a flow's output, has that the
>> thread-per-connection model has that a single thread carries out the
>> entire task through the backend flow, blocking along the way, until it
>> results joining after writing back out to its connection. Yet, that has
>> a thread per each connection, and threads use scheduling and heap
>> resources. So, here thread-per-connection is being avoided.
>>
>> Then, a usual idea of the tasks, is that as I/O is received and flows
>> into the decryption/decompression/decoding, then what's decoded, results
>> the specification of a task, the command, and the connection, where to
>> return its result. The specification is a data structure, so it's an
>> object or Object, then. This is added to a queue of tasks, where
>> "buffers" represent the ephemeral storage of content in transport the
>> byte-sequences, while, the queue is as usually a first-in/first-out
>> (FIFO) queue also, of tasks.
>>
>> Then, the idea is that each of the cores consumes task specifications
>> from the task queue, performs them according to the task specification,
>> then the results are written out, as coded/compressed/crypted, in the
>> protocol.
>>
>> So, to avoid the threads blocking at all, introduces the idea of
>> "asynchrony" or callbacks, where the idea is that the "blocking" and
>> "synchronous" has that anywhere in the threads' thread-of-control
>> flow-of-control, according to the program or the routine, it is current
>> and synchronous, the value that it has, then with regards to what it
>> returns or writes, as the result. So, "asynchrony" is the idea that
>> there's established a callback, or a place to pause and continue, then a
>> specification of the task in the protocol is put to an event queue and
>> executed, or from servicing the O/I's of the backend flow, that what
>> results from that, has the context of the callback and returns/writes to
>> the relevant connection, its result.
>>
>> I -> flow -> protocol -> routine -> protocol -> flow -> O -v
>> O <- flow <- protocol <- routine <- protocol <- flow <- I <-
>>
>>
>> The idea of non-blocking then, is that a routine either provides a
>> result immediately available, and is non-blocking, or, queues a task
>> what results a callback that provides the result eventually, and is
>> non-blocking, and never invokes any other routine that blocks, so is
>> non-blocking.
>>
>> This way a thread, executing tasks, always runs through a task, and thus
>> services the task queue or TQ, so that the cores' threads are always
>> running and never blocking. (Besides the I/O and O/I threads which block
>> when there's no traffic, and usually would be constantly woken up and
>> not waiting blocked.) This way, the TQ threads, only block when there's
>> nothing in the TQ, or are just deconstructed, and reconstructed, in a
>> "pool" of threads, the TQ's executor pool.
>>
>> Enter the ReRoutine
>>
>> The idea of a ReRoutine, a re-routine, is that it is a usual procedural
>> implementation as if it were synchronous, and agnostic of callbacks.
>>
>> It is named after "routine" and "co-routine". It is a sort of co-routine
>> that builds a monad and is aware of its originating caller, re-caller, and
>> callback, or, its re-routine caller, re-caller, and callback.
>>
>> The idea is that there are callbacks implicitly at each method boundary,
>> and that nulls are reserved values to indicate the result or lack
>> thereof of re-routines, so that the code has neither callbacks nor any
>> nulls.
>>
>> The originating caller has that the TQ, has a task specification, the
>> session+attachment of the client in the protocol where to write the
>> output, and the command, then the state of the monad of the task, that
>> lives on the heap with the task specification and task object. The TQ
>> consumers or executors or the executor, when a thread picks up the task,
>> it picks up or builds ("originates") the monad state, which is the
>> partial state of the re-routine and a memo of the partial state of the
>> re-routine, and installs this in the thread local storage or
>> ThreadLocal, for the duration of the invocation of the re-routine. Then
>> the thread enters the re-routine, which proceeds until it would block,
>> where instead it queues a command/task with callback to re-call it to
>> re-launch it, and throw a NullPointerException and quits/returns.
>>
>> This happens recursively and iteratively in the re-routine implemented
>> as re-routines, each re-routine updates the partial state of the monad,
>> then that as a re-routine completes, it re-launches the calling
>> re-routine, until the original re-routine completes, and it calls the
>> original callback with the result.
>>
>> This way the re-routine's method body, is written as plain declarative
>> procedural code, the flow-of-control, is exactly as if it were
>> synchronous code, and flow-of-control is exactly as if written in the
>> language with no callbacks and never nulls, and exception-handling as
>> exactly defined by the language.
>>
>> As the re-routine accumulates the partial results, they live on the
>> heap, in the monad, as a member of the originating task's object the
>> task in the task queue. This is always added back to the queue as one of
>> the pending results of a re-routine, so it stays referenced as an object
>> on the heap, then that as it is completed and the original re-routine
>> returns, then it's no longer referenced and the garbage-collector can
>> reclaim it from the heap or the allocator can delete it.
>>
>>
>>
>>
>>
>>
>>
>> Well, for the re-routine, I sort of figure there's a Callstack and a
>> Callback type
>>
>> class Callstack {
>> Stack<Callback> callstack;
>> }
>>
>> interface Callback {
>> void callback() throws Exception;
>> }
>>
>> and then a placeholder sort of type for Callflush
>>
>> class Callflush {
>> Callstack callstack;
>> }
>>
>> with the idea that the presence in ThreadLocals is to be sorted out,
>> about a kind of ThreadLocal static pretty much.
>>
>> With not returning null and for memoizing call-graph dependencies,
>> there's basically for an "unvoid" type.
>>
>> class unvoid {
>>
>> }
>>
>> Then it's sort of figured that there's an interface with some defaults,
>> with the idea that some boilerplate gets involved in the Memoization.
>>
>> interface Caller {}
>>
>> interface Callee {}
>>
>> class Callmemo {
>> memoize(Caller caller, Object[] args);
>> flush(Caller caller);
>> }
>>
>>
>> Then it seems that the Callstack should instead be of a Callgraph, and
>> then what's maintained from call to call is a Callpath, and then what's
>> memoized is all kept with the Callgraph, then with regards to objects on
>> the heap and their distinctness, only being reachable from the
>> Callgraph, leaving less work for the garbage collector, to maintain the
>> heap.
>>
>> The interning semantics would still be on the class level, or for
>> constructor semantics, as with regards to either interning Objects for
>> uniqueness, or that otherwise they'd be memoized, with the key being the
>> Callpath, and the initial arguments into the Callgraph.
>>
>> Then the idea seems that the ThreaderCaller, establishes the Callgraph
>> with respect to the Callgraph of an object, installing it on the thread,
>> otherwise attached to the Callgraph, with regards to the ReRoutine.
>>
>>
>>
>> About the ReRoutine, it's starting to come together as an idea, what is
>> the apparatus for invoking re-routines, that they build the monad of the
>> IOE's (inputs, outputs, exceptions) of the re-routines in their
>> call-graph, in terms of ThreadLocals of some ThreadLocals that callers
>> of the re-routines, maintain, with idea of the memoized monad along the
>> way, and each original re-routine.
>>
>> class IOE <O, E> {
>> Object[] input;
>> Object output;
>> Exception exception;
>> }
>>
>> So the idea is that there are some ThreadLocal's in a static ThreadGlobal
>>
>> public class ThreadGlobals {
>> public static ThreadLocal<MonadMemo> monadMemo;
>> }
>>
>> where callers or originators or ReRoutines, keep a map of the Runnables
>> or Callables they have, to the MonadMemo's,
>>
>> class Originator {
>> Map<? extends ReRoutineMapKey, MonadMemo> monadMemoMap;
>> }
>>
>> then when it's about to invoke a Runnable, if it's a ReRoutine, then it
>> either retrieves the MonadMemo or makes a new one, and sets it on the
>> ThreadLocal, then invokes the Runnable, then clears the ThreadLocal.
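>>
>> A minimal sketch of that set-invoke-clear around the Runnable (just
>> the shape of it, using the ThreadGlobals and MonadMemo above):
>>
>> void invoke(Runnable reRoutine, MonadMemo monadMemo) {
>> ThreadGlobals.monadMemo.set(monadMemo); // install for this invocation
>> try {
>> reRoutine.run(); // runs until it completes or throws-quit
>> } catch (NullPointerException pending) {
>> // unsatisfied input: its callback will re-launch it later
>> } finally {
>> ThreadGlobals.monadMemo.remove(); // clear the ThreadLocal
>> }
>> }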
>>
>> Then a MonadMemo, pretty simply, is a List of IOE's, that when the
>> ReRoutine runs through the callgraph, the callstack is indicated by a
>> tree of integers, and the stack path in the ReRoutine, so that any
>> ReRoutine that calls ReRoutines A/B/C, points to an IOE that it finds in
>> the thing, then its default behavior is to return its memo-ized value,
>> that otherwise is making the callback that fills its memo and re-invokes
>> all the way back the Original routine, or just its own entry point.
>>
>> This is basically that the Originator, when the ReRoutine quits out,
>> sort of has that any ReRoutine it originates, also gets filled up by the
>> Originator.
>>
>> So, then the Originator sort of has a map to a ReRoutine, then for any
>> Path, the Monad, so that when it sets the ThreadLocal with the
>> MonadMemo, it also sets the Path for the callee, launches it again when
>> its callback returned to set its memo and relaunch it, then back up the
>> path stack to the original re-routine.
>>
>> One of the issues here is "automatic parallelization". What I mean by
>> that is that the re-routine just goes along and when it gets nulls
>> meaning "pending" it just continues along, then expects
>> NullPointerExceptions as "UnsatisfiedInput", to quit, figuring it gets
>> relaunched when its input is satisfied.
>>
>> This way then when routines serially don't depend on each others'
>> outputs, then they all get launched apiece, parallelizing.
>>
>> Then, I wonder about usual library code, basically about Collections and
>> Streams, and the usual sorts of routines that are applied to the
>> arguments, and how to basically establish that the rule of re-routine
>> code is that anything that gets a null must throw a
>> NullPointerException, so the re-routine will quit until the arguments
>> are satisfied, the inputs to library code. Then with the Memo being
>> stored in the MonadMemo, it's figured that will work out regardless the
>> Objects' or primitives' value, with regards to Collections and Stream
>> code and after usual flow-of-control in Iterables for the for loops, or
>> whatever other application library code, that they will be run each time
>> the re-routine passes their section with satisfied arguments, then as
>> with regards to, that the Memo is just whatever serial order the
>> re-routine passes, not needing to lookup by Object identity which is
>> otherwise part of an interning pattern.
>>
>> Map<String, String> rr1(String s1) {
>>
>> List<String> l1 = rr2.get(s1);
>>
>> Map<String, String> m1 = new LinkedHashMap<>();
>>
>> l1.stream().forEach(s -> m1.put(s, rr3.get(s)));
>>
>> return m1;
>> }
>>
>> See what I figure is that the order of the invocations to rr3.get() is
>> serial, so it really only needs to memoize its OE, Output|Exception,
>> then about that putting null values in the Map, and having to check the
>> values in the Map for null values, and otherwise to make it so that the
>> semantics of null and NullPointerException, result that satisfying
>> inputs result calls, and unsatisfying inputs result quits, figuring
>> those unsatisfying inputs are results of unsatisfied outputs, that will
>> be satisfied when the callee gets populated its memo and makes the
>> callback.
>>
>> If the order of invocations is out-of-order, gets again into whether the
>> Object/primitive by value needs to be the same each time, IOE, about the
>> library code in Collections, Streams, parallelStream, and Iterables, and
>> basically otherwise that any kind of library code, should throw
>> NullPointerException if it gets an "unexpected" null or what doesn't
>> fulfill it.
>>
>> The idea though that rr3 will get invoked say 1000 times with the rr2's
>> result, those each make their call, then re-launch 1000 times, has that
>> it's figured that the Executor, or Originator, when it looks up and
>> loads the "ReRoutineMapKey", is to have the count of those and whether
>> the count is fulfilled, then to no-op later re-launches of the
>> call-backs, after all the results are populated in the partial monad
>> memo.
>>
>> Then, there's perhaps instead as that each re-routine just checks its
>> input or checks its return value for nulls, those being unsatisfied.
>>
>> (The exception handling thoroughly or what happens when rr3 throws and
>> this kind of thing is involved thoroughly in library code.)
>>
>> The idea is it remains correct if the worst thing nulls do is throw
>> NullPointerException, because that's just a usual quit and means another
>> re-launch is coming up, and that it automatically queues for
>> asynchronous parallel invocation each the derivations while resulting
>> never blocking.
>>
>> It's figured that re-routines check their inputs for nulls, and throw
>> quit, and check their inputs for library container types, and checking
>> any member of a library container collection for null, to throw quit,
>> and then it will result that the automatic asynchronous parallelization
>> proceeds, while the re-routines are never blocking, there's only as much
>> memory on the heap of the monad as would be in the lifetime of the
>> original re-routine, and whatever re-calls or re-launches of the
>> re-routine established local state in local variables and library code,
>> would come in and out of scope according to plain stack unwinding.
>>
>> Then there's still the perceived deficiency that the re-routine's method
>> body will be run many times, yet it's only run as many times as result
>> throwing-quit, when it reaches where its argument to the re-routine or
>> result value isn't yet satisfied yet is pending.
>>
>> It would re-run the library code any number of times, until it results
>> all non-nulls, then the resulting satisfied argument to the following
>> re-routines, would be memo-ized in the monad, and the return value of
>> the re-routine thus returning immediately its value on the partial monad.
>>
>> This way each re-call of the re-routine, mostly encounters its own monad
>> results in constant time, and throws-quit or gets thrown-quit only when
>> it would be unsatisfying, with the expectation that whatever
>> throws-quit, either NullPointerException or extending
>> NullPointerException, will have a pending callback, that will queue on a
>> TQ, the task specification to re-launch and re-enter the original or
>> derived, re-routine.
>>
>> The idea is sort of that it's sort of, Java with non-blocking I/O and
>> ThreadLocal (1.7+, not 17+), or you know, C/C++ with non-blocking I/O
>> and thread local storage, then for the abstract or interface of the
>> re-routines, how it works out that it's a usual sort of model of
>> co-operative multithreading, the re-routine, the routine "in the
>> language".
>>
>>
>> Then it's great that the routine can be stubbed or implemented agnostic
>> of asynchrony, and declared in the language with standard libraries,
>> basically using the semantics of exception handling and convention of
>> re-launching callbacks to implement thread-of-control flow-of-control,
>> that can be implemented in the synchronous and blocking for unit tests
>> and modules of the routine, making a great abstraction of
>> flow-of-control.
>>
>>
>> Basically anything that _does_ block then makes for having its own
>> thread, whose only job is to block and when it unblocks, throw-toss the
>> re-launch toward the origin of the re-routine, and consume the next
>> blocking-task off the TQ. Yet, the re-routines and their servicing the
>> TQ only need one thread and never block. (And scale in core count and
>> automatically parallelize asynchronous requests according to satisfied
>> inputs.)
>>
>>
>> Mostly the idea of the re-routine is "in the language, it's just plain,
>> ordinary, synchronous routine".
>>
>>
>>
>
>
> Protocol Establishment
>
> Each of these protocols is a combined sort of protocol, then according
> to different modes, there's established a protocol, then data flows in
> the protocol (in time).
>
>
> stream-based (connections)
> sockets, TCP/IP
> sctp SCTP
> message-based (datagrams)
> datagrams, UDP
>
> The idea is that connections can have state and session state, while,
> messages do not.
>
> Abstractly then there's just that connections make for reading from the
> connection, or writing to the connection, byte-by-byte,
> while messages make for receiving a complete message, or writing a
> complete message. SCTP is sort of both.
>
> A bit more concretely, the non-blocking or asynchronous or vector I/O,
> means that when some bytes arrive the connection is readable, and while
> the output buffer is not full a connection is writeable.
>
> For messages it's that when messages arrive messages are readable, and
> while the output buffer is not full messages are writeable.
>
> Otherwise bytes or messages that pile up while not readable/writeable
> pile up and in cases of limited resources get lost.
>
> So, the idea is that when bytes arrive, whatever's servicing the I/O's
> has that the connection has data to read, and, data to write.
> The usual idea is that an abstract Reader thread, will give any or all
> of the connections something to read, in an arbitrary order,
> at an arbitrary rate, then the role of the protocol, is to consume the
> bytes to read, thus releasing the buffers, that the Reader, writes to.
>
> Inputting/Reading
> Writing/Outputting
>
> The most usual idea of client-server is that
> client writes to server then reads from server, while,
> server reads from client then writes to client.
>
> Yet, that is just a mode, reads and writes are peer-peer,
> reads and writes in any order, while serial according to
> that bytes in the octet stream arrive in an order.
>
> There isn't much consideration of the out-of-band,
> about sockets and the STREAMS protocol, for
> that bytes can arrive out-of-band.
>
>
> So, the layers of the protocol, result that some layers of the protocol
> don't know anything about the protocol, all they know is sequences of
> bytes, and, whatever session state is involved to implement the codec,
> of the layers of the protocol. All they need to know is that given that
> all previous bytes are read/written, that the connection's state is
> synchronized, and everything after is read/written through the layer.
> Mostly once encryption or compression is set up it's never torn down.
>
> Encryption, TLS
> Compression, LZ77 (Deflate, gzip)
>
> The layers of the protocol, result that some layers of the protocol,
> only indicate state or conditions of the session.
>
> SASL, Login, AuthN/AuthZ
>
> So, for NNTP, a connection, usually enough starts with no layers,
> then in the various protocols and layers, get negotiated to get
> established,
> combinations of the protocols and layers. Other protocols expect to
> start with layers, or not, it varies.
>
> Layering, then, either is in the protocol, to synchronize the session
> then establish the layer in the layer protocol then maintain the layer
> in the main protocol, has that TLS makes a handshake to establish an
> encryption key for all the data, then the TLS layer only needs to
> encrypt and decrypt the data by that key, while for Deflate, it's
> usually the only option, then after it's setup as a layer, then
> everything either way read/written gets compressed.
>
>
> client -> REQUEST
> RESPONSE <- server
>
> In some protocols these interleave
>
> client -> REQUEST1
> client -> REQUEST2
>
> RESPONSE1A <- server
> RESPONSE2A <- server
> RESPONSE1B <- server
> RESPONSE2B <- server
>
> This then is called multiplexing/demultiplexing, for protocols like IMAP
> and HTTP/2,
> and another name for multiplexer/demultiplexer is mux/demux.
>
>
>
>
> So, for TLS, the idea is that usually most or all of the connections
> will be using the same algorithms with different keys, and each
> connection will have its own key, so the idea is to completely separate
> TLS establishment from TLS cryptec (crypt/decrypt), so, the layer need
> only key up the bytes by the connection's key, in their TLS frames.
>
> Then, most of the connections will use compression, then the idea is
> that the data is stored at rest compressed already and in a form that it
> can be concatenated, and that similarly as constants are a bunch of the
> textual context of the text-based protocol, they have compressed and
> concatenable constants, with the idea that the Deflate compec
> (comp/decomp) just passes those along concatenating them, or actively
> compresses/decompresses buffers of bytes or as of sequences of bytes.
>
> The idea is that Readers and Writers deal with bytes at a time,
> arbitrarily many, then that what results being passed around as the
> data, is as much as possible handles to the data. So, according to the
> protocol and layers, indicates the types, that the command routines, get
> and return, so that the command routines can get specialized, when the
> data at rest, is already layerized, and otherwise to adapt to the more
> concrete abstraction, of the non-blocking, asynchronous, and vector I/O,
> of what results the flow-machine.
>
>
> When the library of the runtime of the framework of the language
> provides the cryptec or compec, then, there's issues, when, it doesn't
> make it so for something like "I will read and write you the bytes as of
> making a TLS handshake, then return the algorithm and the key and that
> will implement the cryptec", or, "compec, here's either some data or
> handles of various types, send them through", it's to be figured out.
> The idea for the TLS handshake, is basically to sit in the middle, i.e.
> to read and write bytes as of what the client and server send, then
> figuring out what is the algorithm and key and then just using that as
> the cryptec. Then after TLS algorithm and key is established the rest is
> sort of discarded, though there's some idea about state and session, for
> the session key feature in TLS. The TLS 1.2 also includes comp/decomp,
> though, it's figured that instead it's a feature of the protocol whether
> it supports compression, point being that's combining layers, and to be
> implemented about these byte-sequences/handles.
>
>
> mux/demux
> crypt/decrypt
> comp/decomp
> cod/decod
>
> codec
>
>
> So, the idea is to implement toward the concrete abstraction of
> nonblocking vector I/O, while, remaining agnostic of that, so that all
> sorts the usual test routines yet particularly the composition of layers
> and establishment and upgrade of protocols, is to happen.
>
>
> Then, from the byte sequences or messages as byte sequences, or handles
> of byte sequences, results that in the protocol, the protocol either way
> in/out has a given expected set of alternatives that it can read, then
> as of derivative of those what it will write.
>
> So, after the layers, which are agnostic of anything but byte-sequences,
> and their buffers and framing and chunking and so on, then is the
> protocol, or protocols, of the command-set and request/response
> semantics, and ordering/session statefulness, and lack thereof.
>
> Then, a particular machine in the flow-machine is as of the "Recognizer"
> and "Parser", then what results "Annunciators" and "Legibilizers", as it
> were, of what's usually enough called "Deserialization", reading off
> from a serial byte-sequence, and "Serialization, writing off to a serial
> byte-sequence, first the text of the commands or the structures in these
> text-based protocols, the commands and their headers/bodies/payloads,
> then the Objects in the object types of the languages of the runtime,
> where then the routines of the servicing of the protocol, are defined in
> types according to the domain types of the protocol (and their
> representations as byte-sequences and handles).
>
> As packets and bytes arrive in the byte-sequence, the Recognizer/Parser
> detects when there's a fully-formed command, and its payload, after the
> Mux/Demux Demultiplexer, has that the Demultiplexer represents any given
> number of separate byte-sequences, then according to the protocol
> anything their statefulness/session or orderedness/unorderedness.
>
> So, the Demultiplexer is to Recognize/Parse from the combined input
> byte-stream its chunks, that now the connection, has any number of
> ordered/unordered byte-sequences, then usually that those are ephemeral
> or come and go, while the connection endures, with the most usual notion
> that there's only one stream and it's ordered in requets and ordered in
> responses, then whether commands gets pipelined and requests need not
> await their responses (they're ordered), and whether commands are
> numbers and their responses get associated with their command sequence
> numbers (they're unordered and the client has its own mux/demux to
> relate them).
>
> So, the Recognizer/Parser, theoretically only gets a byte at a time, or
> even none, and may get an entire fully-formed message (command), or not,
> and may get more bytes than a fully-formed message, or not, and the
> bytes may be a well-formed message, or not, and valid, or not.
>
> Then the job of the Recognizer/Parser, is from the beginning of the
> byte-sequence, to Recognize a fully-formed message, then to create an
> instance of the command object related to the handle back through the
> mux/demux to the multiplexer, called the attachment to the connection,
> or the return address according to the attachment representing any
> routed response and usually meaning that the attachment is the user-data
> and any session data attached to the connection and here of the
> mux/demux of the connection, the job of the Recognizer/Parser is to work
> any time input is received, then to recognize and parse any number of
> fully-formed messages from the input, create those Commands according to
> the protocol, that the attachment includes the return destination, and,
> thusly release those buffers or advance the marker on the Input
> byte-sequence, so that the resources are freed, and later
> Recognizings/Parsing starts where it left off.
>
> The idea is that bytes arrive, the Recognizer/Parser has to determine
> when there's a fully-formed message, consume that and service the
> buffers the byte-sequence, having created the derived command.
>
> Now, commands are small, or so few words, then the headers/body/payload,
> basically get larger and later unboundedly large. Then, the idea is that
> the protocol, has certain modes or sub-protocols, about "switching
> protocols", or modes, when basically the service of the routine changes
> from recognizing and servicing the beginning to ending of a command, to
> recognizing and servicing an arbitrarily large payload, or, for example,
> entering a mode where streamed data arrives or whatever sort, then that
> according to the length or content of the sub-protocol format, the
> Recognizer's job includes that the sub-protocol-streaming, modes, get
> into that "sub-protocols" is a sort of "switching protocols", the only
> idea though being going into the sub-protocol then back out to the main
> protocol, while "switching protocols" is involved in basically any the
> establishment or upgrade of the protocol, with regards to the stateful
> connection (and not stateless messages, which always are according to
> their established or simply some fixed protocol).
>
> This way unboundedly large inputs, don't actually live in the buffers of
> the Recognizers that service the buffers of the Inputters/Readers and
> Multiplexers/Demultiplexers, instead define modes where they will be
> streaming through arbitrarily large payloads.
>
> Here for NNTP and so on, the payloads are not considered arbitrarily
> large, though, it's sort of a thing that sending or receiving the
> payload of each message, can be defined this way so that in very, very
> limited resources of buffers, that the flow-machine keeps flowing.
>
>
> Then, here, the idea is that these commands and their payloads, have
> their outputs that are derived as a function of the inputs. It's
> abstractly however this so occurs is the way it is. The idea here is
> that the attachment+command+payload makes a re-routine task, and is
> pushed onto a task queue (TQ). Then it's figured that the TQ represents
> abstractly the execution of all the commands. Then, however many Task
> Workers or TW, or the TQ that runs itself, get the oldest task from the
> queue (FIFO) and run it. When it's complete, then there's a response
> ready in byte-sequences are handles, these are returned to the attachment.
>
> (The "attachment" usually just means a user or private datum associated
> with the connection to identify its session with the connection
> according to non-blocking I/O, here it also means the mux/demux
> "remultiplexer" attachment, it's the destination of any response
> associated with a stream of commands over the connection.)
>
> So, here then the TQ basically has the idea of the re-routine, that is
> non-blocking and involves the asynchronous fulfillment of the routine in
> the domain types of the domain of object types that the protocol adapts
> as an adapter, that the domain types fulfill as adapted. Then for NNTP
> that's like groups and messages and summaries and such, the objects. For
> IMAP its mailboxes and messages to read, for SMTP its emails to send,
> with various protocols in SMTP being separate protocols like DKIM or
> what, for all these sorts protocols. For HTTP and HTTP/2 it's usual HTTP
> verbs, usually HTTP 1.1 serial and pipelined requests over a connection,
> in HTTP/2 mutiplexed requests over a connection. Then "session" means
> broadly that it may be across connections, what gets into the attachment
> and the establishment and upgrade of protocol, that sessions are
> stateful thusly, yet granularly, as to connections yet as to each request.
>
>
> Then, the same sort of thing is the same sort of thing to back-end,
> whatever makes for adapters, to domain types, that have their protocols,
> and what results the O/I side to the I/O side, that the I/O side is the
> server's client-facing side, while the O/I side is the
> server-as-a-client-to-the-backend's, side.
>
> Then, the O/I side is just the same sort of idea that in the
> flow-machine, the protocols get established in their layers, so that all
> through the routine, then the domain types are to get specialized to when
> byte-sequences and handles are known well-formed in compatible
> protocols, that the domain and protocol come together in their
> definition, basically so it results that from the back-end is retrieved
> for messages by their message-ID that are stored compressed at rest, to
> result passing back handles to those, for example a memory-map range
> offset to an open handle of a zip file that has the concatenable entry
> of the message-Id from the groups' day's messages, or a list of those
> for a range of messages, then the re-routine results passing the handles
> back out to the attachment, which sends them right out.
>
> So, this way there's that besides the TQ and its TW's, that those are to
> never block or be long-running, that anything that's long-running is on
> the O/I side, and has its own resources, buffers, and so on, where of
> course all the resources here of this flow-machine are shared by all the
> flow-machines in the flow-machine, in the sense that they are not shared
> yet come from a common resource altogether, and are exclusive. (This
> gets into the definition of "share" as with regards to "free to share,
> or copy" and "exclusive to share, a.k.a. taking turns, not cutting in
> line, and not stealing nor hoarding".)
>
>
> Then on the O/I side or the backend side, it's figured the backend is
> any kind of adapters, like DB adapters or FS adapters or WS adapters,
> database or filesystem or webservice, where object-stores are considered
> filesystem adapters. What that gets into is "pools" like client pools,
> connection pools, resource pools, that a pool is usually enough
> according to a session and the establishment of protocol, then with
> regards to servicing the adapter and according to the protocol and the
> domain objects that thusly implement the protocol, the backend side has
> its own dedicated routines and TW's, or threads of execution, with
> regards to that the backend side basically gets a callback+request and
> the job is to invoke the adapter with the request, and invoke the
> callback with the response, then whether for example the callback is
> actually the original attachment, or it involves "bridging the unbounded
> sub-protocol", what it means for the adapter to service the command.
>
> Then the adapter is usually either provided as with intermediate or
> domain types, or, for example it's just another protocol flow machine
> and according to the connections or messaging or mux/demux or
> establishing and upgrading layers and protocols, it basically works the
> same way as above in reverse.
>
> Here "to service" is the usual infinitive that for the noun means "this
> machine provides a service" yet as a verb that service means to operate
> according to the defined behavior of the machine in the resources of the
> machine to meet the resource needs of the machine's actions in the
> capabilities and limits of the resources of the machine, where this "I/O
> flow-machine: a service" is basically one "node" or "process" in a usual
> process model, allocated its own quota of resources according to the
> process and its environment model in the runtime in the system, and
> that's it. So, there's servicing as the main routine, then also what it
> means the maintenance servicing or service of the extended routine.
> Then, for protocols it's "implement this protocol according to its
> standards according to the resources in routine".
>
>
> You know, I don't know where they have one of these anywhere, ....
>
>









So, besides attachment+command+payload, there's also indicating the
protocol and layers, where it can be inferred for the response, when the
callback exists or as the streaming sub-protocol starts|continues|ends,
what the response can be, in terms of domain objects, or handles, or
byte-sequences, or domain objects that can result handles to
transfer or byte-sequences to read or write: an
attachment+command+payload+protocols "ACPP" data structure.
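
So, as a minimal sketch of that ACPP in Java, only for illustration,
where the field types and names are just this sketch's and not any
settled definition:

import java.nio.ByteBuffer;
import java.util.List;

// Sketch: the ACPP carried by a task, with the command's timeout time
// along for the ride (as figured below for timeouts), so the form of
// the response can be inferred.
final class ACPP {
    final Object attachment;        // RMA: return/remultiplexer attachment
    final String command;           // the command as recognized in the protocol
    final List<ByteBuffer> payload; // payload as byte-sequences (or handles off to the side)
    final String protocol;          // protocol/sub-protocol, what the response returns to
    final long expiryTime;          // timeout time for the command, system time

    ACPP(Object attachment, String command, List<ByteBuffer> payload,
         String protocol, long expiryTime) {
        this.attachment = attachment;
        this.command = command;
        this.payload = payload;
        this.protocol = protocol;
        this.expiryTime = expiryTime;
    }
}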

Another idea that seems pretty usual, is when the payload is off to the
side: picking up the payload when the request arrives, when the command,
in the protocol, involves that the request payload is off to the side,
to side-load the payload. Usually that means the payload is large, or
bigger than the request size limit in the protocol. It sort of seems a
good idea to indicate for the protocol whether it can resolve "external"
resource references, then whether accessing them off to the side happens
before ingesting the command, or whether it's the intent to reference
the external resource, and when, and whether the external resource off
to the side "is" part of the request payload, or otherwise is just part
of the routine.

That though would get into when the side effect of the routine, is to
result the external reference or call, that it's figured that would all
be part of the routine. It depends on the protocol, and whether the
payload "is" fully-formed, with or without the external reference.


Then HTTP/2 and WebSockets have plenty going on about the multiplexer,
where it's figured that multiplexed attachments, or the "remultiplexer
attachment", RMA, out from the demultiplexer and back through the
multiplexer, are then another sort of protocol machine, in terms of the
layers, and that whether or not there's a thread, multiplexing shouldn't
require any sort of state other than on the connections' attachment:
all the state of the multiplexer is figured to live in a data structure
on the actual attachment, while the logic should be re-entrant and just
a usual module for the protocol(s).

It's figured then that the attachment is a key, with respect to a key
number for the attachment, then that in the multiplexing or muxing
protocols, there's a serial number of the request or command. There's a
usual idea to have serial numbers for commands besides, for each
connection, and then even serial numbers for commands for the lifetime
of the runtime. Then it's the usual metric of success or the error rate,
how many of those are successes and how many are failures, where
otherwise the machine is pretty agnostic, that being in the protocol.
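
A small sketch of those serial numbers and the success/error counters,
where the names here are only this sketch's:

import java.util.concurrent.atomic.AtomicLong;

// Sketch: a serial number per command per connection, a serial number
// for the lifetime of the runtime, and the usual success/failure counters.
final class Serials {
    static final AtomicLong RUNTIME_SERIAL = new AtomicLong();  // commands over the runtime's lifetime
    static final AtomicLong SUCCESSES = new AtomicLong();       // metric of success
    static final AtomicLong FAILURES = new AtomicLong();        // metric of the error rate

    final long attachmentKey;                                   // the attachment's key number
    final AtomicLong connectionSerial = new AtomicLong();       // commands over this connection

    Serials(long attachmentKey) { this.attachmentKey = attachmentKey; }

    long nextOnConnection() { return connectionSerial.incrementAndGet(); }
    static long nextInRuntime() { return RUNTIME_SERIAL.incrementAndGet(); }
}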

Timeouts and cancels are sort of figured to be attached to the monad and
the re-routine. It's figured that for any command in the protocol, it
has a timeout. When a command is received, is when the timeout countdown
starts, abstractly wall-clock time or system time. So, the ACPP has also
the timeout time, so, the task T has an ACPP
attachment-command-payload-protocol and a routine or reroutine R or RR.
Then also it has some metrics M or MT, here start time and expiry time,
and the serial numbers. So, how timeouts work is that when T is to be
picked up by a TW, first TW checks whether M.time is past expiry, then
if so it cancels the monad and results returning the timeout howsoever
in the protocol. If not, what's figured is that before the
re-routine runs through, it just tosses T back on the TQ anyway, so that
then whenever it comes up again, it's just checked again until such time
as the task T actually completed, or it expires, or it was canceled, or
otherwise concluded, according to the combination of the monad of the
R/RR, and M.time, and system time. Now, this seems bad, because an
otherwise empty queue, would constantly be thrashing, so it's bad. Then,
what's to be figured is some sort of parameter, "toss when", that then
though would have timeout priority queues, or buckets of sorts with
regards to tossing all the tasks T back on the TQ for no other reason
than to check their timeout.
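
As a sketch of that pickup-time check, where the Task interface and its
methods here are hypothetical names of this sketch, not a settled
interface:

import java.util.Queue;

// Sketch: what a task worker TW might do when it picks task T off the TQ,
// checking M.time against expiry before anything else, and tossing T back
// on the TQ if it hasn't concluded yet.
interface Task {
    long expiryTime();
    boolean isConcluded();
    void cancelMonad();              // cancel the monad, free its resources
    void returnTimeoutInProtocol();  // result the timeout howsoever in the protocol
    void invokeReRoutine();          // run the re-routine (never blocking)
}

final class TaskWorker {
    static void pickUp(Task t, Queue<Task> tq) {
        if (System.currentTimeMillis() > t.expiryTime()) {  // M.time past expiry
            t.cancelMonad();
            t.returnTimeoutInProtocol();
            return;
        }
        if (t.isConcluded()) return;                        // completed, canceled, or concluded
        t.invokeReRoutine();
        if (!t.isConcluded()) tq.offer(t);                  // toss T back on the TQ, checked again later
    }
}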

It's figured that the monad of the re-routine is all the heap objects
and references to handles of the outstanding command. So, when the
re-routine is completed/canceled/concluded, then all the resources of
the monad should be freed. Then it's figured that any routine to access
the monad is re-entrant, and so that it results that access to the monad
is atomic, to build the graph of memos in the monad, then that access to
each memo is atomic as after access to the monad itself, so that the
access to the monad is thread-safe (and to be non-blocking, where the
only thing that happens to the monad is adding re-routine paths, and
getting and setting values of object values and handles, then releasing
all of it [, after picking out otherwise the result]).
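
A minimal sketch of that, where the monad is just a concurrent map of
memos, and keying the memos by a re-routine path string is only this
sketch's assumption:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch: the monad as a thread-safe map of memos; the only operations are
// adding re-routine paths, getting/setting values or handles, and releasing
// the whole thing when the routine is completed/canceled/concluded.
final class Monad {
    private final ConcurrentMap<String, Object> memos = new ConcurrentHashMap<>();

    Object get(String path) { return memos.get(path); }   // null: not yet fulfilled
    void memoize(String path, Object valueOrHandle) { memos.putIfAbsent(path, valueOrHandle); }
    void releaseAll() { memos.clear(); }                   // free resources (close handles separately)
}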

So it's figured that if there's a sort of sweeper or closer being the
usual idea of timeouts, then also in the case that for whatever reason
the asynchronous backend fails, to get a success or error result and
callback, so that the task T

T{
RMA attachment; // return/remultiplexer attachment
PCP command; // protocol command/payload
RR routine; // routine / re-routine (monad)
MT metrics; // metrics/time
}

has that timeouts, are of a sort of granularity. So, it's not so much
that timeouts need to be delivered at a given exact time, as delivered
within a given duration of time. The idea is that timeouts both call a
cancel on the routine and result an error in the protocol. (Connection
and socket timeouts or connection drops or closures and so on, should
also result cancels as some form of conclusion cleans up the monad's
resources.)

There's also that timeouts are irrelevant after conclusion, yet if
there's a task queue of timeouts, not to do any work fishing them out,
just letting them expire. Yet, given that timeouts are usually much
longer than actual execution times, there's no point keeping them around.

Then it's figured each routine and sub-routine has its timing, then
it's figured to have that the RR and MT both have the time, then as with
regards to the RR and MT both having a monad, whether it's the same
monad is what's to be figured.

TASK {
RMA attachment; // return/remultiplexer attachment
PCP command; // protocol command/payload
RRMT routine; // routine / re-routine, metrics / time (monad)
}

Then it's figured that any sub-routine checks the timeout overall, and
the timeouts up the re-routine, and the timeout of the task, resulting a
cancel in any timeout, then basically to push that on the back of the
task queue or LIFO last-in-first-out, which seems a bad idea, though
it's to expeditiously return an error and release the resources,
and cancel any outstanding requests.

So, any time a task is touched, there's checking the attachment whether
it's dropped, checking the routine whether it's canceled, with the goal
of that it's all cleaned up to free the resources, and to close any
handles opened in the course of building the monad of the routine's results.

Otherwise while a command is outstanding there's not much to be done
about it, it's either outstanding and not started or outstanding and
started, until it concludes and there's a return, the idea being that
the attachment can drop at any time and that would be according to the
Inputter/Reader or Recognizer/Parser (an ill-formed command results
either an error or a drop), the routine can conclude at any time either
completing or being canceled, then that whether any handles are open in
the payload, is that a drop in the attachment, disconnect in the
[streaming] command, or cancel in the routine, ends each of the three,
each of those two, or that one.

(This is that the command when 'streaming sub-protocol' results a bunch
of commands in a sub-protocol that's one command in the protocol.)

The idea is that the RMA is only enough detail to relate to the current
state in the attachment of the remultiplexing, the command is enough
state to describe its command and payload and with regards to what
protocol it is and what sub-protocols it entered and what protocol it
returns to, and the routine is the monad of the entire state of the
routine, either value objects or open handles, to keep track of all the
things according to these things.

So, still it's not quite clear how to have the timeout in the case that
the backend hangs, or drops, or otherwise that there's no response from
the adapter, what's a timeout. This sort of introduces re-try logic to
go along with time-out logic.

The re-try logic involves that anything can fail, and some things can
be re-tried when they fail. The re-try logic would be part of the
routine or re-routine, figuring that any re-tries still have to live in
the time of the command. Then re-tries are kind of like time-outs: it's
usual that it's not just hammering the re-tries, but a usual sort of
back-off and retry-count, or retry strategy, and then whether it
involves that there should be a new adapter handle from the pool, given
that adapter handles from the pool should be round-robin, and that when
there are retry-able errors it usually means the adapter connection is
un-usable, so getting a new adapter connection will get a new one, and
whether retry-able errors plainly enough indicate to recycle the adapter
pool.
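
As a sketch of that sort of retry strategy, back-off and retry-count
against a pool of adapter handles, living on the O/I side where it gets
its own resources; the pool and exception types here are only this
sketch's:

// Sketch: bounded retries with back-off; a retry-able error usually means
// the adapter connection is un-usable, so the bad handle gets discarded and
// a fresh one is taken round-robin from the pool for the next attempt.
interface AdapterHandle {}
interface AdapterPool {
    AdapterHandle acquire();
    void release(AdapterHandle h);
    void discard(AdapterHandle h);   // don't return an un-usable connection to the pool
}
interface AdapterCall<T> { T invoke(AdapterHandle h) throws Exception; }
class RetryableException extends Exception {}

final class Retries {
    static <T> T withRetries(AdapterPool pool, AdapterCall<T> call,
                             int maxRetries, long backoffMillis) throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            AdapterHandle h = pool.acquire();
            try {
                T result = call.invoke(h);
                pool.release(h);
                return result;
            } catch (RetryableException e) {
                last = e;
                pool.discard(h);
                Thread.sleep(backoffMillis << attempt);   // simple exponential back-off
            } catch (Exception e) {
                pool.discard(h);                          // non-retry-able errors are not retried
                throw e;
            }
        }
        throw last;
    }
}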

Then, retry-logic also involves resource-down, what's called
circuit-breaker when the resource is down that it's figured that it's
down until it's back up. [It's figured that errors by default are _not_
retry-able, and, then as about the resource-health or
backend-availability, what gets involved in a model of critical
resource-recycling and backend-health.]


About server-push, there's an idea that it involves the remultiplexer
and that the routine, according to the protocol, synthesizes tasks and
is involved with the remultiplexer, to result it makes tasks then that
run like usual tasks. [This is part of the idea also of the mux or
remux, about 1:many commands/responses, and usually enough their
serials, and then, with regards to "opportunistic server push", how to
drop the commands that follow that would otherwise request the
resources. HTTP/2 server-push looks deprecated, while then there's
WebSocket, which basically makes for a different sort of use-case
peer-peer than client-server. For IMAP is the idea that when there are
multiple responses to single commands then that's basically in the
mux/remux. For pipelined commands and also for serial commands is the
mux/remux. The pipelined commands would result state building in the
mux/remux when they're returned disordered, with regards to results and
the handles, and 'TCB' or 'TW' driving response results.]


So, how to implement timeout or the sweeper/closer, has for example that
a connection drop, should cancel all the outstanding tasks for that
connection. For example, undefined behavior of whatever sort results a
missed callback, should eventually timeout and cancel the task, or all
the task instances in the TQ for that task. (It's fair enough to just
mark the monads of the attachment or routine as canceled, then they'll
just get immediately discarded when they come up in the TQ.) There's no
point having timeouts in the task queue because they'd either get
invoked for nothing or get added to the task queue long after the task
usually completes. (It's figured that most timeouts are loose timeouts
and most tasks complete in much under their timeout, yet here it's
automatic that timeouts are granular to each step of the re-routine, in
terms of the re-routine erroring-out if a sub-routine times-out.)


The Recognizer/Parser (Commander) is otherwise stateless, the
Inputter/Reader and its Remultiplexer Attachment don't know what results
Tasks, and the Task Queue will run (and here non-blockingly) any Task's
associated routine/re-routine, and catch timeouts in the execution of
the re-routine. The idea is that the sweeper/closer basically would only
result having anything to do when there's undefined behavior in the
re-routine, or bugs, or backend timeouts, then whether calls to the
adapter would have the timeout-task-lessors or "TTL's", in its task
queue. The point being that when there's nothing going on, the entire
thing is essentially _idle_, with the Inputter/Reader blocked on select
on the I/O side, the Outputter/Writer or Backend Adapter sent on the O/I
side, the Inputter/Reader blocked on the O/I side, the TQ's empty (of,
the protocol, and, the backend adapters), and it's all just pending
input from the I/O or O/I side, to cascade the callbacks back to idle,
again.

I.e. there shouldn't be timeout tasks in the TQ, because, at low load,
they would just thrash and waste cycles, and at high load, would arrive
late. Yet, it is so that there is formal un-reliability of the routines,
and, formal un-reliability of the O/I side or backend, [and formal
un-reliability of connections or drops,] so some sweeper/closer checks
outstanding commands what should result canceling the command and its
routines, then as with regards to the backend adapter, recycling or
teardown the backend adapter, to set it up again.

Then the idea is that, Tasks, well enough represent the outstanding
commands, yet there's not to be maintaining a task set next to the task
queue, because it would use more space and maintenance in time than the
queue itself, while multiple instances of the same Task can be in the
Task queue, each pointing to the state of the monad in the re-routine,
which then gets into whether it's so, that, there is a task-set next to the
task-queue, then that concluding the task removes it from the set, while
the sweeper/closer just is scheduled to run periodically through the
entire task-set and cancel those expired, or dropped.

Then, having both a task-set TS and task-queue TQ, maybe seems the thing
to do, where, it should be sort of rotating, because, the task-queue is
FIFO, while the task-set is just a set (a concurrent set, though as with
regards to that the tasks can only be marked canceled, and resubmitted
to the task queue, with regards to that the only action that removes
tasks from the task-set is for the task-queue to result them being
concluded, then that whatever task gets tossed on the task queue is to
be inserted into the task-set).

Then the task-set TS would be on the order of outstanding tasks, while,
the task-queue TQ would be on the order of outstanding tasks' re-routines.
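
A minimal sketch of those two structures, re-using the hypothetical Task
name from the sketch above, just to fix which operations touch which:

import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch: TQ may hold the same task many times (one per re-routine toss),
// TS holds each outstanding task once; tossing onto TQ inserts into TS, and
// only concluding a task removes it from TS.
final class Tasking {
    final Queue<Task> tq = new ConcurrentLinkedQueue<>();  // task-queue, FIFO
    final Set<Task> ts = ConcurrentHashMap.newKeySet();    // task-set, a concurrent set

    void toss(Task t) {
        ts.add(t);       // no-op if it's already outstanding
        tq.offer(t);     // multiple instances of the same task may be queued
    }

    void conclude(Task t) {
        ts.remove(t);    // the only action that removes a task from the task-set
    }
}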

Then the usual idea of sweeper/closer is to iterate through a view of
the TS, check each task whether its attachment dropped or command or
routine timed-out or canceled, then if dropped or canceled, to toss it
on the TQ, which would eventually result canceling if not already
canceled and dropping if dropped.

(Canceling/Cancelling.)

Most of the memory would be in the monads, also the open or live handles
would be in the routine's monads, with the idea being that when the task
concludes, then the results, that go out through the remultiplexer,
should be part of the task.

TASK {
RMA attachment; // return/remultiplexer attachment
PCP command; // protocol command/payload
RRMT routine; // routine / re-routine, metrics / time (monad)
RSLT result; // result (monad)
}

It's figured that the routine _returns_ a result, which is either a
serializable value or otherwise it's according to the protocol, or it's
a live handle or specification of handle, or it has an error/exception
that is expected to be according to the protocol, or that there was an
error then whether it results a drop according to the protocol. So, when
the routine and task concludes, then the routine and metrics monads can
be released, or de-allocated or deleted, while what live handles they
have, are to be passed back as expeditiously as possible to the
remultiplexer to be written to the output as on the wire the protocol,
so that the live handles can be closed or their reference counts
decremented or otherwise released to the handle pool, of a sort, which
is yet sort of undefined.

The result RSLT isn't really part of the task, once the task is
concluding, the RRMT goes right to the RMA according to the PCP, that
being the atomic operation of concluding the task, and deleting it from
the task-set. (It's figured that outstanding callbacks unaware their
cancel, of the re-routines, basically don't toss the task back onto the
TQ if they're canceled, that if they do, it would just sort of
spuriously add it back to the task-set, which would result it being
swept out eventually.)

TASK {
RMA attachment; // return/remultiplexer attachment
PCP command; // protocol command/payload
RRMT routine; // routine / re-routine, metrics / time (monad, live handles)
}

TQ // task queue
TS // task set

TW // task-queue worker thread, latch on TQ
TZ // task-set cleanup thread, scheduled about timeouts
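
Then a sketch of wiring those up, the TW's on the TQ and the TZ
scheduled about timeouts, re-using the sketches above, where the period
and the idle strategy are only this sketch's:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch: TW threads drain the TQ and run re-routines (the pickup with its
// expiry check as sketched earlier); TZ periodically walks a view of the TS
// and tosses expired tasks back on the TQ so they get swept out and
// concluded (likewise one would check dropped attachments and cancels).
final class Threads {
    static void start(Tasking tasking, int workers) {
        for (int i = 0; i < workers; i++) {
            new Thread(() -> {
                for (;;) {
                    Task t = tasking.tq.poll();
                    if (t == null) { Thread.onSpinWait(); continue; }  // in practice, latch/park when idle
                    TaskWorker.pickUp(t, tasking.tq);
                }
            }, "TW-" + i).start();
        }
        ScheduledExecutorService tz = Executors.newSingleThreadScheduledExecutor();
        tz.scheduleAtFixedRate(() -> {
            for (Task t : tasking.ts) {                                // a view of the task-set
                if (t.isConcluded()) continue;
                if (System.currentTimeMillis() > t.expiryTime()) {
                    tasking.tq.offer(t);                               // it gets canceled when it comes up
                }
            }
        }, 60, 60, TimeUnit.SECONDS);
    }
}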

Then, about what threads run the callbacks, is to get figured out.

TCF // thread call forward
TCB // thread call back

It's sort of figured that calling forward, is into the adapters and
backend, and calling back, is out of the result to the remultiplexer and
running the remultiplexer also. This is that the task-worker thread
invokes the re-routines, and the re-routine callbacks, are pretty much
called by the backend or TCF, because all they do is toss back onto the
TQ, so that the TW runs the re-routines, the TCF is involved in the O/I
side and the backend adapter, and what reserves live handles, while the
TCB returns the results through the I/O side, and what recycles live
handles.

Then it's sort of figured that the TCF result thread groups or whatever
otherwise results whatever blocks and so on howsoever it is that the
backend adapter is implemented, while TCB is pretty much a single
thread, because it's driving I/O back out through all the open
connections, or that it describes thread groups back out the I/O side.
("TCB" not to be confused with "thread control block".)


Nonblocking I/O, and, Asynchronous I/O

One thing I'm not too sure about is the limits of the read and write of
the non-blocking I/O. What I figure is that mostly buffers throughout
are 4KiB buffers from a free-list, which is the usual idea of reserving
buffers and getting them off a free-list and returning them when done.
Then, I sort of figure that the reader, gets about a 1MiB buffer for
itself, with the idea being, that the Inputter when there is data off
the wire, reads it into 1MiB buffer, then copies that off to 4KiB buffers.

BFL // buffer free-list, 1
BIR // buffer of the inputter/reader, 1
B4K // buffer of 4KiB size, many

What I figure that BIR is "direct memory" as much as possible, for DMA
where native, while, figuring that pretty much it's buffers on the heap,
fixed-size buffers of small enough size to usually not be mostly sparse,
while not so small that usual larger messages are a ton of them, then
with regards to the semantics of offsets and extents in the buffers and
buffer lists, and atomic consumption of the front of the list and atomic
concatenation to the back of the list, or queue, and about the
"monohydra" or "slique" data structure defined way above in this thread.

Then about writing is another thing, I figure that a given number of
4KiB buffers will write out, then no longer be non-blocking while
draining, about the non-blocking I/O, that read is usually non-blocking
because if nothing is available then nothing gets copied, while write
may be blocking because the UART or what it is remains to drain to write
more in.

I'm not even sure about O_NONBLOCK, aio_read/aio_write, and overlapped I/O.

Then it looks like O_NONBLOCKING with select and asynchronous I/O the
aio or overlapped I/O, sort of have different approaches.

I figure to use non-blocking select, then, the selector for the channel
at least in Java, has both read and write interest, or all interest,
with regards to there only being one selector key per channel (socket).
The issue with this is that there's basically that the Inputter/Reader
and Outputter/Writer are all one thread. So, it's figured that reads
would read about a megabyte at a time, then round-robin all the ready
reads and writes, that for each non-blocking read, it reads as much as a
megabyte into the one buffer there, copies the read bytes appending it
into the buffer array in front of the remux Input for the attachment,
tries to write as many as possible for the buffer array for the write
output in front of the remux Output for the attachment, then proceeds
round-robin through the selector keys. (That each of those is
non-blocking on the read/write a.k.a. recv/send then copying from the
read buffer into application buffers is according to as fast as it can
fill a free-list given list of buffers, though that any might get
nothing done.)

One of the issues is that the selector keys get waked up for read, when
there is any input, and for write, when the output has any writeable
space, yet, there's no reason to service the write keys when there is
nothing to write, and nothing to read from the read keys when nothing to
read.

So, it's figured the read keys are always of interest, yet if the write
keys are of interest, mostly it's only one or the other. So I'd figure
to have separate read and write selectors, yet, it's suggested they must
go together, the channel and its operations of interest, then whether the
idea is "round-robin write then round-robin read", because all the
selector keys would always be waking up for writing nothing when the way
is clear, for nothing.
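
For a sense of that round-robin over the selected keys, a rough sketch
of the read half, hedged in that the Attachment and its remux-input
method are hypothetical names of this sketch:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

// Sketch: block in select, then round-robin the ready keys, reading up to
// ~1MiB into the one read buffer and copying it off (appending) into the
// attachment's buffer array, the remux Input; any of these may do nothing.
interface Attachment { void remuxInputAppend(ByteBuffer read); }

final class ReaderLoop {
    static void serviceReads(Selector selector, ByteBuffer bir) throws IOException {
        selector.select(60_000);                         // wake at least every 60 seconds
        Iterator<SelectionKey> it = selector.selectedKeys().iterator();
        while (it.hasNext()) {
            SelectionKey key = it.next();
            it.remove();
            if (!key.isValid() || !key.isReadable()) continue;
            SocketChannel ch = (SocketChannel) key.channel();
            bir.clear();
            int n = ch.read(bir);                        // non-blocking: may read nothing
            if (n < 0) { key.cancel(); ch.close(); continue; }   // a drop, discovered as it goes
            bir.flip();
            ((Attachment) key.attachment()).remuxInputAppend(bir);  // copy off into 4KiB buffers
        }
    }
}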

Then besides non-blocking I/O is asynchronous I/O, where, mostly the
idea is that the completion handler results about the same, ..., where
the completion handler is usually enough "copy the data out to read,
repeat", or just "atomic append more to write, repeat", with though
whether that results that each connection needs its own read buffers, in
terms of asynchronous I/O, not saying in what order or whether
completion ports or completion handlers, would for
reading each need their own buffer. I.e., to scale to unbounded many
connections, the idea is to use constant size resources, because
anything linear would grow unbounded. That what's to write is still all
these buffers of data and how to "deduplicate the backend" still has
that the heap fills up with tasks, that the other great hope is that the
resulting runtime naturally rate-limits itself, by what resources it
has, heap.

About "live handles" is the sort of hope that "well when it gets to the
writing the I/O, figuring to transfer an entire file, pass it an open
handle", is starting to seem a bad idea, mostly for not keeping handles
open while not actively reading and writing from them, and that mostly
for the usual backend though that does have a file-system or
object-store representation, how to result that results a sort of
streaming sub-protocol routine, about fetching ranges of the objects or
otherwise that the idea is that the backend file is a zip file, with
that the results are buffers of data ready to write, or handles, to
concatenate the compressed sections that happen to be just ranges in the
file, compressed, with concatenating them together about the internals
of zip file format, the data at rest. I.e. the idea is that handles are
sides of a pipe then to transfer the handle as readable to the output
side of the pipe as writeable.
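
For the handle idea, where the result is a range of the zip file at
rest, a sketch with the standard java.nio transferTo, hedged in that how
the range is found (the entry's offset and compressed length in the zip)
is left elsewhere:

import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch: the "handle" amounts to an open FileChannel plus a range (offset,
// length) of a concatenable compressed entry; transferTo hands that range
// to the writable side (the connection, or the write end of a pipe) without
// staging it through heap buffers.
final class RangeTransfer {
    static long transferRange(Path zipAtRest, long offset, long length,
                              WritableByteChannel out) throws IOException {
        try (FileChannel in = FileChannel.open(zipAtRest, StandardOpenOption.READ)) {
            long sent = 0;
            while (sent < length) {
                long n = in.transferTo(offset + sent, length - sent, out);
                if (n <= 0) break;   // writable side full or closed; a real loop re-arms and resumes
                sent += n;
            }
            return sent;
        }
    }
}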

It seems though for various runtimes, that both a sort of "classic
O_NONBLOCKING" and "async I/O in callbacks" organizations, can be about
the same, figuring that whenever there's a read that it drives the Layers
then the Recognizer/Parser (the remux if any and then the
command/payload parser), and the Layers, and if there's anything to
write then the usual routine is to send it and release to recycle any
buffers, or close the handles, as their contents are sent.

It's figured to marshal whatever there is to write as buffers, while,
the idea of handles results being more on the asynchronous I/O on the
backend when it's filesystem. Otherwise it would get involved partially
written handles, though there's definitely something to be said for an
open handle to an unbounded file, and writing that out without breaking
it into a streaming-sub-protocol or not having it on the heap.

"Use nonblocking mode for this operation; that is, this call to preadv2
will fail and set errno to EAGAIN if the operation would block. "

The goal is mostly being entirely non-blocking, then with that the
atomic consume/concatenate of buffers makes for "don't touch the buffers
while their I/O is outstanding or imminent", then that what services I/O
only consumes and concatenates, while getting from the free-list or
returning to the free-list, what it concatenates or consumes. [It's
figured to have buffers of 4KiB or 512KiB size, the inputter gets a 1MiB
direct buffer, that RAM is a very scarce resource.]

So, for the non-blocking I/O, I'm trying to figure out how to service
the ready reads, while, only servicing ready writes that also have
something to write. Then I don't much worry about it because ready
writes with nothing to write would result a no-op. Then, about the
asynchronous I/O, is that there would always be an outstanding or
imminent completion result for the ready read, or that, I'm not sure how
to make it so that reads are not making busy-work, while, it seems clear
that writes are driven by there being something to write, then though
not wanting those to hammer when the output buffer is full. In this
sense the non-blocking vector I/O with select/epoll/kqueue or what, uses
less resources for services that have various levels of load, day-over-day.


https://hackage.haskell.org/package/machines
https://clojure.org/reference/transducers
https://chamibuddhika.wordpress.com/2012/08/11/io-demystified/


With non-blocking I/O, or at least in Java, the attachment, is attached
to the selection key, so, they're just round-robin'ed. In asynchronous
(aio on POSIX or overlapped I/O on Windows respectively), in Java the
completion event gets the attachment, but doesn't really say how to
invoke the async send/recv again, and I don't want to maintain a map of
attachments and connections, though it would be alright if that's the
way of things.

Then it sort of seems like "non-blocking for read, or drops, async I/O
for writes". Yet, for example in Java, a SocketChannel is a
SelectableChannel, while, an AsynchronousSocketChannel, is not a SelectableChannel.

Then, it seems pretty clear that while on Windows, one might want to
employ the aio model, because it's built into Windows, then as for the
sort of followup guarantees, or best when on Windows, that otherwise the
most usual approach is "O_NONBLOCKING" for the socket fd and the fd_set.

Then, what select seems to guarantee, is, that, operations of interest,
_going to ready_, get updated, it doesn't say anything about going to
un-ready. Reads start un-ready and writes start ready, then that the
idea is that select results updating readiness, but not unreadiness.
Then the usual selector implementation, for the selection keys, and the
registered keys and the selected keys, for the interest ops (here only
read and write yet also connect when drops fall out of it) and ready ops.

Yet, it doesn't seem to really claim to guarantee, that while working
with a view of the selection keys, that if selection keys are removed
because they're read-unready (nothing to do) or nothing-to-write
(nothing to do), one worries that the next select round has to have
marked any read-ready, while, it's figured that any something-to-write,
should add the corresponding key back to the selection keys. (There's
for that if the write buffer is full, it would just return 0 I suppose,
yet not wanting to hammer/thrash/churn instead just write when ready.)

So I want to establish that there can be more than one selector,
because, otherwise I suppose that the Inputter/Reader (now also
Outputter/Writer) wants read keys that update to ready, and write keys
that update to ready, yet not write keys that have nothing-to-do, when
they're all ready when they have nothing-to-do. Yet, it seems pretty
much that they all go through one function, like WSPSelect on Windows.

I suppose there's setting the interest ops of the key, according to
whether there's something to write, figuring there's always something to
read, yet when there is something to write, would involve finding the
key and setting its write-interest again. I don't figure that any kind
of changing the selector keys themselves is any kind of good idea at
all, but I only want to deal with the keys that get activity.

Also there's an idea that read() or write() might return -1 and set
EAGAIN in the POSIX thread local error number, yet for example in the
Java implementation it's to be avoided altogether calling the unready as
they only return >0 or throw an otherwise ambiguous exception.

So, I'm pretty much of a mind to just invoke select according to 60
seconds timeout, then just have the I/O thread service all the selection
keys, what way it can sort of discover drops as it goes through then
read if readable and write if write-able and timeout according to the
protocol if the protocol has a timeout.

Yet, it seems instead that when a read() or write() returns until read()
or write() returns 0, there is a bit of initialization to figure out,
must be. What it seems that selection is on all the interest ops, then
to unset interest on OP_WRITE, until there is something to write, then
to set interest on OP_WRITE on the selector's keys, before entering
select, wherein it will populate what's writable, as where it's
writable. Yet, there's not removing the key, as it will show up for
OP_READ presumably anyways.

Anyways it seems that it's alright to have multiple selectors anyways,
so having separate read and write selectors seems fine. Then though
there's two threads, so both can block in select() at the same time.
Then it's figured that the write selector is initialized by deleting the
selected-key as it starts by default write-able, and then it's only of
interest when it's ever full on writing, so it comes up, there's writes
until done and it's deleted, then that continues until there's nothing
to do. The reads are pretty simple then and when the selected-keys come
up they're read until nothing-to-do, then deleted from selected-keys.
[So, the writer thread is mostly only around to finish unfulfilled writes.]
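
As a sketch of that OP_WRITE toggling, which is the standard java.nio
interest-ops mechanics, with the queue of pending buffers per attachment
being only this sketch's:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.SocketChannel;
import java.util.Queue;

// Sketch: writes go straight out while the channel accepts them; only when a
// write comes up short does the key gain OP_WRITE interest, and the writer
// thread finishes the unfulfilled writes then clears the interest again.
final class WriterSide {
    static void tryWrite(SelectionKey key, Queue<ByteBuffer> pending) throws IOException {
        SocketChannel ch = (SocketChannel) key.channel();
        while (!pending.isEmpty()) {
            ByteBuffer buf = pending.peek();
            ch.write(buf);                                // non-blocking: may write nothing
            if (buf.hasRemaining()) {                     // output full for now: finish later
                key.interestOps(key.interestOps() | SelectionKey.OP_WRITE);
                return;
            }
            pending.poll();                               // fully written; recycle the buffer
        }
        key.interestOps(key.interestOps() & ~SelectionKey.OP_WRITE);  // nothing-to-do: no write interest
    }
}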


Remux: Multiplexer/Demultiplexer, Remultiplexer, mux/demux

A command might have multiple responses, where it's figured it will
result multiple tasks, or a single task, that return to a single
attachment's connection. The multiplexer mostly accepts that requests
are multiplexed over the connection, so it results that those are
ephemeral and that the remux creates remux attachments to the original
attachment, involved in any sort of frames/chunks. The compression layer
is variously before or after that, then encryption is after that, while
some protocols also have encryption of a sort within that.

The remux then results that the Recognizer/Parser just gets input, and
recognizes frames/chunks their creation, then assembling their contents
into commands/payloads. Then it's figured that the commands are
independent and just work their way through as tasks and then get
chunked/framed as according to the remux, then also as with regards to
"streaming sub-protocols with respect to the remux".

Pipelined commands basically result a remux, establishing that the
responses are written in serial order as were received.

It's basically figured that 63 bit or 31 bit serial numbers would be
plenty to identify unique requests per connection, and connections and
so on, about the lifetime of the routine and a serial number for each thing.



IO <-> Selectors <-> Rec/Par <-> Remux <-> Rec/Par <-> TQ/TS <-> backend
Ross Finlayson
2024-04-29 03:24:00 UTC
Reply
Permalink
On 04/27/2024 09:01 AM, Ross Finlayson wrote:
> On 04/25/2024 10:46 AM, Ross Finlayson wrote:
>> On 04/22/2024 10:06 AM, Ross Finlayson wrote:
>>> On 04/20/2024 11:24 AM, Ross Finlayson wrote:
>>>>
>>>>
>>>> Well I've been thinking about the re-routine as a model of cooperative
>>>> multithreading,
>>>> then thinking about the flow-machine of protocols
>>>>
>>>> NNTP
>>>> IMAP <-> NNTP
>>>> HTTP <-> IMAP <-> NNTP
>>>>
>>>> Both IMAP and NNTP are session-oriented on the connection, while,
>>>> HTTP, in terms of session, has various approaches in terms of HTTP 1.1
>>>> and connections, and the session ID shared client/server.
>>>>
>>>>
>>>> The re-routine idea is this, that each kind of method, is memoizable,
>>>> and, it memoizes, by object identity as the key, for the method, all
>>>> its callers, how this is like so.
>>>>
>>>> interface Reroutine1 {
>>>>
>>>> Result1 rr1(String a1) {
>>>>
>>>> Result2 r2 = reroutine2.rr2(a1);
>>>>
>>>> Result3 r3 = reroutine3.rr3(r2);
>>>>
>>>> return result(r2, r3);
>>>> }
>>>>
>>>> }
>>>>
>>>>
>>>> The idea is that the executor, when it's submitted a reroutine,
>>>> when it runs the re-routine, in a thread, then it puts in a
>>>> ThreadLocal,
>>>> the re-routine, so that when a re-routine it calls, returns null as it
>>>> starts an asynchronous computation for the input, then when
>>>> it completes, it submits to the executor the re-routine again.
>>>>
>>>> Then rr1 runs through again, retrieving r2 which is memoized,
>>>> invokes rr3, which throws, after queuing to memoize and
>>>> resubmit rr1, when that calls back to resubmit r1, then rr1
>>>> routines, signaling the original invoker.
>>>>
>>>> Then it seems each re-routine basically has an instance part
>>>> and a memoized part, and that it's to flush the memo
>>>> after it finishes, in terms of memoizing the inputs.
>>>>
>>>>
>>>> Result 1 rr(String a1) {
>>>> // if a1 is in the memo, return for it
>>>> // else queue for it and carry on
>>>>
>>>> }
>>>>
>>>>
>>>> What is a re-routine?
>>>>
>>>> It's a pattern for cooperative multithreading.
>>>>
>>>> It's sort of a functional approach to functions and flow.
>>>>
>>>> It has a declarative syntax in the language with usual
>>>> flow-of-control.
>>>>
>>>> So, it's cooperative multithreading so it yields?
>>>>
>>>> No, it just quits, and expects to be called back.
>>>>
>>>> So, if it quits, how does it complete?
>>>>
>>>> The entry point to re-routine provides a callback.
>>>>
>>>> Re-routines only return results to other re-routines,
>>>> It's the default callback. Otherwise they just callback.
>>>>
>>>> So, it just quits?
>>>>
>>>> If a re-routine gets called with a null, it throws.
>>>>
>>>> If a re-routine gets a null, it just continues.
>>>>
>>>> If a re-routine completes, it callbacks.
>>>>
>>>> So, can a re-routine call any regular code?
>>>>
>>>> Yeah, there are some issues, though.
>>>>
>>>> So, it's got callbacks everywhere?
>>>>
>>>> Well, it's just got callbacks implicitly everywhere.
>>>>
>>>> So, how does it work?
>>>>
>>>> Well, you build a re-routine with an input and a callback,
>>>> you call it, then when it completes, it calls the callback.
>>>>
>>>> Then, re-routines call other re-routines with the argument,
>>>> and the callback's in a ThreadLocal, and the re-routine memoizes
>>>> all of its return values according to the object identity of the
>>>> inputs,
>>>> then when a re-routine completes, it calls again with another
>>>> ThreadLocal
>>>> indicating to delete the memos, following the exact same
>>>> flow-of-control
>>>> only deleting the memos going along, until it results all the
>>>> memos in
>>>> the re-routines for the interned or ref-counted input are deleted,
>>>> then the state of the re-routine is de-allocated.
>>>>
>>>> So, it's sort of like a monad and all in pure and idempotent functions?
>>>>
>>>> Yeah, it's sort of like a monad and all in pure and idempotent
>>>> functions.
>>>>
>>>> So, it's a model of cooperative multithreading, though with no yield,
>>>> and callbacks implicitly everywhere?
>>>>
>>>> Yeah, it's sort of figured that a called re-routine always has a
>>>> callback in the ThreadLocal, because the runtime has pre-emptive
>>>> multithreading anyways, that the thread runs through its re-routines in
>>>> their normal declarative flow-of-control with exception handling, and
>>>> whatever re-routines or other pure monadic idempotent functions it
>>>> calls, throw when they get null inputs.
>>>>
>>>> Also it sort of doesn't have primitive types, Strings must always
>>>> be interned, all objects must have a distinct identity w.r.t. ==, and
>>>> null is never an argument or return value.
>>>>
>>>> So, what does it look like?
>>>>
>>>> interface Reroutine1 {
>>>>
>>>> Result1 rr1(String a1) {
>>>>
>>>> Result2 r2 = reroutine2.rr2(a1);
>>>>
>>>> Result3 r3 = reroutine3.rr3(r2);
>>>>
>>>> return result(r2, r3);
>>>> }
>>>>
>>>> }
>>>>
>>>> So, I expect that to return "result(r2, r3)".
>>>>
>>>> Well, that's synchronous, and maybe blocking, the idea is that it
>>>> calls rr2, gets a1, and rr2 constructs with the callback of rr1 and
>>>> it's
>>>> own callback, and a1, and makes a memo for a1, and invokes whatever is
>>>> its implementation, and returns null, then rr1 continues and invokes
>>>> rr3
>>>> with r2, which is null, so that throws a NullPointerException, and rr1
>>>> quits.
>>>>
>>>> So, ..., that's cooperative multithreading?
>>>>
>>>> Well you see what happens is that rr2 invoked another
>>>> re-routine or
>>>> end routine, and at some point it will get called back, and that will
>>>> happen over and over again until rr2 has an r2, then rr2 will memoize
>>>> (a1, r2), and then it will callback rr1.
>>>>
>>>> Then rr1 had quit, it runs again, this time it gets r2 from the
>>>> (a1, r2) memo in the monad it's building, then it passes a non-null r2
>>>> to rr3, which proceeds in much the same way, while rr1 quits again
>>>> until
>>>> rr3 calls it back.
>>>>
>>>> So, ..., it's non-blocking, because it just quits all the time, then
>>>> happens to run through the same paces filling in?
>>>>
>>>> That's the idea, that re-routines are responsible to build the
>>>> monad and call-back.
>>>>
>>>> So, can I just implement rr2 and rr3 as synchronous and blocking?
>>>>
>>>> Sure, they're interfaces, their implementation is separate. If
>>>> they don't know re-routine semantics then they're just synchronous and
>>>> blocking. They'll get called every time though when the re-routine
>>>> gets
>>>> called back, and actually they need to know the semantics of returning
>>>> an Object or value by identity, because, calling equals() to implement
>>>> Memo usually would be too much, where the idea is to actually function
>>>> only monadically, and that given same Object or value input, must
>>>> return
>>>> same Object or value output.
>>>>
>>>> So, it's sort of an approach as a monadic pure idempotency?
>>>>
>>>> Well, yeah, you can call it that.
>>>>
>>>> So, what's the point of all this?
>>>>
>>>> Well, the idea is that there are 10,000 connections, and any time
>>>> one of them demultiplexes off the connection an input command message,
>>>> then it builds one of these with the response input to the
>>>> demultiplexer
>>>> on its protocol on its connection, on the multiplexer to all the
>>>> connections, with a callback to itself. Then the re-routine is
>>>> launched
>>>> and when it returns, it calls-back to the originator by its
>>>> callback-number, then the output command response writes those back
>>>> out.
>>>>
>>>> The point is that there are only as many Threads as cores so the
>>>> goal is that they never block,
>>>> and that the memos make for interning Objects by value, then the
>>>> goal is
>>>> mostly to receive command objects and handles to request bodies and
>>>> result objects and handles to response bodies, then to call-back with
>>>> those in whatever serial order is necessary, or not.
>>>>
>>>> So, won't this run through each of these re-routines umpteen times?
>>>>
>>>> Yeah, you figure that the runtime of the re-routine is on the
>>>> order
>>>> of n^2 the order of statements in the re-routine.
>>>>
>>>> So, isn't that terrible?
>>>>
>>>> Well, it doesn't block.
>>>>
>>>> So, it sounds like a big mess.
>>>>
>>>> Yeah, it could be. That's why to avoid blocking and callback
>>>> semantics, is to make monadic idempotency semantics, so then the
>>>> re-routines are just written in normal synchronous flow-of-control, and
>>>> their well-defined behavior is exactly according to flow-of-control
>>>> including exception-handling.
>>>>
>>>> There's that and there's basically it only needs one Thread, so,
>>>> less Thread x stack size, for a deep enough thread call-stack. Then
>>>> the
>>>> idea is about one Thread per core, figuring for the thread to always be
>>>> running and never be blocking.
>>>>
>>>> So, it's just normal flow-of-control.
>>>>
>>>> Well yeah, you expect to write the routine in normal
>>>> flow-of-control, and to test it with synchronous and in-memory editions
>>>> that just run through synchronously, and that if you don't much care if
>>>> it blocks, then it's the same code and has no semantics about the
>>>> asynchronous or callbacks actually in it. It just returns when it's
>>>> done.
>>>>
>>>>
>>>> So what's the requirements of one of these again?
>>>>
>>>> Well, the idea is, that, for a given instance of a re-routine,
>>>> it's
>>>> an Object, that implements an interface, and it has arguments, and it
>>>> has a return value. The expectation is that the re-routine gets called
>>>> with the same arguments, and must return the same return value. This
>>>> way later calls to re-routines can match the same expectation,
>>>> same/same.
>>>>
>>>> Also, if it gets different arguments, by Object identity or
>>>> primitive value, the re-routine must return a different return value,
>>>> those being same/same.
>>>>
>>>> The re-routine memoizes its arguments by its argument list, Object
>>>> or primitive value, and a given argument list is same if the order and
>>>> types and values of those are same, and it must return the same return
>>>> value by type and value.
>>>>
>>>> So, how is this cooperative multithreading unobtrusively in
>>>> flow-of-control again?
>>>>
>>>> Here for example the idea would be, rr2 quits and rr1 continues, rr3
>>>> quits and rr1 continues, then reaching rr4, rr4 throws and rr1 quits.
>>>> When rr2's or rr3's memo-callback completes, then it calls-back
>>>> rr1. as
>>>> those come in, at some point rr4 will be fulfilled, and thus rr4 will
>>>> quit and rr1 will quit. When rr4's callback completes, then it will
>>>> call-back rr1, which will finally complete, and then call-back whatever
>>>> called rr1. Then rr1 runs itself through one more time to
>>>> delete or decrement all its memos.
>>>>
>>>> interface Reroutine1 {
>>>>
>>>> Result1 rr1(String a1) {
>>>>
>>>> Result2 r2 = reroutine2.rr2(a1);
>>>>
>>>> Result3 r3 = reroutine3.rr3(a1);
>>>>
>>>> Result4 r4 = reroutine4.rr4(a1, r2, r3);
>>>>
>>>> return Result1.r4(a1, r4);
>>>> }
>>>>
>>>> }
>>>>
>>>> The idea is that it doesn't block when it launches rr2 and rr3, until
>>>> such time as it just quits when it tries to invoke rr4 and gets a
>>>> resulting NullPointerException, then eventually rr4 will complete
>>>> and be
>>>> memoized and call-back rr1, then rr1 will be called-back and then
>>>> complete, then run itself through to delete or decrement the ref-count
>>>> of all its memo-ized fragmented monad respectively.
>>>>
>>>> Thusly it's cooperative multithreading by never blocking and always
>>>> just
>>>> launching callbacks.
>>>>
>>>> There's this System.identityHashCode() method and then there's a notion
>>>> of Object pools and interning Objects then as for about this way that
>>>> it's about numeric identity instead of value identity, so that when
>>>> making memo's that it's always "==" and for a HashMap with
>>>> System.identityHashCode() instead of ever calling equals(), when
>>>> calling
>>>> equals() is more expensive than calling == and the same/same
>>>> memo-ization is about Object numeric value or the primitive scalar
>>>> value, those being same/same.
>>>>
>>>> https://docs.oracle.com/javase/8/docs/api/java/lang/System.html#identityHashCode-java.lang.Object-
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> So, you figure to return Objects to these connections by their session
>>>> and connection and mux/demux in these callbacks and then write those
>>>> out?
>>>>
>>>> Well, the idea is to make it so that according to the protocol, the
>>>> back-end sort of knows what makes a handle to a datum of the sort,
>>>> given
>>>> the protocol and the protocol and the protocol, and the callback is
>>>> just
>>>> these handles, about what goes in the outer callbacks or outside the
>>>> re-routine, those can be different/same. Then the single writer thread
>>>> servicing the network I/O just wants to transfer those handles, or, as
>>>> necessary through the compression and encryption codecs, then write
>>>> those out, well making use of the java.nio for scatter/gather and
>>>> vector
>>>> I/O in the non-blocking and asynchronous I/O as much as possible.
>>>>
>>>>
>>>> So, that seems a lot of effort to just passing the handles, ....
>>>>
>>>> Well, I don't want to write any code except normal flow-of-control.
>>>>
>>>> So, this same/same bit seems onerous, as long as different/same has a
>>>> ref-count and thus the memo-ized monad-fragment is maintained when all
>>>> sorts of requests fetch the same thing.
>>>>
>>>> Yeah, maybe you're right. There's much to be gained by re-using
>>>> monadic
>>>> pure idempotent functions yet only invoking them once. That gets into
>>>> value equality besides numeric equality, though, with regards to going
>>>> into re-routines and interning all Objects by value, so that inside and
>>>> through it's all "==" and System.identityHashCode, the memos, then
>>>> about
>>>> the ref-counting in the memos.
>>>>
>>>>
>>>> So, I suppose you know HTTP, and about HTTP/2 and IMAP and NNTP here?
>>>>
>>>> Yeah, it's a thing.
>>>>
>>>> So, I think this needs a much cleaner and well-defined definition, to
>>>> fully explore its meaning.
>>>>
>>>> Yeah, I suppose. There's something to be said for reading it again.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>>
>>> ReRoutines: monadic functional non-blocking asynchrony in the language
>>>
>>>
>>> Implementing a sort of Internet protocol server, it sort of has three or
>>> four kinds of machines.
>>>
>>> flow-machine: select/epoll hardware driven I/O events
>>>
>>> protocol-establishment: setting up and changing protocol (commands,
>>> encryption/compression)
>>>
>>> protocol-coding: block coding in encryption/compression and wire/object
>>> commands/results
>>>
>>> routine: inside the objects of the commands of the protocol,
>>> commands/results
>>>
>>> Then, it often looks sort of like
>>>
>>> flow <-> protocol <-> routine <-> protocol <-> flow
>>>
>>>
>>> On either outer side of the flow is a connection, it's a socket or the
>>> receipt or sending of a datagram, according to the network interface and
>>> select/epoll.
>>>
>>> The establishment of a protocol looks like
>>> connection/configuration/commencement/conclusion, or setup/teardown.
>>> Protocols get involved renegotiation within a protocol, and for example
>>> upgrade among protocols. Then the protocol is setup and established.
>>>
>>> The idea is that a protocol's coding is in three parts for
>>> coding/decoding, compression/decompression, and (en)cryption/decryption,
>>> or as it gets set up.
>>>
>>> flow->decrypt->decomp->decod->routine->cod->comp->crypt->flow-v
>>> flow<-crypt<-comp<-cod<-routine<-decod<-decomp<-decrypt<-flow<-
>>>
>>>
>>>
>>> Whenever data arrives, the idea goes, is that the flow is interpreted
>>> according to the protocol, resulting commands, then the routine derives
>>> results from the commands, as by issuing others, in their protocols, to
>>> the backend flow. Then, the results get sent back out through the
>>> protocol, to the frontend, the clients of what it serves the protocol
>>> the server.
>>>
>>> The idea is that there are about 10,000 connections at a time, or more
>>> or less.
>>>
>>> flow <-> protocol <-> routine <-> protocol <-> flow
>>> flow <-> protocol <-> routine <-> protocol <-> flow
>>> flow <-> protocol <-> routine <-> protocol <-> flow
>>> ...
>>>
>>>
>>>
>>>
>>> Then, the routine in the middle, has that there's one processor, and on
>>> the processor are a number of cores, each one independent. Then, the
>>> operating system establishes that each of the cores, has any number of
>>> threads-of-control or threads, and each thread has the state of where it
>>> is in the callstack of routines, and the threads are preempted so that
>>> multithreading, that a core runs multiple threads, gives each thread
>>> some running from the entry to the exit of the thread, in any given
>>> interval of time. Each thread-of-control is thusly independent, while it
>>> must synchronize with any other thread-of-control, to establish common
>>> or mutual state, and threads establish taking turns by mutual exclusion,
>>> called "mutex".
>>>
>>> Into and out of the protocol, coding, is either a byte-sequence or
>>> block, or otherwise the flow is a byte-sequence, that being serial,
>>> however the protocol multiplexes and demultiplexes messages, the
>>> commands and their results, to and from the flow.
>>>
>>> Then the idea is that what arrives to/from the routine, is objects in
>>> the protocol, or handles to the transport of byte sequences, in the
>>> protocol, to the flow.
>>>
>>> A usual idea is that there's a thread that services the flow, where, how
>>> it works is that a thread blocks waiting for there to be any I/O,
>>> input/output, reading input from the flow, and writing output to the
>>> flow. So, mostly the thread that blocks has that there's one thread that
>>> blocks on input, and when there's any input, then it reads or transfers
>>> the bytes from the input, into buffers. That's its only job, and only
>>> one thread can block on a given select/epoll selector, which is any
>>> given number of ports, the connections, the idea being that it just
>>> blocks until select returns for its keys of interest, it services each
>>> of the I/O's by copying from the network interface's buffers into the
>>> program's buffers, then other threads do the rest.
>>>
>>> So, if a thread results waiting at all for any other action to complete
>>> or be ready, it's said to "block". While a thread is blocked, the CPU or
>>> core just skips it in scheduling the preemptive multithreading, yet it
>>> still takes some memory and other resources and is in the scheduler of
>>> the threads.
>>>
>>> The idea that the I/O thread, ever blocks, is that it's a feature of
>>> select/epoll that hardware results waking it up, with the idea that
>>> that's the only thread that ever blocks.
>>>
>>> So, for the other threads, in the decryption/decompression/decoding and
>>> coding/compression/cryption, the idea is that a thread, runs through
>>> those, then returns what it's doing, and joins back to a limited pool of
>>> threads, with a usual idea of there being 1 core : 1 thread, so that
>>> multithreading is sort of simplified, because as far as the system
>>> process is concerned, it has a given number of cores and the system
>>> preemptively multithreads it, and as far as the virtual machine is
>>> concerned, is has a given number of cores and the virtual machine
>>> preemptively multithreads its threads, about the thread-of-control, in
>>> the flow-of-control, of the thing.
>>>
>>> A usual way that the routine multiplexes and demultiplexes objects in the
>>> protocol from a flow's input back to a flow's output, has that the
>>> thread-per-connection model has that a single thread carries out the
>>> entire task through the backend flow, blocking along the way, until it
>>> results joining after writing back out to its connection. Yet, that has
>>> a thread per each connection, and threads use scheduling and heap
>>> resources. So, here thread-per-connection is being avoided.
>>>
>>> Then, a usual idea of the tasks, is that as I/O is received and flows
>>> into the decryption/decompression/decoding, then what's decoded, results
>>> the specification of a task, the command, and the connection, where to
>>> return its result. The specification is a data structure, so it's an
>>> object or Object, then. This is added to a queue of tasks, where
>>> "buffers" represent the ephemeral storage of content in transport the
>>> byte-sequences, while, the queue is as usually a first-in/first-out
>>> (FIFO) queue also, of tasks.
>>>
>>> Then, the idea is that each of the cores consumes task specifications
>>> from the task queue, performs them according to the task specification,
>>> then the results are written out, as coded/compressed/crypted, in the
>>> protocol.
>>>
>>> So, to avoid the threads blocking at all, introduces the idea of
>>> "asynchrony" or callbacks, where the idea is that the "blocking" and
>>> "synchronous" has that anywhere in the threads' thread-of-control
>>> flow-of-control, according to the program or the routine, it is current
>>> and synchronous, the value that it has, then with regards to what it
>>> returns or writes, as the result. So, "asynchrony" is the idea that
>>> there's established a callback, or a place to pause and continue, then a
>>> specification of the task in the protocol is put to an event queue and
>>> executed, or from servicing the O/I's of the backend flow, that what
>>> results from that, has the context of the callback and returns/writes to
>>> the relevant connection, its result.
>>>
>>> I -> flow -> protocol -> routine -> protocol -> flow -> O -v
>>> O <- flow <- protocol <- routine <- protocol <- flow <- I <-
>>>
>>>
>>> The idea of non-blocking then, is that a routine either provides a
>>> result immediately available, and is non-blocking, or, queues a task
>>> what results a callback that provides the result eventually, and is
>>> non-blocking, and never invokes any other routine that blocks, so is
>>> non-blocking.
>>>
>>> This way a thread, executing tasks, always runs through a task, and thus
>>> services the task queue or TQ, so that the cores' threads are always
>>> running and never blocking. (Besides the I/O and O/I threads which block
>>> when there's no traffic, and usually would be constantly woken up and
>>> not waiting blocked.) This way, the TQ threads, only block when there's
>>> nothing in the TQ, or are just deconstructed, and reconstructed, in a
>>> "pool" of threads, the TQ's executor pool.
>>>
>>> Enter the ReRoutine
>>>
>>> The idea of a ReRoutine, a re-routine, is that it is a usual procedural
>>> implementation as if it were synchronous, and agnostic of callbacks.
>>>
>>> It is named after "routine" and "co-routine". It is a sort of co-routine
>>> that builds a monad and is aware of its originating caller, re-caller, and
>>> callback, or, its re-routine caller, re-caller, and callback.
>>>
>>> The idea is that there are callbacks implicitly at each method boundary,
>>> and that nulls are reserved values to indicate the result or lack
>>> thereof of re-routines, so that the code has neither callbacks nor any
>>> nulls.
>>>
>>> The originating caller has that the TQ, has a task specification, the
>>> session+attachment of the client in the protocol where to write the
>>> output, and the command, then the state of the monad of the task, that
>>> lives on the heap with the task specification and task object. The TQ
>>> consumers or executors or the executor, when a thread picks up the task,
>>> it picks up or builds ("originates") the monad state, which is the
>>> partial state of the re-routine and a memo of the partial state of the
>>> re-routine, and installs this in the thread local storage or
>>> ThreadLocal, for the duration of the invocation of the re-routine. Then
>>> the thread enters the re-routine, which proceeds until it would block,
>>> where instead it queues a command/task with callback to re-call it to
>>> re-launch it, and throws a NullPointerException and quits/returns.
>>>
>>> This happens recursively and iteratively in the re-routine implemented
>>> as re-routines, each re-routine updates the partial state of the monad,
>>> then that as a re-routine completes, it re-launches the calling
>>> re-routine, until the original re-routine completes, and it calls the
>>> original callback with the result.
>>>
>>> This way the re-routine's method body, is written as plain declarative
>>> procedural code, the flow-of-control, is exactly as if it were
>>> synchronous code, and flow-of-control is exactly as if written in the
>>> language with no callbacks and never nulls, and exception-handling as
>>> exactly defined by the language.
>>>
>>> As the re-routine accumulates the partial results, they live on the
>>> heap, in the monad, as a member of the originating task's object the
>>> task in the task queue. This is always added back to the queue as one of
>>> the pending results of a re-routine, so it stays referenced as an object
>>> on the heap, then that as it is completed and the original re-routine
>>> returns, then it's no longer referenced and the garbage-collector can
>>> reclaim it from the heap or the allocator can delete it.
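>>>
>>> A sketch of that invocation shape, assuming hypothetical Monad, ReTask,
>>> and ReRoutineRunner types, just to illustrate the ThreadLocal install and
>>> the throw-quit/re-launch convention:
>>>
>>> import java.util.concurrent.Callable;
>>> import java.util.function.Consumer;
>>>
>>> class Monad { /* memo of the re-routine's partial results, lives on the heap with the task */ }
>>>
>>> class ReTask {
>>>     Monad monad;                   // partial state of the re-routine
>>>     Callable<Object> reRoutine;    // the routine written as plain synchronous code
>>>     Consumer<Object> callback;     // the original callback, back to the attachment
>>> }
>>>
>>> class ReRoutineRunner {
>>>     static final ThreadLocal<Monad> MONAD = new ThreadLocal<>();
>>>
>>>     void invoke(ReTask task) {
>>>         MONAD.set(task.monad);                       // install ("originate") the partial state
>>>         try {
>>>             Object result = task.reRoutine.call();   // runs until complete or until it would block
>>>             task.callback.accept(result);            // completed: original callback gets the result
>>>         } catch (NullPointerException pending) {
>>>             // an input wasn't satisfied: a re-launch was queued with a callback, so just quit
>>>         } catch (Exception e) {
>>>             task.callback.accept(e);                 // other exceptions, as defined by the language
>>>         } finally {
>>>             MONAD.remove();
>>>         }
>>>     }
>>> }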
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> Well, for the re-routine, I sort of figure there's a Callstack and a
>>> Callback type
>>>
>>> class Callstack {
>>> Stack<Callback> callstack;
>>> }
>>>
>>> interface Callback {
>>> void callback() throws Exception;
>>> }
>>>
>>> and then a placeholder sort of type for Callflush
>>>
>>> class Callflush {
>>> Callstack callstack;
>>> }
>>>
>>> with the idea that the presence in ThreadLocals is to be sorted out,
>>> about a kind of ThreadLocal static pretty much.
>>>
>>> With not returning null and for memoizing call-graph dependencies,
>>> there's basically a need for an "unvoid" type.
>>>
>>> class unvoid {
>>>
>>> }
>>>
>>> Then it's sort of figured that there's an interface with some defaults,
>>> with the idea that some boilerplate gets involved in the Memoization.
>>>
>>> interface Caller {}
>>>
>>> interface Callee {}
>>>
>>> interface Callmemo {
>>> void memoize(Caller caller, Object[] args);
>>> void flush(Caller caller);
>>> }
>>>
>>>
>>> Then it seems that the Callstack should instead be of a Callgraph, and
>>> then what's maintained from call to call is a Callpath, and then what's
>>> memoized is all kept with the Callgraph, then with regards to objects on
>>> the heap and their distinctness, only being reachable from the
>>> Callgraph, leaving less work for the garbage collector, to maintain the
>>> heap.
>>>
>>> The interning semantics would still be on the class level, or for
>>> constructor semantics, as with regards to either interning Objects for
>>> uniqueness, or that otherwise they'd be memoized, with the key being the
>>> Callpath, and the initial arguments into the Callgraph.
>>>
>>> Then the idea seems that the ThreaderCaller, establishes the Callgraph
>>> with respect to the Callgraph of an object, installing it on the thread,
>>> otherwise attached to the Callgraph, with regards to the ReRoutine.
>>>
>>>
>>>
>>> About the ReRoutine, it's starting to come together as an idea, what is
>>> the apparatus for invoking re-routines, that they build the monad of the
>>> IOE's (inputs, outputs, exceptions) of the re-routines in their
>>> call-graph, in terms of ThreadLocals of some ThreadLocals that callers
>>> of the re-routines, maintain, with idea of the memoized monad along the
>>> way, and each original re-routine.
>>>
>>> class IOE <O, E extends Exception> {
>>> Object[] input;
>>> O output;
>>> E exception;
>>> }
>>>
>>> So the idea is that there are some ThreadLocal's in a static
>>> ThreadGlobal
>>>
>>> public class ThreadGlobals {
>>> public static ThreadLocal<MonadMemo> monadMemo;
>>> }
>>>
>>> where callers or originators or ReRoutines, keep a map of the Runnables
>>> or Callables they have, to the MonadMemo's,
>>>
>>> class Originator {
>>> Map<? extends ReRoutineMapKey, MonadMemo> monadMemoMap;
>>> }
>>>
>>> then when it's about to invoke a Runnable, if it's a ReRoutine, then it
>>> either retrieves the MonadMemo or makes a new one, and sets it on the
>>> ThreadLocal, then invokes the Runnable, then clears the ThreadLocal.
>>>
>>> Then a MonadMemo, pretty simply, is a List of IOE's, that when the
>>> ReRoutine runs through the callgraph, the callstack is indicated by a
>>> tree of integers, and the stack path in the ReRoutine, so that any
>>> ReRoutine that calls ReRoutines A/B/C, points to an IOE that it finds in
>>> the thing, then its default behavior is to return its memo-ized value,
>>> that otherwise is making the callback that fills its memo and re-invokes
>>> all the way back the Original routine, or just its own entry point.
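>>>
>>> As a small sketch, a MonadMemo might be keyed by the call path down the
>>> callgraph (a list of child indices), with null standing for "pending";
>>> the names here are only illustrative:
>>>
>>> import java.util.HashMap;
>>> import java.util.List;
>>> import java.util.Map;
>>>
>>> class MonadMemo {
>>>     // call path (e.g. [0, 2, 1] down the callgraph) -> memoized IOE for that call
>>>     final Map<List<Integer>, IOE<Object, Exception>> memos = new HashMap<>();
>>>
>>>     IOE<Object, Exception> get(List<Integer> path) {
>>>         return memos.get(path);           // null means pending / not yet satisfied
>>>     }
>>>
>>>     void put(List<Integer> path, IOE<Object, Exception> ioe) {
>>>         memos.put(path, ioe);             // filled by the callback, then the caller is re-launched
>>>     }
>>> }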
>>>
>>> This is basically that the Originator, when the ReRoutine quits out,
>>> sort of has that any ReRoutine it originates, also gets filled up by the
>>> Originator.
>>>
>>> So, then the Originator sort of has a map to a ReRoutine, then for any
>>> Path, the Monad, so that when it sets the ThreadLocal with the
>>> MonadMemo, it also sets the Path for the callee, launches it again when
>>> its callback returned to set its memo and relaunch it, then back up the
>>> path stack to the original re-routine.
>>>
>>> One of the issues here is "automatic parallelization". What I mean by
>>> that is that the re-routine just goes along and when it gets nulls
>>> meaning "pending" it just continues along, then expects
>>> NullPointerExceptions as "UnsatisifiedInput", to quit, figuring it gets
>>> relaunched when its input is satisfied.
>>>
>>> This way then when routines serially don't depend on each others'
>>> outputs, then they all get launched apiece, parallelizing.
>>>
>>> Then, I wonder about usual library code, basically about Collections and
>>> Streams, and the usual sorts of routines that are applied to the
>>> arguments, and how to basically establish that the rule of re-routine
>>> code is that anything that gets a null must throw a
>>> NullPointerException, so the re-routine will quit until the arguments
>>> are satisfied, the inputs to library code. Then with the Memo being
>>> stored in the MonadMemo, it's figured that will work out regardless the
>>> Objects' or primitives' value, with regards to Collections and Stream
>>> code and after usual flow-of-control in Iterables for the for loops, or
>>> whatever other application library code, that they will be run each time
>>> the re-routine passes their section with satisfied arguments, then as
>>> with regards to, that the Memo is just whatever serial order the
>>> re-routine passes, not needing to lookup by Object identity which is
>>> otherwise part of an interning pattern.
>>>
>>> Map<String, String> rr1(String s1) {
>>>
>>> List<String> l1 = rr2.get(s1);
>>>
>>> Map<String, String> m1 = new LinkedHashMap<>();
>>>
>>> l1.stream().forEach(s -> m1.put(s, rr3.get(s)));
>>>
>>> return m1;
>>> }
>>>
>>> See what I figure is that the order of the invocations to rr3.get() is
>>> serial, so it really only needs to memoize its OE, Output|Exception,
>>> then about that putting null values in the Map, and having to check the
>>> values in the Map for null values, and otherwise to make it so that the
>>> semantics of null and NullPointerException, result that satisfying
>>> inputs result calls, and unsatisfying inputs result quits, figuring
>>> those unsatisfying inputs are results of unsatisfied outputs, that will
>>> be satisfied when the callee gets populated its memo and makes the
>>> callback.
>>>
>>> If the order of invocations is out-of-order, gets again into whether the
>>> Object/primitive by value needs to be the same each time, IOE, about the
>>> library code in Collections, Streams, parallelStream, and Iterables, and
>>> basically otherwise that any kind of library code, should throw
>>> NullPointerException if it gets an "unexpected" null or what doesn't
>>> fulfill it.
>>>
>>> The idea though that rr3 will get invoked say 1000 times with the rr2's
>>> result, those each make their call, then re-launch 1000 times, has that
>>> it's figured that the Executor, or Originator, when it looks up and
>>> loads the "ReRoutineMapKey", is to have the count of those and whether
>>> the count is fulfilled, then to no-op later re-launches of the
>>> call-backs, after all the results are populated in the partial monad
>>> memo.
>>>
>>> Then, there's perhaps instead as that each re-routine just checks its
>>> input or checks its return value for nulls, those being unsatisfied.
>>>
>>> (The exception handling thoroughly or what happens when rr3 throws and
>>> this kind of thing is involved thoroughly in library code.)
>>>
>>> The idea is it remains correct if the worst thing nulls do is throw
>>> NullPointerException, because that's just a usual quit and means another
>>> re-launch is coming up, and that it automatically queues for
>>> asynchronous parallel invocation each the derivations while resulting
>>> never blocking.
>>>
>>> It's figured that re-routines check their inputs for nulls, and throw
>>> quit, and check their inputs for library container types, and checking
>>> any member of a library container collection for null, to throw quit,
>>> and then it will result that the automatic asynchronous parallelization
>>> proceeds, while the re-routines are never blocking, there's only as much
>>> memory on the heap of the monad as would be in the lifetime of the
>>> original re-routine, and whatever re-calls or re-launches of the
>>> re-routine established local state in local variables and library code,
>>> would come in and out of scope according to plain stack unwinding.
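>>>
>>> A sketch of that null-check convention, with a hypothetical
>>> UnsatisfiedInput extending NullPointerException as described:
>>>
>>> import java.util.Collection;
>>>
>>> class UnsatisfiedInput extends NullPointerException { }
>>>
>>> final class Quit {
>>>     // returns the value if satisfied, otherwise throws-quit so a re-launch comes later
>>>     static <T> T require(T value) {
>>>         if (value == null) throw new UnsatisfiedInput();
>>>         if (value instanceof Collection) {
>>>             for (Object member : (Collection<?>) value) {
>>>                 if (member == null) throw new UnsatisfiedInput();   // a member is still pending
>>>             }
>>>         }
>>>         return value;
>>>     }
>>> }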
>>>
>>> Then there's still the perceived deficiency that the re-routine's method
>>> body will be run many times, yet it's only run as many times as result
>>> throwing-quit, when it reaches where its argument to the re-routine or
>>> result value isn't yet satisfied yet is pending.
>>>
>>> It would re-run the library code any number of times, until it results
>>> all non-nulls, then the resulting satisfied argument to the following
>>> re-routines, would be memo-ized in the monad, and the return value of
>>> the re-routine thus returning immediately its value on the partial
>>> monad.
>>>
>>> This way each re-call of the re-routine, mostly encounters its own monad
>>> results in constant time, and throws-quit or gets thrown-quit only when
>>> it would be unsatisfying, with the expectation that whatever
>>> throws-quit, either NullPointerException or extending
>>> NullPointerException, will have a pending callback, that will queue on a
>>> TQ, the task specification to re-launch and re-enter the original or
>>> derived, re-routine.
>>>
>>> The idea is sort of that it's sort of, Java with non-blocking I/O and
>>> ThreadLocal (1.7+, not 17+), or you know, C/C++ with non-blocking I/O
>>> and thread local storage, then for the abstract or interface of the
>>> re-routines, how it works out that it's a usual sort of model of
>>> co-operative multithreading, the re-routine, the routine "in the
>>> language".
>>>
>>>
>>> Then it's great that the routine can be stubbed or implemented agnostic
>>> of asynchrony, and declared in the language with standard libraries,
>>> basically using the semantics of exception handling and convention of
>>> re-launching callbacks to implement thread-of-control flow-of-control,
>>> that can be implemented in the synchronous and blocking for unit tests
>>> and modules of the routine, making a great abstraction of
>>> flow-of-control.
>>>
>>>
>>> Basically anything that _does_ block then makes for having its own
>>> thread, whose only job is to block and when it unblocks, throw-toss the
>>> re-launch toward the origin of the re-routine, and consume the next
>>> blocking-task off the TQ. Yet, the re-routines and their servicing the
>>> TQ only need one thread and never block. (And scale in core count and
>>> automatically parallelize asynchronous requests according to satisfied
>>> inputs.)
>>>
>>>
>>> Mostly the idea of the re-routine is "in the language, it's just plain,
>>> ordinary, synchronous routine".
>>>
>>>
>>>
>>
>>
>> Protocol Establishment
>>
>> Each of these protocols is a combined sort of protocol, then according
>> to different modes, there's established a protocol, then data flows in
>> the protocol (in time).
>>
>>
>> stream-based (connections)
>> sockets, TCP/IP
>> sctp SCTP
>> message-based (datagrams)
>> datagrams, UDP
>>
>> The idea is that connections can have state and session state, while,
>> messages do not.
>>
>> Abstractly then there's just that connections make for reading from the
>> connection, or writing to the connection, byte-by-byte,
>> while messages make for receiving a complete message, or writing a
>> complete message. SCTP is sort of both.
>>
>> A bit more concretely, the non-blocking or asynchronous or vector I/O,
>> means that when some bytes arrive the connection is readable, and while
>> the output buffer is not full a connection is writeable.
>>
>> For messages it's that when messages arrive messages are readable, and
>> while the output buffer is not full messages are writeable.
>>
>> Otherwise bytes or messages that pile up while not readable/writeable
>> pile up and in cases of limited resources get lost.
>>
>> So, the idea is that when bytes arrive, whatever's servicing the I/O's
>> has that the connection has data to read, and, data to write.
>> The usual idea is that an abstract Reader thread, will give any or all
>> of the connections something to read, in an arbitrary order,
>> at an arbitrary rate, then the role of the protocol, is to consume the
>> bytes to read, thus releasing the buffers, that the Reader, writes to.
>>
>> Inputting/Reading
>> Writing/Outputting
>>
>> The most usual idea of client-server is that
>> client writes to server then reads from server, while,
>> server reads from client then writes to client.
>>
>> Yet, that is just a mode, reads and writes are peer-peer,
>> reads and writes in any order, while serial according to
>> that bytes in the octet stream arrive in an order.
>>
>> There isn't much consideration of the out-of-band,
>> about sockets and the STREAMS protocol, for
>> that bytes can arrive out-of-band.
>>
>>
>> So, the layers of the protocol, result that some layers of the protocol
>> don't know anything about the protocol, all they know is sequences of
>> bytes, and, whatever session state is involved to implement the codec,
>> of the layers of the protocol. All they need to know is that given that
>> all previous bytes are read/written, that the connection's state is
>> synchronized, and everything after is read/written through the layer.
>> Mostly once encryption or compression is set up it's never torn down.
>>
>> Encryption, TLS
>> Compression, LZ77 (Deflate, gzip)
>>
>> The layers of the protocol, result that some layers of the protocol,
>> only indicate state or conditions of the session.
>>
>> SASL, Login, AuthN/AuthZ
>>
>> So, for NNTP, a connection, usually enough starts with no layers,
>> then in the various protocols and layers, get negotiated to get
>> established,
>> combinations of the protocols and layers. Other protocols expect to
>> start with layers, or not, it varies.
>>
>> Layering, then, either is in the protocol, to synchronize the session,
>> then establish the layer in the layer protocol, then maintain the layer
>> in the main protocol: TLS makes a handshake to establish an
>> encryption key for all the data, then the TLS layer only needs to
>> encrypt and decrypt the data by that key, while for Deflate, it's
>> usually the only option, then after it's set up as a layer,
>> everything read/written either way gets compressed.
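>>
>> As a rough sketch, such a layer is agnostic of the protocol, just bytes
>> in and bytes out with its own session state (a made-up interface, not a
>> particular library's):
>>
>> import java.nio.ByteBuffer;
>>
>> // A layer (TLS, Deflate) holds only its codec state: the negotiated key,
>> // the compression dictionary; it never knows the protocol above it.
>> interface Layer {
>>     ByteBuffer unwrap(ByteBuffer fromWire);   // decrypt / decompress toward the protocol
>>     ByteBuffer wrap(ByteBuffer toWire);       // compress / encrypt toward the wire
>> }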
>>
>>
>> client -> REQUEST
>> RESPONSE <- server
>>
>> In some protocols these interleave
>>
>> client -> REQUEST1
>> client -> REQUEST2
>>
>> RESPONSE1A <- server
>> RESPONSE2A <- server
>> RESPONSE1B <- server
>> RESPONSE2B <- server
>>
>> This then is called multiplexing/demultiplexing, for protocols like IMAP
>> and HTTP/2,
>> and another name for multiplexer/demultiplexer is mux/demux.
>>
>>
>>
>>
>> So, for TLS, the idea is that usually most or all of the connections
>> will be using the same algorithms with different keys, and each
>> connection will have its own key, so the idea is to completely separate
>> TLS establishment from TLS cryptec (crypt/decrypt), so, the layer need
>> only key up the bytes by the connection's key, in their TLS frames.
>>
>> Then, most of the connections will use compression, then the idea is
>> that the data is stored at rest compressed already and in a form that it
>> can be concatenated, and that similarly as constants are a bunch of the
>> textual context of the text-based protocol, they have compressed and
>> concatenable constants, with the idea that the Deflate compec
>> (comp/decomp) just passes those along concatenating them, or actively
>> compresses/decompresses buffers of bytes or as of sequences of bytes.
>>
>> The idea is that Readers and Writers deal with bytes at a time,
>> arbitrarily many, then that what results being passed around as the
>> data, is as much as possible handles to the data. So, according to the
>> protocol and layers, indicates the types, that the command routines, get
>> and return, so that the command routines can get specialized, when the
>> data at rest, is already layerized, and otherwise to adapt to the more
>> concrete abstraction, of the non-blocking, asynchronous, and vector I/O,
>> of what results the flow-machine.
>>
>>
>> When the library of the runtime of the framework of the language
>> provides the cryptec or compec, then, there's issues, when, it doesn't
>> make it so for something like "I will read and write you the bytes as of
>> making a TLS handshake, then return the algorithm and the key and that
>> will implement the cryptec", or, "compec, here's either some data or
>> handles of various types, send them through", it's to be figured out.
>> The idea for the TLS handshake, is basically to sit in the middle, i.e.
>> to read and write bytes as of what the client and server send, then
>> figuring out what is the algorithm and key and then just using that as
>> the cryptec. Then after the TLS algorithm and key are established the rest is
>> sort of discarded, though there's some idea about state and session, for
>> the session key feature in TLS. The TLS 1.2 also includes comp/decomp,
>> though, it's figured that instead it's a feature of the protocol whether
>> it supports compression, point being that's combining layers, and to be
>> implemented about these byte-sequences/handles.
>>
>>
>> mux/demux
>> crypt/decrypt
>> comp/decomp
>> cod/decod
>>
>> codec
>>
>>
>> So, the idea is to implement toward the concrete abstraction of
>> nonblocking vector I/O, while, remaining agnostic of that, so that all
>> sorts the usual test routines yet particularly the composition of layers
>> and establishment and upgrade of protocols, is to happen.
>>
>>
>> Then, from the byte sequences or messages as byte sequences, or handles
>> of byte sequences, results that in the protocol, the protocol either way
>> in/out has a given expected set of alternatives that it can read, then
>> as of derivative of those what it will write.
>>
>> So, after the layers, which are agnostic of anything but byte-sequences,
>> and their buffers and framing and chunking and so on, then is the
>> protocol, or protocols, of the command-set and request/response
>> semantics, and ordering/session statefulness, and lack thereof.
>>
>> Then, a particular machine in the flow-machine is as of the "Recognizer"
>> and "Parser", then what results "Annunciators" and "Legibilizers", as it
>> were, of what's usually enough called "Deserialization", reading off
>> from a serial byte-sequence, and "Serialization, writing off to a serial
>> byte-sequence, first the text of the commands or the structures in these
>> text-based protocols, the commands and their headers/bodies/payloads,
>> then the Objects in the object types of the languages of the runtime,
>> where then the routines of the servicing of the protocol, are defined in
>> types according to the domain types of the protocol (and their
>> representations as byte-sequences and handles).
>>
>> As packets and bytes arrive in the byte-sequence, the Recognizer/Parser
>> detects when there's a fully-formed command, and its payload, after the
>> Mux/Demux Demultiplexer, has that the Demultiplexer represents any given
>> number of separate byte-sequences, then according to the protocol
>> anything about their statefulness/session or orderedness/unorderedness.
>>
>> So, the Demultiplexer is to Recognize/Parse from the combined input
>> byte-stream its chunks, that now the connection, has any number of
>> ordered/unordered byte-sequences, then usually that those are ephemeral
>> or come and go, while the connection endures, with the most usual notion
>> that there's only one stream and it's ordered in requests and ordered in
>> responses, then whether commands get pipelined and requests need not
>> await their responses (they're ordered), and whether commands are
>> numbered and their responses get associated with their command sequence
>> numbers (they're unordered and the client has its own mux/demux to
>> relate them).
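>>
>> A small sketch of such a mux/demux relating responses back to commands by
>> serial number, with illustrative names only:
>>
>> import java.util.Map;
>> import java.util.concurrent.ConcurrentHashMap;
>>
>> class Remux {
>>     interface Pending { void complete(Object response); }
>>
>>     private final Map<Long, Pending> outstanding = new ConcurrentHashMap<>();
>>
>>     void sent(long serial, Pending command) {
>>         outstanding.put(serial, command);            // command went out with its serial number
>>     }
>>
>>     void received(long serial, Object response) {
>>         Pending command = outstanding.remove(serial);
>>         if (command != null) command.complete(response);   // relate the response back
>>     }
>> }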
>>
>> So, the Recognizer/Parser, theoretically only gets a byte at a time, or
>> even none, and may get an entire fully-formed message (command), or not,
>> and may get more bytes than a fully-formed message, or not, and the
>> bytes may be a well-formed message, or not, and valid, or not.
>>
>> Then the job of the Recognizer/Parser, is from the beginning of the
>> byte-sequence, to Recognize a fully-formed message, then to create an
>> instance of the command object related to the handle back through the
>> mux/demux to the multiplexer, called the attachment to the connection,
>> or the return address according to the attachment representing any
>> routed response and usually meaning that the attachment is the user-data
>> and any session data attached to the connection and here of the
>> mux/demux of the connection, the job of the Recognizer/Parser is to work
>> any time input is received, then to recognize and parse any number of
>> fully-formed messages from the input, create those Commands according to
>> the protocol, that the attachment includes the return destination, and,
>> thusly release those buffers or advance the marker on the Input
>> byte-sequence, so that the resources are freed, and later
>> Recognizing/Parsing starts where it left off.
>>
>> The idea is that bytes arrive, the Recognizer/Parser has to determine
>> when there's a fully-formed message, consume that and service the
>> buffers of the byte-sequence, having created the derived command.
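>>
>> For a CRLF-terminated text protocol like NNTP's command lines, a minimal
>> Recognizer sketch (ignoring sub-protocol payload modes) might look like
>> this, with illustrative names:
>>
>> import java.nio.charset.StandardCharsets;
>> import java.util.ArrayList;
>> import java.util.List;
>>
>> class Recognizer {
>>     private final StringBuilder partial = new StringBuilder();   // bytes read so far, not yet a command
>>
>>     // may be fed one byte, part of a command, or several commands at once
>>     List<String> feed(byte[] input, int length) {
>>         partial.append(new String(input, 0, length, StandardCharsets.US_ASCII));
>>         List<String> commands = new ArrayList<>();
>>         int crlf;
>>         while ((crlf = partial.indexOf("\r\n")) >= 0) {
>>             commands.add(partial.substring(0, crlf));   // a fully-formed command
>>             partial.delete(0, crlf + 2);                // advance the marker, free the front
>>         }
>>         return commands;
>>     }
>> }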
>>
>> Now, commands are small, or so few words, then the headers/body/payload,
>> basically get larger and later unboundedly large. Then, the idea is that
>> the protocol, has certain modes or sub-protocols, about "switching
>> protocols", or modes, when basically the service of the routine changes
>> from recognizing and servicing the beginning to ending of a command, to
>> recognizing and servicing an arbitrarily large payload, or, for example,
>> entering a mode where streamed data arrives or whatever sort, then that
>> according to the length or content of the sub-protocol format, the
>> Recognizer's job includes that the sub-protocol-streaming, modes, get
>> into that "sub-protocols" is a sort of "switching protocols", the only
>> idea though being going into the sub-protocol then back out to the main
>> protocol, while "switching protocols" is involved in basically any the
>> establishment or upgrade of the protocol, with regards to the stateful
>> connection (and not stateless messages, which always are according to
>> their established or simply some fixed protocol).
>>
>> This way unboundedly large inputs, don't actually live in the buffers of
>> the Recognizers that service the buffers of the Inputters/Readers and
>> Multiplexers/Demultiplexers, instead define modes where they will be
>> streaming through arbitrarily large payloads.
>>
>> Here for NNTP and so on, the payloads are not considered arbitrarily
>> large, though, it's sort of a thing that sending or receiving the
>> payload of each message, can be defined this way so that in very, very
>> limited resources of buffers, that the flow-machine keeps flowing.
>>
>>
>> Then, here, the idea is that these commands and their payloads, have
>> their outputs that are derived as a function of the inputs. It's
>> abstractly however this so occurs is the way it is. The idea here is
>> that the attachment+command+payload makes a re-routine task, and is
>> pushed onto a task queue (TQ). Then it's figured that the TQ represents
>> abstractly the execution of all the commands. Then, however many Task
>> Workers or TW, or the TQ that runs itself, get the oldest task from the
>> queue (FIFO) and run it. When it's complete, then there's a response
>> ready in byte-sequences or handles, and these are returned to the
>> attachment.
>>
>> (The "attachment" usually just means a user or private datum associated
>> with the connection to identify its session with the connection
>> according to non-blocking I/O, here it also means the mux/demux
>> "remultiplexer" attachment, it's the destination of any response
>> associated with a stream of commands over the connection.)
>>
>> So, here then the TQ basically has the idea of the re-routine, that is
>> non-blocking and involves the asynchronous fulfillment of the routine in
>> the domain types of the domain of object types that the protocol adapts
>> as an adapter, that the domain types fulfill as adapted. Then for NNTP
>> that's like groups and messages and summaries and such, the objects. For
>> IMAP it's mailboxes and messages to read, for SMTP it's emails to send,
>> with various protocols in SMTP being separate protocols like DKIM or
>> what, for all these sorts protocols. For HTTP and HTTP/2 it's usual HTTP
>> verbs, usually HTTP 1.1 serial and pipelined requests over a connection,
>> in HTTP/2 multiplexed requests over a connection. Then "session" means
>> broadly that it may be across connections, what gets into the attachment
>> and the establishment and upgrade of protocol, that sessions are
>> stateful thusly, yet granularly, as to connections yet as to each
>> request.
>>
>>
>> Then, the same sort of thing is the same sort of thing to back-end,
>> whatever makes for adapters, to domain types, that have their protocols,
>> and what results the O/I side to the I/O side, that the I/O side is the
>> server's client-facing side, while the O/I side is the
>> server-as-a-client-to-the-backend's, side.
>>
>> Then, the O/I side is just the same sort of idea that in the
>> flow-machine, the protocols get established in their layers, so that all
>> through the routine, then the domain type are to get specialized to when
>> byte-sequences and handles are known well-formed in compatible
>> protocols, that the domain and protocol come together in their
>> definition, basically so it results that from the back-end is retrieved
>> for messages by their message-ID that are stored compressed at rest, to
>> result passing back handles to those, for example a memory-map range
>> offset to an open handle of a zip file that has the concatenable entry
>> of the message-Id from the groups' day's messages, or a list of those
>> for a range of messages, then the re-routine results passing the handles
>> back out to the attachment, which sends them right out.
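>>
>> A sketch of passing such a handle rather than copying the bytes: a known
>> (offset, length) range of the archive file gets transferred toward the
>> output channel; the offsets are assumed to come from an index of the
>> entries, and the names here are illustrative:
>>
>> import java.io.IOException;
>> import java.nio.channels.FileChannel;
>> import java.nio.channels.WritableByteChannel;
>>
>> class StoredEntry {
>>     // transfer a compressed-at-rest range straight out, without staging it on the heap
>>     long transferRange(FileChannel archive, long offset, long length,
>>                        WritableByteChannel out) throws IOException {
>>         long sent = 0;
>>         while (sent < length) {
>>             long n = archive.transferTo(offset + sent, length - sent, out);
>>             if (n <= 0) break;    // a non-blocking target may accept nothing right now
>>             sent += n;
>>         }
>>         return sent;
>>     }
>> }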
>>
>> So, this way there's that besides the TQ and its TW's, that those are to
>> never block or be long-running, that anything that's long-running is on
>> the O/I side, and has its own resources, buffers, and so on, where of
>> course all the resources here of this flow-machine are shared by all the
>> flow-machines in the flow-machine, in the sense that they are not shared
>> yet come from a common resource altogether, and are exclusive. (This
>> gets into the definition of "share" as with regards to "free to share,
>> or copy" and "exclusive to share, a.k.a. taking turns, not cutting in
>> line, and not stealing nor hoarding".)
>>
>>
>> Then on the O/I side or the backend side, it's figured the backend is
>> any kind of adapters, like DB adapters or FS adapters or WS adapters,
>> database or filesystem or webservice, where object-stores are considered
>> filesystem adapters. What that gets into is "pools" like client pools,
>> connection pools, resource pools, that a pool is usually enough
>> according to a session and the establishment of protocol, then with
>> regards to servicing the adapter and according to the protocol and the
>> domain objects that thusly implement the protocol, the backend side has
>> its own dedicated routines and TW's, or threads of execution, with
>> regards to that the backend side basically gets a callback+request and
>> the job is to invoke the adapter with the request, and invoke the
>> callback with the response, then whether for example the callback is
>> actually the original attachment, or it involves "bridging the unbounded
>> sub-protocol", what it means for the adapter to service the command.
>>
>> Then the adapter is usually either provided as with intermediate or
>> domain types, or, for example it's just another protocol flow machine
>> and according to the connections or messaging or mux/demux or
>> establishing and upgrading layers and protocols, it basically works the
>> same way as above in reverse.
>>
>> Here "to service" is the usual infinitive that for the noun means "this
>> machine provides a service" yet as a verb that service means to operate
>> according to the defined behavior of the machine in the resources of the
>> machine to meet the resource needs of the machine's actions in the
>> capabilities and limits of the resources of the machine, where this "I/O
>> flow-machine: a service" is basically one "node" or "process" in a usual
>> process model, allocated its own quota of resources according to the
>> process and its environment model in the runtime in the system, and
>> that's it. So, there's servicing as the main routine, then also what it
>> means the maintenance servicing or service of the extended routine.
>> Then, for protocols it's "implement this protocol according to its
>> standards according to the resources in routine".
>>
>>
>> You know, I don't know where they have one of these anywhere, ....
>>
>>
>
>
>
>
>
>
>
>
>
> So, besides attachment+command+payload, also is for indicating the
> protocol and layers, where it can be inferred for the response, when the
> callback exists or as the streaming sub-protocol starts|continues|ends,
> what the response can be, in terms of domain objects, or handles, or
> byte sequences, in terms of domain objects that can result handles to
> transfer or byte-sequences to read or write,
> attachment+command+payload+protocols "ACPP" data structure.
>
> Another idea that seems pretty usual, is when the payload is off to the
> side, about picking up the payload when the request arrives, about when
> the command, in the protocol, involves that the request payload, is off
> to the side, to side-load the payload, where usually it means the
> payload is large, or bigger than the limits of the request size limit in
> the protocol, it sort of seems a good idea, to indicate for the
> protocol, whether it can resolve resource references, "external", then
> that accessing them as off to the side happens before ingesting the
> command or as whether it's the intent to reference the external
> resource, and when, when the external resource off to the side, "is",
> part of the request payload, or otherwise that it's just part of the
> routine.
>
> That though would get into when the side effect of the routine, is to
> result the external reference or call, that it's figured that would all
> be part of the routine. It depends on the protocol, and whether the
> payload "is" fully-formed, with or without the external reference.
>
>
> Then HTTP/2 and Websockets have plenty going on about the multiplexer,
> where it's figured that multiplexed attachments, or "remultiplexer
> attachment", RMA, out from the demultiplexer and back through the
> multiplexer, have then that's another sort of protocol machine, in terms
> of the layers, and about whether there's a thread or not that
> multiplexing requires any sort of state on otherwise the connections'
> attachment, that all the state of the multiplexer is figured lives in a
> data structure on the actual attachment, while the logic should be
> re-entrant and just a usual module for the protocol(s).
>
> It's figured then that the attachment is a key, with respect to a key
> number for the attachment, then that in the multiplexing or muxing
> protocols, there's a serial number of the request or command. There's a
> usual idea to have serial numbers for commands besides, for each
> connection, and then even serial numbers for commands for the lifetime
> of the runtime. Then it's the usual metric of success or the error rate
> how many of those are successes and how many are failures, that
> otherwise the machine is pretty agnostic that being in the protocol.
>
> Timeouts and cancels are sort of figured to be attached to the monad and
> the re-routine. It's figured that for any command in the protocol, it
> has a timeout. When a command is received, is when the timeout countdown
> starts, abstractly wall-clock time or system time. So, the ACPP has also
> the timeout time, so, the task T has an ACPP
> attachment-command-payload-protocol and a routine or reroutine R or RR.
> Then also it has some metrics M or MT, here start time and expiry time,
> and the serial numbers. So, how timeouts work is that when T is to be
> picked up by a TW, first TW checks whether M.time is past expiry, then
> if so it cancels the monad and results returning howsoever in the
> protocol the timeout. If not what's figured is that before the
> re-routine runs through, it just tosses T back on the TQ anyway, so that
> then whenever it comes up again, it's just checked again until such time
> as the task T actually completed, or it expires, or it was canceled, or
> otherwise concluded, according to the combination of the monad of the
> R/RR, and M.time, and system time. Now, this seems bad, because an
> otherwise empty queue, would constantly be thrashing, so it's bad. Then,
> what's to be figured is some sort of parameter, "toss when", that then
> though would have timeout priority queues, or buckets of sorts with
> regards to tossing all the tasks T back on the TQ for no other reason
> than to check their timeout.
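>
> A minimal sketch of the expiry check when a task is picked up, with MT and
> RR as stand-ins for the metrics and re-routine above:
>
> class MT {
>     long startMillis;      // when the command was received
>     long expiryMillis;     // start time plus the command's timeout
> }
>
> class RR {
>     volatile boolean canceled;
>     void cancel() { canceled = true; }   // cancel the monad, free its resources
> }
>
> class TimeoutCheck {
>     static boolean expiredOrCanceled(long nowMillis, MT metrics, RR routine) {
>         if (routine.canceled) return true;          // already concluded, just discard
>         if (nowMillis >= metrics.expiryMillis) {
>             routine.cancel();                       // cancel the monad
>             return true;                            // result the timeout error in the protocol
>         }
>         return false;
>     }
> }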
>
> It's figured that the monad of the re-routine is all the heap objects
> and references to handles of the outstanding command. So, when the
> re-routine is completed/canceled/concluded, then all the resources of
> the monad should be freed. Then it's figured that any routine to access
> the monad is re-entrant, and so that it results that access to the monad
> is atomic, to build the graph of memos in the monad, then that access to
> each memo is atomic as after access to the monad itself, so that the
> access to the monad is thread-safe (and to be non-blocking, where the
> only thing that happens to the monad is adding re-routine paths, and
> getting and setting values of object values and handles, then releasing
> all of it [, after picking out otherwise the result]).
>
> So it's figured that if there's a sort of sweeper or closer being the
> usual idea of timeouts, then also in the case that for whatever reason
> the asynchronous backend fails, to get a success or error result and
> callback, so that the task T
>
> T{
> RMA attachment; // return/remultiplexer attachment
> PCP command; // protocol command/payload
> RR routine; // routine / re-routine (monad)
> MT metrics; // metrics/time
> }
>
> has that timeouts, are of a sort of granularity. So, it's not so much
> that timeouts need to be delivered at a given exact time, as delivered
> within a given duration of time. The idea is that timeouts both call a
> cancel on the routine and result an error in the protocol. (Connection
> and socket timeouts or connection drops or closures and so on, should
> also result cancels as some form of conclusion cleans up the monad's
> resources.)
>
> There's also that timeouts are irrelevant after conclusion, yet if
> there's a task queue of timeouts, not to do any work fishing them out,
> just letting them expire. Yet, given that timeouts are usually much
> longer than actual execution times, there's no point keeping them around.
>
> Then it's figured each routine and sub-routine, has its timing, then
> it's figured to have that the RR and MT both have the time, then as with
> regards to, the RR and MT both having a monad, then whether it's the
> same monad what it's figured, is what it's figured.
>
> TASK {
> RMA attachment; // return/remultiplexer attachment
> PCP command; // protocol command/payload
> RRMT routine; // routine / re-routine, metrics / time (monad)
> }
>
> Then it's figured that any sub-routine checks the timeout overall, and
> the timeouts up the re-routine, and the timeout of the task, resulting a
> cancel in any timeout, then basically to push that on the back of the
> task queue or LIFO last-in-first-out, which seems a bad idea, though
> that it's to expeditiously return an error and release the resources,
> and cancel any outstanding requests.
>
> So, any time a task is touched, there's checking the attachment whether
> it's dropped, checking the routine whether it's canceled, with the goal
> of that it's all cleaned up to free the resources, and to close any
> handles opened in the course of building the monad of the routine's
> results.
>
> Otherwise while a command is outstanding there's not much to be done
> about it, it's either outstanding and not started or outstanding and
> started, until it concludes and there's a return, the idea being that
> the attachment can drop at any time and that would be according to the
> Inputter/Reader or Recognizer/Parser (an ill-formed command results
> either an error or a drop), the routine can conclude at any time either
> completing or being canceled, then that whether any handles are open in
> the payload, is that a drop in the attachment, disconnect in the
> [streaming] command, or cancel in the routine, ends each of the three,
> each of those two, or that one.
>
> (This is that the command when 'streaming sub-protocol' results a bunch
> of commands in a sub-protocol that's one command in the protocol.)
>
> The idea is that the RMA is only enough detail to relate to the current
> state in the attachment of the remultiplexing, the command is enough
> state to describe its command and payload and with regards to what
> protocol it is and what sub-protocols it entered and what protocol it
> returns to, and the routine is the monad of the entire state of the
> routine, either value objects or open handles, to keep track of all the
> things according to these things.
>
> So, still it's not quite clear how to have the timeout in the case that
> the backend hangs, or drops, or otherwise that there's no response from
> the adapter, what's a timeout. This sort of introduces re-try logic to
> go along with time-out logic.
>
> The re-try logic, involves that anything can fail, and some things can
> be re-tried when they fail. The re-try logic would be part of the
> routine or re-routine, figuring that any re-tries still have to live in
> the time of the command. Then re-tries are kind of like time-outs, it's
> usual that it's not just hammering the re-tries, yet a usual sort of
> back-off and retry-count, or retry strategy, and then whether that it
> involves that it should be a new adapter handle from the pool, about
> that adapter handles from the pool should be round-robin and when there
> are retry-able errors that usually means the adapter connection is
> un-usable, that getting a new adapter connection will get a new one and
> whether retry-able errors plainly enough indicate to recycle the adapter
> pool.
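>
> A sketch of that retry logic on the adapter side (which has its own threads
> and may block), bounded by the command's remaining time; which errors count
> as retry-able, and how the adapter pool recycles, are assumptions here:
>
> import java.util.concurrent.Callable;
> import java.util.concurrent.TimeoutException;
>
> class Retrier {
>     <T> T call(Callable<T> attempt, long deadlineMillis,
>                int maxRetries, long backoffMillis) throws Exception {
>         Exception last = null;
>         for (int i = 0; i <= maxRetries; i++) {
>             if (System.currentTimeMillis() >= deadlineMillis) break;   // stay within the command's time
>             try {
>                 return attempt.call();               // e.g. with a fresh adapter handle from the pool
>             } catch (Exception e) {
>                 last = e;                            // assumed retry-able; otherwise rethrow at once
>                 Thread.sleep(backoffMillis);
>                 backoffMillis *= 2;                  // usual sort of back-off
>             }
>         }
>         throw last != null ? last : new TimeoutException();
>     }
> }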
>
> Then, retry-logic also involves resource-down, what's called
> circuit-breaker when the resource is down that it's figured that it's
> down until it's back up. [It's figured that errors by default are _not_
> retry-able, and, then as about the resource-health or
> backend-availability, what gets involved in a model of critical
> resource-recycling and backend-health.]
>
>
> About server-push, there's an idea that it involves the remultiplexer
> and that the routine, according to the protocol, synthesizes tasks and
> is involved with the remultiplexer, to result it makes tasks then that
> run like usual tasks. [This is part of the idea also of the mux or
> remux, about 1:many commands/responses, and usually enough their
> serials, and then, with regards to "opportunistic server push", how to
> drop the commands that follow that would otherwise request the
> resources. HTTP/2 server-push looks deprecated, while then there's
> WebSocket, which basically makes for a different sort of use-case
> peer-peer than client-server. For IMAP is the idea that when there are
> multiple responses to single commands then that's basically in the
> mux/remux. For pipelined commands and also for serial commands is the
> mux/remux. The pipelined commands would result state building in the
> mux/remux when they're returned disordered, with regards to results and
> the handles, and 'TCB' or 'TW' driving response results.]
>
>
> So, how to implement timeout or the sweeper/closer, has for example that
> a connection drop, should cancel all the outstanding tasks for that
> connection. For example, undefined behavior of whatever sort results a
> missed callback, should eventually timeout and cancel the task, or all
> the tasks instances in the TQ for that task. (It's fair enough to just
> mark the monads of the attachment or routine as canceled, then they'll
> just get immediately discarded when they come up in the TQ.) There's no
> point having timeouts in the task queue because they'd either get
> invoked for nothing or get added to the task queue long after the task
> usually completes. (It's figured that most timeouts are loose timeouts
> and most tasks complete in much under their timeout, yet here it's
> automatic that timeouts are granular to each step of the re-routine, in
> terms of the re-routine erroring-out if a sub-routine times-out.)
>
>
> The Recognizer/Parser (Commander) is otherwise stateless, the
> Inputter/Reader and its Remultiplexer Attachment don't know what results
> Tasks, the Task Queue will run (and here non-blockingly) any Task's
> associated routine/re-reroutine, and catch timeouts in the execution of
> the re-routine, the idea is that the sweeper/closer basically would only
> result having anything to do when there's undefined behavior in the
> re-routine, or bugs, or backend timeouts, then whether calls to the
> adapter would have the timeout-task-lessors or "TTL's", in its task
> queue, point being that when there's nothing going on that the entire
> thing is essentially _idle_, with the Inputter/Reader blocked on select
> on the I/O side, the Outputter/Writer or Backend Adapter sent on the O/I
> side, the Inputter/Reader blocked on the O/I side, the TQ's empty (of,
> the protocol, and, the backend adapters), and it's all just pending
> input from the I/O or O/I side, to cascade the callbacks back to idle,
> again.
>
> I.e. there shouldn't be timeout tasks in the TQ, because, at low load,
> they would just thrash and waste cycles, and at high load, would arrive
> late. Yet, it is so that there is formal un-reliability of the routines,
> and, formal un-reliability of the O/I side or backend, [and formal
> un-reliability of connections or drops,] so some sweeper/closer checks
> outstanding commands what should result canceling the command and its
> routines, then as with regards to the backend adapter, recycling or
> teardown the backend adapter, to set it up again.
>
> Then the idea is that, Tasks, well enough represent the outstanding
> commands, yet there's not to be maintaining a task set next to the task
> queue, because it would use more space and maintenance in time than the
> queue itself, while multiple instances of the same Task can be in the
> Task queue as point each to the state of the monad in the re-routine,
> then gets into whether it's so, that, there is a task-set next to the
> task-queue, then that concluding the task removes it from the set, while
> the sweeper/closer just is scheduled to run periodically through the
> entire task-set and cancel those expired, or dropped.
>
> Then, having both a task-set TS and task-queue TQ, maybe seems the thing
> to do, where, it should be sort of rotating, because, the task-queue is
> FIFO, while the task-set is just a set (a concurrent set, though as with
> regards to that the tasks can only be marked canceled, and resubmitted
> to the task queue, with regards to that the only action that removes
> tasks from the task-set is for the task-queue to result them being
> concluded, then that whatever task gets tossed on the task queue is to
> be inserted into the task-set).
>
> Then the task-set TS would be on the order of outstanding tasks, while,
> the task-queue TQ would be on the order of outstanding tasks' re-routines.
>
> Then the usual idea of sweeper/closer is to iterate through a view of
> the TS, check each task whether its attachment dropped or command or
> routine timed-out or canceled, then if dropped or canceled, to toss it
> on the TQ, which would eventually result canceling if not already
> canceled and dropping if dropped.
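>
> A sketch of the task-set TS next to the task-queue TQ, with the
> sweeper/closer scheduled periodically; the Task interface here just stands
> in for the TASK structure above:
>
> import java.util.Set;
> import java.util.concurrent.BlockingQueue;
> import java.util.concurrent.ConcurrentHashMap;
> import java.util.concurrent.Executors;
> import java.util.concurrent.LinkedBlockingQueue;
> import java.util.concurrent.ScheduledExecutorService;
> import java.util.concurrent.TimeUnit;
>
> class Sweeper {
>     interface Task {
>         boolean dropped();                 // attachment / connection dropped
>         boolean expired(long nowMillis);   // command timed out
>         void cancel();                     // mark the monad canceled
>     }
>
>     final Set<Task> taskSet = ConcurrentHashMap.newKeySet();           // TS: outstanding tasks
>     final BlockingQueue<Task> taskQueue = new LinkedBlockingQueue<>(); // TQ: re-routine launches
>     final ScheduledExecutorService tz = Executors.newSingleThreadScheduledExecutor();
>
>     void start(long periodMillis) {
>         tz.scheduleAtFixedRate(() -> {
>             long now = System.currentTimeMillis();
>             for (Task task : taskSet) {
>                 if (task.dropped() || task.expired(now)) {
>                     task.cancel();
>                     taskQueue.offer(task);   // the TQ concludes it and removes it from TS
>                 }
>             }
>         }, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
>     }
> }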
>
> (Canceling/Cancelling.)
>
> Most of the memory would be in the monads, also the open or live handles
> would be in the routine's monads, with the idea being that when the task
> concludes, then the results, that go out through the remultiplexer,
> should be part of the task.
>
> TASK {
> RMA attachment; // return/remultiplexer attachment
> PCP command; // protocol command/payload
> RRMT routine; // routine / re-routine, metrics / time (monad)
> RSLT result; // result (monad)
> }
>
> It's figured that the routine _returns_ a result, which is either a
> serializable value or otherwise it's according to the protocol, or it's
> a live handle or specification of handle, or it has an error/exception
> that is expected to be according to the protocol, or that there was an
> error then whether it results a drop according to the protocol. So, when
> the routine and task concludes, then the routine and metrics monads can
> be released, or de-allocated or deleted, while what live handles they
> have, are to be passed back as expeditiously as possible to the
> remultiplexer to be written to the output as on the wire the protocol,
> so that the live handles can be closed or their reference counts
> decremented or otherwise released to the handle pool, of a sort, which
> is yet sort of undefined.
>
> The result RSLT isn't really part of the task, once the task is
> concluding, the RRMT goes right to the RMA according to the PCP, that
> being the atomic operation of concluding the task, and deleting it from
> the task-set. (It's figured that outstanding callbacks unaware their
> cancel, of the re-routines, basically don't toss the task back onto the
> TQ if they're canceled, that if they do, it would just sort of
> spuriously add it back to the task-set, which would result it being
> swept out eventually.)
>
> TASK {
> RMA attachment; // return/remultiplexer attachment
> PCP command; // protocol command/payload
> RRMT routine; // routine / re-routine, metrics / time (monad, live
> handles)
> }
>
> TQ // task queue
> TS // task set
>
> TW // task-queue worker thread, latch on TQ
> TZ // task-set cleanup thread, scheduled about timeouts
>
> Then, about what threads run the callbacks, is to get figured out.
>
> TCF // thread call forward
> TCB // thread call back
>
> It's sort of figured that calling forward, is into the adapters and
> backend, and calling back, is out of the result to the remultiplexer and
> running the remultiplexer also. This is that the task-worker thread
> invokes the re-routines, and the re-routine callbacks, are pretty much
> called by the backend or TCF, because all they do is toss back onto the
> TQ, so that the TW runs the re-routines, the TCF is involved in the O/I
> side and the backend adapter, and what reserves live handles, while the
> TCB returns the results through the I/O side, and what recycles live
> handles.
>
> Then it's sort of figured that the TCF result thread groups or whatever
> otherwise results whatever blocks and so on howsoever it is that the
> backend adapter is implemented, while TCB is pretty much a single
> thread, because it's driving I/O back out through all the open
> connections, or that it describes thread groups back out the I/O side.
> ("TCB" not to be confused with "thread control block".)
>
>
> Nonblocking I/O, and, Asynchronous I/O
>
> One thing I'm not too sure about is the limits of the read and write of
> the non-blocking I/O. What I figure is that mostly buffers throughout
> are 4KiB buffers from a free-list, which is the usual idea of reserving
> buffers and getting them off a free-list and returning them when done.
> Then, I sort of figure that the reader, gets about a 1MiB buffer for
> itself, with the idea being, that the Inputter when there is data off
> the wire, reads it into 1MiB buffer, then copies that off to 4KiB buffers.
>
> BFL // buffer free-list, 1
> BIR // buffer of the inputter/reader, 1
> B4K // buffer of 4KiB size, many
>
> What I figure that BIR is "direct memory" as much as possible, for DMA
> where native, while, figuring that pretty much it's buffers on the heap,
> fixed-size buffers of small enough size to usually not be mostly sparse,
> while not so small that usual larger messages aren't a ton of them, then
> with regards to the semantics of offsets and extents in the buffers and
> buffer lists, and atomic consumption of the front of the list and atomic
> concatenation to the back of the list, or queue, and about the
> "monohydra" or "slique" data structure defined way above in this thread.
>
> Then about writing is another thing, I figure that a given number of
> 4KiB buffers will write out, then no longer be non-blocking while
> draining, about the non-blocking I/O, that read is usually non-blocking
> because if nothing is available then nothing gets copied, while write
> may be blocking because the UART or what it is remains to drain to write
> more in.
>
> I'm not even sure about O_NONBLOCK, aio_read/aio_write, and overlapped I/O.
>
> Then it looks like O_NONBLOCKING with select and asynchronous I/O the
> aio or overlapped I/O, sort of have different approaches.
>
> I figure to use non-blocking select, then, the selector for the channel
> at least in Java, has both read and write interest, or all interest,
> with regards to there only being one selector key per channel (socket).
> The issue with this is that there's basically that the Inputter/Reader
> and Outputter/Writer are all one thread. So, it's figured that reads
> would read about a megabyte at a time, then round-robin all the ready
> reads and writes, that for each non-blocking read, it reads as much as a
> megabyte into the one buffer there, copies the read bytes appending it
> into the buffer array in front of the remux Input for the attachment,
> tries to write as many as possible for the buffer array for the write
> output in front of the remux Output for the attachment, then proceeds
> round-robin through the selector keys. (That each of those is
> non-blocking on the read/write a.k.a. recv/send then copying from the
> read buffer into application buffers is according to as fast as it can
> fill a free-list given list of buffers, though that any might get
> nothing done.)
>
> One of the issues is that the selector keys get waked up for read, when
> there is any input, and for write, when the output has any writeable
> space, yet, there's no reason to service the write keys when there is
> nothing to write, and nothing to read from the read keys when nothing to
> read.
>
> So, it's figured the read keys are always of interest, yet if the write
> keys are of interest, mostly it's only one or the other. So I'd figure
> to have separate read and write selectors, yet, it's suggested they must
> go together the channel the operations of interest, then whether the
> idea is "round-robin write then round-robin read", because all the
> selector keys would always be waking up for writing nothing when the way
> is clear, for nothing.
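>
> One usual way out of that, sketched here, is to keep OP_READ always of
> interest and toggle OP_WRITE on the key only while the attachment actually
> has queued output:
>
> import java.nio.channels.SelectionKey;
>
> class WriteInterest {
>     static void update(SelectionKey key, boolean hasPendingOutput) {
>         int ops = SelectionKey.OP_READ;                   // reads are always of interest
>         if (hasPendingOutput) ops |= SelectionKey.OP_WRITE;
>         key.interestOps(ops);                             // drop OP_WRITE again once drained
>     }
> }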
>
> Then besides non-blocking I/O is asynchronous I/O, where, mostly the
> idea is that the completion handler results about the same, ..., where
> the completion handler is usually enough "copy the data out to read,
> repeat", or just "atomic append more to write, repeat", with though
> whether that results that each connection needs its own read buffers, in
> terms of asynchronous I/O, not saying in what order or whether
> completion ports or completion handlers, would for
> reading each need their own buffer. I.e., to scale to unbounded many
> connections, the idea is to use constant size resources, because
> anything linear would grow unbounded. That what's to write is still all
> these buffers of data and how to "deduplicate the backend" still has
> that the heap fills up with tasks, that the other great hope is that the
> resulting runtime naturally rate-limits itself, by what resources it
> has, heap.
>
> About "live handles" is the sort of hope that "well when it gets to the
> writing the I/O, figuring to transfer an entire file, pass it an open
> handle", is starting to seem a bad idea, mostly for not keeping handles
> open while not actively reading and writing from them, and that mostly
> for the usual backend though that does have a file-system or
> object-store representation, how to result that results a sort of
> streaming sub-protocol routine, about fetching ranges of the objects or
> otherwise that the idea is that the backend file is a zip file, with
> that the results are buffers of data ready to write, or handles, to
> concatenate the compressed sections that happen to be just ranges in the
> file, compressed, with concatenating them together about the internals
> of zip file format, the data at rest. I.e. the idea is that handles are
> sides of a pipe then to transfer the handle as readable to the output
> side of the pipe as writeable.
>
> It seems though for various runtimes, that both a sort of "classic
> O_NONBLOCKING" and "async I/O in callbacks" organizations, can be about
> same, figuring that whenever there's a read that it drives the Layers
> then the Recognizer/Parser (the remux if any and then the
> command/payload parser), and the Layers, and if there's anything to
> write then the usual routine is to send it and release to recycle any
> buffers, or close the handles, as their contents are sent.
>
> It's figured to marshal whatever there is to write as buffers, while,
> the idea of handles results being more on the asynchronous I/O on the
> backend when it's filesystem. Otherwise it would get involved partially
> written handles, though there's definitely something to be said for an
> open handle to an unbounded file, and writing that out without breaking
> it into a streaming-sub-protocol or not having it on the heap.
>
> "Use nonblocking mode for this operation; that is, this call to preadv2
> will fail and set errno to EAGAIN if the operation would block. "
>
> The goal is mostly being entirely non-blocking, then with that the
> atomic consume/concatenate of buffers makes for "don't touch the buffers
> while their I/O is outstanding or imminent", then that what services I/O
> only consumes and concatenates, while getting from the free-list or
> returning to the free-list, what it concatenates or consumes. [It's
> figured to have buffers of 4KiB or 512KiB size, the inputter gets a 1MiB
> direct buffer, that RAM is a very scarce resource.]
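
(Aside, a minimal sketch of such a free-list in Java, assuming fixed-size
direct buffers and that a buffer is only returned once its I/O is done;
the names here are made up:)

import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical free-list of fixed-size direct buffers: consumers take() a
// buffer, fill or drain it, then give() it back once its I/O is complete.
class BufferFreeList {
    static final int SIZE = 4 * 1024;
    private final ConcurrentLinkedQueue<ByteBuffer> free = new ConcurrentLinkedQueue<>();

    ByteBuffer take() {
        ByteBuffer b = free.poll();
        return (b != null) ? b : ByteBuffer.allocateDirect(SIZE);
    }

    void give(ByteBuffer b) {
        b.clear();          // never touch a buffer with outstanding I/O
        free.offer(b);
    }
}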
>
> So, for the non-blocking I/O, I'm trying to figure out how to service
> the ready reads, while, only servicing ready writes that also have
> something to write. Then I don't much worry about it because ready
> writes with nothing to write would result a no-op. Then, about the
> asynchronous I/O, is that there would always be an outstanding or
> imminent completion result for the ready read, or that, I'm not sure how
> to make it so that reads are not making busy-work, while, it seems clear
> that writes are driven by there being something to write, then though
> not wanting those to hammer when the output buffer is full. In this
> sense the non-blocking vector I/O with select/epoll/kqueue or what, uses
> less resources for services that have various levels of load, day-over-day.
>
>
> https://hackage.haskell.org/package/machines
> https://clojure.org/reference/transducers
> https://chamibuddhika.wordpress.com/2012/08/11/io-demystified/
>
>
> With non-blocking I/O, or at least in Java, the attachment, is attached
> to the selection key, so, they're just round-robin'ed. In asynchronous
> (aio on POSIX or overlapped I/O on Windows respectively), in Java the
> completion event gets the attachment, but doesn't really say how to
> invoke the async send/recv again, and I don't want to maintain a map of
> attachments and connections, though it would be alright if that's the
> way of things.
>
> Then it sort of seems like "non-blocking for read, or drops, async I/O
> for writes". Yet, for example in Java, a SocketChannel is a
> SelectableChannel, while, an AsyncSocketChannel, is not a
> SelectableChannel.
>
> Then, it seems pretty clear that while on Windows, one might want to
> employ the aio model, because it's built into Windows, then as for the
> sort of followup guarantees, or best when on Windows, that otherwise the
> most usual approach is "O_NONBLOCKING" for the socket fd and the fd_set.
>
> Then, what select seems to guarantee, is, that, operations of interest,
> _going to ready_, get updated, it doesn't say anything about going to
> un-ready. Reads start un-ready and writes start ready, then that the
> idea is that select results updating readiness, but not unreadiness.
> Then the usual selector implementation, for the selection keys, and the
> registered keys and the selected keys, for the interest ops (here only
> read and write yet also connect when drops fall out of it) and ready ops.
>
> Yet, it doesn't seem to really claim to guarantee, that while working
> with a view of the selection keys, that if selection keys are removed
> because they're read-unready (nothing to do) or nothing-to-write
> (nothing to do), one worries that the next select round has to have
> marked any read-ready, while, it's figured that any something-to-write,
> should add the corresponding key back to the selection keys. (There's
> for that if the write buffer is full, it would just return 0 I suppose,
> yet not wanting to hammer/thrash/churn instead just write when ready.)
>
> So I want to establish that there can be more than one selector,
> because, otherwise I suppose that the Inputter/Reader (now also
> Outputter/Writer) wants read keys that update to ready, and write keys
> that update to ready, yet not write keys that have nothing-to-do, when
> they're all ready when they have nothing-to-do. Yet, it seems pretty
> much that they all go through one function, like WSPSelect on Windows.
>
> I suppose there's setting the interest ops of the key, according to
> whether there's something to write, figuring there's always something to
> read, yet when there is something to write, would involve finding the
> key and setting its write-interest again. I don't figure that any kind
> of changing the selector keys themselves is any kind of good idea at
> all, but I only want to deal with the keys that get activity.
>
> Also there's an idea that read() or write() might return -1 and set
> EAGAIN in the POSIX thread local error number, yet for example in the
> Java implementation it's to be avoided altogether calling the unready as
> they only return >0 or throw an otherwise ambiguous exception.
>
> So, I'm pretty much of a mind to just invoke select according to 60
> seconds timeout, then just have the I/O thread service all the selection
> keys, what way it can sort of discover drops as it goes through then
> read if readable and write if write-able and timeout according to the
> protocol if the protocol has a timeout.
>
> Yet, it seems instead that, reading or writing until read() or
> write() returns 0, there is a bit of initialization to figure out,
> must be. What it seems is that selection is on all the interest ops, then
> to unset interest on OP_WRITE, until there is something to write, then
> to set interest on OP_WRITE on the selector's keys, before entering
> select, wherein it will populate what's writable, as where it's
> writable. Yet, there's not removing the key, as it will show up for
> OP_READ presumably anyways.
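
(A minimal sketch of that OP_WRITE toggling in Java, assuming one Selector
and a per-connection queue of pending output; the method names are
illustrative:)

import java.nio.channels.SelectionKey;

// Write-interest is set only while there is pending output, and cleared
// once the pending buffers have fully drained.
void wantWrite(SelectionKey key) {
    key.interestOps(key.interestOps() | SelectionKey.OP_WRITE);
    key.selector().wakeup();   // so a blocked select() notices the change
}

void wroteAll(SelectionKey key) {
    key.interestOps(key.interestOps() & ~SelectionKey.OP_WRITE);
}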
>
> Anyways it seems that it's alright to have multiple selectors anyways,
> so having separate read and write selectors seems fine. Then though
> there's two threads, so both can block in select() at the same time.
> Then it's figured that the write selector is initialized by deleting the
> selected-key as it starts by default write-able, and then it's only of
> interest when it's ever full on writing, so it comes up, there's writes
> until done and it's deleted, then that continues until there's nothing
> to do. The reads are pretty simple then and when the selected-keys come
> up they're read until nothing-to-do, then deleted from selected-keys.
> [So, the writer thread is mostly only around to finish unfulfilled writes.]
>
>
> Remux: Multiplexer/Demultiplexer, Remultiplexer, mux/demux
>
> A command might have multiple responses, where it's figured it will
> result multiple tasks, or a single task, that return to a single
> attachment's connection. The multiplexer mostly accepts that requests
> are multiplexed over the connection, so it results that those are
> ephemeral and that the remux creates remux attachments to the original
> attachment, involved in any sort of frames/chunks. The compression layer
> is variously before or after that, then encryption is after that, while
> some protocols also have encryption of a sort within that.
>
> The remux then results that the Recognizer/Parser just gets input, and
> recognizes frames/chunks as they arrive, then assembles their contents
> into commands/payloads. Then it's figured that the commands are
> independent and just work their way through as tasks and then get
> chunked/framed as according to the remux, then also as with regards to
> "streaming sub-protocols with respect to the remux".
>
> Pipelined commands basically result a remux, establishing that the
> responses are written in serial order as were received.
>
> It's basically figured that 63 bit or 31 bit serial numbers would be
> plenty to identify unique requests per connection, and connections and
> so on, about the lifetime of the routine and a serial number for each
> thing.
>
>
>
> IO <-> Selectors <-> Rec/Par <-> Remux <-> Rec/Par <-> TQ/TS <-> backend
>
>
>
>
>





Well I figure that any kind of server module for the protocol needs the
client module.

Also it's sort of figured that a client adapter has a similar sort of
approach to the non-blocking I/O to get figured out, with regards to the
usual usage patterns of the API, expecting to have the same sort of model
of anything stateful in the session, and the other issues involved with
the User-Agent: what a client is, how it constructs the commands and
payloads, and the requests it gets as commands and partial payloads
(headers, body, payload), how it's to be a thing.

Also it's figured that there should be a plain old stdin/stdout that
then connects to one of these things instead of sockets, then also for
testing and exercising the client/server that it just builds a pair of
unidirectional pipes either way, these then being selectable channels in
Java or otherwise the usual idea of making it so that stdin/stdout are a
connection.
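
A minimal sketch of that pipe-pair idea in Java, assuming the test harness
just wants two selectable channels wired back-to-back (the class name is
illustrative):

import java.nio.channels.Pipe;

// Two unidirectional pipes make an in-process, selectable "connection":
// the client writes clientToServer.sink() and reads serverToClient.source(),
// the server does the reverse.
class PipePair {
    final Pipe clientToServer;
    final Pipe serverToClient;

    PipePair() throws java.io.IOException {
        clientToServer = Pipe.open();
        serverToClient = Pipe.open();
        // both ends non-blocking so they register with the usual selectors
        clientToServer.source().configureBlocking(false);
        clientToServer.sink().configureBlocking(false);
        serverToClient.source().configureBlocking(false);
        serverToClient.sink().configureBlocking(false);
    }
}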

With regards to that then it looks like TLS (1.2, 1.3, maybe 1.1) should
be figured out first, then a reasonably involved multiplexing, then as
with regards to something like the QUIC UDP multiplexing, then about how
that sits with HTTP/2 style otherwise semantics, then as with regards to
SCTP, and this kind of thing.

I.e., if I'm going to implement QUIC, first it should be SCTP.

The idea of the client in the same context as the server, sort of looks
simple, it's a connection pool, then as with regards to that usually
enough, clients call servers not the other way around, and clients send
commands and receive results and servers receive commands and send
results. So, it's the O/I side.

It's pretty much figured that on protocols like HTTP 1.1, and otherwise
usual adapters with one session, there's not much considered about
sessions that bridge adapter connections, with the usual idea that
multiple client-side connections might be a session, and anything
session-stateful is session-stateful anywhere in the back-end fleet,
where it's figured usually that any old host in the backend picks up any
old connection from the front-end.

Trying to figure out QUIC ("hi, we think that TCP/IP is ossified because
we can't just update Chrome and outmode it, and can't be bothered to
implement SCTP and get other things figured out about unreliable
datagrams multiplexing a stream's multiplex connection, or changing IP
addresses"), then it seems adds a sort of selectable-channel abstraction
in front of it, in terms of anything about it being a session, and all
the datagrams just arriving at the datagram port. (QUIC also has
"server-initiated" so it's a peer-to-peer setup not a client-server
setup.) Then it's figured that anything modeling QUIC (over UDP) should
be side-by-side SCTP over UDP, and Datagram TLS DTLS.


So, TLS is ubiquitous, figuring if anybody actually wants to encrypt
anything then it's in the application layer, and there's this ALPN to
get it associated with protocols, or this sort of "no-time for a TLS
handshake, high-five". So, TLS being ubiquitous, it's to first sort of
figure out TLS, then the connections, then the multiplexing, about kinds
of stateful stream datagram, sessions. ("Man in the middle? Here let me
NAT your PAC while you're on the phone.")

As part of a protocol, there's either TLS always and it precedes
entering otherwise the protocol, or, there's STARTTLS, which is figured
then to hold for the duration barring "switching protocols". It's assumed
that "streaming-sub-protocols" are "assume/resume" protocol, while
"switching protocols" is "finish/start".

Then, there's a simple sort of composition of attributes of protocols,
and profiles of protocols after capabilities and thus profiles of
protocols in effect.

In Java the usual idea of TLS is called SSLEngine. SSLEngine works on
buffers with wrap/unwrap rather than sitting on the socket (that's
SSLSocket), yet it doesn't really have a way to feed it the handshake,
get the master key, then just encrypt everything following with that
yourself. So it's figured that
as a profile module, it's broken apart a bit the TLS protocol, then
anything to do with certificates or algorithms is in java.security or
javax.security anyways. Then AEAD is just a common way to make encrypted
frames/chunks. It's similar with Zip/Deflate, and that it should be
first-class anyways because there's a usual idea to use zip files as
file system representation of compressed, concatenable data at rest,
for mostly transferring static constant at what's the compressed data at
rest. The idea of partially-or-weakly encrypted data at rest is a good
dog but won't hunt as above, yet the block-cipher in the usual sense
should operate non-blockingly on the buffers. Not sure about "TLS Change
Cipher".

So, TLS has "TLS connection state", it's transport-layer. Then, what
this introduces is how to characterize or specify frames, or chunks, in
terms of structs, and alignment, and fixed-length and variable-length
fields, of the most usual sorts of binary organizations of records, here
frames or chunks.

https://en.wikipedia.org/wiki/X.690

The ASN.1 encoding, abstract syntax notation, is a very usual way to
describe the usual things in the ITU-T specifications, the X-series,
like X.509 the certificates and so on. Then if the
structure is simple enough, then all the OID's have to get figured out
as basically the OID's are reserved values and constants according to
known kinds of contents, and in the wild there are many tens of
thousands of them, while all that's of interest is a few tens or less,
all the necessary things for interpreting TLS X.509 certificates and
this kind of thing. So, this is a usual way to describe the structures
and nested structures of any organization of data on the wire.

Then, frames and chunks, basically are octets, of a very usual sort of
structure as "header, and body", or "frame" or "chunk", where a frame
usually has a trailer, header-body-trailer. The headers and trailers are
usually fixed length, and one of the fields is the size or length of the
variable-length body. They're sometimes called blocks.
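
As a sketch of recognizing such a frame from buffered input, assuming a
fixed-length header whose last field is the body length (the 1+2+2 byte
layout here is hypothetical, not any particular protocol's):

import java.nio.ByteBuffer;

// Hypothetical frame: 1-byte type, 2-byte version, 2-byte body length,
// then the variable-length body; returns null until a whole frame is buffered.
static ByteBuffer tryReadFrame(ByteBuffer in) {
    final int HEADER = 5;
    if (in.remaining() < HEADER) return null;
    int bodyLen = in.getShort(in.position() + 3) & 0xFFFF;
    if (in.remaining() < HEADER + bodyLen) return null;   // incomplete
    ByteBuffer frame = in.slice();
    frame.limit(HEADER + bodyLen);                        // one whole frame
    in.position(in.position() + HEADER + bodyLen);        // consume it
    return frame;
}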

https://datatracker.ietf.org/doc/html/rfc1951 (Deflate)

Deflate has Huffman codes then some run-length encoding, for most of the
textual data it's from an alphabet of almost entirely ISO646 or ASCII,
and there's not much run-length at all, while the alphabets of base32 or
base64 might be very usual, then otherwise binary data is usually
already compressed and to be left alone.
There's basically to be figured if there's word-match for commonly or
recently used words, in the about 32K window of the Deflate block,
mostly otherwise about the minimal character sets and its plain sorted
Huffman table the block. The TLS plaintext blocks are limited to 2^14 or
about 16K, the Deflate non-compressed blocks are limited to about 64K,
the compressed blocks don't have length semantics, only
last-block/end-of-block. The Deflate blocks have a first few bits that
indicate block/last-block, then there's a reserved code end-of-block.
The TLS 1.2 with Deflate says that Deflate state has to continue
TLS-block over TLS block, while, it needn't, for putting Deflate blocks
in TLS blocks closed, though accepting Deflate blocks over consecutive
TLS blocks. For email messages it's figured that the header is a block,
the separator is a block, and the body is a block. For HTTP it's figured
the header is a Deflate block, the separator is a Deflate block, and the
body is a Deflate block. The commands and results, it's figured, are
Deflate blocks. This way then they just get concatenated, and are
self-contained. It's figured that decompression, recognize/parse, copies
into plaintext, as whatever has arrived, after encryption, block-ciphers
the block into what's either the TLS 1.2 (not TLS 1.3) or mostly only
the application protocol has as compression, Deflate. (Data is
lsb-to-msb, Huffman codes msb-to-lsb. 256 = 0x100 = 1_0000_0000b is the
end-of-block code.) For text data it would seem
better to reduce the literal range overall, and also to make a Huffman
table of the characters, which are almost always < 256 many, anyways.
I.e., Deflate doesn't make a Huffman table of the alphabet of the input,
and the minimum length of a duplicate-coded word is 3.
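
A minimal sketch of that "each part its own Deflate block" idea with
java.util.zip, assuming raw deflate (no zlib wrapper) and a full flush
after each part so the parts can be concatenated and still inflate as one
stream (a sketch, not tuned for large parts):

import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

// Each part (e.g. header, separator, body) is deflated raw and FULL_FLUSH'ed
// so it ends on a byte boundary without the final-block bit; concatenated
// parts then inflate as one continuous raw-deflate stream.
static byte[] deflatePart(byte[] part) {
    Deflater d = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
    d.setInput(part);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    for (int n; (n = d.deflate(buf, 0, buf.length, Deflater.FULL_FLUSH)) > 0; ) {
        out.write(buf, 0, n);
    }
    d.end();
    return out.toByteArray();
}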


"The Huffman trees for each block are independent
of those for previous or subsequent blocks; the LZ77 algorithm may
use a reference to a duplicated string occurring in a previous block,
up to 32K input bytes before." -
https://datatracker.ietf.org/doc/html/rfc1951#section-2

Zip file format (2012):
https://www.loc.gov/preservation/digital/formats/digformatspecs/APPNOTE%2820120901%29_Version_6.3.3.txt

https://www.loc.gov/preservation/digital/formats/fdd/fdd000354.shtml

"The second mechanism is the creation of a hidden index file containing
an array that maps file offsets of the uncompressed file, at every chunk
size interval, to the corresponding offset in the Deflate compressed
stream. This index is the structure that allows SOZip-aware readers to
skip about throughout the file."

- https://github.com/sozip/sozip-spec/blob/master/sozip_specification.md

It's figured that if the zip file has a length check and perhaps a
checksum attribute for the file, then besides detecting modifications,
the ranges can be validated and served from the data at rest as-is.


So, the profiles in the protocols, or capabilities, are variously called
extensions, about the mode of the protocol, and sub-protocols, or just
the support of the commands.

Then, there's what's "session", in the connection, and
"streaming-sub-protocols", then sorts, "chained-sub-protocols" ("command
sequence"), where streaming is for very large files where chained is for
sequences of commands so related, for examples SMTP's MAIL-RCPT-DATA and
MAIL-RCPT-RSET. Then the usual connection overall is a chained protocol,
from beginning and something like HELO/EHLO to ending and something like
QUIT. In HTTP for example, there's that besides upgrades which is
switching, and perhaps streaming-sub-protocols for large files, and
something like CORS expectations about OPTIONS, there are none, though an
application protocol above it, like "Real REST", may have.

Then, in "session", are where the application has from the server any of
its own initiated events, these results tasks as for the attachment.

The, "serial-sub-protocol" is when otherwise unordered commands, have
that a command must be issued in order, and also, must be completed
before the next command, altogether, with regards to the application and
session.




About driving the inputter/reader, it's figured the Reader thread, TIR,
both services the input I/O and also drives the remux remultiplexer, the
rec/par recognizer/parser, and the decryption and the decompression, so that its
logic is to offload up to a megabyte from each connection, copying that
into buffers for each connection, then go through each of those, and
drive their inputs, constructing what's rec/par'ed and releasing the
buffers. It results it gets a "set" of ready inputs, and there's an idea
that the order those get served should be randomized, with the idea that
any connection is as likely as any other to get their commands in first.
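
(A minimal sketch of that randomized servicing order in Java, where
service() is a hypothetical hook for the per-connection read, remux, and
rec/par work:)

import java.io.IOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// One select round: copy out the ready keys, shuffle them, service each,
// so no connection consistently gets its commands recognized first.
static void serviceRound(Selector selector) throws IOException {
    selector.select();
    List<SelectionKey> ready = new ArrayList<>(selector.selectedKeys());
    selector.selectedKeys().clear();
    Collections.shuffle(ready);
    for (SelectionKey key : ready) {
        service(key);   // hypothetical: read up to ~1 MiB, drive remux and rec/par
    }
}

static void service(SelectionKey key) { /* hypothetical hook */ }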

Writing the Code

The idea of writing the code is: the least amount. Then, the protocol
and its related protocols, and its data structures and the elements of
its recognition and parsing, should as possible be minimal, then at
compile time, the implementation, derived, resulting then according to
the schema in "abstract syntax", files and classes and metadata, that
that interfaces and base classes are derived, generated, then that the
implementations are composed as of those.


(The "_" front or back is empty string,
"_" inside is " " space,
"__" inside is "-" dash,
"___" inside is "_" underscore,
and "____" inside is "." dot.)

class SMTP {

extension Base {
enum Specification { RFC821, RFC1869, RFC2821, RFC5321 }
enum Command {HELO, EHLO, MAIL, RCPT, DATA, RSET, VRFY, EXPN, HELP,
NOOP, QUIT}
}

extension SIZE {
enum Specification {RFC1870 }
enum Result { SIZE }
}
extension CHECKPOINT {
enum Specification {RFC1845 }
}
extension CHUNKING {
enum Specification {RFC3030 }
}
extension PIPELINING {
enum Specification {RFC2920 }
}
extension _8BITMIME {
enum Specification {RFC6152 }
enhanced _8BITMIME {
enum Command {EHLO, MAIL}
enum Result {_8BITMIME}
}
}

extension SMTP__AUTH {
enum Specification {RFC4954 }
enum Command {AUTH}
}
extension START__TLS {
enum Specification {RFC3207}
enum Command { STARTTLS }
}

extension DSN {
enum Specification {RFC3461 }
}

extension RFC3865 {
enum Specification {RFC3865}
enhanced RFC3865 {
enum Command {EHLO, MAIL, RCPT }
enum Result {NO__SOLICITING, SOLICIT }
}
}
extension RFC4141 {
enum Specification {RFC4141 }
enhanced RFC4141 {
enum Command {EHLO, MAIL }
enum Result {CONPERM, CONNEG }
}
}

// enum Rfc {RFC3207, RFC6409 }
}

class POP3 {
enum Rfc {RFC1939, RFC1734 }

extension Base {
enum Specification {RFC1939 }

class States {
class AUTHORIZATION {
enum Command {USER, PASS, APOP, QUIT}
}
class TRANSACTION {
enum Command {STAT, LIST, RETR, DELE, NOOP, RSET, QUIT , TOP, UIDL}
}
class UPDATE {
enum Command {QUIT}
}
}

}
}

class IMAP {
enum Rfc { RFC3501, RFC4315, RFC4466, RFC4978, RFC5256, RFC5819,
RFC6851, RFC8474, RFC9042 }

extension Base {
enum Specification {RFC3501}

class States {
class Any {
enum Command { CAPABILITY, NOOP, LOGOUT }
}
class NotAuthenticated {
enum Command { STARTTLS, AUTHENTICATE, LOGIN }
}
class Authenticated {
enum Command {SELECT, EXAMINE, CREATE, DELETE, RENAME, SUBSCRIBE,
UNSUBSCRIBE, LIST, LSUB, STATUS, APPEND}
}
class Selected {
enum Command { CHECK, CLOSE, EXPUNGE, SEARCH, FETCH, STORE, COPY, UID }
}
}
}
}

class NNTP {
enum Rfc {RFC3977, RFC4642, RFC4643}

extension Base {
enum Specification {RFC3977}
enum Command {CAPABILITIES, MODE_READER, QUIT, GROUP, LISTGROUP, LAST,
NEXT, ARTICLE, HEAD, BODY, STAT, POST, IHAVE, DATE, HELP, NEWGROUPS,
NEWNEWS, OVER, LIST_OVERVIEW____FMT, HDR, LIST_HEADERS }
}

extension NNTP__COMMON {
enum Specification {RFC2980 }
enum Command {MODE_STREAM, CHECK, TAKETHIS, XREPLIC, LIST_ACTIVE,
LIST_ACTIVE____TIMES, LIST_DISTRIBUTIONS, LIST_DISTRIB____PATS,
LIST_NEWSGROUPS, LIST_OVERVIEW____FMT, LISTGROUP, LIST_SUBSCRIPTIONS,
MODE_READER, XGTITLE, XHDR, XINDEX, XOVER, XPAT, XPATH, XROVER,
XTHREAD, AUTHINFO}
}

extension NNTP__TLS {
enum Specification {RFC4642}
enum Command {STARTTLS }
}

extension NNTP__AUTH {
enum Specification {RFC4643}
enum Command {AUTHINFO}
}

extension RFC4644 {
enum Specification {RFC4644}
enum Command {MODE_STREAM, CHECK, TAKETHIS }
}

enum RFC8054 {
// "like XZVER, XZHDR, XFEATURE COMPRESS, or MODE COMPRESS"
enum Specification {RFC8054}
enum Command {COMPRESS}
}
}

class HTTP {
extension Base {
enum Specification {RFC2616, RFC7231, RFC9110}
enum Command {GET, PUT, POST, OPTIONS, HEAD, DELETE, TRACE, CONNECT,
SEARCH}
}
}

class HTTP2 {
enum Rfc {RFC7540, RFC8740, RFC9113 }
}

class WebDAV { enum Rfc { RFC2518, RFC4918, RFC3253, RFC5323}}
class CardDAV { enum Rfc { RFC6352}}
class CalDAV { enum Rfc { RFC4791}}
class JMAP {}


Now, this isn't much in the way of code yet, just finding the
specifications of the standards and looking through the history of their
evolution in revision with some notion of their forward compatibility in
extension and backward compatibility in deprecation, and just
enumerating some or most of the commands, then according to the state of
the connection and its implicit states, and about the remux and its
connection-multiplexing, which are discrete commands, and which either
layer the protocol, switch the protocol, make states in the session, or
imply constructed tasks that later result in responses.


Then it's not much at all with regards to the layers of the protocol,
the streams in the protocol, their layers, the payload or parameters of
the commands, the actual logic of the routines of the commands, and the
results of the commands.

For something like HTTP, then there gets involved that besides the
commands, then the headers, trailers, or other: "attributes", of
commands, that commands have attributes and then payloads or parameters,
in their protocol, or as about "attachment protocol (state) command
attribute parameter payload", is in the semantics of the recognition and
parsing, of commands and attributes, as with regards to parameters that
are part of the application data, and parameters that are part of
attributes.



Recognizer/Parser

The Recognizer/Parser or recpar isn't so much for the entire object
representation of a command, where it's figured that the command +
attributes + payload is for the application itself, as for recognizing
the beginnings through the ends of commands, only parsing so much as
finding well-formedness in the command + attributes (the parameters,
and, variously headers, or, data, in the wire transmission of the body
or payload), and, the protocol semantics of the command, and the
specific protocol change or command task(s) to be constructed, according
to the session and state, of the connection or its stream, and the protocol.

For the layers or filters, of the cryptec or compec, mostly the
recognition is from the frames/chunks/blocks, while in the plaintext,
involves the wire representation of the command, usually a line, its
parameters on the command line, then whether there are headers, and
whether content-length or other chunking is according to headers, or a
stop-bit, for example, the dot-stuffing. When for example trailers
follow, is assumed to be defined by the protocol, then as what the
recpar would expect.
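
A small sketch of that shallow recognition for a line-oriented command,
assuming input accumulates in a ByteBuffer and the recpar only looks for
the CRLF boundary (dot-stuffing and body framing are then per the protocol):

import java.nio.ByteBuffer;

// Returns the index just past the CRLF of the first complete line in the
// buffered input, or -1 if the line is still incomplete; the recpar only
// finds the boundary here, parsing the command is a separate step.
static int endOfLine(ByteBuffer in) {
    for (int i = in.position(); i + 1 < in.limit(); i++) {
        if (in.get(i) == '\r' && in.get(i + 1) == '\n') {
            return i + 2;
        }
    }
    return -1;
}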

Then, parsing of the content is mostly in the application, about the
notion that the commands reflect tasks of routines of a given logic its
implementation, command and parameters are a sort of data structure
specific to the command, that headers and perhaps trailers would be a
usual sort of data structure, while the body, and perhaps headers and
trailers, is a usual sort of data structure, given to the commands.

The recpars will be driven along with the filters and by the TIR thread
so must not block and mostly detect a shallow well-formedness. It's
figured that the implementation of the task on the body, does
deserialization and otherwise validation of the payload.

The TIR, if it finds connection/stream errors, writes the errors as
close streams/connections directly to TCB, 'short-circuit', while
engaging Metrics/Time.

The deserialization and validation of the payload then is as by a TW
task-worker or into the TCF call-forward thread.

The complement to Recognizer/Parser, is a Serializer/Passer, which is
the return path toward the Writer/Outputter, as the TCB call-back
thread, of directly serializable or passable representations of results,
according to that the command and task, has a particular ordering,
format, and wire format, the output. The idea is that it results in byte
sequences or transferable handles that go out the remux and stream
to the connection according to the remultiplexer attachment to the
outbound connection.

The notion of callbacks generally, basically results the employment of
uni-directional or half-duplex pipes, and a system selector, thus that
as callbacks are constructed to be invoked, they have a plain input
stream that's ignored except for the selector, then that the idea is
there's an attachment on the pipe that's the holder for the
buffers/handles. That is, the idea is that in Java there's
java.nio.channels.spi.SelectorProvider, and openPipe, then that the Pipe
when constructed has that its reference is stored in a synchronous pipe
map of the TCB, with an associated attachment that when a callback
occurs, the pipe attachment has set the buffers/handles or exception as
the result, then simply sends a byte to the pipe, which the TCB picks up
from waiting on the pipe selector, deregisters and deletes the pipe, and
writes the results out the remux attachment, and returns to select() on
the pipe provider, that ultimately the usually no-work-to-do Writer
thread, sees any remaining output on its way out, among them releasing
the buffers and closing the handles.
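
A minimal sketch of that call-back pipe in Java, assuming one long-lived
pipe whose source side is the only key in the TCB's selector, and a
hypothetical attachment/result object holding the buffers or handles to
write out:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.spi.SelectorProvider;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical call-back pipe: workers queue completed results and write one
// byte to the sink; the TCB blocks in select() on the source, drains the
// wakeup bytes, and writes each completed result out its remux attachment.
class CallbackPipe {
    final Pipe pipe = SelectorProvider.provider().openPipe();
    final Selector selector = Selector.open();
    final ConcurrentLinkedQueue<Object> completed = new ConcurrentLinkedQueue<>();

    CallbackPipe() throws IOException {
        pipe.source().configureBlocking(false);
        pipe.source().register(selector, SelectionKey.OP_READ);
    }

    void complete(Object result) throws IOException {        // worker side
        completed.offer(result);
        pipe.sink().write(ByteBuffer.wrap(new byte[]{1}));   // wake the TCB
    }

    void run() throws IOException {                          // TCB side
        ByteBuffer drain = ByteBuffer.allocate(64);
        while (true) {
            selector.select();                               // 0% CPU at idle
            drain.clear();
            pipe.source().read(drain);                       // drain wakeup bytes
            for (Object r; (r = completed.poll()) != null; ) {
                writeOut(r);                                 // out the remux attachment
            }
            selector.selectedKeys().clear();
        }
    }

    void writeOut(Object result) { /* hypothetical: pass to the Writer/Outputter */ }
}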

If there's very much a hard limit on pipes, then the idea is to just
have the reference to the remux attachment and output to write in the
form of an address in the bytes written on the call-back pipe, in this
way only having a single or few long-lived call-back pipes, that TCB
blocks and waits on in select() when there's nothing to do, otherwise
servicing the writing/outputting in the serial order delivered on the
pipe, only having one SelectionKey in the Selector from the
SelectorProvider for the uni-directional pipe, with only read-ops
interest. Pipes are system objects and have limits on the count of
pipes, and, the depth of pipes. So, according to that, it could result
just an outputter/writer queue for a usually-nothing-to-do writer
thread, whether the TCB has a queue to consume, and a pipe-reader thread
that fills the queue, notify()'s the TCB as wait()'s on it, and
re-enters select() then for however it's so that at idle, all threads
are blocked in select(), or wait(), toward "0% CPU", and "floor RAM", at
idle. (... And all usual calls are non-blocking.)

For TW and TCF to "reach idle", or "rest at idle", basically has for
wait() and notify() in Java, vis-a-vis, pipe() and select() in
system-calls. (... Which idle at rest in wait() and wake up on notify()
or notifyAll(), and aren't otherwise interrupted.) This is a design using mostly
plain and self-contained data structures, and usual system calls and
library network and synchronization routines, which Java happens to
surface, so, it's not language or runtime-specific.

The callbacks on the forward side, basically are driven by TCF which
services the backend and adapters, those callbacks on the re-routines
pretty much just result the TW tasks, those then in their results
resulting the TCB callbacks as above. It's figured some adapters won't
have non-blocking ways, then either to give them threads, or, to
implement a queue as up after the pipe-selector approach, for the TCF
thread and thread group, and the back-end adapters.

It's figured for streaming-sub-protocols that backend adapters will be
entirely non-blocking also, resulting a usual sort of approach to
low-load high-output round-trip throughput, as what TCF (thread call
forward) returns to TTW (thread task worker) returns to TCB (thread call
back) off the TS and TQ (task set and task queue). Then also is a usual
sort of direct TIR to TCF to TCB approach, or bridging adapters.
Ross Finlayson
2024-04-29 15:17:48 UTC
Reply
Permalink
On 04/28/2024 08:24 PM, Ross Finlayson wrote:
> On 04/27/2024 09:01 AM, Ross Finlayson wrote:
>> On 04/25/2024 10:46 AM, Ross Finlayson wrote:
>>> On 04/22/2024 10:06 AM, Ross Finlayson wrote:
>>>> On 04/20/2024 11:24 AM, Ross Finlayson wrote:
>>>>>
>>>>>
>>>>> Well I've been thinking about the re-routine as a model of cooperative
>>>>> multithreading,
>>>>> then thinking about the flow-machine of protocols
>>>>>
>>>>> NNTP
>>>>> IMAP <-> NNTP
>>>>> HTTP <-> IMAP <-> NNTP
>>>>>
>>>>> Both IMAP and NNTP are session-oriented on the connection, while,
>>>>> HTTP, in terms of session, has various approaches in terms of HTTP 1.1
>>>>> and connections, and the session ID shared client/server.
>>>>>
>>>>>
>>>>> The re-routine idea is this, that each kind of method, is memoizable,
>>>>> and, it memoizes, by object identity as the key, for the method, all
>>>>> its callers, how this is like so.
>>>>>
>>>>> interface Reroutine1 {
>>>>>
>>>>> Result1 rr1(String a1) {
>>>>>
>>>>> Result2 r2 = reroutine2.rr2(a1);
>>>>>
>>>>> Result3 r3 = reroutine3.rr3(r2);
>>>>>
>>>>> return result(r2, r3);
>>>>> }
>>>>>
>>>>> }
>>>>>
>>>>>
>>>>> The idea is that the executor, when it's submitted a reroutine,
>>>>> when it runs the re-routine, in a thread, then it puts in a
>>>>> ThreadLocal,
>>>>> the re-routine, so that when a re-routine it calls, returns null as it
>>>>> starts an asynchronous computation for the input, then when
>>>>> it completes, it submits to the executor the re-routine again.
>>>>>
>>>>> Then rr1 runs through again, retrieving r2 which is memoized,
>>>>> invokes rr3, which throws, after queuing to memoize and
>>>>> resubmit rr1, when that calls back to resubmit rr1, then rr1
>>>>> completes, signaling the original invoker.
>>>>>
>>>>> Then it seems each re-routine basically has an instance part
>>>>> and a memoized part, and that it's to flush the memo
>>>>> after it finishes, in terms of memoizing the inputs.
>>>>>
>>>>>
>>>>> Result 1 rr(String a1) {
>>>>> // if a1 is in the memo, return for it
>>>>> // else queue for it and carry on
>>>>>
>>>>> }
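
(A rough sketch of that memo-or-launch shape in Java, assuming a
thread-safe memo keyed by argument identity and a hypothetical executor
that re-submits the calling re-routine on completion:)

import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.function.Supplier;

class MemoOrLaunch {
    final ExecutorService executor;      // hypothetical executor
    MemoOrLaunch(ExecutorService executor) { this.executor = executor; }

    // Return the memoized result if present; otherwise launch the computation,
    // which memoizes its result and re-submits the calling re-routine, and
    // return null so the caller throws on use and quits for now.
    <A, R> R memoOrLaunch(Map<A, R> memo, A arg, Supplier<R> computation, Runnable resubmitCaller) {
        R r = memo.get(arg);
        if (r != null) return r;
        executor.submit(() -> {
            memo.put(arg, computation.get());
            resubmitCaller.run();
        });
        return null;
    }
}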
>>>>>
>>>>>
>>>>> What is a re-routine?
>>>>>
>>>>> It's a pattern for cooperative multithreading.
>>>>>
>>>>> It's sort of a functional approach to functions and flow.
>>>>>
>>>>> It has a declarative syntax in the language with usual
>>>>> flow-of-control.
>>>>>
>>>>> So, it's cooperative multithreading so it yields?
>>>>>
>>>>> No, it just quits, and expects to be called back.
>>>>>
>>>>> So, if it quits, how does it complete?
>>>>>
>>>>> The entry point to re-routine provides a callback.
>>>>>
>>>>> Re-routines only return results to other re-routines,
>>>>> It's the default callback. Otherwise they just callback.
>>>>>
>>>>> So, it just quits?
>>>>>
>>>>> If a re-routine gets called with a null, it throws.
>>>>>
>>>>> If a re-routine gets a null, it just continues.
>>>>>
>>>>> If a re-routine completes, it callbacks.
>>>>>
>>>>> So, can a re-routine call any regular code?
>>>>>
>>>>> Yeah, there are some issues, though.
>>>>>
>>>>> So, it's got callbacks everywhere?
>>>>>
>>>>> Well, it's just got callbacks implicitly everywhere.
>>>>>
>>>>> So, how does it work?
>>>>>
>>>>> Well, you build a re-routine with an input and a callback,
>>>>> you call it, then when it completes, it calls the callback.
>>>>>
>>>>> Then, re-routines call other re-routines with the argument,
>>>>> and the callback's in a ThreadLocal, and the re-routine memoizes
>>>>> all of its return values according to the object identity of the
>>>>> inputs,
>>>>> then when a re-routine completes, it calls again with another
>>>>> ThreadLocal
>>>>> indicating to delete the memos, following the exact same
>>>>> flow-of-control
>>>>> only deleting the memos going along, until it results all the
>>>>> memos in
>>>>> the re-routines for the interned or ref-counted input are
>>>>> deleted,
>>>>> then the state of the re-routine is de-allocated.
>>>>>
>>>>> So, it's sort of like a monad and all in pure and idempotent
>>>>> functions?
>>>>>
>>>>> Yeah, it's sort of like a monad and all in pure and idempotent
>>>>> functions.
>>>>>
>>>>> So, it's a model of cooperative multithreading, though with no yield,
>>>>> and callbacks implicitly everywhere?
>>>>>
>>>>> Yeah, it's sort of figured that a called re-routine always has a
>>>>> callback in the ThreadLocal, because the runtime has pre-emptive
>>>>> multithreading anyways, that the thread runs through its
>>>>> re-routines in
>>>>> their normal declarative flow-of-control with exception handling, and
>>>>> whatever re-routines or other pure monadic idempotent functions it
>>>>> calls, throw when they get null inputs.
>>>>>
>>>>> Also it sort of doesn't have primitive types, Strings must always
>>>>> be interned, all objects must have a distinct identity w.r.t. ==, and
>>>>> null is never an argument or return value.
>>>>>
>>>>> So, what does it look like?
>>>>>
>>>>> interface Reroutine1 {
>>>>>
>>>>> Result1 rr1(String a1) {
>>>>>
>>>>> Result2 r2 = reroutine2.rr2(a1);
>>>>>
>>>>> Result3 r3 = reroutine3.rr3(r2);
>>>>>
>>>>> return result(r2, r3);
>>>>> }
>>>>>
>>>>> }
>>>>>
>>>>> So, I expect that to return "result(r2, r3)".
>>>>>
>>>>> Well, that's synchronous, and maybe blocking, the idea is that it
>>>>> calls rr2, gets a1, and rr2 constructs with the callback of rr1 and
>>>>> its
>>>>> own callback, and a1, and makes a memo for a1, and invokes whatever is
>>>>> its implementation, and returns null, then rr1 continues and invokes
>>>>> rr3
>>>>> with r2, which is null, so that throws a NullPointerException, and rr1
>>>>> quits.
>>>>>
>>>>> So, ..., that's cooperative multithreading?
>>>>>
>>>>> Well you see what happens is that rr2 invoked another
>>>>> re-routine or
>>>>> end routine, and at some point it will get called back, and that will
>>>>> happen over and over again until rr2 has an r2, then rr2 will memoize
>>>>> (a1, r2), and then it will callback rr1.
>>>>>
>>>>> Then rr1 had quit, it runs again, this time it gets r2 from the
>>>>> (a1, r2) memo in the monad it's building, then it passes a non-null r2
>>>>> to rr3, which proceeds in much the same way, while rr1 quits again
>>>>> until
>>>>> rr3 calls it back.
>>>>>
>>>>> So, ..., it's non-blocking, because it just quits all the time, then
>>>>> happens to run through the same paces filling in?
>>>>>
>>>>> That's the idea, that re-routines are responsible to build the
>>>>> monad and call-back.
>>>>>
>>>>> So, can I just implement rr2 and rr3 as synchronous and blocking?
>>>>>
>>>>> Sure, they're interfaces, their implementation is separate. If
>>>>> they don't know re-routine semantics then they're just synchronous and
>>>>> blocking. They'll get called every time though when the re-routine
>>>>> gets
>>>>> called back, and actually they need to know the semantics of returning
>>>>> an Object or value by identity, because, calling equals() to implement
>>>>> Memo usually would be too much, where the idea is to actually function
>>>>> only monadically, and that given same Object or value input, must
>>>>> return
>>>>> same Object or value output.
>>>>>
>>>>> So, it's sort of an approach as a monadic pure idempotency?
>>>>>
>>>>> Well, yeah, you can call it that.
>>>>>
>>>>> So, what's the point of all this?
>>>>>
>>>>> Well, the idea is that there are 10,000 connections, and any time
>>>>> one of them demultiplexes off the connection an input command message,
>>>>> then it builds one of these with the response input to the
>>>>> demultiplexer
>>>>> on its protocol on its connection, on the multiplexer to all the
>>>>> connections, with a callback to itself. Then the re-routine is
>>>>> launched
>>>>> and when it returns, it calls-back to the originator by its
>>>>> callback-number, then the output command response writes those back
>>>>> out.
>>>>>
>>>>> The point is that there are only as many Theads as cores so the
>>>>> goal is that they never block,
>>>>> and that the memos make for interning Objects by value, then the
>>>>> goal is
>>>>> mostly to receive command objects and handles to request bodies and
>>>>> result objects and handles to response bodies, then to call-back with
>>>>> those in whatever serial order is necessary, or not.
>>>>>
>>>>> So, won't this run through each of these re-routines umpteen times?
>>>>>
>>>>> Yeah, you figure that the runtime of the re-routine is on the
>>>>> order
>>>>> of n^2 the order of statements in the re-routine.
>>>>>
>>>>> So, isn't that terrible?
>>>>>
>>>>> Well, it doesn't block.
>>>>>
>>>>> So, it sounds like a big mess.
>>>>>
>>>>> Yeah, it could be. That's why to avoid blocking and callback
>>>>> semantics, is to make monadic idempotency semantics, so then the
>>>>> re-routines are just written in normal synchronous flow-of-control,
>>>>> and
>>>>> their well-defined behavior is exactly according to flow-of-control
>>>>> including exception-handling.
>>>>>
>>>>> There's that and there's basically it only needs one Thread, so,
>>>>> less Thread x stack size, for a deep enough thread call-stack. Then
>>>>> the
>>>>> idea is about one Thread per core, figuring for the thread to
>>>>> always be
>>>>> running and never be blocking.
>>>>>
>>>>> So, it's just normal flow-of-control.
>>>>>
>>>>> Well yeah, you expect to write the routine in normal
>>>>> flow-of-control, and to test it with synchronous and in-memory
>>>>> editions
>>>>> that just run through synchronously, and that if you don't much
>>>>> care if
>>>>> it blocks, then it's the same code and has no semantics about the
>>>>> asynchronous or callbacks actually in it. It just returns when it's
>>>>> done.
>>>>>
>>>>>
>>>>> So what's the requirements of one of these again?
>>>>>
>>>>> Well, the idea is, that, for a given instance of a re-routine,
>>>>> it's
>>>>> an Object, that implements an interface, and it has arguments, and it
>>>>> has a return value. The expectation is that the re-routine gets
>>>>> called
>>>>> with the same arguments, and must return the same return value. This
>>>>> way later calls to re-routines can match the same expectation,
>>>>> same/same.
>>>>>
>>>>> Also, if it gets different arguments, by Object identity or
>>>>> primitive value, the re-routine must return a different return value,
>>>>> those being same/same.
>>>>>
>>>>> The re-routine memoizes its arguments by its argument list,
>>>>> Object
>>>>> or primitive value, and a given argument list is same if the order and
>>>>> types and values of those are same, and it must return the same return
>>>>> value by type and value.
>>>>>
>>>>> So, how is this cooperative multithreading unobtrusively in
>>>>> flow-of-control again?
>>>>>
>>>>> Here for example the idea would be, rr2 quits and rr1 continues, rr3
>>>>> quits and rr1 continues, then reaching rr4, rr4 throws and rr1 quits.
>>>>> When rr2's or rr3's memo-callback completes, then it calls-back
>>>>> rr1. As
>>>>> those come in, at some point rr4 will be fulfilled, and thus rr4 will
>>>>> quit and rr1 will quit. When rr4's callback completes, then it will
>>>>> call-back rr1, which will finally complete, and then call-back
>>>>> whatever
>>>>> called rr1. Then rr1 runs itself through one more time to
>>>>> delete or decrement all its memos.
>>>>>
>>>>> interface Reroutine1 {
>>>>>
>>>>> Result1 rr1(String a1) {
>>>>>
>>>>> Result2 r2 = reroutine2.rr2(a1);
>>>>>
>>>>> Result3 r3 = reroutine3.rr3(a1);
>>>>>
>>>>> Result4 r4 = reroutine4.rr4(a1, r2, r3);
>>>>>
>>>>> return Result1.r4(a1, r4);
>>>>> }
>>>>>
>>>>> }
>>>>>
>>>>> The idea is that it doesn't block when it launches rr2 and rr3, until
>>>>> such time as it just quits when it tries to invoke rr4 and gets a
>>>>> resulting NullPointerException, then eventually rr4 will complete
>>>>> and be
>>>>> memoized and call-back rr1, then rr1 will be called-back and then
>>>>> complete, then run itself through to delete or decrement the ref-count
>>>>> of all its memo-ized fragmented monad respectively.
>>>>>
>>>>> Thusly it's cooperative multithreading by never blocking and always
>>>>> just
>>>>> launching callbacks.
>>>>>
>>>>> There's this System.identityHashCode() method and then there's a
>>>>> notion
>>>>> of Object pools and interning Objects then as for about this way that
>>>>> it's about numeric identity instead of value identity, so that when
>>>>> making memo's that it's always "==" and for a HashMap with
>>>>> System.identityHashCode() instead of ever calling equals(), when
>>>>> calling
>>>>> equals() is more expensive than calling == and the same/same
>>>>> memo-ization is about Object numeric value or the primitive scalar
>>>>> value, those being same/same.
>>>>>
>>>>> https://docs.oracle.com/javase/8/docs/api/java/lang/System.html#identityHashCode-java.lang.Object-
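
(For that same/same memo by "==", java.util.IdentityHashMap already keys
on reference identity and System.identityHashCode, never equals(), e.g.:)

import java.util.IdentityHashMap;
import java.util.Map;

// Memo keyed by reference identity: lookups use == and
// System.identityHashCode, never equals()/hashCode().
Map<Object, Object> memo = new IdentityHashMap<>();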
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> So, you figure to return Objects to these connections by their session
>>>>> and connection and mux/demux in these callbacks and then write those
>>>>> out?
>>>>>
>>>>> Well, the idea is to make it so that according to the protocol, the
>>>>> back-end sort of knows what makes a handle to a datum of the sort,
>>>>> given
>>>>> the protocol and the protocol and the protocol, and the callback is
>>>>> just
>>>>> these handles, about what goes in the outer callbacks or outside the
>>>>> re-routine, those can be different/same. Then the single writer
>>>>> thread
>>>>> servicing the network I/O just wants to transfer those handles, or, as
>>>>> necessary through the compression and encryption codecs, then write
>>>>> those out, well making use of the java.nio for scatter/gather and
>>>>> vector
>>>>> I/O in the non-blocking and asynchronous I/O as much as possible.
>>>>>
>>>>>
>>>>> So, that seems a lot of effort to just passing the handles, ....
>>>>>
>>>>> Well, I don't want to write any code except normal flow-of-control.
>>>>>
>>>>> So, this same/same bit seems onerous, as long as different/same has a
>>>>> ref-count and thus the memo-ized monad-fragment is maintained when all
>>>>> sorts of requests fetch the same thing.
>>>>>
>>>>> Yeah, maybe you're right. There's much to be gained by re-using
>>>>> monadic
>>>>> pure idempotent functions yet only invoking them once. That gets into
>>>>> value equality besides numeric equality, though, with regards to going
>>>>> into re-routines and interning all Objects by value, so that inside
>>>>> and
>>>>> through it's all "==" and System.identityHashCode, the memos, then
>>>>> about
>>>>> the ref-counting in the memos.
>>>>>
>>>>>
>>>>> So, I suppose you know HTTP, and about HTTP/2 and IMAP and NNTP here?
>>>>>
>>>>> Yeah, it's a thing.
>>>>>
>>>>> So, I think this needs a much cleaner and well-defined definition, to
>>>>> fully explore its meaning.
>>>>>
>>>>> Yeah, I suppose. There's something to be said for reading it again.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> ReRoutines: monadic functional non-blocking asynchrony in the language
>>>>
>>>>
>>>> Implementing a sort of Internet protocol server, it sort of has
>>>> three or
>>>> four kinds of machines.
>>>>
>>>> flow-machine: select/epoll hardware driven I/O events
>>>>
>>>> protocol-establishment: setting up and changing protocol (commands,
>>>> encryption/compression)
>>>>
>>>> protocol-coding: block coding in encryption/compression and wire/object
>>>> commands/results
>>>>
>>>> routine: inside the objects of the commands of the protocol,
>>>> commands/results
>>>>
>>>> Then, it often looks sort of like
>>>>
>>>> flow <-> protocol <-> routine <-> protocol <-> flow
>>>>
>>>>
>>>> On either outer side of the flow is a connection, it's a socket or the
>>>> receipt or sending of a datagram, according to the network interface
>>>> and
>>>> select/epoll.
>>>>
>>>> The establishment of a protocol looks like
>>>> connection/configuration/commencement/conclusion, or setup/teardown.
>>>> Protocols get involved renegotiation within a protocol, and for example
>>>> upgrade among protocols. Then the protocol is setup and established.
>>>>
>>>> The idea is that a protocol's coding is in three parts for
>>>> coding/decoding, compression/decompression, and
>>>> (en)cryption/decryption,
>>>> or as it gets set up.
>>>>
>>>> flow->decrypt->decomp->decod->routine->cod->comp->crypt->flow-v
>>>> flow<-crypt<-comp<-cod<-routine<-decod<-decomp<-decrypt<-flow<-
>>>>
>>>>
>>>>
>>>> Whenever data arrives, the idea goes, is that the flow is interpreted
>>>> according to the protocol, resulting commands, then the routine derives
>>>> results from the commands, as by issuing others, in their protocols, to
>>>> the backend flow. Then, the results get sent back out through the
>>>> protocol, to the frontend, the clients of what it serves the protocol
>>>> the server.
>>>>
>>>> The idea is that there are about 10,000 connections at a time, or more
>>>> or less.
>>>>
>>>> flow <-> protocol <-> routine <-> protocol <-> flow
>>>> flow <-> protocol <-> routine <-> protocol <-> flow
>>>> flow <-> protocol <-> routine <-> protocol <-> flow
>>>> ...
>>>>
>>>>
>>>>
>>>>
>>>> Then, the routine in the middle, has that there's one processor, and on
>>>> the processor are a number of cores, each one independent. Then, the
>>>> operating system establishes that each of the cores, has any number of
>>>> threads-of-control or threads, and each thread has the state of
>>>> where it
>>>> is in the callstack of routines, and the threads are preempted so that
>>>> multithreading, that a core runs multiple threads, gives each thread
>>>> some running from the entry to the exit of the thread, in any given
>>>> interval of time. Each thread-of-control is thusly independent,
>>>> while it
>>>> must synchronize with any other thread-of-control, to establish common
>>>> or mutual state, and threads establish taking turns by mutual
>>>> exclusion,
>>>> called "mutex".
>>>>
>>>> Into and out of the protocol, coding, is either a byte-sequence or
>>>> block, or otherwise the flow is a byte-sequence, that being serial,
>>>> however the protocol multiplexes and demultiplexes messages, the
>>>> commands and their results, to and from the flow.
>>>>
>>>> Then the idea is that what arrives to/from the routine, is objects in
>>>> the protocol, or handles to the transport of byte sequences, in the
>>>> protocol, to the flow.
>>>>
>>>> A usual idea is that there's a thread that services the flow, where,
>>>> how
>>>> it works is that a thread blocks waiting for there to be any I/O,
>>>> input/output, reading input from the flow, and writing output to the
>>>> flow. So, mostly the thread that blocks has that there's one thread
>>>> that
>>>> blocks on input, and when there's any input, then it reads or transfers
>>>> the bytes from the input, into buffers. That's its only job, and only
>>>> one thread can block on a given select/epoll selector, which is any
>>>> given number of ports, the connections, the idea being that it just
>>>> blocks until select returns for its keys of interest, it services each
>>>> of the I/O's by copying from the network interface's buffers into the
>>>> program's buffers, then other threads do the rest.
>>>>
>>>> So, if a thread results waiting at all for any other action to complete
>>>> or be ready, it's said to "block". While a thread is blocked, the
>>>> CPU or
>>>> core just skips it in scheduling the preemptive multithreading, yet it
>>>> still takes some memory and other resources and is in the scheduler of
>>>> the threads.
>>>>
>>>> The idea that the I/O thread, ever blocks, is that it's a feature of
>>>> select/epoll that hardware results waking it up, with the idea that
>>>> that's the only thread that ever blocks.
>>>>
>>>> So, for the other threads, in the decryption/decompression/decoding and
>>>> coding/compression/cryption, the idea is that a thread, runs through
>>>> those, then returns what it's doing, and joins back to a limited
>>>> pool of
>>>> threads, with a usual idea of there being 1 core : 1 thread, so that
>>>> multithreading is sort of simplified, because as far as the system
>>>> process is concerned, it has a given number of cores and the system
>>>> preemptively multithreads it, and as far as the virtual machine is
>>>> concerned, it has a given number of cores and the virtual machine
>>>> preemptively multithreads its threads, about the thread-of-control, in
>>>> the flow-of-control, of the thing.
>>>>
>>>> A usual way that the routine multiplexes and demultiplexes objects in
>>>> the
>>>> protocol from a flow's input back to a flow's output, has that the
>>>> thread-per-connection model has that a single thread carries out the
>>>> entire task through the backend flow, blocking along the way, until it
>>>> results joining after writing back out to its connection. Yet, that has
>>>> a thread per each connection, and threads use scheduling and heap
>>>> resources. So, here thread-per-connection is being avoided.
>>>>
>>>> Then, a usual idea of the tasks, is that as I/O is received and flows
>>>> into the decryption/decompression/decoding, then what's decoded,
>>>> results
>>>> the specification of a task, the command, and the connection, where to
>>>> return its result. The specification is a data structure, so it's an
>>>> object or Object, then. This is added to a queue of tasks, where
>>>> "buffers" represent the ephemeral storage of content in transport the
>>>> byte-sequences, while, the queue is as usually a first-in/first-out
>>>> (FIFO) queue also, of tasks.
>>>>
>>>> Then, the idea is that each of the cores consumes task specifications
>>>> from the task queue, performs them according to the task specification,
>>>> then the results are written out, as coded/compressed/crypted, in the
>>>> protocol.
>>>>
>>>> So, to avoid the threads blocking at all, introduces the idea of
>>>> "asynchrony" or callbacks, where the idea is that the "blocking" and
>>>> "synchronous" has that anywhere in the threads' thread-of-control
>>>> flow-of-control, according to the program or the routine, it is current
>>>> and synchronous, the value that it has, then with regards to what it
>>>> returns or writes, as the result. So, "asynchrony" is the idea that
>>>> there's established a callback, or a place to pause and continue,
>>>> then a
>>>> specification of the task in the protocol is put to an event queue and
>>>> executed, or from servicing the O/I's of the backend flow, that what
>>>> results from that, has the context of the callback and
>>>> returns/writes to
>>>> the relevant connection, its result.
>>>>
>>>> I -> flow -> protocol -> routine -> protocol -> flow -> O -v
>>>> O <- flow <- protocol <- routine <- protocol <- flow <- I <-
>>>>
>>>>
>>>> The idea of non-blocking then, is that a routine either provides a
>>>> result immediately available, and is non-blocking, or, queues a task
>>>> what results a callback that provides the result eventually, and is
>>>> non-blocking, and never invokes any other routine that blocks, so is
>>>> non-blocking.
>>>>
>>>> This way a thread, executing tasks, always runs through a task, and
>>>> thus
>>>> services the task queue or TQ, so that the cores' threads are always
>>>> running and never blocking. (Besides the I/O and O/I threads which
>>>> block
>>>> when there's no traffic, and usually would be constantly woken up and
>>>> not waiting blocked.) This way, the TQ threads, only block when there's
>>>> nothing in the TQ, or are just deconstructed, and reconstructed, in a
>>>> "pool" of threads, the TQ's executor pool.
>>>>
>>>> Enter the ReRoutine
>>>>
>>>> The idea of a ReRoutine, a re-routine, is that it is a usual procedural
>>>> implementation as if it were synchronous, and agnostic of callbacks.
>>>>
>>>> It is named after "routine" and "co-routine". It is a sort of
>>>> co-routine
>>>> that builds a monad and is aware its originating caller, re-caller, and
>>>> callback, or, its re-routine caller, re-caller, and callback.
>>>>
>>>> The idea is that there are callbacks implicitly at each method
>>>> boundary,
>>>> and that nulls are reserved values to indicate the result or lack
>>>> thereof of re-routines, so that the code has neither callbacks nor any
>>>> nulls.
>>>>
>>>> The originating caller has that the TQ, has a task specification, the
>>>> session+attachment of the client in the protocol where to write the
>>>> output, and the command, then the state of the monad of the task, that
>>>> lives on the heap with the task specification and task object. The TQ
>>>> consumers or executors or the executor, when a thread picks up the
>>>> task,
>>>> it picks up or builds ("originates") the monad state, which is the
>>>> partial state of the re-routine and a memo of the partial state of the
>>>> re-routine, and installs this in the thread local storage or
>>>> ThreadLocal, for the duration of the invocation of the re-routine. Then
>>>> the thread enters the re-routine, which proceeds until it would block,
>>>> where instead it queues a command/task with a callback to re-call and
>>>> re-launch it, throws a NullPointerException, and quits/returns.
>>>>
>>>> This happens recursively and iteratively in the re-routine implemented
>>>> as re-routines, each re-routine updates the partial state of the monad,
>>>> then that as a re-routine completes, it re-launches the calling
>>>> re-routine, until the original re-routine completes, and it calls the
>>>> original callback with the result.
>>>>
>>>> This way the re-routine's method body, is written as plain declarative
>>>> procedural code, the flow-of-control, is exactly as if it were
>>>> synchronous code, and flow-of-control is exactly as if written in the
>>>> language with no callbacks and never nulls, and exception-handling as
>>>> exactly defined by the language.
>>>>
>>>> As the re-routine accumulates the partial results, they live on the
>>>> heap, in the monad, as a member of the originating task's object the
>>>> task in the task queue. This is always added back to the queue as
>>>> one of
>>>> the pending results of a re-routine, so it stays referenced as an
>>>> object
>>>> on the heap, then that as it is completed and the original re-routine
>>>> returns, then it's no longer referenced and the garbage-collector can
>>>> reclaim it from the heap or the allocator can delete it.
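>>>>
>>>> Concretely, a minimal sketch of that shape (the memo keyed by call
>>>> path, NullPointerException as the conventional quit, and
>>>> launchAsync/relaunch as hypothetical hooks into the TQ):
>>>>
>>>> import java.util.Map;
>>>> import java.util.concurrent.ConcurrentHashMap;
>>>>
>>>> // Sketch only: the memo lives in the monad on the heap, keyed by
>>>> // the call path within the originating task.
>>>> class ReRoutineSketch<T> {
>>>>     final Map<String, T> memo = new ConcurrentHashMap<>();
>>>>
>>>>     // Return the memoized value, or queue the real work and throw-quit.
>>>>     T getOrQuit(String callPath, Runnable launchAsync) {
>>>>         T value = memo.get(callPath);
>>>>         if (value == null) {
>>>>             launchAsync.run();                 // queue backend task + callback
>>>>             throw new NullPointerException();  // the conventional quit
>>>>         }
>>>>         return value;                          // satisfied: plain synchronous value
>>>>     }
>>>>
>>>>     // The callback fills the memo, then re-launches the original re-routine.
>>>>     void satisfy(String callPath, T value, Runnable relaunch) {
>>>>         memo.put(callPath, value);
>>>>         relaunch.run();                        // toss the task back on the TQ
>>>>     }
>>>> }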
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Well, for the re-routine, I sort of figure there's a Callstack and a
>>>> Callback type
>>>>
>>>> class Callstack {
>>>> Stack<Callback> callstack;
>>>> }
>>>>
>>>> interface Callback {
>>>> void callback() throws Exception;
>>>> }
>>>>
>>>> and then a placeholder sort of type for Callflush
>>>>
>>>> class Callflush {
>>>> Callstack callstack;
>>>> }
>>>>
>>>> with the idea that the presence in ThreadLocals is to be sorted out,
>>>> about a kind of ThreadLocal static pretty much.
>>>>
>>>> With not returning null, and for memoizing call-graph dependencies,
>>>> there's basically an "unvoid" type.
>>>>
>>>> class unvoid {
>>>>
>>>> }
>>>>
>>>> Then it's sort of figured that there's an interface with some defaults,
>>>> with the idea that some boilerplate gets involved in the Memoization.
>>>>
>>>> interface Caller {}
>>>>
>>>> interface Callee {}
>>>>
>>>> interface Callmemo {
>>>> void memoize(Caller caller, Object[] args);
>>>> void flush(Caller caller);
>>>> }
>>>>
>>>>
>>>> Then it seems that the Callstack should instead be of a Callgraph, and
>>>> then what's maintained from call to call is a Callpath, and then what's
>>>> memoized is all kept with the Callgraph, then with regards to
>>>> objects on
>>>> the heap and their distinctness, only being reachable from the
>>>> Callgraph, leaving less work for the garbage collector, to maintain the
>>>> heap.
>>>>
>>>> The interning semantics would still be on the class level, or for
>>>> constructor semantics, as with regards to either interning Objects for
>>>> uniqueness, or that otherwise they'd be memoized, with the key being
>>>> the
>>>> Callpath, and the initial arguments into the Callgraph.
>>>>
>>>> Then the idea seems that the ThreaderCaller, establishes the Callgraph
>>>> with respect to the Callgraph of an object, installing it on the
>>>> thread,
>>>> otherwise attached to the Callgraph, with regards to the ReRoutine.
>>>>
>>>>
>>>>
>>>> About the ReRoutine, it's starting to come together as an idea, what is
>>>> the apparatus for invoking re-routines, that they build the monad of
>>>> the
>>>> IOE's (inputs, outputs, exceptions) of the re-routines in their
>>>> call-graph, in terms of ThreadLocals of some ThreadLocals that callers
>>>> of the re-routines, maintain, with idea of the memoized monad along the
>>>> way, and each original re-routine.
>>>>
>>>> class IOE<O, E extends Exception> {
>>>> Object[] input;
>>>> O output;
>>>> E exception;
>>>> }
>>>>
>>>> So the idea is that there are some ThreadLocal's in a static
>>>> ThreadGlobal
>>>>
>>>> public class ThreadGlobals {
>>>> public static ThreadLocal<MonadMemo> monadMemo;
>>>> }
>>>>
>>>> where callers or originators or ReRoutines, keep a map of the Runnables
>>>> or Callables they have, to the MonadMemo's,
>>>>
>>>> class Originator {
>>>> Map<? extends ReRoutineMapKey, MonadMemo> monadMemoMap;
>>>> }
>>>>
>>>> then when it's about to invoke a Runnable, if it's a ReRoutine, then it
>>>> either retrieves the MonadMemo or makes a new one, and sets it on the
>>>> ThreadLocal, then invokes the Runnable, then clears the ThreadLocal.
>>>>
>>>> Then a MonadMemo, pretty simply, is a List of IOE's, that when the
>>>> ReRoutine runs through the callgraph, the callstack is indicated by a
>>>> tree of integers, and the stack path in the ReRoutine, so that any
>>>> ReRoutine that calls ReRoutines A/B/C, points to an IOE that it
>>>> finds in
>>>> the thing, then its default behavior is to return its memo-ized value,
>>>> that otherwise is making the callback that fills its memo and
>>>> re-invokes
>>>> all the way back the Original routine, or just its own entry point.
>>>>
>>>> This is basically that the Originator, when the ReRoutine quits out,
>>>> sort of has that any ReRoutine it originates, also gets filled up by
>>>> the
>>>> Originator.
>>>>
>>>> So, then the Originator sort of has a map to a ReRoutine, then for any
>>>> Path, the Monad, so that when it sets the ThreadLocal with the
>>>> MonadMemo, it also sets the Path for the callee, launches it again when
>>>> its callback returned to set its memo and relaunch it, then back up the
>>>> path stack to the original re-routine.
>>>>
>>>> One of the issues here is "automatic parallelization". What I mean by
>>>> that is that the re-routine just goes along and when it gets nulls
>>>> meaning "pending" it just continues along, then expects
>>>> NullPointerExceptions as "UnsatisfiedInput", to quit, figuring it gets
>>>> relaunched when its input is satisfied.
>>>>
>>>> This way then when routines serially don't depend on each others'
>>>> outputs, then they all get launched apiece, parallelizing.
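>>>>
>>>> For instance, a sketch (hypothetical memo map and launch hooks, not a
>>>> settled API) where two independent derivations both get launched on
>>>> the first pass, so they proceed in parallel, and only an unsatisfied
>>>> value throws-quit:
>>>>
>>>> import java.util.Map;
>>>> import java.util.concurrent.ConcurrentHashMap;
>>>>
>>>> class ParallelSketch {
>>>>     final Map<String, String> memo = new ConcurrentHashMap<>();
>>>>
>>>>     String combine(String x, String y, Runnable launchA, Runnable launchB) {
>>>>         String a = memo.get("A/" + x);
>>>>         String b = memo.get("B/" + y);
>>>>         if (a == null) launchA.run();  // queue A's backend call (the real
>>>>         if (b == null) launchB.run();  // monad de-duplicates in-flight work)
>>>>         if (a == null || b == null) {
>>>>             throw new NullPointerException();  // quit; relaunched when satisfied
>>>>         }
>>>>         return a + b;  // later passes are memo hits, constant time
>>>>     }
>>>> }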
>>>>
>>>> Then, I wonder about usual library code, basically about Collections
>>>> and
>>>> Streams, and the usual sorts of routines that are applied to the
>>>> arguments, and how to basically establish that the rule of re-routine
>>>> code is that anything that gets a null must throw a
>>>> NullPointerException, so the re-routine will quit until the arguments
>>>> are satisfied, the inputs to library code. Then with the Memo being
>>>> stored in the MonadMemo, it's figured that will work out regardless the
>>>> Objects' or primitives' value, with regards to Collections and Stream
>>>> code and after usual flow-of-control in Iterables for the for loops, or
>>>> whatever other application library code, that they will be run each
>>>> time
>>>> the re-routine passes their section with satisfied arguments, then as
>>>> with regards to, that the Memo is just whatever serial order the
>>>> re-routine passes, not needing to lookup by Object identity which is
>>>> otherwise part of an interning pattern.
>>>>
>>>> Map<String, String> rr1(String s1) {
>>>>
>>>> List<String> l1 = rr2.get(s1);
>>>>
>>>> Map<String, String> m1 = new LinkedHashMap<>();
>>>>
>>>> l1.stream().forEach(s -> m1.put(s, rr3.get(s)));
>>>>
>>>> return m1;
>>>> }
>>>>
>>>> See what I figure is that the order of the invocations to rr3.get() is
>>>> serial, so it really only needs to memoize its OE, Output|Exception,
>>>> then about that putting null values in the Map, and having to check the
>>>> values in the Map for null values, and otherwise to make it so that the
>>>> semantics of null and NullPointerException, result that satisfying
>>>> inputs result calls, and unsatisfying inputs result quits, figuring
>>>> those unsatisfying inputs are results of unsatisfied outputs, that will
>>>> be satisfied when the callee gets populated its memo and makes the
>>>> callback.
>>>>
>>>> If the order of invocations is out-of-order, gets again into whether
>>>> the
>>>> Object/primitive by value needs to be the same each time, IOE, about
>>>> the
>>>> library code in Collections, Streams, parallelStream, and Iterables,
>>>> and
>>>> basically otherwise that any kind of library code, should throw
>>>> NullPointerException if it gets an "unexpected" null or what doesn't
>>>> fulfill it.
>>>>
>>>> The idea though that rr3 will get invoked say 1000 times with the rr2's
>>>> result, those each make their call, then re-launch 1000 times, has that
>>>> it's figured that the Executor, or Originator, when it looks up and
>>>> loads the "ReRoutineMapKey", is to have the count of those and whether
>>>> the count is fulfilled, then to no-op later re-launches of the
>>>> call-backs, after all the results are populated in the partial monad
>>>> memo.
>>>>
>>>> Then, there's perhaps instead as that each re-routine just checks its
>>>> input or checks its return value for nulls, those being unsatisfied.
>>>>
>>>> (The exception handling thoroughly or what happens when rr3 throws and
>>>> this kind of thing is involved thoroughly in library code.)
>>>>
>>>> The idea is it remains correct if the worst thing nulls do is throw
>>>> NullPointerException, because that's just a usual quit and means
>>>> another
>>>> re-launch is coming up, and that it automatically queues for
>>>> asynchronous parallel invocation each the derivations while resulting
>>>> never blocking.
>>>>
>>>> It's figured that re-routines check their inputs for nulls, and throw
>>>> quit, and check their inputs for library container types, and checking
>>>> any member of a library container collection for null, to throw quit,
>>>> and then it will result that the automatic asynchronous parallelization
>>>> proceeds, while the re-routines are never blocking, there's only as
>>>> much
>>>> memory on the heap of the monad as would be in the lifetime of the
>>>> original re-routine, and whatever re-calls or re-launches of the
>>>> re-routine established local state in local variables and library code,
>>>> would come in and out of scope according to plain stack unwinding.
>>>>
>>>> Then there's still the perceived deficiency that the re-routine's
>>>> method body will be run many times, yet it's only run as many times
>>>> as result throwing-quit, when it reaches where its argument to the
>>>> re-routine or result value isn't yet satisfied but is pending.
>>>>
>>>> It would re-run the library code any number of times, until it results
>>>> all non-nulls, then the resulting satisfied argument to the following
>>>> re-routines, would be memo-ized in the monad, and the return value of
>>>> the re-routine thus returning immediately its value on the partial
>>>> monad.
>>>>
>>>> This way each re-call of the re-routine, mostly encounters its own
>>>> monad
>>>> results in constant time, and throws-quit or gets thrown-quit only when
>>>> it would be unsatisfying, with the expectation that whatever
>>>> throws-quit, either NullPointerException or extending
>>>> NullPointerException, will have a pending callback, that will queue
>>>> on a
>>>> TQ, the task specification to re-launch and re-enter the original or
>>>> derived, re-routine.
>>>>
>>>> The idea is sort of that it's sort of, Java with non-blocking I/O and
>>>> ThreadLocal (1.7+, not 17+), or you know, C/C++ with non-blocking I/O
>>>> and thread local storage, then for the abstract or interface of the
>>>> re-routines, how it works out that it's a usual sort of model of
>>>> co-operative multithreading, the re-routine, the routine "in the
>>>> language".
>>>>
>>>>
>>>> Then it's great that the routine can be stubbed or implemented agnostic
>>>> of asynchrony, and declared in the language with standard libraries,
>>>> basically using the semantics of exception handling and convention of
>>>> re-launching callbacks to implement thread-of-control flow-of-control,
>>>> that can be implemented in the synchronous and blocking for unit tests
>>>> and modules of the routine, making a great abstraction of
>>>> flow-of-control.
>>>>
>>>>
>>>> Basically anything that _does_ block then makes for having its own
>>>> thread, whose only job is to block and when it unblocks, throw-toss the
>>>> re-launch toward the origin of the re-routine, and consume the next
>>>> blocking-task off the TQ. Yet, the re-routines and their servicing the
>>>> TQ only need one thread and never block. (And scale in core count and
>>>> automatically parallelize asynchronous requests according to satisfied
>>>> inputs.)
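>>>>
>>>> A sketch of that one-job blocking thread (illustrative names, error
>>>> path elided), which blocks on the backend call and only tosses the
>>>> re-launch back toward the TQ:
>>>>
>>>> import java.util.concurrent.BlockingQueue;
>>>> import java.util.concurrent.Callable;
>>>> import java.util.concurrent.LinkedBlockingQueue;
>>>>
>>>> // The only kind of thread allowed to block: it makes one blocking
>>>> // backend call at a time, then tosses the re-launch back to the TQ.
>>>> class BlockingAdapterThread implements Runnable {
>>>>     final BlockingQueue<Callable<Runnable>> blockingTasks = new LinkedBlockingQueue<>();
>>>>     final BlockingQueue<Runnable> tq;  // the re-routines' task queue
>>>>
>>>>     BlockingAdapterThread(BlockingQueue<Runnable> tq) { this.tq = tq; }
>>>>
>>>>     public void run() {
>>>>         while (!Thread.currentThread().isInterrupted()) {
>>>>             try {
>>>>                 Callable<Runnable> blocking = blockingTasks.take();
>>>>                 Runnable relaunch = blocking.call();  // blocks here, its only job
>>>>                 tq.offer(relaunch);                   // throw-toss toward the origin
>>>>             } catch (InterruptedException e) {
>>>>                 Thread.currentThread().interrupt();
>>>>             } catch (Exception e) {
>>>>                 // elided: the error would be memoized, then re-launched
>>>>             }
>>>>         }
>>>>     }
>>>> }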
>>>>
>>>>
>>>> Mostly the idea of the re-routine is "in the language, it's just plain,
>>>> ordinary, synchronous routine".
>>>>
>>>>
>>>>
>>>
>>>
>>> Protocol Establishment
>>>
>>> Each of these protocols is a combined sort of protocol, then according
>>> to different modes, there's established a protocol, then data flows in
>>> the protocol (in time).
>>>
>>>
>>> stream-based (connections)
>>> sockets, TCP/IP
>>> sctp SCTP
>>> message-based (datagrams)
>>> datagrams, UDP
>>>
>>> The idea is that connections can have state and session state, while,
>>> messages do not.
>>>
>>> Abstractly then there's just that connections make for reading from the
>>> connection, or writing to the connection, byte-by-byte,
>>> while messages make for receiving a complete message, or writing a
>>> complete message. SCTP is sort of both.
>>>
>>> A bit more concretely, the non-blocking or asynchronous or vector I/O,
>>> means that when some bytes arrive the connection is readable, and while
>>> the output buffer is not full a connection is writeable.
>>>
>>> For messages it's that when messages arrive messages are readable, and
>>> while the output buffer is not full messages are writeable.
>>>
>>> Otherwise bytes or messages that arrive while not readable/writeable
>>> pile up and in cases of limited resources get lost.
>>>
>>> So, the idea is that when bytes arrive, whatever's servicing the I/O's
>>> has that the connection has data to read, and, data to write.
>>> The usual idea is that an abstract Reader thread, will give any or all
>>> of the connections something to read, in an arbitrary order,
>>> at an arbitrary rate, then the role of the protocol, is to consume the
>>> bytes to read, thus releasing the buffers, that the Reader, writes to.
>>>
>>> Inputting/Reading
>>> Writing/Outputting
>>>
>>> The most usual idea of client-server is that
>>> client writes to server then reads from server, while,
>>> server reads from client then writes to client.
>>>
>>> Yet, that is just a mode, reads and writes are peer-peer,
>>> reads and writes in any order, while serial according to
>>> that bytes in the octet stream arrive in an order.
>>>
>>> There isn't much consideration of the out-of-band,
>>> about sockets and the STREAMS protocol, for
>>> that bytes can arrive out-of-band.
>>>
>>>
>>> So, the layers of the protocol, result that some layers of the protocol
>>> don't know anything about the protocol, all they know is sequences of
>>> bytes, and, whatever session state is involved to implement the codec,
>>> of the layers of the protocol. All they need to know is that given that
>>> all previous bytes are read/written, that the connection's state is
>>> synchronized, and everything after is read/written through the layer.
>>> Mostly once encryption or compression is set up it's never torn down.
>>>
>>> Encryption, TLS
>>> Compression, LZ77 (Deflate, gzip)
>>>
>>> The layers of the protocol, result that some layers of the protocol,
>>> only indicate state or conditions of the session.
>>>
>>> SASL, Login, AuthN/AuthZ
>>>
>>> So, for NNTP, a connection, usually enough starts with no layers,
>>> then in the various protocols and layers, get negotiated to get
>>> established,
>>> combinations of the protocols and layers. Other protocols expect to
>>> start with layers, or not, it varies.
>>>
>>> Layering, then, either is in the protocol, to synchronize the session
>>> then establish the layer in the layer protocol then maintain the layer
>>> in the main protocol, has that TLS makes a handshake to establish an
>>> encryption key for all the data, then the TLS layer only needs to
>>> encrypt and decrypt the data by that key, while for Deflate, it's
>>> usually the only option, then after it's set up as a layer, then
>>> everything read/written either way gets compressed.
>>>
>>>
>>> client -> REQUEST
>>> RESPONSE <- server
>>>
>>> In some protocols these interleave
>>>
>>> client -> REQUEST1
>>> client -> REQUEST2
>>>
>>> RESPONSE1A <- server
>>> RESPONSE2A <- server
>>> RESPONSE1B <- server
>>> RESPONSE2B <- server
>>>
>>> This then is called multiplexing/demultiplexing, for protocols like IMAP
>>> and HTTP/2,
>>> and another name for multiplexer/demultiplexer is mux/demux.
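>>>
>>> A sketch of that mux/demux bookkeeping (illustrative names), where
>>> each outgoing request gets a serial number and responses get matched
>>> back up by it:
>>>
>>> import java.util.Map;
>>> import java.util.concurrent.ConcurrentHashMap;
>>> import java.util.concurrent.atomic.AtomicLong;
>>> import java.util.function.Consumer;
>>>
>>> // Sketch: one half of mux/demux, matching responses to requests
>>> // by a per-connection serial number.
>>> class Remux<RESP> {
>>>     final AtomicLong serial = new AtomicLong();
>>>     final Map<Long, Consumer<RESP>> pending = new ConcurrentHashMap<>();
>>>
>>>     // mux: assign a serial to the request, remember where to return
>>>     long send(Consumer<RESP> onResponse) {
>>>         long s = serial.incrementAndGet();
>>>         pending.put(s, onResponse);
>>>         return s;  // the serial travels with the request on the wire
>>>     }
>>>
>>>     // demux: a response arrives tagged with its request's serial
>>>     void receive(long s, RESP response) {
>>>         Consumer<RESP> onResponse = pending.remove(s);
>>>         if (onResponse != null) onResponse.accept(response);
>>>     }
>>> }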
>>>
>>>
>>>
>>>
>>> So, for TLS, the idea is that usually most or all of the connections
>>> will be using the same algorithms with different keys, and each
>>> connection will have its own key, so the idea is to completely separate
>>> TLS establishment from TLS cryptec (crypt/decrypt), so, the layer need
>>> only key up the bytes by the connection's key, in their TLS frames.
>>>
>>> Then, most of the connections will use compression, then the idea is
>>> that the data is stored at rest compressed already and in a form that it
>>> can be concatenated, and that similarly as constants are a bunch of the
>>> textual content of the text-based protocol, they have compressed and
>>> concatenable constants, with the idea that the Deflate compec
>>> (comp/decomp) just passes those along concatenating them, or actively
>>> compresses/decompresses buffers of bytes or as of sequences of bytes.
>>>
>>> The idea is that Readers and Writers deal with bytes at a time,
>>> arbitrarily many, then that what results being passed around as the
>>> data, is as much as possible handles to the data. So, according to the
>>> protocol and layers, indicates the types, that the command routines, get
>>> and return, so that the command routines can get specialized, when the
>>> data at rest, is already layerized, and otherwise to adapt to the more
>>> concrete abstraction, of the non-blocking, asynchronous, and vector I/O,
>>> of what results the flow-machine.
>>>
>>>
>>> When the library of the runtime of the framework of the language
>>> provides the cryptec or compec, then, there's issues, when, it doesn't
>>> make it so for something like "I will read and write you the bytes as of
>>> making a TLS handshake, then return the algorithm and the key and that
>>> will implement the cryptec", or, "compec, here's either some data or
>>> handles of various types, send them through", it's to be figured out.
>>> The idea for the TLS handshake, is basically to sit in the middle, i.e.
>>> to read and write bytes as of what the client and server send, then
>>> figuring out what is the algorithm and key and then just using that as
>>> the cryptec. Then after TLS algorithm and key is established the rest is
>>> sort of discarded, though there's some idea about state and session, for
>>> the session key feature in TLS. The TLS 1.2 also includes comp/decomp,
>>> though, it's figured that instead it's a feature of the protocol whether
>>> it supports compression, point being that's combining layers, and to be
>>> implemented about these byte-sequences/handles.
>>>
>>>
>>> mux/demux
>>> crypt/decrypt
>>> comp/decomp
>>> cod/decod
>>>
>>> codec
>>>
>>>
>>> So, the idea is to implement toward the concrete abstraction of
>>> nonblocking vector I/O, while, remaining agnostic of that, so that all
>>> sorts of the usual test routines, yet particularly the composition of layers
>>> and establishment and upgrade of protocols, is to happen.
>>>
>>>
>>> Then, from the byte sequences or messages as byte sequences, or handles
>>> of byte sequences, results that in the protocol, the protocol either way
>>> in/out has a given expected set of alternatives that it can read, then
>>> as of derivative of those what it will write.
>>>
>>> So, after the layers, which are agnostic of anything but byte-sequences,
>>> and their buffers and framing and chunking and so on, then is the
>>> protocol, or protocols, of the command-set and request/response
>>> semantics, and ordering/session statefulness, and lack thereof.
>>>
>>> Then, a particular machine in the flow-machine is as of the "Recognizer"
>>> and "Parser", then what results "Annunciators" and "Legibilizers", as it
>>> were, of what's usually enough called "Deserialization", reading off
>>> from a serial byte-sequence, and "Serialization", writing off to a serial
>>> byte-sequence, first the text of the commands or the structures in these
>>> text-based protocols, the commands and their headers/bodies/payloads,
>>> then the Objects in the object types of the languages of the runtime,
>>> where then the routines of the servicing of the protocol, are defined in
>>> types according to the domain types of the protocol (and their
>>> representations as byte-sequences and handles).
>>>
>>> As packets and bytes arrive in the byte-sequence, the Recognizer/Parser
>>> detects when there's a fully-formed command, and its payload, after the
>>> Mux/Demux Demultiplexer, has that the Demultiplexer represents any given
>>> number of separate byte-sequences, then according to the protocol
>>> with whatever their statefulness/session or orderedness/unorderedness.
>>>
>>> So, the Demultiplexer is to Recognize/Parse from the combined input
>>> byte-stream its chunks, that now the connection, has any number of
>>> ordered/unordered byte-sequences, then usually that those are ephemeral
>>> or come and go, while the connection endures, with the most usual notion
>>> that there's only one stream and it's ordered in requests and ordered in
>>> responses, then whether commands get pipelined and requests need not
>>> await their responses (they're ordered), and whether commands are
>>> numbers and their responses get associated with their command sequence
>>> numbers (they're unordered and the client has its own mux/demux to
>>> relate them).
>>>
>>> So, the Recognizer/Parser, theoretically only gets a byte at a time, or
>>> even none, and may get an entire fully-formed message (command), or not,
>>> and may get more bytes than a fully-formed message, or not, and the
>>> bytes may be a well-formed message, or not, and valid, or not.
>>>
>>> Then the job of the Recognizer/Parser, is from the beginning of the
>>> byte-sequence, to Recognize a fully-formed message, then to create an
>>> instance of the command object related to the handle back through the
>>> mux/demux to the multiplexer, called the attachment to the connection,
>>> or the return address according to the attachment representing any
>>> routed response and usually meaning that the attachment is the user-data
>>> and any session data attached to the connection and here of the
>>> mux/demux of the connection, the job of the Recognizer/Parser is to work
>>> any time input is received, then to recognize and parse any number of
>>> fully-formed messages from the input, create those Commands according to
>>> the protocol, that the attachment includes the return destination, and,
>>> thusly release those buffers or advance the marker on the Input
>>> byte-sequence, so that the resources are freed, and later
>>> Recognizings/Parsing starts where it left off.
>>>
>>> The idea is that bytes arrive, the Recognizer/Parser has to determine
>>> when there's a fully-formed message, consume that and service the
>>> buffers the byte-sequence, having created the derived command.
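>>>
>>> For a line-oriented text protocol like NNTP's command lines, a
>>> minimal sketch of that Recognize/Parse step (the onCommand callback
>>> standing in for creating the Command with its return attachment):
>>>
>>> import java.nio.charset.StandardCharsets;
>>> import java.util.function.Consumer;
>>>
>>> // Sketch: accumulate arriving bytes, emit each fully-formed
>>> // CRLF-terminated command line, keep any partial trailing input.
>>> class LineRecognizer {
>>>     private final StringBuilder pending = new StringBuilder();
>>>     private final Consumer<String> onCommand;
>>>
>>>     LineRecognizer(Consumer<String> onCommand) { this.onCommand = onCommand; }
>>>
>>>     // Called whenever the Inputter/Reader hands over more bytes, any amount.
>>>     void onInput(byte[] bytes, int offset, int length) {
>>>         pending.append(new String(bytes, offset, length, StandardCharsets.US_ASCII));
>>>         int crlf;
>>>         while ((crlf = pending.indexOf("\r\n")) >= 0) {
>>>             String line = pending.substring(0, crlf);  // one fully-formed command
>>>             pending.delete(0, crlf + 2);               // release consumed input
>>>             onCommand.accept(line);
>>>         }
>>>     }
>>> }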
>>>
>>> Now, commands are small, or so few words, then the headers/body/payload,
>>> basically get larger and later unboundedly large. Then, the idea is that
>>> the protocol, has certain modes or sub-protocols, about "switching
>>> protocols", or modes, when basically the service of the routine changes
>>> from recognizing and servicing the beginning to ending of a command, to
>>> recognizing and servicing an arbitrarily large payload, or, for example,
>>> entering a mode where streamed data arrives or whatever sort, then that
>>> according to the length or content of the sub-protocol format, the
>>> Recognizer's job includes that the sub-protocol-streaming, modes, get
>>> into that "sub-protocols" is a sort of "switching protocols", the only
>>> idea though being going into the sub-protocol then back out to the main
>>> protocol, while "switching protocols" is involved in basically any the
>>> establishment or upgrade of the protocol, with regards to the stateful
>>> connection (and not stateless messages, which always are according to
>>> their established or simply some fixed protocol).
>>>
>>> This way unboundedly large inputs, don't actually live in the buffers of
>>> the Recognizers that service the buffers of the Inputters/Readers and
>>> Multiplexers/Demultiplexers, instead define modes where they will be
>>> streaming through arbitrarily large payloads.
>>>
>>> Here for NNTP and so on, the payloads are not considered arbitrarily
>>> large, though, it's sort of a thing that sending or receiving the
>>> payload of each message, can be defined this way so that in very, very
>>> limited resources of buffers, that the flow-machine keeps flowing.
>>>
>>>
>>> Then, here, the idea is that these commands and their payloads, have
>>> their outputs that are derived as a function of the inputs. It's
>>> abstractly however this so occurs is the way it is. The idea here is
>>> that the attachment+command+payload makes a re-routine task, and is
>>> pushed onto a task queue (TQ). Then it's figured that the TQ represents
>>> abstractly the execution of all the commands. Then, however many Task
>>> Workers or TW, or the TQ that runs itself, get the oldest task from the
>>> queue (FIFO) and run it. When it's complete, then there's a response
>>> ready in byte-sequences or handles, these are returned to the
>>> attachment.
>>>
>>> (The "attachment" usually just means a user or private datum associated
>>> with the connection to identify its session with the connection
>>> according to non-blocking I/O, here it also means the mux/demux
>>> "remultiplexer" attachment, it's the destination of any response
>>> associated with a stream of commands over the connection.)
>>>
>>> So, here then the TQ basically has the idea of the re-routine, that is
>>> non-blocking and involves the asynchronous fulfillment of the routine in
>>> the domain types of the domain of object types that the protocol adapts
>>> as an adapter, that the domain types fulfill as adapted. Then for NNTP
>>> that's like groups and messages and summaries and such, the objects. For
>>> IMAP it's mailboxes and messages to read, for SMTP it's emails to send,
>>> with various protocols in SMTP being separate protocols like DKIM or
>>> what, for all these sorts protocols. For HTTP and HTTP/2 it's usual HTTP
>>> verbs, usually HTTP 1.1 serial and pipelined requests over a connection,
>>> in HTTP/2 multiplexed requests over a connection. Then "session" means
>>> broadly that it may be across connections, what gets into the attachment
>>> and the establishment and upgrade of protocol, that sessions are
>>> stateful thusly, yet granularly, as to connections yet as to each
>>> request.
>>>
>>>
>>> Then, the same sort of thing is the same sort of thing to back-end,
>>> whatever makes for adapters, to domain types, that have their protocols,
>>> and what results the O/I side to the I/O side, that the I/O side is the
>>> server's client-facing side, while the O/I side is the
>>> server-as-a-client-to-the-backend's, side.
>>>
>>> Then, the O/I side is just the same sort of idea that in the
>>> flow-machine, the protocols get established in their layers, so that all
>>> through the routine, then the domain type are to get specialized to when
>>> byte-sequences and handles are known well-formed in compatible
>>> protocols, that the domain and protocol come together in their
>>> definition, basically so it results that from the back-end is retrieved
>>> for messages by their message-ID that are stored compressed at rest, to
>>> result passing back handles to those, for example a memory-map range
>>> offset to an open handle of a zip file that has the concatenable entry
>>> of the message-Id from the groups' day's messages, or a list of those
>>> for a range of messages, then the re-routine results passing the handles
>>> back out to the attachment, which sends them right out.
>>>
>>> So, this way there's that besides the TQ and its TW's, that those are to
>>> never block or be long-running, that anything that's long-running is on
>>> the O/I side, and has its own resources, buffers, and so on, where of
>>> course all the resources here of this flow-machine are shared by all the
>>> flow-machines in the flow-machine, in the sense that they are not shared
>>> yet come from a common resource altogether, and are exclusive. (This
>>> gets into the definition of "share" as with regards to "free to share,
>>> or copy" and "exclusive to share, a.k.a. taking turns, not cutting in
>>> line, and not stealing nor hoarding".)
>>>
>>>
>>> Then on the O/I side or the backend side, it's figured the backend is
>>> any kind of adapters, like DB adapters or FS adapters or WS adapters,
>>> database or filesystem or webservice, where object-stores are considered
>>> filesystem adapters. What that gets into is "pools" like client pools,
>>> connection pools, resource pools, that a pool is usually enough
>>> according to a session and the establishment of protocol, then with
>>> regards to servicing the adapter and according to the protocol and the
>>> domain objects that thusly implement the protocol, the backend side has
>>> its own dedicated routines and TW's, or threads of execution, with
>>> regards to that the backend side basically gets a callback+request and
>>> the job is to invoke the adapter with the request, and invoke the
>>> callback with the response, then whether for example the callback is
>>> actually the original attachment, or it involves "bridging the unbounded
>>> sub-protocol", what it means for the adapter to service the command.
>>>
>>> Then the adapter is usually either provided as with intermediate or
>>> domain types, or, for example it's just another protocol flow machine
>>> and according to the connections or messaging or mux/demux or
>>> establishing and upgrading layers and protocols, it basically works the
>>> same way as above in reverse.
>>>
>>> Here "to service" is the usual infinitive that for the noun means "this
>>> machine provides a service" yet as a verb that service means to operate
>>> according to the defined behavior of the machine in the resources of the
>>> machine to meet the resource needs of the machine's actions in the
>>> capabilities and limits of the resources of the machine, where this "I/O
>>> flow-machine: a service" is basically one "node" or "process" in a usual
>>> process model, allocated its own quota of resources according to the
>>> process and its environment model in the runtime in the system, and
>>> that's it. So, there's servicing as the main routine, then also what it
>>> means the maintenance servicing or service of the extended routine.
>>> Then, for protocols it's "implement this protocol according to its
>>> standards according to the resources in routine".
>>>
>>>
>>> You know, I don't know where they have one of these anywhere, ....
>>>
>>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> So, besides attachment+command+payload, also is for indicating the
>> protocol and layers, where it can be inferred for the response, when the
>> callback exists or as the streaming sub-protocol starts|continues|ends,
>> what the response can be, in terms of domain objects, or handles, or
>> byte sequences, in terms of domain objects that can result handles to
>> transfer or byte-sequences to read or write,
>> attachment+command+payload+protocols "ACPP" data structure.
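>>
>> As a sketch, the ACPP is then just a plain holder (the field types
>> here are placeholders, as above, not a settled API):
>>
>> class ACPP {
>>     Object attachment;  // return/remultiplexer attachment (session, user data)
>>     String command;     // the recognized command in the protocol
>>     byte[] payload;     // or handles/byte-sequences, when large or side-loaded
>>     String protocols;   // established protocol/layers, to infer the response
>> }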
>>
>> Another idea that seems pretty usual, is when the payload is off to the
>> side, about picking up the payload when the request arrives, about when
>> the command, in the protocol, involves that the request payload, is off
>> to the side, to side-load the payload, where usually it means the
>> payload is large, or bigger than the limits of the request size limit in
>> the protocol, it sort of seems a good idea, to indicate for the
>> protocol, whether it can resolve resource references, "external", then
>> that accessing them as off to the side happens before ingesting the
>> command or as whether it's the intent to reference the external
>> resource, and when, when the external resource off to the side, "is",
>> part of the request payload, or otherwise that it's just part of the
>> routine.
>>
>> That though would get into when the side effect of the routine, is to
>> result the external reference or call, that it's figured that would all
>> be part of the routine. It depends on the protocol, and whether the
>> payload "is" fully-formed, with or without the external reference.
>>
>>
>> Then HTTP/2 and Websockets have plenty going on about the multiplexer,
>> where it's figured that multiplexed attachments, or "remultiplexer
>> attachment", RMA, out from the demultiplexer and back through the
>> multiplexer, have then that's another sort of protocol machine, in terms
>> of the layers, and about whether there's a thread or not that
>> multiplexing requires any sort of state on otherwise the connections'
>> attachment, that all the state of the multiplexer is figured lives in a
>> data structure on the actual attachment, while the logic should be
>> re-entrant and just a usual module for the protocol(s).
>>
>> It's figured then that the attachment is a key, with respect to a key
>> number for the attachment, then that in the multiplexing or muxing
>> protocols, there's a serial number of the request or command. There's a
>> usual idea to have serial numbers for commands besides, for each
>> connection, and then even serial numbers for commands for the lifetime
>> of the runtime. Then it's the usual metric of success or the error rate
>> how many of those are successes and how many are failures, that
>> otherwise the machine is pretty agnostic that being in the protocol.
>>
>> Timeouts and cancels are sort of figured to be attached to the monad and
>> the re-routine. It's figured that for any command in the protocol, it
>> has a timeout. When a command is received, is when the timeout countdown
>> starts, abstractly wall-clock time or system time. So, the ACPP has also
>> the timeout time, so, the task T has an ACPP
>> attachment-command-payload-protocol and a routine or reroutine R or RR.
>> Then also it has some metrics M or MT, here start time and expiry time,
>> and the serial numbers. So, how timeouts work is that when T is to be
>> picked up to a TW, first TW checks whether M.time is past expiry, then
>> if so it cancels the monad and results returning howsoever in the
>> protocol the timeout. If not what's figured is that before the
>> re-routine runs through, it just tosses T back on the TQ anyway, so that
>> then whenever it comes up again, it's just checked again until such time
>> as the task T actually completed, or it expires, or it was canceled, or
>> otherwise concluded, according to the combination of the monad of the
>> R/RR, and M.time, and system time. Now, this seems bad, because an
>> otherwise empty queue, would constantly be thrashing, so it's bad. Then,
>> what's to be figured is some sort of parameter, "toss when", that then
>> though would have timeout priority queues, or buckets of sorts with
>> regards to tossing all the tasks T back on the TQ for no other reason
>> than to check their timeout.
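>>
>> A sketch of that expiry check when a TW picks a task up (illustrative
>> names, wall-clock millis standing in for M.time):
>>
>> // Sketch: the check a task worker makes when it picks a task T up,
>> // before (re-)entering the re-routine.
>> class TimeoutCheck {
>>     static class Task {
>>         long expiryMillis;          // M.time: start time plus timeout
>>         volatile boolean canceled;
>>     }
>>
>>     // Returns true when the task is still live and the re-routine may run.
>>     static boolean pickUp(Task t) {
>>         if (t.canceled) {
>>             return false;  // already concluded, just drop it
>>         }
>>         if (System.currentTimeMillis() > t.expiryMillis) {
>>             t.canceled = true;  // cancel the monad
>>             return false;       // result the timeout error in the protocol
>>         }
>>         return true;  // run the re-routine; a pending quit re-queues the task
>>     }
>> }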
>>
>> It's figured that the monad of the re-routine is all the heap objects
>> and references to handles of the outstanding command. So, when the
>> re-routine is completed/canceled/concluded, then all the resources of
>> the monad should be freed. Then it's figured that any routine to access
>> the monad is re-entrant, and so that it results that access to the monad
>> is atomic, to build the graph of memos in the monad, then that access to
>> each memo is atomic as after access to the monad itself, so that the
>> access to the monad is thread-safe (and to be non-blocking, where the
>> only thing that happens to the monad is adding re-routine paths, and
>> getting and setting values of object values and handles, then releasing
>> all of it [, after picking out otherwise the result]).
>>
>> So it's figured that if there's a sort of sweeper or closer being the
>> usual idea of timeouts, then also in the case that for whatever reason
>> the asynchronous backend fails, to get a success or error result and
>> callback, so that the task T
>>
>> T{
>> RMA attachment; // return/remultiplexer attachment
>> PCP command; // protocol command/payload
>> RR routine; // routine / re-routine (monad)
>> MT metrics; // metrics/time
>> }
>>
>> has that timeouts, are of a sort of granularity. So, it's not so much
>> that timeouts need to be delivered at a given exact time, as delivered
>> within a given duration of time. The idea is that timeouts both call a
>> cancel on the routine and result an error in the protocol. (Connection
>> and socket timeouts or connection drops or closures and so on, should
>> also result cancels as some form of conclusion cleans up the monad's
>> resources.)
>>
>> There's also that timeouts are irrelevant after conclusion, yet if
>> there's a task queue of timeouts, not to do any work fishing them out,
>> just letting them expire. Yet, given that timeouts are usually much
>> longer than actual execution times, there's no point keeping them around.
>>
>> Then it's figured each routine and sub-routine, has its timing, then
>> it's figured to have that the RR and MT both have the time, then as with
>> regards to, the RR and MT both having a monad, then whether it's the
>> same monad what it's figured, is what it's figured.
>>
>> TASK {
>> RMA attachment; // return/remultiplexer attachment
>> PCP command; // protocol command/payload
>> RRMT routine; // routine / re-routine, metrics / time (monad)
>> }
>>
>> Then it's figured that any sub-routine checks the timeout overall, and
>> the timeouts up the re-routine, and the timeout of the task, resulting a
>> cancel in any timeout, then basically to push that on the back of the
>> task queue or LIFO last-in-first-out, which seems a bad idea, though
>> that it's to expeditiously return an error and release the resources,
>> and cancel any outstanding requests.
>>
>> So, any time a task is touched, there's checking the attachment whether
>> it's dropped, checking the routine whether it's canceled, with the goal
>> of that it's all cleaned up to free the resources, and to close any
>> handles opened in the course of building the monad of the routine's
>> results.
>>
>> Otherwise while a command is outstanding there's not much to be done
>> about it, it's either outstanding and not started or outstanding and
>> started, until it concludes and there's a return, the idea being that
>> the attachment can drop at any time and that would be according to the
>> Inputter/Reader or Recognizer/Parser (an ill-formed command results
>> either an error or a drop), the routine can conclude at any time either
>> completing or being canceled, then that whether any handles are open in
>> the payload, is that a drop in the attachment, disconnect in the
>> [streaming] command, or cancel in the routine, ends each of the three,
>> each of those two, or that one.
>>
>> (This is that the command when 'streaming sub-protocol' results a bunch
>> of commands in a sub-protocol that's one command in the protocol.)
>>
>> The idea is that the RMA is only enough detail to relate to the current
>> state in the attachment of the remultiplexing, the command is enough
>> state to describe its command and payload and with regards to what
>> protocol it is and what sub-protocols it entered and what protocol it
>> returns to, and the routine is the monad of the entire state of the
>> routine, either value objects or open handles, to keep track of all the
>> things according to these things.
>>
>> So, still it's not quite clear how to have the timeout in the case that
>> the backend hangs, or drops, or otherwise that there's no response from
>> the adapter, what's a timeout. This sort of introduces re-try logic to
>> go along with time-out logic.
>>
>> The re-try logic, involves that anything can fail, and some things can
>> be re-tried when they fail. The re-try logic would be part of the
>> routine or re-routine, figuring that any re-tries still have to live in
>> the time of the command. Then re-tries are kind of like time-outs, it's
>> usual that it's not just hammering the re-tries, yet a usual sort of
>> back-off and retry-count, or retry strategy, and then whether that it
>> involves that it should be a new adapter handle from the pool, about
>> that adapter handles from the pool should be round-robin and when there
>> are retry-able errors that usually means the adapter connection is
>> un-usable, that getting a new adapter connection will get a new one and
>> whether retry-able errors plainly enough indicate to recycle the adapter
>> pool.
>>
>> Then, retry-logic also involves resource-down, what's called
>> circuit-breaker when the resource is down that it's figured that it's
>> down until it's back up. [It's figured that errors by default are _not_
>> retry-able, and, then as about the resource-health or
>> backend-availability, what gets involved in a model of critical
>> resource-recycling and backend-health.]
>>
>>
>> About server-push, there's an idea that it involves the remultiplexer
>> and that the routine, according to the protocol, synthesizes tasks and
>> is involved with the remultiplexer, to result it makes tasks then that
>> run like usual tasks. [This is part of the idea also of the mux or
>> remux, about 1:many commands/responses, and usually enough their
>> serials, and then, with regards to "opportunistic server push", how to
>> drop the commands that follow that would otherwise request the
>> resources. HTTP/2 server-push looks deprecated, while then there's
>> WebSocket, which basically makes for a different sort of use-case
>> peer-peer than client-server. For IMAP is the idea that when there are
>> multiple responses to single commands then that's basically in the
>> mux/remux. For pipelined commands and also for serial commands is the
>> mux/remux. The pipelined commands would result state building in the
>> mux/remux when they're returned disordered, with regards to results and
>> the handles, and 'TCB' or 'TW' driving response results.]
>>
>>
>> So, how to implement timeout or the sweeper/closer, has for example that
>> a connection drop, should cancel all the outstanding tasks for that
>> connection. For example, undefined behavior of whatever sort results a
>> missed callback, should eventually timeout and cancel the task, or all
>> the tasks instances in the TQ for that task. (It's fair enough to just
>> mark the monads of the attachment or routine as canceled, then they'll
>> just get immediately discarded when they come up in the TQ.) There's no
>> point having timeouts in the task queue because they'd either get
>> invoked for nothing or get added to the task queue long after the task
>> usually completes. (It's figured that most timeouts are loose timeouts
>> and most tasks complete in much under their timeout, yet here it's
>> automatic that timeouts are granular to each step of the re-routine, in
>> terms of the re-routine erroring-out if a sub-routine times-out.)
>>
>>
>> The Recognizer/Parser (Commander) is otherwise stateless, the
>> Inputter/Reader and its Remultiplexer Attachment don't know what results
>> Tasks, the Task Queue will run (and here non-blockingly) any Task's
>> associated routine/re-reroutine, and catch timeouts in the execution of
>> the re-routine, the idea is that the sweeper/closer basically would only
>> result having anything to do when there's undefined behavior in the
>> re-routine, or bugs, or backend timeouts, then whether calls to the
>> adapter would have the timeout-task-lessors or "TTL's", in its task
>> queue, point being that when there's nothing going on that the entire
>> thing is essentially _idle_, with the Inputter/Reader blocked on select
>> on the I/O side, the Outputter/Writer or Backend Adapter sent on the O/I
>> side, the Inputter/Reader blocked on the O/I side, the TQ's empty (of,
>> the protocol, and, the backend adapters), and it's all just pending
>> input from the I/O or O/I side, to cascade the callbacks back to idle,
>> again.
>>
>> I.e. there shouldn't be timeout tasks in the TQ, because, at low load,
>> they would just thrash and waste cycles, and at high load, would arrive
>> late. Yet, it is so that there is formal un-reliability of the routines,
>> and, formal un-reliability of the O/I side or backend, [and formal
>> un-reliability of connections or drops,] so some sweeper/closer checks
>> outstanding commands what should result canceling the command and its
>> routines, then as with regards to the backend adapter, recycling or
>> teardown the backend adapter, to set it up again.
>>
>> Then the idea is that, Tasks, well enough represent the outstanding
>> commands, yet there's not to be maintaining a task set next to the task
>> queue, because it would use more space and maintenance in time than the
>> queue itself, while multiple instances of the same Task can be in the
>> Task queue as point each to the state of the monad in the re-routine,
>> then gets into whether it's so, that, there is a task-set next to the
>> task-queue, then that concluding the task removes it from the set, while
>> the sweeper/closer just is scheduled to run periodically through the
>> entire task-set and cancel those expired, or dropped.
>>
>> Then, having both a task-set TS and task-queue TQ, maybe seems the thing
>> to do, where, it should be sort of rotating, because, the task-queue is
>> FIFO, while the task-set is just a set (a concurrent set, though as with
>> regards to that the tasks can only be marked canceled, and resubmitted
>> to the task queue, with regards to that the only action that removes
>> tasks from the task-set is for the task-queue to result them being
>> concluded, then that whatever task gets tossed on the task queue is to
>> be inserted into the task-set).
>>
>> Then the task-set TS would be on the order of outstanding tasks, while,
>> the task-queue TQ would be on the order of outstanding tasks'
>> re-routines.
>>
>> Then the usual idea of sweeper/closer is to iterate through a view of
>> the TS, check each task whether its attachment dropped or command or
>> routine timed-out or canceled, then if dropped or canceled, to toss it
>> on the TQ, which would eventually result canceling if not already
>> canceled and dropping if dropped.
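>>
>> A sketch of that task-set next to the task-queue, with a periodically
>> scheduled sweeper that only tosses the expired or dropped back onto
>> the TQ (illustrative names):
>>
>> import java.util.Set;
>> import java.util.concurrent.*;
>>
>> // Sketch: TS is a concurrent set of outstanding tasks, TQ the
>> // re-routine queue; the sweeper runs periodically and only marks
>> // and tosses the expired or dropped.
>> class SweeperSketch {
>>     static class Task {
>>         long expiryMillis;
>>         volatile boolean canceled;
>>         volatile boolean dropped;  // e.g. the connection's attachment dropped
>>     }
>>
>>     final Set<Task> ts = ConcurrentHashMap.newKeySet();
>>     final BlockingQueue<Task> tq = new LinkedBlockingQueue<>();
>>     final ScheduledExecutorService tz = Executors.newSingleThreadScheduledExecutor();
>>
>>     void start(long periodMillis) {
>>         tz.scheduleAtFixedRate(() -> {
>>             long now = System.currentTimeMillis();
>>             for (Task t : ts) {  // a weakly-consistent view of the set
>>                 if (t.dropped || t.canceled || now > t.expiryMillis) {
>>                     t.canceled = true;
>>                     tq.offer(t);  // it concludes there and leaves the TS
>>                 }
>>             }
>>         }, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
>>     }
>>
>>     void conclude(Task t) { ts.remove(t); }  // the only removal from the TS
>> }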
>>
>> (Canceling/Cancelling.)
>>
>> Most of the memory would be in the monads, also the open or live handles
>> would be in the routine's monads, with the idea being that when the task
>> concludes, then the results, that go out through the remultiplexer,
>> should be part of the task.
>>
>> TASK {
>> RMA attachment; // return/remultiplexer attachment
>> PCP command; // protocol command/payload
>> RRMT routine; // routine / re-routine, metrics / time (monad)
>> RSLT result; // result (monad)
>> }
>>
>> It's figured that the routine _returns_ a result, which is either a
>> serializable value or otherwise it's according to the protocol, or it's
>> a live handle or specification of handle, or it has an error/exception
>> that is expected to be according to the protocol, or that there was an
>> error then whether it results a drop according to the protocol. So, when
>> the routine and task concludes, then the routine and metrics monads can
>> be released, or de-allocated or deleted, while what live handles they
>> have, are to be passed back as expeditiously as possible to the
>> remultiplexer to be written to the output as on the wire the protocol,
>> so that the live handles can be closed or their reference counts
>> decremented or otherwise released to the handle pool, of a sort, which
>> is yet sort of undefined.
>>
>> The result RSLT isn't really part of the task, once the task is
>> concluding, the RRMT goes right to the RMA according to the PCP, that
>> being the atomic operation of concluding the task, and deleting it from
>> the task-set. (It's figured that outstanding callbacks unaware their
>> cancel, of the re-routines, basically don't toss the task back onto the
>> TQ if they're canceled, that if they do, it would just sort of
>> spuriously add it back to the task-set, which would result it being
>> swept out eventually.)
>>
>> TASK {
>> RMA attachment; // return/remultiplexer attachment
>> PCP command; // protocol command/payload
>> RRMT routine; // routine / re-routine, metrics / time (monad, live
>> handles)
>> }
>>
>> TQ // task queue
>> TS // task set
>>
>> TW // task-queue worker thread, latch on TQ
>> TZ // task-set cleanup thread, scheduled about timeouts
>>
>> Then, about what threads run the callbacks, is to get figured out.
>>
>> TCF // thread call forward
>> TCB // thread call back
>>
>> It's sort of figured that calling forward, is into the adapters and
>> backend, and calling back, is out of the result to the remultiplexer and
>> running the remultiplexer also. This is that the task-worker thread
>> invokes the re-routines, and the re-routine callbacks, are pretty much
>> called by the backend or TCF, because all they do is toss back onto the
>> TQ, so that the TW runs the re-routines, the TCF is involved in the O/I
>> side and the backend adapter, and what reserves live handles, while the
>> TCB returns the results through the I/O side, and what recycles live
>> handles.
>>
>> Then it's sort of figured that the TCF result thread groups or whatever
>> otherwise results whatever blocks and so on howsoever it is that the
>> backend adapter is implemented, while TCB is pretty much a single
>> thread, because it's driving I/O back out through all the open
>> connections, or that it describes thread groups back out the I/O side.
>> ("TCB" not to be confused with "thread control block".)
>>
>>
>> Nonblocking I/O, and, Asynchronous I/O
>>
>> One thing I'm not too sure about is the limits of the read and write of
>> the non-blocking I/O. What I figure is that mostly buffers throughout
>> are 4KiB buffers from a free-list, which is the usual idea of reserving
>> buffers and getting them off a free-list and returning them when done.
>> Then, I sort of figure that the reader, gets about a 1MiB buffer for
>> itself, with the idea being, that the Inputter when there is data off
>> the wire, reads it into 1MiB buffer, then copies that off to 4KiB
>> buffers.
>>
>> BFL // buffer free-list, 1
>> BIR // buffer of the inputter/reader, 1
>> B4K // buffer of 4KiB size, many
>>
>> What I figure that BIR is "direct memory" as much as possible, for DMA
>> where native, while, figuring that pretty much it's buffers on the heap,
>> fixed-size buffers of small enough size to usually not be mostly sparse,
>> while not so small that usual larger messages aren't a ton of them, then
>> with regards to the semantics of offsets and extents in the buffers and
>> buffer lists, and atomic consumption of the front of the list and atomic
>> concatenation to the back of the list, or queue, and about the
>> "monohydra" or "slique" data structure defined way above in this thread.
>>
>> Then about writing is another thing, I figure that a given number of
>> 4KiB buffers will write out, then no longer be non-blocking while
>> draining, about the non-blocking I/O, that read is usually non-blocking
>> because if nothing is available then nothing gets copied, while write
>> may be blocking because the UART or what it is remains to drain to write
>> more in.
>>
>> I'm not even sure about O_NONBLOCK, aio_read/aio_write, and overlapped
>> I/O.
>>
>> Then it looks like O_NONBLOCKING with select and asynchronous I/O the
>> aio or overlapped I/O, sort of have different approaches.
>>
>> I figure to use non-blocking select, then, the selector for the channel
>> at least in Java, has both read and write interest, or all interest,
>> with regards to there only being one selector key per channel (socket).
>> The issue with this is that there's basically that the Inputter/Reader
>> and Outputter/Writer are all one thread. So, it's figured that reads
>> would read about a megabyte at a time, then round-robin all the ready
>> reads and writes, that for each non-blocking read, it reads as much as a
>> megabyte into the one buffer there, copies the read bytes appending it
>> into the buffer array in front of the remux Input for the attachment,
>> tries to write as many as possible for the buffer array for the write
>> output in front of the remux Output for the attachment, then proceeds
>> round-robin through the selector keys. (That each of those is
>> non-blocking on the read/write a.k.a. recv/send then copying from the
>> read buffer into application buffers is according to as fast as it can
>> fill a free-list given list of buffers, though that any might get
>> nothing done.)
>>
>> One of the issues is that the selector keys get waked up for read, when
>> there is any input, and for write, when the output has any writeable
>> space, yet, there's no reason to service the write keys when there is
>> nothing to write, and nothing to read from the read keys when nothing to
>> read.
>>
>> So, it's figured the read keys are always of interest, yet if the write
>> keys are of interest, mostly it's only one or the other. So I'd figure
>> to have separate read and write selectors, yet, it's suggested they must
>> go together the channel the operations of interest, then whether the
>> idea is "round-robin write then round-robin read", because all the
>> selector keys would always be waking up for writing nothing when the way
>> is clear, for nothing.
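>>
>> One usual way to avoid waking up for writes with nothing to write, in
>> Java NIO terms, is to keep OP_READ registered always and toggle
>> OP_WRITE on only while there is queued output (a sketch):
>>
>> import java.nio.channels.SelectionKey;
>>
>> // Sketch: read interest is permanent; write interest is toggled on
>> // when output queues up, and off again once it has drained.
>> class InterestOps {
>>
>>     static void queueOutput(SelectionKey key) {
>>         key.interestOps(key.interestOps() | SelectionKey.OP_WRITE);
>>         key.selector().wakeup();  // so the select loop notices the new interest
>>     }
>>
>>     static void outputDrained(SelectionKey key) {
>>         key.interestOps(key.interestOps() & ~SelectionKey.OP_WRITE);
>>     }
>> }
>>
>> That keeps the one selector servicing all the read keys, while write
>> keys only show up when there's actually something queued to write.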
>>
>> Then besides non-blocking I/O is asynchronous I/O, where, mostly the
>> idea is that the completion handler results about the same, ..., where
>> the completion handler is usually enough "copy the data out to read,
>> repeat", or just "atomic append more to write, repeat", with though
>> whether that results that each connection needs its own read buffers, in
>> terms of asynchronous I/O, not saying in what order or whether
>> completion handlers, completion ports or completion handlers, would for
>> reading each need their own buffer. I.e., to scale to unbounded many
>> connections, the idea is to use constant size resources, because
>> anything linear would grow unbounded. That what's to write is still all
>> these buffers of data and how to "deduplicate the backend" still has
>> that the heap fills up with tasks, that the other great hope is that the
>> resulting runtime naturally rate-limits itself, by what resources it
>> has, heap.
>>
>> About "live handles" is the sort of hope that "well when it gets to the
>> writing the I/O, figuring to transfer an entire file, pass it an open
>> handle", is starting to seem a bad idea, mostly for not keeping handles
>> open while not actively reading and writing from them, and that mostly
>> for the usual backend though that does have a file-system or
>> object-store representation, how to result that results a sort of
>> streaming sub-protocol routine, about fetching ranges of the objects or
>> otherwise that the idea is that the backend file is a zip file, with
>> that the results are buffers of data ready to write, or handles, to
>> concatenate the compressed sections that happen to be just ranges in the
>> file, compressed, with concatenating them together about the internals
>> of zip file format, the data at rest. I.e. the idea is that handles are
>> sides of a pipe then to transfer the handle as readable to the output
>> side of the pipe as writeable.
>>
>> It seems though for various runtimes, that both a sort of "classic
>> O_NONBLOCKING" and "async I/O in callbacks" organizations, can be about
>> same, figuring that whenever there's a read that it drives the Layers
>> then the Recognizer/Parser (the remux if any and then the
>> command/payload parser), and the Layers, and if there's anything to
>> write then the usual routine is to send it and release to recycle any
>> buffers, or close the handles, as their contents are sent.
>>
>> It's figured to marshal whatever there is to write as buffers, while,
>> the idea of handles results being more on the asynchronous I/O on the
>> backend when it's filesystem. Otherwise it would get involved partially
>> written handles, though there's definitely something to be said for an
>> open handle to an unbounded file, and writing that out without breaking
>> it into a streaming-sub-protocol or not having it on the heap.
>>
>> "Use nonblocking mode for this operation; that is, this call to preadv2
>> will fail and set errno to EAGAIN if the operation would block. "
>>
>> The goal is mostly being entirely non-blocking, then with that the
>> atomic consume/concatenate of buffers makes for "don't touch the buffers
>> while their I/O is outstanding or imminent", then that what services I/O
>> only consumes and concatenates, while getting from the free-list or
>> returning to the free-list, what it concatenates or consumes. [It's
>> figured to have buffers of 4KiB or 512KiB size, the inputter gets a 1MiB
>> direct buffer, that RAM is a very scarce resource.]
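>>
>> (A minimal sketch of such a free-list, assuming fixed 4KiB buffers and an
>> illustrative class name; the rule being that a buffer only goes back on
>> the free-list after its I/O is complete, never while it's outstanding or
>> imminent.)
>>
>>     // assumes: import java.nio.ByteBuffer;
>>     // import java.util.concurrent.ConcurrentLinkedQueue;
>>     final class BufferFreeList {
>>         private static final int BUF_SIZE = 4 * 1024;
>>         private final ConcurrentLinkedQueue<ByteBuffer> free =
>>             new ConcurrentLinkedQueue<>();
>>
>>         ByteBuffer acquire() {
>>             ByteBuffer b = free.poll();
>>             return (b != null) ? b : ByteBuffer.allocate(BUF_SIZE);
>>         }
>>
>>         void release(ByteBuffer b) {
>>             b.clear();    // only after its I/O is complete
>>             free.offer(b);
>>         }
>>     }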
>>
>> So, for the non-blocking I/O, I'm trying to figure out how to service
>> the ready reads, while, only servicing ready writes that also have
>> something to write. Then I don't much worry about it because ready
>> writes with nothing to write would result a no-op. Then, about the
>> asynchronous I/O, is that there would always be an outstanding or
>> imminent completion result for the ready read, or that, I'm not sure how
>> to make it so that reads are not making busy-work, while, it seems clear
>> that writes are driven by there being something to write, then though
>> not wanting those to hammer when the output buffer is full. In this
>> sense the non-blocking vector I/O with select/epoll/kqueue or what, uses
>> less resources for services that have various levels of load,
>> day-over-day.
>>
>>
>> https://hackage.haskell.org/package/machines
>> https://clojure.org/reference/transducers
>> https://chamibuddhika.wordpress.com/2012/08/11/io-demystified/
>>
>>
>> With non-blocking I/O, or at least in Java, the attachment, is attached
>> to the selection key, so, they're just round-robin'ed. In asynchronous
>> (aio on POSIX or overlapped I/O on Windows respectively), in Java the
>> completion event gets the attachment, but doesn't really say how to
>> invoke the async send/recv again, and I don't want to maintain a map of
>> attachments and connections, though it would be alright if that's the
>> way of things.
>>
>> Then it sort of seems like "non-blocking for read, or drops, async I/O
>> for writes". Yet, for example in Java, a SocketChannel is a
>> SelectableChannel, while, an AsyncSocketChannel, is not a
>> SelectableChannel.
>>
>> Then, it seems pretty clear that while on Windows, one might want to
>> employ the aio model, because it's built into Windows, then as for the
>> sort of followup guarantees, or best when on Windows, that otherwise the
>> most usual approach is "O_NONBLOCKING" for the socket fd and the fd_set.
>>
>> Then, what select seems to guarantee, is, that, operations of interest,
>> _going to ready_, get updated, it doesn't say anything about going to
>> un-ready. Reads start un-ready and writes start ready, then that the
>> idea is that select results updating readiness, but not unreadiness.
>> Then the usual selector implementation, for the selection keys, and the
>> registered keys and the selected keys, for the interest ops (here only
>> read and write yet also connect when drops fall out of it) and ready ops.
>>
>> Yet, it doesn't seem to really claim to guarantee, that while working
>> with a view of the selection keys, that if selection keys are removed
>> because they're read-unready (nothing to do) or nothing-to-write
>> (nothing to do), one worries that the next select round has to have
>> marked any read-ready, while, it's figured that any something-to-write,
>> should add the corresponding key back to the selection keys. (There's
>> for that if the write buffer is full, it would just return 0 I suppose,
>> yet not wanting to hammer/thrash/churn instead just write when ready.)
>>
>> So I want to establish that there can be more than one selector,
>> because, otherwise I suppose that the Inputter/Reader (now also
>> Outputter/Writer) wants read keys that update to ready, and write keys
>> that update to ready, yet not write keys that have nothing-to-do, when
>> they're all ready when they have nothing-to-do. Yet, it seems pretty
>> much that they all go through one function, like WSPSelect on Windows.
>>
>> I suppose there's setting the interest ops of the key, according to
>> whether there's something to write, figuring there's always something to
>> read, yet when there is something to write, would involve finding the
>> key and setting its write-interest again. I don't figure that any kind
>> of changing the selector keys themselves is any kind of good idea at
>> all, but I only want to deal with the keys that get activity.
>>
>> Also there's an idea that read() or write() might return -1 and set
>> EAGAIN in the POSIX thread local error number, yet for example in the
>> Java implementation it's to be avoided altogether calling the unready as
>> they only return >0 or throw an otherwise ambiguous exception.
>>
>> So, I'm pretty much of a mind to just invoke select according to 60
>> seconds timeout, then just have the I/O thread service all the selection
>> keys, what way it can sort of discover drops as it goes through then
>> read if readable and write if write-able and timeout according to the
>> protocol if the protocol has a timeout.
>>
>> Yet, it seems instead that one reads or writes until read()
>> or write() returns 0; there is a bit of initialization to figure out,
>> it must be. What it seems is that selection is on all the interest ops, then
>> to unset interest on OP_WRITE, until there is something to write, then
>> to set interest on OP_WRITE on the selector's keys, before entering
>> select, wherein it will populate what's writable, as where it's
>> writable. Yet, there's not removing the key, as it will show up for
>> OP_READ presumably anyways.
>>
>> Anyways it seems that it's alright to have multiple selectors anyways,
>> so having separate read and write selectors seems fine. Then though
>> there's two threads, so both can block in select() at the same time.
>> Then it's figured that the write selector is initialized by deleting the
>> selected-key as it starts by default write-able, and then it's only of
>> interest when it's ever full on writing, so it comes up, there's writes
>> until done and it's deleted, then that continues until there's nothing
>> to do. The reads are pretty simple then and when the selected-keys come
>> up they're read until nothing-to-do, then deleted from selected-keys.
>> [So, the writer thread is mostly only around to finish unfulfilled
>> writes.]
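>>
>> (Sketched in Java, assuming each connection's attachment carries a queue
>> of pending output buffers, names illustrative: OP_WRITE interest is only
>> set while that queue is non-empty, and cleared once it drains, so the
>> write selector mostly has nothing to wake up for.)
>>
>>     // assumes: import java.io.IOException; import java.nio.ByteBuffer;
>>     // import java.nio.channels.*; import java.util.Queue;
>>     static void queueWrite(SelectionKey key, ByteBuffer buf,
>>             Queue<ByteBuffer> pending) {
>>         pending.add(buf);
>>         key.interestOps(key.interestOps() | SelectionKey.OP_WRITE);
>>         key.selector().wakeup();    // so a blocked select() sees the new interest
>>     }
>>
>>     static void onWritable(SelectionKey key, Queue<ByteBuffer> pending)
>>             throws IOException {
>>         SocketChannel ch = (SocketChannel) key.channel();
>>         ByteBuffer head;
>>         while ((head = pending.peek()) != null) {
>>             ch.write(head);
>>             if (head.hasRemaining()) return;   // output full; stay interested
>>             pending.poll();                    // fully written; recycle the buffer
>>         }
>>         key.interestOps(key.interestOps() & ~SelectionKey.OP_WRITE);  // nothing left
>>     }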
>>
>>
>> Remux: Multiplexer/Demultiplexer, Remultiplexer, mux/demux
>>
>> A command might have multiple responses, where it's figured it will
>> result multiple tasks, or a single task, that return to a single
>> attachment's connection. The multiplexer mostly accepts that requests
>> are multiplexed over the connection, so it results that those are
>> ephemeral and that the remux creates remux attachments to the original
>> attachment, involved in any sort of frames/chunks. The compression layer
>> is variously before or after that, then encryption is after that, while
>> some protocols also have encryption of a sort within that.
>>
>> The remux then results that the Recognizer/Parser just gets input, and
>> recognizes frames/chunks their creation, then assembling their contents
>> into commands/payloads. Then it's figured that the commands are
>> independent and just work their way through as tasks and then get
>> chunked/framed as according to the remux, then also as with regards to
>> "streaming sub-protocols with respect to the remux".
>>
>> Pipelined commands basically result a remux, establishing that the
>> responses are written in serial order as were received.
>>
>> It's basically figured that 63 bit or 31 bit serial numbers would be
>> plenty to identify unique requests per connection, and connections and
>> so on, about the lifetime of the routine and a serial number for each
>> thing.
>>
>>
>>
>> IO <-> Selectors <-> Rec/Par <-> Remux <-> Rec/Par <-> TQ/TS <-> backend
>>
>>
>>
>>
>>
>
>
>
>
>
> Well I figure that any kind of server module for the protocol needs the
> client module.
>
> Also it's sort of figured that a client adapter has a similar usual
> approach to the non-blocking I/O to get figured out, as what with
> regards to then usual usage patterns of the API, and expecting to have a
> same sort of model of anything stateful the session, and other issues
> involved with the User-Agent, what with regards to the things how a
> client is, then as with regards to it has how it constructs the commands
> and payloads, with the requests it gets of the commands and partial
> payloads (headers, body, payload), how it's to be a thing.
>
> Also it's figured that there should be a plain old stdin/stdout that
> then connects to one of these things instead of sockets, then also for
> testing and exercising the client/server that it just builds a pair of
> unidirectional pipes either way, these then being selectable channels in
> Java or otherwise the usual idea of making it so that stdin/stdout are a
> connection.
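>
> (For that testing setup, a sketch with java.nio.channels.Pipe, which
> gives selectable source/sink channels without any sockets; the wiring is
> just the obvious one.)
>
>     // assumes: import java.io.IOException; import java.nio.channels.Pipe;
>     static Pipe[] loopback() throws IOException {
>         Pipe clientToServer = Pipe.open();
>         Pipe serverToClient = Pipe.open();
>         // the sources are SelectableChannels, so they register with a
>         // Selector the same as SocketChannels would
>         clientToServer.source().configureBlocking(false);
>         serverToClient.source().configureBlocking(false);
>         // client writes clientToServer.sink(), server reads its source();
>         // server writes serverToClient.sink(), client reads its source()
>         return new Pipe[] { clientToServer, serverToClient };
>     }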
>
> With regards to that then it looks like TLS (1.2, 1.3, maybe 1.1) should
> be figured out first, then a reasonably involved multiplexing, then as
> with regards to something like the QUIC UDP multiplexing, then about how
> that sits with HTTP/2 style otherwise semantics, then as with regards to
> SCTP, and this kind of thing.
>
> I.e., if I'm going to implement QUIC, first it should be SCTP.
>
> The idea of the client in the same context as the server, sort of looks
> simple, it's a connection pool, then as with regards to that usually
> enough, clients call servers not the other way around, and clients send
> commands and receive results and servers receive commands and send
> results. So, it's the O/I side.
>
> It's pretty much figured that on protocols like HTTP 1.1, and otherwise
> usual adapters with one session, there's not much considered about
> sessions that bridge adapter connections, with the usual idea that
> multiple client-side connections might be a session, and anything
> session-stateful is session-stateful anywhere in the back-end fleet,
> where it's figured usually that any old host in the backend picks up any
> old connection from the front-end.
>
> Trying to figure out QUIC ("hi, we think that TCP/IP is ossified because
> we can't just update Chrome and outmode it, and can't be bothered to
> implement SCTP and get other things figured out about unreliable
> datagrams multiplexing a stream's multiplex connection, or changing IP
> addresses"), then it seems adds a sort of selectable-channel abstraction
> in front of it, in terms of anything about it being a session, and all
> the datagrams just arriving at the datagram port. (QUIC also has
> "server-initiated" so it's a peer-to-peer setup not a client-server
> setup.) Then it's figured that anything modeling QUIC (over UDP) should
> be side-by-side SCTP over UDP, and Datagram TLS DTLS.
>
>
> So, TLS is ubiquitous, figuring if anybody actually wants to encrypt
> anything that it's in the application layer, then there's this ALPN to
> get it associated with protocols, or this sort of "no-time for a TLS
> handshake, high-five", TLS is ubiquitous, to first sort of figure out
> TLS, then for the connections, then about the multiplexing, about kinds
> of stateful stream datagram, sessions. ("Man in the middle? Here let me
> NAT your PAC while you're on the phone.")
>
> As part of a protocol, there's either TLS always and it precedes
> entering otherwise the protocol, or, there's STARTTLS, which is figured
> then to hold for the duration barring "switching protocols". It's assumed
> that "streaming-sub-protocols" are "assume/resume" protocol, while
> "switching protocols" is "finish/start".
>
> Then, there's a simple sort of composition of attributes of protocols,
> and profiles of protocols after capabilities and thus profiles of
> protocols in effect.
>
> In Java the usual idea of TLS is called SSLEngine. Yet, SSLEngine is
> sort of organized around blocking calls, or "sitting on the socket". It
> doesn't really have a way to feed it the handshake, get the master key,
> then just encrypt everything following with that. So it's figured that
> as a profile module, it's broken apart a bit the TLS protocol, then
> anything to do with certificates or algorithms is in java.security or
> javax.security anyways. Then AEAD is just a common way to make encrypted
> frames/chunks. It's similar with Zip/Deflate, and that it should be
> first-class anyways because there's a usual idea to use zip files as
> file system representation of compressed, concatenable data at rest,
> for mostly transferring static content as what's the compressed data at
> rest. The idea of partially-or-weakly encrypted data at rest is a good
> dog but won't hunt as above, yet the block-cipher in the usual sense
> should operate non-blockingly on the buffers. Not sure about "TLS Change
> Cipher".
>
> So, TLS has "TLS connection state", it's transport-layer. Then, what
> this introduces is how to characterize or specify frames, or chunks, in
> terms of structs, and alignment, and fixed-length and variable-length
> fields, of the most usual sorts of binary organizations of records, here
> frames or chunks.
>
> https://en.wikipedia.org/wiki/X.690
>
> The ASN.1 encoding, abstract syntax notation, is a very usual way to
> make a bunch of the usual things like for other ITU-T or ITU-X
> specifications, like X.509 the certificates and so on. Then if the
> structure is simple enough, then all the OID's have to get figured out
> as basically the OID's are reserved values and constants according to
> known kinds of contents, and in the wild there are many tens of
> thousands of them, while all that's of interest is a few tens or less,
> all the necessary things for interpreting TLS X.509 certificates and
> this kind of thing. So, this is a usual way to describe the structures
> and nested structures of any organization of data on the wire.
>
> Then, frames and chunks, basically are octets, of a very usual sort of
> structure as "header, and body", or "frame" or "chunk", where a frame
> usually has a trailer, header-body-trailer. The headers and trailers are
> usually fixed length, and one of the fields is the size or length of the
> variable-length body. They're sometimes called blocks.
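>
> (A sketch of recognizing such a frame out of an input buffer, assuming a
> made-up layout of a 1-byte type then a 4-byte big-endian body length then
> the variable body; it returns null until the whole frame has arrived, and
> a real recognizer would also reject bad lengths.)
>
>     // assumes: import java.nio.ByteBuffer;
>     static ByteBuffer tryReadFrame(ByteBuffer in) {
>         if (in.remaining() < 5) return null;        // header not complete yet
>         int mark = in.position();
>         byte type = in.get();                       // frame/chunk type
>         int length = in.getInt();                   // body length, big-endian
>         if (in.remaining() < length) {
>             in.position(mark);                      // body not complete yet
>             return null;
>         }
>         ByteBuffer body = in.slice();
>         body.limit(length);
>         in.position(in.position() + length);
>         return body;                                // a view over the body bytes
>     }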
>
> https://datatracker.ietf.org/doc/html/rfc1951 (Deflate)
>
> Deflate has Huffman codes then some run-length encoding, for most of the
> textual data it's from an alphabet of almost entirely ISO646 or ASCII,
> and there's not much run-length at all, while the alphabets of base32 or
> base64 might be very usual, then otherwise binary data is usually
> already compressed and to be left alone.
> There's basically to be figured if there's word-match for commonly or
> recently used words, in the about 32K window of the Deflate block,
> mostly otherwise about the minimal character sets and its plain sorted
> Huffman table the block. The TLS plaintext blocks are limited to 2^14 or
> about 16K, the Deflate non-compressed blocks are limited to about 64K,
> the compressed blocks don't have length semantics, only
> last-block/end-of-block. The Deflate blocks have a first few bits that
> indicate block/last-block, then there's a reserved code end-of-block.
> The TLS 1.2 with Deflate says that Deflate state has to continue
> TLS-block over TLS block, while, it needn't, for putting Deflate blocks
> in TLS blocks closed, though accepting Deflate blocks over consecutive
> TLS blocks. For email messages it's figured that the header is a block,
> the separator is a block, and the body is a block. For HTTP it's figured
> the header is a Deflate block, the separator is a Deflate block, and the
> body is a Deflate block. The commands and results, it's figured as
> Deflate blocks. This way then they just get concatenated, and are
> self-contained. It's figured that decompression, recognize/parse, copies
> into plaintext, as whatever has arrived, after encryption, block-ciphers
> the block into what's either the TLS 1.2 (not TLS 1.3) or mostly only
> the application protocol has as compression, Deflate. (Data is
> lsb-to-msb, Huffman codes msb-to-lsb. 256 = 0x100 =
> 1_0000_0000b is end-of-block. ) For text data it would seem
> better to reduce the literal range overall, and also to make a Huffman
> table of the characters, which are almost always < 256 many, anyways.
> I.e., Deflate doesn't make a Huffman table of the alphabet of the input,
> and the minimum length of a duplicate-coded word is 3.
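>
> (A sketch of producing such self-contained segments with
> java.util.zip.Deflater: each of header, separator, body is emitted with
> FULL_FLUSH so it ends on a byte boundary and doesn't back-reference the
> previous segment, and only the last segment gets the finish; output
> bounds and error handling are glossed over.)
>
>     // assumes: import java.io.ByteArrayOutputStream;
>     // import java.util.zip.Deflater;
>     static byte[] deflateSegment(Deflater def, byte[] part, boolean last) {
>         def.setInput(part);
>         if (last) def.finish();
>         ByteArrayOutputStream out = new ByteArrayOutputStream();
>         byte[] tmp = new byte[8192];
>         int flush = last ? Deflater.NO_FLUSH : Deflater.FULL_FLUSH;
>         int n;
>         do {
>             n = def.deflate(tmp, 0, tmp.length, flush);
>             out.write(tmp, 0, n);
>         } while (n == tmp.length || (last && !def.finished()));
>         return out.toByteArray();
>     }
>
> (Calling it three times on one Deflater, header then separator then body
> with last=true on the body, gives three byte arrays that concatenate into
> one valid Deflate stream.)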
>
>
> "The Huffman trees for each block are independent
> of those for previous or subsequent blocks; the LZ77 algorithm may
> use a reference to a duplicated string occurring in a previous block,
> up to 32K input bytes before." -
> https://datatracker.ietf.org/doc/html/rfc1951#section-2
>
> Zip file format (2012):
> https://www.loc.gov/preservation/digital/formats/digformatspecs/APPNOTE%2820120901%29_Version_6.3.3.txt
>
>
> https://www.loc.gov/preservation/digital/formats/fdd/fdd000354.shtml
>
> "The second mechanism is the creation of a hidden index file containing
> an array that maps file offsets of the uncompressed file, at every chunk
> size interval, to the corresponding offset in the Deflate compressed
> stream. This index is the structure that allows SOZip-aware readers to
> skip about throughout the file."
>
> - https://github.com/sozip/sozip-spec/blob/master/sozip_specification.md
>
> It's figured that if the zip file has a length check and perhaps a
> checksum attribute for the file, then besides modifications then
>
>
> So, the profiles in the protocols, or capabilities, are variously called
> extensions, about the mode of the protocol, and sub-protocols, or just
> the support of the commands.
>
> Then, there's what's "session", in the connection, and
> "streaming-sub-protocols", then sorts, "chained-sub-protocols" ("command
> sequence"), where streaming is for very large files where chained is for
> sequences of commands so related, for examples SMTP's MAIL-RCPT-DATA and
> MAIL-RCPT-RSET. Then the usual connection overall is a chained protocol,
> from beginning and something like HELO/EHLO to ending and something like
> QUIT. In HTTP for example, there's that besides upgrades which is
> switching, and perhaps streaming-sub-protocols for large files, and
> something like CORS expectations about OPTIONS, there are none, though
> application protocol above it like "Real REST", may have.
>
> Then, in "session", are where the application has from the server any of
> its own initiated events, these results tasks as for the attachment.
>
> The, "serial-sub-protocol" is when otherwise unordered commands, have
> that a command must be issued in order, and also, must be completed
> before the next command, altogether, with regards to the application and
> session.
>
>
>
>
> About driving the inputter/reader, it's figured the Reader thread, TIR,
> both services the input I/O, then also drives the remux remultiplexer
> and also drives the rec/par
> recognizer/parser, and the decryption and the decompression, so that its
> logic is to offload up to a megabyte from each connection, copying that
> into buffers for each connection, then go through each of those, and
> drive their inputs, constructing what's rec/par'ed and releasing the
> buffers. It results it gets a "set" of ready inputs, and there's an idea
> that the order those get served should be randomized, with the idea that
> any connection is as likely as any other to get their commands in first.
>
> Writing the Code
>
> The idea of writing the code is: the least amount. Then, the protocol
> and its related protocols, and its data structures and the elements of
> its recognition and parsing, should as much as possible be minimal, then at
> compile time, the implementation, derived, resulting then according to
> the schema in "abstract syntax", files and classes and metadata, so
> that interfaces and base classes are derived, generated, then that the
> implementations are composed as of those.
>
>
> (The "_" front or back is empty string,
> "_" inside is " " space,
> "__" inside is "-" dash,
> "___" inside is "_" underscore,
> and "____" inside is "." dot.)
>
> class SMTP {
>
> extension Base {
> enum Specification { RFC821, RFC1869, RFC2821, RFC5321 }
> enum Command {HELO, EHLO, MAIL, RCPT, DATA, RSET, VRFY, EXPN,
> HELP, NOOP, QUIT}
> }
>
> extension SIZE {
> enum Specification {RFC1870 }
> enum Result { SIZE }
> }
> extension CHECKPOINT {
> enum Specification {RFC1845 }
> }
> extension CHUNKING {
> enum Specification {RFC3030 }
> }
> extension PIPELINING {
> enum Specification {RFC2920 }
> }
> extension _8BITMIME {
> enum Specification {RFC6152 }
> enhanced _8BITMIME {
> enum Command {EHLO, MAIL}
> enum Result {_8BITMIME}
> }
> }
>
> extension SMTP__AUTH {
> enum Specification {RFC4954 }
> enum Command {AUTH}
> }
> extension START__TLS {
> enum Specification {RFC3207}
> enum Command { STARTTLS }
> }
>
> extension DSN {
> enum Specification {RFC3461 }
> }
>
> extension RFC3865 {
> enum Specification {RFC3865}
> enhanced RFC3865 {
> enum Command {EHLO, MAIL, RCPT }
> enum Result {NO__SOLICITING, SOLICIT }
> }
> }
> extension RFC4141 {
> enum Specification {RFC4141 }
> enhanced RFC4141 {
> enum Command {EHLO, MAIL }
> enum Result {CONPERM, CONNEG }
> }
> }
>
> // enum Rfc {RFC3207, RFC6409 }
> }
>
> class POP3 {
> enum Rfc {RFC1939, RFC1734 }
>
> extension Base {
> enum Specification {RFC1939 }
>
> class States {
> class AUTHORIZATION {
> enum Command {USER, PASS, APOP, QUIT}
> }
> class TRANSACTION {
> enum Command {STAT, LIST, RETR, DELE, NOOP, RSET, QUIT
> , TOP, UIDL}
> }
> class UPDATE {
> enum Command {QUIT}
> }
> }
>
> }
> }
>
> class IMAP {
> enum Rfc { RFC3501, RFC4315, RFC4466, RFC4978, RFC5256, RFC5819,
> RFC6851, RFC8474, RFC9042 }
>
> extension Base {
> enum Specification {RFC3501}
>
> class States {
> class Any {
> enum Command { CAPABILITY, NOOP, LOGOUT }
> }
> class NotAuthenticated {
> enum Command { STARTTLS, AUTHENTICATE, LOGIN }
> }
> class Authenticated {
> enum Command {SELECT, EXAMINE, CREATE, DELETE, RENAME,
> SUBSCRIBE, UNSUBSCRIBE, LIST, LSUB, STATUS, APPEND}
> }
> class Selected {
> enum Command { CHECK, CLOSE, EXPUNGE, SEARCH, FETCH,
> STORE, COPY, UID }
> }
> }
> }
> }
> 
> class NNTP {
> enum Rfc {RFC3977, RFC4642, RFC4643}
>
> extension Base {
> enum Specification {RFC3977}
> enum Command {CAPABILITIES, MODE_READER, QUIT, GROUP,
> LISTGROUP, LAST, NEXT, ARTICLE, HEAD, BODY, STAT, POST, IHAVE, DATE,
> HELP, NEWGROUPS, NEWNEWS, OVER, LIST_OVERVIEW____FMT, HDR, LIST_HEADERS }
> }
>
> extension NNTP__COMMON {
> enum Specification {RFC2980 }
> enum Command {MODE_STREAM, CHECK, TAKETHIS, XREPLIC,
> LIST_ACTIVE, LIST_ACTIVE____TIMES, LIST_DISTRIBUTIONS,
> LIST_DISTRIB____PATS, LIST_NEWSGROUPS, LIST_OVERVIEW____FMT, LISTGROUP,
> LIST_SUBSCRIPTIONS, MODE_READER, XGTITLE, XHDR, XINDEX, XOVER, XPAT,
> XPATH, XROVER, XTHREAD, AUTHINFO}
> }
>
> extension NNTP__TLS {
> enum Specification {RFC4642}
> enum Command {STARTTLS }
> }
>
> extension NNTP__AUTH {
> enum Specification {RFC4643}
> enum Command {AUTHINFO}
> }
>
> extension RFC4644 {
> enum Specification {RFC4644}
> enum Command {MODE_STREAM, CHECK, TAKETHIS }
> }
>
> extension RFC8054 {
> // "like XZVER, XZHDR, XFEATURE COMPRESS, or MODE COMPRESS"
> enum Specification {RFC8054}
> enum Command {COMPRESS}
> }
> }
>
> class HTTP {
> extension Base {
> enum Specification {RFC2616, RFC7231, RFC9110}
> enum Command {GET, PUT, POST, OPTIONS, HEAD, DELETE, TRACE,
> CONNECT, SEARCH}
> }
> }
>
> class HTTP2 {
> enum Rfc {RFC7540, RFC8740, RFC9113 }
> }
>
> class WebDAV { enum Rfc { RFC2518, RFC4918, RFC3253, RFC5323}}
> class CardDAV { enum Rfc { RFC6352}}
> class CalDAV { enum Rfc { RFC4791}}
> class JMAP {}
>
>
> Now, this isn't much in the way of code yet, just finding the
> specifications of the standards and looking through the history of their
> evolution in revision with some notion of their forward compatibility in
> extension and backward compatibility in deprecation, and just
> enumerating some or most of the commands, that according to the state of
> connection its implicit states, and about the remux its
> connection-multiplexing, what are commands as discrete, what either
> layer the protocol, switch the protocol, make states in the session, or
> imply constructed tasks what later result responses.
>
>
> Then it's not much at all with regards to the layers of the protocol,
> the streams in the protocol, their layers, the payload or parameters of
> the commands, the actual logic of the routines of the commands, and the
> results of the commands.
>
> For something like HTTP, then there gets involved that besides the
> commands, then the headers, trailers, or other: "attributes", of
> commands, that commands have attributes and then payloads or parameters,
> in their protocol, or as about "attachment protocol (state) command
> attribute parameter payload", is in the semantics of the recognition and
> parsing, of commands and attributes, as with regards to parameters that
> are part of the application data, and parameters that are part of
> attributes.
>
>
>
> Recognizer/Parser
>
> The Recognizer/Parser or recpar isn't so much for the entire object
> representation of a command, where it's figured that the command +
> attributes + payload is for the application itself, as for recognizing
> the beginnings through the ends of commands, only parsing so much as
> finding well-formedness in the command + attributes (the parameters,
> and, variously headers, or, data, in the wire transmission of the body
> or payload), and, the protocol semantics of the command, and the
> specific protocol change or command task(s) to be constructed, according
> to the session and state, of the connection or its stream, and the
> protocol.
>
> For the layers or filters, of the cryptec or compec, mostly the
> recognition is from the frames/chunks/blocks, while in the plaintext,
> involves the wire representation of the command, usually a line, its
> parameters on the command line, then if according to headers, and when
> content-length or other chunking, is according to headers, or a
> stop-bit, for example, the dot-stuffing. When for example trailers
> follow, is assumed to be defined by the protocol, then as what the
> recpar would expect.
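>
> (The shallow recognition, sketched for the usual line-oriented case: find
> one complete CRLF-terminated command line in whatever has arrived,
> without yet parsing its parameters; returns null until the line is
> complete.)
>
>     // assumes: import java.nio.ByteBuffer;
>     // import java.nio.charset.StandardCharsets;
>     static String tryReadCommandLine(ByteBuffer in) {
>         for (int i = in.position(); i + 1 < in.limit(); i++) {
>             if (in.get(i) == '\r' && in.get(i + 1) == '\n') {
>                 byte[] line = new byte[i - in.position()];
>                 in.get(line);                      // consume the line itself
>                 in.position(in.position() + 2);    // and the CRLF
>                 return new String(line, StandardCharsets.US_ASCII);
>             }
>         }
>         return null;                               // not a complete line yet
>     }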
>
> Then, parsing of the content is mostly in the application, about the
> notion that the commands reflect tasks of routines of a given logic its
> implementation, command and parameters are a sort of data structure
> specific to the command, that headers and perhaps trailers would be a
> usual sort of data structure, while the body, and perhaps headers and
> trailers, is a usual sort of data structure, given to the commands.
>
> The recpars will be driven along with the filters and by the TIR thread
> so must not block and mostly detect a shallow well-formedness. It's
> figured that the implementation of the task on the body, does
> deserialization and otherwise validation of the payload.
>
> The TIR, if it finds connection/stream errors, writes the errors as
> close streams/connections directly to TCB, 'short-circuit', while
> engaging Metrics/Time.
>
> The deserialization and validation of the payload then is as by a TW
> task-worker or into the TCF call-forward thread.
>
> The complement to Recognizer/Parser, is a Serializer/Passer, which is
> the return path toward the Writer/Outputter, as the TCB call-back
> thread, of directly serializable or passable representations of results,
> according to that the command and task, has a particular ordering,
> format, and wire format, the output. The idea is that results byte
> sequences or transferable handles that go out the remux and streams
> to the connection according to the remultiplexer attachment to the
> outbound connection.
>
> The notion of callbacks generally, basically results the employment of
> uni-directional or half-duplex pipes, and a system selector, thus that
> as callbacks are constructed to be invoked, they have a plain input
> stream that's ignored except for the selector, then that the idea is
> there's an attachment on the pipe that's the holder for the
> buffers/handles. That is, the idea is that in Java there's
> java.nio.channels.spi.SelectorProvider, and openPipe, then that the Pipe
> when constructed has that its reference is stored in a synchronous pipe
> map of the TCB, with an associated attachment that when a callback
> occurs, the pipe attachment has set the buffers/handlers or exception
> the result, then simply sends a byte to the pipe, which the TCB picks up
> from waiting on the pipe selector, deregisters and deletes the pipe, and
> writes the results out the remux attachment, and returns to select() on
> the pipe provider, that ultimately the usually no-work-to-do Writer
> thread, sees any remaining output on its way out, among them releasing
> the buffers and closing the handles.
>
> If there's very much a hard limit on pipes, then the idea is to just
> have the reference to the remux attachment and output to write in the
> form of an address in the bytes written on the call-back pipe, in this
> way only having a single or few long-lived call-back pipes, that TCB
> blocks and waits on in select() when there's nothing to do, otherwise
> servicing the writing/outputting in the serial order delivered on the
> pipe, only having one SelectionKey in the Selector from the
> SelectorProvider for the uni-directional pipe, with only read-ops
> interest. Pipes are system objects and have limits on the count of
> pipes, and, the depth of pipes. So, according to that, could result
> just an outputter/writer queue for a usually-nothing-to-do writer
> thread, whether the TCB has a queue to consume, and a pipe-reader thread
> that fills the queue, notify()'s the TCB as wait()'s on it, and
> re-enters select() then for however it's so that at idle, all threads
> are blocked in select(), or wait(), toward "0% CPU", and "floor RAM", at
> idle. (... And all usual calls are non-blocking.)
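>
> (A sketch of that single long-lived call-back pipe, assuming the results
> queue is the "queue to consume" and the byte on the pipe is only the
> wakeup; names illustrative, and writing out through the remux attachment
> is left as a comment.)
>
>     // assumes: import java.io.IOException; import java.nio.ByteBuffer;
>     // import java.nio.channels.*; import java.util.Queue;
>     // import java.util.concurrent.ConcurrentLinkedQueue;
>     final class CallbackPipe {
>         final Queue<Object> completed = new ConcurrentLinkedQueue<>();
>         final Pipe pipe = Pipe.open();
>         final Selector selector = Selector.open();
>
>         CallbackPipe() throws IOException {
>             pipe.source().configureBlocking(false);
>             pipe.source().register(selector, SelectionKey.OP_READ);
>         }
>
>         // from TW/TCF threads, when a result is ready to go out
>         void post(Object result) throws IOException {
>             completed.add(result);
>             pipe.sink().write(ByteBuffer.wrap(new byte[] { 1 }));  // wake the TCB
>         }
>
>         // the TCB loop: block in select(), drain the wakeup bytes and the queue
>         void serve() throws IOException {
>             ByteBuffer drain = ByteBuffer.allocate(64);
>             while (selector.select() >= 0) {
>                 selector.selectedKeys().clear();
>                 drain.clear();
>                 pipe.source().read(drain);
>                 Object result;
>                 while ((result = completed.poll()) != null) {
>                     // write the result out the remux attachment, release buffers
>                 }
>             }
>         }
>     }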
>
> For TW and TCF to "reach idle", or "rest at idle", basically has for
> wait() and notify() in Java, vis-a-vis, pipe() and select() in
> system-calls. (... Which idle at rest in wait() and wake up on any
> notify() or notifyAll(), rather than being interrupted.) This is a design using mostly
> plain and self-contained data structures, and usual system calls and
> library network and synchronization routines, which Java happens to
> surface, so, it's not language or runtime-specific.
>
> The callbacks on the forward side, basically are driven by TCF which
> services the backend and adapters, those callbacks on the re-routines
> pretty much just results the TW tasks, those then in their results
> resulting the TCB callbacks as above. It's figured some adapters won't
> have non-blocking ways, then either to give them threads, or, to
> implement a queue as up after the pipe-selector approach, for the TCF
> thread and thread group, and the back-end adapters.
>
> It's figured for streaming-sub-protocols that backend adapters will be
> entirely non-blocking also, resulting a usual sort of approach to
> low-load high-output round-trip throughput, as what TCF (thread call
> forward) returns to TTW (thread task worker) returns to TCB (thread call
> back) off the TS and TQ (task set and task queue). Then also is a usual
> sort of direct TIR to TCF to TCB approach, or bridging adapters.
>
>
>


Protocol Establishment and Upgrade

It's figured each connection, is according to a bound listening socket,
the accepter to listen and accept or TLA.

Each connection, starts with a "no protocol", that buffers input and
responds nothing.

Then, the protocol goes to a sort of "CONNGATE" protocol, where rules
apply to allow or deny connections, then there's a "CONNDROP" protocol,
that any protocol goes to when a connection drops, then as whether
according to the protocol it dropped from, either pending writes are
written or pending writes are discarded.

For CONNGATE is that there's the socket or IP address of the connection,
that rules are according to that, for example local subnet, localhost,
local pod IP, the gateway, as matching reverse DNS,
or according to records in DNS or a lookup, or otherwise well-known
connections to allow, or anything else or unreachable addresses or
suspected spammer addresses, to deny. This is just a usual command
CONNGATE and task then to either go to the next protocol after
a CONNOPEN, according to the address and port, or go to CONNDENY and
CONNDROP, this way having usual events about connections.

Along with CONNGATE is a sort of extension protocol, "ROLL FOLD GOLD
COLD SHED HOLD", which has for these sorts the beginnings, as in the
sketch after the list.

CONNOPEN

ROLL: open, it's usual protocol
FOLD: open, it's new client or otherwise anything not usual
GOLD: open, it's expected client with any priority

CONNDROP

COLD: drop, silently, close
SHED: drop, the server is overloaded, or down, try to return a response
"server busy", close
HOLD: "drop", passive-aggressive, put in a protocol CONNHOLD, discard
input and dribble


"Do not fold, spindle, or mutilate."



The message-oriented instead of connection-oriented or UDP datagrams
instead of TCP sockets, has that each message that arrives, gets
multiplexed then as with regards to whether it builds
streams, on one listening port. So, there's a sort of default protocol
of DGOPEN and DGDROP, then the sort of default protocol that multiplexes
datagrams according to session and client,
then a usual way that datagrams are handled as either individual
messages or chunks of a stream, whether there's a remux involved or it's
just the fixed-mux attachment, whatever else results the protocol. Each
datagram that arrives is associated with its packet's socket address.



This way there's a usual sort of notion of changing protocols, so that a
protocol like TLS-HANDSHAKE, or TLS-RENEGOTIATE, is just a usual
protocol in usual commands, then as with regards to the establishment
of the security of TLS according to the protocol, then resulting the
block-ciphers and options of TLS is according to the options of the
protocol, with regards then the usual end of TLS is a sort of
TLS-ALERT protocol, that then usually enough goes to a CONNDROP protocol.

So, there are sort of, "CONN" protocol, and, issues with "STRM" and "MESG".



The protocol establishment and upgrade, has basically that by default,
commands are executed and completed serially in the order they arrive,
with regards to each connection or message, that thusly the
establishment of filters or layers in the protocol is just so
configuring the sequence of those in the attachment or as about the
remux attachment, as with regards, to notions of connections and
streams, and, layers per connection, and, layers per stream.

Then, "TLS" protocol, is a usual layer. Another is "SASL", about the state.

As a layer, TLS, after the handshake, is mostly just the frames and
block-encipherment either way, and in TLS 1.2 maybe compression and
decompression, though, that's been left out of TLS 1.3. In the remux,
like for HTTP/2 or HTTP/3, or "HTTP over SCTP" or what, then for
something like HTTP/2, TLS is for the connection then ignorant the
streams, while for something like HTTP/3, it's a sort of user-space
instead of kernel-space transport protocol itself, then it's figured
that the "TLS high-five" as of Datagram TLS, is per stream, and agnostic
the connection, or listener, except as of new streams.

The compression, or "transport-layer compression", is pretty universally
Deflate, then what gets involved with Deflate, is a 32KiB look-back
window, that any code in deflate, is either a literal byte, or a
look-back distance and length, copying bytes. So, that involves that in
otherwise the buffers, that anything that gets compressed or
decompressed with Deflate, or the "compec", ("codec", "cryptec",
"compec"), always has to keep around a 32 Kib look-back window, until
the end of a Deflate block, where it is truncatable, then as to grow a
new one.
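
(On the decompression side the sketch stays simple, because
java.util.zip.Inflater keeps that 32KiB look-back window internally, so
the compec can feed it buffer by buffer as input arrives; the output
buffer handling here is only rough.)

    // assumes: import java.util.zip.DataFormatException;
    // import java.util.zip.Inflater;
    static int inflateSome(Inflater inf, byte[] in, int len, byte[] out)
            throws DataFormatException {
        inf.setInput(in, 0, len);                 // whatever has arrived so far
        int total = 0;
        while (!inf.needsInput() && !inf.finished() && total < out.length) {
            int n = inf.inflate(out, total, out.length - total);
            if (n == 0) break;                    // needs more input or a dictionary
            total += n;
        }
        return total;                             // caller drains out[0..total)
    }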

Mostly here then the compression may be associated with TLS, or
otherwise at the "transport layer" it's associated stream-wise, not
connection-wise, and mostly it is according to the "application layer",
when and where compression starts and ends in the commands and/or the
payloads. Then a usual idea is as much as possible to store the data at
rest in a compressed edition so it can be consumed as concatenated.


This way this sort of "Multiple Protocol Server" is getting figured out,
in design at least, then with the idea that with a good design, it's
flexible.


Remultiplexer and Connections and Streams


The Remultiplexer is about the most complicated concept, with the idea
of the multiplexer and demultiplexer inbound then the multiplexer
and demultiplexer outbound, from and to the outward-facing multiplexer
and demultiplexer, and from and to the backend-facing multiplexer
and demultiplexer, that last bit though being adapter pools.

So, it's figured, that throughout the system, that there's the
identifier of the system, by its host, and, the process ID, and there's
identification of events in time, by system time, then that everything
else gets serial numbers, basically numbers that increment serially for
each connection, message, command, response, error, and here for the
remultiplexer for the streams, in protocols like HTTP/2 or WebSocket
with multiple streams, or for whatever are message-oriented protocols,
those multiple streams.

In this way, the attachment, it's an object related to the connection,
then the remultiplexer attachment, is the attachment then also
related to any stream.

The idea is that the serial numbers don't include zero, and otherwise
are positive integers, then sometimes the protocols have natural
associations of the client-initiated and server-initiated streams,
one or the other being even or odd, say, while, also internally is
that each would have their own sort of namespace and serial number.

Very similarly, the re-routines, have the serial numbers their issuance
and invocations, then the tree of sub-re-routines, has that those are
serially numbered also, with regards to that any one of those comes
into being according to that the runtime, as one process its space,
somehow must vend serial numbers, and in a non-blocking way.
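
(Vending those serial numbers non-blockingly is about the simplest sketch
of all: one atomic counter per namespace, starting at 1 so zero is never
issued; the namespaces shown are only examples.)

    // assumes: import java.util.concurrent.atomic.AtomicLong;
    final class Serials {
        private final AtomicLong connections = new AtomicLong();
        private final AtomicLong streams = new AtomicLong();
        private final AtomicLong commands = new AtomicLong();

        long nextConnection() { return connections.incrementAndGet(); }  // 1, 2, 3, ...
        long nextStream() { return streams.incrementAndGet(); }
        long nextCommand() { return commands.incrementAndGet(); }
    }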

Then, these are involved with the addressing/routing scheme of
the callbacks or the routines, then also, with metrics/timing,
of the things and all the things. The idea is that the callbacks
are basically an object reference to an attachment, or monad
of the re-routines, then a very general sort of association to
the ring buffer of streams that come and go, or just a list of
them, and the monad of the re-routine, which of course has
that it's always re-run in the same order, so the routing scheme
to the attachment and addressing scheme to the monad,
is a common thing.


In the protocol, there are "states" of the protocol.

The "session", according to the protocol, is quite abstract.
It might be per-connection, per-stream, or across or bridging
connections or streams. A usual sort of idea is to avoid any
state in the session at all, because, anything state-ful at all
makes that the distributed back-end needs a distributed
session, and anything outside the process runtime doesn't
have the synchronization to its memory barriers. It's similar
with issues in the "domain session" or "protocol session"
with regards to vending unique identifiers or guaranteeing
deduplication according to unique identifiers, "client session",
with anything at all relating the client to the server, besides
the contents of the commands and results.

So, in the protocol, there are "states" of the protocol. This
is usually enough about sequences or chains of commands,
and as well with regards to entering streaming-sub-protocols.
So, it's sort of figured, that "states" of the protocol, are
sub-protocols, then with usually entering and exiting the
sub-protocols, that being "in the protocol".

Then, there are "profiles" of the protocol, where the protocol
has a sort of "base protocol", which is always in effect, and
then any number of "protocol extensions", then as whether
or not those are available, and advertised, and/or, exercised,
or excluded. A profile then is whatever of those there are,
for a command and stream and connection and the server,
helping show protocol profile coverage according to that.


Here then it's a usual idea that "CONNDROP" is always an
extension of the protocol, because the network is formally
un-reliable, so any kind of best-effort salvage attempt, or any
other error than modeled by the protocol, goes to CONNDROP.

Then, it's also a usual idea that any other error, than modeled
by the protocol, has a sort of UNMODELED protocol,
though as with regards to that the behavior is CONNDROP.

Then, for streams and messages, gets to that CONNDROP,
and "STREAMDROP" and "MESSAGEDROP", sort of vary,
those though being usual sorts catch-all exceptions,
where the protocol is always in a sort of protocol.
Dan Christensen
2017-05-19 12:38:32 UTC
Reply
Permalink
Anyone can set up a moderated newsgroup in Google Groups for free. It will look just like sci.math on Google Groups, just a different name. The search feature is very limited -- a nice interface otherwise. No serious bugs AFAICT. It does freeze up occasionally, and you have to refresh, but nothing gets lost.

See: http://www.wikihow.com/Create-a-Google-Group. IIUC, new members can be moderated at first, then promoted by a moderator to unmoderated status after a while. They can also be subsequently demoted or banned, or their postings deleted if they go troll.


Dan
Archimedes Plutonium
2023-12-22 08:48:39 UTC
Reply
Permalink
Joe Biden are the 50 F-16 fighter jets ready to ship to Ukraine.

I like Ukraine to win before Xmas dinner, and you dragging your feet, pussyfooting around does not help matters. Just take the jets out of the largest Republican loudmouth states. Get 100 volunteer skilled pilots, brave souls to fly them, and force Russia out of Ukraine.

Keep me posted Joe.

50 F-16 fighter jets in Ukraine wins and ends the war.

\__[0]__/

\/
--=_/(·)\_=--



F-16

______
L,. ',
\ ',_
\ @ ',
\ ^~^ ',
\ NR ',
\___'98fw ',_ _..----.._
[______ ^~==.I\____________..-~<__\\***@___Z4,_
,..-=T __ ___________ \/ "'" o<== ^^~-+.._
I____|_____ }_>=========I>=**^^~~==-----------==- " | ^~-.,_
[_____,.--~^ _______ ~~--=<~~-----=====+==--~~^^
^~~-=+..,,__,-----,____| | -=* |
|_ / |---,--~^---+-----+-~^
^^"~ d~b=^ ^----+t
q_p '@

F-16
___
| \
| \ ___
|_____\______________.-'` `'-.,___
/| _____ _________ ___>---
\|___________________________,.-'`
`'-.,__________)


Any ascii Jets?
Greg Goebel
Mar 13, 1995, 10:34:47 AM

Pete Wilson (***@esu.edu) wrote:
: Could someone please post some ascii Jet fighters for me? I would really
: like to have and F-16 pic! Thanks
This is from a list I got off the Net some time ago and edited to my taste:

Hornet
from Joseph Hillenburg \ /
***@gnu.ai.mit.edu +----o0o----+

Comet by Dave Goodman __|__
***@misty.sara.fl.us ------oo(_)oo------

F-4 Phantom by .
Curtis Olson (***@sledge.mn.org) \__[0]__/


twin engine fighter | |
by Jim Knutson --=oOo=--
***@mcc.com Check Six! +


Jeff R. Sents \ /
***@dixie.com _____-/\-_____
***@loads1.lasc.lockheed.com \\//


Rob Logan -----|-----
***@sun.soe.clarkson.edu *>=====[_]L)
-'-`-

Helicopter from -----|-----
E Curtis *>=====[_]D
***@oucsace.cs.ohiou.edu -'-`-

--+--
by Kay R. Fisher |
***@kay.enet.dec.com ---------------O---------------

JAS-39 Gripen |
Isaac Kuo __ n __
***@math.berkeley.edu X------[O]------X


SR-71 Blackbird front / ^ \
David Williams ---(.)==<-.->==(.)---

P-38 _|______|_
Kim R. Volz ***@amc.com ----(*)=()=(*)----

YF-2[23] \ /
Jason Nyberg ____\___/O\___/____
***@ctron.com \_\\_//_/

F-18 Hornet \ /
Kenneth E. Bailey x________\(O)/________x
***@emuvax.emich.edu o o O(.)O o o

Fouga Magister by Geoff Miller \ /
***@purplehaze.Corp.Sun.COM \ _ /
\/_\/
()------------------(|_*_|)------------------()


__|__
Formation (Avro Canucks?) __|__ *---o0o---*
by Matt Kenner __|__ *---o0o---*
***@po.cwru.edu *---o0o---*


|
F4U4 Phantom from /O\
Chad B. Wemyss \_______[|(.)|]_______/
***@wpi.wpi.edu o ++ O ++ o


\ _ /
F-18 Hornet *__________\_(0)_/__________*
Greg Knoch @ @ (](_o_)[) @ @
New Mexico State o o

|
IGJAS 39 Grypen ____O____
Urban Fredriksson x-----=[[.]]=-----x
***@icl.se * X 0' X *

l
___ n ___
Urban Fredriksson x---=O(.)O=---x
***@icl.se

|
F16 by: |
(Grand Moff Tarkin) (O)
***@emunix.emich.edu X--------<_._>--------X
(___)


\ _ /
YF-22 by: \ /_\ /
Paul Adams Jr. ____________\___/_._\___/____________
***@erc.msstate.edu \ \ / /
\__/\_/\__/


YF-23 by: \ __ /
Paul Adams Jr. \ ____/__\____ /
***@erc.msstate.edu ___________\/___/____\___\/___________
\ \____/ /
\__/ \__/


YF-22 prototypes \ /
Jeff R. Sents _____-/\-_____
home: ***@dixie.com \_\/_/ \ /
work: ***@loads1.lasc.lockheed.com _____-/\-_____
\_\/_/

_________
Saab 105 Sk 60 |
Urban Fredriksson __________|__________
***@icl.se [/___\]
\_o_/


| |
F-15 Eagle from | _ |
Chad B. Wemyss ______________|_( )_|______________
***@wpi.wpi.edu o +|+ [ ( o ) ] +|+ o
*[_]---[_]*


_______
F-101 One-O-Wonder (Voodoo) |
JOHNNY CHIU |
***@ccwf.cc.utexas.edu /O\
-------<((o))>-------
O O

_______
F-104 Star Fighter |
JOHNNY CHIU |
***@ccwf.cc.utexas.edu /0\
O-----((.))-----O
* *


/\
B2 Bomber from: \ \
Isaac Kuo \ \
***@math.berkeley.edu / \
<===>\
< )>
<===>/
\ /
/ /
/ /
\/

^
/ \
F-117A Nighthawk //V\\
Rafael Yedwab / \|/ \
Clark University /// v \\\
/ \
/ \
/ \
/ /| |\ \
/ / \ / \ \
\ / X X \ /
\/ / \ / \ \/
/ V \
| |


\__[0]__/
50 F-16 fighter jets in Ukraine wins the war.

Upload 50 F-16 fighter jets to Ukraine and get this war over with for victory to Ukraine. F-16 is the gamechanger-- it blows up trenches and destroys Russian airspace. Russia is defeated against the F-16.

I do not know if China is defeated against F-16s, and maybe someone in the military can voice opinion on this.

███۞███████ ]▄▄▄▄▄▄▄▄▄▄▄▄▃
Radio Wave & Laser Rifle to shoot down GLONASS and BeiDou satellites

Xi masses troops on Russian border to take back Outer Manchuria of the Qing dynasty. If you do not know the history, Russia stole Outer Manchuria and Vladivostok from China.

While Putin is too busy with his personal war, Xi thinks time is ripe to get back what belongs to China in the first place. OUTER MANCHURIA and especially Vladivostok.

Xi gives the Chinese people a Christmas gift--- Outer Manchuria-- the beloved Old China

I am not positive we can take out GLONASS and BeiDou from ground based radio and microwaves and laser waves, even jamming.

But I am certain that we can put a satellite in orbit that is a wrecking ramming satellite that does take out GLONASS and BeiDou. I am certain of this because several countries have robotic satellites that maintain their fleet of satellites. And to this end, we need such a wrecking ball satellite immediately up there.

[Note, graphics found in sci.physics when Nomen Nescio used to spam sci.physics with a fake FAQ.]



███۞███████ ]▄▄▄▄▄▄▄▄▄▄▄▄▃
▂▄▅█████████▅▄▃▂
I███████████████████].
◥⊙▲⊙▲⊙▲⊙▲⊙▲⊙▲⊙◤...
Satellite RIFLE to shoot down GLONASS, Iran,and BeiDou satellites.
Hooray

Hooray!! End the Ukraine war

Easiest way to end the Ukraine invasion by Russia, start felling GLONASS satellites, fell them directly with radar laser pulses or jam them to fall.

Now I thought GLONASS Russian satellites numbered in the thousands, for the Internet is lousy on this question of how many satellites, for recently BBC was vague with an estimate of 600 satellites, yet another web site said 42,000. But apparently only 24 are operational for GLONASS. And my take on this is that satellites are precarious vessels and it is easy for something to go wrong and be inoperative. All the better to look for flaws in engineering to down all 24 GLONASS Russian satellites.

So, easy easy Achilles tendon in all of the Russian ICBM military strategy, for knock out the 24 and you in a sense, knock out the entire Russian ICBM arsenal, for they no longer have any navigation.

And if the West is on its top shape and form in technology, we want the West Scientists to figure out how to intercept the Russian ICBM and cause it to fall upon Russia and explode upon Russia.

Get the best electronics and electrical engineers of the West to figure out how to cause all Russian launched and Chinese launched ICBMs to explode on home territory.

Caveat: if the West can do it, mind you, the Chinese and Russians will want to steal those secrets from the West and that should never be allowed--Ultimate Top Secret classification that not even a punk weirdo president like Trump cannot see, nor mention to him for he would likely sell it for a golf course in some foreign enemy country.

Google search reveals
24+
GLONASS (Globalnaya Navigazionnaya Sputnikovaya Sistema, or Global Navigation Satellite System) is a global GNSS owned and operated by the Russian Federation. The fully operational system consists of 24+ satellites.Oct 19, 2021

Other Global Navigation Satellite Systems (GNSS) - GPS.govhttps://www.gps.gov › systems › gnss
How many satellites are in the GLONASS?
As of 15 October 2022, 143 GLONASS navigation satellites have been launched, of which 131 reached the correct orbit and 24 are currently operational.

List of GLONASS satellites - Wikipediahttps://en.wikipedia.org › wiki › List_of_GLONASS_sa...
Archimedes Plutonium
Nov 5, 2022, 11:02:21 PM
to Plutonium Atom Universe
███۞███████ ]▄▄▄▄▄▄▄▄▄▄▄▄▃
Radio Wave--Laser Rifle felling BeiDou satellites

From what I gather on internet, Russia has 24 satellites in operation while BeiDou China has 35.
> ▂▄▅█████████▅▄▃▂
> I███████████████████].
> ◥⊙▲⊙▲⊙▲⊙▲⊙▲⊙▲⊙◤...
Radio Wave-- LASER RIFLE to shoot down the premier BeiDou satellite.
Ending the dumb and stupid petty dictators launching rockets from North Korea.

It is respectfully requested that engineers in Japan help fell the BeiDou satellites that navigate the illegal North Korea launches.


--- quoting Wikipedia ---
The BeiDou Navigation Satellite System (BDS; Chinese: 北斗卫星导航系统; pinyin: Běidǒu Wèixīng Dǎoháng Xìtǒng) is a Chinese satellite navigation system. It consists of two separate satellite constellations. The first BeiDou system, officially called the BeiDou Satellite Navigation Experimental System and also known as BeiDou-1, consisted of three satellites which, beginning in 2000, offered limited coverage and navigation services, mainly for users in China and neighboring regions. BeiDou-1 was decommissioned at the end of 2012. The second generation of the system, officially called the BeiDou Navigation Satellite System (BDS) and also known as COMPASS or BeiDou-2, became operational in China in December 2011 with a partial constellation of 10 satellites in orbit. Since December 2012, it has been offering services to customers in the Asia-Pacific region.

In 2015, China launched the third generation BeiDou system (BeiDou-3) for global coverage. The first BDS-3 satellite was launched on 30 March 2015. On 27 December 2018, BeiDou Navigation Satellite System started providing global services. The 35th and the final satellite of BDS-3 was launched into orbit on 23 June 2020. It was said in 2016 that BeiDou-3 will reach millimeter-level accuracy (with post-processing). On 23 June 2020, the final BeiDou satellite was successfully launched, the launch of the 55th satellite in the Beidou family. The third iteration of the Beidou Navigation Satellite System provides full global coverage for timing and navigation, offering an alternative to Russia's GLONASS, the European Galileo positioning system, and the US's GPS.

According to China Daily, in 2015, fifteen years after the satellite system was launched, it was generating a turnover of $31.5 billion per annum for major companies such as China Aerospace Science and Industry Corporation, AutoNavi Holdings Ltd., and China North Industries Group Corp. The industry has grown an average of over 20% in value annually to reach $64 billion in 2020 according to Xinhua citing data.

Domestic industry reports forecast the satellite navigation service market output value, directly generated and driven by the Beidou system, will be worth 1 trillion yuan ($156.22 billion) by 2025, and $467 billion by 2035.

Archimedes Plutonium
Nov 5, 2022, 11:20:20 PM
to Plutonium Atom Universe
███۞███████ ]▄▄▄▄▄▄▄▄▄▄▄▄▃
Radio Wave--Laser Rifle felling Iran satellites
>
> From what I gather on internet, Russia has 24 satellites in operation while BeiDou China has 35, that would indicate Iran has but a few satellites.
> > ▂▄▅█████████▅▄▃▂
> > I███████████████████].
> > ◥⊙▲⊙▲⊙▲⊙▲⊙▲⊙▲⊙◤...
> Radio Wave-- LASER RIFLE to shoot down the premier BeiDou satellite.
> Ending the dumb and stupid petty dictator sending drones to Russia to down Ukraine power utility electric lines.

How many of the Iran satellites are used in drones destroying Ukraine electric grid. We should immediately fell those satellites.

--- quoting Wikipedia on Iran satellites ---
On 22 April 2020, Iran successfully launched "Noor" (Farsi for "Light"), a military satellite, into a 426 x 444 km / 59.8° orbit.

On 8 March 2022, Iran reportedly sent its second “Nour-2” military satellite into 500 km orbit.[55][56]
The Khayyam, a high resolution imaging satellite, was successfully launched into orbit by a Russian Soyuz rocket on 9 August 2022
Unlaunched satellites
Nahid (1), satellite with folding solar panels.
Toloo, is the first of a new generation of reconnaissance satellites being built by Iran Electronics Industries with SIGINT capabilities. It will be launched by a Simorgh.
Nasir 1, Iran's indigenously designed satellite navigation system (SAT NAV) has been manufactured to find the precise locations of satellites moving in orbit.

Zohreh, is a geosynchronous communication satellite which was originally proposed before the Revolution in the 1970s as part of a joint Indian-Iranian project of four Iranian satellites to be launched by the then upcoming NASA Space Shuttles. Iran had also negotiated with France to build and launch the satellites but the project never materialized. In 2005, Iran negotiated with Russia to build and launch the first Zohreh satellite under an agreement worth $132 million with the satellite launch date stipulated as 2007–2008. The new agreement had followed the earlier failed negotiations with Russia in 2003 when Russia cancelled the project under US pressures.
Ekvator, a geosynchronous communications satellite built by ISS Reshetnev for Iran in a continuation of previous Russia-Iran space cooperation efforts. As of October 2022, Ekvator is expected to be launched on a Proton-M rocket in early 2024.


Archimedes Plutonium
Nov 5, 2022, 11:38:31 PM
to Plutonium Atom Universe
███۞███████ ]▄▄▄▄▄▄▄▄▄▄▄▄▃
Radio Wave--Laser Rifle felling North Korean satellites
>
> From what I gather on internet, Russia has 24 satellites in operation while BeiDou China has 35, that would indicate Iran has but a few satellites. And North Korea fewer yet.


> > ▂▄▅█████████▅▄▃▂
> > I███████████████████].
> > ◥⊙▲⊙▲⊙▲⊙▲⊙▲⊙▲⊙◤...

Radio Wave-- LASER RIFLE to shoot down the 3 North Korean satellites.


Ending the dumb and stupid petty dictator with his endless illegal missile launches. Launches that do nothing of good for anyone. Not even the idiot petty dictator.

Internet search reveals 3 satellites for North Korea. I suspect though, that North Korea launches Chinese missiles and uses BeiDou satellites.

Kwangmyongsong -4

Kwangmyongsong -3

Kwangmyongsong -2

Knock all 3 satellites out with the radio-Laser Wave gun. And that will put an end to the petty dictators toys.

Asking for help from Japan in assistance.