This is a glossary of terms referenced directly in the text, along with some terms that connect the directly-referenced terms together. Definitions should not be considered authoritative per se, but rather on a “what we mean when we say…” basis.
An abstract syntax tree is a representation of software code which has been tokenized and parsed, and is typically the penultimate step before the code's semantics are interpreted. It is a convenient representation for navigating the different structural elements of the code.
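As a minimal sketch of what such a tree looks like, Python's standard ast module can parse a line of source and expose its structural elements. The one-line source string is made up for illustration.

```python
import ast

# A made-up line of source code to parse.
source = "total = price * 2 + 1"
tree = ast.parse(source)

# The first (and only) statement is an assignment.
assignment = tree.body[0]

# Walking the tree visits every structural element of the code.
node_types = [type(node).__name__ for node in ast.walk(tree)]

print(ast.dump(assignment.value))
```

The value being assigned is a binary operation (an addition) whose left operand is itself a binary operation (a multiplication), which is the operator precedence made explicit as tree structure.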
Access is the part of a security policy that concerns the assignment of privileges: who is allowed to see or change an information resource. It is usually discussed in terms of authentication and authorization.
An account is a record in a database that stores information about a person (or automated software agent), including their username and password (or cryptographic keys).
An adapter, in the generic sense, is anything that connects one system to another.
An adapter bus is a bus (in computer science parlance, a trunking mechanism) for adapters. Assuming a set of adapters have relatively common behavior, that common behavior could be taken over by the bus, to which the adapters would act as plug-ins.
An aggregate value is a quantity which is the result of some operation over some other data, such as a count of entities. Data which is already quantitative can have aggregates in the form of descriptive statistics.
AWS is a sprawling set of software-as-a-service products from Amazon that began as their own infrastructure.
An API adapter is typically considered in the context of Web-based APIs, where an organization's system connects to a remote networked information service to send or receive data.
Application logic can be understood as a special case of content, to the extent that it is software behavior that does something directly germane to a specific application, in contrast to being one or more degrees removed from it, such as with infrastructure or other more broadly-applicable code.
An API is, in the most generic sense, a published commitment on the part of a person or business entity to expose certain functionality, behavior, or semantics of an information system under their control. In the sense in which we are more accustomed, an API refers to a specific class of software that implements this functionality on the Web.
An asset is any information resource that is subsidiary to some main piece of content, such as an image, sound file, video, or script.
An asset transform is one that focuses on transforming files, either in ways that can be round-tripped or ways that are lossy. An example would be converting images from one format to another, or cropping and resizing them.
Atom (RFC 4287) is an XML format for syndication feeds, developed as a successor to RSS 2.0.
An attack surface is an imaginary space that a hypothetical adversary can target. There is likewise an imaginary metric to this space, so the attack surface of a particular system can be thought of as "larger" or "smaller" than some other system.
Authentication is the aspect of access control concerned with figuring out who a principal (usually a person) is, or otherwise proving who they claim they are, should they be making such a claim.
Authorization is the process of judging whether a person (or automated agent), or rather their account, is allowed to access a particular resource, after they have been authenticated.
Automation is the broad technical program that aims to reduce the need for direct human attention on any given task. Automation can be understood as the maturation of mechanization, which only reduces physical labor.
A bastion host is a server whose duties have been partitioned to handle only relatively low-sensitivity information. Bastion hosts are part of a security strategy such that an attacker would have to compromise the ultimately disposable bastion host first before proceeding to more sensitive targets. Without any persistent data of its own, a bastion host can be wiped and reinstalled if it is ever compromised, which makes bastion hosts particularly suitable to containerization.
The bibliographic ontology offers numerous extensions of foaf:Document for use in bibliographic citations.
An RDF blank node is an anonymous entity which can only reliably be reached through its topological relationship with non-blank (i.e., URI) nodes.
A binary large object is a backronym for a segment of bytes treated as opaque by the information system.
A box plot is a way to visualize descriptive statistics.
C is a general-purpose programming language designed by Dennis Ritchie in the early 1970s, which is basically the backbone for everything else.
C++ is an object-oriented extension of C (indeed its own distinct language) by Bjarne Stroustrup. It is said to be like creating an octopus by nailing extra legs on a dog. Popular in game development and with Windows.
Caching is the process of storing the outcome of a computation or retrieval of information from a remote or computationally expensive source, typically carried out to increase the efficiency and durability of the overall process. A cache is a region in data storage with additional directives to periodically remove elements that are judged by some mechanism to be stale.
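A minimal sketch of the idea, assuming staleness is judged purely by age. TTLCache, slow_lookup, and the fake clock are hypothetical names for illustration, and this version discards stale entries lazily on read rather than on a periodic sweep.

```python
import time


class TTLCache:
    """Keep computed values until they are older than ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (time stored, value)

    def get(self, key, compute):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # still fresh: skip the expensive work
        value = compute()  # missing or stale: do the expensive work again
        self._entries[key] = (now, value)
        return value


# Usage with a fake clock, so staleness is easy to demonstrate.
fake_now = [0.0]
calls = []


def slow_lookup():
    calls.append("called")
    return 42


cache = TTLCache(ttl_seconds=10, clock=lambda: fake_now[0])
first = cache.get("answer", slow_lookup)   # computed
second = cache.get("answer", slow_lookup)  # served from the cache
fake_now[0] = 11.0                         # the entry is now stale
third = cache.get("answer", slow_lookup)   # recomputed
```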
A caching proxy is a proxy with a cache attached.
A caching reverse proxy is a reverse proxy with the addition of a cache.
A call graph is a statistical representation of code as it is being executed, useful for determining the code's actual behavior. A call graph generally makes use of an abstract syntax tree to produce meaningful results.
To call a representation of any information resource "canonical" implies that said representation at least encapsulates all available information about the resource, and moreover is in some form that is most desirable for, if not storage, then at least exchange.
An information resource may have multiple representations, but the canonical representation is the "master" one, and usually the one that you store. Canonicalization can also be the result of a function, where grammars with some play in the representation are brought into one with the same semantic content as the original, but a literal representation (i.e., an exact string of bits) suitable for comparison (as with identifiers), or cryptographic identity (as with larger blobs).
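Canonicalization as a function can be sketched with JSON, where key order and whitespace are the "play" in the representation. This is only an illustration using the standard library; it is not a standards-conformant scheme such as RFC 8785, and the helper names are hypothetical.

```python
import hashlib
import json


def canonical_json(value):
    # Sorted keys and fixed separators pin down one literal representation.
    return json.dumps(value, sort_keys=True, separators=(",", ":"))


def digest(value):
    # A cryptographic identity derived from the canonical bytes.
    return hashlib.sha256(canonical_json(value).encode("utf-8")).hexdigest()


# Same semantic content, different key order.
a = {"title": "Glossary", "lang": "en"}
b = {"lang": "en", "title": "Glossary"}

print(canonical_json(a))
print(digest(a) == digest(b))
```

Serialized naively, the two dictionaries produce different strings; after canonicalization they compare equal byte for byte, so their digests match as well.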
Cascading Style Sheets are a standardized, declarative, rule-based, domain-specific language for describing presentational parameters of HTML and XML (including SVG) documents.
Client-side JavaScript refers specifically to the language JavaScript executed in a Web browser, versus in some other execution context, such as Node.JS. Client-side JavaScript is the source of most interactive behavior in Web applications.
A client-side script is procedural code executed on the site of a client in a client-server model of networked information services.
A coded property is one which takes a finite set of predefined values.
An individual codepoint is generally understood to be the virtual index of a given Unicode character. As such, the concrete representation of a codepoint may occupy a range of bytes within a particular encoding scheme. UTF-8, for example, represents individual Unicode codepoints within a specially-crafted representation that can span one to four bytes per codepoint.
The colour palette assigns a particular colour to a particular mapping of data dimensions and values. This tool has tens of thousands of such mappings, and thus the colour palette is generated programmatically.
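The difference between a codepoint and its encoded bytes can be seen directly in Python. The four sample characters are chosen arbitrarily and written as escapes so that the example does not depend on this file's own encoding.

```python
# Latin capital A, e with acute accent, euro sign, and an emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

# The codepoint is the character's index in Unicode.
codepoints = [ord(ch) for ch in samples]

# The encoded form occupies a varying number of bytes in UTF-8.
byte_lengths = [len(ch.encode("utf-8")) for ch in samples]

for ch, cp, n in zip(samples, codepoints, byte_lengths):
    print(f"U+{cp:04X} takes {n} byte(s) in UTF-8")
```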
Comma-Separated Values (CSV) is an interchange format for tabular data. It is represented as a text file or network message wherein each line is treated as a record, with the fields of each record separated (canonically) by the comma "," character. The first record in the file may optionally depict column semantics.
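A small sketch of reading such a file with Python's standard csv module, assuming the first record is a header. The names and roles in the sample text are invented.

```python
import csv
import io

# Invented sample data: a header record followed by two data records.
text = "name,role\nAda,author\nGrace,editor\n"

# DictReader treats the first record as the column names.
records = list(csv.DictReader(io.StringIO(text)))

for record in records:
    print(record["name"], record["role"])
```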
In software version control, a commit is both an event that represents the advancement of the representational state of the content under management, as well as the record of that state. The changes between one commit and another can be represented as a diff.
Composability is the ability to connect two functions together to create another function. This necessitates that the output of one function is of the same type as the input of the other, which is implicit in a unified interface.
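To illustrate, here is a minimal compose helper in Python. The helper and the two string functions are hypothetical; the point is only that the output type of one (a string) matches the input type of the other.

```python
def compose(f, g):
    """Return a new function that applies g first, then f."""
    return lambda x: f(g(x))


def strip(text: str) -> str:
    return text.strip()


def shout(text: str) -> str:
    return text.upper() + "!"


# str -> str composed with str -> str yields another str -> str.
strip_then_shout = compose(shout, strip)
print(strip_then_shout("  hello "))
```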
A composite document is made up of more than one reusable fragment, merged together through transclusion, generally agglomerated into a shell or umbrella-like structure.
Concept scheme is an umbrella term for typologies, taxonomies, dictionaries and glossaries, thesauri, et cetera. A concept scheme typically not only lists a set of concepts and their definitions, but also the (hierarchical or set-theoretic) relationships between them.
Conceptual integrity (Brooks 1975) is the state of affairs wherein a project or its ultimate product(s), and everybody involved, has a unified mental model of what it is and what it's for. Achieving conceptual integrity means decisions come easier and outcomes are more coherent, because there is little to no ambiguity around an optimal course of action.
Containerization is a strategy for packaging and deploying virtual servers such that they can be rapidly cloned and deployed to utility computing systems (i.e., "the cloud").
Content is the information that the user of an information system uses the system for.
A content audit follows a content inventory: the inventoried elements are audited for fitness, correctness, relevance, style, etc., and recommendations are made for amendments, replacements, or supplements.
A content delivery network is a fancy caching (plain or reverse) proxy. These networked information services enhance the speed and reliability of websites by putting their hardware physically closer to clients. Requests to the origin (in reverse proxy configuration) or media assets like images or video (in either configuration) are served through the network from the system deemed closest to the client, even if the origin is unavailable.
Content governance is the practice of managing the life cycle of content within an organization, and setting policies for content strategy and information architecture.
Content infrastructure can be understood as the superset of all content management and related systems within an organization.
A content inventory is an exhaustive account of all information resources e.g. on a website, or indeed within an entire organization. The results are typically delivered on one or more spreadsheets, with e.g. a URI or other identifier in the first column, the title in the second, author, date, etc.
Content management is the art and science of dealing with the mass of (usually digital) stuff—usually, but not exclusively text—within an organization.
A CMS is a piece of software for organizing and operating over one or more Web properties. Functions include things like applying visual design templates and composing pages, as well as organizing linked files and embedded audiovisual media.
Content negotiation is a long-standing feature of HTTP that affords the varying of representations of a resource at a given URI along a number of different dimensions. Two dimensions of particular interest are content type and natural language.
Content strategy is a broader concept than content management, because it entails more than merely "managing" content, but also concerns the governance, life cycle, technical minutiae, as well as things like style guides for the voice and tone of (usually textual) content.
The content-type of a file or network message is a specific mapping between identifiers and data serialization (file) formats. The identifiers, which are registered with IANA, are used in operating systems and network protocols like e-mail and HTTP to determine what type of file or message is being accessed. This term is also used by content strategists to refer to categories (genres, formats) of documents.
A content-addressable store is a form of opaque blob storage where the method of indexing and retrieval is derived from the content itself. This derivation is almost always one or more cryptographic digests from a chosen set of algorithms.
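A toy in-memory version, assuming SHA-256 as the single chosen digest algorithm. ContentStore is a hypothetical name, and a real store would persist its blobs rather than hold them in a dictionary.

```python
import hashlib


class ContentStore:
    """Store opaque blobs under a key derived from their own content."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # The key is the SHA-256 digest of the content itself.
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


store = ContentStore()
key_one = store.put(b"hello, world")
key_two = store.put(b"hello, world")  # same content, same key
print(key_one == key_two)
```

One consequence of this design is that identical content is only ever stored once, and any change to the content yields a different key.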
In software development, continuous integration refers to both a process and its concomitant infrastructure for deploying software that itself typically runs a networked information service. A continuous integration system can be configured to automatically package and deploy software directly from version control to its target, upon confirmation of a passing test suite.
A controlled vocabulary can be understood as both a document and a database of valid terms for a particular information domain.
A cookie is a state mechanism for HTTP, consisting of a key and an arbitrary value (with some additional metadata), sent from the server to the client. The client then sends this information back to the server with each request.
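The round trip can be sketched with Python's standard http.cookies module. The cookie name and value are placeholders, and no actual network traffic is involved.

```python
from http.cookies import SimpleCookie

# Server side: build the header that hands the cookie to the client.
outgoing = SimpleCookie()
outgoing["session"] = "abc123"
outgoing["session"]["path"] = "/"  # one piece of the additional metadata
set_cookie_header = outgoing.output(header="Set-Cookie:")

# Server side, on a later request: parse what the client sent back.
incoming = SimpleCookie()
incoming.load("session=abc123")
session_value = incoming["session"].value

print(set_cookie_header)
print(session_value)
```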
A corpus is a term used to refer to the totality of a body of content under consideration. A content inventory, for instance, would describe a corpus. Corpus is just Latin for “body”, and tends to be useful to disambiguate when there are a number of different bodies around.
Create, Read, Update, Delete are the four basic operations of a database or data-driven application.
Cybertext is a generalization of the concept of hypertext to include non-deterministic processes.
A data format (or file format, or serialization format) is a set of rules for turning a data structure into a sequence of bytes which can be stored or transported. The term is also used when speaking about instances of such serializations.
Data interchange is the transmission of data across information system boundaries in a manner that keeps the meaning of the data intact.
A data structure is an object that associates together a set of slots (properties, fields, dimensions, columns), where each slot has a meaning, and can hold a value, which can possibly be another data structure.
In the general sense, a data structure definition is exactly what it says it is. In the specific sense, it is a particular class in the RDF Data Cube Vocabulary that determines which columns go in a prototypical tabular dataset, and (optionally) in what order.
A data visualization is a graphical representation of a data structure or set of data structures. Data visualizations can either be quantitative, where the geometry is driven by numerical values, or qualitative, where the geometry is driven by semantic relations between structures.
A data vocabulary (or schema) is a controlled vocabulary with additional rigor, such that it is strict enough to define a formal data structure. The scope includes the names (identifiers) of members and valid sequences, cardinality constraints, valid data types, and so on. A data vocabulary can be a conventional document, but the more sophisticated ones can be fed directly into a computer.
A database is any persistent storage facility for digital information. Databases can be embedded in applications, or be stand-alone online resources accessed over a network. File systems are a form of database, as are relational (SQL) databases and (RDF or otherwise) graph stores.
DOAP is an RDF vocabulary for expressing mainly open-source software projects.
Descriptive statistics are a way to get a sense of the "shape" of data which is already quantitative. On this project, we take the term to mean the five-number summary { sample minimum, first quartile, median, third quartile, sample maximum }, plus the average (arithmetic mean) and standard deviation.
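A sketch of computing these seven figures with Python's statistics module. Two choices here are assumptions rather than anything stated above: quartile conventions vary, and this uses the "inclusive" method; the standard deviation shown is the sample one. The input values are made up.

```python
import statistics


def describe(values):
    """Five-number summary plus mean and (sample) standard deviation."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
    }


summary = describe([9, 1, 8, 2, 7, 3, 6, 4, 5])
print(summary)
```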
A diff is an algorithmic attempt to represent the minimum set of changes between two streams of symbols. More colloquially, a diff is a text file containing changes between one or more other text files across time.
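For the colloquial sense, Python's standard difflib can produce a unified diff between two versions of a text. The file names and contents below are placeholders.

```python
import difflib

# Two made-up versions of the same three-line file.
old = ["alpha\n", "beta\n", "gamma\n"]
new = ["alpha\n", "beta!\n", "gamma\n"]

diff_lines = list(
    difflib.unified_diff(old, new, fromfile="old.txt", tofile="new.txt")
)
print("".join(diff_lines), end="")
```

Lines prefixed with a minus sign were removed, lines prefixed with a plus sign were added, and unprefixed lines are unchanged context.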
A digital asset management system aggregates the assets (such as images) within an organization's content ecosystem and provides an authoritative repository and tools to facilitate the indexing and retrieval of said assets, with considerable advantages over an ordinary file system.
A (data) dimension is the same thing as a "column" in Excel, at least when each column is being treated as a separate field, or distinct kind of thing. In this usage, each row is a record with possibly many dimensions. If the spreadsheet is being treated as a grid instead, then there are only two dimensions: all columns and all rows.
A directed graph is a mathematical object consisting of a set of nodes and a set of ordered pairs that connect the nodes together. Directed graphs are absolutely everywhere in computing.
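The definition translates almost literally into code: a set of nodes and a set of ordered pairs. The node names and helper functions in this sketch are arbitrary.

```python
# A set of nodes and a set of ordered (source, target) pairs.
nodes = {"a", "b", "c"}
edges = {("a", "b"), ("b", "c"), ("a", "c")}


def successors(node):
    """Nodes reachable from `node` by following one edge forward."""
    return {target for source, target in edges if source == node}


def predecessors(node):
    """Nodes that have an edge pointing at `node`."""
    return {source for source, target in edges if target == node}


print(successors("a"), predecessors("c"))
```

Because the pairs are ordered, the edge from a to b does not imply an edge from b to a.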
A document can be understood as an ordered tree of content.
A document fragment, colloquially, is what it sounds like, except in XML (and adjacent) parlance it takes on an additional specific meaning. That is, a document fragment consists of one or more nodes like text or markup elements (tags). Thus, a concrete representation (e.g. a file) of a document fragment would itself be valid markup, and not start or end in the middle of a tag. (XML documents go further and require exactly one root element.)
The document root of a Web server is the directory (folder) associated with the root address of a given website's URL.
DTD (Document Type Definition) is the legacy XML schema definition format, inherited from SGML. It has been supplanted by XML Schema and Relax NG.
A DNS domain name is a mapping of, among other things, IP addresses to (ideally) human-readable words. Domain names are delegated through a hierarchy in the form delegate.principal.top. Domain names form the authority component of a URL, from which root URLs to websites can be programmatically inferred; this causes some to say URL when they mean domain name.
A domain-specific language is a programming language optimized for, or indeed only capable of, some particular category of application. Contrast with a general-purpose language, which can typically be used for anything.
The Dublin Core Metadata Initiative is an early and established vocabulary for modeling various aspects of document metadata.
XHTML is a dialect of HTML implemented in XML. With the exception of a namespace URI and some very minor differences in the syntax, it is otherwise identical to HTML.
XML, while scorned and vilified by contemporary Web developers, is still a usable standard framework for expressing ordered-tree data structures.
FAIR data is data which is Findable, Accessible, Interoperable and Reusable. FAIR began as an initiative in the scientific community to improve the hygiene around experimental data.
The Fifteen Fundamental Properties/Transformations are a conceptual framework invented by Christopher Alexander and described at length in The Nature of Order. They describe certain empirically observable geometric properties in (built) space that are associated with Alexander's conceptualizations of "wholeness" and "life". They also represent structure-preserving transformations that take a region of space from a state of the given geometric property being weakly-expressed to being expressed more strongly.
A file is an opaque blob—a segment of bytes—with a name.
Friend-of-a-Friend is a vocabulary for expressing social networks.
A folder (directory) is a virtual container in, and the mechanism that creates the hierarchy of, a hierarchical file system.
A format string is a convention in numerous programming languages where static text is interspersed with one or more predefined placeholder symbols, which are then substituted with data from the program.
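Python happens to support more than one such convention, which makes for a compact illustration. The template text and the values are invented.

```python
# Named placeholders, filled in with str.format.
template = "{name} has {count} unread messages"
message = template.format(name="Ada", count=3)

# The older printf-style convention, with positional placeholders.
legacy = "%s has %d unread messages" % ("Ada", 3)

print(message)
```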
A software framework is a library and set of tools for creating software. It generally encodes the opinions of its authors, and tends to be incompatible with other frameworks.
Freemium is the neologism given to the business model whereby a restricted set of functionality in (typically) some software-as-a-service product is given to users free of charge with the anticipation that they will convert into paid customers.
In software development parlance, the front-end typically refers to the code used to power the user interface. On the Web, this term is synonymous with JavaScript.
GraphQL is a query language designed at Facebook for remote procedure calls over HTTP.
Haskell is a functional programming language with the unique characteristic that all programs are pure functions with no side effects (such as input/output). Side effects, and indeed a great many other aspects of programming in Haskell, are handled through a creative use of monads.
Heterogeneity in an information system is when the system is made of different technologies, vendors, and platforms.
An information system is homogeneous when it is entirely made using a single platform.
An HTTP redirect instructs a client who has requested one URL to request a different one instead.
Hypermedia is a generalization of the concept of hypertext, to include audiovisual content.
Hypertext is a style of text where segments of content are connected together by links.
HTML is, of course, the lingua franca of the Web.
HTTP is the protocol that carries the World-Wide Web.
HTTPS is HTTP which has been wrapped in Transport Layer Security (TLS) encryption.
The Hypermedia Constraint (erstwhile known as HATEOAS) is a feature of REST whereby, when treating a complying system as a state machine, subsequent states (or rather their identifiers) are embedded in the representation of the current hypermedia resource. A website that shows a page with links nominally qualifies, with the user clicking on the links in the role of the state transition function.
In information systems, an identifier is a symbol, word, or token that is associated with an information resource, that (ideally) uniquely identifies it.
Information architecture is the design of both the topological structure and the semantic content of information systems. The goal of information architecture is to help people situate themselves, understand their surroundings, find the things they're looking for, and even arrange for valuable serendipitous discoveries.
Information infrastructure is the information system that helps a person or organization manage and organize its stocks and flows of information. As with any information system, the information is more significant than whatever technology is used to manipulate it.
An information resource is a distinct, identifiable object that carries information. On a website, a resource is uniquely identifiable by a URI, aka Web address. On our project, the resource is the basic unit of visible progress. Pages are resources, so are images, videos, etc. Resources can also be the interfaces to programs, take parameters, and process input.
Information security is the discipline and practice of ensuring that only the people who are allowed to access a given set of information are the ones who can access it.
An information system can be understood as a system (a whole made of mutually-interacting parts) for storing, operating over, and moving information. While specific information technologies may influence the behavior of the system, how the information system does what it does is secondary to what it does.
Information technology refers to any technology whatsoever that is made to deal with information. The current state of the art of information technology is computers and related electronic hardware. "IT" can also refer to a department in an organization charged with managing said hardware (and its related software).
In software development, I/O is a first-order physical resource constraint, along with CPU, memory, and storage.
Intelligent heterogeneity is the principle that begins with a standing assumption that an information system is always going to drive toward a heterogeneous state, and therefore this should be accommodated and planned for.
An interface can be understood as a junction between two or more systems or media. Interfaces are closely related to protocols, with a possible distinction that an interface is biased toward space, while a protocol is biased toward time.
The ISBN is the quintessential identifier for books.
In programming, an interpolated (concatenated) string is a string of text that has been composed from multiple parts.
An Intranet is a synonym for a local (e.g. office) computer network. Colloquially, it refers to an internal website or Web application.
An isomorphism can be understood as a perfect, 1:1 relation between two sets, such that every element in one set has exactly one counterpart in the other. In computing, an isomorphism is the pair of operations that transform the elements from one set to the other. For example, the "zip" and "unzip" algorithms, together, form an isomorphism.
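As a stand-in for "zip" and "unzip", the sketch below uses the compress and decompress functions from Python's standard zlib module. The sample payload is arbitrary.

```python
import zlib

# An arbitrary, repetitive payload.
original = b"glossary " * 20

packed = zlib.compress(original)    # one direction of the pair
restored = zlib.decompress(packed)  # the inverse direction

print(len(original), len(packed), restored == original)
```

The compressed form looks nothing like the original, yet decompressing it recovers the original exactly: no information is lost in either direction.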
Java is a programming language and virtual machine designed for "write once, run anywhere" cross-platform targeting, initially developed by Sun Microsystems in the 1990s, which was subsequently acquired by Oracle. Java appears to be the language of choice for academics and large corporations, which may account for its strong representation in RDF tools.
JavaScript is the de facto scripting language embedded in Web browsers to provide interactive functionality to websites, and since about 2008, has been gaining momentum as a general-purpose programming language. JavaScript is so named to capitalize on the momentum of Java in the 1990s; while the syntax is superficially similar, the two languages are otherwise unrelated.
JSON (JavaScript Object Notation) is a popular data interchange (serialization) format (syntax) for structured data.
JSON Schema is an independent effort to ascribe integrity constraints and hypermedia semantics to JSON. It has a superficial resemblance to JSON-LD, and is about the same age, but is otherwise unrelated.
JSON-LD is a particular configuration of JSON (i.e., it is a serialization syntax) that enables the encoding of Linked Data (RDF).
A JSON-LD context is a piece of metadata that enables the decoupling of JSON-LD terms from RDF terms, enabling JSON-LD to do double-duty as vanilla JSON.
The Java Virtual Machine can refer to either the specification for, or an implementation of, the virtual machine that runs Java.
A thousand lines of code very roughly correlates to (at least no less than) a certain quantity of software development effort, which will nevertheless depend on many other factors to yield an intelligible number.
A lambda can be understood as a receptacle and execution environment for a Web hook or microservice that is billed (by Amazon) on a per-use basis.
In REST parlance, a layered system is when there are multiple waypoints (such as reverse proxies) between an origin server and a client.
A legend is a mapping between a set of entities and a set of features in a map or data visualization. Often the legend relates the entities to a set of colours.
Levels of Scale are one of the fifteen fundamental properties and transformations Christopher Alexander described in his magnum opus, The Nature of Order.
Link rot is the colloquial term used to refer to the tendency for URLs to become decoupled from the information resources they reference.
A local network is a computer network contained within a house, office, or organization.
A log, when referring to data, is a list of entities that accumulate over time.
Machine learning refers to the use of statistical methods to program computers. It is typically what people mean by the term "artificial intelligence".
All digital data segments are in principle machine-readable; the question is to what extent the machine can do anything meaningful with them. A CSV file, for example, is machine-readable such that it can be read into a spreadsheet program, but contains no embedded semantics beyond the fact that the lines signify rows in a grid, and the commas signify columns. In other words, it can be displayed, but the actual contents of the CSV are meaningless without some outside cue.
Markup is a general strategy for data representation (particularly but not necessarily text) where data structure and semantics are embedded into the content, as opposed to overlaid onto it.
A markup transform differs from an asset transform in that it operates over markup, and often adds information (such as presentation markup) or composes multiple resources together, rather than strictly isomorphic or lossy transformations.
The matrix resource, in the context of this project, is a program which generates a tabular grid representing (usually) 2-dimensional data, based on a set of input parameters. Instantiations of the matrix resource provide the data for visualizations in this project.
Metadata is data about data, or in the context of Web content, could be any structured data associated with a webpage.
In object-oriented programming, a method is the term for a subroutine bound to an object (or its class), and therefore has access to the object's state.
Microcontent refers to the text in software systems on e.g., buttons, labels, tool tips, and error messages.
Microdata refers to a set of attributes in HTML 5, distinct from and incompatible with RDFa, that provide some functionality for embedding metadata.
Microformats are an early ad-hoc strategy for embedding structured metadata into HTML, by punning standard elements and attributes. Microformats 2 has expanded and generalized this technique somewhat.
As the saying goes, a monad is a monoid in the category of endofunctors.
MySQL is a popular open-source database product. Its main competitors are PostgreSQL or SQLite (depending on the application).
Notation 3 is a terse, easy-to-type RDF syntax with an additional facility for expressing inference rules. N3 is a superset of Turtle.
A network protocol is the formal grammar and semantics of communication over (in our case) a digital network. We can think of a network protocol as an analogous design problem to a file format, in that it's some definite representation of data, albeit with an emphasis on transporting it rather than storing it. Network protocols also tend to (but do not always) model some kind of conversational state between two or more systems, and are therefore potentially more complex to specify than a data format.
A networked information service is an umbrella term for systems where computation, to the extent that there is any, happens at the site of the service's operator, and only the resulting information is conveyed over the network.
OAuth (Open Authorization) is a token-passing protocol that enables users to delegate to a bearer (e.g. an application or website) their access rights to a different website.
An object-relational mapper is a software library that abstracts SQL interactions under an object-oriented interface. From the point of view of the programmer it's just another API in the native language, no messing about generating SQL query strings or parsing result sets.
An ontology (in the sense of computing and information science) differs from a taxonomy, in that a taxonomy organizes categories through a fixed set of (often hierarchical) semantic relations, whereas an ontology also affords the definition and meta-relation of semantic relations themselves. (this definition is bad; make a better one someday)
We can say an information resource is opaque when the information system that surrounds it either cannot or does not inspect its contents. Opaque resources are typically stored, retrieved, and conveyed by the system as-is, and have a canonical literal representation from which all other representations are derived.
OpenID Connect is an authentication protocol built on top of OAuth. It provides a standardized mechanism for verifying the identity of the user delegating the token.
Organizational (institutional, corporate) memory is any data, information, or knowledge that an organization has to "remember" in order to function. In practice, it includes the amalgamation of information infrastructure that affords the people within the organization to access, maintain, and operationalize this memory.
In a layered system, and in particular with Web systems, an origin server is one that is an authoritative source for content. This is in contrast to something like a (reverse) proxy, which only acts as an intermediary and houses no content of its own.
OWL (Web Ontology Language) is a framework for expressing ontologies for automated reasoning applications. While not strictly subordinate to RDF, OWL has an incarnation as an RDF vocabulary, which is a superset of RDF Schema. In this form, OWL ontologies make it possible to express stipulations of cardinality on properties, term reconciliation and equivalencies between classes and properties, and a number of other desirable capabilities.
A pure function is a procedure that takes an input and returns a value without causing any side effects. (The qualifier is a necessary distinction in computing; in mathematics, a function is just a relation, and inherently “pure”.) This makes pure functions very predictable and well-behaved development targets. Like their mathematical counterparts, they are therefore composable.
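As a minimal sketch of that composability, the following Python builds a new function out of two pure ones. The names compose, double, and increment are invented for this illustration and come from no particular library.

```python
from functools import reduce
from typing import Callable


def compose(*functions: Callable) -> Callable:
    """Return a function that applies the given functions right to left."""
    return reduce(lambda f, g: lambda x: f(g(x)), functions)


def double(n: int) -> int:
    # Pure: the result depends only on the argument; nothing else is touched.
    return n * 2


def increment(n: int) -> int:
    # Pure for the same reason.
    return n + 1


# Because neither function has side effects, the composition is as
# predictable as its parts.
double_then_increment = compose(increment, double)
print(double_then_increment(5))  # 11
```

Swapping the argument order gives increment-then-double instead, which for the same input yields 12.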
"Pace layers" is a conceptual framework advanced by the writer Stewart Brand, taking Duffy's concept of shearing layers and applying it to the societal scale.
A parametric (information) resource can be thought of as a single logical resource that yields one or more distinct representations based on supplied parameters. In this way it is very much like a (mathematical) function.
A password is a secret intended to be known only by the person (or people) who are authorized to access an account.
In URLs, path segments are the individual slugs that appear between the slash (/) characters. They are intended to (and sometimes do) represent directory paths, though on the Web their relationship is certainly more tenuous.
Perl is a general-purpose interpreted programming language designed by Larry Wall. It was the lingua franca of server-side programming for at least the first decade of the Web.
PHP is a popular interpreted programming language for making web pages.
Presentation markup is the subset of markup that forms the basis for (e.g.) a document's presentation. In contrast to structural markup, presentation markup can be removed from the document without changing the meaning of the content.
A privilege, in the context of access control, is the ability to access or manipulate a particular information resource. Privileges are granted to accounts, or to security groups.
Code profiling is the technique of attaching sensors to running code to determine which parts are running at suboptimal efficiency. Profiling, among other things, consumes AST and produces a call graph.
A programming language is a formal language in which software is written.
Protégé is the go-to editor for OWL ontologies.
A proxy, in information system parlance, is an intermediary between a client and an origin. Proxies may simply convey information without changing it, or they may perform additional operations on the information in transit.
PR is the business you get into when you get let go from your journalism job. Not to be confused with pull requests.
The other PR is a process pioneered by GitHub (and quickly copied by its competitors) to formalize the process of collaborative software development. The applicant opens a pull request against the maintainer's code repository with their proposed changes, and the latter can review and accept it (or not).
Python is an object-oriented programming language by Guido van Rossum, initially designed for teaching programming. It is very similar in performance and capabilities to Perl (albeit with some notable improvements), and slightly younger. Python is currently (2019) gaining attention in data science and machine learning applications.
The query component of a URI (everything after the first question mark and before any fragment identifier) is by convention ascribed additional semantics in some contexts, whereby it is taken to represent a set of key-value pairs. A query parameter (key=value) is one of those pairs.
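For illustration, Python's standard urllib.parse module can pull the query component out of a URI and apply the key-value convention to it. The URL below is a made-up example.

```python
from urllib.parse import parse_qsl, urlsplit

# A hypothetical URL with a query component and a fragment.
url = "https://example.com/search?q=pace+layers&page=2#results"

parts = urlsplit(url)
print(parts.query)     # q=pace+layers&page=2
print(parts.fragment)  # results

# Interpreting the query as key-value pairs is a convention, applied here
# by parse_qsl; the "+" is decoded as a space under that convention.
params = parse_qsl(parts.query)
print(params)  # [('q', 'pace layers'), ('page', '2')]
```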
R is both a language and programming environment designed for statistical computing.
R2RML is an RDF vocabulary that maps between RDF and relational (SQL) databases.
RDF Schema is an RDF vocabulary for expressing RDF vocabularies. Compared to its superset, OWL, RDF Schema is bare-bones, only specifying classes, properties, subclass and subproperty relations, and relations denoting the domains and ranges of properties.
RDF/XML is the XML-based (and first) concrete syntax for RDF.
RDFa (RDF in Attributes) describes a set of markup attributes that carry RDF data, which can be embedded in HTML or an XML host language. This enables the semantic data to piggyback on the markup structure and often reuse the same content.
A reasoner is an important piece of software (usually manifested as either a library or a service) that performs inferences over RDF data: it uses inference rules generated from RDF Schema, OWL, or some other mechanism, to supplement an RDF graph with inferred statements based on asserted statements.
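A toy sketch of what a reasoner does, assuming triples are plain Python tuples of prefixed-name strings and only two RDF Schema rules are in play (subclass transitivity and type propagation). Real reasoners handle far more; the ex: terms are invented for the example.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def infer(asserted):
    """Supplement asserted triples with inferred ones until nothing new appears."""
    graph = set(asserted)
    while True:
        new = set()
        for s, p, o in graph:
            if p != SUBCLASS:
                continue
            for s2, p2, o2 in graph:
                if p2 == SUBCLASS and s2 == o:
                    # s subClassOf o, o subClassOf o2 => s subClassOf o2
                    new.add((s, SUBCLASS, o2))
                if p2 == RDF_TYPE and o2 == s:
                    # s2 type s, s subClassOf o => s2 type o
                    new.add((s2, RDF_TYPE, o))
        if new <= graph:
            return graph
        graph |= new


asserted = {
    ("ex:Dog", SUBCLASS, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS, "ex:Animal"),
    ("ex:rex", RDF_TYPE, "ex:Dog"),
}
inferred = infer(asserted) - asserted
print(("ex:rex", RDF_TYPE, "ex:Animal") in inferred)  # True
```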
A record is typically represented as a single row in a database or spreadsheet. It corresponds to one entity.
A relational database is just another term for a database built on top of SQL.
Relax NG is a schema definition format for XML, which is considerably lighter-weight than XML Schema, and can be expressed using a compact syntax.
A representation (of a Web resource) refers to the literal sequence of bits that comes out of an HTTP response. It could be a file, or it could be the output of a program. A resource can have multiple representations that vary along a number of dimensions (e.g. language, content type). The process of choosing a representation of a resource is called content negotiation.
REST (Representational State Transfer) is an architectural style of (originally but not necessarily confined to Web) software, wherein a resource (identified, e.g., by a URI) is considered to be a state in a state machine, and its representation (content) contains the list of subsequent states (i.e., links to other URIs). REST has a considerable amount in common with Linked Data.
A request-URI, in HTTP parlance, is specifically the part of an HTTP URL after the authority and before the fragment, constituting only the path and the query components. The request-URI is sent to the server as part of the resource request.
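A rough sketch of deriving the request-URI from a full URL using Python's standard library. The function name request_uri is hypothetical, and the sketch ignores edge cases such as an empty query after a bare question mark.

```python
from urllib.parse import urlsplit


def request_uri(url: str) -> str:
    """Return the path and query of a URL, dropping authority and fragment."""
    parts = urlsplit(url)
    path = parts.path or "/"  # an empty path is requested as "/"
    return f"{path}?{parts.query}" if parts.query else path


print(request_uri("https://example.com/a/b?x=1#frag"))  # /a/b?x=1
print(request_uri("https://example.com"))               # /
```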
RDF is a W3C open standard framework for the encoding of data semantics. RDF uses URIs to uniquely identify information resources and the semantic relations between them. It is important to understand that RDF is not a syntax; there are many syntaxes available that will encode RDF.
Responsive design is a Web design technique that focuses on visual elements which can be stretched, squashed and resized.
A reverse proxy is an intermediary between a client and an origin that has been arranged to operate without explicit attention to its existence. In general the difference between a proxy and a reverse proxy is that the former has been explicitly accessed while the latter is implicit.
RDF Site Summary 1.0 is a feed definition format based on RDF.
Really Simple Syndication (2.0) is an XML vocabulary designed by Dave Winer to compete with RSS 1.0. It has no genealogical relation to RSS 1.0.
Ruby is an object-oriented interpreted programming language by Yukihiro Matsumoto. It is very similar in capabilities and performance to both Perl and Python (and the youngest of the three). Ruby is a popular server-side programming language for Web applications.
Rust is a new programming language which aims to be a modern replacement for C. The goal is the speed of C with a level of program safety (i.e., from undefined behaviour and concomitant attacks) completely unattainable by C. Many of the developers of Rust also work on WebAssembly, a target to which Rust can compile.
The dev team is the group (or potentially one of many) within an organization that is principally responsible for creating software.
The ops team is responsible for running the software written by the dev team. Sometimes this is the same team (DevOps). Ops is also typically responsible for information security and SRE.
A safe language is a data format or programming language whose entire range of possible behaviour can be deduced through static analysis. Safe languages are therefore not Turing-complete. Sandboxing is a popular, though not completely reliable strategy for dealing with unsafe languages.
Sandboxing is the act of creating a hermetic execution environment for a software program, with the aim of limiting its access to resources, or otherwise controlling its behaviour. Sandboxing is a necessary step for handling Turing-complete languages whose behaviour is unknown and potentially untrustworthy, such as any JavaScript downloaded from the internet.
A search engine is a company (and concomitant website) that helps people find other websites.
SEO is the sacred cow of internet marketing. Nothing bad is to be said about SEO.
The Semantic Web is an unrealized and likely unrealizable utopia of decentralized linked data systems with smart agents performing inferencing over information they find on the Web. It nevertheless boasts a raft of very useful technologies for everyday practical applications.
A server is a computer that provides services to multiple clients over a network. Aside from this, and the fact that it is typically made from more expensive parts, it is no different in principle from an ordinary desktop computer.
The Shape Constraints Language is a mechanism for expressing integrity constraints in RDF data.
"Shearing layers" is a conceptual framework coined by the architect Frank Duffy to refer to aspects of a building which vary by orders of magnitude in their sensitivity to time.
A single source of truth is an idiom in information system design, whereby a single logical system is deemed authoritative over a set of information, and measures are taken to ensure this remains the case.
SIOC (Semantically-Interlinked Online Communities; pronounced "shock") is an OWL ontology for representing metadata related to blogs, CMSes and social media websites.
SRE is the (often dedicated) function of keeping large sites online.
In mathematical logic, skolemization is the removal of existential quantifiers from a formal statement. In RDF, it means turning a blank node into an isomorphic (and potentially dereferenceable) URI node.
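A minimal sketch of the RDF sense, assuming triples are tuples of strings in which blank nodes carry the conventional "_:" prefix and literals are written with their quotation marks. The example.com authority is a placeholder; the /.well-known/genid/ path is the one the RDF 1.1 specification suggests for this purpose.

```python
import uuid

# Hypothetical authority for the minted URIs.
GENID_BASE = "https://example.com/.well-known/genid/"


def skolemize(triples):
    """Replace each blank node with a freshly minted URI, consistently."""
    mapping = {}

    def swap(term):
        if term.startswith("_:"):
            # The same blank node label always maps to the same URI.
            if term not in mapping:
                mapping[term] = GENID_BASE + str(uuid.uuid4())
            return mapping[term]
        return term

    return [(swap(s), p, swap(o)) for s, p, o in triples]


triples = [
    ("_:b0", "http://xmlns.com/foaf/0.1/name", '"Alice"'),
    ("_:b0", "http://xmlns.com/foaf/0.1/knows", "_:b1"),
]
for triple in skolemize(triples):
    print(triple)
```

The shape of the graph is unchanged; only the node labels differ, and the minted URIs could be made dereferenceable by serving something at those addresses.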
SKOS (Simple Knowledge Organization System) is an OWL ontology for representing concept schemes and/or thesauri, which are particular kinds of taxonomies.
A slug is a string of text used as an identifier, typically without any whitespace. The term is derived from hot metal typesetting.
A snapshot, when referring to data, captures the state of a set of entities at one particular instant. Unlike a log, which is cumulative, a snapshot replaces the contents of any snapshot that comes before it.
Software development is the making of software, which I trust the audience is familiar with.
An SDK is a framework and attendant tooling and other resources for creating software for a given platform.
SaaS is a special, albeit common, case of utility (cloud) computing, in which access to some particular functionality—often but not exclusively user-facing application software—is what is being rented, rather than just the raw computing resources. Examples of SaaS are services like GMail, Twilio, or Salesforce.
The SPARQL query language is an analogue of SQL for RDF data.
A specificity gradient is the idea that a conceptual structure can be organized by increasing detail and specificity, from plain language to highly specialized concepts, until they can ultimately be operated over by a computer.
Structured Query Language is a standardized fourth-generation programming language for accessing and manipulating relational databases.
In computing, state can be understood as the sum total of all the information needed to reconstruct a process at a particular instant. This in principle makes state an observable, describable, quantifiable object, and potentially an addressable information resource in its own right.
Static code analysis involves parsing software code out to AST and tracing the various references throughout. It is less resource-intensive and less dangerous (when the code is not trusted) but ultimately less powerful than generating a call graph, because it only identifies code that may be run (and even then potentially not even all of it), instead of code that actually is run.
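To make the "may be run" caveat concrete, here is a small sketch using Python's standard ast module. The analyzed SOURCE snippet is invented for the example and is parsed but never executed.

```python
import ast

# Untrusted or unknown code can be inspected without running it.
SOURCE = '''
def helper():
    return 1

def main():
    if False:
        helper()
    print("done")
'''

tree = ast.parse(SOURCE)

# Collect the names of everything that is called by simple name.
called = sorted({
    node.func.id
    for node in ast.walk(tree)
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
})
print(called)  # ['helper', 'print']
```

The analysis reports helper as called even though the branch containing the call can never execute; only observing the running program would reveal that.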
An SKU is an identifier for a particular product in a company's inventory.
In programming, a string literal is a constant that contains a string of text.
Structural markup can be understood as the subset of markup which exclusively describes the topological structure of (e.g.) a document and its semantics, i.e. all the parts and what they all mean. This is in contrast to presentation markup.
Structured argumentation is a form of (potentially collaborative) reasoning and persuasion that takes place in a formal system with a constrained repertoire of operations.
A DNS subdomain is precisely that delegated symbolic namespace, the delegate in delegate.principal.top. In effect, all (available) domain names are technically subdomains.
In the version control system Git, a submodule is a separate Git repository that is linked to the one in context. Other version control systems exhibit analogous features, though they may go by different names.
In programming, a subroutine is a procedure that has been demarcated from the rest of the program somehow, often (but not exclusively) by giving it a name.
Scalable Vector Graphics is an XML vocabulary for doing what it says on the tin.
The symbol management problem is roughly that of having too many symbols to manage: object keys, enumerated values, variables, class names, method names, URL path segments and query parameters, CSS selectors, etc.
A syndication feed is a list of Web resources which is polled for updates.
A taxonomy is a flat, hierarchical (i.e., tree), set-theoretic, or graph-theoretic structure relating a set of categories to one another. Examples: the Linnaean Taxonomy relates categories of life form, NAICS relates categories of industry, and the Dewey Decimal System relates categories of books.
Enterprise taxonomy software is purpose-made for designing and managing categories of (e.g.) products within an organization.
Term reconciliation is a process conceived in RDF circles, but not limited to them. It is the act of identifying terms (think keys in map-like data structures) from one information system to another.
TeX is a quasi-declarative macro processing language designed by Donald Knuth in the 1970s for the purpose of typesetting (in particular mathematical) documents.
A thesaurus is a specific kind of concept scheme, insofar as it organizes terms (words and phrases), rather than concepts which need not necessarily be labeled.
A time series is any data set where one of the dimensions is time.
Transclusion is the seamless embedding of one segment of content inside another.
A transform is a pure function that operates over a segment of bytes and returns another segment of bytes. An example would be the gzip and gunzip functions. Not to be confused with the machine learning concept of a transformer.
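The gzip pair can be sketched with Python's standard gzip module. One caveat worth labelling: gzip.compress embeds a timestamp by default, so this sketch passes mtime=0 (available since Python 3.8) to make the same input always yield the same output.

```python
import gzip

original = b"the same bytes, over and over. " * 8

# Bytes in, bytes out; with a fixed mtime there is no hidden input.
compressed = gzip.compress(original, mtime=0)
restored = gzip.decompress(compressed)

print(restored == original)                             # True
print(compressed == gzip.compress(original, mtime=0))   # True
```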
An information resource can be said to be transparent if the information system surrounding it can, or indeed must, inspect its contents in order to address them. Transparent resources typically do not have canonical representations as all (lossless) representations are equivalent. Contrast with opaque resources.
A programming language is Turing-complete, roughly, if it is capable of self-reference, and thus capable of redefining its own behaviour. It is mathematically impossible to determine, in general, whether a program written in such a language will carry out some particular (e.g., malicious) process at some point in the future, or under what conditions, and thus such programs can never be considered completely safe.
Turtle is another terse RDF syntax that is basically N3 minus the inference rules.
UTF-8 is the ubiquitous encoding format that represents Unicode codepoints as variable-length sequences of one to four bytes.
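A quick way to see the variable length is to encode codepoints of increasing magnitude and count the bytes. The helper name utf8_length is made up for this sketch.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes UTF-8 uses to encode a single codepoint."""
    return len(ch.encode("utf-8"))


# Latin capital A, e-acute, the euro sign, and an emoji, written as
# escapes so the example does not depend on the file's own encoding.
samples = ["\u0041", "\u00e9", "\u20ac", "\U0001F600"]
for ch in samples:
    print(f"U+{ord(ch):04X}", utf8_length(ch))
# U+0041 1
# U+00E9 2
# U+20AC 3
# U+1F600 4
```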
In REST parlance, a unified interface is exactly what it sounds like: all parts of the system speak the same language.
A uniform resource identifier is exactly what the name says: an identifier (text symbol) for (information, chiefly) resources which is (syntactically) uniform.
A uniform resource locator is a kind of URI which specifies a protocol-specific network resource which can be dereferenced, that is to say accessed.
A uniform resource name is a kind of URI which occupies the "urn:" scheme, and is used to refer to resources whose dereferencing mechanisms are well-known, unspecified (e.g. urn:isbn:9780262691918), or do not exist at all. Contrast with URLs, which encode both how (by scheme) and where (by network address) to obtain the resource.
The UUID is a 128-(actually 122, after you remove the control bits)-bit identifier that is particularly useful for naming things when you don't want to spend time coming up with a name for them. By various mechanisms, not the least of which being their size, UUIDs are all but guaranteed to be unique. Hence the name.
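For a concrete look, Python's standard uuid module can mint a random (version 4) UUID. In that version six bits are fixed to mark the version and variant, which is where the figure of 122 comes from.

```python
import uuid

u = uuid.uuid4()  # random UUID; no name had to be invented

print(len(u.bytes) * 8)             # 128
print(u.version)                    # 4
print(u.variant == uuid.RFC_4122)   # True

# 128 bits minus 4 version bits and 2 variant bits leaves 122 random bits.
print(len(u.bytes) * 8 - 4 - 2)     # 122
```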
An upper ontology is intended to tie together domain-specific "lower" ontologies into a unified frame.
UX design, erstwhile called interaction design, is the strategic discipline of crafting the process of using a (often but not exclusively software) product. It is closely related to information architecture and content strategy.
A user flow is one discrete process undertaken by a user, mediated by a product, to achieve a particular goal.
A user interface is an interface that comes in direct contact with a human.
User-generated content is exactly what it sounds like: content on a networked information service platform generated or sourced by that platform's users, as opposed to the platform's operators.
A username (user name) is a token or identifier that represents the computer account of a person (or automated software agent).
A variant, in the context of content negotiation, is a particular representation of a resource that occupies a region in the space of all its available dimensions.
In software development, version control is the process of, and attendant tooling for, preserving the changes to a set of files over time.
A virtual machine is a computer within a computer. It serves two main purposes: one is to unify the underlying hardware interface from the point of view of the software, such as with Java, and another is to run as a "guest" on top of one, or possibly many, "host" computers.
A virtual server is a computer that runs inside another computer, meaning no additional hardware is necessary.
A web API is an API that has been implemented over HTTP. Web APIs often use JSON to transfer payloads and give a passing nod to REST.
A Web application is a software application which uses the World-Wide Web as a user interface. A Web application can be an entire website, or a cluster of Web resources embedded within a website, or even a single Web resource.
Crawling is the act of an automated agent mapping out the Web by following links. Google famously crawls the Web for its search engine. Web crawling software can be quite complex and sophisticated in order to deal with copious confounding factors on the open Internet. Under ideal conditions, content inventories can be generated by crawling.
A Web hook is a target for remote information systems, identified by URL, for accepting event notifications.
A Web property is another, smaller umbrella term for networked information services that operate exclusively over the Web, such as ordinary websites, and Web APIs.
A Web Resource is any information on the World-Wide Web which is distinctly identifiable by an address, or Uniform Resource Identifier. A Web resource can take a variety of forms, including a conventional document file, a document fragment, or an interface to a software program.
A Web template is typically some kind of domain-specific language (and instances thereof) that provides instructions for attaching presentation markup to structural markup.
WebAssembly is an assembly language for virtual machines such as those found in a Web browser. Being an assembly language, it is a compilation target for any ahead-of-time compiled programming language. The net effect of WebAssembly is that a Web browser is now a general-purpose computing environment that can run arbitrary programs.
WebID is a protocol that weds RDF to X.509, embedding links to the former in client certificates of the latter. A server looking to identify a client can access the link and download profile information, as well as a round-trip verification of the identity of the client.
A website can be understood in one aspect as a service, which runs on a server. In another aspect, it is an authoritative location—one of many locations on the World-Wide Web—for information resources, such as web pages and interfaces to software applications.
Website navigation, as a noun, refers to the data structure that encodes the hierarchical groupings of pages on a website.
In text processing, whitespace refers to the set of codepoints that are interpreted as breaks in the text. In graphic design, whitespace is the literal white (of the paper) space between elements on the page.
Wicked problems, a term coined by Horst Rittel, refer to a category of problems which are complex, dynamic, with multiple stakeholders, and often involve considerable trade-offs. The quintessential wicked problem is climate change.
The World-Wide Web is a particular implementation of Hypermedia, which runs on top of the internet.
XML Schema is the consummately baroque syntax definition specification for XML, intended to replace the senescent DTD, that was resisted tooth and nail until it was supplanted by Relax NG. The only thing that remains of XML Schema with any currency is its second volume, on literal datatypes (which is actually essential).
XUL is an XML vocabulary, developed by Mozilla, for describing graphical user interfaces.