Question 1

What is the problem solved by Dymium?

Accepted Answer

Dymium solves the problem of providing data securely to humans, applications, and AI entities.

Question 2

Why is securing AI important?

Accepted Answer

Prompts that go into LLMs potentially leak sensitive information. This sensitive information can be later used for training, and is ultimately discoverable in the trained model. Besides, the history of conversations with LLM can also reside with the vendor, and can, and has been stolen.

Question 3

How are you different from other security vendors?

Accepted Answer

People have been trying to use LLMs to secure LLMs. This approach is fundamentally flawed as LLM can always be prompt-engineered to leak data or to disobey the previous instructions or training.  Dymium approaches the problem from the tried and true traditional security principles - we prevent sharing sensitive data with the LLM in the first place!

Question 4

Data Security is an overused term. In what sense is data security used here?

Accepted Answer

Dymium acts as an intermediate layer called Ghost Layer between the original data sources and consumers. The consumers only see a subset of the data allowed by policy, and selected data elements can be transformed. For example, PIIs can be blocked, redacted, or obfuscated, that is replaced with synthetic substitutions. This takes care of data leakage prevention, and compliance.

Question 5

How does Dymium connect to my data?

Accepted Answer

We introduce a concept of data source, comprising data bases, files, API providers, document storages, etc.  To ensure security, we analyse the data structure, and detect the presence of sensitive data in this structure specifically for the particular data source.

Question 6

What are the data sources supported by Dymium?

Accepted Answer

Dymium supports most popular relational databases: PostgreSQL, MySQL, MariaDb, MS SQLServer, Oracle, IBM DB2, as well as MongoDB, Elastic Search, Parquet, CSV, JSON on S3, Parquet, CSV, JSON via SFTP, Amazon RedShift, YugaByte, CockroachDB. We also work with SalesForce, HubSpot, SnowFlake and DataBricks APIs as data sources. More data source types are coming online soon.

Question 7

I have a proprietary API that I want to extract data from. Can I do it with Dymium?

Accepted Answer

An arbitrary REST endpoint can also be connected as a data source. As long as you can hit it with a curl and get a JSON recordset back!

Question 8

So what entities are the data consumers for the Ghosts?

Accepted Answer

GhostDb presents itself as a PostgreSQL database. When you run the Dymium Tunneling client, it is available on a loopback adapter for any application running on your computer. Those apps are data consumers, as well as people writing Python scripts. GhostAI is a safe Chat interface for the corporate users.  It never sends PIIs in prompts to the commercial LLMs. Instead they are detected, substituted with synthetic data, sent to LLM safely, and the answers are reconstituted. In this case you have people as AI-generated data consumers. GhostAI can be integrated with the GhostDBs to provide seamless Business Intelligence capabilities. You simply ask a question that requires the corporate data to answer, and get a number, or a plot, or a table! The data never goes to the public LLM. We only send the metadata, and get back the recipes, which are executed in a secure sandbox. People are the data consumers. GhostAPI is a way to create and instantiate REST APIs that use GhostDBs with prompts. This is the ultimate no-code experience for API developers! New applications are data consumers. GhostMCP aggregates GhostAPIs, and provides a discovery service for them. The data consumers in this case are the AI Agents GhostLLM is a safe interface to commercial LLMs. All the security protection is seamlessly integrated. This is also for the AI Agents GhostFiles provides a clientless interface to files mounted from SMB, S3, SFTP or Google Drive. You can set up a policy that will redact the PIIs on the fly. People are the consumers. GhostFiles also allows to index a directory of reasonably uniform files, for example, invoices. The result will be a Data Source with the schema suggested by LLM, ready for analysis.  A GhostDB built from such a data source can be used in all scenarios above.

Question 9

Is Dymium platform a SaaS?

Accepted Answer

Yes, Dymium platform is a SaaS based in AWS cloud.

Question 10

Can it be deployed on-prem though?

Accepted Answer

Yes, Dymium Platform can be packaged as an on-prem, or a private cloud solution. All it needs is a Kubernetes cluster!

Question 11

Can Dymium be deployed in an air-gapped environment?

Accepted Answer

Yes, Dymium can be deployed in an air-gapped environment. We provide Zarf archives in a UDS bundle for installation.

Question 12

How does Authentication and Authorization work in Dymium?

Accepted Answer

Dymium integrates with your Identity Provider, such as Okta, EntraID, Ping or Keycloak using the OIDC protocol.  Authorization is based on group membership. We can also authenticate against Google Workspace and Office365.

Question 13

How do Groups work in Dymium?

Accepted Answer

The Dymium platform relies heavily on groups as the primary method for controlling permissions and access, using a whitelist approach. Data access in the form of GhostDBs, GhostAPIs, etc  is associated with specific groups, enabling fine-grained access control for different sets of users.

Question 14

How does Dymium help Data Engineers, and Data Analysts?

Accepted Answer

Dymium provides an instant ETL pipeline.  Data Engineers can create Ghost Databases with real time tokenization, and supply real time datasets for Data Analysts. Data Analysts can access data either as downloads in a web portal, or using Dymium Tunneling Client as a PostgreSQL database local to their computer.

Question 15

What is a Dymium Tunneling Client?

Accepted Answer

Dymium Tunneling Client is an application built for Windows, Mac OS and Linux. It authenticates a user using a web browser, establishes a tunnel to Dymium, and provides port forwarding to the Ghost Database with complete SQL access.

Question 16

Can Ghost Database be composed of multiple data sources?

Accepted Answer

Absolutely. While a Ghost Database will look as a single database to the consumer, it can consist of multiple backend databases and database types hosted in different regions and/or cloud providers. For example you can execute a join between a table hosted on Oracle on AWS in Europe, a table hosted on MySQL on Google GCP in the US and a CSV file hosted on an sFTP server hosted locally.

Question 17

Can I rename data source tables and columns?

Accepted Answer

Yes, absolutely. You can also change the data types when it makes sense.

Question 18

Does obfuscation present a problem for such joins?

Accepted Answer

No, obfuscation is consistent between multiple data sources within a session.

Question 19

How does Dymium increase data security while using LLMs?

Accepted Answer

Dymium offers GhostAI - a safe interface to chatbots from OpenAI, Google and Anthropic. This interface is auditable, allows to scrub PIIs from user's prompts and seamlessly reconstitute LLM replies. A CISO does not have to worry about sensitive data ever being leaked to the providers. You can also specify system prompts and guardrails policy, and block source code.

Question 20

Does GhostAI support images, documents and audio?

Accepted Answer

Yes it does. GhostAI detects PIIs in these data formats and blocks them.

Question 21

How does Dymium detect PIIs?

Accepted Answer

Dymium utilises a proprietary model based on a combination of rules and a fine tuned BERT neural network. In addition, a user can define a regular expression, or upload a file with a list of explicit entities.

Question 22

Can I avoid using a commercial LLM completely?

Accepted Answer

Yes, you can use a secure domestic private LLM run by Dymium.

Question 23

Can I integrate GhostAI in my own applications?

Accepted Answer

Yes, Dymium offers GhostLLM - a OpenAI-style LLM endpoint for GhostAI.

Question 24

Can GhostAI integrate with corporate data securely?

Accepted Answer

Yes, in two ways. A set of documents can be uploaded, or connected from a drive and used in RAG. This can be very effective in case of compliance documents, such as rules. Portions of these documents are fetched based on their relevance to your prompt, and the LLM can integrate this information in its response. Importantly, any such document will be clearly referenced in the response so you can verify or find the source for the answer. Dymium has an AI Agent implemented under the hood which can use Ghost Databases available to a given user to answer relevant questions. By design, the Agent will never be able to see your data as part of its workflows and it will never be sent to a public LLM. We only send metadata, and receive recipes, such as SQL queries and Python code that is executed in a secure sandbox environment.As a result, you can get metadata, specific values, and graphs in response to your prompt.

Question 25

So does Dymium support RAG?

Accepted Answer

RAG stands for Retrieval Augmented Generation. Our GhostAI uses corporate data in the form of files, databases, APIs with Large Language Models.  In particular, there is a built in vector database that indexes files uploaded specifically for RAG, or marked for RAG indexing in GhostAI.  We do not support interfacing with 3rd party vector databases though.

Question 26

You mentioned metadata. What is that?

Accepted Answer

The metadata is an extended description of the Ghost Database schema, including table and column names and descriptions. Dymium scans the data subsample, and generates an extended semantically complete description of the structure and the data content. This metadata is instrumental in a proper generation of GhostAPIs/MCP as well as in the data access within GhostAI.

Question 27

Does Dymium help application developers?

Accepted Answer

Yes, Dymium offers GhostAPI - a prompt- driven way to instantly develop and deploy REST APIs. Customer uses natural language to describe the function of the API, and Dymium generates a Node.js implementation. GhostDatabase acts as a back end for the API. The APIs are authenticated using preshared secrets or OIDC tokens.

Question 28

How does Dymium help with AI Agent development?

Accepted Answer

Dymium takes care of the MCP protocol security! Yes, MCP plugs into the GhostAPI which is a targeted interface into the Ghost Database. This means that the information is properly filtered, and transformed, and compliance in dealing with PIIs can be ensured. We address MCP security not at the LLM level, but based on a tried and true access control approach. You can go a step further than group whitelist permissioning for data access and create APIs that parametrize fields such as the email or name of a user's authenticated session token within your OIDC application. Agents can use these tokens on the user's behalf through GhostMCP to access data that is gated to the user based on their personal identity. We provide GhostLLM - an OpenAI API - compatible interface to GhostAI.

Question 29

How does Dymium connect to my data sources?

Accepted Answer

They are behind a firewall!We provide secure tunneling infrastructure to solve this problem.  You configure a tunnel, and get a Connector - available as a docker, .deb, .rtm or a Windows service PowerShell installer. The connector must have a route to the data sources that you want to be exposed to Dymium. You get a Helm chart for the docker, and the service installers are completely preconfigured!

Question 30

Can I have a permanent connection to a Ghost Database instead of relying on the interactive Tunneling Client?

Accepted Answer

Yes, we have a Machine Tunnel for that, implemented as a Docker.

Question 31

Does Dymium introduce latency when connecting to the Ghost Database?

Accepted Answer

When a query goes out to a single data source, the latency is minimal. If you are executing a join between multiple data sources, it depends on the dataset size, as Dymium has to accumulate multiple datasets before executing a join.

Question 32

Does Dymium provide access to non-structured data?

Accepted Answer

Yes, Dymium Platform gives you access to documents. Here's how it is done: GhostFiles gives you a targeted interface with data protocols such as SMB, SFT, S3, and Google Drive, containing unstructured files like PDF's, word documents, and others. Directories can be consolidated into Fileshares (which can include directories from different sources). Fileshares can be navigated through a clientless file explorer UI where files can be redacted on the fly, so the PIIs will be redacted if a user does not have unrestricted access. Directories containing uniform documents, for example, resumes or invoices, can be indexed by Dymium. A metadata schema containing one or multiple database tables will be created, and files will be processed using a secure LLM to extract and populate this structured data. The tables from these indexed Fileshares can be exposed via Ghost Database and used with all applicable features such as the AI Agent and GhostAPI/GhostMCP. You can even execute a join between a database and a directory of files!

Question 33

What about logging?

Accepted Answer

Logging is done using Kafka (in beta). A customer connects to the Kafka server endpoint, and downloads the logs. Dymium retains logs for a period of time.

Question 34

Does Dymium keep customer data?

Accepted Answer

By design, Dymium avoids keeping customer sensitive data. For example, GhostAI chat history is kept within a user's browser. Authentication and authorization is outsourced to the customer's own Idp. The logs that Dymium keeps are anonymized. However, there are some exceptions, namely: Files for RAG are indexed in a vector database. They are literally sliced and diced, but there are snippets of text from the original documents there. Indexed files in GhostFiles are also indexed in a separate database that is used as a cache, as the indexing process is not real time. Credentials to data sources are stored in an encrypted form

Question 35

How well does Dymium scale?

Accepted Answer

Dymium architecture is microservice-based, runs in Kubernetes, and the containers are stateless. We can add them on demand.

Question 36

Is Dymium a shared tenancy SaaS?

Accepted Answer

Mostly, it is not. There are some shared elements, for example, the web front end, the configuration database where customers are stored in different schemas, and the internal neural networks. Most of the microservices are dedicated, and isolated at a network level within Kubernetes, with secure communication between the microservices.

Question 37

Can I brand and customize the Dymium experience?

Accepted Answer

Yes, you can customize the colors for dark and light themes, as well as upload the logos

Question 38

Does Dymium follow the zero trust principle?

Accepted Answer

Absolutely! For example, the tunneling client and the data source connector use mutual TLS authentication to Dymium, using super short lived certificates. The certs for the web server are short lived. The encryption keys for JWTs are constantly rotated. Dymium security is strictly identity based, and tied to OIDC. The list can go on.

Question 39

Is Dymium Platform certified?

Accepted Answer

Dymium is SOC-2 compliant, and the Platform audited by a 3rd party. We constantly run code scans for CVEs, as well as OWASP ZAP scans for the web portal.

Frequently Asked Questions