Frequently Asked Questions
What is the problem solved by Dymium?
Dymium solves the problem of providing data securely to humans, applications, and AI entities.
Why is securing AI important?
Prompts that go into LLMs potentially leak sensitive information. This sensitive information can be later used for training, and is ultimately discoverable in the trained model. Besides, the history of conversations with LLM can also reside with the vendor, and can, and has been stolen.
How are you different from other security vendors?
People have been trying to use LLMs to secure LLMs. This approach is fundamentally flawed as LLM can always be prompt-engineered to leak data or to disobey the previous instructions or training. Dymium approaches the problem from the tried and true traditional security principles - we prevent sharing sensitive data with the LLM in the first place!
Data Security is an overused term. In what sense is data security used here?
Dymium acts as an intermediate layer called Ghost Layer between the original data sources and consumers. The consumers only see a subset of the data allowed by policy, and selected data elements can be transformed. For example, PIIs can be blocked, redacted, or obfuscated, that is replaced with synthetic substitutions. This takes care of data leakage prevention, and compliance.
How does Dymium connect to my data?
We introduce a concept of data source, comprising data bases, files, API providers, document storages, etc. To ensure security, we analyse the data structure, and detect the presence of sensitive data in this structure specifically for the particular data source.
What are the data sources supported by Dymium?
Dymium supports most popular relational databases: PostgreSQL, MySQL, MariaDb, MS SQLServer, Oracle, IBM DB2, as well as MongoDB, Elastic Search, Parquet, CSV, JSON on S3, Parquet, CSV, JSON via SFTP, Amazon RedShift, YugaByte, CockroachDB. We also work with SalesForce, HubSpot, SnowFlake and DataBricks APIs as data sources. More data source types are coming online soon.
I have a proprietary API that I want to extract data from. Can I do it with Dymium?
An arbitrary REST endpoint can also be connected as a data source. As long as you can hit it with a curl and get a JSON recordset back!
So what entities are the data consumers for the Ghosts?
GhostDb presents itself as a PostgreSQL database. When you run the Dymium Tunneling client, it is available on a loopback adapter for any application running on your computer. Those apps are data consumers, as well as people writing Python scripts. GhostAI is a safe Chat interface for the corporate users. It never sends PIIs in prompts to the commercial LLMs. Instead they are detected, substituted with synthetic data, sent to LLM safely, and the answers are reconstituted. In this case you have people as AI-generated data consumers. GhostAI can be integrated with the GhostDBs to provide seamless Business Intelligence capabilities. You simply ask a question that requires the corporate data to answer, and get a number, or a plot, or a table! The data never goes to the public LLM. We only send the metadata, and get back the recipes, which are executed in a secure sandbox. People are the data consumers. GhostAPI is a way to create and instantiate REST APIs that use GhostDBs with prompts. This is the ultimate no-code experience for API developers! New applications are data consumers. GhostMCP aggregates GhostAPIs, and provides a discovery service for them. The data consumers in this case are the AI Agents GhostLLM is a safe interface to commercial LLMs. All the security protection is seamlessly integrated. This is also for the AI Agents GhostFiles provides a clientless interface to files mounted from SMB, S3, SFTP or Google Drive. You can set up a policy that will redact the PIIs on the fly. People are the consumers. GhostFiles also allows to index a directory of reasonably uniform files, for example, invoices. The result will be a Data Source with the schema suggested by LLM, ready for analysis. A GhostDB built from such a data source can be used in all scenarios above.
Is Dymium platform a SaaS?
Yes, Dymium platform is a SaaS based in AWS cloud.
Can it be deployed on-prem though?
Yes, Dymium Platform can be packaged as an on-prem, or a private cloud solution. All it needs is a Kubernetes cluster!
Can Dymium be deployed in an air-gapped environment?
Yes, Dymium can be deployed in an air-gapped environment. We provide Zarf archives in a UDS bundle for installation.
How does Authentication and Authorization work in Dymium?
Dymium integrates with your Identity Provider, such as Okta, EntraID, Ping or Keycloak using the OIDC protocol. Authorization is based on group membership. We can also authenticate against Google Workspace and Office365.
How do Groups work in Dymium?
The Dymium platform relies heavily on groups as the primary method for controlling permissions and access, using a whitelist approach. Data access in the form of GhostDBs, GhostAPIs, etc is associated with specific groups, enabling fine-grained access control for different sets of users.
How does Dymium help Data Engineers, and Data Analysts?
Dymium provides an instant ETL pipeline. Data Engineers can create Ghost Databases with real time tokenization, and supply real time datasets for Data Analysts. Data Analysts can access data either as downloads in a web portal, or using Dymium Tunneling Client as a PostgreSQL database local to their computer.
What is a Dymium Tunneling Client?
Dymium Tunneling Client is an application built for Windows, Mac OS and Linux. It authenticates a user using a web browser, establishes a tunnel to Dymium, and provides port forwarding to the Ghost Database with complete SQL access.
Can Ghost Database be composed of multiple data sources?
Absolutely. While a Ghost Database will look as a single database to the consumer, it can consist of multiple backend databases and database types hosted in different regions and/or clock providers. For example you can execute a join between a table hosted on Oracle on AWS in Europe, a table hosted on MySQL on Google GCP in the US and a CSV file hosted on an sFTP server hosted locally.
Can I rename data source tables and columns?
Yes, absolutely. You can also change the data types when it makes sense.
Does obfuscation present a problem for such joins?
No, obfuscation is consistent between multiple data sources within a session.
How does Dymium increase data security while using LLMs?
Dymium offers GhostAI - a safe interface to chatbots from OpenAI, Google and Anthropic. This interface is auditable, allows to scrub PIIs from user's prompts and seamlessly reconstitute LLM replies. A CISO does not have to worry about sensitive data ever being leaked to the providers. You can also specify system prompts and guardrails policy, and block source code.
Does GhostAI support images, documents and audio?
Yes it does. GhostAI detects PIIs in these data formats and blocks them.
How does Dymium detect PIIs?
Dymium utilises a proprietary model based on a combination of rules and a fine tuned BERT neural network. In addition, a user can define a regular expression, or upload a file with a list of explicit entities.
Can I avoid using a commercial LLM completely?
Yes, you can use a secure domestic private LLM run by Dymium.
Can I integrate GhostAI in my own applications?
Yes, Dymium offers GhostLLM - a OpenAI-style LLM endpoint for GhostAI.
Can GhostAI integrate with corporate data securely?
Yes, in two ways. A set of documents can be uploaded, or connected from a drive and used in RAG. This can be very effective in case of compliance documents, such as rules. Portions of these documents are fetched based on their relevance to your prompt, and the LLM can integrate this information in its response. Importantly, any such document will be clearly referenced in the response so you can verify or find the source for the answer. Dymium has an AI Agent implemented under the hood which can use Ghost Databases available to a given user to answer relevant questions. By design, the Agent will never be able to see your data as part of its workflows and it will never be sent to a public LLM. We only send metadata, and receive recipes, such as SQL queries and Python code that is executed in a secure sandbox environment.As a result, you can get metadata, specific values, and graphs in response to your prompt.
So does Dymium support RAG?
RAG stands for Retrieval Augmented Generation. Our GhostAI uses corporate data in the form of files, databases, APIs with Large Language Models. In particular, there is a built in vector database that indexes files uploaded specifically for RAG, or marked for RAG indexing in GhostAI. We do not support interfacing with 3rd party vector databases though.
You mentioned metadata. What is that?
The metadata is an extended description of the Ghost Database schema, including table and column names and descriptions. Dymium scans the data subsample, and generates an extended semantically complete description of the structure and the data content. This metadata is instrumental in a proper generation of GhostAPIs/MCP as well as in the data access within GhostAI.
Does Dymium help application developers?
Yes, Dymium offers GhostAPI - a prompt- driven way to instantly develop and deploy REST APIs. Customer uses natural language to describe the function of the API, and Dymium generates a Node.js implementation. GhostDatabase acts as a back end for the API. The APIs are authenticated using preshared secrets or OIDC tokens.
How does Dymium help with AI Agent development?
Dymium takes care of the MCP protocol security! Yes, MCP plugs into the GhostAPI which is a targeted interface into the Ghost Database. This means that the information is properly filtered, and transformed, and compliance in dealing with PIIs can be ensured. We address MCP security not at the LLM level, but based on a tried and true access control approach. You can go a step further than group whitelist permissioning for data access and create APIs that parametrize fields such as the email or name of a user's authenticated session token within your OIDC application. Agents can use these tokens on the user's behalf through GhostMCP to access data that is gated to the user based on their personal identity. We provide GhostLLM - an OpenAI API - compatible interface to GhostAI.
How does Dymium connect to my data sources?
They are behind a firewall!We provide secure tunneling infrastructure to solve this problem. You configure a tunnel, and get a Connector - available as a docker, .deb, .rtm or a Windows service PowerShell installer. The connector must have a route to the data sources that you want to be exposed to Dymium. You get a Helm chart for the docker, and the service installers are completely preconfigured!
Can I have a permanent connection to a Ghost Database instead of relying on the interactive Tunneling Client?
Yes, we have a Machine Tunnel for that, implemented as a Docker.
Does Dymium introduce latency when connecting to the Ghost Database?
When a query goes out to a single data source, the latency is minimal. If you are executing a join between multiple data sources, it depends on the dataset size, as Dymium has to accumulate multiple datasets before executing a join.
Does Dymium provide access to non-structured data?
Yes, Dymium Platform gives you access to documents. Here's how it is done: GhostFiles gives you a targeted interface with data protocols such as SMB, SFT, S3, and Google Drive, containing unstructured files like PDF's, word documents, and others. Directories can be consolidated into Fileshares (which can include directories from different sources). Fileshares can be navigated through a clientless file explorer UI where files can be redacted on the fly, so the PIIs will be redacted if a user does not have unrestricted access. Directories containing uniform documents, for example, resumes or invoices, can be indexed by Dymium. A metadata schema containing one or multiple database tables will be created, and files will be processed using a secure LLM to extract and populate this structured data. The tables from these indexed Fileshares can be exposed via Ghost Database and used with all applicable features such as the AI Agent and GhostAPI/GhostMCP. You can even execute a join between a database and a directory of files!
What about logging?
Logging is done using Kafka (in beta). A customer connects to the Kafka server endpoint, and downloads the logs. Dymium retains logs for a period of time.
Does Dymium keep customer data?
By design, Dymium avoids keeping customer sensitive data. For example, GhostAI chat history is kept within a user's browser. Authentication and authorization is outsourced to the customer's own Idp. The logs that Dymium keeps are anonymized. However, there are some exceptions, namely: Files for RAG are indexed in a vector database. They are literally sliced and diced, but there are snippets of text from the original documents there. Indexed files in GhostFiles are also indexed in a separate database that is used as a cache, as the indexing process is not real time. Credentials to data sources are stored in an encrypted form
How well does Dymium scale?
Dymium architecture is microservice-based, runs in Kubernetes, and the containers are stateless. We can add them on demand.
Is Dymium a shared tenancy SaaS?
Mostly, it is not. There are some shared elements, for example, the web front end, the configuration database where customers are stored in different schemas, and the internal neural networks. Most of the microservices are dedicated, and isolated at a network level within Kubernetes, with secure communication between the microservices.
Can I brand and customize the Dymium experience?
Yes, you can customize the colors for dark and light themes, as well as upload the logos
Does Dymium follow the zero trust principle?
Absolutely! For example, the tunneling client and the data source connector use mutual TLS authentication to Dymium, using super short lived certificates. The certs for the web server are short lived. The encryption keys for JWTs are constantly rotated. Dymium security is strictly identity based, and tied to OIDC. The list can go on.
Is Dymium Platform certified?
Dymium is SOC-2 compliant, and the Platform audited by a 3rd party. We constantly run code scans for CVEs, as well as OWASP ZAP scans for the web portal.
Join Us
Dymium is an equal opportunity employer, always looking for driven professionals with a passion for data science, security, and ethical data sharing.
Check back soon for open roles.
.png)