Before uploading your docs
Here, we will guide you through the process of uploading files to IngestAI Libraries. We support a wide range of file formats, including:
- DOC
- DOCX
- PPTX
- HTML
- XLSX
- TXT
- PAGES
- WPD
- EPUB
- DJVU
- AZW
- CBZ
...and many more.
Before uploading your files, follow the GIGO rule, which is well known among data scientists. GIGO stands for "garbage in, garbage out": the quality of the output depends heavily on the quality of the input data. Therefore, remove irrelevant, outdated, duplicate, or contradictory data to improve output quality.
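As a starting point for cleanup, here is a minimal Python sketch that removes duplicate paragraphs from a plain-text file before upload. It assumes paragraphs are separated by blank lines and treats two paragraphs as duplicates when they differ only in letter case or whitespace. It is an illustration, not an IngestAI tool, and the function names are hypothetical; outdated or contradictory content still needs a human review.

```python
import re


def normalize(paragraph: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies compare equal."""
    return re.sub(r"\s+", " ", paragraph).strip().lower()


def remove_duplicate_paragraphs(text: str) -> str:
    """Keep only the first occurrence of each paragraph (paragraphs are separated by blank lines)."""
    seen = set()
    kept = []
    for paragraph in re.split(r"\n\s*\n", text):
        key = normalize(paragraph)
        if not key or key in seen:
            continue  # skip empty paragraphs and repeats of earlier ones
        seen.add(key)
        kept.append(paragraph.strip())
    return "\n\n".join(kept)


if __name__ == "__main__":
    sample = "Refunds take 5 days.\n\nShipping is free.\n\nrefunds take  5 days."
    print(remove_duplicate_paragraphs(sample))
```

Running the example prints the first two paragraphs only, because the third differs from the first just in case and spacing.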
Our core technology quickly retrieves relevant information from your knowledge base and then uses AI to turn it into a conversational chat that mimics human interaction, either within your favorite app or embedded in your website.
By following the GIGO rule and using our technology, you can get more accurate and reliable outputs. If you have any questions or need further assistance, please don't hesitate to contact our support team.
If you’d like to know more about data preparation, please read our blog post "Data Wrangling for AI Virtual Assistants and AI Chatbots".
Important! Follow these basic rules when preparing files for Libraries:
- Keep all content relevant to a specific topic in the same Library. If you want to split data so that different parts are accessible from different chats or slash commands, create separate Libraries and upload the content to each of them accordingly.
- Higher granularity results in more predictable (and less creative) responses, since it's harder for the AI to give different answers based on small, precise pieces of text.
- Data accuracy: make sure the answers in your data are comprehensive and detailed, not short one- or two-word answers such as "Yes" or "No."
For example, for the question "Can I cancel my subscription during the trial period?", the answer "Yes, you can cancel your subscription during the trial period" is much clearer than just "Yes."
- Avoid binary responses:
For example, columns with binary data showing whether a customer has children or higher education (represented by "0" or "1") should be converted to descriptive text: use "has children" or "has higher education" instead of "1", and "doesn't have children" or "doesn't have higher education" instead of "0". Likewise, label numeric values, as in "number of children: 3" or "work experience: 5 years". This is the kind of data the AI can interpret correctly.
- If you want to upload data from videos or images, first convert it to text, for example using subtitles or third-party applications. Don't try to embed images or videos in your uploads.
- Excel or Google Sheets files should have only one tab; the AI bot will not see any additional tabs.
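The advice on binary columns can be automated when your data starts out as a spreadsheet. Below is a minimal Python sketch, assuming the sheet has been exported to CSV first. The column names and phrasings in `BINARY_PHRASES` and `NUMERIC_LABELS` are hypothetical examples, not an IngestAI format; adapt them to your own columns.

```python
import csv
import io

# Hypothetical column names and phrasings, for illustration only.
BINARY_PHRASES = {
    "has_children": ("has children", "doesn't have children"),
    "higher_education": ("has higher education", "doesn't have higher education"),
}
NUMERIC_LABELS = {
    "children_count": ("number of children", ""),
    "experience_years": ("work experience", " years"),
}


def describe_row(row: dict) -> str:
    """Turn one spreadsheet row into a descriptive line the AI can read."""
    parts = []
    for column, raw in row.items():
        value = (raw or "").strip()
        if column in BINARY_PHRASES:
            yes, no = BINARY_PHRASES[column]
            # Replace "1"/"0" with words; leave any other value labeled as-is.
            parts.append({"1": yes, "0": no}.get(value, f"{column}: {value}"))
        elif column in NUMERIC_LABELS:
            label, unit = NUMERIC_LABELS[column]
            parts.append(f"{label}: {value}{unit}")
        else:
            parts.append(f"{column}: {value}")
    return "; ".join(parts)


def csv_to_lines(csv_text: str) -> list:
    """Convert every data row of a CSV export into a descriptive line."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [describe_row(row) for row in reader]


if __name__ == "__main__":
    data = "name,has_children,higher_education,children_count,experience_years\nAnna,1,0,3,5\n"
    for line in csv_to_lines(data):
        print(line)
```

For the sample row, this prints `name: Anna; has children; doesn't have higher education; number of children: 3; work experience: 5 years`. Every column keeps its label, so no bare "0" or "1" reaches the AI.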
In complex scenarios, such as combining structured tabular data (Excel, Google Sheets, or relational SQL databases) with text content (.docx, .txt, etc.) in the same Library, manual configuration by IngestAI may be required to achieve the best results. This customization service is currently available only on the Business or Enterprise subscription plans.
- Use a separate paragraph for each topic, up to 2000 characters long.
If you have paragraphs, or rows in Excel or Google Sheets, longer than 2000 characters, we recommend using summarization or other prompting methods (available in IngestAI's Prompt Engineering functionality) to shorten them to 2000 characters or fewer.
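To find paragraphs that exceed the 2000-character limit before uploading, a small script can help. The following minimal Python sketch assumes that paragraphs in a plain-text export are separated by blank lines; the function name `oversized_paragraphs` is hypothetical, and the same length check could be applied to spreadsheet rows exported as text.

```python
import re

MAX_PARAGRAPH_CHARS = 2000  # limit recommended in this guide


def oversized_paragraphs(text: str, limit: int = MAX_PARAGRAPH_CHARS) -> list:
    """Return (paragraph_number, length) for every paragraph longer than `limit` characters."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [(i, len(p)) for i, p in enumerate(paragraphs, start=1) if len(p) > limit]


if __name__ == "__main__":
    document = "Short intro.\n\n" + "x" * 2500 + "\n\nAnother short paragraph."
    for number, length in oversized_paragraphs(document):
        print(f"Paragraph {number} has {length} characters; consider summarizing it.")
```

The sample flags only the second paragraph (2500 characters), which you would then shorten with summarization before uploading.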
Following all of the above rules will help you get the most out of modern AI technologies.