Enable Full Dataset Profiling in OAC/OAS DV

Currently, data profiling in Oracle Analytics Server/Cloud DV runs only on sample data, typically a subset of rows. While this works for quick overviews, it limits accuracy and reliability, especially for large datasets or for columns with sparse or skewed data distributions.
Requested Enhancement:
Allow users to choose between:
- Sample-Based Profiling (default for performance)
- Full Dataset Profiling (optional, with a warning for large datasets)
Why This Matters:
- Misleading summaries: Outliers, null patterns, or rare categorical values may not appear in the sample but are critical for analysis.
- Data quality issues: Profiling is often used to detect issues like nulls, duplicates, or format anomalies, which may only be visible across the full dataset.
- Governance & trust: Data stewards and analysts need complete views to confidently certify datasets.
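To illustrate the points above, here is a minimal Python sketch of how a randomized sample can hide rare values that a full scan finds. It is not how OAC/OAS profiles data internally; the 100,000-row dataset, the `status` column, and the `profile` and `build_dataset` helpers are all hypothetical and exist only for this example.

```python
import random
from collections import Counter


def profile(rows, column):
    """Return simple profile statistics for one column of a list of dict rows."""
    values = [row[column] for row in rows]
    non_null = [v for v in values if v is not None]
    return {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_values": Counter(non_null),
    }


def build_dataset(n_rows=100_000):
    """Build a hypothetical dataset: almost all rows ACTIVE, 5 nulls, 1 rare value."""
    rows = [{"status": "ACTIVE"} for _ in range(n_rows)]
    for i in range(5):
        rows[i * 20_000 + 7] = {"status": None}  # five scattered nulls
    rows[54_321] = {"status": "LEGACY"}  # one rare categorical value
    return rows


rows = build_dataset()

# Sample-based profiling: a 1% random sample (the seed is fixed only for repeatability).
rng = random.Random(42)
sample = rng.sample(rows, 1_000)

full_profile = profile(rows, "status")
sample_profile = profile(sample, "status")

print("Full dataset nulls:", full_profile["null_count"])
print("Sample nulls:      ", sample_profile["null_count"])
print("Full dataset sees LEGACY:", "LEGACY" in full_profile["distinct_values"])
print("Sample sees LEGACY:      ", "LEGACY" in sample_profile["distinct_values"])
```

With a 1% sample, any single row has only a 1% chance of being included, so the lone `LEGACY` value and the five nulls will usually be absent from the sample profile, while the full scan always reports them.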
Comments
Hi @Dhaval Parikh Stantec - thank you for submitting this idea and for the additional details. Our profiling works on representative randomized samples that provide high-level insights into your data. However, as you point out, larger datasets are more challenging and need deeper analysis to surface all the outliers. The challenge we have is limiting the movement of data. A couple of additional details on your use case would help: What is the typical size of your datasets, in rows and columns? What is the data access mode for these datasets: Live, Cached, or Extracted? Would an interim solution that supports full dataset profiling only for Cached or Extracted datasets be valuable in your use case? Thanks again for being part of our community; we look forward to your answers.
@Luis E. Rivas -Oracle Thanks for your prompt response. Our datasets usually have no more than 100,000 rows (column counts vary from 30 to 60, or in some cases even more). The data access mode for our current use cases is mostly Live, but we wouldn't mind changing it to Cached if full dataset profiling were available only for Cached datasets.
Yes, the interim solution of supporting full dataset profiling only for Cached/Extracted datasets would be perfect. Thank you for considering this.
Thanks for the additional details @Dhaval Parikh Stantec - We will keep track of the votes on this idea and keep you updated.