Anuj Varma has provided Data Science expertise to customers, including Hadoop and BigData analytics, traditional B.I., and public and private visualizations using Tableau and related tools.
This list of probing BigData and B.I. questions is proprietary; no part may be used without explicit permission from Anuj Varma.
Multiple Data Sources, Relational versus Non-Relational Data Requirements, Hadoop Readiness Assessment
Organizations increasingly use disparate data types from many sources. Some data is highly structured and stored in relational databases; other data is semi-structured, such as sensor readings and system logs.
Large volumes of data may be managed in a Hadoop cluster, provided they meet the Hadoop Readiness criteria.
Private versus Public Data – Governance Requirements
· Is there a need to strip information or to anonymize any data that might be moved to the public cloud?
· Is there a need to mask sensitive data in order to analyze that information?
· Is there a need to bring third-party data into the data warehouse? (This may require separate network-level isolation for the DW, as well as virtual private network (VPN) gateways that the external providers can use.)
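To make the anonymization and masking questions above more concrete, here is a minimal sketch of what "stripping" and "masking" might mean for a single record before it leaves the private environment. The field names (`customer_id`, `email`, `phone`, `region`, `notes`), the key, and the policy itself are hypothetical assumptions, not a recommended standard. Note that keyed hashing is pseudonymization rather than true anonymization: combined with other fields, the data may still be re-identifiable.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would come from a secrets manager,
# never from source code.
PSEUDONYM_KEY = b"replace-with-a-managed-secret"


def pseudonymize(value: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Replace an identifier with a truncated keyed hash (HMAC-SHA256).

    The same input always maps to the same token, so joins and counts still
    work, but the original value cannot be recovered without the key.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def mask_email(email: str) -> str:
    """Keep the first character of the local part and the domain; mask the rest."""
    local, _, domain = email.partition("@")
    if not local or not domain:
        return "***"
    return local[0] + "***@" + domain


def mask_digits(value: str, keep_last: int = 4) -> str:
    """Mask all but the last `keep_last` digits, preserving separators."""
    to_mask = max(sum(ch.isdigit() for ch in value) - keep_last, 0)
    out = []
    for ch in value:
        if ch.isdigit() and to_mask > 0:
            out.append("*")
            to_mask -= 1
        else:
            out.append(ch)
    return "".join(out)


def prepare_for_public_cloud(record: dict) -> dict:
    """Hypothetical policy: pseudonymize the ID, mask contact fields,
    keep non-sensitive fields, and strip free-text notes entirely."""
    return {
        "customer_token": pseudonymize(record["customer_id"]),
        "email": mask_email(record["email"]),
        "phone": mask_digits(record["phone"]),
        "region": record["region"],  # non-sensitive, kept as-is
        # "notes" is deliberately omitted (stripped)
    }


record = {"customer_id": "C-1001", "email": "jane.doe@example.com",
          "phone": "555-867-5309", "region": "West", "notes": "called twice"}
print(prepare_for_public_cloud(record))
```

Keyed hashing (rather than a plain hash) is used so that someone without the key cannot simply hash candidate IDs and match them against the tokens.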
Insights, Visualizations from Data, Reporting and Device Requirements
· Do business users need the ability to generate their own visualizations?
· Will visualizations be public (visible to everyone), private (restricted to internal users), or a combination of the two?
· Is there a need to integrate existing applications (via APIs) with these visualizations?
· Once insights are generated, what will be the primary mode of interaction with them: desktops, mobile phones and tablets, or both?
Initial Data Upload and Incremental Data Uploads, Pre-processing Requirements
· What is the initial data set size that will need to be uploaded?
· What are the approximate incremental data set sizes, and how frequently will incremental updates occur?
· How much pre-processing needs to be performed on the data prior to upload (minor, moderate, or major)?
· Will the future data warehouse hold both OLAP (analytical) and OLTP (transactional) data? If so, is the preference to run both workloads on a single analytics platform?
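The answers on initial and incremental data sizes feed directly into a capacity estimate. The sketch below shows one simple way to combine them, assuming data is only appended (never deleted or compressed). The HDFS multiplier uses Hadoop's default block replication factor of 3; the 25% headroom for temporary and intermediate files is a hypothetical planning assumption, not a Hadoop rule.

```python
from dataclasses import dataclass


@dataclass
class UploadProfile:
    initial_gb: float          # initial data set size
    increment_gb: float        # approximate size of each incremental upload
    increments_per_month: int  # frequency of incremental updates


def projected_raw_gb(profile: UploadProfile, months: int) -> float:
    """Raw data volume after `months` of appended uploads (no deletions assumed)."""
    return profile.initial_gb + profile.increment_gb * profile.increments_per_month * months


def projected_hdfs_gb(profile: UploadProfile, months: int,
                      replication: int = 3, overhead: float = 0.25) -> float:
    """Cluster storage needed: raw volume x replication factor x (1 + headroom).

    `overhead` is a hypothetical headroom fraction for temporary and
    intermediate data; compression is ignored.
    """
    return projected_raw_gb(profile, months) * replication * (1 + overhead)


# Example: 500 GB initial load, 20 GB daily increments (~30 per month), 12 months.
profile = UploadProfile(initial_gb=500.0, increment_gb=20.0, increments_per_month=30)
print(projected_raw_gb(profile, 12))   # 7700.0
print(projected_hdfs_gb(profile, 12))  # 28875.0
```

Even this rough arithmetic shows why the frequency question matters: over a year, the incremental uploads in this example dwarf the initial load.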
Predictive Analytics versus Business Intelligence Requirements
· Is there a need to perform predictive analytics on the data? What would be an example of such a use case (e.g. sales forecasting, risk modeling, cross-selling…)?
· Are there existing predictive models? If so, how many models are built, how frequently are new ones built, and how often are existing ones updated?