oarisur/Qd2i

Qd2i - Lightweight Data Analytics Tool for Small Businesses

Qd2i is a no-code data analytics tool designed for small business owners and other non-technical users. Built with Streamlit, it lets you explore, analyze, and transform your business data without writing code.

✨ Features

  • Data Input:
    • Upload data from CSV and Excel files.
    • Fetch data from a Product Information Management (PIM) system via URL and API authentication (JSON format).
  • Data Overview:
    • Display key data metrics: total rows, columns, numerical/categorical columns, duplicates, missing data.
    • Quick column visualization: histograms for numerical, bar charts for categorical data.
    • Detailed column summary with data quality checks (missing values, data types).
    • Correlation heatmap for numerical columns.
    • Interactive dataset preview with column selection and row range control.
    • Display of duplicate rows.
    • Display of rows with missing values.
    • Missing data visualization: heatmap and correlation.
    • Data inconsistency checks.
    • Outlier detection using box plots.
  • Data Transformation:
    • Interactive data transformation pipeline.
    • Various transformation steps:
      • Remove Outliers (Z-score, IQR).
      • Convert Data Type.
      • Manipulate Strings (trim, replace, regex, case conversion, extract).
      • Feature Engineering (sum, difference, product, ratio, custom formulas).
      • Scale Data (MinMaxScaler, StandardScaler, RobustScaler).
      • Log Transformation.
      • Mathematical Transformation (Square root, Box-Cox, Yeo-Johnson).
      • Encode Categorical (Label, One-Hot, Binary, Hashing, Target).
      • Remove Duplicates.
      • Handle Missing Values (remove, fill, forward/backward fill, KNN, MICE).
      • Delete Rows/Columns.
      • Binning.
      • Polynomial Features.
      • Split Column.
      • Join Columns.
      • Extract Date/Time Features.
  • Data Validation:
    • Define and apply custom validation rules.
    • Support for various rule types: range, regex, set, datatype, conditional.
    • Interactive rule editor.
    • Detailed validation results and data quality summary.
  • Data Analysis:
    • Tabbed interface for different analysis types: Numerical, Categorical, Date, Advanced, Time Series, Sales, Product, Customer, Marketing, Inventory.
    • Interactive visualizations: histograms, bar charts, scatter plots, line charts, box plots, violin plots, Q-Q plots, pie charts, treemaps, network graphs, parallel coordinates, heatmaps, calendar plots, time series plots, ACF/PACF plots, rolling statistics, sales funnels, choropleth maps, stacked bar charts, Pareto charts, lifecycle curves, cohort analyses, RFM segmentation, CLV distributions, cluster visualizations, marketing funnels, etc.
    • Statistical analyses: descriptive statistics, correlation analysis, time series decomposition, RFM calculation, CLV estimation, clustering, sentiment analysis, etc.
    • Domain-specific analyses for various business areas.

🛠️ Installation and Setup

  1. Install Python and pip: Ensure you have Python and pip installed on your system. This guide originally recommended version 3.6 or higher, but recent Streamlit releases require a newer Python 3, so prefer a currently supported version. You can check what you have by running python --version and pip --version in your terminal or command prompt. If Python is not installed, you can download it from python.org.

  2. Create a virtual environment (recommended):

    # For Linux/macOS
    python -m venv venv
    source venv/bin/activate
    
    # For Windows
    python -m venv venv
    venv\Scripts\activate

  3. Install required packages:

    pip install -r requirements.txt

  4. Run the Streamlit app: Save the Python code for Qd2i as a .py file (e.g., qd2i_app.py) and then run it using Streamlit:

    streamlit run qd2i_app.py

    This will open the Qd2i application in your web browser.

⚙️ Usage Instructions

  1. Data Input:

    • In the "📥 Data Input" expander, click on the file uploader to upload your CSV or Excel file.
    • Alternatively, to fetch data from a PIM system, enter the PIM URL and your API authentication details (JSON format) in the designated fields.
    • Click the "Process Files" button to load your data.
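
To make the loading step concrete, here is a minimal sketch of how an uploaded CSV file or a JSON payload from a PIM might be turned into rows. It is not Qd2i's actual loader: the function name `parse_upload`, the sample file names, and the `{"items": [...]}` response shape are all assumptions made for illustration, and Excel parsing is left out because it needs a third-party library.

```python
import csv
import io
import json


def parse_upload(filename: str, payload: bytes) -> list[dict]:
    """Return rows as a list of dicts from a CSV upload or a JSON payload."""
    # utf-8-sig tolerates the byte-order mark some spreadsheet exports add.
    text = payload.decode("utf-8-sig")
    name = filename.lower()
    if name.endswith(".csv"):
        return list(csv.DictReader(io.StringIO(text)))
    if name.endswith(".json"):
        data = json.loads(text)
        # Assumption: the PIM returns a list of records or {"items": [...]}.
        records = data["items"] if isinstance(data, dict) else data
        return [dict(record) for record in records]
    raise ValueError(f"unsupported file type: {filename}")


csv_rows = parse_upload("sales.csv", b"sku,qty\nA1,3\nB2,5\n")
pim_rows = parse_upload("pim.json", b'{"items": [{"sku": "A1", "name": "Mug"}]}')
print(csv_rows[1]["qty"], pim_rows[0]["name"])
```

Note that values read from CSV arrive as strings, which is one reason a "Convert Data Type" step is useful later on.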
  2. Data Overview:

    • Expand the "📋 Data Overview" section to see a summary of your data, including key metrics and initial visualizations.
    • Use the column visualization section to get a quick look at the distribution of individual columns.
    • Review the column summary for data types and missing values.
    • Examine the correlation heatmap to understand relationships between numerical columns.
    • Use the dataset preview to view a sample of your data and select specific columns or row ranges.
    • Check the "Duplicate Row display" and "Missing row display" to identify and understand these data quality issues.
    • The "Missing heatmap and missing correlation" sections provide visual insights into missing data patterns.
    • Review the "Data Inconsistency check" results to identify potential inconsistencies in your data.
    • Use the "Outlier detection" section to visualize and identify outliers in numerical columns using box plots.
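
The overview metrics and the box-plot outlier check can be sketched in a few lines of standard-library Python. This is only an illustration of the ideas, not the app's code: the sample rows and the helper names `overview` and `iqr_outliers` are invented, and the 1.5 × IQR fence is the conventional box-plot rule, assumed here rather than confirmed for Qd2i.

```python
import statistics

rows = [
    {"sku": "A1", "price": 9.5, "region": "north"},
    {"sku": "B2", "price": 11.0, "region": None},      # missing region
    {"sku": "A1", "price": 9.5, "region": "north"},    # exact duplicate
    {"sku": "C3", "price": 10.0, "region": "south"},
    {"sku": "D4", "price": 10.5, "region": "south"},
    {"sku": "E5", "price": 95.0, "region": "north"},   # suspiciously high
]


def overview(rows):
    """Count rows, columns, exact duplicate rows, and missing cells."""
    seen, duplicates = set(), 0
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicates += 1
        seen.add(key)
    missing = sum(value is None for row in rows for value in row.values())
    return {"rows": len(rows), "columns": len(rows[0]),
            "duplicates": duplicates, "missing": missing}


def iqr_outliers(values, k=1.5):
    """Return values outside the box-plot fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    return [v for v in values if v < q1 - k * spread or v > q3 + k * spread]


summary = overview(rows)
flagged = iqr_outliers([row["price"] for row in rows])
print(summary, flagged)
```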
  3. Data Transformation:

    • Navigate to the "🛠️ Data Transformation" expander.
    • Click the "➕ Add Transformation Step" button to add a new transformation to your pipeline.
    • Select the desired transformation type from the dropdown menu and configure the necessary parameters for that step.
    • The transformations will be applied sequentially in the order they appear in the pipeline.
    • Review the "Transformed Data Preview" to see the results of your transformations.
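
Because steps run in pipeline order, the order can change the result or even whether a step succeeds. The sketch below illustrates this with two hypothetical steps, `fill_missing_with_mean` and `min_max_scale`, chained by a small `run_pipeline` helper; these names are assumptions for illustration and do not reflect how Qd2i implements its pipeline.

```python
from statistics import mean


def fill_missing_with_mean(values):
    """Replace None entries with the mean of the present values."""
    present = [v for v in values if v is not None]
    fill = mean(present)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly onto the range 0..1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def run_pipeline(values, steps):
    """Apply each step to the output of the previous one, in list order."""
    for step in steps:
        values = step(values)
    return values


raw = [10.0, None, 30.0, 20.0]
scaled = run_pipeline(raw, [fill_missing_with_mean, min_max_scale])
print(scaled)
```

Swapping the two steps would fail here, since scaling cannot compare `None` with a number. Handling missing values before scaling is the safer ordering.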
  4. Data Validation:

    • In the "Data Validation" section, define custom rules to validate your data.
    • Use the interactive form to add new validation rules, edit existing ones, or remove rules. You can specify rules based on range, regular expressions, sets of allowed values, data types, and conditional logic.
    • Choose whether to apply the validation rules to the raw data or the transformed data.
    • Review the "Validation Results" and the "Data Quality Summary" to see if your data meets the defined criteria.
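
As a rough sketch of what the rule types mean, each rule can be modeled as a function that takes a row and returns True when the row passes. The rule builders, column names, and sample rows below are hypothetical and exist only to illustrate range, regex, set, datatype, and conditional checks; the app's rule editor may represent rules quite differently.

```python
import re


def range_rule(column, low, high):
    return lambda row: low <= row[column] <= high


def regex_rule(column, pattern):
    compiled = re.compile(pattern)
    return lambda row: compiled.fullmatch(str(row[column])) is not None


def set_rule(column, allowed):
    return lambda row: row[column] in allowed


def type_rule(column, expected_type):
    return lambda row: isinstance(row[column], expected_type)


def conditional_rule(condition, requirement):
    # Passes when the condition does not apply, or when the requirement holds.
    return lambda row: (not condition(row)) or requirement(row)


def validate(rows, rules):
    """Return {rule name: [indexes of rows that fail the rule]}."""
    return {name: [i for i, row in enumerate(rows) if not check(row)]
            for name, check in rules.items()}


rows = [
    {"sku": "AB-001", "price": 12.5, "status": "active", "stock": 4},
    {"sku": "ab001", "price": -3.0, "status": "active", "stock": 0},
    {"sku": "CD-002", "price": 8.0, "status": "retired", "stock": 0},
]
rules = {
    "price between 0 and 1000": range_rule("price", 0, 1000),
    "sku format": regex_rule("sku", r"[A-Z]{2}-\d{3}"),
    "known status": set_rule("status", {"active", "retired"}),
    "price is a number": type_rule("price", float),
    "active items are in stock": conditional_rule(
        lambda row: row["status"] == "active",
        lambda row: row["stock"] > 0,
    ),
}
failures = validate(rows, rules)
print(failures)
```

In this sample, the second row fails the range, format, and conditional rules, while the retired item with zero stock passes the conditional rule because its condition does not apply.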
  5. Data Analysis:

    • Click on the different analysis tabs (Numerical, Categorical, Date, Advanced, Time Series, Sales, Product, Customer, Marketing, Inventory) to explore your data in various ways.
    • Select the data source you want to analyze (either the raw data or the transformed data).
    • Within each tab, choose the relevant columns for your analysis and configure any specific parameters for the visualizations or statistical tests.
    • Interact with the generated charts and review the statistical results to gain insights from your data.
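
To give a sense of what one of the Customer analyses computes, here is a minimal sketch of an RFM (recency, frequency, monetary) calculation. The order tuples, the `rfm` function, and the `as_of` reference date are assumptions for illustration. The scoring and segmentation Qd2i applies on top of these raw values is not shown and may differ.

```python
from collections import defaultdict
from datetime import date

# Hypothetical order records: (customer, order date, order amount).
orders = [
    ("alice", date(2024, 1, 5), 40.0),
    ("alice", date(2024, 3, 1), 60.0),
    ("bob", date(2023, 11, 20), 25.0),
]


def rfm(orders, as_of):
    """Compute days since last order, order count, and total spend per customer."""
    last_order = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, day, amount in orders:
        last_order[customer] = max(last_order.get(customer, day), day)
        frequency[customer] += 1
        monetary[customer] += amount
    return {
        customer: {
            "recency_days": (as_of - last_order[customer]).days,
            "frequency": frequency[customer],
            "monetary": monetary[customer],
        }
        for customer in last_order
    }


table = rfm(orders, as_of=date(2024, 3, 31))
for customer, values in table.items():
    print(customer, values)
```

A lower recency with higher frequency and monetary values usually marks the most engaged customers, which is the basis for RFM segmentation.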
