Random User Dataset Generator
Type: Python
Role: Data Analyst
Context: Personal Project - Data Preparation & Data Engineering Support
Business Context
Analytics and BI teams frequently require realistic datasets to test queries, dashboards, data pipelines, and validation logic.
However, real production data is often unavailable due to privacy, security, or accessibility constraints.
This project addresses that need by providing a flexible synthetic data generator that produces structured datasets ready for analysis.
Objective
Develop a Python-based tool capable of generating custom-sized synthetic datasets in CSV or JSON format, allowing analysts to quickly create reusable data inputs for analytical workflows.
Tools & Technologies
- Python
- Pandas
- NumPy
- Random & Faker-style data generation
- CSV / JSON file handling
Dataset Generation Logic
The generator is built around the following elements:
- User-defined number of records
- Standardized user schema
- Realistic value distributions
- Consistent formatting across all rows
- Output format selection (CSV or JSON)
This keeps every dataset analysis-ready, with the same schema and formatting on each run.
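The elements above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the schema fields, the name pools, and the function names `generate_users` and `export_users` are all assumptions made for the example, and it uses only the Python standard library (rather than Pandas or NumPy) so that it stays self-contained. Ages and dates are drawn uniformly, which is a simplification of "realistic" distributions.

```python
import csv
import io
import json
import random
from datetime import date, timedelta

# Hypothetical value pools and schema; the real project may use larger lists
# or a Faker-style library instead.
FIRST_NAMES = ["Ana", "Luis", "Maria", "John", "Sofia", "Wei"]
LAST_NAMES = ["Castro", "Smith", "Garcia", "Chen", "Rossi", "Khan"]
SCHEMA = ["user_id", "first_name", "last_name", "email", "age", "signup_date"]


def generate_users(n_rows, seed=None):
    """Return n_rows synthetic user records that all share SCHEMA."""
    if n_rows < 0:
        raise ValueError("n_rows must be non-negative")
    rng = random.Random(seed)  # local RNG, so a fixed seed repeats the output
    start = date(2020, 1, 1)
    records = []
    for user_id in range(1, n_rows + 1):
        first = rng.choice(FIRST_NAMES)
        last = rng.choice(LAST_NAMES)
        signup = start + timedelta(days=rng.randint(0, 1460))
        records.append({
            "user_id": user_id,
            "first_name": first,
            "last_name": last,
            "email": f"{first}.{last}{user_id}@example.com".lower(),
            "age": rng.randint(18, 80),
            "signup_date": signup.isoformat(),  # one date format for all rows
        })
    return records


def export_users(records, fmt):
    """Serialize records as CSV or JSON text, depending on fmt."""
    if fmt == "csv":
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=SCHEMA, lineterminator="\n")
        writer.writeheader()
        writer.writerows(records)
        return buffer.getvalue()
    if fmt == "json":
        return json.dumps(records, indent=2)
    raise ValueError(f"unsupported format: {fmt!r}")


users = generate_users(5, seed=42)
csv_text = export_users(users, "csv")
json_text = export_users(users, "json")
```

Passing a seed is one way to get identical data across executions; without it, only the schema and formatting stay the same from run to run.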
Program Execution Examples
The following examples show the program generating a dataset from three user inputs:
- The desired number of rows
- The export format (csv or json)
- Database name
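One possible way to handle these three inputs is sketched below. The function name `parse_request` and the rule that the database name becomes the base name of the output file are assumptions made for illustration; the original program may prompt for and interpret these values differently.

```python
def parse_request(rows_text, format_text, name_text):
    """Validate the three raw inputs and return (n_rows, fmt, file_name)."""
    try:
        n_rows = int(rows_text.strip())
    except ValueError:
        raise ValueError(f"row count must be an integer, got {rows_text!r}") from None
    if n_rows <= 0:
        raise ValueError("row count must be positive")

    fmt = format_text.strip().lower()
    if fmt not in ("csv", "json"):
        raise ValueError(f"format must be 'csv' or 'json', got {format_text!r}")

    name = name_text.strip()
    if not name:
        raise ValueError("database name must not be empty")

    # Assumption: the database name is used as the output file's base name.
    return n_rows, fmt, f"{name}.{fmt}"


print(parse_request("100", "CSV", "users"))  # (100, 'csv', 'users.csv')
```

Rejecting bad input before any rows are generated keeps a failed run from leaving a partial file behind.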
CSV File Generation
Input

Output

Generated file:
JSON File Generation
This example demonstrates the same dataset generation logic exported as a JSON file, suitable for APIs or NoSQL-based workflows.
Input

Output

Generated file:
Output Validation
Basic validation steps were applied to ensure usability for analytics:
- Row count matches user-defined input
- Consistent schema across records
- No unexpected null values
- Clean formatting for direct SQL or Python ingestion
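The first three checks above could be automated along these lines. This is a hedged sketch: `validate_records` and the three-field schema are hypothetical names chosen for the example, and records are assumed to be plain dictionaries rather than a Pandas DataFrame.

```python
def validate_records(records, expected_rows, schema):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    # Check 1: row count matches the user-defined input.
    if len(records) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(records)}")
    for index, record in enumerate(records):
        # Check 2: every record has exactly the schema's fields, in order.
        if list(record.keys()) != list(schema):
            problems.append(f"row {index}: schema mismatch")
            continue
        # Check 3: no null or empty values.
        for field, value in record.items():
            if value is None or value == "":
                problems.append(f"row {index}: empty value in {field!r}")
    return problems


SCHEMA = ["user_id", "name", "email"]  # hypothetical schema for the example
good = [
    {"user_id": 1, "name": "Ana", "email": "ana@example.com"},
    {"user_id": 2, "name": "Luis", "email": "luis@example.com"},
]
bad = [{"user_id": 1, "name": None, "email": "ana@example.com"}]

print(validate_records(good, 2, SCHEMA))  # []
print(validate_records(bad, 2, SCHEMA))
```

Returning a list of problems instead of raising on the first one lets a single run report every issue in the file.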
Business Value
- Enables rapid prototyping and testing
- Eliminates dependency on sensitive or restricted data
- Supports SQL analysis, EDA, and BI dashboard development
- Improves efficiency in analytics and data engineering workflows
Project Resources
- Source Code (GitLab): https://gitlab.com/acastro97/random_users_list_generator
- Sample Outputs: CSV and JSON files included in this project