The task is to implement TPC-H Query 5 in C++ using multithreading.
- Understand Query 5, given below.
- Generate the input data for the query using the TPC-H data generation tool (`dbgen`).
- Clone this repository into your personal GitHub account.
- Complete the unimplemented functions in `query5.cpp`.
- Compile and run the program. Capture the results with 1 thread and with 4 threads.
- Email the link to your completed repository, along with the information listed below.
| S.No. | Item | Description |
|---|---|---|
| 1 | GitHub Link | Share the GitHub link to your completed code. Make sure the program compiles and produces the result reported in item 2. |
| 2 | Final Result | Provide the final result of the query at SF2 (Scale factor 2). |
| 3 | Runtime | Share the runtime numbers for both single-threaded and 4-threaded execution. |
| 4 | Screenshot | Attach a screenshot showing the program running with the result visible. |
Note: Submissions missing any of the four items above will be considered incomplete.
```sql
select
    n_name,
    sum(l_extendedprice * (1 - l_discount)) as revenue
from
    customer,
    orders,
    lineitem,
    supplier,
    nation,
    region
where
    c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and l_suppkey = s_suppkey
    and c_nationkey = s_nationkey
    and s_nationkey = n_nationkey
    and n_regionkey = r_regionkey
    and r_name = 'ASIA'
    and o_orderdate >= '1994-01-01'
    and o_orderdate < '1995-01-01'
group by
    n_name
order by
    revenue desc
```
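One way to evaluate this query in C++ is a chain of hash joins. Lookup tables are built from the small relations (region, nation, customer, supplier) and from the date-filtered orders. Lineitem is then scanned once. The sketch below assumes the tables are already loaded into memory as vectors of the hypothetical structs `Region`, `Nation`, `Customer`, `Supplier`, `Order`, and `LineItem`. The function name `query5` is illustrative and is not the interface defined in the repository's `query5.cpp`. Dates are kept as `YYYY-MM-DD` strings, which compare in chronological order.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical in-memory rows holding only the columns Query 5 needs.
struct Region   { int regionkey; std::string name; };
struct Nation   { int nationkey; std::string name; int regionkey; };
struct Customer { int custkey; int nationkey; };
struct Supplier { int suppkey; int nationkey; };
struct Order    { int orderkey; int custkey; std::string orderdate; };  // "YYYY-MM-DD"
struct LineItem { int orderkey; int suppkey; double extendedprice; double discount; };

// Hash-join sketch: build lookups from the small tables, then stream lineitem once.
std::vector<std::pair<std::string, double>> query5(
    const std::vector<Region>& regions, const std::vector<Nation>& nations,
    const std::vector<Customer>& customers, const std::vector<Supplier>& suppliers,
    const std::vector<Order>& orders, const std::vector<LineItem>& lineitems,
    const std::string& r_name, const std::string& start_date, const std::string& end_date) {
    // Nations belonging to the requested region: n_nationkey -> n_name.
    std::unordered_set<int> region_keys;
    for (const auto& r : regions)
        if (r.name == r_name) region_keys.insert(r.regionkey);
    std::unordered_map<int, std::string> nation_name;
    for (const auto& n : nations)
        if (region_keys.count(n.regionkey)) nation_name[n.nationkey] = n.name;

    // Customers and suppliers located in those nations: key -> nationkey.
    std::unordered_map<int, int> cust_nation, supp_nation;
    for (const auto& c : customers)
        if (nation_name.count(c.nationkey)) cust_nation[c.custkey] = c.nationkey;
    for (const auto& s : suppliers)
        if (nation_name.count(s.nationkey)) supp_nation[s.suppkey] = s.nationkey;

    // Orders in [start_date, end_date) placed by those customers: orderkey -> customer nation.
    std::unordered_map<int, int> order_nation;
    for (const auto& o : orders) {
        if (o.orderdate < start_date || o.orderdate >= end_date) continue;
        auto it = cust_nation.find(o.custkey);
        if (it != cust_nation.end()) order_nation[o.orderkey] = it->second;
    }

    // Probe with lineitem; keep rows whose supplier shares the customer's nation.
    std::unordered_map<std::string, double> revenue;
    for (const auto& l : lineitems) {
        auto o = order_nation.find(l.orderkey);
        if (o == order_nation.end()) continue;
        auto s = supp_nation.find(l.suppkey);
        if (s == supp_nation.end() || s->second != o->second) continue;  // c_nationkey = s_nationkey
        revenue[nation_name[o->second]] += l.extendedprice * (1 - l.discount);
    }

    // order by revenue desc
    std::vector<std::pair<std::string, double>> result(revenue.begin(), revenue.end());
    std::sort(result.begin(), result.end(),
              [](const std::pair<std::string, double>& a, const std::pair<std::string, double>& b) {
                  return a.second > b.second;
              });
    return result;
}
```

Each customer and each supplier belongs to exactly one nation. The check `s->second != o->second` therefore enforces `c_nationkey = s_nationkey`, and the customer's nation then supplies `n_name`.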
Prerequisites:
- CMake (version 3.10 or higher)
- A C++ compiler supporting C++11 or later
- The TPC-H data generation tool (`dbgen`), used to generate the query's input data at scale factor 2
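`dbgen` writes one `.tbl` file per table; for example, `dbgen -s 2` produces `customer.tbl`, `orders.tbl`, `lineitem.tbl`, and so on. Columns are separated by `|`, and every row also ends with a trailing `|`. A loader therefore has to split each line on `|` and ignore the empty token after the last delimiter. The helper below is a minimal sketch of that step, and `split_tbl_line` is a hypothetical name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Splits one line of a dbgen-style .tbl file into its fields.
// Columns are '|'-separated and each row ends with '|', so the empty
// token after the final delimiter is not treated as a column.
std::vector<std::string> split_tbl_line(const std::string& line) {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    while (start < line.size()) {
        std::string::size_type bar = line.find('|', start);
        if (bar == std::string::npos) {  // tolerate a line without the trailing '|'
            fields.push_back(line.substr(start));
            break;
        }
        fields.push_back(line.substr(start, bar - start));  // may be empty for "||"
        start = bar + 1;
    }
    return fields;
}
```

Columns follow the TPC-H schema order. For example, a `region.tbl` row splits into `r_regionkey`, `r_name`, and `r_comment`, and numeric fields can then be converted with `std::stoi` or `std::stod`.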
- Clone the repository:
  ```sh
  git clone <repository-url>
  cd tpch-query5-cpp
  ```
- Create a build directory and navigate into it:
  ```sh
  mkdir build
  cd build
  ```
- Generate the build files with CMake:
  ```sh
  cmake ..
  ```
- Compile the project:
  ```sh
  make
  ```
To run the program in single-threaded mode, use the following command:
```sh
./tpch_query5 --r_name ASIA --start_date 1994-01-01 --end_date 1995-01-01 --threads 1 --table_path /path/to/tables --result_path /path/to/results
```
To run the program in multi-threaded mode, specify the number of threads (e.g., 4):
```sh
./tpch_query5 --r_name ASIA --start_date 1994-01-01 --end_date 1995-01-01 --threads 4 --table_path /path/to/tables --result_path /path/to/results
```
- Run the program with the desired parameters.
- The results are written to the specified result path.
- Compare the runtimes of the single-threaded and multi-threaded runs to evaluate the speedup.
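Reporting the runtime numbers requires timing the same work for `--threads 1` and `--threads 4`, either inside the program or in a small harness around it. The sketch below uses `std::chrono::steady_clock`, which is monotonic and therefore suited to measuring intervals. `time_ms` is a hypothetical helper. Whether table loading falls inside the measured region is a choice, and it is worth stating alongside the numbers.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Returns the wall-clock time, in milliseconds, taken to run `work`.
// steady_clock is monotonic, so system clock adjustments do not skew the result.
double time_ms(const std::function<void()>& work) {
    auto start = std::chrono::steady_clock::now();
    work();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

For example, `double ms = time_ms([&]() { /* load tables, run the query */ });`. The speedup is then the single-threaded time divided by the 4-threaded time.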
- Parallelization: Dividing the workload among multiple threads lets the program process data concurrently, reducing overall execution time.
- Efficiency: Multithreading can speed up both reading and parsing the input tables and the CPU-bound work of filtering, joining, and aggregating them. Reading, however, may be limited by disk throughput rather than by CPU.
- Scalability: The speedup is expected to grow with the number of threads, typically up to about the number of available cores, or until thread-management and synchronization overhead or memory bandwidth becomes the bottleneck.
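A common way to parallelize the aggregation without locks is to give each thread a contiguous slice of the rows and its own partial result map, then merge the partial maps after all threads have joined. The sketch below assumes the rows have already been reduced to a hypothetical `Row` (nation name, price, discount). In the full query, each thread would instead probe the read-only hash tables built from the smaller relations. Concurrent `const` lookups (`find`, `at`) on standard containers do not race, but `operator[]` is not `const` and should be avoided on the shared tables. `parallel_revenue` and `RevenueMap` are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical row already joined down to the columns the aggregate needs.
struct Row { std::string nation; double extendedprice; double discount; };

using RevenueMap = std::unordered_map<std::string, double>;

// Sums extendedprice * (1 - discount) per nation using num_threads threads.
// Each thread writes only to its own partial map, so no locking is needed;
// the partial maps are merged on the calling thread after all joins.
RevenueMap parallel_revenue(const std::vector<Row>& rows, unsigned num_threads) {
    assert(num_threads > 0);
    std::vector<RevenueMap> partial(num_threads);
    std::vector<std::thread> workers;
    const std::size_t chunk = (rows.size() + num_threads - 1) / num_threads;

    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t first = std::min(rows.size(), t * chunk);
        const std::size_t last = std::min(rows.size(), first + chunk);
        workers.emplace_back([&rows, &partial, t, first, last]() {
            for (std::size_t i = first; i < last; ++i)
                partial[t][rows[i].nation] += rows[i].extendedprice * (1 - rows[i].discount);
        });
    }
    for (auto& w : workers) w.join();

    RevenueMap total;
    for (const auto& p : partial)
        for (const auto& kv : p) total[kv.first] += kv.second;
    return total;
}
```

Floating-point addition is not associative, so the per-nation sums may differ from the single-threaded run in the last few digits. Rounding the printed revenue (e.g. to two decimal places) keeps the 1-thread and 4-thread outputs comparable. On Linux the program may also need to link the threads library, for example with `-pthread`, or with `find_package(Threads REQUIRED)` and `Threads::Threads` in CMake.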
- Ensure that the TPC-H data is correctly generated and placed in the specified table path.
- Adjust the number of threads based on your system's capabilities and the size of the dataset.
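`std::thread::hardware_concurrency()` reports how many threads the hardware can run concurrently, which is a reasonable upper bound when choosing `--threads`. It is only a hint and may return 0 when the value is not known. The hypothetical helpers below honor the requested count but warn when it exceeds the hardware. A 4-thread run on a 2-core machine therefore still executes, but it should not be expected to scale.

```cpp
#include <cassert>
#include <iostream>
#include <thread>

// Number of hardware threads, or 1 when the runtime cannot determine it.
unsigned hardware_threads() {
    unsigned hw = std::thread::hardware_concurrency();  // may be 0 ("not computable")
    return hw == 0 ? 1 : hw;
}

// True when more threads are requested than can run simultaneously.
bool is_oversubscribed(unsigned requested) {
    return requested > hardware_threads();
}

// Example policy: keep the requested count, but tell the user it may not scale.
void warn_if_oversubscribed(unsigned requested) {
    if (is_oversubscribed(requested))
        std::cerr << "warning: " << requested << " threads requested, but only "
                  << hardware_threads() << " hardware threads are available\n";
}
```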
If you encounter any issues during build or execution, please check the following:
- Ensure all dependencies are installed.
- Verify that the TPC-H data is correctly formatted and accessible.
- Check the command line arguments for correctness.
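Malformed arguments are easiest to diagnose when the program rejects them up front with a clear message. The sketch below parses the `--flag value` pairs from the run commands into a hypothetical `Args` struct. It then checks that every flag is present, that `--threads` is a positive integer, and that both dates use the `YYYY-MM-DD` form with the start before the end. The names `Args`, `parse_args`, and `looks_like_date` are illustrative, and the repository's `query5.cpp` may already parse its arguments differently.

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <initializer_list>
#include <map>
#include <string>
#include <vector>

// Hypothetical holder for the command-line flags.
struct Args {
    std::string r_name, start_date, end_date, table_path, result_path;
    int threads = 0;
};

// True if s has the shape YYYY-MM-DD (format only, not calendar validity).
static bool looks_like_date(const std::string& s) {
    if (s.size() != 10 || s[4] != '-' || s[7] != '-') return false;
    for (int i : {0, 1, 2, 3, 5, 6, 8, 9})
        if (!std::isdigit(static_cast<unsigned char>(s[i]))) return false;
    return true;
}

// Parses "--name value" pairs (program name excluded).
// Returns an error message, or an empty string on success.
std::string parse_args(const std::vector<std::string>& args, Args& out) {
    if (args.size() % 2 != 0) return "every flag needs a value";
    std::map<std::string, std::string> kv;
    for (std::size_t i = 0; i < args.size(); i += 2) {
        if (args[i].compare(0, 2, "--") != 0) return "expected a --flag, got: " + args[i];
        kv[args[i].substr(2)] = args[i + 1];
    }
    const char* required[] = {"r_name", "start_date", "end_date", "threads", "table_path", "result_path"};
    for (const char* name : required)
        if (kv.count(name) == 0) return std::string("missing --") + name;

    // Strict integer parse: reject "", "four", "4x", and values below 1.
    const std::string& t = kv["threads"];
    char* end = nullptr;
    long threads = std::strtol(t.c_str(), &end, 10);
    if (end == t.c_str() || *end != '\0' || threads < 1) return "--threads must be a positive integer";

    out.r_name = kv["r_name"];
    out.start_date = kv["start_date"];
    out.end_date = kv["end_date"];
    out.table_path = kv["table_path"];
    out.result_path = kv["result_path"];
    out.threads = static_cast<int>(threads);

    if (!looks_like_date(out.start_date) || !looks_like_date(out.end_date))
        return "dates must be in YYYY-MM-DD format";
    if (out.start_date >= out.end_date) return "--start_date must be before --end_date";
    return "";
}
```

`main` would call it as `parse_args(std::vector<std::string>(argv + 1, argv + argc), args)` and exit with the returned message if that message is non-empty.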