Pandas Big Data

This example generates a large amount of fake data with fake2db and loads it into a pandas DataFrame.

Create fake data:

fake2db --rows 1000000 --db sqlite --custom date random_int currency_code

2021-10-23 10:08:07,922 bl       Rows argument : 1000000
2021-10-23 10:08:07,947 bl       Database created and opened succesfully: sqlite_ORAUCPVE.db
2021-10-23 10:08:07,947 bl       fake2db found valid custom key provided: date
2021-10-23 10:08:07,947 bl       fake2db found valid custom key provided: random_int
2021-10-23 10:08:07,947 bl       fake2db found valid custom key provided: currency_code
2021-10-23 10:08:25,024 bl       custom Commits are successful after write job!
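Before loading everything into pandas, it can be worth confirming that the write job produced the expected number of rows. A minimal sketch of that check (using an in-memory stand-in table here; with the real file you would connect to the generated database, whose randomized name appears in the log above):

```python
import sqlite3

# Demo: an in-memory stand-in with the same layout as the fake2db 'custom'
# table; in practice, connect to the generated file instead.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE custom (date TEXT, random_int INTEGER, currency_code TEXT)')
conn.executemany('INSERT INTO custom VALUES (?, ?, ?)',
                 [('2001-08-30', 5880, 'LAK'), ('1993-10-13', 8486, 'ARS')])

row_count = conn.execute('SELECT COUNT(*) FROM custom').fetchone()[0]
print(row_count)  # with the real file this should equal the --rows value
conn.close()
```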

Load data:

Be sure to set the SQLite database filename to match the one fake2db generated (it is randomized on each run).

#!/usr/bin/env python3

import sqlite3
import pandas as pd
from rich_dataframe import prettify

# Database filename is randomized by fake2db; adjust to match your run.
conn = sqlite3.connect('sqlite_ORAUCPVE.db')

# read_sql_query already returns a DataFrame; keep the three generated columns.
df = pd.read_sql_query('''
                       SELECT *
                       FROM custom
                       ''', conn)[['date', 'random_int', 'currency_code']]

#df = prettify(df, row_limit=50, first_rows=True)

print(df)

Run it:

# bl @ bl-dt in ~/proj/fakeDataPY [10:13:22] 
$ ./new.py 
              date random_int currency_code
0       2001-08-30       5880           LAK
1       1993-10-13       8486           ARS
2       2014-06-04       7055           RUB
3       1998-08-19       1997           ERN
4       2008-11-15       7752           GTQ
...            ...        ...           ...
999995  1993-10-24       4791           TVD
999996  1976-06-14       6420           KMF
999997  1992-10-14       5631           BMD
999998  2013-02-10       3235           MUR
999999  2000-06-20       1544           CUC

[1000000 rows x 3 columns]
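With a million rows, dtypes start to matter: everything comes back from SQLite as plain `object` columns, which is memory-hungry. One possible way to shrink the frame, sketched here on a small synthetic sample with the same column names (not taken from the original post):

```python
import pandas as pd

# Small synthetic frame shaped like the fake2db table above.
df = pd.DataFrame({
    'date': ['2001-08-30', '1993-10-13'],
    'random_int': ['5880', '8486'],
    'currency_code': ['LAK', 'ARS'],
})

# Numeric strings -> real integers; repeated codes -> categorical,
# which stores each distinct code only once.
df['random_int'] = pd.to_numeric(df['random_int'])
df['currency_code'] = df['currency_code'].astype('category')

print(df.dtypes)
print(df.memory_usage(deep=True))
```

On the full one-million-row table, the categorical conversion is where most of the savings would come from, since there are only a few hundred distinct currency codes.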

Next, convert the date column to datetime to make it usable:
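That conversion can be sketched as follows (using a small sample frame here; the real df comes from the query above):

```python
import pandas as pd

# Sample rows shaped like the query result above.
df = pd.DataFrame({'date': ['2001-08-30', '1993-10-13'],
                   'random_int': [5880, 8486],
                   'currency_code': ['LAK', 'ARS']})

# Parse the text dates into datetime64 so sorting, filtering,
# and the .dt accessors work on the column.
df['date'] = pd.to_datetime(df['date'])
print(df['date'].dt.year.tolist())  # [2001, 1993]
```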