pyxplr package
Submodules
pyxplr.explore_feature_map module
pyxplr.explore_feature_map.explore_feature_map(df, features=[])
Returns a cumulative faceted plot of pairwise feature relationships. The plot consists of NxN mini-charts, where N is the number of features. The main diagonal shows each feature's distribution. Pairwise Pearson correlations are shown above the main diagonal. Pairwise joint distributions are shown below the main diagonal.
NOTE: Non-numeric features will be skipped. None of the passed features may contain missing data; otherwise an error is raised.
Parameters: - df (pandas.DataFrame) – The target dataframe to explore
- features (Array-like) – An array of strings naming the features to include in the plot. An empty array means all features (Default = [])
Returns: The resulting Altair chart.
Return type: Altair.Chart
Raises: - TypeError – Invalid data frame.
- ValueError – Invalid features specification.
- ValueError – No numeric features present in the dataset.
- ValueError – Dataframe must not include any missing data.
- ValueError – Features specification includes a non-existent feature.
- ValueError – Features specification includes a non-numeric feature.
Notes
The function only works with numeric features. Non-numeric features are omitted.
The current implementation has a performance limitation imposed by Altair: large datasets may take some time to render.
Examples
>>> import pandas as pd
>>> from pyxplr.explore_feature_map import explore_feature_map
>>> df = pd.DataFrame({'col1': [1, 2, 4, 3, -1, 10],
...                    'col2': [3, 1, 5, -2, 3, -1],
...                    'col3': [8, 1, 2, 3, 11, 10]})
>>> explore_feature_map(df)
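The numbers shown above the main diagonal are pairwise Pearson correlations. As a rough sketch of how such values can be obtained with pandas alone (this is not pyxplr's own implementation, and the variable names `numeric` and `corr` are illustrative), the correlation matrix for the example data could be computed like this:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 4, 3, -1, 10],
                   'col2': [3, 1, 5, -2, 3, -1],
                   'col3': [8, 1, 2, 3, 11, 10]})

# Keep numeric columns only, mirroring the documented behaviour
# of skipping non-numeric features (assumption: pyxplr may filter differently).
numeric = df.select_dtypes(include="number")

# The documentation states that missing data raises an error,
# so a check along these lines is plausible before plotting.
if numeric.isna().any().any():
    raise ValueError("Dataframe must not include any missing data.")

# Pairwise Pearson correlation matrix (N x N for N numeric features).
corr = numeric.corr(method="pearson")
print(corr.round(2))
```

Each off-diagonal entry of `corr` corresponds to one feature pair in the plot; the diagonal is always 1 because every feature is perfectly correlated with itself.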
pyxplr.explore_missing module
Created on February 28, 2020
@author: Braden Tam
Implementation of the explore_missing function in the pyxplr package.
pyxplr.explore_missing.explore_missing(df, num_rows=0, df_type='location')
Identifies missing observations within df and returns one of two tables, selected by df_type: "location" returns a table of the exact locations in the dataframe where data is missing; "count" returns a table showing, for each feature, how many observations are missing and what proportion of the data that represents.
Parameters: - df (pandas.DataFrame) – The target dataframe to explore
- num_rows (integer) – The number of rows above and below the missing value to output (Default = 0)
- df_type (str) – The desired type of output, either "location" or "count" (Default = "location")
Returns: The resulting dataframe
Return type: pandas.DataFrame
Raises: - ValueError – num_rows must be a positive integer
- ValueError – num_rows must be of type int
- ValueError – There are no missing values in the dataframe
- TypeError – Data must be a pandas DataFrame
- NameError – Type must be either “count” or “location”
Examples
>>> import pandas as pd
>>> from pyxplr.explore_missing import explore_missing
>>> test = pd.DataFrame({'col1': [1, 2, None, 3, 4],
...                      'col2': [2, 3, 4, 5, 6]})
>>> explore_missing(test, num_rows=1)
>>> explore_missing(test, df_type="count")
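To make the two output types concrete, here is a minimal pandas-only sketch of the kind of tables described above. It is an approximation under stated assumptions, not the package's code: the column labels `count` and `proportion` and the way neighbouring rows are selected are hypothetical, and the real output layout may differ.

```python
import pandas as pd

test = pd.DataFrame({'col1': [1, 2, None, 3, 4],
                     'col2': [2, 3, 4, 5, 6]})

# "count"-style table: missing observations per feature and their proportion.
# The column names here are illustrative, not pyxplr's.
is_missing = test.isna()
count_table = pd.DataFrame({"count": is_missing.sum(),
                            "proportion": is_missing.mean()})

# "location"-style table: each row with a missing value, plus num_rows
# rows above and below it, clipped to the bounds of the dataframe.
num_rows = 1
missing_pos = [i for i, has_na in enumerate(is_missing.any(axis=1)) if has_na]
keep = sorted({j
               for i in missing_pos
               for j in range(max(i - num_rows, 0),
                              min(i + num_rows + 1, len(test)))})
location_table = test.iloc[keep]

print(count_table)
print(location_table)
```

With `num_rows = 0` the location table reduces to just the rows that contain a missing value.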
pyxplr.explore_outliers module
pyxplr.explore_outliers.explore_outliers(df, std_range)
Explores outliers in each feature of the dataset based on a given standard deviation range. Before the calculation, rows containing NA values are dropped, and only numeric columns are considered.
Parameters: - df (pandas.DataFrame) – Target dataframe to explore
- std_range (integer) – Number of standard deviations used to find outliers
Returns: DataFrame – Dataframe containing the number of outliers for each numeric feature.
Return type: pandas.DataFrame
Raises: TypeError – If the input is not a pandas.DataFrame.
Notes
Does not consider non-numeric features.
Examples
>>> import pandas as pd
>>> from pyxplr.explore_outliers import explore_outliers
>>> df = pd.DataFrame({'col1': [1, 2, 1.00, 3, -1, 100],
...                    'col2': [3, 1, 5, -2, 3, -1]})
>>> explore_outliers(df, 2)
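The documentation does not spell out the exact outlier rule, so the following is only a sketch of one common interpretation: a value counts as an outlier when it lies more than `std_range` standard deviations from its column mean. Using the sample standard deviation (pandas' default) is an assumption, and `outlier_counts` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 1.00, 3, -1, 100],
                   'col2': [3, 1, 5, -2, 3, -1]})
std_range = 2

# Drop rows with NA values and keep numeric columns, as the description states.
numeric = df.dropna().select_dtypes(include="number")

# Distance of each value from its column mean, in standard deviations.
# Assumption: sample standard deviation (ddof=1), the pandas default.
z_scores = (numeric - numeric.mean()) / numeric.std()

# Count, per feature, the values farther than std_range deviations away.
outlier_counts = (z_scores.abs() > std_range).sum()
print(outlier_counts)
```

Under this rule, only the value 100 in `col1` falls outside two standard deviations, so `col1` has one outlier and `col2` has none.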
pyxplr.explore_summary module
pyxplr.explore_summary.explore_summary(df)
Prints the names of the categorical and numeric columns, along with a basic statistical summary of the numeric columns in the provided data: mean, variance, the 0.25, 0.5, and 0.75 quantiles, minimum, and maximum.
Parameters: df (pandas.DataFrame) – The target dataframe to explore
Returns: Dataframe with summary details on each numeric feature
Return type: pandas.DataFrame
Raises: Error – Description
Examples
>>> import pandas as pd
>>> from pyxplr.explore_summary import explore_summary
>>> df = pd.DataFrame({"A": [12, 4, 5, 44, 1],
...                    "B": ["apple", "banana", "orange",
...                          "strawberry", "blueberry"],
...                    "C": ["2", "1", "3", "4", "6"],
...                    "D": [14, 3, 17, 2, 6]})
>>> explore_summary(df)
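For orientation, the statistics listed above can be reproduced with plain pandas. The sketch below is not the package's implementation; the row labels (`q25`, `q50`, `q75`) and the layout of `summary` are hypothetical. Note that column `C` holds digit strings, so it is treated as categorical rather than numeric.

```python
import pandas as pd

df = pd.DataFrame({"A": [12, 4, 5, 44, 1],
                   "B": ["apple", "banana", "orange",
                         "strawberry", "blueberry"],
                   "C": ["2", "1", "3", "4", "6"],
                   "D": [14, 3, 17, 2, 6]})

# Split column names by dtype; string columns count as categorical here.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()
print("Numeric columns:", numeric_cols)
print("Categorical columns:", categorical_cols)

# Mean, variance, min and max for the numeric columns.
basic = df[numeric_cols].agg(["mean", "var", "min", "max"])

# The 0.25, 0.5 and 0.75 quantiles, relabelled for readability
# (the labels are illustrative only).
quantiles = df[numeric_cols].quantile([0.25, 0.5, 0.75])
quantiles.index = ["q25", "q50", "q75"]

summary = pd.concat([basic, quantiles])
print(summary)
```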