Big data systems are widely regarded as the future of business. Banking organizations, which handle large volumes of data, have already adopted such systems, as have many e-commerce sites. These systems work on the principle of training data (Ekong & Vihinen, 2019). However, they can create implicit bias: Word2vec and GloVe are already notorious for affecting people's personal interests, and many users have labelled them racist and sexist. This report therefore analyzes the validity of those claims, examines the obligations such a system imposes on the banking industry, and, based on these obligations, proposes precautions and solutions for using a big data system responsibly.
Part a) Application of word embedding in Business:
Word embedding is the representation of a word as a vector to ease data mining and data sorting. It is a part of natural language processing (NLP) in machine learning. Two embedding techniques widely applied in business are listed below.
Word2vec
GloVe
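To make the vector representation concrete, the sketch below uses toy three-dimensional vectors with made-up values (real Word2vec or GloVe vectors have hundreds of dimensions learned from large corpora) and compares words with cosine similarity, the standard measure of how close two embeddings are:

```python
import math

# Toy word vectors with hypothetical values for illustration only.
embeddings = {
    "bank":    [0.9, 0.1, 0.3],
    "finance": [0.8, 0.2, 0.4],
    "apple":   [0.1, 0.9, 0.2],
}

def cosine_similarity(u, v):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

sim_related = cosine_similarity(embeddings["bank"], embeddings["finance"])
sim_unrelated = cosine_similarity(embeddings["bank"], embeddings["apple"])
print(sim_related > sim_unrelated)  # related words score higher
```

Because similar words end up with similar vectors, a business system can sort, cluster, and search text by meaning rather than by exact keyword matches.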
Part b) Implicit bias in the applications:
These positive qualities make word-embedding applications an attractive choice in business. However, they have detrimental effects because their underlying data-learning process introduces implicit issues. They are:
Part c) Alleviate the implicit bias in a word embedding
As mentioned above, gender bias has been a real challenge for the word-embedding technique. However, it can be addressed by two methods. The origin of the bias lies in the fundamental concepts of the data-learning process: according to Rigg (2018), the conventional model relies on the training-data approach, so while the model is in operation it collects keywords and associates them with the gender of the users who search for them. For example, if 600 people search for "fighter jet pilot" and 500 of them are male, the training-data approach presumes a fighter jet pilot to be a male candidate. To diminish this implicit bias, a data manager should adopt the data augmentation technique together with the gender tagging technique.
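The probability-driven association described above can be sketched as follows; the search log is a hypothetical reconstruction of the 600-query example, not real data:

```python
from collections import Counter

# Hypothetical search log: 600 queries for "fighter jet pilot",
# 500 issued by users recorded as male and 100 as female.
search_log = ["male"] * 500 + ["female"] * 100

counts = Counter(search_log)
total = sum(counts.values())

# The naive training-data approach associates the occupation with
# whichever gender dominates the co-occurrence statistics.
p_male = counts["male"] / total                 # 500 / 600
inferred_gender = max(counts, key=counts.get)

print(inferred_gender)  # the model now "presumes" a pilot is male
```

Nothing in the data says a pilot must be male; the presumption is purely an artifact of who happened to search, which is exactly the implicit bias the two techniques below try to remove.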
Data augmentation technique: Each data item should remain editable, and no gender presumption should be made before the gender is actually identified. The technique has two subcategories: position augmentation and colour augmentation. Under this method, the rotation, cropping, clarity and contrast of a data item are maintained manually, and editing of the content itself should also be permitted (Collmann & Matei, 2016), so that no other entity can tag a word with a gender. This technique not only diminishes the bias but also gives the data manager extra advantages; for example, colour augmentation offers the data editor numerous ways to manage a data item's appearance while diminishing the gender-bias issue.
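For the text corpora that feed a word embedding, the same augmentation idea can be illustrated with a gender-swap pass, a common text-augmentation analogue of the image edits described above; the word pairs are illustrative, not a complete list:

```python
# Sketch of gender-swap data augmentation (hypothetical swap table).
# Each training sentence is duplicated with gendered terms exchanged,
# so the model sees "he"/"she" contexts in equal proportion.
SWAP = {"he": "she", "she": "he", "his": "her", "her": "his",
        "man": "woman", "woman": "man"}

def gender_swap(sentence):
    """Return the sentence with every gendered token exchanged."""
    return " ".join(SWAP.get(tok, tok) for tok in sentence.lower().split())

def augment(corpus):
    # Keep each original sentence and add its swapped counterpart.
    return corpus + [gender_swap(s) for s in corpus]

corpus = ["he is a pilot", "she is a nurse"]
print(augment(corpus))
# ['he is a pilot', 'she is a nurse', 'she is a pilot', 'he is a nurse']
```

After augmentation, both genders co-occur equally often with "pilot" and "nurse", so the embedding has no statistical reason to prefer one.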
Gender tagging: Data augmentation is sufficient to manage implicit gender bias in a word embedding, but extra protection is always preferable, so manual gender tagging is proposed. It can also be an economical solution for companies with little capital: data augmentation requires an ample budget and a continuous cloud-management process, which smaller businesses cannot fund regularly (Scassa & Taylor, 2017). Under gender tagging, search content is managed manually. For example, the system does not tag "fighter jet pilot" as male by default; instead it checks the image, and only if the image shows a male pilot is the male gender tagged. Similarly, masculine names are tagged with the male gender and feminine names with the female gender, while unisex names are left untagged. Unlike the training-data policy, where a probability determines the characteristics of a data item, here complete assurance determines them.
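The tagging rule above can be sketched as a simple lookup; the name lists are hypothetical stand-ins for a manually curated registry:

```python
# Minimal manual gender-tagging sketch (hypothetical name lists).
MALE_NAMES = {"james", "fred"}
FEMALE_NAMES = {"tamara", "mary"}

def tag_gender(name):
    """Tag a name only when its gender is certain; unisex names stay untagged."""
    n = name.lower()
    if n in MALE_NAMES:
        return "male"
    if n in FEMALE_NAMES:
        return "female"
    return None  # unisex or unknown: no tag, no presumption

print(tag_gender("Tamara"))  # female
print(tag_gender("Alex"))    # None
```

The key design choice is the `None` branch: where the training-data approach would guess from a probability, manual tagging simply declines to assign a gender.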
Part a) Benefits of using Big data system in financial organizations:
A human operator cannot verify the validity of data for millions of applicants, each of whose applications is somehow related to hundreds of linked records. Financial organizations are therefore adopting big data systems to identify the risk associated with a loan application: the system delivers a maximally validated result in a minimum amount of time (Nair, 2020), and it also checks for any third-party data links associated with the application. In the given case study, for example, Fred and Tamara were linked to a third-party source of data. The system analyzed each of these records and found no reliable link. The source may be reliable, as the couple claims, but there is no assurance that the flow of data will remain constant in future; if it is reduced or terminated, their business may be affected. The enhanced big data system easily identified this issue and marked the application as a medium-to-high-risk business.
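A screening pass of this kind might be sketched as below; the field names and thresholds are illustrative assumptions, not a description of any real bank's system:

```python
# Hedged sketch of an automated third-party-data risk screen.
def assess_application(app):
    """Score a loan application on its third-party data dependencies."""
    risk = 0
    if app.get("relies_on_third_party_data"):
        risk += 2  # revenue depends on an external data feed
    if not app.get("third_party_link_verified"):
        risk += 1  # the feed's reliability could not be confirmed
    return "medium-high" if risk >= 3 else "low"

application = {
    "applicant": "Fred & Tamara",
    "relies_on_third_party_data": True,
    "third_party_link_verified": False,
}
print(assess_application(application))  # medium-high
```

Note that the rule never asks whether the applicants themselves are creditworthy; it reacts only to the unverified data link, which is precisely the behaviour discussed in the case study.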
Part b) Ethical harms to the life interests of Fred and Tamara
Everyone has desires in life, and Maslow describes them in perhaps the most scientific form. The basic needs are food to live, air to breathe, shelter to reside in, and the physiological needs that satisfy basic instincts. Beyond these, a person seeks respect, social position, a dream career, reputation, knowledge, liberty, social security, economic security, entrepreneurship and more (Cooper, 2016). A chief data manager who analyzes loan application data often considers the applicant's personal interests and their bond with the application, which helps identify these ethical life interests. A software-driven data analysis, however, considers no ethical interest of a person, which is what Tamara and Fred experienced.
Liberty in life: Tamara and Fred desired a life in which they would be "their own boss", as the case study puts it. However, their loan application was rejected by a soulless algorithmic system, so their dream of becoming entrepreneurs no longer looks possible. The advanced data-learning system behind the big data analysis has recorded the previous result; if they apply again, the application will be rejected again because of that record.
Respect: Tamara and Fred believed that a business can earn respect in society (Beretta et al., 2018). However, the rejected application puts a barrier in front of that respectful life. Moreover, the stated reason for rejection is an improper source of data, which may harm their respect by implying they were seeking a loan for an illicit business.
Economic security: Tamara and Fred may yet find another way to meet the financial requirements of the business, but a bank loan is always a form of assurance in business. By rejecting the application, the algorithmic system may have destroyed that security for both the business and their life.
Social security: Being an entrepreneur is arguably a better position in society than being employed, so this social security was part of Fred and Tamara's life interests. The rejected loan application has harmed them in this area as well.
Reputation: Their existing consultant had already told them that their loan application draft was perfect, so they had probably informed friends, family members and others close to them. These people are an important part of one's life interests (Beretta et al., 2018); having to tell them about the rejection may damage the couple's reputation as well.
Part c) Harm in society due to this system:
Big data analysis depends on the training-data concept. Because the loan has been marked as a risk loan, every entity associated with it is now considered a medium-to-high-risk member (Scassa & Taylor, 2017). For example, suppose Fred and Tamara are stakeholders in another person's business and that person now applies for a loan: when the system detects Fred and Tamara in the data, it will sound a risk alarm. The rejection can therefore cause immense harm to every individual linked with these two people.
In addition, the implicit bias inherent in the training-data concept may mark any third-party data as risk data by default. If the system detects two or more such cases, it will stop analyzing third-party data altogether and simply presume it to be risk data.
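The propagation described above can be sketched as guilt-by-association scoring; the names and labels are illustrative, echoing the case study rather than any real dataset:

```python
# Sketch of how a rejected application's risk label can propagate to every
# linked entity under a naive training-data approach.
risk_labels = {"Fred": "high", "Tamara": "high"}  # from the rejected loan

def score_by_association(applicant, stakeholders):
    """Flag an applicant if ANY stakeholder already carries a high-risk label."""
    flagged = [s for s in stakeholders if risk_labels.get(s) == "high"]
    return "flagged" if flagged else "clear"

# A third person whose only link to the rejected loan is a stakeholder role:
print(score_by_association("Priya", ["Fred", "Omar"]))  # flagged
print(score_by_association("Omar", ["Lena"]))           # clear
```

The applicant's own merits never enter the function: one stale label is enough to taint every future application it touches, which is the societal harm at issue.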
Part d) Preventing the harms caused by the big data system
For the issues identified in sections 2b and 2c, the researcher recommends three solutions for a financial organization to adopt.
In addition, the model should recover the names of the applicants and their stakeholders; the default values attached to these names should be reset so that no such presumption is made in future.
In this assignment, the researcher has examined the implicit issues associated with machine learning systems. These systems can pose a great threat when analyzing ethically sensitive data: because a system is designed to abide by the rules of its algorithm, it may never consider a person's social, economic and personal circumstances. Human accountability, exercised by a trained professional, is therefore highly recommended in this case.
Beretta, E., Vetrò, A., Lepri, B., & De Martin, J. C. (2018, September). Ethical and Socially-Aware Data Labels. In Annual International Symposium on Information Management and Big Data (pp. 320-327). Springer, Cham.
Boté, J. J., & Térmens, M. (2019). Reusing Data: Technical and Ethical Challenges. DESIDOC Journal of Library & Information Technology, 39(6).
Bourhis, P., Demartini, G., Elbassuoni, S., Hoareau, E., & Rao, H. R. (2019). Ethical Challenges in the Future of Work. Data Engineering, 55.
Clark, K., Duckham, M., Guillemin, M., Hunter, A., McVernon, J., O’Keefe, C., ... & Waycott, J. (2019). Advancing the ethical use of digital data in human research: challenges and strategies to promote ethical practice. Ethics and Information Technology, 21(1), 59-73.
Collmann, J., & Matei, S. A. (Eds.). (2016). Ethical Reasoning in Big Data: An Exploratory Analysis. Springer.
Cooper, H. (2016). Ethical choices in research: Managing data, writing reports, and publishing results in the social sciences. American Psychological Association.
Corrall, S., & Currier, J. D. (2017). Ethical Issues of Big Data 2.0 Collaborations: Roles and Preparation of Information Specialists.
Ekong, R., & Vihinen, M. (2019). Checklist for gene/disease‐specific variation database curators to enable ethical data management. Human mutation, 40(10), 1634-1640.
Firmani, D., Tanca, L., & Torlone, R. (2019). Ethical Dimensions for Data Quality. Journal of Data and Information Quality (JDIQ), 12(1), 1-5.
Chalcraft, J. (2018). Drawing ethical boundaries for data analytics. Information Management, 52(1), 18-25.
Kwan, K., Schneider, J., & Ullman, J. S. (2019). Decompressive craniectomy: Long term outcome and ethical considerations. Frontiers in neurology, 10, 876.
Nair, H. L. K., Ten Wong, D. H., Fouladynezhad, N., Yusof, N. N., Chong, M. C., & Maarop, N. (2018). Big Data: Ethical, Social and Political Issues in Telecommunication Industry. Open International Journal of Informatics (OIJI), 18-25.
Nair, S. R. (2020). A review of ethical concerns in big data management. International Journal of Big Data Management, 1(1), 8-25.
O'Keefe, K., & Brien, D. O. (2018). Ethical data and information management: concepts, tools and methods. Kogan Page Publishers.
Olarewaju, O. M. (2018). Ethical Data Management and Research: Managing Ethical Issues for Research Integrity in Education. In Ensuring Research Integrity and the Ethical Management of Data (pp. 209-218). IGI Global.
Rigg, T. (2018). The ethical considerations for storing client information online. Professional Psychology: Research and Practice, 49(5-6), 332.
Scassa, T., & Taylor, F. (2017). Legal and ethical issues around incorporating Traditional Knowledge in polar data infrastructures. Data Science Journal, 16.